Q: Are we talking about a search engine for the entire web, like Google, or just some intranet tool?
A: Google!
Q: Can we focus on the problem of generating reasonable search results at massive scale?
A: How do you define "reasonable"?
Q: We can return a results page to the user and check whether the user clicks on a result.
About the algorithm: I know that in the past, Google used PageRank as the core algorithm for the search engine.
Alg: PageRank: the more pages link to a page, the more important that page is (see the sketch below).
Nowadays ML techniques are popular, so we can use ML to add different tags to each page and to detect what intent the user is searching with.
Alg: PageRank; use ML to add tags for each page and detect the user intent.
We also need some algorithm to block spam websites, for example sites that have too many links but no useful info.
Alg: PageRank; use ML to add tags for each page and detect the user intent;
block spam websites.
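To make the PageRank idea concrete, here is a minimal power-iteration sketch. The class and method names, the 0.85 damping factor, and the fixed iteration count are the usual textbook defaults, not anything these notes pin down.

import java.util.Arrays;

// Minimal PageRank power-iteration sketch. Damping factor and
// iteration count are textbook defaults, not values from these notes.
public class PageRankSketch {
    // links[i] = indices of the pages that page i links out to
    public static double[] pageRank(int[][] links, int iterations) {
        int n = links.length;
        double damping = 0.85;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);               // start uniform
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - damping) / n); // random-jump term
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {       // dangling page: spread its rank evenly
                    for (int j = 0; j < n; j++) next[j] += damping * rank[i] / n;
                } else {                          // otherwise split its rank among its out-links
                    for (int j : links[i]) next[j] += damping * rank[i] / links[i].length;
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // 0 -> {1, 2}, 1 -> {2}, 2 -> {0}: page 2 has the most backlinks
        int[][] links = {{1, 2}, {2}, {0}};
        System.out.println(Arrays.toString(pageRank(links, 50)));
    }
}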
Then we can talk about the system flow. We need a massive-scale database to store the pages, most likely a NoSQL database, plus an ML service cluster that updates each page's tags on a cycle.
Database (NoSQL) <- ML service (runs on a cycle)
And for hot (frequently searched) results we can use a cache to record the result.
Redis -> database <- ML service
Let's start to think about the user flow.
The user inputs a keyword plus other user info (e.g. account / location / language, etc.).
user input -> detect service (to derive the user intent / user tags) -> cache -> search service -> database <- ML service (a sketch of this read path follows below)
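A minimal sketch of that read path, with an in-memory map standing in for Redis and a hypothetical SearchService interface; every name here is illustrative, not an API from these notes.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical read-path sketch: detect intent, try the cache, then
// fall through to the search service. An in-memory map stands in for Redis.
public class QueryFlowSketch {
    interface SearchService { List<String> search(String query, String intent); }

    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final SearchService searchService;

    QueryFlowSketch(SearchService searchService) { this.searchService = searchService; }

    // Stand-in for the ML "detect service" that tags the user's intent.
    private String detectIntent(String query) {
        return query.contains("buy") ? "shopping" : "informational";
    }

    public List<String> handleQuery(String rawQuery) {
        String intent = detectIntent(rawQuery);
        String cacheKey = intent + ":" + rawQuery;   // hot queries are served from here
        return cache.computeIfAbsent(cacheKey,
                k -> searchService.search(rawQuery, intent)); // miss: search, then fill the cache
    }
}

A real deployment would replace the map with Redis (with TTLs for freshness) and make detectIntent a call to the ML detect service.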
First, we can talk about the algorithm for scoring the keywords; the basic search algorithm is TF-IDF.
Alg: TF-IDF
Comparing how frequent a term is within one page against how frequent it is across all pages tells us how distinctive ("special") that term is for the page.
Term frequency × inverse document frequency
But computing document frequency across the whole web is a difficult problem; we can think about whether there is another way to resolve it (a minimal TF-IDF sketch follows below).
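A minimal TF-IDF sketch over a small in-memory corpus, assuming whitespace tokenization and the classic log(N / df) form of IDF; both choices are illustrative.

import java.util.List;

// Minimal TF-IDF sketch. Whitespace tokenization and the classic
// log(N / df) IDF are illustrative choices, not fixed by these notes.
public class TfIdfSketch {
    // Term frequency: how often the term occurs in this one document.
    static double tf(String term, String[] docTokens) {
        int count = 0;
        for (String t : docTokens) if (t.equals(term)) count++;
        return (double) count / docTokens.length;
    }

    // Inverse document frequency: terms that appear in few documents
    // score higher. Assumes the term occurs in at least one document.
    static double idf(String term, List<String[]> corpus) {
        int docsWithTerm = 0;
        for (String[] doc : corpus) {
            for (String t : doc) {
                if (t.equals(term)) { docsWithTerm++; break; }
            }
        }
        return Math.log((double) corpus.size() / docsWithTerm);
    }

    static double tfIdf(String term, String[] doc, List<String[]> corpus) {
        return tf(term, doc) * idf(term, corpus);
    }

    public static void main(String[] args) {
        List<String[]> corpus = List.of(
                "the quick brown fox".split(" "),
                "the lazy dog".split(" "),
                "the quick dog jumps".split(" "));
        // "quick" is rarer than "the", so it is more distinctive for doc 0
        System.out.println(tfIdf("quick", corpus.get(0), corpus)); // ~0.10
        System.out.println(tfIdf("the", corpus.get(0), corpus));   // 0.0
    }
}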
Now let's work out what data we need in order to rank the results we display to the user.
Inverted index
keyword -> sorted list of documents
Per-document ranking signals, e.g.:
pagerank
backlinks
terms in the doc
their positions
font size / headings / etc. (formatting)
titles
(see the inverted-index sketch after this list)
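A minimal sketch of such an inverted index, with each posting carrying a precomputed document score (standing in for PageRank/backlinks) and a term position; all field and class names are hypothetical.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal inverted-index sketch: keyword -> posting list of documents,
// kept sorted by a precomputed score. Names here are hypothetical.
public class InvertedIndexSketch {
    // A real posting would also carry formatting, title hits, etc.
    record Posting(int docId, int position, double docScore) {}

    private final Map<String, List<Posting>> index = new HashMap<>();

    // docScore would come from PageRank/backlinks computed offline.
    public void addDocument(int docId, String[] terms, double docScore) {
        for (int pos = 0; pos < terms.length; pos++) {
            List<Posting> postings = index.computeIfAbsent(terms[pos], k -> new ArrayList<>());
            postings.add(new Posting(docId, pos, docScore));
            postings.sort((a, b) -> Double.compare(b.docScore(), a.docScore())); // best docs first
        }
    }

    public List<Posting> lookup(String keyword) {
        return index.getOrDefault(keyword, List.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.addDocument(1, "quick brown fox".split(" "), 0.9);
        idx.addDocument(2, "quick dog".split(" "), 0.4);
        System.out.println(idx.lookup("quick")); // doc 1 first (higher score)
    }
}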
Then we need to design how to extract all of this data:
Web repository -> indexer -> index <-----------------------------+
                     |                                           |
                     +-> backlinks -> URL normalizer -> links -> PageRank algo
                     |
                     +-> scoring, sorting, ranking -> inverted index
As with a typical system design problem, I need to start from the user flow and then work through the algorithm details.
Q: https://leetcode.com/problems/burst-balloons/description/
/**
 * dp[i][j] = max coins obtainable by bursting every balloon in the range [i, j],
 * where k is chosen as the LAST balloon burst in that range:
 * dp[i][j] = max(dp[i][j], nums[i - 1] * nums[k] * nums[j + 1] + dp[i][k - 1] + dp[k + 1][j])
 * for i <= k <= j
 */
class Solution {
    public int maxCoins(int[] nums) {
        // Pad the array with virtual 1-valued balloons on both ends,
        // so nums[i - 1] and nums[j + 1] are always valid neighbors.
        int n = nums.length + 2;
        int[] newNums = new int[n];
        for (int i = 0; i < nums.length; i++) {
            newNums[i + 1] = nums[i];
        }
        newNums[0] = 1;
        newNums[n - 1] = 1;
        // memo[left][right] caches the best coins for the range [left, right];
        // -1 marks "not computed yet", since 0 can be a legitimate answer.
        int[][] memo = new int[n][n];
        for (int[] row : memo) {
            java.util.Arrays.fill(row, -1);
        }
        return dp(memo, newNums, 1, n - 2);
    }

    public int dp(int[][] memo, int[] nums, int left, int right) {
        if (right - left < 0) {
            return 0; // empty range: nothing to burst
        }
        if (memo[left][right] >= 0) {
            return memo[left][right];
        }
        int result = 0;
        // Try each balloon i as the LAST one burst in [left, right]; its
        // neighbors at that moment are the fixed boundaries left-1 and right+1.
        for (int i = left; i <= right; i++) {
            int gain = nums[left - 1] * nums[i] * nums[right + 1];
            int remaining = dp(memo, nums, left, i - 1) + dp(memo, nums, i + 1, right);
            result = Math.max(result, remaining + gain);
        }
        memo[left][right] = result;
        return result;
    }
}
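Sanity check with the problem's first example: maxCoins(new int[]{3, 1, 5, 8}) returns 167, from bursting 1, then 5, then 3, then 8 (3*1*5 + 3*5*8 + 1*3*8 + 1*8*1 = 15 + 120 + 24 + 8).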
DP is too hard...