iT邦幫忙

2023 iThome 鐵人賽

DAY 11
0

Design a search engine

check requirement

Q: do we talk about a search engine for the entrie web, like Google or just some inttanet tool?
A: Google!
Q: Can we focus on the problem of generating reasonable search results at massive result?
A: how do you define the reasonable?
Q: We can return user a page and check if the user click the result.

Try to do my self

about the alg, I know in the past, google use pagelink as the core alg for the search engine

alg: use the pagelink, the page, more than more pages include, is more important page

and nowadays, the ML tech is popular, we can use ML to add differenct tag about a page, and detect what intent that user is searching

alg: pagelink, use ML add tag for each page and detect the user intent

We need to use some alg to block the spam website, for example the website have too much link, and don't have useful info.

alg: pagelink, use ML add tag for each page and detect the user intent
 block spam sebstie

Then we can talk about the system flow, we need a massive scale database to store it, absotlutely noSql database, and use ML service cluster to update each tag in a cycle

Databease(Nosql) <- ML service (cycle to do it)

And for the hot pot search result we can use cahce to record the result

 redis -> database <- ML service

Let's start to think about what the user flow.
user may input the keyworld and other user info(e.g. account / location / lanaguage...etc)

user input -> detect service (to divide the user intent/ user tag) -> cache -> search service -> database <- ML service

course example

At first, we can talk about the alg to detect which the keyword, the basice search alg is TF-IDF

alg: TF-IDF
use the term between the term in this website between the whole website, can know how special of it.

Term frequency/ document frequency

But the document frequency is a difficult problem, we can think about if there is other idea to resolve it.

We can start on what's the result that we need to display to user

Inverted index
keyword -> sorted list of documents
e.g. pagerank
backlinks
terms in the doc
their position
font size /headings /etc.(formatting)
titles

then we need to start to design how to extract these data

Web repository -> indexer -> index    <-------
                       |                     |
                  Backlinks -> Url normailzer --->links->pagerank algo
                                    |
                                    _> scoring, sorting,ranking -->inverted index

need to improve

As a typical problem, I need to think start user flow and the algo detail.

Burst Balloons

Q: https://leetcode.com/problems/burst-balloons/description/

/**
    dp[i][j] = max(dp[i][j], nums[i - 1] * nums[k] * nums[j + 1] + dp[i][k - 1] + dp[k + 1][j]) 
    i <= k <= j
 */

class Solution {
    public int maxCoins(int[] nums) {
        int n = nums.length + 2;
        int[] newNums = new int[n];
        for (int i = 0;i < nums.length;i++) {
            newNums[i+1] = nums[i];
        }
        newNums[0] = 1;
        newNums[n - 1] = 1;
        int[][] dp = new int[n][n];
        return dp(dp, newNums, 1, n - 2);
    }

    public int dp(int[][] memo, int[] nums, int left, int right) {
        if (right - left < 0) {
            return 0;
        }
        if (memo[left][right] > 0) {
            return memo[left][right];
        }
        int result = 0;
        for (int i = left; i <= right; i++) {
            int gain = nums[left - 1] * nums[i] * nums[right + 1];
            int remaining = dp(memo, nums, left, i - 1) + dp(memo, nums, i + 1, right);
            result = Math.max(result, remaining + gain);
        }
        memo[left][right] = result;
        return result;
    }
}

DP is too hard...


上一篇
09/10
下一篇
09/12
系列文
30天準備google面試30
圖片
  直播研討會
圖片
{{ item.channelVendor }} {{ item.webinarstarted }} |
{{ formatDate(item.duration) }}
直播中

尚未有邦友留言

立即登入留言