Q: Are we talking about a search engine for the entire web, like Google, or just some intranet tool?
A: Google!
Q: Can we focus on the problem of generating reasonable search results at massive scale?
A: How do you define "reasonable"?
Q: We can return a results page to the user and check whether the user clicks on a result.
About the algorithm: I know that in the past, Google used PageRank as the core algorithm for the search engine.
Alg: PageRank: the more pages link to a page, the more important that page is (see the sketch below).
Nowadays ML techniques are popular, so we can use ML to add different tags to each page and to detect what intent the user is searching with.
Alg: PageRank; use ML to add tags for each page and detect the user intent.
We also need some algorithm to block spam websites, for example sites that have too many links but no useful info.
Alg: PageRank; use ML to add tags for each page and detect the user intent;
block spam websites.
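To make the PageRank idea concrete, here is a minimal power-iteration sketch. The class and method names, the 0.85 damping factor, and the fixed iteration count are the usual textbook defaults, not anything these notes pin down.

import java.util.Arrays;

// Minimal PageRank power-iteration sketch. Damping factor and
// iteration count are textbook defaults, not values from these notes.
public class PageRankSketch {
    // links[i] = indices of the pages that page i links out to
    public static double[] pageRank(int[][] links, int iterations) {
        int n = links.length;
        double damping = 0.85;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);               // start uniform
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - damping) / n); // random-jump term
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {       // dangling page: spread its rank evenly
                    for (int j = 0; j < n; j++) next[j] += damping * rank[i] / n;
                } else {                          // otherwise split its rank among its out-links
                    for (int j : links[i]) next[j] += damping * rank[i] / links[i].length;
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // 0 -> {1, 2}, 1 -> {2}, 2 -> {0}: page 2 has the most backlinks
        int[][] links = {{1, 2}, {2}, {0}};
        System.out.println(Arrays.toString(pageRank(links, 50)));
    }
}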
Then we can talk about the system flow. We need a massive-scale database to store the pages, most likely a NoSQL database, plus an ML service cluster that updates each page's tags on a cycle.
Database (NoSQL) <- ML service (runs on a cycle)
And for hot (frequently searched) results we can use a cache to record the result.
Redis -> database <- ML service
Let's start to think about the user flow.
The user inputs a keyword plus other user info (e.g. account / location / language, etc.).
user input -> detect service (to derive the user intent / user tags) -> cache -> search service -> database <- ML service (a sketch of this read path follows below)
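A minimal sketch of that read path, with an in-memory map standing in for Redis and a hypothetical SearchService interface; every name here is illustrative, not an API from these notes.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical read-path sketch: detect intent, try the cache, then
// fall through to the search service. An in-memory map stands in for Redis.
public class QueryFlowSketch {
    interface SearchService { List<String> search(String query, String intent); }

    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final SearchService searchService;

    QueryFlowSketch(SearchService searchService) { this.searchService = searchService; }

    // Stand-in for the ML "detect service" that tags the user's intent.
    private String detectIntent(String query) {
        return query.contains("buy") ? "shopping" : "informational";
    }

    public List<String> handleQuery(String rawQuery) {
        String intent = detectIntent(rawQuery);
        String cacheKey = intent + ":" + rawQuery;   // hot queries are served from here
        return cache.computeIfAbsent(cacheKey,
                k -> searchService.search(rawQuery, intent)); // miss: search, then fill the cache
    }
}

A real deployment would replace the map with Redis (with TTLs for freshness) and make detectIntent a call to the ML detect service.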
First, we can talk about the algorithm for scoring the keywords; the basic search algorithm is TF-IDF.
Alg: TF-IDF
Comparing how frequent a term is within one page against how frequent it is across all pages tells us how distinctive ("special") that term is for the page.
Term frequency × inverse document frequency
But computing document frequency across the whole web is a difficult problem; we can think about whether there is another way to resolve it (a minimal TF-IDF sketch follows below).
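A minimal TF-IDF sketch over a small in-memory corpus, assuming whitespace tokenization and the classic log(N / df) form of IDF; both choices are illustrative.

import java.util.List;

// Minimal TF-IDF sketch. Whitespace tokenization and the classic
// log(N / df) IDF are illustrative choices, not fixed by these notes.
public class TfIdfSketch {
    // Term frequency: how often the term occurs in this one document.
    static double tf(String term, String[] docTokens) {
        int count = 0;
        for (String t : docTokens) if (t.equals(term)) count++;
        return (double) count / docTokens.length;
    }

    // Inverse document frequency: terms that appear in few documents
    // score higher. Assumes the term occurs in at least one document.
    static double idf(String term, List<String[]> corpus) {
        int docsWithTerm = 0;
        for (String[] doc : corpus) {
            for (String t : doc) {
                if (t.equals(term)) { docsWithTerm++; break; }
            }
        }
        return Math.log((double) corpus.size() / docsWithTerm);
    }

    static double tfIdf(String term, String[] doc, List<String[]> corpus) {
        return tf(term, doc) * idf(term, corpus);
    }

    public static void main(String[] args) {
        List<String[]> corpus = List.of(
                "the quick brown fox".split(" "),
                "the lazy dog".split(" "),
                "the quick dog jumps".split(" "));
        // "quick" is rarer than "the", so it is more distinctive for doc 0
        System.out.println(tfIdf("quick", corpus.get(0), corpus)); // ~0.10
        System.out.println(tfIdf("the", corpus.get(0), corpus));   // 0.0
    }
}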
Now let's work out what data we need in order to rank the results we display to the user.
Inverted index
keyword -> sorted list of documents
Per-document ranking signals, e.g.:
pagerank
backlinks
terms in the doc
their positions
font size / headings / etc. (formatting)
titles
(see the inverted-index sketch after this list)
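A minimal sketch of such an inverted index, with each posting carrying a precomputed document score (standing in for PageRank/backlinks) and a term position; all field and class names are hypothetical.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal inverted-index sketch: keyword -> posting list of documents,
// kept sorted by a precomputed score. Names here are hypothetical.
public class InvertedIndexSketch {
    // A real posting would also carry formatting, title hits, etc.
    record Posting(int docId, int position, double docScore) {}

    private final Map<String, List<Posting>> index = new HashMap<>();

    // docScore would come from PageRank/backlinks computed offline.
    public void addDocument(int docId, String[] terms, double docScore) {
        for (int pos = 0; pos < terms.length; pos++) {
            List<Posting> postings = index.computeIfAbsent(terms[pos], k -> new ArrayList<>());
            postings.add(new Posting(docId, pos, docScore));
            postings.sort((a, b) -> Double.compare(b.docScore(), a.docScore())); // best docs first
        }
    }

    public List<Posting> lookup(String keyword) {
        return index.getOrDefault(keyword, List.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.addDocument(1, "quick brown fox".split(" "), 0.9);
        idx.addDocument(2, "quick dog".split(" "), 0.4);
        System.out.println(idx.lookup("quick")); // doc 1 first (higher score)
    }
}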
Then we need to design how to extract all of this data:
Web repository -> indexer -> index <-----------------------------+
                     |                                           |
                     +-> backlinks -> URL normalizer -> links -> PageRank algo
                     |
                     +-> scoring, sorting, ranking -> inverted index
As with a typical system design problem, I need to start from the user flow and then work through the algorithm details.
Q: https://leetcode.com/problems/burst-balloons/description/
/**
 * dp[i][j] = max coins obtainable by bursting every balloon in the range [i, j],
 * where k is chosen as the LAST balloon burst in that range:
 * dp[i][j] = max(dp[i][j], nums[i - 1] * nums[k] * nums[j + 1] + dp[i][k - 1] + dp[k + 1][j])
 * for i <= k <= j
 */
class Solution {
    public int maxCoins(int[] nums) {
        // Pad the array with virtual 1-valued balloons on both ends,
        // so nums[i - 1] and nums[j + 1] are always valid neighbors.
        int n = nums.length + 2;
        int[] newNums = new int[n];
        for (int i = 0; i < nums.length; i++) {
            newNums[i + 1] = nums[i];
        }
        newNums[0] = 1;
        newNums[n - 1] = 1;
        // memo[left][right] caches the best coins for the range [left, right];
        // -1 marks "not computed yet", since 0 can be a legitimate answer.
        int[][] memo = new int[n][n];
        for (int[] row : memo) {
            java.util.Arrays.fill(row, -1);
        }
        return dp(memo, newNums, 1, n - 2);
    }

    public int dp(int[][] memo, int[] nums, int left, int right) {
        if (right - left < 0) {
            return 0; // empty range: nothing to burst
        }
        if (memo[left][right] >= 0) {
            return memo[left][right];
        }
        int result = 0;
        // Try each balloon i as the LAST one burst in [left, right]; its
        // neighbors at that moment are the fixed boundaries left-1 and right+1.
        for (int i = left; i <= right; i++) {
            int gain = nums[left - 1] * nums[i] * nums[right + 1];
            int remaining = dp(memo, nums, left, i - 1) + dp(memo, nums, i + 1, right);
            result = Math.max(result, remaining + gain);
        }
        memo[left][right] = result;
        return result;
    }
}
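Sanity check with the problem's first example: maxCoins(new int[]{3, 1, 5, 8}) returns 167, from bursting 1, then 5, then 3, then 8 (3*1*5 + 3*5*8 + 1*3*8 + 1*8*1 = 15 + 120 + 24 + 8).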
DP is too hard...