At first, because webs amount is very large, and the depth of the website is not the first thing that we need to concern about. So we can use BFS algo to crawl the web
algo: BFS
We need to use muti service to crawl the web, we can use a queue service like kafka to help us mantain the taksk
And we need to have a list for the urls we need to start with it
init: list of urls
algo: BFS
queue service: kafka
We can use the nosql db to record all result of crawl ans s3 to save the copy of page.
and use hash function to restore the url and page
init: list of urls
algo: BFS
queue servie: kafka
database: noSql
storage: S3
data schema: url/update_time/...some data for seo/s3_key/hash_content/
algo: BFS
Queue of URL's to crawl -> page downloader -> URL extraction -> URL filter/stoplist
distributed storage (key, value)
Queue of URL's to crawl -> linked list / can backup on service and use hash to backup on different bucket
Page downloader -> we can put the url back to the queue
edge case: no limit for-loop website, the page need to login, the link is dynamoic render
Consider the function of component first, not too fast to say which service we can use.
先維持這樣每天看一題system design順便刷刷題的模式好了
剛到新公司上班 一邊還要練這些...好想吐喔
Q: https://leetcode.com/problems/excel-sheet-column-title/description/
class Solution {
public String convertToTitle(int columnNumber) {
StringBuilder sb = new StringBuilder();
while (columnNumber > 0) {
columnNumber--;
int num = columnNumber % 26;
sb.insert(0, (char) ('A' + num));
columnNumber /= 26;
}
return sb.toString();
}
}
Q: https://leetcode.com/problems/house-robber/description/
/**
f(n) n house robber max value
f(0) = nums[0];
f(1) = Math.max(nums[0], nums[1]);
f(n) = Math.max(f(n-1), f(n-2) + nums[n]);
*/
class Solution {
public int rob(int[] nums) {
if (nums.length == 1) {
return nums[0];
}
int n = nums.length;
int[] dp = new int[n];
dp[0] = nums[0];
dp[1] = Math.max(nums[0], nums[1]);
for (int i = 2; i < n;i++) {
dp[i] = Math.max(dp[i-1], dp[i-2] + nums[i]);
}
return dp[n-1];
}
}
Q: https://leetcode.com/problems/longest-mountain-in-array/description/
class Solution {
public int longestMountain(int[] arr) {
int n = arr.length;
int leftMax[] = new int[n];
int rightMax[] = new int[n];
Arrays.fill(leftMax, 1);
Arrays.fill(rightMax, 1);
for(int i = 1;i < n; i++) {
if (arr[i - 1] < arr[i]) {
leftMax[i] = leftMax[i - 1] + 1;
}
}
for (int i = n - 2;i >= 0;i--) {
if (arr[i + 1] < arr[i]) {
rightMax[i] = rightMax[i+1] + 1;
}
}
int max = 0;
for(int i = 0;i < n;i++) {
if (leftMax[i] > 1 && rightMax[i] > 1) {
max = Math.max(max, leftMax[i] + rightMax[i] - 1);
}
}
return max;
}
}