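Continuing from yesterday's post, today's example is "read a specified IT Ironman article series" and expose it as an API.
Result screenshot:
Program logic
For implementation details, see my earlier post "17. 抓取用戶IT鐵人賽文章" (fetching a user's IT Ironman articles); I won't repeat them here.
The main point of this post is that Azure Functions makes it easy to build an API and host it on a public domain for anyone who needs to call it.
Code: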
#r "Newtonsoft.Json"
using System.Net;
using Microsoft.AspNetCore.Mvc;
using AngleSharp;
using AngleSharp.Parser.Html;
using Newtonsoft.Json;

public static async Task<IActionResult> Run(HttpRequest req, ILogger log)
{
    string userid = req.Query["userid"];
    string articleid = req.Query["articleid"];
    // Both query parameters are required.
    if (string.IsNullOrEmpty(userid) || string.IsNullOrEmpty(articleid))
        return new BadRequestObjectResult("Missing required query parameter: userid or articleid");

    // Await the task instead of blocking on .Result inside an async method.
    var itironman = await ITIronManSyncPostService.GetITIronManPosts($"https://ithelp.ithome.com.tw/users/{userid}/ironman/{articleid}");
    var posts = itironman.Posts;
    return (ActionResult)new OkObjectResult(JsonConvert.SerializeObject(posts, Formatting.Indented));
}
public class ITIronManSyncPostService
{
    private static readonly HtmlParser _parser = new HtmlParser();
    public IList<Post> Posts { get; set; } = new List<Post>();
    private string _url { get; set; }

    public async static Task<ITIronManSyncPostService> GetITIronManPosts(string url)
    {
        var itironman = new ITIronManSyncPostService();
        itironman._url = url;
        await itironman.ExecuteAsync();
        return itironman;
    }

    private async Task ExecuteAsync()
    {
        // An Ironman series needs only 30 posts and each page lists 10,
        // so fetching pages 1 to 3 covers them.
        for (int i = 1; i < 4; i++)
            await GetITIronManPostsAsync(_url + $"?page={i}");
    }

    private async Task GetITIronManPostsAsync(string url)
    {
        var htmlContent = await GetAsync(url);
        var document = _parser.Parse(htmlContent);

        // Get the series title; skip the page if the element is missing.
        var article = document.QuerySelector(".qa-list__title--ironman");
        if (article == null)
            return;
        // Remove the "series" label span so only the title text remains.
        var seriesLabel = article.QuerySelector("span");
        if (seriesLabel != null)
            article.RemoveChild(seriesLabel);
        var articleText = article.TextContent.Trim();

        // Get each post's publish date, title, and link.
        var allpost = document.QuerySelectorAll(".profile-list__content");
        foreach (var postInfo in allpost)
        {
            var post = new Post();
            var titleAndLinkDom = postInfo.QuerySelector(".qa-list__title>a");
            post.Title = titleAndLinkDom.InnerHtml.Trim();
            post.link = titleAndLinkDom.GetAttribute("href").Trim();
            post.PubDate = DateTime.Parse(postInfo.QuerySelector(".qa-list__info-time").GetAttribute("title").Trim());
            post.Article = articleText;
            Posts.Add(post);
        }
    }

    // Fetches the body of a single post (not called in this example).
    private async Task<string> GetPostContentAsync(string posturl)
    {
        var htmlContent = await GetAsync(posturl);
        var document = _parser.Parse(htmlContent);
        return document.QuerySelectorAll(".markdown__style").FirstOrDefault()?.InnerHtml;
    }

    // Downloads a page as a string, accepting gzip/deflate responses.
    public async Task<string> GetAsync(string uri)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        using (HttpWebResponse response = (HttpWebResponse)await request.GetResponseAsync())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            return await reader.ReadToEndAsync();
        }
    }

    public class Post
    {
        public string Title { get; set; }
        public string link { get; set; }
        public string Content { get; set; }
        public string Article { get; set; }
        public DateTime PubDate { get; set; }
    }
}
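Once the function is deployed, anyone can call it like any other HTTP endpoint. Below is a minimal client sketch, not part of the original function. The function URL, the user id `12345`, and the series id `678` are hypothetical placeholders, and the sketch assumes the function URL carries no query string of its own (for example, no `?code=` key).

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class IronmanApiClient
{
    private static readonly HttpClient Http = new HttpClient();

    // Builds the request URL with the two query parameters the function reads.
    // Assumes functionUrl has no existing query string.
    public static Uri BuildRequestUri(string functionUrl, string userId, string articleId)
    {
        return new Uri($"{functionUrl}?userid={Uri.EscapeDataString(userId)}&articleid={Uri.EscapeDataString(articleId)}");
    }

    // Calls the function and returns the response body as a string.
    public static async Task<string> GetPostsJsonAsync(string functionUrl, string userId, string articleId)
    {
        var response = await Http.GetAsync(BuildRequestUri(functionUrl, userId, articleId));
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}

public static class Program
{
    public static async Task Main()
    {
        // Hypothetical function URL; replace it with your own function app's URL.
        const string functionUrl = "https://example.azurewebsites.net/api/IronmanPosts";
        try
        {
            Console.WriteLine(await IronmanApiClient.GetPostsJsonAsync(functionUrl, "12345", "678"));
        }
        catch (HttpRequestException ex)
        {
            Console.WriteLine($"Request failed: {ex.Message}");
        }
    }
}
```

If either parameter is missing, the function above answers with a 400 response, which `EnsureSuccessStatusCode` turns into an `HttpRequestException` on the client side.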
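Today I was stuck for a long time trying to install a NuGet library on Azure Functions. I followed the article "How can I use NuGet packages in my Azure Functions?" and created a Project.json, but Azure still did not install the library. After a lot more searching, I found the answer in "Can't use Nuget package in C# Azure function". The reason is that Project.json is the approach for Azure Functions V1, while the version I chose is V2, which needs a function.proj instead. You can check the Functions runtime version under "Function app settings".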
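Besides the portal, the host.json file at the root of the function app also hints at the runtime version. As far as I know, a V2 app's host.json carries a version entry like the fragment below, while a V1 host.json has no such property; treat this as a quick check rather than the official procedure.

```json
{
  "version": "2.0"
}
```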
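V1 example:
To install the AngleSharp library, open the online editor (App Service Editor), create a Project.json in the function's folder, and enter the name and version of the NuGet library you want to install: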
{
  "frameworks": {
    "net46": {
      "dependencies": {
        "AngleSharp": "0.9.10"
      }
    }
  }
}
V2:
Create a function.proj in the function's folder and enter the name and version of the NuGet library you want to install:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="AngleSharp" Version="0.9.10"/>
  </ItemGroup>
</Project>
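After saving in the App Service Editor, wait a while for Azure to download and install the NuGet library. Once a project.assets.json file appears, the installation is done. That file lists the downloaded dependencies: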
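Azure Functions can be billed by usage, so if the function is hit by a DDoS attack the bill will explode. To avoid this, set a daily usage quota in the application settings.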
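An Azure Functions HTTP trigger offers most of what an Azure Web App does; it just puts more emphasis on the split where run.csx holds the program logic and function.json holds the function's configuration. If you're interested, try building one yourself and play around with it :D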
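For reference, a function.json for an HTTP-triggered C# script function usually looks roughly like the fragment below. This is a sketch modeled on the default portal template, not the actual file from this project; the `authLevel` and `methods` values are assumptions you would adjust.

```json
{
  "bindings": [
    {
      "authLevel": "function",
      "name": "req",
      "type": "httpTrigger",
      "direction": "in",
      "methods": ["get"]
    },
    {
      "name": "$return",
      "type": "http",
      "direction": "out"
    }
  ]
}
```

The `name` of the trigger binding (`req`) matches the parameter name in the `Run` method of run.csx, which is how the two files are tied together.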