iT邦幫忙

C# crawler built with HtmlAgilityPack does not scrape the items I want

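My program cannot correctly scrape the third-level sub-items. I am writing the crawler in C# with HtmlAgilityPack. The first and second levels are displayed correctly, but the third level is not: it lists every item, whereas I only want the items under the current subcategory. What is wrong with my program? The site I want to scrape is: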
https://assist.nat.gov.tw/wSite/qp?ctNode=150&mp=2&xslPage=qps

The problematic code is as follows:

HtmlNodeCollection thirdCateNode = doc.DocumentNode.SelectNodes(String.Format("//form/div[1]/div[{0}]/div", i));
foreach (HtmlNode node3 in thirdCateNode)
{
    //Console.WriteLine(node.InnerText.Trim() + "," + node2.InnerText.Trim() +","+node3.InnerText.Trim());
    j++;

    HtmlNodeCollection thirdCate = doc.DocumentNode.SelectNodes(String.Format("//form/div[1]/div[{0}]/div[{1}]/div[2]/label", i, j));
    foreach (HtmlNode node4 in thirdCate)
    {
        Console.WriteLine(node.InnerText.Trim() + "," + node2.InnerText.Trim() + "," + node4.InnerText.Trim());
    }
}
j = 0;

The mistake is in the part that displays the j-th group. Can anyone spot what is wrong?

The complete program is as follows:

using System;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        string url = "https://assist.nat.gov.tw/wSite/qp?ctNode=150&mp=2&xslPage=qps";

        HtmlWeb web = new HtmlWeb();
        HtmlDocument doc = web.Load(url);

        // First level: the <h3> headings.
        HtmlNodeCollection firstCate = doc.DocumentNode.SelectNodes("//form/div[1]/h3");
        int i = 0, j = 0;

        foreach (HtmlNode node in firstCate)
        {
            //Console.WriteLine(node.InnerText.Trim());
            i++;
            // Second level: the title <div> of every group inside the i-th section.
            HtmlNodeCollection secondCate = doc.DocumentNode.SelectNodes(String.Format("//form/div[1]/div[{0}]/div/div[1]", i));
            foreach (HtmlNode node2 in secondCate)
            {
                if (i == 1)
                {
                    //Console.WriteLine(node.InnerText.Trim() + "," + node2.InnerText.Trim());

                    // All group <div>s of section i; j counts them while looping.
                    HtmlNodeCollection thirdCateNode = doc.DocumentNode.SelectNodes(String.Format("//form/div[1]/div[{0}]/div", i));
                    foreach (HtmlNode node3 in thirdCateNode)
                    {
                        //Console.WriteLine(node.InnerText.Trim() + "," + node2.InnerText.Trim() +","+node3.InnerText.Trim());
                        j++;

                        // Third level: the labels of the j-th group.
                        HtmlNodeCollection thirdCate = doc.DocumentNode.SelectNodes(String.Format("//form/div[1]/div[{0}]/div[{1}]/div[2]/label", i, j));
                        foreach (HtmlNode node4 in thirdCate)
                        {
                            Console.WriteLine(node.InnerText.Trim() + "," + node2.InnerText.Trim() + "," + node4.InnerText.Trim());
                        }
                    }
                    j = 0;
                }
                else
                {
                    // Other sections: labels are printed directly under the first-level name.
                    HtmlNodeCollection smallnodes = doc.DocumentNode.SelectNodes(String.Format("//form/div[1]/div[{0}]/div/div/label", i));
                    foreach (HtmlNode smallnode in smallnodes)
                    {
                        Console.WriteLine(node.InnerText.Trim() + "," + smallnode.InnerText.Trim());
                    }
                }
            }
        }

        Console.WriteLine("----------------------------------");

        // Flat list of every label on the page.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//form/div[1]/div/div/div/label");

        foreach (HtmlNode node in nodes)
        {
            Console.WriteLine(node.InnerText.Trim());
        }

        Console.WriteLine("-------------Page 2-------------");

        url = "https://assist.nat.gov.tw/wSite/lp?ctNode=150&q_energySavingType=1&nowPage=1&pagesize=100";

        doc = web.Load(url);

        // Cells of the first table row on the listing page.
        HtmlNodeCollection page2 = doc.DocumentNode.SelectNodes("//table/tbody[1]/tr[1]/td");
        foreach (HtmlNode node in page2)
        {
            Console.WriteLine(node.InnerText.Trim());
        }
    }
}
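
A side note on the program above: in HtmlAgilityPack, `SelectNodes` returns `null` rather than an empty collection when an XPath matches nothing, so each `foreach` would throw a `NullReferenceException` on a miss. In addition, `tbody` is frequently added by browsers and can be absent from the HTML the server actually sends; whether that is the case for this site is an assumption that has not been checked here. The sketch below shows one way to guard against both. `SafeSelect`, `SelectOrEmpty`, and `FirstRowCells` are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class SafeSelect
{
    // Hypothetical helper: wraps SelectNodes so callers can always use foreach,
    // even when the XPath matches nothing and SelectNodes returns null.
    public static IEnumerable<HtmlNode> SelectOrEmpty(HtmlNode root, string xpath)
    {
        HtmlNodeCollection found = root.SelectNodes(xpath);
        if (found == null)
        {
            return Enumerable.Empty<HtmlNode>();
        }
        return found;
    }

    // Returns the trimmed text of the cells in the first row of a table.
    public static List<string> FirstRowCells(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//table//tr[1]/td" matches whether or not a <tbody> wraps the rows.
        return SelectOrEmpty(doc.DocumentNode, "//table//tr[1]/td")
            .Select(td => td.InnerText.Trim())
            .ToList();
    }
}
```

In the question's program, a call such as `SafeSelect.SelectOrEmpty(doc.DocumentNode, "//table//tr[1]/td")` could stand in for the direct `SelectNodes` call before the last loop.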

Can anyone help me? Many thanks.
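
One likely cause, assuming the nesting implied by the XPath expressions above (each group `div` holds its title in `div[1]` and its labels in `div[2]`): the loop over `thirdCateNode` runs in full for every `node2`, so each second-level name is paired with the labels of every group, not only its own. A minimal sketch of an alternative visits each group once and uses XPath relative to that group node. `CategoryScraper`, `ExtractRows`, `rootXPath`, and the `SampleHtml` markup are hypothetical stand-ins, not the real page.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class CategoryScraper
{
    // Hypothetical markup mimicking the nesting assumed from the question's XPaths.
    // A plain <div id='cats'> stands in for the page's //form/div[1] container.
    public const string SampleHtml = @"
<div id='cats'>
  <h3>Category A</h3>
  <div>
    <div><div>Sub 1</div><div><label>Item 1a</label><label>Item 1b</label></div></div>
    <div><div>Sub 2</div><div><label>Item 2a</label></div></div>
  </div>
</div>";

    // rootXPath selects the container that holds the <h3> headings
    // and their sibling <div> sections.
    public static List<string> ExtractRows(string html, string rootXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var rows = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection headings = doc.DocumentNode.SelectNodes(rootXPath + "/h3");
        if (headings == null) return rows;

        for (int i = 1; i <= headings.Count; i++)
        {
            string first = headings[i - 1].InnerText.Trim();

            // Every group <div> inside the i-th section.
            HtmlNodeCollection groups = doc.DocumentNode.SelectNodes($"{rootXPath}/div[{i}]/div");
            if (groups == null) continue;

            foreach (HtmlNode groupNode in groups)
            {
                // Relative XPath ("./...") is evaluated from this group only,
                // so the title and the labels always come from the same group.
                HtmlNode title = groupNode.SelectSingleNode("./div[1]");
                HtmlNodeCollection labels = groupNode.SelectNodes("./div[2]/label");
                if (title == null || labels == null) continue;

                foreach (HtmlNode label in labels)
                {
                    rows.Add(first + "," + title.InnerText.Trim() + "," + label.InnerText.Trim());
                }
            }
        }
        return rows;
    }
}
```

Calling `CategoryScraper.ExtractRows(CategoryScraper.SampleHtml, "//div[@id='cats']")` exercises the sketch on the sample markup. For the real page, the same loop would presumably run on the document returned by `HtmlWeb.Load` with `//form/div[1]` as the root; that this matches the live markup is an assumption. Groups without a `div[2]/label` child, such as those handled by the `else` branch in the question, are skipped by this sketch.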


1 answer

blanksoul12
iT邦 Graduate Student, Level 5 ‧ 2022-05-03 08:24:46

What exactly are you trying to look up? Could you give an example?

baltic, iT邦 Novice, Level 4 ‧ 2022-05-03 09:56:10

The problem has been solved, thank you.
