iT邦幫忙


[Solved] Python web scraping: selecting data for a date range


The site I want to scrape: https://m.coa.gov.tw/Transaction/PoultryTrans/Index


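I want to scrape the data for the period from January 1, 2018 to May 31, 2021,
but the values I print currently come out as {xxxx} placeholders
(the fields I need are 交易日期 (trade date) and 雞蛋(產地價) (egg, farm-gate price)).
I don't understand why.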

My code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

payload = {
    'StartDate':'2018/06/26',
    'EndDate':'2021/07/26',
    'DataSource': 1,
    'NoRest':'false',
    'NowPage':1,
    'SortAction':'DESC',
    'SortField':'TradeDateTime',
    'PageSize': 20
}

url = 'https://m.coa.gov.tw/Transaction/PoultryTrans/Index'

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36'}

resp = requests.get(url, data = payload)

# Parse the HTML into a BeautifulSoup object
soup = BeautifulSoup(resp.text, 'html.parser')

table = soup.find_all("table", {"id":"searchtable"})

print(table)

Current result:

[<table class="table table-hover row mx-0 LCGD" id="searchtable">
<thead class="w-100">
<tr class="row mx-0 LCGD_Header" id="Title2">
<th class="col-sm" data-sortname="TradeDate" id="col1" scope="col">交易日期</th>
<th class="col-sm-1" id="col2" scope="col">農曆</th>
<th class="col-sm" id="col3" scope="col">白肉雞(2.0Kg以上)</th>
<th class="col-sm" id="col4" scope="col">白肉雞(1.75-1.95Kg)</th>
<th class="col-sm" id="col5" scope="col">白肉雞(門市價高屏)</th>
<th class="col-sm" id="col6" scope="col">雞蛋(產地)</th>
<th class="col-sm mobile-none" id="col7" scope="col" style="display:none"> </th>
<th class="col-sm mobile-none" id="col8" scope="col" style="display:none"> </th>
<th class="col-sm mobile-none" id="col9" scope="col" style="display:none"> </th>
<th class="col-sm mobile-none" id="col10" scope="col" style="display:none"> </th>
<th class="col-sm mobile-none" id="col11" scope="col" style="display:none"> </th>
</tr>
</thead>
<tbody class="w-100">
<tr class="row mx-0 LCGD_Template" id="Result">
<td class="col-sm" scope="row"><span class="d-inline-block d-sm-none">交易日期:</span><a class="btn-detail" data-tradedate="{TradeDate}">{TradeDate}</a></td>
<td class="col-sm-1"><span class="d-inline-block d-sm-none">農曆:</span>{LunarCalendar}</td>
<td class="col-sm"><span class="d-inline-block d-sm-none dataTitle1">白肉雞(2.0Kg以上):</span>{Column1Data} <span class="price-data1"><i class="far"></i> {Column1DataGain}</span> </td>
<td class="col-sm"><span class="d-inline-block d-sm-none dataTitle2">白肉雞(1.75-2.0Kg):</span>{Column2Data} <span class="price-data2"><i class="far"></i> {Column2DataGain}</span> </td>
<td class="col-sm"><span class="d-inline-block d-sm-none dataTitle3">白肉雞(高屏門市價):</span>{Column3Data} <span class="price-data3"><i class="far"></i> {Column3DataGain}</span> </td>
<td class="col-sm td-border" id="data4">
<span class="d-inline-block d-sm-none dataTitle4">雞蛋(產地價):</span>
                        {Column4Data}
                        <span class="price-data4">
<i class="far"></i>
                            {Column4DataGain}
                        </span>
</td>
<td class="col-sm" id="data5" style="display:none">
<span class="d-inline-block d-sm-none dataTitle5"></span>
                        {Column5Data}
                        <span class="price-data5">
<i class="far"></i>
                            {Column5DataGain}
                        </span>
</td>
<td class="col-sm" id="data6" style="display:none">
<span class="d-inline-block d-sm-none dataTitle6"></span>
                        {Column6Data}
                        <span class="price-data6">
<i class="far"></i>
                            {Column6DataGain}
                        </span>
</td>
<td class="col-sm" id="data7" style="display:none">
<span class="d-inline-block d-sm-none dataTitle7"></span>
                        {Column7Data}
                        <span class="price-data7">
<i class="far"></i>
                            {Column7DataGain}
                        </span>
</td>
<td class="col-sm" id="data8" style="display:none">
<span class="d-inline-block d-sm-none dataTitle8"></span>
                        {Column8Data}
                        <span class="price-data8">
<i class="far"></i>
                            {Column8DataGain}
                        </span>
</td>
<td class="col-sm td-border" id="data9" style="display:none">
<span class="d-inline-block d-sm-none dataTitle9"></span>
                        {Column9Data}
                        <span class="price-data9">
<i class="far"></i>
                            {Column9DataGain}
                        </span>
</td>
</tr>
</tbody>
</table>]

For now I am writing the code against page 1 only;
I will widen the date range afterwards.

https://www.learncodewithmike.com/2020/10/scraping-ajax-websites-using-python.html

1 answer

jiatool
iT邦 Graduate Student, Level 2 ‧ 2021-07-27 09:41:21

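With the browser's developer tools you can see that this part of the data is loaded dynamically after the page itself. The table row you got back is only a template whose {xxxx} placeholders are filled in later.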
https://ithelp.ithome.com.tw/upload/images/20210727/20139617I2f4euARxk.png
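A quick way to confirm this from Python is to check whether the fetched HTML still contains `{Name}` style placeholders. The following is a minimal sketch, not part of the original answer: `find_placeholders` is a hypothetical helper, and `sample` is a shortened copy of the template row from the question's output.

```python
import re

# Matches template placeholders such as {TradeDate} or {Column4Data}.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def find_placeholders(html: str) -> list[str]:
    """Return the unique placeholder names in order of first appearance."""
    seen = []
    for name in PLACEHOLDER.findall(html):
        if name not in seen:
            seen.append(name)
    return seen


# Shortened copy of the template row printed in the question.
sample = '''<tr class="row mx-0 LCGD_Template" id="Result">
<td><a class="btn-detail" data-tradedate="{TradeDate}">{TradeDate}</a></td>
<td>{LunarCalendar}</td>
<td>{Column4Data} <span class="price-data4">{Column4DataGain}</span></td>
</tr>'''

print(find_placeholders(sample))
# ['TradeDate', 'LunarCalendar', 'Column4Data', 'Column4DataGain']
```

If this returns a non-empty list for the HTML in `resp.text`, the static page holds only the template, and the real values have to come from a separate request.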

The page sends a POST request to a different URL to fetch the data:
https://ithelp.ithome.com.tw/upload/images/20210727/20139617UX01nzh2Fv.png
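A rough sketch of what that could look like with `requests` follows. It rests on several assumptions: `DATA_ENDPOINT` is a placeholder, not the real address (copy the request URL from the Network tab of the developer tools), the form fields are assumed to be the same ones used in the question's payload, and `build_payload` and `fetch_page` are hypothetical helper names. Compare the payload with the Form Data shown in the developer tools before relying on it.

```python
def build_payload(start_date, end_date, page=1, page_size=20):
    """Form fields copied from the question; only dates and paging vary."""
    return {
        'StartDate': start_date,
        'EndDate': end_date,
        'DataSource': 1,
        'NoRest': 'false',
        'NowPage': page,
        'SortAction': 'DESC',
        'SortField': 'TradeDateTime',
        'PageSize': page_size,
    }


def fetch_page(endpoint, payload, headers=None):
    """POST the form fields to the data endpoint and return the response body."""
    # Imported here so build_payload can be used without requests installed.
    import requests

    resp = requests.post(endpoint, data=payload, headers=headers, timeout=30)
    resp.raise_for_status()  # fail loudly on HTTP errors
    return resp.text


# Placeholder only: replace with the request URL seen in the Network tab.
DATA_ENDPOINT = 'https://example.invalid/replace-with-real-endpoint'

# Usage, once DATA_ENDPOINT points at the real address:
# payload = build_payload('2018/01/01', '2021/05/31', page=1)
# print(fetch_page(DATA_ENDPOINT, payload))
```

To cover the whole date range, the same call can be repeated with increasing `page` values until the response contains no more rows. How the rows are encoded in the response is not shown here, so inspect the response in the developer tools first.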

You can also refer to another article I wrote: https://blog.jiatool.com/posts/gamer_commend_spider/
