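The site I want to scrape: https://m.coa.gov.tw/Transaction/PoultryTrans/Index
I want to scrape the data from January 1, 2018 through May 31, 2021,
but the values I print come out as {xxxx} placeholders
(the columns I need are the trade date (交易日期) and the egg farm-gate price (雞蛋(產地價))),
and I don't understand why.
My code: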
import requests
from bs4 import BeautifulSoup
import pandas as pd
payload = {
'StartDate':'2018/06/26',
'EndDate':'2021/07/26',
'DataSource': 1,
'NoRest':'false',
'NowPage':1,
'SortAction':'DESC',
'SortField':'TradeDateTime',
'PageSize': 20
}
url = 'https://m.coa.gov.tw/Transaction/PoultryTrans/Index'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36'}
resp = requests.get(url, data = payload)
# Parse the HTML into a BeautifulSoup object
soup = BeautifulSoup(resp.text, 'html.parser')
table = soup.find_all("table", {"id":"searchtable"})
print(table)
Current result:
[<table class="table table-hover row mx-0 LCGD" id="searchtable">
<thead class="w-100">
<tr class="row mx-0 LCGD_Header" id="Title2">
<th class="col-sm" data-sortname="TradeDate" id="col1" scope="col">交易日期</th>
<th class="col-sm-1" id="col2" scope="col">農曆</th>
<th class="col-sm" id="col3" scope="col">白肉雞(2.0Kg以上)</th>
<th class="col-sm" id="col4" scope="col">白肉雞(1.75-1.95Kg)</th>
<th class="col-sm" id="col5" scope="col">白肉雞(門市價高屏)</th>
<th class="col-sm" id="col6" scope="col">雞蛋(產地)</th>
<th class="col-sm mobile-none" id="col7" scope="col" style="display:none"> </th>
<th class="col-sm mobile-none" id="col8" scope="col" style="display:none"> </th>
<th class="col-sm mobile-none" id="col9" scope="col" style="display:none"> </th>
<th class="col-sm mobile-none" id="col10" scope="col" style="display:none"> </th>
<th class="col-sm mobile-none" id="col11" scope="col" style="display:none"> </th>
</tr>
</thead>
<tbody class="w-100">
<tr class="row mx-0 LCGD_Template" id="Result">
<td class="col-sm" scope="row"><span class="d-inline-block d-sm-none">交易日期:</span><a class="btn-detail" data-tradedate="{TradeDate}">{TradeDate}</a></td>
<td class="col-sm-1"><span class="d-inline-block d-sm-none">農曆:</span>{LunarCalendar}</td>
<td class="col-sm"><span class="d-inline-block d-sm-none dataTitle1">白肉雞(2.0Kg以上):</span>{Column1Data} <span class="price-data1"><i class="far"></i> {Column1DataGain}</span> </td>
<td class="col-sm"><span class="d-inline-block d-sm-none dataTitle2">白肉雞(1.75-2.0Kg):</span>{Column2Data} <span class="price-data2"><i class="far"></i> {Column2DataGain}</span> </td>
<td class="col-sm"><span class="d-inline-block d-sm-none dataTitle3">白肉雞(高屏門市價):</span>{Column3Data} <span class="price-data3"><i class="far"></i> {Column3DataGain}</span> </td>
<td class="col-sm td-border" id="data4">
<span class="d-inline-block d-sm-none dataTitle4">雞蛋(產地價):</span>
{Column4Data}
<span class="price-data4">
<i class="far"></i>
{Column4DataGain}
</span>
</td>
<td class="col-sm" id="data5" style="display:none">
<span class="d-inline-block d-sm-none dataTitle5"></span>
{Column5Data}
<span class="price-data5">
<i class="far"></i>
{Column5DataGain}
</span>
</td>
<td class="col-sm" id="data6" style="display:none">
<span class="d-inline-block d-sm-none dataTitle6"></span>
{Column6Data}
<span class="price-data6">
<i class="far"></i>
{Column6DataGain}
</span>
</td>
<td class="col-sm" id="data7" style="display:none">
<span class="d-inline-block d-sm-none dataTitle7"></span>
{Column7Data}
<span class="price-data7">
<i class="far"></i>
{Column7DataGain}
</span>
</td>
<td class="col-sm" id="data8" style="display:none">
<span class="d-inline-block d-sm-none dataTitle8"></span>
{Column8Data}
<span class="price-data8">
<i class="far"></i>
{Column8DataGain}
</span>
</td>
<td class="col-sm td-border" id="data9" style="display:none">
<span class="d-inline-block d-sm-none dataTitle9"></span>
{Column9Data}
<span class="price-data9">
<i class="far"></i>
{Column9DataGain}
</span>
</td>
</tr>
</tbody>
</table>]
For now I am writing this against page 1 only,
and will extend the date range later.
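If you open the browser's developer tools, you can see that this part of the data is loaded dynamically after the page itself.
The HTML you downloaded is only the row template; the page then sends a POST request to a different URL to fetch the actual data.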
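One way to confirm this from Python, without the developer tools, is to check whether the downloaded HTML still contains unfilled `{Name}` placeholders. The following is a minimal sketch using only the standard library; the function name `find_placeholders` and the sample string are illustrative assumptions, and the regular expression assumes placeholders are simple identifiers wrapped in braces, as in the output above.

```python
import re
from typing import List

# Matches template placeholders such as {TradeDate} or {Column4Data}.
# Assumption: placeholder names are plain identifiers (letters, digits, underscore).
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def find_placeholders(html: str) -> List[str]:
    """Return the sorted, unique placeholder names still present in the HTML."""
    return sorted(set(PLACEHOLDER.findall(html)))


# A shortened copy of the template row from the printed result.
sample = """
<tr class="row mx-0 LCGD_Template" id="Result">
<td><a class="btn-detail" data-tradedate="{TradeDate}">{TradeDate}</a></td>
<td id="data4">{Column4Data} <span class="price-data4">{Column4DataGain}</span></td>
</tr>
"""

print(find_placeholders(sample))
# ['Column4Data', 'Column4DataGain', 'TradeDate']
```

If the list is non-empty, the server returned a template that JavaScript fills in later, so parsing that HTML with BeautifulSoup will never yield the real values.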
For a worked example of the same approach, see another article I wrote: https://blog.jiatool.com/posts/gamer_commend_spider/
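The general shape of such a request can be sketched as follows. This is only an outline under stated assumptions: `DATA_URL` is a hypothetical placeholder that you must replace with the request URL shown in the Network tab, the endpoint is assumed to accept the same form fields as the question's payload and to return JSON, and the record keys `TradeDate` and `Column4Data` are guessed from the placeholder names in the template. The helper names are illustrative, and the sketch uses the standard library so that it runs without extra installs.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical placeholder: substitute the URL seen in the Network tab.
DATA_URL = "https://example.invalid/replace-with-real-endpoint"


def build_payload(start_date, end_date, page=1, page_size=20):
    """Form fields copied from the question; only dates and paging vary."""
    return {
        "StartDate": start_date,
        "EndDate": end_date,
        "DataSource": 1,
        "NoRest": "false",
        "NowPage": page,
        "SortAction": "DESC",
        "SortField": "TradeDateTime",
        "PageSize": page_size,
    }


def fetch_page(url, payload, user_agent="Mozilla/5.0"):
    """POST the form fields and decode the body (assumed to be JSON)."""
    body = urlencode(payload).encode("utf-8")
    request = Request(
        url, data=body, headers={"User-Agent": user_agent}, method="POST"
    )
    with urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def pick_columns(records, keys=("TradeDate", "Column4Data")):
    """Keep only the wanted keys, assuming each record is a dict."""
    return [{key: record.get(key) for key in keys} for record in records]
```

With the real URL filled in, `fetch_page(DATA_URL, build_payload("2018/01/01", "2021/05/31", page=1))` would fetch one page, and increasing `page` walks through the rest of the date range. If you prefer to stay with `requests`, the equivalent call is `requests.post(url, data=payload, headers=headers)`; note that the original code used `requests.get` and never passed `headers`.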