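A question for the experts on this board: a few days ago I ran into the problem below while scraping with requests.
I searched online but am still not sure how to solve it. I have confirmed that the certifi module is installed in my environment,
and I also tried going through a proxy IP, but neither helped. Is there any way to fix this?
Here is the code; the error occurs on the second requests call: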
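import json
import requests
from bs4 import BeautifulSoup

url = "https://data.gov.tw/dataset/146529"
res = requests.get(url)
# Text of the first <script> tag, parsed as JSON on the next line
script_text = BeautifulSoup(res.text, "lxml").find_all("script")[0].text
download_url = json.loads(script_text)[1]["distribution"][0]["contentUrl"]
res = requests.get(download_url)  # the error is raised on this line
print(res.headers)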
The URL used by the second request (opening it directly in a browser downloads the file normally):
https://apiservice.mol.gov.tw/OdService/download/A17000000J-030268-CMF
The error message:
HTTPSConnectionPool(host='apiservice.mol.gov.tw', port=443): Max retries exceeded with url: /OdService/download/A17000000J-030268-CMF (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))
Someone ran into a similar problem before:
https://ithelp.ithome.com.tw/questions/10203445
It may be worth a look.
For now I have switched to urllib.request.urlopen, which retrieves the headers successfully:
import json
import urllib.request
from pprint import pprint
import requests
from bs4 import BeautifulSoup

url = "https://data.gov.tw/dataset/146529"
res = requests.get(url)
script_text = BeautifulSoup(res.text, "lxml").find_all("script")[0].text
download_url = json.loads(script_text)[1]["distribution"][0]["contentUrl"]
res = urllib.request.urlopen(download_url)
pprint(res.getheaders())
Output of the successful run:
[('Date', 'Thu, 18 Nov 2021 02:24:47 GMT'),
('Server', 'Apache/2.4.39 (Win64) OpenSSL/1.1.1b'),
('X-Frame-Options', 'SAMEORIGIN'),
('Content-Disposition', 'inline; filename="A17000000J-030268-CMF.csv"'),
('Content-Type', 'application/octet-stream;charset=UTF-8'),
('Content-Length', '1406'),
('Set-Cookie', 'ROUTEID=.2; path=/;httponly;secure;HttpOnly;Secure'),
('Access-Control-Allow-Origin', '*'),
('X-XSS-Protection', '1; mode=block'),
('X-Content-Type-Options', 'nosniff'),
('X-Frame-Options', 'SAMEORIGIN'),
('Cache-Control',
'private, no-cache, no-store, proxy-revalidate, must-revalidate, '
'no-transform'),
('Pragma', 'no-cache'),
('Connection', 'close')]
Process finished with exit code 0
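A possible explanation for why urllib succeeds where requests fails, offered as a guess rather than a confirmed diagnosis: requests verifies against the CA bundle shipped with certifi (its path is returned by certifi.where()), while urllib.request.urlopen uses ssl.create_default_context(), which loads the system's default CA certificates. If the server does not send its intermediate certificate, the two trust stores can reach different verdicts. The sketch below only inspects the standard-library side; describe_default_trust is a hypothetical helper name, not something from the thread.

```python
import ssl


def describe_default_trust() -> dict:
    """Summarize where the ssl module looks for CA certificates by default."""
    paths = ssl.get_default_verify_paths()
    context = ssl.create_default_context()
    return {
        "cafile": paths.cafile,  # single bundle file, or None
        "capath": paths.capath,  # directory of certificates, or None
        # Counts of certificates already loaded into the context. Certificates
        # in a capath directory are loaded on demand, so this can be 0.
        "loaded": context.cert_store_stats(),
    }


info = describe_default_trust()
print(info)
```

Comparing this output with the bundle that certifi provides may show whether the two libraries are trusting different sets of certificates on your machine.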
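The warning is only there to get your attention. Since your current use case is a one-way download, you can weigh the security risk yourself.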
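Assuming the warning mentioned here is the one shown when certificate verification is switched off (for example verify=False in requests), the trade-off can be sketched with the standard library. The function names and the extra_ca_file parameter are hypothetical. The first option keeps verification and additionally trusts a PEM file you supply, such as an intermediate certificate the server fails to send; the second turns verification off completely and only makes sense when you accept that risk.

```python
import ssl
from typing import Optional


def context_with_extra_ca(extra_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Verify as usual, optionally trusting one more PEM file of CA certificates."""
    context = ssl.create_default_context()
    if extra_ca_file is not None:
        context.load_verify_locations(cafile=extra_ca_file)
    return context


def unverified_context() -> ssl.SSLContext:
    """Skip certificate checks entirely; the connection can be intercepted."""
    context = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context
```

Either context can be passed as urllib.request.urlopen(download_url, context=...). In requests the corresponding setting is the verify argument, which accepts a path to a CA bundle file or False.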
I see, thanks!!
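It may be caused by a problem with your headers, that is, the request headers you are sending. Try it with Postman: if Postman can fetch it, requests certainly can too.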
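One caveat: the error in the question is raised during the TLS handshake, before any HTTP headers are sent, so headers are unlikely to be the cause in this case. For anyone who still wants to rule them out, here is a minimal sketch of attaching browser-like headers with the standard library. build_request and the header values are illustrative assumptions, not taken from the thread.

```python
import urllib.request


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request carrying browser-like headers (values are examples)."""
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Accept": "*/*",
    }
    return urllib.request.Request(url, headers=headers)


request = build_request(
    "https://apiservice.mol.gov.tw/OdService/download/A17000000J-030268-CMF"
)
# urllib stores header names in capitalized form, e.g. "User-agent".
print(request.header_items())
```

The resulting object can be passed to urllib.request.urlopen(request) in place of the bare URL.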