A question about scraping with requests in Python

python python 3 python3 python request requests

神Q超人 2018-06-01 22:06:45 ‧ 53260 views

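Hi everyone. I've recently been learning Python for web scraping, using the requests package. So far I've been practicing fetching data from web pages with GET and POST through requests, but I've run into a problem scraping product page data from Shopee (蝦皮). I suspect I've gotten something wrong or overlooked something, so I'd appreciate any help.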

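What I want to scrape is all of a shop's product data (stock, price, and so on), starting from the seller's main page. An example page is below. I covered the shop name in the screenshot to avoid any infringement or looking like an ad, and the parts boxed in red are the data I want: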
https://shopee.tw/shop/3625341/search/

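My approach so far was to watch the page's requests in the browser's developer tools, where I found a GET request whose response contains all the product data. In the screenshot below, the parts underlined in red are the product name and stock quantity: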

The Headers tab shows that this data is requested with GET:

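But when I send the same GET with Python's requests, what comes back is not the JSON shown in Preview but the HTML of the current page, and that HTML doesn't contain the product data either. My code is below: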

import requests

res = requests.get("https://shopee.tw/api/v2/search_items/?by=pop&limit=30&match_id=3625341&newest=0&order=desc&page_type=shop")

print(res.text)

The result is shown in the screenshot:

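The screenshot doesn't really show much; the response is just some JavaScript and a little HTML. Can anyone tell me what I'm missing, or at least point me in a direction to try? Thanks in advance!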
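One way to narrow down a "got HTML instead of JSON" symptom is to look at the response metadata rather than the body: the status code, whether a redirect happened, the Content-Type, and the User-Agent that was actually sent. The following is only a rough sketch. `describe_response` and `fetch_and_describe` are hypothetical helper names, and the `fake` object uses made-up values so the summary logic can be tried without touching the network.

```python
from types import SimpleNamespace


def describe_response(res):
    """Summarise a requests.Response-like object (status, redirects, content type)."""
    content_type = res.headers.get("Content-Type", "")
    return {
        "status": res.status_code,
        "final_url": res.url,
        # res.history holds the intermediate responses when redirects were followed
        "redirected": len(res.history) > 0,
        "content_type": content_type,
        "looks_like_json": "json" in content_type.lower(),
    }


def fetch_and_describe(url):
    """Send a GET with requests' default headers and summarise what came back."""
    import requests  # imported here so describe_response can be used without it

    res = requests.get(url, timeout=10)
    # The User-Agent that requests actually sent with this request
    print(res.request.headers.get("User-Agent"))
    return describe_response(res)


# Offline stand-in for a response that came back as an HTML page (made-up values)
fake = SimpleNamespace(
    status_code=200,
    url="https://example.com/landing",
    history=[SimpleNamespace(status_code=302)],
    headers={"Content-Type": "text/html; charset=utf-8"},
)
summary = describe_response(fake)
print(summary["looks_like_json"], summary["redirected"])
```

Calling `fetch_and_describe` with the API URL from the code above would show whether the server redirected the request and what content type it finally returned, which is usually enough to tell a blocked request from a wrong URL.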

PS. If any image in this post is problematic, please leave a comment and I'll take it down. Thanks!


2 answers

froce

iT邦大師 Level 1 ‧ 2018-06-02 09:55:22

Best answer

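1. Shopee blocks the UA that requests sends by default (python-requests/2.18.4) and redirects the request to Shopee's beginner tutorial page.
2. The fix is simple: just change the UA.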

import requests

headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}

res = requests.get("https://shopee.tw/api/v2/search_items/?by=pop&limit=30&match_id=3625341&newest=0&order=desc&page_type=shop", headers=headers)
print(res.request.headers)    # inspect the headers that requests actually sent
print(res.text)
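If several requests are going to be made, the same idea can be wrapped in a session so the UA is set once, with a guard for the case where the server still answers with HTML. This is a minimal sketch under assumptions: `build_headers`, `parse_json_or_none`, and `fetch_json` are hypothetical helper names, the timeout value is arbitrary, and whether the endpoint still responds the way it did in this thread is not something the sketch can confirm.

```python
import json

# Browser-like UA string; any current browser UA could be substituted
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36"
)


def build_headers(user_agent=BROWSER_UA):
    """Headers that replace the default python-requests User-Agent."""
    return {"User-Agent": user_agent}


def parse_json_or_none(text):
    """Return the decoded JSON, or None if the body is not JSON (e.g. an HTML page)."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


def fetch_json(url):
    """GET url through a session that carries the browser-like UA."""
    import requests  # imported here so the helpers above work without it

    with requests.Session() as session:
        session.headers.update(build_headers())
        res = session.get(url, timeout=10)
        res.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
        return parse_json_or_none(res.text)


# Offline check of the parsing guard
print(parse_json_or_none('{"items": []}'))
print(parse_json_or_none("<html></html>"))
```

A `None` result from `fetch_json` would mean the server still returned something other than JSON, in which case the UA alone may not be what it checks.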