Web-scraping beginner: using Python to get a Shopee seller page

pandas pandas.dataframe #excel format

cfeynman 2019-12-04 18:04:57 ‧ 6781 views

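Hello fellow scrapers. I'm still new to web scraping and want to start by using Python to fetch the rating information on a Shopee seller page, that is, the reviews buyers leave after purchasing a product.
The URL I picked at random for testing is https://shopee.tw/buyer/196617/rating

My code so far is below. It seems to run correctly, but I need help with two things about organizing the data I get back: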

import requests
from selenium import webdriver
import json

url = "https://shopee.tw/api/v2/shop/get_ratings?filter=0&limit=6&offset=0&shopid=196610&type=0&userid=196617"

path = r"C:\chromedriver.exe"  # raw string so the backslash is not read as an escape
driver = webdriver.Chrome(path)
driver.get(url)
Cookie = ';'.join(['{}={}'.format(item.get('name'), item.get('value')) for item in driver.get_cookies()])

header = {
    'cookie': Cookie,
    'if-none-match-': '55b03-9416b009bb04ac91e85f9aebd5c3267a',
    'referer': 'https://shopee.tw/buyer/196617/rating',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
    'x-api-source': 'pc',
    'x-requested-with': 'XMLHttpRequest',

}

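# Note: the second positional argument of requests.get is params, so this dict
# is sent as query-string parameters; headers=header would send it as HTTP headers.
req = requests.get(url, header)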
data = json.loads(req.text)

print(req.status_code)
print(data)

Output:

I want to tabulate the final data with pandas.DataFrame, so I added the lines below, but the result is not formatted correctly.

import pandas
df = pandas.DataFrame(data, json())
df

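Output:

1. Is it impossible to tabulate this data, or is the format simply wrong?
2. How can I collect the data page by page and save it in Excel format? (Currently one page holds only six ratings.)

Thanks!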

2 answers

dragonH

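iT邦超人 level 5 ‧ 2019-12-04 20:06:06

Best answer

I don't quite see why you use requests and then selenium as well.

1. Is it impossible to tabulate this data, or is the format simply wrong?

That error is about this call:

pandas.DataFrame(data, json())

Specifically, it is complaining about the json() part.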
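The error screenshot is not reproduced here, so the following is only a minimal sketch of what most likely went wrong. It assumes `json` refers to the standard-library module imported in the question, and it uses a made-up payload in place of the real API response.

```python
import json

# Made-up stand-in for the parsed API response (not real Shopee data).
data = json.loads('{"data": {"items": [{"itemid": 1, "rating_star": 5}]}}')


def call_json_module():
    """Reproduce the mistake: json is a module, so json() raises TypeError."""
    try:
        json()  # same expression as in pandas.DataFrame(data, json())
    except TypeError as exc:
        return str(exc)
    return None


error_message = call_json_module()
print(error_message)

# The rows worth tabulating live under data["data"]["items"]; a list of
# dicts like this is something pandas.DataFrame accepts directly.
items = data["data"]["items"]
print(items)
```

In other words, the failure happens before pandas even sees the arguments, which is why the traceback points at the json part.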

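2. How can I collect the data page by page and save it in Excel format? (Currently one page holds only six ratings.)

Once the scraping is done, just save the result to Excel with a module, e.g. openpyxl.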

code

import requests
import pandas

url = "https://shopee.tw/api/v2/shop/get_ratings?filter=0&limit=6&offset=0&shopid=196610&type=0&userid=196617"
header = {
  'if-none-match-': '55b03-9416b009bb04ac91e85f9aebd5c3267a',
}

data = requests.get(url, header).json()
df = pandas.DataFrame(data['data']['items'])
print(df)

result

       itemid  rating liked  shopid show_reply                                      product_items  ... rating_star  author_shopid     userid                             comment  filter delete_reason
0  1757421922       1  None  196610       None  [{'itemid': 1757421922, 'welcome_package_info'...  ...           5        2490360    2491639                                           0          None
1     3326952       1  None  196610       None  [{'itemid': 3326952, 'welcome_package_info': N...  ...           5       48510598   48511986  品質還不錯 只是泡泡有點難推開\n回購第二次了 之後會再回購的！       3          None
2  1435401447       1  None  196610       None  [{'itemid': 1435401447, 'welcome_package_info'...  ...           5      191603969  191606796                                           0          None
3     3326952       1  None  196610       None  [{'itemid': 3326952, 'welcome_package_info': N...  ...           5       86474484   86475961                                           0          None
4     3594160       1  None  196610       None  [{'itemid': 3594160, 'welcome_package_info': N...  ...           5        1423089    1424149                                           0          None
5     2330735       1  None  196610       None  [{'itemid': 2330735, 'welcome_package_info': N...  ...           5       20747088   20748424                                           0          None

[6 rows x 30 columns]
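For the second question (going through the pages and saving to Excel), one possible approach is sketched below. It is not from the answer above: the helper names `fetch_all_items`, `fetch_page_from_shopee`, `save_to_excel` and `main` are hypothetical, and it assumes the endpoint pages through results via the `offset` and `limit` query parameters seen in the URL and returns an empty `items` list once the ratings run out.

```python
def fetch_all_items(fetch_page, limit=6, max_pages=50):
    """Collect rating items page by page until an empty page comes back.

    fetch_page(offset, limit) must return a list of dicts (one per rating).
    max_pages is a safety cap so the loop always terminates.
    """
    items = []
    for page in range(max_pages):
        batch = fetch_page(page * limit, limit)
        if not batch:
            break
        items.extend(batch)
    return items


def fetch_page_from_shopee(offset, limit):
    """Hypothetical fetcher built on the URL from the question."""
    import requests  # third-party; imported here so the helper above works offline

    url = "https://shopee.tw/api/v2/shop/get_ratings"
    params = {"filter": 0, "limit": limit, "offset": offset,
              "shopid": 196610, "type": 0, "userid": 196617}
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    # Assumption: the payload keeps the data/items layout used in the answer.
    return response.json()["data"]["items"] or []


def save_to_excel(items, path):
    """Write the collected items to an .xlsx file, one row per rating."""
    import pandas  # to_excel needs an Excel engine such as openpyxl installed

    pandas.DataFrame(items).to_excel(path, index=False)


def main():
    all_items = fetch_all_items(fetch_page_from_shopee)
    save_to_excel(all_items, "ratings.xlsx")
```

Calling `main()` would run the whole thing against the live site. Passing the fetcher in as an argument keeps the paging loop testable without network access.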

3 comments

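froce iT邦大師 level 1 ‧ 2019-12-04 20:51:47

I don't quite see why you use requests
and then selenium as well.

Sometimes it is necessary: requests cannot render JavaScript.

dragonH iT邦超人 level 5 ‧ 2019-12-04 21:34:31

I meant in this particular case XD

listennn08 iT邦高手 level 5 ‧ 2019-12-05 08:29:00

It looks like selenium is just there for decoration.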


froce

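iT邦大師 level 1 ‧ 2019-12-04 20:09:52

Uh...
Debug output posted as a screenshot is hard to read; it's a strain on the eyes.

df = pandas.DataFrame(data, json())

Are you sure this is how DataFrame reads JSON?
That is not how json is used either.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html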
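As a rough illustration of the two usual routes, here is a small sketch. The JSON text is made up and only mimics the data/items layout of the response in the question. It shows parsing with `json.loads` before building the DataFrame, and alternatively handing JSON text to `pandas.read_json`.

```python
import io
import json

import pandas

# Made-up JSON text shaped like the response in the question (not real data).
raw_text = json.dumps({
    "data": {
        "items": [
            {"itemid": 101, "rating_star": 5, "comment": "good"},
            {"itemid": 102, "rating_star": 4, "comment": ""},
        ]
    }
})

# Option 1: parse with the json module, then tabulate the list of records.
payload = json.loads(raw_text)
df_from_records = pandas.DataFrame(payload["data"]["items"])

# Option 2: pandas.read_json takes JSON text or a file-like object, so
# re-serialise just the list of records and wrap it in StringIO.
records_text = json.dumps(payload["data"]["items"])
df_from_text = pandas.read_json(io.StringIO(records_text), orient="records")

print(df_from_records)
print(df_from_text)
```

Option 1 is usually the simpler choice when the rows are nested under other keys, as they are here.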

3 comments

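阿展展展 iT邦好手 level 1 ‧ 2019-12-05 10:06:23

(takes out a magnifying glass)

cfeynman iT邦新手 level 5 ‧ 2019-12-05 20:09:18

Thank you all for the replies. My question was apparently quite basic; sorry for the beginner question. I spent the whole day experimenting and there is still something I can't get to work. I searched online for a long time, but I probably don't understand the language well enough, because I still can't follow much of what I read. I'd appreciate your guidance.

This is my test scrape of the Shopee page https://shopee.tw/buyer/4099676/rating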

code:

import requests
from selenium import webdriver
import json
import pandas

url = "https://shopee.tw/api/v2/shop/get_ratings?filter=0&limit=6&offset=0&shopid=4098392&type=0&userid=4099676"

path = r"C:\chromedriver.exe"  # raw string so the backslash is not read as an escape
driver = webdriver.Chrome(path)
driver.get(url)
Cookie = ';'.join(['{}={}'.format(item.get('name'), item.get('value')) for item in driver.get_cookies()])



header = {
    'cookie': Cookie,
    'if-none-match-': 'ad0cf65c2f362c78081c167ace34e140',
    'if-none-match-': '55b03-eddbcc2c628e9f6639f434a78134bca1',
    'referer': 'https://shopee.tw/buyer/4099676/rating',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
    'x-api-source': 'pc',
    'x-requested-with': 'XMLHttpRequest',
    #'Postman-Token': '68c324d7-4894-448a-a4db-9072b6bbcf0c',
    #'Connection': 'keep-alive'

}

data = requests.get(url, header).json()
df = pandas.DataFrame(data['data']['items'])
df.encoding = 'utf-8'  # no effect on the output: this only sets a plain attribute on the DataFrame

print(df)

Result:

But I can't work out how to extract the detailed information nested inside.

I'd appreciate your guidance.

dragonH iT邦超人 level 5 ‧ 2019-12-05 21:53:39

cfeynman

Then you first need to tidy up the data you want, and only then display it with pandas.
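A minimal sketch of that "tidy first, then show" idea follows. The function name `flatten_items` and the output column `product_itemid` are hypothetical, and the sample records are made up. Only keys visible in the printed result earlier in the thread (`userid`, `rating_star`, `comment`, `product_items`, `itemid`) are used.

```python
import pandas


def flatten_items(items):
    """Turn each rating into flat rows, one per purchased product."""
    rows = []
    for item in items:
        # A rating with no product list still yields one row (product id None).
        for product in item.get("product_items") or [{}]:
            rows.append({
                "userid": item.get("userid"),
                "rating_star": item.get("rating_star"),
                "comment": item.get("comment"),
                "product_itemid": product.get("itemid"),
            })
    return rows


# Made-up records that mimic the nested shape of the API items.
sample_items = [
    {"userid": 1, "rating_star": 5, "comment": "ok",
     "product_items": [{"itemid": 10}, {"itemid": 11}]},
    {"userid": 2, "rating_star": 4, "comment": "", "product_items": None},
]

rows = flatten_items(sample_items)
df = pandas.DataFrame(rows)
print(df)
```

With the real response, `data['data']['items']` would be passed in place of `sample_items`. Picking the wanted fields explicitly keeps the table narrow enough to read.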
