Python requests: part of the data is missing

python, web scraping, data science, data engineering

Peter 2022-01-20 20:32:42 ‧ 1881 views

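Good evening, everyone. I'm trying to scrape data from a web page. The request succeeds and I do get data back,
but the <section> part is missing.

Why is that part missing from the HTML?
How can I get the <section> content?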

Target data:

Code:

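import requests
from bs4 import BeautifulSoup

url = 'https://rent.houseprice.tw/'
# headers was not defined in the original post; a browser-like user-agent is assumed here
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
res = requests.get(url, headers=headers)
html_doc = res.text
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup)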

Result:

jiatool iT邦 Graduate Student Level 1 ‧ 2022-01-20 21:35:01

If the page source does not contain that part, it is probably loaded dynamically.
(Developer Tools > Network > Fetch/XHR, then press F5 to reload the page.)
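
A quick way to confirm this from Python is to count the tag in the raw response text. The sketch below uses only the standard library so it runs without extra packages; the two HTML strings are made-up stand-ins for what requests returns and what the browser shows after its scripts run, not the real page. With BeautifulSoup, `len(soup.find_all('section'))` gives the same kind of check.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times a given opening tag appears in an HTML string."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == self.tag:
            self.count += 1


def count_tag(html, tag):
    """Return the number of opening `tag` elements found in `html`."""
    parser = TagCounter(tag)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical samples: the server's raw HTML vs. the DOM after JavaScript ran.
raw_html = "<html><body><div id='app'></div><script src='app.js'></script></body></html>"
rendered_html = "<html><body><div id='app'><section>listing</section></div></body></html>"

print(count_tag(raw_html, "section"))       # 0
print(count_tag(rendered_html, "section"))  # 1
```

If the count is 0 for `res.text` but the element is visible in the browser, the content is most likely filled in by a later request, which is what the Fetch/XHR tab reveals.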

Peter iT邦 Newbie Level 4 ‧ 2022-01-20 22:04:24

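Thanks, jiatool. After reloading with the Fetch/XHR filter on, I can see a request named "list/" being loaded. The URL it GETs returns a JSON structure that contains all the data I need!
But how do I fetch the URL that the site requests separately under Fetch/XHR, https://rent.houseprice.tw/ws/list/ ?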

1 answer

I code so I am

iT邦 Master Level 1 ‧ 2022-01-21 09:17:07

Best answer

Just wait a few seconds before sending the request.

import time

import requests
from bs4 import BeautifulSoup

# Reuse one session so cookies from the first request carry over to the second
s = requests.Session()
url = 'https://rent.houseprice.tw/'
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"}

# First request: the main page
res = s.get(url, headers=headers)
html_doc = res.text
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup)

# Wait a few seconds before requesting the data endpoint
time.sleep(3)

# Second request: the endpoint seen under Fetch/XHR
url = 'https://rent.houseprice.tw/ws/list/'
res = s.get(url, headers=headers)
html_doc = res.text
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup)
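
Since the asker reports that the list/ endpoint returns JSON, it may be more convenient to decode the second response with `res.json()` than to pass it through BeautifulSoup. Below is a minimal sketch of the same flow (same session, pause, second request). The helper names `fetch_listing_json`, `summarize`, and `demo` are made up for illustration, the shortened user-agent is an assumption, and nothing is assumed about the JSON's actual fields, so the sketch only reports the top-level shape.

```python
import time


def fetch_listing_json(session, page_url, api_url, headers, delay=3.0):
    """Visit the page, wait, then request the JSON endpoint with the same session."""
    # First request: lets the session store any cookies the site may set.
    page = session.get(page_url, headers=headers)
    page.raise_for_status()

    # Pause before the second request, as suggested in the answer.
    time.sleep(delay)

    api = session.get(api_url, headers=headers)
    api.raise_for_status()
    # Decode the body as JSON instead of parsing it as HTML.
    return api.json()


def summarize(data):
    """Describe the top-level shape of decoded JSON without assuming a schema."""
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    if isinstance(data, list):
        return {"list_length": len(data)}
    return {"type": type(data).__name__}


def demo():
    """Run the flow against the site from the thread (performs real requests)."""
    # Imported here so the helpers above work with any session-like object.
    import requests

    headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    with requests.Session() as s:
        data = fetch_listing_json(
            s,
            "https://rent.houseprice.tw/",
            "https://rent.houseprice.tw/ws/list/",
            headers,
        )
    print(summarize(data))
```

Calling `demo()` runs it against the real site. `raise_for_status()` makes a blocked or failed request raise an error right away instead of surfacing later as a JSON decoding error.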