BeautifulSoup scraping: how do I select the text under td colspan=2?

web-scraping beautifulsoup python select html

annannnihow 2022-12-02 20:48:53 ‧ 1645 views


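I've been learning web scraping for the past two days and I'm stuck.

I've read lots of posts, but none of them answered this, so I'm asking for help here.

I want to use select to grab the text under td colspan=2 in the HTML,

but the cell has no identifying tag or class, so I don't know how to select it.

The actual HTML is as follows:

I want to grab the "video1" text but don't know how to write the select.

Thanks for any help!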
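Since the original HTML was not preserved, here is a minimal sketch under an assumed markup (the `<tr class="success">` row and its two links are hypothetical). It shows that a CSS attribute selector such as `td[colspan="2"]` can target a cell that has no id or class of its own:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: the original post's markup was not preserved,
# so this row structure is an assumption for illustration only.
html = """
<table>
  <tr class="success">
    <td colspan="2"><a href="#">play</a> <a href="#">video1</a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# An attribute selector matches the <td> by its colspan value.
cell = soup.select_one('td[colspan="2"]')

# get_text() joins every text node inside the cell.
all_text = cell.get_text(" ", strip=True)

# To get just one link's text, select that link instead:
# '>' means direct child, nth-of-type(2) means the second <a> sibling.
link_text = soup.select_one('td[colspan="2"] > a:nth-of-type(2)').get_text(strip=True)

print(all_text)
print(link_text)
```

`select_one()` returns the first match or `None`; use `select()` if several cells share `colspan="2"`.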


1 answer

re.Zero

iT邦 Graduate Student, Level 5 ‧ 2022-12-03 00:19:48

The code below isn't the best solution, just the first thing that came to mind; optimize it yourself if needed.

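from bs4 import BeautifulSoup

# `content` holds the page's HTML as a string.
# Assumes the target 'tr.success' is the first match of the CSS selector:
# take that row's first direct <td>, then that cell's second direct <a>.
soup = BeautifulSoup(content, features='html.parser')
getStr = soup.select('tr.success')[0] \
    .find_all('td', recursive=False)[0] \
    .find_all('a', recursive=False)[1] \
    .string
print(getStr)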
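To see what the chain above does, here it is run against a hypothetical snippet shaped to fit it (the real page's markup is unknown), alongside a roughly equivalent single CSS selector that uses `>` child combinators in place of `recursive=False`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped to fit the chain:
# tr.success > first <td> > second <a> holds the wanted text.
content = """
<table>
  <tr class="success">
    <td colspan="2"><a href="#">play</a><a href="#">video1</a></td>
    <td>other</td>
  </tr>
</table>
"""
soup = BeautifulSoup(content, "html.parser")

# Chained navigation, as in the answer.
chained = (
    soup.select("tr.success")[0]
    .find_all("td", recursive=False)[0]
    .find_all("a", recursive=False)[1]
    .string
)

# The same path as one CSS selector; '>' restricts each step to direct children.
single = soup.select_one("tr.success > td:nth-of-type(1) > a:nth-of-type(2)").string

print(chained, single)
```

`.string` returns `None` when the `<a>` has more than one child node; `get_text()` is the more forgiving choice in that case.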

Also, for navigating/traversing the parse tree, see "Navigating the tree" in the Beautiful Soup documentation (Chinese translation: "遍历文档树").
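As a small illustration of the tree navigation used above, this sketch (hypothetical markup) contrasts `find_all()`'s default recursive search with `recursive=False`, which only examines direct children:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <a> is a direct child of the <td>,
# the other is wrapped in a <span>.
html = "<table><tr><td><a>direct</a><span><a>nested</a></span></td></tr></table>"
td = BeautifulSoup(html, "html.parser").td  # first <td> in the document

# Default: searches all descendants of the <td>.
all_links = [a.get_text() for a in td.find_all("a")]

# recursive=False: only the <td>'s direct children are examined.
direct_links = [a.get_text() for a in td.find_all("a", recursive=False)]

print(all_links)
print(direct_links)
```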


annannnihow iT邦 Newbie, Level 5 ‧ 2022-12-03 06:17:54

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Hi, I tried it several times and keep getting this error. What does it mean? Thanks.

re.Zero iT邦 Graduate Student, Level 5 ‧ 2022-12-03 10:40:02

Please check which version you have installed.

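I'm using "beautifulsoup4", which has "find_all()";
the old "BeautifulSoup" package (v3) does not have "find_all()".
(The old BeautifulSoup 3 does have "findAll()", though. I'm not sure how the two differ, so please compare the older "Beautiful Soup 3 Doc" with the newer "Beautiful Soup 4 Doc" yourself.)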
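A quick way to check which package is in use, assuming beautifulsoup4 is the one intended: it is imported as `bs4` and exposes `bs4.__version__`, whereas Beautiful Soup 3 was imported from a module named `BeautifulSoup`.

```python
# Minimal check of the installed Beautiful Soup package.
# beautifulsoup4 installs the "bs4" module; the legacy v3 package
# was imported as "BeautifulSoup" instead.
import bs4

version = bs4.__version__
print(version)  # a 4.x version string for beautifulsoup4

# find_all() is the beautifulsoup4 spelling of v3's findAll().
soup = bs4.BeautifulSoup("<p>a</p><p>b</p>", "html.parser")
paragraphs = soup.find_all("p")
print(len(paragraphs))
```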

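Also, according to the project page, BeautifulSoup is no longer maintained and beautifulsoup4 is recommended instead; but I don't know your situation or requirements, so please check and decide for yourself.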
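One further possibility worth checking: the exact message in the error above ("ResultSet object has no attribute 'find_all'...") is raised by beautifulsoup4's own `ResultSet` class, which suggests bs4 is installed. A common trigger is calling `find_all()` on the list returned by `select()` or `find_all()` (for example, dropping the `[0]`) instead of on one element of it. A small reproduction with hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
soup = BeautifulSoup('<table><tr class="success"><td>x</td></tr></table>', "html.parser")

rows = soup.select("tr.success")  # a list of Tags, not a single Tag

error_message = None
try:
    rows.find_all("td")  # Tag method called on the whole list
except AttributeError as err:
    error_message = str(err)
print(error_message)

# Fix: index into the list (or loop over it) before calling find_all().
cells = rows[0].find_all("td")
print(len(cells))
```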

annannnihow iT邦 Newbie, Level 5 ‧ 2022-12-03 15:55:26

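Got it, thanks. One more question:
what if the tr I want to select doesn't only have class success,
e.g. there is also tr class="default"
(the HTML inside is the same, only the class differs)?
How do I write that in select()?
I want the content under tr.success or tr.default.

Thanks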

re.Zero iT邦 Graduate Student, Level 5 ‧ 2022-12-03 16:43:21

Do you want to list everything matched by the CSS selector?

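from bs4 import BeautifulSoup

soup = BeautifulSoup(content, features='html.parser')
# The comma-separated selector list matches rows with class "success" OR "default".
for item in soup.select('tr.success, tr.default'):
    # First direct <td> of the row, then that cell's second direct <a>.
    myStr = item \
        .find_all('td', recursive=False)[0] \
        .find_all('a', recursive=False)[1] \
        .string
    print(myStr)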
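A self-contained sketch of the same idea with hypothetical rows, comparing the comma-separated selector list with the `:is()` pseudo-class, which soupsieve (the CSS engine behind `select()`) also supports:

```python
from bs4 import BeautifulSoup

# Hypothetical rows: same inner markup, different classes.
content = """
<table>
  <tr class="success"><td><a>play</a><a>video1</a></td></tr>
  <tr class="default"><td><a>play</a><a>video2</a></td></tr>
  <tr class="warning"><td><a>play</a><a>video3</a></td></tr>
</table>
"""
soup = BeautifulSoup(content, "html.parser")

# Selector list: a row matches if it matches either selector.
via_list = [tr.find_all("a")[1].get_text() for tr in soup.select("tr.success, tr.default")]

# :is() groups the alternatives inside a single selector.
via_is = [tr.find_all("a")[1].get_text() for tr in soup.select("tr:is(.success, .default)")]

print(via_list)
print(via_is)
```

Both forms skip the `warning` row; `:is()` is handy when the shared part of the selector (here `tr`) is long and you would rather not repeat it.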
