Python web scraping beginner question

Python web scraping

al88423 2022-12-12 18:54:18 ‧ 1239 views


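I recently started practicing web scraping with Python. On one page, the HTML nests several layers inside a single tag, so I end up extracting the same data more than once. Is there a way to fix this?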

The page's HTML looks roughly like this:

<span style="text-decoration:none">你好<font>
<span style="text-decoration:none"><span style="font-family:"Segoe UI Emoji",sans-serif">小明;</span></span></font></span>

My filter looks like this:
find_all("span", style="text-decoration:none")

This returns 2 results:
你好小明
小明

Q. Is this happening because my filter criteria are not strict enough?

re.Zero iT邦 Graduate Student, Level 5 ‧ 2022-12-12 22:09:47

Your search criteria say "a 'span' whose style is 'text-decoration:none'", so this result is entirely expected.
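To make the comment above concrete, here is a minimal reproduction. It is only a sketch: the markup is adapted from the question, with the inner double quotes in the font-family value changed to single quotes (an assumption, so that the attribute parses cleanly), and it uses Python's built-in "html.parser". Because find_all searches every descendant, both the outer span and the nested span satisfy the criteria.

```python
from bs4 import BeautifulSoup

# Sample markup adapted from the question (inner quotes changed to single quotes).
html = (
    '<span style="text-decoration:none">你好<font>'
    '<span style="text-decoration:none">'
    "<span style=\"font-family:'Segoe UI Emoji',sans-serif\">小明;</span>"
    "</span></font></span>"
)

soup = BeautifulSoup(html, "html.parser")

# find_all walks all descendants, so the outer and the nested span both match.
matches = soup.find_all("span", style="text-decoration:none")

# get_text() on the outer span also includes the text of everything nested inside it.
texts = [m.get_text() for m in matches]
print(len(matches), texts)
```

The outer match's text contains the inner match's text, which is why the data looks duplicated.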


1 answer

re.Zero

iT邦 Graduate Student, Level 5 ‧ 2022-12-12 22:34:56

Is this what you want?
"Find the 'span' elements with style 'text-decoration:none' that do not themselves contain a 'span' with style 'text-decoration:none'":

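# soup is assumed to be the BeautifulSoup object for the page
myKwArgs = {'name': "span", 'style': "text-decoration:none"}
for i in soup.find_all(**myKwArgs):
    # keep only matches that contain no nested match of the same kind
    if not i.find_all(**myKwArgs):
        # \033[36m ... \033[0m prints the tag in cyan on ANSI terminals
        print('[\033[36m', i, '\033[0m]')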
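For anyone who wants to run this end to end, below is a self-contained sketch of the same idea. The sample markup is adapted from the question (inner quotes changed to single quotes, which is an assumption), and the names CRITERIA, innermost_matches and outermost_matches are made up for illustration. The second helper is not from the thread: it is one possible alternative for the case where you want only the outermost span instead, using find_parent to check for a matching ancestor.

```python
from bs4 import BeautifulSoup

# Sample markup adapted from the question (inner quotes changed to single quotes).
HTML = (
    '<span style="text-decoration:none">你好<font>'
    '<span style="text-decoration:none">'
    "<span style=\"font-family:'Segoe UI Emoji',sans-serif\">小明;</span>"
    "</span></font></span>"
)

# Hypothetical name for the shared search criteria.
CRITERIA = {"name": "span", "style": "text-decoration:none"}


def innermost_matches(soup):
    """Matches that contain no further match (the approach in the answer)."""
    return [tag for tag in soup.find_all(**CRITERIA) if not tag.find_all(**CRITERIA)]


def outermost_matches(soup):
    """Matches with no matching ancestor (an alternative, not from the thread)."""
    return [tag for tag in soup.find_all(**CRITERIA) if tag.find_parent(**CRITERIA) is None]


soup = BeautifulSoup(HTML, "html.parser")
inner_texts = [tag.get_text() for tag in innermost_matches(soup)]
outer_texts = [tag.get_text() for tag in outermost_matches(soup)]
print(inner_texts)
print(outer_texts)
```

Which of the two fits depends on whether you want the deepest span or the whole outer block; either way each piece of text is returned only once.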