Python syntax error

python webscraping

sai_hikaru 2020-11-17 15:57:29 ‧ 1385 views

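Hi everyone, I'm new to Python and web scraping.
The code below is typed in exactly as it appears in the book,
but I keep getting a syntax error (on the part in bold).
I can't figure out what is wrong.
Thanks, everyone.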

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html=urlopen('https://en.wikipedia.org/wiki/Kevin_Bacon')
bs=BeautifulSoup(html,'html.parser')
for link in bs.find('div',{'id':'bodycontent'}).find_all('a', href=re.compile('^(/wiki/)((?!:).)*$")'):
    if'href' in link.attrs:
        print(link.attrs['href'])

dragonH iT邦超人 Level 5 ‧ 2020-11-17 15:59:55

You need to tell us what the error message actually says.

listennn08 iT邦高手 Level 5 ‧ 2020-11-17 16:23:35

for link in bs.find('div',{'id':'bodycontent'}).find_all('a', href=re.compile('^(/wiki/)((?!:).)*$")')
This line is missing a closing parenthesis.

1 answer

ccutmis

iT邦高手 Level 2 ‧ 2020-11-17 17:16:11

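I tried it and found three problems:
1) As the commenter above said, a closing ) parenthesis is missing.
2) You wrote 'bodycontent' instead of 'bodyContent'. This is not a syntax error, but id values are matched case-sensitively, so bs.find() returns None and the chained .find_all() fails with an AttributeError instead of returning links. (View the source of the Kevin_Bacon page and look for div id="bodyContent" and you will see what I mean.)
3) The pattern inside re.compile('...') is malformed: it ends with a stray ") that does not belong to the regex.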
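To make point 3 concrete, here is a small offline sketch that needs only the standard library. It assumes the book's intended pattern is '^(/wiki/)((?!:).)*$' (my reading of the typed line with the stray ") removed), and the sample hrefs are made up for illustration rather than scraped from the page.

```python
import re

# Assumed intended pattern: paths that start with /wiki/ and contain no colon.
ARTICLE_LINK = re.compile(r'^(/wiki/)((?!:).)*$')

# Hypothetical sample hrefs, not taken from the live Wikipedia page.
samples = [
    '/wiki/Footloose_(1984_film)',
    '/wiki/Category:American_male_film_actors',
    '/w/index.php?title=Kevin_Bacon',
    'https://example.org/wiki/Kevin_Bacon',
]

# Keep only the hrefs the pattern accepts.
article_links = [href for href in samples if ARTICLE_LINK.search(href)]
print(article_links)

# The pattern as typed in the question has an extra ") at the end,
# which leaves an unmatched ")" and makes re.compile() raise re.error.
try:
    re.compile('^(/wiki/)((?!:).)*$")')
    typo_error = None
except re.error as exc:
    typo_error = str(exc)
print(typo_error)
```

Only the plain article path survives the filter. The Category: link is rejected because of the colon, and the other two do not start with /wiki/.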

After a quick rewrite it does return results, though I'm not sure they are what you are after. For reference:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html=urlopen('https://en.wikipedia.org/wiki/Kevin_Bacon')
bs=BeautifulSoup(html,'html.parser')

# Keep only /wiki/ links whose path contains no colon (skips Category:, File:, ...).
for link in bs.find('div',{'id':'bodyContent'}).find_all('a', href=re.compile('^(/wiki/)((?!:).)*$')):
	if 'href' in link.attrs:
		print(link.attrs['href'])
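Regarding the 'bodycontent' vs 'bodyContent' point, here is a rough sketch that shows the difference without touching the network. SAMPLE_HTML and the wiki_links helper are hypothetical names I made up for illustration, and it assumes bs4 is installed, as in the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia page, so no network access is needed.
SAMPLE_HTML = (
    '<div id="bodyContent">'
    '<a href="/wiki/Footloose_(1984_film)">Footloose</a>'
    '</div>'
)

bs = BeautifulSoup(SAMPLE_HTML, 'html.parser')

wrong = bs.find('div', {'id': 'bodycontent'})  # lowercase c: nothing found
right = bs.find('div', {'id': 'bodyContent'})  # exact id: the div is found
print(wrong)
print(right is not None)


def wiki_links(soup, container_id):
    """Return the hrefs inside the div with the given id, or [] if it is absent."""
    container = soup.find('div', {'id': container_id})
    if container is None:
        # Without this check, container.find_all(...) would raise AttributeError.
        return []
    return [a['href'] for a in container.find_all('a', href=True)]


print(wiki_links(bs, 'bodyContent'))
print(wiki_links(bs, 'bodycontent'))
```

Checking for None before chaining .find_all() turns a confusing AttributeError into an empty result, which is easier to notice and debug.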

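If you are new to web scraping, I'd personally suggest skipping BeautifulSoup and learning requests_html instead. You can find tutorials by Googling "requests_html 教學" (requests_html tutorial). You should also learn regular expressions (re).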

As a follow-up, here is the same thing written with requests_html:

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://en.wikipedia.org/wiki/Kevin_Bacon')
for link in r.html.find('div#bodyContent a'):
    if 'href' in link.attrs and "/wiki/" in link.attrs['href']:
        print(link.attrs['href'])