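I'm currently learning web scraping with BeautifulSoup, but I've run into a problem when reading a tag's id.
I'm practicing with code that fetches the Taiwan Lottery winning numbers. At the bottom of the script I want to read each span's id attribute and pass it to a function that decides whether the tag holds a winning number, but I get the following error message: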
Traceback (most recent call last):
  File "lotto.py", line 21, in <module>
    if find_id(tag['id']):
  File "C:\Users\brian\AppData\Local\Programs\Python\Python38-32\lib\site-packages\bs4\element.py", line 1401, in __getitem__
    return self.attrs[key]
KeyError: 'id'
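It looks like the key is wrong (the documentation says tag attributes can be accessed like a dictionary). Why does this happen, and how can I fix it?
Here is my code: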
import requests as req
from bs4 import BeautifulSoup

win_id = [
    "Lotto649Control_history_dlQuery_No1_",
    "Lotto649Control_history_dlQuery_No2_",
    "Lotto649Control_history_dlQuery_No3_",
    "Lotto649Control_history_dlQuery_No4_",
    "Lotto649Control_history_dlQuery_No5_",
    "Lotto649Control_history_dlQuery_No6_",
    "Lotto649Control_history_dlQuery_SNo_",
]

def find_id(check_id):
    global win_id
    if(check_id !=None):
        for i in range(len(win_id)):  # check whether the id passed in matches the list of winning ids
            if win_id[i] in check_id:
                return True
            else:
                return False

url="https://www.taiwanlottery.com.tw/lotto/lotto649/history.aspx"  # page URL
request=req.get(url).text
root=BeautifulSoup(request,"html.parser")  # the Lotto 6/49 page's HTML, parsed by BeautifulSoup and stored in root
tags=root.find_all("span")
with open("lotto_number.txt", mode="w", encoding="utf-8") as lotto_name:
    for tag in tags:  # tags is every span tag; tag is each individual one
        if find_id(tag['id']):  # pass tag["id"] in as check_id
            lotto_name.write(tag['id'])
Thanks!
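To read the id value, use tag.get('id'). Not every span on the page has an id attribute: tag['id'] raises KeyError for a span without one, whereas tag.get('id') returns None.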
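A minimal sketch of the difference, run on a small made-up HTML snippet (the markup and the id below are illustrative assumptions, not the real lottery page). A tag without an id makes dictionary-style access raise KeyError, while .get() simply returns None:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one span with an id, one without.
html = '<span id="demo_No1_0">07</span><span>no id here</span>'
soup = BeautifulSoup(html, "html.parser")
with_id, without_id = soup.find_all("span")

id_present = with_id.get("id")      # the id string
id_missing = without_id.get("id")   # None, no exception raised

try:
    without_id["id"]                # dictionary-style access on a tag with no id
    raised = False
except KeyError:
    raised = True                   # the attribute is absent, so KeyError is raised
```

This is the same situation as in the traceback: the loop reaches a span with no id attribute.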
Separately, why write it like this?
for i in range(len(win_id)):  # check whether the id passed in matches the list of winning ids
    if win_id[i] in check_id:
        return True
    else:
        return False
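Wouldn't something like this be better? It is much shorter and it handles None as well. The entries in win_id look like prefixes of the real ids (they end in an underscore, and your loop does a substring test), so the check should stay a substring test rather than exact list membership:
def find_id(check_id):
    # None is rejected first; otherwise True if any entry of win_id occurs in check_id
    return check_id is not None and any(prefix in check_id for prefix in win_id)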
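One thing to check when shortening this function: `x in some_list` tests for an exact match against a list element, while the original loop tests whether a list entry occurs inside the id. The sketch below contrasts the two, assuming (as the trailing underscores suggest) that the real ids carry a suffix; the sample id and the helper names are made up for illustration.

```python
# Two of the entries from the question's win_id list.
win_id = [
    "Lotto649Control_history_dlQuery_No1_",
    "Lotto649Control_history_dlQuery_SNo_",
]

def matches_exact(check_id):
    # Exact list membership: True only if check_id equals an entry.
    return check_id in win_id

def matches_substring(check_id):
    # Substring test against each entry, with None rejected first.
    return check_id is not None and any(p in check_id for p in win_id)

sample = "Lotto649Control_history_dlQuery_No1_0"  # hypothetical id with a suffix
exact_result = matches_exact(sample)          # no entry equals the full id
substring_result = matches_substring(sample)  # the first entry occurs in the id
none_result = matches_substring(None)         # None is handled without an error
```

If the ids on the page turn out to equal the list entries exactly, plain membership would be enough.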
I don't think you even need the function; you can handle it directly in the main section:
url="https://www.taiwanlottery.com.tw/lotto/lotto649/history.aspx"  # page URL
request=req.get(url).text
root=BeautifulSoup(request,"html.parser")  # the Lotto 6/49 page's HTML, parsed by BeautifulSoup and stored in root
tags=root.find_all("span")
with open("lotto_number.txt", mode="w", encoding="utf-8") as lotto_name:
    for tag in tags:  # tags is every span tag; tag is each individual one
        tag_id = tag.get('id')  # None when the span has no id
        if tag_id and any(prefix in tag_id for prefix in win_id):
            lotto_name.write(tag_id)
You're only writing the id to the file? I don't follow. Don't you want to write the value itself? That would be tag.get_text().
Or you could use a lambda to find the winning spans directly:
wintags = root.find_all("span", {"id" : lambda L: L and L.startswith('Lotto649Control_history_dlQuery_SNo')})
# finds the spans whose id starts with Lotto649Control_history_dlQuery_SNo
# no further membership check is needed below
with open("lotto_number.txt", mode="w", encoding="utf-8") as lotto_name:
    for tag in wintags:  # loop over the filtered spans, not over every span
        # write the winning number (the span's text) rather than its id
        lotto_name.write(tag.get_text())
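As an alternative to the lambda, BeautifulSoup also accepts a compiled regular expression as an attribute filter. A small sketch on made-up markup (the second id and the numbers are illustrative, not taken from the live page):

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: one matching id, one non-matching id, one span without an id.
html = """
<span id="Lotto649Control_history_dlQuery_SNo_0">08</span>
<span id="Lotto649Control_history_dlQuery_Other_0">99</span>
<span>no id</span>
"""
soup = BeautifulSoup(html, "html.parser")

# The ^ anchor makes this a prefix match, like startswith() in the lambda.
pattern = re.compile(r"^Lotto649Control_history_dlQuery_SNo")
wintags = soup.find_all("span", id=pattern)
numbers = [tag.get_text() for tag in wintags]
```

Spans without an id are skipped by the filter, so no None check is needed here.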
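Sorry, I just noticed that the Lotto 6/49 page lists a lot of historical draws.
So you need to narrow the search to the table at the very top, which holds the winning numbers of the most recent draw:
FirstTable = root.find("table",{"class": "table_org"})  # find() already returns the first match, so no [0] is needed
wintags = FirstTable.find_all("span", {"id" : lambda L: L and L.startswith('Lotto649Control_history_dlQuery_SNo')})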
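A sketch of scoping the search to the first table, on made-up markup with two tables that share the class table_org (the structure and numbers are assumed for illustration). find() returns the first matching Tag, or None if nothing matches, so a search run on that Tag only sees the spans inside that table:

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: two draws, each in its own table.
html = """
<table class="table_org"><tr><td>
  <span id="Lotto649Control_history_dlQuery_SNo_0">08</span>
</td></tr></table>
<table class="table_org"><tr><td>
  <span id="Lotto649Control_history_dlQuery_SNo_1">21</span>
</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
prefix = "Lotto649Control_history_dlQuery_SNo"

def has_prefix(value):
    # value is None for spans without an id
    return bool(value) and value.startswith(prefix)

first_table = soup.find("table", {"class": "table_org"})  # a single Tag, not a list
latest = [t.get_text() for t in first_table.find_all("span", {"id": has_prefix})]
all_draws = [t.get_text() for t in soup.find_all("span", {"id": has_prefix})]
```

Searching from the whole document returns every draw, while searching from first_table returns only the most recent one.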