Problem reading a tag's id attribute with a web scraper

python 3, web scraping, python, beautifulsoup

brian910813 2020-06-17 22:41:18 ‧ 3133 views

I am learning web scraping with BeautifulSoup and ran into a problem reading a tag's id attribute.

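I am practicing with code that fetches the Taiwan Lottery winning numbers. Near the bottom, I want to read each span's id attribute and pass it to a function that decides whether the tag holds a winning number, but I get the following error: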

Traceback (most recent call last):
  File "lotto.py", line 21, in <module>
    if find_id(tag['id']):
  File "C:\Users\brian\AppData\Local\Programs\Python\Python38-32\lib\site-packages\bs4\element.py", line 1401, in __getitem__
    return self.attrs[key]
KeyError: 'id'

The key seems to be wrong (the documentation says tag attributes can be accessed like a dictionary). Why does this happen, and how can I fix it?

Here is my code:

import requests as req
from bs4 import BeautifulSoup

win_id=["Lotto649Control_history_dlQuery_No1_","Lotto649Control_history_dlQuery_No2_","Lotto649Control_history_dlQuery_No3_","Lotto649Control_history_dlQuery_No4_","Lotto649Control_history_dlQuery_No5_","Lotto649Control_history_dlQuery_No6_","Lotto649Control_history_dlQuery_SNo_"]

def find_id(check_id):
    global win_id
    if(check_id !=None):
        for i in range(len(win_id)): # check whether the given id matches an entry in the winning-id list
            if win_id[i] in check_id:
                return True
            else:
                return False

url="https://www.taiwanlottery.com.tw/lotto/lotto649/history.aspx" # page URL
request=req.get(url).text
root=BeautifulSoup(request,"html.parser") # parse the Lotto 6/49 page's HTML with BeautifulSoup and store it in root
tags=root.find_all("span")
with open("lotto_number.txt", mode="w", encoding="utf-8") as lotto_name:
    for tag in tags: # tags holds every span tag; tag is one of them
        if find_id(tag['id']):
            lotto_name.write(tag['id']) # pass tag["id"] in as check_id

Thanks!

1 answer

japhenchen

iT邦超人 Level 1 ‧ 2020-06-18 08:07:37

Accepted answer

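To read the id value, use tag.get('id'). tag['id'] raises KeyError for any span that has no id attribute, whereas tag.get('id') returns None in that case.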
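The difference between the two access styles can be seen on a tiny document. This is a minimal sketch, assuming an inline HTML string made up for illustration (it is not taken from the lottery page); `Tag.get` and `Tag.has_attr` are regular BeautifulSoup methods.

```python
from bs4 import BeautifulSoup

# Hypothetical two-span document: one span has an id, the other does not.
html = '<span id="a1">07</span><span>no id here</span>'
soup = BeautifulSoup(html, "html.parser")
with_id, without_id = soup.find_all("span")

print(with_id["id"])         # a1
print(with_id.get("id"))     # a1
print(without_id.get("id"))  # None

# Dictionary-style access fails when the attribute is missing.
try:
    without_id["id"]
except KeyError as exc:
    print("KeyError:", exc)

# has_attr() is another way to test before indexing.
print(without_id.has_attr("id"))  # False
```

Since `find_all("span")` returns every span on the page, including those without an id, the loop in the question hits the failing case as soon as it meets such a span.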

Also, why write it this way?

for i in range(len(win_id)): # check whether the given id matches an entry in the winning-id list
    if win_id[i] in check_id:
        return True
    else:
        return False

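Wouldn't this be better? It takes two lines and handles None as well: just return whether any entry of win_id occurs in check_id.

def find_id(check_id):
    # win_id entries are id prefixes (note the trailing underscore), so test containment, not equality
    return check_id is not None and any(w in check_id for w in win_id)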
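To make the behaviour of the different checks concrete, here is a small comparison. It is only a sketch: the sample id with a `_0` suffix is an assumption about how the page numbers its spans, and the names `find_id_original` and `find_id_fixed` are made up for this illustration.

```python
# The seven prefixes from the question, built in a loop for brevity.
WIN_ID = [
    f"Lotto649Control_history_dlQuery_{name}_"
    for name in ("No1", "No2", "No3", "No4", "No5", "No6", "SNo")
]

def find_id_original(check_id):
    """The loop from the question: it returns on the very first iteration."""
    if check_id is not None:
        for prefix in WIN_ID:
            if prefix in check_id:
                return True
            else:
                return False  # reached when the first prefix does not match

def find_id_fixed(check_id):
    """True when any known prefix occurs in the id; False for None."""
    return check_id is not None and any(prefix in check_id for prefix in WIN_ID)

sample = "Lotto649Control_history_dlQuery_No2_0"  # hypothetical id with a suffix

print(find_id_original(sample))  # False: only the No1_ prefix was ever tested
print(sample in WIN_ID)          # False: exact membership ignores the suffix
print(find_id_fixed(sample))     # True
print(find_id_fixed(None))       # False
```

The `else: return False` branch is what makes the original loop stop early, and a plain `check_id in win_id` test only works if the ids on the page equal the list entries exactly.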

I don't think you even need this function; you can handle it directly in the main section.

url="https://www.taiwanlottery.com.tw/lotto/lotto649/history.aspx" # page URL
request=req.get(url).text
root=BeautifulSoup(request,"html.parser") # parse the Lotto 6/49 page's HTML with BeautifulSoup and store it in root
tags=root.find_all("span")
with open("lotto_number.txt", mode="w", encoding="utf-8") as lotto_name:
    for tag in tags: # tags holds every span tag; tag is one of them
        tag_id = tag.get('id') # None when the span has no id attribute
        # win_id entries are id prefixes, so test containment rather than equality
        if tag_id and any(w in tag_id for w in win_id):
            lotto_name.write(tag_id)

You only write the id to the file? I don't follow. Don't you want to write the value, tag.get_text()?

2 replies

japhenchen iT邦超人 Level 1 ‧ 2020-06-18 08:15:39

Or you could use a lambda to find the winning spans directly:

wintags = root.find_all("span", {"id" : lambda L: L and L.startswith('Lotto649Control_history_dlQuery_SNo')})
# find the spans whose id starts with Lotto649Control_history_dlQuery_SNo
# no separate membership check is needed below
with open("lotto_number.txt", mode="w", encoding="utf-8") as lotto_name:
    for tag in wintags: # wintags holds only the matching spans; tag is one of them
        lotto_name.write(tag.get('id'))
        # this writes the id to the file;
        # what you want to write is probably the winning number itself
        lotto_name.write(tag.get_text())
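As an alternative to the lambda, `find_all` also accepts a compiled regular expression as an attribute filter. The sketch below assumes a made-up HTML fragment whose ids follow the prefixes from the question plus a numeric suffix, and it writes to an in-memory buffer instead of lotto_number.txt so it can be run without touching the disk.

```python
import io
import re
from bs4 import BeautifulSoup

# Hypothetical fragment; the real page layout may differ.
html = """
<span id="Lotto649Control_history_dlQuery_No1_0">03</span>
<span id="Lotto649Control_history_dlQuery_SNo_0">21</span>
<span id="unrelated">x</span>
<span>no id</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Match ids for the six regular numbers (No1..No6) and the special number (SNo).
pattern = re.compile(r"^Lotto649Control_history_dlQuery_(No[1-6]|SNo)_")
win_tags = soup.find_all("span", id=pattern)

# Write one "id<TAB>number" line per matching span.
out = io.StringIO()
for tag in win_tags:
    out.write(f"{tag['id']}\t{tag.get_text(strip=True)}\n")

result = out.getvalue()
print(result)
```

Spans without an id never reach the loop, so `tag['id']` is safe here. Ending each record with a newline also keeps the numbers from running together in the output.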

japhenchen iT邦超人 Level 1 ‧ 2020-06-18 08:30:03

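Sorry, I just noticed that the lottery page lists many historical draws.
So you should restrict the search to the topmost table, which holds the most recent draw's winning numbers: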

FirstTable = root.find("table",{"class": "table_org"}) # find() returns a single tag, so no [0] index is needed
wintags = FirstTable.find_all("span", {"id" : lambda L: L and L.startswith('Lotto649Control_history_dlQuery_SNo')})
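Scoping the search to the first table can be tried out on a small stand-in document. This is a sketch under assumptions: the two-table HTML is invented for illustration, and only the class name table_org and the id prefix come from the discussion above.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two draws, each in its own table.
html = """
<table class="table_org"><tr><td>
  <span id="Lotto649Control_history_dlQuery_SNo_0">21</span>
</td></tr></table>
<table class="table_org"><tr><td>
  <span id="Lotto649Control_history_dlQuery_SNo_1">35</span>
</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
prefix = "Lotto649Control_history_dlQuery_SNo"

# find() returns the first matching tag, or None when nothing matches.
first_table = soup.find("table", class_="table_org")

if first_table is not None:
    # Searching from first_table only looks at its descendants.
    numbers = [
        span.get_text(strip=True)
        for span in first_table.find_all(
            "span", id=lambda value: value and value.startswith(prefix)
        )
    ]
else:
    numbers = []

print(numbers)  # ['21']
```

Checking for None before calling `find_all` avoids an AttributeError if the page layout changes and no such table is found.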


IT邦幫忙