iT邦幫忙


Web scraping: problem getting a tag's id attribute


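I'm currently learning web scraping with BeautifulSoup, but I ran into a problem when reading a tag's id.

I'm practicing with code that fetches the Taiwan Lottery winning numbers. At the very bottom I want to get each span's id attribute and pass it to a function that decides whether the tag holds a winning number, but I get the error below.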

Traceback (most recent call last):
  File "lotto.py", line 21, in <module>
    if find_id(tag['id']):
  File "C:\Users\brian\AppData\Local\Programs\Python\Python38-32\lib\site-packages\bs4\element.py", line 1401, in __getitem__
    return self.attrs[key]
KeyError: 'id'

The key seems to be wrong (the documentation says tag attributes can be accessed like a dictionary). Why does this happen, and how can I fix it?

Here is my code:

import requests as req
from bs4 import BeautifulSoup

win_id=["Lotto649Control_history_dlQuery_No1_","Lotto649Control_history_dlQuery_No2_","Lotto649Control_history_dlQuery_No3_","Lotto649Control_history_dlQuery_No4_","Lotto649Control_history_dlQuery_No5_","Lotto649Control_history_dlQuery_No6_","Lotto649Control_history_dlQuery_SNo_"]

def find_id(check_id):
    global win_id
    if(check_id !=None):
        for i in range(len(win_id)): # check whether the passed-in id matches the list of winning ids
            if win_id[i] in check_id:
                return True
            else:
                return False

url="https://www.taiwanlottery.com.tw/lotto/lotto649/history.aspx" # page URL
request=req.get(url).text
root=BeautifulSoup(request,"html.parser") # the Lotto 6/49 page HTML, parsed by BeautifulSoup and stored in root
tags=root.find_all("span")
with open("lotto_number.txt", mode="w", encoding="utf-8") as lotto_name:
    for tag in tags: # tags is every span tag; tag is each individual one
        if find_id(tag['id']):
            lotto_name.write(tag['id']) # tag["id"] is passed in as check_id

Thanks!


1 Answer

japhenchen
iT邦超人 Level 1 ‧ 2020-06-18 08:07:37
Best answer

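To read the id you should use tag.get('id'). Not every span on the page has an id attribute: tag['id'] raises KeyError for those, while tag.get('id') returns None.

Also... why write it like this?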
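The traceback in the question ends in `return self.attrs[key]`, so `tag['id']` comes down to an ordinary dictionary lookup on the tag's attributes. Here is a minimal sketch of the difference, using plain dicts as stand-ins for `tag.attrs` (the attribute values are made up, and as far as I know `tag.get` behaves like `dict.get` here):

```python
# Plain dicts standing in for tag.attrs; the values are made up for illustration.
span_with_id = {"id": "Lotto649Control_history_dlQuery_No1_0"}
span_without_id = {"class": ["label"]}

# Index-style access fails as soon as one span has no id attribute.
try:
    span_without_id["id"]
    raised = False
except KeyError:
    raised = True

# .get() returns None for a missing key instead of raising.
missing = span_without_id.get("id")
present = span_with_id.get("id")

print(raised, missing, present)
```

So a loop over every span on the page only needs one span without an id for the index form to blow up.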

for i in range(len(win_id)): # check whether the passed-in id matches the list of winning ids
    if win_id[i] in check_id:
        return True
    else:
        return False

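Wouldn't this be better? A couple of lines, and None is handled too. The loop above returns False as soon as the first entry of win_id fails to match, so the remaining entries are never checked. The entries of win_id end in "_" and look like id prefixes, so return whether any of them occurs in check_id:

def find_id(check_id):
    # win_id holds id prefixes; a span without an id (None) never matches
    return check_id is not None and any(prefix in check_id for prefix in win_id)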
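To make the difference concrete, here is a small sketch that compares the loop from the question with a version that tries every entry. The sample id with a trailing `0` is hypothetical; I'm assuming the real ids are the `win_id` prefixes plus some row suffix, which is also why a plain list-membership test would not match.

```python
WIN_ID_PREFIXES = [
    "Lotto649Control_history_dlQuery_No1_",
    "Lotto649Control_history_dlQuery_No2_",
    "Lotto649Control_history_dlQuery_SNo_",
]

def find_id_early_return(check_id):
    """The loop from the question: it always returns during the first iteration."""
    if check_id is not None:
        for prefix in WIN_ID_PREFIXES:
            if prefix in check_id:
                return True
            else:
                return False  # leaves before the later prefixes are tried

def find_id(check_id):
    """Try every prefix; None (a span without an id) never matches."""
    return check_id is not None and any(p in check_id for p in WIN_ID_PREFIXES)

sample = "Lotto649Control_history_dlQuery_SNo_0"  # hypothetical id with a row suffix
early = find_id_early_return(sample)   # only the No1_ prefix gets compared
fixed = find_id(sample)                # all prefixes get compared
exact = sample in WIN_ID_PREFIXES      # exact list membership, for comparison
print(early, fixed, exact)
```

With this sample, only the version that checks every prefix reports a match.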

I don't think you even need the function; just handle it directly in the main section.

url="https://www.taiwanlottery.com.tw/lotto/lotto649/history.aspx" # page URL
request=req.get(url).text
root=BeautifulSoup(request,"html.parser") # the Lotto 6/49 page HTML, parsed by BeautifulSoup and stored in root
tags=root.find_all("span")
with open("lotto_number.txt", mode="w", encoding="utf-8") as lotto_name:
    for tag in tags: # tags is every span tag; tag is each individual one
        tag_id = tag.get('id') # None when the span has no id attribute
        if tag_id and any(prefix in tag_id for prefix in win_id): # win_id entries are id prefixes
            lotto_name.write(tag_id)

You only write the id to the file? I don't get it... don't you want the value? That would be tag.get_text().

Or... you can use a lambda to find the winning spans directly.

wintags = root.find_all("span", {"id" : lambda L: L and L.startswith('Lotto649Control_history_dlQuery_SNo')})
# find the spans whose id starts with Lotto649Control_history_dlQuery_SNo
# no separate membership check is needed below
with open("lotto_number.txt", mode="w", encoding="utf-8") as lotto_name:
    for tag in wintags: # iterate over the filtered spans, not every span in tags
        lotto_name.write(tag.get('id'))
        # writing the id to the file? you probably want the winning number instead
        lotto_name.write(tag.get_text())

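Sorry, I just noticed that the lottery site lists a lot of past draws...
So you should narrow the search to the topmost table, which holds the latest draw's winning numbers.

FirstTable = root.find("table",{"class": "table_org"}) # find() already returns the first match, so no [0] is needed
wintags = FirstTable.find_all("span", {"id" : lambda L: L and L.startswith('Lotto649Control_history_dlQuery_SNo')})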
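Putting the pieces together, here is a rough sketch of scoping the search to the first table and filtering spans by id prefix. It runs on made-up HTML rather than the live page: the table layout, the `_0` / `_1` suffixes, and the helper name `latest_numbers` are all my assumptions, so adjust them to whatever the real markup looks like.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page: two draw tables, newest first.
SAMPLE_HTML = """
<table class="table_org"><tr>
  <td><span id="Lotto649Control_history_dlQuery_No1_0">05</span></td>
  <td><span id="Lotto649Control_history_dlQuery_SNo_0">12</span></td>
  <td><span>no id here</span></td>
</tr></table>
<table class="table_org"><tr>
  <td><span id="Lotto649Control_history_dlQuery_No1_1">33</span></td>
  <td><span id="Lotto649Control_history_dlQuery_SNo_1">40</span></td>
</tr></table>
"""

# Prefixes for regular numbers and the special number (str.startswith accepts a tuple).
PREFIXES = (
    "Lotto649Control_history_dlQuery_No",
    "Lotto649Control_history_dlQuery_SNo",
)

def latest_numbers(html):
    """Return the number texts found in the first table_org table only."""
    root = BeautifulSoup(html, "html.parser")
    first_table = root.find("table", {"class": "table_org"})  # a single Tag or None
    if first_table is None:
        return []
    # The filter receives the id value, or None when the span has no id.
    spans = first_table.find_all("span", id=lambda v: v and v.startswith(PREFIXES))
    return [span.get_text(strip=True) for span in spans]

numbers = latest_numbers(SAMPLE_HTML)
print(numbers)
```

Because the filter handles None itself, no separate check for id-less spans is needed, and the second table's numbers never reach the result.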
