Python practice: scraping data

python

pigs0231 2018-08-16 15:01:25 ‧ 2650 views


from bs4 import BeautifulSoup


def get_articles(dom):
    soup = BeautifulSoup(dom, 'html.parser')

    articles = []  # stores the article data collected so far
    divs = soup.find_all('div', 'qa-list')
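    for i in divs:
        push_count = 0  # reset for every entry so a missing count never raises NameError
        if i.find('div', 'qa-list__condition'):
            try:
                # find() returns only the first matching span; convert its text to a number
                push_count = int(i.find('span', 'qa-condition__count').string)
            except (AttributeError, TypeError, ValueError):  # if the conversion fails, do nothing; push_count stays 0
                pass
        if i.find('div', 'qa-list__content'):
            name_title = None
            try:
                name_title = i.find('a', 'qa-list__title-link').string
            except AttributeError:  # no title link in this entry; name_title stays None
                pass

            if i.find('a'):
                href = i.find('a')['href']
                articles.append({'title': name_title, 'href': href, 'push_count': push_count})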
    return articles


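if __name__ == '__main__':
    # get_web_page() is not shown in this post; it is assumed to return the page HTML, or a falsy value on failure
    page = get_web_page('https://ithelp.ithome.com.tw/')
    if page:
        current_articles = get_articles(page)
        for post in current_articles:
            print(post)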

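A question: suppose the URL being scraped is the IT Help Q&A site, and I want to switch from scraping the Like count to scraping the view count.
Both spans have the same class name. How should I go about picking the value I want when the two elements share an identical name?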


1 answer

神Q超人

iT邦研究生, level 5 ‧ 2018-08-16 15:36:50

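Hello!
The key is your call to i.find('span', 'qa-condition__count'):
find() only returns the first matching DOM element.
To get every matching element, use find_all() instead,
just as you did above to get the list of all articles.
Put all the matching elements into a list first,
and then you can pick the value you want out of that list.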
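
As a rough sketch of that idea (not tested against the live site): the markup in SAMPLE_HTML is made up for illustration, get_counts() is a hypothetical helper name, and the position of the view count in the list is an assumption you should check against the real page.

```python
from bs4 import BeautifulSoup

# Made-up markup for a single entry; the live page's structure and order may differ.
SAMPLE_HTML = """
<div class="qa-list">
  <div class="qa-list__condition">
    <span class="qa-condition__count">3</span> Like
    <span class="qa-condition__count">2650</span> views
  </div>
</div>
"""


def get_counts(entry):
    """Return every qa-condition__count value in the entry as an int, in page order."""
    counts = []
    for span in entry.find_all('span', 'qa-condition__count'):
        try:
            counts.append(int(span.get_text(strip=True)))
        except ValueError:  # text that is not a plain number is recorded as 0
            counts.append(0)
    return counts


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
entry = soup.find('div', 'qa-list')
counts = get_counts(entry)

like_count = counts[0]  # the same span that find() would return
view_count = counts[1]  # assumed position of the view count in this sample
print(like_count, view_count)
```

Indexing by position is the simplest option, but it breaks if the page adds or reorders counters, so checking len(counts) before indexing is a reasonable safeguard.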

If you have any questions, let me know in the comments. Thanks.