Day 16: The requests Module (Part 1)

2021 iThome Ironman Contest

Video tutorial

Part 16 of the series "A Liberal Arts Student's Python Web Scraping Journey" (文組生的Python爬蟲之旅)

13th Ironman Contest

水母君

2021-09-30 13:59:16

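Finally! We can get into actual web scraping.
By now we know enough to write Python and analyze web pages.
Today's video covers making an HTTP request and retrieving the data we want from a web server.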

requests is a third-party module, so before using it you need to install it from the command prompt (CMD):

pip install requests
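
If you want to confirm the installation worked before running the scripts, here is a minimal sketch that uses only the standard library. The helper name `is_installed` is made up for this example; it simply asks Python whether a module with that name can be found.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    # find_spec returns None when the module cannot be found.
    return importlib.util.find_spec(module_name) is not None


# Prints True once `pip install requests` has succeeded in this environment.
print("requests installed:", is_installed("requests"))
```

If this prints False, the most likely cause is that pip installed the package into a different Python environment than the one running the script.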

Here is the code used in the video:

# Check the data type of the returned object
import requests

url = "https://new.ntpu.edu.tw/"
htmlfile = requests.get(url)
print(type(htmlfile))

# Important attributes of the Response object
import requests

url = "https://new.ntpu.edu.tw/"
htmlfile = requests.get(url)

print("HTTP status code:", htmlfile.status_code)  # the integer 200 means the page was fetched successfully
print("Page content:\n", htmlfile.text)  # \n is a newline
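
The status code is just an integer, so it helps to know what the common values mean. As a rough sketch, Python's standard `http.HTTPStatus` enum can turn a code into its standard phrase. The helper `describe_status` is a name invented for this example and is not part of requests.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the code followed by its standard HTTP reason phrase."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # The code is not one of the statuses Python knows about.
        return f"{code} (not a standard status code)"
    return f"{code} {status.phrase}"


for code in (200, 404, 500):
    print(describe_status(code))
# 200 OK
# 404 Not Found
# 500 Internal Server Error
```

With a real response you would pass `htmlfile.status_code` to the helper.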

# Search the page for a specific string
import requests
import re

url = "https://new.ntpu.edu.tw/"
htmlfile = requests.get(url)

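word = input("Enter the string to search for: ")

if word in htmlfile.text:
    print("Found it!")
    # re.escape makes regex special characters in the input match literally;
    # findall returns a list of every match, e.g. ["a", "a", "a"]
    data = re.findall(re.escape(word), htmlfile.text)
    print("Occurrences:", len(data))
else:
    print("Not found...")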
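
One thing to watch for when counting with `re.findall`: the search string is treated as a regular expression, so characters such as `.` or `?` have special meaning. The sketch below uses a made-up sample string instead of a real web page to show the difference. `count_occurrences` is a hypothetical helper name.

```python
import re


def count_occurrences(text, word):
    """Count literal, non-overlapping occurrences of word in text."""
    # re.escape turns regex metacharacters into plain characters.
    return len(re.findall(re.escape(word), text))


sample = "v1.0 and v1.0 but not v1x0"

print(count_occurrences(sample, "v1.0"))  # 2: only the literal "v1.0"
print(len(re.findall("v1.0", sample)))    # 3: "." also matches the "x" in "v1x0"
print(sample.count("v1.0"))               # 2: str.count needs no regex at all
```

For a plain substring count, `str.count` is the simplest option. `re.findall` becomes useful once you actually want pattern matching.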

# A small improvement: check the status code first
import requests

url = "https://new.ntpu.edu.tw/"
htmlfile = requests.get(url)

if htmlfile.status_code == 200:
    print("Page content:\n", htmlfile.text)
else:
    print("Failed to download the page.")
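
Checking for 200 covers the case where the server answers with an error, but a request can also fail before any response arrives, for example with a bad URL or no network connection. One possible way to handle both cases is sketched below. The function name `download` and the 10-second timeout are choices made for this example, not something from the video.

```python
import requests


def download(url, timeout=10):
    """Return the page text, or None if the request fails for any reason."""
    try:
        response = requests.get(url, timeout=timeout)
        # raise_for_status raises an HTTPError for 4xx and 5xx responses.
        response.raise_for_status()
    except requests.exceptions.RequestException as error:
        # RequestException is the base class for the errors requests raises.
        print("Download failed:", error)
        return None
    return response.text
```

You could then write `html = download("https://new.ntpu.edu.tw/")` and test `if html is not None:` before printing.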

# Try another website!
import requests

url = "https://www.kingstone.com.tw/"
htmlfile = requests.get(url)

if htmlfile.status_code == 200:
    print("Page content:\n", htmlfile.text)
else:
    print("Failed to download the page.")

If anything in the video is unclear or incorrect, please leave a comment and let me know. Thank you for your feedback.