Day 22: The Selenium Module, Part 1

2021 iThome Ironman Contest (13th)

Video Tutorials

A Liberal Arts Student's Python Web Scraping Journey, Part 22 of the series

水母君

2021-10-06 13:17:39

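Today's video introduces another powerful module: Selenium.
With it, we can drive a browser from Python and automate many of the actions we would otherwise do by hand.
Even grabbing concert tickets or limited-edition items becomes feasible.
Let's get to know Selenium together!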

Selenium is a third-party module, so before using it you need to install it from the command prompt (CMD):

pip install selenium
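
To confirm that the install worked, one option is to ask Python which version of the package it can see. The sketch below uses only the standard library; the helper name `installed_version` is made up for this example and is not part of Selenium.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if selenium is installed, otherwise a reminder.
version = installed_version("selenium")
if version is None:
    print("selenium is not installed yet; run: pip install selenium")
else:
    print("selenium version:", version)
```

If the reminder is printed even after installing, the `pip` command may belong to a different Python installation than the one running the script.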

Next, download the Chrome browser:

https://www.google.com/intl/zh-TW/chrome/?brand=JJTC&gclid=CjwKCAjw7--KBhAMEiwAxfpkWCvv3CVCnH_sZzoLu1ROgeleWmuJlUuYqPHsr3m01gugT1vldtzgAxoCs9oQAvD_BwE&gclsrc=aw.ds

Finally, download the ChromeDriver build that matches your Chrome version and operating system:

https://chromedriver.chromium.org/home
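
Choosing the right driver mostly comes down to comparing version numbers. The sketch below assumes that agreeing on the major version (the first number) is enough, which is a common rule of thumb rather than an official guarantee. The function names and the version strings are illustrative only.

```python
def major_version(version):
    """Return the first number of a dotted version string, e.g. '94.0.1' -> 94."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version, driver_version):
    """True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Illustrative version strings, not read from a real installation
print(versions_match("94.0.4606.71", "94.0.4606.61"))
print(versions_match("94.0.4606.71", "93.0.4577.63"))
```

You can read your own Chrome version from the browser's "About Chrome" page and compare it with the version listed on the driver download page.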

Below is the code used in the video.

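# Print the type of the object that webdriver.Chrome() returns
# Change C:\\spider\\ to the folder that holds chromedriver.exe on your computer
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driverPath = 'C:\\spider\\chromedriver.exe'
# Selenium 4 takes the driver path through a Service object;
# the older executable_path argument was deprecated and later removed
browser = webdriver.Chrome(service=Service(driverPath))
print(type(browser))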

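# Have the browser open a web page
# Change C:\\spider\\ to the folder that holds chromedriver.exe on your computer
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driverPath = 'C:\\spider\\chromedriver.exe'
# Selenium 4 takes the driver path through a Service object
browser = webdriver.Chrome(service=Service(driverPath))
url = 'https://new.ntpu.edu.tw/'
browser.get(url)  # navigate to the page and wait for it to load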

# A few important attributes
# Change C:\\spider\\ to the folder that holds chromedriver.exe on your computer
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driverPath = 'C:\\spider\\chromedriver.exe'
# Selenium 4 takes the driver path through a Service object
browser = webdriver.Chrome(service=Service(driverPath))
url = 'https://new.ntpu.edu.tw/'
browser.get(url)
print('Browser name:', browser.name)
print('Session id:', browser.session_id)
print('Page URL:', browser.current_url)
print('Page title:', browser.title)
print('Page source:\n', browser.page_source)
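
The scripts above leave the Chrome window open when they finish. A minimal sketch of one way to make sure the browser is always closed, even if an error occurs, is a small context manager that calls `quit()` at the end. The helper `closing_browser` and the `FakeBrowser` stand-in are made up for this example so that it runs without Chrome; `quit()` itself is a real WebDriver method.

```python
from contextlib import contextmanager


@contextmanager
def closing_browser(browser):
    """Yield the browser and always call quit() afterwards."""
    try:
        yield browser
    finally:
        browser.quit()


class FakeBrowser:
    """Stand-in for a Selenium WebDriver so the sketch runs without Chrome."""

    def __init__(self):
        self.closed = False
        self.title = "demo page"

    def quit(self):
        self.closed = True


fake = FakeBrowser()
with closing_browser(fake) as browser:
    print("Page title:", browser.title)
print("Browser closed:", fake.closed)
```

In a real script you would pass the object returned by `webdriver.Chrome(...)` in place of `FakeBrowser()`.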

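# Maybe... it can be combined with bs4?
# Change C:\\spider\\ to the folder that holds chromedriver.exe on your computer
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import bs4

driverPath = 'C:\\spider\\chromedriver.exe'
# Selenium 4 takes the driver path through a Service object
browser = webdriver.Chrome(service=Service(driverPath))
url = 'https://new.ntpu.edu.tw/'
browser.get(url)

# page_source is a plain string of HTML, so BeautifulSoup can parse it directly
objsoup = bs4.BeautifulSoup(browser.page_source, 'lxml')
print(type(objsoup))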
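
The `'lxml'` parser named in the code above comes from a separate package, so BeautifulSoup raises an error if it is not installed. As a hedged sketch, the helper below (the name `pick_parser` is made up for this example) falls back to `'html.parser'`, the parser built into Python, when `lxml` is missing.

```python
from importlib.util import find_spec


def pick_parser():
    """Return 'lxml' if that package is installed, otherwise 'html.parser'."""
    return "lxml" if find_spec("lxml") is not None else "html.parser"


print("Parser to use:", pick_parser())
```

The result can then be passed as the second argument, for example `bs4.BeautifulSoup(browser.page_source, pick_parser())`. The two parsers can treat badly formed HTML slightly differently, so results may not be identical on every page.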

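# Selenium combined with BeautifulSoup
# Change C:\\spider\\ to the folder that holds chromedriver.exe on your computer
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import bs4

driverPath = 'C:\\spider\\chromedriver.exe'
# Selenium 4 takes the driver path through a Service object
browser = webdriver.Chrome(service=Service(driverPath))
url = 'https://www.taiwanlottery.com.tw/index_new.aspx'
browser.get(url)

objsoup = bs4.BeautifulSoup(browser.page_source, 'lxml')

print("="*100)
# Below is the code from last time that scrapes the 雙贏彩 results
doublewin = objsoup.find(class_='contents_box06')
balls = doublewin.find_all(class_='ball_tx ball_blue')

order_1 = []  # numbers in the order they were drawn
order_2 = []  # numbers sorted by size

# The first 12 balls go into order_1; the rest go into order_2
for ball in balls:
    if len(order_1) < 12:
        order_1.append(ball.text)
    else:
        order_2.append(ball.text)

# Final result
draw_time = doublewin.find('span')  # first <span> inside the box
print(draw_time.text)
print("Draw order:", order_1)
print("Sorted order:", order_2)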
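
The loop above fills the first list with the first 12 values and puts everything after that into the second list. The same split can be sketched with list slicing. The function name `split_draw` and the sample numbers are made up for this example and are not real lottery results.

```python
def split_draw(numbers, first_count=12):
    """Split a list into the first `first_count` items and the remainder."""
    return numbers[:first_count], numbers[first_count:]


# Made-up sample standing in for the 24 ball texts scraped from the page
sample = [f"{n:02d}" for n in range(1, 25)]
order_1, order_2 = split_draw(sample)
print("Draw order:", order_1)
print("Sorted order:", order_2)
```

With the real page, `sample` would be replaced by `[ball.text for ball in balls]`.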

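This video and its code are provided for study purposes only. Please do not scrape data in bulk or maliciously, as that puts a burden on the other party's website!
If anything in the video was unclear or wrong, feel free to leave a comment and let me know. Thank you for your feedback.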