2024 iThome Ironman Contest (鐵人賽)
Self-Challenge Group

Learning Python from Scratch series, part 21

[Day21] Python Web Scraping-2

Continuing from the previous post, the following code automatically opens a browser, types "iphone 16" into the search box, and presses Enter.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

chrome_driver_path = "/Users/shiyongchun/Desktop/HTML_CSS/chromedriver/chromedriver"  
service = Service(chrome_driver_path)
driver = webdriver.Chrome(service=service)
driver.get("https://google.com")
search = driver.find_element(By.NAME, "q")
search.send_keys("iphone 16")
search.send_keys(Keys.RETURN)
print(driver.title)

time.sleep(10)
driver.quit()
  1. Extract the titles after searching for a keyword
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

chrome_driver_path = "/Users/shiyongchun/Desktop/HTML_CSS/chromedriver/chromedriver"  
service = Service(chrome_driver_path)
driver = webdriver.Chrome(service=service)
driver.get("https://google.com")
search = driver.find_element(By.NAME, "q")
search.send_keys("iphone 16")
search.send_keys(Keys.RETURN)
print(driver.title)

titles = driver.find_elements(By.CLASS_NAME, "LC20lb")
for title in titles:
    print(title.text)

time.sleep(10)
driver.quit()

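The class name "LC20lb" in titles = driver.find_elements(By.CLASS_NAME, "LC20lb") can be found by searching the page's HTML. To view the page source in Chrome on a Mac, press ⌘ + Option + U; on Windows, press F12 to open the developer tools. Google's class names can change over time, so check the current value before running the code.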
https://ithelp.ithome.com.tw/upload/images/20240911/20168811JsLhLuyRV1.png

Below are the titles extracted after searching Google for "iphone 16".
https://ithelp.ithome.com.tw/upload/images/20240911/20168811TSlXJlUCyh.png
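The list returned by find_elements can include elements whose text is empty, for example results that are not currently displayed, so the printed output may contain blank lines or repeats. Below is a minimal sketch of one way to tidy the list. visible_titles is a hypothetical helper name, and the SimpleNamespace objects only stand in for Selenium's WebElement so the snippet runs without a browser.

```python
from types import SimpleNamespace


def visible_titles(elements):
    """Return the non-empty, de-duplicated text of each element, in page order."""
    seen = set()
    titles = []
    for element in elements:
        text = element.text.strip()
        # Skip elements with empty text and titles that were already collected.
        if text and text not in seen:
            seen.add(text)
            titles.append(text)
    return titles


# Stand-ins for the objects returned by driver.find_elements();
# only the .text attribute is used here. The values are made up.
sample = [
    SimpleNamespace(text="iPhone 16 - Apple"),
    SimpleNamespace(text=""),
    SimpleNamespace(text="  iPhone 16 - Apple  "),
    SimpleNamespace(text="iPhone 16 review"),
]
print(visible_titles(sample))
```

In the script above, the same helper could be called as visible_titles(titles) before the for loop.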

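  2. Waiting for the search results to load
    The problem did not come up in the example above, but when a page is slow to return results, the script may move on to extracting the titles before the results have appeared. An explicit wait makes the script hold until the search has finished.
    https://ithelp.ithome.com.tw/upload/images/20240911/20168811SYvEn9x3K8.png
    For more information, see: explicit wait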
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

With these three imports, an explicit wait can replace the time.sleep() call that would otherwise go before the extraction step.

chrome_driver_path = "/Users/shiyongchun/Desktop/HTML_CSS/chromedriver/chromedriver"  
service = Service(chrome_driver_path)
driver = webdriver.Chrome(service=service)
driver.get("https://google.com")
search = driver.find_element(By.NAME, "q")
search.send_keys("iphone 16")
search.send_keys(Keys.RETURN)

time.sleep(2)  # this line is no longer needed
titles = driver.find_elements(By.CLASS_NAME, "LC20lb")
for title in titles:
    print(title.text)

Instead, run the following code, which waits until a particular element appears on the page before the extraction starts.

WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CLASS_NAME, "logo"))
)

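This waits for Google's logo to appear, for up to 20 seconds. If the element has not appeared by then, until() raises a TimeoutException.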
https://ithelp.ithome.com.tw/upload/images/20240911/2016881149qrIJWelR.png
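To make the idea of an explicit wait more concrete, here is a simplified sketch of the polling loop behind it: check a condition, sleep briefly, and give up after a timeout. This is an illustration in plain Python, not Selenium's actual implementation; wait_until and element_is_present are hypothetical names, and the "element" is faked so the snippet runs without a browser.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Pause between checks instead of sleeping for one fixed, long period.
        time.sleep(poll_interval)


# Hypothetical stand-in for "the element is on the page": ready on the third check.
attempts = {"count": 0}


def element_is_present():
    attempts["count"] += 1
    return "element" if attempts["count"] >= 3 else None


found = wait_until(element_is_present, timeout=2.0, poll_interval=0.01)
print(found, attempts["count"])
```

Unlike a fixed time.sleep(), a loop like this returns as soon as the condition holds, so a fast page is not slowed down and a slow page still gets the full timeout.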
The complete code is as follows:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_driver_path = "/Users/shiyongchun/Desktop/HTML_CSS/chromedriver/chromedriver"  
service = Service(chrome_driver_path)
driver = webdriver.Chrome(service=service)
driver.get("https://google.com")
search = driver.find_element(By.NAME, "q")
search.send_keys("iphone 16")
search.send_keys(Keys.RETURN)
print(driver.title)

WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CLASS_NAME, "logo"))
)

titles = driver.find_elements(By.CLASS_NAME, "LC20lb")
for title in titles:
    print(title.text)

time.sleep(10)
driver.quit()
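One thing to watch in the script above: if the wait times out or any other step raises an error, the final driver.quit() is never reached and the browser window stays open. A minimal sketch of one way to guard against that with try/finally is below. run_with_driver and FakeDriver are hypothetical names used only for illustration; FakeDriver stands in for webdriver.Chrome so the snippet runs without a browser.

```python
def run_with_driver(make_driver, task):
    """Create a driver, run task(driver), and always call driver.quit() afterwards."""
    driver = make_driver()
    try:
        return task(driver)
    finally:
        # Runs whether task() returned normally or raised an exception.
        driver.quit()


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome; the title value is made up."""

    def __init__(self):
        self.title = "iphone 16 - Google Search"
        self.closed = False

    def quit(self):
        self.closed = True


fake = FakeDriver()
print(run_with_driver(lambda: fake, lambda d: d.title))
print(fake.closed)
```

With a real browser, make_driver could be lambda: webdriver.Chrome(service=service), and task would hold the search, wait, and extraction steps.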
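  3. Page interactions
  • Click a link:
# find_element_by_link_text was removed in Selenium 4; use By.LINK_TEXT instead
link = driver.find_element(By.LINK_TEXT, "put the text of the link you want to click here")
link.click()
  • Go back to the previous page:
driver.back()
  • Go forward to the next page:
driver.forward()
  • Clear the search field, so leftover text does not end up in the next keyword:
search.clear()
  4. Scrape Google image results for a keyword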
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.LINK_TEXT, "圖片"))
).click()

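This code clicks the Images tab (labeled 「圖片」 in the Traditional Chinese interface) after the Google search. The link text depends on the interface language, so adjust it if your Google page is shown in another language.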

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


chrome_driver_path = "/Users/shiyongchun/Desktop/HTML_CSS/chromedriver/chromedriver"  
service = Service(chrome_driver_path)
driver = webdriver.Chrome(service=service)
driver.get("https://google.com")
search = driver.find_element(By.NAME, "q")
search.send_keys("iphone 16")
search.send_keys(Keys.RETURN)

WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.LINK_TEXT, "圖片"))
).click()

time.sleep(10)
driver.quit()

https://ithelp.ithome.com.tw/upload/images/20240911/20168811mVSKCg1qJI.png

  • Next, the images can be located by their class
    https://ithelp.ithome.com.tw/upload/images/20240911/20168811FBdL0jNdAC.png
    The images have the class "YQ4gaf", and the src attribute holds the image address.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "YQ4gaf"))
)

imgs = driver.find_elements(By.CLASS_NAME, "YQ4gaf")
for img in imgs:
    print(img.get_attribute("src"))

The code above prints the src value of every matched image.
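Not every src value is necessarily a downloadable link. Assuming some of them are inline data: URIs or missing altogether, which is common for thumbnails but not guaranteed, a small filter like the sketch below can separate the cases before any download step. split_image_sources is a hypothetical helper name and the sample values are made up.

```python
def split_image_sources(sources):
    """Separate http(s) image URLs from inline data: URIs and drop missing values."""
    remote, inline = [], []
    for src in sources:
        if not src:
            # get_attribute() returns None when the attribute is absent.
            continue
        if src.startswith(("http://", "https://")):
            remote.append(src)
        elif src.startswith("data:"):
            inline.append(src)
    return remote, inline


# Made-up values standing in for [img.get_attribute("src") for img in imgs].
sample_sources = [
    "https://example.com/iphone16.jpg",
    "data:image/jpeg;base64,AAAA",
    None,
    "",
]
remote, inline = split_image_sources(sample_sources)
print(len(remote), len(inline))
```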

