網頁圖片爬蟲抓不下來

網路爬蟲爬蟲圖片 python爬蟲

qr88097 2022-11-05 21:48:42 ‧ 1439 瀏覽

分享至

各位前輩好！，因課業需求我需要抓取電商平台網站上的圖片，但是遇到一個問題，我在使用selenium時，會有抓不下來的問題，因為當我游標指到圖片上時，圖片會做更動，如下：
（游標未指）

（游標指向圖片後）

想請問是否不能直接使用find_element(By.CLASSNAME)，來做抓取動作，因為一直抱錯或抓不到，謝謝各位！
以下附上我的程式碼以及網頁資訊：

from ast import keyword
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
import time
import os
import wget 

#設定隨機的延遲時間
import random
import time

delay_choices = range(5,15)  #延遲的秒數
delay = random.choice(delay_choices)  #隨機選取秒數
#time.sleep(delay)  #延遲

#設定使用者代理(User-Agent)
import requests
from fake_useragent import UserAgent
 
keyword = "jeans"
 
user_agent = UserAgent()
response = requests.get(url="https://www2.hm.com/en_asia3/ladies/shop-by-product/jeans.html", headers={ 'user-agent': user_agent.random })

driver = webdriver.Chrome("/Users/wulixuan/Downloads/chromedriver")  
driver.get("https://www2.hm.com/en_asia3/ladies/shop-by-product/jeans.html")
time.sleep(4)


cookie = driver.find_element(By.ID, 'onetrust-accept-btn-handler')
cookie.click()
time.sleep(2)

for i in range(8):
    driver.execute_script("window.scrollTo(0, 6900);")
    time.sleep(delay)
    loadmore = driver.find_element(By.XPATH,"/html/body/main/div/div/div/div[3]/div[2]/button")
    loadmore.click()
#By.XPATH, '//*[@id="page-content"]/div/div[3]/div[2]/button'
imgs = driver.find_elements(By.CLASS_NAME, 'item-image')

path = os.path.join("H&M" + keyword)
os.mkdir(path)

count = 0
for img in imgs:
    save_as = os.path.join(path, keyword + str(count) + '.jpg')
    #print(img.get_attribute("src")) #取得他的屬性也就是src，且要下載就需要取得src
    wget.download(img.get_attribute("src"), save_as)
    count += 1

time.sleep(6)
driver.quit()