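Chat
Yesterday we used Selenium to scrape Dcard. Today we will keep scraping Dcard, this time by simulating what a real user does: scrolling the page.
Expected result
Implementation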
from selenium import webdriver
from time import sleep
import json

if __name__ == '__main__':
    # Ask how many times the page should be scrolled
    scroll_time = int(input('Enter the number of scrolls: '))
    driver = webdriver.Chrome()
    driver.get('https://www.dcard.tw/f')
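Next, we can use JavaScript's window.scrollTo to achieve the effect we want.

from selenium import webdriver
from time import sleep
import json

if __name__ == '__main__':
    scroll_time = int(input('Enter the number of scrolls: '))
    driver = webdriver.Chrome()
    driver.get('https://www.dcard.tw/f')
    sleep(2)  # wait for the page to finish loading
    # The script must be passed to Selenium as a string
    js = 'window.scrollTo(0, document.body.scrollHeight)'
    driver.execute_script(js)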
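The scrolling step can also be wrapped in a small helper. The sketch below is only an illustration: the name scroll_to_bottom and its parameters are hypothetical, and it assumes the page height grows whenever new posts are loaded, which may not hold on every site. It relies only on the driver's execute_script method.

```python
from time import sleep

SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight);"
HEIGHT_JS = "return document.body.scrollHeight;"


def scroll_to_bottom(driver, max_scrolls, pause=2.0):
    """Scroll down at most max_scrolls times.

    Stops early when the page height no longer changes.
    Returns the number of scrolls actually performed.
    """
    last_height = driver.execute_script(HEIGHT_JS)
    performed = 0
    for _ in range(max_scrolls):
        driver.execute_script(SCROLL_JS)
        performed += 1
        sleep(pause)  # give the page time to load more posts
        new_height = driver.execute_script(HEIGHT_JS)
        if new_height == last_height:
            break  # nothing new was loaded, so stop scrolling
        last_height = new_height
    return performed
```

With a real browser this would be called as scroll_to_bottom(driver, scroll_time) after driver.get(...). Stopping early avoids waiting through scrolls that load nothing new.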
We then collect the posts after every scroll, and wrap the extraction in try-except so the program keeps running smoothly when an element cannot be found.

from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
import json

if __name__ == '__main__':
    scroll_time = int(input('Enter the number of scrolls: '))
    driver = webdriver.Chrome()
    driver.get('https://www.dcard.tw/f')
    results = []
    for now_time in range(1, scroll_time + 1):
        sleep(2)
        # find_elements (plural) returns a list of all matching post blocks.
        # The find_element_by_* helpers were removed in newer Selenium 4 releases.
        eles = driver.find_elements(By.CLASS_NAME, 'sc-afbc95aa-0')
        for ele in eles:
            try:
                title = ele.find_element(By.CLASS_NAME, 'sc-afbc95aa-2').text
                href = ele.find_element(By.CLASS_NAME, 'sc-afbc95aa-2').get_attribute('href')
                subtitle = ele.find_element(By.CLASS_NAME, 'sc-5914a055-0').text
                result = {
                    'title': title,
                    'href': href,
                    'subtitle': subtitle,
                }
                results.append(result)
            except Exception:
                # Skip blocks that lack one of the expected elements
                pass
        print(f"now scroll {now_time}/{scroll_time}")
        js = 'window.scrollTo(0, document.body.scrollHeight)'
        driver.execute_script(js)
    with open('Dcard-articles.json', 'w', encoding='utf-8') as f:
        json.dump(results, f, indent=2,
                  sort_keys=True, ensure_ascii=False)
    driver.quit()  # close the browser
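The JSON-writing step at the end can be checked on its own. The following is a minimal sketch with hypothetical helper names (save_articles, load_articles). It uses the same json.dump options as the script, and ensure_ascii=False is what keeps Chinese titles readable in the file instead of turning them into \uXXXX escapes.

```python
import json


def save_articles(results, path):
    """Write the scraped posts to path as readable UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2, sort_keys=True, ensure_ascii=False)


def load_articles(path):
    """Read the posts back from path."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Reading the file back with load_articles is a quick way to confirm that the output is valid JSON before relying on it elsewhere.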
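Each pass above re-reads every post currently on the page, so earlier posts are collected again. To avoid that, we remember the last element we processed (prev_ele) and start from the one after it on the next pass.

from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
import json

if __name__ == '__main__':
    scroll_time = int(input('Enter the number of scrolls: '))
    driver = webdriver.Chrome()
    driver.get('https://www.dcard.tw/f')
    results = []
    prev_ele = None
    for now_time in range(1, scroll_time + 1):
        sleep(2)
        eles = driver.find_elements(By.CLASS_NAME, 'sc-afbc95aa-0')
        try:
            # Keep only the elements after the last one handled in the previous pass
            eles = eles[eles.index(prev_ele) + 1:]
        except ValueError:
            # First pass, or prev_ele is no longer on the page: process everything
            pass
        for ele in eles:
            try:
                title = ele.find_element(By.CLASS_NAME, 'sc-afbc95aa-2').text
                href = ele.find_element(By.CLASS_NAME, 'sc-afbc95aa-2').get_attribute('href')
                subtitle = ele.find_element(By.CLASS_NAME, 'sc-5914a055-0').text
                result = {
                    'title': title,
                    'href': href,
                    'subtitle': subtitle,
                }
                results.append(result)
            except Exception:
                # Skip blocks that lack one of the expected elements
                pass
        if eles:
            # Remember where this pass stopped
            prev_ele = eles[-1]
        print(f"now scroll {now_time}/{scroll_time}")
        js = 'window.scrollTo(0, document.body.scrollHeight)'
        driver.execute_script(js)
    with open('Dcard-articles.json', 'w', encoding='utf-8') as f:
        json.dump(results, f, indent=2,
                  sort_keys=True, ensure_ascii=False)
    driver.quit()  # close the browser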
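Tracking prev_ele depends on the previous element still being present in the page after scrolling. As an alternative sketch (not the approach used in this post), duplicates can be removed after collection by treating href as a key. The helper name dedupe_by_href is hypothetical, and it assumes every post has a unique href.

```python
def dedupe_by_href(results):
    """Return the posts with duplicate href values removed, keeping first occurrences."""
    seen = set()
    unique = []
    for item in results:
        href = item.get("href")
        if href in seen:
            continue  # this post was already collected in an earlier pass
        seen.add(href)
        unique.append(item)
    return unique
```

It could be applied to results just before json.dump. Order is preserved, so the output still follows the order in which posts appeared on the page.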
Closing
Today we practiced the scrolling technique, and the code ran successfully!
Tomorrow, let's talk about web automation~
Tomorrow!
[Day 22] Understanding and Implementing Hash Values
References
HTML DOM Quick Tour: the window object's scrollTo() method, https://pydoing.blogspot.com/2011/10/javascript-window-scrollto.html
Native JS: smooth scrolling to a position on the page with window.scrollTo, https://www.796t.com/content/1541737292.html