iT邦幫忙


Scrolling to the bottom of Google Drive with Selenium


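I want to scrape the names of everything in a Google Drive folder, but the usual ways of scrolling down don't work. I then tried pressing the Down arrow key instead. It raises no error, but it has no effect either: what I scrape looks as if the page never scrolled. Or did I point the XPath at the wrong element?
(Alternatively, is there a way to find the JavaScript behind the scrollbar, or to drag it with the mouse, to make this work?)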

# Only the imports this script actually uses
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Block browser notification pop-ups
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications": 2}
chrome_options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(options=chrome_options)  # Selenium 4 uses options=, not chrome_options=

url = "YOUR_GOOGLE_DRIVE_URL"  # replace with the Google Drive folder URL
driver.get(url)

# Attempt 1: press the Down arrow key on the file-list element
div = driver.find_element(By.XPATH, '//*[@class="PolqHc sd-ph"]/div')
div.send_keys(Keys.DOWN)

# Attempt 2: scroll the whole document via JavaScript
js = "document.documentElement.scrollTop = 100000"
driver.execute_script(js)

# Parse the page only after scrolling, then print every file/folder name
soup = BeautifulSoup(driver.page_source, "html.parser")
for title in soup.select('.Q5txwe'):
    print(title.text)
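Scrolling like this works on other sites, but not on Google Drive.
Are you sure you're doing the mouse scroll inside the "My Drive ▼" container, and not on the top-level body?
zong1220 · iT Newbie, level 5 · 2021-08-13 13:16:14
I do want to scroll inside My Drive, but I don't know how to set that up.
Use the Elements panel in F12 (DevTools) to look for the container that holds "My Drive". I couldn't find it myself; maybe it's loaded dynamically, or my skills just aren't there. Perhaps an expert has a trick for this.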
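One way to act on the container hint without hunting through DevTools by hand: ask the page which element actually overflows vertically, and scroll that element instead of `document.documentElement`. This is a heuristic sketch, not a Drive-specific API. It assumes the file list is the element whose computed `overflow-y` is `auto` or `scroll` and which hides the most content, and the helper names `find_scroll_container` and `scroll_container_to_bottom` are hypothetical. Selenium's `execute_script` hands a returned DOM element back as a `WebElement`, and passing that element in again as `arguments[0]` makes the script target it.

```python
# Hypothetical helpers: they only need an object with Selenium's
# driver.execute_script(script, *args) method.

# JavaScript heuristic: among elements whose computed overflow-y allows
# scrolling, return the one with the largest hidden vertical overflow.
FIND_SCROLLABLE_JS = """
let best = null, bestExtra = 0;
for (const el of document.querySelectorAll('*')) {
  const style = getComputedStyle(el);
  if (!/(auto|scroll)/.test(style.overflowY)) continue;
  const extra = el.scrollHeight - el.clientHeight;
  if (extra > bestExtra) { best = el; bestExtra = extra; }
}
return best;
"""

# Scroll one specific element (arguments[0]) to its current bottom.
SCROLL_TO_BOTTOM_JS = "arguments[0].scrollTop = arguments[0].scrollHeight;"


def find_scroll_container(driver):
    """Return the element that most likely scrolls, or None if none is found."""
    return driver.execute_script(FIND_SCROLLABLE_JS)


def scroll_container_to_bottom(driver, container):
    """Scroll the given element (not the window) to its current bottom."""
    driver.execute_script(SCROLL_TO_BOTTOM_JS, container)
```

With a logged-in `driver` already on the folder page, `container = find_scroll_container(driver)` followed by `scroll_container_to_bottom(driver, container)` scrolls the list once. If Drive loads more rows after each scroll, the call has to be repeated.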

2 answers

japhenchen · iT Superhuman, level 1 · 2021-08-13 14:45:31

Try this:

lenOfPage = driver.execute_script('window.scrollTo(0, [hard code the height])')
zong1220 · iT Newbie, level 5 · 2021-08-13 20:06:50

That throws an error: Message: javascript error: Unexpected identifier

You really took that literally, huh.

lenOfPage = driver.execute_script('window.scrollTo(0, [hard code the height])')

The "hard code the height" part means a scroll height you hard-code yourself, so you can simply write

lenOfPage = driver.execute_script('window.scrollTo(0, 5000)')
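A side note on that line: `window.scrollTo` returns `undefined`, so `lenOfPage` ends up as `None`. To read the page length, the script has to `return` a value such as `scrollHeight`. The sketch below is an assumption-laden illustration, not something from this thread (the name `scroll_until_stable` and the pause and round limits are made up): it reads the height with `return`, scrolls to the bottom, and repeats until the height stops growing, the usual way to exhaust a lazily loaded list. Passing an element scrolls that element instead of the page, which matters on sites like Drive where the list lives in an inner container.

```python
import time


def scroll_until_stable(driver, element=None, pause=1.5, max_rounds=50):
    """Scroll to the bottom repeatedly until scrollHeight stops growing.

    If element is None, the page itself (document.scrollingElement) is
    scrolled; otherwise the given element, e.g. an inner container.
    Returns the last scrollHeight read.
    """
    target = "arguments[0]" if element is not None else "document.scrollingElement"
    scroll_js = f"{target}.scrollTop = {target}.scrollHeight;"
    height_js = f"return {target}.scrollHeight;"  # 'return' makes Selenium hand back the number
    args = (element,) if element is not None else ()

    last_height = driver.execute_script(height_js, *args)
    for _ in range(max_rounds):
        driver.execute_script(scroll_js, *args)
        time.sleep(pause)  # give lazily loaded rows time to arrive
        new_height = driver.execute_script(height_js, *args)
        if new_height == last_height:
            break  # nothing new loaded, assume we reached the end
        last_height = new_height
    return last_height
```

The `max_rounds` cap is a safety net so a page that keeps growing forever cannot hang the script.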

zong1220 · iT Newbie, level 5 · 2021-08-13 21:50:19

Oh, I see, got it! But it still doesn't scroll.

Try moving the mouse to the middle of the screen and then scrolling.
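To follow the mouse suggestion literally, Selenium 4.2 and later can send real wheel events through `ActionChains.scroll_from_origin` with a `ScrollOrigin`. `ScrollOrigin.from_element` starts the wheel at the center of the given element, so the events land on the file list rather than on `body`. A sketch, assuming Selenium 4.2+ and that `element` is the file-list container (for example one located via DevTools); `wheel_scroll_over`, `wheel_deltas`, the step size and the pause are hypothetical choices, not Drive requirements:

```python
import time


def wheel_deltas(total_px, step_px=300):
    """Split a total scroll distance into wheel-sized steps (pure helper)."""
    if step_px <= 0:
        raise ValueError("step_px must be positive")
    full, rest = divmod(total_px, step_px)
    return [step_px] * full + ([rest] if rest else [])


def wheel_scroll_over(driver, element, total_px=3000, step_px=300, pause=0.5):
    """Send mouse-wheel events positioned over `element` (Selenium 4.2+)."""
    # Imported lazily so wheel_deltas can be used without Selenium installed.
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.actions.wheel_input import ScrollOrigin

    # The origin is the center of the element, i.e. the mouse sits over the list.
    origin = ScrollOrigin.from_element(element)
    for delta_y in wheel_deltas(total_px, step_px):
        ActionChains(driver).scroll_from_origin(origin, 0, delta_y).perform()
        time.sleep(pause)  # short pauses give lazy loading a chance to react
```

Scrolling in several small steps rather than one large jump is a guess at what lazily loading lists respond to best; a single `scroll_from_origin(origin, 0, total_px)` call also works if it turns out to be enough.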

froce · iT Master, level 1 · 2021-08-13 15:31:33

Google Drive has an API, so just fetch the data through the API.
You need to enable the API first.
In theory, normal usage won't cost anything.

from __future__ import print_function
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials

# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly']

def main():
    """Shows basic usage of the Drive v3 API.
    Prints the names and ids of the first 10 files the user has access to.
    """
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())

    service = build('drive', 'v3', credentials=creds)

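    # Call the Drive v3 API: list folders only, following nextPageToken
    # until every page has been fetched (one call returns a single page)
    items = []
    page_token = None
    while True:
        results = service.files().list(
            q="mimeType='application/vnd.google-apps.folder'",
            fields='nextPageToken, files(id, name)',
            pageToken=page_token,
        ).execute()
        items.extend(results.get('files', []))
        page_token = results.get('nextPageToken')
        if not page_token:
            break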

    if not items:
        print('No files found.')
    else:
        print('Files:')
        for item in items:
            print(u'{0} ({1})'.format(item['name'], item['id']))

if __name__ == '__main__':
    main()
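Building on this answer: the question actually asks for every name inside one folder, which the Drive v3 `files.list` query syntax expresses as `'<folder_id>' in parents`. The sketch below assumes a `service` built exactly as above and a folder ID taken from the folder's URL (the part after `/folders/`); `list_children` and `walk_folder` are hypothetical helper names. It follows `nextPageToken` and recurses into subfolders. Items on shared drives would additionally need the `supportsAllDrives` and `includeItemsFromAllDrives` parameters, which are left out here.

```python
FOLDER_MIME = "application/vnd.google-apps.folder"


def list_children(service, folder_id):
    """Return every non-trashed item directly inside folder_id, across all pages."""
    items, page_token = [], None
    while True:
        resp = service.files().list(
            q=f"'{folder_id}' in parents and trashed = false",
            fields="nextPageToken, files(id, name, mimeType)",
            pageSize=1000,  # the maximum page size files.list accepts
            pageToken=page_token,
        ).execute()
        items.extend(resp.get("files", []))
        page_token = resp.get("nextPageToken")
        if not page_token:
            return items


def walk_folder(service, folder_id, prefix=""):
    """Yield 'path/name' for everything under folder_id, descending into subfolders."""
    for item in list_children(service, folder_id):
        path = f"{prefix}{item['name']}"
        yield path
        if item["mimeType"] == FOLDER_MIME:
            yield from walk_folder(service, item["id"], path + "/")
```

For example, `for path in walk_folder(service, 'YOUR_FOLDER_ID'): print(path)` placed inside `main()` after `service` is built prints one path per line.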
zong1220 · iT Newbie, level 5 · 2021-08-13 20:07:13

OK, thanks, I'll give it a try!
