11th iThome Ironman Contest

AI & Data

Hands on Data Cleaning and Scraping series, part 29

Day29 Scraping from IMDb with Selenium 1/2

First, take a look at how the movie rating site IMDb lays out its data, then extract the fields we need and store them. The code is adapted from the article listed in the references.
https://ithelp.ithome.com.tw/upload/images/20190927/20119709dwjrx6c9Y1.jpg

# Import the required packages
from pyquery import PyQuery as pq
import pandas as pd

def get_movie_info(movie_url):
    """
    Get movie info from a given IMDb title page URL.
    """
    d = pq(url=movie_url)  # download and parse the page
    movie_rating = float(d("strong span").text())  # movie rating
    # The .subtext links hold the genres, followed by the release date
    movie_genre = [x.text() for x in d(".subtext a").items()]
    movie_released_date = movie_genre.pop()  # last link is the release date; pop() removes it from the genre list
    movie_poster = d(".poster img").attr('src')  # poster image URL
    movie_cast = [x.text() for x in d(".primary_photo+ td a").items()]  # cast names

    # Return the movie info as a dict
    movie_info = {
        "Rating": movie_rating,
        "Released_Date": movie_released_date,
        "Genre": movie_genre,
        "Poster_Link": movie_poster,
        "Cast": movie_cast
    }
    return movie_info

# Fetch the info for one movie to see what it looks like
the_dressmaker = get_movie_info("https://www.imdb.com/title/tt2910904/")
print(the_dressmaker)

https://ithelp.ithome.com.tw/upload/images/20190927/20119709oBnsczOxYg.jpg
https://ithelp.ithome.com.tw/upload/images/20190927/20119709EnjMM9jiw3.jpg
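get_movie_info assumes every selector matches: float() raises a ValueError on an empty string, and pop() raises an IndexError on an empty list. The following is a minimal sketch of how those two steps could be guarded. The helper names parse_rating and split_genres_and_date are hypothetical and not part of the original code, and the sample strings are placeholders rather than scraped values.

```python
from typing import List, Optional, Tuple


def parse_rating(text: str) -> Optional[float]:
    """Return the rating as a float, or None when the text is empty or not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def split_genres_and_date(link_texts: List[str]) -> Tuple[List[str], Optional[str]]:
    """Split the link texts into (genres, release_date).

    Assumes, as get_movie_info does, that the last link holds the release date.
    """
    if not link_texts:
        return [], None
    *genres, released_date = link_texts
    return genres, released_date


# Placeholder strings for illustration, not scraped values
print(parse_rating("7.1"))
print(parse_rating(""))
print(split_genres_and_date(["Comedy", "Drama", "1 January 2000 (Country)"]))
```

Inside get_movie_info, these could wrap d("strong span").text() and the list built from d(".subtext a"), so that a page with a missing field yields None instead of an exception.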

# Convert the result into a DataFrame to inspect it
df = pd.DataFrame.from_dict(the_dressmaker, orient='index')
df.transpose()

https://ithelp.ithome.com.tw/upload/images/20190927/20119709wCa0hGTSQL.jpg
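As an alternative to from_dict followed by transpose, a dict can be wrapped in a list so that pandas treats it as one row. The sketch below uses a made-up record with the same keys as the dict returned by get_movie_info; the values are placeholders, not data from IMDb. It also joins the list-valued columns into plain strings, which may be convenient when saving the row to a file.

```python
import pandas as pd

# Placeholder record shaped like the dict returned by get_movie_info;
# the values are made up for illustration, not scraped from IMDb.
movie_info = {
    "Rating": 7.1,
    "Released_Date": "1 January 2000 (Country)",
    "Genre": ["Comedy", "Drama"],
    "Poster_Link": "https://example.com/poster.jpg",
    "Cast": ["Actor A", "Actor B", "Actor C"],
}

# A list holding one dict becomes a one-row DataFrame; list values stay in single cells
df = pd.DataFrame([movie_info])

# Join list-valued columns into plain strings so the row is easier to save, e.g. with flat.to_csv
flat = df.copy()
for col in ("Genre", "Cast"):
    flat[col] = flat[col].apply(", ".join)

print(flat.loc[0, "Genre"])
```

With several movies, the same pattern extends naturally: collect one dict per movie in a list and pass the whole list to pd.DataFrame.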

The code for this post is available on GitHub.

Please let me know if there are any mistakes in this article. Thanks for reading.

References:

[1] Scraping website data by controlling the browser (透過操控瀏覽器擷取網站資料)

[2] What version of Chrome do I have?

[3] ChromeDriver - WebDriver for Chrome

[4] IMDb

[5] Stack Overflow

