iT邦幫忙

11th iThome Ironman Contest (第 11 屆 iThome 鐵人賽)

DAY 22
AI & Data

Hands on Data Cleaning and Scraping series, part 22

Day22 Airbnb in Berlin 3/5: the ring zone

https://ithelp.ithome.com.tw/upload/images/20190922/20119709F5RTtjIG5F.png
Visitors to Berlin will find that public transport fares are divided radially into three zones: A, B, and C. Long-term tickets come in only three combinations (A+B, B+C, and A+B+C), and most tourists stay within A+B. I couldn't find data that specifies the exact boundary of the A+B zones, so I use Berlin's Low-emission Zone Area as a proxy instead. The low-emission zone covers roughly the centre of Berlin inside the S-Bahn ring, where listings are easier to reach by public transport, so we will analyse only the listings in this area. For fun, let's call it Berlin's "egg yolk zone" (蛋黃區) XD.
https://ithelp.ithome.com.tw/upload/images/20190922/20119709sg8MBHrLpy.jpg
https://ithelp.ithome.com.tw/upload/images/20190922/20119709u9bklt8ks2.jpg

First, import the packages we need and read in the data we are about to analyse.

# import the packages we need
import pandas as pd
import numpy as np
ringzipcode = pd.read_csv('airbnb/ring_zipcode.csv') # read in the ring_zipcode.csv file
ringzipcode.columns = ['zipcode', 'info'] # rename the columns
ringzipcode.head(10) # show the first 10 rows

https://ithelp.ithome.com.tw/upload/images/20190922/20119709aTtXC1kMBH.jpg

ringzipcode = np.array(ringzipcode)
ringzipcode_list = ringzipcode.tolist()
print(ringzipcode_list[:10]) # print the first 10 entries

https://ithelp.ithome.com.tw/upload/images/20190922/20119709oMkiBPdgyN.jpg

# we only want the postcodes of areas that lie entirely inside the S-Bahn ring
# ('innerhalb' is German for "inside")
rz = []
for i in ringzipcode_list:
    if i[1] == 'innerhalb':
        rz.append(i[0])
print(rz)

https://ithelp.ithome.com.tw/upload/images/20190922/20119709eIY0g82JhG.jpg
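
The same filter can also be written as a boolean mask on the DataFrame, which skips the NumPy and list conversions. This is only a minimal sketch: the sample table, its postcodes, and the label 'teilweise' are made-up assumptions for illustration and are not taken from ring_zipcode.csv.

```python
import pandas as pd

# Hypothetical sample with the same two columns as ringzipcode; values are made up.
sample = pd.DataFrame({
    'zipcode': [10115, 10117, 10243, 12043],
    'info': ['innerhalb', 'innerhalb', 'teilweise', 'innerhalb'],
})

# Select the zipcode column for rows labelled 'innerhalb' and turn it into a list.
inside = sample.loc[sample['info'] == 'innerhalb', 'zipcode'].tolist()
print(inside)
```

With the real data, `sample` would be the `ringzipcode` DataFrame before it is converted to an array.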

import warnings
warnings.filterwarnings("ignore") # suppress warning messages
listing = pd.read_csv('airbnb/listings.csv') # read in the listing file
print('There are', listing.id.nunique(), 'listings in the listing data.')
listing.info() # summary of the columns and dtypes
listing.head(3) # show the first three rows

https://ithelp.ithome.com.tw/upload/images/20190922/20119709kswrMZiRwo.jpg

abzone = listing["zipcode"].isin(rz)
ablisting = listing[abzone]
ablisting.info()
print('Fewer than half of the listings remain after filtering by postcodes inside the S-Bahn ring.')

https://ithelp.ithome.com.tw/upload/images/20190922/20119709sPwy4KCZxn.jpg
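
One thing worth checking at this step: `isin` compares exact values, so a postcode stored as the string '10115' does not match the integer 10115. The sketch below assumes the zipcode column might contain mixed types (strings, floats, missing values), which may or may not be true of listings.csv. It converts both sides to numbers before matching and reports how many rows were kept. The helper name `to_zip_number` and the sample rows are hypothetical.

```python
import pandas as pd

def to_zip_number(values):
    """Convert postcodes to numbers; anything unparseable becomes NaN."""
    return pd.to_numeric(pd.Series(values), errors='coerce')

# Hypothetical listings with postcodes stored in inconsistent types.
listings = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'zipcode': ['10115', 10243.0, None, '99999'],
})
ring = [10115, 10243]  # stand-in for the rz list

# Match on the numeric form of both sides; NaN never matches.
mask = to_zip_number(listings['zipcode']).isin(to_zip_number(ring))
kept = listings[mask]
print(f'{len(kept)} of {len(listings)} listings kept ({len(kept) / len(listings):.0%})')
```

Printing the kept share this way also replaces a hard-coded remark with a number computed from the data.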

# the top 10 neighbourhoods by listing count changed after filtering;
# surprisingly, some outlying areas have many listings
grouped_ab_df = ablisting.groupby('neighbourhood_cleansed').count()[['id']].sort_values('id', ascending=False).head(10) 
grouped_ab_df

https://ithelp.ithome.com.tw/upload/images/20190922/201197096kmtpbVRdf.jpg

grouped_ab_df.index

https://ithelp.ithome.com.tw/upload/images/20190922/20119709gEsqnIJIRz.jpg

# the 10 neighbourhoods with the most listings
top10 = []
for i in range(10):
    top10.append(grouped_ab_df.index[i])
print(top10)

https://ithelp.ithome.com.tw/upload/images/20190922/20119709w4TasGLOnN.jpg
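
The ranking and the list of names can also be obtained with `value_counts`, which counts rows per neighbourhood and sorts them in descending order. This is a minimal sketch on made-up rows, not the author's data. It gives the same counts as the `groupby(...).count()[['id']]` approach only under the assumption that the id column has no missing values.

```python
import pandas as pd

# Hypothetical sample; the ids and their neighbourhood assignments are made up.
sample = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'neighbourhood_cleansed': [
        'Alexanderplatz', 'Alexanderplatz', 'Alexanderplatz',
        'Reuterstraße', 'Reuterstraße', 'Helmholtzplatz',
    ],
})

# Count listings per neighbourhood (sorted descending) and keep the top 2.
counts = sample['neighbourhood_cleansed'].value_counts().head(2)

# The index holds the neighbourhood names, so no explicit loop is needed.
top_names = counts.index.tolist()
print(top_names)
```

Building the list from the index avoids looping over a fixed `range(10)`, which would raise an IndexError if fewer than ten neighbourhoods remained after filtering.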

ablisting_iftop10 = ablisting["neighbourhood_cleansed"].isin(top10)
ab_top10_listing = ablisting[ablisting_iftop10]
ab_top10_listing.info()

https://ithelp.ithome.com.tw/upload/images/20190922/20119709PL0DzeWT7C.jpg

ab_top10_listing.head()

https://ithelp.ithome.com.tw/upload/images/20190922/20119709Y8gaXNp2ww.jpg

# save the listings in the top 10 neighbourhoods within the low-emission zone
# to a new CSV file for tomorrow's analysis
ab_top10_listing.to_csv('ab_top10_listing.csv')
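
Note that `to_csv` writes the row index by default, so reading the file back produces an extra `Unnamed: 0` column. Whether that matters depends on how the file is read later. The sketch below uses a made-up two-row table and in-memory strings instead of files, and shows the difference that `index=False` makes.

```python
import io
import pandas as pd

# Hypothetical filtered table; the index is non-contiguous, as it would be after filtering.
sample = pd.DataFrame({'id': [11, 22], 'zipcode': [10115, 10243]}, index=[5, 9])

# With no path argument, to_csv returns the CSV text as a string.
with_index = sample.to_csv()
without_index = sample.to_csv(index=False)

cols_with = pd.read_csv(io.StringIO(with_index)).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index)).columns.tolist()
print(cols_with)     # includes the unnamed index column
print(cols_without)  # only the original columns
```

If the old row positions are not needed later, passing `index=False` keeps the saved file limited to the original columns.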

The code and example files are available on GitHub.

Please let me know if there are any mistakes in this article. Thanks for reading.

References:

[1] Inside Airbnb

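[2] 利用Airbnb來更了解居住城市,以臺北為例 Python實作(上) (Using Airbnb to better understand the city you live in, with Taipei as an example: Python implementation, part 1)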

[3] Airbnb listings in Berlin

[4] Low-emission Zone Area

[5] How To Use The Berlin Public Transport Without A Fine

