iT邦幫忙

2019 iT 邦幫忙鐵人賽

DAY 4
1
AI & Data

Scrapy爬蟲與資料處理30天筆記系列 第 4

[Day 04] CSV 讀寫操作

Day 04

嗨,第四天,今天來說明csv基本的檔案的讀/寫操作吧!(應該不會太快吧?)
(題外話,今天跟同學們搭車來到板橋,中北部溫差有感,要注意保暖啊~)

The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases.

CSV是常見的格式,可以匯入/匯出格式給試算表和資料庫使用,在上一年鐵人競賽的時候我有介紹了pandas這個套件如何操作csv檔案,而今天會介紹csv這個套件如何使用,當然,如果使用Scrapy框架的話就不太需要自己做這個部分,但我們還是知道一下單純操作該怎麼做。

  • (option) 在開始之前,啟用virtualenv
source path/to/your/virtualenv/bin/activate

若沒有csv套件,可用pip安裝:

  • 若是pip3則將pip改成pip3
pip install csv

那就開始吧!

The csv module’s reader and writer objects read and write sequences. Programmers can also read and write data in dictionary form using the DictReader and DictWriter classes.

來看看下面的範例:

假設現在有筆資料檔案(jamesbond.csv),Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salaryheader,其餘為資料內容,如下:

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
...

接著用open開啟它:

import csv
with open('jamesbond.csv') as f:
    myCsv = csv.reader(f)
    headers = next(myCsv)
    ...

The next() function returns the next item in an iterator. You can add a default return value, to return if the iterable has reached to its end.

這裡next()用來讀取下一行的資料,而csv的第一行為header。

for迴圈讀取csv每一行的資料:

import csv
with open('jamesbond.csv') as f:
    myCsv = csv.reader(f)
    headers = next(myCsv)
    for row in myCsv:
        print(row)

可以看到每次輸出rowlist的資料型態:

'Dr. No', '1962', 'Sean Connery', 'Terence Young', '448.8', '7', '0.6']
['From Russia with Love', '1963', 'Sean Connery', 'Terence Young', '543.8', '12.6', '1.6']
['Goldfinger', '1964', 'Sean Connery', 'Guy Hamilton', '820.4', '18.6', '3.2']
['Thunderball', '1965', 'Sean Connery', 'Terence Young', '848.1', '41.9', '4.7']

因此,可以用讀取list的方式分別讀取裡面的值,像是row[0]讀取Film的資料,這裡就自己嘗試看看吧!

還有另外一種讀取方式的方法,可以將資料轉為Dictionary格式:

import csv
with open('./jamesbond.csv') as f:
    myCsvDic = csv.DictReader(f)
    for row in myCsvDic:
        print(row['Film'])

print(row)可以看到像是OrderedDict([('Film', 'Dr. No'), ('Year', '1962'), ('Actor', 'Sean Connery'), ('Director', 'Terence Young'), ('Box Office', '448.8'), ('Budget', '7'), ('Bond Actor Salary', '0.6')])的內容,可以看到它是一個OrderedDict的物件型態,可以用取得Dictionary Value的方式取值,key就是header,上面範例中我們取每行row的Film

前面都是說明如何讀取,現在,就要說明寫入的部分了!

  • 首先先定義要寫入的資料,一般來說,csv需要第一行一行header來定義有哪些column,用list的格式儲存有哪些欄位。
  • 每一行rowtuple的方式並全部放在一個list裡面。
  • 我們就以剛剛的資料來當範例,如下:
headers = ['Film', 'Year', 'Actor', 'Director', 'Box Office', 'Budget', 'Bond Actor Salary']
rows = [('Film', 'Dr. No'), ('Year', '1962'), ('Actor', 'Sean Connery'), ('Director', 'Terence Young'), ('Box Office', '448.8'), ('Budget', '7'), ('Bond Actor Salary', '0.6'),
         ('Film', 'From Russia with Love'), ('Year', '1963'), ('Actor', 'Sean Connery'), ('Director', 'Terence Young'), ('Box Office', '543.8'), ('Budget', '12.6'), ('Bond Actor Salary', '1.6'),
         ('Film', 'Goldfinger'), ('Year', '1964'), ('Actor', 'Sean Connery'), ('Director', 'Guy Hamilton'), ('Box Office', '820.4'), ('Budget', '18.6'), ('Bond Actor Salary', '3.2'),]

再來就是寫入檔案了,先用writer(),先寫入第一行headers,再把資料內容rows寫入:

with open('writedFile.csv','w') as f:
    writeCsv = csv.writer(f)
    writeCsv.writerow(headers)
    writeCsv.writerows(rows)

除了上面的方式,也可以使用DictWriter(),直接定義headers並可以將資料以Dictionary方式寫入。只要將所有的資料(dic)存在一個list內,用writerows()

rows = [{'Film':'Dr. No','Year': '1962', 'Actor': 'Sean Connery', 'Director': 'Terence Young', 'Box Office': '448.8', 'Budget':'7', 'Bond Actor Salary': '0.6'},
         {'Film': 'From Russia with Love', 'Year': '1963', 'Actor': 'Sean Connery', 'Director': 'Terence Young', 'Box Office': '543.8', 'Budget': '12.6', 'Bond Actor Salary': '1.6'},
         {'Film': 'Goldfinger', 'Year': '1964', 'Actor': 'Sean Connery', 'Director': 'Guy Hamilton', 'Box Office': '820.4', 'Budget': '18.6', 'Bond Actor Salary': '3.2'}]

with open('writedFile.csv','w') as f:
    writedCsv = csv.DictWriter(f, headers)
    writedCsv.writeheader()
    writedCsv.writerows(rows)

這裡說明一下dialect,編碼風格,默認為逗點(,)分隔,不過也支援自定義,只要透過register_dialect方法。

register_dialect()

  • delimiter : 分隔符號,默認為逗號。
  • name : 為dialect的名稱
csv.register_dialect('myDialect', delimiter='|', quoting=csv.QUOTE_ALL)

範例:

  • 建立一個myTextFile.csv檔案,內容:
Film|Year|Actor|Director|Box Office|Budget|Bond Actor Salary
Dr. No|1962|Sean Connery|Terence Young|448.8|7|0.6
From Russia with Love|1963|Sean Connery|Terence Young|543.8|12.6|1.6
Goldfinger|1964|Sean Connery|Guy Hamilton|820.4|18.6|3.2
  • 程式碼:
with open('./myTextFile.csv','r') as f:
    csv.register_dialect('my_delimite', delimiter='|', quoting=csv.QUOTE_ALL)
    lines = csv.reader(f, 'my_delimite')
    for line in lines:
        print(line)
# output
#['Film', 'Year', 'Actor', 'Director', 'Box Office', 'Budget', 'Bond Actor Salary']
# ['Dr. No', '1962', 'Sean Connery', 'Terence Young', '448.8', '7', '0.6']
# ['From Russia with Love', '1963', 'Sean Connery', 'Terence Young', '543.8', '12.6', '1.6']
# ['Goldfinger', '1964', 'Sean Connery', 'Guy Hamilton', '820.4', '18.6', '3.2'] 

可以發現輸出是以|隔開的,那這就是register_dialect的用法,用來變更分隔符號。

可以比較跟下面這個程式碼的差別(讀取同個檔案),可以看到輸出就沒有被分隔開來,因為預設為逗點(,)

with open('./myTextFile.csv','r') as f:
    lines = csv.reader(f)
    for i in lines:
        print(i)
# ['Film|Year|Actor|Director|Box Office|Budget|Bond Actor Salary']
# ['Dr. No|1962|Sean Connery|Terence Young|448.8|7|0.6']
# ['From Russia with Love|1963|Sean Connery|Terence Young|543.8|12.6|1.6']
# ['Goldfinger|1964|Sean Connery|Guy Hamilton|820.4|18.6|3.2']
  • 當然,在寫入時也可以增加參數delimiter,定義分隔的福號,像這樣:
writer = csv.writer(csvfile, delimiter=' ')
  • 列出在目前csv中的所有delimiter
print (csv.list_dialects())
  • 刪除delimiter
unregister_dialect(name)

今天說明了如何操作CSV的檔案~明天就說說json怎麼使用吧?

參考來源:
13.1. csv — CSV File Reading and Writing — Python v3.2.6 documentation
Python next() Function


上一篇
[Day 03] 檔案讀取/寫入
下一篇
[Day 05] JSON編碼解碼操作
系列文
Scrapy爬蟲與資料處理30天筆記30

尚未有邦友留言

立即登入留言