Day 04
嗨,第四天,今天來說明csv
基本的檔案的讀/寫操作吧!(應該不會太快吧?)
(題外話,今天跟同學們搭車來到板橋,中北部溫差有感,要注意保暖啊~)
The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases.
CSV
是常見的格式,可以匯入/匯出格式給試算表和資料庫使用,在上一年鐵人競賽的時候我有介紹了pandas
這個套件如何操作csv檔案,而今天會介紹csv
這個套件如何使用,當然,如果使用Scrapy
框架的話就不太需要自己做這個部分,但我們還是知道一下單純操作該怎麼做。
virtualenv
:source path/to/your/virtualenv/bin/activate
若沒有csv
套件,可用pip
安裝:
pip3
則將pip
改成pip3
pip install csv
那就開始吧!
The
csv module
’sreader
andwriter
objects read and write sequences. Programmers can also read and write data indictionary
form using theDictReader
andDictWriter
classes.
來看看下面的範例:
假設現在有筆資料檔案(jamesbond.csv),Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
為header
,其餘為資料內容,如下:
Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
...
接著用open
開啟它:
import csv
with open('jamesbond.csv') as f:
myCsv = csv.reader(f)
headers = next(myCsv)
...
The next() function returns the next item in an iterator. You can add a default return value, to return if the iterable has reached to its end.
這裡next()
用來讀取下一行的資料,而csv
的第一行為header。
用for迴圈
讀取csv每一行的資料:
import csv
with open('jamesbond.csv') as f:
myCsv = csv.reader(f)
headers = next(myCsv)
for row in myCsv:
print(row)
可以看到每次輸出row
為list
的資料型態:
'Dr. No', '1962', 'Sean Connery', 'Terence Young', '448.8', '7', '0.6']
['From Russia with Love', '1963', 'Sean Connery', 'Terence Young', '543.8', '12.6', '1.6']
['Goldfinger', '1964', 'Sean Connery', 'Guy Hamilton', '820.4', '18.6', '3.2']
['Thunderball', '1965', 'Sean Connery', 'Terence Young', '848.1', '41.9', '4.7']
因此,可以用讀取list的方式分別讀取裡面的值,像是row[0]
讀取Film
的資料,這裡就自己嘗試看看吧!
還有另外一種讀取方式的方法,可以將資料轉為Dictionary
格式:
import csv
with open('./jamesbond.csv') as f:
myCsvDic = csv.DictReader(f)
for row in myCsvDic:
print(row['Film'])
print(row)
可以看到像是OrderedDict([('Film', 'Dr. No'), ('Year', '1962'), ('Actor', 'Sean Connery'), ('Director', 'Terence Young'), ('Box Office', '448.8'), ('Budget', '7'), ('Bond Actor Salary', '0.6')])
的內容,可以看到它是一個OrderedDict的物件型態,可以用取得Dictionary Value的方式取值,key
就是header
,上面範例中我們取每行row的Film
。
前面都是說明如何讀取
,現在,就要說明寫入
的部分了!
list
的格式儲存有哪些欄位。row
用tuple
的方式並全部放在一個list裡面。headers = ['Film', 'Year', 'Actor', 'Director', 'Box Office', 'Budget', 'Bond Actor Salary']
rows = [('Film', 'Dr. No'), ('Year', '1962'), ('Actor', 'Sean Connery'), ('Director', 'Terence Young'), ('Box Office', '448.8'), ('Budget', '7'), ('Bond Actor Salary', '0.6'),
('Film', 'From Russia with Love'), ('Year', '1963'), ('Actor', 'Sean Connery'), ('Director', 'Terence Young'), ('Box Office', '543.8'), ('Budget', '12.6'), ('Bond Actor Salary', '1.6'),
('Film', 'Goldfinger'), ('Year', '1964'), ('Actor', 'Sean Connery'), ('Director', 'Guy Hamilton'), ('Box Office', '820.4'), ('Budget', '18.6'), ('Bond Actor Salary', '3.2'),]
再來就是寫入檔案了,先用writer()
,先寫入第一行headers
,再把資料內容rows
寫入:
with open('writedFile.csv','w') as f:
writeCsv = csv.writer(f)
writeCsv.writerow(headers)
writeCsv.writerows(rows)
除了上面的方式,也可以使用DictWriter()
,直接定義headers並可以將資料以Dictionary方式寫入。只要將所有的資料(dic)存在一個list
內,用writerows()
。
rows = [{'Film':'Dr. No','Year': '1962', 'Actor': 'Sean Connery', 'Director': 'Terence Young', 'Box Office': '448.8', 'Budget':'7', 'Bond Actor Salary': '0.6'},
{'Film': 'From Russia with Love', 'Year': '1963', 'Actor': 'Sean Connery', 'Director': 'Terence Young', 'Box Office': '543.8', 'Budget': '12.6', 'Bond Actor Salary': '1.6'},
{'Film': 'Goldfinger', 'Year': '1964', 'Actor': 'Sean Connery', 'Director': 'Guy Hamilton', 'Box Office': '820.4', 'Budget': '18.6', 'Bond Actor Salary': '3.2'}]
with open('writedFile.csv','w') as f:
writedCsv = csv.DictWriter(f, headers)
writedCsv.writeheader()
writedCsv.writerows(rows)
這裡說明一下dialect
,編碼風格,默認為逗點(,)分隔,不過也支援自定義,只要透過register_dialect
方法。
register_dialect()
delimiter
: 分隔符號,默認為逗號。name
: 為dialect的名稱csv.register_dialect('myDialect', delimiter='|', quoting=csv.QUOTE_ALL)
範例:
myTextFile.csv
檔案,內容:Film|Year|Actor|Director|Box Office|Budget|Bond Actor Salary
Dr. No|1962|Sean Connery|Terence Young|448.8|7|0.6
From Russia with Love|1963|Sean Connery|Terence Young|543.8|12.6|1.6
Goldfinger|1964|Sean Connery|Guy Hamilton|820.4|18.6|3.2
with open('./myTextFile.csv','r') as f:
csv.register_dialect('my_delimite', delimiter='|', quoting=csv.QUOTE_ALL)
lines = csv.reader(f, 'my_delimite')
for line in lines:
print(line)
# output
#['Film', 'Year', 'Actor', 'Director', 'Box Office', 'Budget', 'Bond Actor Salary']
# ['Dr. No', '1962', 'Sean Connery', 'Terence Young', '448.8', '7', '0.6']
# ['From Russia with Love', '1963', 'Sean Connery', 'Terence Young', '543.8', '12.6', '1.6']
# ['Goldfinger', '1964', 'Sean Connery', 'Guy Hamilton', '820.4', '18.6', '3.2']
可以發現輸出是以|
隔開的,那這就是register_dialect
的用法,用來變更分隔符號。
可以比較跟下面這個程式碼的差別(讀取同個檔案),可以看到輸出就沒有被分隔開來,因為預設為逗點(,)
:
with open('./myTextFile.csv','r') as f:
lines = csv.reader(f)
for i in lines:
print(i)
# ['Film|Year|Actor|Director|Box Office|Budget|Bond Actor Salary']
# ['Dr. No|1962|Sean Connery|Terence Young|448.8|7|0.6']
# ['From Russia with Love|1963|Sean Connery|Terence Young|543.8|12.6|1.6']
# ['Goldfinger|1964|Sean Connery|Guy Hamilton|820.4|18.6|3.2']
delimiter
,定義分隔的福號,像這樣:writer = csv.writer(csvfile, delimiter=' ')
csv
中的所有delimiter
:print (csv.list_dialects())
delimiter
unregister_dialect(name)
今天說明了如何操作CSV的檔案~明天就說說json
怎麼使用吧?
參考來源:
13.1. csv — CSV File Reading and Writing — Python v3.2.6 documentation
Python next() Function