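Today we continue with csvkit.
csvcut slices a CSV file by column: it can keep selected columns, drop the rest, or change their order.
Examples:
-n lists the columns with their indices: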
$ csvcut -n data.csv
1: state
2: county
3: fips
4: nsn
5: item_name
6: quantity
7: ui
8: acquisition_cost
9: total_cost
10: ship_date
11: federal_supply_category
12: federal_supply_category_name
13: federal_supply_class
14: federal_supply_class_name
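To show what -n reports, here is a minimal Python sketch. It uses only the standard csv module and is not csvkit's implementation. It reads the header row and numbers the names starting from 1. HEADER is a stand-in for data.csv whose names are copied from the list above, and list_columns is a hypothetical helper name.

```python
import csv
import io

# Stand-in for data.csv: only the header row matters for -n.
# Column names copied from the csvcut -n output above.
HEADER = (
    "state,county,fips,nsn,item_name,quantity,ui,acquisition_cost,total_cost,"
    "ship_date,federal_supply_category,federal_supply_category_name,"
    "federal_supply_class,federal_supply_class_name\n"
)

def list_columns(f):
    """Return 'index: name' strings for the header row, numbered from 1."""
    header = next(csv.reader(f))
    return [f"{i}: {name}" for i, name in enumerate(header, start=1)]

for line in list_columns(io.StringIO(HEADER)):
    print(line)
```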
-c outputs only the specified columns, given here by index:
$ csvcut -c 2,5,6 data.csv | head
county,item_name,quantity
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
BUFFALO,"RIFLE,5.56 MILLIMETER",1
BUFFALO,"RIFLE,5.56 MILLIMETER",1
BUFFALO,"RIFLE,5.56 MILLIMETER",1
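Because csvcut parses the CSV properly, "RIFLE,7.62 MILLIMETER" stays a single field despite its comma. Below is a minimal standard-library Python sketch of index-based cutting, not csvkit's code. cut_by_index and SAMPLE are hypothetical names, and SAMPLE is a made-up two-line stand-in for data.csv whose unshown fields are left empty. Listing the indices in a different order reorders the columns, and omitting an index drops that column.

```python
import csv
import io

# Hypothetical two-line stand-in for data.csv (first six columns only);
# fields the article does not show are left empty.
SAMPLE = 'state,county,fips,nsn,item_name,quantity\n,ADAMS,,,"RIFLE,7.62 MILLIMETER",1\n'

def cut_by_index(src, dst, indices):
    """Copy only the given 1-based columns, in the order listed.

    A different order reorders the columns; a missing index drops that column.
    """
    writer = csv.writer(dst, lineterminator="\n")  # re-quotes fields containing commas
    for row in csv.reader(src):
        writer.writerow([row[i - 1] for i in indices])

out = io.StringIO()
cut_by_index(io.StringIO(SAMPLE), out, [2, 5, 6])
print(out.getvalue(), end="")
# county,item_name,quantity
# ADAMS,"RIFLE,7.62 MILLIMETER",1
```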
Columns can also be given by name. Piping the result into csvlook, introduced yesterday, displays it as a table:
$ csvcut -c county,item_name,quantity data.csv | head | csvlook
|----------+-----------------------+-----------|
|  county  | item_name             | quantity  |
|----------+-----------------------+-----------|
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  BUFFALO | RIFLE,5.56 MILLIMETER | 1         |
|  BUFFALO | RIFLE,5.56 MILLIMETER | 1         |
|  BUFFALO | RIFLE,5.56 MILLIMETER | 1         |
|----------+-----------------------+-----------|
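The pipeline above does two separate things: it selects columns by name, then pads each column to its widest value to form a grid. Here is a minimal standard-library Python sketch of both steps. It only loosely imitates csvlook's layout, and its borders are not byte-identical. cut_by_name, look and SAMPLE are hypothetical names, and SAMPLE is a made-up stand-in for data.csv.

```python
import csv
import io

# Hypothetical stand-in for data.csv; fields not shown in the article are empty.
SAMPLE = (
    "state,county,fips,nsn,item_name,quantity\n"
    ',ADAMS,,,"RIFLE,7.62 MILLIMETER",1\n'
    ',BUFFALO,,,"RIFLE,5.56 MILLIMETER",1\n'
)

def cut_by_name(src, names):
    """Yield the header and each row restricted to the named columns."""
    reader = csv.reader(src)
    header = next(reader)
    positions = [header.index(n) for n in names]  # ValueError if a name is unknown
    yield [header[p] for p in positions]
    for row in reader:
        yield [row[p] for p in positions]

def look(rows):
    """Render rows as a fixed-width grid, loosely like csvlook's output."""
    rows = list(rows)
    # Each column is as wide as its longest cell (header included).
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    divider = "|" + "+".join("-" * (w + 2) for w in widths) + "|"

    def fmt(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    lines = [divider, fmt(rows[0]), divider]
    lines += [fmt(row) for row in rows[1:]]
    lines.append(divider)
    return "\n".join(lines)

print(look(cut_by_name(io.StringIO(SAMPLE), ["county", "item_name", "quantity"])))
```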
To compute summary statistics for a CSV file, use csvstat.
An example:
$ csvcut -c county,acquisition_cost,ship_date data.csv | csvstat
1. county
    <type 'unicode'>
    Nulls: False
    Unique values: 35
    5 most frequent values:
        DOUGLAS: 760
        DAKOTA: 42
        CASS: 37
        HALL: 23
        LANCASTER: 18
    Max length: 10
2. acquisition_cost
    <type 'float'>
    Nulls: False
    Min: 0.0
    Max: 412000.0
    Sum: 5438254.0
    Mean: 5249.27992278
    Median: 6000.0
    Standard Deviation: 13360.1600088
    Unique values: 75
    5 most frequent values:
        6800.0: 304
        10747.0: 195
        6000.0: 105
        499.0: 98
        0.0: 81
3. ship_date
    <type 'datetime.date'>
    Nulls: False
    Min: 1984-12-31
    Max: 2054-12-31
    Unique values: 84
    5 most frequent values:
        2013-04-25: 495
        2013-04-26: 160
        2008-05-20: 28
        2012-04-16: 26
        2006-11-17: 20

Row count: 1036
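To see where numbers like these come from, here is a minimal Python sketch that computes a few of the same per-column summaries with the standard csv, collections and statistics modules. It is an illustration, not csvkit's implementation. csvkit infers more types (dates, for example), whereas this sketch only tries float. It also uses the sample standard deviation (statistics.stdev), which may differ from what csvstat reports. column_stats and SAMPLE are hypothetical names, and SAMPLE holds made-up values.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical three-row sample; values are illustrative, not from data.csv.
SAMPLE = """county,acquisition_cost
DOUGLAS,6800.0
DOUGLAS,499.0
DAKOTA,6800.0
"""

def column_stats(src, name, top=5):
    """Summarise one column: nulls, unique values, most frequent values,
    plus numeric stats when every non-empty value parses as a float."""
    values = [row[name] for row in csv.DictReader(src)]
    counts = Counter(values)
    stats = {
        "nulls": any(v == "" for v in values),
        "unique": len(counts),
        "most_common": counts.most_common(top),
    }
    non_empty = [v for v in values if v != ""]
    try:
        nums = [float(v) for v in non_empty]
    except ValueError:
        nums = None  # text column: report the longest value instead
        stats["max_length"] = max(len(v) for v in non_empty)
    if nums:
        stats.update(min=min(nums), max=max(nums), sum=sum(nums),
                     mean=statistics.mean(nums), median=statistics.median(nums))
        if len(nums) > 1:
            stats["stdev"] = statistics.stdev(nums)  # sample standard deviation
    return stats

print(column_stats(io.StringIO(SAMPLE), "acquisition_cost"))
print(column_stats(io.StringIO(SAMPLE), "county"))
```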
Pretty handy.