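Today we continue the tour of csvkit.
csvcut slices columns out of a CSV file, and can also drop or reorder them.
Examples:
-n lists the column names together with their indices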
$ csvcut -n data.csv 
  1: state
  2: county
  3: fips
  4: nsn
  5: item_name
  6: quantity
  7: ui
  8: acquisition_cost
  9: total_cost
 10: ship_date
 11: federal_supply_category
 12: federal_supply_category_name
 13: federal_supply_class
 14: federal_supply_class_name
-c outputs only the specified columns (here by index)
$ csvcut -c 2,5,6 data.csv | head
county,item_name,quantity
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
BUFFALO,"RIFLE,5.56 MILLIMETER",1
BUFFALO,"RIFLE,5.56 MILLIMETER",1
BUFFALO,"RIFLE,5.56 MILLIMETER",1
Column names work as well; here the result is piped into csvlook, introduced yesterday
$ csvcut -c county,item_name,quantity data.csv | head | csvlook
|----------+-----------------------+-----------|
|  county  | item_name             | quantity  |
|----------+-----------------------+-----------|
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  BUFFALO | RIFLE,5.56 MILLIMETER | 1         |
|  BUFFALO | RIFLE,5.56 MILLIMETER | 1         |
|  BUFFALO | RIFLE,5.56 MILLIMETER | 1         |
|----------+-----------------------+-----------|
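The examples above only select columns. To make the "drop" and "reorder" cases concrete, here is a minimal Python sketch of the same idea using only the standard library. It is not how csvcut is implemented; the helper name cut_columns and the sample rows (including the "NE" state value) are made up for illustration. In csvcut itself, reordering follows the order given to -c, and exclusion is done with the -C option.

```python
import csv
import io


def cut_columns(text, keep=None, drop=None):
    """Return CSV text with only the `keep` columns (in that order),
    or, if `keep` is None, with the `drop` columns removed."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if keep is not None:
        names = list(keep)
    else:
        dropped = set(drop or [])
        names = [name for name in header if name not in dropped]
    # Translate column names into positions in the original header.
    indices = [header.index(name) for name in names]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    for row in reader:
        writer.writerow([row[i] for i in indices])
    return out.getvalue()


# Hypothetical sample rows shaped like the data shown above.
SAMPLE = (
    "state,county,item_name,quantity\n"
    'NE,ADAMS,"RIFLE,7.62 MILLIMETER",1\n'
    'NE,BUFFALO,"RIFLE,5.56 MILLIMETER",1\n'
)

# Reorder: quantity first, then county.
reordered = cut_columns(SAMPLE, keep=["quantity", "county"])
# Drop: everything except the state column.
without_state = cut_columns(SAMPLE, drop=["state"])

print(reordered)
print(without_state)
```

Fields that contain a comma, such as the item names, stay quoted on output because the csv module handles the quoting.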
To get some summary statistics for a CSV file, use csvstat.
Here is an example:
$ csvcut -c county,acquisition_cost,ship_date data.csv | csvstat
  1. county
	<type 'unicode'>
	Nulls: False
	Unique values: 35
	5 most frequent values:
		DOUGLAS:	760
		DAKOTA:	42
		CASS:	37
		HALL:	23
		LANCASTER:	18
	Max length: 10
  2. acquisition_cost
	<type 'float'>
	Nulls: False
	Min: 0.0
	Max: 412000.0
	Sum: 5438254.0
	Mean: 5249.27992278
	Median: 6000.0
	Standard Deviation: 13360.1600088
	Unique values: 75
	5 most frequent values:
		6800.0:	304
		10747.0:	195
		6000.0:	105
		499.0:	98
		0.0:	81
  3. ship_date
	<type 'datetime.date'>
	Nulls: False
	Min: 1984-12-31
	Max: 2054-12-31
	Unique values: 84
	5 most frequent values:
		2013-04-25:	495
		2013-04-26:	160
		2008-05-20:	28
		2012-04-16:	26
		2006-11-17:	20
Row count: 1036
Pretty handy.
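For a rough feel of what goes into a report like this, the sketch below computes a few of the same figures (unique values, most frequent values, min, max, sum, mean, median) for a single column with the Python standard library. It is only an approximation under simple assumptions: the helper name summarize and the sample rows are hypothetical, and csvstat's real type inference (dates, nulls and so on) is not reproduced.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical sample rows; not taken from data.csv.
SAMPLE = (
    "county,acquisition_cost\n"
    "DOUGLAS,6800\n"
    "DOUGLAS,6800\n"
    "DAKOTA,499\n"
    "CASS,0\n"
)


def summarize(text, column, top=2):
    """Return a dict of simple statistics for one column of CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    values = [row[column] for row in rows]
    summary = {
        "row_count": len(rows),
        "unique": len(set(values)),
        "most_frequent": Counter(values).most_common(top),
    }
    try:
        numbers = [float(value) for value in values]
    except ValueError:
        # Not a numeric column: skip the numeric statistics.
        return summary
    summary.update(
        min=min(numbers),
        max=max(numbers),
        sum=sum(numbers),
        mean=statistics.mean(numbers),
        median=statistics.median(numbers),
    )
    return summary


county = summarize(SAMPLE, "county")
cost = summarize(SAMPLE, "acquisition_cost")

print(county)
print(cost)
```

Text columns get only the counts, while numeric columns also get the numeric figures, which mirrors the difference between the county and acquisition_cost sections of the csvstat output.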