iT邦幫忙

DAY 17

Rather Cute Databases and Data Processing series, Part 17

CSVKit 2


Today we continue our look at csvkit.

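Slicing data

csvcut slices a CSV file by column: it can select columns, remove them, or change their order.

Example:

-n lists the column names along with their index numbers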

$ csvcut -n data.csv 
  1: state
  2: county
  3: fips
  4: nsn
  5: item_name
  6: quantity
  7: ui
  8: acquisition_cost
  9: total_cost
 10: ship_date
 11: federal_supply_category
 12: federal_supply_category_name
 13: federal_supply_class
 14: federal_supply_class_name
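
For readers who want to see what this listing amounts to, here is a minimal Python sketch using only the standard library. It is not how csvkit is implemented; `list_columns` and `HEADER` are names made up for this illustration, and the header line is copied from the listing above.

```python
import csv
import io

# Header row copied from the csvcut -n listing above (no data rows needed here).
HEADER = (
    "state,county,fips,nsn,item_name,quantity,ui,acquisition_cost,total_cost,"
    "ship_date,federal_supply_category,federal_supply_category_name,"
    "federal_supply_class,federal_supply_class_name\n"
)


def list_columns(csv_file):
    """Return (1-based index, name) pairs for the header row of a CSV file object."""
    reader = csv.reader(csv_file)
    header = next(reader)
    return list(enumerate(header, start=1))


for index, name in list_columns(io.StringIO(HEADER)):
    # Right-align the index in three characters, similar to the listing above.
    print(f"{index:3d}: {name}")
```

For a real file you would pass `open("data.csv", newline="")` instead of the `io.StringIO` object.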

-c outputs only the specified columns

$ csvcut -c 2,5,6 data.csv | head
county,item_name,quantity
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
ADAMS,"RIFLE,7.62 MILLIMETER",1
BUFFALO,"RIFLE,5.56 MILLIMETER",1
BUFFALO,"RIFLE,5.56 MILLIMETER",1
BUFFALO,"RIFLE,5.56 MILLIMETER",1
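
Note that item_name contains a comma inside quotes, so a naive split on "," would break these rows. The following Python sketch shows the same kind of column selection with a proper CSV parser. It is only an illustration under my own assumptions: `cut_columns` and `SAMPLE` are hypothetical names, and the sample rows are taken from the output above.

```python
import csv
import io

# Two rows copied from the output above, with their header.
SAMPLE = '''county,item_name,quantity
ADAMS,"RIFLE,7.62 MILLIMETER",1
BUFFALO,"RIFLE,5.56 MILLIMETER",1
'''


def cut_columns(csv_file, columns):
    """Yield rows that contain only the requested columns, in the requested order.

    Each entry in `columns` is either a 1-based index (int) or a column name (str).
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    # Translate every request into a 0-based position in the row.
    positions = [c - 1 if isinstance(c, int) else header.index(c) for c in columns]
    yield [header[p] for p in positions]
    for row in reader:
        yield [row[p] for p in positions]


for row in cut_columns(io.StringIO(SAMPLE), [1, "item_name"]):
    print(row)
```

Because the positions are applied in the order given, passing `["quantity", "county"]` would also reorder the columns in this sketch.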

You can also refer to columns by name. Here the result is piped through csvlook, which was introduced yesterday:

$ csvcut -c county,item_name,quantity data.csv | head | csvlook

|----------+-----------------------+-----------|
|  county  | item_name             | quantity  |
|----------+-----------------------+-----------|
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  ADAMS   | RIFLE,7.62 MILLIMETER | 1         |
|  BUFFALO | RIFLE,5.56 MILLIMETER | 1         |
|  BUFFALO | RIFLE,5.56 MILLIMETER | 1         |
|  BUFFALO | RIFLE,5.56 MILLIMETER | 1         |
|----------+-----------------------+-----------|
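
The introduction mentioned that columns can also be removed, which the examples so far have not shown. As a rough idea of what removing a column by name involves, here is a small Python sketch. `drop_columns` and `SAMPLE` are hypothetical names for this illustration, not part of csvkit, and the rows are again taken from the output above.

```python
import csv
import io

# Two rows copied from the earlier output, with their header.
SAMPLE = '''county,item_name,quantity
ADAMS,"RIFLE,7.62 MILLIMETER",1
BUFFALO,"RIFLE,5.56 MILLIMETER",1
'''


def drop_columns(csv_text, unwanted):
    """Return CSV text with the named columns removed and the rest left in order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in unwanted]
    out = io.StringIO()
    # extrasaction="ignore" makes the writer skip the keys we no longer want.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()


print(drop_columns(SAMPLE, {"quantity"}), end="")
```

The writer quotes item_name again on output because the value contains a comma, so the result is still valid CSV.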

Data statistics

To compute summary statistics for a CSV file, use csvstat.

Here is an example:

$ csvcut -c county,acquisition_cost,ship_date data.csv | csvstat
  1. county
	<type 'unicode'>
	Nulls: False
	Unique values: 35
	5 most frequent values:
		DOUGLAS:	760
		DAKOTA:	42
		CASS:	37
		HALL:	23
		LANCASTER:	18
	Max length: 10
  2. acquisition_cost
	<type 'float'>
	Nulls: False
	Min: 0.0
	Max: 412000.0
	Sum: 5438254.0
	Mean: 5249.27992278
	Median: 6000.0
	Standard Deviation: 13360.1600088
	Unique values: 75
	5 most frequent values:
		6800.0:	304
		10747.0:	195
		6000.0:	105
		499.0:	98
		0.0:	81
  3. ship_date
	<type 'datetime.date'>
	Nulls: False
	Min: 1984-12-31
	Max: 2054-12-31
	Unique values: 84
	5 most frequent values:
		2013-04-25:	495
		2013-04-26:	160
		2008-05-20:	28
		2012-04-16:	26
		2006-11-17:	20

Row count: 1036
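
To make the report above less of a black box, here is a minimal Python sketch that computes a few of the same figures for one column. This is an approximation under my own assumptions, not csvstat's actual logic: `column_stats` and `SAMPLE` are hypothetical names, the caller says whether a column is numeric (csvstat infers types itself), numeric columns are assumed to have no empty cells, and the sample is just the nine rows shown earlier in this post.

```python
import csv
import io
import statistics
from collections import Counter

# The nine data rows shown in the csvcut examples earlier in this post.
SAMPLE = (
    "county,item_name,quantity\n"
    + 6 * 'ADAMS,"RIFLE,7.62 MILLIMETER",1\n'
    + 3 * 'BUFFALO,"RIFLE,5.56 MILLIMETER",1\n'
)


def column_stats(csv_text, column, numeric=False):
    """Compute a small csvstat-like summary for one column."""
    values = [row[column] for row in csv.DictReader(io.StringIO(csv_text))]
    stats = {
        "row_count": len(values),
        "nulls": any(v == "" for v in values),
        "unique": len(set(values)),
        "most_frequent": Counter(values).most_common(5),
    }
    if numeric:
        # Assumes every cell parses as a number (no empty values).
        numbers = [float(v) for v in values]
        stats.update(
            min=min(numbers),
            max=max(numbers),
            sum=sum(numbers),
            mean=statistics.mean(numbers),
            median=statistics.median(numbers),
        )
    else:
        stats["max_length"] = max(len(v) for v in values)
    return stats


print(column_stats(SAMPLE, "county"))
print(column_stats(SAMPLE, "quantity", numeric=True))
```

The real tool does considerably more, such as type inference and date handling, so in practice csvstat remains the easier choice.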

Pretty convenient.

