
## [Day10] Learning Pandas - Series, DataFrame, Index

### Prerequisites

``````pip install pandas
``````

## Pandas objects

pandas has three core objects: Series, DataFrame, and Index. Most operations are built on these three.

### Series (one-dimensional array)

``````import numpy as np
import pandas as pd
ironman = pd.Series([0.11,0.22,0.33,0.44])
ironman
``````
``````# Output
0    0.11
1    0.22
2    0.33
3    0.44
dtype: float64
``````

``````print('ironman.values------->',ironman.values)
print('ironman.index------->',ironman.index)
``````
``````# Output
ironman.values-------> [0.11 0.22 0.33 0.44]
ironman.index-------> RangeIndex(start=0, stop=4, step=1)
``````

``````ironman = pd.Series([0.11,0.22,0.33,0.44], index=['a','b','c','d'])
ironman
``````
``````# Output
a    0.11
b    0.22
c    0.33
d    0.44
dtype: float64
``````

``````dic_ironman = {
'a': 11,
'b': 22,
'c': 33
}
ironman = pd.Series(dic_ironman)
ironman
``````
``````# Output
a    11
b    22
c    33
dtype: int64
``````
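A dict can also be combined with an explicit `index`. The sketch below reuses the dict from above and assumes a label `'z'` that is not in the dict, purely for illustration, to show what pandas does with a missing key.

```python
import pandas as pd

dic_ironman = {'a': 11, 'b': 22, 'c': 33}

# Only the keys listed in `index` are kept, in the order given.
subset = pd.Series(dic_ironman, index=['c', 'a'])
print(subset)

# A label that is missing from the dict ('z' is hypothetical) becomes NaN,
# which also turns the dtype into float64.
with_missing = pd.Series(dic_ironman, index=['a', 'z'])
print(with_missing)
```

So the `index` argument acts as a selection and ordering of the dict keys rather than as a relabeling of the values.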

### DataFrame (composed of multiple Series)

Like a Series, a DataFrame can be given an explicit index. You can think of a DataFrame as several Series sharing the same index.

``````number = pd.Series({'taipei':200, 'taichung': 300, 'changhua': 400, 'kaohsiung' : 150})
mayor = pd.Series({'taipei': 'Kui', 'taichung': 'Ha', 'changhua': 'Chin', 'kaohsiung' : 'Lui'})
ironman_df = pd.DataFrame({'number':number, 'mayor':mayor})
ironman_df
``````
``````# Output
|  | number | mayor |
| -------- | -------- | -------- |
| taipei     | 200     | Kui     |
| taichung     | 300     | Ha     |
| changhua     | 400     | Chin     |
| kaohsiung     | 150     | Lui     |
``````

``````print('ironman_df.values------->',ironman_df.values)
print('ironman_df.index------->',ironman_df.index)
print('ironman_df.columns------->',ironman_df.columns)
``````
``````# Output
ironman_df.values-------> [[200 'Kui']
[300 'Ha']
[400 'Chin']
[150 'Lui']]
ironman_df.index-------> Index(['taipei', 'taichung', 'changhua', 'kaohsiung'], dtype='object')
ironman_df.columns-------> Index(['number', 'mayor'], dtype='object')
``````
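To make the "multiple Series" picture concrete, here is a small sketch that rebuilds the same DataFrame and pulls one column and one row back out. Both come back as Series.

```python
import pandas as pd

number = pd.Series({'taipei': 200, 'taichung': 300, 'changhua': 400, 'kaohsiung': 150})
mayor = pd.Series({'taipei': 'Kui', 'taichung': 'Ha', 'changhua': 'Chin', 'kaohsiung': 'Lui'})
ironman_df = pd.DataFrame({'number': number, 'mayor': mayor})

# Selecting a column returns a Series that shares the DataFrame's index.
col = ironman_df['number']
print(type(col).__name__)
print(col.index.equals(ironman_df.index))

# Selecting a row by label returns a Series indexed by the column names.
row = ironman_df.loc['taipei']
print(row)
```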

1. A DataFrame can be built from a Series
2. A DataFrame can also be built from a dict
``````pd.DataFrame(number, columns=['number'])  # from a single Series
pd.DataFrame({'number': {'taipei': 200, 'taichung': 300, 'changhua': 400, 'kaohsiung': 150}})  # from a dict
``````
``````# Output
|  | number |
| -------- | -------- |
| taipei     | 200     |
| taichung     | 300     |
| changhua     | 400     |
| kaohsiung     | 150     |
``````

``````# Build a NumPy structured array, then convert it to a DataFrame
team = np.zeros(4, dtype={'names': ('name', 'number', 'team'), 'formats': ('U10', 'i2', 'U10')})
team['name'] = ['彭政閔', '林智勝', '蘇偉達', '陽耀勳']
team['number'] = [23, 32, 96, 23]
team['team'] = ['兄弟象', '兄弟象', '兄弟象', 'lamigo']
pd.DataFrame(team)
``````
``````# Output
name	number	team
0	彭政閔	  23	兄弟象
1	林智勝	  32	兄弟象
2	蘇偉達	  96	兄弟象
3	陽耀勳	  23	lamigo
``````

### Index (immutable array)

``````ironman_index = pd.Index([0.11,0.22,0.33,0.44])
ironman_index
``````
``````# Output
Float64Index([0.11, 0.22, 0.33, 0.44], dtype='float64')
``````
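The "immutable" part can be checked directly. The following is a minimal sketch: reading from an Index works like reading from an array, while item assignment raises a `TypeError`. The `mutated` flag is just a helper name for this illustration.

```python
import pandas as pd

ironman_index = pd.Index([0.11, 0.22, 0.33, 0.44])

# Reading works like an array: indexing, slicing, len().
print(ironman_index[1])
print(ironman_index[:2])
print(len(ironman_index))

# Writing is rejected: Index objects do not support item assignment.
try:
    ironman_index[0] = 0.99
    mutated = True
except TypeError as err:
    mutated = False
    print('TypeError:', err)
```

Note that on pandas 2.0 and later the repr shown above prints as `Index([0.11, 0.22, 0.33, 0.44], dtype='float64')`, because the separate `Float64Index` class was removed.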

## Accessing and modifying DataFrame and Series

### Series

• Slicing
``````ironman = pd.Series([0.11,0.22,0.33,0.44], index=['a','b','c','d'])
ironman['a':'c']
``````
``````# Output
a    0.11
b    0.22
c    0.33
dtype: float64
``````

``````ironman[0:3]
``````
``````# Output
a    0.11
b    0.22
c    0.33
dtype: float64
``````
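The two slices above return the same rows for different reasons: a label slice includes its end label, while a positional slice excludes its end position. A short sketch using the explicit `loc` / `iloc` accessors makes the difference visible.

```python
import pandas as pd

ironman = pd.Series([0.11, 0.22, 0.33, 0.44], index=['a', 'b', 'c', 'd'])

by_label = ironman.loc['a':'c']   # label slice: 'c' is included
by_position = ironman.iloc[0:3]   # positional slice: position 3 is excluded
print(by_label.equals(by_position))

# The same end "number" gives one element fewer with positions than with labels.
print(ironman.iloc[0:2].index.tolist())
print(ironman.loc['a':'c'].index.tolist())
```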
• Masking
``````ironman[ironman > 0.22]
``````
``````# Output
c    0.33
d    0.44
dtype: float64
``````
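It can help to look at the mask on its own. The sketch below prints the boolean Series produced by the comparison and then combines two conditions; the thresholds are chosen only for illustration.

```python
import pandas as pd

ironman = pd.Series([0.11, 0.22, 0.33, 0.44], index=['a', 'b', 'c', 'd'])

# The comparison itself yields a boolean Series with the same index.
mask = ironman > 0.22
print(mask)

# Conditions are combined with & (and) or | (or); each one needs parentheses.
between = ironman[(ironman > 0.11) & (ironman < 0.44)]
print(between)
```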

• Fancy indexing
As a reminder, fancy indexing means passing an array of index labels to select several elements at once.
``````ironman[['a','d']]
``````
``````# Output
a    0.11
d    0.44
dtype: float64
``````
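• loc
When the index itself is made of integers, plain slicing goes by position rather than by label, so use loc to slice by label. The example below compares the result without loc and with loc.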

``````ironman = pd.Series([0.11,0.22,0.33,0.44], index=[1,3,5,7])
ironman[1:3]
``````
``````# Output
3    0.22
5    0.33
dtype: float64
``````

``````ironman.loc[1:3]
``````
``````# Output
1    0.11
3    0.22
dtype: float64
``````
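To avoid relying on how plain `[]` is interpreted, one option is to always spell out the intent: `loc` for labels and `iloc` for positions. A minimal sketch with the same integer-indexed Series:

```python
import pandas as pd

ironman = pd.Series([0.11, 0.22, 0.33, 0.44], index=[1, 3, 5, 7])

# Single element: label 1 versus position 1.
print(ironman.loc[1])
print(ironman.iloc[1])

# Slices: loc uses labels (end included), iloc uses positions (end excluded).
print(ironman.loc[1:3].tolist())
print(ironman.iloc[1:3].tolist())
```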

### DataFrame

``````number = pd.Series({'taipei':200, 'taichung': 300, 'changhua': 400, 'kaohsiung' : 150})
area = pd.Series({'taipei': 22, 'taichung': 25, 'changhua': 35, 'kaohsiung' : 10})
ironman_pd = pd.DataFrame({'number':number, 'area':area})
ironman_pd['divided'] = ironman_pd['number'] / ironman_pd['area']
ironman_pd
``````
``````# Output
           number  area    divided
taipei        200    22   9.090909
taichung      300    25  12.000000
changhua      400    35  11.428571
kaohsiung     150    10  15.000000
``````
• Transposing the array
``````ironman_pd.T
``````
``````# Output
             taipei  taichung    changhua  kaohsiung
number   200.000000     300.0  400.000000      150.0
area      22.000000      25.0   35.000000       10.0
divided    9.090909      12.0   11.428571       15.0
``````
• Slicing with iloc
``````ironman_pd.iloc[:2,:2]
``````
``````# Output
          number  area
taipei       200    22
taichung     300    25
``````
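The same block can be selected by label with `loc`. This is a sketch rather than the author's method: it rebuilds the DataFrame and relies on label slices including their end label, so `:'taichung'` and `:'area'` stop after those labels.

```python
import pandas as pd

number = pd.Series({'taipei': 200, 'taichung': 300, 'changhua': 400, 'kaohsiung': 150})
area = pd.Series({'taipei': 22, 'taichung': 25, 'changhua': 35, 'kaohsiung': 10})
ironman_pd = pd.DataFrame({'number': number, 'area': area})
ironman_pd['divided'] = ironman_pd['number'] / ironman_pd['area']

by_position = ironman_pd.iloc[:2, :2]            # first 2 rows, first 2 columns
by_label = ironman_pd.loc[:'taichung', :'area']  # up to and including these labels
print(by_label)
print(by_label.equals(by_position))
```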

• Masking with loc
``````ironman_pd.loc[ironman_pd.divided > 12]
``````
``````# Output
           number  area  divided
kaohsiung     150    10     15.0
``````
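The examples so far only read data. For the "modifying" half, a minimal sketch is to assign through `loc`; the new values (220 and 20) are made up for illustration.

```python
import pandas as pd

number = pd.Series({'taipei': 200, 'taichung': 300, 'changhua': 400, 'kaohsiung': 150})
area = pd.Series({'taipei': 22, 'taichung': 25, 'changhua': 35, 'kaohsiung': 10})
ironman_pd = pd.DataFrame({'number': number, 'area': area})

# Overwrite a single cell, addressed by row label and column label.
ironman_pd.loc['taipei', 'number'] = 220

# Overwrite one column for every row that matches a mask.
ironman_pd.loc[ironman_pd['area'] < 20, 'area'] = 20

# Derived columns are not updated automatically, so compute them afterwards.
ironman_pd['divided'] = ironman_pd['number'] / ironman_pd['area']
print(ironman_pd)
```

Assigning through a single `loc[row, column]` call, instead of chaining two selections, modifies the DataFrame itself rather than a temporary copy.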

## Pandas data operations

NumPy ufuncs can be applied directly to both Series and DataFrame.

• Absolute value (abs)
``````ironman_series = pd.Series({'a':-50, 'b': 20, 'c': -30, 'd' : 22, 'e' : -40})
print(np.abs(ironman_series))
``````
``````# Output
a    50
b    20
c    30
d    22
e    40
dtype: int64
``````
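The same ufunc also works on a DataFrame and keeps both the row index and the column labels. The sketch below uses a small made-up DataFrame (column names `x` and `y` are hypothetical).

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only.
ironman_df = pd.DataFrame(
    {'x': [-1.5, 2.0, -3.0], 'y': [4.0, -5.0, 6.0]},
    index=['a', 'b', 'c'],
)

# The ufunc is applied element by element; labels are preserved.
result = np.abs(ironman_df)
print(result)
print(type(result).__name__)
```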
• Index alignment
Any index label that cannot be matched on both sides is filled with NaN.
``````ironman_series = pd.Series({'a':-50, 'b': 20, 'c': -30, 'd' : 22, 'e' : -40})
ironman_series2 = pd.Series({'a':12, 'c': -15, 'd': -10, 'f' : -31, 'g' : 20})
ironman_series + ironman_series2
``````
``````# Output
a   -38.0
b     NaN
c   -45.0
d    12.0
e     NaN
f     NaN
g     NaN
dtype: float64
``````

``````ironman_series.add(ironman_series2, fill_value = 0)
``````
``````# Output
a   -38.0
b    20.0
c   -45.0
d    12.0
e   -40.0
f   -31.0
g    20.0
dtype: float64
``````

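Mapping between Python operators and Pandas methods

| Python operator | Pandas method(s) |
| -------- | -------- |
| `+` | add() |
| `-` | sub(), subtract() |
| `*` | mul(), multiply() |
| `/` | truediv(), div(), divide() |
| `//` | floordiv() |
| `%` | mod() |
| `**` | pow() |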
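As a quick check of this mapping, the sketch below compares an operator with its method form on two small Series (values made up for illustration). The method form is mainly useful because it accepts `fill_value`, as in the `add` example earlier.

```python
import pandas as pd

s1 = pd.Series({'a': -50, 'b': 20, 'c': -30})
s2 = pd.Series({'a': 12, 'c': -15, 'f': -31})

# Operator and method give the same aligned result (NaN where labels don't match).
print((s1 - s2).equals(s1.sub(s2)))
print((s1 * s2).equals(s1.mul(s2)))

# Only the method form can substitute a value for the missing side.
filled = s1.sub(s2, fill_value=0)
print(filled)
```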

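### Guide to previous chapters

• Environment setup
• Numpy
• Code location
• github
Since this is also the author's first time learning Python and writing programming articles, the layout may be a bit messy and some concepts may be wrong. If you have questions, feel free to raise them for discussion. Once the 30 days are finished and there is spare time, some further thoughts will be added to the earlier articles.

Python: From Getting Started to Stock Market Analysis in 30 Days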

### 1 comment

Andy Chiu
iT邦 Graduate Student, Level 3 ‧ 2018-11-08 23:13:00
``````ironman.loc[ironman.divided > 12]
``````

=> ironman_pd.loc[ironman_pd.divided > 12]

Summer iT邦 Novice, Level 5 ‧ 2018-11-09 00:05:20