1

## dataFrame技巧九式- 查、改

dataFrame技巧總介紹-
python之pandas模組dataFrame九大技巧總覽- 增刪查改讀寫切併排

dataFrame技巧的查、改性質相近，

# 資料準備- 隨機創建資料表

``````import random
import pandas as pd
import numpy as np

def random_str(length=3, miss_rate = 0):
assert 0 <= miss_rate <= 1, 'probability must in [0,1]'
miss = random.choices(range(0,2), weights= [1-miss_rate, miss_rate])[0]
return None if miss else ''.join([chr(ord('a')+random.randrange(26)) for _ in range(length)])

def random_num(low, high, miss_rate = 0):
# miss_rate between 0,1
assert 0 <= miss_rate <= 1, 'probability must in [0,1]'
miss = random.choices(range(0,2), weights= [1-miss_rate, miss_rate])[0]
return np.nan if miss else random.randint(low,high)

print(miss)

def random_df(num):
department = ['A','B','C', None]
data = {"student_id":[random_num(10000,20000) for _ in range(num)],
"student_name":[random_str() for _ in range(num)],
"department":[random.choice(department) for _ in range(num)]}
return pd.DataFrame.from_dict(data)

random.seed(30) #為了確保每次執行程式生成的資料相同，此設固定的random seed
df1 = random_df(5)
print("原始資料:")
print(df1)
``````

``````原始資料:
student_id student_name department
0       14738          qha          B
1       13440          ftq          C
2       10794          ulr          A
3       12196          mah          C
4       17561          viy          B
``````

# 查詢利器- loc與iloc

pandas的取值主要有兩種方法- loc, iloc，
i是index的意思，

``````df2 = df1.loc[[1,3,4],['student_id','department']]
print("方法一: 用loc查詢")
print(df2)

df3 = df1.iloc[[1,3,4],[0,2]]
print("方法二: 用iloc查詢")
print(df3)
``````

``````方法一: 用loc查詢
student_id department
1       13440          C
3       12196          C
4       17561          B

student_id department
1       13440          C
3       12196          C
4       17561          B
``````

## 切片語法:

``````df1 = df1.loc[2:5, :]
print(df1)
``````

``````   student_id student_name department
2       10794          ulr          A
3       12196          mah          C
4       17561          viy          B
``````

# 用中括號擷取資料

pandas用直接用中括號取值有兩種涵義

``````print(df1["student_id"])
print(df1[["student_id"]])
print(df1[["student_id", "student_name"]])
``````

``````0    14738
1    13440
2    10794
3    12196
4    17561
Name: student_id, dtype: int64

student_id
0       14738
1       13440
2       10794
3       12196
4       17561

student_id student_name
0       14738          qha
1       13440          ftq
2       10794          ulr
3       12196          mah
4       17561          viy
``````

``````print(df1[2:5])
``````

# 改

``````df1.iloc[[1,3,4],[0,2]] = "HaHaHa~"
print(df1)
``````

``````  student_id student_name department
0      14738          qha          B
1    HaHaHa~          ftq    HaHaHa~
2      10794          ulr          A
3    HaHaHa~          mah    HaHaHa~
4    HaHaHa~          viy    HaHaHa~
``````

``````df1.iloc[[1,3,4],[0,2]] = [[1,2],
[3,4],
[5,6]]
print(df1)
``````

``````   student_id student_name department
0       14738          qha          B
1           1          ftq          2
2       10794          ulr          A
3           3          mah          4
4           5          viy          6
``````

# 資料行、列互換(轉置)

``````df1 = df1.T
print(df1)
``````

``````                  0      1      2      3      4
student_id    14738  13440  10794  12196  17561
student_name    qha    ftq    ulr    mah    viy
department        B      C      A      C      B
``````

# 修改行、列名稱

1. 把行名`student_id``student_number`
``````df1.rename(columns={'student_id':'student_number'},inplace=True)
print(df1)
``````

``````   student_number student_name department
0           14738          qha          B
1           13440          ftq          C
2           10794          ulr          A
3           12196          mah          C
4           17561          viy          B
``````
1. 把列名2改two(需注意型態是int還是字串)
``````df1.rename(index={2:'two'},inplace=True)
print(df1)
``````

``````     student_id student_name department
0         14738          qha          B
1         13440          ftq          C
two       10794          ulr          A
3         12196          mah          C
4         17561          viy          B
``````

``````df1.columns = ["ID", "name", "dep."]
df1.index = [1,2,3,4,5]
print(df1)
``````

``````      ID name dep.
1  14738  qha    B
2  13440  ftq    C
3  10794  ulr    A
4  12196  mah    C
5  17561  viy    B
``````