iT邦幫忙

0

python的pandas模組實用dataFrame技巧- 按指定欄位排序、按相同id合併資料(groupby)

版本資訊:
Spyder->python3.7
pandas版本0.23.4

假設我們用pandas的dataFrame來存學生的成績資料表:

import pandas as pd

scores = {"學號":["x01","x02","x03","x04","x05"],
          "系所":["A","C","B", "B","A"],
          "姓名":["小華","小明","小李","小光","小花"],
          "國文":[80,55,75,23,12]}
score_df = pd.DataFrame.from_dict(scores)
print(score_df)

結果:

    學號 系所  姓名  國文
0  x01  A  小華  80
1  x02  C  小明  55
2  x03  B  小李  75
3  x04  B  小光  23
4  x05  A  小花  12

這張表記錄了每位學生的國文成績

按指定欄位排序

假設我們想按國文分數排序:

import pandas as pd

scores = {"學號":["x01","x02","x03","x04","x05"],
          "系所":["A","C","B", "B","A"],
          "姓名":["小華","小明","小李","小光","小花"],
          "國文":[80,55,75,23,12]}
score_df = pd.DataFrame.from_dict(scores)
score_df.sort_values("國文",inplace=True) #原地修改
print(score_df)

結果:

    學號 系所  姓名  國文
4  x05  A  小花  12
3  x04  B  小光  23
1  x02  C  小明  55
2  x03  B  小李  75
0  x01  A  小華  80

預設是遞增排序,
若想按國文分數遞減排序的話,可以設置ascending=False

import pandas as pd

scores = {"學號":["x01","x02","x03","x04","x05"],
          "系所":["A","C","B", "B","A"],
          "姓名":["小華","小明","小李","小光","小花"],
          "國文":[80,55,75,23,12]}
score_df = pd.DataFrame.from_dict(scores)
score_df.sort_values("國文",inplace=True, ascending=False) #原地修改
print(score_df)

結果:

    學號 系所  姓名  國文
0  x01  A  小華  80
2  x03  B  小李  75
1  x02  C  小明  55
3  x04  B  小光  23
4  x05  A  小花  12

相同欄資料合併

譬如說我們想要把同系所的學生統整在一起,
輸出多個dataFrame:

import pandas as pd

scores = {"學號":["x01","x02","x03","x04","x05"],
          "系所":["A","C","B", "B","A"],
          "姓名":["小華","小明","小李","小光","小花"],
          "國文":[80,55,75,23,12]}
score_df = pd.DataFrame.from_dict(scores)

df_merge = score_df.groupby(score_df['系所'])

for name, group in df_merge:
    print(name)
    print(group)
    print()

結果:

A
    學號 系所  姓名  國文
0  x01  A  小華  80
4  x05  A  小花  12

B
    學號 系所  姓名  國文
2  x03  B  小李  75
3  x04  B  小光  23

C
    學號 系所  姓名  國文
1  x02  C  小明  55

這樣便按系所拆成三個dataFrame了。

另外,若我們想統計相同系所的學生的國文分數總和:

import pandas as pd

scores = {"學號":["x01","x02","x03","x04","x05"],
          "系所":["A","C","B", "B","A"],
          "姓名":["小華","小明","小李","小光","小花"],
          "國文":[80,55,75,23,12]}
score_df = pd.DataFrame.from_dict(scores)

df_merge = score_df["國文"].groupby(score_df['系所']).sum()
print(df_merge)

結果:

系所
A    92
B    98
C    55
Name: 國文, dtype: int64

參考資料

  1. DataFrame-GroupBy 按照相同id合併相應數據

尚未有邦友留言

立即登入留言