我有很多csv檔案需要定期做樞紐分析pivot_table,這些csv的某個欄位的內容轉為樞紐後成為樞紐表格的欄位,但每次並不固定。程式的最後需要使用IF判斷式做輸出,但因為樞紐表格的欄位是動態的,無法用單一寫法,請問如何改寫?謝謝幫忙。
我的簡化範例如下,假設df為我讀取的csv檔案,在這個例子下可以執行IF判斷式做輸出。
import pandas as pd
import numpy as np
dic = {
"Category": ['Math', 'Math', 'Math', 'Math', 'Math', 'Math', 'History', 'History','History','History','History','History','Physics','Physics','Physics','Physics','Physics','Physics'],
"Name": ['John', 'Mary', 'Scott','Kevin','Judy','Lee','John', 'Mary', 'Scott','Kevin','Judy','Lee','John', 'Mary', 'Scott','Kevin','Judy','Lee'],
"Scores": [59,64,50,33,80,20,30,44,65,66,78,80,94,56,55,61,12,11]}
df = pd.DataFrame(dic)
df2=df.pivot_table(index= ['Name'], columns="Category", values=['Scores'])
df2.columns=df2.columns.droplevel(0)
if ((df2['History'] > 60) | (df2['Math'] > 70) | (df2['Physics'] > 50)).any():
print ('send email')
else:
print ('all scores under subject limit, dont send email)
可能會遇到的形況如下,讀入的df並沒有History這個Category,此時造成IF判斷式無法輸出。
import pandas as pd
import numpy as np
dic = {
"Category": ['Math', 'Math', 'Math', 'Math', 'Math', 'Math','Physics','Physics','Physics','Physics','Physics','Physics'],
"Name": ['John', 'Mary', 'Scott','Kevin','Judy','Lee','John', 'Mary', 'Scott','Kevin','Judy','Lee'],
"Scores": [59,64,50,33,80,20,30,44,65,66,78,80]}
df = pd.DataFrame(dic)
df2=df.pivot_table(index= ['Name'], columns="Category", values=['Scores'])
df2.columns=df2.columns.droplevel(0)
if ((df2['History'] > 60) | (df2['Math'] > 70) | (df2['Physics'] > 50)).any():
print ('send email')
else:
print ('all scores under subject limit, dont send email)
請問如何改寫IF判斷式,讓他能夠動態偵測我有哪些欄位。需注意的是,我的每個欄位都附帶分數的篩選條件。謝謝幫忙!
不管了,拿你第一個範例改:
你的意圖應該是 History、Math、Physics 中有任何一個人合格,就顯示 History的成績?
然後有可能欄位會少這三個其中一個這樣?
import pandas as pd
import numpy as np
# 拿掉Math做測試。
dic = {
"Category": [ 'History', 'History','History','History','History','History','Physics','Physics','Physics','Physics','Physics','Physics'],
"Name": ['John', 'Mary', 'Scott','Kevin','Judy','Lee','John', 'Mary', 'Scott','Kevin','Judy','Lee'],
"Scores": [30,44,65,66,78,80,94,56,55,61,12,11]
}
df = pd.DataFrame(dic)
df2=df.pivot_table(index= ['Name'], columns="Category", values=['Scores'])
df2.columns=df2.columns.droplevel(0)
def checkRules(df):
rules = {
"History": 60,
"Math": 70,
"Physics": 50
}
tmp = [(df[r] > rules.get(r)).any() for r in rules.keys() if r in list(df.columns)]
return any(tmp)
if checkRules(df2):
print (df2.loc[:, 'History'])
if checkRules(df2):
print ('send email')
else:
print ('all scores under subject limit, dont send email)
對,我的意思就是這樣。但是Print後面是我亂寫,實際上的應用是如上,假設任一人成績都有到達以上,發送信件。若無,就不發信件。但可能科目的欄位會缺少。 我先試著用你的方式改寫看看我的程式,馬上回復,謝謝!
你的寫法似乎是可行的,再進一步請教,假設History 改成 <60, Math>70和Physics>50維持不動。那這樣tmp那段要怎麼修改?
改用operator來操作條件
import pandas as pd
import numpy as np
# 引入操作子
# https://docs.python.org/3/library/operator.html
import operator
dic = {
"Category": [ 'History', 'History','History','History','History','History','Physics','Physics','Physics','Physics','Physics','Physics'],
"Name": ['John', 'Mary', 'Scott','Kevin','Judy','Lee','John', 'Mary', 'Scott','Kevin','Judy','Lee'],
"Scores": [30,44,65,66,78,80,94,56,55,61,12,11]}
df = pd.DataFrame(dic)
df2=df.pivot_table(index= ['Name'], columns="Category", values=['Scores'])
df2.columns=df2.columns.droplevel(0)
def checkRules(df):
# 條件改為前面為對應之操作子的函數,後面為分數
rules = {
"History": (operator.lt, 60),
"Math": (operator.gt, 70),
"Physics": (operator.gt, 50)
}
tmp = [
(
# 帶入相對應的規則,例如 operator.lt(df['History'], 60)
rules.get(r)[0](df[r], rules.get(r)[1])
).any() for r in rules.keys() if r in list(df.columns)
]
return any(tmp)
if checkRules(df2):
print (df2.loc[:, 'History'])
def checkRules(df):
rules = {
"Math": 70,
"Physics": 50
}
tmp = [(df[r] > rules.get(r)).any() for r in rules.keys() if r in list(df.columns)]
return any(tmp)
if any((df2.columns['History'] <60) | checkRules(df2)):
print ('send email')
else:
print ('all scores under subject limit, dont send email)
感謝!成功了! 我稍微改寫成以上這樣,額外處理Histroy<60的狀況,你的寫法真的很簡潔也切中我要的,非常感謝幫忙,可能一開始陳述問題上有點詞不達意希望見諒!
謝謝! 還可以用operator的寫法,學到一課了!
if 'History' in df2.columns:
if (df2['History'] > 60).any():
print (df2.loc[:, 'History'])
if 'Math' in df2.columns:
if (df2['Math'] > 70).any():
print (df2.loc[:, 'History'])
if 'Physics' in df2.columns:
if (df2['Physics'] > 50).any():
print (df2.loc[:, 'History'])