You’re given a dataframe containing sales data from a grocery store chain with columns for customer ID, gender, and date of sale.
Create a new dataset with summary-level information on their purchases, including the columns: customer_id, gender, most_recent_sale, and order_count.
most_recent_sale should display the date of the customer's most recent purchase. order_count should display the total number of purchases that the customer has made.
In short, the summary reports each customer's latest purchase date (most_recent_sale) and purchase count (order_count).
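import pandas as pd

def customer_analysis(df):
    # create a new dataframe that matches the problem's requirements:
    # one row per (customer_id, gender) with the latest sale date and the number of sales
    new = df.groupby(['customer_id', 'gender']).agg(
        most_recent_sale=('date of sale', 'max'),
        order_count=('date of sale', 'count'),
    )
    # reset the index so customer_id and gender become regular columns again
    new.reset_index(inplace=True)
    return new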
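To check the function, it can help to run it on a tiny hand-made dataframe. The sketch below is only an illustration: the sample rows are invented, and the column name 'date of sale' is assumed to match the one used in the solution above (the actual exercise data may name it differently).

```python
import pandas as pd

def customer_analysis(df):
    # Same logic as the solution above, repeated so this example runs on its own
    summary = df.groupby(['customer_id', 'gender']).agg(
        most_recent_sale=('date of sale', 'max'),
        order_count=('date of sale', 'count'),
    )
    return summary.reset_index()

# Made-up sample data (not from the exercise)
sales = pd.DataFrame({
    'customer_id': [1, 1, 2, 3, 3, 3],
    'gender': ['F', 'F', 'M', 'F', 'F', 'F'],
    'date of sale': pd.to_datetime([
        '2023-01-05', '2023-02-10', '2023-01-20',
        '2023-01-01', '2023-03-15', '2023-02-28',
    ]),
})

result = customer_analysis(sales)
print(result)
```

Each customer ends up on a single row: customer 1 has two orders, customer 2 has one, and customer 3 has three, with most_recent_sale holding the latest date in each group.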
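After grouping with Pandas groupby('column name') and aggregating, the grouping column(s) replace the original index of the result. If you need to reset the index, pair the operation with reset_index(inplace=True or False): inplace=True modifies the dataframe directly and returns None, while inplace=False (the default) returns a new dataframe and leaves the original unchanged.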
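The effect on the index is easier to see on a small example. The following is a minimal sketch with an invented dataframe (the 'amount' column and the variable names are assumptions for illustration only), showing where the grouping column goes after groupby and how the two reset_index variants behave.

```python
import pandas as pd

# Invented data for illustration
df = pd.DataFrame({
    'customer_id': [1, 1, 2],
    'amount': [10, 20, 5],
})

grouped = df.groupby('customer_id').agg(total=('amount', 'sum'))

# After groupby + agg, the grouping column is the index, not a regular column
index_name = grouped.index.name          # 'customer_id'
columns_before = list(grouped.columns)   # ['total']

# inplace=False (default): returns a new dataframe, 'grouped' is unchanged
flat = grouped.reset_index()
columns_after = list(flat.columns)       # ['customer_id', 'total']

# inplace=True: modifies 'grouped' itself and returns None
returned = grouped.reset_index(inplace=True)
```

Because inplace=True returns None, writing something like new = new.reset_index(inplace=True) would discard the dataframe; either call it on its own line or assign the result of the default form.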
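Hi, I'm Eva, a woman working hard to break into the field of data science!
Today is the last practice problem of this Ironman challenge. Together with the previous five days, I have shared six Pandas practice problems in total, and working through them has shown me how important the fundamentals are. If you're interested, you can move on to LeetCode problems next. When I get the chance, I'll keep sharing on Medium, so feel free to follow me there! See you tomorrow!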