Python pandas 使用 iterrows 逐一更新資料【Python 處理 Excel #12】

2024 iThome 鐵人賽

DAY 12

Python

30 天學會用 Python pandas 和 openpyxl 處理 Excel —— 成為用 Python 處理 Excel 檔案的高手系列第 12 篇

16th鐵人賽 pandas excel python

april

2024-09-25 12:21:10

536 瀏覽

分享至

本篇文章同步發布於 Python pandas 使用 iterrows 逐一更新資料【Python 處理 Excel #12】

前言

如果更新資料的過程需要插入一些新欄位，而這些新欄位的值依賴於每一列的值，甚至涉及多個條件判斷時，使用 iterrows 方法更新資料會更靈活。這篇文章介紹如何使用 iterrows 方法更新資料。

案例說明

現在有兩個 Excel 檔案，order.xlsx 有訂單的相關資訊，shipment.xlsx 記錄訂單的出貨記錄。同一筆訂單可能因為客戶需求、生產條件、運輸方式等因素而分批出貨至客戶手上。shipment.xlsx 保留訂單的所有出貨記錄。order.xlsx 沒有出貨相關資訊。

寫文章當下是 2024 年 9 月，本案例的目標是在 order.xlsx 針對每筆訂單加上該訂單在當月 (2024 年 9 月) 出貨的總數量。order.xlsx 和 shipment.xlsx 的具體內容如下：

order.xlsx 的內容

order_id	product_name
101	Laptop Pro
102	Gaming Desktop
103	Ultrabook

shipment.xlsx 的內容

order_id	ship_date	ship_qty
101	2024/7/5	4
102	2024/7/10	2
101	2024/8/15	3
103	2024/8/20	1
102	2024/9/1	5
101	2024/9/5	4
102	2024/9/10	2
101	2024/9/15	1
103	2024/9/20	3
102	2024/9/25	2

使用 pandas iterrows 方法更新資料

步驟 1：讀取 Excel 檔案

首先需要讀取 Excel 檔案：

import pandas as pd
from datetime import datetime, timedelta
import calendar

# 讀取 EXCEL 檔案
order_df = pd.read_excel('order.xlsx')
shipment_df = pd.read_excel('shipment.xlsx')

# 轉換 ship_date 為 datetime 類型
shipment_df['ship_date'] = pd.to_datetime(shipment_df['ship_date'])

# 新增欄位
order_df['current_month_ship_qty'] = 0

解釋

導入需要的函式庫，pandas 用於處理數據，datetime 和 timedelta 用於日期操作。
讀取兩個 Excel 文件，創建 DataFrame。
將 shipment_df 中的 ship_date 轉換為 datetime 類型，以便後續的日期操作。
在 order_df 中新增一個名稱為 current_month_ship_qty 的欄位，初始值設為 0。

步驟 2：篩選當月資料

接著從 shipment_df 的所有資料中篩選出當月 (2024 年 9 月) 的資料：

# 尋找上個月的最後一天
today = datetime.now()
prev_month_end = today.replace(day=1) - timedelta(days=1)

# 篩選當月資料
current_month_mask = prev_month_end < shipment_df['ship_date']
current_month_shipment_df = shipment_df[current_month_mask].copy()

解釋

確定當前日期。
計算上個月的最後一天。
建立一個遮罩 (mask)，用於篩選出當月的出貨資料。
使用這個遮罩建立一個新的 DataFrame，只包含當月的出貨資料。

步驟 3：使用 iterrows 方法更新資料

這個步驟使用 iterrows 方法遍歷訂單資料，並更新每個訂單的當月出貨數量：

# 更新資料
for index, row in order_df.iterrows():
    order_id = row['order_id']
    mask = current_month_shipment_df['order_id'] == order_id
    matching_rows = current_month_shipment_df[mask]
    if not matching_rows.empty:
        ship_qty = matching_rows['ship_qty'].sum()
        order_df.at[index, 'current_month_ship_qty'] = ship_qty

# 顯示更新後的訂單 DataFrame
print(order_df)

解釋

使用 iterrows() 遍歷 order_df 中的每一列
對於每個訂單，建立一個遮罩找出對應的出貨記錄。
如果找到匹配的記錄，則計算總出貨量並更新 order_df 中對應的 current_month_ship_qty 值。
最後我們輸出更新後的 order_df 查看結果。

完整的程式碼

下方式這個案例的完整程式碼：

import pandas as pd
from datetime import datetime, timedelta
import calendar

# 讀取 EXCEL 檔案
order_df = pd.read_excel('order.xlsx')
shipment_df = pd.read_excel('shipment.xlsx')

# 轉換 ship_date 為 datetime 類型
shipment_df['ship_date'] = pd.to_datetime(shipment_df['ship_date'])

# 新增欄位
order_df['current_month_ship_qty'] = 0

# 尋找上個月的最後一天
today = datetime.now()
prev_month_end = today.replace(day=1) - timedelta(days=1)

# 篩選當月資料
current_month_mask = prev_month_end < shipment_df['ship_date']
current_month_shipment_df = shipment_df[current_month_mask].copy()

# 更新資料
for index, row in order_df.iterrows():
    order_id = row['order_id']
    mask = current_month_shipment_df['order_id'] == order_id
    matching_rows = current_month_shipment_df[mask]
    if not matching_rows.empty:
        ship_qty = matching_rows['ship_qty'].sum()
        order_df.at[index, 'current_month_ship_qty'] = ship_qty

# 顯示更新後的訂單 DataFrame
print(order_df)

iterrows 方法的優缺點

iterrow 方法讓使用者得以逐列遍歷 DataFrame。這個方法的優點是類似傳統的 for 迴圈，可以輕鬆訪問修改每一列的資料。缺點是對於大型的 DataFrame，iterrows 方法的執行效率可能較差。對於新增簡單的欄位資料，之前文章介紹過的 apply 方法通常比 iterrows 方法更快。

補充說明：使用遮罩 (mask) 篩選數據

這篇文章的案例中多次使用遮罩 (mask) 篩選數據。例如：

mask = current_month_shipment_df['order_id'] == order_id
matching_rows = current_month_shipment_df[mask]

遮罩是一個布林值的 Series，與原 DataFrame 的列數相同。當使用者將遮罩應用到 DataFrame 時，只有遮罩中為 True 的列會被保留。

總結

iterrows 方法用於 pandas 逐列處理數據。
使用遮罩 (mask) 可以從 DataFrame 中有效篩選出符合特定條件的數據。
對於大型 DataFrame，需要考慮使用其他方法提高效能。

本篇文章同步發布於 Python pandas 使用 iterrows 逐一更新資料【Python 處理 Excel #12】

Python pandas 使用 merge 水平合併 DataFrame 資料【Python 處理 Excel #11】

除了 sum()，Python pandas 還有哪些更新資料的方法？【Python 處理 Excel #13】

系列文

30 天學會用 Python pandas 和 openpyxl 處理 Excel —— 成為用 Python 處理 Excel 檔案的高手共 30 篇

RSS系列文訂閱系列文

12 人訂閱

完整目錄

尚未有邦友留言

立即登入留言

參賽組數

1064 組

團體組數

40 組

累計文章數

22202 篇

完賽人數

600 人

15th鐵人賽 16th鐵人賽 13th鐵人賽 14th鐵人賽 12th鐵人賽 11th鐵人賽鐵人賽 2019鐵人賽 javascript 2018鐵人賽 python 2017鐵人賽 windows php c# windows server linux css react vue.js

30 天學會用 Python pandas 和 openpyxl 處理 Excel —— 成為用 Python 處理 Excel 檔案的高手系列 第 12 篇