D28 - Brownie / Exploring and Applying Pandas Indexes

2022 iThome Ironman Contest

DAY 28

AI & Data

Don't Rush to Learn Python | The Secret to Success in Python, Part 28 of the series

14th Ironman Contest, python, kaggle, excel, data analysis

juck30808

2022-10-12 23:59:26

The Indexing Process

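As introduced earlier, a DataFrame behaves both like a two-dimensional array and like a dictionary of Series objects that share a common index. Keeping both views in mind helps when learning how to select data in a DataFrame.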
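As a minimal sketch of these two views, assuming a small hypothetical DataFrame named states built from two Series (the names and sample figures are illustrative only):

```python
import pandas as pd

# Hypothetical sample data: two Series that share the same index labels.
area = pd.Series({'California': 423967, 'Texas': 695662, 'New York': 141297})
population = pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127})
states = pd.DataFrame({'area': area, 'population': population})

# Dictionary-style view: a column name maps to a Series.
area_column = states['area']

# Array-style view: the underlying values form a 2-D array.
values = states.to_numpy()

# Row selection by index label returns a Series keyed by column name.
texas_row = states.loc['Texas']
```

Selecting states['area'] treats the DataFrame like a dictionary of columns, while states.to_numpy() exposes the same data as a plain two-dimensional array.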

The Pandas Index Object

Both the Series and DataFrame objects covered earlier contain an explicitly defined index object, whose job is to let you access and modify data quickly.
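The following sketch illustrates that role, assuming a hypothetical Series named data with string labels. It also shows that the Index object itself is immutable: values are modified through the index, not the labels in place.

```python
import pandas as pd

# Hypothetical Series with an explicitly defined index.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

labels = data.index          # the Index object attached to the Series
value_b = data['b']          # access a value by its index label
data['b'] = 2.0              # modify a value by its index label

# The Index itself cannot be changed element by element.
try:
    data.index[1] = 'z'
    index_is_mutable = True
except TypeError:
    index_is_mutable = False
```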

Basic Index Functionality

indA.size                           # number of elements
indA.shape                          # shape, as a tuple
indA.ndim                           # number of dimensions
indA.dtype                          # data type of the elements
indA.nbytes                         # number of bytes used by the data
indA.empty                          # True if the index has no elements
indA.hasnans                        # True if the index contains any NaN value

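indA.intersection(indB)             # intersection (written indA & indB in older pandas versions)
indA.union(indB)                    # union (written indA | indB in older pandas versions)
indA.symmetric_difference(indB)     # symmetric difference (written indA ^ indB in older pandas versions)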
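To make the calls above concrete, here is a small sketch. The contents of indA and indB are assumptions chosen for illustration, since the text does not define them.

```python
import pandas as pd

# Hypothetical index contents, not taken from the original post.
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

# Basic attributes of an Index.
size, shape, ndim = indA.size, indA.shape, indA.ndim
is_empty, has_nans = indA.empty, indA.hasnans

# Set operations, using the method form.
common = indA.intersection(indB)           # labels present in both
combined = indA.union(indB)                # labels present in either
either = indA.symmetric_difference(indB)   # labels present in exactly one
```

The method form is used here because it behaves the same way across pandas versions.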

UFuncs: Index Alignment

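When a binary operation is applied to two Series or DataFrame objects, Pandas automatically aligns the indexes of the two datasets as part of the operation. This is very convenient when working with incomplete data. As an example, suppose we obtained the areas and the populations of US states from two different data sources.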

import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662, 'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127}, name='population')

We can use union to compute the union of the two indexes, which shows every index label that a combined result would contain.

area.index.union(population.index)    # union
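As a sketch of what alignment does with these two Series, the example below divides population by area. The variable names all_states and density are introduced here for illustration. Labels that appear in only one Series have no partner value, so the result at those labels is NaN.

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662, 'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127}, name='population')

# The union of the two indexes contains every state seen in either source.
all_states = area.index.union(population.index)

# Division aligns on index labels; the result is indexed by that same union.
density = population / area

# Alaska has no population entry and New York has no area entry.
missing = density[density.isna()].index
```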

UFuncs Function

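Any element for which one of the two inputs has no corresponding entry is set to NaN (short for "Not a Number"), which is how Pandas marks missing data. If filling with NaN is not the result you want, you can call the corresponding ufunc method instead and set its fill value parameter. For example, calling A.add(B) is equivalent to A + B, but it accepts an additional argument that sets the value used in place of missing entries: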
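A minimal sketch of that difference, assuming two small hypothetical Series A and B whose integer indexes only partly overlap:

```python
import pandas as pd

# Hypothetical inputs: A covers labels 0-2, B covers labels 1-3.
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# Plain addition: labels 0 and 3 exist in only one input, so they become NaN.
plain_sum = A + B

# Method form: a missing entry on either side is treated as 0 before adding.
filled_sum = A.add(B, fill_value=0)
```

With fill_value=0, the labels found in only one Series keep their original value instead of turning into NaN.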