( Day 18.2 ) Python Statistics Functions: statistics

2023 iThome Ironman Contest

DAY 18

Software Development

Learning Python with OXXO, series part 37

oxxo

2023-09-18 09:33:23

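Python's standard library module "statistics" provides basic statistical functions for quickly computing the mean, median, standard deviation, mode, and similar figures. For more specialized statistical functions, turn to third-party libraries such as NumPy or SciPy.

Original article: Statistics functions (statistics)

This article uses Python 3.7.12. All examples can be run in Google Colab without installing any software ( see: Using Google Colab )

Common statistics functions

The table below lists several commonly used functions of the statistics module ( see the official Python documentation: statistics ):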

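Function	Description
mean()	Computes the arithmetic mean.
median()	Computes the median.
median_low()、median_high()	For an even number of data points, returns the lower or the higher of the two middle values.
median_grouped()	Computes the median of grouped continuous data ( each value is treated as the midpoint of a class interval ).
mode()	Computes the mode ( the value that occurs most often in the data ).
pstdev()、pvariance()	Compute the population standard deviation and population variance.
stdev()、variance()	Compute the sample standard deviation and sample variance.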

import statistics

To use statistics, first import the statistics module, or use the from form to import only the specific functions you need.

import statistics
from statistics import mean

mean()

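mean() computes the arithmetic mean of a sequence of numbers. The result is not rounded: for integer input it is an int when the mean is a whole number and a float otherwise.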

import statistics

arr = [1, 2, 3, 4, 5, 6, 7, 8]
a = statistics.mean(arr)    # compute the mean
print(a)                    # 4.5
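
To make the return value of mean() a bit more concrete, here is a small sketch with two made-up lists. The variable names are only for illustration, and round() is just one possible way to limit the displayed precision.

```python
import statistics

# Integer input whose mean is a whole number: the result stays an int
whole = statistics.mean([2, 4, 6])

# Integer input whose mean is fractional: the result is a float
fractional = statistics.mean([1, 2, 4])

# round() is one way to limit the value to two decimal places
rounded = round(fractional, 2)

print(whole)        # 4
print(fractional)   # 2.3333333333333335
print(rounded)      # 2.33
```

If only the printed text matters, string formatting such as f"{fractional:.2f}" would work as well.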

median()

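median() computes the median of a sequence of numbers. If the number of values is odd, it returns the middle value; if it is even, it returns the average of the two middle values.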

import statistics

arr = [1, 2, 3, 4, 5, 6, 7, 8]
arr2 = [1, 2, 3, 4, 5, 6, 7]
a = statistics.median(arr)    # compute the median
b = statistics.median(arr2)   # compute the median
print(a)   # 4.5
print(b)   # 4
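
The odd/even rule can also be written out by hand. The sketch below is only an illustration of the logic, not the module's actual source; manual_median is a hypothetical helper name and it assumes the input is not empty.

```python
import statistics

def manual_median(values):
    """Hypothetical helper: sort the data, then pick or average the middle."""
    data = sorted(values)          # the median is defined on sorted data
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]           # odd count: the single middle value
    return (data[mid - 1] + data[mid]) / 2   # even count: average of the two

unsorted = [7, 1, 5, 3, 8, 2, 6, 4]
print(manual_median(unsorted))        # 4.5
print(statistics.median(unsorted))    # 4.5
```

Note that the input does not have to be sorted beforehand; statistics.median() sorts a copy of the data internally as well.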

median_low()、median_high()

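median_low() and median_high() return the lower or the higher of the two middle values when the data has an even number of items. If the number of items is odd, both return the middle value ( same as median() ).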

import statistics

arr = [1, 2, 3, 4, 5, 6, 7, 8]
a = statistics.median_low(arr)    # compute the low median
b = statistics.median_high(arr)   # compute the high median
print(a)   # 4
print(b)   # 5
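
One practical difference is that median_low() and median_high() always return a value that actually appears in the data, while median() may return an average that does not. A short sketch with made-up lists ( the variable names are just for this example ):

```python
import statistics

odd = [1, 2, 3, 4, 5, 6, 7]
even = [10, 20, 30, 40]

# Odd count: both functions agree with median()
odd_low = statistics.median_low(odd)
odd_high = statistics.median_high(odd)
print(odd_low, odd_high)   # 4 4

# Even count: median() averages, the other two pick real data points
even_median = statistics.median(even)
even_low = statistics.median_low(even)
even_high = statistics.median_high(even)
print(even_median)   # 25.0
print(even_low)      # 20
print(even_high)     # 30
```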

median_grouped()

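median_grouped() computes the median of grouped continuous data. Each value is treated as the midpoint of a class interval ( width 1 by default ), and the median is estimated by interpolating within the class that contains it.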

import statistics

arr = [1, 2, 2, 3, 4, 4, 4, 4, 4, 4, 5, 5]   # the data contains repeated values
a = statistics.median_grouped(arr)    # compute the grouped median
print(a)   # 3.8333333333333335
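
The result above may look surprising, so here is a hand calculation using the usual grouped-median interpolation formula, L + interval * (n/2 - cf) / f. This is a sketch under the assumption of the default interval of 1; the variable names are chosen for this example only.

```python
import statistics

arr = [1, 2, 2, 3, 4, 4, 4, 4, 4, 4, 5, 5]
interval = 1                            # class width (the default)

data = sorted(arr)
n = len(data)                           # 12 data points
x = data[n // 2]                        # value of the median class: 4
lower = x - interval / 2                # lower limit of that class: 3.5
cf = sum(1 for v in data if v < x)      # values below the class: 4
f = data.count(x)                       # values inside the class: 6

# Interpolate inside the median class
manual = lower + interval * (n / 2 - cf) / f

print(manual)
print(statistics.median_grouped(arr))
```

Both lines should print the same value as the example above, roughly 3.83: the class for 4 spans 3.5 to 4.5, and the median sits one third of the way into it.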

mode()

mode() computes the mode ( the value that occurs most often in the data ).

import statistics

arr = [1, 2, 2, 3, 4, 4, 4, 4, 4, 4, 5, 5]
a = statistics.mode(arr)    # find the most frequent value
print(a)   # 4

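mode() works on any iterable of hashable values. The example below finds the most frequent character in a string. ( Note: before Python 3.8, mode() raises StatisticsError when more than one value is tied for the highest count; from 3.8 on, it returns the first such value encountered. )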

import statistics

text = 'hello world'
a = statistics.mode(text)    # find the most frequent character
print(a)   # l
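
Since the behavior on ties depends on the Python version, one version-independent option is to count the values yourself with collections.Counter. The sketch below is only one possible approach; all_modes is a hypothetical helper name, and it assumes the input is not empty.

```python
from collections import Counter

def all_modes(values):
    """Hypothetical helper: return every value tied for the highest count."""
    counts = Counter(values)                 # value -> number of occurrences
    highest = max(counts.values())           # the largest count
    # Keep each value whose count equals the largest one, in first-seen order
    return [value for value, count in counts.items() if count == highest]

print(all_modes('hello world'))   # ['l']
print(all_modes('aabbc'))         # ['a', 'b']
```

On Python 3.8 and later, statistics.multimode() offers similar behavior out of the box.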

pstdev()、pvariance()

pstdev() and pvariance() compute the population standard deviation and population variance of the data.

import statistics

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
a = statistics.pstdev(arr)
b = statistics.pvariance(arr)
print(a)   # 2.581988897471611
print(b)   # 6.666666666666667
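
To show where these numbers come from, the sketch below recomputes them by hand: the population variance is the sum of squared deviations from the mean divided by n, and the standard deviation is its square root. The variable names are just for this example.

```python
import math
import statistics

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
n = len(arr)
mu = sum(arr) / n                               # mean: 5.0
squared = sum((x - mu) ** 2 for x in arr)       # sum of squared deviations: 60.0

manual_pvariance = squared / n                  # population: divide by n
manual_pstdev = math.sqrt(manual_pvariance)     # square root of the variance

print(manual_pvariance, statistics.pvariance(arr))
print(manual_pstdev, statistics.pstdev(arr))
```

Each line should print two values that agree up to floating-point rounding.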

stdev()、variance()

stdev() and variance() compute the sample standard deviation and sample variance of the data.

import statistics

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
a = statistics.stdev(arr)
b = statistics.variance(arr)
print(a)   # 2.7386127875258306
print(b)   # 7.5
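
The only difference from the population versions is the divisor: the sample variance divides the sum of squared deviations by n - 1 instead of n. A minimal hand calculation, with variable names chosen for this example only:

```python
import math
import statistics

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
n = len(arr)
xbar = sum(arr) / n                             # sample mean: 5.0
squared = sum((x - xbar) ** 2 for x in arr)     # sum of squared deviations: 60.0

manual_variance = squared / (n - 1)             # sample: divide by n - 1
manual_stdev = math.sqrt(manual_variance)

print(manual_variance)                          # 7.5
print(manual_stdev, statistics.stdev(arr))

# The two variances differ only by the factor n / (n - 1)
scaled = statistics.pvariance(arr) * n / (n - 1)
print(math.isclose(scaled, statistics.variance(arr)))   # True
```

As a rule of thumb, use pstdev() and pvariance() when the data is the whole population, and stdev() and variance() when it is a sample drawn from a larger one.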
