
## [Statistics] (Personal notes) Covariance, correlation coefficient, and the regression line

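1. The covariance cov(X,Y) is defined as `E[(X-average(X))(Y-average(Y))]`.
2. The correlation coefficient (in full, the Pearson correlation coefficient)
takes values between -1 and 1; the closer its absolute value is to 1, the closer the data lie to a straight line.
`rho(X,Y)=cov(X,Y)/(std(X)*std(Y))`, where std is the standard deviation.
3. Regression line: the line y=a+bx that satisfies the least-squares criterion,
where `b=rho(X,Y)*std(Y)/std(X)` and
`a=average(Y)-b*average(X)`.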
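
The formulas for a and b in item 3 can be recovered from the least-squares criterion. The following is a short sketch, not a full proof. The notation is an assumption of this sketch rather than of the notes: $\bar{X}$ stands for average(X), $\sigma_X$ for std(X), and $S(a,b)$ is a name introduced here for the mean squared error of the line.

```latex
\begin{aligned}
S(a,b) &= E\big[(Y - a - bX)^2\big] \\
\frac{\partial S}{\partial a} = 0
  &\;\Rightarrow\; a = \bar{Y} - b\,\bar{X} \\
\frac{\partial S}{\partial b} = 0
  &\;\Rightarrow\; b = \frac{\operatorname{cov}(X,Y)}{\sigma_X^{2}}
   = \frac{\rho(X,Y)\,\sigma_X\,\sigma_Y}{\sigma_X^{2}}
   = \rho(X,Y)\,\frac{\sigma_Y}{\sigma_X}
\end{aligned}
```

The last line uses `rho(X,Y)=cov(X,Y)/(std(X)*std(Y))` from item 2, so the slope can equivalently be written as cov(X,Y) divided by the variance of X.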

``````
import math

import matplotlib.pyplot as plt
import numpy as np


def average(L):
    return sum(L) / len(L)


# Covariance (population form: divides by n)
def cov(X, Y):
    assert len(X) == len(Y), "Lengths of array X and Y are not equal."
    aveX, aveY = average(X), average(Y)
    return average([(x - aveX) * (y - aveY) for x, y in zip(X, Y)])


# Standard deviation (population form: divides by n)
def std(L):
    mean = average(L)
    L2 = [(y - mean) ** 2 for y in L]
    return math.sqrt(average(L2))


# Correlation coefficient: rho(X,Y)=cov(X,Y)/(std(X)*std(Y))
def rho(X, Y):
    return cov(X, Y) / (std(X) * std(Y))


# Regression line y=a+bx:
#   b=rho(X,Y)*std(Y)/std(X)
#   a=average(Y)-b*average(X)
def regressionLine(X, Y):
    b = rho(X, Y) * std(Y) / std(X)
    a = average(Y) - b * average(X)
    print(f"Regression line: y={round(a,2)}+{round(b,2)}x")
    return a, b


if __name__ == "__main__":
    X = [1, 3, 5, 7, 9, 11, 13]
    Y = [2, 5, 8, 11.1, 14, 17, 20]
    print(cov(X, Y), std(X), std(Y))
    print(rho(X, Y))
    a, b = regressionLine(X, Y)

    plt.title("Test Chart", fontsize=24)  # chart title
    plt.xlabel("xValue", fontsize=16)  # x-axis label
    plt.ylabel("yValue", fontsize=16)  # y-axis label
    xpt = np.linspace(min(X), max(X), 100)
    ypt = a + b * xpt
    plt.plot(xpt, ypt, c='c')  # draw the regression line
    plt.scatter(X, Y, c='k')  # draw the data points
    plt.show()  # display the figure
``````