Day 13: Implementing Linear Regression

2021 iThome Ironman Contest (13th)

AI & Data

Part 13 of the series "I Need to Learn More Than the Machine Does" (我比機器更需要學習)

Tun

2021-09-19 18:46:55

https://github.com/PacktPublishing/Machine-Learning-Algorithms

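Two-dimensional linear regression plot

First, import the packages: NumPy for the numerical work and Matplotlib for plotting, under their usual aliases (np and plt).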

import numpy as np
import matplotlib.pyplot as plt

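Next, fix the random seed with seed() so that the random numbers are reproducible, then define the loss function using the least-squares formula.

np.random.seed(1000)
# Assumed synthetic data (not shown in the original post): a noisy line y = x + 2
nb_samples = 200
X = np.linspace(-5, 5, nb_samples)
Y = X + 2 + np.random.uniform(-0.5, 0.5, size=nb_samples)

def loss(v):
    # Half the sum of squared residuals of the line y = v[0] + v[1]*x
    e = 0.0
    for i in range(nb_samples):
        e += np.square(v[0] + v[1]*X[i] - Y[i])
    return 0.5 * e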
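Written out as a formula, the loss above and its two partial derivatives look like this. The symbols are a notational assumption: n stands for nb_samples, x_i and y_i for X[i] and Y[i], and v_0, v_1 for v[0], v[1].

```latex
L(v) = \frac{1}{2} \sum_{i=1}^{n} \left( v_0 + v_1 x_i - y_i \right)^2

\frac{\partial L}{\partial v_0} = \sum_{i=1}^{n} \left( v_0 + v_1 x_i - y_i \right),
\qquad
\frac{\partial L}{\partial v_1} = \sum_{i=1}^{n} \left( v_0 + v_1 x_i - y_i \right) x_i
```

The factor 1/2 is only there so that it cancels the 2 that appears when the square is differentiated.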

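Next, define the gradient of the loss function, that is, its partial derivatives with respect to v[0] and v[1]. This is not gradient descent itself; the optimizer uses this function to decide where to step.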

def gradient(v):
    g = np.zeros(shape=2)
    for i in range(nb_samples):
        g[0] += (v[0] + v[1]*X[i] - Y[i])
        g[1] += ((v[0] + v[1]*X[i] - Y[i]) * X[i])
    return g
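One way to gain confidence in a hand-written gradient is to compare it with a finite-difference approximation. The following is a minimal sketch, not part of the original post: the dataset is an assumed noisy line y = x + 2, numerical_gradient is a hypothetical helper, and loss and gradient are vectorized rewrites of the loops above.

```python
import numpy as np

# Assumed synthetic data (hypothetical; the post does not show its dataset)
np.random.seed(1000)
nb_samples = 200
X = np.linspace(-5, 5, nb_samples)
Y = X + 2 + np.random.uniform(-0.5, 0.5, size=nb_samples)

def loss(v):
    # Half the sum of squared residuals, vectorized
    residuals = v[0] + v[1] * X - Y
    return 0.5 * np.sum(residuals ** 2)

def gradient(v):
    # Analytic partial derivatives with respect to v[0] and v[1]
    residuals = v[0] + v[1] * X - Y
    return np.array([np.sum(residuals), np.sum(residuals * X)])

def numerical_gradient(f, v, eps=1e-5):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(v, dtype=float)
    for k in range(len(v)):
        step = np.zeros_like(v, dtype=float)
        step[k] = eps
        g[k] = (f(v + step) - f(v - step)) / (2 * eps)
    return g

v_test = np.array([0.5, -1.0])
analytic = gradient(v_test)
numeric = numerical_gradient(loss, v_test)
print('analytic:', analytic)
print('numeric: ', numeric)
```

If the two vectors disagree by more than rounding noise, the analytic gradient most likely has a sign or index mistake.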

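Finally, import the minimize optimizer from scipy.optimize and minimize the loss. The fitted intercept and slope end up in result.x and can be used to draw the two-dimensional regression line.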

from scipy.optimize import minimize
result = minimize(fun=loss, x0=np.array([0.0, 0.0]), jac=gradient, method='L-BFGS-B')
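As a sanity check on the optimizer's answer, the result can be compared with the closed-form least-squares fit from np.polyfit. This is a sketch under assumptions: the data is a made-up noisy line y = x + 2, and the names intercept, slope, ref_slope and ref_intercept are introduced here for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed synthetic data (hypothetical; the post does not show its dataset)
np.random.seed(1000)
nb_samples = 200
X = np.linspace(-5, 5, nb_samples)
Y = X + 2 + np.random.uniform(-0.5, 0.5, size=nb_samples)

def loss(v):
    residuals = v[0] + v[1] * X - Y
    return 0.5 * np.sum(residuals ** 2)

def gradient(v):
    residuals = v[0] + v[1] * X - Y
    return np.array([np.sum(residuals), np.sum(residuals * X)])

# Same call as in the post: start from (0, 0) and supply the analytic gradient
result = minimize(fun=loss, x0=np.array([0.0, 0.0]), jac=gradient, method='L-BFGS-B')
intercept, slope = result.x

# np.polyfit with deg=1 returns the coefficients as [slope, intercept]
ref_slope, ref_intercept = np.polyfit(X, Y, deg=1)

print('minimize: intercept=%.4f slope=%.4f' % (intercept, slope))
print('polyfit:  intercept=%.4f slope=%.4f' % (ref_intercept, ref_slope))
```

To draw the plot, one option is plt.scatter(X, Y) for the samples followed by plt.plot(X, intercept + slope * X) for the fitted line.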

Higher-dimensional linear regression

First, load the Boston housing example dataset and plot it to see what it looks like.

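# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version
from sklearn.datasets import load_boston

def show_dataset(data):
    # Plot the first 12 features, one per cell of a 4x3 grid
    fig, ax = plt.subplots(4, 3, figsize=(20, 15))
    for i in range(4):
        for j in range(3):
            # i * 3 + j visits columns 0..11 once each (the original index repeated some columns)
            ax[i, j].plot(data.data[:, i * 3 + j])
            ax[i, j].grid()
    plt.show()

boston = load_boston()
show_dataset(boston)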

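Because the dataset is small, a single train/test split gives a noisy estimate of performance, so later on we also use K-fold cross-validation: the data is cut into K equal parts, K-1 parts train the model and the remaining part tests it, and this is repeated K times so that every part is used for testing once. For now, hold out 10% of the data as a test set, then build and train the model.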

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, test_size=0.1)
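# Note: the normalize argument was removed in scikit-learn 1.2; on newer versions use LinearRegression()
lr = LinearRegression(normalize=True)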
lr.fit(X_train, Y_train)

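Next, evaluate the model with scikit-learn's built-in score() method. For a regressor this returns the coefficient of determination (R²), not a classification accuracy. The split above is random, so the exact value changes from run to run.

print('Score %.3f' % lr.score(X_test, Y_test))  # Score 0.693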
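To make clear what score() reports, the sketch below computes R² by hand and compares it with the built-in value. It makes several assumptions that are not in the original post: load_diabetes is swapped in as a stand-in dataset because load_boston is unavailable in current scikit-learn, LinearRegression() is used without the normalize argument, and random_state=1000 is an arbitrary choice to keep the split reproducible.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in dataset (assumption): diabetes instead of Boston housing
data = load_diabetes()
X_train, X_test, Y_train, Y_test = train_test_split(
    data.data, data.target, test_size=0.1, random_state=1000)

lr = LinearRegression()
lr.fit(X_train, Y_train)

# Built-in score: coefficient of determination on the held-out data
builtin_r2 = lr.score(X_test, Y_test)

# Same quantity by hand: 1 - (residual sum of squares / total sum of squares)
pred = lr.predict(X_test)
ss_res = np.sum((Y_test - pred) ** 2)
ss_tot = np.sum((Y_test - Y_test.mean()) ** 2)
manual_r2 = 1.0 - ss_res / ss_tot

print('score():   %.6f' % builtin_r2)
print('manual R2: %.6f' % manual_r2)
```

The numbers will differ from the 0.693 reported above, since both the dataset and the split are different.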

Next, use cross_val_score() and choose a scoring metric. Here it is the negative mean squared error, which is negated so that a higher score is always better.

scores = cross_val_score(lr, boston.data, boston.target, cv=7, scoring='neg_mean_squared_error')
print('CV Negative mean squared errors mean: %.3f' % scores.mean())  # CV Negative mean squared errors mean: -37.287
print('CV Negative mean squared errors std: %.3f' % scores.std())  # CV Negative mean squared errors std: 46.790
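The K-fold procedure described earlier can be written out by hand to show what cross_val_score() does with this scoring option. This is a sketch under assumptions: load_diabetes again stands in for the Boston data, and manual_scores and library_scores are names introduced here. For a regressor, an integer cv is documented to use unshuffled KFold, which is why the manual loop uses KFold(n_splits=7).

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Stand-in dataset (assumption): diabetes instead of Boston housing
data = load_diabetes()
X, y = data.data, data.target

# Manual K-fold: train on K-1 folds, test on the remaining one, K times
manual_scores = []
for train_idx, test_idx in KFold(n_splits=7).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    mse = mean_squared_error(y[test_idx], model.predict(X[test_idx]))
    manual_scores.append(-mse)  # negated, so higher is better
manual_scores = np.array(manual_scores)

# Library version of the same computation
library_scores = cross_val_score(
    LinearRegression(), X, y, cv=7, scoring='neg_mean_squared_error')

print('manual :', np.round(manual_scores, 3))
print('library:', np.round(library_scores, 3))
```

A large standard deviation across folds, like the one reported above, suggests that the folds differ a lot from each other, so the mean alone should be read with care.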

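Finally, compute the cross-validated R² score, which measures how much of the variance in the target the model's predictions explain. The closer to 1 the better; a value near 0 means the model does no better than always predicting the mean, and the score can even be negative.

r2_scores = cross_val_score(lr, boston.data, boston.target, cv=10, scoring='r2')
print('CV R2 score: %.3f' % r2_scores.mean())  # CV R2 score: 0.203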
