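DAY[26] - Kaggle in Practice: Model Preparation - Linear Models

11th iThome Ironman Contest

DAY 26

AI & Data

Python Machine Learning: Introduction and Practice series, part 26

11th Ironman Contest python3 machine learning

Austin

Team Bikini Bottom

2019-10-11 20:33:57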

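With the data prepared, the most important next step is building and training the models. Here we first define the cross-validation setup and the parameter grids for the linear models so that they are ready for later use.
Define the cross-validation scoring functions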

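import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline

# 10-fold cross-validation with a fixed seed so the folds are reproducible
kfolds = KFold(n_splits=10, shuffle=True, random_state=42)

def rmsle(y, y_pred):
    # Plain RMSE; it equals RMSLE only if y and y_pred are already log-transformed
    return np.sqrt(mean_squared_error(y, y_pred))

# X and y are the training features and target prepared in the earlier posts
def cv_rmse(model, X=X):
    # The scorer returns negated MSE, so flip the sign before taking the square root
    rmse = np.sqrt(-cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=kfolds))
    return rmse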
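To make the behaviour of the scoring helper concrete, here is a minimal sketch that runs the same idea on synthetic data. The data from `make_regression`, the plain `Ridge` model, and the names `X_demo`, `y_demo`, `kfolds_demo`, and `cv_rmse_demo` are assumptions for illustration only; they stand in for the `X` and `y` prepared earlier in the series.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the prepared training data (not the Kaggle set)
X_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=42)

kfolds_demo = KFold(n_splits=10, shuffle=True, random_state=42)

def cv_rmse_demo(model, X, y):
    # cross_val_score returns negated MSE (higher is better), so negate it back
    neg_mse = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=kfolds_demo)
    return np.sqrt(-neg_mse)

scores = cv_rmse_demo(Ridge(alpha=1.0), X_demo, y_demo)
print(f"RMSE over {len(scores)} folds: mean={scores.mean():.4f}, std={scores.std():.4f}")
```

The helper returns one RMSE per fold, so reporting both the mean and the standard deviation gives a sense of how stable the model is across folds.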

Set the parameter ranges

alphas_alt = [14.5, 14.6, 14.7, 14.8, 14.9, 15, 15.1, 15.2, 15.3, 15.4, 15.5]
alphas2 = [5e-05, 0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0006, 0.0007, 0.0008]
e_alphas = [0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0006, 0.0007]
e_l1ratio = [0.8, 0.85, 0.9, 0.95, 0.99, 1]
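The lists above are candidate values: a cross-validated estimator such as `RidgeCV` tries each one and keeps the best. The sketch below shows how that selection can be inspected through the fitted `alpha_` attribute. The synthetic data and the `candidate_alphas` grid are hypothetical and are not the values used in this post.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

# Synthetic data for illustration only
X_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

candidate_alphas = [0.1, 1.0, 10.0, 100.0]  # hypothetical grid

search = RidgeCV(alphas=candidate_alphas, cv=KFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X_demo, y_demo)

best_alpha = float(search.alpha_)
print("selected alpha:", best_alpha)

# If the winner sits on the edge of the grid, the range may be too narrow
if best_alpha in (min(candidate_alphas), max(candidate_alphas)):
    print("selected value is on the grid edge; consider widening the range")
```

A narrow grid such as `alphas_alt` only makes sense once a coarser search has already located the right neighbourhood, which is why checking for an edge value can be a useful sanity test.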

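A pipeline works like a production line: by bundling the preprocessing step together with the estimator, a single call applies the preprocessing and then trains the model.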

from sklearn.svm import SVR
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV

# Each pipeline scales the features with RobustScaler before fitting the estimator.
# max_iter must be an integer; recent scikit-learn versions reject the float 1e7.
ridge = make_pipeline(RobustScaler(), RidgeCV(alphas=alphas_alt, cv=kfolds))
lasso = make_pipeline(RobustScaler(), LassoCV(max_iter=int(1e7), alphas=alphas2, random_state=42, cv=kfolds))
elasticnet = make_pipeline(RobustScaler(), ElasticNetCV(max_iter=int(1e7), alphas=e_alphas, cv=kfolds, l1_ratio=e_l1ratio))
svr = make_pipeline(RobustScaler(), SVR(C=20, epsilon=0.008, gamma=0.0003))
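As a rough illustration of what the pipeline does internally, the sketch below compares a pipeline with the same two steps performed by hand. The synthetic data, the train/test split, and the fixed `Ridge(alpha=15.0)` model are assumptions chosen for the example, not part of the original workflow.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic data for illustration only
X_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=1)
X_train, X_test = X_demo[:150], X_demo[150:]
y_train = y_demo[:150]

# Pipeline: scaling and regression wrapped in a single estimator
pipe = make_pipeline(RobustScaler(), Ridge(alpha=15.0))
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# The same steps done manually: fit the scaler on the training data only,
# then reuse it to transform the test data
scaler = RobustScaler().fit(X_train)
manual_model = Ridge(alpha=15.0).fit(scaler.transform(X_train), y_train)
manual_pred = manual_model.predict(scaler.transform(X_test))

print("predictions match:", np.allclose(pipe_pred, manual_pred))
```

Because the scaler lives inside the pipeline, it is refit on the training portion of every fold during cross-validation, which avoids leaking statistics from the validation fold into the preprocessing.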
