Neural nets
– PyTorch, TensorFlow, Keras...
Linear models
– SVM, logistic regression
– Vowpal Wabbit, FTRL
Here we only discuss dense neural networks, i.e. networks built from fully connected layers.
What framework to use?
– Keras, Lasagne
– TensorFlow
– MxNet
– PyTorch
PyTorch or Keras <-- Recommended
Neural nets
1. Number of neurons per layer : adding neurons to a layer lets the model learn more complex decision boundaries, but it also speeds up overfitting.
2. Number of layers : before you know how the model performs, start with 64-unit layers and add them one at a time, debugging and driving the loss down as you go.
3. Optimizers
- SGD + momentum --> stochastic gradient descent with momentum; slower but steadier, and the trained models are usually better.
- Adam/Adadelta/Adagrad/... --> faster, but in practice they lead to more overfitting, so keep an eye on it.
4. Batch size : start with 32 or 64; increase it if the model underfits, decrease it if it overfits.
5. Learning rate : start around 0.1, then try smaller values. Learning rate and batch size are related: if you scale the batch size up by a factor α, you can scale the learning rate up by α as well.
6. Regularization
- L2/L1 for weights : the older approach was L1/L2 regularization; nowadays dropout is used instead
- Dropout/Dropconnect
- Static dropconnect : in Figure 1 the first hidden layer has 128 units; enlarging it to 4096 units (Figure 2) makes the network huge, so in competitions the trick is to randomly drop 99% of the connections between the input layer and the first hidden layer. This is a very powerful form of regularization (a Keras sketch follows the figures below).
[Figure 1 and Figure 2: screenshots from Coursera]
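To tie these tips together, here is a minimal sketch in Keras (assuming TensorFlow 2's bundled Keras; the input shape, layer sizes, dropout rate, and learning rate are just the starting points suggested above, not tuned values):

from tensorflow import keras
from tensorflow.keras import layers

# start with 64-unit layers and grow from there; dropout instead of L1/L2
model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(64,)),  # input_shape is a placeholder
    layers.Dropout(0.5),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),
])

# SGD + momentum: slower but steadier than the Adam family
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.1, momentum=0.9),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)

# start with batch_size=32 or 64; increase if underfitting, decrease if overfitting
# model.fit(X_train, y_train, batch_size=32, epochs=10, validation_data=(X_val, y_val))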
Linear models
1. Scikit-learn
- SVC/SVR <-- rarely used these days
. Sklearn wraps libLinear and libSVM : the SVMs work well out of the box and need little tuning
. Compile yourself for multicore support
- LogisticRegression/LinearRegression + regularizers
- SGDClassifier/SGDRegressor <-- covered earlier in the metrics lessons
2. Vowpal Wabbit : reads and processes data from disk line by line instead of loading the whole dataset, so it can learn from huge amounts of data
- FTRL (follow the regularized leader)
3. Regularization parameter (C, alpha, lambda, ...)
- Start with a very small value and increase it (see the sketch after this list).
- SVC gets slower as C increases
4. Regularization type
- L1/L2/L1+L2 -- try each
- L1 can be used for feature selection
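A small sketch of points 3 and 4 (the C grid here is illustrative, not a recommendation from the course):

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
import numpy as np

X, y = load_digits(n_class=10, return_X_y=True)

# start with a very small C (strong regularization) and increase it, trying both penalties
for C in [0.001, 0.01, 0.1, 1, 10]:
    for penalty in ['l1', 'l2']:
        clf = LogisticRegression(C=C, penalty=penalty, solver='liblinear')
        score = cross_val_score(clf, X, y, cv=3).mean()
        print(f'C={C}, penalty={penalty}: CV accuracy {score:.3f}')

# L1 drives many weights to exactly zero, so it doubles as feature selection
clf = LogisticRegression(C=0.1, penalty='l1', solver='liblinear').fit(X, y)
print('non-zero weights:', np.count_nonzero(clf.coef_), 'of', clf.coef_.size)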
Another hands-on walkthrough from the short-haired instructor; the code and steps below are lifted straight from the full notebook: Hyperparameters_tuning_video2_RF_n_estimators
This notebook shows how to compute RandomForest's accuracy scores for each value of n_estimators without retraining the model. No rocket science involved, but still useful.
Load some data
import sklearn.datasets
from sklearn.model_selection import train_test_split

# load_digits requires return_X_y as a keyword argument in current scikit-learn
X, y = sklearn.datasets.load_digits(n_class=10, return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y)
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
Step 1: first fit a Random Forest to the data. Set n_estimators to a high value.
rf = RandomForestClassifier(n_estimators=500, max_depth=4, n_jobs=-1)
rf.fit(X_train, y_train)
[Fitted model output: screenshot from Coursera]
Step 2: Get predictions for each tree in the Random Forest separately.
predictions = []
for tree in rf.estimators_:
    # each element has shape (1, n_objects, n_classes)
    predictions.append(tree.predict_proba(X_val)[None, :])
Step 3: Concatenate the predictions into a tensor of size (number of trees, number of objects, number of classes).
predictions = np.vstack(predictions)
Step 4: Compute the cumulative average of the predictions. The result is a tensor containing the predictions of a random forest for every value of n_estimators.
# divide the running sum over trees by the number of trees seen so far
cum_mean = np.cumsum(predictions, axis=0) / np.arange(1, predictions.shape[0] + 1)[:, None, None]
Step 5: Get accuracy scores for each n_estimators value
scores = []
for pred in cum_mean:
    scores.append(accuracy_score(y_val, np.argmax(pred, axis=1)))
That is it! Plot the resulting scores to obtain a plot similar to the one that appeared on the slides.
plt.figure(figsize=(10, 6))
plt.plot(scores, linewidth=3)
plt.xlabel('num_trees')
plt.ylabel('accuracy');
[Resulting accuracy plot: screenshot from Coursera]
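One way to put the curve to work (my own addition, not part of the notebook; the 0.005 tolerance is arbitrary): pick the smallest forest whose accuracy is within a small tolerance of the best.

# smallest n_estimators whose accuracy is within 0.005 of the best score
best = max(scores)
n_trees = next(i + 1 for i, s in enumerate(scores) if s >= best - 0.005)
print(f'best accuracy {best:.4f}; about {n_trees} trees already get within 0.005 of it')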