iT邦幫忙

11th iThome Ironman Contest

DAY 24
AI & Data

Learning from top Kagglers how to win data analysis competitions, series part 24

[Day 24] Hyperparameter tuning part III


Neural nets

– PyTorch, TensorFlow, Keras, ...

Linear models

– SVM, logistic regression
– Vowpal Wabbit, FTRL

Here we only discuss dense neural networks, that is, networks built from fully connected layers.


What framework to use?

– Keras, Lasagne
– TensorFlow
– MXNet
– PyTorch

PyTorch or Keras <-- Recommended


Neural nets

1. Number of neurons per layer: adding neurons to each layer lets the model learn more complex decision boundaries, but it also makes the model overfit faster.

2. Number of layers: while you don't yet know how well the model works, start small with 64-unit layers and add them one at a time, debugging and driving the loss down as you go.

3. Optimizers

  • SGD + momentum --> stochastic gradient descent with momentum: slower but stable, and it usually produces better models
  • Adam/Adadelta/Adagrad/... --> faster, but in practice they lead to more overfitting, so watch out for it

4. Batch size: start with 32 or 64; increase it if the model underfits, and decrease it if the model overfits.

5. Learning rate: start with 0.1, then try decreasing it. Learning rate and batch size are related: if you increase the batch size by a factor of α, you can also increase the learning rate by the same factor α.

6. Regularization

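  • L2/L1 for weights: the older approach was to regularize with L1 or L2 penalties; nowadays dropout is mostly used instead
  • Dropout/Dropconnect
  • Static dropconnect: widen the first hidden layer from 128 units (Figure 1 below) to 4096 units (Figure 2 below). The network becomes huge, so in competitions 99% of the connections between the input layer and the first hidden layer are dropped at random. This is a very powerful regularization technique.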

Figure 1
https://ithelp.ithome.com.tw/upload/images/20190925/20108719vtDdFCEkL4.png
Figure 2
https://ithelp.ithome.com.tw/upload/images/20190925/20108719RP60S1YRkF.png
Screenshots from Coursera
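Items 1 to 4 above (64-unit layers added one at a time, SGD + momentum, a small batch size) can be tried out quickly with scikit-learn's `MLPClassifier`, used here as a lightweight stand-in for the PyTorch/Keras models recommended above. This is a sketch on the digits dataset, not the course's code: `learning_rate_init=0.01` is a conservative choice for this toy run rather than the 0.1 starting point of item 5, and the other values are arbitrary. The loop checks that the training loss goes down and reports validation accuracy as each layer is added.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

results = {}
for n_layers in (1, 2, 3):
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(
            hidden_layer_sizes=(64,) * n_layers,  # add one 64-unit dense layer at a time
            solver="sgd",                         # SGD + momentum (Nesterov by default)
            momentum=0.9,
            batch_size=32,                        # small batch to start with
            learning_rate_init=0.01,
            max_iter=100,
            random_state=0,
        ),
    )
    model.fit(X_train, y_train)
    mlp = model[-1]  # the fitted MLPClassifier step of the pipeline
    first_loss, last_loss = mlp.loss_curve_[0], mlp.loss_curve_[-1]
    val_acc = model.score(X_val, y_val)
    results[n_layers] = (first_loss, last_loss, val_acc)
    print(f"{n_layers} layer(s): train loss {first_loss:.3f} -> {last_loss:.3f}, "
          f"val accuracy {val_acc:.3f}")
```

If validation accuracy stops improving while the training loss keeps falling, that is the overfitting signal the tips above warn about.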
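The batch-size/learning-rate rule in item 5 is simple arithmetic: the batch size is multiplied by α, so the learning rate is multiplied by the same α. A minimal sketch; `scale_learning_rate` is a hypothetical helper name, and the rule is a heuristic starting point, not a guarantee.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Hypothetical helper: scale the learning rate by the same factor as the batch size."""
    alpha = new_batch_size / base_batch_size  # the factor α from item 5
    return base_lr * alpha


print(scale_learning_rate(0.1, 32, 64))  # batch size x2 -> learning rate x2
print(scale_learning_rate(0.1, 64, 32))  # batch size /2 -> learning rate /2
```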
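Item 6 describes static dropconnect only in words. Below is a minimal NumPy sketch of the first hidden layer, assuming "static" means the dropped input-to-hidden connections are chosen once before training and then kept fixed (ordinary DropConnect resamples the mask at every training step). The class name, the He-style initialization on the effective fan-in, and the 64-input example are assumptions for illustration, not the course's implementation.

```python
import numpy as np


class StaticDropConnectLayer:
    """Hypothetical dense ReLU layer whose connection mask is sampled once and frozen."""

    def __init__(self, n_in, n_out, drop_rate=0.99, seed=0):
        rng = np.random.default_rng(seed)
        # Sample the connectivity pattern ONCE: True = connection kept.
        self.mask = rng.random((n_in, n_out)) >= drop_rate
        # Assumption: He-style init scaled by the expected number of surviving inputs.
        effective_fan_in = max(1.0, n_in * (1.0 - drop_rate))
        self.W = rng.normal(0.0, np.sqrt(2.0 / effective_fan_in), size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def forward(self, x):
        # Dropped connections are zero in both training and inference. With autograd,
        # the gradient w.r.t. a dropped weight is also zero, so it never comes back.
        return np.maximum(0.0, x @ (self.W * self.mask) + self.b)


layer = StaticDropConnectLayer(n_in=64, n_out=4096)  # e.g. 8x8 digit pixels -> 4096 units
x = np.random.default_rng(1).random((8, 64))         # a toy batch of 8 examples
out = layer.forward(x)
print(out.shape)          # (8, 4096)
print(layer.mask.mean())  # roughly 0.01 of the connections survive
```

Because the mask lives on the layer object, every call to `forward` uses the same sparse wiring, which is what distinguishes this from per-step DropConnect.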


Linear models

1. Scikit-learn

  • SVC/SVR <-- rarely used nowadays
    – Sklearn wraps libLinear and libSVM: SVMs work well and need little tuning
    – Compile them yourself for multicore support
  • LogisticRegression/LinearRegression + regularizers
  • SGDClassifier/SGDRegressor <-- covered earlier in the metrics posts

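2. Vowpal Wabbit: reads and processes data from disk line by line instead of loading the whole dataset at once, so it can learn from very large datasets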

  • FTRL (Follow The Regularized Leader)

3. Regularization parameter (C, alpha, lambda, ...)

  • Start with a very small value and increase it.
  • SVC trains more slowly as C increases

4. Regularization type

  • L1/L2/L1+L2 -- try each
  • L1 can be used for feature selection
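Item 2 only names FTRL. As a rough illustration, and not Vowpal Wabbit's own implementation, here is a minimal per-coordinate FTRL-Proximal logistic regression following the update rules published by McMahan et al. (2013) for ad click prediction. It consumes examples one at a time from a generator, mirroring the streaming, line-by-line processing described for Vowpal Wabbit. The class name, the toy `example_stream` generator, and all hyperparameter values are hypothetical choices for this sketch.

```python
import math
import random


class FTRLProximal:
    """Per-coordinate FTRL-Proximal logistic regression on sparse dict features."""

    def __init__(self, alpha=0.5, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-feature accumulated (adjusted) gradients
        self.n = {}  # per-feature accumulated squared gradients

    def weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # the L1 term keeps this weight exactly at zero
        n = self.n.get(i, 0.0)
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2
        )

    def predict(self, x):
        """x maps feature index -> value; returns P(y = 1 | x)."""
        wx = sum(self.weight(i) * v for i, v in x.items())
        wx = max(min(wx, 35.0), -35.0)  # clip to keep math.exp in range
        return 1.0 / (1.0 + math.exp(-wx))

    def update(self, x, y):
        """One online step on a single example (y is 0 or 1)."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log loss w.r.t. w_i
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            # weight(i) is evaluated with the old z and n, as the update rule requires
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self.weight(i)
            self.n[i] = n + g * g
        return p


def example_stream(n_examples, seed=0):
    """Hypothetical toy stream standing in for reading a file line by line."""
    rng = random.Random(seed)
    for _ in range(n_examples):
        k = rng.randint(1, 4)  # one active categorical feature out of 4
        x = {0: 1.0, k: 1.0}   # feature 0 acts as a bias term
        y = 1 if k in (1, 2) else 0
        yield x, y


model = FTRLProximal()
for x, y in example_stream(2000):
    model.update(x, y)  # each example is seen once, then discarded

print(model.predict({0: 1.0, 1: 1.0}))  # feature 1 -> label 1, expect > 0.5
print(model.predict({0: 1.0, 3: 1.0}))  # feature 3 -> label 0, expect < 0.5
```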
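Item 3's "start small and increase" advice can be tried with a log-spaced sweep. A sketch, assuming scikit-learn's `LogisticRegression` on the digits dataset with an arbitrary grid. Note that in scikit-learn, C is the inverse of the regularization strength, so a small C means strong regularization; `alpha` in models such as `Ridge` or `SGDClassifier` works the other way round.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

cv_scores = {}
for C in np.logspace(-4, 2, 7):  # 1e-4, 1e-3, ..., 1e2: from small to large
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=2000))
    cv_scores[C] = cross_val_score(model, X, y, cv=3).mean()
    print(f"C={C:g}  mean CV accuracy={cv_scores[C]:.4f}")

best_C = max(cv_scores, key=cv_scores.get)
print("best C:", best_C)
```

Once the best region is found, a finer grid around `best_C` is the usual next step.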
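To illustrate item 4's point that L1 can be used for feature selection, here is a sketch with scikit-learn's `Lasso` (L1-regularized linear regression) wrapped in `SelectFromModel`, on synthetic data where only 5 of 20 features are informative. The dataset and `alpha=1.0` are arbitrary choices for this example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# 20 features, only 5 of which actually influence y; coef=True returns the true weights
X, y, true_coef = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=1.0, coef=True, random_state=0
)

# The L1 penalty drives weights of useless features to exactly zero;
# SelectFromModel keeps the features whose weights survive.
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
selected = selector.get_support(indices=True)

print("selected features:   ", selected)
print("informative features:", np.flatnonzero(true_coef))
X_reduced = selector.transform(X)  # keep only the selected columns
print(X_reduced.shape)
```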

Time again for 短髮狄哥's hands-on coding practice! Below are the code and steps taken from the full notebook: Hyperparameters_tuning_video2_RF_n_estimators

This notebook shows how to compute RandomForest's accuracy scores for each value of n_estimators without retraining the model. No rocket science involved, but still useful.

Load some data

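import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
# Jupyter-only magic: render plots inline (remove this line in a plain .py script)
%matplotlib inline

# 8x8 handwritten digits, 10 classes; recent scikit-learn accepts these only as keyword arguments
X, y = sklearn.datasets.load_digits(n_class=10, return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y)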

Step 1: first fit a Random Forest to the data. Set n_estimators to a high value.

rf = RandomForestClassifier(n_estimators=500, max_depth=4, n_jobs=-1)
rf.fit(X_train, y_train)

https://ithelp.ithome.com.tw/upload/images/20190925/20108719TVJSyzBOHz.png
Screenshot from Coursera

Step 2: Get predictions for each tree in Random Forest separately.

predictions = []
for tree in rf.estimators_:
    predictions.append(tree.predict_proba(X_val)[None, :])

Step 3: Concatenate the predictions into a tensor of shape (number of trees, number of objects, number of classes).

predictions = np.vstack(predictions)

Step 4: Compute the cumulative average of the predictions. The result is a tensor containing the random forest's predictions for each value of n_estimators.

cum_mean = np.cumsum(predictions, axis=0)/np.arange(1, predictions.shape[0] + 1)[:, None, None]

Step 5: Get accuracy scores for each n_estimators value

scores = []
for pred in cum_mean:
    # argmax returns a column index; rf.classes_ maps it back to the original class label
    scores.append(accuracy_score(y_val, rf.classes_[np.argmax(pred, axis=1)]))

That is it! Plot the resulting scores to obtain a plot similar to the one that appeared on the slides.

plt.figure(figsize=(10, 6))
plt.plot(scores, linewidth=3)
plt.xlabel('num_trees')
plt.ylabel('accuracy');

https://ithelp.ithome.com.tw/upload/images/20190925/20108719El7th4CAWR.png

Screenshot from Coursera
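The notebook's steps can be packaged into one reusable function. The sketch below is self-contained and uses a hypothetical helper name, `staged_rf_scores`, with a smaller forest (100 trees) to keep it quick. It also adds a sanity check not in the original notebook: the running mean over all trees should match the forest's own `predict_proba`, since a scikit-learn random forest averages its trees' class probabilities.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def staged_rf_scores(rf, X_val, y_val):
    """Hypothetical helper: accuracy of the forest using its first k trees, k = 1..n."""
    # Shape (n_trees, n_objects, n_classes): one probability matrix per tree
    per_tree = np.stack([tree.predict_proba(X_val) for tree in rf.estimators_])
    # Running mean along the tree axis = forest prediction with the first k trees
    counts = np.arange(1, len(rf.estimators_) + 1)[:, None, None]
    cum_mean = np.cumsum(per_tree, axis=0) / counts
    # Map argmax column indices back to the original class labels
    scores = [accuracy_score(y_val, rf.classes_[p.argmax(axis=1)]) for p in cum_mean]
    return np.array(scores), cum_mean


X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0)
rf.fit(X_train, y_train)

scores, cum_mean = staged_rf_scores(rf, X_val, y_val)
print(scores[:5])
# The full-forest running mean should agree with the forest's own probabilities
print(np.allclose(cum_mean[-1], rf.predict_proba(X_val)))
```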

