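Here we use optuna for Bayesian optimization and integrate it into an AutoML workflow that automatically selects a model and tunes its hyperparameters.
First, install optuna and scikit-learn: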
pip install optuna
pip install scikit-learn
Then run the following code:
import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm
# Load the example dataset (bundled with scikit-learn)
data = sklearn.datasets.load_breast_cancer()
X = data.data
y = data.target
# Define the objective function
def objective(trial):
    # Let Optuna choose which classifier to try in this trial
    classifier_name = trial.suggest_categorical("classifier", ["RandomForest", "SVC"])
    if classifier_name == "RandomForest":
        n_estimators = trial.suggest_int("n_estimators", 10, 100)
        max_depth = trial.suggest_int("max_depth", 2, 32, log=True)
        classifier_obj = sklearn.ensemble.RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)
    else:
        # suggest_float(..., log=True) replaces the deprecated suggest_loguniform
        C = trial.suggest_float("C", 1e-10, 1e10, log=True)
        classifier_obj = sklearn.svm.SVC(C=C, gamma="auto")
    # Evaluate the model with cross-validation
    score = sklearn.model_selection.cross_val_score(classifier_obj, X, y, n_jobs=-1, cv=3)
    accuracy = score.mean()
    return accuracy
# Create a study to run the hyperparameter search
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
# Print the best parameters
print("Best trial:")
trial = study.best_trial
print(f" Accuracy: {trial.value}")
print(" Best hyperparameters: ", trial.params)
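This example uses optuna for Bayesian optimization. It tunes the hyperparameters of two models, a random forest (RandomForest) and a support vector machine (SVC), and lets the search choose between them. Each trial is scored with cross_val_score, which runs 3-fold cross-validation; the objective returns the mean accuracy across the folds.
Experimental results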
[I 2024-09-16 10:35:01,297] A new study created in memory with name: no-name-9e42bdc7-c73e-4e82-80e7-af0461d45672
[I 2024-09-16 10:35:01,844] Trial 0 finished with value: 0.6274204028589994 and parameters: {'classifier': 'SVC', 'C': 9.012114180566522}. Best is trial 0 with value: 0.6274204028589994.
[I 2024-09-16 10:35:02,105] Trial 1 finished with value: 0.6274204028589994 and parameters: {'classifier': 'SVC', 'C': 62678.40611887763}. Best is trial 0 with value: 0.6274204028589994.
[I 2024-09-16 10:35:02,486] Trial 2 finished with value: 0.9437668244685788 and parameters: {'classifier': 'RandomForest', 'n_estimators': 92, 'max_depth': 2}. Best is trial 2 with value: 0.9437668244685788.
[I 2024-09-16 10:35:02,788] Trial 3 finished with value: 0.9525851666202544 and parameters: {'classifier': 'RandomForest', 'n_estimators': 16, 'max_depth': 3}. Best is trial 3 with value: 0.9525851666202544.
[I 2024-09-16 10:35:03,057] Trial 4 finished with value: 0.6274204028589994 and parameters: {'classifier': 'SVC', 'C': 72965996.5695198}. Best is trial 3 with value: 0.9525851666202544.
[I 2024-09-16 10:35:03,453] Trial 5 finished with value: 0.9560753736192332 and parameters: {'classifier': 'RandomForest', 'n_estimators': 58, 'max_depth': 14}. Best is trial 5 with value: 0.9560753736192332.
...
...
...
[I 2024-09-16 10:35:12,654] Trial 95 finished with value: 0.9543024227234754 and parameters: {'classifier': 'RandomForest', 'n_estimators': 74, 'max_depth': 9}. Best is trial 44 with value: 0.9718926946997123.
[I 2024-09-16 10:35:12,718] Trial 96 finished with value: 0.9525758841548315 and parameters: {'classifier': 'RandomForest', 'n_estimators': 19, 'max_depth': 15}. Best is trial 44 with value: 0.9718926946997123.
[I 2024-09-16 10:35:12,829] Trial 97 finished with value: 0.9595841455490578 and parameters: {'classifier': 'RandomForest', 'n_estimators': 47, 'max_depth': 13}. Best is trial 44 with value: 0.9718926946997123.
[I 2024-09-16 10:35:12,940] Trial 98 finished with value: 0.9560660911538105 and parameters: {'classifier': 'RandomForest', 'n_estimators': 53, 'max_depth': 11}. Best is trial 44 with value: 0.9718926946997123.
[I 2024-09-16 10:35:13,021] Trial 99 finished with value: 0.959574863083635 and parameters: {'classifier': 'RandomForest', 'n_estimators': 29, 'max_depth': 10}. Best is trial 44 with value: 0.9718926946997123.
Best trial:
Accuracy: 0.9718926946997123
Best hyperparameters: {'classifier': 'RandomForest', 'n_estimators': 36, 'max_depth': 6}
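One detail in the log is worth a closer look: every SVC trial shown scores about 0.6274, regardless of C, which is what always predicting the majority class (357 of the 569 samples) would score. A likely cause, though the log alone does not confirm it, is that the RBF kernel with gamma="auto" is being applied to unscaled features. The sketch below compares the same SVC with and without standardization. The StandardScaler step and the value C=1.0 are assumptions added for illustration; they are not part of the search above.

```python
import sklearn.datasets
import sklearn.model_selection
import sklearn.pipeline
import sklearn.preprocessing
import sklearn.svm

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)

# Same SVC settings as in the objective; C=1.0 is an illustrative value (assumption).
raw_svc = sklearn.svm.SVC(C=1.0, gamma="auto")

# Assumption: standardize each feature before the SVC.
# This preprocessing step does not appear in the original search.
scaled_svc = sklearn.pipeline.make_pipeline(
    sklearn.preprocessing.StandardScaler(),
    sklearn.svm.SVC(C=1.0, gamma="auto"),
)

# Same 3-fold cross-validation as in the objective function.
raw_accuracy = sklearn.model_selection.cross_val_score(raw_svc, X, y, cv=3).mean()
scaled_accuracy = sklearn.model_selection.cross_val_score(scaled_svc, X, y, cv=3).mean()

print(f"SVC on raw features:    {raw_accuracy:.4f}")
print(f"SVC on scaled features: {scaled_accuracy:.4f}")
```

If the scaled pipeline scores clearly higher, wrapping the SVC branch of the objective in such a pipeline would give the search a fairer comparison between the two model families.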
The best solution found:
Accuracy: 0.9718926946997123
Best hyperparameters: {'classifier': 'RandomForest', 'n_estimators': 36, 'max_depth': 6}
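The search only reports the winning parameters; it does not hand back a fitted model. A minimal sketch of turning them into one follows, assuming a hypothetical helper `build_classifier` that mirrors the two branches of the objective function. The 75/25 hold-out split and the fixed `random_state` are also assumptions made for illustration.

```python
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm


def build_classifier(params):
    """Rebuild a classifier from a dict shaped like study.best_params (hypothetical helper)."""
    if params["classifier"] == "RandomForest":
        return sklearn.ensemble.RandomForestClassifier(
            n_estimators=params["n_estimators"],
            max_depth=params["max_depth"],
            random_state=0,  # assumption: fixed seed so reruns give the same model
        )
    return sklearn.svm.SVC(C=params["C"], gamma="auto")


# Best hyperparameters as reported in the output above.
best_params = {"classifier": "RandomForest", "n_estimators": 36, "max_depth": 6}

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)

# Assumption: a stratified 75/25 split, used here only to show the mechanics.
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_classifier(best_params)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Hold-out accuracy: {test_accuracy:.4f}")
```

In a real workflow the split would be made before the search, so that Optuna never sees the test portion; after a live run, `study.best_params` can be passed to the helper in place of the hard-coded dict.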
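AutoML reduces the dependence on manual effort and time, and the efficiency of Bayesian optimization stands out especially in high-dimensional hyperparameter spaces. As these techniques mature, researchers and developers can focus on model structure and the application itself rather than spending large amounts of time on hyperparameter tuning. Imagine an e-commerce platform using AutoML and Bayesian optimization to tune its recommender system automatically and raise its click-through rate by 15%, or a financial institution automatically adjusting its risk-control model to cut its bad-debt rate by 10%. Progress of this kind lets companies bring machine learning into real business more quickly and create more value, and we can expect more fields to benefit from automated machine learning in the future.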