iT邦幫忙

2024 iThome 鐵人賽

DAY 28
Self-Challenge Group

30 Days of Programming Study Notes: My Self-Learning Journey, Part 28 of the series

[DAY 28] AutoML in Practice: Finding the Best Hyperparameters with Bayesian Optimization

Implementing AutoML and Bayesian Optimization in Python

Here we use optuna to perform Bayesian optimization and integrate it into an AutoML workflow that automatically selects a model and tunes its hyperparameters.

  1. First, install optuna and scikit-learn

     pip install optuna
     pip install scikit-learn
    
  2. Run the following code

    import optuna
    import sklearn.datasets
    import sklearn.ensemble
    import sklearn.model_selection
    import sklearn.svm
    
    # Load an example dataset
    data = sklearn.datasets.load_breast_cancer()
    X = data.data
    y = data.target
    
    # Define the objective function
    def objective(trial):
        classifier_name = trial.suggest_categorical("classifier", ["RandomForest", "SVC"])
    
        if classifier_name == "RandomForest":
            n_estimators = trial.suggest_int("n_estimators", 10, 100)
            max_depth = trial.suggest_int("max_depth", 2, 32, log=True)
            classifier_obj = sklearn.ensemble.RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)
        else:
            # suggest_loguniform is deprecated; suggest_float(..., log=True) replaces it
            C = trial.suggest_float("C", 1e-10, 1e10, log=True)
            classifier_obj = sklearn.svm.SVC(C=C, gamma="auto")
    
        # Evaluate the model with cross-validation
        score = sklearn.model_selection.cross_val_score(classifier_obj, X, y, n_jobs=-1, cv=3)
        accuracy = score.mean()
    
        return accuracy
    
    # Create a study to run the hyperparameter search
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=100)
    
    # Print the best result
    print("Best trial:")
    trial = study.best_trial
    
    print(f"  Accuracy: {trial.value}")
    print("  Best hyperparameters: ", trial.params)
    
    • We use optuna to run Bayesian optimization. Here we tune the hyperparameters of two model families: a random forest (RandomForest) and a support vector machine (SVC).
    • Each trial picks a model and samples a hyperparameter combination; over many trials the search converges on the set of parameters that maximizes the model's accuracy.
    • cross_val_score performs cross-validation to evaluate the model on each trial.
  3. Results

    [I 2024-09-16 10:35:01,297] A new study created in memory with name: no-name-9e42bdc7-c73e-4e82-80e7-af0461d45672
    [I 2024-09-16 10:35:01,844] Trial 0 finished with value: 0.6274204028589994 and parameters: {'classifier': 'SVC', 'C': 9.012114180566522}. Best is trial 0 with value: 0.6274204028589994.
    [I 2024-09-16 10:35:02,105] Trial 1 finished with value: 0.6274204028589994 and parameters: {'classifier': 'SVC', 'C': 62678.40611887763}. Best is trial 0 with value: 0.6274204028589994.
    [I 2024-09-16 10:35:02,486] Trial 2 finished with value: 0.9437668244685788 and parameters: {'classifier': 'RandomForest', 'n_estimators': 92, 'max_depth': 2}. Best is trial 2 with value: 0.9437668244685788.
    [I 2024-09-16 10:35:02,788] Trial 3 finished with value: 0.9525851666202544 and parameters: {'classifier': 'RandomForest', 'n_estimators': 16, 'max_depth': 3}. Best is trial 3 with value: 0.9525851666202544.
    [I 2024-09-16 10:35:03,057] Trial 4 finished with value: 0.6274204028589994 and parameters: {'classifier': 'SVC', 'C': 72965996.5695198}. Best is trial 3 with value: 0.9525851666202544.
    [I 2024-09-16 10:35:03,453] Trial 5 finished with value: 0.9560753736192332 and parameters: {'classifier': 'RandomForest', 'n_estimators': 58, 'max_depth': 14}. Best is trial 5 with value: 0.9560753736192332.
    ...
    ...
    ...
    [I 2024-09-16 10:35:12,654] Trial 95 finished with value: 0.9543024227234754 and parameters: {'classifier': 'RandomForest', 'n_estimators': 74, 'max_depth': 9}. Best is trial 44 with value: 0.9718926946997123.
    [I 2024-09-16 10:35:12,718] Trial 96 finished with value: 0.9525758841548315 and parameters: {'classifier': 'RandomForest', 'n_estimators': 19, 'max_depth': 15}. Best is trial 44 with value: 0.9718926946997123.
    [I 2024-09-16 10:35:12,829] Trial 97 finished with value: 0.9595841455490578 and parameters: {'classifier': 'RandomForest', 'n_estimators': 47, 'max_depth': 13}. Best is trial 44 with value: 0.9718926946997123.
    [I 2024-09-16 10:35:12,940] Trial 98 finished with value: 0.9560660911538105 and parameters: {'classifier': 'RandomForest', 'n_estimators': 53, 'max_depth': 11}. Best is trial 44 with value: 0.9718926946997123.
    [I 2024-09-16 10:35:13,021] Trial 99 finished with value: 0.959574863083635 and parameters: {'classifier': 'RandomForest', 'n_estimators': 29, 'max_depth': 10}. Best is trial 44 with value: 0.9718926946997123.
    Best trial:
      Accuracy: 0.9718926946997123
      Best hyperparameters:  {'classifier': 'RandomForest', 'n_estimators': 36, 'max_depth': 6}
    

    The best solution found was:

    Accuracy: 0.9718926946997123
    Best hyperparameters: {'classifier': 'RandomForest', 'n_estimators': 36, 'max_depth': 6}

Conclusion

AutoML reduces the manpower and time that hyperparameter tuning demands, and the efficiency of Bayesian optimization stands out especially in high-dimensional hyperparameter spaces. As these techniques mature, researchers and developers can focus on model architecture and the application itself rather than spending their time on manual tuning. Imagine an e-commerce platform using AutoML and Bayesian optimization to automatically tune its recommender system and lift click-through rate by 15%, or a financial institution automatically adjusting its risk model to cut its bad-debt rate by 10%. Advances like these let companies bring machine learning into real business use faster and create more value. We can expect many more fields to benefit from automated machine learning and to unlock even greater potential.
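Note that study.optimize only scores candidates via cross-validation; the best hyperparameters still need to be used to fit a final model on the training data. A minimal sketch of that last step (the parameter values are taken from this run's best trial, {'n_estimators': 36, 'max_depth': 6}; your search may find different ones, so in practice read them from study.best_trial.params):

```python
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection

# Same dataset as in the search above, with a held-out test split
data = sklearn.datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Hyperparameters from the best trial; in practice: study.best_trial.params
best_params = {"n_estimators": 36, "max_depth": 6}

# Refit the winning model on the full training split
final_model = sklearn.ensemble.RandomForestClassifier(**best_params, random_state=42)
final_model.fit(X_train, y_train)

test_accuracy = final_model.score(X_test, y_test)
print(f"Held-out accuracy: {test_accuracy:.4f}")
```

Evaluating on a held-out split the optimizer never saw guards against the cross-validation score being slightly optimistic after 100 trials of selection.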


上一篇
[DAY 27]告別手動調參:AutoML 打造高效機器學習流程
下一篇
[DAY 29]Python API 教學:使用 Flask 和 ngrok 打造你的公開服務
系列文
30 天程式學習筆記:我的自學成長之路30
圖片
  直播研討會
圖片
{{ item.channelVendor }} {{ item.webinarstarted }} |
{{ formatDate(item.duration) }}
直播中

尚未有邦友留言

立即登入留言