Microsoft Azure Machine Learning - Day 3

Self-study notes

s790502ss 2022-02-06 20:10:30 ‧ 1215 views


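Chap.I Practical drill

The following content comes from here

Part4. Run Experiments

This part practices using Azure's built-in notebook feature to write the Experiments, Scripts & MLflow introduced in the previous post (content from here).

The whole workflow, from processing data through modeling to prediction, is called an experiment.
In Azure Machine Learning, an experiment is usually run from a script or a pipeline, which produces the outputs and records the results.
This post uses the Azure Machine Learning SDK to run Python code as experiments.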

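4-0. Download the notebook files

In the Jupyter notebook environment, open a new terminal and enter the following command:

git clone https://github.com/MicrosoftLearning/mslearn-dp100 mslearn-dp100

This downloads all of the notebooks used below.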

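4-1. Verify that the Azure Machine Learning SDK is installed

First check the SDK version in the current environment.
In the Jupyter notebook environment, open a new terminal and enter the following commands one at a time:

pip show azureml-sdk

pip show azureml-widgets

Azure notebooks should have both packages installed by default.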
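The same check can also be done from a notebook cell rather than a terminal. The following is a minimal sketch, assuming Python 3.8 or later (where importlib.metadata is part of the standard library); the helper name installed_version is made up for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Package names taken from the pip commands above.
for name in ("azureml-sdk", "azureml-widgets"):
    version = installed_version(name)
    print(name, version if version else "not installed")
```

If either package is reported as not installed, it has to be installed with pip before the cells below will run.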

4-2. Create and run an inline experiment

An experiment created and run directly in the notebook environment is called an inline experiment.

import azureml.core
from azureml.core import Workspace, Experiment
import pandas as pd
import matplotlib.pyplot as plt

# Load the workspace from the saved config file
ws = Workspace.from_config()

# Important - Experiment(): create (or retrieve) a named experiment in the workspace
experiment = Experiment(workspace=ws, name="mslearn-designer-predict-diabetes")

# Important - start_logging(): start an interactive run and record what happens in it
run = experiment.start_logging()
print("Starting experiment:", experiment.name)

data = pd.read_csv('data/diabetes.csv')
row_count = len(data)

# Important - run.log(): record a named metric value in the run
# 'observations' is the metric name; row_count is the number of rows in the data
run.log('observations', row_count)
print(f'Analyzing {row_count} rows of data')

# Log summary statistics for the numeric columns
med_columns = ['PlasmaGlucose', 'DiastolicBloodPressure', 'TricepsThickness', 'SerumInsulin', 'BMI']
summary_stats = data[med_columns].describe().to_dict()
for col in summary_stats:
    keys = list(summary_stats[col].keys())
    values = list(summary_stats[col].values())
    for index in range(len(keys)):
        run.log_row(col, stat=keys[index], value=values[index])

# Sample 100 rows and write them to a new CSV file
data.sample(100).to_csv('sample.csv', index=False, header=True)

# Important - run.upload_file(): upload a local file to the run's outputs
run.upload_file(name='outputs/sample.csv', path_or_stream='./sample.csv')

# Important - run.complete(): mark the run as finished
run.complete()

>>  Starting experiment: mslearn-designer-predict-diabetes
    Analyzing 10000 rows of data
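The index-based loop that logs the summary statistics can also be written with dict.items(). The sketch below shows that shape without needing a workspace: RecordingRun is a hypothetical stand-in that only stores log_row calls (it is not the Azure ML Run class), and the statistics are made-up numbers in the same {column: {statistic: value}} layout that describe().to_dict() produces.

```python
class RecordingRun:
    """Hypothetical stand-in for azureml.core.Run that only stores log_row calls."""

    def __init__(self):
        self.rows = []

    def log_row(self, name, **columns):
        self.rows.append((name, columns))


def log_summary_stats(run, summary_stats):
    """Log one row per (column, statistic) pair."""
    for col, stats in summary_stats.items():
        for stat, value in stats.items():
            run.log_row(col, stat=stat, value=value)


# Same shape as DataFrame.describe().to_dict(): {column: {statistic: value}}.
# The numbers are made up for illustration.
example_stats = {
    "BMI": {"count": 3.0, "mean": 30.0, "max": 40.0},
    "SerumInsulin": {"count": 3.0, "mean": 120.0, "max": 200.0},
}

demo_run = RecordingRun()
log_summary_stats(demo_run, example_stats)
for name, columns in demo_run.rows:
    print(name, columns)
```

Each column becomes one table-style metric with a stat and a value field, which is what the nested loop in the notebook cell produces as well.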

Use the download_files method to download the files the experiment generated (download_file retrieves a single file by name)

import os

download_folder = 'downloaded-files'

# Download files in the "outputs" folder
run.download_files(prefix='outputs', output_directory=download_folder)

# Verify the files have been downloaded
for root, directories, filenames in os.walk(download_folder):
    for filename in filenames:
        print(os.path.join(root, filename))
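To check for a specific file instead of reading the printed list, the walk can be wrapped in a small helper. This is a sketch only: list_files is a hypothetical helper, and the temporary folder stands in for the download folder so the example runs without a workspace. The outputs/sample.csv layout is an assumption about how the downloaded files are arranged.

```python
import tempfile
from pathlib import Path


def list_files(folder):
    """Return every file under folder as a sorted list of relative POSIX paths."""
    base = Path(folder)
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file())


# Demo on a temporary folder that mimics the assumed layout after the download.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "outputs" / "sample.csv"
    sample.parent.mkdir(parents=True)
    sample.write_text("PatientID,Diabetic\n")
    found = list_files(tmp)
    print(found)
```

With the real download folder, a check such as "outputs/sample.csv" in list_files(download_folder) confirms that the sample file arrived.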

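4-3. Create and run an experiment script

In the 4-2 example, we ran an inline experiment in the notebook.
In practice, a more common approach is to write a script for the experiment, store it in a folder together with all the files it needs, and have Azure ML run the experiment from the script in that folder.

Create a folder and copy the data into it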

import os, shutil

# Create a folder for the experiment files
folder_name = 'diabetes-experiment-files'
experiment_folder = './' + folder_name
os.makedirs(folder_name, exist_ok=True)

# Copy the data file into the folder
shutil.copy('data/diabetes.csv', os.path.join(folder_name, "diabetes.csv"))

Create the script (it is only written to disk here, not run)

%%writefile $folder_name/diabetes_experiment.py
# The cell magic above writes everything below it into diabetes_experiment.py

from azureml.core import Run
import pandas as pd
import os

# Important - Run.get_context(): get the run context the script is executing in
run = Run.get_context()

# Load the diabetes data from the script's folder
data = pd.read_csv('diabetes.csv')

# Count the rows
row_count = len(data)

# Important - run.log(): record a named metric value in the run
# 'observations' is the metric name; row_count is the number of rows in the data
run.log('observations', row_count)
print('Analyzing {} rows of data'.format(row_count))

# Count and log the number of rows for each label
diabetic_counts = data['Diabetic'].value_counts()
print(diabetic_counts)
for k, v in diabetic_counts.items():
    run.log('Label:' + str(k), v)

# Save a sample of the data in the outputs folder (uploaded automatically)
os.makedirs('outputs', exist_ok=True)
# Sample 100 rows and write them to a new CSV file
data.sample(100).to_csv("outputs/sample.csv", index=False, header=True)

# Important - run.complete(): mark the run as finished
run.complete()
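The label-count loop in the script produces one metric per distinct label value, named Label:<value>. A minimal sketch of that naming, using collections.Counter as a standard-library stand-in for value_counts() and made-up labels rather than the real Diabetic column:

```python
from collections import Counter


def label_metrics(labels):
    """Map each distinct label to a metric name of the form 'Label:<value>'."""
    return {"Label:" + str(k): v for k, v in Counter(labels).items()}


# Made-up labels for illustration; the real script reads the 'Diabetic' column.
print(label_metrics([0, 1, 0, 0, 1]))
# prints {'Label:0': 3, 'Label:1': 2}
```

In the script itself, each of these name and count pairs is passed to run.log().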

Run the script

To run the script, create a ScriptRunConfig that identifies the Python script to run in the experiment, then use it to run the experiment.

from azureml.core import Experiment, ScriptRunConfig, Environment
from azureml.widgets import RunDetails

# Define the Python environment for the experiment
env = Environment.from_conda_specification("experiment_env", "environment.yml")

# Important - ScriptRunConfig(): define how the experiment script is run
script_config = ScriptRunConfig(
    source_directory=experiment_folder,

    # The script to run, relative to source_directory
    script='diabetes_experiment.py',

    # The Python environment the script runs in
    environment=env
)

# Submit the experiment
experiment = Experiment(workspace=ws, name='mslearn-designer-predict-diabetes')
run = experiment.submit(config=script_config)
RunDetails(run).show()

# Block until the run finishes; without this, the notebook moves on to the next cell while the run is still in progress
run.wait_for_completion()
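The environment.yml file passed to Environment.from_conda_specification is not shown in these notes. The sketch below generates a minimal one from Python. The package list and the Python version are assumptions for illustration, not the contents of the file in the cloned repository, and write_environment_file is a hypothetical helper. azureml-mlflow is listed because the MLflow script in 4-6 needs the mlflow package in its environment.

```python
from pathlib import Path

# Hypothetical minimal conda specification; package list and versions are assumptions.
ENVIRONMENT_YML = """\
name: experiment_env
dependencies:
  - python=3.8
  - pandas
  - pip
  - pip:
      - azureml-defaults
      - azureml-mlflow
"""


def write_environment_file(path="environment.yml"):
    """Write the conda specification to disk and return the path written."""
    target = Path(path)
    target.write_text(ENVIRONMENT_YML)
    return target
```

Calling write_environment_file() in the notebook's working directory would overwrite any existing environment.yml there, so pass a different path if the repository's own file should be kept.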

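View the data from the experiment run

The get_details_with_logs method returns the run details with the log data included. (Not recommended)

run.get_details_with_logs()

Alternatively, download the log files and view them in an editor. (Recommended)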

import os

log_folder = 'downloaded-logs'

# Download all files
run.get_all_logs(destination=log_folder)

# Verify the files have been downloaded
for root, directories, filenames in os.walk(log_folder):
    for filename in filenames:
        print(os.path.join(root, filename))

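4-4. MLflow

MLflow is a platform for managing machine learning processes.
It is commonly (but not exclusively) used in Databricks environments to coordinate experiments and track metrics.
In Azure Machine Learning experiments, you can use MLflow to track metrics as an alternative to the native logging functions.

In plain terms, MLflow provides a common API for recording what an experiment did and what it produced, and Azure ML can store those records.

Check the version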

!pip show azureml-mlflow

Output:

>>  Name: azureml-mlflow
    Version: 1.37.0
    Summary: Contains the integration code of AzureML with Mlflow.
    Home-page: https://docs.microsoft.com/python/api/overview/azure/ml/?view=azure-ml-py
    Author: Microsoft Corp
    Author-email: None
    License: Proprietary https://aka.ms/azureml-preview-sdk-license 
    Location: /anaconda/envs/azureml_py36/lib/python3.6/site-packages
    Requires: jsonpickle, mlflow-skinny, azureml-core
    Required-by: azureml-train-automl-runtime

4-5. Use MLflow w/ Inline Experiment

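To track metrics from an inline experiment with MLflow, set the MLflow tracking URI to the workspace where the experiment runs.
This lets you use the mlflow tracking methods to log data from the experiment run.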

MLflow

from azureml.core import Experiment
import pandas as pd
import mlflow

# Important - mlflow.set_tracking_uri(): point MLflow tracking at the workspace
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

# Important - Experiment(): create (or retrieve) a named experiment in the workspace
experiment = Experiment(workspace=ws, name='mslearn-designer-predict-diabetes')
mlflow.set_experiment(experiment.name)

# Important - with mlflow.start_run(): start an MLflow run
with mlflow.start_run():

    # Print the experiment name
    print("Starting experiment:", experiment.name)

    # Load data
    data = pd.read_csv('data/diabetes.csv')

    # Count the rows and log the result
    row_count = len(data)
    mlflow.log_metric('observations', row_count)
    print("Run complete")

Output:

>>  Starting experiment: mslearn-designer-predict-diabetes
    Run complete
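Unlike the inline experiment in 4-2, the MLflow version has no explicit run.complete() call, because the with block ends the run when it exits. The toy below illustrates only that pattern. ToyRun and this start_run are hypothetical stand-ins written for this note, not MLflow's implementation.

```python
from contextlib import contextmanager


class ToyRun:
    """Hypothetical run object used only to illustrate the pattern."""

    def __init__(self):
        self.status = "RUNNING"
        self.metrics = {}

    def log_metric(self, key, value):
        self.metrics[key] = float(value)


@contextmanager
def start_run():
    """Yield a run and mark it finished (or failed) when the with block exits."""
    run = ToyRun()
    try:
        yield run
        run.status = "FINISHED"
    except Exception:
        run.status = "FAILED"
        raise


with start_run() as toy_run:
    toy_run.log_metric("observations", 10000)

print(toy_run.status, toy_run.metrics)
# prints FINISHED {'observations': 10000.0}
```

The practical consequence is that an exception inside the with block still closes the run, whereas in 4-2 an exception before run.complete() would skip that call.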

Get the record of the latest run

# Get the latest run of the experiment
run = list(experiment.get_runs())[0]

# Get the logged metrics
print("\nMetrics:")
metrics = run.get_metrics()
for key in metrics.keys():
    print(key, metrics.get(key))

# Get a link to the experiment in Azure ML studio
experiment_url = experiment.get_portal_url()
print('See details at', experiment_url)

Output:

Metrics:
observations 10000.0
See details at https://ml.azure.com/experiments/id/be0ba982-962a-4dc3-a2f3-5c6a86dfb86f?wsid=/subscriptions/f9c05b3c-0e46-4abd-bede-f7602a17903d/resourcegroups/rg1/workspaces/ws1&tid=f2ca3e4f-4773-4ef4-bcd0-bac34ac841e8

4-6. Use MLflow w/ Experiment Script

Finally, the experiment script and MLflow approaches above can be combined.

Create a folder and copy the data file

import os, shutil

# Create a folder for the experiment files
folder_name = 'mlflow-experiment-files'
experiment_folder = './' + folder_name
os.makedirs(folder_name, exist_ok=True)

# Copy the data file into the folder
shutil.copy('data/diabetes.csv', os.path.join(folder_name, "diabetes.csv"))

MLflow

%%writefile $folder_name/mlflow_diabetes.py
# The cell magic above writes everything below it into mlflow_diabetes.py

from azureml.core import Run
import pandas as pd
import mlflow

# Important - with mlflow.start_run(): start an MLflow run
with mlflow.start_run():

    # Load data
    data = pd.read_csv('diabetes.csv')

    # Count the rows and log the result
    row_count = len(data)
    print('observations:', row_count)
    mlflow.log_metric('observations', row_count)

Output:

Writing mlflow-experiment-files/mlflow_diabetes.py

Experiment Script

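When MLflow tracking is used inside an Azure ML experiment script, the MLflow tracking URI is set automatically when the experiment run starts.
However, the environment the script runs in must include the required mlflow package.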

# To run the script, create a ScriptRunConfig that identifies the Python script to run in the experiment, then use it to run the experiment.
from azureml.core import Experiment, ScriptRunConfig, Environment
from azureml.widgets import RunDetails

# Define the Python environment for the experiment
env = Environment.from_conda_specification("experiment_env", "environment.yml")

# Important - ScriptRunConfig(): define how the experiment script is run
script_mlflow = ScriptRunConfig(
    source_directory=experiment_folder,

    # The script to run, relative to source_directory
    script='mlflow_diabetes.py',

    # The Python environment the script runs in
    environment=env
)

# Submit the experiment
experiment = Experiment(workspace=ws, name='mslearn-designer-predict-diabetes-0205')
run = experiment.submit(config=script_mlflow)
RunDetails(run).show()

# Block until the run finishes; without this, the notebook moves on to the next cell while the run is still in progress
run.wait_for_completion()

Get the logged metrics

# Get logged metrics
metrics = run.get_metrics()
for key in metrics.keys():
    print(key, metrics.get(key))

Output:

>>  observations 10000.0
