Day 20 Azure machine learning: "Hello Azure" experiment - Trying the simplest experiment

2021 iThome Ironman Contest

DAY 20

AI & Data

I don't really understand AI, but I know a little Python and Azure: part 20 of the series

13th Ironman Contest, microsoft azure, azure machine learning

Ben

Team: Five chickens who found their deadlift had dropped by 100 kg once they could go back to the gym

2021-09-20 06:59:37



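The previous article mentioned that running an experiment in a workspace requires setting up an environment first and uploading whatever data the experiment needs. The experiment in this article is much simpler: it only prints Hello, Azure!, so it needs no extra data and can run in the default environment. The workspace's default environment uses Python 3.6.2, and the only package installed is azureml-defaults.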

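Running an experiment in a workspace takes at least two Python scripts:

One script holds the code that the workspace runs on a compute cluster: hello.py
The other script, run_experiment.py, runs on the local machine and tells the workspace to start running the script above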

Sample code

hello.py

print("Hello, Azure!")

run_experiment.py

import os
from azureml.core import Workspace, Experiment, ScriptRunConfig
from azureml.core.authentication import InteractiveLoginAuthentication


def main():
    """
    Hello on Azure machine learning.
    """
    interactive_auth = InteractiveLoginAuthentication(tenant_id=os.getenv("TENANT_ID"))
    work_space = Workspace.from_config(auth=interactive_auth)
    # Create the experiment
    experiment = Experiment(workspace=work_space, name="hello-experiment")
    # Set up the run configuration
    config = ScriptRunConfig(
        source_directory=".",  # folder that contains the code
        script="hello.py",  # script to upload and run
        compute_target="cpu-cluster",  # compute cluster to run on
    )
    # Submit the experiment so it runs according to config
    run = experiment.submit(config)
    aml_url = run.get_portal_url()
    print(aml_url)  # the logs can be viewed at this link
    run.wait_for_completion(show_output=True)  # prints the logs as the run progresses


if __name__ == "__main__":
    main()

Once the code above is ready, we can run:

python3.7 run_experiment.py
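
For the command above to find the workspace, Workspace.from_config in run_experiment.py needs a config.json file, which it looks for in the current directory or in a .azureml subfolder. The sketch below shows one way to create that file by hand. The function name write_workspace_config is hypothetical and only for illustration, and every value passed to it is a placeholder you would replace with your own subscription, resource group, and workspace name.

```python
import json
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write .azureml/config.json under `directory` and return its path.

    Hypothetical helper: the three keys are the ones Workspace.from_config
    reads; the values must come from your own Azure subscription.
    """
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_dir = Path(directory) / ".azureml"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    config_path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config_path
```

Calling write_workspace_config(".", "<subscription-id>", "<resource-group>", "<workspace-name>") from the folder that holds run_experiment.py would place the file where from_config can pick it up. The same file can also be downloaded from the workspace page in the Azure portal, which avoids typing the values by hand.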

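Once started, the script uploads the code to the workspace and runs it there. The whole run takes a little over ten minutes, and you may well wonder why something this small takes so long. The reason is......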

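Azure starts by building a Docker image and, once the build finishes, pushes it to Azure Container Registry (ACR) for storage. This is probably the most time-consuming step. In the workspace, open the experiment (the beaker icon on the left in the figure below) and look under Outputs + logs to find 20_image_build_log.txt, which records this process.
Next, the Docker image is pulled onto the virtual machine and started as a container (recorded in 55_azureml-execution-tvmp_xxxxx.txt).
Then the code to be run is placed into the container (recorded in 65_job_prep-tvmp_xxxxx.txt).
At last, print("Hello, Azure!") can run. If the uploaded code fails, the error message also shows up in the logs at this stage. Problems usually lie in the user's own code, so 70_driver_log.txt is the place to look for them.
Finally, the experiment ends and the compute resources are released.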
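
The numeric prefix of each log file above tells you which stage produced it. As a rough aid for reading the Outputs + logs list, the sketch below maps those prefixes to the stages described in this article. The names LOG_STAGES and describe_log are hypothetical, the mapping covers only the four files mentioned here, and the azureml-logs/ folder in the example call is an assumption about how the files are listed.

```python
# Stage descriptions follow the walkthrough in this article; the keys are
# the leading numbers of the log file names mentioned there.
LOG_STAGES = {
    "20": "build the Docker image and push it to ACR",
    "55": "pull the image onto the VM and start the container",
    "65": "place the code into the container",
    "70": "run the user script (look here first for errors)",
}


def describe_log(file_name):
    """Return the stage a log file belongs to, or None for an unknown prefix."""
    base_name = file_name.rsplit("/", 1)[-1]  # drop a leading folder, if any
    prefix = base_name.split("_", 1)[0]  # text before the first underscore
    return LOG_STAGES.get(prefix)


if __name__ == "__main__":
    # Example call with an assumed folder prefix
    print(describe_log("azureml-logs/70_driver_log.txt"))
```

With a run object such as the one returned by experiment.submit, the file names could come from run.get_file_names(), so a loop over that list would label each log with its stage.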