When training a model, we usually write a training script, run it, and collect the results. The MNIST handwritten digit recognition code below is typical of what we use to train a model.
import tensorflow as tf
data = tf.keras.datasets.mnist
(x_train, y_train),(x_test,y_test) = data.load_data()
x_train = x_train.reshape(x_train.shape[0], 28, 28,1)
x_test = x_test.reshape(x_test.shape[0], 28, 28,1)
x_train, x_test = x_train/255.0, x_test/255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Conv2D(32,(3,3), activation='relu',input_shape=(28,28,1)),
        tf.keras.layers.MaxPooling2D(2,2),
        tf.keras.layers.Conv2D(64,(3,3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2,2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ]
)
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
model.fit(x_train,y_train,epochs=10)
model.evaluate(x_test, y_test)
model.save("./mnist.h5")
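But we have now learned about Experiment, Run, Environment, compute resources, and other components. How do we use them in a real-world AI project? This is where ScriptRunConfig comes in. You can think of ScriptRunConfig as a config file: we set the environment, the code to run, and other information in it, and then submit that config to an experiment to run. Let's see how to use ScriptRunConfig!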
First, save the script below as train_mnist.py.
from azureml.core import Run
import tensorflow as tf
# Run.get_context() returns the Run this script is executing in, so we can log to it.
run = Run.get_context()
data = tf.keras.datasets.mnist
(x_train, y_train),(x_test,y_test) = data.load_data()
x_train = x_train.reshape(x_train.shape[0], 28, 28,1)
x_test = x_test.reshape(x_test.shape[0], 28, 28,1)
x_train, x_test = x_train/255.0, x_test/255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Conv2D(32,(3,3), activation='relu',input_shape=(28,28,1)),
        tf.keras.layers.MaxPooling2D(2,2),
        tf.keras.layers.Conv2D(64,(3,3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2,2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ]
)
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
model.fit(x_train,y_train,epochs=10)
loss, acc = model.evaluate(x_test, y_test)
# Log the accuracy with the Run
run.log('Accuracy', acc)
model.save("./mnist.h5")
# Submission code: run this separately (for example in a notebook), not inside train_mnist.py.
from azureml.core import Experiment, ScriptRunConfig, Environment, Workspace
ws = Workspace.from_config()
# Here we use a built-in environment
env = Environment.get(ws,"AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu")
# Create the ScriptRunConfig
script_config = ScriptRunConfig(source_directory='.',
                                script='train_mnist.py',
                                environment=env)
# Submit the experiment
experiment = Experiment(workspace=ws, name='training-experiment')
run = experiment.submit(config=script_config)
run.wait_for_completion()
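Once the experiment has finished, go to the AML graphical interface and you will see that this experiment has indeed been added, as shown in the figure below.
Click into the experiment and you can see the Run that just finished.
We logged the accuracy earlier, so open the Metrics tab and you can see that record!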
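The script above logs only the final test accuracy. If you also want a per-epoch curve in the Metrics tab, one possible approach is to log every value from the history that `model.fit` returns. The following is a minimal sketch and not part of the original script: `log_history` and `FakeRun` are hypothetical names made up for illustration, and `FakeRun` only stands in for the real `Run` so the helper can be tried without Azure.

```python
class FakeRun:
    """Hypothetical stand-in for azureml.core.Run, used only to try the helper locally."""

    def __init__(self):
        self.logged = []

    def log(self, name, value):
        # The real Run.log(name, value) sends the value to Azure ML; here we just record it.
        self.logged.append((name, value))


def log_history(run, history):
    """Log every per-epoch value in a Keras-style history dict to the run."""
    for name, values in history.items():
        for value in values:
            # Logging the same name repeatedly builds a series for that metric.
            run.log(name, float(value))


# Example values shaped like `model.fit(...).history`; the numbers are made up.
sample_history = {"loss": [0.30, 0.10], "accuracy": [0.91, 0.97]}
fake_run = FakeRun()
log_history(fake_run, sample_history)
print(fake_run.logged)
```

In train_mnist.py this would presumably look like `history = model.fit(x_train, y_train, epochs=10)` followed by `log_history(run, history.history)`, using the real `run` from `Run.get_context()`.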
To parameterize the script, rewrite train_mnist.py as follows and save it as train_mnist_para.py:
import argparse
from azureml.core import Run
import tensorflow as tf
run = Run.get_context()
# Use argparse to parameterize the script; let's make epochs a parameter!
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, dest='epochs', default=1)
args = parser.parse_args()
epoch_para = args.epochs
data = tf.keras.datasets.mnist
(x_train, y_train),(x_test,y_test) = data.load_data()
x_train = x_train.reshape(x_train.shape[0], 28, 28,1)
x_test = x_test.reshape(x_test.shape[0], 28, 28,1)
x_train, x_test = x_train/255.0, x_test/255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Conv2D(32,(3,3), activation='relu',input_shape=(28,28,1)),
        tf.keras.layers.MaxPooling2D(2,2),
        tf.keras.layers.Conv2D(64,(3,3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2,2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ]
)
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
# Pass in the value parsed by argparse
model.fit(x_train,y_train,epochs=epoch_para)
loss, acc = model.evaluate(x_test, y_test)
# Log the number of epochs
run.log('Epoch', epoch_para)
run.log('Accuracy', acc)
model.save("./mnist.h5")
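# Submission code: run this separately (for example in a notebook), not inside train_mnist_para.py.
from azureml.core import Experiment, ScriptRunConfig, Environment, Workspace
ws = Workspace.from_config()
env = Environment.get(ws,"AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu")
# Pass the arguments in
# For multiple arguments, keep appending in the same pattern, e.g. ['--epochs', 2, '--filter', 64]
# (the script must define each one with add_argument)
script_config = ScriptRunConfig(source_directory='.',
                                script='train_mnist_para.py',
                                arguments=['--epochs', 2],
                                environment=env)
# Submit the experiment; this time we use a different experiment name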
experiment = Experiment(workspace=ws, name='training_experiment_para')
run = experiment.submit(config=script_config)
run.wait_for_completion()
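Once the experiment has finished, go to the AML graphical interface and you will see that this parameterized experiment has indeed been added, as shown in the figure below.
Click into the experiment and you can see the Run that just finished.
We logged the accuracy and the epoch count earlier, so open the Metrics tab and you can see both the Accuracy and Epoch records. It really did run only two epochs!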
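To see how the `arguments` list ends up as values inside the script, the parsing step can be tried locally without Azure. This is a minimal sketch under assumptions: `parse_training_args` is a hypothetical helper, `--filter` and its default of 32 are made up for illustration (train_mnist_para.py only defines `--epochs`), and converting each item with `str` is only an approximation of how the list reaches the script as command-line text.

```python
import argparse


def parse_training_args(argv):
    """Parse command-line style arguments the way train_mnist_para.py does."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, dest='epochs', default=1)
    # '--filter' is a hypothetical second parameter, added only to show the pattern.
    parser.add_argument('--filter', type=int, dest='filter', default=32)
    # Command-line values arrive as strings, so convert each list item to str first.
    return parser.parse_args([str(item) for item in argv])


# Same shape as the `arguments` list given to ScriptRunConfig.
args = parse_training_args(['--epochs', 2, '--filter', 64])
print(args.epochs, args.filter)

# With no arguments, the defaults declared in add_argument apply.
defaults = parse_training_args([])
print(defaults.epochs, defaults.filter)
```

Because `--epochs` is declared with `default=1`, submitting the script without `arguments` would train for a single epoch.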
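That's it for today's ScriptRunConfig! Don't you think this feature is really handy?
Tomorrow we will register the model!
P.S. Today's post somehow passed the 6,500 mark. Once there is a lot of code, the length just explodes QQ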