Day 30: Revisiting ONNX Runtime

11th Ironman Challenge

renewang

2019-10-15 18:19:35


The mission of ONNX Runtime

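ONNX Runtime provides a way to run models with high performance on heterogeneous computing platforms, independent of the machine learning or deep learning framework used to build them. Producing a model that ONNX Runtime can execute usually takes three steps:

Train your model with the machine learning or deep learning framework you prefer.
Convert or export the model to ONNX format (ONNXMLTools, for example, is one such converter).
Load and run the model with ONNX Runtime.
The previous article walked through these three steps with an skl2onnx example, in which ONNX Runtime served as a checking tool. Today we follow the ONNX tutorial, Inferencing SSD ONNX model using ONNX Runtime Server, and deploy an ONNX model to an ONNX Runtime server.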

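SSD stands for Single Shot MultiBox Detector, an object detection model that relies on anchor boxes rather than the traditional approach of scanning the whole image many times. For a detailed explanation of SSD, see Sam Zheng's article SSD(Single Shot MultiBox Detector) 詳解. Readers are also encouraged to visit Torch Hub to learn how to use a pre-trained SSD object detection model.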

Pre-requisites

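To make sure the example runs smoothly, prepare the following:

Install the dependency packages: numpy, pillow, and matplotlib.
Create an assets folder.
Download two Python files, predict_pb2.py and onnx_ml_pb2.py, from the ONNX tutorial repo and put them in the assets folder. The latter defines the TensorProto object and is not needed if the onnx package is installed.
Download the input image, blueangels.jpg.
Download the COCO class label file, coco_classes.txt.
Create an output folder to hold the output files. If you would rather save the result image somewhere else, change <output_path> in plt.savefig(<output_path>) in the source code.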

The star of today's example is this image:
original image

ORT Server docker image

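ONNX Runtime Server is still an experimental beta, so the recommended way to start the server is with the docker image that ONNX Runtime provides. Use docker pull as shown below to fetch the image from Microsoft's container registry (mcr.microsoft.com) to your local machine: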
docker pull mcr.microsoft.com/onnxruntime/server
Once the ORT Server docker image is available locally, start the server with docker run:
docker run -it -v $(pwd):$(pwd) -e MODEL_ABSOLUTE_PATH=$(pwd)/ssd.onnx -p 9001:8001 mcr.microsoft.com/onnxruntime/server
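The command above creates a container from the mcr.microsoft.com/onnxruntime/server docker image. The options do the following:

Open an interactive session and allocate a tty (the -it option).
Mount the current directory into the container under the same path (the -v $(pwd):$(pwd) option).
Publish the container's port 8001 on the host's port 9001 (the -p 9001:8001 option).
Set the environment variable MODEL_ABSOLUTE_PATH to the location of ssd.onnx (the -e MODEL_ABSOLUTE_PATH=$(pwd)/ssd.onnx option). This implies that the current working directory should contain ssd.onnx; otherwise, either call docker run from a different directory or change the value passed to this option.
The official tutorial runs the docker commands with sudo. I use Docker Desktop for Mac and did not run into permission problems, but if you do, try sudo.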

Input and output images

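Read the image with Pillow's Image class: from PIL import Image
The input image must first be brought to dimensions the model accepts, so resize it to 1200 x 1200: img.resize((1200, 1200), Image.BILINEAR)
Next, make it four-dimensional. The first dimension is the batch axis, which exists because training is done in batches; with a single test image the batch size is 1. The model also expects channel-first input, so the resulting array has shape (1, 3, 1200, 1200).
The pixels must be scaled to the range 0 to 1 and then standardized. The mean and stddev used for standardization are given as constants (these values match the widely used ImageNet per-channel statistics).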

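import numpy as np
# img_data: the resized image as an array of shape (1, 3, 1200, 1200)
mean_vec = np.array([0.485, 0.456, 0.406])
stddev_vec = np.array([0.229, 0.224, 0.225])
# Cast to float32 so the bytes match the FLOAT data type declared on the tensor later
norm_img_data = ((img_data / 255 - mean_vec.reshape(1, 3, 1, 1)) / stddev_vec.reshape(1, 3, 1, 1)).astype(np.float32)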
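
The preprocessing steps listed above can be sketched end to end as follows. This is a minimal sketch, not the tutorial's own code: the helper name preprocess and the all-zero stand-in image are assumptions for illustration, and a real run would pass the array obtained from the resized Pillow image instead.

```python
import numpy as np

# Per-channel constants quoted in the text (RGB order).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STDDEV = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(hwc_uint8):
    """Turn an (H, W, 3) uint8 image array into a normalized (1, 3, H, W) float32 batch.

    Hypothetical helper, not part of the original tutorial.
    """
    chw = np.transpose(hwc_uint8, (2, 0, 1))          # channel-last -> channel-first
    nchw = np.expand_dims(chw, 0).astype(np.float32)  # add the batch axis
    scaled = nchw / 255.0                             # pixels into [0, 1]
    # reshape(1, 3, 1, 1) lets the constants broadcast over batch, height and width
    normed = (scaled - MEAN.reshape(1, 3, 1, 1)) / STDDEV.reshape(1, 3, 1, 1)
    return normed.astype(np.float32)


# Stand-in for np.array(img) after resizing; a real image would come from Pillow.
fake_img = np.zeros((1200, 1200, 3), dtype=np.uint8)
batch = preprocess(fake_img)
print(batch.shape, batch.dtype)
# => (1, 3, 1200, 1200) float32
```

Keeping the result in float32 matters because the tensor is later declared as FLOAT and its bytes are sent as-is.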

Building the HTTP request message

The code below wraps the input tensor using the protobuf schema defined in onnx_ml_pb2.

# import assets.onnx_ml_pb2 as onnx_ml_pb2
# Start with an empty TensorProto object
# input_tensor = onnx_ml_pb2.TensorProto()
from onnx import TensorProto
input_tensor = TensorProto()
# Set the tensor's dimensions
input_tensor.dims.extend(norm_img_data.shape)
# Set the tensor's data type (in the DataType enum of TensorProto in onnx.proto, 1 is FLOAT)
input_tensor.data_type = 1
# Assign the data to input_tensor's buffer attribute, raw_data
input_tensor.raw_data = norm_img_data.tobytes()
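
To make the relationship between dims, data_type and raw_data more concrete, here is a small standard-library sketch. It assumes only that FLOAT means 32-bit floats stored in little-endian order, as the ONNX TensorProto definition describes; the values and dims are made up for illustration.

```python
import struct

values = [0.5, -1.25, 2.0, 3.75, 0.0, 1.0]  # made-up tensor contents
dims = [1, 2, 3]                            # made-up tensor shape

# FLOAT (enum value 1) means 32-bit floats: 4 bytes per element, little-endian.
raw_float32 = struct.pack("<%df" % len(values), *values)
# The same numbers packed as 64-bit floats take twice the space,
# so they would not match a tensor declared as FLOAT.
raw_float64 = struct.pack("<%dd" % len(values), *values)

num_elements = 1
for d in dims:
    num_elements *= d

print(len(raw_float32), len(raw_float64), num_elements * 4)
# => 24 48 24

# Decoding the float32 bytes gives back the original values.
decoded = list(struct.unpack("<%df" % num_elements, raw_float32))
print(decoded == values)
# => True
```

The byte length of raw_data should equal the product of dims times the element size, which is a quick sanity check before sending a request.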

Next, wrap the prediction message using the protobuf schema defined in predict_pb2. As before, start by creating an empty PredictRequest object, then fill its inputs with the tensor built above.

import assets.predict_pb2 as predict_pb2
# Create the PredictRequest object
request_message = predict_pb2.PredictRequest()
# Fill in the input tensor information
request_message.inputs["image"].data_type = input_tensor.data_type
request_message.inputs["image"].dims.extend(input_tensor.dims)
request_message.inputs["image"].raw_data = input_tensor.raw_data

# Build the headers
request_headers = {
    'Content-Type': 'application/vnd.google.protobuf',
    'Accept': 'application/x-protobuf'
}
print(request_headers)
# => {'Content-Type': 'application/vnd.google.protobuf', 'Accept': 'application/x-protobuf'}

Send the request to the ONNX Runtime server, let the ORT server run the prediction, and receive the predicted result.

import requests
# Change PORT_NUMBER to the host port you chose when starting the ORT server container.
PORT_NUMBER = 9001
# Build the request URL
inference_url = "http://127.0.0.1:" + str(PORT_NUMBER) + "/v1/models/ssd/versions/1:predict"
# Send the HTTP request carrying the prediction message and receive the server's reply
response = requests.post(inference_url, headers=request_headers, data=request_message.SerializeToString())
print(response)
#=> <Response [200]>
# Messages on the server side
#[2019-10-15 08:32:31.621] [466485c3-c446-47b2-9258-3790bb45256c] [info] Model Name: ssd, Version: 1, Action: predict
#[2019-10-15 08:32:31.659] [ServerApp] [info] [ServerApp onnxruntime inference_session.cc:708 Run]: Running with tag: 466485c3-c446-47b2-9258-3790bb45256c
#[2019-10-15 08:32:31.660] [ServerApp] [info] [466485c3-c446-47b2-9258-3790bb45256c onnxruntime sequential_executor.cc:41 Execute]: Begin execution

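The thing to watch here really is PORT_NUMBER. In my docker run command, port 8001 inside the container is mapped to port 9001 on the host. If your client sends requests from inside the docker container, send them to 8001, not 9001. Likewise, if your client sends requests from the host, send them to 9001.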
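
The port choice can be captured in a small helper. This is only a sketch: build_inference_url is a hypothetical function name, and the URL pattern is the one used in the request code of this article.

```python
def build_inference_url(port, model_name="ssd", version=1, host="127.0.0.1"):
    """Build the predict URL for the given port (hypothetical helper)."""
    return "http://{}:{}/v1/models/{}/versions/{}:predict".format(
        host, port, model_name, version
    )


HOST_PORT = 9001       # left side of -p 9001:8001, used by clients on the host
CONTAINER_PORT = 8001  # right side of -p 9001:8001, used by clients inside the container

url_from_host = build_inference_url(HOST_PORT)
url_inside_container = build_inference_url(CONTAINER_PORT)
print(url_from_host)
# => http://127.0.0.1:9001/v1/models/ssd/versions/1:predict
print(url_inside_container)
# => http://127.0.0.1:8001/v1/models/ssd/versions/1:predict
```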

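Parsing the response

The parsing steps are:

1. Create a PredictResponse object, defined by the predict_pb2 protobuf. It has an outputs attribute holding a dict-like object: each key is the name of a model output, and each value is a TensorProto object holding the data. For SSD, we need to retrieve the class labels ('labels'), the class confidence scores ('scores'), and the bounding boxes ('bboxes').
2. Pass the raw_data of each of the three objects to np.frombuffer to build a numpy.ndarray. Note that np.frombuffer returns a flat one-dimensional array, so the dimensions of the original TensorProto object are lost. As long as the data is contiguous and in C order (row major), you can restore the original dimensions by reshaping.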

# Start with an empty PredictResponse object
response_message = predict_pb2.PredictResponse()
# Parse response.content to populate the attributes of response_message
response_message.ParseFromString(response.content)
# Use np.frombuffer to build the predicted bounding boxes, labels and scores
# response_message.outputs['bboxes'] is a TensorProto; its raw_data attribute is the buffer that np.frombuffer parses
bboxes = np.frombuffer(response_message.outputs['bboxes'].raw_data, dtype=np.float32)
# response_message.outputs['labels'] is a TensorProto; its raw_data attribute is the buffer that np.frombuffer parses
labels = np.frombuffer(response_message.outputs['labels'].raw_data, dtype=np.int64)
# response_message.outputs['scores'] is a TensorProto; its raw_data attribute is the buffer that np.frombuffer parses
scores = np.frombuffer(response_message.outputs['scores'].raw_data, dtype=np.float32)
print('Boxes shape:', response_message.outputs['bboxes'].dims, 'As numpy:', bboxes.shape)
# => Boxes shape: [1, 200, 4] As numpy: (800,)
print('Labels shape:', response_message.outputs['labels'].dims, 'As numpy:', labels.shape)
# => Labels shape: [1, 200] As numpy: (200,)
print('Scores shape:', response_message.outputs['scores'].dims,  'As numpy:', scores.shape)
# => Scores shape: [1, 200] As numpy: (200,)
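
Restoring the original dimensions after np.frombuffer is a single reshape. The sketch below uses a made-up buffer in place of a real server response, and tensor_to_array is a hypothetical helper name.

```python
import numpy as np


def tensor_to_array(raw_data, dims, dtype):
    """Decode a C-order byte buffer and restore its dims (hypothetical helper)."""
    flat = np.frombuffer(raw_data, dtype=dtype)  # always one-dimensional
    return flat.reshape(tuple(dims))             # valid because the data is row major


# Made-up stand-ins for a TensorProto's dims and raw_data.
fake_dims = [1, 200, 4]
fake_raw = np.arange(800, dtype=np.float32).tobytes()

boxes = tensor_to_array(fake_raw, fake_dims, np.float32)
print(boxes.shape)
# => (1, 200, 4)
print(boxes[0, 1])
# => [4. 5. 6. 7.]
```

With the shape restored, the second box is simply boxes[0, 1] rather than a manually computed offset into the flat array.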

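Finally, draw the bounding boxes with matplotlib. The steps are:

Parse the class names file: classes = [line.rstrip('\n') for line in open('assets/coco_classes.txt')]
Draw the first num_boxes bounding boxes. As seen above, the ORT server returns 200 boxes in total, sorted by score from high to low, so we only need to take the first num_boxes of them. Also, the bounding box coordinates are scaled to values between zero and one; to draw them on the image, scale them by the image's width and height.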

import matplotlib.pyplot as plt
import matplotlib.patches as patches

resized_width = 1200
resized_height = 1200
num_boxes = 6
# Assumes img is the resized 1200 x 1200 image loaded earlier
fig, ax = plt.subplots(1)
ax.imshow(img)
# Take the first num_boxes bounding boxes
for c in range(num_boxes):
    base_index = c * 4
    # Read each box's four coordinates in row-major order, as y1, x1, y2, x2
    y1 = bboxes[base_index] * resized_height
    x1 = bboxes[base_index + 1] * resized_width
    y2 = bboxes[base_index + 2] * resized_height
    x2 = bboxes[base_index + 3] * resized_width
    color = 'blue'
    box_h = (y2 - y1)
    box_w = (x2 - x1)
    # Rectangle needs the anchor corner plus the box's extent along each axis
    bbox = patches.Rectangle((y1, x1), box_h, box_w, linewidth=2, edgecolor=color, facecolor='none')
    ax.add_patch(bbox)
    # Write the class name next to the bounding box
    plt.text(y1, x1, s=classes[labels[c] - 1], color='white', verticalalignment='top', bbox={'color': color, 'pad': 0})
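
The index arithmetic in the loop can be checked without matplotlib or a running server. The following sketch walks a flat list four values at a time and scales each box to pixels, using the same height, width, height, width order as the plotting code; iter_pixel_boxes and the sample numbers are made up for illustration.

```python
def iter_pixel_boxes(flat_bboxes, num_boxes, resized_width, resized_height):
    """Yield the first num_boxes boxes from a flat list, scaled to pixels.

    Hypothetical helper; each box occupies four consecutive values.
    """
    for c in range(num_boxes):
        base_index = c * 4
        v0, v1, v2, v3 = flat_bboxes[base_index:base_index + 4]
        yield (v0 * resized_height, v1 * resized_width,
               v2 * resized_height, v3 * resized_width)


# Two made-up boxes with coordinates already normalized to [0, 1].
flat = [0.25, 0.5, 0.75, 1.0,
        0.0, 0.125, 0.5, 0.625]
pixel_boxes = list(iter_pixel_boxes(flat, 2, 1200, 1200))
print(pixel_boxes[0])
# => (300.0, 600.0, 900.0, 1200.0)
print(pixel_boxes[1])
# => (0.0, 150.0, 600.0, 750.0)
```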