Day 12: Object Detection + Image Captioning

2018 iT 邦幫忙 Ironman Contest (鐵人賽)

DAY 12

AI & Machine Learning

Understanding Neural Networks with 100 Figures -- Concepts and Practice series, Part 12


I code so I am

2017-12-22 09:37:45


Figure. Image Captioning. Image source: cs231n_2017_lecture11 Detection and Segmentation

Introduction

The evolution of image recognition can be traced through the tasks of the ImageNet ILSVRC (Large Scale Visual Recognition Challenge). In 2011 the tasks were Classification and Classification with Localization; by 2017 they had become Object Localization, Object Detection, and Object Detection from Video. As the figure below shows, computer vision applications fall roughly into the following categories:

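Classification
Semantic Segmentation: groups pixels into regions by object class, without distinguishing individual instances.
Classification + Localization: marks the position and size of a single object.
Object Detection: marks the position and size of multiple objects.
Instance Segmentation: marks each instance, so objects of the same class each get their own position and extent, which matters most when objects overlap.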

Figure. Computer vision applications other than Classification. Image source: cs231n_2017_lecture11 Detection and Segmentation
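To make the differences between these tasks concrete, here is a small sketch of what each one returns. All class and variable names are hypothetical, invented for this illustration; real libraries use their own formats.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical containers, named here only for illustration.
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Classification:
    label: str   # one label for the whole image


@dataclass
class Localization:
    label: str   # one label ...
    box: Box     # ... plus one box for the single object


@dataclass
class Detection:
    label: str
    score: float
    box: Box     # object detection returns a list of these


# Semantic segmentation: one class id per pixel, no notion of instances.
# 0 = background, 1 = "cat"; two touching cats share the same id.
semantic_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Instance segmentation: one binary mask per object instance.
instance_masks = {
    "cat#1": [[0, 0, 1, 0], [0, 0, 1, 0]],
    "cat#2": [[0, 0, 0, 1], [0, 0, 0, 1]],
}

# Object detection: one scored box per object.
detections = [
    Detection("cat", 0.91, (0.50, 0.0, 0.75, 1.0)),
    Detection("cat", 0.88, (0.75, 0.0, 1.00, 1.0)),
]


def merge_instances(masks):
    """Collapse per-instance masks into a single foreground map."""
    first = next(iter(masks.values()))
    rows, cols = len(first), len(first[0])
    return [[int(any(m[r][c] for m in masks.values())) for c in range(cols)]
            for r in range(rows)]
```

Merging the two instance masks gives back the semantic map, which shows what semantic segmentation loses: the boundary between the two cats.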

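The earlier post "Day 10: CNN Applications -- Finding Similar Photos" showed how to find similar objects. Extending the problem a little, we also want to know where each object is and how large it is, and then mark it so that users can quickly spot the objects they care about. For example, when searching surveillance footage for a suspect, marking the suspect in every frame would make the police's job much easier. So let's see how to implement this.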

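Implementation

The program comes from Object Detection · Martin Thoma. It is fairly long and hard to explain line by line, so I hard-coded some parameters, added comments, and placed it in the SSD folder (linked from the original post). The file weights_SSD300.hdf5 is too large to include; download it from https://mega.nz/#F!7RowVLCL!q3cEVRK9jyOSB9el3SssIA and put it in the SSD folder. Then prepare a set of photos (*.jpg). Recognition is limited to 20 classes: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, and TV monitor. Put the photos in the images subdirectory under the program's directory, then run: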
python ssd_test.py

"""
Run object detection with VOC classes.

This is just a minor modification of code from
https://github.com/rykov8/ssd_keras
"""

from keras.applications.imagenet_utils import preprocess_input
from keras.preprocessing import image
import matplotlib.pyplot as plt
import numpy as np
# Note: scipy.misc.imread was removed in SciPy 1.2; on newer versions use imageio.imread instead.
from scipy.misc import imread
import sys

from ssd import SSD300
from ssd_utils import BBoxUtility
import os
from os.path import basename

def create_overlay(img, results, voc_classes, plt_fname):
    plt.clf()
    # Parse the outputs.
    det_label = results[:, 0]
    det_conf = results[:, 1]
    det_xmin = results[:, 2]
    det_ymin = results[:, 3]
    det_xmax = results[:, 4]
    det_ymax = results[:, 5]

    # Get detections with confidence higher than 0.6.
    top_indices = [i for i, conf in enumerate(det_conf) if conf >= 0.6]

    top_conf = det_conf[top_indices]
    top_label_indices = det_label[top_indices].tolist()
    top_xmin = det_xmin[top_indices]
    top_ymin = det_ymin[top_indices]
    top_xmax = det_xmax[top_indices]
    top_ymax = det_ymax[top_indices]
    colors = plt.cm.hsv(np.linspace(0, 1, 21)).tolist()

    plt.imshow(img / 255.)
    currentAxis = plt.gca()
    currentAxis.axis('off')

    for i in range(top_conf.shape[0]):
        xmin = int(round(top_xmin[i] * img.shape[1]))
        ymin = int(round(top_ymin[i] * img.shape[0]))
        xmax = int(round(top_xmax[i] * img.shape[1]))
        ymax = int(round(top_ymax[i] * img.shape[0]))
        score = top_conf[i]
        label = int(top_label_indices[i])
        label_name = voc_classes[label - 1]
        display_txt = '{:0.2f}, {}'.format(score, label_name)
        coords = (xmin, ymin), xmax - xmin + 1, ymax - ymin + 1
        color = colors[label]
        currentAxis.add_patch(plt.Rectangle(*coords,
                                            fill=False,
                                            edgecolor=color,
                                            linewidth=2))
        currentAxis.text(xmin, ymin, display_txt,
                         bbox={'facecolor': color, 'alpha': 0.5})
    plt.savefig(plt_fname)
    print("save "+plt_fname)

if __name__ == "__main__":
    import glob
    imagesList = glob.glob("images/*.jpg")

    # Load the model
    voc_classes = ['Aeroplane', 'Bicycle', 'Bird', 'Boat', 'Bottle',
                   'Bus', 'Car', 'Cat', 'Chair', 'Cow', 'Diningtable',
                   'Dog', 'Horse', 'Motorbike', 'Person', 'Pottedplant',
                   'Sheep', 'Sofa', 'Train', 'Tvmonitor']
    NUM_CLASSES = len(voc_classes) + 1
    input_shape = (300, 300, 3)
    model = SSD300(input_shape, num_classes=NUM_CLASSES)
    model.load_weights('weights_SSD300.hdf5', by_name=True)
    bbox_util = BBoxUtility(NUM_CLASSES)

    # Load the inputs
    inputs = []
    images = []
    for img_path in imagesList:
        print("process " + img_path)
        img = image.load_img(img_path, target_size=(300, 300))
        img = image.img_to_array(img)
        images.append(imread(img_path))
        inputs.append(img.copy())
    # Preprocessing
    print("Preprocessing...")
    inputs = preprocess_input(np.array(inputs))

    # Prediction
    print("Predicting...")
    preds = model.predict(inputs, batch_size=1, verbose=1)
    # Get the detection results
    results = bbox_util.detection_out(preds)
    print("results[0]=")
    print(results[0][0][1])

    # create folder if not exist
    output_directory="results"
    if not os.path.exists(output_directory):
        os.makedirs(output_directory)
    # Process images
    for i, img in enumerate(images):
        # Write output files with the boxes drawn
        create_overlay(img, results[i], voc_classes,
           output_directory+"/{}.png".format(basename(os.path.splitext(imagesList[i])[0])))
    # Garbage collection, to avoid a TensorFlow error
    import gc
    gc.collect()
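The preprocess_input call in the listing above is easy to overlook. As a rough sketch of what it does, assuming the default ("caffe"-style) mode of keras.applications.imagenet_utils: the RGB channels are reordered to BGR and the ImageNet per-channel means are subtracted, with no rescaling. The function name caffe_style_preprocess is made up for this illustration; it is not part of Keras.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by Keras "caffe" mode.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_style_preprocess(batch_rgb):
    """Approximate preprocess_input on a (N, H, W, 3) RGB batch.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(batch_rgb, dtype=np.float32)
    x = x[..., ::-1]               # RGB -> BGR
    return x - IMAGENET_MEAN_BGR   # zero-center each channel, no rescaling


# A uniform gray batch shaped like the SSD300 input.
batch = np.full((1, 300, 300, 3), 128.0, dtype=np.float32)
out = caffe_style_preprocess(batch)
print(out.shape)  # (1, 300, 300, 3)
```

This also explains why the listing keeps two copies of every image: inputs holds the resized, preprocessed arrays for the network, while images holds the untouched originals for drawing.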

Code Walkthrough

Unlike the programs in the previous two posts, this one finishes quickly. The flow is as follows:

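First, it searches the images folder for jpg files.
It calls the SSD300 function to build the model, then loads the pre-trained weights file.
It calls detection_out to get the predictions and stores them in the results variable. Each entry contains the class index and its probability, plus the position and size of the bounding box.
It calls create_overlay to draw the bounding boxes and class names on the original image, then saves the result as a new png file in the results folder.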
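The last two steps hinge on the layout of each row in results. Judging from create_overlay, a row is [label, confidence, xmin, ymin, xmax, ymax] with coordinates normalized to the range 0 to 1, so they must be scaled by the original image's width and height before drawing. The sketch below isolates that logic in plain Python; select_boxes and the sample rows are hypothetical, written only for illustration.

```python
def select_boxes(rows, img_width, img_height, min_conf=0.6):
    """Filter detection rows by confidence and convert them to pixel boxes.

    Each row is assumed to be
    [label, confidence, xmin, ymin, xmax, ymax] with coordinates in [0, 1].
    """
    boxes = []
    for label, conf, xmin, ymin, xmax, ymax in rows:
        if conf < min_conf:
            continue  # drop low-confidence detections
        boxes.append({
            "label": int(label),
            "score": conf,
            "xmin": int(round(xmin * img_width)),
            "ymin": int(round(ymin * img_height)),
            "xmax": int(round(xmax * img_width)),
            "ymax": int(round(ymax * img_height)),
        })
    return boxes


# Two made-up detections for a 640x480 image.
rows = [
    [12.0, 0.93, 0.10, 0.20, 0.50, 0.80],
    [15.0, 0.30, 0.40, 0.40, 0.60, 0.90],
]
picked = select_boxes(rows, img_width=640, img_height=480)
print(picked)
```

Because the coordinates are normalized, the boxes predicted on the 300x300 network input can be drawn on the full-size original without any extra bookkeeping.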
The SSD model is defined in ssd.py. Its structure is very complex; to save a diagram of it, add the following lines after the call to SSD300:

from keras.utils.vis_utils import plot_model
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
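Beyond the network itself, the detection_out step matters too, although its internals are not shown in this post. Detectors in the SSD family typically end with non-maximum suppression (NMS), which removes duplicate boxes for the same object by comparing their intersection over union (IoU). The sketch below is a plain-Python illustration of that idea, not the code in ssd_utils; the function names and the 0.45 threshold are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes and one separate box (made-up values).
boxes = [(0.10, 0.10, 0.50, 0.50),
         (0.12, 0.12, 0.52, 0.52),
         (0.60, 0.60, 0.90, 0.90)]
scores = [0.90, 0.80, 0.70]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

In a multi-class detector this is usually run once per class, so a person standing in front of a car does not suppress the car.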

A few notes on the results:

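Image Captioning: the output contains not only boxes but also class names. Objects outside the 20 classes are labeled with a wrong class name, as happened with an elephant and the bear mascot 熊讚.
The limit of 20 classes exists because we load a pre-trained model directly (weights_SSD300.hdf5), whose training data contains only those 20 object classes. In a real application, you can retrain with class data better suited to your targets to get the results you want.
The training data comes from PASCAL Visual Object Classes.
For the theory, see the paper "SSD: Single Shot MultiBox Detector".
This example saves the results to files. To display them directly instead, you can use the OpenCV toolkit.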
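The first two notes follow from how class indices are laid out. Judging from NUM_CLASSES = len(voc_classes) + 1 and voc_classes[label - 1] in the listing, index 0 appears to be reserved for background and the real classes start at 1. The sketch below makes that mapping explicit; label_to_name, num_classes, and the two-class list are hypothetical names used only for illustration.

```python
VOC_CLASSES = ['Aeroplane', 'Bicycle', 'Bird', 'Boat', 'Bottle',
               'Bus', 'Car', 'Cat', 'Chair', 'Cow', 'Diningtable',
               'Dog', 'Horse', 'Motorbike', 'Person', 'Pottedplant',
               'Sheep', 'Sofa', 'Train', 'Tvmonitor']

BACKGROUND_ID = 0  # assumption: index 0 is the background class


def label_to_name(label_id, class_names):
    """Map a predicted class index to its name (1-based; 0 is background)."""
    if label_id == BACKGROUND_ID:
        return 'background'
    return class_names[label_id - 1]


def num_classes(class_names):
    """Number of model outputs: one per class plus one for background."""
    return len(class_names) + 1


print(label_to_name(15, VOC_CLASSES))  # Person
print(num_classes(VOC_CLASSES))        # 21

# A hypothetical custom dataset with two classes would need three outputs.
custom_classes = ['Helmet', 'Vest']
print(num_classes(custom_classes))     # 3
```

When an object outside the vocabulary is detected with enough confidence, it can only be given one of these 20 names, which is why the elephant ends up mislabeled rather than unlabeled.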

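Conclusion

That wraps up image recognition. I would really like to spend more time on experiments, but the Ironman Contest waits for no one, so that will have to wait until the contest is over. Starting with the next post, we begin our journey into Natural Language Processing. It covers a much broader range of knowledge, so I will devote more posts to it and work through it together with you.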


4 Comments

ronaldxsun

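iT邦 Novice Level 5 ‧ 2018-01-17 14:44:42

Hello,
If I don't want to use the existing pre-trained weights
and want to train on my own data instead,
is there a reference approach you could suggest?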

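I code so I am ‧ iT邦 Expert Level 1 ‧ 2018-01-17 15:18:34

To recognize objects other than the 20 VOC classes, you need to retrain. Skip loading the weights by removing the following line:
model.load_weights('weights_SSD300.hdf5', by_name=True)

Then feed your training dataset into the model and retrain.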

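ronaldxsun ‧ iT邦 Novice Level 5 ‧ 2018-01-18 12:05:37

Got it.
Sorry, I still don't understand the SSD architecture very well.
"Then feed your training dataset into the model and retrain."
Does this mean writing a train.py that outputs an .hdf5 file?
The project doesn't seem to include any training code.
Thanks for your help.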

I code so I am ‧ iT邦 Expert Level 1 ‧ 2018-01-18 16:50:04

Please see this post:
https://ithelp.ithome.com.tw/articles/10191404

Replace the original
model.load_weights('weights_SSD300.hdf5', by_name=True)
with a training call, where x/y are training data you prepare yourself:
train_history = model.fit(x=x_Train_norm, y=y_TrainOneHot, validation_split=0.2, epochs=10, batch_size=800, verbose=2)

peterhsu

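iT邦 Novice Level 5 ‧ 2018-01-19 20:16:00

Hello,
Thank you for writing this TensorFlow article.
I am also unsure how to build my own trained model.
I read the comments above but still don't quite understand.

The line model.load_weights('weights_SSD300.hdf5', by_name=True)
loads weights_SSD300.hdf5.
Does the MNIST training method from Day 2 also apply to SSD300?
I don't understand how to produce a trained model .hdf5 suitable for SSD300.
Thank you.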

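I code so I am ‧ iT邦 Expert Level 1 ‧ 2018-01-19 22:06:25

The Keras workflow is always the same:

Build the model: in this example, as follows (SSD300 is defined in ssd.py)
model = SSD300(input_shape, num_classes=NUM_CLASSES)
Train the model: model.fit. This example loads pre-trained weights directly (model.load_weights) without training, so to train it yourself, switch to fit.
Evaluate the model: model.evaluate
Predict: model.predict

Day 2 shows the complete procedure for training a model yourself, so you can use it as a reference for training this example.

To save the training result, see Day 4, or run the following:
from keras.models import load_model
model.save('model.h5') # creates a HDF5 file 'model.h5'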

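peterhsu ‧ iT邦 Novice Level 5 ‧ 2018-01-23 19:15:52

Hello,
Thanks for your answer. If I only need to train two classes (either A or B),
how should I write the arguments to model.fit?
I don't quite understand what the internal variables mean.
Sorry for the trouble.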

whard

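iT邦 Novice Level 5 ‧ 2019-01-02 01:49:53

Hello,
Thank you for this tutorial. Following your Day 2 and Day 4 posts, I trained on the CIFAR-10 dataset and saved the result as an hdf5 file.
However, after loading that hdf5 file with model.load_weights, the predicted boxes are completely wrong (many of them mark empty areas).
Is this because of the training image size (32x32)? Must the training images be 300x300?
Or is a parameter set incorrectly in the prediction program? I would appreciate your help.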

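I code so I am ‧ iT邦 Expert Level 1 ‧ 2019-01-02 18:49:53

It is most likely the image size. Images need to be at least 300x300; the program resizes them automatically, but a resolution that is too low cannot be recognized. CIFAR-10 images are only 32x32.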

johnsonnnn

iT邦 Novice Level 4 ‧ 2019-11-26 14:56:53

Hello,
When training SSD, the images get resized, don't they?
Don't the bounding-box labels we supply need to be rescaled as well?

I code so I am ‧ iT邦 Expert Level 1 ‧ 2019-11-26 19:56:47

Line 90 of ssd_test.py
img = image.load_img(img_path, target_size=(300, 300))
resizes the image automatically.

johnsonnnn ‧ iT邦 Novice Level 4 ‧ 2019-11-26 22:27:58

Thank you for your answer.

johnsonnnn ‧ iT邦 Novice Level 4 ‧ 2019-11-30 19:22:14

I would also like to ask how the priors should be set during training.
