[Day 3] Data Preprocessing: YOLOv4 and Automatically Boxing Chinese Characters

2021 iThome Ironman Contest

DAY 3

AI & Data

Handwritten Chinese Character Image Recognition series, part 3

13th Ironman Contest

Ethan Chen

2021-09-18 23:09:20

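Current Situation

Looking through the dataset provided by the organizer (about 70,000 images), I found that the images fall roughly into the following types.
1.1 The image contains exactly one Chinese character

1.2 The image contains a Chinese character plus stray handwriting, or only half a character

1.3 The image contains two or more Chinese characters

1.4 The image contains no text (blank background only)
To sort the images into these types, I decided to use YOLOv4, a one-stage object detection algorithm: train a model on a small sample, then let it box the Chinese characters automatically.
The sample used to build this model is small (3,000 images), which makes it a good fit for training on Colab's free GPU. For larger datasets I do not recommend training on Colab. I have painfully run into the VM running out of disk space, very long upload times, and Colab disconnecting or reporting that the GPU quota was used up halfway through training.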

Tools / Packages

LabelImg
YOLOv4
Google Colab

Content

LabelImg

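1.1 Windows: installation and launch
- <Method 1> Run with Python
  
  ※ Note: for detailed steps, or for installation on other environments, see the official documentation.
- <Method 2> Run the EXE
  Download it from the official I/O page, extract it, and run it.
1.2 Workflow
- After launching LabelImg, click Open Dir and Change Save Dir to set the folder of training images and the folder where the annotation files (XML) are saved.
- Enable Auto Save in the View menu.
- Draw a bounding box around each Chinese character and label it word (hotkeys: W to draw a box, D for the next image, A for the previous image).
- Each labeled image produces one matching XML file (remember: avoid Chinese characters in file names where possible).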
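
To make the annotation files less of a black box, here is a minimal sketch of reading one back. It assumes the XML follows the Pascal VOC layout that LabelImg writes by default; the sample content, file name, and the helper name `read_voc_boxes` are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mimics the Pascal VOC layout written by LabelImg.
SAMPLE_XML = """<annotation>
  <filename>1_sample.jpg</filename>
  <size><width>120</width><height>100</height><depth>3</depth></size>
  <object>
    <name>word</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>70</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return (width, height, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    boxes = []
    for obj in root.iter('object'):
        label = obj.find('name').text
        bndbox = obj.find('bndbox')
        coords = tuple(int(bndbox.find(tag).text)
                       for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        boxes.append((label, *coords))
    return width, height, boxes


if __name__ == '__main__':
    print(read_voc_boxes(SAMPLE_XML))
```

A quick check like this over the Annotations folder is one way to confirm that every file has at least one word box before training.
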
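Deploying YOLOv4 to Colab for Training

2.1 Training YOLOv4 on Colab avoids the errors that occur with Windows + YOLOv4 when calling the local GPU. If you know a stable way to train with Windows + YOLOv4, please leave a comment and tell me.

2.2 Preparation
- Download train.rar and extract it into a train folder.
- Put the dataset (3,000 images) into \VOCdevkit\VOC2021\JPEGImages under train.
- Put the XML files generated by LabelImg (3,000 files) into \VOCdevkit\VOC2021\Annotations under train.
- Run gen_train_val.py in the train folder to split the dataset into train and val at an 8:2 ratio.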
```
import os
import random
from os.path import join, splitext

base = 'VOC2021'
source = join('VOCdevkit', base, 'JPEGImages/')
train_txt = join('VOCdevkit', base, 'ImageSets/Main/train.txt')
val_txt = join('VOCdevkit', base, 'ImageSets/Main/val.txt')

# Shuffle the file list so the split is random
files = os.listdir(source)
random.shuffle(files)

# 'w' overwrites old lists, so re-running does not append duplicate entries
with open(train_txt, 'w') as f_train, open(val_txt, 'w') as f_val:
    for i, file in enumerate(files):
        # Image name without its extension
        name = splitext(file)[0]
        # The first 20% of the shuffled files go to val, the rest to train
        if i >= len(files) * 0.2:
            f_train.write(name + '\n')
        else:
            f_val.write(name + '\n')
```
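- Running gen_train_val.py produces train.txt and val.txt.
- voc_label.py in the train folder preprocesses the dataset before YOLOv4 training (it marks the Train / Test / Val sets). It is run after deploying to Colab.
  
  ※ Note: to add more label types (classes) for YOLOv4 to detect, edit the parameter settings in the files in the train folder; see the figure below.
- Next, compress the whole train folder into train3.zip.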
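
The contents of voc_label.py are not shown here, so the following is only a sketch of the kind of conversion such a script usually performs: turning a Pascal VOC box (pixel corners) into the normalized `class x_center y_center width height` line that YOLO-style training expects. The function names are hypothetical, and the actual script in train.rar may differ in detail.

```python
def voc_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Convert pixel corner coordinates to normalized YOLO box values."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, img_w, img_h, box):
    """Format one label line; box is (xmin, ymin, xmax, ymax) in pixels."""
    values = voc_to_yolo(img_w, img_h, *box)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


if __name__ == '__main__':
    # Class 0 stands for the single label "word" used in this article.
    print(yolo_label_line(0, 100, 200, (10, 20, 50, 120)))
```

With only one label (word), every line starts with class id 0; adding more classes would mean more ids here as well as the config changes mentioned in the note above.
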
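2.3 Deploying to Colab for training: the last step is to upload train3.zip to Colab and train. Once training finishes, compress the result into train3_finished.tar.gz and download it to the local machine. To keep this post from getting too long, the detailed steps are in my GitHub.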

Model Performance
3.1 After extracting train3_finished.tar.gz, run predit.py in the train folder to box Chinese characters automatically.

```
import os

import cv2
import numpy as np


# Load the network definition and the trained weights
def initNet():
    CONFIG = 'yolov4-tiny-myobj.cfg'
    WEIGHT = 'yolov4-tiny-myobj_last.weights'
    net = cv2.dnn.readNet(CONFIG, WEIGHT)
    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=(416, 416), scale=1/255.0)
    model.setInputSwapRB(True)
    return model


# Run object detection (confidence threshold 0.4, NMS threshold 0.1)
def nnProcess(image, model):
    classes, confs, boxes = model.detect(image, 0.4, 0.1)
    return classes, confs, boxes


# Draw a box around each detected object
def drawBox(image, classes, confs, boxes):
    new_image = image.copy()
    for (classid, conf, box) in zip(classes, confs, boxes):
        x, y, w, h = box
        # Keep the 2-pixel margin from going past the top/left edge
        if x - 2 < 0:
            x = 2
        if y - 2 < 0:
            y = 2
        cv2.rectangle(new_image, (x - 2, y - 2), (x + w + 2, y + h + 2), (0, 255, 0), 2)
    return new_image


if __name__ == '__main__':
    # Dataset provided by the organizer (about 70,000 images)
    source = './01_origin/'
    files = os.listdir(source)
    # Sort numerically: drop the last 6 characters and parse the rest as an integer
    files.sort(key=lambda x: int(x[:-6]))
    model = initNet()
    for file in files:
        # np.fromfile + imdecode also works for paths with non-ASCII characters
        img = cv2.imdecode(np.fromfile(source + file, dtype=np.uint8), -1)
        classes, confs, boxes = nnProcess(img, model)
        try:
            frame = drawBox(img, classes, confs, boxes)
            frame = cv2.resize(frame, (240, 200), interpolation=cv2.INTER_CUBIC)
            # Show the image with its boxes; press any key for the next one
            cv2.imshow('img', frame)
            cv2.waitKey()
        except Exception:
            # Skip images that cannot be drawn or displayed
            continue
    print('Finished')
```
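
The two numbers passed to `model.detect` above are a confidence threshold (0.4) and a non-maximum suppression (NMS) threshold (0.1). The sketch below is a plain-Python illustration of what those two thresholds do in principle. It is not OpenCV's implementation, and the helper names `iou` and `filter_boxes` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def filter_boxes(boxes, confs, conf_thr=0.4, nms_thr=0.1):
    """Return indices kept after confidence filtering and greedy NMS."""
    # Drop low-confidence boxes, then visit the rest from best to worst.
    order = sorted((i for i, c in enumerate(confs) if c >= conf_thr),
                   key=lambda i: confs[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it barely overlaps every box already kept.
        if all(iou(boxes[i], boxes[j]) <= nms_thr for j in kept):
            kept.append(i)
    return kept


if __name__ == '__main__':
    boxes = [(10, 10, 50, 50), (12, 12, 50, 50), (80, 10, 30, 50), (0, 0, 5, 5)]
    confs = [0.9, 0.8, 0.7, 0.2]
    print(filter_boxes(boxes, confs))
```

A low NMS threshold such as 0.1 removes nearly any box that overlaps a stronger one, while boxes that sit side by side without overlapping are all kept.
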

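3.2 Boxing results

The image contains exactly one Chinese character
The image contains a Chinese character plus stray handwriting, or only half a character
The image contains two or more Chinese characters
The image contains no text (blank background only)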

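Summary

The results in 3.2 show that the trained YOLOv4 model boxes Chinese characters correctly. The one flaw is that the character 「體」 was boxed as two characters.
My guess is that cases like 「體」 appear too rarely in the small sample (3,000 images). Sampling more images and retraining the model should improve the boxing.
Keep in mind that in the actual competition, each image contains only one correct Chinese character. So the goal of the next post is: from the 70,000 images, take those that the YOLOv4 model boxed as exactly one Chinese character, crop the character inside the detection box (the green bounding box), and save it as a new image to form a new dataset.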
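
As a rough preview of that selection step, the sketch below computes crop bounds only when exactly one box was detected. It is an assumption about how the filtering could work, not the code used in the next post; the function name `single_char_crop_bounds` and the 2-pixel `pad` (borrowed from the margin used in drawBox) are hypothetical.

```python
def single_char_crop_bounds(boxes, img_w, img_h, pad=2):
    """Return (x1, y1, x2, y2) for the only detected box, else None.

    boxes holds (x, y, w, h) tuples as returned by the detector;
    images with zero or several boxes are rejected.
    """
    if len(boxes) != 1:
        return None
    x, y, w, h = (int(v) for v in boxes[0])
    # Add a small margin and clamp the result to the image borders.
    x1 = max(0, x - pad)
    y1 = max(0, y - pad)
    x2 = min(img_w, x + w + pad)
    y2 = min(img_h, y + h + pad)
    if x2 <= x1 or y2 <= y1:
        return None
    return x1, y1, x2, y2


if __name__ == '__main__':
    print(single_char_crop_bounds([(1, 5, 20, 30)], 100, 100))
```

With an image loaded as a NumPy array `img`, the crop itself would then be `img[y1:y2, x1:x2]`.
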

Let's keep going...