Day 12 Azure Cognitive Services: OCR (Optical Character Recognition)

2021 iThome Ironman Contest

DAY 12

AI & Data

Part 12 of the series "I Don't Really Understand AI, but I Know a Little Python and Azure"

13th Ironman Contest, microsoft azure, azure, azure cognitive services

Ben

Team: 能去健身房後發現硬舉退步一百公斤的五隻雞 (Five Chickens Who Found Their Deadlift Had Dropped 100 kg Once They Could Return to the Gym)

2021-09-12 07:28:38


OCR: Optical Character Recognition

"The quick brown fox jumps over the lazy dog." This sentence is a pangram: it covers all 26 letters from a to z, and it was originally used to check whether a keyboard was faulty.

Optical character recognition uses image processing to extract and recognize the text in an image. A simple pipeline looks like this:

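Image preprocessing: remove unwanted noise. In simple cases, first binarize the image into pure black and white, then filter out the remaining noise.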


Original image	Binarized	Denoised
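To make the preprocessing step concrete, here is a minimal, dependency-free sketch that works on a tiny grayscale grid instead of a real photo. The function names `binarize` and `median_filter`, the threshold of 128, and the 3x3 window are assumptions chosen for illustration, not something Azure or the original pipeline prescribes.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 0 (black) or 255 (white)."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


def median_filter(img):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Windows are clipped at the image border. The upper median is used so
    that a binary input always yields a binary output.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


# A bright background with a two-pixel-wide dark stroke (columns 2 and 3)
# and one isolated dark speck of noise at row 3, column 0.
gray = [
    [240, 235, 30, 25, 245, 250],
    [250, 245, 20, 35, 240, 245],
    [245, 250, 25, 30, 250, 240],
    [40, 240, 35, 20, 245, 250],
    [250, 245, 30, 25, 240, 245],
]

binary = binarize(gray)
clean = median_filter(binary)
for row in clean:
    print(row)
```

The speck disappears because it is outnumbered in every window it appears in, while the stroke survives because it is two pixels wide. A one-pixel-wide stroke would be erased by the same filter, so the window size has to suit the text.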

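Text detection: locate the regions of the image that contain text.
- Use deep learning to find where the text is, for example EAST (Efficient and Accurate Scene Text Detector).
- When the text layout is relatively simple, simple rules are enough. Take the verification codes some websites use to confirm that the user is human, known as CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). These often contain a fixed number of letters of similar size plus some added noise, so simple rules can locate each letter.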


Original image	Preprocessed	Character segmentation
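One possible "simple rule" for locating letters is a vertical projection: mark every column that contains ink, then treat each run of consecutive inked columns as one character. The sketch below assumes a binarized image stored as a list of rows with 0 for ink and 255 for background, and that letters do not touch each other. The helper name `segment_columns` is hypothetical.

```python
def segment_columns(binary):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    width = len(binary[0])
    # A column "has ink" if any pixel in it is black (0).
    has_ink = [any(row[x] == 0 for row in binary) for x in range(width)]

    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                  # a new character begins
        elif not ink and start is not None:
            spans.append((start, x))   # the character ended at column x
            start = None
    if start is not None:              # a character touches the right edge
        spans.append((start, width))
    return spans


W, B = 255, 0
binary_text = [
    [W, B, B, W, B, W, B, B, B],
    [W, B, W, W, B, W, B, W, B],
    [W, B, B, W, B, W, B, B, B],
]

spans = segment_columns(binary_text)
print(spans)
```

Each span can then be cropped and resized to a fixed size before recognition. Overlapping or slanted letters would need something more robust, such as connected-component analysis.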

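Text recognition: determine which characters appear in the image.
- Deep learning can produce the result directly, for example CRNN (Convolutional Recurrent Neural Network).
- Alternatively, when the text is relatively simple, a simpler approach works: crop each detected letter to a fixed-size image and classify it with a machine-learning model such as LeNet, KNN, or SVM.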
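As a rough illustration of the simpler approach, the sketch below classifies a fixed-size glyph with k-nearest neighbours. Everything here is a toy assumption: the glyphs are 3x3 grids flattened to 9 values (1 = ink), the training set has three hand-made variants per letter, and `knn_predict` is a hypothetical helper rather than a library function. A real system would use larger images and a library implementation.

```python
from collections import Counter


def knn_predict(train, sample, k=3):
    """Classify `sample` by majority vote among its k nearest training glyphs."""
    def distance(a, b):
        # Squared Euclidean distance between two flattened glyphs.
        return sum((p - q) ** 2 for p, q in zip(a, b))

    nearest = sorted(train, key=lambda item: distance(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: (flattened 3x3 glyph, label).
train = [
    ((0, 1, 0, 0, 1, 0, 0, 1, 0), "I"),
    ((0, 1, 0, 0, 1, 0, 0, 1, 1), "I"),
    ((0, 1, 0, 0, 1, 0, 1, 1, 0), "I"),
    ((1, 0, 0, 1, 0, 0, 1, 1, 1), "L"),
    ((1, 0, 0, 1, 0, 0, 1, 1, 0), "L"),
    ((1, 0, 0, 1, 0, 0, 0, 1, 1), "L"),
    ((1, 1, 1, 0, 1, 0, 0, 1, 0), "T"),
    ((1, 1, 1, 0, 1, 0, 0, 0, 0), "T"),
    ((0, 1, 1, 0, 1, 0, 0, 1, 0), "T"),
]

# An "L" with one extra noise pixel in the centre.
noisy_l = (1, 0, 0, 1, 1, 0, 1, 1, 1)
predicted = knn_predict(train, noisy_l)
print(predicted)
```

With grids this small the letters sit close together in feature space, so ties and misclassifications appear quickly. That is one reason real pipelines use larger glyph images and more training samples.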
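Post-processing: text recognition is not always fully correct, so the result can be corrected using the surrounding context together with a prebuilt lexicon.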
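A minimal sketch of lexicon-based correction, using `difflib.get_close_matches` from the Python standard library. The lexicon, the misread words, and the 0.6 similarity cutoff are assumptions for illustration. This version ignores context entirely and only snaps each word to its closest lexicon entry.

```python
from difflib import get_close_matches


def correct_words(words, lexicon, cutoff=0.6):
    """Replace each word with its closest lexicon entry.

    Words with no sufficiently similar entry are kept unchanged.
    Matching is done on the lower-cased word, so corrected words
    come back in the lexicon's casing.
    """
    corrected = []
    for word in words:
        matches = get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return corrected


lexicon = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
ocr_words = ["the", "qu1ck", "br0wn", "fox"]  # "i" and "o" misread as digits

fixed = correct_words(ocr_words, lexicon)
print(fixed)
```

A lower cutoff corrects more aggressively but also risks rewriting words that were read correctly and simply are not in the lexicon.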

Apart from post-processing, Azure OCR can take care of every step above. The rest of this post uses Azure to perform OCR.

OCR with Azure

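First obtain the subscription key (SUBSCRIPTION KEY) and the endpoint (ENDPOINT). As with object detection, both come from the Computer Vision service page.
Install the Python packages: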
- azure-cognitiveservices-vision-computervision
- Pillow
- requests

import time
from io import BytesIO

import requests
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import (
    OperationStatusCodes,
)
from msrest.authentication import CognitiveServicesCredentials
from PIL import Image, ImageDraw, ImageFont

# Use SUBSCRIPTION_KEY and ENDPOINT to authenticate with the Computer Vision service.
SUBSCRIPTION_KEY = "YOUR SUBSCRIPTION_KEY"
ENDPOINT = "YOUR ENDPOINT"
CV_CLIENT = ComputerVisionClient(
    ENDPOINT, CognitiveServicesCredentials(SUBSCRIPTION_KEY)
)

# Download the image from its URL
url = "https://i.imgur.com/qyWiqQv.jpg"
response = requests.get(url)
img = Image.open(BytesIO(response.content))


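# Prepare to draw on the image. The font size is 5% of the image height,
# and the font file must cover the characters that appear in the image.
draw = ImageDraw.Draw(img)
font_size = int(5e-2 * img.size[1])
fnt = ImageFont.truetype(
    "../static/TaipeiSansTCBeta-Regular.ttf",
    size=font_size,
)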

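# Start OCR with Azure Computer Vision. The Read call is asynchronous:
# the Operation-Location header ends with the ID of the running operation.
ocr_results = CV_CLIENT.read(url, raw=True)
operation_location_remote = ocr_results.headers["Operation-Location"]
operation_id = operation_location_remote.split("/")[-1]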

# Processing time varies with the amount of text, so poll with
# operation_id until the operation is no longer pending.
status = ["notStarted", "running"]
while True:
    get_handw_text_results = CV_CLIENT.get_read_result(operation_id)
    if get_handw_text_results.status not in status:
        break
    time.sleep(1)

# Once the status is succeeded, the results can be drawn on the original image.
succeeded = OperationStatusCodes.succeeded

if get_handw_text_results.status == succeeded:
    res = get_handw_text_results.analyze_result.read_results
    for text_result in res:
        for line in text_result.lines:
            # bounding_box holds the vertices of a quadrilateral as
            # [x1, y1, x2, y2, x3, y3, x4, y4]. It is not necessarily a
            # rectangle, so draw it with draw.line and repeat the first
            # vertex at the end to close the shape.
            bounding_box = line.bounding_box + line.bounding_box[:2]
            draw.line(
                bounding_box,
                fill=(255, 0, 0),
                width=max(1, font_size // 10),
            )

            # Write the recognized text just above the first vertex.
            left = line.bounding_box[0]
            top = line.bounding_box[1]
            draw.text(
                (left, top - font_size),
                line.text,
                fill=(0, 255, 255),
                font=fnt,
            )


img.save("output.png")
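
The polling loop above waits forever if the operation never leaves the pending states. As a hedged sketch, the same loop can be wrapped in a helper with a timeout. The name `wait_for_result` and its parameters are hypothetical, and the example exercises it with a fake status source rather than the real Azure client.

```python
import time
from types import SimpleNamespace


def wait_for_result(fetch, pending=("notStarted", "running"),
                    interval=1.0, timeout=30.0):
    """Call fetch() until its .status leaves the pending states.

    Raises TimeoutError if the status is still pending after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result.status not in pending:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {result.status} after {timeout} s")
        time.sleep(interval)


# Fake status source standing in for the remote service (an assumption for
# this demo): it reports two pending states and then "succeeded".
states = iter(["notStarted", "running", "succeeded"])
result = wait_for_result(
    lambda: SimpleNamespace(status=next(states)),
    interval=0.01,
)
print(result.status)
```

In the script above, the equivalent call would presumably be `wait_for_result(lambda: CV_CLIENT.get_read_result(operation_id))`, after which the status still has to be compared with `OperationStatusCodes.succeeded`, because the operation can also end in a failed state.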