iT邦幫忙

Python: Recognizing Text in Images

What you need to install:

  • PIL (provided by the Pillow package)
  • pytesseract
  • Tesseract-OCR

Open Command Prompt and run the following:

  1. PIL
pip install pillow
  2. pytesseract
pip install pytesseract
  3. Tesseract-OCR
    Run the installer tesseract-ocr-setup-3.02.02.exe

Remember your installation path (mine is C:\Program Files (x86)\Tesseract-OCR); you will need it shortly.
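Because the install path matters in the next step, it can help to confirm that the executable is really where you expect. The following is only a minimal sketch: the helper name `find_tesseract` is hypothetical, and the default path is an assumption based on the install location mentioned above.

```python
import os
import shutil

# Assumed default location, based on the install path mentioned above.
DEFAULT_TESSERACT = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidate=DEFAULT_TESSERACT):
    """Return a usable path to the tesseract executable, or None.

    Hypothetical helper: checks the given path first, then falls back
    to searching the directories listed in the PATH environment variable.
    """
    if os.path.isfile(candidate):
        return candidate
    # shutil.which returns None when nothing named "tesseract" is on PATH.
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    print(path if path else "tesseract executable not found")
```

If this prints a path, that is the value to assign to `pytesseract.pytesseract.tesseract_cmd` later on.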


Once all of the above is done, let's get hands-on!
First, test with a simple image made in Paint:
https://ithelp.ithome.com.tw/upload/images/20181202/20113725Cjbbm5oFaK.png

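import pytesseract
from PIL import Image

# Point pytesseract at the Tesseract-OCR executable (adjust to your install path)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
# Raw string (r"...") so the backslashes in the Windows path are not treated as escapes
image = Image.open(r"C:\Users\user\Desktop\Myimgtest\test_1.png")
# Run OCR and get the recognized text back as a string
text = pytesseract.image_to_string(image)
print(text)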

Output

Hello word !

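What each call does
pytesseract.pytesseract.tesseract_cmd: the path to the Tesseract-OCR executable (tesseract.exe inside the install directory)
Image.open: opens the image you want to recognize, given its file path
pytesseract.image_to_string: converts the text found in the image into a string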
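The three calls described above can be bundled into one function. This is a sketch rather than the author's code: the name `ocr_image`, its parameters, and the `lang="eng"` default are assumptions made for illustration (`lang` is an existing keyword of `pytesseract.image_to_string`).

```python
import os


def ocr_image(image_path, tesseract_cmd=None, lang="eng"):
    """Return the text Tesseract recognizes in the image at image_path.

    Hypothetical wrapper around the three calls described above.
    """
    if not os.path.isfile(image_path):
        raise FileNotFoundError(f"Image not found: {image_path}")

    # Imported here so the path check above runs even if the OCR
    # packages are not installed yet.
    import pytesseract
    from PIL import Image

    if tesseract_cmd:
        # Tell pytesseract where the Tesseract-OCR executable lives.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    # The context manager closes the image file once OCR is done.
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)


# Example call (paths are the ones used in this post; adjust to your machine):
# text = ocr_image(
#     r"C:\Users\user\Desktop\Myimgtest\test_1.png",
#     tesseract_cmd=r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
# )
# print(text)
```

Checking that the file exists up front gives a clearer error than letting the failure surface from inside the imaging library.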

If the path is written as an ordinary string (without an r prefix), Python raises a SyntaxError:

image = Image.open("C:\Users\user\Desktop\Myimgtest\test_1.png")
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

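Remember to put an r in front of the opening quote → (r"")

The r prefix marks the string as a raw string, which tells Python not to interpret backslashes as escape sequences. Without it, the \U in \Users is read as the start of a \UXXXXXXXX Unicode escape, which is exactly what the error message complains about.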

image = Image.open(r"C:\Users\user\Desktop\Myimgtest\test_1.png")
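A raw string is not the only way to write a Windows path safely. The sketch below compares three spellings of the same path; the variable names are illustrative only. Note that the path also contains `\t` (in `\test_1.png`), which an ordinary string would silently turn into a tab character even if the `\U` problem were absent.

```python
from pathlib import PureWindowsPath

raw = r"C:\Users\user\Desktop\Myimgtest\test_1.png"           # raw string
escaped = "C:\\Users\\user\\Desktop\\Myimgtest\\test_1.png"   # doubled backslashes
forward = "C:/Users/user/Desktop/Myimgtest/test_1.png"        # forward slashes

# The raw and escaped spellings produce the exact same string.
print(raw == escaped)  # True

# PureWindowsPath treats / and \ as the same separator on any OS.
print(PureWindowsPath(raw) == PureWindowsPath(forward))  # True
print(PureWindowsPath(forward).name)  # test_1.png
```

Any of the three forms can be passed to `Image.open` on Windows; the raw string is simply the least error-prone to type.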


Next, try a different image:
https://ithelp.ithome.com.tw/upload/images/20181202/20113725ECLl2V1x8c.jpg

Output

This translation was prepared by Lloyd Kramer. Kramer graduated from the
University of California, Berkeley, with a major in Russian. He is also a graduate of the U.S.
Navy Foreign Language School in Boulder, Colorado. While a student at Berkeley he was
president of Dobro Slovo, the Slavic language honor society. As a naval officer during World
War H he served as both interpreter and translator in Russian for the U.S. Navy. After the
war, Kramer worked for a year as an analyst in Washington, DC. Subsequent to this
assignment, he joined the staff of the Hoover Institute and Library, Stanford University,
where he helped organize and catalog the Institute's large collection of Slavic language nonv

book materials.

Mr. Kramer now resides, with his Wife Martha, in Twain Harte, California

February 23, 2000
