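Sometimes when writing a web scraper, you run into a member login that requires a verification code, often a mix of digits and letters. You can install tesseract and use it directly from the command line, or use the image recognition service OcrSpace, whose free tier allows 500 requests per day.
Let's get into today's topic.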
After registering, check the inbox of the email address you signed up with,
and you'll find your API key there.
A kind soul has already wrapped the API in a gem. Install it first by adding this to your Gemfile:
gem 'ocr_space'
Don't forget to run bundle.
Then, in the file where you want to use OcrSpace:
require 'ocr_space'
resource = OcrSpace::Resource.new(apikey: "YOUR API KEY")
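Hardcoding the key is fine for a quick test, but you may prefer to keep it out of your source. Here is a minimal sketch, assuming the key is stored in an environment variable named OCR_SPACE_API_KEY. That name is chosen here for illustration and is not something the gem requires. The helper name fetch_ocr_space_api_key is also hypothetical.

```ruby
# Hypothetical helper: reads the API key from an environment variable.
# OCR_SPACE_API_KEY is an assumed name, not something the gem defines.
def fetch_ocr_space_api_key(env = ENV)
  env.fetch('OCR_SPACE_API_KEY') do
    raise KeyError, 'Set OCR_SPACE_API_KEY to the key you received by email'
  end
end

# Usage (requires the ocr_space gem):
#   resource = OcrSpace::Resource.new(apikey: fetch_ocr_space_api_key)
```

Passing the environment as an argument makes the helper easy to exercise with a plain Hash.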
Suppose we want to parse this image:
result = resource.clean_convert url: "https://i.imgur.com/hbDvSlo.png"
puts result
=> #If you want to find the secrets of the universe, think in terms of energy, frequency and vibration. AZ QUOTES
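For the login-code use case mentioned at the start, the recognized text may come back with stray spaces or punctuation. Here is a small sketch of cleaning it up, assuming the code contains only ASCII letters and digits. The helper name normalize_code is hypothetical and not part of the gem.

```ruby
# Hypothetical helper: keeps only ASCII letters and digits from OCR output.
# Assumes the verification code never contains other characters.
def normalize_code(text)
  text.to_s.gsub(/[^A-Za-z0-9]/, '')
end

puts normalize_code(" a7 K-9q\n")  # prints "a7K9q"
```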
You can also pass a local file path instead of a URL:
result = resource.clean_convert file: "/Users/suyesh/Desktop/nicola_tesla.jpg"
puts result
=> #If you want to find the secrets of the universe, think in terms of energy, frequency and vibration. AZ QUOTES
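The only difference between these calls is the url: or file: keyword, so a scraper that handles both kinds of input could choose the keyword automatically. Here is a minimal sketch, assuming anything that starts with http:// or https:// is a URL and everything else is a local path. ocr_source_args is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper: builds the keyword arguments for clean_convert.
# Assumption: strings starting with http:// or https:// are URLs,
# anything else is treated as a local file path.
def ocr_source_args(source)
  if source.match?(%r{\Ahttps?://}i)
    { url: source }
  else
    { file: File.expand_path(source) }
  end
end

p ocr_source_args('https://i.imgur.com/hbDvSlo.png')

# Usage (requires the ocr_space gem and a resource created as shown earlier):
#   result = resource.clean_convert(**ocr_source_args('https://i.imgur.com/hbDvSlo.png'))
```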
Looking through the source code, you can see that #clean_convert accepts two more options: language and isOverlayRequired.
clean_convert(apikey: @api_key, language: 'eng', isOverlayRequired: false, file: nil, url: nil)
These two options are described below:
isOverlayRequired
#[Optional] Defaults to false. Specifies whether the image/PDF text overlay is required. The overlay can be used to show the text over the image.
language
#Defaults to 'eng' in the signature above. Available codes:
#Czech = ce; Danish = dan; Dutch = dut; English = eng; Finnish = fin; French = fre; German = ger; Hungarian = hun;
#Italian = ita; Norwegian = nor; Polish = pol; Portuguese = por; Spanish = spa; Swedish = swe;
#ChineseSimplified = chs; Greek = gre; Japanese = jpn; Russian = rus; Turkish = tur; ChineseTraditional = cht; Korean = kor
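To avoid typos in these short codes, the list above can be kept in a lookup table. This sketch uses the codes exactly as listed in this post; they have not been checked against the live API. OCR_LANGUAGE_CODES and ocr_language_code are hypothetical names, not part of the gem.

```ruby
# Language codes copied from the list in this post (not verified against the API).
OCR_LANGUAGE_CODES = {
  'Czech' => 'ce', 'Danish' => 'dan', 'Dutch' => 'dut', 'English' => 'eng',
  'Finnish' => 'fin', 'French' => 'fre', 'German' => 'ger', 'Hungarian' => 'hun',
  'Italian' => 'ita', 'Norwegian' => 'nor', 'Polish' => 'pol', 'Portuguese' => 'por',
  'Spanish' => 'spa', 'Swedish' => 'swe', 'ChineseSimplified' => 'chs',
  'Greek' => 'gre', 'Japanese' => 'jpn', 'Russian' => 'rus', 'Turkish' => 'tur',
  'ChineseTraditional' => 'cht', 'Korean' => 'kor'
}.freeze

# Hypothetical helper: looks up a code and fails loudly on an unknown name.
def ocr_language_code(name)
  OCR_LANGUAGE_CODES.fetch(name) do
    raise ArgumentError, "Unknown OCR language: #{name.inspect}"
  end
end

puts ocr_language_code('ChineseTraditional')  # prints "cht"

# Usage (requires the ocr_space gem; image_url is a placeholder variable):
#   resource.clean_convert(url: image_url,
#                          language: ocr_language_code('ChineseTraditional'),
#                          isOverlayRequired: false)
```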