iT邦幫忙

2023 iThome Ironman Contest

DAY 17
SideProject30

Laravel Extended Universe: Building a Product Unicorn at 10x Speed, from 1 to 100 (series, part 17)

#16 Build Your Cloud Machine Learning Compute Platform with Colab (1/2)


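In this era of exploding digital information, whoever masters information is like the prospector who owns the mine during a gold rush: the one who can dig valuable data out of the digital mountain.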

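In the current wave of audio applications, the ability to convert audio data into text greatly widens the scope of data analysis and its applications. In this article, we explore how to use Google Colab and related machine learning models to build a cloud-based Speech-to-Text (STT) system.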

What is Speech-to-Text (STT)?

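Speech-to-Text (STT) is a technology that automatically converts speech into text. It is used in many fields, such as subtitle transcription, speech recognition, and virtual assistants.
[Figure: STT workflow]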

What is Colab?

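Google Colab is a free, cloud-hosted Jupyter Notebook environment. It offers free GPU and TPU resources, so you can run machine learning and data analysis projects in the cloud, and it comes with a pleasant UI.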

Beginning

Before we start, let's get to know the technology chosen for this project: Whisper, developed by OpenAI.

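Whisper is an automatic speech recognition (ASR) system developed by OpenAI. OpenAI trained Whisper on 680,000 hours of multilingual (98 languages) and multitask supervised data collected from the web. The training data covers a wide range of accents, background noise, and technical terminology from around the world. According to OpenAI, such a large and diverse dataset improves Whisper's recognition ability.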

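Besides speech recognition, Whisper can transcribe speech in many languages and translate those languages into English. OpenAI has released Whisper's models and inference code to developers, hoping that Whisper will serve as a foundation for building useful applications and for further research on speech processing.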

[Figure: overview of OpenAI Whisper]

Benchmarks

The following comparison of several major improved implementations comes from the Whisper JAX project.

Transcription time in seconds, by audio length:

              OpenAI     Transformers   Whisper JAX   Whisper JAX
Framework     PyTorch    PyTorch        JAX           JAX
Backend       GPU        GPU            GPU           TPU
1 min         13.8       4.54           1.72          0.45
10 min        108.3      20.2           9.38          2.01
1 hour        1001.0     126.1          75.3          13.8
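
To make the table easier to read, here is a minimal sketch that turns the numbers above into speedups and real-time factors. It assumes the values are transcription times in seconds; the dictionary labels and the `speedup_table` helper are names made up for this illustration and are not part of Whisper JAX.

```python
# Transcription times copied from the benchmark table above.
# Keys are audio lengths in seconds; values are assumed to be seconds as well.
BENCHMARKS = {
    "OpenAI (PyTorch, GPU)": {60: 13.8, 600: 108.3, 3600: 1001.0},
    "Transformers (PyTorch, GPU)": {60: 4.54, 600: 20.2, 3600: 126.1},
    "Whisper JAX (JAX, GPU)": {60: 1.72, 600: 9.38, 3600: 75.3},
    "Whisper JAX (JAX, TPU)": {60: 0.45, 600: 2.01, 3600: 13.8},
}


def speedup_table(benchmarks, baseline, audio_seconds):
    """Compare every implementation with `baseline` for one audio length."""
    base = benchmarks[baseline][audio_seconds]
    rows = {}
    for name, times in benchmarks.items():
        elapsed = times[audio_seconds]
        rows[name] = {
            # How many times faster than the baseline implementation
            "speedup": base / elapsed,
            # Seconds of audio processed per second of compute
            "realtime_factor": audio_seconds / elapsed,
        }
    return rows


for name, row in speedup_table(BENCHMARKS, "OpenAI (PyTorch, GPU)", 3600).items():
    print(f"{name}: {row['speedup']:.1f}x vs baseline, "
          f"{row['realtime_factor']:.0f}x real time")
```

For the one-hour file, the TPU run comes out roughly 72 times faster than the OpenAI baseline, which is why the choice of implementation matters once a whole podcast backlog has to be transcribed.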

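In the nearly one year since OpenAI released Whisper, people across the community have been finding ways to improve transcription accuracy and speed. Examples include faster-whisper, whisper.cpp, and even whisper.wasm, which runs in the browser.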

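Hands-on

Before building the complete pipeline, let's first check that the basic workflow runs and meets our expectations.

First, mount Google Drive onto the disk of the VM that the notebook runs on: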

from google.colab import drive
drive.mount('/content/drive')

Once Whisper is installed, it can read an audio file and transcribe it directly:

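# Install Whisper from the official repository, plus ffmpeg for audio decoding
!pip install git+https://github.com/openai/whisper.git
!sudo apt update && sudo apt install -y ffmpeg
# Transcribe output.wav as English with the large model
!whisper "output.wav" --model large --language en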

Output:

[00:00.000 --> 00:24.800]  At the end of the Cold War, the United States made a policy decision that may be one of
[00:24.800 --> 00:27.880]  the biggest mistakes of the 20th century.
[00:27.880 --> 00:32.600]  It's contributed to chaos and uncertainty in this current day.
[00:32.600 --> 00:37.280]  And it's not based on politics, it's based on games.
[00:37.280 --> 00:40.640]  In game theory, there are two types of games.
[00:40.640 --> 00:44.600]  There are finite games and there are infinite games.
[00:44.600 --> 00:51.320]  A finite game is defined as known players, fixed rules, and agreed upon objective.
[00:51.320 --> 00:53.600]  Baseball, right?
[00:53.600 --> 00:57.280]  An infinite game is defined as known and unknown players.
[00:57.280 --> 01:02.800]  The rules are changeable and the objective is to perpetuate the game.
[01:02.800 --> 01:07.160]  When you pit a finite player versus a finite player, the system is stable.
[01:07.160 --> 01:08.360]  Baseball is stable.
[01:08.360 --> 01:10.800]  So is conventional war for that matter.
[01:10.800 --> 01:15.800]  When you pit an infinite player versus an infinite player, the system is also stable.
[01:15.800 --> 01:17.520]  The Cold War was stable.

The transcription looks very close to the accuracy we expected.
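
One thing visible in the output is that a segment can end in the middle of a sentence (the first sentence is split after "may be one of"). If sentence-level captions are preferred, the following is a minimal sketch of one way to merge consecutive segments until the text ends with sentence punctuation. The `merge_into_sentences` helper and the dict shape (`start`, `end`, `text`) are assumptions made for this illustration, not part of the Whisper CLI.

```python
def merge_into_sentences(segments, terminators=(".", "?", "!")):
    """Join consecutive segments until the text ends with sentence punctuation.

    `segments` is an iterable of dicts with "start" and "end" (in seconds)
    and "text" keys.
    """
    merged = []
    current = None
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip empty segments
        if current is None:
            current = {"start": seg["start"], "end": seg["end"], "text": text}
        else:
            # Extend the sentence in progress with this segment
            current["end"] = seg["end"]
            current["text"] += " " + text
        if text.endswith(terminators):
            merged.append(current)
            current = None
    if current is not None:
        # Keep a trailing fragment that never reached punctuation
        merged.append(current)
    return merged


# The first three segments from the output above
segments = [
    {"start": 0.0, "end": 24.8,
     "text": " At the end of the Cold War, the United States made a policy decision that may be one of"},
    {"start": 24.8, "end": 27.88,
     "text": " the biggest mistakes of the 20th century."},
    {"start": 27.88, "end": 32.6,
     "text": " It's contributed to chaos and uncertainty in this current day."},
]

for s in merge_into_sentences(segments):
    print(f"[{s['start']:.2f}s -> {s['end']:.2f}s] {s['text']}")
```

The heuristic only looks at the last character, so abbreviations such as "Dr." would close a sentence early; it is a starting point rather than a full sentence splitter.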

Next, let's try faster-whisper and see how it performs.

!pip install "faster-whisper @ https://github.com/guillaumekln/faster-whisper/archive/refs/heads/master.tar.gz"

# Convert model for faster-whisper
!ct2-transformers-converter --model openai/whisper-large-v2 --output_dir whisper-large-v2-ct2 --copy_files tokenizer.json --quantization float16
!ct2-transformers-converter --model openai/whisper-medium --output_dir whisper-medium-ct2 --copy_files tokenizer.json --quantization float16
!ct2-transformers-converter --model openai/whisper-small --output_dir whisper-small-ct2 --copy_files tokenizer.json --quantization float16

from faster_whisper import WhisperModel
import csv

model_path = "whisper-large-v2-ct2/"

# Run on GPU with FP16
model = WhisperModel(model_path, device="cuda", compute_type="float16")
# or run on GPU with INT8
# model = WhisperModel(model_path, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_path, device="cpu", compute_type="int8")

# segments is a generator: the transcription actually runs while we iterate over it
segments, info = model.transcribe(audio="output.wav", beam_size=5, language="en")
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

# Collect the timing and text of each segment while printing progress
captions = []
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    line = {'start': segment.start, 'end': segment.end, 'text': segment.text}
    captions.append(line)

# Save the captions as CSV (UTF-8, so non-English transcripts are written safely)
with open("captions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["start", "end", "text"])
    for caption in captions:
        writer.writerow(['%.2f' % caption["start"], '%.2f' % caption["end"], caption["text"]])

Output:

[0.00s -> 16.60s]  Thanks very much.
[16.60s -> 24.82s]  At the end of the Cold War, the United States made a policy decision that may be one of
[24.82s -> 27.90s]  the biggest mistakes of the 20th century.
[27.90s -> 32.58s]  It's contributed to chaos and uncertainty in this current day.
[32.58s -> 37.26s]  And it's not based on politics, it's based on games.
[37.26s -> 40.66s]  In game theory, there are two types of games.
[40.66s -> 44.58s]  There are finite games and there are infinite games.
[44.58s -> 51.34s]  A finite game is defined as known players, fixed rules, and agreed upon objective.
[51.34s -> 53.58s]  Baseball, right?
[53.58s -> 57.26s]  An infinite game is defined as known and unknown players.
[57.26s -> 62.78s]  The rules are changeable and the objective is to perpetuate the game.
[62.78s -> 67.14s]  When you pit a finite player versus a finite player, the system is stable.
[67.14s -> 68.34s]  Baseball is stable.
[68.34s -> 70.78s]  So is conventional war, for that matter.
[70.78s -> 75.78s]  When you pit an infinite player versus an infinite player, the system is also stable.
[75.78s -> 77.50s]  The Cold War was stable.
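
The script above saves the captions as CSV. If a subtitle file is more convenient, here is a minimal sketch that converts the same kind of `captions` list (dicts with `start`, `end` in seconds, and `text`) into SRT text. The `srt_timestamp` and `captions_to_srt` helpers are names invented for this illustration; they are not part of faster-whisper.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def captions_to_srt(captions):
    """Build SRT text: numbered blocks separated by a blank line."""
    blocks = []
    for index, cap in enumerate(captions, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(cap['start'])} --> {srt_timestamp(cap['end'])}\n"
            f"{cap['text'].strip()}\n"
        )
    return "\n".join(blocks)


# Two segments taken from the faster-whisper output above
captions = [
    {"start": 0.00, "end": 16.60, "text": " Thanks very much."},
    {"start": 75.78, "end": 77.50, "text": " The Cold War was stable."},
]

print(captions_to_srt(captions))
```

Writing the returned string to a file such as `captions.srt` (a hypothetical name) with `encoding="utf-8"` gives a file that most video players can load.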

Next

Tomorrow we will wire the whole flow together and confirm a workable process and code.

Planned flow

Crawler

  • Get podcast episodes pending list and download audio files.
  • Save the files to google drive

Colab

  • Connect google drive to colab as a folder
  • Install and setup openai whisper
  • Loop the list and get audio files to transcribe
  • Save the captions to google drive
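
The Colab half of the plan above could be sketched roughly as follows. This is only an outline under stated assumptions, not the code of the next article: the helper names, the list of audio extensions, and the "one CSV per audio file" convention are all hypothetical, and `transcribe` is any callable you supply that takes a file path and yields `(start, end, text)` tuples.

```python
import csv
from pathlib import Path

# Assumed set of audio extensions to pick up; adjust to what the crawler saves
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a"}


def pending_audio_files(audio_dir, captions_dir):
    """Return audio files that do not yet have a captions CSV."""
    audio_dir, captions_dir = Path(audio_dir), Path(captions_dir)
    return [
        path
        for path in sorted(audio_dir.iterdir())
        if path.suffix.lower() in AUDIO_EXTENSIONS
        and not (captions_dir / f"{path.stem}.csv").exists()
    ]


def transcribe_pending(audio_dir, captions_dir, transcribe):
    """Transcribe each pending file and save its captions as <name>.csv."""
    captions_dir = Path(captions_dir)
    captions_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for audio_path in pending_audio_files(audio_dir, captions_dir):
        out_path = captions_dir / f"{audio_path.stem}.csv"
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["start", "end", "text"])
            for start, end, text in transcribe(str(audio_path)):
                writer.writerow([f"{start:.2f}", f"{end:.2f}", text.strip()])
        written.append(out_path)
    return written


# Example wiring with faster-whisper (commented out: needs a loaded `model`
# and a mounted Drive; the folder names are hypothetical):
#
# def transcribe(path):
#     segments, _ = model.transcribe(audio=path, beam_size=5)
#     return ((s.start, s.end, s.text) for s in segments)
#
# transcribe_pending("/content/drive/MyDrive/podcasts/audio",
#                    "/content/drive/MyDrive/podcasts/captions",
#                    transcribe)
```

Passing the transcriber in as a callable keeps the loop independent of the backend, so openai-whisper and faster-whisper can be swapped without touching the file handling. Because finished files are detected by their CSV, an interrupted Colab session can be resumed by simply running the cell again.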
