[Interlude] Azure has text-to-speech too???

2024 iThome Ironman Contest

DAY 29

Generative AI

Part 29 of the series "Writing Code with AI Takes Some Skill Too"

16th Ironman Contest

tw_hsu

2024-09-11 12:37:34


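See the title.
Better yet, someone has already wrapped this feature in a Flutter package!

So first, go sign up... and get a key...
Note: the process for getting this key is fairly involved, so I suggest asking an AI. Its answer was quite detailed.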


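1. Sign in to the [Azure portal](https://portal.azure.com/).
2. In the left navigation bar, click "Create a resource".
3. On the "New" page, search for "Cognitive Services".
4. In the search results, select "Cognitive Services", then click the "Create" button.
5. On the "Create" page, fill in the required information, such as subscription, resource group, region, and name, then click the "Review + create" button.
6. On the "Review + create" page, confirm that the information you entered is correct, then click the "Create" button.
7. Wait for Azure to create your Cognitive Services resource. Once it is created, you will be taken to the overview page of the resource.
8. On the overview page, click "Keys and Endpoint" in the left navigation bar.
9. On the "Keys and Endpoint" page, you can see your subscription key and endpoint. Click the "Copy" button to copy the subscription key.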
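
The key and region obtained in the steps above are the two values the later code needs. Here is a minimal sketch of keeping them out of the source file, assuming they are supplied as environment variables. The names `SpeechConfig`, `loadSpeechConfig`, `AZURE_SPEECH_KEY`, and `AZURE_SPEECH_REGION` are hypothetical and chosen only for illustration; the endpoint pattern is the one used by the code in this post.

```dart
/// Holds the two values copied from the "Keys and Endpoint" page.
class SpeechConfig {
  final String subscriptionKey;
  final String region;

  const SpeechConfig({required this.subscriptionKey, required this.region});

  /// Text-to-speech REST endpoint, built from the region.
  Uri get ttsEndpoint =>
      Uri.parse('https://$region.tts.speech.microsoft.com/cognitiveservices/v1');
}

/// Reads the config from a map of environment variables.
/// AZURE_SPEECH_KEY and AZURE_SPEECH_REGION are hypothetical names.
SpeechConfig loadSpeechConfig(Map<String, String> env) {
  final key = env['AZURE_SPEECH_KEY'];
  final region = env['AZURE_SPEECH_REGION'];
  if (key == null || key.isEmpty || region == null || region.isEmpty) {
    throw StateError('Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION first.');
  }
  return SpeechConfig(subscriptionKey: key, region: region);
}
```

In a command-line Dart program this could be called as `loadSpeechConfig(Platform.environment)` with `dart:io` imported. In a Flutter app the values would more likely come from some other configuration mechanism, so treat the map parameter as a stand-in.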

Then let's use the AI to test whether this feature can be brought into the app!


import 'dart:io';
import 'package:http/http.dart' as http;

class AzureTts {
  final String subscriptionKey;
  final String region;

  AzureTts({required this.subscriptionKey, required this.region});

  Future<http.Response> textToSpeech(String text) async {
    final String endpoint = 'https://$region.tts.speech.microsoft.com/cognitiveservices/v1';
    final String contentType = 'application/ssml+xml';
    final String outputFormat = 'audio-16khz-128kbitrate-mono-mp3';
    final String ssml = '''
      <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
        <voice name='en-US-Guy24kRUS'>
          <prosody rate='-10.00%'>$text</prosody>
        </voice>
      </speak>
    ''';

    final http.Response response = await http.post(
      Uri.parse(endpoint),
      headers: {
        'Content-Type': contentType,
        'X-Microsoft-OutputFormat': outputFormat,
        'Ocp-Apim-Subscription-Key': subscriptionKey,
      },
      body: ssml,
    );

    return response;
  }
}

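Curiously, the AI did not choose to use the package. Instead, it talks to the API over a plain HTTP connection and fetches the audio data that way.

So a way to play the audio has to be added.

This is the way to play it.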


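  // Needs: import 'dart:typed_data'; and the audioplayers package.
  // Assumption: playBytes comes from older audioplayers releases;
  // newer releases use play(BytesSource(audioData)) instead.
  Future<void> playAudio(Uint8List audioData) async {
    AudioPlayer audioPlayer = AudioPlayer();
    await audioPlayer.playBytes(audioData);
  }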

And this is the way to use the way to play it. (What a tongue twister.)


  if (response.statusCode == 200) {
    Uint8List audioData = response.bodyBytes;
    await playAudio(audioData);
  } else {
    print('Failed to generate speech: ${response.statusCode}');
  }

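Finally... one more million-dollar grand prize... it lets you specify the format of the returned audio data, which means the file conversion problem I agonized over earlier is solved just like that!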


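In the Azure Text to Speech service, the output audio format is specified by setting X-Microsoft-OutputFormat in the headers of the HTTP POST request. Here are some of the available audio formats:  
raw-8khz-8bit-mono-mulaw: 8 kHz, 8-bit, mono, μ-law format.
riff-8khz-8bit-mono-mulaw: 8 kHz, 8-bit, mono, μ-law format, with a RIFF header.
riff-16khz-16bit-mono-pcm: 16 kHz, 16-bit, mono, PCM format, with a RIFF header.

Code: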
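
Since the format is just a header value, it can be passed in rather than hard-coded. The following is a minimal sketch under that assumption. `buildTtsHeaders` and `fileExtensionFor` are hypothetical helpers, not part of any Azure SDK, and the extension guess is based only on the naming pattern of the formats mentioned in this post.

```dart
/// Builds the request headers for the REST call, with the output
/// format passed in instead of hard-coded.
Map<String, String> buildTtsHeaders({
  required String subscriptionKey,
  required String outputFormat,
}) {
  return {
    'Content-Type': 'application/ssml+xml',
    'X-Microsoft-OutputFormat': outputFormat,
    'Ocp-Apim-Subscription-Key': subscriptionKey,
  };
}

/// Guesses a file extension from the format name (hypothetical helper).
String fileExtensionFor(String outputFormat) {
  // A RIFF header means the bytes form a WAV container.
  if (outputFormat.startsWith('riff-')) return 'wav';
  if (outputFormat.endsWith('-mp3')) return 'mp3';
  // Headerless audio; 'raw' is only a placeholder extension.
  return 'raw';
}
```

For example, `fileExtensionFor('riff-16khz-16bit-mono-pcm')` gives `wav`, so the response bytes could be written out without a separate conversion step.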

Future<http.Response> textToSpeech(String text) async {
  final String endpoint = 'https://$region.tts.speech.microsoft.com/cognitiveservices/v1';
  final String contentType = 'application/ssml+xml';
  final String outputFormat = 'riff-16khz-16bit-mono-pcm'; // Set the output format to WAV
  // ...

  final http.Response response = await http.post(
    Uri.parse(endpoint),
    headers: {
      'Content-Type': contentType,
      'X-Microsoft-OutputFormat': outputFormat,
      'Ocp-Apim-Subscription-Key': subscriptionKey,
    },
    body: ssml,
  );

  return response;
}

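OK... let's test it! ...It failed! It actually told me there was an error in its XML header!
I wrote a whole lot above, but in truth it is one long stretch of code that could not be fixed.
(I got lazy and simply used the example from the integration package. It fetched the audio data without any trouble, and it really does play. Now only the last step is left: saving the file.)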
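
The cause of the XML error is not identified in this post. One common source of malformed SSML in code like the snippets above, offered only as a possibility, is that the text is interpolated into the XML without escaping, so an `&` or `<` in the input breaks the document. A minimal sketch of escaping the text and building the SSML with nothing in front of `<speak>` follows. `escapeXmlText` and `buildSsml` are hypothetical helper names, and the voice name is left as a parameter because which voices are available is an assumption that has to be checked against the service.

```dart
/// Escapes the characters that cannot appear as-is in XML text content.
String escapeXmlText(String text) {
  return text
      .replaceAll('&', '&amp;') // must run first so later entities survive
      .replaceAll('<', '&lt;')
      .replaceAll('>', '&gt;');
}

/// Builds an SSML document with no leading whitespace before <speak>.
String buildSsml({
  required String text,
  required String voiceName,
  String lang = 'en-US',
}) {
  return "<speak version='1.0' "
      "xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='$lang'>"
      "<voice name='$voiceName'>"
      "<prosody rate='-10.00%'>${escapeXmlText(text)}</prosody>"
      "</voice>"
      "</speak>";
}
```

The result of `buildSsml` could be passed as the `body` of the POST request in place of the triple-quoted string. Whether this resolves the error reported here is untested.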