Streaming AI Speech with Gemini 2.5 Flash TTS, Angular v21, and Firebase


In this technical deep dive, we explore how to integrate the latest Google Gemini models into a modern web stack. We use Gemini 3 Flash Preview to generate insightful, obscure facts, and Gemini 2.5 Flash to provide high-quality text-to-speech (TTS). We walk through implementing synchronous audio delivery, real-time streaming with dynamic WAV header construction, and professional-grade browser playback with Angular.

1. Prerequisites

The tech stack includes:

  • Angular v21: builds the front-end user interface.
  • Node.js LTS: implements the back-end logic in Firebase Cloud Functions.
  • Firebase Cloud Functions: called by the front end to generate audio synchronously or to stream data chunks to the client.
  • Firebase Cloud Functions emulator: tests the functions locally at http://localhost:5001.
  • Gemini 3 Flash Preview: generates the text content (obscure facts) from topic tags.
  • Gemini 2.5 Flash: performs the text-to-speech (TTS) operations.
  • Vertex AI: hosts Gemini for text and speech generation.

2. Text-to-Speech Use Cases

Synchronous audio generation

The Cloud Function generates the audio, converts the raw data from the L16 MIME type to the WAV MIME type, and returns it to the Angular application, which plays the speech.

Downloading chunks and playing a Blob URL

The Angular application wants to stream audio, so the Cloud Function obtains a stream from the Gemini TTS model. The function decodes the base64 raw data into binary buffers before sending them to the client. After all chunks have been processed, the function computes the WAV header and returns it. The client collects all chunks plus the WAV header before constructing a Blob URL.

The Angular application assigns the Blob URL to an HTML5 <audio> element to play the speech.

Real-time audio playback

This is an advanced, low-latency approach for streaming speech in real time.
As before, the Angular application requests streaming audio, and the Cloud Function transmits the chunks to it.

Instead of waiting until the end to construct a valid Blob URL, the client uses the Web Audio API (AudioContext) to schedule and play the chunks as they arrive.
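The gapless scheduling idea can be sketched in isolation. This is a minimal model of the approach, independent of the Web Audio API; the function name and chunk durations are illustrative:

```typescript
// Minimal sketch of gapless chunk scheduling: each chunk starts either
// when the previous one ends or immediately, whichever is later.
function scheduleChunks(chunkDurations: number[], currentTime: number): number[] {
  const startTimes: number[] = [];
  let nextStartTime = currentTime;
  for (const duration of chunkDurations) {
    // Never schedule in the past; catch up to "now" if we fell behind.
    const playTime = Math.max(nextStartTime, currentTime);
    startTimes.push(playTime);
    nextStartTime = playTime + duration; // the next chunk begins when this one ends
  }
  return startTimes;
}

// Three half-second chunks available at t = 0 play back to back.
console.log(scheduleChunks([0.5, 0.5, 0.5], 0)); // [ 0, 0.5, 1 ]
```

The `Math.max` guard matters when a chunk arrives late: playback resumes immediately rather than being scheduled at an already-elapsed timestamp.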


3. Voice Tones

Before diving into the functions, I defined different tones and voice profiles for the text-to-speech operations. Gemini TTS lets me narrate text with prebuilt voices.

In addition, I added a tone to the prompt to make the voice sound like Darth Vader or like an energetic person.

// text-to-audio/constants/tone.const.ts 

export const DARTH_VADER_TONE = `You are a voice-over specialist for an advanced Text-to-Speech engine.
Your goal is to generate text and formatting that mimics the voice of Darth Vader (James Earl Jones).
**Vocal Characteristics:**
1.  **Pitch:** Extremely low, resonant, and bass-heavy.
2.  **Cadence:** Slow, deliberate, and rhythmic. Never rush. Each word carries the weight of authority.
3.  **Timbre:** Authoritative, menacing, and slightly gravelly, but with perfect clarity.
4.  **Breathing:** Every 2-3 short sentences (or one long sentence), you must insert
a mechanical respirator sound marker: [Mechanical Breath: Inhale/Exhale]
Read the text below EXACTLY once:`;

export const LIGHT_TONE = `Speak with a high-pitched, infectious energy.
End every sentence with a rising, joyful intonation.
Sound incredibly eager to please, as if you’ve just won a prize.
Read the text below EXACTLY once:`;

DARTH_VADER_TONE makes the voice sound like Darth Vader, deep and with heavy mechanical breathing. LIGHT_TONE, on the other hand, makes the voice speak with an energetic, joyful mood.


4. Voice Configuration

import { GenerateContentConfig } from '@google/genai';

export function createVoiceConfig(voiceName = "Kore"): GenerateContentConfig {
    return {
      responseModalities: ["audio"],
      speechConfig: {
          voiceConfig: {
              prebuiltVoiceConfig: {
                  voiceName,
              },
          },
      },
    };
}

I defined a createVoiceConfig function to help build the voice configuration. voiceName is a prebuilt voice name, such as Kore or Puck. When no argument is provided, the default voice is Kore.

const KORE_VOICE_CONFIG = createVoiceConfig();

const PUCK_VOICE_CONFIG = createVoiceConfig("Puck");

5. Cloud Function That Generates Speech from Text

The back-end entry point uses the onCall trigger. A ternary expression decides, based on the client's capability, whether to serve a streaming response (acceptsStreaming is true) or a synchronous response (acceptsStreaming is false).

import { onCall } from "firebase-functions/v2/https";
import { readFactFunction, readFactFunctionStream } from "./read-fact";

const cors = process.env.WHITELIST ? process.env.WHITELIST.split(",") : true;
const options = {
    cors,
    enforceAppCheck: true,
    timeoutSeconds: 180,
};

export const readFact = onCall(options, ({ data, acceptsStreaming }, response) =>
    acceptsStreaming && response ? readFactFunctionStream(data, response) : readFactFunction(data),
);

6. Validating Environment Variables

import { logger } from "firebase-functions";
import { HttpsError } from "firebase-functions/v2/https";

export function validate(value: string | undefined, fieldName: string, missingKeys: string[]) {
    const err = `${fieldName} is missing.`;
    if (!value) {
        logger.error(err);
        missingKeys.push(fieldName);
        return "";
    }

    return value;
}

export function validateAudioConfigFields() {
    const env = process.env;
    const vertexai = (env.GOOGLE_GENAI_USE_VERTEXAI || "false") === "true";

    const missingKeys: string[] = [];
    const location = validate(env.GOOGLE_CLOUD_LOCATION, "Vertex Location", missingKeys);
    const model = validate(env.GEMINI_TTS_MODEL_NAME, "Gemini TTS Model Name", missingKeys);
    const project = validate(env.GOOGLE_CLOUD_QUOTA_PROJECT, "Google Cloud Project", missingKeys);

    if (missingKeys.length > 0) {
        throw new HttpsError("failed-precondition", `Missing environment variables: ${missingKeys.join(", ")}`);
    }

    return {
        genAIOptions: {
            project,
            location,
            vertexai,
        },
        model,
    };
}

The validateAudioConfigFields function verifies that the Gemini TTS model name, the Google Cloud project ID, and the Google Cloud location are defined. When the missingKeys list is not empty, it throws an error and halts the TTS operation.
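The collect-then-throw pattern above can be exercised on its own. The sketch below uses a hypothetical `requireEnv` helper mirroring `validate` (minus logging), and the model name value is just a placeholder:

```typescript
// Sketch of the collect-then-throw validation pattern: instead of failing on
// the first missing variable, every missing key is recorded so a single error
// can report them all at once.
function requireEnv(value: string | undefined, fieldName: string, missingKeys: string[]): string {
  if (!value) {
    missingKeys.push(fieldName);
    return "";
  }
  return value;
}

function collectMissing(env: Record<string, string | undefined>): string[] {
  const missingKeys: string[] = [];
  requireEnv(env["GOOGLE_CLOUD_LOCATION"], "Vertex Location", missingKeys);
  requireEnv(env["GEMINI_TTS_MODEL_NAME"], "Gemini TTS Model Name", missingKeys);
  requireEnv(env["GOOGLE_CLOUD_QUOTA_PROJECT"], "Google Cloud Project", missingKeys);
  return missingKeys;
}

// Only the model name is set, so the other two keys are reported together.
console.log(collectMissing({ GEMINI_TTS_MODEL_NAME: "some-tts-model" }));
// → [ 'Vertex Location', 'Google Cloud Project' ]
```

Reporting all missing keys in one HttpsError saves the operator several deploy-and-retry cycles compared with fail-fast validation.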


7. L16 to WAV Conversion Utilities

import { HttpsError } from "firebase-functions/v2/https";

export type WavConversionOptions = {
    numChannels: number;
    sampleRate: number;
    bitsPerSample: number;
};

export function encodeBase64String(rawData: string, mimeType: string): string {
  const wavBuffer = convertToWav(rawData, mimeType);
  const base64Data = Buffer.from(wavBuffer).toString("base64");
  return `data:audio/wav;base64,${base64Data}`;
}

export function convertToWav(rawData: string, mimeType: string): Buffer<ArrayBuffer> {
    const options = parseMimeType(mimeType);
    const wavHeader = createWavHeader(rawData.length, options);
    const buffer = Buffer.from(rawData, "base64");

    return Buffer.concat([wavHeader, buffer]);
}

export function parseMimeType(mimeType: string) {
    const [fileType, ...params] = mimeType.split(";").map((s) => s.trim());
    const format = fileType.split("/")[1];

    const options: Partial<WavConversionOptions> = {
        numChannels: 1,
    };

    if (format && format.startsWith("L")) {
        const bits = parseInt(format.slice(1), 10);
        if (!isNaN(bits)) {
            options.bitsPerSample = bits;
        }
    }

    for (const param of params) {
        const [key, value] = param.split("=").map((s) => s.trim());
        if (key === "rate") {
            options.sampleRate = parseInt(value, 10);
        }
    }

    if (!isWavConversionOptions(options)) {
        throw new HttpsError(
            "invalid-argument",
            `Invalid or incomplete mimeType: "${mimeType}". ` +
                "Could not determine all required WAV options (sampleRate, bitsPerSample).",
        );
    }

    return options;
}

function isWavConversionOptions(options: Partial<WavConversionOptions>): options is WavConversionOptions {
    // A valid WavConversionOptions object must have all properties as valid numbers.
    return (
        typeof options.numChannels === "number" &&
        !isNaN(options.numChannels) &&
        typeof options.sampleRate === "number" &&
        !isNaN(options.sampleRate) &&
        typeof options.bitsPerSample === "number" &&
        !isNaN(options.bitsPerSample)
    );
}

export function createWavHeader(dataLength: number, options: WavConversionOptions) {
    const { numChannels, sampleRate, bitsPerSample } = options;

    const byteRate = (sampleRate * numChannels * bitsPerSample) / 8;
    const blockAlign = (numChannels * bitsPerSample) / 8;
    const buffer = Buffer.alloc(44);

    buffer.write("RIFF", 0); // ChunkID
    buffer.writeUInt32LE(36 + dataLength, 4); // ChunkSize
    buffer.write("WAVE", 8); // Format
    buffer.write("fmt ", 12); // Subchunk1ID
    buffer.writeUInt32LE(16, 16); // Subchunk1Size (PCM)
    buffer.writeUInt16LE(1, 20); // AudioFormat (1 = PCM)
    buffer.writeUInt16LE(numChannels, 22); // NumChannels
    buffer.writeUInt32LE(sampleRate, 24); // SampleRate
    buffer.writeUInt32LE(byteRate, 28); // ByteRate
    buffer.writeUInt16LE(blockAlign, 32); // BlockAlign
    buffer.writeUInt16LE(bitsPerSample, 34); // BitsPerSample
    buffer.write("data", 36); // Subchunk2ID
    buffer.writeUInt32LE(dataLength, 40); // Subchunk2Size

    return buffer;
}

The audio produced by the Gemini TTS model has the MIME type audio/L16. The HTML audio player element does not support this MIME type, so a conversion from L16 to WAV is required.

encodeBase64String builds a WAV buffer with a valid WAV header from the raw data and MIME type. It prepends data:audio/wav;base64, to the buffer to form an encoded base64 data URL.

createWavHeader builds a 44-byte WAV header.

convertToWav calls createWavHeader to build the WAV header and concatenates it with the decoded base64 buffer to produce the complete WAV buffer.
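To see the header math in action, the sketch below recomputes the 44-byte header from section 7 for a stereo 44.1 kHz, 16-bit stream and reads back a few derived fields (the parameters are illustrative; Gemini typically reports mono PCM via a MIME type like audio/L16;rate=24000):

```typescript
// Recomputes the 44-byte RIFF/WAVE header and verifies the derived fields:
// byteRate = sampleRate * channels * bits / 8, blockAlign = channels * bits / 8.
function buildWavHeader(dataLength: number, numChannels: number, sampleRate: number, bitsPerSample: number): Buffer {
  const byteRate = (sampleRate * numChannels * bitsPerSample) / 8;
  const blockAlign = (numChannels * bitsPerSample) / 8;
  const buffer = Buffer.alloc(44);
  buffer.write("RIFF", 0);
  buffer.writeUInt32LE(36 + dataLength, 4); // file size minus the first 8 bytes
  buffer.write("WAVE", 8);
  buffer.write("fmt ", 12);
  buffer.writeUInt32LE(16, 16);             // PCM fmt chunk size
  buffer.writeUInt16LE(1, 20);              // format tag: 1 = PCM
  buffer.writeUInt16LE(numChannels, 22);
  buffer.writeUInt32LE(sampleRate, 24);
  buffer.writeUInt32LE(byteRate, 28);
  buffer.writeUInt16LE(blockAlign, 32);
  buffer.writeUInt16LE(bitsPerSample, 34);
  buffer.write("data", 36);
  buffer.writeUInt32LE(dataLength, 40);
  return buffer;
}

const header = buildWavHeader(1000, 2, 44100, 16);
console.log(header.length);                  // 44
console.log(header.readUInt32LE(4));         // 1036 — ChunkSize: 36 + 1000
console.log(header.readUInt32LE(28));        // 176400 — byteRate: 44100 * 2 * 16 / 8
console.log(header.toString("ascii", 0, 4)); // "RIFF"
```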


8. Text-to-Speech Examples

import { GenerateContentConfig, GenerateContentResponse, GoogleGenAI } from "@google/genai";
import { HttpsError } from "firebase-functions/v2/https";

async function withAIAudio(callback: (ai: GoogleGenAI, model: string) => Promise<string | number[] | undefined>) {
    try {
        const variables = validateAudioConfigFields();
        if (!variables) {
            return "";
        }

        const { genAIOptions, model } = variables;
        const ai = new GoogleGenAI(genAIOptions);
        return await callback(ai, model);
    } catch (e) {
        console.error(e);
        if (e instanceof HttpsError) {
            throw e;
        }
        throw new HttpsError("internal", "An internal error occurred while setting up the AI client.", { originalError: (e as Error).message });
    }
}

withAIAudio is a higher-order function that runs the callback and returns a Promise<string | number[] | undefined>. The string is the encoded base64 data URL, the number[] is the WAV header bytes, and undefined means no audio was produced.

8.1. Serverless: One-Shot Inline Data Generation (Synchronous)

In the synchronous workflow, the readFactFunction function generates the entire audio part using KORE_VOICE_CONFIG and DARTH_VADER_TONE. The result is converted to a WAV buffer and returned as a base64 data URL.

export async function readFactFunction(text: string) {
    return withAIAudio((ai, model) => generateAudio({ ai, model }, text));
}

text is the obscure fact the Angular application passes to the Cloud Function to convert into speech.

function createAudioParams(model: string, contents: string, config?: GenerateContentConfig) {
    return {
        model,
        contents: {
            role: "user",
            parts: [
                {
                    text: contents,
                },
            ],
        },
        config,
    };
}

createAudioParams builds the content configuration containing the model name, the text, and the voice configuration.

function extractInlineAudioData(response: GenerateContentResponse): {
    rawData: string | undefined;
    mimeType: string | undefined;
} {
    const { data: rawData, mimeType } = response.candidates?.[0]?.content?.parts?.[0]?.inlineData ?? {};

    return { rawData, mimeType };
}

function getBase64DataUrl(response: GenerateContentResponse): string {
  const { rawData, mimeType } = extractInlineAudioData(response);

  if (rawData && mimeType) {
    return encodeBase64String(rawData, mimeType);
  }
  throw new Error("Audio generation failed: No audio data received.");
}

getBase64DataUrl extracts the raw data and MIME type from the inline data. It feeds both to the encodeBase64String function to return the encoded base64 data URL.
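The optional-chaining extraction can be verified against a mock response shape. The nested candidates/content/parts path below matches the code above; the payload values are made up:

```typescript
// Sketch of extracting inline audio data from a Gemini-style response.
// The deeply nested path is guarded with optional chaining, so any missing
// level yields undefined fields instead of a runtime error.
type MockResponse = {
  candidates?: { content?: { parts?: { inlineData?: { data?: string; mimeType?: string } }[] } }[];
};

function extractInline(response: MockResponse): { rawData?: string; mimeType?: string } {
  const { data: rawData, mimeType } = response.candidates?.[0]?.content?.parts?.[0]?.inlineData ?? {};
  return { rawData, mimeType };
}

// A well-formed response yields both fields...
console.log(extractInline({
  candidates: [{ content: { parts: [{ inlineData: { data: "AAAA", mimeType: "audio/L16;rate=24000" } }] } }],
})); // { rawData: 'AAAA', mimeType: 'audio/L16;rate=24000' }

// ...while an empty one degrades gracefully instead of throwing.
console.log(extractInline({})); // { rawData: undefined, mimeType: undefined }
```

The `?? {}` fallback is what lets a single destructuring statement handle both the happy path and every partially-missing shape.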

// AIAudio is not defined elsewhere in the post; its assumed shape groups the client and model name.
type AIAudio = { ai: GoogleGenAI; model: string };

async function generateAudio(aiTTS: AIAudio, text: string): Promise<string> {
  try {
    const { ai, model } = aiTTS;
    const contents = `${DARTH_VADER_TONE.trim()} ${text.trim()}`;

    const response = await ai.models.generateContent(
      createAudioParams(model, contents, KORE_VOICE_CONFIG)
    );

    return getBase64DataUrl(response);
  } catch (error) {
    console.error("Error generating audio:", error);
    throw error;
  }
}

The generateAudio function appends the text to the Darth Vader tone to form the prompt. It then uses the ai instance to generate the audio response. Finally, it passes the response to getBase64DataUrl to return the encoded base64 data URL.


8.2. Streaming Inline Data with a Dynamic WAV Header

For a better user experience, I send the audio chunks to the client as soon as they are generated. I accumulate the total byte length while iterating over the stream, and only at the end construct the WAV header from that length and the WAV options.

export async function readFactFunctionStream(text: string, response: CallableResponse<unknown>) {
    return withAIAudio((ai, model) => generateAudioStream({ ai, model }, text, response));
}

async function generateAudioStream(
    aiTTS: AIAudio,
    text: string,
    response: CallableResponse<unknown>,
): Promise<number[] | undefined> {
    try {
        const { ai, model } = aiTTS;
        const contents = `${LIGHT_TONE.trim()} ${text.trim()}`;

        const chunks = await ai.models.generateContentStream(createAudioParams(model, contents, PUCK_VOICE_CONFIG));

        let byteLength = 0;
        let options: WavConversionOptions | undefined = undefined;
        for await (const chunk of chunks) {
            const { rawData, mimeType } = extractInlineAudioData(chunk);
            if (!options && mimeType) {
                options = parseMimeType(mimeType);
                response.sendChunk({
                    type: "metadata",
                    payload: {
                        sampleRate: options.sampleRate,
                    },
                });
            }

            if (rawData && mimeType) {
                const buffer = Buffer.from(rawData, "base64");
                byteLength = byteLength + buffer.length;
                response.sendChunk({
                    type: "data",
                    payload: {
                        buffer,
                    },
                });
            }
        }

        if (options && byteLength > 0) {
            const header = createWavHeader(byteLength, options);
            return [...header];
        }

        return undefined;
    } catch (error) {
        console.error(error);
        throw error;
    }
}

generateAudioStream obtains the chunk stream from ai.models.generateContentStream. Each chunk is decoded into a buffer before being sent to the client. Before returning, the function constructs the WAV header from the accumulated byte length and the WAV options.
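The header-last protocol can be simulated end to end. The sketch below feeds two fake base64 chunks through the same accumulate-then-finalize loop (chunk contents and the `send` callback are illustrative stand-ins for `response.sendChunk`):

```typescript
// Simulates the streaming flow of section 8.2: data chunks are forwarded as
// they arrive, and only the final WAV header depends on the total size.
function streamAndFinalize(
  base64Chunks: string[],
  send: (chunk: Buffer) => void,
): { byteLength: number; chunkSize: number } {
  let byteLength = 0;
  for (const raw of base64Chunks) {
    const buffer = Buffer.from(raw, "base64"); // decode, as in the real loop
    byteLength += buffer.length;
    send(buffer); // in the real function: response.sendChunk({ type: "data", ... })
  }
  // RIFF ChunkSize = 36 + data length, the value written at byte offset 4.
  return { byteLength, chunkSize: 36 + byteLength };
}

const sent: Buffer[] = [];
const result = streamAndFinalize(
  [Buffer.from("abcd").toString("base64"), Buffer.from("efgh").toString("base64")],
  (chunk) => sent.push(chunk),
);
console.log(sent.length);        // 2 — both chunks forwarded before finalization
console.log(result.byteLength);  // 8
console.log(result.chunkSize);   // 44
```

Because only the header needs the total size, nothing forces the server to buffer audio: latency stays bounded by chunk generation, not by total duration.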


9. Angular Implementation

9.1 Config Service

import { Injectable } from '@angular/core';
import { FirebaseApp } from 'firebase/app';
import { Functions } from 'firebase/functions';
import { RemoteConfig } from 'firebase/remote-config';

@Injectable({
  providedIn: 'root'
})
export class ConfigService  {

    remoteConfig: RemoteConfig | undefined = undefined;
    firebaseApp: FirebaseApp | undefined = undefined;
    functions: Functions | undefined = undefined;

    loadConfig(firebaseApp: FirebaseApp, remoteConfig: RemoteConfig, functions: Functions) {
      this.firebaseApp = firebaseApp;
      this.remoteConfig = remoteConfig;
      this.functions = functions;
    }
}

ConfigService keeps a reference to Cloud Functions for later use in SpeechService.

loadConfig is called to perform the initialization inside provideAppInitializer.

9.2 Speech Service

SpeechService handles the network calls and reassembles the downloaded chunks into a single Blob.

// photo-panel/blob.util.ts
export function constructBlobURL(parts: BlobPart[]) {
  return URL.createObjectURL(new Blob(parts, { type: 'audio/wav' }));
}

// types/stream-message.type.ts
export type SerializedBuffer = {
  type: 'Buffer';
  data: number[];
}

export type StreamMessage =
    | {
        type: "metadata";
        payload: {
          sampleRate: number;
        }
      }
    | {
        type: "data";
        payload: {
          buffer: SerializedBuffer,
        }
      };

// services/speech.service.ts
import { constructBlobURL } from '@/photo-panel/blob.util';
import { inject, Injectable } from '@angular/core';
import { Functions, httpsCallable } from 'firebase/functions';
import { StreamMessage } from '../types/stream-message.type';
import { ConfigService } from './config.service';

@Injectable({
  providedIn: 'root'
})
export class SpeechService  {
    private configService = inject(ConfigService);

    private get functions(): Functions {
      if (!this.configService.functions) {
        throw new Error('Firebase Functions has not been initialized.');
      }
      return this.configService.functions;
    }

    async generateAudio(text: string) {
      const readFactFunction = httpsCallable<string, string>(
        this.functions, 'textToAudio-readFact'
      );

      const { data: audioUri } = await readFactFunction(text);
      return audioUri;
    }

    async generateAudioStream(text: string) {
      const readFactStreamFunction = httpsCallable<string, number[] | undefined, StreamMessage>(
        this.functions, 'textToAudio-readFact'
      );

      return readFactStreamFunction.stream(text);
    }

    async generateAudioBlobURL(text: string) {
      const { stream, data } = await this.generateAudioStream(text);

      const audioParts: Uint8Array[] = [];
      for await (const audioChunk of stream) {
        if (audioChunk && audioChunk.type === 'data') {
          audioParts.push(new Uint8Array(audioChunk.payload.buffer.data));
        }
      }

      const wavHeader = await data;
      if (wavHeader && wavHeader.length) {
        audioParts.unshift(new Uint8Array(wavHeader));
      }

      return constructBlobURL(audioParts);
    }
}

generateAudio calls the Cloud Function textToAudio-readFact directly and receives an encoded base64 data URL.

generateAudioStream calls the same function but expects a streaming result from the server.

generateAudioBlobURL downloads all chunks into the audioParts array inside the for await loop. await data waits for the WAV header and inserts it at the front of the audioParts array. The constructBlobURL helper then builds a Blob URL from the parts.
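The reassembly step (header first, then data chunks in arrival order) can be modeled without the browser Blob API. The sketch below only checks ordering and byte counts; the 44-byte header and chunk sizes are illustrative:

```typescript
// Models the client-side reassembly in generateAudioBlobURL: data chunks are
// collected in arrival order, then the WAV header is unshifted to the front.
function assembleParts(header: number[], chunks: number[][]): Uint8Array[] {
  const parts: Uint8Array[] = chunks.map((c) => new Uint8Array(c));
  parts.unshift(new Uint8Array(header)); // the header must lead the WAV byte stream
  return parts;
}

const parts = assembleParts(new Array(44).fill(0), [[1, 2, 3], [4, 5]]);
console.log(parts.length);     // 3 — header + two data chunks
console.log(parts[0].length);  // 44 — the header comes first
const totalBytes = parts.reduce((sum, p) => sum + p.length, 0);
console.log(totalBytes);       // 49
```

In the real service these parts feed straight into `new Blob(parts, { type: 'audio/wav' })`, so getting the order right here is exactly what makes the resulting Blob URL playable.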

9.3 Audio Player Service

import { SpeechService } from '@/ai/services/speech.service';
import { inject, Injectable, OnDestroy, signal } from '@angular/core';

const INT16_MAX_VALUE = 32768;

@Injectable({
  providedIn: 'root'
})
export class AudioPlayerService implements OnDestroy  {
  private audioCtx: AudioContext | undefined = undefined;
  private speechService = inject(SpeechService);

  private nextStartTime = 0;
  private activeSources: AudioBufferSourceNode[] = [];

  playbackRate = signal(1);

  async playStream(text: string) {
      this.stopAll();

      this.playbackRate.set(this.setRandomPlaybackRate());

      try {
        const { stream } = await this.speechService.generateAudioStream(text);
        for await (const audioChunk of stream) {
          if (audioChunk.type === 'metadata') {
            this.initializeAudioContext(audioChunk.payload.sampleRate);
          } else if (audioChunk.type === 'data') {
            if (!this.audioCtx) {
              console.warn("Audio data received before metadata. Skipping chunk.");
              continue;
            }
            this.processChunk(audioChunk.payload.buffer.data);

          }
        }
      } catch (error) {
        console.error('Failed to play audio stream:', error);
        // Optionally, reset state or notify the user
        this.stopAll();
      }
  }

  private stopAll() {
    this.activeSources.forEach(sourceNode => {
      try {
        sourceNode.stop();
        sourceNode.disconnect();
      } catch (e) {
        // It's common for stop() to be called on a node that has already finished.
        // We can safely ignore these "InvalidStateError" exceptions.
      }
    });

    this.activeSources = [];
    this.nextStartTime = 0;
    if (this.audioCtx) {
      this.audioCtx?.close();
      this.audioCtx = undefined;
    }
  }

  private processChunk(rawData: number[]) {
    const float32Data = this.normalizeSoundSamples(rawData);
    if (float32Data.length === 0 || !this.audioCtx) {
      return;
    }

    const buffer = this.audioCtx.createBuffer(1, float32Data.length, this.audioCtx.sampleRate);
    buffer.copyToChannel(float32Data, 0);

    const sourceNode = this.connectSource(buffer);
    sourceNode.playbackRate.value = this.playbackRate();

    const playTime = Math.max(this.nextStartTime, this.audioCtx.currentTime);
    sourceNode.start(playTime);

    const actualDuration = buffer.duration / this.playbackRate();
    this.nextStartTime = playTime + actualDuration;
  }

  private normalizeSoundSamples(rawData: number[]) {
    const rawBuffer = new Uint8Array(rawData).buffer;

    const byteLength = rawBuffer.byteLength % 2 === 0 ? rawBuffer.byteLength : rawBuffer.byteLength - 1;
    if (byteLength === 0) {
      return new Float32Array(0);
    }

    const int16Data = new Int16Array(rawBuffer, 0, byteLength / 2);
    const float32Data = new Float32Array(int16Data.length);
    for (let i = 0; i < int16Data.length; i++) {
      float32Data[i] = (int16Data[i] * 1.0) / INT16_MAX_VALUE;
    }
    return float32Data;
  }

  private setRandomPlaybackRate(min = 0.85, max = 1.3) {
    const rawRate = Math.random() * (max - min) + min;
    return Math.round(rawRate * 100) / 100;
  }

  private connectSource(buffer: AudioBuffer) {
    if (!this.audioCtx) {
      throw new Error("Audio context is not initialized.");
    }

    const sourceNode = this.audioCtx.createBufferSource();
    sourceNode.buffer = buffer;
    sourceNode.connect(this.audioCtx.destination);

    this.activeSources.push(sourceNode);

    // Cleanup: Remove from array when this specific chunk finishes playing
    sourceNode.onended = () => {
      const index = this.activeSources.indexOf(sourceNode);
      if (index >= 0) {
        this.activeSources.splice(index, 1);
      }
    };

    return sourceNode;
  }

  private initializeAudioContext(sampleRate: number) {
    // Ensure any old context is closed before creating a new one.
    if (this.audioCtx) {
      this.audioCtx.close();
    }

    this.audioCtx = new AudioContext({ sampleRate });
    if (this.audioCtx.state === 'suspended') {
      this.audioCtx.resume();
    }

    this.nextStartTime = this.audioCtx.currentTime;
  }

  ngOnDestroy(): void {
    this.stopAll();
  }
}

Browsers generally block an AudioContext from starting until a user interaction occurs.
if (this.audioCtx.state === 'suspended') { this.audioCtx.resume(); } resumes the AudioContext whenever it has been suspended.

playStream reuses this.speechService.generateAudioStream(text) to obtain the chunk stream. processChunk converts the raw data into a Float32Array, creates an AudioBuffer, connects the buffer to a source node, and adds the node to the AudioContext. Each node is scheduled to play at nextStartTime to avoid overlaps. playStream deliberately ignores the WAV header to reduce latency.

ngOnDestroy 生命週期掛鉤中,所有來源節點都會停止播放並從 AudioContext 斷開連接。這樣做是為了避免記憶體洩漏,特別是當使用者在所有節點播放並移除之前點擊 Web Audio API 按鈕產生新緩衝區時。

9.4 Component Interaction: ObscureFactComponent and TextToSpeechComponent

The parent component ObscureFactComponent orchestrates the generated text and passes the produced URI to a native audio element.

// photo-panel/blob.util.ts
import { Signal } from '@angular/core';

function isValidBlobUrl(url: string) {
  try {
    const parsed = new URL(url);
    return parsed.protocol === 'blob:';
  } catch (e) {
    console.error(e);
    return false;
  }
}

export function revokeBlobURL(dataUrl: Signal<string | undefined>) {
  const blobUrl = dataUrl();
  if (blobUrl && isValidBlobUrl(blobUrl)) {
    URL.revokeObjectURL(blobUrl);
  }
}
// generate-audio.util.ts

import { signal, WritableSignal } from '@angular/core';

export const ttsError = signal('');

export type GenerateSpeechMode = 'sync' | 'stream' | 'web_audio_api';

export async function generateSpeechHelper(
  text: string,
  loadingSignal: WritableSignal<boolean>,
  urlSignal: WritableSignal<string | undefined>,
  speechFn: (fact: string) => Promise<string>
) {

  try {
    ttsError.set('');
    loadingSignal.set(true);
    const uri = await speechFn(text);
    urlSignal.set(uri);
  } catch (e) {
    console.error(e);
    ttsError.set('Error generating speech from text.');
  } finally {
    loadingSignal.set(false);
  }
}

export async function streamSpeechWithWebAudio(
    text: string,
    loadingSignal: WritableSignal<boolean>,
    webAudioApiFn: (text: string) => Promise<void>
) {
    try {
      loadingSignal.set(true);
      await webAudioApiFn(text);
    } catch (e) {
      console.error(e);
      ttsError.set('Error streaming speech using the Web Audio API.');
    } finally {
      loadingSignal.set(false);
    }
}
<!-- obscure-fact.component.html -->
<div>
    <h3>A surprising or obscure fact about these tags</h3>
    @if (interestingFact()) {
      <p>{{ interestingFact() }}</p>

      <app-error-display [error]="ttsError()" />

      <app-text-to-speech
        [isLoadingSync]="isLoadingSync()"
        [isLoadingStream]="isLoadingStream()"
        [isLoadingWebAudio]="isLoadingWebAudio()"
        [audioUrl]="audioUrl()"
        (generateSpeech)="generateSpeech($event)"
        [playbackRate]="playbackRate()"
      />
    } @else {
      <p>There are no interesting or obscure facts about these tags.</p>
    }
</div>

When the user clicks any button in TextToSpeechComponent, the generateSpeech custom event sends a GenerateSpeechMode to ObscureFactComponent. ObscureFactComponent runs its generateSpeech method, checks the value of mode, and generates the speech accordingly.

import { SpeechService } from '@/ai/services/speech.service';
import { ErrorDisplayComponent } from '@/error-display/error-display.component';
import { ChangeDetectionStrategy, Component, inject, input, OnDestroy, signal } from '@angular/core';
import { revokeBlobURL } from '../blob.util';
import { generateSpeechHelper, GenerateSpeechMode, streamSpeechWithWebAudio, ttsError } from './generate-audio.util';
import { AudioPlayerService } from './services/audio-player.service';
import { TextToSpeechComponent } from './text-to-speech/text-to-speech.component';

@Component({
  selector: 'app-obscure-fact',
  templateUrl: './obscure-fact.component.html',
  imports: [
    TextToSpeechComponent,
    ErrorDisplayComponent
  ],
  changeDetection: ChangeDetectionStrategy.OnPush,
})
export class ObscureFactComponent implements OnDestroy {
  interestingFact = input<string | undefined>(undefined);

  speechService = inject(SpeechService);
  audioPlayerService = inject(AudioPlayerService);

  audioUrl = signal<string | undefined>(undefined);
  playbackRate = this.audioPlayerService.playbackRate;

  ttsError = ttsError;

  async generateSpeech({ mode }: { mode: GenerateSpeechMode }) {
    const fact = this.interestingFact();
    if (fact) {
      revokeBlobURL(this.audioUrl);
      this.audioUrl.set(undefined);

      if (mode === 'sync' || mode === 'stream') {
        const loadingSignal = mode === 'stream' ? this.isLoadingStream : this.isLoadingSync;
        const speechFn = (text: string) => mode === 'stream' ?
            this.speechService.generateAudioBlobURL(text) :
            this.speechService.generateAudio(text);
        await generateSpeechHelper(fact, loadingSignal, this.audioUrl, speechFn);
      } else if (mode === 'web_audio_api') {
        await streamSpeechWithWebAudio(
          fact,
          this.isLoadingWebAudio,
          (text: string) => this.audioPlayerService.playStream(text));
      }
    }
  }

  ngOnDestroy(): void {
    revokeBlobURL(this.audioUrl);
  }
}

Conclusion

By combining Angular with Firebase Gen 2 streaming functions, we can build smooth AI-powered speech experiences. Using Gemini 3 Flash Preview for text generation and Gemini 2.5 Flash for character-flavored TTS (as in our Darth Vader example) makes truly immersive web applications possible.

