第一次學 Kotlin Koog AI 就上手 Day 17：增加使用者體驗：流式處理與即時回應

2025 iThome 鐵人賽

DAY 17

生成式 AI

第一次學 Kotlin Koog AI 就上手系列第 17 篇

17th鐵人賽

cashwugeek

2025-08-31 00:01:30

373 瀏覽

分享至

在實際的用戶互動中，等待 AI 完整回應往往會讓用戶感到焦急。今天我們要學習 Koog 框架的流式處理功能，實現即時的 AI 回應，大幅提升用戶體驗。透過 executeStreaming API，我們將讓智慧客服助手能夠像真人對話一樣，逐字逐句地即時回應用戶

流式處理的重要性

傳統批量處理 vs 流式處理

在傳統的 AI 應用中，我們習慣等待模型完全生成回應後再顯示給用戶

// 傳統批量處理方式
val response = executor.execute(prompt, model)
println(response.content) // 等待完整回應後一次性顯示

這種方式雖然簡單，但存在明顯的用戶體驗問題

長時間等待：複雜問題可能需要等待 10-30 秒
無回饋感：用戶不知道系統是否正在處理
互動性差：無法提前預覽和打斷回應

流式處理則能夠

即時回饋：文字逐字產生，如同真人打字
降低感知延遲：用戶能立即看到回應開始
更好的互動性：可以提前理解回應內容

Koog 框架中的流式處理架構

Koog 框架通過 Kotlin 的 Flow API 提供了優雅的流式處理支援

interface LLMClient {
    // 傳統同步執行
    abstract suspend fun execute(prompt: Prompt, model: LLModel, tools: List<ToolDescriptor>): List<Message.Response>

    // 流式執行 - 返回 Flow<String>
    abstract fun executeStreaming(prompt: Prompt, model: LLModel): Flow<String>
}

基本流式處理實作

簡單的流式聊天範例

讓我們從最基本的流式處理開始

suspend fun main() {

    // 建立執行器
    val executor = simpleOpenAIExecutor(ApiKeyManager.openAIApiKey!!)

    // 建立提示
    val prompt = prompt("streaming") {
        system("你是一個友善的 AI 助手，請使用正體中文回答問題")
        user("請簡單的說明，什麼是 Kotlin 的協程")
    }

    // 流式執行
    println("AI 正在回應...")
    executor.executeStreaming(prompt, OpenAIModels.CostOptimized.GPT4_1Mini)
        .collect { token ->
            // 即時輸出每個文字片段
            print(token)
        }

    println("\n回應完成！")
}

執行 AI 回應內容

直接從結果是看不太出來 streaming 的效果 XD，它會幾個字幾個字地輸出，讓使用者可以及時看到 AI 的回應進度

AI 正在回應...
Kotlin 的協程（Coroutine）是一種用來進行非同步程式設計的輕量級工具。它可以讓你用同步的寫法寫出非同步的程式碼，讓程式更簡潔且易於閱讀。協程可以暫停執行（掛起），等候某些操作完成後再繼續，而不會阻塞整個線程，提升效能並減少資源浪費。簡單來說，協程讓異步任務的撰寫變得像寫同步程式一樣直覺方便。
回應完成！

流式處理的錯誤處理

在網路環境中，流式處理更容易遇到中斷問題，因此需要完善的錯誤處理

class RobustStreamingChat(private val executor: PromptExecutor) {

    fun streamWithRetry(
        prompt: Prompt,
        model: LLModel,
        maxRetries: Int = 3
    ): Flow<String> = flow {
        var attempt = 0
        var success = false

        while (attempt <= maxRetries && !success) {
            try {
                executor.executeStreaming(prompt, model).collect { token ->
                    emit(token)

                    // 模擬出錯的情況
                    delay(100L)
                    throw SocketTimeoutException("Dummy timeout")
                }
                success = true
            } catch (e: Exception) {
                attempt++
                if (attempt < maxRetries) {
                    emit("\n[連線中斷，正在重新嘗試... ($attempt/$maxRetries)]\n")
                    // 指數退避
                    delay(1000L * attempt)
                } else {
                    emit("\n[連線失敗，請稍後再試... ($attempt/$maxRetries)]\n")
                    throw e
                }
            }
        }
    }
}

錯誤處理使用範例

suspend fun main() {

    // 建立執行器
    val executor = simpleOpenAIExecutor(ApiKeyManager.openAIApiKey!!)

    // 建立提示
    val prompt = prompt("streaming") {
        system("你是一個友善的 AI 助手，請使用正體中文回答問題")
        user("請簡單的說明，什麼是 Kotlin 的協程")
    }

    // 流式執行
    println("AI 正在回應...")
    val robustStreamingChat = RobustStreamingChat(executor)
    robustStreamingChat.streamWithRetry(prompt, OpenAIModels.CostOptimized.GPT4_1Mini)
        .collect { token ->
            // 即時輸出每個文字片段
            print(token)
        }

    println("\n回應完成！")
}

執行 AI 回應內容

AI 正在回應...

[連線中斷，正在重新嘗試... (1/3)]

[連線中斷，正在重新嘗試... (2/3)]

Exception in thread "main" [連線失敗，請稍後再試... (3/3)]
java.lang.IllegalStateException: Error from OpenAI API: 200 OK: Dummy timeout

效能優化技巧

基本效能追蹤

我們可以建立簡單的效能監控來追蹤流式處理的表現

class SimpleStreamingMonitor {

    data class StreamingStats(
        val totalTokens: Int,
        val duration: Duration,
        val firstTokenDelay: Duration
    )

    // 監控流式處理效能
    fun Flow<String>.withPerformanceTracking(): Flow<String> = flow {
        val startTime = TimeSource.Monotonic.markNow()
        var firstTokenTime: Duration? = null
        var tokenCount = 0

        collect { token ->
            tokenCount++

            // 記錄第一個 token 的時間
            if (firstTokenTime == null) {
                firstTokenTime = startTime.elapsedNow()
            }

            emit(token)
        }

        // 輸出統計資訊
        val totalDuration = startTime.elapsedNow()
        val stats = StreamingStats(
            totalTokens = tokenCount,
            duration = totalDuration,
            firstTokenDelay = firstTokenTime ?: Duration.ZERO
        )

        println("\n=== 效能統計 ===")
        println("總 Token 數：${stats.totalTokens}")
        println("總耗時：${stats.duration}")
        println("首 Token 延遲：${stats.firstTokenDelay}")
    }
}

基本效能追蹤使用範例

suspend fun main() {

    // 建立執行器
    val executor = simpleOpenAIExecutor(ApiKeyManager.openAIApiKey!!)

    // 建立提示
    val prompt = prompt("streaming") {
        system("你是一個友善的 AI 助手，請使用正體中文回答問題")
        user("請簡單的說明，什麼是 Kotlin 的協程")
    }

    // 流式執行
    println("AI 正在回應...")

    with(SimpleStreamingMonitor()) {
        executor.executeStreaming(prompt, OpenAIModels.CostOptimized.GPT4_1Mini)
            .withPerformanceTracking()
            .collect { token ->
                // 即時輸出每個文字片段
                print(token)
            }
    }

    println("\n回應完成！")
}

執行 AI 回應內容

AI 正在回應...
Kotlin 的協程（Coroutine）是一種輕量級的非同步程式設計工具，讓你可以用類似同步的方式撰寫非同步或並行的程式碼。它可以在不阻塞主執行緒的情況下，進行耗時操作（例如網路請求、檔案讀寫），提升效能和使用者體驗。協程能夠暫停和恢復執行，讓程式更容易閱讀和維護。簡單來說，協程就是用來簡化非同步程式碼寫法的技術。
=== 效能統計 ===
總 Token 數：130
總耗時：3.520779625s
首 Token 延遲：1.451959625s

回應完成！

超時管理

確保流式處理在合理時間內完成

class StreamingWithTimeout(private val executor: PromptExecutor) {

    suspend fun execute(
        prompt: Prompt,
        model: LLModel,
        timeoutSeconds: Long = 5
    ): Flow<String> = flow {
        try {
            withTimeout(timeoutSeconds * 1000) {
                executor.executeStreaming(prompt, model).collect { token ->

                    // 模擬超過時間
                    delay(6000)

                    emit(token)
                }
            }
        } catch (e: TimeoutCancellationException) {
            emit("\n[回應超時，請稍後再試]")
        } catch (e: IllegalStateException) {
            emit("\n[回應超時，請稍後再試]")
        }
    }
}

超時管理使用範例

suspend fun main() {

    // 建立執行器
    val executor = simpleOpenAIExecutor(ApiKeyManager.openAIApiKey!!)

    // 建立提示
    val prompt = prompt("streaming") {
        system("你是一個友善的 AI 助手，請使用正體中文回答問題")
        user("請簡單的說明，什麼是 Kotlin 的協程")
    }

    // 流式執行
    println("AI 正在回應...")

    val streamingWithTimeout = StreamingWithTimeout(executor)
    streamingWithTimeout.execute(prompt, OpenAIModels.CostOptimized.GPT4_1Mini)
        .collect { token ->
            // 即時輸出每個文字片段
            print(token)
        }

    println("\n回應完成！")
}

執行 AI 回應內容

6 秒後會呈現超時的訊息

AI 正在回應...

[回應超時，請稍後再試]
回應完成！

總結

今天我們學習了 Koog 框架的流式處理基礎功能

掌握 executeStreaming API：學會使用 Flow 處理即時回應
基本錯誤處理：處理網路中斷和連線問題
流式客服系統：建立支援多輪對話的即時客服助手
效能追蹤：監控 token 速率和回應延遲
超時管理：確保流式處理的穩定性

流式處理大幅提升了 AI 應用的用戶體驗，讓互動更加自然流暢。透過 Kotlin 的 Flow API，我們可以輕鬆實現複雜的即時回應功能。在下一篇文章中，我們將學習多 LLM 整合與智慧切換，探討如何在不同場景下選擇最適合的語言模型

參考資料

圖片來源：AI 產生

同步刊登於 Blog 第一次學 Kotlin Koog AI 就上手 Day 17：增加使用者體驗：流式處理與即時回應

第一次學 Kotlin Koog AI 就上手 Day 16：策略路由：根據問題類型選擇最適合的 Agent

第一次學 Kotlin Koog AI 就上手 Day 18：服務不中斷：建立智慧路由與容錯機制

系列文

第一次學 Kotlin Koog AI 就上手共 34 篇

RSS系列文訂閱系列文

5 人訂閱

完整目錄

熱門推薦

{{ item.channelVendor }} | {{ item.webinarstarted }} |

直播中

尚未有邦友留言

立即登入留言

參賽組數

902 組

團體組數

37 組

累計文章數

19867 篇

完賽人數

529 人

15th鐵人賽 16th鐵人賽 13th鐵人賽 14th鐵人賽 17th鐵人賽 12th鐵人賽 11th鐵人賽鐵人賽 2019鐵人賽 javascript 2018鐵人賽 python 2017鐵人賽 windows php c# linux windows server css react

IT邦幫忙

第一次學 Kotlin Koog AI 就上手系列 第 17 篇