在開發 AI 應用的過程中,你一定遇過這些狀況:API 突然超時、模型回應格式錯誤、Token 限制爆掉、服務暫時不可用...今天我們要深入探討如何優雅地處理這些錯誤,讓你的 AI 應用更加穩定可靠。
AI 應用與傳統應用最大的差異在於其不確定性:
import openai
from openai import OpenAI

# Shared client instance used by the examples below.
client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}]
    )
except openai.APIConnectionError as e:
    # Network-level failure; the underlying cause is chained on __cause__.
    print(f"網路連線失敗:{e.__cause__}")
    # Implement reconnection logic here
except openai.APITimeoutError as e:
    print(f"請求超時:{e}")
    # Consider increasing the timeout or simplifying the request
import time
from tenacity import retry, wait_exponential, stop_after_attempt


@retry(
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5)
)
def call_api_with_retry():
    """Call the chat completions API, retrying with exponential backoff.

    Returns:
        The chat completion response on success.

    Raises:
        openai.RateLimitError: re-raised (after an optional server-directed
            sleep) so the @retry decorator can schedule the next attempt.
    """
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "分析這段文字..."}]
        )
        return response
    except openai.RateLimitError as e:
        print(f"達到速率限制:{e.status_code}")
        # Honor the server-provided retry-after header when present.
        retry_after = e.response.headers.get("retry-after")
        if retry_after:
            time.sleep(int(retry_after))
        raise  # re-raise so the retry decorator handles it
import json
from typing import Dict, Any


class OutputParser:
    """Parse LLM output as JSON, asking the LLM to repair malformed output."""

    def __init__(self, max_retries: int = 3):
        # Maximum number of parse attempts, including repair rounds.
        self.max_retries = max_retries

    def parse_json_response(self, text: str, llm_client) -> Dict[str, Any]:
        """Try to parse *text* as JSON; on failure, ask the LLM to fix it.

        Args:
            text: the raw model output to parse.
            llm_client: an OpenAI-compatible client used for repair prompts;
                only touched when the initial parse fails.

        Returns:
            The parsed JSON object, or {} when max_retries <= 0.

        Raises:
            ValueError: if the text still fails to parse after max_retries.
        """
        for attempt in range(self.max_retries):
            try:
                return json.loads(text)
            except json.JSONDecodeError as e:
                if attempt == self.max_retries - 1:
                    # Final attempt failed — surface the parse error.
                    raise ValueError(f"無法解析 JSON: {e}")
                # Ask the LLM to repair the malformed JSON.
                repair_prompt = f"""
以下 JSON 格式有誤,請修正並只回傳正確的 JSON:
錯誤訊息:{str(e)}
原始內容:{text}
請只回傳修正後的 JSON,不要加任何說明。
"""
                response = llm_client.chat.completions.create(
                    model="gpt-4o-mini",
                    messages=[{"role": "user", "content": repair_prompt}],
                    temperature=0
                )
                text = response.choices[0].message.content
                print(f"嘗試修復 JSON (第 {attempt + 1} 次)")
        # Only reachable when max_retries <= 0.
        return {}
import httpx
from openai import OpenAI

# Fine-grained timeout control: each request phase gets its own limit.
client = OpenAI(
    timeout=httpx.Timeout(
        connect=5.0,  # connect timeout
        read=30.0,    # read timeout
        write=10.0,   # write timeout
        pool=2.0      # connection-pool timeout
    ),
    max_retries=3     # automatic retry count
)

# Override the client defaults for one specific request.
response = client.with_options(
    timeout=60.0,     # long documents need more processing time
    max_retries=5
).chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "處理這份長文件..."}]
)
from typing import Optional, List
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Static metadata for one candidate model in the fallback chain."""
    name: str           # API model identifier, e.g. "gpt-4o"
    max_tokens: int     # context-window limit for this model
    cost_per_1k: float  # cost per 1K tokens (USD)
class AIServiceWithFallback:
def __init__(self):
# 設定模型優先順序
self.models = [
ModelConfig("gpt-4o", 128000, 0.005),
ModelConfig("gpt-4o-mini", 128000, 0.00015),
ModelConfig("gpt-3.5-turbo", 16385, 0.0005)
]
self.client = OpenAI()
def complete(self, prompt: str, max_retries: int = 2) -> Optional[str]:
"""嘗試使用不同模型完成請求"""
errors = []
for model in self.models:
for attempt in range(max_retries):
try:
response = [self.client.chat](http://self.client.chat).completions.create(
model=[model.name](http://model.name),
messages=[{"role": "user", "content": prompt}],
max_tokens=min(1000, model.max_tokens)
)
print(f"成功使用 {[model.name](http://model.name)}")
return response.choices[0].message.content
except openai.APIStatusError as e:
error_msg = f"{[model.name](http://model.name)} 失敗 (嘗試 {attempt + 1}): {e}"
errors.append(error_msg)
print(error_msg)
# 如果是 token 限制錯誤,直接換模型
if e.status_code == 400 and "token" in str(e).lower():
break
# 其他錯誤等待後重試
if attempt < max_retries - 1:
time.sleep(2 ** attempt) # 指數退避
# 所有嘗試都失敗
raise Exception(f"所有模型都失敗:\n" + "\n".join(errors))
import logging
from datetime import datetime
from typing import Any, Dict
import json


class AIErrorMonitor:
    """Log AI-request errors to a file and keep in-memory error statistics."""

    def __init__(self, log_file="ai_errors.log"):
        # Configure file logging. NOTE: basicConfig is a no-op if the root
        # logger was already configured elsewhere in the application.
        logging.basicConfig(
            filename=log_file,
            level=logging.INFO,
            format='%(asctime)s - %(levelname)s - %(message)s'
        )
        self.logger = logging.getLogger(__name__)
        # Running counters, grouped by error type and by model.
        self.error_stats = {
            "total_requests": 0,
            "successful_requests": 0,
            "errors_by_type": {},
            "errors_by_model": {}
        }
錯誤處理是 AI 應用成熟度的重要指標。一個優秀的錯誤處理系統不僅能提升使用者體驗,還能節省開發時間和營運成本。記住,錯誤不可避免,但我們可以優雅地處理它們。