2025 iThome Ironman Contest

DAY 27

Generative AI

My AI Assistant Development series, part 27

愛瘸瘸

Team: nutc imac T1

2025-10-11 15:26:20

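Error Handling and Fault Tolerance

Introduction

In a complex AI application, error handling and fault tolerance are what keep the system stable. This post gives a brief tour of the error-handling strategies used in the project and shows how they combine into a robust AI application.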

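1. Error-Handling Architecture

1.1 Multi-Layer Error-Handling Strategy

The LangChain Agent system uses a layered error-handling architecture:

Application-level error handling
├── Startup error capture
├── Global exception handling
└── Graceful shutdown

Component-level error handling
├── LangChain initialization errors
├── API connection errors
├── File-processing errors
└── Agent parsing errors

Thread-level error handling
├── GUI thread error isolation
├── Agent thread fault tolerance
├── File-processing thread errors
└── Message queue error handling

Feature-level error handling
├── Per-tool error handling
├── Data validation errors
├── State management errors
└── User input errors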

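1.2 Application Startup Error Handling

if __name__ == "__main__":
    try:
        customtkinter.set_appearance_mode("dark")
        app = LangChainAgentApp()
        app.mainloop()
    except Exception as e:
        # Last-resort handler: report the failure and keep the console open
        print(f"Application startup error: {e}")
        input("Press Enter to exit...")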

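Characteristics of the startup handler:

Global capture: catches every exception raised during application startup
User-friendly: prints a clear error message and waits for Enter before exiting
Debugging support: the exception message is printed to the console for the developer (the traceback itself is not recorded)
Graceful exit: the program still terminates normally when startup fails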
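
The handler above prints only the exception message. As a minimal sketch, assuming a hypothetical `run_app` helper and a hypothetical log file name (neither appears in the original project), the full traceback could also be appended to a file so it survives after the console closes:

```python
import traceback


def run_app(app_factory, log_path="startup_error.log"):
    """Build and run the app; on failure, save the traceback and return 1."""
    try:
        app = app_factory()   # e.g. LangChainAgentApp in the original code
        app.mainloop()
        return 0
    except Exception as exc:
        # Short message for the user, full traceback for the developer.
        print(f"Application startup error: {exc}")
        with open(log_path, "a", encoding="utf-8") as fh:
            fh.write(traceback.format_exc())
        return 1
```

Returning an exit code instead of calling `input()` keeps the helper usable from scripts; the "press Enter" prompt can still be added by the caller.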

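2. API and Network Error Handling

2.1 Network Connection Error Handling

def open_paint_tool(self, input_text: str) -> str:
    """Open the Paint application."""
    print("[DEBUG] ===== open_paint_tool was called! =====")
    print(f"[DEBUG] Calling API: {LOCAL_API_BASE}/open-paint")
    try:
        response = requests.post(f"{LOCAL_API_BASE}/open-paint")
        response.raise_for_status()
        result = response.json()
        print(f"[DEBUG] API call succeeded: {result}")
        return result["message"]
    except requests.exceptions.ConnectionError:
        # The local server is unreachable (not started, wrong port, ...)
        print("[DEBUG] API connection failed")
        return "Error: cannot connect to the local API service. Please make sure the server is running."
    except Exception as e:
        # Covers HTTP errors from raise_for_status(), invalid JSON, and a missing "message" key
        print(f"[DEBUG] API call raised an error: {e}")
        return f"Error while opening Paint: {e}"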

2.2 Handling Specific Network Errors

def open_notepad_tool(self, input_text: str) -> str:
    """Open the Notepad application."""
    try:
        response = requests.post(f"{LOCAL_API_BASE}/open-notepad")
        response.raise_for_status()
        result = response.json()
        return result["message"]
    except requests.exceptions.ConnectionError:
        return "Error: cannot connect to the local API service. Please make sure the server is running."
    except Exception as e:
        return f"Error while opening Notepad: {e}"

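Network error-handling principles:

Specific exceptions first: ConnectionError is handled first so the message can be precise
Generic fallback: a catch-all Exception clause covers anything unexpected
User guidance: the message suggests a concrete fix (for example, check that the server is running)
Debug information: detailed debug messages are printed for the developer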
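
Connection failures to a local server are often transient (for example, the server is still starting up). One possible addition, not part of the original code, is a small retry helper with exponential backoff. The name `call_with_retry` and all of its parameters are assumptions made for illustration:

```python
import time


def call_with_retry(func, retry_on=(ConnectionError, TimeoutError),
                    attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); retry transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: let the caller's handler report it
            # Wait 0.5 s, 1 s, 2 s, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

With requests, the tuple passed as `retry_on` would be `(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`, and the wrapped call would typically pass `timeout=` to `requests.post` so that a hung server cannot block the worker thread indefinitely.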

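2.3 API Quota and Rate-Limit Error Handling

except Exception as e:
    print(f"[DEBUG] generate_content_tool raised an error: {type(e).__name__}: {e}")
    # Tell the user when the API quota has been exhausted
    if "quota" in str(e).lower() or "429" in str(e):
        return f"Content generation is temporarily unavailable (API quota exhausted). Please retry later or consider upgrading your API plan. Original error: {e}"
    else:
        return f"Error while generating content: {e}"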

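API quota error handling:

Keyword detection: quota errors are recognized by the word "quota" or the status code "429" in the error text
User guidance: the message suggests concrete next steps (retry later or upgrade the plan)
Original error preserved: the full error text is appended for advanced users
Graceful degradation: the tool returns a message instead of raising, so the rest of the agent keeps working while the API is unavailable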
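
Matching on the error text alone can misfire (any message that happens to contain "429" would qualify). A slightly more defensive sketch checks for a structured status code first and only then falls back to keywords. The attribute names `status_code` and `code` are assumptions; which one exists, if any, depends on the client library raising the exception:

```python
def is_quota_error(exc):
    """Best-effort check for quota / rate-limit errors."""
    # Prefer a structured status code when the exception carries one.
    for attr in ("status_code", "code"):
        if getattr(exc, attr, None) == 429:
            return True
    # Fall back to keyword matching, as the original code does.
    text = str(exc).lower()
    return "quota" in text or "429" in text
```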

3. LangChain Agent Error Handling

3.1 Agent Initialization Error Handling

def _initialize_langchain(self):
    """Initialize the LangChain components."""
    try:
        # Gemini API settings
        api_key = os.getenv('API_KEY')

        # Create the custom LLM
        self.gemini_llm = GeminiLLM(api_key=api_key, model_name=os.environ.get("GEMINI_MODEL", "models/gemini-2.5-flash"))

        # Initialize the system tools
        self.system_tools = SystemTools(app_instance=self)

        # Create the Agent...

    except Exception as e:
        print(f"[ERROR] LangChain initialization failed: {e}")
        # Leave both attributes as None so later code can detect the failure
        self.agent = None
        self.system_tools = None

3.2 API Key Validation and Error Handling

def __init__(self, app_instance=None):
    self.opencc_converter = OpenCC('s2t')
    self.app_instance = app_instance
    # Initialize Gemini for content generation:
    # prefer the environment variable, then fall back to the key held by the app's LLM
    api_key = os.environ.get("GEMINI_API_KEY") or (getattr(app_instance, "gemini_llm", None) and getattr(app_instance.gemini_llm, "api_key", None))
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY not found. Set the environment variable and run again.")
    genai.configure(api_key=api_key)
    model_name = os.environ.get("GEMINI_MODEL") or (getattr(app_instance, "gemini_llm", None) and getattr(app_instance.gemini_llm, "model_name", "models/gemini-2.5-flash")) or "models/gemini-2.5-flash"
    self.content_model = genai.GenerativeModel(model_name)

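Initialization error-handling strategy:

Environment variable validation: the required API key must be present before anything else is configured
Explicit error message: the RuntimeError names the exact variable the user has to set
Safe fallback: when initialization fails, agent and system_tools are set to None
State marker: later code checks those attributes before using the components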
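
The environment lookup and its fallbacks can also be gathered into one small function that fails early with a clear message. This is a sketch under assumptions: `GeminiSettings` and `load_settings` are hypothetical names, and only the two environment variables and the default model string come from the code above.

```python
from dataclasses import dataclass

DEFAULT_MODEL = "models/gemini-2.5-flash"


@dataclass(frozen=True)
class GeminiSettings:
    api_key: str
    model_name: str


def load_settings(env):
    """Build settings from a mapping such as os.environ, or raise early."""
    api_key = (env.get("GEMINI_API_KEY") or "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set. Set it and restart.")
    return GeminiSettings(api_key, env.get("GEMINI_MODEL") or DEFAULT_MODEL)
```

Taking the mapping as a parameter (rather than reading `os.environ` directly) makes the validation easy to exercise with a plain dict.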

3.3 Agent Execution Error Handling

def get_agent_response(self, question):
    """Handle a question with the Agent."""
    if not self.agent:
        self.message_queue.put(("remove_waiting", ""))
        self.message_queue.put(("Agent", "Agent initialization failed. Please check the settings."))
        return

    try:
        # Let the Agent handle the question
        response = self.agent.run(question)
        self.message_queue.put(("remove_waiting", ""))
        self.message_queue.put(("Agent", response))
    except Exception as e:
        self.message_queue.put(("remove_waiting", ""))
        # "系統" is the speaker label for system messages
        self.message_queue.put(("系統", f"Error while the Agent was processing: {e}"))

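Agent execution error handling:

Precondition check: verify the Agent is available before running it
Thread safety: errors are reported to the GUI through the message queue
State cleanup: the waiting placeholder is removed so the interface never gets stuck
User feedback: the user receives a clear description of the error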
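
In the method above, the `remove_waiting` message is sent from three separate places. A possible refactoring, shown here as a standalone sketch with a hypothetical `run_agent_job` function, computes the reply first and then clears the placeholder on a single exit path, which makes it harder to forget the cleanup when a new branch is added:

```python
def run_agent_job(agent_run, question, message_queue):
    """Run one agent call on a worker thread and report through the queue."""
    try:
        reply = ("Agent", agent_run(question))
    except Exception as exc:
        # "系統" is the system speaker label used by the original app.
        reply = ("系統", f"Agent error: {exc}")
    # Single exit path: the placeholder is always cleared before the reply.
    message_queue.put(("remove_waiting", ""))
    message_queue.put(reply)
```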

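3.4 Custom Error-Handling Function

# Create a custom error-handling function
# (it uses self without taking it as a parameter, so it is presumably nested inside a method)
def custom_error_handler(error) -> str:
    """Custom error handler: try to parse the error and execute the intended action."""
    # Convert the error object to a string
    error_msg = str(error)
    print(f"[DEBUG] Parsing error, trying custom handling: {error_msg}")
    print(f"[DEBUG] Error type: {type(error)}")

    # First check for an application-launch request, ignoring any observation the LLM may have made up
    # If the error message contains the Action request, run the real tool
    # ("開啟小畫家" is the tool name, meaning "Open Paint")
    if "Action: 開啟小畫家" in error_msg:
        print("[DEBUG] Detected an open-Paint request; running the real tool directly")
        try:
            result = self.system_tools.open_paint_tool("")
            return f"Paint was opened successfully; you can start drawing. Actual result: {result}"
        except Exception as e:
            return f"Error while opening Paint: {e}"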

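Characteristics of the custom error handler:

Action parsing: extracts an executable action from the parsing-error message
Tool mapping: maps the action named in the error to the real tool
Fault-tolerant execution: the user's request is still carried out even when the LLM output is malformed
Result reporting: the tool's actual return value is included in the reply, so a failure reported by the tool stays visible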
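
The handler above hard-codes one `if` per action. When several tools need the same treatment, the check can be made table-driven. The following is only a sketch: `recover_action` and the `tools` dictionary are hypothetical, and the regular expression assumes the action name occupies the rest of the `Action:` line.

```python
import re

ACTION_RE = re.compile(r"Action:[ \t]*(.+)")


def recover_action(error_msg, tools):
    """Find the first 'Action: <name>' line and run the matching tool."""
    match = ACTION_RE.search(error_msg)
    if not match:
        return None  # nothing recoverable; the caller uses its default path
    name = match.group(1).strip()
    tool = tools.get(name)
    if tool is None:
        return None  # unknown action name
    try:
        return f"Result: {tool('')}"
    except Exception as exc:
        return f"Error while running {name}: {exc}"
```

In the original app the table would map names such as "開啟小畫家" to bound methods like `self.system_tools.open_paint_tool`.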

4. Error Handling Across Threads

4.1 Message Queue Error Handling

def check_queue(self):
    """Poll the message queue."""
    try:
        processed_count = 0
        max_process_per_cycle = 5  # cap on messages handled per cycle

        while processed_count < max_process_per_cycle:
            speaker, message = self.message_queue.get_nowait()
            if speaker == "remove_waiting":
                self._remove_waiting_message()
            else:
                self._append_to_history(speaker, message)
            processed_count += 1
    except queue.Empty:
        pass
    except Exception as e:
        print(f"[DEBUG] Queue processing error: {e}")

    # Poll every 200 ms; the longer interval lowers CPU usage
    self.after(200, self.check_queue)

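Queue error-handling strategy:

Expected exception: queue.Empty is the normal case and is not logged
Unexpected exceptions: anything else is logged without interrupting the system
Continuous operation: check_queue reschedules itself with after() outside the try block, so polling continues even after an error
Performance protection: at most five messages are handled per cycle, which keeps the GUI responsive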
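
In `check_queue`, an exception raised while handling one message ends the current cycle, and the remaining messages wait for the next poll. A variant that isolates each message is sketched below; `drain` and `handle` are hypothetical names, and the GUI-specific calls are replaced by a plain callback so the logic can be read on its own:

```python
import queue


def drain(message_queue, handle, limit=5):
    """Handle up to `limit` queued messages; return how many were taken."""
    taken = 0
    while taken < limit:
        try:
            item = message_queue.get_nowait()
        except queue.Empty:
            break  # normal: nothing left to do this cycle
        taken += 1
        try:
            handle(item)
        except Exception as exc:
            # One bad message is logged and skipped; the rest still run.
            print(f"[DEBUG] queue handler error: {exc}")
    return taken
```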

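4.2 File-Processing Thread Error Handling

def _upload_file_thread(self, file_path):
    """Handle a file upload on a background thread."""
    try:
        filename = Path(file_path).name
        file_ext = Path(file_path).suffix.lower()

        # Use the general-purpose file upload tool
        result = self.system_tools.upload_file_tool(file_path)
        self.message_queue.put(("系統", result))

        # On success, switch to the "waiting for the user's next action" state
        # ("上傳成功" is the "upload succeeded" marker in the tool's reply)
        if "上傳成功" in result:
            # Extract the file content that follows the "內容: " ("content: ") marker
            content_start = result.find("內容: ") + 4
            file_content = result[content_start:] if content_start > 3 else result

            # Keep the file information for later use
            self.last_file_content = file_content
            self.last_file_name = filename
            self.waiting_for_file_action = True

            # Ask the user what to do next
            self.message_queue.put(("系統", f"File {filename} uploaded successfully! What would you like to do with it?"))
    except Exception as e:
        self.message_queue.put(("系統", f"Error while processing the file: {e}"))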

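Background thread error handling:

Complete capture: every error raised during file processing is caught
Safe message passing: errors are reported to the GUI through the queue
State consistency: the waiting flag is set only after a successful upload, so a failure leaves the state unchanged
User notification: the user is told the outcome of file processing right away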
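
The upload thread decides success by searching the tool's reply for a marker string, which breaks silently if the wording ever changes. One alternative, sketched here with hypothetical names (`UploadResult`, `read_upload`, and the `reader` callable are not in the original code), is to return a small structured result and branch on a boolean:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class UploadResult:
    ok: bool
    filename: str
    content: str = ""
    error: Optional[str] = None


def read_upload(path, reader):
    """Wrap a reader callable so callers branch on .ok, not on message text."""
    name = Path(path).name
    try:
        return UploadResult(True, name, content=reader(path))
    except Exception as exc:
        return UploadResult(False, name, error=str(exc))
```

The background thread could then set `waiting_for_file_action` only when `result.ok` is true and put `result.error` on the queue otherwise.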

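4.3 Agent Thread Fault Tolerance

# Check whether we are waiting for the user's reply about an uploaded file
# (note: section 4.2 stores the content as last_file_content, while this excerpt checks
#  last_uploaded_file_content; the attribute name must match whichever one the app actually sets)
if hasattr(self, 'waiting_for_file_action') and self.waiting_for_file_action:
    if hasattr(self, 'last_uploaded_file_content') and self.last_uploaded_file_content:
        try:
            # Call the file analysis tool directly, passing in the expert information
            response = self.system_tools.analyze_file_content_tool(analysis_type, expert_type, expert_prompt)
            self.message_queue.put(("remove_waiting", ""))
            self.message_queue.put(("Agent", response))

            # Keep the analysis result
            self.last_analysis_result = response

            # Clear the waiting flag
            self.waiting_for_file_action = False
            return

        except Exception as e:
            self.message_queue.put(("remove_waiting", ""))
            self.message_queue.put(("系統", f"Error while analyzing the file: {e}"))
            # Clear the flag on failure too, so the next message is handled normally
            self.waiting_for_file_action = False
            return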

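Characteristics of Agent thread fault tolerance:

State validation: the relevant flags and resources are checked before execution
Error isolation: an error here does not affect other features
State recovery: the waiting flag is cleared on failure, returning the app to a safe state
Graceful degradation: core features stay available when one feature fails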

5. LLM Error Handling and Fault Tolerance

5.1 LLM Call Error Handling

def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
    try:
        response = self._chat.send_message(prompt)
        return self.opencc_converter.convert(response.text)
    except Exception as e:
        # The error is returned as text, so callers receive it like a normal reply
        return f"Error: {str(e)}"

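LLM call fault-tolerance strategy:

Concise error reporting: any API exception is reduced to a short "Error: ..." string
Keeps running: a single failed call does not bring down the whole system
Error transparency: the original exception text is kept in the returned string for debugging
Text conversion: successful responses pass through the OpenCC converter before being returned; the error string is returned as-is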

5.2 Intent Recognition Error Handling

try:
    # Use the Gemini model to analyze the intent
    intention_response = self.gemini_llm._call(intention_prompt)
    expert_type = intention_response.strip()

    # Check whether the expert type chosen by the model is in the mapping table
    if expert_type in expert_mapping:
        config = expert_mapping[expert_type]
        return config["analysis_type"], expert_type, config["expert_prompt"]
    else:
        # Try a fuzzy match
        for key, config in expert_mapping.items():
            if key in expert_type or any(word in expert_type for word in key.split()):
                return config["analysis_type"], key, config["expert_prompt"]

        # Nothing matched: fall back to the general expert ("通用專家")
        config = expert_mapping["通用專家"]
        return config["analysis_type"], "通用專家", config["expert_prompt"]

except Exception as e:
    print(f"[ERROR] Intent recognition failed: {e}")
    # On error, return the general expert with the "詳細分析" (detailed analysis) type
    return ("詳細分析", "通用專家", """你是一位知識淵博的通用專家...""")

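Intent recognition fault tolerance:

Multi-tier matching: exact match → fuzzy match → default option
Safe fallback: the general expert is used when nothing can be recognized
Error logging: the failure is logged with its details
Service guarantee: basic service is still provided when recognition fails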
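
The three-tier lookup can be isolated into a small pure function, which makes each tier easy to check on its own. This is a simplified sketch: `resolve_expert` is a hypothetical name, the English keys in the usage below are placeholders for the app's real mapping keys, and the fuzzy tier only tests whether a key occurs inside the model's answer.

```python
def resolve_expert(raw, mapping, default_key):
    """Exact match, then substring match, then the default entry."""
    name = raw.strip()
    if name in mapping:
        return name                      # tier 1: exact match
    for key in mapping:
        if key != default_key and key in name:
            return key                   # tier 2: key appears in the answer
    return default_key                   # tier 3: safe default
```

For example, with `mapping = {"legal": ..., "medical": ..., "general": ...}`, the answer `"I think medical fits best"` resolves to `"medical"`, and an unrecognizable answer resolves to `"general"`.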

6. File-Handling Errors

6.1 File Upload Error Handling

def upload_file(self):
    """Upload a file."""
    file_path = filedialog.askopenfilename(
        title="Select a file to upload",
        filetypes=[
            ("All supported files", "*.txt;*.py;*.js;*.html;*.css;*.json;*.xml;*.csv;*.docx;*.doc;*.pdf;*.xlsx;*.xls"),
            # ... other file types
        ]
    )

    # askopenfilename returns an empty string when the dialog is cancelled
    if file_path:
        self.message_queue.put(("系統", f"Uploading file: {Path(file_path).name}..."))
        thread = threading.Thread(target=self._upload_file_thread, args=(file_path,))
        thread.start()

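File selection error handling:

File type filtering: the filetypes option limits which files the dialog offers
Path check: nothing happens unless the user actually selected a file
Asynchronous processing: a background thread keeps the interface from freezing
Progress feedback: an "uploading" message is shown as soon as processing starts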

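6.2 File Content Parsing Error Handling

Errors raised while parsing file content are generally handled inside the file-processing tools of SystemTools. They include:

File format errors: unsupported file formats
Corrupted files: files that cannot be read normally
Encoding errors: text files with an unexpected character encoding
Permission errors: files that cannot be accessed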
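
The post does not show the SystemTools implementation, so the following is only an illustrative sketch of how those four categories might be told apart for plain-text files. The function name, the suffix set, and the encoding list (UTF-8 first, then cp950 for Traditional Chinese text) are all assumptions:

```python
from pathlib import Path

TEXT_SUFFIXES = {".txt", ".py", ".json", ".csv"}


def read_text_file(path, encodings=("utf-8", "cp950")):
    """Return (ok, text_or_message) for a text file without raising."""
    p = Path(path)
    if p.suffix.lower() not in TEXT_SUFFIXES:
        return False, f"Unsupported file type: {p.suffix or '(none)'}"
    try:
        data = p.read_bytes()
    except FileNotFoundError:
        return False, "File not found."
    except PermissionError:
        return False, "Permission denied."
    except OSError as exc:
        return False, f"Could not read file: {exc}"
    # Try each candidate encoding in order; the first that decodes wins.
    for enc in encodings:
        try:
            return True, data.decode(enc)
        except UnicodeDecodeError:
            continue
    return False, "Could not decode the file with the configured encodings."
```

Catching the specific OSError subclasses before the generic one mirrors the "specific exceptions first" principle from section 2.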

7. Interface Error Handling and User Experience

7.1 Input Validation Error Handling

def start_agent_call(self):
    """Start Agent processing for the user's input."""
    question = self.entry.get().strip()
    if question:
        self.entry.delete(0, "end")
        # "使用者" is the speaker label for the user
        self._append_to_history("使用者", question)
        self.message_queue.put(("Agent", "Processing…"))

        # Start a new thread to call the Agent
        thread = threading.Thread(target=self.get_agent_response, args=(question,))
        thread.start()
    else:
        self.message_queue.put(("系統", "Please enter a question."))

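User input validation:

Empty input check: blank or whitespace-only input is never sent to the Agent
Friendly prompt: the user is told clearly what input is expected
Status feedback: a "Processing…" placeholder appears immediately
Interface protection: the entry box is cleared as soon as the question is accepted, which reduces accidental resubmission of the same text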

7.2 Interface Update Error Handling

def _update_history_display(self):
    """Refresh the history display (supports Markdown formatting)."""
    self.history_textbox.configure(state="normal")
    self.history_textbox.delete("1.0", "end")

    textbox = self.history_textbox._textbox

    for message in self.chat_history:
        # Note: content is taken from message in lines omitted from this excerpt
        # Apply Markdown formatting to the content
        try:
            formatted_parts = self._format_markdown_text(content)

            for text_part, tag in formatted_parts:
                start_pos = textbox.index(tk.INSERT)
                textbox.insert(tk.END, text_part)
                if tag:
                    end_pos = textbox.index(tk.INSERT)
                    textbox.tag_add(tag, start_pos, end_pos)
        except Exception as e:
            # If formatting fails, insert the raw text instead
            print(f"[DEBUG] Formatting failed: {e}")
            textbox.insert(tk.END, content)

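Interface update fault tolerance:

Formatting fallback: raw text is shown when Markdown parsing fails
Partial update protection: one message that fails to format does not affect the others
Debug logging: the formatting failure is logged with its details
Interface consistency: the interface always remains usable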

8. System Monitoring and Diagnostics

8.1 Debug Output

print("[DEBUG] ===== open_paint_tool was called! =====")
print(f"[DEBUG] Calling API: {LOCAL_API_BASE}/open-paint")
print(f"[DEBUG] API call succeeded: {result}")
print("[DEBUG] API connection failed")
print(f"[DEBUG] Parsing error, trying custom handling: {error_msg}")
print(f"[DEBUG] Error type: {type(error)}")

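Characteristics of the debug output:

Uniform format: debug messages carry a [DEBUG] prefix
Key checkpoints: important execution points and state changes are logged
Error type identification: the concrete exception type and message are logged
Flow tracing: the execution path through complex flows can be followed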
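
The same conventions can be expressed with the standard logging module, which adds timestamps, levels, and tracebacks without changing call sites much. A minimal sketch follows; the logger name `agent_app` and the `safe_divide` example function are hypothetical and exist only to show the pattern:

```python
import logging

logger = logging.getLogger("agent_app")


def configure_logging(level=logging.DEBUG):
    """Attach one stream handler; safe to call more than once."""
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s [%(levelname)s] %(name)s: %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(level)


def safe_divide(a, b):
    """Tiny example: logger.exception records the traceback as well."""
    try:
        return a / b
    except ZeroDivisionError:
        logger.exception("safe_divide failed for a=%r b=%r", a, b)
        return None
```

Raising the level to `logging.INFO` in production silences the debug lines without deleting them.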

8.2 State Checks

if not self.agent:
    self.message_queue.put(("remove_waiting", ""))
    self.message_queue.put(("Agent", "Agent initialization failed. Please check the settings."))
    return

if hasattr(self, 'waiting_for_file_action') and self.waiting_for_file_action:
    # Special file-handling logic goes here
    pass

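State-check strategy:

Precondition validation: required components are checked before execution
Dynamic attribute checks: hasattr guards attributes that may not exist yet
Flag management: flag variables coordinate the more complex states
Safe exit: an early return provides a safe way out when the state is invalid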
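
The `hasattr` checks are needed because the flags are created lazily. An alternative, sketched here with a hypothetical `FileActionState` class, is to define every flag up front in `__init__` and expose one method that reads and clears the state together:

```python
class FileActionState:
    """Holds the 'waiting for a file action' flag and the uploaded content."""

    def __init__(self):
        # Defined up front, so no hasattr() checks are needed later.
        self.waiting = False
        self.content = None

    def begin(self, content):
        """Record uploaded content and start waiting for the user's choice."""
        self.content = content
        self.waiting = True

    def take(self):
        """Return the pending content (or None) and always clear the flag."""
        content = self.content if self.waiting else None
        self.waiting = False
        return content
```

Because `take()` clears the flag on every call, the success and error paths cannot forget to reset it.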
