[Day 12] Product 11: AI Voice Expense Tracker

2025 iThome Ironman Contest

DAY 12

Generative AI

Part 12 of the series "30-Day Challenge: Building 30 Products"

17th Ironman Contest

jackietung

2025-09-26 21:53:34

I. What Problem Are We Solving?

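Traditional bookkeeping is tedious. Whether you write in a notebook or tap and type in a mobile app, every step costs time and energy. In a busy life it is easy to skip an entry when you are in a rush or juggling too many things, so at month-end a few entries never reconcile and you cannot fully track your personal spending.

That is why we built the AI Voice Accounting Assistant. Our goal is to remove all friction from the bookkeeping process, so that managing money becomes as simple and natural as speaking. By combining voice input with AI, we condense the tedious bookkeeping steps into a single sentence, letting you record every expense anytime and anywhere, without forgotten entries or manual typing.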

II. Prompt Design

Please build an AI Voice Accounting Agent with the following core features:

## Core Features & Specifications
1.  **Zero-Friction Voice Input:**
    * **Activation:** The system must allow users to initiate audio recording instantly with a single tap (or a dedicated widget trigger).
    * **Natural Speech Acceptance:** The AI must process free-form, conversational statements about expenses.
    * *Example:* "I spent 135 dollars at Starbucks for coffee this morning." or "Paid 15,000 for rent last Tuesday."

2.  **Intelligent AI Parsing & Structure:**
    * The core AI engine must automatically extract and structure all critical fields from the voice input.
    * **Required Fields:** **Amount** (numerical value and currency), **Category** (inferred or explicit), **Date** (resolved), and **Note/Merchant** (remaining descriptive text).

3.  **Dynamic Date Interpretation:**
    * The AI must accurately interpret and resolve relative or ambiguous time references into a precise `YYYY-MM-DD` format.
    * *Supported References:* "Today," "Yesterday," "Last Week," "The 10th of this month," etc.

4.  **Customizable Category Mapping:**
    * The system uses context clues (e.g., "gas," "latte," "movie ticket") to infer the correct category.
    * **User Control:** Users must be able to define, manage, and map their own list of custom expense categories (e.g., mapping "gym" to "Wellness").

5.  **Post-Entry Review and Quick Edit:**
    * After the AI parses the statement, the system displays the structured data for a final check.
    * All parsed fields (Amount, Date, Category, Note) must be instantly editable before saving, allowing users to correct AI inference errors quickly.

## User Interface (UI) / User Experience (UX) Flow
-   **Ready Screen:** A minimal screen dominated by a large, accessible **Microphone Icon** (the Call-to-Action) for instant recording.
-   **Audio Capture:** Clear visual feedback (e.g., a pulsating ring) confirming the system is actively listening.
-   **Transcript Display:** The raw voice-to-text transcript is shown in real-time for immediate user verification.
-   **Confirmation Modal:** Upon end-of-speech, a modal displays the AI's structured output in easily readable fields (Date: [editable], Amount: [editable], etc.).
-   **Action:** The modal features clear **"Confirm & Sync"** and **"Discard"** buttons.
-   **Success Feedback:** A brief, non-intrusive notification (e.g., a green banner) confirms **"Sync Complete to Sheets."**

## Technical Requirements
1.  The system is architected around advanced native audio models for low-latency, conversational NLU, such as **Gemini 2.5 Flash Native Audio Preview**.
2.  Full support for **Traditional Chinese (ZH-TW)** and **English (EN)** in both voice input and UI/confirmation text.
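To make features 2 and 4 of the prompt more concrete, here is a minimal Python sketch of what the structured record and the customizable category mapping might look like. Everything in it is an assumption for illustration: the names `Expense`, `DEFAULT_KEYWORDS`, and `infer_category`, as well as the default category labels, are hypothetical, and the keyword lookup is only a simple stand-in for whatever inference the model actually performs.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Expense:
    """Hypothetical record holding the four required fields from the prompt."""
    amount: float
    currency: str
    category: str
    date: str  # resolved date in YYYY-MM-DD format
    note: str  # merchant or remaining descriptive text


# Assumed built-in keyword map; the category labels are illustrative only.
DEFAULT_KEYWORDS: Dict[str, str] = {
    "gas": "Transport",
    "latte": "Food & Drink",
    "movie ticket": "Entertainment",
}


def infer_category(
    text: str,
    user_map: Optional[Dict[str, str]] = None,
    fallback: str = "Uncategorized",
) -> str:
    """Return the category of the longest keyword found in `text`.

    Entries in `user_map` override the defaults for the same keyword,
    which models the "User Control" requirement.
    """
    merged = {**DEFAULT_KEYWORDS, **(user_map or {})}
    lowered = text.lower()
    for keyword in sorted(merged, key=len, reverse=True):
        # Match whole words only, so "gas" does not match inside "Vegas".
        if re.search(r"\b" + re.escape(keyword.lower()) + r"\b", lowered):
            return merged[keyword]
    return fallback
```

For example, `infer_category("Paid 800 for gym membership", {"gym": "Wellness"})` returns `"Wellness"`, mirroring the gym-to-Wellness mapping mentioned in the prompt. A real product would likely let the model propose the category and use a map like this only for user overrides.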

III. Product Prototype

1. Minimal voice input interface

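Central microphone icon: This is the core of the whole product, a clear, tappable call-to-action (CTA). Users do not have to hunt through menus or input fields; a single tap on the icon starts recording an expense.
"Tap the microphone to start describing your expense": A short, plain prompt that tells users what to do.
Voice input hints: The two boxes underneath suggest how to describe an expense: "Explain what you bought, for example: buying coffee, taking a taxi..." and "State the amount and describe the date the expense occurred...". Besides giving users sample inputs, they highlight that the AI understands natural language, including inexact dates such as "yesterday" and "today".

The page is deliberately simple and clean, minimizing distractions so that users can complete a voice entry quickly and intuitively.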
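The relative dates mentioned above ("today", "yesterday") eventually have to become a concrete `YYYY-MM-DD` value. The sketch below shows one possible rule-based fallback in Python. It is not how the prototype works (the prompt delegates this to the model); the function name `resolve_date` and the reading of "last week" as "seven days ago" are assumptions made for illustration.

```python
import re
from datetime import date, timedelta

# Day offsets for simple relative phrases, in English and Traditional Chinese.
_OFFSETS = {"today": 0, "今天": 0, "yesterday": 1, "昨天": 1}


def resolve_date(phrase: str, today: date) -> str:
    """Resolve a relative date phrase to an ISO YYYY-MM-DD string."""
    p = phrase.strip().lower()
    if p in _OFFSETS:
        return (today - timedelta(days=_OFFSETS[p])).isoformat()
    if p == "last week":
        # Ambiguous phrase; assumed here to mean the same weekday one week ago.
        return (today - timedelta(weeks=1)).isoformat()
    match = re.fullmatch(r"the (\d{1,2})(?:st|nd|rd|th) of this month", p)
    if match:
        # date.replace raises ValueError if the day does not exist this month.
        return today.replace(day=int(match.group(1))).isoformat()
    raise ValueError(f"unsupported date phrase: {phrase!r}")
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the resolver deterministic and easy to test. For instance, with `today = date(2025, 9, 26)`, the phrase "yesterday" resolves to `2025-09-25`.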

2. Structured expense list and quick edit

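Overview panel: The top of the screen shows an overall total, "$1920.00" in the screenshot, so users can quickly confirm that the total for this recording session is correct.
Structured list: Below the panel, each individual expense the AI parsed is shown as a list item. Every record contains:
- Date (the 27th): automatically parsed by the AI and converted to an exact date.
- Note/Merchant (bento, drinks and cookies, taxi ride): keywords the AI extracted from the speech.
- Amount (100.00, 200.00, 1620.00): the exact figure and currency (TWD).
"Related" button: Indicates that these expenses may be related to one another, which makes follow-up handling easier.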
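The overview total is simply the sum of the parsed line items. As a small sanity check, the Python sketch below recomputes it from the three amounts shown in the screenshot. The tuple layout and the helper names `overview_total` and `format_total` are hypothetical; `Decimal` is used instead of `float` to avoid rounding drift in currency arithmetic.

```python
from decimal import Decimal
from typing import List, Tuple

# (note, amount) pairs taken from the screenshot described above; amounts in TWD.
items: List[Tuple[str, Decimal]] = [
    ("bento", Decimal("100.00")),
    ("drinks and cookies", Decimal("200.00")),
    ("taxi ride", Decimal("1620.00")),
]


def overview_total(entries: List[Tuple[str, Decimal]]) -> Decimal:
    """Sum the amounts of all parsed entries."""
    return sum((amount for _, amount in entries), Decimal("0"))


def format_total(total: Decimal) -> str:
    """Format the total with two decimal places, as in the overview panel."""
    return f"${total:.2f}"


print(format_total(overview_total(items)))  # prints $1920.00
```

If the user edits any amount in the confirmation step, recomputing the total this way keeps the overview panel consistent with the list.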