Day 19|Daily Knowledge Express (Part 2)|Never Fear a Token Blowup Again! An Engineer's Staged Playbook for Squeezing an LLM Dry

⚡ The AI Knowledge System Build Log: this is not a pure tech article, but an engineer's magical adventure. Code is the incantation, the pipeline is the magic circle, and error messages are dark curses. Grab your wand (keyboard): today we step into the academy's foundational magic class and build a stable, extensible AI knowledge system.


Preface

Yesterday we shipped the Daily Knowledge Express subscription service, integrating fetching the latest papers, summary generation, translation, HTML formatting, and automated delivery into one complete flow. The three-layer architecture is clean:

Pipeline → Services → Storage

Today we dive into the most central part of this magical system: how to get the LLM to generate precise, readable summaries and email content. This is the key spell that makes the Daily Knowledge Express actually feel valuable to its readers.


LLM Technical Highlights

Inside the Pipeline, the real core is how to get the LLM to produce a suitable summary and email content for us: that is, the generate_summary step mentioned in Day 9|Email Pipeline Technical Deep Dive (Part 2): Building the Subscription System.

After going live, we ran into a few problems:

  1. Some models (e.g. llama3.2:3b) would flat-out go on strike and tell me "I can't fulfill this request."
  2. The LLM's context is limited; stuff too much in at once and you easily blow past the token cap.
  3. Some summaries were underwhelming: not concise or clear enough.

So we adopted today's remedy: split the flow into three stages:


1. Split into Three Stages

  • Generate the summary → focus on condensing the content
  • Translate (optional) → the user's language
  • HTML formatting → a good-looking email

This is far more stable than cramming one giant blob into a single prompt, and it reduces the risk of a token blowup.
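Concretely, each paper flows through the stages in order. A minimal sketch of the chain (process_paper is just an illustrative name; the real driver, generate_summary, appears near the end of this post):

# Minimal sketch of the three-stage chain, using the functions defined below
def process_paper(paper_info: dict, idx: int, user: dict) -> str:
    summary = summarize_paper(paper_info, user)    # Stage 1: condense the paper
    summary = llm_translate(user, summary)         # Stage 2: optional translation
    return format_html(paper_info, idx, summary)   # Stage 3: email-ready HTML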


Generating the Summary: summarize_paper

summarize_paper

def summarize_paper(paper_info: dict, user: dict, retries: int = 3) -> str:
    """Ask the LLM for a summary; retry while the result looks too short."""
    summary = ""
    while retries > 0:
        summary = llm_summary(paper=paper_info, user=user, max_words=1000)
        if isinstance(summary, str) and len(summary) >= 100:
            break
        retries -= 1
        logger.warning(f"Summary seems too short: {summary}")

    return summary
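A hypothetical call, with a made-up paper_info dict shaped the way llm_summary (below) expects:

# Hypothetical usage; the field names match what llm_summary reads
paper_info = {
    "title": "An Example Paper",
    "authors": ["Alice", "Bob"],
    "abstract": "We study ...",  # used when raw_content is absent
}
user = {"temperature": 0.5}
summary = summarize_paper(paper_info, user)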

llm_summary

import pathlib
from typing import Dict

# Assuming the langchain-ollama package; `settings` and `logger` come from the project
from langchain_ollama import ChatOllama

PROMPT_FILE = pathlib.Path(__file__).parent / "prompt_template.txt"

def llm_summary(paper: Dict, user: dict, max_words: int = 300) -> str:
    if not paper:
        return "No paper provided."

    temperature = min(0.2, user.get("temperature", 0.5))

    title = paper.get("title", "No Title")
    # Escape literal braces in author names so they can't break str.format() below
    authors = paper.get("authors") or []
    authors_str = ", ".join(a.replace("{", "{{").replace("}", "}}") for a in authors)

    content = paper.get("raw_content") or paper.get("abstract", "")
    content_type = "Full Content" if paper.get("raw_content") else "Abstract"

    # Read the prompt template
    template_text = PROMPT_FILE.read_text(encoding="utf-8")

    # gpt-oss's context limit is roughly 8192 tokens
    # fixed prompt boilerplate ≈ 250 tokens
    # title + authors ≈ 40 tokens

    MAX_CONTENT_TOKENS = 6000  # token budget left for the paper content
    AVG_TOKEN_LEN = 4.5  # average characters per token
    max_content_chars = int(MAX_CONTENT_TOKENS * AVG_TOKEN_LEN)

    # Hard-truncate the content so the whole prompt stays inside the window
    content = content[:max_content_chars]

    prompt_template = template_text.format(
        max_words=max_words,
        content_type=content_type,
        title=title,
        authors=authors_str,
        content=content,
    )

    chat_model = ChatOllama(
        model=settings.SUMMARY_MODEL_NAME,
        temperature=temperature,
        base_url=settings.OLLAMA_API_URL,
        request_kwargs={"timeout": 300},  # timeout in seconds
        reset_context=True,  # ⚡ clear the session every call
    )

    try:
        resp = chat_model.invoke(prompt_template)
        summary = resp.content.strip()
        # Drop empty lines the model sometimes emits
        summary = "\n".join(line for line in summary.splitlines() if line.strip())
        return summary
    except Exception as e:
        return f"<p><strong>Summary generation failed:</strong> {e}</p>"


Summary prompt

You are a professional research assistant.
Summarize the following paper concisely, in no more than {max_words} words.
Keep it readable for an email newsletter.
(Note: the text provided is the paper's {content_type})



Instructions:
1. Base your answer STRICTLY on the provided paper excerpts.
2. Maintain academic accuracy and precision.
3. Structure your answer logically with clear paragraphs when appropriate.
4. DO NOT include any introductory paragraphs about the authors, affiliations, or background. Focus ONLY on the paper's content, key findings, methods, and important points.

Remember:
- Do NOT make up information not present in the excerpts.
- Do NOT use knowledge beyond what's provided in the paper excerpts.
- Always acknowledge uncertainty when the excerpts are ambiguous or incomplete.
- Prioritize relevance and clarity in your response.

Paper:
Title: {title}
Authors: {authors}
Content:
{content}

Include key findings, methods, and any important points as bullet points or numbered lists.

Translation: llm_translate


def llm_translate(user: dict, summary: str) -> str:
    is_translate = user.get("translate", False)
    if not is_translate:
        return summary

    user_language = user.get("user_language", "English")
    temperature = min(user.get("temperature", 0.5), 0.2)

    # Build a single prompt string (a trailing comma here would create a tuple,
    # which ChatOllama.invoke() does not accept)
    translation_instruction = (
        f"SUMMARIZE AND TRANSLATE THE FOLLOWING PAPER INTO {user_language.upper()} ONLY. "
        "Do NOT output English under any circumstances.\n\n"
        f"{summary}"
    )

    chat_model = ChatOllama(
        model="gpt-oss:20b",
        temperature=temperature,
        base_url="http://ollama:11434",
        request_kwargs={"timeout": 300},  # timeout in seconds
        reset_context=True,  # ⚡ clear the session every call
    )

    try:
        resp = chat_model.invoke(translation_instruction)
        trans_summary = resp.content.strip()
        trans_summary = "\n".join(
            line for line in trans_summary.splitlines() if line.strip()
        )

        # fallback check: the model refused instead of translating
        if trans_summary.lower().startswith("i can't"):
            return f"[Fallback] Could not fully translate; keeping the original:\n{summary}"
        return trans_summary

    except Exception as e:
        return f"[Fallback] Translation failed: {e}\nOriginal:\n{summary}"

Translation prompt

translation_instruction = (
        f"SUMMARIZE AND TRANSLATE THE FOLLOWING PAPER INTO {user_language.upper()} ONLY. "
        "Do NOT output English under any circumstances.\n\n"
        f"{summary}"
    )

fallback

except Exception as e:
        return f"[Fallback] Translation failed: {e}\nOriginal:\n{summary}"

HTML Formatting: format_html


def format_html(
    paper_info: dict,
    idx: int,
    summary: str,
) -> str:
    pdf_url = paper_info.get("pdf_url")
    pdf_link_html = (
        f'<a href="{pdf_url}" target="_blank">Preview PDF</a>' if pdf_url else "N/A"
    )

    summary = llm_html_format(summary)

    return f"""
    <div class="paper-summary">
        <div class="paper-title">{idx}. {paper_info["title"]}</div>
        <div class="paper-meta">
            <strong>Authors:</strong> {", ".join(paper_info.get("authors", []))} <br>
            <strong>Published:</strong> {paper_info.get("published_date", "N/A")} <br>
            <strong>PDF:</strong> {pdf_link_html}
        </div>
        <div class="paper-abstract">
            {summary}
        </div>
    </div>
    """


llm_html_format

def llm_html_format(summary: str) -> str:
    if not summary or not isinstance(summary, str):
        return "Summary not available."

    chat_model = ChatOllama(
        model=settings.SUMMARY_MODEL_NAME,
        temperature=0.0,
        base_url=settings.OLLAMA_API_URL,
        request_kwargs={"timeout": 300},  # timeout in seconds
        reset_context=True,  # ⚡ clear the session every call
    )

    # Single prompt string (again: no trailing comma, or this becomes a tuple)
    html_instruction = (
        "Please convert the following summary into a well-structured HTML format suitable for email newsletters. "
        "Use appropriate HTML tags such as <p>, <strong>, <em>, and <ul>/<li> for lists. "
        "Ensure the HTML is clean and free of any unnecessary tags or attributes. "
        "Do not include any CSS or JavaScript. Only provide the HTML content.\n\n"
        f"{summary}"
    )

    try:
        resp = chat_model.invoke(html_instruction)
        html_summary = resp.content.strip()
        return html_summary
    except Exception as e:
        return f"<p><strong>HTML formatting failed:</strong> {e}</p>"

HTML formatting prompt

html_instruction = (
        "Please convert the following summary into a well-structured HTML format suitable for email newsletters. "
        "Use appropriate HTML tags such as <p>, <strong>, <em>, and <ul>/<li> for lists. "
        "Ensure the HTML is clean and free of any unnecessary tags or attributes. "
        "Do not include any CSS or JavaScript. Only provide the HTML content.\n\n"
        f"{summary}"
    )

Putting It All Together: generate_summary



import pathlib
import time
from string import Template  # $-style substitution for template.html

# get_run_logger comes from Prefect; fetch_paper_info is a project helper

def generate_summary(
    papers_and_content: tuple[list[dict], dict[str, str]], user: dict
) -> str:
    """
    Generate an LLM summary for each paper and assemble the result into HTML.
    """
    logger = get_run_logger()
    start = time.time()
    papers, content_map = papers_and_content

    if not papers:
        logger.info("No papers to summarize.")
        return "<p>No new papers today.</p>"

    logger.info(f"Generating summary for {len(papers)} papers...")

    papers_html = ""

    for idx, p in enumerate(papers, start=1):
        paper_info = fetch_paper_info(p, content_map)
        # Stage 1: summarize
        summary = summarize_paper(paper_info, user)

        # Stage 2: translate
        summary = llm_translate(user, summary)

        # Stage 3: HTML formatting
        papers_html += format_html(paper_info, idx, summary)

    # The beauty recipe: template.html
    template_path = pathlib.Path(__file__).parent / "template.html"
    template_text = template_path.read_text(encoding="utf-8")
    final_html = Template(template_text).substitute(papers_html=papers_html)

    return final_html
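For reference, template.html only needs a $papers_html placeholder, since string.Template substitutes $-style variables. A minimal sketch (the real template carries the newsletter styling):

<!-- Minimal template.html sketch; string.Template fills in $papers_html -->
<html>
  <body>
    <h1>Daily Knowledge Express</h1>
    $papers_html
  </body>
</html>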

Retry Mechanism

Sometimes the LLM replies with something too short, or simply stalls, so we added a retry:

def summarize_paper(paper, user, retries=3):
    summary = ""
    while len(summary) < 100 and retries > 0:
        summary = llm_summary(paper=paper, user=user)
        retries -= 1
    return summary
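If you want to be gentler on the model server, a small backoff between attempts is cheap insurance (my own tweak, not in the pipeline above):

import time

def summarize_with_backoff(paper, user, retries=3, base_delay=2.0):
    summary = ""
    for attempt in range(retries):
        summary = llm_summary(paper=paper, user=user)
        if summary and len(summary) >= 100:
            break
        time.sleep(base_delay * (attempt + 1))  # linear backoff: 2s, 4s, 6s
    return summary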

Token Limits

Sometimes the Ollama log shows a warning like this:

time=2025-09-19T02:14:50.027Z level=WARN source=runner.go:160 msg="truncating input prompt" limit=8192 prompt=15823 keep=4 new=8192

Reading the fields: the prompt was 15823 tokens against a limit of 8192, so Ollama silently truncated the input. Taking gpt-oss:20b as an example:

  • maximum context of 8192 tokens
  • anything beyond that gets truncated automatically

The fixes:

  1. Simplify the prompt
  2. Three-stage processing (summarize → translate → HTML), plus estimating the prompt size up front, as in the sketch below
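To catch oversized prompts before Ollama silently truncates them, it helps to estimate the size up front. A rough sketch using the same chars-per-token heuristic as llm_summary (the 4.5 figure is an approximation, not an exact tokenizer):

AVG_TOKEN_LEN = 4.5   # rough average characters per token (same heuristic as llm_summary)
CONTEXT_LIMIT = 8192  # gpt-oss:20b context window, per the Ollama log above

def estimate_tokens(text: str) -> int:
    return int(len(text) / AVG_TOKEN_LEN)

def fits_context(prompt: str, reserve_for_output: int = 1500) -> bool:
    # Leave headroom for the model's reply so input + output stay inside the window
    return estimate_tokens(prompt) + reserve_for_output <= CONTEXT_LIMIT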

Translation Problems

  • Some models (e.g. llama3.2:3b) flat-out reply:
    "I can't fulfill this request..."

  • Fixes:

    1. Switch to a bigger model (gpt-oss:20b) and it just works~
    2. Fall back → English

fallback

try:
        resp = chat_model.invoke(translation_instruction)
        trans_summary = resp.content.strip()
        trans_summary = "\n".join(
            line for line in trans_summary.splitlines() if line.strip()
        )

        # fallback check: the model refused instead of translating
        if trans_summary.lower().startswith("i can't"):
            return f"[Fallback] Could not fully translate; keeping the original:\n{summary}"
        return trans_summary

    except Exception as e:
        return f"[Fallback] Translation failed: {e}\nOriginal:\n{summary}"

Architecture Recap

Pipeline (pipeline.py)
  ├── Services (fetch papers / summarize / send email)
  ├── Storage (MinIO, Qdrant)
  └── Config & Utils (logger, Firebase)

Lessons Learned

  1. Decomposed prompts → more stable
  2. Retry mechanism → failures no longer wedge the pipeline
  3. Respect model limits → stay under the token cap
  4. Translation fallback → higher success rate

Wrap-up

At this point, the Daily Knowledge Express is essentially complete:

  • Automated pipeline
  • LLM text processing
  • Translation / email beautification
  • Automated delivery

Next steps could include:

  • Per-topic user subscriptions
  • Multi-model selection
  • More advanced personalization
