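This post is not 100% pure tech, because I secretly sprinkled in a little magic dust.
Code is the incantation, the workflow is the magic circle, and error messages are dark curses.
If all you want is a dry code walkthrough, this may not be the place for you; but if you are willing to treat technology as an adventure, step right in.
Yesterday I explored the magic gate of email out in the wilderness; today I head deep into the enchanted forest, teaching this Email Pipeline to generate a daily digest automatically and land it precisely in every subscriber's inbox.
This quest is not just about fetching papers or sending mail. It is about making sure every subscriber receives their own "personal essence of wisdom", as if the scroll could tell which wizards are qualified and which still need practice before they can open it. Those who fail see only scrambled HTML tags and 404 dark magic.
The magic scroll in my hand glows faintly. It is not merely a collection of data but a secret passage to a trove of knowledge.
I shoulder my pack and get ready to decipher the unsolved runes inside the scroll.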
get user
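Each user is a magic symbol on the scroll, carrying the fields assembled in the function below: user_id, email, translate, user_language, temperature, system_prompt, and top_k.
These settings can later unlock more "made-to-measure" magic, such as style magic.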
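from typing import Dict, List

from fastapi import HTTPException  # assuming the FastAPI exception class
from sqlalchemy.orm import Session

# User, UserSetting, get_user_email_from_firebase and logger come from the project itself.


def get_subscribed_users(db: Session) -> List[Dict]:
    """Return one settings dict per user who subscribed to the email digest."""
    subscribed_users = []
    try:
        # Join users with user_setting and keep only email subscribers
        query = (
            db.query(User, UserSetting)
            .join(UserSetting, User.id == UserSetting.user_id)
            .filter(UserSetting.subscribe_email)
        )
        for user, setting in query.all():
            try:
                email = get_user_email_from_firebase(user.id)
            except HTTPException:
                logger.warning(f"Skipping user {user.id}, no email found")
                continue
            subscribed_users.append(
                {
                    "user_id": user.id,
                    "email": email,
                    "translate": setting.translate,
                    "user_language": setting.user_language,
                    "temperature": setting.temperature,
                    "system_prompt": setting.system_prompt,
                    "top_k": setting.top_k,
                }
            )
    except Exception:
        # The original snippet is cut off after the outer try. Logging and
        # returning whatever was collected so far is an assumed handler.
        logger.exception("Failed to load subscribed users")
    return subscribed_users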
fetch_new_papers
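Fetches new papers published on or after since_date, so older ones are not picked up again. Each paper carries the metadata used in the later steps: arxiv_id, title, authors, abstract, pdf_url, and published_date.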
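from datetime import datetime, timedelta, timezone
from typing import Optional

from sqlalchemy import and_
from sqlalchemy.orm import Session

# Paper is the project's ORM model.


def fetch_new_papers(
    db: Session, since_date: Optional[datetime] = None, limit: int = 500
) -> list[Paper]:
    # since_date must be a datetime, because .date() is called on it below
    if since_date is None:
        # Default: the past 30 days
        since_date = datetime.now(timezone.utc) - timedelta(days=30)
    papers = (
        db.query(Paper)
        .filter(and_(Paper.pdf_processed, Paper.published_date >= since_date.date()))
        .limit(limit)
        .all()
    )
    return papers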
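Worried the batch is too big? Fear not: limit is the scroll's safety catch. It returns at most 500 papers per call, so the magic does not blow up.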
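To see what the date cutoff and the limit do without touching a database, here is a minimal in-memory sketch. The select_recent helper and the sample dicts are hypothetical stand-ins for the Paper query. Unlike the query above, it sorts newest first so that the cap always keeps the same papers.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional


def select_recent(
    papers: list[dict],
    since: Optional[date] = None,
    limit: int = 500,
    today: Optional[date] = None,
) -> list[dict]:
    """In-memory analogue of the query: filter by date, then cap the batch."""
    if today is None:
        today = datetime.now(timezone.utc).date()
    if since is None:
        since = today - timedelta(days=30)  # same 30-day default window
    recent = [
        p for p in papers if p["pdf_processed"] and p["published_date"] >= since
    ]
    # Sort newest first so the cap below is deterministic
    recent.sort(key=lambda p: p["published_date"], reverse=True)
    return recent[:limit]


sample = [
    {"arxiv_id": "a", "published_date": date(2024, 1, 30), "pdf_processed": True},
    {"arxiv_id": "b", "published_date": date(2023, 12, 1), "pdf_processed": True},
    {"arxiv_id": "c", "published_date": date(2024, 1, 15), "pdf_processed": False},
    {"arxiv_id": "d", "published_date": date(2024, 1, 10), "pdf_processed": True},
]
capped = select_recent(sample, today=date(2024, 1, 31), limit=1)
uncapped = select_recent(sample, today=date(2024, 1, 31))
```

Passing today explicitly keeps the example reproducible; in real use it would simply be left out.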
with db_session() as db
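What does it do? Wrapping database work in a context manager (with db_session()) makes connection handling safer. db_session() automatically commits when the block succeeds, rolls back when an exception is raised, and closes the session either way.
So inside the with block you can safely use db to query or write data, and once the block ends the session is always closed, leaving no half-open connection behind.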
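from contextlib import contextmanager

# SessionLocal is the project's SQLAlchemy session factory.


@contextmanager
def db_session():
    db = SessionLocal()
    try:
        yield db
        db.commit()  # commit only if the with block finished without error
    except Exception:
        db.rollback()  # undo partial writes, then re-raise to the caller
        raise
    finally:
        db.close()  # always release the connection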
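The commit, rollback, and close order is easy to check without a real database. Below is a small sketch using a recording stand-in; FakeSession and fake_db_session are hypothetical names made up for the demonstration, and the context manager body mirrors db_session.

```python
from contextlib import contextmanager


class FakeSession:
    """Stand-in for a database session that only records which calls were made."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


@contextmanager
def fake_db_session(session):
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


# Happy path: the block finishes normally
ok = FakeSession()
with fake_db_session(ok):
    pass

# Failure path: the block raises, and the error still reaches the caller
failed = FakeSession()
try:
    with fake_db_session(failed):
        raise ValueError("boom")
except ValueError:
    pass
```

After running this, ok.calls holds commit followed by close, while failed.calls holds rollback followed by close.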
fetch_paper_content_from_qdrant
from typing import Optional

from qdrant_client import models

# qdrant_client (a QdrantClient instance) and COLLECTION_NAME come from the project config.


def fetch_paper_content_from_qdrant(
    arxiv_id: Optional[str] = None, title: Optional[str] = None
) -> str:
    """Fetch raw_content from Qdrant by arxiv_id or title."""
    must_conditions = []
    if arxiv_id:
        must_conditions.append(
            models.FieldCondition(
                key="arxiv_id",
                match=models.MatchValue(value=arxiv_id),
            )
        )
    if title:
        must_conditions.append(
            models.FieldCondition(
                key="title",
                match=models.MatchValue(value=title),
            )
        )
    # Without any condition there is nothing to match on
    if not must_conditions:
        return ""
    points, _ = qdrant_client.scroll(
        collection_name=COLLECTION_NAME,
        scroll_filter=models.Filter(must=must_conditions),
        limit=100,
    )
    if not points:
        return ""
    # Only the text of the first matching point is returned
    payload = points[0].payload or {}
    return payload.get("text", "")
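fetch_paper_content_from_qdrant pulls the raw content (raw_content) from the vector database so that summary generation has enough information to work with.
🪄 Technical analogy:
scroll is like a magic lookup spell: it filters on metadata (the payload) only, matching arxiv_id or title exactly.
search is more like a random magic circle: it finds content by similarity, which makes it a poor fit for exact matching.
💡 Tricky point: limit=100 is like a library that hands you at most 100 books at a time. If more than 100 match, scroll also returns a next-page offset, and you pass it back in to fetch the next batch.
Here I assume no paper has more than 100 chunks. That will not always hold, but it keeps the flow simple.
points, next_offset = qdrant_client.scroll(
    collection_name="my_collection",
    scroll_filter=models.Filter(must=must_conditions),
    limit=100,
)
all_points = list(points)
# scroll returns None as the offset once there are no more pages
while next_offset is not None:
    points, next_offset = qdrant_client.scroll(
        collection_name="my_collection",
        scroll_filter=models.Filter(must=must_conditions),
        limit=100,
        offset=next_offset,
    )
    all_points.extend(points)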
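The paging loop can be tried without a running Qdrant instance. In the sketch below, fake_scroll is a hypothetical stand-in that only mimics the (points, next offset) return shape, and it uses a list index as the offset, whereas the real client hands back a point ID. collect_text is likewise an assumed helper showing how every chunk's text could be joined once all pages are in.

```python
def fake_scroll(store, offset=None, limit=100):
    """Return one page of points plus the offset of the next page (None at the end)."""
    start = offset or 0
    page = store[start:start + limit]
    has_more = start + limit < len(store)
    return page, (start + limit if has_more else None)


def collect_text(store, limit=100):
    """Walk every page and join the text of all chunks in order."""
    chunks = []
    offset = None
    while True:
        page, offset = fake_scroll(store, offset=offset, limit=limit)
        chunks.extend(point["text"] for point in page)
        if offset is None:  # no further pages
            break
    return "\n".join(chunks)


# 250 chunks need three pages of at most 100 each
store = [{"text": f"chunk {i}"} for i in range(250)]
full_text = collect_text(store, limit=100)
```

With this in place, the 100-chunk assumption is no longer needed: a longer paper just takes more trips to the library.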
filter_already_sent_papers
def filter_already_sent_papers(user_id: int, papers: list[dict]) -> list[dict]:
    with db_session() as db:
        # arxiv_ids this user has already received
        sent_arxiv_ids = {
            r.arxiv_id
            for r in db.query(UserSentPaper)
            .filter(UserSentPaper.user_id == user_id)
            .all()
        }
    # Keep only the papers that have not been sent yet
    new_papers = [p for p in papers if p["arxiv_id"] not in sent_arxiv_ids]
    return new_papers
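Every subscriber's scroll carries a seal: runes already seen (papers already sent) do not appear again. This layer makes sure that opening each email feels like unwrapping a gift rather than eating leftovers every day 😆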
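The seal boils down to a set lookup. Here is a minimal sketch on plain data; filter_unsent and mark_sent are hypothetical helpers, and recording the IDs after a send is an assumption about how UserSentPaper gets filled, since that step is not shown here.

```python
def filter_unsent(papers, sent_ids):
    """Keep only papers whose arxiv_id is not in the already-sent collection."""
    sent = set(sent_ids)  # a set gives constant-time membership checks
    return [p for p in papers if p["arxiv_id"] not in sent]


def mark_sent(sent_ids, papers):
    """Return a new set that also contains the papers just delivered."""
    return set(sent_ids) | {p["arxiv_id"] for p in papers}


papers = [{"arxiv_id": "2401.00001"}, {"arxiv_id": "2401.00002"}]
already_sent = {"2401.00001"}

first_run = filter_unsent(papers, already_sent)
already_sent = mark_sent(already_sent, first_run)
second_run = filter_unsent(papers, already_sent)
```

The first run yields only the paper the user has not seen, and the second run yields nothing, which is exactly the "no leftovers" guarantee.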
generate_summary
import pathlib
from html import escape
from string import Template

# llm_summary and logger are defined elsewhere in the project.


def fetch_paper_info(paper: dict, content_map: dict[str, str]) -> dict:
    """Collect the fields for a single paper, including raw_content if available."""
    paper_info = {
        "title": paper.get("title") or "No Title",
        "authors": paper.get("authors") or [],
        "abstract": paper.get("abstract") or "",
        "pdf_url": paper.get("pdf_url") or None,
        "published_date": paper.get("published_date") or None,
    }
    arxiv_id = paper.get("arxiv_id")
    if arxiv_id and arxiv_id in content_map:
        paper_info["raw_content"] = content_map[arxiv_id]
    return paper_info


def summarize_paper(paper_info: dict, user: dict) -> str:
    """Call the LLM to generate a summary; fall back to a fixed message on failure."""
    try:
        return llm_summary(paper_info, user)
    except Exception as e:
        logger.warning(f"Summary generation failed: {e}")
        return "Summary generation failed."


def generate_summary(
    papers_and_content: tuple[list[dict], dict[str, str]], user: dict
) -> str:
    """Generate an LLM summary for each paper and assemble the results into HTML."""
    papers, content_map = papers_and_content
    if not papers:
        return "<p>No new papers today.</p>"
    papers_html = ""
    for idx, p in enumerate(papers, start=1):
        paper_info = fetch_paper_info(p, content_map)
        summary = summarize_paper(paper_info, user)
        # Escape plain-text fields so a stray < or & cannot break the markup
        title = escape(paper_info["title"])
        authors = escape(", ".join(paper_info["authors"]))
        # The key always exists, so fall back with `or` rather than a .get() default
        published = paper_info["published_date"] or "N/A"
        pdf_url = paper_info["pdf_url"]
        pdf_link_html = (
            f'<a href="{escape(pdf_url)}" target="_blank">Preview PDF</a>'
            if pdf_url
            else "N/A"
        )
        papers_html += f"""
        <div class="paper-summary">
            <div class="paper-title">{idx}. {title}</div>
            <div class="paper-meta">
                <strong>Authors:</strong> {authors} <br>
                <strong>Published:</strong> {published} <br>
                <strong>PDF:</strong> {pdf_link_html}
            </div>
            <div class="paper-abstract">
                {summary}
            </div>
        </div>
        """
    # Drop the cards into the ${papers_html} placeholder of template.html
    template_path = pathlib.Path(__file__).parent / "template.html"
    template_text = template_path.read_text(encoding="utf-8")
    final_html = Template(template_text).substitute(papers_html=papers_html)
    return final_html
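Titles and author names are plain text, so characters such as < or & can scramble the markup once they are dropped into HTML. Below is a minimal sketch of escaping them before substitution, using html.escape and string.Template from the standard library. CARD and render_card are hypothetical names, not part of the pipeline.

```python
from html import escape
from string import Template

# A cut-down card with $-style placeholders, like ${papers_html} in template.html
CARD = Template(
    '<div class="paper-title">$idx. $title</div>'
    '<div class="paper-meta">$authors</div>'
)


def render_card(idx, title, authors):
    """Escape the plain-text fields, then substitute them into the card."""
    return CARD.substitute(
        idx=idx,
        title=escape(title),
        authors=escape(", ".join(authors)),
    )


card = render_card(1, "Learning f(x) < g(x) & friends", ["A. Mage", "B. <Witch>"])
```

Only the plain-text fields are escaped. The LLM summary is meant to be HTML already, so escaping it too would turn its tags into visible text.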
send_email
Once the magic scroll has been generated, it has to reach the right wizard's hands. That is the job of send_email.
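import smtplib
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Optional

# settings and logger come from the project config.


def send_email(
    subject: str,
    recipients: str,
    body: str,
    attachments: Optional[list[dict]] = None,  # [{"filename": "summary.pdf", "content": bytes}]
):
    """
    Send an HTML email with optional attachments via smtplib,
    keeping the same function interface as the original FastMail version.
    """
    # Build a multipart message (HTML + attachments)
    msg = MIMEMultipart("mixed")
    msg["Subject"] = subject
    msg["From"] = settings.MAIL_FROM
    msg["To"] = recipients
    # Attach the HTML body
    msg.attach(MIMEText(body, "html", "utf-8"))
    # Attach the files
    for att in attachments or []:
        part = MIMEBase("application", "octet-stream")
        part.set_payload(att["content"])
        encoders.encode_base64(part)
        part.add_header(
            "Content-Disposition", f'attachment; filename="{att["filename"]}"'
        )
        msg.attach(part)
    try:
        # Works with Gmail or any other SMTP server
        with smtplib.SMTP(settings.MAIL_SERVER, settings.MAIL_PORT) as server:
            if settings.MAIL_TLS:
                server.starttls()
            server.login(settings.MAIL_USERNAME, settings.MAIL_PASSWORD)
            server.send_message(msg)
    except Exception as e:
        # Failures are logged, not raised, so the caller cannot tell that delivery failed
        logger.error(f"Failed to send email to {recipients}: {e}")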
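A message can be inspected before any SMTP server gets involved. The sketch below builds the same multipart structure and parses it back. build_message is a hypothetical helper, the addresses are placeholders, and it swaps the MIMEBase plus encode_base64 pair for MIMEApplication, which base64-encodes its payload by default.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(subject, sender, recipient, html_body, attachments=None):
    """Assemble an HTML email with optional attachments, without sending it."""
    msg = MIMEMultipart("mixed")
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.attach(MIMEText(html_body, "html", "utf-8"))
    for att in attachments or []:
        part = MIMEApplication(att["content"])  # application/octet-stream, base64
        part.add_header("Content-Disposition", "attachment", filename=att["filename"])
        msg.attach(part)
    return msg


msg = build_message(
    "Daily digest",
    "bot@example.com",
    "reader@example.com",
    "<p>Hello</p>",
    attachments=[{"filename": "summary.pdf", "content": b"%PDF-1.4"}],
)
# Round-trip through bytes, the form in which the message goes over the wire
parsed = message_from_bytes(msg.as_bytes())
parts = parsed.get_payload()
```

Separating "build" from "send" like this makes the message structure testable on its own, while the SMTP call stays a thin wrapper.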
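Today I completed the second layer of the scroll's magic:
Fetch papers → aggregate content → filter duplicates → generate summaries → send
Each step is a small adventure of its own and has to be precise and reliable for the gems of knowledge to reach users safely. If one fails, the whole email may turn into a 404 Not Found.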
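The steps can be wired together roughly like this. It is only a sketch: run_pipeline, its parameters, and the report format are hypothetical, summarize and send are injected so the flow can be exercised without an LLM or SMTP server, and fetching and aggregating are assumed to have produced the papers list already. The one design choice it illustrates is isolating each user, so one failed delivery does not stop the others.

```python
def run_pipeline(users, papers, sent_by_user, summarize, send):
    """Filter, summarize, and send for each user; return a per-user status report."""
    report = {}
    for user in users:
        uid = user["user_id"]
        try:
            already_sent = sent_by_user.get(uid, set())
            fresh = [p for p in papers if p["arxiv_id"] not in already_sent]
            if not fresh:
                report[uid] = "skipped"  # nothing new for this user
                continue
            body = "".join(summarize(p, user) for p in fresh)
            send(user["email"], body)
            # Record the papers only after the send succeeded
            sent_by_user.setdefault(uid, set()).update(p["arxiv_id"] for p in fresh)
            report[uid] = "sent"
        except Exception as exc:
            report[uid] = f"failed: {exc}"
    return report


users = [
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 2, "email": "b@example.com"},
    {"user_id": 3, "email": "c@example.com"},
]
papers = [{"arxiv_id": "x1"}, {"arxiv_id": "x2"}]
sent_by_user = {3: {"x1", "x2"}}  # user 3 has already received everything
outbox = []


def fake_send(to, body):
    if to == "b@example.com":
        raise RuntimeError("smtp down")  # simulate one broken delivery
    outbox.append((to, body))


def fake_summarize(paper, user):
    return f"<p>{paper['arxiv_id']}</p>"


report = run_pipeline(users, papers, sent_by_user, fake_summarize, fake_send)
```

Because the papers are recorded only after a successful send, the user whose delivery failed gets the same papers again on the next run.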
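It proves once again that building an Email Pipeline is never just cold, lifeless code. It is a genuine wizard's trial.
We are not merely debugging; it is more like negotiating with magical creatures. We are not merely scheduling; it is more like steering the flow of time.
In the chapters to come I will keep recording this adventure, as both engineering notes and a chronicle of magical sightings.
If you are ready too, pick up your wand (your keyboard) and let us explore this uncharted wilderness together.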
Borrowing the LangChain wrapper from "Day 6 | Hello Ollama - First Meeting with the Ollama Model":
from typing import Dict

from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama  # older setups import it from langchain_community.chat_models

# PROMPT_FILE (a pathlib.Path), MODEL_NAME and OLLAMA_API_URL come from the project config.


def llm_summary(paper: Dict, user: dict, max_words: int = 300) -> str:
    if not paper:
        return "No paper provided."
    is_translate = user.get("translate", False)
    user_language = user.get("user_language", "English")
    temperature = user.get("temperature", 0.5)
    system_prompt = user.get("system_prompt") or ""
    title = paper.get("title", "No Title")
    # Join the author list exactly once; joining an already-joined string
    # would split the names into single characters
    authors_str = ", ".join(paper.get("authors") or [])
    content = paper.get("raw_content") or paper.get("abstract", "")
    content_type = "Full Content" if paper.get("raw_content") else "Abstract"
    translation_instruction = ""
    if is_translate:
        translation_instruction = (
            f"Translate the summary to {user_language}. Output ONLY in {user_language}."
        )
    # Load the prompt template and let LangChain fill the placeholders at invoke
    # time, so braces inside the title or content are never parsed as variables
    template_text = PROMPT_FILE.read_text(encoding="utf-8")
    prompt = ChatPromptTemplate.from_template(template_text)
    chat_model = ChatOllama(
        model=MODEL_NAME,
        temperature=temperature,
        base_url=OLLAMA_API_URL,
    )
    chain = prompt | chat_model
    try:
        resp = chain.invoke(
            {
                "system_prompt": system_prompt,
                "max_words": max_words,
                "content_type": content_type,
                "translation_instruction": translation_instruction,
                "title": title,
                "authors": authors_str,
                "content": content,
            }
        )
        html_summary = resp.content.strip()
        # Drop blank lines so the HTML stays compact in the email
        html_summary = "\n".join(
            [line for line in html_summary.splitlines() if line.strip()]
        )
        return html_summary
    except Exception as e:
        return f"<p><strong>Summary generation failed:</strong> {e}</p>"
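Paper text often contains literal braces (LaTeX, JSON), and those are a curse waiting to happen whenever already-substituted text is parsed as a template a second time, which is what ChatPromptTemplate.from_template does by default with its f-string-style placeholders. Below is a sketch of the failure and of the brace-doubling workaround, using only str.format as a stand-in for that second pass. The template, the sample sentence, and escape_braces are all made up for illustration.

```python
template = "Title: {title}\nContent: {content}"
content = r"We minimize \frac{a}{b} over x."


def escape_braces(text):
    """Double every brace so a later format pass treats it as a literal."""
    return text.replace("{", "{{").replace("}", "}}")


# First pass: values are inserted verbatim, braces included
first_pass = template.format(title="Demo", content=content)

# Second pass: {a} and {b} now look like placeholders nobody supplied
try:
    first_pass.format()
    second_pass_error = None
except (KeyError, IndexError, ValueError) as exc:
    second_pass_error = type(exc).__name__

# Escaping before the first pass lets the text survive the second one intact
safe = template.format(title="Demo", content=escape_braces(content)).format()
```

Passing the values as variables at invoke time avoids the second pass entirely; brace doubling is the fallback when the pre-formatted string has to go through a template parser anyway.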
{system_prompt}
You are a professional research assistant.
Summarize the following paper concisely, in no more than {max_words} words.
Keep it readable for an email newsletter.
(Note: the text provided is the paper's {content_type})
{translation_instruction}
OUTPUT MUST BE HTML and email-friendly.
Use headings (<h2>, <h3>), paragraphs (<p>), bold (<strong>), italics (<em>),
unordered lists (<ul>) and ordered lists (<ol>) for bullet points.
Do NOT use Markdown or plain text.
Ensure all tags are properly closed.
Instructions:
1. Base your answer STRICTLY on the provided paper excerpts.
2. Maintain academic accuracy and precision.
3. Structure your answer logically with clear paragraphs when appropriate.
4. DO NOT include any introductory paragraphs about the authors, affiliations, or background. Focus ONLY on the paper's content, key findings, methods, and important points.
Remember:
- Do NOT make up information not present in the excerpts.
- Do NOT use knowledge beyond what's provided in the paper excerpts.
- Always acknowledge uncertainty when the excerpts are ambiguous or incomplete.
- Prioritize relevance and clarity in your response.
- NEVER add introductory phrases or explanations before your HTML response.
Paper:
Title: {title}
Authors: {authors}
Content: {content}
Include key findings, methods, and any important points as bullet points or numbered lists.
template.html
<!DOCTYPE html>
<html lang="zh-Hant">
<head>
<meta charset="UTF-8">
<title>今日論文摘要</title>
<style>
body {
font-family: "Segoe UI", Tahoma, Geneva, Verdana, sans-serif;
line-height: 1.6;
background-color: #f9fafc;
/* light background */
margin: 0;
padding: 20px;
}
h2 {
text-align: center;
color: #2c3e50;
margin-bottom: 30px;
background: linear-gradient(90deg, #a0c4ff, #e0f7fa);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
}
.paper-summary {
border-radius: 12px;
padding: 20px;
margin-bottom: 20px;
background: linear-gradient(145deg, #ffffff, #f0f4f8);
/* light card */
box-shadow: 0 4px 10px rgba(0, 0, 0, 0.08);
transition: transform 0.3s, box-shadow 0.3s, background 0.3s;
cursor: pointer;
}
.paper-summary:hover {
transform: translateY(-5px);
box-shadow: 0 8px 20px rgba(0, 0, 0, 0.15);
background: linear-gradient(145deg, #f0f4f8, #e0ebfc);
/* light hover state */
}
.paper-title {
font-size: 1.5em;
font-weight: bold;
margin-bottom: 10px;
color: #f6b387;
transition: color 0.3s;
}
.paper-title:hover {
color: #ff5722;
}
.paper-meta {
font-size: 0.9em;
color: #616161;
margin-bottom: 12px;
}
.paper-meta a {
color: #1976d2;
text-decoration: none;
font-weight: bold;
}
.paper-meta a:hover {
text-decoration: underline;
}
.paper-summary p,
.paper-summary ul,
.paper-summary ol {
margin: 8px 0;
color: #424242;
}
.label {
display: inline-block;
padding: 2px 6px;
font-size: 0.75em;
font-weight: bold;
border-radius: 4px;
margin-right: 6px;
color: #fff;
}
.label-authors {
background-color: #00796b;
}
.label-date {
background-color: #f57c00;
}
.label-pdf {
background-color: #1976d2;
}
.summary-content {
display: none;
margin-top: 10px;
}
.toggle-btn {
display: inline-block;
margin-top: 8px;
padding: 5px 10px;
background-color: #007bff;
color: #fff;
border-radius: 6px;
font-size: 0.85em;
cursor: pointer;
transition: background 0.3s;
}
.toggle-btn:hover {
background-color: #0056b3;
}
</style>
</head>
<body>
<h2>今日論文摘要</h2>
<!-- Papers Section Start -->
${papers_html}
<!-- Papers Section End -->
<p><em>本摘要僅供參考,最終請依原始論文與專業判斷。</em></p>
<script>
document.querySelectorAll('.paper-summary').forEach(function (card) {
card.addEventListener('click', function () {
const content = this.querySelector('.summary-content');
if (content) {
content.style.display = content.style.display === 'none' ? 'block' : 'none';
}
});
});
</script>
</body>
</html>