⚡《AI Knowledge System Build Log》This is not a pure technical article but an engineer's magical adventure. Code is the incantation, the pipeline is the magic circle, and error messages are dark curses. Grab your wand (keyboard): today we step into the academy's foundation class and build a stable, scalable AI knowledge system.
This continues the earlier Ollama and RAG introductions. If you missed the previous episodes, start with these:
Day 14 | RAG Magic Class (Part 1): A Complete Hands-On with Hybrid Search and Re-ranking
Day 6 | Hello Ollama: A First Meeting with the Ollama Models
Day 7 | Through the RAG Magic Maze: The Secret of Building a Smart Q&A System, the RAG Pipeline
For today's adventure I bring a laptop and coffee, like an apprentice clutching an expedition map, ready to face the engineering challenges of a RAG system. Last time we used Hybrid Search and Re-ranking to surface the most relevant fragments; this time I dig into how to tame those data spirits and build a sturdy, scalable knowledge tower in the real world.
I wore myself out doing it, and kept refilling the coffee to get this post written. ☕
try:
prompt_data = ollama_client.prompt_builder.create_structured_prompt(
query, chunks, system_settings.user_language
)
final_prompt = prompt_data["prompt"]
except Exception:
final_prompt = ollama_client.prompt_builder.create_rag_prompt(
query, chunks, system_settings.user_language
)
parsed_response, response = await ollama_client.generate_rag_answer(
query=query,
chunks=chunks,
user_language=system_settings.user_language,
use_structured_output=True,
temperature=system_settings.temperature,
)
create_structured_prompt
Ollama supports structured outputs, which lets the LLM reply in a structure we design, for example following RAGResponse.
Official example:
from ollama import chat
from pydantic import BaseModel
class Country(BaseModel):
name: str
capital: str
languages: list[str]
response = chat(
messages=[
{
'role': 'user',
'content': 'Tell me about Canada.',
}
],
model='llama3.1',
format=Country.model_json_schema(),
)
country = Country.model_validate_json(response.message.content)
print(country)
Output:
name='Canada' capital='Ottawa' languages=['English', 'French']
Ideally we would like the LLM to follow the RAGResponse structure exactly, but in practice there are trade-offs.
class RAGResponse(BaseModel):
"""Structured response model for RAG queries."""
answer: str = Field(
description="Comprehensive answer based on the provided paper excerpts"
)
sources: List[str] = Field(
default_factory=list,
description="List of PDF URLs from papers used in the answer",
)
confidence: Optional[str] = Field(
default=None,
description="Confidence level: high, medium, or low based on excerpt relevance",
)
citations: Optional[List[str]] = Field(
default=None,
description="Specific arXiv IDs or paper titles referenced in the answer",
)
The corresponding JSON Schema:
{
"description": "Structured response model for RAG queries.",
"properties": {
"answer": {...},
"sources": {...},
"confidence": {...},
"citations": {...}
},
"required": ["answer"],
"title": "RAGResponse",
"type": "object"
}
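For reference, this is exactly what Pydantic emits for the model above; a minimal sketch (assuming RAGResponse from the snippet above is in scope) that prints the full schema:
import json

# Print the JSON Schema that will later be passed to Ollama's `format` parameter.
print(json.dumps(RAGResponse.model_json_schema(), indent=2, ensure_ascii=False))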
def create_rag_prompt(
self, query: str, chunks: List[Dict[str, Any]], user_language: str = "English"
) -> str:
"""Create a RAG prompt with query and retrieved chunks.
Args:
query: User's question
chunks: List of retrieved chunks with metadata from Qdrant
Returns:
Formatted prompt string
"""
prompt = f"{self.system_prompt}\n\n"
prompt += "### Context from Papers:\n\n"
for i, chunk in enumerate(chunks, 1):
# Get the actual chunk text
chunk_text = chunk.get("chunk_text", chunk.get("content", ""))
arxiv_id = chunk.get("arxiv_id", "")
prompt += f"[{i}. arXiv:{arxiv_id}]\n{chunk_text}\n\n"
prompt += f"### Question:\n{query}\n\n"
prompt += "### Answer:\n"
prompt += (
"Provide a natural, conversational response (not JSON), cite sources using [arXiv:id] format.\n\n"
f"and Translate to {user_language}. "
f"Output ONLY in {user_language}, formatted clearly for readability"
)
return prompt
def create_structured_prompt(
self, query: str, chunks: List[Dict[str, Any]], user_language: str = "English"
) -> Dict[str, Any]:
"""Create a prompt for Ollama with structured output format.
Args:
query: User's question
chunks: List of retrieved chunks
Returns:
Dictionary with prompt and format schema for Ollama
"""
return {
"prompt": self.create_rag_prompt(query, chunks, user_language),
"format": RAGResponse.model_json_schema(),
}
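A quick usage sketch of the two builder methods above, with a single made-up chunk (the arXiv ID and chunk text are illustrative only):
# Hypothetical usage of RAGPromptBuilder with one fake retrieved chunk.
builder = RAGPromptBuilder()
sample_chunks = [
    {
        "arxiv_id": "2401.00001",  # illustrative ID, not a real citation
        "chunk_text": "Hybrid search combines dense and sparse retrieval signals ...",
    }
]

# Plain-text prompt: system prompt + numbered context + question + answer instructions.
plain_prompt = builder.create_rag_prompt(
    query="What is hybrid search?",
    chunks=sample_chunks,
    user_language="Traditional Chinese",
)

# Structured variant: the same prompt plus the RAGResponse JSON Schema for Ollama's `format`.
prompt_data = builder.create_structured_prompt(
    query="What is hybrid search?",
    chunks=sample_chunks,
    user_language="Traditional Chinese",
)
print(prompt_data.keys())  # dict_keys(['prompt', 'format'])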
Sorry, my English isn't great, so translation really matters to me 😢
I had been running gpt-oss:20b, while the official example uses llama3.1. llama3.1 supports structured output, but other large models such as gpt-oss:20b do not, so I tried the llama3 family and finally settled on llama3.2:3b. This is what the coffee refills were for.
Now comes the trade-off: keep the lovely gpt-oss:20b, which has no structured-output support, or switch to a model that has it. One option for the former is to demand the exact JSON shape at the prompt level:
"Return EXACTLY in JSON matching this schema:\n"
"{\n"
' "answer": "",\n'
' "sources": [],\n'
' "confidence": "",\n'
' "citations": []\n'
"}\n"
f"and Translate to {user_language}. "
f"Output ONLY in {user_language}, formatted clearly for readability"
When asking Ollama to generate the RAG answer, we need to pass format= to the model to enable structured output:
async def generate_rag_answer(
self,
query: str,
chunks: List[Dict[str, Any]],
use_structured_output: bool = False,
temperature: float = 0.5,
user_language: str = "English",
    ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
"""
Generate a RAG answer using retrieved chunks.
Args:
query: User's question
chunks: Retrieved document chunks with metadata
use_structured_output: Whether to use Ollama's structured output feature
Returns:
            Tuple of (parsed answer dict with answer/sources/confidence/citations, raw Ollama response)
"""
try:
if use_structured_output:
                # Use structured output
prompt_data = self.prompt_builder.create_structured_prompt(
query, chunks, user_language=user_language
)
response = await self.generate(
prompt=prompt_data["prompt"],
temperature=temperature,
top_p=0.9,
                    format=prompt_data["format"],  # Note: only certain models support this
)
else:
                # Plain-text fallback
prompt = self.prompt_builder.create_rag_prompt(
query, chunks, user_language=user_language
)
logger.info(f"promptprompt {prompt}")
# Generate without format restrictions
response = await self.generate(
prompt=prompt,
temperature=temperature,
top_p=0.9,
)
if response and "response" in response:
answer_text = response["response"]
logger.info(f"Raw LLM response: {answer_text}")
if use_structured_output:
# Try to parse structured response if enabled
parsed_response = self.response_parser.parse_structured_response(
answer_text
)
logger.info(f"Parsed response: {parsed_response}")
return parsed_response, response
else:
# For plain text response, build simple response structure
sources = []
seen_urls = set()
for chunk in chunks:
arxiv_id = chunk.get("arxiv_id")
if arxiv_id:
arxiv_id_clean = (
arxiv_id.split("v")[0] if "v" in arxiv_id else arxiv_id
)
pdf_url = f"https://arxiv.org/pdf/{arxiv_id_clean}.pdf"
if pdf_url not in seen_urls:
sources.append(pdf_url)
seen_urls.add(pdf_url)
citations = list(
set(
chunk.get("arxiv_id")
for chunk in chunks
if chunk.get("arxiv_id")
)
)
return {
"answer": answer_text,
"sources": sources,
"confidence": "medium",
"citations": citations[:5],
}, response
else:
raise OllamaException("No response generated from Ollama")
except Exception as e:
logger.error(f"Error generating RAG answer: {e}")
raise OllamaException(f"Failed to generate RAG answer: {e}")
I noticed that llama3.2:3b's answers tend to be too brief, while gpt-oss:20b doesn't support structured output. A bold strategy came to mind: let gpt-oss:20b generate the answer first, then hand it to llama3.2:3b to emit the structured output. I'm not implementing it yet (a rough sketch follows below), because these challenges would have to be solved first:
gpt-oss:20b: context limit = 8192 tokens
llama3.2:3b: context limit = 4096 tokens
Sure enough, it's traps all the way down.
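Here is a rough, hypothetical sketch of what that two-stage hand-off could look like; the function and the stage-2 prompt are my own assumptions, reusing the client and builder shown in this post:
# Hypothetical two-stage pipeline: gpt-oss:20b drafts, llama3.2:3b structures.
async def two_stage_answer(client: "OllamaClient", query: str, chunks: list) -> dict:
    # Stage 1: gpt-oss:20b writes the free-form answer (no structured-output support).
    client.model_name = "gpt-oss:20b"
    prompt = client.prompt_builder.create_rag_prompt(query, chunks)
    draft = (await client.generate(prompt=prompt, temperature=0.5))["response"]

    # Stage 2: llama3.2:3b only reformats the draft into the RAGResponse schema.
    # The draft has to fit the smaller 4096-token context, so truncate defensively.
    client.model_name = "llama3.2:3b"
    reformat_prompt = (
        "Rewrite the following answer as JSON matching the given schema. "
        "Do not add new information.\n\n" + draft[:3000]
    )
    structured = await client.generate(
        prompt=reformat_prompt,
        format=RAGResponse.model_json_schema(),
    )
    return client.response_parser.parse_structured_response(structured["response"])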
In an engineer's world there is no one-shot, ideal solution; every decision is a weighing of trade-offs. >_<
This article walks through the engineering mindset behind a RAG system:
The point is not the technology itself, but how an engineer, facing the uncertainty of large models, designs a system that is controllable, scalable, and fault-tolerant.
class OllamaClient:
"""Client for interacting with Ollama local LLM service."""
def __init__(self, settings: Settings):
"""Initialize Ollama client with settings."""
self.base_url = settings.OLLAMA_API_URL
self.model_name = settings.MODEL_NAME
self.timeout = httpx.Timeout(float(settings.OLLAMA_TIMEOUT))
self.prompt_builder = RAGPromptBuilder()
self.response_parser = ResponseParser()
async def generate(
self,
prompt: str = "",
**kwargs,
    ) -> Dict[str, Any]:
"""
Generate text using specified model.
Args:
prompt: Input prompt for generation
**kwargs: Additional generation parameters
Returns:
            Response dictionary from the Ollama API
"""
try:
async with httpx.AsyncClient(timeout=self.timeout) as client:
data = {
"model": self.model_name,
"prompt": prompt,
"stream": False,
**kwargs,
}
response = await client.post(f"{self.base_url}/api/generate", json=data)
if response.status_code == 200:
return response.json()
else:
raise OllamaException(f"Generation failed: {response.status_code}")
except httpx.ConnectError as e:
raise OllamaConnectionError(f"Cannot connect to Ollama service: {e}")
except httpx.TimeoutException as e:
raise OllamaTimeoutError(f"Ollama service timeout: {e}")
except OllamaException:
raise
except Exception as e:
raise OllamaException(f"Error generating with Ollama: {e}")
async def generate_rag_answer(
self,
query: str,
chunks: List[Dict[str, Any]],
use_structured_output: bool = False,
temperature: float = 0.5,
user_language: str = "English",
    ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
"""
Generate a RAG answer using retrieved chunks.
Args:
query: User's question
chunks: Retrieved document chunks with metadata
use_structured_output: Whether to use Ollama's structured output feature
Returns:
            Tuple of (parsed answer dict with answer/sources/confidence/citations, raw Ollama response)
"""
try:
if use_structured_output:
# Use structured output with Pydantic model
prompt_data = self.prompt_builder.create_structured_prompt(
query, chunks, user_language=user_language
)
logger.info(f"prompt_data {prompt_data}\n\n")
# Generate with structured format
                response = await self.generate(
                    prompt=prompt_data["prompt"],
                    temperature=temperature,
                    top_p=0.9,
                    format=prompt_data["format"],  # Note: only certain models support this
                )
else:
# Fallback to plain text mode
prompt = self.prompt_builder.create_rag_prompt(
query, chunks, user_language=user_language
)
logger.info(f"promptprompt {prompt}")
# Generate without format restrictions
response = await self.generate(
prompt=prompt,
temperature=temperature,
top_p=0.9,
)
if response and "response" in response:
answer_text = response["response"]
logger.info(f"Raw LLM response: {answer_text}")
if use_structured_output:
# Try to parse structured response if enabled
parsed_response = self.response_parser.parse_structured_response(
answer_text
)
logger.info(f"Parsed response: {parsed_response}")
return parsed_response, response
else:
# For plain text response, build simple response structure
sources = []
seen_urls = set()
for chunk in chunks:
arxiv_id = chunk.get("arxiv_id")
if arxiv_id:
arxiv_id_clean = (
arxiv_id.split("v")[0] if "v" in arxiv_id else arxiv_id
)
pdf_url = f"https://arxiv.org/pdf/{arxiv_id_clean}.pdf"
if pdf_url not in seen_urls:
sources.append(pdf_url)
seen_urls.add(pdf_url)
citations = list(
set(
chunk.get("arxiv_id")
for chunk in chunks
if chunk.get("arxiv_id")
)
)
return {
"answer": answer_text,
"sources": sources,
"confidence": "medium",
"citations": citations[:5],
}, response
else:
raise OllamaException("No response generated from Ollama")
except Exception as e:
logger.error(f"Error generating RAG answer: {e}")
raise OllamaException(f"Failed to generate RAG answer: {e}")
class RAGPromptBuilder:
"""Builder class for creating RAG prompts."""
def __init__(self):
"""Initialize the prompt builder."""
self.prompts_dir = Path(__file__).parent / "prompts"
self.system_prompt = self._load_system_prompt()
def _load_system_prompt(self) -> str:
"""Load the system prompt from the text file.
Returns:
System prompt string
"""
prompt_file = self.prompts_dir / "rag_system.txt"
if not prompt_file.exists():
# Fallback to default prompt if file doesn't exist
return (
"You are an AI assistant specialized in answering questions about "
"academic papers from arXiv. Base your answer STRICTLY on the provided "
"paper excerpts."
)
return prompt_file.read_text().strip()
def create_rag_prompt(
self, query: str, chunks: List[Dict[str, Any]], user_language: str = "English"
) -> str:
"""Create a RAG prompt with query and retrieved chunks.
Args:
query: User's question
chunks: List of retrieved chunks with metadata from Qdrant
Returns:
Formatted prompt string
"""
prompt = f"{self.system_prompt}\n\n"
prompt += "### Context from Papers:\n\n"
for i, chunk in enumerate(chunks, 1):
# Get the actual chunk text
chunk_text = chunk.get("chunk_text", chunk.get("content", ""))
arxiv_id = chunk.get("arxiv_id", "")
# Only include minimal metadata - just arxiv_id for citation
prompt += f"[{i}. arXiv:{arxiv_id}]\n"
prompt += f"{chunk_text}\n\n"
prompt += f"### Question:\n{query}\n\n"
prompt += "### Answer:\n"
prompt += (
"Provide a natural, conversational response (not JSON), cite sources using [arXiv:id] format.\n\n"
f"and Translate to {user_language}. "
f"Output ONLY in {user_language}, formatted clearly for readability"
)
return prompt
def create_structured_prompt(
self, query: str, chunks: List[Dict[str, Any]], user_language: str = "English"
) -> Dict[str, Any]:
"""Create a prompt for Ollama with structured output format.
Args:
query: User's question
chunks: List of retrieved chunks
Returns:
Dictionary with prompt and format schema for Ollama
"""
return {
"prompt": self.create_rag_prompt(query, chunks, user_language),
"format": RAGResponse.model_json_schema(),
}
class ResponseParser:
"""Parser for LLM responses."""
@staticmethod
def parse_structured_response(response: str) -> Dict[str, Any]:
"""Parse a structured response from Ollama.
Args:
response: Raw LLM response string
Returns:
Dictionary with parsed response
"""
try:
# Try to parse as JSON and validate with Pydantic
parsed_json = json.loads(response)
validated_response = RAGResponse(**parsed_json)
return validated_response.model_dump()
except (json.JSONDecodeError, ValidationError):
# Fallback: try to extract JSON from the response
return ResponseParser._extract_json_fallback(response)
@staticmethod
def _extract_json_fallback(response: str) -> Dict[str, Any]:
"""Extract JSON from response text as fallback.
Args:
response: Raw response text
Returns:
Dictionary with extracted content or fallback
"""
# Try to find JSON in the response
json_match = re.search(r"\{.*\}", response, re.DOTALL)
if json_match:
try:
parsed = json.loads(json_match.group())
# Validate with Pydantic, using defaults for missing fields
validated = RAGResponse(**parsed)
return validated.model_dump()
except (json.JSONDecodeError, ValidationError):
pass
# Final fallback: return response as plain text
return {
"answer": response,
"sources": [],
"confidence": "low",
"citations": [],
}
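A small sanity check for the parser above, using deliberately messy, made-up model replies:
# Case 1: the model wrapped valid JSON in extra prose; the regex fallback recovers it.
messy_reply = (
    "Sure, here is the answer:\n"
    '{"answer": "Hybrid search mixes dense and sparse retrieval.", '
    '"sources": ["https://arxiv.org/pdf/2401.00001.pdf"], '
    '"confidence": "high", "citations": ["2401.00001"]}'
)
print(ResponseParser.parse_structured_response(messy_reply)["confidence"])  # high

# Case 2: no JSON at all degrades to a plain-text answer with low confidence.
print(ResponseParser.parse_structured_response("I do not know.")["confidence"])  # low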