My goal is to build a project that is useful in daily life and exercises both RAG and an LLM.
The plan is to adapt search_with_lepton (a demo similar to Perplexity).
I want to turn it into a Q&A bot: it first searches the web for the user's question, then feeds the retrieved material together with the question to several large models (Hugging Face models, plus web-based chatbots such as Gemini, Bing Chat, ChatGPT, and Claude).
Finally, the program collects each chatbot's answer, presents them side by side, and summarizes them into the most accurate and detailed final answer.
If the project is not finished by the end of the Ironman challenge, I will keep posting daily progress updates here until it is done.
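To make the plan concrete, here is a minimal sketch of the pipeline I have in mind. Every name in it is a placeholder (there is no real search or chatbot client behind the stubs); it only shows the shape of the search -> multi-bot -> summarize flow:

import concurrent.futures
from typing import List

def search_web(question: str) -> List[str]:
    # Stub standing in for a real web-search call.
    return [f"(stub) web result for: {question}"]

def ask_bot(name: str, question: str, contexts: List[str]) -> str:
    # Stub standing in for a real chatbot client (Gemini, ChatGPT, Claude, ...).
    return f"(stub) {name}'s answer to: {question}"

def summarize(question: str, answers: List[str]) -> str:
    # A real version would ask an LLM to merge these into one best answer.
    return "\n".join(answers)

def answer_question(question: str) -> str:
    contexts = search_web(question)                        # 1. retrieve web material
    bots = ["huggingface", "gemini", "chatgpt", "claude"]
    with concurrent.futures.ThreadPoolExecutor() as pool:  # 2. query every bot in parallel
        answers = list(pool.map(lambda b: ask_bot(b, question, contexts), bots))
    return summarize(question, answers)                    # 3. merge into a final answer

print(answer_question("test question"))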
Below is a walkthrough of search_with_lepton.py.
The file starts by importing the libraries it will use. (My inline comments below are a bit informal; please bear with me.)
import concurrent.futures  # high-level API for running tasks in parallel via thread or process pools
import glob  # finds file paths matching a pattern (wildcard matching on the filesystem)
import json  # encodes and decodes JSON, the usual exchange format for API requests and responses
import os  # operating-system interaction: environment variables, files, directories
import re  # regular expressions for string matching and substitution
import threading  # thread support for running blocks of code concurrently
import requests  # popular third-party HTTP library, typically used to call external APIs
import traceback  # produces stack traces for exceptions, which helps with debugging
from typing import Annotated, List, Generator, Optional  # standard type hints that improve readability and tooling support
from fastapi import HTTPException  # FastAPI is a modern web framework; HTTPException represents HTTP errors
from fastapi.responses import HTMLResponse, StreamingResponse, RedirectResponse  # the different kinds of HTTP responses the app returns
import httpx  # async-capable HTTP client with a richer feature set than requests
from loguru import logger
# loguru: an easy-to-use logging library with automatic formatting, exception capture,
# and output to multiple targets (files, console, etc.).
import leptonai
from leptonai import Client
from leptonai.kv import KV
from leptonai.photon import Photon, StaticFiles
from leptonai.photon.types import to_bool
from leptonai.api.workspace import WorkspaceInfoLocalRecord
from leptonai.util import tool
# Lepton AI libraries:
# leptonai: the main Lepton AI package; the concrete functionality comes from its submodules.
# leptonai.Client: the client class for talking to the Lepton AI platform API.
# leptonai.kv: key-value storage for saving and retrieving data, handy for temporary results.
# leptonai.photon: Lepton AI's framework for building and running applications; Photon is the
#   application base class, and StaticFiles serves static files.
# leptonai.photon.types: helper functions such as to_bool, which converts a string to a boolean.
# leptonai.api.workspace: workspace-related functionality; WorkspaceInfoLocalRecord manages
#   data about the current workspace.
# leptonai.util: utility module with general helpers; tool appears to be used for obtaining
#   tool specifications.
Next, the file defines, in order, a series of constants and prompt templates; you can think of them as constants, or static strings.
################################################################################
# Constant values for the RAG model.
################################################################################
# Search engine related. You don't really need to change this.
BING_SEARCH_V7_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"
BING_MKT = "en-US"
GOOGLE_SEARCH_ENDPOINT = "https://customsearch.googleapis.com/customsearch/v1"
SERPER_SEARCH_ENDPOINT = "https://google.serper.dev/search"
SEARCHAPI_SEARCH_ENDPOINT = "https://www.searchapi.io/api/v1/search"
# Specify the number of references from the search engine you want to use.
# 8 is usually a good number.
REFERENCE_COUNT = 8
# Specify the default timeout for the search engine. If the search engine
# does not respond within this time, we will return an error.
DEFAULT_SEARCH_ENGINE_TIMEOUT = 5
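To see how these constants get used, here is a simplified sketch of a Bing search helper. It assumes the standard Bing Web Search v7 API (query in the q parameter, key in the Ocp-Apim-Subscription-Key header); the real search_with_bing later in the file may differ in details such as error handling:

import requests

def search_with_bing_sketch(query: str, subscription_key: str) -> list:
    # Ask Bing for REFERENCE_COUNT results in the configured market,
    # giving up after DEFAULT_SEARCH_ENGINE_TIMEOUT seconds.
    response = requests.get(
        BING_SEARCH_V7_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
        params={"q": query, "mkt": BING_MKT, "count": REFERENCE_COUNT},
        timeout=DEFAULT_SEARCH_ENGINE_TIMEOUT,
    )
    response.raise_for_status()
    # Organic web results live under webPages -> value in the JSON payload.
    return response.json().get("webPages", {}).get("value", [])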
Let's walk through this part of the code: it defines a set of constants and prompt templates that are central to how the RAG (Retrieval-Augmented Generation) model runs.

Constants

BING_SEARCH_V7_ENDPOINT, GOOGLE_SEARCH_ENDPOINT, SERPER_SEARCH_ENDPOINT, SEARCHAPI_SEARCH_ENDPOINT: the API endpoint URLs of the different search engines, used to send them query requests.
BING_MKT: the market region for Bing search, set here to "en-US" (US English).
REFERENCE_COUNT: how many reference contexts to fetch from the search engine; 8 by default.
DEFAULT_SEARCH_ENGINE_TIMEOUT: the default timeout (in seconds) for a search query; 5 by default.

# If the user did not provide a query, we will use this default query.
_default_query = "Who said 'live long and prosper'?"

_default_query: the fallback query used when the user does not provide one: "Who said 'live long and prosper'?".

# A set of stop words to use - this is not a complete set, and you may want to
# add more given your observation.
stop_words = [
    "<|im_end|>",
    "[End]",
    "[end]",
    "\nReferences:\n",
    "\nSources:\n",
    "End.",
]

stop_words: a list of stop sequences that tell the LLM when to stop generating text. It includes common end markers such as <|im_end|>, [End], [end], \nReferences:\n, \nSources:\n, and End.
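As a quick illustration of how such stop sequences are typically wired in, here is a minimal sketch assuming an OpenAI-compatible chat completions endpoint; the client setup and model name are placeholders, not taken from this file, and note that the official OpenAI API accepts at most four stop sequences, hence the slice:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY (and optionally a base URL) in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Who said 'live long and prosper'?"}],
    stop=stop_words[:4],  # generation halts at the first matching sequence; OpenAI caps this at 4
    max_tokens=1024,
)
print(response.choices[0].message.content)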
Prompt templates
# This is really the most important part of the rag model. It gives instructions
# to the model on how to generate the answer. Of course, different models may
# behave differently, and we haven't tuned the prompt to make it optimal - this
# is left to you, application creators, as an open problem.
_rag_query_text = """
You are a large language AI assistant built by Lepton AI. You are given a user question, and please write clean, concise and accurate answer to the question. You will be given a set of related contexts to the question, each starting with a reference number like [[citation:x]], where x is a number. Please use the context and cite the context at the end of each sentence if applicable.
Your answer must be correct, accurate and written by an expert using an unbiased and professional tone. Please limit to 1024 tokens. Do not give any information that is not related to the question, and do not repeat. Say "information is missing on" followed by the related topic, if the given context do not provide sufficient information.
Please cite the contexts with the reference numbers, in the format [citation:x]. If a sentence comes from multiple contexts, please list all applicable citations, like [citation:3][citation:5]. Other than code and specific names and citations, your answer must be written in the same language as the question.
Here are the set of contexts:
{context}
Remember, don't blindly repeat the contexts verbatim. And here is the user question:
"""
_rag_query_text: this is the heart of the RAG model's prompting; it instructs the model on how to generate the answer. The related contexts are supplied with reference markers of the form [[citation:x]], where x is a number, and the model is asked to cite them appropriately in its answer using the [citation:x] format, listing every applicable citation when a sentence draws on multiple contexts.
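To make the {context} placeholder concrete, here is a hedged sketch of how retrieved snippets might be numbered and stitched into this prompt; the snippet field name is my assumption, and the real code may pull different fields out of the search results:

# Hypothetical search results; real entries come back from the search engine.
contexts = [
    {"snippet": "Spock, a Vulcan in Star Trek, popularized the phrase."},
    {"snippet": "'Live long and prosper' is the traditional Vulcan salute."},
]

# Number each snippet so the model can cite it as [citation:x].
context_block = "\n\n".join(
    f"[[citation:{i + 1}]] {c['snippet']}" for i, c in enumerate(contexts)
)
system_prompt = _rag_query_text.format(context=context_block)
print(system_prompt)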
# This is the prompt that asks the model to generate related questions to the
# original question and the contexts.
# Ideally, one want to include both the original question and the answer from the
# model, but we are not doing that here: if we need to wait for the answer, then
# the generation of the related questions will usually have to start only after
# the whole answer is generated. This creates a noticeable delay in the response
# time. As a result, and as you will see in the code, we will be sending out two
# consecutive requests to the model: one for the answer, and one for the related
# questions. This is not ideal, but it is a good tradeoff between response time
# and quality.
_more_questions_prompt = """
You are a helpful assistant that helps the user to ask related questions, based on user's original question and the related contexts. Please identify worthwhile topics that can be follow-ups, and write questions no longer than 20 words each. Please make sure that specifics, like events, names, locations, are included in follow up questions so they can be asked standalone. For example, if the original question asks about "the Manhattan project", in the follow up question, do not just say "the project", but use the full name "the Manhattan project". Your related questions must be in the same language as the original question.
Here are the contexts of the question:
{context}
Remember, based on the original question and related contexts, suggest three such further questions. Do NOT repeat the original question. Each related question should be no longer than 20 words. Here is the original question:
"""
_more_questions_prompt: this prompt template instructs the model to generate follow-up questions related to the original question and its contexts.
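The comment block above explains the key tradeoff: instead of waiting for the full answer before asking for related questions, the app fires two requests back to back. A minimal sketch of that pattern, with a stubbed-out ask_llm helper standing in for the real chat call:

import concurrent.futures

def ask_llm(system_prompt: str, question: str) -> str:
    # Stub standing in for a real chat-completion request.
    return f"(stub) reply for: {question}"

question = "Who said 'live long and prosper'?"
context_block = "..."  # built from the search results as sketched earlier

with concurrent.futures.ThreadPoolExecutor() as pool:
    # Both requests are submitted immediately; neither waits for the other.
    answer_future = pool.submit(
        ask_llm, _rag_query_text.format(context=context_block), question
    )
    related_future = pool.submit(
        ask_llm, _more_questions_prompt.format(context=context_block), question
    )
    answer = answer_future.result()
    related_questions = related_future.result()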
After Python has read through all the function and class definitions, execution reaches the entry point at the bottom of the file:
if __name__ == "__main__":
    rag = RAG()
    rag.launch()
Here an instance of the RAG class is created and bound to the name rag.
rag.launch() then starts the Photon application. It first calls init(), then registers the handlers: the methods of the RAG class marked with the @Photon.handler decorator (such as query_function and ui) become route handlers of the underlying FastAPI app.
Finally, it starts the server and begins handling requests.
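For intuition, here is a minimal sketch of that Photon lifecycle using only the pieces that appear in this file; whether Photon accepts exactly this toy handler is my assumption, so treat it as pseudocode for the lifecycle rather than a verified Lepton example:

from leptonai.photon import Photon

class HelloPhoton(Photon):
    def init(self):
        # One-time setup, analogous to the real init() below.
        self.greeting = "hello"

    @Photon.handler(method="GET", path="/hello")
    def hello(self) -> str:
        # Registered as a FastAPI route when launch() is called.
        return self.greeting

if __name__ == "__main__":
    HelloPhoton().launch()  # runs init(), registers handlers, starts the server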
The initialization code
def init(self):
    """
    Initializes photon configs.
    """
    # First, log in to the workspace.
    leptonai.api.workspace.login()
    self.backend = os.environ["BACKEND"].upper()
    if self.backend == "LEPTON":
        self.leptonsearch_client = Client(
            "https://search-api.lepton.run/",
            token=os.environ.get("LEPTON_WORKSPACE_TOKEN")
            or WorkspaceInfoLocalRecord.get_current_workspace_token(),
            stream=True,
            timeout=httpx.Timeout(connect=10, read=120, write=120, pool=10),
        )
    elif self.backend == "BING":
        self.search_api_key = os.environ["BING_SEARCH_V7_SUBSCRIPTION_KEY"]
        self.search_function = lambda query: search_with_bing(
            query,
            self.search_api_key,
        )
    elif self.backend == "GOOGLE":
        self.search_api_key = os.environ["GOOGLE_SEARCH_API_KEY"]
        self.search_function = lambda query: search_with_google(
            query,
            self.search_api_key,
            os.environ["GOOGLE_SEARCH_CX"],
        )
    elif self.backend == "SERPER":
        self.search_api_key = os.environ["SERPER_SEARCH_API_KEY"]
        self.search_function = lambda query: search_with_serper(
            query,
            self.search_api_key,
        )
    elif self.backend == "SEARCHAPI":
        self.search_api_key = os.environ["SEARCHAPI_API_KEY"]
        self.search_function = lambda query: search_with_searchapi(
            query,
            self.search_api_key,
        )
    else:
        raise RuntimeError("Backend must be LEPTON, BING, GOOGLE, SERPER or SEARCHAPI.")
    self.model = os.environ["LLM_MODEL"]
    # An executor to carry out async tasks, such as uploading to KV.
    self.executor = concurrent.futures.ThreadPoolExecutor(
        max_workers=self.handler_max_concurrency * 2
    )
    # Create the KV to store the search results.
    logger.info("Creating KV. May take a while for the first time.")
    self.kv = KV(
        os.environ["KV_NAME"], create_if_not_exists=True, error_if_exists=False
    )
    # whether we should generate related questions.
    self.should_do_related_questions = to_bool(os.environ["RELATED_QUESTIONS"])
The initialization code breaks down as follows:

leptonai.api.workspace.login()

First, the app logs in to the Lepton AI workspace, presumably to obtain the necessary credentials and settings.

self.backend = os.environ["BACKEND"].upper()

It then reads the backend type from an environment variable (uppercased) and configures itself differently depending on the chosen backend:
if self.backend == "LEPTON":
    self.leptonsearch_client = Client(
        "https://search-api.lepton.run/",
        token=os.environ.get("LEPTON_WORKSPACE_TOKEN")
        or WorkspaceInfoLocalRecord.get_current_workspace_token(),
        stream=True,
        timeout=httpx.Timeout(connect=10, read=120, write=120, pool=10),
    )
elif self.backend == "BING":
    self.search_api_key = os.environ["BING_SEARCH_V7_SUBSCRIPTION_KEY"]
    self.search_function = lambda query: search_with_bing(query, self.search_api_key)
# ... (similar blocks for GOOGLE, SERPER, SEARCHAPI)
else:
    raise RuntimeError("Backend must be LEPTON, BING, GOOGLE, SERPER or SEARCHAPI.")
If an unknown backend type is supplied, a RuntimeError is raised.
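As an aside, the if/elif chain could also be expressed as a dispatch table. A hedged sketch of the idea, not how the file actually does it (GOOGLE is omitted here because it needs the extra GOOGLE_SEARCH_CX argument, and the helpers are stubbed):

import os

def search_with_bing(query, key): return []       # stubs for the real helpers
def search_with_serper(query, key): return []
def search_with_searchapi(query, key): return []

# Map each backend name to (env var holding its API key, search helper).
SEARCH_BACKENDS = {
    "BING": ("BING_SEARCH_V7_SUBSCRIPTION_KEY", search_with_bing),
    "SERPER": ("SERPER_SEARCH_API_KEY", search_with_serper),
    "SEARCHAPI": ("SEARCHAPI_API_KEY", search_with_searchapi),
}

def make_search_function(backend: str):
    if backend not in SEARCH_BACKENDS:
        raise RuntimeError(f"Unsupported backend: {backend}")
    key_env, search_fn = SEARCH_BACKENDS[backend]
    api_key = os.environ[key_env]
    return lambda query: search_fn(query, api_key)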
self.model = os.environ["LLM_MODEL"]

This sets the language model to use, read from an environment variable.

self.executor = concurrent.futures.ThreadPoolExecutor(
    max_workers=self.handler_max_concurrency * 2
)

This creates a thread pool executor for asynchronous tasks, such as uploading results to the KV store.

self.kv = KV(
    os.environ["KV_NAME"], create_if_not_exists=True, error_if_exists=False
)

This creates (or connects to) a KV (key-value) store that will hold the search results.

self.should_do_related_questions = to_bool(os.environ["RELATED_QUESTIONS"])

This reads an environment variable to decide whether related questions should be generated.
In short, this init method completes the key setup: logging in to the workspace, configuring the chosen search backend, selecting the LLM, creating the thread pool and the KV store, and deciding whether to generate related questions.
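To run init() locally you would need the environment variables it reads. A sketch for the BING backend; the variable names come from the code above, but every value here is a placeholder:

import os

os.environ["BACKEND"] = "BING"
os.environ["BING_SEARCH_V7_SUBSCRIPTION_KEY"] = "<your-bing-key>"  # placeholder
os.environ["LLM_MODEL"] = "<your-model-name>"                      # placeholder
os.environ["KV_NAME"] = "<your-kv-name>"                           # placeholder
os.environ["RELATED_QUESTIONS"] = "true"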
Once initialization is done and the service is running, the following handler executes whenever the site receives a GET request for "/", i.e., whenever a user opens the page:
@Photon.handler(method="GET", path="/")
def index(self) -> RedirectResponse:
"""
Redirects "/" to the ui page.
"""
return RedirectResponse(url="/ui/index.html")
Then, when the user types a query into the search box and submits it, the page sends a POST request to the backend. Its entry point is here:

@Photon.handler(method="POST", path="/query")

This is where the real action happens, but to make sure today's post goes out on time, I'll cover this handler and the rest of the code tomorrow.