Day6 GAI Explosion Era - Advanced LangChain: Output Parsers - iT 邦幫忙

2024 iThome Ironman Contest

DAY 6

Generative AI

Series: LLM Applications, Development Frameworks, RAG Optimization and Evaluation Methods, Part 6


16th Ironman Contest

wow_ppwx

2024-08-13 00:41:02


Output Parsers

Output parsers let you customize the format of the model's output.

Returning the reply as plain text

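from langchain_core.output_parsers import StrOutputParser
str_parser = StrOutputParser()

# chat_model is assumed to be a LangChain chat model set up beforehand (e.g. a ChatOpenAI instance)
# Prompt: "Name an NBA team and its strongest player"
message = chat_model.invoke("請提供一個NBA球隊和裡面最強的球員")
print(message.content)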

NBA球隊：洛杉磯湖人隊

最強的球員：勒布朗·詹姆斯（LeBron James）

print(str_parser.invoke(message))  # StrOutputParser() automatically extracts .content

NBA球隊：洛杉磯湖人隊

最強的球員：勒布朗·詹姆斯（LeBron James）
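Conceptually, StrOutputParser reduces a chat message to its text. The stdlib-only sketch below mimics just that idea with two hypothetical stand-ins, `FakeAIMessage` and `MiniStrParser`; it is not LangChain's implementation, which also deals with streaming and other input types.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for a LangChain chat message (only .content)."""
    content: str


class MiniStrParser:
    """Hypothetical sketch of the idea behind StrOutputParser."""

    def invoke(self, value: Union[str, FakeAIMessage]) -> str:
        # Plain strings pass through unchanged
        if isinstance(value, str):
            return value
        # Message objects are reduced to their text content
        return value.content


message = FakeAIMessage(content="NBA球隊：洛杉磯湖人隊\n最強的球員：勒布朗·詹姆斯（LeBron James）")
print(MiniStrParser().invoke(message))
```

The point is that downstream code receives a plain `str` instead of a message object.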

Returning JSON

from langchain_core.output_parsers import JsonOutputParser

json_parser = JsonOutputParser()  # create a parser for JSON-formatted output
format_instructions = json_parser.get_format_instructions()
print(format_instructions)

Return a JSON object.

# Prompt: "Name an NBA team and its strongest player, {format_instructions}, answer in English"
message = chat_model.invoke("請提供一個NBA球隊和裡面最強的球員," f"{format_instructions},使用英文回答")

print(message.content)

{
  "team": "Los Angeles Lakers",
  "strongest_player": "LeBron James"
}

# Parse into a Python dict
json_output = json_parser.invoke(message)
print(json_output)

{'team': 'Los Angeles Lakers', 'strongest_player': 'LeBron James'}
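Models sometimes wrap their JSON in a Markdown json code fence instead of returning it bare; as far as I know, JsonOutputParser tolerates both. The stdlib sketch below shows the idea with a hypothetical helper, `parse_json_reply` (not a LangChain API): strip an optional fence, then decode with `json.loads`.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built here so the pattern stays readable
# Optional ```json ... ``` wrapper; the body is captured lazily
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_json_reply(text: str) -> dict:
    """Hypothetical helper: parse a reply that may wrap JSON in a Markdown fence."""
    match = FENCE_RE.search(text)
    body = match.group(1) if match else text  # fall back to the whole reply
    return json.loads(body.strip())


raw = '{"team": "Los Angeles Lakers", "strongest_player": "LeBron James"}'
fenced = FENCE + "json\n" + raw + "\n" + FENCE
print(parse_json_reply(raw))
print(parse_json_reply(fenced))
```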

Combining the output parser with a prompt template

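from langchain_core.prompts import PromptTemplate

# Prompt: "Name an NBA team and its strongest center, forward and {shoot}, {format_instructions}, in English"
prompt = PromptTemplate.from_template("請提供一個NBA球隊和裡面最強的中鋒、前鋒和{shoot}, " "{format_instructions}, 使用英文")
prompt = prompt.partial(format_instructions=format_instructions)  # format_instructions makes the model answer in JSON
message = chat_model.invoke(prompt.invoke({"shoot": "最強射手"}))  # 最強射手 = "strongest shooter"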
print(message.content)

{
  "team": "Los Angeles Lakers",
  "center": "Anthony Davis",
  "power forward": "LeBron James",
  "shooting guard": "Klay Thompson"
}

json_output = json_parser.invoke(message)
json_output

{'team': 'Los Angeles Lakers',
 'center': 'Anthony Davis',
 'power forward': 'LeBron James',
 'shooting guard': 'Klay Thompson'}
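`prompt.partial(...)` fills in some template variables up front, so later calls only supply the rest. The stdlib sketch below reproduces that idea with `str.format`; `make_partial` is a hypothetical helper, and LangChain's PromptTemplate does more (input-variable validation, other template formats).

```python
def make_partial(template: str, **fixed: str):
    """Hypothetical helper: pre-fill some placeholders, fill the rest later."""

    def fill(**rest: str) -> str:
        # Merge the pre-filled values with the ones supplied at call time
        return template.format(**{**fixed, **rest})

    return fill


template = "請提供一個NBA球隊和裡面最強的中鋒、前鋒和{shoot}, {format_instructions}, 使用英文"
fill_prompt = make_partial(template, format_instructions="Return a JSON object.")
print(fill_prompt(shoot="最強射手"))
```

Fixing `format_instructions` once keeps the parser's instructions out of every call site.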

Returning comma-separated values (CSV)

from langchain_core.output_parsers import CommaSeparatedListOutputParser

list_parser = CommaSeparatedListOutputParser()
print(list_parser.get_format_instructions())

Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`

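# Prompt: "Name famous attractions in the country {country}", followed by the format instructions
prompt = PromptTemplate.from_template("請說出國家{country}的知名景點\n{instructions}").partial(instructions=list_parser.get_format_instructions())
response = chat_model.invoke(prompt.format(country="日本"))  # 日本 = Japan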
print(response.content)

print(list_parser.invoke(response))

東京迪士尼樂園, 京都清水寺,  京都金閣寺,  大阪環球影城,  關西奈良公園,  北海道富良野,  沖繩美國村,  富士山,  鹿兒島櫻島,  札幌時計台
['東京迪士尼樂園', '京都清水寺', '京都金閣寺', '大阪環球影城', '關西奈良公園', '北海道富良野', '沖繩美國村', '富士山', '鹿兒島櫻島', '札幌時計台']
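The step from the raw reply to the Python list above is essentially "split on commas and trim whitespace", which is why the double spaces in the reply disappear. A simplified stdlib stand-in (the function `parse_comma_list` is hypothetical, and unlike LangChain's parser it also drops empty items):

```python
from typing import List


def parse_comma_list(text: str) -> List[str]:
    """Hypothetical simplified stand-in for a comma-separated list parser."""
    # Split on commas, trim surrounding whitespace, drop empty items
    return [item.strip() for item in text.split(",") if item.strip()]


reply = "東京迪士尼樂園, 京都清水寺,  京都金閣寺,  大阪環球影城,  關西奈良公園,  北海道富良野,  沖繩美國村,  富士山,  鹿兒島櫻島,  札幌時計台"
print(parse_comma_list(reply))
```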

Custom output format

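from typing import List
from langchain_core.output_parsers import PydanticOutputParser
# BaseModel declares the class and its fields; Field describes what each field means and is for.
# (On langchain-core 0.3+, import BaseModel and Field from pydantic instead of langchain_core.pydantic_v1.)
from langchain_core.pydantic_v1 import BaseModel, Field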

class TravelPlan(BaseModel):  # define the desired output format as a class
    destination: str = Field(description="旅遊目的地, 如日本北海道")  # travel destination, e.g. Hokkaido, Japan
    activities: List[str] = Field(description="推薦的活動")  # recommended activities
    budget: float = Field(description="預算範圍,單位新台幣")  # budget, in New Taiwan dollars
    accommodation: List[str] = Field(description="住宿選項")  # accommodation options

parser = PydanticOutputParser(pydantic_object=TravelPlan)  # pydantic_object must be a Pydantic model class
format_instructions = parser.get_format_instructions()
print(format_instructions)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"destination": {"title": "Destination", "description": "\u65c5\u904a\u76ee\u7684\u5730, \u5982\u65e5\u672c\u5317\u6d77\u9053", "type": "string"}, "activities": {"title": "Activities", "description": "\u63a8\u85a6\u7684\u6d3b\u52d5", "type": "array", "items": {"type": "string"}}, "budget": {"title": "Budget", "description": "\u9810\u7b97\u7bc4\u570d,\u55ae\u4f4d\u65b0\u53f0\u5e63", "type": "number"}, "accommodation": {"title": "Accommodation", "description": "\u4f4f\u5bbf\u9078\u9805", "type": "array", "items": {"type": "string"}}}, "required": ["destination", "activities", "budget", "accommodation"]}
```

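from langchain_core.prompts import ChatPromptTemplate

# System: "Use Traditional Chinese and recommend a suitable travel plan based on the user's request"
prompt = ChatPromptTemplate.from_messages(
    [("system", "使用繁體中文並根據使用者要求推薦出適合的旅遊計劃,\n"
                "{format_instructions}"),
     ("human", "{query}")
    ]
)
new_prompt = prompt.partial(format_instructions=format_instructions)

# "I like diving and walking at sunset, so I'd like to plan a beach vacation"
user_query = "我喜歡潛水以及在日落時散步, 所以想要安排一個海邊假期"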
user_prompt = new_prompt.invoke({"query": user_query})
response = chat_model.invoke(user_prompt)
print(response.content)

{
    "destination": "巴厘島",
    "activities": ["潛水", "散步看日落"],
    "budget": 20000,
    "accommodation": ["海邊別墅", "度假村"]
}

parser_output = parser.invoke(response)
print(parser_output)

destination='巴厘島' activities=['潛水', '散步看日落'] budget=20000.0 accommodation=['海邊別墅', '度假村']
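What the Pydantic parser adds over plain JSON parsing is validation against the declared fields, including type coercion (the `budget` of `20000` came back as `20000.0`). The stdlib sketch below approximates that with a dataclass; `TravelPlanSketch` and `parse_travel_plan` are hypothetical names, and the checks are far lighter than real Pydantic validation.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class TravelPlanSketch:
    """Hypothetical stdlib mirror of the TravelPlan Pydantic model."""
    destination: str
    activities: List[str]
    budget: float
    accommodation: List[str]


def parse_travel_plan(text: str) -> TravelPlanSketch:
    """Hypothetical helper: JSON text -> typed object, with light checks."""
    data = json.loads(text)
    # Missing or unexpected keys make the dataclass constructor raise TypeError
    plan = TravelPlanSketch(**data)
    # Coerce the budget to float, much as Pydantic turned 20000 into 20000.0
    plan.budget = float(plan.budget)
    if not all(isinstance(item, str) for item in plan.activities):
        raise ValueError("activities must be a list of strings")
    return plan


reply = ('{"destination": "巴厘島", "activities": ["潛水", "散步看日落"], '
         '"budget": 20000, "accommodation": ["海邊別墅", "度假村"]}')
print(parse_travel_plan(reply))
```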

Structured output format

from langchain.output_parsers import (
    ResponseSchema,
    StructuredOutputParser)

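response_schemas = [
    ResponseSchema(  # each ResponseSchema describes one output field
        name="country_data",
        description="請提供包含國家的首都和知名景點的 JSON 物件"),  # a JSON object with the country's capital and famous attractions
    ResponseSchema(
        name="source",
        description="回答答案的根據來源, 例如：來源網站網址",  # sources for the answer, e.g. website URLs
        type="list"
    ),
    ResponseSchema(
        name="time",
        description="國家建國的時間",  # when the country was founded
        type="YYYY-MM-DD"  # free-form type hint, inserted verbatim into the instructions
    )
]
output_parser = StructuredOutputParser(response_schemas=response_schemas)  # create the parser
print(output_parser.get_format_instructions())  # asks the model to return the JSON inside a Markdown code block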

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"country_data": string  // 請提供包含國家的首都和知名景點的 JSON 物件
	"source": list  // 回答答案的根據來源, 例如：來源網站網址
	"time": YYYY-MM-DD  // 國家建國的時間
}
```
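On the parsing side, the reply is expected to contain that json code block with every declared key. The stdlib sketch below assumes the parser's job is roughly "extract the fenced JSON, then verify the keys"; `parse_structured` is hypothetical, and where (as far as I know) LangChain raises an OutputParserException for a missing key, this sketch raises ValueError. The sample reply is illustrative, shaped like the instructions above.

```python
import json
import re
from typing import List

FENCE = "`" * 3  # three backticks, built here so the pattern stays readable
JSON_BLOCK_RE = re.compile(FENCE + r"json\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_structured(text: str, expected_keys: List[str]) -> dict:
    """Hypothetical sketch: extract the json code block and check required keys."""
    match = JSON_BLOCK_RE.search(text)
    if match is None:
        raise ValueError("no json code block found")
    data = json.loads(match.group(1))
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Illustrative reply; country_data is itself a JSON string, as in the run below
payload = {
    "country_data": json.dumps({"首都": "華盛頓特區"}, ensure_ascii=False),
    "source": ["https://www.usa.gov/"],
    "time": "1776-07-04",
}
reply = FENCE + "json\n" + json.dumps(payload, ensure_ascii=False) + "\n" + FENCE
print(parse_structured(reply, ["country_data", "source", "time"]))
```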

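format_instructions = output_parser.get_format_instructions()
# System: roughly "Answer the question in the language used in Taiwan, {format_instructions}"
prompt = ChatPromptTemplate.from_messages([
        ("system", "使用台灣語言並回答問題,{format_instructions}"),
        ("human", "{question}")
        ])
prompt = prompt.partial(format_instructions=format_instructions)

# 美國 = United States; invoke() keeps the system/human roles, while format() would flatten them into one string
response = chat_model.invoke(prompt.invoke({"question": "美國"}))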
print(output_parser.invoke(response))

{'country_data': '{\n\t\t"首都": "華盛頓特區",\n\t\t"知名景點": ["自由女神像", "白宮", "大峽谷"] \n\t}', 'source': ['https://www.usa.gov/'], 'time': '1776-07-04'}

print(output_parser.invoke(response)['country_data'])

    {
    		"首都": "華盛頓特區",
    		"知名景點": ["自由女神像", "白宮", "大峽谷"] 
    	}
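Because no `type` was given for `country_data`, ResponseSchema falls back to `string` (the instructions above show `"country_data": string`), so the capital and attractions came back as a JSON string rather than a dict. If you need a dict, decode it a second time; a minimal stdlib sketch using the value printed above:

```python
import json

# country_data as returned above: a string containing JSON, not a dict
country_data = '{\n\t\t"首都": "華盛頓特區",\n\t\t"知名景點": ["自由女神像", "白宮", "大峽谷"] \n\t}'

details = json.loads(country_data)  # the tabs and newlines are ordinary JSON whitespace
print(details["首都"])
print(details["知名景點"])
```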

Next time we'll cover LCEL (LangChain Expression Language)!

