2023 iThome Ironman Contest

DAY 10

AI & Data

AI: From Re-Introduction to Advanced, part 10 of the series

[Day 10] A Simple Chinese Translation of the LangChain Tutorial

15th Ironman Contest

中年一般人

2023-09-15 21:18:15


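These are my notes from getting started with LangChain (out of a mix of laziness and curiosity). The tutorial as a whole is fairly basic; my impression is that the teaching material is ordinary, but the framework itself looks promising.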

I strongly recommend opening this in Colab.

A Simple Chinese Translation of the LangChain Getting-Started Tutorial

LangChain is a framework for "developing applications powered by language models".

GitHub: https://github.com/hwchase17/langchain
Docs: https://python.langchain.com/en/latest/index.html

Overview:

Installation
LLMs
Prompt Templates
Chains (workflows)
Agents and Tools
Memory
Document Loaders
Indexes

Installation

!pip install langchain

1. LLMs

A generic interface for LLMs:

!pip install openai

import os
os.environ["OPENAI_API_KEY"] = "paste-your-own-OPENAI_API_KEY-here"
from langchain.llms import OpenAI

llm = OpenAI(temperature=0.9)  # model_name="text-davinci-003"
text = "What would be a good company name for a company that makes colorful socks?"
print(llm(text))

Output: VividToes.

!pip install huggingface_hub

import os
from langchain import HuggingFaceHub
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "paste-your-own-HUGGINGFACEHUB_API_TOKEN-here"  # a token with read access is enough

# https://huggingface.co/google/flan-t5-base
llm = HuggingFaceHub(repo_id="google/flan-t5-base", model_kwargs={"temperature":0.9, "max_length":64})

llm("translate English to German: How old are you?")
#print(llm("translate English to German: How old are you?"))

Output: Wie old sind Sie?
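
Both wrappers above are used the same way: call the object with a prompt string and get a string back. The sketch below illustrates that idea in plain Python with two stand-in models. `TextLLM`, `EchoLLM`, `UpperLLM`, and `ask` are hypothetical names made up for this illustration; they are not LangChain classes and no real model is called.

```python
from typing import Protocol


class TextLLM(Protocol):
    """Hypothetical interface: anything callable as prompt -> completion."""

    def __call__(self, prompt: str) -> str: ...


class EchoLLM:
    """Stand-in model that repeats the prompt back."""

    def __call__(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperLLM:
    """Stand-in model that upper-cases the prompt."""

    def __call__(self, prompt: str) -> str:
        return prompt.upper()


def ask(llm: TextLLM, prompt: str) -> str:
    # The caller does not care which backend sits behind the interface.
    return llm(prompt)


answers = [ask(model, "hello") for model in (EchoLLM(), UpperLLM())]
print(answers)
```

Swapping one backend for another only changes the object that is passed in, which is roughly the convenience the generic interface is meant to give.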

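2. Prompt Templates

LangChain helps with managing and optimizing prompts.

Typically, when you use LLMs in an application, you do not send user input directly to the LLM. Instead, you usually take the user input and build a prompt around it (adding background, context, or formatting, for example) and only then send it to the LLM.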

llm("Can Barack Obama have a conversation with George Washington?")

Output: no
prompt = """Question: Can Barack Obama have a conversation with George Washington?

Let's think step by step.

Answer: """
llm(prompt)
Output: George Washington was born in 1789. Barack Obama was born in 1803. The answer: no.

from langchain import PromptTemplate

template = """Question: {question}

Let's think step by step.

Answer: """

prompt = PromptTemplate(template=template, input_variables=["question"])

prompt.format(question="Can Barack Obama have a conversation with George Washington?")

Output: Question: Can Barack Obama have a conversation with George Washington?\n\nLet's think step by step.\n\nAnswer:
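
To make it clearer what a prompt template does, here is a minimal sketch built on Python's own `str.format`. `MiniPromptTemplate` is a hypothetical toy class written for this note, not the LangChain implementation; it only checks that the declared variables match the placeholders and then fills them in.

```python
import string


class MiniPromptTemplate:
    """Toy stand-in for a prompt template (not the LangChain class)."""

    def __init__(self, template: str, input_variables: list[str]):
        # Collect the {placeholders} that actually occur in the template.
        found = {name for _, name, _, _ in string.Formatter().parse(template) if name}
        if found != set(input_variables):
            raise ValueError(
                f"template uses {sorted(found)}, declared {sorted(input_variables)}"
            )
        self.template = template
        self.input_variables = input_variables

    def format(self, **kwargs: str) -> str:
        # Refuse to build a prompt with unfilled placeholders.
        missing = set(self.input_variables) - set(kwargs)
        if missing:
            raise KeyError(f"missing variables: {sorted(missing)}")
        return self.template.format(**kwargs)


template = "Question: {question}\n\nLet's think step by step.\n\nAnswer: "
mini_prompt = MiniPromptTemplate(template, ["question"])
formatted = mini_prompt.format(
    question="Can Barack Obama have a conversation with George Washington?"
)
print(repr(formatted))
```

The point of declaring `input_variables` up front is that a mismatch between the template text and the expected inputs fails early instead of producing a half-filled prompt.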

3. Chains (Workflows)

Combine LLMs and prompts into multi-step workflows.

from langchain import LLMChain

llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "Can Barack Obama have a conversation with George Washington?"

print(llm_chain.run(question))

Output: George Washington was born in 1789. Barack Obama was born in 1803. The answer: no.
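
Conceptually, the chain above does two things in order: fill the template, then call the model. The sketch below shows that composition in plain Python. `make_chain`, `fake_llm`, and `seen_prompts` are hypothetical names for this illustration, and the fake model only records what it was given, so the example runs without any API key.

```python
from typing import Callable

seen_prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: remembers the prompt and returns a fixed reply."""
    seen_prompts.append(prompt)
    return "stub answer"


def make_chain(template: str, llm: Callable[[str], str]) -> Callable[[str], str]:
    """Return a function that fills the template and then calls the model."""

    def run(question: str) -> str:
        prompt = template.format(question=question)  # step 1: build the prompt
        return llm(prompt)  # step 2: send it to the model

    return run


template = "Question: {question}\n\nLet's think step by step.\n\nAnswer: "
chain = make_chain(template, fake_llm)
result = chain("Can Barack Obama have a conversation with George Washington?")
print(result)
print(seen_prompts[0])
```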

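4. Agents and Tools

An agent uses an LLM to decide which action to take, takes that action, looks at the resulting observation, and repeats until it is done.

Used well, agents can be extremely powerful. To load an agent, you should understand the following concepts:

Tool: a function that performs a specific duty. This can be things like Google Search, a database lookup, a Python REPL, or other chains (workflows).
LLM: the language model powering the agent.
Agent: the agent type to use.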

Tools: https://python.langchain.com/en/latest/modules/agents/tools.html

Agent Types: https://python.langchain.com/en/latest/modules/agents/agents/agent_types.html

!pip install wikipedia

from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)  # temperature=0 keeps the agent's reasoning as repeatable as possible
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
agent.run("In what year was the film Departed with Leonardo Dicaprio released? What is this year raised to the 0.43 power?")

The output is shown below:

Entering new AgentExecutor chain...
I need to find out the year the film was released and then use the calculator to calculate the power.
Action: Wikipedia
Action Input: Departed with Leonardo Dicaprio
Observation: Page: Leonardo DiCaprio filmography
Summary: Leonardo DiCaprio is an American actor who began his career performing as a child on television. He appeared on the shows The New Lassie (1989) and Santa Barbara (1990) and also had long running roles in the comedy-drama Parenthood (1990) and the sitcom Growing Pains (1991). DiCaprio played Tobias "Toby" Wolff opposite Robert De Niro in the biographical coming-of-age drama This Boy's Life in 1993. In the same year, he had a supporting role as a developmentally disabled boy Arnie Grape in What's Eating Gilbert Grape, which earned him nominations for the Academy Award for Best Supporting Actor and the Golden Globe Award for Best Supporting Actor – Motion Picture. In 1995, DiCaprio played the leading roles of an American author Jim Carroll in The Basketball Diaries and the French poet Arthur Rimbaud in Total Eclipse. The following year he played Romeo Montague in the Baz Luhrmann-directed film Romeo + Juliet (1996). DiCaprio starred with Kate Winslet in the James Cameron-directed film Titanic (1997). The film became the highest grossing at the worldwide box-office, and made him famous globally. For his performance as Jack Dawson, he received the MTV Movie Award for Best Male Performance and his first nomination for the Golden Globe Award for Best Actor – Motion Picture Drama.In 2002, DiCaprio played con-artist Frank Abagnale, Jr. opposite Tom Hanks in the Steven Spielberg-directed biographical crime-drama Catch Me If You Can and also starred in the Martin Scorsese-directed historical drama Gangs of New York. He founded his own production company, Appian Way, in 2004. The next two films he starred in were both directed by Scorsese: the Howard Hughes biopic The Aviator (2004) and the crime drama The Departed (2006). 
For his portrayal of Hughes in the former, DiCaprio won the Golden Globe Award for Best Actor – Motion Picture Drama and garnered his first nomination for the Academy Award for Best Actor.DiCaprio produced the environmental documentary The 11th Hour and the comedy drama Gardener of Eden in 2007. The following year, he reunited with Kate Winslet in the Sam Mendes-directed drama Revolutionary Road and appeared in the Ridley Scott-directed action film Body of Lies. DiCaprio reteamed with Scorsese in 2010 in the psychological thriller Shutter Island and also starred in the Christopher Nolan-directed science fiction heist thriller Inception. In 2011, he portrayed J. Edgar Hoover, the first director of the FBI, in the biopic J. Edgar. The following year, he played a supporting role in the Quentin Tarantino-directed western Django Unchained. DiCaprio starred in two film adaptations of novels in 2013; he first appeared as Jay Gatsby in the Luhrmann-directed adaptation of F. Scott Fitzgerald's novel The Great Gatsby, and later as Jordan Belfort in The Wolf of Wall Street, an adaptation of Belfort's memoir of the same name. The latter earned him a third Academy Award nomination for Best Actor and a Golden Globe Award for Best Actor – Motion Picture Musical or Comedy. In 2015, DiCaprio played fur trapper Hugh Glass in the survival drama The Revenant, for which he won the Academy Award for Best Actor. In 2019, he starred as an actor on the decline in the Tarantino-directed comedy-drama Once Upon a Time in Hollywood with Brad Pitt and Margot Robbie.

Page: Martin Scorsese and Leonardo DiCaprio
Summary: Martin Scorsese and Leonardo DiCaprio are frequent collaborators in cinema, with DiCaprio appearing in six feature films and one short film made by Scorsese since 2002. The films explore a variety of genres, including historical epic, crime, thriller, biopic, comedy and western. Several have been listed on many critics' year-end top ten and best-of-decade lists.
The duo's films have been nominated for thirty-one Academy Awards, winning nine. In 2013, the duo was awarded National Board of Review Spotlight award for career collaboration. Scorsese's work with DiCaprio is considered to be as vi
Thought: I now know the year the film was released
Action: Calculator
Action Input: 2006 raised to the 0.43 power
Observation: Answer: 26.30281917656938
Thought: I now know the final answer
Final Answer: The film Departed with Leonardo DiCaprio was released in 2006 and this year raised to the 0.43 power is 26.30281917656938.

Finished chain.
The film Departed with Leonardo DiCaprio was released in 2006 and this year raised to the 0.43 power is 26.30281917656938.
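
The trace above follows a loop of thought, action, observation, repeated until a final answer appears. Below is a minimal sketch of such a loop in plain Python. Everything here is an assumption for illustration: `scripted_llm` is a fake model that follows a fixed script, `Lookup` is a canned stand-in for the Wikipedia tool, and `run_agent` is not how LangChain's agent executor is actually implemented. Only the calculator does real work.

```python
import re
from typing import Callable


def lookup(query: str) -> str:
    """Canned stand-in for a search tool."""
    return "The Departed was released in 2006."


def calculator(expression: str) -> str:
    """Tiny calculator that only understands 'base ** exponent'."""
    base, exponent = expression.split("**")
    return str(float(base) ** float(exponent))


TOOLS: dict[str, Callable[[str], str]] = {"Lookup": lookup, "Calculator": calculator}
ACTION_RE = re.compile(r"Action: (.+)\nAction Input: (.+)")


def scripted_llm(scratchpad: str) -> str:
    """Fake model: picks its next step from how many observations it has seen."""
    observations = re.findall(r"Observation: (.+)", scratchpad)
    if len(observations) == 0:
        return "Thought: I need the release year.\nAction: Lookup\nAction Input: The Departed"
    if len(observations) == 1:
        year = re.search(r"\d{4}", observations[0]).group()
        return f"Thought: Now compute the power.\nAction: Calculator\nAction Input: {year} ** 0.43"
    return f"Thought: I now know the final answer\nFinal Answer: {observations[1]}"


def run_agent(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(scratchpad)
        scratchpad += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            raise ValueError("model reply contained no action")
        tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
        observation = TOOLS[tool_name](tool_input)  # act, then observe
        scratchpad += f"Observation: {observation}\n"
    raise RuntimeError("agent did not finish within max_steps")


final = run_agent("When was The Departed released, and what is that year to the 0.43 power?", scripted_llm)
print(final)
```

The `max_steps` cap is a design choice worth keeping in any loop like this, since a model that never emits a final answer would otherwise run forever.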

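5. Memory

Add state to chains (workflows) and agents.

Memory is the concept of persisting state between calls of a chain (workflow) or agent.
LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains (workflows) and agents that use memory.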

from langchain import OpenAI, ConversationChain

llm = OpenAI(temperature=0)
conversation = ConversationChain(llm=llm, verbose=True)

conversation.predict(input="Hi there!")

Output:

Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi there!
AI:

Finished chain.
Hi there! It's nice to meet you. How can I help you today?

conversation.predict(input="Can we talk about AI?")

Output:

Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI: Hi there! It's nice to meet you. How can I help you today?
Human: Can we talk about AI?
AI:

Finished chain.
Absolutely! What would you like to know about AI?

conversation.predict(input="I'm interested in Reinforcement Learning.")

Output:

Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI: Hi there! It's nice to meet you. How can I help you today?
Human: Can we talk about AI?
AI: Absolutely! What would you like to know about AI?
Human: I'm interested in Reinforcement Learning.
AI:

Finished chain.
Sure! Reinforcement Learning is a type of machine learning algorithm that allows an AI agent to learn from its environment by taking actions and receiving rewards or punishments. It is used to solve complex problems that require trial and error. Would you like to know more about how it works?
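
The growing "Current conversation" block in the three outputs above is the memory at work: every earlier turn is replayed in the next prompt. Here is a minimal sketch of that buffering idea in plain Python. `BufferMemory`, `predict`, and `counting_llm` are hypothetical names for this illustration and the model is a stub, so this shows the mechanism only and is not the LangChain implementation.

```python
from typing import Callable

PREFIX = "The following is a friendly conversation between a human and an AI."


class BufferMemory:
    """Toy conversation memory: keeps every turn and replays it in the prompt."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def render(self) -> str:
        return "\n".join(f"Human: {human}\nAI: {ai}" for human, ai in self.turns)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))


def counting_llm(prompt: str) -> str:
    """Stub model: replies with the number of human turns it can see."""
    return f"reply {prompt.count('Human:')}"


def predict(memory: BufferMemory, llm: Callable[[str], str], user_input: str) -> tuple[str, str]:
    history = memory.render()
    prompt = f"{PREFIX}\n\nCurrent conversation:\n{history}\nHuman: {user_input}\nAI:"
    reply = llm(prompt)
    memory.save(user_input, reply)  # persist the turn for the next call
    return prompt, reply


memory = BufferMemory()
first_prompt, first_reply = predict(memory, counting_llm, "Hi there!")
second_prompt, second_reply = predict(memory, counting_llm, "Can we talk about AI?")
print(second_prompt)
print(second_reply)
```

Because the whole history is resent every time, the prompt keeps growing with the conversation, which is the main trade-off of a plain buffer.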

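6. Document Loaders

Combining language models with your own text data is an effective way to get more out of them.

The first step is to load the data into "documents", which is just a fancy way of saying some pieces of text. This module is aimed at making that easy.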

https://python.langchain.com/en/latest/modules/indexes/document_loaders.html

from langchain.document_loaders import NotionDirectoryLoader

loader = NotionDirectoryLoader("Notion_DB")

docs = loader.load()
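
If you do not have a Notion export at hand, the idea of a loader can still be tried out with a small stand-in. The sketch below reads every Markdown file in a folder into a simple record with `page_content` and `metadata` fields, mirroring the attribute names LangChain documents use. The `Document` dataclass and `load_directory` function here are toy versions written for this note, and the `*.md` pattern and demo files are assumptions.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Toy document: a piece of text plus metadata about where it came from."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def load_directory(directory: str, pattern: str = "*.md") -> list[Document]:
    """Read every matching file under `directory` into a Document."""
    docs = []
    for path in sorted(Path(directory).rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        docs.append(Document(page_content=text, metadata={"source": str(path)}))
    return docs


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.md").write_text("# Page A\nfirst note", encoding="utf-8")
    Path(tmp, "b.md").write_text("# Page B\nsecond note", encoding="utf-8")
    loaded = load_directory(tmp)

print([doc.page_content for doc in loaded])
```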

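7. Indexes

Indexes refer to ways of structuring documents so that LLMs can best interact with them. This module contains utility functions for working with documents:

Embeddings: an embedding is a numerical representation of a piece of information, such as text, documents, images, or audio.
Text splitters: when you want to work with long text, you need to split it into chunks.
Vectorstores: vector databases store and index vector embeddings from NLP models to capture the meaning and context of strings of text, sentences, and whole documents, which gives more accurate and relevant search results.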
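
To get a feel for what chunk size and chunk overlap mean when splitting text, here is a minimal fixed-width splitter in plain Python. `split_text` is a hypothetical helper written for this note. As far as I understand, LangChain's own splitters cut on separators rather than at fixed offsets, so treat this only as an illustration of the two parameters.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)

no_overlap = split_text("abcdefghij", chunk_size=5)
print(no_overlap)
```

With an overlap, the last character of one chunk reappears at the start of the next, which helps when a sentence would otherwise be cut in half at a chunk boundary.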

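import requests

# Download the sample text used in the rest of this section
url = "https://raw.githubusercontent.com/hwchase17/langchain/master/docs/modules/state_of_the_union.txt"
res = requests.get(url)
res.raise_for_status()  # stop early if the download failed
with open("state_of_the_union.txt", "w", encoding="utf-8") as f:
    f.write(res.text)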
  
# Document Loader
from langchain.document_loaders import TextLoader
loader = TextLoader('./state_of_the_union.txt')
documents = loader.load()

# Text Splitter
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

!pip install sentence_transformers

# Embeddings
from langchain.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings()

#text = "This is a test document."
#query_result = embeddings.embed_query(text)
#doc_result = embeddings.embed_documents([text])
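
To show what the embeddings are eventually used for, here is a toy similarity search in plain Python. It swaps the real model for a simple word-count vector, so `embed`, `cosine`, `search`, and the three sample chunks are all assumptions made for illustration. A real vector store does the same ranking step with learned embeddings and a much faster index.

```python
import math
from collections import Counter


def embed(text: str, vocabulary: list[str]) -> list[float]:
    """Toy embedding: one word-count dimension per vocabulary word."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


chunks = ["the economy is growing", "we support our troops", "jobs and the economy"]
vocabulary = sorted({word for chunk in chunks for word in chunk.lower().split()})
index = [(chunk, embed(chunk, vocabulary)) for chunk in chunks]  # the "vector store"


def search(query: str, k: int = 1) -> list[str]:
    """Return the k chunks whose vectors are closest to the query vector."""
    query_vector = embed(query, vocabulary)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


top = search("economy jobs", k=2)
print(top)
```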