Day14: Building a FunctionCallingAgent with llama-index workflow

17th Ironman Contest

poyuanchih

2025-09-29 23:20:27

Situation

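So far we have mostly focused on how to use an Agent
- including the FunctionAgent in Day8: Tavily and FunctionAgent
- and the ReActAgent in Day12: SubQuestionQueryEngine (Part 2): Streaming events and ReActAgent
Today is a side quest: we explore how to build a FunctionCallingAgent out of a workflow
This topic is planned as two posts
- the first is today's, implementing a FunctionCallingAgent with workflow
- tomorrow's is planned to implement a ReActAgent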

Task

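Today's task is to assemble a simplified FunctionCallingAgent with workflow
- If you are not familiar with llama-index workflow, see:
  - Day10: CitationQueryEngine and Workflow
    - where we built a CitationQueryEngine from the basic workflow components
  - or the official tutorial: llama-index_Learn_basic_workflow
- This is the official example we mainly follow today: Workflow for a Function Calling Agent
  - We replace ChatMemoryBuffer, which is now Deprecated
  - We swap the tool for Tavily
  - And we replace the llm streaming call with a non-streaming one, because we want to use gpt-5-mini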

Action

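First, the overall workflow design

Here is the diagram, walked through step by step:
1.1 [event] StartEvent: input holds the question the user asks in this turn
1.2 [step] prepare_chat_history: this does two main things
- Update memory: fetch the previous chat_history from memory, then append the new user_message
  - so multi-turn conversation is supported by default
- Emit the list of messages (chat_history), now including the new message, as an InputEvent
1.3 [event] InputEvent: input holds the chat_history
1.4 [step] handle_llm_input: calls the llm with the chat_history
- There are two possible outcomes:
  - the llm decides to use a tool, so the step returns a ToolCallEvent and we continue at 1.5
  - the llm decides not to use a tool, so the step returns a StopEvent and we continue at 1.7
- Along the way it updates memory, which is what supports multi-turn conversation
- It also writes the llm result to the event stream as a streaming event before returning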
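1.5 [event] ToolCallEvent: tool_calls holds a list of ToolSelection
- Each ToolSelection basically has: tool_id, tool_name and tool_kwargs
  - Note that it is a list, so calling several tools in one round is supported by default
- Source code for ToolSelection: llama-index_ToolSelection
1.6 [step] handle_tool_calls: calls the tools on behalf of the llm
- The result is an InputEvent(chat_history), which goes back to 1.3
  - but the chat_history contains more than just the tool results
- If the tool the llm asked for is not among the configured tools, reply with a message saying the tool does not exist
- If the tool call raises any error, send the error back as a message so the llm can work around it
- Only if the call succeeds is the tool output added to the messages
- Then update memory
- Wrap the chat_history in an InputEvent and go back to 1.3
1.7 [event] StopEvent: result holds a dictionary
- Only handle_llm_input (1.4) emits a StopEvent
- The dictionary has two keys:
  - response: the final answer
  - sources: a list of the tool_output from every round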
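
The control flow of steps 1.1 to 1.7 can be sketched in plain Python without llama-index. Everything in this sketch is a stand-in invented for illustration: `Message` and this local `ToolSelection` only mimic the shape of the real classes, `fake_llm` is a hypothetical model that requests the `add` tool once per turn, and a plain list plays the role of memory. It is meant to show the loop and the multi-turn behaviour, not how the library implements them.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSelection:
    """Stand-in with the same three fields as llama-index's ToolSelection."""
    tool_id: str
    tool_name: str
    tool_kwargs: dict


@dataclass
class Message:
    """Stand-in for ChatMessage."""
    role: str
    content: str
    tool_calls: list = field(default_factory=list)


def fake_llm(chat_history):
    """Hypothetical LLM: requests `add` once, then answers from the tool result."""
    last = chat_history[-1]
    if last.role == "tool":
        return Message("assistant", f"The answer is {last.content}")
    call = ToolSelection("call-1", "add", {"a": 2, "b": 3})
    return Message("assistant", "", [call])


def run_turn(user_input, memory, tools):
    sources = []                                  # 1.2: cleared on every run
    memory.append(Message("user", user_input))    # 1.2: add the new user message
    while True:
        response = fake_llm(list(memory))         # 1.4: call the llm with chat_history
        memory.append(response)                   # 1.4: update memory
        if not response.tool_calls:               # 1.7: no tool call, so stop
            return {"response": response.content, "sources": sources}
        for call in response.tool_calls:          # 1.6: run every requested tool
            output = tools[call.tool_name](**call.tool_kwargs)
            sources.append(output)
            memory.append(Message("tool", str(output)))
        # fall through: back to 1.3 with the updated chat_history


memory = []                                       # shared across turns
tools = {"add": lambda a, b: a + b}
first = run_turn("What is 2 + 3?", memory, tools)
second = run_turn("And again?", memory, tools)
print(first, len(memory))
```

Each turn adds four messages to the shared list (user, assistant tool request, tool result, final answer), while `sources` starts empty again on the second turn.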

import

import os
from dotenv import find_dotenv, load_dotenv
_ = load_dotenv(find_dotenv())

from llama_index.core.llms import ChatMessage

from llama_index.core.workflow import Event
from typing import Any, List
from llama_index.core.llms.function_calling import FunctionCallingLLM
# from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.memory import Memory
from llama_index.core.tools import ToolSelection, ToolOutput

from llama_index.core.tools.types import BaseTool
from llama_index.core.workflow import (
    Context,
    Workflow,
    StartEvent,
    StopEvent,
    step,
)
from llama_index.llms.openai import OpenAI

from llama_index.utils.workflow import draw_all_possible_flows

The key import here is the memory: from llama_index.core.memory import Memory
- If you are not familiar with memory, see llama-index_Component Guides Memory
  - Reading up to Managing the Memory Manually is more than enough
  - The main point is that memory.get() returns the chat_history
  - and memory.put(message) adds a new message

event

class InputEvent(Event):
    input: list[ChatMessage]

class ToolCallEvent(Event):
    tool_calls: list[ToolSelection]

class FunctionOutputEvent(Event):
    output: ToolOutput

class StreamEvent(Event):
    msg: ChatMessage

If you are not familiar with StreamEvent, see Day12: SubQuestionQueryEngine (Part 2): Streaming events and ReActAgent

Workflow

4.1 init

class FunctionCallingAgent(Workflow):
    def __init__(
        self,
        *args: Any,
        llm: FunctionCallingLLM | None = None,
        tools: List[BaseTool] | None = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(*args, **kwargs)
        self.tools = tools or []

        self.llm = llm or OpenAI()
        assert self.llm.metadata.is_function_calling_model

~~Unlike Day13, this time everything we need goes into init~~
4.2 prepare_chat_history

    @step
    async def prepare_chat_history(
        self, ctx: Context, ev: StartEvent
    ) -> InputEvent:
        # clear sources
        await ctx.store.set("sources", [])

        # check if memory is setup
        memory = await ctx.store.get("memory", default=None)
        if not memory:
            #memory = ChatMemoryBuffer.from_defaults(llm=self.llm)
            memory = Memory.from_defaults(token_limit=40000)

        # get user input
        user_input = ev.input
        user_msg = ChatMessage(role="user", content=user_input)
        memory.put(user_msg)

        # get chat history
        chat_history = memory.get()

        # update context
        await ctx.store.set("memory", memory)

        return InputEvent(input=chat_history)

sources holds the results of every tool call, and it is cleared at the start of each run
memory works as described earlier
4.3 handle_llm_input

    @step
    async def handle_llm_input(
        self, ctx: Context, ev: InputEvent
    ) -> ToolCallEvent | StopEvent:
        chat_history = ev.input

        # stream the response
        #response_stream = await self.llm.astream_chat_with_tools(
        #    self.tools, chat_history=chat_history
        #)

        response = await self.llm.achat_with_tools(
            self.tools, chat_history=chat_history
        )
        ctx.write_event_to_stream(StreamEvent(msg=response.message))

        # save the final response, which should have all content
        memory = await ctx.store.get("memory")
        memory.put(response.message)
        await ctx.store.set("memory", memory)

        # get tool calls
        tool_calls = self.llm.get_tool_calls_from_response(
            response, error_on_no_tool_call=False
        )  # returns [] when the response contains no tool calls

        if not tool_calls:
            sources = await ctx.store.get("sources", default=[])
            return StopEvent(
                result={"response": response, "sources": [*sources]}
            )
        else:
            return ToolCallEvent(tool_calls=tool_calls)

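The thing to notice here is the return type: ToolCallEvent | StopEvent
This is how a branching workflow is built
Both the workflow and the visualizer rely on the return type to know which step runs next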
Also note that we call llm.achat_with_tools here
- so it passes the tool name and tool description to the llm for us
- A ReAct Agent does not call the llm this way, as we will see tomorrow
llm.get_tool_calls_from_response returns a list of ToolSelection
4.4 handle_tool_calls

    @step
    async def handle_tool_calls(
        self, ctx: Context, ev: ToolCallEvent
    ) -> InputEvent:
        tool_calls = ev.tool_calls  # the tools the model wants to call
        tools_by_name = {tool.metadata.get_name(): tool for tool in self.tools}  # the tools that are available

        tool_msgs = []
        sources = await ctx.store.get("sources", default=[])

        # call tools -- safely!
        for tool_call in tool_calls:
            tool = tools_by_name.get(tool_call.tool_name)
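            # use the requested name: `tool` is None when the lookup fails,
            # so tool.metadata cannot be accessed before the check below
            additional_kwargs = {
                "tool_call_id": tool_call.tool_id,
                "name": tool_call.tool_name,
            }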
            if not tool:
                tool_msgs.append(
                    ChatMessage(
                        role="tool",
                        content=f"Tool {tool_call.tool_name} does not exist",
                        additional_kwargs=additional_kwargs,
                    )
                )
                continue

            try:
                tool_output = tool(**tool_call.tool_kwargs)
                sources.append(tool_output)
                tool_msgs.append(
                    ChatMessage(
                        role="tool",
                        content=tool_output.content,
                        additional_kwargs=additional_kwargs,
                    )
                )
            except Exception as e:
                tool_msgs.append(
                    ChatMessage(
                        role="tool",
                        content=f"Encountered error in tool call: {e}",
                        additional_kwargs=additional_kwargs,
                    )
                )

        # update memory
        memory = await ctx.store.get("memory")
        for msg in tool_msgs:
            memory.put(msg)

        await ctx.store.set("sources", sources)
        await ctx.store.set("memory", memory)

        chat_history = memory.get()
        return InputEvent(input=chat_history)

tool_output itself contains:
- tool_name
- raw_input
- raw_output
- is_error
- source code: llama-index_ToolOutput
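
The three outcomes that handle_tool_calls distinguishes (unknown tool, failing tool, successful call) can be tried in isolation with a stripped-down sketch. It uses plain tuples instead of ChatMessage and ordinary callables instead of llama-index tools; `call_tools_safely` and the `divide` tool are hypothetical names introduced only for this example.

```python
def call_tools_safely(tool_calls, tools_by_name):
    """Return (messages, sources); every requested call yields exactly one message."""
    messages = []
    sources = []
    for name, kwargs in tool_calls:
        tool = tools_by_name.get(name)
        if tool is None:
            # the llm asked for a tool that is not configured
            messages.append(("tool", f"Tool {name} does not exist"))
            continue
        try:
            output = tool(**kwargs)
        except Exception as e:
            # report the error back so the llm can try something else
            messages.append(("tool", f"Encountered error in tool call: {e}"))
        else:
            # only successful outputs are recorded as sources
            sources.append(output)
            messages.append(("tool", str(output)))
    return messages, sources


tools_by_name = {"divide": lambda a, b: a / b}
calls = [
    ("divide", {"a": 6, "b": 3}),        # succeeds
    ("divide", {"a": 1, "b": 0}),        # raises ZeroDivisionError
    ("search", {"query": "llama"}),      # not configured
]
messages, sources = call_tools_safely(calls, tools_by_name)
for role, content in messages:
    print(role, content)
```

Because every branch appends a message, the llm always gets a reply for each tool call it made, and only the successful call ends up in sources.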

visualize

draw_all_possible_flows(
    FunctionCallingAgent, filename="day14_FunctionCallingAgent.html"
)

The result is the diagram shown in the design section (1) above

tools

from llama_index.tools.tavily_research.base import TavilyToolSpec

TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")
tavily_tool = TavilyToolSpec(
    api_key=TAVILY_API_KEY,
)
tools = tavily_tool.to_tool_list()
# render the tools as text
tool_descs = []
for tool in tools:
    tool_descs.append(f"{tool.metadata.name}: {tool.metadata.description}")

tools_str = "\n".join(tool_descs)
tools_str

Today we use Tavily again
While we are at it, let's look at its default name and description

search: search(query: str, max_results: Optional[int] = 6) -> List[llama_index.core.schema.Document]
Run query through Tavily Search and return metadata.

usage

agent = FunctionCallingAgent(
    llm=OpenAI(model="gpt-5-mini", is_streaming=False), tools=tools, timeout=120, verbose=True
)
ret = await agent.run(input="Hello!")

This is mainly a smoke test to check that the agent runs

Summary

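Today we built a FunctionCallingAgent with workflow
We learned how to set up a branching workflow
And depending on the question asked, we really cannot know in advance how many steps it will run
- the llm decides when to stop
We learned basic memory usage
And we clarified the details of how a FunctionCallingAgent makes its calls
- including achat_with_tools
- Context, memory and chat_history
- and tool calling with error handling
Tomorrow we implement a ReAct Agent, which will give us a concrete comparison with today's agent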