2024 iThome Ironman Contest

DAY 18

Generative AI

Application Development with the APIs of Generative AI Services (Using Gemini and ChatGPT as Examples), Part 18 of the series

4-5 Implementing Function Calling with the Gemini API

16th Ironman Contest

Wolke

2024-08-18 00:04:11

https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Function_calling.ipynb

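Introduction

Function calling lets developers write function descriptions in code and pass them to a language model in a request. The model's response contains the name of a function that matches one of the descriptions, together with the arguments to call it with. This lets you use functions as tools in generative AI applications, and you can define more than one function in a single request.

This article provides code examples to help you get started.

Install the Python SDK

!pip install -U -q google-generativeai  # install the Python SDK

Import the SDK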

import google.generativeai as genai

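Set up the API key

To run the following cells, your API key must be stored in a Colab Secret named GOOGLE_API_KEY. If you don't have an API key yet, or aren't sure how to create a Colab Secret, see the Authentication quickstart example.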

from google.colab import userdata

GOOGLE_API_KEY = userdata.get("GOOGLE_API_KEY")
genai.configure(api_key=GOOGLE_API_KEY)

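Function calling basics

To use function calling, pass a list of functions to the tools parameter when creating a GenerativeModel. The model uses the function name, docstring, parameters, and parameter type annotations to decide whether it needs the function to best answer a prompt.

Important: The SDK converts function parameter type annotations into a format the API understands (genai.protos.FunctionDeclaration). The API supports only a limited set of parameter types, and the Python SDK's automatic conversion supports only a subset of those: AllowedTypes = int | float | bool | str | list['AllowedTypes'] | dict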
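
To make the idea of the conversion concrete, here is a minimal sketch of how a function's name, docstring, and type annotations could be turned into a declaration-like dictionary. The helper describe_function and the TYPE_NAMES mapping are hypothetical and use only the standard library. This is not the SDK's actual conversion code, and the output only loosely resembles a genai.protos.FunctionDeclaration.

```python
import inspect

# Hypothetical mapping from Python annotations to schema type names.
TYPE_NAMES = {int: "INTEGER", float: "NUMBER", bool: "BOOLEAN", str: "STRING",
              list: "ARRAY", dict: "OBJECT"}


def describe_function(fn):
    """Build a declaration-like dict from a function's signature (illustration only)."""
    signature = inspect.signature(fn)
    properties = {}
    required = []
    for name, param in signature.parameters.items():
        if param.annotation not in TYPE_NAMES:
            raise TypeError(f"unsupported annotation for parameter {name!r}")
        properties[name] = {"type": TYPE_NAMES[param.annotation]}
        # Parameters without a default value must be supplied by the caller.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "OBJECT", "properties": properties, "required": required},
    }


def multiply(a: float, b: float):
    """Return a * b."""
    return a * b


declaration = describe_function(multiply)
print(declaration)
```

The sketch rejects any annotation outside its small table, which mirrors the point above: only a limited set of parameter types can be described to the model.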

def add(a: float, b: float):
    """Return a + b."""
    return a + b

def subtract(a: float, b: float):
    """Return a - b."""
    return a - b

def multiply(a: float, b: float):
    """Return a * b."""
    return a * b

def divide(a: float, b: float):
    """Return a / b."""
    return a / b

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash", tools=[add, subtract, multiply, divide]
)

model

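Automatic function calling

Function calls fit naturally into multi-turn chats, because they capture a back-and-forth interaction between the user and the model. The Python SDK's ChatSession is a good interface for chats because it handles the conversation history for you, and the enable_automatic_function_calling parameter simplifies function calling even further: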

chat = model.start_chat(enable_automatic_function_calling=True)

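With automatic function calling enabled, ChatSession.send_message automatically calls your function whenever the model asks for it.

In the following example, the result looks like nothing more than a text response containing the correct answer: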

response = chat.send_message(
    "I have 57 cats, each owns 44 mittens. How many mittens is that in total?"
)
response.text

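However, by examining the chat history you can see the flow of the conversation and how function calls are integrated into it.

The ChatSession.history attribute stores a chronological record of the conversation between the user and the Gemini model. Each turn in the conversation is represented by a genai.protos.Content object, which contains the following information:

Role: identifies whether the content came from the "user" or the "model".
Parts: a list of genai.protos.Part objects representing the individual components of the message. With a text-only model, these parts can be:
- Text: a plain text message.
- Function call (genai.protos.FunctionCall): a request from the model to execute a specific function with the provided arguments.
- Function response (genai.protos.FunctionResponse): the result the user returns after executing the requested function.

In the earlier mittens example, the history shows the following sequence:

User: asks the question about the total number of mittens.
Model: determines that the multiply function is helpful and sends a FunctionCall request to the user.
User: the ChatSession automatically executes the function (because enable_automatic_function_calling is set) and sends back a FunctionResponse containing the computed result.
Model: uses the function's output to formulate the final answer and presents it as a text response.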

for content in chat.history:
    print(content.role, "->", [type(part).to_dict(part) for part in content.parts])
    print("-" * 80)
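
The four-turn sequence described above can be simulated without calling the API. The sketch below is only an illustration of the control flow: fake_model is a hypothetical stand-in that first requests multiply and then answers in text, and the history uses plain dictionaries rather than genai.protos.Content objects.

```python
def multiply(a: float, b: float):
    """Return a * b."""
    return a * b


def fake_model(history):
    """Stand-in for the model: request multiply first, then answer in text."""
    last = history[-1]
    if "function_response" in last["part"]:
        total = last["part"]["function_response"]["result"]
        return {"text": f"The total is {total:g} mittens."}
    return {"function_call": {"name": "multiply", "args": {"a": 57, "b": 44}}}


def send_message(prompt, tools):
    """Run turns until the stand-in model replies with text; return the history."""
    history = [{"role": "user", "part": {"text": prompt}}]
    while True:
        part = fake_model(history)
        history.append({"role": "model", "part": part})
        if "text" in part:
            return history
        call = part["function_call"]
        result = tools[call["name"]](**call["args"])
        # The function result goes back to the model under the "user" role.
        history.append(
            {"role": "user",
             "part": {"function_response": {"name": call["name"], "result": result}}}
        )


history = send_message(
    "I have 57 cats, each owns 44 mittens. How many mittens is that in total?",
    {"multiply": multiply},
)
for turn in history:
    print(turn["role"], "->", turn["part"])
```

The loop keeps going until a text part arrives, so the same structure would also cover a model that asks for several function calls before answering.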

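In general, the state diagram looks like this:

Figure: function calling state diagram, https://ai.google.dev/images/tutorials/function_call_state_diagram.png

The model can respond with multiple function calls before returning a text response, and the function calls come before the text response.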

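Manual function calling

For more control, you can process the genai.protos.FunctionCall requests from the model yourself. This applies if:

You use a ChatSession with the default enable_automatic_function_calling=False.
You use GenerativeModel.generate_content (and manage the chat history yourself).

The following example is a rough Python equivalent of the single-turn function calling curl sample. It uses functions that return (mock) movie showtime information, as might come from a hypothetical API: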

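def find_movies(description: str, location: str = ""):
    """Find movie titles currently playing in theaters based on any description, genre, title words, etc.

    Args:
        description: Any kind of description, including category or genre, title words, attributes, etc.
        location: The city and state, e.g. San Francisco, CA, or a zip code, e.g. 95616
    """
    return ["Barbie", "Oppenheimer"]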

def find_theaters(location: str, movie: str = ""):
    """Find theaters based on location and, optionally, a movie title.

    Args:
        location: The city and state, e.g. San Francisco, CA, or a zip code, e.g. 95616
        movie: Any movie title
    """
    return ["Googleplex 16", "Android Theatre"]

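def get_showtimes(location: str, movie: str, theater: str, date: str):
    """
    Find the start times for a movie playing in a specific theater.

    Args:
      location: The city and state, e.g. San Francisco, CA, or a zip code, e.g. 95616
      movie: Any movie title
      theater: Name of the theater
      date: Date of the requested showing
    """
    return ["10:00", "11:00"]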

Use a dictionary to make looking up functions by name easier. You can also use it to pass the collection of functions to the tools parameter of GenerativeModel.

functions = {
    "find_movies": find_movies,
    "find_theaters": find_theaters,
    "get_showtimes": get_showtimes,
}

model = genai.GenerativeModel(model_name="gemini-1.5-flash", tools=functions.values())

After you ask a question with generate_content(), the model requests a function_call:

response = model.generate_content(
    "Which theaters in Mountain View show the Barbie movie?"
)
response.candidates[0].content.parts

Since this is not a ChatSession with automatic function calling, you have to call the function yourself.

One way to do this is with if statements:

if function_call.name == 'find_theaters':
  find_theaters(**function_call.args)
elif ...

However, since you already created the functions dictionary, this can be simplified to:

def call_function(function_call, functions):
    function_name = function_call.name
    function_args = function_call.args
    return functions[function_name](**function_args)

part = response.candidates[0].content.parts[0]

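# Check whether this is a function call; in real use you also need to handle text responses, since you cannot know in advance what the model will reply.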
if part.function_call:
    result = call_function(part.function_call, functions)

print(result)
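
As the comment in the code above notes, a real application cannot assume the first part is a function call. The following is a minimal sketch of a more defensive dispatcher. The helper name handle_part is hypothetical, and SimpleNamespace objects stand in for the parts of a real response so the example runs without the API; it assumes a text part exposes an empty or missing function_call.

```python
from types import SimpleNamespace


def find_theaters(location: str, movie: str = ""):
    """Return mock theater names."""
    return ["Googleplex 16", "Android Theatre"]


functions = {"find_theaters": find_theaters}


def handle_part(part, functions):
    """Return ("function", result) for a function call, or ("text", text) otherwise."""
    call = getattr(part, "function_call", None)
    if call is None or not call.name:
        return "text", getattr(part, "text", "")
    if call.name not in functions:
        raise KeyError(f"model requested an unknown function: {call.name}")
    return "function", functions[call.name](**dict(call.args))


# SimpleNamespace objects stand in for the parts of a real response.
call_part = SimpleNamespace(
    function_call=SimpleNamespace(
        name="find_theaters",
        args={"location": "Mountain View, CA", "movie": "Barbie"},
    ),
    text="",
)
text_part = SimpleNamespace(function_call=None, text="Which city are you in?")

print(handle_part(call_part, functions))
print(handle_part(text_part, functions))
```

Returning a tagged tuple keeps the caller's branching explicit: a "function" result is sent back to the model, while a "text" result is shown to the user.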

Finally, pass the response and the message history to the next generate_content() call to get the final text response from the model.

from google.protobuf.struct_pb2 import Struct

# Put the result in a protobuf Struct
s = Struct()
s.update({"result": result})

# Update this after https://github.com/google/generative-ai-python/issues/243
function_response = genai.protos.Part(
    function_response=genai.protos.FunctionResponse(name="find_theaters", response=s)
)

# Build the message history
messages = [
    {"role": "user", "parts": ["Which theaters in Mountain View show the Barbie movie?"]},
    {"role": "model", "parts": response.candidates[0].content.parts},
    {"role": "user", "parts": [function_response]},
]

# Generate the next response
response = model.generate_content(messages)
print(response.text)

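Parallel function calls

The Gemini API can call multiple functions in a single turn. This suits tasks that need several functions whose calls are independent of one another.

First set up the tools. Unlike the movie example above, these functions do not need each other's output as input, so they are good candidates for parallel calls.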

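def power_disco_ball(power: bool) -> bool:
    """Powers the spinning disco ball."""
    print(f"Disco ball is {'spinning!' if power else 'stopped.'}")
    return True

def start_music(energetic: bool, loud: bool, bpm: int) -> str:
    """Play some music matching the specified parameters.

    Args:
      energetic: Whether the music is energetic or not.
      loud: Whether the music is loud or not.
      bpm: The beats per minute of the music.

    Returns: The name of the song being played.
    """
    print(f"Starting music! {energetic=} {loud=}, {bpm=}")
    return "Never gonna give you up."

def dim_lights(brightness: float) -> bool:
    """Dim the lights.

    Args:
      brightness: The brightness of the lights, 0.0 is off, 1.0 is full.
    """
    print(f"Lights are now set to {brightness:.0%}")
    return True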

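Now call the model with an instruction that could use all of the specified tools.

# Set up the model with the tools.
house_fns = [power_disco_ball, start_music, dim_lights]
# Try this with both Pro and Flash...
model = genai.GenerativeModel(model_name="gemini-1.5-flash", tools=house_fns)

# Call the API.
chat = model.start_chat()
response = chat.send_message("Turn this place into a party!")

# Print each function call request.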
for part in response.parts:
    if fn := part.function_call:
        args = ", ".join(f"{key}={val}" for key, val in fn.args.items())
        print(f"{fn.name}({args})")

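Each printed line reflects a single function call requested by the model. To send the results back, include the responses in the same order as the requests.

# Simulate the responses from the specified tools.
responses = {
    "power_disco_ball": True,
    "start_music": "Never gonna give you up.",
    "dim_lights": True,
}

# Build the response parts.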
response_parts = [
    genai.protos.Part(function_response=genai.protos.FunctionResponse(name=fn, response={"result": val}))
    for fn, val in responses.items()
]

response = chat.send_message(response_parts)
print(response.text)
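
The example above hard-codes the tool results. As a rough sketch of the alternative, the code below actually executes each requested call and keeps the results in request order. The helper run_calls is hypothetical, the tool bodies are trimmed to their return values, and SimpleNamespace objects stand in for the function_call parts of a real response so the example runs without the API.

```python
from types import SimpleNamespace


def power_disco_ball(power: bool) -> bool:
    return True


def start_music(energetic: bool, loud: bool, bpm: int) -> str:
    return "Never gonna give you up."


def dim_lights(brightness: float) -> bool:
    return True


# Look up tools by name, as the functions dictionary does in the movie example.
tools = {fn.__name__: fn for fn in (power_disco_ball, start_music, dim_lights)}


def run_calls(calls, tools):
    """Execute each requested call and return (name, result) pairs in request order."""
    return [(call.name, tools[call.name](**call.args)) for call in calls]


# Hypothetical requests, standing in for the function_call parts of a response.
calls = [
    SimpleNamespace(name="power_disco_ball", args={"power": True}),
    SimpleNamespace(name="start_music", args={"energetic": True, "loud": True, "bpm": 120}),
    SimpleNamespace(name="dim_lights", args={"brightness": 0.3}),
]

results = run_calls(calls, tools)
for name, result in results:
    print(name, "->", result)
```

Each (name, result) pair could then be wrapped in a genai.protos.FunctionResponse in the same way the response_parts list is built above.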

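Next steps

Useful API references:

The genai.GenerativeModel class
- Its GenerativeModel.generate_content method builds a genai.protos.GenerateContentRequest behind the scenes.
  - The request's .tools field contains a list of genai.protos.Tool objects.
  - A tool's function_declarations attribute contains a list of genai.protos.FunctionDeclaration objects.
The response may contain a genai.protos.FunctionCall, in response.candidates[0].content.parts[0].
If enable_automatic_function_calling is set, genai.ChatSession executes the call and sends back the genai.protos.FunctionResponse.
In response to a FunctionCall, the model always expects a FunctionResponse.
If you respond manually using chat.send_message or model.generate_content, remember that the API is stateless: the whole conversation history (a list of content objects) must be sent, not just the last one containing the FunctionResponse. ChatSession keeps that history for you; with model.generate_content you assemble it yourself.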
https://github.com/google-gemini/cookbook/blob/main/quickstarts/Function_calling_config.ipynb