OpenAI Concepts and Implementation: Prompts and Role Settings via the API, Integrated into the Bot Server

16th Ironman Contest

cancerpio

2024-10-13 21:20:03

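Overview

This article shows how to use OpenAI's Java library to call the Chat Completion API, so that ChatGPT knows how to parse different kinds of user messages and organize them into the data we need.
The flow we ultimately want to implement through the API is:
- Ask GPT to take on a preset role depending on the meaning of the message:
  - If GPT judges the message to be a training record, it tries to extract the relevant data from the message (sets, reps, weight, how the session felt, and so on), responds in the format we require, and the result is then stored in MongoDB.
  - If GPT judges the message to be about diet or food, it analyzes the kinds of food and their nutritional content, and responds in the format we specify.
  - If it is neither, the message is returned unchanged.
Finally, we introduce Structured Outputs, which OpenAI released in August this year. Time constraints kept us from using it in this project, but it is still worth covering: it lets GPT return results in a structured format, which makes development much easier.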

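Implementing Chat Completion through the API

Implementations differ slightly between programming languages, but the basic principles and terms you need for a Chat Completion are largely the same.
With our goal in mind, "sending a prompt to GPT through the API", the steps and terms are explained one by one below:
- Specify the API Key, Model Id, and API Endpoint
- Specify the Message
- Specify how many choices and how many tokens GPT may respond with
  - n: the number of choices GPT may respond with
  - maxTokens: the maximum number of tokens GPT may respond with
- Send the API request, then read the choices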
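Setting the API Key / Model Id / Endpoint

API Key and how the API is billed

First, a note on how tokens relate to billing.

The OpenAI API is billed by token usage. Tokens are the units the model splits text into, and both the prompt we send and the response we get back count toward usage. Put simply, the longer the prompt, the higher the cost, and this matters more and more as the number of API requests grows.
For the detailed rates, see the Pricing page on the official site.
Payment can be set up at OpenAI website -> Developers -> Overview.
- Once Billing is set up, the API key can be found at OpenAI website -> Developers -> Your Profile, under User API Key. The key is used the same way as a JWT bearer token: every call to the OpenAI API must carry the request header Authorization: Bearer <your API key>, otherwise the call fails.
  
  Figure: API Keys
- Once the three-month free period is over, or total token usage exceeds the limit, you are charged according to the tokens the API sends.
- Note: OpenAI currently charges only for API usage; apart from some advanced features, the web version is free. Billing is based on total token usage, so don't be fooled by sites that look almost exactly like the official OpenAI site and charge annual or monthly fees, ~~like I was~~.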
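
To make the Bearer header concrete, here is a minimal sketch that uses only the JDK's java.net.http classes rather than the OpenAI Java library. It builds the request without sending it. The class name BearerHeaderSketch and the placeholder key are assumptions for illustration, not part of the project.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class BearerHeaderSketch {
    // Builds (but does not send) a POST request that carries the API key as a Bearer token.
    public static HttpRequest buildRequest(String apiKey, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create("https://api.openai.com/v1/chat/completions"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // "sk-placeholder" is a dummy value; a real key should come from configuration.
        HttpRequest request = buildRequest("sk-placeholder", "{}");
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Authorization header present: "
                + request.headers().firstValue("Authorization").isPresent());
    }
}
```

A client library normally adds this header for you; the sketch only shows what ends up on the wire.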

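Setting the API Key / Model Id / Endpoint in a Spring Boot application

Just like the Channel Access Token we used in "LINE Bot Basics: Creating an Official Account / Introduction to Events", we can supply the API key through an environment variable or an external file. Note that the API key must not leak, and must never be pushed to a public Git repository; otherwise you are effectively letting everyone use the API on your credit card.
Besides the API key, an API call also needs a Model Id and an Endpoint (optional; some libraries work out the endpoint themselves).
In Spring, the @Value annotation reads the value of a property defined in application.properties. For example, the three annotations below, declared in the OpenAiApiService class, assign the values of openai.api.key, openai.model.id, and openai.api.endpoint from application.properties to String fields:
Class OpenAiApiService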

@Value("${openai.api.key}")
String token;
@Value("${openai.model.id}")
String deploymentOrModelId;
@Value("${openai.api.endpoint}")
String endpoint;

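Where:
- openai.api.key = the API key. It is tied to our money, so do not let it leak.
- openai.model.id = the identifier of the GPT model we want to use, for example gpt-4o. It can be found on the Models documentation page.
- openai.api.endpoint = the API URL we call, for example /v1/chat/completions. It refers to the URL path beginning with /vx (the API version) and can be found in the API Reference on the official site.
Note that @Value is declared on fields (or on constructor and method parameters) of a Spring-managed class; it cannot be applied to a local variable inside a function.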
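
For readers who want to see the "keep the key out of the source code" idea without Spring, here is a minimal sketch in plain Java. The variable name OPENAI_API_KEY and the class ApiKeyResolver are assumptions for illustration; the project itself reads the key through @Value as shown above.

```java
import java.util.Map;

class ApiKeyResolver {
    // Reads the key from the given environment map and fails fast when it is absent.
    // OPENAI_API_KEY is a hypothetical variable name chosen for this sketch.
    public static String resolveApiKey(Map<String, String> env) {
        String key = env.get("OPENAI_API_KEY");
        if (key == null || key.isBlank()) {
            throw new IllegalStateException("OPENAI_API_KEY is not set");
        }
        return key.trim();
    }

    public static void main(String[] args) {
        try {
            String key = resolveApiKey(System.getenv());
            // Never log the key itself; print only its length as a sanity check.
            System.out.println("API key loaded, length = " + key.length());
        } catch (IllegalStateException e) {
            System.out.println("Configuration error: " + e.getMessage());
        }
    }
}
```

Passing the environment in as a Map keeps the lookup easy to test. Failing at startup is usually friendlier than discovering a missing key on the first API call.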

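Using Message to specify the Prompt

In the API, a Message represents the prompt we give to GPT.
As mentioned earlier, under the hood ChatGPT distinguishes between roles, such as system and user. Here is a brief look at how the system role works, since it also ties into what our project needs:

system Message: the premise given to GPT

When you interact with ChatGPT, the platform sets a hidden system message by default. This message is a prompt given to GPT under the system role; its purpose is to provide background (for example, the context of the conversation) that guides how the model responds. In the web version of ChatGPT, however, users cannot view or modify this system message directly.
The following is excerpted from OpenAI's sample code. Its system message tells GPT what domain knowledge it should have, and even what tone of voice to answer in to suit the persona.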

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": `
            You are a helpful assistant that answers programming questions 
            in the style of a southern belle from the southeast United States.
          `
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Are semicolons optional in JavaScript?"
        }
      ]
    }
  ]
});

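In practice

As described below, our project builds the prompt for GPT from two messages:
- Premise: a prompt under the system role that describes which JSON format to produce in each situation
- Instruction: a message under the user role, which is simply what the user typed
However, I did try sending premise + instruction together as user input in the snippet below, that is, with only a single user message, and the results did not seem to differ much. What practical difference separating system and user actually makes may need further clarification.

final ChatMessage systemMessage = new ChatMessage(ChatMessageRole.SYSTEM.value(), preface);
final ChatMessage userMessage = new ChatMessage(ChatMessageRole.USER.value(), userInputPrompt);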
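
The two variants compared above can be sketched side by side. To keep the example runnable without the OpenAI library, each message is modelled here as a plain Map with role and content keys; that modelling and the class name PromptMessages are assumptions for illustration only.

```java
import java.util.List;
import java.util.Map;

class PromptMessages {
    // Variant 1: the premise goes in a system message, the user's text in a user message.
    public static List<Map<String, String>> systemPlusUser(String preface, String userInput) {
        return List.of(
                Map.of("role", "system", "content", preface),
                Map.of("role", "user", "content", userInput));
    }

    // Variant 2: premise and instruction are concatenated into a single user message.
    public static List<Map<String, String>> singleUser(String preface, String userInput) {
        return List.of(Map.of("role", "user", "content", preface + "\n\n" + userInput));
    }

    public static void main(String[] args) {
        String preface = "Please answer according to the type of the User message.";
        String userInput = "Deadlift, 3 sets of 5 reps";
        System.out.println(systemPlusUser(preface, userInput));
        System.out.println(singleUser(preface, userInput));
    }
}
```

Keeping both builders around makes it easy to send the same input through each variant and compare the replies.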

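Prompt content

Our project uses the following prompt:
- systemMessage: the premise
  - Tells GPT which JSON format to produce depending on whether it judges the message to be a training record, about diet, or something else.
  - Specifies Traditional Chinese as the response language (this sometimes fails to take effect...)
  - Specifies that the keys in the response JSON must be in English, because GPT occasionally translates the JSON keys into Chinese as well
- userMessage: the message the user typed in LINE
systemMessage: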

Please answer according to the type of the User message.

If the message is some kind of training log, you are a Strength and Conditioning Coach. Please try to infer the information of action(should be translated in English), action type, weight(in kilograms, just write the number), repetition, set, percentage of repetition maximum, duration, feeling, date(write today if not mentioned), and advice for the next training (write to “advice” property for each action, should be translated in "繁體中文" if user input is in Chinese.), then respond with a JSON schema. 

For example:
User: "I just finished today's training program, deadlift for 3 sets of 5 reps, at about 75% of my 1RM. After that, I went jogging for 1 hour, which made me very tired."
Coach: 
{
  "messageContent":
  [{
   "about":"TrainingRecords",
   "action":"Deadlift",
   "actionType":  "Weight training",
   "weight": 110,
   "repetition": 5,
   "set": 3,
   "percentagOfRepetitionMaximum": "75",
   "duration": null,
   "feeling": null,
   "advice": null, 
   "date":"27/05/2023" 
},
{
  "about":"TrainingRecords",
   "action":"Jogging",
   "actionType":  "cardio",
   "weight": null,
   "repetition": null,
   "set": null,
   "percentagOfRepetitionMaximum": null,
   "duration": "1 hour",
   "feeling": "very tired",
   "advice": null,
   "date":"27/05/2023" 
}] 
}
 

If the message is about food and diet, then you are a Nutritionist. Please analyze the protein and fat in grams, and calories in kcal then respond with JSON schema.
For example: 
User: "Steak"
Nutritionist:
{ 
  "messageContent":
   [{
    "about":"Diet",  
    "calories":250,
     "protein": 35,
     "fat": 10
  }
 ]
}
If the user message is neither of the above types, just respond as ChatGPT at chat.openai.com.
Write the response in JSON schema like: 
{ 
  "messageContent":
  [{
    "about":"ChatGPT",   
    "response":
  }] 
}

The advice should be translated in Traditional Chinese if user input is in Chinese, otherwise, all the properties in the JSON should be in English.
Don’t respond to anything beside the json which contains your answer.
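
Once GPT replies in this format, the Bot Server has to decide what to do based on the "about" value. Here is a minimal sketch of that routing. It pulls the first "about" value out with a regular expression only to stay dependency-free; a real implementation would more likely use a JSON parser. The class name ReplyRouter and the returned action names are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplyRouter {
    // Matches the first "about": "<value>" pair in the reply text.
    private static final Pattern ABOUT = Pattern.compile("\"about\"\\s*:\\s*\"([^\"]+)\"");

    // Maps the reply type to a (hypothetical) action name for the Bot Server.
    public static String route(String gptReply) {
        Matcher matcher = ABOUT.matcher(gptReply);
        String about = matcher.find() ? matcher.group(1) : "";
        switch (about) {
            case "TrainingRecords":
                return "SAVE_TO_MONGODB";   // training logs are stored
            case "Diet":
                return "REPLY_NUTRITION";   // nutrition analysis is sent back
            default:
                return "REPLY_AS_IS";       // anything else is passed through
        }
    }

    public static void main(String[] args) {
        String reply = "{\"messageContent\":[{\"about\":\"Diet\",\"calories\":250}]}";
        System.out.println(route(reply));
    }
}
```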

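What needs improvement

The prompt above has one fairly obvious flaw: it is far too long. That costs money, small changes later on will be awkward to make, and it certainly cannot be swapped out as conveniently as an environment variable.
At the very least, splitting systemMessage into several shorter Strings might make it easier to maintain.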
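
One way to do that split could look like the sketch below. The bracketed section strings are placeholders standing in for the corresponding parts of the full prompt, and the class and constant names are assumptions for illustration.

```java
import java.util.List;

class SystemPromptBuilder {
    static final String INTRO = "Please answer according to the type of the User message.";
    // Placeholders: each would hold the matching section of the full prompt.
    static final String TRAINING_RULES = "[training-log instructions and JSON example]";
    static final String DIET_RULES = "[food and diet instructions and JSON example]";
    static final String FALLBACK_RULES = "[fallback instructions and JSON example]";
    static final String OUTPUT_RULES =
            "Don't respond to anything beside the json which contains your answer.";

    // Joins the sections with a blank line between them.
    public static String build(List<String> parts) {
        return String.join("\n\n", parts);
    }

    public static void main(String[] args) {
        String systemMessage = build(List.of(
                INTRO, TRAINING_RULES, DIET_RULES, FALLBACK_RULES, OUTPUT_RULES));
        System.out.println(systemMessage);
    }
}
```

With the sections separated, a change to the diet rules touches only one constant, and each section could later be loaded from its own file.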

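Sending the API Request: choices and maxTokens

choices holds GPT's responses, and there may be more than one.
Each API request can produce several responses, which are placed in the choices array.
Each entry in choices is a separately generated response to the same prompt, which lets the caller pick the one that best fits the need.
The entries are independent samples rather than a list ranked by confidence, so choices[0] is best read as simply the first response and choices[1] as another equally valid candidate.
Our Bot Server currently always uses choices[0] as the API response.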

Deciding how many choices and tokens the API may return

In the API request:
- The n parameter sets how many choices the API generates.
- maxTokens caps how many tokens the response may contain.

ChatCompletionRequest chatCompletionRequest = ChatCompletionRequest
        .builder()
        .model("gpt-3.5-turbo") // specify the model id
        .messages(messages) // the prompt we assembled
        .n(1) // we only need choices[0]
        .maxTokens(500) // respond with at most 500 tokens
        .build();
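
Because the reply is capped by the token limit, a long JSON answer can be cut off mid-way; the API signals this with a finish_reason of "length" on the choice. Here is a minimal sketch of reading choices[0] defensively. Each choice is modelled as a plain Map as a stand-in for the library's choice type, which is an assumption for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class FirstChoiceSketch {
    // Returns the content of choices[0], or empty when there is no choice
    // or the reply was truncated by the token limit.
    public static Optional<String> firstCompleteReply(List<Map<String, String>> choices) {
        if (choices == null || choices.isEmpty()) {
            return Optional.empty();
        }
        Map<String, String> first = choices.get(0);
        if ("length".equals(first.get("finish_reason"))) {
            return Optional.empty(); // cut off before the JSON was finished
        }
        return Optional.ofNullable(first.get("content"));
    }

    public static void main(String[] args) {
        List<Map<String, String>> choices = List.of(
                Map.of("content", "{\"messageContent\":[]}", "finish_reason", "stop"));
        System.out.println(firstCompleteReply(choices).orElse("(no usable reply)"));
    }
}
```

Treating a truncated reply as "no reply" keeps half-written JSON away from the parser; the caller can then retry with a larger limit or fall back to a default message.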

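Structured Outputs

Structured Outputs is OpenAI's feature for structured API responses. Used well, it can simplify the prompt above and give developers a response structure they can rely on. This matters a lot: only with a fixed structure can we conveniently read the data or handle cases where data is missing.

Specifying the JSON Schema of GPT's response with Structured Outputs

Add response_format to the API request:
- Set type to json_schema
- Put the schema content in json_schema
  The official example is as follows: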

const response = await openai.chat.completions.create({
    model: "gpt-4o-2024-08-06",
    messages: [
        { role: "system", content: "You are a helpful math tutor. Guide the user through the solution step by step." },
        { role: "user", content: "how can I solve 8x + 7 = -23" }
    ],
    response_format: {
        type: "json_schema",
        json_schema: {
            name: "math_response",
            schema: {
                type: "object",
                properties: {
                    steps: {
                        type: "array",
                        items: {
                            type: "object",
                            properties: {
                                explanation: { type: "string" },
                                output: { type: "string" }
                            },
                            required: ["explanation", "output"],
                            additionalProperties: false
                        }
                    },
                    final_answer: { type: "string" }
                },
                required: ["steps", "final_answer"],
                additionalProperties: false
            },
            strict: true
        }
    }
});
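
Applied to our project, the diet reply could be described by a schema along these lines. This is only a sketch of what the response_format value might look like, built as a plain Java string. The schema name diet_response is an assumption, and the field list is taken from the Diet example in the prompt above.

```java
class DietResponseFormat {
    // Returns a response_format value describing one diet entry.
    // The name "diet_response" is illustrative, not an official one.
    public static String json() {
        return String.join("\n",
                "{",
                "  \"type\": \"json_schema\",",
                "  \"json_schema\": {",
                "    \"name\": \"diet_response\",",
                "    \"strict\": true,",
                "    \"schema\": {",
                "      \"type\": \"object\",",
                "      \"properties\": {",
                "        \"about\": { \"type\": \"string\" },",
                "        \"calories\": { \"type\": \"number\" },",
                "        \"protein\": { \"type\": \"number\" },",
                "        \"fat\": { \"type\": \"number\" }",
                "      },",
                "      \"required\": [\"about\", \"calories\", \"protein\", \"fat\"],",
                "      \"additionalProperties\": false",
                "    }",
                "  }",
                "}");
    }

    public static void main(String[] args) {
        System.out.println(json());
    }
}
```

As in the official example, every property is listed in required and additionalProperties is false, which is how the strict schema there is written.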

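Handling cases where no Structured Output can be produced: refusal

OpenAI may refuse to produce a Structured Output for safety or other reasons. When that happens, the response message carries a refusal field with the relevant information.
See the official documentation for details.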
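
On the Bot Server side, the check could be as small as the sketch below. The response message is modelled as a plain Map with content and refusal keys, which is an assumption made to keep the example self-contained; the class name RefusalCheck and the fallback text are likewise illustrative.

```java
import java.util.Map;

class RefusalCheck {
    // Returns the message content, or the fallback text when the model refused
    // or no content is present.
    public static String contentOrFallback(Map<String, String> message, String fallback) {
        String refusal = message.get("refusal");
        if (refusal != null && !refusal.isEmpty()) {
            return fallback;
        }
        String content = message.get("content");
        return content != null ? content : fallback;
    }

    public static void main(String[] args) {
        Map<String, String> refused = Map.of("refusal", "I'm sorry, I can't help with that.");
        System.out.println(contentOrFallback(refused, "Sorry, this message could not be processed."));
    }
}
```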

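Summary

This chapter covered the following:
- Building a prompt through the API:
  - Creating messages under the system and user roles, for the premise and the instruction respectively, to send a prompt to GPT, and reading the response from choices.
- Using max_tokens (maxTokens in the Java library) and n to set the token limit and the number of response choices.
- A supplementary look at the recently released Structured Outputs.
Although the prompt used in this project is admittedly verbose, the results are still acceptable, and it can serve as a reference for readers building similar features.

At this point we have wired everything together, from the LINE Platform at the source, through AWS Lambda / EC2, the Kafka Message Queue, and MongoDB, all the way to OpenAI, walking through the concepts and implementation details of each along the way.
Thank you very much for reading this far.
The next article is the last in this series. It will focus on how our Bot Server integrates the OpenAI Service with the Kafka Consumer, and share the testing and deployment process along with room for future improvement.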