[D12] My First Chatbot - Integrating External Data

15th iThome Ironman Contest, langchain, openai, chatgpt, chatbot

Ted Chen

2023-09-16 05:44:30

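I'm glad we have both made it to Day 12, and I hope each installment has been more useful than the last.

In this article, we look at how to systematically combine several internal prompt-processing units with external data.

Demo scenario

The code in this article simulates a smart vocabulary teaching assistant that provides personalized material based on the learning goal you choose, such as everyday expressions, travel expressions, or business expressions. Once you pick a learning context, the assistant pulls the matching content from an external learning database as its teaching resource and randomly picks one word to teach.

Implementation details

When the user sends a message, the system first checks whether that user's learning goal is already known. If it is not, the message goes through a learning-goal checker that extracts the student's intent. Its prompt is designed as follows: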

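def check_user_learning_topic(user_message, verbose=False):
    """
    Classifier for the user's learning goal.
    Three goals are offered by default: 生活用語 (everyday expressions),
    旅行用語 (travel expressions) and 商業用語 (business expressions).
    The labels stay in Chinese because later code compares against them.
    """

    prompt = """
    You are a chatbot dedicated to classifying a user's learning goal. Your main task is to output only the matching learning-goal category, based on the message the user provides.

    Based on the user's needs, output exactly one of the following learning-mode labels:

    生活用語: for users who want to learn everyday expressions, such as shopping or chatting with friends.
    旅行用語: for users who want to learn travel-related expressions, such as booking a hotel or asking for directions.
    商業用語: for users learning for work or business communication, such as joining meetings or writing business emails.
    If the user gives no clear learning goal, output 「無明確資訊」.
    """

    messages = [
        {   # Task instruction
            'role': 'system',
            'content': prompt
        },
        {   # One-shot example: a greeting carries no learning goal
            'role': 'user',
            'content': 'Hello'
        },
        {
            'role': 'assistant',
            'content': '無明確資訊'
        },
        user_message
    ]

    # Call ChatCompletion
    response = get_completion_from_messages(messages)

    if verbose:
        print(f'check_user_learning_topic response: {response}')

    return response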

The test case is as follows:

test_case = {
    'role': 'user',
    'content': "I want to be able to travel freely on my own"
}

test_result = check_user_learning_topic(test_case)
print(test_result)

--- Actual output below ---

>> 旅行用語
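
The main program later compares the checker's reply against the exact label strings, so a reply with extra whitespace, quotes, or punctuation would not match. A minimal sketch of one way to guard against that is shown here; `normalize_topic` and `VALID_TOPICS` are hypothetical names introduced for illustration and are not part of the original notebook.

```python
# Hypothetical helper: map a raw classifier reply onto one of the known labels.
VALID_TOPICS = ('生活用語', '旅行用語', '商業用語')

def normalize_topic(raw_reply):
    """Return the first label from VALID_TOPICS that appears in raw_reply, else None."""
    if not raw_reply:
        return None
    for topic in VALID_TOPICS:
        if topic in raw_reply:
            return topic
    return None

print(normalize_topic('旅行用語'))
print(normalize_topic(' 「旅行用語」。\n'))
print(normalize_topic('無明確資訊'))
```

With a helper like this, the caller can treat `None` as "no goal yet" instead of relying on the model to echo the label character for character.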

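Learning-goal guide assistant

If the checker cannot extract a clear learning goal from the user's message, it outputs 「無明確資訊」 (no clear information). The message is then routed to a specially designed guide assistant whose only job is to help the user settle on a learning goal: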

def get_learning_topic_ta_prompt(user_message, settings):
    """
    Assistant that guides students to state their learning goal.
    """

    user_lang = settings['user_lang']
    learning_lang = settings['learning_lang']
    learning_topics = settings['learning_topics']

    prompt = f"""
    You are a teaching assistant who guides students to state their learning goal. Reply briefly and in a gentle tone.

    Student background
    Student's native language: {user_lang}
    Language being learned: {learning_lang}

    Learning goals we offer:
    {learning_topics}
    """
    messages = [
        {   # Task instruction
            'role': 'system',
            'content': prompt
        },
        user_message    # The user's message
    ]

    # Call ChatCompletion
    response = get_completion_from_messages(messages)

    return response

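Here is a test case for this assistant, together with its output:

test_case = {
    'role': 'user',
    'content': "I'm not sure what to study today."
}

# global_settings is not shown in this article
test_result = get_learning_topic_ta_prompt(test_case, global_settings)
print(test_result)

--- Actual output below ---

>> Today the learning goals we offer are everyday expressions, travel expressions, and business expressions. You can pick one or more of them, and we can start learning together! If you have any special learning needs or want to study another topic, feel free to tell us, and we will do our best to provide suitable learning resources.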
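
The assistants read their background data from a `global_settings` dictionary that this article never lists. The sketch below shows one plausible shape for it. The key names come from the code in this article, but every value is an assumption made for illustration.

```python
# Assumed shape of global_settings; the real values are not shown in the article.
global_settings = {
    'user_lang': 'Traditional Chinese',   # assumption
    'learning_lang': 'English',           # assumption
    'learning_topics': '生活用語, 旅行用語, 商業用語',  # assumption: plain string for the prompt
    'learning_content': None,             # filled in once a topic is chosen
}

# The three keys the guide assistant reads must be present.
required = ('user_lang', 'learning_lang', 'learning_topics')
missing = [key for key in required if key not in global_settings]
print(missing)
```

Keeping `learning_topics` as a plain string is a design choice of this sketch, so that it reads naturally when interpolated into the prompt.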

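Once the checker has identified the user's learning goal, we use it to select suitable content from the external learning database, combine it with the other required data, and pass everything to the vocabulary teaching assistant's prompt module.

Simulating the external database

For the external database, we simulate a small learning database in which each topic maps to one specific YouTube subtitle file, for demonstration purposes. For example:

生活用語 (everyday expressions): uses the "8 Habits to Help You Live Your Best Life - English.srt" subtitle file
旅行用語 (travel expressions): uses the "How (And What) To Pack For a Weekend Getaway - English.srt" subtitle file
商業用語 (business expressions): uses the "How to Start a Service Business _ The Journey - English (United States).srt" subtitle file

The following code reads the YouTube subtitle files and builds the learning database: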

import pysrt

srt_project = 'MyLavi_Data/srt_files'
srt_files = {
    '生活用語': f'{srt_project}/8 Habits to Help You Live Your Best Life - English.srt',
    '旅行用語': f'{srt_project}/How (And What) To Pack For a Weekend Getaway - English.srt',
    '商業用語': f'{srt_project}/How to Start a Service Business _ The Journey - English (United States).srt',
}

def load_srt_file():
    """
    Build the learning content for each learning goal by reading
    the subtitle text from its SRT file.
    """
    srt_docs = {}
    for k, f in srt_files.items():
        # Read the SRT file
        subs = pysrt.open(f)

        # Empty list that will hold the text of each subtitle
        list_of_lines = []

        # Iterate over the subtitles and collect their text
        for sub in subs:
            list_of_lines.append(sub.text)
        srt_docs[k] = list_of_lines

    return srt_docs

topic_contents = load_srt_file()
print(topic_contents)
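
To make the shape of this data concrete, here is a standard-library sketch that extracts cue text from SRT-formatted input, which is roughly the job `pysrt` does for us above. It is not how `pysrt` is implemented. `srt_text_lines` is a hypothetical helper and `SAMPLE_SRT` is made-up sample data, not content from the actual subtitle files.

```python
import re

# Made-up sample in SRT layout: cue index, timing line, then one or more text lines.
SAMPLE_SRT = """1
00:00:00,000 --> 00:00:02,500
Hello and welcome.

2
00:00:02,500 --> 00:00:05,000
Today we pack
for a weekend trip.
"""

def srt_text_lines(srt_text):
    """Return one text string per subtitle cue, dropping index and timing lines."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r'\n\s*\n', srt_text.strip()):
        lines = block.splitlines()
        # lines[0] is the cue index, lines[1] the timing; the rest is text.
        if len(lines) >= 3:
            cues.append('\n'.join(lines[2:]))
    return cues

print(srt_text_lines(SAMPLE_SRT))
```

Each entry of `topic_contents` is a list of strings of this kind, one per subtitle cue.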

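Prompt design for the vocabulary teaching assistant

With the learning database above and the data extracted from the user's message, the vocabulary teaching assistant can generate teaching content that fits this background. Its prompt is designed as follows: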

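def get_lex_suggestion_ta_prompt(user_message, settings):
    """
    Essentially a vocabulary teaching assistant.
    """
    user_lang = settings['user_lang']
    learning_lang = settings['learning_lang']
    learning_content = settings['learning_content']

    messages = [
        {
            'role': 'system',
            'content': f"""You are a professional foreign-language teacher.
            Based on the user's learning background and learning content, pick one word at random to teach.

            Learning background:
            Student's native language: {user_lang}
            Language the student wants to learn: {learning_lang}

            Learning content:
            {learning_content}
            """
        },
        user_message
    ]

    # Call ChatCompletion
    response = get_completion_from_messages(messages)

    # Return the reply so that callers, such as the main loop, can use it
    return response

Here is a related test case and its output:

user_message = {
    'role': 'user',
    'content': "It's for work. I'd like to be able to respond fluently in meetings with foreigners."
}

global_settings['learning_content'] = topic_contents['商業用語']
ai_response = get_lex_suggestion_ta_prompt(user_message, global_settings)
print(ai_response)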

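--- Actual output below ---

>> Based on your learning background and learning content, I have picked a word to teach you:

Word: negotiation (談判)

Explanation: Negotiation is the process in which two or more parties communicate and bargain with each other in order to reach a consensus or solve a problem.

Example: During the meeting, they engaged in a negotiation to reach a mutually beneficial agreement. (在會議中，他們進行了談判，以達成互利的協議。)

Pronunciation: [nɪˌɡoʊʃiˈeɪʃən]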
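
The teaching assistant interpolates the whole subtitle list into its system prompt. For a long video this may produce a very long prompt, so one option is to pass only a random subset of lines. The following is a sketch under that assumption; `sample_learning_content` and its parameters are hypothetical and not part of the original notebook.

```python
import random

def sample_learning_content(lines, max_lines=20, seed=None):
    """Pick up to max_lines subtitle lines at random, keeping their original order."""
    if len(lines) <= max_lines:
        return list(lines)
    rng = random.Random(seed)
    # Sample indices rather than lines so the original order can be restored.
    chosen = sorted(rng.sample(range(len(lines)), max_lines))
    return [lines[i] for i in chosen]

demo_lines = [f'line {i}' for i in range(100)]
subset = sample_learning_content(demo_lines, max_lines=5, seed=0)
print(len(subset))
```

A call such as `sample_learning_content(topic_contents['商業用語'])` could then be stored in `global_settings['learning_content']` in place of the full list.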

Final integration

Putting all of the pieces above together, the final main program looks like this:

from day12_include import is_moderation_check_passed, moderation_warning_prompt

# Labels the checker may return; they are also the keys of topic_contents
VALID_TOPICS = ('生活用語', '旅行用語', '商業用語')

message_histories = []
learning_topic = None
verbose = False

user_input = input("Enter a user message (type bye to leave): ")
while user_input != 'bye':
    print(f'User message: {user_input}')

    user_message = {
        'role': 'user',
        'content': user_input
    }

    message_histories.append(user_message)

    # Moderation check on the user's message
    ai_response = None
    response_ta_name = None
    if is_moderation_check_passed(user_input):  # Message passed the moderation check
        # Run the learning-goal checker only while no valid goal is known
        if learning_topic not in VALID_TOPICS:
            learning_topic = check_user_learning_topic(user_message, verbose=verbose)

        # Pick the matching teaching assistant (TA)
        response_ta = None
        if learning_topic in VALID_TOPICS:
            # Reached once the user has chosen a learning goal

            # Select the learning content that matches the user's goal
            global_settings['learning_content'] = topic_contents[learning_topic]

            # Use the vocabulary teaching assistant
            response_ta = get_lex_suggestion_ta_prompt

            # Assistant name, for debugging
            response_ta_name = "get_lex_suggestion_ta_prompt"
        else:
            # Use the assistant that guides the user to state a learning goal
            response_ta = get_learning_topic_ta_prompt

            # Assistant name, for debugging
            response_ta_name = "get_learning_topic_ta_prompt"

        # Get the reply from the selected assistant
        ai_response = response_ta(user_message, global_settings)
    else:   # Message failed the moderation check
        response_ta_name = "moderation_warning_prompt"
        ai_response = moderation_warning_prompt(user_message)

    print(f'AI response ({response_ta_name}): {ai_response}')

    ai_message = {
        'role': 'assistant',
        'content': f"{ai_response}"
    }

    message_histories.append(ai_message)

    user_input = input("Enter a user message (type bye to leave): ")
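
The routing in this loop can be checked without calling the OpenAI API by factoring the per-turn decision into a function that receives the checker and the assistants as parameters. The sketch below takes that approach and leaves out the moderation step for brevity. `handle_turn` and the `fake_*` stubs are hypothetical names that stand in for the real, API-backed functions.

```python
VALID_TOPICS = ('生活用語', '旅行用語', '商業用語')

def handle_turn(user_message, state, classify, guide, teach):
    """Route one user message; classify, guide and teach are injected callables."""
    # Only ask the classifier while no valid topic is stored yet.
    if state.get('learning_topic') not in VALID_TOPICS:
        state['learning_topic'] = classify(user_message)

    if state['learning_topic'] in VALID_TOPICS:
        return 'teach', teach(user_message, state['learning_topic'])
    return 'guide', guide(user_message)

# Stubs standing in for the real, API-backed functions.
def fake_classify(message):
    return '旅行用語' if 'travel' in message['content'] else '無明確資訊'

def fake_guide(message):
    return 'Which goal would you like?'

def fake_teach(message, topic):
    return f'Teaching a word for {topic}'

state = {'learning_topic': None}
stubs = (fake_classify, fake_guide, fake_teach)
first = handle_turn({'role': 'user', 'content': 'hello'}, state, *stubs)
second = handle_turn({'role': 'user', 'content': 'travel please'}, state, *stubs)
third = handle_turn({'role': 'user', 'content': 'thanks'}, state, *stubs)
print(first[0], second[0], third[0])
```

The third call shows that once a valid topic is stored, later messages go straight to the teaching assistant without being classified again, which matches the behaviour of the loop above.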

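Here is one complete conversation as a demonstration:

User message: Hello
AI response (get_learning_topic_ta_prompt): Hello! Is there anything I can help you with?

User message: I'd like to do some studying
AI response (get_learning_topic_ta_prompt): Great! Are you interested in any of the following learning goals?

1. Everyday expressions
2. Travel expressions
3. Business expressions

Please choose the goal that interests you most, and we can start from there.

User message: Sure, something I can use in daily life
AI response (get_lex_suggestion_ta_prompt): OK, let's learn a word that you can use in daily life.

Word: habit (習慣)

Explanation: A habit is a behavior or pattern of thinking that a person repeats over a long period of time.

Example: Developing good study habits can help you succeed in school.（培養良好的學習習慣可以幫助你在學校取得成功。）

Pronunciation: [ˈhæbɪt]

User message: Thank you very much
AI response (get_lex_suggestion_ta_prompt): You're welcome! Which word would you like to learn?

Enter a user message (type bye to leave): bye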
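
The main program records every turn in `message_histories`, but the assistants shown in this article only receive the latest user message, so the model does not see earlier turns. One possible extension is to build the message list from the system prompt plus the most recent history, sketched below. `build_messages` and `max_history` are hypothetical and not part of the original notebook.

```python
def build_messages(system_prompt, history, max_history=6):
    """Build a chat message list: system prompt followed by the most recent turns."""
    recent = history[-max_history:] if max_history > 0 else []
    return [{'role': 'system', 'content': system_prompt}] + list(recent)

history = [
    {'role': 'user', 'content': 'Hello'},
    {'role': 'assistant', 'content': 'Hello! How can I help?'},
    {'role': 'user', 'content': 'Thank you'},
]
messages = build_messages('You are a professional foreign-language teacher.', history, max_history=2)
print([m['role'] for m in messages])
```

Capping the number of turns keeps the prompt bounded as the conversation grows.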

To see the complete code, see: D12. 我的第一個聊天機器人 - 外部資料的整合.ipynb

Thanks for joining, and I look forward to seeing you next time!