As a developer, when choosing where to obtain Generative AI models and how to use them, a few names and approaches come up again and again.
Model releases usually appear on Hugging Face first, so let's start by looking at how to use a model from Hugging Face.
We'll take the example below and break it down into a few parts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from google.colab import userdata  # running in Colab; the token is stored as a Colab secret

# Hugging Face access token (required for gated models such as Gemma)
HUGGING_FACE_ACCESS_TOKEN = userdata.get('HF_TOKEN')

model_name = 'google/gemma-2-2b-it'

# Download the weights in half precision and move the model onto the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    token=HUGGING_FACE_ACCESS_TOKEN
).to('cuda')
tokenizer = AutoTokenizer.from_pretrained(model_name, token=HUGGING_FACE_ACCESS_TOKEN)

def generate_response(query, context, max_length=1000):
    # Build a simple RAG prompt: retrieved context first, then the question
    prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to('cuda')
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=max_length, num_return_sequences=1)
    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    # Keep only the text after "Answer:", i.e. drop the echoed prompt
    answer_start = decoded_output.find("Answer:") + len("Answer:")
    answer = decoded_output[answer_start:].strip()
    return answer

def query_documents(query):
    # Retrieve the chunks most similar to the query
    # (helper built in the earlier retrieval step; see the note after this listing)
    similar_chunks = find_most_similar_chunks(query)
    # Concatenate the retrieved chunks into a single context string
    context = " ".join([result['chunk'].replace("\n", "") for result in similar_chunks])
    response = generate_response(query, context)
    return response, similar_chunks
query = "How many types of regular Train Car cards are there?"
answer, relevant_chunks = query_documents(query)
print(f"Query: {query}\n\n-----\n")
print(f"Generated answer: {answer}\n\n-----\n")
print("Relevant chunks:")
for chunk in relevant_chunks:
print(f"Document: {chunk['document']}")
print(f"Chunk: {chunk['chunk']}".replace("\n", ""))
print(f"Distance: {chunk['distance']}")
print()
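One thing to note: the example calls a find_most_similar_chunks helper that was built in the earlier retrieval step and is not repeated here. The code above only assumes that it returns a list of dictionaries with 'document', 'chunk', and 'distance' keys; the stub below merely documents that assumed interface and is not the real implementation.

# Illustrative stub only: the real find_most_similar_chunks was implemented in the
# earlier retrieval step and queries the vector store of embedded chunks.
def find_most_similar_chunks(query):
    """Return the chunks most similar to the query.

    Each result is expected to be a dict with:
      'document' - name of the source document the chunk came from
      'chunk'    - the retrieved text of the chunk
      'distance' - similarity distance between the query and the chunk
    """
    raise NotImplementedError("Replace with the retrieval helper from the previous step")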
Loading a model through Hugging Face is the first approach most developers will reach for, because Hugging Face hosts a large number of models and datasets and provides plenty of tooling that lets you get up and running quickly. For a software engineer, however, the performance of loading and reading the model will depend on the Hugging Face library and on the network speed when downloading from Hugging Face. In the next article we will look at using Generative AI models through the more common RESTful APIs.
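Since load time depends on how quickly the weights can be fetched from Hugging Face, one common mitigation is to download the model once and reuse the local cache afterwards. The snippet below is only a minimal sketch of that idea, not part of the original example; it assumes the same model name and the HUGGING_FACE_ACCESS_TOKEN defined above.

import torch
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# One-off download: fetch the whole model repository into the local Hugging Face cache.
# (Assumes HUGGING_FACE_ACCESS_TOKEN is defined as in the example above.)
snapshot_download(repo_id='google/gemma-2-2b-it', token=HUGGING_FACE_ACCESS_TOKEN)

# Later loads read purely from the local cache, so no network round trip is needed.
model = AutoModelForCausalLM.from_pretrained(
    'google/gemma-2-2b-it',
    torch_dtype=torch.float16,
    local_files_only=True,
).to('cuda')
tokenizer = AutoTokenizer.from_pretrained('google/gemma-2-2b-it', local_files_only=True)

With the snapshot cached, startup time is bounded by disk and GPU transfer rather than by the network connection to Hugging Face.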