As a developer, when choosing where to obtain Generative AI models and how to use them, a few names and approaches come up again and again.
Model releases usually appear on Hugging Face first, so let's start by looking at how to use a model from Hugging Face.
We'll take the example below and break it down into a few parts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from google.colab import userdata  # running in Colab; the token is stored as a Colab secret

# Hugging Face access token (required for gated models such as Gemma)
HUGGING_FACE_ACCESS_TOKEN = userdata.get('HF_TOKEN')

model_name = 'google/gemma-2-2b-it'

# Download the weights in half precision and move the model onto the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    token=HUGGING_FACE_ACCESS_TOKEN
).to('cuda')
tokenizer = AutoTokenizer.from_pretrained(model_name, token=HUGGING_FACE_ACCESS_TOKEN)

def generate_response(query, context, max_length=1000):
    # Build a simple RAG prompt: retrieved context first, then the question
    prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to('cuda')
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=max_length, num_return_sequences=1)
    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    # Keep only the text after "Answer:", i.e. drop the echoed prompt
    answer_start = decoded_output.find("Answer:") + len("Answer:")
    answer = decoded_output[answer_start:].strip()
    return answer

def query_documents(query):
    # Retrieve the chunks most similar to the query
    # (helper built in the earlier retrieval step; see the note after this listing)
    similar_chunks = find_most_similar_chunks(query)
    # Concatenate the retrieved chunks into a single context string
    context = " ".join([result['chunk'].replace("\n", "") for result in similar_chunks])
    response = generate_response(query, context)
    return response, similar_chunks
query = "How many types of regular Train Car cards are there?"
answer, relevant_chunks = query_documents(query)
print(f"Query: {query}\n\n-----\n")
print(f"Generated answer: {answer}\n\n-----\n")
print("Relevant chunks:")
for chunk in relevant_chunks:
print(f"Document: {chunk['document']}")
print(f"Chunk: {chunk['chunk']}".replace("\n", ""))
print(f"Distance: {chunk['distance']}")
print()
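One thing to note: the example calls a find_most_similar_chunks helper that was built in the earlier retrieval step and is not repeated here. The code above only assumes that it returns a list of dictionaries with 'document', 'chunk', and 'distance' keys; the stub below merely documents that assumed interface and is not the real implementation.

# Illustrative stub only: the real find_most_similar_chunks was implemented in the
# earlier retrieval step and queries the vector store of embedded chunks.
def find_most_similar_chunks(query):
    """Return the chunks most similar to the query.

    Each result is expected to be a dict with:
      'document' - name of the source document the chunk came from
      'chunk'    - the retrieved text of the chunk
      'distance' - similarity distance between the query and the chunk
    """
    raise NotImplementedError("Replace with the retrieval helper from the previous step")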
Loading a model through Hugging Face is the first approach most developers will reach for, because Hugging Face hosts a large number of models and datasets and provides plenty of tooling that lets you get up and running quickly. For a software engineer, however, the performance of loading and reading the model will depend on the Hugging Face library and on the network speed when downloading from Hugging Face. In the next article we will look at using Generative AI models through the more common RESTful APIs.
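Since load time depends on how quickly the weights can be fetched from Hugging Face, one common mitigation is to download the model once and reuse the local cache afterwards. The snippet below is only a minimal sketch of that idea, not part of the original example; it assumes the same model name and the HUGGING_FACE_ACCESS_TOKEN defined above.

import torch
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# One-off download: fetch the whole model repository into the local Hugging Face cache.
# (Assumes HUGGING_FACE_ACCESS_TOKEN is defined as in the example above.)
snapshot_download(repo_id='google/gemma-2-2b-it', token=HUGGING_FACE_ACCESS_TOKEN)

# Later loads read purely from the local cache, so no network round trip is needed.
model = AutoModelForCausalLM.from_pretrained(
    'google/gemma-2-2b-it',
    torch_dtype=torch.float16,
    local_files_only=True,
).to('cuda')
tokenizer = AutoTokenizer.from_pretrained('google/gemma-2-2b-it', local_files_only=True)

With the snapshot cached, startup time is bounded by disk and GPU transfer rather than by the network connection to Hugging Face.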