2025 iThome Ironman Contest

DAY 24

Generative AI

AI Jujutsu Kaisen: LLM Absolute Domain Expansion series, part 24

Day24-Gemini

17th Ironman Contest

cindy7020

2025-10-03 20:22:39

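Introduction to Gemini

This document compares three mainstream generative AI services and shows how to call them from code: OpenAI ChatGPT (brief), Google Gemini / PaLM (detailed), and Anthropic Claude (detailed). It focuses on explanations and working examples for Gemini and Claude (Python, Node.js, embeddings, multimodal input, streaming, RAG, and practical advice).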

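Quick overview
Gemini in depth (explanation + code walkthrough)
- 2.1 What is Gemini? Capabilities and use cases
- 2.2 Authentication and client installation
- 2.3 Python example: text generation (synchronous and streaming)
- 2.4 Node.js example: text + streaming
- 2.5 Embeddings and RAG (vector search integration)
- 2.6 Multimodal (image upload, visual question answering)
- 2.7 Error handling, cost control, and best practices
Gemini vs Claude: how to choose (by requirement)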

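`1. Quick overview`

Gemini (Google): emphasizes multimodal capability and integration with retrieval and knowledge bases. It suits scenarios that need image understanding, long-document consolidation, and retrieval-augmented generation (RAG).
Claude (Anthropic): emphasizes controllability, safety, and an explainable, constitution-driven policy. It suits enterprise applications with strict requirements for output control and regulatory compliance, and with low risk tolerance.

The following sections use code examples to show how to call the API, generate embeddings, implement RAG, handle multimodal input, and apply practical advice in production.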

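`2. Gemini in depth (explanation + code walkthrough)`

`2.1 What is Gemini? Capabilities and use cases`

Gemini is Google's family of generative AI models (the PaLM / Gemini family). Its strengths are multimodal integration (text + images + structured knowledge), search, and knowledge retrieval.
Good fits: visual question answering, document retrieval integration (RAG), multilingual applications, and scenarios that depend on the Google ecosystem (such as Drive/GCS integration).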

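`2.2 Authentication and client installation`

Apply for an API key (or authenticate through Google Cloud IAM with a service account).
Install the official client (examples):
- Python: pip install google-generativeai (or whichever package the official docs currently recommend)
- Node.js: npm install @google/generative-ai (defer to the official package)

Note: Google's SDKs and model names change between versions. The code here uses model='gemini-*' or model='models/gemini-*' as a placeholder; take the actual names from the official documentation.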
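
Whichever client is used, the key is usually read from an environment variable rather than hard-coded. Below is a minimal sketch of that step, assuming the variable is called GOOGLE_API_KEY as in the examples in this article; the helper name `load_api_key` is hypothetical and not part of any SDK.

```python
import os


def load_api_key(env=None, var_name="GOOGLE_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before creating the client"
        )
    return key
```

Failing early with a readable message tends to be easier to debug than an authentication error raised later by the SDK.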

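`2.3 Python example: text generation (synchronous and streaming)`

# Uses the google-generativeai Python SDK; newer SDK releases may expose a different interface
import os

import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

# 'gemini-pro' is a placeholder; use a model name from the official documentation
model = genai.GenerativeModel('gemini-pro')
prompt = 'Explain the attention mechanism in Chinese and give one short example.'

# Synchronous text generation
resp = model.generate_content(
    prompt,
    generation_config={'max_output_tokens': 300},
)
print(resp.text)

# Streaming: print each chunk as it arrives
for chunk in model.generate_content(prompt, stream=True):
    print(chunk.text, end='', flush=True)

Note: some Gemini SDKs provide a more structured call style (for example, responses.generate()); follow the official SDK documentation.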

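`2.4 Node.js example: text + streaming`

// Uses the @google/generative-ai package; check the official docs for the current SDK
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY);

async function run() {
  // 'gemini-pro' is a placeholder model name
  const model = genAI.getGenerativeModel({
    model: 'gemini-pro',
    generationConfig: { maxOutputTokens: 200 },
  });
  const prompt = 'Introduce the intuition behind the Transformer, in Chinese.';

  // Single response
  const result = await model.generateContent(prompt);
  console.log(result.response.text());

  // Streaming: write each chunk as it arrives
  const streamed = await model.generateContentStream(prompt);
  for await (const chunk of streamed.stream) {
    process.stdout.write(chunk.text());
  }
}

run();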

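`2.5 Embeddings and RAG (vector search integration)`

Flow: split the documents → generate embeddings → store them in a vector DB (such as FAISS, Milvus, or Weaviate) → at query time fetch the top-k matches → merge them into the prompt and send it to Gemini for generation.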

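import os

import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

# 1) Embed the query ('models/embedding-001' is a placeholder embedding model name)
q = 'Please summarize the key points of document A'
q_emb = genai.embed_content(
    model='models/embedding-001',
    content=q,
    task_type='retrieval_query',
)['embedding']

# 2) Store the document vectors in FAISS (as in the earlier example)
# 3) Query the top-k, combine the prompt with the retrieved text, and send it to the model

retrieved_docs = ['doc1 text', 'doc2 text']
prompt = (
    'Answer the question based on the following documents:\n'
    + '\n---\n'.join(retrieved_docs)
    + '\nQuestion: ' + '...'
)
model = genai.GenerativeModel('gemini-pro')
resp = model.generate_content(prompt)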
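
The storage and top-k steps are only mentioned in comments above. As a rough illustration of the whole flow (split, embed, store, retrieve), here is a dependency-free sketch. It rests on loud assumptions: `toy_embed` is a bag-of-words stand-in for a real embedding model, a plain Python list stands in for FAISS, and all function names are hypothetical.

```python
import math
import re


def split_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def tokenize(text):
    """Lowercase alphanumeric tokens; enough for this English-only toy."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts):
    """Map every distinct token in the corpus to a vector position."""
    words = sorted({w for t in texts for w in tokenize(t)})
    return {w: i for i, w in enumerate(words)}


def toy_embed(text, vocab):
    """Stand-in for a real embedding call: L2-normalised word counts."""
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def top_k(query_vec, index, k=2):
    """Return the k (score, text) pairs with the highest cosine similarity."""
    scored = [
        (sum(q * d for q, d in zip(query_vec, vec)), text)
        for text, vec in index
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


docs = [
    "Attention lets a model weight every token against every other token.",
    "FAISS is a library for fast nearest-neighbour search over vectors.",
    "Receipts usually list the merchant, the date and the total amount.",
]
vocab = build_vocab(docs)
index = [(doc, toy_embed(doc, vocab)) for doc in docs]  # the "vector store"

hits = top_k(toy_embed("how does attention weight each token", vocab), index)
retrieved_docs = [text for _, text in hits]
print(retrieved_docs[0])
```

In a real pipeline the embedding function and the list would be swapped for the embedding API and a vector database; the shape of the loop stays the same.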

Tip: to reduce hallucination, explicitly require the model to cite its sources (source attribution) in the prompt, and restrict it to answering only from the retrieved passages.
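
One way to apply this tip is to number each retrieved passage and spell out the citation rule in the prompt. The sketch below is an assumption about wording, not a prescribed template; `build_grounded_prompt` is a hypothetical helper.

```python
def build_grounded_prompt(question, passages):
    """Number each retrieved passage and ask for answers that cite them."""
    sources = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(passages, start=1)
    )
    return (
        "Answer using only the sources below. "
        "Cite the source number in square brackets after each claim. "
        'If the sources do not contain the answer, reply "not found in sources".\n\n'
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


prompt = build_grounded_prompt(
    "What is FAISS used for?",
    ["doc1 text", "doc2 text"],
)
print(prompt)
```

Numbered sources also make it possible to check afterwards whether every bracketed citation in the answer points at a passage that was actually supplied.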

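`2.6 Multimodal (image upload, visual question answering)`

Gemini's multimodal endpoints typically accept an image, or an image plus text, as input and return text or a structured answer.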

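import os

import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

with open('receipt.jpg', 'rb') as f:
    image_bytes = f.read()

# 'gemini-1.5-flash' is a placeholder; use any vision-capable model from the official docs
model = genai.GenerativeModel('gemini-1.5-flash')
resp = model.generate_content(
    [
        'Analyze this receipt and list the merchant, total amount, and date.',
        {'mime_type': 'image/jpeg', 'data': image_bytes},
    ],
    generation_config={'max_output_tokens': 200},
)
print(resp.text)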

Practical advice: run OCR and mask sensitive data (PII masking) before uploading, and ask in the prompt for a table or JSON response so the backend can parse it easily.
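
Both halves of this advice can be sketched in a few lines. The code below assumes the OCR text is already available as a string and that the prompt asked for a JSON object; the field names in `REQUIRED_KEYS`, the helper names, and the two regular expressions are illustrative assumptions and far from exhaustive PII coverage.

```python
import json
import re

REQUIRED_KEYS = ("merchant", "total", "date")  # hypothetical field names

# Illustrative patterns only: 13-16 digit card-like numbers and email addresses.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_pii(text):
    """Replace card-like numbers and email addresses with placeholders."""
    text = CARD_RE.sub("[CARD]", text)
    return EMAIL_RE.sub("[EMAIL]", text)


def parse_receipt_json(reply_text):
    """Pull the JSON object out of a model reply and check required keys.

    Models sometimes add prose or a code fence around the JSON, so only the
    span from the first "{" to the last "}" is parsed.
    """
    start, end = reply_text.find("{"), reply_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply_text[start:end + 1])
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


reply = 'Here you go:\n{"merchant": "Cafe", "total": 120, "date": "2025-10-03"}'
receipt = parse_receipt_json(reply)
print(receipt["merchant"], receipt["total"], receipt["date"])
print(mask_pii("card 4111 1111 1111 1111 mail a.b@example.com"))
```

Validating the keys on the backend keeps a malformed reply from propagating silently into downstream code.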

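`2.7 Error handling, cost control, and best practices`

Rate limits: implement exponential backoff + jitter.
Cost control: ask for short answers, cap max tokens, and use a cheaper model for drafts.
Input validation: check and filter illegal or sensitive content so it is not sent to a third party.
Logging: store the prompt (or a summary of it) together with the model response for tracing and auditing.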
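
Exponential backoff with jitter, mentioned in the list above, can be sketched independently of any SDK. The wrapper below is a minimal version under stated assumptions: `call_with_backoff` is a hypothetical helper, the delay cap and attempt count are arbitrary defaults, and `retry_on` should be narrowed to whatever rate-limit exception your SDK actually raises.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn() and retry on failure with exponential backoff and full jitter.

    Before retry number n (starting at 0) the wait is a random value between
    0 and min(max_delay, base_delay * 2**n). The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

To use it, wrap whichever SDK call you make in a zero-argument function and pass that as `fn`. The random component spreads retries out so that many clients hitting the same limit do not all retry at the same instant.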
