All About Gemini Multimodal Large Language Models, Day 25 - Google AI Studio - Document Processing

16th Ironman Contest (鐵人賽)

kevin_chiu

2024-10-05 22:48:17

Introduction

On Day 7 we wrote code to process uploaded PDF documents. This time we repeat the same task in Google AI Studio.

Adding documents

Click the + button next to the Google AI Studio prompt box to choose where the file comes from:

Upload a document

Choose a document from Google Drive
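When the local-upload route is scripted instead of clicked (the exported code further down calls `upload_to_gemini(path, mime_type=...)` with a hardcoded `"application/pdf"`), each file needs a MIME type. A minimal sketch, assuming Python's standard `mimetypes` extension table is good enough for the files at hand; `guess_upload_mime_types` is a hypothetical helper, not part of the Gemini SDK.

```python
import mimetypes
from pathlib import Path


def guess_upload_mime_types(paths):
    """Hypothetical helper: map each local path to a MIME type for upload.

    Uses only the stdlib extension table. Unknown extensions map to None,
    meaning the MIME type would have to be supplied by hand.
    """
    result = {}
    for p in paths:
        # Guess from the bare file name so directory parts (e.g. "C:\\...")
        # are never mistaken for a URL scheme.
        mime, _encoding = mimetypes.guess_type(Path(p).name)
        result[p] = mime
    return result


print(guess_upload_mime_types(["2403.05530.pdf", "OWASP_Top_10_v2017(en).pdf"]))
```

Each non-`None` value could then be passed as `mime_type` to `upload_to_gemini`; a `None` signals that the type must be chosen explicitly.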

Use the same prompt and documents as in Day 7.

Results

Prompt
Summarize what each of these two papers states, in 100 characters each, and answer in Traditional Chinese.

Response

Model (46.7 s)
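Paper Summaries (Traditional Chinese)
First paper: Introduces Google's latest multimodal model, Gemini 1.5 Pro. The model is built on a mixture-of-experts architecture, can handle extremely long text of up to ten million tokens as well as hours of audio and video input, and is capable of cross-modal reasoning. Gemini 1.5 Pro outperforms the previous generation on many benchmarks and in some respects even surpasses the state-of-the-art Gemini 1.0 Ultra, while requiring less training compute and being more efficient. The paper highlights the model's ultra-long-context understanding and its striking ability to learn to translate a new language from single-language material.

Second paper: Introduces the OWASP Top 10 - 2017, a list of the ten most critical web application security risks, intended to help organizations develop, purchase, and maintain trustworthy applications and APIs. The OWASP Top 10 is based on a large body of vulnerability data collected from hundreds of organizations and more than 100,000 real-world applications and APIs. The list aims to educate developers, designers, architects, managers, and organizations about the consequences of the most common and most important web application security weaknesses.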
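To check how closely the model kept to the 100-character limit, the raw response text can be split per paper and counted. A minimal sketch, assuming the response marks each paper with `**第一篇論文:**` / `**第二篇論文:**` as it does in the chat history exported in the Get Code section; `split_summaries` and `char_counts` are hypothetical helpers, and another run of the model may format its answer differently.

```python
import re

# Assumed marker format, taken from the exported chat history:
# "**第一篇論文:**", "**第二篇論文:**", ... (ASCII colon inside the bold span).
_MARKER = re.compile(r"\*\*第(.+?)篇論文:\*\*")


def split_summaries(text: str) -> dict:
    """Hypothetical helper: map each paper label ("一", "二", ...) to its summary."""
    parts = _MARKER.split(text)
    # With one capture group, re.split yields [prefix, label1, body1, label2, body2, ...].
    return {label: body.strip() for label, body in zip(parts[1::2], parts[2::2])}


def char_counts(summaries: dict) -> dict:
    """Count non-whitespace characters per summary (each CJK character counts as one)."""
    return {label: len(re.sub(r"\s", "", body)) for label, body in summaries.items()}
```

For example, `char_counts(split_summaries(response.text))` on a response in that format gives one count per paper. Counting every non-whitespace character, punctuation included, is only a rough stand-in for how a reader would interpret "100 characters".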

Get Code (the Python code AI Studio exports for this session)

"""
Install the Google AI Python SDK

$ pip install google-generativeai
"""

import os
import time
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

def upload_to_gemini(path, mime_type=None):
  """Uploads the given file to Gemini.

  See https://ai.google.dev/gemini-api/docs/prompting_with_media
  """
  file = genai.upload_file(path, mime_type=mime_type)
  print(f"Uploaded file '{file.display_name}' as: {file.uri}")
  return file

def wait_for_files_active(files):
  """Waits for the given files to be active.

  Some files uploaded to the Gemini API need to be processed before they can be
  used as prompt inputs. The status can be seen by querying the file's "state"
  field.

  This implementation uses a simple blocking polling loop. Production code
  should probably employ a more sophisticated approach.
  """
  print("Waiting for file processing...")
  for name in (file.name for file in files):
    file = genai.get_file(name)
    while file.state.name == "PROCESSING":
      print(".", end="", flush=True)
      time.sleep(10)
      file = genai.get_file(name)
    if file.state.name != "ACTIVE":
      raise Exception(f"File {file.name} failed to process")
  print("...all files ready")
  print()

# Create the model
generation_config = {
  "temperature": 1,
  "top_p": 0.95,
  "top_k": 64,
  "max_output_tokens": 8192,
  "response_mime_type": "text/plain",
}

model = genai.GenerativeModel(
  model_name="gemini-1.5-pro",
  generation_config=generation_config,
  # safety_settings = Adjust safety settings
  # See https://ai.google.dev/gemini-api/docs/safety-settings
)

# TODO Make these files available on the local file system
# You may need to update the file paths
files = [
  upload_to_gemini("2403.05530.pdf", mime_type="application/pdf"),
  upload_to_gemini("OWASP_Top_10_v2017(en).pdf", mime_type="application/pdf"),
]

# Some files have a processing delay. Wait for them to be ready.
wait_for_files_active(files)

chat_session = model.start_chat(
  history=[
    {
      "role": "user",
      "parts": [
        files[0],
        files[1],
        "各用100字總結列出這兩份的論文的陳述,請用繁體中文回答",
      ],
    },
    {
      "role": "model",
      "parts": [
        "## 論文摘要 (繁體中文)\n\n**第一篇論文:** 介紹了 Google 最新多模態模型 Gemini 1.5 Pro。此模型基於混合專家架構，能處理高達千萬 tokens 的超長文本、數小時的影音輸入，並具備跨模態推理能力。Gemini 1.5 Pro 在多項基準測試中超越了前一代模型，甚至在某些方面超越了最先進的 Gemini 1.0 Ultra 模型，同時訓練所需的計算資源更少、效率更高。論文重點展示了模型的超長文本理解能力，以及從單一語言資料學習翻譯新語言的驚人能力。\n\n**第二篇論文:**  介紹了 OWASP Top 10 - 2017，这是一个列举了十大最关键 Web 应用安全风险的列表，旨在帮助组织开发、购买和维护可信赖的应用程序和 API。OWASP Top 10 基于大量的漏洞数据，涵盖了从数百个组织和超过 100,000 个真实应用程序和 API 收集的漏洞。这份列表的目的是教育开发者、设计者、架构师、管理者和组织了解最常见和最重要的 Web 应用安全弱点的后果。",
      ],
    },
  ]
)

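# Replace "INSERT_INPUT_HERE" with your follow-up message; the earlier prompt
# and the model's answer are already replayed in the chat history above.
response = chat_session.send_message("INSERT_INPUT_HERE")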

print(response.text)
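The exported `wait_for_files_active` polls every 10 seconds with no upper bound, and its own docstring says production code should do something more sophisticated. A minimal sketch of one option, assuming the same `get_file(name)` lookup (for example `genai.get_file`) and the `state.name` values `"PROCESSING"` / `"ACTIVE"` used above; `wait_until_active` is a hypothetical name, and it adds capped exponential backoff plus an overall timeout. `sleep` and `clock` are injectable so the logic can be exercised without calling the API.

```python
import time
from typing import Any, Callable, Iterable


def wait_until_active(
    names: Iterable[str],
    get_file: Callable[[str], Any],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Hypothetical alternative to wait_for_files_active with backoff and a timeout.

    Assumes get_file(name) returns an object whose .state.name is
    "PROCESSING" while the file is being processed and "ACTIVE" when ready.
    """
    deadline = clock() + timeout_s  # one overall budget shared by all files
    for name in names:
        delay = initial_delay_s
        file = get_file(name)
        while file.state.name == "PROCESSING":
            # Give up rather than sleep past the deadline.
            if clock() + delay > deadline:
                raise TimeoutError(f"File {name} still processing after {timeout_s}s")
            sleep(delay)
            delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
            file = get_file(name)
        if file.state.name != "ACTIVE":
            raise RuntimeError(f"File {name} failed to process (state: {file.state.name})")
```

`wait_until_active([f.name for f in files], genai.get_file)` could stand in for `wait_for_files_active(files)`. The 300-second timeout and 30-second cap are arbitrary illustrative defaults, not values taken from the Gemini API documentation.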