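pandas-ai
is an open-source package that lets you use prompts to ask an LLM to analyze the data in a DataFrame
(roughly the equivalent of an Excel sheet).
The following is lifted straight from the project's documentation:
You can ask PandasAI to find all rows in a DataFrame where a column's value is greater than 5, or, as in the example below, to rank rows by a column: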
import pandas as pd
from pandasai import SmartDataframe
# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})
# Instantiate a LLM
from pandasai.llm import OpenAI
llm = OpenAI(api_token="YOUR_API_TOKEN")
df = SmartDataframe(df, config={"llm": llm})
df.chat('Which are the 5 happiest countries?')
Output:
6 Canada
7 Australia
1 United Kingdom
3 Germany
0 United States
Name: country, dtype: object
You can also ask PandasAI to run more complex queries. For example, you can ask it for the sum of the GDPs of the 2 unhappiest countries:
df.chat('What is the sum of the GDPs of the 2 unhappiest countries?')
Output:
19012600725504
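As a sanity check, both answers above can be reproduced with plain pandas. This is only a sketch of what equivalent hand-written code might look like; the code PandasAI actually generates behind the scenes may differ. The names `raw`, `happiest` and `gdp_sum` are made up for this example.

```python
import pandas as pd

# Same sample data as in the documentation example.
raw = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832,
            1745433788416, 1181205135360, 1607402389504, 1490967855104,
            4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# "Which are the 5 happiest countries?"
# nlargest sorts by the column in descending order and keeps the original index.
happiest = raw.nlargest(5, "happiness_index")["country"]

# "What is the sum of the GDPs of the 2 unhappiest countries?"
gdp_sum = raw.nsmallest(2, "happiness_index")["gdp"].sum()

print(happiest)
print(gdp_sum)
```

Being able to double-check an LLM's answer like this is handy, since the model can occasionally misread a question.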
You can also ask PandasAI to draw a chart:
df.chat(
    "Plot the histogram of countries showing for each the gdp, using different colors for each bar",
)
Output: (chart image not shown)
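With an OpenAI LLM at the core, we need an OpenAI API Key.
Doesn't that mean paying? (っ °Д °;)っ
But!!! What if I don't want to pay? I'm a penny-pincher at heart (;´д`)ゞ
You can refer to my previous two articles.
In those two articles, we saw that gpt4free
uses reverse engineering to let us use OpenAI's gpt-3.5 model for free, with no usage limit.
So it's the same old trick again: we just plug gpt4free
in as the LLM core, and then we can use pandas-ai
without paying! >_<
This is done by subclassing pandas-ai's LLM base class, as follows: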
import g4f
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import LLM
from pandasai.prompts.base import AbstractPrompt


class Gpt4free(LLM):
    """
    Class to wrap gpt4free LLMs and make PandasAI interoperable
    with gpt4free.
    """

    def __init__(
        self,
        model: str = "gpt-3.5-turbo",
        provider: g4f.Provider = None,
        stream: bool = False,
    ):
        """
        __init__ method of Gpt4free Class
        Args:
            model (str): Model of OpenAI API.
            provider (g4f.Provider): The Provider of OpenAI API.
            stream (bool): Completion with streaming.
        """
        self.model = model
        self.provider = provider
        self.stream = stream

    def call(self, instruction: AbstractPrompt, suffix: str = "") -> str:
        # Render the PandasAI prompt object to plain text before sending it.
        prompt = instruction.to_string() + suffix
        try:
            response = g4f.ChatCompletion.create(
                model=self.model,
                provider=self.provider,
                messages=[{"role": "user", "content": prompt}],
                stream=self.stream,
            )
            # With stream=True, g4f yields text chunks; join them into one string.
            if self.stream:
                response = "".join(response)
        except Exception as e:
            raise RuntimeError(f"Failed to create chat completion with Gpt4free: {str(e)}") from e
        return response

    @property
    def type(self) -> str:
        return "gpt4free"


# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Instantiate the LLM
llm = Gpt4free()
# verbose and save_charts are passed through the SmartDataframe config
df = SmartDataframe(df, config={"llm": llm, "verbose": True, "save_charts": True})
print(df.chat('Which are the 5 happiest countries?'))
print(df.chat('What is the sum of the GDPs of the 2 unhappiest countries?'))

# Plot
df.chat(
    "Plot the histogram of countries showing for each the gdp, using different colors for each bar",
)
The output will match the documentation ಥ_ಥ
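If you want to poke at the wrapper pattern without hitting the network, here is a minimal offline sketch of the same contract: a prompt object with `to_string()` goes in, a plain string comes out, and backend failures are re-raised as `RuntimeError`. Everything in it (`FakePrompt`, `BackendAdapter`, `echo_backend`, `broken_backend`) is a hypothetical stand-in made up for illustration; none of it is part of pandas-ai or gpt4free.

```python
class FakePrompt:
    """Hypothetical stand-in for a PandasAI prompt object."""

    def __init__(self, text: str):
        self.text = text

    def to_string(self) -> str:
        return self.text


class BackendAdapter:
    """Same shape as the Gpt4free wrapper, with the backend injected."""

    def __init__(self, complete):
        # complete: any callable that takes a list of chat messages and returns a str
        self.complete = complete

    def call(self, instruction, suffix: str = "") -> str:
        prompt = instruction.to_string() + suffix
        try:
            return self.complete([{"role": "user", "content": prompt}])
        except Exception as e:
            # Wrap any backend error, mirroring the wrapper's error handling.
            raise RuntimeError(f"Failed to create chat completion: {e}") from e

    @property
    def type(self) -> str:
        return "fake"


def echo_backend(messages):
    # Pretend LLM: echoes the last user message back.
    return "echo: " + messages[-1]["content"]


def broken_backend(messages):
    # Pretend provider outage.
    raise ConnectionError("provider unavailable")


reply = BackendAdapter(echo_backend).call(FakePrompt("How many rows?"), suffix="\nCode:")
print(reply)

try:
    BackendAdapter(broken_backend).call(FakePrompt("How many rows?"))
except RuntimeError as err:
    print(err)
```

Swapping `echo_backend` for a function that calls the real backend gives you the wrapper from this article again, so the stub is mostly useful for checking the prompt assembly and the error path.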
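Side note: I originally opened PR #613 so that users could do this directly, without having to write the Gpt4free
subclass themselves. But gpt4free
is, after all, a reverse-engineering project, which makes it a poor fit for commercial development, so on reflection I closed the PR. You can of course use that PR as a reference and patch your local copy of the package, but remember not to use it for commercial development...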