Yesterday we set up Langfuse so that we could see, on Langfuse, what actually happens under the hood after a LlamaIndex QueryEngine.query call.
The results of today's example on Langfuse are here.
Today we'll dig one step further into how this complex TracingTree gets built.
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.span_handlers import SimpleSpanHandler
# root dispatcher
root_dispatcher = get_dispatcher()
# register span handler
simple_span_handler = SimpleSpanHandler()
root_dispatcher.add_span_handler(simple_span_handler)
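Besides span handlers, the same dispatcher also accepts event handlers (the Event/EventHandler pair described in the sample documents below). As a minimal sketch, assuming the BaseEventHandler interface from llama_index.core.instrumentation (the name PrintEventHandler is made up for illustration):
from llama_index.core.instrumentation.event_handlers import BaseEventHandler

class PrintEventHandler(BaseEventHandler):
    """Hypothetical handler: print every event the dispatcher emits."""

    @classmethod
    def class_name(cls) -> str:
        return "PrintEventHandler"

    def handle(self, event) -> None:
        # every event carries at least a class name and an id_
        print(event.class_name(), event.id_)

root_dispatcher.add_event_handler(PrintEventHandler())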
from dotenv import find_dotenv, load_dotenv
from langfuse import get_client
_ = load_dotenv(find_dotenv())
langfuse = get_client()
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
# Initialize LlamaIndex instrumentation
LlamaIndexInstrumentor().instrument()
from llama_index.core import Document
text_list = [
'Langfuse is an open source LLM engineering platform to help teams collaboratively debug, analyze and iterate on their LLM Applications. '
'With the Langfuse integration, you can track and monitor performance, traces, and metrics of your LlamaIndex application. '
'Detailed traces of the context augmentation and the LLM querying processes are captured and can be inspected directly in the Langfuse UI.',
'Langfuse 真香',
'The instrumentation module (available in llama-index v0.10.20 and later) is meant to replace the legacy callbacks module.',
'Listed below are the core classes as well as their brief description of the instrumentation module: '
'Event — represents a single moment in time that a certain occurrence took place within the execution of the application’s code. '
'EventHandler — listen to the occurrences of Event’s and execute code logic at these moments in time. '
'Span — represents the execution flow of a particular part in the application’s code and thus contains Event’s. '
'SpanHandler — is responsible for the entering, exiting, and dropping (i.e., early exiting due to error) of Span’s. '
'Dispatcher — emits Event’s as well as signals to enter/exit/drop a Span to the appropriate handlers.',
]
print(f"len of text_list: {len(text_list)}")
documents = [Document(text=t) for t in text_list]
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-5-mini", temperature=0.0)
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(llm=llm)
query = '介紹一下 LlamaIndex 的 instrumentation module?'
response = query_engine.query(query)
simple_span_handler.print_trace_trees()
The result:
SentenceSplitter.__call__-4b1835eb-4969-4b55-a794-862a1c895198 (0.004439)
└── SentenceSplitter._parse_nodes-0039126f-da8c-402a-a2c2-22fa4cd028fe (0.003439)
    ├── SentenceSplitter.split_text_metadata_aware-348258d2-4a2e-427a-8a07-10ff7e37241c (0.001045)
    ├── SentenceSplitter.split_text_metadata_aware-3b67633d-87ce-43d7-ad57-0e7b375390fd (0.000247)
    ├── SentenceSplitter.split_text_metadata_aware-e8225c5f-960a-4db8-89bd-2464afd5301e (0.000364)
    └── SentenceSplitter.split_text_metadata_aware-ab1fef1a-0dbd-452e-8176-f85ab307777d (0.000182)

OpenAIEmbedding.get_text_embedding_batch-b0dee3ec-12e0-4fd0-9f1b-dc43f3c2bd9c (1.307544)

RetrieverQueryEngine.query-8e73a465-148f-4d31-bd49-d26ff7238f9a (11.404676)
└── RetrieverQueryEngine._query-32ad03a1-c7a7-4981-9f3f-9b3d362daf65 (11.403766)
    ├── VectorIndexRetriever.retrieve-95f973b5-e10f-4d57-b0da-c3712f17b1ec (1.280278)
    │   └── VectorIndexRetriever._retrieve-7437f3f0-acf7-4bef-bea3-f8e29bc63ee8 (1.279463)
    │       └── OpenAIEmbedding.get_query_embedding-a7437a95-af35-4414-a3ca-7e2d06fb45bf (1.277606)
    │           └── OpenAIEmbedding._get_query_embedding-4dafed62-ca6e-4f35-b993-bd6b21670dc4 (1.276776)
    └── CompactAndRefine.synthesize-a15c5505-c257-4c52-a9c9-2117eef8dc91 (10.122762)
        └── CompactAndRefine.get_response-0c8d0285-0dbe-45c4-b95f-2090b9ebd100 (10.119841)
            ├── TokenTextSplitter.split_text-c8734cc0-6890-484f-a003-39d579e19596 (0.000337)
            └── CompactAndRefine.get_response-93a55dcd-8e32-4fd7-a644-154a1ebc527b (10.118508)
                ├── TokenTextSplitter.split_text-cae71b55-aa34-4fc8-8a02-cccc5eab9638 (0.000139)
                └── DefaultRefineProgram.__call__-ce0f9cfe-6e40-444d-9967-da475c309950 (10.117663)
                    └── OpenAI.predict-aab6dbbb-913c-4941-985b-bebc1d93c785 (10.117283)
                        └── OpenAI.chat-c442fc0e-b9b0-4720-90ad-2f16d215dc45 (10.116348)
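Where does this tree come from? Every instrumented method enters a span with its own id_ and a parent_span_id, and SimpleSpanHandler simply records those links and prints them as a tree. Below is a minimal sketch of a custom span handler built on the BaseSpanHandler interface; PrintSpanHandler is a made-up name, and the exact method signatures can vary slightly between llama-index versions:
from typing import Any, Optional
import inspect

from llama_index.core.instrumentation.span import SimpleSpan
from llama_index.core.instrumentation.span_handlers import BaseSpanHandler

class PrintSpanHandler(BaseSpanHandler[SimpleSpan]):
    """Hypothetical handler: print span enter/exit with parent links."""

    @classmethod
    def class_name(cls) -> str:
        return "PrintSpanHandler"

    def new_span(
        self,
        id_: str,
        bound_args: inspect.BoundArguments,
        instance: Optional[Any] = None,
        parent_span_id: Optional[str] = None,
        tags: Optional[dict] = None,
        **kwargs: Any,
    ) -> Optional[SimpleSpan]:
        # parent_span_id is what links a span into the tree above it
        print(f"enter {id_} (parent={parent_span_id})")
        return SimpleSpan(id_=id_, parent_id=parent_span_id)

    def prepare_to_exit_span(
        self,
        id_: str,
        bound_args: inspect.BoundArguments,
        instance: Optional[Any] = None,
        result: Optional[Any] = None,
        **kwargs: Any,
    ) -> Optional[SimpleSpan]:
        print(f"exit  {id_}")
        return self.open_spans.get(id_)

    def prepare_to_drop_span(
        self,
        id_: str,
        bound_args: inspect.BoundArguments,
        instance: Optional[Any] = None,
        err: Optional[BaseException] = None,
        **kwargs: Any,
    ) -> Optional[SimpleSpan]:
        # called on early exit due to an error
        print(f"drop  {id_}: {err}")
        return self.open_spans.get(id_)

root_dispatcher.add_span_handler(PrintSpanHandler())
Back on the Langfuse side, the same trace can also be pulled down through its API: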
traces = langfuse.api.trace.list(limit=1)
trace_data = traces.data[0]
print(f"trace_id: {trace_data.id}")
print(f"trace input: {trace_data.input}")
trace_id: 9d32f7bfd20563c0a66f030408ad0969
trace input: 介紹一下 LlamaIndex 的 instrumentation module?
The trace_data here is a TraceWithDetails, which only contains the input and output of the call as a whole. To get everything underneath, fetch the full trace by id:
trace = langfuse.api.trace.get(trace_data.id)  # returns a TraceWithFullDetails
observations = trace.observations
import pandas as pd
def pydantic_list_to_dataframe(pydantic_list):
    """
    Convert a list of pydantic objects to a pandas dataframe.
    """
    data = []
    for item in pydantic_list:
        data.append(item.dict())  # on pydantic v2 objects, use item.model_dump()
    return pd.DataFrame(data)
df = pydantic_list_to_dataframe(observations)
df.columns
Index(['id', 'traceId', 'type', 'name', 'startTime', 'endTime',
'modelParameters', 'input', 'metadata', 'output', 'usage', 'level',
'usageDetails', 'costDetails', 'environment', 'inputPrice',
'outputPrice', 'totalPrice', 'calculatedTotalCost', 'latency',
'createdAt', 'unit', 'updatedAt', 'projectId', 'totalTokens',
'promptTokens', 'completionTokens', 'completionStartTime', 'model',
'version', 'statusMessage', 'parentObservationId', 'promptId',
'promptName', 'promptVersion', 'modelId', 'calculatedInputCost',
'calculatedOutputCost', 'timeToFirstToken'],
dtype='object')
The important ones are roughly id, name, input, output, and metadata.
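For instance, to keep just those columns as a convenience view (df_view is only for illustration):
df_view = df[["id", "name", "input", "output", "metadata"]]
df_view.head()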
df['name'].unique()
array(['RetrieverQueryEngine.query', 'RetrieverQueryEngine._query',
'CompactAndRefine.synthesize', 'CompactAndRefine.get_response',
'DefaultRefineProgram.__call__', 'OpenAI.predict', 'OpenAI.chat',
'TokenTextSplitter.split_text',
'OpenAIEmbedding.get_query_embedding',
'VectorIndexRetriever._retrieve', 'VectorIndexRetriever.retrieve',
'OpenAIEmbedding._get_query_embedding'], dtype=object)
At this point we basically have the input and output of every single step.
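For example, to see what was actually sent to and returned from the final LLM call (OpenAI.chat is one of the names listed above):
llm_call = df[df["name"] == "OpenAI.chat"].iloc[0]
print(llm_call["input"])
print(llm_call["output"])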
Today we used LlamaIndex's instrumentation module to rebuild something like the TracingTree shown on Langfuse, so what the TracingTree displays is essentially just the spans.
We also filled in the gap on downloading data from Langfuse: the key point is that there are both TraceWithDetails and TraceWithFullDetails; the full-details one gets you all of the inner input/output/metadata, while the other only carries the input/output of the trace as a whole.
A trace or span name like OpenAI.predict simply means that llama-index has a class called OpenAI on which the predict method was called, because when the dispatcher fills in id_ it uses: id_ = f"{actual_class}.{method_name}-{uuid.uuid4()}"
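Given that format, you can split a span id back into its name and uuid. A hypothetical helper (it assumes the format quoted above, that class and method names contain no hyphens, and relies on a uuid4 string containing exactly 4 hyphens):
def split_span_id(id_: str) -> tuple:
    """Hypothetical helper: split '<class>.<method>-<uuid4>' back apart."""
    name = id_.rsplit("-", 5)[0]  # drop the 5 hyphen-separated uuid chunks
    return name, id_[len(name) + 1:]

name, span_uuid = split_span_id("OpenAI.predict-aab6dbbb-913c-4941-985b-bebc1d93c785")
print(name)       # OpenAI.predict
print(span_uuid)  # aab6dbbb-913c-4941-985b-bebc1d93c785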
With that, we've covered everything we needed: for a call wrapped by LlamaIndex, we can now get the input and output of every step inside it.