Day24 - Getting Started with Distributed Tracing in Grafana Tempo - iT 邦幫忙

2025 iThome Ironman Contest

DAY 24

DevOps

Challenges After Vibe Coding: Locust x Loki Load Testing and Monitoring series, part 24

17th Ironman Contest

熊熊工程師

Team: 一蘭拉麵基本配料 5 倍辣

2025-09-12 08:57:26

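In a microservices architecture, a single user request may pass through dozens of independent services. When latency or errors appear, finding the root cause is like looking for a needle in a haystack. This is where distributed tracing comes in.

Today we take a first look at distributed tracing and learn how to use Grafana Tempo, a high-performance backend built specifically for traces.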

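1. What is distributed tracing?

Distributed tracing is a method for monitoring and analyzing the flow of a request across multiple services. It lets us visualize the full lifecycle of a request.

Core concepts

Trace: the complete end-to-end journey of one request. Every trace has a unique ID.
Span: a unit of work within a trace, such as an API call or a database query. Each span has its own ID, start and end times, and attached metadata (tags/attributes) and logs.
Trace ID: all spans in a trace share the same trace ID, which is what ties them together.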
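To make the relationship between these three concepts concrete, here is a minimal sketch in plain Python. The `Span` class, its field names, and the sample values are hypothetical and only illustrate the idea; they are not the OpenTelemetry or Tempo data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    trace_id: str             # shared by every span in the same trace
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int
    end_ms: int


def group_by_trace(spans: List[Span]) -> Dict[str, List[Span]]:
    """Collect spans into traces using the shared trace ID."""
    traces: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def render_tree(spans: List[Span]) -> List[str]:
    """Return an indented outline of one trace, root span first."""
    children: Dict[Optional[str], List[Span]] = defaultdict(list)
    for span in sorted(spans, key=lambda s: s.start_ms):
        children[span.parent_id].append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children[parent_id]:
            duration = span.end_ms - span.start_ms
            lines.append(f"{'  ' * depth}{span.name} ({duration} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Hypothetical sample data: two traces, one of them with a child span
SPANS = [
    Span("t1", "a", None, "GET /items/{item_id}", 0, 120),
    Span("t1", "b", "a", "process_item", 10, 110),
    Span("t2", "c", None, "GET /", 5, 8),
]

if __name__ == "__main__":
    for trace_id, trace_spans in group_by_trace(SPANS).items():
        print(trace_id)
        for line in render_tree(trace_spans):
            print("  " + line)
```

Grouping by `trace_id` recovers each request, and the `parent_id` links recover the call hierarchy inside it.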

[Figure: Distributed Tracing. Image source: Grafana Labs]

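By analyzing the duration of each span in a trace and the parent-child relationships between spans, we can locate the performance bottlenecks in a system.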
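One common way to do this analysis is to compute each span's "self time", meaning its own duration minus the time spent in its direct children. The sketch below uses made-up span rows and assumes that child spans run sequentially inside their parent; real traces with parallel children need a more careful calculation.

```python
from typing import Dict, List, Optional, Tuple

# (span_id, parent_id, name, start_ms, end_ms); hypothetical sample values
SpanRow = Tuple[str, Optional[str], str, int, int]

TRACE: List[SpanRow] = [
    ("a", None, "GET /checkout", 0, 500),
    ("b", "a", "auth-service", 10, 60),
    ("c", "a", "db.query", 70, 450),
    ("d", "c", "cache.lookup", 80, 100),
]


def self_times(spans: List[SpanRow]) -> Dict[str, int]:
    """Duration of each span minus the time spent in its direct children.

    Assumes span names are unique and children do not overlap in time.
    """
    total = {span_id: end - start for span_id, _, _, start, end in spans}
    names = {span_id: name for span_id, _, name, _, _ in spans}
    own = dict(total)
    for span_id, parent_id, _, _, _ in spans:
        if parent_id is not None:
            own[parent_id] -= total[span_id]
    return {names[span_id]: ms for span_id, ms in own.items()}


def bottleneck(spans: List[SpanRow]) -> Tuple[str, int]:
    """Name and self time of the span that spends the most time in its own work."""
    times = self_times(spans)
    name = max(times, key=times.get)
    return name, times[name]
```

With the sample rows, the root span lasts 500 ms in total, but most of that time belongs to `db.query`, which is where `bottleneck(TRACE)` points.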

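2. Introducing Grafana Tempo

Grafana Tempo is an open-source, highly scalable, low-cost distributed tracing backend. Its design philosophy is "simple and high-volume".

Core features

Low-cost storage: Tempo relies mainly on object storage (such as S3, GCS, or the local filesystem) to store trace data, rather than on an expensive indexed database.
Open standards: compatible with mainstream open-source tracing protocols such as OpenTelemetry (OTel), Jaeger, and Zipkin.
Deep integration: integrates with Grafana, Loki, and Prometheus/Mimir, so you can jump between metrics, logs, and traces and cover all three pillars of observability.
TraceQL: a built-in query language for filtering exactly the traces you need out of a large volume of trace data.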
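As a rough illustration of what a TraceQL query looks like, the sketch below assembles a simple selector from a few filters. The helper `traceql` and its parameters are hypothetical; the generated text uses the `resource.service.name` attribute and the `name` and `duration` intrinsics. The helper does not escape quotes in its inputs, so treat it as a demonstration rather than a query builder for untrusted values.

```python
from typing import Optional


def traceql(service: str,
            span_name: Optional[str] = None,
            min_duration_ms: Optional[int] = None) -> str:
    """Build a single TraceQL span selector from a few optional filters."""
    conditions = [f'resource.service.name = "{service}"']
    if span_name is not None:
        conditions.append(f'name = "{span_name}"')
    if min_duration_ms is not None:
        conditions.append(f"duration > {min_duration_ms}ms")
    return "{ " + " && ".join(conditions) + " }"


if __name__ == "__main__":
    # Spans named process_item in the demo service that took longer than 50 ms
    print(traceql("fastapi-demo-app", span_name="process_item", min_duration_ms=50))
```

The resulting string can be pasted into the TraceQL editor of the Tempo data source in Grafana.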

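3. Hands-on: deploying Tempo and instrumenting a FastAPI app

Next, we use Docker Compose to build a complete local environment consisting of:

FastAPI App: the target application we want to monitor.
OpenTelemetry Collector: receives the trace data emitted by the application and forwards it to Tempo.
Tempo: stores the trace data.
Grafana: visualizes the trace data.

Project structure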

/day24
├── app/
│   ├── main.py
│   └── requirements.txt
├── docker-compose.yml
├── tempo.yaml
├── otel-collector.yaml
└── grafana-datasource.yaml

Step 1: Create the FastAPI app and instrument it

First, add the required packages to app/requirements.txt:

# app/requirements.txt
fastapi
uvicorn
opentelemetry-api
opentelemetry-sdk
opentelemetry-exporter-otlp
opentelemetry-instrumentation-fastapi

Next, write app/main.py and use OpenTelemetry to instrument it automatically.

# app/main.py
import time

from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# 1. Set the service name
resource = Resource(attributes={
    "service.name": "fastapi-demo-app"
})

# 2. Set up the tracer provider
trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)

# 3. Set up the OTLP exporter, which sends data to the OTel Collector
otlp_exporter = OTLPSpanExporter(
    endpoint="otel-collector:4317",  # the Collector's gRPC port
    insecure=True
)

# 4. Set up the span processor
span_processor = BatchSpanProcessor(otlp_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)

# Create the FastAPI app and instrument it automatically
app = FastAPI()
FastAPIInstrumentor.instrument_app(app)

@app.get("/")
def read_root():
    return {"message": "Hello, World!"}

@app.get("/items/{item_id}")
def read_item(item_id: int):
    # A custom span can be created inside a specific endpoint
    with tracer.start_as_current_span("process_item") as span:
        span.set_attribute("item.id", item_id)
        # Simulate some work
        time.sleep(0.1)
        return {"item_id": item_id}
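When a request arrives from another instrumented service, the trace context normally travels in an HTTP header. By default the OpenTelemetry SDK uses the W3C Trace Context `traceparent` header for this, and the FastAPI instrumentation reads it so that the new span joins the caller's trace. The sketch below shows the header format in plain Python for illustration; `parse_traceparent` and `child_traceparent` are hypothetical helpers (the SDK does this work for you), and only version `00` of the header is handled.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version "00" - 32 hex chars of trace ID - 16 hex chars of span ID - 2 hex chars of flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str
    parent_span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are defined as invalid by the specification
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(ctx: TraceContext) -> str:
    """Header for an outgoing call: same trace ID, fresh span ID."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{new_span_id}-{flags}"
```

The trace ID stays constant across every hop while each service contributes its own span ID, which is how spans from different services end up in the same trace.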

Step 2: Configure each service

otel-collector.yaml: configures the Collector to receive OTLP data and export it to Tempo.

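receivers:
  otlp:
    protocols:
      grpc:
        # Listen on all interfaces so other containers can connect;
        # recent Collector releases bind to localhost when no endpoint is set
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
exporters:
  otlp:
    endpoint: "tempo:4317" # Tempo's gRPC port
    tls:
      insecure: true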
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
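The `batch` processor in this pipeline (like `BatchSpanProcessor` in the app) groups spans before sending them, which reduces the number of export calls. The following is only a conceptual sketch of that size-or-timeout idea in plain Python. The `Batcher` class and its parameters are hypothetical and are not the Collector's implementation or its default settings.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Buffer items and flush when the batch is full or has waited too long."""

    def __init__(self, export: Callable[[List[str]], None],
                 max_size: int = 3, timeout_s: float = 0.2,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._export = export
        self._max_size = max_size
        self._timeout_s = timeout_s
        self._clock = clock
        self._buffer: List[str] = []
        self._first_at: Optional[float] = None  # arrival time of the oldest buffered item

    def add(self, item: str) -> None:
        if not self._buffer:
            self._first_at = self._clock()
        self._buffer.append(item)
        if len(self._buffer) >= self._max_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once it is older than the timeout."""
        if self._buffer and self._clock() - self._first_at >= self._timeout_s:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._first_at = None
            self._export(batch)
```

The size limit keeps each export bounded under load, and the timeout makes sure that a handful of spans from a quiet service still reach Tempo promptly.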

tempo.yaml: configures Tempo to receive OTLP data and store traces on the local filesystem.

server:
  http_listen_port: 3200
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/blocks

grafana-datasource.yaml: provisions the Tempo data source in Grafana automatically.

apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    isDefault: true

Step 3: Create `docker-compose.yml`

version: '3.8'

services:
  app:
    build: ./app
    ports:
      - "8000:8000"
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector.yaml"]
    volumes:
      - ./otel-collector.yaml:/etc/otel-collector.yaml
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
    depends_on:
      - tempo

  tempo:
    image: grafana/tempo:latest
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
      - ./tempo-data:/tmp/tempo
    ports:
      - "3200:3200"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - ./grafana-datasource.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
    depends_on:
      - tempo

Don't forget to create a simple Dockerfile in the app directory:

# app/Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
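Note that `depends_on` in the compose file only controls the order in which containers start; it does not wait for a service to be ready. A small readiness check can help before sending traffic. The sketch below is one possible approach and assumes the port mappings from docker-compose.yml, Tempo's `/ready` endpoint, and Grafana's `/api/health` endpoint. The helper names, retry count, and delay are arbitrary choices.

```python
import http.client
import time
import urllib.request
from typing import Callable, Dict

# URLs follow the port mappings in docker-compose.yml
ENDPOINTS: Dict[str, str] = {
    "app": "http://localhost:8000/",
    "tempo": "http://localhost:3200/ready",
    "grafana": "http://localhost:3000/api/health",
}


def http_ok(url: str, timeout_s: float = 2.0) -> bool:
    """True if the URL answers with a 2xx status, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # urllib's URLError and HTTPError are subclasses of OSError
        return False


def wait_until_ready(endpoints: Dict[str, str],
                     check: Callable[[str], bool] = http_ok,
                     attempts: int = 30,
                     delay_s: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll every endpoint until all respond, or give up after `attempts` rounds."""
    pending = dict(endpoints)
    for attempt in range(attempts):
        pending = {name: url for name, url in pending.items() if not check(url)}
        if not pending:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

Calling `wait_until_ready(ENDPOINTS)` from a second terminal after starting the stack returns `True` once the app, Tempo, and Grafana all respond.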

4. Start the stack and query

Run docker-compose up --build in the day24 directory.
Once all services are up, send a few requests to generate trace data:
```
curl http://localhost:8000/
curl http://localhost:8000/items/123
```
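Open a browser and go to Grafana at http://localhost:3000 (a fresh grafana/grafana container asks you to log in; the default credentials are admin / admin).
Click Explore in the left-hand menu.
Select Tempo as the data source at the top.
On the Search tab, you should see fastapi-demo-app in the Service Name dropdown. Click Run query to see the list of traces you just generated.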
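The same search can also be done without the Grafana UI by calling Tempo's HTTP search API directly. The sketch below assumes Tempo is reachable on the published port 3200 and that the response contains a `traces` list with `traceID`, `rootTraceName`, and `durationMs` fields; missing fields are shown as `?`. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

TEMPO_URL = "http://localhost:3200"  # port published in docker-compose.yml


def search_url(base_url: str, query: str, limit: int = 20) -> str:
    """URL for Tempo's search endpoint with a URL-encoded TraceQL query."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    return f"{base_url}/api/search?{params}"


def summarize(payload: Dict[str, Any]) -> List[str]:
    """One line per trace in a search response."""
    lines = []
    for trace in payload.get("traces", []):
        trace_id = trace.get("traceID", "?")
        root_name = trace.get("rootTraceName", "?")
        duration = trace.get("durationMs", "?")
        lines.append(f"{trace_id}  {root_name}  {duration} ms")
    return lines


def search(query: str, base_url: str = TEMPO_URL) -> List[str]:
    """Run a TraceQL search against a running Tempo instance."""
    with urllib.request.urlopen(search_url(base_url, query), timeout=5) as response:
        return summarize(json.load(response))


# Example (requires the stack to be running):
# print("\n".join(search('{ resource.service.name = "fastapi-demo-app" }')))
```

Traces may take a few seconds to become searchable after the requests are sent, so an empty result right after the `curl` calls is not necessarily a sign of a problem.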