Over the past few days we have covered each of the three pillars of observability on its own: logs (Loki), traces (Tempo), and metrics (Mimir). Today we reach an exciting milestone: bringing the three together in a single Grafana dashboard for a true "single pane of glass" monitoring experience.
In traditional monitoring, metrics, logs, and traces usually live in separate, isolated systems. When something goes wrong, engineers have to hop between different systems and UIs, trying to piece the clues together by hand. The process is slow and inefficient.
A unified dashboard lets us see metrics, logs, and traces side by side, over the same time range, and pivot between them without ever leaving Grafana.
First, create the following files and directories under a day27 directory. The contents of every file are listed in the "Complete configuration files" section below.
day27/
├── docker-compose.yml
├── loki-config.yaml
├── mimir-config.yaml
├── prometheus.yml
├── README.md
└── grafana-provisioning/
    ├── datasources/
    │   └── datasource.yml
    └── dashboards/
        ├── dashboard.yml
        └── main-dashboard.json
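If you prefer to scaffold the structure from the command line, here is a quick sketch (any equivalent approach works just as well):

mkdir -p day27/grafana-provisioning/datasources day27/grafana-provisioning/dashboards
cd day27
touch docker-compose.yml loki-config.yaml mimir-config.yaml prometheus.yml README.md \
      grafana-provisioning/datasources/datasource.yml \
      grafana-provisioning/dashboards/dashboard.yml \
      grafana-provisioning/dashboards/main-dashboard.json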
Once all the files are in place with the correct contents, run the following Docker Compose command from the day27 root directory to start every service.
# -d runs the services in the background (detached mode)
docker-compose up -d
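Before moving on, it is worth confirming that every container actually came up. A minimal set of checks, using the readiness endpoints these components expose (they may take a few seconds to report ready after startup):

# List the containers and their state
docker-compose ps
# Readiness probes for each backend
curl -s http://localhost:3100/ready       # Loki
curl -s http://localhost:9009/ready       # Mimir
curl -s http://localhost:3200/ready       # Tempo
curl -s http://localhost:3000/api/health  # Grafana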
Once the services are up, Grafana automatically does two things based on the provisioning files: it registers the Loki, Mimir, and Tempo data sources, and it loads the pre-built dashboard.
Open Grafana: visit http://localhost:3000 in your browser.
Find the dashboard: click the Dashboards icon in the left-hand menu. You should see a dashboard named Day 27: Unified Dashboard; click it to open.
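If the dashboard does not appear, you can check whether provisioning ran by calling Grafana's HTTP API. This assumes the default admin/admin credentials, since the compose file does not override them:

# Should list the Loki, Mimir and Tempo data sources
curl -s -u admin:admin http://localhost:3000/api/datasources
# Should return the provisioned dashboard
curl -s -u admin:admin 'http://localhost:3000/api/search?query=Day%2027'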
The dashboard is split into two parts:
The Mimir: HTTP Requests Total panel shows metric data queried from Mimir. This is the high-level view of the system's health.
The Loki: Logs panel shows log data queried from Loki. The moment we spot an anomaly in the metrics panel, we can look at the logs for exactly the same time range. This is where the magic of an integrated dashboard lies.
Imagine a typical troubleshooting scenario: the metrics panel shows a sudden spike, you narrow the time range, and the logs panel reveals an error entry carrying a trace_id (the correlation is already configured in datasource.yml). Click it in the logs panel and you jump straight to Tempo, where you can inspect the full distributed trace behind that error log.
Below are the complete contents of every configuration file used in this exercise.
docker-compose.yml
version: '3.8'
services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
    command: -config.file=/etc/loki/local-config.yaml
  tempo:
    image: grafana/tempo:2.2.0
    ports:
      - "3200:3200" # Tempo HTTP
      - "4317:4317" # OTLP gRPC
  mimir:
    image: grafana/mimir:2.9.0
    ports:
      - "9009:9009"
    volumes:
      - ./mimir-config.yaml:/etc/mimir.yaml
      - mimir-data:/data/mimir
    command: -config.file=/etc/mimir.yaml
  prometheus:
    image: prom/prometheus:v2.47.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command: --config.file=/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:10.0.3
    ports:
      - "3000:3000"
    volumes:
      - ./grafana-provisioning/datasources:/etc/grafana/provisioning/datasources
      - ./grafana-provisioning/dashboards:/etc/grafana/provisioning/dashboards
volumes:
  mimir-data:
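One caveat: the tempo service above is started without a configuration file, while the grafana/tempo image expects one via -config.file. Below is a minimal sketch of what that could look like, assuming a file named tempo-config.yaml (not part of the original file list) with local storage and an OTLP gRPC receiver; mount it into the container and add the matching command:

# tempo-config.yaml (hypothetical addition)
server:
  http_listen_port: 3200
distributor:
  receivers:
    otlp:
      protocols:
        grpc:          # listens on :4317 for OTLP gRPC
storage:
  trace:
    backend: local
    wal:
      path: /tmp/tempo/wal
    local:
      path: /tmp/tempo/blocks

# corresponding additions to the tempo service in docker-compose.yml
  tempo:
    image: grafana/tempo:2.2.0
    command: -config.file=/etc/tempo.yaml
    volumes:
      - ./tempo-config.yaml:/etc/tempo.yaml
    ports:
      - "3200:3200" # Tempo HTTP
      - "4317:4317" # OTLP gRPC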
loki-config.yaml
auth_enabled: false
server:
  http_listen_port: 3100
common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
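Note that this stack does not run a log shipper such as Promtail, so the Logs panel (which queries {job="mimir"}) stays empty until something writes to Loki. For a quick smoke test you can push a line by hand through Loki's push API; a sketch, assuming GNU date for the nanosecond timestamp:

# Push one test log line with the label job="mimir"
TS=$(date +%s%N)
curl -s -X POST http://localhost:3100/loki/api/v1/push \
  -H 'Content-Type: application/json' \
  -d '{"streams":[{"stream":{"job":"mimir"},"values":[["'"$TS"'","level=error msg=demo trace_id=abc123def456"]]}]}'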
mimir-config.yaml
target: all
# Mimir uses multitenancy_enabled rather than the old Cortex-style auth_enabled flag
multitenancy_enabled: false
server:
  http_listen_port: 9009
  grpc_listen_port: 9095
distributor:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
ingester:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
    replication_factor: 1
    # final_sleep lives under the ring block in Mimir; the Cortex-era lifecycler
    # options (such as max_transfer_retries) no longer exist
    final_sleep: 0s
ruler:
  alertmanager_url: http://localhost
  ring:
    kvstore:
      store: inmemory
blocks_storage:
  backend: filesystem
  filesystem:
    dir: /data/mimir/blocks
compactor:
  data_dir: /data/mimir/compactor
  sharding_ring:
    kvstore:
      store: inmemory
store_gateway:
  sharding_ring:
    kvstore:
      store: inmemory
prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'mimir'
    static_configs:
      - targets: ['mimir:9009']
remote_write:
  - url: "http://mimir:9009/api/v1/push"
grafana-provisioning/datasources/datasource.yml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    # explicit uid so the dashboard JSON and Tempo's tracesToLogs can reference it
    uid: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: 'trace_id=(\w+)'
          name: TraceID
          url: '$${__value.raw}'
  - name: Mimir
    type: prometheus
    # explicit uid so the dashboard JSON panels resolve this data source
    uid: mimir
    access: proxy
    url: http://mimir:9009/prometheus
    isDefault: true
  - name: Tempo
    type: tempo
    # explicit uid so the derivedFields link above resolves this data source
    uid: tempo
    access: proxy
    url: http://tempo:3200
    jsonData:
      tracesToLogs:
        datasourceUid: 'loki'
        tags: ['job', 'instance', 'pod', 'namespace']
        mappedTags: [{ key: 'service.name', value: 'job' }]
        spanStartTimeShift: '1s'
        spanEndTimeShift: '-1s'
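For reference, the derivedFields rule only fires when a log line contains a trace_id=<value> token, for example a hypothetical line such as:

level=error msg="upstream timeout" trace_id=4bf92f3577b34da6a3ce929d0e0e4736

The doubled dollar sign in '$${__value.raw}' is deliberate: provisioning files expand ${...} as environment variables, and $$ escapes that so Grafana receives the literal ${__value.raw}.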
grafana-provisioning/dashboards/dashboard.yml
apiVersion: 1
providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /etc/grafana/provisioning/dashboards
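This provider simply tells Grafana to load every JSON file it finds under /etc/grafana/provisioning/dashboards. If the dashboard does not show up, the Grafana container log is the first place to look, for example:

docker-compose logs grafana | grep -i provision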
grafana-provisioning/dashboards/main-dashboard.json
{
  "__inputs": [],
  "__requires": [],
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 1,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "title": "Mimir: HTTP Requests Total",
      "type": "timeseries",
      "datasource": {
        "type": "prometheus",
        "uid": "mimir"
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "mimir"
          },
          "expr": "rate(prometheus_http_requests_total[5m])",
          "legendFormat": "{{handler}}"
        }
      ]
    },
    {
      "title": "Loki: Logs",
      "type": "logs",
      "datasource": {
        "type": "loki",
        "uid": "loki"
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 0
      },
      "targets": [
        {
          "datasource": {
            "type": "loki",
            "uid": "loki"
          },
          "expr": "{job=\"mimir\"}"
        }
      ]
    }
  ],
  "schemaVersion": 37,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": []
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "Day 27: Unified Dashboard",
  "uid": "day27-unified",
  "version": 1,
  "weekStart": ""
}
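The dashboard itself only contains two panels; Tempo is reached through the trace_id links in the log lines. If you would rather see traces on the same screen, below is a sketch of a third panel you could append to the panels array. It assumes Grafana 10's built-in traces panel and an empty TraceQL query that lists recent traces; field names can differ slightly between versions, so verify the query in the panel editor:

{
  "title": "Tempo: Recent Traces",
  "type": "traces",
  "datasource": { "type": "tempo", "uid": "tempo" },
  "gridPos": { "h": 9, "w": 24, "x": 0, "y": 9 },
  "targets": [
    {
      "datasource": { "type": "tempo", "uid": "tempo" },
      "queryType": "traceql",
      "query": "{}",
      "limit": 20,
      "refId": "A"
    }
  ]
}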
Today we brought the three pillars of observability, metrics, logs, and traces, together in a single Grafana dashboard. Beyond learning how to configure such a dashboard, the more important takeaway is understanding the value it brings to monitoring and troubleshooting modern software systems.
Seen this way, we are no longer staring at isolated data points; we are reading a complete story. From the macro trends in the metrics, to the concrete details in the logs, to the full context in the traces, we gain a depth of insight we did not have before.
With that, we have completed the core learning path of the Grafana observability stack (Loki, Grafana, Mimir for metrics, and Tempo). Congratulations!