[Day 30] Monitoring: Giving Your Go Service Eyes and Ears

2025 iThome Ironman Contest

DAY 30

Software Development

Part 30 of the series "The Complete Guide to Go Clean Architecture API Development"

17th Ironman Contest

nick_forever

2025-09-30 07:02:31


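Our service is deployed, but right now it is a "black box".
Is it healthy? How many requests does it handle per second? What is the average response time? Has the error rate suddenly spiked?
To answer these questions, we need observability.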

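Observability is usually built on three pillars:

Logs: records of discrete, detailed events. (Covered in Part 5.)
Metrics: aggregatable numeric data for spotting trends and getting an overview. (The focus of this article.)
Traces: the full lifecycle of a single request as it travels across multiple services.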

This article focuses on metrics: we will collect them with Prometheus and visualize them with Grafana.

Step 1: Set Up the Monitoring Environment

We will start the Prometheus and Grafana services with docker-compose.yaml.

Create the Prometheus configuration file: add prometheus.yml to the project root.

# prometheus.yml
global:
  scrape_interval: 15s # scrape metrics every 15 seconds

scrape_configs:
  - job_name: 'go-clean-project'
    static_configs:
      - targets: ['app:5001'] # our Go app container (Compose service name and port); the scrape path defaults to /metrics

Update docker-compose.yaml:

# docker-compose.yaml
services:
  # ... app, db, redis services ...
  prometheus:
    image: prom/prometheus:v2.47.2
    container_name: go-clean-project-prometheus
    restart: always
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:10.1.5
    container_name: go-clean-project-grafana
    restart: always
    ports:
      - "3000:3000"

Step 2: Expose a `/metrics` Endpoint in the Go App

Prometheus uses a pull model: it periodically scrapes metrics from an HTTP endpoint that our application exposes, conventionally /metrics.
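To make the pull model concrete, here is a standard-library-only sketch. It is not how client_golang is implemented: the hypothetical `requestCounter` type keeps counts in memory and renders them in Prometheus's text exposition format only when /metrics is requested, and the "scraper" is just an HTTP GET. The metric name matches the one defined later in this article; the `/ping` path is made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sort"
	"sync"
)

// requestCounter is a hypothetical, minimal stand-in for a Prometheus CounterVec:
// it maps a rendered label set such as method="GET",path="/ping",status="200" to a count.
type requestCounter struct {
	mu     sync.Mutex
	counts map[string]float64
}

// Inc adds one to the series identified by labels.
func (c *requestCounter) Inc(labels string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[labels]++
}

// ServeHTTP renders every series in the Prometheus text exposition format.
// Nothing is pushed: data only leaves the process when someone GETs this page.
func (c *requestCounter) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	c.mu.Lock()
	defer c.mu.Unlock()
	keys := make([]string, 0, len(c.counts))
	for k := range c.counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output order
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	fmt.Fprintln(w, "# HELP http_requests_total Total number of http requests.")
	fmt.Fprintln(w, "# TYPE http_requests_total counter")
	for _, k := range keys {
		fmt.Fprintf(w, "http_requests_total{%s} %g\n", k, c.counts[k])
	}
}

// scrape plays Prometheus's role: one plain HTTP GET against the metrics endpoint.
func scrape(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// runDemo records two requests, exposes /metrics on a throwaway local server
// and scrapes it once.
func runDemo() (string, error) {
	counter := &requestCounter{counts: map[string]float64{}}
	counter.Inc(`method="GET",path="/ping",status="200"`)
	counter.Inc(`method="GET",path="/ping",status="200"`)

	mux := http.NewServeMux()
	mux.Handle("/metrics", counter)
	srv := httptest.NewServer(mux)
	defer srv.Close()
	return scrape(srv.URL + "/metrics")
}

func main() {
	out, err := runDemo()
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Running it prints the HELP line, the TYPE line and `http_requests_total{method="GET",path="/ping",status="200"} 2`. With client_golang, promhttp.Handler() produces the same kind of page for every registered metric, including the Go runtime and process metrics it registers by default.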

Install the Prometheus client library:

go get github.com/prometheus/client_golang

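Register the /metrics route: in internal/http/route.go, add a route that serves the Prometheus handler.

// internal/http/route.go
import (
    "github.com/gin-gonic/gin"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func SetupRoutes(...) {
    // ...
    // gin.WrapH adapts the standard http.Handler returned by promhttp.Handler() to a gin handler
    router.GET("/metrics", gin.WrapH(promhttp.Handler()))
    // ...
}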

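Step 3: Define and Collect Custom Metrics

The default metrics are not enough on their own: promhttp.Handler() mainly exposes Go runtime and process metrics (goroutines, memory, GC, CPU time). To see how our API is actually being used, we need custom metrics. For per-request HTTP metrics, a middleware is a natural place to collect them, because every request passes through it.

Create metrics.go in the internal/http/middleware/ directory: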

// internal/http/middleware/metrics.go
package middleware

import (
	"strconv"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	httpRequestsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total number of http requests.",
		},
		[]string{"method", "path", "status"}, // labels
	)

	httpRequestDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name: "http_request_duration_seconds",
			Help: "Duration of http requests.",
		},
		[]string{"method", "path"},
	)
)

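// Metrics is a middleware that records a request count and duration for every request
func Metrics() gin.HandlerFunc {
	return func(c *gin.Context) {
		startTime := time.Now()

		c.Next() // run the rest of the handler chain first

		duration := time.Since(startTime)
		// Use the route template (e.g. /users/:id), not the raw URL, to avoid a label-cardinality explosion
		path := c.FullPath()
		if path == "" {
			path = "unmatched" // FullPath() is empty when no route matched (e.g. 404s); label value chosen for illustration
		}
		method := c.Request.Method
		status := strconv.Itoa(c.Writer.Status())

		// Update the metrics
		httpRequestDuration.WithLabelValues(method, path).Observe(duration.Seconds())
		httpRequestsTotal.WithLabelValues(method, path, status).Inc()
	}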
}
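Why the route template matters: every distinct combination of label values becomes its own time series in Prometheus. The stdlib-only sketch below counts the series that 1,000 hypothetical requests to /users/1 through /users/1000 would create, labeled once by raw URL path and once by a gin-style route template like the one c.FullPath() returns. The paths and the `countSeries` helper are made up for illustration.

```go
package main

import "fmt"

// seriesKey builds the label set (method, path, status) that identifies one time series.
func seriesKey(method, path, status string) string {
	return method + " " + path + " " + status
}

// countSeries returns how many distinct time series n requests would create,
// depending on whether the raw URL path or the route template is used as the label.
func countSeries(n int, useTemplate bool) int {
	series := map[string]struct{}{}
	for id := 1; id <= n; id++ {
		path := fmt.Sprintf("/users/%d", id) // raw URL path, e.g. /users/42
		if useTemplate {
			path = "/users/:id" // what a gin route template looks like
		}
		series[seriesKey("GET", path, "200")] = struct{}{}
	}
	return len(series)
}

func main() {
	fmt.Println("raw paths:", countSeries(1000, false))     // raw paths: 1000
	fmt.Println("route template:", countSeries(1000, true)) // route template: 1
}
```

Histograms multiply the effect: each http_request_duration_seconds series is exposed as one `_bucket` series per bucket plus `_sum` and `_count`.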

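Finally, register this middleware in route.go with router.Use(middleware.Metrics()). In gin, middleware added with Use only applies to routes registered after it, so call Use before defining your routes.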
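Because HistogramOpts above sets no Buckets, client_golang falls back to prometheus.DefBuckets (0.005 s to 10 s). The stdlib-only sketch below is a simplified model, not the library's implementation, of what observations look like once exposed to Prometheus: buckets are cumulative, so a 0.3 s request counts toward every bucket whose upper bound `le` is at least 0.3, plus the implicit +Inf bucket, `_count` and `_sum`. The `histogram` type and the sample durations are hypothetical.

```go
package main

import "fmt"

// defBuckets mirrors the values of prometheus.DefBuckets (upper bounds in seconds).
var defBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}

// histogram is a hypothetical, simplified model of one histogram series
// as Prometheus sees it: cumulative bucket counts plus _sum and _count.
type histogram struct {
	upperBounds []float64
	cumulative  []uint64 // cumulative[i] = number of observations <= upperBounds[i]
	count       uint64   // equals the implicit le="+Inf" bucket
	sum         float64
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{upperBounds: bounds, cumulative: make([]uint64, len(bounds))}
}

// Observe records one value: every bucket whose upper bound is >= v is incremented.
func (h *histogram) Observe(v float64) {
	for i, le := range h.upperBounds {
		if v <= le {
			h.cumulative[i]++
		}
	}
	h.count++
	h.sum += v
}

// dump prints the series in the same shape as the exposition format.
func (h *histogram) dump(name string) {
	for i, le := range h.upperBounds {
		fmt.Printf("%s_bucket{le=\"%g\"} %d\n", name, le, h.cumulative[i])
	}
	fmt.Printf("%s_bucket{le=\"+Inf\"} %d\n", name, h.count)
	fmt.Printf("%s_sum %g\n", name, h.sum)
	fmt.Printf("%s_count %d\n", name, h.count)
}

func main() {
	h := newHistogram(defBuckets)
	for _, d := range []float64{0.003, 0.04, 0.3, 12} { // request durations in seconds
		h.Observe(d)
	}
	h.dump("http_request_duration_seconds")
}
```

The 12 s request lands only in +Inf. This is also why bucket choice matters: latency can only be estimated within a bucket's range, so if your requests cluster elsewhere you can pass your own Buckets in HistogramOpts.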

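Step 4: Visualize Metrics in Grafana

Our application now exposes metrics continuously, and Prometheus scrapes them every 15 seconds. Next, we use Grafana to turn this data into readable charts.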

Configure Grafana:

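Open http://localhost:3000 (default username and password: admin/admin). Go to Connections -> Data sources, add a Prometheus data source, and set its URL to http://prometheus:9090. Use the Compose service name rather than localhost, because Grafana reaches Prometheus over the Docker Compose network.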
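Grafana talks to Prometheus through its HTTP API, and the same API is handy for checking that scraping works before building any panels. The sketch below sends an instant query to `/api/v1/query` for `up{job="go-clean-project"}` (a value of 1 means the target's last scrape succeeded) and decodes the JSON response. From your host you would point it at http://localhost:9090. Here, a stub server with a canned response stands in for Prometheus so the example runs on its own. The `queryUp` and `runDemo` helpers are hypothetical, and only the response fields used here are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// queryResponse models only the parts of the /api/v1/query response used here.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix timestamp, sample value as a string]
		} `json:"result"`
	} `json:"data"`
}

// queryUp runs an instant query for the job's `up` metric and returns the value per instance.
func queryUp(baseURL, job string) (map[string]string, error) {
	q := url.Values{}
	q.Set("query", fmt.Sprintf(`up{job=%q}`, job))
	resp, err := http.Get(baseURL + "/api/v1/query?" + q.Encode())
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var qr queryResponse
	if err := json.NewDecoder(resp.Body).Decode(&qr); err != nil {
		return nil, err
	}
	if qr.Status != "success" {
		return nil, fmt.Errorf("query failed with status %q", qr.Status)
	}
	out := map[string]string{}
	for _, r := range qr.Data.Result {
		v, _ := r.Value[1].(string)
		out[r.Metric["instance"]] = v
	}
	return out, nil
}

// runDemo serves a canned response shaped like Prometheus's reply, then queries it.
func runDemo() (map[string]string, error) {
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/v1/query" || r.URL.Query().Get("query") != `up{job="go-clean-project"}` {
			http.Error(w, "unexpected request", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"status":"success","data":{"resultType":"vector","result":[`+
			`{"metric":{"__name__":"up","instance":"app:5001","job":"go-clean-project"},"value":[1727650000.0,"1"]}]}}`)
	}))
	defer stub.Close()
	return queryUp(stub.URL, "go-clean-project")
}

func main() {
	up, err := runDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println(up) // map[app:5001:1]
}
```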

Create a dashboard: create a new dashboard and add a panel.
Write PromQL queries: in the panel's query editor, use PromQL (Prometheus Query Language) to query the metrics.
- QPS (requests per second), averaged over the last 5 minutes:
  sum(rate(http_requests_total[5m]))
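- P95 response latency per route (`le` must stay in the `by` clause, because histogram_quantile needs the bucket boundaries):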
  histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, path))
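To make these two queries less opaque, here is a simplified Go model of what they compute. `simpleRate` takes counter samples from a window, treats any decrease as a counter reset, and divides the total increase by the time between the first and last sample. Prometheus's real rate() also extrapolates to the window edges, so its numbers will differ slightly. `histogramQuantile` follows the interpolation histogram_quantile() applies to cumulative buckets: find the bucket where the rank q × count falls, then interpolate linearly inside it, assuming observations are spread evenly within each bucket. It skips some edge cases Prometheus handles (for example non-positive lowest bounds and non-monotonic buckets). Function names and sample data are illustrative.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sample is one scraped counter value at a Unix time in seconds.
type sample struct {
	t float64
	v float64
}

// simpleRate returns the per-second increase across samples, treating any
// decrease as a counter reset (e.g. the app restarted). No extrapolation.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return math.NaN()
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].v - samples[i-1].v
		if delta < 0 { // counter reset: the new value is the increase since the restart
			delta = samples[i].v
		}
		increase += delta
	}
	return increase / (samples[len(samples)-1].t - samples[0].t)
}

// bucket is one cumulative histogram bucket: the count of observations <= le.
type bucket struct {
	le    float64
	count float64
}

// histogramQuantile estimates the q-quantile (0 <= q <= 1) from cumulative buckets,
// the last of which must be +Inf, by linear interpolation inside the matching bucket.
func histogramQuantile(q float64, buckets []bucket) float64 {
	n := len(buckets)
	if n < 2 || !math.IsInf(buckets[n-1].le, +1) || buckets[n-1].count == 0 {
		return math.NaN()
	}
	rank := q * buckets[n-1].count
	// First finite bucket whose cumulative count reaches the rank.
	b := sort.Search(n-1, func(i int) bool { return buckets[i].count >= rank })
	if b == n-1 { // rank falls in the +Inf bucket: return the highest finite bound
		return buckets[n-2].le
	}
	start, end, count := 0.0, buckets[b].le, buckets[b].count
	if b > 0 {
		start = buckets[b-1].le
		count -= buckets[b-1].count
		rank -= buckets[b-1].count
	}
	return start + (end-start)*(rank/count)
}

// exampleLatency: 100 requests; 50 took <= 0.1s, 80 <= 0.25s, 95 <= 0.5s, 99 <= 1s.
var exampleLatency = []bucket{{0.1, 50}, {0.25, 80}, {0.5, 95}, {1, 99}, {math.Inf(+1), 100}}

func main() {
	// A counter scraped every 15s: 900 requests in 60s.
	fmt.Println(simpleRate([]sample{{0, 100}, {15, 325}, {30, 550}, {45, 775}, {60, 1000}})) // 15
	fmt.Println(histogramQuantile(0.9, exampleLatency))
}
```

Here the estimated p90 is about 0.417 s: the 90th request falls 10/15 of the way through the 0.25 to 0.5 s bucket, so the result is only as precise as the bucket boundaries allow.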