So far we have written table-driven tests, added context/timeouts, and run micro-benchmarks. Today we go one step further and learn:
pprof: finding bottlenecks such as CPU hotspots, memory allocations, and goroutine state, to guide performance optimization. See the full documentation for how to interpret pprof's numbers; for now we will only do a first-pass analysis in Step 5 to find the most expensive function.
We add a new file, pprof.go, gated by a build tag (so profiling is not enabled by default in production).
// pprof.go
//go:build debug

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

// StartPprof starts a profiling server on :6060.
func StartPprof() {
	go func() {
		log.Println("pprof listening on :6060")
		// nil handler means http.DefaultServeMux, where net/http/pprof registered itself
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}
Then add this to main.go:
func main() {
	cfg := LoadConfig()
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", healthHandler)
	mux.HandleFunc("/search", searchHandler)
	handler := LoggingMiddleware(RecoveryMiddleware(mux))
	// start pprof (only active when built with -tags debug)
	StartPprof()
	log.Printf("Server listening on %s", cfg.Port)
	if err := http.ListenAndServe(cfg.Port, handler); err != nil {
		log.Fatal(err)
	}
}
At this point main.go shows an error:
undefined: StartPprof (compiler: UndeclaredName)
and pprof.go shows a warning:
No packages found for open file /Users/shirley/programming/golang/cloud-native-search/pprof.go.
This file may be excluded due to its build tags; try adding "-tags=<build tag>" to your gopls "buildFlags" configuration
See the documentation for more information on working with build tags:
https://github.com/golang/tools/blob/master/gopls/doc/settings.md#buildflags-string.
That's expected: we added //go:build debug, so the file is excluded from the default build, and the tooling is just reminding us of that.
# build with the debug tag
go run -tags debug .
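Note that a plain go run . (without the tag) would still fail to compile, because main.go calls StartPprof, which is now undefined. A common pattern is a no-op stub guarded by the inverse build tag; a minimal sketch (the filename pprof_noop.go is my choice, not from the original project):

// pprof_noop.go
//go:build !debug

package main

// StartPprof is a no-op in non-debug builds, so main() can call it
// unconditionally without pulling in net/http/pprof.
func StartPprof() {}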
At this point:
- the application server is listening on :8080
- the pprof server is listening on :6060, browsable at http://localhost:6060/debug/pprof/
Next, open two terminal tabs: the first generates traffic (Step 3), the second captures a profile (Step 4).
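As a quick sanity check that the pprof endpoint is up (any HTTP client works; curl shown here):
curl -s http://localhost:6060/debug/pprof/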
Use hey (*1) or ab to simulate QPS, keeping /search under a continuous stream of requests.
hey -n 1000000 -c 50 "http://localhost:8080/search?q=golang"
Flag reference:
-n 1000000
→ send 1,000,000 requests in total.
-c 50
→ maintain 50 concurrent connections,
i.e. at most 50 requests in flight at the same time.
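If you prefer load-testing for a fixed duration rather than a fixed request count, hey also has a -z flag that runs for the given duration (-n is then ignored), for example:
hey -z 30s -c 50 "http://localhost:8080/search?q=golang"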
Expected result:
Summary:
  Total:	21.4561 secs
  Slowest:	0.3754 secs
  Fastest:	0.0000 secs
  Average:	0.0011 secs
  Requests/sec:	46606.8823
  Total data:	102000000 bytes
  Size/request:	102 bytes
Response time histogram:
  0.000 [1]	|
  0.038 [999124]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.075 [429]	|
  0.113 [190]	|
  0.150 [98]	|
  0.188 [7]	|
  0.225 [61]	|
  0.263 [32]	|
  0.300 [9]	|
  0.338 [1]	|
  0.375 [48]	|
Latency distribution:
  10% in 0.0001 secs
  25% in 0.0003 secs
  50% in 0.0006 secs
  75% in 0.0010 secs
  90% in 0.0020 secs
  95% in 0.0031 secs
  99% in 0.0081 secs
Details (average, fastest, slowest):
  DNS+dialup:	0.0000 secs, 0.0000 secs, 0.3754 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0027 secs
  req write:	0.0000 secs, 0.0000 secs, 0.3327 secs
  resp wait:	0.0010 secs, 0.0000 secs, 0.3753 secs
  resp read:	0.0000 secs, 0.0000 secs, 0.3322 secs
Status code distribution:
  [200]	1000000 responses
While the load test is running, capture a 10-second CPU profile:
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
Expected result:
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=10
Saved profile in /Users/shirley/pprof/pprof.cloud-native-search.samples.cpu.002.pb.gz
File: cloud-native-search
Type: cpu
Time: 2025-09-24 00:29:37 CST
Duration: 10.13s, Total samples = 77.10s (761.26%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 75.86s, 98.39% of 77.10s total
Dropped 264 nodes (cum <= 0.39s)
Showing top 10 nodes out of 64
      flat  flat%   sum%        cum   cum%
    61.67s 79.99% 79.99%     61.69s 80.01%  syscall.syscall
     6.44s  8.35% 88.34%      6.44s  8.35%  runtime.kevent
     4.06s  5.27% 93.61%      4.06s  5.27%  runtime.pthread_cond_wait
     1.91s  2.48% 96.08%      1.91s  2.48%  runtime.usleep
     0.78s  1.01% 97.09%      0.78s  1.01%  runtime.pthread_cond_signal
     0.49s  0.64% 97.73%      0.49s  0.64%  runtime.madvise
     0.46s   0.6% 98.33%      0.46s   0.6%  runtime.pthread_cond_timedwait_relative_np
     0.03s 0.039% 98.37%      0.86s  1.12%  runtime.lock2
     0.01s 0.013% 98.38%     16.30s 21.14%  main.main.LoggingMiddleware.func2
     0.01s 0.013% 98.39%     22.37s 29.01%  net/http.(*connReader).Read
(pprof) list main.main.LoggingMiddleware
Total: 77.10s
ROUTINE ======================== main.main.LoggingMiddleware.func2 in /Users/shirley/programming/golang/cloud-native-search/middleware.go
      10ms     16.30s (flat, cum) 21.14% of Total
         .          .     10:	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
         .       10ms     11:		start := time.Now()
         .      170ms     12:		next.ServeHTTP(w, r)
      10ms       10ms     13:		duration := time.Since(start)
         .          .     14:
         .     16.11s     15:		log.Printf("%s %s %v", r.Method, r.URL.Path, duration)
         .          .     16:	})
         .          .     17:}
         .          .     18:
         .          .     19:func RecoveryMiddleware(next http.Handler) http.Handler {
         .          .     20:	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
This downloads a CPU profile and enters the interactive shell. Common commands:
- top: list the functions consuming the most CPU (flat is time spent in the function itself; cum includes its callees)
- list <func>: show line-by-line timing inside a function
- web: open a call-graph visualization in the browser (requires graphviz, *2); for an interactive flame graph, run pprof with the -http flag instead
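You can also skip the interactive shell and print a report directly; pprof accepts report flags such as -top:
go tool pprof -top "http://localhost:6060/debug/pprof/profile?seconds=10"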
Key finding: the logging middleware accounts for 21% of cumulative CPU time (16.30s of 77.10s), which makes it a clear optimization target.
The culprit is log.Printf: it formats a string and writes to the log on every single request, which becomes expensive at high QPS (*3).
// middleware.go
func LoggingMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		duration := time.Since(start)

		// Log details only in debug mode, or reduce the logging frequency.
		if duration > 100*time.Millisecond { // only log slow requests
			log.Printf("%s %s %v", r.Method, r.URL.Path, duration)
		}
	})
}
Alternatively, use structured JSON logging:
type LogEntry struct {
	Method   string        `json:"method"`
	Path     string        `json:"path"`
	Duration time.Duration `json:"duration"` // note: encoding/json marshals time.Duration as nanoseconds
}

func LoggingMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		duration := time.Since(start)

		// Build a structured entry instead of ad-hoc string formatting.
		entry := LogEntry{
			Method:   r.Method,
			Path:     r.URL.Path,
			Duration: duration,
		}

		// Only log slow requests, or sample.
		if duration > 100*time.Millisecond {
			if data, err := json.Marshal(entry); err == nil {
				log.Println(string(data))
			}
		}
	})
}
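The comment above also mentions sampling as an alternative; here is a minimal sketch using an atomic counter, where the shouldLog helper, the threshold, and the 1-in-1000 rate are illustrative assumptions rather than part of the original code:

import (
	"sync/atomic"
	"time"
)

var reqCount atomic.Uint64 // atomic.Uint64 requires Go 1.19+

// shouldLog always logs slow requests and samples roughly
// 1 in 1000 of the fast ones. (hypothetical helper)
func shouldLog(duration time.Duration) bool {
	if duration > 100*time.Millisecond {
		return true
	}
	return reqCount.Add(1)%1000 == 0
}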
Re-run the load test:
hey -n 1000000 -c 50 "http://localhost:8080/search?q=golang"
Expected result:
hey -n 1000000 -c 50 "http://localhost:8080/search?q=golang"
Summary:
  Total:	14.2736 secs
  Slowest:	0.1821 secs
  Fastest:	0.0000 secs
  Average:	0.0007 secs
  Requests/sec:	70059.3366
  Total data:	102000000 bytes
  Size/request:	102 bytes
Response time histogram:
  0.000 [1]	|
  0.018 [999477]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.036 [426]	|
  0.055 [37]	|
  0.073 [5]	|
  0.091 [4]	|
  0.109 [6]	|
  0.127 [0]	|
  0.146 [0]	|
  0.164 [0]	|
  0.182 [44]	|
Latency distribution:
  10% in 0.0001 secs
  25% in 0.0002 secs
  50% in 0.0004 secs
  75% in 0.0008 secs
  90% in 0.0014 secs
  95% in 0.0021 secs
  99% in 0.0052 secs
Details (average, fastest, slowest):
  DNS+dialup:	0.0000 secs, 0.0000 secs, 0.1821 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0026 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0191 secs
  resp wait:	0.0006 secs, 0.0000 secs, 0.1811 secs
  resp read:	0.0000 secs, 0.0000 secs, 0.0347 secs
Status code distribution:
  [200]	1000000 responses
Capture a profile while the load test is running:
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
The result: total test time dropped from 21.46s to 14.27s and throughput rose from ~46,600 to ~70,000 requests/sec. After the optimization, LoggingMiddleware no longer appears in the top 10:
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=10
Saved profile in /Users/shirley/pprof/pprof.cloud-native-search.samples.cpu.003.pb.gz
File: cloud-native-search
Type: cpu
Time: 2025-09-24 00:56:55 CST
Duration: 10.19s, Total samples = 71.04s (696.95%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 69.55s, 97.90% of 71.04s total
Dropped 260 nodes (cum <= 0.36s)
Showing top 10 nodes out of 67
      flat  flat%   sum%        cum   cum%
    57.85s 81.43% 81.43%     57.87s 81.46%  syscall.syscall
     4.95s  6.97% 88.40%      4.95s  6.97%  runtime.kevent
     3.41s  4.80% 93.20%      3.41s  4.80%  runtime.pthread_cond_wait
     1.72s  2.42% 95.62%      1.72s  2.42%  runtime.usleep
     0.60s  0.84% 96.47%      0.60s  0.84%  runtime.pthread_cond_timedwait_relative_np
     0.50s   0.7% 97.17%      0.50s   0.7%  runtime.pthread_cond_signal
     0.48s  0.68% 97.85%      0.48s  0.68%  runtime.madvise
     0.02s 0.028% 97.87%      1.03s  1.45%  runtime.lock2
     0.01s 0.014% 97.89%     20.76s 29.22%  bufio.(*Reader).Peek
     0.01s 0.014% 97.90%     32.07s 45.14%  internal/poll.(*FD).Write
The remaining hotspots are all at the system level (syscall.syscall, runtime.kevent, network I/O), which is the unavoidable cost of actually serving requests.
Proof the optimization worked: LoggingMiddleware no longer appears in the hotspot list.
To dig further, pprof offers more profile types: memory (heap), blocking (block), and goroutines (*4).
What we learned today: gating pprof behind a build tag, generating load with hey, capturing and reading a CPU profile with top and list, and eliminating the hotspot found in the logging middleware.
*Note 1: on Linux / Windows, install hey by following the official docs; on macOS it can be installed via brew:
brew install hey
*Note 2: on Linux / Windows, install graphviz by following the official docs; on macOS:
brew install graphviz
*Note 3: QPS (queries per second) is the number of requests handled per second.
*Note 4:
heap profile: shows which functions allocate how much memory; used to hunt memory leaks, e.g. structures that are never released, or slices/maps that keep growing.
block profile: shows where goroutines block waiting on synchronization (mutexes, channels); used to find where goroutines get stuck (channel send/recv, mutex.Lock).
goroutine profile: the stack of every live goroutine; used to detect goroutine leaks (counts that keep growing without being reclaimed), or to confirm whether goroutines are stuck on the same call stack (deadlock, blocking).
go tool pprof http://localhost:6060/debug/pprof/heap
go tool pprof http://localhost:6060/debug/pprof/block
go tool pprof http://localhost:6060/debug/pprof/goroutine
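One caveat: the block profile stays empty unless you opt in from code. Somewhere at startup (e.g. in main() or in StartPprof) you would add something like:

// Any rate > 0 enables block profiling; 1 records every blocking event
// (at some runtime cost, so production code often uses a larger value).
runtime.SetBlockProfileRate(1)
// Optional: also sample mutex contention, exposed at /debug/pprof/mutex.
runtime.SetMutexProfileFraction(1)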
👉 Tomorrow we continue with "exposing custom metrics (QPS, latency)", so they can be observed over the long term with Prometheus + Grafana.