So far we have written table-driven tests, added context/timeouts, and run micro-benchmarks. Today we go one step further and learn:
pprof: find bottlenecks such as CPU hotspots, memory allocations, and goroutine state, so we can optimize performance. For how to interpret pprof data, refer to the full documentation; for now we will only do a first-pass analysis in Step 5 to find the most expensive function.
We add a new file, pprof.go, and guard it with a build tag (so profiling is not enabled by default in production).
// pprof.go
//go:build debug

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof"
)

// StartPprof starts the profiling server on :6060.
func StartPprof() {
    go func() {
        log.Println("pprof listening on :6060")
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
}
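Side note: the blank import _ "net/http/pprof" registers its handlers on http.DefaultServeMux, which is why http.ListenAndServe with a nil handler serves /debug/pprof/. If you would rather not touch the default mux, here is a minimal sketch of wiring the handlers onto an explicit mux (the StartPprofExplicit name is mine; behaviour is the same, only the wiring is explicit):

//go:build debug

package main

import (
    "log"
    "net/http"
    "net/http/pprof"
)

// StartPprofExplicit is a variant of StartPprof that registers the pprof
// handlers on its own mux instead of relying on http.DefaultServeMux.
func StartPprofExplicit() {
    mux := http.NewServeMux()
    mux.HandleFunc("/debug/pprof/", pprof.Index)
    mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
    mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
    mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
    mux.HandleFunc("/debug/pprof/trace", pprof.Trace)

    go func() {
        log.Println("pprof listening on :6060")
        log.Println(http.ListenAndServe("localhost:6060", mux))
    }()
}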
Add this to main.go:
func main() {
    cfg := LoadConfig()

    mux := http.NewServeMux()
    mux.HandleFunc("/healthz", healthHandler)
    mux.HandleFunc("/search", searchHandler)

    handler := LoggingMiddleware(RecoveryMiddleware(mux))

    // Start pprof (only active when built with the debug build tag)
    StartPprof()

    log.Printf("Server listening on %s", cfg.Port)
    if err := http.ListenAndServe(cfg.Port, handler); err != nil {
        log.Fatal(err)
    }
}
At this point main.go shows an error:
undefined: StartPprof (compiler: UndeclaredName)
and pprof.go shows a warning:
No packages found for open file /Users/shirley/programming/golang/cloud-native-search/pprof.go.
This file may be excluded due to its build tags; try adding "-tags=<build tag>" to your gopls "buildFlags" configuration
See the documentation for more information on working with build tags:
https://github.com/golang/tools/blob/master/gopls/doc/settings.md#buildflags-string.
That's fine: it happens because we added //go:build debug, and the tooling is just giving us a friendly reminder.
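If you also want a plain go run . (without the tag) to keep compiling, one common option, not part of the original code, is a companion file guarded by the inverse build tag, e.g. a hypothetical pprof_noop.go:

// pprof_noop.go
//go:build !debug

package main

// StartPprof is a no-op in non-debug builds, so the binary compiles
// without the pprof server or the net/http/pprof import.
func StartPprof() {}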
# add the debug build tag
go run -tags debug .
At this point the service is listening on :8080 and the pprof server on :6060; you can browse http://localhost:6060/debug/pprof/ to see the available profiles.
Next, open two terminal tabs: the first generates traffic (Step 3), the second captures a profile (Step 4).
Use hey or ab to simulate load so that /search receives a steady stream of requests.*
hey -n 1000000 -c 50 "http://localhost:8080/search?q=golang"
Flag reference:
-n 1000000 → send 1,000,000 requests in total.
-c 50 → keep 50 concurrent connections (concurrency), i.e. at most 50 requests are in flight at the same time.
Expected output:
Summary:
Total: 21.4561 secs
Slowest: 0.3754 secs
Fastest: 0.0000 secs
Average: 0.0011 secs
Requests/sec: 46606.8823
Total data: 102000000 bytes
Size/request: 102 bytes
Response time histogram:
0.000 [1] |
0.038 [999124] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.075 [429] |
0.113 [190] |
0.150 [98] |
0.188 [7] |
0.225 [61] |
0.263 [32] |
0.300 [9] |
0.338 [1] |
0.375 [48] |
Latency distribution:
10% in 0.0001 secs
25% in 0.0003 secs
50% in 0.0006 secs
75% in 0.0010 secs
90% in 0.0020 secs
95% in 0.0031 secs
99% in 0.0081 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0000 secs, 0.3754 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0027 secs
req write: 0.0000 secs, 0.0000 secs, 0.3327 secs
resp wait: 0.0010 secs, 0.0000 secs, 0.3753 secs
resp read: 0.0000 secs, 0.0000 secs, 0.3322 secs
Status code distribution:
[200] 1000000 responses
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
Expected output:
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=10
Saved profile in /Users/shirley/pprof/pprof.cloud-native-search.samples.cpu.002.pb.gz
File: cloud-native-search
Type: cpu
Time: 2025-09-24 00:29:37 CST
Duration: 10.13s, Total samples = 77.10s (761.26%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 75.86s, 98.39% of 77.10s total
Dropped 264 nodes (cum <= 0.39s)
Showing top 10 nodes out of 64
flat flat% sum% cum cum%
61.67s 79.99% 79.99% 61.69s 80.01% syscall.syscall
6.44s 8.35% 88.34% 6.44s 8.35% runtime.kevent
4.06s 5.27% 93.61% 4.06s 5.27% runtime.pthread_cond_wait
1.91s 2.48% 96.08% 1.91s 2.48% runtime.usleep
0.78s 1.01% 97.09% 0.78s 1.01% runtime.pthread_cond_signal
0.49s 0.64% 97.73% 0.49s 0.64% runtime.madvise
0.46s 0.6% 98.33% 0.46s 0.6% runtime.pthread_cond_timedwait_relative_np
0.03s 0.039% 98.37% 0.86s 1.12% runtime.lock2
0.01s 0.013% 98.38% 16.30s 21.14% main.main.LoggingMiddleware.func2
0.01s 0.013% 98.39% 22.37s 29.01% net/http.(*connReader).Read
(pprof) list main.main.LoggingMiddleware
Total: 77.10s
ROUTINE ======================== main.main.LoggingMiddleware.func2 in /Users/shirley/programming/golang/cloud-native-search/middleware.go
10ms 16.30s (flat, cum) 21.14% of Total
. . 10: return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
. 10ms 11: start := time.Now()
. 170ms 12: next.ServeHTTP(w, r)
10ms 10ms 13: duration := time.Since(start)
. . 14:
. 16.11s 15: log.Printf("%s %s %v", r.Method, r.URL.Path, duration)
. . 16: })
. . 17:}
. . 18:
. . 19:func RecoveryMiddleware(next http.Handler) http.Handler {
. . 20: return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
This downloads a CPU profile and drops you into the interactive interface. Commonly used commands:
top: list the functions consuming the most CPU
list <func>: show where the time goes inside that function
web: render the call graph (requires graphviz)*
Key finding: the logging middleware accounts for about 21% of cumulative time, which makes it a clear optimization target.
The problem is log.Printf: every request formats a string and writes it out, which is expensive at high QPS*.
// middleware.go
func LoggingMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        next.ServeHTTP(w, r)
        duration := time.Since(start)

        // Log details only in debug mode, or lower the logging frequency.
        if duration > 100*time.Millisecond { // only log slow requests
            log.Printf("%s %s %v", r.Method, r.URL.Path, duration)
        }
    })
}
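The comment above also mentions lowering the logging frequency. A minimal sketch of sampling, i.e. always logging slow requests but only roughly 1 out of every N of the rest (the SampledLoggingMiddleware name and the sampleEvery value are my assumptions; requires the same imports as middleware.go plus sync/atomic):

var requestCount atomic.Uint64

const sampleEvery = 1000 // hypothetical sampling rate: log ~0.1% of fast requests

func SampledLoggingMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        next.ServeHTTP(w, r)
        duration := time.Since(start)

        // Always log slow requests, plus a small sample of everything else.
        if duration > 100*time.Millisecond || requestCount.Add(1)%sampleEvery == 0 {
            log.Printf("%s %s %v", r.Method, r.URL.Path, duration)
        }
    })
}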
Or switch to JSON structured logging:
type LogEntry struct {
    Method   string        `json:"method"`
    Path     string        `json:"path"`
    Duration time.Duration `json:"duration"`
}

func LoggingMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        next.ServeHTTP(w, r)
        duration := time.Since(start)

        // Use structured logging instead of ad-hoc string formatting.
        entry := LogEntry{
            Method:   r.Method,
            Path:     r.URL.Path,
            Duration: duration,
        }

        // Only log slow requests (or sample).
        if duration > 100*time.Millisecond {
            if data, err := json.Marshal(entry); err == nil {
                log.Println(string(data))
            }
        }
    })
}
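Since Go 1.21 the standard library also ships log/slog, which produces structured JSON logs without hand-rolling a struct and json.Marshal. A minimal sketch of the same idea on top of slog (the SlogLoggingMiddleware name is mine; assumes Go 1.21+ and imports log/slog, net/http, os, time):

var logger = slog.New(slog.NewJSONHandler(os.Stdout, nil))

func SlogLoggingMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        next.ServeHTTP(w, r)
        duration := time.Since(start)

        // Only log slow requests, as JSON, via the structured logger.
        if duration > 100*time.Millisecond {
            logger.Info("slow request",
                "method", r.Method,
                "path", r.URL.Path,
                "duration", duration,
            )
        }
    })
}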
Re-run the load test:
hey -n 1000000 -c 50 "http://localhost:8080/search?q=golang"
Expected output:
hey -n 1000000 -c 50 "http://localhost:8080/search?q=golang"
Summary:
Total: 14.2736 secs
Slowest: 0.1821 secs
Fastest: 0.0000 secs
Average: 0.0007 secs
Requests/sec: 70059.3366
Total data: 102000000 bytes
Size/request: 102 bytes
Response time histogram:
0.000 [1] |
0.018 [999477] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.036 [426] |
0.055 [37] |
0.073 [5] |
0.091 [4] |
0.109 [6] |
0.127 [0] |
0.146 [0] |
0.164 [0] |
0.182 [44] |
Latency distribution:
10% in 0.0001 secs
25% in 0.0002 secs
50% in 0.0004 secs
75% in 0.0008 secs
90% in 0.0014 secs
95% in 0.0021 secs
99% in 0.0052 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0000 secs, 0.1821 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0026 secs
req write: 0.0000 secs, 0.0000 secs, 0.0191 secs
resp wait: 0.0006 secs, 0.0000 secs, 0.1811 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0347 secs
Status code distribution:
[200] 1000000 responses
While the load test is running, capture a profile:
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
The result: after the optimization, LoggingMiddleware no longer appears in the top 10.
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=10
Saved profile in /Users/shirley/pprof/pprof.cloud-native-search.samples.cpu.003.pb.gz
File: cloud-native-search
Type: cpu
Time: 2025-09-24 00:56:55 CST
Duration: 10.19s, Total samples = 71.04s (696.95%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 69.55s, 97.90% of 71.04s total
Dropped 260 nodes (cum <= 0.36s)
Showing top 10 nodes out of 67
flat flat% sum% cum cum%
57.85s 81.43% 81.43% 57.87s 81.46% syscall.syscall
4.95s 6.97% 88.40% 4.95s 6.97% runtime.kevent
3.41s 4.80% 93.20% 3.41s 4.80% runtime.pthread_cond_wait
1.72s 2.42% 95.62% 1.72s 2.42% runtime.usleep
0.60s 0.84% 96.47% 0.60s 0.84% runtime.pthread_cond_timedwait_relative_np
0.50s 0.7% 97.17% 0.50s 0.7% runtime.pthread_cond_signal
0.48s 0.68% 97.85% 0.48s 0.68% runtime.madvise
0.02s 0.028% 97.87% 1.03s 1.45% runtime.lock2
0.01s 0.014% 97.89% 20.76s 29.22% bufio.(*Reader).Peek
0.01s 0.014% 97.90% 32.07s 45.14% internal/poll.(*FD).Write
The remaining hotspots are all at the system level (syscall.syscall, runtime.kevent, and friends).
Proof that the optimization worked: LoggingMiddleware no longer shows up in the hotspot list.
To dig further, you can also profile memory (heap), blocking (block), and goroutines.*
What we learned today: enabling pprof behind a build tag, generating load with hey, reading a CPU profile with top and list, and verifying an optimization by re-profiling.
*Note 1: On Linux / Windows, install hey following the official documentation; on macOS you can install it with brew:
brew install hey
*Note 2: On Linux / Windows, install graphviz following the official documentation; on macOS you can install it with brew:
brew install graphviz
*Note 3: QPS means requests per second, the number of requests handled per second.
*Note 4:
heap profile: shows which functions allocate how much memory; used to hunt memory leaks, e.g. structures that are never released or a slice/map that keeps growing.
block profile: shows where goroutines are blocked waiting on synchronization (mutex, channel); used to find where goroutines get stuck (e.g. channel send/recv, mutex.Lock).
goroutine profile: a stack dump of all current goroutines; used to detect goroutine leaks (the goroutine count keeps growing without being reclaimed) or to confirm whether goroutines are stuck on the same call stack (deadlock, blocking).
go tool pprof http://localhost:6060/debug/pprof/heap
go tool pprof http://localhost:6060/debug/pprof/block
go tool pprof http://localhost:6060/debug/pprof/goroutine
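One caveat: the block (and mutex) profiles are disabled by default and come back empty unless you set a sampling rate. A minimal sketch of enabling them somewhere during startup, for example inside StartPprof (the rate values here are my assumption; a rate of 1 records every event and adds overhead, so use a larger value in production):

// Assuming "runtime" is imported.
runtime.SetBlockProfileRate(1)     // sample every blocking event for /debug/pprof/block
runtime.SetMutexProfileFraction(1) // sample every mutex contention event for /debug/pprof/mutex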
👉 Tomorrow we continue with "how to expose custom metrics (QPS, latency)", so the service can be observed long-term with Prometheus + Grafana.