2025 iThome 鐵人賽 - Day 10
So far we have written table-driven tests, added context/timeout handling, and run micro-benchmarks. Today we go one step further and learn how to:

  • Enable pprof: expose a profiling endpoint in the program.
  • Run a load test: put enough traffic on the service to produce a profile.
  • Analyze with the pprof tool: find the hot functions.
  • Make one small tweak: see the difference in the numbers.

Step 0: What is pprof

pprof locates bottlenecks such as CPU hot spots, memory allocations, and goroutine states, which makes performance tuning much easier. For how to interpret pprof data in depth, see the full documentation; in this post we only do a first-pass analysis in Step 5 to find the most expensive function.
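
For context, pprof also works without an HTTP server: the runtime/pprof package can write a profile straight to a file. Below is a minimal sketch of that standalone form (not part of this project; doWork is a hypothetical placeholder). In this series we use the net/http/pprof variant instead, so a running service can be profiled over HTTP, starting in Step 1.

// cpuprofile.go (standalone sketch, not part of the series code)
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

// doWork stands in for whatever code you want to profile.
func doWork() {
	total := 0
	for i := 0; i < 10_000_000; i++ {
		total += i * i
	}
	_ = total
}

func main() {
	f, err := os.Create("cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Sample the CPU while doWork runs and write the profile to cpu.pprof,
	// which can then be inspected with: go tool pprof cpu.pprof
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	doWork()
}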

Step 1: Enable pprof in the project

Add a new file, pprof.go, gated by a build tag (so that profiling is not enabled by default in production).

// pprof.go
//go:build debug

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof"
)

// StartPprof starts the profiling server on :6060
func StartPprof() {
	go func() {
		log.Println("pprof listening on :6060")
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}

Then in main.go add:

func main() {
	cfg := LoadConfig()

	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", healthHandler)
	mux.HandleFunc("/search", searchHandler)

	handler := LoggingMiddleware(RecoveryMiddleware(mux))

	// Start pprof (only compiled in when building with -tags debug)
	StartPprof()

	log.Printf("Server listening on %s", cfg.Port)
	if err := http.ListenAndServe(cfg.Port, handler); err != nil {
		log.Fatal(err)
	}
}

At this point main.go shows an error:

undefined: StartPprof (compiler: UndeclaredName)

and pprof.go shows a warning:

No packages found for open file /Users/shirley/programming/golang/cloud-native-search/pprof.go.
This file may be excluded due to its build tags; try adding "-tags=<build tag>" to your gopls "buildFlags" configuration
See the documentation for more information on working with build tags:
https://github.com/golang/tools/blob/master/gopls/doc/settings.md#buildflags-string.

The pprof.go warning is expected: gopls excludes the file because of its //go:build debug tag. The undefined: StartPprof error in main.go, however, is a real compile error for any build that omits -tags debug; a common fix is a no-op stub that is compiled only without the tag (a sketch follows below).
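
A minimal sketch of such a stub (hypothetical file name pprof_noop.go), so the project still compiles when the debug tag is absent:

// pprof_noop.go
//go:build !debug

package main

// StartPprof is a no-op in builds without the debug tag,
// so main.go compiles with and without -tags debug.
func StartPprof() {}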


Step 2: Run the service (debug mode)

# run with the debug build tag
go run -tags debug .

At this point:

  • The service still runs on :8080
  • pprof runs on :6060 and can be browsed at http://localhost:6060/debug/pprof/

Next, open two terminal tabs: the first generates traffic (Step 3), the second captures a profile (Step 4).

Step 3: Generate traffic

Use hey to simulate QPS and keep /search under sustained load.*

hey -n 1000000 -c 50 "http://localhost:8080/search?q=golang"

Flag reference:

  • -n 1000000

    → send 1,000,000 requests in total

  • -c 50

    → keep 50 concurrent connections

    i.e. at most 50 requests are in flight at any one time.

  • Expected output

    Summary:
      Total:	21.4561 secs
      Slowest:	0.3754 secs
      Fastest:	0.0000 secs
      Average:	0.0011 secs
      Requests/sec:	46606.8823
    
      Total data:	102000000 bytes
      Size/request:	102 bytes
    
    Response time histogram:
      0.000 [1]	|
      0.038 [999124]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
      0.075 [429]	|
      0.113 [190]	|
      0.150 [98]	|
      0.188 [7]	|
      0.225 [61]	|
      0.263 [32]	|
      0.300 [9]	|
      0.338 [1]	|
      0.375 [48]	|
    
    Latency distribution:
      10% in 0.0001 secs
      25% in 0.0003 secs
      50% in 0.0006 secs
      75% in 0.0010 secs
      90% in 0.0020 secs
      95% in 0.0031 secs
      99% in 0.0081 secs
    
    Details (average, fastest, slowest):
      DNS+dialup:	0.0000 secs, 0.0000 secs, 0.3754 secs
      DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0027 secs
      req write:	0.0000 secs, 0.0000 secs, 0.3327 secs
      resp wait:	0.0010 secs, 0.0000 secs, 0.3753 secs
      resp read:	0.0000 secs, 0.0000 secs, 0.3322 secs
    
    Status code distribution:
      [200]	1000000 responses
    

Step 4: Capture a profile

go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
  • Expected output

    go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
    
    Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=10
    Saved profile in /Users/shirley/pprof/pprof.cloud-native-search.samples.cpu.002.pb.gz
    File: cloud-native-search
    Type: cpu
    Time: 2025-09-24 00:29:37 CST
    Duration: 10.13s, Total samples = 77.10s (761.26%)
    Entering interactive mode (type "help" for commands, "o" for options)
    
    (pprof) top
    Showing nodes accounting for 75.86s, 98.39% of 77.10s total
    Dropped 264 nodes (cum <= 0.39s)
    Showing top 10 nodes out of 64
          flat  flat%   sum%        cum   cum%
        61.67s 79.99% 79.99%     61.69s 80.01%  syscall.syscall
         6.44s  8.35% 88.34%      6.44s  8.35%  runtime.kevent
         4.06s  5.27% 93.61%      4.06s  5.27%  runtime.pthread_cond_wait
         1.91s  2.48% 96.08%      1.91s  2.48%  runtime.usleep
         0.78s  1.01% 97.09%      0.78s  1.01%  runtime.pthread_cond_signal
         0.49s  0.64% 97.73%      0.49s  0.64%  runtime.madvise
         0.46s   0.6% 98.33%      0.46s   0.6%  runtime.pthread_cond_timedwait_relative_np
         0.03s 0.039% 98.37%      0.86s  1.12%  runtime.lock2
         0.01s 0.013% 98.38%     16.30s 21.14%  main.main.LoggingMiddleware.func2
         0.01s 0.013% 98.39%     22.37s 29.01%  net/http.(*connReader).Read
    
    (pprof) list main.main.LoggingMiddleware
    Total: 77.10s
    ROUTINE ======================== main.main.LoggingMiddleware.func2 in /Users/shirley/programming/golang/cloud-native-search/middleware.go
          10ms     16.30s (flat, cum) 21.14% of Total
             .          .     10:	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
             .       10ms     11:		start := time.Now()
             .      170ms     12:		next.ServeHTTP(w, r)
          10ms       10ms     13:		duration := time.Since(start)
             .          .     14:
             .     16.11s     15:		log.Printf("%s %s %v", r.Method, r.URL.Path, duration)
             .          .     16:	})
             .          .     17:}
             .          .     18:
             .          .     19:func RecoveryMiddleware(next http.Handler) http.Handler {
             .          .     20:	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    

This downloads a CPU profile and drops you into the interactive prompt. Common commands:

  • top: list the functions that consume the most CPU
  • list <func>: show where time is spent inside that function
  • web: render the call graph in a browser (requires graphviz)*; for a flame-graph view, see the -http example below
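
The pprof tool can also serve a full web UI, including a flame-graph view, instead of the interactive prompt; for example:

go tool pprof -http=:8081 "http://localhost:6060/debug/pprof/profile?seconds=10"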

Step 5: Analyze the pprof output and find the real hot spot

(pprof) top
Showing nodes accounting for 75.86s, 98.39% of 77.10s total
Dropped 264 nodes (cum <= 0.39s)
Showing top 10 nodes out of 64
      flat  flat%   sum%        cum   cum%
    61.67s 79.99% 79.99%     61.69s 80.01%  syscall.syscall
     6.44s  8.35% 88.34%      6.44s  8.35%  runtime.kevent
     4.06s  5.27% 93.61%      4.06s  5.27%  runtime.pthread_cond_wait
     1.91s  2.48% 96.08%      1.91s  2.48%  runtime.usleep
     0.78s  1.01% 97.09%      0.78s  1.01%  runtime.pthread_cond_signal
     0.49s  0.64% 97.73%      0.49s  0.64%  runtime.madvise
     0.46s   0.6% 98.33%      0.46s   0.6%  runtime.pthread_cond_timedwait_relative_np
     0.03s 0.039% 98.37%      0.86s  1.12%  runtime.lock2
     0.01s 0.013% 98.38%     16.30s 21.14%  main.main.LoggingMiddleware.func2
     0.01s 0.013% 98.39%     22.37s 29.01%  net/http.(*connReader).Read

What each column means:

  • flat: CPU time spent in the function itself (excluding its callees).
  • flat%: flat as a percentage of total time.
  • sum%: the running total of flat% so far.
  • cum (cumulative): time spent in the function plus all of its callees.
  • cum%: cum as a percentage of total time.
  • last column: the function name

For example, main.main.LoggingMiddleware.func2 above has flat 0.01s (it does almost no work itself) but cum 16.30s, because the time spent in its callees (next.ServeHTTP and log.Printf) is counted toward cum.

How to read this pprof output

  1. syscall.syscall (79.99%): system calls, mostly network I/O; this is expected
  2. runtime.kevent (8.35%): event polling, also expected
  3. main.LoggingMiddleware.func2 (21.14% cum): our logging middleware

Key finding: the logging middleware accounts for roughly 21% of cumulative time, which makes it a good optimization target.

Step 6: Optimize the hot spot - improve logging performance

The problem is that log.Printf formats a string and writes it out on every request, which is expensive at high QPS.*

// middleware.go
func LoggingMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		duration := time.Since(start)

		// Only log in detail in debug mode, or lower the logging rate
		if duration > 100*time.Millisecond { // only log slow requests
			log.Printf("%s %s %v", r.Method, r.URL.Path, duration)
		}
	})
}
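
The comment above also mentions lowering the logging rate; here is a minimal sketch of that idea (the SampledLoggingMiddleware name and the 1-in-1000 sampling rate are arbitrary choices, not part of the series code):

// Sampled logging: log roughly one request out of every 1000.
// Needs "sync/atomic" in the imports (atomic.Uint64 requires Go 1.19+).
var requestCount atomic.Uint64

func SampledLoggingMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		duration := time.Since(start)

		// Atomically count requests and emit a log line only every 1000th one.
		if requestCount.Add(1)%1000 == 0 {
			log.Printf("%s %s %v (sampled)", r.Method, r.URL.Path, duration)
		}
	})
}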

Alternatively, with structured JSON logging:

type LogEntry struct {
	Method   string        `json:"method"`
	Path     string        `json:"path"`
	Duration time.Duration `json:"duration"` // marshals as nanoseconds
}

func LoggingMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		duration := time.Since(start)

		// Build a structured log entry instead of an ad-hoc format string
		entry := LogEntry{
			Method:   r.Method,
			Path:     r.URL.Path,
			Duration: duration,
		}

		// Only log slow requests (or sampled ones); needs "encoding/json" in the imports
		if duration > 100*time.Millisecond {
			if data, err := json.Marshal(entry); err == nil {
				log.Println(string(data))
			}
		}
	})
}
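
If the project targets Go 1.21 or newer, the standard library's log/slog already produces structured JSON, so the hand-rolled LogEntry type is optional. A sketch of that alternative (an assumption on my part, not what the series code uses):

// Structured JSON logging with log/slog (Go 1.21+).
// Assumes imports: "log/slog", "net/http", "os", "time".
var requestLogger = slog.New(slog.NewJSONHandler(os.Stdout, nil))

func SlogLoggingMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		duration := time.Since(start)

		// Keep the same "only log slow requests" policy as above.
		if duration > 100*time.Millisecond {
			requestLogger.Info("slow request",
				"method", r.Method,
				"path", r.URL.Path,
				"duration", duration,
			)
		}
	})
}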

Step 7: Verify the optimization

  1. Re-run the load test:

    hey -n 1000000 -c 50 "http://localhost:8080/search?q=golang"
    
    • Expected output

      hey -n 1000000 -c 50 "http://localhost:8080/search?q=golang"
      
      Summary:
        Total:	14.2736 secs
        Slowest:	0.1821 secs
        Fastest:	0.0000 secs
        Average:	0.0007 secs
        Requests/sec:	70059.3366
      
        Total data:	102000000 bytes
        Size/request:	102 bytes
      
      Response time histogram:
        0.000 [1]	|
        0.018 [999477]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
        0.036 [426]	|
        0.055 [37]	|
        0.073 [5]	|
        0.091 [4]	|
        0.109 [6]	|
        0.127 [0]	|
        0.146 [0]	|
        0.164 [0]	|
        0.182 [44]	|
      
      Latency distribution:
        10% in 0.0001 secs
        25% in 0.0002 secs
        50% in 0.0004 secs
        75% in 0.0008 secs
        90% in 0.0014 secs
        95% in 0.0021 secs
        99% in 0.0052 secs
      
      Details (average, fastest, slowest):
        DNS+dialup:	0.0000 secs, 0.0000 secs, 0.1821 secs
        DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0026 secs
        req write:	0.0000 secs, 0.0000 secs, 0.0191 secs
        resp wait:	0.0006 secs, 0.0000 secs, 0.1811 secs
        resp read:	0.0000 secs, 0.0000 secs, 0.0347 secs
      
      Status code distribution:
        [200]	1000000 responses
      
      
  2. While the load test is running, capture another profile:

    go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
    
  3. Results:

    After the optimization, LoggingMiddleware no longer appears in the top 10.

    go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
    
    Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=10
    Saved profile in /Users/shirley/pprof/pprof.cloud-native-search.samples.cpu.003.pb.gz
    File: cloud-native-search
    Type: cpu
    Time: 2025-09-24 00:56:55 CST
    Duration: 10.19s, Total samples = 71.04s (696.95%)
    Entering interactive mode (type "help" for commands, "o" for options)
    
    (pprof) top
    Showing nodes accounting for 69.55s, 97.90% of 71.04s total
    Dropped 260 nodes (cum <= 0.36s)
    Showing top 10 nodes out of 67
          flat  flat%   sum%        cum   cum%
        57.85s 81.43% 81.43%     57.87s 81.46%  syscall.syscall
         4.95s  6.97% 88.40%      4.95s  6.97%  runtime.kevent
         3.41s  4.80% 93.20%      3.41s  4.80%  runtime.pthread_cond_wait
         1.72s  2.42% 95.62%      1.72s  2.42%  runtime.usleep
         0.60s  0.84% 96.47%      0.60s  0.84%  runtime.pthread_cond_timedwait_relative_np
         0.50s   0.7% 97.17%      0.50s   0.7%  runtime.pthread_cond_signal
         0.48s  0.68% 97.85%      0.48s  0.68%  runtime.madvise
         0.02s 0.028% 97.87%      1.03s  1.45%  runtime.lock2
         0.01s 0.014% 97.89%     20.76s 29.22%  bufio.(*Reader).Peek
         0.01s 0.014% 97.90%     32.07s 45.14%  internal/poll.(*FD).Write
    

Analysis

The hot spots are now all at the system level:

  • syscall.syscall (81.43%) - network I/O
  • runtime.kevent (6.97%) - event polling
  • bufio.(*Reader).Peek (29.22% cum) - HTTP request parsing
  • internal/poll.(*FD).Write (45.14% cum) - network writes

Signs the optimization worked:

  1. LoggingMiddleware no longer appears in the hot-spot list
  2. CPU time is now concentrated in system calls - the ideal shape for an I/O-bound service
  3. Application-level overhead has dropped significantly

Possible further optimizations:

  1. Connection pooling - HTTP keep-alive
  2. Response caching - avoid recomputing identical results (see the sketch below)
  3. Batching - reduce the number of system calls
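
To make the response-caching idea concrete, here is a rough sketch (the CacheMiddleware name, the URL-keyed in-memory map, and the fixed TTL are all assumptions; a real service would also need size limits and invalidation):

// cache.go (sketch only, not part of the series code)
package main

import (
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

type cachedResponse struct {
	body    []byte
	expires time.Time
}

// CacheMiddleware serves repeated GET requests from an in-memory cache for ttl.
func CacheMiddleware(ttl time.Duration, next http.Handler) http.Handler {
	var mu sync.RWMutex
	cache := make(map[string]cachedResponse)

	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			next.ServeHTTP(w, r)
			return
		}
		key := r.URL.String()

		mu.RLock()
		entry, ok := cache[key]
		mu.RUnlock()
		if ok && time.Now().Before(entry.expires) {
			w.Write(entry.body) // cache hit
			return
		}

		// Cache miss: record the downstream response, store it, then replay it.
		rec := httptest.NewRecorder()
		next.ServeHTTP(rec, r)
		body := rec.Body.Bytes()

		mu.Lock()
		cache[key] = cachedResponse{body: body, expires: time.Now().Add(ttl)}
		mu.Unlock()

		for k, vs := range rec.Header() {
			for _, v := range vs {
				w.Header().Add(k, v)
			}
		}
		w.WriteHeader(rec.Code)
		w.Write(body)
	})
}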

Build the habit

  • pprof is not only for CPU: it can also profile memory (heap), blocking (block), and goroutines.*
  • Keep recording baselines: run a profile before and after every optimization and keep the numbers.
  • Pair it with Day 8: use micro-benchmarks to find single-point differences first, then use pprof to find the overall hot spots.

Wrap-up

Today we covered:

  • How to enable pprof in a Go service
  • How to run a load test and capture a profile
  • How to find hot spots with pprof top/list
  • How to make a small optimization and see the difference in the numbers

*Note 1: On Linux / Windows, install hey by following the official documentation; on macOS you can install it with brew:

brew install hey

*Note 2: On Linux / Windows, install graphviz by following the official documentation; on macOS you can install it with brew:

brew install graphviz

*Note 3: QPS stands for queries per second, i.e. the number of requests the service handles per second.

*Note 4:

heap profile: shows how much memory each function allocates; used to find memory leaks, e.g. structures that are never released or slices/maps that keep growing.

block profile: shows where goroutines are blocked waiting on synchronization (mutexes, channels); used to find where goroutines get stuck (e.g. channel send/recv, mutex.Lock).

goroutine profile: a snapshot of every goroutine's current stack; used to spot goroutine leaks (the goroutine count keeps climbing without being reclaimed) or to confirm that goroutines are stuck on the same call stack (deadlock, blocking).

go tool pprof http://localhost:6060/debug/pprof/heap

go tool pprof http://localhost:6060/debug/pprof/block

go tool pprof http://localhost:6060/debug/pprof/goroutine
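
One caveat: the block profile records nothing until it is switched on. A minimal sketch of enabling it in debug builds only (enableBlockProfile is a hypothetical helper, e.g. called from StartPprof):

// profiles_debug.go (sketch, hypothetical file name)
//go:build debug

package main

import "runtime"

// enableBlockProfile turns on block profiling; without this call
// /debug/pprof/block stays empty. A rate of 1 records every blocking
// event and adds overhead, so keep it to debug builds.
func enableBlockProfile() {
	runtime.SetBlockProfileRate(1)
}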

👉 Tomorrow: how to expose custom metrics (QPS, latency) so the service can be observed long-term with Prometheus + Grafana.

