Day 8 - Microbenchmarks: Establishing a Performance Baseline with go test -bench

2025 iThome Ironman Contest

DAY 8

Software Development

Part 8 of the series "Building a Cloud-Native Search Service with Golang + Elasticsearch + Kubernetes"


Tags: 17th Ironman Contest, golang, go, microbenchmark, benchmark

Shirley

Team: 躺平的內捲小隊

2025-09-22 01:05:10


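So far we have /search, the middleware layer, context/timeout handling, and an error strategy in place. Today's goal:

Write a few microbenchmarks that measure a small, repeatable piece of critical code and establish a performance baseline, so that later we can tell whether an optimization actually helped.
Microbenchmark: a performance measurement of a very small unit of code, such as a single function, one step of an algorithm, or one data-structure operation.
In contrast to a macrobenchmark (system benchmark), it does not measure the whole system or a complete workflow; it focuses on the time, memory, and allocation behavior of one small fragment.

Principle: a microbenchmark should be small, reproducible, and free of I/O.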

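What should we measure?

For our search service, the things most worth measuring are usually:

Result assembly: how the []SearchResult slice is built (preallocated capacity vs dynamic append).
JSON encoding: json.Marshal vs json.Encoder, and whether the buffer is reused.
Baseline HTTP handler/middleware overhead: the cost of one request passing through mux + middleware (with no I/O).

Add a benchmark file, bench_search_test.go: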

1) Result assembly: preallocated capacity vs dynamic append

package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"
)

// Fake data source
var smallHits = []SearchResult{
	{ID: 1, Title: "Learning Go"},
	{ID: 2, Title: "Go Concurrency Patterns"},
	{ID: 3, Title: "High Performance Go"},
}

// 1A) Dynamic append (may grow the backing array repeatedly)
func buildHitsAppend(src []SearchResult) []SearchResult {
	out := []SearchResult{}
	for _, h := range src {
		out = append(out, h)
	}
	return out
}

// 1B) Preallocated capacity (avoids regrowth)
func buildHitsPrealloc(src []SearchResult) []SearchResult {
	out := make([]SearchResult, 0, len(src))
	for _, h := range src {
		out = append(out, h)
	}
	return out
}

func BenchmarkBuildHitsAppend(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = buildHitsAppend(smallHits)
	}
}

func BenchmarkBuildHitsPrealloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = buildHitsPrealloc(smallHits)
	}
}

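Key point: measure the cost difference between different implementations of the same logic; -benchmem adds memory columns to the output, including the allocation count (allocs/op).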
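With only three hits the two variants can end up with identical allocation counts, so the effect of preallocation is easier to see on a larger input. The following is a minimal sketch under stated assumptions: `SearchResult` is a hypothetical stand-in with just the two fields used above, `sink`, `makeHits`, and `allocsFor` are illustrative helpers that are not part of the series' code, and `testing.AllocsPerRun` is used instead of a full benchmark to count allocations.

```go
package main

import (
	"fmt"
	"testing"
)

// SearchResult is a hypothetical stand-in for the service's real type.
type SearchResult struct {
	ID    int
	Title string
}

// sink keeps results reachable so the compiler cannot discard the work.
var sink []SearchResult

// makeHits builds n fake hits to use as input.
func makeHits(n int) []SearchResult {
	hits := make([]SearchResult, n)
	for i := range hits {
		hits[i] = SearchResult{ID: i + 1, Title: "doc"}
	}
	return hits
}

// buildAppend grows the slice on demand, reallocating as capacity runs out.
func buildAppend(src []SearchResult) []SearchResult {
	out := []SearchResult{}
	for _, h := range src {
		out = append(out, h)
	}
	return out
}

// buildPrealloc reserves the final capacity up front.
func buildPrealloc(src []SearchResult) []SearchResult {
	out := make([]SearchResult, 0, len(src))
	for _, h := range src {
		out = append(out, h)
	}
	return out
}

// allocsFor reports the average number of heap allocations per call.
func allocsFor(build func([]SearchResult) []SearchResult, src []SearchResult) float64 {
	return testing.AllocsPerRun(100, func() { sink = build(src) })
}

func main() {
	src := makeHits(1000)
	fmt.Println("append allocs/op:  ", allocsFor(buildAppend, src))
	fmt.Println("prealloc allocs/op:", allocsFor(buildPrealloc, src))
}
```

On a 1000-element input the preallocated version should need a single allocation, while the append version allocates each time the backing array grows; the exact count for append depends on the Go version's growth policy.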

2) JSON encoding: `json.Marshal` vs `json.Encoder` (reused buffer)

type respEnvelope struct {
	Query string         `json:"query"`
	Hits  []SearchResult `json:"hits"`
}

var resp = respEnvelope{
	Query: "golang",
	Hits:  smallHits,
}

func BenchmarkJSONMarshal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_, _ = json.Marshal(resp)
	}
}

func BenchmarkJSONEncoder_ReusedBuffer(b *testing.B) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for i := 0; i < b.N; i++ {
		buf.Reset()
		_ = enc.Encode(resp) // note: Encode appends a trailing '\n'
	}
}

Key points:

Marshal allocates a new slice on every call; Encoder can reuse the buffer.
Use -benchmem to check whether allocs/op actually goes down.
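Before comparing the two encoders' numbers, it helps to confirm they produce the same bytes, so the benchmark compares like with like. The sketch below is one way to check that; `reusableEncoder` and `sameAsMarshal` are hypothetical helpers, and the `SearchResult` JSON tags are assumed since the real type is defined elsewhere in the series.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// SearchResult is a hypothetical stand-in; the JSON tags are assumed.
type SearchResult struct {
	ID    int    `json:"id"`
	Title string `json:"title"`
}

type respEnvelope struct {
	Query string         `json:"query"`
	Hits  []SearchResult `json:"hits"`
}

var sample = respEnvelope{
	Query: "golang",
	Hits:  []SearchResult{{ID: 1, Title: "Learning Go"}},
}

// reusableEncoder pairs one buffer with one encoder so both survive across calls.
type reusableEncoder struct {
	buf bytes.Buffer
	enc *json.Encoder
}

func newReusableEncoder() *reusableEncoder {
	r := &reusableEncoder{}
	r.enc = json.NewEncoder(&r.buf)
	return r
}

// encode resets the buffer and writes v into it. The returned slice aliases
// the buffer and is only valid until the next call.
func (r *reusableEncoder) encode(v any) ([]byte, error) {
	r.buf.Reset()
	if err := r.enc.Encode(v); err != nil {
		return nil, err
	}
	return r.buf.Bytes(), nil
}

// sameAsMarshal reports whether the encoder output equals json.Marshal's
// output plus the trailing newline that Encode appends.
func sameAsMarshal(r *reusableEncoder, v any) (bool, error) {
	want, err := json.Marshal(v)
	if err != nil {
		return false, err
	}
	got, err := r.encode(v)
	if err != nil {
		return false, err
	}
	return bytes.Equal(got, append(want, '\n')), nil
}

func main() {
	ok, err := sameAsMarshal(newReusableEncoder(), sample)
	fmt.Println(ok, err) // prints "true <nil>"
}
```

The trade-off of the reused buffer is aliasing: the bytes returned by `encode` are overwritten on the next call, so a caller that keeps them must copy them first.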

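3) Baseline handler/middleware overhead (no I/O)

We can use httptest to measure the cost of one request passing through mux + middleware. Note: this is still a microbenchmark, so it must not involve time.Sleep or a real network.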

func BenchmarkHandlerPipeline(b *testing.B) {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", healthHandler)
	mux.HandleFunc("/search", func(w http.ResponseWriter, r *http.Request) {
		// Return a simple JSON body directly to keep I/O out of the measurement
		_ = json.NewEncoder(w).Encode(resp)
	})
	h := LoggingMiddleware(RecoveryMiddleware(mux))

	req := httptest.NewRequest(http.MethodGet, "/search?q=golang", nil)

	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		rr := httptest.NewRecorder()
		h.ServeHTTP(rr, req)
		_ = rr.Result().Body.Close()
	}
}

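Key points:

This measures the extra cost of routing + middleware. The real bottleneck is usually elsewhere, but having a number keeps us from over-optimizing the wrong place.
b.ReportAllocs() reports the memory allocated per request.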
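To isolate what the middleware itself costs, one option is to benchmark the bare mux and the wrapped mux side by side. The sketch below does that outside of `go test` by calling `testing.Benchmark` directly. `headerMiddleware`, `newMux`, `serveOnce`, and `benchHandler` are hypothetical names; `headerMiddleware` only stands in for the series' LoggingMiddleware and RecoveryMiddleware, which are not shown in this post.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"testing"
)

// headerMiddleware is a hypothetical middleware used only for illustration.
func headerMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Request-Seen", "1")
		next.ServeHTTP(w, r)
	})
}

// newMux builds a router with one in-memory handler (no I/O).
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/search", func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write([]byte(`{"query":"golang"}`))
	})
	return mux
}

// serveOnce sends one in-memory request through h and returns status and body.
func serveOnce(h http.Handler) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/search?q=golang", nil)
	rr := httptest.NewRecorder()
	h.ServeHTTP(rr, req)
	return rr.Code, rr.Body.String()
}

// benchHandler returns a benchmark function that drives h in a loop.
func benchHandler(h http.Handler) func(b *testing.B) {
	return func(b *testing.B) {
		req := httptest.NewRequest(http.MethodGet, "/search?q=golang", nil)
		b.ReportAllocs()
		b.ResetTimer() // exclude the setup above from the measurement
		for i := 0; i < b.N; i++ {
			h.ServeHTTP(httptest.NewRecorder(), req)
		}
	}
}

func main() {
	testing.Init() // registers testing flags when running outside `go test`
	bare := testing.Benchmark(benchHandler(newMux()))
	wrapped := testing.Benchmark(benchHandler(headerMiddleware(newMux())))
	fmt.Println("bare:   ", bare.String(), bare.MemString())
	fmt.Println("wrapped:", wrapped.String(), wrapped.MemString())
}
```

The difference between the two lines is a rough estimate of the middleware's own overhead; the absolute numbers depend on the machine.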

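Running the benchmarks

Before running, disable CPU frequency scaling and close resource-hungry background programs so they do not skew the numbers. Only compare benchmark results that were produced on the same machine.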

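# Run only benchmarks (no tests) and show memory allocations
go test -run=^$ -bench=. -benchmem ./...

# Run each benchmark three times to expose run-to-run jitter
go test -run=^$ -bench=. -benchmem -count=3 ./...

# ----- Running a single benchmark -----

# Run HandlerPipeline for exactly one iteration
go test -run=^$ -bench=HandlerPipeline -benchtime=1x

# Measure for at least 5 seconds
go test -bench=HandlerPipeline -benchmem -benchtime=5s

# Save benchmark output so it can be compared before and after a change
go test -bench=HandlerPipeline -benchmem > old.txt
# After modifying the code
go test -bench=HandlerPipeline -benchmem > new.txt

# Once benchstat is installed (see the appendix), use it to interpret the data
benchstat old.txt new.txt

# Collect multiple samples so benchstat has enough data for its summary statistics and p-value
go test -bench=HandlerPipeline -benchmem -count=10 > old.txt
# After modifying the code
go test -bench=HandlerPipeline -benchmem -count=10 > new.txt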

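Reading the output

What each field means:

ns/op: average time per operation (lower is better)
B/op: bytes allocated per operation (lower is better)
allocs/op: number of allocations per operation (lower is better)
-11 (suffix on the benchmark name): the run used GOMAXPROCS=11 (depends on the machine's CPU count)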
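To make the column layout concrete, here is a small sketch that splits one `-benchmem` result line into the fields described above. `benchLine` and `parseBenchLine` are hypothetical names, and the parser assumes the exact eight-column layout with a GOMAXPROCS suffix on the name; it is an illustration, not a replacement for benchstat.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// benchLine holds the columns of one `go test -bench -benchmem` result line.
type benchLine struct {
	Name        string  // benchmark name without the -N suffix
	Procs       int     // GOMAXPROCS taken from the -N suffix
	Iterations  int     // how many times the loop body ran (b.N)
	NsPerOp     float64 // ns/op
	BytesPerOp  int     // B/op
	AllocsPerOp int     // allocs/op
}

// parseBenchLine parses a line such as
// "BenchmarkX-11  930145  1223 ns/op  1409 B/op  18 allocs/op".
func parseBenchLine(line string) (benchLine, error) {
	f := strings.Fields(line)
	if len(f) != 8 || f[3] != "ns/op" || f[5] != "B/op" || f[7] != "allocs/op" {
		return benchLine{}, fmt.Errorf("unexpected benchmark line: %q", line)
	}
	dash := strings.LastIndex(f[0], "-")
	if dash < 0 {
		return benchLine{}, fmt.Errorf("no GOMAXPROCS suffix in %q", f[0])
	}
	out := benchLine{Name: f[0][:dash]}
	var err error
	if out.Procs, err = strconv.Atoi(f[0][dash+1:]); err != nil {
		return benchLine{}, err
	}
	if out.Iterations, err = strconv.Atoi(f[1]); err != nil {
		return benchLine{}, err
	}
	if out.NsPerOp, err = strconv.ParseFloat(f[2], 64); err != nil {
		return benchLine{}, err
	}
	if out.BytesPerOp, err = strconv.Atoi(f[4]); err != nil {
		return benchLine{}, err
	}
	if out.AllocsPerOp, err = strconv.Atoi(f[6]); err != nil {
		return benchLine{}, err
	}
	return out, nil
}

func main() {
	line := "BenchmarkHandlerPipeline-11  930145  1223 ns/op  1409 B/op  18 allocs/op"
	r, err := parseBenchLine(line)
	fmt.Printf("%+v %v\n", r, err)
}
```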

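Interpreting the numbers:

Building the slice with preallocated capacity is clearly faster (Prealloc was ~36%* faster than Append in one run; in the three-sample baseline recorded below the gap is closer to 19%, about 18.9 vs 23.3 ns/op on average). Both variants report the same 80 B/op and 1 allocs/op for this three-element input, so the gain here shows up in time only.
The Encoder with a reused buffer is clearly cheaper in memory and allocations (1 vs 2 allocs/op and 48 vs 192 B/op compared with Marshal), while ns/op is roughly the same.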

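Analyzing before/after data

old.txt / new.txt: the benchmark results before and after the change.
sec/op, B/op, allocs/op: the three metrics to watch.
- sec/op = average time per operation.
- B/op = average memory allocated per operation (bytes).
- allocs/op = average number of allocations per operation.
vs base: the difference between the new and old results (for example ±%).
p=... n=...: the p-value of the statistical test and the sample count n.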

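1. Execution time (sec/op)

HandlerPipeline-11
old: 1.373µs/op
new: 1.436µs/op
Difference: ~ +4.6% (but p=1.000, n=1)

That means the new version was ~4.6% slower on average, but only one sample was collected (n=1), so the result has no statistical significance; benchstat notes that it needs at least 6 samples to be confident.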
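The percentage is simply the relative change from the old value to the new one. A minimal sketch of that arithmetic, using a hypothetical helper named `percentChange`:

```go
package main

import "fmt"

// percentChange returns the relative change from base to next, in percent.
// A positive value means next is larger (slower, for sec/op).
func percentChange(base, next float64) float64 {
	return (next - base) / base * 100
}

func main() {
	// sec/op values in µs, taken from the comparison above.
	fmt.Printf("%+.1f%%\n", percentChange(1.373, 1.436)) // prints "+4.6%"
}
```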

2. Allocated memory (B/op)

old: 1.376KiB/op
new: 1.376KiB/op

Both sides are identical; there is no difference in bytes allocated.

3. Allocation count (allocs/op)

old: 18.00 allocs/op
new: 18.00 allocs/op

Both sides are identical. Each handler request results in about 18 allocations on average.

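What to do next

Collect multiple samples: benchstat needs several samples (usually at least 6) to run its statistical test; otherwise it flags the row with a note such as "all samples are equal" or "need ≥6 samples".
Watch the direction of the change
- If sec/op keeps growing → the pipeline as a whole is getting slower.
- If B/op or allocs/op drops → the allocation optimization worked (these metrics are usually more stable than time).
Use benchstat to judge whether a difference is significant
- p<0.05 is usually taken to mean the difference is real.
- If the change is only a few percent and not significant, treat it as noise.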
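When several samples are available, a robust way to summarize them by hand is the median, which is less sensitive to a single noisy run than the mean. The sketch below is a hand-rolled illustration with a hypothetical `median` helper; it is not how benchstat is implemented. The sample values are the three BenchmarkBuildHitsAppend ns/op readings from the baseline in this post.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the middle value of samples (or the mean of the two middle
// values for an even count). It returns 0 for an empty input.
func median(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	s := append([]float64(nil), samples...) // copy so the caller's slice keeps its order
	sort.Float64s(s)
	mid := len(s) / 2
	if len(s)%2 == 1 {
		return s[mid]
	}
	return (s[mid-1] + s[mid]) / 2
}

func main() {
	appendNs := []float64{24.07, 23.28, 22.68} // ns/op from three runs
	fmt.Println(median(appendNs))              // prints 23.28
}
```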

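7 tips for microbenchmarks

Avoid I/O (network/disk): a microbenchmark measures only CPU, memory, and the code itself.
Avoid compiler-optimization traps: make sure the measured result is actually used, so it cannot be removed by DCE (dead code elimination). The examples above write into a buffer or a response recorder; for pure functions whose result is discarded with `_`, storing the result in a package-level variable is a safer habit. runtime.KeepAlive(x) is another option when needed.
Use -benchmem: look at ns/op and allocs/op together; allocations are often the root cause of slowness.
b.ReportAllocs(): very helpful for black-box paths such as a handler.
Keep the environment fixed: same machine, similar load; run several times and take the median.
-benchtime and -cpu: lengthen the benchmark time when needed (for example -benchtime=2s), or try different CPU counts (-cpu=1,2,4).
Compare one dimension at a time: change only one thing per comparison (for example only the JSON encoding approach), so the result is easy to attribute.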
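The dead-code tip is the easiest one to get wrong, so here is a small sketch of the package-level sink pattern. `sum`, `sink`, and `benchSum` are hypothetical names chosen for this illustration; the benchmark is driven with `testing.Benchmark` so the file runs as an ordinary program.

```go
package main

import (
	"fmt"
	"testing"
)

// sum is a small pure function. If its result were unused, the compiler
// would be free to inline the call and drop the work.
func sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// sink is a package-level variable; storing into it makes the result observable.
var sink int

// benchSum measures sum and keeps every result alive through sink.
func benchSum(b *testing.B) {
	xs := make([]int, 1024)
	for i := range xs {
		xs[i] = i
	}
	b.ResetTimer() // exclude the input setup from the measurement
	for i := 0; i < b.N; i++ {
		sink = sum(xs)
	}
}

func main() {
	testing.Init() // registers testing flags when running outside `go test`
	r := testing.Benchmark(benchSum)
	fmt.Println(r.String())
	fmt.Println("last result:", sink)
}
```

Inside a regular `_test.go` file the same idea applies: assign to `sink` inside the `b.N` loop instead of discarding the value.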

Record the baseline (establish a "performance contract")

Paste the current key numbers into BENCHMARK.md or README.md:

go test -run=^$ -bench=. -benchmem -count=3 ./...

goos: darwin
goarch: arm64
pkg: github.com/arealclimber/cloud-native-search
cpu: Apple M3 Pro
BenchmarkBuildHitsAppend-11             	45286250	        24.07 ns/op	      80 B/op	       1 allocs/op
BenchmarkBuildHitsAppend-11             	50686822	        23.28 ns/op	      80 B/op	       1 allocs/op
BenchmarkBuildHitsAppend-11             	49706593	        22.68 ns/op	      80 B/op	       1 allocs/op
BenchmarkBuildHitsPrealloc-11           	65393312	        18.50 ns/op	      80 B/op	       1 allocs/op
BenchmarkBuildHitsPrealloc-11           	65247532	        18.50 ns/op	      80 B/op	       1 allocs/op
BenchmarkBuildHitsPrealloc-11           	65183881	        19.61 ns/op	      80 B/op	       1 allocs/op
BenchmarkJSONMarshal-11                 	 4423723	       256.8 ns/op	     192 B/op	       2 allocs/op
BenchmarkJSONMarshal-11                 	 4818093	       249.5 ns/op	     192 B/op	       2 allocs/op
BenchmarkJSONMarshal-11                 	 4847042	       293.8 ns/op	     192 B/op	       2 allocs/op
BenchmarkJSONEncoder_ReusedBuffer-11    	 4613480	       261.4 ns/op	      48 B/op	       1 allocs/op
BenchmarkJSONEncoder_ReusedBuffer-11    	 4957220	       240.3 ns/op	      48 B/op	       1 allocs/op
BenchmarkJSONEncoder_ReusedBuffer-11    	 4970575	       257.7 ns/op	      48 B/op	       1 allocs/op
BenchmarkHandlerPipeline-11             	  930145	      1223 ns/op	    1409 B/op	      18 allocs/op
BenchmarkHandlerPipeline-11             	 1000000	      1303 ns/op	    1409 B/op	      18 allocs/op
BenchmarkHandlerPipeline-11             	  919501	      1225 ns/op	    1409 B/op	      18 allocs/op
PASS
ok  	github.com/arealclimber/cloud-native-search	20.799s

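Whenever the project changes (for example introducing a SearchService, adding fields, or swapping the JSON package), re-run the benchmarks and compare against the baseline. If things get much slower, investigate the cause.

Summary

What we accomplished today:

Wrote three kinds of microbenchmarks closely tied to our service (result assembly, JSON, handler pipeline)
Learned to use go test -bench and -benchmem to produce reproducible numbers
Recorded a performance baseline so that later optimization work (pprof) has a before/after reference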

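Appendix

Install the official Go tool for interpreting benchmark results

go install golang.org/x/perf/cmd/benchstat@latest

After installation the executable is placed in $GOBIN if it is set, otherwise in $GOPATH/bin (by default $HOME/go/bin).

Make sure that directory is on your $PATH. For example, with zsh on macOS / Linux, add this to ~/.zshrc:

export PATH=$PATH:$(go env GOPATH)/bin

Verify the installation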

benchstat -h

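*Note: (33.63 - 21.38) / 33.63 ≈ 36%. These two ns/op figures appear to come from a separate run; they are not among the baseline numbers recorded above.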

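👉 Tomorrow we start on concurrency fundamentals: we will implement a bounded-concurrency worker pool and use a simple load test to observe how limiting concurrency affects system stability and tail latency.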
