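In the past, "trace" usually meant tools such as DTrace on Unix-like systems or ETW on Windows.
In OpenTelemetry, a Trace usually means distributed tracing, also called distributed request tracing.
It is most often applied in microservice architectures.
In a monolithic application, where all the code runs on one host, troubleshooting can be as simple as following the stack trace in an exception message.
Once a program is split up and runs across many hosts, a single stack trace is no longer enough to find the problem.
Instead, we need something that represents the whole request as one transaction, seen both from the service level and from the component level, so that we can view the full picture while troubleshooting.
A Trace lets us see what a request did in each service, how long each operation took, and any logs and errors along the way.
Tools can then help us understand the relationships and interactions between services.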
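The figure above is the Node Graph in Grafana Tempo.
For background, see the article I wrote the year before last, "Distributed Tracing & OpenTelemetry介紹" (an introduction to distributed tracing and OpenTelemetry).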
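The main job of the Trace API is to create Spans. Each Span carries the TraceId of the trace it belongs to, plus its own unique SpanId.
Its structure is very similar to the Metric API (TracerProvider, Tracer, and Span mirror MeterProvider, Meter, and Instrument).
The API has one more role: interacting with Context.
SpanContext comes from OpenTracing.
It is the state that crosses process boundaries and is passed on to downstream Spans. In other words, it is the Span's context object, so SpanContext is part of a Span.
It can be serialized and propagated along with the context (Propagation).
A SpanContext is immutable. You can only extract it, or combine it with new information to produce a brand-new SpanContext.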
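The immutability described above can be sketched with a plain Go value type. This is not the OTel implementation: `spanContext` and `withFlags` are hypothetical names chosen for illustration, and the sketch only assumes that a "change" is expressed by returning a modified copy.

```go
package main

import "fmt"

// spanContext is a hypothetical, simplified stand-in for a SpanContext.
// Its fields are unexported and it has no setters, so callers outside the
// package cannot modify a value once it has been built.
type spanContext struct {
	traceID [16]byte
	spanID  [8]byte
	flags   byte
}

// withFlags returns a modified copy. The receiver is passed by value,
// so the original spanContext is left untouched.
func (sc spanContext) withFlags(flags byte) spanContext {
	sc.flags = flags
	return sc
}

func main() {
	original := spanContext{traceID: [16]byte{0x0a}, spanID: [8]byte{0xb7}}
	sampled := original.withFlags(0x01)

	fmt.Printf("original flags: %02x\n", original.flags)
	fmt.Printf("sampled flags:  %02x\n", sampled.flags)
	fmt.Println("same trace:", original.traceID == sampled.traceID)
}
```

Because every "update" yields a new value, a SpanContext that has already been handed to another goroutine or serialized into headers cannot change underneath its reader.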
OTel's SpanContext conforms to the W3C Trace Context standard.
Chapter 2 of the standard says:
At a minimum they MUST propagate the traceparent and tracestate headers and guarantee traces are not broken. This behavior is also referred to as forwarding a trace.
traceparent contains:
The traceparent HTTP header field identifies the incoming request in a tracing system. It has four fields:
version
trace-id
parent-id
trace-flags
tracestate is described as:
The tracestate header includes the parent in a potentially vendor-specific format:
tracestate: congo=t61rcWkgMzE
For example, say a client and server in a system use different tracing vendors: Congo and Rojo. A client traced in the Congo system adds the following headers to an outbound HTTP request.
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: congo=t61rcWkgMzE
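The Go code below shows clearly that OTel was designed to follow this standard.
A Span can have a causal relationship with one or more SpanContexts.
As in the earlier figure, each circle represents a process boundary.
The source side extracts information from its context and injects it to form a new SpanContext.
The other side extracts the Span and combines it with its own context to form a new SpanContext.
The two spans are still related to each other.
A single span can also cross several process boundaries at once.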
```go
// TraceID is a unique identity of a trace.
// nolint:revive // revive complains about stutter of `trace.TraceID`.
type TraceID [16]byte

// SpanID is a unique identity of a span in a trace.
type SpanID [8]byte

// SpanContext contains identifying trace information about a Span.
type SpanContext struct {
	traceID TraceID
	spanID  SpanID
	// TraceFlags carries details about the trace. Unlike TraceState values,
	// TraceFlags are present in every trace.
	traceFlags TraceFlags
	traceState TraceState
	remote     bool
}

// TraceState carries additional vendor-specific trace identification.
type TraceState struct {
	list []member
}

type member struct {
	Key   string
	Value string
}
```
This is the Go source for Inject and Extract:
```go
// Inject set tracecontext from the Context into the carrier.
func (tc TraceContext) Inject(ctx context.Context, carrier TextMapCarrier) {
	// Get the SpanContext of the current Span from ctx.
	sc := trace.SpanContextFromContext(ctx)
	if !sc.IsValid() {
		return
	}

	if ts := sc.TraceState().String(); ts != "" {
		carrier.Set(tracestateHeader, ts)
	}

	// Clear all flags other than the trace-context supported sampling bit.
	flags := sc.TraceFlags() & trace.FlagsSampled

	h := fmt.Sprintf("%.2x-%s-%s-%s",
		supportedVersion,
		sc.TraceID(),
		sc.SpanID(),
		flags)
	carrier.Set(traceparentHeader, h)
}

// Extract reads tracecontext from the carrier into a returned Context.
//
// The returned Context will be a copy of ctx and contain the extracted
// tracecontext as the remote SpanContext. If the extracted tracecontext is
// invalid, the passed ctx will be returned directly instead.
func (tc TraceContext) Extract(ctx context.Context, carrier TextMapCarrier) context.Context {
	sc := tc.extract(carrier)
	if !sc.IsValid() {
		return ctx
	}
	return trace.ContextWithRemoteSpanContext(ctx, sc)
}
```
The Span interface also exposes an accessor for the SpanContext:
```go
// Warning: methods may be added to this interface in minor releases.
type Span interface {
	End(options ...SpanEndOption)
	AddEvent(name string, options ...EventOption)
	IsRecording() bool
	RecordError(err error, options ...EventOption)

	// SpanContext returns the SpanContext of the Span. The returned SpanContext
	// is usable even after the End method has been called for the Span.
	SpanContext() SpanContext

	SetStatus(code codes.Code, description string)
	SetName(name string)
	SetAttributes(kv ...attribute.KeyValue)
	TracerProvider() TracerProvider
}
```
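A Trace is really a tree of related Spans.
A span with no parent span is called the root span.
When the root span is created, a new TraceID is generated with it.
A child span has the same TraceID as its parent span.
A child span also inherits the entire TraceState of its parent span.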
```go
// WithNewRoot specifies that the Span should be treated as a root Span. Any
// existing parent span context will be ignored when defining the Span's trace
// identifiers.
func WithNewRoot() SpanStartOption {
	return spanOptionFunc(func(cfg SpanConfig) SpanConfig {
		cfg.newRoot = true
		return cfg
	})
}

// Tracer is the creator of Spans.
//
// Warning: methods may be added to this interface in minor releases.
type Tracer interface {
	// Start creates a span and a context.Context containing the newly-created span.
	//
	// If the context.Context provided in `ctx` contains a Span then the newly-created
	// Span will be a child of that span, otherwise it will be a root span. This behavior
	// can be overridden by providing `WithNewRoot()` as a SpanOption, causing the
	// newly-created Span to be a root span even if `ctx` contains a Span.
	//
	// When creating a Span it is recommended to provide all known span attributes using
	// the `WithAttributes()` SpanOption as samplers will only have access to the
	// attributes provided when a Span is created.
	//
	// Any Span that is created MUST also be ended. This is the responsibility of the user.
	// Implementations of this API may leak memory or other resources if Spans are not ended.
	Start(ctx context.Context, spanName string, opts ...SpanStartOption) (context.Context, Span)
}
```
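The comments on the Tracer interface mention the root span as well.
TraceID generation itself is implemented in the SDK; see the IDGenerator there for reference.
A Span reflects the state of an operation, which may succeed or fail.
That outcome is reflected in the Span's StatusCode.
StatusCode has three values: Unset, Ok, and Error.
Here is the code: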
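The root and child rule can be sketched without the SDK. The code below is an illustration, not the SDK's IDGenerator: `spanIDs` and `newSpanIDs` are hypothetical names, and `crypto/rand` is simply an assumed source of randomness.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

type traceID [16]byte
type spanID [8]byte

// spanIDs is a hypothetical record of the identifiers a span carries.
type spanIDs struct {
	trace  traceID
	span   spanID
	parent spanID // zero value means "no parent", that is, a root span
}

// newSpanIDs returns the identifiers for a new span. With a nil parent it
// starts a new trace; otherwise it reuses the parent's trace ID and records
// the parent's span ID.
func newSpanIDs(parent *spanIDs) (spanIDs, error) {
	var ids spanIDs
	if _, err := rand.Read(ids.span[:]); err != nil {
		return spanIDs{}, err
	}
	if parent == nil {
		// Root span: a fresh TraceID is generated together with the span.
		if _, err := rand.Read(ids.trace[:]); err != nil {
			return spanIDs{}, err
		}
		return ids, nil
	}
	// Child span: same TraceID as the parent.
	ids.trace = parent.trace
	ids.parent = parent.span
	return ids, nil
}

func main() {
	root, err := newSpanIDs(nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	child, err := newSpanIDs(&root)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("root : trace=%x span=%x\n", root.trace, root.span)
	fmt.Printf("child: trace=%x span=%x parent=%x\n", child.trace, child.span, child.parent)
	fmt.Println("same trace:", root.trace == child.trace)
}
```

Passing `nil` here plays the role that `WithNewRoot()` plays in the real API: it forces a new trace regardless of what came before.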
```go
s := &tracepb.Span{
	TraceId:                tid[:],
	SpanId:                 sid[:],
	TraceState:             sd.SpanContext().TraceState().String(),
	Status:                 status(sd.Status().Code, sd.Status().Description),
	StartTimeUnixNano:      uint64(sd.StartTime().UnixNano()),
	EndTimeUnixNano:        uint64(sd.EndTime().UnixNano()),
	Links:                  links(sd.Links()),
	Kind:                   spanKind(sd.SpanKind()),
	Name:                   sd.Name(),
	Attributes:             KeyValues(sd.Attributes()),
	Events:                 spanEvents(sd.Events()),
	DroppedAttributesCount: uint32(sd.DroppedAttributes()),
	DroppedEventsCount:     uint32(sd.DroppedEvents()),
	DroppedLinksCount:      uint32(sd.DroppedLinks()),
}

// status transform a span code and message into an OTLP span status.
func status(status codes.Code, message string) *tracepb.Status {
	var c tracepb.Status_StatusCode
	switch status {
	case codes.Ok:
		c = tracepb.Status_STATUS_CODE_OK
	case codes.Error:
		c = tracepb.Status_STATUS_CODE_ERROR
	default:
		c = tracepb.Status_STATUS_CODE_UNSET
	}
	return &tracepb.Status{
		Code:    c,
		Message: message,
	}
}
```
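As far as I understand the specification, the three status codes are ordered Ok > Error > Unset: once a span is marked Ok, later attempts to set Error are ignored, and a description is only meaningful for Error. A minimal sketch of that rule follows; `statusCode`, `spanStatus`, and `set` are hypothetical names, and this is not the SDK's code.

```go
package main

import "fmt"

// statusCode mirrors the three span status values. The numeric order
// encodes the assumed precedence Unset < Error < Ok.
type statusCode int

const (
	statusUnset statusCode = iota // default: nothing was set
	statusError                   // the operation failed
	statusOk                      // explicitly marked as successful
)

// spanStatus is a hypothetical holder for a span's status.
type spanStatus struct {
	code        statusCode
	description string
}

// set applies a new status unless the current one has higher precedence.
// A description is kept only for statusError.
func (s *spanStatus) set(code statusCode, description string) {
	if s.code > code {
		return
	}
	s.code = code
	s.description = ""
	if code == statusError {
		s.description = description
	}
}

func main() {
	var s spanStatus // starts as statusUnset

	s.set(statusError, "upstream timeout")
	fmt.Println(s.code == statusError, s.description)

	s.set(statusOk, "")
	s.set(statusError, "late failure") // ignored: Ok has higher precedence
	fmt.Println(s.code == statusOk, s.description == "")
}
```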
The trace list shows **1 Error**.
Open that Trace and you can see that its error tag is true.
On the Jaeger Monitor page, the error rate reflects it as well.
A friend sent me this figure today and asked what each of the attributes on it means.
They are all fields of a Span and its SpanContext.
Span Context
SpanId, TraceId, ParentId
I think readers who have made it this far already know these, right?
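TraceState and TraceFlags are related to sampling.
Under the probability sampling specification, TraceState carries two values, r-value and p-value.
TraceFlags is just an 8-bit field, and W3C Trace Context Level 1 defines only one flag in it: sampled.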
TraceState: Probability Sampling
W3C - Sampled Flag
I will add more here once I understand these two documents better.
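Tags
Key-value pairs that annotate and supplement the Span (OTel calls them Attributes).
We can add key-value pairs for known scenarios, and even use them to help find the matching logs and metrics (for example, PromQL labels and values).
Known scenarios usually mean business-related data.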
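Baggage
Also key-value pairs, but this information is propagated to every Span along the request chain. In OpenTracing it was stored in the SpanContext; in OTel it lives in the Context and is propagated separately from the SpanContext.
Its use is similar to Tags: it annotates and supplements the whole context, which helps when looking up logs and metrics.
Baggage usually holds data that is not business-specific, such as UserID or AccountID.
Baggage is global in scope across the whole request chain.
Span Tags, by contrast, are not passed to the next span and are not inherited by child Spans.
So adding more Span Tags has no multiplying effect on network or storage cost.
Baggage does have that problem, because every entry travels with every downstream call.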
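The cost difference can be illustrated with a small sketch. It assumes Baggage travels as a `baggage` header of comma-separated `key=value` pairs, as in the W3C Baggage format. `span`, `encodeBaggage`, the three hops, and all the attribute values are hypothetical, and `url.PathEscape` is only an approximation of the percent-encoding that format asks for.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// span is a hypothetical, simplified span: attributes stay on the span itself.
type span struct {
	name       string
	attributes map[string]string
}

// encodeBaggage renders the entries as a baggage header value:
// key=value pairs joined by commas, with values percent-encoded.
// Keys are sorted only to make the output deterministic.
func encodeBaggage(b map[string]string) string {
	keys := make([]string, 0, len(b))
	for k := range b {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+"="+url.PathEscape(b[k]))
	}
	return strings.Join(pairs, ",")
}

func main() {
	baggage := map[string]string{"UserID": "u-123", "AccountID": "a-42"}
	header := encodeBaggage(baggage)

	// Three hypothetical hops; each span keeps its own attributes locally.
	hops := []span{
		{name: "gateway", attributes: map[string]string{"http.route": "/orders"}},
		{name: "orders", attributes: map[string]string{"order.items": "3"}},
		{name: "billing", attributes: map[string]string{"currency": "TWD"}},
	}
	total := 0
	for _, s := range hops {
		total += len(header) // the full baggage header rides on every hop
		fmt.Printf("%s: attributes=%v baggage=%q\n", s.name, s.attributes, header)
	}
	fmt.Println("baggage bytes carried in total:", total)
}
```

Each span's attributes are reported once with that span, while the same baggage header is repeated on every hop, so its cost grows with the length of the call chain.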
For some of these terms, looking them up in OpenCensus or OpenTracing material, where Chinese explanations are available, can be quicker.
Baggage != Span Attributes
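Hmm... keeping up with Ironman posts over the Mid-Autumn Festival is tough.
So far, tracing feels to me like glue that ties a bunch of signals together.
Tracing has to be enabled in your framework for these IDs to be generated!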
OTel specification - Trace API