Becoming a Special-Grade Kubernetes Sorcerer: 30 Days of Training, Part 27

Part 27: OpenTelemetry's First Step of Domain Expansion into Kubernetes


In the previous posts I introduced observability and OpenTelemetry, the open-source observability framework we are using, covering:

  • Its basic building blocks
  • Core OpenTelemetry terminology (traces, the three telemetry signals, context propagation, and so on)
  • The architecture of the OpenTelemetry Collector (receivers, processors, exporters)
  • A quick walkthrough of generating telemetry and linking spans with Python code, plus some of the challenges I ran into

Only then did I realize I still hadn't covered OpenTelemetry on Kubernetes; honestly, this material took me a long time to digest XD
Working through everything above took me several months of constant trial and error, pulling knowledge out of each attempt. The really strong engineers I know grasp the big picture first and then experiment; I experiment first and digest as I go. Otherwise the concepts stay fuzzy: without trying it yourself, you won't really understand what observability is, or why microservices need it.

In my own Kubernetes cluster I ended up deploying just three things: the Operator, a Collector, and an Instrumentation resource.

We deploy the Operator so that we can create its predefined CRDs (CustomResourceDefinitions) in Kubernetes:
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

A small reminder here: the official docs have one line pointing out that cert-manager must be installed first.

To install the operator in an existing cluster, make sure you have cert-manager installed and run:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.6.3/cert-manager.yaml

Don't be like me and run the operator command the moment you see it, without reading the English first XD
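
Before creating any of the OpenTelemetry custom resources, it's worth checking that both installs actually came up. A minimal sanity check, assuming the default namespaces created by the two manifests above (cert-manager and opentelemetry-operator-system):

# cert-manager controller and webhook pods should be Running
kubectl get pods -n cert-manager

# the operator's controller-manager should be Running, and its CRDs registered
kubectl get pods -n opentelemetry-operator-system
kubectl get crd | grep opentelemetry.io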

Once the Operator is installed, you can deploy an OpenTelemetryCollector resource in Kubernetes:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
spec:
  mode: daemonset
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
            cors:
              allowed_origins:
                - "http://*"
                - "https://*"
    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 15
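      # NOTE: memory_limiter is declared here but not referenced by any pipeline below;
      # it only takes effect if listed (before batch) in a pipeline's processors.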
      batch:
        send_batch_size: 10000
        timeout: 10s
    exporters:
      # NOTE: Prior to v0.86.0 use `logging` instead of `debug`.
      debug:
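      # NOTE: 14250 is the Jaeger collector's native gRPC port (the one jaeger-agent talks to);
      # an otlp exporter normally targets Jaeger's OTLP gRPC port 4317 instead, as otlp/jaeger/B does.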
      otlp/jaeger/A:
        endpoint: "jaeger-collector.xxx.svc.cluster.local:14250"
        tls:
          insecure: true
      otlp/jaeger/B:
        endpoint: "jaeger-collector.xxx.svc.cluster.local:4317"
        tls:
          insecure: true
      otlp/python/C:
        endpoint: "xx.xx.x.xxx:8003"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp/jaeger/A, otlp/jaeger/B, otlp/python/C, debug]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
        logs:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
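
Because mode is daemonset, the operator renders this CR into a DaemonSet (one Collector pod per node) plus a Service in front of the OTLP ports; it names the generated resources <name>-collector, which is why the endpoints later in this post use otelcol-collector. A quick sanity check, assuming the CR lives in the same xxx namespace as everything else here:

kubectl get opentelemetrycollector -n xxx
kubectl get daemonset otelcol-collector -n xxx
kubectl get svc otelcol-collector -n xxx    # the OTLP endpoint the apps will send to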

Once the Collector is deployed, you can deploy the Instrumentation resource:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: xxx-instrumentation
spec:
  exporter:
    # the Python auto-instrumentation exports OTLP over http/protobuf by default, so use the Collector's 4318 (HTTP) endpoint
    endpoint: http://otelcol-collector.xxx.svc.cluster.local:4318
  propagators:
    - tracecontext
    - baggage
    - b3
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
    env:
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: 'true'
  sampler:
    type: parentbased_traceidratio
    argument: "1"
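
The Instrumentation CR by itself doesn't change anything; injection is opted into per workload through an annotation on the pod template, which you'll see in the Deployment below. For reference, a value of "true" uses the Instrumentation in the pod's own namespace, and you can also point at a specific one by name:

  annotations:
    # use the (single) Instrumentation in the same namespace
    instrumentation.opentelemetry.io/inject-python: "true"
    # or reference a specific Instrumentation, optionally in another namespace:
    # instrumentation.opentelemetry.io/inject-python: "xxx-instrumentation"
    # instrumentation.opentelemetry.io/inject-python: "other-namespace/xxx-instrumentation"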

Today I'll use one of my own microservices as the example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: xxx
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
      annotations: 
        instrumentation.opentelemetry.io/inject-python: "true"
    spec:
      containers:
      - name: test
        image: lgcat/exp_test:v1
        imagePullPolicy: Always
        ports:
        - containerPort: 8000
        env:
        - name: REDIS_SERVER
          value: "redis://redis.xxx.svc.cluster.local:6379"
        - name: USER_SERVICE
          value: http://test1-service.xxx.svc.cluster.local:8001
        - name: OTLP_GRPC_ENDPOINT
          value: http://otelcol-collector.xxx.svc.cluster.local:4317
        resources:
          requests:
            cpu: "50m"
            memory: "64Mi"
      imagePullSecrets:
        - name: regcred
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - 10.20.1.231
                - 10.20.1.233
---
apiVersion: v1
kind: Service
metadata:
  name: test
  namespace: xxx
  labels:
    app: test-service
spec:
  selector:
    app: test
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000   # matches the containerPort exposed by the Deployment above
      nodePort: 30080    # NodePort values must fall within the default 30000-32767 range
  type: NodePort

After deploying, we can describe the microservice's pod and see that it differs a little from an ordinary deployment: an extra init container shows up, as below.
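
For example, using the labels and namespace from the manifests above:

kubectl describe pod -l app=test -n xxx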

Init Containers:
  opentelemetry-auto-instrumentation-python:
    Container ID:  docker://30549f52f5d417cbc14a5f4061c163f7361bbcbdfcd4d7e80b1f7ae38fdee8b9
    Image:         ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
    Image ID:      docker-pullable://ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python@sha256:6cc9c9f0a0f320f9585c3869e62b960c8ce5f92751b07ca22372805c62e987b0
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -r
      /autoinstrumentation/.
      /otel-auto-instrumentation-python
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 28 Sep 2024 06:11:24 +0000
      Finished:     Sat, 28 Sep 2024 06:11:31 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  32Mi
    Requests:
      cpu:        50m
      memory:     32Mi
    Environment:  <none>
    Mounts:
      /otel-auto-instrumentation-python from opentelemetry-auto-instrumentation-python (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rfzm7 (ro)
Containers:
  test:
    Container ID:   docker://c23f31148de65dc6186cbbcfcdb95643482c90817d9aa57dbb324dcc198d5405
    Image:          lgcat/exp_test:v1
    Image ID:       docker-pullable://lgcat/exp_test@sha256:fccbd8e3c66986c429afba3be6b61963c5c523cbfe91edb3e47459643afecacb
    Port:           8000/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sat, 28 Sep 2024 06:11:39 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     50m
      memory:  64Mi
    Environment:
      OTEL_NODE_IP:                                       (v1:status.hostIP)
      OTEL_POD_IP:                                        (v1:status.podIP)
      REDIS_SERVER:                                      redis://redis.xxx.svc.cluster.local:6379
      USER_SERVICE:                                      http://test1-service.xxx.svc.cluster.local:8001
      OTLP_GRPC_ENDPOINT:                                http://otelcol-collector.xxx.svc.cluster.local:4317
      OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED:  true
      PYTHONPATH:                                        /otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation:/otel-auto-instrumentation-python
      OTEL_TRACES_EXPORTER:                              otlp
      OTEL_EXPORTER_OTLP_TRACES_PROTOCOL:                http/protobuf
      OTEL_METRICS_EXPORTER:                             otlp
      OTEL_EXPORTER_OTLP_METRICS_PROTOCOL:               http/protobuf
      OTEL_SERVICE_NAME:                                 test
      OTEL_EXPORTER_OTLP_ENDPOINT:                       http://otelcol-collector.xxx.svc.cluster.local:4318
      OTEL_RESOURCE_ATTRIBUTES_POD_NAME:                 test-74d786f79f-xm8kx (v1:metadata.name)
      OTEL_RESOURCE_ATTRIBUTES_NODE_NAME:                 (v1:spec.nodeName)
      OTEL_PROPAGATORS:                                  tracecontext,baggage,b3
      OTEL_TRACES_SAMPLER:                               parentbased_traceidratio
      OTEL_TRACES_SAMPLER_ARG:                           1
      OTEL_RESOURCE_ATTRIBUTES:                          k8s.container.name=test,k8s.deployment.name=test,k8s.namespace.name=xxx,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.replicaset.name=test-74d786f79f,service.instance.id=xxx.$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME).test,service.version=v1
    Mounts:
      /otel-auto-instrumentation-python from opentelemetry-auto-instrumentation-python (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rfzm7 (ro)

From the output above, we can see that the init container's main job is to copy the OpenTelemetry auto-instrumentation agent from the /autoinstrumentation/ directory inside its image into /otel-auto-instrumentation-python, a directory the main container can also reach. The copy is performed by the command cp -r /autoinstrumentation/. /otel-auto-instrumentation-python.

The init container and the main container share the /otel-auto-instrumentation-python mount point. That means once the init container has finished copying, the main container can access the installed OpenTelemetry auto-instrumentation for the rest of its lifetime.
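
Under the hood, the operator's admission webhook rewrites the pod spec when the pod is created. Roughly sketched (an illustration of the idea, not the operator's exact output), the injected pieces look like this:

spec:
  volumes:
    - name: opentelemetry-auto-instrumentation-python
      emptyDir: {}                  # scratch volume shared by the init and main containers
  initContainers:
    - name: opentelemetry-auto-instrumentation-python
      image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
      command: ["cp", "-r", "/autoinstrumentation/.", "/otel-auto-instrumentation-python"]
      volumeMounts:
        - name: opentelemetry-auto-instrumentation-python
          mountPath: /otel-auto-instrumentation-python
  # the main container gets the same volumeMount, plus the PYTHONPATH and OTEL_* env vars shown in the describe output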

After the main container starts, it picks up the OpenTelemetry agent from this mounted directory (via the injected PYTHONPATH) and automatically begins collecting traces, logs, and metrics.
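
An easy way to confirm telemetry is really flowing end to end is to watch the Collector's logs, since the debug exporter configured earlier prints the received spans, metrics, and logs to stdout. For example, against the DaemonSet generated above:

kubectl logs daemonset/otelcol-collector -n xxx --tail=50 -f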

So by deploying the Operator, the Collector, and the Instrumentation resource, we get all of this with surprisingly little work. It looks simple, but it is a big step toward observability on Kubernetes: the Instrumentation generates the telemetry and exports it to the Collector, and the Collector's exporters then forward it to whatever visualization tools come next.

