[25] Why EKS add-ons can manage Kubernetes plugins


In Kubernetes we frequently use third-party plugins to integrate with AWS services, and EKS provides matching plugin versions such as the VPC CNI plugin, CoreDNS, kube-proxy, and the EBS CSI driver. The EKS add-on feature delivers these as managed add-ons: security updates and vulnerability fixes are handled and rolled out by the EKS add-on feature itself.

This article therefore looks at why EKS can actively update Kubernetes plugins that are already deployed, and why the environment variables we set ourselves are not overwritten in the process.

Enabling an EKS add-on

The following example updates the existing VPC CNI plugin through an EKS add-on:

  1. Check the IAM role ARN of the IRSA (IAM Roles for Service Accounts) [2] associated with the aws-node Service Account.
$ kubectl -n kube-system get sa aws-node -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/eksctl-ironman-2022-addon-iamserviceaccount-kube-Role1-EMRZZYBVKCRL
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"aws-node","namespace":"kube-system"}}
  creationTimestamp: "2022-09-14T09:46:16Z"
  labels:
    app.kubernetes.io/managed-by: eksctl
  name: aws-node
  namespace: kube-system
  resourceVersion: "1657"
  uid: 10f55f1c-cff2-47cd-96e0-3ca6d37e69ce
secrets:
- name: aws-node-token-9z9wq
  2. Confirm that the VPC CNI plugin version currently in use is v1.10.1.
$ kubectl -n kube-system get ds aws-node -o yaml | grep "image: "
        image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni:v1.10.1-eksbuild.1
        image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni-init:v1.10.1-eksbuild.1
  3. Use the eksctl utils describe-addon-versions command to list the add-on versions supported by the current EKS cluster. The excerpt below shows that the latest vpc-cni version is v1.11.4-eksbuild.1.
$ eksctl utils describe-addon-versions --cluster=ironman-2022 | tail -n 1 | jq .
...
...
...

  {
    "AddonName": "vpc-cni",
    "AddonVersions": [
      {
        "AddonVersion": "v1.11.4-eksbuild.1",
        "Architecture": [
          "amd64",
          "arm64"
        ],
        "Compatibilities": [
          {
            "ClusterVersion": "1.22",
            "DefaultVersion": false,
            "PlatformVersions": [
              "*"
            ]
          }
        ]
      },
  4. Use the eksctl create addon command to update the VPC CNI plugin to v1.11.4-eksbuild.1 and bind the corresponding Service Account IAM role ARN.
$ eksctl create addon --name vpc-cni --version v1.11.4-eksbuild.1 --service-account-role-arn="arn:aws:iam::111111111111:role/eksctl-ironman-2022-addon-iamserviceaccount-kube-Role1-EMRZZYBVKCRL" --cluster=ironman-2022
2022-10-10 11:22:30 [ℹ]  Kubernetes version "1.22" in use by cluster "ironman-2022"
2022-10-10 11:22:30 [ℹ]  when creating an addon to replace an existing application, e.g. CoreDNS, kube-proxy & VPC-CNI the --force flag will ensure the currently deployed configuration is replaced
2022-10-10 11:22:30 [ℹ]  using provided ServiceAccountRoleARN "arn:aws:iam::111111111111:role/eksctl-ironman-2022-addon-iamserviceaccount-kube-Role1-EMRZZYBVKCRL"
2022-10-10 11:22:30 [ℹ]  creating addon
Error: addon status transitioned to "CREATE_FAILED"

From the error output above we can see that the add-on creation failed with CREATE_FAILED.
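Before forcing anything, the failure detail can also be read from the add-on itself. A minimal sketch with the AWS CLI (the exact JSON shape may differ slightly between CLI versions):

$ aws eks describe-addon --cluster-name ironman-2022 --addon-name vpc-cni \
    --query 'addon.{status: status, issues: health.issues}'
# Expect status CREATE_FAILED together with health issues describing the field conflicts.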
  5. Retry with the --force flag to force the update.

$ eksctl create addon --name vpc-cni --version v1.11.4-eksbuild.1 --service-account-role-arn="arn:aws:iam::111111111111:role/eksctl-ironman-2022-addon-iamserviceaccount-kube-Role1-EMRZZYBVKCRL" --cluster=ironman-2022 --force
2022-10-10 11:24:02 [ℹ]  Kubernetes version "1.22" in use by cluster "ironman-2022"
2022-10-10 11:24:02 [ℹ]  using provided ServiceAccountRoleARN "arn:aws:iam::111111111111:role/eksctl-ironman-2022-addon-iamserviceaccount-kube-Role1-EMRZZYBVKCRL"
2022-10-10 11:24:02 [ℹ]  creating addon
2022-10-10 11:27:32 [ℹ]  addon "vpc-cni" active
  6. Confirm that the VPC CNI plugin image has been updated to v1.11.4-eksbuild.1.
$ kubectl -n kube-system get ds aws-node -o yaml | grep "image: "
        image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni:v1.11.4-eksbuild.1
        image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni-init:v1.11.4-eksbuild.1
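The plugin is now managed as an EKS add-on. As a quick sanity check, a sketch with eksctl (column layout varies between eksctl versions):

$ eksctl get addon --cluster ironman-2022
# Expect a vpc-cni row showing VERSION v1.11.4-eksbuild.1, STATUS ACTIVE, and the bound IAM role ARN.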

Analysis

As usual, whenever something changes the first stop is the audit log. Use the following CloudWatch Logs Insights query to look at updates to the aws-node DaemonSet:

filter @logStream like /^kube-apiserver-audit/
 | fields @timestamp, @message
 | sort @timestamp asc
 | filter objectRef.name == 'aws-node' AND objectRef.resource == 'daemonsets' AND verb == 'patch'
 | limit 10000
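The same query can also be kicked off from the CLI instead of the console. A minimal sketch, assuming audit logging is enabled and the control-plane logs are in the default log group /aws/eks/ironman-2022/cluster (GNU date syntax):

$ QUERY_ID=$(aws logs start-query \
    --log-group-name /aws/eks/ironman-2022/cluster \
    --start-time $(date -d '-2 hours' +%s) --end-time $(date +%s) \
    --query-string "filter @logStream like /^kube-apiserver-audit/ | filter objectRef.name == 'aws-node' AND objectRef.resource == 'daemonsets' AND verb == 'patch' | fields @timestamp, @message | sort @timestamp asc | limit 10000" \
    --output text --query queryId)
$ aws logs get-query-results --query-id "$QUERY_ID"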

The first record is the failed initial update made through the eksctl command; it failed with FieldManagerConflict on the .spec.template.spec.containers and .spec.template.spec.initContainers fields:

    "@message": {
      "kind": "Event",
      "apiVersion": "audit.k8s.io/v1",
      "level": "RequestResponse",
      "auditID": "b7e38710-2b5c-4317-a899-7a8b0f603b80",
      "stage": "ResponseComplete",
      "requestURI": "/apis/apps/v1/namespaces/kube-system/daemonsets/aws-node?dryRun=All&fieldManager=eks&force=false&timeout=10s",
      "verb": "patch",
      "user": {
        "username": "eks:addon-manager",
        "uid": "aws-iam-authenticator:348370419699:AROAVCHD6X7Z4UOOTWH7N",
        "groups": [
          "system:authenticated"
        ],
        ...
      "userAgent": "Go-http-client/1.1",
      "objectRef": {
        "resource": "daemonsets",
        "namespace": "kube-system",
        "name": "aws-node",
        "apiGroup": "apps",
        "apiVersion": "v1"
      },
      "responseStatus": {
        "metadata": {},
        "status": "Failure",
        "reason": "Conflict",
        "code": 409
      },
      ...
      ...
      ...
      "responseObject": {
        "kind": "Status",
        "apiVersion": "v1",
        "metadata": {},
        "status": "Failure",
        "message": "Apply failed with 2 conflicts: conflicts with \"kubectl-client-side-apply\" using apps/v1:\n- .spec.template.spec.containers[name=\"aws-node\"].image\n- .spec.template.spec.initContainers[name=\"aws-vpc-cni-init\"].image",
        "reason": "Conflict",
        "details": {
          "causes": [
            {
              "reason": "FieldManagerConflict",
              "message": "conflict with \"kubectl-client-side-apply\" using apps/v1",
              "field": ".spec.template.spec.containers[name=\"aws-node\"].image"
            },
            {
              "reason": "FieldManagerConflict",
              "message": "conflict with \"kubectl-client-side-apply\" using apps/v1",
              "field": ".spec.template.spec.initContainers[name=\"aws-vpc-cni-init\"].image"
            }
          ]
        },
        "code": 409
      },
      "requestReceivedTimestamp": "2022-10-10T11:22:31.607660Z",
      "stageTimestamp": "2022-10-10T11:22:31.613812Z",

The second record is the successful update made with --force, in which we can observe:

  • requestURI: force=true is used
  • responseObject: an additional managedFields entry appears, whose manager is eks.
    "@message": {
      "kind": "Event",
      "apiVersion": "audit.k8s.io/v1",
      "level": "RequestResponse",
      "auditID": "40e4c444-65f0-49a8-8efa-f019a0e38508",
      "stage": "ResponseComplete",
      "requestURI": "/apis/apps/v1/namespaces/kube-system/daemonsets/aws-node?fieldManager=eks&force=true&timeout=10s",
      "verb": "patch",
      "user": {
        "username": "eks:addon-manager",
        "uid": "aws-iam-authenticator:348370419699:AROAVCHD6X7Z4UOOTWH7N",
        "groups": [
          "system:authenticated"
        ],
      ...
      ...
      ...
      "userAgent": "Go-http-client/1.1",
      "objectRef": {
        "resource": "daemonsets",
        "namespace": "kube-system",
        "name": "aws-node",
        "apiGroup": "apps",
        "apiVersion": "v1"
      },
      "responseStatus": {
        "metadata": {},
        "code": 200
      },
      ...
      ...
      "responseObject": {
        "kind": "DaemonSet",
        "apiVersion": "apps/v1",
        "metadata": {
          "name": "aws-node",
          "namespace": "kube-system",
          "uid": "19ae2e66-9bf4-4d21-9161-c0f564a62b34",
          "resourceVersion": "7590998",
          "generation": 8,
          "creationTimestamp": "2022-09-14T09:46:16Z",
          "labels": {
            "k8s-app": "aws-node"
          },
        ...
        ...
        ...
          "managedFields": [
            {
              "manager": "eks",
              "operation": "Apply",
              "apiVersion": "apps/v1",
              "time": "2022-10-10T11:24:03Z",
              "fieldsType": "FieldsV1",
              "fieldsV1": {
                "f:metadata": {
                  "f:labels": {
                    "f:k8s-app": {}
                  }
                },
                "f:spec": {
                  "f:selector": {},
                  "f:template": {
                    "f:metadata": {
                      "f:labels": {
                        "f:app.kubernetes.io/name": {},
                        "f:k8s-app": {}
                      }
                    },
                    "f:spec": {
                      "f:affinity": {
                        "f:nodeAffinity": {
                          "f:requiredDuringSchedulingIgnoredDuringExecution": {}
                        }
                      },
                      "f:containers": {
                        "k:{\"name\":\"aws-node\"}": {
                          ".": {},
                          "f:env": {
                            "k:{\"name\":\"AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER\"}": {
                              ".": {},
                              "f:name": {},
                              "f:value": {}
                            },
    ...
    ...

We can therefore also observe field management with the kubectl --show-managed-fields flag.

$ kubectl -n kube-system get ds aws-node --show-managed-fields -o yaml | grep -A5 "managedFields"
  managedFields:
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
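To list every field manager that currently owns part of the aws-node DaemonSet, a sketch using kubectl's jsonpath output:

$ kubectl -n kube-system get ds aws-node --show-managed-fields \
    -o jsonpath='{range .metadata.managedFields[*]}{.manager}{"\t"}{.operation}{"\n"}{end}'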

Server-Side Apply

Both the EKS add-on documentation [1] and the launch blog post [3] mention that the EKS add-on feature relies on the Kubernetes 1.18 feature Server-Side Apply [4]. Server-Side Apply moves the logic of kubectl apply into the Kubernetes API server and provides a merge algorithm that resolves conflicts when configuration fields collide.
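The same conflict-and-force behaviour can be reproduced outside of EKS with any resource. A minimal sketch using a hypothetical ConfigMap named ssa-demo and two made-up field managers:

$ cat <<'EOF' > ssa-demo.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ssa-demo
data:
  key: from-manager-a
EOF
# First server-side apply: manager-a becomes the owner of .data.key
$ kubectl apply --server-side --field-manager=manager-a -f ssa-demo.yaml
# Applying a different value as a different manager is rejected with a 409 Conflict,
# just like the first eksctl create addon attempt above
$ sed -i 's/from-manager-a/from-manager-b/' ssa-demo.yaml
$ kubectl apply --server-side --field-manager=manager-b -f ssa-demo.yaml
# Forcing the apply (force=true in the audit log) transfers ownership of .data.key to manager-b
$ kubectl apply --server-side --field-manager=manager-b --force-conflicts -f ssa-demo.yaml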

To track this ownership, field-management information is stored in the managedFields field of the object's metadata [5].

Coming back to how EKS uses field management [6], fields fall into two categories:

Fully managed: when the field-management entries specify only f: sub-fields (without k: values), the field is fully managed and any modification to it will cause a conflict.

Using aws-node as an example below, the container named aws-node is managed by EKS, and its args, image, and imagePullPolicy sub-fields are all managed by EKS. Changing any value in these fields causes a conflict. This is why the first attempt to enable the EKS add-on above failed: it conflicted on .spec.template.spec.containers and .spec.template.spec.initContainers.

  f:containers:
    k:{"name":"aws-node"}:
      .: {}
      f:args: {}
      f:image: {}
      f:imagePullPolicy: {}
      ...
  manager: eks
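To dump exactly which fields the eks manager owns, the same managedFields data can be filtered with jq (a sketch assuming jq is installed):

$ kubectl -n kube-system get ds aws-node --show-managed-fields -o json \
    | jq '.metadata.managedFields[] | select(.manager == "eks") | .fieldsV1'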

Partially managed: when a field-management key has a value specified, modifying that key causes a conflict.

Using the aws-node environment variables as an example, eks manages the environment variables keyed by ADDITIONAL_ENI_TAGS, AWS_VPC_CNI_NODE_PORT_SUPPORT, and so on. Renaming one of these environment variables therefore causes a conflict, while the environment variable values we set ourselves on the EKS add-on can be kept.

f:containers:
  k:{"name":"aws-node"}:
    .: {}
    f:env:
      .: {}
      k:{"name":"ADDITIONAL_ENI_TAGS"}:
        .: {}
        f:name: {}
        f:value: {}
      k:{"name":"AWS_VPC_CNI_NODE_PORT_SUPPORT"}:
        .: {}
        f:name: {}
        f:value: {}
      k:{"name":"AWS_VPC_ENI_MTU"}:
        .: {}
        f:name: {}
        f:value: {}
      k:{"name":"AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER"}:
        .: {}
        f:name: {}
        f:value: {}
      k:{"name":"AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG"}:
        .: {}
        f:name: {}
        f:value: {}
      k:{"name":"AWS_VPC_K8S_CNI_EXTERNALSNAT"}:
        .: {}
        f:name: {}
        f:value: {}
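Following the article's point that values we set ourselves are preserved, a minimal sketch of setting such a value (AWS_VPC_K8S_CNI_EXTERNALSNAT is a real VPC CNI setting; whether to enable it depends on your network design):

# Change only the value of an env var whose key already exists on the DaemonSet
$ kubectl -n kube-system set env daemonset/aws-node AWS_VPC_K8S_CNI_EXTERNALSNAT=true
# Verify the value; per the field-management rules above it should survive later add-on updates
$ kubectl -n kube-system get ds aws-node -o yaml | grep -A1 EXTERNALSNAT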

Summary

The EKS add-on feature is essentially built on Kubernetes Server-Side Apply. Having AWS take care of security fixes and version updates is convenient, but any customization still has to be checked against the field-management entries. For example, some scenarios deploy CoreDNS as a DaemonSet, which may not be achievable through the add-on mechanism.

References

  1. Amazon EKS add-on configuration - https://docs.aws.amazon.com/eks/latest/userguide/add-ons-configuration.html
  2. IAM roles for service accounts - https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
  3. Introducing Amazon EKS add-ons: lifecycle management for Kubernetes operational software - https://aws.amazon.com/blogs/containers/introducing-amazon-eks-add-ons/
  4. Server-Side Apply - https://kubernetes.io/docs/reference/using-api/server-side-apply
  5. ObjectMeta v1 meta - https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.25/#objectmeta-v1-meta
  6. Amazon EKS add-on configuration - Understanding field management syntax in the Kubernetes API - https://docs.aws.amazon.com/eks/latest/userguide/add-ons-configuration.html#add-on-config-management-understanding-field-management
