[10] Why Managed node groups can keep your applications available (Part 1)

In an EKS environment, the node groups that serve as Kubernetes nodes come in two flavors:

  • Managed node groups [1]
  • Self-managed nodes [2]

For Managed node groups, the documentation states that, from the user's point of view, nodes are automatically drained through the Kubernetes API whenever they are updated or terminated, so that application Pods stay available.
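
For reference, this drain is roughly what you would otherwise run by hand with kubectl before replacing a node; a minimal sketch (the node name is only an example):

$ # cordon the node, then evict its Pods onto the remaining nodes
$ kubectl drain ip-192-168-18-230.eu-west-1.compute.internal \
    --ignore-daemonsets --delete-emptydir-data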

This article therefore digs into what extra configuration, if any, EKS Managed node groups apply on the node itself.

Setting up the environment

  1. Create a new managed node group demo-ng with eksctl:
$ eksctl create nodegroup --nodes=1 --ssh-access --ssh-public-key=maowang --managed --name=demo-ng --cluster=ironman-2022
2022-09-25 12:58:38 [ℹ]  will use version 1.22 for new nodegroup(s) based on control plane version
2022-09-25 12:58:38 [ℹ]  nodegroup "demo-ng" will use "" [AmazonLinux2/1.22]
2022-09-25 12:58:39 [ℹ]  using EC2 key pair %!q(*string=<nil>)
2022-09-25 12:58:39 [ℹ]  1 existing nodegroup(s) (ng1-public-ssh) will be excluded
2022-09-25 12:58:39 [ℹ]  1 nodegroup (demo-ng) was included (based on the include/exclude rules)
2022-09-25 12:58:39 [ℹ]  will create a CloudFormation stack for each of 1 managed nodegroups in cluster "ironman-2022"
2022-09-25 12:58:39 [ℹ]
2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create managed nodegroup "demo-ng" } }
}
2022-09-25 12:58:39 [ℹ]  checking cluster stack for missing resources
2022-09-25 12:58:41 [ℹ]  cluster stack has all required resources
2022-09-25 12:58:43 [ℹ]  building managed nodegroup stack "eksctl-ironman-2022-nodegroup-demo-ng"
2022-09-25 12:58:43 [ℹ]  deploying stack "eksctl-ironman-2022-nodegroup-demo-ng"
2022-09-25 12:58:44 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-demo-ng"
2022-09-25 12:59:14 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-demo-ng"
2022-09-25 13:00:04 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-demo-ng"
2022-09-25 13:01:23 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-demo-ng"
2022-09-25 13:02:07 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-demo-ng"
2022-09-25 13:02:09 [ℹ]  no tasks
2022-09-25 13:02:09 [✔]  created 0 nodegroup(s) in cluster "ironman-2022"
2022-09-25 13:02:09 [ℹ]  nodegroup "demo-ng" has 1 node(s)
2022-09-25 13:02:09 [ℹ]  node "ip-192-168-18-230.eu-west-1.compute.internal" is ready
2022-09-25 13:02:09 [ℹ]  waiting for at least 1 node(s) to become ready in "demo-ng"
2022-09-25 13:02:09 [ℹ]  nodegroup "demo-ng" has 1 node(s)
2022-09-25 13:02:09 [ℹ]  node "ip-192-168-18-230.eu-west-1.compute.internal" is ready
2022-09-25 13:02:09 [✔]  created 1 managed nodegroup(s) in cluster "ironman-2022"
2022-09-25 13:02:12 [ℹ]  checking security group configuration for all nodegroups
2022-09-25 13:02:12 [ℹ]  all nodegroups have up-to-date cloudformation templates
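
At this point the new managed node group should be visible both to eksctl and as a Kubernetes node; for example, a quick sanity check against the cluster used in this series:

$ eksctl get nodegroup --cluster=ironman-2022
$ kubectl get nodes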
  2. Create a new self-managed node group demo-self-ng with eksctl:
$ eksctl create nodegroup --nodes=1 --ssh-access --ssh-public-key=maowang --name=demo-self-ng --cluster=ironman-2022 --managed=false
2022-09-25 14:03:44 [ℹ]  will use version 1.22 for new nodegroup(s) based on control plane version
2022-09-25 14:03:45 [ℹ]  nodegroup "demo-self-ng" will use "ami-08702177b0dcfc054" [AmazonLinux2/1.22]
2022-09-25 14:03:45 [ℹ]  using EC2 key pair %!q(*string=<nil>)
2022-09-25 14:03:47 [ℹ]  2 existing nodegroup(s) (demo-ng,ng1-public-ssh) will be excluded
2022-09-25 14:03:47 [ℹ]  1 nodegroup (demo-self-ng) was included (based on the include/exclude rules)
2022-09-25 14:03:47 [ℹ]  will create a CloudFormation stack for each of 1 nodegroups in cluster "ironman-2022"
2022-09-25 14:03:47 [ℹ]
2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create nodegroup "demo-self-ng" } }
}
2022-09-25 14:03:47 [ℹ]  checking cluster stack for missing resources
2022-09-25 14:03:51 [ℹ]  cluster stack has all required resources
2022-09-25 14:03:51 [ℹ]  building nodegroup stack "eksctl-ironman-2022-nodegroup-demo-self-ng"
2022-09-25 14:03:51 [ℹ]  --nodes-min=1 was set automatically for nodegroup demo-self-ng
2022-09-25 14:03:51 [ℹ]  --nodes-max=1 was set automatically for nodegroup demo-self-ng
2022-09-25 14:03:51 [ℹ]  deploying stack "eksctl-ironman-2022-nodegroup-demo-self-ng"
2022-09-25 14:03:51 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-demo-self-ng"
2022-09-25 14:04:21 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-demo-self-ng"
2022-09-25 14:04:52 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-demo-self-ng"
2022-09-25 14:05:26 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-demo-self-ng"
2022-09-25 14:06:53 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-demo-self-ng"
2022-09-25 14:07:53 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-demo-self-ng"
2022-09-25 14:07:55 [ℹ]  no tasks
2022-09-25 14:07:55 [ℹ]  adding identity "arn:aws:iam::111111111111:role/eksctl-ironman-2022-nodegroup-demo-sel-NodeInstanceRole-19TJJ2L7I7GOD" to auth ConfigMap
2022-09-25 14:07:55 [ℹ]  nodegroup "demo-self-ng" has 0 node(s)
2022-09-25 14:07:55 [ℹ]  waiting for at least 1 node(s) to become ready in "demo-self-ng"
2022-09-25 14:08:20 [ℹ]  nodegroup "demo-self-ng" has 1 node(s)
2022-09-25 14:08:20 [ℹ]  node "ip-192-168-61-255.eu-west-1.compute.internal" is ready
2022-09-25 14:08:20 [✔]  created 1 nodegroup(s) in cluster "ironman-2022"
2022-09-25 14:08:20 [✔]  created 0 managed nodegroup(s) in cluster "ironman-2022"
2022-09-25 14:08:26 [ℹ]  checking security group configuration for all nodegroups
2022-09-25 14:08:26 [ℹ]  all nodegroups have up-to-date cloudformation templates
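
Similarly, the self-managed node can be located by the nodegroup-name label that eksctl applies (visible in the kubectl describe output later in this article):

$ kubectl get nodes -l alpha.eksctl.io/nodegroup-name=demo-self-ng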
  3. Create the following Nginx Deployment YAML:
$ cat ./nginx-normal.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: "demo"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "nginx-demo-deployment"
  namespace: "demo"
  labels:
    app: "nginx-demo"
spec:
  selector:
    matchLabels:
      app: "nginx-demo"
  replicas: 4
  template:
    metadata:
      labels:
        app: "nginx-demo"
    spec:
      containers:
      - image: nginx:latest
        imagePullPolicy: Always
        name: "nginx-demo"
        ports:
        - containerPort: 80

  4. Deploy the Nginx Deployment:
$ kubectl apply -f ./nginx-normal.yaml
namespace/demo created
deployment.apps/nginx-demo-deployment created
  5. Check the Nginx Pods:
$ kubectl -n demo get po -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP               NODE                                           NOMINATED NODE   READINESS GATES
nginx-demo-deployment-7587998bf4-dkd5t   1/1     Running   0          14s   192.168.29.28    ip-192-168-18-230.eu-west-1.compute.internal   <none>           <none>
nginx-demo-deployment-7587998bf4-ll6x4   1/1     Running   0          14s   192.168.55.113   ip-192-168-43-123.eu-west-1.compute.internal   <none>           <none>
nginx-demo-deployment-7587998bf4-rcg6f   1/1     Running   0          14s   192.168.78.204   ip-192-168-65-131.eu-west-1.compute.internal   <none>           <none>
nginx-demo-deployment-7587998bf4-znf82   1/1     Running   0          14s   192.168.71.187   ip-192-168-65-131.eu-west-1.compute.internal   <none>           <none>
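
As a side note, instead of polling the Pods manually, a script could wait for the Deployment to finish rolling out; for example:

$ kubectl -n demo rollout status deployment/nginx-demo-deployment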

Comparing the differences

Inspect the labels on each node with kubectl describe:

## managed node group

$ kubectl describe node ip-192-168-18-230.eu-west-1.compute.internal
Name:               ip-192-168-18-230.eu-west-1.compute.internal
Roles:              <none>
Labels:             alpha.eksctl.io/cluster-name=ironman-2022
                    alpha.eksctl.io/nodegroup-name=demo-ng
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.large
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/capacityType=ON_DEMAND
                    eks.amazonaws.com/nodegroup=demo-ng
                    eks.amazonaws.com/nodegroup-image=ami-08702177b0dcfc054
                    eks.amazonaws.com/sourceLaunchTemplateId=lt-04b22845f7d14ce5f
                    eks.amazonaws.com/sourceLaunchTemplateVersion=1
                    failure-domain.beta.kubernetes.io/region=eu-west-1
                    failure-domain.beta.kubernetes.io/zone=eu-west-1a
                    k8s.io/cloud-provider-aws=1603bd56dd5485ad752a90bddddf4c25
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-192-168-18-230.eu-west-1.compute.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m5.large
                    topology.kubernetes.io/region=eu-west-1
                    topology.kubernetes.io/zone=eu-west-1a
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 25 Sep 2022 13:00:52 +0000
...

## self-managed nodes 

$ kubectl describe node ip-192-168-61-255.eu-west-1.compute.internal
Name:               ip-192-168-61-255.eu-west-1.compute.internal
Roles:              <none>
Labels:             alpha.eksctl.io/cluster-name=ironman-2022
                    alpha.eksctl.io/instance-id=i-02eccc7d3fa79fc6f
                    alpha.eksctl.io/nodegroup-name=demo-self-ng
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=eu-west-1
                    failure-domain.beta.kubernetes.io/zone=eu-west-1c
                    k8s.io/cloud-provider-aws=1603bd56dd5485ad752a90bddddf4c25
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-192-168-61-255.eu-west-1.compute.internal
                    kubernetes.io/os=linux
                    node-lifecycle=on-demand
                    node.kubernetes.io/instance-type=m5.large
                    topology.kubernetes.io/region=eu-west-1
                    topology.kubernetes.io/zone=eu-west-1c
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 25 Sep 2022 14:08:00 +0000
...
...

From the output above, the two nodes share the following labels:

  • alpha.eksctl.io: a prefix defined by eksctl, so both nodes carry the same cluster-name and nodegroup-name labels.
  • beta.kubernetes.io: Kubernetes reserves every label in the kubernetes.io and k8s.io namespaces; instance-type is filled in from information supplied by the cloud provider. These labels are also common to both nodes.
  • failure-domain.beta.kubernetes.io: a deprecated label that is being replaced by topology.kubernetes.io; it is likewise populated by the kubelet or an external cloud-controller-manager.

On the managed node group, however, we can additionally observe labels with the eks.amazonaws.com prefix:
  • eks.amazonaws.com/capacityType
  • eks.amazonaws.com/nodegroup
  • eks.amazonaws.com/nodegroup-image
  • eks.amazonaws.com/sourceLaunchTemplateId
  • eks.amazonaws.com/sourceLaunchTemplateVersion
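
A quick way to compare these labels side by side is to print them as extra columns with kubectl; only the managed node should show values in the eks.amazonaws.com columns:

$ kubectl get nodes -L eks.amazonaws.com/nodegroup,eks.amazonaws.com/capacityType,alpha.eksctl.io/nodegroup-name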

Next, to understand where the eks.amazonaws.com-prefixed labels come from, we SSH into each node. There we can confirm that the kubelet is started with extra arguments, and that the two nodes define different labels in their systemd unit drop-in files:
## managed node group

[ec2-user@ip-192-168-18-230 ~]$ sudo ps aux | grep "kubelet"
root      3302  1.1  1.3 1822960 109300 ?      Ssl  13:00   1:08 /usr/bin/kubelet --cloud-provider aws --config /etc/kubernetes/kubelet/kubelet-config.json --kubeconfig /var/lib/kubelet/kubeconfig --container-runtime docker --network-plugin cni --node-ip=192.168.16.253 --pod-infra-container-image=602401143452.dkr.ecr.eu-west-1.amazonaws.com/eks/pause:3.5 --v=2 --node-labels=eks.amazonaws.com/sourceLaunchTemplateVersion=1,alpha.eksctl.io/nodegroup-name=demo-ng,alpha.eksctl.io/cluster-name=ironman-2022,eks.amazonaws.com/nodegroup-image=ami-08702177b0dcfc054,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=demo-ng,eks.amazonaws.com/sourceLaunchTemplateId=lt-04b22845f7d14ce5f --max-pods=29

[ec2-user@ip-192-168-18-230 ~]$ cat /etc/systemd/system/kubelet.service.d/30-kubelet-extra-args.conf
[Service]
Environment='KUBELET_EXTRA_ARGS=--node-labels=eks.amazonaws.com/sourceLaunchTemplateVersion=1,alpha.eksctl.io/nodegroup-name=demo-ng,alpha.eksctl.io/cluster-name=ironman-2022,eks.amazonaws.com/nodegroup-image=ami-08702177b0dcfc054,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=demo-ng,eks.amazonaws.com/sourceLaunchTemplateId=lt-04b22845f7d14ce5f --max-pods=29'

## self-managed nodes 
[ec2-user@ip-192-168-61-255 ~]$ sudo ps aux | grep "kubelet"
root      3389  1.1  1.3 1822960 107284 ?      Ssl  14:07   0:20 /usr/bin/kubelet --cloud-provider aws --config /etc/kubernetes/kubelet/kubelet-config.json --kubeconfig /var/lib/kubelet/kubeconfig --container-runtime docker --network-plugin cni --node-ip=192.168.61.255 --pod-infra-container-image=602401143452.dkr.ecr.eu-west-1.amazonaws.com/eks/pause:3.5 --v=2 --node-labels=alpha.eksctl.io/cluster-name=ironman-2022,alpha.eksctl.io/nodegroup-name=demo-self-ng,node-lifecycle=on-demand,alpha.eksctl.io/instance-id=i-02eccc7d3fa79fc6f

[ec2-user@ip-192-168-61-255 ~]$ cat /etc/systemd/system/kubelet.service.d/30-kubelet-extra-args.conf
[Service]
Environment='KUBELET_EXTRA_ARGS=--node-labels=alpha.eksctl.io/cluster-name=ironman-2022,alpha.eksctl.io/nodegroup-name=demo-self-ng,node-lifecycle=on-demand,alpha.eksctl.io/instance-id=i-02eccc7d3fa79fc6f'
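
Both drop-ins appear to be rendered by the bootstrap script that ships with the EKS-optimized AMI; if you were wiring up nodes yourself, custom labels would be passed through its --kubelet-extra-args option. A minimal user-data sketch, with the label values purely illustrative (eksctl may generate its actual user data differently):

#!/bin/bash
# /etc/eks/bootstrap.sh joins the node to the cluster and writes the kubelet drop-in shown above
/etc/eks/bootstrap.sh ironman-2022 \
  --kubelet-extra-args '--node-labels=alpha.eksctl.io/nodegroup-name=demo-self-ng,node-lifecycle=on-demand'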

Normally, after bringing up a new node group you need to update the corresponding aws-auth ConfigMap. The detailed flow was covered in [02] Why kubectl can access the EKS cluster [4]; if you are interested, see the AWS IAM Authenticator section there.

With the eksctl runs above, however, we never updated it ourselves for either the managed node group or the self-managed nodes, yet all of the nodes still joined the EKS cluster successfully. So let's use the following CloudWatch Logs Insights query to look at updates to the aws-auth ConfigMap resource:

filter @logStream like /^kube-apiserver-audit/
 | fields @timestamp, @message
 | sort @timestamp asc
 | filter objectRef.name == 'aws-auth' AND verb != 'get' AND verb != 'list' AND verb != 'watch'
 | limit 10000
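
The same query can also be run from the CLI, assuming audit logging is enabled for the cluster and that the control plane log group follows the usual /aws/eks/<cluster>/cluster naming (the time range below uses GNU date):

$ aws logs start-query \
    --log-group-name /aws/eks/ironman-2022/cluster \
    --start-time $(date -d '-2 hours' +%s) --end-time $(date +%s) \
    --query-string "filter @logStream like /^kube-apiserver-audit/ | fields @timestamp, @message | sort @timestamp asc | filter objectRef.name == 'aws-auth' AND verb != 'get' AND verb != 'list' AND verb != 'watch' | limit 10000"
$ # fetch the results with the queryId returned by start-query
$ aws logs get-query-results --query-id <queryId>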

The first record, timestamped shortly after eksctl created the managed node group, shows the username eks:node-manager, assuming what appears to be an EKS-managed Lambda function IAM role AWSWesleyClusterManagerLambda-NodeManagerRole-1KLMR524JM08L, updating the aws-auth ConfigMap and adding the instance profile role used by the node, eksctl-ironman-2022-nodegroup-demo-ng-NodeInstanceRole-454Q2XHJ5LZ0:

      "user": {
        "username": "eks:node-manager",
        "uid": "aws-iam-authenticator:222222222222:AROAVCHD6X7Z3FJJXOCS7",
        "groups": [
          "system:authenticated"
        ],
        "extra": {
          "accessKeyId": [
            "ASIAVCHD6X7Z4V3NJOVE"
          ],
          "arn": [
            "arn:aws:sts::222222222222:assumed-role/AWSWesleyClusterManagerLambda-NodeManagerRole-1KLMR524JM08L/1664110797435635508"
          ],
          "canonicalArn": [
            "arn:aws:iam::222222222222:role/AWSWesleyClusterManagerLambda-NodeManagerRole-1KLMR524JM08L"
          ],
          "sessionName": [
            "1664110797435635508"
          ]
        }
        ...
        ...
      "userAgent": "vpcLambda/v0.0.0 (linux/amd64) kubernetes/$Format",
      "objectRef": {
        "resource": "ConfigMaps",
        "namespace": "kube-system",
        "name": "aws-auth",
        "apiVersion": "v1"
      },
      "responseStatus": {
        "metadata": {},
        "code": 200
      },
      "requestObject": {
        "data": {
          "mapRoles": "- groups:\n  - system:bootstrappers\n  - system:nodes\n  rolearn: arn:aws:iam::111111111111:role/eksctl-ironman-2022-nodegroup-ng1-publ-NodeInstanceRole-HN27OZ18JS6U\n  username: system:node:{{EC2PrivateDNSName}}\n- groups:\n  - system:bootstrappers\n  - system:nodes\n  rolearn: arn:aws:iam::111111111111:role/eksctl-ironman-2022-nodegroup-demo-ng-NodeInstanceRole-454Q2XHJ5LZ0\n  username: system:node:{{EC2PrivateDNSName}}\n"
        }
      },
      ...
      ...
      "requestReceivedTimestamp": "2022-09-25T12:59:57.490185Z",
      "stageTimestamp": "2022-09-25T12:59:57.493676Z",

The second record shows the username kubernetes-admin, using the IAM user associated with the EKS cluster creator, arn:aws:iam::111111111111:user/cli, updating the aws-auth ConfigMap and adding the instance profile role used by the node, eksctl-ironman-2022-nodegroup-demo-sel-NodeInstanceRole-19TJJ2L7I7GOD:

      "user": {
        "username": "kubernetes-admin",
        "uid": "aws-iam-authenticator:111111111111:AIDAYFMQSNSE7DQH72FL7",
        "groups": [
          "system:masters",
          "system:authenticated"
        ],
        "extra": {
          "accessKeyId": [
            "AKIAYFMQSNSE5H5ZTJDE"
          ],
          "arn": [
            "arn:aws:iam::111111111111:user/cli"
          ],
          "canonicalArn": [
            "arn:aws:iam::111111111111:user/cli"
          ],
          "sessionName": [
            ""
          ]
        }
      },
      ...
      ...
            "userAgent": "eksctl/v0.0.0 (linux/amd64) kubernetes/$Format",
      "objectRef": {
        "resource": "ConfigMaps",
        "namespace": "kube-system",
        "name": "aws-auth",
        "uid": "f4cb3fae-17ef-421d-871c-5be15efbe73f",
        "apiVersion": "v1",
        "resourceVersion": "2996694"
      },
      "responseStatus": {
        "metadata": {},
        "code": 200
      },
      "requestObject": {
        "kind": "ConfigMap",
        "apiVersion": "v1",
        "metadata": {
          "name": "aws-auth",
          "namespace": "kube-system",
          "uid": "f4cb3fae-17ef-421d-871c-5be15efbe73f",
          "resourceVersion": "2996694",
          "creationTimestamp": "2022-09-14T09:56:19Z",
          "managedFields": [
            {
              "manager": "vpcLambda",
              "operation": "Update",
              "apiVersion": "v1",
              "time": "2022-09-14T09:56:19Z",
              "fieldsType": "FieldsV1",
              "fieldsV1": {
                "f:data": {
                  ".": {},
                  "f:mapRoles": {}
                }
              }
            }
          ]
        },
        "data": {
          "mapRoles": "- groups:\n  - system:bootstrappers\n  - system:nodes\n  rolearn: arn:aws:iam::111111111111:role/eksctl-ironman-2022-nodegroup-ng1-publ-NodeInstanceRole-HN27OZ18JS6U\n  username: system:node:{{EC2PrivateDNSName}}\n- groups:\n  - system:bootstrappers\n  - system:nodes\n  rolearn: arn:aws:iam::111111111111:role/eksctl-ironman-2022-nodegroup-demo-ng-NodeInstanceRole-454Q2XHJ5LZ0\n  username: system:node:{{EC2PrivateDNSName}}\n- groups:\n  - system:bootstrappers\n  - system:nodes\n  rolearn: arn:aws:iam::111111111111:role/eksctl-ironman-2022-nodegroup-demo-sel-NodeInstanceRole-19TJJ2L7I7GOD\n  username: system:node:{{EC2PrivateDNSName}}\n",
          "mapUsers": "[]\n"
        }
      }
      ...
      ...
      "requestReceivedTimestamp": "2022-09-25T14:07:55.187756Z",
      "stageTimestamp": "2022-09-25T14:07:55.194246Z",

Summary

Comparing how these two audit log entries update the aws-auth ConfigMap, we can conclude:

  1. For a managed node group, EKS manages the aws-auth ConfigMap update on the user's behalf.
  2. For a self-managed node group, eksctl itself updates the ConfigMap with the node IAM/instance role created by its CloudFormation stack, so no manual update is needed.

So far we have looked at the labels on EKS nodes and at some differences in how the nodes are created. In the next article we will discuss what managed node groups do for us during the node update process.


The observations above use the logs provided by EKS to verify how upstream Kubernetes behaves. If anything in this article is wrong, feel free to leave a comment or message me.

References

  1. Managed node groups - https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html
  2. Self-managed nodes - https://docs.aws.amazon.com/eks/latest/userguide/worker.html
  3. Well-Known Labels, Annotations and Taints - https://kubernetes.io/docs/reference/labels-annotations-taints/
  4. [02] Why kubectl can access the EKS cluster - https://ithelp.ithome.com.tw/articles/10291926
