2022 iThome 鐵人賽

DAY 26
DevOps

那些文件沒告訴你的 AWS EKS series, part 26

[26] Why EKS returns HTTP 502 errors when updating a Kubernetes deployment (Part 1)

EKS officially provides the AWS Load Balancer Controller to integrate with AWS ELB resources:

  • It provisions an Application Load Balancer (ALB) for a Kubernetes Ingress object
  • It provisions a Network Load Balancer (NLB) for a Kubernetes Service object

We can therefore create the corresponding load balancer resources simply by creating a Service or an Ingress. However, while a deployment is being updated, the ALB can occasionally be observed responding with HTTP 502. This article explores why the ALB returns these HTTP 502 errors.

Setting Up the Environment

Installing the AWS Load Balancer Controller

  1. Download the IAM policy.
$ curl -o iam-policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.4.4/docs/install/iam_policy.json
  2. Create the IAM policy with the AWS CLI aws iam create-policy command.
$ aws iam create-policy \
    --policy-name AWSLoadBalancerControllerIAMPolicy \
    --policy-document file://iam-policy.json
  3. Use eksctl to create the IAM role and Service Account for IRSA (IAM roles for service accounts) [1].
$ eksctl create iamserviceaccount \
    --cluster=ironman-2022 \
    --namespace=kube-system \
    --name=aws-load-balancer-controller \
    --attach-policy-arn=arn:aws:iam::111111111111:policy/AWSLoadBalancerControllerIAMPolicy \
    --override-existing-serviceaccounts \
    --region eu-west-1 \
    --approve
2022-10-11 14:07:43 [ℹ]  4 existing iamserviceaccount(s) (amazon-cloudwatch/cloudwatch-agent,amazon-cloudwatch/fluentd,default/my-s3-sa,kube-system/aws-node) will be excluded
2022-10-11 14:07:43 [ℹ]  1 iamserviceaccount (kube-system/aws-load-balancer-controller) was included (based on the include/exclude rules)
2022-10-11 14:07:43 [!]  metadata of serviceaccounts that exist in Kubernetes will be updated, as --override-existing-serviceaccounts was set
2022-10-11 14:07:43 [ℹ]  1 task: {
    2 sequential sub-tasks: {
        create IAM role for serviceaccount "kube-system/aws-load-balancer-controller",
        create serviceaccount "kube-system/aws-load-balancer-controller",
    } }
2022-10-11 14:07:43 [ℹ]  building iamserviceaccount stack "eksctl-ironman-2022-addon-iamserviceaccount-kube-system-aws-load-balancer-controller"
2022-10-11 14:07:44 [ℹ]  deploying stack "eksctl-ironman-2022-addon-iamserviceaccount-kube-system-aws-load-balancer-controller"
2022-10-11 14:07:44 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-addon-iamserviceaccount-kube-system-aws-load-balancer-controller"
2022-10-11 14:08:14 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-addon-iamserviceaccount-kube-system-aws-load-balancer-controller"
2022-10-11 14:08:14 [ℹ]  created serviceaccount "kube-system/aws-load-balancer-controller"
  4. The AWS Load Balancer Controller ships a Helm chart [2]. Install Helm by following its documentation [3]; here the latest version, 3.10.0, is used:
$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh
Downloading https://get.helm.sh/helm-v3.10.0-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm
  5. Add the eks Helm repo:
$ helm repo add eks https://aws.github.io/eks-charts
  6. Since the Service Account was already created in step 3, there is no need to create it again here; install the chart with serviceAccount.create=false:
$ helm install aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=ironman-2022 --set serviceAccount.create=false --set serviceAccount.name=aws-load-balancer-controller
NAME: aws-load-balancer-controller
LAST DEPLOYED: Tue Oct 11 14:11:56 2022
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
AWS Load Balancer controller installed!
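
The IAM role eksctl created in step 3 trusts the cluster's OIDC identity provider, which is what lets the controller Pod exchange its projected Service Account token for AWS credentials. A sketch of what the generated trust policy looks like (the account ID and OIDC provider ID below are placeholders, not values from this cluster):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111111111111:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE1234567890"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE1234567890:sub": "system:serviceaccount:kube-system:aws-load-balancer-controller"
        }
      }
    }
  ]
}
```

The Condition on the sub claim is what pins the role to exactly the kube-system/aws-load-balancer-controller Service Account, so other Pods in the cluster cannot assume it.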

Setting Up the Test Workload

  1. Using an Nginx server as the test workload, create nginx-demo-ingress.yaml containing the Deployment, Service, and Ingress resources.
$ cat ./nginx-demo-ingress.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: "demo-nginx"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "demo-nginx-deployment"
  namespace: "demo-nginx"
spec:
  selector:
    matchLabels:
      app: "demo-nginx"
  replicas: 10
  template:
    metadata:
      labels:
        app: "demo-nginx"
        role: "backend"
    spec:
      containers:
      - image: nginx:latest
        imagePullPolicy: Always
        name: "demo-nginx"
        ports:
        - containerPort: 80
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
---
apiVersion: v1
kind: Service
metadata:
  name: "service-demo-nginx"
  namespace: "demo-nginx"
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  selector:
    app: "demo-nginx"
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: "demo-nginx"
  name: "ingress-nginx"
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
  - http:
      paths:
      - path: "/"
        pathType: Prefix
        backend:
          service:
            name: service-demo-nginx
            port:
              number: 80
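
With replicas: 10 and both maxSurge and maxUnavailable set to 25%, Kubernetes converts the percentages to absolute counts, rounding maxSurge up and maxUnavailable down. A quick sketch of that arithmetic (illustration only, not part of the manifest):

```python
import math

def rolling_update_bounds(replicas: int, max_surge_pct: float, max_unavailable_pct: float):
    """Compute absolute rolling-update bounds the way Kubernetes does:
    maxSurge rounds up, maxUnavailable rounds down."""
    surge = math.ceil(replicas * max_surge_pct)
    unavailable = math.floor(replicas * max_unavailable_pct)
    return {
        "max_total_pods": replicas + surge,        # old + new Pods allowed at once
        "min_available_pods": replicas - unavailable,
    }

print(rolling_update_bounds(10, 0.25, 0.25))
# with 10 replicas: at most 13 Pods at once, at least 8 must stay available
```

So during the rollout below, Kubernetes may run up to 13 Pods simultaneously while keeping at least 8 available, which matches the 5-at-a-time surges seen in the rollout status output.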
  2. Apply the YAML.
$ kubectl apply -f  ./nginx-demo-ingress.yaml
namespace/demo-nginx created
deployment.apps/demo-nginx-deployment created
service/service-demo-nginx created
ingress.networking.k8s.io/ingress-nginx created
  3. Confirm that all Nginx Pods are running.
$ kubectl -n demo-nginx get ing,svc,po -o wide
NAME                                      CLASS    HOSTS   ADDRESS                                                                   PORTS   AGE
ingress.networking.k8s.io/ingress-nginx   <none>   *       k8s-demongin-ingressn-843da381ab-1111111111.eu-west-1.elb.amazonaws.com   80      56m

NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE   SELECTOR
service/service-demo-nginx   ClusterIP   10.100.247.243   <none>        80/TCP    56m   app=demo-nginx

NAME                                         READY   STATUS    RESTARTS   AGE   IP               NODE                                           NOMINATED NODE   READINESS GATES
pod/demo-nginx-deployment-75966cf8c6-2ntf4   1/1     Running   0          10m   192.168.56.52    ip-192-168-61-79.eu-west-1.compute.internal    <none>           <none>
pod/demo-nginx-deployment-75966cf8c6-4nzz7   1/1     Running   0          10m   192.168.91.189   ip-192-168-76-241.eu-west-1.compute.internal   <none>           <none>
pod/demo-nginx-deployment-75966cf8c6-4qgtj   1/1     Running   0          10m   192.168.74.144   ip-192-168-71-85.eu-west-1.compute.internal    <none>           <none>
pod/demo-nginx-deployment-75966cf8c6-5b6ws   1/1     Running   0          10m   192.168.25.57    ip-192-168-18-190.eu-west-1.compute.internal   <none>           <none>
pod/demo-nginx-deployment-75966cf8c6-6khx4   1/1     Running   0          10m   192.168.69.15    ip-192-168-71-85.eu-west-1.compute.internal    <none>           <none>
pod/demo-nginx-deployment-75966cf8c6-c76pr   1/1     Running   0          10m   192.168.40.177   ip-192-168-61-79.eu-west-1.compute.internal    <none>           <none>
pod/demo-nginx-deployment-75966cf8c6-dsztw   1/1     Running   0          10m   192.168.67.240   ip-192-168-76-241.eu-west-1.compute.internal   <none>           <none>
pod/demo-nginx-deployment-75966cf8c6-qlkj2   1/1     Running   0          10m   192.168.62.119   ip-192-168-55-245.eu-west-1.compute.internal   <none>           <none>
pod/demo-nginx-deployment-75966cf8c6-stt9c   1/1     Running   0          10m   192.168.29.171   ip-192-168-18-190.eu-west-1.compute.internal   <none>           <none>
pod/demo-nginx-deployment-75966cf8c6-zfzzf   1/1     Running   0          10m   192.168.49.142   ip-192-168-55-245.eu-west-1.compute.internal   <none>           <none>
  4. Confirm that the ALB created by the Ingress responds with HTTP 200. Since the nginx:latest image is used, the served version is currently 1.23.1.
$ curl -I k8s-demongin-ingressn-843da381ab-1111111111.eu-west-1.elb.amazonaws.com
HTTP/1.1 200 OK
Date: Tue, 11 Oct 2022 15:38:44 GMT
Content-Type: text/html
Content-Length: 615
Connection: keep-alive
Server: nginx/1.23.1
Last-Modified: Mon, 23 May 2022 22:59:19 GMT
ETag: "628c1fd7-267"
Accept-Ranges: bytes
  5. Inspect the ELB target group.
$ aws elbv2 describe-target-groups --load-balancer-arn arn:aws:elasticloadbalancing:eu-west-1:111111111111:loadbalancer/app/k8s-demongin-ingressn-843da381ab/9a5746ba008507c3
{
    "TargetGroups": [
        {
            "TargetGroupArn": "arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2",
            "TargetGroupName": "k8s-demongin-serviced-610dd2305f",
            "Protocol": "HTTP",
            "Port": 80,
            "VpcId": "vpc-0b150640c72b722db",
            "HealthCheckProtocol": "HTTP",
            "HealthCheckPort": "traffic-port",
            "HealthCheckEnabled": true,
            "HealthCheckIntervalSeconds": 15,
            "HealthCheckTimeoutSeconds": 5,
            "HealthyThresholdCount": 2,
            "UnhealthyThresholdCount": 2,
            "HealthCheckPath": "/",
            "Matcher": {
                "HttpCode": "200"
            },
            "LoadBalancerArns": [
                "arn:aws:elasticloadbalancing:eu-west-1:111111111111:loadbalancer/app/k8s-demongin-ingressn-843da381ab/9a5746ba008507c3"
            ],
            "TargetType": "ip",
            "ProtocolVersion": "HTTP1",
            "IpAddressType": "ipv4"
        }
    ]
}
$ aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2 --query 'TargetHealthDescriptions[].[Target, TargetHealth]' --output table
-----------------------------------------------------------
|                  DescribeTargetHealth                   |
+-------------------+------------------+-------+----------+
| AvailabilityZone  |       Id         | Port  |  State   |
+-------------------+------------------+-------+----------+
|  eu-west-1c       |  192.168.40.177  |  80   |          |
|                   |                  |       |  healthy |
|  eu-west-1c       |  192.168.62.119  |  80   |          |
|                   |                  |       |  healthy |
|  eu-west-1a       |  192.168.25.57   |  80   |          |
|                   |                  |       |  healthy |
|  eu-west-1b       |  192.168.74.144  |  80   |          |
|                   |                  |       |  healthy |
|  eu-west-1b       |  192.168.69.15   |  80   |          |
|                   |                  |       |  healthy |
|  eu-west-1c       |  192.168.56.52   |  80   |          |
|                   |                  |       |  healthy |
|  eu-west-1a       |  192.168.29.171  |  80   |          |
|                   |                  |       |  healthy |
|  eu-west-1b       |  192.168.91.189  |  80   |          |
|                   |                  |       |  healthy |
|  eu-west-1c       |  192.168.49.142  |  80   |          |
|                   |                  |       |  healthy |
|  eu-west-1b       |  192.168.67.240  |  80   |          |
|                   |                  |       |  healthy |
+-------------------+------------------+-------+----------+
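
These health-check settings also bound how quickly the ALB reacts to Pod churn: with HealthCheckIntervalSeconds of 15 and both thresholds at 2, a newly registered Pod IP needs two consecutive successful probes (roughly 30 seconds) before it is marked healthy, and a failing target needs about the same to be marked unhealthy. A rough sketch of that timing (an approximation; the exact moment the first probe fires also matters):

```python
def health_transition_seconds(interval_s: int, threshold: int) -> int:
    """Approximate time for a target to change health state:
    `threshold` consecutive probe results, one every `interval_s` seconds."""
    return interval_s * threshold

# With the target group settings shown above (interval 15s, thresholds 2):
becomes_healthy = health_transition_seconds(15, 2)    # new Pod IP -> healthy
becomes_unhealthy = health_transition_seconds(15, 2)  # failing Pod IP -> unhealthy
print(becomes_healthy, becomes_unhealthy)
```

This lag between a Pod's actual state and the target group's view of it is worth keeping in mind for the rolling-update experiment below.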

  6. Use the following curl command to hit the ALB endpoint every 0.5 seconds.
$ counter=1; for i in {1..200}; do echo $counter; counter=$((counter+1)); sleep 0.5; curl -I k8s-demongin-ingressn-843da381ab-1111111111.eu-west-1.elb.amazonaws.com >> test-result.txt; done
  7. While step 6 is running, update the image to Nginx 1.22.0 in another terminal.
$ kubectl -n demo-nginx set image deploy demo-nginx-deployment demo-nginx=nginx:1.22.0 && kubectl -n demo-nginx rollout status deploy demo-nginx-deployment
deployment.apps/demo-nginx-deployment image updated
Waiting for deployment "demo-nginx-deployment" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 6 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 6 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 6 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 6 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 7 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 7 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 7 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 7 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 8 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 8 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 8 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 8 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 8 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 3 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 3 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 3 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 2 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 2 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 2 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 8 of 10 updated replicas are available...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 9 of 10 updated replicas are available...
deployment "demo-nginx-deployment" successfully rolled out
  8. In the captured results, HTTP 502 error responses from the ALB can be observed.
$ grep -A2 "502" ./test-result.txt
HTTP/1.1 502 Bad Gateway
Server: awselb/2.0
Date: Tue, 11 Oct 2022 15:39:56 GMT
--
HTTP/1.1 502 Bad Gateway
Server: awselb/2.0
Date: Tue, 11 Oct 2022 15:39:59 GMT
--
HTTP/1.1 502 Bad Gateway
Server: awselb/2.0
Date: Tue, 11 Oct 2022 15:40:03 GMT
--
HTTP/1.1 502 Bad Gateway
Server: awselb/2.0
Date: Tue, 11 Oct 2022 15:39:06 GMT
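
Beyond grepping for 502s, the captured headers can be tallied to quantify the failure rate. A small sketch, run here against an inline sample; to analyze the real capture, read the lines from test-result.txt instead:

```python
from collections import Counter

def tally_status_lines(lines):
    """Count HTTP status lines like 'HTTP/1.1 502 Bad Gateway' by status code."""
    codes = Counter()
    for line in lines:
        parts = line.split()
        if parts and parts[0].startswith("HTTP/"):
            codes[parts[1]] += 1
    return codes

# Inline sample mimicking curl -I output; swap in open("test-result.txt") for real data.
sample = """HTTP/1.1 200 OK
Server: nginx/1.23.1
HTTP/1.1 502 Bad Gateway
Server: awselb/2.0
HTTP/1.1 200 OK
""".splitlines()

print(tally_status_lines(sample))  # Counter({'200': 2, '502': 1})
```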

Summary

With the setup above we reproduced the problem: during a rolling update of an EKS deployment, some requests are answered with HTTP 502 by the ALB. Tomorrow we will dig into why the ALB returns these errors.

References

  1. IAM roles for service accounts - https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
  2. Helm | The package manager for Kubernetes - https://helm.sh/
  3. Installing Helm - https://helm.sh/docs/intro/install/
