Amazon EKS officially supports the AWS Load Balancer Controller for integrating with AWS ELB resources, so we can provision the corresponding load balancers simply by creating a Service or an Ingress. During Deployment updates, however, the ALB occasionally responds with HTTP 502, so this article sets up an environment to reproduce the problem and starts investigating why the ALB returns HTTP 502 errors.
First, download the IAM policy document required by the controller:
$ curl -o iam-policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.4.4/docs/install/iam_policy.json
Create the IAM policy with the aws iam create-policy command:
$ aws iam create-policy \
--policy-name AWSLoadBalancerControllerIAMPolicy \
--policy-document file://iam-policy.json
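If you want to confirm the policy exists before moving on, you can look it up by ARN (as elsewhere in this article, 111111111111 stands in for your account ID):
$ aws iam get-policy --policy-arn arn:aws:iam::111111111111:policy/AWSLoadBalancerControllerIAMPolicy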
Use eksctl to create the IAM role and the ServiceAccount for IRSA (IAM Roles for Service Accounts):
$ eksctl create iamserviceaccount \
--cluster=ironman-2022 \
--namespace=kube-system \
--name=aws-load-balancer-controller \
--attach-policy-arn=arn:aws:iam::111111111111:policy/AWSLoadBalancerControllerIAMPolicy \
--override-existing-serviceaccounts \
--region eu-west-1 \
--approve
2022-10-11 14:07:43 [ℹ] 4 existing iamserviceaccount(s) (amazon-cloudwatch/cloudwatch-agent,amazon-cloudwatch/fluentd,default/my-s3-sa,kube-system/aws-node) will be excluded
2022-10-11 14:07:43 [ℹ] 1 iamserviceaccount (kube-system/aws-load-balancer-controller) was included (based on the include/exclude rules)
2022-10-11 14:07:43 [!] metadata of serviceaccounts that exist in Kubernetes will be updated, as --override-existing-serviceaccounts was set
2022-10-11 14:07:43 [ℹ] 1 task: {
2 sequential sub-tasks: {
create IAM role for serviceaccount "kube-system/aws-load-balancer-controller",
create serviceaccount "kube-system/aws-load-balancer-controller",
} }
2022-10-11 14:07:43 [ℹ] building iamserviceaccount stack "eksctl-ironman-2022-addon-iamserviceaccount-kube-system-aws-load-balancer-controller"
2022-10-11 14:07:44 [ℹ] deploying stack "eksctl-ironman-2022-addon-iamserviceaccount-kube-system-aws-load-balancer-controller"
2022-10-11 14:07:44 [ℹ] waiting for CloudFormation stack "eksctl-ironman-2022-addon-iamserviceaccount-kube-system-aws-load-balancer-controller"
2022-10-11 14:08:14 [ℹ] waiting for CloudFormation stack "eksctl-ironman-2022-addon-iamserviceaccount-kube-system-aws-load-balancer-controller"
2022-10-11 14:08:14 [ℹ] created serviceaccount "kube-system/aws-load-balancer-controller"
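To double-check that IRSA is wired up, inspect the ServiceAccount; the eks.amazonaws.com/role-arn annotation added by eksctl should point at the newly created role:
$ kubectl -n kube-system get sa aws-load-balancer-controller -o yaml | grep role-arn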
Next, install Helm, add the eks-charts repository, and install the aws-load-balancer-controller chart, pointing it at the ServiceAccount created above:
$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh
Downloading https://get.helm.sh/helm-v3.10.0-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm
$ helm repo add eks https://aws.github.io/eks-charts
$ helm install aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=ironman-2022 --set serviceAccount.create=false --set serviceAccount.name=aws-load-balancer-controller
NAME: aws-load-balancer-controller
LAST DEPLOYED: Tue Oct 11 14:11:56 2022
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
AWS Load Balancer controller installed!
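Before creating any Ingress, it is worth confirming that the controller Deployment is up; with the release name used above, the Deployment is also called aws-load-balancer-controller:
$ kubectl -n kube-system rollout status deployment/aws-load-balancer-controller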
With the controller installed, deploy a demo nginx application together with a ClusterIP Service and an internet-facing ALB Ingress:
$ cat ./nginx-demo-ingress.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: "demo-nginx"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "demo-nginx-deployment"
  namespace: "demo-nginx"
spec:
  selector:
    matchLabels:
      app: "demo-nginx"
  replicas: 10
  template:
    metadata:
      labels:
        app: "demo-nginx"
        role: "backend"
    spec:
      containers:
        - image: nginx:latest
          imagePullPolicy: Always
          name: "demo-nginx"
          ports:
            - containerPort: 80
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
---
apiVersion: v1
kind: Service
metadata:
  name: "service-demo-nginx"
  namespace: "demo-nginx"
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  selector:
    app: "demo-nginx"
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: "demo-nginx"
  name: "ingress-nginx"
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
    - http:
        paths:
          - path: "/"
            pathType: Prefix
            backend:
              service:
                name: service-demo-nginx
                port:
                  number: 80
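Before applying, a quick client-side dry run catches most YAML mistakes:
$ kubectl apply --dry-run=client -f ./nginx-demo-ingress.yaml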
$ kubectl apply -f ./nginx-demo-ingress.yaml
namespace/demo-nginx created
deployment.apps/demo-nginx-deployment created
service/service-demo-nginx created
ingress.networking.k8s.io/ingress-nginx created
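The controller needs a minute or two to provision the ALB; in the meantime you can wait for the Deployment itself to become available, then check that the Ingress has been assigned an ALB hostname:
$ kubectl -n demo-nginx rollout status deploy demo-nginx-deployment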
$ kubectl -n demo-nginx get ing,svc,po -o wide
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress.networking.k8s.io/ingress-nginx <none> * k8s-demongin-ingressn-843da381ab-1111111111.eu-west-1.elb.amazonaws.com 80 56m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/service-demo-nginx ClusterIP 10.100.247.243 <none> 80/TCP 56m app=demo-nginx
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/demo-nginx-deployment-75966cf8c6-2ntf4 1/1 Running 0 10m 192.168.56.52 ip-192-168-61-79.eu-west-1.compute.internal <none> <none>
pod/demo-nginx-deployment-75966cf8c6-4nzz7 1/1 Running 0 10m 192.168.91.189 ip-192-168-76-241.eu-west-1.compute.internal <none> <none>
pod/demo-nginx-deployment-75966cf8c6-4qgtj 1/1 Running 0 10m 192.168.74.144 ip-192-168-71-85.eu-west-1.compute.internal <none> <none>
pod/demo-nginx-deployment-75966cf8c6-5b6ws 1/1 Running 0 10m 192.168.25.57 ip-192-168-18-190.eu-west-1.compute.internal <none> <none>
pod/demo-nginx-deployment-75966cf8c6-6khx4 1/1 Running 0 10m 192.168.69.15 ip-192-168-71-85.eu-west-1.compute.internal <none> <none>
pod/demo-nginx-deployment-75966cf8c6-c76pr 1/1 Running 0 10m 192.168.40.177 ip-192-168-61-79.eu-west-1.compute.internal <none> <none>
pod/demo-nginx-deployment-75966cf8c6-dsztw 1/1 Running 0 10m 192.168.67.240 ip-192-168-76-241.eu-west-1.compute.internal <none> <none>
pod/demo-nginx-deployment-75966cf8c6-qlkj2 1/1 Running 0 10m 192.168.62.119 ip-192-168-55-245.eu-west-1.compute.internal <none> <none>
pod/demo-nginx-deployment-75966cf8c6-stt9c 1/1 Running 0 10m 192.168.29.171 ip-192-168-18-190.eu-west-1.compute.internal <none> <none>
pod/demo-nginx-deployment-75966cf8c6-zfzzf 1/1 Running 0 10m 192.168.49.142 ip-192-168-55-245.eu-west-1.compute.internal <none> <none>
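If the ADDRESS column stays empty, describing the Ingress usually surfaces the controller's reconcile events (for example subnet discovery or IAM errors):
$ kubectl -n demo-nginx describe ingress ingress-nginx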
Once the ALB is provisioned, a test request against the Ingress hostname returns HTTP 200:
$ curl -I k8s-demongin-ingressn-843da381ab-1111111111.eu-west-1.elb.amazonaws.com
HTTP/1.1 200 OK
Date: Tue, 11 Oct 2022 15:38:44 GMT
Content-Type: text/html
Content-Length: 615
Connection: keep-alive
Server: nginx/1.23.1
Last-Modified: Mon, 23 May 2022 22:59:19 GMT
ETag: "628c1fd7-267"
Accept-Ranges: bytes
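To see how the ALB forwards traffic, we can inspect its target group. The describe-target-groups call below needs the ALB's ARN; if you do not have it handy, one way to look it up is to filter describe-load-balancers by the DNS name shown on the Ingress:
$ aws elbv2 describe-load-balancers --query "LoadBalancers[?DNSName=='k8s-demongin-ingressn-843da381ab-1111111111.eu-west-1.elb.amazonaws.com'].LoadBalancerArn" --output text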
$ aws elbv2 describe-target-groups --load-balancer-arn arn:aws:elasticloadbalancing:eu-west-1:111111111111:loadbalancer/app/k8s-demongin-ingressn-843da381ab/9a5746ba008507c3
{
"TargetGroups": [
{
"TargetGroupArn": "arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2",
"TargetGroupName": "k8s-demongin-serviced-610dd2305f",
"Protocol": "HTTP",
"Port": 80,
"VpcId": "vpc-0b150640c72b722db",
"HealthCheckProtocol": "HTTP",
"HealthCheckPort": "traffic-port",
"HealthCheckEnabled": true,
"HealthCheckIntervalSeconds": 15,
"HealthCheckTimeoutSeconds": 5,
"HealthyThresholdCount": 2,
"UnhealthyThresholdCount": 2,
"HealthCheckPath": "/",
"Matcher": {
"HttpCode": "200"
},
"LoadBalancerArns": [
"arn:aws:elasticloadbalancing:eu-west-1:111111111111:loadbalancer/app/k8s-demongin-ingressn-843da381ab/9a5746ba008507c3"
],
"TargetType": "ip",
"ProtocolVersion": "HTTP1",
"IpAddressType": "ipv4"
}
]
}
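If you only need the target group ARN for the health check that follows, a JMESPath query trims the output to a single line:
$ aws elbv2 describe-target-groups --load-balancer-arn arn:aws:elasticloadbalancing:eu-west-1:111111111111:loadbalancer/app/k8s-demongin-ingressn-843da381ab/9a5746ba008507c3 --query 'TargetGroups[0].TargetGroupArn' --output text
All ten Pod IPs should then show up as healthy targets: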
$ aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2 --query 'TargetHealthDescriptions[].[Target, TargetHealth]' --output table
--------------------------------------------------------------
|                     DescribeTargetHealth                    |
+--------------------+-------------------+--------+-----------+
|  AvailabilityZone  |        Id         |  Port  |   State   |
+--------------------+-------------------+--------+-----------+
|  eu-west-1c        |  192.168.40.177   |  80    |  healthy  |
|  eu-west-1c        |  192.168.62.119   |  80    |  healthy  |
|  eu-west-1a        |  192.168.25.57    |  80    |  healthy  |
|  eu-west-1b        |  192.168.74.144   |  80    |  healthy  |
|  eu-west-1b        |  192.168.69.15    |  80    |  healthy  |
|  eu-west-1c        |  192.168.56.52    |  80    |  healthy  |
|  eu-west-1a        |  192.168.29.171   |  80    |  healthy  |
|  eu-west-1b        |  192.168.91.189   |  80    |  healthy  |
|  eu-west-1c        |  192.168.49.142   |  80    |  healthy  |
|  eu-west-1b        |  192.168.67.240   |  80    |  healthy  |
+--------------------+-------------------+--------+-----------+
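During the rolling update it can be handy to keep an eye on target registration and draining; assuming watch is available on your machine, a simple loop prints the current target states every few seconds:
$ watch -n 5 "aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/k8s-demongin-serviced-610dd2305f/ffd6c2b5292a36c2 --query 'TargetHealthDescriptions[].TargetHealth.State' --output text"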
Before triggering the rolling update, start a loop that hits the ALB endpoint with curl every 0.5 seconds and appends the response headers to test-result.txt:
$ counter=1; for i in {1..200}; do echo $counter; counter=$((counter+1)); sleep 0.5; curl -I k8s-demongin-ingressn-843da381ab-1111111111.eu-west-1.elb.amazonaws.com >> test-result.txt; done
With the curl loop running, trigger a rolling update by switching the image to nginx:1.22.0:
$ kubectl -n demo-nginx set image deploy demo-nginx-deployment demo-nginx=nginx:1.22.0 && kubectl -n demo-nginx rollout status deploy demo-nginx-deployment
deployment.apps/demo-nginx-deployment image updated
Waiting for deployment "demo-nginx-deployment" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 6 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 6 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 6 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 6 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 7 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 7 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 7 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 7 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 8 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 8 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 8 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 8 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 8 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 3 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 3 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 3 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 2 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 2 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 2 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 8 of 10 updated replicas are available...
Waiting for deployment "demo-nginx-deployment" rollout to finish: 9 of 10 updated replicas are available...
deployment "demo-nginx-deployment" successfully rolled out
After the curl loop finishes, search test-result.txt for 502 responses:
$ grep -A2 "502" ./test-result.txt
HTTP/1.1 502 Bad Gateway
Server: awselb/2.0
Date: Tue, 11 Oct 2022 15:39:56 GMT
--
HTTP/1.1 502 Bad Gateway
Server: awselb/2.0
Date: Tue, 11 Oct 2022 15:39:59 GMT
--
HTTP/1.1 502 Bad Gateway
Server: awselb/2.0
Date: Tue, 11 Oct 2022 15:40:03 GMT
--
HTTP/1.1 502 Bad Gateway
Server: awselb/2.0
Date: Tue, 11 Oct 2022 15:39:06 GMT
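To quantify the impact, a quick tally of the status lines in test-result.txt shows how many of the 200 requests failed:
$ grep '^HTTP' test-result.txt | sort | uniq -c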
With the setup above we have reproduced the problem: during an EKS rolling update, some requests to the ALB are answered with HTTP 502. Tomorrow we will dig into why the ALB returns these HTTP 502 errors.