
2024 iThome 鐵人賽

DAY 6

Learning goals for this post

  1. Implement health checks in Quarkus
  2. Integrate EMQX and PostgreSQL
  3. Examine the contents of the Pod after an actual deployment

Add the following dependency to the Quarkus project. It implements MicroProfile Health and gives the application its health-check endpoints. By default it registers liveness and readiness probes, both of which return an UP status once the service has started.

implementation 'io.quarkus:quarkus-smallrye-health'
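If the project uses Maven instead of Gradle, the equivalent dependency (same coordinates as the Gradle line above) is:

```xml
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-smallrye-health</artifactId>
</dependency>
```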

The default endpoint paths are listed below; they hang off the quarkus.smallrye-health.root-path root path, which defaults to health. See the official documentation for more configuration options.

Probe       Path      Configuration property
liveness    live      quarkus.smallrye-health.liveness-path
readiness   ready     quarkus.smallrye-health.readiness-path
startup     started   quarkus.smallrye-health.startup-path
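For example, the root path and the individual probe paths in the table above can be overridden in application.properties; the values shown here are simply the defaults, as an illustration:

```properties
quarkus.smallrye-health.root-path=health
quarkus.smallrye-health.liveness-path=live
quarkus.smallrye-health.readiness-path=ready
quarkus.smallrye-health.startup-path=started
```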

The MicroProfile Health specification supports this health-check contract through the following definitions:

  • Health endpoints
    • Access via the /health/live and /health/ready endpoints
    • Quarkus redirects them to /q/health/live and /q/health/ready respectively
  • HTTP status code
    • Reflects the health state
  • HTTP response payload
    • A JSON response carrying the status plus additional metadata and context
  • Application liveness
    • Defines whether the application is up and running correctly
  • Application readiness
    • Defines whether the application can accept external request traffic
  • Application health API
    • Exposes application liveness and readiness in a custom way

After adding the dependency, run the project and hit the liveness, readiness, and startup endpoints with curl. That's it: your health checks are done.

$ curl http://localhost:8080/q/health/live
{
    "status": "UP",
    "checks": [
    ]
}
$ curl http://localhost:8080/q/health/ready
{
    "status": "UP",
    "checks": [
    ]
}
$ curl http://localhost:8080/q/health/started
{
    "status": "UP",
    "checks": [
    ]
}

liveness

The underlying platform issues HTTP requests to the /q/health/live endpoint to decide whether the application should be restarted.
If the service is up, it returns 200 with an UP result; if it cannot come up, it returns 503 with a DOWN result; if the health check cannot be evaluated, it returns 500. This probe's default endpoint is /q/health/live.

readiness

The underlying platform issues HTTP requests to the /q/health/ready endpoint to decide whether the application is ready to accept traffic.
If the service is ready to handle requests, it returns 200 with UP. This differs from liveness: a service may already be up but not yet able to handle any requests (for example, the database is still applying initial changes). If the service cannot yet accept requests, it returns 503 with DOWN. Likewise, if the check cannot be evaluated, it returns 500. This probe's default endpoint is /q/health/ready.
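The status-code contract for both probes can be sketched as a small pure function. This is only an illustration of how an orchestrator interprets the codes; the names ProbeKind, Action, and decide are hypothetical, not part of MicroProfile Health or Kubernetes:

```java
// Sketch: how a platform might act on probe results.
public class ProbeContract {
    enum ProbeKind { LIVENESS, READINESS }
    enum Action { NONE, RESTART_CONTAINER, REMOVE_FROM_ENDPOINTS, COUNT_AS_FAILURE }

    static Action decide(ProbeKind kind, int httpStatus) {
        if (httpStatus == 200) return Action.NONE;      // UP: healthy, nothing to do
        if (httpStatus == 503 || httpStatus == 500) {   // DOWN, or check could not be evaluated
            return kind == ProbeKind.LIVENESS
                    ? Action.RESTART_CONTAINER          // liveness failure => restart
                    : Action.REMOVE_FROM_ENDPOINTS;     // readiness failure => stop routing traffic
        }
        return Action.COUNT_AS_FAILURE;                 // any other status counts as a miss
    }

    public static void main(String[] args) {
        System.out.println(decide(ProbeKind.LIVENESS, 503));  // RESTART_CONTAINER
        System.out.println(decide(ProbeKind.READINESS, 200)); // NONE
    }
}
```

In reality the action is only taken after failureThreshold consecutive misses, as the Deployment later in this post shows.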

Next, add dependencies for connecting to PostgreSQL and MQTT. We will try publishing a message over MQTT, receiving it, and finally persisting it to the database.

    implementation 'io.quarkus:quarkus-hibernate-orm-panache'
    implementation 'io.quarkus:quarkus-jdbc-postgresql'
    implementation 'io.quarkus:quarkus-messaging-mqtt'

With the dependencies in place, install the PostgreSQL and EMQX environments for Quarkus to interact with.

$ docker compose -f infra/docker-compose.yaml up -d

Once the environment is installed, configure the connection settings. If you are following along, remember to replace the IPs with those of your own environment.

# configure your datasource
quarkus.datasource.db-kind = postgresql
quarkus.datasource.username = itachi
quarkus.datasource.password = 123456
quarkus.datasource.jdbc.url = jdbc:postgresql://172.25.150.200:5432/itachi

# drop and create the database at startup (use `update` to only update the schema)
quarkus.hibernate-orm.database.generation = drop-and-create


mp.messaging.outgoing.deviceOut.type=smallrye-mqtt
mp.messaging.outgoing.deviceOut.host=172.25.150.200
mp.messaging.outgoing.deviceOut.port=1883
mp.messaging.outgoing.deviceOut.auto-generated-client-id=true
mp.messaging.outgoing.deviceOut.qos=1
mp.messaging.outgoing.deviceOut.topic=test/device


mp.messaging.incoming.deviceIn.type=smallrye-mqtt
mp.messaging.incoming.deviceIn.host=172.25.150.200
mp.messaging.incoming.deviceIn.port=1883
mp.messaging.incoming.deviceIn.auto-generated-client-id=true
mp.messaging.incoming.deviceIn.qos=1
mp.messaging.incoming.deviceIn.topic=test/device
producer.generate.duration=1
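The deviceOut and deviceIn channels configured above could be wired up roughly as below. This is a sketch, not the exact code of this project: it assumes a hypothetical Device Panache entity with a public String payload field, a JSON string payload, and a fixed one-second tick matching producer.generate.duration=1.

```java
import io.smallrye.common.annotation.Blocking;
import io.smallrye.mutiny.Multi;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.transaction.Transactional;
import java.time.Duration;
import org.eclipse.microprofile.reactive.messaging.Incoming;
import org.eclipse.microprofile.reactive.messaging.Outgoing;

@ApplicationScoped
public class DeviceMessaging {

    // Publish a JSON payload on the deviceOut channel every second
    // (the connector sends it to the configured test/device topic).
    @Outgoing("deviceOut")
    public Multi<String> produce() {
        return Multi.createFrom().ticks().every(Duration.ofSeconds(1))
                .map(tick -> "{\"deviceId\":" + tick + "}");
    }

    // Consume from deviceIn and persist it. @Blocking moves the JDBC
    // work off the event loop; Device is a hypothetical Panache entity.
    @Incoming("deviceIn")
    @Blocking
    @Transactional
    public void consume(byte[] payload) {
        Device device = new Device();
        device.payload = new String(payload);
        device.persist();
    }
}
```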

When everything above is done, run Quarkus! The result is shown below. Once again the framework has done the work for you: it automatically registers a readiness health check, visible in the checks section, to verify that the database can be reached, while the MQTT probes cover the deviceIn and deviceOut channels respectively.

$ curl http://localhost:8080/q/health/live
{
    "status": "UP",
    "checks": [
        {
            "name": "SmallRye Reactive Messaging - liveness check",
            "status": "UP",
            "data": {
                "deviceIn": "[OK]",
                "deviceOut": "[OK]"
            }
        }
    ]
}
$ curl http://localhost:8080/q/health/ready
{
    "status": "UP",
    "checks": [
        {
            "name": "Database connections health check",
            "status": "UP",
            "data": {
                "<default>": "UP"
            }
        },
        {
            "name": "SmallRye Reactive Messaging - readiness check",
            "status": "UP",
            "data": {
                "deviceIn": "[OK]",
                "deviceOut": "[OK]"
            }
        }
    ]
}
$ curl http://localhost:8080/q/health/started
{
    "status": "UP",
    "checks": [
        {
            "name": "SmallRye Reactive Messaging - startup check",
            "status": "UP",
            "data": {
                "deviceIn": "[OK]",
                "deviceOut": "[OK]"
            }
        }
    ]
}

If the service needs to interact with a third party, you can also define custom checks. Below, liveness and readiness checks are defined in turn.

  1. liveness
import jakarta.enterprise.context.ApplicationScoped;
import java.util.Date;
import org.eclipse.microprofile.health.HealthCheck;
import org.eclipse.microprofile.health.HealthCheckResponse;
import org.eclipse.microprofile.health.Liveness;

@ApplicationScoped
@Liveness
public class LivenessProbe implements HealthCheck {
    @Override
    public HealthCheckResponse call() {
        return HealthCheckResponse
                .named("Custom Liveness Prob")
                .withData("time", String.valueOf(new Date()))
                .up()
                .build();
    }
}

For liveness, the registration flow in Quarkus is as follows.

  1. A HealthCheck must be a CDI bean; annotating it with @ApplicationScoped creates a CDI bean instance

  2. @Liveness marks it as a liveness health check

  3. It implements the HealthCheck interface and overrides the call() method

  4. Whenever the /q/health/live endpoint is invoked, call() is invoked and returns a HealthCheckResponse object

  5. Every health check has a name (named) that reflects the intent of the check

  6. Contextual data can be attached to the check as key-value pairs (withData)

  7. The status here always returns UP (up())

  2. readiness

A custom probe can also target an external service; this time the check is implemented with the @Readiness annotation. The implementation below targets an external service: an externalURL configuration value is defined, and the probe periodically pokes that URL with an HTTP GET, treating a 200 status as success.

import io.smallrye.config.ConfigMapping;
import io.smallrye.health.checks.UrlHealthCheck;
import io.vertx.core.http.HttpMethod;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import org.eclipse.microprofile.health.HealthCheck;
import org.eclipse.microprofile.health.Readiness;

@ConfigMapping(prefix = "health")
public interface HealthCheckConfig {

    Readiness readiness();

    interface Readiness {
        String externalURL();
    }
}

@ApplicationScoped
public class ReadinessProbe {
    @Inject
    HealthCheckConfig healthCheckConfig;

    @Readiness
    HealthCheck checkURL() {
        return new UrlHealthCheck(healthCheckConfig.readiness().externalURL())
                .name("ExternalURL health check").requestMethod(HttpMethod.GET).statusCode(200);
    }
}
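With @ConfigMapping(prefix = "health") and SmallRye Config's default kebab-case naming strategy, the externalURL() accessor binds to the property below in application.properties; the URL is the one used in this post's example output:

```properties
health.readiness.external-url=https://www.fruityvice.com/api/fruit/banana
```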

Once defined, the ExternalURL health check shows up as below.

$ curl http://localhost:8080/q/health/ready
{
    "status": "UP",
    "checks": [
        {
            "name": "ExternalURL health check",
            "status": "UP",
            "data": {
                "host": "GET https://www.fruityvice.com/api/fruit/banana"
            }
        },
        {
            "name": "Database connections health check",
            "status": "UP",
            "data": {
                "<default>": "UP"
            }
        },
        {
            "name": "SmallRye Reactive Messaging - readiness check",
            "status": "UP",
            "data": {
                "deviceIn": "[OK]",
                "deviceOut": "[OK]"
            }
        }
    ]
}

With the above, Quarkus let us configure the health-check part effortlessly through the framework. If the Quarkus Kubernetes dependency is used, the generated Kubernetes YAML automatically configures the liveness, readiness, and startup probes on the Pod-level resources. We won't touch their default values; trust the professionals. Next, deploy it to Kubernetes. The generated Deployment is as follows.

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    app.quarkus.io/quarkus-version: 3.13.3
    app.quarkus.io/build-timestamp: 2024-08-27 - 14:38:39 +0000
  labels:
    app.kubernetes.io/name: app-health
    app.kubernetes.io/version: 1.0.0-SNAPSHOT
    app.kubernetes.io/managed-by: quarkus
  name: app-health
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: app-health
      app.kubernetes.io/version: 1.0.0-SNAPSHOT
  template:
    metadata:
      annotations:
        app.quarkus.io/quarkus-version: 3.13.3
        app.quarkus.io/build-timestamp: 2024-08-27 - 14:38:39 +0000
      labels:
        app.kubernetes.io/managed-by: quarkus
        app.kubernetes.io/name: app-health
        app.kubernetes.io/version: 1.0.0-SNAPSHOT
    spec:
      containers:
        - env:
            - name: KUBERNETES_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          image: registry.hub.docker.com/cch0124/app-health:1.0.0-SNAPSHOT
          imagePullPolicy: Always
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /q/health/live
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 10
          name: app-health
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /q/health/ready
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 10
          startupProbe:
            failureThreshold: 3
            httpGet:
              path: /q/health/started
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 10
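Should the defaults ever need changing, the quarkus-kubernetes extension exposes the probe settings as configuration rather than requiring hand-edited YAML. The property names below follow that extension's probe configuration; the values are illustrative, not a recommendation:

```properties
quarkus.kubernetes.liveness-probe.initial-delay=5s
quarkus.kubernetes.liveness-probe.period=10s
quarkus.kubernetes.liveness-probe.timeout=10s
quarkus.kubernetes.readiness-probe.failure-threshold=3
```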

The table below summarizes what was covered above.

Health check endpoints                HTTP status   JSON payload status
/q/health/live and /q/health/ready    200           UP
/q/health/live and /q/health/ready    503           DOWN
/q/health/live and /q/health/ready    500           Undetermined *

From this example we can roughly model the liveness check flow as in the figure below.

https://ithelp.ithome.com.tw/upload/images/20240828/201046888gpsu4FkID.png

  1. The liveness check and the application's traffic handling run normally
  2. The liveness check fails
  3. The application is not running properly; the HTTP response is an error
  4. After three failed attempts, the container is restarted; during the restart, traffic is redirected to another instance
  5. Same as 1

From this example we can roughly model the readiness check flow as in the figure below.

https://ithelp.ithome.com.tw/upload/images/20240828/20104688X6T1Egq8AV.png

  1. The readiness check and the application's traffic handling run normally
  2. The readiness check fails because the DB connection is down
  3. The application is not operating properly; the HTTP response is an error and traffic is redirected to another instance
  4. The DB connection recovers and the readiness check reports UP again
  5. Same as 1

Try stopping the EMQX service.

docker stop emqx

Looking at the Pod status now, Ready has flipped to False, and the readiness probe is returning 503, a non-success status, so the kubelet keeps retrying.


$ kubectl describe pods app-health-7bc94b5cdf-zlqwk
Name:             app-health-7bc94b5cdf-zlqwk
Namespace:        default
Priority:         0
...
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  56m                 default-scheduler  Successfully assigned default/app-health-7bc94b5cdf-zlqwk to k3d-ithome-lab-cluster-agent-1
  Normal   Pulling    56m                 kubelet            Pulling image "registry.hub.docker.com/cch0124/app-health:1.0.0-SNAPSHOT"
  Normal   Pulled     56m                 kubelet            Successfully pulled image "registry.hub.docker.com/cch0124/app-health:1.0.0-SNAPSHOT" in 7.466872458s (7.466893449s including waiting)
  Normal   Created    56m                 kubelet            Created container app-health
  Normal   Started    56m                 kubelet            Started container app-health
  Warning  Unhealthy  5s (x21 over 3m5s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503

And the Service's corresponding Endpoints resource removes the Pod's address, as shown below. The shortest removal time can be computed as (failureThreshold - 1) * periodSeconds + timeoutSeconds.

$ kubectl get endpoints
NAME            ENDPOINTS         AGE
...
app-health                        80m

If emqx is started again, the Service adds the Pod's address back to the Endpoints resource.

$ kubectl get endpoints -w
NAME            ENDPOINTS         AGE
app-health      10.42.1.13:8080   113m
$ kubectl get pods -w -owide
NAME                                  READY   STATUS    RESTARTS       AGE     IP           NODE                              NOMINATED NODE   READINESS GATES
app-health-7bc94b5cdf-zlqwk           1/1     Running   0              114m    10.42.1.13   k3d-ithome-lab-cluster-agent-1    <none>           <none>

From the results above we can draw the following conclusions:

  1. initialDelaySeconds: the Pod gets 5 seconds of initialization time after startup
  2. Whether the Pod is ready to receive external requests is judged by the HTTP status code of the /q/health/ready endpoint
  3. If the Pod's readiness probe keeps failing, Kubernetes removes it from the Endpoints to preserve availability, provided enough replicas exist
  4. periodSeconds: the readiness probe runs every 10 seconds
  5. successThreshold: a single successful check is enough for Kubernetes to consider the Pod ready to receive external requests
  6. timeoutSeconds: each check times out after 10 seconds; no response within 10 seconds counts that check as a failure

If the liveness probe's target endpoint is pointed at a non-existent path, the container gets killed and restarted by the kubelet, as shown below. The restart timing works out as follows:

First probe: fails (because /q/health/live does not exist)
Second probe: fails (because /q/health/live does not exist)
Third probe: fails (because /q/health/live does not exist)

Since probes run 10 seconds apart and there are 3 chances to fail (failureThreshold), Kubernetes judges the Pod unhealthy and triggers a restart after 30 seconds at the shortest (10 s * 2 + 10 s), i.e. (failureThreshold - 1) * periodSeconds + timeoutSeconds.
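That arithmetic can be captured in a tiny helper. This is just a sketch of the calculation Kubernetes effectively performs, with hypothetical names:

```java
// Shortest time to act on a failing probe: the first failed probe is
// observed at the end of its timeout, and (failureThreshold - 1) more
// periods of further failures must elapse before Kubernetes acts.
public class ProbeTiming {
    static int shortestDetectionSeconds(int failureThreshold, int periodSeconds, int timeoutSeconds) {
        return (failureThreshold - 1) * periodSeconds + timeoutSeconds;
    }

    public static void main(String[] args) {
        // Values from the generated Deployment above: 3 / 10s / 10s.
        System.out.println(shortestDetectionSeconds(3, 10, 10)); // 30
    }
}
```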

$ kubectl get pods -o wide -w
app-health-7bc94b5cdf-zlqwk           0/1     Running   0               127m    10.42.1.13   k3d-ithome-lab-cluster-agent-1    <none>           <none>
app-health-6496d8dbc4-qgcdn           0/1     Running   2 (21s ago)     29s     10.42.0.9    k3d-ithome-lab-cluster-agent-0    <none>           <none>
app-health-6496d8dbc4-qgcdn           0/1     Running   2 (22s ago)     30s     10.42.0.9    k3d-ithome-lab-cluster-agent-0    <none>           <none>
app-health-7bc94b5cdf-zlqwk           1/1     Running   0               127m    10.42.1.13   k3d-ithome-lab-cluster-agent-1    <none>           <none>
app-health-6496d8dbc4-qgcdn           1/1     Running   2 (24s ago)     32s     10.42.0.9    k3d-ithome-lab-cluster-agent-0    <none>           <none>
app-health-7bc94b5cdf-zlqwk           1/1     Terminating   0               127m    10.42.1.13   k3d-ithome-lab-cluster-agent-1    <none>           <none>
app-health-7bc94b5cdf-zlqwk           0/1     Terminating   0               127m    <none>       k3d-ithome-lab-cluster-agent-1    <none>           <none>
app-health-7bc94b5cdf-zlqwk           0/1     Terminating   0               127m    10.42.1.13   k3d-ithome-lab-cluster-agent-1    <none>           <none>
app-health-7bc94b5cdf-zlqwk           0/1     Terminating   0               127m    10.42.1.13   k3d-ithome-lab-cluster-agent-1    <none>           <none>
app-health-7bc94b5cdf-zlqwk           0/1     Terminating   0               127m    10.42.1.13   k3d-ithome-lab-cluster-agent-1    <none>           <none>
app-health-6496d8dbc4-qgcdn           0/1     Running       3 (2s ago)      63s     10.42.0.9    k3d-ithome-lab-cluster-agent-0    <none>           <none>
app-health-6496d8dbc4-qgcdn           0/1     Running       3 (9s ago)      70s     10.42.0.9    k3d-ithome-lab-cluster-agent-0    <none>           <none>
app-health-6496d8dbc4-qgcdn           1/1     Running       3 (11s ago)     72s     10.42.0.9    k3d-ithome-lab-cluster-agent-0    <none>           <none>
app-health-6496d8dbc4-qgcdn           0/1     Running       4 (2s ago)      103s    10.42.0.9    k3d-ithome-lab-cluster-agent-0    <none>           <none>
app-health-6496d8dbc4-qgcdn           0/1     Running       4 (9s ago)      110s    10.42.0.9    k3d-ithome-lab-cluster-agent-0    <none>           <none>
app-health-6496d8dbc4-qgcdn           1/1     Running       4 (11s ago)     112s    10.42.0.9    k3d-ithome-lab-cluster-agent-0    <none>           <none>

The events make the cause clear; it is shown in the Message column, which is undeniably a great resource for debugging.

$  kubectl get events --sort-by='.lastTimestamp'
LAST SEEN   TYPE      REASON              OBJECT                             MESSAGE
7m46s       Normal    Scheduled           pod/app-health-6496d8dbc4-qgcdn    Successfully assigned default/app-health-6496d8dbc4-qgcdn to k3d-ithome-lab-cluster-agent-0
11m         Warning   Unhealthy           pod/app-health-7bc94b5cdf-zlqwk    Readiness probe failed: HTTP probe failed with statuscode: 503
7m47s       Normal    ScalingReplicaSet   deployment/app-health              Scaled up replica set app-health-6496d8dbc4 to 1
7m47s       Normal    SuccessfulCreate    replicaset/app-health-6496d8dbc4   Created pod: app-health-6496d8dbc4-qgcdn
7m41s       Normal    Pulled              pod/app-health-6496d8dbc4-qgcdn    Successfully pulled image "registry.hub.docker.com/cch0124/app-health:1.0.0-SNAPSHOT" in 5.506520701s (5.506540622s including waiting)
7m39s       Normal    Pulled              pod/app-health-6496d8dbc4-qgcdn    Successfully pulled image "registry.hub.docker.com/cch0124/app-health:1.0.0-SNAPSHOT" in 1.49692412s (1.49693049s including waiting)
7m23s       Normal    Pulled              pod/app-health-6496d8dbc4-qgcdn    Successfully pulled image "registry.hub.docker.com/cch0124/app-health:1.0.0-SNAPSHOT" in 1.619209349s (1.619233429s including waiting)
7m15s       Normal    SuccessfulDelete    replicaset/app-health-7bc94b5cdf   Deleted pod: app-health-7bc94b5cdf-zlqwk
7m15s       Normal    ScalingReplicaSet   deployment/app-health              Scaled down replica set app-health-7bc94b5cdf to 0 from 1
6m47s       Warning   Unhealthy           pod/app-health-6496d8dbc4-qgcdn    Liveness probe failed: HTTP probe failed with statuscode: 404
6m47s       Normal    Killing             pod/app-health-6496d8dbc4-qgcdn    Container app-health failed liveness probe, will be restarted
6m47s       Warning   Unhealthy           pod/app-health-6496d8dbc4-qgcdn    Readiness probe failed: Get "http://10.42.0.9:8080/q/health/ready": EOF
6m46s       Normal    Pulling             pod/app-health-6496d8dbc4-qgcdn    Pulling image "registry.hub.docker.com/cch0124/app-health:1.0.0-SNAPSHOT"
6m44s       Normal    Pulled              pod/app-health-6496d8dbc4-qgcdn    Successfully pulled image "registry.hub.docker.com/cch0124/app-health:1.0.0-SNAPSHOT" in 1.530334886s (1.530354226s including waiting)
6m44s       Normal    Created             pod/app-health-6496d8dbc4-qgcdn    Created container app-health
6m44s       Normal    Started             pod/app-health-6496d8dbc4-qgcdn    Started container app-health
2m40s       Warning   BackOff             pod/app-health-6496d8dbc4-qgcdn    Back-off restarting failed container app-health in pod app-health-6496d8dbc4-qgcdn_default(cc9614a0-0583-40b0-9d3b-85e944c47172)

The behavior observed for the liveness probe is:

  1. initialDelaySeconds: after the Pod starts, Kubernetes waits 5 seconds.
  2. periodSeconds: the probe runs every 10 seconds, sending an HTTP GET request for /q/health/live to the Pod's port 8080.
  3. failureThreshold: a successful HTTP response (typically 200) means the Pod is healthy; if 3 consecutive checks fail, Kubernetes considers the Pod dead and restarts it.
  4. Restart: once the Pod is judged dead, Kubernetes kills it and recreates a new one according to the Pod's configuration.

Taken as a whole, health checks bring the following advantages:

  1. Automated health checking: Kubernetes checks Pod health periodically with no manual intervention
  2. High availability: as soon as an unhealthy Pod is detected, Kubernetes acts quickly and restarts it to keep the service available
  3. Stability: detecting and handling failures promptly improves the stability of the whole system
  4. Less human intervention: automated health checks reduce manual operations and improve operational efficiency

The figure below shows the probes' execution order: once the startupProbe completes, control passes on to the liveness and readiness probes.


From https://blog.devgenius.io/k8s-for-de-probes-b598a1adeecf

