2024 iThome Ironman Challenge (鐵人賽)

DAY 9
Continuing from the previous chapter, I added an API to make it easier to observe requests coming in from outside.

    @GET
    @Path("/longtime")
    @Produces(MediaType.APPLICATION_JSON)
    public String day09(@QueryParam(value = "time") Long time) {
        try {
            log.infof("Sleep %d", time);
            Thread.sleep(Duration.ofSeconds(time));
            log.info("Get Data.");
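        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still see the interruption.
            Thread.currentThread().interrupt();
            log.warn("Sleep interrupted.", e);
        }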

        return "ithomeHello";
    }

I also added a class that observes the Quarkus lifecycle.

@ApplicationScoped
public class LifecycleBean {
    @Inject
    Logger log;

    void onStart(@Observes StartupEvent event) {
        log.info("The quarkus application is starting...");
    }

    void onStop(@Observes ShutdownEvent event) {
        log.info("The quarkus application is stopping...");

    }
}

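Each scenario below follows these steps and is observed the same way as in the previous chapter:

  1. Deploy the YAML
  2. Delete the Pod
  3. Once the previous step has been issued, send the HTTP requests with the script below
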
#!/bin/bash

# Quote the URLs so the shell does not treat "?" as a glob character.
curl "http://localhost:9090/hello/longtime?time=5"
curl "http://localhost:9090/hello/longtime?time=10"
curl "http://localhost:9090/hello/longtime?time=20"

Scenario 1: QUARKUS_SHUTDOWN_DELAY

The following parameter and configuration are used for verification.

QUARKUS_SHUTDOWN_DELAY=35s

apiVersion: apps/v1
kind: Deployment
metadata:
 ...
  name: gracefulshutdown
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: gracefulshutdown
      app.kubernetes.io/version: day08
  template:
    ...
    spec:
      containers:
        - env:
            - name: KUBERNETES_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: QUARKUS_SHUTDOWN_DELAY
              value: "35s"
          image: registry.hub.docker.com/cch0124/gracefulshutdown:day08.3
          ...

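The first log line is the request sent immediately after the Pod was deleted. Within the 35s window three requests were accepted, but the last one never finished. The sequence was:

  1. After the container receives SIGTERM, Quarkus waits for the number of seconds set in QUARKUS_SHUTDOWN_DELAY
  2. The default terminationGracePeriodSeconds of 30s runs out first, so KILL is sent to the container immediately
  3. The container is killed outright and never gets to run the rest of its shutdown safely
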
2024-08-31 09:39:07,855 INFO  [org.cch.ExampleResource] (executor-thread-6) Sleep 5
2024-08-31 09:39:09,089 INFO  [org.cch.out.DeviceToPgDatabase] (executor-thread-1) save completed.
2024-08-31 09:39:09,090 INFO  [org.cch.out.AlertConsumer] (executor-thread-7) Is OK.
2024-08-31 09:39:09,345 INFO  [io.sma.health] (vert.x-eventloop-thread-21) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"SmallRye Reactive Messaging - readiness check","status":"UP","data":{"deviceIn":"[OK]","deviceOut":"[OK]"}},{"name":"Graceful Shutdown","status":"DOWN"},{"name":"Database connections health check","status":"UP","data":{"<default>":"UP"}}]}
2024-08-31 09:39:12,856 INFO  [org.cch.ExampleResource] (executor-thread-6) Get Data.
2024-08-31 09:39:12,863 INFO  [org.cch.ExampleResource] (executor-thread-6) Sleep 10
...
2024-08-31 09:39:14,090 INFO  [org.cch.out.AlertConsumer] (executor-thread-1) Is OK.
...
2024-08-31 09:39:19,091 INFO  [org.cch.out.AlertConsumer] (executor-thread-7) Is OK.
2024-08-31 09:39:19,346 INFO  [io.sma.health] (vert.x-eventloop-thread-23) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"SmallRye Reactive Messaging - readiness check","status":"UP","data":{"deviceIn":"[OK]","deviceOut":"[OK]"}},{"name":"Graceful Shutdown","status":"DOWN"},{"name":"Database connections health check","status":"UP","data":{"<default>":"UP"}}]}
2024-08-31 09:39:22,863 INFO  [org.cch.ExampleResource] (executor-thread-6) Get Data.
2024-08-31 09:39:22,869 INFO  [org.cch.ExampleResource] (executor-thread-6) Sleep 20
...
2024-08-31 09:39:24,091 INFO  [org.cch.out.AlertConsumer] (executor-thread-7) Is OK.
2024-08-31 09:39:29,092 INFO  [org.cch.out.AlertConsumer] (executor-thread-1) Is OK.
2024-08-31 09:39:29,345 INFO  [io.sma.health] (vert.x-eventloop-thread-25) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"SmallRye Reactive Messaging - readiness check","status":"UP","data":{"deviceIn":"[OK]","deviceOut":"[OK]"}},{"name":"Graceful Shutdown","status":"DOWN"},{"name":"Database connections health check","status":"UP","data":{"<default>":"UP"}}]}
2024-08-31 09:39:34,093 INFO  [org.cch.out.AlertConsumer] (executor-thread-7) Is OK.

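The simple diagram below summarizes the flow so far. The QUARKUS_SHUTDOWN_DELAY value is too large, so Quarkus never shuts down properly. To protect the application, terminationGracePeriodSeconds therefore has to be increased.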

https://ithelp.ithome.com.tw/upload/images/20240831/20104688a6ZWFoHP5l.png

Besides lengthening the grace period, would preStop help? Add a lifecycle.preStop field to the YAML.

apiVersion: apps/v1
kind: Deployment
metadata:
 ...
  name: gracefulshutdown
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: gracefulshutdown
      app.kubernetes.io/version: day08
  template:
    ...
    spec:
      containers:
        - env:
            - name: KUBERNETES_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: QUARKUS_SHUTDOWN_DELAY
              value: "35s"
          image: registry.hub.docker.com/cch0124/gracefulshutdown:day08.3
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "15"]
          ...

The diagram below shows what happens:

https://ithelp.ithome.com.tw/upload/images/20240831/201046887lepI3kCWs.png

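  1. The Pod is deleted and the HTTP requests are sent at the same time
  2. preStop delays for 15 seconds
  3. During the preStop delay, Quarkus keeps handling requests
  4. When the 15-second preStop delay ends, SIGTERM is sent
  5. Quarkus receives the signal and starts the QUARKUS_SHUTDOWN_DELAY phase; at this point only 15 seconds of terminationGracePeriodSeconds remain
  6. When those remaining 15 seconds elapse, KILL is sent; Quarkus cannot finish its 35s delay and is killed outright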
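
The arithmetic behind steps 4 to 6 can be sketched in a few lines of Java. This is only an illustration under the assumption, taken from the steps above, that the grace period starts counting at Pod deletion and that preStop consumes part of it. The class and method names (ShutdownTimeline, remainingAfterSigterm, killedDuringDelay) are hypothetical and are not part of Quarkus or Kubernetes.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not a Quarkus or Kubernetes API.
final class ShutdownTimeline {

    /** Time left for the application between SIGTERM and KILL. */
    static Duration remainingAfterSigterm(Duration gracePeriod, Duration preStop) {
        Duration left = gracePeriod.minus(preStop);
        // Clamp to zero: the budget cannot be negative in this simplified model.
        return left.isNegative() ? Duration.ZERO : left;
    }

    /** True when the shutdown delay alone is longer than the time left after SIGTERM. */
    static boolean killedDuringDelay(Duration gracePeriod, Duration preStop, Duration shutdownDelay) {
        return shutdownDelay.compareTo(remainingAfterSigterm(gracePeriod, preStop)) > 0;
    }

    public static void main(String[] args) {
        Duration grace = Duration.ofSeconds(30);   // default terminationGracePeriodSeconds
        Duration preStop = Duration.ofSeconds(15); // "sleep 15" in the preStop hook
        Duration delay = Duration.ofSeconds(35);   // QUARKUS_SHUTDOWN_DELAY

        System.out.println("remaining after SIGTERM: "
                + remainingAfterSigterm(grace, preStop).toSeconds() + "s");
        System.out.println("killed during delay: " + killedDuringDelay(grace, preStop, delay));
    }
}
```

Setting preStop to zero in the same sketch models the first deployment in this scenario, where the 35s delay already exceeds the full 30s grace period.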

When Quarkus shuts down gracefully, this example should log the following messages:

[org.cch.LifecycleBean] (Shutdown thread) The quarkus application is stopping...
[io.quarkus] (Shutdown thread) gracefulshutdown stopped in 35.715s

QUARKUS_SHUTDOWN_DELAY essentially gives the application a buffer period. During it, the readiness probe status switches to DOWN so that other services stop sending requests to this Pod.

Scenario 2: QUARKUS_SHUTDOWN_TIMEOUT

Straight to the YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
...
  name: gracefulshutdown
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: gracefulshutdown
      app.kubernetes.io/version: day08
  template:
   ...
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - env:
            - name: KUBERNETES_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: QUARKUS_SHUTDOWN_DELAY
              value: "15s"
            - name: QUARKUS_SHUTDOWN_TIMEOUT
              value: "30s"
          image: registry.hub.docker.com/cch0124/gracefulshutdown:day08.3
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "10"]
                ...

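Deploy and verify. With QUARKUS_SHUTDOWN_TIMEOUT configured, the message All HTTP requests complete appears once every in-flight HTTP request has been handled. This parameter therefore works at a completely different level from QUARKUS_SHUTDOWN_DELAY.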

2024-08-31 12:55:59,238 INFO  [io.qua.ver.htt.run.fil.GracefulShutdownFilter] (executor-thread-1) All HTTP requests complete
2024-08-31 12:55:59,239 INFO  [org.cch.LifecycleBean] (Shutdown thread) The quarkus application is stopping...

2024-08-31 12:56:05,607 INFO  [io.quarkus] (Shutdown thread) gracefulshutdown stopped in 32.687s

The flow is shown below:

https://ithelp.ithome.com.tw/upload/images/20240831/20104688CS3lNNpk7j.png
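
As a rough sanity check of the values in this scenario, the sketch below adds up the worst case of each phase and compares it with terminationGracePeriodSeconds. It assumes the phases run strictly one after another (preStop, then the shutdown delay, then at most the shutdown timeout) and ignores the time the stop handlers themselves take. GracePeriodBudget and its methods are hypothetical names used only for illustration.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not a Quarkus or Kubernetes API.
final class GracePeriodBudget {

    /** Worst-case time from Pod deletion until the application has stopped. */
    static Duration worstCase(Duration preStop, Duration shutdownDelay, Duration shutdownTimeout) {
        return preStop.plus(shutdownDelay).plus(shutdownTimeout);
    }

    /** True when the worst case still fits inside the grace period. */
    static boolean fits(Duration gracePeriod, Duration preStop,
                        Duration shutdownDelay, Duration shutdownTimeout) {
        return worstCase(preStop, shutdownDelay, shutdownTimeout).compareTo(gracePeriod) <= 0;
    }

    public static void main(String[] args) {
        Duration grace = Duration.ofSeconds(60);   // terminationGracePeriodSeconds
        Duration preStop = Duration.ofSeconds(10); // "sleep 10" in the preStop hook
        Duration delay = Duration.ofSeconds(15);   // QUARKUS_SHUTDOWN_DELAY
        Duration timeout = Duration.ofSeconds(30); // QUARKUS_SHUTDOWN_TIMEOUT

        // 10s + 15s + 30s = 55s, which is within the 60s grace period.
        System.out.println("worst case: " + worstCase(preStop, delay, timeout).toSeconds() + "s");
        System.out.println("fits: " + fits(grace, preStop, delay, timeout));
    }
}
```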

If, while the Pod is being deleted, we also send one more 35-second request, curl http://localhost:9090/hello/longtime?time=35, what happens with the configuration above?

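Right: it is blocked during the QUARKUS_SHUTDOWN_DELAY phase. The request cannot reach the Quarkus service because the Endpoints resource no longer contains the Pod's address. And what if we change QUARKUS_SHUTDOWN_TIMEOUT to 5s? The message below appears: the request curl http://localhost:9090/hello/longtime?time=20 runs past the 5s timeout, so the application goes ahead with its shutdown anyway, and the client usually receives a 503 status code.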

2024-08-31 13:13:49,271 ERROR [io.qua.run.shu.ShutdownRecorder] (Shutdown thread) Timed out waiting for graceful shutdown, shutting down anyway. [Error Occurred After Shutdown]
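
The "wait for in-flight work, but only up to a timeout" behaviour can be illustrated with plain Java. The sketch below is an analogy built on ExecutorService, not how Quarkus implements QUARKUS_SHUTDOWN_TIMEOUT internally. DrainWithTimeout and its methods are hypothetical names, and the durations are scaled down to milliseconds so the example runs quickly.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Analogy only: models "wait for running work up to a timeout, then stop anyway".
final class DrainWithTimeout {

    /**
     * Stops accepting new tasks, then waits up to timeoutMillis for running ones.
     * Returns true if all tasks finished in time, false if we gave up waiting.
     */
    static boolean drain(ExecutorService pool, long timeoutMillis) {
        pool.shutdown(); // no new tasks are accepted from here on
        boolean finished = false;
        try {
            finished = pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!finished) {
            pool.shutdownNow(); // timed out: interrupt whatever is still running
        }
        return finished;
    }

    /** Submits one "request" that sleeps for taskMillis, then drains with the given timeout. */
    static boolean run(long taskMillis, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.submit(() -> {
            try {
                Thread.sleep(taskMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return drain(pool, timeoutMillis);
    }

    public static void main(String[] args) {
        // Long request, short timeout: comparable to time=20 with a 5s timeout.
        System.out.println("finished in time: " + run(400, 50));
        // Short request, generous timeout: comparable to the 30s timeout case.
        System.out.println("finished in time: " + run(20, 2000));
    }
}
```

In the first call the task outlives the timeout, which corresponds to the "Timed out waiting for graceful shutdown" case above; in the second the task completes and the pool drains cleanly.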

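To summarize so far.

Graceful shutdown flow for the Pod and the service

  1. preStop runs first
  2. As soon as preStop completes, SIGTERM is sent to the container
  3. Quarkus receives SIGTERM and starts its graceful shutdown
  4. If shutdown does not finish within terminationGracePeriodSeconds (which also covers the time spent in preStop), KILL is sent to force termination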

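Description of the Quarkus settings

  1. QUARKUS_SHUTDOWN_DELAY: a pre-shutdown phase that acts as a buffer before the service becomes unavailable. Traffic can still be received during this window.
  2. QUARKUS_SHUTDOWN_TIMEOUT: the maximum time to wait for in-flight HTTP requests.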

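Also, lengthening terminationGracePeriodSeconds on its own currently achieves nothing for Quarkus, because it cuts off all traffic immediately on receiving SIGTERM. Combined with preStop as shown below, however, a delay can be introduced before Quarkus receives SIGTERM; the effect is covered in the previous chapter.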

apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: gracefulshutdown
spec:
  replicas: 1
  ...
  template:
    ...
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - env:
            - name: KUBERNETES_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          image: registry.hub.docker.com/cch0124/gracefulshutdown:day08.3
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "30"]
                ...

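In short, combining the service's own graceful shutdown with the Pod's graceful shutdown is the best practice: when the application receives SIGTERM, it knows how to shut itself down rather than being forcibly killed by Kubernetes. If the work the service has to finish at shutdown takes longer than 30s, increase terminationGracePeriodSeconds beyond its default.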

