12th iThome Ironman Contest

DAY 25

DevOps

Docker-mon Ultimate Evolution ~~ Kubernetes-mon series, part 25

Day-25 Getting to Know DaemonSet

12th Ironman Contest, devops, kubernetes

flynncanfly

Team 404 Not Found

2020-10-10 11:19:15

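Preface

Remember the Deployment we covered earlier? A Deployment manages ReplicaSets one-to-many, and each ReplicaSet in turn manages Pods one-to-many, using replicas and horizontal scaling to absorb heavy workloads. Some situations, however, are beyond what a Deployment can handle, and DaemonSet and StatefulSet were created to address them.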

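What is a DaemonSet?

Description

A DaemonSet ensures that all (or only selected) nodes each run one copy of a specified Pod. Whenever a new Node joins the cluster, the DaemonSet adds that Pod to it, and whenever a Node is removed from the cluster, the Pod on that Node is removed as well. To run only on particular nodes, combine it with the techniques from earlier chapters, Day-23 Affinity and Anti-Affinity and Day-24 Taints and Tolerations.

Finally, deleting a DaemonSet also deletes all the Pods it created.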

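When to use it?

Cluster storage daemons, such as ceph and glusterd.
Log collectors, such as fluent-bit, fluentd, and logstash.
Node monitoring agents, such as the Prometheus Node Exporter and the Datadog agent.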

How to write a DaemonSet?

YAML

As usual, let's start with a DaemonSet YAML converted from our earlier Deployment.

daemonset.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ironman
  labels:
    name: ironman
    app: ironman
spec:
  minReadySeconds: 5
  selector:
    matchLabels:
      app: ironman
  template:
    metadata:
      labels:
        app: ironman
    spec:
      nodeSelector:
        app: ironman
      containers:
       - name: ironman
         image: ghjjhg567/ironman:latest
         imagePullPolicy: Always
         ports:
           - containerPort: 8100
         envFrom:
           - secretRef:
               name: ironman-config
         command: ["./docker-entrypoint.sh"]
         resources:
           limits:
             memory: 200Mi
           requests:
             cpu: 100m
             memory: 200Mi
       - name: redis
         image: redis:4.0
         imagePullPolicy: Always
         ports:
           - containerPort: 6379
       - name: nginx
         image: nginx
         imagePullPolicy: Always
         ports:
           - containerPort: 80
         volumeMounts:
           - mountPath: /etc/nginx/nginx.conf
             name: nginx-conf-volume
             subPath: nginx.conf
             readOnly: true
           - mountPath: /etc/nginx/conf.d/default.conf
             subPath: default.conf
             name: nginx-route-volume
             readOnly: true
         readinessProbe:
           httpGet:
             path: /v1/hc
             port: 80
           initialDelaySeconds: 5
           periodSeconds: 10
      volumes:
        - name: nginx-conf-volume
          configMap:
            name: nginx-config
        - name: nginx-route-volume
          configMap:
            name: nginx-route-volume

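Comparing the Deployment with the DaemonSet, a few things stand out:

The DaemonSet's kind is DaemonSet; the rest of the basic info is no different from a Deployment's.
The DaemonSet controller dispatches pods to nodes and ensures that every node (or every selected node) runs exactly one such pod, so the pod template section is kept.
Because a DaemonSet no longer manages its Pods through a ReplicaSet, the ReplicaSet- and Deployment-specific settings (such as replicas) are naturally removed.
We want the DaemonSet's pods to run only on particular nodes, so we use nodeSelector. Readers who have forgotten nodeSelector can look back at Day-23 Affinity and Anti-Affinity.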
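
The nodeSelector in the manifest above could also be expressed with node affinity, the mechanism covered in Day-23. The fragment below is a minimal sketch of that alternative, assuming the same app=ironman node label; it is not taken from the author's repo and would sit under .spec.template.spec in place of nodeSelector.

```yaml
# Sketch only: replaces "nodeSelector: {app: ironman}" in the pod template.
affinity:
  nodeAffinity:
    # Hard requirement at scheduling time; already-running pods are not evicted
    # if the node label is removed later.
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: app          # node label key used in this article
              operator: In
              values:
                - ironman
```

For a single exact-match label, nodeSelector is shorter; node affinity becomes useful once you need operators such as NotIn or Exists.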

.spec.selector:

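An object made up of two fields, matchLabels and matchExpressions. When both are set, their conditions are ANDed, and the selector must match the labels in .spec.template.metadata.labels.

matchLabels: the same as a ReplicaSet's .spec.selector. The DaemonSet controller matches every pod whose labels satisfy the selector and creates or deletes those pods accordingly.
matchExpressions: a more expressive version of matchLabels. Each expression specifies a key, a list of values, and an operator that relates the key to those values.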
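
To make matchExpressions concrete, the selector from daemonset.yaml could be written as follows. This is a minimal sketch, not the author's configuration: the tier key and its debug value are hypothetical and only illustrate a second operator.

```yaml
# Sketch only: goes under .spec of the DaemonSet in place of the matchLabels selector.
selector:
  matchExpressions:
    # Equivalent to "matchLabels: {app: ironman}".
    - key: app
      operator: In
      values:
        - ironman
    # Hypothetical extra condition: exclude pods labeled tier=debug.
    # NotIn also matches pods that have no "tier" label at all.
    - key: tier
      operator: NotIn
      values:
        - debug
```

The available operators are In, NotIn, Exists, and DoesNotExist; the last two take no values list.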

Deployment

Step0

As usual, start by deleting every pod resource in the default namespace so we have a clean test environment.

$ kubectl get pod
No resources found in default namespace.

Step2

Decide which nodes to deploy to. Here we expect pods to run on only two nodes, so we add the label app=ironman to just those two nodes and pair it with the nodeSelector in the YAML.

$ kubectl get node
NAME                                                STATUS   ROLES    AGE   VERSION
gke-my-first-cluster-1-default-pool-dddd2fae-j0k1   Ready    <none>   12d   v1.18.6-gke.3504
gke-my-first-cluster-1-default-pool-dddd2fae-rfl8   Ready    <none>   12d   v1.18.6-gke.3504
gke-my-first-cluster-1-default-pool-dddd2fae-tz38   Ready    <none>   12d   v1.18.6-gke.3504

$ kubectl label nodes gke-my-first-cluster-1-default-pool-dddd2fae-j0k1 app=ironman
node/gke-my-first-cluster-1-default-pool-dddd2fae-j0k1 labeled
$ kubectl label nodes gke-my-first-cluster-1-default-pool-dddd2fae-rfl8 app=ironman
node/gke-my-first-cluster-1-default-pool-dddd2fae-rfl8 labeled

Step3

Deploy the DaemonSet with kubectl as usual, and observe that the DaemonSet indeed places pods only on the two labeled nodes.

$ kubectl apply -f daemonset.yaml
daemonset.apps/ironman created

$ kubectl get daemonset --watch
NAME      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
ironman   2         2         0       2            0           app=ironman     10s
ironman   2         2         1       2            0           app=ironman     22s
ironman   2         2         2       2            0           app=ironman     23s
ironman   2         2         2       2            2           app=ironman     28s

Once deployed, the DaemonSet's status can be seen in the GKE console, which also confirms that the pod is running on each of the labeled nodes.

How Daemon Pods are scheduled

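Before v1.12, DaemonSet pods were not placed by the Kubernetes scheduler. The DaemonSet controller created them with .spec.nodeName already set, so the scheduler ignored them. This caused the following problems:

Inconsistent Pod behavior: an ordinary Pod waiting to be scheduled is created in the Pending state, but Pods created by a DaemonSet never pass through Pending, which confuses users.
Pod preemption: with preemption enabled, the DaemonSet controller makes scheduling decisions without considering Pod priority or preemption.

For these reasons, since v1.12 all DaemonSet pods are scheduled by the default scheduler.

Scheduled by default scheduler

To hand its pods to the Kubernetes scheduler, the DaemonSet controller adds a nodeAffinity term to each DaemonSet pod instead of setting .spec.nodeName. The scheduler then binds the pod to the target node. If the DaemonSet pod already has a node affinity, it is replaced.

Tips: readers who have forgotten how nodeAffinity works can refer to Day-23 Affinity and Anti-Affinity.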

nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchFields:
      - key: metadata.name
        operator: In
        values:
        - target-host-name

Check

Let's look at the DaemonSet pod we just created. The Events show that the pod was indeed scheduled by default-scheduler, while the DaemonSet controller is left to manage the DaemonSet itself.

$ kubectl describe pod ironman-jtntw
...
..
Events:
  Type    Reason                   Age                From                                                        Message
  ----    ------                   ----               ----                                                        -------
  Normal  Scheduled                21s                default-scheduler                                           Successfully assigned default/ironman-jtntw to gke-my-first-cluster-1-default-pool-dddd2fae-j0k1
  Normal  LoadBalancerNegNotReady  21s (x2 over 21s)  neg-readiness-reflector                                     Waiting for pod to become healthy in at least one of the NEG(s): [k8s1-11092c60-default-ironman-80-859c8e3a]
  Normal  Pulling                  21s                kubelet, gke-my-first-cluster-1-default-pool-dddd2fae-j0k1  Pulling image "ghjjhg567/ironman:latest"
  Normal  Pulled                   18s                kubelet, gke-my-first-cluster-1-default-pool-dddd2fae-j0k1  Successfully pulled image "ghjjhg567/ironman:latest"
  Normal  Created                  18s                kubelet, gke-my-first-cluster-1-default-pool-dddd2fae-j0k1  Created container ironman
  Normal  Started                  18s                kubelet, gke-my-first-cluster-1-default-pool-dddd2fae-j0k1  Started container ironman
  Normal  Pulling                  18s                kubelet, gke-my-first-cluster-1-default-pool-dddd2fae-j0k1  Pulling image "redis:4.0"
  Normal  Pulled                   14s                kubelet, gke-my-first-cluster-1-default-pool-dddd2fae-j0k1  Successfully pulled image "redis:4.0"
  Normal  Created                  14s                kubelet, gke-my-first-cluster-1-default-pool-dddd2fae-j0k1  Created container redis
  Normal  Started                  14s                kubelet, gke-my-first-cluster-1-default-pool-dddd2fae-j0k1  Started container redis
  Normal  Pulling                  14s                kubelet, gke-my-first-cluster-1-default-pool-dddd2fae-j0k1  Pulling image "nginx"
  Normal  Pulled                   14s                kubelet, gke-my-first-cluster-1-default-pool-dddd2fae-j0k1  Successfully pulled image "nginx"
  Normal  Created                  14s                kubelet, gke-my-first-cluster-1-default-pool-dddd2fae-j0k1  Created container nginx
  Normal  Started                  13s                kubelet, gke-my-first-cluster-1-default-pool-dddd2fae-j0k1  Started container nginx
  Normal  LoadBalancerNegReady     2s                 neg-readiness-reflector                                     Pod has become Healthy in NEG "Key{\"k8s1-11092c60-default-ironman-80-859c8e3a\", zone: \"asia-east1-a\"}" attached to BackendService "Key{\"k8s1-11092c60-default-ironman-80-859c8e3a\"}". Marking condition "cloud.google.com/load-balancer-neg-ready" to True.

In addition, the toleration node.kubernetes.io/unschedulable:NoSchedule is automatically added to the DaemonSet's Pods, so the default scheduler disregards a node's unschedulable status when scheduling DaemonSet Pods.

Taints and Tolerations with DaemonSet

The DaemonSet controller automatically adds the following tolerations to DaemonSet Pods:

Toleration                                Effect      Version  Description
node.kubernetes.io/not-ready              NoExecute   1.13+    DaemonSet Pods are not evicted when there are node problems such as a network partition.
node.kubernetes.io/unreachable            NoExecute   1.13+    DaemonSet Pods are not evicted when there are node problems such as a network partition.
node.kubernetes.io/disk-pressure          NoSchedule  1.8+
node.kubernetes.io/memory-pressure        NoSchedule  1.8+
node.kubernetes.io/unschedulable          NoSchedule  1.12+    DaemonSet Pods tolerate the unschedulable attribute set by the default scheduler.
node.kubernetes.io/network-unavailable    NoSchedule  1.12+    DaemonSet Pods that use the host network tolerate the network-unavailable attribute set by the default scheduler.
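
For reference, this is roughly how those tolerations appear in the spec of a running DaemonSet Pod, for example in the output of kubectl get pod -o yaml. It is a sketch reconstructed from the table, not output captured from the author's cluster, and you would not normally write these entries yourself because the controller adds them.

```yaml
# Sketch only: the .spec.tolerations of a DaemonSet pod after the controller has filled them in.
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists          # tolerate the taint regardless of its value
    effect: NoExecute
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
  - key: node.kubernetes.io/disk-pressure
    operator: Exists
    effect: NoSchedule
  - key: node.kubernetes.io/memory-pressure
    operator: Exists
    effect: NoSchedule
  - key: node.kubernetes.io/unschedulable
    operator: Exists
    effect: NoSchedule
  # Added only for pods that use the host network (hostNetwork: true).
  - key: node.kubernetes.io/network-unavailable
    operator: Exists
    effect: NoSchedule
```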

Communicating with Daemon Pods

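There are several possible patterns for communicating with the Pods in a DaemonSet:

Push: the Pods in the DaemonSet are configured to send updates to another service, such as a stats database. They have no clients.
NodeIP and known port: the Pods in the DaemonSet can use a hostPort, so that they are reachable via the node IPs. Clients obtain the list of node IPs by some means and know the port by convention.
DNS: create a Headless Service with the same Pod selector, then discover the DaemonSet's Pods through the endpoints resource or by retrieving multiple A records from DNS.
Service: create a Service with the same Pod selector and use it to reach a daemon on a random node (there is no way to reach a specific node).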
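
The NodeIP and known port pattern comes down to one extra field on the container port. Below is a minimal sketch based on the nginx container from daemonset.yaml; the hostPort value 8080 is a hypothetical choice, not something the author configured.

```yaml
# Sketch only: a container entry under .spec.template.spec.containers.
- name: nginx
  image: nginx
  ports:
    - containerPort: 80   # port the container listens on
      hostPort: 8080      # hypothetical: exposes the pod on <node IP>:8080
      protocol: TCP
```

Because a DaemonSet runs at most one such pod per node, the same hostPort does not collide between replicas, which is the usual concern with hostPort in a Deployment.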

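Headless Service deserves a special mention here.

Headless Service

First, let's review how an ordinary Service works (see Day-21 Service Kube-dns and Kube-proxy):

Every Service has its own clusterIP.
kube-dns resolves the Service's domain name to that clusterIP.
kube-proxy then routes traffic for the clusterIP to the Pods via iptables.

So how is a Headless Service different?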

headless_service.yaml

apiVersion: v1
kind: Service
metadata:
  name: ironman
  labels:
    app: ironman
spec:
  clusterIP: None
  ports:
    - name: ironman
      protocol: TCP
      port: 80
      targetPort: 80
  selector:
    app: ironman

The key difference is that clusterIP is None.

$ kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
ironman      ClusterIP   None         <none>        80/TCP    2s

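So what changes when there is no clusterIP?

There is no ClusterIP, so when a client looks up the Service, the k8s DNS has no ClusterIP to return.
If the Service has a label selector, k8s creates the corresponding endpoints, and a DNS lookup of the Service returns the endpoint list directly (as A records). A client can therefore use the Service's domain name to reach the Pods directly, which also means that a headless Service provides no load balancing of its own.
If the Service has no label selector, there is neither a ClusterIP nor any automatically created Endpoints.

In general, it is StatefulSet that uses Headless Service most often!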

Update DaemonSet

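Node labels update: when node labels change, the DaemonSet promptly adds Pods to newly matching nodes and removes Pods from nodes that no longer match.
Update DaemonSet Pod: modifying a Pod created by the DaemonSet directly does not change the DaemonSet's Pod template, so newly created DaemonSet Pods are still generated from the original template.
Delete DaemonSet: if you delete a DaemonSet with kubectl and specify --cascade=false (spelled --cascade=orphan in kubectl 1.20 and later), the Pods that are already running are left in place. If you then create a new DaemonSet with the same selector, it adopts the existing Pods. If any Pods need replacing, the DaemonSet replaces them according to its updateStrategy.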
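
The updateStrategy mentioned above lives under the DaemonSet's spec. Here is a minimal sketch, assuming the ironman DaemonSet from this article; the values shown are, as far as I know, the apps/v1 defaults rather than settings taken from the author's repo.

```yaml
# Sketch only: merge into the spec of daemonset.yaml.
spec:
  updateStrategy:
    type: RollingUpdate      # the alternative is OnDelete
    rollingUpdate:
      maxUnavailable: 1      # replace pods on at most one node at a time
```

With OnDelete, changing the template does nothing until you delete an old Pod by hand; only then is its replacement created from the new template.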

Github Repo

All the code for this chapter is in the day-25 branch of the GitHub project below.

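Afterword

That wraps up the DaemonSet chapter. I hope it serves as a starting point and encourages you to use DaemonSets for tasks such as collecting logs, while keeping their Pods off nodes that are not running as workers. A later chapter will introduce StatefulSet, so stay tuned!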

Reference

https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/

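1 comment

Cynthia

iT邦 Newbie Level 5 ‧ 2020-12-30 19:04:27

Sorry, may I ask a question?
So after v1.12, Pod scheduling by default goes back to the default scheduler? But the article then says "there is still a way to have DaemonSet pods managed by the Kubernetes scheduler...", so is the default Pod scheduling done by the k8s scheduler (as I understand it, Kubernetes scheduler = default scheduler)? Or is it done by the DaemonSet controller, with the option of editing the YAML to switch to the k8s scheduler?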
