Day 29: Common Errors When Working with Kubernetes

2022 iThome Ironman Contest

DAY 29

Software Development

Part 29 of the series 被容器束縛住的小宇宙 (The Little Universe Bound by Containers)

14th Ironman Contest

Team NUTC_IMAC_柬埔寨旅遊團

2022-10-14 13:14:25

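Over these past few days, my introduction to Kubernetes has nearly come to an end. I hope these posts have helped you understand K8s, Docker, and VMs a little better. While practicing on my own before writing them, I ran into all kinds of problems, big and small, and some errors kept coming back simply because I forgot a step that should have been done beforehand.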

Today I'll go over the mistakes I made most often and the things worth watching out for!

After rebooting the machine, Kubernetes suddenly stops working

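Have you ever run into this? The day before, you were happily building a small project on K8s, reached a good stopping point, and decided to rest and continue tomorrow. Then the next morning you boot the computer and are greeted by this: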

ubuntu@ubuntu:~$ kubectl get node
The connection to the server 192.168.xx.xx:6443 was refused - did you specify the right host or port?

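Don't panic when you see this. First check whether swap is on again: swapoff -a only lasts until the next reboot, and by default the kubelet refuses to start while swap is enabled, because Kubernetes expects predictable memory behavior rather than pages being swapped out to disk. On a kubeadm-style cluster the API server itself runs under the kubelet, so when the kubelet is down, port 6443 refuses connections. Run this: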

sudo swapoff -a && sudo systemctl restart kubelet.service

This turns swap off again and restarts the kubelet. Check the node status once more and you'll see it's back to normal!

ubuntu@ubuntu:~$ kubectl get node
NAME     STATUS   ROLES    AGE   VERSION
ubuntu   Ready    master   18d   v1.18.20
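To avoid repeating this after every reboot, the swap entry can also be commented out of /etc/fstab so swap is not re-enabled at boot. Below is a minimal sketch, assuming swap is configured in /etc/fstab (for example a /swap.img line, as on a default Ubuntu install); swap_active and comment_swap_in_fstab are hypothetical helper names, not part of any Kubernetes tool.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; assumes swap is set up in an fstab-style file.

# swap_active [FILE]: succeed if FILE (default /proc/swaps) lists any swap device.
# /proc/swaps always starts with one header line, so more than one line means swap is on.
swap_active() {
  local swaps_file="${1:-/proc/swaps}"
  [ "$(wc -l < "$swaps_file")" -gt 1 ]
}

# comment_swap_in_fstab FILE: prefix every uncommented entry whose filesystem
# type (field 3) is "swap" with '#', so swap stays off after the next boot.
# Edits FILE in place; already-commented lines are left untouched.
comment_swap_in_fstab() {
  local fstab="$1" tmp
  tmp="$(mktemp)"
  awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }' "$fstab" > "$tmp" &&
    cat "$tmp" > "$fstab"   # write back via cat to keep the original file's owner and mode
  rm -f "$tmp"
}
```

Since /etc/fstab is owned by root, this would be run from a root shell (for example after sudo -i and sourcing the functions): back up the file with cp /etc/fstab /etc/fstab.bak, run swapoff -a and comment_swap_in_fstab /etc/fstab, then systemctl restart kubelet.service. The entries are commented out rather than deleted, so swap can be restored later by removing the leading #.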

Another common case: after creating a Pod, the STATUS column shows something other than Running, for example:

ubuntu@ubuntu:~$ kubectl get pod 
NAME         READY   STATUS             RESTARTS   AGE
ubuntu-pod   0/1     ImagePullBackOff   0          20s

Don't panic here either. Start by checking what went wrong inside the Pod:

ubuntu@ubuntu:~$ kubectl describe pod ubuntu-pod
Name:         ubuntu-pod
Namespace:    default
Priority:     0
Node:         ubuntu/192.168.64.11
Start Time:   Thu, 15 Sep 2022 10:53:44 +0000
Labels:       app=test
Annotations:
Status:       Pending
IP:           10.244.0.10
IPs:
  IP:  10.244.0.10
Containers:
  ubuntu:
    Container ID:  
    Image:         ubuntu:200.4
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      bash
      -c
      for ((i = 0; ; i++)); do echo "$i: $(date)"; sleep 100; done
    State:          Waiting
      Reason:       ErrImagePull
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mcxxs (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-mcxxs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mcxxs
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  3m18s                 default-scheduler  Successfully assigned default/ubuntu-pod to ubuntu
  Normal   Pulling    108s (x4 over 3m19s)  kubelet            Pulling image "ubuntu:200.4"
  Warning  Failed     104s (x4 over 3m15s)  kubelet            Failed to pull image "ubuntu:200.4": rpc error: code = Unknown desc = Error response from daemon: manifest for ubuntu:200.4 not found: manifest unknown: manifest unknown
  Warning  Failed     104s (x4 over 3m15s)  kubelet            Error: ErrImagePull
  Warning  Failed     76s (x6 over 3m15s)   kubelet            Error: ImagePullBackOff
  Normal   BackOff    61s (x7 over 3m15s)   kubelet            Back-off pulling image "ubuntu:200.4"

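The most important part of this output is the Events section at the bottom. It contains the line Failed to pull image "ubuntu:200.4", which tells us the image could not be pulled: I mistyped the version (it should have been 20.04), and since no ubuntu image exists with the tag 200.4, Kubernetes reports an error. So next time you hit something similar, first look at what the Pod itself reports with kubectl describe, then debug based on the error. kubectl logs is also worth checking, but only once the container has actually started (for example, in a CrashLoopBackOff); with ImagePullBackOff the container never ran, so there are no logs to read yet.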
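When the describe output is long, it can help to filter it down to just the Warning events. The following is a minimal sketch, assuming the usual describe layout where the Events table is the last section and its first column is the event type; warning_events is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: warning_events is a hypothetical helper, not a kubectl feature.
# Reads `kubectl describe pod` output on stdin and prints only the Warning rows
# of the Events table, which is usually where the real cause shows up.
warning_events() {
  # Skip everything until the "Events:" header, then keep rows whose Type column is Warning.
  awk '/^Events:/ { in_events = 1; next } in_events && $1 == "Warning" { print }'
}
```

For example: kubectl describe pod ubuntu-pod | warning_events. For this particular mistake, correcting the tag to ubuntu:20.04 in the YAML and re-creating the Pod fixes it; because a Pod's container image is one of the few fields that can be changed in place, kubectl set image pod/ubuntu-pod ubuntu=ubuntu:20.04 should also work.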

These are the problems I ran into most often while practicing Kubernetes. If you run into them too, I hope these fixes help!

That more or less wraps up my introduction. See you tomorrow!
Bye, everyone!