iT邦幫忙

2025 iThome Ironman Contest

AI & Data

AIOps × Flows series, part 3

【Day 03】Environment Deployment II

  1. Install the Argo Rollouts controller (a Kubernetes controller) on the newly created aiops kind cluster
# Install the Argo Rollouts controller into the argo-rollouts namespace
kubectl create namespace argo-rollouts || true
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

kubectl get pods -n argo-rollouts -o wide

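※ If the pod stays Pending:
Possible causes and fixes:
(A) A control-plane taint (when the cluster has no worker nodes)

  • Symptom: kubectl describe pod shows no assigned node (node=) and a FailedScheduling event, but no insufficient-resource message
  • Fix: add tolerations to the Deployment so the pod may be scheduled onto the control-plane node: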
kubectl -n argo-rollouts patch deploy argo-rollouts -p '{
  "spec": { "template": { "spec": {
    "tolerations": [
      {"key":"node-role.kubernetes.io/control-plane","operator":"Exists","effect":"NoSchedule"},
      {"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}
    ]}}}}'
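
Before applying the tolerations patch, it can help to confirm that a node really carries the control-plane taint. The following is a minimal sketch, not part of the original steps: tainted_control_plane_nodes is a hypothetical helper name, and it assumes its input is one "node-name<TAB>taint-keys" line per node.

```shell
# Sketch only: tainted_control_plane_nodes is a hypothetical helper, not a kubectl feature.
# It reads "node-name<TAB>taint-keys" lines on stdin and prints the names of
# nodes whose taint keys include the control-plane or master role.
tainted_control_plane_nodes() {
  awk -F'\t' '$2 ~ /node-role\.kubernetes\.io\/(control-plane|master)/ { print $1 }'
}

# Example usage against the current cluster (requires kubectl access):
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}' \
#   | tainted_control_plane_nodes
```

If the helper prints nothing, no node has that taint and the Pending state probably has another cause.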

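(B) Insufficient ephemeral-storage

  • Symptom: the Events in kubectl describe pod show FailedScheduling ... Insufficient ephemeral-storage
  • Quick fix 1: remove the resource requests/limits
# Remove the whole containers[0].resources block (so that no request triggers the resource check).
# Note: a JSON Patch "remove" fails if the path does not exist.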
kubectl -n argo-rollouts patch deploy argo-rollouts --type='json' -p='[
  {"op":"remove","path":"/spec/template/spec/containers/0/resources"}
]'
kubectl -n argo-rollouts rollout restart deploy/argo-rollouts
kubectl -n argo-rollouts rollout status deploy/argo-rollouts --timeout=120s
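  • Quick fix 2: back the emptyDir volumes with memory (nothing is written to disk); note that these "add" operations replace any existing volumes and volumeMounts lists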
kubectl -n argo-rollouts patch deploy argo-rollouts --type='json' -p='[
  {"op":"add","path":"/spec/template/spec/volumes","value":[
    {"name":"plugin-bin","emptyDir":{"medium":"Memory","sizeLimit":"50Mi"}},
    {"name":"tmp","emptyDir":{"medium":"Memory","sizeLimit":"50Mi"}}
  ]},
  {"op":"add","path":"/spec/template/spec/containers/0/volumeMounts","value":[
    {"name":"plugin-bin","mountPath":"/home/argo-rollouts/plugin-bin"},
    {"name":"tmp","mountPath":"/tmp"}
  ]}
]'
kubectl -n argo-rollouts rollout restart deploy/argo-rollouts
kubectl get pods -n argo-rollouts -o wide
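
To decide between cause (A) and cause (B), the scheduler message in the pod's Events can be matched against a few known phrases. This is a rough sketch under assumptions: classify_pending is a hypothetical helper, and the matched phrases ("untolerated taint", "had taint", "Insufficient ephemeral-storage") are typical scheduler wording that may differ between Kubernetes versions.

```shell
# Sketch only: classify_pending is a hypothetical helper.
# It reads `kubectl describe pod` output on stdin and prints a rough
# guess at why the pod is Pending.
classify_pending() {
  events="$(cat)"
  case "$events" in
    *"Insufficient ephemeral-storage"*) echo "ephemeral-storage" ;;
    *"untolerated taint"*|*"had taint"*) echo "taint" ;;
    *FailedScheduling*) echo "other-scheduling" ;;
    *) echo "unknown" ;;
  esac
}

# Example usage (requires kubectl access; <pod-name> is a placeholder):
# kubectl -n argo-rollouts describe pod <pod-name> | classify_pending
```

A result of "taint" points to fix (A), "ephemeral-storage" to the quick fixes under (B); anything else needs a manual look at the Events.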
  2. Install Helm
set -e

# Pick a version (the latest stable release also works; v3.15.3 is used here as an example)
HELM_VER="v3.15.3"

# Detect the CPU architecture
ARCH="$(uname -m)"
case "$ARCH" in
  x86_64|amd64)  PKG="helm-${HELM_VER}-linux-amd64.tar.gz";  SUBDIR="linux-amd64" ;;
  aarch64|arm64) PKG="helm-${HELM_VER}-linux-arm64.tar.gz";  SUBDIR="linux-arm64" ;;
  *) echo "Unsupported arch: $ARCH"; exit 1 ;;
esac

# Download and install into ~/.local/bin
mkdir -p "$HOME/.local/bin"
curl -fsSL "https://get.helm.sh/${PKG}" -o "/tmp/${PKG}"
tar -xzf "/tmp/${PKG}" -C /tmp
mv "/tmp/${SUBDIR}/helm" "$HOME/.local/bin/helm"
rm -rf "/tmp/${SUBDIR}" "/tmp/${PKG}"

# Add to PATH (effective immediately in the current shell, and written to ~/.bashrc for future sessions)
if ! echo "$PATH" | grep -qF "$HOME/.local/bin"; then
  echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$HOME/.bashrc"
  export PATH="$HOME/.local/bin:$PATH"
fi

# Verify
helm version --short
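
The download step above does not check the tarball's integrity. As an optional hardening step, a checksum comparison could be added before extraction. The sketch below is an assumption-laden example: verify_sha256 is a hypothetical helper, it relies on the sha256sum tool being installed, and the ".sha256" URL suffix in the usage comment is assumed to be published alongside the release tarball.

```shell
# Sketch only: verify_sha256 is a hypothetical helper.
# Returns 0 when the SHA-256 digest of file $1 equals the expected hex digest $2.
verify_sha256() {
  file="$1"
  expected="$2"
  actual="$(sha256sum "$file" | awk '{print $1}')"
  [ "$actual" = "$expected" ]
}

# Example usage, placed between the curl and tar steps
# (assumes a checksum file exists at ${PKG}.sha256):
# expected="$(curl -fsSL "https://get.helm.sh/${PKG}.sha256")"
# verify_sha256 "/tmp/${PKG}" "$expected" || { echo "checksum mismatch" >&2; exit 1; }
```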

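※ Some components will be installed later, once the AIOps work actually begins
※ Plan for tomorrow: set up the monitoring mechanism and the filtering system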

