
2025 iThome Ironman Contest (鐵人賽)

DevOps

DevOps Evolution: From Jack-of-All-Trades Warrior to Security Gatekeeper series, part 20

Day 20|Observability End-to-End Monitoring: Prometheus × Grafana × ELK

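● Introduction

Day 20 marks the finale of the first phase.

In the previous posts we used Terraform × Helm × CI/CD to automate both infrastructure provisioning and application deployment.

But closing the DevOps loop takes more than deployment; you also have to monitor what you deployed:

🔍 Is my system healthy right now?

🔍 Where are the bottlenecks?

🔍 When something breaks, where do I look for logs?

So today we wrap up with Observability, building a complete monitoring pipeline that runs from Metrics and Logs through to Visualization.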

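● Core Concept: The Three Pillars 🪐

Observability = not looking at a single number, but piecing together the full picture from several dimensions.

1. Metrics → system health

▪ CPU, Memory, Request Latency, Error Rate

▪ Tool: Prometheus

2. Logs → what happened

▪ API errors, abnormal events, business records

▪ Tool: ELK (Elasticsearch, Logstash, Kibana); in this post Fluent-Bit ships the logs in place of Logstash

3. Tracing → the path a request takes through the system

▪ Which microservices does a request pass through? Where is the latency?

▪ Tools: Jaeger / OpenTelemetry (only touched on in this post)

📌 Today's focus is Prometheus × Grafana × ELK, filling in the most commonly used monitoring loop first.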

● Implementation Steps

1. Prepare the environment and required files

A local Minikube (or any K8s cluster)

👉 Command: minikube start

Terraform + Helm Provider

2. Configure the Terraform providers

▪ Add the following to main.tf:

terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "2.12.1"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.29.0"
    }
  }
}

provider "kubernetes" {
  config_path    = "~/.kube/config"
  config_context = "minikube"
}

provider "helm" {
  kubernetes {
    config_path    = "~/.kube/config"
    config_context = "minikube"
  }
}

3. Deploy the Nginx demo (with the metrics exporter)

resource "helm_release" "nginx_demo" {
  name      = "nginx-demo"
  namespace = "default"

  # Recommended: start with a local chart to sidestep the provider's chart-resolution bug
  chart = "./charts/nginx"

  wait   = false
  atomic = false

  set {
    name  = "service.type"
    value = "ClusterIP"
  }

  set {
    name  = "resources.requests.cpu"
    value = "50m"
  }

  set {
    name  = "resources.requests.memory"
    value = "64Mi"
  }

  # Enable the exporter
  set {
    name  = "metrics.enabled"
    value = "true"
  }

  # Let the Prometheus Operator (kube-prometheus-stack) discover it automatically
  set {
    name  = "metrics.serviceMonitor.enabled"
    value = "true"
  }
  set {
    name  = "metrics.serviceMonitor.namespace"
    value = "monitoring"
  }
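  # This label must match the release name of your kube-prometheus-stack (kps in this post)
  set {
    name  = "metrics.serviceMonitor.labels.release"
    value = "kps"
  }

  # The ServiceMonitor needs the monitoring namespace and the Prometheus Operator CRDs,
  # both of which are created by the kps release, so install that release first
  depends_on = [helm_release.kps]
}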

4. Deploy Prometheus + Grafana (kube-prometheus-stack)

# === Monitoring: kube-prometheus-stack (includes Grafana) ===
resource "helm_release" "kps" {
  name             = "kps"
  repository       = "https://prometheus-community.github.io/helm-charts"
  chart            = "kube-prometheus-stack"
  version          = "62.7.0"
  namespace        = "monitoring"
  create_namespace = true
  timeout          = 600

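  # Grafana admin credentials
  # (a single-line block may hold only one argument, so each set block is written out)
  set {
    name  = "grafana.adminUser"
    value = "admin"
  }
  # set_sensitive keeps the password out of the plan output
  set_sensitive {
    name  = "grafana.adminPassword"
    value = "your-strong-password"
  }

  set {
    name  = "grafana.service.type"
    value = "ClusterIP"
  }
  set {
    name  = "prometheus.service.type"
    value = "ClusterIP"
  }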
}
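
A literal password in main.tf ends up in version control. As a minimal sketch, the password could instead come from a sensitive input variable. The variable name grafana_admin_password is hypothetical and not part of the original setup; the value would be supplied through the TF_VAR_grafana_admin_password environment variable or a *.tfvars file kept out of the repository.

```hcl
# Hypothetical variable name, introduced only for illustration.
variable "grafana_admin_password" {
  description = "Admin password for the Grafana bundled with kube-prometheus-stack"
  type        = string
  sensitive   = true
}

# Inside resource "helm_release" "kps", the password setting could then read:
#
#   set_sensitive {
#     name  = "grafana.adminPassword"
#     value = var.grafana_admin_password
#   }
```

Marking the variable as sensitive hides the value in plan and apply output, but it is still stored in the Terraform state, so the state file needs to be protected as well.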

5. Deploy ELK (Elasticsearch + Kibana, with Fluent-Bit as the log shipper)

# === ELK: Elasticsearch (single node, demo scale) ===
resource "helm_release" "elasticsearch" {
  name             = "elasticsearch"
  repository       = "https://helm.elastic.co"
  chart            = "elasticsearch"
  version          = "8.5.1"
  namespace        = "elk"
  create_namespace = true
  timeout          = 600

  # Minimal resource configuration
  set {
    name  = "replicas"
    value = "1"
  }

  set {
    name  = "resources.requests.cpu"
    value = "200m"
  }

  set {
    name  = "resources.requests.memory"
    value = "1Gi"
  }

  set {
    name  = "resources.limits.cpu"
    value = "1"
  }

  set {
    name  = "resources.limits.memory"
    value = "2Gi"
  }

  set {
    name  = "esConfig.elasticsearch\\.yml"
    value = "xpack.security.enabled: false"
  }

  # 10Gi PVC
  set {
    name  = "volumeClaimTemplate.resources.requests.storage"
    value = "10Gi"
  }
}

# === ELK: Kibana ===
resource "helm_release" "kibana" {
  name       = "kibana"
  repository = "https://helm.elastic.co"
  chart      = "kibana"
  version    = "8.5.1"
  namespace  = "elk"

  # Point to the ES service defined above
  set {
    name  = "elasticsearchHosts"
    value = "http://elasticsearch-master.elk.svc:9200"
  }

  set {
    name  = "service.type"
    value = "ClusterIP"
  }

  depends_on = [helm_release.elasticsearch]
}

# === Logs Collector: Fluent-Bit (DaemonSet that ships k8s container logs to ES) ===
resource "helm_release" "fluent_bit" {
  name       = "fluent-bit"
  repository = "https://fluent.github.io/helm-charts"
  chart      = "fluent-bit"
  version    = "0.46.7"
  namespace  = "elk"

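  # Set the backend to ES.
  # NOTE: the backend.* and inputs.tail.* keys below follow the values layout of the
  # legacy stable/fluent-bit chart; the fluent/fluent-bit chart reads its pipeline from
  # config.inputs / config.outputs, so check these keys against the chart's values.yaml.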
  set {
    name  = "backend.type"
    value = "es"
  }

  set {
    name  = "backend.es.host"
    value = "elasticsearch-master.elk.svc"
  }

  set {
    name  = "backend.es.port"
    value = "9200"
  }

  set {
    name  = "backend.es.index"
    value = "kubernetes_logs"
  }

  set {
    name  = "backend.es.logstash_prefix"
    value = "k8s"
  }

  set {
    name  = "backend.es.replace_dots"
    value = "true"
  }

  # Read /var/log/containers/*.log
  set {
    name  = "inputs.tail.enabled"
    value = "true"
  }

  set {
    name  = "inputs.tail.path"
    value = "/var/log/containers/*.log"
  }

  set {
    name  = "inputs.tail.parser"
    value = "docker"
  }

  set {
    name  = "inputs.tail.tag"
    value = "kube.*"
  }

  depends_on = [helm_release.elasticsearch]
}
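
The fluent/fluent-bit chart takes its pipeline as raw Fluent Bit configuration under config.outputs, so the Elasticsearch output can also be expressed that way. The following is a sketch under assumptions, not the author's configuration: local.fluent_bit_values is a hypothetical name, Suppress_Type_Name is added because Elasticsearch 8 no longer accepts mapping types, and the index setting is dropped because Fluent Bit ignores it when Logstash_Format is on (indices are then named k8s-YYYY.MM.DD).

```hcl
# Sketch only: Elasticsearch output for the fluent/fluent-bit chart.
# local.fluent_bit_values is a hypothetical name.
locals {
  fluent_bit_values = yamlencode({
    config = {
      # Raw Fluent Bit [OUTPUT] section, passed to the chart as a string
      outputs = <<-EOT
        [OUTPUT]
            Name               es
            Match              kube.*
            Host               elasticsearch-master.elk.svc
            Port               9200
            Logstash_Format    On
            Logstash_Prefix    k8s
            Replace_Dots       On
            Suppress_Type_Name On
            Retry_Limit        False
      EOT
    }
  })
}

# In resource "helm_release" "fluent_bit", this would stand in for the
# backend.* set blocks:
#
#   values = [local.fluent_bit_values]
```

Overriding config.outputs replaces the chart's default outputs entirely, so any records that no longer match an output (for example node-level logs) are simply not shipped.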

6. Initialize and apply Terraform

terraform init
terraform apply -auto-approve

7. Verify the deployment

👉 Command: kubectl get pods -A

▪ The monitoring namespace should contain Grafana, Prometheus, and Alertmanager.

▪ The elk namespace should contain Elasticsearch, Kibana, and Fluent-Bit.
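
Alongside kubectl, the Helm release state can be surfaced from Terraform itself. A minimal sketch, assuming the five helm_release resources from the steps above exist under the same names; the output name release_status is hypothetical.

```hcl
# Hypothetical output name. Shows the Helm status of every release
# defined in this post after terraform apply.
output "release_status" {
  description = "Helm release status per component"
  value = {
    nginx_demo    = helm_release.nginx_demo.status
    kps           = helm_release.kps.status
    elasticsearch = helm_release.elasticsearch.status
    kibana        = helm_release.kibana.status
    fluent_bit    = helm_release.fluent_bit.status
  }
}
```

After an apply, terraform output release_status prints the map. The status reflects the state of the Helm release, not pod readiness, so kubectl get pods -A is still the check that matters, especially for nginx-demo, which is installed with wait = false.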

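8. Log in to Grafana

🔒 Credentials: log in as admin with the password set through grafana.adminPassword in step 4. If you leave that value unset, the kube-prometheus-stack chart falls back to its own default password (prom-operator) rather than Grafana's stock admin/admin. Because the service type is ClusterIP, reach the UI through a port-forward, for example: kubectl port-forward -n monitoring svc/kps-grafana 3000:80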

9. Go to Dashboards > Prometheus / Overview (use Try migration to replace the legacy Angular panels)

▪ Try migration

https://ithelp.ithome.com.tw/upload/images/20250902/20178156AzqnnLHwt9.png

▪ Full dashboard

https://ithelp.ithome.com.tw/upload/images/20250902/20178156bOWt7MeRQo.png

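● Summary

1. Nginx metrics → Prometheus → Grafana.

2. Pod logs → Fluent-Bit → Elasticsearch → Kibana.

3. Grafana can bring Metrics and Logs together once Elasticsearch is added as a data source, closing the Observability loop.

🎯 At this point we have completed the first-phase DevOps loop: IaC → automated deployment → monitoring and observability. We now have a complete platform that can both deploy and monitor.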

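📝 Diagram notes

  • Nginx / Pods: expose Metrics that Prometheus scrapes through a ServiceMonitor; container logs are collected by Fluent-Bit.
  • Prometheus: collects and stores Metrics and serves them to Grafana.
  • Grafana: can connect to both Prometheus (Metrics) and Elasticsearch (Logs) for unified visualization.
  • Fluent-Bit: a DaemonSet that collects /var/log/containers/*.log and ships it to Elasticsearch.
  • Elasticsearch & Kibana: Elasticsearch stores the logs; Kibana provides the query UI.
  • Users: DevOps / SRE engineers view Metrics in Grafana and search Logs in Kibana → a complete loop.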

🔽 The Observability architecture diagram:

https://ithelp.ithome.com.tw/upload/images/20250902/2017815635ArJ96wFw.png
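
The Grafana that ships with kube-prometheus-stack is provisioned with Prometheus as a data source, but not with Elasticsearch. One way to wire in the Logs side is the chart's grafana.additionalDataSources value. This is a sketch under assumptions: local.grafana_es_datasource_values is a hypothetical name, and the index pattern assumes the collector writes daily indices named k8s-YYYY.MM.DD, so adjust it to whatever index your logs actually land in.

```hcl
# Sketch only: add Elasticsearch as a Grafana data source.
# local.grafana_es_datasource_values is a hypothetical name.
locals {
  grafana_es_datasource_values = yamlencode({
    grafana = {
      additionalDataSources = [
        {
          name   = "Elasticsearch"
          type   = "elasticsearch"
          access = "proxy"
          url    = "http://elasticsearch-master.elk.svc:9200"
          jsonData = {
            # Assumed index naming: daily indices with the k8s- prefix
            index     = "[k8s-]YYYY.MM.DD"
            interval  = "Daily"
            timeField = "@timestamp"
          }
        }
      ]
    }
  })
}

# In resource "helm_release" "kps", add:
#
#   values = [local.grafana_es_datasource_values]
```

With the data source in place, a single Grafana dashboard can mix Prometheus panels with log panels, while Kibana remains the tool for ad hoc log search.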

👉 Next: Day 21|Phase One Recap × Resource Tuning: Requests/Limits + Performance Testing

