iT邦幫忙
2021 iThome Ironman Contest, DevOps
Hashicorp Jot Notes series, part 7

Day 7. Hashicorp Nomad: Inspect a job

When a tool ships with a good Web UI, it is easy to forget how to do the same thing from the CLI.

Job status

Running nomad job status lists the status of every job:

$ nomad job status
ID                                  Type     Priority  Status   Submit Date
java-batch                          batch    50        dead     2021-09-04T00:59:17+08:00
erp                                 service  50        running  2021-09-07T22:51:48+08:00
XXXXXXX                             service  50        dead     2021-09-03T11:02:47+08:00
test                                service  50        running  2021-09-03T11:22:34+08:00
webserv                             service  50        running  2021-08-27T08:38:14+08:00
web-standby                         service  50        running  2021-08-27T08:38:56+08:00
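
When the list gets long, it can help to filter it. The following is a minimal sketch, not part of the original notes: dead_jobs is a hypothetical helper name, and it assumes the column layout shown above (ID, Type, Priority, Status, Submit Date) and job IDs without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of jobs whose Status column is "dead".
# Reads the tabular output of `nomad job status` on stdin.
# Assumes the 4th whitespace-separated column is Status, as in the listing above.
dead_jobs() {
  # NR > 1 skips the header row.
  awk 'NR > 1 && $4 == "dead" { print $1 }'
}
```

Usage would look like nomad job status | dead_jobs. Parsing human-readable output is fragile, so treat this as a quick interactive aid rather than something to build automation on.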

Append a job ID to see the details of that job, for example nomad job status erp:

$ nomad job status erp
ID            = erp
Name          = erp
Submit Date   = 2021-09-07T22:51:48+08:00
Type          = service
Priority      = 50
Datacenters   = Nomad
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
webfront    0       0         2        2       0         0

Latest Deployment
ID          = 263de268
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
webfront    2        4       2        2          2021-09-07T23:02:05+08:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
59e6a6bb  970f4da9  webfront    0        run      running  12m22s ago  11m53s ago
9862d9de  970f4da9  webfront    0        run      running  12m22s ago  11m53s ago
07152288  970f4da9  webfront    0        stop     failed   13m41s ago  12m20s ago
4fa59294  970f4da9  webfront    0        stop     failed   13m41s ago  12m20s ago

Job evaluation status

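A job evaluation represents the scheduling state of a job. Pass the -evals flag to list a job's evaluations.
For example, the job below started with a job-register evaluation, went through alloc-failure evaluations, and then reached deployment-watcher.
If an evaluation shows Placement Failures = true, inspect it with nomad eval status <evaluation ID>.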

$ nomad job status -evals erp
ID            = erp
Name          = erp
Submit Date   = 2021-09-07T22:51:48+08:00
Type          = service
Priority      = 50
Datacenters   = Nomad
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
webfront    0       0         2        2       0         0

Evaluations
ID        Priority  Triggered By        Status    Placement Failures
1ba4c1dc  50        deployment-watcher  complete  false
00d90ff5  50        alloc-failure       complete  false
68424b01  50        alloc-failure       complete  false
c260cdb5  50        deployment-watcher  complete  false
c87b22e4  50        alloc-failure       complete  false
e9e513e1  50        job-register        complete  false

Latest Deployment
ID          = 263de268
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
webfront    2        4       2        2          2021-09-07T23:02:05+08:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
59e6a6bb  970f4da9  webfront    0        run      running  15m14s ago  14m45s ago
9862d9de  970f4da9  webfront    0        run      running  15m14s ago  14m45s ago
07152288  970f4da9  webfront    0        stop     failed   16m33s ago  15m12s ago
4fa59294  970f4da9  webfront    0        stop     failed   16m33s ago  15m12s ago
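
The step from "an evaluation has placement failures" to "inspect that evaluation" can be scripted. This is a rough sketch under assumptions: failed_evals is a hypothetical helper name, and it relies on the Evaluations table layout shown above, where the last column is Placement Failures and the section ends at the first blank line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of evaluations whose
# Placement Failures column is "true".
# Reads the output of `nomad job status -evals <job>` on stdin.
failed_evals() {
  awk '
    # The section starts at a line containing only "Evaluations".
    NF == 1 && $1 == "Evaluations" { in_evals = 1; next }
    # A blank line ends the section.
    in_evals && NF == 0 { in_evals = 0 }
    # Skip the table header row, keep rows whose last field is "true".
    in_evals && $1 != "ID" && $NF == "true" { print $1 }
  '
}
```

One way to use it: nomad job status -evals erp | failed_evals | while read -r id; do nomad eval status "$id"; done. For the erp job above the helper prints nothing, because every evaluation reports false.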

Job allocation status

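A job allocation is the state of a job after it has been placed on a node, including its CPU, memory, and disk resources.
When an allocation fails, the related events are logged as well.
Use nomad alloc status <allocation ID> to view them: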

$ nomad alloc status 07152288
ID                   = 07152288-c0f0-dc4c-3133-110c38ea2c1f
Eval ID              = e9e513e1
Name                 = erp.webfront[0]
Node ID              = 970f4da9
Node Name            = nomad-worker
Job ID               = erp
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 26m27s ago
Modified             = 25m6s ago
Deployment ID        = 263de268
Deployment Health    = unhealthy
Replacement Alloc ID = 9862d9de

Task "nginx" is "dead"
Task Resources
CPU      Memory   Disk     Addresses
200 MHz  128 MiB  300 MiB  web: 10.x.x.x:12345

Host Volumes:
ID    Read Only
test  false

Task Events:
Started At     = N/A
Finished At    = 2021-09-07T14:51:34Z
Total Restarts = 2
Last Restart   = 2021-09-07T22:51:01+08:00

Recent Events:
Time                       Type             Description
2021-09-07T22:51:36+08:00  Killing          Sent interrupt. Waiting 5s before force killing
2021-09-07T22:51:34+08:00  Alloc Unhealthy  Unhealthy because of failed task
2021-09-07T22:51:34+08:00  Not Restarting   Exceeded allowed attempts 2 in interval 30m0s and mode is "fail"
2021-09-07T22:51:34+08:00  Driver Failure   Failed to pull `nginx:1.21`: API error (500): Head https://registry-1.docker.io/v2/library/nginx/manifests/1.21: Get https://auth.docker.io/token?scope=repository%3Alibrary%2Fnginx%3Apull&service=registry.docker.io: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-09-07T22:51:18+08:00  Driver           Downloading image
2021-09-07T22:51:01+08:00  Restarting       Task restarting in 17.735293963s
2021-09-07T22:51:01+08:00  Driver Failure   Failed to pull `nginx:1.21`: API error (500): Head https://registry-1.docker.io/v2/library/nginx/manifests/1.21: net/http: TLS handshake timeout
2021-09-07T22:50:45+08:00  Driver           Downloading image
2021-09-07T22:50:29+08:00  Restarting       Task restarting in 15.739277426s
2021-09-07T22:50:29+08:00  Driver Failure   Failed to pull `nginx:1.21`: API error (500): Head https://registry-1.docker.io/v2/library/nginx/manifests/1.21: net/http: TLS handshake timeout
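
In the output above, the useful lines are the Driver Failure events, which show that the nginx:1.21 image could not be pulled. Below is a small sketch for pulling just those lines out. driver_failures is a hypothetical helper name, and it assumes the Recent Events columns are separated by at least two spaces, as they are in this output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the time and description of every
# "Driver Failure" event from `nomad alloc status <alloc ID>` output on stdin.
# Assumes columns are separated by two or more spaces, so that multi-word
# event types such as "Driver Failure" stay in a single field.
driver_failures() {
  awk -F '  +' '$2 == "Driver Failure" { print $1 " " $3 }'
}
```

Usage would be nomad alloc status 07152288 | driver_failures. If this needs to be more than a quick filter, nomad alloc status -json is probably a sturdier starting point than the human-readable table.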
