When a tool ships with a good Web UI, it is easy to forget how to do the same things from the CLI.
Running nomad job status with no arguments lists the status of every job:
$ nomad job status
ID           Type     Priority  Status   Submit Date
java-batch   batch    50        dead     2021-09-04T00:59:17+08:00
erp          service  50        running  2021-09-07T22:51:48+08:00
XXXXXXX      service  50        dead     2021-09-03T11:02:47+08:00
test         service  50        running  2021-09-03T11:22:34+08:00
webserv      service  50        running  2021-08-27T08:38:14+08:00
web-standby  service  50        running  2021-08-27T08:38:56+08:00
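When the list gets long, it can help to filter it by status. The following is a minimal sketch that parses the plain-text listing with awk. The function name jobs_with_status is hypothetical, and the sketch assumes the Status value is the fourth whitespace-separated column, as in the output above.

```shell
# Hypothetical helper: print the IDs of jobs whose Status column equals $1.
# Reads the output of `nomad job status` on stdin.
# Assumption: columns are ID, Type, Priority, Status, Submit Date.
jobs_with_status() {
  awk -v want="$1" 'NR > 1 && $4 == want { print $1 }'
}

# Usage (requires the nomad CLI and a reachable cluster):
#   nomad job status | jobs_with_status dead
```

Against the listing above, filtering on dead should print java-batch and XXXXXXX.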
Add a job ID to see the details of that job: nomad job status erp
$ nomad job status erp
ID = erp
Name = erp
Submit Date = 2021-09-07T22:51:48+08:00
Type = service
Priority = 50
Datacenters = Nomad
Namespace = default
Status = running
Periodic = false
Parameterized = false
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
webfront    0       0         2        2       0         0
Latest Deployment
ID = 263de268
Status = successful
Description = Deployment completed successfully
Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
webfront    2        4       2        2          2021-09-07T23:02:05+08:00
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
59e6a6bb  970f4da9  webfront    0        run      running  12m22s ago  11m53s ago
9862d9de  970f4da9  webfront    0        run      running  12m22s ago  11m53s ago
07152288  970f4da9  webfront    0        stop     failed   13m41s ago  12m20s ago
4fa59294  970f4da9  webfront    0        stop     failed   13m41s ago  12m20s ago
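The Allocations table at the end of this output is where failed allocations show up. As a rough sketch, their IDs can be pulled out of the text with awk. The function name failed_allocs is hypothetical, and the sketch assumes the task group name contains no spaces, so that Status is the sixth column.

```shell
# Hypothetical helper: print the IDs of allocations whose Status is "failed".
# Reads the output of `nomad job status <job>` on stdin.
# Assumption: the section starts with a line that is exactly "Allocations"
# and each row is: ID, Node ID, Task Group, Version, Desired, Status, ...
failed_allocs() {
  awk '
    $0 == "Allocations"             { in_allocs = 1; next }
    in_allocs && $1 == "ID"         { next }            # skip the table header
    in_allocs && NF == 0            { in_allocs = 0 }   # blank line ends the table
    in_allocs && $6 == "failed"     { print $1 }
  '
}

# Usage (requires the nomad CLI and a reachable cluster):
#   nomad job status erp | failed_allocs
```

For the erp job above this should print 07152288 and 4fa59294.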
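A job evaluation records a scheduling decision made for a job. Pass the -evals flag to list a job's evaluations.
For example, the job below started with a job-register evaluation, went through several alloc-failure evaluations along the way, and ended with deployment-watcher.
If an evaluation shows Placement Failures = true, use nomad eval status EvaluationID to inspect it.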
$ nomad job status -evals erp
ID = erp
Name = erp
Submit Date = 2021-09-07T22:51:48+08:00
Type = service
Priority = 50
Datacenters = Nomad
Namespace = default
Status = running
Periodic = false
Parameterized = false
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
webfront    0       0         2        2       0         0
Evaluations
ID        Priority  Triggered By        Status    Placement Failures
1ba4c1dc  50        deployment-watcher  complete  false
00d90ff5  50        alloc-failure       complete  false
68424b01  50        alloc-failure       complete  false
c260cdb5  50        deployment-watcher  complete  false
c87b22e4  50        alloc-failure       complete  false
e9e513e1  50        job-register        complete  false
Latest Deployment
ID = 263de268
Status = successful
Description = Deployment completed successfully
Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
webfront    2        4       2        2          2021-09-07T23:02:05+08:00
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
59e6a6bb  970f4da9  webfront    0        run      running  15m14s ago  14m45s ago
9862d9de  970f4da9  webfront    0        run      running  15m14s ago  14m45s ago
07152288  970f4da9  webfront    0        stop     failed   16m33s ago  15m12s ago
4fa59294  970f4da9  webfront    0        stop     failed   16m33s ago  15m12s ago
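None of the evaluations above had placement failures, but when some do, the check can be scripted. This is a sketch only: the function name failed_placement_evals is hypothetical, and it assumes each row of the Evaluations table has exactly five whitespace-separated fields with Placement Failures last.

```shell
# Hypothetical helper: print the IDs of evaluations with Placement Failures = true.
# Reads the output of `nomad job status -evals <job>` on stdin.
# Assumption: the section starts with a line that is exactly "Evaluations"
# and each row is: ID, Priority, Triggered By, Status, Placement Failures.
failed_placement_evals() {
  awk '
    $0 == "Evaluations"          { in_evals = 1; next }
    in_evals && $1 == "ID"       { next }           # skip the table header
    in_evals && NF != 5          { in_evals = 0 }   # any other shape ends the table
    in_evals && $5 == "true"     { print $1 }
  '
}

# Usage (requires the nomad CLI and a reachable cluster):
#   nomad job status -evals erp | failed_placement_evals |
#     while read -r id; do nomad eval status "$id"; done
```

For the erp job above the helper prints nothing, since every row ends in false.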
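A job allocation is the state of a job after it has been placed on a node, including the CPU, memory, and disk assigned to it.
When an allocation fails, its event log is recorded as well.
Both can be viewed with nomad alloc status AllocationID: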
$ nomad alloc status 07152288
ID = 07152288-c0f0-dc4c-3133-110c38ea2c1f
Eval ID = e9e513e1
Name = erp.webfront[0]
Node ID = 970f4da9
Node Name = nomad-worker
Job ID = erp
Job Version = 0
Client Status = failed
Client Description = Failed tasks
Desired Status = stop
Desired Description = alloc was rescheduled because it failed
Created = 26m27s ago
Modified = 25m6s ago
Deployment ID = 263de268
Deployment Health = unhealthy
Replacement Alloc ID = 9862d9de
Task "nginx" is "dead"
Task Resources
CPU      Memory   Disk     Addresses
200 MHz  128 MiB  300 MiB  web: 10.x.x.x:12345
Host Volumes:
ID    Read Only
test  false
Task Events:
Started At = N/A
Finished At = 2021-09-07T14:51:34Z
Total Restarts = 2
Last Restart = 2021-09-07T22:51:01+08:00
Recent Events:
Time                       Type             Description
2021-09-07T22:51:36+08:00  Killing          Sent interrupt. Waiting 5s before force killing
2021-09-07T22:51:34+08:00  Alloc Unhealthy  Unhealthy because of failed task
2021-09-07T22:51:34+08:00  Not Restarting   Exceeded allowed attempts 2 in interval 30m0s and mode is "fail"
2021-09-07T22:51:34+08:00  Driver Failure   Failed to pull `nginx:1.21`: API error (500): Head https://registry-1.docker.io/v2/library/nginx/manifests/1.21: Get https://auth.docker.io/token?scope=repository%3Alibrary%2Fnginx%3Apull&service=registry.docker.io: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-09-07T22:51:18+08:00  Driver           Downloading image
2021-09-07T22:51:01+08:00  Restarting       Task restarting in 17.735293963s
2021-09-07T22:51:01+08:00  Driver Failure   Failed to pull `nginx:1.21`: API error (500): Head https://registry-1.docker.io/v2/library/nginx/manifests/1.21: net/http: TLS handshake timeout
2021-09-07T22:50:45+08:00  Driver           Downloading image
2021-09-07T22:50:29+08:00  Restarting       Task restarting in 15.739277426s
2021-09-07T22:50:29+08:00  Driver Failure   Failed to pull `nginx:1.21`: API error (500): Head https://registry-1.docker.io/v2/library/nginx/manifests/1.21: net/http: TLS handshake timeout
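In this case the Driver Failure events carry the actual cause: the nginx:1.21 image could not be pulled. A small sketch for extracting only those events from the text output follows. The function name driver_failures is hypothetical, and it assumes each event line has the shape shown above: a timestamp, the event type, then the description.

```shell
# Hypothetical helper: print "<time><TAB><description>" for each Driver Failure event.
# Reads the output of `nomad alloc status <alloc-id>` on stdin.
# Assumption: event lines look like "<timestamp> Driver Failure <description>".
driver_failures() {
  awk '$2 == "Driver" && $3 == "Failure" {
    time = $1
    # Strip the timestamp and the two-word event type, keeping the description.
    sub(/^[^ ]+ +Driver +Failure +/, "")
    print time "\t" $0
  }'
}

# Usage (requires the nomad CLI and a reachable cluster):
#   nomad alloc status 07152288 | driver_failures
```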