As mentioned on Day 23, http_port 8222 is the port that serves the monitoring API.
NATS provides several of these APIs, which expose statistics and other information.
Open localhost:18222/varz and you will see something like the following.
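Among these fields:
connect_urls: the addresses of the cluster's nodes that clients can connect to.
max_connections: the maximum number of connections; this is configurable.
max_payload: the upper limit, in bytes, for a single payload (1,048,576 bytes, i.e. 1 MiB, here).
max_pending: the size of the byte buffer for each connection (67,108,864 bytes, i.e. 64 MiB, here).
cluster: information about the cluster.
cpu, mem, cores: information related to CPU and memory usage (mem is in bytes, cores is the number of CPU cores).
slow_consumers: one of the items you should monitor; if it stays non-zero for a long time, you usually want an alert.
subscriptions: the number of subscriptions.
http_req_stats: the cumulative number of times each monitoring API has been requested.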
{
"server_id": "NDNNR4NOELGPAB2HTRSWFLSN5DFXGFNST5Z2PVYFY4BUQI7A6E6IH6XK",
"server_name": "NDNNR4NOELGPAB2HTRSWFLSN5DFXGFNST5Z2PVYFY4BUQI7A6E6IH6XK",
"version": "2.1.8",
"proto": 1,
"git_commit": "c0b574f",
"go": "go1.14.8",
"host": "0.0.0.0",
"port": 4222,
"connect_urls": [
"172.16.230.101:4222",
"172.16.230.100:4222"
],
"max_connections": 65536,
"ping_interval": 120000000000,
"ping_max": 2,
"http_host": "0.0.0.0",
"http_port": 8222,
"http_base_path": "",
"https_port": 0,
"auth_timeout": 1,
"max_control_line": 4096,
"max_payload": 1048576,
"max_pending": 67108864,
"cluster": {
"addr": "0.0.0.0",
"cluster_port": 6222,
"auth_timeout": 1,
"urls": [
"nats-1:6222"
]
},
"gateway": {},
"leaf": {},
"tls_timeout": 0.5,
"write_deadline": 2000000000,
"start": "2020-09-30T14:36:09.165131164Z",
"now": "2020-09-30T15:45:32.530157919Z",
"uptime": "1h9m23s",
"mem": 16031744,
"cores": 12,
"gomaxprocs": 12,
"cpu": 0,
"connections": 0,
"total_connections": 4,
"routes": 1,
"remotes": 1,
"leafnodes": 0,
"in_msgs": 1126704,
"out_msgs": 1126704,
"in_bytes": 44688058,
"out_bytes": 44688058,
"slow_consumers": 0,
"subscriptions": 24,
"http_req_stats": {
"/": 0,
"/connz": 24,
"/gatewayz": 0,
"/routez": 0,
"/subsz": 0,
"/varz": 17
},
"config_load_time": "2020-09-30T14:36:09.165131164Z"
}
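To turn the varz fields above into a simple alert, a minimal Python sketch might fetch /varz and flag a non-zero slow_consumers or a connection count approaching max_connections. The base URL, the helper names fetch_json and varz_alerts, and the 80% threshold are my own assumptions for illustration, not anything NATS defines.

```python
import json
import urllib.request

# Assumption: host-side monitoring port used in this article; adjust as needed.
BASE_URL = "http://localhost:18222"


def fetch_json(path, base_url=BASE_URL):
    """GET a monitoring endpoint such as "/varz" and decode its JSON body."""
    with urllib.request.urlopen(base_url + path, timeout=5) as resp:
        return json.load(resp)


def varz_alerts(varz, conn_ratio=0.8):
    """Return warning strings for one /varz snapshot (thresholds are illustrative)."""
    alerts = []
    # slow_consumers: a non-zero value means some client could not keep up.
    if varz.get("slow_consumers", 0) > 0:
        alerts.append(f"slow_consumers={varz['slow_consumers']}")
    # Warn when current connections approach the configured max_connections.
    max_conn = varz.get("max_connections", 0)
    if max_conn and varz.get("connections", 0) >= conn_ratio * max_conn:
        alerts.append(f"connections {varz['connections']}/{max_conn}")
    return alerts
```

Calling varz_alerts(fetch_json("/varz")) on the output shown above returns an empty list. Since slow_consumers is a cumulative count since the server started, in practice you may prefer to alert when it keeps increasing rather than on any non-zero value.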
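Open localhost:18222/connz and you will see connection information like the following.
This API accepts a few query-string parameters that I use regularly.
I mainly look at idle and pending_bytes;
subscriptions_list is the list of subjects that connection is subscribed to.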
{
"server_id": "NBOJ6Q6DCTJJ7HENSLUH4JHZ6LXF6QLXWJQQCW4EUUSU5J375CPTBY6O",
"now": "2020-09-30T16:12:56.042362837Z",
"num_connections": 16,
"total": 16,
"offset": 0,
"limit": 1024,
"connections": [
{
"cid": 36,
"ip": "172.16.230.1",
"port": 47700,
"start": "2020-09-30T16:12:54.11845346Z",
"last_activity": "2020-09-30T16:12:56.041943175Z",
"rtt": "446µs",
"uptime": "1s",
"idle": "0s",
"pending_bytes": 0,
"in_msgs": 5432,
"out_msgs": 5431,
"in_bytes": 259659,
"out_bytes": 164682,
"subscriptions": 5,
"lang": "go",
"version": "1.10.0",
"subscriptions_list": [
"_STAN.acks.UgTiAgxx9Uk0Y8Pj2xwfx3",
"_INBOX.UgTiAgxx9Uk0Y8Pj2xwfz2",
"_INBOX.UgTiAgxx9Uk0Y8Pj2xwfr6",
"_INBOX.UgTiAgxx9Uk0Y8Pj2xwft5",
"_INBOX.UgTiAgxx9Uk0Y8Pj2xwfv4.*"
]
}
]
}
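The query string shapes the reply. A small sketch, assuming the subs, sort and limit parameters of /connz (to my understanding subs=1 adds subscriptions_list and sort=pending orders connections by pending bytes), builds the URL and picks out connections that have bytes waiting to be flushed. connz_url and busy_connections are hypothetical helper names.

```python
import urllib.parse

# Assumption: host-side monitoring port used in this article.
BASE_URL = "http://localhost:18222"


def connz_url(base_url=BASE_URL, **params):
    """Build a /connz URL; keyword params become the query string (e.g. subs=1)."""
    query = urllib.parse.urlencode(params)
    return f"{base_url}/connz?{query}" if query else f"{base_url}/connz"


def busy_connections(connz, min_pending=1):
    """Return (cid, pending_bytes, idle) for connections with buffered bytes."""
    return [
        (c["cid"], c["pending_bytes"], c.get("idle"))
        for c in connz.get("connections", [])
        if c.get("pending_bytes", 0) >= min_pending
    ]
```

For example, connz_url(subs=1, sort="pending", limit=10) yields http://localhost:18222/connz?subs=1&sort=pending&limit=10; feed the decoded reply to busy_connections. A connection whose pending_bytes stays high while idle grows is a likely slow consumer.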
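Open localhost:18222/subsz and you will see subscription information like the following.
This API also accepts a few query-string parameters that I use regularly.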
{
"server_id": "NBOJ6Q6DCTJJ7HENSLUH4JHZ6LXF6QLXWJQQCW4EUUSU5J375CPTBY6O",
"now": "2020-09-30T16:31:50.348121693Z",
"num_subscriptions": 26,
"num_cache": 299,
"num_inserts": 104,
"num_removes": 78,
"num_matches": 943,
"cache_hit_rate": 0.5906680805938495,
"max_fanout": 1,
"avg_fanout": 0.9565217391304348,
"total": 26,
"offset": 0,
"limit": 1024,
"subscriptions_list": [
{
"subject": "raft.test-cluster.node1.test-cluster.accept",
"sid": "1",
"msgs": 0,
"cid": 9
}
]
}
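To see which connection owns which subscriptions, a short sketch can group subscriptions_list by cid, or by the first token of the subject. It assumes the reply includes subscriptions_list, which to my understanding requires requesting /subsz?subs=1; the function names are hypothetical.

```python
from collections import Counter


def subs_per_connection(subsz):
    """Count entries of subscriptions_list per client connection id (cid)."""
    return Counter(s["cid"] for s in subsz.get("subscriptions_list", []))


def subs_per_prefix(subsz):
    """Count subscriptions by first subject token, e.g. "_INBOX" or "raft"."""
    return Counter(
        s["subject"].split(".", 1)[0] for s in subsz.get("subscriptions_list", [])
    )
```

The cid values can then be matched against the cid field in /connz to find the client behind a burst of subscriptions.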
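NATS Streaming also provides several APIs that expose statistics and other information.
Open localhost:18222/streaming/serverz.
The main field to check is role, which is Leader, Follower, or Candidate (Raft again),
along with open_fds and max_fds.
Below is the output from a Leader node, followed by that of a Follower node.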
{
"cluster_id": "test-cluster",
"server_id": "80cTxp7LklHZZl5OwxW4TF",
"version": "0.18.0",
"go": "go1.14.4",
"state": "CLUSTERED",
"role": "Leader",
"now": "2020-09-30T16:35:36.430556512Z",
"start_time": "2020-09-30T14:36:10.203910734Z",
"uptime": "1h59m26s",
"clients": 0,
"subscriptions": 0,
"channels": 9,
"total_msgs": 1002319,
"total_bytes": 78329272,
"in_msgs": 819523,
"in_bytes": 67394943,
"out_msgs": 819534,
"out_bytes": 31336342,
"open_fds": 32,
"max_fds": 1048576
}
{
"cluster_id": "test-cluster",
"server_id": "SVGdOmes52I66DxgT4vqdm",
"version": "0.18.0",
"go": "go1.14.4",
"state": "CLUSTERED",
"role": "Follower",
"now": "2020-09-30T16:38:19.612337427Z",
"start_time": "2020-09-30T14:36:12.185931325Z",
"uptime": "2h2m7s",
"clients": 0,
"subscriptions": 0,
"channels": 9,
"total_msgs": 1002319,
"total_bytes": 78329272,
"in_msgs": 0,
"in_bytes": 0,
"out_msgs": 0,
"out_bytes": 0,
"open_fds": 32,
"max_fds": 1048576
}
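Each node only reports its own role, so a health check has to query every node. A sketch, assuming you have already collected each node's /streaming/serverz JSON into a list, checks that exactly one node reports Leader and that open_fds is not approaching max_fds. The function name and the 80% threshold are assumptions; during a leader election there may briefly be no leader, so a real check should tolerate short gaps.

```python
def check_streaming_cluster(serverz_list, fd_ratio=0.8):
    """Return (leader_ids, warnings) for /streaming/serverz replies from all nodes."""
    leaders = [s["server_id"] for s in serverz_list if s.get("role") == "Leader"]
    warnings = []
    # A healthy Raft cluster should have exactly one leader.
    if len(leaders) != 1:
        warnings.append(f"expected 1 leader, found {len(leaders)}")
    # Warn when a node's open file descriptors approach its max_fds limit.
    for s in serverz_list:
        max_fds = s.get("max_fds", 0)
        if max_fds and s.get("open_fds", 0) >= fd_ratio * max_fds:
            warnings.append(f"{s['server_id']}: open_fds {s['open_fds']}/{max_fds}")
    return leaders, warnings
```

With the two replies above, the result is one leader (80cTxp7LklHZZl5OwxW4TF) and no warnings.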
Open localhost:18225/streaming/clientsz.
Here you can see which clients are connected, and whether options such as durable, max_inflight, and ack_wait are set.
{
"cluster_id": "test-cluster",
"server_id": "SVGdOmes52I66DxgT4vqdm",
"now": "2020-09-30T18:01:42.995346629Z",
"offset": 0,
"limit": 1024,
"count": 1,
"total": 1,
"clients": [
{
"id": "nathan01",
"hb_inbox": "_INBOX.TU1UpmKR8mKPKY9L83f9e6",
"subscriptions": {
"testTopic": [
{
"client_id": "nathan01",
"inbox": "_INBOX.TU1UpmKR8mKPKY9L83f9t8",
"ack_inbox": "_INBOX.80cTxp7LklHZZl5OwxW5Zs",
"is_durable": false,
"is_offline": false,
"max_inflight": 1,
"ack_wait": 10,
"last_sent": 1066035,
"pending_count": 0,
"is_stalled": false
}
]
}
}
]
}
The per-channel view at localhost:18225/streaming/channelsz lists each channel's message count and sequence range along with its subscriptions:
{
"cluster_id": "test-cluster",
"server_id": "SVGdOmes52I66DxgT4vqdm",
"now": "2020-09-30T18:03:20.491138838Z",
"offset": 0,
"limit": 1024,
"count": 9,
"total": 9,
"channels": [
{
"name": "testTopic",
"msgs": 1000000,
"bytes": 78140820,
"first_seq": 69834,
"last_seq": 1069833,
"subscriptions": [
{
"client_id": "nathan01",
"inbox": "_INBOX.NPh3Gq6ocJREA30inok1Is",
"ack_inbox": "_INBOX.80cTxp7LklHZZl5OwxW5cr",
"is_durable": false,
"is_offline": false,
"max_inflight": 1,
"ack_wait": 10,
"last_sent": 1069814,
"pending_count": 1,
"is_stalled": true
}
]
}
]
}
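In the channel output above, pending_count (1) has reached max_inflight (1), which is why is_stalled is true: the server will not send this subscriber more messages until it acks. A sketch that lists such subscriptions from a channelsz reply (with subscriptions included) could look like this; stalled_subscriptions is a hypothetical name.

```python
def stalled_subscriptions(channelsz):
    """Return (channel, client_id, pending_count, max_inflight) for stalled subs.

    Expects a channelsz reply that includes each channel's subscriptions.
    """
    stalled = []
    for channel in channelsz.get("channels", []):
        for sub in channel.get("subscriptions", []):
            if sub.get("is_stalled"):
                stalled.append(
                    (channel["name"], sub["client_id"], sub["pending_count"], sub["max_inflight"])
                )
    return stalled
```

A subscription that shows up here on every poll usually means the client is not acking (or is acking slower than ack_wait allows).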
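Open localhost:18225/streaming/storez and you will see something like the following.
After all, this is an in-memory MQ, so it cannot devote unlimited resources to keeping messages nobody wants. Apart from max_channels, which caps the number of channels on the server, the limits here apply to each individual channel.
So, as shown below, a single channel can hold up to max_msgs = 1,000,000 messages, max_bytes caps its size at roughly 1 GB (1,024,000,000 bytes), max_age sets how long messages are kept (0 means no limit), and so on.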
{
"cluster_id": "test-cluster",
"server_id": "9tOlQ2TFjpSopm4erVtvFf",
"now": "2020-10-01T05:31:03.574407241Z",
"type": "RAFT_FILE",
"limits": {
"max_channels": 100,
"max_msgs": 1000000,
"max_bytes": 1024000000,
"max_age": 0,
"max_subscriptions": 1000,
"max_inactivity": 0
},
"total_msgs": 1002319,
"total_bytes": 78331864
}
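Combining the storez limits with the per-channel numbers from channelsz shows how close each channel is to being trimmed. A minimal sketch, assuming a limit of 0 means unlimited (as with max_age: 0 here); channel_usage is a hypothetical name.

```python
def channel_usage(channel, limits):
    """Fraction of the per-channel storez limits used by one channelsz entry.

    A limit of 0 is treated as "unlimited" and skipped.
    """
    usage = {}
    if limits.get("max_msgs"):
        usage["msgs"] = channel["msgs"] / limits["max_msgs"]
    if limits.get("max_bytes"):
        usage["bytes"] = channel["bytes"] / limits["max_bytes"]
    return usage
```

For testTopic above, msgs is 1,000,000, exactly max_msgs, so the msgs usage is 1.0. That matches first_seq 69834 and last_seq 1069833 (1069833 - 69834 + 1 = 1,000,000): the oldest messages have already been discarded to stay within the limit.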
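With the basics above in place, you can pair them with prometheus-nats-exporter to monitor each node: just add the endpoints you want to monitor as flags on the command line, for example:
prometheus-nats-exporter -varz -connz "http://localhost:8222"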
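The exporter then serves the collected values in Prometheus text format (to my knowledge on port 7777 at /metrics by default). For a quick look without a Prometheus server, a minimal sketch can parse that text. It skips comment lines and assumes no trailing timestamps; parse_metrics is hypothetical, and the metric names in any example are illustrative, so check your exporter's /metrics page for the real ones.

```python
def parse_metrics(text, prefix=""):
    """Parse Prometheus text-format sample lines into {"name{labels}": value}.

    Minimal sketch: skips blank and '#' lines (HELP/TYPE comments) and assumes
    each sample line ends with the value, i.e. no trailing timestamp.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        key, _, value = line.rpartition(" ")
        if key.startswith(prefix):
            metrics[key] = float(value)
    return metrics
```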
As an aside, if you are deciding between an in-memory MQ and a log-based MQ, see this article:
Why Do Log-Based Message Queues Perform So Well?