Next, we implement the monitoring feature in Go. We will rely on the indices in ELK and use olivere/elastic to search for the log records we are interested in.
Get the third-party package olivere/elastic:
go get github.com/olivere/elastic/v7
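Initialize the elastic client settings:
// elkclient and elkip are package-level variables declared elsewhere in the program.
func setElkClient() {
	var err error
	elkclient, err = elastic.NewClient(
		elastic.SetURL(elkip),   // address of the Elasticsearch node
		elastic.SetSniff(false), // disable node sniffing
	)
	if err != nil {
		logrus.Fatalf("Failed to create elkclient: %v", err)
	}
}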
With elkclient created above, we can now pass in query parameters and start searching the ELK log records.
func getELKLog() {
	// Log severity levels to monitor
	status := []interface{}{"400", "500", "600", "700", "800"}

	// Query window: the last minute, in UTC
	now := time.Now().UTC()
	t := now.Add(-1 * time.Minute).Format(time.RFC3339)
	nt := now.Format(time.RFC3339)
	logrus.Infof("elk檢查開始: %s ~ %s", t, nt) // "elk檢查開始" means "ELK check started"

	rQuery := elastic.NewRangeQuery("@timestamp") // time range to search
	rQuery.Gte(t)
	rQuery.Lte(nt)
	query := elastic.NewBoolQuery()
	query.Must(rQuery)
	query.Filter(elastic.NewTermsQuery("content.severity", status...)) // severity levels to match

	res, err := elkclient.Search().Index(elkindex). // elkindex: name of the index
		Query(query).
		Size(20). // number of hits to return; the default is 10
		Do(context.Background())
	if err != nil {
		logrus.Errorf("elk search on index %s failed: %v", elkindex, err)
		return // res is nil on error, so stop here instead of dereferencing it
	}
	for _, hit := range res.Hits.Hits {
		d, _ := hit.Source.MarshalJSON()
		msg, _ := jsonparser.GetString(d, "message") // extract the error message
		logrus.Info("elk msg: ", msg)
		sendMsg(msg) // send the alert (sendMsg is defined elsewhere)
	}
}
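Monitoring needs a timer that defines when checks happen and automatically runs the function we have set up, so we use a third-party scheduling package for this.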
go get github.com/robfig/cron
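Run automatically once every minute:
// Set up the schedule
func cronExec() {
	c := cron.New()
	// Six-field spec with seconds first: fire at second 0 of every minute
	c.AddFunc("0 * * * * *", func() {
		go getELKLog()
	})
	c.Start()
	select {} // block forever so the scheduler keeps running
}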
{"msg":"elk watch start","severity":"info","time":"2020-10-06T11:25:28+08:00"}
{"msg":"elk檢查開始: 2020-10-06T02:07:00Z ~ 2020-10-06T02:08:00Z","severity":"info","time":"2020-10-06T11:26:00+08:00"}
{"msg":"elk msg: 1:M 06 Oct 2020 02:07:38.015 * Background saving started by pid 822","severity":"info","time":"2020-10-06T11:26:00+08:00"}
{"msg":"elk msg: 822:C 06 Oct 2020 02:07:38.021 * DB saved on disk","severity":"info","time":"2020-10-06T11:26:00+08:00"}
{"msg":"elk msg: 1:M 06 Oct 2020 02:07:38.116 * Background saving terminated with success","severity":"info","time":"2020-10-06T11:26:00+08:00"}
{"msg":"elk msg: 1:M 06 Oct 2020 02:07:38.015 * 1 changes in 3600 seconds. Saving...","severity":"info","time":"2020-10-06T11:26:01+08:00"}
{"msg":"elk msg: 822:C 06 Oct 2020 02:07:38.021 * RDB: 0 MB of memory used by copy-on-write","severity":"info","time":"2020-10-06T11:26:01+08:00"}
Message received in Telegram
Kibana query result
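The simple ELK sample program above is a brief introduction to using ELK for monitoring. If your logging system defines a unified log format, you can even customize the alert text for each kind of error, so that whoever receives the message understands what went wrong right away, instead of finding out only after reading through and analyzing the logs. When more detailed records are needed, you can query them directly in the Kibana interface. Combining ELK with Telegram in this way lets us spot problems in real time and analyze the cause of errors so they can be fixed.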
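As a minimal sketch of customizing the alert text, the example below decodes a log document and formats a one-line message from it. It assumes the document stores message at the top level and severity inside a nested content object, matching the field names queried earlier. The logEntry type, the formatAlert function, and the output layout are all hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// logEntry mirrors only the two fields used in this article; any other
// fields in a real document are ignored. The nesting is an assumption.
type logEntry struct {
	Message string `json:"message"`
	Content struct {
		Severity string `json:"severity"`
	} `json:"content"`
}

// formatAlert (hypothetical helper) turns a raw log document into a
// one-line alert text that shows the severity before the message.
func formatAlert(source []byte) (string, error) {
	var e logEntry
	if err := json.Unmarshal(source, &e); err != nil {
		return "", fmt.Errorf("decode log entry: %w", err)
	}
	if e.Message == "" {
		return "", errors.New("log entry has no message field")
	}
	return fmt.Sprintf("[severity %s] %s", e.Content.Severity, e.Message), nil
}

func main() {
	src := []byte(`{"message":"connection refused","content":{"severity":"500"}}`)
	alert, err := formatAlert(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(alert)
}
```

A function like this could be called on each hit before the message is sent, so the alert itself already says how serious the event is.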
The complete sample code is available on GitHub; feel free to download it if you need it.