Day 15 Create DataStream

12th iThome Ironman Contest

Elastic Stack on Cloud

Series: 親愛的，我把ElasticSearch上雲了 (Honey, I Moved Elasticsearch to the Cloud), Part 15

Tags: 12th Ironman Contest, elasticsearch, elk, data stream

aron3312

Team: 團隊薪水被偷

2020-09-30 22:36:57

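Introduction

Yesterday we covered index templates: how to build and use them, and, through hands-on work with index templates and component templates, how they are matched when a new index is created. Today we look at data streams, which are built on top of those templates.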

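What Is a Data Stream

As we briefly mentioned earlier, a data stream is like flowing water: picture data continuously streaming through Elasticsearch. Data streams are therefore suited to large volumes of continuous time series data. The larger the volume and the more time-oriented the data, the more noticeable the savings in time cost when you use a data stream.
Under the hood, a data stream spreads large volumes of time series data across a set of associated indices, which improves both storage and query efficiency.

A data stream therefore offers two main benefits: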

The simplicity of a single named resource you can use for requests
The storage, scalability, and cost-saving benefits of multiple indices

Data streams are a good fit when you:

Use Elasticsearch to ingest, search, and manage large volumes of time series data
Want to scale and reduce costs by using ILM to automate the management of your indices
Index large volumes of time series data in Elasticsearch but rarely delete or update individual documents

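To summarize, a data stream gives you a single, simple request target, so you can easily send requests to the many indices behind it, while also storing that large set of indices more efficiently.

Consider using a data stream as the ingest target when the data is high-volume time series data that rarely needs to be deleted or updated, that is, data that is only ever appended.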

Figure: a data stream automatically generates the hidden indices behind it.

Figure: how a data stream handles a request behind the scenes.
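The routing described above can be sketched as a toy in-memory model. This is only an illustration, not how Elasticsearch is implemented: the class `DataStreamModel` and its methods are hypothetical names. It assumes writes go to the newest backing index, searches fan out over every backing index, and backing indices are named in the `.ds-<stream>-<generation>` form that appears in this article's responses.

```python
class DataStreamModel:
    """Toy in-memory model of a data stream (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.generation = 1
        # Each backing index is modelled as a (name, list-of-documents) pair.
        self.backing = [(self._index_name(), [])]

    def _index_name(self):
        # Six-digit, zero-padded generation, as in .ds-my-data-stream-000001.
        return f".ds-{self.name}-{self.generation:06d}"

    def index(self, doc):
        # Writes always go to the newest backing index (the write index).
        name, docs = self.backing[-1]
        docs.append(doc)
        return name

    def rollover(self):
        # A rollover adds a new backing index, which becomes the write index.
        self.generation += 1
        self.backing.append((self._index_name(), []))
        return self.backing[-1][0]

    def search(self, predicate):
        # Searches fan out over every backing index of the stream.
        return [(name, d) for name, docs in self.backing for d in docs if predicate(d)]


stream = DataStreamModel("my-data-stream")
first = stream.index({"message": "Login attempt failed"})
stream.rollover()
second = stream.index({"message": "Login successful"})
hits = stream.search(lambda d: "Login" in d["message"])
print(first, second, len(hits))
```

The caller only ever uses the stream name; which backing index holds a document is a detail the model, like the real data stream, keeps behind that single name.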

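Creating a Data Stream

Creating an Index Lifecycle Management Policy

Before creating the data stream, we can first define an ILM (index lifecycle management) policy. The data stream generates hidden backing indices over time, and the policy manages their lifecycle.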

PUT /_ilm/policy/my-data-stream-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "25GB"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

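This policy rolls over to a new backing index once the current one reaches 25 GB, and deletes each index 30 days after it has rolled over.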
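As a rough sketch of what the two phases of this policy decide, the checks can be written as plain functions. This is a simplified model under stated assumptions: the function names are hypothetical, the size is taken to be the primary shard size of the write index, and the age is measured from rollover. Real ILM evaluates these conditions periodically rather than on every write.

```python
from datetime import timedelta

GB = 1024 ** 3  # Elasticsearch size units such as "GB" are powers of 1024

MAX_SIZE_BYTES = 25 * GB             # hot phase: rollover.max_size = "25GB"
DELETE_MIN_AGE = timedelta(days=30)  # delete phase: min_age = "30d"


def should_rollover(primary_store_bytes):
    """True once the write index has reached max_size (primary shards only)."""
    return primary_store_bytes >= MAX_SIZE_BYTES


def should_delete(age_since_rollover):
    """True once a rolled-over index has reached the delete phase's min_age."""
    return age_since_rollover >= DELETE_MIN_AGE


print(should_rollover(10 * GB))           # False
print(should_rollover(25 * GB))           # True
print(should_delete(timedelta(days=31)))  # True
```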

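Creating an Index Template for the Data Stream

By now you probably have a good idea of what an index template is for: here it configures the backing indices that the data stream creates.

We define an index template whose pattern is my-data-stream*, which matches any name that starts with my-data-stream, and attach the ILM policy through settings.
We also map the @timestamp field with the type date_nanos.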

PUT /_index_template/my-data-stream-template
{
  "index_patterns": [ "my-data-stream*" ],
  "data_stream": { },
  "priority": 200,
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date_nanos" }    
      }
    },
    "settings": {
      "index.lifecycle.name": "my-data-stream-policy"
    }
  }
}
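To illustrate how the pattern and priority in this template come into play, here is a minimal sketch of template selection. It assumes that, among all templates whose pattern matches a name, the one with the highest priority is used. The second template, `generic-template`, is hypothetical and exists only for the comparison. Python's `fnmatchcase` is used as a stand-in for Elasticsearch wildcard matching; it also treats `?` and `[...]` as special, which these patterns do not use.

```python
from fnmatch import fnmatchcase

templates = [
    # Mirrors the template defined above.
    {"name": "my-data-stream-template", "index_patterns": ["my-data-stream*"], "priority": 200},
    # Hypothetical lower-priority template, for illustration only.
    {"name": "generic-template", "index_patterns": ["my-*"], "priority": 100},
]


def pick_template(target_name, candidates):
    """Return the highest-priority template whose pattern matches, or None."""
    matches = [
        t for t in candidates
        if any(fnmatchcase(target_name, p) for p in t["index_patterns"])
    ]
    return max(matches, key=lambda t: t["priority"], default=None)


chosen = pick_template("my-data-stream", templates)
print(chosen["name"])
print(pick_template("unrelated-index", templates))
```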

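Creating the Data Stream and Indexing a Document

Next, we create the data stream and index a document in one step with a _doc request.
We POST a _doc request to my-data-stream; Elasticsearch then creates the my-data-stream data stream automatically and indexes our JSON document.
Because the name matches the pattern in our index template, the template's settings are applied.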

POST /my-data-stream/_doc/
{
  "@timestamp": "2020-12-06T11:04:05.000Z",
  "user": {
    "id": "vlb44hny"
  },
  "message": "Login attempt failed"
}
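Every document sent to a data stream needs an @timestamp field, so it can be worth checking documents before sending them. The following is a minimal client-side sketch; `validate_for_data_stream` is a hypothetical helper, and it only accepts the one timestamp layout used in this article, whereas the date_nanos mapping itself accepts more formats.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # layout used in this article's example


def validate_for_data_stream(doc):
    """Return the parsed @timestamp, or raise ValueError if it is missing or malformed."""
    if "@timestamp" not in doc:
        raise ValueError("documents in a data stream need an @timestamp field")
    return datetime.strptime(doc["@timestamp"], TIMESTAMP_FORMAT)


doc = {
    "@timestamp": "2020-12-06T11:04:05.000Z",
    "user": {"id": "vlb44hny"},
    "message": "Login attempt failed",
}
print(validate_for_data_stream(doc))
```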

Now let's look at the response to the request.

{
    "_index": ".ds-my-data-stream-000001",
    "_type": "_doc",
    "_id": "70xl33QBY_yROqwhRrJE",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}

Note that _index is .ds-my-data-stream-000001. This is one of the backing indices mentioned earlier.
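If you need to map a backing index name back to its data stream, for example when reading `_index` from a response, a small parser is enough. This sketch assumes the `.ds-<stream>-<six-digit generation>` form shown in the response above; `parse_backing_index` is a hypothetical helper, and other Elasticsearch releases may name backing indices differently, in which case the pattern would need adjusting.

```python
import re

# Matches names such as .ds-my-data-stream-000001 (format assumed from this article).
BACKING_INDEX = re.compile(r"^\.ds-(?P<stream>.+)-(?P<generation>\d{6})$")


def parse_backing_index(index_name):
    """Return (stream_name, generation), or None if the name does not match."""
    match = BACKING_INDEX.match(index_name)
    if match is None:
        return None
    return match.group("stream"), int(match.group("generation"))


print(parse_backing_index(".ds-my-data-stream-000001"))
print(parse_backing_index("my-data-stream"))
```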

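Inspecting the Data Stream

To view the data we just uploaded, we can run index-level operations against the data stream itself; Elasticsearch automatically routes them to the matching backing indices.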

GET /my-data-stream/_search

The response:

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": ".ds-my-data-stream-000001",
                "_type": "_doc",
                "_id": "70xl33QBY_yROqwhRrJE",
                "_score": 1.0,
                "_source": {
                    "@timestamp": "2020-12-06T11:04:05.000Z",
                    "user": {
                        "id": "vlb44hny"
                    },
                    "message": "Login attempt failed"
                }
            }
        ]
    }
}

A request to my-data-stream finds the document we just indexed, and its _index field is again ".ds-my-data-stream-000001".

As with indices, we can query several data streams at once by separating their names with commas in the request path.

GET /my-data-stream,my-data-stream-alt/_search
{
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}

We can also query data streams with a wildcard pattern.

GET /my-data-stream*/_search
{
  "query": {
    "match": {
      "user.id": "vlb44hny"
    }
  }
}
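The two addressing styles above, comma-separated names and wildcard patterns, can be modelled as a small resolver. This is a sketch only: `resolve_targets` and the list of known streams are hypothetical, and `fnmatchcase` approximates Elasticsearch wildcard matching for simple `*` patterns.

```python
from fnmatch import fnmatchcase

# Hypothetical set of data streams that exist in the cluster.
known_streams = ["my-data-stream", "my-data-stream-alt", "other-stream"]


def resolve_targets(expression, streams):
    """Expand a comma-separated, possibly wildcarded target into stream names."""
    resolved = []
    for part in expression.split(","):
        for name in streams:
            if fnmatchcase(name, part) and name not in resolved:
                resolved.append(name)
    return resolved


print(resolve_targets("my-data-stream,my-data-stream-alt", known_streams))
print(resolve_targets("my-data-stream*", known_streams))
```

Both calls resolve to the same two streams, which is why the wildcard form is convenient once several streams share a prefix.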

Adding and Listing Backing Indices

Sending a _rollover request to the data stream creates a new backing index.

POST /my-data-stream/_rollover/

After the request completes, a new backing index has been added:

.ds-my-data-stream-000002

We can also use the _cat indices API with a few parameters to list the data stream's backing indices.

GET /_cat/indices/my-data-stream?v&s=index&h=index,status

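The output shows whether each index is open or closed. Indices stay open unless something closes them; a closed index may have been closed manually or as part of its lifecycle.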
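When scripting against this output, the plain-text table can be turned into records with a few lines of code. The sample text below is hypothetical and only mimics the two columns requested with `h=index,status`; `parse_cat_indices` is an illustrative helper, not part of any client library.

```python
def parse_cat_indices(text):
    """Parse _cat-style output (header row plus whitespace-separated columns)."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Hypothetical output for illustration, shaped like a ?v&h=index,status response.
sample = """\
index                     status
.ds-my-data-stream-000001 open
.ds-my-data-stream-000002 close
"""

rows = parse_cat_indices(sample)
closed = [row["index"] for row in rows if row["status"] == "close"]
print(closed)
```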

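Documents in a closed backing index cannot be searched, updated, or deleted, but the index can be reopened. Sending an open request to the data stream reopens all of its closed backing indices.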

POST /my-data-stream/_open/
