[Day25] Data Statistics & Computation - Aggregation - iT 邦幫忙

12th iThome 鐵人賽 (Ironman Contest)

DAY 25

Elastic Stack on Cloud

Part 25 of the series "Elastic 戰台股" (Elastic Takes On Taiwan Stocks)


華叔

Team: 搭著 ESTC 飛上天 (Riding ESTC to the Sky)

2020-10-10 06:33:07


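After 23 days of battle, and with thanks to our team lead and spouse for dragging me into this, I can already see the endless possibilities of a new stock analysis system! Up to now I have treated Elastic Cloud + Elasticsearch as my cloud data center. But Elasticsearch should be more than that, so next I want to explore more ways of studying the data, which brings us to Aggregation.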

High-Level Concepts

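Put simply, Aggregation is Elasticsearch's tool for computation and statistics. Combined with its strong data storage and search capabilities, it gives ES the ability to do data analysis. To get a handle on Aggregations, you only need to understand two main concepts:

Buckets: a collection of Documents that match some condition
Metrics: statistical computations over the Buckets
For example, suppose I want to find "the number of stocks priced between 10 and 100." Here:
Buckets: use a search to find the Documents whose closing price on the latest trading day is between 10 and 100
Metrics: the number (count) of Documents in that Bucket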
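The bucket/metric split can be illustrated without a cluster. The following is a minimal Python sketch, not Elasticsearch itself: the documents and their prices are made up for illustration, and only the field names `stock_id` and `close` come from the index used in this series.

```python
# Hypothetical sample documents (made-up IDs and prices, one per stock).
docs = [
    {"stock_id": "1111", "close": 42.5},
    {"stock_id": "2222", "close": 433.0},
    {"stock_id": "3333", "close": 9.8},
    {"stock_id": "4444", "close": 25.1},
]

# Bucket: the collection of documents that match a condition,
# here a closing price between 10 and 100 (inclusive).
bucket = [d for d in docs if 10 <= d["close"] <= 100]

# Metric: a statistic computed over that bucket, here a plain count.
stock_count = len(bucket)

print(stock_count)
```

In Elasticsearch the same idea is expressed declaratively: the query or a bucket aggregation selects the documents, and a metric aggregation computes the statistic over them.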

Syntax

Here is an example:

GET /stock-history-prices-daily/_search
{
  "query" : {
    "match_all" : {}
  },
  "aggs" : { #1
    "data_size" : { #2
      "terms": {
        "field" : "stock_id" #3
      }
    }
  }
}

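In the Aggregation syntax above:
#1: tells ES that what follows is Aggregation DSL
#2: gives this Aggregation a name that reflects its purpose; in my example, I want to know how many historical price records each stock has
#3: defines a terms bucket, so each distinct stock_id gets its own record count
The response lists buckets keyed by stock_id, each with a doc_count (a terms aggregation returns only the top 10 terms by default).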
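To make the shape of a terms result concrete, here is a small Python sketch that mimics the bucketing locally. It is an approximation under stated assumptions: the documents are hypothetical, `terms_buckets` is a helper invented for this illustration, and ties are broken by key here simply to keep the output deterministic.

```python
from collections import Counter

# Hypothetical daily price records; only the field names match the real index.
docs = [
    {"stock_id": "1111", "date": "2020-10-06"},
    {"stock_id": "1111", "date": "2020-10-07"},
    {"stock_id": "1111", "date": "2020-10-08"},
    {"stock_id": "2222", "date": "2020-10-07"},
    {"stock_id": "2222", "date": "2020-10-08"},
    {"stock_id": "3333", "date": "2020-10-08"},
]

def terms_buckets(documents, field, size=10):
    """Group documents by the value of `field` and count each group.

    Buckets are ordered by doc_count descending and truncated to `size`,
    mirroring the default behaviour of a terms aggregation.
    """
    counts = Counter(d[field] for d in documents)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]

# Named after the aggregation in the request above.
data_size = terms_buckets(docs, "stock_id")
for bucket in data_size:
    print(bucket)
```

Each printed dictionary corresponds to one entry in the `buckets` array of the aggregation named `data_size`.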

Scoping Aggregations

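The example above targets every document in the index, which does not make it a good example. A Query can define the "scope" of the data, and combining that scope with Aggregations for analysis is what brings out the real power of ES. The next example limits the scope to documents dated 2020-10-08 whose close is below 100, then groups the closing prices into intervals of 20: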

GET /stock-history-prices-daily/_search
{
  "query" : {
    "bool": {
      "must": [
        {
          "match": {
            "date": "2020-10-08"
          }
        }
      ],
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": "doc['close'].value < 100",
            "params": {}
          }
        }
      }
    }
  },
  "aggs" : {
    "prices_distribution" : {
      "histogram": {
        "field" : "close",
        "interval": 20
      }
    }
  }
}
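The scoped request above can be read as two steps: first narrow the documents, then bucket what is left. The Python sketch below mimics both steps on hypothetical data. `histogram_buckets` is a helper invented for this illustration; it uses the bucket-key rule floor(value / interval) * interval and fills empty buckets between the lowest and highest key, which is what a histogram aggregation does by default (min_doc_count of 0). Elasticsearch reports the keys as floating-point numbers, while this sketch keeps them as integers.

```python
import math
from collections import Counter

# Hypothetical documents; prices are made up for illustration.
docs = [
    {"stock_id": "1111", "date": "2020-10-08", "close": 12.5},
    {"stock_id": "2222", "date": "2020-10-08", "close": 38.0},
    {"stock_id": "3333", "date": "2020-10-08", "close": 21.3},
    {"stock_id": "4444", "date": "2020-10-08", "close": 250.0},
    {"stock_id": "5555", "date": "2020-10-08", "close": 99.9},
    {"stock_id": "1111", "date": "2020-10-07", "close": 12.0},
]

# Step 1, the query: scope the data to one trading day and close < 100.
scoped = [d for d in docs if d["date"] == "2020-10-08" and d["close"] < 100]

def histogram_buckets(documents, field, interval):
    """Count documents per fixed-width bucket of an integer `interval`."""
    counts = Counter(
        math.floor(d[field] / interval) * interval for d in documents
    )
    if not counts:
        return []
    lowest, highest = min(counts), max(counts)
    # Include empty buckets between the lowest and highest key.
    return [
        {"key": key, "doc_count": counts[key]}
        for key in range(lowest, highest + interval, interval)
    ]

# Step 2, the aggregation: distribute the scoped closing prices.
prices_distribution = histogram_buckets(scoped, "close", 20)
for bucket in prices_distribution:
    print(bucket)
```

Because the scope already excludes prices of 100 and above, the highest possible bucket key in this example is 80.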