Elastic Stack, Level 12

12th iThome Ironman Contest

DAY 12

Elastic Stack on Cloud

Elastic Stack Martial Arts Training Hall series, part 12

沉思者

2020-09-27 16:59:08

Aggregations Part II

This post uses the terms "aggregations" and "aggs" rather than a translated term.
It covers aggregations in more detail and shows another way to apply aggs.

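Prerequisites
All later examples use the sample data imported in Elastic Stack Level 7. If you have not imported it yet, follow that post to run the bulk import first.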

The real power of aggs

More on the four families of aggs

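Bucketing: builds buckets. Each document is assigned to the bucket whose criteria it matches, so the aggregation ends with a list of buckets, each holding the set of documents that belong to it.
Metric: tracks and computes metrics over a set of documents.
Matrix: operates on multiple fields and produces a matrix result from their values.
Pipeline: aggregates the output of other aggs and their associated metrics.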
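
To make the difference between the first two families concrete, here is a minimal in-memory sketch in plain Python. It does not call Elasticsearch; the documents, the bucket names, and the helpers bucketing and metric_avg are all hypothetical and exist only for illustration.

```python
# Hypothetical documents; only the idea of a numeric "balance" field is
# borrowed from the bank data set, the values are invented.
docs = [{"balance": 200}, {"balance": 800}, {"balance": 5000}]


def bucketing(documents, bucket_of):
    """Bucketing: put each document into the bucket that bucket_of assigns it."""
    buckets = {}
    for doc in documents:
        buckets.setdefault(bucket_of(doc), []).append(doc)
    return buckets


def metric_avg(documents, field):
    """Metric: reduce a set of documents to a single number."""
    values = [doc[field] for doc in documents]
    return sum(values) / len(values)


# A bucketing step yields groups of documents ...
buckets = bucketing(
    docs,
    lambda doc: "under_1000" if doc["balance"] < 1000 else "1000_and_over",
)
doc_counts = {key: len(members) for key, members in buckets.items()}

# ... while a metric step yields a value computed over documents.
overall_average = metric_avg(docs, "balance")

print(doc_counts)
print(overall_average)
```

A bucket is itself a set of documents, so metric_avg could just as well be applied to buckets["under_1000"] alone.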

aggregations can be nested!
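Each bucket defines a set of documents, so a further aggregation can analyze or subdivide the contents of that bucket.

Running an aggregation on the result of another aggregation is called a nested aggregation, and there is no hard limit on the nesting depth. An aggregation placed under a "parent" (higher-level) aggregation is called a sub-aggregation.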

Structure of an aggregation

"aggregations" : { (1)
    "<aggregation_name>" : { (2)
        "<aggregation_type>" : { (3)
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]? (4)
    }
    [,"<aggregation_name_2>" : { ... } ]*
}

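(1): aggs may be used here instead of aggregations
(2): <aggregation_name>: defined by the user; the aggregations section of the response uses the same name
(3): <aggregation_type>: every aggregation has a type, and the chosen aggregation_type determines the shape of its body. For example, the Terms Aggregation introduced in the previous level is one aggregation_type
(4): This is the power of aggs mentioned above: the result of (3) can be aggregated further, which is a nested aggregation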
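
As a rough illustration of how the four parts fit together, the structure can be assembled programmatically. The helper make_agg below is hypothetical (it is not part of any Elasticsearch client library), and the aggregation names and fields are borrowed from the bank example later in this post.

```python
import json


def make_agg(name, agg_type, body, sub_aggs=None, meta=None):
    """Build one named aggregation following the structure above.

    Hypothetical helper for illustration only.
    """
    agg = {agg_type: body}          # (3) the type and its body
    if meta is not None:
        agg["meta"] = meta          # optional metadata
    if sub_aggs:
        agg["aggs"] = sub_aggs      # (4) nested sub-aggregations
    return {name: agg}              # (2) the user-defined name


# A metric aggregation that will be nested under a bucket aggregation.
average_balance = make_agg("average_balance", "avg", {"field": "balance"})

request_body = {
    "size": 0,
    # (1) "aggs" is the short form of "aggregations"
    "aggs": make_agg(
        "group_by_state",
        "terms",
        {"field": "state.keyword"},
        sub_aggs=average_balance,
    ),
}

print(json.dumps(request_body, indent=2))
```

Because sub_aggs accepts the output of another make_agg call, the same helper can express deeper nesting without changes.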

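The previous level showed how to run aggs like the one above: send them to the search endpoint, _search.
Compare the JSON structure above with the previous level's example or with the example below.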
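
Outside of Kibana Console, the same request can be prepared with any HTTP client. The sketch below uses only the Python standard library and builds the request without sending it. The URL http://localhost:9200 is an assumption (a local cluster without security), and build_search_request is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch node without authentication.
ES_URL = "http://localhost:9200"


def build_search_request(index, body):
    """Prepare (but do not send) a _search request that carries an aggs body."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # _search accepts a request body with GET or POST
    )


body = {
    "size": 0,  # skip the hits, return only the aggregation results
    "aggs": {"average_balance": {"avg": {"field": "balance"}}},
}
request = build_search_request("bank", body)
print(request.get_method(), request.full_url)

# Sending it requires a running cluster:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     print(result["aggregations"]["average_balance"]["value"])
```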

Metrics Aggregations

Compute metrics over the values of a specified field across documents

Nested Aggregation (Bucket and Avg Aggregation)

Scenario: group documents by state, compute the average balance of the accounts in each state, and sort the results by average balance in descending order

Request (Nested Aggregation)

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": { (4)
          "average_balance": "desc"
        }
      },
      "aggs": { (1)
        "average_balance": { (2)
          "avg": { (3)
            "field": "balance"
          }
        }
      }
    }
  }
}

Request notes
(1): adds a nested aggregation
(2): <aggregation_name> is set to "average_balance"
(3): <aggregation_type> is avg
(4): the buckets returned by the terms aggs are sorted by the nested aggregation's "average_balance" (the average balance)
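
To reason about what this request computes, the same three steps (bucket by state, average the balance per bucket, sort by that average) can be mimicked locally. This is only a sketch over invented accounts, not how Elasticsearch executes the query; group_by_state_avg is a hypothetical function, and the output keys merely imitate the shape of a terms bucket.

```python
from collections import defaultdict

# Hypothetical accounts: the field names come from the request above,
# the values are made up.
accounts = [
    {"state": "TX", "balance": 1000},
    {"state": "TX", "balance": 3000},
    {"state": "CA", "balance": 5000},
    {"state": "NY", "balance": 500},
    {"state": "NY", "balance": 1500},
]


def group_by_state_avg(docs):
    """Mimic terms(state) with a nested avg(balance), ordered by the average."""
    balances = defaultdict(list)
    for doc in docs:
        balances[doc["state"]].append(doc["balance"])  # bucketing step

    buckets = [
        {
            "key": state,
            "doc_count": len(values),
            "average_balance": {"value": sum(values) / len(values)},  # metric step
        }
        for state, values in balances.items()
    ]
    # Same intent as "order": {"average_balance": "desc"} in the request.
    buckets.sort(key=lambda b: b["average_balance"]["value"], reverse=True)
    return buckets


for bucket in group_by_state_avg(accounts):
    print(bucket["key"], bucket["doc_count"], bucket["average_balance"]["value"])
```

With these invented values, CA comes first because its single account has the highest average, followed by TX and then NY.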

Response