iT邦幫忙

The 12th iThome Ironman Contest

DAY 24
Elastic Stack on Cloud

Python&Elasticsearch Beginner Series, Part 24

IT Ironman Day 24: Querying Elasticsearch Data with Python, Aggregations: Stats/Extended Stats

Today we cover two aggregations: Stats (summary statistics) and Extended Stats (extended statistics).

Test data:

![https://ithelp.ithome.com.tw/upload/images/20201006/20129976M2rzQ3oSii.png](https://ithelp.ithome.com.tw/upload/images/20201006/20129976M2rzQ3oSii.png)
{
  "grades" : {
    "math" : "60",
    "mand" : "87",
    "eng" : "90",
    "soc" : "74"
  },
  "name" : "王小明",
  "class" : "資工一1",
  "sid" : "s1090101",
  "weight" : "1"
}

Stats

This aggregation returns five values: min, max, sum, count, and avg.
Aggregation query:

{
  "aggs": {
    "stats": {
      "stats": {
        "field": "grades.math"
      }
    }
  }
}
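Since this series queries Elasticsearch from Python, here is a minimal sketch of sending the query above with the official `elasticsearch` client. It assumes a 7.x client, whose `search()` accepts a `body` dict, and a hypothetical index name `students`; `build_stats_body` and `run_stats` are illustrative helpers, not part of the library. Unlike the original query, it also sets `size: 0` so that no search hits are returned alongside the aggregation.

```python
from typing import Any, Dict


def build_stats_body(field: str) -> Dict[str, Any]:
    """Request body for a `stats` aggregation (named "stats") on `field`."""
    return {
        "size": 0,  # added in this sketch: return only the aggregation, no hits
        "aggs": {
            "stats": {                      # name of the aggregation in the response
                "stats": {"field": field},  # aggregation type and target field
            }
        },
    }


def run_stats(es: Any, index: str, field: str) -> Dict[str, Any]:
    """Send the stats aggregation with an elasticsearch-py 7.x client and return its result."""
    response = es.search(index=index, body=build_stats_body(field))
    return response["aggregations"]["stats"]


# Usage (needs a running cluster and `pip install elasticsearch`; index name is hypothetical):
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# print(run_stats(es, "students", "grades.math"))
```

The response is a plain dict, so the five values are read with keys such as `result["avg"]`.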

Result:

"aggregations" : {
  "stats" : {
    "count" : 6,
    "min" : 34.0,
    "max" : 91.0,
    "avg" : 70.83333333333333,
    "sum" : 425.0
  }
}
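To make each field concrete, the plain-Python function below computes the same five values locally from a list of numbers. It is only a sketch for building intuition: the grades used are hypothetical, not the six documents behind the result above, and the function assumes a non-empty list.

```python
from typing import Dict, List


def local_stats(values: List[float]) -> Dict[str, float]:
    """Compute count, min, max, avg and sum, the fields returned by `stats`."""
    if not values:
        raise ValueError("this sketch expects at least one value")
    total = float(sum(values))
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / len(values),
        "sum": total,
    }


# Hypothetical math grades, for illustration only
print(local_stats([60, 75, 90]))
# {'count': 3, 'min': 60.0, 'max': 90.0, 'avg': 75.0, 'sum': 225.0}
```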

Extended Stats

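Compared with plain Stats, this aggregation additionally returns the following values:
1. sum_of_squares: the sum of the squared values, a² + b² + c² + ...

2. variance: the average squared deviation from avg; for four values a, b, c, d it is ((a-avg)² + (b-avg)² + (c-avg)² + (d-avg)²) / 4.
There are also the population variance (variance_population), which divides by the number of values n as above, and the sample variance (variance_sampling), which divides by n - 1. In the results below, variance equals variance_population.

3. std_deviation: the standard deviation, i.e. the square root of the variance. It likewise comes as a population standard deviation (std_deviation_population) and a sample standard deviation (std_deviation_sampling).

4. std_deviation_bounds: the range avg ± sigma × std_deviation. By default sigma is 2, i.e. two standard deviations above and below the mean; set the sigma parameter to change how many standard deviations are used. Population and sampling versions are returned as well.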
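The definitions above can be summarized as follows, for values x_1, ..., x_n with mean \bar{x} (the avg field). Here `sigma` is the aggregation parameter (default 2), not the symbol for the standard deviation; the `_sampling` bounds use std_deviation_sampling in place of the population value. This is a compact restatement of the list, not a formula taken from the Elasticsearch documentation.

```latex
\begin{aligned}
\text{sum\_of\_squares} &= \sum_{i=1}^{n} x_i^2 \\
\text{variance\_population} &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\text{variance\_sampling} &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\text{std\_deviation\_population} &= \sqrt{\text{variance\_population}} \\
\text{std\_deviation\_sampling} &= \sqrt{\text{variance\_sampling}} \\
\text{upper\_population},\ \text{lower\_population} &= \bar{x} \pm \text{sigma} \cdot \text{std\_deviation\_population}
\end{aligned}
```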

Aggregation query:

{
  "aggs": {
    "stats": {
      "extended_stats": {
        "field": "grades.math"
      }
    }
  }
}

Result:

"aggregations" : {
  "stats" : {
    "count" : 6,
    "min" : 34.0,
    "max" : 91.0,
    "avg" : 70.83333333333333,
    "sum" : 425.0,
    "sum_of_squares" : 32423.0,
    "variance" : 386.472222222222,
    "variance_population" : 386.472222222222,
    "variance_sampling" : 463.7666666666664,
    "std_deviation" : 19.658896770221414,
    "std_deviation_population" : 19.658896770221414,
    "std_deviation_sampling" : 21.53524243343145,
    "std_deviation_bounds" : {
      "upper" : 110.15112687377615,
      "lower" : 31.5155397928905,
      "upper_population" : 110.15112687377615,
      "lower_population" : 31.5155397928905,
      "upper_sampling" : 113.90381820019623,
      "lower_sampling" : 27.76284846647043
    }
  }
}
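As a sanity check on these numbers, the sketch below recomputes the variance and standard deviation fields from only count, sum and sum_of_squares, copied from the response above. It uses the identity variance_population = sum_of_squares / count - avg², and rescales by count / (count - 1) to get the sample variance. `derive_extended_stats` is a hypothetical helper name; the subtraction form can lose precision when values are large and close together, but it is fine for grades like these.

```python
import math
from typing import Dict


def derive_extended_stats(count: int, total: float, sum_of_squares: float) -> Dict[str, float]:
    """Recompute variance/std fields from count, sum and sum_of_squares."""
    avg = total / count
    # Population variance: mean of the squares minus the square of the mean (divides by n)
    variance_population = sum_of_squares / count - avg ** 2
    # Sample variance: same squared deviations, divided by n - 1 instead of n
    variance_sampling = variance_population * count / (count - 1)
    return {
        "avg": avg,
        "variance_population": variance_population,
        "variance_sampling": variance_sampling,
        "std_deviation_population": math.sqrt(variance_population),
        "std_deviation_sampling": math.sqrt(variance_sampling),
    }


# count, sum and sum_of_squares copied from the extended_stats response above
derived = derive_extended_stats(6, 425.0, 32423.0)
for name, value in derived.items():
    print(f"{name}: {value}")
```

Each printed value should agree with the corresponding field in the response up to floating-point rounding.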

With the adjustable sigma parameter:

{
  "aggs": {
    "stats": {
      "extended_stats": {
        "field": "grades.math",
        "sigma": 3
      }
    }
  }
}
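In Python, the extended_stats query, with or without sigma, can be built the same way. Again this is a sketch assuming the 7.x `elasticsearch` client and a hypothetical index `students`; `build_extended_stats_body` and `run_std_bounds` are illustrative names. Leaving `sigma` out of the body keeps Elasticsearch's default of 2.

```python
from typing import Any, Dict, Optional


def build_extended_stats_body(field: str, sigma: Optional[float] = None) -> Dict[str, Any]:
    """Request body for an `extended_stats` aggregation (named "stats") on `field`."""
    params: Dict[str, Any] = {"field": field}
    if sigma is not None:
        params["sigma"] = sigma  # standard deviations used for std_deviation_bounds
    return {"size": 0, "aggs": {"stats": {"extended_stats": params}}}


def run_std_bounds(es: Any, index: str, field: str,
                   sigma: Optional[float] = None) -> Dict[str, float]:
    """Run the aggregation with an elasticsearch-py 7.x client and return std_deviation_bounds."""
    response = es.search(index=index, body=build_extended_stats_body(field, sigma))
    return response["aggregations"]["stats"]["std_deviation_bounds"]


# Usage (needs a running cluster; index name is hypothetical):
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# print(run_std_bounds(es, "students", "grades.math", sigma=3))
```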

Result:

"aggregations" : {
  "stats" : {
    "count" : 6,
    "min" : 34.0,
    "max" : 91.0,
    "avg" : 70.83333333333333,
    "sum" : 425.0,
    "sum_of_squares" : 32423.0,
    "variance" : 386.472222222222,
    "variance_population" : 386.472222222222,
    "variance_sampling" : 463.7666666666664,
    "std_deviation" : 19.658896770221414,
    "std_deviation_population" : 19.658896770221414,
    "std_deviation_sampling" : 21.53524243343145,
    "std_deviation_bounds" : {
      "upper" : 129.81002364399757,
      "lower" : 11.85664302266909,
      "upper_population" : 129.81002364399757,
      "lower_population" : 11.85664302266909,
      "upper_sampling" : 135.43906063362766,
      "lower_sampling" : 6.227606033038981
    }
  }
}
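Comparing the two responses, only std_deviation_bounds changes: each bound is avg ± sigma × standard deviation, using the population standard deviation for upper/lower and the _population pair, and the sample standard deviation for the _sampling pair. The sketch below reproduces both sets of bounds from avg and the two standard deviations copied from the responses; `std_bounds` is a hypothetical helper name.

```python
from typing import Dict


def std_bounds(avg: float, std_population: float, std_sampling: float,
               sigma: float = 2.0) -> Dict[str, float]:
    """Recompute std_deviation_bounds as avg ± sigma × standard deviation."""
    return {
        "upper_population": avg + sigma * std_population,
        "lower_population": avg - sigma * std_population,
        "upper_sampling": avg + sigma * std_sampling,
        "lower_sampling": avg - sigma * std_sampling,
    }


# avg and standard deviations copied from the responses above
AVG = 70.83333333333333
STD_POP = 19.658896770221414
STD_SAMP = 21.53524243343145

for sigma in (2, 3):
    print(f"sigma={sigma}:", std_bounds(AVG, STD_POP, STD_SAMP, sigma))
```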

That's all for today's post. Thanks for reading!

