DAY 28
Elastic Stack on Cloud

## Notes from reading my Ironman contest teammates' articles #2: aggregations

Elastic Stack第十一重

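When executed, Console treats the request as a bare `GET /df_penguins/_search` and ignores the aggregation query in the JSON body that follows it. Because of that, I spent a lot of time testing why other people's bank data returned aggregation values while my penguin data returned none, even though the mappings looked almost identical. In the end it was just my own silly mistake...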
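A quick way to notice this kind of silent failure from R is to check whether the parsed response carries an `aggregations` element at all. The following is a minimal sketch, assuming the response is a parsed list like the one `Search()` returns; `has_aggregations` and both mock responses are hypothetical and written by hand for illustration, not part of the `elastic` package.

```r
# Hypothetical helper: TRUE if a parsed search response contains aggregation results.
has_aggregations <- function(res) {
  !is.null(res$aggregations) && length(res$aggregations) > 0
}

# Hand-written mock of a response where the aggregation body was ignored:
# only the "hits" element comes back.
plain_search <- list(hits = list(hits = list()))

# Hand-written mock of a response where the aggregation body was honored.
agg_search <- list(
  hits = list(hits = list()),
  aggregations = list(
    group_by_island = list(
      buckets = list(list(key = "Biscoe", doc_count = 168L))
    )
  )
)

has_aggregations(plain_search)
has_aggregations(agg_search)
```

If the first kind of response comes back for a query that was supposed to aggregate, the request body most likely never reached the search.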

``````r
library(elastic)
library(palmerpenguins) # penguin dataset
library(data.table)     # data wrangling
library(ggplot2)        # plotting

# Connect OK!

x <- connect(
  host = "111.asia-east1.gcp.elastic-cloud.com",
  path = "",
  user = "elastic",
  pwd = "111",
  port = 9243,
  transport_schema = "https"
)

# If the data has not been uploaded yet, remember to upload it with docs_bulk()
# docs_bulk(x, penguins, "df_penguins")

# Bank example: count documents per group
# Equivalent SQL: (select state, count(*) from bank2 group by state)
aggs <- '
{
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}'

# The result is a list; to keep things simple, extract the values directly with $
Search(conn = x, index = "bank2", body = aggs, size = 0)$aggregations$group_by_state$buckets

# Penguin example: count documents per group
# Equivalent SQL: (select island, count(*) as group_by_island from df_penguins group by island)

aggs2 <- '
{
  "aggs": {
    "group_by_island": {
      "terms": {
        "field": "island.keyword"
      }
    }
  }
}
'

island_counts <- Search(conn = x, index = "df_penguins", body = aggs2, size = 0)$aggregations$group_by_island$buckets

# Convert the list to a data.table
island_counts_dt <- rbindlist(island_counts)

#         key doc_count
# 1:    Biscoe       168
# 2:     Dream       124
# 3: Torgersen        52

# Finally, draw the chart with ggplot2
ggplot(data = island_counts_dt, aes(x = key, y = doc_count)) + geom_bar(stat = "identity")

``````
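The two query bodies above differ only in the aggregation name and the field, so they could be generated by a small helper instead of being written out twice. This is a sketch in base R; `terms_agg_body` is a hypothetical name introduced here, and it assumes the aggregation name and field contain no characters that need JSON escaping.

```r
# Hypothetical helper: build the JSON body of a terms aggregation as a string.
# Assumes agg_name and field contain no quotes or backslashes (no escaping is done).
terms_agg_body <- function(agg_name, field) {
  sprintf('{"aggs": {"%s": {"terms": {"field": "%s"}}}}', agg_name, field)
}

bank_body    <- terms_agg_body("group_by_state", "state.keyword")
penguin_body <- terms_agg_body("group_by_island", "island.keyword")

cat(penguin_body, "\n")

# With a live connection `x` (not run here), the body would be passed on as before:
# Search(conn = x, index = "df_penguins", body = penguin_body, size = 0)
```

Keeping the body as a plain string matches how `aggs` and `aggs2` are passed to `Search()` in the listing above.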

Elastic 30天自我修行31