2021 iThome Ironman Contest

DAY 11

Software Development

Part 11 of the MongoDB披荊斬棘之路 series

DAY11 MongoDB: Aggregation in Depth and Common Questions

13th Ironman Contest mongodb aggregate group

阿派

2021-09-11 13:59:49


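MongoDB operators came up earlier in this series, but those were query operators. This post covers more operators, ones meant specifically for aggregate. $sort and $limit were covered yesterday, so today we continue with the rest.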

$project

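Specifies which fields to return. For example, if I only want the name of the highest-rated movie, I can show just the name field. The syntax is:

db.movie.aggregate([
{"$sort" : { "rating" : -1 }},
{"$limit" : 1},
{"$project": {"name": 1}}
])

Result: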

/* 1 */
{
    "_id" : ObjectId("6120c79d2976f517181ffefa"),
    "name" : "movieE"
}

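Hmm... not quite what I expected: there is an extra _id field. This field is returned by default and has to be switched off explicitly, which is simple to do.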

{"$project": {_id: 0, "name": 1}}
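To make the three stages concrete, here is a minimal in-memory sketch in plain JavaScript of what $sort, $limit, and $project with _id: 0 compute. It is not how MongoDB executes the pipeline. The `movies` array is reconstructed from the results shown in this post (the numeric `_id` values are placeholders), and `topRatedName` is a hypothetical helper.

```javascript
// Sample documents reconstructed from the results in this post (assumption).
// The _id values are placeholders, not the real ObjectIds.
const movies = [
  { _id: 1, name: "movieA", rating: 8, totalCost: 30000000, producer: "companyA" },
  { _id: 2, name: "movieB", rating: 5, totalCost: 10000000, producer: "companyA" },
  { _id: 3, name: "movieC", rating: 6, totalCost: 25000000, producer: "companyA" },
  { _id: 4, name: "movieD", rating: 8, totalCost: 10000000, producer: "companyB" },
  { _id: 5, name: "movieE", rating: 9, totalCost: 6000000, producer: "companyC" },
];

// Hypothetical helper mirroring $sort -> $limit -> $project.
function topRatedName(docs) {
  return [...docs]                         // copy so the input is not reordered
    .sort((a, b) => b.rating - a.rating)   // {"$sort": {"rating": -1}}
    .slice(0, 1)                           // {"$limit": 1}
    .map((doc) => ({ name: doc.name }));   // {"$project": {_id: 0, "name": 1}}
}

console.log(topRatedName(movies)); // [ { name: 'movieE' } ]
```

Each step takes a list of documents and returns a new list, which is the mental model to keep for every stage in this post.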

$match

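This takes essentially the same conditions as the find command. In an aggregation, it keeps only the documents that match those conditions.
Let's query for movies whose producer is "companyA" and whose rating is 6 or higher.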

{"$match": {"producer": "companyA", "rating": { $gte: 6 }}}

Result

/* 1 */
{
    "_id" : ObjectId("6120c79d2976f517181ffef6"),
    "name" : "movieA",
    "language" : "en-gb",
    "rating" : 8.0,
    "totalCost" : 30000000.0,
    "producer" : "companyA"
}

/* 2 */
{
    "_id" : ObjectId("6120c79d2976f517181ffef8"),
    "name" : "movieC",
    "language" : "zh-tw",
    "rating" : 6.0,
    "totalCost" : 25000000.0,
    "producer" : "companyA"
}
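The result includes movieC with a rating of exactly 6.0 because $gte means "greater than or equal to". As a small sketch in plain JavaScript (an in-memory stand-in, not MongoDB itself), the difference between $gte and $gt looks like this. The `movies` array is reconstructed from the results in this post, and the variable names are only for illustration.

```javascript
// Sample documents reconstructed from the results in this post (assumption).
const movies = [
  { name: "movieA", rating: 8, producer: "companyA" },
  { name: "movieB", rating: 5, producer: "companyA" },
  { name: "movieC", rating: 6, producer: "companyA" },
  { name: "movieD", rating: 8, producer: "companyB" },
  { name: "movieE", rating: 9, producer: "companyC" },
];

// Equivalent of {"$match": {"producer": "companyA", "rating": {$gte: 6}}}
const atLeastSix = movies.filter(
  (m) => m.producer === "companyA" && m.rating >= 6
);

// Equivalent of {"$match": {"producer": "companyA", "rating": {$gt: 6}}}
const aboveSix = movies.filter(
  (m) => m.producer === "companyA" && m.rating > 6
);

console.log(atLeastSix.map((m) => m.name)); // [ 'movieA', 'movieC' ]
console.log(aboveSix.map((m) => m.name));   // [ 'movieA' ]
```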

$group & $sum

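This is the familiar group by: documents are grouped by the field you specify.
$sum is used in the same example.
Suppose we want to compute the total cost for each production company...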

db.movie.aggregate([
{ "$group" :
    {
        _id: "$producer",
        "totalCost" : {"$sum":"$totalCost"}
    }
}
])

Result

/* 1 */
{
    "_id" : "companyB",
    "totalCost" : 10000000.0
}

/* 2 */
{
    "_id" : "companyA",
    "totalCost" : 65000000.0
}

/* 3 */
{
    "_id" : "companyC",
    "totalCost" : 6000000.0
}

Inside $group, the grouping key can only be named _id. To give it a different name, follow the $group stage with a $project stage.
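As an illustration of that renaming, a stage such as { "$project": { _id: 0, producer: "$_id", totalCost: 1 } } placed after $group would expose the key as producer. The sketch below mimics $group with $sum plus that rename in plain JavaScript. It is an in-memory approximation under the assumption that the sample data matches the results in this post; `totalCostByProducer` is a hypothetical helper.

```javascript
// Sample documents reconstructed from the results in this post (assumption).
const movies = [
  { name: "movieA", totalCost: 30000000, producer: "companyA" },
  { name: "movieB", totalCost: 10000000, producer: "companyA" },
  { name: "movieC", totalCost: 25000000, producer: "companyA" },
  { name: "movieD", totalCost: 10000000, producer: "companyB" },
  { name: "movieE", totalCost: 6000000, producer: "companyC" },
];

// Hypothetical helper: $group by producer with $sum, then rename _id.
function totalCostByProducer(docs) {
  const totals = new Map();
  for (const doc of docs) {
    // {"$group": {_id: "$producer", "totalCost": {"$sum": "$totalCost"}}}
    totals.set(doc.producer, (totals.get(doc.producer) || 0) + doc.totalCost);
  }
  // {"$project": {_id: 0, producer: "$_id", totalCost: 1}}
  return [...totals].map(([producer, totalCost]) => ({ producer, totalCost }));
}

console.log(totalCostByProducer(movies));
// [
//   { producer: 'companyA', totalCost: 65000000 },
//   { producer: 'companyB', totalCost: 10000000 },
//   { producer: 'companyC', totalCost: 6000000 }
// ]
```

The sketch returns groups in first-seen order. MongoDB's $group makes no such promise, as the B, A, C order in the result above shows, so add a $sort stage if the order matters.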

$lookup

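$lookup is the counterpart of a join in a relational database. It also has many additional features, which I may cover in detail in a separate post. In short, it queries data across collections.
First we add a producer collection that holds the name of the person in charge of each company. Then we look up the production company's information for each movie. Since the goal here is to explain the feature, I won't add many fields.

Syntax for inserting the producer data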

db.getCollection('producer').insertMany([
{"companyName": "companyA", "pic": "Thrall"},
{"companyName": "companyB", "pic": "Arthas"},
{"companyName": "companyC", "pic": "Jaina"},
])

$lookup syntax

db.movie.aggregate([
{ "$lookup" :
    {
        from: "producer",
        localField: "producer",
        foreignField: "companyName",
        as: "companyDetail"
    }},
{ "$project": {_id:0, language:0, producer:0}}
])

Result

/* 1 */
{
    "name" : "movieA",
    "rating" : 8.0,
    "totalCost" : 30000000.0,
    "companyDetail" : [
        {
            "_id" : ObjectId("6120d3e92976f517181ffefe"),
            "companyName" : "companyA",
            "pic" : "Thrall"
        }
    ]
}

/* 2 */
{
    "name" : "movieB",
    "rating" : 5.0,
    "totalCost" : 10000000.0,
    "companyDetail" : [
        {
            "_id" : ObjectId("6120d3e92976f517181ffefe"),
            "companyName" : "companyA",
            "pic" : "Thrall"
        }
    ]
}

/* 3 */
{
    "name" : "movieC",
    "rating" : 6.0,
    "totalCost" : 25000000.0,
    "companyDetail" : [
        {
            "_id" : ObjectId("6120d3e92976f517181ffefe"),
            "companyName" : "companyA",
            "pic" : "Thrall"
        }
    ]
}

/* 4 */
{
    "name" : "movieD",
    "rating" : 8.0,
    "totalCost" : 10000000.0,
    "companyDetail" : [
        {
            "_id" : ObjectId("6120d3e92976f517181ffeff"),
            "companyName" : "companyB",
            "pic" : "Arthas"
        }
    ]
}

/* 5 */
{
    "name" : "movieE",
    "rating" : 9.0,
    "totalCost" : 6000000.0,
    "companyDetail" : [
        {
            "_id" : ObjectId("6120d3e92976f517181fff00"),
            "companyName" : "companyC",
            "pic" : "Jaina"
        }
    ]
}
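Note that companyDetail is always an array, even when exactly one company matches. A rough in-memory sketch of that behavior in plain JavaScript follows. It only handles simple equality on scalar fields, it is not MongoDB's implementation, and `lookup` plus the extra `movieX` document are hypothetical additions for illustration.

```javascript
// Producer documents as inserted in this post.
const producers = [
  { companyName: "companyA", pic: "Thrall" },
  { companyName: "companyB", pic: "Arthas" },
  { companyName: "companyC", pic: "Jaina" },
];

// Two movies from this post plus one hypothetical movie with no matching company.
const movies = [
  { name: "movieD", rating: 8, producer: "companyB" },
  { name: "movieE", rating: 9, producer: "companyC" },
  { name: "movieX", rating: 7, producer: "companyZ" }, // hypothetical
];

// Hypothetical helper: attach every foreign document whose foreignField
// equals the local document's localField, under the key given by `as`.
function lookup(docs, foreignDocs, localField, foreignField, as) {
  return docs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const joined = lookup(movies, producers, "producer", "companyName", "companyDetail");

console.log(joined[1].companyDetail[0].pic); // Jaina
console.log(joined[2].companyDetail);        // [] (the movie is kept, with no match)
```

Like a left outer join, every input document survives; an unmatched one simply carries an empty array.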

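That covers the basics of aggregation; going further would make this post too long. If you have questions, feel free to ask or send me a private message!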

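Common aggregation question 1

What is each company's first movie?
What is the total cost of each company's movies?
How many movies has each company released?
List each company's movie names and years.

Performance aside, questions like these come up often in business requirements. The pipeline below covers the count, the total cost, and the name lists (with language in place of year, since the sample documents have no year field). The first-movie question would additionally need a field that records release order.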

db.movie.aggregate([
  { $match: { rating: { $gte: 1 } } },
  { $group: {
      _id: '$producer',
      'totalCount': { $sum: 1 },
      'totalCost': { $sum: '$totalCost' },
      'totalMovies': { $push: { 'name': '$name' } },
      'totalMovies_withLang': { $push: { 'name': '$name', 'lang': '$language' } }
    }
  }
])
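To see what the accumulators produce, here is a minimal in-memory sketch of the same idea in plain JavaScript: {$sum: 1} counts documents, {$sum: '$totalCost'} adds a field, and $push collects one entry per document. The sample data is reconstructed from the results in this post, `summarizeByProducer` is a hypothetical helper, and the with-language list is left out because the post does not show every movie's language.

```javascript
// Sample documents reconstructed from the results in this post (assumption).
const movies = [
  { name: "movieA", rating: 8, totalCost: 30000000, producer: "companyA" },
  { name: "movieB", rating: 5, totalCost: 10000000, producer: "companyA" },
  { name: "movieC", rating: 6, totalCost: 25000000, producer: "companyA" },
  { name: "movieD", rating: 8, totalCost: 10000000, producer: "companyB" },
  { name: "movieE", rating: 9, totalCost: 6000000, producer: "companyC" },
];

// Hypothetical helper mirroring $match followed by $group.
function summarizeByProducer(docs, minRating) {
  const groups = new Map();
  for (const doc of docs) {
    if (doc.rating < minRating) continue; // $match: { rating: { $gte: minRating } }
    if (!groups.has(doc.producer)) {
      groups.set(doc.producer, {
        _id: doc.producer,
        totalCount: 0,
        totalCost: 0,
        totalMovies: [],
      });
    }
    const group = groups.get(doc.producer);
    group.totalCount += 1;                       // { $sum: 1 }
    group.totalCost += doc.totalCost;            // { $sum: '$totalCost' }
    group.totalMovies.push({ name: doc.name });  // { $push: { name: '$name' } }
  }
  return [...groups.values()];
}

const summary = summarizeByProducer(movies, 1);
console.log(summary[0].totalCount); // 3
console.log(summary[0].totalCost);  // 65000000
```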

Common aggregation question 2 - conditional counts

How many movies have a rating of 7 or higher?

db.movie.aggregate([
  { $match: { rating: { $gte:7 } }},
  { $count: 'HigherThan 7 rating movies count' }
])
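$count returns a single document whose only field is the name you pass in. A small in-memory sketch in plain JavaScript, with sample ratings taken from the results in this post and `countStage` as a hypothetical helper; returning an empty list when nothing matches reflects my understanding of $count and is worth verifying against your own data.

```javascript
// Ratings reconstructed from the results in this post (assumption).
const movies = [
  { name: "movieA", rating: 8 },
  { name: "movieB", rating: 5 },
  { name: "movieC", rating: 6 },
  { name: "movieD", rating: 8 },
  { name: "movieE", rating: 9 },
];

// Hypothetical helper mirroring { $count: fieldName }.
// Assumption: with no input documents, no result document is produced.
function countStage(docs, fieldName) {
  return docs.length > 0 ? [{ [fieldName]: docs.length }] : [];
}

// { $match: { rating: { $gte: 7 } } }
const highRated = movies.filter((m) => m.rating >= 7);
const counted = countStage(highRated, "HigherThan 7 rating movies count");

console.log(counted); // [ { 'HigherThan 7 rating movies count': 3 } ]
```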

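Common aggregation question 3 - array statistics

Suppose every movie has a tags array field that randomly contains one or more elements from ["Action", "Eastern", "Western", "Historical", "Fantasy", "Drama", "Horror", "Thriller", "Science"].

How many distinct elements are there in total?
How many times does each element appear?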

db.movie.aggregate([
  { $unwind: { path: '$tags' }},
  { $group: { _id: '$tags', 'Counts': {$sum:1} } }
])
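The pipeline returns one document per tag, so the number of result documents answers the first question and the Counts field answers the second. Here is a minimal in-memory sketch of $unwind followed by $group in plain JavaScript. The `taggedMovies` data is made up for illustration, since the post does not show real tags, and `countTags` is a hypothetical helper.

```javascript
// Made-up sample data: the post does not list actual tags per movie.
const taggedMovies = [
  { name: "movieA", tags: ["Action", "Drama"] },
  { name: "movieB", tags: ["Drama"] },
  { name: "movieC", tags: ["Horror", "Action", "Thriller"] },
];

// Hypothetical helper mirroring $unwind on tags, then $group with $sum: 1.
function countTags(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (const tag of doc.tags) {                   // { $unwind: { path: '$tags' } }
      counts.set(tag, (counts.get(tag) || 0) + 1);  // { $sum: 1 } per tag
    }
  }
  return [...counts].map(([tag, n]) => ({ _id: tag, Counts: n }));
}

const perTag = countTags(taggedMovies);

console.log(perTag.length); // 4 distinct tags
console.log(perTag);
// [
//   { _id: 'Action', Counts: 2 },
//   { _id: 'Drama', Counts: 2 },
//   { _id: 'Horror', Counts: 1 },
//   { _id: 'Thriller', Counts: 1 }
// ]
```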

This series is also published on my personal blog, Pie Note.
