# Playing with Python and MongoDB_Day17_Basic aggregate operations

2023 iThome Ironman Contest

DAY 17

Software Development

Playing with Python and MongoDB series, part 17

15th Ironman Contest python3 pymongo mongodb tutorial

熊熊工程師

2023-09-17 01:34:58

MongoDB provides aggregate for queries and calculations with complex conditions. Today we will use match, lookup, unwind, and group to perform a many-to-many relational lookup.

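1. Introduction to aggregate

aggregate is a powerful aggregation operation in MongoDB that processes and transforms the documents of a collection in multiple steps. With aggregate, you can run a series of data-processing stages (filtering, transforming, sorting, grouping, computing, and so on) and end up with exactly the result you need.

Below is a simple example showing how to use aggregate for basic data processing (db is assumed to be an already-connected database object):

# Find users older than 30 and sort them by age in descending order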
pipeline = [
    {
        '$match': {
            'age': {'$gt': 30}
        }
    },
    {
        '$sort': {
            'age': -1
        }
    }
]

result = db.users.aggregate(pipeline)
for user in result:
    print(user)

This example first uses the $match stage to keep only the users older than 30, then uses the $sort stage to order the result by age, descending.
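To see what these two stages do without needing a database, here is a minimal pure-Python sketch. The sample documents are made up for illustration, and the list comprehension and sorted() call only mimic the effect of $match and $sort; this is not how MongoDB executes the pipeline.

```python
# Made-up sample documents standing in for the users collection
users = [
    {"name": "Amy", "age": 28},
    {"name": "Bob", "age": 35},
    {"name": "Cat", "age": 42},
    {"name": "Dan", "age": 30},
]

# {'$match': {'age': {'$gt': 30}}} keeps only documents with age > 30
matched = [user for user in users if user["age"] > 30]

# {'$sort': {'age': -1}} orders the remaining documents by age, descending
result = sorted(matched, key=lambda user: user["age"], reverse=True)

for user in result:
    print(user)
```

Dan is exactly 30, so $gt excludes him; $gte would be the operator to use to include him.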

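In short, aggregate is a powerful data-processing tool in MongoDB, suited to a wide range of data analysis and processing needs. It helps you extract useful information from large collections of documents and generate customized reports.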

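Note that aggregate does not return a list. In PyMongo it returns a cursor (a CommandCursor), so you have to iterate over it with a loop, or pass it to list(), before you can read the data.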
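One practical consequence is that the result can only be read once. The sketch below uses a plain Python iterator as a stand-in for the cursor (an assumption made so the example runs without a database; it is an analogy, not PyMongo code) to show why materializing the result with list() is convenient.

```python
# A plain iterator stands in for the cursor returned by aggregate()
fake_cursor = iter([{"name": "Nick"}, {"name": "Amy"}])

first_pass = list(fake_cursor)   # consumes every document
second_pass = list(fake_cursor)  # nothing is left to read

print(len(first_pass))
print(len(second_pass))
```

If the data is needed more than once, keeping the list from the first pass avoids running the pipeline again.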

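2. Many-to-many relational lookup

Next, we will use the three tables (collections) created yesterday, author, book, and author_book_relation, to run a many-to-many query. The Collection object will be created on author.

First we query with match. match searches by the conditions we specify and is essentially the same idea as find. In the example below, we query for the author whose name is Nick.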

import os
from pathlib import Path
from dotenv import load_dotenv
from pymongo.database import Database
from pymongo.collection import Collection
from pymongo.mongo_client import MongoClient
from pprint import pprint

# Read .env to get the connection settings
BASE_DIR = Path(__file__).parent.parent
load_dotenv(str(BASE_DIR / ".env"))

# Create the client and connect to the db and collection
client = MongoClient(host=os.getenv("MONGODB_ATLAS_URL"))
database = Database(client=client, name="demo")
author_collection = Collection(database=database, name="author")

datas = list(author_collection.aggregate([
    {
        "$match": {
            "name": "Nick"
        }
    }
]))
pprint(datas)

client.close()

As you can see, the result only contains data from the author collection.

[Image: author]

Next, we use lookup to query the collection that stores the relations.

...
datas = list(author_collection.aggregate([
    {
        "$match": {
            "name": "Nick"
        }
    },
    {
        "$lookup": {
            "from": "author_book_relation",
            "localField": "_id",
            "foreignField": "author_id",
            "as": "author_book_relation"
        }
    }
]))
...

In the image below, you can see that lookup has pulled in all the related ids.
[Image: author lookup]
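The way $lookup attaches matching documents as an array can be sketched in pure Python. The helper emulate_lookup and the sample documents are hypothetical, written only for this illustration; the real stage runs inside MongoDB and has more detailed rules (for example around missing fields) than this simplified version.

```python
def emulate_lookup(docs, foreign_docs, local_field, foreign_field, as_field):
    """Attach to each doc the list of foreign docs whose foreign_field
    equals the doc's local_field (a simplified left outer join)."""
    joined = []
    for doc in docs:
        matches = [
            foreign
            for foreign in foreign_docs
            if foreign.get(foreign_field) == doc.get(local_field)
        ]
        # The original doc is kept even when matches is empty
        joined.append({**doc, as_field: matches})
    return joined


# Made-up sample data; only the field names come from the pipeline above
authors = [{"_id": 1, "name": "Nick"}]
relations = [
    {"_id": 10, "author_id": 1, "book_id": 100},
    {"_id": 11, "author_id": 1, "book_id": 101},
    {"_id": 12, "author_id": 2, "book_id": 100},
]

looked_up = emulate_lookup(
    authors, relations, "_id", "author_id", "author_book_relation"
)
print(looked_up)
```

The new field is always a list, even when only one relation matches, which is why a later stage is needed to flatten it.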

Next, we can use unwind to flatten the relation objects we just pulled in, which makes it easier to search by book_id in the following step.

...
datas = list(author_collection.aggregate([
    {
        "$match": {
            "name": "Nick"
        }
    },
    {
        "$lookup": {
            "from": "author_book_relation",
            "localField": "_id",
            "foreignField": "author_id",
            "as": "author_book_relation"
        }
    },
    {
        "$unwind": "$author_book_relation"
    }
]))
...

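In the image below, you can see that unwind has flattened the author_book_relation field, which used to be a list. Each resulting document now holds a single relation, so we can query by the book_id underneath this field.

[Image: unwind relation]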
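The flattening done by $unwind can be sketched in pure Python as well. The helper emulate_unwind is hypothetical and assumes the field always holds a list; its preserve_empty flag loosely mirrors the stage's preserveNullAndEmptyArrays option, in simplified form. It also shows a point worth keeping in mind: by default, a document whose array is empty disappears from the output.

```python
def emulate_unwind(docs, field, preserve_empty=False):
    """Emit one copy of each doc per element of doc[field]."""
    unwound = []
    for doc in docs:
        items = doc.get(field) or []
        if not items and preserve_empty:
            # Simplified: keep the doc and drop the empty field
            unwound.append({k: v for k, v in doc.items() if k != field})
        for item in items:
            # Replace the list with a single element
            unwound.append({**doc, field: item})
    return unwound


# Made-up sample data shaped like the output of the lookup step
docs = [
    {"_id": 1, "name": "Nick",
     "author_book_relation": [{"book_id": 100}, {"book_id": 101}]},
    {"_id": 2, "name": "Amy", "author_book_relation": []},
]

flat = emulate_unwind(docs, "author_book_relation")
flat_kept = emulate_unwind(docs, "author_book_relation", preserve_empty=True)

print(len(flat))
print(len(flat_kept))
```

In a real pipeline, the equivalent of preserve_empty is writing the stage as {"$unwind": {"path": "$author_book_relation", "preserveNullAndEmptyArrays": True}}.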

Next, we use lookup again, this time against the book collection, matching on the book_id under author_book_relation.

...
datas = list(author_collection.aggregate([
    {
        "$match": {
            "name": "Nick"
        }
    },
    {
        "$lookup": {
            "from": "author_book_relation",
            "localField": "_id",
            "foreignField": "author_id",
            "as": "author_book_relation"
        }
    },
    {
        "$unwind": "$author_book_relation"
    },
    {
        "$lookup": {
            "from": "book",
            "localField": "author_book_relation.book_id",
            "foreignField": "_id",
            "as": "books"
        }
    },
]))
...

In the image below, you can see that lookup, combined with book_id, has pulled the book data into the books field.

[Image: book lookup]

Then we use unwind once more to flatten books, in preparation for the group step.

...
datas = list(author_collection.aggregate([
    {
        "$match": {
            "name": "Nick"
        }
    },
    {
        "$lookup": {
            "from": "author_book_relation",
            "localField": "_id",
            "foreignField": "author_id",
            "as": "author_book_relation"
        }
    },
    {
        "$unwind": "$author_book_relation"
    },
    {
        "$lookup": {
            "from": "book",
            "localField": "author_book_relation.book_id",
            "foreignField": "_id",
            "as": "books"
        }
    },
    {
        "$unwind": "$books"
    },
]))
...

In the image below, you can see that the books field has become a dictionary.

[Image: books unwind]

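Finally, we use group to tidy up the data. The three fields are set as follows:

Group by _id; this _id is the author's _id.
Create a new field called name that takes the first value in each group. Since every document in a group belongs to the same author, we simply take the first one with $first.
Create a field called books and push the name field of every book found into it.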
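To make the three settings above concrete, here is a pure-Python sketch of the same grouping. The helper emulate_group and the sample documents (including the book names) are hypothetical; the input is shaped like the output of the second unwind, where books is a single dictionary.

```python
def emulate_group(docs):
    """Group by _id, keep the first name, and collect every books.name."""
    groups = {}
    for doc in docs:
        key = doc["_id"]  # "_id": "$_id"
        if key not in groups:
            # "name": {"$first": "$name"} takes the value from the first doc
            groups[key] = {"_id": key, "name": doc["name"], "books": []}
        # "books": {"$push": "$books.name"} appends one name per doc
        groups[key]["books"].append(doc["books"]["name"])
    return list(groups.values())


# Made-up sample data: one document per (author, book) pair
unwound = [
    {"_id": 1, "name": "Nick", "books": {"_id": 100, "name": "Book A"}},
    {"_id": 1, "name": "Nick", "books": {"_id": 101, "name": "Book B"}},
]

grouped = emulate_group(unwound)
print(grouped)
```

The two per-book documents collapse back into one document per author, with the book names gathered into a list.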

...
datas = list(author_collection.aggregate([
    {
        "$match": {
            "name": "Nick"
        }
    },
    {
        "$lookup": {
            "from": "author_book_relation",
            "localField": "_id",
            "foreignField": "author_id",
            "as": "author_book_relation"
        }
    },
    {
        "$unwind": "$author_book_relation"
    },
    {
        "$lookup": {
            "from": "book",
            "localField": "author_book_relation.book_id",
            "foreignField": "_id",
            "as": "books"
        }
    },
    {
        "$unwind": "$books"
    },
    {
        "$group": {
            "_id": "$_id",
            "name": {"$first": "$name"},
            "books": {"$push": "$books.name"}
        }
    }
]))
...
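Because the pipeline is just a list of dictionaries, it can be built by an ordinary function and reused for any author. The function name build_author_books_pipeline is hypothetical; the stages are the same ones used above, with only the author name turned into a parameter.

```python
def build_author_books_pipeline(author_name):
    """Return the many-to-many pipeline for the given author name."""
    return [
        {"$match": {"name": author_name}},
        {
            "$lookup": {
                "from": "author_book_relation",
                "localField": "_id",
                "foreignField": "author_id",
                "as": "author_book_relation",
            }
        },
        {"$unwind": "$author_book_relation"},
        {
            "$lookup": {
                "from": "book",
                "localField": "author_book_relation.book_id",
                "foreignField": "_id",
                "as": "books",
            }
        },
        {"$unwind": "$books"},
        {
            "$group": {
                "_id": "$_id",
                "name": {"$first": "$name"},
                "books": {"$push": "$books.name"},
            }
        },
    ]


pipeline = build_author_books_pipeline("Nick")
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)
```

With the connection from earlier in this article, it would be used as datas = list(author_collection.aggregate(build_author_books_pipeline("Nick"))).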