Day-14 | Postgres Query plan - scan path - iT 邦幫忙::一起幫忙解決難題，拯救 IT 人的一天

2024 iThome 鐵人賽

DAY 14

Software Development

埋藏在後端工程下的地雷與寶藏系列第 14 篇

Day-14 | Postgres Query plan - scan path

16th鐵人賽 backend developer backend

Yusinz

團隊菜很長團

2024-09-28 21:47:04

186 瀏覽

分享至

昨天我們了解了 database index 機制，今天來聊聊在資料庫中，如果想要優化查詢要怎麼做?
首先我們要先了解資料庫的 Query plan ，知道他是怎麼選擇 query 的方式的。
今天我會以常用的 Postgresql 為例，來介紹他的 query plan。

Query Plan

Query Plan 是 PostgreSQL 的 Optimizer 會自動為每一個查詢產生的執行計劃，該計劃包含了：

Scan Path：查詢 table 的方式，例如昨天的 Index 等。
Join Strategy：如果查詢有使用到很多 table，會顯示如何進行 table 的 join（如 Nested Loop、Hash Join 等）。
Sort Method：查詢結果是如何排序的。

EXPLAIN

在介紹 Scan Path 前，我們先來介紹一下 EXPLAIN，如果 SQL 指令前加上 EXPLAIN，他就告訴你現在的指令是如何 query 的會有 scan path 還有執行的時間，你就能透過這些資訊來優化，待會我們就夠過 EXPLAIN 來一一檢視不同的 scan path。

建立資料

開始前先簡單介紹一下範例的資料，首先會建立一個 users 的 table 裡面會有四筆資料。

 id |  name   | age
----+---------+-----
  1 | Alice   |  30
  2 | Bob     |  25
  3 | Charlie |  35
  4 | David   |  28

Scan Path

"Scan Path" 指的資料庫在執行查詢時，是如何在 table 中查詢我們要的資料的。PostgreSQL Query Planner會根據查詢的結構、自動選擇合適的 Scan Path，以最佳化執行的效能和速度。

1. Seq Scan (Sequential Scan)
指的是照著 table 資料的順序，遍歷所有資料一遍。
昨天我們也有提到 Seq Scan 所花費的成本高，時間複雜度會是 O(N)，
因此 Postgresql 只有在整個 table 的查詢或者 table 較小時，會使用它。

我們透過 EXPLAIN 可以看到當我們去 query users 時，因為是 query 整張 table 所以 Postgres 選擇使用了 Seq Scan。

ithome=# EXPLAIN select age from users where age=30;
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on users  (cost=0.00..1.05 rows=1 width=4)
   Filter: (age = 30)

接著我們在 age 加上 index 再來執行一次

ithome=# \d users
                                    Table "public.users"
 Column |          Type          | Collation | Nullable |              Default
--------+------------------------+-----------+----------+-----------------------------------
 id     | integer                |           | not null | nextval('users_id_seq'::regclass)
 name   | character varying(100) |           |          |
 age    | integer                |           |          |
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "idx_users_age" btree (age)

你會發現，就算有了 index 他也不一定會是使用 index，因為這張 table 的資料非常的少，如果你再去使用 index 反而要多執行一次查詢(先找 index 在找 table)，所以資料量大小也會影響 planner 的選擇。

ithome=# EXPLAIN select age from users where age=30;
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on users  (cost=0.00..1.05 rows=1 width=4)
   Filter: (age = 30)

2. Index Scan
如字面意思就是透過索引來查詢資料，當查詢的條件有索引時就容易使用 Index Scan 。
這次我們把 table 的資料量新增 10 萬筆，並重新 EXPLAIN 查詢，就會發現他改使用 Index Scan 來做查詢了。

ithome=# EXPLAIN select * from users where age=30;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Index Scan using idx_users_age on users  (cost=0.28..8.30 rows=1 width=14)
   Index Cond: (age = 30)

3. Index Only Scan
當查詢只需要 index 欄位就能滿足查詢的條件而不用再透過指標回到資料表拿取資料時，就會使用 Index Only Scan。
以 users 的 table 來說我們如果只查詢 age，他就會使用這個方法，因為他不用再去 table 拿其他資料。

ithome=# EXPLAIN select age from users where age=30;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Only Scan using idx_users_age on users  (cost=0.28..8.30 rows=1 width=4)
   Index Cond: (age = 30)

4. Bitmap Scan
Bitmap scan 算是 index scan 的進階版，他主要解決當 index 遇到多個條件時，會多次去 index page 訪問的問題。
Bit map scan 會一次把所有符合條件的索引存在 bitmap 中，最後照順序去拿到對應到 table 的指標。
他會有兩個 step:

Bitmap Index Scan
- 他用所有的查詢條件去掃描所有的索引，如果索引有符合的就會在 bitmap 上設為1 ，不符合為 0，如下面範例所示:
```
1000010001001
```
Bitmap Heap Scan
- 將所有 Bitmap Index Scan 儲存為 1 的索引取出來，並做排序，最後用這些排序好的索引，去查詢 table。

ithome=# EXPLAIN select * from users where age>=30 and name='Bob';
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Bitmap Heap Scan on users  (cost=554.40..1811.62 rows=1 width=18)
   Recheck Cond: (age >= 30)
   Filter: ((name)::text = 'Bob'::text)
   ->  Bitmap Index Scan on idx_users_age  (cost=0.00..554.40 rows=41348 width=0)
         Index Cond: (age >= 30)
(5 rows)

5. Tid Scan
當已經知道資料的 rows 多少時，就會使用此方法。

ithome=# EXPLAIN SELECT * FROM users WHERE ctid='(0,1)';
                      QUERY PLAN
------------------------------------------------------
 Tid Scan on users  (cost=0.00..4.01 rows=1 width=18)
   TID Cond: (ctid = '(0,1)'::tid)