了解 Optimizer 原理跟 EXPLAIN 語法能幫助定位效能問題,但發現問題後要怎麼優化,就要有踩坑經驗了,在這分享 5 個 Query 優化案例!
優化前SELECT * FROM t WHERE (user_id = 1 ) OR (receive_id = 1)"
優化後SELECT * FROM t WHERE user_id = 1 UNION ALL SELECT * FROM t WHERE receive_id = 1
優化前查詢會用 (user_id) & (receive_id) Index 並觸發 Index Merge 消除重複資料,但其實兩個 OR 查詢並沒有太多交集資料,如果觸發 Index Merge 反而會多花 CPU 執行 Merge,因此可改用 UNION ALL 避免觸發 Index Merge。
優化前SELECT * FROM orders WHERE user_id = 1 AND status IN (1,3,5) AND created_at > ?
SELECT * FROM t WHERE (user_id, receive_id) IN ((1, 2), (3, 4), (5, 6))
優化後SELECT * FROM orders WHERE user_id = 1 AND status = 2 AND created_at > ? UNION ALL SELECT * FROM orders WHERE user_id = 1 AND status = 3 AND created_at > ? ...
SELECT * FROM t WHERE user_id = 1 AND receive_id = 2 UNION ALL SELECT * FROM t WHERE user_id = 3 AND receive_id = 4 ...
優化前的 IN 條件,理論上要多個 = 比較,但發現變成 range 查詢,原因是 Optimizer 發現 IN 條件有連續性,用 Range 查詢可能比多個 = 重複查詢快,但實務上 range 查詢可能會 scan 到很多不相關的資料,因此可改用 UNION ALL 避免變成 range 查詢。
優化前SELECT * FROM t WHERE user_id = 1 LIMIT 100 OFFSET 9999999999
優化後SELECT * FROM t JOIN (SELECT id FROM t WHERE user_id = 1 LIMIT 100 OFFSET 9999999999) sub_query ON o.id = sub_query.id WHERE user_id = 1
優化前,因為有 SELECT *
MySQL OFFSET 邏輯會變成從 secondary index 找到資料後,要回 clustered index 拿到完整資料,才跳過該筆資料,因此 OFFSET 量大就造成回 clustered index 量大,改成 SELECT id
就不需要回 clustered index 就能跳過資料。
優化前SELECT * FROM t WHERE uid = 123 ORDER BY id ASC LIMIT 100
優化後SELECT * FROM t force index(idx_uid) WHERE uid = 123 ORDER BY id ASC LIMIT 100
優化前,即便 uid 有 index,MySQL 仍選擇用 primary key 來查詢,原因在於優化器發現要先排序後取前 100 筆紀錄,如果直接用 primary key 就不用排序,可以直接取前 100 筆,應該會比較快,但如果表很大且 uid 基數很大時,使用 uid index 效能會更好,
優化前
SELECT COUNT(1), status FROM t WHERE uid = 1 GROUP BY status LIMIT 1000
優化後
SELECT COUNT(1), status FROM ( SELECT status FROM t WHERE uid = 1 LIMIT 1000) GROUP BY status
由於 LIMIT 執行順序在 GROUP BY 後,優化前寫法會 GROUP BY 所有資料,但其實需求只要 GROUP BY 前 1000 筆就好,執行順序參考 https://medium.com/@cindy20303705/sql%E6%9F%A5%E8%A9%A2%E5%9F%B7%E8%A1%8C%E9%A0%86%E5%BA%8F-4d62584d372d。
設計 Table Schema Index 時建議先從 Composite Index 著手,例如用 (user_id, status) 取代 (user_id),雖然 Composite Index 資料較多,但其實 Index Key 只多一個欄位可說是毫無差別,然而多一個欄位不僅能 cover 更多情境的查詢,還可觸發 ICP 優化降低回 Clustered Index 的 I/O 次數。