iT邦幫忙

第 11 屆 iThome 鐵人賽

DAY 9
0
Modern Web

關於 Ruby on Rails,我想說的是系列 第 16

[Day 16] 關聯資料表加載,解決 N+1 Query

  • 分享至 

  • xImage
  •  


等下會用到的SQL示意圖

開始前先介紹什麼是Left outer joinInner join

Left outer join:

LEFT JOIN 可以用來建立左外部連接(上圖的 A + B 部分),查詢的 SQL 敘述句 LEFT JOIN 左側資料表 (table_1) 的所有記錄都會加入到查詢結果中,最終產生一張新的表。
outer join 會列出主要表格中每一筆資料,無論它的值在另一個表格中有沒有出現
=> 即使右側資料表 (table_2) 有沒有符合的值,左側資料表(table_1)每一筆還是會顯示出來。

LEFT JOIN 查詢用法 (Example)
這是一個客戶資料表 customers:

C_Id Name City Address Phone
1 張一 台北市 XX路100號 02-12345678
2 王二 新竹縣 YY路200號 03-12345678
3 李三 高雄縣 ZZ路300號 07-12345678
而這是產品訂單的資料表 orders:
O_Id Order_No C_Id
1 2572 3
2 7375 3
3 7520 1
4 1054 1
5 1257 5
現在我們想查詢所有客戶與其訂單狀況的資料,我們可以作一個 LEFT JOIN 的 SQL 查詢:
SELECT customers.Name, orders.Order_No # 挑出要顯示的欄位
FROM customers
LEFT JOIN orders                       # orders 資料表往左接上 customers 資料表
ON customers.C_Id=orders.C_Id;         # 指定兩資料表連接的條件

其中用點號連接之 XXX.YYY 表示 XXX 資料表中的 YYY 欄位。

Name Order_No
張一 7520
張一 1054
王二 null
李三 2572
李三 7375
LEFT JOIN 會返回 左側資料表中所有資料列,就算沒有符合連接條件,也就是右側資料表中沒有匹配的資料值,就會顯示為 NULL (ex. 王二)。
剛才連接完成的表,可以看到張一跟李三都有兩筆,這是因為 orders 表有兩筆資料分別對應到他們兩。
Left outer join 參考資料

Inner Join

INNER JOIN (內部連接,上圖的 B 部分)為等值連接,必需指定等值連接的條件,而查詢結果只會返回符合連接條件的資料。

這是一個客戶資料表 customers:

C_Id Name City Address Phone
1 張一 台北市 XX路100號 02-12345678
2 王二 新竹縣 YY路200號 03-12345678
3 李三 高雄縣 ZZ路300號 07-12345678

而這是產品訂單的資料表 orders:

O_Id Order_No C_Id
1 2572 3
2 7375 3
3 7520 1
4 1054 1
5 1257 5
現在我們想列出所有客戶的訂單編號資料,我們可以作一個 INNER JOINSQL 查詢:
SELECT customers.Name, orders.Order_No
FROM customers
INNER JOIN orders
ON customers.C_Id=orders.C_Id;

查詢結果如下:

Name Order_No
李三 2572
李三 7375
張一 7520
張一 1054
查詢結果只會返回符合連接條件的資料!
=> 因為在右邊資料表找不到王二的訂單,所以新的表就沒有王二了。

Inner Join 參考資料

介紹完這兩個特性之後,我們回頭來看正題


前情提要

使用前面建立的Product(商品)跟ProductStyle(商品樣式)Model

#app/models/product.rb

class Product < ApplicationRecord
  has_many :styles, class_name: 'ProductStyle' # 關聯取名為styles
end

#app/models/product_style.rb

class ProductStyle < ApplicationRecord
  belongs_to :product
end

圖示:

Table Model
products Product
product_styles ProductStyle

資料庫裡面有這些資料:

Product.count # => 5   # 共5個商品
ProductStyle.count # => 9   # 共9個商品款式

造成 N+1 Query 的迴圈:

Product.all.map do |product|
  product.styles.first.title
end
產生以下 SQL 跟回傳值:
Product Load (0.4ms)  SELECT "products".* FROM "products"
ProductStyle Load (0.2ms)  SELECT  "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" = $1 ORDER BY "product_styles"."id" ASC LIMIT $2  [["product_id", 3], ["LIMIT", 1]]
ProductStyle Load (0.2ms)  SELECT  "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" = $1 ORDER BY "product_styles"."id" ASC LIMIT $2  [["product_id", 4], ["LIMIT", 1]]
ProductStyle Load (0.1ms)  SELECT  "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" = $1 ORDER BY "product_styles"."id" ASC LIMIT $2  [["product_id", 6], ["LIMIT", 1]]
ProductStyle Load (0.1ms)  SELECT  "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" = $1 ORDER BY "product_styles"."id" ASC LIMIT $2  [["product_id", 7], ["LIMIT", 1]]
ProductStyle Load (0.1ms)  SELECT  "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" = $1 ORDER BY "product_styles"."id" ASC LIMIT $2  [["product_id", 8], ["LIMIT", 1]]
# => ["香蕉乾", "芒果乾", "商品款式2", "商品款式3", "商品款式4"] 

後面款式都亂取名,懶得想名字了

N+1 Query這個效能殺手,造成呼叫 5 + 1次SQL(第一次呼叫products資料表,每次迴圈呼叫一次product_styles資料表,共5次)

呼叫SQL的成本是巨大的,該怎麼辦?

Rails透過ORM語法把關聯資料表做預加載,如此一來迴圈內呼叫關聯時,不用再呼叫SQL從DB取資料,改從cache取資料。使用預加載,迴圈越多次程式碼效率提升就越顯著。

加載關聯資料表

Rails 提供四種加載關聯資料表的方法

  • preload
  • eager_load
  • includes
  • joins

preload

preload會生成兩條獨立的SQL查詢,所以關聯資料表會一起被 query 出來,但不會建立資料表之間的關聯

products = Product.preload(:styles).order("products.id DESC")
產生以下 SQL:
Product Load (0.4ms)  SELECT  "products".* FROM "products" ORDER BY products.id DESC LIMIT $1  [["LIMIT", 11]]
ProductStyle Load (0.2ms)  SELECT "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" IN ($1, $2, $3, $4, $5)  [["product_id", 8], ["product_id", 7], ["product_id", 6], ["product_id", 4], ["product_id", 3]]

preload會將products有關的資料表統一加載出來,但不會建立資料表之間的關聯,如果要拿preload生成的SQL語句來做關聯查詢(where)排序(order),會出錯:

Product.preload(:styles).where("product_styles.title like ?", 'test')
噴錯如下:
ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR:  missing FROM-clause entry for table "product_styles")

不過preload確實可以解決N+1 query

Product.preload(:styles).map do |product|
  product.styles.first.title
end
只產生以下兩條 SQL 跟回傳值:
Product Load (0.4ms)  SELECT "products".* FROM "products"
ProductStyle Load (0.4ms)  SELECT "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" IN ($1, $2, $3, $4, $5)  [["product_id", 3], ["product_id", 4], ["product_id", 6], ["product_id", 7], ["product_id", 8]]
# => ["香蕉乾", "芒果乾", "商品款式2", "商品款式3", "商品款式4"] 

eager_load

eager_load 使用LEFT OUTER JOIN進行單次查詢,並加載所有關聯數據(這樣資料表之間就有關聯了)

products = Product.eager_load(:styles).order("products.id DESC")
產生以下 SQL:
SQL (0.6ms)  SELECT  DISTINCT products.id AS alias_0, "products"."id" FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" ORDER BY products.id DESC LIMIT $1  [["LIMIT", 11]]
SQL (0.6ms)  SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "product_styles"."id" AS t1_r0, "product_styles"."title" AS t1_r1, "product_styles"."price" AS t1_r2, "product_styles"."sku" AS t1_r3, "product_styles"."product_id" AS t1_r4, "product_styles"."created_at" AS t1_r5, "product_styles"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE "products"."id" IN ($1, $2, $3, $4, $5) ORDER BY products.id DESC  [["id", 8], ["id", 7], ["id", 6], ["id", 4], ["id", 3]]

可以看到為了加載關聯,eager_load用了更複雜的SQL語法,雖然速度比preload慢,但是解決了preload不能關聯查詢的問題

Product.eager_load(:styles).where("product_styles.title like ?", 'test')
產生以下 SQL:
SQL (0.5ms)  SELECT  DISTINCT "products"."id" FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE (product_styles.title like 'test') LIMIT $1  [["LIMIT", 11]]
SQL (0.4ms)  SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "product_styles"."id" AS t1_r0, "product_styles"."title" AS t1_r1, "product_styles"."price" AS t1_r2, "product_styles"."sku" AS t1_r3, "product_styles"."product_id" AS t1_r4, "product_styles"."created_at" AS t1_r5, "product_styles"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE (product_styles.title like 'test') AND "products"."id" = $1  [["id", 4]]

eager_load 同時也可以解決N+1 query

Product.eager_load(:styles).map do |product|
  product.styles.first.try(:title)
end
產生以下 SQL:
  SQL (9.4ms)  SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "products"."order_id" AS t0_r6, "product_styles"."id" AS t1_r0, "product_styles"."title" AS t1_r1, "product_styles"."price" AS t1_r2, "product_styles"."sku" AS t1_r3, "product_styles"."product_id" AS t1_r4, "product_styles"."created_at" AS t1_r5, "product_styles"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id"

從SQL語法來看精準搜尋

先看沒有限制條件,單純做eager_load
Product.eager_load(:options)
SQL (0.7ms)  SELECT  DISTINCT "products"."id" FROM "products" LEFT OUTER JOIN "options" ON "options"."product_id" = "products"."id" LIMIT $1  [["LIMIT", 11]]
SQL (0.5ms)  SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "products"."store_id" AS t0_r6, "options"."id" AS t1_r0, "options"."product_id" AS t1_r1, "options"."title" AS t1_r2, "options"."option1" AS t1_r3, "options"."option2" AS t1_r4, "options"."created_at" AS t1_r5, "options"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "options" ON "options"."product_id" = "products"."id" WHERE "products"."id" IN ($1, $2, $3, $4, $5)  [["id", 4], ["id", 2], ["id", 7], ["id", 8], ["id", 3]]

第二條SQL會把所有product的id都列出來["id", 4], ["id", 2], ["id", 7], ["id", 8], ["id", 3]

再看有限制條件eager_load
Product.eager_load(:options).where(options: {title: '尺寸'})
SQL (42.9ms)  SELECT  DISTINCT "products"."id" FROM "products" LEFT OUTER JOIN "options" ON "options"."product_id" = "products"."id" WHERE "options"."title" = $1 LIMIT $2  [["title", "尺寸"], ["LIMIT", 11]]
SQL (0.6ms)  SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "products"."store_id" AS t0_r6, "options"."id" AS t1_r0, "options"."product_id" AS t1_r1, "options"."title" AS t1_r2, "options"."option1" AS t1_r3, "options"."option2" AS t1_r4, "options"."created_at" AS t1_r5, "options"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "options" ON "options"."product_id" = "products"."id" WHERE "options"."title" = $1 AND "products"."id" IN ($2, $3)  [["title", "尺寸"], ["id", 2], ["id", 3]]

可以看到第一條 SQL 有加上條件["title", "尺寸"],篩選出符合的 products (id: 2 跟 3)
所以第二條 SQL 在查詢時,只有搜尋符合條件的["id", 2], ["id", 3]
Rails 的 ORM語法雖然方便,但是如果能了解一些SQL,就能更容易找到適當的方法。


每次都要想何時用preload?何時用eager_load? 很麻煩

所以Rails 還有includes這個大家最常用的方法

ps. eager_load在Rails 4是產生一條SQL,這裡用Rails 5 示範


includes

includes預設效果是preload,需要做關聯資料表時又會轉成eager_load

預設效果同preload

Product.includes(:styles)
產生以下 SQL:
Product Load (0.3ms)  SELECT  "products".* FROM "products" LIMIT $1  [["LIMIT", 11]]
ProductStyle Load (0.4ms)  SELECT "product_styles".* FROM "product_styles" WHERE "product_styles"."product_id" IN ($1, $2, $3, $4, $5)  [["product_id", 3], ["product_id", 4], ["product_id", 6], ["product_id", 7], ["product_id", 8]]

精準查詢關聯資料表,效果同eager_load

Product.includes(:styles).where(product_styles: {title: 'test'})
產生以下 SQL:
SQL (0.6ms)  SELECT  DISTINCT "products"."id" FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE "product_styles"."title" = $1 LIMIT $2  [["title", "test"], ["LIMIT", 11]]
SQL (0.4ms)  SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "product_styles"."id" AS t1_r0, "product_styles"."title" AS t1_r1, "product_styles"."price" AS t1_r2, "product_styles"."sku" AS t1_r3, "product_styles"."product_id" AS t1_r4, "product_styles"."created_at" AS t1_r5, "product_styles"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE "product_styles"."title" = $1 AND "products"."id" = $2  [["title", "test"], ["id", 4]]
# => #<ActiveRecord::Relation [#<Product id: 4, title: "商品二", price: 1985.0, description: "要買要快", created_at: "2019-09-30 14:38:42", updated_at: "2019-09-30 14:38:42">]> 

如果是模糊搜尋,要搭配references才能達到eager_load的效果

products = Product.includes(:styles).where("product_styles.title like ?", 'test').references(:styles)
產生以下 SQL 跟回傳值:
SQL (0.5ms)  SELECT  DISTINCT "products"."id" FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE (product_styles.title like 'test') LIMIT $1  [["LIMIT", 11]]
SQL (0.5ms)  SELECT "products"."id" AS t0_r0, "products"."title" AS t0_r1, "products"."price" AS t0_r2, "products"."description" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "product_styles"."id" AS t1_r0, "product_styles"."title" AS t1_r1, "product_styles"."price" AS t1_r2, "product_styles"."sku" AS t1_r3, "product_styles"."product_id" AS t1_r4, "product_styles"."created_at" AS t1_r5, "product_styles"."updated_at" AS t1_r6 FROM "products" LEFT OUTER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE (product_styles.title like 'test') AND "products"."id" = $1  [["id", 4]]
# => #<ActiveRecord::Relation [#<Product id: 4, title: "商品二", price: 1985.0, description: "要買要快", created_at: "2019-09-30 14:38:42", updated_at: "2019-09-30 14:38:42">]> 

ps. 模糊搜尋的includes不搭配references會噴錯,各位看倌可以拿掉試試

所以includes既可以解決N+1 query的問題,搭配reference又可以建立關聯查詢,提到N+1 query推薦用includes原因就在這裡。


joins

joins主要用於建立model之間的關係,用INNER JOIN來關聯數據, “不會”將關聯的資料做預加載,所以對N+1 query並無幫助,常用來建立關聯搜尋,
以上面的模糊搜尋為例:

Product.joins(:styles).where("product_styles.title like ?", 'test')
產生以下 SQL 跟回傳值:
Product Load (0.6ms)  SELECT  "products".* FROM "products" INNER JOIN "product_styles" ON "product_styles"."product_id" = "products"."id" WHERE (product_styles.title like 'test') LIMIT $1  [["LIMIT", 11]]
# => #<ActiveRecord::Relation [#<Product id: 4, title: "商品二", price: 1985.0, description: "要買要快", created_at: "2019-09-30 14:38:42", updated_at: "2019-09-30 14:38:42">]> 

只產生一條SQL。所以joins可以用來建立資料表關聯

但如果想要joins處理我們今天第一個問題,會更慘。因為可能產生重複資料,SQL查詢次數比沒加joins還多 w(゚Д゚)w

原因如下:

Product.joins(:styles).size #=> 9
Product.joins(:styles).uniq.size  #=> 5

原來Product.joins(:styles)實際上是:

  • 查詢全部含有product_id的product_style,並回傳該product_style所屬的product
  • 如果有很多筆product_style屬於同一個product,就會回傳大量相同的product,要用.uniq來刪除重複的項目
  • 一對一關係不會發生重複的情況
反過來就沒問題,因為 ProductStyle 對 Product 是多對一:
ProductStyle.joins(:product).size # => 9 
ProductStyle.joins(:product).uniq.size # => 9 

結論:

Rails提供關聯資料表加載,幫助我們解決N+1 query這個隱形效能殺手,也可以建立資料表的關聯。
四種該如何使用,以下是簡單的判斷

  1. 只需要解決 N+1 query,用 preload 預加載
  2. 只需要精準或模糊搜尋,使用joins 建立關聯
  3. 需要建立關聯,又需要預加載,使用eager_load
  4. 新手不知道該怎麼用,用includes 似乎最方便

上面只是概略的描述,實際執行會因為資料表的行數及列數不同,導致效能也不同,還是要依照情況而定。


上一篇
[Day 15] 效能殺手 N+1 Query
下一篇
[Day 17] 深入了解includes 原理
系列文
關於 Ruby on Rails,我想說的是23
圖片
  直播研討會
圖片
{{ item.channelVendor }} {{ item.webinarstarted }} |
{{ formatDate(item.duration) }}
直播中

尚未有邦友留言

立即登入留言