iT邦幫忙

2021 iThome 鐵人賽

DAY 19
0

19 - Activerecord-import

在寫入資料的時候,時常會有大量資料一次 import 的情況,可能是匯入整個 csv 檔案到資料表、或者是接收 api 將資料匯入。可能會使用 transaction 的方式來寫入確保資料都有正常匯入。

舉例來說,一次想入一筆資料:

  2.times.each do |time|
    Post.create(
      title: "Post test #{time}",
      content: "mock content #{time}",
      is_available: true,
      user_id: 1
    )
  end

  TRANSACTION (0.2ms)  BEGIN
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
  Post Create (0.3ms)  INSERT INTO "posts" ("title", "content", "is_available", "user_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id"  [["title", "Post test 0"], ["content", "mock content 0"], ["is_available", true], ["user_id", 1], ["created_at", "2021-10-02 08:40:31.090718"], ["updated_at", "2021-10-02 08:40:31.090718"]]
  TRANSACTION (0.6ms)  COMMIT

  TRANSACTION (0.1ms)  BEGIN
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
  Post Create (0.3ms)  INSERT INTO "posts" ("title", "content", "is_available", "user_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id"  [["title", "Post test 1"], ["content", "mock content 1"], ["is_available", true], ["user_id", 1], ["created_at", "2021-10-02 08:40:31.093480"], ["updated_at", "2021-10-02 08:40:31.093480"]]
  TRANSACTION (0.2ms)  COMMIT

這樣子的寫法會得到每一筆都會是一個 transaction,所以可能改成用一個 transaction 來包,只會有一次 transaction

  Post.transaction do
    2.times.each do |time|
      Post.create(
        title: "Post test #{time}",
        content: "mock content #{time}",
        is_available: true,
        user_id: 1
      )
    end
  end

  TRANSACTION (0.1ms)  BEGIN
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
  Post Create (0.3ms)  INSERT INTO "posts" ("title", "content", "is_available", "user_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id"  [["title", "Post test 0"], ["content", "mock content 0"], ["is_available", true], ["user_id", 1], ["created_at", "2021-10-02 08:44:53.350348"], ["updated_at", "2021-10-02 08:44:53.350348"]]
  User Load (0.1ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
  Post Create (0.3ms)  INSERT INTO "posts" ("title", "content", "is_available", "user_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id"  [["title", "Post test 1"], ["content", "mock content 1"], ["is_available", true], ["user_id", 1], ["created_at", "2021-10-02 08:44:53.352100"], ["updated_at", "2021-10-02 08:44:53.352100"]]
  TRANSACTION (0.6ms)  COMMIT

不過如果資料大量的時候,SQL 就會一直 n+1 的 INSERT,也可能因為 transaction 把資料表鎖住造成其他人要使用會需要排隊的情況。

所以這篇要來介紹 Activerecord-Import gem:

gem install

  # Gemfile
  gem "activerecord-import"

Example

  values = 
    2.times.each_with_object([]) do |time, array|
      array << {
        title: "Post test #{time}",
        content: "mock content #{time}",
        is_available: true,
        user_id: 1
      }
    end
  
  # values => [
    # {:title=>"Post test 0", :content=>"mock content 0", :is_available=>true, :user_id=>1},
    # {:title=>"Post test 1", :content=>"mock content 1", :is_available=>true, :user_id=>1}
  # ]

  Post.import values, validate: true

  Post Create Many (3.4ms)  INSERT INTO "posts" ("title","content","is_available","user_id","created_at","updated_at") VALUES ('Post test 0','mock content 0',TRUE,1,'2021-10-02 08:57:48.839905','2021-10-02 08:57:48.839905'),('Post test 1','mock content 1',TRUE,1,'2021-10-02 08:57:48.839905','2021-10-02 08:57:48.839905') RETURNING "id"
  => #<struct ActiveRecord::Import::Result failed_instances=[], num_inserts=1, ids=[19, 20], results=[]>

會發現只會有一筆 INSERT 且可以指定參數 validate 來看是否需要驗證,不指定則 default 為 true

註: 不需驗證的資料在 import 時速度約快有驗證的 5 倍左右

再者一個滿重要的問題,當如果輸入的資料其實其他欄位都相同,或者是差不多,但需要一個新的寫入時間來記錄是新的一筆資料的這種情況,或者是想覆蓋原本的資料,正常的情況可能會做 delete + create,不過這個套件有提供 Duplicate Key Update 來讓 MySQL 做 ON DUPLICATE KEY UPDATE,或者是 PG 做 ON CONFLICT DO UPDATE,相關可參考各資料庫 Duplicate Key Update 的寫法。

註: 須建立 uniq index 將重複資料改為 upsert

參考來源

My blog


上一篇
冒險村18 - Config
下一篇
冒險村20 - Design Pattern(1) - Decorator
系列文
冒險村-30 Day Ruby on Rails Tips Challenge30
圖片
  直播研討會
圖片
{{ item.channelVendor }} {{ item.webinarstarted }} |
{{ formatDate(item.duration) }}
直播中

尚未有邦友留言

立即登入留言