Postgresql Sequence 與 Surrogate_key

第 11 屆 iThome 鐵人賽

DAY 25

Software Development

以Postgresql為主,聊聊資料庫.系列第 25 篇

11th鐵人賽 postgresql rdbms sql

一級屠豬士

2019-10-10 21:02:47

4025 瀏覽

分享至

Postgresql Sequence 與 Surrogate_key

何謂 Surrogate key, 有人翻譯為代理鍵.
是資料庫系統自己內部使用的,資料天然的鍵值叫做 Nature Key,
雖然大家都知道天然的尚好,但是不見得會有天然又能夠擔當的起
相關要求的鍵值.例如身分證字號,這並不能在我們的資料庫系統中保證
唯一性,而且是依賴外部,若有人提議要用身份證字號,學生證號等等,
而且還很堅持,那就恭喜你可能會有機會在歷史上留名了.因為好的系統,
上善若水,幾乎忘了他的存在.而重大的災難,會在歷史上留下濃烈的一筆.

關於 Surrogate_key 也可以參考以下資料:

https://en.wikipedia.org/wiki/Surrogate_key

怎樣產生 Surrogate key 呢? 以前有一些錯誤但是卻被一些人
很堅持的使用的方式.select max(id) + 1, 這是一派;還有一派是
進化一點了,發覺隨著資料量增加,計算 max() 會變得很慢,就有提出了,
建立一個專用table,裡面存放一些欄位,對應其他table當前的極大值,
取用後,再update,維護起來蠻麻煩的.以上兩派,屬於較為古老的門派.
近來興起的是UUID派.

本系列並不針對效能議題做探討,上面一些方法的優劣,也需要很長的篇幅,
故在此僅作簡單的介紹而已.

接著我們來看 sequence 與 table 的一些互動.

使用昨天建立的 sequence

create table nana.t1010 (
  gs  smallint not null
, sq3 smallint not null default nextval('nana.s3')
, sq4 smallint not null default nextval('nana.s4')
);

insert into nana.t1010 (gs)
select n
  from generate_series(1, 6) as gs(n);

select *
  from nana.t1010;

+----+-----+-----+
| gs | sq3 | sq4 |
+----+-----+-----+
|  1 |   7 | -19 |
|  2 |   1 | -20 |
|  3 |   3 | -21 |
|  4 |   5 | -22 |
|  5 |   7 | -23 |
|  6 |   1 | -24 |
+----+-----+-----+
(6 rows)

可以看到同時呼叫了兩個 sequence, 當然 sq3 是會重複的.
再來看另一個table

create table nana.t1010a (
  gs  smallint not null
, sq4 smallint not null default nextval('nana.s4')
, sq1 smallint not null default nextval('nana.s1')
);

insert into nana.t1010a (gs)
select n
  from generate_series(1, 6) as gs(n);
  
select *
  from nana.t1010a; 

+----+-----+-----+
| gs | sq4 | sq1 |
+----+-----+-----+
|  1 | -25 |  11 |
|  2 | -26 |  12 |
|  3 | -27 |  13 |
|  4 | -28 |  14 |
|  5 | -29 |  15 |
|  6 | -30 |  16 |
+----+-----+-----+
(6 rows)

可以看到 nana.sq4 這個 sequence 被這兩個 table 使用,
不會有任何問題.
之前在介紹 sequence 時,就能理解到 sequence 本質上是跟 table一樣的.
這時候就能理解到剛才提到維護一個table那派的思維,與 ANSI SQL 標準是
類似的,但是 sequence 將其封裝的更方便,可以使用 nextval() 來取值.

接著討論一個table 的主鍵,需要的條件,最主要的就是要具備唯一性.
最間單的方式就是取用系統產生的序列,不見得是遞增的喔.
因為太常使用了,加上要建立主鍵限制時,還要先產生具備唯一性的index,所以
語法上會有快速的方式,把建立sequence, 建立 unique index, 設定PK限制,
協助完成的語法.以前版本的 Postgresql / Oracle要自己先建立 sequence,
指定欄位的 default value nextval('someseq'), 而 primary key
協助建立 index 與設定 constraint.

當我們使用一些工具來觀察時,顯示時不見得會出現 primary key 在欄位之後,
而是會有 constraint 單獨列出,這是比較傳統的設定方式.

在 Postgresql 10版前,有一個serial 的方式,會協助我們建立 sequence,
很方便的建立起來,與 MySQL auto_incerment 都是方便的語法.

ANSI SQL 2003 制定了 identity column
https://sigmodrecord.org/publications/sigmodRecord/0403/E.JimAndrew-standard.pdf

在 Postgresql 10 版時,就實作了此一標準,跟 Oracle 12c 都是有支持此標準的.
之後的版本也都有支持.

接著來看實際的例子.

create table nana.t1010b (
  id integer not null generated by default as identity (increment by 2)
     primary key
, val integer not null
);

使用 psql 的 \d 來觀察

\d nana.t1010b*
                             Table "nana.t1010b"
+--------+---------+-----------+----------+----------------------------------+
| Column |  Type   | Collation | Nullable |             Default              |
+--------+---------+-----------+----------+----------------------------------+
| id     | integer |           | not null | generated by default as identity |
| val    | integer |           | not null |                                  |
+--------+---------+-----------+----------+----------------------------------+
Indexes:
    "t1010b_pkey" PRIMARY KEY, btree (id)

                     Sequence "nana.t1010b_id_seq"
+---------+-------+---------+------------+-----------+---------+-------+
|  Type   | Start | Minimum |  Maximum   | Increment | Cycles? | Cache |
+---------+-------+---------+------------+-----------+---------+-------+
| integer |     1 |       1 | 2147483647 |         2 | no      |     1 |
+---------+-------+---------+------------+-----------+---------+-------+
Sequence for identity column: nana.t1010b.id

        Index "nana.t1010b_pkey"
+--------+---------+------+------------+
| Column |  Type   | Key? | Definition |
+--------+---------+------+------------+
| id     | integer | yes  | id         |
+--------+---------+------+------------+
primary key, btree, for table "nana.t1010b"

可以發現 sequence 以及 constraint primary key 都建立好了.
而且命名也是會採用前綴方式,方便理解.

輸入資料觀察一下.

insert into nana.t1010b (val)
select n
  from generate_series(1, 12, 3) gs(n);

select *
  from nana.t1010b;

+----+-----+
| id | val |
+----+-----+
|  1 |   1 |
|  3 |   4 |
|  5 |   7 |
|  7 |  10 |
+----+-----+
(4 rows)

這時候我們insert 一筆指定 id 的資料來看看.

insert into nana.t1010b values
(-1, 99);

沒有出現錯誤,查看一下.

select *
  from nana.t1010b;
+----+-----+
| id | val |
+----+-----+
|  1 |   1 |
|  3 |   4 |
|  5 |   7 |
|  7 |  10 |
| -1 |  99 |
+----+-----+

不用訝異,create table 時 id 欄位是 generated by default.
是指我們未指定值時,使用 default , default 是用 identity 指定的
sequence.

當有設定 default 值的欄位,可以使用 default 來當作輸入表示法.

insert into nana.t1010b values
(default, 88);

可以正常輸入.當一個table 欄位數量多的時候,而部分欄位是有default 的,
例如使用 sequence 或是 clock_timestamp() 這類方式,這時候無需在
table 之後列舉欄位名稱,只需要使用 default 在輸入時表示.可以減少
操作也避免了列舉時輸入錯誤.

接著再來看另一種方式的宣告. generated always

create table nana.t1010c (
  id integer not null generated always as identity (increment by 2)
     primary key
, val integer not null
);

insert into nana.t1010c (val)
select n
  from generate_series(1, 12, 3) gs(n);

select *
  from nana.t1010c;
+----+-----+
| id | val |
+----+-----+
|  1 |   1 |
|  3 |   4 |
|  5 |   7 |
|  7 |  10 |
+----+-----+
(4 rows)

一樣輸入一筆定指定id 的資料來看看.

insert into nana.t1010c values
(-1, 99);
ERROR:  428C9: cannot insert into column "id"
DETAIL:  Column "id" is an identity column defined as GENERATED ALWAYS.

出現錯誤訊息了. 也說明了理由.

這樣就可以避免輸入值,以免影響了唯一性.

But....

insert into nana.t1010c
OVERRIDING SYSTEM VALUE
select nextval('nana.s4'), n 
  from generate_series(1,3) gs(n);

輸入進去了,還使用了另外的 sequence 產生的值.

select *
  from nana.t1010c;

+-----+-----+
| id  | val |
+-----+-----+
|   1 |   1 |
|   3 |   4 |
|   5 |   7 |
|   7 |  10 |
| -31 |   1 |
| -32 |   2 |
| -33 |   3 |
+-----+-----+
(7 rows)

只要明確宣告,也是可以指定輸入的.當然為了避免影響唯一性,最好能夠使用一個反向的sequence.
這是符合 ANSI SQL 的定義,也把控制的權利保留在開發者手中,增加了彈性.

舉例來說,當開發一個系統在上線之前,需要輸入測試資料,利用此種功能,可以方便大量輸入測試
資料,後續可以很方便的移除.這就是之前說的妙用.

另外 sequence 的 default range, 都是整數的一個方向(正或負),當sequence 達到上限時,
也可以修改 sequence 的 maxvalue, maxvalue, strat value, 再restart,就可以讓
系統繼續執行,也方便找到變更起始點.

以上這些特性,是ANSI SQL 標準,也是如 Oracle / Postgresql 等先進的資料庫提供給大家
方便又有威力特徵.希望大家理解後,能在後續開發系統時,如虎添翼!