iT邦幫忙

第 11 屆 iThome 鐵人賽

DAY 7
1
Software Development

以Postgresql為主,聊聊資料庫.系列 第 7

Postgresql 的資料型態

Postgresql 的資料型態 

與其他資料庫產品不同,Postgresql 提供了許多資料型態.
使用者也能自定義新的資料型態.

那 Postgresql 提供了哪些資料型態呢?

經過前幾天關於 pg_catalog 的介紹,相信對此問題應該會有自己的想法浮現.
按照 Postgresql 的設計哲學,應該是查詢 pg_type, 而且 pg_type 在
三探 Postgresql Catalog 介紹 oid 時就已經上場開始領通告費了.
我們來看一下他的文件.

https://www.postgresql.org/docs/11/catalog-pg-type.html

試試看以下 SQL Command

select count(*)
  from pg_catalog.pg_type
 where typnamespace::regnamespace::text = 'pg_catalog' 
   and typelem = 0
   and typrelid = 0
   and typname !~ '(^_|^pg_|^reg|_handler$)';

+-------+
| count |
+-------+
|    66 |
+-------+

這只是一種計算方式,列出較為基本的型態. 上面三個 and 的條件,可以視情況調整.
會出現更多,一些像是 point, 或是系統內部使用的,會出現.

在 psql 中用 \dTS 會列出很多.包含一些使用者後續安裝extension 內含的,或是
使用者另行定義,或是 information_schema 所需而定義的.

那我們來把上面的查詢改寫一下,利用string_agg()來聚合,
另外設定psql將header, footer, border 不顯示.

\pset border 0
Border style is 0.
\t
Tuples only is on.

select string_agg(typname, ' ' order by typname) as type_name
  from pg_catalog.pg_type
 where typnamespace::regnamespace::text = 'pg_catalog' 
   and typelem = 0
   and typrelid = 0
   and typname !~ '(^_|^pg_|^reg|_handler$)';
   
abstime aclitem any anyarray anyelement anyenum anynonarray anyrange bit bool 
bpchar bytea char cid cidr circle cstring date daterange event_trigger 
float4 float8 gtsvector inet int2 int4 int4range int8 int8range internal 
interval json jsonb macaddr macaddr8 money numeric numrange oid opaque 
path polygon record refcursor reltime smgr text tid time timestamp 
timestamptz timetz tinterval trigger tsquery tsrange tstzrange 
tsvector txid_snapshot unknown uuid varbit varchar void xid xml

Boolean 型態介紹

之前在三探 Postgresql Catalog 中,我們有利用 pg_type join pg_description
查詢 oid 的描述.
我們來查看一下 bool 

select pg_catalog.format_type(oid, typtypmod) 
     , pg_catalog.obj_description(oid, 'pg_type') as "description"
  from pg_catalog.pg_type
 where typnamespace::regnamespace::text = 'pg_catalog' 
   and typname = 'bool';
+-------------+-------------------------+
| format_type |       description       |
+-------------+-------------------------+
| boolean     | boolean, 'true'/'false' |
+-------------+-------------------------+

這次利用 pg_catalog.obj_description() 來取得描述.

Boolean 型態在資料庫中的比較,除了 true / false 以外,還要注意與 null的比較.

select a::text
     , b::text
     , (a=b)::text as "a=b"
     , format('%5s = %5s',
              coalesce(a::text, 'null'),
              coalesce(b::text, 'null')) as op
     , format('is %s',
              coalesce((a=b)::text, 'null')) as result
  from (values(true), (false), (null)) v1(a)
 cross join
       (values(true), (false), (null)) v2(b);
+-------+-------+-------+---------------+----------+
|   a   |   b   |  a=b  |      op       |  result  |
+-------+-------+-------+---------------+----------+
| true  | true  | true  |  true =  true | is true  |
| true  | false | false |  true = false | is false |
| true  | ¤     | ¤     |  true =  null | is null  |
| false | true  | false | false =  true | is false |
| false | false | true  | false = false | is true  |
| false | ¤     | ¤     | false =  null | is null  |
| ¤     | true  | ¤     |  null =  true | is null  |
| ¤     | false | ¤     |  null = false | is null  |
| ¤     | ¤     | ¤     |  null =  null | is null  |
+-------+-------+-------+---------------+----------+
(9 rows)

當我們使用 Boolean 時,可以使用 Partial Index
來看以下的例子

create table ithelp190922 (
  id integer not null primary key
, error boolean
);

insert into ithelp190922
select generate_series(1, 1000000)
     , case when random() < 0.02 
         then true
         else false
       end;

commit;

未建立 index 時的查詢計畫

explain
select count(*)
  from ithelp190922
 where error = true;

+---------------------------------------------------------------------------------------------+
|                                         QUERY PLAN                                          |
+---------------------------------------------------------------------------------------------+
| Finalize Aggregate  (cost=9612.30..9612.31 rows=1 width=8)                                  |
|   ->  Gather  (cost=9612.08..9612.29 rows=2 width=8)                                        |
|         Workers Planned: 2                                                                  |
|         ->  Partial Aggregate  (cost=8612.08..8612.09 rows=1 width=8)                       |
|               ->  Parallel Seq Scan on ithelp190922  (cost=0.00..8591.67 rows=8167 width=0) |
|                     Filter: error                                                           |
+---------------------------------------------------------------------------------------------+

注意到是 Seq Scan, 另外可以注意到特別的 Parallel 與 Workers Planned: 2
因為 Postgresql 11 是可以平行處理的.

建立 Partial Index

create index prtidx_ithelp190922
    on ithelp190922 (error)
 where error = true;

commit;

explain
select count(*)
  from ithelp190922
 where error = true;

+----------------------------------------------------------------------------------------------------------+
|                                                QUERY PLAN                                                |
+----------------------------------------------------------------------------------------------------------+
| Aggregate  (cost=2065.15..2065.16 rows=1 width=8)                                                        |
|   ->  Index Only Scan using prtidx_ithelp190922 on ithelp190922  (cost=0.29..2016.15 rows=19600 width=0) |
+----------------------------------------------------------------------------------------------------------+

使用了 index. 那要查 false 的情況呢?

explain
select count(*)
  from ithelp190922
 where error = false;
+-----------------------------------------------------------------------------------------------+
|                                          QUERY PLAN                                           |
+-----------------------------------------------------------------------------------------------+
| Finalize Aggregate  (cost=10613.13..10613.14 rows=1 width=8)                                  |
|   ->  Gather  (cost=10612.92..10613.13 rows=2 width=8)                                        |
|         Workers Planned: 2                                                                    |
|         ->  Partial Aggregate  (cost=9612.92..9612.93 rows=1 width=8)                         |
|               ->  Parallel Seq Scan on ithelp190922  (cost=0.00..8591.67 rows=408500 width=0) |
|                     Filter: (NOT error)                                                       |
+-----------------------------------------------------------------------------------------------+

還是使用 Seq Scan.

當兩種一起計算時
explain
select count(*) filter(where error = true) as true_cnt
     , count(*) filter(where error = false) as false_cnt
  from ithelp190922;
+-----------------------------------------------------------------------------------------------+
|                                          QUERY PLAN                                           |
+-----------------------------------------------------------------------------------------------+
| Finalize Aggregate  (cost=11675.22..11675.23 rows=1 width=16)                                 |
|   ->  Gather  (cost=11675.00..11675.21 rows=2 width=16)                                       |
|         Workers Planned: 2                                                                    |
|         ->  Partial Aggregate  (cost=10675.00..10675.01 rows=1 width=16)                      |
|               ->  Parallel Seq Scan on ithelp190922  (cost=0.00..8591.67 rows=416667 width=1) |
+-----------------------------------------------------------------------------------------------+

還是seq scan.由上面實測結果可知Partial Index的效用.
來看實際執行查出的值

select count(*) filter(where error = true) as true_cnt
     , count(*) filter(where error = false) as false_cnt
  from ithelp190922;

+----------+-----------+
| true_cnt | false_cnt |
+----------+-----------+
|    20149 |    979851 |
+----------+-----------+

因為兩者資料量比例懸殊,利用 Postgresql 直接計算

select 20149::float / 979851::float;
+--------------------+
|      ?column?      |
+--------------------+
| 0.0205633305471954 |
+--------------------+

如 ithelp190922 是用來紀錄log, 平時主要查詢 error = true 的資料,
false 的是備查.就可以建立 Partial Index. 因為 error = true 的資料量小,
建立的index 也較小.

來建立一般的index,兩種情況都包含.

create index fulidx_ithelp190922
    on ithelp190922 (error);

來看兩者的磁碟空間差異比較

with t1 as (
select pg_relation_size('prtidx_ithelp190922') as prt
     , pg_relation_size('fulidx_ithelp190922') as ful
)
select pg_size_pretty(prt) as part_size
     , pg_size_pretty(ful) as full_sizr
     , round(prt::numeric / ful::numeric, 2) as "%"
  from t1;

+-----------+-----------+------+
| part_size | full_sizr |  %   |
+-----------+-----------+------+
| 464 kB    | 21 MB     | 0.02 |
+-----------+-----------+------+

類似資料容量大小的比例.較小的 index 佔用的記憶體較小,有助於資料庫優化查詢.

Postgresql 的 boolean 型態在輸入時,可以有很靈活的方式.

select TRUE::Boolean, 't'::Boolean, 'yes'::Boolean;

+------+------+------+
| bool | bool | bool |
+------+------+------+
| t    | t    | t    |
+------+------+------+

我們先在 public schema 中 建立一個 is_boolean 的函數

CREATE OR REPLACE FUNCTION public.is_boolean(s VARCHAR) RETURNS BOOLEAN AS $$
BEGIN
	PERFORM s::BOOLEAN;
	RETURN TRUE;
EXCEPTION WHEN others THEN
	RETURN FALSE;
END;
$$
STRICT
LANGUAGE plpgsql IMMUTABLE
;

COMMENT ON FUNCTION public.is_boolean(s VARCHAR) IS 'Checks, whether the given parameter is a BOOLEAN';

commit;

建立一個測試的 table

create table ithelp190922a (
  x text
);

insert into ithelp190922a values
('t'),('f'),('T'),('F'),('y'),('n'),('Y'),('N'),
('true'),('false'),('TRUE'),('FALSE'),('yes'),('no'),('YES'),('NO'),('1'),('0');

select x, is_boolean(x)
  from ithelp190922a;
+-------+------------+
|   x   | is_boolean |
+-------+------------+
| t     | t          |
| f     | t          |
| T     | t          |
| F     | t          |
| y     | t          |
| n     | t          |
| Y     | t          |
| N     | t          |
| true  | t          |
| false | t          |
| TRUE  | t          |
| FALSE | t          |
| yes   | t          |
| no    | t          |
| YES   | t          |
| NO    | t          |
| 1     | t          |
| 0     | t          |
+-------+------------+
(18 rows)

1/0 除了字元模式以外,數字模式也可以.

select 0::boolean, 1::boolean;
+------+------+
| bool | bool |
+------+------+
| f    | t    |
+------+------+

以上我們介紹了基礎資料型態的查詢方式,以及Boolean型態的應用.
並在實際例子中,使用了 Postgresql 一些具有特色好用的功能.


上一篇
四探 Postgresql Catalog
下一篇
Postgresql 的字元資料型態
系列文
以Postgresql為主,聊聊資料庫.31

尚未有邦友留言

立即登入留言