[Day 26] Column Family Database：以 Cassandra 為例

2022 iThome 鐵人賽

DAY 26

Software Development

NoSQL: Not Only SQL系列第 26 篇

14th鐵人賽 nosql cassandra database columnfamily

Kei Liao

團隊30 個字以內

2022-09-28 23:47:19

1264 瀏覽

分享至

建立本機環境

這次一樣使用 Docker 在本機建立相關環境。

Pull Image

取得 cassandra 的 image。

> docker pull cassandra

Run Image

將該 image run 起來，指定 port 1236 (或其他你想要的 port 也可以)，並命名為 ironman-cassandra-demo。

> docker run -p 1236:9042 --name ironman-cassandra-demo cassandra

連線

連進該 container 並使用 cqlsh。

> docker exec -ti ironman-cassandra-demo cqlsh

預設會連進 Test Cluster。

Connected to Test Cluster at 127.0.0.1:9042
[cqlsh 6.0.0 | Cassandra 4.0.6 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.

基本操作

cql 的語法非常接近 sql，所以接下來這些看起來應該會很熟悉。

建立 Keyspace

透過 CREATE KEYSPACE 建立 keyspace，建立時要設定 replication 相關參數，可參考官方網站。

因為是本機的練習環境，所以這邊使用 SimpleStrategy 並設定 replication_factor 為 1，但正式開發不應該這樣設定。

cqlsh> CREATE KEYSPACE demo_keyspace WITH replication = { 'class':'SimpleStrategy', 'replication_factor':1 };

接著透過 DESCRIBE 或 DESC 可以看到該 keyspace 的資訊。

cqlsh> DESC KEYSPACE demo_keyspace;

得到結果：

CREATE KEYSPACE demo_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;

使用 use 進入該 keyspace。

cqlsh> USE demo_keyspace;
cqlsh:demo_keyspace>

建立 Table

建立 Table 時需要設定，在 Cassandra 中一張張的 Table 就是一個個 Column Family，所以每張 Table 在建立時要先定義好 Row Key (即 Primary Key)，Row Key 可以是單一欄位或複合欄位，其他的欄位可以一併定義，或之後再加都可以。

cqlsh:demo_keyspace> CREATE TABLE t1 ( a INT PRIMARY KEY, b INT, c INT);
cqlsh:demo_keyspace> CREATE TABLE t2 ( d INT PRIMARY KEY, e INT, f INT);
cqlsh:demo_keyspace> CREATE TABLE t3 ( g INT PRIMARY KEY, h INT, i INT);

可以透過 DESCRIBE 或 DESC 查看剛剛定義的這些 Keyspace 和 Table。

cqlsh:demo_keyspace> DESC SCHEMA;

得到回傳如下：

CREATE KEYSPACE demo_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;

CREATE TABLE demo_keyspace.t1 (
    a int PRIMARY KEY,
    b int,
    c int
) WITH additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';

CREATE TABLE demo_keyspace.t2 (
    d int PRIMARY KEY,
    e int,
    f int
) WITH additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';

CREATE TABLE demo_keyspace.t3 (
    g int PRIMARY KEY,
    h int,
    i int
) WITH additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';

新增資料

新增資料和 SQL 一樣使用 INSERT，語法如下：

cqlsh:demo_keyspace> INSERT INTO t1 (a,b,c) VALUES (1,2,3);
cqlsh:demo_keyspace> INSERT INTO t2 (d,e,f) VALUES (4,5,6);
cqlsh:demo_keyspace> INSERT INTO t3 (g,h,i) VALUES (7,8,NULL);

PK 為必填且不可為 NULL，沒填正確的值會回傳錯誤訊息。

cqlsh:demo_keyspace> INSERT INTO t1 (a,b,c) VALUES (NULL,2,3);

InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid null value in condition for column a"

若該 PK 已存在，會「更新」該筆資料，不會回傳 PK 衝突，這點需要注意。

cqlsh:demo_keyspace> INSERT INTO t1 (a,b,c) VALUES (1,2,3);
cqlsh:demo_keyspace> INSERT INTO t1 (a,b,c) VALUES (1,2,4);

查詢資料

使用 SELECT 對 Table 進行查詢。要注意的是，如果有下查詢條件(像是 WHERE)，在沒建立索引的條件下，查詢條件一定要包含 PK 才可以進行查詢。

cqlsh:demo_keyspace> SELECT * FROM t1;

 a | b | c
---+---+---
 1 | 2 | 4

(1 rows)

cqlsh:demo_keyspace> SELECT * FROM t1 WHERE a = 0;

 a | b | c
---+---+---

(0 rows)

條件沒下 PK 會得到錯誤訊息：

cqlsh:demo_keyspace> SELECT * FROM t1 WHERE b = 0;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

整體而言 CQL 下起來跟 SQL 非常接近，熟悉 SQL 的人直接使用 CQL 應該沒什麼大問題。但由於 Cassandra 不是關聯式資料庫，務必注意其中的差異，詳細可見官方文件，基本上都有特別標註提醒。