我們在 Day 21 講了三個概念:
講了一堆,今天就來分別針對這三個部分實作吧,我們使用的是公共數據集: bigquery-public-data.crypto_band.messages。
完成 Day 04 or Day 06
BigQuery console
打開你的 BigQuery Console:
先建立一個名叫 Crypto 的資料集:
接著下面create table 都做完的話,可以看到 Crypto 底下有四個數據集:
CREATE TABLE `crypto.messages` AS
SELECT block_timestamp_truncated, message_type
FROM `bigquery-public-data.crypto_band.messages`
WHERE block_timestamp_truncated between '2020-11-01'and '2020-11-30';
CREATE TABLE `crypto.messages_partition`
PARTITION BY DATE(block_timestamp_truncated) AS
SELECT block_timestamp_truncated, message_type
FROM `bigquery-public-data.crypto_band.messages`
WHERE block_timestamp_truncated between '2020-11-01'and '2020-11-30';
針對這個表作查詢:
SELECT block_timestamp_truncated, message_type
FROM `crypto.messages_partition`
WHERE DATE(block_timestamp_truncated) ='2020-11-01';
執行一樣的查詢,但是使用未做 Partitioned 的表可發現查詢量爆高:
SELECT block_timestamp_truncated, message_type
FROM `crypto.messages`
WHERE DATE(block_timestamp_truncated) ='2020-11-01';
CREATE TABLE `crypto.messages_cluster`
CLUSTER BY message_type AS
SELECT block_timestamp_truncated, message_type
FROM `bigquery-public-data.crypto_band.messages`
WHERE block_timestamp_truncated between '2020-11-01' and '2020-11-30';
針對這個表作查詢:
SELECT block_timestamp_truncated, message_type
FROM `crypto.messages_cluster`
WHERE message_type = 'oracle/Request';
執行一樣的查詢,但是使用未做 Clustered 的表可發現查詢量爆高:
SELECT block_timestamp_truncated, message_type
FROM `crypto.messages`
WHERE message_type = 'oracle/Request';
CREATE TABLE `crypto.messages_partition_cluster`
PARTITION BY DATE(block_timestamp_truncated)
CLUSTER BY message_type AS
SELECT block_timestamp_truncated, message_type
FROM `bigquery-public-data.crypto_band.messages`
WHERE DATE(block_timestamp_truncated) between '2020-11-01' and '2020-11-30';
針對這個表作查詢:
SELECT block_timestamp_truncated, message_type
FROM `crypto.messages_partition_cluster`
WHERE message_type = 'oracle/Report' and DATE(block_timestamp_truncated) = '2020-11-02';
執行一樣的查詢,但是使用只做 Clustered 未做 Partitioned 的表可發現查詢量爆高:
SELECT block_timestamp_truncated, message_type
FROM `crypto.messages_cluster`
WHERE message_type = 'oracle/Report' and DATE(block_timestamp_truncated) = '2020-11-02';
我們今天學會了以下三種方法的創建與比較,優化查詢的部分就到這裡告一段落。
比較 | 查詢量 |
---|---|
Standard table VS Partitioned table | 50.37 MB / 194.81 KB |
Standard table VS Clustered table | 50.37 MB / 11.46 MB |
Clustered table VS Partitioned table + Clustered table | 48.14 MB / 195.38 KB |