iT邦幫忙

2022 iThome 鐵人賽

DAY 22
1
AI & Data

[GCP BigQuery] - 探索資料倉儲,開啟你的數位轉型之旅系列 第 22

Day 22: 優化你的BigQuery 查詢,建立 Partitioned 和 Clustered table (實作)

  • 分享至 

  • xImage
  •  

https://ithelp.ithome.com.tw/upload/images/20221006/20152529kw4e9mS2p5.png

前言:

我們在 Day 21 講了三個概念:

  • Partitioned table
  • Clustered table
  • Partitioned table + Clustered table

講了一堆,今天就來分別針對這三個部分實作吧,我們使用的是公共數據集: bigquery-public-data.crypto_band.messages

準備材料:

完成 Day 04 or Day 06
BigQuery console

打開你的 BigQuery Console:

先建立一個名叫 Crypto 的資料集:

https://ithelp.ithome.com.tw/upload/images/20221006/201525292jfTFdqQQY.png

接著下面create table 都做完的話,可以看到 Crypto 底下有四個數據集:

https://ithelp.ithome.com.tw/upload/images/20221006/20152529SfxVXBP410.png

先建立一個完全沒有做 Partitioned or Clustered 的表作為參考

CREATE TABLE `crypto.messages` AS
SELECT block_timestamp_truncated, message_type
FROM `bigquery-public-data.crypto_band.messages`
WHERE block_timestamp_truncated between '2020-11-01'and '2020-11-30';

建立 Partitioned table

CREATE TABLE `crypto.messages_partition` 
PARTITION BY DATE(block_timestamp_truncated) AS
SELECT block_timestamp_truncated, message_type
FROM `bigquery-public-data.crypto_band.messages`
WHERE block_timestamp_truncated  between '2020-11-01'and '2020-11-30';

針對這個表作查詢:

SELECT block_timestamp_truncated, message_type
FROM `crypto.messages_partition` 
WHERE DATE(block_timestamp_truncated) ='2020-11-01';

https://ithelp.ithome.com.tw/upload/images/20221006/20152529vuS7MNxfcm.png

執行一樣的查詢,但是使用未做 Partitioned 的表可發現查詢量爆高:

SELECT block_timestamp_truncated, message_type
FROM `crypto.messages`
WHERE DATE(block_timestamp_truncated) ='2020-11-01';

https://ithelp.ithome.com.tw/upload/images/20221006/20152529cleVUUT4HK.png

建立 Clustered table

CREATE TABLE `crypto.messages_cluster` 
CLUSTER BY message_type AS
SELECT block_timestamp_truncated, message_type
FROM `bigquery-public-data.crypto_band.messages`
WHERE block_timestamp_truncated between '2020-11-01' and '2020-11-30';

針對這個表作查詢:

SELECT block_timestamp_truncated, message_type
FROM `crypto.messages_cluster`
WHERE  message_type = 'oracle/Request';

https://ithelp.ithome.com.tw/upload/images/20221006/20152529efJSJ6LM6F.png

執行一樣的查詢,但是使用未做 Clustered 的表可發現查詢量爆高:

SELECT block_timestamp_truncated, message_type
FROM `crypto.messages`
WHERE  message_type = 'oracle/Request';

https://ithelp.ithome.com.tw/upload/images/20221006/20152529MSAkZUbpWo.png

建立 Partitioned table + Clustered table

CREATE TABLE `crypto.messages_partition_cluster` 
PARTITION BY DATE(block_timestamp_truncated)
CLUSTER BY message_type AS
SELECT block_timestamp_truncated, message_type
FROM `bigquery-public-data.crypto_band.messages`
WHERE DATE(block_timestamp_truncated) between '2020-11-01' and '2020-11-30';

針對這個表作查詢:

SELECT block_timestamp_truncated, message_type
FROM `crypto.messages_partition_cluster` 
WHERE  message_type = 'oracle/Report' and DATE(block_timestamp_truncated) = '2020-11-02';

https://ithelp.ithome.com.tw/upload/images/20221006/20152529olEtctkBsX.png

執行一樣的查詢,但是使用只做 Clustered 未做 Partitioned 的表可發現查詢量爆高:

SELECT block_timestamp_truncated, message_type
FROM `crypto.messages_cluster` 
WHERE  message_type = 'oracle/Report' and DATE(block_timestamp_truncated) = '2020-11-02';

https://ithelp.ithome.com.tw/upload/images/20221006/20152529c17PCBZkY1.png

Summary:

我們今天學會了以下三種方法的創建與比較,優化查詢的部分就到這裡告一段落。

比較 查詢量
Standard table VS Partitioned table 50.37 MB / 194.81 KB
Standard table VS Clustered table 50.37 MB / 11.46 MB
Clustered table VS Partitioned table + Clustered table 48.14 MB / 195.38 KB

Reference:

Day 21


上一篇
Day 21: 優化你的BigQuery 查詢,Partitioned 和 Clustered 的使用時機和限制
下一篇
Day 23: Google Cloud Platform (GCP) 上的機器學習地圖,讓你知道你有什麼選擇!
系列文
[GCP BigQuery] - 探索資料倉儲,開啟你的數位轉型之旅30
圖片
  直播研討會
圖片
{{ item.channelVendor }} {{ item.webinarstarted }} |
{{ formatDate(item.duration) }}
直播中

尚未有邦友留言

立即登入留言