BigQuery :: Continuous Query

gcp bigquery data engineer

阿晟 2025-01-06 02:29:49

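BigQuery has kept broadening its feature set in recent years, which makes it far more than just a Data Warehouse: it is now a powerful Data Platform (and an increasingly expensive one...).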

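There are quite a few ways to write data into BigQuery in real time, and GCP's own products support this well. Until now, though, it has been hard to serve real-time needs from BigQuery itself. A downstream service that wants fresh data has to keep calling the API to fetch it, and BigQuery's connection model does not cope well with high concurrency; it simply was not designed to be used that way.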

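Continuous query, a feature that is still in Preview, looks like an attempt to let BigQuery handle real-time data processing as well, in response to the steady growth of data applications that need to be ever closer to real time.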

continuous query workflows

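EXPORT DATA
  OPTIONS (
    format = 'CLOUD_PUBSUB',
    uri = 'https://pubsub.googleapis.com/projects/myproject/topics/sales-3')
AS (
  -- Pub/Sub export takes a single string payload per row,
  -- so the selected fields are packed into one JSON `message` column.
  SELECT
    TO_JSON_STRING(
      STRUCT(
        customer_id,
        product_id,
        amounts,
        event_timestamp)) AS message
  FROM `my_project.real_time.fct_sales`
  WHERE product_id = 3
);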
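On the receiving side, a subscriber to the topic would get one message per exported row. Here is a minimal sketch of decoding such a payload, assuming each row arrives as a UTF-8 JSON object carrying the four columns from the query. The `Sale` class, the `decode_sale` helper, and the sample payload are all hypothetical names made up for illustration, and the Pub/Sub client itself is left out so the snippet stays self-contained.

```python
import json
from dataclasses import dataclass


@dataclass
class Sale:
    customer_id: str
    product_id: int
    amounts: float
    event_timestamp: str  # kept as the raw string; no timestamp format is assumed


def decode_sale(data: bytes) -> Sale:
    """Decode one message payload (UTF-8 JSON bytes) into a Sale."""
    row = json.loads(data.decode("utf-8"))
    return Sale(
        customer_id=str(row["customer_id"]),
        # int()/float() accept both JSON numbers and numeric strings.
        product_id=int(row["product_id"]),
        amounts=float(row["amounts"]),
        event_timestamp=str(row["event_timestamp"]),
    )


if __name__ == "__main__":
    # Hypothetical payload shaped like the columns selected in the query.
    payload = (
        b'{"customer_id": "c-001", "product_id": 3, '
        b'"amounts": 129.5, "event_timestamp": "2025-01-06T02:29:49Z"}'
    )
    print(decode_sale(payload))
```

In a real subscriber the bytes would come from the message's data field; everything after that point is ordinary application code.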

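The feature is not limited to real-time pipelines and ETL inside BigQuery, such as monitoring in real time whether incoming data looks as expected or carries any risk. It can also be combined with other services such as Pub/Sub and Bigtable to export data, and once the data is exported there is a lot of room to put it to use, for example training models in real time or feeding AI services.
Continuous queries can also use the AI functions that Google has predefined.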

Use AI functions
Additional APIs, IAM permissions, and Google Cloud resources are required to use a supported AI function in a continuous query. For more information, see one of the following topics, based on your use case:

Generate text by using the ML.GENERATE_TEXT function

Generate text embeddings by using the ML.GENERATE_EMBEDDING function

Understand text with the ML.UNDERSTAND_TEXT function

Translate text with the ML.TRANSLATE function

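There are still some limitations, though. Complex query syntax such as JOIN, aggregation, GROUP BY, DISTINCT and so on cannot be used in a continuous query. For now you can basically assume that only the SELECT, FROM, and WHERE you learn on day one are available, plus some transformations on the columns themselves.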
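To make the limitation concrete, here is a rough sketch of a pre-flight check that flags the constructs named above before a query is submitted. It is a naive keyword match, not a SQL parser, so it will also trip on keywords inside string literals or comments. The `UNSUPPORTED` table and the `unsupported_constructs` helper are hypothetical, and the list covers only what this post mentions, not the full official list.

```python
import re

# Constructs this post lists as unsupported; not exhaustive, not official.
UNSUPPORTED = {
    "JOIN": r"\bJOIN\b",
    "GROUP BY": r"\bGROUP\s+BY\b",
    "DISTINCT": r"\bDISTINCT\b",
    # Only a handful of common aggregate functions, as an illustration.
    "aggregation": r"\b(?:SUM|COUNT|AVG|MIN|MAX)\s*\(",
}


def unsupported_constructs(sql: str) -> list[str]:
    """Return the names of listed constructs found in `sql` (naive keyword match)."""
    return [
        name
        for name, pattern in UNSUPPORTED.items()
        if re.search(pattern, sql, flags=re.IGNORECASE)
    ]


if __name__ == "__main__":
    print(unsupported_constructs("SELECT customer_id FROM t WHERE product_id = 3"))
    print(unsupported_constructs("SELECT COUNT(*) FROM t GROUP BY product_id"))
```

A query in the shape of the export example, with only SELECT, FROM, and WHERE, comes back with an empty list.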

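On cost, this feature does not appear to use the on-demand pricing model (per TiB) but the capacity pricing model (per slot-hour). You first have to create a Reservation before a query can run in it. The most basic Reservation needs 100 slots, and the service is built on the Enterprise edition, so you burn at least 4,320 USD a month. I have no idea how many continuous queries that can keep running XD.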
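The monthly figure can be reproduced with simple arithmetic. The sketch below assumes a 30-day month with the reservation running nonstop, and a rate of 0.06 USD per slot-hour. That rate is an assumption back-derived from the 4,320 USD number rather than quoted from a price list, so check the current pricing page for your region before relying on it.

```python
SLOTS = 100                # minimum reservation size mentioned in this post
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month, running around the clock
USD_PER_SLOT_HOUR = 0.06   # ASSUMPTION: back-derived from 4,320 USD; verify against current pricing


def monthly_cost(slots: int, usd_per_slot_hour: float, hours: int = HOURS_PER_MONTH) -> float:
    """Return the cost in USD of keeping `slots` reserved for `hours`."""
    return slots * usd_per_slot_hour * hours


if __name__ == "__main__":
    print(f"{monthly_cost(SLOTS, USD_PER_SLOT_HOUR):,.0f} USD per month")
```

Since the reservation is billed by slot-hours rather than by bytes scanned, the cost is the same whether one continuous query or several share those 100 slots.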

Ref: