由於DV 2.0 設計相對的複雜,使用起來跟一般資料倉儲上會有一些不同。這裡特別強調一些常用的使用模式與其中奧妙。以下的案例都是基於之前快速設定模板裡面的資料庫設計:
提取所有交易金額大於10的交易機會(Opportunity)
這個應該是一般最常見的SQL使用方法吧!如果是一般的資料倉儲項目,就可以很簡單的用一行解決:
select opportunity_id, amount from opportunity where amount > 10
但同樣的查找,套用到DV資料模型就會變得比較複雜:
# 0. 定義中心表
with hub_opp as (
select * from adv__hub_opportunity
),
# 1. 按日期在衛星表上篩選出最新載入的紀錄
sat_opp_rn as (
select
*,
row_number()
over (partition by opportunity_pk order by load_datetime desc)
as rn
from adv__sat_opportunity
),
sat_opp_last as (
select
opportunity_pk,
amount
from sat_opp_rn
where rn = 1
),
# 2. 鏈接中心與衛星表
opp as (
select
hub.opportunity_pk,
hub.opportunityid,
sat.amount
from hub_opp as hub
left join sat_opp_last as sat
on hub.opportunity_pk = sat.opportunity_pk
)
# 3. 按照商業條件篩選,載出
select * from opp
where amount > 10
為了展示所需的處理步驟,以上的SQL案例是故意以比較冗長與清楚的方式寫出來的。但可以看出來,相對一般的資料倉庫表DV就連提取簡單提取資料的步驟都會多很多。
有多少個機會(Opportunity)來自基於新加坡的帳戶(Account)?
在一般的寫法下,就算不用勉強用一行解決也是可以在一層內處理完:
select count(distinct opp.OPPORTUNITY_ID) as count
from account as acct
left join opportunity as opp
on acct.OPPORTUNITY_ID = opp.OPPORTUNITY_ID
where acct.CITY = 'Singapore'
相對的,DV就需要好幾個步驟:
# 0. 定義中心與鏈接表
with hub_acct as (
select * from adv__hub_acount
),
hub_opp as (
select * from adv__hub_opportunity
),
link_opp_acct as (
select * from adv__link_opportunity_account
),
# 1. 按日期在衛星表上篩選出最新載入的紀錄
sat_acct as (
select * from (
select
*,
row_number() over
(partition by account_pk order by load_datetime desc)
as rn
from adv__sat_account
) where rn = 1
),
acct as (
select
hub_acct.account_pk,
sat_acct.city
from hub_acct
left join sat_acct
on hub_acct.account_pk = sat_acct.account_pk
),
# 2. 通過鏈接表鏈接兩個中心表
acct_opp as (
select
acct.account_pk,
acct.city,
hub_opp.opportunity_pk
from acct
left join link_opp_acct
on acct.account_pk = link_opp_acct.account_pk
left join hub_opp
on link_opp_acct.opportunity_pk = hub_opp.opportunity_pk
)
# 3. 按照商業條件篩選計算,載出
select count(distinct opportunity_pk)
from acct_opp
where city = 'Singapore'
從以上的案例來看,可以看得出DV在使用上其實增加了很多複雜性。下一篇我們會討論什麼樣的案例會適合使用DV。
對 dbt 或 data 有興趣 :wave:?歡迎加入 dbt community 到 #local-taipei 找我們,也有實體 Meetup 請到 dbt Taipei Meetup 報名參加