資料表test有三個欄位(dt date ,goods varchar(10) ,price numeric(8,2) );
這資料表想要計算每個的variance,這個variance值的計算是上一筆variance再乘上一個數值(假設是0.5),最後再乘上price,所以第一筆的variance因為沒有上一筆的variance,就給它的預設值為1。
預期資料如下:(partition by goods order by dt)
dt ,goods ,price ,variance
2020-09-01 ,'metal' ,10 ,1
2020-09-02 ,'metal' ,12 ,1 * 0.5 * 12 = 6
2020-09-03 ,'metal' ,16 ,6 * 0.5 * 16 = 48
本來是用了lag windown function去實作,但發現因為variance的值是需要經過運算的,所以lag這邊取不出來。
我在這邊腦筋卡住了,希望能有先進能幫我解惑,麻煩大家了,謝謝。
原題意已經變更,程式修改如下 :
CREATE TABLE TEST01 (
dt date,
goods varchar(10),
price numeric(8, 2)
);
--
INSERT INTO TEST01
VALUES ('2020-09-01','metal',10.0),
('2020-09-01','plastics',2.2),
('2020-09-01','paper',5.1),
('2020-09-02','metal',12.2),
('2020-09-02','plastics',3.1),
('2020-09-02','paper',5.2),
('2020-09-03','metal',16.3),
('2020-09-03','plastics',2.9),
('2020-09-03','paper',7.1);
SELECT *,
-- 計算累乘
ROUND(EXP(SUM(LOG(TempValue)) OVER (PARTITION BY goods ORDER BY dt)), 4) variance
FROM (
SELECT dt,goods,price,rank,
CASE WHEN rank = 1 THEN 1
WHEN rank >= 2 THEN price * 0.5
END AS TempValue
FROM (
SELECT dt,goods,price,
ROW_NUMBER() OVER (PARTITION BY goods ORDER BY dt) AS rank
FROM TEST01) AS Temp1
) AS Temp2
ORDER BY goods,dt,rank
這個variance值的計算是上一筆variance再乘上一個數值(假設是0.5),最後再乘上price,
這個應該在資料庫的 stored procedure 中做吧,簡單又有效率。
按照新的多組 goods, 寫新的方式.
CREATE TABLE it200928d (
dt date not null
, goods text not null
, price numeric(8,2) not null
);
INSERT INTO it200928d VALUES
('2020-09-01'::date ,'metal' ,10.0)
,('2020-09-01'::date ,'plastics' ,2.2)
,('2020-09-01'::date ,'paper' ,5.1)
,('2020-09-02'::date ,'metal' ,12.2)
,('2020-09-02'::date ,'plastics' ,3.1)
,('2020-09-02'::date ,'paper' ,5.2)
,('2020-09-03'::date ,'metal' ,16.3)
,('2020-09-03'::date ,'plastics' ,2.9)
,('2020-09-03'::date ,'paper' ,7.1);
-- 分組後 遞迴
with recursive t1 as (
select dt, goods, price
, row_number() over (partition by goods order by dt) grn
from it200928d
), t2 (dt, goods, price, grn, variance) as (
select dt, goods, price, grn, 1::numeric
from t1
where grn = 1
union all
select b.dt, b.goods, b.price, a.grn + 1
, a.variance * 0.5 * b.price
from t2 a
, t1 b
where a.goods = b.goods
and a.grn + 1 = b.grn
)
select *
from t2
order by goods, grn;
+------------+----------+-------+-----+-----------+
| dt | goods | price | grn | variance |
+------------+----------+-------+-----+-----------+
| 2020-09-01 | metal | 10.00 | 1 | 1 |
| 2020-09-02 | metal | 12.20 | 2 | 6.100 |
| 2020-09-03 | metal | 16.30 | 3 | 49.715000 |
| 2020-09-01 | paper | 5.10 | 1 | 1 |
| 2020-09-02 | paper | 5.20 | 2 | 2.600 |
| 2020-09-03 | paper | 7.10 | 3 | 9.230000 |
| 2020-09-01 | plastics | 2.20 | 1 | 1 |
| 2020-09-02 | plastics | 3.10 | 2 | 1.550 |
| 2020-09-03 | plastics | 2.90 | 3 | 2.247500 |
+------------+----------+-------+-----+-----------+
(9 rows)
---- 另外 用 plpython 寫函數,再搭配 array_agg() 的方法
create or replace function ithelp200928(lst real[])
returns real
as $$
if len(lst) == 1:
return 1.0
else :
s = 1.0
for elem in lst[1:]:
s = s * 0.5 * elem
return s
$$ language plpython3u;
with t1 as (
select *
, array_agg(price::real) over(partition by goods order by dt) aggprc
from it200928d
)
select *
, ithelp200928(aggprc)
from t1;
+------------+----------+-------+----------------+--------------+
| dt | goods | price | aggprc | ithelp200928 |
+------------+----------+-------+----------------+--------------+
| 2020-09-01 | metal | 10.00 | {10} | 1 |
| 2020-09-02 | metal | 12.20 | {10,12.2} | 6.1 |
| 2020-09-03 | metal | 16.30 | {10,12.2,16.3} | 49.714996 |
| 2020-09-01 | paper | 5.10 | {5.1} | 1 |
| 2020-09-02 | paper | 5.20 | {5.1,5.2} | 2.6 |
| 2020-09-03 | paper | 7.10 | {5.1,5.2,7.1} | 9.23 |
| 2020-09-01 | plastics | 2.20 | {2.2} | 1 |
| 2020-09-02 | plastics | 3.10 | {2.2,3.1} | 1.55 |
| 2020-09-03 | plastics | 2.90 | {2.2,3.1,2.9} | 2.2475 |
+------------+----------+-------+----------------+--------------+
(9 rows)
Time: 6.827 ms
我看了這麼久
還是
抱歉,今天腦袋實在太混頓了,連問題語意都沒好好呈現。
變異數它基本上就是用今天的價格去乘上昨天的變異數,最後再乘上一個系數(ex=0.5)
確實goods中會有多種品項,我的例子只簡單提到一個,確實與情境不符。
正常的資料範例如下
create table test ( dt date not null ,goods varchar(10) ,price numeric(8,2));
insert into test values
('2020-09-01'::date ,'metal' ,10.0)
,('2020-09-01'::date ,'plastics' ,2.2)
,('2020-09-01'::date ,'paper' ,5.1)
,('2020-09-02'::date ,'metal' ,12.2)
,('2020-09-02'::date ,'plastics' ,3.1)
,('2020-09-02'::date ,'paper' ,5.2)
,('2020-09-03'::date ,'metal' ,16.3)
,('2020-09-03'::date ,'plastics' ,2.9)
,('2020-09-03'::date ,'paper' ,7.1);
我試著用您的語法改寫了一下,發現遞歸出來的結果只有rn = 1的資料
我是依照品項跟時間作rank,最後在where的地方多加上了goods跟dt要相等,出來的結果就與預期不符了!
所以就不知道該怎麼去做這個查詢了,再麻煩指教了!
謝謝
沒關係,先釐清.我跟rogeryao大大,還有這裡很多熱心網友,都會幫你的.
先謝謝各位網友/邦友的熱心幫助
可以猜的答案已更新在上面了,沒招了
PostgreSQL 有 variance() .
rogeryao 大大的方法很高明,我補充了兩種比較普通的方式.
遞迴也是個好方法,彈性高可因應不同的邏輯演算
分組遞迴,要先分組一下.
你要確認這個table的哪一個欄位是你的seq key,不然怎麼確認計算順序?
順序是依照每個品項的時間作順序
select dt, goods, price, row_number() over(partition by goods order by dt) rn
from test
然後是計算每個品項當天的variance
DECLARE rs CURSOR FOR SELECT DT, GOODS, PRICE FROM CITY
DECLARE @_DT VARCHAR(10), @_GOODS VARCHAR(10),@_PRICE INT, @VARIANCE FLOAT, @_FIRST INT
SET @VARIANCE=1.0
SET @_FIRST=1
Open rs
Fetch Next From rs Into @_DT , @_GOODS , @_PRICE
While @@Fetch_Status = 0 -- 0是成功,-1是失敗,-2是遺漏
Begin
IF @_FIRST=1
SET @VARIANCE=1
ELSE
SET @VARIANCE=@_ID0.5@VARIANCE
SELECT @_DT, @_GOODS, @_PRICE, @VARIANCE
SET @_FIRST=@_FIRST+1
Fetch Next From rs Into @_DT , @_GOODS , @_PRICE
End
Close rs
Deallocate rs