資料表內容整理及批量更正流程建議請教

postgresql 資料庫資料整理

Victor 2019-05-28 14:23:32 ‧ 2548 瀏覽

分享至

各位大大好! 最近接到一個任務需要整理資料表的內容，可能有些字串需要替換
例如 x10-3 取代成 ×10-3 之類的操作，
執行下面這種指令 (網路上範例)

SELECT REPLACE (Region_Name, 'ast', 'astern')
FROM Geography;

但是怕直接取代會有錯誤的地方，所以打算把變更的紀錄記錄在一個地方，取代完再人工檢查
目前就是卡在這個部分（紀錄ＬＯＧ的方法），想請教有甚麼方法可以記錄這些內容，
由於有很多張資料表，每張資料表的欄位也不同。
所以我打算每個都去查詢並替換
但欄位不同，不知道該如何記錄這些SQL執行後修改的紀錄
請教各位大大該如何處裡比較好？
或有其他更好的整理資料方法，可以提供給小弟研究看看!!

rogeryao iT邦超人 7 級 ‧ 2019-05-28 15:00:01 檢舉

1.請問 x10-3 是代表 : 乘以 10 的 -3 次方嗎 ?
2.若欄位內有 6x10-4x3x10+2 應如何解讀 ?

Victor iT邦新手 2 級 ‧ 2019-05-28 16:28:21 檢舉

1.是的! 都是字串只是頁面上要顯示上標，所以用HTML TAG 替換
2.我倒是沒有想過這個問題，但是欄位內容目前看起來沒有運算式字串，所以想說替代的記錄記錄下來檢查，如果替代指令下去應該會變成 `6x10-4 x3x10+3 ` 再去檢查紀錄是否正確，如果是運算式那就在手動修改。

rogeryao iT邦超人 7 級 ‧ 2019-05-28 16:55:50 檢舉

A.上述第 2個問題是整個問題的根本,未確定前切勿動手
B.
1. x10 可以做為依據替代成 x10
update temptable set val = replace (val, 'x10', 'x10');
2.要替代 時卻沒有參考的字串 (底下的 ??????)
update temptable set val = replace (val, '??????', '');
C.只有 A和B 兩個條件都確定時 , 再執行相關的 SQL 才會有意義 ; 否則只是產生一
堆很奇怪的資料 , 最後資料還要還原回去

登入發表討論

直播研討會

3 個回答

一級屠豬士

iT邦大師 1 級 ‧ 2019-05-28 16:01:59

最佳解答

-- auto commit 狀態檢查
-- in psql
\echo :AUTOCOMMIT
OFF

-- 設定
\set AUTOCOMMIT ON
\set AUTOCOMMIT OFF

-- 當然我們是要使用 auto commit off
-- 這樣可以 rollback
-- 除了互動式 \set 以外,在 .psqlrc 中設定
\set AUTOCOMMIT OFF

-- 記得要檢查

-- 使用範例

create table ithelp190528 (
  id serial not null primary key
, val text not null
);

insert into ithelp190528 (val)
values ('test1');

select *
  from ithelp190528;
  
+----+-------+
| id |  val  |
+----+-------+
|  1 | test1 |
+----+-------+

begin;

update ithelp190528
   set val = 'test2'
 where id = 1;

select *
  from ithelp190528;

+----+-------+
| id |  val  |
+----+-------+
|  1 | test2 |
+----+-------+

rollback;

-- check 
select *
  from ithelp190528;

+----+-------+
| id |  val  |
+----+-------+
|  1 | test1 |
+----+-------+

-- 若確認無誤時,可以 commit
--------
-- 建立比對的 Table

create table ithelp190528p (
  id int not null
, oldval text not null
, newval text not null
);

insert into ithelp190528p
select id
     , val as oldval
     , replace (val, 'te', 'se') as newval
  from ithelp190528;

select * from ithelp190528p;
+----+--------+--------+
| id | oldval | newval |
+----+--------+--------+
|  1 | test1  | sest1  |
+----+--------+--------+
-- 確認操作無誤後,再將 insert 改寫為 update
---------
-- log table and trigger 

create table ithelp190528t (
  id serial not null primary key
, old_id int not null
, old_val text not null
, new_val text not null
, upts timestamp with time zone not null default clock_timestamp()
);

create function bf_up_ithelp190528_fun()
returns trigger as
$_$
begin
    if new.val <> old.val then
        insert into ithelp190528t(old_id, old_val, new_val)
        values (old.id, old.val, new.val);
    end if;
    
    return new;
end;
$_$ LANGUAGE plpgsql;

create trigger tri_bf_up_ithelp190528
    before update
    on ithelp190528
    for each row
    execute procedure bf_up_ithelp190528_fun();
----
begin;

update ithelp190528
   set val = 'test2'
 where id = 1;

commit;

select *
  from ithelp190528;

+----+-------+
| id |  val  |
+----+-------+
|  1 | test2 |
+----+-------+

select *
  from ithelp190528t;  

+----+--------+---------+---------+------------------------------+
| id | old_id | old_val | new_val |             upts             |
+----+--------+---------+---------+------------------------------+
|  1 |      1 | test1   | test2   | 2019-05-28 15:50:51.87478+08 |
+----+--------+---------+---------+------------------------------+
(1 row)

-- 可以看到 trigger 有效的將紀錄保存. 
----
-- 再來做幾筆

insert into ithelp190528 (val)
values 
('test3'),('test4');

begin;

update ithelp190528
   set val = replace (val, 'te', 'se');

select * from ithelp190528;
+----+-------+
| id |  val  |
+----+-------+
|  1 | sest2 |
|  2 | sest3 |
|  3 | sest4 |
+----+-------+
(3 rows)

commit;

-- check log table
select *
  from ithelp190528t;  

+----+--------+---------+---------+-------------------------------+
| id | old_id | old_val | new_val |             upts              |
+----+--------+---------+---------+-------------------------------+
|  1 |      1 | test1   | test2   | 2019-05-28 15:50:51.87478+08  |
|  2 |      1 | test2   | sest2   | 2019-05-28 15:56:41.869679+08 |
|  3 |      2 | test3   | sest3   | 2019-05-28 15:56:41.869775+08 |
|  4 |      3 | test4   | sest4   | 2019-05-28 15:56:41.86999+08  |
+----+--------+---------+---------+-------------------------------+
(4 rows)

--------------
-- 以上的步驟,auto commit 的要確認,以免自動就commit了.
-- 其他可以自行視需要靈活搭配使用.
-- 操作之前最好能做備份,以及匯出,有備無患.

回應 10
分享
檢舉

看更多先前的回應...收起先前的回應...

海綿寶寶 iT邦大神 1 級 ‧ 2019-05-29 09:03:11 檢舉

您寫這麼多
我想他還是覺得「沒有幫助」
不管
我還是按個讚先

Victor iT邦新手 2 級 ‧ 2019-05-29 09:29:40 檢舉

誤會呀!兩位大大!小弟比較需要時間理解，發現我對於觸發跟AUTOCOMMIT沒有很熟還在查資料研究中!
想請問另外一個地方
AUTOCOMMIT 甚麼時候才真得寫入 commit; 以後嗎?
begin 用意是甚麼?

一級屠豬士 iT邦大師 1 級 ‧ 2019-05-29 16:02:01 檢舉

commit rollback , begin transaction.這些觀念要建立.
auto commit 是一些client的參數, 當設定 auto commit on
你下指令後,會幫你補上 commit.因為現在蠻多都是這樣,對你這次想做的不適合.最好是自己確定後再 commit.

一級屠豬士 iT邦大師 1 級 ‧ 2019-05-29 17:25:38 檢舉

所以你對資料庫 transaction 的觀念是?

從沒聽過.
有聽過,但沒特別注意.

我有看了一下你以往發問的問題,你是用框架開發.
我有點好奇,你們公司的人,有人懂資料庫嗎?

Victor iT邦新手 2 級 ‧ 2019-05-30 10:27:01 檢舉

一級屠豬士大大好!
經過這次大大的回答讓我更清楚這部分。
資料庫 transaction 剛剛查了一下！
之前比較常用後端程式語言來處理~雖然我不常用! 但會用類似如下面的PHP程式碼(Laravel 框架)：

DB::beginTransaction();
    try {
        //新增刪除修改
        DB::commit();
    } catch (Exception $e) {
        //DB 還原交易
        DB::rollBack();
        return $e->getMessage();
    }
    return $results;

公司專案應該算都一條龍服務，前後端資料庫都要碰到，沒有學得很深，資料庫的部分基本上建立表，外鍵，正規化而已，之前有需要完成其他任務有用過資料庫的View 以及trigger 以及了解到預存程序，但久久用一次有點不熟。

一級屠豬士 iT邦大師 1 級 ‧ 2019-05-30 10:34:09 檢舉

框架有示範了,使用 transaction. 這方面的觀念要建立.
在實作上逐步加深了解,後續你會再提升,加油!

Victor iT邦新手 2 級 ‧ 2019-05-30 10:38:06 檢舉

謝謝大大!最近越學越深

前端開始邁入 npm 來管理套件，以及用VUE框架
後端 laravel 開始打造前後端分離的API
資料庫的部分，不知道大大可不可能給我甚麼意見，例如推薦的教材之類的...

雖然我也不知道學那麼分散好不好! 小弟第一份工作，歡迎給小弟一點意見！

一級屠豬士 iT邦大師 1 級 ‧ 2019-05-30 10:47:47 檢舉

PostgreSQL.TW <- 臉書社群. 我有在這個社群.
之前我跟一些it幫的網友有辦一些研討會.有再辦的時候,歡迎來參加.

Victor iT邦新手 2 級 ‧ 2019-05-30 10:51:47 檢舉

一級屠豬士好喔! 感謝大大 http://postgresql.tw/ 這個網址是嘛?剛剛直接用網址列搜尋結果到這個地方! 哈哈我最近來好好更深入研究資料庫好了!

一級屠豬士 iT邦大師 1 級 ‧ 2019-05-30 13:46:57 檢舉

這是PostgreSQL TW 社群官網,裡面有連結,可以加入臉書社團.

登入發表回應

㊣浩瀚星空㊣

iT邦大神 1 級 ‧ 2019-05-28 17:26:19

詳細的做法可以參照「一級屠豬士」的處理。那算是比較嚴格的做法。

我之前用的是簡易判斷的做法。
先將要更新的表，額外在多一個欄位如要改Region_Name的話。就依照他原本的欄位型態。再多一個Region_Name_bak出來

然後下達如下語法

update db SET Region_Name_bak=Region_Name,Region_Name=REPLACE(Region_Name, 'ast', 'astern')

補充說明，為了安全起見，要下這個命令前，切記一定要先備份。以備不時之需。

回應
分享
檢舉

登入發表回應

rogeryao

iT邦超人 7 級 ‧ 2019-05-28 20:30:37

如前所述都是根本的問題 , 基本上應該把要 update 的欄位內的字串分析一遍 ,
可能有好幾種型態(包含局部資料已經手動補入 ) , 在依照不同的型態 update.

若型態不只一種(A,B,C,D,...) , 那麼在原 table 可能要多開 2 個欄位(Field_sup,Field_sup2) 及備份原始資料欄位(Region_Name_bak),
在 update 時紀錄是那一種型態.指令應該類似 :

A. 備份原始欄位資料,執行一次
update YourTable 
SET Region_Name_bak = Region_Name

B.型態不只一種時,可能要反覆執行
1. 先處理 <sup>
update YourTable 
SET Region_Name = REPLACE(Region_Name, 'Str1', '<sup>'),
    Field_sup = "A"
Where "條件";    

2. 再處理 </sup> 
update YourTable 
SET Region_Name = REPLACE(Region_Name, 'Str2', '</sup>'),
    Field_sup2 = "A"
Where "條件";