一般存貨代碼檔,除了必備的 prod_id , prod_name 等等之外,會有一個 prod_memo,通常以 text 型態成立。
一般在運用上,memo 用到的機會其實很少,我在想,分割到另一個 table,用 prod_id 來關連是否比較理想呢?
我不太清楚 mysql 的運作原理,在 select 時,對於沒有引用到的欄位,mysql 是否仍然會耗用資源?
一個 table 若有 100 個欄位,今天 select 時只用 5 個欄位,對於系統資源之消耗有差嗎?
有另外一個不同,但是可以參考的問題.
https://ithelp.ithome.com.tw/questions/10188078
接著我們來做些測試觀察.
create table ithelpmy190924a (
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, pad1 CHAR(100) NOT NULL
);
create table ithelpmy190924b (
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, pad1 CHAR(100) NOT NULL
, pad2 text
);
-- 載入資料
LOAD DATA INFILE '/usr/share/dict/words' INTO TABLE ithelpmy190924a(pad1);
LOAD DATA INFILE '/usr/share/dict/words' INTO TABLE ithelpmy190924b(pad1);
Query OK, 235886 rows affected (3.11 sec)
Records: 235886 Deleted: 0 Skipped: 0 Warnings: 0
觀察空間大小
SELECT table_name
, data_length / 1024 AS 'data(K)'
, row_format
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name LIKE 'ithelpmy190924%'
ORDER BY table_name;
+-----------------+------------+------------+
| TABLE_NAME | data(K) | ROW_FORMAT |
+-----------------+------------+------------+
| ithelpmy190924a | 31296.0000 | Dynamic |
| ithelpmy190924b | 31296.0000 | Dynamic |
+-----------------+------------+------------+
2 rows in set (0.01 sec)
接著再放入一些資料到 ithelpmy190924b pad2
update ithelpmy190924b
set pad2 = pad1
where rand() <= 0.05;
Query OK, 11856 rows affected (0.51 sec)
Rows matched: 11856 Changed: 11856 Warnings: 0
analyze table ithelpmy190924b;
+-----------------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+-----------------------+---------+----------+----------+
| miku1.ithelpmy190924b | analyze | status | OK |
+-----------------------+---------+----------+----------+
SELECT table_name
, data_length / 1024 AS 'data(K)'
, row_format
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name LIKE 'ithelpmy190924%'
ORDER BY table_name;
+-----------------+------------+------------+
| TABLE_NAME | data(K) | ROW_FORMAT |
+-----------------+------------+------------+
| ithelpmy190924a | 31296.0000 | Dynamic |
| ithelpmy190924b | 31296.0000 | Dynamic |
+-----------------+------------+------------+
https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html#data-types-storage-reqs-strings
TEXT and BLOB columns are implemented differently in NDB; each row in a TEXT column is
made up of two separate parts. One of these is of fixed size (256 bytes),
and is actually stored in the original table.
The other consists of any data in excess of 256 bytes, which is stored in a hidden table.
The rows in this second table are always 2000 bytes long.
This means that the size of a TEXT column is 256
if size <= 256 (where size represents the size of the row);
otherwise, the size is 256 + size + (2000 × (size − 256) % 2000).
有 hidden table, 也就是說, 有額外的空間來做處理.
select max(length(pad1))
, max(length(pad2))
from ithelpmy190924b;
+-------------------+-------------------+
| max(length(pad1)) | max(length(pad2)) |
+-------------------+-------------------+
| 24 | 22 |
+-------------------+-------------------+
1 row in set (0.13 sec)
在上面載入資料均未超過 256 bytes.
MySQL 8 String Function 文件中 OCTET_LENGTH() is a synonym for LENGTH().
接著來將 pad2 再填充一些資料.
update ithelpmy190924b
set pad2 = rpad(pad1, 1256, 'pad')
where pad2 is not null;
Query OK, 11856 rows affected (1.63 sec)
Rows matched: 11856 Changed: 11856 Warnings: 0
analyze table ithelpmy190924b;
SELECT table_name
, data_length / 1024 AS 'data(K)'
, row_format
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name LIKE 'ithelpmy190924%'
ORDER BY table_name;
+-----------------+------------+------------+
| TABLE_NAME | data(K) | ROW_FORMAT |
+-----------------+------------+------------+
| ithelpmy190924a | 31296.0000 | Dynamic |
| ithelpmy190924b | 73328.0000 | Dynamic |
+-----------------+------------+------------+
果然變胖了...
在以前比較舊的資料庫, 一些不常使用的欄位, 因為處理時,還是有佔空間的疑慮(這方面
你可以自行安排測試,相信會很精彩),所以會另外建立一個table, 與基本table做一對一對應.
再建立 view, 當需要查完整資料時,查view,當查主要資料時,查基本 table.
經過上面的觀察,現在比較新的資料庫,使用了 text 型態,會將額外的部分,另外存放.
在背景中,幫我們處理掉了.
另外說明一下, 在 Postgresql中是將文字型態資料另外放,基本table中實際上是放指標.
與 MySQL 的作法,各有千秋.
以上供你參考.期待你做的測試.
經過上面的觀察,現在比較新的資料庫,使用了 text 型態,會將額外的部分,另外存放.
在背景中,幫我們處理掉了.
如果,mysql會另外存放 text 欄位,那我想,有沒有拆開成二個 table 就沒差了。
另外,您提到的那個討論串,我也仔細看了,謝謝您的資訊。
要注意版本....
我想,等到那一天,mysql 支援 horizontal partition 時,
我們就不用傷腦筋啦。
http://download.nust.na/pub6/mysql/tech-resources/articles/performance-partitioning.html
A Quick Side Note on Vertical Partitioning
Although MySQL 5.1 automates horizontal partitioning, don't lose sight of vertical partitioning schemes when designing your databases. Although you have to do vertical partitioning manually, you can benefit from the practice in certain circumstances. For example, let's say you didn't normally need to reference or use the VARCHAR column defined in our previously shown partitioned table. Would the elimination of this column help query speed? Let's find out:
mysql> desc part_tab;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| c1 | int(11) | YES | | NULL | |
| c2 | varchar(30) | YES | | NULL | |
| c3 | date | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.03 sec)
mysql> alter table part_tab drop column c2;
Query OK, 8000000 rows affected (42.20 sec)
Records: 8000000 Duplicates: 0 Warnings: 0
mysql> desc part_tab;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| c1 | int(11) | YES | | NULL | |
| c3 | date | YES | | NULL | |
+-------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)
mysql> select count(*) from part_tab where
-> c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
| 795181 |
+----------+
1 row in set (0.34 sec)
By removing the VARCHAR column from the design, you actually get another 90+% reduction in query response time. Beyond partitioning, this speaks to the effect wide tables can have on queries and why you should always ensure that all columns defined to a table are actually needed.
以結論來看,把不常用的 text 欄位分割獨立出去,還是有益的。
若你們的table,有一大堆欄位,那不常用的,就另外table,會比較好.
但是,為何要弄一大堆不常用的欄位,這才是要好好討論設計的地方.
以我觀察,有好多都是一字長蛇陣啊...
不會是一大堆啦,但 memo 最頭大,使用者可能打一大串資料,佔據了很大的空間,但其實,百年才會去看一次。
以各類代碼檔來說,客戶代碼,存貨代碼,其中,實際常用資料沒佔多少 byte , 佔最多的,通常是備註。
舉個例子,我的客戶在他的員工的某一天考核備註欄上寫著:
接班前一天就已敲定Day日14:00到班,耍天兵,因下雨給我遲到2.5小時,理由超爛。Day日近中午時打電話來說下大雨,可能會晚一些到,我說沒關係,路上注意安全,然黃見就傻傻的等我電話,待14:20客人問看護怎麼還未到,我去電問黃,黃竟然說在等我電話等通知何時幾點要到客戶處,黃還說我有錯,可以晚到的話怎麼沒跟她通知。然後還說要18:00到,我電話中當場開罵,問說雨勢減小怎麼還不出發,為何還要等4個小時。
這種的就另外開Table放吧,可以弄個 master detail, 也不用一一對應啊,一對多,需要的就能放多筆,不需要的當然就沒有這樣空間省下來了.
基本表那裡弄個boolean 欄位,代表有 memo.要查有沒有memo時,
也不用去 join,但這樣是否比較好也不一定啦.還是要看實際情況.
只是一種方法,你參考看看.
基本表那裡弄個boolean 欄位,代表有 memo.要查有沒有memo時,
這倒是一個好點子,我以前沒想過,感恩。
分而治之.話說以前有個dBASE III+, 他有一個 memo 型態,
就不是放在 dbf檔,會另外有 dbt檔,在 FoxBase 是 fpt檔.
也是類似的概念.
理論上,
如果不是SELECT * 基本上是沒有影響,
反而是筆數的影響更大,
一般是建議只SELECT需要的欄位就好,
也比較不會影響讀取的速度.
select 絕對要養成好習慣
只讀需要得欄位就好
曾經菜鳥的時候
前輩整個案子用select *
上線前離職
那一週直接崩潰.....
照理來說 如果沒有類似text型態的欄位是還好