(以下文章擷取自筆者的 blog,分享給大家)
上篇筆者說明了一些原理說明,這篇就來做個實驗。
有一個不錯用的工具 mysql_install_db.exe 可以直接開新的 Instance。
筆者用這個先開一個 Instance 在 Port 3307。
設定原來在 3306 的為 Master (server-id=1),3307 的為 Slave (server-id=2),都開啟 binlog (log-bin)。
Event 記得設成 Disabled on Slave。
接下來備份 Master
E:\backup>mysqldump -u sujunmin -p --master-data --all-databases --events --routines --gtid > all_db.sql
Enter password: ************
其中幾個重要參數
| 參數 | 意義 | 
|---|---|
| master-data | dump 出來的掛 master ID | 
| events | 包含 events | 
| routines | 包含 functions 與 store procedures | 
| gtid | 產生 CHANGE MASTER TO master_use_gtid語法 | 
觀察一下 all_db.sql 的內容
-- MySQL dump 10.16  Distrib 10.1.17-MariaDB, for Win64 (AMD64)
(中間省略)
--
-- Position to start replication or point-in-time recovery from
--
-- CHANGE MASTER TO MASTER_LOG_FILE='WSTest-bin.000002', MASTER_LOG_POS=3732;
--
-- GTID to start replication from
--
CHANGE MASTER TO MASTER_USE_GTID=slave_pos;
SET GLOBAL gtid_slave_pos='0-1-770';
--
-- Current Database: 'master'
--
(後面省略)
可以看到如果要用舊的方式(兼容 MySQL Replication) 與新的方式。
啟動 Slave 的 Instance。
把 all_db.sql 倒到 Slave (3307) 上面。
在 Master 上開一個 User lsuser 作為 Replication 用,權限是 REPLICATION SLAVE 與 SUPER (這個我不設定會沒權限登入,但是官方網站沒有這個權限)。
在 Slave 上設定 CHANGE MASTER
MariaDB [(none)]> change master to master_host='localhost', master_port=3306, master_user='lsuser', master_password='password';
Query OK, 0 rows affected (0.05 sec)
START SLAVE;
SHOW SLAVE STATUS;
 MariaDB [(none)]> show slave status\G
 *************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: localhost
                Master_User: lsuser
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: WSTest-bin.000002
        Read_Master_Log_Pos: 5429
             Relay_Log_File: WSTest-relay-bin.000002
              Relay_Log_Pos: 5757
      Relay_Master_Log_File: WSTest-bin.000002
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
                         (省略)
                 Using_Gtid: Slave_Pos
                Gtid_IO_Pos: 0-1-775
    Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
              Parallel_Mode: conservative
  1 row in set (0.00 sec)
看一下同步狀況
移除 Master 模擬 Master 壞掉了狀態
sc stop mariadb
sc delete mariadb
Slave 沒法連到 Master 了
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
             Slave_IO_State: Reconnecting after a failed master event read
                Master_Host: localhost
                Master_User: lsuser
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: WSTest-bin.000002
        Read_Master_Log_Pos: 8223
             Relay_Log_File: WSTest-relay-bin.000002
              Relay_Log_Pos: 8551
      Relay_Master_Log_File: WSTest-bin.000002
           Slave_IO_Running: Connecting
          Slave_SQL_Running: Yes
                         (省略)
              Last_IO_Errno: 2003
              Last_IO_Error: error reconnecting to master 'lsuser@localhost:3306' - retry-time: 60  retries: 86400  message: Can't connect to MySQL server on'localhost' (10061 "Unknown error")
             Last_SQL_Errno: 0
             Last_SQL_Error:
 Replicate_Ignore_Server_Ids:
           Master_Server_Id: 1
             Master_SSL_Crl:
         Master_SSL_Crlpath:
                 Using_Gtid: Slave_Pos
                Gtid_IO_Pos: 0-1-784
    Replicate_Do_Domain_Ids:
 Replicate_Ignore_Domain_Ids:
              Parallel_Mode: conservative
 1 row in set (0.00 sec)
STOP SLAVE
打開 Slave 的 Event 服務
繼續服務
重建 Master
E:\backup>mysqldump -u sujunmin -p --port 3307 --master-data --all-databases --events --routines --gtid > all_db.sql
Enter password: ************
觀察一下 all_db.sql 的內容
-- MySQL dump 10.16  Distrib 10.1.17-MariaDB, for Win64 (AMD64)
(中間省略)
--
-- Position to start replication or point-in-time recovery from
--
-- CHANGE MASTER TO MASTER_LOG_FILE='WSTest-bin.000003', MASTER_LOG_POS=12441;
--
-- GTID to start replication from
--
CHANGE MASTER TO MASTER_USE_GTID=slave_pos;
SET GLOBAL gtid_slave_pos='0-2-976';
--
-- Current Database: 'master'
--
(後面省略)
啟動 Master 的 Instance。
把 all_db.sql 倒到 Master (3307) 上面。
在 Slave 上設定 CHANGE MASTER 為 current_pos
MariaDB [(none)]> change master to master_host='localhost', master_port=3306, master_user='lsuser', master_password='password', master_use_gtid=current_pos;
Query OK, 0 rows affected (0.05 sec)
START SLAVE;
SHOW SLAVE STATUS;
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
                          (省略)
             Last_IO_Errno: 1236
             Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 0-2-980, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra erroneous transactions'
            Last_SQL_Errno: 0
            Last_SQL_Error:
Replicate_Ignore_Server_Ids:
          Master_Server_Id: 1
            Master_SSL_Crl:
        Master_SSL_Crlpath:
                Using_Gtid: Current_Pos
               Gtid_IO_Pos: 0-2-980
   Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
              Parallel_Mode: conservative
1 row in set (0.00 sec)
STOP SLAVE;
在 Slave 上設定 CHANGE MASTER 為 slave_pos
MariaDB [(none)]> change master to master_host='localhost', master_port=3306, master_user='lsuser', master_password='password', master_use_gtid=slave_pos;
Query OK, 0 rows affected (0.05 sec)
START SLAVE;
SHOW SLAVE STATUS;
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
            Slave_IO_State: Waiting for master to send event
               Master_Host: localhost
               Master_User: lsuser
               Master_Port: 3306
             Connect_Retry: 60
           Master_Log_File: WSTest-bin.000003
       Read_Master_Log_Pos: 7177
            Relay_Log_File: WSTest-relay-bin.000003
             Relay_Log_Pos: 7466
     Relay_Master_Log_File: WSTest-bin.000003
          Slave_IO_Running: Yes
         Slave_SQL_Running: Yes
                        (省略)
                Using_Gtid: Slave_Pos
               Gtid_IO_Pos: 0-1-1167
   Replicate_Do_Domain_Ids:
 Replicate_Ignore_Domain_Ids:
               Parallel_Mode: conservative
1 row in set (0.00 sec)
0-2-976, 0-1-1167) 都幫你做好了CREATE MASTER to ..., master_use_gtid=slave_pos;),因為 slave_pos 可以是多個的,例如 gtid_slave_pos='0-1-123,1-2-456,...'
stop slave;
set global sql_slave_skip_counter=1;
start slave;
select sleep(10);
show slave status;
雖然有人說不要太常用 sql_slave_skip_counter,但這個是我覺得重建外最快的方法了,到現在還沒有碰到其他問題。