MHA Failover Process Explained




                   DBA Team
                  March 2013

                  Document Revision History


     Date        Version   Description   Author    Reviewer

2013-03-27                               邱伟胜




Contents

1. MHA Scenario
2. MHA Failover Process
    2.1 Phase 1: Configuration Check Phase
    2.2 Phase 2: Dead Master Shutdown Phase
    2.3 Phase 3: Master Recovery Phase
    2.4 Phase 4: Slaves Recovery Phase
    2.5 Phase 5: New Master Cleanup Phase




1. MHA Scenario

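In the cluster below, the master and the slaves were manually brought into an inconsistent state. As the table below shows, table qwsh has four rows on the master but only one row on 10.0.0.75: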
10.0.0.13 (current master)
 +--10.0.0.74
 +--10.0.0.11
 +--10.0.0.75

Server        Role                       Table   Column     Rows
10.0.0.13     Master                     qwsh    aa (int)   1,2,3,4
10.0.0.11     Slave                      qwsh    aa (int)   1,2,3
10.0.0.74     Slave (candidate master)   qwsh    aa (int)   1,2
10.0.0.75     Slave                      qwsh    aa (int)   1




2. MHA Failover Process

The following walks through a manual failover in detail, phase by phase:


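2.1 Phase 1: Configuration Check Phase

This phase mainly checks the status of each node:
First, which servers are dead and which are alive;
Second, which server is the primary candidate for the new master, among other things.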



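2.2 Phase 2: Dead Master Shutdown Phase

First, MHA checks whether the dead master can still be reached over SSH.
Second, it performs some handling on the dead master, such as disabling the VIP and shutting down the host.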




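2.3 Phase 3: Master Recovery Phase

2.3.1 Phase 3.1: Getting Latest Slaves Phase

MHA compares the master binlog position that each slave has received. From this it identifies the latest slave (10.0.0.11, at mysql-bin.000034:250773) and the oldest slave (10.0.0.75, at mysql-bin.000034:250405).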
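Below is a minimal Python sketch of how the latest and oldest slaves can be picked. It assumes each slave's progress is given as the (Master_Log_File, Read_Master_Log_Pos) pair it has received from the dead master. The positions come from this walkthrough. The function names are hypothetical, and this is an illustration, not MHA's own code (MHA itself is written in Perl).

```python
from typing import Dict, List, Tuple

# (binlog file name, byte offset) received from the dead master
SlavePos = Tuple[str, int]

def binlog_key(pos: SlavePos) -> Tuple[int, int]:
    """Order coordinates by the numeric binlog file suffix, then by offset."""
    file_name, offset = pos
    return int(file_name.rsplit(".", 1)[1]), offset

def latest_and_oldest(slaves: Dict[str, SlavePos]):
    """Return (latest_hosts, latest_pos, oldest_hosts, oldest_pos)."""
    latest_pos = max(slaves.values(), key=binlog_key)
    oldest_pos = min(slaves.values(), key=binlog_key)
    latest: List[str] = sorted(h for h, p in slaves.items() if binlog_key(p) == binlog_key(latest_pos))
    oldest: List[str] = sorted(h for h, p in slaves.items() if binlog_key(p) == binlog_key(oldest_pos))
    return latest, latest_pos, oldest, oldest_pos

slaves = {
    "10.0.0.11": ("mysql-bin.000034", 250773),
    "10.0.0.74": ("mysql-bin.000034", 250589),
    "10.0.0.75": ("mysql-bin.000034", 250405),
}
print(latest_and_oldest(slaves))
# (['10.0.0.11'], ('mysql-bin.000034', 250773), ['10.0.0.75'], ('mysql-bin.000034', 250405))
```

Comparing the file suffix numerically keeps the ordering correct when slaves are spread across a binlog rotation (for example, mysql-bin.000035:4 is newer than mysql-bin.000034:250773).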



2.3.2 Phase 3.2: Saving Dead Master's Binlog Phase

If the dead master is still reachable over SSH, MHA saves the part of its binary log that the latest slave has not yet received, starting at mysql-bin.000034:250773:

save_binary_logs --command=save --start_file=mysql-bin.000034 \
  --start_pos=250773 --binlog_dir=/data/mysql/arch \
  --output_file=/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog \
  --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55
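The start coordinates passed to save_binary_logs are simply the latest slave's received position. The Python sketch below assembles the same argument list from those coordinates. `build_save_cmd` is a hypothetical helper. The flags and the output-file naming are copied from the command above, not taken from MHA's source.

```python
from typing import List

def build_save_cmd(master_ip: str, master_port: int, latest_file: str,
                   latest_pos: int, binlog_dir: str, timestamp: str) -> List[str]:
    """Build a save_binary_logs argument list mirroring the command above."""
    output_file = (f"/var/tmp/saved_master_binlog_from_"
                   f"{master_ip}_{master_port}_{timestamp}.binlog")
    return [
        "save_binary_logs",
        "--command=save",
        f"--start_file={latest_file}",
        # Offset right after the last event the latest slave received
        f"--start_pos={latest_pos}",
        f"--binlog_dir={binlog_dir}",
        f"--output_file={output_file}",
        "--handle_raw_binlog=1",
        "--disable_log_bin=0",
        "--manager_version=0.55",
    ]

cmd = build_save_cmd("10.0.0.13", 3306, "mysql-bin.000034", 250773,
                     "/data/mysql/arch", "20130325143805")
print(" ".join(cmd))
```

Keeping the arguments as a list, rather than one string, makes them safe to pass to something like `subprocess.run` on the master host. The sketch itself executes nothing.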

The saved binlog contains the following:
[root@db-13 ~]# mysqlbinlog /var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#130325 10:40:31 server id 1 end_log_pos 107 Start: binlog v 4, server v 5.5.27-log created 130325 10:40:31 at startup
ROLLBACK/*!*/;
BINLOG '
H7lPUQ8BAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAfuU9REzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#130325 14:18:47 server id 1 end_log_pos 250841 Query thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
SET @@session.pseudo_thread_id=21/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 175
#130325 14:18:47 server id 1 end_log_pos 250930 Query thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364192327/*!*/;
insert into qwsh values(4)
/*!*/;
# at 264
#130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
COMMIT/*!*/;
# at 291
#130325 14:19:42 server id 1 end_log_pos 250976 Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
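To confirm what the saved binlog carries, you can scan the mysqlbinlog text output for committed transactions. Below is a rough Python sketch. It assumes the output format shown above: event header lines start with `#YYMMDD`, and each InnoDB commit appears as an `Xid = N` event. `committed_statements` is a hypothetical helper, and it only collects simple INSERT/UPDATE/DELETE/REPLACE lines, not multi-line statements.

```python
import re
from typing import List, Tuple

# Event header line, e.g. "#130325 14:18:47 server id 1  end_log_pos 250957 Xid = 2425"
HEADER_RE = re.compile(
    r"^#\d{6}\s+\d{1,2}:\d{2}:\d{2}\s+server id (\d+)\s+end_log_pos (\d+)\s+(.*)$")
DML_RE = re.compile(r"(?i)(insert|update|delete|replace)\b")

def committed_statements(text: str) -> List[Tuple[int, List[str]]]:
    """Return (commit end_log_pos, [DML statements]) for each Xid-terminated transaction."""
    result: List[Tuple[int, List[str]]] = []
    pending: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            if header.group(3).startswith("Xid"):
                result.append((int(header.group(2)), pending))
                pending = []
            continue
        # Session SETs, comments, BEGIN/COMMIT and delimiters are ignored.
        if DML_RE.match(line):
            pending.append(line)
    return result

SAMPLE = """# at 175
#130325 14:18:47 server id 1  end_log_pos 250930 Query thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364192327/*!*/;
insert into qwsh values(4)
/*!*/;
# at 264
#130325 14:18:47 server id 1  end_log_pos 250957 Xid = 2425
COMMIT/*!*/;
"""
print(committed_statements(SAMPLE))  # [(250957, ['insert into qwsh values(4)'])]
```

Applied to the output above, this yields the single transaction the latest slave is missing: the insert of value 4, committed at master position 250957.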




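2.3.3 Phase 3.3: Determining New Master Phase

MHA checks whether the latest slave has all the relay logs needed to recover the other slaves (back to the oldest position, mysql-bin.000034:250405). It then selects the new master according to the candidate rules, taking settings such as candidate_master=1 and no_master=1 into account: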

apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000034 \
  --latest_rmlp=250773 --target_mlf=mysql-bin.000034 \
  --target_rmlp=250405 --server_id=3 --workdir=/var/tmp \
  --timestamp=20130325143805 --manager_version=0.55 \
  --relay_log_info=/data/mysql/data/relay-log.info \
  --relay_dir=/data/mysql/data/
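Below is a simplified Python sketch of the selection rules. It assumes a priority order of:

1. hosts with candidate_master=1, preferring those that are already latest;
2. the latest slaves;
3. any remaining slave.

Hosts with no_master=1 are always excluded. This is an approximation for illustration only. MHA's real checks (for example on replication delay) are more involved, and the `Slave` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Slave:
    host: str
    is_latest: bool
    candidate_master: bool = False   # mirrors candidate_master=1
    no_master: bool = False          # mirrors no_master=1

def choose_new_master(slaves: List[Slave]) -> Optional[str]:
    """Pick a new master using a simplified tiered priority."""
    eligible = [s for s in slaves if not s.no_master]
    candidates = [s for s in eligible if s.candidate_master]
    tiers = [
        [s for s in candidates if s.is_latest],  # candidate and already latest
        candidates,                              # candidate, may need diff logs
        [s for s in eligible if s.is_latest],    # latest slave
        eligible,                                # any slave not excluded
    ]
    for tier in tiers:
        if tier:
            return tier[0].host
    return None  # no host may become master

cluster = [
    Slave("10.0.0.11", is_latest=True),
    Slave("10.0.0.74", is_latest=False, candidate_master=True),
    Slave("10.0.0.75", is_latest=False),
]
print(choose_new_master(cluster))  # 10.0.0.74
```

This matches the scenario. 10.0.0.74 is chosen even though it is behind 10.0.0.11, because it is the configured candidate master. That is why the next phase has to generate a differential log for it.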



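2.3.4 Phase 3.4: New Master Diff Log Generation Phase

MHA compares the candidate master with the latest slave to decide whether a differential log must be generated. Here, 10.0.0.74 has received relay logs up to mysql-bin.000034:250589, while the latest slave (10.0.0.11) has received them up to mysql-bin.000034:250773. The difference is therefore generated on the latest slave and sent to 10.0.0.74: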

apply_diff_relay_logs --command=generate_and_send --scp_user=root \
  --scp_host=10.0.0.74 --latest_mlf=mysql-bin.000034 \
  --latest_rmlp=250773 --target_mlf=mysql-bin.000034 \
  --target_rmlp=250589 --server_id=3 \
  --diff_file_readtolatest=/var/tmp/relay_from_read_to_latest_10.0.0.74_3306_20130325143805.binlog \
  --workdir=/var/tmp --timestamp=20130325143805 --handle_raw_binlog=1 \
  --disable_log_bin=0 --manager_version=0.55 \
  --relay_log_info=/data/mysql/data/relay-log.info \
  --relay_dir=/data/mysql/data/
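The differential log for the new master covers exactly the master binlog range that the candidate is missing. Below is a Python sketch of that arithmetic. It assumes both coordinates are in the same binlog file, which is true here; a rotation in between would need a list of files instead. `diff_range` is a hypothetical helper.

```python
from typing import Optional, Tuple

def diff_range(target: Tuple[str, int],
               latest: Tuple[str, int]) -> Optional[Tuple[int, int]]:
    """Return the (start, end) master offsets the target lacks, or None if it is up to date."""
    (t_file, t_pos), (l_file, l_pos) = target, latest
    if t_file != l_file:
        raise ValueError("sketch only handles a single binlog file")
    if t_pos >= l_pos:
        return None  # target already has everything the latest slave has
    return t_pos, l_pos

rng = diff_range(("mysql-bin.000034", 250589), ("mysql-bin.000034", 250773))
print(rng, rng[1] - rng[0])  # (250589, 250773) 184
```

The 184 bytes agree with the events in the differential log shown below:

- BEGIN, ending at end_log_pos 250657 (68 bytes);
- the insert of value 3, ending at 250746 (89 bytes);
- its Xid commit, ending at 250773 (27 bytes).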

The differential log contains the following:
[root@db-11 ~]# mysqlbinlog /var/tmp/relay_from_read_to_latest_10.0.0.74_3306_20130325143805.binlog
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server v 5.5.27-log created 130325 11:03:52
BINLOG '
mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#700101 8:00:00 server id 1 end_log_pos 0 Rotate to mysql-bin.000034 pos: 107
# at 150
#130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server v 5.5.27-log created 130325 10:40:31
BINLOG '
H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 253
#130325 14:12:19 server id 1 end_log_pos 250657 Query thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191939/*!*/;
SET @@session.pseudo_thread_id=21/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 321
#130325 14:12:19 server id 1 end_log_pos 250746 Query thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364191939/*!*/;
insert into qwsh values(3)
/*!*/;
# at 410
#130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
COMMIT/*!*/;
# at 437
#130325 14:12:36 server id 3 end_log_pos 250938 Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;




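2.3.5 Phase 3.5: Master Log Apply Phase

First, MHA waits until all relay logs are applied on the new master.

Second, MHA concatenates the differential log from the latest slave with the dead master's saved binlog, then applies the result to the new master. Because some events in these logs may be incomplete, the logs are checked during concatenation. MHA then reports: "All apply target binary logs are concatinated at /var/tmp/total_binlog_for_10.0.0.74_3306.20130325143805.binlog".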

The concatenated log contains the following:
[mysql@db-74 ~]$ mysqlbinlog /var/tmp/total_binlog_for_10.0.0.74_3306.20130325143805.binlog
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server v 5.5.27-log created 130325 11:03:52
BINLOG '
mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#700101 8:00:00 server id 1 end_log_pos 0 Rotate to mysql-bin.000034 pos: 107
# at 150
#130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server v 5.5.27-log created 130325 10:40:31
BINLOG '
H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 253
#130325 14:12:19 server id 1 end_log_pos 250657 Query thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191939/*!*/;
SET @@session.pseudo_thread_id=21/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 321
#130325 14:12:19 server id 1 end_log_pos 250746 Query thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364191939/*!*/;
insert into qwsh values(3)
/*!*/;
# at 410
#130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
COMMIT/*!*/;
# at 437
#130325 14:12:36 server id 3 end_log_pos 250938 Stop
# at 456
#130325 14:18:47 server id 1 end_log_pos 250841 Query thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
BEGIN
/*!*/;
# at 524
#130325 14:18:47 server id 1 end_log_pos 250930 Query thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
insert into qwsh values(4)
/*!*/;
# at 613
#130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
COMMIT/*!*/;
# at 640
#130325 14:19:42 server id 1 end_log_pos 250976 Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
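Comparing the offsets above with the two input files suggests how they were joined:

- The differential log is kept whole. It ends with a Stop event at offset 437, i.e. at byte 456.
- The saved master binlog is appended without its 4-byte magic header and without its first (format description) event.
- As a result, its query event that sat at offset 107 reappears at 456.

The Python sketch below implements that concatenation. It is based on our reading of these offsets, not on MHA's source. It assumes the binlog v4 event header, in which the 4-byte little-endian event length sits 9 bytes into each event.

```python
import struct
from typing import Iterable

BINLOG_MAGIC = b"\xfebin"   # every binlog / relay log file starts with these 4 bytes
EVENT_LEN_OFFSET = 9        # v4 header: timestamp(4) + type(1) + server_id(4), then length(4)

def first_event_length(data: bytes) -> int:
    """Length of the first event after the magic (the format description event)."""
    (length,) = struct.unpack_from("<I", data, len(BINLOG_MAGIC) + EVENT_LEN_OFFSET)
    return length

def concat_binlogs(files: Iterable[bytes]) -> bytes:
    """Keep the first file whole; strip magic + first event from every later file."""
    out = bytearray()
    for i, data in enumerate(files):
        if data[:4] != BINLOG_MAGIC:
            raise ValueError(f"input {i} is not a binlog file")
        if i == 0:
            out += data
        else:
            skip = len(BINLOG_MAGIC) + first_event_length(data)
            out += data[skip:]
    return bytes(out)
```

For the two files here, this gives 456 + (310 - 107) = 659 bytes. The saved binlog ends after its Stop event at 291 + 19. The result agrees with the final Stop event above at offset 640 (640 + 19 = 659).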
Third, MHA records the new master's binlog file and position:
All other slaves should start replication from here. Statement should be:
CHANGE MASTER TO MASTER_HOST='10.0.0.74', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=475, MASTER_USER='repl', MASTER_PASSWORD='xxx';
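The statement MHA prints is a standard CHANGE MASTER TO, built from the new master's current binlog coordinates. The Python sketch below formats it. `change_master_sql` is a hypothetical helper, and the password is the same placeholder as above. Its quoting is deliberately minimal, so treat it as illustration rather than a safe way to handle real credentials.

```python
def change_master_sql(host: str, port: int, log_file: str, log_pos: int,
                      user: str, password: str) -> str:
    """Format a CHANGE MASTER TO statement pointing a slave at the new master."""
    def quote(value: str) -> str:
        # Minimal escaping for a single-quoted SQL string literal.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return (f"CHANGE MASTER TO MASTER_HOST={quote(host)}, MASTER_PORT={port}, "
            f"MASTER_LOG_FILE={quote(log_file)}, MASTER_LOG_POS={log_pos}, "
            f"MASTER_USER={quote(user)}, MASTER_PASSWORD={quote(password)};")

print(change_master_sql("10.0.0.74", 3306, "mysql-bin.000003", 475, "repl", "xxx"))
```

Note that the coordinates come from the new master's own binlog (mysql-bin.000003:475), not from the dead master's mysql-bin.000034.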

Fourth, MHA executes the master IP activation script.
Fifth, MHA sets read_only=0 on the new master so that it accepts writes.




2.4 Phase 4: Slaves Recovery Phase

2.4.1 Phase 4.1: Starting Parallel Slave Diff Log Generation Phase

MHA determines whether each slave's relay log differs from the latest slave's. For each slave that is behind, it runs the following command on the latest slave to generate a differential relay log file. It then copies that file to the slave via scp. Here, server 10.0.0.75 has received relay logs up to mysql-bin.000034:250405. It needs the differences from the latest slave (10.0.0.11) up to mysql-bin.000034:250773:
apply_diff_relay_logs --command=generate_and_send --scp_user=root \
  --scp_host=10.0.0.75 --latest_mlf=mysql-bin.000034 \
  --latest_rmlp=250773 --target_mlf=mysql-bin.000034 \
  --target_rmlp=250405 --server_id=3 \
  --diff_file_readtolatest=/var/tmp/relay_from_read_to_latest_10.0.0.75_3306_20130325143805.binlog \
  --workdir=/var/tmp --timestamp=20130325143805 --handle_raw_binlog=1 \
  --disable_log_bin=0 --manager_version=0.55 \
  --relay_log_info=/data/mysql/data/relay-log.info \
  --relay_dir=/data/mysql/data/



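2.4.2 Phase 4.2: Starting Parallel Slave Log Apply Phase

First, MHA waits until all relay logs are applied on each slave.

Second, MHA checks whether each slave already has the latest relay log, then concatenates the required logs and applies them.

10.0.0.11 already has the latest relay log, so only the dead master's saved binlog is applied: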
apply_diff_relay_logs --command=apply --slave_user='root' \
  --slave_host=10.0.0.11 --slave_ip=10.0.0.11 --slave_port=3306 \
  --apply_files=/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog \
  --workdir=/var/tmp --target_version=5.5.27-log \
  --timestamp=20130325143805 --handle_raw_binlog=1 --disable_log_bin=0 \
  --manager_version=0.55 --slave_pass=xxx

10.0.0.75 does not have the latest relay log. Its differential relay log and the dead master's saved binlog must therefore be concatenated and applied:
apply_diff_relay_logs --command=apply --slave_user='root' \
  --slave_host=10.0.0.75 --slave_ip=10.0.0.75 --slave_port=3306 \
  --apply_files=/var/tmp/relay_from_read_to_latest_10.0.0.75_3306_20130325143805.binlog,/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog \
  --workdir=/var/tmp --target_version=5.5.27-log \
  --timestamp=20130325143805 --handle_raw_binlog=1 --disable_log_bin=0 \
  --manager_version=0.55 --slave_pass=xxx
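The two apply_diff_relay_logs calls differ only in --apply_files. The Python sketch below captures that decision. It assumes that a slave already holding the latest relay log needs only the dead master's saved binlog, while any other slave first needs its differential relay log. The file-name patterns are copied from the commands above, and `apply_files` is a hypothetical helper.

```python
from typing import List

def apply_files(slave_ip: str, slave_port: int, is_latest: bool,
                master_ip: str, master_port: int, timestamp: str) -> List[str]:
    """List the files to apply on a slave, oldest events first."""
    saved = (f"/var/tmp/saved_master_binlog_from_"
             f"{master_ip}_{master_port}_{timestamp}.binlog")
    if is_latest:
        return [saved]  # relay log already complete; only the master's tail is missing
    diff = (f"/var/tmp/relay_from_read_to_latest_"
            f"{slave_ip}_{slave_port}_{timestamp}.binlog")
    return [diff, saved]  # order matters: the diff relay log precedes the master's tail

print(",".join(apply_files("10.0.0.75", 3306, False, "10.0.0.13", 3306, "20130325143805")))
```

The joined output reproduces the --apply_files value used for 10.0.0.75 above.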

The concatenated log for 10.0.0.75 contains the following:
[mysql@db-75 data]$ mysqlbinlog /var/tmp/total_binlog_for_10.0.0.75_3306.20130325143805.binlog
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server v 5.5.27-log created 130325 11:03:52
BINLOG '
mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#700101 8:00:00 server id 1 end_log_pos 0 Rotate to mysql-bin.000034 pos: 107
# at 150
#130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server v 5.5.27-log created 130325 10:40:31
BINLOG '
H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 253
#130325 14:09:57 server id 1 end_log_pos 250473 Query thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191797/*!*/;
SET @@session.pseudo_thread_id=21/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 321
#130325 14:09:57 server id 1 end_log_pos 250562 Query thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364191797/*!*/;
insert into qwsh values(2)
/*!*/;
# at 410
#130325 14:09:57 server id 1 end_log_pos 250589 Xid = 2423
COMMIT/*!*/;
# at 437
#130325 14:12:19 server id 1 end_log_pos 250657 Query thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191939/*!*/;
BEGIN
/*!*/;
# at 505
#130325 14:12:19 server id 1 end_log_pos 250746 Query thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191939/*!*/;
insert into qwsh values(3)
/*!*/;
# at 594
#130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
COMMIT/*!*/;
# at 621
#130325 14:12:36 server id 3 end_log_pos 250938 Stop
# at 640
#130325 14:18:47 server id 1 end_log_pos 250841 Query thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
BEGIN
/*!*/;
# at 708
#130325 14:18:47 server id 1 end_log_pos 250930 Query thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
insert into qwsh values(4)
/*!*/;
# at 797
#130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
COMMIT/*!*/;
# at 824
#130325 14:19:42 server id 1 end_log_pos 250976 Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

Third, MHA executes CHANGE MASTER on each slave so that it replicates from the new master.



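2.5 Phase 5: New Master Cleanup Phase

MHA resets the slave information on the new master ("Resetting slave info on the new master"), because it no longer replicates from the dead master.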



