Mha procedure

Published in: Technology
Transcript of "Mha procedure"

  1. MHA Failover Process Walkthrough
     DBA Team, March 2013
     Document revision history (columns: Date, Version, Description, Author, Reviewer): 2013-03-27, 邱伟胜
  2. Contents
     1. MHA scenario
     2. MHA failover process
        2.1 Phase 1: Configuration Check Phase
        2.2 Phase 2: Dead Master Shutdown Phase
        2.3 Phase 3: Master Recovery Phase
        2.4 Phase 4: Slaves Recovery Phase
        2.5 Phase 5: New master cleanup phase
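  3. 1. MHA scenario
     In the cluster below, the master and the slaves were deliberately brought out of sync by hand. On the master, table qwsh holds four rows, while 10.0.0.75 holds only one, as the table below shows.

         10.0.0.13 (current master)
          +--10.0.0.74
          +--10.0.0.11
          +--10.0.0.75

         Server     Role                      Table  Column  Rows
         10.0.0.13  Master                    qwsh   Aa int  1,2,3,4
         10.0.0.11  Slave                     qwsh   Aa int  1,2,3
         10.0.0.74  Slave (candidate master)  qwsh   Aa int  1,2
         10.0.0.75  Slave                     qwsh   Aa int  1

     2. MHA failover process
     The following walks through a manual failover in detail.

     2.1 Phase 1: Configuration Check Phase
     Checks the state of every node: (1) which servers are dead and which are alive; (2) which server is the primary candidate for the new master, and so on.

     2.2 Phase 2: Dead Master Shutdown Phase
     (1) Checks whether the dead master is still reachable over SSH.
     (2) Cleans up the dead master, for example by disabling the VIP or shutting down the host.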
  4. 2.3 Phase 3: Master Recovery Phase

     2.3.1 Phase 3.1: Getting Latest Slaves Phase
     Based on each slave's replication status, determines the latest slaves (mysql-bin.000034:250773) and the oldest slaves (mysql-bin.000034:250405).

     2.3.2 Phase 3.2: Saving Dead Master's Binlog Phase
     If the dead master is still reachable over SSH, saves the binlog events that the master wrote after the latest slave's position (starting at mysql-bin.000034:250773):

         save_binary_logs --command=save \
           --start_file=mysql-bin.000034 --start_pos=250773 \
           --binlog_dir=/data/mysql/arch \
           --output_file=/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog \
           --handle_raw_binlog=1 --disable_log_bin=0 \
           --manager_version=0.55

     The saved binlog contains:

         [root@db-13~]# mysqlbinlog /var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog
         /*!40019 SET @@session.max_insert_delayed_threads=0*/;
         /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
         DELIMITER /*!*/;
         # at 4
         #130325 10:40:31 server id 1 end_log_pos 107 Start: binlog v 4, server v 5.5.27-log created 130325 10:40:31 at startup
         ROLLBACK/*!*/;
         BINLOG '
         H7lPUQ8BAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAfuU9REzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
         '/*!*/;
         # at 107
         #130325 14:18:47 server id 1 end_log_pos 250841 Query thread_id=21 exec_time=0 error_code=0
         SET TIMESTAMP=1364192327/*!*/;
         SET @@session.pseudo_thread_id=21/*!*/;
  5.     SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
         SET @@session.sql_mode=0/*!*/;
         SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
         /*!\C utf8 *//*!*/;
         SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
         SET @@session.lc_time_names=0/*!*/;
         SET @@session.collation_database=DEFAULT/*!*/;
         BEGIN/*!*/;
         # at 175
         #130325 14:18:47 server id 1 end_log_pos 250930 Query thread_id=21 exec_time=0 error_code=0
         use test/*!*/;
         SET TIMESTAMP=1364192327/*!*/;
         insert into qwsh values(4)/*!*/;
         # at 264
         #130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
         COMMIT/*!*/;
         # at 291
         #130325 14:19:42 server id 1 end_log_pos 250976 Stop
         DELIMITER ;
         # End of log file
         ROLLBACK /* added by mysqlbinlog */;
         /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

     2.3.3 Phase 3.3: Determining New Master Phase
     Checks whether the latest slave has all the relay logs needed to recover the other slaves (oldest position: mysql-bin.000034:250405). Then picks the new master according to the candidate rules (checking, among other things, whether candidate_master=1 or no_master=1 is set):

         apply_diff_relay_logs --command=find \
           --latest_mlf=mysql-bin.000034 --latest_rmlp=250773 \
           --target_mlf=mysql-bin.000034 --target_rmlp=250405 \
           --server_id=3 --workdir=/var/tmp \
           --timestamp=20130325143805 --manager_version=0.55 \
  6.       --relay_log_info=/data/mysql/data/relay-log.info \
           --relay_dir=/data/mysql/data/

     2.3.4 Phase 3.4: New Master Diff Log Generation Phase
     Compares the candidate master with the latest slave to decide whether a diff log must be generated (10.0.0.74 received relay logs up to mysql-bin.000034:250589; the latest slave, 10.0.0.11, up to mysql-bin.000034:250773):

         apply_diff_relay_logs --command=generate_and_send \
           --scp_user=root --scp_host=10.0.0.74 \
           --latest_mlf=mysql-bin.000034 --latest_rmlp=250773 \
           --target_mlf=mysql-bin.000034 --target_rmlp=250589 \
           --server_id=3 \
           --diff_file_readtolatest=/var/tmp/relay_from_read_to_latest_10.0.0.74_3306_20130325143805.binlog \
           --workdir=/var/tmp --timestamp=20130325143805 \
           --handle_raw_binlog=1 --disable_log_bin=0 \
           --manager_version=0.55 \
           --relay_log_info=/data/mysql/data/relay-log.info \
           --relay_dir=/data/mysql/data/

     The generated diff log contains:

         [root@db-11~]# mysqlbinlog /var/tmp/relay_from_read_to_latest_10.0.0.74_3306_20130325143805.binlog
         /*!40019 SET @@session.max_insert_delayed_threads=0*/;
         /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
         DELIMITER /*!*/;
         # at 4
         #130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server v 5.5.27-log created 130325 11:03:52
         BINLOG '
         mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
         '/*!*/;
         # at 107
         #700101 8:00:00 server id 1 end_log_pos 0 Rotate to mysql-bin.000034 pos: 107
         # at 150
         #130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server
  7.     v 5.5.27-log created 130325 10:40:31
         BINLOG '
         H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
         '/*!*/;
         # at 253
         #130325 14:12:19 server id 1 end_log_pos 250657 Query thread_id=21 exec_time=0 error_code=0
         SET TIMESTAMP=1364191939/*!*/;
         SET @@session.pseudo_thread_id=21/*!*/;
         SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
         SET @@session.sql_mode=0/*!*/;
         SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
         /*!\C utf8 *//*!*/;
         SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
         SET @@session.lc_time_names=0/*!*/;
         SET @@session.collation_database=DEFAULT/*!*/;
         BEGIN/*!*/;
         # at 321
         #130325 14:12:19 server id 1 end_log_pos 250746 Query thread_id=21 exec_time=0 error_code=0
         use test/*!*/;
         SET TIMESTAMP=1364191939/*!*/;
         insert into qwsh values(3)/*!*/;
         # at 410
         #130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
         COMMIT/*!*/;
         # at 437
         #130325 14:12:36 server id 3 end_log_pos 250938 Stop
         DELIMITER ;
         # End of log file
         ROLLBACK /* added by mysqlbinlog */;
         /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
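The position bookkeeping behind Phases 3.1 and 3.4 can be sketched roughly as follows. This is an illustrative Python sketch, not MHA's own code (MHA is written in Perl). The function name `diff_ranges` and the dictionary layout are assumptions made for illustration, and the sketch assumes every slave is reading the same master binlog file, as in this scenario. The positions are the ones reported in this walkthrough.

```python
# Hypothetical helper, not part of MHA.
# Each slave maps to (master_log_file, read_master_log_pos).
def diff_ranges(received):
    """Return (latest_host, oldest_host, ranges).

    ranges maps every slave that lags the latest slave to the
    (file, start_pos, end_pos) span of master binlog it is missing.
    """
    latest_host = max(received, key=lambda h: received[h])
    oldest_host = min(received, key=lambda h: received[h])
    latest_file, latest_pos = received[latest_host]
    ranges = {}
    for host, (log_file, pos) in received.items():
        if (log_file, pos) < (latest_file, latest_pos):
            # Simplification: the lagging slave is assumed to be
            # in the same binlog file as the latest slave.
            ranges[host] = (log_file, pos, latest_pos)
    return latest_host, oldest_host, ranges


received = {
    "10.0.0.11": ("mysql-bin.000034", 250773),
    "10.0.0.74": ("mysql-bin.000034", 250589),
    "10.0.0.75": ("mysql-bin.000034", 250405),
}
latest, oldest, ranges = diff_ranges(received)
for host, (log_file, start, end) in sorted(ranges.items()):
    print(f"{host} needs {log_file}:{start}..{end} from {latest}")
```

The two spans correspond to the `--target_rmlp` and `--latest_rmlp` values passed to `apply_diff_relay_logs --command=generate_and_send` for 10.0.0.74 and 10.0.0.75.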
  8. 2.3.5 Phase 3.5: Master Log Apply Phase
     (1) Waits until all relay logs are applied.
     (2) Concatenates the logs from the latest slave and the dead master. Because some log events may be incomplete, they are checked during concatenation:

         All apply target binary logs are concatinated at /var/tmp/total_binlog_for_10.0.0.74_3306.20130325143805.binlog .

     The concatenated log contains:

         [mysql@db-74 ~]$ mysqlbinlog /var/tmp/total_binlog_for_10.0.0.74_3306.20130325143805.binlog
         /*!40019 SET @@session.max_insert_delayed_threads=0*/;
         /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
         DELIMITER /*!*/;
         # at 4
         #130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server v 5.5.27-log created 130325 11:03:52
         BINLOG '
         mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
         '/*!*/;
         # at 107
         #700101 8:00:00 server id 1 end_log_pos 0 Rotate to mysql-bin.000034 pos: 107
         # at 150
         #130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server v 5.5.27-log created 130325 10:40:31
         BINLOG '
         H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
         '/*!*/;
         # at 253
         #130325 14:12:19 server id 1 end_log_pos 250657 Query thread_id=21 exec_time=0 error_code=0
         SET TIMESTAMP=1364191939/*!*/;
         SET @@session.pseudo_thread_id=21/*!*/;
         SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
         SET @@session.sql_mode=0/*!*/;
  9.     SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
         /*!\C utf8 *//*!*/;
         SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
         SET @@session.lc_time_names=0/*!*/;
         SET @@session.collation_database=DEFAULT/*!*/;
         BEGIN/*!*/;
         # at 321
         #130325 14:12:19 server id 1 end_log_pos 250746 Query thread_id=21 exec_time=0 error_code=0
         use test/*!*/;
         SET TIMESTAMP=1364191939/*!*/;
         insert into qwsh values(3)/*!*/;
         # at 410
         #130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
         COMMIT/*!*/;
         # at 437
         #130325 14:12:36 server id 3 end_log_pos 250938 Stop
         # at 456
         #130325 14:18:47 server id 1 end_log_pos 250841 Query thread_id=21 exec_time=0 error_code=0
         SET TIMESTAMP=1364192327/*!*/;
         BEGIN/*!*/;
         # at 524
         #130325 14:18:47 server id 1 end_log_pos 250930 Query thread_id=21 exec_time=0 error_code=0
         SET TIMESTAMP=1364192327/*!*/;
         insert into qwsh values(4)/*!*/;
         # at 613
         #130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
         COMMIT/*!*/;
         # at 640
         #130325 14:19:42 server id 1 end_log_pos 250976 Stop
         DELIMITER ;
         # End of log file
         ROLLBACK /* added by mysqlbinlog */;
         /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
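To check what a concatenated file like this one would replay, the text output of mysqlbinlog can be scanned for event headers and statements. The following is a rough Python sketch under stated assumptions: it expects the line layout shown in this walkthrough (one header line per event, statement-based events, each statement ending in `/*!*/;` on the same line), and `summarize` and `statements` are hypothetical helper names, not part of MHA or MySQL. The sample is an abridged excerpt of the dump for 10.0.0.74.

```python
import re

# Event header lines look like:
#   #130325 14:12:19 server id 1 end_log_pos 250746 Query thread_id=21 ...
HEADER_RE = re.compile(
    r"^#\d{6}\s+\d{1,2}:\d{2}:\d{2}\s+server id\s+(\d+)\s+end_log_pos\s+(\d+)\s+(\w+)"
)
TERMINATOR = "/*!*/;"


def summarize(dump):
    """Return (server_id, end_log_pos, event_type) for each event header."""
    events = []
    for line in dump.splitlines():
        match = HEADER_RE.match(line.strip())
        if match:
            events.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return events


def statements(dump):
    """Return the INSERT statements in the dump, in replay order."""
    found = []
    for line in dump.splitlines():
        line = line.strip()
        if line.lower().startswith("insert ") and line.endswith(TERMINATOR):
            found.append(line[: -len(TERMINATOR)])
    return found


SAMPLE = """
# at 321
#130325 14:12:19 server id 1 end_log_pos 250746 Query thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364191939/*!*/;
insert into qwsh values(3)/*!*/;
# at 410
#130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
COMMIT/*!*/;
# at 437
#130325 14:12:36 server id 3 end_log_pos 250938 Stop
# at 524
#130325 14:18:47 server id 1 end_log_pos 250930 Query thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
insert into qwsh values(4)/*!*/;
"""

for server_id, end_pos, kind in summarize(SAMPLE):
    print(server_id, end_pos, kind)
print(statements(SAMPLE))
```

On this excerpt the statements found are the inserts of rows 3 and 4, which are exactly the rows 10.0.0.74 was missing. The Stop event with server id 3 comes from the latest slave's relay log rather than from the dead master.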
  10. (3) Records the new master's log file and position:

          All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.74', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=475, MASTER_USER='repl', MASTER_PASSWORD='xxx';

      (4) Executes the master IP activation script.
      (5) Sets read_only=0 on the new master.

      2.4 Phase 4: Slaves Recovery Phase

      2.4.1 Phase 4.1: Starting Parallel Slave Diff Log Generation Phase
      Determines whether each slave's relay logs differ from those of the latest slave. The command below runs on the latest slave, generates the diff relay log file, and copies it to the corresponding slave with scp. (Server 10.0.0.75 received relay logs up to: mysql-bin.000034:250405. Need to get diffs from the latest slave(10.0.0.11) up to: mysql-bin.000034:250773)

          apply_diff_relay_logs --command=generate_and_send \
            --scp_user=root --scp_host=10.0.0.75 \
            --latest_mlf=mysql-bin.000034 --latest_rmlp=250773 \
            --target_mlf=mysql-bin.000034 --target_rmlp=250405 \
            --server_id=3 \
            --diff_file_readtolatest=/var/tmp/relay_from_read_to_latest_10.0.0.75_3306_20130325143805.binlog \
            --workdir=/var/tmp --timestamp=20130325143805 \
            --handle_raw_binlog=1 --disable_log_bin=0 \
            --manager_version=0.55 \
            --relay_log_info=/data/mysql/data/relay-log.info \
            --relay_dir=/data/mysql/data/

      2.4.2 Phase 4.2: Starting Parallel Slave Log Apply Phase
      (1) Waits until all relay logs are applied.
      (2) Checks whether the slave already has the latest relay logs, then concatenates the required logs and applies them.
      10.0.0.11 has the latest relay log:
  11.     apply_diff_relay_logs --command=apply \
            --slave_user=root --slave_host=10.0.0.11 \
            --slave_ip=10.0.0.11 --slave_port=3306 \
            --apply_files=/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog \
            --workdir=/var/tmp --target_version=5.5.27-log \
            --timestamp=20130325143805 \
            --handle_raw_binlog=1 --disable_log_bin=0 \
            --manager_version=0.55 --slave_pass=xxx

      10.0.0.75 does not have the latest relay log, so the relay log diff must be concatenated with the dead master's binlog:

          apply_diff_relay_logs --command=apply \
            --slave_user=root --slave_host=10.0.0.75 \
            --slave_ip=10.0.0.75 --slave_port=3306 \
            --apply_files=/var/tmp/relay_from_read_to_latest_10.0.0.75_3306_20130325143805.binlog,/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog \
            --workdir=/var/tmp --target_version=5.5.27-log \
            --timestamp=20130325143805 \
            --handle_raw_binlog=1 --disable_log_bin=0 \
            --manager_version=0.55 --slave_pass=xxx

      The concatenated log contains:

          [mysql@db-75 data]$ mysqlbinlog /var/tmp/total_binlog_for_10.0.0.75_3306.20130325143805.binlog
          /*!40019 SET @@session.max_insert_delayed_threads=0*/;
          /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
          DELIMITER /*!*/;
          # at 4
          #130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server v 5.5.27-log created 130325 11:03:52
          BINLOG '
          mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
          '/*!*/;
          # at 107
          #700101 8:00:00 server id 1 end_log_pos 0 Rotate to mysql-bin.000034 pos: 107
          # at 150
          #130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server v 5.5.27-log created 130325 10:40:31
          BINLOG '
          H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
          '/*!*/;
  12.     # at 253
          #130325 14:09:57 server id 1 end_log_pos 250473 Query thread_id=21 exec_time=0 error_code=0
          SET TIMESTAMP=1364191797/*!*/;
          SET @@session.pseudo_thread_id=21/*!*/;
          SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
          SET @@session.sql_mode=0/*!*/;
          SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
          /*!\C utf8 *//*!*/;
          SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
          SET @@session.lc_time_names=0/*!*/;
          SET @@session.collation_database=DEFAULT/*!*/;
          BEGIN/*!*/;
          # at 321
          #130325 14:09:57 server id 1 end_log_pos 250562 Query thread_id=21 exec_time=0 error_code=0
          use test/*!*/;
          SET TIMESTAMP=1364191797/*!*/;
          insert into qwsh values(2)/*!*/;
          # at 410
          #130325 14:09:57 server id 1 end_log_pos 250589 Xid = 2423
          COMMIT/*!*/;
          # at 437
          #130325 14:12:19 server id 1 end_log_pos 250657 Query thread_id=21 exec_time=0 error_code=0
          SET TIMESTAMP=1364191939/*!*/;
          BEGIN/*!*/;
          # at 505
          #130325 14:12:19 server id 1 end_log_pos 250746 Query thread_id=21 exec_time=0 error_code=0
          SET TIMESTAMP=1364191939/*!*/;
          insert into qwsh values(3)/*!*/;
          # at 594
          #130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
          COMMIT/*!*/;
  13.     # at 621
          #130325 14:12:36 server id 3 end_log_pos 250938 Stop
          # at 640
          #130325 14:18:47 server id 1 end_log_pos 250841 Query thread_id=21 exec_time=0 error_code=0
          SET TIMESTAMP=1364192327/*!*/;
          BEGIN/*!*/;
          # at 708
          #130325 14:18:47 server id 1 end_log_pos 250930 Query thread_id=21 exec_time=0 error_code=0
          SET TIMESTAMP=1364192327/*!*/;
          insert into qwsh values(4)/*!*/;
          # at 797
          #130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
          COMMIT/*!*/;
          # at 824
          #130325 14:19:42 server id 1 end_log_pos 250976 Stop
          DELIMITER ;
          # End of log file
          ROLLBACK /* added by mysqlbinlog */;
          /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

      (3) Executes CHANGE MASTER on the slave.

      2.5 Phase 5: New master cleanup phase
      Resets the slave info on the new master.
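The choice of files handed to `--apply_files` in Phase 4.2 can be summarized in a short sketch. This is illustrative Python, not MHA's implementation. The helper name `apply_files` and its parameters are assumptions, and the file naming simply mirrors the paths seen in this walkthrough. A slave that lags the latest slave first gets the relay log diff, and every slave then gets the binlog saved from the dead master, if one could be saved.

```python
# Hypothetical helper, not part of MHA.
def apply_files(host, received_pos, latest_pos, saved_master_binlog,
                workdir="/var/tmp", port=3306, timestamp="20130325143805"):
    """Return the files to apply on a slave, in replay order."""
    files = []
    if received_pos < latest_pos:
        # The slave lags the latest slave: apply the relay log diff first.
        files.append(
            f"{workdir}/relay_from_read_to_latest_{host}_{port}_{timestamp}.binlog"
        )
    if saved_master_binlog:
        # Saved only when the dead master was still reachable over SSH.
        files.append(saved_master_binlog)
    return files


SAVED = "/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog"
plan = {
    "10.0.0.11": apply_files("10.0.0.11", 250773, 250773, SAVED),
    "10.0.0.75": apply_files("10.0.0.75", 250405, 250773, SAVED),
}
for host, files in plan.items():
    print(host, "--apply_files=" + ",".join(files))
```

The order matters: the relay log diff covers positions up to the latest slave's position (250773 here), and the dead master's saved binlog starts from that position.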
