Mha procedure

MHA Failover Process Walkthrough
DBA Team
March 2013

Document revision history (columns: Date, Version, Description, Author, Reviewer)
2013-03-27  邱伟胜
Contents
1. MHA scenario
2. MHA failover process
  2.1 Phase 1: Configuration Check Phase
  2.2 Phase 2: Dead Master Shutdown Phase
  2.3 Phase 3: Master Recovery Phase
  2.4 Phase 4: Slaves Recovery Phase
  2.5 Phase 5: New master cleanup phase
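1. MHA scenario

In the cluster below, the master and the slaves were deliberately made inconsistent by hand. As the table below shows, table qwsh has four rows on the master, while 10.0.0.75 has only one:

    10.0.0.13 (current master)
     +--10.0.0.74
     +--10.0.0.11
     +--10.0.0.75

    Server      Role                      Table  Column  Rows
    10.0.0.13   Master                    qwsh   Aa int  1,2,3,4
    10.0.0.11   Slave                     qwsh   Aa int  1,2,3
    10.0.0.74   Slave (candidate master)  qwsh   Aa int  1,2
    10.0.0.75   Slave                     qwsh   Aa int  1

2. MHA failover process

The following walks through a manual failover in detail.

2.1 Phase 1: Configuration Check Phase

This phase mainly checks the state of each node: first, whether it is dead or alive; second, which node is the primary candidate for the new master, and so on.

2.2 Phase 2: Dead Master Shutdown Phase

First, MHA checks whether the dead master is still reachable over SSH. Second, it performs some handling on the dead master, such as disabling the VIP or shutting down the host.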
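The inconsistency in the scenario table can be made concrete with a small sketch. The Python snippet below is illustrative only: `rows_by_server` and `missing_rows` are hypothetical names and not part of MHA; the hosts and row values are copied from the table above.

```python
# Row contents of table qwsh on each server, copied from the scenario table.
MASTER = "10.0.0.13"
rows_by_server = {
    "10.0.0.13": {1, 2, 3, 4},  # master
    "10.0.0.11": {1, 2, 3},     # slave
    "10.0.0.74": {1, 2},        # slave (candidate master)
    "10.0.0.75": {1},           # slave
}


def missing_rows(server):
    """Return the master rows that have not yet reached `server`, sorted."""
    return sorted(rows_by_server[MASTER] - rows_by_server[server])


for host in ("10.0.0.11", "10.0.0.74", "10.0.0.75"):
    print(host, "is missing", missing_rows(host))
```

The rest of the walkthrough is about how the failover closes exactly these gaps, so that every surviving server ends up with rows 1 through 4.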
2.3 Phase 3: Master Recovery Phase

2.3.1 Phase 3.1: Getting Latest Slaves Phase

Based on how far each slave has replicated, MHA determines the latest slaves (mysql-bin.000034:250773) and the oldest slaves (mysql-bin.000034:250405).

2.3.2 Phase 3.2: Saving Dead Master's Binlog Phase

If the dead master is still reachable over SSH, MHA fetches the binlog events that the latest slave has not yet received from the master (starting at mysql-bin.000034:250773):

    save_binary_logs --command=save --start_file=mysql-bin.000034 \
      --start_pos=250773 --binlog_dir=/data/mysql/arch \
      --output_file=/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog \
      --handle_raw_binlog=1 --disable_log_bin=0 \
      --manager_version=0.55

The content of the saved binlog is as follows:

    [root@db-13 ~]# mysqlbinlog /var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog
    /*!40019 SET @@session.max_insert_delayed_threads=0*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    # at 4
    #130325 10:40:31 server id 1 end_log_pos 107 Start: binlog v 4, server v 5.5.27-log created 130325 10:40:31 at startup
    ROLLBACK/*!*/;
    BINLOG H7lPUQ8BAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAfuU9REzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==/*!*/;
    # at 107
    #130325 14:18:47 server id 1 end_log_pos 250841 Query thread_id=21 exec_time=0 error_code=0
    SET TIMESTAMP=1364192327/*!*/;
    SET @@session.pseudo_thread_id=21/*!*/;
    SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
    SET @@session.sql_mode=0/*!*/;
    SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
    /*!C utf8 *//*!*/;
    SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
    SET @@session.lc_time_names=0/*!*/;
    SET @@session.collation_database=DEFAULT/*!*/;
    BEGIN/*!*/;
    # at 175
    #130325 14:18:47 server id 1 end_log_pos 250930 Query thread_id=21 exec_time=0 error_code=0
    use test/*!*/;
    SET TIMESTAMP=1364192327/*!*/;
    insert into qwsh values(4)/*!*/;
    # at 264
    #130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
    COMMIT/*!*/;
    # at 291
    #130325 14:19:42 server id 1 end_log_pos 250976 Stop
    DELIMITER ;
    # End of log file
    ROLLBACK /* added by mysqlbinlog */;
    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

2.3.3 Phase 3.3: Determining New Master Phase

MHA checks whether the latest slave holds all the relay logs needed to repair the other slaves (oldest position: mysql-bin.000034:250405). It then selects the new master according to the candidate rules (checking, among other things, whether candidate_master=1 or no_master=1 is set):

    apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000034 \
      --latest_rmlp=250773 --target_mlf=mysql-bin.000034 \
      --target_rmlp=250405 --server_id=3 --workdir=/var/tmp \
      --timestamp=20130325143805 --manager_version=0.55 \
      --relay_log_info=/data/mysql/data/relay-log.info \
      --relay_dir=/data/mysql/data/

2.3.4 Phase 3.4: New Master Diff Log Generation Phase

MHA compares the candidate master with the latest slave to decide whether a diff log must be generated (10.0.0.74 received relay logs up to: mysql-bin.000034:250589, the latest slave (10.0.0.11) up to: mysql-bin.000034:250773):

    apply_diff_relay_logs --command=generate_and_send --scp_user=root \
      --scp_host=10.0.0.74 --latest_mlf=mysql-bin.000034 \
      --latest_rmlp=250773 --target_mlf=mysql-bin.000034 \
      --target_rmlp=250589 --server_id=3 \
      --diff_file_readtolatest=/var/tmp/relay_from_read_to_latest_10.0.0.74_3306_20130325143805.binlog \
      --workdir=/var/tmp --timestamp=20130325143805 \
      --handle_raw_binlog=1 --disable_log_bin=0 \
      --manager_version=0.55 \
      --relay_log_info=/data/mysql/data/relay-log.info \
      --relay_dir=/data/mysql/data/

The content of the generated diff log is as follows:

    [root@db-11 ~]# mysqlbinlog /var/tmp/relay_from_read_to_latest_10.0.0.74_3306_20130325143805.binlog
    /*!40019 SET @@session.max_insert_delayed_threads=0*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    # at 4
    #130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server v 5.5.27-log created 130325 11:03:52
    BINLOG mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==/*!*/;
    # at 107
    #700101 8:00:00 server id 1 end_log_pos 0 Rotate to mysql-bin.000034 pos: 107
    # at 150
    #130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server v 5.5.27-log created 130325 10:40:31
    BINLOG H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==/*!*/;
    # at 253
    #130325 14:12:19 server id 1 end_log_pos 250657 Query thread_id=21 exec_time=0 error_code=0
    SET TIMESTAMP=1364191939/*!*/;
    SET @@session.pseudo_thread_id=21/*!*/;
    SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
    SET @@session.sql_mode=0/*!*/;
    SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
    /*!C utf8 *//*!*/;
    SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
    SET @@session.lc_time_names=0/*!*/;
    SET @@session.collation_database=DEFAULT/*!*/;
    BEGIN/*!*/;
    # at 321
    #130325 14:12:19 server id 1 end_log_pos 250746 Query thread_id=21 exec_time=0 error_code=0
    use test/*!*/;
    SET TIMESTAMP=1364191939/*!*/;
    insert into qwsh values(3)/*!*/;
    # at 410
    #130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
    COMMIT/*!*/;
    # at 437
    #130325 14:12:36 server id 3 end_log_pos 250938 Stop
    DELIMITER ;
    # End of log file
    ROLLBACK /* added by mysqlbinlog */;
    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
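The position comparisons in Phases 3.1 to 3.4 can be sketched in a few lines of Python. This is a simplified illustration and not MHA's implementation: the `Slave` class and all function names are hypothetical, and the selection rule in `pick_new_master` is an assumption that only honours the candidate_master and no_master settings mentioned above. The positions are the ones reported in this walkthrough.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    host: str
    master_log_file: str        # binlog file on the dead master
    read_master_log_pos: int    # how far this slave has received events
    candidate_master: bool = False
    no_master: bool = False


def position(slave):
    # Tuples compare element by element. Binlog file names carry a
    # zero-padded numeric suffix, so plain string comparison orders them.
    return (slave.master_log_file, slave.read_master_log_pos)


def latest_and_oldest(slaves):
    """Return (latest slaves, oldest slaves) by received position."""
    newest = max(position(s) for s in slaves)
    oldest = min(position(s) for s in slaves)
    return ([s for s in slaves if position(s) == newest],
            [s for s in slaves if position(s) == oldest])


def pick_new_master(slaves):
    # Assumed rule: skip no_master hosts, prefer candidate_master hosts,
    # otherwise fall back to the first eligible slave.
    eligible = [s for s in slaves if not s.no_master]
    if not eligible:
        raise RuntimeError("no slave is allowed to become master")
    preferred = [s for s in eligible if s.candidate_master]
    return (preferred or eligible)[0]


def needs_diff(slave, latest):
    """True if `slave` is behind the latest slave and needs a diff log."""
    return position(slave) < position(latest)


slaves = [
    Slave("10.0.0.11", "mysql-bin.000034", 250773),
    Slave("10.0.0.74", "mysql-bin.000034", 250589, candidate_master=True),
    Slave("10.0.0.75", "mysql-bin.000034", 250405),
]
latest, oldest = latest_and_oldest(slaves)
new_master = pick_new_master(slaves)
print("latest:", [s.host for s in latest])
print("oldest:", [s.host for s in oldest])
print("new master:", new_master.host,
      "needs diff:", needs_diff(new_master, latest[0]))
```

With these inputs the sketch picks 10.0.0.74 as the new master even though it is behind 10.0.0.11, which is why Phase 3.4 has to generate a diff log for it from the latest slave.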
2.3.5 Phase 3.5: Master Log Apply Phase

First, MHA waits until all relay logs are applied.

Second, it merges the logs from the latest slave and the dead master. Because some events in these logs may be incomplete, the events are checked during the merge:

    All apply target binary logs are concatinated at /var/tmp/total_binlog_for_10.0.0.74_3306.20130325143805.binlog .

The content of the merged log is as follows:

    [mysql@db-74 ~]$ mysqlbinlog /var/tmp/total_binlog_for_10.0.0.74_3306.20130325143805.binlog
    /*!40019 SET @@session.max_insert_delayed_threads=0*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    # at 4
    #130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server v 5.5.27-log created 130325 11:03:52
    BINLOG mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==/*!*/;
    # at 107
    #700101 8:00:00 server id 1 end_log_pos 0 Rotate to mysql-bin.000034 pos: 107
    # at 150
    #130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server v 5.5.27-log created 130325 10:40:31
    BINLOG H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==/*!*/;
    # at 253
    #130325 14:12:19 server id 1 end_log_pos 250657 Query thread_id=21 exec_time=0 error_code=0
    SET TIMESTAMP=1364191939/*!*/;
    SET @@session.pseudo_thread_id=21/*!*/;
    SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
    SET @@session.sql_mode=0/*!*/;
    SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
    /*!C utf8 *//*!*/;
    SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
    SET @@session.lc_time_names=0/*!*/;
    SET @@session.collation_database=DEFAULT/*!*/;
    BEGIN/*!*/;
    # at 321
    #130325 14:12:19 server id 1 end_log_pos 250746 Query thread_id=21 exec_time=0 error_code=0
    use test/*!*/;
    SET TIMESTAMP=1364191939/*!*/;
    insert into qwsh values(3)/*!*/;
    # at 410
    #130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
    COMMIT/*!*/;
    # at 437
    #130325 14:12:36 server id 3 end_log_pos 250938 Stop
    # at 456
    #130325 14:18:47 server id 1 end_log_pos 250841 Query thread_id=21 exec_time=0 error_code=0
    SET TIMESTAMP=1364192327/*!*/;
    BEGIN/*!*/;
    # at 524
    #130325 14:18:47 server id 1 end_log_pos 250930 Query thread_id=21 exec_time=0 error_code=0
    SET TIMESTAMP=1364192327/*!*/;
    insert into qwsh values(4)/*!*/;
    # at 613
    #130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
    COMMIT/*!*/;
    # at 640
    #130325 14:19:42 server id 1 end_log_pos 250976 Stop
    DELIMITER ;
    # End of log file
    ROLLBACK /* added by mysqlbinlog */;
    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
Third, MHA records the new master's log file and position:

    All other slaves should start replication from here. Statement should be:
    CHANGE MASTER TO MASTER_HOST=10.0.0.74, MASTER_PORT=3306,
    MASTER_LOG_FILE=mysql-bin.000003, MASTER_LOG_POS=475,
    MASTER_USER=repl, MASTER_PASSWORD=xxx;

Fourth, it executes the master IP activate script.

Fifth, it sets read_only=0 on the new master.

2.4 Phase 4: Slaves Recovery Phase

2.4.1 Phase 4.1: Starting Parallel Slave Diff Log Generation Phase

MHA determines whether each slave's relay logs differ from those of the latest slave. The following command is run on the latest slave to generate the diff relay log file, which is then copied to the corresponding slave via scp (Server 10.0.0.75 received relay logs up to: mysql-bin.000034:250405. Need to get diffs from the latest slave(10.0.0.11) up to: mysql-bin.000034:250773):

    apply_diff_relay_logs --command=generate_and_send --scp_user=root \
      --scp_host=10.0.0.75 --latest_mlf=mysql-bin.000034 \
      --latest_rmlp=250773 --target_mlf=mysql-bin.000034 \
      --target_rmlp=250405 --server_id=3 \
      --diff_file_readtolatest=/var/tmp/relay_from_read_to_latest_10.0.0.75_3306_20130325143805.binlog \
      --workdir=/var/tmp --timestamp=20130325143805 \
      --handle_raw_binlog=1 --disable_log_bin=0 \
      --manager_version=0.55 \
      --relay_log_info=/data/mysql/data/relay-log.info \
      --relay_dir=/data/mysql/data/

2.4.2 Phase 4.2: Starting Parallel Slave Log Apply Phase

First, MHA waits until all relay logs are applied.

Second, it checks whether each slave has the latest relay log, then merges and applies the required logs.

10.0.0.11 has the latest relay log, so only the dead master's saved binlog is applied:

    apply_diff_relay_logs --command=apply --slave_user=root \
      --slave_host=10.0.0.11 --slave_ip=10.0.0.11 --slave_port=3306 \
      --apply_files=/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog \
      --workdir=/var/tmp --target_version=5.5.27-log \
      --timestamp=20130325143805 --handle_raw_binlog=1 --disable_log_bin=0 \
      --manager_version=0.55 --slave_pass=xxx

10.0.0.75 does not have the latest relay log, so the relay log diff and the dead master's binlog must be merged:

    apply_diff_relay_logs --command=apply --slave_user=root \
      --slave_host=10.0.0.75 --slave_ip=10.0.0.75 --slave_port=3306 \
      --apply_files=/var/tmp/relay_from_read_to_latest_10.0.0.75_3306_20130325143805.binlog,/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog \
      --workdir=/var/tmp --target_version=5.5.27-log \
      --timestamp=20130325143805 --handle_raw_binlog=1 --disable_log_bin=0 \
      --manager_version=0.55 --slave_pass=xxx

The content of the merged log is as follows:

    [mysql@db-75 data]$ mysqlbinlog /var/tmp/total_binlog_for_10.0.0.75_3306.20130325143805.binlog
    /*!40019 SET @@session.max_insert_delayed_threads=0*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    # at 4
    #130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server v 5.5.27-log created 130325 11:03:52
    BINLOG mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==/*!*/;
    # at 107
    #700101 8:00:00 server id 1 end_log_pos 0 Rotate to mysql-bin.000034 pos: 107
    # at 150
    #130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server v 5.5.27-log created 130325 10:40:31
    BINLOG H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==/*!*/;
    # at 253
    #130325 14:09:57 server id 1 end_log_pos 250473 Query thread_id=21 exec_time=0 error_code=0
    SET TIMESTAMP=1364191797/*!*/;
    SET @@session.pseudo_thread_id=21/*!*/;
    SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
    SET @@session.sql_mode=0/*!*/;
    SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
    /*!C utf8 *//*!*/;
    SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
    SET @@session.lc_time_names=0/*!*/;
    SET @@session.collation_database=DEFAULT/*!*/;
    BEGIN/*!*/;
    # at 321
    #130325 14:09:57 server id 1 end_log_pos 250562 Query thread_id=21 exec_time=0 error_code=0
    use test/*!*/;
    SET TIMESTAMP=1364191797/*!*/;
    insert into qwsh values(2)/*!*/;
    # at 410
    #130325 14:09:57 server id 1 end_log_pos 250589 Xid = 2423
    COMMIT/*!*/;
    # at 437
    #130325 14:12:19 server id 1 end_log_pos 250657 Query thread_id=21 exec_time=0 error_code=0
    SET TIMESTAMP=1364191939/*!*/;
    BEGIN/*!*/;
    # at 505
    #130325 14:12:19 server id 1 end_log_pos 250746 Query thread_id=21 exec_time=0 error_code=0
    SET TIMESTAMP=1364191939/*!*/;
    insert into qwsh values(3)/*!*/;
    # at 594
    #130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
    COMMIT/*!*/;
    # at 621
    #130325 14:12:36 server id 3 end_log_pos 250938 Stop
    # at 640
    #130325 14:18:47 server id 1 end_log_pos 250841 Query thread_id=21 exec_time=0 error_code=0
    SET TIMESTAMP=1364192327/*!*/;
    BEGIN/*!*/;
    # at 708
    #130325 14:18:47 server id 1 end_log_pos 250930 Query thread_id=21 exec_time=0 error_code=0
    SET TIMESTAMP=1364192327/*!*/;
    insert into qwsh values(4)/*!*/;
    # at 797
    #130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
    COMMIT/*!*/;
    # at 824
    #130325 14:19:42 server id 1 end_log_pos 250976 Stop
    DELIMITER ;
    # End of log file
    ROLLBACK /* added by mysqlbinlog */;
    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

Third, MHA executes CHANGE MASTER on the slave.

2.5 Phase 5: New master cleanup phase

MHA resets the slave info on the new master.
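The choice of files passed to --apply_files in Phase 4.2 follows a simple rule, sketched below in Python. This is an illustration under stated assumptions, not MHA code: `diff_file` and `apply_files` are hypothetical helpers, and the sketch assumes all positions lie in the same master binlog file (mysql-bin.000034), as they do in this walkthrough. The paths, timestamp and positions are copied from the commands above.

```python
TIMESTAMP = "20130325143805"
WORKDIR = "/var/tmp"
SAVED_MASTER_BINLOG = (
    f"{WORKDIR}/saved_master_binlog_from_10.0.0.13_3306_{TIMESTAMP}.binlog"
)
LATEST_POS = 250773  # position received by the latest slave (10.0.0.11)


def diff_file(host, port=3306):
    """Path of the relay log diff generated on the latest slave for `host`."""
    return f"{WORKDIR}/relay_from_read_to_latest_{host}_{port}_{TIMESTAMP}.binlog"


def apply_files(host, read_pos):
    """Files to apply on a slave, oldest events first.

    A slave behind the latest slave first needs the relay log diff;
    every slave then needs the binlog saved from the dead master.
    """
    files = []
    if read_pos < LATEST_POS:
        files.append(diff_file(host))
    files.append(SAVED_MASTER_BINLOG)
    return files


for host, pos in [("10.0.0.11", 250773), ("10.0.0.75", 250405)]:
    print(host, "--apply_files=" + ",".join(apply_files(host, pos)))
```

The order matters: the diff from the latest slave covers older events than the dead master's saved binlog, so it has to be applied first.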