Maxscale switchover, failover, and auto rejoin

Wagner Bianchi, Principal RDBA, DBOps, Project, Process & Services Addicted, Oracle ACE Director at MariaDB Corporation
MariaDB Maxscale
Switchover, Failover and Rejoin
Wagner Bianchi
Remote DBA Team Lead @ MariaDB RDBA Team
Esa Korhonen
Software Engineer @ MariaDB Maxscale Engineering Team
Introduction to MariaDB MaxScale
● Intelligent database proxy:
○ Separates client application from backend(s)
○ Understands authentication, queries and backend roles
○ Typical use-cases: read-write splitting, load-balancing
○ Many plugins: query filtering, logging, caching
● Latest GA version: 2.2
(Diagram: CLIENT connecting through MaxScale to the DATABASE SERVERS)
Query processing stages
(Diagram: Client Protocol -> Parser -> Filters -> Router -> Backend Protocol; the Monitor monitors the backends and updates Server State, which the Router uses)
What is new in MariaDB-Monitor for MaxScale 2.2*
● Support for replication cluster manipulation: failover, switchover, rejoin
○ failover: replace a failed master with a slave
○ switchover: swap a slave with a live master
○ rejoin: bring a standalone server back to the cluster or redirect slaves replicating from the wrong master
● Failover & rejoin can be set to activate automatically
● Reduces need for custom scripts or replication management tools
● Supported topologies: 1 Master, N slaves, 1-level depth
● Limited support for external masters
* Note: Renamed from previous mysqlmon
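A minimal mariadbmon monitor definition sketch for maxscale.cnf, assuming server objects named server1-server3 and a monitor user named maxmon defined elsewhere (both placeholders); values are illustrative, not recommendations:

[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=maxmon
password=maxmon_pwd
monitor_interval=1000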
Switchover
● Controlled swap of the master with a designated slave
● Monitor user must have the SUPER privilege
● Depends on read_only to freeze the cluster
○ SUPER users bypass this
● Waits for all slaves to catch up with the master
○ no data should be lost, but it can be slow
● Configuration settings (a sample is sketched below):
○ replication_user & replication_password
○ switchover_timeout
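A sketch of the switchover-related settings added to the monitor section, assuming a dedicated replication account named repl (a placeholder); as noted later in the deck, the monitor user can be used instead when no separate replication user is defined:

[MariaDB-Monitor]
...
replication_user=repl
replication_password=repl_pwd
switchover_timeout=90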
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Slave, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
$./maxctrl call command mariadbmon switchover MariaDB-Monitor LocalSlave1
OK
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Slave, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
Failover
● Promote a slave to take the place of the failed master
● Damage has already been done, so there is no need to worry about the old master
● Chooses a new master based on the following criteria (in order of importance):
○ not in the exclusion list
○ has the latest event in its relay log
○ has processed the latest event
○ has log_slave_updates on
● Configuration:
○ failover_timeout
● May lose data from the failed master
○ (semi-)synchronous replication reduces this risk (see the sketch below)
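A minimal sketch of enabling semi-synchronous replication on the backends to reduce that risk, assuming MariaDB 10.3+ where the semi-sync variables are built in (older versions need the semisync plugins installed first); this is server-side configuration, independent of MaxScale:

-- on the master
SET GLOBAL rpl_semi_sync_master_enabled = ON;
SET GLOBAL rpl_semi_sync_master_timeout = 10000;  -- ms before falling back to async
-- on each slave (restart the IO thread so the change takes effect)
SET GLOBAL rpl_semi_sync_slave_enabled = ON;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;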
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Down │
├──────────────┼───────────┼──────┼─────────────┼────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Slave, Running │
├──────────────┼───────────┼──────┼─────────────┼────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴────────────────┘
$./maxctrl call command mariadbmon failover MariaDB-Monitor
OK
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Down │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
Automatic failover
● Trigger: the master must be down for a set amount of time
● Additional check by looking at slave connections
● Configuration settings (a sample is sketched below):
○ auto_failover
○ failcount & monitor_interval
○ verify_master_failure & master_failure_timeout
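A sketch of the automatic-failover settings in the monitor section; the parameter names are the mariadbmon settings listed above, and the values match the ones used later in this deck (1 second interval, 4 passes) but are illustrative:

[MariaDB-Monitor]
...
auto_failover=true
monitor_interval=1000
failcount=4
verify_master_failure=true
master_failure_timeout=10
failover_timeout=90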
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Slave, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
$docker stop maxscalebackends_testing1_master1_1
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Down │
├──────────────┼───────────┼──────┼─────────────┼────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Slave, Running │
├──────────────┼───────────┼──────┼─────────────┼────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴────────────────┘
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Down │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
Rejoin
● Directs the joining server to replicate from the cluster master
○ redirect a slave replicating from the wrong master
○ start replication on a standalone server
● Looks at GTIDs to decide whether the joining server can replicate
● Manual or automatic mode (auto_rejoin=1)
● Typical use case: master goes down -> failover -> old master comes back -> rejoined to cluster (see the config sketch below)
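To cover that use case end to end, both automatic failover and automatic rejoin are enabled in the monitor section; a minimal sketch (the manual form of the rejoin command is shown in the capture below):

[MariaDB-Monitor]
...
auto_failover=true
auto_rejoin=true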
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Down │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
$docker start maxscalebackends_testing1_master1_1
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
$./maxctrl call command mariadbmon rejoin MariaDB-Monitor LocalMaster1
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Slave, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
External master handling
(Two diagrams labelled DC A and DC B, showing which data center the cluster is replicating from)
Switchover details
Starting checks:
1. Cluster has 1 master and >1 slaves
2. All servers use GTID replication and the cluster GTID domain is known
3. Requested new master has binary log on
Prepare current master:
1. SET GLOBAL read_only=1;
2. FLUSH TABLES;
3. FLUSH LOGS;
4. update GTID info
Wait until all slaves catch up to the master:
1. MASTER_GTID_WAIT()
Stop slave replication on the new master:
1. STOP SLAVE;
2. RESET SLAVE ALL;
3. SET GLOBAL read_only=0;
Redirect slaves & old master to the new master:
1. STOP SLAVE;
2. RESET SLAVE;
3. CHANGE MASTER TO …
4. START SLAVE;
Check that replication is working:
1. FLUSH TABLES;
2. Check that all slaves receive the new GTID
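For reference, a rough SQL sketch of running the same sequence by hand; the host name, port, GTID value and the repl account are placeholders, and the additional checks, timeouts and error handling that MaxScale performs are omitted:

-- on the current master: freeze writes and flush
SET GLOBAL read_only=1;
FLUSH TABLES;
FLUSH LOGS;
-- on each slave: wait until the master's last GTID has been applied (placeholder GTID)
SELECT MASTER_GTID_WAIT('0-1-1000', 90);
-- on the designated new master: stop replicating and allow writes
STOP SLAVE;
RESET SLAVE ALL;
SET GLOBAL read_only=0;
-- on the remaining slaves and the old master: point replication at the new master
STOP SLAVE;
RESET SLAVE;
CHANGE MASTER TO MASTER_HOST='new-master', MASTER_PORT=3306,
  MASTER_USER='repl', MASTER_PASSWORD='repl_pwd', MASTER_USE_GTID=slave_pos;
START SLAVE;
-- verify: generate an event on the new master and check that every slave receives it
FLUSH TABLES;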
DEMO TIME!!
Maxscale 2.2 New Features
● At this point you know that MariaDB MaxScale is able to perform:
○ Automatic/manual failover;
○ Manual switchover;
○ Rejoin of a crashed node as a slave of an existing cluster;
● These processes rely on the new MariaDBMon monitor;
● Hidden details when implementing and/or doing break/fix:
○ For switchover/failover/rejoin to work, the monitor user (MariaDBMon) needs access on all the servers, or a separate user set via replication_user and replication_password with access on all the servers (a sketch of the accounts follows below);
○ If the monitor user (MariaDBMon) has an encrypted password, the replication_password should be encrypted as well; otherwise the CHANGE MASTER TO executed during these processes won't be able to configure replication for the new master;
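A sketch of those accounts, with hypothetical names maxmon (monitor user) and repl (replication user) created on every backend; the exact privilege list needed by mariadbmon depends on the MaxScale version, so treat the grants as illustrative (SUPER is required for switchover, REPLICATION SLAVE for the replication user):

-- monitor user
CREATE USER 'maxmon'@'%' IDENTIFIED BY 'maxmon_pwd';
GRANT SUPER, REPLICATION CLIENT, RELOAD, PROCESS, SHOW DATABASES ON *.* TO 'maxmon'@'%';
-- replication user
CREATE USER 'repl'@'%' IDENTIFIED BY 'repl_pwd';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

If passwords in maxscale.cnf are encrypted, the replication password can be encrypted against the same .secrets file; a usage sketch, assuming the default MaxScale data directory:

#: encrypting a password for maxscale.cnf (requires a .secrets file created with maxkeys)
[root@box01 ~]# maxpasswd /var/lib/maxscale/ repl_pwd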
Maxscale 2.2 New Features
● Failover: replacing a failed master.
● For automatic failover, the auto_failover variable should be set to true in the monitor configuration definition;
○ auto_failover=true, for automatic failover to be activated;
● For manual failover, auto_failover should be set to false in the monitor configuration definition;
● The master must be dead for the manual failover to work;
○ auto_failover=false, the failover can be activated manually:
● Enable and disable auto_failover with the alter monitor command.
[root@box01 ~]# maxadmin call command mariadbmon failover replication-cluster-monitor
Maxscale 2.2 New Features
● Failover: replacing a failed master (automatic, auto_failover=true)
#: checking current configurations
[root@box01 ~]# grep auto_failover /var/lib/maxscale/maxscale.cnf.d/replication-cluster-monitor.cnf
auto_failover=true
#: shutdown the current master - check the current topology out of `maxadmin list servers` for better confirming it
[root@box02 ~]# systemctl stop mariadb.service
#: watching the actions on the log file
2018-02-10 13:51:02 error : Monitor was unable to connect to server [192.168.50.13]:3306 : "Can't connect to MySQL server on '192.168.50.13'"
2018-02-10 13:51:02 notice : [mariadbmon] Server [192.168.50.13]:3306 lost the master status.
2018-02-10 13:51:02 notice : Server changed state: box03[192.168.50.13:3306]: master_down. [Master, Running] -> [Down]
2018-02-10 13:51:02 warning: [mariadbmon] Master has failed. If master status does not change in 4 monitor passes, failover begins.
2018-02-10 13:51:06 notice : [mariadbmon] Performing automatic failover to replace failed master 'box03'.
2018-02-10 13:51:06 notice : [mariadbmon] Promoting server 'box02' to master.
2018-02-10 13:51:06 notice : [mariadbmon] Redirecting slaves to new master.
2018-02-10 13:51:07 warning: [mariadbmon] Setting standalone master, server 'box02' is now the master.
2018-02-10 13:51:07 notice : Server changed state: box02[192.168.50.12:3306]: new_master. [Slave, Running] -> [Master, Running]
Maxscale 2.2 New Features
● Failover: replacing a failed master (manual, auto_failover=false)
#: setting auto_failover=false
[root@box01 ~]# maxadmin alter monitor replication-cluster-monitor auto_failover=false
#: current master is down, automatic failover deactivated
2018-02-09 23:31:01 error : Monitor was unable to connect to server [192.168.50.12]:3306:"Can't connect to MySQL server on '192.168.50.12'"
2018-02-09 23:31:01 notice : [mariadbmon] Server [192.168.50.12]:3306 lost the master status.
2018-02-09 23:31:01 notice : Server changed state: box02[192.168.50.12:3306]: master_down. [Master, Running] -> [Down]
#: manual failover executed
[root@box01 ~]# maxadmin call command mariadbmon failover replication-cluster-monitor
#: let's check the logs
2018-02-09 23:32:30 info : (17) [cli] MaxAdmin: call command "mariadbmon" "failover" "replication-cluster-monitor"
2018-02-09 23:32:30 notice : (17) [mariadbmon] Stopped monitor replication-cluster-monitor for the duration of failover.
2018-02-09 23:32:30 notice : (17) [mariadbmon] Promoting server 'box03' to master.
2018-02-09 23:32:30 notice : (17) [mariadbmon] Redirecting slaves to new master.
2018-02-09 23:32:30 notice : (17) [mariadbmon] Failover performed.
2018-02-09 23:32:30 warning: [mariadbmon] Setting standalone master, server 'box03' is now the master.
2018-02-09 23:32:30 notice : Server changed state: box03[192.168.50.13:3306]: new_master. [Slave, Running] -> [Master, Running]
Maxscale 2.2 New Features
● Failover: replacing a failed master, additional details
● The pass timing is based on the monitor's monitor_interval value;
○ As it is set to 1000 ms (1 second) here, the failover is triggered after 4 seconds, counting the first pass from the moment the monitor reported the first message;
○ If the failover process does not complete within the time configured in failover_timeout (90 seconds by default), the failover is canceled and the feature is disabled;
○ To enable failover again (after checking for possible problems), use the alter monitor command:
2018-02-10 13:51:02 warning: [mariadbmon] Master has failed. If master status does not change in 4 monitor passes, failover begins.
[root@box01 ~]# maxadmin alter monitor replication-cluster-monitor auto_failover=true
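A worked example of the trigger timing with the values used in this deck (the parameter names are real mariadbmon settings, the values illustrative):

#: parameters as configured in this deck
monitor_interval=1000   # one monitor pass per second
failcount=4             # passes the master must stay down before failover
#: trigger delay ≈ failcount * monitor_interval = 4 * 1000 ms = 4 s
#: matches the automatic-failover log above: master down at 13:51:02, failover at 13:51:06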
Maxscale 2.2 New Features
● Switchover: swapping a slave with a running master.
● The switchover process relies on the replication_user and replication_password settings added to the monitor configuration;
● The process is triggered manually and should take at most switchover_timeout seconds to complete (default: 90 seconds);
● If the process fails, the error is logged and auto_failover is disabled if it was enabled;
[root@team01-box01 ~]# maxadmin call command mariadbmon switchover replication-cluster-monitor new_master current_master
Maxscale 2.2 New Features
● Switchover: swapping a slave with a running master.
#: checking the current server's list
[root@team01-box01 ~]# maxadmin list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server | Address | Port | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
box02 | 10.132.116.147 | 3306 | 0 | Slave, Running
box03 | 10.132.116.161 | 3306 | 0 | Master, Running
-------------------+-----------------+-------+-------------+--------------------
#: new_master=box03, current_master=box02
[root@team01-box01 ~]# maxadmin call command mariadbmon switchover replication-cluster-monitor box03 box02
#: checking logs
2018-02-14 16:44:46 info : (712) [cli] MaxAdmin: call command "mariadbmon" "switchover" "replication-cluster-monitor" "box02" "box03"
2018-02-14 16:44:46 notice : (712) [mariadbmon] Stopped the monitor replication-cluster-monitor for the duration of switchover.
2018-02-14 16:44:46 notice : (712) [mariadbmon] Demoting server 'box03'.
2018-02-14 16:44:46 notice : (712) [mariadbmon] Promoting server 'box02' to master.
2018-02-14 16:44:46 notice : (712) [mariadbmon] Old master 'box03' starting replication from 'box02'.
2018-02-14 16:44:46 notice : (712) [mariadbmon] Redirecting slaves to new master.
2018-02-14 16:44:47 notice : (712) [mariadbmon] Switchover box03 -> box02 performed.
2018-02-14 16:44:47 notice : Server changed state: box02[10.132.116.147:3306]: new_master. [Slave, Running] -> [Master, Slave, Running]
2018-02-14 16:44:47 notice : Server changed state: box03[10.132.116.161:3306]: new_slave. [Master, Running] -> [Slave, Running]
2018-02-14 16:44:48 notice : Server changed state: box02[10.132.116.147:3306]: new_master. [Master, Slave, Running] -> [Master, Running]
Maxscale 2.2 New Features
● Rejoin: joining a standalone server to the cluster.
● Enables automatically joining a server back to the cluster when a crashed backend server comes back online;
● When auto_rejoin is enabled, the monitor will attempt to direct standalone servers and servers replicating from a relay master to the main cluster master server;
● Test it as we did (a shell sketch follows below):
○ Check which server is the current master, then shut down its MariaDB Server;
○ The failover will happen if auto_failover is enabled;
○ Start the stopped MariaDB Server again;
○ List servers again from MaxAdmin and watch the logs.
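A shell sketch of that test sequence, assuming auto_failover and auto_rejoin are enabled and the monitor is named replication-cluster-monitor as elsewhere in this deck; host names are placeholders and the log path is the MaxScale default:

#: 1. find the current master
[root@box01 ~]# maxadmin list servers
#: 2. stop MariaDB on the current master (here box02)
[root@box02 ~]# systemctl stop mariadb.service
#: 3. wait roughly failcount * monitor_interval, then confirm the promoted master
[root@box01 ~]# maxadmin list servers
#: 4. bring the old master back online; auto_rejoin should redirect it as a slave
[root@box02 ~]# systemctl start mariadb.service
#: 5. confirm the topology and watch the MaxScale log
[root@box01 ~]# maxadmin list servers
[root@box01 ~]# tail -f /var/log/maxscale/maxscale.log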
Maxscale 2.2 New Features
● Rejoin: joining a standalone server to the cluster.
#: current_master=box02
[root@team01-box02 ~]# mysqladmin shutdown
#: watching logs, the failover will happen as the master "crashed"
2018-02-14 18:44:36 error : Monitor was unable to connect to server [10.132.116.147]:3306 : "Can't connect to MySQL server on '10.132.116.147' (115)"
2018-02-14 18:44:36 notice : [mariadbmon] Server [10.132.116.147]:3306 lost the master status.
2018-02-14 18:44:36 notice : Server changed state: box02[10.132.116.147:3306]: master_down. [Master, Running] -> [Down]
2018-02-14 18:44:36 warning: [mariadbmon] Master has failed. If master status does not change in 4 monitor passes, failover begins.
2018-02-14 18:44:40 notice : [mariadbmon] Performing automatic failover to replace failed master 'box02'.
2018-02-14 18:44:40 notice : [mariadbmon] Promoting server 'box03' to master.
2018-02-14 18:44:40 notice : [mariadbmon] Redirecting slaves to new master.
2018-02-14 18:44:41 warning: [mariadbmon] Setting standalone master, server 'box03' is now the master.
2018-02-14 18:44:41 notice : Server changed state: box03[10.132.116.161:3306]: new_master. [Slave, Running] -> [Master, Running]
#: starting old master back
[root@team01-box02 ~]# systemctl start mariadb.service
#: watching logs
2018-02-14 18:47:27 notice : Server changed state: box02[10.132.116.147:3306]: server_up. [Down] -> [Running]
2018-02-14 18:47:27 notice : [mariadbmon] Directing standalone server 'box02' to replicate from 'box03'.
2018-02-14 18:47:27 notice : [mariadbmon] 1 server(s) redirected or rejoined the cluster.
2018-02-14 18:47:28 notice : Server changed state: box02[10.132.116.147:3306]: new_slave. [Running] -> [Slave, Running]
Thank you!
Time for questions and answers