7. Terminology
Automatic failover: a standby database was automatically promoted to primary
because the previous primary failed (or became otherwise unavailable)
Manual switchover: a standby database was manually promoted to primary by a
DBA to, for example, perform a rolling upgrade
Automatic rejoin: a previously failed primary database is recovered and
automatically reconfigured as a standby
11. Global transaction IDs
GTID = domain ID + server ID + sequence number
1. Prevents conflicts between multiple primaries
2. Enables standbys to resume replication
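The three-part GTID structure above can be illustrated with a small parser. This is a minimal sketch: the `Gtid` type and `parse_gtid` helper are hypothetical names introduced for illustration, and the `domain-server-sequence` text form is taken from the binlog examples in this deck (for example `0-1-200`).

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    """Hypothetical container for the three GTID components."""
    domain_id: int
    server_id: int
    sequence: int


def parse_gtid(text: str) -> Gtid:
    """Split a 'domain-server-sequence' string into its integer parts."""
    parts = text.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected domain-server-sequence, got {text!r}")
    domain_id, server_id, sequence = (int(p) for p in parts)
    return Gtid(domain_id, server_id, sequence)


if __name__ == "__main__":
    print(parse_gtid("0-1-200"))
```

Keeping the components separate makes it easy to compare sequence numbers within one domain, which is what a standby needs when it resumes replication.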
12. Binary log (binlog)
Commit ID   GTID      Server ID   Event type   Position   End position
100         0-1-200   1           Query        0          150
100         0-1-201   1           Query        151        500
100         0-1-202   1           Query        501        600
101         0-1-203   1           Query        601        800
101         0-1-204   1           Query        801        1000
Logical view of the binlog
13. Sequence
1. The standby IO thread requests binlog events and includes its current GTID
2. The primary returns binlog events for the next GTID(s)
3. The standby IO thread writes the binlog events to its relay log
4. The standby SQL thread reads the binlog events from its relay log
5. The standby SQL thread executes the binlog events and updates its current GTID
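The five steps above can be sketched as a toy simulation. This is not how the server is implemented; it only mirrors the sequence, using the rows from the logical binlog view and assuming a single replication domain. The `Standby` class and the `events_after` helper are hypothetical names.

```python
# Rows from the logical binlog view: (commit_id, gtid, position, end_position)
BINLOG = [
    (100, "0-1-200", 0, 150),
    (100, "0-1-201", 151, 500),
    (100, "0-1-202", 501, 600),
    (101, "0-1-203", 601, 800),
    (101, "0-1-204", 801, 1000),
]


def sequence_of(gtid):
    """Return the sequence number (third component) of a GTID string."""
    return int(gtid.split("-")[2])


def events_after(binlog, current_gtid):
    """Primary side (step 2): return the events that follow current_gtid."""
    current = sequence_of(current_gtid)
    return [event for event in binlog if sequence_of(event[1]) > current]


class Standby:
    def __init__(self, current_gtid):
        self.current_gtid = current_gtid
        self.relay_log = []

    def io_thread(self, primary_binlog):
        # Steps 1-3: request events after the current GTID, store them in the relay log.
        self.relay_log.extend(events_after(primary_binlog, self.current_gtid))

    def sql_thread(self):
        # Steps 4-5: read each event from the relay log, "execute" it, advance the GTID.
        while self.relay_log:
            event = self.relay_log.pop(0)
            self.current_gtid = event[1]


if __name__ == "__main__":
    standby = Standby("0-1-201")
    standby.io_thread(BINLOG)
    print([event[1] for event in standby.relay_log])
    standby.sql_thread()
    print(standby.current_gtid)
```

Because the standby sends its own GTID, it can reconnect to any primary that holds the later events, which is what makes resuming replication after a failover possible.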
24. Concepts
Group communication ensures total ordering of messages sent from multiple nodes
Write sets contain all of the rows modified by a transaction, created during the commit phase
Global transaction ordering assigns write sets a GTID (UUID + sequence number) so writes
are applied in the same order on every node
Certification uses a deterministic test so that a write set is either applied on all
nodes or rejected on all nodes
25. Sequence
1. Synchronous
a. Originating node: create a write set
b. Originating node: assign a global transaction ID to the write set and replicate it
c. Originating node: apply the write set and commit the transaction
2. Asynchronous
a. Other nodes: certify the write set
b. Other nodes: apply the write set and commit the transaction
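The certification step above can be sketched with a simplified model. It assumes each write set records the highest sequence number its transaction had seen (`last_seen`) and the keys of the rows it modified. The `WriteSet` and `Node` classes and the key format are hypothetical, and the real certification logic is more involved.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteSet:
    seqno: int        # global sequence number assigned when the write set is replicated
    last_seen: int    # highest seqno the transaction had seen when it executed
    keys: frozenset   # keys of the rows modified by the transaction


@dataclass
class Node:
    certified: dict = field(default_factory=dict)  # key -> seqno of the last certified writer
    applied: list = field(default_factory=list)    # seqnos applied on this node, in order

    def certify(self, ws):
        """Pass only if no modified key was changed after the transaction's snapshot."""
        return all(self.certified.get(key, 0) <= ws.last_seen for key in ws.keys)

    def receive(self, ws):
        """Certify the write set, then apply it if certification passed."""
        passed = self.certify(ws)
        if passed:
            for key in ws.keys:
                self.certified[key] = ws.seqno
            self.applied.append(ws.seqno)
        return passed


if __name__ == "__main__":
    write_sets = [
        WriteSet(seqno=1, last_seen=0, keys=frozenset({"t1:1"})),
        WriteSet(seqno=2, last_seen=0, keys=frozenset({"t1:1"})),  # conflicts with seqno 1
        WriteSet(seqno=3, last_seen=1, keys=frozenset({"t1:2"})),
    ]
    node_a, node_b = Node(), Node()
    print([node_a.receive(ws) for ws in write_sets])
    print([node_b.receive(ws) for ws in write_sets])
```

Since every node receives the write sets in the same total order and the test depends only on that order and the write set contents, each node reaches the same verdict without further coordination.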
31. Parameters
Variable Values Default
auto_failover TRUE | FALSE FALSE
auto_rejoin TRUE | FALSE FALSE
switchover_on_low_disk_space TRUE | FALSE FALSE
failcount 1 to n 5
monitor_interval 100 to n (ms) 2000 (2 seconds)
verify_master_failure TRUE | FALSE FALSE
servers_no_promotion server names (CSV) N/A
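As a rough worked example of how failcount and monitor_interval interact: assuming the monitor declares the primary failed only after failcount consecutive failed monitor passes, the detection delay is approximately their product. The helper below is a hypothetical sketch that ignores connection timeouts and the time the promotion itself takes.

```python
def failover_detection_delay_ms(failcount=5, monitor_interval_ms=2000):
    """Approximate delay before a failed primary is detected, in milliseconds.

    Assumes one monitor pass per interval and that the primary must fail
    `failcount` consecutive passes. Connection timeouts are ignored.
    """
    if failcount < 1:
        raise ValueError("failcount must be at least 1")
    if monitor_interval_ms < 100:
        raise ValueError("monitor_interval must be at least 100 ms")
    return failcount * monitor_interval_ms


if __name__ == "__main__":
    # With the defaults from the table: 5 passes x 2000 ms
    print(failover_detection_delay_ms())
```

With the defaults this gives about 10 seconds. Lowering either value shortens detection but makes a false positive more likely on an unreliable network.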
32. What’s new
Connection failover: connection is migrated to the new primary
Delayed retry: retry queries after automatic failover has completed
Transaction replay: replay transactions from start if failover occurs mid transaction
Optimistic transactions: start transactions on standbys for session failover
33. What’s new
Variable Values Default
master_reconnection TRUE | FALSE FALSE
delayed_retry TRUE | FALSE FALSE
delayed_retry_timeout 0 to n (s) 10
transaction_replay TRUE | FALSE FALSE
optimistic_trx TRUE | FALSE FALSE
max_sescmd_history 0 to n 50
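The idea behind delayed_retry and delayed_retry_timeout can be sketched as a retry loop with a deadline. This is only an illustration of the behavior described above, not the proxy's implementation; `run_with_delayed_retry`, `pause_s`, and the injected `clock` and `sleep` parameters are hypothetical.

```python
import time


def run_with_delayed_retry(query, timeout_s=10.0, pause_s=0.5,
                           clock=time.monotonic, sleep=time.sleep):
    """Call query() until it succeeds or timeout_s has elapsed.

    A ConnectionError stands in for "the primary is currently unavailable".
    Once the deadline has passed, the last error is re-raised to the caller.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return query()
        except ConnectionError:
            if clock() >= deadline:
                raise
            sleep(pause_s)  # give the failover time to complete before retrying


if __name__ == "__main__":
    attempts = []

    def flaky_query():
        attempts.append(1)
        if len(attempts) < 3:
            raise ConnectionError("primary unavailable")
        return "ok"

    print(run_with_delayed_retry(flaky_query, pause_s=0.01))
```

The client sees a slower query instead of an error, provided the failover completes within the timeout.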
37. Read consistency: replication
If read consistency is required, enable causal reads.
● Takes advantage of GTID (MASTER_GTID_WAIT function)
● Waits for a standby to catch up to the client
● If it doesn’t catch up in time, the query is routed to the primary
Variable Values Default
causal_reads TRUE | FALSE FALSE
causal_reads_timeout 0 to n (s) 10
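The routing decision described in the bullets above can be sketched as follows. On a real standby the wait would be done by MASTER_GTID_WAIT, which returns 0 once the GTID has been applied and -1 on timeout. The function below only models that outcome, and its name and parameters (including the constant apply rate) are hypothetical simplifications.

```python
def route_read(client_seq, standby_seq, events_per_second, timeout_s=10.0):
    """Decide where a read goes under causal reads (simplified model).

    client_seq:        sequence number of the client's last write
    standby_seq:       sequence number the standby has applied so far
    events_per_second: assumed constant apply rate on the standby
    timeout_s:         stands in for causal_reads_timeout
    """
    missing = max(0, client_seq - standby_seq)
    if missing == 0:
        return "standby"          # already caught up, no wait needed
    if events_per_second <= 0:
        return "primary"          # standby is not making progress
    wait_s = missing / events_per_second
    return "standby" if wait_s <= timeout_s else "primary"


if __name__ == "__main__":
    print(route_read(client_seq=204, standby_seq=200, events_per_second=1))
    print(route_read(client_seq=1204, standby_seq=200, events_per_second=10))
```

The fallback to the primary bounds the extra read latency at the timeout, at the cost of sending some reads to the primary when standbys lag.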
38. Read consistency: clustering
Variable          Values                                          Default
wsrep_sync_wait   0 (DISABLED)                                    0 (DISABLED)
                  1 (READ)
                  2 (UPDATE and DELETE)
                  3 (READ, UPDATE and DELETE)
                  4 (INSERT and REPLACE)
                  5 (READ, INSERT and REPLACE)
                  6 (UPDATE, DELETE, INSERT and REPLACE)
                  7 (READ, UPDATE, DELETE, INSERT and REPLACE)
                  8 (SHOW)
                  9-15 (1-7 + SHOW)
If read consistency is required, set the wsrep_sync_wait system variable to 1.
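The values in the table combine as a bitmask (1 = READ, 2 = UPDATE and DELETE, 4 = INSERT and REPLACE, 8 = SHOW), which is why 3 means "READ, UPDATE and DELETE" and 9-15 mean "1-7 + SHOW". A small decoder makes this explicit. The function name and the label strings are illustrative, not part of any server API.

```python
# Bit values as listed in the table above.
SYNC_WAIT_BITS = {
    1: "READ",
    2: "UPDATE and DELETE",
    4: "INSERT and REPLACE",
    8: "SHOW",
}


def decode_wsrep_sync_wait(value):
    """Return the statement groups that wait for synchronization at this setting."""
    if not 0 <= value <= 15:
        raise ValueError("wsrep_sync_wait must be between 0 and 15")
    return [name for bit, name in SYNC_WAIT_BITS.items() if value & bit]


if __name__ == "__main__":
    for setting in (0, 1, 7, 9):
        print(setting, decode_wsrep_sync_wait(setting))
```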
40. Multiple data centers: replication
[Diagram: Data Center DC1 (Active) and Data Center DC2 (Passive). Each data center has one primary, two standbys, and a MariaDB MaxScale proxy.]
41. Multiple data centers: clustering
[Diagram: Data Center DC1 (Active) and Data Center DC2 (Passive), each with a MariaDB MaxScale proxy, in front of a three-node cluster using synchronous replication.]
Node 1 (P1: priority=1, P2: priority=3)
Node 2 (P1: priority=2, P2: priority=2)
Node 3 (P1: priority=3, P2: priority=1)
46. Replication: binlog
Variable Values Default
sync_binlog 0 (defer to OS), n (number of group commits to fsync) 0 (defer to OS)
binlog_format STATEMENT | ROW | MIXED MIXED
log_bin_compress 0 (OFF), 1 (ON) 0 (OFF)
1. You can fsync multiple transactions by enabling group commits (sync_binlog=1)
2. You can use the binlog ROW format if transactions take a long time or result in small changes
3. You can compress binlog events to reduce disk and network IO
47. Replication: parallelization
Variable Values Default
slave-parallel-mode optimistic | conservative | aggressive | minimal | none conservative
slave-parallel-threads 0 - n 0
binlog_commit_wait_count 0 - n 0
binlog_commit_wait_usec 0 - n 100000 (100ms)
read_binlog_speed_limit 0 (unlimited), n (kb) 0
1. You can execute transactions in parallel on standbys (slave-parallel-threads > 0)
2. You can throttle replication to reduce standby load on the primary
48. Clustering: async write/flush
Variable                        Values                                          Default
innodb_flush_log_at_tx_commit   0 (write and flush once a second)               1
                                1 (write and flush during commit)
                                2 (write during commit, flush once a second)
1. You can fsync InnoDB logs asynchronously because synchronous replication
provides durability