The document discusses high availability solutions for MariaDB databases. It begins by defining high availability and concepts like Recovery Time Objective (RTO) and Recovery Point Objective (RPO). It then presents different MariaDB and MaxScale architectures that provide high availability, including single node, primary-replica, Galera cluster, and SkySQL solutions. Key aspects covered are automatic failover, load balancing, data filtering, and service level agreements.
Using all of the high availability options in MariaDB (MariaDB plc)
MariaDB provides a number of high availability options, including replication with automatic failover and multi-master clustering. In this session, Wagner Bianchi, Principal Remote DBA, provides a comprehensive overview of the high availability features in MariaDB, highlights their impact on consistency and performance, discusses advanced failover strategies, and introduces new features such as causal reads and transparent connection failover.
PoC: Using a Group Communication System to improve MySQL Replication HA (Ulf Wendel)
High Availability solutions for MySQL Replication are either simple to use but introduce a single point of failure, or free of pitfalls but complex and hard to use. This Proof-of-Concept sketches a middle way. For monitoring, a group communication system is embedded into MySQL using a plugin, which eliminates the monitoring SPOF and is easy to use. Much emphasis is put on the often neglected client side. The PoC shows an architecture in which clients reconfigure themselves dynamically; no client deployment is required.
Introducing Galera Cluster & the Codership Team
Galera Cluster in a nutshell:
True multi-master:
* Read & write to any node
* Synchronous replication
* No slave lag
* No integrity issues
* No master-slave failovers or VIP needed
* Multi-threaded slave, no performance penalty
* Automatic node provisioning
Elastic:
Easy scale-out & scale-in, all nodes read-write
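The properties in the list above map onto a handful of wsrep settings; as a rough sketch (the node names, provider path and cluster name here are placeholder assumptions, not values from the deck), a Galera node's my.cnf might contain:

```ini
[mysqld]
# Enable the Galera replication provider (library path varies by distribution)
wsrep_on                 = ON
wsrep_provider           = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name       = example_cluster
wsrep_cluster_address    = gcomm://node1,node2,node3
# Galera requires row-based binary logging and InnoDB
binlog_format            = ROW
default_storage_engine   = InnoDB
# Interleaved auto-increment locking avoids conflicts between write nodes
innodb_autoinc_lock_mode = 2
```

With this in place every node accepts reads and writes, and a node joining via `gcomm://` is provisioned automatically (via SST/IST).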
Presented at SF Big Analytics Meetup
Online event processing applications often require the ability to ingest, store, dispatch and process events. Until now, supporting all of these needs has required a different system for each task: stream processing engines, message queuing middleware, and pub/sub messaging systems. This has led to unnecessary complexity in developing and operating such applications, raising the barrier to adoption in enterprises. In this talk, Karthik will outline the need to unify these capabilities in a single system and make it easy to develop and operate at scale. Karthik will delve into how Apache Pulsar was designed to address this need with an elegant architecture. Apache Pulsar is a next-generation distributed pub/sub system that was originally developed and deployed at Yahoo and is running in production at more than 100 companies. Karthik will explain how the architecture and design of Pulsar provide the flexibility to support developers and applications needing any combination of queuing, messaging, streaming and lightweight compute for events. Furthermore, he will present real-life use cases showing how Apache Pulsar is used for event processing, ranging from data processing tasks to web processing applications.
Training Slides: Basics 102: Introduction to Tungsten Clustering (Continuent)
This 30-minute training session provides an introduction to how Tungsten Clustering for MySQL / MariaDB / Percona Server works: its basic principles, Tungsten Clustering topologies, failover, rolling maintenance and related tools.
AGENDA
- Review the key benefits offered by Tungsten Clustering
- Examine the Tungsten Clustering architecture
- Tungsten Cluster Topologies for MySQL High Availability and Disaster Recovery
- Composite vs Multi-Site/Multi-Master
- Review automatic and manual failover
- Explore the concepts of a rolling maintenance procedure
- Study key resources to monitor and manage the cluster
Using Apache Spark to analyze large datasets in the cloud presents a range of challenges. Different stages of your pipeline may be constrained by CPU, memory, disk and/or network IO. But what if all those stages have to run on the same cluster? In the cloud, you have limited control over the hardware your cluster runs on.
You may have even less control over the size and format of your raw input files. Performance tuning is an iterative and experimental process. It’s frustrating with very large datasets: what worked great with 30 billion rows may not work at all with 400 billion rows. But with strategic optimizations and compromises, 50+ TiB datasets can be no big deal.
By using Spark UI and simple metrics, explore how to diagnose and remedy issues on jobs:
Sizing the cluster based on your dataset (shuffle partitions)
Ingestion challenges – well begun is half done (globbing S3, small files)
Managing memory (sorting GC – when to go parallel, when to go G1, when offheap can help you)
Shuffle (give a little to get a lot – configs for better out of box shuffle) – Spill (partitioning for the win)
Scheduling (FAIR vs FIFO, is there a difference for your pipeline?)
Caching and persistence (it’s the cost of doing business, so what are your options?)
Fault tolerance (blacklisting, speculation, task reaping)
Making the best of a bad deal (skew joins, windowing, UDFs, very large query plans)
Writing to S3 (dealing with write partitions, HDFS and s3DistCp vs writing directly to S3)
Presented at Spark+AI Summit Europe 2019
https://databricks.com/session_eu19/apache-spark-at-scale-in-the-cloud
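The first bullet, sizing the cluster to the dataset via shuffle partitions, often reduces to simple arithmetic: divide the shuffle data volume by a target partition size. A minimal sketch (the ~200 MiB-per-partition target is an illustrative rule of thumb, not a figure from the talk):

```python
def shuffle_partitions(dataset_bytes: int,
                       target_partition_bytes: int = 200 * 1024**2) -> int:
    """Estimate spark.sql.shuffle.partitions from shuffle data volume.

    Rounds up so that no partition exceeds the target size.
    """
    return max(1, -(-dataset_bytes // target_partition_bytes))  # ceil division

# A 50 TiB shuffle at ~200 MiB per partition needs ~262144 partitions,
# far above Spark's default of 200.
print(shuffle_partitions(50 * 1024**4))  # → 262144
```

The estimate would then be applied with `spark.conf.set("spark.sql.shuffle.partitions", n)` before the shuffle-heavy stage runs.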
Slow things down to make them go faster [FOSDEM 2022] (Jimmy Angelakos)
Talk from FOSDEM 2022
It's easy to get misled into overconfidence based on the performance of powerful servers, given today's monster core counts and RAM sizes. However, the reality of high concurrency usage is often disappointing, with less throughput than one would expect. Because of its internals and its multi-process architecture, PostgreSQL is very particular about how it likes to deal with high concurrency and in some cases it can slow down to the point where it looks like it's not performing as it should. In this talk we'll take a look at potential pitfalls when you throw a lot of work at your database. Specifically, very high concurrency and resource contention can cause problems with lock waits in Postgres. Very high transaction rates can also cause problems of a different nature. Finally, we will be looking at ways to mitigate these by examining our queries and connection parameters, leveraging connection pooling and replication, or adapting the workload.
Topics:
1. Understand what we mean by high concurrency.
2. Understand ACID & MVCC in Postgres.
3. Understand how high concurrency affects Postgres performance.
4. Understand how locks/latches affect Postgres performance.
5. Understand how high transaction rates can affect Postgres.
6. Mitigation strategies for high concurrency scenarios.
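One of the mitigation strategies the talk covers, connection pooling, works by capping the number of concurrent backends so that excess clients queue instead of piling contention onto Postgres. A toy sketch of the idea using only the standard library (the pool size and the stand-in "connection" objects are illustrative assumptions, not PgBouncer internals):

```python
import queue

class ConnectionPool:
    """Toy pool: at most `size` connections exist; extra callers block and wait."""

    def __init__(self, connect, size: int):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(connect())  # connections are created once, up front

    def acquire(self, timeout=None):
        return self._q.get(timeout=timeout)  # blocks when the pool is exhausted

    def release(self, conn):
        self._q.put(conn)

# With a real driver, `connect` would open a PostgreSQL session;
# here a list append stands in so we can count how many were made.
made = []
pool = ConnectionPool(lambda: made.append(1) or object(), size=5)
conns = [pool.acquire() for _ in range(5)]
print(len(made))  # only 5 "connections" ever created, however many clients arrive
for c in conns:
    pool.release(c)
```

The point of the sketch is the queueing behaviour: a sixth caller waits in `acquire` rather than adding a sixth backend, which is how poolers keep lock and snapshot contention bounded.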
1. If it’s not SQL, it’s not a database.
2. It takes 5+ years to build a database.
3. Listen to your users.
4. Too much magic is a bad thing.
5. It’s the cloud, stupid.
A presentation about how to make MySQL highly available, presented at the San Francisco MySQL Meetup (http://www.sfmysql.org/events/15760472/) on January 26th, 2011.
A video recording of this presentation is available from Ustream: http://ustre.am/fyLk
Why new hardware may not make Oracle databases faster (SolarWinds)
How can you know if hardware is the right answer to your Oracle database performance issues? How can you know for sure which hardware components will have the biggest impact? As a DBA or database developer, you should know that you can gain significant performance improvements without the time, money and risk associated with providing the latest server or flash storage array.
Learn why new hardware may not make your Oracle database faster and what you can do instead.
Running Dataproc At Scale in production - Searce Talk at GDG Delhi (Searce Inc)
From the collective experience of helping multiple customers run hundreds of Dataproc clusters in production, scaling them and troubleshooting issues, our engineers Manan and Rohit talk about running Dataproc at scale in production.
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages (LINE Corporation)
Yuto Kawamura
LINE / Z Part Team
At LINE we've been operating Apache Kafka to provide a company-wide shared data pipeline for services, which use it for storing and distributing data.
Kafka underlies many of our services in some way, not only the messaging service but also AD, Blockchain, Pay, Timeline, Cryptocurrency trading and more.
Many services feed data into our cluster, leading to over 250 billion daily messages and 3.5 GB of incoming bytes per second, which makes it one of the largest deployments in the world.
At the same time, it is required to be stable and performant at all times, because many important services use it as a backend.
In this talk I will give an overview of Kafka usage at LINE and how we're operating it.
I'm also going to talk about some of the engineering we did to maximize its performance and to solve problems caused by hosting huge volumes of data from many services, leveraging advanced techniques like kernel-level dynamic tracing.
Since 5.7.2, MySQL implements parallel replication within the same schema, also known as LOGICAL_CLOCK (DATABASE-based parallel replication is also implemented in 5.6, but is not covered in this talk). In early 5.7 versions, parallel replication was based on group commit (like MariaDB); 5.7.6 changed that to intervals.
Intervals are more complicated but they are also more powerful. In this talk, I will explain in detail how they work and why intervals are better than group commit. I will also cover how to optimize parallel replication in MySQL 5.7 and what improvements are coming in MySQL 8.0.
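The interval rule can be stated compactly: each transaction in the binary log carries a (last_committed, sequence_number) pair, and two transactions may be applied in parallel on the replica when their intervals overlap. A small sketch of that check (the sample timestamps are invented for illustration):

```python
from collections import namedtuple

Trx = namedtuple("Trx", "last_committed sequence_number")

def can_run_in_parallel(a: Trx, b: Trx) -> bool:
    """Two transactions may be applied concurrently if each entered its
    commit window before the other committed, i.e. the logical-clock
    intervals (last_committed, sequence_number] overlap."""
    return (a.sequence_number > b.last_committed and
            b.sequence_number > a.last_committed)

t1 = Trx(last_committed=10, sequence_number=12)
t2 = Trx(last_committed=10, sequence_number=13)  # interval overlaps t1's
t3 = Trx(last_committed=12, sequence_number=14)  # saw t1's commit
print(can_run_in_parallel(t1, t2))  # True
print(can_run_in_parallel(t1, t3))  # False: t3 depends on t1
```

This is why intervals beat pure group commit: t2 and t3 were not in the same commit group as each other's predecessors, yet their overlapping intervals still let them run in parallel.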
SkySQL is the first and only database-as-a-service (DBaaS) to perform workload analysis with advanced deep learning models, identifying and classifying discrete workload patterns so DBAs can better understand database workloads, identify anomalies and predict changes.
In this session, we’ll explain the concepts behind workload analysis and show how it can be used in the real world (and with sample real-world data) to improve database performance and efficiency by identifying key metrics and changes to cyclical patterns.
SkySQL uses best-of-breed software, and when it comes to metrics and monitoring that means Prometheus and Grafana. SkySQL Monitor is built on both, and provides customers with interactive dashboards for both real-time and historic metrics monitoring. In addition, it meets the same high availability and security requirements as other SkySQL components, ensuring metrics are always available and always secure.
In this session, we’ll explain how SkySQL Monitor works, walk through its dashboards and show how to monitor key metrics for performance and replication.
Introducing the R2DBC async Java connector (MariaDB plc)
Not too long ago, a reactive variant of the JDBC driver was released, known as Reactive Relational Database Connectivity (R2DBC for short). While R2DBC started as an experiment to enable integration of SQL databases into systems that use reactive programming models, it now specifies a full-fledged service-provider interface that can be used to retrieve data from a target data source.
In this session, we’ll take a look at the new MariaDB R2DBC connector and examine the advantages of fully reactive, non-blocking development with MariaDB. And, of course, we’ll dive in and get a first-hand look at what it’s like to use the new connector with some live coding!
The capabilities and features of MariaDB Platform continue to expand, resulting in larger and more sophisticated production deployments and the need for better tools. To provide DBAs with comprehensive, consolidated tooling, we created MariaDB Enterprise Tools: an easy-to-use, modular command-line interface for interacting with any part of MariaDB Platform.
In this session, we will provide a preview of the MariaDB Enterprise Client, walk through current and planned modules and discuss future plans for MariaDB Enterprise Tools – including SkySQL modules and the ability to create custom modules.
Faster, better, stronger: The new InnoDB (MariaDB plc)
For MariaDB Enterprise Server 10.5, the default transactional storage engine, InnoDB, has been significantly rewritten to improve the performance of writes and backups. Next, we removed a number of parameters to reduce unnecessary complexity, not only in terms of configuration but of the code itself. And finally, we improved crash recovery thanks to better consistency checks and we reduced memory consumption and file I/O thanks to an all new log record format.
In this session, we’ll walk through all of the improvements to InnoDB, and dive deep into the implementation to explain how these improvements help everything from configuration and performance to reliability and recovery.
SkySQL implements a groundbreaking, state-of-the-art architecture based on Kubernetes and ServiceNow, and with a strong emphasis on cloud security – using compartmentalization and indirect access to secure and protect customer databases.
In this session, we’ll walk through the architecture of SkySQL and discuss how MariaDB leverages an advanced Kubernetes operator and powerful ServiceNow configuration/workflow management to deploy and manage databases on cloud infrastructure.
What to expect from MariaDB Platform X5, part 1 (MariaDB plc)
MariaDB Platform X5 will be based on MariaDB Enterprise Server 10.5. This release includes Xpand, a fully distributed storage engine for scaling out, as well as many new features and improvements for DBAs and developers alike, including enhancements to temporal tables, additional JSON functions, a new performance schema, non-blocking schema changes with clustering and a Hashicorp Vault plugin for key management.
In this session, we’ll walk through all of the new features and enhancements available in MariaDB Enterprise Server 10.5. In addition, we will highlight those being backported to maintenance releases of MariaDB Enterprise Server 10.2, 10.3 and 10.4.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
PHP Frameworks: I want to break free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...) (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes work: it takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Key Trends Shaping the Future of Infrastructure (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio, using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
2. High Availability - HA
High availability (HA) is a characteristic of a system which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.
https://en.wikipedia.org/wiki/High_availability
3. RPO / RTO
[Diagram: RPO/RTO timeline, https://upload.wikimedia.org/wikipedia/commons/6/69/RPO_RTO_example_converted.png]
Recovery Time Objective
The Recovery Time Objective (RTO) is the targeted duration of time and a service level within which a business process must be restored after a disruption in order to avoid a break in business continuity. According to business continuity planning methodology, the RTO is established during the Business Impact Analysis (BIA) by the owner(s) of the process, including identifying time frames for alternate or manual workarounds.
Recovery Point Objective
A Recovery Point Objective (RPO) is the maximum acceptable interval during which transactional data is lost from an IT service. For example, if the RPO is measured in minutes, then in practice off-site mirrored backups must be continuously maintained; a daily off-site backup will not suffice.
https://en.wikipedia.org/wiki/Disaster_recovery
5. Architecture - Single Node
[Diagram: Your Applications → MariaDB Primary (r/w)]
Single Node Setup
● No failover option
● Backup / Restore is key
● RPO / RTO define the SLA
6. Architecture - Primary / Replica Setup
[Diagram: Your Applications → MariaDB Primary (r/w), replicating to MariaDB Replica]
Primary / Replica Node Setup
● "Manual" failover to Slave
● Asynchronous Replication
● Semi-synchronous Replication
● "Passive" Hardware
● Manual failover process defines the SLA
● Backup process can run on Slave
7. Architecture - Primary / Replica Setup
[Diagram: Your Applications → MariaDB Primary (r/w), replicating to two MariaDB Replicas]
Primary / Replica Node Setup
● "Manual" failover to Slave
● Asynchronous Replication
● Semi-synchronous Replication
● Galera Cluster
● "Passive" Hardware
● Manual failover process defines the SLA
● Backup process can run on Slave
8. Architecture for High Availability with MaxScale
[Diagram: Your Applications → MaxScale, routing r/w to MariaDB Primary and r to two MariaDB Replicas]
MariaDB MaxScale is an advanced SQL firewall, proxy, router, and load balancer:
• MaxScale performs automated failover for MariaDB replication.
• MaxScale's ReadWriteSplit router performs query-based load balancing.
• MaxScale's Cache filter can improve SELECT performance by caching and reusing results.
• MaxScale can filter data via Data Masking, with defined patterns.
• MaxScale also helps to avoid downtime or hiccups with:
  - Upgrades and patches
  - Adding nodes
  - DoS attacks
  - SQL injection
  - Security violations
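As a concrete illustration of the ReadWriteSplit router and Cache filter called out above, here is a minimal maxscale.cnf sketch. The section names, credentials, addresses, and ports are placeholders invented for this example, not values from the deck:

```ini
# Hypothetical maxscale.cnf fragment: one read/write-split service
# with a result cache in front of three MariaDB servers.
[server1]
type=server
address=192.168.0.11
port=3306

[Read-Write-Service]
type=service
router=readwritesplit
servers=server1,server2,server3
user=maxuser
password=maxpwd
filters=CacheFilter

[CacheFilter]
type=filter
module=cache
hard_ttl=30s

[Read-Write-Listener]
type=listener
service=Read-Write-Service
port=4006
```

Applications then connect to port 4006; MaxScale sends writes to the primary and balances reads across the replicas.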
9. Architecture for High Availability in SkySQL
[Diagram: Your Applications → MaxScale → MariaDB Primary (r/w, Availability Zone 1) and MariaDB Replica (r, Availability Zone 2)]
MariaDB SkySQL SLA
SkySQL Foundation Tier (Performance Standard)
● Multi-node configurations will deliver a 99.95% service availability on a per-billing-month basis.
● For example, with this availability target in a 30-day calendar month, the maximum service downtime is 21 minutes and 54 seconds.
SkySQL Power Tier
● Multi-node configurations will deliver a 99.995% service availability on a per-billing-month basis.
● For example, with this availability target in a 30-day calendar month, the maximum service downtime is 2 minutes and 11 seconds.
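The downtime figures follow directly from the availability percentages. A quick sketch, assuming an average month of 730 hours (43,800 minutes), which reproduces the numbers quoted above:

```python
# Convert an availability SLA into the maximum allowed downtime per month.
# Assumes an average month of 730 hours (365 * 24 / 12) = 43,800 minutes.
AVG_MONTH_MINUTES = 730 * 60

def max_downtime(availability):
    """Return (minutes, seconds) of allowed downtime per average month."""
    downtime_min = (1 - availability) * AVG_MONTH_MINUTES
    minutes = int(downtime_min)
    seconds = round((downtime_min - minutes) * 60)
    return minutes, seconds

print(max_downtime(0.9995))   # Foundation tier -> (21, 54)
print(max_downtime(0.99995))  # Power tier      -> (2, 11)
```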
12. Traditional Setup
● Prior to MaxScale 2.5, MaxScale HA required manual intervention.
● While all MaxScale nodes can route queries, perform read/write splitting, and handle other operations, only the "active" MaxScale node (PASSIVE = false) could perform automatic failovers.
● If the "active" MaxScale goes down, one of the remaining MaxScale nodes needed to be set to "PASSIVE = false" so that that particular node could handle automatic failover.
● This was usually done with the help of third-party tools such as:
  ○ keepalived
  ○ corosync/pacemaker
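For context, a typical keepalived setup floats a virtual IP between the two MaxScale hosts. A minimal, illustrative sketch; the interface name, VIP, and priorities are placeholders, not values from the deck:

```conf
# Hypothetical /etc/keepalived/keepalived.conf on the "active" MaxScale host.
vrrp_instance MAXSCALE_VIP {
    state MASTER          # BACKUP on the standby MaxScale host
    interface eth0
    virtual_router_id 51
    priority 150          # lower (e.g. 100) on the standby
    advert_int 1
    virtual_ipaddress {
        192.168.0.100     # applications connect to this VIP
    }
}
```

A notify script would additionally have to flip MaxScale's passive parameter on failover, which is exactly the operational complexity cooperative locking removes.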
13. Typical Recommended Architecture (Traditionally)
13
MaxScale
MaxScale
1
Active
Primary Replica-1
MaxScale
MaxScale
2
Passive
Replica-2
Replication
● Can’t have both MaxScale doing database
Failover
● Must use 3rd Party tools such as KeepaliveD to
control which is the “Active” MaxScale
● Issues for support in case of KeepaliveD failure
● Complex Configuration
● Only One MaxScale can be used for Query
routing
KeepaliveD
Virtual IP
14. Why "Cooperative Locking"?
● Starting with MaxScale 2.5, cooperative locking was introduced.
● Multiple MaxScale nodes can work together without the need for any third-party component(s).
● The MaxScale nodes seamlessly decide which one is the primary MaxScale and which are not.
  ○ This is done by a special locking mechanism.
● The primary MaxScale handles the MariaDB failover.
● Two modes to choose from:
  ○ majority_of_running
  ○ majority_of_all
15. cooperative_monitoring_locks (maxscale.cnf)
majority_of_running
● Default in SkySQL if the customer goes for a dual-MaxScale setup.
● The MaxScale node that holds the maximum number of locks becomes the primary.
● In this mode, only the "Running" MariaDB nodes are counted; nodes that are down are excluded.
● The number of locks required is a strict majority of the alive servers:
  ○ locks required = floor(n_servers / 2) + 1
  ○ "n_servers" is the total number of alive servers in the cluster
  ○ Consider a 3-node cluster:
    ■ All 3 nodes alive: floor(3/2) + 1 = 2
    ■ 1 node down: floor(2/2) + 1 = 2
    ■ 2 nodes down: floor(1/2) + 1 = 1
● This tolerates more node failures while still being able to perform automatic MariaDB failover.
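The lock arithmetic above can be sketched in a few lines; the "rounding" in the slides amounts to integer division plus one, which reproduces the slide's numbers:

```python
# Lock threshold used by cooperative_monitoring_locks:
# a strict majority, floor(n_servers / 2) + 1.
def locks_required(n_servers):
    """Strict majority of the servers being counted."""
    return n_servers // 2 + 1

# majority_of_running counts only the servers that are still up,
# so the threshold shrinks as nodes fail:
for alive in (3, 2, 1):
    print(f"{alive} alive -> {locks_required(alive)} locks required")
```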
16. majority_of_running
[Diagram: Primary DC (MaxScale 1, Primary) and DR DC (MaxScale 2, Replica-1), async replication; one DB node is down]
● One node goes down; the minimum number of DB locks required drops to "2", which can still be achieved.
● MaxScale 1 is "primary".
● Automatic DB failover remains activated.
17. majority_of_running
[Diagram: Primary DC and DR DC, async replication; two DB nodes are down]
● 2 nodes go down; the minimum number of DB locks required drops to "1", which can still be achieved.
● MaxScale 1 is still the "primary" MaxScale.
● Automatic DB failover remains activated.
18. majority_of_running
[Diagram: the entire Primary DC is down; only the DR DC is still running]
● The entire data center goes down.
● The minimum number of DB locks required drops to "1", which can still be achieved.
● MaxScale 3 becomes "primary".
● Automatic DB failover remains activated.
19. cooperative_monitoring_locks (maxscale.cnf)
majority_of_running
● Can cause split-brain (multiple MaxScale nodes becoming primary!)
  ○ Consider a Primary / DR setup.
  ○ In case of a network partition between the two data centers, the MaxScale nodes in each data center will become "primary", as they can't see the DB nodes on the other side.
  ○ This leads to two "Primary" MariaDB servers, one running in each data center!
  ○ An unlikely scenario, but keep it in mind.
20. majority_of_running
[Diagram: Primary DC (MaxScale 1, Primary, Replica-1) and DR DC (MaxScale 2, replica), async replication; the network link between the DCs is lost]
● The network between the two data centers is LOST.
● The MaxScale nodes can only see the DB nodes within their own data center.
● The "majority_of_running" rule applies, so the minimum number of locks required drops to 2 in the DC and to 1 in the DR.
● Split-brain! We now have two "primary" MaxScale nodes!
● The new "primary" MaxScale node in the DR promotes one of the replicas to "Primary DB".
● Two primary DB nodes running, one in each DC, creating data inconsistency!
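The partition scenario above can be sketched numerically. Assuming, per the slide, 2 DB nodes visible in the primary DC and 1 in the DR after the link drops:

```python
# Split-brain under majority_of_running: after a network partition,
# each side computes its threshold over only the DB nodes it can see.
def locks_required(n_visible):
    return n_visible // 2 + 1  # strict majority of visible servers

dc_visible = 2   # Primary + Replica-1 in the primary DC
dr_visible = 1   # the lone replica in the DR DC

# Each side can lock every node it still sees, so BOTH sides reach
# their own (reduced) threshold and declare a primary MaxScale:
print(dc_visible >= locks_required(dc_visible))  # True
print(dr_visible >= locks_required(dr_visible))  # True
# The DR MaxScale then promotes its replica: two Primary DB nodes.
```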
21. cooperative_monitoring_locks (maxscale.cnf)
majority_of_all
● In this mode, all the nodes are counted, whether running or not.
● The MaxScale node that holds the maximum number of locks becomes the primary.
● The number of locks required is a strict majority of all servers:
  ○ locks required = floor(n_servers / 2) + 1
  ○ "n_servers" is the total number of MariaDB servers in the cluster
  ○ For example:
    ■ 3-node setup: locks required = floor(3/2) + 1 = 2
    ■ 7-node setup: locks required = floor(7/2) + 1 = 4
● If too many MariaDB nodes go down at the same time, none of the MaxScale nodes will be able to acquire the minimum number of locks required.
  ○ With a total of 3 backend servers, if 2 nodes go down, the minimum of 2 required locks can't be acquired.
  ○ No automatic failover.
  ○ A minimum of floor(n_servers/2) + 1 nodes must be alive for automatic failover to work.
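The trade-off versus majority_of_running is that the threshold is fixed. A short sketch for the 3-node example above:

```python
# Under majority_of_all the threshold is computed over ALL MariaDB
# servers in the cluster, not just the ones still running.
def locks_required(n_servers):
    return n_servers // 2 + 1  # strict majority

TOTAL_SERVERS = 3
threshold = locks_required(TOTAL_SERVERS)  # always 2 for a 3-node cluster

for alive in (3, 2, 1):
    ok = alive >= threshold
    print(f"{alive} alive -> automatic failover possible: {ok}")
```

With 2 of 3 nodes down, no MaxScale can reach the fixed threshold, so failover stops; in exchange, a partitioned minority side can never declare itself primary.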
22. majority_of_all
[Diagram: Primary DC (MaxScale 1, Primary, Replica-1) and DR DC (MaxScale 2, Replica-2), async replication]
● Locks required: floor(3/2) + 1 = 2.
● MaxScale 1 holds the most locks, so it becomes "primary".
● The other three MaxScale nodes are "secondary".
23. majority_of_all
[Diagram: Primary DC and DR DC (Replica-2), async replication; one DB node is down]
● One node goes down; the minimum of "2" DB locks required can still be achieved.
● MaxScale 1 is still "primary".
● It is possible for another MaxScale node to become primary, but only one.
24. majority_of_all
[Diagram: Primary DC and DR DC (Replica-2); two DB nodes are down]
● 2 nodes go down; the minimum of "2" DB locks required can no longer be achieved!
● All the MaxScale nodes become "secondary"; automatic failover is disabled.
26. majority_of_all
[Diagram: Primary DC (Primary, Replica-1) and DR DC (Replica-2, read-only); the network between the DCs is broken]
● The network between the two data centers is broken; the MaxScale nodes in the DC can each acquire 2 locks, which meets the minimum requirement of "2".
● The DC MaxScale can still perform automatic failover.
● But the DR MaxScale can only get a lock on "1" node, so its automatic failover is disabled.
27. Architecture - Higher Availability Options
[Diagram: Your Applications → MaxScale in Datacenter 1 (MariaDB Primary r/w, two MariaDB Replicas r) and MaxScale in Datacenter 2 (two MariaDB Replicas r)]
28. MaxScale config_sync_cluster
When configuring MaxScale configuration synchronization for the first time, the same static configuration files should be used on all MaxScale instances: the value of "config_sync_cluster" must be the same on all MaxScale instances, and the cluster (i.e. the monitor) it points to, as well as that monitor's servers, must be the same in every configuration.
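Putting the pieces together, a minimal maxscale.cnf sketch combining cooperative locking and configuration sync might look like the following. Credentials and server names are placeholders, and the same values would have to appear on every MaxScale instance:

```ini
# Hypothetical maxscale.cnf fragment, identical on all MaxScale nodes.
[maxscale]
config_sync_cluster=MariaDB-Monitor
config_sync_user=maxuser
config_sync_password=maxpwd

[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=maxuser
password=maxpwd
auto_failover=true
auto_rejoin=true
cooperative_monitoring_locks=majority_of_all
```

Here "config_sync_cluster" names the monitor whose cluster the runtime configuration changes are synchronized through, and "cooperative_monitoring_locks" selects which majority rule the MaxScale nodes use to elect their primary.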
31. Xpand - the Distributed OLTP Database
● Transactional
● Distributed SQL
● Full Elasticity (↑↓)
● Read/Write Scale
32. Xpand - the Distributed OLTP Database
When you run a distributed database, you always think about:
● Data distribution
● Data replication
● Skewing
● Shared-nothing
● Distributed SQL
● Data locality
● GEO-distribution
● Read/write performance
● etc.
49. Serves Multiple Problem Domains
● High-volume, fast, parallel asynchronous replication
● Active-active topology
● Passive standby for disaster recovery
● Daisy-chain replication to multiple regions for global access