Planning for Disaster Recovery
with Galera Cluster
Colin Charles, colin.charles@galeracluster.com

29 October 2019

https://twitter.com/galeracluster | www.galeracluster.com 

Codership Webinar
Agenda
• Disasters happen

• Trade-offs

• A geo-distributed Galera Cluster

• Architecture

• A DR plan

• Is async the best solution for DR?

• Resources
Galera Cluster highlights
• We talk a lot about High Availability

• We talk a lot about multi-master replication

• Synchronous clusters that can ensure you’re always available

• Quorum based failure handling, optimistic concurrency control to commit

• Optimised for the cloud/Wide Area Networks (WANs)
However, how does all this work with Disaster
Recovery?
• Galera Cluster does support being run in multiple data centres

• Effectively you can have a 9-node Galera Cluster across 3 data centres to
keep you highly available

• Galera Cluster supports geo-distributed database clusters

• https://galeracluster.com/2015/07/geo-distributed-database-clusters-with-galera/
Benefits of a geo-distributed Galera Cluster
• Increased redundancy

• All database operations are local
(segmented)

• Network traffic is reduced across
DCs (with optimised bandwidth
consumption)

• Latency penalty kept as small as
possible (paid only when it is time to
COMMIT; the speed of light still applies)

• Flow control fully configurable 

• No split brain issues

• Out of the box encryption

• Can also work with asynchronous
replication
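One way the "local (segmented)" behaviour is usually configured is by giving every node in a data centre the same Galera segment number through the `gmcast.segment` provider option. The sketch below only renders the configuration line per node; the data centre names and IP addresses are made up for illustration, and a real deployment would normally carry other provider options as well.

```python
from typing import Dict, List

# Hypothetical layout: three data centres with three nodes each.
# The names and addresses are illustrative only.
DATA_CENTRES: Dict[str, List[str]] = {
    "dc1": ["10.0.1.1", "10.0.1.2", "10.0.1.3"],
    "dc2": ["10.0.2.1", "10.0.2.2", "10.0.2.3"],
    "dc3": ["10.0.3.1", "10.0.3.2", "10.0.3.3"],
}


def segment_option(segment: int) -> str:
    """Return the provider option that places a node in a given segment."""
    # gmcast.segment accepts values from 0 to 255.
    if not 0 <= segment <= 255:
        raise ValueError("gmcast.segment must be between 0 and 255")
    return f"gmcast.segment={segment}"


def render_config(data_centres: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each node address to the my.cnf line for its data centre's segment."""
    lines = {}
    # Sort by data centre name so segment numbers are assigned deterministically.
    for segment, (_, nodes) in enumerate(sorted(data_centres.items())):
        for node in nodes:
            lines[node] = f'wsrep_provider_options="{segment_option(segment)}"'
    return lines


config = render_config(DATA_CENTRES)
for node, line in config.items():
    print(node, line)
```

All nodes in the same data centre share a segment number, which is what lets the cluster treat traffic inside a data centre differently from traffic between them.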
So, architecture…
• If you’re doing 9 Galera Cluster nodes at the minimum, you also have to
have your application clusters in 3 DCs

• Sure, this is great for High Availability, but it gets costly after some time…

• You also have to ensure that your schema is planned sensibly. If you
have hot rows, frequent deadlocks, or little tolerance for performance issues
during rollbacks, this may not be the best solution for a busy application
that does a lot of UPDATEs
We are here to talk Disaster Recovery (DR)
• It is the ability to run your business continuously without any interruptions irrespective of any damage occurring to
your infrastructure 

• DR is definitely not cheap, but can you afford to lose business transactions? It is this “backup cost” that you need
to think about

• We’ve seen things inside the Linux kernel that can help with DR too, e.g. DRBD 

• Basically a good DR plan is your Business Continuity Plan (BCP)

• Disaster Recovery and Business Continuity, 3rd Edition by Thejandra BS

• Recovery Time Objective (RTO): the time-scale (in hours or days) within which recovery must be achieved, that is, the
length of time the organisation can afford to cease operating its business.

• Recovery Point Objective (RPO): the point in time to which an organisation should be able to recover; for example, it could be
stated as ‘Data can be recovered as of 9 pm last night’. It defines the amount of data that the organisation can afford to lose.

• You’re building resilience in your infrastructure
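As a rough illustration of how RTO and RPO can be checked after an incident or a DR drill, the sketch below compares the actual data-loss window and downtime against the two objectives. The timestamps, the 12-hour RPO and the 2-hour RTO are invented values, and `check_objectives` is a hypothetical helper, not part of any tool.

```python
from datetime import datetime, timedelta


def check_objectives(disaster_at, last_recoverable_at, restored_at, rpo, rto):
    """Compare what actually happened with the stated RPO and RTO."""
    # Data written after the last recoverable point is lost.
    data_loss = disaster_at - last_recoverable_at
    # Downtime runs from the disaster until service is restored.
    downtime = restored_at - disaster_at
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical drill: data recoverable "as of 9 pm last night",
# disaster at 3 am, service back at 7 am.
result = check_objectives(
    disaster_at=datetime(2019, 10, 29, 3, 0),
    last_recoverable_at=datetime(2019, 10, 28, 21, 0),
    restored_at=datetime(2019, 10, 29, 7, 0),
    rpo=timedelta(hours=12),
    rto=timedelta(hours=2),
)
print(result)
```

In this made-up drill the RPO is met (6 hours of data lost against a 12-hour objective) but the RTO is not (4 hours of downtime against a 2-hour objective).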
Cloud people understand resilience
• Cloud instances tend to be of varying quality

• Sometimes you spin up a poor instance. It is best to kill it and start another, which works as long as you know your baseline
benchmarks

• The Simian Army (by Netflix) can help make more resilient infrastructure 

• Includes Chaos Monkey, Latency Monkey, Chaos Gorilla (drops a whole AZ), etc.

• Even spawned a field, Chaos Engineering

• Chaos engineering is the discipline of experimenting on a software system in
production in order to build confidence in the system's capability to withstand turbulent
and unexpected conditions. (ref: https://en.wikipedia.org/wiki/Chaos_engineering)
What else do you need to think about?
• Keep track of the Mean Time to Recover (MTTR)

• Underrated is the Mean Time to Detect (MTTD): how long does it take before you know
a disaster has struck and can move your workloads?

• What is your SLA?
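A minimal sketch of how these numbers could be tracked from an incident log. The incidents, the 30-day window and the 99.9% SLA target are hypothetical, and MTTR is assumed here to be measured from the start of the failure to recovery (some teams measure it from detection instead).

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident log: (failure started, failure detected, service recovered).
INCIDENTS = [
    (datetime(2019, 10, 1, 10, 0), datetime(2019, 10, 1, 10, 5), datetime(2019, 10, 1, 10, 35)),
    (datetime(2019, 10, 15, 14, 0), datetime(2019, 10, 15, 14, 15), datetime(2019, 10, 15, 14, 55)),
]


def minutes(delta: timedelta) -> float:
    """Convert a timedelta to minutes."""
    return delta.total_seconds() / 60.0


def summarise(incidents, period: timedelta, sla_target: float) -> dict:
    """Compute MTTD, MTTR and availability over the period, and compare with the SLA."""
    mttd = mean(minutes(detected - started) for started, detected, _ in incidents)
    mttr = mean(minutes(recovered - started) for started, _, recovered in incidents)
    downtime = sum(minutes(recovered - started) for started, _, recovered in incidents)
    availability = 1.0 - downtime / minutes(period)
    return {
        "mttd_minutes": mttd,
        "mttr_minutes": mttr,
        "availability": availability,
        "sla_met": availability >= sla_target,
    }


report = summarise(INCIDENTS, period=timedelta(days=30), sla_target=0.999)
print(report)
```

With these invented incidents, 90 minutes of downtime in 30 days already misses a 99.9% target, which allows roughly 43 minutes.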
So what do you need in a plan?
• In terms of a Galera Cluster, you’re really thinking about ensuring you have
another data centre to take over

• You could already be running a 3-DC cluster… 

• But presumably, you’re planning for disaster recovery, likely via
asynchronous replication to another data centre (as it saves the cost of
having yet another DC)

• You also want to make sure all this is 100% fully automated…
You’ll have to think about your entire stack
• Beyond the database, you have to ensure that there will be quick DNS
switchover (so low TTL on your DNS)

• Application servers need to be running and ready to take on the load at
the other data centre

• If using a proxy, this too will have to be waiting at the other data centre

• So to mitigate a complete disaster AND have great performance, you
are going to want to create a replica of your setup at a remote site
Why async replication between data centres for
DR?
• Async replication in MySQL 5.7/8 is really quite fast (same with MariaDB
10.3/10.4)

• The idea of “lagging slaves” should not be too much of an issue… this
can be tuned and configured

• You must ask — is fully synchronous replication right for your application? 

• Callaghan’s Law: [In a Galera cluster] a given row can’t be modified more
than once per RTT. 
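Callaghan's Law puts a simple ceiling on how often a single hot row can be updated: at most once per round trip, so roughly 1/RTT updates per second. The sketch below just does that arithmetic; the RTT values are illustrative, not measurements.

```python
def max_updates_per_row_per_second(rtt_ms: float) -> float:
    """Upper bound on updates to one row per second, given the cluster RTT in milliseconds."""
    if rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    # One modification per round trip, and 1000 ms in a second.
    return 1000.0 / rtt_ms


# Hypothetical round-trip times between data centres.
for rtt_ms in (1, 5, 50, 100):
    limit = max_updates_per_row_per_second(rtt_ms)
    print(f"RTT {rtt_ms} ms: at most {limit:.0f} updates/s on a single row")
```

This bound applies to a single row; updates to different rows are not serialised against each other in the same way.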
A practical case study
• A more practical example, by Marco Tusa:
https://www.percona.com/blog/2018/11/15/how-not-to-do-mysql-high-availability-geographic-node-distribution-with-galera-based-replication-misuse/
and
https://www.percona.com/blog/2018/11/15/mysql-high-availability-on-premises-a-geographically-distributed-scenario/
Simple reasons…
• A Galera Cluster across 3 DCs is pricier
than the previous solution, but it gives
you data consistency across all nodes.
You do, however, need to ensure that your
application can take the commit time
penalties and that you have a
high-performance link for replication…

• The other approach is more focused on
“local commits” (just to your 3-node
cluster in one DC). You’ll see some data
state difference thanks to async
replication, but you don’t need a great
replication link, DR works, and this
also works better across geographies

• We always think about latencies: even 5 ms
may not seem high, but it actually is!

• We have to remember a Galera writeset
can be as small as a 1 row INSERT but
large with many UPDATEs too

• We have to think about IP frames

• In Galera, flow control is driven by the
receive queue. There is a queue of events, and
the longer this queue is, the longer
certification takes too.
All this doesn’t absolve you from other things…
• Like some kind of “automatic failover framework” when you go the async
route for DR

• A good backup and restore solution 

• A good rule based solution for load balancing (ProxySQL, MariaDB
MaxScale)
The Galera Arbitrator Daemon (garbd)
• If you have access to a 3rd data centre for garbd, or put a one-node garbd in
your DR site, you could also have a paired cluster across 2 DCs, thus bringing
your node count to a mere 7 (instead of 9)

• When you have an even number of nodes, garbd functions as an odd
node, to avoid split-brain situations. It can also request a consistent
application state snapshot, which helps with backups
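The effect of the arbitrator on quorum can be sketched with simple majority arithmetic. This is a simplified model: it assumes every member (garbd included) has equal weight, that a whole data centre disappears at once, and that the survivors stay primary only if they hold strictly more than half of the previous membership. The layouts and names are hypothetical.

```python
from typing import Mapping


def has_quorum(surviving: int, previous_total: int) -> bool:
    """Survivors keep quorum only with a strict majority of the previous membership."""
    return 2 * surviving > previous_total


def survives_dc_loss(members_per_dc: Mapping[str, int], lost_dc: str) -> bool:
    """Would the remaining members keep quorum if one data centre vanished?"""
    total = sum(members_per_dc.values())
    surviving = total - members_per_dc[lost_dc]
    return has_quorum(surviving, total)


# Hypothetical layouts; a count of 1 at "arbitrator-site" is a lone garbd.
two_dcs = {"dc1": 3, "dc2": 3}
with_third_site = {"dc1": 3, "dc2": 3, "arbitrator-site": 1}
garbd_in_dr = {"primary": 3, "dr": 4}  # 3 nodes plus garbd in the DR site

print(survives_dc_loss(two_dcs, "dc1"))          # 3 of 6 left: no quorum
print(survives_dc_loss(with_third_site, "dc1"))  # 4 of 7 left: quorum
print(survives_dc_loss(garbd_in_dr, "primary"))  # 4 of 7 left: quorum
print(survives_dc_loss(garbd_in_dr, "dr"))       # 3 of 7 left: no quorum
```

Under these assumptions, placing garbd in the DR site protects against losing the primary data centre but not against losing the DR site itself, whereas a third site covers the loss of either data centre.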
So what are your choices for ultimate DR?
• If you have the money, 3 data centres so you have synchronous clusters
with 9 Galera Cluster nodes… This is also in addition to your application
servers, proxies, etc.

• 2 data centres, 7 nodes, with the Galera Arbitrator is a possibility

• If you don’t have as much budget, consider the async replication option
between 2 DCs. Just remember all the “manual glue” you may need to go
with this!
• “The dread of a disaster makes everybody act in a way that increases the
disaster.” — Bertrand Russell
Some Galera Cluster specific resources
• https://galeracluster.com/library/documentation/managing-fc.html

• https://galeracluster.com/library/documentation/auto-eviction.html

• https://galeracluster.com/library/documentation/using-sr.html (Galera 4
new feature)

• https://galeracluster.com/library/documentation/backup-cluster.html

• https://galeracluster.com/library/training/tutorials/geo-distributed-clusters.html
Resources
• Disaster Recovery and Business Continuity, 3rd Edition by Thejandra BS

• Disaster Recovery, Crisis Response, and Business Continuity: A
Management Desk Reference by Jamie Watters

• Business Continuity and Disaster Recovery Planning for IT Professionals,
2nd Edition by Susan Snedaker

• Effective MySQL Backup and Recovery by Ronald Bradford
Questions?
Colin Charles, colin.charles@galeracluster.com

https://twitter.com/galeracluster | www.galeracluster.com
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 

Planning for Disaster Recovery (DR) with Galera Cluster

  • 1. Planning for Disaster Recovery with Galera Cluster Colin Charles, colin.charles@galeracluster.com 29 October 2019 https://twitter.com/galeracluster | www.galeracluster.com Codership Webinar
  • 2. Agenda • Disasters happen • Trade-offs • A geo-distributed Galera Cluster • Architecture • A DR plan • Is async the best solution for DR? • Resources
  • 3.
  • 4.
  • 5.
  • 6. Galera Cluster highlights • We talk a lot about High Availability • We talk a lot about multi-master replication • Synchronous clusters that can ensure you’re always available • Quorum based failure handling, optimistic concurrency control to commit • Optimised for the cloud/Wide Area Networks (WANs)
  • 7.
  • 8. However, how does all this work with Disaster Recovery? • Galera Cluster does support being run in multiple data centres • Effectively you can have a 9-node Galera Cluster across 3 data centres to keep you highly available • Galera Cluster supports geo-distributed database clusters • https://galeracluster.com/2015/07/geo-distributed-database-clusters-with-galera/
  • 9.
  • 10. Benefits of a geo-distributed Galera Cluster • Increased redundancy • All database operations are local (segmented) • Network traffic is reduced across DCs (with optimised bandwidth consumption) • Latency penalty kept as small as possible (you pay it when it is time to COMMIT: hello speed of light, et al) • Flow control fully configurable • No split-brain issues • Out-of-the-box encryption • Can also work with asynchronous replication
  • 11. So, architecture… • If you’re doing 9 Galera Cluster nodes at the minimum, you also have to have your application clusters in 3 DCs • Sure, this is great for High Availability, but it gets costly after some time… • You also have to ensure that your schema is planned sensibly. After all, if you have hot rows, deadlocks, and little tolerance for performance issues during rollbacks, this may not be the best solution for a busy application that does a lot of UPDATEs
  • 12. We are here to talk Disaster Recovery (DR) • It is the ability to run your business continuously without any interruptions irrespective of any damage occurring to your infrastructure • DR is definitely not cheap, but can you afford to lose business transactions? It is this “backup cost” that you need to think about • We’ve seen things inside the Linux kernel that can help with DR too, e.g. DRBD • Basically a good DR plan is your Business Continuity Plan (BCP) • Disaster Recovery and Business Continuity, 3rd Edition by Thejendra BS • Recovery Time Objective (RTO): the time-scale (in hours or days) within which recovery must be achieved, that is, the length of time the organisation can afford to cease operating its business. • Recovery Point Objective (RPO): the point in time to which an organisation should be able to recover, for example, it could be stated as ‘Data can be recovered as of 9 pm last night’. It defines the amount of data that the organisation can afford to lose. • You’re building resilience in your infrastructure
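The RTO and RPO definitions above can be turned into a simple check against a single incident. The following is a minimal sketch, not part of the original slides: the function name `meets_objectives`, the timestamps, and the 4-hour RPO / 6-hour RTO targets are all hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta


def meets_objectives(last_recovery_point, disaster_at, service_restored_at, rpo, rto):
    """Compare one incident against the agreed RPO and RTO.

    Returns (data_loss_window, downtime, rpo_ok, rto_ok).
    """
    # Everything written after the last recoverable point is lost.
    data_loss_window = disaster_at - last_recovery_point
    # Downtime runs from the disaster until service is back.
    downtime = service_restored_at - disaster_at
    return data_loss_window, downtime, data_loss_window <= rpo, downtime <= rto


# Hypothetical incident: data is recoverable "as of 9 pm last night".
last_recovery_point = datetime(2019, 10, 28, 21, 0)
disaster_at = datetime(2019, 10, 29, 3, 30)
service_restored_at = datetime(2019, 10, 29, 7, 30)

data_loss, downtime, rpo_ok, rto_ok = meets_objectives(
    last_recovery_point,
    disaster_at,
    service_restored_at,
    rpo=timedelta(hours=4),  # assumed target
    rto=timedelta(hours=6),  # assumed target
)
print(f"data loss window: {data_loss}, RPO met: {rpo_ok}")
print(f"downtime: {downtime}, RTO met: {rto_ok}")
```

In this made-up incident the downtime fits the RTO, but a nightly recovery point cannot satisfy a 4-hour RPO, which is the kind of gap that pushes a plan towards replication rather than backups alone.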
  • 13. Cloud people understand resilience • Cloud instances tend to be of varying quality • Sometimes you spin up a poor instance. Best to kill/restart, as long as you know baseline benchmarks • The Simian Army (by Netflix) can help make more resilient infrastructure • Includes Chaos Monkey, Latency Monkey, Chaos Gorilla (drops a whole AZ), etc. • Even spawned a field, Chaos Engineering • Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system's capability to withstand turbulent and unexpected conditions. (ref: https://en.wikipedia.org/wiki/Chaos_engineering)
  • 14. What else do you need to think about? • Keep track of the Mean Time to Recover (MTTR) • Underrated is the Mean Time to Detect (MTTD): how long does it take you to know that a disaster has struck and that you need to move your workloads? • What is your SLA?
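As a rough illustration of tracking these two numbers, the sketch below computes MTTD and MTTR from an incident log. The incident timestamps are invented, and it assumes both metrics are measured from the moment the incident started (some teams measure MTTR from detection instead).

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident log: (started, detected, recovered).
incidents = [
    (datetime(2019, 1, 5, 2, 0), datetime(2019, 1, 5, 2, 10), datetime(2019, 1, 5, 3, 0)),
    (datetime(2019, 4, 12, 14, 0), datetime(2019, 4, 12, 14, 30), datetime(2019, 4, 12, 16, 0)),
]


def mean_minutes(pairs):
    """Average duration in minutes over (start, end) pairs."""
    return mean((end - start).total_seconds() / 60 for start, end in pairs)


# Assumption: both metrics are measured from the start of the incident.
mttd = mean_minutes((started, detected) for started, detected, _ in incidents)
mttr = mean_minutes((started, recovered) for started, _, recovered in incidents)

print(f"MTTD: {mttd:.0f} min, MTTR: {mttr:.0f} min")
```

Comparing the two shows how much of the recovery time is spent simply noticing the problem, which is the part monitoring and alerting can shorten.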
  • 15. So what do you need in a plan? • In terms of a Galera Cluster, you’re really thinking about ensuring you have another data centre to take over • You could already be running a 3-DC cluster… • But presumably, you’re planning for disaster recovery, likely via asynchronous replication to another data centre (as it saves the cost of having yet another DC) • You also want to make sure all this is 100% fully automated…
  • 16. You’ll have to think about your entire stack • Beyond the database, you have to ensure that there will be quick DNS switchover (so low TTL on your DNS) • Application servers need to be running and ready to take on the load at the other data centre • If using a proxy, this too will have to be waiting at the other data centre • So to mitigate a complete disaster AND have great performance, you are going to want to create a replica of your setup at a remote site
  • 17.
  • 18. Why async replication between data centres for DR? • Async replication in MySQL 5.7/8 is really quite fast (same with MariaDB 10.3/10.4) • The idea of “lagging slaves” should not be too much of an issue… this can be tuned and configured • You must ask: is fully synchronous replication right for your application? • Callaghan’s Law: [In a Galera cluster] a given row can’t be modified more than once per RTT.
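Callaghan's Law translates directly into an upper bound on how often a single hot row can be updated: at most once per round trip, so roughly 1/RTT updates per second. The sketch below works through that arithmetic; the link labels and RTT figures are illustrative assumptions, not measurements.

```python
def max_row_updates_per_second(rtt_ms):
    """Upper bound on updates to one row per second, given the RTT in milliseconds.

    Follows from "a given row can't be modified more than once per RTT".
    """
    if rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    return 1000.0 / rtt_ms


# Hypothetical round-trip times between the farthest nodes in the cluster.
for label, rtt_ms in [("same DC", 0.5), ("nearby DCs", 5), ("cross-continent", 100)]:
    bound = max_row_updates_per_second(rtt_ms)
    print(f"{label:>16}: RTT {rtt_ms} ms -> at most {bound:.0f} updates/s on one row")
```

This bound applies per row, not to the cluster as a whole: workloads that spread writes over many rows are far less affected than ones that hammer a single counter or status row.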
  • 19. A practical case study • A more practical example, by Marco Tusa: https://www.percona.com/blog/2018/11/15/how-not-to-do-mysql-high-availability-geographic-node-distribution-with-galera-based-replication-misuse/ AND https://www.percona.com/blog/2018/11/15/mysql-high-availability-on-premises-a-geographically-distributed-scenario/
  • 20. Simple reasons… • A Galera Cluster across 3 DCs is pricier than the previous solution, and it gives you data consistency across all nodes. You do however need to ensure that your application can take the commit time penalties and that you have a high-performance link for replication… • The other approach is more focused on “local commits” (just to your 3-node cluster in one DC): you’ll see some data state difference thanks to async replication, you don’t need a great replication link, DR works, and this also works better across geographies • We always think about latencies and assume even 5ms isn’t high, but it actually is! • We have to remember a Galera writeset can be as small as a 1-row INSERT but can also be large, with many UPDATEs • We have to think about IP frames • In Galera, flow control is driven by the receive queue. There is a queue of events and the longer this queue is, the longer certification takes too.
  • 21. All this doesn’t absolve you from other things… • Like some kind of “automatic failover framework” when you go the async route for DR • A good backup and restore solution • A good rule-based solution for load balancing (ProxySQL, MariaDB MaxScale)
  • 22. The Galera Arbitrator Daemon (garbd) • If you have access to a 3rd data centre, or put a one-node garbd in your DR site, you could also have a cluster split across 2 DCs (3 nodes each), thus bringing your node count to a mere 7 nodes (instead of 9) • When you have an even number of nodes, garbd functions as an odd node, to avoid split-brain situations. It can also request a consistent application state snapshot, which helps with backups
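To see why the arbitrator matters, it helps to count votes. The sketch below is a simplified model, not Galera's actual implementation: it assumes every member (data node or garbd) carries one vote, that a component needs strictly more than half of the votes to stay primary, and that garbd runs at a third site. The layout names and helper functions are hypothetical.

```python
def has_quorum(surviving_votes, total_votes):
    """Strict majority rule: more than half of all votes must survive."""
    return 2 * surviving_votes > total_votes


def survives_site_loss(layout, lost_site):
    """Would the remaining members keep quorum if one whole site disappeared?"""
    total = sum(layout.values())
    surviving = total - layout[lost_site]
    return has_quorum(surviving, total)


# Hypothetical layouts: site name -> number of voting members (one vote each).
layouts = {
    "2 DCs, 3+3 nodes": {"dc1": 3, "dc2": 3},
    "2 DCs, 3+3 nodes, garbd at a third site": {"dc1": 3, "dc2": 3, "arbitrator": 1},
    "3 DCs, 3+3+3 nodes": {"dc1": 3, "dc2": 3, "dc3": 3},
}

for name, layout in layouts.items():
    ok = survives_site_loss(layout, "dc1")
    print(f"{name}: keeps quorum after losing dc1 -> {ok}")
```

With 3+3 nodes, losing a data centre leaves exactly half of the votes, which is not a majority; the single extra vote from the arbitrator is what tips the surviving side to 4 of 7.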
  • 23. So what are your choices for ultimate DR? • If you have the money, 3 data centres so you have synchronous clusters with 9 Galera Cluster nodes… This is also in addition to your application servers, proxies, etc. • 2 data centres, 7 nodes, with the Galera Arbitrator is a possibility • If you don’t have as much budget, consider the async replication option between 2 DCs. Just remember all the “manual glue” you may need to go with this!
  • 24. • “The dread of a disaster makes everybody act in a way that increases the disaster.” — Bertrand Russell
  • 25. Some Galera Cluster specific resources • https://galeracluster.com/library/documentation/managing-fc.html • https://galeracluster.com/library/documentation/auto-eviction.html • https://galeracluster.com/library/documentation/using-sr.html (Galera 4 new feature) • https://galeracluster.com/library/documentation/backup-cluster.html • https://galeracluster.com/library/training/tutorials/geo-distributed-clusters.html
  • 26. Resources • Disaster Recovery and Business Continuity, 3rd Edition by Thejendra BS • Disaster Recovery, Crisis Response, and Business Continuity: A Management Desk Reference by Jamie Watters • Business Continuity and Disaster Recovery Planning for IT Professionals, 2nd Edition by Susan Snedaker • Effective MySQL Backup and Recovery by Ronald Bradford