There are many approaches to MySQL high availability, from traditional, loosely coupled database setups based on asynchronous replication to more modern, tightly coupled architectures based on synchronous replication. These offer varying degrees of protection, and DBAs almost always have to trade off high availability against cost.
In this webinar, we look at some of the most widely used HA alternatives in the MySQL world and discuss their pros and cons.
AGENDA
- HA - what is it?
- Caching layer
- HA solutions
• MySQL Replication
• MySQL Cluster
• Galera Cluster
• Hybrid Replication
- Proxy layer
• HAProxy
• MaxScale
• Elastic Load Balancer (AWS)
- Common issues
• Split brain scenarios
• GTID-based failover and Errant Transactions
Become a MySQL DBA - webinar series - slides: Which High Availability solution?
1. Copyright 2015 Severalnines AB
Designing HA for MySQL
July 28, 2015
Krzysztof Książek
Severalnines
krzysztof@severalnines.com
2. Copyright 2015 Severalnines AB
“Become a MySQL DBA” series
- We want to help all non-DBA people who have to take care of MySQL infrastructure
- Discuss the most common activities
- Share tips and good practices
- If you missed them, we’d like to encourage you to watch the replays of “Deep Dive Into How to Monitor Galera Cluster” and “Deciding on a relevant backup solution”
- http://www.slideshare.net/Severalnines/videos
3. Copyright 2015 Severalnines AB
Agenda
- HA - what is it?
- Caching layer
- HA solutions
- Proxy layer
- Common problems
4. Copyright 2015 Severalnines AB
High Availability - what is it about?
- It’s not enough to just build a database infrastructure; you have to keep it available
- Redundancy is your friend
- Automate as much of the failover process as possible
- Know your business requirements and then decide
- Higher availability = higher costs - you need to find a sweet spot
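To make the cost trade-off concrete, an availability target can be turned into a yearly downtime budget. The sketch below is plain arithmetic; the function name and the 365-day year are assumptions made for illustration and do not come from the slides.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year, for simplicity


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Each extra "nine" shrinks the budget by a factor of ten.
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {allowed_downtime_hours(target):.2f} h/year")
```

A 99.9% target, for instance, leaves roughly 8.76 hours of downtime per year, which is the kind of number to compare against the business requirements.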
6. Copyright 2015 Severalnines AB
Caching layer - why do I need it?
- Reduces the load on the database
- Works as a buffer between the application and the database tier
- Gives you some HA features, as it can serve data even if the database is not available
- It’s a must-have for any larger application - cheaper than a database and easier to scale
7. Copyright 2015 Severalnines AB
Read cache
- Memcached, Redis, Couchbase, you name it
- Database access is expensive and should be avoided - therefore we have a cache to handle reads
- Avoid a cache miss storm
- Serve outdated data, or wait for the refresh if you can’t serve stale data
- Refresh the cache by executing a query ONCE!
- If you can serve old results, you can partially hide issues
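The "refresh by executing a query ONCE" idea can be sketched as follows. This is a minimal in-process illustration, not a Memcached or Redis client: the class name `SingleRefreshCache` and the `loader` callback (which stands in for the database query) are hypothetical. When an entry expires, one caller runs the query; the others get the stale value if there is one, or wait if there is nothing to serve.

```python
import threading
import time


class SingleRefreshCache:
    """Read cache in which only one caller refreshes an expired key."""

    def __init__(self, loader, ttl_seconds):
        self._loader = loader        # hypothetical: function(key) -> value, runs the query
        self._ttl = ttl_seconds
        self._data = {}              # key -> (value, expires_at)
        self._refreshing = set()     # keys that some caller is refreshing right now
        self._cond = threading.Condition()

    def get(self, key):
        with self._cond:
            while True:
                entry = self._data.get(key)
                if entry and entry[1] > time.monotonic():
                    return entry[0]              # fresh hit
                if key in self._refreshing:
                    if entry:
                        return entry[0]          # serve outdated data
                    self._cond.wait()            # nothing to serve: wait for refresh
                    continue
                self._refreshing.add(key)        # this caller does the refresh
                break
        try:
            value = self._loader(key)            # the query runs outside the lock
        except Exception:
            with self._cond:
                self._refreshing.discard(key)
                self._cond.notify_all()          # let a waiter retry the refresh
            raise
        with self._cond:
            self._data[key] = (value, time.monotonic() + self._ttl)
            self._refreshing.discard(key)
            self._cond.notify_all()
        return value
```

With a shared cache such as Memcached or Redis, the same pattern would need a lock or marker key stored in the cache itself; that part is not shown here.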
8. Copyright 2015 Severalnines AB
Write cache
- Do not write directly to the database; use a persistent queue for that (Kinesis or RabbitMQ, for example)
- Helps you avoid overloading the data tier with writes
- You can define the exact number of workers to handle the writes
- Helps to minimize the impact of the database tier being unavailable by caching writes
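A fixed pool of workers draining a queue might look like the sketch below. Note the big assumption: Python's in-memory `queue.Queue` is used only as a stand-in and is not persistent, unlike Kinesis or RabbitMQ. The function names and the `apply_write` callback (which would execute the actual INSERT or UPDATE) are hypothetical.

```python
import queue
import threading


def run_write_workers(write_queue, apply_write, worker_count):
    """Start a fixed number of workers that drain the queue into the database."""

    def worker():
        while True:
            item = write_queue.get()
            if item is None:             # sentinel: stop this worker
                write_queue.task_done()
                return
            try:
                apply_write(item)        # hypothetical: e.g. execute an INSERT
            except Exception as error:
                # A real worker would retry or requeue; this sketch only reports.
                print(f"write failed: {error}")
            finally:
                write_queue.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    return threads


def stop_write_workers(write_queue, threads):
    """Send one sentinel per worker and wait for all of them to exit."""
    for _ in threads:
        write_queue.put(None)
    for thread in threads:
        thread.join()
```

The `worker_count` argument is what caps the write concurrency the database tier sees, regardless of how fast the application produces writes.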
10. Copyright 2015 Severalnines AB
Distributed Replicated Block Device (DRBD)
- RAID 1 over TCP
- Maintains an exact copy (or two) of your volume in a separate location
- Active-passive model, only one volume can be mounted
- Works nicely if you have a single database node
11. Copyright 2015 Severalnines AB
Distributed Replicated Block Device (DRBD)
- Passive -> active switch takes time (InnoDB recovery is required)
- You can’t use the passive node for anything
- Just a single node, not feasible for larger environments
- Great tool but with limited use cases
12. Copyright 2015 Severalnines AB
MySQL Replication
- Best-known HA solution for MySQL
- You can use it for scaling too
- Asynchronous by default
- Failover process may be tricky without GTID
- Many tools are available to automate it, though
13. Copyright 2015 Severalnines AB
MySQL Replication - failover
- Locate the most advanced slave, apply missing updates from the master’s binary logs (if needed and possible)
- Ensure all slaves are at the same position for reslaving
- Reslave the rest of the slaves to the chosen node
- Perform the failover
- The whole process is error-prone and tricky
- You can use MHA to manage it for you
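The first step, locating the most advanced slave, can be sketched as a comparison of binary log coordinates. The sketch assumes you have already collected the `Master_Log_File` and `Read_Master_Log_Pos` fields of `SHOW SLAVE STATUS` from each slave into a dictionary, and that binlog file names end in a numeric suffix (for example `mysql-bin.000012`). The function names and host names are hypothetical.

```python
def binlog_coordinates(status):
    """Turn SHOW SLAVE STATUS fields into a comparable (file_number, position)."""
    file_name = status["Master_Log_File"]           # e.g. "mysql-bin.000012"
    file_number = int(file_name.rsplit(".", 1)[1])  # assumes a numeric suffix
    return (file_number, int(status["Read_Master_Log_Pos"]))


def most_advanced_slave(statuses):
    """Return the host whose I/O thread has read furthest into the master's binlog."""
    if not statuses:
        raise ValueError("no slaves to choose from")
    return max(statuses, key=lambda host: binlog_coordinates(statuses[host]))


if __name__ == "__main__":
    # Hypothetical values, as they might be collected from three slaves.
    statuses = {
        "slave1": {"Master_Log_File": "mysql-bin.000012", "Read_Master_Log_Pos": 4021},
        "slave2": {"Master_Log_File": "mysql-bin.000013", "Read_Master_Log_Pos": 120},
        "slave3": {"Master_Log_File": "mysql-bin.000012", "Read_Master_Log_Pos": 98000},
    }
    print(most_advanced_slave(statuses))
```

The file number is compared before the position, so a slave on a newer binlog file wins even if its offset within that file is small. This covers only the selection step; applying missing events and reslaving are not shown.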
14. Copyright 2015 Severalnines AB
MySQL Replication - GTID
- Reslaving became much easier (CHANGE MASTER TO … MASTER_AUTO_POSITION=1)
- You still have to choose the most advanced slave to promote
- You still have to replay binary logs (if possible) to apply missing changes
- You may still benefit from additional tooling
15. Copyright 2015 Severalnines AB
MySQL Replication - MHA
- Handles the failover process for you
- Can be used as a standalone solution or as part of a larger scheme (Pacemaker)
- masterha_manager is a SPOF - you need to monitor it
- Can work with GTID and regular replication
- Make sure that you have shutdown_script defined for STONITH in the MHA config
17. Copyright 2015 Severalnines AB
MySQL Cluster
- Synchronous cluster
- Based on the NDB engine (not InnoDB!)
- Great point-select performance
- Great insert performance
- Uses data partitioning for redundancy
- Behaves differently than InnoDB - especially for range queries
- It’s not a drop-in replacement for regular MySQL
18. Copyright 2015 Severalnines AB
Galera Cluster
- Virtually synchronous cluster (lag <1s, typically a few ms)
- Doesn’t split the data; each node holds a full copy
- Harder to scale, as you can’t increase disk capacity by adding new nodes
- Easier to run reporting queries, as all data is available on every node
19. Copyright 2015 Severalnines AB
Galera Cluster
- "Almost" a drop-in replacement for regular MySQL (uses the InnoDB engine)
- Different AUTO_INCREMENT handling
- All tables should have a primary key defined
- Basically, it’s InnoDB only (avoid MyISAM)
- Large transactions may be problematic or impossible
- Schema changes may become more complex
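The "different AUTO_INCREMENT handling" point is easiest to see with numbers. By default Galera adjusts MySQL's `auto_increment_increment` and `auto_increment_offset` variables per node, so each node hands out its own interleaved series of values. The sketch below only simulates that arithmetic, assuming a three-node cluster with offsets 1, 2 and 3; the function name is hypothetical.

```python
from itertools import islice


def auto_increment_values(offset, increment):
    """Yield the AUTO_INCREMENT values one node would hand out."""
    value = offset
    while True:
        yield value
        value += increment


if __name__ == "__main__":
    cluster_size = 3  # assumption: three nodes, so increment = 3
    for node_offset in range(1, cluster_size + 1):
        ids = list(islice(auto_increment_values(node_offset, cluster_size), 4))
        print(f"node with offset {node_offset}: {ids}")
```

The practical consequence is that generated IDs are unique across nodes but neither consecutive nor ordered by insert time, so applications should not rely on gap-free sequences.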
21. Copyright 2015 Severalnines AB
Proxy layer - why do we need it?
- It’s a set of servers that works as a middleman between the application and the database layer
- They route the traffic to the database layer
- They should detect failed instances and topology changes
- It’s useful for hiding the database layer’s complexity from the application
22. Copyright 2015 Severalnines AB
HAProxy
- Popular and proven tool
- Not database-aware, it just moves packets
- Can check if the port is available
- Can do HTTP checks - very useful for building custom check logic
23. Copyright 2015 Severalnines AB
HAProxy
- Use the read_only variable to differentiate the master from the slaves
- Have a script that runs in the background, checks the state of a node and stores it in shared memory
- Have a script that is executed via xinetd, reads the state from shared memory and returns HTTP codes (200, 503) accordingly
- Make sure that the read_only variable is changed only after the old master has been stopped (by shutdown_script in MHA) - otherwise split brain may happen
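The second script, the one that answers HAProxy's HTTP check, could be sketched like this. It assumes the background script has already produced a small state record with `alive` and `read_only` flags; how that record is stored and read (shared memory in the slides) is left out, and the function name and state layout are hypothetical.

```python
def health_response(state, role_wanted):
    """Build the raw HTTP reply for an HAProxy HTTP check.

    state: dict with boolean "alive" and "read_only" keys (hypothetical layout)
    role_wanted: "master" for the write backend, "slave" for the read backend
    """
    if role_wanted not in ("master", "slave"):
        raise ValueError("role_wanted must be 'master' or 'slave'")
    if not state.get("alive"):
        ok = False
    elif role_wanted == "master":
        ok = not state["read_only"]      # the master is the writable node
    else:
        ok = state["read_only"]          # slaves run with read_only enabled
    status = "200 OK" if ok else "503 Service Unavailable"
    body = f"{role_wanted}: {'up' if ok else 'down'}\r\n"
    return (f"HTTP/1.1 {status}\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n"
            "\r\n" + body)


if __name__ == "__main__":
    # When run via xinetd, the reply is simply written to standard output.
    print(health_response({"alive": True, "read_only": False}, "master"), end="")
```

Keeping the check itself free of database calls, as the slides suggest, means a slow or hung MySQL cannot stall the response to HAProxy.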
24. Copyright 2015 Severalnines AB
MaxScale
- MySQL-aware proxy
- Read/write splitting
- Still new software, requires detailed testing
- Tends to use a significant amount of CPU for R/W split
- Updated frequently, you may want to follow the changes
25. Copyright 2015 Severalnines AB
HA for proxies
- Proxies need to be highly available too
- Multiple options to choose from:
• Put a proxy in front of the proxy (ELB)
• VIP + failover (e.g. keepalived)
• Colocate proxies with web nodes and handle config changes via orchestration tools
26. Copyright 2015 Severalnines AB
HA for proxies
- Every approach has pros and cons
- ELB - the easiest option, but available only in AWS (similar tools may be available from other cloud providers too)
- VIP - only one proxy node is active at a given time, so CPU utilization may become an issue
- Colocating proxies allows you to use more of them, but maintaining the configuration can become a burden and may be error-prone
28. Copyright 2015 Severalnines AB
Errant transactions in GTID
- Errant transactions are transactions executed on the slave only, not on the master
- With GTID, all transactions executed on a given host will be requested by its slaves once the host becomes a master
- If those transactions are no longer available in the binlogs, replication will break
- http://www.severalnines.com/blog/mysql-replication-and-gtid-based-failover-deep-dive-errant-transactions
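Detecting errant transactions comes down to subtracting the master's executed GTID set from the slave's. On a live server the built-in `GTID_SUBTRACT()` function does this; the sketch below reproduces the idea offline on two `gtid_executed` strings. The function names and the shortened UUIDs in the example are hypothetical, and the parser expands every interval into individual numbers, so it is only suitable for small sets.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:8,uuid2:3' into {uuid: set_of_transaction_numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid, set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            # A single number such as "8" is an interval of length one.
            numbers.update(range(int(first), int(last or first) + 1))
    return result


def errant_transactions(slave_executed, master_executed):
    """Return {uuid: [numbers]} executed on the slave but not on the master."""
    slave = parse_gtid_set(slave_executed)
    master = parse_gtid_set(master_executed)
    errant = {}
    for uuid, numbers in slave.items():
        extra = numbers - master.get(uuid, set())
        if extra:
            errant[uuid] = sorted(extra)
    return errant


if __name__ == "__main__":
    # Hypothetical, shortened UUIDs for readability.
    print(errant_transactions("aaa:1-10,bbb:1-2", "aaa:1-12"))
```

An empty result means the slave has nothing the master lacks, which is the condition you want to confirm before promoting that slave.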
29. Copyright 2015 Severalnines AB
Split Brain
- Master loses its network connection, failover is deemed necessary
- One of the slaves is promoted to be the new master, others are reslaved
- VIP is assigned to the new master
- VIP is still assigned to the old master too (as it’s unreachable and the VIP can’t be removed)
- Once the old master comes back up, data will be written to it
30. Copyright 2015 Severalnines AB
Split Brain - STONITH
- STONITH is the solution - kill it with fire
- Ensure that the old master won’t come back up (think about how your automated recovery will behave)
- Start with a dedicated network connection (if available) - use a patch cord. Maybe it’s only a network error?
- Try to use IPMI/iLO/KVM-ish solutions to stop the server
- Try to stop the server using a manageable power strip
31. Copyright 2015 Severalnines AB
Split Brain - STONITH
- In the cloud - try to stop the instance completely
- You can also try to bond your interfaces for better network availability (although you never know what’s on the other side of the black box)
- For MHA - the STONITH process is executed as shutdown_script. For other tools - ensure you have implemented similar behavior
- This is relevant for MySQL replication - Galera doesn’t require such harsh methods
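The escalation described above (dedicated network link, then IPMI/iLO, then the power strip) can be sketched as an ordered chain of fencing attempts. Everything here is hypothetical: each method is a callable you would supply yourself, returning True when the old master is confirmed stopped. This is not how MHA's shutdown_script is implemented, only an illustration of the ordering.

```python
def fence_old_master(methods):
    """Try each fencing method in order; return the name of the first that works.

    methods: list of (name, callable) pairs; each callable returns True once
    the old master is confirmed stopped (hypothetical interface).
    """
    for name, attempt in methods:
        try:
            if attempt():
                return name
        except Exception as error:   # one failing method must not stop the chain
            print(f"fencing via {name} failed: {error}")
    raise RuntimeError("old master could not be fenced")


if __name__ == "__main__":
    def unreachable():
        raise ConnectionError("no route to host")

    # Stand-ins for real actions such as an SSH shutdown over a dedicated
    # link, an IPMI power-off, or switching off a managed power strip.
    chain = [
        ("dedicated network link", unreachable),
        ("IPMI power off", lambda: True),
        ("power strip", lambda: True),
    ]
    print(fence_old_master(chain))
```

Raising an error when every method fails makes the caller decide explicitly what to do next, instead of silently continuing with the failover while the old master may still be alive.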