Become a MySQL DBA - webinar series - slides: Which High Availability solution?

Copyright 2015 Severalnines AB
Designing HA for MySQL
July 28, 2015
Krzysztof Książek
Severalnines
krzysztof@severalnines.com

! We want to help all non-DBA people who have to take care of MySQL infrastructure
! Discuss the most common activities
! Share tips and good practices
! If you missed them, we encourage you to watch the replays of “Deep Dive Into How to Monitor Galera Cluster” and “Deciding on a relevant backup solution”
! http://www.slideshare.net/Severalnines/videos
“Become a MySQL DBA” series

! HA - what is it?
! Caching layer
! HA solutions
! Proxy layer
! Common problems
Agenda

! It’s not enough to just build a database infrastructure, you have to keep it available
! Redundancy is your friend
! Automate as much of the failover process as possible
! Know your business requirements and then decide
! Higher availability = higher costs - you need to find a sweet spot
High Availability - what is it about?
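One way to make the cost trade-off concrete is to translate an availability target into a yearly downtime budget. The sketch below is plain arithmetic; the targets it prints are illustrative and were not part of the webinar.

```python
# Yearly downtime budget for a given availability target.
# The targets used below are illustrative, not taken from the slides.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the allowed downtime, in minutes per year, for a target in percent."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        budget = downtime_minutes_per_year(target)
        print(f"{target}% availability allows {budget:.1f} minutes of downtime per year")
```

Each extra "nine" cuts the budget by a factor of ten, which is usually where the extra redundancy and automation costs come from.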

Caching layer

! Reduces the load on the database
! Works as a buffer between the application and the database tier
! Gives you some HA features, as it can serve data even if the database is not available
! It’s a must-have for any larger application - cheaper than a database and easier to scale
Caching layer - why do I need it?

! Memcached, Redis, Couchbase, you name it
! Database access is expensive and should be avoided - therefore we have a cache to handle reads
! Avoid a cache miss storm (many clients hitting the database at once when an entry expires)
! Serve outdated data while the entry is refreshed, or make clients wait for the refresh if stale data is not acceptable
! Refresh the cache by executing a query ONCE!
! If you can serve old results, you can partially hide issues
Read cache
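The "refresh ONCE, serve old results meanwhile" idea can be sketched as follows. This is a minimal in-process illustration, not a Memcached or Redis client: `ReadCache`, `loader` and `ttl_seconds` are hypothetical names, and `loader` stands for whatever function runs the query.

```python
import threading
import time


class ReadCache:
    """In-process stand-in for a read cache such as Memcached or Redis."""

    def __init__(self, loader, ttl_seconds):
        self._loader = loader        # hypothetical callable that runs the query
        self._ttl = ttl_seconds
        self._data = {}              # key -> (value, expires_at)
        self._locks = {}             # key -> lock guarding the refresh of that key
        self._guard = threading.Lock()

    def _key_lock(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                      # fresh hit, no database access
        lock = self._key_lock(key)
        if entry is not None:
            # Stale entry: one caller refreshes, everyone else serves old data.
            if not lock.acquire(blocking=False):
                return entry[0]
        else:
            # Nothing cached yet: wait for whoever is loading the value.
            lock.acquire()
        try:
            entry = self._data.get(key)
            if entry is None or entry[1] <= time.monotonic():
                value = self._loader(key)        # the query runs once per refresh
                entry = (value, time.monotonic() + self._ttl)
                self._data[key] = entry
            return entry[0]
        finally:
            lock.release()
```

With a shared cache server the same pattern needs a distributed lock or a "being refreshed" marker stored in the cache itself; the per-key lock here only protects a single process.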

! Do not write directly to the database, use a persistent queue for that (Kinesis or RabbitMQ, for example)
! Helps you to avoid overloading the data tier with writes
! You can define the exact number of workers that handle the writes
! Helps to minimize the impact of the database tier not being available by buffering writes
Write cache
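A minimal sketch of the "fixed number of workers" idea, assuming an in-memory `queue.Queue` as a stand-in for a persistent queue such as Kinesis or RabbitMQ (the in-memory queue is NOT persistent). `apply_write` is a hypothetical function that would execute the actual INSERT or UPDATE.

```python
import queue
import threading


def run_write_workers(write_queue, apply_write, worker_count):
    """Start exactly worker_count threads that drain write_queue."""

    def worker():
        while True:
            item = write_queue.get()
            try:
                if item is None:       # sentinel: stop this worker
                    return
                apply_write(item)      # hypothetical: runs the write against the database
            finally:
                write_queue.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    return threads


def stop_write_workers(write_queue, threads):
    """Send one sentinel per worker and wait for all of them to exit."""
    for _ in threads:
        write_queue.put(None)
    for thread in threads:
        thread.join()
```

The application only calls `write_queue.put(...)`; the database never sees more than `worker_count` concurrent writers, and writes accumulate in the queue while the database tier is unavailable.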

High Availability Solutions

! RAID 1 over TCP
! Maintains an exact copy (or two) of your volume in a separate location
! Active - passive model, only one volume can be mounted
! Works well if you have a single database node
Distributed Replicated Block Device (DRBD)

! Passive -> active switch takes time (InnoDB recovery is required)
! You can’t use the passive node for anything
! Just a single node, not feasible for larger environments
! Great tool but with limited use cases
Distributed Replicated Block Device (DRBD)

! Best-known HA solution for MySQL
! You can use it for scaling too
! By default - asynchronous
! Failover process may be tricky without GTID
! Many tools are available to automate it, though
MySQL Replication

! Locate the most advanced slave, apply missing updates from the master’s binary logs (if needed and possible)
! Ensure all slaves are at the same position for reslaving
! Reslave the rest of the slaves to the chosen node
! Perform the failover
! The whole process is error-prone and tricky
! You can use MHA to manage it for you
MySQL Replication - failover
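The first step, locating the most advanced slave, can be sketched as a comparison of replication coordinates. This sketch assumes you have already collected `Master_Log_File` and `Read_Master_Log_Pos` from `SHOW SLAVE STATUS` on each slave; the `SlaveStatus` class and the host names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlaveStatus:
    host: str
    master_log_file: str       # Master_Log_File from SHOW SLAVE STATUS
    read_master_log_pos: int   # Read_Master_Log_Pos from SHOW SLAVE STATUS


def binlog_sequence(log_file: str) -> int:
    """Extract the numeric suffix of a binary log name, e.g. 'binlog.000042' -> 42."""
    return int(log_file.rsplit(".", 1)[1])


def most_advanced_slave(slaves):
    """Return the slave that has received the most data from the old master."""
    return max(
        slaves,
        key=lambda s: (binlog_sequence(s.master_log_file), s.read_master_log_pos),
    )


if __name__ == "__main__":
    candidates = [
        SlaveStatus("db2", "binlog.000041", 98765),
        SlaveStatus("db3", "binlog.000042", 1200),
        SlaveStatus("db4", "binlog.000042", 800),
    ]
    print(most_advanced_slave(candidates).host)
```

The binary log sequence number is compared before the position, because a lower position in a newer log file is still further ahead.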

! Reslaving became much easier (CHANGE MASTER TO … MASTER_AUTO_POSITION=1)
! You still have to choose the most advanced slave to promote
! You still have to replay binary logs (if possible) to apply missing changes
! You may still benefit from additional tooling
MySQL Replication - GTID
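With GTID enabled, pointing a slave at the new master needs no binary log file or position. The sketch below only builds the statements to run on each remaining slave; the host name is a hypothetical example, and replication credentials are assumed to be configured already.

```python
def reslave_statements(new_master_host: str, new_master_port: int = 3306):
    """Return the statements that repoint a GTID-enabled slave to a new master."""
    # Refuse characters that would break out of the quoted host name.
    if "'" in new_master_host or "\\" in new_master_host:
        raise ValueError("unexpected character in host name")
    return [
        "STOP SLAVE",
        (
            f"CHANGE MASTER TO MASTER_HOST='{new_master_host}', "
            f"MASTER_PORT={int(new_master_port)}, MASTER_AUTO_POSITION=1"
        ),
        "START SLAVE",
    ]


if __name__ == "__main__":
    # "db3.example.com" is a made-up host used only for illustration.
    for statement in reslave_statements("db3.example.com"):
        print(statement + ";")
```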

! Handles the failover process for you
! Can be used as a standalone solution or as part of a larger setup (Pacemaker)
! masterha_manager is a SPOF - you need to monitor it
! Can work with GTID and regular replication
! Make sure that you have shutdown_script defined for STONITH in the MHA config
MySQL Replication - MHA

Clustering

! Synchronous cluster
! Based on the NDB engine (not InnoDB!)
! Great point-select performance
! Great insert performance
! Uses data partitioning for redundancy
! Behaves differently than InnoDB - especially for range queries
! It’s not a drop-in replacement for regular MySQL
MySQL Cluster

! Virtually synchronous cluster (lag <1s, typically a few ms)
! Doesn’t split the data, each node is a full copy
! Harder to scale, as you can’t increase disk capacity by adding new nodes
! Easier to run reporting queries, as all data is available on every node
Galera Cluster

! "almost" a drop-in replacement for regular MySQL (uses InnoDB engine)
! Different AUTO_INCREMENT handling
! All tables should have a primary key defined
! Basically, it’s InnoDB only (avoid MyISAM)
! Large transactions may be problematic or impossible
! Schema changes may become more complex
Galera Cluster
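To illustrate the AUTO_INCREMENT point: by default Galera adjusts `auto_increment_increment` and `auto_increment_offset` per node so that concurrent inserts on different nodes do not collide, which means generated values have gaps. The sketch below only reproduces that arithmetic for an assumed three-node cluster; it does not talk to a database.

```python
def auto_increment_values(offset: int, increment: int, count: int):
    """Values generated with the given auto_increment_offset and auto_increment_increment."""
    return [offset + n * increment for n in range(count)]


if __name__ == "__main__":
    cluster_size = 3  # assumed cluster size, for illustration only
    for node in range(1, cluster_size + 1):
        values = auto_increment_values(offset=node, increment=cluster_size, count=3)
        print(f"node {node}: {values}")
```

Applications that assume consecutive IDs, or that IDs reflect insert order across the cluster, need to be reviewed before moving to Galera.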

Proxy layer

! It’s a set of servers that works as a middleman between the application and the database layer
! They route the traffic to the database layer
! They should detect failed instances and topology changes
! It’s useful for hiding the complexity of the database layer from the application
Proxy layer - why do we need it?

! Popular and proven tool
! Not database-aware, it just moves packets
! Can check if the port is available
! Can do HTTP tests - very useful for building custom health-check logic
HAProxy

! Use the read_only variable to differentiate the master from the slaves
! Have a script that works in the background, checks the state of a node and stores it in shared memory
! Have a script that will be executed via xinetd, checks the state from shared memory and returns HTTP codes (200, 503) accordingly
! Make sure that the read_only variable is changed only after the old master has been stopped (by shutdown_script in MHA) - otherwise split brain may happen
HAProxy
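The xinetd-side check can be sketched as a function that turns the stored node state into an HTTP response for HAProxy. The state layout (`alive`, `read_only`) and the function names are assumptions made for this example; how the background script stores the state is left out.

```python
def http_response(status_code: int, body: str) -> str:
    """Build a minimal HTTP/1.1 response for HAProxy's HTTP check."""
    reason = {200: "OK", 503: "Service Unavailable"}[status_code]
    return (
        f"HTTP/1.1 {status_code} {reason}\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
        f"{body}"
    )


def check_node(wanted_role: str, state: dict) -> str:
    """Return 200 if the node currently has wanted_role ('master' or 'slave'), else 503.

    state is what the background script stored (assumed layout),
    e.g. {"alive": True, "read_only": False}.
    """
    if not state.get("alive", False):
        return http_response(503, "MySQL is down\n")
    # A missing read_only value is treated as read-only, the safer default.
    role = "slave" if state.get("read_only", True) else "master"
    if role == wanted_role:
        return http_response(200, f"MySQL {role} is up\n")
    return http_response(503, f"MySQL is a {role}, not a {wanted_role}\n")
```

One such check per backend (one answering for the master role, one for the slave role) lets HAProxy route writes and reads to different nodes without being database-aware itself.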

! MySQL-aware proxy
! Read/write splitting
! Still new software, requires detailed testing
! Tends to use a significant amount of CPU for R/W split
! Updated frequently, you may want to follow the changes
MaxScale

! Proxies need to be highly available too
! Multiple options to choose from:
  ! Put a proxy in front of the proxy (ELB)
  ! VIP + failover (e.g. keepalived)
  ! Colocate proxies with web nodes and handle config changes via orchestration tools
HA for proxies

! Every approach has pros and cons
! ELB - easiest but available only in AWS (similar tools may be available for other cloud providers too)
! VIP - only one proxy node will be active at a given time - CPU utilization may become an issue
! Colocating proxies allows you to use more of them, but maintaining configuration can become a burden and may be error-prone
HA for proxies

Common problems

! Errant transactions are transactions executed on the slave only, not on the master
! With GTID, all transactions executed on a given host will be requested by its slaves once the host becomes a master
! If those transactions are no longer available in the binlogs, replication will break
! http://www.severalnines.com/blog/mysql-replication-and-gtid-based-failover-deep-dive-errant-transactions
Errant transactions in GTID
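Detecting errant transactions comes down to subtracting the master's executed GTID set from the slave's. On a live server the built-in `GTID_SUBTRACT()` function does this; the sketch below reimplements the idea in plain Python for illustration only, and the UUIDs used in the example are made up.

```python
def parse_gtid_set(text: str) -> dict:
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: set of transaction ids}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ids = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            ids.update(range(int(first), int(last or first) + 1))
    return result


def errant_transactions(slave_executed: str, master_executed: str) -> dict:
    """Return transactions present on the slave but missing on the master."""
    slave = parse_gtid_set(slave_executed)
    master = parse_gtid_set(master_executed)
    errant = {}
    for uuid, ids in slave.items():
        missing = ids - master.get(uuid, set())
        if missing:
            errant[uuid] = sorted(missing)
    return errant


if __name__ == "__main__":
    # Made-up UUIDs: the first is the master, the second is the slave itself.
    master_set = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-10"
    slave_set = master_set + ",aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-2"
    print(errant_transactions(slave_set, master_set))
```

Expanding intervals into sets is fine for a small example but wasteful for real GTID sets with millions of transactions; there the server-side function is the better tool.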

! Master loses its network connection, failover is deemed necessary
! One of the slaves is promoted to be the new master, the others are reslaved
! VIP is assigned to the new master
! VIP is also still assigned to the old master (it is unreachable, so the VIP can’t be removed from it)
! Once the old master comes back up, data will be written to it
Split Brain

! STONITH is the solution - kill it with fire
! Ensure that the old master won’t come back up (think about how your automated recovery will behave)
! Start with a dedicated network connection (if available) - use a patch cord. Maybe it’s a network error only?
! Try to use IPMI/iLO/KVM-ish solutions to stop the server
! Try to stop the server using a manageable power strip
Split Brain - STONITH

! In the cloud - try to stop the instance completely
! You can also try to bond your interfaces for better network availability (although you never know what’s on the other side of the black box)
! For MHA - the STONITH process is executed as shutdown_script. For other tools - ensure you have implemented similar behavior
! This is relevant for MySQL replication - Galera doesn’t require such harsh methods
Split Brain - STONITH
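The escalation described on the STONITH slides (dedicated link first, then IPMI/iLO, then the power strip) can be sketched as an ordered list of fencing attempts. Every callable here is a hypothetical placeholder; a real shutdown_script would call the actual tools for your hardware or cloud provider.

```python
def fence_old_master(methods):
    """Try each fencing method in order; return the name of the first one that succeeds.

    methods is a list of (name, callable) pairs. Each callable is a
    hypothetical placeholder that returns True once the old master is
    confirmed to be stopped.
    """
    for name, method in methods:
        try:
            if method():
                return name
        except Exception as exc:
            # A failing method must not stop the escalation.
            print(f"fencing via {name} failed: {exc}")
    raise RuntimeError("all fencing methods failed")


if __name__ == "__main__":
    def stop_over_dedicated_link():
        raise ConnectionError("host unreachable")   # simulated failure

    def stop_via_ipmi():
        return True                                 # simulated success

    def cut_power():
        return True

    used = fence_old_master([
        ("dedicated link", stop_over_dedicated_link),
        ("IPMI/iLO", stop_via_ipmi),
        ("power strip", cut_power),
    ])
    print(f"old master fenced via {used}")
```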

Solution          | Synchronous             | Load-balancing reads | Load-balancing writes | WAN                   | Scalable                     | MySQL/InnoDB compatible
DRBD              | yes                     | no                   | no                    | yes (standby DR site) | no                           | yes
MySQL replication | no (async or semi-sync) | yes                  | no                    | yes                   | only reads                   | yes
MySQL Cluster     | yes                     | yes                  | yes                   | yes                   | yes                          | expect different behavior
Galera            | virtually               | yes                  | yes                   | yes                   | reads, writes to some extent | almost

! More blogs in the “Become a MySQL DBA” series:
  ! http://www.severalnines.com/blog/become-dba-blog-series-monitoring-and-trending
  ! http://www.severalnines.com/blog/become-mysql-dba-blog-series-backup-restore
! Contact: krzysztof@severalnines.com
Thank You!
