Failover or not to failover: Presentation Transcript
Failover, or not Failover, that is the question
Percona Live MySQL Conference and Expo 2013
Massimo Brignoli, SkySQL
Henrik Ingo, Nokia
Please share and reuse this presentation, licensed under the Creative Commons Attribution License.
Agenda
● Why HA is more difficult for databases
● Steps to failover
● Monitoring
● Automating failover
● Sounds great! What could possibly go wrong?
● Amazon Dynamo
● Galera and NDB
Fault tolerance = redundancy
● RAID
● 2 power units per server
● Cluster of servers
● 2 kidneys per person
● Redundancy at all levels: software, hardware, network, electricity...
A chain is only as strong as its weakest link.
Durability
"Durability is an interesting concept. If I flush a transaction to disk, it is said to be durable. But if I then take a backup, it is even more durable."
Heikki Tuuri
Why High Availability is More Difficult for Databases
Redundancy of server AND redundancy of data WHILE performing thousands of write operations per second onto the dataset
What failover?
1. Primary server
2. Secondary / Standby server for redundancy
3. In case Primary fails, Secondary server must become the new Primary
Steps to failover (theory)
1. Notice failure
2. Move VIP
3. Continue
Automating failover
Generic Clustering Solutions
● Pacemaker/Corosync
● Linux Heartbeat
● Red Hat Cluster Suite
● Solaris Cluster
● Windows Server Failover Clustering
● etc...
MySQL Specific Solutions
● MMM
● PRM
● MHA
● JDBC connector
Steps to failover (DRBD)
1. Have DRBD
2. Notice failure
3. Shutdown MySQL on primary
4. Unmount disk on primary
5. Mount disk on secondary
6. Start MySQL on secondary
7. Wait for InnoDB recovery
8. Wait for InnoDB recovery
9. Wait for InnoDB recovery
10. Unset VIP on primary
11. Set VIP on secondary
12. Continue
13. Should you add a new secondary?
Steps to failover (MySQL replication)
1. Have replication
2. Notice failure
3. Make slave writable
4. Make master read-only
5. Unset VIP on master
6. Set VIP on slave
7. Continue
8. Should you add a new slave?
What if you have more than 2 servers? (MySQL replication)
● MySQL replication failover with more than 2 servers can be a hassle.
● Which slave should become the new master?
● All slaves must be pointed to the new master.
● They must figure out where to continue replication (binlog position).
● MySQL 5.6 GTID helps.
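One common answer to "which slave should become the new master?" is to promote the slave that has applied the most of the old master's binary log. The sketch below illustrates only that comparison. It assumes the two values per slave were already read from the `Relay_Master_Log_File` and `Exec_Master_Log_Pos` columns of `SHOW SLAVE STATUS`, and that binlog file names end in a numeric sequence; the `SlaveStatus` class, the host names, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaveStatus:
    """Hypothetical snapshot of one slave's replication progress."""
    host: str
    relay_master_log_file: str  # e.g. "mysql-bin.000010"
    exec_master_log_pos: int    # position applied within that file


def binlog_coordinates(status):
    """Return (file sequence number, position) so slaves can be ordered."""
    # Assumption: the file name ends in ".<number>", as in "mysql-bin.000010".
    sequence = int(status.relay_master_log_file.rsplit(".", 1)[1])
    return (sequence, status.exec_master_log_pos)


def pick_new_master(slaves):
    """Pick the slave that has applied the most of the old master's binlog."""
    if not slaves:
        raise ValueError("no candidate slaves")
    return max(slaves, key=binlog_coordinates)


candidates = [
    SlaveStatus("db2", "mysql-bin.000009", 900),
    SlaveStatus("db3", "mysql-bin.000010", 120),
    SlaveStatus("db4", "mysql-bin.000010", 80),
]
print(pick_new_master(candidates).host)
```

The remaining slaves still have to be repointed to the promoted server at the right position, which is the part the slide says GTID makes easier.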
MHA and SkySQL...
● Combination of resource manager + scripts
● Automating failover process:
  ○ New Master selection
  ○ Slaves reconfiguration
  ○ VIP management
  ○ Missing binlogs retrieval
Sounds great, what could possibly go wrong?
Sounds great, what could possibly go wrong?
1. Have replication
  ○ Ok, is it working? What if it's not working?
  ○ Is it replicating in the right direction?
  ○ Does your bash script handle binlog positions correctly?
  ○ Asynchronous?
2. Notice failure
  ○ Polling interval
  ○ Who is polling?
  ○ ...and from where?
  ○ How does the poller handle its own failure?
  ○ False positives
  ○ Is failover the right response to every failure?
3. STONITH
  ○ Shutdown MySQL on Primary? How? It's not responding...
  ○ Unmount disk on Primary? How? It's not responding...
  ○ "You need a STONITH device"! Hehe, nice try...
4. Move VIP
  ○ Unset VIP on Master/Primary? How? It's not responding...
  ○ Set VIP on Secondary/Slave. This will work fine. Unfortunately.
5. Continue
6. Add back new/same Secondary
  ○ Automatically of course. Even if it just failed 15 seconds ago.
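The "false positives" concern under "Notice failure" can be made concrete with a small sketch. One common mitigation (an assumption here, not something the slides prescribe) is to declare a node failed only after several consecutive failed polls, so that a single slow or dropped health check does not trigger a failover. The class name `FailureDetector` and the default threshold of 3 are hypothetical.

```python
class FailureDetector:
    """Declare failure only after `threshold` consecutive failed polls."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, check_ok):
        """Record one poll result; return True if the node counts as failed."""
        if check_ok:
            # Any successful check resets the streak.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


detector = FailureDetector(threshold=3)
polls = [False, False, True, False, False, False]
decisions = [detector.record(ok) for ok in polls]
print(decisions)
```

A higher threshold trades fewer false positives for slower detection (threshold times the polling interval). It does nothing about the other questions on the slide: who is polling, from where, and whether the old primary can actually be fenced.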
Case: GitHub
https://github.com/blog/1261-github-availability-this-week
● MySQL replication, Pacemaker, Corosync, Percona Replication Manager
● PRM health check fails due to high load during schema migration.
● Failover!
● New node has cold caches, so even worse performance.
● Failover! (back)
● Disable PRM
● A slave is found outdated as replication is not happening
● Enable PRM and hope it will fix it
● Pacemaker segfaults, causing cluster partition
● PRM selects the outdated node as master, shuts down others
● All kinds of data inconsistencies
● Restart PRM on all nodes
● ...
Case: GitHub
Lesson learned:
● Automated failover is dangerous
● Cold cache is dangerous
But... Not automating is also dangerous
● Baron Schwartz: 75% of replication failures are human errors
  http://www.percona.com/about-us/mysql-white-paper/causes-of-downtime-in-production-mysql-servers
● 80% of aviation accidents are caused by human errors
  http://asasi.org/papers/2004/Shappell%20et%20al_HFACS_ISASI04.pdf
● 80% of events are caused by human errors, 70% of them due to organizational weaknesses
  http://www.hss.doe.gov/sesa/corporatesafety/hpc/fundamentals.html
Are we solving the right problem?
Instead of automating the problem... Eliminate the problem!
Amazon Dynamo
R + W > N
Voldemort, Cassandra, Riak, DynamoDB, S3
http://openlife.cc/blogs/2012/september/failover-evil
N=3, R=W=2
R + W > N
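The condition R + W > N means that any set of R replicas consulted by a read shares at least one replica with any set of W replicas that acknowledged a write, so a read always reaches at least one up-to-date copy. The sketch below checks this by brute force over all possible replica sets; the function name `quorums_always_overlap` is made up for illustration and is not part of any Dynamo-style product.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every read set of size r intersects every write set of size w."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


print(quorums_always_overlap(3, 2, 2))  # the slide's N=3, R=W=2
print(quorums_always_overlap(3, 1, 1))  # R + W <= N: overlap not guaranteed
```

For small clusters the brute-force result agrees with the inequality itself, which is why the one-line rule R + W > N is enough in practice.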
Eventual consistency is internal only
R + W > N
Failover?
Single node failure is a non-event!
For relational databases?
Synchronous replication is a special case of Dynamo: W=N & R=1
Or is there a failover after all?
Due to W=N, writers actually notice node failures! Cluster reconfiguration needed. (Readers are ok.)
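The difference between the two configurations can be shown with a toy model. Assuming an operation succeeds only if at least R (for reads) or W (for writes) replicas are alive, the hypothetical helper below reports what still works after some nodes fail. It deliberately ignores everything else a real cluster does (timeouts, membership protocols, hinted writes).

```python
def availability(n, r, w, failed):
    """Report whether reads and writes can still gather enough live replicas."""
    alive = n - failed
    return {"reads": alive >= r, "writes": alive >= w}


# Dynamo-style quorum: one failed node out of three changes nothing.
print(availability(n=3, r=2, w=2, failed=1))

# Synchronous replication (W=N, R=1): readers are fine, writers are blocked...
print(availability(n=3, r=1, w=3, failed=1))

# ...until the cluster is reconfigured to the two surviving nodes (N=W=2).
print(availability(n=2, r=1, w=2, failed=0))
```

In this model the "reconfiguration" is just the step from the second call to the third: the membership shrinks, W shrinks with it, and writes resume without any server changing roles.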
Example: MySQL NDB Cluster
What have we learned?
● Failover with DRBD is painful because it is slow.
● Failover with MySQL replication is painful because it's a mess.
● Amazon Dynamo has no failover.
● Galera Cluster has no failover but needs cluster reconfiguration. Same thing...
● MySQL NDB Cluster has failover but you can't see it.