Choosing a MySQL Replication               & High Availability Solution                         Henrik Ingo               ...
Henrik Ingo                                     Senior Performance Architect,                                             ...
High Availability                   Performance                               DurabilityTransactions / second (throughput)...
Uptime                      Percentile target Max downtime per year                                  90% 36 days          ...
So we pick a HA solution and are done!                                    ?                    MySQL   MySQL   MySQL   MyS...
Ok, so what do I really want?                 MySQL     MySQL    MySQL    MySQL     Tung     Galera   DRBD      SAN   NDB ...
MySQL level vs disk level replication                    MySQL     MySQL     MySQL     MySQL         Tung    Galera       ...
Statement vs Row based                            Asynchronous vs Synchronous                    MySQL   MySQL   MySQL   M...
Clustering frameworks                     Heartbeat                     Corosync                       MMM                ...
Synchronous multi-master                       NDB                      Galera                       Failover2011-10-25   ...
Is a clustering solution part of the solution or the part of the problem?    "Causes of Downtime in Production MySQL Serve...
Statement vs Row based                             Asynchronous vs Synchronous                    MySQL     MySQL      MyS...
Benchmarks!Baseline single node performance"group commit bug" when sync_binlog=1 & innodb_flush_log_at_trx_commit=1wsrep a...
3 node Galera clusterNo overhead in master-slave mode (red vs yellow)Small benefit in multi-master mode (depends on read/w...
Single node, disk bound workload10% and 20% of data in cacheOptimization for dummies: Double amount of RAM, double perform...
3 node Galera cluster, disk boundAgain 2x RAM => 2x performanceRoughly 50% of single node performanceDecreases when writin...
Same disk bound cluster, modified Sysbench OLTPModified sysbench oltp: Same queries as isolated short transactions. - 75% ...
DRBD vs Single node60% of single node performanceMinimum latency 10x higher but average is not so bad (not shown)Note: Thi...
Semi sync vs Single nodePractically no performance overheadOpportunity to relax sync_binlog setting (green - yellow)2011-1...
Conclusions    Simpler is better    MySQL level replication is better than DRBD which is better than SAN    Synchronous re...
References    http://openlife.cc/blogs/2011/july/ultimate-mysql-high-availability-solution    http://openlife.cc/category/...
Upcoming SlideShare
Loading in...5
×

Choosing a MySQL High Availability solution - Percona Live UK 2011

6,139

Published on

Evaluating various MySQL high-availability solutions and choosing Galera.

Published in: Technology
0 Comments
7 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
6,139
On Slideshare
0
From Embeds
0
Number of Embeds
7
Actions
Shares
0
Downloads
128
Comments
0
Likes
7
Embeds 0
No embeds

No notes for slide

Choosing a MySQL High Availability solution - Percona Live UK 2011

  1. 1. Choosing a MySQL Replication & High Availability Solution Henrik Ingo Senior Performance Architect Nokia Percona Live UK 2011-10-25 Please share and re-use this presentation, licensed under Creative Commons Attribution license.2011-10-25 1
  2. 2. Henrik Ingo Senior Performance Architect, Nokia SOA: Each team does their own thing open source technology and Nokia and web? strategy specialist App store, music store, Maps, SSO... 200M clients, 100M accounts active in MySQL-forks, Drupal communities Architecture reviews, "internal consultant" author of "Open Life: The Philosophy of Open Source" MySQL improvements: Recommend backup, HA, version etc... www.openlife.cc best practices henrik.ingo@openlife.cc2011-10-25 2
  3. 3. High Availability Performance DurabilityTransactions / second (throughput) Speaking of databases Response time (latency) Committed data is not lost Percentiles (95% - 99%) D in ACID Get any response at all (tps > 0) Replicas, snapshots Measured as percentile (99.999%) point in time, backups High Availability Clustering Replication Monitoring Redundancy Failover 2011-10-25 3
  4. 4. Uptime Percentile target Max downtime per year 90% 36 days 99% 3.65 days 99.5% 1.83 days 99.9% 8.76 hours 99.99% 52.56 minutes 99.999% 5.26 minutes 99.9999% 31.5 seconds Beyond system availability: Average downtime per user.2011-10-25 4
  5. 5. So we pick a HA solution and are done! ? MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB Usability Performance AsynchronousStatement based Row based Semi-sync Synchronous Global trx id Multi threaded Clustering framework2011-10-25 5
  6. 6. Ok, so what do I really want? MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB + + + + + + + + InnoDB We use InnoDB. We want to continue using InnoDB. Which solutions support InnoDB? NDB is its own storage engine. Its great. It can blow away all others in a benchmark. But its not InnoDB and is not considered here.2011-10-25 6
  7. 7. MySQL level vs disk level replication MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB + + + + + + + + Usability + + + + ++ ++ - + Performance (1) (1) + - - + <.............. MySQL server level replication ............> < disk level ><engine> Higher level replication is betterCompetence: Performance: Replication = MySQL DBA can manage SAN has higher latency than local disk DRBD = Linux sysadmin can manage DRBD has higher latency than local disk SAN = Nobody can manage Replication has surprisingly little overheadOperations: Redundancy: Disk level = cold standby = long failover time Shared disk = Single Point of Failure Replication = hot standby = short failover time Shared nothing = redundant = good ++ for global trx id, easy provisioning 2011-10-25 7
  8. 8. Statement vs Row based Asynchronous vs Synchronous MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB + + + + + + + + Usability + + + + ++ ++ - + Performance (1) (1) + - - + Asynchronous + + + + + (2)Statement based + + + + + + Row based + + + + + (3) (3) Semi-sync + + Synchronous (4) + + + Global trx id + + + + Multi threaded (1) (1) + + Row based = deterministic = good Asynchronous = data loss on failover Statement based = dangerous Synchronous = good Global trx id = easier setup & failover Multi threaded = scalability for complex topologies2011-10-25 8
  9. 9. Clustering frameworks Heartbeat Corosync MMM Oracle/other VM MHA Tungsten Enterprise Solaris Cluster ... Failover2011-10-25 9
  10. 10. Synchronous multi-master NDB Galera Failover2011-10-25 10
  11. 11. Is a clustering solution part of the solution or the part of the problem? "Causes of Downtime in Production MySQL Servers" by Baron Schwartz: #1: Human error #2: SAN Complex clustering framework + SAN = More problems, not less! Galera (and NDB) = Replication based, no SAN or DRBD No "failover moment", no false positives No clustering framework needed (JDBC loadbalance) Simple and elegant!2011-10-25 11
  12. 12. Statement vs Row based Asynchronous vs Synchronous MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB + + + + + + + + Usability + + + + + ++ - + Performance (1) (1) + - - + Asynchronous + + + + + (2)Statement based + + + + + + Row based + + + + + +(3) +(3) Semi-sync + + Synchronous +(4) + + + Global trx id + + + + Multi threaded (1) (1) + + Clustering + + framework 1) Multi-threaded slave, 1 per schema 2) No, but can be combined with MySQL replication 3) Reliability comparable to row based replication 4) Internally slave applier is asynchronous, but exposes synchronous caracteristics2011-10-25 12
  13. 13. Benchmarks!Baseline single node performance"group commit bug" when sync_binlog=1 & innodb_flush_log_at_trx_commit=1wsrep api (Galera module, no replication) adds minimal overhead2011-10-25 13
  14. 14. 3 node Galera clusterNo overhead in master-slave mode (red vs yellow)Small benefit in multi-master mode (depends on read/write ratio)2011-10-25 14
  15. 15. Single node, disk bound workload10% and 20% of data in cacheOptimization for dummies: Double amount of RAM, double performance2011-10-25 15
  16. 16. 3 node Galera cluster, disk boundAgain 2x RAM => 2x performanceRoughly 50% of single node performanceDecreases when writing to multiple masters (weird!)=> Blame InnoDB redo log purge weirdness2011-10-25 16
  17. 17. Same disk bound cluster, modified Sysbench OLTPModified sysbench oltp: Same queries as isolated short transactions. - 75% read-only, 25% write-onlyPerformance is same in master-slave and multi-master modes - Limited by write throughput40% of single node performance2011-10-25 17
  18. 18. DRBD vs Single node60% of single node performanceMinimum latency 10x higher but average is not so bad (not shown)Note: This is different HW than the Galera test, and different metric2011-10-25 18
  19. 19. Semi sync vs Single nodePractically no performance overheadOpportunity to relax sync_binlog setting (green - yellow)2011-10-25 19
  20. 20. Conclusions Simpler is better MySQL level replication is better than DRBD which is better than SAN Synchronous replication = no data loss Asynchronous replication = no latency (WAN replication) Multi-master = no clustering frameworks Multi-threaded slave increases performance in disk bound workload Global trx id, autoprovisioning increases operations usability Galera (and NDB) provides all these with good performance and stability2011-10-25 20
  21. 21. References http://openlife.cc/blogs/2011/july/ultimate-mysql-high-availability-solution http://openlife.cc/category/topic/galera http://openlife.cc/blogs/2011/may/drbd-and-semi-sync-shootout-large-server http://www.percona.com/about-us/white-papers/ http://www.mysqlperformanceblog.com/2011/09/18/disaster-mysql-5-5-flushing/2011-10-25 21
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×