Choosing a MySQL High Availability solution - Percona Live UK 2011

  • 5,424 views
Uploaded on

Evaluating various MySQL high-availability solutions and choosing Galera.

Evaluating various MySQL high-availability solutions and choosing Galera.

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
5,424
On Slideshare
0
From Embeds
0
Number of Embeds
4

Actions

Shares
Downloads
110
Comments
0
Likes
6

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Choosing a MySQL Replication & High Availability Solution Henrik Ingo Senior Performance Architect Nokia Percona Live UK 2011-10-25 Please share and re-use this presentation, licensed under Creative Commons Attribution license.2011-10-25 1
  • 2. Henrik Ingo Senior Performance Architect, Nokia SOA: Each team does their own thing open source technology and Nokia and web? strategy specialist App store, music store, Maps, SSO... 200M clients, 100M accounts active in MySQL-forks, Drupal communities Architecture reviews, "internal consultant" author of "Open Life: The Philosophy of Open Source" MySQL improvements: Recommend backup, HA, version etc... www.openlife.cc best practices henrik.ingo@openlife.cc2011-10-25 2
  • 3. High Availability Performance DurabilityTransactions / second (throughput) Speaking of databases Response time (latency) Committed data is not lost Percentiles (95% - 99%) D in ACID Get any response at all (tps > 0) Replicas, snapshots Measured as percentile (99.999%) point in time, backups High Availability Clustering Replication Monitoring Redundancy Failover 2011-10-25 3
  • 4. Uptime Percentile target Max downtime per year 90% 36 days 99% 3.65 days 99.5% 1.83 days 99.9% 8.76 hours 99.99% 52.56 minutes 99.999% 5.26 minutes 99.9999% 31.5 seconds Beyond system availability: Average downtime per user.2011-10-25 4
  • 5. So we pick a HA solution and are done! ? MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB Usability Performance AsynchronousStatement based Row based Semi-sync Synchronous Global trx id Multi threaded Clustering framework2011-10-25 5
  • 6. Ok, so what do I really want? MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB + + + + + + + + InnoDB We use InnoDB. We want to continue using InnoDB. Which solutions support InnoDB? NDB is its own storage engine. Its great. It can blow away all others in a benchmark. But its not InnoDB and is not considered here.2011-10-25 6
  • 7. MySQL level vs disk level replication MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB + + + + + + + + Usability + + + + ++ ++ - + Performance (1) (1) + - - + <.............. MySQL server level replication ............> < disk level ><engine> Higher level replication is betterCompetence: Performance: Replication = MySQL DBA can manage SAN has higher latency than local disk DRBD = Linux sysadmin can manage DRBD has higher latency than local disk SAN = Nobody can manage Replication has surprisingly little overheadOperations: Redundancy: Disk level = cold standby = long failover time Shared disk = Single Point of Failure Replication = hot standby = short failover time Shared nothing = redundant = good ++ for global trx id, easy provisioning 2011-10-25 7
  • 8. Statement vs Row based Asynchronous vs Synchronous MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB + + + + + + + + Usability + + + + ++ ++ - + Performance (1) (1) + - - + Asynchronous + + + + + (2)Statement based + + + + + + Row based + + + + + (3) (3) Semi-sync + + Synchronous (4) + + + Global trx id + + + + Multi threaded (1) (1) + + Row based = deterministic = good Asynchronous = data loss on failover Statement based = dangerous Synchronous = good Global trx id = easier setup & failover Multi threaded = scalability for complex topologies2011-10-25 8
  • 9. Clustering frameworks Heartbeat Corosync MMM Oracle/other VM MHA Tungsten Enterprise Solaris Cluster ... Failover2011-10-25 9
  • 10. Synchronous multi-master NDB Galera Failover2011-10-25 10
  • 11. Is a clustering solution part of the solution or the part of the problem? "Causes of Downtime in Production MySQL Servers" by Baron Schwartz: #1: Human error #2: SAN Complex clustering framework + SAN = More problems, not less! Galera (and NDB) = Replication based, no SAN or DRBD No "failover moment", no false positives No clustering framework needed (JDBC loadbalance) Simple and elegant!2011-10-25 11
  • 12. Statement vs Row based Asynchronous vs Synchronous MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB + + + + + + + + Usability + + + + + ++ - + Performance (1) (1) + - - + Asynchronous + + + + + (2)Statement based + + + + + + Row based + + + + + +(3) +(3) Semi-sync + + Synchronous +(4) + + + Global trx id + + + + Multi threaded (1) (1) + + Clustering + + framework 1) Multi-threaded slave, 1 per schema 2) No, but can be combined with MySQL replication 3) Reliability comparable to row based replication 4) Internally slave applier is asynchronous, but exposes synchronous caracteristics2011-10-25 12
  • 13. Benchmarks!Baseline single node performance"group commit bug" when sync_binlog=1 & innodb_flush_log_at_trx_commit=1wsrep api (Galera module, no replication) adds minimal overhead2011-10-25 13
  • 14. 3 node Galera clusterNo overhead in master-slave mode (red vs yellow)Small benefit in multi-master mode (depends on read/write ratio)2011-10-25 14
  • 15. Single node, disk bound workload10% and 20% of data in cacheOptimization for dummies: Double amount of RAM, double performance2011-10-25 15
  • 16. 3 node Galera cluster, disk boundAgain 2x RAM => 2x performanceRoughly 50% of single node performanceDecreases when writing to multiple masters (weird!)=> Blame InnoDB redo log purge weirdness2011-10-25 16
  • 17. Same disk bound cluster, modified Sysbench OLTPModified sysbench oltp: Same queries as isolated short transactions. - 75% read-only, 25% write-onlyPerformance is same in master-slave and multi-master modes - Limited by write throughput40% of single node performance2011-10-25 17
  • 18. DRBD vs Single node60% of single node performanceMinimum latency 10x higher but average is not so bad (not shown)Note: This is different HW than the Galera test, and different metric2011-10-25 18
  • 19. Semi sync vs Single nodePractically no performance overheadOpportunity to relax sync_binlog setting (green - yellow)2011-10-25 19
  • 20. Conclusions Simpler is better MySQL level replication is better than DRBD which is better than SAN Synchronous replication = no data loss Asynchronous replication = no latency (WAN replication) Multi-master = no clustering frameworks Multi-threaded slave increases performance in disk bound workload Global trx id, autoprovisioning increases operations usability Galera (and NDB) provides all these with good performance and stability2011-10-25 20
  • 21. References http://openlife.cc/blogs/2011/july/ultimate-mysql-high-availability-solution http://openlife.cc/category/topic/galera http://openlife.cc/blogs/2011/may/drbd-and-semi-sync-shootout-large-server http://www.percona.com/about-us/white-papers/ http://www.mysqlperformanceblog.com/2011/09/18/disaster-mysql-5-5-flushing/2011-10-25 21