Choosing a MySQL High Availability solution - Percona Live UK 2011
Upcoming SlideShare
Loading in...5
×
 

Choosing a MySQL High Availability solution - Percona Live UK 2011

on

  • 5,758 views

Evaluating various MySQL high-availability solutions and choosing Galera.

Evaluating various MySQL high-availability solutions and choosing Galera.

Statistics

Views

Total Views
5,758
Views on SlideShare
5,218
Embed Views
540

Actions

Likes
6
Downloads
103
Comments
0

4 Embeds 540

http://openlife.cc 510
http://blog.newitfarmer.com 27
http://cc.bingj.com 2
http://www.openlife.cc 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

CC Attribution License

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Choosing a MySQL High Availability solution - Percona Live UK 2011 Choosing a MySQL High Availability solution - Percona Live UK 2011 Presentation Transcript

  • Choosing a MySQL Replication & High Availability Solution Henrik Ingo Senior Performance Architect Nokia Percona Live UK 2011-10-25 Please share and re-use this presentation, licensed under Creative Commons Attribution license.2011-10-25 1
  • Henrik Ingo Senior Performance Architect, Nokia SOA: Each team does their own thing open source technology and Nokia and web? strategy specialist App store, music store, Maps, SSO... 200M clients, 100M accounts active in MySQL-forks, Drupal communities Architecture reviews, "internal consultant" author of "Open Life: The Philosophy of Open Source" MySQL improvements: Recommend backup, HA, version etc... www.openlife.cc best practices henrik.ingo@openlife.cc2011-10-25 2
  • High Availability Performance DurabilityTransactions / second (throughput) Speaking of databases Response time (latency) Committed data is not lost Percentiles (95% - 99%) D in ACID Get any response at all (tps > 0) Replicas, snapshots Measured as percentile (99.999%) point in time, backups High Availability Clustering Replication Monitoring Redundancy Failover 2011-10-25 3
  • Uptime Percentile target Max downtime per year 90% 36 days 99% 3.65 days 99.5% 1.83 days 99.9% 8.76 hours 99.99% 52.56 minutes 99.999% 5.26 minutes 99.9999% 31.5 seconds Beyond system availability: Average downtime per user.2011-10-25 4
  • So we pick a HA solution and are done! ? MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB Usability Performance AsynchronousStatement based Row based Semi-sync Synchronous Global trx id Multi threaded Clustering framework2011-10-25 5
  • Ok, so what do I really want? MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB + + + + + + + + InnoDB We use InnoDB. We want to continue using InnoDB. Which solutions support InnoDB? NDB is its own storage engine. Its great. It can blow away all others in a benchmark. But its not InnoDB and is not considered here.2011-10-25 6
  • MySQL level vs disk level replication MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB + + + + + + + + Usability + + + + ++ ++ - + Performance (1) (1) + - - + <.............. MySQL server level replication ............> < disk level ><engine> Higher level replication is betterCompetence: Performance: Replication = MySQL DBA can manage SAN has higher latency than local disk DRBD = Linux sysadmin can manage DRBD has higher latency than local disk SAN = Nobody can manage Replication has surprisingly little overheadOperations: Redundancy: Disk level = cold standby = long failover time Shared disk = Single Point of Failure Replication = hot standby = short failover time Shared nothing = redundant = good ++ for global trx id, easy provisioning 2011-10-25 7
  • Statement vs Row based Asynchronous vs Synchronous MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB + + + + + + + + Usability + + + + ++ ++ - + Performance (1) (1) + - - + Asynchronous + + + + + (2)Statement based + + + + + + Row based + + + + + (3) (3) Semi-sync + + Synchronous (4) + + + Global trx id + + + + Multi threaded (1) (1) + + Row based = deterministic = good Asynchronous = data loss on failover Statement based = dangerous Synchronous = good Global trx id = easier setup & failover Multi threaded = scalability for complex topologies2011-10-25 8
  • Clustering frameworks Heartbeat Corosync MMM Oracle/other VM MHA Tungsten Enterprise Solaris Cluster ... Failover2011-10-25 9
  • Synchronous multi-master NDB Galera Failover2011-10-25 10
  • Is a clustering solution part of the solution or the part of the problem? "Causes of Downtime in Production MySQL Servers" by Baron Schwartz: #1: Human error #2: SAN Complex clustering framework + SAN = More problems, not less! Galera (and NDB) = Replication based, no SAN or DRBD No "failover moment", no false positives No clustering framework needed (JDBC loadbalance) Simple and elegant!2011-10-25 11
  • Statement vs Row based Asynchronous vs Synchronous MySQL MySQL MySQL MySQL Tung Galera DRBD SAN NDB 5.0 5.1 5.5 5.6 sten InnoDB + + + + + + + + Usability + + + + + ++ - + Performance (1) (1) + - - + Asynchronous + + + + + (2)Statement based + + + + + + Row based + + + + + +(3) +(3) Semi-sync + + Synchronous +(4) + + + Global trx id + + + + Multi threaded (1) (1) + + Clustering + + framework 1) Multi-threaded slave, 1 per schema 2) No, but can be combined with MySQL replication 3) Reliability comparable to row based replication 4) Internally slave applier is asynchronous, but exposes synchronous caracteristics2011-10-25 12
  • Benchmarks!Baseline single node performance"group commit bug" when sync_binlog=1 & innodb_flush_log_at_trx_commit=1wsrep api (Galera module, no replication) adds minimal overhead2011-10-25 13
  • 3 node Galera clusterNo overhead in master-slave mode (red vs yellow)Small benefit in multi-master mode (depends on read/write ratio)2011-10-25 14
  • Single node, disk bound workload10% and 20% of data in cacheOptimization for dummies: Double amount of RAM, double performance2011-10-25 15
  • 3 node Galera cluster, disk boundAgain 2x RAM => 2x performanceRoughly 50% of single node performanceDecreases when writing to multiple masters (weird!)=> Blame InnoDB redo log purge weirdness2011-10-25 16
  • Same disk bound cluster, modified Sysbench OLTPModified sysbench oltp: Same queries as isolated short transactions. - 75% read-only, 25% write-onlyPerformance is same in master-slave and multi-master modes - Limited by write throughput40% of single node performance2011-10-25 17
  • DRBD vs Single node60% of single node performanceMinimum latency 10x higher but average is not so bad (not shown)Note: This is different HW than the Galera test, and different metric2011-10-25 18
  • Semi sync vs Single nodePractically no performance overheadOpportunity to relax sync_binlog setting (green - yellow)2011-10-25 19
  • Conclusions Simpler is better MySQL level replication is better than DRBD which is better than SAN Synchronous replication = no data loss Asynchronous replication = no latency (WAN replication) Multi-master = no clustering frameworks Multi-threaded slave increases performance in disk bound workload Global trx id, autoprovisioning increases operations usability Galera (and NDB) provides all these with good performance and stability2011-10-25 20
  • References http://openlife.cc/blogs/2011/july/ultimate-mysql-high-availability-solution http://openlife.cc/category/topic/galera http://openlife.cc/blogs/2011/may/drbd-and-semi-sync-shootout-large-server http://www.percona.com/about-us/white-papers/ http://www.mysqlperformanceblog.com/2011/09/18/disaster-mysql-5-5-flushing/2011-10-25 21