
Introduction to Galera Cluster


Introducing Galera Cluster & the Codership Team

Galera Cluster in a nutshell:
* True multi-master: read & write to any node
* Synchronous replication
* No slave lag
* No integrity issues
* No master-slave failovers or VIP needed
* Multi-threaded slave, no performance penalty
* Automatic node provisioning
* Elastic: easy scale-out & scale-in, all nodes read-write



  1. Introduction to Galera Cluster and Codership
  2. Created by Codership Oy
     - Our founders participated in 3 MySQL cluster developments, since 2003.
     - Work on Galera started in 2007, based on the PhD research of Fernando Pedone.
     - Version 1.0 shipped in 2011; Percona & MariaDB adopted Galera in 2012.
     - Galera is free & open source. Support and consulting by Codership & partners.
  3. Galera in a nutshell
     - True multi-master: read & write to any node
     - Synchronous replication
     - No slave lag
     - No integrity issues
     - No master-slave failovers or VIP needed
     - Multi-threaded slave, no performance penalty
     - Automatic node provisioning
     - Elastic: easy scale-out & scale-in, all nodes read-write
     (Diagram: three master nodes replicating to each other)
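The nutshell above maps to a handful of my.cnf settings. A minimal sketch of one node's configuration (library path, cluster name, and node addresses are placeholders for your environment, and values should be checked against your Galera version):

```ini
# Minimal Galera node configuration (illustrative)
[mysqld]
binlog_format            = ROW        # Galera requires row-based events
default_storage_engine   = InnoDB     # only InnoDB tables are replicated
innodb_autoinc_lock_mode = 2          # interleaved auto-increment for multi-master

wsrep_provider        = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name    = my_galera_cluster
wsrep_cluster_address = gcomm://10.0.0.1,10.0.0.2,10.0.0.3
wsrep_node_address    = 10.0.0.1
wsrep_sst_method      = xtrabackup
```

The gcomm:// list names the peers a starting node tries to join; the first node of a new cluster is bootstrapped with an empty list.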
  4. Sysbench disk bound (20 GB data / 6 GB InnoDB buffer), tps
     - EC2 with local disk. Note: pretty poor I/O here
     - Blue vs. red: turning off innodb_flush_log_at_trx_commit gives > 66% improvement
     - Scale-out factors: 2N = 0.5 x 1N, 4N = 0.5 x 2N
     http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
  5. Galera vs. other HA solutions. Galera is like...
     - MySQL replication without integrity issues or slave lag
     - DRBD/SAN without failover downtime and performance penalty
     - Oracle RAC without failover downtime
     - NDB, but you get to keep InnoDB
     (Chart: failover downtime (slow to fast) vs. data integrity (99% to 99.999...%) for Galera, NDB, RAC, SAN, DRBD, MySQL replication and backups)
  6. Active-active DB = best with a load balancer
     - HAProxy, GLB, Cisco, F5...
     - Pictured: a load balancer on each app server
       - No single point of failure
       - One less layer of network components
       - PHP and JDBC drivers provide this built-in:
         jdbc:mysql:loadbalance://10.0.0.1,10.0.0.2,10.0.0.3/<database>?loadBalanceBlacklistTimeout=5000
     - Or: a separate HW or SW load balancer
       - Centralized administration
       - What if the LB fails?
  7. Some other architectures
     - Whole stack cluster
     - Virtual IP failover
     (Diagrams: a Galera cluster inside each full application stack, and a Galera cluster behind a failover VIP)
  8. Galera quorum
     - Galera uses quorum-based failure handling:
       - When cluster partitioning is detected, the majority partition "has quorum" and can continue
       - A minority partition cannot commit transactions, but will attempt to re-connect to the primary partition
       - Note: 50% is not a majority! => Minimum 3 nodes recommended.
     - The load balancer will notice errors & remove the node from its pool
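The majority rule above is simple arithmetic. A minimal sketch (not Galera's actual code): a partition keeps quorum only if it holds a strict majority of the last known cluster size, which is why exactly half is not enough and a 2-node cluster cannot survive any split.

```python
def has_quorum(partition_size, cluster_size):
    """Strict majority: exactly 50% does NOT count as quorum."""
    return 2 * partition_size > cluster_size

has_quorum(2, 3)  # True: majority partition of a 3-node cluster continues
has_quorum(1, 2)  # False: either half of a 2-node split loses quorum
has_quorum(2, 4)  # False: 50% is not a majority
```

This is the arithmetic behind the "minimum 3 nodes" recommendation: with 3 nodes, any single failure still leaves a 2-of-3 majority.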
  9. WAN replication
     - Works fine
     - Use higher timeouts and send windows
     - No impact on reads
     - No impact within a transaction
     - Adds 100-300 ms to commit latency
     - No major impact on tps
     - Quorum between data centers:
       - 3 data centers
       - Distribute nodes evenly
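"Higher timeouts and send windows" translates into wsrep provider options. A hedged my.cnf sketch; the exact values are illustrative and should be tuned to your WAN round-trip time:

```ini
# WAN tuning sketch (illustrative values, ISO-8601 durations)
[mysqld]
wsrep_provider_options = "evs.keepalive_period=PT3S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;evs.send_window=512;evs.user_send_window=512"
```

Longer suspect/inactive timeouts keep transient WAN hiccups from evicting nodes, while larger send windows allow more replication packets in flight over a high-latency link.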
  10. WAN with MySQL asynchronous replication
      - You can mix Galera replication and MySQL replication
      - A good option on a poor WAN
      - Remember to watch out for slave lag, etc.
      - "Channel failover" if a master node crashes
      - Mixed replication is useful when you want an async slave (such as time-delayed, filtered, multi-source...)
  11. Who is using Galera?
  12. Extra slides
  13. Migration checklist
      - Are your tables InnoDB?
      - Make sure all tables have a primary key
      - Watch out for triggers and events
      - Tip: Don't do too many changes at once. Migrate to InnoDB first, run a month in production, then migrate to Galera.
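The first two checklist items can be verified against information_schema. A query sketch ('mydb' is a placeholder schema name):

```sql
-- Tables not using InnoDB:
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = 'mydb'
  AND table_type = 'BASE TABLE'
  AND engine <> 'InnoDB';

-- Tables without a primary key:
SELECT t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema = t.table_schema
      AND c.table_name   = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_schema = 'mydb'
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL;
```

Both queries should return empty result sets before you migrate.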
  14. A MySQL Galera cluster is...
      - MySQL (InnoDB, MyISAM) with the replication API implemented by the wsrep API
      - Inspect it via SHOW STATUS LIKE "wsrep%" and SHOW VARIABLES
      - The Galera group communication library connecting the MySQL nodes
      - Snapshot State Transfer between nodes: mysqldump, rsync, xtrabackup, etc.
      http://www.codership.com/downloads/download-mysqlgalera
  15. Understanding the transaction sequence in Galera
      - The user transaction runs on the master: BEGIN, SELECT, UPDATE, COMMIT
      - At COMMIT, the writeset goes through group communication and receives a GTID
      - Certification is deterministic: every node certifies the writeset and either performs the InnoDB commit or discards it (ROLLBACK on the originating node)
      - Virtual synchrony = committed events are written to InnoDB after a small commit delay on the other nodes
      - Optimistic locking between nodes = risk of deadlocks
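The commit path above can be sketched as a toy certification check (illustrative only, not Galera's implementation): group communication totally orders writesets, then each node deterministically rejects a writeset if any writeset committed after the transaction's snapshot touched the same keys.

```python
# Toy certification-based replication check (illustrative names).
class Certifier:
    def __init__(self):
        self.history = []  # (seqno, keys) of certified/committed writesets
        self.seqno = 0     # last assigned global sequence number

    def certify(self, keys, last_seen_seqno):
        """keys: rows the transaction wrote; last_seen_seqno: its snapshot."""
        for seqno, committed_keys in self.history:
            # Conflict: a later writeset overlapped our key set -> rollback.
            if seqno > last_seen_seqno and committed_keys & keys:
                return None
        self.seqno += 1
        self.history.append((self.seqno, set(keys)))
        return self.seqno  # commit under this GTID-like seqno

c = Certifier()
a = c.certify({"row1"}, last_seen_seqno=0)  # commits as seqno 1
b = c.certify({"row1"}, last_seen_seqno=0)  # conflicts with a -> None (rollback)
d = c.certify({"row2"}, last_seen_seqno=0)  # no overlap -> commits as seqno 2
```

Because every node sees the same ordered history, each reaches the same commit/discard verdict without further communication, which is what makes certification deterministic.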
  16. What if I only have 2 nodes?
      - Galera Arbitrator (garbd):
        - Acts as a 3rd node in the cluster but doesn't store the data
        - Run it on an app server, or on any other available server
        - Note: do not run the 3rd node in a VM on the same hypervisor as the other Galera nodes. (Why? The hypervisor would become a single point of failure.)
      - Master-slave clustering:
        - Pacemaker, Heartbeat, etc. Manual failover?
        - Still better than MySQL replication or DRBD: hot standby, multi-threaded slave...
        - Prioritize data integrity: SET GLOBAL wsrep_on=0 (at failover)
        - Prioritize failover speed: pc.ignore_quorum=on (at startup)
  17. Optimistic locking cluster-wide
      - ...theoretical chance of deadlocks
        - In most cases less than 1 out of 10,000 trx
        - Correct solution: catch exceptions in the app and retry
        - Design: avoid hot spots in tables
        - Workaround: directing all writes (or all problematic writes) to a single node brings back 100% InnoDB compatibility
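The "catch and retry" advice can be sketched as a small wrapper. Galera surfaces a certification conflict to the client as a deadlock error; the names here (DeadlockError, the flaky transaction) are illustrative placeholders, not a specific driver's API.

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for the deadlock error your MySQL driver raises."""

def with_retry(txn, max_attempts=3):
    """Run a transactional callable, retrying on deadlock errors."""
    for attempt in range(max_attempts):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Jittered backoff so retries from competing nodes spread out.
            time.sleep(random.uniform(0, 0.05 * 2 ** attempt))

attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise DeadlockError("certification conflict")
    return "committed"

result = with_retry(flaky)  # succeeds on the second attempt
```

The whole transaction body must be re-run on retry, not just the final COMMIT, because the first attempt was rolled back.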
  18. Snapshot options
      - SST = full snapshot
        - mysqldump & rsync will block the donor -> dedicate 1 node to act as donor
        - xtrabackup is a non-blocking option
        - Really big databases:
          - wsrep_sst_method=skip + manual backup & restore
          - wsrep_sst_method=fedex :-)
      - IST = Incremental State Transfer
        - Logic: IST is preferred over SST
        - gcache.size <= DB size
        - gcache.size >= wsrep_replicated_bytes * <outage duration>
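The gcache sizing rule above is back-of-the-envelope arithmetic: the cache must hold every writeset replicated during the longest outage you want IST (rather than a full SST) to cover. A sketch that estimates the replication rate from two readings of the wsrep_replicated_bytes status counter; the 1.5x safety factor is my assumption, not from the slides.

```python
def gcache_size_bytes(replicated_bytes_start, replicated_bytes_end,
                      sample_seconds, outage_seconds, safety_factor=1.5):
    """Estimate gcache.size needed for IST to cover a given outage."""
    rate = (replicated_bytes_end - replicated_bytes_start) / sample_seconds
    return int(rate * outage_seconds * safety_factor)

# E.g. 60 MB replicated over a 60 s sample (~1 MB/s), 1-hour outage budget:
size = gcache_size_bytes(0, 60_000_000, 60, 3600)  # 5_400_000_000 (~5.4 GB)
```

If the computed size exceeds the database size, just use a full SST: caching more history than the database itself buys nothing.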
  19. Benchmarks: http://codership.com/info/benchmarks
  20. Sysbench disk bound (20 GB data / 6 GB buffer), tps
      - EC2 with local disk. Note: pretty poor I/O here
      - Blue vs. red: turning off innodb_flush_log_at_trx_commit gives 66% improvement
      - Scale-out factors: 2N = 0.5 x 1N, 4N = 0.5 x 2N
      - The 5th node was an EC2 weakness; a later test scaled a little more, up to 8 nodes
      http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
  21. Sysbench disk bound (20 GB data / 6 GB buffer), latency
      - As before
      - Not syncing InnoDB decreases latency
      - Scale-out decreases latency
      - Galera does not add latency overhead
      http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
  22. Galera and NDB shootout: sysbench "out of the box"
      - Galera is 4x better. Ok, so what does this really mean?
      - That Galera is better...
        - For this workload
        - With default settings (Severalnines)
        - Pretty user-friendly and general-purpose
      - NDB:
        - Excels at key-value and heavy-write workloads (which sysbench is not)
        - Would benefit here from PARTITION BY RANGE
      http://codership.com/content/whats-difference-kenneth
  23. Drupal on Galera: baseline with a single server
      - Drupal, Apache, PHP, MySQL 5.1
      - JMeter:
        - 3 types of users: poster, commenter, reader
        - Gaussian (15, 7) think time
      - Large EC2 instance
      - Ideal scalability: linear until a tipping point at 140-180 users
        - Constrained by Apache/PHP CPU utilization
        - Could scale out by adding more Apache in front of the single MySQL
      http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
  24. Drupal on Galera: scale-out with 1-4 Galera nodes (tps)
      - Drupal, Apache, PHP, MySQL 5.1 with Galera
      - 1-4 identical nodes:
        - Whole stack cluster
        - MySQL connection to localhost
      - Multiply the number of users: 180, 360, 540, 720
      - 3 nodes = linear scalability, 4 nodes still near-linear
      - Minimal latency overhead
      http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
  25. Drupal on Galera: scale-out with 1-4 Galera nodes (latency)
      - Like before, but with a constant number of users: 180, 180, 180, 180
      - Scaling from 1 to 2 nodes:
        - Drastically reduces latency
        - tps back to linear scalability
      - Scaling to 3 and 4 nodes:
        - No more tps, as there was no bottleneck
        - Slightly better latency
        - Note: no overhead from the additional nodes!
      http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
  26. WAN replication, EC2 eu-west + us-east, tps (client in eu-west, db in us-east)
      http://codership.com/content/synchronous-replication-loves-you-again
  27. WAN replication, EC2 eu-west + us-east, latency (client in eu-west, db in us-east)
      http://codership.com/content/synchronous-replication-loves-you-again
  28. Conclusion: WAN only adds commit latency, which is usually ok
      - EU-west <-> US-east: 90 ms ("best case")
      - EU <-> JPN: 275 ms
      - EU <-> JPN <-> USA: 295 ms
      - You can choose where the latency lives:
        - Between user and web server = ok
        - Between web server and db = bad
        - Between db and db = great!
      http://codership.com/content/synchronous-replication-loves-you-again
      http://www.mysqlperformanceblog.com/2012/01/11/making-the-impossible-3-nodes-intercontinental-replication/
