Uploaded on

Introducing Galera Cluster & the Codership Team …

Introducing Galera Cluster & the Codership Team

Galera Cluster in a nutshell:
True multi-master:
Read & write to any node
* Synchronous replication
* No slave lag
* No integrity issues
* No master-slave failovers or VIP needed
* Multi-threaded slave, no performance penalty
* Automatic node provisioning
Elastic:
Easy scale-out & scale-in, all nodes read-write

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
663
On Slideshare
0
From Embeds
0
Number of Embeds
1

Actions

Shares
Downloads
1
Comments
0
Likes
1

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Introduction to Galera Cluster and Codership
  • 2. 2 Created by Codership Oy   Our founders participated in 3 MySQL cluster developments, since 2003.   Started Galera work 2007. Based on PhD by Fernando Pedone.   1.0 in 2011. Percona & MariaDB in 2012.   Galera is free & open source. Support and consulting by Codership & partners.
  • 3. 3 Galera Galera in a nutshell   True multi-master: Read & write to any node   Synchronous replication   No slave lag   No integrity issues   No master-slave failovers or VIP needed   Multi-threaded slave, no performance penalty   Automatic node provisioning   Elastic: Easy scale-out & scale-in, all nodes read-write Master MasterMaster
  • 4. 4 Sysbench disk bound (20GB data / 6GB buffer), tps   EC2 w local disk -  Note: pretty poor I/O here   Blue vs red: innodb_flush_log_at_trx_commit > 66% improvement   Scale-out factors: 2N = 0.5 x 1N 4N = 0.5 x 2N Sysbench disk bound, 20GB data / 6GB InnoDB buffer, tps http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
  • 5. 5 Galera vs other HA solutions Galera is like...   MySQL replication without integrity issues or slave lag   DRBD/SAN without failover downtime and performance penalty   Oracle RAC without failover downtime   NDB, but you get to keep InnoDB Galera NDB Failover downtime MySQL replication Slow Fast Dataintegrity DRBD 99 % 99.999...% PoorSolid RAC SAN Backups
  • 6. 6 Active-Active DB = best with Load Balancer   HA Proxy, GLB, Cisco, F5...   Pictured: Load balancer on each app server -  No Single Point of Failure -  One less layer of network components -  PHP and JDBC drivers provide this built-in! jdbc:mysql:loadbalance:// 10.0.0.1,10.0.0.2,10.0.0.3 /<database>? loadBalanceBlacklistTimeout=5000   Or: Separate HW or SW load balancer -  Centralized administration -  What if LB fails? Galera MySQL MySQLMySQL LB LB
  • 7. 7 GaleraGalera Some other architectures MySQL MySQLMySQLMySQL MySQLMySQL VIP Whole stack cluster Virtual IP failover
  • 8. 8 Galera Quorum   Galera uses quorum based failure handling: -  When cluster partitioning is detected, the majority partition "has quorum" and can continue -  A minority partition cannot commit transactions, but will attempt to re-connect to primary partition -  Note: 50% is not majority! => Minimum 3 nodes recommended.   Load balancer will notice errors & remove node from pool MySQL MySQLMySQL LB LB
  • 9. 9 WAN replication   Works fine   Use higher timeouts and send windows   No impact on reads   No impact within a transaction   adds 100-300 ms to commit latency   No major impact on tps   Quorum between data centers -  3 data centers -  Distribute nodes evenly
  • 10. 10 WAN with MySQL asynchronous replication   You can mix Galera replication and MySQL replication   Good option on poor WAN   Remember to watch out for slave lag, etc...   "Channel failover" if a master node crashes   Mixed replication useful when you want async slave (such as time-delayed, filtered, multi- source...)
  • 11. 11 Who is using Galera?
  • 12. Extra slides
  • 13. 13 Migration checklist   Are your tables InnoDB?   Make sure all tables have Primary Key   Watch out for Triggers and Events Tip: Don't do too many changes at once. Migrate to InnoDB first, run a month in production, then migrate to Galera.
  • 14. 14 MySQL A MySQL Galera cluster is... InnoDBMyISAM ReplicationAPI WsrepAPI SHOW STATUS LIKE "wsrep%" SHOW VARIABLES ... Galera group comm library MySQL MySQL Snapshot State Transfer mysqldump rsync xtrabackup etc... http://www.codership.com/downloads/download-mysqlgalera
  • 15. 15 Understanding the transaction sequence in Galera BEGIN Master Slave SELECT UPDATE COMMIT User transaction Certification Group communication => GTIDCertification COMMIT Apply commit return Commit delay Virtual synchrony = Committed events written to InnoDB after small delay Optimistic locking between nodes = Risk for deadlocks ROLLB InnoDB commit COMMIT discard Certification = deterministic InnoDB commit
  • 16. 16 What if I only have 2 nodes? Galera Arbitrator (garbd)   Acts as a 3rd node in a cluster but doesn't store the data.   Run it on an app server.   Run it on any other available server.   Note: Do not run a 3rd node in a VM on same hypervisor as other Galera nodes. (Why?) Master-slave clustering   Pacemaker, Heartbeat, etc... -  Manual failover?   Still better than MySQL replication or DRBD: Hot standby, multi-threaded slave...   Prioritize data integrity: set global wsrep_on=0 # (at failover)   Prioritize failover speed: pc.ignore_quorum=on # (at startup)
  • 17. 17 Optimistic locking cluster-wide   ...theoretical chance of deadlocks -  In most cases less than 1 out of 10.000 trx -  Correct solution: Catch exceptions in app and retry -  Design: Avoid hot-spots in tables -  Workaround: Directing all writes (or all problematic writes) to single node brings back 100% InnoDB compatibility
  • 18. 18 Snapshot options SST = Full snapshot   Mysqldump & rsync will block donor -  Dedicate 1 node to act as donor   Xtrabackup is a non-blocking option   Really big databases -  wsrep_sst_method=skip + manual backup & restore -  wsrep_sst_method=fedex :-) IST = Incremental State Transfer   Logic: IST is preferred over SST   gcache.size <= DB size gcache.size >= wsrep_replicated_bytes * <outage duration>
  • 19. Benchmarks http://codership.com/info/benchmarks
  • 20. 20 Sysbench disk bound (20GB data / 6GB buffer), tps   EC2 w local disk -  Note: pretty poor I/O here   Blue vs red: turning off innodb_flush_log_at_trx_com mit gives 66% improvement   Scale-out factors: 2N = 0.5 x 1N 4N = 0.5 x 2N   5th node was EC2 weakness. Later test scaled a little more up to 8 nodes http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
  • 21. 21 Sysbench disk bound (20GB data / 6GB buffer), latency   As before   Not syncing InnoDB decreases latency   Scale-out decreases latency   Galera does not add latency overhead http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
  • 22. 22 Galera and NDB shootout: sysbench "out of the box"   Galera is 4x better Ok, so what does this really mean?   That Galera is better... -  For this workload -  With default settings (Severalnines) -  Pretty user friendly and general purpose   NDB -  Excels at key-value and heavy-write workloads (which sysbench is not) -  Would benefit here from PARTITION BY RANGE http://codership.com/content/whats-difference-kenneth
  • 23. 23 Drupal on Galera: baseline w single server   Drupal, Apache, PHP, MySQL 5.1   JMeter -  3 types of users: poster, commenter, reader -  Gaussian (15, 7) think time   Large EC2 instance   Ideal scalability: linear until tipping point at 140-180 users -  Constrained by Apache/PHP CPU utilization -  Could scale out by adding more Apache in front of single MySQL http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
  • 24. 24 Drupal on Galera: Scale-out with 1-4 Galera nodes (tps)   Drupal, Apache, PHP, MySQL 5.1 w Galera   1-4 identical nodes -  Whole stack cluster -  MySQL connection to localhost   Multiply nr of users -  180, 360, 540, 720   3 nodes = linear scalability, 4 nodes still near-linear   Minimal latency overhead http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
  • 25. 25 Drupal on Galera: Scale-out with 1-4 Galera nodes (latency)   Like before   Constant nr of users -  180, 180, 180, 180   Scaling from 1 to 2 -  drastically reduces latency -  tps back to linear scalability   Scaling to 3 and 4 -  No more tps as there was no bottleneck. -  Slightly better latency -  Note: No overhead from additional nodes! http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
  • 26. 26 WAN replication, EC2 eu-west + us-east, tps http://codership.com/content/synchronous-replication-loves-you-again client eu-west db in us-east
  • 27. 27 WAN replication, EC2 eu-west + us-east, latency http://codership.com/content/synchronous-replication-loves-you-again client eu-west db in us-east
  • 28. 28 Conclusion: WAN only adds commit latency, which is usually ok EU-west <-> US-east -  90 ms -  "best case" EU <-> JPN -  275 ms EU <-> JPN <-> USA -  295 ms You can choose latency between: -  user and web server = ok -  web server and db = bad -  db and db = great! http://codership.com/content/synchronous-replication-loves-you-again http://www.mysqlperformanceblog.com/2012/01/11/making-the-impossible-3-nodes-intercontinental-replication/