Introduction to Galera Cluster

Introducing Galera Cluster & the Codership Team

Galera Cluster in a nutshell:
* True multi-master: read & write to any node
* Synchronous replication
* No slave lag
* No integrity issues
* No master-slave failovers or VIP needed
* Multi-threaded slave, no performance penalty
* Automatic node provisioning
* Elastic: easy scale-out & scale-in, all nodes read-write

Published in: Technology
Transcript of "Introduction to Galera Cluster"

  1. Introduction to Galera Cluster and Codership
  2. Created by Codership Oy
     * Our founders participated in 3 MySQL cluster developments, since 2003.
     * Started Galera work in 2007. Based on the PhD work of Fernando Pedone.
     * 1.0 in 2011. Percona & MariaDB in 2012.
     * Galera is free & open source. Support and consulting by Codership & partners.
  3. Galera in a nutshell
     * True multi-master: read & write to any node
     * Synchronous replication
     * No slave lag
     * No integrity issues
     * No master-slave failovers or VIP needed
     * Multi-threaded slave, no performance penalty
     * Automatic node provisioning
     * Elastic: easy scale-out & scale-in, all nodes read-write
     [Diagram: three Galera nodes, each labeled "Master"]
  4. Sysbench disk bound (20GB data / 6GB InnoDB buffer), tps
     * EC2 w local disk
       - Note: pretty poor I/O here
     * Blue vs red: turning off innodb_flush_log_at_trx_commit gives a 66% improvement
     * Scale-out factors:
       - 2N = 0.5 x 1N
       - 4N = 0.5 x 2N
     http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
  5. Galera vs other HA solutions
     Galera is like...
     * MySQL replication without integrity issues or slave lag
     * DRBD/SAN without failover downtime and performance penalty
     * Oracle RAC without failover downtime
     * NDB, but you get to keep InnoDB
     [Chart: Galera, NDB, MySQL replication, DRBD, RAC, SAN and backups plotted by failover downtime (Slow to Fast, 99 % to 99.999...%) and data integrity (Poor to Solid)]
  6. Active-Active DB = best with Load Balancer
     * HA Proxy, GLB, Cisco, F5...
     * Pictured: Load balancer on each app server
       - No Single Point of Failure
       - One less layer of network components
       - PHP and JDBC drivers provide this built-in!
         jdbc:mysql:loadbalance://10.0.0.1,10.0.0.2,10.0.0.3/<database>?loadBalanceBlacklistTimeout=5000
     * Or: Separate HW or SW load balancer
       - Centralized administration
       - What if LB fails?
     [Diagram: two load balancers (LB) in front of three MySQL nodes joined by Galera]
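
The "load balancer on each app server" idea from slide 6 can be illustrated with a small client-side sketch. This is not how the PHP or JDBC drivers implement it; `NodePool`, its methods and the injectable `clock` parameter are hypothetical names chosen for illustration. The 5-second default mirrors `loadBalanceBlacklistTimeout=5000` from the slide's JDBC URL.

```python
import itertools
import time


class NodePool:
    """Round-robin over database nodes, skipping recently failed ones.

    Hypothetical helper for illustration only; real drivers have their own logic.
    """

    def __init__(self, nodes, blacklist_timeout=5.0, clock=time.monotonic):
        self._nodes = list(nodes)
        self._cycle = itertools.cycle(self._nodes)
        self._blacklist_timeout = blacklist_timeout  # seconds
        self._clock = clock
        self._failed_at = {}  # node -> time of the last reported failure

    def mark_failed(self, node):
        """Record that a connection attempt or query on this node failed."""
        self._failed_at[node] = self._clock()

    def _is_blacklisted(self, node):
        failed = self._failed_at.get(node)
        return failed is not None and self._clock() - failed < self._blacklist_timeout

    def next_node(self):
        """Return the next usable node, trying each node at most once."""
        for _ in range(len(self._nodes)):
            node = next(self._cycle)
            if not self._is_blacklisted(node):
                return node
        raise RuntimeError("all nodes are blacklisted")
```

An application would call `next_node()` before opening a connection and `mark_failed(node)` when a connection or query fails. Because every Galera node accepts reads and writes, any node returned by the pool is a valid target.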
  7. Some other architectures
     * Whole stack cluster
     * Virtual IP (VIP) failover
     [Diagrams: two Galera clusters of three MySQL nodes each, one per architecture]
  8. Quorum
     * Galera uses quorum based failure handling:
       - When cluster partitioning is detected, the majority partition "has quorum" and can continue
       - A minority partition cannot commit transactions, but will attempt to re-connect to primary partition
       - Note: 50% is not majority! => Minimum 3 nodes recommended.
     * Load balancer will notice errors & remove node from pool
     [Diagram: two load balancers (LB) in front of three MySQL nodes joined by Galera]
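
The majority rule on slide 8 comes down to one comparison. The sketch below assumes every node counts equally and ignores any node weighting; `has_quorum` is a hypothetical name, not a Galera function.

```python
def has_quorum(partition_size: int, cluster_size: int) -> bool:
    """Return True if a partition holds a strict majority of the cluster.

    Assumes all nodes count equally. Exactly 50% is not a majority.
    """
    if cluster_size <= 0 or not 0 <= partition_size <= cluster_size:
        raise ValueError("partition_size must be between 0 and cluster_size")
    return 2 * partition_size > cluster_size
```

With 2 nodes, a split leaves 1 of 2 on each side and neither side has quorum. With 3 nodes, the side holding 2 nodes continues and the isolated node stops committing, which is why the slide recommends at least 3 nodes.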
  9. WAN replication
     * Works fine
     * Use higher timeouts and send windows
     * No impact on reads
     * No impact within a transaction
     * Adds 100-300 ms to commit latency
     * No major impact on tps
     * Quorum between data centers
       - 3 data centers
       - Distribute nodes evenly
  10. WAN with MySQL asynchronous replication
     * You can mix Galera replication and MySQL replication
     * Good option on poor WAN
     * Remember to watch out for slave lag, etc...
     * "Channel failover" if a master node crashes
     * Mixed replication useful when you want async slave (such as time-delayed, filtered, multi-source...)
  11. Who is using Galera?
  12. Extra slides
  13. Migration checklist
     * Are your tables InnoDB?
     * Make sure all tables have Primary Key
     * Watch out for Triggers and Events
     Tip: Don't do too many changes at once. Migrate to InnoDB first, run a month in production, then migrate to Galera.
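
The first two checklist items on slide 13 can be automated once table metadata is at hand (on MySQL it can be read from `information_schema.tables` and `information_schema.table_constraints`). The sketch below works on already-fetched rows; `TableInfo` and `migration_issues` are hypothetical names for illustration, and triggers and events are not checked.

```python
from typing import List, NamedTuple


class TableInfo(NamedTuple):
    """One table's metadata, as an application might collect it (hypothetical shape)."""
    schema: str
    name: str
    engine: str
    has_primary_key: bool


def migration_issues(tables: List[TableInfo]) -> List[str]:
    """List tables that violate the InnoDB or primary key checklist items."""
    issues = []
    for table in tables:
        full_name = f"{table.schema}.{table.name}"
        if table.engine.lower() != "innodb":
            issues.append(f"{full_name}: engine is {table.engine}, not InnoDB")
        if not table.has_primary_key:
            issues.append(f"{full_name}: no primary key")
    return issues
```

An empty result means both checks pass for the tables supplied.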
  14. A MySQL Galera cluster is...
     [Diagram: MySQL with InnoDB and MyISAM storage engines, Replication API, Wsrep API, Galera group comm library, and a second MySQL node]
     * Monitoring: SHOW STATUS LIKE "wsrep%", SHOW VARIABLES ...
     * Snapshot State Transfer: mysqldump, rsync, xtrabackup, etc...
     http://www.codership.com/downloads/download-mysqlgalera
  15. Understanding the transaction sequence in Galera
     [Diagram labels, Master and Slave columns: BEGIN, SELECT, UPDATE, COMMIT, User transaction, Certification, Group communication => GTID, Certification, COMMIT, Apply, commit return, Commit delay, ROLLB, InnoDB commit, COMMIT, discard, InnoDB commit]
     * Virtual synchrony = Committed events written to InnoDB after small delay
     * Optimistic locking between nodes = Risk for deadlocks
     * Certification = deterministic
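
Slide 15 describes certification as deterministic. The toy model below shows the general idea of certification-based replication under simplifying assumptions: write sets arrive at every node in the same total order, each carries the sequence number it last saw, and conflicts are tracked per row key. `Certifier` and its fields are hypothetical; the real Galera implementation differs in detail.

```python
class Certifier:
    """Toy certification index: row key -> seqno of the last certified write."""

    def __init__(self):
        self.seqno = 0          # last sequence number handed out
        self._last_write = {}   # row key -> seqno that last modified it

    def certify(self, snapshot_seqno, write_keys):
        """Return the assigned seqno if the write set passes, else None.

        A write set fails if any of its keys was modified by a write set
        certified after the snapshot the transaction was working from.
        """
        for key in write_keys:
            if self._last_write.get(key, 0) > snapshot_seqno:
                return None  # conflict: the transaction must roll back
        self.seqno += 1
        for key in write_keys:
            self._last_write[key] = self.seqno
        return self.seqno
```

Because the verdict depends only on the ordered stream of write sets, two nodes fed the same stream reach the same decisions without exchanging further messages. A failed certification is what the application sees as a deadlock error on commit.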
  16. What if I only have 2 nodes?
     Galera Arbitrator (garbd)
     * Acts as a 3rd node in a cluster but doesn't store the data.
     * Run it on an app server.
     * Run it on any other available server.
     * Note: Do not run a 3rd node in a VM on same hypervisor as other Galera nodes. (Why?)
     Master-slave clustering
     * Pacemaker, Heartbeat, etc...
       - Manual failover?
     * Still better than MySQL replication or DRBD: Hot standby, multi-threaded slave...
     * Prioritize data integrity: set global wsrep_on=0 # (at failover)
     * Prioritize failover speed: pc.ignore_quorum=on # (at startup)
  17. Optimistic locking cluster-wide
     * ...theoretical chance of deadlocks
       - In most cases fewer than 1 in 10,000 transactions
       - Correct solution: Catch exceptions in app and retry
       - Design: Avoid hot-spots in tables
       - Workaround: Directing all writes (or all problematic writes) to a single node brings back 100% InnoDB compatibility
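
The "catch exceptions in app and retry" advice on slide 17 might look like the sketch below. `DeadlockError` is a hypothetical stand-in for whatever exception the database driver raises for a deadlock (MySQL error 1213, ER_LOCK_DEADLOCK); `run_with_retry` and its parameters are likewise illustrative.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver's deadlock exception."""


def run_with_retry(transaction, max_attempts=3, backoff=0.0, sleep=time.sleep):
    """Run transaction(), retrying the whole transaction when it deadlocks.

    The callable must be safe to re-run from BEGIN, since a deadlock
    rolls the transaction back completely.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and let the caller handle it
            sleep(backoff * attempt)  # simple linear backoff between attempts
```

The retry wraps the whole transaction rather than a single statement, so the callable should perform BEGIN through COMMIT itself.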
  18. Snapshot options
     SST = Full snapshot
     * Mysqldump & rsync will block donor
       - Dedicate 1 node to act as donor
     * Xtrabackup is a non-blocking option
     * Really big databases
       - wsrep_sst_method=skip + manual backup & restore
       - wsrep_sst_method=fedex :-)
     IST = Incremental State Transfer
     * Logic: IST is preferred over SST
     * gcache.size <= DB size
     * gcache.size >= wsrep_replicated_bytes * <outage duration>
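
The gcache sizing rule on slide 18 can be turned into arithmetic. The sketch below reads "wsrep_replicated_bytes * <outage duration>" as a write rate multiplied by the outage length, which is an interpretation rather than something the slide spells out. It estimates the rate from two samples of the `wsrep_replicated_bytes` status counter; `min_gcache_bytes` and `safety_factor` are hypothetical names.

```python
import math


def min_gcache_bytes(bytes_before, bytes_after, sample_seconds,
                     outage_seconds, safety_factor=1.0):
    """Estimate the gcache size needed to cover an outage with IST.

    bytes_before and bytes_after are two readings of wsrep_replicated_bytes
    taken sample_seconds apart; the difference gives the replication rate.
    """
    if sample_seconds <= 0:
        raise ValueError("sample_seconds must be positive")
    rate = (bytes_after - bytes_before) / sample_seconds  # bytes per second
    return math.ceil(rate * outage_seconds * safety_factor)
```

For example, 600 MB replicated in one minute is 10 MB/s, so surviving a one-hour outage without falling back to SST needs about 36 GB of gcache. If that exceeds the database size, the slide's other bound (gcache.size <= DB size) suggests a full SST is no more expensive.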
  19. Benchmarks
     http://codership.com/info/benchmarks
  20. Sysbench disk bound (20GB data / 6GB buffer), tps
     * EC2 w local disk
       - Note: pretty poor I/O here
     * Blue vs red: turning off innodb_flush_log_at_trx_commit gives 66% improvement
     * Scale-out factors:
       - 2N = 0.5 x 1N
       - 4N = 0.5 x 2N
     * 5th node was EC2 weakness. Later test scaled a little more up to 8 nodes
     http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
  21. Sysbench disk bound (20GB data / 6GB buffer), latency
     * As before
     * Not syncing InnoDB decreases latency
     * Scale-out decreases latency
     * Galera does not add latency overhead
     http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
  22. Galera and NDB shootout: sysbench "out of the box"
     * Galera is 4x better
     Ok, so what does this really mean?
     * That Galera is better...
       - For this workload
       - With default settings (Severalnines)
       - Pretty user friendly and general purpose
     * NDB
       - Excels at key-value and heavy-write workloads (which sysbench is not)
       - Would benefit here from PARTITION BY RANGE
     http://codership.com/content/whats-difference-kenneth
  23. Drupal on Galera: baseline w single server
     * Drupal, Apache, PHP, MySQL 5.1
     * JMeter
       - 3 types of users: poster, commenter, reader
       - Gaussian (15, 7) think time
     * Large EC2 instance
     * Ideal scalability: linear until tipping point at 140-180 users
       - Constrained by Apache/PHP CPU utilization
       - Could scale out by adding more Apache in front of single MySQL
     http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
  24. Drupal on Galera: Scale-out with 1-4 Galera nodes (tps)
     * Drupal, Apache, PHP, MySQL 5.1 w Galera
     * 1-4 identical nodes
       - Whole stack cluster
       - MySQL connection to localhost
     * Multiply nr of users
       - 180, 360, 540, 720
     * 3 nodes = linear scalability, 4 nodes still near-linear
     * Minimal latency overhead
     http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
  25. Drupal on Galera: Scale-out with 1-4 Galera nodes (latency)
     * Like before
     * Constant nr of users
       - 180, 180, 180, 180
     * Scaling from 1 to 2
       - drastically reduces latency
       - tps back to linear scalability
     * Scaling to 3 and 4
       - No more tps as there was no bottleneck.
       - Slightly better latency
       - Note: No overhead from additional nodes!
     http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
  26. WAN replication, EC2 eu-west + us-east, tps
     http://codership.com/content/synchronous-replication-loves-you-again
     [Chart labels: client eu-west, db in us-east]
  27. WAN replication, EC2 eu-west + us-east, latency
     http://codership.com/content/synchronous-replication-loves-you-again
     [Chart labels: client eu-west, db in us-east]
  28. Conclusion: WAN only adds commit latency, which is usually ok
     * EU-west <-> US-east
       - 90 ms
       - "best case"
     * EU <-> JPN
       - 275 ms
     * EU <-> JPN <-> USA
       - 295 ms
     * You can choose latency between:
       - user and web server = ok
       - web server and db = bad
       - db and db = great!
     http://codership.com/content/synchronous-replication-loves-you-again
     http://www.mysqlperformanceblog.com/2012/01/11/making-the-impossible-3-nodes-intercontinental-replication/
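
The conclusion on slide 28 can be checked with simple arithmetic. The model below is a simplification: it assumes statements inside a transaction run at local speed and that each commit pays exactly one WAN round trip. Both function names are hypothetical.

```python
def transaction_latency_ms(statement_ms, commit_rtt_ms):
    """Local statement times plus one replication round trip at commit."""
    return sum(statement_ms) + commit_rtt_ms


def max_serial_commits_per_second(commit_rtt_ms):
    """Upper bound on commits per second for one connection committing back to back."""
    return 1000.0 / commit_rtt_ms
```

With the slide's EU-west <-> US-east figure of 90 ms, a transaction whose statements take 10 ms locally completes in about 100 ms. A single connection is capped at roughly 11 commits per second, but many connections commit in parallel, which is consistent with the earlier claim that WAN replication has no major impact on tps.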