- Galera is a MySQL clustering solution that offers true multi-master, synchronous replication with no single point of failure.
- It provides high availability, data integrity, and elastic scaling of databases across multiple nodes.
- Companies like Percona and MariaDB have integrated Galera to provide highly available database clusters.
Slide 2: Created by Codership Oy
- Our founders participated in three MySQL cluster developments since 2003.
- Galera work started in 2007, based on the PhD thesis of Fernando Pedone.
- Version 1.0 was released in 2011; Percona and MariaDB adopted it in 2012.
- Galera is free and open source. Support and consulting are available from Codership and partners.
Slide 3: Galera in a nutshell
True multi-master:
- Read & write to any node
- Synchronous replication: no slave lag, no integrity issues
- No master-slave failovers or VIP needed
- Multi-threaded slave, no performance penalty
- Automatic node provisioning
Elastic:
- Easy scale-out & scale-in; all nodes read-write
[Diagram: three nodes, each acting as master]
Slide 4: Sysbench disk bound (20GB data / 6GB InnoDB buffer), tps
- EC2 with local disk. Note: pretty poor I/O here.
- Blue vs red: innodb_flush_log_at_trx_commit, > 66% improvement
- Scale-out factors: 2N = 0.5 x 1N; 4N = 0.5 x 2N
http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
Slide 5: Galera vs other HA solutions
Galera is like...
- MySQL replication without integrity issues or slave lag
- DRBD/SAN without failover downtime and performance penalty
- Oracle RAC without failover downtime
- NDB, but you get to keep InnoDB
[Chart: HA solutions plotted by failover downtime (slow to fast) and data integrity (poor to solid, 99% to 99.999...%); compares Galera, NDB, MySQL replication, DRBD, SAN, RAC, and backups]
Slide 6: Active-Active DB = best with Load Balancer
HAProxy, GLB, Cisco, F5...
Pictured: load balancer on each app server
- No single point of failure
- One less layer of network components
- PHP and JDBC drivers provide this built-in:
  jdbc:mysql:loadbalance://10.0.0.1,10.0.0.2,10.0.0.3/<database>?loadBalanceBlacklistTimeout=5000
Or: separate HW or SW load balancer
- Centralized administration
- But what if the LB fails?
[Diagram: two load balancers in front of a three-node MySQL cluster]
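As a sketch of the separate software load balancer option, a minimal HAProxy TCP section for a three-node cluster could look like the following. The IPs, port, and balancing policy are illustrative assumptions, not from the original deck:

```
listen galera
    bind 0.0.0.0:3306
    mode tcp
    balance leastconn
    server node1 10.0.0.1:3306 check
    server node2 10.0.0.2:3306 check
    server node3 10.0.0.3:3306 check
```

Note that a plain `check` on a TCP listener only verifies that the port accepts connections; production setups usually add a deeper health check that inspects the node's wsrep status.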
Slide 8: Quorum
Galera uses quorum-based failure handling:
- When cluster partitioning is detected, the majority partition "has quorum" and can continue
- A minority partition cannot commit transactions, but will attempt to re-connect to the primary partition
- Note: 50% is not a majority! => Minimum 3 nodes recommended.
The load balancer will notice errors and remove the node from its pool.
[Diagram: two load balancers in front of a three-node MySQL cluster]
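A node's quorum state can be inspected from the wsrep status variables, for example:

```sql
-- 'Primary' = this node is in the partition that has quorum;
-- 'non-Primary' = quorum lost, commits are refused.
SHOW STATUS LIKE 'wsrep_cluster_status';

-- How many nodes this node currently sees in the cluster.
SHOW STATUS LIKE 'wsrep_cluster_size';
```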
Slide 9: WAN replication
- Works fine
- Use higher timeouts and send windows
- No impact on reads
- No impact within a transaction; adds 100-300 ms to commit latency
- No major impact on tps
Quorum between data centers:
- 3 data centers
- Distribute nodes evenly
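The "higher timeouts and send windows" advice maps to wsrep_provider_options in my.cnf. The sketch below uses illustrative values only; tune them to your actual WAN round-trip times:

```ini
# my.cnf sketch for WAN replication; all values are illustrative assumptions
[mysqld]
wsrep_provider_options="evs.keepalive_period=PT3S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;evs.send_window=512;evs.user_send_window=512"
```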
Slide 10: WAN with MySQL asynchronous replication
- You can mix Galera replication and MySQL replication
- Good option on a poor WAN
- Remember to watch out for slave lag, etc.
- "Channel failover" if a master node crashes
- Mixed replication is useful when you want an async slave (such as time-delayed, filtered, or multi-source)
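For the mixed setup, the Galera node that feeds an asynchronous slave must write a binary log. A minimal my.cnf sketch under that assumption (values illustrative):

```ini
# On the Galera node acting as async master (illustrative values)
[mysqld]
server_id         = 1          # must be unique across all nodes
log_bin           = mysqld-bin
log_slave_updates = ON         # also binlog events received via replication
```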
Slide 13: Migration checklist
- Are your tables InnoDB?
- Make sure all tables have a primary key
- Watch out for triggers and events
Tip: Don't make too many changes at once. Migrate to InnoDB first, run a month in production, then migrate to Galera.
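The first two checklist items can be verified against information_schema; the queries below are a sketch that skips the system schemas:

```sql
-- Tables not stored in InnoDB
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE engine <> 'InnoDB'
  AND table_type = 'BASE TABLE'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');

-- Tables without a PRIMARY KEY
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.statistics s
  ON  s.table_schema = t.table_schema
  AND s.table_name   = t.table_name
  AND s.index_name   = 'PRIMARY'
WHERE s.index_name IS NULL
  AND t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');
```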
Slide 14: A MySQL Galera cluster is...
[Diagram: a MySQL server with InnoDB and MyISAM storage engines, exposing the replication API (wsrep API), connected to the other MySQL nodes through the Galera group communication library]
- SHOW STATUS LIKE "wsrep%"
- SHOW VARIABLES ...
Snapshot State Transfer methods: mysqldump, rsync, xtrabackup, etc.
http://www.codership.com/downloads/download-mysqlgalera
Slide 15: Understanding the transaction sequence in Galera
[Diagram: transaction flow between a master node and a slave node]
- On the master, BEGIN, SELECT, UPDATE, COMMIT run as a normal user transaction.
- At COMMIT, the writeset goes through group communication, which assigns it a GTID, and is then certified on every node.
- Master: if certification passes, InnoDB commits and the COMMIT returns; if it fails, the transaction is rolled back.
- Slave: certification is deterministic, so a writeset that passes is applied and committed (after a small commit delay); one that fails is discarded.
- Virtual synchrony = committed events are written to InnoDB after a small delay.
- Optimistic locking between nodes = risk for deadlocks.
Slide 16: What if I only have 2 nodes?
Galera Arbitrator (garbd):
- Acts as a 3rd node in the cluster but doesn't store the data
- Run it on an app server, or on any other available server
- Note: Do not run a 3rd node in a VM on the same hypervisor as the other Galera nodes. (Why?)
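A garbd invocation for this setup might look like the following; the addresses and group name are illustrative assumptions:

```shell
# Joins the cluster as an arbitrator: votes in quorum decisions and
# relays replication traffic, but stores no data itself.
garbd --address "gcomm://10.0.0.1:4567,10.0.0.2:4567" \
      --group my_galera_cluster \
      --daemon
```

The --group value must match the wsrep_cluster_name configured on the two data nodes.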
Master-slave clustering:
- Pacemaker, Heartbeat, etc. Manual failover?
- Still better than MySQL replication or DRBD: hot standby, multi-threaded slave...
- Prioritize data integrity: set global wsrep_on=0 (at failover)
- Prioritize failover speed: pc.ignore_quorum=on (at startup)
Slide 17: Optimistic locking cluster-wide
...theoretical chance of deadlocks
- In most cases less than 1 out of 10,000 trx
- Correct solution: catch exceptions in the app and retry
- Design: avoid hot spots in tables
- Workaround: directing all writes (or all problematic writes) to a single node brings back 100% InnoDB compatibility
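The "catch exceptions in the app and retry" advice can be sketched as a small wrapper. `DeadlockError` is an illustrative stand-in for whatever exception your driver raises on MySQL error 1213 (a deadlock, or a Galera certification conflict), and the transaction body is hypothetical:

```python
# Sketch of the retry-on-deadlock pattern. DeadlockError stands in for
# the driver exception raised on MySQL error 1213.

class DeadlockError(Exception):
    pass

def with_retries(txn, max_attempts=3):
    """Run txn(); retry it if it aborts with a deadlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_attempts:
                raise
            # A real app would sleep briefly (with jitter) before retrying.

# Example: a transaction that hits one certification conflict, then succeeds.
attempts = []
def flaky_txn():
    attempts.append(1)
    if len(attempts) < 2:
        raise DeadlockError("certification conflict")
    return "committed"

print(with_retries(flaky_txn))  # committed
```

Because the chance of a conflict is so small, a low retry cap is enough; transactions that keep failing should surface the error to the caller.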
Slide 18: Snapshot options
SST = full snapshot:
- mysqldump & rsync will block the donor
  - Dedicate 1 node to act as donor
- xtrabackup is a non-blocking option
- Really big databases:
  - wsrep_sst_method=skip + manual backup & restore
  - wsrep_sst_method=fedex :-)
IST = Incremental State Transfer:
- Logic: IST is preferred over SST
- gcache.size <= DB size
- gcache.size >= (rate of wsrep_replicated_bytes) * <outage duration>
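The IST sizing rule is simple arithmetic: the gcache must hold at least as many writeset bytes as the cluster replicates during the longest outage you want IST (rather than a full SST) to cover. A back-of-the-envelope sketch with assumed numbers:

```python
# All numbers below are illustrative assumptions, not measurements.
replication_rate = 2 * 1024 * 1024      # ~2 MB/s of writesets (bytes/s)
outage_seconds = 60 * 60                # tolerate a one-hour node outage

# gcache.size must be at least rate * duration for IST to cover the gap
gcache_bytes = replication_rate * outage_seconds
print(gcache_bytes // (1024 ** 2), "MB")  # 7200 MB
```

In practice you would derive the rate from the growth of the wsrep_replicated_bytes counter on your own workload.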
Slide 20: Sysbench disk bound (20GB data / 6GB buffer), tps
- EC2 with local disk. Note: pretty poor I/O here.
- Blue vs red: turning off innodb_flush_log_at_trx_commit gives a 66% improvement
- Scale-out factors: 2N = 0.5 x 1N; 4N = 0.5 x 2N
- The 5th node hit an EC2 weakness. A later test scaled a little more, up to 8 nodes.
http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
Slide 21: Sysbench disk bound (20GB data / 6GB buffer), latency
- As before
- Not syncing InnoDB decreases latency
- Scale-out decreases latency
- Galera does not add latency overhead
http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
Slide 22: Galera and NDB shootout: sysbench "out of the box"
Galera is 4x better. Ok, so what does this really mean?
That Galera is better...
- For this workload
- With default settings (Severalnines)
- Pretty user friendly and general purpose
NDB:
- Excels at key-value and heavy-write workloads (which sysbench is not)
- Would benefit here from PARTITION BY RANGE
http://codership.com/content/whats-difference-kenneth
Slide 23: Drupal on Galera: baseline with a single server
- Drupal, Apache, PHP, MySQL 5.1
- JMeter: 3 types of users (poster, commenter, reader); Gaussian (15, 7) think time
- Large EC2 instance
- Ideal scalability: linear until tipping point at 140-180 users
  - Constrained by Apache/PHP CPU utilization
  - Could scale out by adding more Apache in front of a single MySQL
http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
Slide 24: Drupal on Galera: scale-out with 1-4 Galera nodes (tps)
- Drupal, Apache, PHP, MySQL 5.1 with Galera
- 1-4 identical nodes: whole stack clustered, MySQL connection to localhost
- Multiply number of users: 180, 360, 540, 720
- 3 nodes = linear scalability; 4 nodes still near-linear
- Minimal latency overhead
http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
Slide 25: Drupal on Galera: scale-out with 1-4 Galera nodes (latency)
- Like before, but a constant number of users: 180, 180, 180, 180
- Scaling from 1 to 2 nodes drastically reduces latency; tps is back to linear scalability
- Scaling to 3 and 4 nodes: no more tps, as there was no bottleneck; slightly better latency
- Note: no overhead from additional nodes!
http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
Slide 26: WAN replication, EC2 eu-west + us-east, tps
- Client in eu-west, DB in us-east
http://codership.com/content/synchronous-replication-loves-you-again
Slide 27: WAN replication, EC2 eu-west + us-east, latency
- Client in eu-west, DB in us-east
http://codership.com/content/synchronous-replication-loves-you-again
Slide 28: Conclusion: WAN only adds commit latency, which is usually ok
Measured commit latencies:
- EU-west <-> US-east: 90 ms ("best case")
- EU <-> JPN: 275 ms
- EU <-> JPN <-> USA: 295 ms
You can choose where the latency sits:
- between user and web server = ok
- between web server and db = bad
- between db and db = great!
http://codership.com/content/synchronous-replication-loves-you-again
http://www.mysqlperformanceblog.com/2012/01/11/making-the-impossible-3-nodes-intercontinental-replication/