Galera Cluster Best Practices
Seppo Jaakola
Codership

  1. Galera Cluster Best Practices, Seppo Jaakola, Codership
  2. Agenda
      ● Galera Cluster Short Introduction
      ● Multi-master Conflicts
      ● State Transfers (SST, IST)
      ● Backups
      ● Schema Upgrades
      ● Galera Project
      www.codership.com
  3. Galera Cluster
  4. Multi-Master Replication
      [Diagram: one MySQL node, Galera Replication]
  5. Multi-Master Replication
      There can be several nodes
      [Diagram: two MySQL nodes, Galera Replication]
  6. Multi-Master Replication
      There can be several nodes
      [Diagram: three MySQL nodes, Galera Replication]
  7. Multi-Master Replication
      Client can connect to any node
      There can be several nodes
      [Diagram: three MySQL nodes, Galera Replication]
  8. Multi-Master Replication
      Read & write access to any node
      Client can connect to any node
      There can be several nodes
      [Diagram: three MySQL nodes, each labelled "read & write", Galera Replication]
  9. Multi-Master Replication
      Read & write access to any node
      Client can connect to any node
      There can be several nodes
      Replication is synchronous
      [Diagram: three MySQL nodes, each labelled "read & write", Galera Replication]
  10. Multi-Master Replication
      Multi-master cluster looks like one big database with multiple entry points
      [Diagram: a single MySQL database with three "read & write" entry points]
  11. Galera Cluster
      ➢ Synchronous multi-master cluster
      ➢ For MySQL/InnoDB
      ➢ 3 or more nodes needed for HA
      ➢ Automatic node provisioning
      ➢ Works in LAN / WAN / Cloud
  12. Synchronous Replication
      Transaction is processed locally up to commit time
      [Diagram: "Read & write" on one of three MySQL nodes, Galera Replication]
  13. Synchronous Replication
      Transaction is replicated to whole cluster
      [Diagram: "commit" on one of three MySQL nodes, Galera Replication]
  14. Synchronous Replication
      Client gets OK status
      [Diagram: "OK" returned from one of three MySQL nodes, Galera Replication]
  15. Synchronous Replication
      Transaction is applied in slaves
      [Diagram: three MySQL nodes, Galera Replication]
  16. Dealing with Multi-Master Conflicts
  17. Multi-master Conflicts
      [Diagram: two writes arriving at a cluster of three MySQL nodes, Galera Replication]
  18. Multi-master Conflicts
      [Diagram: two writes arriving at a cluster of three MySQL nodes; "Conflict detected"]
  19. Multi-master Conflicts
      [Diagram: one write gets "OK", the other gets "Deadlock error"]
  20. Multi-Master Conflicts
      ● Galera uses optimistic concurrency control
      ● If two transactions modify the same row on different nodes at the same time, one of the transactions must abort
        ➔ The victim transaction will get a deadlock error
      ● The application should retry deadlocked transactions; however, not all applications have retry logic built in
  21. Database Hot-Spots
      ● Hot-spots are rows that many transactions want to write to simultaneously
      ● Patterns like queue or ID allocation can be hot-spots
  22. Hot-Spots
      [Diagram: three writes hitting one "Hot row"]
  23. Diagnosing Multi-Master Conflicts
      ● In the past Galera did not log much information about cluster-wide conflicts
      ● But with the wsrep_debug configuration, all conflicts (...and plenty of other information) will be logged
      ● The next release will add a new variable, wsrep_log_conflicts, which will cause each cluster conflict to be logged in the MySQL error log
      ● Monitor:
        ● wsrep_local_bf_aborts
        ● wsrep_local_cert_failures
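The two counters named on this slide only ever grow, so what matters in practice is how fast they grow. The following is a minimal Python sketch, assuming the status values have already been fetched (for example with SHOW STATUS) into plain dictionaries; the function name and the sample numbers are made up for illustration.

```python
def conflict_deltas(before, after):
    """Return how much each conflict counter grew between two status samples."""
    counters = ("wsrep_local_bf_aborts", "wsrep_local_cert_failures")
    # Status values typically arrive as strings, so convert before subtracting.
    return {name: int(after[name]) - int(before[name]) for name in counters}


# Hypothetical samples taken some time apart on the same node.
earlier = {"wsrep_local_bf_aborts": "10", "wsrep_local_cert_failures": "4"}
later = {"wsrep_local_bf_aborts": "17", "wsrep_local_cert_failures": "4"}

deltas = conflict_deltas(earlier, later)
print(deltas)
```

A steadily rising delta on one node is a hint that the node is writing to a hot-spot.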
  24. wsrep_retry_autocommit
      ● Galera can retry an autocommit transaction on behalf of the client application, inside the MySQL server
      ● MySQL will not return a deadlock error, but will silently retry the transaction
      ● wsrep_retry_autocommit=n will retry the transaction n times before giving up and returning a deadlock error
      ● Retrying applies only to autocommit transactions, as retrying is not safe for multi-statement transactions
  25. Retry Autocommit
      [Diagram: two writes arriving at a cluster of three MySQL nodes; "1. conflict detected", "2. retrying"]
  26. Multi-Master Conflicts
      1) Analyze the hot-spot
      2) Check if application logic can be changed to catch the deadlock exception and apply retry logic in the application
      3) Try if wsrep_retry_autocommit configuration helps
      4) Limit the number of master nodes, or change completely to a master-slave model
         ● If you can filter out the access to the hot-spot table, it is enough to treat writes only to the hot-spot table as master-slave
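Step 2 above, catching the deadlock exception and retrying in the application, could look roughly like the sketch below. It is not tied to any particular MySQL driver: DatabaseError is a hypothetical stand-in for whatever exception the driver raises, and run_with_retry is an illustrative name. The one fact it relies on is that MySQL reports a deadlock as error code 1213 (ER_LOCK_DEADLOCK).

```python
import time

ER_LOCK_DEADLOCK = 1213  # MySQL error code for "Deadlock found when trying to get lock"


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception that carries an errno."""

    def __init__(self, errno, message=""):
        super().__init__(message)
        self.errno = errno


def run_with_retry(transaction, max_retries=3, backoff_seconds=0.0):
    """Call transaction(); re-run it from the start when it hits a deadlock.

    `transaction` must contain the whole unit of work (begin, statements,
    commit), because the aborted attempt has been rolled back entirely.
    """
    for attempt in range(max_retries + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            # Only deadlocks are retried; anything else is passed to the caller.
            if exc.errno != ER_LOCK_DEADLOCK or attempt == max_retries:
                raise
            time.sleep(backoff_seconds * (attempt + 1))


# Demo: a transaction that is the conflict victim twice, then succeeds.
attempts = []


def flaky_update():
    attempts.append(1)
    if len(attempts) < 3:
        raise DatabaseError(ER_LOCK_DEADLOCK, "Deadlock found when trying to get lock")
    return "committed"


result = run_with_retry(flaky_update)
```

This is the application-side counterpart of wsrep_retry_autocommit, and unlike that setting it can also cover multi-statement transactions, because the application knows how to replay them.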
  27. State Transfers
  28. State Transfer
      ➢ Joining node needs to get the current database state
      ➢ Two choices:
        ➢ IST: incremental state transfer
        ➢ SST: full state transfer
      ➢ If joining node had some previous state and gcache spans to that, then IST can be used
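The choice between IST and SST described on this slide can be pictured as a small decision function. The sketch below is a simplification under stated assumptions, not Galera's actual implementation: all parameter names are hypothetical, a negative seqno is taken to mean "no usable previous state", and the donor's gcache is described only by the lowest seqno it still holds.

```python
def choose_state_transfer(joiner_uuid, joiner_seqno, cluster_uuid, donor_cached_downto):
    """Return "IST" when the donor's gcache can supply everything the joiner misses."""
    # The joiner's previous state must belong to this cluster's history.
    same_history = joiner_uuid == cluster_uuid
    # Assumption: a negative seqno marks an undefined position.
    has_state = joiner_seqno >= 0
    # The joiner needs every write set from joiner_seqno + 1 onwards.
    gcache_covers = donor_cached_downto <= joiner_seqno + 1
    return "IST" if same_history and has_state and gcache_covers else "SST"


cluster = "6b5a2b0e-0000-0000-0000-000000000001"  # made-up UUID for the demo
short_outage = choose_state_transfer(cluster, 100, cluster, 90)
long_outage = choose_state_transfer(cluster, 100, cluster, 150)
```

In the second call the donor no longer holds write sets 101 to 149, so only a full snapshot can bring the joiner up to date.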
  29. State Snapshot Transfer
      ● To send full database state
      ● wsrep_sst_method to choose the method:
        ➢ mysqldump
        ➢ rsync
        ➢ xtrabackup
  30. SST Request
      ● wsrep_sst_method
      [Diagram: the joiner MySQL node sends an "SST Request" over Galera Replication]
  31. SST Method
      [Diagram: donor and joiner MySQL nodes on Galera Replication; methods wsrep_sst_mysqldump, wsrep_sst_rsync, wsrep_sst_xtrabackup]
  32. SST API
      ● SST is an open API for shell scripts
      ● Anyone can write a custom SST
      ● SST API can be used e.g. for:
        ● Backups
        ● Filtering out part of database
  33. wsrep_sst_mysqldump
      ● Logical backup
      ● Slowest method
      ● Configure authentication
        ➢ wsrep_sst_auth="root:rootpass"
        ➢ Super privilege needed
      ● Make sure the SST user in the donor node can take a mysqldump from the donor and load it over the network to the joiner node
        ● You can try this manually beforehand
  34. wsrep_sst_rsync
      ● Physical backup
      ● Fast method
      ● Can only be used when node is starting
        ➢ Rsyncing the data directory under a running InnoDB is not possible
  35. wsrep_sst_xtrabackup
      ● Contributed by Percona
      ● Probably the fastest method
      ● Uses xtrabackup
      ● Least blocking on the donor side (a short read lock is still used when the backup starts)
  36. SST Donor
      ● All SST methods cause some disturbance for the donor node
      ● By default the donor accepts client connections, although committing will be prohibited for a while
      ● If wsrep_sst_donor_rejects_queries is set, the donor gives an unknown command error to clients
      ➔ Best practice is to dedicate a reference node for donor and backup activities
  37. Incremental State Transfer
      [Diagram: Joiner sends "Request to join" with GTID seqno-n to Donor; each node has a gcache, and the donor's gcache contains seqno-n]
  38. Incremental State Transfer
      [Diagram: Donor sends IST events from its gcache, starting at seqno-n; Joiner applies them]
  39. Incremental State Transfer
      ● Very effective
      ● gcache.size parameter defines how big a cache will be maintained
      ● Gcache is mmap; available disk space is the upper limit for size allocation
  40. Incremental State Transfer
      ● Use database size and write rate to optimize gcache:
        ➢ gcache < database
        ➢ Write rate tells how long a tail will be stored in the cache
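The relationship between write rate and the length of the cached tail is plain arithmetic. A rough sketch follows, assuming the replication write rate is known in bytes per second (how to measure it is outside this slide) and ignoring per-write-set overhead in the cache; both function names are hypothetical.

```python
def gcache_window_seconds(gcache_size_bytes, write_rate_bytes_per_second):
    """Estimate how many seconds of replication traffic fit into the gcache."""
    if write_rate_bytes_per_second <= 0:
        raise ValueError("write rate must be positive")
    return gcache_size_bytes / write_rate_bytes_per_second


def gcache_size_for_outage(write_rate_bytes_per_second, outage_seconds):
    """Estimate the gcache size needed so a node can be away this long and still use IST."""
    return write_rate_bytes_per_second * outage_seconds


GIB = 1024 ** 3
MIB = 1024 ** 2

# Example figures only: a 1 GiB gcache under a steady 1 MiB/s write rate.
window = gcache_window_seconds(1 * GIB, 1 * MIB)
needed = gcache_size_for_outage(1 * MIB, 3600)
```

With these example figures the cache holds about 17 minutes of writes; covering a one-hour outage at the same rate would take roughly 3.5 GiB.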
  41. Incremental State Transfer
      ● You can think of IST as:
        ● A short asynchronous replication session
        ● If the connection quality is bad, a node can drop out and join back fast with IST
  42. Backups
  43. Backups
      ➢ All Galera nodes are constantly up to date
      ➢ Best practices:
        ➢ Dedicate a reference node for backups
        ➢ Assign global trx ID with the backup
      ➢ Possible methods:
        1. Disconnecting a node for backup
        2. Using SST script interface
        3. xtrabackup
  44. Backups with global Trx ID
      ➢ Global transaction ID (GTID) marks a position in the cluster transaction stream
      ➢ A backup with a known GTID makes it possible to utilize IST when joining new nodes, e.g. when:
        ➢ Recovering the node
        ➢ Provisioning new nodes
  45. Backup by Disconnecting a Node
      Isolate the backup node
      [Diagram: Load Balancing in front of three MySQL nodes, Galera Replication]
  46. Backup by Disconnecting a Node
      Disconnect from group, e.g. clear wsrep_provider
      [Diagram: Load Balancing in front of three MySQL nodes, Galera Replication]
  47. Backup by Disconnecting a Node
      Disconnect from group, e.g. clear wsrep_provider
      [Diagram: Load Balancing in front of three MySQL nodes, Galera Replication]
  48. Backup by Disconnecting a Node
      Work your backup magic
      [Diagram: Load Balancing in front of three MySQL nodes, Galera Replication; "backups" taken from the isolated node]
  49. Backup by Disconnecting a Node
      Read global transaction ID from status and assign to backup:
        wsrep_cluster_uuid
        wsrep_last_committed
      [Diagram: Load Balancing in front of three MySQL nodes, Galera Replication; "backups"]
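A Galera global transaction ID is written as uuid:seqno, so tagging a backup amounts to joining the two status values and storing the result next to the backup files. The sketch below uses the status names exactly as they appear on the slide; the variable names on a given server version may differ, so check SHOW STATUS LIKE 'wsrep%' before relying on them. The helper names and the sample UUID are made up.

```python
def format_gtid(status):
    """Build a "uuid:seqno" tag from status values read on the backup node."""
    # Key names are copied from the slide and may differ between versions.
    return f"{status['wsrep_cluster_uuid']}:{status['wsrep_last_committed']}"


def parse_gtid(text):
    """Split a "uuid:seqno" tag back into (uuid, seqno)."""
    uuid, _, seqno = text.strip().rpartition(":")
    if not uuid:
        raise ValueError(f"not a uuid:seqno pair: {text!r}")
    return uuid, int(seqno)


# Hypothetical values as they might be read from the isolated node.
status = {
    "wsrep_cluster_uuid": "6b5a2b0e-0000-0000-0000-000000000001",
    "wsrep_last_committed": "48211",
}
tag = format_gtid(status)
uuid, seqno = parse_gtid(tag)
```

Storing the tag in a small text file inside the backup directory keeps the position and the data together when the backup is later used to provision a node.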
  50. Backup by SST
      ● Donor mode provides an isolated processing environment
      ● A special SST script can be written just to prepare a backup in the donor node: wsrep_sst_backup
      ● Garbd can be used to trigger the donor node to run wsrep_sst_backup
  51. Backup by SST API
      Launch garbd
      [Diagram: Load Balancing in front of node1, node2, node3 on Galera Replication; Garbd sends an SST request with wsrep_sst_donor=node3 and wsrep_sst_method=backup]
  52. Backup by SST API
      Donor launches wsrep_sst_backup
      [Diagram: Load Balancing in front of node1, node2, node3 on Galera Replication; node3 runs wsrep_sst_backup]
  53. Backup by SST API
      wsrep_sst_backup prepares the backup
      [Diagram: Load Balancing in front of node1, node2, node3 on Galera Replication; wsrep_sst_backup writes "backups" together with the GTID]
  54. Backup by SST API
      Backup node returns to cluster
      [Diagram: Load Balancing in front of node1, node2, node3 on Galera Replication]
  55. Backup by xtrabackup
      ● Xtrabackup is a hot backup method and can be used anytime
      ● Simple, efficient
      ● Use the --galera-info option to get the global transaction ID logged into a separate galera info file
  56. Schema Upgrades
  57. Schema Upgrades
      ● DDL is non-transactional, and therefore bad
      ● Galera has two methods for DDL:
        ● TOI, Total Order Isolation
        ● RSU, Rolling Schema Upgrade
      ● Use wsrep_osu_method to choose either option
  58. Total Order Isolation
      ● DDL is replicated up-front
      ● Each node will get the DDL statement and must process the DDL at the same slot in the transaction stream
      ● Galera will isolate the affected table/database for the duration of DDL processing
  59. Rolling Schema Upgrade
      ● DDL is not replicated
      ● Galera will take the node out of replication for the duration of DDL processing
      ● When the DDL is done, the node will catch up with missed transactions (like IST)
      ● DBA should roll the RSU operation over all nodes
      ● Requires backwards compatible schema changes
  60. wsrep_on=OFF
      ● wsrep_on is a session variable telling if this session will be replicated or not
      ● I tried to hide this information as best I could, but somebody has leaked it out
      ● And so, yes, it is possible to run a "poor man's RSU" with wsrep_on set to OFF
        ● Such a session may be aborted by replication
      ● Use only if you are really sure that:
        ● The planned SQL is not conflicting
        ● The SQL will not generate inconsistency
  61. Schema Upgrades
      ● Best practices:
        ➔ Plan your upgrades
        ➔ Rehearse your upgrades
        ➔ Try to be backwards compatible
        ➔ Find out DDL execution time
        ➔ Go for RSU if possible
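Rolling an RSU operation over all nodes means repeating the same three steps on each node in turn. The sketch below only builds that plan as a list of (node, statement) pairs so it can be reviewed or rehearsed; it does not connect to anything. The function name is hypothetical, and whether the variable has to be set with GLOBAL or SESSION scope depends on the server version, which is why the scope is a parameter here.

```python
def rolling_upgrade_plan(nodes, ddl, scope="GLOBAL"):
    """Return the ordered (node, statement) steps for a rolling schema upgrade."""
    plan = []
    for node in nodes:
        # Switch this node to RSU so the DDL is applied locally and not replicated.
        plan.append((node, f"SET {scope} wsrep_OSU_method='RSU'"))
        plan.append((node, ddl))
        # Restore the default method before moving on to the next node.
        plan.append((node, f"SET {scope} wsrep_OSU_method='TOI'"))
    return plan


# Hypothetical node names and a backwards compatible change.
plan = rolling_upgrade_plan(
    ["node1", "node2", "node3"],
    "ALTER TABLE t1 ADD COLUMN note VARCHAR(64) NULL",
)
for node, statement in plan:
    print(f"{node}: {statement}")
```

Each node should be allowed to catch up with the cluster before the next one is started, which the plan itself does not enforce.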
  62. Consistent Reads
  63. Consistent reads
      Replication is virtually synchronous...
      Transaction is replicated to whole cluster
      [Diagram: "commit" on one of three MySQL nodes, Galera Replication]
  64. Consistent reads
      1. Insert into t1 values (1,....)
      2. Select from t1 where i=1
      Will the select see the inserted row?
      [Diagram: two MySQL nodes, Galera Replication]
  65. Consistent Reads
      ● Aka read causality
      ● There is a causal dependency between operations on two database connections
        ● The application expects to see the values of an earlier write
  66. Consistent Reads
      ● Use: wsrep_causal_reads=ON
        ➔ Every read (select, show) will wait until the slave queue has been fully applied
      ● There is a timeout for the maximum causal read wait:
        ● replicator.causal_read_keepalive
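The effect of causal reads can be shown with a toy model. The class below is purely illustrative and has nothing to do with Galera's internals: a node keeps replicated writes in a queue until they are applied, an ordinary read looks only at what is already applied, and a causal read first drains the queue.

```python
from collections import deque


class ToyNode:
    """Toy model of a node that receives write sets before it applies them."""

    def __init__(self):
        self.rows = {}        # data that has already been applied
        self.queue = deque()  # replicated writes waiting to be applied

    def receive(self, key, value):
        """A write set arrives from another node but is not applied yet."""
        self.queue.append((key, value))

    def apply_one(self):
        """Apply the oldest pending write set, if there is one."""
        if self.queue:
            key, value = self.queue.popleft()
            self.rows[key] = value

    def read(self, key, causal=False):
        """Read a row; a causal read waits until the queue is fully applied."""
        if causal:
            while self.queue:
                self.apply_one()
        return self.rows.get(key)


node = ToyNode()
node.receive(1, "inserted on another node")
plain = node.read(1)                 # the row is still sitting in the queue
causal = node.read(1, causal=True)   # the queue is drained before reading
```

The plain read misses the row while the causal read sees it, which is the trade-off on the slide: consistency is bought with extra read latency.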
  67. Other Tidbits...
  68. Parallel Applying
      ● Aka parallel replication
      ● "True parallel applying"
        ● Every application will benefit from it
        ● Works not on database level, not on table level, but on row level
      ● wsrep_slave_threads=n
      ● How many slave threads make sense:
        ● Monitor wsrep_cert_deps_distance
        ● Max 2 * cores
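One way to read the two hints on this slide is as a capped heuristic: use the observed wsrep_cert_deps_distance as the thread count, but never more than twice the number of cores. The sketch below is that interpretation only, with a hypothetical function name, and is not an official sizing formula.

```python
import os


def suggest_slave_threads(cert_deps_distance, cores=None):
    """Suggest a wsrep_slave_threads value from the two hints on the slide."""
    if cores is None:
        cores = os.cpu_count() or 1
    # Use the observed parallelism potential, capped at 2 * cores, at least 1.
    return max(1, min(round(cert_deps_distance), 2 * cores))


# Example figures only.
busy = suggest_slave_threads(23.8, cores=4)
quiet = suggest_slave_threads(2.4, cores=4)
```

With four cores the first case is capped at 8 threads, while the second stays at 2 because the workload offers little parallelism.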
  69. MyISAM Replication
      ● On experimental level
      ● MyISAM is being phased out, so there is not much demand to complete it
      ● Replicates SQL up-front, like TOI
      ● Should be used in master-slave model
      ● No checks for non-deterministic SQL
        ● Insert into t (r, time) values (rand(), now());
  70. SSL / TLS
      ● Replication over SSL is supported
      ● No authentication (yet), only encryption
      ● Whole cluster must use SSL
  71. SSL or VPN
      ● Bundling several nodes through a VPN tunnel may cause a vulnerability
      ● When the VPN gateway breaks, a big part of the cluster will be blacked out
      ● Best practice is to go for SSL if the VPN does not have alternative routes
  72. UDP Multicast
      ● Configure with gmcast.mcast_addr
      ● Full cluster must be configured for multicast or tcp sockets
      ● Multicast is good for scalability
      ● Best practice is to go for multicast if planning for large clusters
  73. Galera Project
  74. Galera Project
      ● Galera Cluster for MySQL
        ● 5 years development
        ● Based on MySQL server community edition
        ● Fully open source
        ● Active community
      ● ~3 releases per year
        ● Release 2.2 RC out yesterday
        ● Major release 3.0 in the works
      ● Galera Replication also used in:
        ● Percona XtraDB Cluster
        ● MariaDB Galera Cluster
  75. Galera Project
      [Diagram: Galera Cluster for MySQL, MariaDB Galera Cluster and Percona XtraDB Cluster; MySQL, Percona Server and MariaDB are each merged with the API and sit on the Galera Replication plugin]
  76. Questions?
      Thank you for listening!
      Happy Clustering :-)