Galera Cluster
Node Recovery
Seppo Jaakola
Codership
Agenda
● Node Recovery Scenarios
● Incremental State Transfer
● State Snapshot Transfer
● Full Cluster recovery

www.codership.com

Node Recovery Scenarios
Node drops from cluster gracefully and joins back
● Replication state is stored in the grastate.dat file
● Joining happens by Incremental State Transfer (IST)

Joining after node crash
● Node has either known or unknown state
● Joining can happen by IST, or a full State Snapshot Transfer (SST) is required

Full Cluster recovery
● e.g. data center power down
● All nodes with known or unknown states
● The node with the latest known state must be identified
● New cluster needs to be bootstrapped

Joining One Node to Cluster
Automatic Node Joining
[Diagram: cluster handshake between the joiner and an existing MySQL node over Galera Replication]

Automatic Node Joining
Cluster selects a donor to help the joiner to join
[Diagram: the donor sends state to the joiner, by IST or SST, over Galera Replication]

Automatic Node Joining
[Diagram: catch up. The joiner, with its slave queue, and the other MySQL nodes on Galera Replication]

Automatic Node Joining
[Diagram: three MySQL nodes connected by Galera Replication]

Incremental State Transfer
Incremental State Transfer
● Every node in Galera Cluster has a log of replicated write sets: gcache
● Gcache is an mmap file; available disk space is the upper limit for size allocation
● If the joining node has past history in the cluster and the donor has a long enough gcache containing the joiner's seqno position, then IST can be used for synchronization

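The IST condition above can be sketched as a small check. This is a simplified model, not Galera's actual implementation: the names `NodeState`, `DonorCache` and `can_use_ist` are hypothetical, and the donor's gcache is assumed to hold one contiguous range of seqnos.

```python
from dataclasses import dataclass

UNDEFINED_SEQNO = -1  # value grastate.dat shows when the seqno is unknown


@dataclass
class NodeState:
    """Joiner position: group (cluster) UUID plus last applied seqno."""
    uuid: str
    seqno: int


@dataclass
class DonorCache:
    """Hypothetical view of a donor: its group UUID and gcache seqno range."""
    uuid: str
    first_seqno: int  # oldest write set still held in gcache
    last_seqno: int   # newest write set held in gcache


def can_use_ist(joiner: NodeState, donor: DonorCache) -> bool:
    """Return True if the donor could serve the joiner from gcache alone."""
    if joiner.uuid != donor.uuid:
        return False  # no shared history with this cluster
    if joiner.seqno == UNDEFINED_SEQNO:
        return False  # joiner position is unknown
    # The joiner needs every write set after its own position, so the
    # gcache must reach back at least to joiner.seqno + 1.
    return donor.first_seqno <= joiner.seqno + 1 and joiner.seqno <= donor.last_seqno
```

When the check fails in this model, the joiner would fall back to a full SST.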
Incremental State Transfer
[Diagram: request to join. The joiner sends its GTID (Group ID:seqno, here seqno-n) from grastate.dat; the donor is at seqno-n+m and still holds seqno-n in its gcache]

Incremental State Transfer
[Diagram: the donor sends IST events from its gcache, from seqno-n up to seqno-n+m, and the joiner applies them]

Incremental State Transfer
● Node synchronization by IST is very effective and is the least intrusive method for the donor
● The gcache.size parameter defines how big a cache will be maintained
● Use database size and write rate to optimize gcache:
  ➢ gcache < database size
  ➢ Write rate defines how long a tail is available in the cache
● If the joiner node had crashed and IST was used to synchronize it back, then it is essential that InnoDB recovery works (innodb_doublewrite)
● If IST is not possible, the donor will switch automatically to the SST method

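The relation between gcache size, write rate and the available tail can be sketched as simple arithmetic. This is a rough estimate under stated assumptions: the write rate is taken as a steady number of replicated write-set bytes per second, gcache bookkeeping overhead is ignored, and both function names are hypothetical.

```python
import math


def gcache_window_seconds(gcache_bytes: int, write_bytes_per_sec: float) -> float:
    """Approximate how long a tail of write sets the gcache can hold."""
    if write_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return gcache_bytes / write_bytes_per_sec


def gcache_bytes_for_window(window_seconds: float, write_bytes_per_sec: float) -> int:
    """Approximate gcache size needed to cover a given outage window."""
    if window_seconds < 0 or write_bytes_per_sec <= 0:
        raise ValueError("window must be non-negative and write rate positive")
    return math.ceil(window_seconds * write_bytes_per_sec)


# Example: a 1 GiB gcache at 1 MiB/s of write sets covers about 1024 seconds.
window = gcache_window_seconds(1024**3, 1024**2)
```

A node that stays away longer than this window would, in this model, no longer find its seqno in the donor's gcache and would need SST.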
State Snapshot Transfer
SST Request
[Diagram: the joiner sends an SST request to another MySQL node over Galera Replication]
● wsrep_sst_method

SST Method
[Diagram: the donor transfers state to the joiner using one of the scripts wsrep_sst_mysqldump, wsrep_sst_rsync or wsrep_sst_xtrabackup]

State Snapshot Transfer
● To send full database state
● wsrep_sst_method to choose the method:
  ➢ mysqldump
  ➢ rsync
  ➢ xtrabackup
● Open API for creating new SST methods
● All SST methods cause at least some service break in the donor node
● If a node has crashed, InnoDB recovery will happen during startup. But with SST the recovered data is replaced by the snapshot, so this InnoDB recovery is more or less useless

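The slides pair each wsrep_sst_method value with a script named wsrep_sst_<method>. A minimal sketch of that naming convention, assuming a new method is added by providing a script with the matching name; the helper `sst_script_name` and its validation rule are hypothetical and not part of Galera.

```python
import re

# Methods named on the slides; other names are treated as custom methods here.
BUILTIN_METHODS = ("mysqldump", "rsync", "xtrabackup")


def sst_script_name(method: str) -> str:
    """Map a wsrep_sst_method value to the script name wsrep_sst_<method>."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", method):
        raise ValueError(f"unexpected characters in SST method name: {method!r}")
    return f"wsrep_sst_{method}"


script_names = [sst_script_name(m) for m in BUILTIN_METHODS]
```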
Full Cluster Recovery
Full Cluster Recovery
All nodes dropped from cluster:
1. Find the node which has latest changes
2. Bootstrap new cluster from the latest node

Node With Latest Changes
Check grastate.dat files:

1. File has valid seqno

# GALERA saved state
version: 2.1
uuid:    5ee99582-bb8d-11e2-b8e3-23de375c1d30
seqno:   8204503945773

● Graceful shutdown
● Find the node which has the biggest seqno

2. No seqno, but group ID is there

# GALERA saved state
version: 2.1
uuid:    5ee99582-bb8d-11e2-b8e3-23de375c1d30
seqno:   -1

● Crash during transaction processing
● Use --wsrep-recover to dig out the last seqno

3. No seqno, no group ID

# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1

● Crash during DDL

http://www.codership.com/wiki/doku.php?id=mysql_galera_restart

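The three grastate.dat cases can be told apart with a short script. This is a sketch only: it assumes one "key: value" pair per line as in the listings on this slide, and the function names and case labels are hypothetical.

```python
NIL_UUID = "00000000-0000-0000-0000-000000000000"


def parse_grastate(text: str) -> dict:
    """Parse 'key: value' lines of a grastate.dat file, skipping comments."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def classify(state: dict) -> str:
    """Return which of the three cases on the slide a parsed file matches."""
    uuid = state.get("uuid", NIL_UUID)
    seqno = int(state.get("seqno", "-1"))
    if uuid == NIL_UUID:
        return "no seqno, no group ID"
    if seqno < 0:
        return "no seqno, group ID present"  # recover it with --wsrep-recover
    return "valid seqno"


def latest_node(states: dict) -> str:
    """Given {node name: parsed state}, return the node with the biggest seqno."""
    valid = {
        name: int(state["seqno"])
        for name, state in states.items()
        if classify(state) == "valid seqno"
    }
    if not valid:
        raise ValueError("no node has a valid seqno; run --wsrep-recover first")
    return max(valid, key=valid.get)
```

Nodes in the second case would first need their seqno recovered before they can take part in the comparison.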
--wsrep-recover
● MySQL stores the last committed GTID in the InnoDB data header, transactionally
● This GTID can be read by starting mysqld with the --wsrep-recover option
● <path to bin>/mysqld --defaults-file=<path to my.cnf> --wsrep-recover
  (--defaults-file must be the first option on the mysqld command line)
● Mysqld will read the InnoDB header files and shut down immediately
● The last wsrep position is printed in the MySQL error log file
130514 18:39:13 [Note] WSREP: Recovered position: 5ee99582-bb8d-11e2-b8e3-23de375c1d30:8204503945771

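The recovered position can be pulled out of the error log with a regular expression. A minimal sketch, assuming the log line has the format shown on this slide; the function name `recovered_position` is hypothetical.

```python
import re

# Matches e.g. "WSREP: Recovered position: <uuid>:<seqno>"
RECOVERED = re.compile(r"WSREP: Recovered position:\s*([0-9a-fA-F-]{36}):(-?\d+)")


def recovered_position(log_text: str):
    """Return (uuid, seqno) from the last 'Recovered position' line, or None."""
    matches = RECOVERED.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]  # the most recent recovery attempt wins
    return uuid, int(seqno)
```

Taking the last match matters when the error log contains output from several recovery runs.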
Bootstrapping New Cluster
● When the latest node has been identified, start this node as the first node in the cluster
  ● service mysql start --wsrep_new_cluster
  ● service mysql start --wsrep_cluster_address=gcomm://
● Start all other nodes. my.cnf should have wsrep_cluster_address pointing to all other nodes
  ● service mysql start
  ● Don't fire up all nodes at once, rather start them one by one

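The start order can be sketched as a small planning helper. It only builds a list of (node, command) pairs using the commands from this slide and does not run anything; the function name `start_plan` is hypothetical, and ordering the remaining nodes by descending seqno is an arbitrary choice made here, not something the slides require.

```python
def start_plan(seqno_by_node: dict) -> list:
    """Order node starts: the most advanced node bootstraps, the rest follow."""
    if not seqno_by_node:
        raise ValueError("at least one node is required")
    ordered = sorted(seqno_by_node, key=seqno_by_node.get, reverse=True)
    first, rest = ordered[0], ordered[1:]
    # The node with the biggest seqno bootstraps the new cluster.
    plan = [(first, "service mysql start --wsrep_new_cluster")]
    # Every other node joins afterwards, one at a time.
    plan += [(node, "service mysql start") for node in rest]
    return plan
```

Each step in the returned plan is meant to be executed only after the previous node has finished joining.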
Questions?

Thank you for listening!
Happy Clustering :-)

Galera Cluster - Node Recovery - Webinar slides
