Taking Full Advantage of Galera Multi-Master
Philip Stoev
Codership Oy
Agenda
• A very quick overview of Galera Cluster
• General principles of multi-master (MM)
• Workloads that are well-suited for MM
• Application considerations for MM
• Configuring, monitoring and troubleshooting multi-master
Galera Cluster Overview
Synchronous
– each transaction is immediately replicated on all nodes at commit
– no stale slaves
Multi-Master
– read from and write to any node
– automatic transaction conflict detection
Replication
– a copy of the entire dataset is available on all nodes
– new nodes can join automatically
For MySQL
– based on a modified version of MySQL (5.5 and 5.6, with 5.7 coming up)
– InnoDB storage engine
And more …
• Recovers from node failures within seconds
• Data consistency protections
– avoids reading stale data
– prevents unsafe data modifications
• Cloud and WAN support
Introduction to Multi-Master
What is Multi-Master
• The ability to issue any transaction on any Galera node
• A core feature of the product, not a clever trick that
happens to work
• Available out of the box
Benefits of Multi-Master
• Operational flexibility
– no need to designate a single node to use exclusively for writes
– simplified configuration for load balancing
– easier handling of scheduled downtime and node failures
• Wide Area Networks
– applications can write to the node that is closest to them
General Principles
• Galera puts consistency first:
– conflicting transactions issued on different nodes will be detected
– the transaction that commits first succeeds; conflicting transactions that attempt to commit after it are rejected
– a transaction can be aborted halfway through if Galera detects that it cannot be completed without a conflict
• Callaghan’s Law
“a given row can’t be modified more than once per RTT” (RTT: the network round-trip time between nodes)
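Callaghan’s Law turns network latency into a hard ceiling on how often any single row can change. A minimal Python sketch of that arithmetic; the two RTT values below are illustrative assumptions, not measurements:

```python
def max_row_updates_per_second(rtt_ms: float) -> float:
    """Upper bound on how many times one row can be modified per second.

    Callaghan's Law: a given row can be modified at most once per round
    trip between the nodes, so the bound is 1 / RTT.
    """
    if rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    return 1000.0 / rtt_ms


# Illustrative RTTs: nodes in one data center vs. a cross-continent WAN link.
for label, rtt in [("LAN, 0.5 ms", 0.5), ("WAN, 100 ms", 100.0)]:
    print(f"{label}: at most {max_row_updates_per_second(rtt):.0f} updates/s per row")
```

On a WAN link the ceiling drops to a handful of updates per second for any one row, which is why hot rows are the first thing to look at in geographically distributed clusters.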
Write Scaling
Does multi-master provide write scaling?
• The updates made by every write transaction need to
be applied on every Galera node
• But none of the following operations are duplicated:
– parser and optimizer overhead
– the effort needed to find and read many records in order to update a few
– execution of triggers
Applications and Workloads
The Multi-Master-Ready Application
• Check whether the application uses transactions or individual autocommit queries
• Suggested application behavior:
– ensure that the application can handle “deadlock” errors both during a transaction and at COMMIT
– the application should be able to retry failed transactions
– transactions requiring absolutely fresh data are identified
– reads and writes can be directed to different servers if needed
• Better logging:
– make sure all database errors are logged to enable analysis
• For legacy applications:
– autocommit statements can be retried by Galera in case of failure
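A minimal Python sketch of the retry behavior suggested above, assuming a PEP 249 (DB-API) connection with autocommit off. Galera reports a conflict to the client as MySQL error 1213 (“Deadlock found when trying to get lock”); how a given driver exposes that code is an assumption noted in the comments, and `run_transaction` and `work` are hypothetical names, not part of any library:

```python
import logging
import time

ER_LOCK_DEADLOCK = 1213  # MySQL error code used to report Galera conflicts

log = logging.getLogger(__name__)


def is_deadlock(exc: BaseException) -> bool:
    """Best-effort check for MySQL error 1213.

    Assumption: the driver exposes the error code either as exc.errno
    (mysql-connector-python) or as exc.args[0] (PyMySQL, mysqlclient).
    """
    code = getattr(exc, "errno", None)
    if code is None and exc.args:
        code = exc.args[0]
    return code == ER_LOCK_DEADLOCK


def run_transaction(conn, work, max_attempts=4, backoff_s=0.05):
    """Run work(cursor) as one transaction, retrying on Galera conflicts.

    The whole transaction, reads included, is re-executed on retry,
    because the rows it read may have changed in the meantime.
    Returns the number of the attempt that committed.
    """
    for attempt in range(1, max_attempts + 1):
        cur = conn.cursor()
        try:
            work(cur)
            conn.commit()  # certification failures are reported here
            return attempt
        except Exception as exc:
            conn.rollback()
            if not is_deadlock(exc) or attempt == max_attempts:
                # Log every database error so failures can be analyzed later.
                log.error("transaction failed on attempt %d: %r", attempt, exc)
                raise
            log.warning("conflict on attempt %d, retrying: %r", attempt, exc)
            time.sleep(backoff_s * attempt)  # simple linear backoff
        finally:
            cur.close()
```

Retrying only on error 1213 keeps genuine bugs such as syntax errors or constraint violations from being silently repeated.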
Suitable Workloads
• A low percentage of effective database updates, i.e. rows actually changed
• Queries or transactions that perform a lot of work or
contain a lot of business logic but eventually update a
smaller set of rows
Typical Example
START TRANSACTION;
SELECT * FROM table1;
SELECT * FROM table2;
SELECT * FROM table3;
...
UPDATE table4 SET total_amount = 42 WHERE pk = 1; # table4 is a placeholder name
COMMIT;
Other Examples
INSERT INTO t1
SELECT COUNT(*) FROM very_large_table;
UPDATE shipments
SET flag = 1
WHERE sender_country = 'Vatican'
AND receiving_state = 'WY';
# Assuming no suitable indexes
Workload Considerations
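• A high percentage of single-row, NoSQL-style updates
• SELECT ... FOR UPDATE statements
– the row locks are taken on the local node only, so the transaction can still be aborted by a conflicting write from another node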
• Frequent operations on “hot” rows:
– job queues or locking schemes implemented in the database
– counters
– generation of sequence numbers
– repeated updates to “last accessed” timestamp records
• Long-running and housekeeping transactions
Autoincrement Handling
• Galera handles AUTO_INCREMENT columns safely
– works even as nodes join or leave the cluster
– gaps in the sequence are expected, so use BIGINT columns
• There is no need for the application to manage
sequence values, reserve ranges, etc.
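Galera’s `wsrep_auto_increment_control` option (on by default) sets `auto_increment_increment` to the cluster size and gives each node its own `auto_increment_offset`, so two nodes never generate the same value. The Python model below is a simplified illustration of where the gaps come from, not InnoDB’s exact algorithm; it assumes node i uses offset i and that each node’s counter moves past rows replicated from the other nodes:

```python
def next_autoinc(current_max: int, offset: int, increment: int) -> int:
    """Smallest value greater than current_max of the form offset + k * increment."""
    if current_max < offset:
        return offset
    k = (current_max - offset) // increment + 1
    return offset + k * increment


def simulate_inserts(writer_nodes, cluster_size):
    """IDs produced when the listed nodes (1-based) each insert one row, in order.

    Simplified model: node i uses auto_increment_offset = i and every node
    uses auto_increment_increment = cluster_size.
    """
    ids, current_max = [], 0
    for node in writer_nodes:
        new_id = next_autoinc(current_max, offset=node, increment=cluster_size)
        ids.append(new_id)
        current_max = new_id  # replicated rows advance every node's counter
    return ids


print(simulate_inserts([1, 1, 2, 3], cluster_size=3))  # [1, 4, 5, 6]
```

In this model the values 2 and 3 are skipped: unique values across nodes are bought with gaps, which is why a BIGINT column is the safer choice.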
Read-Write Splitting
• If there are conflicts due to heavy contention on rows,
the application can direct writes to those rows alone to a
single node
• With a TCP load balancer, expose two ports: one that is balanced across all nodes and one that always directs traffic to a single node
• Consider a query-aware proxy such as MaxScale
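When no query-aware proxy is in place, the routing rule can live in the application. A minimal Python sketch, assuming a load balancer that exposes one balanced port and one single-node port; the host name, port numbers and table names are all hypothetical:

```python
# Hypothetical endpoints exposed by a TCP load balancer:
# one port balanced across all nodes, one port pinned to a single node.
BALANCED = ("galera-lb.example", 3306)
SINGLE_NODE = ("galera-lb.example", 3307)

# Hypothetical tables known to suffer from heavy write contention.
HOT_TABLES = frozenset({"job_queue", "counters"})


def pick_endpoint(tables_written):
    """Send transactions that write hot tables to the single-node port.

    Read-only work and writes to other tables can go to any node.
    """
    if HOT_TABLES.intersection(tables_written):
        return SINGLE_NODE
    return BALANCED
```

Keeping the hot-table list explicit makes it easy to shrink as conflicts are designed away.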
Configuring Galera for MM
Galera Variables
• Galera is multi-master by default
– any node can accept any query out of the box
Useful options:
• wsrep_retry_autocommit
– retries autocommit statements that failed because of a conflict
• wsrep_sync_wait
– ensures that reads see up-to-date data
• wsrep_log_conflicts
– prints information about conflicts to the server error log
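These options are dynamic, so they can be changed at runtime. A minimal Python sketch, assuming a DB-API cursor on an account allowed to run SET GLOBAL; the retry count of 4 is illustrative, and `apply_mm_options` and `show_mm_options` are hypothetical helper names:

```python
def apply_mm_options(cursor, retry_autocommit=4, log_conflicts=True):
    """Set the multi-master options on the node the cursor is connected to.

    The retry count of 4 is an illustrative value, not a recommendation.
    """
    cursor.execute("SET GLOBAL wsrep_retry_autocommit = %s", (int(retry_autocommit),))
    cursor.execute("SET GLOBAL wsrep_log_conflicts = %s", (1 if log_conflicts else 0,))


def show_mm_options(cursor):
    """Return the current values as a {Variable_name: Value} dict."""
    cursor.execute(
        "SHOW GLOBAL VARIABLES WHERE Variable_name IN "
        "('wsrep_retry_autocommit', 'wsrep_sync_wait', 'wsrep_log_conflicts')"
    )
    return dict(cursor.fetchall())
```

SET GLOBAL is not replicated to the other nodes and is lost on restart, so run it on each node and mirror the values in each node’s configuration file.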
Retrying Autocommit Transactions
• Autocommit transactions are those that contain only a
single SQL statement, even if it updates multiple rows
• A higher value of wsrep_retry_autocommit helps most such transactions complete successfully
• The default is 1, so a failed statement is retried once
• SQL statements that update many rows may not be
successful even if retried multiple times
Sync Waiting
• With Galera, some small slave lag (a few transactions) is
allowed for performance reasons
• If a transaction must see the most up-to-date data in the cluster, set wsrep_sync_wait
• Can be set on a session basis, as needed (do not forget to
reset the variable at the end of the critical block)
• Makes sure the data is up to date as of the start of the
transaction
• Sync waiting is a property of the transaction that requires fresh data, not of the transaction that wrote the data
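The “set it, then reset it” pattern is easy to get wrong when the critical block raises an exception. A minimal Python sketch, assuming a DB-API cursor; it defaults to value 1, which enables the wait for reads (SELECT, SHOW and BEGIN/START TRANSACTION), and `sync_wait` is a hypothetical helper name:

```python
from contextlib import contextmanager


@contextmanager
def sync_wait(cursor, level=1):
    """Enable wsrep_sync_wait for the current session inside a with-block.

    The previous session value is restored on exit, even if the block raises.
    """
    cursor.execute("SELECT @@SESSION.wsrep_sync_wait")
    (previous,) = cursor.fetchone()
    cursor.execute("SET SESSION wsrep_sync_wait = %s", (level,))
    try:
        yield cursor
    finally:
        cursor.execute("SET SESSION wsrep_sync_wait = %s", (previous,))
```

Only the statements inside the `with` block pay the extra latency of waiting for the node to catch up with the cluster; the rest of the session keeps the cheaper default.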
Dealing with Conflicts
Monitoring Conflicts
• wsrep_local_bf_aborts
– number of transactions that were aborted because a conflicting
transaction has already been committed locally
– this type of abort can happen even prior to COMMIT, to avoid
performing unnecessary work that is doomed to fail
• wsrep_local_cert_failures
– number of transactions that failed certification at COMMIT time because they conflicted with another transaction that was still “in flight”
Use the sum of the two counters as the total number of conflicts.
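Both counters are cumulative since the node started and are kept per node, so what matters is how much they grow between samples, on every node. A minimal Python sketch, assuming a DB-API cursor that returns rows as (Variable_name, Value) tuples; the function names are hypothetical:

```python
CONFLICT_COUNTERS = ("wsrep_local_bf_aborts", "wsrep_local_cert_failures")


def read_conflict_counters(cursor):
    """Sample the two conflict counters from SHOW GLOBAL STATUS on one node."""
    cursor.execute("SHOW GLOBAL STATUS LIKE 'wsrep_local_%'")
    values = dict(cursor.fetchall())
    return {name: int(values[name]) for name in CONFLICT_COUNTERS}


def conflicts_between(before, after):
    """Total conflicts on this node between two samples (sum of both counters)."""
    return sum(after[name] - before[name] for name in CONFLICT_COUNTERS)
```

Sampling at a fixed interval and graphing `conflicts_between` per node shows both the conflict rate and which node’s clients lose most often.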
Debugging Conflicts
• Enable logging on the application side:
– to provide context information on the failing query (e.g. function name, line number, schema name)
• Ensure that system time is synchronized across the
cluster and with the application servers
• Enable the wsrep_log_conflicts variable
• Enable binary logging to obtain information on the
winning transaction (see http://goo.gl/Tw5JLn)
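Application-side context is most useful when each log line can be matched against the server error log by timestamp. A minimal sketch using Python’s standard `logging` module: `funcName` and `lineno` identify the calling code, and a `LoggerAdapter` tags every record with the schema name. The logger names and the format string are illustrative choices:

```python
import logging

LOG_FORMAT = ("%(asctime)s %(levelname)s %(funcName)s:%(lineno)d "
              "schema=%(schema)s %(message)s")


def make_db_logger(schema, stream=None):
    """Return a logger whose records carry the schema name.

    asctime lets entries be lined up with wsrep_log_conflicts output,
    provided the clocks of application and database servers are in sync.
    """
    logger = logging.getLogger(f"app.db.{schema}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        handler = logging.StreamHandler(stream)  # None means sys.stderr
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logging.LoggerAdapter(logger, {"schema": schema})
```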
Log Output
*** Victim TRANSACTION:
TRANSACTION 1374, ACTIVE 23 sec starting index read
mysql tables in use 1, locked 1
4833 lock struct(s), heap size 554536, 1004832 row lock(s), undo log entries 934296
MySQL thread id 5, OS thread handle 0x7fbbb4601700, query id 50
localhost ::1 root updating
update t1 set f2 = 'problematic_key_value21'
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 8 page no 4 n bits 280 index `PRIMARY`
of table `test`.`t1` trx id 1374 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 4; hex 80000001; asc ;; # Unsigned integer value of PK
1: len 6; hex 00000000055e; asc ^;;
2: len 7; hex 39000021fd0110; asc 9 ! ;;
3: len 30; hex 70726f626c656d617469635f6b65795f76616c7565323120202020202020; asc
problematic_key_value21 ; (total 50 bytes);
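The hex columns in this dump can be decoded by hand. A minimal Python sketch, assuming the primary key of `t1` is a signed INT and `f2` is a CHAR(30) column. In a clustered index record, field 0 is the primary key, fields 1 and 2 are InnoDB’s hidden DB_TRX_ID and DB_ROLL_PTR columns, and field 3 is `f2`; InnoDB flips the sign bit of signed integers so that they sort correctly byte by byte:

```python
def innodb_int(hex_field: str, signed: bool = True) -> int:
    """Decode a big-endian InnoDB integer field as printed in the lock dump."""
    raw = int(hex_field, 16)
    if not signed:
        return raw  # unsigned integers are stored as is
    bits = len(hex_field) * 4
    value = raw ^ (1 << (bits - 1))  # undo InnoDB's sign-bit flip
    if value >= 1 << (bits - 1):     # interpret as two's complement
        value -= 1 << bits
    return value


def innodb_char(hex_field: str) -> str:
    """Decode a fixed-length CHAR field, dropping the trailing space padding."""
    return bytes.fromhex(hex_field).decode("utf-8").rstrip(" ")


print(innodb_int("80000001"))                    # field 0: primary key
print(innodb_int("00000000055e", signed=False))  # field 1: DB_TRX_ID
print(innodb_char("70726f626c656d617469635f6b65795f76616c7565323120202020202020"))  # field 3: f2
```

Decoded this way, the primary key is 1 and DB_TRX_ID is 1374, the same number as `trx id 1374` in the lock line above.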
Avoiding Conflicts
For hot records:
• break down a “hot record” into multiple rows
• replace repeated updates with inserts of new records into a log table
For long-running transactions:
• split housekeeping work into smaller units
Or:
• Send conflicting writes to a single node
– non-conflicting transactions can still be directed to any node
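A minimal Python sketch of two of these techniques, assuming DB-API cursors with `%s` placeholders (as in PyMySQL, mysqlclient and mysql-connector-python). The `counters` and `access_log` tables, their columns, and the slot count are hypothetical:

```python
import random

COUNTER_SLOTS = 8  # hypothetical: rows one logical counter is split into

# Hypothetical schema, with COUNTER_SLOTS rows pre-created per counter:
#   CREATE TABLE counters (name VARCHAR(64), slot TINYINT, cnt BIGINT,
#                          PRIMARY KEY (name, slot));


def increment_counter(cursor, name, rng=random):
    """Add 1 to a randomly chosen slot instead of one hot row."""
    slot = rng.randrange(COUNTER_SLOTS)
    cursor.execute(
        "UPDATE counters SET cnt = cnt + 1 WHERE name = %s AND slot = %s",
        (name, slot),
    )
    return slot


def read_counter(cursor, name):
    """The logical counter value is the sum over all of its slots."""
    cursor.execute("SELECT SUM(cnt) FROM counters WHERE name = %s", (name,))
    return cursor.fetchone()[0]


def purge_in_batches(conn, cutoff, batch_size=1000):
    """Housekeeping split into many small transactions instead of one large one.

    Each batch commits on its own, so a conflict only costs one small batch.
    """
    total = 0
    while True:
        cur = conn.cursor()
        cur.execute(
            "DELETE FROM access_log WHERE created_at < %s LIMIT %s",
            (cutoff, batch_size),
        )
        deleted = cur.rowcount
        cur.close()
        conn.commit()
        total += deleted
        if deleted < batch_size:
            return total
```

With 8 uniformly chosen slots, two concurrent increments of the same counter hit the same row only about one time in eight, and the batched purge bounds how much work a single conflict can throw away.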
Questions
• Please use the Question/Chat box in the GoToWebinar
panel
Thank You
http://www.galeracluster.com
Discussion group:
codership-team@googlegroups.com
