Galera cluster for MySQL - Introduction Slides

This set of slides gives you an overview of Galera, configuration basics and deployment best practices.

The following topics are covered:
- Concepts
- Node provisioning
- Network partitioning
- Configuration example
- Benchmarks
- Deployment best practices
- Galera monitoring and management

Published in: Technology

Transcript of "Galera cluster for MySQL - Introduction Slides"

  1. Multi-Master Synchronous Replication: Galera Cluster for MySQL
     August 2012
     Alex Yu, VP Products, alex@severalnines.com
     Confidential
  2. Agenda
     - About Severalnines
     - What is Galera Replication?
     - Galera Concepts
     - Node Provisioning
     - Network partitioning/Split brain
     - Configuration Example
     - Benchmarks & Performance Metrics
     - Best Practices
     - Monitoring and Management
     Copyright Severalnines AB
  3. About Us
     - Stockholm, Tokyo and Singapore
     - Database Automation and DBaaS software vendor
     - Over 7,000 deployments to date
     - Commercial product launched Q1 2011
     - Winner Best Startup EuroCloud Europe 2011
     - Launched Europe's first Data Cloud in Nov 2011
     - Press coverage 2011: CIO Magazine, eWeek, PC-World, IDG News, Le Figaro, LeMondeInformatique, heise.de, Computerwelt, silicon.de, etc.
  4. What is Galera Replication?
     - Synchronous (Virtually) Multi-Master Replication
       - Read and Write on any Node
       - No Master Failover! No Slave Lag!
       - Guaranteed write consistency
       - Cluster-wide conflict resolution (certification)
     - Highly Available and Scalable
       - No SPOF
       - Read and Write (Parallel Applier threads) scalability
       - Geographical Replication (Mix MySQL Async & Galera Sync)
     - Cluster (Group Communication Protocol)
       - Automatic Node Provisioning, QoS
     [Diagram: Application / MySQL Server on top of the WSREP API, which sits on the WSREP Provider / wsrep plugin, which handles Replication]
  5. Galera Cluster for MySQL
     - Codership patches for MySQL
       - Binaries and source available at Launchpad
     - InnoDB (& MyISAM experimental)
       - No need to change DB schema/queries
       - Local queries
     - Parallel Replication!
       - Multiple Applier Threads (1-512)
       - Row events, row-level locks
     - Asynchronous Replication
       - In/Out of the cluster
     [Diagram: three clients behind a load balancer (LB); three R/W MySQL [WSREP] nodes joined by Galera Replication (Synchronous)]
  6. Galera Cluster for MySQL cont.
     - Higher probability of "deadlocks"
       - Cluster-wide optimistic locking
       - Locking conflicts detected at commit
       - First to commit succeeds
     - Minimum 3 nodes required
       - "Donor" node blocks writes during full sync of a joining/recovering node
       - The 3rd node is then available for service
       - Gotcha: 2 recovering nodes will block the last node
     - Replication performance depends on
       - Network latency
       - Performance of the "slowest" or the farthest Node (RTT)
       - Number of deployed nodes
     [Diagram: three clients behind a load balancer (LB); three R/W MySQL [WSREP] nodes joined by Galera Replication (Synchronous)]
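
Because conflicts are only detected at commit and the first committer wins, the losing transaction is reported to the client as a deadlock, and the usual response is to retry it. The sketch below illustrates that pattern. `DeadlockError` is a hypothetical stand-in for whatever exception a real driver raises for MySQL error 1213, and `run_with_retry` is an illustrative helper, not part of any library.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver's deadlock error (MySQL error 1213)."""


def run_with_retry(txn, max_attempts=3, backoff_s=0.0):
    """Call txn(); if it raises DeadlockError, retry up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the conflict
            time.sleep(backoff_s * attempt)  # linear backoff between attempts


# Usage: a transaction that loses the commit race once, then succeeds.
calls = {"n": 0}


def flaky_txn():
    calls["n"] += 1
    if calls["n"] == 1:
        raise DeadlockError("lost certification to another node")
    return "committed"


result = run_with_retry(flaky_txn)
```

The whole transaction has to be re-run from BEGIN, since the rollback discards all of its statements.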
  7. Synchronous Replication
     [Diagram: timeline of transaction t1 across three nodes]
     - Node 1: BEGIN, statements, COMMIT (REQ); the write set (WS) is sent as a replication event; certification returns OK or Conflict; COMMIT or Rollback; COMMIT (ACK/returns). The commit response time is the interval between COMMIT (REQ) and COMMIT (ACK).
     - Node 2: receives WS, Certification, Apply event; transaction applied (virtually synchronous)
     - Node 3: receives WS, Certification, Apply event; transaction applied (virtually synchronous)
     - All nodes 100% in sync
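
The certification step in the timeline can be pictured with a toy model. This is a deliberately simplified sketch, not Galera's actual algorithm: it assumes each write set carries the set of row keys it modified plus the position (`last_seen`) of the last write set its node had applied when the transaction ran. The class and field names are hypothetical.

```python
class Certifier:
    """Toy first-committer-wins certification over a totally ordered stream."""

    def __init__(self):
        self.position = 0      # position of the latest write set in the total order
        self.last_write = {}   # row key -> position of the last certified writer

    def certify(self, keys, last_seen):
        """Return True if the write set passes, False on a conflict."""
        self.position += 1
        # Conflict: some key was modified by a write set the sender had not seen.
        if any(self.last_write.get(key, 0) > last_seen for key in keys):
            return False
        for key in keys:
            self.last_write[key] = self.position
        return True


certifier = Certifier()
t1 = certifier.certify({"row:1"}, last_seen=0)  # first writer of row:1 wins
t2 = certifier.certify({"row:1"}, last_seen=0)  # concurrent writer of row:1 loses
t3 = certifier.certify({"row:2"}, last_seen=0)  # different row, no conflict
t4 = certifier.certify({"row:1"}, last_seen=1)  # saw t1's change, so it passes
```

Every node runs the same deterministic check on the same ordered stream, which is why all nodes reach the same OK or Conflict verdict without a second round trip.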
  8. Galera Concepts
     - Application State
       - A set of data that the application decides to replicate
       - Default is all MySQL databases. Every node is a complete replica
       - Application state is identified by a Global Transaction ID
     - Global Transaction ID (GTID)
       - f7720ae0-6f9b-11e1-0800-598d1b386dce:32520198989
       - Format: CLUSTER/HISTORY/STATE UUID:TRX/STATE/SEQNO
       - All replicated transactions can be uniquely referenced in any node
     - Initial state: f7720ae0-6f9b-11e1-0800-598d1b386dce:0
     - Undefined state: 00000000-0000-0000-0000-000000000000:-1
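
The GTID format on this slide splits cleanly into a state UUID and a sequence number. A minimal parsing sketch follows; the `Gtid` type and `parse_gtid` function are illustrative names, not part of Galera.

```python
import uuid
from typing import NamedTuple


class Gtid(NamedTuple):
    state_uuid: uuid.UUID  # cluster/history/state UUID
    seqno: int             # transaction/state sequence number


# The "undefined state" value shown on the slide.
UNDEFINED = Gtid(uuid.UUID(int=0), -1)


def parse_gtid(text):
    """Split 'UUID:SEQNO' at the last colon and convert both parts."""
    uuid_part, _, seqno_part = text.rpartition(":")
    return Gtid(uuid.UUID(uuid_part), int(seqno_part))


current = parse_gtid("f7720ae0-6f9b-11e1-0800-598d1b386dce:32520198989")
initial = parse_gtid("f7720ae0-6f9b-11e1-0800-598d1b386dce:0")
undefined = parse_gtid("00000000-0000-0000-0000-000000000000:-1")
```

Two GTIDs are only comparable by seqno when their UUIDs match, that is, when both belong to the same cluster history.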
  9. Galera Concepts cont.
     - Primary Component - PC
       - The whole cluster is a PC during normal operation
       - Node and network failures split the cluster into several components
     - Only the PC can continue to modify state
     - Quorum algorithm invoked to select a PC during cluster partitioning
       - Majority rules
       - Minority tries to reconnect with the PC
     [Diagram: three MySQL [WSREP] nodes forming the Primary Component]
  10. Galera Concepts cont.
      - State Snapshot Transfer - SST
        - A transfer of a consistent snapshot of a node state corresponding to a certain GTID
        - Initializes the state of a newly joining cluster node from an already initialized node (donor)
      - Incremental State Transfer - IST
        - Catch up with the cluster by replaying missing transactions
        - Known initial node state
        - Enough transactions cached at the donor
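
Read as a decision rule, the two IST conditions above (a known initial state and enough cached transactions at the donor) can be sketched like this. The function and its parameters are hypothetical; in particular `donor_cache_first_seqno` stands for the oldest transaction still held in the donor's cache.

```python
def choose_transfer(joiner_uuid, joiner_seqno, cluster_uuid, donor_cache_first_seqno):
    """Return 'IST' when the joiner can catch up incrementally, else 'SST'."""
    # Unknown or foreign state: a full snapshot is the only option.
    if joiner_uuid != cluster_uuid or joiner_seqno < 0:
        return "SST"
    # The donor must still cache every transaction after joiner_seqno.
    if joiner_seqno + 1 >= donor_cache_first_seqno:
        return "IST"
    return "SST"


CLUSTER = "f7720ae0-6f9b-11e1-0800-598d1b386dce"
UNDEFINED = "00000000-0000-0000-0000-000000000000"

brief_outage = choose_transfer(CLUSTER, 1000, CLUSTER, donor_cache_first_seqno=900)
long_outage = choose_transfer(CLUSTER, 100, CLUSTER, donor_cache_first_seqno=900)
new_node = choose_transfer(UNDEFINED, -1, CLUSTER, donor_cache_first_seqno=900)
```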
  11. Galera Concepts cont.
      - Node Failures
        - A peer crash is indistinguishable from a network failure
        - A node is considered failed when it can no longer be communicated with
      - Node health is verified by receiving messages or keepalives
        - evs.inactive_timeout: sets the timeout after which a node is considered inactive (dead)
        - evs.suspect_timeout: sets the timeout after which the node can be pronounced dead if everyone else agrees
  12. Galera Concepts cont.
      - LAN vs WAN replication
        - No notion of local or remote node
        - Works as long as TCP works
      - May need tuning to be more tolerant of network latency/issues
      - Network params sample
        - evs.keepalive_period = PT3S
        - evs.inactive_check_period = PT10S
        - evs.suspect_timeout = PT30S
        - evs.inactive_timeout = PT1M
        - evs.consensus_timeout = PT1M
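
The sample values use ISO 8601 duration notation (PT3S is three seconds, PT1M is one minute). The sketch below converts them to seconds so they are easier to compare. It handles only the hours/minutes/seconds forms seen here, and the helper name is an assumption.

```python
import re

# Time-only ISO 8601 durations such as PT3S, PT1M or PT1M30S.
_DURATION = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?")


def duration_seconds(text):
    """Convert a PT...H...M...S duration string to seconds."""
    match = _DURATION.fullmatch(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


# The network parameter sample from the slide.
SAMPLE = {
    "evs.keepalive_period": "PT3S",
    "evs.inactive_check_period": "PT10S",
    "evs.suspect_timeout": "PT30S",
    "evs.inactive_timeout": "PT1M",
    "evs.consensus_timeout": "PT1M",
}

seconds = {name: duration_seconds(value) for name, value in SAMPLE.items()}
```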
  13. Node Provisioning
      - Automatic node (re)synchronization
      - A 'donor' is chosen to provision a 'joiner' node
        - 'Donor' node is blocked (write operations) until SST completes
      - State Snapshot Transfer - SST
        - Scriptable interface
        - mysqldump (slow)
        - rsync (fast)
        - Percona Xtrabackup (faster and non-blocking)
  14. Node Provisioning cont.
      [Diagram: three clients behind a load balancer; Node 1 and Node 2, each MySQL [WSREP]]
  15. Node Provisioning cont.
      [Diagram: same cluster; a 'Joiner' Node 3 (MySQL [WSREP]) appears next to Node 1 and Node 2]
  16. Node Provisioning cont.
      [Diagram: 'Joiner' Node 3 starts with wsrep_cluster_address=Node 2 and sends an SST Request; Node 3 runs rsync receive]
  17. Node Provisioning cont.
      [Diagram: Node 2 in 'donor mode' runs rsync send while Node 3 runs rsync receive. Write operations are blocked on Node 2]
  18. Node Provisioning cont.
      [Diagram: Node 3 catches up alongside Node 1 and Node 2]
  19. Network Partitioning/Split Brain
      - Quorum based system
        - "Majority >50%" partition continues operation
        - "Minority" partition blocks operations until reconnected with the Primary Component
      - Use an odd number of nodes
        - Minimum 3 (5, 7, 9 etc)
      - Galera Arbitrator (garbd)
        - Useful if you have an even number of nodes
        - Nodes across DCs
        - Replication relay
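
The majority rule is a strict ">50%" test, which is why an even split leaves no partition able to continue and why an arbitrator helps. A minimal sketch, assuming every member (data node or garbd) carries one vote; the function name is illustrative.

```python
def has_quorum(partition_votes, total_votes):
    """True when the partition holds a strict majority of all votes."""
    return 2 * partition_votes > total_votes


# Three nodes split 2/1: the pair continues, the single node blocks.
majority_side = has_quorum(2, 3)
minority_side = has_quorum(1, 3)

# Two nodes split 1/1: neither side has a majority.
even_split = has_quorum(1, 2)

# Two nodes plus an arbitrator (3 votes): node + garbd keeps quorum.
with_arbitrator = has_quorum(2, 3)
```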
  20. Network Partitioning/Split Brain cont.
      [Diagram: three clients behind a load balancer; three MySQL [WSREP] nodes spread over DC1 and DC2, labelled Primary Component]
  21. Network Partitioning/Split Brain cont.
      [Diagram: same layout over DC1 and DC2, labelled "Primary Component?" and "Block operations until reconnected with PC"]
  22. Network Partitioning/Split Brain cont.
      [Diagram: three MySQL [WSREP] nodes over DC1 and DC2, plus a Galera Arbitrator in DC3]
  23. Network Partitioning/Split Brain cont.
      [Diagram: same layout with the Galera Arbitrator in DC3 acting as Replication Relay]
  24. Network Partitioning/Split Brain cont.
      [Diagram: same layout with the Galera Arbitrator in DC3, labelled "Primary Component?"]
  25. Galera Configuration Example
      [mysqld]
      wsrep_cluster_address=gcomm://  # NOTE: This must be changed to peer address ASAP!
      wsrep_node_address=<node_address>
      wsrep_node_name=node1
      wsrep_provider=/usr/lib64/galera/libgalera_smm.so
      wsrep_provider_options=gcache.size=1G;socket.ssl_key=my_key;socket.ssl_cert=my_cert
      wsrep_slave_threads=16
      wsrep_sst_method=xtrabackup
      wsrep_sst_auth=root:
      innodb_buffer_pool_size=1G
      innodb_log_file_size=256M
      innodb_autoinc_lock_mode=2
      innodb_flush_log_at_trx_commit=0
      innodb_doublewrite=0
      innodb_file_per_table=1
      binlog_format=ROW
      datadir=/var/lib/mysql
      log-bin = mysql-bin
      server-id = 2
      relay-log = mysql-relay-bin
      #read-only = 1
      log-slave-updates = 1
  26. wsrep variables
      - wsrep_provider
        - Path to the wsrep provider library
      - wsrep_cluster_address
        - URI form: gcomm://another_node_address?opt1=val1&opt2=val2
        - gcomm:// has a special meaning: initialize the cluster (never leave it in my.cnf)
      - wsrep_node_address
        - An optional address of the node. A shortcut for configuring listen addresses for replication and state transfers
        - By default it is initialized to the first network interface returned by ifconfig. This could be unreliable.
        - For best results, initialize it explicitly
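
The URI form of wsrep_cluster_address can be assembled and checked with a few lines of code. The sketch below is illustrative only: the helper names are hypothetical, 192.0.2.10 is a documentation-range placeholder address, and opt1/opt2 are the slide's own placeholder option names.

```python
from urllib.parse import urlencode

SCHEME = "gcomm://"


def build_cluster_address(peer="", options=None):
    """Build gcomm://peer?opt1=val1&opt2=val2; an empty peer bootstraps a cluster."""
    address = SCHEME + peer
    if options:
        address += "?" + urlencode(options)
    return address


def is_bootstrap(address):
    """True for the special empty form that initializes a new cluster."""
    return address.split("?", 1)[0] == SCHEME


joining = build_cluster_address("192.0.2.10", {"opt1": "val1", "opt2": "val2"})
bootstrap = build_cluster_address()
```

A check like `is_bootstrap` is handy in deployment scripts to catch an empty gcomm:// left behind in my.cnf after the first node has been started.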
  27. wsrep variables cont.
      - wsrep_node_name
        - An optional name for the node. It is used in logging and to identify the desired donor for state transfer
        - By default it is initialized to the hostname
      - wsrep_provider_options
        - Semicolon-separated list of options specific to the provider
        - Examples:
          - gcache.size: size of the permanent on-disk transaction cache
          - socket.ssl_key, socket.ssl_cert: SSL key and certificate files
  28. wsrep variables cont.
      - wsrep_slave_threads
        - Parallel applying threads (1-512)
        - >1 requires certain InnoDB settings. Applying of STATEMENT-based events is always serialized
      - wsrep_sst_method
        - The base package contains scripts for mysqldump, rsync and xtrabackup based state snapshot transfers. Custom scripts can be used
        - Default is mysqldump
  29. Performance Metrics
      - wsrep_flow_control_paused
        - Fraction of the time replication was paused
      - wsrep_flow_control_sent
        - How many times this node paused replication
      - wsrep_local_recv_queue_avg
        - Average length of the slave trx queue: a sign of a slave-side bottleneck
      - wsrep_cert_deps_distance
        - How many transactions can be applied in parallel
      - wsrep_local_send_queue_avg
        - A sign of a network bottleneck
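
These counters are typically read with SHOW STATUS LIKE 'wsrep_%' and then interpreted. The sketch below shows one way to turn them into findings. The thresholds (`paused_max`, `queue_max`) are arbitrary assumptions chosen for illustration, not recommended values, and the sample numbers are made up.

```python
def diagnose(status, paused_max=0.1, queue_max=1.0):
    """Map wsrep status values (strings, as SHOW STATUS returns them) to findings."""
    findings = []
    if float(status.get("wsrep_flow_control_paused", 0)) > paused_max:
        findings.append("replication is frequently paused by flow control")
    if float(status.get("wsrep_local_recv_queue_avg", 0)) > queue_max:
        findings.append("slave-side (applier) bottleneck")
    if float(status.get("wsrep_local_send_queue_avg", 0)) > queue_max:
        findings.append("network bottleneck")
    return findings


# Made-up sample: the receive queue is long, everything else looks fine.
sample = {
    "wsrep_flow_control_paused": "0.02",
    "wsrep_local_recv_queue_avg": "3.5",
    "wsrep_local_send_queue_avg": "0.0",
}
findings = diagnose(sample)
```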
  30. Number of conflicts/"deadlocks"
      - wsrep_last_committed
        - Last committed transaction
      - wsrep_local_cert_failures, wsrep_local_bf_aborts
        - Rollbacks, conflicts detected
  31. Benchmarks: sysbench, tps
      [Chart] http://codership.com/content/whats-difference-kenneth
  32. Benchmarks: sysbench, latency
      [Chart] http://codership.com/content/whats-difference-kenneth
  33. Benchmarks: Comparing NDB vs Galera
      Note: No optimizations done for the NDB storage engine (neither DB schema nor queries)
      [Chart] http://codership.com/content/whats-difference-kenneth
  34. Benchmarks: Comparing NDB vs Galera
      Note: No optimizations done for the NDB storage engine (neither DB schema nor queries)
      [Chart] http://codership.com/content/whats-difference-kenneth
  35. Best Practices
      - Dedicated switch/network for Galera Nodes (1 GBit min)
      - Connection pools/Load balancing with applications
        - Gives best performance
        - Use static/elastic IPs for the Galera nodes
        - Con: Need to handle node membership changes
        - Con: JDBC/PHP etc. are not aware of Galera-specific Node states
      - Load Balancers
        - Hardware, e.g., IP5
        - SW load balancer
          - HAProxy with Galera-specific health check scripts
          - IP dispatching in the kernel, for example Linux LVS
          - GLB (Galera Load Balancer)
        - Con: Need to set up LB redundancy
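
A Galera-specific health check usually boils down to reading wsrep_local_state and answering the load balancer with an HTTP status. The sketch below shows only that mapping. It is not the Severalnines or HAProxy script; the function name and the `allow_donor` switch are assumptions, while the state codes (2 for Donor/Desynced, 4 for Synced) are the values Galera reports.

```python
DONOR = 2   # wsrep_local_state: Donor/Desynced
SYNCED = 4  # wsrep_local_state: Synced


def health_status(wsrep_local_state, allow_donor=False):
    """Return 200 if the node should receive traffic, 503 otherwise."""
    if wsrep_local_state == SYNCED:
        return 200
    # A donor is blocked for writes during SST, so it is excluded by default.
    if allow_donor and wsrep_local_state == DONOR:
        return 200
    return 503


synced = health_status(4)
donor = health_status(2)
donor_allowed = health_status(2, allow_donor=True)
joining = health_status(1)
```

This addresses the "not aware of Galera-specific Node states" drawback listed above: the balancer stops routing to a node as soon as it leaves the Synced state.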
  36. Best Practices cont.
      - Reference Node
        - Acts as a 'donor' node
        - Backup node
        - No client connections
      [Diagram: three clients behind a load balancer (LB) sending R/W traffic to MySQL [WSREP] nodes; a separate MySQL [WSREP] Donor & Backup Node receives no client traffic]
  37. Best Practices cont.
      - Minimize probability of deadlocks
        - Writes go only to 1 Node
        - Applications use a connection pool or load balancer on read-only nodes
        - Have 1 "reference" Node for write failover and donor
      [Diagram: three clients behind a load balancer (LB); two MySQL [WSREP] nodes serve reads (R), one "Master" Node serves writes (W)]
  38. Galera Limitations
      - MyISAM replication is experimental
        - DDL statements are replicated at statement level
        - Any writes to other table types, including system (mysql.*) tables, are not replicated
        - CREATE USER... is replicated (as DDL), but INSERT INTO mysql.user... will not be replicated
      - Non-deterministic functions like NOW() are not supported
      - Query log cannot be directed to a table
      - LOCK/UNLOCK TABLES cannot be supported in multi-master setups
        - Nor can lock functions (GET_LOCK(), RELEASE_LOCK()...)
      - Maximum allowed transaction size is defined by wsrep_max_ws_rows and wsrep_max_ws_size
      - XA transactions cannot be supported due to possible rollback on commit
  39. Monitoring and Management
  40. ClusterControl
      - Host Monitoring (CPU, RAM, Disk, Network)
      - DB Metrics Monitoring
      - DB Resources Monitoring
      - Cluster-wide Query Analyzer
      - Schema Management
      - Replication Fail-over
      - Clusterware: Process Management and Automated Recovery
      - Manual start/stop of Nodes
      - Configuration Management
      - Performance Management
      - Database Upgrades/Downgrades
      - Online Scaling of MySQL Servers
      - Configurable Resource Thresholds
      - Alarms and Email Notifications
      - Backup Scheduling
      - Real-time Performance Probes
  41. Configurators
  42. Galera Configurator
  43. Galera Configurator cont.
  44. Galera Configurator cont.
  45. Deploy Galera Cluster with HAProxy
      cd ~/s9s-galera-2.10/mysql/scripts/install
      ./deploy.sh 2>&1 | tee -a cc.log
      wget http://severalnines.com/downloads/s9s-haproxy.tar.gz
      tar zxvf s9s-haproxy.tar.gz
      cd haproxy
      ./install-haproxy.sh <lb host> <rhel|debian> galera
      done...
  46. (no text on this slide)
  47. (no text on this slide)
  48. Resources
      - Severalnines MySQL Galera Configurator
        http://www.severalnines.com/resources/configurator
      - Supported platforms (MySQL Galera)
        http://support.severalnines.com/entries/21589522-verified-and-supported-operating-systems
      - Galera limitations
        http://support.severalnines.com/entries/21692388-limitations-in-galera-replication-for-mysql
      - ClusterControl server requirements
        http://support.severalnines.com/entries/20614858-server-requirements-on-premise-amis-other-images
