Galera cluster for MySQL - Introduction Slides

This set of slides gives you an overview of Galera, configuration basics and deployment best practices.

The following topics are covered:
- Concepts
- Node provisioning
- Network partitioning
- Configuration example
- Benchmarks
- Deployment best practices
- Galera monitoring and management

Published in: Technology

Transcript

  • 1. Multi-Master Synchronous Replication: Galera Cluster for MySQL. August 2012. Alex Yu, VP Products, alex@severalnines.com. Confidential.
  • 2. Agenda
      - About Severalnines
      - What is Galera Replication?
      - Galera Concepts
      - Node Provisioning
      - Network partitioning/Split brain
      - Configuration Example
      - Benchmarks & Performance Metrics
      - Best Practices
      - Monitoring and Management
      Copyright Severalnines AB
  • 3. About Us
      - Stockholm, Tokyo and Singapore
      - Database Automation and DBaaS software vendor
      - Over 7,000 deployments to date
      - Commercial product launched Q1 2011
      - Winner Best Startup EuroCloud Europe 2011
      - Launched Europe’s first Data Cloud in Nov 2011
      - Press coverage 2011: CIO Magazine, eWeek, PC-World, IDG News, Le Figaro, LeMondeInformatique, heise.de, Computerwelt, silicon.de, etc.
  • 4. What is Galera Replication?
      - Synchronous (Virtually) Multi-Master Replication
          - Read and Write on any Node
          - No Master Failover! No Slave Lag!
          - Guaranteed write consistency
          - Cluster wide conflicts resolution (certification)
      - Highly Available and Scalable
          - No SPOF
          - Read and Write (Parallel Applier threads) scalability
          - Geographical Replication (Mix MySQL Async & Galera Sync)
          - Cluster (Group Communication Protocol)
          - Automatic Node Provisioning, QoS
      [Diagram labels: Application, MySQL Server, WSREP API, WSREP Provider / wsrep plugin, Replication]
  • 5. Galera Cluster for MySQL
      - Codership patches for MySQL
          - Binaries and source available at launchpad
      - InnoDB (& MyISAM experimental)
          - No need to change DB schema/queries
          - Local queries
      - Parallel Replication!
          - Multiple Applier Threads (1-512)
          - Row events, row level locks
      - Asynchronous Replication
          - In/Out of the cluster
      [Diagram: three clients behind a load balancer (LB), R/W to three MySQL [WSREP] nodes joined by Galera Replication (Synchronous)]
  • 6. Galera Cluster for MySQL cont.
      - Higher probability for “deadlocks”
          - Cluster wide optimistic locking
          - Locking conflicts detected at commit
          - First to commit succeeds
      - Minimum 3 nodes required
          - “Donor” node blocks writes during full sync of joining/recovering node
          - 3rd node is then available for service
          - Gotcha: 2 recovering nodes will block the last node
      - Replication performance depends on
          - Network latency
          - Performance of the “slowest” or the farthest Node (RTT)
          - Number of deployed nodes
      [Diagram: three clients behind a load balancer (LB), R/W to three MySQL [WSREP] nodes joined by Galera Replication (Synchronous)]
  • 7. Synchronous Replication
      [Diagram: timelines for Node 1, Node 2 and Node 3. On Node 1, transaction t1 runs BEGIN, its statements, then COMMIT (REQ). The write set (WS) is sent as a replication event to Node 2 and Node 3, where it passes certification and an apply event ("Transaction applied (virtually synchronous)"). Node 1 receives OK or Conflict and answers with COMMIT (ACK/returns) or Rollback. The commit response time spans COMMIT (REQ) to COMMIT (ACK). Caption: All nodes 100% sync.]
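The "first to commit succeeds" rule and the certification step on the timeline above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Galera's actual certification code: the `WriteSet` and `Certifier` names, the per-key bookkeeping and the integer sequence numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteSet:
    trx_id: str
    start_seqno: int   # last committed seqno the transaction saw when it began
    keys: frozenset    # row keys the transaction modified


class Certifier:
    """Toy certification: a write set fails if any of its keys was
    committed by another transaction after this one started."""

    def __init__(self):
        self.seqno = 0
        self.last_writer = {}  # row key -> seqno of the last certified write

    def certify(self, ws):
        for key in ws.keys:
            if self.last_writer.get(key, 0) > ws.start_seqno:
                return False  # conflict: the originating node rolls back
        self.seqno += 1
        for key in ws.keys:
            self.last_writer[key] = self.seqno
        return True  # OK: every node applies the write set


certifier = Certifier()
t1 = WriteSet("t1", start_seqno=0, keys=frozenset({"row1"}))
t2 = WriteSet("t2", start_seqno=0, keys=frozenset({"row1"}))
first = certifier.certify(t1)   # reaches commit first, succeeds
second = certifier.certify(t2)  # touched the same row concurrently, fails
```

Because every node certifies the same write sets in the same order, each node reaches the same verdict without extra coordination; the losing transaction shows up to the application as a deadlock-style error.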
  • 8. Galera Concepts
      - Application State
          - A set of data that the application decides to replicate
          - Default is all MySQL databases. Every node is a complete replica
          - Application state is identified by a Global Transaction ID
      - Global Transaction ID (GTID)
          - f7720ae0-6f9b-11e1-0800-598d1b386dce:32520198989
          - CLUSTER/HISTORY/STATE UUID:TRX/STATE/SEQNO
          - All replicated transactions can be uniquely referenced in any node
      - Initial state: f7720ae0-6f9b-11e1-0800-598d1b386dce:0
      - Undefined state: 00000000-0000-0000-0000-000000000000:-1
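As an illustration of the UUID:SEQNO layout, here is a minimal sketch that splits a GTID string into its two parts. The `Gtid` type and `parse_gtid` helper are hypothetical names introduced for this example; only the string format comes from the slide.

```python
import uuid
from typing import NamedTuple


class Gtid(NamedTuple):
    state_uuid: uuid.UUID  # identifies the cluster/history
    seqno: int             # position of the transaction in that history


# The "undefined state" value quoted on the slide.
UNDEFINED = Gtid(uuid.UUID(int=0), -1)


def parse_gtid(text):
    """Split 'UUID:SEQNO' on the last colon and convert both parts."""
    uuid_part, _, seqno_part = text.rpartition(":")
    return Gtid(uuid.UUID(uuid_part), int(seqno_part))


gtid = parse_gtid("f7720ae0-6f9b-11e1-0800-598d1b386dce:32520198989")
undefined = parse_gtid("00000000-0000-0000-0000-000000000000:-1")
```

Two GTIDs are only comparable by sequence number when their UUID parts match, which is why the UUID is kept as a separate field.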
  • 9. Galera Concepts cont.
      - Primary Component - PC
          - The whole cluster is a PC during normal operation
      - Node and network failures
          - Split the cluster into several components
      - Only the PC can continue to modify state
      - Quorum algorithm invoked to select a PC during cluster partitioning
          - Majority rules
          - Minority tries to reconnect with PC
      [Diagram: three MySQL [WSREP] nodes forming the Primary Component]
  • 10. Galera Concepts cont.
      - State Snapshot Transfer - SST
          - A transfer of a consistent snapshot of a node state corresponding to a certain GTID
          - Initializes the state of a newly joining cluster node from an already initialized node (donor)
      - Incremental State Transfer - IST
          - Catch up with the cluster by replaying missing transactions
          - Requires a known initial node state
          - Requires enough transactions cached at the donor
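The two IST preconditions (known initial state, enough transactions cached at the donor) can be written as a small decision function. This is a sketch of the rule as stated on the slide, not Galera's implementation; the function name and its parameters are assumptions made for illustration.

```python
def choose_state_transfer(joiner_uuid, joiner_seqno, donor_uuid,
                          donor_cache_first_seqno):
    """Return "IST" if the joiner can catch up by replay, else "SST".

    joiner_seqno is the last transaction the joiner has applied (-1 if
    undefined); donor_cache_first_seqno is the oldest transaction still
    held in the donor's cache. Both names are hypothetical.
    """
    # Precondition 1: the joiner's state is known and belongs to this history.
    known_state = joiner_uuid == donor_uuid and joiner_seqno >= 0
    # Precondition 2: the donor still caches the first missing transaction.
    cached = known_state and donor_cache_first_seqno <= joiner_seqno + 1
    return "IST" if cached else "SST"


cluster = "f7720ae0-6f9b-11e1-0800-598d1b386dce"
short_outage = choose_state_transfer(cluster, 100, cluster, 90)
long_outage = choose_state_transfer(cluster, 100, cluster, 150)
new_node = choose_state_transfer(
    "00000000-0000-0000-0000-000000000000", -1, cluster, 90)
```

A larger donor cache widens the window in which a returning node can use IST instead of a full snapshot.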
  • 11. Galera Concepts cont.
      - Node Failures
          - A peer crash is indistinguishable from network failure
          - A node is considered failed when it can no longer be communicated with
      - Node health verified by receiving messages or keepalives
          - evs.inactive_timeout: sets the timeout after which a node is considered inactive (dead)
          - evs.suspect_timeout: sets the timeout after which the node can be pronounced dead if everyone else agrees
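The interplay of the two timeouts can be sketched as a classification of how long a peer has been silent. This is a simplified reading of the slide, assuming silence is measured in seconds since the last message or keepalive; `classify_peer` and its arguments are hypothetical and do not mirror the provider's internals.

```python
def classify_peer(silence_s, suspect_timeout_s, inactive_timeout_s,
                  others_agree):
    """Classify a peer from the time since its last message or keepalive."""
    if silence_s >= inactive_timeout_s:
        return "dead"  # hard limit reached, no agreement needed
    if silence_s >= suspect_timeout_s:
        # Between the two timeouts the peer is only pronounced dead
        # when every other node also suspects it.
        return "dead" if others_agree else "suspected"
    return "alive"


# Using 30 s and 60 s as example values for suspect and inactive timeouts.
states = [
    classify_peer(10, 30, 60, others_agree=False),
    classify_peer(40, 30, 60, others_agree=False),
    classify_peer(40, 30, 60, others_agree=True),
    classify_peer(61, 30, 60, others_agree=False),
]
```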
  • 12. Galera Concepts cont.
      - LAN vs WAN replication
          - No notion of local or remote node
          - Works as long as TCP works
      - May need tuning to be more tolerant of network latency/issues
      - Network params sample
          - evs.keepalive_period = PT3S
          - evs.inactive_check_period = PT10S
          - evs.suspect_timeout = PT30S
          - evs.inactive_timeout = PT1M
          - evs.consensus_timeout = PT1M
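The sample values use ISO 8601 duration notation (PT3S is three seconds, PT1M is one minute). A minimal sketch for converting such values to seconds when comparing or reviewing settings follows; `duration_seconds` is a hypothetical helper and only handles the hours/minutes/seconds subset seen here.

```python
import re

# Only the time part of ISO 8601 durations: PT[nH][nM][nS]
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def duration_seconds(text):
    """Convert a duration such as 'PT30S' or 'PT1M' to seconds."""
    match = _DURATION.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


sample = {
    "evs.keepalive_period": "PT3S",
    "evs.inactive_check_period": "PT10S",
    "evs.suspect_timeout": "PT30S",
    "evs.inactive_timeout": "PT1M",
    "evs.consensus_timeout": "PT1M",
}
in_seconds = {name: duration_seconds(value) for name, value in sample.items()}
```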
  • 13. Node Provisioning
      - Automatic node (re)synchronization
      - A ‘donor’ is chosen to provision a ‘joiner’ node
          - ‘Donor’ node is blocked (write operations) until SST completes
      - State Snapshot Transfer - SST
          - Scriptable interface
          - mysqldump (slow)
          - rsync (fast)
          - Percona Xtrabackup (faster and non-blocking)
  • 14. Node Provisioning cont. [Diagram: three clients connect through a load balancer to Node 1 and Node 2, both MySQL [WSREP].]
  • 15. Node Provisioning cont. [Diagram: same cluster, with ‘Joiner’ Node 3 (MySQL [WSREP]) added next to Node 1 and Node 2.]
  • 16. Node Provisioning cont. [Diagram: ‘Joiner’ Node 3, configured with wsrep_cluster_address=Node 2, sends an SST Request and starts rsync receive.]
  • 17. Node Provisioning cont. [Diagram: Node 2 runs rsync send while ‘Joiner’ Node 3 runs rsync receive. Node 2 is in ‘donor mode’; write operations are blocked.]
  • 18. Node Provisioning cont. [Diagram: Node 3 catches up and serves alongside Node 1 and Node 2 behind the load balancer.]
  • 19. Network Partitioning/Split Brain
      - Quorum based system
          - “Majority >50%” partition continues operation
          - “Minority” partition blocks operations until reconnected with Primary Component
      - Use odd number of nodes
          - Minimum 3 (5, 7, 9 etc)
      - Galera Arbitrator (garbd)
          - Useful if you have an even number of nodes
          - Nodes across DCs
          - Replication relay
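The ">50%" rule and the reason for an odd member count can be shown with a few lines of arithmetic. This sketch assumes every member, including a garbd arbitrator, counts as one equal vote; `has_quorum` is a hypothetical helper, not a Galera API.

```python
def has_quorum(partition_size, cluster_size):
    """True if the partition holds a strict majority (>50%) of members."""
    return 2 * partition_size > cluster_size


# Three nodes split 2 / 1: the pair continues, the single node blocks.
three_node_majority = has_quorum(2, 3)
three_node_minority = has_quorum(1, 3)

# Two nodes split 1 / 1: neither side has a majority, so both block.
two_node_split = has_quorum(1, 2)

# Two nodes plus an arbitrator make three members; the node that can
# still reach the arbitrator holds 2 of 3 votes and keeps running.
node_with_arbitrator = has_quorum(1 + 1, 2 + 1)
```

With an even member count an even split leaves no side above 50%, which is the case the arbitrator is meant to break.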
  • 20. Network Partitioning/Split Brain cont. [Diagram: clients and a load balancer; MySQL [WSREP] nodes spread across DC1 and DC2 form the Primary Component.]
  • 21. Network Partitioning/Split Brain cont. [Diagram: MySQL [WSREP] nodes in DC1 and DC2, labelled "Primary Component?" and "Block operations until reconnected with PC".]
  • 22. Network Partitioning/Split Brain cont. [Diagram: MySQL [WSREP] nodes in DC1 and DC2, with a Galera Arbitrator in DC3.]
  • 23. Network Partitioning/Split Brain cont. [Diagram: MySQL [WSREP] nodes in DC1 and DC2, with a Galera Arbitrator in DC3 labelled "Replication Relay".]
  • 24. Network Partitioning/Split Brain cont. [Diagram: MySQL [WSREP] nodes in DC1 and DC2, with a Galera Arbitrator in DC3, labelled "Primary Component?".]
  • 25. Galera Configuration Example
      [mysqld]
      wsrep_cluster_address=gcomm:// # NOTE: This must be changed to peer address ASAP!
      wsrep_node_name=node1
      wsrep_provider=/usr/lib64/galera/libgalera_smm.so
      wsrep_provider_options=gcache.size=1G;socket.ssl_key=my_key;socket.ssl_cert=my_cert
      wsrep_slave_threads=16
      wsrep_sst_method=xtrabackup
      wsrep_sst_auth=root:
      innodb_buffer_pool_size=1G
      innodb_log_file_size=256M
      innodb_autoinc_lock_mode=2
      innodb_flush_log_at_trx_commit=0
      innodb_doublewrite=0
      innodb_file_per_table=1
      binlog_format=ROW
      datadir=/var/lib/mysql
      log-bin = mysql-bin
      server-id = 2
      relay-log = mysql-relay-bin
      #read-only = 1
      log-slave-updates = 1
  • 26. wsrep variables
      - wsrep_provider
          - Path to wsrep provider library
      - wsrep_cluster_address
          - URI form: gcomm://another_node_address?opt1=val1&opt2=val2
          - gcomm:// has a special meaning: initialize the cluster (never leave it in my.cnf)
      - wsrep_node_address
          - An optional address of the node. A shortcut to configure listen addresses for replication and state transfers
          - By default it is initialized to the first network interface returned by ifconfig. This could be unreliable.
          - For best results initialize it explicitly
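A quick way to catch a bootstrap address left behind in my.cnf is to parse the URI and check whether it names a peer. The sketch below uses Python's standard `urllib.parse`; the `parse_cluster_address` helper and its return shape are assumptions for illustration, and `opt1`/`opt2` are the placeholder option names from the slide.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_cluster_address(uri):
    """Return (peer, options, bootstraps) for a gcomm:// address.

    bootstraps is True for the bare 'gcomm://' form, which initializes
    a new cluster instead of joining an existing one.
    """
    parts = urlsplit(uri)
    if parts.scheme != "gcomm":
        raise ValueError(f"not a gcomm URI: {uri!r}")
    peer = parts.netloc or None
    options = dict(parse_qsl(parts.query))
    return peer, options, peer is None


joining = parse_cluster_address(
    "gcomm://another_node_address?opt1=val1&opt2=val2")
bootstrap = parse_cluster_address("gcomm://")
```

A deployment script could refuse to write a configuration file whose third element is True once the first node is up.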
  • 27. wsrep variables cont.
      - wsrep_node_name
          - An optional name for the node. It is used in logging and to identify the desired donor for state transfer
          - By default it is initialized to the hostname
      - wsrep_provider_options
          - Semicolon-separated list of options specific to provider
          - Examples:
              - gcache.size: size of the permanent transaction on-disk cache
              - socket.ssl_key, socket.ssl_cert: SSL key and certificate files
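Since wsrep_provider_options packs several settings into one semicolon-separated string, it can help to split it into a mapping when reviewing a configuration. This is a minimal sketch; `parse_provider_options` is a hypothetical helper and the sample string is the one from the configuration example in this deck.

```python
def parse_provider_options(text):
    """Split 'key=value;key=value' into a dict, ignoring empty items."""
    options = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in option: {item!r}")
        options[key.strip()] = value.strip()
    return options


options = parse_provider_options(
    "gcache.size=1G;socket.ssl_key=my_key;socket.ssl_cert=my_cert")
```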
  • 28. wsrep variables cont.
      - wsrep_slave_threads
          - Parallel applying threads (1-512)
          - >1 requires certain InnoDB settings. Applying of STATEMENT-based events is always serialized
      - wsrep_sst_method
          - Base package contains scripts for mysqldump, rsync and xtrabackup based state snapshot transfers. Own scripts can be used
          - Default is mysqldump
  • 29. Performance Metrics
      - wsrep_flow_control_paused
          - Fraction of the time replication was paused
      - wsrep_flow_control_sent
          - How many times this node paused replication
      - wsrep_local_recv_queue_avg
          - Average length of slave trx queue; a sign of slave side bottleneck
      - wsrep_cert_deps_distance
          - How many transactions can be applied in parallel
      - wsrep_local_send_queue_avg
          - A sign of network bottleneck
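These status variables lend themselves to a simple health check. The sketch below takes a mapping of status names to values (for example collected from SHOW STATUS LIKE 'wsrep_%') and flags the bottlenecks the slide associates with each counter. The `diagnose` function and both threshold defaults are hypothetical illustrations, not recommended limits.

```python
def diagnose(status, paused_limit=0.1, queue_limit=1.0):
    """Return a list of findings for a dict of wsrep status values."""
    findings = []
    if float(status.get("wsrep_flow_control_paused", 0.0)) > paused_limit:
        findings.append("replication paused too often (flow control)")
    if float(status.get("wsrep_local_recv_queue_avg", 0.0)) > queue_limit:
        findings.append("slave-side bottleneck (receive queue)")
    if float(status.get("wsrep_local_send_queue_avg", 0.0)) > queue_limit:
        findings.append("network bottleneck (send queue)")
    return findings


# Status values arrive as strings from the server, hence the float() calls.
findings = diagnose({
    "wsrep_flow_control_paused": "0.25",
    "wsrep_local_recv_queue_avg": "0.0",
    "wsrep_local_send_queue_avg": "3.2",
})
```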
  • 30. Number of conflicts/“deadlocks”
      - wsrep_last_committed
          - Last committed transaction
      - wsrep_local_cert_failures, wsrep_local_bf_aborts
          - Rollbacks, conflicts detected
  • 31. Benchmarks: sysbench, tps [Chart] Source: http://codership.com/content/whats-difference-kenneth
  • 32. Benchmarks: sysbench, latency [Chart] Source: http://codership.com/content/whats-difference-kenneth
  • 33. Benchmarks: Comparing NDB vs Galera [Chart] Note: No optimizations done for the NDB storage engine (neither DB schema nor queries). Source: http://codership.com/content/whats-difference-kenneth
  • 34. Benchmarks: Comparing NDB vs Galera [Chart] Note: No optimizations done for the NDB storage engine (neither DB schema nor queries). Source: http://codership.com/content/whats-difference-kenneth
  • 35. Best Practices
      - Dedicated switch/network for Galera Nodes (1 GBit min)
      - Connection pools/Load balancing with applications
          - Gives best performance
          - Use static/elastic IPs for the Galera nodes
          - Con: Need to handle node membership changes
          - Con: JDBC/PHP etc are not aware of Galera specific Node states
      - Load Balancers
          - Hardware, e.g., IP5
          - SW load balancer
              - HAProxy with Galera specific health check scripts
              - IP dispatching in the kernel, for example Linux LVS
              - GLB (Galera Load Balancer)
          - Con: Need to set up LB redundancy
  • 36. Best Practices cont.
      - Reference Node
          - Acts as a ‘donor’ node
          - Backup node
          - No client connections
      [Diagram: clients behind a load balancer (LB), R/W to MySQL [WSREP] nodes; one additional MySQL [WSREP] node is labelled "Donor & Backup Node"]
  • 37. Best Practices cont.
      - Minimize probability of deadlocks
          - Writes go only to 1 Node
          - Applications use connection pool or load balancer on read only nodes
          - Have 1 “reference” Node for write failover and donor
      [Diagram: clients behind a load balancer (LB); MySQL [WSREP] nodes labelled R, R and W, with the W node marked “Master” Node]
  • 38. Galera Limitations
      - MyISAM replication is experimental
      - DDL statements are replicated at statement level
      - Writes to other table types, including system (mysql.*) tables, are not replicated
          - CREATE USER... is replicated as DDL, but issuing INSERT INTO mysql.user... will not be replicated
      - Non-deterministic functions like NOW() are not supported
      - Query log cannot be directed to table
      - LOCK/UNLOCK TABLES cannot be supported in multi-master setups
          - The same applies to lock functions (GET_LOCK(), RELEASE_LOCK()...)
      - Maximum allowed transaction size is defined by wsrep_max_ws_rows and wsrep_max_ws_size
      - XA transactions cannot be supported due to possible rollback on commit
  • 39. Monitoring and Management
  • 40. ClusterControl
      - Host Monitoring (CPU, RAM, Disk, Network)
      - Configuration Management
      - DB Metrics Monitoring
      - Performance Management
      - DB Resources Monitoring
      - Database Upgrades/Downgrades
      - Cluster-wide Query Analyzer
      - Online Scaling of MySQL Servers
      - Schema Management
      - Configurable Resource Thresholds
      - Replication Fail-over
      - Alarms and Email Notifications
      - Clusterware: Process Management and Automated Recovery
      - Backup Scheduling
      - Manual start/stop of Nodes
      - Real-time Performance Probes
  • 41. Configurators
  • 42. Galera Configurator
  • 43. Galera Configurator cont.
  • 44. Galera Configurator cont.
  • 45. Deploy Galera Cluster with HAProxy
      cd ~/s9s-galera-2.10/mysql/scripts/install
      ./deploy.sh 2>&1 | tee -a cc.log
      wget http://severalnines.com/downloads/s9s-haproxy.tar.gz
      tar zxvf s9s-haproxy.tar.gz
      cd haproxy
      ./install-haproxy.sh <lb host> <rhel|debian> galera
      done...
  • 48. Resources
      - Severalnines MySQL Galera Configurator
          - http://www.severalnines.com/resources/configurator
      - Supported platforms (MySQL Galera)
          - http://support.severalnines.com/entries/21589522-verified-and-supported-operating-systems
      - Galera limitations
          - http://support.severalnines.com/entries/21692388-limitations-in-galera-replication-for-mysql
      - ClusterControl server requirements
          - http://support.severalnines.com/entries/20614858-server-requirements-on-premise-amis-other-images