How to understand Galera Cluster - 2013

1,257
-1

Published on

Topics covered in this presentation:

1. The difference between traditional (e.g. MySQL) replication and Galera Cluster

2. General Galera Cluster principles

Published in: Technology, Travel

How to understand Galera Cluster - 2013

  1. 1. How to Understand Galera Cluster Alexey Yurchenko Codership Oy
  2. 2. 2 Agenda   The difference between traditional (e.g. MySQL) replication and Galera Cluster.   General Galera Cluster principles.
  3. 3. Galera Cluster Difference
  4. 4. 4 Traditional Replication Approach Server-centric: “One server streams data to another” Server 1 “master” “slave” Server 1 Server 2
  5. 5. 5 Traditional Replication Approach Cool topologies! 1 3 2 4 5 8 7 9 10 11 12 1 14 1113 17 1921 23 24 18 20 22 6 16 15
  6. 6. 6 Traditional Replication Approach But there are questions:   If node C crashes, do we still have a cluster?   If node B crashes and clients failover to C, how B joins back?   Which node has data X?   How do we backup the cluster? B A C
  7. 7. 7 Galera (wsrep API) Approach Data-centric: - it is synchronized between one or more servers Server 1 Server 2 Server 3 Server N. . . There are some Data (a dataset)
  8. 8. 8 Galera (wsrep API) Approach Data-centric: Data does not belong to a Node - Node belongs to Data
  9. 9. 9 Galera (wsrep API) Approach Data-centric: The dataset needs an ID: 00295a79-9c48-11e2-bdf0-9a916cbb9294 There are some Data (a dataset)
  10. 10. 10 Galera (wsrep API) Approach Data-centric: Dataset ID == Cluster ID 00295a79-9c48-11e2-bdf0-9a916cbb9294 Data ServerServerServerServerCluster
  11. 11. 11 Data-centric: Cluster A (9b0e4ad8-a42b-11e2-bdf0-9851bb738edb) Cluster B (e1eedd79-a5d6-11e2-a801-aae16e1f5226) Galera (wsrep API) Approach B A C
  12. 12. 12 Global Transaction ID (GTID) 00295a79-9c48-11e2-bdf0-9a916cbb9294:64201 Dataset 64198 64199 64200 64201 64202 64203 64204 64205 64206 64207... ... sequence of atomic changes+ =
  13. 13. 13 Global Transaction ID (GTID) 00295a79-9c48-11e2-bdf0-9a916cbb9294:64201 64201 00295a79- 9c48-11e2- bdf0- 9a916cbb9294: 64201 Transaction sequence number Dataset ID
  14. 14. 14 Global Transaction ID (GTID) 00295a79-9c48-11e2-bdf0-9a916cbb9294:0 - initial dataset 00295a79-9c48-11e2-bdf0-9a916cbb9294:1 - first change/transaction 00000000-0000-0000-0000-000000000000:-1 - undefined GTID
  15. 15. 15 Global Transaction ID (GTID) Galera GTID: 00295a79-9c48-11e2-bdf0-9a916cbb9294:64201 MySQL 5.6 GTID: 8182213e-7c1e-11e2-a6e2-080027635ef5:12345
  16. 16. 16 Global Transaction ID (GTID) Galera GTID: 00295a79-9c48-11e2-bdf0-9a916cbb9294:64201 MySQL 5.6 GTID: 8182213e-7c1e-11e2-a6e2-080027635ef5:12345 Data/Cluster ID Server ID
  17. 17. 17 Global Transaction ID (GTID) Galera GTID: 00295a79-9c48-11e2-bdf0-9a916cbb9294:64201 MySQL 5.6 GTID: 8182213e-7c1e-11e2-a6e2-080027635ef5:12345 Data/Cluster ID Server ID data change in the cluster transaction processed by the server
  18. 18. 18 Global Transaction ID (GTID) What we see in MySQL 5.6: 8182213e-7c1e-11e2-a6e2-080027635ef5:12345 8182213e-7c1e-11e2-a6e2-080027635ef5:12346 8182213e-7c1e-11e2-a6e2-080027635ef5:12347 ← new master promoted → f4e3bf7a-a91f-11e2-4e02-3f8dbcffaed8:1 f4e3bf7a-a91f-11e2-4e02-3f8dbcffaed8:2 f4e3bf7a-a91f-11e2-4e02-3f8dbcffaed8:3
  19. 19. 19 Global Transaction ID (GTID) What we see in Galera: 00295a79-9c48-11e2-bdf0-9a916cbb9294:64201 00295a79-9c48-11e2-bdf0-9a916cbb9294:64202 00295a79-9c48-11e2-bdf0-9a916cbb9294:64203 ← new master promoted → 00295a79-9c48-11e2-bdf0-9a916cbb9294:64204 00295a79-9c48-11e2-bdf0-9a916cbb9294:64205 00295a79-9c48-11e2-bdf0-9a916cbb9294:64206
  20. 20. 20 Global Transaction ID (GTID) 1)  Galera nodes are ANONYMOUS => all equal. 2)  Galera cluster == one big distributed “master”.
  21. 21. 21 Global Transaction ID (GTID) UUID1 UUID2 ASYNCHRONOUS
  22. 22. 22 Global Transaction ID (GTID) UUID1 UUID2 ASYNCHRONOUS UUID1:123 UUID2:539 UUID1:123 UUID2:543
  23. 23. 23 Global Transaction ID (GTID) SYNCHRONOUS / ASYNCHRONOUS <==> SINGLE DATABASE / INDEPENDENT DATABASES <==> CONSISTENCY / INCONSISTENCY
  24. 24. 24 cluster Master / Slave ? master1 slave2 slave1 slave2 slave1 slave2 slave1 slave2 slave1 master2 client1 client2 master for client 2 master for client 1
  25. 25. 25 Master / Slave ? 1.  Not a node role/function. 2.  Is a relation between a node and a client.
  26. 26. 26 Galera Cluster as a Meeting 00295a79-9c48-11e2- bdf0-9a916cbb9294
  27. 27. 27 Galera Cluster as a Meeting 00295a79-9c48-11e2- bdf0-9a916cbb9294
  28. 28. 28 Galera Cluster as a Meeting 00295a79-9c48-11e2- bdf0-9a916cbb9294
  29. 29. 29 Galera Cluster as a Meeting 00295a79-9c48-11e2- bdf0-9a916cbb9294
  30. 30. 30 Galera Cluster as a Meeting 00295a79-9c48-11e2- bdf0-9a916cbb9294
  31. 31. 31 Galera Cluster as a Meeting 00295a79-9c48-11e2- bdf0-9a916cbb9294
  32. 32. 32 Galera Cluster as a Meeting 00295a79-9c48-11e2- bdf0-9a916cbb9294
  33. 33. 33 Galera Cluster as a Meeting 00295a79-9c48-11e2-0800-9a916 cbb9294
  34. 34. 34 Galera Cluster as a Meeting 00295a79-9c48-11e2-0800-9a916 cbb9294 ? ? ?
  35. 35. 35 Galera Cluster as a Meeting e1eedd79-a5d6-11e2-0800- a8e16e1f5226 New meeting!
  36. 36. 36 Galera Cluster as a Meeting 00295a79-9c48-11e2- bdf0-9a916cbb9294 Galera way The “usual” way
  37. 37. 37 wsrep_cluster_address 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.6 10.0.0.4 10.0.0.5 ?
  38. 38. 38 wsrep_cluster_address 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.6 10.0.0.4 10.0.0.5
  39. 39. 39 wsrep_cluster_address 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.6 10.0.0.4 10.0.0.5
  40. 40. 40 wsrep_cluster_address 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.6 10.0.0.4 10.0.0.5
  41. 41. 41 wsrep_cluster_address 10.0.0.5 ? Nothing here
  42. 42. 42 wsrep_cluster_address wsrep_cluster_address = gcomm://node1,node2 => try to connect to members: node1, node2 wsrep_cluster_address = gcomm:// => no members in the cluster, you are the first one. Start a new cluster.
  43. 43. 43 Node Synchronization (State Transfer) 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.6 10.0.0.4 10.0.0.5
  44. 44. 44 Node Synchronization (State Transfer) SYNCED JOINED SYNCED DESYNC SYNCED UNDEFINEDUNDEFINED
  45. 45. 45 Node Synchronization (State Transfer) New node Cluster Old node Here's what I've got, give me what I'm missing You be the donor Found a donor for you, please hold Here's your stuff (through private channel) JOINER UNDEFINED DONOR JOINED JOINED SYNCED catch-up catch-up SYNCEDSYNCED
  46. 46. 46 Primary Component PRIMARY
  47. 47. 47 Primary Component PRIMARY NON-PRIMARY
  48. 48. 48 Primary Component PRIMARY NON-PRIMARY keeps on working tries to reconnect
  49. 49. 49 Primary Component PRIMARY
  50. 50. 50 Primary Component NON-PPRIMARY NON-PRIMARY SPLIT-BRAIN!
  51. 51. 51 2-node Cluster and Split Brain wsrep_provider_options=”pc.ignore_sb” VIP clients
  52. 52. 52 2-node Cluster and Split Brain Galera replication can be used in every manner traditional asynchronous master- slave replication is. It implements a SUPERSET of traditional replication functionality
  53. 53. 53 How Synchronous is Galera? 1.  Synchronous penalty. 2.  Slave lag.
  54. 54. 54 Galera Synchronous Penalty? The only thing Galera does synchronously is copying of data buffer to all cluster members on COMMIT command from client. => ~1 RTT added latency
  55. 55. 55 Galera Synchronous Penalty? ~1 RTT added latency => Connection throughput = 1/RTT trx/sec => Total throughput = 1/RTT trx/sec ✕ #connections
  56. 56. 56 Galera Synchronous Penalty in WAN (EC2) - sysbench client at us-east - sysbench client at eu-west 95% latencies Both clients connect to a standalone server at us- east accessibility zone 2-node Galera cluster: us-east client connects to us-east node, eu-west client connects to eu-west node
  57. 57. 57 Galera Synchronous Penalty in WAN (EC2) - sysbench client at us-east - sysbench client at eu-west Throughput Both clients connect to a standalone server at us- east accessibility zone 2-node Galera cluster: us-east client connects to us-east node, eu-west client connects to eu-west node
  58. 58. 58 Galera Synchronous Penalty? Still: CALLAGHAN'S LAW: A given row can't be modified more often than 1/RTT times a second (discovered by Mark Callaghan)
  59. 59. 59 Slave lag in Galera? Client Master node Slave node START TRANSACTION Replicate changeset OK process commit commit apply COMMIT SELECT (stale data)
  60. 60. Questions? Thank you for listening! Happy Clustering :-)
  61. 61. 61 Galera Cluster as a Meeting

×