Percona XtraDB Cluster SF Meetup


  • Today I want to focus on High Availability questions. I personally define the current MySQL era as the era of High Availability. While there are many materials on how to set up and tune a single server, HA for MySQL is still at an early but growing stage. You may see increasing interest in third-party software and scripts, like Continuent, MHA, MMM, Flipper, Percona Replication Manager, etc.
  • So what is High Availability? Availability refers to the ability of the user community to access the system, whether to submit new work, update or alter existing work, or collect the results of previous work. If a user cannot access the system, it is said to be unavailable. (Wikipedia) The word “High” refers to some pre-arranged level of performance sustained over a long period of time.
  • The usual approach to providing availability is to have redundant systems. This is especially useful for services like a web server or an application server.
  • Basically we duplicate resources
  • And the failover procedure is very simple: when one system is down, we redirect user requests to the second system.
  • This applies to more than two systems; by having multiple servers for redundancy we can manage the probability of failure. Assuming independent failures, if a single system is down with probability P, then the chance that both of two servers are down at once is P^2, and for X servers it is P^X.
  • Graphically, the probability of total failure drops off quickly as servers are added: the more servers you have, the lower the probability that the service is unavailable.
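A quick sketch of the math above. It assumes, purely as an illustration, that server failures are independent and that the service is down only when every replica is down at once:

```python
# Probability that the whole service is unavailable, assuming each of the
# redundant servers fails independently with probability p. The service is
# considered down only when ALL servers are down simultaneously.
def outage_probability(p: float, servers: int) -> float:
    return p ** servers

# Example: each server is down 1% of the time.
for n in (1, 2, 3):
    print(n, outage_probability(0.01, n))
# 1 server  -> 0.01
# 2 servers -> 0.0001
# 3 servers -> 0.000001 (one millionth)
```

Real failures are rarely fully independent (shared power, shared network), so treat this as an upper bound on the benefit of redundancy, not a guarantee.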
  • Was that easy? Not if we deal with databases.
  • Databases are more complicated. There are two parts: a system (server) and data.
  • And besides system redundancy we need to provide data redundancy.
  • To provide availability for a database, we need to provide availability for two components: service + data.
  • This is where replication comes into play. Replication is a process of sharing events between resources: when one of the systems updates data, it informs the other systems.
  • And MySQL Replication is, of course, the best known in the MySQL world.
  • When we speak about availability based on MySQL replication, I have this picture in my mind. The system works, but it has many life-support elements, and if you do something wrong, it fails. Yes, I exaggerate a little, but I do it to draw your attention. There are a lot of MySQL replication setups that serve HA purposes, and they do a great job. What is wrong with them I will explain in just a couple of minutes. MySQL Replication is a great tool, no question: simple and easy to understand. All of this made MySQL and MySQL Replication very popular, and MySQL Replication is one of the factors why MySQL is the most popular open source database. The lack of good replication was the biggest issue for PostgreSQL users.
  • So what’s wrong with MySQL Replication? It is as simple as “a”.
  • “a” in asynchronous
  • MySQL replication is asynchronous
  • Asynchronous means there is no confirmation or guarantee of when an event will be applied on the replicated system. By its nature it assumes there is a delay between a slave and a master, and the delay can be microseconds or hours.
  • In simple terms: asynchronous replication does not guarantee that your data is the same at any given point in time.
  • Synchronous communication is different. It assumes a confirmation process: the first system waits for confirmation from the second.
  • Wait, isn’t that how DRBD works?
  • Yes, DRBD. However, with DRBD we have a different problem: while the data is always available, the system is not. DRBD only works in active-passive mode. Failover time depends on how fast you can wake up your standby system, and it can never be zero.
  • At this step we come to clustering solutions.
  • And namely our solution: Percona XtraDB Cluster. As usual, our software is free and open source. You are free to install, use, uninstall, copy and distribute it, provide commercial support, whatever you want.
  • In XtraDB Cluster the nodes are connected through group communication and exchange events synchronously.
  • Well, in fact the process is not fully synchronous, but “virtually synchronous”. I will not go into details now, as it is fairly complicated; good reading is available at the following link:
  • I tried to simplify the process, and this is the simplest description I could come up with. Important points: the network interaction happens when you issue the COMMIT statement. At this time the node performs network communication and certification of the transaction. For intercontinental communication between nodes the network round trip can be significant (e.g. 0.18 sec between nodes in Amazon’s California and Europe zones). It is also important to note that a slave can still have a short period of time (i.e. less than 1 sec) when it is out of sync with the master: applying the event on the slave may take longer than the COMMIT on the master.
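The commit-time certification step described above can be sketched in a few lines. This is a deliberately simplified toy model, not Galera’s real algorithm: each transaction remembers the global sequence number it started at, and at COMMIT its write-set is checked for overlap with every write-set committed since then.

```python
# Toy model of commit-time certification: a transaction conflicts if any
# write-set committed after it began touches one of the same rows.
class Cluster:
    def __init__(self):
        self.seqno = 0
        self.history = []          # list of (seqno, frozenset of row keys)

    def begin(self):
        return self.seqno          # snapshot point for the new transaction

    def commit(self, start_seqno, write_set):
        keys = frozenset(write_set)
        for s, committed in self.history:
            if s > start_seqno and committed & keys:
                return False       # certification failed -> ROLLBACK
        self.seqno += 1
        self.history.append((self.seqno, keys))
        return True                # write-set replicated and certified

cluster = Cluster()
t1 = cluster.begin()
t2 = cluster.begin()               # t1 and t2 run concurrently
first = cluster.commit(t1, {"row:42"})   # first writer commits fine
second = cluster.commit(t2, {"row:42"})  # second writer loses certification
print(first, second)  # -> True False
```

The key property this illustrates: no locks are taken while the transactions run; the conflict only surfaces at COMMIT, which is why the error shows up there.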
  • So here is the list of benefits that XtraDB Cluster provides. First: synchronous replication, whose importance I have already covered.
  • Second, Multi-master replication
  • In regular MySQL replication, writes to several servers are possible, but you are asking for big trouble. This comes from the fact I already mentioned: being asynchronous, the second server may be behind, and by updating it we may be updating stale data.
  • With XtraDB Cluster it is different: you can update any server in the Cluster
  • Third benefit is: Parallel replication
  • It is a well-known limitation of MySQL that the slave is single-threaded, and this is an additional factor in why a slave may fall behind a master. On modern servers with 32 cores, the master can perform many more updates per second than the slave can handle with a single thread.
  • Fourth benefit is: data consistency.
  • In XtraDB Cluster we guarantee that data is equal on all nodes. A transaction is either committed on all nodes or not committed at all.
  • And Fifth, Automatic node provisioning
  • When a new node joins the Cluster, it automatically copies data from an existing node
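Provisioning is driven by a handful of settings in the node’s configuration file. The sketch below shows the kind of my.cnf a joining node might use; the paths, addresses, and cluster name are illustrative placeholders, not a tested configuration:

```ini
[mysqld]
# Galera replication provider shipped with Percona XtraDB Cluster
# (path varies by distribution -- example only)
wsrep_provider        = /usr/lib/libgalera_smm.so
# Addresses of existing cluster members the new node contacts to join
wsrep_cluster_address = gcomm://192.168.0.1,192.168.0.2
wsrep_cluster_name    = my_cluster
# How the joiner copies data from the donor (State Transfer, covered later)
wsrep_sst_method      = xtrabackup

# Required by Galera replication
binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2
```

With this in place, starting the new node triggers the state transfer automatically; no manual dump/restore is needed.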
  • Finishing with the benefits, I want to give some attention to the CAP theorem. You have probably heard about it, especially as applied to various NoSQL distributed systems. I will use the CAP theorem to explain the difference between MySQL Replication and XtraDB Cluster.
  • I will simplify the theorem to the following statement: in a distributed system you can have only two of the following three properties: data consistency, node availability, and partition tolerance.
  • I understand that still sounds somewhat fuzzy, so let me show an example: imagine that in a system with three nodes, we have a network failure cutting off one of the nodes.
  • In this case MySQL Replication will give you access to all nodes. Even on the node disconnected from the others, you are still able to connect to MySQL locally or through an available connection and read and even change data. However, as you understand, that node will not receive updates from the cluster: data consistency is compromised.
  • With XtraDB Cluster, we guarantee data consistency, but as a downside we cannot allow access to a node that is disconnected from the cluster.
  • As a consequence, the minimal recommended configuration is 3 nodes. Why? Let’s see what we get with 2 nodes.
  • In a 2-node configuration, in case of a link failure, we have a situation with a special name: “split brain”. There is no way to decide which node should accept queries and which should not. By default, if that happens, both nodes will refuse queries.
  • There is, however, a special option that allows you to keep this scheme, but you take the responsibility on yourself: you need to make sure that you have one and only one master, that is, one server that executes update queries.
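For reference, the Galera provider option that permits a two-node setup to keep serving through a split brain is `pc.ignore_sb`. A hedged sketch of enabling it (exact syntax may vary by version; use only if you can guarantee a single designated write master):

```ini
[mysqld]
# DANGEROUS: keep accepting queries even in a split-brain state.
# Safe only if exactly one node ever takes writes; otherwise the two
# halves of the cluster will silently diverge.
wsrep_provider_options = "pc.ignore_sb=true"
```

The default behavior (both nodes refusing queries) is the safe one; this option trades that safety for availability.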
  • Once again, back to our theorem. It shows the principal difference between MySQL Replication and XtraDB Cluster: MySQL Replication gives you access to all systems; XtraDB Cluster prioritizes data consistency.
  • This applies to all external software and scripts based on MySQL replication: as long as MySQL replication is asynchronous, it does not guarantee data consistency.
  • At this stage I want to give a few more details about XtraDB Cluster.
  • Percona XtraDB Cluster is good old Percona Server + special replication patches + the Galera library. The patches and the Galera library are developed by the Finnish company Codership Oy.
  • The fact that XtraDB Cluster is based on Percona Server, which is compatible with MySQL, means that XtraDB Cluster is compatible with MySQL setups. You use the same InnoDB storage engine, and the database server behaves the same way: queries have the same execution plans, and you use the same configuration and the same optimization techniques.
  • It also takes minimal effort to migrate from an existing working system to one running XtraDB Cluster. It is not much harder than upgrading from MySQL to Percona Server. If you have done that, you know it is quite easy: you just replace the old binaries with new ones. For XtraDB Cluster you will also need to make a couple of changes to the configuration file.
  • Also important: there is no lock-in, which for me is quite an important factor. If you do not like this solution for some reason, there is always an easy way to return to your previous setup.
  • This all sounds so good, so is this a perfect solution? I want to answer “yes”, but you won’t believe me. Of course there are limitations.
  • It is a new product and a new solution, and there are limitations, some of which will be resolved later; we are already working on them.
  • The first limitation is that only InnoDB tables are supported. Changes to MyISAM tables are not replicated, so make sure you have only InnoDB tables when you test XtraDB Cluster.
  • The second limitation, or incompatibility for applications, is that XtraDB Cluster introduces optimistic locking. This locking applies not to all cases but to transactions running on different servers. Let me explain what that means for you.
  • First, traditional InnoDB locking, for transactions on the same server: when two transactions try to update the same row, the second transaction waits until the first COMMITs or ROLLBACKs.
  • In XtraDB Cluster, when running transactions on different servers (multi-master), we can get an error on the COMMIT statement. Once again, this applies to different servers; if we run on the same server, we have traditional InnoDB locking. On different servers, however, we get so-called “optimistic locking”: two transactions run without locks, assuming there are no conflicts, and only at the COMMIT stage do the transactions communicate with each other. We have this model because all communication happens at the COMMIT stage. If at this stage transaction 2 finds out that it updated a row that was also updated by another transaction, then transaction 2 performs a ROLLBACK and returns an ERROR to the client. This is not something that usually happens in traditional applications based on MySQL, and your application may not be ready for it. It is fair to say that many applications and frameworks do not handle errors on the COMMIT query, so this may require changes in the application logic if you expect to run transactions on different nodes. But it could also be the only significant change you need.
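The application-side change usually amounts to a retry loop around the whole transaction. The sketch below simulates that pattern with a hypothetical `CertificationError` standing in for the deadlock-style error a real driver would raise on COMMIT; the point is that the *entire* transaction is re-run, not just the failed statement:

```python
# Pattern: re-run a whole transaction when its COMMIT loses certification.
class CertificationError(Exception):
    """Stand-in for the conflict error a driver raises on COMMIT."""

def run_with_retry(txn, retries=3):
    """txn() performs one complete transaction attempt; retry on conflict."""
    for attempt in range(retries):
        try:
            return txn()
        except CertificationError:
            if attempt == retries - 1:
                raise              # give up after the last attempt

# Simulated workload: the first attempt conflicts, the retry succeeds.
attempts = []
def transfer():
    attempts.append(1)
    if len(attempts) == 1:
        raise CertificationError("conflict detected on COMMIT")
    return "committed"

result = run_with_retry(transfer)
print(result)  # -> committed
```

Frameworks that treat COMMIT as infallible have nowhere to hook this in, which is exactly why the talk calls this out as an application-logic change.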
  • OK, next limitation: write performance is limited by the weakest node you have. This is the price we pay for data consistency. If one of the nodes suddenly becomes slow (for various reasons, e.g. a disk failure in a RAID), write queries become equally slow in the whole cluster. Let me show why.
  • When a user runs an update on one of the servers, the write event is communicated to all nodes. The user gets the confirmation after the server gets confirmation from every node. If one of the nodes is slow, the whole cluster is slow.
  • Now let’s talk about write-intensive applications. By write-intensive I mean a very high rate of updates/inserts/deletes per second. If this is your case, there will be some limit to how much data you can have in the cluster. This limitation is not physical (nothing is hardcoded); it is logical. Let me explain why.
  • I will also explain how the cluster handles the JOIN process. Let’s assume a new node wants to join an existing cluster. As we already discussed, it has to get a full copy of the data, so that it has the same data as the other nodes in the cluster.
  • So what happens: the cluster designates one node, which gets the status DONOR, and the JOINER copies the whole dataset from the DONOR. You understand that, for example, for 200GB of data it may take a while to copy it over the network. Meanwhile, the DONOR is also out of the cluster, for a short or long period of time depending on which copying method you choose. I will show the different copying methods later.
  • Now, when data copying is finished, we have two nodes that were disconnected from the cluster, and they need to apply the events that happened while they were disconnected. If you have a big database and an intensive rate of changes, that may take a long time. And while events are being applied to the DONOR and JOINER, new events keep happening in the cluster. You understand that these two are trying to catch up, but the cluster keeps generating new events; in the worst case these two outsiders may never catch up.
  • So for write-intensive applications this should be a hardware + software solution; this is a case we cannot solve with software alone. And this applies not only to XtraDB Cluster. Let me show an analogy.
  • Let’s take a single InnoDB system. When we need good write performance from InnoDB, the usual setup is to have a disk array; you cannot get decent performance with a single disk.
  • If you need InnoDB to provide good performance and durability, you need not just a disk array, but an array with a battery-backed cache.
  • The same goes for the cluster. For write-intensive applications and good performance in the cluster, you will need good networking, like 10 Gigabit Ethernet or Infiniband, and good storage; consider SSD drives or PCIe flash cards.
  • Let’s get back to the JOIN process. As I promised, let me review the methods XtraDB Cluster can use to copy data. The process itself has a name: State Transfer.
  • In XtraDB Cluster we have two kinds of state transfer. The first is a full data copy: Snapshot State Transfer (SST). It happens when a totally new node joins the cluster, or when a node was in the cluster but was disconnected for a long period of time. The second is Incremental State Transfer (IST), which happens when a node was disconnected for a short period of time.
  • For Snapshot State Transfer (a full copy of the data), we have the following choices: mysqldump, which is obviously good for small databases and for just playing with the cluster; rsync, where the data is copied by an rsync process (usually a fast way to copy data, but the drawback is that the DONOR is disconnected from the cluster for the whole copying time); and XtraBackup, where the donor is disconnected only for a short period of time, but copying the data and joining may be slower than with rsync.
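The SST method is chosen with a single setting. A sketch of the three options described above (pick one; the commented lines show the alternatives):

```ini
[mysqld]
# Snapshot State Transfer method used when this node acts as joiner/donor
wsrep_sst_method = xtrabackup    # donor blocked only briefly; copy may be slower
# wsrep_sst_method = rsync       # fast copy, but donor blocked for the whole transfer
# wsrep_sst_method = mysqldump   # small databases, good for experiments
```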
  • Incremental State Transfer is used when a node was in the cluster but we took it down for a short period of time, for example for a server reboot or to change some configuration parameters. The second case where it could be used (but not yet; this is work in progress) is when a node crashes. Yes, that happens. Unfortunately, at the moment, after a crash the node has to perform a full Snapshot State Transfer.
  • OK, if you are still with me, we can continue with the scalability topic.
  • Scalability for me is quite similar to availability. That’s why we can use XtraDB Cluster when we need to scale a load.
  • Scalability is similar to availability in the sense of how it can be handled: by redundancy. The only difference is that the first system is unable to handle a user’s request not because it is down, but because it is overloaded. The experience for the user is the same: the system refuses to handle the query; the system is not available to the user. This can be handled by redirecting the query to a second system.
  • In XtraDB Cluster it is easy to scale reads: read queries do not require additional overhead or group communication.
  • Scaling writes is more complicated, as each write has to be replicated to every system: each server has to handle the writes coming from all servers.
  • We can make a rough rule of thumb; it is very approximate, just to get a basic understanding of how to look at it. If we have N servers and our workload is 100% reads, we can scale by as much as a factor of N. For 100% writes, we can scale only to some constant, or possibly not at all.
  • That is, if 1 server can handle 100 read queries per second, then 10 servers can handle 1000 of the same queries per second. For 100% write traffic there is not much room to grow, because of the internal communication I showed before: if 1 server can handle 100 update queries per second, then 10 servers are still able to handle only about 100 of the same queries per second. Actually it can be a little better, because the internal communication happens in an optimized way.
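The rule of thumb above, coded up. This is the talk’s own approximation, not a performance model: only the read share of the traffic benefits from extra nodes, because every node must apply every write.

```python
# Rough capacity ceiling for an N-node cluster under the talk's rule of thumb:
# reads scale with the node count, writes do not.
def cluster_capacity(single_qps, nodes, read_fraction):
    if read_fraction == 0:
        return single_qps                      # 100% writes: no scaling
    return single_qps * nodes * read_fraction  # e.g. 50/50 -> N/2 factor

print(cluster_capacity(100, 10, 1.0))   # 100% reads  -> 1000.0 q/s
print(cluster_capacity(100, 10, 0.5))   # 50/50 mix   -> 500.0 q/s
print(cluster_capacity(100, 10, 0.0))   # 100% writes -> 100 q/s
```

These match the numbers on the scaling slides; real throughput will differ, since write application is partly parallelized.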
  • With all this group communication over the network and the synchronous process, is it fast? Actually, it is reasonably fast: certification and virtual synchrony minimize the overhead. If we look at two performance characteristics, response time and throughput, response time is the one that may take a hit; a network round trip will certainly increase it. If that is critical, make sure you have decent network and storage. Throughput is less affected: since we can do many operations in parallel, we can get reasonable throughput numbers.
  • One setup for which XtraDB Cluster is considered is intercontinental, or inter-coast, replication.
  • (On the question of how it compares to MySQL Cluster:) I do not like this question. I usually answer “it is different”. I mean, these really are different systems with different goals, so how can you compare them? But people usually do not like this answer.
  • That’s why I came up with this marketing-like checklist table.

    1. Percona XtraDB Cluster powered by Galera. Vadim Tkachenko, Percona Inc, co-founder, CTO
    2. This talk online • PowerPoint • PDF • Google Docs
    3. This talk: High Availability, Replication, Cluster
    4. What is HA: Availability. Avail ~ Ability. Ability to Avail
    5. Availability by redundancy
    6. Duplicate resources
    7. Failover
    8. Probability of failure. Single server: P. Two servers: P^2. X servers: P^X
    9. Probability of failure vs. number of servers (chart)
    10. Easy? Not if we deal with databases
    11. Database
    12. Redundancy?
    13. Database availability is hard: service availability + data availability
    14. Replication
    15. (diagram)
    16. What is wrong with MySQL replication? “a”
    17. What is wrong with MySQL replication? “a” in async
    18. What is wrong with MySQL replication? “async” vs “sync”
    19. Async
    20. Async
    21. Sync
    22. Didn’t we just reinvent DRBD?
    23. DRBD
    24. Clustering
    25. Percona XtraDB Cluster: free and open source
    26. Percona XtraDB Cluster
    27. Virtually synchronous
    28. Virtually synchronous
    29. Benefits: synchronous replication • multi-master replication • parallel applying on slaves • data consistency • automatic node provisioning
    30. Benefits (recap)
    31. Multi-master: MySQL
    32. Multi-master: XtraDB Cluster
    33. Benefits (recap)
    34. Parallel apply: MySQL
    35. Parallel apply: XtraDB Cluster
    36. Benefits (recap)
    37. XtraDB Cluster data consistency
    38. Benefits (recap)
    39. Node provisioning
    40. CAP theorem
    41. Pick only TWO: Consistency, Node availability, Partition tolerance
    42. Network failure
    43. MySQL Replication: access to all systems: YES; data consistency: NO
    44. XtraDB Cluster: access to all systems: NO; data consistency: YES
    45. 3 nodes is the minimal recommended configuration
    46. Split brain: which system to make available?
    47. Split brain: you can still have this setup, but you deal with the consequences
    48. The choice: MySQL Replication: access to all systems; XtraDB Cluster: data consistency
    49. MySQL replication based: MHA, MMM, Flipper, PRM
    50. Percona XtraDB Cluster details
    51. Percona XtraDB Cluster = Percona Server + WSREP patches + Galera library
    52. Full compatibility with existing systems
    53. Minimal effort to migrate
    54. Minimal effort to return back to MySQL
    55. So, is this a perfect solution?
    56. Limitations (some will be solved later)
    57. Only InnoDB tables are supported
    58. OPTIMISTIC locking for transactions on different servers
    59. Traditional locking
    60. Optimistic locking
    61. The write performance is limited by the weakest node
    62. Write performance
    63. For write-intensive applications there could be a data-size limit per node: not physical but logical
    64. Join process. Step 1
    65. Join process. Step 2
    66. Join process. Step 3
    67. This is a software + hardware solution
    68. InnoDB write performance
    69. InnoDB performance + ACID
    70. Cluster performance • 10 GigE network • Infiniband • SSD storage • PCI-e Flash
    71. Join process
    72. State Transfer: full data (SST: new node, or node disconnected for a long time) vs. incremental (IST: node disconnected for a short time)
    73. Snapshot State Transfer: mysqldump (small databases) • rsync (faster; donor disconnected for the whole copy time) • XtraBackup (slower; donor disconnected for a short time)
    74. Incremental State Transfer: node was in the cluster, disconnected for maintenance; node crashed (work in progress)
    75. Scalability
    76. Scalability: Scale ~ Ability. Ability to Scale
    77. Scalability is similar to availability
    78. XtraDB Cluster: read scalability is easy
    79. Write scalability is complicated
    80. N servers scale to: 100% reads: factor of N; 50/50: factor of N/2; 100% writes: 1 or a constant
    81. 10 servers scale to: 100% reads: 1 server 100 q/s, 10 servers 1000 q/s; 50/50: 1 server 100 q/s, 10 servers 500 q/s; 100% writes: 1 server 100 q/s, 10 servers 100 q/s (can be more)
    82. FAQ: questions I am asked
    83. It looks so easy. Why did you not implement it earlier? It is not easy: the computer science of group communication and distributed transactions. Credits to Codership Oy
    84. How fast is it? Reasonably fast
    85. Can I replicate XtraDB Cluster to MySQL Replication? Yes
    86. Async MySQL Replication
    87. Would I install it on a production system? Yes, I am going to use XtraDB Cluster
    88. How does it compare to MySQL Cluster? It is different
    89. XtraDB Cluster vs. MySQL Cluster checklist: easy to migrate, easy to use, cloud/EC2, changes in an application, write scaling, 99.999%
    90. Resources • Virtual synchrony • CAP theorem • Optimistic locking
    91. Credits: WSREP patches and the Galera library are developed by Codership Oy
    92. Thank you! Questions? You can try Percona XtraDB Cluster today!