PoC: Using a Group Communication System to improve MySQL Replication HA


High availability solutions for MySQL Replication are either simple to use but introduce a single point of failure, or free of pitfalls but complex and hard to use. The Proof-of-Concept sketches a middle way. For monitoring, a group communication system is embedded into MySQL using a MySQL plugin, which eliminates the monitoring SPOF and is easy to use. Much emphasis is put on the often neglected client side. The PoC shows an architecture in which clients reconfigure themselves dynamically. No client deployment is required.

Published in: Technology

  1. 1. PoC: MySQL HA improved Ulf Wendel, MySQL/Oracle
  2. 2. The speaker says... If you are on a sailing boat on the wide, wide ocean and your captain is the only one who knows how to sail, would you feel safe? If a police helicopter, which will eventually lose sight of you or fail, monitors your ship, would you feel safer? No? You are right. Both the captain and the helicopter are Single Points of Failure. BTW, does your MySQL Replication cluster have a proper high availability configuration with no single point of failure? No? Because it is too complicated? Hint: use a GCS for MySQL HA!
  3. 3. Tip of the day A Single Point of Failure cannot cure a SPOF Ulf Wendel, MySQL/Oracle
  4. 4. The speaker says... MySQL Replication has a Single Point of Failure: the master server. The master is, by design, the weak spot of every primary copy based replication cluster*. Replication breaks when the master fails. At best, read-only queries can be served from the slaves until a new master has been elected and set up. Read: downtime, service outage, costs. Primary copy is still a valid design choice. It is simple. It is fast. But the Single Point of Failure (SPOF) remains a fact… A client's view on existing solutions and a Proof-of-Concept mashup based on a recent Group Communication System. * gives a technical overview on database clustering theory (MySQL Cluster, 3rd party...)
  5. 5. Things to care a lot about • Master database process and/or master host monitoring • How to identify a failover candidate • How not to lose transactions, ever The Servers' worries Master (Primary) Slave (Copy) Slave (Copy) GTID = 12 GTID = 9 GTID = 12 Monitor
  6. 6. The speaker says... Let's recap. If the master of a MySQL Replication system fails, a slave must be promoted to become the new master. All slaves are examined to identify the most recent one. Finding the most recent slave only recently became less troublesome with the introduction of Global Transaction Identifiers (GTIDs) in MySQL 5.6. Then, the candidate must be promoted to master and all other slaves must be updated to continue replicating from the new master. In heterogeneous deployments with older MySQL versions, searching for the latest transactions and applying them on all slaves can be quite demanding. Hence, use a tool for it!
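The candidate selection described on slide 6 can be sketched in a few lines of Python. This is only an illustration, not how any MySQL tool works internally: it assumes each slave's progress can be reduced to a single integer, like the "GTID = 12" labels on slide 5, whereas real MySQL 5.6 GTID sets are lists of `server_uuid:interval` entries. The names `Slave`, `pick_failover_candidate` and `transactions_behind` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slave:
    name: str
    last_gtid: int  # simplified: highest transaction number applied


def pick_failover_candidate(slaves):
    """Return the slave that has applied the most transactions.

    Ties are broken by list order: max() returns the first maximal item.
    """
    if not slaves:
        raise ValueError("no slave available for promotion")
    return max(slaves, key=lambda s: s.last_gtid)


def transactions_behind(slaves, candidate):
    """How many transactions each other slave must catch up on."""
    return {
        s.name: candidate.last_gtid - s.last_gtid
        for s in slaves
        if s is not candidate
    }


# Values taken from the two slaves drawn on slide 5.
slaves = [Slave("slave1", 9), Slave("slave2", 12)]
candidate = pick_failover_candidate(slaves)
print(candidate.name, transactions_behind(slaves, candidate))
```

After the choice is made, the remaining slaves still have to be repointed to the new master, which is the part the slide recommends leaving to a tool.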
  7. 7. Introduced as MySQL 5.6 Utility • Health monitoring, Failover • Aims for 99.9% HA – 8 hours downtime per year Example: mysqlfailover utility Master (Primary) Slave (Copy) Slave (Copy) GTID = 12 GTID = 9 GTID = 12 heartbeating mysqlfailover
  8. 8. The speaker says... For years, MySQL has recommended using 3rd party monitoring solutions for MySQL Replication. MySQL 5.6 finally introduces the mysqlfailover command line utility. It sends heartbeats to the nodes of a MySQL Replication cluster and monitors their health. If required, it performs a failover. Given the complexity of a failover, it's very welcome to see it automated. BTW, the utility is now GA, and can be run as a daemon.
  9. 9. Common design because of its simplicity • See mysqlfailover • See 3rd party, for example, MHA (MySQL High Availability) Result: SPOFs doubled SPOF: Master Slave (Copy) Slave (Copy) GTID = 12 GTID = 9 GTID = 12 SPOF: Network SPOF: Monitor
  10. 10. The speaker says... If the MySQL Replication Master is a Single Point of Failure, what is a single health monitor? Be it MHA or mysqlfailover, this approach introduces a new Single Point of Failure: the health monitor. Given how rare a failover is, and given how unlikely it is that two systems – the master and the monitor – fail at the same time, it is still a valid design. In the worst case, you can still manually (re)start the monitor. However, an unnecessary failover may happen if the monitor unilaterally loses contact with the master. Generic HA Cluster solutions such as Windows Clustering or its Linux counterpart address such
  11. 11. Developed and pushed by major Linux vendors • Pacemaker, Corosync/Heartbeat, DRBD • Aims for 99.99% HA – 50 minutes downtime/year Generic HA Cluster solution Master (Active) Slave (Active) Pacemaker (CRM) Corosync (CCM) Master (Standby) Pacemaker (CRM) Corosync (CCM) Pacemaker (CRM) Corosync (CCM) DRBD DRBD
  12. 12. The speaker says... Higher HA levels require a significantly more complex architecture, such as a combination of Pacemaker, Heartbeat/Corosync and DRBD. Pacemaker is a Cluster Resource Manager (CRM) that manages arbitrary services, for example, MySQL servers. The managed services are monitored by a Cluster Communication Manager (CCM), such as Corosync. Everything is broken into small, independent programs. There are no SPOFs because all the programs run on all the cluster nodes. A Distributed Replicated Block Device (DRBD) mirrors the MySQL master to lower the risk of transaction loss and speed up failover. A tad complicated, maybe...
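The availability targets quoted on slides 7 and 11 translate into yearly downtime budgets with simple arithmetic. The sketch below assumes a 365-day year; the function name `downtime_hours_per_year` is made up for this illustration. The slides' "8 hours" and "50 minutes" are rounded versions of the results.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Maximum yearly downtime allowed by an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return HOURS_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours ({hours * 60:.0f} minutes) per year")
```

Each additional "nine" cuts the budget by a factor of ten, which is why the step from 99.9% to 99.99% calls for the much heavier architecture.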
  13. 13. Things to care a lot about • Available servers, their roles, and possibly replication lag • Partitioning and sharding hints, if needed • Real-time server load The Clients' worries Master (Primary) Slave (Copy) Slave (Copy) Load 80/100 Load 5/100 Load 95/100 All tables: A, B, C Tables: A, B Tables: C Lag: 0s Lag: 1 second Lag: 32 seconds
  14. 14. The speaker says... Given enough information about the nature and status of a database cluster, an intelligent client can blur the line between connecting to a single database and connecting to a database cluster. The mission statement of PECL/mysqlnd_ms*, a load balancer plugin for the PHP MySQL driver, is to hide the complexity of database clusters from the application developer: load balancing, read/write splitting, read-your-writes (consistency), sharding and partitioning support, connection pool management, automatic caching of selected slave requests, GTID… - all done by the driver! Dear database and/or HA cluster, just tell the driver. * (it got even more feature loaded meanwhile)
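What "just tell the driver" buys the client can be shown with a toy read/write splitter that also respects replication lag. This is a Python sketch of the general idea only; it is not the PECL/mysqlnd_ms API or its algorithm, and the names `Server`, `route` and `max_lag_seconds` as well as the lag values are assumptions loosely modelled on slide 13.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    role: str          # "master" or "slave"
    lag_seconds: int   # replication lag; 0 for the master


def route(statement, servers, max_lag_seconds=5):
    """Pick a server: reads go to a fresh-enough slave, the rest to the master."""
    master = next(s for s in servers if s.role == "master")
    is_read = statement.lstrip().upper().startswith("SELECT")
    if not is_read:
        return master
    fresh = [s for s in servers
             if s.role == "slave" and s.lag_seconds <= max_lag_seconds]
    # Fall back to the master when every slave lags too far behind.
    return min(fresh, key=lambda s: s.lag_seconds) if fresh else master


cluster = [
    Server("master", "master", 0),
    Server("slave1", "slave", 1),
    Server("slave2", "slave", 32),
]
print(route("SELECT * FROM a", cluster).name)
print(route("INSERT INTO a VALUES (1)", cluster).name)
```

A real driver plugin has far more to consider (transactions, SQL hints, connection state), but the decision can only be as good as the role and lag information the cluster hands to the client.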
  15. 15. Typical failover procedure • Master failure: switch virtual IP, no client deployment • Slave failure: deploy all clients • Role, status, lag, load, distribution, … : hacks, at best The Server/Client clash Failed Master New Master Slave Client Virtual IP Virtual IP Master: Virtual IP, Slave:
  16. 16. The speaker says... All too often, HA solutions stop caring about clients beyond performing a virtual IP switch as part of a MySQL replication master failover. But failover is only a fraction of the task: the cloud era demands elastic clusters. At best, HA solutions support the use of proxy servers to redirect client requests. Proxies add complexity to the stack, they add latency to all requests, they can become bottlenecks, evolve into SPOFs and their failure affects many clients, not just one. Driver-integrated load balancers, such as PECL/mysqlnd_ms, don't have these disadvantages*! * (Pro/Con discussion of different load balancing approaches)
  17. 17. Server plugin for monitoring and management • Monitoring based on Group Communication System plugin • Management may utilize external scripts • Clients read I_S on any node for automatic self-deployment A mashup for the mess Master Slave Slave Plugin: CCM/CRM Plugin: CCM/CRM Plugin: CCM/CRM
  18. 18. The speaker says... Could a MySQL server plugin using a Group Communication System (GCS) offer the robustness of a Pacemaker/Corosync approach (no SPOF), beat everything, including mysqlfailover, on ease of use, and be the best friend of driver-based proxies for near-zero administration? At any time, a GCS can report its members and share state information among all members. A failed member (MySQL server) can be detected automatically by the GCS and appropriate action can be taken, for example, running the mysqlfailover command line utility to perform failover. If a client fails to connect to a node, it queries the INFORMATION_SCHEMA on any of the remaining nodes to learn about changes.
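The client-side half of slide 18, asking any remaining node after a failed connect, might look roughly like this. It is a sketch under assumptions: `refresh_server_list` and `query_node` are hypothetical names, the actual database access is injected as a function so that no particular MySQL driver API is implied, and the demo uses a fake cluster. The table name is the one given on slide 42.

```python
# Table name as shown on slide 42; the column layout is not assumed here.
SERVER_LIST_QUERY = "SELECT * FROM INFORMATION_SCHEMA.ISIS2IS"


def refresh_server_list(known_nodes, failed_node, query_node):
    """Ask the remaining nodes, one after another, for the cluster's server list.

    query_node(node, sql) is a caller-supplied function that runs the query
    against one node and returns its rows, or raises ConnectionError.
    """
    for node in known_nodes:
        if node == failed_node:
            continue
        try:
            return query_node(node, SERVER_LIST_QUERY)
        except ConnectionError:
            continue  # this node is unreachable too, try the next one
    raise ConnectionError("no remaining node answered")


# Demo with a fake cluster instead of real MySQL connections.
def fake_query_node(node, sql):
    if node == "node2":
        raise ConnectionError(node)
    return [{"MYSQL_PORT_OR_SOCKET": "3400", "MYSQL_ROLE": "master"}]


rows = refresh_server_list(["node1", "node2", "node3"], "node1", fake_query_node)
print(rows)
```

Because every node exposes the same replicated state, it does not matter which of the remaining nodes answers first.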
  19. 19. Compared to Pacemaker/Corosync • Similar no SPOF design for the monitor • Aims for: out-of-the-box experience, smaller installations • Aims for: continuous, automatic client reconfiguration Recap: simplified, client focus MySQL Plugin: CCM/CRM MySQL Pacemaker (CRM) Corosync (CCM) DRBD GTID based
  20. 20. The speaker says... A GCS not only helps with failover. It can also report newly added nodes. Clients can periodically check the list of nodes and start to use the new ones automatically. In the proposed solution, clients use plain SQL to learn about changes. Clients learn from MySQL nodes. State information is exposed through the INFORMATION_SCHEMA and exchanged (synchronously) with the help of a GCS. Data may include load, replication lag, … - whatever, clients reconfigure themselves continuously. That's the idea. (Basically, it's a system that uses lazy primary copy for
  21. 21. Compared to super-sized central management monitor • No SPOF design for the monitor • No central server that can get overloaded • No new communication channels for client reconfiguration Recap: simplified, client focus MySQL Plugin (CCM/CRM) Monitor (CRM) MySQL Client (SQL) Client (Reconf) MySQL Client (SQL) Client (Reconf) Client (SQL) MySQL CCM/CRM Client Client (Reconf)
  22. 22. The speaker says... The proposed system is better than a super-sized monitor that manages state (nodes, roles, load, …) centrally. A super-sized monitor can easily become a single point of failure. If clients notice the failure of a MySQL node and thousands of clients almost concurrently query the one centralized monitor to learn about cluster state changes, the central monitor likely gets overloaded. Finally, a centralized monitor likely forces clients to learn a new, additional protocol for communication with the monitor. Not so with the GCS approach: SQL for everything. No overloading: load is distributed across all remaining MySQL servers.
  23. 23. nixnutz@linux-dstv:~/src/isis_201245> rm isis_deamon.exe ; dmcs isis_deamon.cs Isis.cs ; mono ./isis_deamon.exe
      Isis: Searching for the Isis ORACLE...
      [IsisDaemonMain] Connecting to ISIS...
      [GroupConnector][view change][<mysql> incoming multicasts delivery thread] Some view change (e.g. join), no action required
      [IsisDaemonMain] Starting daemon for communication with MySQL.
      [GroupDaemon] Server is waiting on socket
      [IsisDaemonMain] Started. Listening to client requests.
      Q: 31 'join 3400 on master'
      A: 4 '0 OK'
      [GroupConnector][remote mysql register][RemoteRegisterMySQLServer mysql] Added (50851)- 14:13:07- 30
      New server count 1
      LastUpdate before heartbeat 07.08.2013 14:13:07
      LastUpdate after heartbeat 07.08.2013 14:13:17
      [GroupConnector][heartbeat][<mysql> incoming multicasts delivery thread] Thread.CurrentThread.Name Ignoring Hearbeat message to ourselves to avoid deadlock.
      Q: 29 'heartbeat 3400 on'
      A: 56 '(50851) 3400 on master 30 07.08.2013 14:13:17
      HA MySQL cluster using: ORACLE Rendezvous Service
  24. 24. The speaker says... Let's hack it ?! The first steps are trivial. The biggest challenge is to find a free and open source C/C++ group communication system that can be embedded in a MySQL daemon server plugin. Corosync has a client/server-daemon design, which is not a perfect match for the task. A brother, the Spread Toolkit, is somewhat limited to ~40 nodes and the API is not that appealing. LibPaxos is intentionally licensed under GPLv3 to make clear it's experimental. The rest: old, Java, ... The OSS world lacks a cool C/C++ GCS :-/. Then came Isis2! C#, wrong language; but what a nice API! Isis2 is designed to make distributed cloud computing easy. Thus, it's ideal for a PoC that hides all the gory details ;-)
  25. 25. ++Isis, Ken Birman's 1980s/1990s masterpiece improved • Virtual Synchrony Model* merged with Paxos ideas • Easy to use yet powerful API • Distributed (programming language) objects • Distributed key-value store/hashtable • From low level unreliable messaging for gossip protocols to high level globally ordered reliable messaging – it's all there • New BSD license, C#/.NET (pure C++ port considered) • Aims to support cloud sized clusters (thousands of nodes) * (Virtual Synchrony, slide 34+) Isis2 Cloud Computing Library
  26. 26. The speaker says... Isis2 is almost a perfect choice for the PoC. The API is perfect, the documentation is great, but it's written in C#/.NET. This not only means I had to use Mono on my preferred Linux platform, but there is also a language barrier that destroys parts of the beauty of the proposed solution. MySQL and its plugins are written in C/C++. One cannot call/link the Isis2 library from a MySQL plugin and use Isis2's neat distributed (programming language) objects feature or any of its other services directly. A proxy/connector, a socket server, is required to communicate between MySQL and Isis2. It is an undesired extra layer. Still... yippie yeahh!
  27. 27. MySQL APIs and language barriers require a compromise • C#/.NET socket server becomes Isis2 client • C/C++ MySQL daemon plugin sends heartbeat to socket server • C/C++ MySQL INFORMATION_SCHEMA plugin Sad but true: compromises... MySQL Plugin: I_S tables Plugin: heartbeat to Isis2 client Isis2 client w. socket server MySQL Plugin: Connector Plugin: Connector Isis2 client: CCM/CRM
  28. 28. The speaker says... The C#/.NET to C/C++ language barrier does kill some of the simplicity and beauty of the proposal. A MySQL plugin cannot be an Isis2 client. Instead, a C#/.NET Isis2 client (node) has to run as a socket server and communicate with MySQL through a socket. The process model gets complicated. On the MySQL side we now need a daemon plugin that sends a heartbeat to the local Isis2 client and another MySQL plugin that implements the I_S tables a SQL client can query to learn about the cluster's members and state. Think of the plugins as "Connectors". The Isis2 client takes the CCM/CRM role, as planned.
  29. 29. For every MySQL server do: • Start Isis2 client: mono ./isis_deamon.exe • Configure Connector Plugins, e.g. Isis2 client address • Heartbeat to Isis2: INSTALL PLUGIN isis2d SONAME '' • I_S Plugin: INSTALL PLUGIN isis2is SONAME '' • Teach your clients to monitor INFORMATION_SCHEMA The proposed user manual
      mysql> SELECT * FROM INFORMATION_SCHEMA.ISIS2IS\G
      *************************** 1. row ***************************
      ERROR:
      ERRNO: 0
      ISIS2_MEMBER: (50249)
      MYSQL_HOST:
      MYSQL_PORT_OR_SOCKET: 3400
      MYSQL_STATUS: on
      MYSQL_ROLE: master
      MYSQL_HEARTBEAT_TTL: 30
      MYSQL_HEARTBEAT_LAST: 07.08.2013 13:50:22
      1 row in set (0,01 sec)
  30. 30. The speaker says... DBA instructions. Start a daemon (only required because of the C# - C/C++ mismatch – otherwise it would be part of the plugin!). Then, install some plugins, let the MySQL server announce its availability and you are done. The servers communicate with each other and jointly discover new servers entering the cluster and servers leaving it. If a master leaves, a failover script can be called to reconfigure the cluster. On any of the MySQL nodes in your cluster, a client can get a list of currently connected MySQL servers and their state (e.g. role). The state is replicated synchronously. HA built into MySQL, job done ;-). Let's talk about details...
  31. 31. Isis2 client socket server start
      [Diagram labels: Joining Isis2 client, Isis2 client, Connected clients, Current leader, ORACLE Rendezvous Service, Isis2 Group]
      Isis: Searching for the Isis ORACLE...
      Isis: Found the Isis.ORACLE service, attempting to connect.
      [IsisDaemonMain] Connecting to ISIS...
      [state transfer] received: (51971)-
      [state transfer] received: (51971)-
  32. 32. The speaker says... When the DBA starts the Isis2 client socket server on a MySQL host, the client tries to connect to a virtual, distributed group in the cloud. The Isis2 library calls the code that manages virtual groups the ORACLE – no joke! Once the client is connected, a checkpoint is done to transfer the state of already connected clients to the joining one. The state transferred is the list of the MySQL servers, if any, that contacted their local Isis2 clients to register themselves in the group. All this takes fewer than 100 lines of C#.
  33. 33. Isis2 client socket server start
      [Diagram labels: New Isis2 client, Joined Isis2 client, Isis2 client, Connected clients, Local socket server, ORACLE Rendezvous Service, Isis2 Group]
      [IsisDaemonMain] Starting server for communication with MySQL.
      [GroupDaemon] Server is waiting on socket
      [GroupConnector][view change][<mysql> incoming multicasts delivery thread] Some view change (e.g. join), no action required
  34. 34. The speaker says... As soon as an Isis2 client joins the group, a view change message is sent to all group members. In our case it gets ignored. A joining Isis2 client does not change the list of MySQL servers that have registered themselves in the cluster.
  35. 35. Isis2d heartbeat MySQL plugin
      [Diagram labels: Isis2 client, Remote clients, Local Isis2 client, Socket server, MySQL, Isis2d Plugin, ORACLE Rendezvous Service, Isis2 Group]
      [GroupConnector][remote mysql register][RemoteRegisterMySQLServer mysql] Added (54830)- 16:28:26- 30
      New server count 1
      Q: 31 'join 3400 on master'
  36. 36. The speaker says... MySQL does not appear on stage before the DBA loads the first of the two "Connector" plugins, the Isis2 daemon plugin, into MySQL. When the plugin, or rather the MySQL server, starts, it sends a join message to the local Isis2 client. The client parses it and sends a message to all group members. This way, all connected Isis2 clients on all hosts learn (virtually) synchronously about the MySQL server. The use of the Isis2 SafeSend() function ensures globally ordered and reliable messaging. This is by far the slowest Send() variant, but it simplifies our job. Either all or no Isis2 group members add the MySQL server to their list of
  37. 37. Isis2d heartbeat MySQL plugin
      [Diagram labels: Isis2 client, Remote clients, Local Isis2 client, Socket server, MySQL, Isis2d Plugin, ORACLE Rendezvous Service, Isis2 Group]
      [GroupConnector][heartbeat][RemoteHeartbeatmysql] heartbeat server (54830)- 16:28:36
      Q: 29 'heartbeat 3400 on'
      A: 56 '(54830) 3400 on master 30 07.08.2013 16:28:36
  38. 38. The speaker says... Unfortunately, due to the C# to C/C++ language barrier, our MySQL servers are not members of the Isis2 group, but the local Isis2 clients are. Thus, Isis2's own group membership service cannot monitor the MySQL processes directly. A hack, heartbeating, is needed to tell each local Isis2 client about the state of its associated MySQL server. Every now and then the Isis2 heartbeat plugin sends a heartbeat. The local Isis2 client then increases the TTL of the server's entry in the group's server list. It is also the local Isis2 client that may drop a server from the list if it fails to send heartbeat messages.
  39. 39. Isis2d heartbeat MySQL plugin
      [Diagram labels: Isis2 client, Remote clients, Local Isis2 client, Socket server, MySQL, Isis2d Plugin, ORACLE Rendezvous Service, Isis2 Group]
      [GroupConnector][unregister][RemoteUnregisterMySQLServermysql] removing server (53743)- 16:20:29-30
      Q: 22 'leave 3400'
  40. 40. The speaker says... Upon plugin or MySQL server shutdown, the Isis2d daemon plugin sends a leave message to its local Isis2 client. The leave message is handled like the join message: parse message, forward to all using SafeSend(). Neither a missed heartbeat nor a leave command triggers any failover logic in the PoC, although this is what should be done. I didn't want to spend more than a few days fooling & toying around with the coding. Also, the main message should be clear even without triggering a failover script, which would be a rather trivial thing to do, given a proper failover script.
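The join, heartbeat and leave handling of slides 35 to 40 boils down to a small registry with a time-to-live per server. The Python sketch below shows only that local bookkeeping; it is not the PoC's C# code and it leaves out the group multicast via SafeSend(). The message layout follows the log excerpts as they appear in this transcript (`join 3400 on master`, `heartbeat 3400 on`, `leave 3400`), the real wire format may carry more fields, and malformed input is not handled. The class name `ServerRegistry`, the use of the port as key, and the `1 ...` error replies are assumptions; only the `0 OK` reply and the TTL of 30 come from the slides.

```python
import time


class ServerRegistry:
    """Tracks MySQL servers announced through join/heartbeat/leave messages."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.servers = {}  # port -> {"status", "role", "expires"}

    def handle(self, message):
        """Apply one message and return a reply string ("0 OK" on success)."""
        command, *args = message.split()
        if command == "join":
            port, status, role = args
            self.servers[port] = {"status": status, "role": role,
                                  "expires": self.clock() + self.ttl_seconds}
        elif command == "heartbeat":
            port, status = args
            entry = self.servers.get(port)
            if entry is None:
                return "1 unknown server"  # hypothetical error reply
            entry["status"] = status
            entry["expires"] = self.clock() + self.ttl_seconds
        elif command == "leave":
            (port,) = args
            self.servers.pop(port, None)
        else:
            return "1 unknown command"  # hypothetical error reply
        return "0 OK"

    def drop_expired(self):
        """Remove servers whose heartbeat is overdue; return their ports."""
        now = self.clock()
        expired = [p for p, e in self.servers.items() if e["expires"] <= now]
        for port in expired:
            del self.servers[port]
        return expired


registry = ServerRegistry()
print(registry.handle("join 3400 on master"))
print(registry.handle("heartbeat 3400 on"))
print(sorted(registry.servers))
```

A list returned by `drop_expired()` that contains the master would be the natural place to call a failover script, the step the PoC deliberately leaves out.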
  41. 41. Isis2is I_S MySQL plugin
      [Diagram labels: Local Isis2 client, Socket server, MySQL, Isis2is Plugin]
      Q: 12 'serverlist'
      A: 58 '0 (54830) 3400 on master 30 07.08.2013 17:36:58'
      mysql> SELECT * FROM INFORMATION_SCHEMA.ISIS2IS\G
      ***************** 1. row *****************
      ERROR:
      ERRNO: 0
      ISIS2_MEMBER: (54830)
      MYSQL_HOST:
      MYSQL_PORT_OR_SOCKET: 3400
      MYSQL_STATUS: on
      MYSQL_ROLE: master
      MYSQL_HEARTBEAT_TTL: 30
      MYSQL_HEARTBEAT_LAST: 07.08.2013 17:36:58
  42. 42. The speaker says... The Isis2is INFORMATION_SCHEMA plugin provides a list of all MySQL servers in the cluster through the table INFORMATION_SCHEMA.ISIS2IS. Clients can check the table periodically for news, e.g. new servers or (not implemented) server load, to adapt their load balancing dynamically. Whenever a client fails to connect to a MySQL server, it can ask any of the other MySQL servers it knows about for an update. If required, the client can automatically adapt to membership changes: use new servers, switch to a different master. No DBA action required to deploy clients. The PECL/mysqlnd_ms code has been prepared for zero-deployment/runtime configuration changes years ago. Search the source for "hotloading"...
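Adapting to membership changes, as described on slide 42, starts with comparing the previous server list against a freshly fetched one. Here is a small Python sketch of that comparison. It assumes the client has already condensed the I_S rows into a mapping from a server identifier to its role; the function name `diff_server_lists` and the identifiers in the demo are invented for illustration and have nothing to do with the PECL/mysqlnd_ms source.

```python
def find_master(servers):
    """Return the identifier of the server with role "master", if any."""
    return next((s for s, role in servers.items() if role == "master"), None)


def diff_server_lists(old, new):
    """Compare two {server_id: role} snapshots of the cluster."""
    new_master = find_master(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "master_changed": find_master(old) != new_master,
        "master": new_master,
    }


before = {"db1:3400": "master", "db2:3400": "slave"}
after = {"db2:3400": "master", "db3:3400": "slave"}
changes = diff_server_lists(before, after)
print(changes)
```

With such a summary a client could open connections to added servers, drop removed ones, and redirect writes when `master_changed` is set, all without a DBA touching the client configuration.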
  43. 43. No more than an illustration of the idea • NOT stable, NOT complete, NO sketch of client code • Because: my first three days ever with C#/Mono hacking • Because: the C#/Mono approach raises question marks • Because: PoC... just fun, just spreading ideas! • Neat, simple, no SPOF, zero-administration approach ?! • Only 509 lines of C# for the Isis2 client socket server • Only 462 lines of C/C++ for the two MySQL server plugins • Available on Proof of Concept Code
  44. 44. The speaker says... The code of the Isis2 client socket server and the two plugins is available at . It is PoC code to illustrate the basic idea, no less, no more. The code is neither complete (e.g. join not repeated if it fails initially, and no failover logic) nor free of bugs (e.g. UNINSTALL PLUGIN may crash under certain circumstances). This, however, does not matter much when the focus is on discussing different designs for HA solutions and suggesting a mashup of two existing ones. A mashup with strong client support for zero-administration. Happy hacking!
  45. 45. THE END Contact:
  46. 46. The speaker says... Thank you for your attendance! Upcoming shows: PHP Unconference Hamburg, September 2013 PHP Summit Munich, December 2013