PoC: Using a Group Communication System to improve MySQL Replication HA

PoC: MySQL HA improved
Ulf Wendel, MySQL/Oracle

The speaker says...
If on a sailing boat on the wide, wide ocean and your captain
is the only one who knows how to sail, would you feel safe?
If a police helicopter, which will eventually lose sight or fail,
monitors your ship, would you feel safer? No? You are right.
Both the captain and the helicopter are Single Points of
Failure.
BTW, does your MySQL Replication cluster have a proper
high availability configuration with no single point of failure?
No? Because it is too complicated? Hint: use a GCS for
MySQL HA!

Tip of the day
A Single Point of Failure
cannot cure a SPOF

The speaker says...
MySQL Replication has a Single point of Failure: the master
server. The master is, by design, the weak spot of every
primary copy based replication cluster*. Replication breaks
when the master fails. At best, read only queries can be
served from the slaves until a new master has been elected
and set up. Read: downtime, service outage, costs. Primary
copy is still a valid design choice. It is simple. It is fast. But,
the Single point of Failure (SPOF) remains a fact…
A client's view on existing solutions and a Proof-of-Concept
mashup based on a recent Group Communication System.
* http://www.slideshare.net/nixnutz/diy-a-distributed-database-cluster-or-mysql-cluster
gives a technical overview on database clustering theory (MySQL Cluster, 3rd party...)

Things to care a lot about
• Master database process and/or master host monitoring
• How to identify a failover candidate
• How not to lose transactions, ever
The Servers' worries
Master (Primary) | Slave (Copy) | Slave (Copy)
GTID = 12        | GTID = 9     | GTID = 12
Monitor

The speaker says...
Let's recap. If the Master of a MySQL Replication system
fails a slave must be promoted to become the new master.
All slaves are examined to identify the most recent one.
Finding the most recent slave only recently became less
troublesome with the introduction of Global Transaction
Identifiers (GTIDs) in MySQL 5.6. Then, the candidate must
be promoted to master and all other slaves must be updated
to continue replicating from the new master. In
heterogeneous deployments with older MySQL versions,
searching for the latest transactions and applying them on all
slaves can be quite demanding. Hence, use a tool for it!
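The candidate search described above can be sketched in a few lines. This is only an illustration: real GTIDs are sets of server UUID and transaction interval pairs, while the sketch follows the slide and reduces each server to a single, hypothetical sequence number (`last_applied`). The names `Slave`, `pick_failover_candidate` and `missing_transactions` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slave:
    name: str
    last_applied: int  # simplified GTID: highest transaction number applied


def pick_failover_candidate(slaves):
    """Return the slave that has applied the most transactions."""
    if not slaves:
        raise ValueError("no slaves available for promotion")
    # max() returns the first maximal element, so ties go to the earlier entry.
    return max(slaves, key=lambda s: s.last_applied)


def missing_transactions(candidate, other):
    """Transaction numbers `other` must still apply to catch up with `candidate`."""
    return list(range(other.last_applied + 1, candidate.last_applied + 1))


# Values taken from the slide: one slave at GTID 9, one at GTID 12.
slaves = [Slave("slave-1", 9), Slave("slave-2", 12)]
candidate = pick_failover_candidate(slaves)
print(candidate.name)
print(missing_transactions(candidate, slaves[0]))
```

With the slide's numbers, the slave at GTID 12 is picked and the other slave has to apply transactions 10 to 12 before it can replicate from the new master.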

Introduced as MySQL 5.6 Utility
• Health monitoring, Failover
• Aims for 99.9% HA – 8 hours downtime per year
Example: mysqlfailover utility
heartbeating
mysqlfailover

The speaker says...
For years, MySQL has recommended using 3rd party
monitoring solutions for MySQL Replication.
MySQL 5.6 finally introduces the mysqlfailover command
line utility. It sends heartbeats to the nodes of a MySQL
Replication cluster and monitors their health. If required, it
performs a failover. Due to the complexity of a failover, it is
very welcome to see it being automated.
BTW, the utility is now GA, and can be run as a daemon.
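As a side note on the 99.9% target named on the slide: an availability percentage translates directly into a downtime budget per year. The helper below is a plain arithmetic sketch (the function name is made up for this example); a 99.99% target is included for comparison.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours per year a service may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours/year, {hours * 60:.1f} minutes/year")
```

The exact figures are about 8.76 hours for 99.9% and about 52.6 minutes for 99.99%, so the numbers on the slides are rounded.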

Common design because of its simplicity
• See mysqlfailover
• See 3rd party, for example, MHA (MySQL High Availability)
Result: SPOFs doubled
SPOF: Master Slave (Copy) Slave (Copy)
SPOF: Network
SPOF: Monitor

The speaker says...
If the MySQL Replication Master is a Single Point of Failure,
what is a single health monitor? Be it MHA or mysqlfailover,
this approach introduces a new Single Point of Failure: the
health monitor.
Given how rare a failover is, and given how unlikely it is that
two systems – the master and the monitor – fail at the same
time, it is still a valid design. In the worst case, you can still
manually (re)start the monitor. However, unnecessary
failover may happen if the monitor unilaterally loses contact
with the master. Generic HA Cluster solutions such as
Windows Clustering or its Linux counterpart address such

Developed and pushed by major Linux vendors
• Pacemaker, Corosync/Heartbeat, DRBD
• Aims for 99.99% HA – 50 minutes downtime/year
Generic HA Cluster solution
Master (Active) Slave (Active)
Pacemaker (CRM)
Corosync (CCM)
Master (Standby)
Pacemaker (CRM)
Corosync (CCM)
Pacemaker (CRM)
Corosync (CCM)
DRBD DRBD

The speaker says...
Higher HA levels require a significantly more complex
architecture, such as a combination of Pacemaker,
Heartbeat/Corosync and DRBD. Pacemaker is a Cluster
Resource Manager (CRM) that manages arbitrary services,
for example, MySQL servers. The managed services are
monitored by a Cluster Communication Manager (CCM),
such as Corosync. Everything is broken into small,
independent programs. There are no SPOFs because all the
programs run on all the cluster nodes. A Distributed
Replicated Block Device (DRBD) mirrors the MySQL master
to lower the risk of transaction loss and speed up failover.
A tad complicated, maybe...

Things to care a lot about
• Available servers, their roles, and possibly replication lag
• Partitioning and sharding hints, if needed
• Real-time server load
The Clients' worries
Load 80/100         | Load 5/100    | Load 95/100
All tables: A, B, C | Tables: A, B  | Tables: C
Lag: 0s             | Lag: 1 second | Lag: 32 seconds

The speaker says...
Given enough information about the nature and status of a
database cluster, an intelligent client can blur the line
between connecting to a single database and a database
cluster. The mission statement of PECL/mysqlnd_ms*, a
load balancer plugin for the PHP MySQL driver, is to hide
the complexity of database clusters from the application
developer: load balancing, read write splitting, read-your-
writes (consistency), sharding and partitioning support,
connection pool management, automatic caching of selected
slave requests, GTID… - all done by the driver! Dear
database and/or HA cluster, just tell the driver.
* http://www.slideshare.net/nixnutz/load-mysq-clusterin-balancing-peclmysqlndms-14
(it got even more feature loaded meanwhile)
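To make the idea of an informed client concrete, here is a minimal sketch of how a client could pick a server for a read once it knows load, lag and table distribution. It is not how PECL/mysqlnd_ms works internally; `Node`, `pick_read_node` and the selection rule (least loaded node that has the table and is fresh enough) are assumptions for illustration. The sample values are the ones from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    role: str          # "master" or "slave"
    load: int          # 0..100
    lag_seconds: int   # replication lag, 0 for the master
    tables: frozenset  # tables available on this node


def pick_read_node(nodes, table, max_lag_seconds):
    """Pick the least loaded node that holds `table` and is fresh enough."""
    candidates = [
        n for n in nodes
        if table in n.tables and n.lag_seconds <= max_lag_seconds
    ]
    if not candidates:
        raise LookupError(f"no node can serve table {table!r}")
    return min(candidates, key=lambda n: n.load)


nodes = [
    Node("master", "master", 80, 0, frozenset({"A", "B", "C"})),
    Node("slave-1", "slave", 5, 1, frozenset({"A", "B"})),
    Node("slave-2", "slave", 95, 32, frozenset({"C"})),
]
print(pick_read_node(nodes, "A", max_lag_seconds=5).name)
print(pick_read_node(nodes, "C", max_lag_seconds=5).name)
```

A read on table A goes to the lightly loaded slave, while a read on table C falls back to the master because the only slave holding C lags 32 seconds behind.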

Typical failover procedure
• Master failure: switch virtual IP, no client deployment
• Slave failure: deploy all clients
• Role, status, lag, load, distribution, … : hacks, at best
The Server/Client clash
Failed Master New Master Slave
Client
Virtual IP Virtual IP 192.168.128.11
Master: Virtual IP,
Slave: 192.168.128.11

The speaker says...
All too often HA solutions stop caring about clients beyond
performing a virtual IP switch as part of a MySQL
replication master failover. But failover is only a fraction of
the task: the cloud era demands elastic clusters.
At best, HA solutions support the use of proxy servers to
redirect client requests. Proxies add complexity to the stack,
they add latency to all requests, they can become bottle-
necks, evolve into SPOFs and their failure affects many
clients, not just one. Driver-integrated load balancers, such
as PECL/mysqlnd_ms don't have these disadvantages*!
* http://www.slideshare.net/nixnutz/load-balancing-for-php-and-mysql
(Pro/Con discussion of different load balancing approaches)

Server plugin for monitoring and management
• Monitoring based on Group Communication System plugin
• Management may utilize external scripts
• Clients read I_S on any node for automatic self-deployment
A mashup for the mess
Master
Slave Slave
Plugin: CCM/CRM
Plugin: CCM/CRM Plugin: CCM/CRM

The speaker says...
Could a MySQL server plugin using a Group Communication
System (GCS) offer the robustness of a Pacemaker and
Corosync approach (no SPOF), beat all others, including
mysqlfailover, on ease of use, and be the driver-based proxies'
best friend for near-zero administration?
At any time, a GCS can report its members and share state
information among all members. A failed member (MySQL
server) can be detected automatically by the GCS and
appropriate action can be taken, for example, running the
mysqlfailover command line utility to perform failover. If a
client fails to connect to a node, it queries the
INFORMATION_SCHEMA on any of the remaining nodes to learn
about changes.

Compared to Pacemaker/Corosync
• Similar no SPOF design for the monitor
• Aims for: out-of-the-box experience, smaller installations
• Aims for: continuous, automatic client reconfiguration
Recap: simplified, client focus
MySQL
Plugin: CCM/CRM
MySQL
Pacemaker (CRM)
Corosync (CCM)
DRBD GTID based

The speaker says...
A GCS not only helps with failover. It can also report newly
added nodes. Clients can periodically check the list of nodes
and start to use the new ones automatically.
In the proposed solution, clients use plain SQL to learn
about changes. Clients learn from MySQL nodes. State
information is exposed through the
INFORMATION_SCHEMA and exchanged (synchronously)
with the help of a GCS. Data may include load, replication lag, …
Whatever it is, clients reconfigure themselves continuously.
That's the idea.
(Basically, it's a system that uses lazy primary copy for

Compared to super-sized central management monitor
• No SPOF design for the monitor
• No central server that can get overloaded
• No new communication channels for client reconfiguration
Recap: simplified, client focus
MySQL
Plugin (CCM/CRM)
Monitor (CRM)
MySQL
Client (SQL)
Client (Reconf)
MySQL
Client (SQL)
Client (Reconf)
Client (SQL)
MySQL
CCM/CRM
Client
Client (Reconf)

The speaker says...
The proposed system is better than a super-sized monitor
that manages state (nodes, roles, load, …) centrally.
A super-sized monitor can easily become a single point of
failure. If clients notice the failure of a MySQL node and
thousands of clients almost concurrently query the one
centralized monitor to learn about cluster state changes, the
central monitor likely gets overloaded. Finally, a centralized
monitor likely forces clients to learn a new, additional
protocol for communication with the monitor. Not so with the
GCS approach: SQL for everything. No overloading: load is
distributed on all remaining MySQL servers.

nixnutz@linux-dstv:~/src/isis_201245> rm isis_deamon.exe ; dmcs
isis_deamon.cs Isis.cs ; mono ./isis_deamon.exe
Isis: Searching for the Isis ORACLE...
[IsisDaemonMain] Connecting to ISIS...
[GroupConnector][view change][<mysql> incoming multicasts delivery
thread] Some view change (e.g. join), no action required
[IsisDaemonMain] Starting daemon for communication with MySQL.
[GroupDaemon] Server is waiting on socket 127.0.0.1:2200
[IsisDaemonMain] Started. Listening to client requests.
Q: 31 'join 127.0.0.1 3400 on master'
A: 4 '0 OK'
[GroupConnector][remote mysql register][RemoteRegisterMySQLServer
mysql] Added (50851)-127.0.0.1-3400-on-master-07.08.2013 14:13:07-
30 New server count 1
LastUpdate before heartbeat 07.08.2013 14:13:07
LastUpdate after heartbeat 07.08.2013 14:13:17
[GroupConnector][heartbeat][<mysql> incoming multicasts delivery
thread] Thread.CurrentThread.Name Ignoring Hearbeat message to
ourselves to avoid deadlock.
Q: 29 'heartbeat 127.0.0.1 3400 on'
A: 56 '(50851) 127.0.0.1 3400 on master 30 07.08.2013 14:13:17
HA MySQL cluster using:
ORACLE Rendezvous Service

The speaker says...
Let's hack it ?! The first steps are trivial. The biggest
challenge is to find a free and open source C/C++ group
communication system that can be embedded in a MySQL
daemon server plugin. Corosync has a client/server daemon
design, which is not a perfect match for the task. A brother, the
Spread Toolkit, is somewhat limited to ~40 nodes and the
API is not that appealing. LibPaxos is intentionally licensed
under GPLv3 to make clear it's experimental. The rest: old,
Java, ... The OSS world lacks a cool C/C++ GCS :-/.
Then came Isis2! C#, wrong language; but what a nice API!
Isis2 is designed to make distributed cloud computing easy.
Thus, it's ideal for a PoC that hides all the gory details ;-)

++Isis, Ken Birman's 1980/1990 masterpiece improved
• https://isis2.codeplex.com/
• Virtual Synchrony Model* merged with Paxos ideas
• Most easy to use yet powerful API
• Distributed (programming language) objects
• Distributed key-value store/hashtable
• From low level unreliable messaging for gossip protocols to
high level globally ordered reliable messaging – it's all there
• New BSD license, C#/.NET (pure C++ port considered)
• Aims to support cloud sized clusters (thousands of nodes)
* http://de.slideshare.net/nixnutz/diy-a-distributed-database-cluster-or-mysql-cluster (Virtual Synchrony, slide 34+)
Isis2 Cloud Computing Library

The speaker says...
Isis2 is almost a perfect choice for the PoC. The API is
perfect, the documentation is great, but it's written in
C#/.NET.
This not only means I had to use Mono on my preferred
Linux platform but there is a language barrier that destroys
parts of the beauty of the proposed solution. MySQL and its
plugins are written in C/C++. One cannot call/link the Isis2
library from a MySQL plugin and use Isis2's neat distributed
(programming language) objects feature or any of its other
services directly. A proxy/connector, a socket server, is
required to communicate between MySQL and Isis2. It is an
undesired extra layer. Still... yippie yeahh!

MySQL APIs and language barriers require a compromise
• C#/.NET socket server becomes Isis2 client
• C/C++ MySQL daemon plugin sends heartbeat to socket
server
• C/C++ MySQL INFORMATION_SCHEMA plugin
Sad but true: compromises...
MySQL
Plugin: I_S tables
Plugin: heartbeat to Isis2 client
Isis2 client w. socket server
MySQL
Plugin: Connector
Plugin: Connector
Isis2 client: CCM/CRM

The speaker says...
The C#/.NET to C/C++ language barrier does kill some of
the simplicity and beauty of the proposal. A MySQL plugin
cannot be an Isis2 client. Instead, a C#/.NET Isis2 client
(node) has to run as a socket server and communicate with
MySQL through a socket. The process model gets
complicated. On the MySQL side we now need a daemon
plugin that sends a heartbeat to the local Isis2 client and
another MySQL plugin that implements the I_S tables a SQL
client can query to learn about the cluster's members and
state. Think of the plugins as „Connectors“. The Isis2 client
takes the CCM/CRM role – as planned.

For every MySQL server do:
• Start Isis2 client: mono ./isis_deamon.exe
• Configure Connector Plugins, e.g. Isis2 client address
• Heartbeat to Isis2: INSTALL PLUGIN isis2d SONAME
'libisis2.so'
• I_S Plugin: INSTALL PLUGIN isis2is SONAME 'libisis2.so'
• Teach your clients to monitor INFORMATION_SCHEMA
The proposed user manual
mysql> SELECT * FROM INFORMATION_SCHEMA.ISIS2IS\G
*************************** 1. row ***************************
ERROR:
ERRNO: 0
ISIS2_MEMBER: (50249)
MYSQL_HOST: 127.0.0.1
MYSQL_PORT_OR_SOCKET: 3400
MYSQL_STATUS: on
MYSQL_ROLE: master
MYSQL_HEARTBEAT_TTL: 30
MYSQL_HEARTBEAT_LAST: 07.08.2013 13:50:22
1 row in set (0,01 sec)

The speaker says...
DBA instructions. Start a daemon (only required because of
the C# - C/C++ mismatch – otherwise it would be part of the
plugin!). Then, install some plugins, let the MySQL server
announce its availability and you are done. The servers
communicate with each other and jointly discover new
servers entering the cluster and servers leaving it. If a
master leaves, a failover script can be called to reconfigure
the cluster.
On any of the MySQL nodes in your cluster, a client can get
a list of currently connected MySQL servers and their state
(e.g. role). The state is replicated synchronously. HA built-in
to MySQL, job done ;-). Let's talk about details...

Isis2 client socket server start
Joining Isis2 client | Isis2 client | Isis2 client | Connected clients
Joining Isis2 client | Isis2 client | Isis2 client | Current leader
Isis2 Group
Isis2 Group
Isis: Searching for the Isis ORACLE...
Isis: Found the Isis.ORACLE service, attempting to connect.
[IsisDaemonMain] Connecting to ISIS...
[state transfer] received: (51971)-127.0.0.1-3400-master-on-30
[state transfer] received: (51971)-192.168.2.1-3306-slave-on-30

The speaker says...
When the DBA starts the Isis2 client socket server on a
MySQL host, the client tries to connect to a virtual,
distributed group in the cloud. The Isis2 library calls the code
that manages virtual groups the ORACLE – no joke!
Once the client is connected, a checkpoint is done to
transfer the state of already connected clients to the joining
one. The state transferred is the list of the MySQL servers, if
any, that contacted their local Isis2 clients to register
themselves in the group. All this takes less than 100 lines of
C#.

New Isis2 client
Isis2 client socket server start
Isis2 client | Isis2 client | Connected clients | Joined Isis2 client
Isis2 Group
Local socket server
[IsisDaemonMain] Starting server for communication with MySQL.
[GroupDaemon] Server is waiting on socket 127.0.0.1:2200
Isis2 client | Isis2 client | Connected clients
Isis2 Group
[GroupConnector][view change][<mysql> incoming multicasts delivery
thread] Some view change (e.g. join), no action required

The speaker says...
As soon as an Isis2 client joins the group, a view change
message is sent to all group members. In our case it gets
ignored. A joining Isis2 client does not change the list of
MySQL servers that have registered themselves in the
cluster.

Isis2d heartbeat MySQL plugin
Isis2 client | Isis2 client | Remote clients | Local Isis2 client
Isis2 Group
Socket server
[GroupConnector][remote mysql register][RemoteRegisterMySQLServer
mysql] Added (54830)-127.0.0.1-3400-on-master-07.08.2013 16:28:26-
30 New server count 1
MySQL
Isis2d Plugin
Q: 31 'join 127.0.0.1 3400 on master'

The speaker says...
MySQL does not appear on stage before the DBA loads the
first of the two „Connector“ plugins, the Isis2 daemon plugin,
into MySQL. When the plugin, or rather the MySQL server,
starts, it sends a join message to the local Isis2 client. The
client parses it and sends a message to all group members.
This way, all connected Isis2 clients on all hosts learn
(virtually) synchronously about the MySQL server.
The use of the Isis2 SafeSend() function ensures globally
ordered and reliable messaging. This is by far the slowest
Send() variant, but it simplifies our job. Either all or no Isis2
group members add the MySQL server to their list of

Isis2 Group
Socket server
[GroupConnector][heartbeat][RemoteHeartbeatmysql] heartbeat server
(54830)-127.0.0.1-3400-master-on-30-07.08.2013 16:28:36
MySQL
Isis2d Plugin
Q: 29 'heartbeat 127.0.0.1 3400 on'
A: 56 '(54830) 127.0.0.1 3400 on master 30 07.08.2013 16:28:36

The speaker says...
Unfortunately, due to the C# to C/C++ language barrier, our
MySQL Servers are not members of the Isis2 group, but the
local Isis2 clients are. Thus, Isis2's own group membership
service cannot monitor the MySQL processes directly. A
hack – heartbeating – is needed to tell each local Isis2 client
about the state of its associated MySQL Server. Every now
and then the Isis2 heartbeat plugin sends a heartbeat. The
local Isis2 client then increases the TTL of the server's entry
in the group's server list. It is also the local Isis2 client that
may drop a server from the list if it fails to send heartbeat
messages.
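The heartbeat bookkeeping described above could look roughly like the sketch below. It is not the PoC's C# code: `ServerRegistry`, its method names, and the rule "a heartbeat resets the deadline to now plus TTL" are assumptions for illustration. The TTL of 30 seconds matches the value visible in the logs. The clock is injectable so the example runs without waiting.

```python
import time


class ServerRegistry:
    """Tracks registered MySQL servers and drops those whose heartbeat expired."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._expires_at = {}  # (host, port) -> deadline on the clock's scale

    def join(self, host, port):
        """Register a server and give it a full TTL."""
        self._expires_at[(host, port)] = self._clock() + self.ttl_seconds

    def heartbeat(self, host, port):
        """Extend the deadline of a known server; return False if unknown."""
        if (host, port) not in self._expires_at:
            return False
        self._expires_at[(host, port)] = self._clock() + self.ttl_seconds
        return True

    def expire(self):
        """Remove and return the servers that missed their heartbeat deadline."""
        now = self._clock()
        dead = [key for key, deadline in self._expires_at.items() if deadline <= now]
        for key in dead:
            del self._expires_at[key]
        return dead


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
registry = ServerRegistry(ttl_seconds=30, clock=lambda: fake_now[0])
registry.join("127.0.0.1", 3400)
fake_now[0] = 10.0
registry.heartbeat("127.0.0.1", 3400)  # deadline moves to t=40
fake_now[0] = 35.0
print(registry.expire())               # nothing expired yet
fake_now[0] = 41.0
print(registry.expire())               # heartbeat missed, server dropped
```

In a setup like the one proposed here, the list returned by `expire()` would be the natural place to hook in a failover script when the dropped server was the master.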

Isis2 Group
Socket server
[GroupConnector][unregister][RemoteUnregisterMySQLServermysql]
removing server (53743)-127.0.0.1-3400-master-on-07.08.2013
16:20:29-30
MySQL
Isis2d Plugin
Q: 22 'leave 127.0.0.1 3400'

The speaker says...
Upon plugin or MySQL server shutdown, the
Isis2d daemon plugin sends a leave message to its local
Isis2 client. The leave message is handled like the join
message: parse message, forward to all using
SafeSend().
Neither a missed heartbeat nor a leave command triggers
any failover logic in the PoC, although this is what should be
done. I didn't want to spend more than a few days fooling &
toying around with the coding. Also, the main message
should be clear even without triggering a failover script,
which would be a rather trivial thing to do, given a proper
failover script.
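The "parse message" step for the join, heartbeat and leave commands seen in the logs can be sketched as follows. The field layout (verb, host, port, status, role) is read off the log excerpts; the `Command` type, the function name and the decision to treat the port as a number are assumptions for this example. The PoC itself does this in C#.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    verb: str
    host: str
    port: int
    status: Optional[str] = None  # e.g. "on"
    role: Optional[str] = None    # e.g. "master"


# Number of arguments each verb carries in the log excerpts.
_ARG_COUNTS = {"join": 4, "heartbeat": 3, "leave": 2}


def parse_command(line):
    """Parse one space-separated command line into a Command."""
    parts = line.split()
    if not parts:
        raise ValueError("empty command")
    verb, args = parts[0], parts[1:]
    if verb not in _ARG_COUNTS:
        raise ValueError(f"unknown verb: {verb!r}")
    if len(args) != _ARG_COUNTS[verb]:
        raise ValueError(
            f"{verb} expects {_ARG_COUNTS[verb]} arguments, got {len(args)}")
    # Assumption: the port is numeric; a socket path would need other handling.
    host, port = args[0], int(args[1])
    status = args[2] if len(args) > 2 else None
    role = args[3] if len(args) > 3 else None
    return Command(verb, host, port, status, role)


print(parse_command("join 127.0.0.1 3400 on master"))
print(parse_command("heartbeat 127.0.0.1 3400 on"))
print(parse_command("leave 127.0.0.1 3400"))
```

After parsing, the local group member would forward the command to all other members, which is the part the PoC delegates to SafeSend().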

Isis2is I_S MySQL plugin
Local Isis2 client
Socket server
MySQL
Isis2is Plugin
Q: 12 'serverlist'
A: 58 '0 (54830) 127.0.0.1 3400 on master 30 07.08.2013 17:36:58'
mysql> SELECT * FROM INFORMATION_SCHEMA.ISIS2IS\G
***************** 1. row *****************
ERROR:
ERRNO: 0
ISIS2_MEMBER: (54830)
MYSQL_HOST: 127.0.0.1
MYSQL_PORT_OR_SOCKET: 3400
MYSQL_STATUS: on
MYSQL_ROLE: master
MYSQL_HEARTBEAT_TTL: 30
MYSQL_HEARTBEAT_LAST: 07.08.2013 17:36:58

The speaker says...
The Isis2is INFORMATION_SCHEMA plugin provides a list
of all MySQL servers in the cluster through the table
INFORMATION_SCHEMA.ISIS2IS. Clients can check the
table periodically for news, e.g. new servers or – not
implemented – server load to adapt their load balancing
dynamically. Whenever a client fails to connect to a MySQL
server, it can ask any of the other MySQL servers it knows
about for an update. If required, the client can automatically
adapt to membership changes: use new servers, switch to
a different master. No DBA action required to deploy clients.
The PECL/mysqlnd_ms code has been prepared for zero-
deployment/runtime configuration changes years ago.
Search the source for „hotloading“...
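The PoC ships no client code, so the following is only a sketch of what the client-side refresh could look like. `fetch_rows` is a hypothetical callable standing in for a real database call that runs SELECT * FROM INFORMATION_SCHEMA.ISIS2IS on a node. The column names are the ones shown in the table output, and the second server in the sample data is made up.

```python
def refresh_cluster_view(known_nodes, fetch_rows):
    """Ask known nodes one by one until one returns the server list.

    `fetch_rows(node)` is a hypothetical callable that queries the
    INFORMATION_SCHEMA.ISIS2IS table on `node` and returns the rows as
    dicts, or raises ConnectionError if the node is unreachable.
    """
    for node in known_nodes:
        try:
            rows = fetch_rows(node)
        except ConnectionError:
            continue  # node is down, try the next one we know about
        online = [r for r in rows if r["MYSQL_STATUS"] == "on"]
        masters = [(r["MYSQL_HOST"], r["MYSQL_PORT_OR_SOCKET"])
                   for r in online if r["MYSQL_ROLE"] == "master"]
        slaves = [(r["MYSQL_HOST"], r["MYSQL_PORT_OR_SOCKET"])
                  for r in online if r["MYSQL_ROLE"] != "master"]
        return {"masters": masters, "slaves": slaves}
    raise ConnectionError("none of the known nodes answered")


# Stand-in for a real database call, used only to make the sketch runnable.
def fake_fetch_rows(node):
    if node == ("127.0.0.1", "3400"):
        raise ConnectionError("old master is gone")
    return [
        {"MYSQL_HOST": "192.168.2.1", "MYSQL_PORT_OR_SOCKET": "3306",
         "MYSQL_STATUS": "on", "MYSQL_ROLE": "master"},
        {"MYSQL_HOST": "192.168.2.2", "MYSQL_PORT_OR_SOCKET": "3306",
         "MYSQL_STATUS": "on", "MYSQL_ROLE": "slave"},
    ]


view = refresh_cluster_view(
    [("127.0.0.1", "3400"), ("192.168.2.1", "3306")], fake_fetch_rows)
print(view)
```

Because every MySQL node exposes the same table, the client needs no extra protocol and no central monitor: the first reachable node is as good as any other.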

No more than an illustration of the idea
• NOT stable, NOT complete, NO sketch of client code
• Because: my first three days ever with C#/Mono hacking
• Because: the C#/Mono approach raises question marks
• Because: PoC... just fun, just spreading ideas!
• Neat, simple, no SPOF, zero-administration approach ?!
• Only 509 lines of C# for the Isis2 client socket server
• Only 462 lines of C/C++ for the two MySQL server plugins
• Available on blog.ulf-wendel.de
Proof of Concept Code

The speaker says...
The code of the Isis2 client socket server and the two
plugins is available at blog.ulf-wendel.de . It is PoC code to
illustrate the basic idea, no less, no more. The code is
neither complete (e.g. join not repeated if it fails initially,
and no failover logic) nor free of bugs (e.g. UNINSTALL
PLUGIN may crash under certain circumstances).
This, however, does not matter much when the focus is on
discussing different designs for HA solutions and suggesting
a mashup of two existing ones. A mashup with strong client
support for zero-administration.
Happy hacking!

THE END
Contact: ulf.wendel@oracle.com

The speaker says...
Thank you for your attendance!
Upcoming shows:
PHP Unconference
Hamburg, September 2013
PHP Summit
Munich, December 2013
