MySQL High Availability: Managing Farms of Distributed Servers (MySQL Fabric)

Mats Kindahl
Alfranio Correia
Narayanan Venkateswaran

21/09/2013 | Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Agenda
● MySQL High Availability Options
● MySQL Fabric – New kid on the block
● MySQL Fabric – Failure Detection and Failover
● MySQL Fabric-aware connectors
● MySQL Fabric – Playing with the new kid

MySQL High Availability Options

What Causes Downtime?
● System failures
– Server faults
– Software bugs or crashes
● Physical disasters
● Scheduled maintenance
● User errors

Effect and Impact
● Effect:
– Service unavailability
– Bad response time
● Impact:
– Revenue loss
– Negative impact on customer relationships
– Reduced employee productivity
– Regulatory issues

Another Amazon Outage Exposes the Cloud's Dark Lining
By Brad Stone - Bloomberg Businessweek
“The entire incident lasted all of 49 minutes...”

Causes of Downtime in Production MySQL Servers
By Baron Schwartz – Percona
“It is ironic but true that high-availability tools can cause downtime.”

Failures are inevitable, so design your systems with this in mind.

High Availability Solutions
● Primary-Secondary
● Shared Nothing Clusters
● Tightly-coupled Clusters

MySQL Replication in 5.6
Primary-Secondary
Characteristics
● Simple to configure
● Different platforms
● Configured over LAN or WAN
● No shared storage or virtual IP required
● Asynchronous replication: risk of data loss (unless using semi-sync)
● Performance overhead on the master
● No automatic failover or switchover (unless using MySQL Utilities)
[Diagram: one master replicating to four slaves]

MySQL Cluster
Shared Nothing Clusters
Characteristics
● Multi-master architecture
● No single point of failure
● Support for SQL and NoSQL interfaces
● Synchronous replication
[Diagram: MySQL Servers in front of the MySQL Cluster data nodes]

Tightly Coupled Clusters
● Provide an active/passive solution
● Examples:
– DRBD
– WSFC
– Solaris Clustering
– Oracle Virtual Machines

DRBD (Regular Operation)
Distributed Replicated Block Device
Characteristics
● Linux kernel module integrated into Oracle Linux
● Synchronous replication
● Only one MySQL server operational
[Diagram: two nodes, each running MySQL on top of DRBD, with Pacemaker and Corosync providing the cluster services]

DRBD (Failover)
Characteristics
● Cluster management system required
● Virtual IP migration
[Diagram: the same two-node Pacemaker/Corosync cluster during failover]

Windows Server Failover Clustering (Regular Operation)
Shared Storage
Characteristics
● Required:
– Windows Clustering
– Shared storage
● Only one MySQL server operational
● Virtual IP migration
● Shared storage used for voting
[Diagram: two servers, each running MySQL and Windows Clustering services, attached to shared storage that holds the vote, the data and the binary log]

MySQL Fabric – New kid on the block

MySQL Fabric
● Distributed framework
● Extensions are first-class citizens
● Supported by a variety of connectors
● Fault-tolerant solution
● You can suggest features, report bugs and contribute patches
● Still early alpha, long journey ahead
● Farms of MySQL 5.6 servers

Birds-eye View: Key Components
Characteristics
● Support for Primary-Secondary
● Focus on MySQL 5.6 and later
● Written in Python
[Diagram: the application talks to MySQL Fabric over XML-RPC and to the High Availability Groups over SQL]

Birds-eye View: Fabric-aware Connectors
Characteristics
● Fabric-aware connectors:
– Route transactions
– Cache information
– Currently Python, Java, PHP
[Diagram: the application, through a Fabric-aware connector, talks to MySQL Fabric over XML-RPC and to the High Availability Groups over SQL]
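The two connector duties listed above, routing transactions and caching information, can be sketched together. The class below is a hypothetical illustration, not the code of any Fabric-aware connector: it caches the result of a group lookup for a fixed time-to-live (TTL), routes read-write work to the group's master, and drops the cached entry when the application reports a failure, so the next call asks Fabric again. The `lookup` callable, its return shape and the TTL value are assumptions; the clock is injectable so the behavior can be tested deterministically.

```python
import time


class GroupCache(object):
    """Hypothetical connector-side cache of HA group membership."""

    def __init__(self, lookup, ttl=60.0, clock=time.monotonic):
        # lookup: assumed callable, group_id -> {"master": ..., "servers": [...]}
        self._lookup = lookup
        self._ttl = ttl          # seconds an entry stays valid (assumed value)
        self._clock = clock
        self._entries = {}       # group_id -> (expires_at, info)

    def get(self, group_id):
        now = self._clock()
        entry = self._entries.get(group_id)
        if entry is None or entry[0] <= now:
            # Missing or expired: ask Fabric (here, the lookup callable) again.
            info = self._lookup(group_id)
            self._entries[group_id] = (now + self._ttl, info)
            return info
        return entry[1]

    def master(self, group_id):
        """Route read-write transactions to the group's current master."""
        return self.get(group_id)["master"]

    def invalidate(self, group_id):
        """Forget a group after a connection failure so it is re-fetched."""
        self._entries.pop(group_id, None)
```

The TTL bounds how stale routing information can get, while `invalidate()` lets a failure report refresh the cache immediately instead of waiting for expiry.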

Architecture
Characteristics
● XML-RPC is widely available
● Extensible framework
● Failures taken into account
[Diagram: the MySQL Fabric framework (Executor and State Store/Persister, kept in a MySQL backing store), with extensions such as HA and sharding plugged in, and protocols such as MySQL, AMQP and XML-RPC in front]
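Because the front-end protocol is XML-RPC, any language with an XML-RPC library can drive Fabric, and Python ships one in its standard library. The sketch below is illustrative only: it starts a local XML-RPC server exposing a hypothetical `group.create` command (a stand-in, not Fabric's actual command set or signatures) and calls it the way a client or connector would call a Fabric node. It uses Python 3 module names; on Python 2 the equivalents are `SimpleXMLRPCServer` and `xmlrpclib`.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical in-memory state standing in for Fabric's state store.
GROUPS = {}


def group_create(group_id):
    """Hypothetical command: register an empty HA group."""
    if group_id in GROUPS:
        return False
    GROUPS[group_id] = []
    return True


def start_server():
    """Serve on an ephemeral localhost port in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False,
                                allow_none=True)
    # A dotted name lets clients call proxy.group.create(...).
    server.register_function(group_create, "group.create")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def create_group_remotely(port, group_id):
    """Client side: one XML-RPC call over HTTP."""
    with xmlrpc.client.ServerProxy("http://127.0.0.1:%d/" % port) as proxy:
        return proxy.group.create(group_id)
```

Usage: call `start_server()`, read the port from `server.server_address[1]`, call `create_group_remotely(port, "YYZ")`, and finish with `server.shutdown()` and `server.server_close()`.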

MySQL Fabric: Prerequisites
● MySQL Server 5.6.10 (or later) for:
– The backing store
– The managed servers
● Python 2.6 or 2.7
● MySQL Utilities 1.4.0
– Available at MySQL Labs (http://labs.mysql.com)
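The prerequisites can be checked programmatically before installing. The helpers below are hypothetical (our names, not part of MySQL Fabric or MySQL Utilities): one parses a server version string such as the value of `SELECT VERSION()` and compares it with the 5.6.10 minimum, the other checks the interpreter against the Python 2.6/2.7 requirement.

```python
import re
import sys

MIN_SERVER_VERSION = (5, 6, 10)      # minimum from the prerequisites above
SUPPORTED_PYTHON = {(2, 6), (2, 7)}  # interpreters listed above


def parse_server_version(version):
    """Turn a string such as '5.6.13-log' into a tuple like (5, 6, 13)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def server_is_supported(version):
    """True if the server version meets the 5.6.10 minimum."""
    return parse_server_version(version) >= MIN_SERVER_VERSION


def python_is_supported(version_info=sys.version_info):
    """True if the interpreter is Python 2.6 or 2.7."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHON
```

Comparing integer tuples rather than raw strings matters here: as strings, "5.6.9" would sort after "5.6.10".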

MySQL Fabric – Failure Detection and Failover

HA Overview
Characteristics
● Fabric keeps information on groups
● The application defines the group it will use
● Connection failures are propagated regularly
[Diagram: the application and an operator interact with MySQL Fabric, which manages a High Availability Group]

Failure Detection and Failover
● Current status:
– Simple failure detector/recovery per group
● Considering:
– Make connectors report failures
– Support external/custom failure detectors
– Improve the failover/switchover algorithm
– Extend servers/system to avoid the split-brain problem

Failure Detection
Enabled per group:
group = Group.fetch(self.__group_id)
for server in group.servers():
  if server.is_alive():
    continue
  if group.master == server.uuid:
    # Failover if the master has gone
    trigger("FAIL_OVER", [], self.__group_id)
  else:
    # Notification if the lost server is not the master
    trigger("SERVER_LOST", [], self.__group_id, server.uuid)
  # Server marked as faulty
  server.status = MySQLServer.FAULTY
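To see the detection loop end to end, here is a self-contained simulation. Everything in it is a stand-in: `Server`, `Group`, the `alive` flag and the `events` list replace Fabric's `MySQLServer`, `Group`, the real liveness check and `trigger()`, and the status names are assumptions. It applies the same rule as the excerpt: a dead master produces `FAIL_OVER`, any other dead server produces `SERVER_LOST`, and every dead server is marked faulty. Unlike the excerpt, it skips servers already marked faulty so repeated passes do not report the same failure twice; that refinement is ours.

```python
RUNNING, FAULTY = "RUNNING", "FAULTY"  # hypothetical status values


class Server(object):
    """Stand-in for a managed MySQL server."""

    def __init__(self, uuid, alive=True):
        self.uuid = uuid
        self.alive = alive
        self.status = RUNNING

    def is_alive(self):
        # A real detector would try to connect to the server here.
        return self.alive


class Group(object):
    """Stand-in for an HA group: the master's uuid plus its servers."""

    def __init__(self, group_id, master, servers):
        self.group_id = group_id
        self.master = master
        self._servers = list(servers)

    def servers(self):
        return list(self._servers)


def check_group(group, events):
    """One detection pass; records (event, args) tuples in events."""
    for server in group.servers():
        if server.status == FAULTY:
            continue  # already reported in an earlier pass (our refinement)
        if server.is_alive():
            continue
        if group.master == server.uuid:
            # Failover if the master has gone
            events.append(("FAIL_OVER", (group.group_id,)))
        else:
            # Notification if the lost server is not the master
            events.append(("SERVER_LOST", (group.group_id, server.uuid)))
        # Server marked as faulty either way
        server.status = FAULTY
```

A detector thread would call `check_group()` periodically, once per group with detection enabled.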

Failover
[Diagram, three steps over a master and four slaves, where each slave has applied a different subset of the master's transactions T1, T2 and T3:
1. Master fails
2. Choosing a candidate
3. Pointing to the new master]
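A plausible rule for choosing a candidate is to promote the slave that has applied the most of the failed master's transactions, since that minimizes what is lost. The sketch below assumes that rule (our illustration, not necessarily Fabric's exact algorithm) and models each server's executed GTID set as a Python set of transaction labels. The example data is our reading of the T1/T2/T3 diagram. After promotion, it reports which transactions each remaining slave still has to fetch from the new master.

```python
def choose_candidate(slaves):
    """Pick the slave whose set of applied transactions is largest.

    slaves maps a server name to the set of transaction ids it has applied.
    Ties are broken by name so the choice is deterministic.
    """
    if not slaves:
        raise ValueError("no slave available for promotion")
    # max() keeps the first maximum it meets, so sorting breaks ties by name.
    return max(sorted(slaves), key=lambda name: len(slaves[name]))


def repoint(slaves, new_master):
    """For each remaining slave, list what it still has to fetch."""
    master_set = slaves[new_master]
    return {name: sorted(master_set - executed)
            for name, executed in slaves.items()
            if name != new_master}
```

With GTIDs, a slave pointed at the new master can request exactly the transactions it is missing, provided the new master still has them in its binary log.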

Making Fabric Itself HA
● Current status:
– Fabric can automatically resume ongoing activities
– The backing store is not left in an inconsistent state
– Information is cached in the connector
● Considering:
– Replicated state machine among Fabric nodes
– Use MySQL Cluster as backing store

Crash-safe Procedures
[Diagram, three steps: the Executor of the MySQL Fabric framework runs a procedure (Step 1, Step 2, Step 3) and records its progress in the State Store (Persister) on the MySQL backing store:
1. Regular execution
2. Failover/recovery execution
3. Resuming execution]
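The idea behind crash-safe procedures can be sketched as follows: record each step's completion in the state store in the same transaction as the step's own changes, so a restarted executor can skip finished steps and resume at the first unfinished one. This minimal sketch uses `sqlite3` as a stand-in backing store (Fabric itself uses MySQL); the table layout, `run_procedure` and the simulated crash are hypothetical.

```python
import sqlite3


def init_store(conn):
    """Create the hypothetical checkpoint and work tables."""
    conn.execute("CREATE TABLE IF NOT EXISTS checkpoints ("
                 " proc_id TEXT, step INTEGER, PRIMARY KEY (proc_id, step))")
    conn.execute("CREATE TABLE IF NOT EXISTS log (entry TEXT)")
    conn.commit()


def run_procedure(conn, proc_id, steps):
    """Run steps in order, skipping those already checkpointed."""
    done = {row[0] for row in conn.execute(
        "SELECT step FROM checkpoints WHERE proc_id = ?", (proc_id,))}
    for number, step in enumerate(steps, start=1):
        if number in done:
            continue  # finished before the crash: do not redo it
        # One transaction for the step's work and its checkpoint:
        # commit on success, roll back both if the step raises.
        with conn:
            step(conn)
            conn.execute("INSERT INTO checkpoints VALUES (?, ?)",
                         (proc_id, number))


def make_step(name, fail=False):
    """Hypothetical step that logs its name, optionally simulating a crash."""
    def step(conn):
        conn.execute("INSERT INTO log VALUES (?)", (name,))
        if fail:
            raise RuntimeError("simulated crash in " + name)
    return step
```

Running a procedure whose second step crashes leaves only step 1's work and checkpoint behind; running it again with healthy steps executes steps 2 and 3 only.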

Writing a procedure
@_events.on_event(STEP_1)
def do_something(group_id):
    # The step runs in a transactional context
    _do_it(group_id)
    # Trigger the next step
    _events.trigger_within_procedure(STEP_2, group_id)

@do_something.undo
def undo_something(group_id):
    # Compensating operation for do_something
    _undo_it(group_id)
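The `on_event`/`undo` pairing above follows a compensation pattern: each step registers an operation that reverses it, and if a later step fails, the completed steps are undone in reverse order. The sketch below is a self-contained, hypothetical miniature of that pattern. `Step` and `run_with_compensation` are our names and do not reproduce Fabric's `_events` module. We also assume a failing step's own changes are rolled back by its transactional context, so only completed steps are compensated.

```python
class Step(object):
    """A procedure step with an optional compensating operation."""

    def __init__(self, func):
        self.func = func
        self.compensate = None

    def undo(self, func):
        # Used as a decorator, like @do_something.undo in the slide.
        self.compensate = func
        return func

    def __call__(self, *args):
        return self.func(*args)


def run_with_compensation(steps, *args):
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed = []
    try:
        for step in steps:
            step(*args)
            completed.append(step)
    except Exception:
        for step in reversed(completed):
            if step.compensate is not None:
                step.compensate(*args)
        raise
```

Re-raising after compensation keeps the failure visible to the caller instead of silently swallowing it.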

MySQL Fabric: Using MySQL Cluster
[Diagram: two MySQL Fabric nodes, each with its own Executor and State Store (Persister), sharing MySQL Cluster as the backing store]

MySQL Fabric: Using Replicated State Machine
[Diagram: three MySQL Fabric nodes, each with its own Executor, State Store (Persister) and MySQL backing store, kept consistent by a replicated state machine (RSM)]

MySQL Fabric-aware Connectors

Writing an application: Connecting to a Group
Use MySQLFabricConnection:
import mysql.connector.fabric as connector

conn = connector.MySQLFabricConnection(
    fabric={"host": "fabric.example.com", "port": 8080},
    user="mats", passwd="passwd", database="employees")
# Define a group
conn.set_property(group="YYZ")
# Get a cursor to the master in YYZ
cur = conn.cursor()

Connectors cannot hide failures
[Diagrams: a failure during a multi-statement transaction, and a failure during a single-statement transaction]

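Writing an application: Handling Connection Failures
Connectors cannot safely retry or reconnect:
from mysql.connector.errors import InterfaceError

try:
  conn.start_transaction()
  cur.execute('INSERT...')
  cur.execute('UPDATE...')
  conn.commit()
except InterfaceError as error:
  # The outcome of the transaction is unknown: get a new cursor
  # and let the application decide whether to retry
  cur = conn.cursor()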

Plan your application to retry after a failure.
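Because the connector cannot know whether an interrupted transaction committed, retrying is the application's decision. One common shape is to wrap the whole transaction in a function and re-run it a bounded number of times on connection errors, reconnecting each time. In this sketch, `connect` and `TransientError` are hypothetical placeholders for your connection setup and your connector's connection-failure exception (for Connector/Python, `InterfaceError` as above). The sketch also assumes the transaction is safe to re-run, for example because it is idempotent or first checks whether an earlier attempt committed.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for the connector's connection-failure error."""


def run_transaction(connect, work, attempts=3, delay=0.5, sleep=time.sleep):
    """Run work(conn) in a transaction, retrying on connection failures.

    connect: callable returning a fresh connection (hypothetical factory).
    work: callable issuing the statements; it must be safe to re-run after
          a failure whose outcome is unknown.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        conn = connect()  # fresh connection: may be routed to a new master
        try:
            conn.start_transaction()
            result = work(conn)
            conn.commit()
            return result
        except TransientError as error:
            last_error = error
            if attempt < attempts:
                sleep(delay * attempt)  # linear backoff between attempts
    raise last_error
```

The injectable `sleep` keeps the backoff testable; in production the default `time.sleep` gives a failover time to complete before the next attempt.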

Good practices
● Handle session information in the retry logic:
– Temporary tables
– Session variables
– Prepared statements
● Check the server's wait_timeout setting
● Do not set connection_timeout
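Temporary tables, session variables and prepared statements live in the server-side session, so they vanish when a failure forces a new connection, possibly to a different server. A simple way to handle them in the retry logic is to keep the session setup as an ordered list of statements and replay it on every new connection before retrying the transaction. `SessionSetup` is a hypothetical helper; it only needs a cursor-like object with an `execute()` method.

```python
class SessionSetup(object):
    """Record session-level statements and replay them after reconnecting."""

    def __init__(self):
        self._statements = []

    def add(self, statement):
        self._statements.append(statement)
        return self  # allow chaining

    def apply(self, cursor):
        # Replay in the original order: a prepared statement, for example,
        # may refer to a temporary table created earlier in the list.
        for statement in self._statements:
            cursor.execute(statement)


# Example setup (statements are illustrative):
setup = (SessionSetup()
         .add("SET SESSION sql_mode = 'STRICT_TRANS_TABLES'")
         .add("CREATE TEMPORARY TABLE tmp_ids (id INT PRIMARY KEY)")
         .add("PREPARE get_emp FROM "
              "'SELECT * FROM employees WHERE emp_no = ?'"))
```

In a retry loop, call `setup.apply(cursor)` right after obtaining the new connection's cursor.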

References
Blogs
● http://alfranio-distributed.blogspot.com/2013/09/writing-fault-tolerant-database.html
● http://alfranio-distributed.blogspot.com/2013/09/tips-to-build-fault-tolerant-database.html
Documents
● http://www.mysql.com/why-mysql/white-papers/mysql-guide-to-high-availability-solutions/
● http://dev.mysql.com/doc/workbench/en/mysql-utilities.html
Code
● MySQL Fabric available at http://labs.mysql.com/

MySQL Fabric – Playing with the new kid

Starting MySQL Servers: Quick Setup
● Use MTR
● Or do it manually, use a sandbox, whatever you like

rpl_fabric_gtid.cnf:
!include ../my.cnf
[mysqld.n]
report-host=localhost
log-slave-updates
innodb
gtid-mode=on
enforce-gtid-consistency
master-info-repository=TABLE

rpl_fabric_gtid.test:
--source include/have_innodb.inc
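If you start the servers manually rather than through MTR, each server needs its own option group with the same GTID-related settings plus a unique `server-id`, port, socket and data directory. The sketch below generates such groups. The base port, the directory layout and the `[mysqldN]` group naming (the form read by `mysqld_multi`) are assumptions. It also adds `log-bin`, because MySQL 5.6 requires binary logging when `gtid-mode=ON`.

```python
GTID_OPTIONS = [
    "report-host=localhost",
    "log-slave-updates",
    "log-bin",  # gtid-mode=ON in MySQL 5.6 also requires binary logging
    "gtid-mode=ON",
    "enforce-gtid-consistency",
    "master-info-repository=TABLE",
]


def option_groups(count, base_port=3307, datadir_root="/tmp/fabric"):
    """Return my.cnf text with one [mysqldN] group per sandbox server.

    base_port and datadir_root are assumed values for a local sandbox.
    """
    sections = []
    for n in range(1, count + 1):
        lines = ["[mysqld%d]" % n,
                 "server-id=%d" % n,  # must be unique per server
                 "port=%d" % (base_port + n - 1),
                 "socket=%s/mysqld%d.sock" % (datadir_root, n),
                 "datadir=%s/data%d" % (datadir_root, n)]
        lines.extend(GTID_OPTIONS)
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"
```

For example, `print(option_groups(3))` emits three groups that differ only in identity, port and paths, which is what a replication group of interchangeable servers needs.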

MySQL Fabric Installation: Quick Setup
● Python 2.6 or 2.7
● MySQL Utilities 1.4.0
● Check the configuration file

fabric.cfg:
[storage]
address = localhost:3306
user = fabric
password =
database = fabric
connection_timeout = 6
[protocol.xmlrpc]
address = localhost:8080
threads = 5
url = file:///var/log/fabric.log
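Checking the configuration file can be partly automated. The sketch below is ours, not a Fabric tool. It parses fabric.cfg-style text with Python 3's `configparser` (`ConfigParser` on Python 2) and reports missing sections or keys, plus a malformed XML-RPC address. The list of required keys is an assumption drawn from the sample above, not Fabric's own validation rules.

```python
import configparser

# Assumed minimum, based on the sample fabric.cfg above.
REQUIRED = {
    "storage": ["address", "user", "password", "database"],
    "protocol.xmlrpc": ["address"],
}


def check_fabric_config(text):
    """Return a list of problems found in fabric.cfg-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []
    for section, keys in sorted(REQUIRED.items()):
        if not parser.has_section(section):
            problems.append("missing section [%s]" % section)
            continue
        for key in keys:
            if not parser.has_option(section, key):
                problems.append("missing %s in [%s]" % (key, section))
    # has_option() returns False when the section itself is absent.
    if parser.has_option("protocol.xmlrpc", "address"):
        address = parser.get("protocol.xmlrpc", "address")
        host, _, port = address.rpartition(":")
        if not host or not port.isdigit():
            problems.append("protocol.xmlrpc address should be host:port")
    return problems
```

An empty value such as `password =` counts as present, which matches the sample's intent of an empty password.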

Run MySQL Fabric: Quick Setup
● Configure the state store
● Start Fabric
● Manage your groups

Terminal 1:
mysqlfabric manage setup
mysqlfabric manage start

Terminal 2:
mysqlfabric listcommands
mysqlfabric group create YYZ
mysqlfabric group add localhost:1300 root ''

Thoughts for the Future
● Connector multi-cast
– Scatter-gather
● Internal interfaces
– Improve extension support
– Improve procedures support
● Command-line interface
– Improving usability
– Focus on ease-of-use
● More protocols
– MySQL-RPC Protocol?
– AMQP?
● More frameworks?
● More HA group types
– DRBD
– MySQL Cluster
● Fabric-unaware connectors?

Thoughts for the Future
● “More transparent” sharding
– Single-query transactions
– Cross-shard joins are a problem
● Multiple shard mappings
– Independent tables
● Multi-way shard split
– Efficient initial sharding
– Better use of resources
● High-availability executor
– Node failure stops execution
– Replicated state machine
– Fail over to another Fabric node
● Distributed failure detector
– Connectors report failures
– Custom failure detectors

Thank you!

Your Feedback is Highly Appreciated!
http://forums.mysql.com/list.php?144
