Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
MySQL High Availability:
Managing Farms of Distributed Servers
(MySQL Fabric)
Mats Kindahl
Alfranio Correia
Narayanan Venkateswaran
3 | 21/09/2013 | Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
• MySQL High Availability Options
• MySQL Fabric – New kid on the block
• MySQL Fabric – Failure detection and Failover
• MySQL Fabric-aware connectors
• MySQL Fabric – Playing with the new kid
MySQL High Availability Options
What Causes Downtime?
• System Failures
– Server faults
– Software bugs or crashes
• Physical Disasters
• Scheduled Maintenance
• User Errors
Effect and Impact
• Effect:
– Service Unavailability
– Bad response time
• Impact:
– Revenue loss
– Negative impact on customer relationships
– Reduced employee productivity
– Regulatory issues
Another Amazon Outage Exposes the Cloud's Dark Lining
By Brad Stone - Bloomberg Businessweek
“The entire incident lasted all of 49 minutes...”
Causes of Downtime in Production MySQL Servers
By Baron Schwartz – Percona
“It is ironic but true that high-availability tools can cause downtime.”
Failures are inevitable, so design your systems taking this into account.
High Availability Solutions
• Primary-Secondary
• Shared Nothing Clusters
• Tightly-coupled Clusters
Primary-Secondary
MySQL Replication in 5.6
Characteristics
• Simple to configure
• Different Platforms
• Configured over LAN or WAN
• No Shared Storage or Virtual IP required
[Diagram: one master replicating to four slaves]
Primary-Secondary
MySQL Replication in 5.6
Characteristics
• Asynchronous Replication: risk of data loss (unless using semi-sync)
• Performance overhead to master
• No automatic failover or switchover (unless using MySQL Utilities)
[Diagram: one master replicating to four slaves]
Shared Nothing Clusters
MySQL Cluster
Characteristics
• Multi-master architecture
• No single point of failure
• Support for SQL and NoSQL Interfaces
• Synchronous replication
[Diagram: MySQL Cluster data nodes and MySQL servers]
Tightly Coupled Clusters
• Provide Active/Passive Solution
• Examples:
– DRBD
– WSFC
– Solaris Clustering
– Oracle Virtual Machines
Distributed Replicated Block Device
DRBD (Regular Operation)
Characteristics
• Linux Kernel module integrated into Oracle Linux
• Synchronous replication
• Only one MySQL operational
[Diagram: a cluster of two nodes, each with MySQL on top of DRBD; Pacemaker and Corosync provide the cluster services]
Distributed Replicated Block Device
DRBD (Failover)
Characteristics
• Cluster Management System required
• Virtual IP migration
[Diagram: a cluster of two nodes, each with MySQL on top of DRBD; Pacemaker and Corosync provide the cluster services]
Shared Storage
Windows Server Failover Clustering (Regular Operation)
Characteristics
• Required:
– Windows Clustering
– Shared Storage
• Only one MySQL Operational
• Virtual IP migration
• Shared storage used to vote
[Diagram: two servers, each running MySQL under Windows Clustering, attached to shared storage that holds the vote, the data, and the binary log]
MySQL Fabric – New kid on the block
MySQL Fabric
• Distributed framework
• Extensions are first-class citizens
• Supported by a variety of connectors
• Fault-tolerant solution
• You can suggest features, report bugs and contribute patches
• Still early alpha, long journey ahead
• Farms of MySQL 5.6 Servers
Birds-eye View
Key Components
Characteristics
• Support for Primary-Secondary
• Focus on MySQL 5.6 and later
• Written in Python
[Diagram: Application, MySQL Fabric, and High Availability Groups, connected by XML-RPC and SQL links]
Birds-eye View
Fabric-aware Connectors
Characteristics
• Fabric-aware connectors:
– Route Transactions
– Cache Information
– Currently Python, Java, PHP
[Diagram: Application, MySQL Fabric, and High Availability Groups, connected by XML-RPC and SQL links]
Architecture
Characteristics
• XML-RPC is widely available
• Extensible Framework
• Failures taken into account
[Diagram: the MySQL Fabric framework with an executor and a state store (persister); extensions Sh and HA; protocols MySQL, AMQP, and XML-RPC; MySQL as the backing store; "?" marks open slots]
MySQL Fabric: Prerequisites
• MySQL Servers 5.6.10 (or later):
– Backing Store
– Managed Servers
• Python 2.6 or 2.7
• MySQL Utilities 1.4.0
– Available at labs (http://labs.mysql.com)
MySQL Fabric – Failure Detection and Failover
HA Overview
Characteristics
• Fabric keeps information on groups
• Application defines the group that it will use
• Connection failures regularly propagated
[Diagram: Operator, Application, MySQL Fabric, and a High Availability Group]
Failure Detection and Failover
• Current Status:
– Simple failure detector/recovery per group
• Considering:
– Make connectors report failures
– Support external/custom failure detectors
– Improve failover/switchover algorithm
– Extend servers/system to avoid the split-brain problem
Failure Detection
Enabled per group
group = Group.fetch(self.__group_id)
for server in group.servers():
  if server.is_alive():
    continue
  if group.master == server.uuid:
    # Failover if master has gone
    trigger("FAIL_OVER", [], self.__group_id)
  else:
    # Notification if not master
    trigger("SERVER_LOST", [], self.__group_id,
            server.uuid)
  # Server marked as faulty
  server.status = MySQLServer.FAULTY
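The detection pass above can be exercised in isolation with a small sketch. The Server and Group classes, the status names, and the trigger callback below are stand-ins invented for illustration; they are not Fabric's actual classes, and only the control flow follows the slide.

```python
from dataclasses import dataclass, field

FAULTY = "FAULTY"    # stand-in for MySQLServer.FAULTY
RUNNING = "RUNNING"  # hypothetical status name


@dataclass
class Server:
    uuid: str
    alive: bool = True
    status: str = RUNNING

    def is_alive(self):
        return self.alive


@dataclass
class Group:
    group_id: str
    master: str
    servers: list = field(default_factory=list)


def check_group(group, trigger):
    """Run one detection pass over a group and report every dead server."""
    for server in group.servers:
        if server.is_alive():
            continue
        if group.master == server.uuid:
            # The master is gone: ask for a failover.
            trigger("FAIL_OVER", group.group_id)
        else:
            # A slave is gone: only notify.
            trigger("SERVER_LOST", group.group_id, server.uuid)
        server.status = FAULTY


events = []
group = Group("YYZ", master="uuid-1", servers=[
    Server("uuid-1", alive=False),
    Server("uuid-2"),
    Server("uuid-3", alive=False),
])
check_group(group, lambda *args: events.append(args))
```

With one dead master and one dead slave, the pass records one FAIL_OVER event and one SERVER_LOST event, and marks both servers as faulty.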
Failover
• Master fails
• Choosing a candidate
• Pointing to the new master
[Diagram: a master and four slaves; the master holds transactions T1 to T3, while the slaves have applied T1 to T3, T1, T1 to T2, and T1 respectively]
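One plausible way to choose a candidate is sketched below, assuming the slave that has applied the most transactions is preferred. The slides do not state the actual rule, so this is an assumption. Transactions are modelled as plain sets of labels rather than real GTID sets, and all function and server names are hypothetical.

```python
def choose_candidate(slaves):
    """Return the name of the slave that has applied the most transactions.

    `slaves` maps a server name to the set of transactions it has applied.
    Names are sorted first so that ties are broken deterministically.
    """
    if not slaves:
        raise ValueError("no slave available for promotion")
    return max(sorted(slaves), key=lambda name: len(slaves[name]))


def missing_on(slaves, candidate):
    """For every other slave, list what it still needs from the candidate."""
    return {
        name: sorted(slaves[candidate] - applied)
        for name, applied in slaves.items()
        if name != candidate
    }


# Same situation as in the diagram: slaves lag by different amounts.
slaves = {
    "slave1": {"T1", "T2", "T3"},
    "slave2": {"T1"},
    "slave3": {"T1", "T2"},
    "slave4": {"T1"},
}
candidate = choose_candidate(slaves)
backlog = missing_on(slaves, candidate)
```

Here slave1 is chosen, and the backlog shows what each remaining slave has to fetch once it points to the new master.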
Making Fabric Itself HA
• Current Status:
– Fabric can automatically resume on-going activities
– Backing store is not left in an inconsistent state
– Information is cached in the connector
• Considering:
– Replicated State Machine among Fabric nodes
– Use MySQL Cluster as backing store
Crash-safe Procedures
• Regular Execution
• Failover/Recovery Execution
• Resuming Execution
[Diagram: the executor runs Procedure Steps 1 to 3 inside the MySQL Fabric framework; the state store (persister) is backed by MySQL; extensions Sh and HA; protocols MySQL, AMQP, and XML-RPC]
Writing a procedure
@_events.on_event(STEP_1)
def do_something(group_id):
    # Transactional context
    _do_it(group_id)
    # Trigger the next step
    _events.trigger_within_procedure(STEP_2, group_id)

# Compensate operation
@do_something.undo
def undo_something(group_id):
    _undo_it(group_id)
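The pairing of a step with its compensating action can be illustrated with a toy executor. This is a minimal sketch, not Fabric's `_events` module: the Step class, the `step` decorator, and `run_procedure` are hypothetical, and the policy of undoing every attempted step in reverse order is an assumption made for the example.

```python
class Step:
    """A unit of work paired with an optional compensating action."""

    def __init__(self, action):
        self.action = action
        self.compensate = None

    def undo(self, func):
        # Used as a decorator, mirroring `@do_something.undo` on the slide.
        self.compensate = func
        return func


def step(func):
    return Step(func)


def run_procedure(steps, *args):
    """Run steps in order; on failure, compensate attempted steps in reverse."""
    attempted = []
    try:
        for current in steps:
            attempted.append(current)
            current.action(*args)
    except Exception:
        for done in reversed(attempted):
            if done.compensate is not None:
                done.compensate(*args)
        return False
    return True


log = []


@step
def add_server(group_id):
    log.append("add " + group_id)


@add_server.undo
def remove_server(group_id):
    log.append("remove " + group_id)


@step
def promote(group_id):
    raise RuntimeError("promotion failed")


ok = run_procedure([add_server, promote], "YYZ")
```

The second step fails, so the first step's compensation runs and the log ends with the addition followed by its removal.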
MySQL Fabric: Using MySQL Cluster
[Diagram: two MySQL Fabric nodes, each with its own executor and state store (persister), sharing MySQL Cluster as the backing store]
MySQL Fabric: Using Replicated State Machine
[Diagram: three MySQL Fabric nodes, each with an executor, a state store (persister), and its own MySQL backing store, linked by a replicated state machine (RSM)]
MySQL Fabric-aware Connectors
Writing an application
Connecting to a Group
import mysql.connector.fabric as connector
# Use MySQLFabricConnection
conn = connector.MySQLFabricConnection(
    fabric={"host": "fabric.example.com", "port": 8080},
    user='mats', passwd='passwd', database="employees")
# Define a group
conn.set_property(group='YYZ')
# Get a cursor to master in YYZ
cur = conn.cursor()
Connectors cannot hide failures
• Multi-statement transaction
• Single-statement transaction
Writing an application
Handling Connection Failures
try:
  conn.start_transaction()
  cur.execute('INSERT...')
  cur.execute('UPDATE...')
  conn.commit()
except InterfaceError as error:
  # Connectors cannot safely retry or reconnect,
  # so the application asks for a new cursor itself
  cur = conn.cursor()
Plan your application to retry after a failure.
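A retry loop of this kind might look like the sketch below. It retries the whole transaction on a fresh connection and re-runs the session setup each time, because temporary tables, session variables, and prepared statements do not survive a lost connection. ConnectionLost stands in for the connector's InterfaceError, and FakeConnection, `setup_session`, and `transfer` are hypothetical helpers that only exercise the loop.

```python
import time


class ConnectionLost(Exception):
    """Stand-in for the connector's InterfaceError."""


def run_with_retry(connect, setup_session, transaction, attempts=3, delay=0.0):
    """Run `transaction` on a fresh connection, retrying it as a whole."""
    last_error = None
    for _ in range(attempts):
        try:
            conn = connect()
            setup_session(conn)  # restore session state after every reconnect
            return transaction(conn)
        except ConnectionLost as error:
            last_error = error
            time.sleep(delay)
    raise last_error


class FakeConnection:
    """Fake connection whose first two statements fail."""

    failures_left = 2

    def __init__(self):
        self.session = {}

    def execute(self, statement):
        if FakeConnection.failures_left > 0:
            FakeConnection.failures_left -= 1
            raise ConnectionLost("server has gone away")
        return "ok: " + statement


def setup_session(conn):
    conn.session["sql_mode"] = "STRICT_ALL_TABLES"


def transfer(conn):
    assert conn.session, "session state must be restored before retrying"
    return conn.execute("UPDATE ...")


result = run_with_retry(FakeConnection, setup_session, transfer)
```

The transaction succeeds on the third attempt. A real application would also cap the total retry time and make sure the retried transaction is safe to run twice.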
Good practices
• Handle session information in the retry logic:
– Temporary tables
– Session variables
– Prepared statements
• Check the server's wait_timeout property
• Do not set connection_timeout
References
Blogs
• http://alfranio-distributed.blogspot.com/2013/09/writing-fault-tolerant-database.html
• http://alfranio-distributed.blogspot.com/2013/09/tips-to-build-fault-tolerant-database.html
Documents
• http://mysql.com/why-mysql/white-papers/mysql-guide-to-high-availability-solutions/
• http://dev.mysql.com/doc/workbench/en/mysql-utilities.html
Code
• MySQL Fabric available at http://labs.mysql.com/
MySQL Fabric – Playing with the new kid
Starting MySQL Servers
Quick Setup
• Use MTR
• Do it manually, use sandbox, whatever you like
rpl_fabric_gtid.cnf:
!include ../my.cnf
[mysqld.n]
report-host=localhost
log-slave-updates
innodb
gtid-mode=on
enforce-gtid-consistency
master-info-repository=TABLE
rpl_fabric_gtid.test:
--source include/have_innodb.inc
MySQL Fabric Installation
Quick Setup
• Python 2.6 or 2.7
• MySQL Utilities 1.4.0
• Check configuration file
fabric.cfg:
[storage]
address = localhost:3306
user = fabric
password =
database = fabric
connection_timeout = 6
[protocol.xmlrpc]
address = localhost:8080
threads = 5
url = file:///var/log/fabric.log
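To check a configuration file like the one above before starting Fabric, the settings can be read with the standard library. This is a sketch only: it uses Python 3's `configparser` (the module is named `ConfigParser` on the Python 2.6/2.7 versions that Fabric targets), the `split_address` helper is hypothetical, and the sketch says nothing about how Fabric itself loads the file.

```python
import configparser

# Configuration text copied from the slide.
FABRIC_CFG = """
[storage]
address = localhost:3306
user = fabric
password =
database = fabric
connection_timeout = 6

[protocol.xmlrpc]
address = localhost:8080
threads = 5
url = file:///var/log/fabric.log
"""


def split_address(address):
    """Split 'host:port' into a host name and an integer port."""
    host, _, port = address.rpartition(":")
    return host, int(port)


config = configparser.ConfigParser()
config.read_string(FABRIC_CFG)

storage_host, storage_port = split_address(config["storage"]["address"])
xmlrpc_host, xmlrpc_port = split_address(config["protocol.xmlrpc"]["address"])
timeout = config.getint("storage", "connection_timeout")
password = config["storage"]["password"]
```

An empty `password =` line parses as an empty string, which matches the passwordless setup shown on the slide and is only reasonable for a local test installation.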
Run MySQL Fabric
Quick Setup
• Configure the state store
• Start fabric
• Manage your groups
Terminal 1:
mysqlfabric manage setup
mysqlfabric manage start
Terminal 2:
mysqlfabric list-commands
mysqlfabric group create YYZ
mysqlfabric group add localhost:1300 root ''
Thoughts for the Future
• Connector multi-cast
• Scatter-gather
• Internal interfaces
• Improve extension support
• Improve procedures support
• Command-line interface
• Improving usability
• Focus on ease-of-use
• More protocols
• MySQL-RPC Protocol?
• AMQP?
• More frameworks?
• More HA group types
• DRBD
• MySQL Cluster
• Fabric-unaware connectors?
Thoughts for the Future
• “More transparent” sharding
• Single-query transactions
• Cross-shard joins are a problem
• Multiple shard mappings
• Independent tables
• Multi-way shard split
• Efficient initial sharding
• Better use of resources
• High-availability executor
• Node failure stops execution
• Replicated State Machine
• Fail over to other Fabric node
• Distributed failure detector
• Connectors report failures
• Custom failure detectors
Thank you!
Your Feedback is Highly Appreciated!
http://forums.mysql.com/list.php?144
