MySQL InnoDB cluster provides a complete high availability solution for MySQL.
MySQL Shell includes AdminAPI which enables you to easily configure and administer a group of at least three MySQL server instances to function as an InnoDB cluster.
Each MySQL server instance runs MySQL Group Replication, which provides the mechanism to replicate data within InnoDB clusters, with built-in failover.
MySQL Router can automatically configure itself based on the cluster you deploy, connecting client applications transparently to the server instances.
1. MySQL InnoDB Cluster
Olivier Dasini
MySQL Principal Solutions Architect EMEA
olivier.dasini@oracle.com
@freshdaz
Copyright 2017, Oracle and/or its affiliates. All rights reserved
A complete High Availability solution for MySQL
5. The world's most popular open source database
1. Google
2. Facebook
3. YouTube
4. Baidu
5. Yahoo!
6. Amazon
7. Wikipedia
8. QQ
9. Google.co.in
10. Twitter
11. Live.com
12. Taobao
13. Msn.com
14. Yahoo.co.jp
15. Sina
16. Linkedin.com
17. Google.co.jp
18. Weibo
19. Bing.com
20. Yandex.ru
Global Top 20 Sites: Powered by MySQL
Source: Wikipedia 2016
Powers the Web
7. They Scale with MySQL
• Mobile network supporting over 800 million subscribers
• 2+ billion active users
• 100 TB of user data for PayPal
• IDs processed for 1 billion citizens
• 850 million Candy Crush game plays/day
• 2 billion events/day for Booking.com
14. High Availability: Factors
• Environment
– Redundant servers in different datacenters and geographical areas will protect you against regional
issues—power grid failures, hurricanes, earthquakes, etc.
• Hardware
– Each part of your hardware stack—networking, storage, servers—should be redundant
• Software
– Every layer of the software stack needs to be duplicated and distributed across separate hardware and
environments
• Data
– Data loss and inconsistency/corruption must be prevented by having multiple copies of each piece of
data, with consistency checks and guarantees for each change
15. High Availability: The Causes of Downtime
– Software/Application: 40%
– Human Error: 40%
– Hardware: 20%
* Source: Gartner Group 1998 survey
A study by the Gartner Group projected that through 2015, 80% of downtime would be due to people and process issues.
16. High Availability: The Business Cost of Downtime
• Calculate a cost per minute of downtime
– Average revenue generated per-minute over a year
– Cost of not meeting any customer SLAs
– Factor in costs that are harder to quantify
1. Revenue
2. Reputation
3. Customer sentiment
4. Stock price
5. Service’s success
6. Company’s very existence
THIS is why HA
matters!
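As a back-of-the-envelope sketch, the per-minute revenue figure can be computed directly in MySQL; the $50M annual revenue used here is a purely hypothetical example:

```sql
-- Hypothetical: $50M annual revenue, spread evenly over the year
SELECT ROUND(50000000 / (365 * 24 * 60), 2) AS revenue_per_minute;
-- roughly $95 of revenue at risk for every minute of downtime
```

To this baseline you would then add the harder-to-quantify costs listed above.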
18. HA solutions with MySQL: Replication modes
• Asynchronous => MySQL Replication
— MySQL Default; In parallel: Master acks to app and sends transaction to slave
— Fast; Risk of lost changes if master dies
• Semi-Synchronous => MySQL Replication w/ semisynchronous plugin
— MySQL 5.5+, enhanced in MySQL 5.7; Serially: master waits for change to be received by slave, then in parallel acks to app and applies changes on slave
— Intermediate latency; Lossless (MySQL 5.7)
• Virtual-Synchronous => MySQL InnoDB Cluster w/ MySQL Group Replication
— MySQL Group Replication; Multi-master: updates on nodes applied in parallel
— MySQL Plugin delivered by MySQL for MySQL :)
— Intermediate latency; Best suited to small transactions; Lossless
• Synchronous => MySQL NDB Cluster
— Only available with MySQL Cluster; Serially: Master waits for change to be applied on all slaves before ack to app
— Higher latency; If Active/Active, best suited to small transactions; Lossless
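For reference, a sketch of enabling the semi-synchronous mode mentioned above (plugin file names shown are for Linux builds):

```sql
-- On the master:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = ON;

-- On each slave:
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = ON;
```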
23. Use MySQL Replication For HA
Redundancy: if the master (A) crashes, a slave (B) is promoted to be the new master and the remaining slave (C) is repointed to it.
Slave promotion can be managed by mysqlrpladmin or mysqlfailover (MySQL Utilities):
https://dev.mysql.com/downloads/utilities/
34. MySQL Group Replication: What Is It?
• Group Replication library
– Implementation of Replicated Database State Machine theory
• MySQL GCS is based on Paxos (variant of Mencius)
– Provides virtually synchronous replication for MySQL 5.7+
– Supported on all MySQL platforms
• Linux, Windows, Solaris, OSX, FreeBSD
“Single/Multi-master update everywhere replication plugin for MySQL with built-in automatic
distributed recovery, conflict detection and group membership.”
http://dasini.net/blog/2016/11/08/deployer-un-cluster-mysql-group-replication/
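To give a flavour of the plugin, bootstrapping a group on the first node looks like this (assuming the group_replication options are already configured in my.cnf):

```sql
-- On the first member only: bootstrap the group
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;

-- On every additional member: simply join
START GROUP_REPLICATION;
```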
35. MySQL Group Replication: What Does It Provide?
• A Highly Available distributed MySQL database service
– Clustering eliminates single points of failure (no SPOF)
• Allows for online maintenance
– Removes the need for handling server fail-over
– Provides fault tolerance
– Enables update-everywhere setups
– Automates group reconfiguration (handling of crashes, failures, re-connects)
– Provides a highly available replicated database
– Automatically ensures data consistency, i.e. no data loss
• Detects and handles conflicts
• Prevents data loss
• Prevents data corruption
36. MySQL Group Replication: Use Cases
• Elastic Replication
– Environments that require a very fluid replication infrastructure, where the
number of servers has to grow or shrink dynamically and with as little pain as possible.
• Highly Available Shards
– Sharding is a popular approach to achieve write scale-out. Users can use MySQL
Group Replication to implement highly available shards in a federated system.
Each shard can map into a Replication Group.
• Alternative to Master-Slave Replication
– Single-primary mode provides further automation on such setups
● Automatic PRIMARY/SECONDARY roles assignment
● Automatic new PRIMARY election on PRIMARY failures
● Automatic setup of read/write modes on PRIMARY and SECONDARIES
● Global consistent view of which server is the PRIMARY
37. MySQL Group Replication: What Sets It Apart?
• Built by the MySQL Engineering Team
– Natively integrated into Server: InnoDB, Replication, GTIDs, Performance Schema, SYS
– Built-in, no need for separate downloads
– Available on all platforms [Linux, Windows, Solaris, FreeBSD, etc]
• Better performance than similar offerings
– MySQL GCS has optimized network protocol that reduces the impact on latency
• Easier monitoring
– Simple Performance Schema tables for group and node status/stats
– Native support for Group Replication in MySQL Enterprise Monitor
• Modern full stack MySQL HA being built around it
38. MySQL Group Replication: Architecture
Node Types
R: Traffic routers/proxies: mysqlrouter, ProxySQL, HAProxy...
M: mysqld nodes participating in Group Replication
41. Full stack secure connections
• Following the industry standards, Group Replication supports secure connections along the
complete stack
– Client connections
– Distributed recovery connections
– Connections between members
• IP Whitelisting
– Restrict which hosts are allowed to connect to the group
– By default it is set to the value AUTOMATIC, which allows connections from private subnetworks active
on the host
http://mysqlhighavailability.com/mysql-group-replication-securing-the-perimeter/
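A sketch of restricting the whitelist to specific subnets (the addresses here are placeholders; the variable must be set while Group Replication is stopped on the member):

```sql
SET GLOBAL group_replication_ip_whitelist = '192.168.1.0/24,10.0.0.0/8';
```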
42. Prioritize member for the Primary Member Election
• group_replication_member_weight
– allows users to influence primary member election
– takes integer value between 0 and 100
– default value = 50
• The first primary member is still the member which bootstrapped the group irrespective of
group_replication_member_weight value.
http://mysqlhighavailability.com/group-replication-prioritise-member-for-the-primary-member-election/
node1> SET GLOBAL group_replication_member_weight= 90;
node2> SET GLOBAL group_replication_member_weight= 70;
43. Parallel applier support
• Group Replication now also takes full advantage of parallel binary log applier infrastructure
– Reduces applier lag and improves replication performance considerably
– Configured in the same way as asynchronous replication
slave_parallel_workers=<NUMBER>
slave_parallel_type=logical_clock
slave_preserve_commit_order=ON
44. Single Primary Mode
• Configuration mode that makes a single member act as a writeable master (PRIMARY) and the rest of the members
act as hot-standbys (SECONDARIES)
– The group itself coordinates automatically to figure out which is the member that will act as the PRIMARY, through an
automatic primary election mechanism
– Secondaries are automatically set to read-only
• Single-primary mode is the default mode
– Closer to classic asynchronous replication setups, simpler to reason about in the beginning
– Avoids some limitations of multi-primary mode by default
• The current PRIMARY member UUID can be known by executing the following SQL statement:
mysql> SELECT * FROM performance_schema.global_status WHERE VARIABLE_NAME='group_replication_primary_member'\G
*************************** 1. row ***************************
VARIABLE_NAME: group_replication_primary_member
VARIABLE_VALUE: dcd3b36b-79c5-11e6-97b8-00212844d44e
45. Multi-primary Mode
• Configuration mode that makes all members writeable
– Enabled by setting option --group_replication_single_primary_mode to OFF
• Any two transactions on different servers can write to the same tuple
• Conflicts will be detected and dealt with
– First committer wins rule
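A sketch of enabling the mode, run on each member while Group Replication is stopped (both settings must be identical on all members):

```sql
SET GLOBAL group_replication_single_primary_mode = OFF;
SET GLOBAL group_replication_enforce_update_everywhere_checks = ON;
```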
46. MySQL Group Replication: Transaction life cycle
• Execute locally
– Group Replication starts replicating a transaction when it is ready to commit, just before it is written to the binary log
• Send in-order to all members
– At that point, transactions are broadcast to the network using a group communication protocol (Paxos, similar to the
Mencius variant)
• Certify independently
– All members receive the transactions in order and execute a deterministic certification algorithm to check whether the
received transaction can be applied safely
• Apply asynchronously
– On remote members, successfully certified transactions are written to the relay log and asynchronously applied by the
members, just as happens for other replication methods
– On the local member, the prepared transaction is committed to the storage engine
50. MySQL Group Replication: Traditional vs Optimistic locking
– Traditional locking: Trx 2 waits on Trx 1 commit
– Optimistic locking: Trx 2 proceeds, and the losing transaction is rolled back at commit time:
ERROR 1180 (HY000): Got error 149 during COMMIT
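A minimal sketch of how such a certification conflict surfaces in multi-primary mode (table and node names are hypothetical):

```sql
-- node1 and node2 are two primaries in multi-primary mode
-- node1:
UPDATE t1 SET val = 'a' WHERE id = 1;   -- certifies first: wins
-- node2, concurrently:
UPDATE t1 SET val = 'b' WHERE id = 1;
-- the later transaction fails certification at commit:
-- ERROR 1180 (HY000): Got error 149 during COMMIT
```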
51. Flow Control
• Optimize the system as a whole
– Sometimes it is beneficial to delay some parts of a distributed system to improve the throughput of the system as a whole
– In MySQL Group Replication, it is used to:
● keep writers operating below the sustained capacity of the system;
● reduce buffering stress on the replication pipeline;
● protect the correct execution of the system.
• Designed as a safety measure
– Throttling will never be active while the system is operating below its sustained capacity
• To better support unbalanced systems and unfriendly workloads
– Keep members closer for faster failover
– Keep members closer for reduced replication lag
– Reduce the number of transaction aborts in multi-master
• Make sure new members can always join write-intensive groups
– Nodes entering the group need to catch up on previous work while also storing current work to apply later
– If excess capacity is not available, the cluster will need to be put at lower throughput for new members to catch up
52. Flow Control options
• MySQL 5.7 – basic configuration options
– group_replication_flow_control_mode = QUOTA | DISABLED
– group_replication_flow_control_certifier_threshold = 0..n
– group_replication_flow_control_applier_threshold = 0..n
• Certifier/applier thresholds
– The thresholds are the point at which the flow-control system will delay the writes at the master
– The default is set to 25000 and should be kept larger than one second of sustained commit rate
– But some members will be up to 25000 transactions delayed, if, and only if, they are unable to keep up with the writer members
• MySQL 8.0.3 introduces additional options to fine-tune the heuristics
– group_replication_flow_control_min_quota = X commits/s
– group_replication_flow_control_min_recovery_quota = X commits/s
– group_replication_flow_control_max_commit_quota = X commits/s
– group_replication_flow_control_member_quota_percent = Y %
– group_replication_flow_control_period = Z seconds
– group_replication_flow_control_hold_percent = Y %
– group_replication_flow_control_release_percent = Y %
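For example, inspecting and tuning the applier threshold at runtime (the value shown is illustrative, not a recommendation):

```sql
SHOW GLOBAL VARIABLES LIKE 'group_replication_flow_control%';
-- Lower the applier threshold so throttling kicks in earlier:
SET GLOBAL group_replication_flow_control_applier_threshold = 10000;
```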
53. MySQL Group Replication: Requirements
• InnoDB Storage Engine
– Data must be stored in the InnoDB transactional storage engine
• Primary Keys
– Every table that is to be replicated by the group must have an explicit primary key defined
• IPv4 Network
– The group communication engine used by MySQL Group Replication only supports IPv4
• Network Performance
– Group Replication is designed to be deployed in a cluster environment where server instances are very close
to each other, and is impacted by both network latency and network bandwidth
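The first two requirements can be checked up front; a sketch using information_schema, with the system schemas excluded:

```sql
-- Tables not using InnoDB
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE engine <> 'InnoDB' AND table_type = 'BASE TABLE'
  AND table_schema NOT IN ('mysql','information_schema','performance_schema','sys');

-- Tables without an explicit primary key
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE c.table_name IS NULL AND t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('mysql','information_schema','performance_schema','sys');
```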
64. Hardware and Infrastructure Notes
• 3, 5, or 7 machines per group
– Isolate machine resources as much as possible
– Limit virtualization layers
– Machines configured for dedicated database server role
• Recommended configuration
– 32-64 vCPUs with fast CPU clock (2.5GHz+)
– SSDs (for data and replication logs)
– High quality network connection between each machine
• Low latency, high throughput, reliable
• Limit routers and hubs as much as possible
• Isolated and dedicated network when possible
65. Shared Nothing Cluster – Single Data Center
Application Servers
MySQL Router in Stack
MySQL Database Service
Group Replication
66. Shared Nothing Cluster Active / Passive – Cross Data Center
MySQL Database Service
Group Replication
Active Data Center Backup Data Center
Clients
67. Shared Nothing Cluster – Cross Data Center
MySQL Database Service
Group Replication
Data Center 1 Data Center 2
Clients
82. MySQL Enterprise Monitor 4.0
• Native holistic support for Group Replication clusters
– Intelligent monitoring and alerting
– Topology views
– Detailed metrics and graphs
– Best Practice advice
• Monitoring of MySQL Group Replication
83. MySQL Enterprise Monitor
• Group Replication with 3 online nodes
• Group Replication with 3 nodes :
– 2 online
– 1 unreachable
84. MySQL Enterprise Monitor
• Asynchronous replication between
– Group Replication cluster : master
– Standalone instances : slaves
85. MySQL Enterprise Monitor
• Asynchronous replication between
– Group Replication cluster 1 : master
– Group Replication cluster 2 : slave
86. Monitoring: replication_group_member_stats
• Useful to understand:
– how the applier queue is growing
– how many conflicts have been found
– how many transactions were checked
– which transactions are committed everywhere
● Important for monitoring the performance of the members connected in the group
node1> SELECT * FROM performance_schema.replication_group_member_stats\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
VIEW_ID: 14845735801161197:3
MEMBER_ID: 00014001-1111-1111-1111-111111111111
COUNT_TRANSACTIONS_IN_QUEUE: 0
COUNT_TRANSACTIONS_CHECKED: 0
COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: 4e0f05b7-d9d0-11e6-87cf-002710cccc64:1-2
LAST_CONFLICT_FREE_TRANSACTION:
87. Monitoring: replication_group_members 1/2
• Used for monitoring the status of the different server instances that are tracked in the current view
node1> SELECT * FROM performance_schema.replication_group_members\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 00014001-1111-1111-1111-111111111111
MEMBER_HOST: localhost
MEMBER_PORT: 14001
MEMBER_STATE: ONLINE
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 00014002-2222-2222-2222-222222222222
MEMBER_HOST: localhost
MEMBER_PORT: 14002
MEMBER_STATE: ONLINE
*************************** 3. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 00014003-3333-3333-3333-333333333333
MEMBER_HOST: localhost
MEMBER_PORT: 14003
MEMBER_STATE: ONLINE
88. Monitoring: replication_group_members 2/2
• replication_group_members table is updated whenever there is a view change
• There are various states that a server instance can be in
• If servers are communicating properly, all report the same states for all servers
• If there is a network partition, or a server leaves the group, then different information may be reported,
depending on which server is queried
MEMBER_STATE values:
– ONLINE: the member is ready to serve as a fully functional group member, meaning that clients can connect and start executing transactions (group synchronized: yes)
– RECOVERING: the member is in the process of becoming an active member of the group and is currently going through the recovery process, receiving state information from a donor (group synchronized: no)
– OFFLINE: the plugin is loaded but the member does not belong to any group (group synchronized: no)
– ERROR: the state of the local node; whenever there is an error during the recovery phase or while applying changes, the server enters this state (group synchronized: no)
– UNREACHABLE: whenever the local failure detector suspects that a given server is not reachable, e.g. because it has crashed or was disconnected involuntarily, it shows that server's state as UNREACHABLE (group synchronized: no)
89. Monitoring: replication_connection_status
• Shows information regarding Group Replication:
– transactions that have been received from the group and queued in the applier queue (the relay log)
– recovery
node1> SELECT * FROM performance_schema.replication_connection_status\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
GROUP_NAME: 4e0f05b7-d9d0-11e6-87cf-002710cccc64
SOURCE_UUID: 4e0f05b7-d9d0-11e6-87cf-002710cccc64
THREAD_ID: NULL
SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 0
LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00
RECEIVED_TRANSACTION_SET: 4e0f05b7-d9d0-11e6-87cf-002710cccc64:1-2
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
90. Monitoring: replication_applier_status
• The state of the Group Replication related channels and thread
• If there are many different worker threads applying transactions then the worker tables can
also be used to monitor what each worker thread is doing
node1> SELECT * FROM performance_schema.replication_applier_status\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
SERVICE_STATE: ON
REMAINING_DELAY: NULL
COUNT_TRANSACTIONS_RETRIES: 0