Group Replication: A Journey to the Group Communication Core

Describes the design decisions behind the Paxos-based implementation used by Group Replication.

  1. Group Replication: A Journey to the Group Communication Core
     Alfranio Correia (alfranio.correia@oracle.com), Principal Software Engineer
     4th of February, Oracle / Fosdem 2017
     Copyright © 2017, Oracle and/or its affiliates. All rights reserved.
  2. Safe Harbor Statement
     The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  3. Program Agenda
  4. Program Agenda
     1. Background
     2. Group Communication Interface
     3. Group Communication Engine
     4. Performance
     5. Conclusion
  5. Background
  6. MySQL InnoDB Cluster
     [Diagram: applications, each using a MySQL Connector and MySQL Router, in front of HA ReplicaSets 1, 2 and 3 (servers S1, S2, S3, S4, …), plus MySQL Shell]
  7. MySQL Group Replication
     • What is MySQL Group Replication?
       “Multi-master update everywhere replication plugin for MySQL with built-in automatic distributed recovery, conflict detection and group membership.”
     • What does the MySQL Group Replication plugin do for the user?
       – Automates server failover in single-primary mode
       – Provides fault tolerance
       – Enables update-everywhere setups
       – Automates group reconfiguration (handling of crashes, failures, re-connects)
       – Provides a highly available replicated database
  8. Major Building Blocks
     [Diagram: a group of members (M); labels: MySQL Server, Replication Plugin API, Com. API, Group Comm. System (Corosync), Group Com. Engine]
  9. The Complete Stack
     [Diagram labels: MySQL Server; MySQL APIs: Lifecycle / Capture / Applier; InnoDB; Performance Schema Tables: Monitoring; Replication Plugin API; Plugin: Capture, Applier, Conflicts Handler, Recovery, Replication Protocol; Group Com. API; Group Com. Binding; Group Com. Engine; Group Comm. System (Corosync); Network]
  10. Group Communication Interface
  11. Design
      • Abstract interface to support different solutions
        – Reconfigure the group and get membership information
        – Send and receive messages
      • Uses the observer pattern
        – MySQL Group Replication listens to events
      • One implementation per communication system
      • Made the transition from Corosync easy
  12. Semantics
      • Closed Group
        – Only group members can send and receive messages
      • Total Order
        – All members deliver messages in the same order
      • Safe Delivery
        – A member cannot deliver a message unless a majority can do so
      • View Synchrony
        – Changes to membership are totally ordered with messages
  13. Group Communication Engine
  14. Built-in Communication Engine
      • Based on proven distributed systems algorithms (Paxos)
        – Compression, multi-platform, dynamic membership, SSL, IP whitelisting
      • No third-party software required
      • No network multicast support required
        – MySQL Group Replication can operate on cloud-based installations where multicast is unsupported
  15. Paxos Family and Friends
      Multi-Paxos, Fast Paxos, Disk Paxos, Cheap Paxos, Vertical Paxos, Generalized Paxos, Raft, Mencius, Flexible Paxos, Egalitarian Paxos, Byzantine Paxos
  16. Basic Paxos
      [Diagram: members M0, M1 and M2 going through (1) the Prepare/Election phase, (2) the Accept phase and (3) the Learn phase]
      • Get agreement on a value:
        – Next message/transaction to be delivered
      • Members may have different roles:
        – Usually all members are proposers, acceptors and learners
      • Need a quorum to make progress
        – Usually a majority
  17. Prepare Phase
      • The proposer sends a prepare request with number “n” to the members (i.e. acceptors)
      • If an acceptor has not received a request with a number greater than “n”, it responds
      • It promises not to accept a request numbered less than “n”
      • If the replies carry previously accepted values, the leader uses the one with the highest number
      [Diagram: 1.1 Prepare: the proposer sends (n) to the other members; 1.2 Promise: they reply with (x, value) and (y, value)]
  18. Accept Phase
      • If the leader finds out that a non-empty value has been previously accepted, it uses that value
      • Otherwise, it proposes a new value
      • Requires a network round-trip to get agreement
      [Diagram: 2.1 Accept: (n, value) sent to the other members; 2.2 Accepted: they reply (ack)]
  19. Learn Phase
      • The leader informs the other members about the decision
      • Only one learner is required to make progress
      • If a member already has the value, an ack is enough
      [Diagram: 3 Learn: (value) sent to the other members]
  20. Multi-Paxos
      [Diagram: a log of slots 0, 1, 2, 3, …; each slot is filled by an Accept/Learn round among members 0, 1 and 2, with Election rounds interleaved]
      • Consensus round to decide on each slot’s content
      • Replicated Log Stream
  21. So what?
      • Leaders can easily become a bottleneck
      • Multiple leaders: eXtended COMmunications (XCOM)
  22. How does XCOM work?
      [Diagram: slots 0, 1, 2, 3, 4, 5, …, each decided by an Accept/Learn round among members 0, 1 and 2]
      • Every member is a leader, so there is no leader election
      • Every member owns an in-memory replicated log
  23. Nothing to Propose
      [Diagram: slots 0 to 5; slots 1 and 4 hold a “nop” and are decided by a single Learn message, while the other slots use Accept/Learn rounds]
      • When a member has nothing to propose for its slot, a learn message with a “nop” is enough
  24. How is the optimization possible?
      • Member “1” sends a learn message “(0, nop)” to member “4” and dies
      • Non-leaders can only propose “nop”(s) on behalf of others
      • They must go through all Paxos phases
      [Diagram: members 0 to 4. Learn (0, nop) from member 1 to member 4, followed by a full round: Prepare (1), Promise (0, -), Accept (1, nop), Accepted (ack), Learn (nop)]
  25. Handling Failures/Suspicions
      [Diagram: slots 0 to 5, … decided among members 0, 1 and 2; slot 3 holds a “nop”; one round uses the full Prep./Accept/Learn sequence while the others use Accept/Learn]
  26. Implemented Optimizations in XCOM
      • Pipeline
        – Proposes several “transactions” in parallel
        – Improves performance in high-latency networks
        – Current value is “10”
      • Batch
        – Improves CPU usage
        – Improves performance in high-latency/low-bandwidth networks
        – Current value is “5”
  27. Implemented Optimizations in Binding
      • Compression
        – Reduces bandwidth consumption
      • Automatically reconfigures a group
        – Faulty members are expelled
  28. Performance
  29. Configuration
      • Multiple writers – One per server
      • Single writer – Just one client
      • Oracle Server X5-2L with two Intel Xeon E5-2660-V3 processors
        – 20 Cores
        – 40 Hardware Threads
      • Oracle Enterprise Linux 7, kernel 3.8.13-118.13.3
      • 10 Gbps Ethernet
      • Used “tc” to throttle the network
  30. Multiple writers (256 Bytes)
      [Chart: messages per second sent (axis 0 to 160,000) with 3, 5 and 7 members, uncompressed vs compressed 256-byte payload, on a 10Gbps network with 0.1ms latency and on a 200Mbps network with 7ms latency]
      • Compression improves performance in the metropolitan-area setup (200Mbps, 7ms latency)
      • Headers (~200 bytes), however, are not compressed
  31. Multiple writers (1K Bytes)
      • Check whether compression may help or not
      • It usually helps when bandwidth is a problem
      [Chart: messages per second sent (axis 0 to 120,000) with 3, 5 and 7 members, uncompressed vs compressed 1K payload, on a 10Gbps network with 0.1ms latency and on a 200Mbps network with 7ms latency]
  32. Single Writer (1K Bytes)
      [Chart: messages per second sent (axis 0 to 120,000) with 3, 5 and 7 members, uncompressed vs compressed 1K payload, on a 10Gbps network with 0.1ms latency and on a 200Mbps network with 7ms latency]
      • The scale-out effect with multiple writers is small
      • Compression does not help here
  33. Conclusion
  34. Current Status
      • Made it into the MySQL 5.7.17 release
      • GA in December 2016
  35. Future
      • Configurable Paxos role(s)
        – Leader/Acceptor/Learner or Acceptor/Learner or Learner
      • Multiple leaders only if needed:
        – Avoids the skip message
        – Improves CPU and network usage
      • Not all members need to make messages network durable
        – Reduces resilience but improves performance
  36. Future
      • Expose some configuration options:
        – Batch
        – Pipeline
      • Compression at lower-level layers as well
      • Write to the network in parallel
      • Overlay networks
  37. Where to go from here?
      • Packages
        – http://www.mysql.com/downloads/
      • Documentation
        – http://dev.mysql.com/doc/refman/5.7/en/group-replication.html
      • Blogs from the Engineers (news, technical information, and much more)
        – http://mysqlhighavailability.com
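Slide 11 describes an abstract group communication interface that MySQL Group Replication observes through callbacks. The Python sketch below illustrates that observer-style design under stated assumptions: every class and method name (`GroupCommunication`, `GroupEventListener`, `LoopbackCommunication`, `RecordingListener`) is hypothetical and is not the real Group Replication API, and the loopback engine only stands in for a real communication system.

```python
# Hypothetical sketch of an abstract group communication interface using the
# observer pattern. Not the actual MySQL Group Replication API.
from abc import ABC, abstractmethod
from typing import List


class GroupEventListener(ABC):
    """Observer: the replication layer implements these callbacks."""

    @abstractmethod
    def on_message(self, sender: str, payload: bytes) -> None: ...

    @abstractmethod
    def on_view_change(self, members: List[str]) -> None: ...


class GroupCommunication(ABC):
    """Subject: one concrete subclass per communication system."""

    def __init__(self) -> None:
        self._listeners: List[GroupEventListener] = []

    def add_listener(self, listener: GroupEventListener) -> None:
        self._listeners.append(listener)

    @abstractmethod
    def join(self, group: str) -> None: ...

    @abstractmethod
    def send(self, payload: bytes) -> None: ...

    # Helpers that concrete engines call to notify every observer.
    def _notify_message(self, sender: str, payload: bytes) -> None:
        for listener in self._listeners:
            listener.on_message(sender, payload)

    def _notify_view(self, members: List[str]) -> None:
        for listener in self._listeners:
            listener.on_view_change(members)


class LoopbackCommunication(GroupCommunication):
    """Toy single-process engine, for illustration only."""

    def __init__(self, member_id: str) -> None:
        super().__init__()
        self.member_id = member_id
        self.members: List[str] = []

    def join(self, group: str) -> None:
        self.members = [self.member_id]
        self._notify_view(list(self.members))

    def send(self, payload: bytes) -> None:
        # Loopback: the message is delivered back to local observers only.
        self._notify_message(self.member_id, payload)


class RecordingListener(GroupEventListener):
    """Observer that just records the events it receives."""

    def __init__(self) -> None:
        self.events: List[tuple] = []

    def on_message(self, sender: str, payload: bytes) -> None:
        self.events.append(("message", sender, payload))

    def on_view_change(self, members: List[str]) -> None:
        self.events.append(("view", tuple(members)))
```

Swapping `LoopbackCommunication` for another subclass leaves the listener untouched, which is the property slide 11 credits with making the transition from Corosync easy.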
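Two of the guarantees on slide 12 can be stated as simple checks. This hypothetical sketch assumes a lagging member may have delivered only a prefix of what the others delivered: `totally_ordered` tests the Total Order property across per-member delivery logs, and `safe_to_deliver` encodes the majority rule of Safe Delivery. Neither function is part of MySQL.

```python
# Hypothetical checks for two group communication guarantees.
from typing import Dict, List, Sequence, Set


def totally_ordered(deliveries: Dict[str, Sequence[str]]) -> bool:
    """True if all members delivered messages in the same relative order.

    Assumption: members may lag, so each log must be a prefix of the longest.
    """
    longest = max(deliveries.values(), key=len, default=[])
    return all(list(seq) == list(longest[: len(seq)]) for seq in deliveries.values())


def safe_to_deliver(acks: Set[str], group: List[str]) -> bool:
    """Safe delivery: deliver only once a majority of the group holds the message."""
    return len(acks & set(group)) > len(group) // 2
```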
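The acceptor and leader rules of the Prepare phase (slide 17) can be sketched as below. This is a single-decree illustration under assumptions: proposal numbers are plain integers, `Acceptor`, `Promise` and `choose_value` are hypothetical names, and, following the slide's wording ("has not received a request with a number greater than n"), an acceptor also answers a repeated prepare carrying the same number.

```python
# Hypothetical single-decree sketch of the Paxos Prepare phase.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Promise:
    # Highest-numbered proposal this acceptor has accepted so far, if any.
    accepted_n: Optional[int]
    accepted_value: Optional[str]


class Acceptor:
    def __init__(self) -> None:
        self.promised_n = -1  # highest prepare number promised so far
        self.accepted_n: Optional[int] = None
        self.accepted_value: Optional[str] = None

    def on_prepare(self, n: int) -> Optional[Promise]:
        """Promise not to accept proposals numbered below n, or ignore."""
        if n < self.promised_n:
            return None  # a higher-numbered prepare was already seen
        self.promised_n = n
        return Promise(self.accepted_n, self.accepted_value)


def choose_value(promises: List[Promise], own_value: str) -> str:
    """Leader rule: reuse the accepted value with the highest number, if any."""
    accepted = [p for p in promises if p.accepted_n is not None]
    if not accepted:
        return own_value
    return max(accepted, key=lambda p: p.accepted_n).accepted_value
```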
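For the Accept phase (slide 18), a minimal sketch of one round-trip with majority counting follows. It assumes the acceptors are local objects rather than remote members, and `Acceptor` and `run_accept_phase` are illustrative names only.

```python
# Hypothetical sketch of the Paxos Accept phase with majority counting.
from typing import List, Optional, Tuple


class Acceptor:
    def __init__(self, promised_n: int = -1) -> None:
        self.promised_n = promised_n
        self.accepted: Optional[Tuple[int, str]] = None  # (n, value)

    def on_accept(self, n: int, value: str) -> bool:
        """Accept (n, value) unless a higher-numbered prepare was promised."""
        if n < self.promised_n:
            return False  # nack: accepting would break an earlier promise
        self.promised_n = n
        self.accepted = (n, value)
        return True  # ack


def run_accept_phase(acceptors: List[Acceptor], n: int, value: str) -> bool:
    """One network round-trip: send Accept(n, value) and count the acks."""
    acks = sum(1 for a in acceptors if a.on_accept(n, value))
    return acks > len(acceptors) // 2  # the value is chosen once a majority accepts
```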
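The Learn-phase shortcut on slide 19 (an ack is enough when a member already has the value) might look like this. `Member`, `on_learn` and `broadcast_learn` are hypothetical names, and the sketch assumes that a member's accepted value for a slot is the decided one whenever the two compare equal.

```python
# Hypothetical sketch of the Learn phase with the ack-only shortcut.
from typing import Dict, List, Optional


class Member:
    def __init__(self) -> None:
        self.accepted: Dict[int, str] = {}  # slot -> value accepted in the Accept phase
        self.decided: Dict[int, str] = {}   # slot -> value known to be chosen

    def on_learn(self, slot: int, value: Optional[str] = None) -> bool:
        """Record a decision; value=None models the short ack-only learn."""
        if value is None:
            if slot not in self.accepted:
                return False  # nothing to confirm: the full value is needed
            value = self.accepted[slot]
        self.decided[slot] = value
        return True


def broadcast_learn(slot: int, value: str, members: List[Member]) -> None:
    """Leader side: send an ack to members holding the value, else the value."""
    for member in members:
        if member.accepted.get(slot) == value:
            member.on_learn(slot)         # ack is enough
        else:
            member.on_learn(slot, value)  # ship the full value
```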
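Multi-Paxos turns single decisions into a replicated log (slide 20). A rough sketch, assuming each slot's value arrives already decided by its own consensus round, shows why delivery has to wait for the contiguous prefix of decided slots. `ReplicatedLog` is a hypothetical name.

```python
# Hypothetical sketch of a Multi-Paxos replicated log: one decision per slot,
# delivered strictly in slot order.
from typing import Dict, List


class ReplicatedLog:
    def __init__(self) -> None:
        self.decided: Dict[int, str] = {}
        self.next_to_deliver = 0

    def decide(self, slot: int, value: str) -> None:
        # A slot is decided at most once; a repeated decision must agree.
        previous = self.decided.setdefault(slot, value)
        if previous != value:
            raise ValueError(f"conflicting decisions for slot {slot}")

    def deliver_ready(self) -> List[str]:
        """Return newly deliverable values: the contiguous decided prefix."""
        out: List[str] = []
        while self.next_to_deliver in self.decided:
            out.append(self.decided[self.next_to_deliver])
            self.next_to_deliver += 1
        return out
```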
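Slide 22 states that every member is a leader, so no election is needed. The slides do not say how slots are divided among members; this sketch assumes a Mencius-style round-robin assignment (Mencius appears on slide 15), where slot `s` belongs to member `s mod n`. Both helper functions are hypothetical.

```python
# Hypothetical round-robin slot ownership (an assumption, not taken from the slides).
from typing import List


def slot_owner(slot: int, members: List[str]) -> str:
    """Assumed rule: slots are assigned to members in round-robin order."""
    return members[slot % len(members)]


def next_own_slot(member: str, after: int, members: List[str]) -> int:
    """First slot strictly after `after` that `member` owns under that rule."""
    idx = members.index(member)
    n = len(members)
    s = after + 1
    return s + (idx - s) % n  # smallest s' >= s with s' % n == idx
```

Under this assumption each member proposes only into its own slots, which is consistent with the diagram labelling those slots Accept/Learn rather than Prepare/Accept/Learn.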
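When a slot's owner has nothing to send, slide 23 says a single learn message carrying a “nop” decides the slot. The hypothetical `XcomLog` below shows the effect on delivery: a nop fills the gap so later slots can be delivered, but it is never handed to the application.

```python
# Hypothetical sketch: "nop" slots unblock delivery without being delivered.
from typing import Dict, List, Optional

NOP = None  # sentinel: the slot's owner had nothing to propose


class XcomLog:
    def __init__(self) -> None:
        self.decided: Dict[int, Optional[str]] = {}
        self.next_to_deliver = 0

    def owner_has_nothing(self, slot: int) -> None:
        # A single Learn(nop) from the owner decides the slot.
        self.decided[slot] = NOP

    def decide(self, slot: int, value: str) -> None:
        self.decided[slot] = value

    def deliver_ready(self) -> List[str]:
        out: List[str] = []
        while self.next_to_deliver in self.decided:
            value = self.decided[self.next_to_deliver]
            if value is not NOP:
                out.append(value)  # nops fill the gap but are skipped
            self.next_to_deliver += 1
        return out
```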
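Slide 24 explains why that shortcut is safe: only a slot's owner may propose a real value, while any other member may only propose a “nop”, and must go through all Paxos phases to do so. The rules are sketched below with hypothetical names; `recovery_value` assumes each promise reports `(proposal_number, value)`, or `None` when nothing was accepted.

```python
# Hypothetical encoding of the proposal rules described on slide 24.
from typing import List, Optional, Tuple

NOP = "nop"


def may_propose(proposer: str, owner: str, value: str) -> bool:
    """Only the slot owner may propose a real value; others only a nop."""
    return proposer == owner or value == NOP


def needs_full_paxos(proposer: str, owner: str) -> bool:
    """Non-owners must run Prepare/Promise/Accept/Accepted/Learn."""
    return proposer != owner


def recovery_value(promises: List[Optional[Tuple[int, str]]]) -> str:
    """A non-owner adopts the highest-numbered accepted value, else proposes nop."""
    accepted = [p for p in promises if p is not None]
    if not accepted:
        return NOP
    return max(accepted, key=lambda p: p[0])[1]
```

Because a non-owner can only ever get a nop chosen, and the owner has already declared nop, a member that received the owner's `(0, nop)` learn can treat the slot as decided even if the owner dies, as in the slide's scenario.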
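Pipelining and batching (slide 26) interact as a window of in-flight proposals, each carrying several messages. This sketch uses the slide's values, but the slide does not state what the batch value “5” counts; the code assumes, hypothetically, that it caps the number of messages per proposal. `Proposer`, `pump` and `on_decided` are illustrative names.

```python
# Hypothetical sketch of pipelining (bounded in-flight proposals) plus batching.
from collections import deque
from typing import Deque, List

PIPELINE_DEPTH = 10  # proposals in flight at once (value from slide 26)
BATCH_SIZE = 5       # assumption: max messages per proposal


class Proposer:
    def __init__(self) -> None:
        self.pending: Deque[str] = deque()
        self.in_flight: List[List[str]] = []

    def submit(self, message: str) -> None:
        self.pending.append(message)

    def pump(self) -> int:
        """Start new proposals while the pipeline window has room."""
        started = 0
        while self.pending and len(self.in_flight) < PIPELINE_DEPTH:
            batch = [self.pending.popleft()
                     for _ in range(min(BATCH_SIZE, len(self.pending)))]
            self.in_flight.append(batch)
            started += 1
        return started

    def on_decided(self) -> List[str]:
        """The oldest proposal completed: free its slot in the window."""
        return self.in_flight.pop(0)
```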
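Slide 30 notes that headers (~200 bytes) are not compressed, which limits what compression can gain on small payloads. The sketch below estimates bytes on the wire under two assumptions: a fixed 200-byte header, and Python's `zlib` as a stand-in codec, since the slides do not name the algorithm the engine uses. `wire_size` is a hypothetical helper.

```python
# Hypothetical estimate of message size with an uncompressed header.
import zlib

HEADER_BYTES = 200  # slide 30: headers (~200 bytes) are not compressed


def wire_size(payload: bytes, compress: bool) -> int:
    """Uncompressed header plus the (optionally compressed) payload."""
    body = zlib.compress(payload) if compress else payload
    return HEADER_BYTES + len(body)
```

However well the payload compresses, the estimate never drops below the 200-byte header, so small payloads such as the 256-byte case gain proportionally less than larger ones.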