Double Sync Replication

Double Sync Replication slides in OOW16

Published in: Engineering

  1. 1. Double Sync Replication: Enhancing Data Durability. Lixun Peng @ Alibaba Cloud Compute
  2. 2. About me
     • Name: Lixun Peng
     • Location: Hangzhou, China
     • Occupation: Staff Database Kernel Engineer @ Alibaba Cloud
     • Interests: MySQL Replication & InnoDB
     • Experience: I started out as a DBA, then began modifying the code to make better use of it, and gradually became a MySQL kernel engineer.
  3. 3. Agenda • Problems with Async/Semi-Sync • How to solve these problems • How to implement Double-Sync • How to use Double-Sync • Several cases
  4. 4. Agenda • Problems with Async/Semi-Sync • How to solve these problems • How to implement Double-Sync • Several cases
  5. 5. Problem of Async Replication
     • The master does not wait for an ACK from the slave.
     • The slave does not know whether it has dumped the latest binary logs.
     • When the master crashes, the slave cannot tell whether it had caught up with the master.
     • The major problem: the slave does not know the master's status.
  6. 6. Semi-Sync Replication
     • With Semi-Sync, the master waits for the ACK from the slave.
  7. 7. Problem of SemiSync
     • The master has to wait for an ACK from the slave.
     • The slave is downgraded to async when the ACK times out.
     • If the timeout is set too small, timeouts happen too often.
     • If the timeout is set too large, the master blocks for a long time.
     • After it recovers from a network failure, the slave dumps the binary logs generated during the timeout asynchronously.
     • If the master crashes, the slave does not know which mode replication was in (Async or SemiSync).
     • In that case, the slave still does not know whether it has dumped the latest binary logs.
     • Conclusion: SemiSync does not solve the major problem.
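
The fallback behaviour described on this slide can be illustrated with a toy model. This is a minimal sketch, not MySQL code: the class `ToySemiSyncMaster`, its fields, and the millisecond delays are all hypothetical, and the model never switches back to semi-sync after a timeout, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class ToySemiSyncMaster:
    """Hypothetical model, not MySQL code: one master, one slave."""
    timeout_ms: int
    semi_sync_on: bool = True
    committed: int = 0  # transactions committed on the master
    acked: int = 0      # transactions the slave has acknowledged in time

    def commit(self, ack_delay_ms: int) -> None:
        """Commit one transaction; ack_delay_ms is how long the ACK takes."""
        self.committed += 1
        if self.semi_sync_on and ack_delay_ms <= self.timeout_ms:
            self.acked = self.committed   # ACK arrived within the timeout
        else:
            self.semi_sync_on = False     # timed out: continue as async


master = ToySemiSyncMaster(timeout_ms=100)
for delay in (10, 20, 500, 10, 10):  # the third ACK is slow (network hiccup)
    master.commit(delay)

# The master kept committing after the timeout, while the last transaction
# confirmed by the slave is still an older one.
print(master.committed, master.acked, master.semi_sync_on)
```

If the master crashed at this point, the slave would hold only the acknowledged prefix and would have no way to learn that replication had already fallen back to async.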
  8. 8. Problem of Async/SemiSync
  9. 9. Flow Chart (Async/Semi-Sync)
  10. 10. Background & Target
     • Background
       • The SA team guarantees server availability of 99.999%.
       • The Net Ops team guarantees network availability of 99.999%.
       • We assume the master and the network do not fail at the same time.
     • Target
       • The slave knows whether it has caught up with the master.
       • The slave knows how much of the master's data it is missing.
     • Key point: clarify the slave's status!
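
The assumption that the master and the network do not fail together can be sanity-checked with a little arithmetic. A minimal sketch, assuming the two failures are statistically independent (the slides do not say so) and reading 99.999% as the fraction of time each component is up:

```python
# Back-of-the-envelope check; independence of the two failures is an assumption.
SERVER_AVAILABILITY = 0.99999   # from the slide: SA team guarantee
NETWORK_AVAILABILITY = 0.99999  # from the slide: Net Ops team guarantee

SECONDS_PER_YEAR = 365 * 24 * 3600


def unavailability(availability: float) -> float:
    """Fraction of time a component is down."""
    return 1.0 - availability


def joint_unavailability(a: float, b: float) -> float:
    """Fraction of time both components are down, assuming independence."""
    return unavailability(a) * unavailability(b)


single_down_seconds = unavailability(SERVER_AVAILABILITY) * SECONDS_PER_YEAR
both_down_seconds = (
    joint_unavailability(SERVER_AVAILABILITY, NETWORK_AVAILABILITY) * SECONDS_PER_YEAR
)

print(f"one component down: {single_down_seconds:.0f} s/year")
print(f"both down at once:  {both_down_seconds:.4f} s/year")
```

Under these assumptions each component is down for roughly five minutes a year, while an overlapping failure amounts to a few milliseconds a year, which is why the design treats it as negligible.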
  11. 11. Agenda • Problems with Async/Semi-Sync • How to solve these problems • How to implement Double-Sync • Several cases
  12. 12. Solve the weak point of SemiSync
     • Even after the network recovers from a failure, the slave still has to dump the binary logs generated during the timeout asynchronously.
     • What happens if, on a timeout, the slave gives up the binary logs generated during the timeout and the master sends only the latest position and logs?
     • Even after the network has been down, the slave always knows the latest position.
     • The slave can tell whether its data is the same as the master's.
     • How do we catch up on the changes made while the network was down?
     • Async replication can still dump the binary logs.
     • So we can use async replication to do a full log apply.
  13. 13. Combine the Async and SemiSync
     • Async Replication (Async Channel)
       • Dumps continuous binary logs from the master.
       • Applies logs immediately after the slave receives them.
     • SemiSync Replication (Sync Channel)
       • Dumps only the latest binary logs and position.
       • Does not apply logs immediately; expired logs are purged automatically.
     • Analyzing consistency
       • Compare the logs and positions from the two channels.
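
The division of labour between the two channels can be sketched as follows. This is an illustrative model only: the classes `Event`, `AsyncChannel` and `SyncChannel`, the integer `pos`, and the retention window `keep` are hypothetical names, not part of the actual patch.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    pos: int       # hypothetical: a monotonically increasing log position
    payload: str


@dataclass
class AsyncChannel:
    """Receives every event in order and applies it immediately."""
    applied_pos: int = 0

    def receive(self, event: Event) -> None:
        self.applied_pos = event.pos  # "apply" = advance the applied position


@dataclass
class SyncChannel:
    """Keeps only the newest events and does not apply them."""
    keep: int = 3  # hypothetical retention window
    buffer: deque = field(default_factory=deque)

    def receive(self, event: Event) -> None:
        self.buffer.append(event)
        while len(self.buffer) > self.keep:
            self.buffer.popleft()  # purge expired logs

    @property
    def latest_pos(self) -> int:
        return self.buffer[-1].pos if self.buffer else 0


async_ch, sync_ch = AsyncChannel(), SyncChannel(keep=3)
for pos in range(1, 11):
    event = Event(pos, f"txn-{pos}")
    sync_ch.receive(event)        # the Sync Channel always sees the newest event
    if pos <= 7:
        async_ch.receive(event)   # the Async Channel lags and stops at 7

print(async_ch.applied_pos, sync_ch.latest_pos, [e.pos for e in sync_ch.buffer])
```

Comparing `applied_pos` with the positions buffered by the Sync Channel is what lets the slave judge how far behind it is.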
  14. 14. Combine the Async and SemiSync
  15. 15. Flow Chart (Double Sync)
  16. 16. Agenda • Problems with Async/Semi-Sync • How to solve these problems • How to implement Double-Sync • Several cases
  17. 17. How to create two channels (1)
     • Multi-Source replication enables N channels in one slave.
     • Problem: when the master receives two dump requests from servers with the same server-id, it disconnects the earlier one.
     • Solution: set a special Server-ID (0xFFFFFF) for the Sync Channel.
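
The server-id conflict and its workaround can be shown with a toy registry. A minimal sketch: the `ToyMaster` class and its `register_dump` method are hypothetical stand-ins for the master's bookkeeping of dump connections, and the slave id `101` is invented for the example; only the value `0xFFFFFF` comes from the slide.

```python
from typing import Dict, Optional

SYNC_CHANNEL_SERVER_ID = 0xFFFFFF  # value from the slide


class ToyMaster:
    """Toy model: one dump connection per server-id, newest wins."""

    def __init__(self) -> None:
        self.dump_threads: Dict[int, str] = {}  # server_id -> connection label

    def register_dump(self, server_id: int, label: str) -> Optional[str]:
        """Register a dump request; return the label of any connection it displaced."""
        displaced = self.dump_threads.get(server_id)
        self.dump_threads[server_id] = label
        return displaced


# Problem: both channels announce the slave's own server-id (101 is made up).
master = ToyMaster()
master.register_dump(101, "async channel")
kicked = master.register_dump(101, "sync channel")
print("displaced:", kicked)

# Solution: the Sync Channel announces the special server-id instead.
master = ToyMaster()
master.register_dump(101, "async channel")
kicked_fixed = master.register_dump(SYNC_CHANNEL_SERVER_ID, "sync channel")
print("displaced:", kicked_fixed, "live:", sorted(master.dump_threads.values()))
```

With distinct ids, both dump connections from the same slave stay alive side by side.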
  18. 18. How to create two channels (2)
     • Problem: one slave has both a SemiSync channel and a non-SemiSync channel, but the SemiSync settings are global.
     • Solution: move the SemiSyncSlave class into Master_info, so that the setting becomes per-channel.
  19. 19. Analyzing consistency
     • Using the GTID
     • Using Log_file_name and Log_file_pos
     • The following pictures walk through the process :)
  20. 20. Analyzing consistency
     ← Needn't repair: just use it!
     ← Can't repair: something will be lost
     ← Can repair: use it after repair
  21. 21. Agenda • Problems with Async/Semi-Sync • How to solve these problems • How to implement Double-Sync • Several cases
  22. 22. CASE 1: Needn't Fix
     • The GTIDs of the Sync Channel and the Async Channel are the same.
  23. 23. CASE 2: Can't Fix
     • There is a gap between the logs of the Sync Channel and the Async Channel.
  24. 24. CASE 3: Can Repair
     • Combine the two channels' logs to make the logs continuous.
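
The three cases can be summarised as a small decision function. This is a sketch under a simplifying assumption: transactions are identified by a single increasing integer (a stand-in for GTIDs or file/position pairs), and the names `Verdict` and `analyze` are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    NO_REPAIR_NEEDED = "needn't repair"
    CAN_REPAIR = "can repair"
    CANNOT_REPAIR = "can't repair"


def analyze(async_last: int, sync_first: int, sync_last: int) -> Verdict:
    """Compare the two channels using integer sequence numbers (an assumption).

    async_last: last transaction received on the Async Channel
    sync_first, sync_last: oldest and newest transaction still held
                           by the Sync Channel
    """
    if sync_first > sync_last:
        raise ValueError("sync_first must not exceed sync_last")
    if async_last >= sync_last:
        return Verdict.NO_REPAIR_NEEDED   # case 1: both channels agree
    if sync_first <= async_last + 1:
        return Verdict.CAN_REPAIR         # case 3: logs overlap or touch
    return Verdict.CANNOT_REPAIR          # case 2: a gap separates them


print(analyze(async_last=10, sync_first=8, sync_last=10).value)
print(analyze(async_last=5, sync_first=8, sync_last=10).value)
print(analyze(async_last=7, sync_first=8, sync_last=10).value)
```

The repairable case is exactly the one where the Sync Channel's oldest retained transaction is no later than the one immediately following the Async Channel's last.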
  25. 25. How to Repair
     • The slave waits for the Async Channel to apply all the logs it has received, then starts the SQL THREAD of the Sync Channel.
     • GTID filters out the events that the Async Channel has already applied.
     • A REPAIR SLAVE command is provided to do all of this automatically.
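
The repair steps above can be sketched as a merge with de-duplication. This is not the implementation behind REPAIR SLAVE: the function `repair` is hypothetical, and integer transaction numbers again stand in for GTIDs.

```python
from typing import List


def repair(async_applied: List[int], sync_buffered: List[int]) -> List[int]:
    """Return the transaction sequence after replaying the Sync Channel.

    Both inputs are ascending lists of integer transaction numbers
    (a stand-in for GTIDs; this numbering is an assumption).
    """
    applied = list(async_applied)   # step 1: Async Channel has applied everything it received
    seen = set(applied)
    for txn in sync_buffered:       # step 2: start the Sync Channel's applier
        if txn in seen:             # step 3: filter out already-applied transactions
            continue
        if applied and txn != applied[-1] + 1:
            raise RuntimeError(f"gap before transaction {txn}; cannot repair")
        applied.append(txn)
        seen.add(txn)
    return applied


repaired = repair(async_applied=[1, 2, 3, 4, 5, 6, 7], sync_buffered=[6, 7, 8, 9, 10])
print(repaired)
```

Transactions 6 and 7 are skipped because the Async Channel already applied them; calling `repair([1, 2, 3, 4, 5], [8, 9, 10])` instead raises, which corresponds to the "can't fix" case.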
  26. 26. FAQs (1)
     • Q1: Will Alibaba release this feature?
     • A1: Of course! Alibaba will release all the patches.
     • Q2: When will Alibaba release the source code?
     • A2: Check AliSQL's roadmap.
     • Q3: How can I access AliSQL's source code?
     • A3: https://github.com/alibaba/AliSQL The project is currently private. If you want access, please email me your GitHub account.
  27. 27. FAQs (2)
     • Q4: What's the difference between two Semi-Sync slaves and double sync replication?
     • A4: They do the same job, and performance is much the same. But double sync replication needs one fewer slave than the architecture with two Semi-Sync slaves. As the number of MySQL servers grows, that saves a lot of money.
  28. 28. Any other Questions? penglixun@gmail.com