Alibaba Patches in MariaDB
Lixun Peng
  1. Alibaba Patches in MariaDB
     Lixun Peng
  2. Topics
     • Time Machine / Flashback (in development)
     • Double Sync Replication (to be contributed)
     • Multi-Source Replication
     • Thread Memory Monitor
  3. What’s a Time Machine
     • Rolls back instances, databases, or tables to a snapshot.
     • Implemented at the server level, so it supports all storage engines.
     • Works from binary logs in full row-image format.
     • Currently a feature of the mysqlbinlog tool (the --flashback option).
  4. Why Time Machine
     • Everyone makes mistakes, including DBAs.
     • After users damage their data by mistake, we can of course recover it from the last full backup set plus the binary logs.
     • But if the database is huge, that takes a long time. A mistaken operation usually modifies only a small amount of data, yet we would have to recover the whole database.
  5. How Time Machine Works
     • If binlog_format is ROW (with binlog-row-image=FULL in 5.6 and later), all column values are stored in the row event, so the data as it was before the mistaken operation is available.
     • Then do the following:
       • Change the event type: INSERT becomes DELETE, and DELETE becomes INSERT.
       • For an update event, swap the SET part and the WHERE part.
       • Apply the events in reverse order, from the last one back to the first one of the mistaken operation.
     • All the data is recovered by applying the inverse of each mistaken operation.
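The inversion steps on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the mysqlbinlog implementation: `RowEvent`, `invert`, `flashback`, and `apply_event` are hypothetical names, and a table is modeled as a dict keyed by an assumed `id` column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RowEvent:
    """Hypothetical, simplified stand-in for a full-image row event."""
    kind: str                      # "INSERT", "DELETE" or "UPDATE"
    before: Optional[dict] = None  # row image before the change (the WHERE part)
    after: Optional[dict] = None   # row image after the change (the SET part)


def invert(event: RowEvent) -> RowEvent:
    """Build the event that undoes `event`."""
    if event.kind == "INSERT":
        return RowEvent("DELETE", before=event.after)
    if event.kind == "DELETE":
        return RowEvent("INSERT", after=event.before)
    if event.kind == "UPDATE":
        # Swap the SET part and the WHERE part.
        return RowEvent("UPDATE", before=event.after, after=event.before)
    raise ValueError(f"unsupported event kind: {event.kind}")


def flashback(events: list) -> list:
    """Invert every event and replay them from the last one to the first."""
    return [invert(e) for e in reversed(events)]


def apply_event(table: dict, event: RowEvent) -> None:
    """Apply one event to a toy table keyed by the assumed 'id' column."""
    if event.kind == "INSERT":
        table[event.after["id"]] = dict(event.after)
    elif event.kind == "DELETE":
        del table[event.before["id"]]
    else:  # UPDATE
        del table[event.before["id"]]
        table[event.after["id"]] = dict(event.after)
```

Replaying `flashback(events)` after `events` should bring the toy table back to its starting state, which is only possible because every event carries the full before and after images.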
  6. Done List
     • Full DML support.
     • Review table support, because users may want to check which data was flashed back.
     • GTID support (MariaDB):
       • GTID event support has been added for MariaDB 10.1.
       • Support for MySQL 5.6 GTID events is still in progress.
  7. ToDo List
     • Add DDL support:
       • For ADD INDEX/COLUMN or CREATE TABLE, simply drop the index, column, or table when Flashback runs.
       • For DROP INDEX/COLUMN or DROP TABLE, copy or rename the old table to a reserved database. When Flashback runs, it can drop the new table and rename the saved old one back into the original database.
       • For TRUNCATE TABLE, rename the old table to a reserved database and create a new empty table.
     • Add a script for the time machine.
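Since DDL support is still on the to-do list, the following is only a sketch of how the plan on this slide might be expressed, not existing behavior. The function `plan_ddl`, the reserved database name `flashback_reserved`, and the choice of copy versus rename in each branch are all assumptions made for illustration. The TRUNCATE undo steps are inferred from the DROP case rather than stated on the slide.

```python
RESERVED_DB = "flashback_reserved"  # hypothetical name for the reserved database


def plan_ddl(kind, db, table, name=None):
    """Return (before_ddl, on_flashback) as lists of SQL strings.

    before_ddl:   statements to run so the old state is preserved
    on_flashback: statements that undo the DDL when Flashback runs
    """
    target = f"{db}.{table}"
    saved = f"{RESERVED_DB}.{table}"

    # Additive DDL needs no saved copy: just drop what was added.
    if kind == "ADD INDEX":
        return [], [f"ALTER TABLE {target} DROP INDEX {name}"]
    if kind == "ADD COLUMN":
        return [], [f"ALTER TABLE {target} DROP COLUMN {name}"]
    if kind == "CREATE TABLE":
        return [], [f"DROP TABLE {target}"]

    # Destructive DDL: keep the old table in the reserved database.
    if kind in ("DROP INDEX", "DROP COLUMN"):
        before = [
            f"CREATE TABLE {saved} LIKE {target}",
            f"INSERT INTO {saved} SELECT * FROM {target}",
        ]
        undo = [f"DROP TABLE {target}", f"RENAME TABLE {saved} TO {target}"]
        return before, undo
    if kind == "DROP TABLE":
        # Assumption: the rename stands in for the drop itself.
        return ([f"RENAME TABLE {target} TO {saved}"],
                [f"RENAME TABLE {saved} TO {target}"])
    if kind == "TRUNCATE":
        before = [
            f"RENAME TABLE {target} TO {saved}",
            f"CREATE TABLE {target} LIKE {saved}",
        ]
        undo = [f"DROP TABLE {target}", f"RENAME TABLE {saved} TO {target}"]
        return before, undo

    raise ValueError(f"unsupported DDL kind: {kind}")
```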
  8. Flashback command
  9. Double Sync Replication: enhancing the data security guarantee
     Lixun Peng @ Alibaba Cloud Compute
  10. Problem of Async Replication
      • The Master doesn’t need to wait for an ACK from the Slave.
      • The Slave doesn’t know whether it has dumped the latest binary logs from the Master.
      • When the Master crashes, the Slave can’t check on its own whether it is the same as the Master.
      • So the main problem is that the Slave doesn’t know the status of the Master.
  11. Semi-Sync Replication
  12. Problem of SemiSync
      • The Master needs to wait for an ACK from the Slave.
      • The Slave downgrades to Async when a timeout happens.
        • If the timeout is too small, timeouts happen frequently.
        • If the timeout is too big, the Master is often blocked.
      • After the network recovers, the Slave has to dump the binary logs generated during the timeout. During that time, the Slave is still Async.
      • When the Master crashes, the Slave doesn’t know whether the Master was in Async or SemiSync mode.
      • So the Slave still doesn’t know whether it is the same as the Master when the Master crashes.
      • So SemiSync doesn’t solve the main problem of Async Replication.
  13. Problem of Async/SemiSync
  14. Background & Target
      • Background
        • SA guarantees server availability: 99.999%
        • NA guarantees network availability: 99.999%
        • So we can assume that when the Master crashes, the network does not time out at that same point in time.
      • Target
        • The Slave can determine its own status (the same as the Master or not).
        • If the data isn’t the same as the Master’s, notify app & dev to fix the data, and show the range of lost data.
      • Key point: avoid the Slave's status being unknown!
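A rough back-of-the-envelope check of that assumption, using the two availability figures from the slide. Treating server and network failures as statistically independent is an assumption added here, not something the slide states.

```python
server_availability = 0.99999   # from the slide
network_availability = 0.99999  # from the slide

p_server_down = 1 - server_availability
p_network_down = 1 - network_availability

# Assumption: the two failure modes are independent, so the probability
# that both are down at the same moment is the product.
p_both_down = p_server_down * p_network_down

seconds_per_year = 365 * 24 * 3600
overlap_seconds_per_year = p_both_down * seconds_per_year

print(f"P(both down) is about {p_both_down:.1e}")
print(f"expected overlap is about {overlap_seconds_per_year:.4f} seconds per year")
```

Under that independence assumption the overlap is a tiny fraction of a second per year, which is why the design treats "Master crashed while the network was timing out" as negligible.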
  15. Solving the weak point of SemiSync
      • Once SemiSync times out, even after the network recovers, the Slave still has to dump the binary logs generated during the timeout, in Async mode.
      • What happens if, after a SemiSync timeout, we give up on the binary logs from the timeout period and the Master just sends the latest position and logs?
        • Even after the network has been down, the Slave always knows the latest position on the Master.
        • So the Slave can tell whether its data is the same as the Master’s.
      • But if the Slave only dumps the latest data, how does it get the data from the period when the network was down?
        • Async replication can dump continuous binary logs.
        • So we can use Async replication to apply the full log.
  16. Combine the Async and SemiSync
      • Async Replication (Async_Channel)
        • Dumps continuous binary logs to guarantee that the Slave’s logs are continuous.
        • Applies logs immediately after receiving them.
      • SemiSync Replication (Sync_Channel)
        • Dumps the latest binary logs to guarantee that the Slave knows the latest position of the Master.
        • Does not apply logs after receiving them. It just saves the logs and position, and outdated logs are purged automatically.
      • Analyzing consistency
        • Compare the received log positions of the two channels.
  17. Combine the Async and SemiSync
  18. How to create two channels (1)
      • Multi-Source replication can create N channels in one Slave.
      • Problem: when the Master receives two dump requests from servers with the same Server-ID, it disconnects the earlier one.
      • Solution: we give the Sync Channel a special Server-ID (0xFFFFFF).
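A toy model of the Server-ID conflict, assuming the Master tracks at most one dump connection per Server-ID as the slide describes. `MasterDumpRegistry` and its fields are hypothetical names used only for this illustration.

```python
SYNC_CHANNEL_SERVER_ID = 0xFFFFFF  # special Server-ID from the slide


class MasterDumpRegistry:
    """Toy model: the Master keeps one dump connection per Server-ID."""

    def __init__(self):
        self.active = {}        # server_id -> channel label
        self.disconnected = []  # channels that were kicked out

    def register(self, server_id, channel):
        previous = self.active.get(server_id)
        if previous is not None:
            # A second dump request with the same Server-ID replaces the first.
            self.disconnected.append(previous)
        self.active[server_id] = channel


# Both channels use the Slave's real Server-ID: the first one is dropped.
conflict = MasterDumpRegistry()
conflict.register(2, "async_channel")
conflict.register(2, "sync_channel")

# The Sync Channel uses the special Server-ID: both stay connected.
fixed = MasterDumpRegistry()
fixed.register(2, "async_channel")
fixed.register(SYNC_CHANNEL_SERVER_ID, "sync_channel")
```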
  19. How to create two channels (2)
      • Problem: one Slave has both a SemiSync and a non-SemiSync Channel, but the SemiSync settings are global.
      • Solution: we moved the SemiSyncSlave class into Master_info.
  20. Analyzing consistency
      • Using the GTID
      • Using the Log_file_name and Log_file_pos
      • How to judge: see the following pictures.
  21. Analyzing consistency
  22. CASE 1: Needn’t Fix
      • The GTIDs of the Sync and Async Channels are the same.
  23. CASE 2: Can’t Fix
      • There is a gap between the logs of the Sync and Async Channels.
  24. CASE 3: Can Repair
      • Combined, the two channels’ logs are continuous.
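The three cases can be written as a small decision function. This sketch assumes a single stream of increasing sequence numbers (for example, the sequence part of a GTID in one domain). The function name and its parameters are hypothetical, and treating "Async is at or beyond the Sync position" as needing no fix is an assumption that goes slightly beyond the slide's "the same".

```python
def classify(async_last: int, sync_first: int, sync_last: int) -> str:
    """Compare what the two channels have received.

    async_last: last sequence number received by the Async Channel
                (its logs are continuous up to this point)
    sync_first: oldest sequence number still kept by the Sync Channel
    sync_last:  newest sequence number received by the Sync Channel
    """
    if async_last >= sync_last:
        # CASE 1: the Async Channel already has everything the Master sent.
        return "needn't fix"
    if sync_first <= async_last + 1:
        # CASE 3: the Sync logs start at or before the point where the
        # Async logs end, so the combined logs are continuous.
        return "can repair"
    # CASE 2: some events fall between the two channels and are lost.
    return "can't fix"
```

For instance, with the Async Channel at 100, Sync logs covering 98 to 105 can be repaired, while Sync logs covering 103 to 105 leave events 101 and 102 missing.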
  25. How to Repair
      • Wait until the Async Channel has applied all the logs it received, then start the SQL thread of the Sync Channel.
      • GTID filters out the events already applied by the Async Channel.
      • We provide the REPAIR SLAVE command to do all of this automatically.
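The repair order can be sketched as follows. This is a simplified model, not the REPAIR SLAVE implementation: relay logs are plain lists of `(gtid, payload)` pairs, and `repair` is a hypothetical helper.

```python
def repair(async_relay, sync_relay):
    """Apply the Async Channel's logs first, then the Sync Channel's,
    skipping any GTID that has already been applied."""
    applied = []
    seen = set()

    # Step 1: let the Async Channel finish applying everything it received.
    for gtid, payload in async_relay:
        applied.append((gtid, payload))
        seen.add(gtid)

    # Step 2: start the Sync Channel's SQL thread; GTID filtering drops
    # the events the Async Channel has already applied.
    for gtid, payload in sync_relay:
        if gtid in seen:
            continue
        applied.append((gtid, payload))
        seen.add(gtid)

    return applied


async_relay = [("0-1-98", "a"), ("0-1-99", "b"), ("0-1-100", "c")]
sync_relay = [("0-1-99", "b"), ("0-1-100", "c"), ("0-1-101", "d")]
result = repair(async_relay, sync_relay)
```

In this example the overlapping events 99 and 100 are applied once, and only event 101 comes from the Sync Channel.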
  26. Multi-Source Replication: N Masters and 1 Slave
      Lixun Peng @ Alibaba Cloud Compute
  27. Why we need multi-source
      • OLAP
      • Most users use MySQL with data sharding.
      • Multi-Source helps users combine the data from their sharded instances.
      • If you use Master-Slave for backup, Multi-Source lets you back up many instances into one, which is easy to maintain.
  28. How Multi-Source is implemented
  29. What changes in the code
      • Rpl_filter/skip_slave_counters move into Master_info.
      • Every channel creates a new Master_info.
      • Every replication-related function uses the specific Master_info for its channel.
      • A Master_info_index class maintains all Master_info objects.
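The shape of that refactoring, per-channel state held in one object and an index that looks it up by connection name, can be illustrated in Python. The class names echo the slide, but the fields and methods are assumptions for illustration, not the server's C++ code.

```python
from dataclasses import dataclass, field


@dataclass
class MasterInfo:
    """Per-channel replication state (fields are illustrative)."""
    connection_name: str
    rpl_filter: set = field(default_factory=set)  # e.g. ignored databases
    skip_counter: int = 0
    running: bool = False


class MasterInfoIndex:
    """Maintains all MasterInfo objects, keyed by connection name."""

    def __init__(self):
        self._by_name = {}

    def get_or_create(self, connection_name=""):
        # The empty name stands for the default connection.
        if connection_name not in self._by_name:
            self._by_name[connection_name] = MasterInfo(connection_name)
        return self._by_name[connection_name]

    def start_all(self):
        for info in self._by_name.values():
            info.running = True

    def names(self):
        return sorted(self._by_name)


index = MasterInfoIndex()
index.get_or_create("shard1").skip_counter = 3
index.get_or_create("shard2").rpl_filter.add("test")
index.start_all()
```

Because filters and skip counters live on each `MasterInfo`, changing them for one channel leaves the others untouched.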
  30. The Syntax
      • CHANGE MASTER ["connection_name"] ...
      • FLUSH RELAY LOGS ["connection_name"]
      • MASTER_POS_WAIT(....,["connection_name"])
      • RESET SLAVE ["connection_name"]
      • SHOW RELAYLOG ["connection_name"] EVENTS
      • SHOW SLAVE ["connection_name"] STATUS
      • SHOW ALL SLAVES STATUS
      • START SLAVE ["connection_name"...]
      • START ALL SLAVES ...
      • STOP SLAVE ["connection_name"] ...
      • STOP ALL SLAVES ...
  31. The Syntax
      • set @@default_master_connection='';
      • show status like 'Slave_running';
      • set @@default_master_connection='connection';
      • show status like 'Slave_running';
  32. How it runs
  33. Thread Memory Monitor: knowing how MySQL uses memory
      Lixun Peng @ Alibaba Cloud Compute
  34. Why we need TMM
      • MySQL’s memory limits only work well for the storage engine.
        • For example, in InnoDB: innodb_buffer_pool_size.
      • In the server, we can limit memory only for some features, such as sort_buffer_size and join_buffer_size.
      • But for a big query, most of the memory cost comes from MEM_ROOT, and there is no option to limit it.
      • So when the mysqld process uses too much memory, we don’t know which thread is responsible.
      • Then we don’t know which thread to kill to release the memory.
  35. How to solve it
      • Add a hack in my_malloc.
      • Record the size of each malloc and which thread requested the memory.
      • Calculate the total memory size across all threads.
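The accounting idea can be modeled in Python. The real patch hooks the server's C allocator, so `ThreadMemoryMonitor` and its methods are purely hypothetical stand-ins showing what gets recorded and how the totals are derived.

```python
import threading
from collections import defaultdict


class ThreadMemoryMonitor:
    """Toy per-thread allocation accounting."""

    def __init__(self):
        self._lock = threading.Lock()
        self._bytes_by_thread = defaultdict(int)

    def on_malloc(self, size, thread_id=None):
        # Record the allocation size against the requesting thread.
        tid = threading.get_ident() if thread_id is None else thread_id
        with self._lock:
            self._bytes_by_thread[tid] += size

    def on_free(self, size, thread_id=None):
        tid = threading.get_ident() if thread_id is None else thread_id
        with self._lock:
            self._bytes_by_thread[tid] -= size

    def usage(self, thread_id):
        with self._lock:
            return self._bytes_by_thread.get(thread_id, 0)

    def total(self):
        # Total memory across all threads.
        with self._lock:
            return sum(self._bytes_by_thread.values())

    def heaviest_thread(self):
        # The thread to look at first when mysqld uses too much memory.
        with self._lock:
            if not self._bytes_by_thread:
                return None
            return max(self._bytes_by_thread, key=self._bytes_by_thread.get)


monitor = ThreadMemoryMonitor()
monitor.on_malloc(4096, thread_id=1)
monitor.on_malloc(1_000_000, thread_id=2)  # a big query
monitor.on_malloc(512, thread_id=1)
monitor.on_free(512, thread_id=1)
```

With per-thread totals available, the thread consuming the most memory can be identified and, if necessary, killed.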
  36. THANKS!