Demystifying MySQL Replication Crash Safety

Up to MySQL 5.5, replication was not crash safe: after a crash, it would fail with "duplicate key" or "row not found" errors, or might silently corrupt data. It looks like 5.6 is much better, right? The short answer is maybe: in the simplest case, it is possible to achieve replication crash safety, but it is not the default setting. MySQL 5.7 is not much better; 8.0 has safer defaults, but it is still easy to get things wrong.

Crash safety is impacted by replication positioning (File+Pos or GTID), type (single-threaded or MTS), MTS settings (Database or Logical Clock, and with or without slave_preserve_commit_order), the syncing of relay logs, and the presence of binary logs, log-slave-updates and their syncing. This is very complicated stuff and even the manual is confused about it.
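
As a rough illustration of how a few of these settings combine, here is a minimal Python sketch that flags the gaps the talk's conclusion slides call out for a slave: relay-log-info-repository = TABLE and relay-log-recovery = 1 are always needed, and GTID positioning additionally needs sync_binlog = 1 and MASTER_AUTO_POSITION = 1. The function and dictionary names are hypothetical, the settings are passed in as a plain dict rather than read from a server, and an empty result only means none of these particular gaps was found, not that replication is proven crash safe.

```python
# Settings the talk's conclusion lists as always needed on a slave.
ALWAYS_NEEDED = {
    "relay_log_info_repository": "TABLE",
    "relay_log_recovery": "1",
}
# Extra requirement the talk gives when replicating with GTIDs.
GTID_NEEDED = {"sync_binlog": "1"}


def normalize(value):
    """Normalize a setting value: ON/OFF and booleans become '1'/'0'."""
    text = str(value).strip().upper()
    return {"ON": "1", "TRUE": "1", "OFF": "0", "FALSE": "0"}.get(text, text)


def crash_safety_gaps(settings, uses_gtid=False, auto_position=False):
    """Return a list of problems found among the settings checked here.

    `settings` is a hypothetical dict of variable name -> value.
    An empty list is NOT a proof of crash safety (MTS and the bugs
    discussed in the talk are out of scope for this sketch).
    """
    required = dict(ALWAYS_NEEDED)
    if uses_gtid:
        required.update(GTID_NEEDED)
    problems = []
    for name, wanted in required.items():
        actual = normalize(settings.get(name, "<unset>"))
        if actual != wanted:
            problems.append(f"{name} = {actual} (want {wanted})")
    if uses_gtid and not auto_position:
        problems.append("MASTER_AUTO_POSITION = 0 (want 1)")
    return problems


# MySQL 5.6 defaults as given in the talk: FILE, 0 and 0.
mysql56_defaults = {
    "relay_log_info_repository": "FILE",
    "relay_log_recovery": 0,
    "sync_binlog": 0,
}
for problem in crash_safety_gaps(mysql56_defaults, uses_gtid=True, auto_position=True):
    print(problem)
```

With the 5.6 defaults and GTID positioning, the sketch reports all three variables; with the three values set as the talk recommends and MASTER_AUTO_POSITION = 1, it reports nothing.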

In this talk, I will explain the impact of the above and help you find the path to crash safety nirvana. I will also give details about replication internals, so you might learn a thing or two.

Published in: Technology

Demystifying MySQL Replication Crash Safety

  1. 1. Demystifying MySQL Replication Crash Safety Presented at Percona Live Europe 2018 in Frankfurt by Jean-François Gagné Senior Infrastructure Engineer / System and MySQL Expert jeanfrancois AT messagebird DOT com
  2. 2. Introducing MessageBird MessageBird is a cloud communications platform founded in Amsterdam in 2011. Examples of our messaging and voice SaaS: SMS in and out, call in (IVR) and out (alert), SIP, WhatsApp, Facebook, Telegram, Twitter, WeChat, … Omni-Channel Conversation Details at www.messagebird.com 225+ Direct-to-Carrier Agreements With operators from around the world 15,000+ Customers In 60+ countries 180+ Employees Engineering office in Amsterdam Sales and support offices worldwide We are expanding: {Software, Front-End, Infrastructure, Data, Security, Telecom, QA} Engineers {Team, Tech, Product} Leads, Product Owners, Customer Support {Commercial, Connectivity, Partnership} Managers www.messagebird.com/careers
  3. 3. Summary (Demystifying MySQL Replication Crash Safety – PLEU2018) • Helicopter view of – and then Zoom in – Replication and Crash Safety • MySQL 5.6 solution (and its problems) • Complexifying things with GTIDs and Multi-Threaded Slave (MTS) • Impacts of reducing / compromising durability (sync_binlog != 1 and trx_commit != 1) • Overview of related subjects: Semi-Sync, MariaDB & Pseudo-GTIDs • Closing, links, bugs and questions
  4. 4. Overview of MySQL Replication One master with one or more slaves: • The master records transactions in a journal (binary logs); each slave: • Downloads the journal and saves it locally in the relay logs (IO thread) • Executes the relay logs on its local database (SQL thread) • Could also produce binary logs to be a master (log-slave-updates – lsu)
  5. 5. Replication Crash Safety What do I mean by Replication Crash Safety? • When a slave crashes, it is able to resume replication after recovery (OK if it rewinds its state after recovery, as long as it is eventually consistent) • When a master crashes, slaves are able to resume replicating from it • All of the above without sacrificing data consistency • In other words: ACID is not compromised by a slave or a master crash (Discussion limited to transactional SE: InnoDB, TokuDB, MyRocks; obviously not MyISAM) Intermediate masters (IM) qualify both as master and slave Slaves are potential masters (and IMs) in some failover strategies (Proving replication crash un-safety is easy, proving safety is hard)
  6. 6. State of the Dolphin and of the Sea Lion State of the Dolphin in Replication Crash Safety: • MySQL 5.5 is not crash safe • MySQL 5.6 can be made crash safe (it is not by default) • MySQL 5.7 is mostly the same as 5.6 (with complexity added by Logical Clock parallel replication) • MySQL 8.0 is crash safe by default (but it can be made unsafe by “tuning” the configuration) Quick state of the Sea Lion: • MariaDB 5.5 is not replication crash safe • MariaDB 10.x can be made crash safe
  7. 7. Zoom in the details [1 of 3] More details about replication: • The IO Thread stores its state in master info (configuration is also stored there) • The SQL Thread stores its state in relay log info slave1 [localhost] {msandbox} ((none)) > show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event [...] Master_Log_File: mysql-bin.000001 <-------+-- master info (persisted state) Read_Master_Log_Pos: 25489 <-------+ Relay_Log_File: mysql-relay.000002 <--+ Relay_Log_Pos: 10788 <--+ Relay_Master_Log_File: mysql-bin.000001 <--+-- relay log info (persisted state) [...] | Exec_Master_Log_Pos: 10575 <--+ [...] 1 row in set (0.00 sec)
  8. 8. Zoom in the details [2 of 3] More parameters: sync_master_info, sync_relay_log and sync_relay_log_info In MySQL 5.5, master info and relay log info are files: • No atomicity of “making progress” and “state tracking” for IO & SQL Threads • Consistency of actual vs registered state is compromised after a crash ➢ This is why replication is not crash-safe in MySQL 5.5
  9. 9. Zoom in the details [3 of 3] Even more parameters: • sync_binlog (and innodb_flush_log_at_trx_commit – trx_commit): • Binlogs are synchronised to disk after every N writes/transactions (the default is 0 in MySQL 5.5 and 5.6; in 5.7 and 8.0 it is 1, which is full ACID) • trx_commit = 1: logs written and flushed at each trx (full ACID and default); = 0: written and flushed once per second (not crash safe); = 2: written after each trx and flushed once per second (mysqld crash safe, not OS crash safe)
  10. 10. MySQL 5.6 solution [1 of 4] Reminder: problems making MySQL 5.5 Replication Crash Un-Safe: • The position of the SQL Thread cannot be trusted • The position of the IO Thread cannot be trusted • The content of the Relay Logs cannot be trusted
  11. 11. MySQL 5.6 solution [2 of 4] The MySQL 5.6 solution • Atomicity for SQL Thread: relay-log-info-repository = TABLE (default = FILE) • Useless for crash safety: a parameter to store master info in a table: • master-info-repository = TABLE (default = FILE) • Providing a way to “fix” the relay logs: relay-log-recovery = 1 (default = 0)
  12. 12. MySQL 5.6 solution [3 of 4] More details about Relay Log Recovery: • relay-log-recovery is only used on mysqld startup (dynamic would be useless) • If relay-log-recovery = 0, nothing special is done (and a new relay log is created) • If relay-log-recovery = 1: • The position of the IO Thread is set to the position of the SQL Thread • The position of the SQL Thread is set to the newly created relay log • If relay-log-purge = 1: the old relay logs will be deleted on SQL Thread startup (relay-log-recovery does not delete anything: easy to test with skip-slave-start) ➢ In other words, the previous relay logs are skipped! (those relay logs are considered improper for SQL Thread consumption) • This will happen even if MySQL (or the IO Thread) did not crash OK for a 1st implementation but a waste of perfectly good relay logs
  13. 13. MySQL 5.6 solution [4 of 4] In MySQL 5.7: • No change of defaults (for replication crash safety) • Relay log recovery still simplistic :-| In MySQL 8.0: • Still simplistic relay log recovery :-( • New defaults: • relay-log-info-repository = TABLE :-) • relay-log-recovery = 1 :-) • master-info-repository = TABLE (not sure this is very useful) Bug#74323: Avoid overloading the master NIC on relay-log-recovery of a lagging slave Bug#74321: Execute relay-log-recovery only when needed
  14. 14. Adding complexity with GTIDs [1 of 2] Not only did MySQL 5.6 introduce replication crash safety, it also introduced Global Transaction IDs (GTIDs) • This tags every transaction with an ID when writing to the binlogs • The GTID state of the master and slaves is tracked in the binlogs ➢ IO and SQL Thread states are now partially in the binlogs (and relay logs) • Optionally, slaves can use GTID to replicate (instead of file+position) • This allows easier repointing of slaves to a new master (including fail over) • This heavily relies on precise tracking of GTID states on master and slaves ➢ As this tracking is in the binlogs, this is impeded when sync_binlog != 1 Bug#70659: Make crash safe slave with gtid + less durable settings
  15. 15. Adding complexity with GTIDs [2 of 2] To make replication crash safe with GTIDs in MySQL 5.6: • relay-log-info-repository = TABLE (default = FILE) • relay-log-recovery = 1 (default = 0) – (Bug#92093) • sync_binlog = 1 (default = 0) • In 5.7, the default is sync_binlog = 1 :-) (the two others are unchanged :-|) • In 8.0, all the defaults are good for crash safe replication with GTID :-) :-) • MySQL 5.7 adds a table for storing the GTID state of slaves: • Allows GTID slaves without log-slave-updates (lsu) • With lsu, this table (mysql.gtid_executed) is not updated after each trx ➢ Missed opportunity for OS crash safety with sync_binlog != 1 :-( :-( :-( Bug#92109: Make GTID replication crash safe with less durable setting
  16. 16. Master Replication Crash Safety [1 of 5] Relaxing durability of the binlogs implies losing GTID state (after an OS crash) • What about the consequences on the master? With and without GTID? • If sync_binlog != 1 on the master, an OS crash will lose binlogs • With sync_binlog != 1, usually trx_commit != 1 (normally 2, but can be 0) • trx_commit = 2 preserves data on mysqld crashes, 0 does not (→ 2 is better) ➢ InnoDB will also lose transactions on an OS crash ➢ After an OS crash, InnoDB will be out-of-sync with the binlogs ➢ And we cannot trust the binlogs on such a master (trx gap or ghost trx) The failure mode will be different depending on the configuration
  18. 18. Master Replication Crash Safety [2 of 5] With file+position • IO Thread in vanished binlogs • So slaves executed phantom trx (ghost in binlogs, maybe not in InnoDB) • When the master is restarted: • It records trx in a new binlog file • Most slaves are broken, and they might be out-of-sync with each other • Some lagging slaves might skip vanished binlogs ➢ Broken slaves have more data than the master (→ data drift) ➢ And different data drift on “lucky” lagging slaves that might not break
  19. 19. Master Replication Crash Safety [3 of 5] With GTID enabled • Slaves also executed ghost trx vanished from binlogs • But those are in their GTID state • A recovered master reuses GTIDs of the vanished trx • Slaves magically reconnect to the master (MASTER_AUTO_POSITION = 1) 1. If master has not reused all ghost GTIDs, then the slave breaks 2. If it has, then the slave skips the new transactions → more data drift (in the illustration, the slave will skip new 50 to 58 as it has the old ones)
  20. 20. Master Replication Crash Safety [4 of 5] With GTID enabled but MASTER_AUTO_POSITION = 0 • Left as an exercise to the reader… On the consequences of sync_binlog != 1 (part #1) https://jfg-mysql.blogspot.com/2018/10/consequences-sync-binlog-neq-1-part-1.html (more posts to be published in the series)
  21. 21. Master Replication Crash Safety [5 of 5] Summary of running with sync_binlog != 1: • The binlogs – of the master or slave – cannot be trusted after an OS crash • On a master, letting mysqld restart normally after such a crash leads to data drift ➢ After an OS crash, make sure no slaves reconnect to the recovered master (OFFLINE_MODE = ON in config file – failing over to a slave is the way forward) • On slaves, letting mysqld restart after such a crash leads to truncated binlogs ➢ After an OS crash, consider purging all binlogs on the recovered slave • Intermediate Masters (IM) are both masters and slaves ➢ After an OS crash, make sure no slaves reconnect to the recovered IM ➢ And consider purging all binary logs on it • Remember: GTID state is corrupted on slaves after an OS crash (Bug#92109)
  22. 22. Adding complexity with MTS [1 of 4] Multi-Threaded Slave (MTS) in MySQL 5.6 is doing out-of-order committing • Same for MySQL 5.7 with DATABASE and LOGICAL_CLOCK types • LOGICAL_CLOCK also has the slave_preserve_commit_order option (OFF by default in 5.7 and 8.0 :-| , with ON requiring log-slave-updates :-( ) (Bug#75396: Allow slave_preserve_commit_order without log-slave-updates) Example: transactions A, B, C, D, E on the master • On a slave, SHOW SLAVE STATUS points to B, so A is committed • C and E are also committed, B is running and D is pending scheduling (maybe B and D are in the same schema with DATABASE type) With out-of-order commit, a file+position in relay log info is not enough • GTID allows tracking complex position (generating temporary holes on slaves) • And there is the mysql.slave_worker_info table (https://dev.mysql.com/worklog/task/?id=5599: for more details)
  23. 23. Adding complexity with MTS [2 of 4] Without GTID, resuming replication after a crash requires filling the gaps in trx • Manual, error-prone, and not always possible before 5.6.31 and 5.7.13 (Bug#77496) • Now, automated by doing START SLAVE UNTIL SQL_AFTER_MTS_GAPS • But this needs relay logs, which might have vanished after an OS crash (Bug#81840)
  24. 24. Adding complexity with MTS [3 of 4] • Bug#81840 makes MTS with File+Position OS crash unsafe (safe for mysqld crash) • Hard to accept workaround: sync_relay_log = 1 (performance killer) • Full state in mysql.slave_worker_info → recovery possible with a lot of effort • The good solution would be a better relay log recovery (Bug#93081)
  25. 25. Adding complexity with MTS [4 of 4] With GTID, MTS in MySQL 5.6, 5.7 & 8.0 is replication crash safe: • But it needs MASTER_AUTO_POSITION = 1 (and relay log recovery Bug#92093) • And it comes with all the GTID “goodies” (rogue transactions, lsu for 5.6, …) • Also needs sync_binlog = 1 (in 5.7+, also works without binlogs or with lsu off) • And care with sync_binlog != 1 on the master (need to fail over if OS crash) (sync_binlog != 1 should not be needed in 95% of cases) (Group Commit and MTS make this optimisation almost obsolete) Example: A, B, C, D, E on the master with GTID 10, 11, 12, 13, 14: • GTID executed on the slave is 1-10:12:14 before a crash • Replication resumes by fetching 11:13:15… (after relay log recovery)
  26. 26. Adding complexity with MTS [4 of 4] With GTID, MTS in MySQL 5.6, 5.7 & 8.0 should be crash safe: • But it needs MASTER_AUTO_POSITION = 1 (and relay log recovery Bug#92093) • And it comes with all the GTID “goodies” (rogue transactions, lsu for 5.6, …) • Also needs sync_binlog = 1 (in 5.7+, also works without binlogs or with lsu off) • And care with sync_binlog != 1 on the master (need to fail over if OS crash) Bug#92882: MTS not replication crash-safe with GTID and all the right parameters (Only applies to Operating System crashes) Example: A, B, C, D, E on the master with GTID 10, 11, 12, 13, 14: • GTID executed on the slave is 1-10:12:14 before Operating System crash • Relay log recovery tries to “fill the gaps” but fails because relay logs are gone (This might be a regression from the fix of Bug#77496) (Easy workaround: stop slave; reset slave; start slave;)
  27. 27. Related subjects – Semi-Sync In this talk, we did not cover master failover explicitly: when a master crashes in an unrecoverable way, failover needs to happen. When failing over to a slave, committed transactions can be lost (some transactions on the crashed master might not have reached slaves) → violation of durability (ACID) in the replication topology (distributed system) Except if lossless semi-sync is used, more details in: Question about Semi-Synchronous Replication: the Answer with All the Details https://www.percona.com/community-blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/
  28. 28. Related subjects – MariaDB MariaDB still stores its master info and relay log info in files • But it stores GTID state of slaves in the mysql.gtid_slave_pos table ➢ MariaDB is replication crash safe when using GTID slave positioning Also, it has an interesting feature: • If using more than one storage engine, a single state table is not optimal • Having one such table per storage engine could be better Improving replication with multiple storage engines (MariaDB 10.3) https://kristiannielsen.livejournal.com/19223.html
  29. 29. Related subjects – Pseudo GTIDs Pseudo-GTIDs: • A way to get GTID-like features without GTIDs • They work with any version of MySQL/MariaDB (even 5.5) • But they assume in-order-commit → does not work with MTS They can provide slave replication crash safety: • With log-slave-updates and sync_binlog = 1 • Even on MySQL 5.5 or MariaDB 5.5 https://github.com/github/orchestrator/blob/master/docs/pseudo-gtid.md
  30. 30. Conclusion • It is complicated and it depends… • It has many edge cases • It might still change as bugs are fixed • And hopefully improvements will be made • So sorry: there is no short version
  31. 31. Conclusion [2 of 5] Some parameters never impact/improve Replication Crash Safety: • master-info-repository, sync_master_info, sync_relay_log_info Some parameters are always needed for Replication Crash Safety: • relay-log-info-repository = TABLE • relay-log-recovery = 1
  32. 32. Conclusion [3 of 5] MySQL 5.6 with GTID (with and without MTS) → crash safe slave if: • All above with sync_binlog = 1 (not default) and MASTER_AUTO_POSITION = 1 and maybe a “stop slave; reset slave; start slave;” (Bug#92882) MySQL 5.6 without GTID and with MTS → not always crash safe slaves: • OK for MySQL crashes as relay logs are not lost • For OS crashes, losing the relay logs leads to replication breakage (Bug#81840) • Possible to recover with some voodoo and dark magic (Bug#93081) For master and slaves, binlogs cannot be trusted if sync_binlog != 1
  33. 33. Conclusion [4 of 5] MySQL 5.7 is mostly the same as 5.6: • sync_binlog = 1 is the default :-) • Will be crash safe with GTID and sync_binlog != 1 when Bug#92109 fixed • LOGICAL_CLOCK with slave_preserve_commit_order like single-threaded • Without slave_preserve_commit_order, same as MTS in 5.6 MySQL 8.0 is mostly the same as 5.7 with safer defaults: • relay-log-info-repository = TABLE :-) • relay-log-recovery = 1 :-) • But default for slave_preserve_commit_order is still 0 :-|
  34. 34. Conclusion [5 of 5] Take care with MTS as it has many traps And in all cases: • Relay log recovery needs to re-download relay logs from the master • High load in case of lagging (or delayed) slaves :-( • Will fail if the binary logs were purged from the master :-( • Relay log recovery also fails for MTS and OS crashes (vanished relay logs) :-( :-( :-( We need a better Relay Log Recovery! Bug#74321, Bug#74323, Bug#74324, Bug#81840, Bug#92882, Bug#93081
  35. 35. Links [1 of 3] Crash-Safe MySQL Replication - A Visual Guide https://hackmongo.com/post/crash-safe-mysql-replication-a-visual-guide/ (diagrams in this talk are inspired by this post) Jean-François’s blog posts about Replication Crash Safety: • Better Crash-safe replication for MySQL https://medium.com/booking-com-infrastructure/better-crash-safe-replication-for-mysql-a336a69b317f • Replication crash safety with MTS in MySQL 5.6 and 5.7: reality or illusion? https://jfg-mysql.blogspot.com/2016/01/replication-crash-safety-with-mts.html • A discussion about sync-master-info and other replication parameters https://jfg-mysql.blogspot.com/2016/08/discussion-about-sync-master-info-and-replication-parameters.html • On the consequences of sync_binlog != 1 (part #1) https://jfg-mysql.blogspot.com/2018/10/consequences-sync-binlog-neq-1-part-1.html
  36. 36. Links [2 of 3] Directly related bugs: • Bug#70669: Slave can't continue repl. after master's recovery (old – 5.6.14, and fixed – 5.6.17) • Bug#70659: Make crash safe slave work with gtid + less durable settings • Bug#74321: Execute relay-log-recovery only when needed • Bug#74323: Avoid overloading the master NIC on relay-log-recovery of a lagging slave • Bug#74324: Make keeping relay logs (relay_log_purge = 0) crash safe • Bug#77496: Replication position lost after crash on MTS configured slave (really fixed ?) • Bug#81840: Automatic Replication Recovery Does Not Handle Lost Relay Log Events • Bug#92093: Replication crash safety needs relay_log_recovery even with GTID • Bug#92109: Please make replication crash safe with GITD and less durable setting (bis) • Bug#92882: MTS not replication crash-safe with GTID and all the right parameters • Bug#93081: Please implement a better relay log recovery Somehow related bugs: • Bug#75396: Allow slave_preserve_commit_order without log-slave-updates • Bug#92891: Please make relay_log_space_limit dynamic
  37. 37. Links [3 of 3] • Question about Semi-Synchronous Replication: the Answer with All the Details https://www.percona.com/community-blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/ • Improving replication with multiple storage engines (in MariaDB 10.3) https://kristiannielsen.livejournal.com/19223.html • Pseudo-GTID and Orchestrator: https://github.com/github/orchestrator/blob/master/docs/pseudo-gtid.md https://speakerdeck.com/shlominoach/pseudo-gtid-and-easy-mysql-replication-topology-management • The Full MySQL and MariaDB Parallel Replication Tutorial https://www.slideshare.net/JeanFranoisGagn/the-full-mysql-and-mariadb-parallel-replication-tutorial • Arg: relay_log_space_limit is (still) not dynamic ! https://jfg-mysql.blogspot.com/2018/10/arg-relay-log-space-limit-is-still-not-dynamic.html • Evaluating MySQL Parallel Replication Part 2: Slave Group Commit https://medium.com/booking-com-infrastructure/evaluating-mysql-parallel-replication-part-2-slave-group-commit-459026a141d2 • Evaluating MySQL Parallel Replication Part 4: More Benchmarks in Production https://medium.com/booking-com-infrastructure/evaluating-mysql-parallel-replication-part-4-more-benchmarks-in-production-49ee255043ab
  38. 38. Thanks ! Presented at Percona Live Europe 2018 in Frankfurt by Jean-François Gagné Senior Infrastructure Engineer / System and MySQL Expert jeanfrancois AT messagebird DOT com
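
The MTS example used in the talk (transactions A to E with GTIDs 10 to 14, and an executed set of 1-10:12:14 on the slave before a crash) can be sketched in a few lines of Python. This is only an illustration of the "temporary holes" idea: the function names are hypothetical, the server UUID prefix of a real GTID set is left out, and only the interval part is parsed.

```python
def parse_intervals(spec):
    """Parse an interval list such as '1-10:12:14' into sorted
    (start, end) pairs, both ends inclusive.

    Assumption: `spec` is only the interval part of a GTID set;
    the leading server UUID is omitted for brevity.
    """
    intervals = []
    for part in spec.split(":"):
        start, _, end = part.partition("-")
        intervals.append((int(start), int(end or start)))
    return sorted(intervals)


def gaps(spec):
    """Return the transaction numbers missing between executed intervals."""
    intervals = parse_intervals(spec)
    missing = []
    # Walk over neighbouring intervals and collect what lies between them.
    for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
        missing.extend(range(prev_end + 1, next_start))
    return missing


print(gaps("1-10:12:14"))  # → [11, 13]
```

The two holes, 11 and 13, are the transactions the slave still has to fetch and apply (together with anything from 15 onwards) when replication resumes.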
