The Database Designer’s Modern Day Cookbook

Preetam Jinka has been learning about database internals by studying the theory, seeing how other systems have implemented ideas, and trying to implement them himself. It’s not always easy, and he’s not building anything that compares to the scale of MySQL or MongoDB, but reinventing a few small wheels has given him a better understanding of how bigger systems work. In this breakout session at Percona Live 2017, Preetam talks about how fundamental design choices regarding immutability, transactions, and ACID lead to interesting trade-offs and implementation differences. He also talks about replication, which can be as simple as sending the same queries to multiple servers, and about how stronger guarantees can lead systems to more complicated approaches like synchronous replication and consensus.

Published in: Software

  1. 1. The Database Designer’s Modern-Day Cookbook Preetam Jinka Software Engineer Percona Live 2017
  2. 2. VividCortex’s database monitoring application is the best way to improve your database performance, efficiency, and uptime. Supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora, VividCortex uses patented algorithms to reveal key insights, helping users fix performance problems before they impact customers. Say hello and see a demo, Booth #205. We’re hiring!
  3. 3. Topics ● Immutability ● Transactions & ACID ● Replication 3
  4. 4. Topics ● Immutability ● Transactions & ACID ● Replication But mainly about trade-offs. 4
  5. 5. Immutability 5
  6. 6. 6 Immutability Not changing something once it’s created.
  7. 7. An immutable database? Just use a log. ● Write optimized ● Transactional ● Everything else is just a read optimization, right? ● Space might become a problem... 7
  8. 8. Something more realistic ● Two ways to update data. ○ In-place ○ Copy-on-write (the immutable approach) 8
  9. 9. Concurrency Databases need to handle multiple readers and writers. And they need to provide certain guarantees (transactions). 9
  10. 10. In-place updates with ARIES Algorithms for Recovery and Isolation Exploiting Semantics ● Write-ahead logging ● Redo logs ● Undo logs Systems like MySQL, Oracle, SQL Server, DB2 use something like ARIES to manage transactions. 10
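The write-ahead, redo, and undo ideas on this slide can be illustrated with a toy example. This is only a sketch under heavy assumptions, not ARIES itself: `TinyWalStore` and `recover` are hypothetical names, everything lives in memory, and there are no checkpoints, log sequence numbers, or locks. It shows only the ordering rule (log before updating in place) and a recovery that redoes every update and then undoes the work of uncommitted transactions.

```python
class TinyWalStore:
    """Toy in-place key-value store with a write-ahead log (hypothetical)."""

    def __init__(self):
        self.pages = {}  # stands in for on-disk data, updated in place
        self.log = []    # write-ahead log, appended to before pages change

    def write(self, txid, key, new_value):
        old_value = self.pages.get(key)  # None means "key did not exist"
        # Log first: the old value is the undo information,
        # the new value is the redo information.
        self.log.append(("update", txid, key, old_value, new_value))
        self.pages[key] = new_value

    def commit(self, txid):
        self.log.append(("commit", txid))


def recover(log):
    """Rebuild state from the log alone, as if the pages were lost in a crash."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    pages = {}
    # Redo pass: replay every logged update in order.
    for rec in log:
        if rec[0] == "update":
            _, _txid, key, _old, new = rec
            pages[key] = new
    # Undo pass: walk backwards, rolling back updates that never committed.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _txid, key, old, _new = rec
            if old is None:
                pages.pop(key, None)
            else:
                pages[key] = old
    return pages


store = TinyWalStore()
store.write(1, "a", 1)
store.commit(1)
store.write(2, "a", 2)   # transaction 2 never commits
store.write(2, "b", 5)
recovered = recover(store.log)
```

After the simulated crash, `recovered` holds only the effects of transaction 1, because transaction 2's updates are rolled back from the undo information in the log.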
  11. 11. Copy-on-write with row versioning ● Systems like PostgreSQL create a copy of data when it needs to be changed. ● Immutability is inherently free of data races! ● But you need to get rid of old versions through vacuuming. ● You also need to manage the overhead of multiple versions. ● This is why systems like PostgreSQL don’t have an “undo log.” 11
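A rough sketch of row versioning and vacuuming, assuming a single logical clock and no concurrent transactions. `VersionedTable` and its methods are hypothetical names for illustration, and this does not reflect how PostgreSQL stores tuples. Writes never modify an existing version, readers pick the newest version visible to their snapshot, and `vacuum` drops versions that no active snapshot can see.

```python
class VersionedTable:
    """Toy copy-on-write table: each write appends a new row version."""

    def __init__(self):
        self.versions = {}  # key -> list of (created_at, value), oldest first
        self.clock = 0      # logical timestamp, bumped on every write

    def write(self, key, value):
        self.clock += 1
        # Existing versions are never touched; a new one is appended.
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        # A snapshot is just the clock value at the time it was taken.
        return self.clock

    def read(self, key, snapshot):
        visible = None
        for created_at, value in self.versions.get(key, []):
            if created_at <= snapshot:
                visible = value  # later matches overwrite earlier ones
        return visible

    def vacuum(self, oldest_active_snapshot):
        """Drop versions no snapshot >= oldest_active_snapshot can read."""
        removed = 0
        for key, chain in list(self.versions.items()):
            keep_from = 0
            for i, (created_at, _value) in enumerate(chain):
                if created_at <= oldest_active_snapshot:
                    keep_from = i  # newest version still visible to the oldest reader
            removed += keep_from
            self.versions[key] = chain[keep_from:]
        return removed


table = VersionedTable()
table.write("x", "v1")
old_snapshot = table.snapshot()
table.write("x", "v2")
table.write("x", "v3")
seen_by_old_reader = table.read("x", old_snapshot)
seen_by_new_reader = table.read("x", table.snapshot())
```

The old reader keeps seeing `v1` without any locking, which is the data-race freedom the slide mentions; the cost is that `v1` and `v2` stay around until `vacuum` is run with a snapshot that no longer needs them.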
  12. 12. > … PG must do more work at commit time, right? No. Commit and abort are both O(1). Where we pay the piper is in having to run VACUUM to clean up no-longer-needed row versions. This is a better design in principle, because the necessary maintenance can be done in background processes rather than making clients wait for transactions to finish. In practice, it's still pretty annoying, just in different ways than Oracle's UNDO. http://www.postgresql-archive.org/PG-and-undo-logging-td5850789.html 12
  13. 13. Secondary indexes ● Secondary indexes need to point to the original row. ● For MySQL, you just need the primary key. ● For PostgreSQL, you need the primary key and a version. ○ Primary keys aren’t unique because there could be different row versions! 13
  14. 14. What’s better? It’s a trade-off! 14
  15. 15. Uber’s Migration to MySQL Uber migrated from PostgreSQL to MySQL. Their reasons: ● Inefficient architecture for writes ● Inefficient data replication ● Issues with table corruption ● Poor replica MVCC support ● Difficulty upgrading to newer releases https://eng.uber.com/mysql-migration/ In other words: PostgreSQL probably didn’t have a set of trade-offs that worked well for them. 15
  16. 16. Transactions & ACID 16
  17. 17. 17 Transactions are complicated! MVCC, ARIES, UNDO logs, REDO logs, row locks, isolation levels, ACID, write skew, snapshot isolation, write-ahead log, consistency, commit
  18. 18. This is about making them simple. 18
  19. 19. ACID transactions 19 ● Atomicity ● Consistency ● Isolation ● Durability
  20. 20. What do you need for ACID? 1. A snapshot view of the data 2. Durable, atomic writes ● Immutability makes #1 easier. ● Single writer makes #2 easier. You can get both from a log. 20
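The claim that a log gives you both ingredients can be sketched as follows, assuming an in-memory list stands in for the log, so durability is not actually modeled here. `LogStore` is a hypothetical name. A snapshot is just a log length, and a single writer lock makes each batch append all-or-nothing.

```python
import threading


class LogStore:
    """Toy append-only store: snapshots are log offsets, one writer at a time."""

    def __init__(self):
        self._log = []                         # committed batches, oldest first
        self._writer_lock = threading.Lock()   # enforces a single writer

    def commit(self, batch):
        """Append a batch of {key: value} updates as one log entry."""
        with self._writer_lock:
            # One append is the whole commit: readers see all of it or none.
            self._log.append(dict(batch))
            return len(self._log)

    def snapshot(self):
        # Entries are immutable once appended, so an offset is a stable view.
        return len(self._log)

    def read(self, key, snapshot):
        # Scan backwards from the snapshot for the latest value of the key.
        for batch in reversed(self._log[:snapshot]):
            if key in batch:
                return batch[key]
        return None


store = LogStore()
store.commit({"a": 1, "b": 2})
snap = store.snapshot()
store.commit({"a": 10})
value_in_snapshot = store.read("a", snap)
value_latest = store.read("a", store.snapshot())
```

Reads scan the log, which is why the earlier slide calls everything beyond the log "just a read optimization": an index would speed up `read` without changing what it returns.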
  21. 21. ...but transactions & ACID in the real world tend to be much more complicated... ...because not everything uses immutability, and most systems are not single writer. 21
  22. 22. Replication 22
  23. 23. 23 Replication Copying data to several places.
  24. 24. 24 Replication choices ● Asynchronous ● Synchronous ○ Semi-synchronous is another option with MySQL As usual… trade-offs.
  25. 25. Replication spectrum 25 ● Synchronous: most guarantees, least flexible ● Semi-sync: in between ● Asynchronous: least guarantees, most flexible
  26. 26. Synchronous ● Requires coordination at the master ○ Coordination can get complicated... ● There’s waiting involved ○ Replica lag doesn’t exist ● Safe 26
  27. 27. Synchronous ● You need some sort of “master” or “leader” server handling coordination. ● Leader election and consensus are ways of selecting a master automatically. ○ Paxos is a consensus algorithm that’s widely used. MySQL Group Replication uses a variant in its multi-master approach. 27
  28. 28. Synchronous 28 Master pushes to replicas (diagram: one master, two replicas)
  29. 29. Asynchronous 29 Replicas pull from the master (diagram: one master, two replicas)
  30. 30. Asynchronous 30 ● Less coordination ● Pro: Master doesn’t wait for replicas. ○ It’s faster because there’s no waiting. ● Con: Master doesn’t wait for replicas. ○ Replicas can fall behind. ● Delayed replicas can be really useful when things go wrong!
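The pull model and the resulting replica lag can be sketched as below. `Master`, `Replica`, and `pull` are hypothetical names, and the replication log is an in-memory list rather than anything like a real binary log. The master returns from `write` without contacting replicas, and each replica tracks its own position in the master's log.

```python
class Master:
    """Toy master: applies writes locally and records them in a log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # returns without waiting for replicas


class Replica:
    """Toy replica that pulls changes from the master at its own pace."""

    def __init__(self, master):
        self.master = master
        self.position = 0  # how much of the master's log has been applied
        self.data = {}

    def lag(self):
        # Number of log entries the replica has not applied yet.
        return len(self.master.log) - self.position

    def pull(self, max_entries=None):
        end = len(self.master.log)
        if max_entries is not None:
            end = min(end, self.position + max_entries)
        for key, value in self.master.log[self.position:end]:
            self.data[key] = value
        applied = end - self.position
        self.position = end
        return applied


master = Master()
replica = Replica(master)
master.write("a", 1)
master.write("a", 2)
lag_before_pull = replica.lag()
replica.pull(max_entries=1)          # replica catches up partially
stale_value = replica.data["a"]
replica.pull()                       # replica catches up fully
```

While the replica is behind, it serves `stale_value` rather than the master's latest value, which is the "replicas can fall behind" cost named on the slide. A synchronous design would instead have the master wait inside `write` until replicas had applied the change.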
  31. 31. Disaster recovery with a delayed replica 31 DigitalOcean’s April 2017 Outage: “Within three minutes of the initial alerts, we discovered that our primary database had been deleted. Four minutes later we commenced the recovery process, using one of our time-delayed database replicas. Over the next four hours, we copied and restored the data to our primary and secondary replicas.” https://www.digitalocean.com/company/blog/update-on-the-april-5th-2017-outage/
  32. 32. The diagrams look similar but they’re very different. 32 ● Asynchronous: replicas pull from the master ● Synchronous: master pushes to replicas
  33. 33. Use the right tool for the job. 33
  34. 34. Final thoughts 34 ● There are trade-offs everywhere. ● You’re not limited to a single technology or implementation. ● Things keep getting more exciting.
  35. 35. Questions? 35