Lightning talk: highly scalable databases and the PACELC theorem

Lightning talk I gave at Headspring's Friday brown bag event on 3/17/17. Basically a summary of what I've been blogging about at bardoloi.com

The content of these slides comes from much greater sources than me; to go to the original sources, you can start by looking at the References section at the end.

Published in: Technology

  1. Constraints of Highly Scalable Databases
  2. Part 1. Traditional Databases: recap of the ACID constraints
  3. “Traditional” databases operate with the transaction paradigm, which guarantees certain properties: (A) Atomicity, (C) Consistency, (I) Isolation, (D) Durability
  4. The ACID Guarantees
     1. Atomicity: Each transaction must be “all or nothing”. If any part fails, the whole transaction must be rolled back as if it never happened.
     2. Consistency: The end state of a transaction must follow all the rules defined in the database: data constraints, cascades, triggers, etc.
     3. Isolation: The result of two concurrent transactions should be the same as if they had run one after the other.
     4. Durability: A transaction, once committed, survives permanently even if the system fails. This includes disk crashes, power outages, etc.
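
As a rough illustration of the atomicity guarantee, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names alice and bob, and the transfer helper are made up for this example and are not from the talk. The credit is applied first on purpose, so that a failed debit leaves something to roll back.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint on overdraft.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) on a fresh database returns False, and the credit to bob that had already been applied inside the transaction is undone along with the failed debit.
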
  5. How do they do this?
     Locking
     ● Read / write / range locks
     Concurrency Control
     ● 2-phase commit (2PC), 3PC protocols
     ● Distributed locks
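
To make the 2-phase commit idea more concrete, here is a toy in-memory sketch. The Participant class and the two_phase_commit function are hypothetical names for illustration only; a real protocol also has to handle timeouts, crashes, and durable logging, none of which is modeled here.

```python
class Participant:
    """A toy resource manager that votes in phase 1 and obeys in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work could be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every participant votes yes."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]

    # Phase 2 (completion): one "no" vote aborts the whole transaction.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

The coordinator has to wait for every vote before anything commits, which hints at why this style of coordination becomes expensive as the number of nodes and the distance between them grow.
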
  6. But then came the 2000s
  7. And Scale Happened
  8. Traditional RDBMSs were not designed for the needs of modern web applications
     Global Scale: Netflix knows which movies you watched, when, at what point(s) you paused and for how long, etc. It then replicates that data across 3 global data centers.
     Volume: In 2008, Facebook had only 100 million users and needed 8,000 shards of MySQL. Today it has ~1.86 billion users.
     Speed: In 2013, Twitter averaged about 5,700 new tweets/second, with a recorded peak of more than 143,000 tweets/second.
  9. What to do? Scale up! (?)
     - Increase memory, cores, CPU
     - Cache reads with memcached
     - Master-slave replication
     - Sharding
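
As one possible illustration of sharding, the sketch below routes each key to a shard by hashing it. The shard_for function and the ShardedStore class are hypothetical, and plain dicts stand in for separate database servers. Hash-modulo routing is only one of several sharding schemes.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one database shard."""

    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

A known weakness of this scheme is that changing the number of shards remaps most keys, so any query that spans shards, or any change to the shard count, has to be handled by the application.
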
  10. NOT ENOUGH
  11. Part 2. Redefining Constraints: replacing ACID with BASE
  12. “DBMS research is about ACID (mostly). But we forfeit “C” and “I” for availability, graceful degradation, and performance. This tradeoff is fundamental.” - Eric Brewer, 2000
  13. Eric Brewer proposed a new set of properties: BASE
      Basically Available: The system is always available for clients (but may not be consistent).
      Soft State: The database is no longer in charge of “valid” data state. The app is now responsible.
      Eventual consistency: If all goes well, all clients will eventually see the same thing. Probably.
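
Eventual consistency can be sketched with two toy replicas that accept writes independently and reconcile later. The Replica class, the anti_entropy function, and the last-write-wins rule based on caller-supplied timestamps are all assumptions made for this example; real systems use a variety of reconciliation strategies.

```python
class Replica:
    """Toy replica storing key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        # Last write wins: keep the entry with the newest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange entries in both directions so the replicas converge."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)
```

Until anti_entropy runs, a client reading from the replica that missed the latest write gets a stale answer; after it runs, both replicas return the same value.
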
  14. In the world of BASE parameters, a different set of priorities rules
      ● Availability is most important
      ● Weak consistency (i.e. stale data) is okay
      ● Approximate answers are okay
      ● Aggressive (optimistic) algorithms are okay
      ● Simple, fast, easy evolution of the schema is important
  15. A new set of constraints: the CAP Theorem
      It is impossible for a distributed computer system to simultaneously provide more than 2 of these 3 guarantees:
      ● Consistency
      ● Availability
      ● Partition tolerance
      (Eric Brewer, 1998-2000)
  16. The CAP Parameters
      1. Consistency*: All clients get the same view of the data, or they get an error (i.e. every read receives the most recent write).
      2. Availability: All clients can always read and always write (i.e. every request receives a non-error response).
      3. Partition tolerance: The system functions even if some nodes are unavailable (i.e. the system operates despite an arbitrary number of messages being dropped by the network between nodes).
  17. All NoSQL databases live somewhere on the spectrum between ACID and BASE, based on how they’re tuned
      ● What levels of availability do you choose to provide?
      ● What levels of consistency do you choose to provide?
      ● What do you do when a partition is detected?
      ● How do you recover from a partition event?
  18. But wait… we’re not through yet
  19. 2010: Daniel Abadi (Yale) says CAP is misleading
      The trade-offs defined by CAP’s “pick any 2” are misleading:
      ● The only time you need to make a trade-off is when there is a partition event (P)
      ● Systems that sacrifice C must do so all the time
      ● But systems that sacrifice A only need to do so when there’s a partition
      Most importantly, you don’t give up C to gain A. You give up C to get another missing ingredient: L
  20. LATENCY: Latency = how long must a client request wait for your response?
  21. Imagine replicating data across global data centers [Diagram: Data Center 1, Data Center 2, Data Center 3, Data Center 4, Data Center 5, ..., Data Center n]
  22. “A high availability requirement implies that the system must replicate data. But as soon as a distributed system replicates data, a tradeoff between consistency and latency arises.” - Abadi, 2010
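
The consistency/latency tradeoff in the quote can be put into rough numbers with a small model. The function write_latency_ms and the round-trip times in the usage note are hypothetical, and the model assumes the writer contacts all replicas in parallel and returns once enough of them have acknowledged.

```python
def write_latency_ms(local_ms, replica_rtts_ms, acks_required):
    """Client-visible write latency when waiting for replica acknowledgements.

    local_ms:        time to apply the write locally
    replica_rtts_ms: round-trip time to each remote replica
    acks_required:   how many remote acknowledgements to wait for
                     (0 = fully asynchronous replication)
    """
    if not 0 <= acks_required <= len(replica_rtts_ms):
        raise ValueError("acks_required must be between 0 and the replica count")
    if acks_required == 0:
        return local_ms
    # Replicas are contacted in parallel, so the wait is set by the
    # k-th fastest acknowledgement, not by the sum of all round trips.
    return local_ms + sorted(replica_rtts_ms)[acks_required - 1]
```

With made-up round trips of 20, 80 and 150 ms and a 2 ms local write, waiting for no acknowledgements costs 2 ms, while waiting for all three costs 152 ms. The stronger the consistency the write demands, the more of the slowest replica's latency the client has to absorb.
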
  23. The PACELC theorem (Abadi, 2010)
      In a system that replicates data:
      If a partition (P) is detected, how does the system trade off
        ○ (A) Availability or
        ○ (C) Consistency
      Else (E), how does the system trade off
        ○ (L) Latency or
        ○ (C) Consistency
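
The if/else shape of PACELC maps naturally onto a small decision function. The sketch below is only an illustration: PacelcPolicy, handle_read, and the returned strings are hypothetical and do not describe how any particular database behaves.

```python
from dataclasses import dataclass
from enum import Enum


class OnPartition(Enum):
    AVAILABILITY = "A"
    CONSISTENCY = "C"


class Otherwise(Enum):
    LATENCY = "L"
    CONSISTENCY = "C"


@dataclass(frozen=True)
class PacelcPolicy:
    on_partition: OnPartition  # choice made when a partition is detected
    otherwise: Otherwise       # choice made during normal operation

    @property
    def label(self) -> str:
        return f"P{self.on_partition.value}/E{self.otherwise.value}"


def handle_read(policy: PacelcPolicy, partitioned: bool) -> str:
    """Describe how a replica with this policy would answer a read."""
    if partitioned:
        if policy.on_partition is OnPartition.AVAILABILITY:
            return "serve local copy (may be stale)"
        return "reject request (error)"
    if policy.otherwise is Otherwise.LATENCY:
        return "serve local copy (may be stale)"
    return "wait for replicas to agree, then serve"
```

A policy built as PacelcPolicy(OnPartition.AVAILABILITY, Otherwise.LATENCY) has the label PA/EL, and it answers from the local copy in both branches. This matches the earlier point that a system giving up C tends to do so all the time, not only during partitions.
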
  24. Comparing NoSQL databases using PACELC
      Table columns: DDBS | P+A | P+C | E+L | E+C
      Table rows: Dynamo, Cassandra, Riak | Mongo, H-Store, VoltDB | Yahoo! PNUTS
  25. References
      Images and title ideas from:
        ○ http://blog.nahurst.com/visual-guide-to-nosql-systems
        ○ http://digbigdata.com/know-thy-cap-theorem-for-nosql/
      Detailed references at:
        ○ http://www.bardoloi.com/blog/2017/03/06/pacelc-theorem/
  26. Thanks! Any questions? You can find me at @bardoloi
