
CAP, PACELC, and Determinism



  1. CAP, PACELC, and Determinism
     Daniel Abadi, Yale University
     January 18th, 2011
     Twitter: @daniel_abadi
     Blog:
  2. MIT Theorem
  3. CAP Theorem
  4. Intuition Behind Proof (diagram: DB 1 and DB 2 connected over a network)
  5. Intuition Behind Proof (diagram: the network between DB 1 and DB 2 is partitioned)
  6. Problems with CAP
     - Not as elegant as the MIT theorem
  7. MIT Theorem
  8. CAP Theorem
  9. Problems with CAP
     - Not as elegant as the MIT theorem
       - There are not three different choices!
       - CA and CP are indistinguishable
     - Used as an excuse to not bother with consistency
       - "Availability is really important to me, so CAP says I have to get rid of consistency"
  10. You can have A and C when there are no partitions! (diagram: DB 1 and DB 2 connected over a network)
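The intuition in slides 4 through 10 can be made concrete with a small simulation. This is a minimal Python sketch (not from the slides; all names are illustrative) of the choice a two-replica system faces when a partition appears: with no partition, a write can commit everywhere (both A and C); during a partition, the system must either accept the write on one side (available, possibly inconsistent) or reject it (consistent, unavailable).

```python
# Hypothetical two-replica sketch of the CAP trade-off during a partition.

class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None

def write(replicas, partitioned, value, choose_availability):
    """Attempt a replicated write; behavior during a partition depends on the choice."""
    if not partitioned:
        for r in replicas:          # no partition: both A and C are achievable
            r.value = value
        return "committed everywhere"
    if choose_availability:
        replicas[0].value = value   # accept the write locally: available, not consistent
        return "committed locally (replicas may diverge)"
    return "rejected (stay consistent, sacrifice availability)"

db1, db2 = Replica("DB1"), Replica("DB2")
print(write([db1, db2], partitioned=False, value=1, choose_availability=True))
print(write([db1, db2], partitioned=True, value=2, choose_availability=True))
print(db1.value, db2.value)  # 2 1 -- the replicas have diverged
```

Note that the availability/consistency choice only arises on the `partitioned` branch, which is exactly the asymmetry the next slide points out.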
  11. Source of Confusion
      - Asymmetry of the CAP properties
        - Some are properties of the system in general
        - Some are properties of the system only when there is a partition
  12. Another Problem to Fix
      - There are other costs to consistency (besides availability in the face of partitions)
        - Overhead of synchronization schemes
          - A deterministic execution model significantly reduces this cost (see Thomson and Abadi, VLDB 2010)
        - Latency
          - If the workload is geographically partitionable, latency is not so bad
          - Otherwise, there is no way to get around at least one round-trip message
  13. A Cut at Fixing Both Problems: PACELC
      - In the case of a partition (P), does the system choose availability (A) or consistency (C)?
      - Else (E), does the system choose latency (L) or consistency (C)?
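The two PACELC questions compose into a four-way classification. A tiny illustrative helper (not from the slides) makes the label structure explicit:

```python
# Illustrative composition of a PACELC label from the two choices on this slide.

def pacelc(partition_choice, else_choice):
    """P + (A or C) for the partition case, E + (L or C) for the else case."""
    assert partition_choice in ("A", "C") and else_choice in ("L", "C")
    return f"P{partition_choice}/E{else_choice}"

# Example classifications taken from the next slide:
systems = {
    "Dynamo": pacelc("A", "L"),        # available under partitions, low latency otherwise
    "ACID RDBMS": pacelc("C", "C"),    # consistent in both cases
    "GenieDB": pacelc("A", "C"),
}
print(systems["Dynamo"])  # PA/EL
```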
  14. Examples
      - PA/EL
        - Dynamo, SimpleDB, Cassandra, Riptano, CouchDB, Cloudant
      - PC/EC
        - ACID-compliant database systems
      - PA/EC
        - GenieDB
        - See the CIDR paper from Wada, Fekete, et al.
          - Indicates that the Google App Engine datastore (eventually consistent option) falls under this category
      - PC/EL: existence is debatable, see next slide
  15. Does PC/EL exist?
      - Strengthening consistency when there is a partition doesn't seem to make sense
      - I think about it the following way
        - Some EL systems have stronger consistency guarantees than eventual consistency
          - Monotonic reads consistency
          - Timeline consistency
          - Read-your-writes consistency
          - Non-atomic consistency
        - CAP presents a problem for these guarantees
        - Hence, partitioning causes some loss of availability
        - Examples: PNUTS/Sherpa (original option), MongoDB
      - Dan Weinreb makes a case that this is too confusing:
  16. A Case for P*/EC
      - Increased push for horizontally scalable transactional database systems
        - Reasons:
          - Cloud computing
          - Distributed applications
          - Desire to deploy applications on cheap, commodity hardware
      - The vast majority of currently available horizontally scalable systems are P*/EL
        - Developed by engineers at Google, Facebook, Yahoo, Amazon, etc.
        - These engineers can handle reduced consistency, but it's really hard, and there needs to be an option for the rest of us
      - Also
        - Distributed concurrency control and commit protocols are expensive
        - Once consistency is gone, atomicity usually goes next -> NoSQL
  17. Key Problems to Overcome
      - High availability is critical, so replication must be a first-class citizen
        - Today's systems generally act, then replicate
          - Complicates the semantics of sending read queries to replicas
          - Need confirmation from a replica before commit (increased latency) if you want durability and high availability
          - In-progress transactions must be aborted upon a master failure
        - We want a system that replicates, then acts
      - Distributed concurrency control and commit are expensive; we want to get rid of both
  18. Key Idea
      - Instead of weakening ACID, strengthen it!
        - Guaranteeing equivalence to SOME serial order makes active replication difficult
          - Running the same set of xacts on two different replicas might cause the replicas to diverge
      - Disallow any nondeterministic behavior
      - Disallow aborts caused by the DBMS
        - Disallow deadlock
        - Distributed commit is much easier if you don't have to worry about aborts
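The key idea above can be sketched in a few lines of Python (an illustrative toy, not the actual protocol from the VLDB 2010 paper): a sequencing layer fixes one total order of transactions before execution, and every replica executes that order serially, so no replica-local choice (lock timing, deadlock resolution, spurious aborts) can cause replicas to diverge.

```python
# Minimal sketch of deterministic execution: order first, then execute.

def sequence(transactions):
    """The sequencing layer assigns a total order up front (here: arrival order)."""
    return list(enumerate(transactions))

def execute(store, ordered_txns):
    """Execute transactions serially in the agreed order; no nondeterminism allowed."""
    for _, txn in ordered_txns:
        txn(store)
    return store

ordered = sequence([
    lambda s: s.__setitem__("x", s.get("x", 0) + 1),   # x = 1
    lambda s: s.__setitem__("x", s["x"] * 10),         # x = 10
])

# Two replicas given the same ordered input reach the same final state.
replica1 = execute({}, ordered)
replica2 = execute({}, ordered)
print(replica1 == replica2)  # True
```

A serializable-but-nondeterministic system would let each replica pick its *own* serial order, which is exactly how they can diverge.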
  19. Consequences of Determinism
      - Replicas produce the same output given the same input
        - Facilitates active replication
      - Only the initial input needs to be logged; state at failure can be reconstructed from this input log (or from a replica)
      - Active distributed xacts are not aborted upon node failure
        - Greatly reduces (or eliminates) the cost of distributed commit
          - Don't have to worry about nodes failing during the commit protocol
          - Don't have to worry about the effects of a transaction making it to disk before promising to commit the transaction
          - Just need one message from any node that potentially can deterministically abort the xact
          - This message can be sent in the middle of the xact, as soon as it knows it will commit
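One consequence above, that recovery needs only the input log, can be illustrated with a toy replay loop (illustrative Python, not from the talk): because execution is deterministic, replaying the logged inputs from scratch rebuilds exactly the pre-failure state.

```python
# Sketch: with determinism, only the input log needs durability; state after a
# crash can be rebuilt by replaying that log (or copied from a live replica).

def apply(state, op):
    """Apply one deterministic operation: add a delta to a key's counter."""
    key, delta = op
    state[key] = state.get(key, 0) + delta
    return state

input_log = [("x", 1), ("y", 2), ("x", 3)]   # the only thing that was logged

live = {}
for op in input_log:          # state as it existed before the "failure"
    apply(live, op)

recovered = {}
for op in input_log:          # replay the same log after the failure
    apply(recovered, op)

print(live == recovered)  # True -- reconstruction needs nothing but the input log
```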
  20. So how can this be done?
      - Serialize transactions
        - Slow
      - Partition data across cores, serialize transactions within each partition
        - My work with Stonebraker, Madden, and Harizopoulos shows that this works well if the workload is partitionable (H-Store project)
        - The H-Store project became VoltDB
      - What about non-partitionable workloads?
        - See the Thomson and Abadi VLDB 2010 paper: "The Case for Determinism in Database Systems"
      - Bottom line:
        - The cost of consistency is less than people think
        - Latency is still an issue over a WAN unless the workload is geographically partitionable
        - The VLDB 2010 paper makes P*/EC systems more viable
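The partition-per-core approach on this slide can be sketched as follows. This is an illustrative toy in Python, not H-Store/VoltDB code: single-partition transactions are routed by key to a partition's queue, and each partition executes its queue serially, so each partition is trivially deterministic with no cross-partition coordination.

```python
# Toy sketch of partition-per-core serial execution (the H-Store idea).

NUM_PARTITIONS = 4

def partition_of(key):
    """Route a transaction's key to a partition (illustrative hash routing)."""
    return hash(key) % NUM_PARTITIONS

def run(transactions):
    """transactions: list of (key, txn) pairs; each txn touches only its partition."""
    queues = [[] for _ in range(NUM_PARTITIONS)]
    for key, txn in transactions:           # route by key: no cross-partition locks
        queues[partition_of(key)].append(txn)
    stores = [{} for _ in range(NUM_PARTITIONS)]
    for store, queue in zip(stores, queues):
        for txn in queue:                   # serial execution within each partition
            txn(store)
    return stores

stores = run([(7, lambda s: s.__setitem__(7, "a")),
              (12, lambda s: s.__setitem__(12, "b"))])
print(any(s.get(7) == "a" for s in stores))  # True
```

The hard case, as the slide notes, is a transaction that spans partitions; that is what the VLDB 2010 paper's deterministic approach addresses.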