CAP, PACELC, and Determinism



  1. CAP, PACELC, and Determinism. Daniel Abadi, Yale University, January 18th, 2011. Twitter: @daniel_abadi Blog:
  2. MIT Theorem
  3. CAP Theorem
  4. Intuition Behind Proof (diagram: DB 1 and DB 2 connected by a network)
  5. Intuition Behind Proof (diagram: DB 1 and DB 2 separated by a network partition)
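The two-replica intuition can be sketched in a few lines of Python (all names are illustrative, not from any real system): during a partition, a write applied at DB 1 cannot propagate to DB 2, so DB 2 must either refuse to answer (giving up availability) or answer from stale local state (giving up consistency).

```python
class Replica:
    """A toy single-value replica; illustrative only."""
    def __init__(self):
        self.value = 0

    def write(self, value):
        self.value = value

    def read(self):
        return self.value

db1, db2 = Replica(), Replica()
partitioned = True  # the link between db1 and db2 is down

db1.write(42)  # the update cannot reach db2 during the partition

if partitioned:
    # Choosing C: db2 refuses to answer until the partition heals (loses A).
    # Choosing A: db2 answers from local state, possibly stale (loses C).
    stale = db2.read()

assert stale == 0 and db1.read() == 42  # the replicas have diverged
```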
  6. Problems with CAP
     - Not as elegant as the MIT theorem
  7. MIT Theorem
  8. CAP Theorem
  9. Problems with CAP
     - Not as elegant as the MIT theorem
       - There are not three different choices!
       - CA and CP are indistinguishable
     - Used as an excuse to not bother with consistency
       - "Availability is really important to me, so CAP says I have to get rid of consistency"
  10. You can have A and C when there are no partitions! (diagram: DB 1 and DB 2 connected by a network)
  11. Source of Confusion
     - Asymmetry of the CAP properties
       - Some are properties of the system in general
       - Some are properties of the system only when there is a partition
  12. Another Problem to Fix
     - There are other costs to consistency (besides availability in the face of partitions)
       - Overhead of synchronization schemes
         - A deterministic execution model significantly reduces the cost (see Thomson and Abadi in VLDB 2010)
       - Latency
         - If the workload is geographically partitionable, latency is not so bad
         - Otherwise, there is no way to get around at least one round-trip message
  13. A Cut at Fixing Both Problems
     - PACELC
       - In the case of a partition (P), does the system choose availability (A) or consistency (C)?
       - Else (E), does the system choose latency (L) or consistency (C)?
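Because PACELC is exactly two independent choices, it can be encoded as a pair; the small sketch below (a hypothetical encoding, just for illustration) derives the four possible classifications from the two choices.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pacelc:
    """Hypothetical encoding of PACELC as two independent trade-off choices."""
    on_partition: str  # "A" (availability) or "C" (consistency) during a partition
    otherwise: str     # "L" (latency) or "C" (consistency) in normal operation

    def label(self) -> str:
        return f"P{self.on_partition}/E{self.otherwise}"

# The two binary choices yield exactly four classifications:
labels = {Pacelc(p, e).label() for p in "AC" for e in "LC"}
assert labels == {"PA/EL", "PA/EC", "PC/EL", "PC/EC"}
```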
  14. Examples
     - PA/EL
       - Dynamo, SimpleDB, Cassandra, Riptano, CouchDB, Cloudant
     - PC/EC
       - ACID-compliant database systems
     - PA/EC
       - GenieDB
       - See the CIDR paper from Wada, Fekete, et al.
         - Indicates that the Google App Engine datastore (eventual consistency option) falls under this category
     - PC/EL: existence is debatable, see next slide
  15. Does PC/EL exist?
     - Strengthening consistency when there is a partition doesn't seem to make sense
     - I think about it the following way
       - Some EL systems have stronger consistency guarantees than eventual consistency
         - Monotonic reads consistency
         - Timeline consistency
         - Read-your-writes consistency
         - Non-atomic consistency
       - CAP presents a problem for these guarantees
       - Hence, partitioning causes some loss of availability
       - Examples: PNUTS/Sherpa (original option), MongoDB
     - Dan Weinreb makes a case that this is too confusing:
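Read-your-writes, one of the intermediate guarantees named above, can be made concrete with a short sketch (the version-counter scheme and all names are assumptions for illustration): the client remembers the version of its latest write and rejects any replica read that reflects an older version.

```python
class Store:
    """Toy versioned key-value store (a primary or a replica)."""
    def __init__(self):
        self.data = {}    # key -> (value, version)
        self.version = 0

class Client:
    def __init__(self):
        self.last_written_version = 0

    def write(self, primary, key, value):
        primary.version += 1
        primary.data[key] = (value, primary.version)
        self.last_written_version = primary.version

    def read(self, replica, key):
        value, version = replica.data.get(key, (None, 0))
        # Read-your-writes: a read must reflect this client's own writes.
        if version < self.last_written_version:
            raise RuntimeError("stale replica: read-your-writes violated")
        return value

primary, stale_replica = Store(), Store()
c = Client()
c.write(primary, "x", 1)
assert c.read(primary, "x") == 1   # the primary reflects the write
try:
    c.read(stale_replica, "x")     # this replica has not applied it yet
except RuntimeError:
    pass                           # the guarantee forbids serving this read
```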
  16. A Case for P*/EC
     - Increased push for horizontally scalable transactional database systems
       - Reasons:
         - Cloud computing
         - Distributed applications
         - Desire to deploy applications on cheap, commodity hardware
     - The vast majority of currently available horizontally scalable systems are P*/EL
       - Developed by engineers at Google, Facebook, Yahoo, Amazon, etc.
       - These engineers can handle reduced consistency, but it's really hard, and there needs to be an option for the rest of us
     - Also
       - Distributed concurrency control and commit protocols are expensive
       - Once consistency is gone, atomicity usually goes next -> NoSQL
  17. Key Problems to Overcome
     - High availability is critical, so replication must be a first-class citizen
       - Today's systems generally act, then replicate
         - Complicates the semantics of sending read queries to replicas
         - Need confirmation from a replica before commit (increased latency) if you want durability and high availability
         - In-progress transactions must be aborted upon a master failure
       - Want a system that replicates, then acts
     - Distributed concurrency control and commit are expensive; want to get rid of them both
  18. Key Idea
     - Instead of weakening ACID, strengthen it!
       - Guaranteeing equivalence to SOME serial order makes active replication difficult
         - Running the same set of xacts on two different replicas might cause the replicas to diverge
     - Disallow any nondeterministic behavior
     - Disallow aborts caused by the DBMS
       - Disallow deadlock
       - Distributed commit is much easier if you don't have to worry about aborts
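One standard way to make "disallow deadlock" concrete (a sketch under assumptions, not necessarily the paper's actual scheme) is to have every transaction request all of its locks up front in a single global order; with one shared acquisition order, no cycle of waiting transactions can ever form.

```python
import threading

# One lock per record; sorted key order serves as the single global order.
locks = {key: threading.Lock() for key in ("a", "b", "c")}

def acquire_all(keys):
    # Every transaction sorts its lock requests the same way, so two
    # transactions can never each hold a lock the other is waiting for.
    ordered = sorted(set(keys))
    for k in ordered:
        locks[k].acquire()
    return ordered

def release_all(ordered):
    for k in reversed(ordered):
        locks[k].release()

held = acquire_all(["c", "a"])  # requested out of order, acquired in order
assert held == ["a", "c"]
release_all(held)
```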
  19. Consequences of Determinism
     - Replicas produce the same output, given the same input
       - Facilitates active replication
     - Only the initial input needs to be logged; state at failure can be reconstructed from this input log (or from a replica)
     - Active distributed xacts are not aborted upon node failure
       - Greatly reduces (or eliminates) the cost of distributed commit
         - Don't have to worry about nodes failing during the commit protocol
         - Don't have to worry about the effects of a transaction making it to disk before promising to commit the transaction
         - Just need one message from any node that could potentially deterministically abort the xact
         - This message can be sent in the middle of the xact, as soon as the node knows it will commit
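The input-log point follows directly from determinism, and a toy sketch (the transaction format is illustrative, not from the paper) shows why: when every transaction is a pure function of the current state and its input, replaying the same input log is guaranteed to rebuild exactly the same state.

```python
def apply_txn(state, txn):
    # A deterministic transaction: its effect depends only on the current
    # state and the input, never on timing, randomness, or thread order.
    op, key, amount = txn
    if op == "deposit":
        state[key] = state.get(key, 0) + amount

def run(input_log):
    state = {}
    for txn in input_log:
        apply_txn(state, txn)
    return state

input_log = [("deposit", "alice", 100),
             ("deposit", "bob", 50),
             ("deposit", "alice", 25)]

before_crash = run(input_log)
recovered = run(input_log)  # replay the logged inputs after a failure
assert recovered == before_crash == {"alice": 125, "bob": 50}
```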
  20. So how can this be done?
     - Serialize transactions
       - Slow
     - Partition data across cores, serialize transactions within each partition
       - My work with Stonebraker, Madden, and Harizopoulos shows that this works well if the workload is partitionable (H-Store project)
       - The H-Store project became VoltDB
     - What about non-partitionable workloads?
       - See the Thomson and Abadi VLDB 2010 paper: "The Case for Determinism in Database Systems"
     - Bottom line:
       - The cost of consistency is less than people think
       - Latency is still an issue over a WAN unless the workload is geographically partitionable
       - The VLDB 2010 paper makes P*/EC systems more viable
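The partitioned-serial idea can be sketched as follows (a simplified, single-threaded illustration of the H-Store-style design, not VoltDB's actual code): each transaction is routed to a partition by key, and each partition executes its transactions one at a time, so single-partition transactions need no locking at all.

```python
N_PARTITIONS = 4
partitions = [{} for _ in range(N_PARTITIONS)]  # one dict per "core"

def partition_of(key):
    # Hash routing; in a real system each partition has its own worker
    # thread draining a queue of transactions serially.
    return hash(key) % N_PARTITIONS

def execute(key, txn):
    # A single-partition transaction runs serially against its partition,
    # so it sees no concurrent access and needs no locks.
    return txn(partitions[partition_of(key)])

def deposit(key, amount):
    def txn(p):
        p[key] = p.get(key, 0) + amount
        return p[key]
    return execute(key, txn)

assert deposit("alice", 100) == 100
assert deposit("alice", 25) == 125   # routed to the same partition
```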