CAP, PACELC, and Determinism


  • 1. CAP, PACELC, and Determinism Daniel Abadi Yale University January 18th, 2011 Twitter: @daniel_abadi Blog:
  • 2. MIT Theorem
  • 3. CAP Theorem
  • 4. Intuition Behind Proof DB 1 DB 2 network
  • 5. Intuition Behind Proof DB 1 DB 2 network partition
  • 6. Problems with CAP
    • Not as elegant as the MIT theorem
  • 7. MIT Theorem
  • 8. CAP Theorem
  • 9. Problems with CAP
    • Not as elegant as the MIT theorem
      • There are not three different choices!
      • CA and CP are indistinguishable
    • Used as an excuse to not bother with consistency
      • “Availability is really important to me, so CAP says I have to get rid of consistency.”
  • 10. You can have A and C when there are no partitions! DB 1 DB 2 network
  • 11. Source of Confusion
    • Asymmetry of CAP properties
      • Some are properties of the system in general
      • Some are properties of the system only when there is a partition
  • 12. Another Problem to Fix
    • There are other costs to consistency (besides availability in the face of partitions)
      • Overhead of synchronization schemes
        • Deterministic execution model significantly reduces the cost (see Thomson and Abadi in VLDB 2010)
      • Latency
        • If workload is geographically partitionable
          • Latency is not so bad
        • Otherwise
          • No way to get around at least one round-trip message
  • 13. A Cut at Fixing Both Problems
    • PACELC
      • In the case of a partition (P), does the system choose availability (A) or consistency (C)?
      • Else (E), does the system choose latency (L) or consistency (C)?
  • 14. Examples
    • PA/EL
      • Dynamo, SimpleDB, Cassandra, Riptano, CouchDB, Cloudant
    • PC/EC
      • ACID compliant database systems
    • PA/EC
      • GenieDB
      • See CIDR paper from Wada, Fekete, et al.
        • Indicates that Google App Engine data store (eventually consistent option) falls under this category
    • PC/EL: Existence is debatable, see next slide
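The two PACELC questions and the example systems listed on this slide can be written down as a small lookup table. This is only an illustrative sketch: the class and field names (`OnPartition`, `Otherwise`, `Pacelc`, `EXAMPLES`) are hypothetical, and the table includes just a few of the systems named above.

```python
from dataclasses import dataclass
from enum import Enum


class OnPartition(Enum):
    """What the system favors when there is a partition (P)."""
    AVAILABILITY = "A"
    CONSISTENCY = "C"


class Otherwise(Enum):
    """What the system favors else (E), during normal operation."""
    LATENCY = "L"
    CONSISTENCY = "C"


@dataclass(frozen=True)
class Pacelc:
    on_partition: OnPartition
    otherwise: Otherwise

    def label(self) -> str:
        # Render in the slide's notation, for example "PA/EL".
        return f"P{self.on_partition.value}/E{self.otherwise.value}"


# Classifications copied from the slide; the dictionary itself is an assumption.
EXAMPLES = {
    "Dynamo": Pacelc(OnPartition.AVAILABILITY, Otherwise.LATENCY),
    "Cassandra": Pacelc(OnPartition.AVAILABILITY, Otherwise.LATENCY),
    "ACID-compliant DBMS": Pacelc(OnPartition.CONSISTENCY, Otherwise.CONSISTENCY),
    "GenieDB": Pacelc(OnPartition.AVAILABILITY, Otherwise.CONSISTENCY),
}

for system, choice in EXAMPLES.items():
    print(f"{system}: {choice.label()}")
```

Keeping the two choices as separate fields mirrors the point of the slide: the partition-time trade-off and the normal-operation trade-off are independent questions.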
  • 15. Does PC/EL exist?
    • Strengthening consistency when there is a partition doesn’t seem to make sense
    • I think about it the following way
      • Some EL systems have stronger consistency guarantees than eventual consistency
        • Monotonic reads consistency
        • Timeline consistency
        • Read-your-writes consistency
        • Non-atomic consistency
      • CAP presents a problem for these guarantees
      • Hence, partitioning causes some loss of availability
      • Examples: PNUTS/Sherpa (original option), MongoDB
    • Dan Weinreb makes a case that this is too confusing:
  • 16. A Case for P*/EC
    • Increased push for horizontally scalable transactional database systems
      • Reasons:
        • Cloud computing
        • Distributed applications
        • Desire to deploy applications on cheap, commodity hardware
    • Vast majority of currently available horizontally scalable systems are P*/EL
      • Developed by engineers at Google, Facebook, Yahoo, Amazon, etc.
      • These engineers can handle reduced consistency, but it’s really hard, and there needs to be an option for the rest of us
    • Also
      • Distributed concurrency control and commit protocols are expensive
      • Once consistency is gone, atomicity usually goes next ---> NoSQL
  • 17. Key Problems to Overcome
    • High availability is critical; replication must be a first-class citizen
      • Today’s systems generally act, then replicate
        • Complicates semantics of sending read queries to replicas
        • Need confirmation from replica before commit (increased latency) if you want durability and high availability
        • In-progress transactions must be aborted upon a master failure
      • Want system that replicates then acts
    • Distributed concurrency control and commit are expensive, want to get rid of them both
  • 18. Key Idea
    • Instead of weakening ACID, strengthen it!
      • Guaranteeing equivalence to SOME serial order makes active replication difficult
        • Running the same set of xacts on two different replicas might cause replicas to diverge
    • Disallow any nondeterministic behavior
    • Disallow aborts caused by DBMS
      • Disallow deadlock
      • Distributed commit much easier if you don’t have to worry about aborts
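A toy illustration of why equivalence to SOME serial order is not enough for active replication: both orders below are valid serial executions of the same two transactions, yet the replicas end in different states. The transactions `t1` and `t2` and the helper `run_serially` are made up for this sketch.

```python
def t1(x: int) -> int:
    """Hypothetical transaction: add 10 to the stored value."""
    return x + 10


def t2(x: int) -> int:
    """Hypothetical transaction: double the stored value."""
    return x * 2


def run_serially(initial: int, txns) -> int:
    # Apply the transactions one after another, in the given order.
    state = initial
    for txn in txns:
        state = txn(state)
    return state


# Each replica picks a different, equally legal serial order.
replica_a = run_serially(0, [t1, t2])  # (0 + 10) * 2
replica_b = run_serially(0, [t2, t1])  # (0 * 2) + 10

print(replica_a, replica_b)  # the two replicas have diverged
```

If every replica is forced to use one agreed order, the divergence disappears, which is the sense in which the slide strengthens the serializability guarantee rather than weakening it.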
  • 19. Consequences of Determinism
    • Replicas produce the same output, given the same input
      • Facilitates active replication
    • Only initial input needs to be logged, state at failure can be reconstructed from this input log (or from a replica)
    • Active distributed xacts not aborted upon node failure
      • Greatly reduces (or eliminates) cost of distributed commit
        • Don’t have to worry about nodes failing during commit protocol
        • Don’t have to worry about the effects of a transaction making it to disk before promising to commit the transaction
        • Just need one message from any node that can potentially deterministically abort the xact
        • This message can be sent in the middle of the xact, as soon as it knows it will commit
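The replay idea on this slide can be sketched in a few lines: if applying a transaction depends only on the current state and the logged input, then any replica (or a recovering node) that replays the same input log reaches the same state. The `Replica` class, the transaction tuple format, and the sample log are assumptions made for illustration, not the design from the talk.

```python
class Replica:
    """Toy key-value store whose transactions are deterministic."""

    def __init__(self):
        self.state = {}

    def apply(self, txn):
        op, key, amount = txn
        if op == "deposit":
            self.state[key] = self.state.get(key, 0) + amount
        elif op == "withdraw":
            # Deterministic abort: the decision depends only on the current
            # state and the input, never on timing or a node-local event.
            if self.state.get(key, 0) >= amount:
                self.state[key] -= amount
        else:
            raise ValueError(f"unknown operation: {op}")


def replay(log):
    """Rebuild a replica from scratch using only the input log."""
    replica = Replica()
    for txn in log:
        replica.apply(txn)
    return replica


input_log = [
    ("deposit", "alice", 100),
    ("withdraw", "alice", 30),
    ("withdraw", "bob", 10),   # aborts deterministically: bob has no funds
    ("deposit", "bob", 5),
]

primary = replay(input_log)
recovered = replay(input_log)  # for example, a node restarting after a failure
print(primary.state == recovered.state)
```

Only `input_log` has to be made durable here; no per-transaction effects are logged, because the state can always be recomputed from the inputs.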
  • 20. So how can this be done?
    • Serialize transactions
      • Slow
    • Partition data across cores, serialize transactions in each partition
      • My work with Stonebraker, Madden, and Harizopoulos shows that this works well if workload is partitionable (H-Store project)
      • H-Store project became VoltDB
    • What about non-partitionable workloads?
      • See Thomson and Abadi VLDB 2010 paper: “The case for determinism in database systems”
    • Bottom line:
      • Cost of consistency is less than people think
      • Latency is still an issue over a WAN unless workload is geographically partitionable
      • VLDB 2010 paper makes P*/EC systems more viable
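As a rough sketch of the "partition data across cores, serialize transactions in each partition" option, the code below routes single-key transactions to partitions and runs each partition's queue strictly in arrival order. It is not H-Store or VoltDB code: `NUM_PARTITIONS`, `partition_of`, `run`, and the transaction format are hypothetical, and the partitions are executed one after another here rather than on separate cores.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed value; think of one partition per core


def partition_of(key: str) -> int:
    # Use a stable checksum rather than Python's built-in hash(), which is
    # randomized per process for strings and would break determinism.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def run(log):
    """Route each (key, delta) transaction to its partition, then execute
    every partition's queue serially, in log order."""
    queues = defaultdict(list)
    for key, delta in log:
        queues[partition_of(key)].append((key, delta))

    stores = {}
    for pid in sorted(queues):
        store = {}
        for key, delta in queues[pid]:  # no locks needed inside a partition
            store[key] = store.get(key, 0) + delta
        stores[pid] = store
    return stores


log = [("a", 1), ("b", 5), ("a", 2), ("c", -3)]
result = run(log)
print(result[partition_of("a")]["a"])
```

This only works without coordination because every transaction touches a single key, which is the partitionable-workload case; transactions that span partitions are the harder case the slide defers to the VLDB 2010 paper.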