
The Power of Determinism in Database Systems

Slides for Daniel Abadi's talk at UC Berkeley on 10/22/2014. The talk discusses the problems with traditional database systems, especially around modularity and horizontal scalability, and shows how deterministic database systems can help.


1. The Power of Determinism in Database Systems
Daniel J. Abadi, Yale University (joint work with Jose Faleiro, Kun Ren, and Alex Thomson)

2. Database Systems Are Great
• Protects a dataset from corruption or deletion in the face of media, system, or program crashes
• Allows programs to change the state of data in arbitrary ways
• Allows 1000s of such programs to run concurrently
  – Guarantees atomicity and isolation of such programs
• Has served as a blueprint for many concurrent, highly complex systems

3. But …
• Design is incredibly complex
  – Takes $17 million to build a new one
• Components are horribly monolithic
• Corner-case bugs are nearly impossible to reproduce
• Does not scale horizontally
• Does not scale horizontally (seriously)
Should the DBMS architecture really be a blueprint for concurrent system design?

4. Nondeterminism is the problem
• Building on top of:
  – OSes that schedule threads arbitrarily
  – Networks that deliver messages with arbitrary delays (and sometimes in arbitrary orders)
  – Hardware that can fail arbitrarily
• It is only natural to allow the state of the database to depend on these nondeterministic events

5. Nondeterminism is the problem
• Nondeterministic OS thread scheduling leads to:
  – Arbitrary transaction interleavings
  – Deadlocks
  – Bugs that are difficult to reproduce
  – Tight interactions between the lock manager, recovery manager, access manager, and transaction manager
• Hardware failures and message delivery delays result in transaction aborts
  – Need a complicated recovery manager to handle half-completed transactions
  – Need a commit protocol for distributed transactions

6. How to eliminate nondeterminism?
• There exist proposals for:
  – Deterministic operating systems
  – (Somewhat) deterministic networking layers
  – Highly redundant and reliable hardware
• Maybe one day those proposals will come with fewer disadvantages
• In the meantime, we have to create determinism from nondeterministic components
  – Select and choose what we make deterministic

7. Possible determinism levels
• Given the input and initial state of the database system, to get to one and only one possible final state:
  – Level 1: The system always runs the same sequence of instructions
  – Level 2: The system always proceeds through the same sequence of database states
  – Level 3: The database may proceed through states in any order, as long as the final state of all external and internal data structures is determined by the input
  – Level 4: The database may proceed through states in any order, as long as the final state of all external structures is determined by the input

8. Database Systems Problems
• Design is incredibly complex
  – Takes $17 million to build a new one
• Components are horribly monolithic
• Corner-case bugs are nearly impossible to reproduce
• Does not scale horizontally
• Does not scale horizontally

9. Database Systems Problems
LEVEL 4 DETERMINISM HELPS WITH ALL OF THESE:
• Design is incredibly complex
  – Takes $17 million to build a new one
• Components are horribly monolithic
• Corner-case bugs are nearly impossible to reproduce
• Does not scale horizontally
• Does not scale horizontally

10. Recovery
• Brain-dead version:
  – Log all input to the system
  – Upon a failure, trash the entire database and replay the input log from the beginning
• Less brain-dead version:
  – Create checkpoints of database state as of some point in the input log
  – Upon a failure, trash the entire database, load the checkpoint, and replay the input log from the point where the checkpoint was taken
• Note that logging can happen entirely externally to the DBMS
• The same is true for checkpointing, although it may be performed inside the DBMS for performance
  – Even in this case, it needs very little knowledge about other components

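The checkpoint-and-replay scheme above can be sketched as follows. This is a minimal in-memory key-value sketch; the class and method names are illustrative, not from the talk, and the key assumption is the one the slides make: transactions are deterministic functions of the input and the current state.

```python
# Sketch of input-log recovery for a deterministic database.
# Assumption (from the talk): replaying the same input log against the
# same starting state always yields the same final state.

class DeterministicDB:
    def __init__(self):
        self.state = {}        # in-memory key-value store
        self.input_log = []    # ordered log of submitted transactions

    def submit(self, txn):
        """Log the transaction input, then apply it. Only the input log
        (and optional checkpoints) need to be durable."""
        self.input_log.append(txn)
        self.apply(txn)

    def apply(self, txn):
        op, key, value = txn
        if op == "put":
            self.state[key] = value

    def checkpoint(self):
        """Snapshot the state together with the log position it reflects."""
        return (len(self.input_log), dict(self.state))

    def recover(self, checkpoint=None):
        """Trash the state, load the checkpoint (if any), and replay the
        input log from the point where the checkpoint was taken."""
        start, snapshot = checkpoint if checkpoint else (0, {})
        self.state = dict(snapshot)
        for txn in self.input_log[start:]:
            self.apply(txn)

db = DeterministicDB()
db.submit(("put", "x", 1))
cp = db.checkpoint()
db.submit(("put", "y", 2))
db.recover(cp)                 # rebuild from checkpoint + log suffix
assert db.state == {"x": 1, "y": 2}
```

Note that neither `submit` logging nor `checkpoint` needs any knowledge of the DBMS internals, which is the modularity point the slide is making.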
11. Replication
• Send the same input log to a replica DBMS
  – User-visible state in the replicas will not diverge
  – Can happen entirely externally to the DBMS

12. Horizontal Scalability
• Active distributed xacts are not aborted upon node failure
  – Greatly reduces (or eliminates) the cost of distributed commit
• Don't have to worry about nodes failing during the commit protocol
• Don't have to worry about the effects of a transaction reaching disk before promising to commit the transaction
• Just need one message from any node that could potentially deterministically abort the xact
  – This message can be sent in the middle of the xact, as soon as the node knows it will commit

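The single-message protocol above can be sketched as follows, assuming (as the slides do) that only deterministic aborts are possible, so a participant can report "will not abort" mid-transaction and no prepare/ack round trip is needed. The class and node names are illustrative.

```python
# Sketch of single-message commit for deterministic transactions.
# Assumption (from the talk): nondeterministic aborts cannot happen, so
# each participant sends exactly one message, as soon as it knows it
# cannot deterministically abort; there is no two-phase commit.

class DeterministicCommit:
    def __init__(self, participants):
        # Nodes that could still deterministically abort the xact.
        self.pending = set(participants)

    def will_commit(self, node):
        """Node reports, possibly mid-transaction, that it cannot abort.
        Returns True once every participant has reported (xact commits)."""
        self.pending.discard(node)
        return not self.pending

c = DeterministicCommit({"n1", "n2"})
c.will_commit("n1")            # still waiting on n2
committed = c.will_commit("n2")  # all reported: xact is committed
assert committed
```

Contrast with two-phase commit, where the coordinator must both collect votes and then distribute the decision, and must handle participants failing in between; here a failed node simply replays the input log and reaches the same committed state.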
13. One Way to Implement Determinism
• Use a preprocessor to handle client communications and create a log of submitted xacts
• Send the log to the DBMS in batches
• Every xact immediately requests all the locks it will need (in the order of the log)
• If a xact doesn't know what it will need:
  – Run enough of the xact to find out, but do not change the database state
  – Reissue the xact to the preprocessor with its lock requirements included as a parameter
  – Run enough of the new xact to find out whether it locked the correct items (the database state might have changed in the meantime)
    • If so, the xact can proceed as normal
    • If not, reissue to the preprocessor again and repeat as necessary
• It is trivial to prove this scheme is deterministic and deadlock-free

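The core of the scheme, requesting a transaction's full lock set up front in log order, can be sketched as follows. The names are illustrative; the deadlock-freedom argument is the one the slide alludes to: since lock requests are enqueued in log order and each transaction requests everything at once, a transaction only ever waits for transactions earlier in the log, so no wait cycle can form.

```python
# Sketch of ordered lock acquisition for deterministic execution.
# request_all must be called for transactions in log order; each call
# enqueues the transaction on every key in its (pre-declared) lock set.

from collections import defaultdict, deque

class OrderedLockManager:
    def __init__(self):
        self.queues = defaultdict(deque)   # key -> FIFO of waiting txn ids

    def request_all(self, txn_id, keys):
        """Request the full lock set at once, in log order."""
        for key in sorted(keys):           # fixed iteration order within a txn
            self.queues[key].append(txn_id)

    def can_run(self, txn_id, keys):
        """A transaction runs once it is at the head of every queue."""
        return all(self.queues[key][0] == txn_id for key in keys)

    def release_all(self, txn_id, keys):
        for key in keys:
            assert self.queues[key][0] == txn_id
            self.queues[key].popleft()

lm = OrderedLockManager()
lm.request_all(1, {"a", "b"})   # txn 1 appears first in the input log
lm.request_all(2, {"b", "c"})
assert lm.can_run(1, {"a", "b"})
assert not lm.can_run(2, {"b", "c"})   # blocked behind txn 1 on "b"
lm.release_all(1, {"a", "b"})
assert lm.can_run(2, {"b", "c"})
```

A lock manager like this needs no knowledge of recovery or storage, which is how the design separates it from the rest of the DBMS.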
14. What's the Downside?
• Increased latency: input transactions must be logged and sent to the DBMS in batches
• No flexibility for the system to abort transactions on a whim
• Can't reorder transaction execution if one xact stalls mid-transaction
• Need to determine what will be locked in advance

15. Additional Upside
• Our implementation eliminates deadlocks
  – Distributed deadlock is a major problem for distributed DBMSs
• The lock manager is totally separate from the rest of the DBMS
  – Increases the modularity of the system

16. Experimental Evaluation
• Experiments conducted on Amazon EC2 using m3.2xlarge (Double Extra Large) instances
• Cluster of 8 nodes
• TPC-C
• Microbenchmark:
  – 10 RMW actions
  – 10 RMW actions + CPU computation

17. TPC-C
[Chart: TPC-C throughput results]

18. Microbenchmark Experiments (Long xacts)
[Chart: transactions per second per node vs. % distributed transactions, comparing deterministic and nondeterministic systems (with and without 2PC) under high and low contention]

19. Microbenchmark Experiments (Short xacts)
[Chart: transactions per second per node vs. % distributed transactions, comparing deterministic systems (including w/ VLL) and nondeterministic systems under high and low contention]

20. Resource Constraints Experiments
[Chart: throughput (txns/sec) over time (seconds), comparing deterministic and nondeterministic systems (including nondeterministic w/ throttling) at 5% and 100% distributed transactions]

21. Dependent Transactions Experiments
[Charts: throughput (txns/sec) vs. index entry volatility, comparing deterministic and nondeterministic systems at 0%, 20%, 50%, and 100% dependent transactions; (a) 0% distributed transactions, (b) 100% distributed transactions]

22. Latency CDF
[Chart: latency CDF]

23. More information
• The Case for Determinism in Database Systems. Alexander Thomson and Daniel J. Abadi. PVLDB 3(1), 2010.
• Calvin: Fast Distributed Transactions for Partitioned Database Systems. Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi. SIGMOD 2012.
• An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems. Kun Ren, Alexander Thomson, and Daniel J. Abadi. PVLDB 7(10), 2014.
• Modularity and Scalability in Calvin. Alexander Thomson and Daniel J. Abadi. IEEE Data Eng. Bull. 36(2): 48-55, 2013.
• Lightweight Locking for Main Memory Database Systems. Kun Ren, Alexander Thomson, and Daniel J. Abadi. PVLDB 6(2): 145-156, 2012.

24. Conclusions
• Determinism is not a good fit for latency-sensitive applications
• Fewer options to deal with node overload (true only for lock-based implementations)
• Much improved throughput for distributed transactions
• Much simpler design: the recovery manager and lock manager are totally separate from the rest of the DBMS
• Replication is trivial
