Databases in 2008
Relational is entrenched; NoSQL is emerging with some interesting advantages:
• Voldemort
• Cassandra
• HBase
…but the fine print about data guarantees doesn’t look so good.

The CAP theorem (2008)
• Brewer: Pick 2 out of 3
• Werner Vogels (CTO, Amazon.com): “Data inconsistency in large-scale reliable distributed systems has to be tolerated … [for performance and to handle faults]”
• Wrong descriptions all over the web: “The availability property means that the system is ‘online’ and the client of the system can expect to receive a response for its request.”

CAP (2008) conclusions?
• Scaling requires distributed design
• Distributed requires high availability
• Availability requires no C
So, if we want scalability, we have to give up C, the cornerstone of ACID. Right?

Thinking about CAP (2008)
• Is a partition worse than a failure?
• Three computers can’t agree?
• Keyword: Availability… Availability != high availability

Flash forward to CAP (2012)
• Brewer: “Why ‘2 of 3’ is misleading”
• Brewer: “CAP prohibits … perfect availability”
• Vogels: “Achieving strict consistency can come at a cost in update or read latency, and may result in lower throughput…”
• Google (Spanner): “…it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.”

The FoundationDB concept
• Attack CAP (2008) and deliver transactions at NoSQL performance and scale
• Reduce core to minimal feature set
• Add features back with higher-level abstractions: “Layers”
• Decouple choice of data model and choice of storage technology

FoundationDB
Database software:
• Ordered key-value API
• Scalable
• Transactional
• Fault tolerant
[Diagram: Application, on top of Layer, on top of Key-value API]

Engineering pressures
Engineering challenge → Strategy
• Engineering for extreme reliability and fault tolerance of large clusters under adverse conditions → Simulation
• Many asynchronous communicating processes → Erlang?
• Fast algorithms; efficient I/O → C++
We need new tools!

First tool: Flow
• A new programming language
• Adds actor-model concurrency to C++11
• New keywords: ACTOR, future, promise, wait, choose, when, streams
• Flow code -> C++11 code -> binary
Seriously?

Flow allows…
• Testability by enabling simulation.
• Performance by compiling to native code.
• Easier actor-model coding.

Flow performance
Joe Armstrong (author of “Programming Erlang”): “Write a ring benchmark. Create N processes in a ring. Send a message round the ring M times so that a total of N * M messages get sent. Time how long this takes for different values of N and M. Write a similar program in some other programming language you are familiar with. Compare the results. Write a blog, and publish the results on the internet!”

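The ring benchmark quoted above can be sketched in a few lines. This is a minimal illustration in Python using `asyncio` tasks and queues as stand-ins for actors and mailboxes; it is not Flow code, and the names `ring_node`, `run_ring`, and `time_ring` are hypothetical.

```python
import asyncio
import time


async def ring_node(inbox, outbox, hops_to_handle):
    """Forward a fixed number of messages from this node's inbox to the next node."""
    for _ in range(hops_to_handle):
        token = await inbox.get()
        await outbox.put(token + 1)   # the token counts how many hops it has made


async def run_ring(n, m):
    """Send one token around a ring of n tasks m times; return the hop count."""
    queues = [asyncio.Queue() for _ in range(n)]
    # Node i reads queue i and writes queue (i + 1) % n, which closes the ring.
    tasks = [
        asyncio.create_task(ring_node(queues[i], queues[(i + 1) % n], m))
        for i in range(n)
    ]
    await queues[0].put(0)            # inject the token at node 0
    await asyncio.gather(*tasks)      # every node forwards exactly m messages
    # After n * m hops the token is back in node 0's inbox.
    return await queues[0].get()


def time_ring(n, m):
    """Return (total messages sent, elapsed seconds) for one benchmark run."""
    start = time.perf_counter()
    hops = asyncio.run(run_ring(n, m))
    return hops, time.perf_counter() - start
```

Calling `time_ring(n, m)` for several values of `n` and `m` gives the timings Armstrong asks for; the returned hop count should always equal `n * m`.
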
Traditional approaches
• Glue together smaller transactional systems
  – Two-phase commit (X/Open XA)
  – Paxos
• Build on a distributed file system
  – BigTable/HBase

The FoundationDB approach
• Deconstruct a traditional transactional database and scale the individual parts
• Each part must also be fault tolerant
• The parts:
  – Accept requests
  – Check for transaction conflicts
  – Log transactions
  – Store data

Key insight
Checking for transaction conflicts:
• Is a scalable problem
• When highly optimized, is a small percentage of the total work
• Is tricky to make fault tolerant…

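To make “checking for transaction conflicts” concrete, here is a minimal sketch of a generic optimistic check: a transaction conflicts if any key it read was written after the version it read at. The slides do not describe FoundationDB's actual algorithm, so this scheme is an assumption, and `ConflictChecker`, `read_version`, and `try_commit` are hypothetical names.

```python
class ConflictChecker:
    """Toy optimistic conflict checker (illustrative only, not FoundationDB code)."""

    def __init__(self):
        self.version = 0        # version of the most recent committed transaction
        self.last_write = {}    # key -> version at which the key was last written

    def read_version(self):
        """Version a new transaction should read at."""
        return self.version

    def try_commit(self, read_version, read_keys, write_keys):
        """Return the new commit version, or None if the transaction conflicts."""
        for key in read_keys:
            # Someone committed a write to this key after we read it: conflict.
            if self.last_write.get(key, 0) > read_version:
                return None
        self.version += 1
        for key in write_keys:
            self.last_write[key] = self.version
        return self.version


checker = ConflictChecker()
start = checker.read_version()                          # three transactions start together
first = checker.try_commit(start, {"x"}, {"x"})         # commits
second = checker.try_commit(start, {"x"}, {"y"})        # read "x", which has since changed
third = checker.try_commit(start, {"z"}, {"x"})         # read only "z", which is untouched
```

The check itself only compares versions per key, which is one way to see why the slide calls it a small share of the total work; making the checker's state survive failures is the harder part and is not attempted here.
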
Training montage
• Paxos coordination algorithm
• Multi-versioned data structures
• SSD optimizations
• Application-managed page cache
• Prioritization deeply integrated
• Control theory for queue sizes
• Testing, testing, testing

Did we reach our big goals?
• High performance
• Ease of scaling out
• Ease of building abstractions
• Ease of operation

High performance
FoundationDB delivers performance exceeding other NoSQL databases, but with transactions!

Ease of scaling out
• Add and remove nodes on the fly
• Single key-space with global transactions
• Validated to 96 cores, 48 SSDs

Ease of building abstractions
• Transactions enable abstraction
• Abstractions are very hard to build on non-transactional systems
• Ordered data model for performance
Abstractions built on a scalable, fault-tolerant, transactional foundation inherit those properties.

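As a rough illustration of why an ordered, transactional key-value store makes layers easy, here is a sketch of an indexed table. `OrderedKV` is an in-memory stand-in, not the FoundationDB API, and every name in it (`OrderedKV`, `IndexedTable`, `apply`, `range`, `lookup`) is hypothetical. The row and its index entry are written in one batch, which stands in for a single transaction.

```python
import bisect


class OrderedKV:
    """In-memory stand-in for an ordered key-value store (tuple keys of strings)."""

    def __init__(self):
        self._keys = []     # all keys, kept sorted
        self._data = {}     # key -> value

    def apply(self, writes):
        """Apply a batch of (key, value) writes; stands in for one transaction commit."""
        for key, value in writes:
            if key not in self._data:
                bisect.insort(self._keys, key)
            self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def range(self, prefix):
        """Return (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break                       # ordering means no later key can match
            matches.append((key, self._data[key]))
        return matches


class IndexedTable:
    """A tiny 'layer': rows plus a secondary index on one field."""

    def __init__(self, store, name, indexed_field):
        self.store = store
        self.name = name
        self.field = indexed_field

    def insert(self, row_id, row):
        # Row and index entry go in the same batch so they cannot drift apart.
        self.store.apply([
            (("row", self.name, row_id), row),
            (("idx", self.name, self.field, row[self.field], row_id), ""),
        ])

    def lookup(self, value):
        """Return the ids of rows whose indexed field equals value."""
        prefix = ("idx", self.name, self.field, value)
        return [key[-1] for key, _ in self.store.range(prefix)]
```

The sketch only handles inserts of new rows; updating an indexed field would also need to remove the old index entry in the same batch. The index lookup is a single ordered range read, which is where the ordered data model pays off.
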
Examples of “ease”
• SQL database in one day
• Indexed table layer (3 days * 1 intern)
• Fractal spatial index in 200 lines:

Ease of operation
• Automatic data partitioning/replication
• Highly fault-tolerant
• Minimal management
Try to break it yourself!

Conclusion
• Our mission is to solve the problem of state management so that developers can focus on building their applications
• 3+ years in the making, now ready for your applications
• Bindings for C, Python, JVM, Node.js, Ruby