Comparing NoSQL Databases for Operational Workloads

These are the slides from the talk I delivered at OmniTI Surge 2013 about Thumbtack's research into NoSQL databases. You can learn more about Thumbtack and our NoSQL practice at http://www.thumbtack.net.

The talk was sponsored by Aerospike (http://www.aerospike.com).

Published in: Technology

  • Tradeoff between speed and scale
  • Examine how they fail over, and what that means in real terms.
  • Comparing NoSQL Databases for Operational Workloads

    1. How to Compare NoSQL Databases: Tradeoffs between performance and reliability
       Ben Engber, Thumbtack Technology
       Sponsored by Aerospike
    2. Who are we and what do we want?
       • Consulting company with a focus on scalability
       • Long background of "no SQL"
       • Production deployments across many NoSQL vendors
       • Engineering staff of 50
       • Ongoing research teams
       • Advise people on which solutions to use
    3. Advertised Features
       MongoDB (http://www.mongodb.org/about/introduction/)
       • Flexibility: JSON documents, dynamic schema
       • Power: secondary indexes, dynamic queries, rich updates, easy aggregation
       • Speed/Scaling
       • Ease of use
       Cassandra (http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra)
       • Elastic scalability
       • Linear performance
       • Flexible, dynamic schema
       • Multiple datacenter and cloud readiness
       • Tunable data consistency
       • Basic transaction support
    4. Why use NoSQL at all?
       • "Because I’ve heard of it"
       • "I want rapid application development"
       • "I want to do something with Big Data"
       Operational Workload
       • High throughput
       • Multi-user (concurrency)
       • Integrity and consistency
       • Small, simple operations
       Analytic Workload
       • Ad hoc analysis
       • Batch operation on sets
       • Map-Reduce
       • Machine learning, predictive analytics, etc.
    5. Landscape of Operational NoSQL DBs
       • Document Stores: MongoDB, MarkLogic
       • Column Family: Cassandra, HBase
       • Key-Value Stores: Aerospike, Voldemort, Couchbase, Riak
       What’s the difference between an indexed "value" or "column" and a document?
       Couchbase 1.x to 2.x
       Aerospike 2.x to 3.x
    6. What are we really asking?
       • "I want to support a large transaction volume"
       • "I want to distribute my data tier"
       • "I want simpler handling of failover"
       • "I want to scale my data tier horizontally"
       Key-Value Stuff
    7. What about other queries?
       [Diagram: four shard groups: A,B,C; D,E,F; G,H,I; J,K,L]
    8. So, we’re focusing on scale
       How should we measure operational data?
    9. The Plan – Start Simple
       • Test a bunch of databases
       • Start with a nice simple workload (key-value storage)
       • Use a standard client (YCSB)
       • Then move on to
           • secondary indexes
           • even more databases
           • failover
    10. • Running a database is easy – running it correctly is hard
        • Memory sizing, problem sizing, etc.
        • Consistency tradeoffs
        • Eviction
        • Hardware utilization
        • These databases work in very different ways
    11. CAP Theorem
        • Consistency / Availability is somewhat academic
        • Your application needs both
        • HTTP
        • Caches
        • These databases are tunable
        [Diagram labels: Consistency, Partition Tolerance, Availability]
    12. What to think about instead?
        • Consistency
        • Immediate/Eventual
        • Convergence
        • Isolation
        • Durability
        • Data loss
        • Failover
        • Latency
        • Availability (downtime)
        There is a spectrum of choices, from Fast to Reliable
        Most NoSQL databases can sit in multiple places on this spectrum
    13. • Choose the databases we hear about most often
        • Create standard baseline scenarios
        • Measure raw performance for various scenarios
        • Examine how they fail over
    14. How do databases achieve these guarantees?
        • 6 nodes
        • 6 "shards": A, B, C, D, E, F
        • Replication factor of 3
        • 2 scenarios: "Fast" and "Reliable"
    15. Master-Slave (MySQL, MongoDB)
        • Node 1: Master A
        • Node 2: Slave A
        • Node 3: Slave A
        • Node 4: Master B
        • Node 5: Slave B
        • Node 6: Slave B
        Client 1: write master, read master ("Write row A quickly")
        Client 2: write master and observe, read master ("Write row B durably")
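
The difference between "write master" and "write master and observe" can be illustrated with a toy model. This is only a sketch and not how MySQL or MongoDB implement replication: `ToyReplicaSet`, its methods, and the `durable` flag are hypothetical names invented for the example. A fast write is acknowledged once the master has it; a durable write is acknowledged only after the slaves have it too.

```python
class ToyReplicaSet:
    """Toy master-slave replica set (hypothetical model, not a real driver API)."""

    def __init__(self, slave_count=2):
        self.master = {}
        self.slaves = [{} for _ in range(slave_count)]
        self.pending = []  # writes acknowledged but not yet replicated

    def write(self, key, value, durable=False):
        """Acknowledge after the master applies the write (fast), or only
        after every slave has also applied it (durable)."""
        self.master[key] = value
        if durable:
            for slave in self.slaves:
                slave[key] = value
        else:
            self.pending.append((key, value))
        return "ack"

    def replicate(self):
        """Background replication catching up on pending writes."""
        for key, value in self.pending:
            for slave in self.slaves:
                slave[key] = value
        self.pending.clear()

    def fail_master(self):
        """Lose the master before replication runs, then promote a slave."""
        self.pending.clear()
        self.master = self.slaves.pop(0)


cluster = ToyReplicaSet()
cluster.write("A", 1)                # fast: acknowledged by the master only
cluster.write("B", 2, durable=True)  # durable: acknowledged after slaves apply it
cluster.fail_master()
print(cluster.master)                # {'B': 2}: the fast write to A is lost
```

In this model the fast path returns sooner but can lose acknowledged data when the master fails, which is the speed-for-durability trade the talk describes.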
    16. Shard Master (Couchbase)
        • Node 1: Master A; Slave B,C
        • Node 2: Master B; Slave C,D
        • Node 3: Master C; Slave D,E
        • Node 4: Master D; Slave E,F
        • Node 5: Master E; Slave F,A
        • Node 6: Master F; Slave A,B
        Client: write master, read master ("Write row A quickly")
        Client: write master and observe, read master ("Write row D durably")
    17. Tunable Quorum (Cassandra, Riak)
        • Node 1: A,B,C
        • Node 2: B,C,D
        • Node 3: C,D,E
        • Node 4: D,E,F
        • Node 5: E,F,A
        • Node 6: F,A,B
        Clients 1, 2, 3: write one, read one ("Read/Write row A quickly")
        Client 4: write all, read one
        Client 5: write one, read all
        Client 6: write quorum, read quorum
        Clients 4, 5, 6: "Read/Write row D consistently"
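
The client settings on the Tunable Quorum slide can be checked against the usual overlap rule for quorum systems: a read is guaranteed to touch at least one replica holding the latest acknowledged write when R + W > N. The sketch below applies that rule to a replication factor of 3; the function names are made up for this example and are not part of any Cassandra or Riak client.

```python
def quorum_size(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(write_acks, read_acks, replication_factor):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replication_factor


N = 3  # replication factor used in the slides
Q = quorum_size(N)
configs = {
    "write one / read one": (1, 1),
    "write quorum / read quorum": (Q, Q),
    "write one / read all": (1, N),
    "write all / read one": (N, 1),
}
for name, (w, r) in configs.items():
    print(f"{name}: overlap guaranteed = {reads_see_latest_write(w, r, N)}")
```

Only the "write one / read one" setting fails the overlap test, which matches the slide pairing it with the "quickly" caption rather than the "consistently" one.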
    18. Transactional Consensus (Aerospike, FoundationDB, Cassandra 2.0)
        • Node 1: A,B,C
        • Node 2: B,C,D
        • Node 3: C,D,E
        • Node 4: D,E,F
        • Node 5: E,F,A
        • Node 6: F,A,B
        Clients 1, 2, 3: fire and forget ("Read/Write row A quickly")
        Clients 4, 5: transactional ("Read/Write row D ACIDly")
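
The node layouts on the quorum and consensus slides (Node 1 holds A,B,C through Node 6 holding F,A,B) follow a simple ring pattern. As a rough sketch, assuming each node holds its own shard plus the next two around the ring, the layout can be generated as follows; `ring_placement` and `nodes_holding` are hypothetical helpers, not any database's real placement algorithm.

```python
def ring_placement(shards, node_count, replication_factor):
    """Map node number -> shards held, assuming node i holds shard i
    and the next (replication_factor - 1) shards around the ring."""
    placement = {}
    for node in range(node_count):
        placement[node + 1] = [
            shards[(node + offset) % len(shards)]
            for offset in range(replication_factor)
        ]
    return placement


def nodes_holding(shard, placement):
    """List the nodes that store a copy of the given shard."""
    return [node for node, held in placement.items() if shard in held]


layout = ring_placement(list("ABCDEF"), node_count=6, replication_factor=3)
for node, held in layout.items():
    print(f"Node {node}: {','.join(held)}")
print("Shard A lives on nodes", nodes_holding("A", layout))
```

With six nodes and a replication factor of 3, every shard ends up on exactly three nodes, so losing any single node leaves two copies of each shard it held.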
    19. Quick and Dirty Conclusion
        • Systems like MongoDB and Couchbase trade speed for durability
        • Systems like Cassandra, Riak, and Aerospike trade speed for consistency
        • Systems like Aerospike and FoundationDB trade speed for ACID (or parts of it)
    20. Consistency
        "In distributed data systems like Cassandra, [consistency] usually means that once a writer has written, all readers will see that write."
        ACIDity
        • Row-level (CAS)
        • Multi-key
        • Long running transactions
        [Diagram labels: Old Value, New Value, Old Value]
    21. Reliability Spectrum
        System                Replication  Consistency  Data loss on   Availability on  Data loss on
                              model        model        node failure   no quorum        replica set failure
        Aerospike (fast)      async        eventual     yes            available        25%
        Cassandra (fast)      async        eventual     yes            available        25%
        Couchbase (fast)      async        immediate    yes            available        25%
        MongoDB (fast)        async        immediate    yes            available        50%
        Aerospike (reliable)  sync         immediate    no             available        25%
        Cassandra (reliable)  sync         immediate    no             unavailable      25%
        MongoDB (reliable)    sync         immediate    no             unavailable      50%
    22. Create Baselines
        Fast
        • Asynchronous replication
        • Asynchronous writes to disk
        • Data set fits in RAM
        • Immediate or Eventual Consistency
        Reliable
        • Synchronous replication
        • Synchronous or asynchronous writes to disk
        • Data set larger than RAM
        • Immediate Consistency (+)
    23. Performance Tests
        1. Install a database on a 4-node cluster (replication factor of 2)
        2. Load a sizable dataset (500M rows) to SSD ("reliable")
        3. Determine maximum load
        4. Perform a stepwise load for latency
        5. Repeat for read-heavy and balanced read-write
        6. Repeat steps 3-6 for a dataset that fits into RAM ("fast")
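
One way to read "perform a stepwise load" is to take the maximum throughput found in step 3 and replay the workload at evenly spaced fractions of it, recording latency at each step. The sketch below only builds the target rates and the matching YCSB command lines. The step count, thread count, binding name, and workload file are assumptions for illustration, not values from the talk; `-P`, `-target`, and `-threads` are standard YCSB options.

```python
def stepwise_targets(max_throughput, steps=10):
    """Evenly spaced target rates (ops/sec) from max/steps up to max."""
    return [max_throughput * i // steps for i in range(1, steps + 1)]


def ycsb_command(binding, workload_file, target, threads=32):
    """Build a YCSB run command throttled to `target` ops/sec.
    The binding and workload file are placeholders chosen by the caller."""
    return [
        "bin/ycsb", "run", binding,
        "-P", workload_file,
        "-target", str(target),
        "-threads", str(threads),
    ]


# Hypothetical example: a measured maximum of 200,000 ops/sec, four steps.
for target in stepwise_targets(200_000, steps=4):
    print(" ".join(ycsb_command("basic", "workloads/workloada", target)))
```

Throttling below the measured maximum shows how latency behaves as the cluster approaches saturation, which is what the latency-versus-throughput charts plot.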
    24. Maximum Throughput
        [Bar charts: maximum throughput for Balanced and Read-Heavy workloads]
        • RAM / Asynchronous (axis 0 to 1,000,000): Aerospike, Cassandra, MongoDB, Couchbase 1.8, Couchbase 2.0
        • SSD / Synchronous (axis 0 to 350,000): Aerospike, Cassandra, MongoDB
    25. Latency Scenarios
        [Charts: average latency (ms) vs. throughput (ops/sec), balanced workload, full view]
        • SSD / Synchronous (0 to 200,000 ops/sec): read latency and update latency for Aerospike, Cassandra, MongoDB
        • RAM / Asynchronous (0 to 400,000 ops/sec): read latency and update latency for Aerospike, Couchbase 1.8, Couchbase 2.0, Cassandra, MongoDB
    26. Cluster Availability
        Things to consider:
        • Replication delay
        • Cluster downtime
        • Data loss
        Things to test:
        • Graceful shutdown
        • kill -9
        • Split brain
    27. Cluster: 4 nodes, 100% load, RAM, asynchronous ("Fast")
        [Charts: MongoDB, Aerospike, Couchbase]
    28. Cluster: 4 nodes, 75% load, RAM, asynchronous ("Fast")
        [Charts: Aerospike, Couchbase, Cassandra]
    29. Cluster: 4 nodes, 75% load, disk, synchronous ("Reliable")
        [Charts: Aerospike, Cassandra 6 node (QUORUM), Cassandra 4 node (ALL)]
    30. Do they fail over?
        [Charts for Aerospike, Cassandra, Couchbase, MongoDB]
        • Downtimes on node down: min/max downtime (ms), axis 0 to 14,000
        • Downtime on node restore: median downtime (ms), axis 0 to 35,000
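
The slides report minimum, maximum, and median downtime but do not say how downtime was measured. One plausible approach, shown here purely as an assumption, is to log the timestamp of every successful client operation during a failover run and take the longest gap between consecutive successes as that run's downtime. The timestamps and run values below are made-up sample data.

```python
from statistics import median


def longest_gap_ms(success_times_ms):
    """Longest interval between consecutive successful operations."""
    ordered = sorted(success_times_ms)
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0)


# Made-up timestamps (ms): steady traffic, an outage, then recovery.
sample_run = [0, 10, 20, 30, 2530, 2540, 2550]
print(longest_gap_ms(sample_run))  # 2500

# Summarising several hypothetical runs the way the charts do.
downtimes = [2500, 3100, 2800]
print(min(downtimes), max(downtimes), median(downtimes))  # 2500 3100 2800
```

Measuring from the client side captures what an application would experience, including any time the driver spends discovering the new cluster topology.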
    31. Takeaways
        • NoSQL databases are converging feature-wise
        • For operational workloads, key-value is king
        • The Speed – Reliability spectrum is complex
        • Consider your application’s likely needs
        • Pick the best usability features within this window
    32. Questions or Advice?
        Ben Engber, Thumbtack Technology
        bengber@thumbtack.net, @bengber
        http://www.thumbtack.net
        Benchmarks and detailed discussion of methodology at: http://thumbtack.net/whitepapers
