CAP is not really all that relevant here. If you're using a NoSQL database, you are probably already running on a cluster with multiple partitions, and the consistency/availability trade-off is not easy to interpret except in a formal theoretical way. Plus, it's tunable. Plus, what does consistency even mean? Example: Cassandra partial writes.
…and have been clobbered by the database. Managing relational shards is hard. RAD (rapid application development) is a great reason to go NoSQL and very valuable for document- and graph-type problems, but it's not what we're talking about here.
Fair use – they kept yelling at us! All the other tests out there seemed to run the databases without understanding their core models; they tended to analyze breadth but not depth. Even choosing the problem is quite hard. For example, Aerospike and Couchbase were the most closely related databases in our test, but the kind of problem space that makes Aerospike interesting is the kind that breaks Couchbase.
Examine how they fail over, and what that means in real terms.
Couchbase in its default mode is consistent, but only by keeping a single master copy of each document. Data set fits in RAM – think about it. To disk – if the device does 40k IOPS and you need 250k transactions per second, think about what that implies.
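The "think about it" arithmetic can be spelled out. Using the numbers on this slide (40k IOPS, 250k transactions/sec), each physical I/O has to cover multiple logical writes, so the store must batch or coalesce writes rather than persist each one synchronously:

```python
# Numbers taken from the slide; the conclusion they force:
iops = 40_000          # what the SSD can physically do per second
tps = 250_000          # transactions per second the workload demands
writes_per_io = tps / iops
print(writes_per_io)   # 6.25 -> at least ~7 writes must share each I/O
```

In other words, at these rates synchronous per-write durability is physically impossible; the database has to acknowledge writes before they reach disk.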
These numbers are all approximate. Write path: find the vBucket for the key, locate its master (say, Node 1), and write there.
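The routing step on this slide can be sketched in a few lines. This is an illustrative, Couchbase-style mapping (hash the key to a vBucket, then look the vBucket up in a cluster map); the node names and the map itself are hypothetical stand-ins for what a real cluster ships to its clients:

```python
import zlib

NUM_VBUCKETS = 1024  # Couchbase's default vBucket count

# Hypothetical vBucket -> master-node map; in a real cluster the client
# receives this "vBucket map" from the server and refreshes it on rebalance.
vbucket_to_master = {v: f"node{v % 4 + 1}" for v in range(NUM_VBUCKETS)}

def master_for(key: str) -> str:
    """Hash the key to a vBucket, then route the write to that vBucket's master."""
    vbucket = zlib.crc32(key.encode()) % NUM_VBUCKETS
    return vbucket_to_master[vbucket]
```

Because every key deterministically maps to exactly one master, a write never has two simultaneous owners – which is how this design stays consistent without quorums.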
The client connects to any node, which can act as a coordinator; different drivers offer smarter routing.
Similar to Cassandra, but with no partial writes – each transaction is atomic.
1. Provision the hardware, operating systems, and environment using our best estimates of what customers will run in their data centers and the best practices, where specified, of the various databases.
2. Configure it optimally and ensure it is functioning as a single cluster.
3. Load the data by inserting records individually, but as fast as possible.
Secondary indexes – queries hit all nodes
“Quick” NoSQL Comparison: Measuring performance and failover of Aerospike, Cassandra, Couchbase, and MongoDB
Ben Engber, Thumbtack Technology
A lot of interest in NoSQL databases
Discussions around these tend to be confusing
◦ e.g. discussions of the CAP Theorem
Our goal was to present business use case answers
You want to support a large transaction volume
You want to find a way to distribute your data tier
You want simpler failover and other administration
You want rapid application development
Test a bunch of databases
Start with a nice simple workload
◦ key-value storage
Use a standard client (YCSB)
Test something new but legit
◦ solid-state performance on bare metal
Then move on to
◦ secondary indexes
◦ other databases
◦ failover
Running a database is easy – running it correctly is hard
◦ Memory sizing, problem sizing, etc.
◦ Eviction / ejection
◦ These databases work in very different ways
How do we even represent a fair use of a database?
What does it mean to test key-value storage?
Choose the databases we hear about most often
Create standard baseline scenarios
Measure raw performance for various scenarios
Later: examine how they fail over
Fast (not the same as Available):
◦ Asynchronous replication to nodes
◦ Asynchronous writes to disk
◦ Data set fits in RAM
◦ Pretty reliable
Reliable (not the same as Consistent):
◦ Synchronous replication to nodes
◦ Synchronous or asynchronous writes to disk
◦ Data set cannot live in RAM
◦ Pretty fast
[Diagram: six-node cluster with partitions A–F; each node is master for one partition and slave for two others (e.g. Node 1: master A, slaves B,C). One client writes to the master and reads from the master; another writes to the master and observes the replicas.]
[Diagram: two replica sets – Node 1 is master for A with Nodes 2 and 3 as slaves; Node 4 is master for B with Nodes 5 and 6 as slaves. Client 1 writes to the master and reads from the master; Client 2 writes to the master and observes the slaves.]
[Diagram: six-node ring with replicas spread across nodes (Node 1: A,B,C; Node 2: B,C,D; Node 3: C,D,E; Node 4: D,E,F; Node 5: E,F,A; Node 6: F,A,B); clients issue reads and writes at different consistency levels – write one / read one, write one / read all, write quorum / read quorum, write all / read one.]
"In distributed data systems like Cassandra, [consistency] usually means that once a writer has written, all readers will see that write."
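The rule behind these client configurations can be stated in one line. This is a minimal sketch (not from the talk) of the standard quorum-overlap condition: a read is guaranteed to see the latest write when the number of replicas written plus the number read exceeds the replication factor, because every read set then intersects every write set:

```python
def is_strongly_consistent(n_replicas: int, write_level: int, read_level: int) -> bool:
    """True if every read quorum must overlap every write quorum (R + W > N)."""
    return write_level + read_level > n_replicas

# With replication factor 3:
print(is_strongly_consistent(3, 1, 1))  # write ONE / read ONE       -> False (eventual)
print(is_strongly_consistent(3, 2, 2))  # write QUORUM / read QUORUM -> True
print(is_strongly_consistent(3, 3, 1))  # write ALL / read ONE       -> True
```

This is why "consistency" is tunable per request: the same cluster serves both the eventually-consistent and the strongly-consistent clients in the diagram.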
                          Aerospike     Aerospike    Cassandra     Cassandra     Couchbase     MongoDB
                          (async)       (sync)       (async)       (sync)        (async)       (async)
Standard replication      Asynchronous  Synchronous  Asynchronous  Synchronous   Asynchronous  Asynchronous
Durability                Asynchronous  Synchronous  Asynchronous  Asynchronous  Asynchronous  Asynchronous
Default sync batch        128kB/device  immediate    10 seconds    10 seconds    250k records  100ms
Consistency model         Eventual      Immediate    Eventual      Immediate     Immediate     Immediate
Consistency on single
node failure              Inconsistent  Consistent   Inconsistent  Consistent    Inconsistent  Inconsistent
Availability on single
node failure / no quorum  Available     Available    Available     Unavailable   Available     Available
Data loss on replica
set failure               25%           25%          25%           25%           25%           50%
1. Provision according to best practices and reality
2. Install a database on a 4-node cluster
3. Load a large dataset to disk (SSD)
4. Determine maximum load, using the strongest durability guarantees practical
5. Perform a stepwise load to measure latency
6. Repeat for read-heavy and balanced read-write workloads
7. Repeat steps 3–6 for a dataset that fits into RAM
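Step 5 above can be sketched as a loop: drive the cluster at increasing fractions of the maximum throughput found in step 4, recording latency percentiles at each step. This is a hypothetical illustration, not the YCSB harness the talk actually used; `run_step`, `fake_op`, and the numbers are all invented for the sketch:

```python
import random
import time

def run_step(target_ops: int, op) -> tuple[float, float]:
    """Run `op` target_ops times; return (p50, p95) latency in milliseconds."""
    latencies = []
    for _ in range(target_ops):
        start = time.perf_counter()
        op()
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    return latencies[len(latencies) // 2], latencies[int(len(latencies) * 0.95)]

def fake_op():
    """Stand-in for a real K/V read or write against the cluster."""
    time.sleep(random.uniform(0, 0.0002))

max_throughput = 1000  # illustrative value, as if found in step 4
for fraction in (0.5, 0.75, 1.0):
    p50, p95 = run_step(int(max_throughput * fraction) // 10, fake_op)
    print(f"{int(fraction * 100)}% load: p50={p50:.2f}ms p95={p95:.2f}ms")
```

The interesting signal in such a sweep is usually not the median but how the tail (p95/p99) degrades as load approaches the maximum.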
Aerospike is an SSD-optimized DB, and proved itself
Asynchronous K/V stores can sustain a huge amount of traffic
This says nothing about secondary indexes
Doesn't answer our other main concern about distributed DBs
Throughput (50%, 75%, 100%)
Failure type (graceful, kill -9, split brain)
Workload (balanced read-write, mostly read)
Replication model / durability model
[Charts: median and min/max downtime (ms) on node down and on node restore, for Aerospike, Cassandra, Couchbase, and MongoDB.]
                          Aerospike    Aerospike    Cassandra     Cassandra    Couchbase    MongoDB
                          (async)      (sync)       (async)       (sync)
Original throughput       300,000      150,000      27,500        30,000       375,000      33,750
Original replication      100%         100%         99%           104%         100%         100%
Downtime (ms)             3,200        1,600        6,300         ∞            2,400*       4,250
Recovery time (ms)        4,500        900          27,000        N/A          5,000        600
Node-down throughput      300,000      149,200      22,000        0            362,000      31,200
Node-down replication     52%          52%          N/A           54%          50%          50%
Potential data loss       large        none         220,000 rows  none         large        2,400 rows
Time to stabilize on
node up (ms)              small        3,300        small         small        small        31,100
Final throughput          300,000      88,300       21,300†       17,500†      362,000      31,200
Final replication         100%         100%         101%          108%         76%          100%

† Depends on driver being used; newer drivers like Hector restore to 100% throughput
* Assuming perfect monitoring scripts
For the "Fast" scenario, these systems function as advertised
◦ Downtime is low
◦ Performance effect is not dramatic
For the "Reliable" scenario:
◦ For MongoDB and Cassandra, make sure you have a replication factor of 3
◦ Include replication lag in your capacity planning
For both, understand your potential data loss
Finite wealth
◦ Building and reserving bare-metal hardware, each with multiple SSDs, is expensive
◦ Not a fair synchronous failover test for Cassandra and MongoDB
Lab != real world
Replication delays
Do it in a larger cluster
Measure data loss
Measure more than K/V storage
Get other databases involved
◦ What's your preference?
Thumbtack
http://www.thumbtack.net
◦ NoSQL implementations
◦ Application scalability
◦ Social applications
◦ Mobile
◦ Cloud migrations

Ben Engber
email@example.com
bengber

Everything else:
http://www.slideshare.net/bengber/no-sql-presentation
http://thumbtack.net/solutions/ThumbtackWhitePaper.html