How to Compare NoSQL Databases: Aerospike, Cassandra, Couchbase, NoSQL


Thumbtack's talk from the NoSQL Matters conference in Cologne, Germany. In it we discuss in some depth what it really means to choose a consistency model for your data, and the various tradeoffs involved in each. We then look at performance and failover characteristics for four databases.

Published in: Technology

  • NoSQL before "NoSQL". CAP is not really that relevant: if you’re using a NoSQL database, you are probably running on a cluster with multiple partitions, and the consistency-versus-availability tradeoff is not easy to interpret except in a formal, theoretical way. Plus, it’s tunable. Plus, what does consistency even mean? Example: Cassandra partial writes.
  • What is a secondary index? More key-value lookups. It’s just a matter of time before this gravy becomes a problem.
  • Examine how they fail over, and what that means in real terms.
  • Couchbase in its default mode is consistent, but only by having a master copy. Data set fits in RAM: think about it. To disk: if the disk delivers 40k IOPS and the transaction rate is 250k, think about it.
  • These are all approximate. Find the vBucket for the key, master on Node 1, write.
  • The client connects to some node, which can act as a coordinator. Different drivers come with better routing.
  • Similar to Cassandra, but no partial writes – each transaction is
  • 1. Provision the hardware, operating systems, and environment using our best estimates of what customers will run in their data centers and the best practices, where specified, of the various databases. 2. Configure it optimally and ensure it is functioning as a single cluster. 3. Load the data by inserting records individually but as fast as possible.
  • What does this mean – look at flat latency for both Couchbase and Aerospike.
  • Secondary indexes: queries hit all nodes.
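
The notes call consistency "tunable", and the Cassandra slide in the transcript shows clients writing and reading at one, quorum, and all. A minimal sketch of the usual replica-overlap rule (a read is guaranteed to touch at least one replica that acknowledged the latest write when R + W > N) is given here, assuming the replication factor of 3 from the slides. The function names are illustrative and not part of any driver API, and the rule deliberately ignores effects such as the partial writes mentioned in the notes.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge an operation at the given level."""
    if level == "one":
        return 1
    if level == "quorum":
        # Majority of the replicas.
        return replication_factor // 2 + 1
    if level == "all":
        return replication_factor
    raise ValueError(f"unknown level: {level}")


def reads_overlap_writes(write_level: str, read_level: str,
                         replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


# (write level, read level) pairs as listed for the clients on the slide.
COMBINATIONS = [
    ("one", "one"),
    ("all", "one"),
    ("quorum", "quorum"),
    ("one", "all"),
]


def summarize(replication_factor: int = 3) -> dict:
    """Map each (write, read) pair to whether overlap is guaranteed."""
    return {
        (w, r): reads_overlap_writes(w, r, replication_factor)
        for w, r in COMBINATIONS
    }


if __name__ == "__main__":
    for (w, r), ok in summarize().items():
        verdict = "overlap guaranteed" if ok else "stale reads possible"
        print(f"write {w:6} / read {r:6}: {verdict}")
```

Under these assumptions only the write one / read one combination lacks a guaranteed overlap, which is one way to read the question "what does consistency even mean?".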

    1. Focusing on key characteristics of Aerospike, Cassandra, Couchbase, and MongoDB
       Ben Engber, Thumbtack Technology
    2. • Consulting company with focus on scalability
       • Long background of “no SQL”
         ◦ Production deployments across many NoSQL vendors
       • Engineering team of 50, 6 months of research
       • People aren’t asking the right questions
         ◦ Document-oriented feature sets
         ◦ CAP Theorem
       • A number of reports, but little good data
    3. • “Because I’ve heard of it”
       • “I want rapid application development”
       • “I want to support a large transaction volume”
       • “I want to distribute my data tier”
       • “I want simpler administration”
       Key-Valuey Stuff
    4. • Choose the databases we hear about most often
       • Create standard baseline key-value scenarios
       • Measure raw performance for each
       • Examine failover
    5. Fast
       • Not the same as Available
       • Asynchronous replication to nodes
       • Asynchronous writes to disk
       • Data set fits in RAM
       Reliable
       • Not the same as Consistent
       • Synchronous replication to nodes
       • Synchronous or asynchronous writes to disk
       • Data set much too big for RAM
    6. 6 nodes, 6 virtual shards, replication factor of 3
    7. Node 1: master A; slaves B, C
       Node 2: master B; slaves C, D
       Node 3: master C; slaves D, E
       Node 4: master D; slaves E, F
       Node 5: master E; slaves F, A
       Node 6: master F; slaves A, B
       Client: write master, read master
       Client: write master and observe, read master
    8. Node 1: master A; Node 2: slave A; Node 3: slave A
       Node 4: master B; Node 5: slave B; Node 6: slave B
       Client 1: write master, read master
       Client 2: write master and observe, read master (same node layout)
    9. Node 1: A, B, C; Node 2: B, C, D; Node 3: C, D, E; Node 4: D, E, F; Node 5: E, F, A; Node 6: F, A, B
       Client 1: write one, read one
       Client 2: write one, read one
       Client 3: write one, read one
       Client 4: write quorum, read quorum
       Client 5: write one, read all
       Client 6: write all, read one
       “In distributed data systems like Cassandra, [consistency] usually means that once a writer has written, all readers will see that write.”
    10. Node 1: A, B, C; Node 2: B, C, D; Node 3: C, D, E; Node 4: D, E, F; Node 5: E, F, A; Node 6: F, A, B
        Client 1: fire and forget
        Client 2: fire and forget
        Client 3: fire and forget
        Client 4: ACID
        Client 5: ACID
    11. | | Aerospike (fast) | Cassandra (fast) | Couchbase (fast) | MongoDB (fast) | Aerospike (reliable) | Cassandra (reliable) | MongoDB (reliable) |
        |---|---|---|---|---|---|---|---|
        | Standard replication model | async | async | async | async | sync | sync | sync |
        | Default sync batch | 128kB per device | 10 seconds | 250k records | 100ms | ? ms | 10 seconds | 100ms |
        | Consistency model | eventual | eventual | immediate | immediate | immediate | immediate | immediate |
        | Consistency on single node failure | inconsistent | inconsistent | inconsistent | inconsistent | consistent | consistent | consistent |
        | Availability on single node failure / no quorum | available | available | available | available | available | unavailable* | unavailable* |
        | Data loss on replica set failure | 25% | 25% | 25% | 50% | 25% | 25% | 50% |
    12. 1. Provision according to best practices and reality
        2. Install a database on a 4-node cluster (replication factor of 2)
        3. Load a large dataset (500M rows) to disk (SSD)
        4. Determine maximum load
        5. Perform a stepwise load for latency
        6. Repeat for read-heavy and balanced read-write
        7. Repeat steps 3-6 for a dataset that fits into RAM
    13. [Two bar charts, each grouped by workload (Balanced, Read-Heavy). SSD / Synchronous panel: Aerospike, Cassandra, MongoDB; axis 0 to 350,000. RAM / Asynchronous panel: Aerospike, Cassandra, MongoDB, Couchbase 1.8, Couchbase 2.0; axis 0 to 1,000,000.]
    14. [Four line charts of average latency (ms) against throughput (ops/sec). SSD / Synchronous: "Balanced Workload Read Latency (Full view)" and "Balanced Workload Update Latency (Full view)" for Aerospike, Cassandra, MongoDB. RAM / Asynchronous: the same two panels for Aerospike, Couchbase 1.8, Couchbase 2.0.]
    15. • The world will need to scale
        • K/V stores can do a huge amount of traffic
        • Think about your durability models
          ◦ Consistency might not mean what you think it does
        • Do these systems really handle partitions well?
    16. • Throughput (50%, 75%, 100%)
        • Failure type (graceful, kill -9, split brain)
        • Workload (balanced read-write, mostly read)
        • Replication Model / Durability Model
    17. [Chart panels: Aerospike, Cassandra, Couchbase, MongoDB. Value labels: 300k, 300k, 27.5k, 27.5k.]
    18. [Chart panels: Aerospike, Cassandra, Couchbase, MongoDB.]
    19. [Chart panels: Aerospike, Cassandra, Couchbase, MongoDB. Value labels: 150k, 22.5k, 22.5k.]
    20. [Chart panels: Aerospike, Cassandra, MongoDB. Value labels: 20k, 4.5k, 20k.]
        See how easy it is to do something wrong?
    21. [Two charts for Aerospike, Cassandra, Couchbase, MongoDB. "Downtimes on node down": min/max downtime (ms), axis 0 to 14000. "Downtime on node restore": median downtime (ms), axis 0 to 35000.]
    22. • For “Fast” scenario, these systems just work
          ◦ Downtime is low
          ◦ Performance effect is not dramatic
          ◦ You’re losing data on even one node failure
        • For “Reliable” scenario, plan capacity carefully
          ◦ MongoDB and Cassandra: min replication factor is 3
          ◦ Include replication effects in capacity planning
    23. • Finite wealth
          ◦ Building and reserving bare metal hardware, each with multiple SSDs, is expensive
          ◦ Running fair tests on a cloud is hard
        • Quantify data loss
        • How do secondary indexes change the picture?
        • Get other databases involved
        • What’s your preference?
    24. Thumbtack, Ben Engber
        • NoSQL implementations
        • Application scalability
        • Social applications
        • Mobile
        • Cloud migrations
        bengber@thumbtack.net, bengber
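
The shard layouts on slides 7, 9, and 10 and the "data loss on replica set failure" row on slide 11 can be explored with a small sketch. It assumes equally sized shards, a ring layout in which each node holds its own shard plus the next ones, and, as a contrast, disjoint replica sets that each hold one shard. The slides do not say how the 25% and 50% figures were derived, so this is only one plausible reading, and all function names are hypothetical.

```python
from itertools import combinations
from string import ascii_uppercase


def ring_placement(nodes: int, replication_factor: int) -> dict:
    """Node i (1-based) holds shard i and the next replication_factor - 1 shards."""
    shards = ascii_uppercase[:nodes]
    return {
        node + 1: [shards[(node + k) % nodes] for k in range(replication_factor)]
        for node in range(nodes)
    }


def paired_placement(nodes: int, replication_factor: int) -> dict:
    """Nodes are grouped into disjoint replica sets; each set holds one shard."""
    return {
        node + 1: [ascii_uppercase[node // replication_factor]]
        for node in range(nodes)
    }


def worst_case_loss(placement: dict, failed_count: int) -> float:
    """Largest fraction of shards left with no copy when failed_count nodes fail."""
    all_shards = {shard for held in placement.values() for shard in held}
    worst = 0.0
    # Try every possible set of simultaneously failed nodes.
    for failed in combinations(placement, failed_count):
        surviving = {
            shard
            for node, held in placement.items()
            if node not in failed
            for shard in held
        }
        lost = len(all_shards - surviving) / len(all_shards)
        worst = max(worst, lost)
    return worst


if __name__ == "__main__":
    # Layout from the slides: 6 nodes, 6 virtual shards, replication factor 3.
    for node, held in ring_placement(6, 3).items():
        print(f"Node {node}: {', '.join(held)}")

    # Benchmark cluster from slide 12: 4 nodes, replication factor 2.
    print("ring layout, 2 nodes lost:", worst_case_loss(ring_placement(4, 2), 2))
    print("paired sets, 2 nodes lost:", worst_case_loss(paired_placement(4, 2), 2))
```

Under these assumptions, losing both nodes that hold one shard costs a quarter of the data in the ring layout and half of it with two disjoint replica sets, while a single node failure loses nothing in either layout.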