Benchmarking MongoDB and CouchBase

13,629 views

Published on

4 Comments
34 Likes
Statistics
Notes
  • For full text search, try elastic search! With the mongodb-elasticsearch river setup, you can have full blown search engine integrated with database seamlessly!
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Can't understand slide 37. What do you mean with one BIG box? and then you show 28 number of shards.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • @bobliu3 You missed understood me. The tests were ran with the current version, however the problem of performance still persists, and I wasn't the only one who found the performance problem, so I searched for support on Couchbase Support, and came across an old* post which says 'one known issue.... views are optimal on very large clusters...', I was merely making a point they don't seem to have addressed the issue. And that problem has existed since 2011.Does that make more sense now?
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • In your slides, you mentioned the Couchbase results are from the developer preview dated in 2011. However, I see the slide posted date is 1/24/2013. So my question is that why post the slides with old data 'now' and are you planning on doing an evaluation of the couchbase official release?
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total views
13,629
On SlideShare
0
From Embeds
0
Number of Embeds
88
Actions
Shares
0
Downloads
176
Comments
4
Likes
34
Embeds 0
No embeds

No notes for slide

Benchmarking MongoDB and CouchBase

  1. 1. BenchmarkingNo-SQL Databases Alex Voss Chris Choi
  2. 2. TOP 2 Questions Should a social scientist buy MORE or UPGRADE computers? Which DATABASE(s)?
  3. 3. Document Oriented Databases
  4. 4. The central concept of adocument-oriented databaseis the notion of a Document
  5. 5. Both andUses JSON to store theseDocuments
  6. 6. Example Documents: {{ FirstName:"Jonathan", FirstName:"Bob", Address:"15 Wanamassa Point Road", Address:"5 Oak St.", Children:[ Hobby:"sailing" {Name:"Michael",Age:10},} {Name:"Jennifer", Age:8}, {Name:"Samantha", Age:5}, {Name:"Elena", Age:2} ] }there are no empty fields, and it doesnot require explicitly stating if otherpieces of information are left out
  7. 7. The Database Tests
  8. 8. Database TestAggregate/Count: Most User Mentioned Most Hashed Tags Most Shared URLs
  9. 9. Database TestAggregate/Count: Typical questions Most User Mentioned Most Hashed Tags Most Shared URLs
  10. 10. Has two query approaches: AggregationMap Reduce & Framework
  11. 11. Map Reduce For Aggregation Uses JavaScript
  12. 12. Aggregation Framework Similar to Map Reduce But Uses C++ (Faster!)
  13. 13. Single Node SSD vs HDD Map Reduce Aggregation Framework60 60 Most User Mentioned Most Hashed-Tags Most User Mentioned50 50 Most Hashed-Tags Most Shared URLs Most Shared URLs40 4030 3020 2010 100 0 HDD SSD HDD SSD
  14. 14. Single NodeAggregation Framework is roughly 2 times faster than Map Reduce
  15. 15. When we have more than 1 node ~ Scaling Generally . . .
  16. 16. Scaling BIGGERVertically and FATTER machine!ScalingHorizontally MORE machines!
  17. 17. Replica-SetMaster-SlaveReplicationReplicates datafrom master toslave Only one node is in charge (the Master), and data only travels in one direction (to the Slave)
  18. 18. Replica-Set Single Node vs Replica Set: Aggregation Framework 25 Single Node Replica Set 20 15Time (minutes) 10 5 0 User Mentioned Hashed-Tags Shared URLs
  19. 19. Replica-Set Single Node vs Replica Set: Map Reduce 80 Single Node 70 Replica Set 60 50Time (minutes) 40 30 20 10 0 User Mentioned Hashed-Tags Shared URLs
  20. 20. Replicating Data did not improve query performance(MapReduce and Aggregation Framework) . . . so we tried Sharding
  21. 21. Sharding One BIG stack Two or more smaller stacks
  22. 22. democratic uses a hierarchical scaling approachMaster-SlaveReplication
  23. 23. But Deployment is anissue with MongoDB . . .If you want to shard . . .
  24. 24. documentation suggests minimum 11 nodes, to build a fail-safe sharded cluster2 Shards → 6 nodes (3 each)2 mongos (load balancer) → 2 nodes3 mongod (config servers) → 3 nodes------------------------------------------- Total: 11 nodes
  25. 25. The investment to go fromsingle node --> sharded clusteris very HIGH!
  26. 26. By omitting Fail-Safe featuresFor our tests, we kind of cheated and used: Note: X is variable 2 X Shards → 6 2 nodes 2 1 mongos (load balancer) → 2 0.5 node 3 1 mongod (config servers) → 3 0.5 node ------------------------------------------- Total: 11 3 nodes NOT RECOMMENDED IN PRODUCTION
  27. 27. Which looks something like this: 4 Shard configuration
  28. 28. Or this: 8 Shard configuration
  29. 29. Another problem is withquerying in a shardedenvironment . . .
  30. 30. We couldnt perform queries usingthe Aggregation Framework in theshards, because it was limited by:- Output at every stage of theaggregation can only contain 16MB- >10% RAM usage MongoDB would fail if the limits were reached
  31. 31. 4 Cores 8GB RAM Sharded ClusterMap Reduce: On 3 modest machines Most hash tags (mins) 30.00 Most user mentions (mins) Most shared urls (mins) 25.00 20.00 Time (minutes) 15.00 10.00 5.00 0.00 2 4 8 Most Number of Shards Optimal
  32. 32. 4 Cores 8GB RAM Sharded Cluster Map Reduce: On 3 modest machines Single Node Three Nodes 60.00 60 Most hash tags (mins) Most Hashed-Tags Most user mentions (mins) 50.00 50 Most User Mentioned Most shared urls (mins) Most Shared URLs 40.00 40 Time (minutes)Time (minutes) 30 30.00 20 20.00 10 10.00 0 0.00 HDD SSD 2 4 8 Number of Shards
  33. 33. 4 Cores 8GB RAM Sharded Cluster Map Reduce: On 3 modest machines Single Node Three Nodes 60.00 60 60 Most Hashed-Tags Most hash tags (mins) Most Hashed-Tags Most User Mentioned Most user mentions (mins) 50.00 50 50 Most User Mentioned Most Shared URLs Most shared urls (mins) Most Shared URLs 40 40 40.00 Time (minutes)Time (minutes) Time (minutes) 30 30 30.00 20 20 20.00 10 10 10.00 0 0 0.00 HDD HDD SSD SSD 2 4 8 Number of Shards
  34. 34. Performance of Single Node (SSD)is very comparable to 3 nodes (HDD)of different number of shards! (Single Node with SSD is 2-3 minutes slower than 3 nodes . . . [40 GB corpus])Perhaps its worth purchasing a goodSSD over investing more nodes toimprove aggregate performance . . .
  35. 35. What happens when we scale verticallywith ONE BIG BOX? we get --->
  36. 36. 32 Cores 128GB RAM Sharded Cluster Map Reduce: On a Big Box 12.00 Most hash tags (mins) Most user mentions (mins) Most 10.00 Most shared urls (mins) Optimal 8.00Time (minutes) 6.00 4.00 2.00 0.00 2 4 8 16 24 28 30 Number of Shards
  37. 37. On a SINGLE NODE , while CouchBase is FASTER than MongoDBs MapReduceMongoDBs Aggregation Framework is FASTER than CouchBase
  38. 38. That said adding moreCouchbase nodes was disappointing. . .
  39. 39. 1 2 VS 3 VS Node Nodes Nodes One Node90 Two Nodes Three Nodes80706050403020100 Most user mentioned Most hashed tags Most shared urls
  40. 40. Adding more nodes did not improve query performance . . .
  41. 41. Other users have found similar issues,Couchbase Support says: “One known issue with the current developerpreview of Couchbase is that it slows downon smaller clusters. The views are optimal on very large clusters.” [1] But they did not specify what is “very large” [1] Couchbase performance - real tests - its horrible - or am I doing somethin wrong?
  42. 42. That wasNovember 2011 . . .
  43. 43. Which one?
  44. 44. The obvious answer isFrom a performance standpoint.
  45. 45. Simple flat file aggregation with Javatook approx 60 minsOn the same machine, MongoDB tookapprox30 mins (MapReduce)10 mins (Aggregation Framework)
  46. 46. Deployment Options 140.00 2500 1 Node Total ExecutionHard drive HDD Time 2200Map Reduce MR Costs 120.00 1950 2000 100.00 Approximate Total Cost (GBP) Total Execution Time (mins) 1 Node 1500 80.00 SSD MR 60.00 1000 1000 900 1000 3 Nodes 40.00650 HDD MR MR 500 20.00 AF Big Box 1 Node #2 HDD 0.00 Fast SSD 0 MR Aggregate Framework
  47. 47. But! Has problems with: - Full Text Search - Social Network Graphs - Aggregation
  48. 48. s Full Text Search is slow1 simple query on a singlenode, 44GB corpus tookapprox 10 mins!
  49. 49. Solution: A Fast Full Text Search Engine The same search took only a fraction of a second
  50. 50. To create aSocialNetworkGraph with Can be Complex, since you query multiple times to obtain relationships
  51. 51. Solution: A Popular Graph Database Where you can obtain a network graph with VERY FEW queries
  52. 52. Example: START john=node:node_auto_index(name = John) MATCH john-[:friend]->()-[:friend]->fof RETURN john, fof The above query fetches all nodes who are friends of friends of “Johns”
  53. 53. With Aggregation Is perhaps not the best, we have yet to play with
  54. 54. Or even An SQL language to build MapReduce queriesWith
  55. 55. Moral of the storyUse the right toolsfor the right job
  56. 56. Future workWe aim to experiment with:With possibility in integrating bothwith

×