Scaling MongoDB for real-time analytics


  1. SHORTCUTS AROUND THE MISTAKES WE'VE MADE SCALING MONGODB. David Tollmyr, Platform lead (@effata, slideshare.net/tollmyr)
  2. What we do. We want to make digital advertising an amazing user experience. There is more to metrics than clicks.
  3. Ads
  4. Data
  5. Assembling sessions: exposure ping, event ping, ping, ping, … ➔ session
  6. Information
  7. Crunching: session, session, session, session, … ➔ 42
  8. Metrics
  9. Reports
  10. What we do: track ads, make pretty reports.
  11. That doesn't sound so hard. We don't know when sessions end. There's a lot of data. It's all done in (close to) real time.
  12. Numbers: 200 GB of logs, 100 million data points per day, ~300 metrics per data point = 6,000 updates/s at peak.
  13. How we use(d) MongoDB: as "virtual memory" to offload data while we wait for sessions to finish; as short-term storage (<48 hours) for batch jobs, replays and manual analysis; as metrics storage.
  14. Why we use MongoDB: schemalessness makes things so much easier, since the data we collect changes as we come up with new ideas; sharding makes it possible to scale writes; secondary indexes and the rich query language are great features (for the metrics store); it's just… nice.
  15. Btw, we use JRuby, it's awesome.
  16. STANDING ON THE SHOULDERS OF GIANTS WITH JRUBY: slideshare.net/iconara
  17. A story in 9 iterations
  18. 1st iteration: secondary indexes and updates. One document per session, updated as new data comes along. Outcome: 1000% write lock.
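     A minimal mongo shell sketch of this first approach (collection and field names are hypothetical, not from the talk): one upsert per incoming ping, so every ping contends for the same global write lock.
        // hypothetical incoming data
        var sessionId = "abc", event = {type: "click"};
        // one document per session, grown in place as pings arrive
        db.sessions.update(
          {_id: sessionId},                           // session id as primary key
          {$inc: {pings: 1}, $push: {events: event}}, // mutate the session doc
          true                                        // upsert if missing
        )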
  19. #1: Everything is about working around the GLOBAL WRITE LOCK
  20. MongoDB 1.8.1:
      db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)
      db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)
  21. MongoDB 2.0.0:
      db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)
      db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)
      (The code on both slides is the same; the change is the server version. 2.0.0 began yielding the global write lock around page faults, so identical updates spend less time holding it.)
  22. 2nd iteration: using scans for two-step assembling. Instead of updating, save each fragment, then scan over _id to assemble sessions.
  23. 2nd iteration: using scans for two-step assembling. Outcome: not as much lock, but still not great performance. We also realised we couldn't remove data fast enough.
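     A sketch of the two-step pattern in the mongo shell (collection name and _id scheme are hypothetical): fragments are append-only inserts keyed by session id, and assembly is a range scan over the _id index.
        var sessionId = "abc", seq = 1, ping = {t: 1};  // hypothetical values
        // step 1: write each ping as its own document, never update
        db.fragments.insert({_id: sessionId + ":" + seq, ping: ping})
        // step 2: assemble one session by range-scanning _id
        // (";" sorts just above ":", so this brackets the session's keys)
        db.fragments.find({_id: {$gte: sessionId + ":", $lt: sessionId + ";"}})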
  24. #2: Everything is about working around the GLOBAL WRITE LOCK
  25. #3: Give a lot of thought to your PRIMARY KEY
  26. 3rd iteration: partitioning. Partitioning the data by writing to a new collection every hour. Outcome: complicated, fragmented database.
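     A sketch of hourly partitioning (the naming convention is hypothetical): writes go to the current hour's collection, and expiry becomes a cheap drop() instead of millions of remove()s.
        var fragment = {_id: "abc:1", ping: {t: 1}};  // hypothetical document
        var hour = "2012051409";                      // derived from the current timestamp
        // route writes to a per-hour collection
        db.getCollection("fragments_" + hour).insert(fragment);
        // removing old data = dropping a whole collection
        db.getCollection("fragments_2012051323").drop();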
  27. #4: Make sure you can REMOVE OLD DATA
  28. 4th iteration: sharding. To get around the global write lock and get higher write performance we moved to a sharded cluster. Outcome: higher write performance, lots of problems, lots of ops time spent debugging.
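     The era-appropriate setup commands (database, collection and shard key here are illustrative, not the ones from the talk). Once sharded, each shard has its own global write lock, which is the point.
        // run against a mongos
        db.adminCommand({enablesharding: "analytics"})
        db.adminCommand({shardcollection: "analytics.fragments", key: {_id: 1}})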
  29. #5: Everything is about working around the GLOBAL WRITE LOCK
  30. #6: SHARDING IS NOT A SILVER BULLET, and it's complex; if you can, avoid it
  31. #7: IT WILL FAIL, design for it
  32. 5th iteration: moving things to separate clusters. We saw very different loads on the shards and realised we had databases with very different usage patterns, some that made autosharding not work. We moved these off the cluster. Outcome: a more balanced and stable cluster.
  33. #8: Everything is about working around the GLOBAL WRITE LOCK
  34. #9: ONE DATABASE with one usage pattern PER CLUSTER
  35. #10: MONITOR EVERYTHING, look at your health graphs daily
  36. 6th iteration: monster machines. We got new problems removing data and needed some room to breathe and think. Solution: upgraded the servers to High-Memory Quadruple Extra Large (with cheese).
  37. #11: Don't try to scale up, SCALE OUT
  38. #12: When you're out of ideas, CALL THE EXPERTS
  39. 7th iteration: partitioning (again) and pre-chunking. We rewrote the database layer to write to a new database each day, and we created all chunks in advance. We also decreased the size of our documents by a lot. Outcome: no more problems removing data.
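     A sketch of the pre-chunking idea (database name and split points are hypothetical): before a new day's database takes any writes, split its collections at known shard-key boundaries and place the empty chunks, so the balancer isn't splitting and migrating under peak load.
        // pre-split tomorrow's collection along the shard key
        db.adminCommand({split: "analytics_20120515.fragments", middle: {_id: "4"}})
        db.adminCommand({split: "analytics_20120515.fragments", middle: {_id: "8"}})
        // hand chunks to specific shards while they are still empty
        db.adminCommand({moveChunk: "analytics_20120515.fragments", find: {_id: "4"}, to: "shard0001"})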
  40. #13: Smaller objects mean a smaller database, and a smaller database means LESS RAM NEEDED
  41. #14: Give a lot of thought to your PRIMARY KEY
  42. #15: Everything is about working around the GLOBAL WRITE LOCK
  43. 8th iteration: realize when you have the wrong tool. Transient data might not need all the bells and whistles. Outcome: Redis gave us 100x performance in the assembling step.
  44. #16: When all you have is a HAMMER, everything looks like a NAIL
  45. 9th iteration: rinse and repeat. We now have the same scaling issues later in the chain. Outcome: an upcoming rewrite to make writes/updates more effective. (Redis was actually slower for this part.)
  46. #17: Everything is about working around the GLOBAL WRITE LOCK
  47. Thank you! @effata, slideshare.net/tollmyr, engineering.burtcorp.com, burtcorp.com, richmetrics.com
  48. Since we got time…
  49. Tips: EC2. You have three copies of your data, do you really need EBS? Instance store disks are included in the price and they have predictable performance. m1.xlarge comes with 1.7 TB of storage.
  50. Tips: Avoid bulk inserts. Very dangerous if there's a possibility of duplicate key errors. It's not fixed in 2.0 even though the driver has a flag for it.
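     The failure mode, sketched in the mongo shell (collection name hypothetical): a batch insert stops at the first duplicate key error, and everything after that point is silently never written.
        db.pings.insert([
          {_id: "a", n: 1},
          {_id: "a", n: 2},  // duplicate key: the batch aborts here
          {_id: "b", n: 3}   // never inserted
        ])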
  51. Tips: Safe mode. Run every Nth insert in safe mode. This will give you warnings when bad things happen, like failovers.
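     A sketch of "safe mode every Nth write" using the era-appropriate getlasterror command (the loop and names are hypothetical): unacknowledged inserts for throughput, with a periodic acknowledged check that surfaces errors such as failovers.
        var pings = [{_id: 1}, {_id: 2}];  // hypothetical batch
        var i = 0;
        pings.forEach(function (ping) {
          db.pings.insert(ping);
          if (++i % 1000 === 0) {
            var status = db.runCommand({getlasterror: 1});  // blocks for an ack
            if (status.err) print("write failed: " + status.err);
          }
        });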
