The Rough Guide to MongoDB

1,513 views
1,418 views

Published on

Simeon Simeonov, Founder & CTO of Swoop, shares how Swoop uses Mongo behind the scenes for their high-performance core data processing and analytics. The presentation goes over tips and tricks such as zero-overhead hierarchical relationships with MongoDB, high-performance MongoDB atomic update buffering, content-addressed storage using cryptographic hashing and more. Presented to the Boston MongoDB User Group.

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,513
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
18
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

The Rough Guide to MongoDB

  1. 1. The Rough Guide to MongoDB Simeon Simeonov @simeons
  2. 2. Founding. Funding. Growing. Startups.
  3. 3. Why MongoDB?
  4. 4. I am @simeons
  5. 5. recruit amazing people solve hard problems ship make users happy repeat
  6. 6. Why MongoDB? Again, please
  7. 7. SQL is slow (for our business)
  8. 8. SQL is slow (for our developer workflow)
  9. 9. SQL is slow (for our analytics system)
  10. 10. So what’s Swoop?
  11. 11. Display Advertising Makes the Web Suck User-focused optimization Tens of millions of users 1000+% better than average 200+% better than Google Swoop Fixes That
  12. 12. Mobile SDKs iOS & Android Web SDK RequireJS & jQuery Components AngularJS NLP, etc. Python Targeting High-Perf Java Analytics Ruby 2.0 Internal Apps Ruby 2.0 / Rails 3 Pub Portal Ruby 2.0 / Rails 3 Ad Portal Ruby 2.0 / Rails 4
  13. 13. MongoDB: the Good Fast Flexible JavaScript
  14. 14. MongoDB: the Bad Not Quite Enterprise-Grade Not Quite Enterprise-Grade Not Cheap to Run Well
  15. 15. I will write more robust code I will write more robust code I will write more robust code I will write more robust code I will write more robust code I will write more robust code I will write more robust code I will write more robust code I will write more robust code
  16. 16. I will design a better map-reduce I will design a better map-reduce I will design a better map-reduce I will design a better map-reduce I will design a better map-reduce I will design a better map-reduce I will design a better map-reduce I will design a better map-reduce I will design a better map-reduce
  17. 17. RAM + locks == $$$
  18. 18. Five Steps to Happiness Sharding Native Relationships Atomic Update Buffering Content-Addressed Storage Shell Tricks
  19. 19. // Google AdWords object model Account Campaign AdGroup // this joins ads & keywords Ad Keyword // For example AdGroup has an Account AdGroup has a Campaign AdGroup has many Ads AdGroup has many Keywords Slam dunk for SQL
  20. 20. // Let’s play a bit Account Campaign AdGroup Ad Keyword
  21. 21. // Let’s play some more Account Campaign AdGroup Ad Keyword
  22. 22. // There is just one bit left Account Campaign AdGroup 1 Ad 0 Keyword
  23. 23. // build a hierarchical ID accountIDcampaignIDadGroupID((0keywordID)|(1adID)) // a binary ID! 10100100001100000000101001100110101010010100 < accountID >< campaignID >< … // Encode it in base 16, 32 or 64 {"_id" : "a4300a66a94d20f1", … }
  24. 24. // Example The 5th ad Of the 3rd ad group Of the 7th campaign Of the 255th account could have the _id 0x00ff000700031005 The _id for the 10th keyword of the same ad group would be 0x00ff00070003000a
  25. 25. // Neat: the ad’s and keyword’s _ids contain the // IDs of all of their ancestors in the hierarchy. keywordId = 0x00ff00070003000a adGroupId = keywordId & 0xffffffffffff0000 campaignId = keywordId & 0xffffffff00000000 accountId = keywordId & 0xffff000000000000 // has-a relationship is a simple lookup account = db.accounts.findOne({_id: accountId})
  26. 26. // Neater: has-many relationships are just // range queries on the _id index. adGroupId = keywordId & 0xffffffffffff0000 startOfAds = adGroupId + 0x1000 endOfAds = adGroupId + 0x1fff adsForKeyword = db.ads.find({ _id: {$gte: startOfAds, $lte: endOfAds} }) // Technically, that was a join via the ad group. // Who said Mongo can’t do joins???
  27. 27. > db.reports.findOne() { "_id" : …, "period" : "hour", "shard" : 0, // 16Mb doc limit protection "topic" : "ce-1", "ts" : ISODate("2012-06-12T05:00:00Z"), "variations" : { "2" : { // variationID (dimension set) "hint" : { "present" : 311, // hint.present is a metric "clicks" : 1 } }, "4" : { "hint" : { "present" : 331 } } } }
  28. 28. Content Addressed Storage Lazy join abstraction Very space efficient Extremely (pre-)cacheable Join only happens during reporting
  29. 29. // Step 1: take a set of dimensions worth tracking data = { "domain_id" : "SW-28077508-16444", "hint" : "Find an organic alternative", "theme" : "red" } // Step 2: compute a digital signature, e.g., MD5 sig = "000069569F4835D16E69DF704187AC2F” // Step 3: if new sig, increment a counter counter = 264034 // Step 4: create a document in the context- // addressed store collection for these
  30. 30. > db.cas.findOne() { "_id" : "000069569F4835D16E69DF704187AC2F", // MD5 hash "data" : { // data that was digested to the hash above "domain_id" : "SW-28077508-16444", "hint" : "Find an organic alternative", "theme” : "red" }, "meta_data" : { "id" : 264034 // variationID }, "created_at" : ISODate("2013-02-04T12:05:34.752Z") } // Elsewhere, in the reports collection… "variations" : { "264034" : { // metrics here }, … lazy join
  31. 31. // Use underscore.js in the shell // See http://underscorejs.org/ function underscore() { load("/mongo_hacks/underscore.js"); }
  32. 32. // Loads underscore.js on the MongoDB server function server_underscore(force) { force = force || false; if (force || typeof(underscoreLoaded) === 'undefined') { db.eval(cat("/mongo_hacks/underscore.js")); underscoreLoaded = true; } }
  33. 33. // Callstack printing on exception -- wraps a function function dbg(f) { try { f(); } catch (e) { print("n**** Exception: " + e.toString()); print("n"); print(e.stack); print("n"); if (arguments.length > 1) { printjson(arguments); print("n"); } throw e; } }
  34. 34. function minutesAgo(minutes, d) { d = d || new Date(); return new Date(d.valueOf() - minutes * 60 * 1000); } function hoursAgo(hours, d) { d = d || new Date(); return minutesAgo(60 * hours, d); } function daysAgo(days, d) { d = d || new Date(); return hoursAgo(24 * days, d); }
  35. 35. // Don’t write in the shell. // Use your fav editor, save & type t() in mongo function t() { load("/mongo_hacks/bag_of_tricks.js"); }
  36. 36. @simeons sim@swoop.com

×