Using NoSQL with Yo' SQL

Supplementing a relational database application with MongoDB.

Published in: Technology

  1. Using NoSQL with Yo’ SQL: Supplementing your app with a slice of MongoDB. Rich Thornett, Dribbble. Thursday, June 9, 2011
  2. Dribbble: What are you working on? Show and tell for creatives via screenshots.
  3. Your Father's Webapp. Dribbble is a typical web application: Ruby on Rails + relational database. We <3 PostgreSQL. But for certain tasks ...
  4. Alternative Values: log | scale | optimize | aggregate | cache. More flexible data structures. Easier horizontal scaling.
  5. NoSQL. No == Not Only (but sounds a bit stronger, no?) • No: Fixed table schemas • No: Joins • Yes: Scale horizontally. Examples: Memcached, Redis, CouchDB, Cassandra, MongoDB ...
  6. Exploring MongoDB • Persistent data store • Powerful query language (closest to an RDBMS) • Broad feature set • Great community and documentation. Utility belt that fits us?
  7. What is MongoDB? A document-oriented NoSQL database. Collections & documents vs. tables & rows.
  8. What's a document? Our old friend JavaScript. Documents are BSON (binary-encoded JSON).
        {
          _id: ObjectId("4ddfe31db6bc16ab615e573d"),
          description: "This is a BSON document",
          embedded_doc: { description: "I belong to my parent document" },
          tags: ["can", "haz", "arrays"]
        }
  9. Embedded Documents. Avoid joins for "belongs to" associations.
        {
          _id: ObjectId("4ddfe31db6bc16ab615e573d"),
          description: "This is a BSON document",
          embedded_doc: { description: "I belong to my parent document" },
          tags: ["can", "haz", "arrays"]
        }
  10. Arrays. Avoid joins for "tiny relations".
        {
          _id: ObjectId("4ddfe31db6bc16ab615e573d"),
          description: "This is a BSON document",
          embedded_doc: { description: "I belong to my parent document" },
          tags: ["can", "haz", "arrays"]
        }
      Relational cruft: thing, thing_taggings, tags.
  11. Googley. “With MongoDB we can ... grow our data set horizontally on a cluster of commodity hardware and do distributed (read parallel execution of) queries/updates/inserts/deletes.” --Markus Gattol, http://www.markus-gattol.name/ws/mongodb.html
  12. Replica Sets. Automate the storing of multiple copies of data • Read Scaling • Data Redundancy • Automated Failover • Maintenance • Disaster Recovery
  13. Dude, who sharded? Relax, not you. Auto-sharding. You: specify a shard key for a collection. Mongo: partitions the collection across machines. Application: blissfully unaware (mostly :)
  14. CoSQL. [Diagram labels: WEBAPP, RDBMS, MongoDB, "MIND THE APP"; Logging, Scaling, Analytics, Caching, Flexibility]
  15. Ads • Orthogonal to primary app • Few joins • Integrity not critical. Let's Mongo!
  16. From the Console (but there are drivers for all major languages). Create a text ad:
        db.ads.insert({
          advertiser_id: 1,
          type: "text",
          url: "http://dribbbler-on-the-roof.com",
          copy: "Watch me!",
          // JavaScript months are zero-based, so 4 is May
          runs: [{ start: new Date(2011, 4, 7), end: new Date(2011, 4, 14) }],
          created_at: new Date()
        })
  17. Querying. Query by match:
        db.ads.find({advertiser_id: 1})
      Paging active ads:
        // Page 2 (15 per page) of text ads with a run covering the given date
        db.ads.find({
          type: "text",
          runs: {
            $elemMatch: {
              start: {$lte: new Date(2011, 4, 10)},
              end: {$gte: new Date(2011, 4, 10)}
            }
          }
        }).sort({created_at: -1}).skip(15).limit(15)
  18. Advanced Queries. http://www.mongodb.org/display/DOCS/Advanced+Queries Operators: $gt, $lt, $gte, $lte, $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type, $elemMatch, $not, $where. count | distinct | group. Group does not work across shards; use map/reduce instead.
  19. Polymorphism. Easy inheritance: a document has whatever fields it needs.
        // Banner ad has additional fields
        db.ads.insert({
          advertiser_id: 1,
          type: "banner",
          url: "http://dribbble-me-this.com",
          copy: "Buy me!",
          runs: [],
          image_file_name: "ad.png",
          image_content_type: "image/png",
          image_file_size: 33333
        })
      Single | Multiple | Joined table inheritance all present difficulties. No DB changes are needed to create new subclasses in Mongo.
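
The paging query on slide 17 leans on two behaviours: `$elemMatch` requires a single array element to satisfy every condition, and `sort`/`skip`/`limit` slice an ordered result. A rough sketch of those semantics in plain JavaScript follows, with no MongoDB involved. The helpers `runsOn` and `page` and the three sample ads are hypothetical and exist only for illustration.

```javascript
// Plain-JavaScript model of the slide 17 query; no MongoDB required.
// runsOn and page are hypothetical helpers, and the sample ads are made up.

// $elemMatch: at least one element of `runs` must satisfy BOTH conditions.
function runsOn(ad, day) {
  return ad.runs.some(function (run) {
    return run.start <= day && run.end >= day;
  });
}

// Models sort({created_at: -1}).skip(n).limit(m) on an in-memory array.
function page(ads, pageNumber, perPage) {
  return ads
    .slice() // copy so the caller's array is not reordered
    .sort(function (a, b) { return b.created_at - a.created_at; })
    .slice((pageNumber - 1) * perPage, pageNumber * perPage);
}

// JavaScript months are zero-based, so 4 means May.
const day = new Date(2011, 4, 10);
const ads = [
  { copy: "A", type: "text", created_at: new Date(2011, 4, 1),
    runs: [{ start: new Date(2011, 4, 7), end: new Date(2011, 4, 14) }] },
  { copy: "B", type: "text", created_at: new Date(2011, 4, 2),
    // One run ends before the day and the other starts after it,
    // so no single run covers the day.
    runs: [{ start: new Date(2011, 3, 1), end: new Date(2011, 3, 8) },
           { start: new Date(2011, 5, 1), end: new Date(2011, 5, 8) }] },
  { copy: "C", type: "text", created_at: new Date(2011, 4, 3),
    runs: [{ start: new Date(2011, 4, 9), end: new Date(2011, 4, 20) }] }
];

const active = ads.filter(function (ad) {
  return ad.type === "text" && runsOn(ad, day);
});
const firstPage = page(active, 1, 15).map(function (ad) { return ad.copy; });
console.log(firstPage);
```

Ad B is the reason `$elemMatch` matters: separate conditions on `runs.start` and `runs.end` could each be met by a different run, so B would match even though it is not running on that day.
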
  20. Logging • Scale and query horizontally • Add fields on the fly • Writes: fast, asynchronous, atomic
  21. Volume Logging • Ad impressions • Screenshot views • Profile views. Fast, asynchronous writes and sharding FTW!
  22. Real-time Analytics. What people and locations are trending this hour?
        db.trends.update(
          {date: "2011-04-10 13:00"}, // search criteria
          {
            $inc: { // increment (dotted field names must be quoted)
              "user.simplebits.likes_received": 1,
              "country.us.likes_received": 1,
              "city.boston.likes_received": 1
            }
          },
          true // upsert
        )
      upsert: Update document (if present) or insert it.
      $inc: Increment field by amount (if present) or set to amount.
  23. Flex Benefits • Add/nest new fields to measure with ease • Atomic upsert with $inc: replaces two-step, transactional find-and-update/create • Live, cached aggregation
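
To make the upsert-plus-`$inc` behaviour from slide 22 concrete, here is a minimal in-memory model in plain JavaScript. It is a sketch of the semantics, not the MongoDB driver: `upsertInc` is a hypothetical helper, the `trends` array stands in for the collection, and nothing here reproduces the server's atomicity.

```javascript
// In-memory model of update(criteria, {$inc: ...}, true); not the MongoDB driver.
// upsertInc is a hypothetical helper; `trends` stands in for the collection.
const trends = [];

function upsertInc(collection, criteria, increments) {
  // Find the first document whose fields equal every criteria field.
  let doc = collection.find(function (d) {
    return Object.keys(criteria).every(function (k) { return d[k] === criteria[k]; });
  });
  if (!doc) {
    // Upsert: nothing matched, so insert a document seeded from the criteria.
    doc = Object.assign({}, criteria);
    collection.push(doc);
  }
  Object.keys(increments).forEach(function (path) {
    // Walk the dotted path, creating nested objects as needed.
    const parts = path.split(".");
    let node = doc;
    parts.slice(0, -1).forEach(function (p) {
      if (typeof node[p] !== "object" || node[p] === null) node[p] = {};
      node = node[p];
    });
    const leaf = parts[parts.length - 1];
    // $inc: add to the field, or set it to the amount if it is missing.
    node[leaf] = (node[leaf] || 0) + increments[path];
  });
  return doc;
}

const hour = { date: "2011-04-10 13:00" };
upsertInc(trends, hour, { "user.simplebits.likes_received": 1, "country.us.likes_received": 1 });
upsertInc(trends, hour, { "user.simplebits.likes_received": 1, "city.boston.likes_received": 1 });
console.log(JSON.stringify(trends[0]));
```

The first call creates the hourly document and the second only increments it, which is why a new counter such as `city.boston` can appear without any schema change.
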
  24. Scouting
  25. Design a Designer
        db.users.insert({
          name: "Dan Cederholm",
          available: true,
          skills: ["html", "css", "illustration", "icon design"]
        })
  26. Geospatial Indexing
        db.users.ensureIndex({location: "2d"})
        db.users.insert({
          name: "Dan Cederholm",
          // Salem longitude/latitude
          location: [-70.8972222, 42.5194444],
          available: true,
          skills: ["html", "css", "illustration", "icon design"]
        })
  27. Search by Location
        boston = [-71.0602778, 42.3583333] // long/lat
      Within area:
        // $maxDistance: Find users in Boston area (w/in 50 miles)
        db.users.find({location: {$near: boston, $maxDistance: 0.7234842}})
      Within area, matching criteria:
        // Find users in the Boston area who:
        //   are available for work
        //   have expertise in HTML and icon design
        db.users.find({
          location: {$near: boston, $maxDistance: 0.7234842},
          available: true,
          skills: {$all: ["html", "icon design"]}
        })
  28. Search Power: Flexible Documents + Rich Query Language + Geospatial Indexing
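
The `$maxDistance` of 0.7234842 on slide 27 is expressed in the same units as the stored coordinates, which here are degrees. The slides do not say how the figure was derived; one plausible reading, and it is only an assumption, is 50 miles divided by roughly 69.11 miles per degree. The sketch below shows that conversion, with `MILES_PER_DEGREE` and `milesToDegrees` as hypothetical names.

```javascript
// Assumption (not stated in the slides): about 69.11 miles per degree,
// which closely reproduces the 0.7234842 used for "within 50 miles".
const MILES_PER_DEGREE = 69.11;

// Convert a distance in miles to the degree units a flat 2d index works in.
function milesToDegrees(miles) {
  return miles / MILES_PER_DEGREE;
}

const maxDistance = milesToDegrees(50);
console.log(maxDistance);
```

Treat the result as an approximation: a degree of longitude covers less ground as latitude increases, so a flat degree radius is not a true circle on the Earth's surface.
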
  29. Stats
  30. Unique Views, a.k.a. visitors per day. unique = remote_ip address / DAY
  31. Map/Reduce. http://www.mongodb.org/display/DOCS/MapReduce Aggregate by key => GROUP BY in SQL. Collections: input and output. Map: returns 0..N key/value pairs per document. Reduce: aggregates values per key.
  32. Strategy: two-pass map/reduce to calculate unique visitors. Pass 1: GROUP BY profile, visitor; COUNT visits per visitor per profile. Pass 2: GROUP BY profile; COUNT visitors.
  33. Profile View Data. Visits on a given day:
        // Profile 1
        {profile_id: 1, remote_ip: "127.0.0.1"}
        {profile_id: 1, remote_ip: "127.0.0.1"}
        {profile_id: 1, remote_ip: "127.0.0.2"}
        // Profile 2
        {profile_id: 2, remote_ip: "127.0.0.4"}
        {profile_id: 2, remote_ip: "127.0.0.4"}
  34. Pass 1: Map Function. Count visits per remote_ip per profile. KEY = profile, remote_ip
        map = function() {
          var key = {
            profile_id: this.profile_id,
            remote_ip: this.remote_ip
          };
          emit(key, {count: 1});
        }
  35. Reduce Function. Counts (occurrences of key).
        reduce = function(key, values) {
          var count = 0;
          values.forEach(function(v) {
            count += v.count;
          });
          return {count: count};
        }
  36. Pass 1: Run Map/Reduce. Count visits per remote_ip per profile.
        db.profile_views.mapReduce(map, reduce, {out: "profile_views_by_visitor"})
        // Results: visits per visitor per profile
        db.profile_views_by_visitor.find()
        { "_id": { "profile_id": 1, "remote_ip": "127.0.0.1" }, "value": { "count": 2 } }
        { "_id": { "profile_id": 1, "remote_ip": "127.0.0.2" }, "value": { "count": 1 } }
        { "_id": { "profile_id": 2, "remote_ip": "127.0.0.4" }, "value": { "count": 2 } }
  37. Pass 2: Map/Reduce. Count visitors per profile. KEY = profile_id
        map = function() {
          emit(this._id.profile_id, {count: 1});
        }
  38. Pass 2: Results. Count visitors per profile.
        // Same reduce function as before
        db.profile_views_by_visitor.mapReduce(map, reduce, {out: "profile_views_unique"})
        // Results
        db.profile_views_unique.find()
        { "_id" : 1, "value" : { "count" : 2 } }
        { "_id" : 2, "value" : { "count" : 1 } }
  39. Map/Deduce. Can be clunkier than GROUP BY in SQL. But with large data sets, you get: • Horizontal scaling • Parallel processing across the cluster. JavaScript functions offer flexibility/power.
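
The two-pass strategy from slides 32 to 38 can be traced end to end without a database. The sketch below is a plain-JavaScript model: `mapReduce` is a hypothetical stand-in for the shell's `db.collection.mapReduce`, and it hands `emit` to the map function as an argument instead of exposing it as a global. The input is the profile view data from slide 33.

```javascript
// Plain-JavaScript model of the two-pass strategy; mapReduce here is a
// hypothetical stand-in for db.collection.mapReduce, not the real API.
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // serialized key -> { key, values }
  docs.forEach(function (doc) {
    // Like the shell, call map with `this` bound to the document.
    map.call(doc, function emit(key, value) {
      const id = JSON.stringify(key);
      if (!groups.has(id)) groups.set(id, { key: key, values: [] });
      groups.get(id).values.push(value);
    });
  });
  return Array.from(groups.values()).map(function (g) {
    // A key with a single value is passed through without calling reduce.
    const value = g.values.length === 1 ? g.values[0] : reduce(g.key, g.values);
    return { _id: g.key, value: value };
  });
}

// Sums the counts emitted for one key; shared by both passes.
function reduce(key, values) {
  let count = 0;
  values.forEach(function (v) { count += v.count; });
  return { count: count };
}

const profileViews = [
  { profile_id: 1, remote_ip: "127.0.0.1" },
  { profile_id: 1, remote_ip: "127.0.0.1" },
  { profile_id: 1, remote_ip: "127.0.0.2" },
  { profile_id: 2, remote_ip: "127.0.0.4" },
  { profile_id: 2, remote_ip: "127.0.0.4" }
];

// Pass 1: one output document per (profile, visitor) pair.
const byVisitor = mapReduce(profileViews, function (emit) {
  emit({ profile_id: this.profile_id, remote_ip: this.remote_ip }, { count: 1 });
}, reduce);

// Pass 2: each pair counts once, so the total per profile is unique visitors.
const uniqueVisitors = mapReduce(byVisitor, function (emit) {
  emit(this._id.profile_id, { count: 1 });
}, reduce);

console.log(JSON.stringify(uniqueVisitors));
```

Because pass 2 emits a count of 1 per pair regardless of how many visits that pair had, repeat visits from the same address collapse into a single visitor.
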
  40. Activity. SELECT * FROM everything; Too many tables to JOIN or UNION.
  41. Relational solution. Denormalized events table as activity log.
                 Column         |            Type
        ------------------------+-----------------------------
         id                     | integer
         event_type             | character varying(255)
         subject_type           | character varying(255)
         actor_type             | character varying(255)
         secondary_subject_type | character varying(255)
         subject_id             | integer
         actor_id               | integer
         secondary_subject_id   | integer
         recipient_id           | integer
         secondary_recipient_id | integer
         created_at             | timestamp without time zone
      We use James Golick’s timeline_fu gem for Rails: https://github.com/jamesgolick/timeline_fu
  42. Direction. Incoming Activity (recipients) | Generated Activity (actors)
  43. Complications. Multiple recipients • Subscribe to comments for a shot • Twitter-style @ mentions in comments. Confusing names • Generic names make queries and view logic hard to follow. N+1 • Each event may require several lookups to get actor, subject, etc.
  44. Events in Mongo. Comment on a Screenshot containing an @ mention: the screenshot owner and @user should be recipients. Mongo version of our timeline_events table:
        {
          event_type: "created",
          subject_type: "Comment",
          actor_type: "User",
          subject_id: 999,
          actor_id: 1,
          recipients: [], // Multiple recipients
          secondary_recipient_id: 3,
          created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
        }
  45. Mongo Event v.2. Why is a user a recipient?
        {
          event_type: "created",
          subject_type: "Comment",
          actor_type: "User",
          subject_id: 999,
          actor_id: 1,
          // recipients: [1, 2],  (superseded: with duplicate keys the later one wins)
          recipients: [
            {user_id: 2, reason: "screenshot owner"},
            {user_id: 3, reason: "mention"}
          ],
          created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
        }
  46. Mongo Event v.3. Meaningful names.
        {
          event_type: "created",
          subject_type: "Comment",
          actor_type: "User",
          subject_id: 999,
          actor_id: 1,
          user_id: 1,
          comment_id: 999,
          screenshot_id: 555,
          recipients: [
            {user_id: 2, reason: "screenshot owner"},
            {user_id: 3, reason: "mention"}
          ],
          created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
        }
  47. Mongo Event v.4. Denormalize to eliminate N+1s in view.
        {
          event_type: "created",
          subject_type: "Comment",
          user_id: 1,
          comment_id: 999,
          screenshot_id: 555,
          user: {id: 1, login: "simplebits", avatar: "dancederholm-peek.png"},
          comment: {id: 999, text: "Great shot!"},
          screenshot: {id: 555, title: "Shot heard around the world"},
          recipients: [
            {user_id: 2, reason: "screenshot owner"},
            {user_id: 3, reason: "mention"}
          ],
          created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
        }
  48. Denormalizing? You're giving up RDBMS benefits to optimize. Optimize your optimizations. Document flexibility: data structures can mirror the view.
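
As a rough illustration of why the v.4 event on slide 47 removes the N+1 lookups, the sketch below filters an in-memory array of events by recipient and renders each one using only its embedded fields. No MongoDB is involved. The helpers `incomingFor` and `render` are hypothetical, and so is the collection name in the comment, which the slides never give.

```javascript
// In-memory sketch of reading denormalized events; no MongoDB required.
// In the mongo shell the filter would be roughly
// db.events.find({"recipients.user_id": 3}), where `events` is an assumed
// collection name.
const events = [
  {
    event_type: "created",
    subject_type: "Comment",
    user: { id: 1, login: "simplebits", avatar: "dancederholm-peek.png" },
    comment: { id: 999, text: "Great shot!" },
    screenshot: { id: 555, title: "Shot heard around the world" },
    recipients: [
      { user_id: 2, reason: "screenshot owner" },
      { user_id: 3, reason: "mention" }
    ]
  }
];

// Incoming activity: events where the user appears among the recipients.
function incomingFor(userId) {
  return events.filter(function (e) {
    return e.recipients.some(function (r) { return r.user_id === userId; });
  });
}

// Everything the view needs is embedded, so rendering needs no extra lookups.
function render(e) {
  return e.user.login + ' commented on "' + e.screenshot.title + '": ' + e.comment.text;
}

const feed = incomingFor(3).map(render);
console.log(feed);
```

The trade-off is the one slide 48 names: the embedded copies of user, comment, and screenshot data can go stale, so this shape suits an activity log better than a system of record.
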
  49. Caching. http://www.mongodb.org/display/DOCS/Caching MongoDB uses memory-mapped files • Grabs free memory as needed; no configured cache size • Relies on OS to reclaim memory (LRU)
  50. Replace Redis/Memcached? FREQUENTLY accessed items LIKELY in memory. Good enough for you? One less moving part.
  51. Cache Namespaces. Memcached keys are flat (ad_1, ad_2, ad_3): no simple way to expire all. A collection can serve as an expirable namespace:
        // Clear collection to expire
        db.ads_cache.remove()
  52. Time to Mongo? Versatility? Data structure flexibility worth more than joins? Easier horizontal scaling? log | scale | optimize | aggregate | cache. http://www.mongodb.org
  53. Cheers! Rich Thornett, Dribbble. http://dribbble.com @frogandcode
