Using NoSQL with Yo' SQL
Supplementing a relational database application with MongoDB.

Using NoSQL with Yo' SQL Presentation Transcript

  • 1. Using NoSQL with Yo’ SQL: Supplementing your app with a slice of MongoDB. Rich Thornett, Dribbble. Thursday, June 9, 2011
  • 2. Dribbble: “What are you working on?” Show and tell for creatives via screenshots.
  • 3. Your Father’s Webapp: Dribbble is a typical web application, Ruby on Rails + a relational database. We <3 PostgreSQL. But for certain tasks ...
  • 4. Alternative Values: log | scale | optimize | aggregate | cache. More flexible data structures. Easier horizontal scaling.
  • 5. NoSQL: No == Not Only (but sounds a bit stronger, no?). No: fixed table schemas. No: joins. Yes: scale horizontally. Examples: Memcached, Redis, CouchDB, Cassandra, MongoDB ...
  • 6. Exploring MongoDB: persistent data store; powerful query language (closest to RDBMSs); broad feature set; great community and documentation. A utility belt that fits us?
  • 7. What is MongoDB? A document-oriented NoSQL database: collections & documents vs. tables & rows.
  • 8. What’s a document? Our old friend JavaScript: { _id: ObjectId("4ddfe31db6bc16ab615e573d"), description: "This is a BSON document", embedded_doc: { description: "I belong to my parent document" }, tags: ["can", "haz", "arrays"] } Documents are BSON (binary-encoded JSON).
  • 9. Embedded Documents: avoid joins for “belongs to” associations by nesting the child (embedded_doc) inside its parent: { _id: ObjectId("4ddfe31db6bc16ab615e573d"), description: "This is a BSON document", embedded_doc: { description: "I belong to my parent document" }, tags: ["can", "haz", "arrays"] }
  • 10. Arrays: avoid joins for “tiny relations” such as tags: { _id: ObjectId("4ddfe31db6bc16ab615e573d"), description: "This is a BSON document", embedded_doc: { description: "I belong to my parent document" }, tags: ["can", "haz", "arrays"] } The relational cruft this replaces: thing, thing_taggings, and tags tables.
  • 11. Googley: “With MongoDB we can ... grow our data set horizontally on a cluster of commodity hardware and do distributed (read parallel execution of) queries/updates/inserts/deletes.” (Markus Gattol, http://www.markus-gattol.name/ws/mongodb.html)
  • 12. Replica Sets: automate the storing of multiple copies of data for read scaling, data redundancy, automated failover, maintenance, and disaster recovery.
  • 13. Dude, who sharded? Relax, not you. Auto-sharding: you specify a shard key for a collection; Mongo partitions the collection across machines; the application stays blissfully unaware (mostly :)
  • 14. CoSQL [diagram: a WEBAPP (“Mind the App”) alongside an RDBMS and MongoDB; labels: Logging, Scaling, Analytics, Caching, Flexibility]
  • 15. Ads: orthogonal to the primary app; few joins; integrity not critical. Let’s Mongo!
  • 16. From the Console (but there are drivers for all major languages). Create a text ad: db.ads.insert({ advertiser_id: 1, type: "text", url: "http://dribbbler-on-the-roof.com", copy: "Watch me!", runs: [{ start: new Date(2011, 4, 7), end: new Date(2011, 4, 14) }], created_at: new Date() }) (JavaScript months are zero-based, so this run is May 7 to May 14, 2011.)
  • 17. Querying. Query by match: db.ads.find({advertiser_id: 1}); Paging active ads: /* Page 2 (15 per page) of text ads with a run active on May 10, 2011 */ db.ads.find({ type: "text", runs: { $elemMatch: { start: {$lte: new Date(2011, 4, 10)}, end: {$gte: new Date(2011, 4, 10)} } } }).sort({created_at: -1}).skip(15).limit(15)
  • 18. Advanced Queries (http://www.mongodb.org/display/DOCS/Advanced+Queries): $gt, $gte, $lt, $lte, $ne, $in, $nin, $all, $exists, $mod, $size, $type, $elemMatch, $not, $or, $nor, $where; aggregation with count | distinct | group. group does not work across shards; use map/reduce instead.
  • 19. Polymorphism: easy inheritance; a document has whatever fields it needs. /* Banner ad has additional fields */ db.ads.insert({ advertiser_id: 1, type: "banner", url: "http://dribbble-me-this.com", copy: "Buy me!", runs: [], image_file_name: "ad.png", image_content_type: "image/png", image_file_size: 33333 }) Single, multiple, and joined table inheritance all present difficulties in an RDBMS; in Mongo, creating a new subclass needs no DB changes.
  • 20. Logging: scale and query horizontally; add fields on the fly; writes are fast, asynchronous, and atomic.
  • 21. Volume Logging: ad impressions, screenshot views, profile views. Fast, asynchronous writes and sharding FTW!
  • 22. Real-time Analytics: what people and locations are trending this hour? db.trends.update( {date: "2011-04-10 13:00"}, /* search criteria */ { $inc: { /* increment */ "user.simplebits.likes_received": 1, "country.us.likes_received": 1, "city.boston.likes_received": 1 } }, true /* upsert */ ) upsert: update the document if present, otherwise insert it. $inc: increment a field by the amount if present, otherwise set it to the amount. (Dotted field names must be quoted in JavaScript.)
  • 23. Flex Benefits: add/nest new fields to measure with ease; an atomic upsert with $inc replaces a two-step, transactional find-and-update/create; live, cached aggregation.
  • 24. Scouting
  • 25. Design a Designer: db.users.insert( { name: "Dan Cederholm", available: true, skills: ["html", "css", "illustration", "icon design"] } )
  • 26. Geospatial Indexing: db.users.ensureIndex({location: "2d"}); db.users.insert( { name: "Dan Cederholm", location: [-70.8972222, 42.5194444], /* Salem, as [longitude, latitude] */ available: true, skills: ["html", "css", "illustration", "icon design"] } )
  • 27. Search by Location: var boston = [-71.0602778, 42.3583333]; /* [longitude, latitude] */ Within area: /* Find users in the Boston area (within 50 miles); with a 2d index, $maxDistance is in coordinate units (degrees), not miles */ db.users.find({location: {$near: boston, $maxDistance: 0.7234842}}); Within area, matching criteria: /* Find users in the Boston area who are available for work and have expertise in HTML and icon design */ db.users.find({ location: {$near: boston, $maxDistance: 0.7234842}, available: true, skills: {$all: ["html", "icon design"]} })
  • 28. Search Power: flexible documents + rich query language + geospatial indexing.
  • 29. Stats
  • 30. Unique Views, a.k.a. visitors per day: unique = remote_ip address / day.
  • 31. Map/Reduce (http://www.mongodb.org/display/DOCS/MapReduce): aggregate by key, like GROUP BY in SQL. Collections serve as both input and output. Map emits 0..N key/value pairs per document; reduce aggregates the values for each key.
  • 32. Strategy: two-pass map/reduce to calculate unique visitors. Pass 1: GROUP BY profile, visitor; COUNT visits per visitor per profile. Pass 2: GROUP BY profile; COUNT visitors.
  • 33. Profile View Data: visits on a given day. /* Profile 1 */ {profile_id: 1, remote_ip: "127.0.0.1"} {profile_id: 1, remote_ip: "127.0.0.1"} {profile_id: 1, remote_ip: "127.0.0.2"} /* Profile 2 */ {profile_id: 2, remote_ip: "127.0.0.4"} {profile_id: 2, remote_ip: "127.0.0.4"}
  • 34. Pass 1: Map Function. Count visits per remote_ip per profile; KEY = profile, remote_ip. map = function() { var key = { profile_id: this.profile_id, remote_ip: this.remote_ip }; emit(key, {count: 1}); }
  • 35. Reduce Function: counts occurrences of a key. reduce = function(key, values) { var count = 0; values.forEach(function(v) { count += v.count; }); return {count: count}; }
  • 36. Pass 1: Run Map/Reduce. Count visits per remote_ip per profile: db.profile_views.mapReduce(map, reduce, {out: "profile_views_by_visitor"}) /* Results: visits per visitor per profile */ db.profile_views_by_visitor.find() { "_id": { "profile_id": 1, "remote_ip": "127.0.0.1" }, "value": { "count": 2 } } { "_id": { "profile_id": 1, "remote_ip": "127.0.0.2" }, "value": { "count": 1 } } { "_id": { "profile_id": 2, "remote_ip": "127.0.0.4" }, "value": { "count": 2 } }
  • 37. Pass 2: Map/Reduce. Count visitors per profile; KEY = profile_id. map = function() { emit(this._id.profile_id, {count: 1}); }
  • 38. Pass 2: Results. Count visitors per profile, reusing the same reduce function: db.profile_views_by_visitor.mapReduce(map, reduce, {out: "profile_views_unique"}) /* Results */ db.profile_views_unique.find() { "_id" : 1, "value" : { "count" : 2 } } { "_id" : 2, "value" : { "count" : 1 } }
  • 39. Map/Deduce: can be clunkier than GROUP BY in SQL. But with large data sets you get horizontal scaling and parallel processing across the cluster, and JavaScript functions offer flexibility and power.
  • 40. Activity: SELECT * FROM everything; Too many tables to JOIN or UNION.
  • 41. Relational solution: a denormalized events table as an activity log.
     Column                 | Type
    ------------------------+-----------------------------
     id                     | integer
     event_type             | character varying(255)
     subject_type           | character varying(255)
     actor_type             | character varying(255)
     secondary_subject_type | character varying(255)
     subject_id             | integer
     actor_id               | integer
     secondary_subject_id   | integer
     recipient_id           | integer
     secondary_recipient_id | integer
     created_at             | timestamp without time zone
    We use James Golick’s timeline_fu gem for Rails: https://github.com/jamesgolick/timeline_fu
  • 42. Direction: incoming activity (recipients) vs. generated activity (actors).
  • 43. Complications. Multiple recipients: subscribers to comments on a shot, Twitter-style @ mentions in comments. Confusing names: generic names make queries and view logic hard to follow. N+1: each event may require several lookups to get the actor, subject, etc.
  • 44. Events in Mongo: a comment on a screenshot containing an @ mention; the screenshot owner and the @user should be recipients. Mongo version of our timeline_events table: { event_type: "created", subject_type: "Comment", actor_type: "User", subject_id: 999, actor_id: 1, recipients: [], /* multiple recipients */ secondary_recipient_id: 3, created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)" }
  • 45. Mongo Event v.2: why is a user a recipient? { event_type: "created", subject_type: "Comment", actor_type: "User", subject_id: 999, actor_id: 1, recipients: [ {user_id: 2, reason: "screenshot owner"}, {user_id: 3, reason: "mention"} ], created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)" } (An array of recipient objects with a reason replaces the plain array of user IDs.)
  • 46. Mongo Event v.3: meaningful names. { event_type: "created", subject_type: "Comment", actor_type: "User", subject_id: 999, actor_id: 1, user_id: 1, comment_id: 999, screenshot_id: 555, recipients: [ {user_id: 2, reason: "screenshot owner"}, {user_id: 3, reason: "mention"} ], created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)" }
  • 47. Mongo Event v.4: denormalize to eliminate N+1s in the view. { event_type: "created", subject_type: "Comment", user_id: 1, comment_id: 999, screenshot_id: 555, user: {id: 1, login: "simplebits", avatar: "dancederholm-peek.png"}, comment: {id: 999, text: "Great shot!"}, screenshot: {id: 555, title: "Shot heard around the world"}, recipients: [ {user_id: 2, reason: "screenshot owner"}, {user_id: 3, reason: "mention"} ], created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)" }
  • 48. Denormalizing? You’re giving up RDBMS benefits to optimize. Optimize your optimizations. Document flexibility: data structures can mirror the view.
  • 49. Caching (http://www.mongodb.org/display/DOCS/Caching): MongoDB uses memory-mapped files. It grabs free memory as needed, with no configured cache size, and relies on the OS to reclaim memory (LRU).
  • 50. Replace Redis/Memcached? FREQUENTLY accessed items are LIKELY in memory. Good enough for you? One less moving part.
  • 51. Cache Namespaces: Memcached keys are flat (ad_1, ad_2, ad_3), so there is no simple way to expire them all. A collection can serve as an expirable namespace: /* Clear the collection to expire */ db.ads_cache.remove({})
  • 52. Time to Mongo? Versatility? Is data structure flexibility worth more than joins? Easier horizontal scaling? log | scale | optimize | aggregate | cache. http://www.mongodb.org
  • 53. Cheers! Rich Thornett, Dribbble, http://dribbble.com, @frogandcode
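Slide 17's paging query makes two points. First, $elemMatch requires a single element of `runs` to satisfy both date bounds. Second, skip(15).limit(15) returns the second page of 15 results. The sketch below mimics those semantics in plain JavaScript over an in-memory array, to make the matching rule concrete. It is not MongoDB itself, and the helpers `isActiveOn` and `page` are hypothetical names for this illustration.

```javascript
// Plain-JavaScript approximation of slide 17's query (not the MongoDB engine).
// $elemMatch semantics: ONE element of `runs` must satisfy both bounds at once.
function isActiveOn(ad, date) {
  return ad.runs.some(run => run.start <= date && run.end >= date);
}

// Equivalent of .sort({created_at: -1}).skip((n - 1) * size).limit(size)
function page(ads, n, size) {
  return ads
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.created_at - a.created_at)
    .slice((n - 1) * size, n * size);
}

const day = new Date(2011, 4, 10); // months are 0-based: May 10, 2011
const ads = [
  { type: "text", created_at: new Date(2011, 4, 1),
    runs: [{ start: new Date(2011, 4, 7), end: new Date(2011, 4, 14) }] },
  // Two runs that straddle May 10 without either one covering it:
  { type: "text", created_at: new Date(2011, 4, 2),
    runs: [{ start: new Date(2011, 4, 1), end: new Date(2011, 4, 5) },
           { start: new Date(2011, 4, 12), end: new Date(2011, 4, 20) }] },
];

const active = ads.filter(ad => ad.type === "text" && isActiveOn(ad, day));
console.log(active.length); // 1
console.log(page(active, 2, 15).length); // 0 (only one match, so page 2 is empty)
```

The second ad shows why $elemMatch matters. A dot-notation query such as {"runs.start": {$lte: d}, "runs.end": {$gte: d}} may satisfy each condition with a different array element, so it would match that ad even though no single run covers May 10.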
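On slide 22, the update with upsert set to true and $inc does two things in one step. It creates the hourly trends document if it is missing, and it creates or increments nested counters addressed by dotted paths. The in-memory sketch below imitates that behavior so the upsert and $inc rules are visible. Several simplifications are assumptions of this sketch: `upsertInc` is a hypothetical helper, criteria matching is top-level equality only, and MongoDB performs the real operation atomically on the server, which this code does not model.

```javascript
// Hypothetical in-memory imitation of
//   db.trends.update(criteria, {$inc: inc}, true /* upsert */)
function upsertInc(collection, criteria, inc) {
  // Find the first document whose top-level fields equal the criteria.
  let doc = collection.find(d =>
    Object.entries(criteria).every(([k, v]) => d[k] === v));
  if (!doc) {
    doc = { ...criteria }; // upsert: the new document starts from the criteria fields
    collection.push(doc);
  }
  for (const [path, amount] of Object.entries(inc)) {
    const keys = path.split(".");      // "user.simplebits.likes_received"
    const last = keys.pop();
    let target = doc;
    for (const k of keys) {
      if (target[k] === undefined) target[k] = {}; // create nested documents on the fly
      target = target[k];
    }
    target[last] = (target[last] || 0) + amount;   // $inc: add, or set to amount if absent
  }
  return doc;
}

const trends = [];
const hour = { date: "2011-04-10 13:00" };
const like = {
  "user.simplebits.likes_received": 1,
  "country.us.likes_received": 1,
  "city.boston.likes_received": 1,
};
upsertInc(trends, hour, like); // first like this hour: inserts the document
upsertInc(trends, hour, like); // second like: increments the existing counters
console.log(trends.length); // 1
console.log(trends[0].user.simplebits.likes_received); // 2
```

Because the counters are created on first use, a new dimension to measure (for example another city) needs no schema change. That is the "add/nest new fields with ease" point on slide 23.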
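With a 2d index, $near and $maxDistance use flat geometry in the units of the stored coordinates. Slide 27's 0.7234842 is therefore a distance in degrees. That figure agrees, to six decimal places, with 50 miles divided by about 69.11 miles per degree. The 69.11 constant is our assumption about how the number was derived, and it is only approximate, since a degree of longitude shrinks away from the equator. The sketch below uses hypothetical helpers to reproduce the conversion and the combined location, availability, and $all skills filter in plain JavaScript.

```javascript
// Assumption: ~69.11 miles per degree (roughly one degree of latitude).
const MILES_PER_DEGREE = 69.11;
const milesToDegrees = miles => miles / MILES_PER_DEGREE;

// 2d-index style distance: straight-line distance in coordinate units (degrees).
function flatDistance([lng1, lat1], [lng2, lat2]) {
  return Math.hypot(lng1 - lng2, lat1 - lat2);
}

// Rough equivalent of:
// {location: {$near: center, $maxDistance: d}, available: true, skills: {$all: required}}
function findDesigners(users, center, maxDegrees, required) {
  return users
    .filter(u => flatDistance(u.location, center) <= maxDegrees)
    .filter(u => u.available === true)
    .filter(u => required.every(s => u.skills.includes(s))) // $all
    .sort((a, b) => flatDistance(a.location, center) -
                    flatDistance(b.location, center));      // $near returns nearest first
}

const boston = [-71.0602778, 42.3583333]; // [longitude, latitude]
const users = [
  { name: "Dan Cederholm", location: [-70.8972222, 42.5194444], available: true,
    skills: ["html", "css", "illustration", "icon design"] },
];

const maxDistance = milesToDegrees(50);
console.log(maxDistance.toFixed(7)); // 0.7234843
console.log(findDesigners(users, boston, maxDistance, ["html", "icon design"]).map(u => u.name));
```

Salem lies about 0.23 degrees from Boston in this flat model, well inside the 50-mile radius.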
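Slides 31 to 38 describe a two-pass map/reduce for unique visitors. To check what each pass produces, the sketch below runs the map and reduce functions from slides 34, 35, and 37 over the slide 33 sample data. The runner is a tiny in-memory stand-in: `mapReduce` here is a hypothetical helper, not the shell's db.collection.mapReduce. It does follow one documented MongoDB behavior: reduce is not called for a key with only one emitted value. Pass 1 gives a count of 2 for profile 2 at 127.0.0.4, because both of that profile's visits come from that address.

```javascript
// Tiny in-memory stand-in for MongoDB's mapReduce (illustration only).
let emit; // the slides' map functions call a free emit(key, value)

function mapReduce(docs, map, reduce) {
  const groups = new Map(); // serialized key -> { key, values }
  emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  };
  docs.forEach(doc => map.call(doc)); // `this` is the current document
  // As in MongoDB, a key with a single emitted value skips reduce.
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Functions from slides 34, 35 and 37.
const mapVisitor = function () {
  emit({ profile_id: this.profile_id, remote_ip: this.remote_ip }, { count: 1 });
};
const reduce = function (key, values) {
  let count = 0;
  values.forEach(v => { count += v.count; });
  return { count: count };
};
const mapProfile = function () {
  emit(this._id.profile_id, { count: 1 });
};

// Sample data from slide 33.
const profileViews = [
  { profile_id: 1, remote_ip: "127.0.0.1" },
  { profile_id: 1, remote_ip: "127.0.0.1" },
  { profile_id: 1, remote_ip: "127.0.0.2" },
  { profile_id: 2, remote_ip: "127.0.0.4" },
  { profile_id: 2, remote_ip: "127.0.0.4" },
];

const byVisitor = mapReduce(profileViews, mapVisitor, reduce); // pass 1: visits per visitor
const unique = mapReduce(byVisitor, mapProfile, reduce);       // pass 2: visitors per profile
console.log(JSON.stringify(unique));
// [{"_id":1,"value":{"count":2}},{"_id":2,"value":{"count":1}}]
```

Reduce returns the same {count: n} shape that map emits. This matters because MongoDB may re-reduce partial results for a key, which also lets one reduce function serve both passes.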
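Slides 43 to 47 evolve the event document toward three things: recipients with reasons, meaningful names, and denormalized copies for the view. The sketch below shows one way an app might assemble such a v.4-style event when a comment is created. Several details are assumptions for illustration, not the author's code: the helper `buildCommentEvent`, the `usersByLogin` lookup, the login "jane", the comment text, the @mention regex, and the rule that the commenter is never notified.

```javascript
// Hypothetical helper: build a slide-47-style event for a new comment.
// `usersByLogin` stands in for whatever lookup resolves @mentions to users.
function buildCommentEvent({ user, comment, screenshot, usersByLogin }) {
  const recipients = [];
  const seen = new Set([user.id]); // assumption: don't notify the commenter

  if (!seen.has(screenshot.user_id)) {
    recipients.push({ user_id: screenshot.user_id, reason: "screenshot owner" });
    seen.add(screenshot.user_id);
  }
  for (const [, login] of comment.text.matchAll(/@(\w+)/g)) {
    const mentioned = usersByLogin[login];
    if (mentioned && !seen.has(mentioned.id)) {
      recipients.push({ user_id: mentioned.id, reason: "mention" });
      seen.add(mentioned.id);
    }
  }

  return {
    event_type: "created",
    subject_type: "Comment",
    user_id: user.id,
    comment_id: comment.id,
    screenshot_id: screenshot.id,
    // Denormalized copies so the activity view needs no extra lookups (no N+1).
    user: { id: user.id, login: user.login, avatar: user.avatar },
    comment: { id: comment.id, text: comment.text },
    screenshot: { id: screenshot.id, title: screenshot.title },
    recipients,
    created_at: new Date(),
  };
}

const commentEvent = buildCommentEvent({
  user: { id: 1, login: "simplebits", avatar: "dancederholm-peek.png" },
  comment: { id: 999, text: "Great shot! cc @jane" },            // hypothetical text
  screenshot: { id: 555, title: "Shot heard around the world", user_id: 2 },
  usersByLogin: { jane: { id: 3, login: "jane" } },              // hypothetical user
});
console.log(commentEvent.recipients);
```

Incoming activity for a user could then be read with a dot-notation query into the embedded array, such as db.events.find({"recipients.user_id": 2}). The events collection name is an assumption.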
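Slide 51 contrasts flat Memcached keys with using one MongoDB collection per cache namespace, where clearing the collection expires everything in it at once. The in-memory model below sketches that design. The class `NamespacedCache` and its methods are hypothetical. Each namespace plays the role of a collection such as ads_cache, and `expire` corresponds to db.ads_cache.remove({}).

```javascript
// Hypothetical in-memory model of "one collection per cache namespace".
class NamespacedCache {
  constructor() {
    this.namespaces = new Map(); // namespace name -> Map(key -> value)
  }
  set(ns, key, value) {
    if (!this.namespaces.has(ns)) this.namespaces.set(ns, new Map());
    this.namespaces.get(ns).set(key, value);
  }
  get(ns, key) {
    const bucket = this.namespaces.get(ns);
    return bucket ? bucket.get(key) : undefined;
  }
  // Like db.ads_cache.remove({}): expire every entry in one namespace,
  // without having to know or enumerate the individual keys.
  expire(ns) {
    this.namespaces.delete(ns);
  }
}

const cache = new NamespacedCache();
cache.set("ads_cache", "ad_1", "<div>Watch me!</div>");
cache.set("ads_cache", "ad_2", "<div>Buy me!</div>");
cache.set("users_cache", "user_1", "simplebits");
cache.expire("ads_cache");
console.log(cache.get("ads_cache", "ad_1"));      // undefined
console.log(cache.get("users_cache", "user_1")); // simplebits
```

With flat keys, expiring "all ads" would instead require tracking every ad_N key or adding a version prefix to each key.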