Using NoSQL with Yo' SQL
Supplementing a relational database application with MongoDB.

Presentation Transcript

  • Using NoSQL with Yo’ SQL
    Supplementing your app with a slice of MongoDB
    Rich Thornett, Dribbble
    Thursday, June 9, 2011
  • Dribbble
    What are you working on?
    Show and tell for creatives via screenshots
  • Your Father's Webapp
    Dribbble is a typical web application: Ruby on Rails + relational database
    We <3 PostgreSQL
    But for certain tasks ...
  • Alternative Values
    log | scale | optimize | aggregate | cache
    More flexible data structures
    Easier horizontal scaling
  • NoSQL
    No == Not Only (but sounds a bit stronger, no?)
      • No: Fixed table schemas
      • No: Joins
      • Yes: Scale horizontally
    Examples: Memcached, Redis, CouchDB, Cassandra, MongoDB ...
  • Exploring MongoDB
      • Persistent data store
      • Powerful query language (closest to an RDBMS)
      • Broad feature set
      • Great community and documentation
    Utility belt that fits us?
  • What is MongoDB?
    A document-oriented NoSQL database
    Collections & Documents vs. Tables & Rows
  • What's a document?
    Our old friend JavaScript

      {
        _id: ObjectId("4ddfe31db6bc16ab615e573d"),
        description: "This is a BSON document",
        embedded_doc: {
          description: "I belong to my parent document"
        },
        tags: ["can", "haz", "arrays"]
      }

    Documents are BSON (binary-encoded JSON)
  • Embedded Documents
    Avoid joins for "belongs to" associations

      {
        _id: ObjectId("4ddfe31db6bc16ab615e573d"),
        description: "This is a BSON document",
        embedded_doc: {
          description: "I belong to my parent document"
        },
        tags: ["can", "haz", "arrays"]
      }
  • Arrays
    Avoid joins for "tiny relations"

      {
        _id: ObjectId("4ddfe31db6bc16ab615e573d"),
        description: "This is a BSON document",
        embedded_doc: {
          description: "I belong to my parent document"
        },
        tags: ["can", "haz", "arrays"]
      }

    Relational cruft this replaces: thing, thing_taggings, tags
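
To make the "tiny relations" point concrete, here is a minimal sketch in plain JavaScript (no MongoDB required). The three row arrays and the `embedTags` helper are hypothetical names standing in for the `thing`, `thing_taggings`, and `tags` tables on the slide; the sketch only illustrates how the join collapses into an array field on the document.

```javascript
// Hypothetical rows for the three relational tables named on the slide.
const things = [{ id: 1, description: "This is a BSON document" }];
const tags = [
  { id: 10, name: "can" },
  { id: 11, name: "haz" },
  { id: 12, name: "arrays" },
];
const thingTaggings = [
  { thing_id: 1, tag_id: 10 },
  { thing_id: 1, tag_id: 11 },
  { thing_id: 1, tag_id: 12 },
];

// Resolve the join once and store the tag names on the document itself.
function embedTags(thing) {
  const names = thingTaggings
    .filter((tagging) => tagging.thing_id === thing.id)
    .map((tagging) => tags.find((tag) => tag.id === tagging.tag_id).name);
  return { ...thing, tags: names };
}

const thingDocument = embedTags(things[0]);
console.log(JSON.stringify(thingDocument));
```

Once the tags live on the document, reading them back needs no second or third lookup.
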
  • Googley
    “With MongoDB we can ... grow our data set horizontally on a cluster of commodity hardware and do distributed (read parallel execution of) queries/updates/inserts/deletes.”
    -- Markus Gattol, http://www.markus-gattol.name/ws/mongodb.html
  • Replica Sets
    Automate the storing of multiple copies of data
      • Read Scaling
      • Data Redundancy
      • Automated Failover
      • Maintenance
      • Disaster Recovery
  • Dude, who sharded?
    Relax, not you. Auto-sharding:
      • You: Specify a shard key for a collection
      • Mongo: Partitions the collection across machines
      • Application: Blissfully unaware (mostly :)
  • CoSQL
    [Diagram labels: WEBAPP, "MIND THE APP", RDBMS, MongoDB; around the app: Logging, Scaling, Analytics, Caching, Flexibility]
  • Ads
      • Orthogonal to primary app
      • Few joins
      • Integrity not critical
    Let's Mongo!
  • From the Console
    But there are drivers for all major languages

      // Create a text ad
      db.ads.insert({
        advertiser_id: 1,
        type: "text",
        url: "http://dribbbler-on-the-roof.com",
        copy: "Watch me!",
        runs: [{
          start: new Date(2011, 4, 7),
          end: new Date(2011, 4, 14)
        }],
        created_at: new Date()
      })
  • Querying
    Query by match

      db.ads.find({advertiser_id: 1})

    Paging active ads

      // Page 2 of text ads running this month
      // (JavaScript months are zero-based, so new Date(2011, 4, 10) is May 10)
      db.ads.find({
        type: "text",
        runs: {
          $elemMatch: {
            start: {$lte: new Date(2011, 4, 10)},
            end: {$gte: new Date(2011, 4, 10)}
          }
        }
      }).sort({created_at: -1}).skip(15).limit(15)
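
The `$elemMatch` in the paging query matters because both date bounds must hold for the same element of `runs`. A rough plain-JavaScript sketch of that matching rule follows; `sampleAds` and `isRunningOn` are hypothetical names, and the in-memory filter only imitates what the server does.

```javascript
// Hypothetical in-memory stand-in for the ads collection.
const sampleAds = [
  { copy: "Watch me!", type: "text",
    runs: [{ start: new Date(2011, 4, 7), end: new Date(2011, 4, 14) }] },
  { copy: "Too early", type: "text",
    runs: [{ start: new Date(2011, 3, 1), end: new Date(2011, 3, 8) }] },
  { copy: "Split", type: "text",
    runs: [{ start: new Date(2011, 4, 1), end: new Date(2011, 4, 5) },
           { start: new Date(2011, 4, 20), end: new Date(2011, 4, 25) }] },
];

// True when one single run satisfies both bounds, mirroring $elemMatch.
function isRunningOn(ad, day) {
  return ad.runs.some((run) => run.start <= day && run.end >= day);
}

const queryDay = new Date(2011, 4, 10); // May 10, 2011 (months are zero-based)
const activeCopy = sampleAds
  .filter((ad) => ad.type === "text" && isRunningOn(ad, queryDay))
  .map((ad) => ad.copy);
console.log(activeCopy);
```

The "Split" ad is the interesting case: one of its runs starts before the query day and a different run ends after it, so testing the two bounds independently across the array would let it through, while the per-element test does not.
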
  • Advanced Queries
    http://www.mongodb.org/display/DOCS/Advanced+Queries
    Operators: $gt, $lt, $gte, $lte, $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type, $elemMatch, $not, $where
    count | distinct | group
    Group does not work across shards; use map/reduce instead.
  • Polymorphism
    Easy inheritance. A document has whatever fields it needs.

      // Banner ad has additional fields
      db.ads.insert({
        advertiser_id: 1,
        type: "banner",
        url: "http://dribbble-me-this.com",
        copy: "Buy me!",
        runs: [],
        image_file_name: "ad.png",
        image_content_type: "image/png",
        image_file_size: 33333
      })

    Single | Multiple | Joined table inheritance all present difficulties
    No DB changes to create new subclasses in Mongo
  • Logging
      • Scale and query horizontally
      • Add fields on the fly
      • Writes: Fast, asynchronous, atomic
  • Volume Logging
      • Ad impressions
      • Screenshot views
      • Profile views
    Fast, asynchronous writes and sharding FTW!
  • Real-time Analytics
    What people and locations are trending this hour?

      db.trends.update(
        {date: "2011-04-10 13:00"}, // search criteria
        {
          $inc: { // increment
            "user.simplebits.likes_received": 1,
            "country.us.likes_received": 1,
            "city.boston.likes_received": 1
          }
        },
        true // upsert
      )

    upsert: Update document (if present) or insert it
    $inc: Increment field by amount (if present) or set to amount
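
As a rough illustration of what the upsert with `$inc` does to the hourly document, here is a plain-JavaScript sketch. The `trends` Map, `incrementPath`, and `upsertInc` are hypothetical helpers, not MongoDB APIs, and unlike the real server-side update this in-memory version makes no atomicity guarantee.

```javascript
// Hypothetical in-memory "collection": one counter document per hour, keyed by date.
const trends = new Map();

// Walk a dotted path such as "user.simplebits.likes_received", creating nested
// objects as needed, then add `amount` to the leaf (a missing leaf counts as 0).
function incrementPath(doc, path, amount) {
  const parts = path.split(".");
  const leaf = parts.pop();
  let node = doc;
  for (const part of parts) {
    if (typeof node[part] !== "object" || node[part] === null) node[part] = {};
    node = node[part];
  }
  node[leaf] = (node[leaf] || 0) + amount;
}

// Upsert: create the document for `date` if it is missing, then apply every increment.
function upsertInc(date, increments) {
  if (!trends.has(date)) trends.set(date, { date });
  const doc = trends.get(date);
  for (const [path, amount] of Object.entries(increments)) {
    incrementPath(doc, path, amount);
  }
  return doc;
}

const like = {
  "user.simplebits.likes_received": 1,
  "country.us.likes_received": 1,
  "city.boston.likes_received": 1,
};
upsertInc("2011-04-10 13:00", like);                 // first like this hour: inserts
const hourDoc = upsertInc("2011-04-10 13:00", like); // second like: increments
console.log(JSON.stringify(hourDoc));
```

The same call covers both the first event of the hour and every later one, which is what removes the separate find-then-create step.
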
  • Flex Benefits
      • Add/nest new fields to measure with ease
      • Atomic upsert with $inc
        Replaces two-step, transactional find-and-update/create
      • Live, cached aggregation
  • Scouting
  • Design a Designer

      db.users.insert({
        name: "Dan Cederholm",
        available: true,
        skills: ["html", "css", "illustration", "icon design"]
      })
  • Geospatial Indexing

      db.users.ensureIndex({location: "2d"})

      db.users.insert({
        name: "Dan Cederholm",
        // Salem longitude/latitude
        location: [-70.8972222, 42.5194444],
        available: true,
        skills: ["html", "css", "illustration", "icon design"]
      })
  • Search by Location

      boston = [-71.0602778, 42.3583333] // long/lat

    Within area

      // $maxDistance: Find users in Boston area (w/in 50 miles)
      db.users.find({location: {$near: boston, $maxDistance: 0.7234842}})

    Within area, matching criteria

      // Find users in the Boston area who:
      //   are available for work
      //   have expertise in HTML and icon design
      db.users.find({
        location: {$near: boston, $maxDistance: 0.7234842},
        available: true,
        skills: {$all: ["html", "icon design"]}
      })
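
The `$maxDistance` of 0.7234842 looks like 50 miles expressed in degrees, since a `2d` index on plain coordinate pairs measures distance in the units of the coordinates. The sketch below reproduces that arithmetic under the assumption of roughly 69.11 miles per degree; the constant and the `milesToDegrees` helper are illustrative, not taken from the slides.

```javascript
// Assumption: about 69.11 miles per degree. This is a flat approximation that
// ignores how a degree of longitude shrinks away from the equator.
const MILES_PER_DEGREE = 69.11;

// Convert a search radius in miles to the degree units used by the query.
function milesToDegrees(miles) {
  return miles / MILES_PER_DEGREE;
}

const bostonRadius = milesToDegrees(50);
console.log(bostonRadius.toFixed(4));
```

Because the conversion is only approximate, the resulting search area should be read as "about 50 miles" rather than an exact circle.
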
  • Search Power
    Flexible Documents + Rich Query Language + Geospatial Indexing
  • Stats
  • Unique Views
    a.k.a. visitors per day
    unique = one count per remote_ip address per day
  • Map/Reduce
    http://www.mongodb.org/display/DOCS/MapReduce
    Aggregate by key => GROUP BY in SQL
      • Collections: Input and output
      • Map: Returns 0..N key/value pairs per document
      • Reduce: Aggregates values per key
  • Strategy
    Two-pass map/reduce to calculate unique visitors
      • Pass 1: GROUP BY profile, visitor; COUNT visits per visitor per profile
      • Pass 2: GROUP BY profile; COUNT visitors
  • Profile View Data
    Visits on a given day

      // Profile 1
      {profile_id: 1, remote_ip: "127.0.0.1"}
      {profile_id: 1, remote_ip: "127.0.0.1"}
      {profile_id: 1, remote_ip: "127.0.0.2"}

      // Profile 2
      {profile_id: 2, remote_ip: "127.0.0.4"}
      {profile_id: 2, remote_ip: "127.0.0.4"}
  • Pass 1: Map Function
    Count visits per remote_ip per profile
    KEY = profile, remote_ip

      map = function() {
        var key = {
          profile_id: this.profile_id,
          remote_ip: this.remote_ip
        };
        emit(key, {count: 1});
      }
  • Reduce Function
    Counts (occurrences of key)

      reduce = function(key, values) {
        var count = 0;
        values.forEach(function(v) {
          count += v.count;
        });
        return {count: count};
      }
  • Pass 1: Run Map/Reduce
    Count visits per remote_ip per profile

      db.profile_views.mapReduce(map, reduce,
        {out: "profile_views_by_visitor"}
      )

      // Results: Visits per visitor per profile
      db.profile_views_by_visitor.find()
      { "_id": { "profile_id": 1, "remote_ip": "127.0.0.1" }, "value": { "count": 2 } }
      { "_id": { "profile_id": 1, "remote_ip": "127.0.0.2" }, "value": { "count": 1 } }
      { "_id": { "profile_id": 2, "remote_ip": "127.0.0.4" }, "value": { "count": 2 } }
  • Pass 2: Map/Reduce
    Count visitors per profile
    KEY = profile_id

      map = function() {
        emit(this._id.profile_id, {count: 1});
      }
  • Pass 2: Results
    Count visitors per profile

      // Same reduce function as before
      db.profile_views_by_visitor.mapReduce(map, reduce,
        {out: "profile_views_unique"}
      )

      // Results
      db.profile_views_unique.find()
      { "_id" : 1, "value" : { "count" : 2 } }
      { "_id" : 2, "value" : { "count" : 1 } }
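
The two passes can be traced end to end without a database. The following is a minimal in-memory sketch in plain JavaScript: `runMapReduce` is a hypothetical stand-in for `collection.mapReduce()`, and, unlike the shell, it hands `emit` to the map function as an argument instead of providing it as a global. The input rows are the visits from the "Profile View Data" slide.

```javascript
// Visits from the "Profile View Data" slide.
const profileViews = [
  { profile_id: 1, remote_ip: "127.0.0.1" },
  { profile_id: 1, remote_ip: "127.0.0.1" },
  { profile_id: 1, remote_ip: "127.0.0.2" },
  { profile_id: 2, remote_ip: "127.0.0.4" },
  { profile_id: 2, remote_ip: "127.0.0.4" },
];

// Hypothetical in-memory stand-in for collection.mapReduce().
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map(); // serialized key -> { key, values }
  const emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  };
  docs.forEach((doc) => mapFn.call(doc, emit)); // `this` is the document, as in the shell
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    // A key with a single emitted value is passed through without calling reduce.
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

// Shared reduce: sum the counts emitted for a key.
function sumCounts(key, values) {
  let count = 0;
  values.forEach((v) => { count += v.count; });
  return { count };
}

// Pass 1: one output document per (profile, visitor) pair.
const byVisitor = runMapReduce(profileViews, function (emit) {
  emit({ profile_id: this.profile_id, remote_ip: this.remote_ip }, { count: 1 });
}, sumCounts);

// Pass 2: count the pairs per profile, which yields unique visitors.
const uniqueVisitors = runMapReduce(byVisitor, function (emit) {
  emit(this._id.profile_id, { count: 1 });
}, sumCounts);

console.log(JSON.stringify(uniqueVisitors));
```

Pass 2 ignores the visit counts from pass 1 and emits a fresh `{count: 1}` per pair, which is why repeat visits from one address collapse into a single visitor.
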
  • Map/Deduce
    Can be clunkier than GROUP BY in SQL. But ...
    With large data sets, you get:
      • Horizontal scaling
      • Parallel processing across cluster
    JavaScript functions offer flexibility/power
  • Activity
    SELECT * FROM everything;
    Too many tables to JOIN or UNION
  • Relational solution
    Denormalized events table as activity log.

      Column                 | Type
      -----------------------+----------------------------
      id                     | integer
      event_type             | character varying(255)
      subject_type           | character varying(255)
      actor_type             | character varying(255)
      secondary_subject_type | character varying(255)
      subject_id             | integer
      actor_id               | integer
      secondary_subject_id   | integer
      recipient_id           | integer
      secondary_recipient_id | integer
      created_at             | timestamp without time zone

    We use James Golick’s timeline_fu gem for Rails: https://github.com/jamesgolick/timeline_fu
  • Direction
    Incoming Activity (recipients) | Generated Activity (actors)
  • Complications
    Multiple recipients
      • Subscribe to comments for a shot
      • Twitter-style @ mentions in comments
    Confusing names
      • Generic names make queries and view logic hard to follow
    N+1
      • Each event may require several lookups to get actor, subject, etc.
  • Events in Mongo
    Comment on a Screenshot containing an @ mention
    Screenshot owner and @user should be recipients.
    Mongo version of our timeline_events table

      {
        event_type: "created",
        subject_type: "Comment",
        actor_type: "User",
        subject_id: 999,
        actor_id: 1,
        recipients: [], // Multiple recipients
        secondary_recipient_id: 3,
        created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
      }
  • Mongo Event v.2
    Why is a user a recipient?

      {
        event_type: "created",
        subject_type: "Comment",
        actor_type: "User",
        subject_id: 999,
        actor_id: 1,
        // replaces the plain ID array, recipients: [1, 2]
        recipients: [
          {user_id: 2, reason: "screenshot owner"},
          {user_id: 3, reason: "mention"}
        ],
        created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
      }
  • Mongo Event v.3
    Meaningful names

      {
        event_type: "created",
        subject_type: "Comment",
        actor_type: "User",
        // replaces the generic subject_id: 999, actor_id: 1
        user_id: 1,
        comment_id: 999,
        screenshot_id: 555,
        recipients: [
          {user_id: 2, reason: "screenshot owner"},
          {user_id: 3, reason: "mention"}
        ],
        created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
      }
  • Mongo Event v.4
    Denormalize to eliminate N+1s in view

      {
        event_type: "created",
        subject_type: "Comment",
        user_id: 1,
        comment_id: 999,
        screenshot_id: 555,
        user: {id: 1, login: "simplebits", avatar: "dancederholm-peek.png"},
        comment: {id: 999, text: "Great shot!"},
        screenshot: {id: 555, title: "Shot heard around the world"},
        recipients: [
          {user_id: 2, reason: "screenshot owner"},
          {user_id: 3, reason: "mention"}
        ],
        created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
      }
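
To show why the v.4 shape removes the N+1 lookups, here is a small plain-JavaScript sketch that renders an activity line from the event document alone. `renderEvent` and `isRecipient` are hypothetical view helpers, the output wording is invented for illustration, and `screenshot_id` is taken to be 555 to agree with the embedded screenshot.

```javascript
// The v.4 event, with screenshot_id assumed to be 555 to match the embedded screenshot.
const commentEvent = {
  event_type: "created",
  subject_type: "Comment",
  user_id: 1,
  comment_id: 999,
  screenshot_id: 555,
  user: { id: 1, login: "simplebits", avatar: "dancederholm-peek.png" },
  comment: { id: 999, text: "Great shot!" },
  screenshot: { id: 555, title: "Shot heard around the world" },
  recipients: [
    { user_id: 2, reason: "screenshot owner" },
    { user_id: 3, reason: "mention" },
  ],
  created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)",
};

// Everything the view needs is already embedded: no user, comment, or screenshot lookups.
function renderEvent(e) {
  return `${e.user.login} commented on "${e.screenshot.title}": ${e.comment.text}`;
}

// Incoming activity for a user: the event lists that user among its recipients.
function isRecipient(e, userId) {
  return e.recipients.some((recipient) => recipient.user_id === userId);
}

console.log(renderEvent(commentEvent));
console.log(isRecipient(commentEvent, 3));
```

The trade-off is that the embedded copies (login, avatar, title) go stale if the originals change, which is the RDBMS benefit being given up.
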
  • Denormalizing?
    You're giving up RDBMS benefits to optimize.
    Optimize your optimizations.
    Document flexibility: Data structures can mirror the view
  • Caching
    http://www.mongodb.org/display/DOCS/Caching
    MongoDB uses memory-mapped files
      • Grabs free memory as needed; no configured cache size
      • Relies on OS to reclaim memory (LRU)
  • Replace Redis/Memcached?
    FREQUENTLY accessed items LIKELY in memory
    Good enough for you?
    One less moving part.
  • Cache Namespaces
    Memcached keys are flat: ad_1, ad_2, ad_3
    No simple way to expire all
    A collection can serve as an expirable namespace

      // Clear collection to expire
      db.ads_cache.remove({})
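
The namespace idea can be sketched with ordinary JavaScript Maps. Everything below is a hypothetical in-memory illustration (the key names, cached values, and `expireByPrefix` helper are made up); it only contrasts hunting for keys by prefix in a flat store with clearing one named group.

```javascript
// Flat store: every key shares one namespace.
const flatCache = new Map([
  ["ad_1", "cached ad 1"],
  ["ad_2", "cached ad 2"],
  ["user_7", "cached user 7"],
]);

// Expiring all ads in a flat store means finding every key with the prefix.
function expireByPrefix(cache, prefix) {
  for (const key of [...cache.keys()]) {
    if (key.startsWith(prefix)) cache.delete(key);
  }
}

// Namespaced store: one Map per "collection", so a whole group is dropped at once.
const namespacedCache = {
  ads_cache: new Map([[1, "cached ad 1"], [2, "cached ad 2"]]),
  users_cache: new Map([[7, "cached user 7"]]),
};

expireByPrefix(flatCache, "ad_");
namespacedCache.ads_cache.clear();
console.log(flatCache.size, namespacedCache.ads_cache.size, namespacedCache.users_cache.size);
```
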
  • Time to Mongo?
    Versatility?
    Data structure flexibility worth more than joins?
    Easier horizontal scaling?
    log | scale | optimize | aggregate | cache
    http://www.mongodb.org
  • Cheers!
    Rich Thornett, Dribbble
    http://dribbble.com
    @frogandcode