MongoSV Schema Workshop

Transcript

  • 1. Schema Design Workshop. Sridhar Nanjundeswaran, Software Engineer, 10Gen, sridhar@10gen.com, @snanjund (Wednesday, December 5, 12)
  • 2. Agenda • Part One - Basic Schema & Patterns • Part Two - Schema Design • Part Three - Sharding • Part Four - Replication
  • 3. Why is schema design different? • In RDBMS design you ask "what answers do I have?" • In MongoDB you ask "what questions will I have?"
  • 4. Goals • Learn Data Modeling with MongoDB • Labs to try to solve problems • Understand implications of Replication and Sharding. Please, ask many, many questions!
  • 5. Part One: Basic Schema & Patterns
  • 6. So why model data? http://bit.ly/SSs7QB
  • 7. Normalization • 1970 E.F. Codd introduces 1st Normal Form (1NF) • 1971 E.F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF) • 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF) • 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF) Goals: • Avoid anomalies when inserting, updating or deleting • Minimize redesign when extending the schema • Make the model informative to users • Avoid bias towards a particular style of query * source: Wikipedia
  • 8. So today’s example will use... http://bit.ly/RyIOvO
  • 9. Terminology (RDBMS → MongoDB): Table → Collection; Row(s) → JSON Document; Index → Index; Join → Embedding & Linking; Partition → Shard; Partition Key → Shard Key
  • 10. Schema Design: Relational Database
  • 11. Schema Design: MongoDB
  • 12. Schema Design: MongoDB (linking)
  • 13. Schema Design: MongoDB (embedding and linking)
  • 14. Basic schema: design documents that simply map to your application > post = { author: "Hergé", date: ISODate("2011-09-18T09:56:06.298Z"), text: "Destination Moon", tags: ["comic", "movie"] } > db.blogs.save(post)
  • 15. Find the document > db.blogs.find() { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Hergé", date: ISODate("2011-09-18T09:56:06.298Z"), text: "Destination Moon", tags: [ "comic", "movie" ] } Notes: • ID must be unique, but can be anything you’d like • MongoDB will generate a default ID if one is not supplied
  • 16. Add an index, find via index. Secondary index for "author" // 1 means ascending, -1 means descending > db.blogs.ensureIndex( { author: 1 } ) > db.blogs.find( { author: "Hergé" } ) { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), date: ISODate("2011-09-18T09:56:06.298Z"), author: "Hergé", ... }
  • 17. Examine the query plan > db.blogs.find( { author: "Hergé" } ).explain() { "cursor" : "BtreeCursor author_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 5, "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] } }
  • 19. Examine the query plan > db.blogs.find( { author: "Hergé" } ).explain() { "cursor" : "BtreeCursor author_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 5, "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] } } Annotations: "n" is the number of objects returned; "millis" is how long it took
  • 20. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ... // find posts with any tags > db.blogs.find( { tags: { $exists: true } } ) Regular expressions: // posts where author starts with h > db.blogs.find( { author: /^h/i } ) Counting: // number of posts written by Hergé > db.blogs.find( { author: "Hergé" } ).count()
  • 21. Extending the Schema http://bit.ly/PpjT1l
  • 22. Extending the Schema > new_comment = { author: "Kyle", date: new Date(), text: "great book" } > db.blogs.update( { text: "Destination Moon" }, { "$push": { comments: new_comment }, "$inc": { comments_count: 1 } } )
  • 23. Extending the Schema > new_comment = { author: "Kyle", date: new Date(), text: "great book" } > db.blogs.update( { text: "Destination Moon" }, { "$push": { comments: new_comment }, "$inc": { comments_count: 1 } } ) Annotations: "$push" adds an element to the array; "$inc" increments the counter
  • 24. Extending the Schema > db.blogs.find( { author: "Hergé" } ) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date : ISODate("2011-09-18T09:56:06.298Z"), text : "Destination Moon", tags : [ "comic", "movie" ], comments : [ { author : "Kyle", date : ISODate("2011-09-19T09:56:06.298Z"), text : "great book" } ], comments_count: 1 }
  • 25. Extending the Schema // create index on nested documents: > db.blogs.ensureIndex( { "comments.author": 1 } ) > db.blogs.find( { "comments.author": "Kyle" } ) // find last 5 posts: > db.blogs.find().sort( { date: -1 } ).limit(5) // most commented post: > db.blogs.find().sort( { comments_count: -1 } ).limit(1) When sorting, check if you need an index
  • 26. Common Patterns http://bit.ly/SNnt4z
  • 27. Inheritance http://bit.ly/T7MqUz
  • 28. Inheritance
  • 29. Single Table Inheritance - RDBMS select * from shapes;
        id  type    area  radius  length  width
        1   circle  3.14  1
        2   square  4             2
        3   rect    10            5       2
  • 30. Single Table Inheritance - MongoDB > db.shapes.find() { _id: "1", type: "c", area: 3.14, radius: 1 } { _id: "2", type: "s", area: 4, length: 2 } { _id: "3", type: "r", area: 10, length: 5, width: 2 } Annotation: missing values not stored!
  • 31. Single Table Inheritance - MongoDB > db.shapes.find() { _id: "1", type: "c", area: 3.14, radius: 1 } { _id: "2", type: "s", area: 4, length: 2 } { _id: "3", type: "r", area: 10, length: 5, width: 2 } // find shapes where radius > 0 > db.shapes.find( { radius: { $gt: 0 } } )
  • 32. Single Table Inheritance - MongoDB > db.shapes.find() { _id: "1", type: "c", area: 3.14, radius: 1 } { _id: "2", type: "s", area: 4, length: 2 } { _id: "3", type: "r", area: 10, length: 5, width: 2 } // find shapes where radius > 0 > db.shapes.find( { radius: { $gt: 0 } } ) // create index > db.shapes.ensureIndex( { radius: 1 }, { sparse: true } ) Annotation: index only values present!
  • 33. One to Many http://bit.ly/Oqbt8z
  • 34. One to Many One to Many relationships can specify • degree of association between objects • containment • life-cycle
  • 35. One to Many: Embedded Array • $slice operator to return subset of comments • some queries harder • e.g. find latest comments across all blogs blogs: { author : "Hergé", date : ISODate("2011-09-18T09:56:06.298Z"), comments : [ { author : "Kyle", date : ISODate("2011-09-19T09:56:06.298Z"), text : "great book" } ] } > db.blogs.find( { author: "Hergé" }, { comments: { $slice : 10 } } )
  • 36. One to Many: Normalized (2 collections) • most flexible • more queries blogs: { _id: 1000, author: "Hergé", date: ISODate("2011-09-18T09:56:06.298Z"), comments: [ { comment : 1 } ] } comments : { _id : 1, blog: 1000, author : "Kyle", date : ISODate("2011-09-19T09:56:06.298Z") } > blog = db.blogs.findOne( { text: "Destination Moon" } ); > db.comments.find( { blog: blog._id } ).limit(5);
  • 37. Many to Many http://bit.ly/QTzhBF
  • 38. Many - Many Example: • Blog can have many Tags • Tag can be used by many Blogs
  • 39. Many - Many // Each Tag lists the "_id" of the Blog tags: { _id: 20, name: "comic", // Unique blog_ids: [ 10, 11, 12 ] } { _id: 30, name: "movie", // Unique blog_ids: [ 10 ] }
  • 40. Many - Many // Each Tag lists the "_id" of the Blog tags: { _id: 20, name: "comic", // Unique blog_ids: [ 10, 11, 12 ] } { _id: 30, name: "movie", // Unique blog_ids: [ 10 ] } // Each Blog lists the "tag" of the Tags blogs: { _id: 10, name: "Destination Moon", tags: [ "comic", "movie" ] }
  • 41. Many - Many // Each Tag lists the "_id" of the Blog tags: { _id: 20, name: "comic", // Unique blog_ids: [ 10, 11, 12 ] } { _id: 30, name: "movie", // Unique blog_ids: [ 10 ] } // Each Blog lists the "tag" of the Tags blogs: { _id: 10, name: "Destination Moon", tags: [ "comic", "movie" ] } Annotation: links via unique key, in this case "tags", could be "_id"
  • 42. Many - Many // Each Tag lists the "_id" of the Blog tags: { _id: 20, name: "comic", // Unique blog_ids: [ 10, 11, 12 ] } { _id: 30, name: "movie", // Unique blog_ids: [ 10 ] } // Each Blog lists the "tag" of the Tags blogs: { _id: 10, name: "Destination Moon", tags: [ "comic", "movie" ] } // All Tags for a given Blog > db.tags.find( { blog_ids: 10 } )
  • 43. Use _id or not? Option 1, link by tag name: blogs: { _id: 10, name: "...", tags: [ "comic", "movie" ] } Pros: • Single query Cons: • Cascade any changes. Option 2, link by tag _id: blogs: { _id: 10, name: "...", tags: [ 10, 20 ] } Pros: • Single update Cons: • Second query required
  • 44. Alternative // Each Blog lists the _id of the Tag blogs: { _id: 10, name: "Destination Moon", tag_ids: [ 20, 30 ] } // Association not stored on the Tag tags: { _id: 20, name: "comic" }
  • 45. Alternative // Each Blog lists the _id of the Tag blogs: { _id: 10, name: "Destination Moon", tag_ids: [ 20, 30 ] } // Association not stored on the Tag tags: { _id: 20, name: "comic" } // All Blogs for a given Tag > db.blogs.find( { tag_ids: 20 } )
  • 46. Alternative // Each Blog lists the _id of the Tag blogs: { _id: 10, name: "Destination Moon", tag_ids: [ 20, 30 ] } // Association not stored on the Tag tags: { _id: 20, name: "comic" } // All Blogs for a given Tag > db.blogs.find( { tag_ids: 20 } ) // All Tags for a given Blog > blog = db.blogs.findOne( { _id: 10 } ) > db.tags.find( { _id: { $in : blog.tag_ids } } )
  • 47. Many - Many Intersection Attributes Example: • Blog can have many Tags • Tag can be used by many Blogs • When a Tag is used, record the usage date
  • 48. Many - Many Normalized // Each Blog lists the _id of the Tag blogs: { _id: 10, name: "...", tag_ids: [ 20, 30 ] } // Association not stored on the Tag tags: { _id: 20, name: "comic" } // Store the interaction and usage date usages: { blog_id: 10, // Blog _id tag_id : 20, // Tag _id usage: ISODate("2012-10-12...") } // Find the Tags for a Blog for (var c = db.usages.find( { blog_id: 10 } ); c.hasNext(); ) { u = c.next(); t = db.tags.findOne( { _id: u.tag_id } ); printjson( u.usage ); }
  • 49. Many - Many Intersection Attributes // Each Blog lists the Blog Usage Object blogs: { _id: 10, name: "Destination Moon", tags: [ { tag: "comic", usage: ISODate("2012-10-12...") }, { tag: "movie", usage: ISODate("2012-09-11...") } ] } // Find the Tags for a Blog > db.blogs.find( { _id: 10 }, { tags: 1 } ) Pros: • Usage object encapsulated where used Cons: • If updates allowed, changes will have to be cascaded
  • 50. Summary • Schema design is the single biggest performance factor • More choices than in an RDBMS • Embedding, index design, shard keys
  • 51. Part Two: Schema Design
  • 52. Lab #1 Design Schema for Twitter • Model each user's activity stream • Users • Name, email address, display name • Tweets • Text • Who • Timestamp
  • 53. Lab #1 - Solution A: Two Collections // users - one doc per user { _id: "alvin", email: "alvin@10gen.com", display: "jonnyeight" } // tweets - one doc per user per tweet { user: "bob", for: "alvin", tweet: "20111209-1231", text: "Best Tweet Ever!", ts: ISODate("2011-09-18T09:56:06.298Z") }
  • 54. Lab #1 - Solution B: Embedded Tweets // users - one doc per user with all tweets { _id: "alvin", email: "alvin@10gen.com", display: "jonnyeight", tweets: [ { user: "bob", tweet: "20111209-1231", text: "Best Tweet Ever!", ts: ISODate("2011-09-18T09:56:06.298Z") } ] }
  • 55. Embedding • Great for read performance • One seek to load entire object • One roundtrip to database • Writes can be slow if adding to objects all the time
  • 56. Linking or Embedding? Linking can make some queries easy // Find latest 50 tweets for "alvin" > db.tweets.find( { for: "alvin" } ) .sort( { ts: -1 } ) .limit(50) But what effect does this have on the systems?
  • 57. (Diagram) Collection 1, Index 1
  • 58. (Diagram) Collection 1, Index 1, Virtual Address Space 1. This is your virtual memory size (mapped)
  • 59. (Diagram) Collection 1, Index 1, Virtual Address Space 1, Physical RAM. This is your resident memory size
  • 60. (Diagram) Collection 1, Index 1, Virtual Address Space 1, Physical RAM, Disk
  • 61. (Diagram) Collection 1, Index 1, Virtual Address Space 1, Physical RAM, Disk. Physical RAM access: 100 ns; Disk access: 10,000,000 ns
  • 62. (Diagram) Collection 1, Index 1, Virtual Address Space 1, Physical RAM, Disk; reads marked 1, 2, 3 > db.tweets.find( { for: "alvin" } ) .sort( { ts: -1 } ) .limit(10) Linking = Many seeks + random reads
  • 63. (Diagram) Collection 1, Index 1, Virtual Address Space 1, Physical RAM, Disk; read marked 1 > db.tweets.find( { _id: "alvin" } ) Embedding = Large Sequential Read
  • 64. Lab #2 Alternative Schema • Display last 10 tweets from today • Efficiently use memory and disk seeks / IOPS
  • 65. Lab #2 - Solution: Buckets // tweets : one doc per user per day > db.tweets.findOne() { _id: "alvin-2011/12/09", email: "alvin@10gen.com", tweets: [ { user: "Bob", tweet: "20111209-1231", text: "Best Tweet Ever!" }, { author: "Joe", tweet: "20111210-9025", date: "May 27 2011", text: "Stuck in traffic (again)" } ] }
  • 66. Lab #2 - Solution: Last 10 Tweets > db.tweets.find( { _id: "alvin-2011/12/09" }, { tweets: { $slice : -10 } } ) .sort( { _id: -1 } ) .limit(1)
  • 67. Lab #2 - Solution: Adding a Tweet > tweet = { user: "Bob", tweet: "20111209-1231", text: "Best Tweet Ever!" } > db.tweets.update( { _id : "alvin-2011/12/09" }, { $push : { tweets : tweet } } );
  • 68. Lab #2 - Solution: Getting All Tweets > cursor = db.tweets.find( { _id : /^alvin/ } ).sort( { _id : -1 } ) > while ( cursor.hasNext() ) { doc = cursor.next(); for ( var i=0; i<doc.tweets.length; i++ ) printjson( doc.tweets[i] ) }
  • 69. Lab #2 - Solution: Deleting a Tweet > db.tweets.update( { _id: "alvin-2011/12/09" }, { $pull: { tweets: { tweet: "20111209-1231" } } } )
  • 70. (Diagram) Collection 1, Index 1, Virtual Address Space 1, Physical RAM, Disk; read marked 1 > db.tweets.find( { _id: "alvin-2011/12/09" }, { tweets: { $slice : -10 } } ) .sort( { _id: -1 } ) .limit(1) Bucket = 1 seek + 1 sequential read
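The bucket pattern depends on every writer deriving the same per-user, per-day `_id`. A minimal sketch of that derivation in plain JavaScript follows; `bucketId` and `addTweetArgs` are hypothetical helper names, and the use of UTC and of an upsert are assumptions that the slides do not state.

```javascript
// Hypothetical helper: build the per-user, per-day bucket _id used in the
// slides, e.g. "alvin-2011/12/09". UTC is an assumption; the slides do not
// say which time zone defines "today".
function bucketId(user, date) {
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `${user}-${yyyy}/${mm}/${dd}`;
}

// Hypothetical helper: assemble the arguments that would be passed to
// db.tweets.update(query, update, upsert, multi).
function addTweetArgs(user, date, tweet) {
  return {
    query: { _id: bucketId(user, date) },
    update: { $push: { tweets: tweet } },
    upsert: true, // assumption: create the day's bucket on the first tweet
    multi: false
  };
}

const args = addTweetArgs(
  "alvin",
  new Date(Date.UTC(2011, 11, 9, 12, 31)), // month 11 is December
  { user: "Bob", text: "Best Tweet Ever!" }
);
console.log(JSON.stringify(args.query)); // {"_id":"alvin-2011/12/09"}
```

Because `$push` appends to the end of the array, the newest tweets sit at the tail of each bucket.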
  • 71. Trees http://bit.ly/Oqc8Xs
  • 72. Trees: Hierarchical information
  • 73. Trees: Full Tree in Document { retweet: [ { who: "Kyle", text: "...", retweet: [ { who: "James", text: "...", retweet: [] } ] } ] } Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 16MB limit
  • 74. Array of Ancestors (tree diagram: A is the root; B and E are children of A; C and D are children of B; F is a child of E) // Store all Ancestors of a node { _id: "a" } { _id: "b", tree: [ "a" ], retweet: "a" } { _id: "c", tree: [ "a", "b" ], retweet: "b" } { _id: "d", tree: [ "a", "b" ], retweet: "b" } { _id: "e", tree: [ "a" ], retweet: "a" } { _id: "f", tree: [ "a", "e" ], retweet: "e" }
  • 75. Array of Ancestors (tree diagram: A is the root; B and E are children of A; C and D are children of B; F is a child of E) // Store all Ancestors of a node { _id: "a" } { _id: "b", tree: [ "a" ], retweet: "a" } { _id: "c", tree: [ "a", "b" ], retweet: "b" } { _id: "d", tree: [ "a", "b" ], retweet: "b" } { _id: "e", tree: [ "a" ], retweet: "a" } { _id: "f", tree: [ "a", "e" ], retweet: "e" } // find all direct retweets of "b" > db.tweets.find( { retweet: "b" } ) // find all retweets of "e" anywhere in tree > db.tweets.find( { tree: "e" } ) // find tweet history of f: > tweets = db.tweets.findOne( { _id: "f" } ).tree > db.tweets.find( { _id: { $in : tweets } } )
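The ancestor array has to be filled in when a retweet is created. One way to do that, sketched in plain JavaScript, is to copy the parent's `tree` and append the parent's `_id`. `makeRetweet` is a hypothetical helper, not something the slides define.

```javascript
// Hypothetical helper: derive the "tree" (array of ancestors) for a new
// retweet from its parent document, matching the document shape on the slide.
function makeRetweet(id, parent) {
  return {
    _id: id,
    tree: [...(parent.tree || []), parent._id], // parent's ancestors + parent
    retweet: parent._id                         // direct parent
  };
}

const a = { _id: "a" };          // the original tweet has no ancestors
const b = makeRetweet("b", a);
const c = makeRetweet("c", b);
console.log(JSON.stringify(c)); // {"_id":"c","tree":["a","b"],"retweet":"b"}
```

The cost is paid once at write time, which is what lets a query such as `{ tree: "e" }` find every descendant without walking the tree.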
  • 76. Trees as Paths (tree diagram with nodes A to F) Store hierarchy as a path expression • Separate each node by a delimiter, e.g. "/" • Use text search to find parts of a tree { retweets: [ { _id: "a", text: "initial tweet", path: "a" }, { _id: "b", text: "retweet with comment", path: "a/b" }, { _id: "c", text: "reply to retweet", path : "a/b/c" } ] } // Find the conversations "a" started > db.tweets.find( { path: /^a/i } )
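A bare prefix such as `/^a/i` also matches any path whose first node id merely starts with "a" (for example a node called "ab"). A hedged sketch of a stricter pattern, which anchors the prefix at the delimiter and escapes the node ids, is shown below; `escapeRegex` and `subtreeRegex` are hypothetical helpers and the "/" delimiter is taken from the slide.

```javascript
// Escape regex metacharacters so node ids are matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: match the node at `path` itself or any descendant,
// but not siblings whose id happens to share a prefix.
function subtreeRegex(path) {
  return new RegExp("^" + escapeRegex(path) + "(/|$)");
}

const re = subtreeRegex("a/b");
console.log(re.test("a/b"));   // true  (the node itself)
console.log(re.test("a/b/c")); // true  (a descendant)
console.log(re.test("a/bc"));  // false (a different node under "a")
```

The resulting expression could be used as the value of `path` in a `find` filter; since it is anchored with `^` and case sensitive, it remains a prefix match.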
  • 77. Queues & Workflows http://bit.ly/QeNsPX
  • 78. Lab #3 Following Requests • Users are allowed to "follow" another user • User sends a "follow" request • Follower approves or not • Requests are timed out after 7 days • The approval is an async process
  • 79. Lab #3 - Solution Queues & Workflows • Need to maintain order and state • Ensure that updates are atomic > db.approvals.insert( { inprogress: false, approved: false, priority: 1, text: "Hey Jim, want to follow you!" } ); // find highest priority approval and mark as in-progress job = db.approvals.findAndModify({ query: { inprogress: false }, sort: { priority: -1 }, update: { $set: { inprogress: true, started: new Date() } }, new: true })
  • 81. Lab #3 - Solution Queues & Workflows { inprogress: true, priority: 1, approved: false, started: ISODate("2011-09-18T09:56:06.298Z") ... } Annotations: "inprogress" was updated; "started" was added
  • 82. Lab #3 - Solution Queues & Workflows • Follower approves request // update approval after receiving approval > job = db.approvals.update( { _id: "1234" }, { $set: { approved: true } } ) • System times out request after 7 days > var limit = new Date(); limit.setDate(limit.getDate() - 7); > job = db.approvals.update( { inprogress: true, started: { $lt: limit } }, { $set: { approved: false } } )
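For the timeout step, a request is stale when its `started` date is earlier than a cutoff seven days in the past, so the comparison in this sketch is `$lt`. `timeoutArgs` is a hypothetical helper that only builds the filter and update documents; it subtracts milliseconds rather than calling `setDate`, which is a choice made here and not something the slides prescribe.

```javascript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical helper: build the query/update pair for expiring requests
// that have been in progress for more than seven days.
function timeoutArgs(now) {
  const cutoff = new Date(now.getTime() - WEEK_MS);
  return {
    // started earlier than the cutoff means older than seven days
    query: { inprogress: true, started: { $lt: cutoff } },
    update: { $set: { approved: false } }
  };
}

const t = timeoutArgs(new Date("2011-09-25T09:56:06.298Z"));
console.log(t.query.started.$lt.toISOString()); // 2011-09-18T09:56:06.298Z
```

Passing `now` in as a parameter keeps the cutoff calculation easy to check without touching a database.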
  • 83. Lab #4 Voting: Twitter meets Stack Overflow • Users can "vote" for a tweet • A user can "vote" once and only once • Need to display current votes
  • 84. Lab #4 - Solution: Votes // One document per voter per tweet > db.votes.insert( { tweet: "20111209-1231", voter: "alvin" } ); // Unique index guarantees the user can't vote twice > db.votes.ensureIndex( { tweet: 1, voter: 1 }, { unique: true } ); // Count will return the number of votes cast > db.votes.find( { tweet: "20111209-1231" } ).count()
  • 85. Count or Not? • Indexes in MongoDB do not store counts • The count has to be computed via an index scan // One summary document per tweet, no "voter" key > db.votes.update( { tweet: "20111209-1231", voter: { $exists: false } }, { "$inc": { count: 1 } }, true, false ); // Return the count for the no "voter" document > db.votes.find( { tweet: "20111209-1231", voter: { $exists: false } }, { count: 1, _id: 0 } )
  • 86. Lab #5 Time Series • Record votes by • Day, Hour, Minute • Show time series of votes cast
  • 87. Lab #5 - Solution A: Time Series // Time series buckets, hour and minute sub-docs { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z"), daily: 67, hourly: { 0: 23, 1: 14, 2: 19 ... 23: 72 }, minute: { 0: 0, 1: 4, 2: 6 ... 1439: 0 } }
  • 88. Lab #5 - Solution A: Time Series // Add one to the last minute before midnight > db.votes.update( { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z") }, { $inc: { daily: 1, "hourly.23": 1, "minute.1439": 1 } } ) What is the cost of updating the minute before midnight?
  • 89. BSON Storage • Sequence of key/value pairs • NOT a hash map • Optimized to scan quickly 0 1 2 3 ... 1439 • 1439 skips
  • 90. BSON Storage • Can skip sub-documents 0 1 ... 23 1 ... 59 60 ... 119 1380 ... 1439 • 23 skips (hours) + 59 skips (minutes) = 82 skips
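The skip counts quoted on these two slides can be reproduced with a little arithmetic. The sketch below is a simplified model (one skip per earlier key at each level), not a measurement of the storage engine; `flatSkips` and `nestedSkips` are hypothetical names.

```javascript
// Flat layout: minute: { 0: ..., 1: ..., ..., 1439: ... }
// Reaching key m means scanning past the m keys that precede it.
function flatSkips(minuteOfDay) {
  return minuteOfDay;
}

// Nested layout: minute: { 0: { 0: ..., 59: ... }, ..., 23: { 0: ..., 59: ... } }
// Whole hour sub-documents are skipped first, then minutes within the hour.
function nestedSkips(minuteOfDay) {
  const hour = Math.floor(minuteOfDay / 60);
  const minute = minuteOfDay % 60;
  return hour + minute;
}

console.log(flatSkips(1439));   // 1439
console.log(nestedSkips(1439)); // 82
```

The worst case drops from 1439 to 82 skips because each hour sub-document is stepped over as a single unit.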
  • 91. Lab #5 - Solution B: Time Series // Time series buckets, each hour a sub-document { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z"), daily: 67, minute: { 0: { 0: 0, 1: 7, ... 59: 2 }, ... 23: { 0: 15, ... 59: 6 } } } // Add one to the last minute before midnight > db.votes.update( { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z") }, { $inc: { daily: 1, "minute.23.59": 1 } } )
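With the hour-per-sub-document layout, the field path to increment can be computed from the vote's timestamp. The sketch below builds a single `$inc` document; a JavaScript object literal that repeats the `$inc` key keeps only the last one, so the counters are combined under one key. `voteIncrement` is a hypothetical helper and the use of UTC is an assumption.

```javascript
// Hypothetical helper: build the update document for one vote cast at `date`.
// UTC hours and minutes are an assumption; the slides do not name a time zone.
function voteIncrement(date) {
  const hour = date.getUTCHours();
  const minute = date.getUTCMinutes();
  return {
    $inc: {
      daily: 1,
      [`minute.${hour}.${minute}`]: 1 // e.g. "minute.23.59"
    }
  };
}

const inc = voteIncrement(new Date("2011-12-09T23:59:30Z"));
console.log(JSON.stringify(inc)); // {"$inc":{"daily":1,"minute.23.59":1}}
```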
  • 92. Lab #6 Inventory • User has a number of "votes" they can use
  • 93. Lab #6 - Solution: Inventory // Number of votes and who voted for { _id: "alvin", votes: 42, voted_for: [] } // Subtract a vote and add the voted for tweet // "20111209-1231" > db.user.update( { _id: "alvin", votes : { $gt : 0 }, voted_for: { $ne: "20111209-1231" } }, { "$push": { voted_for: "20111209-1231" }, "$inc": { votes: -1 } } )
  • 94. Lab #6 - Solution: Inventory // After vote decremented > db.user.findOne() { _id: "alvin", votes: 41, voted_for: ["20111209-1231"] } Annotation: "20111209-1231" was added to "voted_for"
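The inventory update works because the filter and the modification are evaluated together against one document: if the user has no votes left, or has already voted for the tweet, nothing matches and nothing changes. The sketch below is an in-memory stand-in for that behaviour in plain JavaScript, not a database call; `castVote` is a hypothetical name.

```javascript
// In-memory stand-in for the conditional update on the slide: the change is
// applied only when the document still satisfies the filter.
function castVote(user, tweetId) {
  const matches =
    user.votes > 0 &&                    // votes: { $gt: 0 }
    !user.voted_for.includes(tweetId);   // voted_for: { $ne: tweetId }
  if (!matches) return false;            // no document matched, no change
  user.voted_for.push(tweetId);          // $push
  user.votes -= 1;                       // $inc: { votes: -1 }
  return true;
}

const alvin = { _id: "alvin", votes: 42, voted_for: [] };
console.log(castVote(alvin, "20111209-1231")); // true
console.log(castVote(alvin, "20111209-1231")); // false (already voted)
console.log(alvin.votes);                      // 41
```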
  • 95. Lab #7 Statistic Buckets • Record referring web sites on customer sign up • Independent counter for each web site
  • 96. Lab #7 - Solution A: Statistic Buckets { _id: "alvin", referrers: [ { domain: "www.google.co.uk", count: 4 }, { domain: "www.yahoo.com", count: 1 }, ] }
  • 97. Lab #7 - Solution A: Statistic Buckets { _id: "alvin", referrers: [ { domain: "www.google.co.uk", count: 4 }, { domain: "www.yahoo.com", count: 1 }, ] } > db.referers.update( { "referrers.domain": "www.google.co.uk" }, { $inc: { "referrers.$.count": 1 } } )
  • 99. Lab #7 - Solution A: Statistic Buckets { _id: "alvin", referrers: [ { domain: "www.google.co.uk", count: 4 }, { domain: "www.yahoo.com", count: 1 }, ] } > db.referers.update( { "referrers.domain": "www.google.co.uk" }, { $inc: { "referrers.$.count": 1 } } ) { _id: "alvin", referrers: [ { domain: "www.google.co.uk", count: 5 }, { domain: "www.yahoo.com", count: 1 }, ] }
  • 101. Lab #7 - Solution A: Statistic Buckets > db.referers.update( { "referrers.domain": "www.bing.com" }, { $inc: { "referrers.$.count": 1 } }, false, true ) What happens if a new referring site is used?
  • 102. Lab #7 - Solution B: Statistic Buckets // Need to replace dots with underscores { _id: "alvin", referrers: { "www_google_co_uk": 4, "www_yahoo_com": 1 }, } // simple $inc will add www_bing_com if not present > db.referers.update( { _id: "alvin" }, { $inc: { "referrers.www_bing_com": 1 } }, true, false);
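Solution B needs the referring domain turned into a field name without dots, since a dot in an update path separates nested fields. A minimal sketch of that conversion follows; `referrerKey` and `referrerIncrement` are hypothetical helpers. Note that the mapping is lossy: two different inputs such as "a.b" and "a_b" would share a counter.

```javascript
// Hypothetical helper: turn a domain into a key that is safe to use inside
// a dotted update path, following the slide's "dots to underscores" rule.
function referrerKey(domain) {
  return domain.replace(/\./g, "_");
}

// Hypothetical helper: build the $inc document for one sign-up referral.
function referrerIncrement(domain) {
  return { $inc: { ["referrers." + referrerKey(domain)]: 1 } };
}

console.log(referrerKey("www.google.co.uk"));
// www_google_co_uk
console.log(JSON.stringify(referrerIncrement("www.bing.com")));
// {"$inc":{"referrers.www_bing_com":1}}
```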
  • 103. Part Three: Sharding
  • 104. What is Sharding • Ad-hoc partitioning • Consistent hashing (Amazon Dynamo) • Range based partitioning (Google BigTable, Yahoo! PNUTS, MongoDB)
  • 105. MongoDB Sharding • Automatic partitioning and management • Range based • Convert to sharded system with no downtime • Fully consistent • No code changes required
  • 106. Sharding - Range distribution sh.shardCollection("mydb.tweets", { _id: 1 }, false) shard01 shard02 shard03
  • 107. Sharding - Range distribution. shard01: a-i; shard02: j-r; shard03: s-z
  • 108. Sharding - Splits. shard01: a-i; shard02: ja-jz, k-r; shard03: s-z
  • 109. Sharding - Splits. shard01: a-i; shard02: ja-ji, ji-js, js-jw, jz-r; shard03: s-z
  • 110. Sharding - Auto Balancing shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw js-jw jz-r jz-r
  • 111. Sharding - Auto Balancing shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r
  • 112. Sharding for caching
  • 113. Sharding for caching 96 GB Mem 3:1 Data/Mem shard01 a-i 300 GB Data j-r s-z 300 GB
  • 114. Aggregate Horizontal Resources 96 GB Mem 96 GB Mem 96 GB Mem 1:1 Data/Mem 1:1 Data/Mem 1:1 Data/Mem shard01 shard02 shard03 a-i j-r s-z 300 GB Data j-r s-z 100 GB 100 GB 100 GB
  • 115. Sharding Features • Shard data with no downtime • Automatic balancing as data is written • Commands routed (switched) to correct node • Inserts - must have the Shard Key • Updates - can have the Shard Key • Queries • With Shard Key - routed to nodes • Without Shard Key - scatter gather • Indexed / Sorted Queries • With Shard Key - routed in order • Without Shard Key - distributed sort merge
  • 116. Lab #8 Sharding Twitter Pictures. User can upload pictures to Twitter feed { photo_id : ???? , data : <binary> } What should photo_id be? How will photo_id be sharded?
  • 117. Lab #8 Sharding Key { photo_id : ???? , data : <binary> } What’s the right key? • auto increment • MD5( data ) • month() + MD5( data )
  • 118. Right balanced access • Only have to keep a small portion of the index in RAM • Time Based • Right shard "hot" • ObjectId • Auto Increment
  • 119. Random access • Have to keep entire index in RAM • All shards "warm" • Hash
  • 120. Segmented access • Have to keep some index in RAM • Some shards "warm" • Month + Hash
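The "month() + MD5( data )" candidate can be sketched with Node's built-in `crypto` module. The exact key layout below ("YYYYMM-" followed by the hex digest) is an assumption made for illustration, as are the helper names `md5Hex` and `monthHashKey`; the slides only name the two ingredients.

```javascript
const crypto = require("crypto");

// MD5 digest of the photo bytes as 32 hex characters.
function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// Hypothetical key format: "YYYYMM-" + MD5(data). The month prefix keeps
// recent uploads in one segment of the index; the hash spreads them within it.
function monthHashKey(data, date) {
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${yyyy}${mm}-${md5Hex(data)}`;
}

const key = monthHashKey(
  Buffer.from("example photo bytes"),
  new Date(Date.UTC(2012, 11, 5)) // December 5, 2012
);
console.log(key.slice(0, 7)); // 201212-
console.log(key.length);      // 39 (7-character prefix + 32 hex digits)
```

Keys from the same month sort next to each other, which is what makes the access pattern "segmented" and not fully random.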
  • 121. Lab #9 Single Identities // Shard by _id ids: { _id : "alvin", email: "alvin@10gen.com", addresses: [ { state : "CA", country: "USA" }, { country: "UK" } ] } How would the following queries be executed? > db.ids.find( { _id: "alvin"} ) > db.ids.find( { email: "alvin@10gen.com" } )Wednesday, December 5, 12
  • 122. Sharding - Routed Query find( { _id: "alvin" } ) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r
  • 123. Sharding - Routed Query find( { _id: "alvin" } ) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r
  • 124. Sharding - Scatter Gather find( { email: "alvin@10gen.com" } ) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r
  • 125. Sharding - Scatter Gather find( { email: "alvin@10gen.com" } ) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r
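The difference between the routed query and the scatter gather query in Lab #9 can be sketched as a lookup against a chunk map. This is a toy model in plain JavaScript, not how the router is implemented: the chunk ranges are simplified to a-i, j-r, s-z (rather than the balanced chunks in the diagram), and `targetShards` is a hypothetical helper that only handles equality queries.

```javascript
// Simplified chunk map, assumed for illustration: [min, max) ranges on _id.
const chunks = [
  { min: "a", max: "j", shard: "shard01" },
  { min: "j", max: "s", shard: "shard02" },
  { min: "s", max: "{", shard: "shard03" }, // "{" sorts just after "z"
];
const allShards = [...new Set(chunks.map((c) => c.shard))];

// Returns the shards that would have to be contacted for an equality query.
function targetShards(query, shardKey) {
  if (!(shardKey in query)) {
    return allShards; // no shard key in the query: scatter gather
  }
  const value = query[shardKey];
  const chunk = chunks.find((c) => value >= c.min && value < c.max);
  return chunk ? [chunk.shard] : []; // shard key present: routed to one shard
}

console.log(targetShards({ _id: "alvin" }, "_id"));
console.log(targetShards({ email: "alvin@10gen.com" }, "_id"));
```

The first call names a single shard because the query carries the shard key; the second has to ask every shard because email says nothing about where the document lives.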
  • 126. Lab #9 Multiple Identities User can have multiple identities • twitter name • email address • facebook name • etc. What is the best sharding key & schema design?
  • 127. Lab #9 - Solution A Multiple Identities // Shard by _id { _id: "alvin", email: "alvin@10gen.com", fb: "alvin.richards", // facebook li: "alvin.j.richards", // linkedin tweets: [ ... ] } Lookup by _id hits 1 node Lookup by email, li or fb is scatter gather Cannot create a unique index on email, li or fb
  • 128. Lab #9 - Solution B Multiple Identities identities { _id: { _id: "alvin"}, info: "1200-42"} { _id: { em: "alvin@10gen.com"}, info: "1200-42"} { _id: { li: "alvin.j.richards"}, info: "1200-42"} tweets { _id: "1200-42", tweets: [ ... ] } • Shard identities on { _id: 1 } • Can create unique index on _id • Shard tweets on { _id: 1 }
  • 129. Sharding - Multiple Identities [diagram: shard01, shard02, shard03; ids collection chunks em: a-q, em: r-z, _id: a-z, li: a-c, li: d-r, li: s-z; tweets collection chunks _id: "Min"-"1100", _id: "1100"-"1200", _id: "1200"-"Max"]
  • 130. Sharding - Multiple Identities ids.find({ _id: { em: "alvin@10gen.com" } }) [diagram: shard01, shard02, shard03; ids collection chunks em: a-q, em: r-z, _id: a-z, li: a-c, li: d-r, li: s-z; tweets collection chunks _id: "Min"-"1100", _id: "1100"-"1200", _id: "1200"-"Max"]
  • 131. Sharding - Multiple Identities ids.find({ _id: { em: "alvin@10gen.com" } }) tweets.find({ _id: "1200-42" }) [diagram: shard01, shard02, shard03; ids collection chunks em: a-q, em: r-z, _id: a-z, li: a-c, li: d-r, li: s-z; tweets collection chunks _id: "Min"-"1100", _id: "1100"-"1200", _id: "1200"-"Max"]
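The two-step lookup in Solution B can be sketched with in-memory maps standing in for the two collections. This is an illustration of the schema idea only, not driver code: `addIdentity`, `findTweetsByIdentity` and the JSON-string keys are hypothetical stand-ins for the unique _id index and the two routed queries on the slides.

```javascript
// In-memory stand-ins for the two collections (not a MongoDB driver).
const identities = new Map();
const tweets = new Map();

// The whole { type: value } document acts as the key, mirroring the unique
// _id on the slide; serialising it gives a usable Map key.
const keyOf = (id) => JSON.stringify(id);

function addIdentity(id, info) {
  if (identities.has(keyOf(id))) {
    throw new Error("duplicate identity: " + keyOf(id)); // uniqueness on _id
  }
  identities.set(keyOf(id), { _id: id, info: info });
}

addIdentity({ _id: "alvin" }, "1200-42");
addIdentity({ em: "alvin@10gen.com" }, "1200-42");
addIdentity({ li: "alvin.j.richards" }, "1200-42");
tweets.set("1200-42", { _id: "1200-42", tweets: [] });

// Step 1: look up the identity by its _id. Step 2: follow info to tweets.
function findTweetsByIdentity(id) {
  const identity = identities.get(keyOf(id));
  return identity ? tweets.get(identity.info) : undefined;
}

console.log(findTweetsByIdentity({ em: "alvin@10gen.com" }));
```

Both steps are lookups on a shard key, so each can be routed to a single shard, at the cost of two round trips instead of one.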
  • 132. Part Four Replication
  • 133. Types of outage • Planned • Hardware upgrade • O/S or file-system tuning • Relocation of data to new file-system / storage • Software upgrade • Unplanned • Hardware failure • Data center failure • Region outage • Human error • Application corruption
  • 134. Replica Sets • Data Protection • Multiple copies of the data • Spread across Data Centers, AZs • High Availability • Automated Failover • Automated Recovery
  • 135. Replica Sets [diagram: App, Primary, two Secondaries; labels Write, Read, Asynchronous Replication]
  • 136. Replica Sets App Write Primary Read Secondary Read Secondary Read
  • 137. Replica Sets App Primary Write Primary Automatic Election of new Primary Read Secondary Read
  • 138. Replica Sets [diagram: App, Recovering, Primary, Secondary; labels Write, Read] New primary serves data
  • 139. Replica Sets App Secondary Read Write Primary Read Secondary Read
  • 140. Elections During an election • Most up to date • Highest priority • Less than 10s behind failed Primary
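One possible reading of the election criteria above, as a toy JavaScript filter. This is not the actual election protocol: the member records, the `optimeSecs` field and the order in which the criteria are applied (priority first, then freshness) are all assumptions made for illustration.

```javascript
// Hypothetical member records; optimeSecs is the time of the last applied
// operation, in seconds. The ordering of the criteria is an assumption.
function pickCandidate(members, failedPrimaryOptimeSecs) {
  const eligible = members.filter(
    (m) =>
      m.priority > 0 && // priority 0 members never become primary
      failedPrimaryOptimeSecs - m.optimeSecs < 10 // less than 10s behind
  );
  // Highest priority first; ties go to the most up-to-date member.
  eligible.sort(
    (a, b) => b.priority - a.priority || b.optimeSecs - a.optimeSecs
  );
  return eligible.length > 0 ? eligible[0].name : null;
}

const members = [
  { name: "a", priority: 1, optimeSecs: 95 },
  { name: "b", priority: 2, optimeSecs: 85 }, // 15s behind: not eligible
  { name: "c", priority: 1, optimeSecs: 99 },
];
console.log(pickCandidate(members, 100));
```

In this example member "b" has the highest priority but is too far behind the failed primary, so the choice falls to the fresher of the two remaining members.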
  • 141. Types of Durability with MongoDB • Fire and forget • Wait for error • Wait for fsync • Wait for journal sync • Wait for replication
  • 142. Network Ack - Old Default Driver Primary write apply in memory
  • 143. Get last error - New default Driver Primary write getLastError apply in memory
  • 144. Wait for Journal Sync Driver Primary write getLastError apply in memory j:true Write to journal
  • 145. Wait for replication Driver Primary Secondary write getLastError apply in memory w:2 replicate
  • 146. Tunable Data Durability Memory Journal Secondary Other Data Center RDBMS network async ACK w=1 w=1 j=true sync w="majority" w=n w="myTag" Less More
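The durability levels on the preceding slides map onto options of the getLastError command (`w`, `j`, `fsync`). The sketch below only builds the command documents; the level names and the function `getLastErrorCommand` are made up for illustration, and nothing here talks to a server.

```javascript
// Builds the getLastError command document for a named durability level.
// The level names are hypothetical labels for the slide titles.
function getLastErrorCommand(level) {
  switch (level) {
    case "fireAndForget":
      return null; // no getLastError call is made at all
    case "waitForError":
      return { getLastError: 1 };
    case "waitForFsync":
      return { getLastError: 1, fsync: true };
    case "waitForJournal":
      return { getLastError: 1, j: true };
    case "waitForReplication":
      return { getLastError: 1, w: 2 };
    case "waitForMajority":
      return { getLastError: 1, w: "majority" };
    default:
      throw new Error("unknown durability level: " + level);
  }
}

console.log(getLastErrorCommand("waitForJournal"));
console.log(getLastErrorCommand("waitForReplication"));
```

In a mongo shell session from the period of this deck, a document like these could be passed to `db.runCommand(...)` right after a write. Later MongoDB versions express the same choices as a write concern on the write itself.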
  • 147. Eventual Consistency Using Replicas for Reads Read preference • primary (only) • primaryPreferred • secondary (only) • secondaryPreferred • nearest
  • 148. Immediate Consistency Thread #1 Primary Insert v1 Read ✔ Update v2 Read ✔
  • 149. Eventual Consistency Thread #1 Primary Secondary Thread #2 Insert v1 v1 does not exist Read ✔ ✖ v1 reads v1 Update v2 ✔ Read ✔ ✖ reads v1 v2 ✔ reads v2
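The sequence in the eventual consistency diagram can be replayed with a toy model of one primary and one secondary. This is a sketch of the idea only: `ToyReplicaSet` is a made-up class, and an explicit `replicate()` call stands in for asynchronous replication catching up.

```javascript
// Toy model: the primary applies writes immediately and queues them for the
// secondary. Not a real replica set; replication is triggered by hand.
class ToyReplicaSet {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.pending = []; // writes the secondary has not applied yet
  }

  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }

  read(key, preference = "primary") {
    const node = preference === "secondary" ? this.secondary : this.primary;
    return node.get(key);
  }

  replicate() {
    for (const [key, value] of this.pending) {
      this.secondary.set(key, value);
    }
    this.pending = [];
  }
}

const rs = new ToyReplicaSet();
rs.write("doc", "v1");
console.log(rs.read("doc", "primary"), rs.read("doc", "secondary"));
rs.replicate();
rs.write("doc", "v2");
console.log(rs.read("doc", "primary"), rs.read("doc", "secondary"));
```

A thread reading from the primary always sees its own latest write, while a thread reading from the secondary may see nothing or an older version until replication catches up.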
  • 150. Lab #10 Replication Primary, Secondary or both? • Show the latest "votes" for a tweet and/or user • Changing your profile picture • Showing your thumbnail with a tweet
  • 151. Summary • Schema design is different in MongoDB • Basic data design principles stay the same • Focus on how the application manipulates data • Rapidly evolve schema to meet your requirements • Consider sharding early • Understand the impact of eventual consistency
  • 152. download at mongodb.org conferences, appearances, and meetups http://www.10gen.com/events Facebook | Twitter | LinkedIn http://bit.ly/mongo> @mongodb http://linkd.in/joinmongo
