Introduction to Schema design using MongoDB

  • Explain why.
  • 3rd Normal Form: normal forms measure a table's degree of vulnerability to logical inconsistencies. The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies.
  • The path to scaling an RDBMS tends towards denormalization.
  • No joins, for scalability: joins across shards in SQL are highly inefficient and difficult to perform. MongoDB is geared for easy scaling; going from a single node to a distributed cluster is easy. Little or no application code changes are needed to scale from a single node to a sharded cluster.
  • Questions about database features inform our schema design. Access patterns are less of an issue for normalized databases. MongoDB document models can be rich and flexible.
  • To review schema design we'll use a simple blog example.
  • Notice "Hergé": UTF-8 support is native.
  • Can create indexes for arrays and objects. In the relational world you'd have to do joins. Objects are modelled directly in MongoDB.
  • Rich query language. Powerful: can do range queries with $lt and $gt. Updates can modify parts of documents.
  • Upserts; update operators such as $push and $inc.
  • Dot notation allows easy access to embedded documents and arrays. It can also address array positions, e.g. comments.0.author.
  • Range queries still use indexes.
  • Full collection scan. scanAndOrder: the results are re-sorted in memory.
  • If a document is always presented as a whole, a single document gives performance benefits. A single document is not a panacea, as we'll see.
  • As in nature, common patterns emerge when modeling data.
  • Leaves nulls in the table. Not intuitive.
  • Single table inheritance is clean and intuitive in MongoDB.
  • One author for one blog entry, or many authors for one blog entry. On deleting the blog, either keep the author(s) or delete the author(s) too (a cascading delete).
  • Also a one-to-many pattern.
  • Update: sets in_progress and adds started.
  • Limits on the number of namespaces.
  • Schema is specific to the application and its data usage. Think about the future: how the data will change and how you are going to query it.

Transcript

  • 1. Schema Design
      Christian Kvalheim - christkv@10gen.com
  • 3. Topics
      Introduction
        • Working with documents
        • Evolving a schema
        • Queries and indexes
        • Rich Documents
      Common patterns
        • Single table inheritance
        • One-to-Many & Many-to-Many
        • Trees
        • Queues
  • 4. Ways to model data: http://www.flickr.com/photos/42304632@N00/493639870/
  • 5. Relational
  • 6. Rich Document
  • 7. Terminology
      RDBMS     MongoDB
      Table     Collection
      Row(s)    JSON Document
      Index     Index
      Join      Embedding & Linking
  • 8. Schema-design criteria
      How can we manipulate this data?
        • Dynamic Queries
        • Secondary Indexes
        • Atomic Updates
        • Map Reduce
        • Aggregation (coming soon)
      Access Patterns?
        • Read / Write Ratio
        • Types of updates
        • Types of queries
        • Data life-cycle
      Considerations
        • No Joins
        • Document writes are atomic
  • 9. Destination Moon
  • 10. A simple start
      post = {author: "Hergé",
              date: new Date(),
              text: "Destination Moon",
              tags: ["comic", "adventure"]}
      > db.blog.save(post)
      Map the documents to your application.
  • 11. Find the document
      > db.blog.find()
        { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
          author: "Hergé",
          date: ISODate("2012-01-23T14:01:00.117Z"),
          text: "Destination Moon",
          tags: [ "comic", "adventure" ] }
      Note:
        • _id must be unique, but can be anything you'd like
        • A BSON ObjectId is generated by default if one is not supplied
  • 12. Add an index, find via index
      > db.blog.ensureIndex({author: 1})
      > db.blog.find({author: "Hergé"})
        { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
          author: "Hergé",
          date: ISODate("2012-01-23T14:01:00.117Z"),
          ... }
      Secondary index on "author"
  • 13. Examine the query plan
      > db.blog.find({"author": "Hergé"}).explain()
      {
        "cursor" : "BtreeCursor author_1",
        "nscanned" : 1,
        "nscannedObjects" : 1,
        "n" : 1,
        "millis" : 0,
        "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] }
      }
  • 14. Multi-key indexes
      // Build an index on the tags array
      > db.blog.ensureIndex({tags: 1})
      // find posts with a specific tag
      // (This will use an index!)
      > db.blog.find({tags: "comic"})
        { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
          author: "Hergé",
          date: ISODate("2012-01-23T14:01:00.117Z"),
          ... }
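One way to picture the multi-key index from slide 14 is that every element of the tags array gets its own index entry pointing back at the document. The sketch below models that idea in plain JavaScript with an in-memory Map. The helper names `buildMultiKeyIndex` and `findByTag` and the second sample post are assumptions made for illustration; a real MongoDB index is a B-tree, not a Map.

```javascript
// In-memory model of a multi-key index (illustrative only; not MongoDB code).
// The second post is made-up sample data.
const posts = [
  { _id: 1, author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] },
  { _id: 2, author: "Hergé", text: "Explorers on the Moon", tags: ["comic"] },
];

// Hypothetical helper: create one index entry per array element.
function buildMultiKeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field] || []) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

// Hypothetical helper: look up matching _ids without scanning every document.
function findByTag(index, tag) {
  return index.get(tag) || [];
}

const tagIndex = buildMultiKeyIndex(posts, "tags");
console.log(findByTag(tagIndex, "comic"));     // [ 1, 2 ]
console.log(findByTag(tagIndex, "adventure")); // [ 1 ]
```

A document with many tags produces many index entries, which is one reason to keep indexed arrays reasonably small.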
  • 15. Query operators
      Conditional operators:
        $ne, $in, $nin, $mod, $all, $size, $exists, $type, ..
        $lt, $lte, $gt, $gte
      Update operators:
        $set, $inc, $push, $pop, $pull, $pushAll, $pullAll
  • 16. Extending the schema http://nysi.org.uk/kids_stuff/rocket/rocket.htm
  • 17. Extending the Schema
      new_comment = {author: "Chris",
                     date: new Date(),
                     text: "great book",
                     votes: 5}
      > db.blog.update(
          {text: "Destination Moon"},
          {"$push": {comments: new_comment},
           "$inc": {comments_count: 1}
          })
  • 18. Extending the Schema
      { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
        author : "Hergé",
        date: ISODate("2012-01-23T14:01:00.117Z"),
        text : "Destination Moon",
        tags : [ "comic", "adventure" ],
        comments : [{
          author : "Chris",
          date : ISODate("2012-01-23T14:31:53.848Z"),
          text : "great book",
          votes : 5
        }],
        comments_count: 1
      }
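The effect of the update on slide 17 can be sketched on a plain object. This is a minimal in-memory model, assuming a hypothetical helper `applyUpdate` that understands only `$push` and `$inc` on top-level fields; it is not the server's implementation.

```javascript
// Minimal in-memory model of the $push and $inc update operators.
// applyUpdate is a hypothetical helper: two operators, top-level fields only.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = []; // $push creates a missing array
    if (!Array.isArray(doc[field])) {
      throw new Error(`cannot $push to non-array field "${field}"`);
    }
    doc[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // $inc treats a missing field as 0
  }
  return doc;
}

const post = { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Chris", date: new Date(), text: "great book", votes: 5 };

applyUpdate(post, { $push: { comments: newComment }, $inc: { comments_count: 1 } });
console.log(post.comments.length, post.comments_count); // 1 1
```

Neither `comments` nor `comments_count` existed before the update, which is the point of the slide: the schema grows without a migration step.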
  • 20. The dot operator
      // create index on nested documents:
      > db.blog.ensureIndex({"comments.author": 1})
      > db.blog.find({"comments.author": "Chris"})
        { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
          author: "Hergé",
          date: ISODate("2012-01-23T14:01:00.117Z"),
          ... }
  • 21. The dot operator
      // create index on comment votes:
      > db.blog.ensureIndex({"comments.votes": 1})
      // find all posts with any comments with
      // more than 50 votes
      > db.blog.find({"comments.votes": {$gt: 50}})
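The query on slide 21 matches a post when any one of its embedded comments has more than 50 votes. A rough in-memory equivalent is sketched below; `anyElementGreaterThan` is a hypothetical helper and the sample vote counts are made up.

```javascript
// Hypothetical helper: true if any element of an embedded array
// has subField greater than threshold (mirrors {"comments.votes": {$gt: 50}}).
function anyElementGreaterThan(doc, arrayField, subField, threshold) {
  const items = Array.isArray(doc[arrayField]) ? doc[arrayField] : [];
  return items.some((item) => item[subField] > threshold);
}

// Made-up sample data.
const posts = [
  { _id: 1, comments: [{ author: "Chris", votes: 5 }, { author: "Fred", votes: 60 }] },
  { _id: 2, comments: [{ author: "Chris", votes: 5 }] },
  { _id: 3 }, // a post with no comments field at all
];

const popular = posts.filter((p) => anyElementGreaterThan(p, "comments", "votes", 50));
console.log(popular.map((p) => p._id)); // [ 1 ]
```

The whole post is returned, not just the matching comment, so the application still has to pick out the comments it cares about.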
  • 22. The dot operator
      // find last 5 posts:
      > db.blog.find().sort({"date": -1}).limit(5)
      // find the top 10 commented posts:
      > db.blog.find().sort({"comments_count": -1}).limit(10)
      When sorting, check if you need an index...
  • 23. Watch for full table scans
      {
        "cursor" : "BasicCursor",
        "nscanned" : 250003,
        "nscannedObjects" : 250003,
        "n" : 10,
        "scanAndOrder" : true,
        "millis" : 335,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : { }
      }
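The warning signs in the plan on slide 23 can be checked mechanically. The sketch below assumes the legacy explain() fields shown on the slides (newer servers report a different explain format); `describePlan` is a hypothetical helper and the 100:1 scanned-to-returned ratio is an arbitrary threshold chosen for illustration.

```javascript
// Hypothetical helper for the legacy explain() output shown on the slides.
function describePlan(plan) {
  const warnings = [];
  if (plan.cursor === "BasicCursor") {
    warnings.push("full collection scan (no index used)");
  }
  if (plan.scanAndOrder) {
    warnings.push("results sorted in memory (scanAndOrder)");
  }
  // Arbitrary illustrative threshold: far more entries scanned than returned.
  if (plan.n > 0 && plan.nscanned / plan.n > 100) {
    warnings.push(`scanned ${plan.nscanned} entries to return ${plan.n}`);
  }
  return warnings;
}

// Values copied from slides 23 and 13 respectively.
const slidePlan = {
  cursor: "BasicCursor", nscanned: 250003, nscannedObjects: 250003,
  n: 10, scanAndOrder: true, millis: 335, indexBounds: {},
};
const indexedPlan = {
  cursor: "BtreeCursor author_1", nscanned: 1, nscannedObjects: 1, n: 1, millis: 0,
};

console.log(describePlan(slidePlan));   // three warnings
console.log(describePlan(indexedPlan)); // []
```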
  • 25. Rich Documents http://www.flickr.com/photos/diorama_sky/2975796332
  • 26. Rich Documents
      • Intuitive
      • Developer friendly
      • Encapsulates whole objects
      • Performant
      • They are scalable
  • 27. Common Patterns http://www.flickr.com/photos/colinwarren/158628063
  • 28. Inheritance http://www.flickr.com/photos/dysonstarr/5098228295
  • 29. Inheritance
  • 30. Single Table Inheritance - RDBMS
      Shapes table
      id  type    area  radius  d  length  width
      1   circle  3.14  1
      2   square  4             2
      3   rect    10               5       2
  • 33. Single Table Inheritance - MongoDB
      > db.shapes.find()
        { _id: "1", type: "circle", area: 3.14, radius: 1}
        { _id: "2", type: "square", area: 4, d: 2}
        { _id: "3", type: "rect", area: 10, length: 5, width: 2}
      // find shapes where radius > 0
      > db.shapes.find({radius: {$gt: 0}})
      // create sparse index
      > db.shapes.ensureIndex({radius: 1}, {sparse: true})
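A sparse index holds entries only for documents that actually contain the indexed field, which suits the shapes collection where only circles have a radius. The in-memory sketch below illustrates that; `buildSparseIndex` is a hypothetical helper and a sorted array stands in for the real index structure.

```javascript
// Documents copied from the shapes example.
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Hypothetical helper: a sparse index skips documents that lack the field.
function buildSparseIndex(docs, field) {
  return docs
    .filter((doc) => field in doc)
    .map((doc) => ({ key: doc[field], _id: doc._id }))
    .sort((a, b) => a.key - b.key);
}

const radiusIndex = buildSparseIndex(shapes, "radius");
console.log(radiusIndex.length); // 1 (only the circle has a radius)

// Rough equivalent of find({radius: {$gt: 0}}), answered from index entries alone.
const ids = radiusIndex.filter((entry) => entry.key > 0).map((entry) => entry._id);
console.log(ids); // [ '1' ]
```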
  • 34. One to Many http://www.flickr.com/photos/j-fish/6502708899/
  • 35. One to Many
  • 36. One to Many
      Embedded Array / Array Keys
        • $slice operator to return a subset of the array
        • some queries are hard, e.g. find the latest comments across all documents
  • 37. One to Many
      Embedded Array / Array Keys
      { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
        author : "Hergé",
        date: ISODate("2012-01-23T14:01:00.117Z"),
        text : "Destination Moon",
        tags : [ "comic", "adventure" ],
        comments : [{
          author : "Chris",
          date : ISODate("2012-01-23T14:31:53.848Z"),
          text : "great book",
          votes : 5
        }],
        comments_count: 1
      }
  • 40. One to Many
      Normalized (2 collections)
        • Most flexible
        • More queries
  • 41. One to Many - Normalized
      // Posts collection
      { _id : 1000,
        author : "Hergé",
        date: ISODate("2012-01-23T14:01:00.117Z"),
        text : "Destination Moon",
      }
      // Comments collection
      { _id : 1,
        blog : 1000,
        author : "Chris",
        date : ISODate("2012-01-23T14:31:53.848Z"),
        ...
      }
      // findOne returns the document itself, so blog._id is available
      > blog = db.blogs.findOne({text: "Destination Moon"});
      > db.comments.find({blog: blog._id});
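With two collections the join happens in the application: one query fetches the post, a second fetches its comments. The sketch below imitates those two steps over in-memory arrays. `postWithComments` is a hypothetical helper, and the second and third comments are made-up sample data.

```javascript
// In-memory stand-ins for the two collections.
const posts = [{ _id: 1000, author: "Hergé", text: "Destination Moon" }];
const comments = [
  { _id: 1, blog: 1000, author: "Chris", text: "great book" },
  { _id: 2, blog: 1000, author: "Fred", text: "a classic" },   // made up
  { _id: 3, blog: 2000, author: "Chris", text: "unrelated" },  // made up
];

// Hypothetical helper: an application-side join in two lookups.
function postWithComments(postsColl, commentsColl, text) {
  const post = postsColl.find((p) => p.text === text);                 // query 1
  if (!post) return null;
  const itsComments = commentsColl.filter((c) => c.blog === post._id); // query 2
  return { ...post, comments: itsComments };
}

const result = postWithComments(posts, comments, "Destination Moon");
console.log(result.comments.map((c) => c.author)); // [ 'Chris', 'Fred' ]
```

The extra round trip is the cost of the flexibility noted on slide 40; in return, comments can be queried on their own, for example the latest comments across all posts.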
  • 42. One to Many - patterns
      • Embedded Array / Array Keys
      • Normalized
  • 43. Embedding vs. Referencing
      • Embed when the many objects always appear with their parent.
      • Reference when you need more flexibility.
  • 44. Many to Many http://www.flickr.com/photos/pats0n/6013379192
  • 45. Many - Many
      Example:
        • Product can be in many categories
        • Category can have many products
  • 48. Many to Many
      // Products
      { _id: 10, name: "Destination Moon", category_ids: [20, 30]}
      // Categories
      { _id: 20, name: "comic", product_ids: [10, 11, 12]}
      { _id: 30, name: "adventure", product_ids: [10]}
      // All categories for a given product
      > db.categories.find({"product_ids": 10})
  • 51. Alternative
      // Products
      { _id: 10, name: "Destination Moon", category_ids: [20, 30]}
      // Categories
      { _id: 20, name: "comic"}
      // All products for a given category
      > db.products.find({"category_ids": 20})
      // All categories for a given product
      > product = db.products.findOne({_id: some_id})
      > db.categories.find({_id: {$in: product.category_ids}})
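In the alternative layout the references live only on the product, so one direction is a single array lookup and the other takes two steps. A minimal in-memory sketch follows; the helper names and the second product are assumptions made for illustration.

```javascript
// References are stored on the product side only.
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Explorers on the Moon", category_ids: [20] }, // made up
];
const categories = [
  { _id: 20, name: "comic" },
  { _id: 30, name: "adventure" },
];

// Mirrors db.products.find({"category_ids": 20}): one lookup.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Mirrors findOne on products followed by find with $in on categories: two lookups.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory(20).map((p) => p.name));
console.log(categoriesOfProduct(10).map((c) => c.name)); // [ 'comic', 'adventure' ]
```

Compared with storing ids on both sides, this avoids having to keep two arrays in step on every change, at the price of the second lookup.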
  • 52. Trees http://www.flickr.com/photos/cubagallery/5949819558
  • 53. Trees
      Hierarchical information
  • 54. Trees
      Embedded Tree
      { comments : [{
          author : "Chris", text : "...",
          replies : [{
            author : "Fred", text : "...",
            replies : [],
          }]
        }]
      }
      Pros: Single Document, Performance, Intuitive
      Cons: Hard to search, Partial Results, 16MB limit
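The "hard to search" drawback of an embedded tree shows up as soon as the application needs one author's comments: it has to load the document and walk every level itself. The recursive walk below is a sketch of that; `findByAuthor` is a hypothetical helper.

```javascript
// Document shaped like the embedded tree example.
const post = {
  comments: [
    {
      author: "Chris", text: "...",
      replies: [{ author: "Fred", text: "...", replies: [] }],
    },
  ],
};

// Hypothetical helper: depth-first walk collecting every comment by an author,
// together with the nesting depth at which it was found.
function findByAuthor(comments, author, depth = 0, found = []) {
  for (const comment of comments) {
    if (comment.author === author) found.push({ depth, text: comment.text });
    findByAuthor(comment.replies || [], author, depth + 1, found);
  }
  return found;
}

console.log(findByAuthor(post.comments, "Fred").length);   // 1
console.log(findByAuthor(post.comments, "Fred")[0].depth); // 1
```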
  • 58. Array of Ancestors
      A
      +- B
      |  +- C
      |  +- D
      +- E
         +- F
      // Store all ancestors of a node
      { _id: "a" }
      { _id: "b", thread: [ "a" ], replyTo: "a" }
      { _id: "c", thread: [ "a", "b" ], replyTo: "b" }
      { _id: "d", thread: [ "a", "b" ], replyTo: "b" }
      { _id: "e", thread: [ "a" ], replyTo: "a" }
      { _id: "f", thread: [ "a", "e" ], replyTo: "e" }
      // find all messages in threads that "b" is part of
      > db.msg_tree.find({"thread": "b"})
      // find all direct replies to "b"
      > db.msg_tree.find({"replyTo": "b"})
      // find all ancestors of "f":
      > threads = db.msg_tree.findOne({"_id": "f"}).thread
      > db.msg_tree.find({"_id": {$in: threads}})
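The three array-of-ancestors queries can be tried against the same six documents in memory. The sketch below mirrors each query with an array filter and adds one assumption not on the slides: a hypothetical `makeReply` helper showing how a new node's `thread` array could be derived from its parent.

```javascript
// The six documents from the array-of-ancestors example.
const messages = [
  { _id: "a" },
  { _id: "b", thread: ["a"], replyTo: "a" },
  { _id: "c", thread: ["a", "b"], replyTo: "b" },
  { _id: "d", thread: ["a", "b"], replyTo: "b" },
  { _id: "e", thread: ["a"], replyTo: "a" },
  { _id: "f", thread: ["a", "e"], replyTo: "e" },
];

// Mirrors find({"thread": id}): every message below id in the tree.
const descendantsOf = (id) =>
  messages.filter((m) => (m.thread || []).includes(id)).map((m) => m._id);

// Mirrors find({"replyTo": id}): direct replies only.
const directRepliesTo = (id) =>
  messages.filter((m) => m.replyTo === id).map((m) => m._id);

// Mirrors findOne({"_id": id}).thread: the stored ancestors.
const ancestorsOf = (id) => {
  const node = messages.find((m) => m._id === id);
  return node ? node.thread || [] : [];
};

// Hypothetical helper: a reply's ancestors are its parent's ancestors plus the parent.
function makeReply(id, parentId) {
  const parent = messages.find((m) => m._id === parentId);
  if (!parent) throw new Error(`unknown parent "${parentId}"`);
  return { _id: id, thread: [...(parent.thread || []), parent._id], replyTo: parent._id };
}

console.log(descendantsOf("b"));   // [ 'c', 'd' ]
console.log(directRepliesTo("a")); // [ 'b', 'e' ]
console.log(ancestorsOf("f"));     // [ 'a', 'e' ]
console.log(makeReply("g", "c").thread); // [ 'a', 'b', 'c' ]
```

Reads are cheap because the ancestors are precomputed; the trade-off is that moving a subtree means rewriting the `thread` array of every descendant.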
  • 59. Array of Ancestors
      Store hierarchy as a path expression
        • Separate each node by a delimiter, e.g. "/"
        • Use text search to find parts of a tree
      { comments: [
          { author: "Kyle", text: "initial post", path: "" },
          { author: "Jim", text: "jim’s comment", path: "jim" },
          { author: "Kyle", text: "Kyle’s reply to Jim", path: "jim/kyle" }
      ] }
      // Find the conversations Jim was part of
      // (path is a field of the embedded comments, so use dot notation)
      > db.blogs.find({"comments.path": /^jim/i})
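The path-expression search is a prefix match with an anchored regular expression. The in-memory sketch below applies it to the three comments from the slide. `subtree` and `escapeRegExp` are hypothetical helpers, and the `(/|$)` suffix is an added assumption: it stops the prefix "jim" from also matching an unrelated path such as "jimmy", which the plain `/^jim/i` pattern would accept.

```javascript
// Comments from the path-expression example (plain apostrophes used here).
const comments = [
  { author: "Kyle", text: "initial post", path: "" },
  { author: "Jim", text: "jim's comment", path: "jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "jim/kyle" },
];

// Hypothetical helper: escape regular-expression metacharacters in a path segment.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: select a node and everything beneath it.
// The prefix must be followed by the delimiter or by the end of the path.
function subtree(items, prefix) {
  const pattern = new RegExp("^" + escapeRegExp(prefix) + "(/|$)", "i");
  return items.filter((item) => pattern.test(item.path));
}

console.log(subtree(comments, "jim").map((c) => c.path)); // [ 'jim', 'jim/kyle' ]
```

Because the pattern is anchored at the start of the string, a prefix search of this kind is the sort of query an index on the path field can help with.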
  • 60. Queues http://www.flickr.com/photos/deanspic/4960440218
  • 62. Queue
      Requirements
        • See jobs waiting, jobs in progress
        • Ensure that each job is started once and only once
      // Queue document
      { in_progress: false,
        priority: 1,
        message: "Rich documents FTW!"
        ...
      }
      // find highest priority job and mark as in-progress
      job = db.jobs.findAndModify({
        query: {in_progress: false},
        sort: {priority: -1},
        update: {$set: {in_progress: true,
                        started: new Date()}}
      })
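The queue pattern depends on finding and marking a job in a single atomic step, so that two workers cannot claim the same job. The sketch below imitates that step over an in-memory array. `claimNextJob` is a hypothetical stand-in for findAndModify and the second and third jobs are made up; in a single-threaded script nothing can interleave, whereas against a real server it is findAndModify that provides the guarantee.

```javascript
// Made-up queue contents; the first job matches the slide's queue document.
const jobs = [
  { _id: 1, in_progress: false, priority: 1, message: "Rich documents FTW!" },
  { _id: 2, in_progress: false, priority: 5, message: "urgent" },
  { _id: 3, in_progress: true, priority: 9, message: "already running" },
];

// Hypothetical stand-in for findAndModify: pick the highest-priority waiting job
// and mark it in one step. Like findAndModify by default, it returns the job
// as it looked before the update, or null when nothing is waiting.
function claimNextJob(queue, now = new Date()) {
  const waiting = queue
    .filter((job) => !job.in_progress)
    .sort((a, b) => b.priority - a.priority);
  if (waiting.length === 0) return null;
  const job = waiting[0];
  const before = { ...job };
  job.in_progress = true;
  job.started = now;
  return before;
}

const first = claimNextJob(jobs);
const second = claimNextJob(jobs);
const third = claimNextJob(jobs);
console.log(first._id, second._id, third); // 2 1 null
```

A worker that crashes leaves its job marked in progress, which is why the update also records `started`: a monitor can look for jobs that have been running suspiciously long.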
  • 64. Anti Patterns http://www.flickr.com/photos/51838104@N02/5841690990
  • 65. Anti patterns
      • Careless indexing
      • Large, deeply nested documents
      • Multiple types for a key
      • One size fits all collections
      • One collection per user
  • 66. Summary
      • Schema design is different in MongoDB
      • Basic data design principles stay the same
      • Focus on how the app manipulates data
      • Rapidly evolve schema to meet your requirements
      • Enjoy your new freedom, use it wisely :-)
  • 67. download at mongodb.org
      conferences, appearances, and meetups: http://www.10gen.com/events
      Facebook: http://bit.ly/mongofb | Twitter: @mongodb | LinkedIn: http://linkd.in/joinmongo
      support, training, and this talk brought to you by