Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Schema Design with MongoDB

25,706 views

Published on

Introduction to schema design with MongoDB

Published in: Technology
  • Nice !! Download 100 % Free Ebooks, PPts, Study Notes, Novels, etc @ https://www.ThesisScientist.com
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • For data visualization,data analytics,data intelligence and ERP Tools, online training with job placements, register at http://www.todaycourses.com
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Schema Design with MongoDB

  1. 1. open-source, high-performance, document-oriented database
  2. 2. Schema Design Basics roger@10Gen.com
  3. 3. This talk
  4. 4. This talk ‣Intro -Terms / Definitions
  5. 5. This talk ‣Intro -Terms / Definitions
  6. 6. This talk ‣Intro -Terms / Definitions ‣Getting a flavor -Creating a Schema -Indexes -Evolving the Schema
  7. 7. This talk ‣Intro -Terms / Definitions ‣Getting a flavor -Creating a Schema -Indexes -Evolving the Schema ‣Data modeling -DBRef -Single Table Inheritance -Many - Many -Trees -Lists / Queues / Stacks
  8. 8. Document Oriented Basic unit of data: JSON Documents Not Relational, Key Value Not OODB - Associations implied by Document Structure - but your database schema != your program schema
  9. 9. Terms Table -> Collection Row(s) -> JSON Document Index -> Index Join -> Embedding and Linking across documents Partition -> Shard Partition Key -> Shard Key
  10. 10. Considerations What are the requirements ? - Functionality to be supported - Access Patterns ? - Data Life Cycle (insert, update, deletes) - Expected Performance / Workload ? Capabilities of the database ?
  11. 11. DB Considerations How can we manipulate this data ? Dynamic Queries Secondary Indexes Atomic Updates Map Reduce Access Patterns ? Read / Write Ratio Types of updates Types of queries Considerations No Joins Single Document Transactions only
  12. 12. Design Session Use Rich Design Documents post = {author: “kyle”, date: new Date(), text: “my blog post...”, tags: [“mongodb”, “intro”]} >db.post.save(post)
  13. 13. >db.posts.find() { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "My first blog", tags : [ "mongodb", "intro" ] } Notes: - ID is unique, but can be anything you’d like
  14. 14. Secondary index for “author” // 1 means ascending, -1 means descending >db.posts.ensureIndex({author: 1}) >db.posts.find({author: 'kyle'}) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", ... }
  15. 15. Verifying indexes exist >db.system.indexes.find() // Index on ID { name : "_id_", ns : "test.posts", key : { "_id" : 1 } }
  16. 16. Verifying indexes exist >db.system.indexes.find() // Index on ID { name : "_id_", ns : "test.posts", key : { "_id" : 1 } } // Index on author { _id : ObjectId("4c4ba6c5672c685e5e8aabf4"), ns : "test.posts", key : { "author" : 1 }, name : "author_1" }
  17. 17. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}})
  18. 18. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}}) Regular expressions: // posts where author starts with k >db.posts.find({author: /^k*/i })
  19. 19. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}}) Regular expressions: // posts where author starts with k >db.posts.find({author: /^k*/i }) Counting: // posts written by mike >db.posts.find({author: “mike”}).count()
  20. 20. Extending the Schema comment = {author: “fred”, date: new Date(), text: “super duper”} update = { ‘$push’: {comments: comment}, ‘$inc’: {comments_count: 1}} >db.posts.update({_id: “...” }, update)
  21. 21. { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "My first blog", tags : [ "mongodb", "intro" ], comments_count: 1, comments : [ { author : "Fred", date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)", text : "Super Duper" } ]}
  22. 22. // create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”Fred”})
  23. 23. // create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”kyle”}) // find last 5 posts: >db.posts.find().sort({date:-1}).limit(5)
  24. 24. // create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”kyle”}) // find last 5 posts: >db.posts.find().sort({date:-1}).limit(5) // most commented post: >db.posts.find().sort({comments_count:-1}).limit(1) When sorting, check if you need an index
  25. 25. Map Reduce Aggregation and batch manipulation Collection in, Collection out Parallel in sharded environments
  26. 26. Map reduce mapFunc = function () { this.tags.forEach(function (z) {emit(z, {count:1});}); } reduceFunc = function (k, v) { var total = 0; for (var i = 0; i < v.length; i++) { total += v[i].count; } return {count:total}; } res = db.posts.mapReduce(mapFunc, reduceFunc) >db[res.result].find() { _id : "intro", value : { count : 1 } } { _id : "mongodb", value : { count : 1 } }
  27. 27. Review So Far: - Started out with a simple schema - Queried Data - Evolved the schema - Queried / Updated the data some more
  28. 28. Wordnik 9B records, 100M queries / week, 1,2TB { entry : { header: { id: 0, headword: "m", sourceDictionary: "GCide", textProns : [ {text: "(em)", seq:0} ], syllables: [ {id: 0, text: "m"} ], sourceDictionary: "1913 Webster", headWord: "m", id: 1, definitions: : [ {text: "M, the thirteenth letter..."}, {text: "As a numeral, M stands for 1000"}] } } }
  29. 29. Review So Far: - Started out with a simple schema - Queried Data - Evolved the schema - Queried / Updated the data some more Observations: - Using Rich Documents works well - Simplify relations by embedding them - Iterative development is easy with MongoDB
  30. 30. Single Table Inheritance >db.shapes.find() { _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1} { _id: ObjectId("..."), type: "square", area: 4, d: 2} { _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2} // find shapes where radius > 0 >db.shapes.find({radius: {$gt: 0}}) // create index >db.shapes.ensureIndex({radius: 1})
  31. 31. One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents
  32. 32. One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query
  33. 33. One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query - Normalized (2 collections) - most flexible - more queries
  34. 34. Many - Many Example: - Product can be in many categories - Category can have many products Products id | product_id | category_id Category
  35. 35. products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]}
  36. 36. products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]}
  37. 37. products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]} //All categories for a given product >db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})
  38. 38. products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]} //All categories for a given product >db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")}) //All products for a given category >db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
  39. 39. Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"}
  40. 40. Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"} // All products for a given category >db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
  41. 41. Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"} // All products for a given category >db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")}) // All categories for a given product product = db.products.find(_id : some_id) >db.categories.find({_id : {$in : product.category_ids}})
  42. 42. Trees Full Tree in Document { comments: [ { author: “rpb”, text: “...”, replies: [ {author: “Fred”, text: “...”, replies: []} ]} ]} Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 4MB limit
  43. 43. Trees Parent Links - Each node is stored as a document - Contains the id of the parent Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child)
  44. 44. Array of Ancestors - Store Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" } { _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }
  45. 45. Array of Ancestors - Store Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" } { _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" } //find all descendants of b: >db.tree2.find({ancestors: ‘b’})
  46. 46. Array of Ancestors - Store Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" } { _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" } //find all descendants of b: >db.tree2.find({ancestors: ‘b’}) //find all ancestors of f: >ancestors = db.tree2.findOne({_id:’f’}).ancestors >db.tree2.find({_id: { $in : ancestors})
  47. 47. findAndModify Queue example //Example: grab highest priority job and mark job = db.jobs.findAndModify({ query: {inprogress: false}, sort: {priority: -1), update: {$set: {inprogress: true, started: new Date()}}, new: true})
  48. 48. More Cool Stuff • Aggregation • Capped collections • GridFS • Geo
  49. 49. Learn More Kyle’s presentation + video: http://www.slideshare.net/kbanker/mongodb-schema-design http://www.blip.tv/file/3704083 Dwight’s presentation http://www.slideshare.net/mongosf/schema-design-with-mongodb-dwight- merriman Documentation Trees: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB Queues: http://www.mongodb.org/display/DOCS/findandmodify+Command Aggregration: http://www.mongodb.org/display/DOCS/Aggregation Capped Col. : http://www.mongodb.org/display/DOCS/Capped+Collections Geo: http://www.mongodb.org/display/DOCS/Geospatial+Indexing GridFS: http://www.mongodb.org/display/DOCS/GridFS+Specification
  50. 50. Thank You :-)
  51. 51. Download MongoDB and let us know what you think @mongodb http://www.mongodb.org
  52. 52. DBRef DBRef {$ref: collection, $id: id_value} - Think URL - YDSMV: your driver support may vary Sample Schema: nr = {note_refs: [{"$ref" : "notes", "$id" : 5}, ... ]} Dereferencing: nr.forEach(function(r) { printjson(db[r.$ref].findOne({_id: r.$id})); }
  53. 53. BSON Mongodb stores data in BSON internally Lightweight, Traversable, Efficient encoding Typed boolean, integer, float, date, string, binary, array...

×