Schema Design with MongoDB
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Schema Design with MongoDB

on

  • 22,655 views

Introduction to schema design with MongoDB

Introduction to schema design with MongoDB

Statistics

Views

Total Views
22,655
Views on SlideShare
17,484
Embed Views
5,171

Actions

Likes
32
Downloads
458
Comments
0

10 Embeds 5,171

http://nosql.mypopescu.com 5085
http://gurjarmehul.wordpress.com 30
http://rg443blog.wordpress.com 24
http://wiki.infobank.net 10
http://webcache.googleusercontent.com 9
http://paper.li 6
http://translate.googleusercontent.com 3
https://twitter.com 2
https://si0.twimg.com 1
http://www.google.com 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Apple Keynote

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • blog post <br /> twitter <br />
  • <br />
  • <br />

Schema Design with MongoDB Presentation Transcript

  • 1. open-source, high-performance, document-oriented database
  • 2. Schema Design Basics roger@10Gen.com
  • 3. This talk
  • 4. This talk ‣Intro -Terms / Definitions
  • 5. This talk ‣Intro -Terms / Definitions
  • 6. This talk ‣Intro -Terms / Definitions ‣Getting a flavor -Creating a Schema -Indexes -Evolving the Schema
  • 7. This talk ‣Intro -Terms / Definitions ‣Getting a flavor -Creating a Schema -Indexes -Evolving the Schema ‣Data modeling -DBRef -Single Table Inheritance -Many - Many -Trees -Lists / Queues / Stacks
  • 8. Document Oriented Basic unit of data: JSON Documents Not Relational, Key Value Not OODB - Associations implied by Document Structure - but your database schema != your program schema
  • 9. Terms Table -> Collection Row(s) -> JSON Document Index -> Index Join -> Embedding and Linking across documents Partition -> Shard Partition Key -> Shard Key
  • 10. Considerations What are the requirements ? - Functionality to be supported - Access Patterns ? - Data Life Cycle (insert, update, deletes) - Expected Performance / Workload ? Capabilities of the database ?
  • 11. DB Considerations How can we manipulate this data ? Dynamic Queries Secondary Indexes Atomic Updates Map Reduce Access Patterns ? Read / Write Ratio Types of updates Types of queries Considerations No Joins Single Document Transactions only
  • 12. Design Session Use Rich Design Documents post = {author: “kyle”, date: new Date(), text: “my blog post...”, tags: [“mongodb”, “intro”]} >db.post.save(post)
  • 13. >db.posts.find() { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "My first blog", tags : [ "mongodb", "intro" ] } Notes: - ID is unique, but can be anything you’d like
  • 14. Secondary index for “author” // 1 means ascending, -1 means descending >db.posts.ensureIndex({author: 1}) >db.posts.find({author: 'kyle'}) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", ... }
  • 15. Verifying indexes exist >db.system.indexes.find() // Index on ID { name : "_id_", ns : "test.posts", key : { "_id" : 1 } }
  • 16. Verifying indexes exist >db.system.indexes.find() // Index on ID { name : "_id_", ns : "test.posts", key : { "_id" : 1 } } // Index on author { _id : ObjectId("4c4ba6c5672c685e5e8aabf4"), ns : "test.posts", key : { "author" : 1 }, name : "author_1" }
  • 17. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}})
  • 18. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}}) Regular expressions: // posts where author starts with k >db.posts.find({author: /^k*/i })
  • 19. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}}) Regular expressions: // posts where author starts with k >db.posts.find({author: /^k*/i }) Counting: // posts written by mike >db.posts.find({author: “mike”}).count()
  • 20. Extending the Schema comment = {author: “fred”, date: new Date(), text: “super duper”} update = { ‘$push’: {comments: comment}, ‘$inc’: {comments_count: 1}} >db.posts.update({_id: “...” }, update)
  • 21. { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "My first blog", tags : [ "mongodb", "intro" ], comments_count: 1, comments : [ { author : "Fred", date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)", text : "Super Duper" } ]}
  • 22. // create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”Fred”})
  • 23. // create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”kyle”}) // find last 5 posts: >db.posts.find().sort({date:-1}).limit(5)
  • 24. // create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”kyle”}) // find last 5 posts: >db.posts.find().sort({date:-1}).limit(5) // most commented post: >db.posts.find().sort({comments_count:-1}).limit(1) When sorting, check if you need an index
  • 25. Map Reduce Aggregation and batch manipulation Collection in, Collection out Parallel in sharded environments
  • 26. Map reduce mapFunc = function () { this.tags.forEach(function (z) {emit(z, {count:1});}); } reduceFunc = function (k, v) { var total = 0; for (var i = 0; i < v.length; i++) { total += v[i].count; } return {count:total}; } res = db.posts.mapReduce(mapFunc, reduceFunc) >db[res.result].find() { _id : "intro", value : { count : 1 } } { _id : "mongodb", value : { count : 1 } }
  • 27. Review So Far: - Started out with a simple schema - Queried Data - Evolved the schema - Queried / Updated the data some more
  • 28. Wordnik 9B records, 100M queries / week, 1,2TB { entry : { header: { id: 0, headword: "m", sourceDictionary: "GCide", textProns : [ {text: "(em)", seq:0} ], syllables: [ {id: 0, text: "m"} ], sourceDictionary: "1913 Webster", headWord: "m", id: 1, definitions: : [ {text: "M, the thirteenth letter..."}, {text: "As a numeral, M stands for 1000"}] } } }
  • 29. Review So Far: - Started out with a simple schema - Queried Data - Evolved the schema - Queried / Updated the data some more Observations: - Using Rich Documents works well - Simplify relations by embedding them - Iterative development is easy with MongoDB
  • 30. Single Table Inheritance >db.shapes.find() { _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1} { _id: ObjectId("..."), type: "square", area: 4, d: 2} { _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2} // find shapes where radius > 0 >db.shapes.find({radius: {$gt: 0}}) // create index >db.shapes.ensureIndex({radius: 1})
  • 31. One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents
  • 32. One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query
  • 33. One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query - Normalized (2 collections) - most flexible - more queries
  • 34. Many - Many Example: - Product can be in many categories - Category can have many products Products id | product_id | category_id Category
  • 35. products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]}
  • 36. products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]}
  • 37. products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]} //All categories for a given product >db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})
  • 38. products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]} //All categories for a given product >db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")}) //All products for a given category >db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
  • 39. Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"}
  • 40. Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"} // All products for a given category >db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
  • 41. Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"} // All products for a given category >db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")}) // All categories for a given product product = db.products.find(_id : some_id) >db.categories.find({_id : {$in : product.category_ids}})
  • 42. Trees Full Tree in Document { comments: [ { author: “rpb”, text: “...”, replies: [ {author: “Fred”, text: “...”, replies: []} ]} ]} Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 4MB limit
  • 43. Trees Parent Links - Each node is stored as a document - Contains the id of the parent Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child)
  • 44. Array of Ancestors - Store Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" } { _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }
  • 45. Array of Ancestors - Store Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" } { _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" } //find all descendants of b: >db.tree2.find({ancestors: ‘b’})
  • 46. Array of Ancestors - Store Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" } { _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" } //find all descendants of b: >db.tree2.find({ancestors: ‘b’}) //find all ancestors of f: >ancestors = db.tree2.findOne({_id:’f’}).ancestors >db.tree2.find({_id: { $in : ancestors})
  • 47. findAndModify Queue example //Example: grab highest priority job and mark job = db.jobs.findAndModify({ query: {inprogress: false}, sort: {priority: -1), update: {$set: {inprogress: true, started: new Date()}}, new: true})
  • 48. More Cool Stuff • Aggregation • Capped collections • GridFS • Geo
  • 49. Learn More Kyle’s presentation + video: http://www.slideshare.net/kbanker/mongodb-schema-design http://www.blip.tv/file/3704083 Dwight’s presentation http://www.slideshare.net/mongosf/schema-design-with-mongodb-dwight- merriman Documentation Trees: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB Queues: http://www.mongodb.org/display/DOCS/findandmodify+Command Aggregration: http://www.mongodb.org/display/DOCS/Aggregation Capped Col. : http://www.mongodb.org/display/DOCS/Capped+Collections Geo: http://www.mongodb.org/display/DOCS/Geospatial+Indexing GridFS: http://www.mongodb.org/display/DOCS/GridFS+Specification
  • 50. Thank You :-)
  • 51. Download MongoDB and let us know what you think @mongodb http://www.mongodb.org
  • 52. DBRef DBRef {$ref: collection, $id: id_value} - Think URL - YDSMV: your driver support may vary Sample Schema: nr = {note_refs: [{"$ref" : "notes", "$id" : 5}, ... ]} Dereferencing: nr.forEach(function(r) { printjson(db[r.$ref].findOne({_id: r.$id})); }
  • 53. BSON Mongodb stores data in BSON internally Lightweight, Traversable, Efficient encoding Typed boolean, integer, float, date, string, binary, array...