open-source, high-performance,
  document-oriented database
Schema Design Basics
    roger@10Gen.com
This talk
This talk
‣Intro
  -Terms / Definitions
This talk
‣Intro
  -Terms / Definitions
This talk
‣Intro
  -Terms / Definitions
‣Getting a flavor
 -Creating a Schema
 -Indexes
 -Evolving the Schema
This talk
‣Intro
  -Terms / Definitions
‣Getting a flavor
 -Creating a Schema
 -Indexes
 -Evolving the Schema
‣Data modeling...
Document Oriented

Basic unit of data: JSON Documents
Not Relational, Key Value

Not OODB
 - Associations implied by Docum...
Terms
Table           -> Collection

Row(s)          -> JSON Document

Index           -> Index

Join            -> Embedd...
Considerations

What are the requirements ?
 - Functionality to be supported
 - Access Patterns ?
 - Data Life Cycle (inse...
DB Considerations
How can we manipulate this data ?
 Dynamic Queries
 Secondary Indexes
 Atomic Updates
 Map Reduce

Acces...
Design Session
Use Rich Design Documents

 post = {author: “kyle”,
      date: new Date(),
      text: “my blog post...”,
...
>db.posts.find()

 { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
   author : "kyle",
   date : "Sat Jul 24 2010 19:47:11 GM...
Secondary index for “author”

 // 1 means ascending, -1 means descending

 >db.posts.ensureIndex({author: 1})

 >db.posts....
Verifying indexes exist
 >db.system.indexes.find()

   // Index on ID
   { name : "_id_",
     ns : "test.posts",
     key ...
Verifying indexes exist
 >db.system.indexes.find()

   // Index on ID
   { name : "_id_",
     ns : "test.posts",
     key ...
Query operators
Conditional operators:
 $ne, $in, $nin, $mod, $all, $size, $exists, $type, ..
 $lt, $lte, $gt, $gte, $ne,
...
Query operators
Conditional operators:
 $ne, $in, $nin, $mod, $all, $size, $exists, $type, ..
 $lt, $lte, $gt, $gte, $ne,
...
Query operators
Conditional operators:
 $ne, $in, $nin, $mod, $all, $size, $exists, $type, ..
 $lt, $lte, $gt, $gte, $ne,
...
Extending the Schema
 comment = {author: “fred”,
            date: new Date(),
            text: “super duper”}

 update =...
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "kyle",
      date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)...
// create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})

>db.posts.find({comments.author:”Fred”})
// create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})

>db.posts.find({comments.author:”kyle”}...
// create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})

>db.posts.find({comments.author:”kyle”}...
Map Reduce
Aggregation and batch manipulation

 Collection in, Collection out

 Parallel in sharded environments
Map reduce
mapFunc = function () {
  this.tags.forEach(function (z) {emit(z, {count:1});});
}

reduceFunc = function (k, v...
Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some mo...
Wordnik
9B   records, 100M queries / week, 1,2TB
{

     entry : {

     
   header: { id: 0,

     
   
     headword: "m...
Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some mo...
Single Table Inheritance
>db.shapes.find()

 { _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1}
 { _id: ObjectI...
One to Many
- Embedded Array / Array Keys
  - slice operator to return subset of array
  - hard to find latest comments acr...
One to Many
- Embedded Array / Array Keys
  - slice operator to return subset of array
  - hard to find latest comments acr...
One to Many
- Embedded Array / Array Keys
  - slice operator to return subset of array
  - hard to find latest comments acr...
Many - Many
Example:

- Product can be in many categories
- Category can have many products



  Products     id | product...
products:
  { _id: ObjectId("4c4ca23933fb5941681b912e"),
    name: "Sumatra Dark Roast",
    category_ids: [ ObjectId("4c4...
products:
  { _id: ObjectId("4c4ca23933fb5941681b912e"),
    name: "Sumatra Dark Roast",
    category_ids: [ ObjectId("4c4...
products:
  { _id: ObjectId("4c4ca23933fb5941681b912e"),
    name: "Sumatra Dark Roast",
    category_ids: [ ObjectId("4c4...
products:
  { _id: ObjectId("4c4ca23933fb5941681b912e"),
    name: "Sumatra Dark Roast",
    category_ids: [ ObjectId("4c4...
Alternative
products:
  { _id: ObjectId("4c4ca23933fb5941681b912e"),
    name: "Sumatra Dark Roast",
    category_ids: [ O...
Alternative
products:
  { _id: ObjectId("4c4ca23933fb5941681b912e"),
    name: "Sumatra Dark Roast",
    category_ids: [ O...
Alternative
products:
  { _id: ObjectId("4c4ca23933fb5941681b912e"),
    name: "Sumatra Dark Roast",
    category_ids: [ O...
Trees
Full Tree in Document

{ comments: [
     { author: “rpb”, text: “...”,
       replies: [
                  {author:...
Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent

Child Links
- Each node contains t...
Array of Ancestors
- Store Ancestors of a node
 {   _id:   "a" }
 {   _id:   "b", ancestors: [ "a" ], parent: "a" }
 {   _...
Array of Ancestors
- Store Ancestors of a node
 {   _id:   "a" }
 {   _id:   "b", ancestors: [ "a" ], parent: "a" }
 {   _...
Array of Ancestors
- Store Ancestors of a node
 {   _id:   "a" }
 {   _id:   "b", ancestors: [ "a" ], parent: "a" }
 {   _...
findAndModify
Queue example

//Example: grab highest priority job and mark

job = db.jobs.findAndModify({
          query: {...
More Cool Stuff

• Aggregation
• Capped collections
• GridFS
• Geo
Learn More
 Kyle’s presentation + video:
http://www.slideshare.net/kbanker/mongodb-schema-design
http://www.blip.tv/file/37...
Thank You :-)
Download
       MongoDB

and let us know what you think
          @mongodb
   http://www.mongodb.org
DBRef
DBRef
 {$ref: collection, $id: id_value}

- Think URL
- YDSMV: your driver support may vary

Sample Schema:
  nr = {...
BSON
Mongodb stores data in BSON internally

 Lightweight, Traversable, Efficient encoding

 Typed
   boolean, integer, flo...
Schema Design with MongoDB
Upcoming SlideShare
Loading in...5
×

Schema Design with MongoDB

23,267

Published on

Introduction to schema design with MongoDB

Published in: Technology
0 Comments
35 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
23,267
On Slideshare
0
From Embeds
0
Number of Embeds
6
Actions
Shares
0
Downloads
537
Comments
0
Likes
35
Embeds 0
No embeds

No notes for slide


















































  • blog post
    twitter


  • Transcript of "Schema Design with MongoDB"

    1. 1. open-source, high-performance, document-oriented database
    2. 2. Schema Design Basics roger@10Gen.com
    3. 3. This talk
    4. 4. This talk ‣Intro -Terms / Definitions
    5. 5. This talk ‣Intro -Terms / Definitions
    6. 6. This talk ‣Intro -Terms / Definitions ‣Getting a flavor -Creating a Schema -Indexes -Evolving the Schema
    7. 7. This talk ‣Intro -Terms / Definitions ‣Getting a flavor -Creating a Schema -Indexes -Evolving the Schema ‣Data modeling -DBRef -Single Table Inheritance -Many - Many -Trees -Lists / Queues / Stacks
    8. 8. Document Oriented Basic unit of data: JSON Documents Not Relational, Key Value Not OODB - Associations implied by Document Structure - but your database schema != your program schema
    9. 9. Terms Table -> Collection Row(s) -> JSON Document Index -> Index Join -> Embedding and Linking across documents Partition -> Shard Partition Key -> Shard Key
    10. 10. Considerations What are the requirements ? - Functionality to be supported - Access Patterns ? - Data Life Cycle (insert, update, deletes) - Expected Performance / Workload ? Capabilities of the database ?
    11. 11. DB Considerations How can we manipulate this data ? Dynamic Queries Secondary Indexes Atomic Updates Map Reduce Access Patterns ? Read / Write Ratio Types of updates Types of queries Considerations No Joins Single Document Transactions only
    12. 12. Design Session Use Rich Design Documents post = {author: “kyle”, date: new Date(), text: “my blog post...”, tags: [“mongodb”, “intro”]} >db.post.save(post)
    13. 13. >db.posts.find() { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "My first blog", tags : [ "mongodb", "intro" ] } Notes: - ID is unique, but can be anything you’d like
    14. 14. Secondary index for “author” // 1 means ascending, -1 means descending >db.posts.ensureIndex({author: 1}) >db.posts.find({author: 'kyle'}) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", ... }
    15. 15. Verifying indexes exist >db.system.indexes.find() // Index on ID { name : "_id_", ns : "test.posts", key : { "_id" : 1 } }
    16. 16. Verifying indexes exist >db.system.indexes.find() // Index on ID { name : "_id_", ns : "test.posts", key : { "_id" : 1 } } // Index on author { _id : ObjectId("4c4ba6c5672c685e5e8aabf4"), ns : "test.posts", key : { "author" : 1 }, name : "author_1" }
    17. 17. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}})
    18. 18. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}}) Regular expressions: // posts where author starts with k >db.posts.find({author: /^k*/i })
    19. 19. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}}) Regular expressions: // posts where author starts with k >db.posts.find({author: /^k*/i }) Counting: // posts written by mike >db.posts.find({author: “mike”}).count()
    20. 20. Extending the Schema comment = {author: “fred”, date: new Date(), text: “super duper”} update = { ‘$push’: {comments: comment}, ‘$inc’: {comments_count: 1}} >db.posts.update({_id: “...” }, update)
    21. 21. { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "My first blog", tags : [ "mongodb", "intro" ], comments_count: 1, comments : [ { author : "Fred", date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)", text : "Super Duper" } ]}
    22. 22. // create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”Fred”})
    23. 23. // create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”kyle”}) // find last 5 posts: >db.posts.find().sort({date:-1}).limit(5)
    24. 24. // create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”kyle”}) // find last 5 posts: >db.posts.find().sort({date:-1}).limit(5) // most commented post: >db.posts.find().sort({comments_count:-1}).limit(1) When sorting, check if you need an index
    25. 25. Map Reduce Aggregation and batch manipulation Collection in, Collection out Parallel in sharded environments
    26. 26. Map reduce mapFunc = function () { this.tags.forEach(function (z) {emit(z, {count:1});}); } reduceFunc = function (k, v) { var total = 0; for (var i = 0; i < v.length; i++) { total += v[i].count; } return {count:total}; } res = db.posts.mapReduce(mapFunc, reduceFunc) >db[res.result].find() { _id : "intro", value : { count : 1 } } { _id : "mongodb", value : { count : 1 } }
    27. 27. Review So Far: - Started out with a simple schema - Queried Data - Evolved the schema - Queried / Updated the data some more
    28. 28. Wordnik 9B records, 100M queries / week, 1,2TB { entry : { header: { id: 0, headword: "m", sourceDictionary: "GCide", textProns : [ {text: "(em)", seq:0} ], syllables: [ {id: 0, text: "m"} ], sourceDictionary: "1913 Webster", headWord: "m", id: 1, definitions: : [ {text: "M, the thirteenth letter..."}, {text: "As a numeral, M stands for 1000"}] } } }
    29. 29. Review So Far: - Started out with a simple schema - Queried Data - Evolved the schema - Queried / Updated the data some more Observations: - Using Rich Documents works well - Simplify relations by embedding them - Iterative development is easy with MongoDB
    30. 30. Single Table Inheritance >db.shapes.find() { _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1} { _id: ObjectId("..."), type: "square", area: 4, d: 2} { _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2} // find shapes where radius > 0 >db.shapes.find({radius: {$gt: 0}}) // create index >db.shapes.ensureIndex({radius: 1})
    31. 31. One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents
    32. 32. One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query
    33. 33. One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query - Normalized (2 collections) - most flexible - more queries
    34. 34. Many - Many Example: - Product can be in many categories - Category can have many products Products id | product_id | category_id Category
    35. 35. products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]}
    36. 36. products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]}
    37. 37. products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]} //All categories for a given product >db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})
    38. 38. products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]} //All categories for a given product >db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")}) //All products for a given category >db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
    39. 39. Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"}
    40. 40. Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"} // All products for a given category >db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
    41. 41. Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"} // All products for a given category >db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")}) // All categories for a given product product = db.products.find(_id : some_id) >db.categories.find({_id : {$in : product.category_ids}})
    42. 42. Trees Full Tree in Document { comments: [ { author: “rpb”, text: “...”, replies: [ {author: “Fred”, text: “...”, replies: []} ]} ]} Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 4MB limit
    43. 43. Trees Parent Links - Each node is stored as a document - Contains the id of the parent Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child)
    44. 44. Array of Ancestors - Store Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" } { _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }
    45. 45. Array of Ancestors - Store Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" } { _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" } //find all descendants of b: >db.tree2.find({ancestors: ‘b’})
    46. 46. Array of Ancestors - Store Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" } { _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" } //find all descendants of b: >db.tree2.find({ancestors: ‘b’}) //find all ancestors of f: >ancestors = db.tree2.findOne({_id:’f’}).ancestors >db.tree2.find({_id: { $in : ancestors})
    47. 47. findAndModify Queue example //Example: grab highest priority job and mark job = db.jobs.findAndModify({ query: {inprogress: false}, sort: {priority: -1), update: {$set: {inprogress: true, started: new Date()}}, new: true})
    48. 48. More Cool Stuff • Aggregation • Capped collections • GridFS • Geo
    49. 49. Learn More Kyle’s presentation + video: http://www.slideshare.net/kbanker/mongodb-schema-design http://www.blip.tv/file/3704083 Dwight’s presentation http://www.slideshare.net/mongosf/schema-design-with-mongodb-dwight- merriman Documentation Trees: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB Queues: http://www.mongodb.org/display/DOCS/findandmodify+Command Aggregration: http://www.mongodb.org/display/DOCS/Aggregation Capped Col. : http://www.mongodb.org/display/DOCS/Capped+Collections Geo: http://www.mongodb.org/display/DOCS/Geospatial+Indexing GridFS: http://www.mongodb.org/display/DOCS/GridFS+Specification
    50. 50. Thank You :-)
    51. 51. Download MongoDB and let us know what you think @mongodb http://www.mongodb.org
    52. 52. DBRef DBRef {$ref: collection, $id: id_value} - Think URL - YDSMV: your driver support may vary Sample Schema: nr = {note_refs: [{"$ref" : "notes", "$id" : 5}, ... ]} Dereferencing: nr.forEach(function(r) { printjson(db[r.$ref].findOne({_id: r.$id})); }
    53. 53. BSON Mongodb stores data in BSON internally Lightweight, Traversable, Efficient encoding Typed boolean, integer, float, date, string, binary, array...
    1. A particular slide catching your eye?

      Clipping is a handy way to collect important slides you want to go back to later.

    ×