10gen Presents Schema Design and Data Modeling

Published in: Technology
Transcript of "10gen Presents Schema Design and Data Modeling"

  1. Schema Design with MongoDB
     Antoine Girbal, antoine@10gen.com, @antoinegirbal
  2. So why model data?
     http://www.flickr.com/photos/42304632@N00/493639870/
  3. Normalization
     • Goals:
       • Avoid anomalies when inserting, updating or deleting
       • Minimize redesign when extending the schema
       • Avoid bias toward a particular query
       • Make use of all SQL features
     • In MongoDB:
       • Similar goals apply, but the rules are different
       • Denormalization for optimization is an option: most features still exist, contrary to BLOBs
  4. Terminology
     RDBMS          MongoDB
     Table          Collection
     Row(s)         JSON Document
     Index          Index
     Join           Embedding & Linking
     Partition      Shard
     Partition Key  Shard Key
  5. Collections Basics
     • Equivalent to a Table in SQL
     • Cheap to create (max 24000)
     • Collections don’t have a fixed schema
     • Common for documents in a collection to share a schema
     • Document schema can evolve
     • Consider using multiple related collections tied together by a naming convention:
       • e.g. LogData-2011-02-08
  6. Document basics
     • Elements are name/value pairs, equivalent to column values in SQL
     • Elements can be nested
     • Rich data types for values
     • JSON for the human eye
     • BSON for all internals
     • 16MB maximum size (many books..)
     • What you see is what is stored
  7. Schema Design - Relational
  8. Schema Design - MongoDB
  9. Schema Design - MongoDB: embedding
  10. Schema Design - MongoDB: embedding, linking
  11. Design Session
      Design documents that simply map to your application
      > post = { author: "Hergé",
                 date: ISODate("2011-09-18T09:56:06.298Z"),
                 text: "Destination Moon",
                 tags: ["comic", "adventure"] }
      > db.blogs.save(post)
  12. Find the document
      > db.blogs.find()
        { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
          author: "Hergé",
          date: ISODate("2011-09-18T09:56:06.298Z"),
          text: "Destination Moon",
          tags: [ "comic", "adventure" ] }
      Notes:
      • ID must be unique, but can be anything you’d like
      • MongoDB will generate a default ID if one is not supplied
  13. Add an index, find via index
      Secondary index for “author”
      // 1 means ascending, -1 means descending
      > db.blogs.ensureIndex( { author: 1 } )
      > db.blogs.find( { author: "Hergé" } )
        { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
          date: ISODate("2011-09-18T09:56:06.298Z"),
          author: "Hergé",
          ... }
  14. Examine the query plan
      > db.blogs.find( { author: "Hergé" } ).explain()
      {
        "cursor" : "BtreeCursor author_1",
        "nscanned" : 1,
        "nscannedObjects" : 1,
        "n" : 1,
        "millis" : 5,
        "indexBounds" : {
          "author" : [ [ "Hergé", "Hergé" ] ]
        }
      }
  18. Query operators
      Conditional operators:
        $ne, $in, $nin, $mod, $all, $size, $exists, $type, ..
        $lt, $lte, $gt, $gte, $ne...
      // find posts with any tags
      > db.blogs.find( { tags: { $exists: true } } )
      Regular expressions:
      // posts where author starts with h
      > db.blogs.find( { author: /^h/ } )
      Counting:
      // number of posts written by Hergé
      > db.blogs.find( { author: "Hergé" } ).count()
  19. Extending the Schema
      > new_comment = { author: "Kyle",
                        date: new Date(),
                        text: "great book" }
      > db.blogs.update(
          { text: "Destination Moon" },
          { "$push": { comments: new_comment },
            "$inc": { comments_count: 1 } } )
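
As a rough illustration of what the update above does to the stored document, here is a minimal in-memory sketch in plain JavaScript. `applyUpdate` is a hypothetical helper written for this note, not part of MongoDB; it only imitates the two operators used on the slide.

```javascript
// Hypothetical in-memory stand-in for the "$push" and "$inc" operators (not a MongoDB API).
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (!Array.isArray(doc[field])) doc[field] = []; // $push creates the array when the field is absent
    doc[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // $inc starts from 0 when the field is absent
  }
  return doc;
}

const post = { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Kyle", date: new Date(), text: "great book" };

// Both new fields appear on the document without any schema change.
applyUpdate(post, { $push: { comments: newComment }, $inc: { comments_count: 1 } });
console.log(post.comments.length, post.comments_count);
```

Keeping `comments_count` next to the array is what later lets the "most commented post" query sort on a plain number instead of computing array lengths.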
  20. Extending the Schema
      > db.blogs.find( { author: "Hergé" } )
      { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
        author : "Hergé",
        date : ISODate("2011-09-18T09:56:06.298Z"),
        text : "Destination Moon",
        tags : [ "comic", "adventure" ],
        comments : [
          { author : "Kyle",
            date : ISODate("2011-09-19T09:56:06.298Z"),
            text : "great book" } ],
        comments_count: 1 }
  23. Extending the Schema
      // create index on nested documents:
      > db.blogs.ensureIndex( { "comments.author": 1 } )
      > db.blogs.find( { "comments.author": "Kyle" } )
      // find last 5 posts:
      > db.blogs.find().sort( { date: -1 } ).limit(5)
      // most commented post:
      > db.blogs.find().sort( { comments_count: -1 } ).limit(1)
      When sorting, check if you need an index
  24. Common Patterns
      Patterns:
      • Inheritance
      • One to one
      • One to many
      • Many to many
  25. Inheritance
  26. Single Table Inheritance - MongoDB
      shapes table
      id  type    area  radius  length  width
      1   circle  3.14  1
      2   square  4             2
      3   rect    10            5       2
  27. Single Table Inheritance - MongoDB
      > db.shapes.find()
      { _id: "1", type: "c", area: 3.14, radius: 1 }
      { _id: "2", type: "s", area: 4, length: 2 }
      { _id: "3", type: "r", area: 10, length: 5, width: 2 }
      Missing values are not stored!
  29. Single Table Inheritance - MongoDB
      > db.shapes.find()
      { _id: "1", type: "c", area: 3.14, radius: 1 }
      { _id: "2", type: "s", area: 4, length: 2 }
      { _id: "3", type: "r", area: 10, length: 5, width: 2 }
      // find shapes where radius > 0
      > db.shapes.find( { radius: { $gt: 0 } } )
      // create index
      > db.shapes.ensureIndex( { radius: 1 }, { sparse: true } )
      The index only holds values that are present!
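
To make the sparse option concrete, the sketch below models an index as a list of `[key, _id]` entries in plain JavaScript. `buildIndex` is a hypothetical helper, and the list is only a stand-in for the real B-tree; the point is which documents get an entry at all.

```javascript
const shapes = [
  { _id: "1", type: "c", area: 3.14, radius: 1 },
  { _id: "2", type: "s", area: 4, length: 2 },
  { _id: "3", type: "r", area: 10, length: 5, width: 2 },
];

// Hypothetical helper: one [key, _id] entry per indexed document.
// A regular index records a null key for documents that lack the field;
// a sparse index leaves those documents out entirely.
function buildIndex(docs, field, { sparse = false } = {}) {
  return docs
    .filter((doc) => !sparse || field in doc)
    .map((doc) => [field in doc ? doc[field] : null, doc._id]);
}

const sparseIndex = buildIndex(shapes, "radius", { sparse: true });
const regularIndex = buildIndex(shapes, "radius");
console.log(sparseIndex.length, regularIndex.length);
```

With single-collection inheritance most subtype fields are missing from most documents, which is why the sparse variant stays small.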
  30. One to Many
      Either:
      • Embedded Array / Document:
        • improves read speed
        • simplifies schema
      • Normalize:
        • if list grows significantly
        • if sub items are updated often
        • if sub items are more than 1 level deep and need updating
  31. One to Many
      Embedded Array:
      • $slice operator to return subset of comments
      • some queries become harder (e.g. find latest comments across all blogs)
      blogs:
        { author : "Hergé",
          date : ISODate("2011-09-18T09:56:06.298Z"),
          comments : [
            { author : "Kyle",
              date : ISODate("2011-09-19T09:56:06.298Z"),
              text : "great book" } ] }
  32. One to Many
      Normalized (2 collections)
      • most flexible
      • more queries
      blogs:
        { _id: 1000,
          author: "Hergé",
          date: ISODate("2011-09-18T09:56:06.298Z") }
      comments:
        { _id : 1,
          blogId: 1000,
          author : "Kyle",
          date : ISODate("2011-09-19T09:56:06.298Z") }
      // findOne returns a single document, so blog._id is defined
      > blog = db.blogs.findOne( { text: "Destination Moon" } );
      // the index belongs on the comments collection
      > db.comments.ensureIndex( { blogId: 1 } ) // important!
      > db.comments.find( { blogId: blog._id } );
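
The "more queries" cost of the normalized layout is an application-side join: one lookup for the blog, one for its comments. A minimal in-memory sketch follows, with a `Map` standing in for the `blogId` index; the second comment and the `text` field on the blog are hypothetical data added for the illustration.

```javascript
const blogs = [{ _id: 1000, author: "Hergé", text: "Destination Moon" }];
const comments = [
  { _id: 1, blogId: 1000, author: "Kyle" },
  { _id: 2, blogId: 1001, author: "James" }, // hypothetical comment on another blog
];

// Stand-in for the index on comments.blogId: blogId -> list of comments.
const commentsByBlogId = new Map();
for (const comment of comments) {
  if (!commentsByBlogId.has(comment.blogId)) commentsByBlogId.set(comment.blogId, []);
  commentsByBlogId.get(comment.blogId).push(comment);
}

// Query 1: fetch the blog. Query 2: fetch its comments through the index.
const blog = blogs.find((b) => b.text === "Destination Moon");
const blogComments = commentsByBlogId.get(blog._id) || [];
console.log(blogComments.map((c) => c.author));
```

Without the index, the second step would have to scan every comment for every blog displayed.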
  33. Many - Many
      Example:
      • Product can be in many categories
      • Category can have many products
  36. Many - Many
      // Each product lists the IDs of the categories
      products:
        { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] }
      // Each category lists the IDs of the products
      categories:
        { _id: 20, name: "adventure", product_ids: [ 10, 11, 12 ] }
      categories:
        { _id: 21, name: "movie", product_ids: [ 10 ] }
      Cuts mapping table and 2 indexes, but:
      • potential consistency issue
      • lists can grow too large
  38. Alternative
      // Each product lists the IDs of the categories
      products:
        { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] }
      // Association not stored on the categories
      categories:
        { _id: 20, name: "adventure" }
      // All products for a given category
      > db.products.ensureIndex( { category_ids: 1 } ) // yes!
      > db.products.find( { category_ids: 20 } )
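
The index on `category_ids` works because an index on an array field holds one entry per array element. The plain JavaScript sketch below imitates that with a `Map`; the second product is hypothetical data added so the lookup returns more than one result.

```javascript
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Explorers on the Moon", category_ids: [20] }, // hypothetical extra product
];

// Stand-in for the index on category_ids: one entry per array element,
// so each category id points at every product that lists it.
const productsByCategory = new Map();
for (const product of products) {
  for (const categoryId of product.category_ids) {
    if (!productsByCategory.has(categoryId)) productsByCategory.set(categoryId, []);
    productsByCategory.get(categoryId).push(product._id);
  }
}

// Equivalent of finding products whose category_ids contains 20.
console.log(productsByCategory.get(20));
```

Because the association lives only on the product side, there is no second list to keep consistent.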
  39. Common Use Cases
      Use cases:
      • Trees
      • Time Series
  40. Trees
      Hierarchical information
  41. Trees
      Full Tree in Document
      { retweet: [
          { who: "Kyle", text: "...",
            retweet: [
              { who: "James", text: "...", retweet: [] } ] } ] }
      Pros: Single Document, Performance, Intuitive
      Cons: Hard to search or update, document can easily get too large
  44. Array of Ancestors
      [Diagram: tree with root A; B and E retweet A; C and D retweet B; F retweets E]
      // Store all ancestors of a node
      { _id: "a" }
      { _id: "b", tree: [ "a" ], retweet: "a" }
      { _id: "c", tree: [ "a", "b" ], retweet: "b" }
      { _id: "d", tree: [ "a", "b" ], retweet: "b" }
      { _id: "e", tree: [ "a" ], retweet: "a" }
      { _id: "f", tree: [ "a", "e" ], retweet: "e" }
      // find all direct retweets of "b"
      > db.tweets.find( { retweet: "b" } )
      // find all retweets of "e" anywhere in tree
      > db.tweets.find( { tree: "e" } )
      // find tweet history of f:
      > tweets = db.tweets.findOne( { _id: "f" } ).tree
      > db.tweets.find( { _id: { $in : tweets } } )
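
The slide shows the stored documents but not how the `tree` array gets filled in. One way to do it, sketched here in plain JavaScript over an in-memory `Map`, is to copy the parent's ancestors and append the parent itself at insert time. `addTweet` is a hypothetical helper; the three queries mirror the ones on the slide.

```javascript
const tweets = new Map(); // _id -> document

// Hypothetical insert helper: a retweet inherits its parent's ancestors plus the parent.
function addTweet(id, parentId) {
  if (parentId === undefined) {
    tweets.set(id, { _id: id }); // root tweet: no ancestors
    return;
  }
  const parent = tweets.get(parentId);
  tweets.set(id, { _id: id, tree: [...(parent.tree || []), parentId], retweet: parentId });
}

addTweet("a");
addTweet("b", "a");
addTweet("c", "b");
addTweet("d", "b");
addTweet("e", "a");
addTweet("f", "e");

const all = [...tweets.values()];
// Direct retweets of "b": match on the parent pointer.
const directOfB = all.filter((t) => t.retweet === "b").map((t) => t._id);
// Everything below "e": match any document whose ancestor array contains "e".
const underE = all.filter((t) => (t.tree || []).includes("e")).map((t) => t._id);
// History of "f": its ancestor array, root first.
const historyOfF = tweets.get("f").tree;
console.log(directOfB, underE, historyOfF);
```

The trade-off is that moving a subtree means rewriting the `tree` array of every descendant.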
  45. Trees as Paths
      [Diagram: tree with nodes A to F]
      Store hierarchy as a path expression
      • Separate each node by a delimiter, e.g. “,”
      • Use text search to find parts of a tree
      • Search must be left-rooted and use an index!
      { retweets: [
          { _id: "a", text: "initial tweet", path: "a" },
          { _id: "b", text: "retweet with comment", path: "a,b" },
          { _id: "c", text: "reply to retweet", path : "a,b,c" } ] }
      // Find the conversations "a" started
      > db.tweets.find( { path: /^a/ } )
      // Find the conversations under a branch
      > db.tweets.find( { path: /^a,b/ } )
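
One caveat with plain prefix patterns: `/^a,b/` also matches a sibling whose id merely starts with "b". The sketch below, in plain JavaScript, compares the slide's pattern with a stricter one that requires the delimiter or the end of the string after the branch. `branchRegex` and the `"bc"` node are hypothetical, and whether the stricter pattern is served from the index as efficiently as a bare prefix is not verified here.

```javascript
const docs = [
  { _id: "a", path: "a" },
  { _id: "b", path: "a,b" },
  { _id: "c", path: "a,b,c" },
  { _id: "bc", path: "a,bc" }, // hypothetical sibling whose id shares a prefix with "b"
];

// Hypothetical helper: left-rooted pattern matching the branch itself or anything below it.
function branchRegex(path) {
  const escaped = path.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  return new RegExp("^" + escaped + "(,|$)");
}

// The slide's pattern: any path starting with the characters "a,b".
const loose = docs.filter((d) => /^a,b/.test(d.path)).map((d) => d._id);
// Stricter pattern: "a,b" must be followed by the delimiter or the end of the path.
const strict = docs.filter((d) => branchRegex("a,b").test(d.path)).map((d) => d._id);
console.log(loose, strict);
```

With single-character ids, as on the slide, the two patterns return the same documents.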
  46. Time Series
      • Records stats by
        • Day, Hour, Minute
      • Show time series
  47. Time Series
      // Time series buckets, hour and minute sub-docs
      { _id: "20111209-1231",
        ts: ISODate("2011-12-09T00:00:00.000Z"),
        daily: 67,
        hourly: { 0: 3, 1: 14, 2: 19 ... 23: 72 },
        minute: { 0: 0, 1: 4, 2: 6 ... 1439: 0 } }
      // Add one to the last minute before midnight
      // (one $inc with both fields: repeating the $inc key in an object keeps only the last one)
      > db.votes.update(
          { _id: "20111209-1231",
            ts: ISODate("2011-12-09T00:00:00.000Z") },
          { $inc: { "hourly.23": 1, "minute.1439": 1 } } )
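  48. BSON Storage
      • Sequence of key/value pairs
      • NOT a hash map
      • Optimized to scan quickly
      [Diagram: minute keys 0 1 2 3 ... 1439 stored one after another]
      What is the cost of updating the minute before midnight?
  49. BSON Storage
      • Can skip sub-documents
      [Diagram: hour keys 0 ... 23, each a sub-document of minutes: 0 1 ... 59 under the first hour, 1380 ... 1439 under the last]
      How could this change the schema?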
  50. Time Series
      Use more of a Tree structure by nesting!
      // Time series buckets, each hour a sub-document
      { _id: "20111209-1231",
        ts: ISODate("2011-12-09T00:00:00.000Z"),
        daily: 67,
        minute: { 0: { 0: 0, 1: 7, ... 59: 2 },
                  ...
                  23: { 0: 15, ... 59: 6 } } }
      // Add one to the last minute before midnight
      > db.votes.update(
          { _id: "20111209-1231",
            ts: ISODate("2011-12-09T00:00:00.000Z") },
          { $inc: { "minute.23.59": 1 } } )
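
A small sketch of the arithmetic behind the two layouts, in plain JavaScript. `incPath` builds the field path to increment for a given timestamp, and `keysSkipped` applies a deliberately simplified cost model (an assumption of this note, not a measurement): reaching a key costs one step per preceding key or sub-document at each level.

```javascript
// Hypothetical helper: field path to $inc for a timestamp, in both layouts.
function incPath(date) {
  const hour = date.getUTCHours();
  const minute = date.getUTCMinutes();
  return {
    flat: `minute.${hour * 60 + minute}`, // one sub-document with keys 0..1439
    nested: `minute.${hour}.${minute}`,   // hour sub-documents with keys 0..59
  };
}

// Simplified cost model: number of keys or sub-documents passed over before the target.
function keysSkipped(date) {
  const hour = date.getUTCHours();
  const minute = date.getUTCMinutes();
  return { flat: hour * 60 + minute, nested: hour + minute };
}

const lastMinute = new Date("2011-12-09T23:59:30.000Z");
console.log(incPath(lastMinute), keysSkipped(lastMinute));
```

Under this model the worst case drops from 1439 skipped keys in the flat layout to 23 + 59 = 82 in the nested one.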
  51. Duplicate data
      Document to represent a shopping order:
      { _id: 1234,
        ts: ISODate("2011-12-09T00:00:00.000Z"),
        customerId: 67,
        total_price: 1050,
        items: [
          { sku: 123, quantity: 2, price: 50, name: "macbook", thumbnail: "macbook.png" },
          { sku: 234, quantity: 1, price: 20, name: "iphone", thumbnail: "iphone.png" },
          ... ] }
      The item information is duplicated in every order that references it.
      Mongo’s flexible schema makes it easy!
  52. Duplicate data
      • Pros:
        • only 1 query to get all information needed to display the order
        • processing on the db is as fast as a BLOB
        • can achieve much higher performance
      • Cons:
        • more storage used ... cheap enough
        • updates are much more complicated ... just consider fields immutable
  53. Summary
      • Basic data design principles stay the same ...
      • But MongoDB is more flexible and brings possibilities
      • embed or duplicate data to speed up operations, cut down the number of collections and indexes
      • watch for documents growing too large
      • make sure to use the proper indexes for querying and sorting
      • schema should feel natural to your application!
  54. download at mongodb.org
      conferences, appearances, and meetups: http://www.10gen.com/events
      Facebook: http://bit.ly/mongofb | Twitter: @mongodb | LinkedIn: http://linkd.in/joinmongo