Emily Stolfo
#mongodbdays
Schema Design
Ruby Engineer/Evangelist, 10gen
@EmStolfo
Agenda
• Working with documents
• Common patterns
• Queries and Indexes
Terminology
RDBMS MongoDB
Database ➜ Database
Table ➜ Collection
Row ➜ Document
Index ➜ Index
Join ➜ Embedded Document
Foreign Key ➜ Reference
Working with Documents
Documents
Provide flexibility and
performance
Example Schema (MongoDB)
Embedding
Example Schema (MongoDB)
Embedding
Linking
Example Schema (MongoDB)
Relational Schema Design
Focuses on data storage
Document Schema Design
Focuses on data use
Schema Design Considerations
• What is a priority?
– High consistency
– High read performance
– High write performance
• How does the application access and manipulate
data?
– Read/Write Ratio
– Types of Queries / Updates
– Data life-cycle and growth
– Analytics (Map Reduce, Aggregation)
Tools for Data Access
• Flexible Schemas
• Embedded data structures
• Secondary Indexes
• Multi-Key Indexes
• Aggregation Framework
– Pipeline operators: $project, $match, $limit,
$skip, $sort, $group, $unwind
• No Joins
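
As a rough illustration of how pipeline stages compose, the sketch below counts books per author with `$unwind` followed by `$group`. The `books` collection name, the second book title, and the `booksPerAuthor` helper are assumptions made for this example; the helper only imitates those two stages in memory so the snippet runs without a server.

```javascript
// Hypothetical sample data standing in for a `books` collection.
const books = [
  { title: "MongoDB: The Definitive Guide", authors: ["Kristina Chodorow", "Mike Dirolf"] },
  { title: "A Second Hypothetical Title", authors: ["Kristina Chodorow"] }
];

// Pipeline as it could be passed to db.books.aggregate(pipeline) in the shell.
const pipeline = [
  { $unwind: "$authors" },
  { $group: { _id: "$authors", count: { $sum: 1 } } }
];

// In-memory imitation of the two stages above (not a general pipeline engine).
function booksPerAuthor(docs) {
  // $unwind: emit one document per element of the authors array.
  const unwound = docs.flatMap(d => d.authors.map(a => ({ ...d, authors: a })));
  // $group: accumulate a count per distinct author value.
  const counts = {};
  for (const d of unwound) {
    counts[d.authors] = (counts[d.authors] || 0) + 1;
  }
  return counts;
}

const perAuthor = booksPerAuthor(books);
```

Because there are no joins, reshaping of this kind happens inside a single collection; anything that spans collections is left to the application.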
Data Manipulation
• Conditional Query Operators
– Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt, $gte
– Vector: $in, $nin, $all, $size
• Atomic Update Operators
– Scalar: $inc, $set, $unset
– Vector: $push, $pop, $pull, $pushAll, $pullAll,
$addToSet
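
A minimal sketch of how an update document combines several of these operators. The field names `checkout_count` and `tags` are hypothetical, and `applyUpdate` is a toy stand-in that handles only top-level fields; against a real server the same `filter` and `update` objects would be handed to an update call instead.

```javascript
// Filter and update documents, e.g. for db.patrons.updateOne(filter, update).
const filter = { _id: "joe" };
const update = {
  $set: { name: "Joe Bookreader" },
  $inc: { checkout_count: 1 },      // hypothetical counter field
  $addToSet: { tags: "member" }     // hypothetical array field
};

// Toy in-memory imitation of $set, $inc and $addToSet (top-level fields only).
function applyUpdate(doc, upd) {
  const out = { ...doc };
  for (const [k, v] of Object.entries(upd.$set || {})) out[k] = v;
  for (const [k, v] of Object.entries(upd.$inc || {})) out[k] = (out[k] || 0) + v;
  for (const [k, v] of Object.entries(upd.$addToSet || {})) {
    const arr = out[k] || [];
    // $addToSet appends only when the value is not already present.
    out[k] = arr.includes(v) ? arr : [...arr, v];
  }
  return out;
}

const once = applyUpdate({ _id: "joe" }, update);
const twice = applyUpdate(once, update);
```

Applying the update twice increments the counter again but leaves the set-like array unchanged, which is the practical difference between `$push` and `$addToSet`.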
Schema Design
Example
Library Management
Application
• Patrons
• Books
• Authors
• Publishers
One to One Relations
example
patron = {
_id: "joe",
name: "Joe Bookreader"
}
address = {
patron_id: "joe",
street: "123 Fake St. ",
city: "Faketon",
state: "MA",
zip: 12345
}
Modeling Patrons
patron = {
_id: "joe",
name: "Joe Bookreader",
address: {
street: "123 Fake St. ",
city: "Faketon",
state: "MA",
zip: 12345
}
}
One to One Relations
• “Contains” relationships are often
embedded.
• Document provides a holistic representation
of objects with embedded entities.
• Optimized read performance.
examples
One To Many Relations
patron = {
_id: "joe",
name: "Joe Bookreader",
join_date: ISODate("2011-10-15"),
addresses: [
{street: "1 Vernon St.", city: "Newton", state: "MA", …},
{street: "52 Main St.", city: "Boston", state: "MA", …},
]
}
Patrons with many addresses
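
If some patron documents keep a single `address` sub-document while others carry an `addresses` array, the application has to cope with both shapes. A minimal sketch, assuming exactly those two field names; the helper `addressesOf` and the patron "ann" are hypothetical.

```javascript
// Return a patron's addresses as an array, whichever shape the document uses.
function addressesOf(patron) {
  if (Array.isArray(patron.addresses)) return patron.addresses; // newer shape
  if (patron.address) return [patron.address];                   // older shape
  return [];                                                     // no address stored
}

const single = { _id: "joe", address: { city: "Faketon", state: "MA" } };
const multi = {
  _id: "ann", // hypothetical second patron
  addresses: [
    { city: "Newton", state: "MA" },
    { city: "Boston", state: "MA" }
  ]
};

const joeCities = addressesOf(single).map(a => a.city);
const annCities = addressesOf(multi).map(a => a.city);
```

Funnelling every read through one helper like this keeps the schema flexibility from leaking into the rest of the code.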
example 2
Publishers and Books
One to Many Relations
Publishers and Books relation
• Publishers put out many books
• Books have one publisher
MongoDB: The Definitive Guide,
By Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O’Reilly Media, CA
Book Data
book = {
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher: {
name: "O’Reilly Media",
founded: "1980",
location: "CA"
}
}
Book Model with Embedded Publisher
publisher = {
name: "O’Reilly Media",
founded: "1980",
location: "CA"
}
book = {
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}
Book Model with Normalized Publisher
publisher = {
_id: "oreilly",
name: "O’Reilly Media",
founded: "1980",
location: "CA"
}
book = {
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly"
}
Link with Publisher _id as a
Reference
publisher = {
name: "O’Reilly Media",
founded: "1980",
location: "CA",
books: [ "123456789", ... ]
}
book = {
_id: "123456789",
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}
Link with Book _ids as a Reference
Where do you put the reference?
• Reference to single publisher on books
– Use when items have unbounded growth (unlimited # of
books)
• Array of books in publisher document
– Optimal when many means a handful of items
– Use when there is a bound on potential growth
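
A small sketch of the first option, with the reference stored on each book. Plain arrays stand in for the `publishers` and `books` collections, and the second book is invented for illustration; the shell equivalent of the lookup is shown in a comment.

```javascript
// Hypothetical in-memory stand-ins for two collections.
const publishers = [{ _id: "oreilly", name: "O'Reilly Media" }];
const books = [
  { _id: "123456789", title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { _id: "987654321", title: "A Second Hypothetical Title", publisher_id: "oreilly" }
];

// Shell equivalent: db.books.find({ publisher_id: publisherId })
function booksByPublisher(publisherId) {
  return books.filter(b => b.publisher_id === publisherId);
}

const oreillyTitles = booksByPublisher("oreilly").map(b => b.title);
```

The publisher document never changes size as books are added, which is why this direction suits unbounded growth; the array-in-publisher option trades that for a single read when the list is short.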
example 3
Books and Patrons
One to Many Relations
Books and Patrons
• Book can be checked out by one Patron at a
time
• Patrons can check out many books (but not
1000s)
patron = {
_id: "joe",
name: "Joe Bookreader",
join_date: ISODate("2011-10-15"),
address: { ... }
}
book = {
_id: "123456789",
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
...
}
Modeling Checkouts
patron = {
_id: "joe",
name: "Joe Bookreader",
join_date: ISODate("2011-10-15"),
address: { ... },
checked_out: [
{ _id: "123456789", checked_out: "2012-10-15" },
{ _id: "987654321", checked_out: "2012-09-12" },
...
]
}
Modeling Checkouts
De-normalization
Provides data locality
patron = {
_id: "joe",
name: "Joe Bookreader",
join_date: ISODate("2011-10-15"),
address: { ... },
checked_out: [
{ _id: "123456789",
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
checked_out: ISODate("2012-10-15")
},
{ _id: "987654321",
title: "MongoDB: The Scaling Adventure", ...
}, ...
]
}
Modeling Checkouts - de-normalized
Referencing vs. Embedding
• Embedding is a bit like pre-joining data
• Document level operations are easy for the
server to handle
• Embed when the “many” objects always
appear with (viewed in the context of) their
parents.
• Reference when you need more flexibility
How does your application access and
manipulate data?
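
To make the "pre-joining" point concrete, the sketch below lists a patron's checked-out titles under both models and counts the simulated round trips. The arrays, the `queries` counter, and both helper functions are assumptions for illustration, not a real driver API.

```javascript
// Hypothetical in-memory stand-in for the books collection.
const booksColl = [
  { _id: "123456789", title: "MongoDB: The Definitive Guide" },
  { _id: "987654321", title: "MongoDB: The Scaling Adventure" }
];

// Same patron, modeled with references and with embedded (de-normalized) data.
const referencedPatron = {
  _id: "joe",
  checked_out: [{ _id: "123456789" }, { _id: "987654321" }]
};
const embeddedPatron = {
  _id: "joe",
  checked_out: [
    { _id: "123456789", title: "MongoDB: The Definitive Guide" },
    { _id: "987654321", title: "MongoDB: The Scaling Adventure" }
  ]
};

let queries = 0; // counts simulated round trips to the server

// Referenced: reading the patron is followed by a second read on books.
function titlesViaReference(patron) {
  queries += 1; // read the patron document
  const ids = patron.checked_out.map(c => c._id);
  queries += 1; // e.g. db.books.find({ _id: { $in: ids } })
  return booksColl.filter(b => ids.includes(b._id)).map(b => b.title);
}

// Embedded: the titles travel with the patron document.
function titlesViaEmbedding(patron) {
  queries += 1; // read the patron document
  return patron.checked_out.map(c => c.title);
}

const refTitles = titlesViaReference(referencedPatron);
const refQueries = queries;
queries = 0;
const embTitles = titlesViaEmbedding(embeddedPatron);
const embQueries = queries;
```

The embedded form saves a read, but a title change would then have to be applied in every patron document that carries a copy.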
example
Many to Many Relations
book = {
title: "MongoDB: The Definitive Guide",
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}
author = {
_id: "kchodorow",
name: "Kristina Chodorow",
hometown: "New York"
}
author = {
_id: "mdirolf",
name: "Mike Dirolf",
hometown: "Albany"
}
Books and Authors
book = {
title: "MongoDB: The Definitive Guide",
authors : [
{ _id: "kchodorow", name: "Kristina Chodorow" },
{ _id: "mdirolf", name: "Mike Dirolf" }
],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}
author = {
_id: "kchodorow",
name: "Kristina Chodorow",
hometown: "New York"
}
author = {
_id: "mdirolf",
name: "Mike Dirolf",
hometown: "Albany"
}
Relation stored in Book
document
book = {
_id: 123456789,
title: "MongoDB: The Definitive Guide",
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}
author = {
_id: "kchodorow",
name: "Kristina Chodorow",
hometown: "Cincinnati",
books: [ {book_id: 123456789, title : "MongoDB: The Definitive Guide" }]
}
Relation stored in Author
document
book = {
_id: 123456789,
title: "MongoDB: The Definitive Guide",
authors: [ "kchodorow", "mdirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}
author = {
_id: "kchodorow",
name: "Kristina Chodorow",
hometown: "New York",
books: [ 123456789, ... ]
}
author = {
_id: "mdirolf",
name: "Mike Dirolf",
hometown: "Albany",
books: [ 123456789, ... ]
}
Relation stored in both
documents
book = {
title: "MongoDB: The Definitive Guide",
authors : [
{ _id: "kchodorow", name: "Kristina Chodorow" },
{ _id: "mdirolf", name: "Mike Dirolf" }
],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}
author = {
_id: "kchodorow",
name: "Kristina Chodorow",
hometown: "New York"
}
db.books.find( { "authors.name": "Kristina Chodorow" } )
Where do you put the reference?
Think about common queries
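
A filter on a dotted path such as `"authors.name"` (the key must be quoted in the shell) matches a book when any element of its `authors` array has that name. The sketch below imitates that matching rule in memory; the second book and the `findByAuthorName` helper are hypothetical.

```javascript
// Hypothetical sample data standing in for the books collection.
const books = [
  {
    title: "MongoDB: The Definitive Guide",
    authors: [
      { _id: "kchodorow", name: "Kristina Chodorow" },
      { _id: "mdirolf", name: "Mike Dirolf" }
    ]
  },
  {
    title: "A Second Hypothetical Title",
    authors: [{ _id: "mdirolf", name: "Mike Dirolf" }]
  }
];

// Shell form: db.books.find({ "authors.name": name })
// In-memory imitation: a document matches if ANY array element matches.
function findByAuthorName(docs, name) {
  return docs.filter(d => d.authors.some(a => a.name === name));
}

const hits = findByAuthorName(books, "Kristina Chodorow");
```

Storing the author name inside the book is what makes this a single query; with ids only, the author would have to be looked up first.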
Where do you put the reference?
Think about indexes
book = {
title: "MongoDB: The Definitive Guide",
authors : [
{ _id: "kchodorow", name: "Kristina Chodorow" },
{ _id: "mdirolf", name: "Mike Dirolf" }
],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}
author = {
_id: "kchodorow",
name: "Kristina Chodorow",
hometown: "New York"
}
db.books.createIndex( { "authors.name": 1 } )
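
An index on a field inside an array is a multikey index: conceptually it holds one entry per array element. The sketch below imitates that idea with a plain `Map`; it is not how the server stores indexes, and the `_id` values, the second book, and `buildAuthorNameIndex` are assumptions for illustration.

```javascript
// Hypothetical sample data standing in for the books collection.
const books = [
  {
    _id: 1,
    title: "MongoDB: The Definitive Guide",
    authors: [
      { _id: "kchodorow", name: "Kristina Chodorow" },
      { _id: "mdirolf", name: "Mike Dirolf" }
    ]
  },
  {
    _id: 2,
    title: "A Second Hypothetical Title",
    authors: [{ _id: "kchodorow", name: "Kristina Chodorow" }]
  }
];

// One index entry per array element, keyed by author name.
function buildAuthorNameIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const author of doc.authors) {
      if (!index.has(author.name)) index.set(author.name, []);
      index.get(author.name).push(doc._id);
    }
  }
  return index;
}

const authorIndex = buildAuthorNameIndex(books);
const kristinaBookIds = authorIndex.get("Kristina Chodorow");
```

A lookup by author name then goes straight to the matching book ids instead of scanning every document.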
Summary
• Schema design is different in MongoDB
• Basic data design principles apply
• Focus on how application accesses and
manipulates data
• Evolve schema to meet changing
requirements
• Application-level logic is important!
Emily Stolfo
#mongodbdays
Thank You
Ruby Engineer/Evangelist, 10gen
@EmStolfo

MongoDB Schema Design

Editor's Notes

  • #6 Flexibility – Ability to represent rich data structures Performance – Benefit from data locality
  • #7 Concrete example of typical blog using a document oriented de-normalized approach
  • #13 Tools for data access
  • #14 Tools for data manipulation
  • #18 Slow to get address data every time you query for a user. Requires an extra operation.
  • #21 A patron may have two addresses. In a relational database you would need a separate table for this; with MongoDB, you simply start storing the address field as an array. Only patrons that have multiple addresses need this schema, so no migration is necessary. Caution: additional application logic is required!
  • #25 Publisher is repeated for every book, data duplication!
  • #26 Publisher is better being a separate entity and having its own collection.
  • #27 Now to create a relation between the two entities, you can choose to reference the publisher from the book document. This is similar to the relational approach for this very same problem.
  • #28 OR: because we are using MongoDB and documents can have arrays you can choose to model the relation by creating and maintaining an array of books within each publisher entity. Careful with mutable, growing arrays. See next slide.
  • #29 Costly for a small number of books because to get the publisher
  • #34 And data locality provides speed
  • #36 tie back to examples, give some concrete scenarios
  • #38 Authors often use pseudonyms for a book even though it’s the same individual. To get books by a particular author: (1) get the author; (2) get the books that have that author id in the array.
  • #39 To get the authors given a book: a single query. To get books by a particular author: (1) get the author id; (2) get the books that have that author id in the array.
  • #40 Getting the titles of books published by an author is a single query. Getting the authors of a book takes two queries: (1) get the book id; (2) query the authors whose books array contains that id.