This document provides an overview of schema design considerations for MongoDB. It discusses modeling different types of relationships like one-to-one, one-to-many, and many-to-many. It also covers topics like embedding vs referencing, indexing, and common patterns like trees and queues. The key message is that schema design should focus on how the data will be used by applications rather than how it is stored, as the document model provides more flexibility compared to relational databases.
13. Schema Design
Considerations
• How do we manipulate the data?
– Dynamic Ad-Hoc Queries
– Atomic Updates
– Map Reduce
• What are the access patterns of the application?
– Read/Write Ratio
– Types of Queries / Updates
– Data life-cycle and growth rate
Schema Design – Gary Murakami
21. One to One Relations
• Mostly the same as the relational approach
• Generally good idea to embed “contains”
relationships
• Document model provides a holistic
representation of objects
Schema Design – Gary Murakami
24. Publishers and Books
• Publishers put out many books
• Books have one publisher
Schema Design – Gary Murakami
25. Book
MongoDB: The Definitive Guide,
By Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O’Reilly Media, CA
Schema Design – Gary Murakami
28. Publisher _id as a Foreign
Key
publisher = {
_id: "oreilly",
name: "O’Reilly Media",
founded: "1980",
location: "CA"
}
book = {
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly"
}
Schema Design – Gary Murakami
29. Book _id as a Foreign Key
publisher = {
name: "O’Reilly Media",
founded: "1980",
location: "CA"
books: [ "123456789", ... ]
}
book = {
_id: "123456789",
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}
Schema Design – Gary Murakami
30. Where Do You Put the Foreign
Key?
• Array of books inside of publisher
– Makes sense when many means a handful of items
– Useful when items have bound on potential growth
• Reference to single publisher on books
– Useful when items have unbounded growth (unlimited # of
books)
• SQL doesn’t give you a choice, no arrays
Schema Design – Gary Murakami
31. Longest “Database Endgame”
Mate
• Retrograde analysis
– Distance to mate (DTM)
– Distance to conversion (DTC)
• Tablebase analysis
– 6 piece – 262 moves, KRNKNN
– 7 piece – 517 moves, so far
• estimate 2015
Schema Design – Gary Murakami
33. Books and Patrons
• Book can be checked out by one Patron at a time
• Patrons can check out many books (but not
1000’s)
Schema Design – Gary Murakami
38. Referencing vs. Embedding
• Embedding is a bit like pre-joined data
• Document-level ops are easy for server to
handle
• Embed when the 'many' objects always appear
with (i.e. viewed in the context of) their parent
• Reference when you need more flexibility
Schema Design – Gary Murakami
48. Modeling Trees
• Parent Links
- Each node is stored as a document
- Contains the id of the parent
• Child Links
- Each node contains the id’s of the children
- Can support graphs (multiple parents / child)
Schema Design – Gary Murakami
52. It’s All About Your
Application
• Programs + Databases = Applications
• Your Schema is the impedance matcher
– Melds programming with MongoDB
– Capitalizes on both programming language & DB
– Maximizes power transfer – best of both working together
– Flexible for development and change
• Programs X MongoDB = Great Big Applications
• Play chess with God
Schema Design – Gary Murakami
53. It’s All About Your
Application
• Programs + Databases = Applications
• Your Schema is the impedance matcher
– Melds programming with MongoDB
– Capitalizes on both programming language & DB
– Maximizes power transfer – best of both working together
– Flexible for development and change
• Programs X MongoDB = Great Big Applications
• Play music with God – AAC / PAC
Schema Design – Gary Murakami
54. #MongoDC
"His pattern indicates
two-dimensional thinking.”
-Spock
Star Trek II:
The Wrath of Khan
Thank You
Gary J. Murakami, Ph.D. www.3dchessfederation.com
Lead Engineer and Ruby Evangelist, 10gen (the MongoDB
company)
Editor's Notes
In the filing cabinet model, the patient’s x-rays, checkups, and allergies are stored in separate drawers and pulled together (like an RDBMS)In the file folder model, we store all of the patient information in a single folder (like MongoDB)
Flexibility – Ability to represent rich data structuresPerformance – Benefit from data locality
Concrete example of typical blog in typical tionaltional normalized form
Concrete example of typical blog using a document oriented de-normalized approach
Tools for data manipulation
Tools for data access
Slow to get address data every time you query for a user. Requires an extra operation.
Publisher is repeated for every book, data duplication!
Publisher is better being a separate entity and having its own collection.
Now to create a relation between the two entities, you can choose to reference the publisher from the book document.This is similar to the relational approach for this very same problem.
OR: because we are using MongoDB and documents can have arrays you can choose to model the relation by creating and maintaining an array of books within each publisher entity.Careful with mutable, growing arrays. See next slide.
Costly for a small number of books because to get the publisher
And data locality provides speed
Book’s kind attribute could be local or loanableNote that we have locations for loanable books but not for localNote that these two separate schemas can co-exist (loanable books / local books are both books)
Kristine to update this graphic at some point
Authors often use pseudonyms for a book even though it’s the same individualTo get books by a particular author: - get the author - get books that have that author id in array