2. Topics
Introduction
• Basic data modeling
• Manipulating data
• Evolving a schema
Common patterns
• Single table inheritance
• One-to-many
• Many-to-many
• Trees
• Queues
3. Benefit of relational
Before relational model
• Data and logic combined
After relational model
• Separation of concerns
• Data model independent of logic
• Logic freed from concerns of data design
MongoDB continues this separation
4. Normalization
Goals
• Avoid anomalies when inserting, updating or
deleting
• Minimize redesign when extending the
schema
• Make the model informative to users
• Avoid bias toward a particular query
In MongoDB
• Similar goals apply
• But rules are different
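The update anomaly in the goals above can be shown with a toy example. The data is hypothetical (two made-up titles sharing one author string), and plain JavaScript objects stand in for documents. No database is involved.

```javascript
// Hypothetical data: the author's name is copied into every book document.
const books = [
  { title: "Book A", author: "E. Hemingway" },
  { title: "Book B", author: "E. Hemingway" }
];

// Update anomaly: correcting only one copy leaves two spellings for one author.
books[0].author = "Ernest Hemingway";
const copiedNames = new Set(books.map(b => b.author));
console.log(copiedNames.size); // 2

// Linking to a single author record means there is only one copy to change.
const authors = new Map([["a1", { name: "E. Hemingway" }]]);
const linkedBooks = [
  { title: "Book A", author_id: "a1" },
  { title: "Book B", author_id: "a1" }
];
authors.get("a1").name = "Ernest Hemingway";
const linkedNames = new Set(linkedBooks.map(b => authors.get(b.author_id).name));
console.log(linkedNames.size); // 1
```

In MongoDB the same trade-off appears as the choice between embedding (copying) and linking, made for each relationship.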
7. Terminology
Relational       MongoDB
Table            Collection
Row              Document
Index            Index
Join             Embedding and linking
Partition        Shard
Partition key    Shard key
8. Collections
• Cheap to create (up to about 24,000 namespaces per database; each collection and each index counts as one)
• Collections don’t have a schema
• Individual documents have a schema
• Common for documents in a collection to
share a schema
• Document schema can evolve
• Consider using multiple related collections
tied together by a naming convention:
• e.g. LogData-2011-02-08
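Here is a minimal sketch of the naming convention. It assumes one collection per UTC day and uses the LogData prefix from the example. `dailyCollectionName` and `recentCollectionNames` are hypothetical helpers, not part of MongoDB.

```javascript
// Derive a per-day collection name such as "LogData-2011-02-08".
function dailyCollectionName(prefix, date) {
  // toISOString() is always UTC: YYYY-MM-DDTHH:mm:ss.sssZ
  const day = date.toISOString().slice(0, 10);
  return `${prefix}-${day}`;
}

// Names for the last `days` days, newest first (e.g. to query or drop old logs).
function recentCollectionNames(prefix, now, days) {
  const names = [];
  for (let i = 0; i < days; i++) {
    const d = new Date(now.getTime() - i * 24 * 60 * 60 * 1000);
    names.push(dailyCollectionName(prefix, d));
  }
  return names;
}

const now = new Date(Date.UTC(2011, 1, 8, 12, 0, 0)); // 2011-02-08 12:00 UTC
console.log(dailyCollectionName("LogData", now)); // LogData-2011-02-08
console.log(recentCollectionNames("LogData", now, 3));
```

In the mongo shell, a computed name can be passed to `db.getCollection(name)`. With this layout, old data can typically be removed by dropping a whole collection instead of deleting documents one by one.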
9. Document basics
• Zero or more elements
• Elements are name/value pairs
• Rich data types for values
• JSON
• BSON
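A small sketch of elements and rich value types, written as a plain JavaScript object. The `added` and `stats` fields are hypothetical. The last lines show one practical difference between the two formats: JSON text has no date type, but BSON does.

```javascript
// Hypothetical document: each top-level name/value pair is one element.
const book = {
  author: "Ernest Hemingway",                 // string
  title: "The Old Man and the Sea",           // string
  tags: ["American Literature", "Sea"],       // array
  added: new Date(Date.UTC(2011, 1, 8)),      // date (hypothetical field)
  stats: { comments_count: 0 }                // embedded document (hypothetical field)
};

// Walk the elements as name/value pairs.
for (const [name, value] of Object.entries(book)) {
  console.log(name, Array.isArray(value) ? "array" : value instanceof Date ? "date" : typeof value);
}

// JSON has no date type, so a JSON round trip turns the Date into a string.
// BSON, the binary format MongoDB stores, keeps dates as a distinct type.
const viaJson = JSON.parse(JSON.stringify(book));
console.log(viaJson.added instanceof Date); // false
console.log(typeof viaJson.added);          // string
```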
11. Experimenting with MongoDB
• Mongo shell
• JavaScript
$ mongo
MongoDB shell version: 1.7.5
connecting to: test
> db.books.find()
{
_id : ObjectId("12345678901234567890abcd"),
author : "Ernest Hemingway",
title : "The Old Man and the Sea"
}
>
13. Rich document advantages
• Holistic representation
• Still easy to manipulate
• Pre-joined for fast retrieval
14. Document size
• Max 4MB in earlier MongoDB versions
• Max 16MB in current versions
• Performance suffers long before a document reaches the maximum size
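An application can watch for growing documents before they hit the limit. The sketch below uses the JSON byte length as a rough proxy for BSON size, which is an assumption. In the mongo shell, `Object.bsonsize(doc)` reports the actual BSON size. The warning threshold and the data are hypothetical.

```javascript
// Assumption: JSON byte length is a rough stand-in for BSON size. Real BSON
// adds type tags and length prefixes and stores numbers in binary, so treat
// the result as an estimate only.
const MAX_DOC_BYTES = 16 * 1024 * 1024; // 16MB limit in current versions
const WARN_FRACTION = 0.1;              // hypothetical threshold: warn at about 1.6MB

function approxSizeBytes(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

function checkDocSize(doc) {
  const size = approxSizeBytes(doc);
  if (size > MAX_DOC_BYTES) return "too large";
  if (size > MAX_DOC_BYTES * WARN_FRACTION) return "warn";
  return "ok";
}

// A book whose embedded comments keep growing (hypothetical data).
const small = { title: "The Old Man and the Sea", comments: [] };
const big = {
  title: "The Old Man and the Sea",
  comments: Array.from({ length: 20000 }, (_, i) => ({ author: "user" + i, text: "x".repeat(100) }))
};
console.log(checkDocSize(small)); // ok
console.log(checkDocSize(big));   // warn
```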
15. Database considerations
• How can we manipulate this data?
• Dynamic queries
• Secondary indexes
• Atomic updates
What are the access patterns?
• Read/write ratio
• Types of updates
• Types of queries
• Data life-cycle
Considerations
• No joins
• Document writes are atomic
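A write to a single document is atomic, so fields that must stay consistent (such as a comments array and a comments_count counter) can be changed together in one update. The sketch builds the update document that would be passed as `db.books.update(query, update)`. `applyUpdate` is a hypothetical in-memory stand-in that handles only top-level `$push` and `$inc`. The server does not implement updates this way.

```javascript
// Update spec: push the comment and bump the counter in the same write.
function addCommentUpdate(comment) {
  return { $push: { comments: comment }, $inc: { comments_count: 1 } };
}

// Tiny in-memory stand-in for applying that update (top-level $push and $inc only).
function applyUpdate(doc, update) {
  const out = { ...doc }; // shallow copy; the input is left unchanged
  for (const [field, value] of Object.entries(update.$push || {})) {
    out[field] = [...(out[field] || []), value];
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    out[field] = (out[field] || 0) + amount;
  }
  return out;
}

let doc = { title: "The Old Man and the Sea", comments: [], comments_count: 0 };
doc = applyUpdate(doc, addCommentUpdate({ author: "Robert", text: "Great book" }));
console.log(doc.comments_count, doc.comments.length); // 1 1
```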
16. Document design
• Design documents that map simply to your
application data
> book = {
author : "Ernest Hemingway",
title : "The Old Man and the Sea",
tags : ["American Literature", "Sea", "Large Fish"]
}
> db.books.insert(book)
>
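"Map simply to your application data" can mean that the stored document is just the application object's fields, with no mapping layer in between. The `Book` class below is hypothetical. `toDocument()` produces the kind of object passed to `db.books.insert()` above.

```javascript
// Hypothetical application class; the stored document mirrors its fields.
class Book {
  constructor(author, title, tags = []) {
    this.author = author;
    this.title = title;
    this.tags = tags;
  }

  // Plain object suitable for db.books.insert(...)
  toDocument() {
    return { author: this.author, title: this.title, tags: [...this.tags] };
  }

  // Rebuild the application object from a stored document.
  static fromDocument(doc) {
    return new Book(doc.author, doc.title, doc.tags || []);
  }
}

const book = new Book("Ernest Hemingway", "The Old Man and the Sea",
  ["American Literature", "Sea", "Large Fish"]);
const doc = book.toDocument();
const restored = Book.fromDocument(doc);
console.log(restored.title); // The Old Man and the Sea
```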
25. Using the extended schema
// create index on nested element
> db.books.ensureIndex({ "comments.author" : 1 })
// find books Robert has commented on
> db.books.find({ "comments.author" : "Robert" })
// find book with most comments
> db.books.find().sort({ "comments_count" : -1 }).limit(1)
// when sorting, check if you need an index
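A query on "comments.author" matches a book if any element of its comments array has that author. The in-memory sketch below mimics this for simple equality. `valuesAtPath` and `matches` are hypothetical helpers, and MongoDB's real matcher covers many more cases (numeric positions, operators, whole-array equality).

```javascript
// Collect every value reachable through a dotted path. When the path crosses
// an array, each element is followed, so "comments.author" yields the author
// of every comment. Numeric positions such as "comments.0.author" are not
// handled by this sketch.
function valuesAtPath(value, parts) {
  if (parts.length === 0) {
    // a final array contributes its elements, like a {tags: "Sea"} match
    return Array.isArray(value) ? value : [value];
  }
  if (Array.isArray(value)) {
    return value.flatMap(item => valuesAtPath(item, parts));
  }
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

// Equality match on a dotted path: true if ANY reachable value is equal.
function matches(doc, path, expected) {
  return valuesAtPath(doc, path.split(".")).some(v => v === expected);
}

// Hypothetical collection contents.
const books = [
  { title: "The Old Man and the Sea",
    tags: ["American Literature", "Sea", "Large Fish"],
    comments: [{ author: "Robert", text: "Great book" }],
    comments_count: 1 },
  { title: "Untitled draft", tags: [], comments: [], comments_count: 0 }
];

// In-memory counterpart of db.books.find({ "comments.author" : "Robert" })
const robertsBooks = books.filter(b => matches(b, "comments.author", "Robert"));
console.log(robertsBooks.length); // 1
```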
26. Watch for full table scans
Examine the query plan
> db.books.find()
.sort({ "comments_count" : -1 }).limit(1).explain()
{
cursor : "BasicCursor",
nscanned : 12345,
nscannedObjects : 12345,
n : 1,
millis : 123,
indexBounds : { }
}
>
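This toy model in plain JavaScript shows why a BasicCursor with nscanned equal to the collection size is a warning sign. It only counts examined documents and is not how MongoDB implements indexes. In the shell, the usual remedy is an index such as `db.books.ensureIndex({ comments_count : -1 })`. On MongoDB versions of this era, explain() should then report a BtreeCursor instead of a BasicCursor.

```javascript
// Hypothetical collection: 1000 books with distinct comment counts
// ((i * 7) % 1000 is a permutation of 0..999 because 7 and 1000 share no factor).
const docs = Array.from({ length: 1000 }, (_, i) => ({ _id: i, comments_count: (i * 7) % 1000 }));

// No index: every document must be examined (like BasicCursor, nscanned = collection size).
function topByFullScan(collection) {
  let best = null;
  let scanned = 0;
  for (const d of collection) {
    scanned++;
    if (best === null || d.comments_count > best.comments_count) best = d;
  }
  return { best, scanned };
}

// Toy "index": document positions ordered by comments_count, descending.
// A real database maintains its index on every write; this one is built once.
function buildDescendingIndex(collection) {
  return collection
    .map((_, i) => i)
    .sort((a, b) => collection[b].comments_count - collection[a].comments_count);
}

// With the index, sort({comments_count: -1}).limit(1) reads only the first entry.
function topByIndex(collection, index) {
  return { best: collection[index[0]], scanned: 1 };
}

const index = buildDescendingIndex(docs);
console.log(topByFullScan(docs).scanned);     // 1000
console.log(topByIndex(docs, index).scanned); // 1
```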
32. One to many: embedded trees
> db.books.find()
{
author : "Ernest Hemingway",
title : "The Old Man and the Sea",
comments : [
{
author : "Robert",
text : "Great book",
replies : [
{
author : "Jim",
text : "I didn't like it"
}
]
}
]
}
>
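With the embedded tree above, a new reply can be pushed in a single update. The target array is addressed with dot notation that includes array positions, for example `comments.0.replies` for the replies to the first comment. `replyArrayPath` and `addReplyUpdate` are hypothetical helpers that build such a path and the update document that would be passed as the second argument of `db.books.update()`.

```javascript
// indexes: position of the comment, then of each nested reply, along the way down.
function replyArrayPath(indexes) {
  let path = "comments";
  for (const i of indexes) path += `.${i}.replies`;
  return path;
}
// replyArrayPath([])     -> "comments"            (a new top-level comment)
// replyArrayPath([0])    -> "comments.0.replies"  (reply to the first comment)
// replyArrayPath([0, 0]) -> "comments.0.replies.0.replies"

function addReplyUpdate(indexes, reply) {
  return { $push: { [replyArrayPath(indexes)]: reply } };
}

console.log(addReplyUpdate([0], { author: "Jim", text: "I didn't like it" }));
```

Positions are resolved when the update runs. If another write removes or reorders earlier elements first, the same path can point at a different comment. This is one of the costs of deep embedding.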
39. Trees
Options:
• Full tree in document
• Parent links
• Child links
• Parent and child links
• Array of ancestors
• Ancestor paths
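The last two options can be sketched with one document per node. Each node stores either its ancestors as an array or its ancestors as a path string. The nodes, ids, and the `ancestors` and `path` field names are hypothetical, and the filtering happens in memory. In the shell, the array form would be an equality match on `ancestors` (which matches any element), and the path form would be an anchored regular expression on `path`.

```javascript
// Hypothetical comment nodes; ancestors are listed root first.
const nodes = [
  { _id: "c1", ancestors: [],           path: "/" },
  { _id: "r1", ancestors: ["c1"],       path: "/c1/" },
  { _id: "r2", ancestors: ["c1", "r1"], path: "/c1/r1/" },
  { _id: "c2", ancestors: [],           path: "/" }
];

// Array of ancestors: descendants are nodes whose ancestors array contains the id.
function descendantsByAncestors(all, id) {
  return all.filter(n => n.ancestors.includes(id)).map(n => n._id);
}

// Ancestor paths: descendants are nodes whose path starts with the node's own
// full path (its path plus its id), i.e. a prefix match such as /^\/c1\//.
function descendantsByPath(all, id) {
  const node = all.find(n => n._id === id);
  if (!node) return [];
  const prefix = node.path + id + "/";
  return all.filter(n => n.path.startsWith(prefix)).map(n => n._id);
}

console.log(descendantsByAncestors(nodes, "c1")); // r1 and r2
console.log(descendantsByPath(nodes, "r1"));      // r2 only
```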
40. Trees: full tree in document
{
comments : [
{ author : "Robert", text : "...",
replies : [
{ author : "Jim", text : "...",
replies : []
}
]
}
]
}
Pros: single document, performance, intuitive
Cons: hard to search, hard to get partial results, document
size limit could be reached
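This sketch illustrates the "hard to search" drawback. A dotted query path such as `comments.replies.author` targets one fixed level of nesting, so finding every comment by one author at any depth is left to application code. The walk below is plain JavaScript over a hypothetical document, and `findByAuthor` is a hypothetical helper.

```javascript
// Hypothetical book using the full-tree-in-document layout.
const book = {
  title: "The Old Man and the Sea",
  comments: [
    { author: "Robert", text: "Great book", replies: [
      { author: "Jim", text: "I didn't like it", replies: [
        { author: "Robert", text: "Hypothetical nested reply", replies: [] }
      ] }
    ] }
  ]
};

// Depth-first walk over comments and all nested replies.
function findByAuthor(comments, author, found = []) {
  for (const c of comments) {
    if (c.author === author) found.push(c.text);
    findByAuthor(c.replies || [], author, found);
  }
  return found;
}

console.log(findByAuthor(book.comments, "Robert").length); // 2
```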
41. Trees: Parent and child links
Parent links
• Each node is stored as a document
• Contains the id of the parent
Child links
• Each node is stored as a document
• Contains the ids of the children
In some cases you might do both
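The two layouts can be compared in plain JavaScript. Arrays stand in for a hypothetical `comments` collection with one document per node. With parent links, a node's children are found by matching the `parent` field (in the shell, something like `db.comments.find({ parent: "c1" })`). With child links, they are read directly from the node. Walking up the tree with parent links takes one lookup per level, and the loop in the hypothetical `ancestorsViaParentLinks` makes that cost visible.

```javascript
// Parent links: each node stores its parent's id.
const withParents = [
  { _id: "c1", parent: null },
  { _id: "r1", parent: "c1" },
  { _id: "r2", parent: "r1" },
  { _id: "r3", parent: "c1" }
];

// Child links: each node stores its children's ids.
const withChildren = [
  { _id: "c1", children: ["r1", "r3"] },
  { _id: "r1", children: ["r2"] },
  { _id: "r2", children: [] },
  { _id: "r3", children: [] }
];

function childrenViaParentLinks(nodes, id) {
  return nodes.filter(n => n.parent === id).map(n => n._id);
}

function childrenViaChildLinks(nodes, id) {
  const node = nodes.find(n => n._id === id);
  return node ? [...node.children] : [];
}

// Each loop iteration corresponds to one more query against the collection.
function ancestorsViaParentLinks(nodes, id) {
  const byId = new Map(nodes.map(n => [n._id, n]));
  const result = [];
  let current = byId.get(id);
  while (current && current.parent !== null) {
    result.push(current.parent);
    current = byId.get(current.parent);
  }
  return result;
}

console.log(childrenViaParentLinks(withParents, "c1"));  // r1 and r3
console.log(ancestorsViaParentLinks(withParents, "r2")); // r1, then c1
```

Storing both links makes both directions cheap, at the price of updating two documents when a node moves.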
47. Summary
• Schema design is different in MongoDB
• Basic principles stay the same
• Use rich documents
• There's more than one right way
• Focus on how your application uses the
data
• Rapidly evolve the schema to meet your
requirements