Advanced Schema Design Patterns

OCTOBER 12, 2017 | BESPOKE | SAN FRANCISCO
#MDBlocal
Advanced Schema
Design Patterns

{ "name": "Daniel Coupal",
"jobs_at_MongoDB": [
{ "job": "Senior Curriculum Engineer",
"from": new Date("2016-11") },
{ "job": "Senior Technical Service Engineer",
"from": new Date("2013-11") }
],
"previous_jobs": [
"Consultant",
"Developer",
"Manager Quality & Tools Team",
"Manager Software Team",
"Tools Developer"
],
"likes": [ "food", "beers", "movies", "MongoDB" ]
}
Who Am I?

The "Gang of Four":
A design pattern systematically names, explains,
and evaluates an important and recurring design
in object-oriented systems
MongoDB systems can also be built using its own
patterns
Pattern

• Enable teams to use a common methodology and vocabulary
when designing schemas for MongoDB
• Give you the ability to model schemas using building blocks
• Less art and more methodology
Why this Talk?

Ensure:
• Good performance
• Scalability
despite constraints ➡
• Hardware
• RAM faster than Disk
• Disk cheaper than RAM
• Network latency
• Reduce costs $$$
• Database Server
• Maximum size for a document
• Atomicity of a write
• Data set
• Size of data
Why do we Create Models?

• Don’t over-design!
• Design for:
• Performance
• Scalability
• Simplicity
However …

WMDB -
World Movie Database
Any events, characters and
entities depicted in this
presentation are fictional.
Any resemblance or similarity to
reality is entirely coincidental

WMDB -
World Movie Database
First iteration
3 collections:
A. movies
B. moviegoers
C. screenings

Our mission, should we decide to accept it, is to
fix this solution, so it can perform well and scale.
As always, should I or anyone in the audience do
it without training, WMDB will disavow any
knowledge of our actions.
This tape will self-destruct in five seconds. Good
luck!
Mission Possible

Categories of Patterns
• Frequency of
Access
• Subset ✓
• Approximation ✓
• Grouping
• Computed ✓
• Overflow
• Bucket
• Representation
• Attribute ✓
• Schema Versioning ✓
• Document Versioning
• Tree
• Pre-Allocation

{
title: "Moonlight",
...
release_USA: "2016/09/02",
release_Mexico: "2017/01/27",
release_France: "2017/02/01",
release_Festival_Mill_Valley:
"2017/10/10"
}
Would need the following indexes:
{ release_USA: 1 }
{ release_Mexico: 1 }
{ release_France: 1 }
...
{ release_Festival_Mill_Valley: 1 }
...
Issue #1: Big Documents, Many Fields
and Many Indexes

Pattern #1: Attribute
{
title: "Moonlight",
...
release_USA: "2016/09/02",
release_Mexico: "2017/01/27",
release_France: "2017/02/01",
release_Festival_Mill_Valley:
"2017/10/10"
}

Problem:
• Lots of similar fields
• Common characteristic to search across those fields together
• Fields present in only a small subset of documents
Use cases:
• Product attributes like ‘color’, ‘size’, ‘dimensions’, ...
• Release dates of a movie in different countries, festivals
Attribute Pattern

Solution:
• Field pairs in an array
Benefits:
• Allows for a non-deterministic list of attributes
• Easy to index
{ "releases.location": 1, "releases.date": 1 }
• Easy to extend with a qualifier, for example:
{ descriptor: "price", qualifier: "euros", value: NumberDecimal("100.00") }
Attribute Pattern - Solution
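A minimal sketch of the conversion, in plain JavaScript with no database involved. It turns the release_* fields of the "Moonlight" document into an array of field pairs that the single index on "releases.location" and "releases.date" could cover. The helper name toAttributePattern and the prefix parameter are assumptions made for illustration; they do not come from the talk.

```javascript
// Sketch only: `toAttributePattern` is a hypothetical helper, not a MongoDB API.
function toAttributePattern(movie, prefix = "release_") {
  const rest = {};      // fields that are not release dates
  const releases = [];  // one { location, date } pair per former field
  for (const [key, value] of Object.entries(movie)) {
    if (key.startsWith(prefix)) {
      releases.push({ location: key.slice(prefix.length), date: value });
    } else {
      rest[key] = value;
    }
  }
  return { ...rest, releases };
}

const movie = {
  title: "Moonlight",
  release_USA: "2016/09/02",
  release_Mexico: "2017/01/27",
  release_France: "2017/02/01",
  release_Festival_Mill_Valley: "2017/10/10",
};

const converted = toAttributePattern(movie);
console.log(JSON.stringify(converted, null, 2));
```

Adding a new country or festival then means pushing one more pair onto the array, with no new field and no new index.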

Possible solutions:
A. Reduce the size of your working set
B. Add more RAM per machine
C. Start sharding or add more shards
Issue #2: Working Set doesn’t fit in RAM

In this example, we can:
• Limit the list of actors and
crew to 20
• Limit the embedded reviews
to the top 20
• …
Pattern #2: Subset

Problem:
• There is a 1-N or N-N relationship, and only a few of the related
documents always need to be shown
• Only infrequently do you need to pull all of the dependent
documents
Use cases:
• Main actors of a movie
• List of reviews or comments
Subset Pattern

Solution:
• Keep duplicates of a small subset of fields in the main collection
Benefits:
• Allows for fast data retrieval and a reduced working set size
• One query brings all the information needed for the "main page"
Subset Pattern - Solution
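A rough sketch of building the subset, with a plain array standing in for the full reviews collection. The names withTopReviews, top_reviews, reviewer and rating are hypothetical, and the sample reviews are made up; the limit of 20 follows the earlier slide.

```javascript
// Sketch only: names and sample data are illustrative assumptions.
function withTopReviews(movie, allReviews, limit = 20) {
  const topReviews = [...allReviews]          // copy so the source list is not reordered
    .sort((a, b) => b.rating - a.rating)      // highest rating first
    .slice(0, limit)                          // keep only what the main page shows
    .map(({ reviewer, rating }) => ({ reviewer, rating })); // duplicate a few fields only
  return { ...movie, top_reviews: topReviews };
}

// Made-up data: 50 reviews with distinct ratings 0..49.
const allReviews = Array.from({ length: 50 }, (_, i) => ({
  movie: "Moonlight",
  reviewer: `user${i}`,
  rating: (i * 7) % 50,
  text: "full review text stays in the reviews collection",
}));

const moviePage = withTopReviews({ title: "Moonlight" }, allReviews);
console.log(moviePage.top_reviews.length);
```

The full reviews stay where they are; the movie document only carries the small duplicate needed for the "main page".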

• How duplication is handled
A. Update both source and target in real time
B. Update target from source at regular intervals. Examples:
• Most popular items => update nightly
• Revenues from a movie => update every hour
• Last 10 reviews => update hourly? daily?
Aspect of Patterns: Consistency

Issue #3: Lots of CPU Usage

{
title: "Your Name",
...
viewings: 5000,
viewers: 385000,
revenues: 5074800
}
Issue #3: … caused by repeated
calculations

For example:
• Apply a sum, count, ...
• Roll up data by minute, hour,
day
• As long as you don’t mess
with your source, you can
recreate the rollups
Pattern #3: Computed

Problem:
• There is data that needs to be computed
• The same calculations would happen over and over
• Reads outnumber writes:
• example: 1K writes per hour vs 1M reads per hour
Use cases:
• Have revenues per movie showing, want to display sums
• Time series data, Event Sourcing
Computed Pattern

Solution:
• Apply a computation or operation on data and store the result
Benefits:
• Avoid re-computing the same thing over and over
• Replaces a view
Computed Pattern - Solution
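A small sketch of the idea, assuming each screening records a viewer count and a revenue figure. The totals use the field names from the "Your Name" example (viewings, viewers, revenues); the functions rollUp and applyScreening, the screening fields, and the numbers are hypothetical.

```javascript
// Sketch only: function names, screening fields and figures are assumptions.

// Recreate the stored totals from the source data (a full rollup).
function rollUp(screenings) {
  const totals = { viewings: 0, viewers: 0, revenues: 0 };
  for (const s of screenings) applyScreening(totals, s);
  return totals;
}

// Fold one new screening into already-stored totals (done once, at write time).
function applyScreening(totals, screening) {
  totals.viewings += 1;
  totals.viewers += screening.viewers;
  totals.revenues += screening.revenue;
  return totals;
}

const screenings = [
  { movie: "Your Name", viewers: 120, revenue: 1440 },
  { movie: "Your Name", viewers: 80, revenue: 960 },
  { movie: "Your Name", viewers: 50, revenue: 600 },
];

// The movie document stores the result, so reads never repeat the sum.
const movieDoc = { title: "Your Name", ...rollUp(screenings) };
applyScreening(movieDoc, { movie: "Your Name", viewers: 10, revenue: 120 });
console.log(movieDoc);
```

Because the screenings themselves are left untouched, rollUp can rebuild the stored totals at any time.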

Issue #4: Lots of Writes
Web page counters
Updates on movie data
Screenings
Other

Issue #4: … for non-critical data

• Only increment once in X
iterations
• Increment by X
Pattern #4: Approximation

Problem:
• Data is difficult to calculate correctly
• May be too expensive to update the document every time to keep
an exact count
• No one gives a damn if the number is exact
Use cases:
• Population of a country
• Web site visits
Approximation Pattern

Solution:
• Fewer stronger writes
Benefits:
• Fewer writes, reducing contention on some documents
Approximation Pattern –
Solution
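One possible reading of "only increment once in X iterations, increment by X" is a probabilistic counter: each hit triggers a write with probability 1/X, and that write adds X. The sketch below assumes this interpretation. The name makeApproximateCounter is hypothetical, and a local variable stands in for the counter stored in the database.

```javascript
// Sketch only: `makeApproximateCounter` is a hypothetical helper.
// `random` is injectable so the behavior can be checked deterministically.
function makeApproximateCounter(x, random = Math.random) {
  let stored = 0; // stands in for the counter field in the document
  let writes = 0; // how many times the document would have been updated
  return {
    hit() {
      // On average one hit in x triggers a write, and that write adds x.
      if (random() < 1 / x) {
        stored += x;
        writes += 1;
      }
    },
    get value() { return stored; },
    get writes() { return writes; },
  };
}

const counter = makeApproximateCounter(10);
for (let i = 0; i < 100000; i++) counter.hit();
// The stored value lands close to the number of hits, with roughly a tenth of the writes.
console.log(counter.value, counter.writes);
```

The counter is right on average rather than exactly, which is the trade this pattern accepts for data nobody needs to be exact.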

• Keeping track of the schema version of a document
Issue #5: Need to change the list of fields
in the documents

Add a field to track the
schema version number, per
document
Does not have to exist for
version 1
Pattern #5: Schema Versioning

Problem:
• Updating the schema of a database is:
• Not atomic
• Long operation
• May not want to update all documents, only do it on updates
Use cases:
• Practically any database that will go to production
Schema Versioning Pattern

Solution:
• Have a field keeping track of the schema version
Benefits:
• Don't need to update all the documents at once
• May not have to update documents until their next modification
Schema Versioning Pattern –
Solution
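A minimal sketch of upgrading documents lazily, one at a time, when the application next touches them. The field name schema_version, the helper upgradeToCurrent, and the version 1 to version 2 change (a single director string becoming a directors array) are all assumptions made up for this example. A missing version field is treated as version 1, as the earlier slide allows.

```javascript
// Sketch only: the field name and the v1 -> v2 change are hypothetical.
const CURRENT_SCHEMA_VERSION = 2;

function upgradeToCurrent(doc) {
  // Version 1 documents may have no version field at all.
  const version = doc.schema_version === undefined ? 1 : doc.schema_version;
  if (version === CURRENT_SCHEMA_VERSION) return doc; // already current
  if (version === 1) {
    const { director, ...rest } = doc;
    return {
      ...rest,
      directors: director ? [director] : [],
      schema_version: 2,
    };
  }
  throw new Error(`Unknown schema_version: ${version}`);
}

const oldDoc = { title: "Movie A", director: "Director A" };
const newDoc = { title: "Movie B", directors: ["Director B"], schema_version: 2 };

console.log(upgradeToCurrent(oldDoc));
console.log(upgradeToCurrent(newDoc));
```

Old and new documents can then coexist in the same collection, and each old one is rewritten only when it is next modified.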

• Bucket
• grouping documents together, to have fewer documents
• Document Versioning
• tracking of content changes in a document
• Outlier
• Avoid letting a few documents drive the design and impact performance for all
• Tree(s)
• Pre-allocation
Other Patterns

• Simple grouping from tables to collections is not optimal
• Learn a common vocabulary for designing schemas with
MongoDB
• Use patterns as "plug-and-play" for your future designs
• Attribute
• Subset
• Computed
• Approximation
• Schema Versioning
Take Aways

A full design example for a
given problem:
• E-commerce site
• Content Management
System
• Social Networking
• Single view
• …
References for complete Solutions

• More patterns in a follow-up to this presentation
• MongoDB in-person training courses on Schema Design
• Upcoming Online course at
MongoDB University:
• https://university.mongodb.com
• M220 Data Modeling
How Can I Learn More About Schema
Design?

daniel.coupal@mongodb.com
Thank You for
using MongoDB!
