MongoDB.local Seattle 2019: Advanced Schema Design Patterns

Advanced Schema Design Patterns
Muthu Chinnasamy, Technical Director – SI Partners, MongoDB

@muthumongo

Why This Talk?
Over ten years with the document model
Use of a common methodology and vocabulary when designing schemas for MongoDB
Ability to model schemas using building blocks
Less art and more methodology

Pattern
The "Gang of Four":
A design pattern systematically names, explains, and evaluates an important and recurring design in object-oriented systems
MongoDB systems can also be built using their own patterns

Why Do We Create Models?
Ensure:
• Good performance
• Scalability
despite constraints
Hardware
• RAM faster than Disk
• Disk cheaper than RAM
• Network latency
• Reduce costs $$$
Database Server
• Maximum size for a document
• Atomicity of a write
Data set
• Size of data

WMDB - World Movie Database
Any events, characters and entities depicted in this presentation are fictional.
Any resemblance or similarity to reality is entirely coincidental.

WMDB - World Movie Database
First iteration
3 collections:
A. movies
B. moviegoers
C. screenings

Our mission, should we decide to accept it,
is to fix this solution, so it can perform well
and scale.
As always, should I or anyone in the
audience do it without training, WMDB will
disavow any knowledge of our actions.
This tape will self-destruct in five seconds.
Good luck!
Mission Possible

Patterns by Category
• Frequency of Access
  • Subset ✔️
  • Approximation ✔️
  • Extended Reference
• Grouping
  • Computed ✔️
  • Bucket
  • Outlier
• Representation
  • Attribute ✔️
  • Schema Versioning ✔️
  • Document Versioning
  • Tree
  • Polymorphism
  • Pre-Allocation

{
title: "Dunkirk",
...
release_USA: "2017/07/23",
release_Mexico: "2017/08/01",
release_France: "2017/08/01",
release_Festival_San_Jose: "2017/07/22"
}
Would need the following indexes:
{ release_USA: 1 }
{ release_Mexico: 1 }
{ release_France: 1 }
...
{ release_Festival_San_Jose: 1 }
...
Issue #1: Big Documents, Many Fields and Many Indexes

Pattern #1: Attribute
{
title: "Dunkirk",
...
release_USA: "2017/07/23",
release_Mexico: "2017/08/01",
release_France: "2017/08/01",
release_Festival_San_Jose: "2017/07/22"
}

Problem:
Lots of similar fields
Those fields share a common characteristic, and we want to search across them together
Fields present in only a small subset of documents
Use cases:
Product attributes like ‘color’, ‘size’, ‘dimensions’, ...
Release dates of a movie in different countries, festivals
Attribute Pattern

Solution:
Field pairs in an array
Benefits:
Allows for a non-deterministic list of attributes
Easy to index
{ "releases.location": 1, "releases.date": 1 }
Easy to extend with a qualifier, for example:
{ descriptor: "price", qualifier: "euros", value: NumberDecimal("100.00") }
Attribute Pattern - Solution
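
A minimal sketch of the "field pairs in an array" idea, in plain JavaScript with no database connection. The helper name toAttributePattern and the field names location and date are assumptions chosen to match the index shown above; the slides do not give the reshaped document.

```javascript
// Hypothetical helper: move every `release_<location>` field into a
// `releases` array of { location, date } pairs (the Attribute pattern).
function toAttributePattern(movie) {
  const result = {};
  const releases = [];
  for (const [key, value] of Object.entries(movie)) {
    if (key.startsWith("release_")) {
      releases.push({ location: key.slice("release_".length), date: value });
    } else {
      result[key] = value;
    }
  }
  result.releases = releases;
  return result;
}

const dunkirk = toAttributePattern({
  title: "Dunkirk",
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22",
});

// A single compound index specification now covers every location,
// including ones added later, instead of one index per release_* field.
const releaseIndex = { "releases.location": 1, "releases.date": 1 };

console.log(JSON.stringify(dunkirk, null, 2));
```

Adding a new country or festival becomes a new array element rather than a new field and a new index.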

Possible solutions:
A. Reduce the size of your working set
B. Add more RAM per machine
C. Start sharding or add more shards
Issue #2: Working Set Doesn’t Fit in RAM

In this example, we can:
Limit the list of actors and crew to 20
Limit the embedded reviews to the top 20
…
Pattern #2: Subset

Solution:
Keep duplicates of a small subset of fields in the main collection
Benefits:
Allows for fast data retrieval and a reduced working set size
One query brings all the information needed for the "main page"
Subset Pattern - Solution
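
One possible way to picture the split, sketched in plain JavaScript. The function name toSubsetPattern, the field names cast, reviews and movie_id, and the idea of a separate "overflow" document are all assumptions for illustration; the slides only state the limit of 20.

```javascript
// Hypothetical split: the main movie document keeps a duplicate of the first
// `limit` cast members and reviews; the complete lists live elsewhere.
function toSubsetPattern(movie, limit = 20) {
  const { cast = [], reviews = [], ...rest } = movie;
  const main = {
    ...rest,
    cast: cast.slice(0, limit),       // assumed already ordered by billing
    reviews: reviews.slice(0, limit), // assumed already ordered by rank
  };
  const overflow = {
    movie_id: movie._id,
    cast,    // complete lists, read only when a detail page needs them
    reviews,
  };
  return { main, overflow };
}

const movie = {
  _id: 1,
  title: "Dunkirk",
  cast: Array.from({ length: 50 }, (_, i) => ({ name: `Actor ${i + 1}` })),
  reviews: Array.from({ length: 300 }, (_, i) => ({ rank: i + 1, text: "..." })),
};

const { main, overflow } = toSubsetPattern(movie);
console.log(main.cast.length, main.reviews.length, overflow.reviews.length);
// prints: 20 20 300
```

The main document stays small enough to keep in the working set, at the cost of duplicated data that has to be kept consistent.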

Question:
Which MongoDB feature introduced in version 3.6 will allow me to
notify an application if the name of an actor is changed?
Quiz A
Subset Pattern

CPU is on fire!
Issue #3: Lots of CPU Usage

{
title: "The Shape of Water",
...
viewings: 5000,
viewers: 385000,
revenues: 5074800
}
Issue #3: ...caused by repeated calculations

For example:
Apply a sum, count, ...
Roll up data by minute, hour, day
As long as you don’t alter your source data, you can recreate the rollups
Pattern #3: Computed

Problem:
There is data that needs to be computed
The same calculations would happen over and over
Reads outnumber writes:
• example: 1K writes per hour vs 1M reads per hour
Use cases:
Have revenues per movie showing, want to display sums
Time series data, Event Sourcing
Computed Pattern

Solution:
Apply a computation or operation on data and store the result
Benefits:
Avoid re-computing the same thing over and over
Computed Pattern - Solution
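
A small sketch of "compute on write, store the result", in plain JavaScript. The function names applyScreening and recomputeTotals and the shape of a screening record are assumptions; only the viewings, viewers and revenues fields come from the slide.

```javascript
// Hypothetical write path: each new screening updates the stored totals,
// so reads never have to re-aggregate all screenings.
function applyScreening(totals, screening) {
  return {
    ...totals,
    viewings: totals.viewings + 1,
    viewers: totals.viewers + screening.viewers,
    revenues: totals.revenues + screening.revenues,
  };
}

// Because the source screenings are untouched, the totals can be rebuilt.
function recomputeTotals(title, screenings) {
  return screenings.reduce(applyScreening, {
    title,
    viewings: 0,
    viewers: 0,
    revenues: 0,
  });
}

const screenings = [
  { viewers: 120, revenues: 1500 },
  { viewers: 80, revenues: 1000 },
];

let totals = recomputeTotals("The Shape of Water", screenings);
totals = applyScreening(totals, { viewers: 50, revenues: 600 });
console.log(totals);
```

With 1K writes and 1M reads per hour, this moves the arithmetic to the rare write and leaves each read as a plain field lookup. Against a real collection, the same increments could be expressed with the $inc update operator.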

Question:
Which Relational Database feature is typically used to mimic the
computed pattern?
Quiz B
Computed Pattern

Issue #4: Lots of Writes
Updates on movie data
Screenings
Other
Web page counters

Issue #4: … For Non Critical Data

Only increment once in X iterations
Increment by X
Pattern #4: Approximation

Problem:
Data is difficult to calculate correctly
May be too expensive to update the document every time to keep an exact count
Exactness of count may not be of high concern
Use cases:
Population of a country
Web site visits
Approximation Pattern

Solution:
Fewer, stronger writes
Benefits:
Fewer writes, reducing contention on some documents
Approximation Pattern – Solution
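
A sketch of "only increment once in X iterations, increment by X", using a random draw to decide when to write. The name makeApproximateCounter, the injectable random parameter and the in-memory storedCount are assumptions for illustration; in practice the callback would issue the database update.

```javascript
// Hypothetical counter: instead of writing +1 on every page view, write +X
// roughly once every X views. `random` is injectable to allow testing.
function makeApproximateCounter(x, applyIncrement, random = Math.random) {
  return function recordView() {
    if (random() < 1 / x) {
      applyIncrement(x); // one "stronger" write standing in for about X views
      return true;
    }
    return false; // no write for this view
  };
}

let storedCount = 0; // stands in for the counter field in the document
let writes = 0;
const recordView = makeApproximateCounter(10, (n) => {
  storedCount += n;
  writes += 1;
});

for (let i = 0; i < 100000; i++) recordView();

// storedCount lands near 100000 while only about a tenth of the views
// caused a write; the exact numbers vary from run to run.
console.log({ storedCount, writes });
```

The count is statistically close rather than exact, which is the trade-off the pattern accepts for non-critical data such as web page counters.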

Keeping track of the schema version of a document
Issue #5: Need to Change the List of Fields in the Documents

Add a field to track the schema version number, per document
Does not have to exist for version 1
Pattern #5: Schema Versioning

Problem:
Updating the schema of a database is:
• Not atomic
• A long operation
You may not want to update all documents up front, only as each one is next updated
Use cases:
Practically any database that will go to production
Schema Versioning Pattern

Solution:
Have a field keeping track of the schema version
Benefits:
Don't need to update all the documents at once
May not have to update documents until their next modification
Schema Versioning Pattern – Solution
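
A sketch of how application code might handle both document shapes at once, in plain JavaScript. The field name schema_version, the function upgradeMovie, and the specific version 1 to version 2 change (release_* fields folded into a releases array) are assumptions; the slides only say that a version field exists and may be absent for version 1.

```javascript
// Hypothetical version-2 shape: `release_*` fields are replaced by a
// `releases` array. Documents without `schema_version` count as version 1.
const CURRENT_SCHEMA_VERSION = 2;

function upgradeMovie(doc) {
  const version = doc.schema_version === undefined ? 1 : doc.schema_version;
  if (version >= CURRENT_SCHEMA_VERSION) return doc; // nothing to do

  const upgraded = { schema_version: CURRENT_SCHEMA_VERSION, releases: [] };
  for (const [key, value] of Object.entries(doc)) {
    if (key.startsWith("release_")) {
      upgraded.releases.push({
        location: key.slice("release_".length),
        date: value,
      });
    } else if (key !== "schema_version") {
      upgraded[key] = value;
    }
  }
  return upgraded;
}

const oldDoc = { title: "Dunkirk", release_USA: "2017/07/23" };
const newDoc = { title: "The Shape of Water", schema_version: 2, releases: [] };

console.log(upgradeMovie(oldDoc));
console.log(upgradeMovie(newDoc) === newDoc); // prints: true
```

Calling such a function on read, and saving the result on the next write, lets old and new documents coexist while the migration happens gradually.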

How duplication is handled
A. Update both source and target in real time
B. Update target from source at regular intervals. Examples:
• Most popular items => update nightly
• Revenues from a movie => update every hour
• Last 10 reviews => update hourly? daily?
Aspect of Patterns: Consistency

What our Patterns did for us
Problem                         Pattern
Messy and Large Documents       Attribute
Too much RAM                    Subset
Too much CPU                    Computed
Too many disk accesses          Approximation
No downtime to upgrade schema   Schema Versioning

• Bucket
  • Grouping documents together, to have fewer documents
• Document Versioning
  • Tracking of content changes in a document
• Outlier
  • Avoid having a few documents drive the design and impact performance for all
• Extended Reference
• Tree(s)
• Polymorphism
• Pre-allocation
Other Patterns

A. Simple grouping from tables to collections is not optimal
B. Learn a common vocabulary for designing schemas with MongoDB
C. Use patterns as "plug-and-play" to improve performance
Takeaways

A full design example for a given problem:
E-commerce site
Content Management System
Social Networking
Single view
…
References for complete Solutions

More patterns in a follow up to this presentation
MongoDB in-person training courses on Schema Design
MongoDB Building With Patterns Blog series
Upcoming online course at MongoDB University:
• https://university.mongodb.com
• Data Modeling
How Can I Learn More About Schema Design?

Question:
Which Pattern is used in the
following document?
{ "name": "Ken W. Alger",
"jobs_at_MongoDB": [
{ "job": "Developer Advocate",
"from": new Date("2018-07") }
],
"previous_jobs": [
"Production Manager",
"Executive Chef",
"Congressional Assistant",
"Entrepreneur"
],
"likes": [ "food", "beers", "movies", "MongoDB" ],
"email": "ken.alger@mongodb.com"
}
Quiz C
Which Pattern is used
