OCTOBER 16, 2017 | MONGODB WEBINAR
Advanced Schema
Design Patterns
#MDBlocal
{ "name": "Daniel Coupal",
"jobs_at_MongoDB": [
{ "job": "Senior Curriculum Engineer",
"from": new Date("2016-11") },
{ "job": "Senior Technical Service Engineer",
"from": new Date("2013-11") }
],
"previous_jobs": [
"Consultant",
"Developer",
"Manager Quality & Tools Team",
"Manager Software Team",
"Tools Developer"
],
"likes": [ "food", "beers", "movies", "MongoDB" ]
}
Who Am I?
The "Gang of Four":
A design pattern systematically names, explains,
and evaluates an important and recurring design
in object-oriented systems
MongoDB systems can also be built using their own patterns
Pattern
• Enable teams to use a common methodology and vocabulary
when designing schemas for MongoDB
• Give you the ability to model schemas using building blocks
• Less art and more methodology
Why this Talk?
Ensure:
• Good performance
• Scalability
despite constraints ➡
• Hardware
• RAM faster than Disk
• Disk cheaper than RAM
• Network latency
• Reduce costs $$$
• Database Server
• Maximum size for a document
• Atomicity of a write
• Data set
• Size of data
Why do we Create Models?
• Don’t over-design!
• Design for:
• Performance
• Scalability
• Simplicity
However …
WMDB -
World Movie Database
Any events, characters and entities depicted in this presentation are fictional.
Any resemblance or similarity to reality is entirely coincidental.
WMDB -
World Movie Database
First iteration
3 collections:
A. movies
B. moviegoers
C. screenings
Our mission, should we decide to accept it, is to
fix this solution, so it can perform well and
scale.
As always, should I or anyone in the audience do
it without training, WMDB will disavow any
knowledge of our actions.
This tape will self-destruct in five seconds. Good
luck!
Mission Possible
Categories of Patterns
• Frequency of Access
• Subset ✓
• Approximation ✓
• Grouping
• Computed ✓
• Overflow
• Bucket
• Representation
• Attribute ✓
• Schema Versioning ✓
• Document Versioning
• Tree
• Pre-Allocation
{
  title: "Moonlight",
  ...
  release_USA: "2016/09/02",
  release_Mexico: "2017/01/27",
  release_France: "2017/02/01",
  release_Festival_Mill_Valley: "2017/10/10"
}
Would need the following indexes:
{ release_USA: 1 }
{ release_Mexico: 1 }
{ release_France: 1 }
...
{ release_Festival_Mill_Valley: 1 }
...
Issue #1: Big Documents, Many Fields and Many Indexes
Pattern #1: Attribute
{
  title: "Moonlight",
  ...
  release_USA: "2016/09/02",
  release_Mexico: "2017/01/27",
  release_France: "2017/02/01",
  release_Festival_Mill_Valley: "2017/10/10"
}
Problem:
• Lots of similar fields
• The fields share a common characteristic, and you want to search across them together
• Fields present in only a small subset of documents
Use cases:
• Product attributes like ‘color’, ‘size’, ‘dimensions’, ...
• Release dates of a movie in different countries, festivals
Attribute Pattern
Solution:
• Field pairs in an array
Benefits:
• Allows for a non-deterministic list of attributes
• Easy to index
{ "releases.location": 1, "releases.date": 1 }
• Easy to extend with a qualifier, for example:
{ descriptor: "price", qualifier: "euros", value: NumberDecimal("100.00") }
Attribute Pattern - Solution
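A minimal sketch of the transformation, written in plain JavaScript so it runs without a database. The helper names `toAttributePattern` and `releasedIn` are hypothetical; the `releases`, `location` and `date` field names are taken from the index shown on the slide.

```javascript
// Hypothetical helper: move every `release_<location>` field into a single
// `releases` array of { location, date } pairs (the Attribute pattern).
function toAttributePattern(movie) {
  const rest = {};
  const releases = [];
  for (const [key, value] of Object.entries(movie)) {
    if (key.startsWith("release_")) {
      releases.push({ location: key.slice("release_".length), date: value });
    } else {
      rest[key] = value;
    }
  }
  return { ...rest, releases };
}

// Hypothetical in-memory stand-in for a query on the `releases` array.
function releasedIn(movie, location) {
  const entry = movie.releases.find((r) => r.location === location);
  return entry ? entry.date : null;
}

const moonlight = {
  title: "Moonlight",
  release_USA: "2016/09/02",
  release_Mexico: "2017/01/27",
  release_France: "2017/02/01",
  release_Festival_Mill_Valley: "2017/10/10",
};

const converted = toAttributePattern(moonlight);
console.log(JSON.stringify(converted, null, 2));
console.log(releasedIn(converted, "Mexico"));
```

Against a real collection, a filter such as `{ releases: { $elemMatch: { location: "USA", date: { $gte: "2016/01/01" } } } }` would be the rough equivalent of `releasedIn`, and it is the kind of query the single compound index above is meant to serve.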
Possible solutions:
A. Reduce the size of your working set
B. Add more RAM per machine
C. Start sharding or add more shards
Issue #2: Working Set doesn’t fit in RAM
In this example, we can:
• Limit the list of actors and
crew to 20
• Limit the embedded reviews
to the top 20
• …
Pattern #2: Subset
Problem:
• There is a 1-N or N-N relationship, and only a few of the related
documents need to be shown every time
• Only infrequently do you need to pull all of the dependent
documents
Use cases:
• Main actors of a movie
• List of reviews or comments
Subset Pattern
Solution:
• Keep duplicates of a small subset of fields in the main collection
Benefits:
• Allows for fast data retrieval and a reduced working set size
• One query brings all the information needed for the "main page"
Subset Pattern - Solution
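One way to picture the split, as a plain JavaScript sketch rather than a definitive implementation. The function `splitForSubset`, the `rating`, `movie_id` and `total_reviews` fields, and the sample data are all assumptions; the limit of 20 comes from the earlier slide.

```javascript
// Limit taken from the slide ("top 20"); everything else here is hypothetical.
const SUBSET_SIZE = 20;

// Returns the document for the main `movies` collection, carrying only the
// top-rated reviews, plus one document per review for a separate collection.
function splitForSubset(movie, subsetSize = SUBSET_SIZE) {
  // Copy before sorting so the input document is not mutated.
  const byRating = [...movie.reviews].sort((a, b) => b.rating - a.rating);
  const mainDoc = {
    ...movie,
    reviews: byRating.slice(0, subsetSize), // duplicated subset
    total_reviews: movie.reviews.length,
  };
  // The full set lives elsewhere and is only read when someone asks for it.
  const reviewDocs = movie.reviews.map((review) => ({
    movie_id: movie._id,
    ...review,
  }));
  return { mainDoc, reviewDocs };
}

// Made-up sample data: 50 reviews with ratings cycling from 1 to 10.
const movie = {
  _id: 1,
  title: "Moonlight",
  reviews: Array.from({ length: 50 }, (_, i) => ({
    reviewer: `user${i + 1}`,
    rating: (i % 10) + 1,
  })),
};

const { mainDoc, reviewDocs } = splitForSubset(movie);
console.log(mainDoc.reviews.length, reviewDocs.length);
```

The reviews in `mainDoc` are duplicates of entries in `reviewDocs`, so the two copies need to be kept in step whenever a review changes.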
Issue #3: Lots of CPU Usage
{
  title: "Your Name",
  ...
  viewings: 5000,
  viewers: 385000,
  revenues: 5074800
}
Issue #3: … caused by repeated calculations
For example:
• Apply a sum, count, ...
• Roll up data by minute, hour, day
• As long as you don’t mess with your source, you can recreate the rollups
Pattern #3: Computed
Problem:
• There is data that needs to be computed
• The same calculations would happen over and over
• Reads outnumber writes:
• example: 1K writes per hour vs 1M reads per hour
Use cases:
• Have revenues per movie showing, want to display sums
• Time series data, Event Sourcing
Computed Pattern
Solution:
• Apply a computation or operation on data and store the result
Benefits:
• Avoid re-computing the same thing over and over
• Replaces a view
Computed Pattern - Solution
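A small sketch of the idea in plain JavaScript: compute the totals once, at write time or on a schedule, and store them so reads do not repeat the work. The `viewings`, `viewers` and `revenues` names come from the earlier slide; the shape of a screening record and the function `computeTotals` are assumptions.

```javascript
// Hypothetical rollup: one pass over the source screenings produces the
// per-movie totals that would be stored on each movie document.
function computeTotals(screenings) {
  const totals = new Map();
  for (const s of screenings) {
    const t = totals.get(s.movie) || { viewings: 0, viewers: 0, revenues: 0 };
    t.viewings += 1;
    t.viewers += s.viewers;
    t.revenues += s.revenue;
    totals.set(s.movie, t);
  }
  return totals;
}

// Made-up source data, for illustration only.
const screenings = [
  { movie: "Your Name", viewers: 100, revenue: 1200 },
  { movie: "Your Name", viewers: 50, revenue: 600 },
  { movie: "Moonlight", viewers: 80, revenue: 960 },
];

const totals = computeTotals(screenings);
for (const [title, t] of totals) {
  console.log(title, t);
}
```

Because the source screenings are left untouched, the stored totals can be thrown away and rebuilt at any time by running the rollup again.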
Issue #4: Lots of Writes
Web page counters
Updates on movie data
Screenings
Other
Issue #4: … for non-critical data
• Only increment once in X iterations
• Increment by X
Pattern #4: Approximation
Problem:
• Data is difficult to calculate correctly
• May be too expensive to update the document every time to keep
an exact count
• No one gives a damn if the number is exact
Use cases:
• Population of a country
• Web site visits
Approximation Pattern
Solution:
• Fewer, stronger writes
Benefits:
• Fewer writes, reducing contention on some documents
Approximation Pattern - Solution
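The "increment by X, once in X iterations" idea can be sketched as follows. This version picks the one-in-X write at random, which is only one way to do it; `makeApproxCounter`, its `write` callback and the injectable `random` parameter are hypothetical names, and the in-memory counter stands in for a database update.

```javascript
// Returns a function to call on every hit. On average only one call in `x`
// issues a write, and that write adds `x`, so the stored total stays close
// to the true count while the number of writes drops by a factor of `x`.
function makeApproxCounter(x, write, random = Math.random) {
  return function recordHit() {
    if (random() < 1 / x) {
      write(x);
    }
  };
}

// In-memory stand-in for the document that holds the page counter.
let storedCount = 0;
let writesIssued = 0;

const recordHit = makeApproxCounter(100, (amount) => {
  storedCount += amount;
  writesIssued += 1;
});

for (let i = 0; i < 100000; i++) {
  recordHit();
}

// storedCount lands near 100000 and writesIssued near 1000; the exact
// values vary from run to run.
console.log({ storedCount, writesIssued });
```

The trade-off is the one the slide names: the stored number is no longer exact, which is acceptable only for data where nobody needs it to be.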
• Keeping track of the schema version of a document
Issue #5: Need to change the list of fields in the documents
Add a field to track the schema version number, per document
Does not have to exist for version 1
Pattern #5: Schema Versioning
Problem:
• Updating the schema of a database is:
• Not atomic
• Long operation
• May not want to update all documents, only do it on updates
Use cases:
• Practically any database that will go to production
Schema Versioning Pattern
Solution:
• Have a field keeping track of the schema version
Benefits:
• Don't need to update all the documents at once
• May not have to update documents until their next modification
Schema Versioning Pattern - Solution
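A sketch of how application code might use such a field, upgrading a document only when it is next touched. The field name `schema_version`, the helpers `schemaVersionOf` and `upgrade`, and the example change itself (a single `director` string in version 1 becoming a `directors` array in version 2) are assumptions made for illustration.

```javascript
const CURRENT_VERSION = 2;

// A document without the field is treated as version 1, as the slide allows.
function schemaVersionOf(doc) {
  return doc.schema_version === undefined ? 1 : doc.schema_version;
}

// Hypothetical migration from version 1 to version 2. Documents that are
// already current are returned unchanged.
function upgrade(doc) {
  if (schemaVersionOf(doc) >= CURRENT_VERSION) {
    return doc;
  }
  const { director, ...rest } = doc;
  return {
    ...rest,
    directors: director === undefined ? [] : [director],
    schema_version: CURRENT_VERSION,
  };
}

const oldDoc = { title: "Moonlight", director: "Barry Jenkins" };
const newDoc = { title: "Your Name", directors: ["Makoto Shinkai"], schema_version: 2 };

console.log(upgrade(oldDoc));
console.log(upgrade(newDoc) === newDoc);
```

Calling `upgrade` on the read or update path means both versions can coexist in the collection, and a document is rewritten only when the application next modifies it.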
• How duplication is handled
A. Update both source and target in real time
B. Update target from source at regular intervals. Examples:
• Most popular items => update nightly
• Revenues from a movie => update every hour
• Last 10 reviews => update hourly? daily?
Aspect of Patterns: Consistency
• Bucket
• Grouping documents together, to have fewer documents
• Document Versioning
• Tracking of content changes in a document
• Outlier
• Avoid letting a few documents drive the design and impact performance for all
• Tree(s)
• Pre-allocation
Other Patterns
#MDBW17
BACK to reality
• Simple grouping from tables to collections is not optimal
• Learn a common vocabulary for designing schemas with
MongoDB
• Use patterns as "plug-and-play" for your future designs
• Attribute
• Subset
• Computed
• Approximation
• Schema Versioning
Takeaways
A full design example for a given problem:
• E-commerce site
• Content Management System
• Social Networking
• Single view
• …
References for complete Solutions
• More patterns in a follow-up to this presentation
• MongoDB in-person training courses on Schema Design
• Upcoming online course at MongoDB University:
• https://university.mongodb.com
• M220 Data Modeling
How Can I Learn More About Schema Design?
daniel.coupal@mongodb.com
Thank You for
using MongoDB!
