Advanced Schema Design Patterns
Anant Srivastava, Sr. Consulting Engineer, MongoDB
{
name : "Anant Srivastava",
company : "MongoDB",
title : "Senior Consulting Engineer",
location : "Plano, TX",
start_date: new Date("2016-02"),
Hats : [ "Troubleshooter", "Trainer", "Developer",
"Advisor" ],
email : "anant.srivastava@mongodb.com"
}
Pattern
The "Gang of Four":
A design pattern systematically names,
explains, and evaluates an important and
recurring design in object-oriented
systems
MongoDB systems can also be built
using its own patterns
• 10+ years with the document
model
• Use of a common methodology
and vocabulary when designing
schemas for MongoDB
• Ability to model schemas using
building blocks
• Less art and more methodology
Why this Talk?
Ensure:
• Good performance
• Scalability
despite constraints:
• Hardware
  • RAM faster than disk
  • Disk cheaper than RAM
  • Network latency
  • Reduce costs $$$
• Database server
  • Maximum size for a document
  • Atomicity of a write
• Data set
  • Size of data
Why do we Create Models?
Any events, characters and
entities depicted in this
presentation are fictional.
Any resemblance or similarity to
reality is entirely coincidental
WMDB -
World Movie Database
First iteration
3 collections:
A. movies
B. moviegoers
C. screenings
WMDB -
World Movie Database
Our mission, should we decide to accept it, is to fix this solution, so it can perform well and scale.
As always, should I or anyone in the audience do it without training, WMDB will disavow any
knowledge of our actions.
This tape will self-destruct in five seconds. Good luck!
Mission Possible
• Frequency of Access
• Subset ✔
• Approximation ✔
• Extended Reference
Patterns by Category
• Grouping
• Computed ✔
• Bucket
• Outlier
• Representation
• Attribute ✔
• Schema Versioning ✔
• Document Versioning
• Tree
• Polymorphism
• Pre-Allocation
{
title: "Dunkirk",
...
release_USA: "2017/07/23",
release_Mexico: "2017/08/01",
release_France: "2017/08/01",
release_Festival_San_Jose:
"2017/07/22"
}
Would need the following indexes:
{ release_USA: 1 }
{ release_Mexico: 1 }
{ release_France: 1 }
...
{ release_Festival_San_Jose: 1 }
...
Issue #1: Big Documents, Many Fields
and Many Indexes
Pattern #1: Attribute
{
title: "Dunkirk",
...
release_USA: "2017/07/23",
release_Mexico: "2017/08/01",
release_France: "2017/08/01",
release_Festival_San_Jose:
"2017/07/22"
}
Problem:
• Lots of similar fields
• Common characteristic to search across those fields
• Fields present in only a small subset of documents
Use cases:
• Product attributes like ‘color’, ‘size’, ‘dimensions’, ...
• Release dates of a movie in different countries, festivals
Attribute Pattern
Solution:
• Field pairs in an array
Benefits:
• Allows for a non-deterministic list of attributes
• Easy to index
{ "releases.location": 1, "releases.date": 1 }
• Easy to extend with a qualifier, for example:
{ descriptor: "price", qualifier: "euros", value: NumberDecimal("100.00") }
Attribute Pattern - Solution
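To make the "field pairs in an array" idea concrete, here is a minimal sketch in plain JavaScript, with no database involved. The helper name `toAttributePattern` and the field names `location` and `date` are assumptions, chosen to match the index shown above; the talk does not prescribe a particular conversion routine.

```javascript
// Hypothetical helper: moves every flat `release_*` field into a single
// `releases` array of { location, date } pairs (the Attribute pattern).
function toAttributePattern(movie) {
  const prefix = "release_";
  const rest = {};
  const releases = [];
  for (const [key, value] of Object.entries(movie)) {
    if (key.startsWith(prefix)) {
      // "release_Festival_San_Jose" becomes location "Festival_San_Jose"
      releases.push({ location: key.slice(prefix.length), date: value });
    } else {
      rest[key] = value; // every other field is copied unchanged
    }
  }
  return { ...rest, releases };
}

const before = {
  title: "Dunkirk",
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22",
};

const after = toAttributePattern(before);
console.log(JSON.stringify(after, null, 2));
```

With this shape, one compound index on `releases.location` and `releases.date` replaces the per-country indexes, and a new country or festival needs no new index.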
Possible solutions:
A. Reduce the size of your working set
B. Add more RAM per machine
C. Start sharding or add more shards
Issue #2: Working Set doesn’t fit in RAM
In this example, we can:
• Limit the list of actors and
crew to 20
• Limit the embedded reviews
to the top 20
Pattern #2: Subset
Problem:
• There is a 1-N or N-N relationship, and only a few of the related
documents need to be shown most of the time
• You only infrequently need to pull all of the dependent
documents
Use cases:
• Main actors of a movie
• List of reviews or comments
Subset Pattern
Solution:
• Keep duplicates of a small subset of fields in the main collection
Benefits:
• Allows for fast data retrieval and a reduced working set size
• One query brings all the information needed for the "main page"
Subset Pattern - Solution
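A rough sketch of the split, in plain JavaScript. The names `applySubsetPattern`, `top_reviews`, `movie_id` and the review fields are hypothetical; the limit of 20 comes from the example above. The main document keeps only the top-rated reviews, while the full list is meant for a separate collection.

```javascript
// Limit taken from the "top 20" example; adjust to the use case.
const SUBSET_SIZE = 20;

// Hypothetical helper: returns the main movie document with an embedded
// subset of reviews, plus the full review list for a separate collection.
function applySubsetPattern(movie, allReviews, limit = SUBSET_SIZE) {
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...allReviews].sort((a, b) => b.rating - a.rating);
  const topReviews = sorted
    .slice(0, limit)
    .map(({ reviewer, rating, text }) => ({ reviewer, rating, text }));
  return {
    movieDoc: { ...movie, top_reviews: topReviews },
    reviewDocs: allReviews.map((r) => ({ ...r, movie_id: movie._id })),
  };
}

// Made-up sample data: 50 reviews with ratings 0..49.
const reviews = Array.from({ length: 50 }, (_, i) => ({
  reviewer: `user${i}`,
  rating: i,
  text: `review ${i}`,
}));

const { movieDoc, reviewDocs } = applySubsetPattern(
  { _id: 1, title: "Dunkirk" },
  reviews
);
console.log(movieDoc.top_reviews.length, reviewDocs.length);
```

The duplicated reviews are the price of the pattern: the main page is served from one small document, and the full list is read only when someone asks for it.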
Issue #3: CPU is on fire...
{
title: "The Shape of Water",
...
viewings: 5000,
viewers: 385000,
revenues: 5074800
}
Issue #3: ... caused by repeated calculations
For example:
• Apply a sum, count, ...
• Roll up data by minute,
hour, day
• As long as you don’t
modify your source data,
you can recreate the
rollups
Pattern #3: Computed
Problem:
• There is data that needs to be computed
• The same calculations would happen over and over
• Reads outnumber writes:
• Example: 1K writes per hour vs 1M reads per hour
Use cases:
• Have revenues per movie showing, want to display sums
• Time series data, Event Sourcing
Computed Pattern
Solution:
• Apply a computation or operation on data and store the result
Benefits:
• Avoid re-computing the same thing over and over
Computed Pattern - Solution
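One way to picture "compute on write, store the result", sketched in plain JavaScript. The function names and the screening fields `viewers` and `revenue` are assumptions, and the numbers are made-up sample data. In MongoDB itself the same running totals could be maintained with the `$inc` update operator.

```javascript
// Applies one new screening to the totals stored on the movie document,
// so readers never have to re-aggregate all screenings.
function recordScreening(movie, screening) {
  return {
    ...movie,
    viewings: (movie.viewings || 0) + 1,
    viewers: (movie.viewers || 0) + screening.viewers,
    revenues: (movie.revenues || 0) + screening.revenue,
  };
}

// Because the source screenings are kept, the rollup can be rebuilt from
// scratch at any time.
function recomputeTotals(movie, screenings) {
  const { viewings, viewers, revenues, ...base } = movie;
  return screenings.reduce((acc, s) => recordScreening(acc, s), base);
}

// Made-up sample screenings.
const screenings = [
  { viewers: 80, revenue: 960 },
  { viewers: 120, revenue: 1500 },
];

let movie = { title: "The Shape of Water" };
for (const s of screenings) {
  movie = recordScreening(movie, s);
}
const rebuilt = recomputeTotals(movie, screenings);
console.log(movie, rebuilt);
```

Each write does a little extra work so that the far more frequent reads only fetch three stored numbers.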
Issue #4: Heavy Writes
Issue #4: … for non-critical data
• Only write once in X
iterations
• Increment by X on that write
Pattern #4: Approximation
Problem:
• Data is difficult to calculate correctly
• May be too expensive to update the document every time to
keep an exact count
• Not critical that the number be exact
Use cases:
• Web site visits
• Time series data
Approximation Pattern
Solution:
• Fewer, larger writes
Benefits:
• Fewer writes
• Reduces contention on some documents
Approximation Pattern –
Solution
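The "once in X iterations, increment by X" idea can be sketched as a randomized counter in plain JavaScript. This is one possible reading of the slide, not the only one (a deterministic "every Xth call" counter would also fit). The names `approximateIncrement` and `recordVisit` and the sample URL are hypothetical; the `random` parameter exists only so the behavior can be tested.

```javascript
// Returns the amount to write for one visit: X with probability 1/X,
// otherwise 0, which means "skip the write".
function approximateIncrement(x, random = Math.random) {
  return random() < 1 / x ? x : 0;
}

// Applies one visit to a counter document and reports whether a write
// would have been issued.
function recordVisit(counterDoc, x, random = Math.random) {
  const inc = approximateIncrement(x, random);
  if (inc === 0) {
    return { doc: counterDoc, wrote: false };
  }
  return { doc: { ...counterDoc, visits: counterDoc.visits + inc }, wrote: true };
}

// Simulate 100,000 visits with X = 10.
let page = { url: "/movies/dunkirk", visits: 0 };
let writes = 0;
for (let i = 0; i < 100000; i++) {
  const result = recordVisit(page, 10);
  page = result.doc;
  if (result.wrote) writes++;
}
// The count is close to 100,000, reached with roughly a tenth of the writes.
console.log(page.visits, writes);
```

The stored count is statistically close to the true count, and the document is written about X times less often, which is where the reduced contention comes from.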
• Keeping track of the schema version of a document
Issue #5: Need to change document fields or
structure
Add a field to track the schema
version number, per document
Does not have to exist for
version 1
Pattern #5: Schema Versioning
Problem:
• Updating the schema of a database is:
• Not atomic
• Long operation
• May not want to update all documents, only do it on updates
Use cases:
• Practically any database that will go to production
Schema Versioning Pattern
Solution:
• Have a field keeping track of the schema version
Benefits:
• Don't need to update all the documents at once
• May not have to update documents until their next modification
Schema Versioning Pattern –
Solution
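A small sketch of how an application might read documents of mixed versions and upgrade them lazily, in plain JavaScript. The field name `schema_version`, the helper `upgradeMovie`, and the example change (a `runtime` string in version 1 becoming a numeric `runtime_minutes` in version 2) are all assumptions made for illustration.

```javascript
const CURRENT_VERSION = 2;

// Hypothetical upgrade: version 1 stored `runtime` as a string such as
// "106 min"; version 2 stores a number in `runtime_minutes`.
function upgradeMovie(doc) {
  // A missing version field means the document is still at version 1.
  const version = doc.schema_version === undefined ? 1 : doc.schema_version;
  if (version >= CURRENT_VERSION) {
    return doc; // already current, nothing to do
  }
  const { runtime, ...rest } = doc;
  return {
    ...rest,
    runtime_minutes: parseInt(runtime, 10),
    schema_version: CURRENT_VERSION,
  };
}

// Sample documents, one per version.
const oldDoc = { title: "Dunkirk", runtime: "106 min" };
const newDoc = {
  title: "The Shape of Water",
  runtime_minutes: 123,
  schema_version: 2,
};

const upgraded = upgradeMovie(oldDoc);
const untouched = upgradeMovie(newDoc);
console.log(upgraded, untouched);
```

Calling the upgrade on read, and saving the result on the next write, lets old and new documents coexist without a single long migration.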
• How duplication is handled
A. Update both source and target in real time
B. Update target from source at regular intervals. Examples:
• Most popular items => update nightly
• Revenues from a movie => update every hour
• Last 10 reviews => update hourly? daily?
Aspect of Patterns: Consistency
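Option B, refreshing the duplicated value at intervals, might look like the following plain JavaScript sketch. The `refreshed_at` bookkeeping field, the `refreshIntervals` table, and the `readSource` callback are hypothetical; the intervals mirror the examples above.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical refresh policy per duplicated field.
const refreshIntervals = {
  popular_items: 24 * HOUR_MS, // nightly
  revenues: HOUR_MS, // every hour
};

// Re-reads a duplicated field from its source only when the copy is older
// than the interval configured for that field.
function refreshIfStale(target, field, readSource, now) {
  const refreshedAt = target.refreshed_at || {};
  const last = refreshedAt[field] || 0;
  if (now - last < refreshIntervals[field]) {
    return target; // the copy is still fresh enough
  }
  return {
    ...target,
    [field]: readSource(),
    refreshed_at: { ...refreshedAt, [field]: now },
  };
}

// Made-up sample: a copy refreshed at time 0, checked after 30 minutes
// and again after 2 hours.
const movieCopy = {
  title: "The Shape of Water",
  revenues: 100,
  refreshed_at: { revenues: 0 },
};
const afterHalfHour = refreshIfStale(movieCopy, "revenues", () => 250, HOUR_MS / 2);
const afterTwoHours = refreshIfStale(movieCopy, "revenues", () => 250, 2 * HOUR_MS);
console.log(afterHalfHour.revenues, afterTwoHours.revenues);
```

How stale a copy may become is a per-field business decision, which is why the interval is kept as data rather than hard-coded.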
What these Patterns did for us
Problem => Pattern
Non-deterministic list of attributes => Attribute
Large documents using memory => Subset
Recomputing same values => Computed
High writes and document contention => Approximation
No downtime to upgrade schema => Schema Versioning
Other Patterns
• Frequency of Access
• Subset ✔
• Approximation ✔
• Extended Reference
• Grouping
• Computed ✔
• Bucket
• Outlier
• Representation
• Attribute ✔
• Schema Versioning ✔
• Document Versioning
• Tree
• Polymorphism
• Pre-Allocation
A. Simple grouping from tables to collections is not optimal
B. Learn a common vocabulary for designing schemas with
MongoDB
C. Use patterns as "plug-and-play" to improve performance
Takeaways
A full design example for a
given problem:
• E-commerce site
• Content Management System
• Social Networking
• Single view
References for complete Solutions
• https://docs.mongodb.com/manual/core/data-modeling-introduction/
• Professional Services / MongoDB in-person training
• Upcoming Online course at
MongoDB University:
• https://university.mongodb.com
• Data Modeling
How Can I Learn More About Schema Design?
Thank you for using MongoDB!
{
name: "Anant Srivastava",
email: "anant.srivastava@mongodb.com"
}
