Advanced Schema Design Patterns

Speaker: Daniel Coupal, Senior Curriculum Engineer, MongoDB
Level: 200 (Intermediate)
Track: Developer
At this point, you may be familiar with the design of MongoDB databases and collections. But what are the frequent patterns you may have to model?

This presentation builds on the knowledge of how to represent common relationships (1-1, 1-N, N-N) in MongoDB. Going further than relationships, it aims to identify a set of common patterns, much as the Gang of Four did for object-oriented design. Finally, it will guide you through the steps of modeling those patterns as MongoDB collections.

What You Will Learn:
- How to create the appropriate MongoDB collections for some of the patterns discussed.
- The different relationships from the relational database world, and how those translate to MongoDB collections.
- The patterns that are frequently seen in developing applications with MongoDB, and a specific vocabulary with which to refer to them. For example, “Subset”, “Attributes” and “Rolled Up” are among the patterns explored.

Published in: Technology

  1. #MDBW17 ADVANCED SCHEMA DESIGN PATTERNS
     Daniel Coupal, Senior Curriculum Engineer
  2. WHO AM I?
     {
       "name": "Daniel Coupal",
       "jobs_at_MongoDB": [
         { "job": "Senior Curriculum Engineer", "from": new Date("2016-11") },
         { "job": "Senior Technical Service Engineer", "from": new Date("2013-11") }
       ],
       "previous_jobs": [ "Consultant", "Developer", "Manager Quality & Tools Team", "Manager Software Team", "Tools Developer" ],
       "likes": [ "food", "beers", "movies", "MongoDB" ]
     }
  3. PATTERN
     • The "Gang of Four": a design pattern systematically names, explains, and evaluates an important and recurring design in object-oriented systems
     • MongoDB systems can also be built using their own patterns
  4. WHY THIS TALK?
     1) Enable teams to use a common methodology and vocabulary when designing schemas for MongoDB
     2) Give you the ability to model schemas using building blocks
     3) Less art and more methodology
  5. WMDB - WORLD MOVIE DATABASE
     Any events, characters and entities depicted in this presentation are fictional. Any resemblance or similarity to reality is entirely coincidental.
  6. WMDB - WORLD MOVIE DATABASE
     First iteration, 3 collections:
     A. movies
     B. moviegoers
     C. screenings
  7. MISSION POSSIBLE
     Our mission, should we decide to accept it, is to fix this solution, so it can perform well and scale. As always, should I or anyone in the audience do it without training, WMDB will disavow any knowledge of our actions. This tape will self-destruct in five seconds. Good luck!
  8. WHY WE CREATE MODELS
     Ensure:
     • Good performance
     • Scalability
     despite a set of constraints ➡
     • Hardware
       ‒ RAM faster than Disk
       ‒ Disk cheaper than RAM
       ‒ Network latency
       ‒ Reduce costs $$$
     • Database Server
       ‒ Maximum size for a document
       ‒ Atomicity of a write
     • Data set
       ‒ Size of data
  9. HOWEVER …
     • Don’t over-design!
     • Design for:
       ‒ Performance
       ‒ Scalability
       ‒ Simplicity
  10. CATEGORIES OF PATTERNS
     • Representation
       ‒ Attribute ✓
       ‒ Tree
       ‒ Pre-Allocation
     • Frequency of access
       ‒ Subset ✓
       ‒ Approximation ✓
     • Grouping
       ‒ Computed ✓
       ‒ Overflow ✓
       ‒ Bucket
  11. ISSUE #1: TOO MANY OPTIONAL FIELDS
     {
       title: "Moonlight",
       ...
       release_USA: "2016/09/02",
       release_Mexico: "2017/01/27",
       release_France: "2017/02/01",
       release_Festival_Mill_Valley: "2017/10/10"
     }
     Would need the following indexes:
     { release_USA: 1 }
     { release_Mexico: 1 }
     { release_France: 1 }
     ...
     { release_Festival_Mill_Valley: 1 }
     ...
  12. PATTERN #1: ATTRIBUTES
     • Easy to index, for example:
       { "releases.location": 1, "releases.date": 1 }
  13. PATTERN #1: ATTRIBUTES
     Problem:
     • Fields present in only a small subset of documents
     • Lots of those fields
     • Common characteristic to search across those fields together
     Use cases:
     • Product attributes like ‘color’, ‘size’, ‘dimensions’, ...
     • Release dates of a movie in different countries, festivals
  14. SUMMARY: ATTRIBUTES
     Solution:
     • Field pairs in an array
     • Easy to extend with a qualifier, for example:
       ‒ {descriptor: "price", qualifier: "euros", value: Decimal(100.00)}
     Benefits:
     • Allows for a non-deterministic list of attributes
     • Easy to index
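As an illustration of the Attributes pattern, the sketch below folds the `release_*` fields from slide 11 into a single `releases` array of field pairs. The `location` and `date` keys follow the index on slide 12; the helper name `toAttributePattern` is hypothetical, and the code is plain JavaScript operating on an in-memory object rather than a live collection.

```javascript
// Hypothetical helper (not from the talk): move every release_<place> field
// into one "releases" array of { location, date } pairs.
function toAttributePattern(movie) {
  const result = {};
  const releases = [];
  for (const [key, value] of Object.entries(movie)) {
    if (key.startsWith("release_")) {
      releases.push({ location: key.slice("release_".length), date: value });
    } else {
      result[key] = value;
    }
  }
  result.releases = releases;
  return result;
}

const movie = {
  title: "Moonlight",
  release_USA: "2016/09/02",
  release_Mexico: "2017/01/27",
  release_France: "2017/02/01",
};

const converted = toAttributePattern(movie);

// A single compound index specification (from slide 12) now covers every
// location, instead of one index per release_<place> field.
const indexSpec = { "releases.location": 1, "releases.date": 1 };

console.log(JSON.stringify(converted, null, 2));
console.log(JSON.stringify(indexSpec));
```

Adding a new country or festival then means appending one array element; no new field and no new index are required.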
  15. ISSUE #2: WORKING SET DOESN’T FIT IN RAM
     Possible solutions:
     A. Reduce the size of your working set
     B. Add more RAM per machine
     C. Start sharding or add more shards
  16. WHY CAN’T WE HAVE MORE RAM?
     Elon Musk is buying all the metal for his colony on Mars
  17. PATTERN #2: SUBSET
     In this example, we can:
     • Limit the list of actors and crew to 20
     • Limit the embedded reviews to the top 20
  18. PATTERN #2: SUBSET
     Problem:
     • There is a 1-N or N-N relationship, and only a few of the related documents always need to be shown
     • Only infrequently do you need to pull all of the dependent documents
     Use cases:
     • Main actors of a movie
     • List of reviews or comments
  19. SUMMARY: SUBSET
     Solution:
     • Keep duplicates of a small subset of fields in the main collection
     Benefits:
     • Allows for fast data retrieval and a reduced working set size
     • One query brings all the information needed for the "main page"
  20. PATTERN ASPECT: CONSISTENCY
     • How duplication is handled
       A. Update both source and target in real time
       B. Update target from source at regular intervals. Examples:
          o Most popular items => update nightly
          o Revenues from a movie => update every hour
          o Last 10 reviews => update hourly? daily?
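A minimal sketch of the Subset pattern, assuming reviews carry a numeric `rating` field used to pick the "top 20" (the talk does not say how reviews are ranked). The names `splitReviews`, `top_reviews` and `movie_id` are hypothetical. The full set of reviews stays in its own collection; the movie document only duplicates the small subset needed for the main page.

```javascript
const SUBSET_SIZE = 20; // limit from slide 17

// Hypothetical helper: build the movie document with an embedded subset,
// plus the documents for a separate, complete "reviews" collection.
function splitReviews(movie, allReviews) {
  // Copy before sorting so the caller's array is left untouched.
  const ranked = [...allReviews].sort((a, b) => b.rating - a.rating);
  return {
    movieDoc: { ...movie, top_reviews: ranked.slice(0, SUBSET_SIZE) },
    reviewDocs: allReviews.map((review) => ({ movie_id: movie._id, ...review })),
  };
}

// 50 made-up reviews with ratings 0..49.
const sampleReviews = Array.from({ length: 50 }, (_, i) => ({
  author: `user${i}`,
  rating: i,
}));

const { movieDoc, reviewDocs } = splitReviews(
  { _id: 1, title: "Moonlight" },
  sampleReviews
);

console.log(movieDoc.top_reviews.length, reviewDocs.length);
```

Re-running `splitReviews` on a schedule would correspond to option B on the consistency slide (update the target from the source at regular intervals); calling it on every new review would correspond to option A.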
  21. ISSUE #3: REPEATED COMPUTATIONS
     {
       title: "Your Name",
       ...
       viewings: 5000,
       viewers: 385000,
       revenues: 5074800
     }
  22. PATTERN #3: COMPUTED
     For example:
     • Apply a sum, count, ...
     • Roll up data by minute, hour, day
     • As long as you don’t mess with your source, you can recreate the rollups
  23. PATTERN #3: COMPUTED
     Problem:
     • There is data that needs to be computed
     • The same calculations would happen over and over
     • Reads outnumber writes:
       ‒ example: 1K writes per hour vs 1M reads per hour
     Use cases:
     • Have revenues per movie showing, want to display sums
     • Time series data, Event Sourcing
  24. SUMMARY: COMPUTED
     Solution:
     • Apply a computation or operation on data and store the result
     Benefits:
     • Avoid re-computing the same thing over and over
     • Replaces a view
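The Computed pattern can be sketched as a rollup over per-screening source records, with the result stored on the movie document. The output field names `viewings`, `viewers` and `revenues` come from slide 21; the shape of a screening record (`viewers`, `revenue`) and the helper `computeTotals` are assumptions made for this example.

```javascript
// Hypothetical rollup: sum per-screening figures into movie-level totals.
function computeTotals(screenings) {
  return screenings.reduce(
    (totals, screening) => ({
      viewings: totals.viewings + 1,
      viewers: totals.viewers + screening.viewers,
      revenues: totals.revenues + screening.revenue,
    }),
    { viewings: 0, viewers: 0, revenues: 0 }
  );
}

// Made-up source data; it is never modified, so the rollup can be rebuilt.
const screenings = [
  { viewers: 100, revenue: 1000 },
  { viewers: 50, revenue: 500 },
  { viewers: 25, revenue: 250 },
];

// The computed totals are stored alongside the movie, so reads do not
// repeat the calculation.
const movieWithTotals = { title: "Your Name", ...computeTotals(screenings) };

console.log(movieWithTotals);
```

Because the source screenings are left untouched, the totals can be recomputed at any time, which matches the point on slide 22 about recreating rollups.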
  25. ISSUE #4: APPROXIMATE VALUES
  26. PATTERN #4: APPROXIMATION
     • Only increment once in X iterations
     • Increment by X
  27. PATTERN #4: APPROXIMATION
     Problem:
     • Data is difficult to calculate correctly
     • May be too expensive to update the document every time to keep an exact count
     • No one gives a damn if the number is exact
     Use cases:
     • Population of a country
     • Web site visits
  28. SUMMARY: APPROXIMATION
     Solution:
     • Fewer, stronger writes
     Benefits:
     • Fewer writes, reducing contention on some documents
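One possible reading of "only increment once in X iterations, increment by X" is a probabilistic counter: each event triggers a write with probability 1/X, and that write adds X. The sketch below assumes this interpretation; `makeApproxCounter` and the `writeFn` callback (standing in for a database update) are hypothetical names.

```javascript
// Hypothetical factory: returns a function to call once per event.
// With probability 1/x it issues a single write that increments by x,
// so the stored value is correct on average while writes drop by a factor of x.
function makeApproxCounter(x, writeFn, random = Math.random) {
  return function recordEvent() {
    if (random() < 1 / x) {
      writeFn(x); // stands in for an "increment by x" database update
      return true;
    }
    return false;
  };
}

// Demo with an in-memory counter instead of a real document.
let stored = 0;
let writes = 0;
const recordVisit = makeApproxCounter(10, (amount) => {
  stored += amount;
  writes += 1;
});

for (let i = 0; i < 100000; i++) {
  recordVisit();
}

// Both values vary from run to run; stored should land near 100000.
console.log(`writes=${writes}, stored=${stored}`);
```

A random trigger needs no shared state between application servers, which a strict "every Xth event" counter would require. The trade-off is that the stored value is only statistically close to the true count.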
  29. ISSUE #5: OUTLIERS DRIVING OUR SOLUTION
     • Trying to model for the worst case
  30. I WANT TO BE AN EXTRA!
     • Not the best way to be noticed
  31. PATTERN #5: OVERFLOW
     Each group of extras is put in a bucket of 1000. If we fill a bucket, we create a new one.
     Also known as the "Justin Bieber" pattern
  32. PATTERN #5: OVERFLOW
     Problem:
     • There is a 1-N relationship
     • N can be embedded or referenced, except for a few outliers
     • The list of references may not even fit into an array
     • You don’t want the outliers to drive your overall design
     Use cases:
     • Some very popular people with a huge list of followers
     • Movie with a ton of actors
  33. SUMMARY: OVERFLOW
     Solution:
     • Have a field marking a document as an outlier
     • Do different queries for the outliers
     Benefits:
     • The design is not driven by a few outliers. However, you will need to handle the outliers on the application side
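A sketch of the Overflow pattern using the bucket size of 1000 from slide 31. The first bucket is embedded in the movie document, additional buckets go into separate overflow documents, and a marker field tells the application which read path to take. The names `has_overflow`, `movie_id`, `bucket`, `buildDocs` and `allExtras` are hypothetical.

```javascript
const BUCKET_SIZE = 1000; // from slide 31

// Hypothetical helper: embed the first bucket, spill the rest into
// overflow documents, and mark the movie as an outlier if any spill occurs.
function buildDocs(movieId, extras) {
  const buckets = [];
  for (let i = 0; i < extras.length; i += BUCKET_SIZE) {
    buckets.push(extras.slice(i, i + BUCKET_SIZE));
  }
  const movieDoc = {
    _id: movieId,
    extras: buckets[0] ?? [],
    has_overflow: buckets.length > 1,
  };
  const overflowDocs = buckets.slice(1).map((names, index) => ({
    movie_id: movieId,
    bucket: index + 1,
    extras: names,
  }));
  return { movieDoc, overflowDocs };
}

// Application-side read: the common case uses only the movie document;
// outliers take a second path through the overflow documents.
function allExtras(movieDoc, overflowDocs) {
  if (!movieDoc.has_overflow) {
    return movieDoc.extras;
  }
  return overflowDocs
    .filter((doc) => doc.movie_id === movieDoc._id)
    .sort((a, b) => a.bucket - b.bucket)
    .reduce((names, doc) => names.concat(doc.extras), [...movieDoc.extras]);
}

const crowd = Array.from({ length: 2500 }, (_, i) => `extra${i}`);
const epic = buildDocs("epic", crowd);
const indie = buildDocs("indie", ["Ann", "Bob", "Cy"]);

console.log(epic.movieDoc.has_overflow, epic.overflowDocs.length);
console.log(indie.movieDoc.has_overflow, indie.overflowDocs.length);
```

Typical movies never touch the overflow path, so the common query stays simple, at the cost of a branch in application code for the outliers.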
  34. OTHER PATTERNS
     • Bucket
     • Pre-allocation
     • Tree(s)
  35. BACK TO REALITY
  36. TAKEAWAYS
     • Simple grouping from tables to collections is not optimal
     • Learn a common vocabulary for designing schemas with MongoDB
     • Use patterns as "plug-and-play" for your future designs
       ‒ Attribute
       ‒ Subset
       ‒ Computed
       ‒ Approximation
       ‒ Overflow
  37. REFERENCES FOR COMPLETE SOLUTIONS
     A full design example for a given problem:
     • E-commerce site
     • Content Management System
     • Social Networking
     • Single view
     • …
  38. HOW CAN I LEARN MORE ABOUT SCHEMA DESIGN?
     • More patterns in the published form of this presentation
     • MongoDB in-person training courses on Schema Design
     • Upcoming online course at MongoDB University:
       ‒ https://university.mongodb.com
       ‒ M220 Data Modeling
  39. THANK YOU FOR USING MONGODB!
     daniel.coupal@mongodb.com
