Advanced Schema Design Patterns
Muthu Chinnasamy
Technical Director – SI Partners, MongoDB
@muthumongo
Why This Talk?
Over ten years with the document model
Use of a common methodology and vocabulary when designing schemas for MongoDB
Ability to model schemas using building blocks
Less art and more methodology
Pattern
The "Gang of Four":
A design pattern systematically names, explains, and evaluates an important and recurring design in object-oriented systems
MongoDB systems can also be built using their own patterns
Why Do We Create Models?
Ensure:
• Good performance
• Scalability
despite constraints
Hardware
• RAM faster than Disk
• Disk cheaper than RAM
• Network latency
• Reduce costs $$$
Database Server
• Maximum size for a document
• Atomicity of a write
Data set
• Size of data
However, Don't Over Design!
WMDB - World Movie Database
Any events, characters and entities depicted in this presentation are fictional. Any resemblance or similarity to reality is entirely coincidental.
WMDB - World Movie Database
First iteration
3 collections:
A. movies
B. moviegoers
C. screenings
Our mission, should we decide to accept it, is to fix this solution, so it can perform well and scale.
As always, should I or anyone in the audience do it without training, WMDB will disavow any knowledge of our actions.
This tape will self-destruct in five seconds.
Good luck!
Mission Possible
• Frequency of Access
  • Subset ✔️
  • Approximation ✔️
  • Extended Reference
• Grouping
  • Computed ✔️
  • Bucket
  • Outlier
• Representation
  • Attribute ✔️
  • Schema Versioning ✔️
  • Document Versioning
  • Tree
  • Polymorphism
  • Pre-Allocation
Patterns by Category
{
  title: "Dunkirk",
  ...
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22"
}
Would need the following indexes:
{ release_USA: 1 }
{ release_Mexico: 1 }
{ release_France: 1 }
...
{ release_Festival_San_Jose: 1 }
...
Issue #1: Big Documents, Many Fields and Many Indexes
Pattern #1: Attribute
{
  title: "Dunkirk",
  ...
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22"
}
Problem:
Lots of similar fields
Common characteristic to search across those fields together
Fields present in only a small subset of documents
Use cases:
Product attributes like ‘color’, ‘size’, ‘dimensions’, ...
Release dates of a movie in different countries, festivals
Attribute Pattern
Solution:
Field pairs in an array
Benefits:
Allows for a non-deterministic list of attributes
Easy to index
{ "releases.location": 1, "releases.date": 1 }
Easy to extend with a qualifier, for example:
{ descriptor: "price", qualifier: "euros", value: NumberDecimal("100.00") }
Attribute Pattern - Solution
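As a rough sketch of the restructuring described on this slide, the plain JavaScript below (no database required) folds the `release_*` fields into an array of field pairs. The array name `releases` and the keys `location` and `date` are taken from the index shown above; the helper name `toAttributePattern` is hypothetical and only for illustration.

```javascript
// Hypothetical helper: fold every "release_<location>" field into a
// "releases" array of { location, date } pairs (the Attribute pattern).
function toAttributePattern(movie) {
  const rest = {};
  const releases = [];
  for (const [key, value] of Object.entries(movie)) {
    if (key.startsWith("release_")) {
      // Strip the prefix so the location becomes a value, not a field name.
      releases.push({ location: key.slice("release_".length), date: value });
    } else {
      rest[key] = value;
    }
  }
  return { ...rest, releases };
}

const dunkirk = {
  title: "Dunkirk",
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22",
};

const converted = toAttributePattern(dunkirk);
console.log(JSON.stringify(converted, null, 2));
```

With this shape, one compound index on `releases.location` and `releases.date` can serve queries that previously needed one index per `release_*` field.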
Possible solutions:
A. Reduce the size of your working set
B. Add more RAM per machine
C. Start sharding or add more shards
Issue #2: Working Set Doesn’t Fit in RAM
WMDB - World Movie Database
First iteration
3 collections:
A. movies
B. moviegoers
C. screenings
In this example, we can:
Limit the list of actors and crew to 20
Limit the embedded reviews to the top 20
…
Pattern #2: Subset
Solution:
Keep duplicates of a small subset of fields in the main collection
Benefits:
Allows for fast data retrieval and a reduced working set size
One query brings all the information needed for the "main page"
Subset Pattern - Solution
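A minimal sketch of how the "main page" document might be assembled, assuming the full cast and review lists live elsewhere and only a small duplicate is embedded. The helper `buildMainDocument`, the field names `cast`, `reviews`, and `rating`, and ranking reviews by rating are all assumptions made for this example; the slides only say to keep the top 20.

```javascript
// Hypothetical helper: embed only a small subset of cast and reviews in the
// main movie document; the complete lists stay in their own collections.
function buildMainDocument(movie, fullCast, fullReviews, limit = 20) {
  // Copy before sorting so the full review list is left untouched.
  const topReviews = [...fullReviews]
    .sort((a, b) => b.rating - a.rating)
    .slice(0, limit);
  return { ...movie, cast: fullCast.slice(0, limit), reviews: topReviews };
}

// Made-up sample data: 50 cast members and 100 reviews rated 1 to 5.
const fullCast = Array.from({ length: 50 }, (_, i) => ({ name: `Actor ${i + 1}` }));
const fullReviews = Array.from({ length: 100 }, (_, i) => ({
  text: `Review ${i + 1}`,
  rating: (i % 5) + 1,
}));

const mainDoc = buildMainDocument({ title: "Dunkirk" }, fullCast, fullReviews);
console.log(mainDoc.cast.length, mainDoc.reviews.length);
```

The trade-off is duplication: the embedded subset has to be refreshed when the source lists change.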
Question:
Which MongoDB feature introduced in version 3.6 will allow me to
notify an application if the name of an actor is changed?
Quiz A
Subset Pattern
CPU is on fire!
Issue #3: Lots of CPU Usage
{
  title: "The Shape of Water",
  ...
  viewings: 5000,
  viewers: 385000,
  revenues: 5074800
}
Issue #3: ...caused by repeated calculations
For example:
Apply a sum, count, ...
Roll up data by minute, hour, day
As long as you don’t mess with your source, you can recreate the rollups
Pattern #3: Computed
Problem:
There is data that needs to be computed
The same calculations would happen over and over
Reads outnumber writes:
• example: 1K writes per hour vs 1M reads per hour
Use cases:
Have revenues per movie showing, want to display sums
Time series data, Event Sourcing
Computed Pattern
Solution:
Apply a computation or operation on data and store the result
Benefits:
Avoid re-computing the same thing over and over
Computed Pattern - Solution
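The idea can be sketched in plain JavaScript: update a stored summary once per write instead of summing on every read. The summary fields `viewings`, `viewers`, and `revenues` follow the example document above; the screening fields `viewers` and `revenue` and the helper names are assumptions for illustration.

```javascript
// Start of a rollup for one movie.
function emptySummary(title) {
  return { title, viewings: 0, viewers: 0, revenues: 0 };
}

// Apply one screening to the stored summary (done at write time).
function applyScreening(summary, screening) {
  return {
    ...summary,
    viewings: summary.viewings + 1,
    viewers: summary.viewers + screening.viewers,
    revenues: summary.revenues + screening.revenue,
  };
}

// Rebuild the rollup from the untouched source data if it is ever lost.
function recomputeSummary(title, screenings) {
  return screenings.reduce(applyScreening, emptySummary(title));
}

// Made-up sample screenings.
const screenings = [
  { viewers: 100, revenue: 1250 },
  { viewers: 80, revenue: 1000 },
  { viewers: 120, revenue: 1500 },
];

const summary = recomputeSummary("The Shape of Water", screenings);
console.log(summary);
```

Reads then fetch the stored summary directly, which pays off when reads far outnumber writes.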
Question:
Which Relational Database feature is typically used to mimic the
computed pattern?
Quiz B
Computed Pattern
Issue #4: Lots of Writes
Updates on movie data
Screenings
Other
Web page counters
Issue #4: … For Non-Critical Data
Only increment once in X iterations
Increment by X
Pattern #4: Approximation
Problem:
Data is difficult to calculate correctly
May be too expensive to update the document every time to keep an exact count
Exactness of count may not be of high concern
Use cases:
Population of a country
Web site visits
Approximation Pattern
Solution:
Fewer, stronger writes: increment by X, but only once in X iterations
Benefits:
Fewer writes, reducing contention on some documents
Approximation Pattern – Solution
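One possible way to "increment by X once in X iterations" without tracking state between requests is to write with probability 1/X. The sketch below assumes that probabilistic variant; the factory `makeApproximateCounter` and the injected `random` parameter are hypothetical, and the in-memory `stored` value stands in for the counter field in the document.

```javascript
// Hypothetical counter: on each hit, write with probability 1/x and add x,
// so the stored value stays close to the true count with about 1/x the writes.
function makeApproximateCounter(x, random = Math.random) {
  let stored = 0; // stands in for the counter field in the document
  let writes = 0; // number of writes that would reach the database
  return {
    hit() {
      if (random() < 1 / x) {
        stored += x;
        writes += 1;
      }
    },
    get value() {
      return stored;
    },
    get writes() {
      return writes;
    },
  };
}

// Deterministic stand-in for Math.random so the demo is repeatable:
// it cycles through 0.05, 0.15, ..., 0.95, so one call in ten is below 0.1.
let calls = 0;
const fakeRandom = () => ((calls++ % 10) + 0.5) / 10;

const pageViews = makeApproximateCounter(10, fakeRandom);
for (let i = 0; i < 1000; i++) {
  pageViews.hit();
}
console.log(pageViews.value, pageViews.writes);
```

With the real `Math.random` the stored value is only approximately right, which is the accepted cost for data where an exact count is not a high concern.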
Keeping track of the schema version of a document
Issue #5: Need to Change the List of Fields in the Documents
Add a field to track the schema version number, per document
Does not have to exist for version 1
Pattern #5: Schema Versioning
Problem:
Updating the schema of a database is:
• Not atomic
• A long operation
• Not always wanted for every document; you may prefer to upgrade each document only when it is next updated
Use cases:
Practically any database that will go to production
Schema Versioning Pattern
Solution:
Have a field keeping track of the schema version
Benefits:
Don't need to update all the documents at once
May not have to update documents until their next modification
Schema Versioning Pattern – Solution
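A small sketch of how application code might handle both versions side by side. The field name `schema_version`, the helper `upgradeMovie`, and the specific version 1 to version 2 change (folding `release_*` fields into a `releases` array) are assumptions chosen for illustration; the slides only specify that a version field exists and may be absent for version 1.

```javascript
const CURRENT_SCHEMA_VERSION = 2;

// Hypothetical upgrade step, applied lazily when a document is read or
// next modified, so old and new shapes can coexist in the collection.
function upgradeMovie(doc) {
  // The version field does not have to exist for version 1 documents.
  const version = doc.schema_version ?? 1;
  if (version >= CURRENT_SCHEMA_VERSION) {
    return doc; // already current, nothing to do
  }
  const upgraded = { schema_version: CURRENT_SCHEMA_VERSION, releases: [] };
  for (const [key, value] of Object.entries(doc)) {
    if (key.startsWith("release_")) {
      upgraded.releases.push({ location: key.slice("release_".length), date: value });
    } else if (key !== "schema_version") {
      upgraded[key] = value;
    }
  }
  return upgraded;
}

const oldDoc = { title: "Dunkirk", release_USA: "2017/07/23" };
const newDoc = { title: "The Shape of Water", schema_version: 2, releases: [] };

const upgradedOld = upgradeMovie(oldDoc);
console.log(upgradedOld);
console.log(upgradeMovie(newDoc) === newDoc);
```

Because the upgrade runs per document, there is no need for a single long migration over the whole collection.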
BACK to Reality
How duplication is handled
A. Update both source and target in real time
B. Update target from source at regular intervals. Examples:
• Most popular items => update nightly
• Revenues from a movie => update every hour
• Last 10 reviews => update hourly? daily?
Aspect of Patterns: Consistency
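Option B can be pictured as a refresh policy per duplicated field, checked by a periodic job. In the sketch below the field names, the helper `dueForRefresh`, and the choice of an hourly interval for the last 10 reviews (the slide leaves hourly versus daily open) are all assumptions.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical policy: how stale each duplicated field is allowed to get.
const refreshPolicy = {
  most_popular_items: 24 * HOUR_MS, // update nightly
  movie_revenues: 1 * HOUR_MS, // update every hour
  last_10_reviews: 1 * HOUR_MS, // assumed hourly; the slide leaves this open
};

// Return the names of derived fields whose last refresh is older than allowed.
// A field that was never refreshed is treated as due.
function dueForRefresh(lastRefreshedAt, policy, now) {
  return Object.keys(policy).filter(
    (name) => now - (lastRefreshedAt[name] ?? 0) >= policy[name]
  );
}

const now = Date.UTC(2019, 0, 1, 12, 0, 0);
const lastRefreshedAt = {
  most_popular_items: now - 2 * HOUR_MS,
  movie_revenues: now - 2 * HOUR_MS,
  last_10_reviews: now - 0.5 * HOUR_MS,
};

const due = dueForRefresh(lastRefreshedAt, refreshPolicy, now);
console.log(due);
```

Option A, updating source and target together in real time, avoids staleness but adds a write to every change of the source.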
What our Patterns did for us
Problem                          Pattern
Messy and Large Documents        Attribute
Too much RAM                     Subset
Too much CPU                     Computed
Too many disk accesses           Approximation
No downtime to upgrade schema    Schema Versioning
• Bucket
  • Grouping documents together, to have fewer documents
• Document Versioning
  • Tracking of content changes in a document
• Outlier
  • Avoid letting a few documents drive the design and impact performance for all
• Extended Reference
• Tree(s)
• Polymorphism
• Pre-allocation
Other Patterns
A. Simple grouping from tables to collections is not optimal
B. Learn a common vocabulary for designing schemas with MongoDB
C. Use patterns as "plug-and-play" to improve performance
Takeaways
A full design example for a given problem:
E-commerce site
Content Management System
Social Networking
Single view
…
References for complete Solutions
More patterns in a follow up to this presentation
MongoDB in-person training courses on Schema Design
MongoDB Building With Patterns Blog series
Upcoming online course at MongoDB University:
• https://university.mongodb.com
• Data Modeling
How Can I Learn More About Schema Design?
Question:
Which Pattern is used in the following document?
{ "name": "Ken W. Alger",
  "jobs_at_MongoDB": [
    { "job": "Developer Advocate",
      "from": new Date("2018-07") }
  ],
  "previous_jobs": [
    "Production Manager",
    "Executive Chef",
    "Congressional Assistant",
    "Entrepreneur"
  ],
  "likes": [ "food", "beers", "movies", "MongoDB" ],
  "email": "ken.alger@mongodb.com"
}
Quiz C
Which Pattern is used
Thank You for Using MongoDB!

MongoDB.local Seattle 2019: Advanced Schema Design Patterns

MongoDB.local Dallas 2019: Advanced Schema Design Patterns
MongoDB.local Dallas 2019: Advanced Schema Design PatternsMongoDB.local Dallas 2019: Advanced Schema Design Patterns
MongoDB.local Dallas 2019: Advanced Schema Design PatternsMongoDB
 
Advanced Schema Design Patterns
Advanced Schema Design PatternsAdvanced Schema Design Patterns
Advanced Schema Design PatternsMongoDB
 
SH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptxSH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptxMongoDB
 
SH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptxSH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptxMongoDB
 
Advanced Schema Design Patterns
Advanced Schema Design PatternsAdvanced Schema Design Patterns
Advanced Schema Design PatternsMongoDB
 
Advanced Schema Design Patterns
Advanced Schema Design PatternsAdvanced Schema Design Patterns
Advanced Schema Design PatternsMongoDB
 
MongoDb Schema Pattern - Kalpit Pandit.pptx
MongoDb Schema Pattern - Kalpit Pandit.pptxMongoDb Schema Pattern - Kalpit Pandit.pptx
MongoDb Schema Pattern - Kalpit Pandit.pptxKalpitPandit1
 
Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Giridhar Addepalli
 
Adtech x Scala x Performance tuning
Adtech x Scala x Performance tuningAdtech x Scala x Performance tuning
Adtech x Scala x Performance tuningYosuke Mizutani
 
Open Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design PatternsOpen Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design PatternsMatthew Kalan
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11CloudExpoEurope
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11aseager
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11aseager
 
Dev tools rendering & memory profiling
Dev tools rendering & memory profilingDev tools rendering & memory profiling
Dev tools rendering & memory profilingOpen Academy
 
Google Chrome DevTools: Rendering & Memory profiling on Open Academy 2013
Google Chrome DevTools: Rendering & Memory profiling on Open Academy 2013Google Chrome DevTools: Rendering & Memory profiling on Open Academy 2013
Google Chrome DevTools: Rendering & Memory profiling on Open Academy 2013Máté Nádasdi
 
Memory efficient java tutorial practices and challenges
Memory efficient java tutorial practices and challengesMemory efficient java tutorial practices and challenges
Memory efficient java tutorial practices and challengesmustafa sarac
 
MongoDB and In-Memory Computing
MongoDB and In-Memory ComputingMongoDB and In-Memory Computing
MongoDB and In-Memory ComputingDylan Tong
 
Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupSri Ambati
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goalskamaelian
 

Similar to MongoDB.local Seattle 2019: Advanced Schema Design Patterns (20)

MongoDB.local Dallas 2019: Advanced Schema Design Patterns
MongoDB.local Dallas 2019: Advanced Schema Design PatternsMongoDB.local Dallas 2019: Advanced Schema Design Patterns
MongoDB.local Dallas 2019: Advanced Schema Design Patterns
 
Advanced Schema Design Patterns
Advanced Schema Design PatternsAdvanced Schema Design Patterns
Advanced Schema Design Patterns
 
SH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptxSH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptx
 
SH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptxSH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptx
 
Advanced Schema Design Patterns
Advanced Schema Design PatternsAdvanced Schema Design Patterns
Advanced Schema Design Patterns
 
Advanced Schema Design Patterns
Advanced Schema Design PatternsAdvanced Schema Design Patterns
Advanced Schema Design Patterns
 
MongoDb Schema Pattern - Kalpit Pandit.pptx
MongoDb Schema Pattern - Kalpit Pandit.pptxMongoDb Schema Pattern - Kalpit Pandit.pptx
MongoDb Schema Pattern - Kalpit Pandit.pptx
 
Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01
 
Adtech x Scala x Performance tuning
Adtech x Scala x Performance tuningAdtech x Scala x Performance tuning
Adtech x Scala x Performance tuning
 
Open Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design PatternsOpen Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design Patterns
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11
 
Dev tools rendering & memory profiling
Dev tools rendering & memory profilingDev tools rendering & memory profiling
Dev tools rendering & memory profiling
 
Google Chrome DevTools: Rendering & Memory profiling on Open Academy 2013
Google Chrome DevTools: Rendering & Memory profiling on Open Academy 2013Google Chrome DevTools: Rendering & Memory profiling on Open Academy 2013
Google Chrome DevTools: Rendering & Memory profiling on Open Academy 2013
 
Memory efficient java tutorial practices and challenges
Memory efficient java tutorial practices and challengesMemory efficient java tutorial practices and challenges
Memory efficient java tutorial practices and challenges
 
MongoDB and In-Memory Computing
MongoDB and In-Memory ComputingMongoDB and In-Memory Computing
MongoDB and In-Memory Computing
 
Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville Meetup
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goals
 

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset


MongoDB.local Seattle 2019: Advanced Schema Design Patterns

  • 1.
  • 2. Advanced Schema Design Patterns Muthu Chinnasamy, Technical Director – SI Partners, MongoDB
  • 3. Muthu Chinnasamy Technical Director – SI Partners, MongoDB @muthumongo
  • 4. Why This Talk? Over ten years with the document model Use of a common methodology and vocabulary when designing schemas for MongoDB Ability to model schemas using building blocks Less art and more methodology
  • 5. Pattern The "Gang of Four": A design pattern systematically names, explains, and evaluates an important and recurring design in object-oriented systems MongoDB systems can also be built using its own patterns
  • 6. Why Do We Create Models? Ensure: • Good performance • Scalability despite constraints Hardware • RAM faster than Disk • Disk cheaper than RAM • Network latency • Reduce costs $$$ Database Server • Maximum size for a document • Atomicity of a write Data set • Size of data
  • 8. WMDB - World Movie Database Any events, characters and entities depicted in this presentation are fictional. Any resemblance or similarity to reality is entirely coincidental
  • 9. WMDB - World Movie Database First iteration 3 collections: A. movies B. moviegoers C. screenings
  • 10. Our mission, should we decide to accept it, is to fix this solution, so it can perform well and scale. As always, should I or anyone in the audience do it without training, WMDB will disavow any knowledge of our actions. This tape will self-destruct in five seconds. Good luck! Mission Possible
  • 11.
  • 12. • Frequency of Access • Subset ✔️ • Approximation ✔️ • Extended Reference Patterns by Category • Grouping • Computed ✔️ • Bucket • Outlier • Representation • Attribute ✔️ • Schema Versioning ✔️ • Document Versioning • Tree • Polymorphism • Pre-Allocation
  • 13. { title: "Dunkirk", ... release_USA: "2017/07/23", release_Mexico: "2017/08/01", release_France: "2017/08/01", release_Festival_San_Jose: "2017/07/22" } Would need the following indexes: { release_USA: 1 } { release_Mexico: 1 } { release_France: 1 } ... { release_Festival_San_Jose: 1 } ... Issue #1: Big Documents, Many Fields and Many Indexes
  • 14. Pattern #1: Attribute { title: "Dunkirk", ... release_USA: "2017/07/23", release_Mexico: "2017/08/01", release_France: "2017/08/01", release_Festival_San_Jose: "2017/07/22" }
  • 15. Problem: Lots of similar fields Common characteristic to search across those fields together Fields present in only a small subset of documents Use cases: Product attributes like ‘color’, ‘size’, ‘dimensions’, ... Release dates of a movie in different countries, festivals Attribute Pattern
  • 16. Solution: Field pairs in an array Benefits: Allow for non deterministic list of attributes Easy to index { "releases.location": 1, "releases.date": 1 } Easy to extend with a qualifier, for example: { descriptor: "price", qualifier: "euros", value: Decimal(100.00) } Attribute Pattern - Solution
  • 17. Possible solutions: A. Reduce the size of your working set B. Add more RAM per machine C. Start sharding or add more shards Issue #2: Working Set Doesn’t Fit in RAM
  • 19. WMDB - World Movie Database First iteration 3 collections: A. movies B. moviegoers C. screenings
  • 20. In this example, we can: Limit the list of actors and crew to 20 Limit the embedded reviews to the top 20 … Pattern #2: Subset
  • 21. Solution: Keep duplicates of a small subset of fields in the main collection Benefits: Allows for fast data retrieval and a reduced working set size One query brings all the information needed for the "main page" Subset Pattern - Solution
  • 22. Question: Which MongoDB feature introduced in version 3.6 will allow me to notify an application if the name of an actor is changed? Quiz A Subset Pattern
  • 23. CPU is on fire! Issue #3: Lot of CPU Usage
  • 24. { title: "The Shape of Water", ... viewings: 5,000 viewers: 385,000 revenues: 5,074,800 } Issue #3: ...caused by repeated calculations
  • 25. For example: Apply a sum, count, ... rollup data by minute, hour, day As long as you don’t mess with your source, you can recreate the rollups Pattern #3: Computed
  • 26. Problem: There is data that needs to be computed The same calculations would happen over and over Reads outnumber writes: • example: 1K writes per hour vs 1M reads per hour Use cases: Have revenues per movie showing, want to display sums Time series data, Event Sourcing Computed Pattern
  • 27. Solution: Apply a computation or operation on data and store the result Benefits: Avoid re-computing the same thing over and over Computed Pattern - Solution
  • 28. Question: Which Relational Database feature is typically used to mimic the computed pattern? Quiz B Computed Pattern
  • 29. Issue #4: Lots of Writes Updates on movie data Screenings Other Web page counters
  • 30. Issue #4: … For Non Critical Data
  • 31. Only increment once in X iterations Increment by X Pattern #4: Approximation
  • 32. Updates on movie data Screenings Other Web page counters
  • 33. Problem: Data is difficult to calculate correctly May be too expensive to update the document every time to keep an exact count Exactness of count may not be of high concern Use cases: Population of a country Web site visits Approximation Pattern
  • 34. Solution: Fewer, stronger writes Benefits: Fewer writes, reducing contention on some documents Approximation Pattern – Solution
  • 35. Keeping track of the schema version of a document Issue #5: Need to Change the List of Fields in the Documents
  • 36. Add a field to track the schema version number, per document Does not have to exist for version 1 Pattern #5: Schema Versioning
  • 37. Problem: Updating the schema of a database is: • Not atomic • Long operation • May not want to update all documents, only do it on updates Use cases: Practically any database that will go to production Schema Versioning Pattern
  • 38. Solution: Have a field keeping track of the schema version Benefits: Don't need to update all the documents at once May not have to update documents until their next modification Schema Versioning Pattern – Solution
  • 40. How duplication is handled A. Update both source and target in real time B. Update target from source at regular intervals. Examples: • Most popular items => update nightly • Revenues from a movie => update every hour • Last 10 reviews => update hourly? daily? Aspect of Patterns: Consistency
  • 41. What our Patterns did for us Problem Pattern Messy and Large Documents Attribute Too much RAM Subset Too much CPU Computed Too many disk accesses Approximation No downtime to upgrade schema Schema Versioning
  • 42. • Bucket • Grouping documents together, to have fewer documents • Document Versioning • Tracking of content changes in a document • Outlier • Avoid letting a few documents drive the design and impact performance for all • External Reference • Tree(s) • Polymorphism • Pre-allocation Other Patterns
  • 43. A. Simple grouping from tables to collections is not optimal B. Learn a common vocabulary for designing schemas with MongoDB C. Use patterns as "plug-and-play" to improve performance Takeaways
  • 44. A full design example for a given problem: E-commerce site Content Management System Social Networking Single view … References for complete Solutions
  • 45. More patterns in a follow up to this presentation MongoDB in-person training courses on Schema Design MongoDB Building With Patterns Blog series Upcoming Online course at MongoDB University: • https://university.mongodb.com • Data Modeling How Can I Learn More About Schema Design?
  • 46. Question: Which Pattern is used in the following document? { "name": "Ken W. Alger", "jobs_at_MongoDB": [ { "job": "Developer Advocate", "from": new Date("2018-07") } ], "previous_jobs": [ "Production Manager", "Executive Chef", "Congressional Assistant", "Entrepreneur" ], "likes": [ "food", "beers", "movies", "MongoDB" ], "email": "ken.alger@mongodb.com" } Quiz C Which Pattern is used
  • 47. Thank You for using MongoDB !
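The Attribute pattern from slides 13 to 16 can be sketched in plain JavaScript, without a database connection. This is a minimal illustration, not the presenter's code: the helper name `toAttributePattern` and the `location`/`date` key names inside each array element are assumptions chosen to match the index shown on slide 16.

```javascript
// Hypothetical helper: moves every release_* field into a "releases" array
// of { location, date } pairs, leaving all other fields untouched.
function toAttributePattern(movie) {
  const prefix = "release_";
  const releases = [];
  const rest = {};
  for (const [key, value] of Object.entries(movie)) {
    if (key.startsWith(prefix)) {
      releases.push({ location: key.slice(prefix.length), date: value });
    } else {
      rest[key] = value;
    }
  }
  return { ...rest, releases };
}

// Document shape taken from slide 13 (elided fields omitted).
const dunkirk = {
  title: "Dunkirk",
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22",
};

const reshaped = toAttributePattern(dunkirk);
console.log(JSON.stringify(reshaped, null, 2));
```

With this shape, the single compound index from slide 16, { "releases.location": 1, "releases.date": 1 }, replaces the one-index-per-field list, and a query for one location and date would typically use `$elemMatch` on `releases`.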

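The Schema Versioning pattern from slides 36 to 38 can be sketched as an upgrade applied when the application reads a document. This is only a sketch under stated assumptions: the field name `schema_version`, the function `upgradeMovie`, and the version 1 to version 2 change (a single `director` string becoming a `directors` array) are all hypothetical and do not come from the slides.

```javascript
// Hypothetical field names and document shapes, for illustration only.
const CURRENT_SCHEMA_VERSION = 2;

// Upgrade a single document to the current shape when the application reads it.
function upgradeMovie(doc) {
  // The version field does not have to exist for version 1 documents.
  const version = doc.schema_version ?? 1;
  if (version === CURRENT_SCHEMA_VERSION) {
    return doc; // already current, nothing to do
  }
  if (version === 1) {
    // Assumed v1 -> v2 change: a "director" string becomes a "directors" array.
    const { director, ...rest } = doc;
    return {
      ...rest,
      directors: director === undefined ? [] : [director],
      schema_version: 2,
    };
  }
  throw new Error(`Unknown schema_version: ${version}`);
}

const oldDoc = { title: "Dunkirk", director: "Christopher Nolan" };
const newDoc = {
  title: "The Shape of Water",
  directors: ["Guillermo del Toro"],
  schema_version: 2,
};

console.log(upgradeMovie(oldDoc));
console.log(upgradeMovie(newDoc));
```

Because each document carries its own version, old and new shapes can coexist in the collection, and the upgraded document can be written back at its next modification rather than in one long migration.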
Editor's Notes

  1. Welcome. We may not have time for questions; however, come see me at the end.
  2. 10 Years, Vocabulary, Building Blocks, "Art", => Why create models? We use this content in our internal trainings. The goal is not to teach you about doing schema design. I am expecting you to either have done some with MongoDB or with a Relational Database My goal is to help you formalize the process of creating schemas for MongoDB, help you work in a team by sharing visuals, vocabulary
  3. Building blocks, Some patterns, => Same for MongoDB. The "Gang of Four" (GoF) are the authors of the book "Design Patterns": Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides (https://en.wikipedia.org/wiki/Design_Patterns). Key words are "Elements of Reusable Software". They assembled their experience of designing and implementing software over the years and found that a lot of the solutions shared some "patterns". Examples of patterns from "Design Patterns", by type: Creational (5), Structural (7), Behavioral (11). Singleton (restrict the creation to a single object for a given class); Observer (number of objects to see an event); Command (user operation); Decorator (embellishing a UI element); Memento (ability to restore an object to a previous state); … So, they went and made a catalog of those "patterns". The idea is to enable people who write software to share a common language and have building blocks for solutions.
  4. Performance & scalability, "air" Before we get going, let's just answer why we create models. In a perfect world, you don't really have to model. I mean if everything is super fast and resources are abundant, you really don't care where and how data is stored Every day I get up I don't make plans on how I will breathe air. However if you go to space or under water, you will need a "design" that will let you get the amount of air you need.
  5. Design is optional, cost of developer, 5 or 10 shards. If performance is not an issue, meaning you have resources to spare, then you are likely to model for simplicity. The reason is that software engineers are very expensive. You may not think so, but your manager does. If you need to shard the database, it is likely that performance is very important. Why use 10 shards if you can halve the number of operations (reads and writes) and do the same work with 5 shards?
  6. Fictional site, Entities In order to illustrate this talk, let's assume there is a fictional site called the "World Movie Database". This site is so popular that everyone goes there on Thursdays before the release of new movies and it crashes the site. Then some people tried to migrate the site to a NoSQL database, MongoDB obviously.
  7. Collections, grouping not optimal => accept challenge. This is the first attempt to move the schema from Relational to MongoDB. There are 3 collections: movies, moviegoers and screenings. Simply grouping entities into collections is not optimal. The solution using this design did not perform much better than the previous one. This is still normalized. When you remove this restriction, duplication is fine, 1-1 relationships are fine. You open the door to some important transformations. Those will be our patterns. [NOTE] Use "Sync Visibility" once you activate the color layer to also see it in the PNG file.
  8. Perform & Scale, without training. Our goal, needless to say, is to fix this website before it meets the same fate as this tape recorder.
  9. Some heroes need a lot of gadgets to achieve their mission. We will do ours with a very powerful one: patterns.
  10. Categories, top 5 most frequent patterns We will use patterns, like the Gang of Four. Most patterns can be grouped in 3 categories. We will cover those patterns identified with check marks in this presentation. Also, I will cover the patterns in order of importance, or so. For the other ones, I will refer you to the slides of this presentation and subsequent content we will have on the subject.
  11. Documents are too big. How do I search for movies being released on a given date in the USA? The same would apply to products you could see on an e-commerce site. For example, clothes may have a size that is expressed as S, M, L, while for some other products like a laptop, size would be something like 13", 15".
  12. Inventory of things to insure Polymorphic entities Vehicles: submarine, car
  13. "Adding a qualifier on the attribute" may be "currency"
  14. Definition of Working set With everyone pounding on the WMDB site, it was observed that the working set does not fit in memory. What can you do? Looking at the design we see that we are putting all the actors and all reviews for a given movie in the main document [TODO] Add a drawing showing what the working set is
  15. Definition of Working set Working set represents the total body of data that the application uses in the course of normal operation. Often this is a subset of the total data size, but the specific size of the working set depends on actual moment-to-moment use of the database. If you run a query that requires MongoDB to scan every document in a collection, the working set will expand to include every document. Depending on physical memory size, this may cause documents in the working set to “page out,” or to be removed from physical memory by the operating system. The next time MongoDB needs to access these documents, MongoDB may incur a hard page fault. For best performance, the majority of your active set should fit in RAM.
  16. Remember this collection in the middle?
  17. The collection "castandcrew" contains all the actors, but also the producers, costume makers, stunts, etc. For this pattern to be worth it, it has to have a fair amount of information left aside.
  18. Please come see us at the Education booth, we have prizes for people who will get the answers right
  19. What causes the CPU to heat up?
  20. As you may guess, people pay attention to the popularity of the movies. So, metrics like "revenues" and "viewers" are really important. In the current design, those numbers are calculated every time the page of a movie is displayed. Let’s calculate those numbers once in a while and stick the results on the page instead.
  21. Also refer to "Rolled up" as CQRS - Command Query Responsibility Segregation
  22. Another thing that was observed with the current design is that trying to keep track of all page views of the site resulted in very poor performance. That was seen with both MMAPv1 and WiredTiger (WT). With MMAPv1, you get a lot of threads waiting for the write lock, while with WT you get a lot of write conflicts that need to be retried. One solution is to record "good enough" numbers. Is it important whether the count is 525 million or 525,000,234? What is the tolerance level here? Let’s assume 1000. In this case, we will let the application update the page views by 1000, but only 1/1000th of the time. Statistically, we should get a result very close to the exact count while doing only 1/1000th of the writes. To draw a parallel with a movie: we never see a movie as a continuous image; it is made by displaying 24 static images per second, yet this is enough for our eyes not to see the discontinuities. How do you do that? Let’s have the application run a (X mod 1000) operation, where X is a random number. If the result is 0, let’s update the counter by 1000.
  23. Another thing that was observed with the current design is that trying to keep track of all page views of the site resulted in very poor performance. That was seen with both MMAPv1 and WiredTiger (WT). With MMAPv1, you get a lot of threads waiting for the write lock, while with WT you get a lot of write conflicts that need to be retried. One solution is to record "good enough" numbers. No one cares whether the count is 100 million or 100 million and a few. What is the tolerance level here? Let’s assume 1000. In this case, we will let the application update the page views by 1000, but only 1/1000th of the time. Statistically, we should get a result very close to the exact count while doing only 1/1000th of the writes. To draw a parallel with a movie: we never see a movie as a continuous image; it is made by displaying 24 static images per second, yet this is enough for our eyes not to see the discontinuities. How do you do that? Let’s have the application run a (X mod 1000) operation, where X is a random number. If the result is 0, let’s update the counter by 1000.
  24. You can have a counter. Once you reach the count, you do the write. Or you can use a random generator and when you get a specific value, you do the write. As you might guess, this simple pattern is also applicable to Relational databases. … it is just that NoSQL people have more tricks to handle performance bottlenecks.
  25. Let's face it: configuration management (CM) and databases usually don't work well together. Databases tend to keep the "latest" state of your data, while CM systems remember everything. Those of us who checked in stupid mistakes in Git, ClearCase, etc. know CM systems have a very long memory. For this pattern, we are keeping track of the shape of the document. We are not addressing keeping track of the different contents of the document itself. This other case is solved by the Document Versioning pattern.
  26. Instead of using a "version" field, we could discover the version number based on fields
  27. A few million references would not even fit into an embedded array. And if they did, you would not want to construct a query by passing a million values to the $in operator.
  28. MongoDB has been around for many years. We wish we could have traveled into the future a few years ago to see how people use our database. That did not happen. However, now we know better how people are designing with MongoDB. We are able to identify patterns because we have seen a lot of models. Those are "plug-and-play" elements that let you go faster in your designs. While we are on the subject of the future, I do believe MongoDB has a bright future. Most data that could be put in a Relational Database is already there. We are left with: data that is "not square", meaning it does not fit well in square tables; and large datasets. We believe the document model and the scalability of MongoDB are well suited to store those data sets. Ensure you are ready for the future by becoming an expert on MongoDB and how to model for it. We did use a fictional site; however, all the patterns we used would also apply to "Internet of Things", "Single View" and "E-commerce" solutions.
  29. Let’s pause from our pattern list, and let’s examine a characteristic or aspect of some patterns.
  30. The bucket pattern lets you group X sub-documents into one document. When the bucket is full, you create another one. Pre-allocation is the case where you pre-create an array of cells so that reads and writes can easily access the elements. This is a very important pattern if you are using MMAPv1, as continuously growing an array can have a negative effect. With WiredTiger it is not as crucial; however, it may make the code in the application simpler. As for Trees, they are commonly represented by having one node per document, where you can list the parent, the children, the ancestors, or a combination of those.
  31. 10 years of experience, building schema with the document model. You may not know about all the solutions; however, we collected those over our 10 years of working with customers. If you follow this advice, good things will happen. Armed with those patterns, you may feel like a superhero, more powerful.
  32. My goal was to introduce you to patterns; however, if you want more complete solutions to common problems, there are a few good books out there. Let me point you to these two: "The Little Mongo DB Schema Design Book" (paperback), by Christian Kvalheim, and "MongoDB Applied Design Patterns", by Rick Copeland.
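The Approximation pattern described in notes 22 to 24 (increment by X, but only once in X iterations on average) can be sketched as follows. The function name `recordPageView`, the `page_views` field and the injectable `randomInt` parameter are assumptions made for this illustration; an in-memory object stands in for the document that a real application would update.

```javascript
// Hypothetical sketch: bump the counter by `step`, but only with probability 1/step.
// `randomInt` is injectable so that the behavior can be checked deterministically.
function recordPageView(
  counter,
  step = 1000,
  randomInt = () => Math.floor(Math.random() * step)
) {
  // X mod step is 0 about once in `step` calls: that call performs the write.
  if (randomInt() % step === 0) {
    counter.page_views += step;
    return true; // a write happened
  }
  return false; // this view was skipped
}

// Simulate one million page views against an in-memory stand-in document.
const counter = { page_views: 0 };
const actualViews = 1000000;
let writes = 0;
for (let i = 0; i < actualViews; i++) {
  if (recordPageView(counter)) {
    writes++;
  }
}
console.log(
  `actual: ${actualViews}, approximate: ${counter.page_views}, writes: ${writes}`
);
```

The stored count varies from run to run but stays close to the real one, while the number of writes drops to roughly a thousandth. Against MongoDB, the write itself would presumably be an `$inc` update by `step` instead of the in-memory addition.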