Data Modelling for MongoDB
Norberto Leite
MongoDB
May 14th, 2019
Tel Aviv, Israel
Norberto Leite
Lead Engineer - Curriculum @ MongoDB
norberto@mongodb.com
New York
@nleite
https://university.mongodb.com
Goals of the Presentation
• Recognize the differences when modelling for a Document Database versus a Relational Database
• Summarize the steps of a methodology when modelling for MongoDB
• Recognize the need for Schema Design Patterns, and when to apply them
Differences when Modelling for a Document Database versus a Relational Database
Thinking in Documents
1. Polymorphism
• different documents may contain different fields
2. Array
• represents a "one-to-many" relation
• an index on an array field covers every entry (a multikey index)
3. Subdocument
• groups related fields together
4. JSON/BSON
• documents are often shown as JSON
• BSON is the physical (storage) format
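The four ideas above fit in one small example. The following is a minimal Python sketch, not MongoDB code. Plain dicts stand in for BSON documents, and the `posts` data and the `build_multikey_index` helper are hypothetical. The sketch shows two polymorphic documents, an array, and a subdocument. It also shows how an index on an array field ends up with one key per array entry.

```python
import json
from collections import defaultdict

# Two documents from one hypothetical "posts" collection. Polymorphism:
# they do not have to share the same set of fields.
posts = [
    {
        "_id": 1,
        "title": "Espresso basics",
        "tags": ["coffee", "espresso"],                  # array: one-to-many
        "author": {"name": "Dana", "city": "Tel Aviv"},  # subdocument
    },
    {
        "_id": 2,
        "title": "Grinder review",
        "tags": ["coffee", "hardware"],
        "rating": 4,                                     # not present in post 1
    },
]


def build_multikey_index(docs, field):
    """Return {array element: [_id, ...]}: one index key per array entry,
    mimicking how an index on an array field (a multikey index) behaves."""
    index = defaultdict(list)
    for doc in docs:
        for value in doc.get(field, []):
            index[value].append(doc["_id"])
    return dict(index)


tag_index = build_multikey_index(posts, "tags")
print(tag_index["coffee"])  # [1, 2]

# Documents are usually displayed as JSON; on disk MongoDB stores BSON.
serialized = json.dumps(posts[0])
```

A query on a single tag such as "coffee" can therefore be answered from the index even though the value sits inside an array.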
Example: modelling a blog
… 5 tables become 1 or 2 collections
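The blog diagram did not survive extraction. As an illustration only, assume the five tables are authors, posts, comments, tags, and a posts/tags join table. These table and field names are hypothetical and do not come from the slide. The sketch folds four of the tables into one `posts` document shape and keeps authors as a second collection that posts reference by id.

```python
# Hypothetical relational rows (five tables) for a tiny blog.
authors = [{"id": 10, "name": "Noa"}]
posts = [{"id": 1, "author_id": 10, "title": "Hello"}]
comments = [
    {"id": 100, "post_id": 1, "text": "Nice post"},
    {"id": 101, "post_id": 1, "text": "Thanks"},
]
tags = [{"id": 7, "name": "intro"}, {"id": 8, "name": "mongodb"}]
post_tags = [{"post_id": 1, "tag_id": 7}, {"post_id": 1, "tag_id": 8}]


def to_post_documents():
    """Embed comments and tag names in each post; keep the author as a reference."""
    tag_names = {t["id"]: t["name"] for t in tags}
    docs = []
    for p in posts:
        docs.append({
            "_id": p["id"],
            "title": p["title"],
            "author_id": p["author_id"],  # reference into the authors collection
            "comments": [c["text"] for c in comments if c["post_id"] == p["id"]],
            "tags": [tag_names[pt["tag_id"]]
                     for pt in post_tags if pt["post_id"] == p["id"]],
        })
    return docs


post_collection = to_post_documents()                                  # collection 1
author_collection = [{"_id": a["id"], "name": a["name"]} for a in authors]  # collection 2
```

Whether authors are embedded too (one collection) or referenced (two collections) depends on how the application reads them.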
Example: Modelling a Social Network
Differences: Relational/Tabular vs Document
Aspect | Tabular | MongoDB
Steps to create the model | 1 – define schema; 2 – develop app and queries | 1 – identify the queries; 2 – define schema
Initial schema | 3rd normal form; one solution | many solutions possible
Final schema | likely denormalized | few changes
Schema evolution | difficult and not optimal; likely downtime | easy and no downtime
Performance | mediocre | optimized
Other Considerations for the Model
1. One-to-many relationships where "many" is a humongous number
2. Embed or reference
• joins via $lookup
• transactions for multi-document writes
3. Multi-document transactions are available for replica sets (MongoDB 4.0), and soon for sharded clusters
4. Shard key
5. Indexes
6. Simple queries, or more complex ones with the Aggregation Framework
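When a relationship is referenced instead of embedded, `$lookup` rejoins the data inside an aggregation pipeline. It acts as a left outer join and writes the matching documents into an array field. Below, the stage is written as a Python dict in the form a driver such as PyMongo accepts. A pure-Python emulation of its equality match follows, so the behaviour can be checked without a server. The `stores` and `machines` collections and their fields are hypothetical, and the emulation only handles scalar (non-array) fields.

```python
# $lookup stage as it would appear in an aggregation pipeline
# (collection and field names are hypothetical).
lookup_stage = {
    "$lookup": {
        "from": "machines",
        "localField": "_id",
        "foreignField": "store_id",
        "as": "machines",
    }
}

stores = [{"_id": 1, "city": "Haifa"}, {"_id": 2, "city": "Eilat"}]
machines = [
    {"_id": "m1", "store_id": 1},
    {"_id": "m2", "store_id": 1},
]


def emulate_lookup(local_docs, foreign_docs, spec):
    """Left outer join on scalar fields: every local document is kept, and the
    matching foreign documents (possibly none) go into the `as` array."""
    out = []
    for doc in local_docs:
        matches = [f for f in foreign_docs
                   if f.get(spec["foreignField"]) == doc.get(spec["localField"])]
        out.append({**doc, spec["as"]: matches})
    return out


joined = emulate_lookup(stores, machines, lookup_stage["$lookup"])
```

Against a real deployment, the same stage would be passed as `db.stores.aggregate([lookup_stage])`. The store with no machines still appears in the result, with an empty `machines` array.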
Flexible Modelling Methodology for MongoDB
Methodology
1. Describe the Workload
2. Identify and Model the Relationships
3. Apply Patterns
Flexible Methodology
Case Study: Aromatic Espresso (Hebrew: אספרסו ארומטי)
A. Business: coffee shop franchises
B. Name: Cuppa Coffee
also considered: Coffee Mate, Crocodile Coffee
C. Objective:
• 10 000 stores in Israel, Kazakhstan, Romania, Ukraine ...
• … then we invade America
D. Keys to success:
• Best coffee in the world
• Technology
Make the Best Coffee in the World
23g of ground coffee in, 20g of extracted coffee out, in approximately 20 seconds.
1. Fill a small or regular cup (150ml to 200ml in total volume) to 80% with hot water: not boiling, but pretty hot.
2. Grind 23g of coffee into your portafilter, using the double basket and weighing the dose on a scale.
3. Place your cup on a scale, press tare, and extract 20g of coffee over the hot water.
Technology
1. Measure inventory in real time
• Shelves with scales
2. Big Data collection on cups of coffee
• weighings, temperature, time to produce, …
3. Data Analysis
• Coffee perfection
• Rush hours -> staffing needs
4. MongoDB
Methodology, step 1: Describe the Workload
1 – Workload: List Queries
Query | Operation | Description
1. Coffee weight on the shelves | write | A shelf sends information when coffee bags are added or removed
2. Coffee to deliver to stores | read | How much coffee we have to ship to the store in the next days
3. Anomalies in the inventory | read | Analytics
4. Making a cup of coffee | write | A coffee machine reports on the production of a coffee cup
5. Analysis of cups of coffee | read | Analytics
6. Technical Support | read | Helping our franchisees
1 – Workload: quantify/qualify
Query | Quantification | Qualification
1. Coffee weight on the shelves | 10/day per shelf per store => 1/sec | <1s, critical write
2. Coffee to deliver to stores | 1/day per store => 0.1/sec | <60s
3. Anomalies in the inventory | 24 reads/day | <5 min, "collection scan"
4. Making a cup of coffee | 10 000 000 writes/day => 115 writes/sec | <100ms, non-critical write
… cups of coffee at rush hour | 3 000 000 writes/hr => 833 writes/sec | <100ms, non-critical write
5. Analysis of cups of coffee | 24 reads/day | stale data is fine, "collection scan"
6. Technical Support | 1000 reads/day | <1s
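The per-second figures in this table are averages of the daily or hourly counts, rounded. The Python sketch below reproduces that arithmetic. It assumes 10 000 stores and one scale-equipped shelf per store; the shelf count is an assumption, because the slide's per-shelf formula does not say how many shelves a store has. The sketch also makes the gap between the average load and the rush-hour load explicit.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86 400
SECONDS_PER_HOUR = 60 * 60       # 3 600
STORES = 10_000


def per_second(count, period_seconds):
    """Average operations per second over a period."""
    return count / period_seconds


# Assumption: one scale-equipped shelf per store.
shelf_weighings = per_second(10 * STORES, SECONDS_PER_DAY)  # about 1.16/sec
deliveries = per_second(1 * STORES, SECONDS_PER_DAY)        # about 0.12/sec
cups_average = per_second(10_000_000, SECONDS_PER_DAY)      # about 115.7/sec
cups_rush = per_second(3_000_000, SECONDS_PER_HOUR)         # about 833.3/sec

peak_factor = cups_rush / cups_average  # 7.2 times the daily average

for name, rate in [("shelf weighings", shelf_weighings),
                   ("deliveries", deliveries),
                   ("cups (average)", cups_average),
                   ("cups (rush hour)", cups_rush)]:
    print(f"{name}: {rate:.2f}/sec")
```

The rush-hour rate is about 7.2 times the daily average. The write path for cups of coffee therefore has to be sized for the peak, not the mean.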
Disk Space
Cups of coffee (one year of data)
• 10 000 stores x 1000 cups/day x 365 days
• ≈ 3.7 billion/year
• ≈ 370 GB (100 bytes per cup of coffee)
Weighings (one year of data)
• 10 000 stores x 10 weighings/day x 365 days
• ≈ 36.5 million/year
• ≈ 3.7 GB (100 bytes per weighing)
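The disk-space estimate is plain multiplication. A Python sketch of it follows, using the slide's 100-bytes-per-document estimate and decimal gigabytes. It counts the raw document payload only and ignores indexes, compression, and replica copies.

```python
STORES = 10_000
DAYS_PER_YEAR = 365
BYTES_PER_DOC = 100  # the slide's size estimate per document


def yearly_docs(per_store_per_day):
    """Documents written per year across all stores."""
    return STORES * per_store_per_day * DAYS_PER_YEAR


cups = yearly_docs(1_000)     # 3 650 000 000 documents
weighings = yearly_docs(10)   # 36 500 000 documents

cup_bytes = cups * BYTES_PER_DOC             # 365 GB (decimal)
weighing_bytes = weighings * BYTES_PER_DOC   # 3.65 GB (decimal)

print(f"cups: {cups:,} docs, {cup_bytes / 1e9:.0f} GB")
print(f"weighings: {weighings:,} docs, {weighing_bytes / 1e9:.2f} GB")
```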
Methodology, step 2: Identify and Model the Relationships
2 - Relations are still important
Type of relation | one-to-one (1-1) | one-to-many (1-N) | many-to-many (N-N)
Document embedded in the parent document | one read, no joins | one read, no joins | one read, no joins, duplication of information
Document referenced in the parent document | smaller reads, many reads | smaller reads, many reads | smaller reads, many reads
2 - Entities for Aromatic Espresso (Hebrew: אספרסו ארומטי)
- Coffee cups
- Stores
- Coffee machines
- Shelves
- Weighings
- Coffee bags
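To make the embed-or-reference trade-off concrete, the sketch below uses two of these entities, a store and its coffee machines. It is a Python sketch with hypothetical field names and values. The embedded version returns everything in one read. The referenced version keeps the store document small but needs a second query, comparable to `find({"_id": {"$in": ids}})`, to fetch the machines.

```python
# Embedded: the store document carries its machines (one read, no join).
embedded_store = {
    "_id": "store-1",
    "city": "Tel Aviv",
    "machines": [{"serial": "A1", "model": "2A"},
                 {"serial": "A2", "model": "2A"}],
}

# Referenced: the store keeps only machine ids; machines live in their own
# (hypothetical) collection, modelled here as a dict keyed by _id.
referenced_store = {"_id": "store-1", "city": "Tel Aviv", "machine_ids": ["A1", "A2"]}
machines_by_id = {
    "A1": {"_id": "A1", "model": "2A", "store_id": "store-1"},
    "A2": {"_id": "A2", "model": "2A", "store_id": "store-1"},
}


def machines_embedded(store):
    """Return (machines, number of queries): the store read already has them."""
    return store["machines"], 1


def machines_referenced(store, machine_collection):
    """Return (machines, number of queries): store read plus one $in-style query."""
    found = [machine_collection[i] for i in store["machine_ids"]]
    return found, 2
```

Embedding suits machines that are almost always read with their store. Referencing suits entities that are large, numerous, or read on their own, such as the high-volume weighings and cups of coffee.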
Methodology, step 3: Apply Patterns
Schema Design Patterns
Resources
A. Advanced Schema Design Patterns
• MongoDB World 2017
• Webinar
B. MongoDB University
• university.mongodb.com
• M320 – Data Modeling (2019)
C. Blogs on Schema Design Patterns
• https://www.mongodb.com/blog/post/building-with-patterns-a-summary
Schema Versioning
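In the Schema Versioning pattern, each document records its own shape in a field such as `schema_version`. Old and new shapes can then coexist in one collection, and documents can be upgraded lazily when they are read or rewritten, with no downtime. The sketch below is a Python illustration. The v1 to v2 change (a single `phone` string replaced by a `contacts` list) and all field names are hypothetical.

```python
CURRENT_VERSION = 2


def upgrade(doc):
    """Return `doc` in the CURRENT_VERSION shape without mutating the input.
    Documents without a schema_version field are treated as version 1."""
    version = doc.get("schema_version", 1)
    if version == 1:
        phone = doc.get("phone")
        doc = {k: v for k, v in doc.items() if k != "phone"}
        doc["contacts"] = [{"type": "phone", "value": phone}] if phone else []
        doc["schema_version"] = 2
    return doc


old = {"_id": 1, "name": "store-1", "phone": "03-555-0100"}   # written by old code
new = {"_id": 2, "name": "store-2", "contacts": [], "schema_version": 2}
print(upgrade(old))
```

The application writes every document back in the newest shape, so the collection converges over time without a one-off migration.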
Computed Pattern
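The Computed pattern does expensive calculations at write time, so frequent reads do not have to recompute them. The sketch below applies that idea to the cups-of-coffee writes. It is a Python illustration with a hypothetical summary shape: each recorded cup increments a per-store, per-day running total, and the analytics read becomes a lookup instead of a scan.

```python
from collections import defaultdict

# Precomputed per-store, per-day summaries (hypothetical shape).
daily_summary = defaultdict(lambda: {"cups": 0, "grams": 0.0})


def record_cup(store_id, day, grams_out):
    """On each write, update the running totals so that reads never need to
    re-aggregate every cup of coffee."""
    summary = daily_summary[(store_id, day)]
    summary["cups"] += 1
    summary["grams"] += grams_out


def average_shot(store_id, day):
    """Average extracted grams per cup, read straight from the summary."""
    s = daily_summary.get((store_id, day))
    if not s or s["cups"] == 0:
        return 0.0
    return s["grams"] / s["cups"]


record_cup("store-1", "2019-05-14", 20.0)
record_cup("store-1", "2019-05-14", 19.0)
```

In MongoDB, the write side would typically be one upserted update, for example PyMongo's `update_one({"store_id": ..., "day": ...}, {"$inc": {"cups": 1, "grams": grams}}, upsert=True)`.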
Subset Pattern
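The Subset pattern keeps only the most frequently needed part of a large one-to-many relation inside the main document. The full history lives in a separate collection. The sketch below is a Python illustration with a hypothetical example: a store document that embeds its three most recent support tickets, for the Technical Support query, while every ticket is also stored in a separate collection.

```python
SUBSET_SIZE = 3  # hypothetical: how many recent tickets the store document keeps

store_doc = {"_id": "store-1", "recent_tickets": []}
ticket_collection = []  # full history, read only when needed


def add_ticket(store, ticket):
    """Store the full ticket separately, and keep the newest SUBSET_SIZE tickets
    embedded, newest first (like $push with $each, $position: 0, $slice)."""
    ticket_collection.append({"store_id": store["_id"], **ticket})
    store["recent_tickets"] = ([ticket] + store["recent_tickets"])[:SUBSET_SIZE]


for n in range(1, 6):
    add_ticket(store_doc, {"ticket": n})
```

The store document stays small and bounded, so it fits well in memory. The rarely used older tickets cost an extra query only when someone actually asks for them.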
Bucket Pattern
Bucket per Day
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-02"),
  "temp": [ [ 20.0, 20.1, 20.2, ... ],
            [ 22.1, 22.1, 22.0, ... ],
            ...
          ]
}
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-03"),
  "temp": [ [ 20.1, 20.2, 20.3, ... ],
            [ 22.4, 22.4, 22.3, ... ],
            ...
          ]
}
Bucket per Hour
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-02T13:00:00Z"),
  "temp": { 1: 20.0, 2: 20.1, 3: 20.2, ... }
}
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-02T14:00:00Z"),
  "temp": { 1: 22.1, 2: 22.1, 3: 22.0, ... }
}
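Building the "Bucket per Hour" documents from raw readings is a group-by on (device, hour). The sketch below is a Python illustration that assumes each reading is a `(timestamp, temperature)` pair. It keys each reading by the minute of the hour, stored as a string because BSON field names are strings, which matches the 1, 2, 3 keys in the hourly example. A full hourly bucket holds 60 readings, so it replaces 60 single-reading documents and their index entries.

```python
from datetime import datetime


def bucket_by_hour(device_id, device_type, readings):
    """Group (timestamp, temperature) readings into one document per device
    per hour, with temperatures keyed by the minute of the hour."""
    buckets = {}
    for ts, temp in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        doc = buckets.setdefault(hour, {
            "device_id": device_id,
            "type": device_type,
            "date": hour,
            "temp": {},
        })
        doc["temp"][str(ts.minute)] = temp  # BSON field names must be strings
    return list(buckets.values())


readings = [
    (datetime(2018, 3, 2, 13, 1), 20.0),
    (datetime(2018, 3, 2, 13, 2), 20.1),
    (datetime(2018, 3, 2, 14, 1), 22.1),
]
docs = bucket_by_hour("000123456", "2A", readings)
```

A per-day bucket is the same grouping with a coarser key. The right bucket size depends on how the readings are queried and on keeping each document well under the 16 MB document limit.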
External Reference Pattern
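The sketch below assumes the reference pattern on this slide corresponds to what MongoDB's pattern catalog calls the Extended Reference pattern. That pattern keeps the reference to another document but copies the few fields of it that a frequent query needs, which avoids a join on every read. This is a Python illustration with hypothetical names: a coffee shipment for query 2 ("Coffee to deliver to stores") that copies only the store's name and address.

```python
stores = {
    "store-1": {
        "_id": "store-1",
        "name": "Cuppa Coffee Haifa",
        "address": "1 Example St, Haifa",
        "manager_phone": "04-000-0000",
        "opened": "2019-05-14",
    },
}

COPIED_FIELDS = ("name", "address")  # only what the shipping query reads


def make_shipment(store_id, kg_coffee):
    """Build a shipment that references the store and duplicates a small,
    rarely changing subset of its fields."""
    store = stores[store_id]
    return {
        "store_id": store_id,                             # full reference kept
        "store": {f: store[f] for f in COPIED_FIELDS},    # duplicated subset
        "kg_coffee": kg_coffee,
    }


shipment = make_shipment("store-1", 12)
```

The cost is duplication. If a store's address changes, the copies in its shipments have to be updated too, so copy only fields that rarely change.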
Cuppa Coffee - Solution with Patterns
• Schema Versioning
• Subset
• Computed
• Bucket
• External Reference
Conclusion
Takeaways from the Presentation
• Recognize the differences when modelling for a Document Database versus a Relational Database
• Summarize the steps of a methodology when modelling for MongoDB
  • Workload
  • Relationships
  • Patterns
• Recognize the need for Schema Design Patterns, and when to apply them
Coming Soon …
• "Data Modelling" course at university.mongodb.com
Data Modeling for MongoDB

More Related Content

What's hot

Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
Derek Stainer
 
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
MongoDB
 

What's hot (20)

NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 
MongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World ExamplesMongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World Examples
 
MongoDB Schema Design (Event: An Evening with MongoDB Houston 3/11/15)
MongoDB Schema Design (Event: An Evening with MongoDB Houston 3/11/15)MongoDB Schema Design (Event: An Evening with MongoDB Houston 3/11/15)
MongoDB Schema Design (Event: An Evening with MongoDB Houston 3/11/15)
 
Mongo DB Presentation
Mongo DB PresentationMongo DB Presentation
Mongo DB Presentation
 
Introduction to mongodb
Introduction to mongodbIntroduction to mongodb
Introduction to mongodb
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Mongodb vs mysql
Mongodb vs mysqlMongodb vs mysql
Mongodb vs mysql
 
Mongo DB
Mongo DBMongo DB
Mongo DB
 
Migrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDBMigrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDB
 
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
 
An introduction to MongoDB
An introduction to MongoDBAn introduction to MongoDB
An introduction to MongoDB
 
Introduction to MongoDB.pptx
Introduction to MongoDB.pptxIntroduction to MongoDB.pptx
Introduction to MongoDB.pptx
 
MongoDB
MongoDBMongoDB
MongoDB
 
Basics of MongoDB
Basics of MongoDB Basics of MongoDB
Basics of MongoDB
 
Key-Value NoSQL Database
Key-Value NoSQL DatabaseKey-Value NoSQL Database
Key-Value NoSQL Database
 
Copy of MongoDB .pptx
Copy of MongoDB .pptxCopy of MongoDB .pptx
Copy of MongoDB .pptx
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Mongo indexes
Mongo indexesMongo indexes
Mongo indexes
 

Similar to Data Modeling for MongoDB

Rapid Development with Schemaless Data Models
Rapid Development with Schemaless Data ModelsRapid Development with Schemaless Data Models
Rapid Development with Schemaless Data Models
MongoDB
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDB
MongoDB
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Sonya Liberman
 

Similar to Data Modeling for MongoDB (20)

Data Modelling for MongoDB - MongoDB.local Tel Aviv
Data Modelling for MongoDB - MongoDB.local Tel AvivData Modelling for MongoDB - MongoDB.local Tel Aviv
Data Modelling for MongoDB - MongoDB.local Tel Aviv
 
MongoDB.local Sydney 2019: Data Modeling for MongoDB
MongoDB.local Sydney 2019: Data Modeling for MongoDBMongoDB.local Sydney 2019: Data Modeling for MongoDB
MongoDB.local Sydney 2019: Data Modeling for MongoDB
 
MongoDB .local Bengaluru 2019: A Complete Methodology to Data Modeling for Mo...
MongoDB .local Bengaluru 2019: A Complete Methodology to Data Modeling for Mo...MongoDB .local Bengaluru 2019: A Complete Methodology to Data Modeling for Mo...
MongoDB .local Bengaluru 2019: A Complete Methodology to Data Modeling for Mo...
 
MongoDB World 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB World 2019: A Complete Methodology to Data Modeling for MongoDBMongoDB World 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB World 2019: A Complete Methodology to Data Modeling for MongoDB
 
MongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDB
MongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDBMongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDB
MongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDB
 
MongoDB .local Toronto 2019: A Complete Methodology of Data Modeling for MongoDB
MongoDB .local Toronto 2019: A Complete Methodology of Data Modeling for MongoDBMongoDB .local Toronto 2019: A Complete Methodology of Data Modeling for MongoDB
MongoDB .local Toronto 2019: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB .local Chicago 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB .local Chicago 2019: A Complete Methodology to Data Modeling for MongoDBMongoDB .local Chicago 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB .local Chicago 2019: A Complete Methodology to Data Modeling for MongoDB
 
MongoDB .local London 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB .local London 2019: A Complete Methodology to Data Modeling for MongoDBMongoDB .local London 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB .local London 2019: A Complete Methodology to Data Modeling for MongoDB
 
MongoDB .local London 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB .local London 2019: A Complete Methodology to Data Modeling for MongoDBMongoDB .local London 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB .local London 2019: A Complete Methodology to Data Modeling for MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
Silicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDBSilicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDB
 
Rapid Development with Schemaless Data Models
Rapid Development with Schemaless Data ModelsRapid Development with Schemaless Data Models
Rapid Development with Schemaless Data Models
 
Demystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep DiveDemystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep Dive
 
Mendeley’s Research Catalogue: building it, opening it up and making it even ...
Mendeley’s Research Catalogue: building it, opening it up and making it even ...Mendeley’s Research Catalogue: building it, opening it up and making it even ...
Mendeley’s Research Catalogue: building it, opening it up and making it even ...
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDB
 
Relational data modeling trends for transactional applications
Relational data modeling trends for transactional applicationsRelational data modeling trends for transactional applications
Relational data modeling trends for transactional applications
 
Agile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsAgile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics Applications
 
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLPerformance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
 

More from MongoDB

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDBMongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
 
MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...
MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...
MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...
 

Recently uploaded

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 

Data Modeling for MongoDB

  • 1. Data Modelling for MongoDB Norberto Leite MongoDB May 14th, 2019 Tel Aviv, Israel
  • 2. Norberto Leite Lead Engineer - Curriculum @ MongoDB norberto@mongodb.com New York @nleite
  • 4. Goals of the Presentation Recognize the differences when modelling for a Document Database versus a Relational Database Summarize the steps of a methodology when modelling for MongoDB Recognize the need and when to apply Schema Design Patterns
  • 5. Goals of the Presentation Recognize the differences when modelling for a Document Database versus a Relational Database Summarize the steps of a methodology when modelling for MongoDB Recognize the need and when to apply Schema Design Patterns
  • 6. Goals of the Presentation Recognize the differences when modelling for a Document Database versus a Relational Database Summarize the steps of a methodology when modelling for MongoDB Recognize the need and when to apply Schema Design Patterns
  • 7. Differences when Modelling for a Document Database versus a Relational Database
  • 8.
  • 9. Thinking in Documents 1. Polymorphism • different documents may contain different fields 2. Array • represent a "one-to-many" relation • index is on all entries 3. Sub Document • grouping some fields together 4. JSON/BSON • documents are often shown as JSON • BSON is the physical format
  • 11. … 5 tabes become 1 or 2 collections
  • 12. Example: Modelling a Social Network
  • 13. Tabular MongoDB Steps to create the model 1 – define schema 2 – develop app and queries 1 – identifying the queries 2 – define schema Initial schema 3rd normal form One solution many solutions possible Final schema likely denormalized few changes Schema evolution difficult and not optimal Likely downtime easy and no downtime Performance mediocre optimized Differences: Relational/Tabular vs Document
  • 14. Other Considerations for the Model 1. one-to-many relationships where "many" is a humongous number 2. Embed or Reference • Joins via $lookup • Transactions for multi document writes 3. Transactions available for Replica set, and soon for Sharded Clusters 4. Sharding Key 5. Indexes 6. Simple queries, or more complex ones with the Aggregation Framework
  • 16.
  • 19. Methodology 1. Describe the Workload 2. Identify and Model the Relationships
  • 20.
  • 21.
  • 22.
  • 23. Methodology 1. Describe the Workload 2. Identify and Model the Relationships 3. Apply Patterns
  • 25. Case Study: ‫א‬‫ס‬‫פ‬‫ר‬‫ס‬‫ו‬‫א‬‫ר‬‫ו‬‫מ‬‫ט‬‫י‬ A. Business: coffee shop franchises B. Name: Cuppa Coffee also considered: Coffee Mate, Crocodile Coffee C. Objective: • 10 000 stores in Israel, Kazakhstan, Romania, Ukraine ... • … then we invade America D. Keys to success: • Best coffee in the world • Technology
  • 26. Make the Best Coffee in the World 23g of ground coffee in, 20g of extracted coffee out, in approximately 20 seconds 1. Fill a small or regular cup with 80% hot water (not boiling but pretty hot). Your cup should be 150ml to 200ml in total volume, 80% of which will be hot water. 2. Grind 23g of coffee into your portafilter using the double basket. We use a scale that you can get here. 3. Draw 20g of coffee over the hot water by placing your cup on a scale, press tare and extract your shot.
  • 27. Technology 1. Measure inventory in real time • Shelves with scales 2. Big Data collection on cups of coffee • weighings, temperature, time to produce, … 3. Data Analysis • Coffee perfection • Rush hours -> staffing needs 4. MongoDB
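A hedged sketch of what one "cup of coffee" reading from a machine could look like, checked against the recipe targets above (23 g in, 20 g out, about 20 seconds). All field names, the sample values, and the tolerances are illustrative assumptions, not part of the presentation's design.

```python
# Hypothetical reading reported by a coffee machine for one cup.
cup = {
    "store_id": "TLV-01",
    "machine_id": "m1",
    "ts": "2019-05-14T08:15:02Z",
    "dose_in_g": 23.1,     # ground coffee in the portafilter
    "yield_out_g": 19.6,   # extracted coffee measured on the scale
    "extraction_s": 21.0,  # time to produce the shot
    "water_temp_c": 92.5,
}

# Recipe targets from the slide as (target, tolerance); tolerances are assumed.
TARGETS = {
    "dose_in_g": (23.0, 0.5),
    "yield_out_g": (20.0, 1.0),
    "extraction_s": (20.0, 3.0),
}


def deviations(reading: dict) -> dict[str, float]:
    """Return the fields whose value falls outside target +/- tolerance."""
    out = {}
    for field, (target, tol) in TARGETS.items():
        delta = reading[field] - target
        if abs(delta) > tol:
            out[field] = round(delta, 2)
    return out


print(deviations(cup))                            # {}
print(deviations({**cup, "extraction_s": 27.5}))  # {'extraction_s': 7.5}
```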
  • 29. 1 – Workload: List Queries (Query | Operation | Description)
      1. Coffee weight on the shelves | write | A shelf sends information when coffee bags are added or removed
      2. Coffee to deliver to stores | read | How much coffee we have to ship to each store in the next few days
      3. Anomalies in the inventory | read | Analytics
      4. Making a cup of coffee | write | A coffee machine reports on the production of a coffee cup
      5. Analysis of cups of coffee | read | Analytics
      6. Technical Support | read | Helping our franchisees
  • 30. 1 – Workload: quantify/qualify (Query | Quantification | Qualification)
      1. Coffee weight on the shelves | 10/day*shelf*store => 1/sec | <1s, critical write
      2. Coffee to deliver to stores | 1/day*store => 0.1/sec | <60s
      3. Anomalies in the inventory | 24 reads/day | <5mins, "collection scan"
      4. Making a cup of coffee | 10 000 000 writes/day => 115 writes/sec | <100ms, non-critical write
         … cups of coffee at rush hour | 3 000 000 writes/hr => 833 writes/sec | <100ms, non-critical write
      5. Analysis of cups of coffee | 24 reads/day | stale data is fine, "collection scan"
      6. Technical Support | 1000 reads/day | <1s
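The per-second figures in the workload table follow from the daily and hourly volumes. A short Python check, assuming the 10 000 stores from the case study and load spread evenly over 86 400 seconds per day (or 3 600 per hour):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400
SECONDS_PER_HOUR = 60 * 60      # 3 600
STORES = 10_000

# Query 2: one delivery read per store per day.
deliveries_per_s = STORES / SECONDS_PER_DAY
# Query 4: 10 000 000 cups of coffee per day.
cups_per_s = 10_000_000 / SECONDS_PER_DAY
# Rush hour: 3 000 000 cups in a single hour.
rush_cups_per_s = 3_000_000 / SECONDS_PER_HOUR

print(f"{deliveries_per_s:.1f} {cups_per_s:.1f} {rush_cups_per_s:.1f}")  # 0.1 115.7 833.3
```

The rush-hour rate is about seven times the daily average, which is why the write path has to be sized for the peak rather than the mean.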
  • 32. Disk Space: Cups of coffee (one year of data) • 10000 x 1000/day x 365 • ≈3.7 billion/year • ≈370 GB (100 bytes/cup of coffee). Weighings • 10000 x 10/day x 365 • 36.5 million/year • ≈3.7 GB (100 bytes/weighing)
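The yearly volumes can be checked the same way. This sketch uses the slide's assumption of 100 bytes per document and counts raw document bytes only, ignoring indexes, storage overhead, and compression. The exact figures are 3.65 billion cups (365 GB) and 36.5 million weighings (3.65 GB), which round to the 3.7 billion, 370 GB, and 3.7 GB quoted above.

```python
STORES = 10_000
DAYS = 365
BYTES_PER_DOC = 100  # per-document size assumed on the slide

cups_per_year = STORES * 1_000 * DAYS    # 1000 cups per store per day
weighings_per_year = STORES * 10 * DAYS  # 10 weighings per store per day


def gigabytes(n_docs: int) -> float:
    """Raw document size in GB (10**9 bytes), ignoring indexes and overhead."""
    return n_docs * BYTES_PER_DOC / 1e9


print(cups_per_year, gigabytes(cups_per_year))            # 3650000000 365.0
print(weighings_per_year, gigabytes(weighings_per_year))  # 36500000 3.65
```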
  • 34. 2 - Relations are still important (Type of relation: one-to-one/1-1 | one-to-many/1-N | many-to-many/N-N)
      Document embedded in the parent document: one read, no joins | one read, no joins | one read, no joins, duplication of information
      Document referenced in the parent document: smaller reads, many reads | smaller reads, many reads | smaller reads, many reads
  • 35. 2 - Entities for אספרסו ארומטי: - Coffee cups - Stores - Coffee machines - Shelves - Weighings - Coffee bags
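A hedged sketch of how two of these entities could be related under the embed/reference trade-offs. Shelves, a small and bounded one-to-many, are embedded in the store document. Weighings, an unbounded one-to-many at about ten per store per day, live in their own collection with a reference back to store and shelf. Field names and this particular split are illustrative assumptions, not the presentation's final schema.

```python
# Embedded: one read returns the store together with all of its shelves.
store = {
    "_id": "TLV-01",
    "city": "Tel Aviv",
    "shelves": [
        {"shelf_id": 1, "coffee": "house blend"},
        {"shelf_id": 2, "coffee": "decaf"},
    ],
}

# Referenced: each weighing is a small document pointing at its parent.
weighings = [
    {"store_id": "TLV-01", "shelf_id": 1, "grams": 12_400, "ts": "2019-05-14T08:00:00Z"},
    {"store_id": "TLV-01", "shelf_id": 1, "grams": 11_400, "ts": "2019-05-14T09:00:00Z"},
    {"store_id": "TLV-01", "shelf_id": 2, "grams": 5_000, "ts": "2019-05-14T09:00:00Z"},
]


def latest_weight(store_doc: dict, shelf_id: int) -> int:
    """The extra read that referencing costs: newest weighing for one shelf."""
    rows = [w for w in weighings
            if w["store_id"] == store_doc["_id"] and w["shelf_id"] == shelf_id]
    # ISO-8601 strings in the same format sort chronologically.
    return max(rows, key=lambda w: w["ts"])["grams"]


print([s["shelf_id"] for s in store["shelves"]])  # [1, 2]
print(latest_weight(store, 1))                    # 11400
```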
  • 38. Schema Design Patterns Resources A. Advanced Schema Design Patterns • MongoDB World 2017 • Webinar B. MongoDB University • university.mongodb.com • M320 – Data Modeling (2019) C. Blogs on Schema Design Patterns https://www.mongodb.com/blog/post/building-with-patterns-a-summary
  • 44. Bucket Pattern
      Bucket per Day:
        { "device_id": "000123456", "type": "2A", "date": ISODate("2018-03-02"),
          "temp": [ [ 20.0, 20.1, 20.2, ... ], [ 22.1, 22.1, 22.0, ... ], ... ] }
        { "device_id": "000123456", "type": "2A", "date": ISODate("2018-03-03"),
          "temp": [ [ 20.1, 20.2, 20.3, ... ], [ 22.4, 22.4, 22.3, ... ], ... ] }
      Bucket per Hour:
        { "device_id": "000123456", "type": "2A", "date": ISODate("2018-03-02T13:00:00Z"),
          "temp": { "1": 20.0, "2": 20.1, "3": 20.2, ... } }
        { "device_id": "000123456", "type": "2A", "date": ISODate("2018-03-02T14:00:00Z"),
          "temp": { "1": 22.1, "2": 22.1, "3": 22.0, ... } }
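The bucket-per-hour layout can be built from individual readings as in this Python sketch. It assumes each reading carries a device id, a UTC timestamp, and a temperature. It also assumes the keys inside `temp` are minutes within the hour, stored as strings because BSON field names are strings. The `bucket_per_hour` helper is hypothetical, and the `type` field is left out for brevity.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Individual readings: (device_id, timestamp, temperature).
readings = [
    ("000123456", datetime(2018, 3, 2, 13, 1, tzinfo=timezone.utc), 20.0),
    ("000123456", datetime(2018, 3, 2, 13, 2, tzinfo=timezone.utc), 20.1),
    ("000123456", datetime(2018, 3, 2, 14, 1, tzinfo=timezone.utc), 22.1),
]


def bucket_per_hour(rows):
    """Group readings into one document per device per hour."""
    buckets = defaultdict(dict)
    for device_id, ts, temp in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(device_id, hour)][str(ts.minute)] = temp
    return [
        {"device_id": dev, "date": hour, "temp": temps}
        for (dev, hour), temps in sorted(buckets.items())
    ]


docs = bucket_per_hour(readings)
print(len(docs), docs[0]["temp"])  # 2 {'1': 20.0, '2': 20.1}
```

Three readings become two documents here. At one reading per minute, each hourly bucket replaces 60 documents, and their index entries, with one.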
  • 46. Cuppa Coffee - Solution with Patterns • Schema Versioning • Subset • Computed • Bucket • External Reference
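Two of the listed patterns in a hedged Python sketch. The Computed pattern stores a summary (count, min, max, average) on an hourly bucket when it is written, so analytics reads do not recompute it. The Schema Versioning pattern tags documents with a `schema_version` field so old and new shapes can coexist in one collection. The field names and version numbers are hypothetical.

```python
def add_computed_summary(bucket: dict) -> dict:
    """Computed pattern: precompute aggregates once, at write time."""
    temps = list(bucket["temp"].values())
    return {
        **bucket,
        "schema_version": 2,  # version 1 documents have no "summary" field
        "summary": {
            "count": len(temps),
            "min": min(temps),
            "max": max(temps),
            "avg": sum(temps) / len(temps),
        },
    }


def read_avg(bucket: dict) -> float:
    """Schema Versioning: the reader handles both document versions."""
    if bucket.get("schema_version", 1) >= 2:
        return bucket["summary"]["avg"]
    temps = list(bucket["temp"].values())  # version 1: compute on the fly
    return sum(temps) / len(temps)


v1 = {"device_id": "000123456", "temp": {"1": 20.0, "2": 22.0}}
v2 = add_computed_summary(v1)
print(read_avg(v1), read_avg(v2), v2["summary"]["count"])  # 21.0 21.0 2
```

Old documents can then be migrated lazily, for example on their next write, without downtime.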
  • 48. Takeaways from the Presentation Recognize the differences when modelling for a Document Database versus a Relational Database
  • 49. Takeaways from the Presentation Recognize the differences when modelling for a Document Database versus a Relational Database Summarize the steps of a methodology when modelling for MongoDB • Workload • Relationships • Patterns
  • 50. Takeaways from the Presentation Recognize the differences when modelling for a Document Database versus a Relational Database Summarize the steps of a methodology when modelling for MongoDB • Workload • Relationships • Patterns Recognize the need and when to apply Schema Design Patterns
  • 51. Coming Soon … • "Data Modelling" course at: university.mongodb.com