MongoDB Schema Design
Tips & Tricks
Grupo Undanet
August 2017, Salamanca
Who am I
Juan Roy
Twitter: @juanroycouto
Email: juanroycouto@gmail.com
MongoDB DBA at Grupo Undanet
Agenda
● What is MongoDB
● What is a JSON Document
● What a Document Must Contain
● Relational Approach vs Document Model
● Normalization vs Denormalization
● Embedding Documents
● Things to Keep in Mind
● Goals
● Over Normalization
● Overloaded Documents
● Working Set
● Historic Information
● 1-1
● 1-Few (Embedding & Referencing)
● N-1
● 1-Many
● Many-Many
● Recap
What is MongoDB
● Non-Relational Database
● NoSQL Multipurpose Database
● Main Characteristics:
○ Scalability
○ High Availability
○ Automatic Failover
○ …
● Document-based (JSON)
SQL        MongoDB
Database   Database
Table      Collection
Row        Document
What is a JSON Document
{
  "_id" : ObjectId("59400587962fe33db2194129"),  // _id
  "description" : "MICHELIN 285/30 ZR21 PILOT SUPER SPORT 2012",  // string
  "date" : ISODate("2017-08-28T04:02:32Z"),  // date
  "property" : {  // subdocument
    "tag" : {
      "noisebands" : "1",
      "rollingresistance" : "B",
      "noise" : "69",
      "wetgrip" : "A"
    },
    "ratio" : 30  // number
  },
  "ecotasa" : [  // array
    {
      "country" : "724",
      "price" : NumberDecimal("1.380000")
    },
    {
      "country" : "620",
      "price" : NumberDecimal("0.000000")
    }
  ],
  "location" : {  // geo-location
    "type" : "Point",
    "coordinates" : [ -5.724332, 40.959219 ]
  }
}
What a Document Must Contain
● Ideally
○ All data related to the application's principal item
○ One document per item
● In practice
○ The most frequently accessed data
Application   Principal Item
Catalog       Article
Finance       Client
Relational Approach vs Document Model
{
  "_id" : ObjectId("59400587962fe33db2194129"),
  "description" : "MICHELIN 285/30 ZR21 PILOT SUPER SPORT 2012",
  "date" : ISODate("2017-08-28T04:02:32Z"),
  "property" : {
    "tag" : {
      "noisebands" : "1",
      "rollingresistance" : "B",
      "noise" : "69",
      "wetgrip" : "A"
    },
    "ratio" : 30
  },
  "ecotasa" : [
    {
      "country" : "724",
      "price" : NumberDecimal("1.380000")
    },
    {
      "country" : "620",
      "price" : NumberDecimal("0.000000")
    }
  ],
  "location" : {
    "type" : "Point",
    "coordinates" : [ -5.724332, 40.959219 ]
  }
}
Normalization vs Denormalization
Normalization
People
{
  _id : 1,
  name : 'Peter',
  city : 'Salamanca'
}
Motorbikes
{
  _id : 1,
  owner : 1,
  color : 'red',
  model : 'Suzuki'
}
{
  _id : 2,
  owner : 1,
  color : 'black',
  model : 'Harley Davidson'
}
Denormalization
People
{
  _id : 1,
  name : 'Peter',
  city : 'Salamanca',
  motorbikes : [
    {
      model : 'Suzuki',
      color : 'red'
    },
    {
      model : 'Harley Davidson',
      color : 'black'
    }
  ]
}
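As a minimal sketch of the move from the normalized to the denormalized shape, the plain JavaScript below folds the Motorbikes documents into their owner's People document. It works on in-memory arrays rather than a live database, and the helper name `embedMotorbikes` is hypothetical.

```javascript
// In-memory stand-ins for the two normalized collections (not a real database).
const people = [{ _id: 1, name: 'Peter', city: 'Salamanca' }];
const motorbikes = [
  { _id: 1, owner: 1, color: 'red', model: 'Suzuki' },
  { _id: 2, owner: 1, color: 'black', model: 'Harley Davidson' },
];

// Hypothetical helper: build one denormalized document per person.
function embedMotorbikes(people, motorbikes) {
  return people.map((person) => ({
    ...person,
    motorbikes: motorbikes
      .filter((bike) => bike.owner === person._id)
      // Keep only the descriptive fields; _id and owner are no longer needed.
      .map(({ model, color }) => ({ model, color })),
  }));
}

const denormalized = embedMotorbikes(people, motorbikes);
console.log(JSON.stringify(denormalized, null, 2));
```

After this step a single read of the person returns the motorbikes as well, which is the point of the denormalized model.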
Embedding Documents
People
{
_id : 1,
name : 'Peter',
city : 'Salamanca'
}
Motorbikes
{
_id : 1,
owner : 1,
color : 'red',
model : 'Suzuki'
}
{
_id : 2,
owner : 1,
color : 'black',
model : 'Harley Davidson'
}
People
{
_id : 1,
name : 'Peter',
city : 'Salamanca',
motorbikes : [
{
model : 'Suzuki',
color : 'red'
},
{
model : 'Harley Davidson',
color : 'black'
}
]
}
Things to Keep in Mind
● Avoid Relational Approach
● What will happen if we scale
● Size of:
○ Data
○ Index
○ Document
● How will users access the data
○ Normal users
○ Machine Learning
○ Business Intelligence
Goals
● Performance
● Scalability
● Simplicity
Over Normalization
● The relational model has been carried over directly to the MongoDB model.
● In the relational world it is common to have one table per concept, because tables do not have arrays.
● A single action then requires multiple queries, instead of querying the data just once.
Overloaded Documents
● This problem can arise if the application is packing lots of rarely used data
into its frequently accessed documents.
● If your application is packing rarely used data into a document that needs to
be touched frequently, that means it is more likely to evict other important
data from the cache when that document gets read.
● Multiply this across a collection and the net result is that the server could be
paging a lot more data than necessary in order to service the application.
Working Set
The Working Set is the size of:
● Our most frequently accessed data
plus
● Our indexes
The Working Set must fit in RAM!
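The rule above can be checked with some rough arithmetic. The sketch below is only an illustration: every figure in it (data size, share of frequently accessed data, index size, available RAM) is a made-up assumption, not a measurement.

```javascript
// All figures are hypothetical, chosen only to illustrate the arithmetic.
const GiB = 1024 ** 3;
const totalDataBytes = 200 * GiB; // size of all documents (assumed)
const hotDataFraction = 0.1;      // share of the data that is accessed most often (assumed)
const indexBytes = 12 * GiB;      // size of all indexes (assumed)
const ramBytes = 64 * GiB;        // RAM available to the database (assumed)

// Working set = most accessed data + indexes.
const workingSetBytes = totalDataBytes * hotDataFraction + indexBytes;
const fitsInRam = workingSetBytes <= ramBytes;

console.log(`working set: ${(workingSetBytes / GiB).toFixed(1)} GiB, fits in RAM: ${fitsInRam}`);
```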
Working Set
What should we do if the Working Set does not fit in RAM?
● Add more RAM to our machine
● Shard
● Reduce the size of our Working Set:
○ Limit our arrays
○ Limit our embedded documents
○ …
○ Benefits:
■ Fast data retrieval
■ One query brings all the information needed
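One way to limit an array is to cap it at a fixed number of entries on every write. On the server, MongoDB's `$push` update operator accepts the `$each` and `$slice` modifiers for this. The sketch below only mimics that behaviour on an in-memory document, and the helper `pushCapped` and the field `recentLogins` are hypothetical names.

```javascript
// Hypothetical helper: append an item and keep only the newest `limit` entries.
// `limit` must be at least 1.
function pushCapped(array, item, limit) {
  return [...array, item].slice(-limit);
}

// In-memory stand-in for a user document (not a real database).
let user = { _id: 1, name: 'Peter', recentLogins: [] };

for (const day of ['2017-08-25', '2017-08-26', '2017-08-27', '2017-08-28']) {
  user = { ...user, recentLogins: pushCapped(user.recentLogins, day, 3) };
}

// Only the three newest entries survive, so the document stops growing.
console.log(user.recentLogins);
```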
Historic Information
● When historical data grows continuously and we embed it in our main collection, each document carries a lot of information that is rarely needed. We may still want to store it for analytics purposes, so we keep it out of the user document.
● This does not apply to information with limited growth (addresses, phone numbers, etc.).
1-1
id name phone_number zip_code
1 Rick 555-111-1234 01209
2 Mike 555-222-2345 30062
Users
{
_id : 1,
name : 'Rick',
phone_number : '555-111-1234',
zip_code : '01209'
}
{
_id : 2,
name : 'Mike',
phone_number : '555-222-2345',
zip_code : '30062'
}
1-Few
● Referencing (or Normalization)
○ To show a user's information we need joins (or more than one query). This implies random seeks, a very slow operation!
● Embedding (or Denormalization)
○ We can avoid joins via denormalization. This implies redundant data and more complex applications that must avoid generating inconsistencies.
○ Arrays help us avoid redundancy. This solution gives us performance benefits.
○ With denormalization there are many possible data models, which makes it harder to define ours.
1-Few
id name zip_code
1 Rick 01209
2 Mike 30062
id user_id phone_number
1 1 555-111-1234
2 2 555-222-2345
3 2 555-333-3456
1-Few (MongoDB-Embedding)
● The approach that gives us the best performance and data consistency guarantees.
● Locality: MongoDB stores documents contiguously on disk. Putting all the data you need into one document means that you're never more than one seek away from everything you need.
● Atomicity and Isolation: With embedding we get atomicity (transactionality), because the data lives in a single document.
{
_id : 2,
name : 'Mike',
zip_code : '30062',
phone_numbers : [ '555-222-2345', '555-333-3456' ]
}
1-Few (MongoDB-Referencing)
{
_id : 2,
name : 'Mike',
zip_code : '30062',
phone_numbers : [ 2, 3 ]
}
{
_id : 2,
user_id : 2,
phone_number : '555-222-2345'
}
{
_id : 3,
user_id : 2,
phone_number : '555-333-3456'
}
● With referencing we lose transactionality.
● We need either:
○ More than one query
○ Or $lookup (joins)
● This approach performs worse than embedding.
● If we have to read our data frequently, it is better to embed it.
● Flexibility to project only the desired fields.
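The cost of referencing can be sketched by counting lookups. The plain JavaScript below uses in-memory arrays as stand-ins for the two collections, and the helper `loadUserWithPhones` and the `lookups` counter are hypothetical. It illustrates the "more than one query" option, not `$lookup`.

```javascript
// In-memory stand-ins for the user and phone number collections (not a real database).
const users = [{ _id: 2, name: 'Mike', zip_code: '30062', phone_numbers: [2, 3] }];
const phoneNumbers = [
  { _id: 2, user_id: 2, phone_number: '555-222-2345' },
  { _id: 3, user_id: 2, phone_number: '555-333-3456' },
];

let lookups = 0; // counts simulated round trips to the database

// Hypothetical helper: with referencing, showing a user takes two lookups.
function loadUserWithPhones(userId) {
  lookups += 1; // first query: the user document
  const user = users.find((u) => u._id === userId);
  if (!user) return null;

  lookups += 1; // second query: the referenced phone number documents
  const phones = phoneNumbers
    .filter((p) => user.phone_numbers.includes(p._id))
    .map((p) => p.phone_number);

  return { ...user, phone_numbers: phones };
}

const mike = loadUserWithPhones(2);
console.log(mike.phone_numbers, `lookups: ${lookups}`);
```

The result has the same shape as the embedded document, which would have been available after a single lookup.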
N-1
{
_id : 2,
name : 'Mike',
zip_code : '30062',
phone_numbers : [ 2, 3 ],
address : '13, Rue del Percebe'
}
{
_id : 1,
name : 'Rick',
zip_code : '01209',
phone_numbers : [ 2, 3 ],
address : '13, Rue del Percebe'
}
What if two people share an address?
● Does that mean that you have to store the address twice? Yes, you do have to store it twice, three times, etc.
● This is better than making unnecessary joins. The extra disk space you are going to need will make your queries faster.
1-Many
Case: A blog with hundreds, or even thousands, of comments for a given post.
Embedding carries significant penalties:
● The larger a document is, the more RAM it uses. The fewer documents fit in RAM, the more likely the server is to page fault to retrieve documents, and ultimately page faults lead to random disk I/O.
● Growing documents must eventually be copied to larger spaces.
● The document never stops growing.
● MongoDB documents have a hard size limit of 16MB.
Referencing:
● The document will not grow because we will have one document per comment in a second collection.
● Suited to very high or unpredictable one-to-many relationships.
Solution: We may only wish to display the first three comments when showing a blog entry; more is simply wasting RAM.
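One possible reading of this solution is a hybrid model: every comment gets its own document, and the post keeps a copy of only the first three. The sketch below shows that idea on in-memory objects. The field names `first_comments` and `comment_count` and the helper `addComment` are assumptions, not part of the original slides.

```javascript
const PREVIEW_SIZE = 3; // number of comments shown with the blog entry

// Hypothetical in-memory stand-ins for a posts and a comments collection.
const post = { _id: 'post-1', title: 'Schema design', first_comments: [], comment_count: 0 };
const comments = [];

// Hypothetical helper: store every comment in its own document and
// copy only the first PREVIEW_SIZE comments into the post.
function addComment(text) {
  const comment = { _id: comments.length + 1, post_id: post._id, text };
  comments.push(comment);
  post.comment_count += 1;
  if (post.first_comments.length < PREVIEW_SIZE) {
    post.first_comments.push({ _id: comment._id, text });
  }
}

['one', 'two', 'three', 'four', 'five'].forEach((text) => addComment(text));

// The post stays small no matter how many comments arrive.
console.log(post.first_comments.length, comments.length);
```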
Many-Many
● We will embed a list of _id values in both directions
● We no longer have redundant information
Product
{ _id : 'My product',
category_ids : [ 'My category',... ]
}
Category
{ _id : 'My category',
product_ids : [ 'My product', … ]
}
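A minimal sketch of how the two `_id` lists can be followed in either direction is shown below. It uses in-memory arrays with one product and one category for illustration, and the helpers `categoriesOf` and `productsIn` are hypothetical names.

```javascript
// In-memory stand-ins for the Product and Category collections (not a real database).
const products = [{ _id: 'My product', category_ids: ['My category'] }];
const categories = [{ _id: 'My category', product_ids: ['My product'] }];

// Hypothetical helper: categories referenced by a product.
function categoriesOf(productId) {
  const product = products.find((p) => p._id === productId);
  return product ? categories.filter((c) => product.category_ids.includes(c._id)) : [];
}

// Hypothetical helper: products referenced by a category.
function productsIn(categoryId) {
  const category = categories.find((c) => c._id === categoryId);
  return category ? products.filter((p) => category.product_ids.includes(p._id)) : [];
}

console.log(categoriesOf('My product').map((c) => c._id));
console.log(productsIn('My category').map((p) => p._id));
```

Because the list exists on both sides, either direction starts from a single document. The trade-off is that both lists must be updated whenever a link changes.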
Recap
● Avoid round trips to the database.
● User events should only generate a small number of queries.
● Use arrays when needed and of course when they won’t grow indefinitely.
● Don’t just migrate relational schemas.
● Data that is queried together should be in the same document whenever possible.
● Store the last login time, plus the shopping cart, in the user document since that is all
we need for the landing page.
● Embedding for performance and atomicity (transactionality).
● Referencing for huge relationships.
Ultimately, the decision depends on the access patterns of your application.
Questions?
Thank you!
Thank you for your attention!

More Related Content

What's hot

MongoDB 101
MongoDB 101MongoDB 101
MongoDB 101
Abhijeet Vaikar
 
MongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big Compute
MongoDB
 
Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB
 Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB
Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB
MongoDB
 
Conceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producciónConceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producción
MongoDB
 
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
Gianfranco Palumbo
 
MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101
MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101
MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101
MongoDB
 
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentosConceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
MongoDB
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDB
MongoDB
 
Back to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationBack to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB Application
MongoDB
 
Doing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookupDoing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookup
MongoDB
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
MongoDB
 
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkConceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
MongoDB
 
Mongo db – document oriented database
Mongo db – document oriented databaseMongo db – document oriented database
Mongo db – document oriented database
Wojciech Sznapka
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
MongoDB
 
Webinar: Getting Started with MongoDB - Back to Basics
Webinar: Getting Started with MongoDB - Back to BasicsWebinar: Getting Started with MongoDB - Back to Basics
Webinar: Getting Started with MongoDB - Back to Basics
MongoDB
 
Jumpstart: Introduction to MongoDB
Jumpstart: Introduction to MongoDBJumpstart: Introduction to MongoDB
Jumpstart: Introduction to MongoDB
MongoDB
 
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
MongoDB and Spark
MongoDB and SparkMongoDB and Spark
MongoDB and Spark
Norberto Leite
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB
 
User Data Management with MongoDB
User Data Management with MongoDB User Data Management with MongoDB
User Data Management with MongoDB
MongoDB
 

What's hot (20)

MongoDB 101
MongoDB 101MongoDB 101
MongoDB 101
 
MongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big Compute
 
Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB
 Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB
Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB
 
Conceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producciónConceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producción
 
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
 
MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101
MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101
MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101
 
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentosConceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDB
 
Back to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationBack to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB Application
 
Doing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookupDoing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookup
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
 
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkConceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
 
Mongo db – document oriented database
Mongo db – document oriented databaseMongo db – document oriented database
Mongo db – document oriented database
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
 
Webinar: Getting Started with MongoDB - Back to Basics
Webinar: Getting Started with MongoDB - Back to BasicsWebinar: Getting Started with MongoDB - Back to Basics
Webinar: Getting Started with MongoDB - Back to Basics
 
Jumpstart: Introduction to MongoDB
Jumpstart: Introduction to MongoDBJumpstart: Introduction to MongoDB
Jumpstart: Introduction to MongoDB
 
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB and Spark
MongoDB and SparkMongoDB and Spark
MongoDB and Spark
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
User Data Management with MongoDB
User Data Management with MongoDB User Data Management with MongoDB
User Data Management with MongoDB
 

Similar to MongoDB Schema Design Tips & Tricks

An introduction to MongoDB
An introduction to MongoDBAn introduction to MongoDB
An introduction to MongoDB
César Trigo
 
An introduction to MongoDB by César Trigo #OpenExpoDay 2014
An introduction to MongoDB by César Trigo #OpenExpoDay 2014An introduction to MongoDB by César Trigo #OpenExpoDay 2014
An introduction to MongoDB by César Trigo #OpenExpoDay 2014
OpenExpoES
 
MongoDB Workshop Universidad de Huelva
MongoDB Workshop Universidad de HuelvaMongoDB Workshop Universidad de Huelva
MongoDB Workshop Universidad de Huelva
Juan Antonio Roy Couto
 
MongoDB DOC v1.5
MongoDB DOC v1.5MongoDB DOC v1.5
MongoDB DOC v1.5
Tharun Srinivasa
 
MongoDB 3.2 - a giant leap. What’s new?
MongoDB 3.2 - a giant leap. What’s new?MongoDB 3.2 - a giant leap. What’s new?
MongoDB 3.2 - a giant leap. What’s new?
Binary Studio
 
MongoDB
MongoDBMongoDB
MongoDB
wiTTyMinds1
 
[MongoDB.local Bengaluru 2018] Jumpstart: Introduction to Schema Design
[MongoDB.local Bengaluru 2018] Jumpstart: Introduction to Schema Design[MongoDB.local Bengaluru 2018] Jumpstart: Introduction to Schema Design
[MongoDB.local Bengaluru 2018] Jumpstart: Introduction to Schema Design
MongoDB
 
MongoDB Basics Unileon
MongoDB Basics UnileonMongoDB Basics Unileon
MongoDB Basics Unileon
Juan Antonio Roy Couto
 
Jumpstart: Introduction to Schema Design
Jumpstart: Introduction to Schema DesignJumpstart: Introduction to Schema Design
Jumpstart: Introduction to Schema Design
MongoDB
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
Krivoy Rog IT Community
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
MongoDB
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB
 
MongoDB Introduction - Document Oriented Nosql Database
MongoDB Introduction - Document Oriented Nosql DatabaseMongoDB Introduction - Document Oriented Nosql Database
MongoDB Introduction - Document Oriented Nosql Database
Sudhir Patil
 
MongoDB Design Patterns
MongoDB Design PatternsMongoDB Design Patterns
MongoDB Design Patterns
Haim Michael
 
New paradigms
New paradigmsNew paradigms
Workflow Engines + Luigi
Workflow Engines + LuigiWorkflow Engines + Luigi
Workflow Engines + Luigi
Vladislav Supalov
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data Presentation
MongoDB
 
MongoDB NoSQL - Developer Guide
MongoDB NoSQL - Developer GuideMongoDB NoSQL - Developer Guide
MongoDB NoSQL - Developer Guide
Shiv K Sah
 
Mongodb
MongodbMongodb
Mongodb
Apurva Vyas
 
Precog & MongoDB User Group: Skyrocket Your Analytics
Precog & MongoDB User Group: Skyrocket Your Analytics Precog & MongoDB User Group: Skyrocket Your Analytics
Precog & MongoDB User Group: Skyrocket Your Analytics
MongoDB
 

Similar to MongoDB Schema Design Tips & Tricks (20)

An introduction to MongoDB
An introduction to MongoDBAn introduction to MongoDB
An introduction to MongoDB
 
An introduction to MongoDB by César Trigo #OpenExpoDay 2014
An introduction to MongoDB by César Trigo #OpenExpoDay 2014An introduction to MongoDB by César Trigo #OpenExpoDay 2014
An introduction to MongoDB by César Trigo #OpenExpoDay 2014
 
MongoDB Workshop Universidad de Huelva
MongoDB Workshop Universidad de HuelvaMongoDB Workshop Universidad de Huelva
MongoDB Workshop Universidad de Huelva
 
MongoDB DOC v1.5
MongoDB DOC v1.5MongoDB DOC v1.5
MongoDB DOC v1.5
 
MongoDB 3.2 - a giant leap. What’s new?
MongoDB 3.2 - a giant leap. What’s new?MongoDB 3.2 - a giant leap. What’s new?
MongoDB 3.2 - a giant leap. What’s new?
 
MongoDB
MongoDBMongoDB
MongoDB
 
[MongoDB.local Bengaluru 2018] Jumpstart: Introduction to Schema Design
[MongoDB.local Bengaluru 2018] Jumpstart: Introduction to Schema Design[MongoDB.local Bengaluru 2018] Jumpstart: Introduction to Schema Design
[MongoDB.local Bengaluru 2018] Jumpstart: Introduction to Schema Design
 
MongoDB Basics Unileon
MongoDB Basics UnileonMongoDB Basics Unileon
MongoDB Basics Unileon
 
Jumpstart: Introduction to Schema Design
Jumpstart: Introduction to Schema DesignJumpstart: Introduction to Schema Design
Jumpstart: Introduction to Schema Design
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
 
MongoDB Introduction - Document Oriented Nosql Database
MongoDB Introduction - Document Oriented Nosql DatabaseMongoDB Introduction - Document Oriented Nosql Database
MongoDB Introduction - Document Oriented Nosql Database
 
MongoDB Design Patterns
MongoDB Design PatternsMongoDB Design Patterns
MongoDB Design Patterns
 
New paradigms
New paradigmsNew paradigms
New paradigms
 
Workflow Engines + Luigi
Workflow Engines + LuigiWorkflow Engines + Luigi
Workflow Engines + Luigi
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data Presentation
 
MongoDB NoSQL - Developer Guide
MongoDB NoSQL - Developer GuideMongoDB NoSQL - Developer Guide
MongoDB NoSQL - Developer Guide
 
Mongodb
MongodbMongodb
Mongodb
 
Precog & MongoDB User Group: Skyrocket Your Analytics
Precog & MongoDB User Group: Skyrocket Your Analytics Precog & MongoDB User Group: Skyrocket Your Analytics
Precog & MongoDB User Group: Skyrocket Your Analytics
 

Recently uploaded

原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
wyddcwye1
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 

Recently uploaded (20)

原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data

MongoDB Schema Design Tips & Tricks

  • 1. MongoDB Schema Design Tips & Tricks Grupo Undanet August 2017, Salamanca
  • 2. Who am I
      Juan Roy, MongoDB DBA at Grupo Undanet
      Twitter: @juanroycouto
      Email: juanroycouto@gmail.com
  • 3. Agenda
      ● What is MongoDB
      ● What is a JSON Document
      ● What a Document Must Contain
      ● Relational Approach vs Document Model
      ● Normalization vs Denormalization
      ● Embedding Documents
      ● Things to Keep in Mind
      ● Goals
      ● Over Normalization
      ● Overloaded Documents
      ● Working Set
      ● Historic Information
      ● 1-1
      ● 1-Few (Embedding & Referencing)
      ● N-1
      ● 1-Many
      ● Many-Many
      ● Recap
  • 4. What is MongoDB
      ● Non-Relational Database
      ● NoSQL Multipurpose Database
      ● Main Characteristics:
        ○ Scalability
        ○ High Availability
        ○ Automatic Failover
        ○ …
      ● Document-based (JSON)

      SQL            MongoDB
      Database       Database
      Table          Collection
      Row (record)   Document
  • 5. What is a JSON Document
      {
        "_id" : ObjectId("59400587962fe33db2194129"),
        "description" : "MICHELIN 285/30 ZR21 PILOT SUPER SPORT 2012",
        "date" : ISODate("2017-08-28T04:02:32Z"),
        "property" : {
          "tag" : {
            "noisebands" : "1",
            "rollingresistance" : "B",
            "noise" : "69",
            "wetgrip" : "A"
          },
          "ratio" : 30
        },
        "ecotasa" : [
          { "country" : "724", "price" : NumberDecimal("1.380000") },
          { "country" : "620", "price" : NumberDecimal("0.000000") }
        ],
        "location" : {
          "type" : "Point",
          "coordinates" : [ -5.724332, 40.959219 ]
        }
      }

      Field types shown in the example: _id (_id), string (description), date (date), subdocument (property), number (ratio), array (ecotasa), geo-location (location).
  • 6. What a Document Must Contain
      ● Ideally
        ○ All the data related to the application's principal item
        ○ 1 document per item
      ● In practice
        ○ The most frequently accessed data

      Application   Principal Item
      Catalog       Article
      Finance       Client
  • 7. Relational Approach vs Document Model
      {
        "_id" : ObjectId("59400587962fe33db2194129"),
        "description" : "MICHELIN 285/30 ZR21 PILOT SUPER SPORT 2012",
        "date" : ISODate("2017-08-28T04:02:32Z"),
        "property" : {
          "tag" : {
            "noisebands" : "1",
            "rollingresistance" : "B",
            "noise" : "69",
            "wetgrip" : "A"
          },
          "ratio" : 30
        },
        "ecotasa" : [
          { "country" : "724", "price" : NumberDecimal("1.380000") },
          { "country" : "620", "price" : NumberDecimal("0.000000") }
        ],
        "location" : {
          "type" : "Point",
          "coordinates" : [ -5.724332, 40.959219 ]
        }
      }
  • 8. Normalization vs Denormalization
      Normalization (one collection per concept):
        People
          { _id : 1, name : 'Peter', city : 'Salamanca' }
        Motorbikes
          { _id : 1, owner : 1, color : 'red', model : 'Suzuki' }
          { _id : 2, owner : 1, color : 'black', model : 'Harley Davidson' }

      Denormalization (motorbikes embedded in the person):
        People
          {
            _id : 1,
            name : 'Peter',
            city : 'Salamanca',
            motorbikes : [
              { model : 'Suzuki', color : 'red' },
              { model : 'Harley Davidson', color : 'black' }
            ]
          }
  • 9. Embedding Documents
      Before (two collections):
        People
          { _id : 1, name : 'Peter', city : 'Salamanca' }
        Motorbikes
          { _id : 1, owner : 1, color : 'red', model : 'Suzuki' }
          { _id : 2, owner : 1, color : 'black', model : 'Harley Davidson' }

      After (one collection with embedded documents):
        People
          {
            _id : 1,
            name : 'Peter',
            city : 'Salamanca',
            motorbikes : [
              { model : 'Suzuki', color : 'red' },
              { model : 'Harley Davidson', color : 'black' }
            ]
          }
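
The step from two collections to one embedded document can be sketched in plain JavaScript. This is only an illustration on in-memory arrays, not a MongoDB call; the helper name `embedMotorbikes` is an assumption made for this example.

```javascript
// Normalized input: one array per concept, linked by the `owner` field.
const people = [{ _id: 1, name: 'Peter', city: 'Salamanca' }];
const motorbikes = [
  { _id: 1, owner: 1, color: 'red', model: 'Suzuki' },
  { _id: 2, owner: 1, color: 'black', model: 'Harley Davidson' },
];

// Hypothetical helper: copy each person's motorbikes into an embedded array.
// The `_id` and `owner` fields of the motorbike are dropped because the
// parent document now carries that relationship.
function embedMotorbikes(peopleDocs, motorbikeDocs) {
  return peopleDocs.map((person) => ({
    ...person,
    motorbikes: motorbikeDocs
      .filter((bike) => bike.owner === person._id)
      .map(({ model, color }) => ({ model, color })),
  }));
}

const embedded = embedMotorbikes(people, motorbikes);
console.log(JSON.stringify(embedded, null, 2));
```

After this transformation a single read of Peter's document returns his motorbikes as well, which is the point of embedding.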
  • 10. Things to Keep in Mind
      ● Avoid the relational approach
      ● What will happen if we scale
      ● Size of:
        ○ Data
        ○ Index
        ○ Document
      ● How will users access the data
        ○ Normal users
        ○ Machine Learning
        ○ Business Intelligence
  • 11. Goals
      ● Performance
      ● Scalability
      ● Simplicity
  • 12. Over Normalization
      ● Over-normalization happens when the relational model is moved directly into MongoDB.
      ● In the relational world it is common to have one table per concept, because tables do not have arrays.
      ● A single action then requires multiple queries, instead of querying the data just once.
  • 13. Overloaded Documents
      ● This problem can arise if the application is packing lots of rarely used data into its frequently accessed documents.
      ● If your application is packing rarely used data into a document that needs to be touched frequently, it is more likely to evict other important data from the cache when that document gets read.
      ● Multiply this across a collection and the net result is that the server could be paging a lot more data than necessary in order to service the application.
  • 14. Working Set
      The Working Set is the size of:
      ● Our data *
      plus
      ● Our indexes *
      * But only the size of our most accessed data

      The Working Set must fit in RAM!
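
The rule above is simple arithmetic, sketched here in JavaScript. All sizes are hypothetical and chosen only for illustration, and the function names are assumptions; in a real deployment not all of the machine's RAM is available to the database cache, so treat this as a rough upper bound.

```javascript
// All sizes below are hypothetical, chosen only to illustrate the arithmetic.
const GiB = 1024 ** 3;

// Working set = most accessed data + most accessed indexes.
function workingSetBytes({ hotDataBytes, hotIndexBytes }) {
  return hotDataBytes + hotIndexBytes;
}

// The working set fits when it is no larger than the RAM we can give it.
function fitsInRam(sizes, ramBytes) {
  return workingSetBytes(sizes) <= ramBytes;
}

const sizes = { hotDataBytes: 10 * GiB, hotIndexBytes: 3 * GiB };

console.log(workingSetBytes(sizes) / GiB); // 13
console.log(fitsInRam(sizes, 16 * GiB));   // true
console.log(fitsInRam(sizes, 8 * GiB));    // false
```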
  • 15. Working Set
      The Working Set does not fit in RAM, what should I do?
      ● Add more RAM to our machine
      ● Shard
      ● Reduce the size of our Working Set:
        ○ Limit our arrays
        ○ Limit our embedded documents
        ○ …
        ○ Benefits:
          ■ Fast data retrieval
          ■ One query brings all the information needed
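
One way to "limit our arrays" is to cap an embedded array at a fixed number of entries. The sketch below does this in plain JavaScript on an in-memory document; the helper `pushCapped` and the field `recent_logins` are assumptions made for this example. On the server, MongoDB's `$push` operator with the `$each` and `$slice` modifiers offers a comparable cap.

```javascript
// Hypothetical helper: append an item and keep only the last `maxItems`.
// `maxItems` is assumed to be a positive integer.
function pushCapped(doc, field, item, maxItems) {
  const items = [...(doc[field] || []), item];
  return { ...doc, [field]: items.slice(-maxItems) };
}

// `recent_logins` is an illustrative field, not part of the original slides.
let user = { _id: 2, name: 'Mike', recent_logins: [] };

const days = ['2017-08-01', '2017-08-02', '2017-08-03', '2017-08-04', '2017-08-05'];
for (const day of days) {
  user = pushCapped(user, 'recent_logins', day, 3);
}

console.log(user.recent_logins); // the three most recent entries
```

The document stops growing once the cap is reached, so its contribution to the working set stays bounded.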
  • 16. Historic Information
      ● When data grows continuously (historical data) and we embed it in our main collection, each document carries a lot of information that is not needed habitually. We may still want to store it for analytics purposes, so we keep it away from the user document.
      ● This is not the case for information with limited growth (addresses, phone numbers, etc.).
  • 17. 1-1
      Relational table:
        id   name   phone_number   zip_code
        1    Rick   555-111-1234   01209
        2    Mike   555-222-2345   30062

      Users
        { _id : 1, name : 'Rick', phone_number : '555-111-1234', zip_code : '01209' }
        { _id : 2, name : 'Mike', phone_number : '555-222-2345', zip_code : '30062' }
  • 18. 1-Few
      ● Referencing (or Normalization)
        ○ To show a user's information we need joins (or more than one query). This implies random seeks, a very low-performance operation!
      ● Embedding (or Denormalization)
        ○ We can avoid joins via denormalization. This implies redundant data and more complex applications in order not to generate inconsistencies.
        ○ Arrays help us avoid redundancy. This solution gives us performance benefits.
        ○ With denormalization we have many data model possibilities, which makes it more difficult to define our model.
  • 19. 1-Few
      id   name   zip_code
      1    Rick   01209
      2    Mike   30062

      id   user_id   phone_number
      1    1         555-111-1234
      2    2         555-222-2345
      3    2         555-333-3456
  • 20. 1-Few (MongoDB-Embedding)
      { _id : 2, name : 'Mike', zip_code : '30062', phone_numbers : [ '555-222-2345', '555-333-3456' ] }

      ● The approach that gives us the best performance and data consistency guarantees.
      ● Locality: MongoDB stores documents contiguously on disk, so putting all the data you need into one document means that you are never more than one seek away from everything you need.
      ● Atomicity and Isolation: with embedding we get atomicity (transactionality), because the related data lives in a single document.
  • 21. 1-Few (MongoDB-Referencing)
      { _id : 2, name : 'Mike', zip_code : '30062', phone_numbers : [ 2, 3 ] }

      { _id : 2, user_id : 2, phone_number : '555-222-2345' }
      { _id : 3, user_id : 2, phone_number : '555-333-3456' }

      ● With referencing we lose transactionality.
      ● We need:
        ○ More than one query
        ○ To use $lookup (joins)
      ● This approach is worse than embedding for performance.
      ● If we have to read our data frequently, it is better to embed it.
      ● In return, referencing gives flexibility to project only the desired fields.
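
The "more than one query" cost of referencing can be made visible with a small in-memory sketch. The arrays stand in for two collections, and `findOne`, `findMany`, `loadUserWithPhones` and `queryCount` are hypothetical names introduced for this example; they only mimic round trips to a database and are not driver calls.

```javascript
// In-memory stand-ins for the two collections from the slides.
const users = [
  { _id: 1, name: 'Rick', zip_code: '01209', phone_numbers: [1] },
  { _id: 2, name: 'Mike', zip_code: '30062', phone_numbers: [2, 3] },
];
const phoneNumbers = [
  { _id: 1, user_id: 1, phone_number: '555-111-1234' },
  { _id: 2, user_id: 2, phone_number: '555-222-2345' },
  { _id: 3, user_id: 2, phone_number: '555-333-3456' },
];

// Each helper call counts as one simulated round trip.
let queryCount = 0;
function findOne(collection, predicate) {
  queryCount += 1;
  return collection.find(predicate) || null;
}
function findMany(collection, predicate) {
  queryCount += 1;
  return collection.filter(predicate);
}

// Application-side join: first the user, then the referenced phone numbers.
function loadUserWithPhones(userId) {
  const user = findOne(users, (u) => u._id === userId);
  if (user === null) return null;
  const phones = findMany(phoneNumbers, (p) => user.phone_numbers.includes(p._id));
  return { ...user, phone_numbers: phones.map((p) => p.phone_number) };
}

const mike = loadUserWithPhones(2);
console.log(mike.phone_numbers, 'queries:', queryCount);
```

With the embedded model from the previous slide the same information comes back from a single read.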
  • 22. N-1
      { _id : 2, name : 'Mike', zip_code : '30062', phone_numbers : [ 2, 3 ], address : '13, Rue del Percebe' }
      { _id : 1, name : 'Rick', zip_code : '01209', phone_numbers : [ 1 ], address : '13, Rue del Percebe' }

      What if two people share an address?
      ● Does that mean that you have to store the address twice? Yes, you do have to store it twice, three times, etc.
      ● This is better than making unnecessary joins. The extra disk space you need will make your queries faster.
  • 23. 1-Many
      Case: a blog with hundreds, or even thousands, of comments for a given post.

      Embedding carries significant penalties:
      ● The larger a document is, the more RAM it uses. The fewer documents in RAM, the more likely the server is to page fault to retrieve documents, and ultimately page faults lead to random disk I/O.
      ● Growing documents must eventually be copied to larger spaces.
      ● The document never stops growing.
      ● MongoDB documents have a hard size limit of 16MB.

      Referencing:
      ● The document will not grow, because we will have one document per comment in a second collection.
      ● Suited to one-to-many relationships with very high or unpredictable cardinality.

      Solution: we may only wish to display the first three comments when showing a blog entry; loading more is simply wasting RAM.
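
A minimal sketch of the referencing approach for the blog case, in plain JavaScript on in-memory arrays. The field names `post_id` and `position` and the helper `firstComments` are assumptions made for this example. Against a real collection, a similar effect is usually obtained by combining `find()` with the cursor's `sort()` and `limit()` methods.

```javascript
// The post stays small: comments are not embedded in it.
const posts = [{ _id: 'post-1', title: 'Schema design' }];

// One document per comment in a second "collection" (hypothetical data).
const comments = Array.from({ length: 1000 }, (_, i) => ({
  _id: i + 1,
  post_id: 'post-1',
  position: i + 1,
  text: `Comment ${i + 1}`,
}));

// Hypothetical helper: return only the first `limit` comments of a post.
function firstComments(postId, limit) {
  return comments
    .filter((c) => c.post_id === postId)
    .sort((a, b) => a.position - b.position)
    .slice(0, limit);
}

const preview = firstComments('post-1', 3);
console.log(posts[0].title, preview.map((c) => c.text));
```

Only three comment documents are materialized for the page, no matter how many comments the post accumulates.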
  • 24. Many-Many
      ● We will embed a list of _id values in both directions
      ● We no longer have redundant information

      Product
        { _id : 'My product', category_ids : [ 'My category',... ] }
      Category
        { _id : 'My category', product_ids : [ 'My product', … ] }
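
Storing `_id` lists in both directions lets either side be resolved without scanning the other for back-references, but the two lists must be kept in agreement by the application. The sketch below illustrates both points on in-memory arrays; `'Other product'`, `'Other category'` and the three helper functions are hypothetical additions for this example.

```javascript
const products = [
  { _id: 'My product', category_ids: ['My category', 'Other category'] },
  { _id: 'Other product', category_ids: ['My category'] },
];
const categories = [
  { _id: 'My category', product_ids: ['My product', 'Other product'] },
  { _id: 'Other category', product_ids: ['My product'] },
];

// Resolve the relationship from the product side.
function categoriesOf(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

// Resolve the relationship from the category side.
function productsIn(categoryId) {
  const category = categories.find((c) => c._id === categoryId);
  if (!category) return [];
  return products.filter((p) => category.product_ids.includes(p._id));
}

// Check that every reference has a matching back-reference.
function isConsistent() {
  const forward = products.every((p) =>
    p.category_ids.every((id) => {
      const c = categories.find((cat) => cat._id === id);
      return c !== undefined && c.product_ids.includes(p._id);
    }));
  const backward = categories.every((c) =>
    c.product_ids.every((id) => {
      const p = products.find((prod) => prod._id === id);
      return p !== undefined && p.category_ids.includes(c._id);
    }));
  return forward && backward;
}

console.log(categoriesOf('My product').map((c) => c._id));
console.log(productsIn('My category').map((p) => p._id));
console.log(isConsistent());
```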
  • 25. Recap
      ● Avoid round trips to the database.
      ● User events should only generate a small number of queries.
      ● Use arrays when needed, provided they will not grow indefinitely.
      ● Don't just migrate relational schemas.
      ● Data that is queried together should be in the same document whenever possible.
      ● Store the last login time, plus the shopping cart, in the user document, since that is all we need for the landing page.
      ● Embedding for performance and atomicity (transactionality).
      ● Referencing for huge relationships.

      Ultimately, the decision depends on the access patterns of your application.
  • 27. Thank you for your attention!