MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB

#MDBlocal
Complete Methodology to Data Modeling for MongoDB
Yulia Genkina, Curriculum Engineer, MongoDB
München

Talk Structure
• MongoDB Data Modeling Methodology:
  • Entity Relationships
  • Schema Patterns
• Methodology Use Case Example
• Conclusions and other considerations

Data Modeling in the Tabular World
Step 1: Define the schema.
Step 2: Develop the application and queries.
Concerns:
- One possible solution for the initial schema.
- Final schema is most likely denormalized.
- Schema evolution is difficult and likely requires downtime.
- Performance drops as schema evolves.

Data Modeling in the Document World
Step 1: Develop the application and queries.
Step 2: Define the schema.
Step 3: Improve the application.
Step 4: Improve the schema.
Step 5: Repeat steps 3 and 4 indefinitely.
Step 6: Profit.

Data Modeling
Step-by-step Guide

Evaluate the application workload
Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats
Outputs:
• Data size.
• A list of operations ranked by importance.
• A list of database queries and indexes.
• A list of current operations and assumptions.

Evaluate the application workload
Inputs: business domain expertise; current and predicted scenarios; production logs and stats.
Outputs:
• Data size.
• A list of operations ranked by importance.
• A list of database queries and indexes.
• A list of current operations, assumptions, and growth projections.

Map out the entities and their relationships
Outputs:
• CRD: Collection Relationship Diagrams.
• A list of collections with document fields for each collection.

Relationships
Brief Introduction

Example 1: Entities and Relationships in a Blog.

Example 1: Schema Outline for a Blog
Two options: Embed All, or Embed & Link.
The choice depends on the queries: by articles only, or by articles or users.

Example 2: Entities for a Library Application (Normalized form)
book: title, isbn, language, published_by, author
user: username, first_name, last_name
author: first_name, last_name

Example 2: Entities for a Library Application (De-normalized form)
book: title, isbn, language, published_by, author (first_name, last_name)
user: username, first_name, last_name
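
As a rough sketch of the two forms, the documents might look like the Python dictionaries below. The field names come from the slides; the `_id` values, the sample data, and the helper `embed_author` are hypothetical and only illustrate the idea.

```python
# Normalized form: the book refers to its author by a (hypothetical) _id.
author = {"_id": "a1", "first_name": "Jane", "last_name": "Doe"}
book_normalized = {
    "title": "Example Title",
    "isbn": "000-0000000000",
    "language": "English",
    "published_by": "Example Press",
    "author": "a1",  # reference to author["_id"]
}


def embed_author(book, authors_by_id):
    """Return a de-normalized copy of the book with the author embedded."""
    author_doc = authors_by_id[book["author"]]
    embedded = {k: v for k, v in author_doc.items() if k != "_id"}
    return {**book, "author": embedded}


# De-normalized form: the author's fields live inside the book document.
book_denormalized = embed_author(book_normalized, {author["_id"]: author})
```

In the de-normalized form a single read of the book returns the author's name as well, at the cost of repeating author data in every book.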

Example 2: Embedding
• Can be used for a 1-N or an N-N relationship.
• Great for read performance.
• One atomic operation retrieves all necessary
information.

Example 2: Linking.
• More, smaller documents.
• Can make queries by ID very simple.
• Can be used for a 1-N or an N-N relationship.
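
A minimal sketch of linking for an N-N relationship, using plain dictionaries as an in-memory stand-in for two collections. The `_id` values, the `authors` array field, and both helper functions are assumptions made for illustration, not something shown in the slides.

```python
# Linked (referenced) documents; all _id values and sample data are made up.
authors = {
    "a1": {"_id": "a1", "first_name": "Jane", "last_name": "Doe"},
    "a2": {"_id": "a2", "first_name": "John", "last_name": "Roe"},
}
books = {
    "b1": {"_id": "b1", "title": "First Book", "authors": ["a1", "a2"]},
    "b2": {"_id": "b2", "title": "Second Book", "authors": ["a1"]},
}


def authors_of(book_id):
    """Resolve the links of one book with simple lookups by _id."""
    return [authors[a]["last_name"] for a in books[book_id]["authors"]]


def books_by(author_id):
    """Follow the N-N relationship in the other direction."""
    return sorted(b["title"] for b in books.values() if author_id in b["authors"])
```

Each document stays small, but reading a book together with its authors takes more than one lookup.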

Evaluate the application workload
Inputs: business domain expertise; current and predicted scenarios; production logs and stats.
Outputs:
• Data size.
• A list of operations ranked by importance.
• A list of database queries and indexes.
• A list of current operations, assumptions, and growth projections.

Map out the entities and their relationships
Outputs:
• CRD: Collections Relationship Diagram.
• A list of collections with document fields and shapes for each collection.

Finalize schema for each collection
• Identify and apply relevant schema patterns.

Schema Versioning Pattern
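
The slide itself carries only the pattern name. The general idea is to store a version field in each document so that old and new shapes can coexist and be upgraded gradually. The sketch below assumes a hypothetical `schema_version` field and a made-up change (a single `phone` field becoming a `contacts` list); neither shape comes from the talk.

```python
def upgrade_user(doc):
    """Bring a user document up to schema version 2 (hypothetical shapes)."""
    version = doc.get("schema_version", 1)  # a missing field means version 1
    if version >= 2:
        return doc  # already in the current shape
    # Version 1 stored one flat phone field; version 2 stores a list of contacts.
    upgraded = {k: v for k, v in doc.items() if k != "phone"}
    contacts = []
    if "phone" in doc:
        contacts.append({"type": "phone", "value": doc["phone"]})
    upgraded["contacts"] = contacts
    upgraded["schema_version"] = 2
    return upgraded
```

The application can call such a function whenever it reads a document, so no downtime is needed to migrate everything at once.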

Computed Pattern
CPU work
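
One way to read "CPU work" here: do the computation once on write and store the result, instead of recomputing it on every read. The sketch below keeps a running review count and average on an item; the field names `review_count`, `rating_sum`, and `avg_rating` are assumptions chosen for illustration.

```python
def add_review(item, rating):
    """Update the precomputed totals on write, so reads do no aggregation."""
    item["review_count"] += 1
    item["rating_sum"] += rating
    item["avg_rating"] = item["rating_sum"] / item["review_count"]
    return item


item = {"name": "example item", "review_count": 0, "rating_sum": 0, "avg_rating": None}
for rating in (5, 3, 4):
    add_review(item, rating)
```

Reads then return `avg_rating` directly, which suits workloads with many more reads than writes.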

Bucket Pattern
• Tabular Approach: new document for each sensor reading.
• Document Approach: a document per time unit per sensor.

Bucket Pattern Schema
• Bucket per Hour
• Computed Pattern
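
A minimal sketch of an hourly bucket combined with computed totals, using a dictionary as a stand-in for the collection. The field names (`sensor_id`, `hour`, `readings`, `count`, `sum`) and the helper `add_reading` are hypothetical; the slides do not show the exact schema.

```python
from datetime import datetime


def add_reading(buckets, sensor_id, ts, value):
    """Append a reading to the bucket for (sensor, hour), creating it if needed."""
    hour = ts.replace(minute=0, second=0, microsecond=0)
    bucket = buckets.setdefault(
        (sensor_id, hour),
        {"sensor_id": sensor_id, "hour": hour, "readings": [], "count": 0, "sum": 0.0},
    )
    bucket["readings"].append({"ts": ts, "value": value})
    # Computed Pattern: keep count and sum up to date on every write.
    bucket["count"] += 1
    bucket["sum"] += value
    return bucket


buckets = {}
add_reading(buckets, "s1", datetime(2019, 1, 1, 10, 5), 20.0)
add_reading(buckets, "s1", datetime(2019, 1, 1, 10, 45), 22.0)
add_reading(buckets, "s1", datetime(2019, 1, 1, 11, 0), 21.0)
```

Three readings end up in two documents here, one per sensor per hour, each carrying its own precomputed count and sum.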

Solution with Schema Versioning, Subset, Computed, and Bucket Patterns

Other Patterns and Where to Find Them
• Read more about patterns on our blog: http://bit.ly/building-with-patterns
• Take the Data Modeling with MongoDB Course: https://university.mongodb.com/courses/M320/about
• Some more patterns to explore:
  • Approximation
  • Attribute
  • Document Versioning
  • Extended Reference
  • Outlier
  • Preallocated
  • Polymorphic

Design an Online Shopping App: MongoMart
A Use Case Example

Evaluate the application workload
Inputs: business domain expertise; current and predicted scenarios; production logs and stats.
Outputs:
• Data size.
• A list of operations ranked by importance.
• A list of database queries.
• A list of current operations, assumptions, and growth projections.

Evaluate Application Workload
1000 stores
• 50 employees per store
• one store lookup per customer per year
10 Million items
• 100 reviews per item
• 500K updates per day (new products, price updates, ...)
100 Million user accounts
• 500K new accounts per week
• logging in 20 times a year
• looking up 100 items per year
• making 5 carts per year
• putting 4 items in the cart
• buying an average of 2 items per cart
• reviewing 2 items per year
Analytics
• 10 data scientists
• each running 10 queries a day

List and Sizing of Write Operations

ID | Description | Type | Durability | Data Life | Data Size (Bytes) | Storage Size (per day) | Average Frequency (writes/sec) | Peak Frequency (writes/sec)
W1 | user creates an account | insert | w: majority | forever | 500 | 35.7 MB | 1 | 3
W2 | application records time and user info when an item is viewed | insert | w: 0 | 5 years | 100 | 2.7 GB | 317 | 800
W3 | user adds item to cart | insert | w: majority | 1 month | 500 | 2.7 GB | 64 | 100
W4 | user creates a shopping cart | insert | w: majority | 5 years | 2000 | 2.7 GB | 16 | 40
W5 | user adds a review to an item | insert | w: 1 | 5 years | 1000 | 547 MB | 7 | 14
W6 | employee inserts new items or updates existing items in the catalog | insert or update | w: majority | forever | 500 | 250 MB | 6 | 12
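
The average frequencies and daily storage sizes in the write table appear to follow from the workload numbers on the previous slide. The sketch below is one way to reproduce them approximately; the slides do not show the calculation, so the mapping of each operation to a workload figure, the 365-day year, and the assumption of uniform load are all mine.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumption: leap years ignored
USERS = 100_000_000

# Events per day, derived from the workload numbers (assumed mapping).
events_per_day = {
    "W1": 500_000 / 7,                    # 500K new accounts per week
    "W2": USERS * 100 / DAYS_PER_YEAR,    # 100 item lookups per user per year
    "W3": USERS * 5 * 4 / DAYS_PER_YEAR,  # 5 carts per year, 4 items per cart
    "W4": USERS * 5 / DAYS_PER_YEAR,      # 5 carts per user per year
    "W5": USERS * 2 / DAYS_PER_YEAR,      # 2 reviews per user per year
    "W6": 500_000,                        # 500K catalog updates per day
}
doc_size_bytes = {"W1": 500, "W2": 100, "W3": 500, "W4": 2000, "W5": 1000, "W6": 500}

# Average writes per second and storage added per day (in MB, 1 MB = 10**6 bytes).
avg_writes_per_sec = {op: n / SECONDS_PER_DAY for op, n in events_per_day.items()}
storage_mb_per_day = {op: events_per_day[op] * doc_size_bytes[op] / 1e6 for op in events_per_day}

for op in events_per_day:
    print(op, round(avg_writes_per_sec[op], 2), round(storage_mb_per_day[op], 1))
```

The unrounded results land within one write per second of the table's averages, which suggests the table rounds them to whole numbers. Peak frequencies cannot be derived this way and have to come from logs or domain expertise.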

List and Sizing of Read Operations

ID | Description | Type | Max Latency | Execution Time | Single Doc Size (Bytes) | Average Frequency (reads/sec) | Peak Frequency (reads/sec)
R1 | user logs into the application | real-time | 5ms | | 1000 | 64 | 80
R2 | user views a specific item | real-time | 1ms | | 1000 | 317 | 800
R3 | user views a specific store | real-time | 50ms | | 1000 | 3 | 10
R4 | user views their cart | real-time | 20ms | | 2000 | 31 | 100
R5 | data scientist runs analytics | analytics | 60 secs | | | < 1 |

Data Sizing

Entity | Count | Document Size (Bytes) | Total Disk Space (Bytes) | Notes
carts | 2,500,000,000 | 2000 | 5.00E+12 | 5 years of data
categories | 100 | 100 | 1.00E+04 |
items | 10,000,000 | 1000 | 1.00E+10 |
reviews | 1,000,000,000 | 1000 | 1.00E+12 | 5 years of data
staff | 10,000 | 200 | 2.00E+06 |
stores | 200 | 1000 | 2.00E+05 |
users | 100,000,000 | 1000 | 1.00E+11 |
views | 50,000,000,000 | 50 | 2.50E+12 |
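
The "Total Disk Space" column is the entity count multiplied by the document size. A short sketch of that arithmetic follows, with the counts and sizes copied from the table. The grand total and the `largest` lookup are additions for illustration, and the calculation ignores indexes and compression.

```python
# (count, document size in bytes), copied from the data sizing table.
sizing = {
    "carts": (2_500_000_000, 2000),
    "categories": (100, 100),
    "items": (10_000_000, 1000),
    "reviews": (1_000_000_000, 1000),
    "staff": (10_000, 200),
    "stores": (200, 1000),
    "users": (100_000_000, 1000),
    "views": (50_000_000_000, 50),
}

# Raw data size per entity, then the overall total and the biggest collection.
total_bytes = {name: count * size for name, (count, size) in sizing.items()}
grand_total = sum(total_bytes.values())
largest = max(total_bytes, key=total_bytes.get)

for name, size in sorted(total_bytes.items(), key=lambda kv: -kv[1]):
    print(f"{name:<11}{size:.2E}")
```

Sorting by size makes it plain that carts, views, and reviews dominate the storage, which is where the 5-year retention assumption matters most.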

Workload Evaluation Summary
Most important queries:
• R2: user views a specific item – has to be under 1 ms.
• W3: user adds item to cart – write concern: majority.
Required indexes:
• { category: 1, item_name: 1}
• { category: 1, item_name: 1, price: 1}
• { username: 1}
Assumptions and Projections:
• Data will be stored for a maximum of 5 years.
• Number of items sold will double each year.
• Number of users will double each year.
List of Entities:
• carts
• categories
• items
• reviews
• staff
• stores
• users
• views
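
The required indexes can be written as ordered lists of (field, direction) pairs, which is the shape PyMongo's `create_index` accepts. Since MongoDB can use a compound index for queries on a leading prefix of its keys, the small helper below (hypothetical, not part of any driver) checks how the listed indexes relate to each other. It is a sketch for reasoning about the list, not a statement about what the talk intended.

```python
def is_prefix(shorter, longer):
    """True if the key list `shorter` is a leading prefix of `longer`."""
    return len(shorter) <= len(longer) and longer[: len(shorter)] == shorter


# Index specifications copied from the summary slide.
required_indexes = [
    [("category", 1), ("item_name", 1)],
    [("category", 1), ("item_name", 1), ("price", 1)],
    [("username", 1)],
]

# Report every index whose keys are a leading prefix of another index.
for i, a in enumerate(required_indexes):
    for j, b in enumerate(required_indexes):
        if i != j and is_prefix(a, b):
            print(a, "is a prefix of", b)
```

A check like this helps when reviewing an index list, because every extra index adds work to each write.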

Evaluate the application workload
Inputs: business domain expertise; current and predicted scenarios; production logs and stats.
Outputs:
• Data size.
• A list of operations ranked by importance.
• A list of database queries and indexes.
• A list of current operations, assumptions, and growth projections.

Map out the entities and their relationships
Outputs:
• CRD: Collections Relationship Diagram.
• A list of collections with document fields for each collection.

Entity Relationship Diagram

Collections Relationship Diagram (Simple)
Embed Everything!

Collections Relationship Diagram (Better)
• Embed and Link.
• Accommodate for Assumptions.
• Diagram label, shown on two collections: clear every 5 years.
Apply All the Patterns!
Patterns Used:
• Schema Versioning
• Subset
• Computed
• Bucket
• Extended Reference
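
Two of these patterns can be sketched for a shopping app as follows. The slides do not show which fields MongoMart copies or embeds, so the field names, the `limit` default, and both helper functions are assumptions made for illustration.

```python
def cart_line(item, quantity):
    """Extended Reference: copy only the item fields the cart view needs."""
    return {
        "item_id": item["_id"],
        "name": item["name"],
        "price": item["price"],
        "quantity": quantity,
    }


def with_review_subset(item, reviews, limit=10):
    """Subset: embed only the most recent reviews in the item document."""
    recent = sorted(reviews, key=lambda r: r["date"], reverse=True)[:limit]
    return {**item, "recent_reviews": recent}


item = {"_id": "i1", "name": "mug", "price": 9.5, "description": "long text"}
reviews = [
    {"date": "2019-01-01", "text": "ok"},
    {"date": "2019-03-01", "text": "great"},
    {"date": "2019-02-01", "text": "fine"},
]
line = cart_line(item, 4)
item_page = with_review_subset(item, reviews, limit=2)
```

Both helpers trade some duplication for fewer lookups on the most frequent reads; the full item and the full review list still live in their own collections.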

Conclusion
And additional considerations

Your Data Model Will Evolve
Just like your application

Tailor the Data Model
To your unique setup
• Shared hosted DB, small team: simpler data model.
• Replica Set: in between.
• Large Sharded Cluster, large team: performant data model.

Flexible Data Modeling Approach

Methodology step | For a Simpler data model focus on: | For a bit of both: | For the most Performant data model focus on:
Evaluate the application workload | The most frequent operation | Data size; the most frequent operations | Data size; the most frequent operations; the most important operations
Map out the entities and their relationships | Embedding data | Embedding and linking data | Embedding and linking data
Finalize schema for each collection | Use few patterns | Use as many patterns as necessary | Use as many patterns as necessary
