Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than the one you currently follow? In this talk, we guide you through the phases of a flexible methodology that you can apply to projects of any size, from small ones to large ones with very demanding requirements.
2. #MDBLocal
Talk Structure
• MongoDB Data Modeling Methodology:
  • Entity Relationships
  • Schema Patterns
• Methodology Use Case Example
• Conclusions and Other Considerations
Data Modeling in the Tabular World
Step 1: Define the schema.
Step 2: Develop the application and queries.
Concerns:
- Only one possible solution for the initial schema.
- The final schema is most likely denormalized.
- Schema evolution is difficult and likely requires downtime.
- Performance drops as the schema evolves.
Data Modeling in the Document World
Step 1: Develop the application and queries.
Step 2: Define the schema.
Step 3: Improve the application.
Step 4: Improve the schema.
Step 5: Repeat steps 3 and 4 indefinitely.
Step 6: Profit
Inputs:
• Production logs and stats
• Business domain expertise
• Current and predicted scenarios
Evaluate the application workload:
• Data size.
• A list of operations ranked by importance.
Outputs:
• Data size.
• A list of database queries and indexes.
• A list of current operations and assumptions.
Inputs:
• Production logs and stats
• Business domain expertise
• Current and predicted scenarios
Evaluate the application workload:
• Data size.
• A list of operations ranked by importance.
Map out the entities and their relationships:
• CRD: Collection Relationship Diagrams.
Outputs:
• A list of collections with document fields for each collection.
• Data size.
• A list of database queries and indexes.
• A list of current operations, assumptions, and growth projections.
Example 1: Schema Outline for a Blog
• Queries by articles: Embed All.
• Queries by articles or users: Embed & Link.
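The sketch below contrasts the two blog outlines. It assumes hypothetical field names (`author`, `comments`, `author_id`) that are not taken from the talk. With "Embed All", reading an article returns its author and comments in one document. With "Embed & Link", users sit in their own collection and articles point at them by `_id`, which keeps queries by user straightforward.

```python
# Hypothetical blog documents; all field names and values are illustrative.

# "Embed All": the author and comments live inside the article document,
# so a query by article returns everything in a single read.
embed_all_article = {
    "_id": "a1",
    "title": "Schema design basics",
    "author": {"username": "alice", "name": "Alice"},
    "comments": [{"username": "bob", "text": "Nice post"}],
}

# "Embed & Link": comments stay embedded, but the author is a separate
# user document referenced by _id, so users can be queried on their own.
users = {"u1": {"_id": "u1", "username": "alice", "name": "Alice"}}
linked_articles = [
    {"_id": "a1", "title": "Schema design basics", "author_id": "u1",
     "comments": [{"username": "bob", "text": "Nice post"}]},
    {"_id": "a2", "title": "Indexes", "author_id": "u1", "comments": []},
]

def articles_by_user(username):
    """Find the user first, then follow the links from articles to that user."""
    user_ids = {u["_id"] for u in users.values() if u["username"] == username}
    return [a["title"] for a in linked_articles if a["author_id"] in user_ids]

print(articles_by_user("alice"))
```

If the application only ever queries by article, the extra `users` lookup buys nothing, and the embedded shape is the simpler choice.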
Example 2: Entities for a Library Application
Normalized form:
• book: title, isbn, language, published_by, author
• user: username, first_name, last_name
• author: first_name, last_name
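Here is a minimal sketch of the normalized form, using in-memory dictionaries as stand-ins for collections. It assumes that books reference authors through a hypothetical `author_ids` array. All values are made up. Displaying a book then takes one read for the book plus one read per referenced author, which is the join the application has to perform itself.

```python
# Hypothetical normalized library data: each entity in its own "collection",
# with books pointing to authors by _id (values are illustrative).
authors = {
    1: {"_id": 1, "first_name": "Ada", "last_name": "Lovelace"},
    2: {"_id": 2, "first_name": "Alan", "last_name": "Turing"},
}
books = [
    {"_id": 100, "title": "Notes", "isbn": "000-0", "language": "en",
     "published_by": "Example Press", "author_ids": [1, 2]},
]

def book_with_authors(book_id):
    """Emulate a join: one read for the book, then one read per author."""
    book = next(b for b in books if b["_id"] == book_id)
    names = [f'{authors[a]["first_name"]} {authors[a]["last_name"]}'
             for a in book["author_ids"]]
    return {"title": book["title"], "authors": names}

print(book_with_authors(100))
```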
Example 2: Entities for a Library Application
Denormalized form:
• book: title, isbn, language, published_by, author (embedded: first_name, last_name)
• user: username, first_name, last_name
Example 2: Embedding
• Can be used for a 1-N or an N-N relationship.
• Great for read performance.
• One atomic operation retrieves all necessary
information.
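To illustrate the read-performance point, here is a sketch of the denormalized book with its authors embedded. The field names are hypothetical. A single lookup by `_id` returns everything needed to display the book. In MongoDB, a write to a single document is atomic, so the embedded authors are always updated together with the book.

```python
# Hypothetical denormalized book: author names are embedded in the book.
books = {
    100: {
        "_id": 100,
        "title": "Notes",
        "isbn": "000-0",
        "language": "en",
        "published_by": "Example Press",
        "authors": [
            {"first_name": "Ada", "last_name": "Lovelace"},
            {"first_name": "Alan", "last_name": "Turing"},
        ],
    }
}

def display_book(book_id):
    """One read by _id; no follow-up lookups are needed."""
    book = books[book_id]
    names = ", ".join(f'{a["first_name"]} {a["last_name"]}' for a in book["authors"])
    return f'{book["title"]} by {names}'

print(display_book(100))
```

The cost is duplication: if an author's name changes, every book embedding that author has to be updated.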
Example 2: Linking
• More, smaller documents.
• Can make queries by ID very simple.
• Can be used for a 1-N or an N-N relationship.
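Below is a minimal sketch of linking for an N-N relationship. It assumes hypothetical `author_ids` and `book_ids` arrays kept on both sides. Each side stays small and can be fetched directly by ID. The trade-off is that the application must keep both arrays in sync on every change.

```python
# Hypothetical N-N link: each book lists its author _ids and each author
# lists their book _ids, so either side can be fetched by ID directly.
books = {
    100: {"_id": 100, "title": "Notes", "author_ids": [1, 2]},
    101: {"_id": 101, "title": "Computing", "author_ids": [2]},
}
authors = {
    1: {"_id": 1, "last_name": "Lovelace", "book_ids": [100]},
    2: {"_id": 2, "last_name": "Turing", "book_ids": [100, 101]},
}

def titles_for_author(author_id):
    """Look up the author by _id, then each linked book by _id."""
    return [books[b]["title"] for b in authors[author_id]["book_ids"]]

def authors_for_book(book_id):
    """Look up the book by _id, then each linked author by _id."""
    return [authors[a]["last_name"] for a in books[book_id]["author_ids"]]

print(titles_for_author(2), authors_for_book(100))
```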
Inputs:
• Production logs and stats
• Business domain expertise
• Current and predicted scenarios
Evaluate the application workload:
• Data size.
• A list of operations ranked by importance.
Map out the entities and their relationships:
• CRD: Collection Relationship Diagram.
Finalize schema for each collection:
• Identify and apply relevant schema patterns.
Outputs:
• A list of collections with document fields and shapes for each collection.
• Data size.
• A list of database queries and indexes.
• A list of current operations, assumptions, and growth projections.
Other Patterns and Where to Find Them
• Read more about patterns on our blog:
http://bit.ly/building-with-patterns
• Take the Data Modeling with MongoDB course:
https://university.mongodb.com/courses/M320/about
• Some more patterns to explore:
  • Approximation
  • Attribute
  • Document Versioning
  • Extended Reference
  • Outlier
  • Preallocated
  • Polymorphic
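Here is one of the listed patterns, Attribute, sketched on a hypothetical item with many similar, sparsely populated fields. The fields are moved into an array of `{k, v}` subdocuments. In MongoDB, a single compound index such as `{ "attrs.k": 1, "attrs.v": 1 }` can then serve queries on any attribute, for example via `$elemMatch`. The field names and the `attrs` key are assumptions for illustration.

```python
# Hypothetical item with several similar fields that would each need an index.
item = {"_id": 1, "name": "Kettle", "color": "red", "voltage": 230, "capacity_l": 1.7}

def to_attribute_pattern(doc, attribute_fields):
    """Move the given fields into an 'attrs' array of {k, v} pairs."""
    out = {k: v for k, v in doc.items() if k not in attribute_fields}
    out["attrs"] = [{"k": f, "v": doc[f]} for f in attribute_fields if f in doc]
    return out

def has_attribute(doc, key, value):
    """Local stand-in for a query like {"attrs": {"$elemMatch": {"k": key, "v": value}}}."""
    return {"k": key, "v": value} in doc["attrs"]

reshaped = to_attribute_pattern(item, ["color", "voltage", "capacity_l"])
print(reshaped)
```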
Inputs:
• Production logs and stats
• Business domain expertise
• Current and predicted scenarios
Evaluate the application workload:
• Data size.
• A list of operations ranked by importance.
Outputs:
• Data size.
• A list of database queries.
• A list of current operations, assumptions, and growth projections.
Evaluate Application Workload
• 1000 stores
  • 50 employees per store
  • one store lookup per customer per year
• 10 million items
  • 100 reviews per item
  • 500K updates per day (new products, price updates, ...)
• 100 million user accounts
  • 500K new accounts per week
  • logging in 20 times a year
  • looking up 100 items per year
  • making 5 carts per year
    • putting 4 items in the cart
    • buying an average of 2 items per cart
  • reviewing 2 items per year
• Analytics
  • 10 data scientists
  • each running 10 queries a day
List and Sizing of Write Operations
ID | Description | Type | Durability | Data Life | Data Size (bytes) | Storage Size (per day) | Average Frequency (writes/sec) | Peak Frequency (writes/sec)
W1 | user creates an account | insert | w: majority | forever | 500 | 35.7 MB | 1 | 3
W2 | application records time and user info when an item is viewed | insert | w: 0 | 5 years | 100 | 2.7 GB | 317 | 800
W3 | user adds item to cart | insert | w: majority | 1 month | 500 | 2.7 GB | 64 | 100
W4 | user creates a shopping cart | insert | w: majority | 5 years | 2000 | 2.7 GB | 16 | 40
W5 | user adds a review to an item | insert | w: 1 | 5 years | 1000 | 547 MB | 7 | 14
W6 | employee inserts new items or updates existing items in the catalog | insert or update | w: majority | forever | 500 | 250 MB | 6 | 12
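The write figures can be re-derived from the workload numbers. This sketch assumes a 365-day year, assumes every user follows the per-user averages, and maps each row to a workload figure according to our own reading of the slides. The results agree with the table to within rounding.

```python
SECONDS_PER_DAY = 24 * 3600
DAYS_PER_YEAR = 365
USERS = 100_000_000

# id -> (events per year, document size in bytes). The mapping from each
# workload figure to a table row is an interpretation, not from the talk.
WRITES = {
    "W1": (500_000 * DAYS_PER_YEAR / 7, 500),  # 500K new accounts per week
    "W2": (USERS * 100, 100),                  # 100 item views per user per year
    "W3": (USERS * 5 * 4, 500),                # 5 carts x 4 items per user per year
    "W4": (USERS * 5, 2000),                   # 5 carts per user per year
    "W5": (USERS * 2, 1000),                   # 2 reviews per user per year
    "W6": (500_000 * DAYS_PER_YEAR, 500),      # 500K catalog updates per day
}

def sizing(events_per_year, doc_size):
    """Return (average writes/sec, bytes written per day)."""
    per_day = events_per_year / DAYS_PER_YEAR
    return per_day / SECONDS_PER_DAY, per_day * doc_size

for wid, (events, size) in WRITES.items():
    rate, daily = sizing(events, size)
    print(f"{wid}: {rate:7.1f} writes/sec, {daily / 1e6:8.1f} MB/day")
```

Peak frequencies cannot be derived this way. They come from knowledge of traffic patterns, not from averages.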
List and Sizing of Read Operations
ID | Description | Type | Max Latency / Execution Time | Single Doc Size (bytes) | Average Frequency (reads/sec) | Peak Frequency (reads/sec)
R1 | user logs into the application | real-time | 5 ms | 1000 | 64 | 80
R2 | user views a specific item | real-time | 1 ms | 1000 | 317 | 800
R3 | user views a specific store | real-time | 50 ms | 1000 | 3 | 10
R4 | user views their cart | real-time | 20 ms | 2000 | 31 | 100
R5 | data scientist runs analytics | analytics | 60 secs | | < 1 |
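Several read rates can be checked the same way. This is a sketch under the same assumptions: a 365-day year and per-user averages from the workload slide. R4 is omitted because the workload figures do not state how often users view their carts.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
USERS = 100_000_000

# id -> reads per year, derived from the workload slide (our interpretation).
READS = {
    "R1": USERS * 20,       # each user logs in 20 times a year
    "R2": USERS * 100,      # each user looks up 100 items a year
    "R3": USERS * 1,        # one store lookup per customer per year
    "R5": 10 * 10 * 365,    # 10 data scientists x 10 queries a day
}

def reads_per_second(reads_per_year):
    return reads_per_year / SECONDS_PER_YEAR

for rid, count in READS.items():
    print(f"{rid}: {reads_per_second(count):.3f} reads/sec")
```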
Data Sizing
Entity | Count | Document Size (bytes) | Total Disk Space (bytes) | Notes
carts | 2,500,000,000 | 2000 | 5.00E+12 | 5 years of data
categories | 100 | 100 | 1.00E+04 |
items | 10,000,000 | 1000 | 1.00E+10 |
reviews | 1,000,000,000 | 1000 | 1.00E+12 | 5 years of data
staff | 10,000 | 200 | 2.00E+06 |
stores | 200 | 1000 | 2.00E+05 |
users | 100,000,000 | 1000 | 1.00E+11 |
views | 50,000,000,000 | 50 | 2.50E+12 |
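Each total in the sizing table is simply count × document size. The sketch below recomputes the totals from the table's own numbers and sums them. The result is raw document data only. It leaves out indexes, storage-engine compression, and replication, all of which change the real disk footprint.

```python
# (count, document size in bytes) per entity, copied from the sizing table.
ENTITIES = {
    "carts": (2_500_000_000, 2000),
    "categories": (100, 100),
    "items": (10_000_000, 1000),
    "reviews": (1_000_000_000, 1000),
    "staff": (10_000, 200),
    "stores": (200, 1000),
    "users": (100_000_000, 1000),
    "views": (50_000_000_000, 50),
}

def total_bytes(count, doc_size):
    return count * doc_size

grand_total = 0
for name, (count, size) in ENTITIES.items():
    total = total_bytes(count, size)
    grand_total += total
    print(f"{name:<10} {total:.2e}")
print(f"{'total':<10} {grand_total:.2e}")
```

Carts and views dominate, so they are the first candidates for pattern work such as bucketing or expiring old data.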
Workload Evaluation Summary
Most important operations:
• R2: user views a specific item; latency has to be under 1 ms.
• W3: user adds item to cart; write concern: majority.
Required indexes:
• { category: 1, item_name: 1 }
• { category: 1, item_name: 1, price: 1 }
• { username: 1 }
Assumptions and Projections:
• Data will be stored for a maximum of 5 years.
• Number of items sold will double each year.
• Number of users will double each year.
List of Entities:
• carts
• categories
• items
• reviews
• staff
• stores
• users
• views
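A note on the required indexes: MongoDB can use a compound index for queries on any of its key prefixes. As a result, queries on `{ category, item_name }` could in principle be served by the three-field index. The sketch below flags such prefix overlaps in a list of key patterns. Whether to actually drop the shorter index is a trade-off (for example, it is smaller in memory), so treat the check as a prompt for review rather than a rule.

```python
# Index key patterns from the summary, as ordered (field, direction) lists.
INDEXES = [
    [("category", 1), ("item_name", 1)],
    [("category", 1), ("item_name", 1), ("price", 1)],
    [("username", 1)],
]

def is_prefix(shorter, longer):
    """True if 'shorter' is a strict leading prefix of 'longer'."""
    return len(shorter) < len(longer) and longer[: len(shorter)] == shorter

def redundant_prefixes(indexes):
    """Return indexes whose keys are a strict prefix of another index."""
    return [a for a in indexes if any(is_prefix(a, b) for b in indexes)]

print(redundant_prefixes(INDEXES))
```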
Inputs:
• Production logs and stats
• Business domain expertise
• Current and predicted scenarios
Evaluate the application workload:
• Data size.
• A list of operations ranked by importance.
Map out the entities and their relationships:
• CRD: Collection Relationship Diagram.
Outputs:
• A list of collections with document fields for each collection.
• Data size.
• A list of database queries and indexes.
• A list of current operations, assumptions, and growth projections.
Inputs:
• Production logs and stats
• Business domain expertise
• Current and predicted scenarios
Evaluate the application workload:
• Data size.
• A list of operations ranked by importance.
Map out the entities and their relationships:
• CRD: Collection Relationship Diagram.
Finalize schema for each collection:
• Identify and apply relevant schema patterns.
Outputs:
• A list of collections with document fields and shapes for each collection.
• Data size.
• A list of database queries and indexes.
• A list of current operations, assumptions, and growth projections.
Apply All the Patterns!
Patterns used:
• Schema Versioning
• Subset
• Computed
• Bucket
• Extended Reference
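The sketch below shows three of these patterns on in-memory data: Bucket, Computed, and Subset. Where each pattern lands in this use case is our assumption, as are all field names: bucketing item views per hour, and storing review statistics plus the newest reviews on the item. Recomputing the statistics from the full review list keeps the example short. A real application would more likely update them incrementally on each new review.

```python
from collections import defaultdict
from datetime import datetime

# Bucket pattern: group view events into one document per item per hour
# instead of one document per view. Field names are hypothetical.
def bucket_views(views):
    """views: iterable of (item_id, user_id, timestamp) tuples."""
    buckets = defaultdict(lambda: {"count": 0, "user_ids": []})
    for item_id, user_id, ts in views:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        bucket = buckets[(item_id, hour)]
        bucket["count"] += 1
        bucket["user_ids"].append(user_id)
    return [{"item_id": i, "hour": h, **b} for (i, h), b in buckets.items()]

# Computed pattern: store review_count and avg_rating on the item so reads
# do not recompute them. Subset pattern: embed only the newest reviews; the
# full history would live in a separate reviews collection.
def build_item(item, reviews, keep_recent=2):
    ratings = [r["rating"] for r in reviews]
    newest = sorted(reviews, key=lambda r: r["date"], reverse=True)
    return {
        "_id": item["_id"],
        "name": item["name"],
        "review_count": len(ratings),
        "avg_rating": sum(ratings) / len(ratings) if ratings else None,
        "recent_reviews": newest[:keep_recent],
    }

views = [
    (1, "u1", datetime(2020, 1, 1, 10, 5)),
    (1, "u2", datetime(2020, 1, 1, 10, 50)),
    (1, "u3", datetime(2020, 1, 1, 11, 1)),
]
print(bucket_views(views))
```

Bucketing reduces the number of documents and index entries for the views entity, which the sizing table shows is one of the largest.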
Tailor the Data Model to Your Unique Setup
• Shared hosted DB, small team: simpler data model.
• Replica set: in between.
• Large sharded cluster, large team: performant data model.
Flexible Data Modeling Approach
Step | For a simpler data model, focus on | For a bit of both, focus on | For the most performant data model, focus on
Evaluate the application workload | The most frequent operation | Data size; the most frequent operations | Data size; the most frequent operations; the most important operations
Map out the entities and their relationships | Embedding data | Embedding and linking data | Embedding and linking data
Finalize schema for each collection | Use few patterns | Use as many patterns as necessary | Use as many patterns as necessary