#MDBlocal
Complete Methodology to Data Modeling for MongoDB
Yulia Genkina, Curriculum Engineer, MongoDB
München
Talk Structure
• MongoDB Data Modeling Methodology:
• Entity Relationships
• Schema Patterns
• Methodology Use Case Example
• Conclusions and other considerations
Data Modeling in the Tabular World
Step 1: Define the schema.
Step 2: Develop the application and queries.
Concerns:
- One possible solution for the initial schema.
- Final schema is most likely denormalized.
- Schema evolution is difficult and likely requires downtime.
- Performance drops as schema evolves.
Data Modeling in the Document World
Step 1: Develop the application and queries.
Step 2: Define the schema.
Step 3: Improve the application.
Step 4: Improve the schema.
Step 5: Repeat steps 3 and 4 indefinitely.
Step 6: Profit.
Data Modeling
Step-by-step Guide
Evaluate the application workload
Inputs: production logs and stats, business domain expertise, current and predicted scenarios.
• Data size.
• A list of operations ranked by importance.
Outputs:
• Data size.
• A list of database queries and indexes.
• A list of current operations and assumptions.
Evaluate the application workload
Inputs: production logs and stats, business domain expertise, current and predicted scenarios.
• Data size.
• A list of operations ranked by importance.
Outputs:
• Data size.
• A list of database queries and indexes.
• A list of current operations, assumptions, and growth projections.
Map out the entities and their relationships
• CRD: Collections Relationship Diagram.
Output:
• A list of collections with document fields for each collection.
Relationships
Brief Introduction
Example 1: Entities and Relationships in a Blog
Example 1: Schema Outline for a Blog
Two options: Embed All, or Embed & Link.
Diagram annotations: "Queries by articles or users"; "Queries by articles".
Example 2: Entities for a Library Application (Normalized Form)
• book: title, isbn, language, published_by, author
• user: username, first_name, last_name
• author: first_name, last_name
Example 2: Entities for a Library Application (Denormalized Form)
• book: title, isbn, language, published_by, author (first_name, last_name)
• user: username, first_name, last_name
Example 2: Embedding
• Can be used for a 1-N or an N-N relationship.
• Great for read performance.
• One atomic operation retrieves all necessary information.
Example 2: Linking
• More, smaller documents.
• Can make queries by ID very simple.
• Can be used for a 1-N or an N-N relationship.
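The difference between embedding and linking in the library example can be sketched with plain Python dictionaries standing in for documents. This is only an illustration: the sample values, the `authors_by_id` lookup table, and both helper functions are hypothetical and not taken from the slides.

```python
# Hypothetical in-memory stand-in for an "authors" collection, keyed by _id.
authors_by_id = {
    "a1": {"_id": "a1", "first_name": "Ada", "last_name": "Example"},
}

# Linking: the book stores only a reference (the author's _id).
linked_book = {
    "title": "Sample Title",
    "isbn": "000-0",
    "language": "en",
    "published_by": "Sample Press",
    "author": "a1",
}

# Embedding: the book carries its own copy of the author fields.
embedded_book = {
    "title": "Sample Title",
    "isbn": "000-0",
    "language": "en",
    "published_by": "Sample Press",
    "author": {"first_name": "Ada", "last_name": "Example"},
}


def author_name_linked(book, authors):
    """Two lookups: read the book, then follow the reference to the author."""
    author = authors[book["author"]]
    return f'{author["first_name"]} {author["last_name"]}'


def author_name_embedded(book):
    """One lookup: everything needed is already inside the book document."""
    author = book["author"]
    return f'{author["first_name"]} {author["last_name"]}'


print(author_name_linked(linked_book, authors_by_id))
print(author_name_embedded(embedded_book))
```

The embedded shape answers the read with a single document, while the linked shape keeps documents smaller at the cost of a second lookup.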
Evaluate the application workload
Inputs: production logs and stats, business domain expertise, current and predicted scenarios.
• Data size.
• A list of operations ranked by importance.
Outputs:
• Data size.
• A list of database queries and indexes.
• A list of current operations, assumptions, and growth projections.
Map out the entities and their relationships
• CRD: Collections Relationship Diagram.
Finalize schema for each collection
• Identify and apply relevant schema patterns.
Output:
• A list of collections with document fields and shapes for each collection.
Patterns
Brief Introduction
Schema Versioning Pattern
Computed Pattern
CPU work
Subset Pattern
Bucket Pattern
• Tabular approach: a new document for each sensor reading.
• Document approach: a document per time unit per sensor.
Bucket Pattern schema (diagram labels): Bucket per Hour, Computed Pattern.
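A minimal sketch of the hourly bucket idea, assuming readings arrive as `(sensor_id, timestamp, value)` tuples. The field names (`readings`, `count`, `sum`) and the `bucket_readings` helper are hypothetical; the running `count` and `sum` stand in for the computed values shown next to the bucket in the slide.

```python
from datetime import datetime


def bucket_readings(readings):
    """Group readings into one document per sensor per hour.

    Each bucket also keeps a running count and sum, so an average can be
    derived later without rescanning the individual readings.
    """
    buckets = {}
    for sensor_id, ts, value in readings:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        bucket = buckets.setdefault(
            (sensor_id, hour),
            {"sensor_id": sensor_id, "hour": hour,
             "readings": [], "count": 0, "sum": 0.0},
        )
        bucket["readings"].append({"ts": ts, "value": value})
        bucket["count"] += 1
        bucket["sum"] += value
    return list(buckets.values())


sample = [
    ("s1", datetime(2019, 1, 1, 10, 5), 20.0),
    ("s1", datetime(2019, 1, 1, 10, 35), 22.0),
    ("s1", datetime(2019, 1, 1, 11, 1), 21.0),
]
hourly = bucket_readings(sample)
for b in hourly:
    print(b["sensor_id"], b["hour"], b["count"], b["sum"] / b["count"])
```

Three readings become two documents here; in a real workload the reduction in document count grows with the number of readings per hour.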
Solution with Schema Versioning, Subset, Computed, and Bucket Patterns
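The slides show these patterns as diagrams only. As a rough illustration of two of them, the sketch below assumes a `schema_version` field for Schema Versioning and a `recent_reviews` array for Subset. All field names, the version-1 and version-2 shapes, and both helper functions are hypothetical.

```python
def upgrade_user(doc):
    """Schema Versioning: bring an older document to the version-2 shape on read.

    Assumed shapes: version 1 has a single "phone" field; version 2 has a
    "contacts" list. Documents without "schema_version" are treated as v1.
    """
    if doc.get("schema_version", 1) >= 2:
        return doc
    upgraded = {k: v for k, v in doc.items() if k != "phone"}
    upgraded["contacts"] = (
        [{"type": "phone", "value": doc["phone"]}] if "phone" in doc else []
    )
    upgraded["schema_version"] = 2
    return upgraded


def with_recent_reviews(item, reviews, keep=3):
    """Subset: embed only the newest `keep` reviews in the item document.

    The full review list is assumed to live in its own collection.
    """
    newest = sorted(reviews, key=lambda r: r["date"], reverse=True)[:keep]
    return {**item, "recent_reviews": newest}


old_user = {"name": "Sample User", "phone": "555-0100"}
print(upgrade_user(old_user))

reviews = [{"date": f"2019-01-0{d}", "text": f"review {d}"} for d in range(1, 6)]
print(with_recent_reviews({"name": "Sample Item"}, reviews))
```

Both helpers return new dictionaries rather than mutating their inputs, which keeps old and new shapes easy to compare while a migration is in progress.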
Other Patterns and Where to Find Them
• Read more about patterns on our blog:
http://bit.ly/building-with-patterns
• Take the Data Modeling with MongoDB Course:
https://university.mongodb.com/courses/M320/about
• Some more patterns to explore:
• Approximation
• Attribute
• Document Versioning
• Extended Reference
• Outlier
• Preallocated
• Polymorphic
Design an Online Shopping App:
MongoMart
A Use Case Example
Evaluate the application workload
Inputs: production logs and stats, business domain expertise, current and predicted scenarios.
• Data size.
• A list of operations ranked by importance.
Outputs:
• Data size.
• A list of database queries.
• A list of current operations, assumptions, and growth projections.
Evaluate Application Workload
1000 stores
• 50 employees per store
• one store lookup per customer per year
10 million items
• 100 reviews per item
• 500K updates per day (new products, price updates, ...)
100 million user accounts
• 500K new accounts per week
• logging in 20 times a year
• looking up 100 items per year
• making 5 carts per year
• putting 4 items in the cart
• buying an average of 2 items per cart
• reviewing 2 items per year
Analytics
• 10 data scientists
• each running 10 queries a day
List and Sizing of Write Operations
ID | Description | Type | Durability | Data Life | Data Size (Bytes) | Storage Size (per day) | Average Frequency (writes/sec) | Peak Frequency (writes/sec)
W1 | user creates an account | insert | w: majority | forever | 500 | 35.7 MB | 1 | 3
W2 | application records time and user info when an item is viewed | insert | w: 0 | 5 years | 100 | 2.7 GB | 317 | 800
W3 | user adds item to cart | insert | w: majority | 1 month | 500 | 2.7 GB | 64 | 100
W4 | user creates a shopping cart | insert | w: majority | 5 years | 2000 | 2.7 GB | 16 | 40
W5 | user adds a review to an item | insert | w: 1 | 5 years | 1000 | 547 MB | 7 | 14
W6 | employee inserts new items or updates existing items in the catalog | insert or update | w: majority | forever | 500 | 250 MB | 6 | 12
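The average frequencies and daily storage figures for the write operations appear to follow from the workload assumptions. The sketch below reproduces them, assuming 365-day years and decimal megabytes. The mapping from each operation to a workload assumption (for example, W3 as 5 carts per user per year with 4 items each) is an inference rather than something the slides state, and the slide values look rounded.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365
USERS = 100_000_000

# (events per day, document size in bytes) for each write operation.
# The per-user yearly rates come from the workload slide; the mapping
# to W1..W6 is an assumption made for this sketch.
writes = {
    "W1": (500_000 / 7, 500),                    # 500K new accounts per week
    "W2": (USERS * 100 / DAYS_PER_YEAR, 100),    # 100 item views per user per year
    "W3": (USERS * 5 * 4 / DAYS_PER_YEAR, 500),  # 5 carts per year, 4 items each
    "W4": (USERS * 5 / DAYS_PER_YEAR, 2000),     # 5 carts per user per year
    "W5": (USERS * 2 / DAYS_PER_YEAR, 1000),     # 2 reviews per user per year
    "W6": (500_000, 500),                        # 500K catalog updates per day
}

estimates = {
    op: {
        "writes_per_sec": per_day / SECONDS_PER_DAY,
        "mb_per_day": per_day * doc_bytes / 1e6,
    }
    for op, (per_day, doc_bytes) in writes.items()
}

for op, e in estimates.items():
    print(f'{op}: {e["writes_per_sec"]:.2f} writes/sec, {e["mb_per_day"]:.1f} MB/day')
```

Peak frequencies are not derived here; they depend on traffic assumptions that the slides do not spell out.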
List and Sizing of Read Operations
ID | Description | Type | Max Latency | Execution Time | Single Doc Size (Bytes) | Average Frequency (reads/sec) | Peak Frequency (reads/sec)
R1 | user logs into the application | real-time | 5ms | | 1000 | 64 | 80
R2 | user views a specific item | real-time | 1ms | | 1000 | 317 | 800
R3 | user views a specific store | real-time | 50ms | | 1000 | 3 | 10
R4 | user views their cart | real-time | 20ms | | 2000 | 31 | 100
R5 | data scientist runs analytics | analytics | 60 secs | | | < 1 |
Data Sizing
Entity | Count | Document Size (Bytes) | Total Disk Space (Bytes) | Notes
carts | 2,500,000,000 | 2000 | 5.00E+12 | 5 years of data
categories | 100 | 100 | 1.00E+04 |
items | 10,000,000 | 1000 | 1.00E+10 |
reviews | 1,000,000,000 | 1000 | 1.00E+12 | 5 years of data
staff | 10,000 | 200 | 2.00E+06 |
stores | 200 | 1000 | 2.00E+05 |
users | 100,000,000 | 1000 | 1.00E+11 |
views | 50,000,000,000 | 50 | 2.50E+12 |
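Each total in the data sizing table is the entity count multiplied by the document size. A short sketch that recomputes the totals and adds them up; the counts and sizes are copied from the table, and the grand total is a derived figure that does not appear in the slides.

```python
# entity: (document count, document size in bytes), as listed in the table.
entities = {
    "carts": (2_500_000_000, 2000),
    "categories": (100, 100),
    "items": (10_000_000, 1000),
    "reviews": (1_000_000_000, 1000),
    "staff": (10_000, 200),
    "stores": (200, 1000),
    "users": (100_000_000, 1000),
    "views": (50_000_000_000, 50),
}

# Total disk space per entity, in bytes.
totals = {name: count * size for name, (count, size) in entities.items()}
grand_total = sum(totals.values())

for name, total in totals.items():
    print(f"{name:<10} {total:.2E}")
print(f"all entities: {grand_total / 1e12:.2f} TB (decimal)")
```

The three largest entities (carts, views, reviews) account for almost all of the space, which is why their retention period matters most for sizing.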
Workload Evaluation Summary
Most important queries:
• R2: user views a specific item (has to be under 1 ms).
• W3: user adds item to cart (write concern: majority).
Required indexes:
• { category: 1, item_name: 1}
• { category: 1, item_name: 1, price: 1}
• { username: 1}
Assumptions and Projections:
• Data will be stored for a maximum of 5 years.
• Number of items sold will double each year.
• Number of users will double each year.
List of Entities:
• carts
• categories
• items
• reviews
• staff
• stores
• users
• views
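One observation on the required indexes listed in this summary: in MongoDB, a compound index can generally also serve queries on a leading prefix of its keys. The sketch below checks the three listed index specifications for that relationship, written as lists of `(field, direction)` pairs, the form PyMongo's `create_index` accepts. The `is_prefix` and `prefix_covered` helpers are hypothetical, and whether to drop a covered index is a judgement call the slides do not make.

```python
required_indexes = [
    [("category", 1), ("item_name", 1)],
    [("category", 1), ("item_name", 1), ("price", 1)],
    [("username", 1)],
]


def is_prefix(shorter, longer):
    """True if `shorter` matches the leading keys of `longer`, in order."""
    return len(shorter) <= len(longer) and longer[:len(shorter)] == shorter


def prefix_covered(specs):
    """Return specs that are a strict leading prefix of another spec."""
    return [
        spec for spec in specs
        if any(spec != other and is_prefix(spec, other) for other in specs)
    ]


covered = prefix_covered(required_indexes)
print(covered)
```

Here the two-key index on category and item_name is a prefix of the three-key index, so it is a candidate for review rather than an automatic removal.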
Evaluate the application workload
Inputs: production logs and stats, business domain expertise, current and predicted scenarios.
• Data size.
• A list of operations ranked by importance.
Outputs:
• Data size.
• A list of database queries and indexes.
• A list of current operations, assumptions, and growth projections.
Map out the entities and their relationships
• CRD: Collections Relationship Diagram.
Output:
• A list of collections with document fields for each collection.
Entity Relationship Diagram
Collections Relationship Diagram (Simple)
Embed everything!
Collections Relationship Diagram (Better)
Accommodate for assumptions: embed and link.
Diagram annotation (appears twice): clear every 5 years.
Evaluate the application workload
Inputs: production logs and stats, business domain expertise, current and predicted scenarios.
• Data size.
• A list of operations ranked by importance.
Outputs:
• Data size.
• A list of database queries and indexes.
• A list of current operations, assumptions, and growth projections.
Map out the entities and their relationships
• CRD: Collections Relationship Diagram.
Finalize schema for each collection
• Identify and apply relevant schema patterns.
Output:
• A list of collections with document fields and shapes for each collection.
Apply All the Patterns!
Patterns Used:
• Schema Versioning
• Subset
• Computed
• Bucket
• Extended Reference
Conclusion
And additional considerations
Your Data Model Will Evolve
Just like your application
Tailor the Data Model
To your unique setup
• Shared hosted DB
• Small team
• Large Sharded Cluster
• Large Team
• Replica Set
Spectrum: from a simpler data model to a more performant data model.
Flexible Data Modeling Approach
Step | For a simpler data model, focus on: | For a bit of both: | For the most performant data model, focus on:
Evaluate the application workload | The most frequent operation | Data size; the most frequent operations | Data size; the most frequent operations; the most important operations
Map out the entities and their relationships | Embedding data | Embedding and linking data | Embedding and linking data
Finalize schema for each collection | Use few patterns | Use as many patterns as necessary | Use as many patterns as necessary
THANK YOU

MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB
