Advanced Schema Design Patterns
Justin LaBreck
Principal Consulting Engineer
Daniel Coupal
Education Department
Goals of the Presentation
• What are Patterns? What are schema design patterns and why are we using them?
• Methodology: a quick overview of our methodology
• Use Case: Single View. An insurance company needs to unify a few RDBMS-based systems
• Applying Patterns: bucket, computed, extended reference, outlier, approximation
What are Patterns?
What are schema design patterns and why are we using them?
What are patterns?
Building Blocks
identified by our Consulting Engineers while helping customers over the last 12 years.
Common Language
Data Architects and Engineers can easily reference the same things.
What can patterns do for you?
Improve Performance
by using no more resources than you should
Simplify the access to the data
by grouping and pre-arranging data in a simpler form
Any caution in using the patterns?
Duplication of Data
to avoid reading from many collections for common queries
Staleness of Data
for pre-computed fields updated at a given frequency
De-normalizations
to make data easier and faster to retrieve
Schema Design Patterns
and Use Cases
Methodology
A quick overview of our methodology
BREAKOUT SESSION
A Complete Methodology
of Data Modeling for
MongoDB
Daniel Coupal
Education Department
Main Tradeoff in Modeling
Simplicity versus Performance
Methodology
1. Describe the Workload
2. Identify and Model the Relationships
3. Apply Patterns
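Flexible Methodology
Goal: Simplicity
  1. Describe the Workload: most frequent operation
  2. Identify and Model the Relationships: mostly embedding
  3. Apply Patterns: Pattern A
Goal: Simplicity and Performance
  1. Describe the Workload: most operations; quantify ops
  2. Identify and Model the Relationships: embedding and linking
  3. Apply Patterns: Pattern A, Pattern B
Goal: Performance
  1. Describe the Workload: all operations; quantify and qualify ops
  2. Identify and Model the Relationships: embedding and linking
  3. Apply Patterns: Pattern A, Pattern B, Pattern C, …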
Use Case: Single View
Mongo Insurance Corporation needs to unify a few RDBMS-based systems
Mongo Insurance Portal (MIP)
• Superfast user interface
• Merging legacy systems
• Analytics in batch
• Future base for the customer portal
1 – Workload: list the operations
Query (operation): description
1. Sync changes to MIP (write): apply changes from the different legacy databases to MIP's MongoDB database
2. Load user profile (read): create a session with the user information
3. Load overview of account (read): gather all information for the main page (policies, claims, payments, documents, and messages)
4. Load detail of an artifact (read): access a given policy, claim, payment, document, or message
5. Process claims (read): analyze all claim changes for which action must be taken
6. User portal operations?
1 – Workload: quantify/qualify the operations
Query: quantification; qualification
1. Sync changes to MIP: 24 writes/day; < 10 min; critical write
2. Load user profile: 12 reads/sec (1,000,000 reads/day); < 2 ms; no stale data
3. Load overview of account: 12 reads/sec (1,000,000 reads/day); < 2 ms; no stale data
4. Load detail of an artifact: 115 reads/sec (10,000,000 reads/day); < 2 ms; no stale data
5. Process claims: 2 reads/day; < 5 min; no stale data; collection scan
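The per-second figures are daily volumes divided by the 86,400 seconds in a day. A quick check, as a minimal sketch that assumes traffic is spread evenly over 24 hours (`averageRatePerSecond` is a hypothetical helper name):

```javascript
// Convert a daily operation count into an average per-second rate,
// assuming traffic is spread evenly over the day (real peaks are higher).
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

function averageRatePerSecond(opsPerDay) {
  return opsPerDay / SECONDS_PER_DAY;
}

const workload = [
  { query: "Load user profile", readsPerDay: 1_000_000 },
  { query: "Load overview of account", readsPerDay: 1_000_000 },
  { query: "Load detail of an artifact", readsPerDay: 10_000_000 },
];

for (const { query, readsPerDay } of workload) {
  const rate = averageRatePerSecond(readsPerDay);
  console.log(`${query}: ~${rate.toFixed(1)} reads/sec`);
}
```

This prints roughly 11.6 and 115.7 reads/sec, which the table reports as 12 and 115. Peak rates will be higher than these averages.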
Mongo Insurance Corporation John Doe
Policies
● Home
● Car
● Earthquake
Open Claims 2
Billing
● Due: $800
Documents 8
Messages 25
Car Insurance Details
Car 1 - Supercar XLT (2020)
4 wheel drive, 8 cyl
Premium: $2000
Car 2 - Compactcar (2010)
2 wheel drive, 4 cyl
Premium: $800
Open claims
Collision
Damaged rear bumper
Trunk body damage
Find a partner repair center
54323 Views
Mongo Insurance Corporation John Doe
Documents and Statements
54323 Views
Total documents: 8
New policy inquiry 2020/02/14
Bill past due 2019/11/03
Bill past due (2nd notice) 2019/12/03
Your new claim 2019/04/20
Your claim status updated 2019/04/15
Your new claim 2019/04/13
next >>
Policies
● Home
● Car
● Earthquake
Open Claims 2
Billing
● Due: $800
Documents 8
Messages 25
2 - Entities for MongoDB Insurance Portal
Entities:
• Users
• Policies
• Claims
• Bills
• Documents
• Messages
This is unlikely to be fast enough!
Let's apply some magic
Applying Patterns
Bucket, Computed, Extended Reference, Outlier, Approximation
Query user details once, then store them in the session.
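A minimal sketch of loading the profile once per session, assuming an in-memory session store. `createSessionStore` and the `fetchUserFromDb` callback are hypothetical names; the callback stands in for a query such as `db.users.findOne({ _id: userId })`:

```javascript
// Cache the user profile per session so repeated page loads skip the database.
function createSessionStore(fetchUserFromDb) {
  const sessions = new Map(); // sessionId -> cached user profile

  return {
    getProfile(sessionId, userId) {
      if (!sessions.has(sessionId)) {
        sessions.set(sessionId, fetchUserFromDb(userId)); // one query per session
      }
      return sessions.get(sessionId);
    },
    endSession(sessionId) {
      sessions.delete(sessionId); // the next request reloads the profile
    },
  };
}

let queries = 0;
const store = createSessionStore((userId) => {
  queries += 1;
  return { _id: userId, name: "John Doe" };
});

for (let page = 0; page < 5; page++) store.getProfile("session-abc", 10000);
console.log(queries); // 1
```

Because the cached copy can drift from the database, a profile update would need to refresh or end the session, which matters given the "no stale data" requirement on this operation.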
Simplicity versus Performance
One policies collection?
Or separate collections for claims, billing, documents, and messages?
Computed pattern
• Store a “computed” value instead of calculating it every time
• Fast for reads; additional writes required
• A “cached” aggregation, updated by the application
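A minimal in-memory sketch of the computed pattern, using the `documentCount` field that the users collection carries later in this example. The Map and array stand in for the users and documents collections, and `addDocument` and the other helpers are hypothetical names:

```javascript
// Stand-ins for the users and documents collections (hypothetical shapes).
const users = new Map();
const documents = [];

function createUser(id, name) {
  users.set(id, { _id: id, name, documentCount: 0 });
}

// Write path: store the document AND update the cached count (the extra write).
// In MongoDB the second step could be
//   db.users.updateOne({ _id: userId }, { $inc: { documentCount: 1 } })
function addDocument(userId, title) {
  documents.push({ userId, title });
  users.get(userId).documentCount += 1;
}

// Read path with the pattern: one field lookup.
function documentCountComputed(userId) {
  return users.get(userId).documentCount;
}

// Read path without the pattern: recount on every request.
function documentCountScanned(userId) {
  return documents.filter((doc) => doc.userId === userId).length;
}

createUser(10000, "John Doe");
for (let i = 1; i <= 8; i++) addDocument(10000, `Document ${i}`);
console.log(documentCountComputed(10000), documentCountScanned(10000)); // 8 8
```

The write path pays for one extra update; every read of the count then costs a single field lookup instead of a count over the documents.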
Extended Reference pattern
• Keep a “reference” to data from one collection in another
• Have the data needed to render a page available to the application when it is needed
• Denormalization to increase read performance
• MongoDB doesn’t have JOINs; it has $lookup
• One read is faster than two
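A minimal sketch of building an extended reference, assuming a hypothetical full policy document and an assumed list of display fields (`toExtendedReference` and `SUMMARY_FIELDS` are illustrative names, not from the talk):

```javascript
// Hypothetical full policy document, e.g. as synced from a legacy system.
const policy = {
  _id: "P-1001",
  userId: 10000,
  type: "Car",
  vehicle: "Supercar XLT (2020)",
  description: "4 wheel drive, 8 cyl",
  premium: 2000,
  underwritingNotes: "internal notes not shown on the overview page",
  coverageTerms: ["collision", "liability"],
};

// Assumed set of fields the overview page renders.
const SUMMARY_FIELDS = ["_id", "type", "vehicle", "premium"];

// Copy only the listed fields: the reference (_id) plus the data needed for display.
function toExtendedReference(doc, fields) {
  return Object.fromEntries(
    fields.filter((field) => field in doc).map((field) => [field, doc[field]])
  );
}

// The users document embeds the extended reference, so the page needs one read.
const userDoc = {
  _id: 10000,
  name: "John Doe",
  policies: [toExtendedReference(policy, SUMMARY_FIELDS)],
};

console.log(JSON.stringify(userDoc.policies[0]));
// {"_id":"P-1001","type":"Car","vehicle":"Supercar XLT (2020)","premium":2000}
```

The cost is duplication: if a policy's premium changes, the copy in the users document must be updated too, so the copied fields are best limited to data that rarely changes.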
The users collection
[Figure: sample users document, with fields annotated as Extended Reference, Computed, and Computed / Extended Reference]
// Record the December 2019 bill on the user's document
db.users.updateOne(
  { _id: 10000 },
  { $set: { "billing.2019-12.owe": 1800 } }
);
// Record each monthly payment under its own month key
db.users.updateOne(
  { _id: 10000 },
  { $set: { "billing.2020-01.paid": 400 } }
);
db.users.updateOne(
  { _id: 10000 },
  { $set: { "billing.2020-02.paid": 400 } }
);
db.users.updateOne(
  { _id: 10000 },
  { $set: { "billing.2020-03.paid": 400 } }
);
Application computes for display:
1800 - 400 - 400 - 400 = 600
No "update everything" cron job: a simple calculation at display time instead of continuously updating the collection
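A minimal sketch of that display-time calculation, assuming the four updates above leave the user's `billing` field shaped as their dotted paths imply (a map keyed by month, each entry holding `owe` and/or `paid`); `amountDue` is a hypothetical helper:

```javascript
// The billing sub-document produced by the four updateOne calls,
// keyed by month ("YYYY-MM"), each entry holding "owe" and/or "paid".
const billing = {
  "2019-12": { owe: 1800 },
  "2020-01": { paid: 400 },
  "2020-02": { paid: 400 },
  "2020-03": { paid: 400 },
};

// Compute the amount due when the page is rendered, instead of
// maintaining a running balance with a scheduled job.
function amountDue(billingByMonth) {
  let due = 0;
  for (const entry of Object.values(billingByMonth)) {
    due += (entry.owe ?? 0) - (entry.paid ?? 0);
  }
  return due;
}

console.log(amountDue(billing)); // 600
```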
Using the bucket pattern, each policy exists in an array that contains all the information about each policy.
Bucket pattern
• A single domain object is made up of multiple documents
• Leverage the power of arrays to “bucket” groups of data together
e.g. 2 bucket documents holding 50 objects each = 1 domain object with 100 objects
• Useful for keeping relevant data “close”, e.g. paging
• Define a maximum “bucket size”, i.e. the number of objects to store per bucket
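A minimal sketch of splitting one domain object's items into bucket documents with a fixed maximum size. The field names (`userId`, `bucket`, `count`, `items`) are hypothetical, not a schema from the talk:

```javascript
// Split a user's items into bucket documents holding at most maxBucketSize items each.
function toBuckets(userId, items, maxBucketSize) {
  if (!Number.isInteger(maxBucketSize) || maxBucketSize < 1) {
    throw new RangeError("maxBucketSize must be a positive integer");
  }
  const buckets = [];
  for (let start = 0; start < items.length; start += maxBucketSize) {
    const slice = items.slice(start, start + maxBucketSize);
    // Each bucket records its position and fill level alongside the items.
    buckets.push({ userId, bucket: buckets.length, count: slice.length, items: slice });
  }
  return buckets;
}

// 100 objects with a bucket size of 50 -> 2 bucket documents, 1 domain object.
const items = Array.from({ length: 100 }, (_, i) => ({ n: i + 1 }));
const buckets = toBuckets(10000, items, 50);
console.log(buckets.length, buckets.map((b) => b.count).join(", ")); // 2 50, 50
```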
The policies collection
[Figure: sample policies document, annotated: Bucket; _id references the users collection; enough policy information to generate the display (Extended Reference)]
Bucket pattern? Or outlier pattern?
Outlier pattern
• Focus on the 80% use case; optimize for the other 20% later
• Create a collection representing the ideal case, then “overflow” into another collection for everything else
• The overflow collection represents a separate code path; the extra (overflow) documents live in that collection
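A minimal sketch of the outlier split, assuming a hypothetical `hasOverflow` flag on the main document that tells readers when to take the second code path. Field names and helpers are illustrative only:

```javascript
// The main document holds up to `limit` items inline and flags when more exist;
// the rest become overflow documents keyed by userId (in MongoDB these would
// live in a separate collection such as documents_overflow).
function splitForOutliers(userId, items, limit) {
  const main = { _id: userId, hasOverflow: items.length > limit, items: items.slice(0, limit) };
  const overflow = items.slice(limit).map((item) => ({ userId, item }));
  return { main, overflow };
}

// Read path: the common case uses only the main document; outliers need a
// second lookup (here the overflow array stands in for a query by userId).
function readAll({ main, overflow }) {
  if (!main.hasOverflow) return main.items;
  return main.items.concat(overflow.map((entry) => entry.item));
}

const typical = splitForOutliers(10000, ["doc 1", "doc 2", "doc 3"], 6);
const outlier = splitForOutliers(10001, Array.from({ length: 8 }, (_, i) => `doc ${i + 1}`), 6);
console.log(typical.main.hasOverflow, outlier.main.hasOverflow, outlier.overflow.length); // false true 2
```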
Which pattern? Bucket or Outlier?
[Figure: histogram of documents per customer, binned from 0-10 to 91-100, marking a maximum bucket size of 60; customers beyond it are candidate outliers]
Your schema decisions can and should be data driven.
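One way to make the bucket size data driven, sketched under assumptions: pick the smallest size that lets a target share of customers fit in one bucket, then check how many customers would take the overflow path. The helper names and the sample counts are hypothetical:

```javascript
// Smallest bucket size that lets the given share of customers fit in one bucket.
function bucketSizeForCoverage(countsPerCustomer, coverage) {
  if (countsPerCustomer.length === 0) throw new RangeError("no data");
  const sorted = [...countsPerCustomer].sort((a, b) => a - b);
  const index = Math.ceil(coverage * sorted.length) - 1;
  return sorted[Math.min(sorted.length - 1, Math.max(0, index))];
}

// Share of customers who would overflow a bucket of the given size.
function outlierShare(countsPerCustomer, maxBucketSize) {
  const outliers = countsPerCustomer.filter((count) => count > maxBucketSize).length;
  return outliers / countsPerCustomer.length;
}

// Hypothetical sample: documents per customer for ten customers.
const counts = [3, 5, 8, 8, 12, 15, 22, 40, 55, 95];
console.log(bucketSizeForCoverage(counts, 0.8)); // size covering 80% of customers
console.log(outlierShare(counts, 60));           // share handled by the overflow path
```

With the slides' maximum of 60, the customers to the right of that bar in the histogram are the ones the overflow code path would serve.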
The documents collection
[Figure: sample document from the documents collection, annotated as a Bucket]
_id matches the users collection.
The bucket contains 6 documents* (or some realistic, data-driven maximum).
But the users collection says “documentCount” is 8. Where are the rest?
* This example uses a maximum bucket size of 6 to keep the example simple.
The documents collection
A bucket of six plus a bucket of two: a total of eight documents in two buckets
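A minimal sketch of the arithmetic behind the six-plus-two layout, assuming buckets are filled in order and documents are addressed by a 0-based position. When the page size equals the bucket size, page n is bucket n, so each page costs one read. The helper names are hypothetical:

```javascript
// Number of bucket documents needed for a user's documents.
function bucketCount(totalDocuments, bucketSize) {
  return Math.ceil(totalDocuments / bucketSize);
}

// Fill level of each bucket when buckets are filled in order.
function bucketSizes(totalDocuments, bucketSize) {
  return Array.from({ length: bucketCount(totalDocuments, bucketSize) }, (_, b) =>
    Math.min(bucketSize, totalDocuments - b * bucketSize)
  );
}

// Where the document at a 0-based position lives: which bucket, and where inside it.
function locate(position, bucketSize) {
  return { bucket: Math.floor(position / bucketSize), offset: position % bucketSize };
}

const BUCKET_SIZE = 6;   // deliberately small, as in the example above
const documentCount = 8; // the computed field on the users document

console.log(bucketSizes(documentCount, BUCKET_SIZE).join(", ")); // 6, 2
console.log(locate(7, BUCKET_SIZE)); // { bucket: 1, offset: 1 }
```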
The documents_overflow collection (or the bucket pattern)
[Figure: the two remaining documents stored either as two individual overflow documents or as a second bucket]
These represent the same data; pick one.
Approximation pattern
• It’s a facsimile of the actual value (“close enough”)
• Don’t pay the compute cost for numbers whose exact value doesn’t matter
• Leverage the “law of averages” to reduce overall load
• Useful for page counters, Internet of Things (IoT), etc.
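The counters collection (Approximation)
Update the data randomly, about once every 100 page loads:
// Math.random() returns a value in [0, 1), so this branch runs with probability 1/100
if (Math.floor(Math.random() * 100) === 0) {
  // upsert creates the counter document on the first sampled hit
  db.counters.updateOne({ _id: "page_views" }, { $inc: { count: 100 } }, { upsert: true });
}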
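A small simulation of why this works: writing with probability 1/100 and adding 100 each time gives an expected total equal to the true count, with about 1% of the writes. This is a sketch, not the talk's code. `recordPageView` and `cyclingRandom` are hypothetical helpers, and the injectable `random` parameter exists only so the result can be made deterministic:

```javascript
const SAMPLE_RATE = 100;

// Write roughly once per SAMPLE_RATE events, adding SAMPLE_RATE each time.
function recordPageView(counter, random = Math.random) {
  // random() returns a value in [0, 1); the branch runs with probability 1/SAMPLE_RATE
  if (Math.floor(random() * SAMPLE_RATE) === 0) {
    counter.count += SAMPLE_RATE; // mirrors { $inc: { count: 100 } }
    counter.writes += 1;
  }
}

// Deterministic stand-in for Math.random, cycling through 0.00, 0.01, ..., 0.99.
function cyclingRandom() {
  let i = 0;
  return () => (i++ % SAMPLE_RATE) / SAMPLE_RATE;
}

const exact = { count: 0, writes: 0 };
const rng = cyclingRandom();
for (let view = 0; view < 10000; view++) recordPageView(exact, rng);
console.log(exact); // { count: 10000, writes: 100 }

const sampled = { count: 0, writes: 0 };
for (let view = 0; view < 10000; view++) recordPageView(sampled);
console.log(sampled); // close to 10000, with roughly 100 writes
```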
Conclusion
Takeaways
Leverage flexible methodology
Don’t reinvent the wheel; use the flexible methodology to make development faster
Share a common vocabulary
Make it easy to talk to each other about patterns; share and use common definitions
Use patterns
Use "plug-and-play" components for your future designs
Additional Resources
Blogs on Patterns
by Ken Alger and Daniel Coupal
https://www.mongodb.com/blog/post/building-with-patterns-a-summary
Course on Data Modeling: M320
university.mongodb.com
Paging with the Bucket Pattern
by Justin LaBreck
https://www.mongodb.com/blog/post/paging-with-the-bucket-pattern--part-1
MongoDB Consulting Services
https://www.mongodb.com/products/consulting
Remember to take
our free MongoDB
courses!
university.mongodb.com
#MDBlive

MongoDB.Live 2020 - Advanced Schema Design Patterns


Editor's Notes

  • #6 Justin: “The reason we’re doing this talk is to help customers succeed. I work with customers often as a consulting engineer on creating schema designs for new projects or to migrate off legacy database platforms. Patterns are a useful way to jumpstart development and move quickly from design to implementation. We want to help you with this process, so let’s start by defining what patterns actually are.”
  • #8 We can draw a parallel between MongoDB's schema design patterns and the software design patterns described by the Gang of Four. Those software design patterns are not full solutions to problems. They are common units of work that these engineers ran into all the time when designing solutions for their customers.
  • #9 For over 12 years, we have been helping our customers build solutions.
  • #10 For over 12 years, we have been helping our customers build solutions.
  • #12 Today, we will go over a few of the patterns we identified. However, before we do so, let's see how they fit into our overall methodology for data modeling in MongoDB.
  • #14 Justin: “While we’re on the topic of methodology, Daniel created another presentation that goes much more in depth on data modeling methodology. It’s part of MongoDB.live, so it should be available for you to view if you haven’t already seen it.”
  • #15 Justin: “In an ideal world, customers design with both simplicity and performance as the goal. MongoDB provides that mixture of simplicity and performance for most early release projects, allowing developers to focus on simplicity with performance automatically following. But as software becomes more complex, simplicity reduces naturally, and unfortunately performance reduces too as a result. Companies that experience growth have to scale. Those that scale successfully spend a lot of time focusing on performance. That performance, however, often comes at the cost of simplicity. For large projects, making the decision early to spend more time on schema design with an emphasis on performance will reduce the need for subsequent release phases and major schema changes in the long run.”
  • #16 Different inputs are available: migrating from an RDBMS provides logs and stats on the current system; maybe there are documented scenarios; or consult business domain experts.
  • #17 Three resources provide input to the "Workload" in terms of data size, number of operations, and quantified operations (including durability). Outputs: queries on the data, potential indexes, size of the data, operations in terms of create/read/update/delete, and assumptions (important). Assumptions change!
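As a small worked example of quantifying operations, the sketch below converts daily operation counts into average operations per second and totals reads versus writes. The operation names and counts are hypothetical. They are chosen only to make the arithmetic easy to follow, using 86,400 seconds per day.

```javascript
// Hypothetical workload description for illustration only: the operation
// names and daily counts are assumptions, not figures from the talk.
const workload = [
  { name: "load profile", type: "read", perDay: 864000 },
  { name: "load account overview", type: "read", perDay: 432000 },
  { name: "file a claim", type: "write", perDay: 8640 },
];

const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

// Quantify each operation as an average rate in operations per second.
function opsPerSecond(op) {
  return op.perDay / SECONDS_PER_DAY;
}

// Total the read and write rates across the whole workload.
function summarize(ops) {
  const totals = { read: 0, write: 0 };
  for (const op of ops) {
    totals[op.type] += opsPerSecond(op);
  }
  return totals;
}

for (const op of workload) {
  console.log(`${op.name}: ${opsPerSecond(op)} ops/s (${op.type})`);
}
console.log(summarize(workload)); // { read: 15, write: 0.1 }
```

Averages hide peaks. A real workload description would also record peak rates and the assumptions behind them, since those assumptions change.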
  • #18 Although MongoDB is a document store, we do have relationships. Here we need to consult data modeling experts and business domain experts. The outputs: collections, fields, and shapes (types, subdocuments, arrays).
  • #19 At the end of step 2, you have answered the important question of how to model each relationship: should it be embedded, or will it be linked? Step 3 is mostly about applying patterns for performance. You only need to apply them if they are needed. We will come back to patterns and why we have them.
  • #24 Here are some top requirements we have been given for this project.
  • #25 Translating this into operations, we get … As for planning the future portal for the users, eh, we will model this later. The great thing with MongoDB, compared to traditional relational databases, is that it is very easy to migrate your schema without downtime.
  • #26 Let's get some numbers and attributes for the operations …
  • #27 Well, the main thing here is that the operations that deal with the UI must all be super fast, so this should be our focus. I think by naming the project MIP, the stakeholders were giving us a hint that they wanted a superfast system.
  • #28 In other words, this is the UI, and …
  • #29 Loading the profile should be superfast
  • #30 Loading the overview of the account should be superfast
  • #31 And loading the details of a policy, claim, bill, document, or message should also be superfast
  • #32 We could embed all entities into the account one. This is simple, however it may lead to large documents and it may not give us the performance we want.
  • #34 Justin: Base use cases on customer experiences.
  • #35 Draw a physical box around everything that needs to be generated. Find the domains on the page. Data should be stored the way it’s used. Identify areas that require queries. Minimize the number of queries.
  • #36 Ask the question: what information will be required most often? User information: queried at least once per session, possibly once for every page load. Simple enough. What is the second most used part of the site?
  • #37 What is the second most used portion of the site? Navigation. There is a lot of information here, and a time budget exists for this section.
  • #38 Time budget for “Load overview of account”
  • #39 Note: phases 2 and 3 of the methodology are inherent here. “Because we have the visuals, we can do both phases 2 and 3 at the same time.” Experience will make this easier over time: “identify the relationship” and “apply a pattern.” The visual representation defines the relationships and is driven by the requirements of the page.
  • #40 Scope: user to everything. A relationship exists between the user and everything because the user is authenticated. What is displayed on the page?
  • #41 Looking at patterns: simplicity versus performance. Simplicity: keep everything in one large object, with all information about claims, documents, messages, etc. However, this creates a large document, which is slow to load, takes up a lot of memory, and is cumbersome to maintain. Performance: going back to the original goal, render the page as fast as possible. Find a balance between simplicity and performance. We want this to be fast, so we don’t want to read a lot of information.
  • #42 Simplicity versus performance: counting claims, documents, and messages will be slow, and it happens on every page load. We also don’t need a lot of claim information to list claims; we just need the names, so we save memory by keeping only the names. We have patterns to speed this up, two specifically, with all fields kept here in the users collection: the extended reference pattern for open claims, and the computed pattern for billing due.
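The note above names two patterns for the users collection. The sketch below derives the overview fields stored in a users document. It uses an extended reference for open claims, copying only `_id` and `name`, and computed values for the claim count and the unpaid billing total. The field names `openClaims`, `claimCount`, and `billingDue`, the `status` and `paid` flags, and the sample data are assumptions for illustration. Amounts are kept in integer cents to avoid floating-point rounding.

```javascript
// Extended reference: copy only the claim fields the overview page displays.
function toClaimReference(claim) {
  return { _id: claim._id, name: claim.name };
}

// Build the overview fields that would be stored in the users collection.
// Field names are hypothetical; amounts are integer cents.
function buildOverview(user, claims, bills) {
  const openClaims = claims
    .filter((c) => c.status === "open")
    .map(toClaimReference); // extended reference pattern
  const billingDue = bills
    .filter((b) => !b.paid)
    .reduce((sum, b) => sum + b.amountCents, 0); // computed pattern
  return { ...user, openClaims, claimCount: claims.length, billingDue };
}

const overview = buildOverview(
  { _id: "u1", name: "Ana" },
  [
    { _id: 1, name: "Windshield", status: "open", notes: "long text" },
    { _id: 2, name: "Hail damage", status: "closed", notes: "long text" },
  ],
  [
    { amountCents: 12000, paid: false },
    { amountCents: 9900, paid: true },
  ]
);
console.log(overview);
```

The bulky `notes` field never reaches the users document. This is the memory saving the note describes.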
  • #43 The computed pattern pre-calculates values. STOP AFTER FIRST BULLET. Daniel: “A cached aggregation? Isn’t that a lot like a view from the SQL/relational world?” Justin: “Absolutely, but even better.”
  • #44 Justin: “In the relational world, a view aggregates its underlying table. It’s even better in MongoDB because we’re using a pattern to keep the computed data available where the application needs it.” On the left, we aggregate across many rows or documents, resulting in many calculations for every read. On the right, we offload the work to writes (because we’re optimizing for reads). Additional writes result in significantly fewer reads. The application already has this information available, so a separate read may not be necessary.
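The read-versus-write tradeoff of the computed pattern can be sketched in memory. Below, `countClaimsOnRead` scans every claim on each read. In contrast, `insertClaim` does a little extra work on each write so that `countClaimsComputed` is a single lookup. Plain arrays and a `Map` stand in for collections, and all names are hypothetical. Against MongoDB, the extra write step would typically be an `updateOne` with `$inc` on the user's document.

```javascript
// Stand-ins for two collections (hypothetical names).
const claims = [];
const users = new Map([
  ["u1", { _id: "u1", claimCount: 0 }],
  ["u2", { _id: "u2", claimCount: 0 }],
]);

// Read-time approach: every read scans all claims.
function countClaimsOnRead(userId) {
  return claims.filter((c) => c.userId === userId).length;
}

// Computed pattern: each write also updates the pre-computed count,
// similar in spirit to db.users.updateOne({ _id }, { $inc: { claimCount: 1 } }).
function insertClaim(claim) {
  claims.push(claim);
  users.get(claim.userId).claimCount += 1;
}

// Each read is now a single lookup of the stored value.
function countClaimsComputed(userId) {
  return users.get(userId).claimCount;
}

insertClaim({ _id: 1, userId: "u1" });
insertClaim({ _id: 2, userId: "u1" });
insertClaim({ _id: 3, userId: "u2" });
console.log(countClaimsOnRead("u1"), countClaimsComputed("u1")); // 2 2
```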
  • #45 Denormalization aside: how do you denormalize and keep data up to date? It is the most common question, and it is not as difficult as it sounds. Example: a large e-commerce site and its reviews.
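For the e-commerce example, here is a minimal sketch of keeping a denormalized field up to date. Each review embeds a copy of the product name, so a rename must update the product and every review that references it. The data and the `renameProduct` helper are hypothetical. The comment shows the roughly equivalent MongoDB `updateOne` and `updateMany` calls. Product names presumably change far less often than reviews are read, which is what makes the occasional multi-document update an acceptable trade.

```javascript
// Hypothetical e-commerce data: each review embeds a copy of the product name.
const products = [{ _id: 1, name: "Blue Mug" }];
const reviews = [
  { _id: 10, product: { _id: 1, name: "Blue Mug" }, stars: 5 },
  { _id: 11, product: { _id: 1, name: "Blue Mug" }, stars: 4 },
];

// Rename a product and propagate the change to every denormalized copy.
// Against MongoDB this would be roughly two calls:
//   db.products.updateOne({ _id: id }, { $set: { name: newName } })
//   db.reviews.updateMany({ "product._id": id }, { $set: { "product.name": newName } })
// Returns the number of reviews updated (0 if the product does not exist).
function renameProduct(id, newName) {
  const product = products.find((p) => p._id === id);
  if (!product) return 0;
  product.name = newName;
  let updated = 0;
  for (const review of reviews) {
    if (review.product._id === id) {
      review.product.name = newName;
      updated += 1;
    }
  }
  return updated;
}

console.log(renameProduct(1, "Navy Mug")); // 2
```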
  • #47 Users collection: many patterns build on each other.
  • #49 Policy collection (click a policy): bucket pattern. Contains all details per policy in a single document. Also uses the extended reference pattern for open claims (possibly with a separate claims collection).
  • #50 END OF SLIDE, PAUSE. Daniel: “We really see the bucket pattern used for IoT.”
  • #51 PAUSE. Daniel covers the slide.
  • #52 Policy collection (click a policy): bucket pattern. Contains all details per policy in a single document. Also uses the extended reference pattern for open claims (possibly with a separate claims collection).
  • #53 Policies need to reference a car. We don’t need all of the car information, just the fields needed to display it quickly. This sits in between a reference and embedding.
  • #54 Another use of the bucket pattern. Pick a pattern: bucket or overflow. Messages can be long, so cap the number of items in each bucket. Multiple buckets create a subset pattern. The total number of documents is available from the users collection (“documentCount”).
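Here is a minimal in-memory sketch of bucketing messages with a cap per bucket. The cap of 3, the field names, and the `addMessage` and `latestMessages` helpers are assumptions for illustration. Loading only the newest bucket gives the subset behavior, and an overall total can be kept as a counter on the user document, as with `documentCount`. In MongoDB, a common way to express the append is an upsert that matches a bucket whose `count` is below the cap and applies `$push` plus `$inc`.

```javascript
// Hypothetical cap on messages per bucket document.
const MAX_PER_BUCKET = 3;

// Append a message, opening a new bucket when the newest one is full.
// Returns the index of the bucket that received the message.
function addMessage(buckets, userId, message) {
  let current = buckets[buckets.length - 1];
  if (!current || current.count >= MAX_PER_BUCKET) {
    current = { userId, bucket: buckets.length, count: 0, messages: [] };
    buckets.push(current);
  }
  current.messages.push(message);
  current.count += 1;
  return current.bucket;
}

// Subset: the overview page only needs the newest bucket.
function latestMessages(buckets) {
  return buckets.length ? buckets[buckets.length - 1].messages : [];
}

const buckets = [];
for (let i = 1; i <= 7; i++) {
  addMessage(buckets, "u1", `message ${i}`);
}
console.log(buckets.map((b) => b.count)); // [ 3, 3, 1 ]
console.log(latestMessages(buckets)); // [ 'message 7' ]
```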
  • #64 Approximation pattern: a customer story. Everyone probably missed this: 90% CPU usage on updates. The exact number was unnecessary (nobody knew it was there).
  • #74 Before you go, I want to remind you to register at university.mongodb.com. We have courses that cover everything you want to learn about MongoDB, and the courses are free. Our goal is for you to have the knowledge required to be successful in building and deploying systems using MongoDB. Thanks for listening, and enjoy the other presentations at this conference.