1. Advanced Schema Design Patterns
Justin LaBreck
Principal Consulting Engineer
Daniel Coupal
Education Department
5. Goals of the Presentation
What are Patterns?
What are schema design patterns and why are we using them?
Methodology
A quick overview of our methodology
Use Case
Single View: an insurance company needs to unify a few systems using RDBMS
Applying Patterns
• bucket
• computed
• extended reference
• outlier
• approximation
6. What are Patterns?
What are schema design patterns and why are we using them?
8. What are Patterns?
Building Blocks
identified by our Consulting Engineers helping customers for the last 12 years.
Common Language
Data Architects and Engineers can easily reference the same things.
9. What can patterns do for you?
Improve Performance
by using no more resources than you should
Simplify the access to the data
by grouping and pre-arranging data in a simpler form
10. Any caution in using the patterns?
Duplication of Data
to avoid reading from many collections for common queries
Staleness of Data
for pre-computed fields updated on a given frequency
De-normalizations
to make data easier and faster to retrieve
18. Methodology
1. Describe the Workload
2. Identify and Model the Relationships
3. Apply Patterns
21. Flexible Methodology
Goal | Simplicity | Simplicity and Performance | Performance
1. Describe the Workload | Most frequent Operation | Most Operations; Quantify Ops | All Operations; Quantify Ops; Qualify Ops
2. Identify and Model the Relationships | Mostly embedding | Embedding and linking | Embedding and linking
3. Apply Patterns | Pattern A | Pattern A; Pattern B | Pattern A; Pattern B; Pattern C; …
22. Use Case: Single View
Mongo Insurance Corporation needs to unify a few systems using RDBMS
24. 1 – Workload: list the operations
Query | Operation | Description
1. Sync changes to MIP | write | Apply changes from the different legacy databases to MIP's MongoDB database
2. Load user profile | read | Create a session with the user information
3. Load overview of account | read | Gather all information for the main page: policies, claims, payments, documents, and messages
4. Load detail of an artifact | read | Access a given policy, claim, payment, document, or message
5. Process claims | read | Analyze all changes in claims for which actions must be taken
6. User portal operations?
25. 1 – Workload: quantify/qualify the operations
Query | Quantification | Qualification
1. Sync changes to MIP | 24 writes/day | < 10 mins; critical write
2. Load user profile | 12 reads/sec (1 000 000 reads/day) | < 2 ms; no stale data
3. Load overview of account | 12 reads/sec (1 000 000 reads/day) | < 2 ms; no stale data
4. Load detail of an artifact | 115 reads/sec (10 000 000 reads/day) | < 2 ms; no stale data
5. Process claims | 2 reads/day | < 5 mins; no stale data; collection scan
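The per-second figures in this table are daily totals spread evenly over 24 hours. As a quick check of the arithmetic, here is a minimal sketch; the helper name `per_second` is illustrative and not part of the presentation. Since these are averages, peak traffic would presumably be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(ops_per_day: float) -> float:
    """Average rate per second for a daily operation count."""
    return ops_per_day / SECONDS_PER_DAY


# Daily read volumes taken from the workload table.
daily_reads = {
    "Load user profile": 1_000_000,
    "Load overview of account": 1_000_000,
    "Load detail of an artifact": 10_000_000,
}

rates = {name: per_second(count) for name, count in daily_reads.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} reads/sec on average")
```

The results come out at about 11.6 and 115.7 reads per second, which matches the 12 and 115 shown on the slide.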
27. Mongo Insurance Corporation John Doe
Policies
● Home
● Car
● Earthquake
Open Claims 2
Billing
● Due: $800
Documents 8
Messages 25
Car Insurance Details
Car 1 - Supercar XLT (2020)
4 wheel drive, 8 cyl
Premium: $2000
Car 2 - Compactcar (2010)
2 wheel drive, 4 cyl
Premium: $800
Open claims
Collision
Damaged rear bumper
Trunk body damage
Find a partner repair center
54323 Views
30. Mongo Insurance Corporation John Doe
Documents and Statements
54323 Views
Total documents: 8
New policy inquiry 2020/02/14
Bill past due 2019/11/03
Bill past due (2nd notice) 2019/12/03
Your new claim 2019/04/20
Your claim status updated 2019/04/15
Your new claim 2019/04/13
next >>
Policies
● Home
● Car
● Earthquake
Open Claims 2
Billing
● Due: $800
Documents 8
Messages 25
33. Applying Patterns
Bucket, Computed, Extended Reference, Outlier, Approximation
35. Mongo Insurance Corporation John Doe
Car Insurance Details
Car 1 - Supercar XLT (2020)
4 wheel drive, 8 cyl
Premium: $2000
Car 2 - Compactcar (2010)
2 wheel drive, 4 cyl
Premium: $800
Open claims
Collision
Damaged rear bumper
Trunk body damage
Find a partner repair center
54323 Views
Query user details once, then store in session
Policies
● Home
● Car
● Earthquake
Open Claims 2
Billing
● Due: $800
Documents 8
Messages 25
37. Important operations
Query | Quantification | Qualification
1. Sync changes to MIP | 24 writes/day | < 10 mins; critical write
2. Load user profile | 12 reads/sec (1 000 000 reads/day) | < 2 ms; no stale data
3. Load overview of account | 12 reads/sec (1 000 000 reads/day) | < 2 ms; no stale data
4. Load detail of an artifact | 115 reads/sec (10 000 000 reads/day) | < 2 ms; no stale data
5. Process claims | 2 reads/day | < 5 mins; no stale data; collection scan
38. Methodology
1. Describe the Workload
2. Identify and Model the Relationships
3. Apply Patterns
41. Mongo Insurance Corporation John Doe
Car Insurance Details
Car 1 - Supercar XLT (2020)
4 wheel drive, 8 cyl
Premium: $2000
Car 2 - Compactcar (2010)
2 wheel drive, 4 cyl
Premium: $800
Open claims
Collision
Damaged rear bumper
Trunk body damage
54323 Views
Policies
● Home
● Car
● Earthquake
Open Claims 2
Billing
● Due: $800
Documents 8
Messages 25
Simplicity versus Performance
Policy collection?
Separate claims, billing, documents, and messages?
42. Computed pattern
• Store a “computed” value instead of calculating every time
• Fast for reads, additional writes required
• A “cached” aggregation, updated by the application
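To make the trade-off concrete, here is a minimal sketch of the computed pattern using plain Python dictionaries as stand-ins for collections. The names `users`, `claims`, `openClaimsCount`, and `open_claim` are assumptions chosen for illustration; a real application would issue the equivalent updates through its MongoDB driver.

```python
# Hypothetical in-memory stand-ins for two collections.
users = {"jdoe": {"name": "John Doe", "openClaimsCount": 0}}
claims = []


def open_claim(user_id: str, description: str) -> None:
    """Record a claim and update the cached count in the same code path."""
    claims.append({"userId": user_id, "description": description, "status": "open"})
    users[user_id]["openClaimsCount"] += 1  # the additional write


def count_open_claims(user_id: str) -> int:
    """The uncached alternative: scan and count on every read."""
    return sum(
        1 for c in claims if c["userId"] == user_id and c["status"] == "open"
    )


open_claim("jdoe", "Damaged rear bumper")
open_claim("jdoe", "Trunk body damage")

# Page load: a single lookup, no counting at read time.
overview = users["jdoe"]
print(overview["openClaimsCount"])
```

The count is paid for once per write rather than once per page load, which suits a workload that is dominated by reads.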
44. Extended Reference pattern
• Keep a “reference” to data from one collection in another
• Have the data needed to generate a page load available to the application when it is needed
• Denormalization to increase read performance
• MongoDB doesn’t have JOINs, it has $lookup
• One read is faster than two
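As an illustration, the sketch below copies only the display fields of a car into a policy document and keeps the car's id as the reference. The collection contents and field names (`cars`, `DISPLAY_FIELDS`, `carId`, `serviceHistory`) are hypothetical, and plain dictionaries stand in for MongoDB documents.

```python
# Hypothetical source collection holding the full car record.
cars = {
    "car1": {
        "model": "Supercar XLT",
        "year": 2020,
        "drive": "4 wheel drive",
        "cylinders": 8,
        "serviceHistory": ["placeholder entry"],  # not needed on the policy page
    }
}

# Only the fields the page actually displays are duplicated.
DISPLAY_FIELDS = ("model", "year", "drive", "cylinders")


def extended_reference(car_id: str) -> dict:
    """Build a reference that carries the car's id plus its display fields."""
    car = cars[car_id]
    ref = {"carId": car_id}
    ref.update({field: car[field] for field in DISPLAY_FIELDS})
    return ref


policy = {"type": "Car", "premium": 2000, "car": extended_reference("car1")}
print(policy["car"])
```

The policy page can then be rendered from one read, while the full car record stays reachable through `carId` when it is needed. The copied fields have to be refreshed by the application if the car record changes.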
48. Mongo Insurance Corporation John Doe
Car Insurance Details
Car 1 - Supercar XLT (2020)
4 wheel drive, 8 cyl
Premium: $1000
Car 2 - Compactcar (2010)
2 wheel drive, 4 cyl
Premium: $400
Open claims
Collision
Damaged rear bumper
Trunk body damage
Find a partner repair center
54323 Views
Policies
● Home
● Car
● Earthquake
Open Claims 2
Billing
● Due: $800
Documents 8
Messages 25
Using the bucket pattern, each policy exists in an array
Contains all information about each policy within the array
49. Bucket pattern
• A single domain object is made up of multiple documents
• Leverage the power of arrays to “bucket” groups of data together
e.g. 2 documents with buckets containing 50 objects each = 1 domain object with 100 objects
• Useful for keeping relevant data “close”, e.g. paging
• Define a max “bucket size”, or the number of objects to store per bucket
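The 100-object example can be sketched as follows. This is an illustration only: `make_buckets` and the field names `ownerId`, `page`, `count`, and `items` are assumptions, and plain dictionaries stand in for MongoDB documents.

```python
def make_buckets(owner_id: str, items: list, bucket_size: int) -> list:
    """Split one domain object's items into documents of at most bucket_size."""
    buckets = []
    for start in range(0, len(items), bucket_size):
        chunk = items[start:start + bucket_size]
        buckets.append(
            {
                "ownerId": owner_id,
                "page": start // bucket_size,  # bucket number doubles as page number
                "count": len(chunk),
                "items": chunk,
            }
        )
    return buckets


messages = [f"message {n}" for n in range(1, 101)]  # 100 objects
buckets = make_buckets("jdoe", messages, bucket_size=50)

for bucket in buckets:
    print(bucket["page"], bucket["count"], bucket["items"][0])
```

With this layout, showing the next page means reading one bucket document rather than 50 separate ones.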
52. The policies collection
Bucket
_id references users collection
Enough policy information to generate display
Extended Reference
53. Mongo Insurance Corporation John Doe
Documents and Statements
54323 Views
Total documents: 8
New policy inquiry 2020/02/14
Bill past due 2019/11/03
Bill past due (2nd notice) 2019/12/03
Your new claim 2019/04/20
Your claim status updated 2019/04/15
Your new claim 2019/04/13
next >>
Policies
● Home
● Car
● Earthquake
Open Claims 2
Billing
● Due: $800
Documents 8
Messages 25
Bucket pattern? Or outlier pattern?
54. Outlier pattern
• Focus on the 80% use case, optimize for the other 20% later
• Create a collection representing the ideal case, then “overflow” into another collection for everything else
• An overflow collection will represent a separate code path, and the extra/overflow documents exist in the overflow collection
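One way the overflow idea could look in application code is sketched below. The names `documents`, `documents_overflow`, `hasOverflow`, and the maximum of 6 are assumptions chosen to keep the example short, and plain dictionaries stand in for the two collections.

```python
MAX_IN_MAIN = 6  # small maximum, chosen only to keep the sketch short

documents = {}           # main collection: covers the common case
documents_overflow = {}  # overflow collection: only users above the maximum


def add_document(user_id: str, title: str) -> None:
    """Store a document in the main entry, or overflow once it is full."""
    main = documents.setdefault(user_id, {"items": [], "hasOverflow": False})
    if len(main["items"]) < MAX_IN_MAIN:
        main["items"].append(title)
    else:
        main["hasOverflow"] = True
        documents_overflow.setdefault(user_id, []).append(title)


def all_documents(user_id: str) -> list:
    """Read the main entry, and take the separate code path only for outliers."""
    main = documents.get(user_id, {"items": [], "hasOverflow": False})
    result = list(main["items"])
    if main["hasOverflow"]:
        result += documents_overflow[user_id]
    return result


for n in range(1, 9):
    add_document("jdoe", f"statement {n}")

print(len(all_documents("jdoe")))
```

Most users never set `hasOverflow`, so the common read stays a single lookup; only the outliers pay for the second one.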
59. The documents collection
Bucket
_id matches users collection
Bucket contains 6 documents* or some realistic maximum that is data driven
But the users collection says “documentCount” is 8. Where are the rest?
* This example uses a maximum bucket size of 6 to keep the example simple
63. Mongo Insurance Corporation John Doe
Car Insurance Details
Car 1 - Supercar XLT (2020)
4 wheel drive, 8 cyl
Premium: $2000
Car 2 - Compactcar (2010)
2 wheel drive, 4 cyl
Premium: $800
Open claims
Collision
Damaged rear bumper
Trunk body damage
Find a partner repair center
54323 Views
Policies
● Home
● Car
● Earthquake
Open Claims 2
Billing
● Due: $800
Documents 8
Messages 25
64. Approximation pattern
• It’s a facsimile of the actual value (“close enough”)
• Don’t take the compute cost for numbers that don’t matter
• Leverage the “law of averages” to reduce overall load
• Useful for page counters, Internet of Things (IoT), etc.
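A common way to apply this idea to a page counter is to write only about once every N views and add N each time. The sketch below assumes that approach; `record_view`, `step`, and the `writes` field are illustrative names, and a dictionary stands in for the stored page document.

```python
import random


def record_view(page: dict, step: int = 10, rand=random.random) -> None:
    """Count a view, but write only with probability 1/step, adding step each time."""
    if rand() < 1 / step:
        page["views"] += step   # approximate counter
        page["writes"] += 1     # tracked here only to show the saved writes


page = {"views": 0, "writes": 0}
for _ in range(10_000):
    record_view(page)

# Roughly 10,000 views recorded with roughly 1,000 writes; exact values vary per run.
print(page)
```

On average the stored total tracks the true number of views, while the number of writes drops by about a factor of `step`.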
70. Takeaways
Leverage flexible methodology
Don’t re-invent the wheel, use the flexible methodology to make development faster
Share a common vocabulary
Make it easy to talk to each other about patterns; share and use the common definition
Use patterns
Use "plug-and-play" components for your future designs
71. Additional Resources
Blogs on Patterns
by Ken Alger and Daniel Coupal
https://www.mongodb.com/blog/post/building-with-patterns-a-summary
Course on Data Modeling: M320
university.mongodb.com
Paging with the Bucket Pattern
by Justin LaBreck
https://www.mongodb.com/blog/post/paging-with-the-bucket-pattern--part-1
Justin: “The reason we’re doing this talk is to help customers succeed. I work with customers often as a consulting engineer on creating schema designs for new projects or to migrate off legacy database platforms. Patterns are a useful way to jumpstart development and move quickly from design to implementation. We want to help you with this process, so let’s start by defining what patterns actually are.”
We can draw a parallel between MongoDB's schema design patterns and the software design patterns described by the Gang of Four.
The software design patterns are not full solutions to problems.
They are common units of work that these engineers kept running into when designing solutions for their customers.
For over 12 years, we have been helping our customers build solutions.
Today, we will go over a few of the patterns we identified.
However, before we do so, let's see how they fit in our overall methodology for modeling for MongoDB
Justin: “While we’re on the topic of methodology, Daniel created another presentation that goes much more in-depth on data modeling methodology. It’s part of MongoDB.live, so it should be available for you to view if you haven’t already seen it.”
Justin: “In an ideal world, customers design with both simplicity and performance as the goal. MongoDB provides that mixture of simplicity and performance for most early release projects, allowing developers to focus on simplicity with performance automatically following. But as software becomes more complex, simplicity reduces naturally, and unfortunately performance reduces too as a result. Companies that experience growth have to scale. Those that scale successfully spend a lot of time focusing on performance. That performance, however, often comes at the cost of simplicity. For large projects, making the decision early to spend more time on schema design with an emphasis on performance will reduce the need for subsequent release phases and major schema changes in the long run.”
Different inputs available
Migrating from a RDBMS would provide logs and stats on the current system
Maybe there are documented scenarios
Or consult business Domain experts
Three resources provide input to the "Workload" in terms of
Size of data
Number of operations
Quantify operations -> durability.
Outputs:
Queries on data
Potential Indexes
Size of data
Operation in terms of create/read/update and delete
Assumptions (important)
Assumptions change!?
Although MongoDB is a document store, we do have relationships.
Here we need to consult
Data Modeling experts
Business Domain expert
The outputs:
Collections
Fields
Shapes (types, sub document, array)
At the end of step 2, you have answered the important question of how to model each relationship:
Should it be embedded
Or will it be linked
Step 3 is mostly about applying patterns for performance. You only need to apply them if they are needed.
More on why we have patterns and we will come back to them.
Here are some top requirements we have been given for this project.
Transposing this in operations, we get
…
As for planning for the future portal for the users, eh, we will model this later. The great thing with MongoDB, compared to traditional relational databases, is that it is very easy to migrate your schema without downtime.
Let's get some numbers and attributes for the operations
…
Well, the main thing here is that the operations that deal with the UI must all be super fast, so this should be our focus.
I think by naming the project MIP, the stakeholders were giving us a hint that they wanted a superfast system.
In other words, this is the UI, and …
Loading the profile should be superfast
Loading the overview of the account should be superfast
And loading the details of a policy, claim, bill, document, or message should also be superfast
We could embed all entities into the account one. This is simple, however it may lead to large documents and it may not give us the performance we want.
Justin
Base use cases on customer experiences
Draw a physical box around everything that needs to be generated
Find domains on page
Data should be stored as it’s used
Identify areas that require queries
Minimize the number of queries
Ask the question: What information will be required most often
User information
Queried at least once per session, possibly once for every page load
Simple enough
What is the second most used portion of the site? Navigation.
Lots of information here
Budget exists for this section
Time budget for “Load overview of account”
Note: phase 2 and phase 3 of methodology are inherent here
“Because we have the visuals, we can do both phases 2 and 3 at the same time”
Experience will make this easier over time
“Identify the relationship” and “apply a pattern”
Visual representation
Defines relationships
Driven by requirements of the page
Scope: user to everything
A relationship exists between the user and everything because the user is authenticated
What is displayed on the page??
Looking at patterns: Simplicity versus performance
Simplicity: keep everything in one large object
All information about claims, documents, messages, etc
However, this creates a large document, which is slow to load, takes up a lot of memory, and is cumbersome to maintain
Performance:
Going back to original goal, render the page as fast as possible
Find balance between simplicity and performance
We want this to be fast so we don’t want to read a lot of information
Simplicity versus performance
Counting claims, documents, and messages will be slow (happens every page load)
We also don’t need a lot of claim information to get a list of claims, we just need names
Save some memory by keeping just names
We have patterns to speed this up
Two patterns specifically
Keep all fields here in the users collection
Extended reference pattern
Open claims
Computed pattern
Billing due
Computed pattern pre-calculates values
STOP AFTER FIRST BULLET
Daniel: “A cached aggregation? That sounds a lot like a view from the SQL/relational world?”
Justin: “Absolutely, but even better.”
Justin: “In relational world, the view aggregates its underlying table. It’s even better in MongoDB because we’re using a pattern to keep the computed data available to be used where the application needs it.”
On the left, aggregate across lots of rows or documents resulting in many calculations for every read
On the right, we offload the work to writes (because we’re optimizing for reads)
Additional writes result in significantly fewer reads
The application already has this information available, read may not be necessary
Denormalization
Aside: how do you denormalize and keep data up-to-date
Most common question
Not as difficult as it sounds
Large ecommerce site and reviews
Users collection
Many patterns build on each other
Policy collection (click a policy)
Bucket pattern
Contains all details per policy in a single document
Also uses extended reference for open claims (possible claims collection)
END OF SLIDE, PAUSE
Daniel: We really see the bucket pattern used for IoT
PAUSE
Daniel covers slide
Policies need to reference a car
We don’t need all that car information
Just keeping the fields needed to display quickly
In-between a reference and embedded
Another use of the bucket pattern
Pick a pattern, bucket or overflow
Messages can be long, so set a maximum number of items in the bucket
Multiple buckets create subset pattern
Total documents available from users collection “documentCount”
Approximation pattern
Customer story
Everyone probably missed this
90% CPU usage on updates
Exact number unnecessary (nobody knew it was there)
Before you go, I want to remind you to go register at university.mongodb.com
We have courses that cover everything you want to learn about MongoDB.
Courses are free. Our goal is for you to have the required knowledge to be successful in building and deploying systems using MongoDB.
Thanks for listening, and enjoy the other presentations at this conference.