At this point, you may be familiar with MongoDB and its Document Model.
However, what are the methods you can use to create an efficient database schema quickly and effectively?
This presentation will explore the different phases of a methodology to create a database schema. This methodology covers the description of your workload, the identification of the relationships between the elements (one-to-one, one-to-many and many-to-many) and an introduction to design patterns. Those patterns present practical solutions to different problems observed while helping our customers over the last 10 years.
In this session, you will learn about:
The differences between modeling for MongoDB versus a relational database.
A flexible methodology to model for MongoDB, which can be applied to simple projects, agile ones or more complex ones.
An overview of some common design patterns that help improve system performance.
3. Goals of the Presentation
Recognize the differences when modelling for a Document Database versus a Relational Database
Summarize the steps of a methodology when modelling for MongoDB
Recognize the need and when to apply Schema Design Patterns
8. Thinking in Documents
1. Polymorphism
• different documents may contain different fields
2. Array
• represents a "one-to-many" relation
• an index on an array field covers every entry
3. Sub Document
• groups some fields together
4. JSON/BSON
• documents are often shown as JSON
• BSON is the physical format
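The four ideas on this slide can be sketched with plain Python dictionaries standing in for documents. The field names and values below are hypothetical and chosen only for illustration; they are not taken from the presentation.

```python
import json

# Two hypothetical documents that could live in the same collection.
espresso_machine = {
    "_id": 1,
    "type": "machine",
    "model": "EM-200",                                # field only machines have
    "location": {"store_id": 42, "city": "Sydney"},   # sub document
    "sensors": ["temperature", "weight", "timer"],    # array: one-to-many
}
shelf = {
    "_id": 2,
    "type": "shelf",
    "capacity_kg": 50,                                # field only shelves have
    "location": {"store_id": 42, "city": "Sydney"},
}

# Polymorphism: the two documents share some fields and differ in others.
only_in_machine = sorted(set(espresso_machine) - set(shelf))
print(only_in_machine)

# Documents are usually displayed as JSON, even though BSON is stored.
print(json.dumps(shelf, indent=2))
```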
12. Differences: Relational/Tabular vs Document
| | Relational | MongoDB |
|---|---|---|
| Steps to create the model | 1 – define schema; 2 – develop app and queries | 1 – identify the queries; 2 – define schema |
| Initial schema | 3rd normal form; one solution | many solutions possible |
| Final schema | likely denormalized | few changes |
| Schema evolution | difficult and not optimal; likely downtime | easy and no downtime |
| Performance | mediocre | optimized |
13. Other Considerations for the Model
1. one-to-many relationships where "many" is a humongous number
2. Embed or Reference
• Joins via $lookup
• Transactions for multi document writes
3. Transactions available for Replica set, and soon for Sharded Clusters
4. Sharding Key
5. Indexes
6. Simple queries, or more complex ones with the Aggregation Framework
24. Case Study: Cuppa Coffee
A. Business: coffee shop franchises
B. Name: Cuppa Coffee
also considered: Coffee Mate, Crocodile Coffee
C. Objective:
• 10 000 stores in Australia, New Zealand and South Asia
• … then we invade America
D. Keys to success:
• Best coffee in the world
• Technology
25. Make the Best Coffee in the World
23g of ground coffee in, 20g of extracted coffee out, in approximately 20 seconds
1. Fill a small or regular cup with 80% hot water (not boiling but pretty hot). Your cup should be 150ml to 200ml in total volume, 80% of which will be hot water.
2. Grind 23g of coffee into your portafilter using the double basket. We use a scale that you can get here.
3. Draw 20g of coffee over the hot water by placing your cup on a scale, press tare and extract your shot.
26. Technology
1. Measure inventory in real time
• Shelves with scales
2. Big Data collection on cups of coffee
• weighings, temperature, time to produce, …
3. Data Analysis
• Coffee perfection
• Rush hours -> staffing needs
4. MongoDB
28. 1 – Workload: List Queries
| Query | Operation | Description |
|---|---|---|
| 1. Coffee weight on the shelves | write | A shelf sends information when coffee bags are added or removed |
| 2. Coffee to deliver to stores | read | How much coffee do we have to ship to the store in the next days |
| 3. Anomalies in the inventory | read | Analytics |
| 4. Making a cup of coffee | write | A coffee machine reporting on the production of a coffee cup |
| 5. Analysis of cups of coffee | read | Analytics |
| 6. Technical Support | read | Helping our franchisees |
29. 1 – Workload: quantify/qualify the queries
| Query | Quantification | Qualification |
|---|---|---|
| 1. Coffee weight on the shelves | 10/day*shelf*store => 1/sec | <1s, critical write |
| 2. Coffee to deliver to stores | 1/day*store => 0.1/sec | <60s |
| 3. Anomalies in the inventory | 24 reads/day | <5mins, "collection scan" |
| 4. Making a cup of coffee | 10 000 000 writes/day, 115 writes/sec | <100ms, non-critical write |
| … cups of coffee at rush hour | 3 000 000 writes/hr, 833 writes/sec | <100ms, non-critical write |
| 5. Analysis of cups of coffee | 24 reads/day | stale data is fine, "collection scan" |
| 6. Technical Support | 1000 reads/day | <1s |
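The per-second rates on this slide follow from dividing each daily or hourly volume by the length of the window. A minimal sketch of that arithmetic, assuming the 10 000 stores from the case study and, as a simplification not stated on the slide, one shelf per store:

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60
STORES = 10_000          # from the case study objective
SHELVES_PER_STORE = 1    # assumption made for this sketch


def per_second(events, window_seconds):
    """Average event rate over a window."""
    return events / window_seconds


rates = {
    "shelf weighings (write)": per_second(10 * SHELVES_PER_STORE * STORES, SECONDS_PER_DAY),
    "coffee to deliver (read)": per_second(1 * STORES, SECONDS_PER_DAY),
    "cups of coffee (write)": per_second(10_000_000, SECONDS_PER_DAY),
    "cups at rush hour (write)": per_second(3_000_000, SECONDS_PER_HOUR),
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}/sec")
```

The cups-of-coffee write rate dominates the others by two to three orders of magnitude, which is why it drives the I/O provisioning.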
31. Disk Space
Cups of coffee (one year of data)
• 10 000 stores x 1000 cups/day x 365 days
• 3.7 billion/year
• 370 GB (100 bytes/cup of coffee)
Weighings
• 10 000 stores x 10/day x 365 days
• 36.5 million/year
• 3.7 GB (100 bytes/weighing)
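The sizing above is a straight multiplication. A short sketch of the calculation, using the slide's own figures (10 000 stores, 100 bytes per document) and decimal gigabytes; the helper name is hypothetical:

```python
STORES = 10_000
DAYS_PER_YEAR = 365
BYTES_PER_DOC = 100   # figure used on the slide for both collections
GB = 10**9            # decimal gigabyte


def yearly_size(events_per_store_per_day):
    """Return (documents per year, size in GB) for one collection."""
    docs = STORES * events_per_store_per_day * DAYS_PER_YEAR
    return docs, docs * BYTES_PER_DOC / GB


cup_docs, cup_gb = yearly_size(1000)
weighing_docs, weighing_gb = yearly_size(10)

print(f"cups: {cup_docs:,} docs, {cup_gb:.0f} GB")
print(f"weighings: {weighing_docs:,} docs, {weighing_gb:.2f} GB")
```

Multiplied out, the weighings come to 36.5 million documents per year, which is consistent with the 3.7 GB estimate at 100 bytes each.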
33. 2 - Relations are still important
| Type of Relation | one-to-one/1-1 | one-to-many/1-N | many-to-many/N-N |
|---|---|---|---|
| Document embedded in the parent document | one read; no joins | one read; no joins | one read; no joins; duplication of information |
| Document referenced in the parent document | smaller reads; many reads | smaller reads; many reads | smaller reads; many reads |
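The trade-off between embedding and referencing can be sketched without a database by counting lookups against in-memory dictionaries. The store and shelf documents below are hypothetical, and each dictionary access stands in for one read.

```python
# Embedded: the shelves live inside the store document.
store_embedded = {
    "_id": 42,
    "name": "Cuppa Coffee Sydney",
    "shelves": [
        {"shelf_id": 1, "coffee_kg": 12},
        {"shelf_id": 2, "coffee_kg": 8},
    ],
}

# Referenced: the store only keeps the ids of its shelves.
stores = {42: {"_id": 42, "name": "Cuppa Coffee Sydney", "shelf_ids": [1, 2]}}
shelves = {
    1: {"_id": 1, "coffee_kg": 12},
    2: {"_id": 2, "coffee_kg": 8},
}


def total_embedded(store):
    """One read: everything needed is already in the parent document."""
    reads = 1
    return sum(s["coffee_kg"] for s in store["shelves"]), reads


def total_referenced(store_id):
    """One read for the parent, then one smaller read per referenced child."""
    store = stores[store_id]
    reads = 1
    total = 0
    for shelf_id in store["shelf_ids"]:
        total += shelves[shelf_id]["coffee_kg"]
        reads += 1
    return total, reads


print(total_embedded(store_embedded))
print(total_referenced(42))
```

Both approaches return the same total; the difference is one larger read versus several smaller ones.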
37. Schema Design Patterns Resources
A. Advanced Schema Design Patterns
• MongoDB World 2017
• Webinar
B. MongoDB University
• university.mongodb.com
• M320 – Data Modeling (2019)
C. Blogs on Schema Design Patterns
D. Appendix to this presentation
• Schema Versioning Pattern
• Computed Pattern
49. Takeaways from the Presentation
Recognize the differences when modelling for a Document Database versus a Relational Database
Summarize the steps of a methodology when modelling for MongoDB
• Workload
• Relationships
• Patterns
Recognize the need and when to apply Schema Design Patterns
54. This is what your dreams should be when thinking about a schema upgrade!
55. Schema Revision
| | Relational | MongoDB |
|---|---|---|
| Versioned Unit | Schema | Document |
| Migration Procedure | Difficult | Easy |
| Service Uptime | Interrupted | No interruption |
| Rollback | Difficult to nightmare-ish | Easy |
58. Application Lifecycle
Modify Application
• Can read/process all versions of documents
• Have different handler per version
• Reshape the document before processing it
Update all Application servers
• Install updated application
• Remove old processes
Once migration completed
• remove the code to process old versions.
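One way to let the application read every document version is a handler per version that reshapes older documents before processing. The sketch below is illustrative only: the "phone" to "contacts" change and all function names are hypothetical, and a missing "schema_version" field is assumed to mean version 1.

```python
def reshape_v1(doc):
    """Reshape a hypothetical v1 document (flat "phone") into the v2 shape."""
    upgraded = {k: v for k, v in doc.items() if k != "phone"}
    upgraded["contacts"] = {"phone": doc.get("phone")}
    upgraded["schema_version"] = 2
    return upgraded


def passthrough(doc):
    """Documents already in the latest shape need no work."""
    return doc


# One handler per version; old entries are deleted once migration is done.
HANDLERS = {1: reshape_v1, 2: passthrough}


def load(doc):
    """Return the document in the latest shape, whatever version was stored."""
    version = doc.get("schema_version", 1)  # assumption: no field means v1
    return HANDLERS[version](doc)


old = {"_id": 1, "name": "Ana", "phone": "555-0100"}
new = {"_id": 2, "name": "Ben", "contacts": {"phone": "555-0101"}, "schema_version": 2}

print(load(old))
print(load(new))
```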
59. Document Lifecycle
New Documents:
• Application writes them in latest version
Existing Documents
A) Use updates to documents
• to transform to latest version
• keep forever documents that never need an update
B) or transform all documents in batch
• no worry even if process takes days
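The two strategies for existing documents can be sketched against an in-memory dictionary standing in for a collection. Everything here is an assumption for illustration: the kilogram-to-gram field change, the function names, and the use of a dict keyed by `_id`.

```python
LATEST = 2


def upgrade(doc):
    """Return a copy of doc in the latest shape (hypothetical kg -> g change)."""
    if doc.get("schema_version", 1) >= LATEST:
        return dict(doc)
    new_doc = dict(doc)
    new_doc["weight_g"] = new_doc.pop("weight_kg") * 1000
    new_doc["schema_version"] = LATEST
    return new_doc


def update_weight(collection, doc_id, weight_g):
    """Strategy A: migrate a document only when it is updated anyway."""
    doc = upgrade(collection[doc_id])
    doc["weight_g"] = weight_g
    collection[doc_id] = doc


def migrate_batch(collection, batch_size):
    """Strategy B: convert up to batch_size old documents; call until it returns 0."""
    pending = [
        key for key, doc in collection.items()
        if doc.get("schema_version", 1) < LATEST
    ][:batch_size]
    for key in pending:
        collection[key] = upgrade(collection[key])
    return len(pending)


collection = {
    1: {"_id": 1, "weight_kg": 2},
    2: {"_id": 2, "weight_kg": 3},
    3: {"_id": 3, "weight_g": 500, "schema_version": 2},
}

update_weight(collection, 1, 1800)        # document 1 is migrated on update
migrated = migrate_batch(collection, 10)  # document 2 is migrated in batch
print(migrated, collection)
```

Because the batch works in small increments and the application can read both versions in the meantime, it does not matter how long the full pass takes.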
61. Schema Versioning Pattern
Problem
● Avoid downtime while doing schema upgrades
● Upgrading all documents can take hours, days or even weeks when dealing with big data
● Don't want to update all documents
Solution
● Each document gets a "schema_version" field
● Application can handle all versions
● Choose your strategy to migrate the documents
Use Cases Examples
● Every application that uses a database, deployed in production and heavily used
● System with a lot of legacy data
Benefits and Trade-Offs
✅ No downtime needed
✅ Feel in control of the migration
✅ Less future technical debt
❌ May need 2 indexes for same field while in migration period
67. Computed Pattern
Problem
● Costly computation or manipulation of data
● Executed frequently on the same data, producing the same result
Solution
● Perform the operation and store the result in the appropriate document and collection
● If the operations may need to be redone, keep their source data
Use Cases Examples
● Internet Of Things (IOT)
● Event Sourcing
● Time Series Data
● Frequent Aggregation Framework queries
Benefits and Trade-Offs
✅ Read queries are faster
✅ Saving on resources like CPU and Disk
❌ May be difficult to identify the need
❌ Avoid applying or overusing it unless needed
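A minimal sketch of the Computed Pattern, applied to the cups-of-coffee workload: the total is updated on each write so that reads do not have to rescan the events. The structures and names below are hypothetical, and in-memory containers stand in for two collections.

```python
from collections import defaultdict

cups = []  # source events, kept so the totals can be recomputed if needed
daily_totals = defaultdict(lambda: {"cups": 0, "grams_out": 0})


def record_cup(store_id, day, grams_out):
    """Write the event and update the stored, pre-computed total."""
    cups.append({"store_id": store_id, "day": day, "grams_out": grams_out})
    total = daily_totals[(store_id, day)]
    total["cups"] += 1
    total["grams_out"] += grams_out


def recompute(store_id, day):
    """The costly path: scan all events again (what the pattern avoids on reads)."""
    matching = [c for c in cups if c["store_id"] == store_id and c["day"] == day]
    return {"cups": len(matching), "grams_out": sum(c["grams_out"] for c in matching)}


record_cup(42, "2019-03-01", 20)
record_cup(42, "2019-03-01", 21)
record_cup(7, "2019-03-01", 19)

# A read is now a single lookup instead of a scan.
print(daily_totals[(42, "2019-03-01")])
```

Keeping the raw events is what makes it possible to rebuild the totals if the computation ever changes.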
Editor's Notes
Modelling (Aus) vs Modeling (USA)
Thanks for attending the conference
Topic is data modeling, more specifically data modeling for MongoDB
Why this presentation?
More than using examples of documents
Complement of Schema Design Patterns Talks
1. Recognize the differences when modelling for a Document Database vs a Relational Database
2. Summarize the steps of a flexible methodology
3. Recognize the need and when to apply Schema Design Patterns
A document is a set of key-value pairs: the key plays the role of the column name, and the value is the data associated with it
Value can be usual types: string, number, geolocation
Or subdocument
Or array
Or array of subdocument
Polymorphism
Array
Sub Document
JSON/BSON
Arrays model a one-to-many relationship.
An array of subdocuments resembles the result of a join between 2 tables.
There is only one solution to represent a relationship between 2 fields
Left solution:
Simpler
Oriented "articles"
Right solution
A little more complex
Oriented 'articles' and 'users'
The question is how are you going to use the data, what are the queries?
Left:
Normalized representation
Right
pre-computed, we write every picture/blog to all the consumers/friends. It takes more space, the writes are slower, however the reads are faster
Maybe the speed of reads makes or breaks your system. Users will navigate away if the pages don't load fast enough.
- We also refer to a Relational Database as a Tabular Database
Different inputs available
Migrating from a RDBMS would provide logs and stats on the current system
Units of information for the domain to model
Example from a movie Website
- When you think in documents, you may assign the reviews info directly in the movies
- A lot of patterns are about performance. Only apply them if they are needed
- One query dwarfs the rest, this will help us provision the I/O
- One collection dwarfs the other one, this will help size the disks
- Let's go quickly over some of them for our use case
TODO
- Use 3 columns like in this presentation: https://docs.google.com/presentation/d/1IYlqAk6LtKIP6ZKjqQW6TGPJbD4hJ2w9smQmArhu0e0/edit?ts=5c606c37#slide=id.g4c8e0a0b6f_0_14