In this session, you will learn how to translate one-to-one, one-to-many and many-to-many relationships, and see how MongoDB's JSON document structure, atomic updates and rich indexes can influence your design. We will also explore the implications of storage engines, indexing and query patterns, along with available tools and related new features in MongoDB 3.2.
5. Medical Records
• Collects all patient information in a central repository
• Provides a central point of access for
• Patients
• Care providers: physicians, nurses, etc.
• Billing
• Insurance reconciliation
• Hospitals, physicians, patients, procedures, records
[Diagram: patient records (medications, lab results, procedures) and hospital records (physicians, patients, nurses, billing) flowing into the central repository]
6. Medical Record Data
• Hospitals
• have physicians
• Physicians
• Have patients
• Perform procedures
• Belong to hospitals
• Patients
• Have physicians
• Are the subject of procedures
• Procedures
• Associated with a patient
• Associated with a physician
• Have a record
• Variable metadata
• Records
• Associated with a procedure
• Binary data
• Variable fields
10. MongoDB vs. Relational
Attribute     | MongoDB                  | Relational
------------- | ------------------------ | ----------------
Storage       | N-dimensional            | Two-dimensional
Field Values  | 0, 1, many, or embedded  | Single value
Query         | Any field, at any level  | Any field
Schema        | Flexible                 | Very structured
20. Embedding
• Advantages
• Retrieve all relevant information in a single query/document
• Avoid implementing joins in application code
• Update related information as a single atomic operation
• MongoDB doesn’t offer multi-document transactions
• Limitations
• Large documents mean more overhead if most fields are not relevant
• 16 MB document size limit
23. Referencing
• Advantages
• Smaller documents
• Less likely to reach 16 MB document limit
• Infrequently accessed information not accessed on every query
• No duplication of data
• Limitations
• Two queries required to retrieve information
• Cannot update related information atomically
24. 1-1: General Recommendations
• Embed
• No additional data duplication
• Can query or index on the embedded field
• e.g., "result.type"
• Exceptional cases…
• Embedding results in large documents
• Sets of infrequently accessed fields
{
  "_id": 333,
  "date": "2003-02-09T05:00:00",
  "hospital": "County Hills",
  "patient": "John Doe",
  "physician": "Stephen Smith",
  "type": "Chest X-ray",
  "result": {
    "type": "txt",
    "size": 12,
    "content": {
      "value1": 343,
      "value2": "abc"
    }
  }
}
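To make the "query or index on the embedded field" point concrete, here is a minimal sketch in plain Node.js (no live server). In mongosh you would run `db.procedures.find({ "result.type": "txt" })` and `db.procedures.createIndex({ "result.type": 1 })`; here we build the same filter document and apply its dot path in memory to the sample procedure above.

```javascript
// The sample 1-1 document from the slide (abbreviated):
const procedure = {
  _id: 333,
  type: "Chest X-ray",
  result: { type: "txt", size: 12, content: { value1: 343, value2: "abc" } }
};

// The filter you would pass to find(); dot notation reaches into the
// embedded "result" sub-document:
const filter = { "result.type": "txt" };

// Walk the dot path the way the server would:
const dotPath = Object.keys(filter)[0].split(".");        // ["result", "type"]
const fieldValue = dotPath.reduce((doc, key) => doc[key], procedure);
const matches = fieldValue === filter["result.type"];     // doc would be returned
```

The same dotted path works in `createIndex`, so the embedded field costs nothing extra to index compared with a top-level field.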
27. 1-M : General Recommendations
• Embed, when possible
• Many are weak entities
• Access all information in a single query
• Take advantage of update atomicity
• No additional data duplication
• Can query or index on any field
• e.g., { "phones.type": "mobile" }
• Exceptional cases:
• 16 MB document size
• Large number of infrequently accessed fields
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [
    {
      id: 12345,
      date: ISODate("2015-02-15"),
      type: "Cat scan",
      …
    },
    {
      id: 12346,
      date: ISODate("2015-02-15"),
      type: "blood test",
      …
    }
  ]
}
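A quick sketch (plain Node.js, no live server) of how the embedded array above is queried: a filter like `{ "procedures.type": "blood test" }` matches the patient if any element of the array has that type, and an index on the same dotted path would be a multikey index with one entry per array element.

```javascript
// The 1-M patient document from the slide (abbreviated):
const patient = {
  _id: 2, first: "Joe", last: "Patient",
  procedures: [
    { id: 12345, date: "2015-02-15", type: "Cat scan" },
    { id: 12346, date: "2015-02-15", type: "blood test" }
  ]
};

// The filter document as you would pass it to find():
const filter = { "procedures.type": "blood test" };

// In-memory equivalent of the server-side array match: the document
// matches if ANY array element satisfies the condition.
const matches = patient.procedures.some(p => p.type === filter["procedures.type"]);

// In mongosh the index would be: db.patients.createIndex({ "procedures.type": 1 })
```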
32. M-M: General Recommendations
• Use case determines whether to reference or embed:
1. Data duplication
• Embedding may result in data duplication
• Duplication may be okay if reads dominate updates
• Of the two entities, which one changes the least?
2. Referencing may be required if there are many related items
3. Hybrid approach
• Potentially do both… it’s OK!
Hospitals reference physicians by _id:
{
  _id: 2,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [12345, 12346]
}
{
  _id: 12345,
  name: "Joe Doctor",
  address: { … },
  …
}
{
  _id: 12346,
  name: "Mary Well",
  address: { … },
  …
}
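The referencing side of this M-M design implies a two-step read: fetch the hospital, then fetch its physicians with an `$in` filter on the ids stored in the array. A minimal sketch (plain Node.js, in-memory stand-ins for the two collections, no live server):

```javascript
// Stand-ins for the two collections from the slide:
const hospitals = [
  { _id: 2, name: "Oak Valley Hospital", physicians: [12345, 12346] }
];
const physiciansColl = [
  { _id: 12345, name: "Joe Doctor" },
  { _id: 12346, name: "Mary Well" },
  { _id: 99999, name: "Someone Else" }   // not on staff at this hospital
];

// Query 1 — in mongosh: db.hospitals.findOne({ _id: 2 })
const hospital = hospitals.find(h => h._id === 2);

// Query 2 — in mongosh: db.physicians.find({ _id: { $in: hospital.physicians } })
const staff = physiciansColl.filter(p => hospital.physicians.includes(p._id));
// staff now holds Joe Doctor and Mary Well; the join happens in application code
```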
39. Vital Sign Monitoring Device
Vital Signs Measured:
• Blood Pressure
• Pulse
• Blood Oxygen Levels
Produces data at regular intervals
• Once per minute
• Many Devices, Many Hospitals
40. Data From Vital Signs Monitoring Device
{
  deviceId: 123456,
  ts: ISODate("2013-10-16T22:07:00.000-0500"),
  spO2: 88,
  pulse: 74,
  bp: [128, 80]
}
• One document per minute, per device
• This mirrors the relational approach (one row per event)
41. Document Per Hour (by minute)
{
  deviceId: 123456,
  ts: ISODate("2013-10-16T22:00:00.000-0500"),
  spO2: { 0: 88, 1: 90, …, 59: 92 },
  pulse: { 0: 74, 1: 76, …, 59: 72 },
  bp: { 0: [122, 80], 1: [126, 84], …, 59: [124, 78] }
}
• 1 document per device, per hour
• Store per-minute data at the hourly level
• Update-driven workload
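The update-driven workload means each new reading is an in-place `$set` into the hourly bucket rather than an insert. A minimal sketch (plain Node.js, no live server) of the update document for minute 17, plus an in-memory apply of its dot paths; in mongosh this would be `db.readings.updateOne(filter, update, { upsert: true })`:

```javascript
// Identify the hourly bucket for this device:
const filter = { deviceId: 123456, ts: new Date("2013-10-16T22:00:00-05:00") };

// Write minute 17's readings into the bucket (values are illustrative):
const update = {
  $set: { "spO2.17": 91, "pulse.17": 75, "bp.17": [125, 81] }
};

// Minimal in-memory apply of the $set: dot paths create/overwrite leaves.
const bucket = { deviceId: 123456, spO2: {}, pulse: {}, bp: {} };
for (const [dotted, value] of Object.entries(update.$set)) {
  const keys = dotted.split(".");
  let node = bucket;
  for (const k of keys.slice(0, -1)) node = node[k] = node[k] || {};
  node[keys[keys.length - 1]] = value;
}
// bucket.spO2 is now { "17": 91 }, and the other 59 minutes fill in the same way
```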
42. Characterizing Write Differences
• Example: data generated every minute
• Recording the data for 1 patient for 1 hour:
• Document Per Event: 60 inserts
• Document Per Hour: 1 insert, 59 updates
43. Characterizing Read Differences
• Want to graph 24 hours of vital signs for a patient:
• Document Per Event: 1440 reads
• Document Per Hour: 24 reads
• Read performance is greatly improved
44. Characterizing Memory and Storage Differences
Metric                 | Document Per Minute | Document Per Hour
---------------------- | ------------------- | -----------------
Number of Documents    | 52.6 Billion        | 876 Million
Total Index Size       | 6,364 GB            | 106 GB
_id index              | 1,468 GB            | 24.5 GB
{ts: 1, deviceId: 1}   | 4,895 GB            | 81.6 GB
Document Size          | 92 Bytes            | 758 Bytes
Database Size          | 4,503 GB            | 618 GB
• 100K devices
• 1 year's worth of data at minute resolution (365 × 24 × 60 minutes)
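The document counts in the table fall out of simple arithmetic, which is worth checking when sizing hardware. Note the counts only work out at minute resolution, not second resolution:

```javascript
// 100K devices, one year of per-minute readings:
const devices = 100000;
const minutesPerYear = 365 * 24 * 60;   // 525,600
const hoursPerYear = 365 * 24;          // 8,760

// Document Per Minute design: one document per device per minute.
const perMinuteDocs = devices * minutesPerYear;  // 52,560,000,000 ≈ "52.6 Billion"

// Document Per Hour design: one document per device per hour.
const perHourDocs = devices * hoursPerYear;      // 876,000,000 = "876 Million"

// A 60x reduction in document (and index-entry) count, at the cost of
// larger individual documents (92 bytes vs. 758 bytes in the table).
const reduction = perMinuteDocs / perHourDocs;   // 60
```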
46. MongoDB 3.2 – a GIANT Release
Features by release:
• 2.2: Aggregation Framework, Location-Aware Sharding
• 2.4: Hash-Based Sharding, Roles, Kerberos, On-Prem Monitoring
• 2.6: $out, Index Intersection, Text Search, Field-Level Redaction, LDAP & x509, Auditing
• 3.0: Doc-Level Concurrency, Compression, Storage Engine API, ≤50 replicas, Auditing++, Ops Manager
• 3.2: Document Validation, Fast Failover, Simpler Scalability, Aggregation++, Encryption At Rest, In-Memory Storage Engine, BI Connector, $lookup, MongoDB Compass, APM Integration, Profiler Visualization, Auto Index Builds, Backups to File System
47. Tools
• mgenerate
• Part of mtools: https://github.com/rueckstiess/mtools/wiki/mgenerate
• Model a schema using a JSON definition
• Generate millions of documents with random data
• How well does the schema work?
• Queries, indexes, data size, index size, replication
• Demo
48. Documents are Rich Data Structures
{
  first_name: "Paul",
  last_name: "Miller",
  cell: 1234567890,
  city: "London",
  location: [45.123, 47.232],
  professions: ["banking", "finance", "trader"],
  physicians: [
    { name: "Canelo Álvarez, M.D.",
      last_visit: "Mission Hospital",
      last_visit_dt: "20160501", … },
    { name: "Érik Morales, M.D.",
      last_visit: "Del Prado Hospital",
      last_visit_dt: "20160302", … }
  ]
}
• Fields have typed values
• Fields can contain arrays
• Fields can contain arrays of sub-documents
• Fields can be indexed and queried at any level
• ORM layer removed – the data is already an object!
51. Visual Query Profiler & Index Suggestions
• Visual Query Profiler: identify your slow-running queries with the click of a button
• Index Suggestions: index recommendations to improve your deployment
55. MongoDB 3.2 Document Validation
db.runCommand({
  collMod: "Patients",
  validator: { $and: [
    { "first_name": { "$type": "string" } },
    { "last_name": { "$type": "string" } },
    { "physicians": { "$type": "array" } }
  ] },
  validationLevel: "strict"
});
https://docs.mongodb.com/manual/core/document-validation/
All Patient records must have string values for first_name and last_name, and an array of physicians.
56. Summary
01 Embedding and Referencing
02 Decisions are made in the context of application data and query workload
03 1-1: Embed; 1-M: Embed when possible; M-M: Hybrid
04 3.2: $lookup, Document Validation
05 Tools: mgenerate/mtools, Compass, Cloud Manager / Ops Manager
06 Measure data/index size and query performance, then iterate – different schemas may result in dramatically different query performance, data/index size and hardware requirements!
Hi my name is Sigfrido Narvaez, and I like to go by Sig.
Today we will be talking about MongoDB schema design and some of its performance implications. We will also explore some of the new features in MongoDB 3.2 that are relevant to schema design, and some additional tools that will help you iterate and try out different approaches quickly.
During the webinar, please feel free to type any questions in the chat box; at the end we will have a Q&A session and answer as many as we can.
Ok, so I am a Sr. Solutions Architect here at MongoDB, based out of Southern California. Prior to joining, I was the Principal Software Architect for a hybrid cloud and polyglot persistence solution that used MongoDB, and that required leveraging MongoDB's dynamic, flexible schema to power cloud and mobile apps whose main source of data originated from many on-premise ERPs. I have also been organizing the Orange County MongoDB User Group (MUG) for almost 4 years.
I have provided my email address and my Twitter handle in case I don't cover all the questions or there are any follow-ups, so please feel free to reach out with any questions afterwards and I will make sure I find the information you are looking for.
The agenda for today's presentation: we will use a medical record example and explore its schema in MongoDB vs. relational, using embedding and referencing, and comparing against the classic 1-1, 1-M and M-M relationships. We will then jump into a performance analysis examining data and index growth, and finally explore new features in MongoDB 3.2.
We will design a schema for a medical information system, where we will need to store data for the patients, the physicians, the procedures, and many other aspects of a medical system. All this data is interrelated, and we have to assume the system will be around for many years and will grow over time.
Let's examine the data entities that are going to be part of this system.
First we have hospitals and hospitals have many physicians,
Then we have the physicians, who attend many patients, perform many procedures, and themselves belong to many hospitals.
Next the patients, who again are attended by many physicians and are the subject of many procedures.
The procedures are of course applied by a physician to a patient, inside a hospital, at a particular time, and the data produced by each procedure can vary a lot. For example, an x-ray will produce a set of data values along with an image or set of images, but a blood test will only produce data values. Each procedure has different data: this is the schema design problem.
As we can see, the main entities and their relationships may be a great fit for a relational database. But the procedure data is not. Over time, procedures will change, use new medical devices or go through improvements, and may produce even more data with more variability, and we still have to keep historical records too. This is a real challenge for a relational database.
But for MongoDB and the flexible document model, this is easy. We would model this by having the common data points that all procedures share, such as the timestamp, the physician, the patient and the hospital, and then a variable section in the JSON document for the data points unique to each procedure.
This will make great use of the polymorphic schema capabilities of MongoDB, and with modern languages, this can be modeled using base classes and extensions or inheritance.
Before we go into the modeling exercises, let's level-set on MongoDB concepts versus relational concepts. In MongoDB, data is stored in a collection, which is analogous to a table. Collections contain documents, and a document is analogous to a row or record.
More importantly, in MongoDB we think about what data we need to use and how it will be used, versus how the data will be stored. In MongoDB we look at queries to guide schema design decisions, whereas in relational we model first, then answer questions, and eventually add indexes and, in some cases, denormalize data to support queries and performance over time.
Another difference is that in MongoDB fields have many dimensions, versus just two (rows and columns). Each field can contain 0, 1 or many values, such as an array, or even embedded sub-documents, and the type can vary from document to document, versus a single value of a pre-defined type. I can also query any field, at any level in the document, versus a single flat column.
Okay, so when we start modeling data, the first thing to avoid is thinking of every single little thing we may not use immediately, which usually leads to complex, over-normalized schemas. Do not jump straight into 3rd-normal-form modeling and create hundreds of tables, with join tables for M-M relationships and all kinds of entities, which will be very difficult to join, will slow down performance and will be hard to maintain over time.
Instead what we do is create rich data structures that are single documents. As you can see in this example we have many fields about a patient, where they live, what professions they practice, a list of the physicians they're currently seeing, when was the last visit, etc. So I can get a quick view of a patient in a single document.
Now we have talked about MongoDB having strong data types, such as strings and numbers, but we also have more advanced data types such as coordinates, and arrays of other sub-documents
In MongoDB I can query and index on any number of fields at any level, and the document is already in object form, so I don't need an ORM layer like Hibernate or Entity Framework to translate data from relational to objects; the data is already an object.
There are two ways to model relationships: referencing and embedding. Referencing is a very relational-like approach where IDs are duplicated across collections. But take into account that MongoDB does not enforce foreign-key constraints, so if you delete a master document you will likely end up with orphans, and this has to be handled at the application level. Embedding is more natural to MongoDB: it works by nesting data inside a single document. There may or may not be a need to generate an ID for nested data, but there is certainly no need to duplicate IDs, as everything lives together.
So how does this apply to our medical schema? Let's look at Procedures and Results. With referencing, I could use two collections and have a relationship between them. With embedding, I could embed the results inside the procedure. Now, something to think about: which of these two is the strong entity and which is the weak one? Clearly Results is the weak entity, as it cannot exist without a Procedure.
Here is how the referencing approach would look. Obtaining all the data I need requires two reads and two round trips to the database. Notice we have placed the result ID in the procedure. Why? Because my application will display procedures and their results; this way I only need to read the Procedures collection and then look up the Result document by its ID, and I can perform this lookup in the application layer.
And to give you a hint about the latter section of the presentation, with MongoDB 3.2 I can use the $lookup pipeline stage to perform what is essentially a left-outer join performed at the database layer.
Take a second to think about this design: using classic relational modeling and considering the strong and weak entities, I would probably have placed the ProcedureID in the Results. But then I would need to create an additional index, which costs disk and memory.
However, with the Embedding approach, this is quite easy to model and getting my data requires a single read and a single roundtrip to the database.
So the advantages of embedding are that I can retrieve all relevant information by reading a single document. I don't have to implement any joins in my application code, and when I update or insert data, it is a single atomic operation. Consider that MongoDB, at this time, does not offer multi-document, multi-collection transactions.
Let’s talk about Atomicity for a bit.
In a single database command, we can update many fields, or the whole document. If there are concurrent reads and writes to the same document, the application will see the document before or after the update, but never in between. So a single update statement can alter either the complete document or parts of it, as we see in this example, and that is atomic.
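A minimal sketch of such a single-document atomic update, built as the update document you would pass to `updateOne` in mongosh or a driver. The specific fields changed here are illustrative (they follow the earlier patient example), not from the deck:

```javascript
// One update document can change several scalar fields AND append to an
// embedded array; MongoDB applies all of it atomically to the one document.
// In mongosh: db.patients.updateOne({ _id: 2 }, update)
const update = {
  $set:  { last: "Patient", "addr.city": "Austin" },    // change several fields…
  $push: { procedures: { id: 12347, type: "MRI" } }     // …and append to an array
};
// Concurrent readers see the patient either before or after this update,
// never with the $set applied and the $push still pending.
```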
But what is not possible, up to MongoDB 3.2, is to do multi-document transactions. You cannot begin a transaction, perform operations and then either commit or rollback.
What you may have guessed already, is that Embedding takes advantage of mongodb’s document-level atomicity
But there are limitations. A large document costs more overhead, and there is a 16 MB limit, although 16 MB of JSON is a considerable amount of data. So larger documents can cost more to read and update, especially if most of the data does not change.
The exact opposite of embedding
Avoid duplication (1-M)
Always look at embedding first, and then prove that embedding doesn’t work
Can always query on any embedded information
Careful: extra large documents, or embedded data not accessed frequently
Mixed or Hybrid approach – reference to keep master data, but also embed to store the latest or most-used data for speed
Avoid join tables! – what is a join table? A list of key pairs that relate two independent entities
In MongoDB we have arrays
The relationship can be done as embedded or referencing
Using embedding, arrays can be used. Data duplication will happen, and this is not as bad an idea as it is in relational. Notice how we are denormalizing some of the fields we need most often (like the doctors' names) and can still satisfy our queries very fast.
The downside: if the fields we duplicate change, then we do have maintenance work or stale data. So take into account which fields will most likely not change, such as a doctor's name.
What to do if the fields change often?
If the fields change quite often, then perhaps we could revert to Referencing, knowing we may need to hit the DB multiple times.
Decision is really dependent on your application
Fast queries
Atomic updates
Data maintenance when duplication - How often does data change?
Read or Write intensive?
Let’s look at Patients and Procedures
Hypothetically decided to always use Referencing
Look at queries – find all patients from a state that have had a particular procedure
Very difficult query!! Bad performance
Query the Patients collection for New Hampshire – get the patient IDs
Now go against all procedures of type X-ray for those patient IDs – join code in the application
Referencing and embedding
Contains the Type of the Procedure
Can now embed a small amount of procedure info and execute in a single query
If the “Chest X-Ray” name changes, we have to change it everywhere – but it very seldom changes, maybe once a decade!
Tons of data into MongoDB every second
Patients pulse, heart pressure, from which device, when, etc.
The schema is easy if we create a record per event – easy, but let's analyze the consequences.
Millions of records accumulate very quickly, and a lot of the same data repeats, e.g. the device ID, the patient ID, and most of the timestamp.
Index space will grow significantly, and operations and queries will be expensive too!
Store one document per hour! Vs. 1 doc per minute
Each doc will contain 60 mins of data
bp is effectively a two-dimensional array (one [systolic, diastolic] pair per minute)
In general, an update is less costly than an insert, in this case we are creating less write workload by doing more updates than inserts
Graph 1 day of activity
Substantially less IOPS for Read, which means reading is faster
Order-of-magnitude differences when planning 1 year's worth of data!
ALWAYS consider what indexes are needed, and the size of those indexes
Consider the hardware needed! Servers with 100s of GB of RAM are easier to come by than TB-sized ones – same for disk space
Use mgenerate to model data to see actual data sizes!
Quickly identify your slow-running queries.
Part of MongoDB Ops Manager, the Visual Query Profiler displays how query and write latency vary over time
With the click of a button, the Visual Query Profiler consolidates and displays metrics from all your nodes on a single screen
Let’s go back to this example from earlier, and imagine that the Procedure Name changes quite often, and we have decided to reference instead of embed.
But I also want a view of the data that has just the patient and his/her procedures, but not the physicians
Using $lookup I can do this
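A sketch of the `$lookup` aggregation stage (new in 3.2) that performs this left outer join at the database layer. The collection and field names here (`procedures`, `procedureIds`) are assumed for illustration; built as a plain pipeline array since no live server is assumed:

```javascript
// In mongosh: db.patients.aggregate(pipeline)
const pipeline = [
  { $match: { _id: 2 } },                  // the patient we want
  { $lookup: {
      from: "procedures",                  // referenced collection
      localField: "procedureIds",          // ids stored on the patient (assumed name)
      foreignField: "_id",                 // matched against procedure _ids
      as: "procedures"                     // joined documents land in this array
  } },
  { $project: { first: 1, last: 1, procedures: 1 } }  // patient + procedures, no physicians
];
```

Patients with no matching procedures still come back (with an empty `procedures` array), which is what makes this a left outer join rather than an inner join.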
Finally, when the schema is done, working and performing well, and I am in production, I may want to lock it down. I can do this with Document Validation in 3.2.