MongoDB Schema
Design Patterns
Jumpstart Session
@SigNarvaez
Sigfrido ”Sig” Narváez
Sr. Solutions Architect, MongoDB
sigfrido@mongodb.com
Agenda
01 Medical Record Example
02 Schema Design: MongoDB vs. Relational
03 Modeling Relationships
04 Performance
05 What's new with 3.2
06 Summary, Q&A
Medical Record Example
Medical Records
• Collects all patient information in a central repository
• Provides a central point of access for
• Patients
• Care providers: physicians, nurses, etc.
• Billing
• Insurance reconciliation
• Hospitals, physicians, patients, procedures, records
Patient Records: Medications, Lab Results, Procedures
Hospital Records: Physicians, Patients, Nurses, Billing
Medical Record Data
• Hospitals
• Have physicians
• Physicians
• Have patients
• Perform procedures
• Belong to hospitals
• Patients
• Have physicians
• Are the subject of procedures
• Procedures
• Associated with a patient
• Associated with a physician
• Have a record
• Variable metadata
• Records
• Associated with a procedure
• Binary data
• Variable fields
Lot of Variability
Schema Design: MongoDB vs.
Relational
MongoDB                      Relational
Collections                  Tables
Documents                    Rows
Data Use                     Data Storage
What questions do I have?    What answers do I have?
MongoDB vs. Relational
Attribute      MongoDB                    Relational
Storage        N-dimensional              Two-dimensional
Field Values   0, 1, many, or embed       Single value
Query          Any field, at any level    Any field
Schema         Flexible                   Very structured
MongoDB vs. Relational
Complex Normalized Schemas
Documents are Rich Data Structures
{
  first_name: 'Paul',
  last_name: 'Miller',
  cell: 1234567890,
  city: 'London',
  location: [45.123, 47.232],
  professions: ['banking', 'finance', 'trader'],
  physicians: [
    { name: 'Canelo Álvarez, M.D.',
      last_visit: 'Del Carmen Hospital',
      last_visit_dt: '20160501', … },
    { name: 'Érik Morales, M.D.',
      last_visit: 'Del Prado Hospital',
      last_visit_dt: '20160302', … }
  ]
}
• Fields
• Strongly typed field values
• Fields can contain arrays
• Fields can contain an array of sub-documents
• Fields can be indexed and queried at any level
• ORM layer removed – data is already an object!
Modeling Relationships
1-1
Referencing & Embedding
https://docs.mongodb.com/manual/core/data-modeling-introduction/
Procedure
• patient
• date
• type
• physician
• type
Results
• dataType
• size
• content: {…}
Use two collections with a reference field ("relational")
Procedure
• patient
• date
• type
• results
• equipmentId
• data1
• data2
• physician
• Results
• type
• size
• content: {…}
Embedding
Document Schema
Referencing
Referencing
Procedure
{
  "_id" : 333,
  "date" : "2003-02-09T05:00:00",
  "hospital" : "County Hills",
  "patient" : "John Doe",
  "physician" : "Stephen Smith",
  "type" : "Chest X-ray",
  "result_id" : 134
}
Results
{
  "_id" : 134,
  "type" : "txt",
  "size" : NumberInt(12),
  "content" : {
    value1: 343,
    value2: "abc",
    …
  }
}
Embedding
Procedure
{
  "_id" : 333,
  "date" : "2003-02-09T05:00:00",
  "hospital" : "County Hills",
  "patient" : "John Doe",
  "physician" : "Stephen Smith",
  "type" : "Chest X-ray",
  "result" : {
    "type" : "txt",
    "size" : NumberInt(12),
    "content" : {
      value1: 343,
      value2: "abc",
      …
    }
  }
}
Embedding
• Advantages
• Retrieve all relevant information in a single query/document
• Avoid implementing joins in application code
• Update related information as a single atomic operation
• MongoDB doesn't offer multi-document transactions (as of 3.2)
• Limitations
• Large documents mean more overhead if most fields are not relevant
• 16 MB document size limit
Atomicity
• Document operations are atomic
db.patients.update({ _id: 12345 },
  { $inc : { numProcedures : 1 },
    $push : { procedures : "proc123" },
    $set : { "addr.state" : "TX" } })
• No multi-document transactions
// Illustration only: MongoDB 3.2 has no transaction API like this
db.beginTransaction();
db.patients.update({_id: 12345}, …);
db.procedure.insert({_id: "proc123", …});
db.records.insert({_id: "rec123", …});
db.endTransaction();
Referencing
• Advantages
• Smaller documents
• Less likely to reach 16 MB document limit
• Infrequently accessed information is not read on every query
• No duplication of data
• Limitations
• Two queries required to retrieve information
• Cannot update related information atomically
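The read-path difference between the two approaches can be sketched in plain JavaScript. This is only an illustration under stated assumptions: in-memory Maps stand in for collections (no MongoDB server or driver is involved), the documents are abbreviated from the Procedure and Results examples, and the function names readReferenced and readEmbedded are hypothetical.

```javascript
// In-memory stand-ins for collections; this is not a MongoDB API.
const proceduresRef = new Map([
  [333, { _id: 333, patient: "John Doe", type: "Chest X-ray", result_id: 134 }],
]);
const resultsRef = new Map([
  [134, { _id: 134, type: "txt", size: 12, content: { value1: 343, value2: "abc" } }],
]);
const proceduresEmbedded = new Map([
  [333, {
    _id: 333, patient: "John Doe", type: "Chest X-ray",
    result: { type: "txt", size: 12, content: { value1: 343, value2: "abc" } },
  }],
]);

// Referencing: one read for the procedure, a second read to follow result_id.
function readReferenced(procedureId) {
  const procedure = proceduresRef.get(procedureId);
  const result = resultsRef.get(procedure.result_id);
  return { procedure, result, reads: 2 };
}

// Embedding: the result arrives with the procedure in a single read.
function readEmbedded(procedureId) {
  const procedure = proceduresEmbedded.get(procedureId);
  return { procedure, result: procedure.result, reads: 1 };
}

console.log(readReferenced(333).reads, readEmbedded(333).reads);
```

Against a real deployment each Map lookup would be a query, so the referenced form costs an extra round trip for every procedure whose result is needed.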
1-1: General Recommendations
• Embed
• No additional data duplication
• Can query or index on embedded field
• e.g., "result.type"
• Exceptional cases…
• Embedding results in large documents
• Set of infrequently accessed fields
{
  "_id": 333,
  "date": "2003-02-09T05:00:00",
  "hospital": "County Hills",
  "patient": "John Doe",
  "physician": "Stephen Smith",
  "type": "Chest X-ray",
  "result": {
    "type": "txt",
    "size": 12,
    "content": {
      "value1": 343,
      "value2": "abc"
    }
  }
}
1-M
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [
    {
      id: 12345,
      date: ISODate("2015-02-15"),
      type: "Cat scan",
      … },
    {
      id: 12346,
      date: ISODate("2015-02-15"),
      type: "blood test",
      … }]
}
Patients
Embed
1-M
Modeled in 2 possible ways
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [12345, 12346]
}
{
  _id: 12345,
  date: ISODate("2015-02-15"),
  type: "Cat scan",
  … }
{
  _id: 12346,
  date: ISODate("2015-02-15"),
  type: "blood test",
  … }
Patients
Reference
Procedures
1-M : General Recommendations
• Embed, when possible
• Many are weak entities
• Access all information in a single query
• Take advantage of update atomicity
• No additional data duplication
• Can query or index on any field
• e.g., { "phones.type": "mobile" }
• Exceptional cases:
• 16 MB document size
• Large number of infrequently accessed fields
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [
    {
      id: 12345,
      date: ISODate("2015-02-15"),
      type: "Cat scan",
      … },
    {
      id: 12346,
      date: ISODate("2015-02-15"),
      type: "blood test",
      … }]
}
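Querying "on any field" of an embedded array relies on dot notation, where a path such as "procedures.type" matches if any element of the array has that value. The sketch below imitates that behavior in plain JavaScript for equality filters only. It is a deliberately simplified model, not MongoDB's matching engine, and valuesAt and matches are hypothetical helper names.

```javascript
// Collect every value reachable through a dotted path, fanning out over arrays.
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    current = current
      .flatMap((value) => (Array.isArray(value) ? value : [value])) // descend into array elements
      .filter((value) => value !== null && typeof value === "object")
      .map((value) => value[key])
      .filter((value) => value !== undefined);
  }
  // Flatten a trailing array so array-valued fields match on any element.
  return current.flatMap((value) => (Array.isArray(value) ? value : [value]));
}

// Equality-only filter: every path in the filter must reach the expected value.
function matches(doc, filter) {
  return Object.entries(filter).every(([path, expected]) =>
    valuesAt(doc, path).includes(expected));
}

const patient = {
  _id: 2,
  first: "Joe",
  last: "Patient",
  procedures: [
    { id: 12345, type: "Cat scan" },
    { id: 12346, type: "blood test" },
  ],
};

console.log(matches(patient, { "procedures.type": "Cat scan" }));
console.log(matches(patient, { "procedures.type": "MRI" }));
```

In the shell the same filter object would be passed to find(), and an index on the dotted path can support it.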
M-M
Traditional Relational Association
Physicians: name, specialty, phone
Hospitals: name
Join table HosPhysicanRel: hospitalId, physicianId
Use arrays instead
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [
    {
      id: 12345,
      name: "Joe Doctor",
      address: {…},
      … },
    {
      id: 12346,
      name: "Mary Well",
      address: {…},
      … }]
}
M-M
Embedding Physicians in Hospitals collection
{
  _id: 2,
  name: "Plainmont Hospital",
  city: "Omaha",
  beds: 85,
  physicians: [
    {
      id: 63633,
      name: "Harold Green",
      address: {…},
      … },
    {
      id: 12345,
      name: "Joe Doctor",
      address: {…},
      … }]
}
Data duplication … is ok!
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [12345, 12346]
}
M-M
Referencing
{
  id: 63633,
  name: "Harold Green",
  hospitals: [2],
  … }
Hospitals
{
  _id: 2,
  name: "Plainmont Hospital",
  city: "Omaha",
  beds: 85,
  physicians: [63633, 12345]
}
Physicians
{
  id: 12345,
  name: "Joe Doctor",
  hospitals: [1, 2],
  … }
{
  id: 12346,
  name: "Mary Well",
  hospitals: [1],
  … }
M-M : General Recommendation
• Use case determines whether to reference or embed:
1. Data Duplication
• Embedding may result in data duplication
• Duplication may be okay if reads dominate updates
• Of the two, which one changes the least?
2. Referencing may be required if there are many related items
3. Hybrid approach
• Potentially do both. It's ok!
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [12345, 12346]
}
{
  _id: 12345,
  name: "Joe Doctor",
  address: {…},
  … }
{
  _id: 12346,
  name: "Mary Well",
  address: {…},
  … }
Hospitals
Reference
Physicians
Performance
Example 1: Hybrid Approach
Embed and Reference
Healthcare Example
patients
procedures
Tailor Schema to Queries
Patient
{
  "_id" : 593340651,
  "first" : "Gregorio",
  "last" : "Lang",
  "addr" : {
    "street" : "623 Flowers Rd",
    "city" : "Groton",
    "state" : "NH",
    "zip" : 3266
  },
  "physicians" : [10387, 33456],
  "procedures" : ["551ac", "343fs"]
}
Procedure
{
  "_id" : "551ac",
  "date" : "2000-04-26",
  "hospital" : 161,
  "patient" : 593340651,
  "physician" : 10387,
  "type" : "Chest X-ray",
  "records" : ["67bc6"]
}
Find all patients from NH that have had chest x-rays
Tailor Schema to Queries (cont.)
Patient
{
  "_id" : 593340651,
  "first" : "Gregorio",
  "last" : "Lang",
  "addr" : {
    "street" : "623 Flowers Rd",
    "city" : "Groton",
    "state" : "NH",
    "zip" : 3266
  },
  "physicians" : [10387, 33456],
  "procedures" : [
    { id : "551ac",
      type : "Chest X-ray" },
    { id : "343fs",
      type : "Blood Test" }]
}
Procedure
{
  "_id" : "551ac",
  "date" : "2000-04-26",
  "hospital" : 161,
  "patient" : 593340651,
  "physician" : 10387,
  "type" : "Chest X-ray",
  "records" : ["67bc6"]
}
Find all patients from NH that have had chest x-rays
3.2’s $lookup!!
(left-outer join)
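To see why duplicating the procedure type into the patient document helps this query, the two schemas can be compared with a minimal in-memory sketch. Plain arrays stand in for collections, the second patient and third procedure are made-up sample data, and findTwoStep and findOneStep are hypothetical names. With the hybrid schema, the equivalent shell filter would be { "addr.state": "NH", "procedures.type": "Chest X-ray" } on the patients collection alone.

```javascript
// Hypothetical sample data modeled on the slides; arrays stand in for collections.
const procedureDocs = [
  { _id: "551ac", patient: 593340651, type: "Chest X-ray" },
  { _id: "343fs", patient: 593340651, type: "Blood Test" },
  { _id: "77aaa", patient: 100000001, type: "Chest X-ray" },
];
const patientsReferenceOnly = [
  { _id: 593340651, addr: { state: "NH" }, procedures: ["551ac", "343fs"] },
  { _id: 100000001, addr: { state: "TX" }, procedures: ["77aaa"] },
];
const patientsHybrid = [
  { _id: 593340651, addr: { state: "NH" },
    procedures: [{ id: "551ac", type: "Chest X-ray" }, { id: "343fs", type: "Blood Test" }] },
  { _id: 100000001, addr: { state: "TX" },
    procedures: [{ id: "77aaa", type: "Chest X-ray" }] },
];

// Reference-only schema: scan procedures first, then filter patients by the ids found.
function findTwoStep(state, type) {
  const patientIds = new Set(
    procedureDocs.filter((p) => p.type === type).map((p) => p.patient));
  return patientsReferenceOnly
    .filter((p) => p.addr.state === state && patientIds.has(p._id))
    .map((p) => p._id);
}

// Hybrid schema: the duplicated type field lets one pass over patients answer the question.
function findOneStep(state, type) {
  return patientsHybrid
    .filter((p) => p.addr.state === state &&
      p.procedures.some((proc) => proc.type === type))
    .map((p) => p._id);
}

console.log(findTwoStep("NH", "Chest X-ray"), findOneStep("NH", "Chest X-ray"));
```

Both functions return the same patient ids; the hybrid version simply never touches the procedures data.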
Example 2: Time Series Data
Medical Devices
Vital Sign Monitoring Device
Vital Signs Measured:
• Blood Pressure
• Pulse
• Blood Oxygen Levels
Produces data at regular intervals
• Once per minute
• Many Devices, Many Hospitals
Data From Vital Signs Monitoring Device
{
deviceId: 123456,
ts: ISODate("2013-10-16T22:07:00.000-0500"),
spO2: 88,
pulse: 74,
bp: [128, 80]
}
• One document per minute per device
• Relational approach
Document Per Hour (By minute)
{
deviceId: 123456,
ts: ISODate("2013-10-16T22:00:00.000-0500"),
spO2: { 0: 88, 1: 90, …, 59: 92},
pulse: { 0: 74, 1: 76, …, 59: 72},
bp: { 0: [122, 80], 1: [126, 84], …, 59: [124, 78]}
}
• One document per device per hour
• Store per-minute data at the hourly level
• Update-driven workload
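The update-driven workload can be illustrated by building the filter and $set document for a single reading. This is a sketch under assumptions: hourlyBucketUpdate is a hypothetical helper, the hour is truncated in UTC, and nothing here talks to a database. The returned objects are the kind one might pass to an update call with the upsert option, so that the first reading of an hour creates the document.

```javascript
// Build the filter and update for one reading in the document-per-hour schema.
function hourlyBucketUpdate(reading) {
  const hourStart = new Date(reading.ts);     // copy, so the input is not mutated
  hourStart.setUTCMinutes(0, 0, 0);           // truncate to the top of the hour
  const minute = reading.ts.getUTCMinutes();  // 0..59, used as the sub-field name
  return {
    filter: { deviceId: reading.deviceId, ts: hourStart },
    update: {
      $set: {
        [`spO2.${minute}`]: reading.spO2,
        [`pulse.${minute}`]: reading.pulse,
        [`bp.${minute}`]: reading.bp,
      },
    },
  };
}

// The 22:07 -0500 reading from the per-event example, expressed in UTC.
const example = hourlyBucketUpdate({
  deviceId: 123456,
  ts: new Date("2013-10-17T03:07:00.000Z"),
  spO2: 88,
  pulse: 74,
  bp: [128, 80],
});
console.log(JSON.stringify(example, null, 2));
```

Each reading touches only its own minute slot inside the hourly document, which is what turns sixty inserts into one insert followed by in-place updates.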
Characterizing Write Differences
• Example: data generated every minute
• Recording the data for 1 patient for 1 hour:
Document Per Event: 60 inserts
Document Per Hour: 1 insert, 59 updates
Characterizing Read Differences
• Want to graph 24 hours of vital signs for a patient:
Document Per Event: 1440 reads
Document Per Hour: 24 reads
• Read performance is greatly improved
Characterizing Memory and Storage Differences
                       Document Per Minute   Document Per Hour
Number Documents       52.6 Billion          876 Million
Total Index Size       6,364 GB              106 GB
_id index              1,468 GB              24.5 GB
{ts: 1, deviceId: 1}   4,895 GB              81.6 GB
Document Size          92 Bytes              758 Bytes
Database Size          4,503 GB              618 GB
• 100K Devices
• 1 year's worth of data, at minute resolution (365 x 24 x 60)
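The document counts and database sizes in the table can be re-derived from the stated inputs: 100K devices, 365 x 24 x 60 readings per device for the per-minute schema, and 365 x 24 documents per device for the per-hour schema. The sketch below does that arithmetic. It assumes "GB" means 2^30 bytes, which is the reading under which the computed sizes agree with the table; the index sizes are not modeled.

```javascript
const devices = 100000;
const minutesPerYear = 365 * 24 * 60; // readings per device per year
const hoursPerYear = 365 * 24;        // hourly documents per device per year
const bytesPerGB = 2 ** 30;           // assumption: binary gigabytes

// Document count and raw data size for one schema.
function estimate(docsPerDevice, docSizeBytes) {
  const documents = devices * docsPerDevice;
  const databaseGB = Math.round((documents * docSizeBytes) / bytesPerGB);
  return { documents, databaseGB };
}

const perMinute = estimate(minutesPerYear, 92);
const perHour = estimate(hoursPerYear, 758);

console.log(perMinute); // documents: 52560000000, databaseGB: 4503
console.log(perHour);   // documents: 876000000, databaseGB: 618
console.log(perMinute.documents / perHour.documents); // 60
```

The per-hour documents are about eight times larger, but there are sixty times fewer of them, which is where the storage and index savings come from.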
MongoDB 3.2
MongoDB 3.2 – a GIANT Release
2.2: Agg. Framework, Location-Aware Sharding
2.4: Hash-Based Sharding, Roles, Kerberos, On-Prem Monitoring
2.6: $out, Index Intersection, Text Search, Field-Level Redaction, LDAP & x509, Auditing
3.0: Doc-Level Concurrency, Compression, Storage Engine API, ≤50 replicas, Auditing ++, Ops Manager
3.2: Document Validation, Fast Failover, Simpler Scalability, Aggregation ++, Encryption At Rest, In-Memory Storage Engine, BI Connector, $lookup, MongoDB Compass, APM Integration, Profiler Visualization, Auto Index Builds, Backups to File System
Tools
• mgenerate
• Part of mtools: https://github.com/rueckstiess/mtools/wiki/mgenerate
• Model schema using a JSON definition
• Generate millions of documents with random data
• How well does the schema work?
• Queries, Indexes, Data Size, Index Size, Replication
• Demo
Documents are Rich Data Structures
{
  first_name: 'Paul',
  last_name: 'Miller',
  cell: 1234567890,
  city: 'London',
  location: [45.123, 47.232],
  professions: ['banking', 'finance', 'trader'],
  physicians: [
    { name: 'Canelo Álvarez, M.D.',
      last_visit: 'Mission Hospital',
      last_visit_dt: '20160501', … },
    { name: 'Érik Morales, M.D.',
      last_visit: 'Del Prado Hospital',
      last_visit_dt: '20160302', … }
  ]
}
• Fields
• Typed field values
• Fields can contain arrays
• Fields can contain an array of sub-documents
• Fields can be indexed and queried at any level
• ORM layer removed – data is already an object!
Schema using mgenerate
{
"first_name" : { "$string" : { "length" : 30 }},
"last_name" : { "$string" : { "length" : 30 }},
"cell" : "$number",
"city" : { "$string" : { "length" : 30 }},
"location" : [ "$number", "$number"],
"professions" : { "$array" : [ {
"$choose" : [ "banking", "finance", "trader" ] },
{ "$number": [1, 3] }
] },
"physicians" : { "$array" : [
{
"name" : { "$string" : { "length" : 30 }},
"last_visit" : { "$string" : { "length" : 30 }},
"last_visit_dt" : "$datetime"
},
{ "$number" : [1, 5]}
] }
}
> mgenerate --host localhost --port 27017 -d webinar -c patients --drop -n 100 patients.json
Use Compass to visualize & query data!
Visual Query Profiler: identify your slow-running queries with the click of a button
Index Suggestions: index recommendations to improve your deployment
MongoDB 3.2 $lookup
{ "_id": 593340651,
"first": "Gregorio",
"last": "Lang",
"addr": {
"street": "623 Flowers Rd",
"city": "Groton",
"state": "NH",
"zip": 3266 },
"physicians": [10387, 33456],
"procedures": ["551ac", "343fs"]
}
{
  "_id" : "551ac",
  "date" : "2000-04-26",
  "hospital" : 161,
  "patient" : 593340651,
  "physician" : 10387,
  "type" : "Chest X-ray",
  "records" : ["67bc6"]
}
Patient Procedure
Obtain Patient view with Procedure details, but without Physicians
MongoDB 3.2 $lookup
db.PatientsColl.aggregate([
{ "$match" : { "_id": 593340651 }},
{ "$unwind" : "$procedures"},
{ "$lookup" : {
"from" : "ProceduresColl",
"localField" : "procedures",
"foreignField": "_id",
"as" : "procs" }},
{ "$unwind" : "$procs" },
{ "$group" : { "_id" : { "_id" : "$_id",
"first" : "$first",
"last" : "$last",
"addr" : "$addr" },
"procedures" : { "$push" : "$procs"} }
},
{ "$project" : { "_id" : "$_id._id",
"first" : "$_id.first",
"last" : "$_id.last",
"addr" : "$_id.addr",
"procedures._id" : 1,
"procedures.type" : 1,
"procedures.date" : 1 }
}]);
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
{
"_id": 593340651,
"first": "Gregorio",
"last": "Lang",
"addr": {
"street": "623 Flowers Rd",
"city": "Groton",
"state": "NH",
"zip": 3266
},
"procedures": [{
"_id": "551ac",
"date": "2000-04-26",
"type": "Chest X-ray"
}, {
"_id": "343fs",
"date": "2000-04-26",
"type": "Blood Test"
}]
}
Obtain Patient view with
Procedure details, but
without Physicians
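What the $lookup stage contributes to the pipeline can be pictured as a left outer join on a scalar field. The following plain JavaScript model is only a sketch: the lookup function is a hypothetical stand-in rather than the server's implementation, the patient documents are shown as they would look after the first $unwind, and the "missing-id" reference is made-up data to show the outer-join behavior.

```javascript
// Simplified model of a left outer join on a scalar local field.
function lookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((doc) => ({
    ...doc,
    // Unmatched local documents are kept, with an empty array.
    [as]: foreignDocs.filter((other) => other[foreignField] === doc[localField]),
  }));
}

// Patient documents after unwinding "procedures" (one document per procedure id).
const unwoundPatients = [
  { _id: 593340651, first: "Gregorio", procedures: "551ac" },
  { _id: 593340651, first: "Gregorio", procedures: "missing-id" }, // hypothetical dangling reference
];
const procedureDocs = [
  { _id: "551ac", date: "2000-04-26", type: "Chest X-ray" },
];

const joined = lookup(unwoundPatients, procedureDocs, {
  localField: "procedures",
  foreignField: "_id",
  as: "procs",
});
console.log(JSON.stringify(joined, null, 2));
```

Note that the pipeline's second $unwind, in its default form, drops documents whose "procs" array is empty, so a dangling reference would not appear in the final grouped result.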
MongoDB 3.2 Document Validation
db.runCommand( {
collMod: "Patients",
validator: { $and: [
{ "first_name": { "$type": "string" }},
{ "last_name": { "$type": "string"}},
{ "physicians": { "$type": "array"}}
] },
validationLevel: "strict"
});
https://docs.mongodb.com/manual/core/document-validation/
All Patient records must have string data for the first and last name, and a list of Physicians
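The intent of the three rules can be checked against sample documents with a small client-side sketch. It is an assumption-laden approximation rather than the server's validator: typeNameOf only distinguishes strings and arrays, the rule table restates the slide in simplified form, and the exact server-side semantics of $type (particularly for array fields) can differ by version.

```javascript
// Map a JavaScript value to the type names used in the rules below (simplified).
function typeNameOf(value) {
  if (Array.isArray(value)) return "array";
  if (typeof value === "string") return "string";
  return typeof value; // other BSON types are not distinguished in this sketch
}

// Return the names of fields that break a rule; an empty list means the document passes.
function violations(doc, rules) {
  return Object.entries(rules)
    .filter(([field, expectedType]) => typeNameOf(doc[field]) !== expectedType)
    .map(([field]) => field);
}

const patientRules = { first_name: "string", last_name: "string", physicians: "array" };

const goodPatient = { first_name: "Paul", last_name: "Miller", physicians: [] };
const badPatient = { first_name: "Paul", last_name: 42 };

console.log(violations(goodPatient, patientRules));
console.log(violations(badPatient, patientRules));
```

With validationLevel set to "strict", the server rejects inserts and updates that fail the validator, so a check like this is mainly useful for pre-flight testing of sample data.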
Summary
01 Embedding and Referencing
02 1-1: Embed; 1-M: Embed when possible; M-M: Hybrid
03 Context of Application Data and Query Workload Decisions
04 Iterate: different schemas may result in dramatically different query performance, data/index size and hardware requirements!
05 Tools! Measure data/index size and query performance: mgenerate/mtools, Compass, Cloud Manager / Ops Manager
06 3.2: $lookup, Document Validation
Q&A
Sigfrido Narváez
Sr. Solutions Architect, MongoDB

MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 
Webinar: MongoDB Schema Design and Performance Implications

  • 2. Sigfrido ”Sig” Narváez Sr. Solutions Architect, MongoDB sigfrido@mongodb.com @SigNarvaez
  • 3. Agenda: 01 Medical Record Example; 02 Schema Design: MongoDB vs. Relational; 03 Modeling Relationships; 04 Performance; 05 What’s new with 3.2; 06 Summary, Q&A
  • 5. Medical Records • Collects all patient information in a central repository • Provide central point of access for • Patients • Care providers: physicians, nurses, etc. • Billing • Insurance reconciliation • Hospitals, physicians, patients, procedures, records Patient Records Medications Lab Results Procedures Hospital Records Physicians Patients Nurses Billing
  • 6. Medical Record Data • Hospitals • have physicians • Physicians • Have patients • Perform procedures • Belong to hospitals • Patients • Have physicians • Are the subject of procedures • Procedures • Associated with a patient • Associated with a physician • Have a record • Variable meta data • Records • Associated with a procedure • Binary data • Variable fields
  • 8. Schema Design: MongoDB vs. Relational
  • 9. MongoDB vs. Relational (MongoDB / Relational): Collections / Tables; Documents / Rows; Data Use / Data Storage; “What questions do I have?” / “What answers do I have?”
  • 10. MongoDB vs. Relational (Attribute: MongoDB / Relational): Storage: N-dimensional / Two-dimensional; Field Values: 0, 1, many, or embed / Single value; Query: Any field, at any level / Any field; Schema: Flexible / Very structured
  • 13. Documents are Rich Data Structures { first_name: ‘Paul’, last_name: ‘Miller’, cell: 1234567890, city: ‘London’, location: [45.123,47.232], professions: [‘banking’, ‘finance’, ‘trader’], physicians: [ { name: ‘Canelo Álvarez, M.D.’, last_visit: ‘Del Carmen Hospital’, last_visit_dt: ‘20160501’, … }, { name: ‘Érik Morales, M.D.’, last_visit: ‘Del Prado Hospital’, last_visit_dt: ‘20160302’, … } ] } Fields can contain an array of sub- documents Fields Strongly Typed field values Fields can contain arrays Fields can be indexed and queried at any level ORM Layer removed – Data is already an object!
  • 15. 1-1
  • 17. Procedure • patient • date • type • physician • type Results • dataType • size • content: {…} Use two collections with a reference field – “relational” Procedure • patient • date • type • results • equipmentId • data1 • data2 • physician • Results • type • size • content: {…} Embedding Document Schema Referencing
  • 18. Referencing Procedure { "_id" : 333, "date" : "2003-02-09T05:00:00", "hospital" : "County Hills", "patient" : "John Doe", "physician" : "Stephen Smith", "type" : "Chest X-ray", "result_id" : 134 } Results { "_id" : 134, "type" : "txt", "size" : NumberInt(12), "content" : { value1: 343, value2: "abc", … } }
  • 19. Embedding Procedure { "_id" : 333, "date" : "2003-02-09T05:00:00", "hospital" : "County Hills", "patient" : "John Doe", "physician" : "Stephen Smith", "type" : "Chest X-ray", "result" : { "type" : "txt", "size" : NumberInt(12), "content" : { value1: 343, value2: "abc", … } } }
  • 20. Embedding • Advantages • Retrieve all relevant information in a single query/document • Avoid implementing joins in application code • Update related information as a single atomic operation • MongoDB doesn’t offer multi-document transactions (as of 3.2) • Limitations • Large documents mean more overhead if most fields are not relevant • 16 MB document size limit
  • 21. Atomicity • Document operations are atomic db.patients.update({_id: 12345}, { $inc : { numProcedures : 1 }, $push : { procedures : "proc123" }, $set : { "addr.state" : "TX" }}) • No multi-document transactions (as of 3.2), so there is no equivalent of: db.beginTransaction(); db.patients.update({_id: 12345}, …); db.procedure.insert({_id: "proc123", …}); db.records.insert({_id: "rec123", …}); db.endTransaction();
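
The single-document update on the Atomicity slide can be imitated in plain JavaScript to see what the three modifiers do together. This is only a sketch: `resolvePath` and `applyUpdate` are hypothetical helpers written for this example, not MongoDB APIs, and the sample patient is made up. The point it illustrates is that all three changes are prepared on one document and swapped in as a whole.

```javascript
// Hypothetical in-memory imitation of a single-document update.
// Not a MongoDB API: it only mimics $inc, $push and $set for illustration.
function resolvePath(doc, path) {
  // Walk a dotted path such as "addr.state" and return [parentObject, lastKey].
  const keys = path.split(".");
  let target = doc;
  for (const key of keys.slice(0, -1)) {
    if (typeof target[key] !== "object" || target[key] === null) target[key] = {};
    target = target[key];
  }
  return [target, keys[keys.length - 1]];
}

function applyUpdate(doc, update) {
  // Work on a copy so the caller sees either the old or the new document.
  const copy = JSON.parse(JSON.stringify(doc));
  for (const [path, amount] of Object.entries(update.$inc || {})) {
    const [parent, key] = resolvePath(copy, path);
    parent[key] = (parent[key] || 0) + amount;
  }
  for (const [path, value] of Object.entries(update.$push || {})) {
    const [parent, key] = resolvePath(copy, path);
    parent[key] = [...(parent[key] || []), value];
  }
  for (const [path, value] of Object.entries(update.$set || {})) {
    const [parent, key] = resolvePath(copy, path);
    parent[key] = value;
  }
  return copy;
}

// Made-up sample document.
const patient = { _id: 12345, numProcedures: 2, procedures: ["proc1", "proc2"], addr: { state: "NH" } };

const updatedPatient = applyUpdate(patient, {
  $inc: { numProcedures: 1 },
  $push: { procedures: "proc123" },
  $set: { "addr.state": "TX" }, // dotted field names must be quoted
});

console.log(updatedPatient);
```

Note that the dotted key `"addr.state"` has to be quoted in JavaScript; an unquoted `addr.state` is a syntax error in an object literal.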
  • 23. Referencing • Advantages • Smaller documents • Less likely to reach 16 MB document limit • Infrequently accessed information not accessed on every query • No duplication of data • Limitations • Two queries required to retrieve information • Cannot update related information atomically
  • 24. 1-1: General Recommendations • Embed • No additional data duplication • Can query or index on embedded field • e.g., “result.type” • Exceptional cases… • Embedding results in large documents • Set of infrequently accessed fields { "_id": 333, "date": "2003-02-09T05:00:00", "hospital": "County Hills", "patient": "John Doe", "physician": "Stephen Smith", "type": "Chest X-ray", "result": { "type": "txt", "size": 12, "content": { "value1": 343, "value2": "abc" } } }
  • 25. 1-M
  • 26. 1-M Modeled in 2 possible ways. Embed (Patients): { _id: 2, first: "Joe", last: "Patient", addr: { …}, procedures: [ { id: 12345, date: "2015-02-15", type: "Cat scan", …}, { id: 12346, date: "2015-02-15", type: "blood test", …}] } Reference (Patients): { _id: 2, first: "Joe", last: "Patient", addr: { …}, procedures: [12345, 12346]} (Procedures): { _id: 12345, date: "2015-02-15", type: "Cat scan", …} { _id: 12346, date: "2015-02-15", type: "blood test", …}
  • 27. 1-M : General Recommendations • Embed, when possible • Many are weak entities • Access all information in a single query • Take advantage of update atomicity • No additional data duplication • Can query or index on any field • e.g., { “phones.type”: “mobile” } • Exceptional cases: • 16 MB document size • Large number of infrequently accessed fields { _id: 2, first: “Joe”, last: “Patient”, addr: { …}, procedures: [ { id: 12345, date: 2015-02-15, type: “Cat scan”, …}, { id: 12346, date: 2015-02-15, type: “blood test”, …}] }
  • 28. M-M
  • 29. M-M Traditional Relational Association Join table Physicians name specialty phone Hospitals name HosPhysicanRel hospitalId physicianId X Use arrays instead
  • 30. { _id: 1, name: “Oak Valley Hospital”, city: “New York”, beds: 131, physicians: [ { id: 12345, name: “Joe Doctor”, address: {…}, …}, { id: 12346, name: “Mary Well”, address: {…}, …}] } M-M Embedding Physicians in Hospitals collection { _id: 2, name: “Plainmont Hospital”, city: “Omaha”, beds: 85, physicians: [ { id: 63633, name: “Harold Green”, address: {…}, …}, { id: 12345, name: “Joe Doctor”, address: {…}, …}] } Data Duplication … is ok!
  • 31. { _id: 1, name: “Oak Valley Hospital”, city: “New York”, beds: 131, physicians: [12345, 12346] } M-M Referencing { id: 63633, name: “Harold Green”, hospitals: [1,2], …} Hospitals { _id: 2, name: “Plainmont Hospital”, city: “Omaha”, beds: 85, physicians: [63633, 12345] } Physicians { id: 12345, name: “Joe Doctor”, hospitals: [1], …} { id: 12346, name: “Mary Well”, hospitals: [1,2], …}
  • 32. M-M : General Recommendation • Use case determines whether to reference or embed: 1. Data Duplication • Embedding may result in data duplication • Duplication may be okay if reads dominate updates • Of the two, which one changes the least? 2. Referencing may be required if many related items 3. Hybrid approach • Potentially do both .. It’s ok! { _id: 2, name: “Oak Valley Hospital”, city: “New York”, beds: 131, physicians: [12345, 12346]} { _id: 12345, name: “Joe Doctor”, address: {…}, …} { _id: 12346, name: “Mary Well”, address: {…}, …} Hospitals Reference Physicians
  • 36. Tailor Schema to Queries. Find all patients from NH that have had chest x-rays. Patient: { "_id" : 593340651, "first" : "Gregorio", "last" : "Lang", "addr" : { "street" : "623 Flowers Rd", "city" : "Groton", "state" : "NH", "zip" : 3266 }, "physicians" : [10387, 33456], "procedures" : ["551ac", "343fs"] } Procedure: { "_id" : "551ac", "date" : "2000-04-26", "hospital" : 161, "patient" : 593340651, "physician" : 10387, "type" : "Chest X-ray", "records" : [ "67bc6"] }
  • 37. Tailor Schema to Queries (cont.). Find all patients from NH that have had chest x-rays. Patient: { "_id" : 593340651, "first" : "Gregorio", "last" : "Lang", "addr" : { "street" : "623 Flowers Rd", "city" : "Groton", "state" : "NH", "zip" : 3266 }, "physicians" : [10387, 33456], "procedures" : [ {id : "551ac", type : "Chest X-ray"}, {id : "343fs", type : "Blood Test"}] } Procedure: { "_id" : "551ac", "date" : "2000-04-26", "hospital" : 161, "patient" : 593340651, "physician" : 10387, "type" : "Chest X-ray", "records" : [ "67bc6"] } Alternative: 3.2’s $lookup!! (left-outer join)
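
With the procedure type copied into the patient document, the "patients from NH with chest x-rays" question can be answered from one collection. The sketch below shows what such a filter might look like, plus a plain-JavaScript equivalent so the logic can be checked without a server. The second patient and the `matchesFilter` helper are assumptions made for this example.

```javascript
// Filter as it could be passed to find() on the patients collection.
// A dotted path into an array of sub-documents matches if ANY element matches.
const nhChestXrayFilter = { "addr.state": "NH", "procedures.type": "Chest X-ray" };

// Plain-JavaScript equivalent of the filter above (hypothetical helper).
function matchesFilter(patient) {
  return (
    patient.addr.state === "NH" &&
    patient.procedures.some((procedure) => procedure.type === "Chest X-ray")
  );
}

// First document follows the slide; the second is made up for contrast.
const samplePatients = [
  {
    _id: 593340651,
    addr: { state: "NH" },
    procedures: [
      { id: "551ac", type: "Chest X-ray" },
      { id: "343fs", type: "Blood Test" },
    ],
  },
  { _id: 2, addr: { state: "NH" }, procedures: [{ id: "9f00a", type: "Blood Test" }] },
];

const matchingIds = samplePatients.filter(matchesFilter).map((patient) => patient._id);
console.log(nhChestXrayFilter, matchingIds);
```

The trade-off is the one the M-M recommendations describe: the `type` value is duplicated in both collections, which is easiest to live with when it rarely changes.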
  • 38. Example 2: Time Series Data Medical Devices
  • 39. Vital Sign Monitoring Device Vital Signs Measured: • Blood Pressure • Pulse • Blood Oxygen Levels Produces data at regular intervals • Once per minute • Many Devices, Many Hospitals
  • 40. Data From Vital Signs Monitoring Device { deviceId: 123456, ts: ISODate("2013-10-16T22:07:00.000-0500"), spO2: 88, pulse: 74, bp: [128, 80] } • One document x minute x device • Relational approach
  • 41. Document Per Hour (By minute) { deviceId: 123456, ts: ISODate("2013-10-16T22:00:00.000-0500"), spO2: { 0: 88, 1: 90, …, 59: 92}, pulse: { 0: 74, 1: 76, …, 59: 72}, bp: { 0: [122, 80], 1: [126, 84], …, 59: [124, 78]} } • 1 document x device x hour • Store per-minute data at the hourly level • Update-driven workload
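
One way to drive the document-per-hour layout is to derive, for each reading, the hourly bucket it belongs to and the minute slot to set. The sketch below builds such a filter and update as plain objects; `bucketUpdate` is a hypothetical helper, and truncating in UTC is an assumption that happens to give the same hour boundaries for whole-hour offsets such as the slide's -0500.

```javascript
// Build the filter and update for one reading in the document-per-hour layout.
// Hypothetical helper: field names follow the slide, the function itself does not.
function bucketUpdate(deviceId, timestamp, reading) {
  const ts = new Date(timestamp);
  const minute = ts.getUTCMinutes();

  const hour = new Date(ts);
  hour.setUTCMinutes(0, 0, 0); // truncate to the top of the hour

  return {
    filter: { deviceId: deviceId, ts: hour },
    update: {
      $set: {
        [`spO2.${minute}`]: reading.spO2,
        [`pulse.${minute}`]: reading.pulse,
        [`bp.${minute}`]: reading.bp,
      },
    },
  };
}

const reading = bucketUpdate(123456, "2013-10-16T22:07:00.000-05:00", {
  spO2: 88,
  pulse: 74,
  bp: [128, 80],
});

console.log(JSON.stringify(reading, null, 2));
```

The resulting `filter` and `update` could then be handed to an update call on the collection, one per incoming reading, which is what makes this an update-driven workload.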
  • 42. Characterizing Write Differences • Example: data generated every minute • Recording the data for 1 patient for 1 hour: Document Per Event: 60 inserts; Document Per Hour: 1 insert, 59 updates
  • 43. Characterizing Read Differences • Want to graph 24 hours of vital signs for a patient: Document Per Event: 1440 reads; Document Per Hour: 24 reads • Read performance is greatly improved
  • 44. Characterizing Memory and Storage Differences (Document Per Minute / Document Per Hour): Number of Documents: 52.6 Billion / 876 Million; Total Index Size: 6,364 GB / 106 GB; _id index: 1,468 GB / 24.5 GB; {ts: 1, deviceId: 1} index: 4,895 GB / 81.6 GB; Document Size: 92 Bytes / 758 Bytes; Database Size: 4,503 GB / 618 GB • 100K Devices • 1 year’s worth of data, at minute resolution (365 x 24 x 60 readings per device)
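
The document counts on this slide follow directly from the stated workload, and the read counts on the previous slide from the same arithmetic. A short check, using only the numbers given in the slides:

```javascript
// Workload from the slide: 100K devices, one reading per minute, one year.
const devices = 100000;
const hoursPerYear = 365 * 24;            // 8,760
const minutesPerYear = hoursPerYear * 60; // 525,600

const docsPerMinuteLayout = devices * minutesPerYear; // one document per reading
const docsPerHourLayout = devices * hoursPerYear;     // one document per device-hour

// Reads needed to graph 24 hours for one device.
const readsPerEventLayout = 24 * 60;
const readsPerHourLayout = 24;

console.log(docsPerMinuteLayout); // 52560000000, i.e. about 52.6 billion
console.log(docsPerHourLayout);   // 876000000, i.e. 876 million
console.log(readsPerEventLayout, readsPerHourLayout); // 1440 24
```

Both layouts store the same readings; the hourly layout simply has 60 times fewer documents, which is also why its indexes are roughly 60 times smaller in the table.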
  • 46. MongoDB 3.2 – a GIANT Release Hash-Based Sharding Roles Kerberos On-Prem Monitoring 2.2 2.4 2.6 3.0 3.2 Agg. Framework Location-Aware Sharding $out Index Intersection Text Search Field-Level Redaction LDAP & x509 Auditing Document Validation Fast Failover Simpler Scalability Aggregation ++ Encryption At Rest In-Memory Storage Engine BI Connector $lookup MongoDB Compass APM Integration Profiler Visualization Auto Index Builds Backups to File System Doc-Level Concurrency Compression Storage Engine API ≤50 replicas Auditing ++ Ops Manager
  • 47. Tools • mgenerate • Part of mtools: https://github.com/rueckstiess/mtools/wiki/mgenerate • Model schema using json definition • Generate Millions of documents with random data • How well does the schema work? • Queries, Indexes, Data Size, Index Size, Replication • Demo
  • 48. Documents are Rich Data Structures{ first_name: ‘Paul’, last_name: ‘Miller’, cell: 1234567890, city: ‘London’, location: [45.123,47.232], professions: [‘banking’, ‘finance’, ‘trader’], physicians: [ { name: ‘Canelo Álvarez, M.D.’, last_visit: ‘Mission Hospital’, last_visit_dt: ‘20160501’, … }, { name: ‘Érik Morales, M.D.’, last_visit: ‘Del Prado Hospital’, last_visit_dt: ‘20160302’, … } ] } Fields can contain an array of sub- documents Fields Typed field values Fields can contain arrays Fields can be indexed and queried at any level ORM Layer removed – Data is already an object!
  • 49. Schema using mgenerate { "first_name" : { "$string" : { "length" : 30 }}, "last_name" : { "$string" : { "length" : 30 }}, "cell" : "$number", "city" : { "$string" : { "length" : 30 }}, "location" : [ "$number", "$number"], "professions" : { "$array" : [ { "$choose" : [ "banking", "finance", "trader" ] }, { "$number": [1, 3] } ] }, "physicians" : { "$array" : [ { "name" : { "$string" : { "length" : 30 }}, "last_visit" : { "$string" : { "length" : 30 }}, "last_visit_dt" : "$datetime" }, { "$number" : [1, 5]} ] } } > mgenerate --host localhost --port 27017 -d webinar -c patients --drop -n 100 patients.json
  • 50. Use Compass to visualize & query data!
  • 51. Visual Query Profiler Identify your slow-running queries with the click of a button Index Suggestions Index recommendations to improve your deployment &
  • 52. MongoDB 3.2 $lookup. Find all patients from NH that have had chest x-rays. Patient: { "_id" : 593340651, "first" : "Gregorio", "last" : "Lang", "addr" : { "street" : "623 Flowers Rd", "city" : "Groton", "state" : "NH", "zip" : 3266 }, "physicians" : [10387, 33456], "procedures" : [ {id : "551ac", type : "Chest X-ray"}, {id : "343fs", type : "Blood Test"}] } Procedure: { "_id" : "551ac", "date" : "2000-04-26", "hospital" : 161, "patient" : 593340651, "physician" : 10387, "type" : "Chest X-ray", "records" : [ "67bc6"] } Alternative: 3.2’s $lookup!! (left-outer join)
  • 53. MongoDB 3.2 $lookup { "_id": 593340651, "first": "Gregorio", "last": "Lang", "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 }, "physicians": [10387, 33456], "procedures": ["551ac", "343fs"] } { "_id" : "551ac”, "date" :"2000-04-26”, "hospital" : 161, "patient" : 593340651, "physician" : 10387, "type" : "Chest X-ray", "records" : [ “67bc6”] } Patient Procedure Obtain Patient view with Procedure details, but without Physicians
  • 54. MongoDB 3.2 $lookup db.PatientsColl.aggregate([ { "$match" : { "_id": 593340651 }}, { "$unwind" : "$procedures"}, { "$lookup" : { "from" : "ProceduresColl", "localField" : "procedures", "foreignField": "_id", "as" : "procs" }}, { "$unwind" : "$procs" }, { "$group" : { "_id" : { "_id" : "$_id", "first" : "$first", "last" : "$last", "addr" : "$addr" }, "procedures" : { "$push" : "$procs"} } }, { "$project" : { "_id" : "$_id._id", "first" : "$_id.first", "last" : "$_id.last", "addr" : "$_id.addr", "procedures._id" : 1, "procedures.type" : 1, "procedures.date" : 1 } }]); https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/ { "_id": 593340651, "first": "Gregorio", "last": "Lang", "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 }, "procedures": [{ "_id": "551ac", "date": "2000-04-26", "type": "Chest X-ray" }, { "_id": "343fs", "date": "2000-04-26", "type": "Blood Test" }] } Obtain Patient view with Procedure details, but without Physicians
  • 55. MongoDB 3.2 Document Validation db.runCommand( { collMod: "Patients", validator: { $and: [ { "first_name": { "$type": "string" }}, { "last_name": { "$type": "string"}}, { "physicians": { "$type": "array"}} ] }, validationLevel: "strict" }); https://docs.mongodb.com/manual/core/document-validation/ All Patient records must have alphanumeric data for the first and last name, and a list of Physicians
  • 56. Summary • 01 Embedding and Referencing • 02 1-1: Embed; 1-M: Embed when possible; M-M: Hybrid • 03 Context of Application: Data and Query Workload Decisions • 04 Iterate: Different schemas may result in dramatically different query performance, data/index size and hardware requirements! • 05 Tools! Measure data/index size, query performance: mgenerate/mtools, Compass, Cloud Manager / Ops Manager • 06 3.2: $lookup, Document Validation
  • 57. Q&A Sigfrido Narváez Sr. Solutions Architect, MongoDB

Editor's Notes

  1. Hi, my name is Sigfrido Narvaez, and I like to go by Sig. Today we will be talking about MongoDB schema design and some of its performance implications. We will also explore some of the new features in MongoDB 3.2 that are relevant to schema design, and some additional tools that will help you iterate and try out different approaches quickly. During the webinar, please feel free to type any questions in the chat box, and at the end we will have a Q&A session and answer as many as we can.
  2. OK, so I am a Sr. Solutions Architect here at MongoDB, based out of Southern California. Prior to joining, I was the Principal Software Architect for a Hybrid Cloud & Polyglot Persistence solution that used MongoDB and leveraged its dynamic, flexible schema to power cloud and mobile apps whose main source of data was many on-premise ERPs. I have also been organizing the Orange County MUG for almost 4 years. I have provided my email address and my Twitter handle in case I don't cover all the questions or we have any follow-ups, so please feel free to reach out with any questions afterwards and I will make sure I find the information you are looking for.
  3. The agenda for today’s presentation: we will use a medical record example and explore its schema in MongoDB vs. relational, using Embedding & Referencing and comparing against the classic 1-1, 1-M & M-M relationships. We will then jump into a performance analysis examining data and index growth and, finally, explore new features in MongoDB 3.2.
  4. Design a schema for a medical information system, where we will need to store data for the Patients, the Physicians, the Procedures, and many other aspects of a medical system. All this data is interrelated, and we have to assume the system will be around for many years and will grow over time.
  5. Left-down to right-down. Let's examine the data entities that are going to be part of this system. First we have hospitals, and hospitals have many physicians. Then we have the physicians, who attend many patients, perform many procedures, and themselves belong to many hospitals. The patients, again, are attended by many physicians and are the subject of many procedures. The Procedures are of course applied by a physician to a patient, inside a hospital at a particular time, and the data that is produced by each of these procedures can vary a lot. For example, an x-ray procedure will produce a bunch of data along with an image or set of images, but a blood test will only produce a bunch of data. Each procedure has different data, which is a schema design problem. As we can see, the main entities and their relationships may be a great fit for a relational database.
  6. But the procedures data is not, and over time procedures will change and use new medical devices or go through improvements, and may produce even more data with more variability, and we still have to keep historical records too. This is a real challenge for a relational database. But for MongoDB and the flexible document model, this is easy. The way we would model this is by having some common data points that all procedures have, such as the timestamp, the physician, the patient, the hospital, and any other common fields, and then a variable section in the JSON document for the unique data points of each procedure. This makes great use of the polymorphic schema capabilities of MongoDB, and with modern languages this can be modeled using base classes and extensions or inheritance.
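The "common fields plus a variable section" idea from this note can be sketched in plain JavaScript. This is only an illustration: the `details` field name and everything inside it (`views`, `imageIds`, `hemoglobin`, `glucose`) are hypothetical, as are the hospital and physician values on the blood test document.

```javascript
// Hypothetical procedure documents: shared fields at the top level,
// procedure-specific data in a variable `details` section (assumed name).
const procedures = [
  {
    _id: "551ac",
    type: "Chest X-ray",
    date: "2000-04-26",
    hospital: 161,
    patient: 593340651,
    physician: 10387,
    details: { views: 2, imageIds: ["67bc6"] }, // assumed X-ray fields
  },
  {
    _id: "343fs",
    type: "Blood Test",
    date: "2000-04-26",
    hospital: 161, // illustrative value
    patient: 593340651,
    physician: 10387, // illustrative value
    details: { hemoglobin: 13.9, glucose: 92 }, // assumed lab fields
  },
];

// Code that only needs the common fields works for every procedure type,
// regardless of what the variable section contains.
function summarize(proc) {
  return `${proc.date} ${proc.type} (physician ${proc.physician})`;
}

const summaries = procedures.map(summarize);
console.log(summaries);
```

Both documents could live in the same collection; only code that cares about a specific procedure type needs to look inside `details`.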
  7. Before we go into the modeling exercises, let's do a level set of understanding of MongoDB concepts versus Relational concepts
  8. In MongoDB, data is stored in a collection, which is analogous to a table. Collections contain Documents, which are analogous to a Row or Record. More importantly, in MongoDB we think about what data we need to use and how it will be used, versus how the data will be stored. In MongoDB, we need to look at queries to guide schema design decisions, whereas in relational we model first, then answer questions, and eventually add Indexes and, in some cases, denormalize data to support queries and performance over time.
  9. Another difference is that in MongoDB fields have many dimensions, versus just two (rows and columns). Each field can contain 0, 1 or many values such as an Array, or even be embedded such as sub-documents, and the type can vary from document to document, vs. a single value of a pre-defined type. I can also query at any field and at any level in the document versus a single field, and we
  10. Okay, so when we start modeling data, the first thing to avoid is trying to think of every single little thing that I may not use immediately, which usually leads to creating complex, over-normalized schemas.
  11. DO NOT PERFORM 3rd normal form modeling and create hundreds of tables, where you have join tables for M-Ms and store all kinds of entities, which will be very difficult to join, will slow down performance and will be hard to maintain over time.
  12. Instead, what we do is create rich data structures that are single documents. As you can see in this example, we have many fields about a patient: where they live, what professions they practice, a list of the physicians they're currently seeing, when the last visit was, etc. So I can get a quick view of a patient in a single document. Now, we have talked about MongoDB having strong data types, such as strings and numbers, but we also have more advanced data types such as coordinates and arrays of other sub-documents. In MongoDB I can query and index using any number of fields at any level, and the document is already in object form, so I don't need an ORM layer like Hibernate or Entity Framework to translate data from relational to object; the data is already an object.
  13. Two ways to model relationships: Referencing and Embedding. Referencing is a very relational-like approach where I duplicate IDs across collections. But take into account that MongoDB does not enforce foreign key constraints, so if you were to delete a master document, you will likely end up with orphans, and this has to be handled at the application level. Embedding is more natural to MongoDB and it works by nesting data inside a single document. There may or may not be a need to generate an ID for nested data, but for sure there is no need to duplicate them as everything lives together.
  14. So how does this apply to our medical schema? Let’s look at Procedures and Results. With Referencing I could use two collections and have a relationship between them. In Embedding, I could embed the results inside of the procedure. Now, something to think about: which of these two entities is the strong entity and which is the weak entity? Clearly the Results entity is weak, as it cannot exist without a Procedure.
  15. Here is what the referencing approach would look like. Obtaining all the data I need will require two reads and two roundtrips to the database. Notice we have placed the result ID in the procedure. Why? Because my application will display Procedures and their Results. This way I only need to read the Procedure collection and then look up the Result document by its ID, and I can perform this lookup in the application layer. And to give you a hint about the later section of the presentation, with MongoDB 3.2 I can use the $lookup pipeline stage to perform what is essentially a left-outer join at the database layer. Take a second to think about this design: using classic relational modeling and considering the strong and weak entities, I would have probably placed the ProcedureID in the Results. But then I would need to create an additional index, which costs disk and memory.
  16. However, with the Embedding approach, this is quite easy to model and getting my data requires a single read and a single roundtrip to the database.
  17. So the advantages of embedding are that I can retrieve all relevant information by reading a single document, I don't have to implement any joins in my application code, and when I update or insert data it is a single atomic operation. Consider that MongoDB, at this time, does not offer multi-document, multi-collection transactions. Let’s talk about Atomicity for a bit.
  18. In a single database command, we can update many fields, or the whole doc. If there are concurrent reads and writes to the same document, the application will see the document before or after the update, but not in between. So a single Update statement can alter either the complete document or parts of it, as we see in this example, and that is atomic. Explain particular operation. But what is not possible, up to MongoDB 3.2, is to do multi-document transactions. You cannot begin a transaction, perform operations and then either commit or rollback. What you may have guessed already is that Embedding takes advantage of MongoDB’s document-level atomicity.
  19. But there are limitations. A large document can also carry more overhead, and there is a 16MB limit, although 16MB of JSON is a considerable amount of data. So, larger documents can cost more to read and update, especially if data does not change too much.
  20. Exact opposite of embedding. Avoids duplication (1-M).
  21. Always look at embedding first, and then prove that embedding doesn’t work. Can always query on any embedded information. Careful: extra-large documents, or embedded data not accessed frequently.
  22. Mixed or Hybrid approach – reference to keep master data, but also embed to store latest or most-used data for speed.
  23. Avoid join tables! – What is a join table? A list of key pairs that relate independent entities. In MongoDB we have arrays. The relationship can be done as embedded or referencing.
  24. Using Embedding, arrays can be used. Data duplication will happen, and this is not as bad an idea as it is in relational. Notice how we are denormalizing some of the fields that we need most often (like the doctors' names) and still satisfy our queries very quickly. Downside: if the fields we duplicate change, then we do have maintenance work or stale data. So take into account which fields will most likely not change, such as a doctor's name. What to do if the fields change often?
  25. If the fields change quite often, then perhaps we could revert to Referencing, knowing we may need to hit the DB multiple times.
  26. The decision is really dependent on your application: fast queries, atomic updates, data maintenance when duplicating. How often does data change? Is it read or write intensive?
  27. Let’s look at Patients and Procedures
  28. Hypothetically, we decided to always use Referencing. Look at the queries: find all patients from a state that have had a particular procedure. Very difficult query!! Bad performance. Query the Patients collection for New Hampshire and get the Patient IDs. Now go against all procedures of type X-ray for these Patient IDs, with the join code in the application.
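To make the two-step, application-side join from this note concrete, here is a minimal sketch in plain JavaScript. The arrays stand in for the Patients and Procedures collections, the extra patients and procedures are made-up sample data, and the function names are hypothetical. Against a real database, each `filter` would be a separate query and round trip.

```javascript
// In-memory stand-ins for the two collections (sample data is illustrative).
const patients = [
  { _id: 593340651, first: "Gregorio", last: "Lang", addr: { state: "NH" } },
  { _id: 100000001, first: "Ana", last: "Ruiz", addr: { state: "CA" } },
  { _id: 100000002, first: "Lee", last: "Park", addr: { state: "NH" } },
];
const procedures = [
  { _id: "551ac", patient: 593340651, type: "Chest X-ray" },
  { _id: "343fs", patient: 593340651, type: "Blood Test" },
  { _id: "9a1b2", patient: 100000001, type: "Chest X-ray" },
  { _id: "77c0d", patient: 100000002, type: "Blood Test" },
];

// Step 1: "query" patients by state and collect their IDs.
function patientIdsInState(state) {
  return patients.filter((p) => p.addr.state === state).map((p) => p._id);
}

// Step 2: "query" procedures by type, restricted to those patient IDs,
// then join back to the patient documents in application code.
function patientsWithProcedure(state, type) {
  const ids = new Set(patientIdsInState(state));
  const matched = new Set(
    procedures
      .filter((proc) => proc.type === type && ids.has(proc.patient))
      .map((proc) => proc.patient)
  );
  return patients.filter((p) => matched.has(p._id));
}

const result = patientsWithProcedure("NH", "Chest X-ray");
console.log(result.map((p) => `${p.first} ${p.last}`));
```

The list of patient IDs passed from step 1 to step 2 grows with the number of patients in the state, which is one reason this pattern tends to perform poorly at scale.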
  29. Referencing and embedding. Contains the Type of the Procedure. Can now embed a small amount of Procedure info and can now execute in a single query. If the “Chest X-Ray” changes, we have to change it everywhere, but it very seldom changes, maybe once a decade!
  30. Tons of data into MongoDB every second
  31. Patient's pulse, heart pressure, from which device, when, etc. The schema is easy if we create a record per event. Easy, but let’s analyze the consequences. Millions of records very quickly, and a lot of the same data repeats! E.g. the Device ID, the PatientID, and most of the timestamp. Index space will grow significantly, and operations and queries will be expensive too!
  32. Store one document per hour, vs. 1 doc per minute! Each doc will contain 60 mins of data. Pulse is a two-dimensional array.
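A rough sketch of the hourly bucket described in this note, written in plain JavaScript with a `Map` standing in for the collection. The layout is an assumption: the note only says pulse is a two-dimensional array, so `pulse[minute][second]` and the field names `deviceId`, `patientId` and `hour` are hypothetical.

```javascript
// Assumed layout: one document per device, patient and hour;
// pulse[minute][second] holds the reading (null when missing).
function newHourBucket(deviceId, patientId, hourStart) {
  return {
    deviceId,
    patientId,
    hour: hourStart.toISOString(),
    pulse: Array.from({ length: 60 }, () => Array(60).fill(null)),
  };
}

const buckets = new Map(); // stand-in for the collection

function recordPulse(deviceId, patientId, timestamp, value) {
  // Truncate the timestamp to the start of its hour (UTC).
  const hourStart = new Date(timestamp);
  hourStart.setUTCMinutes(0, 0, 0);
  const key = `${deviceId}|${patientId}|${hourStart.toISOString()}`;
  if (!buckets.has(key)) {
    // One "insert" per hour instead of one per reading.
    buckets.set(key, newHourBucket(deviceId, patientId, hourStart));
  }
  const doc = buckets.get(key);
  // Every later reading in the same hour is an in-place "update".
  doc.pulse[timestamp.getUTCMinutes()][timestamp.getUTCSeconds()] = value;
  return doc;
}

recordPulse("dev-1", 593340651, new Date("2016-05-01T10:15:30Z"), 72);
recordPulse("dev-1", 593340651, new Date("2016-05-01T10:15:31Z"), 74);
recordPulse("dev-1", 593340651, new Date("2016-05-01T11:00:00Z"), 70);
console.log(buckets.size);
```

Three readings spanning two hours produce two documents here. Against a real deployment, the in-place write would roughly correspond to an update using `$set` on a dotted path such as `pulse.15.30`.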
  33. In general, an update is less costly than an insert. In this case we are creating less write workload by doing more updates than inserts.
  34. Graph 1 day of activity. Substantially fewer IOPS for reads, which means reading is faster.
  35. Orders-of-magnitude differences!! When planning 1 year's worth of data, ALWAYS ALWAYS consider what indexes are needed, and the size of each index. Revisit for formatting and easy reads. Consider hardware needed!! Servers with 100s of GB of RAM are easier than TB sizes! Same for disk space. Summarize total HW at bottom row.
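The order-of-magnitude gap can be checked with simple arithmetic. The sketch below counts documents per device for one non-leap year under three schemas. The one-reading-per-second rate for the per-event schema is an assumption, and actual data and index sizes are not estimated here because they depend on the documents themselves.

```javascript
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60; // 31,536,000 in a non-leap year

// Number of documents written per year when each document covers
// `secondsPerDocument` seconds of readings.
function docsPerYear(secondsPerDocument, devices = 1) {
  return (SECONDS_PER_YEAR / secondsPerDocument) * devices;
}

const perEvent = docsPerYear(1); // one document per reading (assumed 1 reading/second)
const perMinute = docsPerYear(60); // one document per minute
const perHour = docsPerYear(3600); // one document per hour

console.log({ perEvent, perMinute, perHour });
// → { perEvent: 31536000, perMinute: 525600, perHour: 8760 }
```

Going from per-event to per-hour documents cuts the document count by a factor of 3,600 per device, and an ordinary single-field index has one entry per document, so index size shrinks along with it.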
  36. Use mgenerate to model data to see actual data sizes!
  37. Quickly identify your slow-running queries. Part of MongoDB Ops Manager, the Visual Query Profiler displays how query and write latency vary over time. With the click of a button, the Visual Query Profiler consolidates and displays metrics from all your nodes on a single screen.
  38. Let’s go back to this example from earlier, and imagine that the Procedure Name changes quite often, and we have decided to reference instead of embed.
  39. But I also want to get a view of the data that just has the Patient and his/her Procedures, but not the Physicians.
  40. Using $lookup I can do this
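To show the result shape that the $lookup pipeline on slide 54 aims for, here is a plain JavaScript imitation that runs without a database. It is not the aggregation engine: `patientView` is a hypothetical helper, the arrays stand in for `PatientsColl` and `ProceduresColl`, and the second procedure document is filled in only with the fields shown in the slide's output.

```javascript
const patientsColl = [
  {
    _id: 593340651, first: "Gregorio", last: "Lang",
    addr: { street: "623 Flowers Rd", city: "Groton", state: "NH", zip: 3266 },
    physicians: [10387, 33456],
    procedures: ["551ac", "343fs"],
  },
];
const proceduresColl = [
  { _id: "551ac", date: "2000-04-26", hospital: 161, patient: 593340651,
    physician: 10387, type: "Chest X-ray", records: ["67bc6"] },
  { _id: "343fs", date: "2000-04-26", patient: 593340651, type: "Blood Test" },
];

// Resolve each referenced procedure ID to its document (the join),
// keep only _id, date and type, and leave the physicians out of the view.
function patientView(patientId) {
  const patient = patientsColl.find((p) => p._id === patientId);
  if (!patient) return null;
  const { physicians, procedures, ...rest } = patient; // drop physicians
  const byId = new Map(proceduresColl.map((proc) => [proc._id, proc]));
  return {
    ...rest,
    procedures: procedures
      .filter((id) => byId.has(id)) // unmatched IDs are dropped from the view
      .map((id) => {
        const { _id, date, type } = byId.get(id);
        return { _id, date, type };
      }),
  };
}

const view = patientView(593340651);
console.log(JSON.stringify(view, null, 2));
```

One difference worth noting: if none of a patient's procedure IDs matched, this helper would still return the patient with an empty `procedures` array, whereas the slide's pipeline would return no document at all, because `$unwind` on an empty `procs` array drops it.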
  41. Finally, when schema is done, working and performing well and I am in Production, I may want to lock this down. I can do this with Document Validation in 3.2.
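As a client-side illustration of the three rules in the slide 55 validator, the sketch below checks the same conditions in plain JavaScript. It is only a pre-check written for this example, not MongoDB's validation engine, and `validatePatient` is a hypothetical helper name.

```javascript
// Mirrors the three $type checks from the slide's validator:
// first_name and last_name are strings, physicians is an array.
function validatePatient(doc) {
  const errors = [];
  if (typeof doc.first_name !== "string") errors.push("first_name must be a string");
  if (typeof doc.last_name !== "string") errors.push("last_name must be a string");
  if (!Array.isArray(doc.physicians)) errors.push("physicians must be an array");
  return errors; // an empty list means the document passes
}

const okErrors = validatePatient({
  first_name: "Paul",
  last_name: "Miller",
  physicians: [{ name: "Canelo Álvarez, M.D." }],
});
const badErrors = validatePatient({ first_name: "Paul", physicians: "none" });

console.log(okErrors);
console.log(badErrors);
```

The server-side validator remains the real enforcement point. How `$type: "array"` treats array fields has changed between server versions, so it is worth checking the `$type` documentation for the version in use.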