MongoDB Schema Design Patterns
Jumpstart Session
Sigfrido "Sig" Narváez
Sr. Solutions Architect, MongoDB
sigfrido@mongodb.com
@SigNarvaez
Agenda
01 Medical Record Example
02 Schema Design: MongoDB vs. Relational
03 Modeling Relationships
04 Performance
05 What's New with 3.2
06 Summary / Q&A
Medical Record Example
Medical Records
• Collect all patient information in a central repository
• Provide a central point of access for:
  • Patients
  • Care providers: physicians, nurses, etc.
  • Billing
  • Insurance reconciliation
• Hospitals, physicians, patients, procedures, records
[Diagram: Patient Records (Medications, Lab Results, Procedures) and Hospital Records (Physicians, Patients, Nurses, Billing)]
Medical Record Data
• Hospitals
  • Have physicians
• Physicians
  • Have patients
  • Perform procedures
  • Belong to hospitals
• Patients
  • Have physicians
  • Are the subject of procedures
• Procedures
  • Associated with a patient
  • Associated with a physician
  • Have a record
  • Variable metadata
• Records
  • Associated with a procedure
  • Binary data
  • Variable fields
Lots of variability
Schema Design: MongoDB vs. Relational
MongoDB | Relational
Collections | Tables
Documents | Rows
Data use | Data storage
What questions do I have? | What answers do I have?
MongoDB vs. Relational
Attribute | MongoDB | Relational
Storage | N-dimensional | Two-dimensional
Field values | 0, 1, many, or embedded | Single value
Query | Any field, at any level | Any field
Schema | Flexible | Very structured
MongoDB vs. Relational: Complex Normalized Schemas
Documents are Rich Data Structures
{
  first_name: 'Paul',
  last_name: 'Miller',
  cell: 1234567890,
  city: 'London',
  location: [45.123, 47.232],
  professions: ['banking', 'finance', 'trader'],
  physicians: [
    { name: 'Canelo Álvarez, M.D.',
      last_visit: 'Del Carmen Hospital',
      last_visit_dt: '20160501', … },
    { name: 'Érik Morales, M.D.',
      last_visit: 'Del Prado Hospital',
      last_visit_dt: '20160302', … }
  ]
}
Fields can contain an array of sub-documents
Fields
Strongly typed field values
Fields can contain arrays
Fields can be indexed and queried at any level
ORM layer removed: data is already an object!
Modeling Relationships
1-1
Referencing & Embedding
https://docs.mongodb.com/manual/core/data-modeling-introduction/
[Diagram, Referencing: use two collections with a reference field ("relational"). Procedure: patient, date, type, physician. Results: type, dataType, size, content: {…}]
[Diagram, Embedding (document schema): Procedure: patient, date, type, physician, with the results (equipmentId, data1, data2, type, size, content: {…}) embedded in the procedure]
Referencing
Procedure
{
  "_id" : 333,
  "date" : ISODate("2003-02-09T05:00:00"),
  "hospital" : "County Hills",
  "patient" : "John Doe",
  "physician" : "Stephen Smith",
  "type" : "Chest X-ray",
  "result_id" : 134
}
Results
{
  "_id" : 134,
  "type" : "txt",
  "size" : NumberInt(12),
  "content" : {
    value1: 343,
    value2: "abc",
    …
  }
}
Embedding
Procedure
{
  "_id" : 333,
  "date" : ISODate("2003-02-09T05:00:00"),
  "hospital" : "County Hills",
  "patient" : "John Doe",
  "physician" : "Stephen Smith",
  "type" : "Chest X-ray",
  "result" : {
    "type" : "txt",
    "size" : NumberInt(12),
    "content" : {
      value1: 343,
      value2: "abc",
      …
    }
  }
}
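The difference in read paths between the two 1-1 models can be sketched in plain JavaScript. This is a minimal illustration, not MongoDB code: the `Map`s are hypothetical in-memory stand-ins for a `procedures` and a `results` collection (with a server, each `get` would be a `findOne`), and `getProcedureReferenced` / `getProcedureEmbedded` are illustrative names.

```javascript
// Hypothetical in-memory stand-ins for the referencing model's two collections.
const procedures = new Map([
  [333, { _id: 333, patient: "John Doe", type: "Chest X-ray", result_id: 134 }],
]);
const results = new Map([
  [134, { _id: 134, type: "txt", size: 12, content: { value1: 343, value2: "abc" } }],
]);

// Hypothetical stand-in for the embedding model: the result lives inside the procedure.
const embeddedProcedures = new Map([
  [333, {
    _id: 333, patient: "John Doe", type: "Chest X-ray",
    result: { type: "txt", size: 12, content: { value1: 343, value2: "abc" } },
  }],
]);

// Referencing: two reads, and the "join" is written in application code.
function getProcedureReferenced(id) {
  let reads = 0;
  const proc = procedures.get(id);
  reads++;
  if (!proc) return { doc: null, reads };
  const result = results.get(proc.result_id);
  reads++;
  // Drop the reference field and nest the result, mirroring the embedded shape.
  const { result_id: _ref, ...fields } = proc;
  const nested = result
    ? { type: result.type, size: result.size, content: result.content }
    : null;
  return { doc: { ...fields, result: nested }, reads };
}

// Embedding: a single read returns everything.
function getProcedureEmbedded(id) {
  return { doc: embeddedProcedures.get(id) ?? null, reads: 1 };
}
```

Both functions return the same view of procedure 333, but the referenced version needs a second read and has to handle a missing result itself.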
Embedding
• Advantages
  • Retrieve all relevant information in a single query/document
  • Avoid implementing joins in application code
  • Update related information as a single atomic operation
    • MongoDB (as of 3.2) doesn't offer multi-document transactions
• Limitations
  • Large documents mean more overhead if most fields are not relevant
  • 16 MB document size limit
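Atomicity
• Document operations are atomic
db.patients.update({_id: 12345},
  { $inc : { numProcedures : 1 },
    $push : { procedures : "proc123" },
    $set : { "addr.state" : "TX" }})
• No multi-document transactions (as of 3.2; MongoDB 4.0 later added them for replica sets)
// Illustrative only: beginTransaction()/endTransaction() do not exist in 3.2.
// This is the kind of multi-document unit you cannot express.
db.beginTransaction();
db.patients.update({_id: 12345}, …);
db.procedures.insert({_id: "proc123", …});
db.records.insert({_id: "rec123", …});
db.endTransaction();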
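To make single-document atomicity concrete, here is a toy JavaScript model of what the single `update` call above does to one patient document. It is a hypothetical sketch (`applyUpdate`, `getPath` and `setPath` are illustrative names, not MongoDB APIs) that supports only `$inc`, `$push` and `$set` with dotted paths. It applies every operator to a copy and returns the copy, loosely mirroring how the server applies all of one document's modifications or none of them.

```javascript
// Hypothetical helpers: read and write a value at a dotted path such as "addr.state".
function getPath(doc, path) {
  return path.split(".").reduce((node, key) => (node == null ? undefined : node[key]), doc);
}

function setPath(doc, path, value) {
  const keys = path.split(".");
  let node = doc;
  for (const key of keys.slice(0, -1)) {
    // Create intermediate embedded documents when they are missing.
    if (typeof node[key] !== "object" || node[key] === null) node[key] = {};
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
}

// Toy model of one update document with several operators.
// Changes are made on a deep copy, so the input document is never half-updated.
function applyUpdate(doc, update) {
  const next = JSON.parse(JSON.stringify(doc));
  for (const [path, amount] of Object.entries(update.$inc ?? {})) {
    setPath(next, path, (getPath(next, path) ?? 0) + amount);
  }
  for (const [path, value] of Object.entries(update.$push ?? {})) {
    const current = getPath(next, path) ?? [];
    if (!Array.isArray(current)) throw new Error(`${path} is not an array`);
    setPath(next, path, [...current, value]);
  }
  for (const [path, value] of Object.entries(update.$set ?? {})) {
    setPath(next, path, value);
  }
  return next;
}
```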
Referencing
• Advantages
  • Smaller documents
  • Less likely to reach the 16 MB document limit
  • Infrequently accessed information is not retrieved on every query
  • No duplication of data
• Limitations
  • Two queries required to retrieve information
  • Cannot update related information atomically
1-1: General Recommendations
• Embed
  • No additional data duplication
  • Can query or index on an embedded field
    • e.g., "result.type"
• Exceptional cases…
  • Embedding results in large documents
  • Set of infrequently accessed fields
{
"_id": 333,
"date": "2003-02-09T05:00:00",
"hospital": "County Hills",
"patient": "John Doe",
"physician": "Stephen Smith",
"type": "Chest X-ray",
"result": {
"type": "txt",
"size": 12,
"content": {
"value1": 343,
"value2": "abc"
}
}
}
1-M
Patients: Embed
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [
    {
      id: 12345,
      date: ISODate("2015-02-15"),
      type: "Cat scan",
      …},
    {
      id: 12346,
      date: ISODate("2015-02-15"),
      type: "blood test",
      …}]
}
1-M
Can be modeled in two ways
Patients: Reference
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [12345, 12346]}
Procedures
{
  _id: 12345,
  date: ISODate("2015-02-15"),
  type: "Cat scan",
  …}
{
  _id: 12346,
  date: ISODate("2015-02-15"),
  type: "blood test",
  …}
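With references, reading a patient together with its procedures takes a second query. A reasonable pattern is a single `$in` query for all referenced ids, e.g. `db.procedures.find({ _id: { $in: patient.procedures } })`. Because `$in` does not return documents in the order of the id list, the application restores that order itself. The sketch below is hypothetical: an in-memory array stands in for the `procedures` collection, and `proceduresFilter` / `resolveProcedures` are illustrative names.

```javascript
// Hypothetical in-memory stand-in for the "procedures" collection,
// deliberately stored in a different order than the patient's id list.
const proceduresColl = [
  { _id: 12346, date: "2015-02-15", type: "blood test" },
  { _id: 12345, date: "2015-02-15", type: "Cat scan" },
];

// The filter the second query would send: one round trip for all referenced ids.
function proceduresFilter(patient) {
  return { _id: { $in: patient.procedures } };
}

// Emulate running that filter, then put results back in the patient's array order.
// Ids with no matching procedure are skipped.
function resolveProcedures(patient, collection) {
  const wanted = new Set(proceduresFilter(patient)._id.$in);
  const byId = new Map(
    collection.filter((p) => wanted.has(p._id)).map((p) => [p._id, p]),
  );
  return patient.procedures.map((id) => byId.get(id)).filter(Boolean);
}
```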
1-M: General Recommendations
• Embed, when possible
  • Many are weak entities
  • Access all information in a single query
  • Take advantage of update atomicity
  • No additional data duplication
  • Can query or index on any field
    • e.g., { "phones.type": "mobile" }
• Exceptional cases:
  • 16 MB document size
  • Large number of infrequently accessed fields
M-M
Traditional Relational Association
[Diagram: a join table, HosPhysicanRel (hospitalId, physicianId), links Hospitals (name) and Physicians (name, specialty, phone); the join table is crossed out]
Use arrays instead
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [
    {
      id: 12345,
      name: "Joe Doctor",
      address: {…},
      …},
    {
      id: 12346,
      name: "Mary Well",
      address: {…},
      …}]
}
M-M
Embedding Physicians in Hospitals collection
{
  _id: 2,
  name: "Plainmont Hospital",
  city: "Omaha",
  beds: 85,
  physicians: [
    {
      id: 63633,
      name: "Harold Green",
      address: {…},
      …},
    {
      id: 12345,
      name: "Joe Doctor",
      address: {…},
      …}]
}
Data duplication… is OK!
M-M
Referencing
Hospitals
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [12345, 12346]
}
{
  _id: 2,
  name: "Plainmont Hospital",
  city: "Omaha",
  beds: 85,
  physicians: [63633, 12345]
}
Physicians
{
  _id: 63633,
  name: "Harold Green",
  hospitals: [2],
  …}
{
  _id: 12345,
  name: "Joe Doctor",
  hospitals: [1, 2],
  …}
{
  _id: 12346,
  name: "Mary Well",
  hospitals: [1],
  …}
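With two-way references, linking a physician to a hospital touches two documents. The following is a minimal sketch that assumes in-memory objects and uses the hypothetical helper names `addToSet` and `linkPhysician`. On a server, each side would be its own write using the real `$addToSet` update operator, and in 3.2 those two writes are not atomic together.

```javascript
// Hypothetical helper: append a value only if it is not already present,
// mirroring what the $addToSet update operator does to an array field.
function addToSet(arr, value) {
  return arr.includes(value) ? arr : [...arr, value];
}

// Hypothetical sketch: link a physician and a hospital on both sides.
// On a server this is two separate writes, for example
//   db.hospitals.update({ _id: hospitalId }, { $addToSet: { physicians: physicianId } })
//   db.physicians.update({ _id: physicianId }, { $addToSet: { hospitals: hospitalId } })
// and in 3.2 nothing makes those two writes atomic as a pair.
function linkPhysician(hospital, physician) {
  return {
    hospital: { ...hospital, physicians: addToSet(hospital.physicians, physician._id) },
    physician: { ...physician, hospitals: addToSet(physician.hospitals, hospital._id) },
  };
}
```

Because of the set semantics, linking the same pair twice is harmless.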
M-M: General Recommendation
• The use case determines whether to reference or embed:
  1. Data duplication
    • Embedding may result in data duplication
    • Duplication may be OK if reads dominate updates
    • Of the two, which one changes the least?
  2. Referencing may be required if there are many related items
  3. Hybrid approach
    • Potentially do both… it's OK!
Hospitals (reference Physicians)
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [12345, 12346]}
Physicians
{
  _id: 12345,
  name: "Joe Doctor",
  address: {…},
  …}
{
  _id: 12346,
  name: "Mary Well",
  address: {…},
  …}
Performance
Example 1: Hybrid Approach
Embed and Reference
Healthcare Example
[Diagram: patients and procedures collections]
Tailor Schema to Queries
Patient
{
  "_id" : 593340651,
  "first" : "Gregorio",
  "last" : "Lang",
  "addr" : {
    "street" : "623 Flowers Rd",
    "city" : "Groton",
    "state" : "NH",
    "zip" : 3266
  },
  "physicians" : [10387, 33456],
  "procedures" : ["551ac", "343fs"]
}
Procedure
{
  "_id" : "551ac",
  "date" : "2000-04-26",
  "hospital" : 161,
  "patient" : 593340651,
  "physician" : 10387,
  "type" : "Chest X-ray",
  "records" : ["67bc6"]
}
Find all patients from NH that have had chest X-rays
Tailor Schema to Queries (cont.)
Patient
{
  "_id" : 593340651,
  "first" : "Gregorio",
  "last" : "Lang",
  "addr" : {
    "street" : "623 Flowers Rd",
    "city" : "Groton",
    "state" : "NH",
    "zip" : 3266
  },
  "physicians" : [10387, 33456],
  "procedures" : [
    { id : "551ac",
      type : "Chest X-ray" },
    { id : "343fs",
      type : "Blood Test" }]
}
Procedure
{
  "_id" : "551ac",
  "date" : "2000-04-26",
  "hospital" : 161,
  "patient" : 593340651,
  "physician" : 10387,
  "type" : "Chest X-ray",
  "records" : ["67bc6"]
}
Find all patients from NH that have had chest X-rays
3.2's $lookup! (left outer join)
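The effect of embedding the procedure type can be sketched in plain JavaScript. `matches` is a hypothetical, much-simplified stand-in for MongoDB's query matching (equality on dotted paths only, descending into arrays the way `"procedures.type"` does). With a server, `hybridFilter` is the document you would pass to `db.patients.find(...)`. The data shapes follow the slides; the function names are illustrative.

```javascript
// Hypothetical, much-simplified matcher: equality on dotted paths only.
// Like MongoDB, it descends into arrays, so "procedures.type" matches when
// any element of "procedures" has that type. Positional paths are not handled.
function valuesAt(value, keys) {
  if (keys.length === 0) return [value];
  if (Array.isArray(value)) return value.flatMap((item) => valuesAt(item, keys));
  if (value === null || typeof value !== "object") return [];
  return valuesAt(value[keys[0]], keys.slice(1));
}

function matches(doc, filter) {
  return Object.entries(filter).every(([path, expected]) =>
    valuesAt(doc, path.split(".")).some((v) => v === expected));
}

// Reference-only schema: query procedures first, then patients (two queries).
function nhChestXrayTwoStep(patients, procedures) {
  const patientIds = new Set(
    procedures.filter((p) => matches(p, { type: "Chest X-ray" })).map((p) => p.patient),
  );
  return patients.filter((pt) => patientIds.has(pt._id) && matches(pt, { "addr.state": "NH" }));
}

// Hybrid schema: the procedure type is embedded, so one filter on patients suffices.
const hybridFilter = { "addr.state": "NH", "procedures.type": "Chest X-ray" };
function nhChestXrayOneStep(patients) {
  return patients.filter((pt) => matches(pt, hybridFilter));
}
```

Against reference-only patient documents the one-step filter finds nothing, because the procedure ids alone carry no type; that is exactly what the embedded `{id, type}` pairs buy.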
Example 2: Time Series Data
Medical Devices
Vital Sign Monitoring Device
Vital signs measured:
• Blood pressure
• Pulse
• Blood oxygen levels
Produces data at regular intervals:
• Once per minute
• Many devices, many hospitals
Data From Vital Signs Monitoring Device
{
deviceId: 123456,
ts: ISODate("2013-10-16T22:07:00.000-0500"),
spO2: 88,
pulse: 74,
bp: [128, 80]
}
• One document per minute per device
• Relational approach
Document Per Hour (By minute)
{
deviceId: 123456,
ts: ISODate("2013-10-16T22:00:00.000-0500"),
spO2: { 0: 88, 1: 90, …, 59: 92},
pulse: { 0: 74, 1: 76, …, 59: 72},
bp: { 0: [122, 80], 1: [126, 84], …, 59: [124, 78]}
}
• One document per device per hour
• Store per-minute data at the hourly level
• Update-driven workload
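Each per-minute reading then becomes an upsert against its device's hourly document. The helper below is a hypothetical sketch (`hourlyUpsert` is an illustrative name). It assumes hours are bucketed in UTC, which gives the same hour boundaries as the slide's -0500 timestamps because that offset is a whole number of hours. It returns the filter, update and options you would pass to the shell's `update(filter, update, { upsert: true })`.

```javascript
// Hypothetical helper for the document-per-hour model: turn one per-minute
// reading into the upsert that adds it to that device's hourly document.
function hourlyUpsert(reading) {
  const ts = new Date(reading.ts);
  const hourStart = new Date(ts);
  hourStart.setUTCMinutes(0, 0, 0); // truncate to the top of the (UTC) hour
  const minute = ts.getUTCMinutes(); // 0..59, used as the field key
  return {
    filter: { deviceId: reading.deviceId, ts: hourStart },
    update: {
      $set: {
        [`spO2.${minute}`]: reading.spO2,
        [`pulse.${minute}`]: reading.pulse,
        [`bp.${minute}`]: reading.bp,
      },
    },
    // The first reading of the hour inserts the document; later ones update it.
    options: { upsert: true },
  };
}
```

For the sample reading at 22:07 (-0500), this targets the 03:00 UTC document for device 123456 and sets `spO2.7`, `pulse.7` and `bp.7`.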
Characterizing Write Differences
• Example: data generated every minute
• Recording the data for 1 patient for 1 hour:
  • Document per event: 60 inserts
  • Document per hour: 1 insert, 59 updates
Characterizing Read Differences
• Want to graph 24 hours of vital signs for a patient:
  • Document per event: 1,440 reads
  • Document per hour: 24 reads
• Read performance is greatly improved
Characterizing Memory and Storage Differences
Metric | Document per Minute | Document per Hour
Number of documents | 52.6 billion | 876 million
Total index size | 6,364 GB | 106 GB
_id index | 1,468 GB | 24.5 GB
{ts: 1, deviceId: 1} index | 4,895 GB | 81.6 GB
Document size | 92 bytes | 758 bytes
Database size | 4,503 GB | 618 GB
• 100K devices
• 1 year's worth of data, at minute resolution (365 x 24 x 60)
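The document counts in the table follow directly from those two assumptions (100K devices, one year of per-minute readings):

```latex
\begin{aligned}
N_{\text{per minute}} &= 10^{5} \times 365 \times 24 \times 60 = 5.256 \times 10^{10} \approx 52.6\ \text{billion} \\
N_{\text{per hour}} &= 10^{5} \times 365 \times 24 = 8.76 \times 10^{8} = 876\ \text{million} \\
N_{\text{per minute}} / N_{\text{per hour}} &= 60
\end{aligned}
```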
MongoDB 3.2
MongoDB 3.2: a GIANT Release
[Timeline figure, MongoDB 2.2 to 3.2]
2.2: Agg. Framework, Location-Aware Sharding
2.4: Hash-Based Sharding, Roles, Kerberos, On-Prem Monitoring, Text Search
2.6: $out, Index Intersection, Field-Level Redaction, LDAP & x509, Auditing
3.0: Doc-Level Concurrency, Compression, Storage Engine API, ≤50 replicas, Auditing ++, Ops Manager
3.2: Document Validation, Fast Failover, Simpler Scalability, Aggregation ++, Encryption At Rest, In-Memory Storage Engine, BI Connector, $lookup, MongoDB Compass, APM Integration, Profiler Visualization, Auto Index Builds, Backups to File System
Tools
• mgenerate
  • Part of mtools: https://github.com/rueckstiess/mtools/wiki/mgenerate
  • Model the schema using a JSON definition
  • Generate millions of documents with random data
• How well does the schema work?
  • Queries, indexes, data size, index size, replication
• Demo
Documents are Rich Data Structures
{
  first_name: 'Paul',
  last_name: 'Miller',
  cell: 1234567890,
  city: 'London',
  location: [45.123, 47.232],
  professions: ['banking', 'finance', 'trader'],
  physicians: [
    { name: 'Canelo Álvarez, M.D.',
      last_visit: 'Mission Hospital',
      last_visit_dt: '20160501', … },
    { name: 'Érik Morales, M.D.',
      last_visit: 'Del Prado Hospital',
      last_visit_dt: '20160302', … }
  ]
}
Fields can contain an array of sub-documents
Fields
Typed field values
Fields can contain arrays
Fields can be indexed and queried at any level
ORM layer removed: data is already an object!
Schema using mgenerate
{
"first_name" : { "$string" : { "length" : 30 }},
"last_name" : { "$string" : { "length" : 30 }},
"cell" : "$number",
"city" : { "$string" : { "length" : 30 }},
"location" : [ "$number", "$number"],
"professions" : { "$array" : [ {
"$choose" : [ "banking", "finance", "trader" ] },
{ "$number": [1, 3] }
] },
"physicians" : { "$array" : [
{
"name" : { "$string" : { "length" : 30 }},
"last_visit" : { "$string" : { "length" : 30 }},
"last_visit_dt" : "$datetime"
},
{ "$number" : [1, 5]}
] }
}
> mgenerate --host localhost --port 27017 -d webinar -c patients --drop -n 100 patients.json
Use Compass to visualize & query data!
Visual Query Profiler
Identify your slow-running queries with the click of a button
Index Suggestions
Index recommendations to improve your deployment
MongoDB 3.2 $lookup
{ "_id": 593340651,
"first": "Gregorio",
"last": "Lang",
"addr": {
"street": "623 Flowers Rd",
"city": "Groton",
"state": "NH",
"zip": 3266 },
"physicians": [10387, 33456],
"procedures": ["551ac", "343fs"]
}
{
  "_id" : "551ac",
  "date" : "2000-04-26",
  "hospital" : 161,
  "patient" : 593340651,
  "physician" : 10387,
  "type" : "Chest X-ray",
  "records" : ["67bc6"]
}
Patient Procedure
Obtain Patient view with
Procedure details, but
without Physicians
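MongoDB 3.2 $lookup
db.PatientsColl.aggregate([
  // Select the single patient of interest
  { "$match" : { "_id": 593340651 }},
  // One document per procedure id (in 3.2 an array localField is compared as a whole, so unwind first)
  { "$unwind" : "$procedures"},
  // Left outer join each procedure id to ProceduresColl._id; matches land in "procs"
  { "$lookup" : {
      "from" : "ProceduresColl",
      "localField" : "procedures",
      "foreignField": "_id",
      "as" : "procs" }},
  // Flatten the "procs" array (ids with no match are dropped here)
  { "$unwind" : "$procs" },
  // Regroup into one patient document with an array of joined procedures
  { "$group" : { "_id" : { "_id" : "$_id",
                           "first" : "$first",
                           "last" : "$last",
                           "addr" : "$addr" },
                 "procedures" : { "$push" : "$procs"} }
  },
  // Restore top-level fields; keep only selected procedure fields (no physicians)
  { "$project" : { "_id" : "$_id._id",
                   "first" : "$_id.first",
                   "last" : "$_id.last",
                   "addr" : "$_id.addr",
                   "procedures._id" : 1,
                   "procedures.type" : 1,
                   "procedures.date" : 1 }
  }]);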
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
{
"_id": 593340651,
"first": "Gregorio",
"last": "Lang",
"addr": {
"street": "623 Flowers Rd",
"city": "Groton",
"state": "NH",
"zip": 3266
},
"procedures": [{
"_id": "551ac",
"date": "2000-04-26",
"type": "Chest X-ray"
}, {
"_id": "343fs",
"date": "2000-04-26",
"type": "Blood Test"
}]
}
Obtain Patient view with
Procedure details, but
without Physicians
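The pipeline's join semantics can be emulated in a few lines of JavaScript to show what "left outer join" means in practice. This is a hypothetical sketch, not the server's implementation. `unwind` mirrors 3.2's `$unwind`: missing, null or empty arrays produce no output, and a non-array value is treated as a one-element array. `lookup` mirrors `$lookup` for scalar equality, keeping every input document and attaching a possibly empty array.

```javascript
// Hypothetical emulation of { $unwind: "$field" } (3.2 default behavior).
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return [];
    const items = Array.isArray(value) ? value : [value];
    return items.map((item) => ({ ...doc, [field]: item }));
  });
}

// Hypothetical emulation of $lookup with scalar localField values:
// every input document is kept (left outer join), and `as` holds all
// foreign documents whose foreignField equals the local value.
function lookup(docs, { from, localField, foreignField, as }) {
  return docs.map((doc) => ({
    ...doc,
    [as]: from.filter((f) => f[foreignField] === doc[localField]),
  }));
}
```

Because the slide's pipeline follows `$lookup` with a second `$unwind`, a procedure id with no matching document is dropped from the final view; `$unwind`'s `preserveNullAndEmptyArrays` option (added in 3.2) keeps such documents instead.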
MongoDB 3.2 Document Validation
db.runCommand( {
collMod: "Patients",
validator: { $and: [
{ "first_name": { "$type": "string" }},
{ "last_name": { "$type": "string"}},
{ "physicians": { "$type": "array"}}
] },
validationLevel: "strict"
});
https://docs.mongodb.com/manual/core/document-validation/
All Patient records must have string values for the first and last name, and an array of physicians
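The validator's intended rules can be mirrored client-side to show which documents they are meant to accept. This is a hypothetical sketch (`isValidPatient` is an illustrative name). The server-side validator is what actually enforces the rules, and with `validationLevel: "strict"` it applies them to all inserts and updates.

```javascript
// Hypothetical client-side mirror of the validator's intent:
// first_name and last_name must be strings, physicians must be an array.
function isValidPatient(doc) {
  return typeof doc.first_name === "string" &&
         typeof doc.last_name === "string" &&
         Array.isArray(doc.physicians);
}
```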
Summary
01 Embedding and Referencing
02 Decisions: 1-1: Embed; 1-M: Embed when possible; M-M: Hybrid
03 Context of application data and query workload
04 Iterate: different schemas may result in dramatically different query performance, data/index size, and hardware requirements!
05 Tools! Measure data/index size and query performance: mgenerate/mtools, Compass, Cloud Manager / Ops Manager
06 3.2: $lookup, Document Validation
Q&A
Sigfrido Narváez
Sr. Solutions Architect, MongoDB

MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDBMongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDBMongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDBMongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
 

Recently uploaded

Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Skynet Technologies
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...panagenda
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Paige Cruz
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptxFIDO Alliance
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfdanishmna97
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityVictorSzoltysek
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewDianaGray10
 
الأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهلهالأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهلهMohamed Sweelam
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024Lorenzo Miniero
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!Memoori
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxjbellis
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...ScyllaDB
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxFIDO Alliance
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform EngineeringMarcus Vechiato
 

Recently uploaded (20)

Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 
الأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهلهالأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهله
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 

Webinar: MongoDB Schema Design and Performance Implications

  • 2. Sigfrido “Sig” Narváez, Sr. Solutions Architect, MongoDB, sigfrido@mongodb.com, @SigNarvaez
  • 3. Agenda: 01 Medical Record Example; 02 Schema Design: MongoDB vs. Relational; 03 Modeling Relationships; 04 Performance; 05 What’s new with 3.2; 06 Summary, Q&A
  • 5. Medical Records • Collects all patient information in a central repository • Provides a central point of access for: patients; care providers (physicians, nurses, etc.); billing; insurance reconciliation • Hospitals, physicians, patients, procedures, records • Diagram labels: Patient Records, Medications, Lab Results, Procedures; Hospital Records, Physicians, Patients, Nurses, Billing
  • 6. Medical Record Data • Hospitals: have physicians • Physicians: have patients; perform procedures; belong to hospitals • Patients: have physicians; are the subject of procedures • Procedures: associated with a patient; associated with a physician; have a record; variable metadata • Records: associated with a procedure; binary data; variable fields
  • 8. Schema Design: MongoDB vs. Relational
  • 9. MongoDB vs. Relational
      MongoDB                    | Relational
      Collections                | Tables
      Documents                  | Rows
      Data use                   | Data storage
      What questions do I have?  | What answers do I have?
  • 10. MongoDB vs. Relational
      Attribute     | MongoDB                  | Relational
      Storage       | N-dimensional            | Two-dimensional
      Field values  | 0, 1, many, or embed     | Single value
      Query         | Any field, at any level  | Any field
      Schema        | Flexible                 | Very structured
  • 13. Documents are Rich Data Structures { first_name: 'Paul', last_name: 'Miller', cell: 1234567890, city: 'London', location: [45.123, 47.232], professions: ['banking', 'finance', 'trader'], physicians: [ { name: 'Canelo Álvarez, M.D.', last_visit: 'Del Carmen Hospital', last_visit_dt: '20160501', … }, { name: 'Érik Morales, M.D.', last_visit: 'Del Prado Hospital', last_visit_dt: '20160302', … } ] } • Fields • Strongly typed field values • Fields can contain arrays • Fields can contain an array of sub-documents • Fields can be indexed and queried at any level • ORM layer removed: data is already an object!
  • 15. 1-1
  • 17. Document Schema: Referencing vs. Embedding • Referencing: use two collections with a reference field (“relational”). Procedure: patient, date, type, physician, type. Results: dataType, size, content: {…} • Embedding: Procedure: patient, date, type, results (equipmentId, data1, data2), physician, Results (type, size, content: {…})
  • 18. Referencing • Procedure: { "_id": 333, "date": "2003-02-09T05:00:00", "hospital": "County Hills", "patient": "John Doe", "physician": "Stephen Smith", "type": "Chest X-ray", "result_id": 134 } • Results: { "_id": 134, "type": "txt", "size": NumberInt(12), "content": { value1: 343, value2: "abc", … } }
  • 19. Embedding • Procedure: { "_id": 333, "date": "2003-02-09T05:00:00", "hospital": "County Hills", "patient": "John Doe", "physician": "Stephen Smith", "type": "Chest X-ray", "result": { "type": "txt", "size": NumberInt(12), "content": { value1: 343, value2: "abc", … } } }
  • 20. Embedding • Advantages: retrieve all relevant information in a single query/document; avoid implementing joins in application code; update related information as a single atomic operation (MongoDB 3.2 doesn’t offer multi-document transactions) • Limitations: large documents mean more overhead if most fields are not relevant; 16 MB document size limit
  • 21. Atomicity • Document operations are atomic: db.patients.update({ _id: 12345 }, { $inc: { numProcedures: 1 }, $push: { procedures: "proc123" }, $set: { "addr.state": "TX" } }) • No multi-document transactions: a block like the following is not possible in 3.2 (db.beginTransaction() and db.endTransaction() do not exist): db.beginTransaction(); db.patients.update({ _id: 12345 }, …); db.procedure.insert({ _id: "proc123", … }); db.records.insert({ _id: "rec123", … }); db.endTransaction();
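To make the single-document update concrete, here is a server-free sketch of what its three operators do to one patient document: $inc treats a missing field as 0, $push creates the array if it is absent, and $set with the dotted path "addr.state" changes (or creates) the embedded field. The starting values and the applyPatientUpdate helper are hypothetical; on a real server the whole update is applied atomically to the one document.

```javascript
// In-memory model of the slide's single-document update:
//   { $inc: { numProcedures: 1 }, $push: { procedures: "proc123" }, $set: { "addr.state": "TX" } }
// It only mimics the operators' effect; it is not how the server applies them.
function applyPatientUpdate(doc) {
  // Work on a deep copy so the caller's document is left untouched.
  const updated = JSON.parse(JSON.stringify(doc));
  // $inc: a missing numeric field is treated as 0 before incrementing.
  updated.numProcedures = (updated.numProcedures ?? 0) + 1;
  // $push: a missing array field is created holding just the pushed value.
  updated.procedures = [...(updated.procedures ?? []), "proc123"];
  // $set with a dotted path: create the embedded document if needed, then set the field.
  updated.addr = { ...(updated.addr ?? {}), state: "TX" };
  return updated;
}

// Hypothetical starting document for patient 12345.
const before = { _id: 12345, numProcedures: 2, procedures: ["proc121", "proc122"], addr: { street: "1 Main St", state: "CA" } };
const after = applyPatientUpdate(before);
console.log(after.numProcedures, after.procedures, after.addr.state);
```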
  • 23. Referencing • Advantages: smaller documents; less likely to reach the 16 MB document limit; infrequently accessed information is not loaded on every query; no duplication of data • Limitations: two queries are required to retrieve the information; related information cannot be updated atomically
  • 24. 1-1: General Recommendations • Embed: no additional data duplication; can query or index on the embedded field, e.g., "result.type" • Exceptional cases: embedding results in large documents; a set of infrequently accessed fields • { "_id": 333, "date": "2003-02-09T05:00:00", "hospital": "County Hills", "patient": "John Doe", "physician": "Stephen Smith", "type": "Chest X-ray", "result": { "type": "txt", "size": 12, "content": { "value1": 343, "value2": "abc" } } }
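A sketch of the 1-1 recommendation in plain JavaScript: fold the referenced Results document of the referencing example into its Procedure, producing the embedded shape with a "result" sub-document. The embedResult helper and the Map-based lookup are hypothetical; the shell commands in the trailing comment show how the embedded field could then be indexed and queried by its dotted path, assuming a collection named procedures.

```javascript
// Fold a referenced Results document into its Procedure (referencing -> embedding).
// Assumes each procedure carries a `result_id` matching a Results `_id`.
function embedResult(procedure, resultsById) {
  const { result_id, ...rest } = procedure;
  const result = resultsById.get(result_id);
  if (result === undefined) {
    throw new Error(`no Results document with _id ${result_id}`);
  }
  // The embedded copy drops its own _id: it now lives inside the procedure.
  const resultFields = { ...result };
  delete resultFields._id;
  return { ...rest, result: resultFields };
}

const procedure = { _id: 333, date: "2003-02-09T05:00:00", hospital: "County Hills", patient: "John Doe",
  physician: "Stephen Smith", type: "Chest X-ray", result_id: 134 };
const results = new Map([[134, { _id: 134, type: "txt", size: 12, content: { value1: 343, value2: "abc" } }]]);

const embedded = embedResult(procedure, results);
// On a server, the embedded field is then indexable and queryable by dotted path, e.g.:
//   db.procedures.createIndex({ "result.type": 1 })
//   db.procedures.find({ "result.type": "txt" })
```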
  • 25. 1-M
  • 26. 1-M modeled in 2 possible ways • Embed (Patients): { _id: 2, first: "Joe", last: "Patient", addr: { … }, procedures: [ { id: 12345, date: "2015-02-15", type: "Cat scan", … }, { id: 12346, date: "2015-02-15", type: "blood test", … } ] } • Reference (Patients → Procedures): { _id: 2, first: "Joe", last: "Patient", addr: { … }, procedures: [12345, 12346] }; { _id: 12345, date: "2015-02-15", type: "Cat scan", … }; { _id: 12346, date: "2015-02-15", type: "blood test", … }
  • 27. 1-M: General Recommendations • Embed, when possible: many are weak entities; access all information in a single query; take advantage of update atomicity; no additional data duplication; can query or index on any field, e.g., { "phones.type": "mobile" } • Exceptional cases: 16 MB document size; large number of infrequently accessed fields • { _id: 2, first: "Joe", last: "Patient", addr: { … }, procedures: [ { id: 12345, date: "2015-02-15", type: "Cat scan", … }, { id: 12346, date: "2015-02-15", type: "blood test", … } ] }
  • 28. M-M
  • 29. M-M: Traditional Relational Association • Join table: Physicians (name, specialty, phone), Hospitals (name), HosPhysicanRel (hospitalId, physicianId) • Instead of the join table, use arrays
  • 30. M-M: Embedding Physicians in the Hospitals collection • { _id: 1, name: "Oak Valley Hospital", city: "New York", beds: 131, physicians: [ { id: 12345, name: "Joe Doctor", address: { … }, … }, { id: 12346, name: "Mary Well", address: { … }, … } ] } • { _id: 2, name: "Plainmont Hospital", city: "Omaha", beds: 85, physicians: [ { id: 63633, name: "Harold Green", address: { … }, … }, { id: 12345, name: "Joe Doctor", address: { … }, … } ] } • Data duplication … is OK!
  • 31. M-M: Referencing • Hospitals: { _id: 1, name: "Oak Valley Hospital", city: "New York", beds: 131, physicians: [12345, 12346] }; { _id: 2, name: "Plainmont Hospital", city: "Omaha", beds: 85, physicians: [63633, 12345] } • Physicians: { _id: 12345, name: "Joe Doctor", hospitals: [1, 2], … }; { _id: 12346, name: "Mary Well", hospitals: [1], … }; { _id: 63633, name: "Harold Green", hospitals: [2], … }
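With referencing on both sides, each physician's hospitals array has to agree with the hospitals' physicians arrays. A sketch of a consistency check using the two hospital documents from the slide; the helpers physicianHospitals and findMismatches are hypothetical, and the physician arrays below are the ones implied by the hospitals' lists (12345 works at both hospitals, 12346 only at Oak Valley, 63633 only at Plainmont).

```javascript
// Derive each physician's `hospitals` array from the hospitals' `physicians` arrays.
function physicianHospitals(hospitals) {
  const byPhysician = new Map();
  for (const hospital of hospitals) {
    for (const physicianId of hospital.physicians) {
      if (!byPhysician.has(physicianId)) byPhysician.set(physicianId, []);
      byPhysician.get(physicianId).push(hospital._id);
    }
  }
  return byPhysician;
}

// Return the _ids of physicians whose stored `hospitals` disagree with the derived ones.
function findMismatches(hospitals, physicians) {
  const expected = physicianHospitals(hospitals);
  const sorted = (ids) => [...ids].sort((a, b) => a - b).join(",");
  return physicians
    .filter((p) => sorted(expected.get(p._id) ?? []) !== sorted(p.hospitals))
    .map((p) => p._id);
}

const hospitals = [
  { _id: 1, name: "Oak Valley Hospital", physicians: [12345, 12346] },
  { _id: 2, name: "Plainmont Hospital", physicians: [63633, 12345] },
];
const physicians = [
  { _id: 12345, name: "Joe Doctor", hospitals: [1, 2] },
  { _id: 12346, name: "Mary Well", hospitals: [1] },
  { _id: 63633, name: "Harold Green", hospitals: [2] },
];
console.log(findMismatches(hospitals, physicians)); // → []
```

Keeping both sides is what makes the hybrid approach convenient to read from either direction, at the cost of updating two documents when an association changes.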
  • 32. M-M: General Recommendation • The use case determines whether to reference or embed: 1. Data duplication: embedding may result in data duplication; duplication may be OK if reads dominate updates; of the two entities, which one changes the least? 2. Referencing may be required if there are many related items. 3. Hybrid approach: potentially do both. It’s OK! • Hospitals (referencing Physicians): { _id: 1, name: "Oak Valley Hospital", city: "New York", beds: 131, physicians: [12345, 12346] } • Physicians: { _id: 12345, name: "Joe Doctor", address: { … }, … }; { _id: 12346, name: "Mary Well", address: { … }, … }
  • 36. Tailor Schema to Queries • Query: find all patients from NH that have had chest X-rays • Patient: { "_id": 593340651, "first": "Gregorio", "last": "Lang", "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 }, "physicians": [10387, 33456], "procedures": ["551ac", "343fs"] } • Procedure: { "_id": "551ac", "date": "2000-04-26", "hospital": 161, "patient": 593340651, "physician": 10387, "type": "Chest X-ray", "records": ["67bc6"] }
  • 37. Tailor Schema to Queries (cont.) • Query: find all patients from NH that have had chest X-rays • Patient: { "_id": 593340651, "first": "Gregorio", "last": "Lang", "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 }, "physicians": [10387, 33456], "procedures": [ { "id": "551ac", "type": "Chest X-ray" }, { "id": "343fs", "type": "Blood Test" } ] } • Procedure: { "_id": "551ac", "date": "2000-04-26", "hospital": 161, "patient": 593340651, "physician": 10387, "type": "Chest X-ray", "records": ["67bc6"] } • 3.2’s $lookup!! (left outer join)
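With the procedure type copied into the patient document, the NH / chest X-ray question becomes a single query on one collection. Assuming a collection named patients, in the shell it would be db.patients.find({ "addr.state": "NH", "procedures.type": "Chest X-ray" }). Because that needs a server, the sketch below instead emulates in plain JavaScript how a dotted path such as procedures.type reaches into an array of sub-documents. The helpers valuesAtPath and matches are hypothetical and deliberately simplified (equality only, no operators, no numeric path segments), and the second patient is made up to show a non-match.

```javascript
// Collect every value a dotted path can reach; arrays are traversed element by element.
function valuesAtPath(value, path) {
  if (path.length === 0) {
    // An array leaf matches on the whole array or on any of its elements.
    return Array.isArray(value) ? [value, ...value] : [value];
  }
  if (Array.isArray(value)) {
    // Simplified: descend into every element of the array.
    return value.flatMap((item) => valuesAtPath(item, path));
  }
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[path[0]], path.slice(1));
}

// Equality-only filter: every path must reach at least one value equal to the expected one.
function matches(doc, filter) {
  return Object.entries(filter).every(([path, expected]) =>
    valuesAtPath(doc, path.split(".")).includes(expected)
  );
}

const patients = [
  { _id: 593340651, first: "Gregorio", last: "Lang", addr: { city: "Groton", state: "NH" },
    procedures: [{ id: "551ac", type: "Chest X-ray" }, { id: "343fs", type: "Blood Test" }] },
  { _id: 1, first: "Ann", last: "Example", addr: { state: "VT" }, // hypothetical non-match
    procedures: [{ id: "900aa", type: "Chest X-ray" }] },
];
const hits = patients.filter((p) => matches(p, { "addr.state": "NH", "procedures.type": "Chest X-ray" }));
console.log(hits.map((p) => p._id)); // → [ 593340651 ]
```

With the referenced schema of the previous slide, the type lives only in the procedures collection, so the same question needs a second query or a $lookup.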
  • 38. Example 2: Time Series Data: Medical Devices
  • 39. Vital Sign Monitoring Device • Vital signs measured: blood pressure, pulse, blood oxygen levels • Produces data at regular intervals: once per minute • Many devices, many hospitals
  • 40. Data From Vital Signs Monitoring Device • { deviceId: 123456, ts: ISODate("2013-10-16T22:07:00.000-0500"), spO2: 88, pulse: 74, bp: [128, 80] } • One document per minute per device • Relational approach
  • 41. Document Per Hour (By Minute) • { deviceId: 123456, ts: ISODate("2013-10-16T22:00:00.000-0500"), spO2: { 0: 88, 1: 90, …, 59: 92 }, pulse: { 0: 74, 1: 76, …, 59: 72 }, bp: { 0: [122, 80], 1: [126, 84], …, 59: [124, 78] } } • One document per device per hour • Store per-minute data at the hourly level • Update-driven workload
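Under the document-per-hour design, each new reading becomes an update that sets one minute's slot in the hourly document. A sketch that builds the filter and update for one reading; the hourlyUpdate helper, the collection name vitals and the use of an upsert are assumptions, and the hour bucket is computed in UTC (which coincides with the local hour for a whole-hour offset such as -05:00).

```javascript
// Build the filter/update that records one per-minute reading in its hourly document.
function hourlyUpdate(reading) {
  const ts = new Date(reading.ts);
  // Truncate to the start of the (UTC) hour: this identifies the hourly document.
  const hourStart = new Date(ts);
  hourStart.setUTCMinutes(0, 0, 0);
  const minute = ts.getUTCMinutes(); // slot key "0".."59"
  return {
    filter: { deviceId: reading.deviceId, ts: hourStart },
    update: {
      $set: {
        [`spO2.${minute}`]: reading.spO2,
        [`pulse.${minute}`]: reading.pulse,
        [`bp.${minute}`]: reading.bp,
      },
    },
  };
}

const exampleUpdate = hourlyUpdate({ deviceId: 123456, ts: "2013-10-16T22:07:00.000-05:00", spO2: 88, pulse: 74, bp: [128, 80] });
// In the mongo shell (assumed collection name), the first reading of an hour inserts the
// document and later readings update it:
//   db.vitals.updateOne(exampleUpdate.filter, exampleUpdate.update, { upsert: true })
```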
  • 42. Characterizing Write Differences • Example: data generated every minute • Recording the data for 1 patient for 1 hour: document per event = 60 inserts; document per hour = 1 insert, 59 updates
  • 43. Characterizing Read Differences • Goal: graph 24 hours of vital signs for a patient: document per event = 1,440 reads; document per hour = 24 reads • Read performance is greatly improved
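The write and read counts on these two slides follow from simple arithmetic. A sketch, assuming exactly one reading per minute and that the first reading of each hour creates the hourly document (for example via an upsert); writeOps and readOps are hypothetical names.

```javascript
// Writes needed to record `minutes` of one-per-minute readings for one device.
function writeOps(minutes) {
  const hours = Math.ceil(minutes / 60);
  return {
    perEvent: { inserts: minutes, updates: 0 },            // one document per reading
    perHour: { inserts: hours, updates: minutes - hours }, // one insert per hour, the rest are updates
  };
}

// Documents read to graph `hours` of vitals for one device.
function readOps(hours) {
  return { perEvent: hours * 60, perHour: hours };
}

console.log(readOps(24)); // → { perEvent: 1440, perHour: 24 }
```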
  • 44. Characterizing Memory and Storage Differences (100K devices; 1 year’s worth of data at minute resolution, 365 × 24 × 60)
      Metric                | Document Per Minute | Document Per Hour
      Number of documents   | 52.6 billion        | 876 million
      Total index size      | 6,364 GB            | 106 GB
      _id index             | 1,468 GB            | 24.5 GB
      {ts: 1, deviceId: 1}  | 4,895 GB            | 81.6 GB
      Document size         | 92 bytes            | 758 bytes
      Database size         | 4,503 GB            | 618 GB
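The table's headline numbers can be reproduced from its inputs. Note that 365 × 24 × 60 counts minutes per year, so the figures correspond to one reading per device per minute. The sketch below assumes the slide's "GB" means binary gigabytes (2^30 bytes), which is what makes 92-byte and 758-byte documents come out at 4,503 and 618; the variable names are illustrative.

```javascript
// Reproduce slide 44's document counts and data sizes from its stated inputs.
const devices = 100_000;
const minutesPerYear = 365 * 24 * 60; // 525,600 one-minute readings per device
const hoursPerYear = 365 * 24;        // 8,760 hourly documents per device
const GiB = 2 ** 30;                  // assumed meaning of the slide's "GB"

const docsPerMinuteSchema = devices * minutesPerYear; // 52,560,000,000 ≈ 52.6 billion
const docsPerHourSchema = devices * hoursPerYear;     // 876,000,000

// Data size ≈ document count × average document size (92 B vs 758 B on the slide).
const dataGiBPerMinuteSchema = (docsPerMinuteSchema * 92) / GiB; // ≈ 4,503
const dataGiBPerHourSchema = (docsPerHourSchema * 758) / GiB;    // ≈ 618

// 60× fewer documents; the slide's total index size shrinks by about the same factor.
const docRatio = docsPerMinuteSchema / docsPerHourSchema; // 60
const indexRatio = 6364 / 106;                            // ≈ 60

console.log(Math.round(dataGiBPerMinuteSchema), Math.round(dataGiBPerHourSchema)); // → 4503 618
```

The index figures track the document count rather than the data size, which is why bucketing by hour shrinks the indexes far more (about 60×) than the data itself (about 7×).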
  • 46. MongoDB 3.2 – a GIANT Release (timeline figure across releases 2.2, 2.4, 2.6, 3.0 and 3.2; features shown: Hash-Based Sharding, Roles, Kerberos, On-Prem Monitoring, Agg. Framework, Location-Aware Sharding, $out, Index Intersection, Text Search, Field-Level Redaction, LDAP & x509, Auditing, Document Validation, Fast Failover, Simpler Scalability, Aggregation ++, Encryption At Rest, In-Memory Storage Engine, BI Connector, $lookup, MongoDB Compass, APM Integration, Profiler Visualization, Auto Index Builds, Backups to File System, Doc-Level Concurrency, Compression, Storage Engine API, ≤50 replicas, Auditing ++, Ops Manager)
  • 47. Tools • mgenerate, part of mtools: https://github.com/rueckstiess/mtools/wiki/mgenerate • Model the schema using a JSON definition • Generate millions of documents with random data • How well does the schema work? Check queries, indexes, data size, index size, replication • Demo
  • 48. Documents are Rich Data Structures { first_name: 'Paul', last_name: 'Miller', cell: 1234567890, city: 'London', location: [45.123, 47.232], professions: ['banking', 'finance', 'trader'], physicians: [ { name: 'Canelo Álvarez, M.D.', last_visit: 'Mission Hospital', last_visit_dt: '20160501', … }, { name: 'Érik Morales, M.D.', last_visit: 'Del Prado Hospital', last_visit_dt: '20160302', … } ] } • Fields • Typed field values • Fields can contain arrays • Fields can contain an array of sub-documents • Fields can be indexed and queried at any level • ORM layer removed: data is already an object!
  • 49. Schema using mgenerate { "first_name" : { "$string" : { "length" : 30 }}, "last_name" : { "$string" : { "length" : 30 }}, "cell" : "$number", "city" : { "$string" : { "length" : 30 }}, "location" : [ "$number", "$number"], "professions" : { "$array" : [ { "$choose" : [ "banking", "finance", "trader" ] }, { "$number": [1, 3] } ] }, "physicians" : { "$array" : [ { "name" : { "$string" : { "length" : 30 }}, "last_visit" : { "$string" : { "length" : 30 }}, "last_visit_dt" : "$datetime" }, { "$number" : [1, 5]} ] } } > mgenerate --host localhost --port 27017 -d webinar -c patients --drop -n 100 patients.json
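For readers without mtools installed, the template's shape can be mimicked in a few lines of JavaScript. This is a hypothetical stand-in, not mgenerate or its template semantics: it keeps the field names and array lengths from patients.json (30-character strings, 1 to 3 professions, 1 to 5 physicians) but picks its own ranges for numbers and dates.

```javascript
// Plain-JS stand-in for the patients.json template (NOT mgenerate itself).
const ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

function randomInt(min, max) { // inclusive bounds
  return min + Math.floor(Math.random() * (max - min + 1));
}
function randomString(length) {
  let s = "";
  for (let i = 0; i < length; i++) s += ALPHABET[randomInt(0, ALPHABET.length - 1)];
  return s;
}
function choose(options) {
  return options[randomInt(0, options.length - 1)];
}
function randomDate() { // assumed range: 2000-01-01 up to 2016-01-01 UTC
  const start = Date.UTC(2000, 0, 1);
  const end = Date.UTC(2016, 0, 1);
  return new Date(start + Math.floor(Math.random() * (end - start)));
}

function randomPatient() {
  return {
    first_name: randomString(30),
    last_name: randomString(30),
    cell: randomInt(0, 9_999_999_999),
    city: randomString(30),
    location: [randomInt(0, 100), randomInt(0, 100)], // two numbers, arbitrary range
    professions: Array.from({ length: randomInt(1, 3) }, () => choose(["banking", "finance", "trader"])),
    physicians: Array.from({ length: randomInt(1, 5) }, () => ({
      name: randomString(30),
      last_visit: randomString(30),
      last_visit_dt: randomDate(),
    })),
  };
}

const batch = Array.from({ length: 100 }, () => randomPatient());
// In the mongo shell, such a batch could be written with db.patients.insertMany(batch).
```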
  • 50. Use Compass to visualize & query data!
  • 51. Visual Query Profiler: identify your slow-running queries with the click of a button • Index Suggestions: index recommendations to improve your deployment
  • 52. MongoDB 3.2 $lookup • Query: find all patients from NH that have had chest X-rays • Patient: { "_id": 593340651, "first": "Gregorio", "last": "Lang", "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 }, "physicians": [10387, 33456], "procedures": [ { "id": "551ac", "type": "Chest X-ray" }, { "id": "343fs", "type": "Blood Test" } ] } • Procedure: { "_id": "551ac", "date": "2000-04-26", "hospital": 161, "patient": 593340651, "physician": 10387, "type": "Chest X-ray", "records": ["67bc6"] } • 3.2’s $lookup!! (left outer join)
  • 53. MongoDB 3.2 $lookup • Goal: obtain a Patient view with Procedure details, but without Physicians • Patient: { "_id": 593340651, "first": "Gregorio", "last": "Lang", "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 }, "physicians": [10387, 33456], "procedures": ["551ac", "343fs"] } • Procedure: { "_id": "551ac", "date": "2000-04-26", "hospital": 161, "patient": 593340651, "physician": 10387, "type": "Chest X-ray", "records": ["67bc6"] }
  • 54. MongoDB 3.2 $lookup
    db.PatientsColl.aggregate([
      { "$match" : { "_id": 593340651 }},
      { "$unwind" : "$procedures" },
      { "$lookup" : { "from" : "ProceduresColl", "localField" : "procedures", "foreignField" : "_id", "as" : "procs" }},
      { "$unwind" : "$procs" },
      { "$group" : { "_id" : { "_id" : "$_id", "first" : "$first", "last" : "$last", "addr" : "$addr" }, "procedures" : { "$push" : "$procs" } } },
      { "$project" : { "_id" : "$_id._id", "first" : "$_id.first", "last" : "$_id.last", "addr" : "$_id.addr", "procedures._id" : 1, "procedures.type" : 1, "procedures.date" : 1 } }
    ]);
    https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
    Result: { "_id": 593340651, "first": "Gregorio", "last": "Lang", "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 }, "procedures": [{ "_id": "551ac", "date": "2000-04-26", "type": "Chest X-ray" }, { "_id": "343fs", "date": "2000-04-26", "type": "Blood Test" }] }
    Obtain Patient view with Procedure details, but without Physicians
  • 55. MongoDB 3.2 Document Validation: db.runCommand( { collMod: "Patients", validator: { $and: [ { "first_name": { "$type": "string" }}, { "last_name": { "$type": "string" }}, { "physicians": { "$type": "array" }} ] }, validationLevel: "strict" }); https://docs.mongodb.com/manual/core/document-validation/ All Patient records must have string values for the first and last name, and a list of Physicians
  • 56. Summary: 01 Embedding and Referencing; 02 1-1: Embed, 1-M: Embed when possible, M-M: Hybrid; 03 Context of Application Data and Query Workload Decisions; 04 Iterate: different schemas may result in dramatically different query performance, data/index size and hardware requirements!; 05 Tools! Measure data/index size and query performance with mgenerate/mtools, Compass, Cloud Manager / Ops Manager; 06 3.2: $lookup, Document Validation
  • 57. Q&A Sigfrido Narváez Sr. Solutions Architect, MongoDB

Editor's Notes

  1. Hi, my name is Sigfrido Narvaez, and I like to go by Sig. Today we will be talking about MongoDB schema design and some of its performance implications. We will also explore some of the new features in MongoDB 3.2 that are relevant to schema design, and some additional tools that will help you iterate and try out different approaches quickly. During the webinar, please feel free to type any questions in the chat box; at the end we will have a Q&A session and answer as many as we can.
  2. I am a Sr. Solutions Architect here at MongoDB, based in Southern California. Prior to joining, I was the Principal Software Architect for a hybrid cloud and polyglot persistence solution that used MongoDB; it relied on MongoDB’s flexible, dynamic schema to power cloud and mobile apps whose data originated mainly from many on-premises ERPs. I have also been organizing the Orange County MongoDB User Group (MUG) for almost 4 years. I have provided my email address and my Twitter handle in case I don't cover all the questions or you have any follow-ups, so please feel free to reach out afterwards and I will make sure to find the information you are looking for.
  3. Here is the agenda for today’s presentation. We will use a medical record example and explore its schema in MongoDB vs. relational, using Embedding and Referencing and comparing them against the classic 1-1, 1-M and M-M relationships. We will then jump into a performance analysis examining data and index growth, and finally explore new features in MongoDB 3.2.
  4. Our task is to design a schema for a medical information system, where we will need to store data for the Patients, the Physicians, the Procedures, and many other aspects of a medical system. All this data is interrelated, and we have to assume the system will be around for many years and will grow over time.
  5. (Left-down to right-down.) Let's examine the data entities that are going to be part of this system. First we have hospitals, and hospitals have many physicians. Then we have the physicians, who attend many patients, perform many procedures, and themselves belong to many hospitals. The patients, again, are attended by many physicians and are the subject of many procedures. The procedures are, of course, applied by a physician to a patient, inside a hospital, at a particular time, and the data produced by each of these procedures can vary a lot. For example, an X-ray procedure will produce a set of data along with an image or set of images, whereas a blood test will only produce data. Each procedure has different data, and that is a schema design problem. As we can see, the main entities and their relationships may be a great fit for a relational database.
  6. But the procedure data is not. Over time, procedures will change, use new medical devices or go through improvements, and may produce even more data with more variability, and we still have to keep the historical records. This is a real challenge for a relational database, but for MongoDB and the flexible document model it is easy. We would model this by giving every procedure document the common data points all procedures share, such as the timestamp, the physician, the patient, the hospital and any other common fields, plus a variable section in the JSON document for the data points unique to each procedure. This makes good use of MongoDB's polymorphic schema capabilities, and in modern languages it can be modeled with base classes and extensions, or inheritance.
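The base-class-and-extension idea above can be sketched in plain JavaScript. This is a minimal illustration under assumptions, not the author's code: the class names, the `details` field holding the variable section, and all lab values are hypothetical; only the common fields (`_id`, `date`, `hospital`, `patient`, `physician`, `type`) come from the slides.

```javascript
// Hypothetical sketch of a polymorphic procedure schema: a base class holds the
// fields every procedure shares; subclasses supply the variable "details" part.
class Procedure {
  constructor(common, type) {
    // Common data points: _id, date, patient, physician, hospital, plus the type.
    this.common = { ...common, type };
  }

  // Variable section of the document; each procedure type overrides this.
  details() {
    return {};
  }

  // Plain object in the shape that would be stored as one document.
  toDocument() {
    return { ...this.common, details: this.details() };
  }
}

class ChestXRay extends Procedure {
  constructor(common, imageIds, findings) {
    super(common, "Chest X-ray");
    this.imageIds = imageIds;
    this.findings = findings;
  }

  details() {
    return { imageIds: this.imageIds, findings: this.findings };
  }
}

class BloodTest extends Procedure {
  constructor(common, panel) {
    super(common, "Blood Test");
    this.panel = panel;
  }

  details() {
    return { panel: this.panel };
  }
}

// Illustrative values; the detail fields and lab numbers are made up.
const common = { _id: "551ac", date: "2000-04-26", hospital: 161, patient: 593340651, physician: 10387 };
const docs = [
  new ChestXRay(common, ["67bc6"], "no acute findings").toDocument(),
  new BloodTest({ ...common, _id: "343fs" }, { hemoglobin: 14.1, wbc: 6.2 }).toDocument(),
];
```

Both documents share the same top-level keys, so they can live in one collection and be queried on the common fields, while `details` differs freely per procedure type.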
  7. Before we go into the modeling exercises, let's level-set our understanding of MongoDB concepts versus relational concepts.
  8. In MongoDB, data is stored in a collection, which is analogous to a table. Collections contain documents, which are analogous to rows or records. More importantly, in MongoDB we think about what data we need to use and how it will be used, rather than how it will be stored. In MongoDB we look at the queries to guide schema design decisions, whereas in relational databases we model first, then answer questions, and eventually add indexes and, in some cases, denormalize data to support queries and performance over time.
  9. Another difference is that in MongoDB documents have many dimensions, versus just two (rows and columns). Each field can contain 0, 1 or many values, such as an array, or even embedded values such as sub-documents, and the type can vary from document to document, versus a single value of a pre-defined type. I can also query any field at any level in the document, versus a single field, and we
  10. So when we start modeling data, the first thing to avoid is trying to model every small detail up front, including things we may not use immediately; that usually leads to complex, over-normalized schemas.
  11. DO NOT perform 3rd normal form modeling and create hundreds of tables, with join tables for every M-M relationship and all kinds of entities that will be very difficult to join, will slow down performance, and will be hard to maintain over time.
  12. Instead, we create rich data structures as single documents. As you can see in this example, we have many fields about a patient: where they live, what professions they practice, the list of physicians they are currently seeing, when the last visit was, etc. So I get a quick view of a patient from a single document. MongoDB has typed values such as strings and numbers, but also richer structures such as coordinates and arrays of sub-documents. In MongoDB I can query and index using any number of fields at any level, and the document is already in object form, so I don't need an ORM layer like Hibernate or Entity Framework to translate data from relational to object: the data is already an object.
  13. There are two ways to model relationships: Referencing and Embedding. Referencing is a very relational-like approach in which I store the IDs of related documents across collections. But take into account that MongoDB does not enforce foreign key constraints, so if you delete a master document you will likely end up with orphans, and this has to be handled at the application level. Embedding is more natural to MongoDB, and it works by nesting data inside a single document. There may or may not be a need to generate an ID for nested data, but there is certainly no need to duplicate IDs, as everything lives together.
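To make the orphan problem concrete, here is a minimal in-memory sketch, assuming plain arrays stand in for collections. The helper `findOrphanRecords` and the record `_id` `"99zz0"` are hypothetical; following the talk, the procedure holds its record IDs.

```javascript
// Hypothetical in-memory stand-ins for two collections linked by reference.
// As in the talk, the procedure holds the IDs of its records.
const procedures = [
  { _id: "551ac", type: "Chest X-ray", records: ["67bc6"] },
];
const records = [
  { _id: "67bc6", image: "<binary>" },
  { _id: "99zz0", image: "<binary>" }, // its procedure was deleted
];

// Nothing in the database prevents the procedure delete, so the application
// has to detect (and clean up) records that no procedure references anymore.
function findOrphanRecords(procs, recs) {
  const referenced = new Set(procs.flatMap((p) => p.records));
  return recs.filter((r) => !referenced.has(r._id));
}

// Embedding: the record lives inside the procedure document, so deleting
// the procedure deletes the record too and no orphan can be left behind.
const embeddedProcedure = {
  _id: "551ac",
  type: "Chest X-ray",
  records: [{ _id: "67bc6", image: "<binary>" }],
};

const orphans = findOrphanRecords(procedures, records);
```

In a real deployment this check would be a periodic job or part of the delete path in application code; the point is only that the responsibility sits outside the database.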
  14. So how does this apply to our medical schema? Let’s look at Procedures and Results. With Referencing, I could use two collections and have a relationship between them. With Embedding, I could embed the results inside the procedure. Now, something to think about: which of these two entities is the strong entity and which is the weak one? Clearly Results is weak, as it cannot exist without a Procedure.
  15. Here is what the referencing approach would look like. Obtaining all the data I need requires two reads and two round trips to the database. Notice we have placed the result ID in the procedure. Why? Because my application will display Procedures and their Results, so I only need to read the Procedure collection and then look up the Result document by its ID, and I can perform this lookup in the application layer. And to hint at a later section of the presentation: with MongoDB 3.2 I can use the $lookup pipeline stage to perform what is essentially a left-outer join at the database layer. Take a second to think about this design. Using classic relational modeling and considering the strong and weak entities, I would probably have placed the ProcedureID in the Results, but then I would need to create an additional index, which costs disk and memory.
  16. However, with the Embedding approach, this is quite easy to model and getting my data requires a single read and a single roundtrip to the database.
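The difference in round trips between the two approaches can be counted with a small simulation. This is a sketch under assumptions: `Map`s stand in for collections, each `find*` call stands for one query, `findByIds` mimics a single `$in` query, and the result IDs `r1`/`r2` are hypothetical.

```javascript
// Counts simulated database round trips; each find* call stands for one query.
let roundTrips = 0;

function findById(collection, id) {
  roundTrips += 1;
  return collection.get(id) || null;
}

// One query with an $in-style filter over several IDs.
function findByIds(collection, ids) {
  roundTrips += 1;
  return ids.map((id) => collection.get(id)).filter(Boolean);
}

// Referencing: the procedure stores the IDs of its results.
const proceduresRef = new Map([["551ac", { _id: "551ac", type: "Chest X-ray", results: ["r1", "r2"] }]]);
const resultsRef = new Map([
  ["r1", { _id: "r1", summary: "image 1" }],
  ["r2", { _id: "r2", summary: "image 2" }],
]);

// Embedding: the results live inside the procedure document.
const proceduresEmb = new Map([
  ["551ac", {
    _id: "551ac",
    type: "Chest X-ray",
    results: [{ _id: "r1", summary: "image 1" }, { _id: "r2", summary: "image 2" }],
  }],
]);

// Application-level join: read the procedure, then read its results.
function loadReferenced(procId) {
  const proc = findById(proceduresRef, procId);
  return { ...proc, results: findByIds(resultsRef, proc.results) };
}

// Single read: the document already contains everything.
function loadEmbedded(procId) {
  return findById(proceduresEmb, procId);
}

roundTrips = 0;
const viaReference = loadReferenced("551ac");
const referencedTrips = roundTrips; // 2

roundTrips = 0;
const viaEmbedding = loadEmbedded("551ac");
const embeddedTrips = roundTrips; // 1
```

Both paths return the same shape of data; only the number of trips to the database differs.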
  17. So the advantages of embedding are that I can retrieve all relevant information by reading a single document, I don't have to implement any joins in my application code, and when I update or insert data, it is a single atomic operation. Consider that MongoDB, at this time, does not offer multi-document, multi-collection transactions. Let’s talk about atomicity for a bit.
  18. In a single database command, we can update many fields, or the whole document. If there are concurrent reads and writes to the same document, the application will see the document before or after the update, but never in between. So a single update statement can alter the complete document or parts of it, as we see in this example, and that is atomic. (Explain the particular operation.) What is not possible, up to MongoDB 3.2, is a multi-document transaction: you cannot begin a transaction, perform operations and then either commit or roll back. As you may have guessed already, Embedding takes advantage of MongoDB’s document-level atomicity.
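The "before or after, never in between" behavior can be illustrated with a copy-then-swap emulation. This only mimics what a reader observes; it is not how MongoDB's storage engine implements atomicity. The `updateOne` helper here is hypothetical and supports only top-level `$set` and `$push`; the new city value is made up.

```javascript
// The committed version of one patient document (illustrative values).
let current = Object.freeze({ _id: 593340651, city: "Groton", state: "NH", physicians: [10387] });

// Readers always get one complete, committed version.
function read() {
  return current;
}

// Emulates a subset of update operators: $set on top-level fields and $push.
// All changes are applied to a private copy, then published in a single swap,
// so no reader can observe a half-applied update.
function updateOne(update) {
  const next = { ...current };
  for (const [field, value] of Object.entries(update.$set || {})) {
    next[field] = value;
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    next[field] = [...(next[field] || []), value];
  }
  current = Object.freeze(next);
}

const before = read();
// Roughly what db.patients.updateOne({ _id: 593340651 },
//   { $set: { city: "Concord" }, $push: { physicians: 33456 } }) does to one document.
updateOne({ $set: { city: "Concord" }, $push: { physicians: 33456 } });
const after = read();
```

A reader holding `before` still sees the old city and one physician; a reader calling `read()` afterwards sees both changes at once.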
  19. But there are limitations. A large document carries more overhead, and there is a 16MB document size limit, although 16MB of BSON is a considerable amount of data. Larger documents also cost more to read and update, especially when much of that data rarely changes.
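A quick back-of-the-envelope check helps decide whether an embedded array could ever approach the limit. The sketch assumes the 16 MB BSON limit equals 16 × 1024 × 1024 bytes; the item and parent sizes are hypothetical averages, and real BSON encoding adds per-field overhead this ignores.

```javascript
// MongoDB's maximum BSON document size: 16 megabytes (16 * 1024 * 1024 bytes).
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Rough upper bound on how many embedded items of a given average size fit in
// one document, after reserving space for the parent's own fields.
function maxEmbeddedItems(avgItemBytes, parentBytes = 0) {
  return Math.floor((MAX_BSON_BYTES - parentBytes) / avgItemBytes);
}

// Hypothetical sizes: 2 KB per embedded result, 4 KB of procedure fields.
const smallResults = maxEmbeddedItems(2048, 4096); // 8190
// Hypothetical: a 500 KB embedded image fits only a few dozen times.
const largeImages = maxEmbeddedItems(500 * 1024); // 32
```

If the expected number of embedded items over the document's lifetime is anywhere near these bounds, referencing (or a hybrid) is the safer choice.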
  20. Referencing is the exact opposite of embedding: use it to avoid duplication (1-M).
  21. Always look at embedding first, and then prove that embedding doesn’t work. You can always query on any embedded information. Be careful with extra-large documents, or with embedded data that is not accessed frequently.
  22. Mixed or hybrid approach: reference to keep the master data, but also embed the latest or most-used data for speed.
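One way to sketch the hybrid approach: every procedure is stored in full in its own collection, and the patient embeds a capped list of the latest summaries. The cap of 3, the field name `recentProcedures` and the sample procedures are hypothetical. In the shell, the capped embed would roughly correspond to `$push` with the `$each` and `$slice` modifiers.

```javascript
// Hypothetical cap on how many summaries are embedded in the patient.
const MAX_RECENT = 3;

// Master data: every procedure is stored in full in its own collection.
const proceduresColl = [];

const patient = { _id: 593340651, first: "Gregorio", last: "Lang", recentProcedures: [] };

// Reference for completeness, embed a small summary of the latest items for speed.
// In the shell, the embedded part is roughly:
//   { $push: { recentProcedures: { $each: [summary], $slice: -3 } } }
function addProcedure(patientDoc, proc) {
  proceduresColl.push(proc);
  const summary = { _id: proc._id, type: proc.type, date: proc.date };
  patientDoc.recentProcedures = [...patientDoc.recentProcedures, summary].slice(-MAX_RECENT);
}

// Illustrative procedures.
const samples = [
  { _id: "p1", type: "Blood Test", date: "2016-01-10" },
  { _id: "p2", type: "Chest X-ray", date: "2016-02-03" },
  { _id: "p3", type: "Blood Test", date: "2016-03-15" },
  { _id: "p4", type: "MRI", date: "2016-04-20" },
  { _id: "p5", type: "Blood Test", date: "2016-05-01" },
];
for (const s of samples) {
  addProcedure(patient, { ...s, patient: patient._id });
}
```

After five procedures, the patient document shows only the three most recent summaries, while the procedures collection still holds all five in full.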
  23. Avoid join tables! What is a join table? A list of key pairs that relate independent entities. In MongoDB we have arrays instead, and the relationship can be modeled by embedding or by referencing.
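A many-to-many relationship without a join table can be sketched by storing an array of physician IDs on each patient. The second patient and the physician names are hypothetical. The key behavior being mirrored is that a MongoDB equality match on an array field matches when any element equals the value.

```javascript
// Illustrative patients and physicians; the second patient and the physician
// names are made up. Patients keep an array of physician IDs instead of a join table.
const patients = [
  { _id: 593340651, last: "Lang", physicians: [10387, 33456] },
  { _id: 100000001, last: "Miller", physicians: [33456] },
];
const physicians = [
  { _id: 10387, name: "Dr. A" },
  { _id: 33456, name: "Dr. B" },
];

// Like db.patients.find({ physicians: 33456 }): an equality match on an array
// field matches when any element equals the value.
function patientsOf(physicianId) {
  return patients.filter((p) => p.physicians.includes(physicianId));
}

// Going the other way follows the IDs stored on the patient.
function physiciansOf(patientId) {
  const patient = patients.find((p) => p._id === patientId);
  return patient ? physicians.filter((d) => patient.physicians.includes(d._id)) : [];
}
```

An index on the `physicians` array (a multikey index) would keep the `patientsOf` direction fast in a real collection.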
  24. Using Embedding, arrays can be used. Data duplication will happen, and this is not as bad an idea as it is in relational. Notice how we are denormalizing some of the fields we need most often (like the doctors' names) and can still satisfy our queries very quickly. The downside: if the fields we duplicate change, we have maintenance work or stale data. So take into account which fields are unlikely to change, such as a doctor’s name. What to do if the fields change often?
  25. If the fields change quite often, then perhaps we could revert to Referencing, knowing we may need to hit the DB multiple times.
  26. The decision really depends on your application: Fast queries? Atomic updates? Data maintenance when duplicating: how often does the data change? Is the workload read- or write-intensive?
  27. Let’s look at Patients and Procedures
  28. Hypothetically, say we decided to always use Referencing. Look at the queries, e.g. find all patients from a state who have had a particular procedure. That is a very difficult query, with bad performance: query the Patients collection for New Hampshire and get the patient IDs, then query all procedures of type X-ray for those patient IDs, with the join code in the application.
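The two-query, application-side join described above might look like this, with arrays standing in for the Patients and Procedures collections. The function name and the two extra patients and their procedures are hypothetical; the first patient and its procedures come from the slides.

```javascript
// Illustrative data; only the first patient and its procedures come from the slides.
const patients = [
  { _id: 593340651, addr: { state: "NH" }, procedures: ["551ac", "343fs"] },
  { _id: 100000001, addr: { state: "CA" }, procedures: ["777aa"] },
  { _id: 100000002, addr: { state: "NH" }, procedures: ["888bb"] },
];
const procedures = [
  { _id: "551ac", patient: 593340651, type: "Chest X-ray" },
  { _id: "343fs", patient: 593340651, type: "Blood Test" },
  { _id: "777aa", patient: 100000001, type: "Chest X-ray" },
  { _id: "888bb", patient: 100000002, type: "Blood Test" },
];

function patientsWithProcedure(state, type) {
  // Query 1 against Patients: everyone in the state; keep only their IDs.
  const ids = new Set(patients.filter((p) => p.addr.state === state).map((p) => p._id));
  // Query 2 against Procedures: that type, for those patient IDs ($in).
  const hits = procedures.filter((pr) => pr.type === type && ids.has(pr.patient));
  // Join in application code: map the matches back to patient documents.
  const matched = new Set(hits.map((pr) => pr.patient));
  return patients.filter((p) => matched.has(p._id));
}
```

With a real state's population, the ID list passed from query 1 into query 2 can get very large, which is where the bad performance comes from.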
  29. Referencing and embedding combined: the patient now contains the type of each procedure. By embedding a small amount of procedure information, the search can be executed as a single query. If “Chest X-Ray” changes, it has to be changed everywhere, but it very seldom changes, maybe once a decade!
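With the procedure type embedded next to each reference, the same search becomes one filter over patients, roughly `db.patients.find({ "addr.state": "NH", "procedures.type": "Chest X-ray" })`. The sketch below emulates that in memory and also counts what a rename would touch; `renameProcedureType`, the extra patients and the new type name are hypothetical.

```javascript
// Hybrid patient documents: procedure IDs are still references, but the
// rarely-changing procedure type is embedded next to each ID.
const patients = [
  { _id: 593340651, addr: { state: "NH" },
    procedures: [{ id: "551ac", type: "Chest X-ray" }, { id: "343fs", type: "Blood Test" }] },
  { _id: 100000001, addr: { state: "CA" }, procedures: [{ id: "777aa", type: "Chest X-ray" }] },
  { _id: 100000002, addr: { state: "NH" }, procedures: [{ id: "888bb", type: "Blood Test" }] },
];

// Single query, roughly db.patients.find({ "addr.state": "NH", "procedures.type": "Chest X-ray" }).
function patientsWithProcedure(state, type) {
  return patients.filter((p) => p.addr.state === state && p.procedures.some((pr) => pr.type === type));
}

// The cost of the duplication: renaming a type touches every patient that embeds it.
// Returns the number of patient documents modified.
function renameProcedureType(oldType, newType) {
  let modified = 0;
  for (const p of patients) {
    let changed = false;
    for (const pr of p.procedures) {
      if (pr.type === oldType) {
        pr.type = newType;
        changed = true;
      }
    }
    if (changed) modified += 1;
  }
  return modified;
}
```

The trade-off is visible in the two functions: reads become a single query, while a rename must visit every patient that embeds the old value, which is acceptable only because it happens so rarely.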
  30. Tons of data flowing into MongoDB every second.
  31. Patients' pulse, blood pressure, from which device, when, etc. The schema is easy if we create one record per event, but let's analyze the consequences. We get millions of records very quickly, and a lot of the same data repeats, e.g. the device ID, the patient ID, and most of the timestamp. Index space will grow significantly, and operations and queries will be expensive too!
  32. Store one document per hour, vs. one document per minute! Each document will contain 60 minutes of data, and the pulse is a two-dimensional array.
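A minimal sketch of the hourly bucket, assuming one reading per second and a pre-allocated `pulse[minute][second]` grid. The field names, the key format and `recordPulse` are hypothetical. Only the first reading of each hour creates a document; every other reading is an update, which in the shell would be a `$set` on an array path such as `"pulse.5.30"` against the existing bucket.

```javascript
// Hypothetical hourly bucket: one document per device, patient and hour,
// with pulse readings stored as pulse[minute][second].
function hourStart(ts) {
  const d = new Date(ts);
  d.setUTCMinutes(0, 0, 0);
  return d.toISOString();
}

function newBucket(deviceId, patientId, hour) {
  return {
    deviceId,
    patientId,
    hour,
    // Pre-allocated 60 x 60 grid; null means "no reading yet".
    pulse: Array.from({ length: 60 }, () => new Array(60).fill(null)),
  };
}

// Stores one reading. Returns "insert" when the hour's bucket had to be
// created, otherwise "update" (one cell set in an existing bucket).
function recordPulse(buckets, deviceId, patientId, ts, value) {
  const hour = hourStart(ts);
  const key = `${deviceId}|${patientId}|${hour}`;
  let op = "update";
  if (!buckets.has(key)) {
    buckets.set(key, newBucket(deviceId, patientId, hour));
    op = "insert";
  }
  const d = new Date(ts);
  buckets.get(key).pulse[d.getUTCMinutes()][d.getUTCSeconds()] = value;
  return op;
}
```

Feeding two hours of one-per-second readings through `recordPulse` produces 2 inserts and 7,198 updates, instead of 7,200 inserts with the one-document-per-event design; the device ID, patient ID and hour are stored once per bucket rather than once per reading.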
  33. In general, an update is less costly than an insert; in this case we are creating less write workload by doing more updates than inserts.
  34. Graph of 1 day of activity: substantially fewer read IOPS, which means reading is faster.
  35. Orders-of-magnitude differences!! When planning for 1 year's worth of data, ALWAYS consider what indexes are needed and the size of those indexes. (Revisit for formatting and easy reading.) Consider the hardware needed!! Servers with hundreds of GB of RAM are easier to get than servers with TBs, and the same goes for disk space. (Summarize total HW in the bottom row.)
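The order-of-magnitude gap in document counts can be checked with simple arithmetic. This assumes one reading per second per device and a 365-day year; the fleet of 1,000 devices is hypothetical. Since every document gets at least one `_id` index entry, index entry counts scale the same way.

```javascript
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60; // 31,536,000 (non-leap year)

// Documents per year for one device sending one reading per second,
// given how many readings are packed into each document.
function docsPerYear(readingsPerDoc) {
  return SECONDS_PER_YEAR / readingsPerDoc;
}

const perEvent = docsPerYear(1); // 31,536,000
const perMinute = docsPerYear(60); // 525,600
const perHour = docsPerYear(3600); // 8,760

// Hypothetical fleet size, to show how the gap scales.
const DEVICES = 1000;
const fleet = {
  perEvent: perEvent * DEVICES, // 31,536,000,000
  perHour: perHour * DEVICES, // 8,760,000
};
```

Going from one document per event to one per hour cuts the document count, and the `_id` index entries with it, by a factor of 3,600.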
  36. Use mgenerate to model data to see actual data sizes!
  37. Quickly identify your slow-running queries. Part of MongoDB Ops Manager, the Visual Query Profiler displays how query and write latency vary over time With the click of a button, the Visual Query Profiler consolidates and displays metrics from all your nodes on a single screen
  38. Let’s go back to this example from earlier, and imagine that the Procedure Name changes quite often, and we have decided to reference instead of embed.
  39. But I also want a view of the data that has just the Patient and his or her Procedures, but not the Physicians.
  40. Using $lookup, I can do this.
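To see what each stage of the slide 54 pipeline contributes, here is a plain-JavaScript emulation over the slide data. It is an illustration, not MongoDB's implementation. `$lookup` itself is a left outer join (it yields an empty `procs` array when nothing matches), but the following `$unwind` drops such rows by default, which the second `flatMap` mirrors. Grouping by `_id` alone stands in for the slide's compound group key, since first, last and addr are per-patient anyway. The Blood Test procedure's fields beyond `_id`, date and type are partly illustrative.

```javascript
// Sample data from the slides (the Blood Test procedure is partly illustrative).
const patientsColl = [{
  _id: 593340651, first: "Gregorio", last: "Lang",
  addr: { street: "623 Flowers Rd", city: "Groton", state: "NH", zip: 3266 },
  physicians: [10387, 33456],
  procedures: ["551ac", "343fs"],
}];
const proceduresColl = [
  { _id: "551ac", date: "2000-04-26", hospital: 161, patient: 593340651, physician: 10387, type: "Chest X-ray", records: ["67bc6"] },
  { _id: "343fs", date: "2000-04-26", patient: 593340651, type: "Blood Test" },
];

function patientView(patientId) {
  const rows = patientsColl
    .filter((p) => p._id === patientId) // $match
    .flatMap((p) => p.procedures.map((procId) => ({ ...p, procedures: procId }))) // $unwind "$procedures"
    .map((p) => ({ ...p, procs: proceduresColl.filter((pr) => pr._id === p.procedures) })) // $lookup
    .flatMap((p) => p.procs.map((pr) => ({ ...p, procs: pr }))); // $unwind "$procs"

  // $group per patient, then $project: drop physicians, keep _id/type/date of each procedure.
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row._id)) {
      groups.set(row._id, { _id: row._id, first: row.first, last: row.last, addr: row.addr, procedures: [] });
    }
    const { _id, type, date } = row.procs;
    groups.get(row._id).procedures.push({ _id, type, date });
  }
  return [...groups.values()];
}
```

`patientView(593340651)` returns one document with the patient's name and address and two trimmed procedures, matching the result shown on slide 54, with no physicians field.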
  41. Finally, when the schema is done, working and performing well, and I am in production, I may want to lock it down. I can do this with Document Validation in 3.2.
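The slide 55 validator can be mirrored as a plain predicate to reason about which writes it would reject. This is only an emulation: `isValidPatient` and `insertOne` are hypothetical helpers, and the error text is a placeholder. `Array.isArray` matches what `$type: "array"` does from MongoDB 3.6 on; the MongoDB documentation notes that earlier versions matched `$type` array only for fields containing a nested array, so it is worth testing the real validator on a 3.2 deployment.

```javascript
// Mirrors the validator on the slide:
// { $and: [ { first_name: { $type: "string" } }, { last_name: { $type: "string" } },
//           { physicians: { $type: "array" } } ] }
function isValidPatient(doc) {
  return typeof doc.first_name === "string" &&
    typeof doc.last_name === "string" &&
    Array.isArray(doc.physicians);
}

// With validationLevel "strict" the rule applies to every insert and update;
// with the default validationAction "error" an invalid write is rejected.
function insertOne(collection, doc) {
  if (!isValidPatient(doc)) {
    throw new Error("Document failed validation"); // placeholder message
  }
  collection.push(doc);
}
```

A patient with both names and a `physicians` array is accepted; one missing `physicians`, or with a numeric `first_name`, is rejected and never reaches the collection.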