Schema Design
(and its performance implications)
Jay Runkel
Principal Solutions Architect
jay.runkel@mongodb.com
@jayrunkel
Agenda
1. Today’s Example
2. MongoDB Schema Design vs. Relational
3. Modeling Relationships
4. Schema Design and Performance
Today’s Example
Medical Records
• Collect all patient information in a central repository
• Provide a central point of access for
– Patients
– Care providers: physicians, nurses, etc.
– Billing
– Insurance reconciliation
• Hospitals, physicians, patients, procedures, records
[Diagram: patient records (medications, lab results, procedures) and hospital records (physicians, patients, nurses, billing)]
Medical Record Data
• Hospitals
– Have physicians
• Physicians
– Have patients
– Perform procedures
– Belong to hospitals
• Patients
– Have physicians
– Are the subject of procedures
• Procedures
– Associated with a patient
– Associated with a physician
– Have a record
– Variable metadata
• Records
– Associated with a procedure
– Binary data
– Variable fields
Lots of Variability
Relational View
Schema Design:
MongoDB vs. Relational
MongoDB | Relational
Collections | Tables
Documents | Rows
Data use | Data storage
"What questions do I have?" | "What answers do I have?"
MongoDB versus Relational
Attribute | MongoDB | Relational
Storage | N-dimensional | Two-dimensional
Field values | 0, 1, many, or embedded | Single value
Query | Any field or level | Any field
Schema | Flexible | Very structured
Complex Normalized Schemas
Documents are Rich Data Structures
{
  first_name: "Paul",
  surname: "Miller",
  cell: "+447557505611",
  city: "London",
  location: [45.123, 47.232],
  profession: ["banking", "finance", "trader"],
  cars: [
    { model: "Bentley",
      year: 1973,
      value: 100000, /* … */ },
    { model: "Rolls Royce",
      year: 1965,
      value: 330000, /* … */ }
  ]
}
• Fields
• Typed field values
• Fields can contain arrays
• Fields can contain an array of sub-documents
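To make "query any field or level" concrete, here is a simplified in-memory sketch of how a dotted path such as "cars.model" reaches into sub-documents and steps through arrays, roughly what a query like find({"cars.model": "Bentley"}) relies on. The helper names valuesAtPath and matchesEquality are hypothetical, and real MongoDB matching has more rules (numeric array indexes, nested arrays, query operators) than this covers.

```javascript
// Collect every value reachable at a dotted path, stepping through arrays
// element by element the way dot notation does.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const value of current) {
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  // Flatten a final array so that each element can match on its own.
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

// Equality match: true if any value at the path equals the expected value.
function matchesEquality(doc, path, expected) {
  return valuesAtPath(doc, path).some((v) => v === expected);
}

const paul = {
  first_name: "Paul",
  surname: "Miller",
  city: "London",
  profession: ["banking", "finance", "trader"],
  cars: [
    { model: "Bentley", year: 1973, value: 100000 },
    { model: "Rolls Royce", year: 1965, value: 330000 },
  ],
};

console.log(valuesAtPath(paul, "cars.model")); // [ 'Bentley', 'Rolls Royce' ]
console.log(matchesEquality(paul, "profession", "finance")); // true
```

The same path works whether "cars" holds one sub-document or an array of them, which is why embedded arrays remain queryable.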
Relationships
Modeling One-to-One Relationships
Referencing
Procedure
• patient
• date
• type
• physician
• type
Results
• dataType
• size
• content: {…}
Use two collections with a reference
Similar to relational
Procedure
• patient
• date
• type
• results
• equipmentId
• data1
• data2
• physician
• Results
• type
• size
• content: {…}
Embedding
Document Schema
Referencing
Procedure
{
  "_id" : 333,
  "date" : ISODate("2003-02-09T05:00:00"),
  "hospital" : "County Hills",
  "patient" : "John Doe",
  "physician" : "Stephen Smith",
  "type" : "Chest X-ray",
  "result" : 134
}
Results
{
  "_id" : 134,
  "type" : "txt",
  "size" : NumberInt(12),
  "content" : {
    value1: 343,
    value2: "abc",
    /* … */
  }
}
Embedding
Procedure
{
  "_id" : 333,
  "date" : ISODate("2003-02-09T05:00:00"),
  "hospital" : "County Hills",
  "patient" : "John Doe",
  "physician" : "Stephen Smith",
  "type" : "Chest X-ray",
  "result" : {
    "type" : "txt",
    "size" : NumberInt(12),
    "content" : {
      value1: 343,
      value2: "abc",
      /* … */
    }
  }
}
Embedding
• Advantages
– Retrieve all relevant information in a single query/document
– Avoid implementing joins in application code
– Update related information as a single atomic operation
• MongoDB (at the time of this talk) doesn’t offer multi-document transactions; they were introduced in MongoDB 4.0
• Limitations
– Large documents mean more overhead if most fields are not relevant
– 16 MB document size limit
Atomicity
• Document operations are atomic
db.patients.update({_id: 12345},
  {$inc : {numProcedures : 1},
   $push : {procedures : "proc123"},
   $set : {"addr.state" : "TX"}})
• No multi-document transactions (the calls below are hypothetical, not a MongoDB API)
// Hypothetical API: shows what is NOT available
db.beginTransaction();
db.patients.update({_id: 12345}, …);
db.procedures.insert({_id: "proc123", …});
db.records.insert({_id: "rec123", …});
db.endTransaction();
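Single-document atomicity means all three operators in the update above take effect together or not at all. The following is a minimal in-memory sketch of that behaviour, not MongoDB code: the hypothetical helpers applyUpdate and updateOne build a modified copy of one document and swap it in only if every operator succeeds.

```javascript
function getAtPath(obj, path) {
  return path.split(".").reduce((o, k) => (o == null ? undefined : o[k]), obj);
}

function setAtPath(obj, path, value) {
  const keys = path.split(".");
  let target = obj;
  for (const key of keys.slice(0, -1)) {
    // Create intermediate sub-documents, as $set does for dotted paths.
    if (typeof target[key] !== "object" || target[key] === null) target[key] = {};
    target = target[key];
  }
  target[keys[keys.length - 1]] = value;
}

// Apply $inc, $push and $set to a deep copy; throws on an invalid $push.
function applyUpdate(doc, update) {
  const next = JSON.parse(JSON.stringify(doc));
  for (const [path, amount] of Object.entries(update.$inc || {})) {
    setAtPath(next, path, (getAtPath(next, path) || 0) + amount);
  }
  for (const [path, value] of Object.entries(update.$push || {})) {
    const current = getAtPath(next, path);
    if (current !== undefined && !Array.isArray(current)) {
      throw new Error(`$push target ${path} is not an array`);
    }
    setAtPath(next, path, [...(current || []), value]);
  }
  for (const [path, value] of Object.entries(update.$set || {})) {
    setAtPath(next, path, value);
  }
  return next;
}

// Hypothetical "collection": _id -> document.
const patientStore = new Map([
  [12345, { _id: 12345, numProcedures: 2, procedures: ["p1", "p2"], addr: { state: "NH" } }],
]);

function updateOne(store, id, update) {
  const next = applyUpdate(store.get(id), update); // may throw: store is untouched
  store.set(id, next); // the new version becomes visible in one step
}

updateOne(patientStore, 12345, {
  $inc: { numProcedures: 1 },
  $push: { procedures: "proc123" },
  $set: { "addr.state": "TX" },
});
```

Across several documents (patients, procedures, records) there is no single swap like this, which is why the transaction-style code above cannot be expressed with single-document operations alone.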
Referencing
• Advantages
– Smaller documents
– Less likely to reach 16 MB document limit
– Infrequently accessed information not accessed on every query
– No duplication of data
• Limitations
– Two queries required to retrieve information
– Cannot update related information atomically
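The "two queries" cost of referencing can be sketched with hypothetical in-memory collections, where each findById call stands in for one database round trip. The names here are illustrative only; on the server side, the $lookup aggregation stage (added in MongoDB 3.2) can perform such a join in one request.

```javascript
// Counts simulated round trips to the database.
const stats = { queries: 0 };
function findById(collection, id) {
  stats.queries += 1;
  return collection.get(id) || null;
}

const procedureColl = new Map([
  [333, { _id: 333, patient: "John Doe", type: "Chest X-ray", result: 134 }],
]);
const resultColl = new Map([
  [134, { _id: 134, type: "txt", size: 12, content: { value1: 343, value2: "abc" } }],
]);
const embeddedProcedureColl = new Map([
  [333, {
    _id: 333, patient: "John Doe", type: "Chest X-ray",
    result: { type: "txt", size: 12, content: { value1: 343, value2: "abc" } },
  }],
]);

// Referencing: the application performs the join itself (two queries).
function loadReferenced(id) {
  const proc = findById(procedureColl, id);
  if (proc === null) return null;
  return { ...proc, result: findById(resultColl, proc.result) };
}

// Embedding: one query returns the procedure and its result together.
function loadEmbedded(id) {
  return findById(embeddedProcedureColl, id);
}
```

Both functions return the same shape of data; the difference is the number of round trips, two for loadReferenced and one for loadEmbedded.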
One to One: General Recommendations
• Embed
– No additional data duplication
– Can query or index on embedded fields
• e.g., "result.type"
• Exceptional cases:
– Embedding results in large documents
– Set of infrequently accessed fields
{
  "_id" : 333,
  "date" : ISODate("2003-02-09T05:00:00"),
  "hospital" : "County Hills",
  "patient" : "John Doe",
  "physician" : "Stephen Smith",
  "type" : "Chest X-ray",
  "result" : {
    "type" : "txt",
    "size" : NumberInt(12),
    "content" : {
      value1: 343,
      value2: "abc",
      /* … */
    }
  }
}
Modeling One-to-Many Relationships
One-to-Many Relationships
Modeled in 2 possible ways

Embed
Patients
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { /* … */ },
  procedures: [
    {
      id: 12345,
      date: ISODate("2015-02-15"),
      type: "Cat scan",
      /* … */ },
    {
      id: 12346,
      date: ISODate("2015-02-15"),
      type: "blood test",
      /* … */ }]
}

Reference
Patients
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { /* … */ },
  procedures: [12345, 12346]}
Procedures
{
  _id: 12345,
  date: ISODate("2015-02-15"),
  type: "Cat scan",
  /* … */ }
{
  _id: 12346,
  date: ISODate("2015-02-15"),
  type: "blood test",
  /* … */ }
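Moving between the two layouts is an application-side join. The following sketch takes the referenced patient and procedure documents and produces the embedded layout. The function name embedProcedures is hypothetical, dates are kept as plain strings for simplicity, and against a server the procedures would come from something like db.procedures.find({ _id: { $in: patient.procedures } }). Because $in does not guarantee result order, the sketch re-orders by the patient's own array.

```javascript
function embedProcedures(patient, procedureDocs) {
  const byId = new Map(procedureDocs.map((p) => [p._id, p]));
  const embedded = patient.procedures
    .filter((id) => byId.has(id)) // drop dangling references
    .map((id) => {
      const { _id, ...rest } = byId.get(id);
      return { id: _id, ...rest }; // embedded entries use "id", as on the slide
    });
  return { ...patient, procedures: embedded }; // the input patient is not mutated
}

const joe = { _id: 2, first: "Joe", last: "Patient", procedures: [12345, 12346] };
const procs = [
  { _id: 12346, date: "2015-02-15", type: "blood test" },
  { _id: 12345, date: "2015-02-15", type: "Cat scan" },
];
const joeEmbedded = embedProcedures(joe, procs);
```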
One to Many: General Recommendations
• Embed, when possible
– Access all information in a single query
– Take advantage of update atomicity
– No additional data duplication
– Can query or index on any field
• e.g., { "procedures.type": "Cat scan" }
• Exceptional cases:
– 16 MB document size
– Large number of infrequently accessed fields
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { /* … */ },
  procedures: [
    {
      id: 12345,
      date: ISODate("2015-02-15"),
      type: "Cat scan",
      /* … */ },
    {
      id: 12346,
      date: ISODate("2015-02-15"),
      type: "blood test",
      /* … */ }]
}
Modeling Many-to-Many Relationships
Many to Many
Traditional relational association: a join table
• Physicians: name, specialty, phone
• Hospitals: name
• HosPhysicanRel (join table): hospitalId, physicianId
In MongoDB, drop the join table and use arrays instead
Many-to-Many Relationships
Embedding physicians in hospitals collection
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [
    {
      id: 12345,
      name: "Joe Doctor",
      address: { /* … */ },
      /* … */ },
    {
      id: 12346,
      name: "Mary Well",
      address: { /* … */ },
      /* … */ }]
}
{
  _id: 2,
  name: "Plainmont Hospital",
  city: "Omaha",
  beds: 85,
  physicians: [
    {
      id: 63633,
      name: "Harold Green",
      address: { /* … */ },
      /* … */ },
    {
      id: 12345,
      name: "Joe Doctor",
      address: { /* … */ },
      /* … */ }]
}
Data duplication: "Joe Doctor" (id 12345) is stored in both hospital documents
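The price of that duplication is paid on writes. In this sketch, renaming one physician means rewriting every hospital document that embeds them. Against a server this would be roughly db.hospitals.updateMany({ "physicians.id": 12345 }, { $set: { "physicians.$.name": "Joseph Doctor" } }). The helper renameEmbeddedPhysician and the new name "Joseph Doctor" are made up for illustration.

```javascript
// Returns the number of hospital documents that had to be rewritten.
function renameEmbeddedPhysician(hospitals, physicianId, newName) {
  let modified = 0;
  for (const hospital of hospitals) {
    // Like the positional $ operator, update the first matching entry.
    const entry = hospital.physicians.find((p) => p.id === physicianId);
    if (entry !== undefined) {
      entry.name = newName;
      modified += 1;
    }
  }
  return modified;
}

const hospitalDocs = [
  { _id: 1, name: "Oak Valley Hospital", physicians: [
    { id: 12345, name: "Joe Doctor" }, { id: 12346, name: "Mary Well" }] },
  { _id: 2, name: "Plainmont Hospital", physicians: [
    { id: 63633, name: "Harold Green" }, { id: 12345, name: "Joe Doctor" }] },
];
```

With the referencing layout, the same rename touches exactly one document in the physicians collection.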
Many-to-Many Relationships
Referencing
Hospitals
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [12345, 12346]
}
{
  _id: 2,
  name: "Plainmont Hospital",
  city: "Omaha",
  beds: 85,
  physicians: [63633, 12345]
}
Physicians
{
  _id: 63633,
  name: "Harold Green",
  address: { /* … */ },
  /* … */ }
{
  _id: 12345,
  name: "Joe Doctor",
  address: { /* … */ },
  /* … */ }
{
  _id: 12346,
  name: "Mary Well",
  address: { /* … */ },
  /* … */ }
Many to Many
General Recommendation
• Use case determines whether to reference or embed:
1. Data duplication
• Embedding may result in data duplication
• Duplication may be okay if reads dominate updates
2. Referencing may be required if there are many related items
3. Hybrid approach
• Potentially do both
Hospitals (reference)
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [12345, 12346]}
Physicians
{
  _id: 12345,
  name: "Joe Doctor",
  address: { /* … */ },
  /* … */ }
{
  _id: 12346,
  name: "Mary Well",
  address: { /* … */ },
  /* … */ }
What If I Want to Store Large Files in MongoDB?
GridFS
[Diagram: the driver's GridFS API stores doc.jpg as one metadata document in fs.files and a series of chunks in fs.chunks]
• The mongofiles utility provides a command-line GridFS interface
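The layout in the diagram can be sketched as follows. The code produces one fs.files metadata document and one fs.chunks document per slice of the file. 255 KiB is GridFS's default chunk size. The function toGridFsDocs and the id "doc-jpg-1" are hypothetical, and real drivers also assign each chunk its own _id, stream data, and create indexes, all of which is omitted here.

```javascript
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261120 bytes

function toGridFsDocs(fileId, filename, bytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  // n is the chunk's sequence number; data is a view onto the slice of the file.
  for (let n = 0, offset = 0; offset < bytes.length; n += 1, offset += chunkSize) {
    chunks.push({ files_id: fileId, n, data: bytes.subarray(offset, offset + chunkSize) });
  }
  const file = { _id: fileId, filename, length: bytes.length, chunkSize, uploadDate: new Date() };
  return { file, chunks };
}

// A 600,000-byte file becomes two full chunks plus a 77,760-byte final chunk.
const { file, chunks } = toGridFsDocs("doc-jpg-1", "doc.jpg", new Uint8Array(600000));
```

Reading the file back means fetching its chunks by files_id and concatenating their data in n order.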
Schema Design and Performance
Two Examples
Example 1: Hybrid Approach
Embed and Reference
Healthcare Example: patients and procedures collections
Tailor Schema to Queries
Query: find all patients from NH that have had chest x-rays
Patient
{
  "_id" : 593340651,
  "first" : "Gregorio",
  "last" : "Lang",
  "addr" : {
    "street" : "623 Flowers Rd",
    "city" : "Groton",
    "state" : "NH",
    "zip" : 3266
  },
  "physicians" : [10387, 33456],
  "procedures" : ["551ac", "343fs"]
}
Procedure
{
  "_id" : "551ac",
  "date" : "2000-04-26",
  "hospital" : 161,
  "patient" : 593340651,
  "physician" : 10387,
  "type" : "Chest X-ray",
  "records" : ["67bc6"]
}
Tailor Schema to Queries (cont.)
Query: find all patients from NH that have had chest x-rays
Patient
{
  "_id" : 593340651,
  "first" : "Gregorio",
  "last" : "Lang",
  "addr" : {
    "street" : "623 Flowers Rd",
    "city" : "Groton",
    "state" : "NH",
    "zip" : 3266
  },
  "physicians" : [10387, 33456],
  "procedures" : [
    {id : "551ac",
     type : "Chest X-ray"},
    {id : "343fs",
     type : "Blood Test"}]
}
Procedure
{
  "_id" : "551ac",
  "date" : "2000-04-26",
  "hospital" : 161,
  "patient" : 593340651,
  "physician" : 10387,
  "type" : "Chest X-ray",
  "records" : ["67bc6"]
}
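Here is the same query run in memory against both layouts. With references, the procedures are searched first and the patients second. With the procedure type embedded in each patient (the hybrid layout), one pass over patients is enough. Roughly equivalent shell queries would be db.procedures.distinct("patient", { type: "Chest X-ray" }) followed by db.patients.find({ "addr.state": "NH", _id: { $in: ids } }), versus a single db.patients.find({ "addr.state": "NH", "procedures.type": "Chest X-ray" }). Patients 111 and 222 and procedure "9f0e2" are made-up extra data.

```javascript
const procedureDocs = [
  { _id: "551ac", patient: 593340651, type: "Chest X-ray" },
  { _id: "343fs", patient: 593340651, type: "Blood Test" },
  { _id: "9f0e2", patient: 111, type: "Chest X-ray" },
];
const referencedPatients = [
  { _id: 593340651, addr: { state: "NH" }, procedures: ["551ac", "343fs"] },
  { _id: 111, addr: { state: "VT" }, procedures: ["9f0e2"] },
  { _id: 222, addr: { state: "NH" }, procedures: [] },
];
const hybridPatients = [
  { _id: 593340651, addr: { state: "NH" }, procedures: [
    { id: "551ac", type: "Chest X-ray" }, { id: "343fs", type: "Blood Test" }] },
  { _id: 111, addr: { state: "VT" }, procedures: [{ id: "9f0e2", type: "Chest X-ray" }] },
  { _id: 222, addr: { state: "NH" }, procedures: [] },
];

// Referenced layout: two queries, joined in the application.
function nhChestXrayReferenced(patients, procedures) {
  const ids = new Set(procedures.filter((p) => p.type === "Chest X-ray").map((p) => p.patient));
  return patients.filter((p) => p.addr.state === "NH" && ids.has(p._id)).map((p) => p._id);
}

// Hybrid layout: the procedure type is embedded, so one query suffices.
function nhChestXrayHybrid(patients) {
  return patients
    .filter((p) => p.addr.state === "NH" && p.procedures.some((x) => x.type === "Chest X-ray"))
    .map((p) => p._id);
}
```

The hybrid layout duplicates only the small "type" field, so the full procedure documents still live in their own collection.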
Example 2: Time Series Data
Medical Devices
Vital Sign Monitoring Device
Vital Signs Measured:
• Blood Pressure
• Pulse
• Blood Oxygen Levels
Produces data at regular intervals
• Once per minute
We have a hospital (or hospitals) full of devices
Data From Vital Signs Monitoring Device
{
deviceId: 123456,
spO2: 88,
pulse: 74,
bp: [128, 80],
ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• One document per minute per device
• Similar to the relational approach (one row per reading)
Document Per Hour (By minute)
{
deviceId: 123456,
spO2: { 0: 88, 1: 90, …, 59: 92},
pulse: { 0: 74, 1: 76, …, 59: 72},
bp: { 0: [122, 80], 1: [126, 84], …, 59: [124, 78]},
ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-minute data at the hourly level
• Update-driven workload
• 1 document per device per hour
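In the update-driven workload, each per-minute reading becomes an upsert into its hour's document. This sketch builds the arguments for such a call, shaped like db.vitals.updateOne(filter, update, { upsert: true }). The collection name "vitals" and the helper hourlyBucketUpdate are assumptions. The first reading of an hour inserts the document; the rest only $set new minute slots such as "spO2.7".

```javascript
function hourlyBucketUpdate(reading) {
  const ts = new Date(reading.ts);
  const hourStart = new Date(ts);
  hourStart.setUTCMinutes(0, 0, 0); // truncate to the start of the hour
  const minute = ts.getUTCMinutes(); // slot 0..59 within the hour
  return {
    filter: { deviceId: reading.deviceId, ts: hourStart },
    update: {
      $set: {
        [`spO2.${minute}`]: reading.spO2,
        [`pulse.${minute}`]: reading.pulse,
        [`bp.${minute}`]: reading.bp,
      },
    },
    options: { upsert: true },
  };
}

const u = hourlyBucketUpdate({
  deviceId: 123456, spO2: 88, pulse: 74, bp: [128, 80],
  ts: "2013-10-16T22:07:00.000-05:00",
});
```

Hours are truncated in UTC here, which matches local hour boundaries for whole-hour offsets such as the slide's -0500.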
Characterizing Write Differences
• Example: data generated every minute
• Recording the data for 1 patient for 1 hour:
Document per event | Document per hour
60 inserts | 1 insert, 59 updates
Characterizing Read Differences
• Want to graph 24 hours of vital signs for a patient:
Document per event | Document per hour
1440 reads | 24 reads
• Read performance is greatly improved with one document per hour
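The operation counts on the two slides above follow directly from one reading per minute per device. This back-of-the-envelope sketch (writeOps and readOps are illustrative names) reproduces them for any number of hours.

```javascript
// Writes needed to record `hours` of per-minute readings for one device.
function writeOps(hours, perHour) {
  const readings = hours * 60;
  return perHour
    ? { inserts: hours, updates: readings - hours } // first reading inserts, rest update
    : { inserts: readings, updates: 0 };
}

// Documents fetched to graph `hours` of readings for one device.
function readOps(hours, perHour) {
  return perHour ? hours : hours * 60;
}

console.log(writeOps(1, true)); // { inserts: 1, updates: 59 }
console.log(readOps(24, false), readOps(24, true)); // 1440 24
```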
Characterizing Memory and Storage Differences
Assumptions: 100K devices, 1 year's worth of data
Metric | Document per minute | Document per hour
Number of documents | 52.6 B | 876 M
Total index size | 6364 GB | 106 GB
_id index | 1468 GB | 24.5 GB
{ts: 1, deviceId: 1} index | 4895 GB | 81.6 GB
Document size | 92 bytes | 758 bytes
Database size | 4503 GB | 618 GB
How the figures are derived (GB = 2^30 bytes):
• Number of documents: 100000 * 365 * 24 * 60 (per minute) vs. 100000 * 365 * 24 (per hour)
• Total index size: number of documents * 130 bytes
• Database size: number of documents * 92 bytes (per minute) vs. number of documents * 758 bytes (per hour)
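The sizing table can be reproduced with a few lines of arithmetic. The sketch below (estimateStorage is an illustrative name) uses the slide's inputs: 130 index bytes per document, 92 or 758 bytes per document, and GB taken as 2^30 bytes.

```javascript
const GiB = 1024 ** 3;

function estimateStorage({ devices, days, docsPerDevicePerDay, docBytes, indexBytesPerDoc }) {
  const docs = devices * days * docsPerDevicePerDay;
  return {
    docs,
    indexGiB: (docs * indexBytesPerDoc) / GiB,
    dataGiB: (docs * docBytes) / GiB,
  };
}

const perMinute = estimateStorage({
  devices: 100000, days: 365, docsPerDevicePerDay: 24 * 60, docBytes: 92, indexBytesPerDoc: 130,
});
const perHour = estimateStorage({
  devices: 100000, days: 365, docsPerDevicePerDay: 24, docBytes: 758, indexBytesPerDoc: 130,
});

console.log(perMinute.docs, Math.round(perMinute.indexGiB)); // 52560000000 6364
console.log(perHour.docs, Math.round(perHour.indexGiB)); // 876000000 106
```

The index savings come almost entirely from the 60x drop in document count, while the data size shrinks less because each hourly document is larger.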
Summary
• Relationships can be modeled by embedding or references
• Decision should be made in context of application data and query workload
– Tailor schema to application workload
• It is okay, even recommended, to violate RDBMS schema design principles:
– No duplication of data
– Normalization
• Different schemas may result in dramatically different
– Query performance
– Hardware requirements
Questions?
jay.runkel@mongodb.com
@jayrunkel

MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Recently uploaded (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Schema Design and Performance Implications

  • 1. Schema Design (and its performance implications). Jay Runkel, Principal Solutions Architect, jay.runkel@mongodb.com, @jayrunkel
  • 2. Agenda: 1. Today’s Example; 2. MongoDB Schema Design vs. Relational; 3. Modeling Relationships; 4. Schema Design and Performance
  • 4. Medical Records • Collects all patient information in a central repository • Provides a central point of access for – patients – care providers: physicians, nurses, etc. – billing – insurance reconciliation • Entities: hospitals, physicians, patients, procedures, records • Diagram: Patient Records (Medications, Lab Results, Procedures); Hospital Records (Physicians, Patients, Nurses, Billing)
  • 5. Medical Record Data • Hospitals – Have physicians • Physicians – Have patients – Perform procedures – Belong to hospitals • Patients – Have physicians – Are the subject of procedures • Procedures – Associated with a patient – Associated with a physician – Have a record – Variable metadata • Records – Associated with a procedure – Binary data – Variable fields
  • 9. MongoDB versus Relational • Collections vs. tables • Documents vs. rows • Data use vs. data storage • “What questions do I have?” vs. “What answers do I have?”
  • 10. MongoDB vs. Relational, by attribute • Storage: N-dimensional vs. two-dimensional • Field values: 0, 1, many, or embedded vs. a single value • Query: any field or level vs. any field • Schema: flexible vs. very structured
  • 13. Documents are Rich Data Structures { first_name: 'Paul', surname: 'Miller', cell: '+447557505611', city: 'London', location: [45.123, 47.232], Profession: ['banking', 'finance', 'trader'], cars: [ { model: 'Bentley', year: 1973, value: 100000, … }, { model: 'Rolls Royce', year: 1965, value: 330000, … } ] } • Callouts: fields; typed field values; fields can contain arrays; fields can contain an array of sub-documents
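To make “query any field or level” concrete, here is a minimal sketch in plain JavaScript (Node.js). The filter objects are what you would pass to find(); the collection they would run against is hypothetical, and the matcher functions are in-memory stand-ins written only to illustrate the server's matching rules. The point: conditions written in dot notation may be satisfied by different array elements, while $elemMatch requires a single element to satisfy all of them.

```javascript
// `paul` mirrors the example document above (elided fields dropped).
const paul = {
  first_name: "Paul",
  surname: "Miller",
  Profession: ["banking", "finance", "trader"],
  cars: [
    { model: "Bentley", year: 1973, value: 100000 },
    { model: "Rolls Royce", year: 1965, value: 330000 },
  ],
};

// Filters you could pass to find() on a hypothetical collection.
const dotFilter = { "cars.model": "Rolls Royce", "cars.year": 1973 };
const elemMatchFilter = { cars: { $elemMatch: { model: "Rolls Royce", year: 1973 } } };
const arrayContainsFilter = { Profession: "finance" };

// In-memory stand-ins (illustration only, not MongoDB code).
// Dot notation: each condition may be met by a DIFFERENT array element.
function matchesDot(doc, model, year) {
  return doc.cars.some((c) => c.model === model) &&
         doc.cars.some((c) => c.year === year);
}

// $elemMatch: one SINGLE element must meet every condition.
function matchesElemMatch(doc, model, year) {
  return doc.cars.some((c) => c.model === model && c.year === year);
}

// Equality against an array field matches if any element is equal.
function matchesArrayContains(doc, profession) {
  return doc.Profession.includes(profession);
}

console.log(matchesDot(paul, "Rolls Royce", 1973));       // true
console.log(matchesElemMatch(paul, "Rolls Royce", 1973)); // false
console.log(matchesArrayContains(paul, "finance"));       // true
```

Paul matches dotFilter (a Rolls Royce exists, and a 1973 car exists) but not elemMatchFilter, because no single car is a 1973 Rolls Royce.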
  • 16. Referencing: use two collections with a reference (similar to relational) • Procedure – patient – date – type – physician – type • Results – dataType – size – content: {…}
  • 17. Embedding: document schema • Procedure – patient – date – type – results – equipmentId – data1 – data2 – physician • Results – type – size – content: {…}
  • 18. Referencing • Procedure: { "_id": 333, "date": ISODate("2003-02-09T05:00:00"), "hospital": "County Hills", "patient": "John Doe", "physician": "Stephen Smith", "type": "Chest X-ray", "result": 134 } • Results: { "_id": 134, "type": "txt", "size": NumberInt(12), "content": { value1: 343, value2: "abc", … } }
  • 19. Embedding • Procedure: { "_id": 333, "date": ISODate("2003-02-09T05:00:00"), "hospital": "County Hills", "patient": "John Doe", "physician": "Stephen Smith", "type": "Chest X-ray", "result": { "type": "txt", "size": NumberInt(12), "content": { value1: 343, value2: "abc", … } } }
  • 20. Embedding • Advantages – Retrieve all relevant information in a single query/document – Avoid implementing joins in application code – Update related information as a single atomic operation (MongoDB versions before 4.0 don’t offer multi-document transactions) • Limitations – Large documents mean more overhead if most fields are not relevant – 16 MB document size limit
  • 21. Atomicity • Document operations are atomic: db.patients.update({_id: 12345}, {$inc: {numProcedures: 1}, $push: {procedures: "proc123"}, $set: {"addr.state": "TX"}}) • No multi-document transactions (before MongoDB 4.0), so there is no way to write the equivalent of this hypothetical block: db.beginTransaction(); db.patients.update({_id: 12345}, …); db.procedure.insert({_id: "proc123", …}); db.records.insert({_id: "rec123", …}); db.endTransaction();
  • 23. Referencing • Advantages – Smaller documents – Less likely to reach the 16 MB document limit – Infrequently accessed information is not retrieved on every query – No duplication of data • Limitations – Two queries required to retrieve information – Cannot update related information atomically (without multi-document transactions)
  • 24. One to One: General Recommendations • Embed – No additional data duplication – Can query or index on an embedded field, e.g., "result.type" • Exceptional cases – Embedding results in large documents – A set of infrequently accessed fields • Example: { "_id": 333, "date": ISODate("2003-02-09T05:00:00"), "hospital": "County Hills", "patient": "John Doe", "physician": "Stephen Smith", "type": "Chest X-ray", "result": { "type": "txt", "size": NumberInt(12), "content": { value1: 343, value2: "abc", … } } }
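The one-to-one trade-off can be seen in the query documents each schema needs. This is a minimal sketch in plain JavaScript that only builds the objects you would hand to a driver; the collection name "results" is an assumption, and the field names follow the slides. With referencing, the result has to be joined in (here with the $lookup stage, available since MongoDB 3.2); with embedding, one lookup by _id suffices and the embedded field can be filtered and indexed with dot notation.

```javascript
// Referencing: procedure.result holds the _id of a results document,
// so the result must be fetched separately or joined with $lookup.
function procedureWithResultPipeline(procedureId) {
  return [
    { $match: { _id: procedureId } },
    {
      $lookup: {
        from: "results",      // assumed name of the results collection
        localField: "result", // reference stored in the procedure
        foreignField: "_id",
        as: "result",         // replaces the reference with an array of matches
      },
    },
    // One-to-one: unwrap the single matched result. $unwind drops the
    // procedure entirely if no result document matched.
    { $unwind: "$result" },
  ];
}

// Embedding: a single lookup by _id returns the procedure with its result.
function procedureFilter(procedureId) {
  return { _id: procedureId };
}

// Embedded fields can be queried with dot notation...
function proceduresByResultType(resultType) {
  return { "result.type": resultType };
}

// ...and indexed the same way (key pattern for createIndex()).
const resultTypeIndex = { "result.type": 1 };

console.log(JSON.stringify(procedureWithResultPipeline(333)));
```

With the Node.js driver these objects would go to collection.aggregate(), collection.findOne(), collection.find() and collection.createIndex() respectively.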
  • 26. One-to-Many Relationships: modeled in 2 possible ways • Embed (Patients): { _id: 2, first: "Joe", last: "Patient", addr: {…}, procedures: [ { id: 12345, date: ISODate("2015-02-15"), type: "Cat scan", …}, { id: 12346, date: ISODate("2015-02-15"), type: "blood test", …} ] } • Reference (Patients → Procedures): { _id: 2, first: "Joe", last: "Patient", addr: {…}, procedures: [12345, 12346] } { _id: 12345, date: ISODate("2015-02-15"), type: "Cat scan", …} { _id: 12346, date: ISODate("2015-02-15"), type: "blood test", …}
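The write and read paths differ between the two one-to-many models. A minimal sketch in plain JavaScript, assuming collections named "patients" and "procedures" (taken from the slide labels) and hypothetical helper names: embedding adds a procedure with one atomic update, while referencing needs an insert plus an update on write and a second $in query on read.

```javascript
// Embed: adding a procedure is one atomic update of the patient document.
function addEmbeddedProcedure(patientId, procedure) {
  return {
    filter: { _id: patientId },
    update: { $push: { procedures: procedure } },
  };
}

// Reference: two separate writes.
function addReferencedProcedure(patientId, procedure) {
  return {
    // 1st write: insert this document into the procedures collection.
    procedureDoc: procedure,
    // 2nd write: record the reference on the patient. $addToSet avoids a
    // duplicate id if this step is retried.
    patientUpdate: {
      filter: { _id: patientId },
      update: { $addToSet: { procedures: procedure._id } },
    },
  };
}

// Reading referenced procedures back needs a second query using $in.
function proceduresOfPatient(patient) {
  return { _id: { $in: patient.procedures } };
}

console.log(JSON.stringify(proceduresOfPatient({ _id: 2, procedures: [12345, 12346] })));
```

The two referenced writes are not atomic together, so a crash between them can leave a procedure with no patient pointing at it; that is the “cannot update related information atomically” limitation in practice.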
  • 27. One to Many: General Recommendations • Embed, when possible – Access all information in a single query – Take advantage of update atomicity – No additional data duplication – Can query or index on any field, e.g., { "procedures.type": "Cat scan" } • Exceptional cases – 16 MB document size – Large number of infrequently accessed fields • Example: { _id: 2, first: "Joe", last: "Patient", addr: {…}, procedures: [ { id: 12345, date: ISODate("2015-02-15"), type: "Cat scan", …}, { id: 12346, date: ISODate("2015-02-15"), type: "blood test", …} ] }
  • 29. Many to Many • Traditional relational association: a join table – Physicians (name, specialty, phone) – Hospitals (name) – HosPhysicanRel (hospitalId, physicianId) • Instead of the join table, use arrays
  • 30. Many-to-Many Relationships: embedding physicians in the hospitals collection • { _id: 1, name: "Oak Valley Hospital", city: "New York", beds: 131, physicians: [ { id: 12345, name: "Joe Doctor", address: {…}, …}, { id: 12346, name: "Mary Well", address: {…}, …} ] } • { _id: 2, name: "Plainmont Hospital", city: "Omaha", beds: 85, physicians: [ { id: 63633, name: "Harold Green", address: {…}, …}, { id: 12345, name: "Joe Doctor", address: {…}, …} ] } • Data duplication: "Joe Doctor" is stored in both hospitals
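The cost of that duplication shows up on update: a change to one physician has to reach every embedded copy. A minimal sketch in plain JavaScript that builds the arguments for an updateMany() on the hospitals collection, using the filtered positional operator $[<identifier>] with arrayFilters (MongoDB 3.6+). The helper name and the new name "Joseph Doctor" are hypothetical.

```javascript
// Builds updateMany() arguments that rename one physician in every hospital
// that embeds a copy of them.
function renamePhysicianEverywhere(physicianId, newName) {
  return {
    // Select every hospital whose physicians array contains this physician.
    filter: { "physicians.id": physicianId },
    // $[doc] targets only the array elements matched by the "doc" array filter.
    update: { $set: { "physicians.$[doc].name": newName } },
    options: { arrayFilters: [{ "doc.id": physicianId }] },
  };
}

const op = renamePhysicianEverywhere(12345, "Joseph Doctor");
console.log(JSON.stringify(op));
```

With the Node.js driver: hospitals.updateMany(op.filter, op.update, op.options). Each hospital document is updated atomically, but the update as a whole is not, so a reader may briefly see the old name in one hospital and the new name in another.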
  • 31. Many-to-Many Relationships: referencing • Hospitals: { _id: 1, name: "Oak Valley Hospital", city: "New York", beds: 131, physicians: [12345, 12346] } { _id: 2, name: "Plainmont Hospital", city: "Omaha", beds: 85, physicians: [63633, 12345] } • Physicians: { _id: 12345, name: "Joe Doctor", address: {…}, …} { _id: 12346, name: "Mary Well", address: {…}, …} { _id: 63633, name: "Harold Green", address: {…}, …}
  • 32. Many to Many: General Recommendation • The use case determines whether to reference or embed: 1. Data duplication – Embedding may result in data duplication – Duplication may be acceptable if reads dominate updates 2. Referencing may be required if there are many related items 3. Hybrid approach – Potentially do both • Example (Hospitals reference Physicians): { _id: 2, name: "Oak Valley Hospital", city: "New York", beds: 131, physicians: [12345, 12346] } { _id: 12345, name: "Joe Doctor", address: {…}, …} { _id: 12346, name: "Mary Well", address: {…}, …}
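Arrays of references replace the join table because they can be queried from both sides. A minimal sketch in plain JavaScript, assuming hospitals store an array of physician _ids as on slide 31; the helper names are hypothetical, and hospitalIdsFor() is an in-memory stand-in for the server query, included only to show the array-matching rule on the slide's data.

```javascript
// Hospital -> physicians: fetch the referenced physician documents.
function physiciansOfHospital(hospital) {
  return { _id: { $in: hospital.physicians } };
}

// Physician -> hospitals: equality against an array field matches any
// element, so no join table is needed.
function hospitalsOfPhysician(physicianId) {
  return { physicians: physicianId };
}

// A multikey index on the array supports the reverse lookup.
const physiciansIndex = { physicians: 1 };

// Sample data from the referencing slide.
const hospitals = [
  { _id: 1, name: "Oak Valley Hospital", physicians: [12345, 12346] },
  { _id: 2, name: "Plainmont Hospital", physicians: [63633, 12345] },
];

// In-memory equivalent of find(hospitalsOfPhysician(id)) (illustration only).
function hospitalIdsFor(physicianId, docs) {
  return docs.filter((h) => h.physicians.includes(physicianId)).map((h) => h._id);
}

console.log(hospitalIdsFor(12345, hospitals)); // [ 1, 2 ]
```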
  • 33. What If I Want to Store Large Files in MongoDB?
  • 35. Schema Design and Performance: Two Examples
  • 36. Example 1: Hybrid Approach (Embed and Reference)
  • 38. Tailor Schema to Queries (cont.) • Query: find all patients from NH that have had chest x-rays • Patient: { "_id": 593340651, "first": "Gregorio", "last": "Lang", "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 }, "physicians": [10387, 33456], "procedures": ["551ac", "343fs"] } • Procedure: { "_id": "551ac", "date": "2000-04-26", "hospital": 161, "patient": 593340651, "physician": 10387, "type": "Chest X-ray", "records": ["67bc6"] }
  • 39. Tailor Schema to Queries (cont.) • Query: find all patients from NH that have had chest x-rays • Patient (procedure id and type embedded): { "_id": 593340651, "first": "Gregorio", "last": "Lang", "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 }, "physicians": [10387, 33456], "procedures": [ { id: "551ac", type: "Chest X-ray" }, { id: "343fs", type: "Blood Test" } ] } • Procedure: { "_id": "551ac", "date": "2000-04-26", "hospital": 161, "patient": 593340651, "physician": 10387, "type": "Chest X-ray", "records": ["67bc6"] }
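Here is how the NH chest x-ray query plays out against the two patient schemas. This is a minimal sketch in plain JavaScript: the collection roles ("procedures", "patients") follow the slides, while the suggested compound index and the in-memory matcher are assumptions added for illustration. With ids only, the application needs two round trips. With the procedure type embedded, a single query against patients answers the question.

```javascript
// Schema 1 (procedure ids only): two round trips.
// Step 1, against procedures: which patients had a chest x-ray?
const step1Filter = { type: "Chest X-ray" };
const step1Projection = { _id: 0, patient: 1 };

// Step 2, against patients: keep the ones living in NH (duplicate ids removed).
function step2Filter(procedureDocs) {
  const patientIds = [...new Set(procedureDocs.map((p) => p.patient))];
  return { _id: { $in: patientIds }, "addr.state": "NH" };
}

// Schema 2 (hybrid: id + type embedded): one query against patients.
const hybridFilter = { "addr.state": "NH", "procedures.type": "Chest X-ray" };
// Possible supporting compound (multikey) index; an assumption, not from the talk.
const hybridIndex = { "addr.state": 1, "procedures.type": 1 };

// In-memory equivalent of hybridFilter for one patient (illustration only).
function matchesHybrid(patient) {
  return patient.addr.state === "NH" &&
         patient.procedures.some((p) => p.type === "Chest X-ray");
}

const gregorio = {
  _id: 593340651,
  addr: { state: "NH" },
  procedures: [{ id: "551ac", type: "Chest X-ray" }, { id: "343fs", type: "Blood Test" }],
};
console.log(matchesHybrid(gregorio)); // true
```

In schema 1, the $in list in step 2 grows with every chest x-ray in every state before the NH filter is applied. This is one reason why embedding the small, frequently queried type field pays off.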
  • 40. Example 2: Time Series Data (Medical Devices)
  • 41. Vital Sign Monitoring Device • Vital signs measured: – Blood pressure – Pulse – Blood oxygen levels • Produces data at regular intervals: once per minute
  • 42. We have a hospital (or hospitals) full of devices
  • 43. Data From Vital Signs Monitoring Device { deviceId: 123456, spO2: 88, pulse: 74, bp: [128, 80], ts: ISODate("2013-10-16T22:07:00.000-0500") } • One document per minute per device • Relational approach
  • 44. Document Per Hour (by minute) { deviceId: 123456, spO2: { 0: 88, 1: 90, …, 59: 92 }, pulse: { 0: 74, 1: 76, …, 59: 72 }, bp: { 0: [122, 80], 1: [126, 84], …, 59: [124, 78] }, ts: ISODate("2013-10-16T22:00:00.000-0500") } • Store per-minute data at the hourly level • Update-driven workload • 1 document per device per hour
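The “update-driven workload” of the per-hour schema can be written as a single upserting update per reading. This is a minimal sketch in plain JavaScript. It builds updateOne() arguments and leaves the driver call out. The helper name is hypothetical, and truncating the timestamp to the hour in UTC is an assumption. That choice gives the same hour boundaries as the slide's -0500 timestamps, but it would not for zones with non-whole-hour offsets.

```javascript
const MS_PER_HOUR = 60 * 60 * 1000;

// Builds updateOne() arguments that record one per-minute reading into the
// hourly document for that device.
function recordMinute(deviceId, reading) {
  const ts = reading.ts;
  // Truncate to the start of the hour (UTC) to identify the bucket document.
  const hourStart = new Date(Math.floor(ts.getTime() / MS_PER_HOUR) * MS_PER_HOUR);
  const minute = ts.getUTCMinutes();
  return {
    filter: { deviceId, ts: hourStart },
    update: {
      // Dot paths like "spO2.7" set key "7" inside the spO2 sub-document.
      $set: {
        [`spO2.${minute}`]: reading.spO2,
        [`pulse.${minute}`]: reading.pulse,
        [`bp.${minute}`]: reading.bp,
      },
    },
    // The first reading of the hour inserts the document (with deviceId and
    // ts copied from the filter); later readings update it in place.
    options: { upsert: true },
  };
}

const op = recordMinute(123456, {
  ts: new Date("2013-10-16T22:07:00.000-05:00"),
  spO2: 88,
  pulse: 74,
  bp: [128, 80],
});
console.log(JSON.stringify(op));
```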
  • 45. Characterizing Write Differences • Example: data generated every minute • Recording the data for 1 patient for 1 hour: – Document per event: 60 inserts – Document per hour: 1 insert, 59 updates
  • 46. Characterizing Read Differences • Goal: graph 24 hours of vital signs for a patient – Document per event: 1440 reads – Document per hour: 24 reads • Read performance is greatly improved
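The write and read counts on these two slides follow from simple arithmetic, shown here as a minimal sketch in plain JavaScript. The helper names are hypothetical. graphFilter() is the range query you might run against the per-hour collection, and for a 24-hour window aligned to hour boundaries it matches 24 bucket documents.

```javascript
const MINUTES_PER_HOUR = 60;

// Write operations needed to record `hours` hours of once-per-minute data
// for one device under each schema.
function writeOps(hours) {
  return {
    perEvent: { inserts: hours * MINUTES_PER_HOUR, updates: 0 },
    perHour: { inserts: hours, updates: hours * (MINUTES_PER_HOUR - 1) },
  };
}

// Documents read to graph `hours` hours of data for one device.
function docsRead(hours) {
  return { perEvent: hours * MINUTES_PER_HOUR, perHour: hours };
}

// Range filter over hourly buckets whose ts falls in [start, end).
function graphFilter(deviceId, start, end) {
  return { deviceId, ts: { $gte: start, $lt: end } };
}

console.log(writeOps(1)); // per event: 60 inserts; per hour: 1 insert, 59 updates
console.log(docsRead(24)); // { perEvent: 1440, perHour: 24 }
```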
  • 47. Characterizing Memory and Storage Differences • Assumptions: 100K devices, 1 year’s worth of data • Document per minute vs. document per hour: – Number of documents: 52.6 B vs. 876 M (100000 * 365 * 24 * 60 vs. 100000 * 365 * 24) – Total index size: 6364 GB vs. 106 GB (100000 * 365 * 24 * 60 * 130 vs. 100000 * 365 * 24 * 130) – _id index: 1468 GB vs. 24.5 GB – {ts: 1, deviceId: 1} index: 4895 GB vs. 81.6 GB – Document size: 92 bytes vs. 758 bytes – Database size: 4503 GB vs. 618 GB (100000 * 365 * 24 * 60 * 92 vs. 100000 * 365 * 24 * 758)
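The storage figures can be reproduced from the slide's own formulas. This is a minimal sketch in plain JavaScript. The byte figures are 92 and 758 bytes per document and 130 index bytes per document, all taken from those formulas. Reading “GB” as 2^30 bytes is an inference, but it is the interpretation under which the computed values match the table.

```javascript
// The slide's "GB" values line up with 2^30-byte units.
const GIB = 2 ** 30;

// Document count, data size and total index size for a given bucketing.
function estimate(devices, days, docsPerHour, docBytes, indexBytesPerDoc) {
  const docs = devices * days * 24 * docsPerHour;
  return {
    docs,
    dataGiB: (docs * docBytes) / GIB,
    indexGiB: (docs * indexBytesPerDoc) / GIB,
  };
}

const perMinute = estimate(100000, 365, 60, 92, 130); // one doc per minute
const perHour = estimate(100000, 365, 1, 758, 130);   // one doc per hour

console.log(perMinute.docs, Math.round(perMinute.dataGiB), Math.round(perMinute.indexGiB));
// 52560000000 4503 6364
console.log(perHour.docs, Math.round(perHour.dataGiB), Math.round(perHour.indexGiB));
// 876000000 618 106
```

The index saving comes almost entirely from the 60x drop in document count. Each hourly document is larger, but there are far fewer index entries to keep in memory.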
  • 48. Summary • Relationships can be modeled by embedding or by references • The decision should be made in the context of the application’s data and query workload – Tailor the schema to the application workload • It is okay, even recommended, to violate RDBMS schema design principles – No duplication of data – Normalization • Different schemas may result in dramatically different – Query performance – Hardware requirements