SlideShare a Scribd company logo
1 of 47
Schema Design
Solutions Architect, MongoDB
Jay Runkel
#MongoDB
First a story:
Once upon a time there was a
medical records company…
• Schema Design Challenge
• Modeling Relationships in MongoDB
• An Example
• General Recommendations
Agenda
Schema Design Challenges
• Flexibility
– Easily adapt to new requirements
• Agility
– Rapid application development
• Scalability
– Support large data and query volumes
Schema Design Challenge
• How do we model data and relationships to
ensure:
–Flexibility
–Agility
–Scalability
Schema Design Challenge
Schema Design:
MongoDB vs. Relational
MongoDB Relational
Collections Tables
Documents Rows
Data Use Data Storage
What questions do I have? What answers do I have?
MongoDB versus Relational
Attribute MongoDB Relational
Storage N-dimensional Two-dimensional
Field Values
0, 1, many, or
embed
Single value
Query Any field or level Any field
Schema Flexible Very structured
Updates In line In place
With relational, this is hard
Long development times
Inflexible
Doesn’t scale
Document model is much easier
Shorter development times
Flexible
Scalable
{
"patient_id": "1177099",
"first_name": "John",
"last_name": "Doe",
"middle_initial": "A",
"dob": "2000-01-25",
"gender": "Male",
"blood_type": "B+",
"address": "123 Elm St., Chicago, IL 59923",
"height": "66",
"weight": "110",
"allergies": ["Nuts", "Penicillin", "Pet Dander"],
"current_medications": [{"name": "Zoloft",
"dosage": "2mg",
"frequency": "daily",
"route": "orally"}],
"complaint" : [{"entered": "2000-11-03",
"onset": "2000-11-03",
"prob_desc": "",
"icd" : 250.00,
"status" : "Active"},
{"entered": "2000-02-04",
"onset": "2000-02-04",
"prob_desc": "in spite of regular exercise, ...",
"icd" : 401.9,
"status" : "Active"}],
"diagnosis" : [{"visit" : "2005-07-22" ,
"narrative" : "Fractured femur",
"icd" : "9999",
"priority" : "Primary"},
{"visit" : "2005-07-22" ,
"narrative" : "Type II Diabetes",
"icd" : "250.00",
"priority" : "Secondary"}]
}
Modeling Entities
and Relationships
Let’s model something together
How about a business card?
Business Card
Address Book Entity-Relationship
Contacts
• name
• company
• title
Addresses
• type
• street
• city
• state
• zip_code
Phones
• type
• number
Emails
• type
• address
Thumbnail
s
• mime_type
• data
Portraits
• mime_type
• data
Groups
• name
N
1
N
1
N
N
N
1
1
1
11
Twitters
• name
• location
• web
• bio
1
1
Modeling One-to-One Relationships
Referencing
Contact
• name
• company
• title
• phone
Address
• street
• city
• state
• zip_code
Use two collections with a reference
Similar to relational
Contact
• name
• company
• adress
• Street
• City
• State
• Zip
• title
• phone
• address
• street
• city
• State
• zip_code
Embedding
Document Schema
Referencing
Contacts
{
“_id”: 2,
“name”: “Steven Jobs”,
“title”:“VP, New Product Development”,
“company”: “Apple Computer”,
“phone”: “408-996-1010”,
“address_id”: 1
}
Addresses
{
“_id”: 1,
“street”:“10260 Bandley Dr”,
“city”: “Cupertino”,
“state”:“CA”,
“zip_code”:”95014”,
“country”: “USA”
}
Embedding
Contacts
{
“_id”:2,
“name”:“Steven Jobs”,
“title”:“VP,New Product Development”,
“company”: “AppleComputer”,
“address”:{“street”: “10260 BandleyDr”,
“city”: “Cupertino”,
“state”: “CA”,
“zip_code”:”95014”,
“country”: “USA”},
“phone”: “408-996-1010”
}
How are they different? Why?
Contact
• name
• company
• title
• phone
Address
• street
• city
• state
• zip_code
Contact
• name
• company
• adress
• Street
• City
• State
• Zip
• title
• phone
• address
• street
• city
• state
• zip_code
Schema Flexibility
{
“name”: “StevenJobs”,
“title”:“VP, NewProductDevelopment”,
“company”: “AppleComputer”,
“address”:{
“street”: 10260BandleyDr”,
“city”: “Cupertino”,
“state”: “CA”,
“zip_code”:“95014”
},
“phone”:“408-996-1010”
}
{
“name”: “Larry Page,
“url”:“http://google.com”,
“title”:“CEO”,
“company”: “Google!”,
“address”:{
“street”: 555 Bryant, #106”,
“city”: “PaloAlto”,
“state”: “CA”,
“zip_code”:“94301”
},
“phone”: “650-330-0100”
“fax”: ”650-330-1499”
}
One to One
Schema Design Choices
contact
twitter_id
twitter1 1
contact twitter
contact_id1 1
Redundant to track
relationship on both
sides
May save a fetch?
Contact
twitter
twitter 1
One to One: General Recommendations
• Embed
– Full contact info all at once
– Parent-child relationship “contains”
– No additional data duplication
– Can query or index on embedded field
• e.g., “twitter.name”
• Exceptional cases…
• Embedding results in large documents
Contact
twitter
twitter 1
Modeling One-to-Many Relationships
One to Many
Schema Design Choices
contact
phone_ids: [ ]
phone1 N
contact phone
contact_id1 N
Redundant to track
relationship on both
sides
Not possible in relational DBs
Contact
phones
phoneN
One-to-many embedding vs.
referencing
{
“name”: “Larry Page”,
“url”: “http://google.com/”,
“title”: “CEO”,
“company”: “Google!”,
“email”: “larry@google.com”,
“address”: [{
“street”: “555 Bryant, #106”,
“city”: “Palo Alto”,
“state”: “CA”,
“zip_code”: “94301”
}]
“phones”: [{“type”: “Office”,
“number”: “650-618-1499”},
{“type”: “fax”,
“number”: “650-330-0100”}]
}
{
“name”: “Larry Page”,
“url”: “http://google.com/”,
“title”: “CEO”,
“company”: “Google!”,
“email”: “larry@google.com”,
“address”: [“addr99”],
“phones”: [“ph23”, “ph49”]}
{ “_id”: “addr99”,
“street”: “555 Bryant, #106”,
“city”: “Palo Alto”,
“state”: “CA”,
“zip_code”: “94301”}
{ “_id”: “ph23”,
“type”: “Office”,
“number”: “650-618-1499”},
{ “_id”: “ph49”,
“type”: “fax”,
“number”: “650-330-0100”}
One to Many
General Recommendation
• Embed when possible
– Full contact info all at once
– Parent-children relationship “contains”
– No additional data duplication
– Can query or index on any field
• e.g., { “phones.type”: “mobile” }
• Exceptional cases…
• Scaling: maximum document size is 16MB
Contact
phones
phone N
Modeling Many-to-Many
Relationships
Many to Many
Traditional Relational Association
Join table
Contacts
name
company
title
phone
Groups
name
GroupContacts
group_id
contact_id
X
Use arrays instead
Many to Many
Schema Design Choices
group
contact_ids: [ ]
contactN N
group contact
group_ids: [ ]N N
Redundant to track
relationship on both sides
• Both references must be
updated for consistency
Redundant to track
relationship on both sides
• Duplicated data must be
updated for consistency
group
contacts
contact
N
contact
groups
group
N
Many to Many
General Recommendation
• Use case determines whether to reference
or embed:
1. Simple address book
• Contact references groups
2. Corporate email groups
• Group embeds contacts for performance
• Exceptional cases
– Scaling: maximum document size is 16MB
– Scaling may affect performance and working set
group contact
group_ids: [ ]N N
Address Book Entity-Relationship
Contacts
• name
• company
• title
Addresses
• type
• street
• city
• state
• zip_code
Phones
• type
• number
Emails
• type
• address
Thumbnail
s
• mime_type
• data
Portraits
• mime_type
• data
Groups
• name
N
1
N
1
N
N
N
1
1
1
11
Twitters
• name
• location
• web
• bio
1
1
Contacts
• name
• company
• title
addresses
• type
• street
• city
• state
• zip_code
phones
• type
• number
emails
• type
• address
thumbnail
• mime_type
• data
Portraits
• mime_type
• data
Groups
• name
N
1
N
1
twitter
• name
• location
• web
• bio
N
N
N
1
1
Document model - holistic and efficient representation
Contact document example
{
“name”:“GaryJ.Murakami,Ph.D.”,
“company”:“MongoDB,Inc”,
“title”:“LeadEngineerandRubyEvangelist”,
“twitter”:{
“name”:“GaryMurakami”,“location”:“NewProvidence,NJ”,
“web”:“http://www.nobell.org”
},
“portrait_id”:1,
“addresses”:[
{“type”:“work”,“street”:”229W43rdSt.”,“city”:“NewYork”,“zip_code”:“10036”}
],
“phones”:[
{“type”:“work”,“number”:“1-866-237-8815x8015”}
],
“emails”:[
{“type”:“work”,“address”:“gary.murakami@mongodb.com”},
{“type”:“home”,“address”:“gjm@nobell.org”}
]
}
General Recommendations
Legacy Migration
1. Copy existing schema & some data to MongoDB
2. Iterative schema design development
– Measure performance, find bottlenecks, and embed
1. one to one associations first
2. one to many associations next
3. many to many associations
– eliminate join table using array of references or embedded
documents
– Measure and analyze, review concerns, scaling
• Embed by default
New Software Application
Embedding over Referencing
• Embedding is a bit like pre-joined data
– BSON (Binary JSON) document ops are easy for the
server
• Embed (90/10 following rule of thumb)
– When the “one” or “many” objects are viewed in the
context of their parent
– For performance
– For atomicity
• Reference
– When you need more scaling
– For easy consistency with “many to many” associations
without duplicated data
It’s All About Your Application
• Programs+Databases = (Big) Data Applications
• Your schema is the impedance matcher
– Design choices: normalize/denormalize,
reference/embed
– Melds programming with MongoDB for best of both
– Flexible for development and change
• Programs MongoDB = Great Big Data
Applications
Questions?
Thank You
Solutions Architect, MongoDB
Jay Runkel
jay.runkel@mongodb.com
@jayrunkel
#MongoDB

More Related Content

Similar to Webinar: Schema Design

Schema Design by Gary Murakami
Schema Design by Gary MurakamiSchema Design by Gary Murakami
Schema Design by Gary MurakamiMongoDB
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And WhentranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And WhenDavid Peyruc
 
Data_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdfData_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdfjill734733
 
Graph all the things - PRathle
Graph all the things - PRathleGraph all the things - PRathle
Graph all the things - PRathleNeo4j
 
JSON Data Modeling - GDG Indy - April 2020
JSON Data Modeling - GDG Indy - April 2020JSON Data Modeling - GDG Indy - April 2020
JSON Data Modeling - GDG Indy - April 2020Matthew Groves
 
Graphs fun vjug2
Graphs fun vjug2Graphs fun vjug2
Graphs fun vjug2Neo4j
 
Voice Summit 2018 - Millions of Dollars in Helping Customers Through Searchin...
Voice Summit 2018 - Millions of Dollars in Helping Customers Through Searchin...Voice Summit 2018 - Millions of Dollars in Helping Customers Through Searchin...
Voice Summit 2018 - Millions of Dollars in Helping Customers Through Searchin...Noriaki Tatsumi
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 
Introduction: Relational to Graphs
Introduction: Relational to GraphsIntroduction: Relational to Graphs
Introduction: Relational to GraphsNeo4j
 
Schema design mongo_boston
Schema design mongo_bostonSchema design mongo_boston
Schema design mongo_bostonMongoDB
 
The Connected Data Imperative: The Shifting Enterprise Data Story
The Connected Data Imperative: The Shifting Enterprise Data StoryThe Connected Data Imperative: The Shifting Enterprise Data Story
The Connected Data Imperative: The Shifting Enterprise Data StoryNeo4j
 
Schema Design
Schema DesignSchema Design
Schema DesignMongoDB
 
MongoDB Evenings Minneapolis: MongoDB is Cool But When Should I Use It?
MongoDB Evenings Minneapolis: MongoDB is Cool But When Should I Use It?MongoDB Evenings Minneapolis: MongoDB is Cool But When Should I Use It?
MongoDB Evenings Minneapolis: MongoDB is Cool But When Should I Use It?MongoDB
 
Single View of the Customer
Single View of the Customer Single View of the Customer
Single View of the Customer MongoDB
 
How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB MongoDB
 
Webinar: Realizing Omni-Channel Retailing with MongoDB - One Step at a Time
Webinar: Realizing Omni-Channel Retailing with MongoDB - One Step at a TimeWebinar: Realizing Omni-Channel Retailing with MongoDB - One Step at a Time
Webinar: Realizing Omni-Channel Retailing with MongoDB - One Step at a TimeMongoDB
 
Thwart Fraud Using Graph-Enhanced Machine Learning and AI
Thwart Fraud Using Graph-Enhanced Machine Learning and AIThwart Fraud Using Graph-Enhanced Machine Learning and AI
Thwart Fraud Using Graph-Enhanced Machine Learning and AINeo4j
 
Docker Summit MongoDB - Data Democratization
Docker Summit MongoDB - Data Democratization Docker Summit MongoDB - Data Democratization
Docker Summit MongoDB - Data Democratization Chris Grabosky
 
Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016
Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016
Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016Calin Constantinov
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Designaaronheckmann
 

Similar to Webinar: Schema Design (20)

Schema Design by Gary Murakami
Schema Design by Gary MurakamiSchema Design by Gary Murakami
Schema Design by Gary Murakami
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And WhentranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
 
Data_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdfData_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdf
 
Graph all the things - PRathle
Graph all the things - PRathleGraph all the things - PRathle
Graph all the things - PRathle
 
JSON Data Modeling - GDG Indy - April 2020
JSON Data Modeling - GDG Indy - April 2020JSON Data Modeling - GDG Indy - April 2020
JSON Data Modeling - GDG Indy - April 2020
 
Graphs fun vjug2
Graphs fun vjug2Graphs fun vjug2
Graphs fun vjug2
 
Voice Summit 2018 - Millions of Dollars in Helping Customers Through Searchin...
Voice Summit 2018 - Millions of Dollars in Helping Customers Through Searchin...Voice Summit 2018 - Millions of Dollars in Helping Customers Through Searchin...
Voice Summit 2018 - Millions of Dollars in Helping Customers Through Searchin...
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 
Introduction: Relational to Graphs
Introduction: Relational to GraphsIntroduction: Relational to Graphs
Introduction: Relational to Graphs
 
Schema design mongo_boston
Schema design mongo_bostonSchema design mongo_boston
Schema design mongo_boston
 
The Connected Data Imperative: The Shifting Enterprise Data Story
The Connected Data Imperative: The Shifting Enterprise Data StoryThe Connected Data Imperative: The Shifting Enterprise Data Story
The Connected Data Imperative: The Shifting Enterprise Data Story
 
Schema Design
Schema DesignSchema Design
Schema Design
 
MongoDB Evenings Minneapolis: MongoDB is Cool But When Should I Use It?
MongoDB Evenings Minneapolis: MongoDB is Cool But When Should I Use It?MongoDB Evenings Minneapolis: MongoDB is Cool But When Should I Use It?
MongoDB Evenings Minneapolis: MongoDB is Cool But When Should I Use It?
 
Single View of the Customer
Single View of the Customer Single View of the Customer
Single View of the Customer
 
How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB
 
Webinar: Realizing Omni-Channel Retailing with MongoDB - One Step at a Time
Webinar: Realizing Omni-Channel Retailing with MongoDB - One Step at a TimeWebinar: Realizing Omni-Channel Retailing with MongoDB - One Step at a Time
Webinar: Realizing Omni-Channel Retailing with MongoDB - One Step at a Time
 
Thwart Fraud Using Graph-Enhanced Machine Learning and AI
Thwart Fraud Using Graph-Enhanced Machine Learning and AIThwart Fraud Using Graph-Enhanced Machine Learning and AI
Thwart Fraud Using Graph-Enhanced Machine Learning and AI
 
Docker Summit MongoDB - Data Democratization
Docker Summit MongoDB - Data Democratization Docker Summit MongoDB - Data Democratization
Docker Summit MongoDB - Data Democratization
 
Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016
Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016
Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
 

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 

Recently uploaded (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 

Webinar: Schema Design

  • 1. Schema Design Solutions Architect, MongoDB Jay Runkel #MongoDB
  • 2. First a story: Once upon a time there was a medical records company…
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8. • Schema Design Challenge • Modeling Relationships in MongoDB • An Example • General Recommendations Agenda
  • 10. • Flexibility – Easily adapt to new requirements • Agility – Rapid application development • Scalability – Support large data and query volumes Schema Design Challenge
  • 11. • How do we model data and relationships to ensure: –Flexibility –Agility –Scalability Schema Design Challenge
  • 13. MongoDB Relational Collections Tables Documents Rows Data Use Data Storage What questions do I have? What answers do I have? MongoDB versus Relational
  • 14. Attribute MongoDB Relational Storage N-dimensional Two-dimensional Field Values 0, 1, many, or embed Single value Query Any field or level Any field Schema Flexible Very structured Updates In line In place
  • 15. With relational, this is hard Long development times Inflexible Doesn’t scale
  • 16. Document model is much easier Shorter development times Flexible Scalable { "patient_id": "1177099", "first_name": "John", "last_name": "Doe", "middle_initial": "A", "dob": "2000-01-25", "gender": "Male", "blood_type": "B+", "address": "123 Elm St., Chicago, IL 59923", "height": "66", "weight": "110", "allergies": ["Nuts", "Penicillin", "Pet Dander"], "current_medications": [{"name": "Zoloft", "dosage": "2mg", "frequency": "daily", "route": "orally"}], "complaint" : [{"entered": "2000-11-03", "onset": "2000-11-03", "prob_desc": "", "icd" : 250.00, "status" : "Active"}, {"entered": "2000-02-04", "onset": "2000-02-04", "prob_desc": "in spite of regular exercise, ...", "icd" : 401.9, "status" : "Active"}], "diagnosis" : [{"visit" : "2005-07-22" , "narrative" : "Fractured femur", "icd" : "9999", "priority" : "Primary"}, {"visit" : "2005-07-22" , "narrative" : "Type II Diabetes", "icd" : "250.00", "priority" : "Secondary"}] }
  • 18. Let’s model something together How about a business card?
  • 20. Address Book Entity-Relationship Contacts • name • company • title Addresses • type • street • city • state • zip_code Phones • type • number Emails • type • address Thumbnail s • mime_type • data Portraits • mime_type • data Groups • name N 1 N 1 N N N 1 1 1 11 Twitters • name • location • web • bio 1 1
  • 22. Referencing Contact • name • company • title • phone Address • street • city • state • zip_code Use two collections with a reference Similar to relational
  • 23. Contact • name • company • adress • Street • City • State • Zip • title • phone • address • street • city • State • zip_code Embedding Document Schema
  • 24. Referencing Contacts { “_id”: 2, “name”: “Steven Jobs”, “title”:“VP, New Product Development”, “company”: “Apple Computer”, “phone”: “408-996-1010”, “address_id”: 1 } Addresses { “_id”: 1, “street”:“10260 Bandley Dr”, “city”: “Cupertino”, “state”:“CA”, “zip_code”:”95014”, “country”: “USA” }
  • 25. Embedding Contacts { “_id”:2, “name”:“Steven Jobs”, “title”:“VP,New Product Development”, “company”: “AppleComputer”, “address”:{“street”: “10260 BandleyDr”, “city”: “Cupertino”, “state”: “CA”, “zip_code”:”95014”, “country”: “USA”}, “phone”: “408-996-1010” }
  • 26. How are they different? Why? Contact • name • company • title • phone Address • street • city • state • zip_code Contact • name • company • adress • Street • City • State • Zip • title • phone • address • street • city • state • zip_code
  • 27. Schema Flexibility { “name”: “StevenJobs”, “title”:“VP, NewProductDevelopment”, “company”: “AppleComputer”, “address”:{ “street”: 10260BandleyDr”, “city”: “Cupertino”, “state”: “CA”, “zip_code”:“95014” }, “phone”:“408-996-1010” } { “name”: “Larry Page, “url”:“http://google.com”, “title”:“CEO”, “company”: “Google!”, “address”:{ “street”: 555 Bryant, #106”, “city”: “PaloAlto”, “state”: “CA”, “zip_code”:“94301” }, “phone”: “650-330-0100” “fax”: ”650-330-1499” }
  • 28. One to One Schema Design Choices contact twitter_id twitter1 1 contact twitter contact_id1 1 Redundant to track relationship on both sides May save a fetch? Contact twitter twitter 1
  • 29. One to One: General Recommendations • Embed – Full contact info all at once – Parent-child relationship “contains” – No additional data duplication – Can query or index on embedded field • e.g., “twitter.name” • Exceptional cases… • Embedding results in large documents Contact twitter twitter 1
  • 31. One to Many Schema Design Choices contact phone_ids: [ ] phone1 N contact phone contact_id1 N Redundant to track relationship on both sides Not possible in relational DBs Contact phones phoneN
  • 32. One-to-many embedding vs. referencing { “name”: “Larry Page”, “url”: “http://google.com/”, “title”: “CEO”, “company”: “Google!”, “email”: “larry@google.com”, “address”: [{ “street”: “555 Bryant, #106”, “city”: “Palo Alto”, “state”: “CA”, “zip_code”: “94301” }] “phones”: [{“type”: “Office”, “number”: “650-618-1499”}, {“type”: “fax”, “number”: “650-330-0100”}] } { “name”: “Larry Page”, “url”: “http://google.com/”, “title”: “CEO”, “company”: “Google!”, “email”: “larry@google.com”, “address”: [“addr99”], “phones”: [“ph23”, “ph49”]} { “_id”: “addr99”, “street”: “555 Bryant, #106”, “city”: “Palo Alto”, “state”: “CA”, “zip_code”: “94301”} { “_id”: “ph23”, “type”: “Office”, “number”: “650-618-1499”}, { “_id”: “ph49”, “type”: “fax”, “number”: “650-330-0100”}
  • 33. One to Many General Recommendation • Embed when possible – Full contact info all at once – Parent-children relationship “contains” – No additional data duplication – Can query or index on any field • e.g., { “phones.type”: “mobile” } • Exceptional cases… • Scaling: maximum document size is 16MB Contact phones phone N
  • 35. Many to Many Traditional Relational Association Join table Contacts name company title phone Groups name GroupContacts group_id contact_id X Use arrays instead
  • 36. Many to Many Schema Design Choices group contact_ids: [ ] contactN N group contact group_ids: [ ]N N Redundant to track relationship on both sides • Both references must be updated for consistency Redundant to track relationship on both sides • Duplicated data must be updated for consistency group contacts contact N contact groups group N
  • 37. Many to Many General Recommendation • Use case determines whether to reference or embed: 1. Simple address book • Contact references groups 2. Corporate email groups • Group embeds contacts for performance • Exceptional cases – Scaling: maximum document size is 16MB – Scaling may affect performance and working set group contact group_ids: [ ]N N
  • 38. Address Book Entity-Relationship Contacts • name • company • title Addresses • type • street • city • state • zip_code Phones • type • number Emails • type • address Thumbnail s • mime_type • data Portraits • mime_type • data Groups • name N 1 N 1 N N N 1 1 1 11 Twitters • name • location • web • bio 1 1
  • 39. Contacts • name • company • title addresses • type • street • city • state • zip_code phones • type • number emails • type • address thumbnail • mime_type • data Portraits • mime_type • data Groups • name N 1 N 1 twitter • name • location • web • bio N N N 1 1 Document model - holistic and efficient representation
  • 42. Legacy Migration 1. Copy existing schema & some data to MongoDB 2. Iterative schema design development – Measure performance, find bottlenecks, and embed 1. one to one associations first 2. one to many associations next 3. many to many associations – eliminate join table using array of references or embedded documents – Measure and analyze, review concerns, scaling
  • 43. • Embed by default New Software Application
  • 44. Embedding over Referencing • Embedding is a bit like pre-joined data – BSON (Binary JSON) document ops are easy for the server • Embed (90/10 following rule of thumb) – When the “one” or “many” objects are viewed in the context of their parent – For performance – For atomicity • Reference – When you need more scaling – For easy consistency with “many to many” associations without duplicated data
  • 45. It’s All About Your Application • Programs+Databases = (Big) Data Applications • Your schema is the impedance matcher – Design choices: normalize/denormalize, reference/embed – Melds programming with MongoDB for best of both – Flexible for development and change • Programs MongoDB = Great Big Data Applications
  • 47. Thank You Solutions Architect, MongoDB Jay Runkel jay.runkel@mongodb.com @jayrunkel #MongoDB

Editor's Notes

  1. Schema Design is very important; its impact on your application is pervasive.
  2. Wrong data structure will hurt you.Proper data structure can make all the pieces fall into place.
  3. Two-dimensional storage of ordered tuples or traditional records.The winning technology is that every field/value is first class,In essence, every field can be addressed in queries and can be indexed for faster processing.Normalization process requires many tables, joins to rehydrate, indexes to make joins faster, and results in poor data locality.
  4. The essential capability of the winningtechnology frompersists and gets even better.The document structure can match your data structures – your schema.
  5. One-dimensional storage can be very fast but is relatively limited with respect to other DBMS.
  6. Not “schema-less” but rather “flexible schema”Common structure can be enforced by applicationWhile MongoDB does not enforce common structure, neither does it restrict your applicationDocuments may have a common structure that is optionally extended at the document-levelExample problems for traditionalMany empty columns instead of subclassing via yet another tableThree days for schema migrationKeywords: flexible, choice, evolve, change, modify
  7. Concept of arrays incorporates multiple values, associations involving many entities.The lack of multivalued fields is usually the first complaint of programmers that don’t wish to pay the cost for normalization.Keywords: array, multiple, many
  8. Documents may have a common structure that is optionally extended at the document-level.The application mapping can enforce the required and optional fields.
  9. A common example will help us understand the joy of flexible document structure.
  10. Left: One to one We're going to assume users only have on Twitter account. A thumbnail is a small profile image while portrait is a very large profile image.Right: One to manyMiddle: Many to many
  11. $project allows you to select top level fields and can be used to reduce data for a fetch. Note that some ODMs may not allow you to specify $project.
  12. BSON (Binary JSON) is the “magic” or core technology in MongoDB for data structures and performance.BSON does not have to be parsed like JSON, but is rather a format that can be traversed easily.Can choose a language to fit your application, or multiple languages to fit multiple components of your application as appropriate.