Webinar: Schema Design

Schema Design
Solutions Architect, MongoDB
Jay Runkel
#MongoDB

First a story:
Once upon a time there was a
medical records company…

Agenda
• Schema Design Challenge
• Modeling Relationships in MongoDB
• An Example
• General Recommendations

Schema Design Challenge
• Flexibility
– Easily adapt to new requirements
• Agility
– Rapid application development
• Scalability
– Support large data and query volumes

Schema Design Challenge
• How do we model data and relationships to ensure:
– Flexibility
– Agility
– Scalability

Schema Design:
MongoDB vs. Relational

MongoDB versus Relational

MongoDB                      Relational
Collections                  Tables
Documents                    Rows
Data Use                     Data Storage
What questions do I have?    What answers do I have?

Attribute       MongoDB                 Relational
Storage         N-dimensional           Two-dimensional
Field Values    0, 1, many, or embed    Single value
Query           Any field or level      Any field
Schema          Flexible                Very structured
Updates         In line                 In place

With relational, this is hard
Long development times
Inflexible
Doesn’t scale

Document model is much easier
Shorter development times
Flexible
Scalable
{
"patient_id": "1177099",
"first_name": "John",
"last_name": "Doe",
"middle_initial": "A",
"dob": "2000-01-25",
"gender": "Male",
"blood_type": "B+",
"address": "123 Elm St., Chicago, IL 59923",
"height": "66",
"weight": "110",
"allergies": ["Nuts", "Penicillin", "Pet Dander"],
"current_medications": [{"name": "Zoloft",
"dosage": "2mg",
"frequency": "daily",
"route": "orally"}],
"complaint" : [{"entered": "2000-11-03",
"onset": "2000-11-03",
"prob_desc": "",
"icd" : 250.00,
"status" : "Active"},
{"entered": "2000-02-04",
"onset": "2000-02-04",
"prob_desc": "in spite of regular exercise, ...",
"icd" : 401.9,
"status" : "Active"}],
"diagnosis" : [{"visit" : "2005-07-22" ,
"narrative" : "Fractured femur",
"icd" : "9999",
"priority" : "Primary"},
{"visit" : "2005-07-22" ,
"narrative" : "Type II Diabetes",
"icd" : "250.00",
"priority" : "Secondary"}]
}

Modeling Entities
and Relationships

Let’s model something together
How about a business card?

Address Book Entity-Relationship
Contacts
• name
• company
• title
Addresses
• type
• street
• city
• state
• zip_code
Phones
• type
• number
Emails
• type
• address
Thumbnails
• mime_type
• data
Portraits
• mime_type
• data
Groups
• name
Twitters
• name
• location
• web
• bio
[Diagram: relationship lines between these entities, labeled with 1 and N cardinalities]

Modeling One-to-One Relationships

Referencing
Contact
• name
• company
• title
• phone
Address
• street
• city
• state
• zip_code
Use two collections with a reference
Similar to relational

Embedding
Document Schema
Contact
• name
• company
• title
• phone
• address
– street
– city
– state
– zip_code

Referencing
Contacts
{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "phone": "408-996-1010",
  "address_id": 1
}
Addresses
{
  "_id": 1,
  "street": "10260 Bandley Dr",
  "city": "Cupertino",
  "state": "CA",
  "zip_code": "95014",
  "country": "USA"
}

Embedding
Contacts
{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "state": "CA",
    "zip_code": "95014",
    "country": "USA"
  },
  "phone": "408-996-1010"
}

How are they different? Why?
Referencing
Contact
• name
• company
• title
• phone
Address
• street
• city
• state
• zip_code
Embedding
Contact
• name
• company
• title
• phone
• address
– street
– city
– state
– zip_code
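
One way to make the difference concrete is to count lookups. The sketch below is plain Python, with dictionaries standing in for collections; it does not use a MongoDB driver. The helper `find_by_id`, the counter, and the variable names are assumptions made for illustration, and each call to `find_by_id` is meant to model one round trip to the database.

```python
# Dictionaries keyed by _id stand in for MongoDB collections (illustration only).
addresses = {
    1: {"_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino",
        "state": "CA", "zip_code": "95014", "country": "USA"},
}

contacts_referencing = {
    2: {"_id": 2, "name": "Steven Jobs", "phone": "408-996-1010",
        "address_id": 1},
}

contacts_embedding = {
    2: {"_id": 2, "name": "Steven Jobs", "phone": "408-996-1010",
        "address": {"street": "10260 Bandley Dr", "state": "CA",
                    "zip_code": "95014", "country": "USA"}},
}


def find_by_id(collection, doc_id, counter):
    """Hypothetical helper: one call models one round trip to the database."""
    counter["lookups"] += 1
    return collection[doc_id]


def street_with_reference(contact_id):
    """Referencing: fetch the contact, then follow address_id to the address."""
    counter = {"lookups": 0}
    contact = find_by_id(contacts_referencing, contact_id, counter)
    address = find_by_id(addresses, contact["address_id"], counter)
    return address["street"], counter["lookups"]


def street_with_embedding(contact_id):
    """Embedding: the address arrives inside the contact document."""
    counter = {"lookups": 0}
    contact = find_by_id(contacts_embedding, contact_id, counter)
    return contact["address"]["street"], counter["lookups"]
```

Under these assumptions the referencing version needs two lookups to reach the street and the embedding version needs one, which is the trade-off the two diagrams describe.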

Schema Flexibility
{
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "zip_code": "95014"
  },
  "phone": "408-996-1010"
}
{
  "name": "Larry Page",
  "url": "http://google.com",
  "title": "CEO",
  "company": "Google!",
  "address": {
    "street": "555 Bryant, #106",
    "city": "Palo Alto",
    "zip_code": "94301"
  },
  "phone": "650-330-0100",
  "fax": "650-330-1499"
}
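
Because the two documents have different fields, application code that reads them has to tolerate missing ones. The following is a minimal sketch in plain Python (no MongoDB driver); the function names `city_of` and `fields_in_use` are hypothetical and exist only for this illustration.

```python
# Two differently shaped contact documents in the same (simulated) collection.
contacts = [
    {"name": "Steven Jobs", "title": "VP, New Product Development",
     "company": "Apple Computer",
     "address": {"street": "10260 Bandley Dr", "zip_code": "95014"},
     "phone": "408-996-1010"},
    {"name": "Larry Page", "url": "http://google.com", "title": "CEO",
     "company": "Google!",
     "address": {"street": "555 Bryant, #106", "city": "Palo Alto",
                 "zip_code": "94301"},
     "phone": "650-330-0100", "fax": "650-330-1499"},
]


def city_of(contact):
    """Return the city if the document has one, otherwise None."""
    return contact.get("address", {}).get("city")


def fields_in_use(documents):
    """Union of top-level field names across differently shaped documents."""
    names = set()
    for doc in documents:
        names.update(doc.keys())
    return sorted(names)
```

Here `city_of` returns `None` for the first contact rather than failing, and `fields_in_use` shows that "url" and "fax" exist only because one document happened to carry them.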

One to One
Schema Design Choices
• Reference from contact: contact { twitter_id } 1 to 1 twitter
• Reference from twitter: twitter { contact_id } 1 to 1 contact
• Redundant to track relationship on both sides
• May save a fetch?
• Embed: Contact { twitter } 1

One to One: General Recommendations
• Embed
– Full contact info all at once
– Parent-child relationship “contains”
– No additional data duplication
– Can query or index on embedded field
• e.g., "twitter.name"
• Exceptional cases…
– Embedding results in large documents
Diagram: Contact embeds twitter (1)

Modeling One-to-Many Relationships

One to Many
• Reference from contact: contact { phone_ids: [ ] } 1 to N phone
• Reference from phone: phone { contact_id } N to 1 contact
• Redundant to track relationship on both sides
• Not possible in relational DBs
• Embed: Contact { phones: [ phone ] } N

One-to-many embedding vs. referencing
Embedding
{
  "name": "Larry Page",
  "url": "http://google.com/",
  "title": "CEO",
  "email": "larry@google.com",
  "address": [{
    "street": "555 Bryant, #106",
    "city": "Palo Alto",
    "zip_code": "94301"
  }],
  "phones": [{"type": "Office",
              "number": "650-618-1499"},
             {"type": "fax",
              "number": "650-330-0100"}]
}
Referencing
{
  "name": "Larry Page",
  "url": "http://google.com/",
  "title": "CEO",
  "email": "larry@google.com",
  "address": ["addr99"],
  "phones": ["ph23", "ph49"]
}
{ "_id": "addr99",
  "street": "555 Bryant, #106",
  "city": "Palo Alto",
  "zip_code": "94301" }
{ "_id": "ph23",
  "type": "Office",
  "number": "650-618-1499" }
{ "_id": "ph49",
  "type": "fax",
  "number": "650-330-0100" }

One to Many
General Recommendation
• Embed when possible
– Full contact info all at once
– Parent-children relationship “contains”
– No additional data duplication
– Can query or index on any field
• e.g., { "phones.type": "mobile" }
• Exceptional cases…
– Scaling: maximum document size is 16MB
Diagram: Contact embeds phones (N)
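
The dotted query above reaches into an array of embedded documents: a contact matches if any element of "phones" has the requested "type". The sketch below imitates that matching rule in plain Python as a simplified model, not the server's actual implementation. The helpers `values_at` and `find`, and the second contact ("Example Contact" with made-up values), are assumptions for illustration.

```python
def values_at(doc, path):
    """Collect every value reachable at a dotted path, descending into arrays."""
    current = [doc]
    for key in path.split("."):
        next_level = []
        for item in current:
            # An array contributes each of its elements; anything else, itself.
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    next_level.append(candidate[key])
        current = next_level
    return current


def find(documents, path, value):
    """Return the names of documents where some value at `path` equals `value`."""
    return [d["name"] for d in documents if value in values_at(d, path)]


contacts = [
    {"name": "Larry Page",
     "phones": [{"type": "Office", "number": "650-618-1499"},
                {"type": "fax", "number": "650-330-0100"}]},
    # Hypothetical contact, added only so the "mobile" query has a match.
    {"name": "Example Contact",
     "twitter": {"name": "example"},
     "phones": [{"type": "mobile", "number": "555-0100"}]},
]
```

With this data, `find(contacts, "phones.type", "mobile")` selects only the hypothetical contact, and the same helper handles a one-to-one path such as "twitter.name" without any change.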

Modeling Many-to-Many
Relationships

Many to Many
Traditional Relational Association
Join table
Contacts
• name
• company
• title
• phone
Groups
• name
GroupContacts
• group_id
• contact_id
Use arrays instead of the join table

Many to Many
• Reference from group: group { contact_ids: [ ] } N to N contact
• Reference from contact: contact { group_ids: [ ] } N to N group
• Redundant to track relationship on both sides
– Both references must be updated for consistency
• Embed in group: group { contacts: [ contact ] } N
• Embed in contact: contact { groups: [ group ] } N
• Redundant to track relationship on both sides
– Duplicated data must be updated for consistency

Many to Many
General Recommendation
• Use case determines whether to reference or embed:
1. Simple address book
• Contact references groups
2. Corporate email groups
• Group embeds contacts for performance
• Exceptional cases
– Scaling: maximum document size is 16MB
– Scaling may affect performance and working set
Diagram: contact { group_ids: [ ] } N to N group
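
For the simple address book case, the relationship can live on one side only, as an array of group references inside each contact. The sketch below is plain Python with in-memory data (no MongoDB driver); all names and values in it are hypothetical.

```python
# Groups keyed by _id; each contact carries an array of group references.
groups = {
    "g1": {"_id": "g1", "name": "Friends"},
    "g2": {"_id": "g2", "name": "Work"},
}

contacts = [
    {"_id": "c1", "name": "Alice", "group_ids": ["g1", "g2"]},
    {"_id": "c2", "name": "Bob", "group_ids": ["g2"]},
]


def groups_of(contact):
    """Follow the contact's references to get its group names."""
    return [groups[group_id]["name"] for group_id in contact["group_ids"]]


def contacts_in(group_id):
    """Find members by scanning contacts for the group reference."""
    return [c["name"] for c in contacts if group_id in c["group_ids"]]
```

Both directions of the relationship can be answered from the single `group_ids` array, so there is no second copy of the relationship to keep consistent.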

Document model - holistic and efficient representation
Contacts
• name
• company
• title
• addresses
– type
– street
– city
– state
– zip_code
• phones
– type
– number
• emails
– type
– address
• thumbnail
– mime_type
– data
• twitter
– name
– location
– web
– bio
Portraits
• mime_type
• data
Groups
• name
[Diagram: relationship lines between Contacts, Portraits, and Groups, labeled with 1 and N cardinalities]

Contact document example
{
  "name": "Gary J. Murakami, Ph.D.",
  "company": "MongoDB, Inc",
  "title": "Lead Engineer and Ruby Evangelist",
  "twitter": {
    "name": "GaryMurakami", "location": "New Providence, NJ",
    "web": "http://www.nobell.org"
  },
  "portrait_id": 1,
  "addresses": [
    {"type": "work", "street": "229 W 43rd St.", "city": "New York", "zip_code": "10036"}
  ],
  "phones": [
    {"type": "work", "number": "1-866-237-8815 x8015"}
  ],
  "emails": [
    {"type": "work", "address": "gary.murakami@mongodb.com"},
    {"type": "home", "address": "gjm@nobell.org"}
  ]
}

Legacy Migration
1. Copy existing schema & some data to MongoDB
2. Iterative schema design development
– Measure performance, find bottlenecks, and embed
  1. One-to-one associations first
  2. One-to-many associations next
  3. Many-to-many associations
– Eliminate join table using array of references or embedded documents
– Measure and analyze, review concerns, scaling
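
The join-table step can be sketched as a small transformation: rows copied from a GroupContacts table are folded into an array of references on each contact. This is plain Python over in-memory data, not a migration tool; the function name `fold_join_table` and the sample rows are assumptions for illustration.

```python
from collections import defaultdict

# Rows as they might look after copying the relational tables (hypothetical data).
contacts = [
    {"_id": 1, "name": "Alice"},
    {"_id": 2, "name": "Bob"},
    {"_id": 3, "name": "Carol"},
]

group_contacts = [
    {"group_id": 10, "contact_id": 1},
    {"group_id": 20, "contact_id": 1},
    {"group_id": 20, "contact_id": 2},
]


def fold_join_table(contact_docs, join_rows):
    """Replace join-table rows with a group_ids array on each contact."""
    group_ids_by_contact = defaultdict(list)
    for row in join_rows:
        group_ids_by_contact[row["contact_id"]].append(row["group_id"])
    # Build new documents; contacts without rows get an empty array.
    return [
        dict(doc, group_ids=group_ids_by_contact.get(doc["_id"], []))
        for doc in contact_docs
    ]


migrated = fold_join_table(contacts, group_contacts)
```

After this step the GroupContacts rows are no longer needed, since each contact document carries its own list of group references.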

New Software Application
• Embed by default

Embedding over Referencing
• Embedding is a bit like pre-joined data
– BSON (Binary JSON) document ops are easy for the server
• Embed (rule of thumb: 90/10)
– When the “one” or “many” objects are viewed in the context of their parent
– For performance
– For atomicity
• Reference
– When you need more scaling
– For easy consistency with “many to many” associations without duplicated data

It’s All About Your Application
• Programs + Databases = (Big) Data Applications
• Your schema is the impedance matcher
– Design choices: normalize/denormalize, reference/embed
– Melds programming with MongoDB for best of both
– Flexible for development and change
• Programs × MongoDB = Great Big Data Applications

Thank You
Solutions Architect, MongoDB
Jay Runkel
jay.runkel@mongodb.com
@jayrunkel
#MongoDB
