Schema Design
Solutions Architect, MongoDB
Jay Runkel
#MongoDB
Webinar: Schema Design
One of the challenges that comes with moving to MongoDB is figuring how to best model your data. While most developers have internalized the rules of thumb for designing schemas for relational databases, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have a lot of viable alternatives to the standard, normalized, relational model. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense.

Published in: Technology
  • Schema Design is very important; its impact on your application is pervasive.
  • The wrong data structure will hurt you. The proper data structure can make all the pieces fall into place.
  • Two-dimensional storage of ordered tuples or traditional records. The winning technology is that every field/value is first class. In essence, every field can be addressed in queries and can be indexed for faster processing. The normalization process requires many tables, joins to rehydrate, and indexes to make joins faster, and it results in poor data locality.
  • The essential capability of the winning technology persists and gets even better. The document structure can match your data structures – your schema.
  • One-dimensional storage can be very fast but is relatively limited with respect to other DBMS.
  • Not “schema-less” but rather “flexible schema.” A common structure can be enforced by the application. While MongoDB does not enforce a common structure, neither does it restrict your application. Documents may have a common structure that is optionally extended at the document level. Example problems for the traditional approach: many empty columns instead of subclassing via yet another table; three days for a schema migration. Keywords: flexible, choice, evolve, change, modify
  • The concept of arrays incorporates multiple values and associations involving many entities. The lack of multivalued fields is usually the first complaint of programmers who don’t wish to pay the cost of normalization. Keywords: array, multiple, many
  • Documents may have a common structure that is optionally extended at the document level. The application mapping can enforce the required and optional fields.
  • A common example will help us understand the joy of flexible document structure.
  • Left: One to one. We're going to assume users only have one Twitter account. A thumbnail is a small profile image, while a portrait is a very large profile image. Right: One to many. Middle: Many to many.
  • $project allows you to select top level fields and can be used to reduce data for a fetch. Note that some ODMs may not allow you to specify $project.
  • BSON (Binary JSON) is the “magic” or core technology in MongoDB for data structures and performance. BSON does not have to be parsed like JSON, but is rather a format that can be traversed easily. You can choose a language to fit your application, or multiple languages to fit multiple components of your application as appropriate.

    1. 1. Schema Design Solutions Architect, MongoDB Jay Runkel #MongoDB
    2. 2. First a story: Once upon a time there was a medical records company…
    3. 3. Agenda • Schema Design Challenge • Modeling Relationships in MongoDB • An Example • General Recommendations
    4. 4. Schema Design Challenges
    5. 5. Schema Design Challenge • Flexibility – Easily adapt to new requirements • Agility – Rapid application development • Scalability – Support large data and query volumes
    6. 6. Schema Design Challenge • How do we model data and relationships to ensure: – Flexibility – Agility – Scalability
    7. 7. Schema Design: MongoDB vs. Relational
    8. 8. MongoDB versus Relational
          MongoDB                     Relational
          Collections                 Tables
          Documents                   Rows
          Data Use                    Data Storage
          What questions do I have?   What answers do I have?
    9. 9. Attribute     MongoDB                 Relational
          Storage       N-dimensional           Two-dimensional
          Field Values  0, 1, many, or embed    Single value
          Query         Any field or level      Any field
          Schema        Flexible                Very structured
          Updates       In line                 In place
    10. 10. With relational, this is hard: long development times, inflexible, doesn’t scale
    11. 11. Document model is much easier: shorter development times, flexible, scalable
        {
          "patient_id": "1177099",
          "first_name": "John",
          "last_name": "Doe",
          "middle_initial": "A",
          "dob": "2000-01-25",
          "gender": "Male",
          "blood_type": "B+",
          "address": "123 Elm St., Chicago, IL 59923",
          "height": "66",
          "weight": "110",
          "allergies": ["Nuts", "Penicillin", "Pet Dander"],
          "current_medications": [
            {"name": "Zoloft", "dosage": "2mg", "frequency": "daily", "route": "orally"}
          ],
          "complaint": [
            {"entered": "2000-11-03", "onset": "2000-11-03", "prob_desc": "", "icd": 250.00, "status": "Active"},
            {"entered": "2000-02-04", "onset": "2000-02-04", "prob_desc": "in spite of regular exercise, ...", "icd": 401.9, "status": "Active"}
          ],
          "diagnosis": [
            {"visit": "2005-07-22", "narrative": "Fractured femur", "icd": "9999", "priority": "Primary"},
            {"visit": "2005-07-22", "narrative": "Type II Diabetes", "icd": "250.00", "priority": "Secondary"}
          ]
        }
    12. 12. Modeling Entities and Relationships
    13. 13. Let’s model something together How about a business card?
    14. 14. Business Card
    15. 15. Address Book Entity-Relationship (diagram) • Contacts: name, company, title • Addresses: type, street, city, state, zip_code • Phones: type, number • Emails: type, address • Thumbnails: mime_type, data • Portraits: mime_type, data • Groups: name • Twitters: name, location, web, bio • Relationships in the diagram are labeled with 1 and N cardinalities
    16. 16. Modeling One-to-One Relationships
    17. 17. Referencing • Contact: name, company, title, phone • Address: street, city, state, zip_code • Use two collections with a reference • Similar to relational
    18. 18. Embedding • Contact: name, company, title, phone, address (street, city, state, zip_code) • Document schema
    19. 19. Referencing
        Contacts
        { "_id": 2, "name": "Steven Jobs", "title": "VP, New Product Development", "company": "Apple Computer", "phone": "408-996-1010", "address_id": 1 }
        Addresses
        { "_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA", "zip_code": "95014", "country": "USA" }
    20. 20. Embedding
        Contacts
        { "_id": 2, "name": "Steven Jobs", "title": "VP, New Product Development", "company": "Apple Computer",
          "address": { "street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA", "zip_code": "95014", "country": "USA" },
          "phone": "408-996-1010" }
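
The difference between the two layouts above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: plain dicts stand in for the Contacts and Addresses collections, and the helper names `city_of_referenced` and `city_of_embedded` are hypothetical, not part of any MongoDB driver.

```python
# Hypothetical in-memory stand-in for the Addresses collection, keyed by _id.
addresses = {
    1: {"_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino",
        "state": "CA", "zip_code": "95014", "country": "USA"},
}

# Referencing: the contact stores only the address _id.
referenced_contact = {
    "_id": 2, "name": "Steven Jobs", "title": "VP, New Product Development",
    "company": "Apple Computer", "phone": "408-996-1010", "address_id": 1,
}

# Embedding: the address lives inside the contact document.
embedded_contact = {
    "_id": 2, "name": "Steven Jobs", "title": "VP, New Product Development",
    "company": "Apple Computer", "phone": "408-996-1010",
    "address": {"street": "10260 Bandley Dr", "city": "Cupertino",
                "state": "CA", "zip_code": "95014", "country": "USA"},
}


def city_of_referenced(contact, address_collection):
    """Needs a second lookup: follow address_id into the other collection."""
    address = address_collection[contact["address_id"]]
    return address["city"]


def city_of_embedded(contact):
    """Needs no second lookup: the address arrives with the contact."""
    return contact["address"]["city"]


print(city_of_referenced(referenced_contact, addresses))
print(city_of_embedded(embedded_contact))
```

Both calls return the same city; the referenced form simply pays for an extra fetch, which is the trade-off the following slides discuss.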
    21. 21. How are they different? Why? • Referencing: Contact (name, company, title, phone) and a separate Address (street, city, state, zip_code) • Embedding: Contact (name, company, title, phone, address (street, city, state, zip_code))
    22. 22. Schema Flexibility
        { "name": "Steven Jobs", "title": "VP, New Product Development", "company": "Apple Computer",
          "address": { "street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA", "zip_code": "95014" },
          "phone": "408-996-1010" }
        { "name": "Larry Page", "url": "http://google.com", "title": "CEO", "company": "Google!",
          "address": { "street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301" },
          "phone": "650-330-0100", "fax": "650-330-1499" }
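
The slide notes say the application can enforce required and optional fields while documents stay free to carry extras. A minimal sketch of that idea follows; the choice of which fields count as required is an assumption made for illustration, and `missing_required` and `extra_fields` are hypothetical helpers.

```python
# Assumption for illustration: the application treats these as required.
REQUIRED_FIELDS = {"name", "company", "title"}
# Fields the application knows about; anything else is a per-document extension.
KNOWN_FIELDS = REQUIRED_FIELDS | {"address", "phone"}

steve = {
    "name": "Steven Jobs", "title": "VP, New Product Development",
    "company": "Apple Computer",
    "address": {"street": "10260 Bandley Dr", "city": "Cupertino",
                "state": "CA", "zip_code": "95014"},
    "phone": "408-996-1010",
}

larry = {
    "name": "Larry Page", "url": "http://google.com", "title": "CEO",
    "company": "Google!",
    "address": {"street": "555 Bryant, #106", "city": "Palo Alto",
                "state": "CA", "zip_code": "94301"},
    "phone": "650-330-0100", "fax": "650-330-1499",
}


def missing_required(doc, required=REQUIRED_FIELDS):
    """Return the required top-level fields that the document lacks."""
    return sorted(required - set(doc))


def extra_fields(doc, known=KNOWN_FIELDS):
    """Return top-level fields beyond the common structure."""
    return sorted(set(doc) - known)


for contact in (steve, larry):
    print(contact["name"], missing_required(contact), extra_fields(contact))
```

Both documents pass the required-field check even though only one of them carries `url` and `fax`, which is the flexibility the slide is pointing at.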
    23. 23. One to One: Schema Design Choices • contact holds twitter_id, referencing twitter (1:1) • twitter holds contact_id, referencing contact (1:1) • Redundant to track relationship on both sides • May save a fetch? • Contact embeds twitter (1)
    24. 24. One to One: General Recommendations • Embed – Full contact info all at once – Parent-child relationship “contains” – No additional data duplication – Can query or index on embedded field, e.g., “twitter.name” • Exceptional cases… – Embedding results in large documents (diagram: Contact embeds twitter, 1)
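
To make the “twitter.name” example concrete, here is a small Python sketch of dotted-path lookup over embedded documents. It imitates the idea in memory only; `get_path` is a hypothetical helper, and a real MongoDB query or index is evaluated by the server, not by code like this.

```python
def get_path(doc, path):
    """Follow a dotted path such as "twitter.name" through nested dicts.

    Returns None when any step of the path is missing.
    """
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


contacts = [
    {"name": "Gary J. Murakami, Ph.D.",
     "twitter": {"name": "GaryMurakami", "location": "New Providence, NJ"}},
    {"name": "Steven Jobs"},  # no embedded twitter document at all
]

# Select contacts by a field of the embedded twitter document.
matches = [c["name"] for c in contacts
           if get_path(c, "twitter.name") == "GaryMurakami"]
print(matches)
```

A contact without a `twitter` sub-document is simply skipped, so the embedded field stays optional.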
    25. 25. Modeling One-to-Many Relationships
    26. 26. One to Many: Schema Design Choices • contact holds phone_ids: [ ], referencing phone (1:N) • phone holds contact_id, referencing contact (1:N) • Redundant to track relationship on both sides • Not possible in relational DBs • Contact embeds phones (N)
    27. 27. One-to-many embedding vs. referencing
        Embedding:
        { "name": "Larry Page", "url": "http://google.com/", "title": "CEO", "company": "Google!", "email": "larry@google.com",
          "address": [{ "street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301" }],
          "phones": [{ "type": "Office", "number": "650-618-1499" }, { "type": "fax", "number": "650-330-0100" }] }
        Referencing:
        { "name": "Larry Page", "url": "http://google.com/", "title": "CEO", "company": "Google!", "email": "larry@google.com",
          "address": ["addr99"], "phones": ["ph23", "ph49"] }
        { "_id": "addr99", "street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301" }
        { "_id": "ph23", "type": "Office", "number": "650-618-1499" }
        { "_id": "ph49", "type": "fax", "number": "650-330-0100" }
    28. 28. One to Many: General Recommendation • Embed when possible – Full contact info all at once – Parent-children relationship “contains” – No additional data duplication – Can query or index on any field, e.g., { “phones.type”: “mobile” } • Exceptional cases… – Scaling: maximum document size is 16MB (diagram: Contact embeds phones, N)
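
A filter like { “phones.type”: “mobile” } reaches into every element of an embedded array. The sketch below mimics that behavior in plain Python, assuming a simplified equality match only; `values_at` and `matches` are hypothetical helpers and do not reproduce the full MongoDB query semantics.

```python
def values_at(doc, path):
    """Collect every value reachable via a dotted path, fanning out over lists."""
    nodes = [doc]
    for key in path.split("."):
        next_nodes = []
        for node in nodes:
            # A list contributes each of its elements; anything else, itself.
            candidates = node if isinstance(node, list) else [node]
            for item in candidates:
                if isinstance(item, dict) and key in item:
                    next_nodes.append(item[key])
        nodes = next_nodes
    # If the final values are themselves lists, match against their elements.
    result = []
    for node in nodes:
        result.extend(node if isinstance(node, list) else [node])
    return result


def matches(doc, path, expected):
    """True when any value at the dotted path equals the expected value."""
    return expected in values_at(doc, path)


larry = {
    "name": "Larry Page",
    "phones": [{"type": "Office", "number": "650-618-1499"},
               {"type": "fax", "number": "650-330-0100"}],
}

print(values_at(larry, "phones.type"))
print(matches(larry, "phones.type", "fax"))
print(matches(larry, "phones.type", "mobile"))
```

Because the phones are embedded, one document read is enough to answer the question for all of a contact's numbers.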
    29. 29. Modeling Many-to-Many Relationships
    30. 30. Many to Many: Traditional Relational Association • Join table GroupContacts (group_id, contact_id) links Contacts (name, company, title, phone) and Groups (name) • Use arrays instead
    31. 31. Many to Many: Schema Design Choices • group holds contact_ids: [ ], referencing contact (N:N) • contact holds group_ids: [ ], referencing group (N:N) • Redundant to track relationship on both sides: both references must be updated for consistency • group embeds contacts (N); contact embeds groups (N) • Redundant to track relationship on both sides: duplicated data must be updated for consistency
    32. 32. Many to Many: General Recommendation • Use case determines whether to reference or embed: 1. Simple address book: contact references groups 2. Corporate email groups: group embeds contacts for performance • Exceptional cases – Scaling: maximum document size is 16MB – Scaling may affect performance and working set (diagram: contact holds group_ids: [ ], referencing group, N:N)
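
The “contact references groups” choice for a simple address book can be sketched as follows. The group names and ids are invented for illustration, plain Python containers stand in for the two collections, and `groups_of` and `members_of` are hypothetical helpers.

```python
# Hypothetical groups collection, keyed by _id (names are made up).
groups = {
    "g1": {"_id": "g1", "name": "Friends"},
    "g2": {"_id": "g2", "name": "Work"},
}

# Each contact tracks its side of the relationship in a group_ids array,
# so no join table is needed and nothing is duplicated.
contacts = [
    {"_id": 1, "name": "Steven Jobs", "group_ids": ["g2"]},
    {"_id": 2, "name": "Larry Page", "group_ids": ["g1", "g2"]},
]


def groups_of(contact):
    """Resolve a contact's group references into group names."""
    return [groups[group_id]["name"] for group_id in contact["group_ids"]]


def members_of(group_id):
    """Find members by scanning contacts for the group id in their arrays."""
    return [c["name"] for c in contacts if group_id in c["group_ids"]]


print(groups_of(contacts[1]))
print(members_of("g2"))
```

Only one side stores the relationship, so renaming a group touches a single document; the cost is that listing a group's members means searching the contacts.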
    33. 33. Address Book Entity-Relationship (diagram) • Contacts: name, company, title • Addresses: type, street, city, state, zip_code • Phones: type, number • Emails: type, address • Thumbnails: mime_type, data • Portraits: mime_type, data • Groups: name • Twitters: name, location, web, bio • Relationships in the diagram are labeled with 1 and N cardinalities
    34. 34. Document model: holistic and efficient representation • Contacts: name, company, title, plus embedded addresses (type, street, city, state, zip_code), phones (type, number), emails (type, address), thumbnail (mime_type, data), and twitter (name, location, web, bio) • Portraits: mime_type, data • Groups: name • Relationships in the diagram are labeled with 1 and N cardinalities
    35. 35. Contact document example
        {
          "name": "Gary J. Murakami, Ph.D.",
          "company": "MongoDB, Inc",
          "title": "Lead Engineer and Ruby Evangelist",
          "twitter": { "name": "GaryMurakami", "location": "New Providence, NJ", "web": "http://www.nobell.org" },
          "portrait_id": 1,
          "addresses": [ { "type": "work", "street": "229 W 43rd St.", "city": "New York", "zip_code": "10036" } ],
          "phones": [ { "type": "work", "number": "1-866-237-8815 x8015" } ],
          "emails": [ { "type": "work", "address": "gary.murakami@mongodb.com" }, { "type": "home", "address": "gjm@nobell.org" } ]
        }
    36. 36. General Recommendations
    37. 37. Legacy Migration
        1. Copy existing schema & some data to MongoDB
        2. Iterative schema design development
           – Measure performance, find bottlenecks, and embed
             1. one to one associations first
             2. one to many associations next
             3. many to many associations
           – Eliminate join table using array of references or embedded documents
           – Measure and analyze, review concerns, scaling
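
The “eliminate join table using array of references” step can be sketched as a small transformation. The row shapes and ids below are invented for illustration, and `fold_join_table` is a hypothetical helper, not a migration tool; it only shows how GroupContacts rows collapse into a `group_ids` array on each contact document.

```python
from collections import defaultdict

# Hypothetical rows copied over from the relational schema.
contact_rows = [
    {"id": 1, "name": "Steven Jobs"},
    {"id": 2, "name": "Larry Page"},
]
group_contact_rows = [  # the GroupContacts join table
    {"group_id": 10, "contact_id": 1},
    {"group_id": 10, "contact_id": 2},
    {"group_id": 20, "contact_id": 2},
]


def fold_join_table(contacts, join_rows):
    """Build one document per contact with its group references in an array."""
    group_ids_by_contact = defaultdict(list)
    for row in join_rows:
        group_ids_by_contact[row["contact_id"]].append(row["group_id"])
    return [
        {"_id": row["id"], "name": row["name"],
         "group_ids": group_ids_by_contact.get(row["id"], [])}
        for row in contacts
    ]


for document in fold_join_table(contact_rows, group_contact_rows):
    print(document)
```

After this step the join table is no longer needed; whether to go further and embed group data is the measure-and-analyze decision the slide describes.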
    38. 38. New Software Application • Embed by default
    39. 39. Embedding over Referencing • Embedding is a bit like pre-joined data – BSON (Binary JSON) document ops are easy for the server • Embed (90/10 following rule of thumb) – When the “one” or “many” objects are viewed in the context of their parent – For performance – For atomicity • Reference – When you need more scaling – For easy consistency with “many to many” associations without duplicated data
    40. 40. It’s All About Your Application • Programs+Databases = (Big) Data Applications • Your schema is the impedance matcher – Design choices: normalize/denormalize, reference/embed – Melds programming with MongoDB for best of both – Flexible for development and change • Programs MongoDB = Great Big Data Applications
    41. 41. Questions?
    42. 42. Thank You Solutions Architect, MongoDB Jay Runkel jay.runkel@mongodb.com @jayrunkel #MongoDB
