Thinking in Documents
Perl Engineer & Evangelist, MongoDB, Inc
Mike Friedman
#mongodb
@friedo
Back to Basics 1: Thinking in documents

Published in: Technology, Business

  1. Thinking in Documents. Mike Friedman, Perl Engineer & Evangelist, MongoDB, Inc. #mongodb @friedo
  2. Agenda • What is a Record? • Core Concepts • What is an Entity? • Associating Entities • General Recommendations
  3. All application development is Schema Design
  4. Success comes from Proper Data Structure
  5. What is a Record?
  6. Key → Value • One-dimensional storage • Single value is a blob • Query on key only • No schema • Value cannot be updated, only replaced [Diagram labels: Key, Blob]
  7. Relational • Two-dimensional storage (tuples) • Each field contains a single value • Query on any field • Very structured schema (table) • In-place updates • Normalization process requires many tables, joins, and indexes, and results in poor data locality [Diagram label: Primary Key]
  8. Document • N-dimensional storage • Each field can contain 0, 1, many, or embedded values • Query on any field & level • Flexible schema • Inline updates * • Embedding related data has optimal data locality, requires fewer indexes, has better performance [Diagram label: _id]
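The three record models on slides 6 to 8 can be compared side by side with plain Python data structures. This is only a sketch: the contact, its field values, and the helper names are hypothetical, and in-memory dicts and tuples stand in for a real key-value store, relational tables, and a document collection.

```python
import json

# Key -> value: the value is an opaque blob, so only the key can be looked up.
kv_store = {"contact:1": json.dumps({"name": "Ann", "city": "Oslo"})}

# Relational: flat tuples, one value per field, related data in a second table.
contacts_table = [(1, "Ann", 10)]   # (id, name, address_id)
addresses_table = [(10, "Oslo")]    # (id, city)

# Document: one record with nested structure and an array field.
contact_doc = {
    "_id": 1,
    "name": "Ann",
    "address": {"city": "Oslo"},
    "phones": ["555-0100", "555-0101"],
}


def city_from_kv(key):
    # The whole blob must be fetched and decoded before any field is visible.
    return json.loads(kv_store[key])["city"]


def city_from_tables(contact_id):
    # Two lookups: the contact row, then the address row it points to (a join).
    address_id = next(row[2] for row in contacts_table if row[0] == contact_id)
    return next(row[1] for row in addresses_table if row[0] == address_id)


def city_from_doc(doc):
    # The related data travels with the record itself.
    return doc["address"]["city"]
```

All three functions return the same city; what differs is how much work is needed to reach it and where the related data lives.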
  9. Core Concepts
  10. Traditional Schema Design: Focus on data storage
  11. Document Schema Design: Focus on data use
  12. What answers do I have? What questions do I have?
  13. Schema Design is Flexible
  14. Flexibility • Choices for schema design • Each record can have different fields • Field names consistent for programming • Common structure can be enforced by application • Easy to evolve as needed
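As a rough illustration of "common structure can be enforced by application", the sketch below lets each record carry different fields while the application checks the few it depends on. The rule that `name` is required is an assumption made for this example, not something the slides prescribe.

```python
# Hypothetical application-level rule: every contact must have a "name".
REQUIRED_FIELDS = {"name"}


def missing_fields(doc):
    """Return the set of required fields that are absent from doc."""
    return REQUIRED_FIELDS - set(doc)


# Records in the same collection may carry different optional fields.
contacts = [
    {"name": "Ann", "title": "CTO"},
    {"name": "Bob", "url": "http://example.com", "fax": "555-0199"},
    {"title": "Intern"},  # violates the application's rule
]

problems = [missing_fields(c) for c in contacts]
```

The database accepts all three records; only the application decides that the third one is incomplete.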
  15. Building Blocks of Document Schema Design
  16. 1 – Arrays [ 1, 2, 3, "four", 5, "six", [ 7, 8, 9 ] ]
  17. 1 – Arrays: Multiple Values per Field. Each field in a document can be: • Absent • Set to null • Set to a single value • Set to an array of many values
  18. 1 – Arrays: Multiple Values per Field • Query for any matching value – Can be indexed, and each value in the array is in the index
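The four field states and the "query for any matching value" behaviour can be sketched in plain Python. The `matches` helper is a simplified stand-in for an equality query, and the small dictionary index only imitates the idea that each array value gets its own index entry; neither is MongoDB's actual implementation, and the `tags` field is hypothetical.

```python
from collections import defaultdict

docs = [
    {"_id": 1},                               # field absent
    {"_id": 2, "tags": None},                 # set to null
    {"_id": 3, "tags": "perl"},               # set to a single value
    {"_id": 4, "tags": ["perl", "mongodb"]},  # set to an array of many values
]


def matches(doc, field, value):
    """Simplified equality match that also looks inside array values."""
    if field not in doc:
        return False
    stored = doc[field]
    if isinstance(stored, list):
        return value in stored or stored == value
    return stored == value


hits = [d["_id"] for d in docs if matches(d, "tags", "perl")]

# Toy index: one entry per array element (missing and null skipped for brevity).
index = defaultdict(set)
for d in docs:
    stored = d.get("tags")
    values = stored if isinstance(stored, list) else [stored]
    for v in values:
        if v is not None:
            index[v].add(d["_id"])
```

A lookup such as `index["perl"]` then finds both the single-value document and the array document without scanning every record.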
  19. 2 – Embedded Documents { "foo": 42, "bar": 43, "stuff": { ... }, ... }
  20. 2 – Embedded Documents • A value in a document can be another document • Nested documents provide structure • Query any field at any level – Can be indexed
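To make "query any field at any level" concrete, here is a minimal sketch of resolving a dotted path against a nested document. The `get_path` helper and the `inner` and `depth` fields are hypothetical; the slide elides the contents of `stuff`, so they are invented purely for illustration.

```python
def get_path(doc, path):
    """Follow a dotted path such as "stuff.inner.depth" through nested dicts.

    Returns None when any step of the path is missing.
    """
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# "inner" and "depth" are made-up fields standing in for the elided content.
doc = {"foo": 42, "bar": 43, "stuff": {"inner": {"depth": 2}}}

top_level = get_path(doc, "foo")
nested = get_path(doc, "stuff.inner.depth")
absent = get_path(doc, "stuff.missing")
```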
  21. What is an Entity?
  22. An Entity • Object in your model • Associations with other entities. Referencing (Relational) vs. Embedding (Document): has_one ↔ embeds_one • belongs_to ↔ embedded_in • has_many ↔ embeds_many • has_and_belongs_to_many (no embedding counterpart shown)
  23. Let's model something together. How about a business card?
  24. Business Card
  25. Referencing • Addresses { "_id": , "street": , "city": , "state": , "zip_code": , "country": } • Contacts { "_id": , "name": , "title": , "company": , "phone": , "address_id": }
  26. Embedding • Contacts { "_id": , "name": , "title": , "company": , "address": { "street": , "city": , "state": , "zip_code": , "country": }, "phone": }
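The difference between the referencing and embedding layouts on slides 25 and 26 shows up in how many fetches it takes to print one business card. The sketch below uses in-memory dicts as stand-ins for collections; all field values and function names are hypothetical.

```python
# Referencing: the contact stores only the address's _id.
addresses = {100: {"_id": 100, "street": "1 Main St", "city": "Springfield"}}
contacts_ref = {1: {"_id": 1, "name": "Ann", "address_id": 100}}

# Embedding: the address lives inside the contact document.
contacts_emb = {
    1: {
        "_id": 1,
        "name": "Ann",
        "address": {"street": "1 Main St", "city": "Springfield"},
    }
}


def card_by_reference(contact_id):
    contact = contacts_ref[contact_id]          # first fetch
    address = addresses[contact["address_id"]]  # second fetch
    return contact["name"], address["city"]


def card_by_embedding(contact_id):
    contact = contacts_emb[contact_id]          # single fetch
    return contact["name"], contact["address"]["city"]
```

Both functions produce the same card, but the embedded layout gets there with one lookup.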
  27. Relational Schema: Contact • name • company • title • phone; Address • street • city • state • zip_code
  28. Document Schema: Contact • name • company • title • phone • address ( • street • city • state • zip_code )
  29. How are they different? Why? Relational: Contact • name • company • title • phone; Address • street • city • state • zip_code. Document: Contact • name • company • title • phone • address ( • street • city • state • zip_code )
  30. Schema Flexibility { "name": , "title": , "company": , "address": { "street": , "city": , "state": , "zip_code": }, "phone": } { "name": , "url": , "title": , "company": , "email": , "address": { "street": , "city": , "state": , "zip_code": }, "phone": , "fax": }
  31. Example
  32. Let’s Look at an Address Book
  33. Address Book • What questions do I have? • What are my entities? • What are my associations?
  34. Address Book Entity-Relationship: Contacts • name • company • title; Addresses • type • street • city • state • zip_code; Phones • type • number; Emails • type • address; Thumbnails • mime_type • data; Portraits • mime_type • data; Groups • name; Twitters • name • location • web • bio [Diagram: entity-relationship lines with 1 and N cardinality labels]
  35. Associating Entities
  36. One to One [Diagram: the Address Book entity-relationship diagram, repeated]
  37. One to One: Schema Design Choices • Reference from contact: contact { twitter_id } 1:1 twitter • Reference from twitter: contact 1:1 twitter { contact_id } • Redundant to track relationship on both sides – Both references must be updated for consistency – May save a fetch? • Embed: Contact { twitter } contains 1 twitter
  38. One to One: General Recommendation • Full contact info all at once – Contact embeds twitter • Parent-child relationship – "contains" • No additional data duplication • Can query or index on embedded field – e.g., "twitter.name" • Exceptional cases… – Reference portrait which has very large data [Diagram: Contact { twitter } contains 1 twitter]
  39. One to Many [Diagram: the Address Book entity-relationship diagram, repeated]
  40. One to Many: Schema Design Choices • Reference from contact: contact { phone_ids: [ ] } 1:N phone • Reference from phone: contact 1:N phone { contact_id } • Redundant to track relationship on both sides – Both references must be updated for consistency – Not possible in relational DBs – Save a fetch? • Embed: Contact { phones } contains N phones
  41. One to Many: General Recommendation • Full contact info all at once – Contact embeds multiple phones • Parent-children relationship – "contains" • No additional data duplication • Can query or index on any field – e.g., { "phones.type": "mobile" } • Exceptional cases… – Scaling: maximum document size is 16MB [Diagram: Contact { phones } contains N phones]
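A query such as { "phones.type": "mobile" } matches a contact when any one of its embedded phones has that type. The following sketch imitates that behaviour over in-memory documents; the contacts, numbers, and the `has_phone_type` helper are hypothetical.

```python
contacts = [
    {
        "name": "Ann",
        "phones": [
            {"type": "work", "number": "555-0100"},
            {"type": "mobile", "number": "555-0101"},
        ],
    },
    {"name": "Bob", "phones": [{"type": "home", "number": "555-0102"}]},
    {"name": "Cy"},  # no phones field at all
]


def has_phone_type(contact, phone_type):
    """True when any embedded phone has the given type."""
    return any(p.get("type") == phone_type for p in contact.get("phones", []))


mobile_owners = [c["name"] for c in contacts if has_phone_type(c, "mobile")]
```

Because the phones travel inside the contact, the full contact comes back in the same read that answered the query.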
  42. Many to Many [Diagram: the Address Book entity-relationship diagram, repeated]
  43. Many to Many: Traditional Relational Association • Join table: Contacts • name • company • title • phone; Groups • name; GroupContacts • group_id • contact_id • Use arrays instead [Diagram: X mark]
  44. Many to Many: Schema Design Choices • Reference from group: group { contact_ids: [ ] } N:N contact • Reference from contact: group N:N contact { group_ids: [ ] } • Redundant to track relationship on both sides – Both references must be updated for consistency • Embed: group { contacts } contains N contacts, or contact { groups } contains N groups • Redundant to track relationship on both sides – Duplicated data must be updated for consistency
  45. Many to Many: General Recommendation • Depends on use case 1. Simple address book – Contact references groups 2. Corporate email groups – Group embeds contacts for performance • Exceptional cases – Scaling: maximum document size is 16MB – Scaling may affect performance and working set [Diagram: group N:N contact { group_ids: [ ] }]
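For the "simple address book" case, where the contact references its groups, an array of group ids on the contact replaces the join table. The sketch below is one possible shape under that assumption; the ids, names, and helper functions are hypothetical.

```python
groups = {
    1: {"_id": 1, "name": "Family"},
    2: {"_id": 2, "name": "Work"},
}

# Each contact tracks its memberships in a single array field; no join table.
contacts = [
    {"_id": 10, "name": "Ann", "group_ids": [1, 2]},
    {"_id": 11, "name": "Bob", "group_ids": [2]},
    {"_id": 12, "name": "Cy", "group_ids": []},
]


def groups_of(contact):
    """Resolve a contact's group references to group names."""
    return [groups[group_id]["name"] for group_id in contact["group_ids"]]


def members_of(group_id):
    """Find contacts whose group_ids array contains the given id."""
    return [c["name"] for c in contacts if group_id in c["group_ids"]]
```

The relationship is stored on one side only, so there is a single place to update when a contact joins or leaves a group.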
  46. Document model: holistic and efficient representation. Contacts • name • company • title; addresses • type • street • city • state • zip_code; phones • type • number; emails • type • address; thumbnail • mime_type • data; twitter • name • location • web • bio; Portraits • mime_type • data; Groups • name [Diagram: 1 and N cardinality labels]
  47. Contact document example { "name" : "Gary J. Murakami, Ph.D.", "company" : "MongoDB, Inc.", "title" : "Lead Engineer", "twitter" : { "name" : "Gary Murakami", "location" : "New Providence, NJ", "web" : "http://www.nobell.org" }, "portrait_id" : 1, "addresses" : , "phones" : , "emails" : }
  48. Working Set • To reduce the working set, consider… – Reference bulk data, e.g., portrait – Reference less-used data instead of embedding – Extract into referenced child document • Also for performance issues with large documents
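Extracting bulk data into a referenced child document might look like the sketch below: the large portrait moves to its own store and the contact keeps only a `portrait_id`, as in the contact document example. The `extract_portrait` helper and the in-memory `portraits` dict are hypothetical stand-ins for a second collection.

```python
def extract_portrait(contact, portraits, portrait_id):
    """Return a slimmer copy of contact with its portrait moved to portraits.

    The original contact dict is left unchanged.
    """
    slim = dict(contact)
    portrait = slim.pop("portrait", None)
    if portrait is not None:
        portraits[portrait_id] = portrait
        slim["portrait_id"] = portrait_id
    return slim


portraits = {}
fat_contact = {
    "name": "Ann",
    "portrait": {"mime_type": "image/png", "data": b"\x00" * 100_000},
}
slim_contact = extract_portrait(fat_contact, portraits, 1)
```

Reads that only need the name no longer pull the portrait bytes along with them; the portrait is fetched by id when it is actually displayed.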
  49. General Recommendations
  50. Legacy Migration 1. Copy existing schema & some data to MongoDB 2. Iterate schema design development: measure performance, find bottlenecks, and embed (a) one to one associations first, (b) one to many associations next, (c) many to many associations 3. Migrate full dataset to new schema • New Software Application? Embed by default
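The embedding step of a legacy migration can be sketched as folding child rows into their parent documents, one to one associations first and one to many next. The row layouts, the `contact_id` foreign key name, and the `embed` helper below are assumptions made for illustration, not a prescribed migration tool.

```python
# Rows as they might arrive from a legacy relational schema (hypothetical).
contact_rows = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
twitter_rows = [{"contact_id": 1, "name": "ann_tweets"}]  # one to one
phone_rows = [                                            # one to many
    {"contact_id": 1, "type": "mobile", "number": "555-0101"},
    {"contact_id": 1, "type": "work", "number": "555-0100"},
    {"contact_id": 2, "type": "home", "number": "555-0102"},
]


def strip_fk(row):
    """Drop the foreign key; it is implied once the row is embedded."""
    return {k: v for k, v in row.items() if k != "contact_id"}


def embed(contact_rows, twitter_rows, phone_rows):
    docs = {}
    for row in contact_rows:
        docs[row["id"]] = {"_id": row["id"], "name": row["name"], "phones": []}
    for row in twitter_rows:   # one to one: a single embedded document
        docs[row["contact_id"]]["twitter"] = strip_fk(row)
    for row in phone_rows:     # one to many: an array of embedded documents
        docs[row["contact_id"]]["phones"].append(strip_fk(row))
    return list(docs.values())


documents = embed(contact_rows, twitter_rows, phone_rows)
```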
  51. Embedding over Referencing • Embedding is a bit like pre-joined data – BSON (Binary JSON) document ops are easy for the server • Embed (90/10 following rule of thumb) – When the "one" or "many" objects are viewed in the context of their parent – For performance – For atomicity • Reference – When you need more scaling – For easy consistency with "many to many" associations without duplicated data
  52. It’s All About Your Application • Programs + Databases = (Big) Data Applications • Your schema is the impedance matcher – Design choices: normalize/denormalize, reference/embed – Melds programming with MongoDB for best of both – Flexible for development and change • Programs × MongoDB = Great Big Data Applications
  53. Thank You. Mike Friedman, Perl Engineer & Evangelist, MongoDB. #mongodb @friedo