Schema Design

Published in: Technology
  • Schema Design is very important; its impact on your application is pervasive.
  • The wrong data structure will hurt you. The proper data structure can make all the pieces fall into place.
  • One-dimensional storage can be very fast but is relatively limited compared with other DBMSs.
  • Two-dimensional storage of ordered tuples, or traditional records. The winning technology is that every field/value is first class: every field can be addressed in queries and can be indexed for faster processing. The normalization process requires many tables, joins to rehydrate, and indexes to make joins faster, and it results in poor data locality.
  • The essential capability of the winning technology from relational databases persists and gets even better. The document structure can match your data structures, that is, your schema.
  • What questions do I have? What are my use cases? Does your schema take advantage of your application-specific knowledge of known queries, use cases, and client-program data structures? Traditional DBs make it hard to take advantage of them. Document DBs make it easy to take advantage of them. MongoDB documents can match your application, given good schema design.
  • Not “schema-less” but rather “flexible schema”. Common structure can be enforced by the application. While MongoDB does not enforce common structure, neither does it restrict your application. Documents may have a common structure that is optionally extended at the document level. Example problems for traditional databases: many empty columns instead of subclassing via yet another table; three days for a schema migration. Keywords: flexible, choice, evolve, change, modify.
  • The concept of arrays incorporates multiple values and associations involving many entities. The lack of multivalued fields is usually the first complaint of programmers who don’t wish to pay the cost of normalization. Keywords: array, multiple, many.
  • Documents may have a common structure that is optionally extended at the document level. The application mapping can enforce the required and optional fields.
  • “Vintage” business card
  • Contact and Address entities are associated one to one. The traditional relational association is via referencing. In this example, the contact record for Steve Jobs has a reference to his address via the address_id field.
  • Entity-relationship diagram
  • Entity-relationship diagram for embedding documents
  • Left (relational): requires two fetches/queries, or a join in a relational DB. Right (document): requires only one fetch/query and has data locality.
  • A common example will help us understand the joy of flexible document structure.
  • Left: one to one. We're going to assume users only have one Twitter account. A thumbnail is a small profile image, while a portrait is a very large profile image. Right: one to many. Middle: many to many.
  • Arrays of references are more direct than a join table and save a fetch.
  • Fundamentally not “contains”. Concerns (exceptional cases): exceeding the maximum document size due to large data or scaling; transferring very large documents is probably a performance concern; scaling may affect working set size. The schema can be adjusted to improve performance: fetch only the data that you need.
  • Embedding entities in the contact document reduces six fetches to one
  • Embedding is used for both one-to-one and one-to-many associations, resulting in exactly what you expect and want for a contact. (This example has no thumbnail or groups.)
  • $project allows you to select top-level fields and can be used to reduce data for a fetch. Note that some ODMs may not allow you to specify $project.
  • For many-to-many associations, eliminate join table using array of references or embedded documents
  • Choose embedding by default as opposed to referencing. Referencing is not just the default for relational DBs; there is no other choice.
  • May you build Great Big Data Applications. Perhaps you can say inspiring quotes like Ken Thompson, “Play chess with God.” Ken and I worked on Perceptual Audio Coding, better known as Advanced Audio Coding or AAC, as found in the iPod and iPhone. So I hope that this will inspire you to “Play music with God” to design your killer app.
  • BSON (Binary JSON) is the “magic” or core technology in MongoDB for data structures and performance. BSON does not have to be parsed like JSON, but is rather a format that can be traversed easily. You can choose a language to fit your application, or multiple languages to fit multiple components of your application as appropriate.
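
The notes above contrast referencing (two fetches, or a join) with embedding (one fetch, with data locality). The difference can be sketched with plain Python dictionaries. This is only an in-memory illustration, not the MongoDB driver API: the `Collection` class, its `fetch_by_id` method, and all sample values are hypothetical names introduced here.

```python
# Illustrative in-memory stand-ins for collections; NOT a MongoDB API.
# Collection, fetch_by_id, and the sample values are hypothetical.


class Collection:
    """A dict-backed store that counts how many fetches it serves."""

    def __init__(self, docs):
        self._docs = {doc["_id"]: doc for doc in docs}
        self.fetches = 0

    def fetch_by_id(self, _id):
        self.fetches += 1
        return self._docs.get(_id)


# Referencing: the contact points at its address through address_id.
addresses = Collection([
    {"_id": 10, "street": "1 Example Way", "city": "Springfield"},
])
contacts_ref = Collection([
    {"_id": 1, "name": "Pat Example", "address_id": 10},
])

contact = contacts_ref.fetch_by_id(1)
address = addresses.fetch_by_id(contact["address_id"])
referencing_fetches = contacts_ref.fetches + addresses.fetches

# Embedding: the address lives inside the contact document itself.
contacts_emb = Collection([
    {
        "_id": 1,
        "name": "Pat Example",
        "address": {"street": "1 Example Way", "city": "Springfield"},
    },
])

embedded_contact = contacts_emb.fetch_by_id(1)
embedding_fetches = contacts_emb.fetches

print(referencing_fetches, embedding_fetches)  # 2 1
```

With more associated entities the gap widens in the same way, which is the point the notes make about a fully embedded contact needing a single fetch.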
  • Transcript

    • 1. Schema Design Mike Friedman Perl Engineer & Evangelist, MongoDB
    • 2. Agenda • What is a Record? • Core Concepts • What is an Entity? • Associating Entities • General Recommendations
    • 3. All application development is Schema Design
    • 4. Success comes from Proper Data Structure
    • 5. What is a Record?
    • 6. Key → Value • One-dimensional storage • Single value is a blob • Query on key only • No schema • Value cannot be updated, only replaced (diagram: Key → Blob)
    • 7. Relational • Two-dimensional storage (tuples) • Each field contains a single value • Query on any field • Very structured schema (table) • In-place updates • Normalization process requires many tables, joins, and indexes, and results in poor data locality (diagram: Primary Key)
    • 8. Document • N-dimensional storage • Each field can contain 0, 1, many, or embedded values • Query on any field & level • Flexible schema • Inline updates * • Embedding related data has optimal data locality, requires fewer indexes, has better performance (diagram: _id)
    • 9. Core Concepts
    • 10. Traditional Schema Design Focus on data storage
    • 11. Document Schema Design Focus on data use
    • 12. Another way to think about it What answers do I have? What questions do I have?
    • 13. Three Building Blocks of Document Schema Design
    • 14. 1 – Flexibility • Choices for schema design • Each record can have different fields • Common structure can be enforced by application • Easy to evolve as needed
    • 15. 2 – Arrays Multiple Values per Field • Each field can be: – Absent – Set to null – Set to a single value – Set to an array of many values • Query for any matching value – Can be indexed and each value in the array is in the index
    • 16. 3 - Embedded Documents • An acceptable value is a document • Nested documents provide structure • Query any field at any level – Can be indexed
    • 17. What is an Entity?
    • 18. An Entity • Object in your model • Associations with other entities – Referencing (Relational): has_one, belongs_to, has_many, has_and_belongs_to_many – Embedding (Document): embeds_one, embedded_in, embeds_many • MongoDB has both referencing and embedding for universal coverage
    • 19. Let's model something together How about a business card?
    • 20. Business Card
    • 21. Referencing • Contacts: { “_id”: , “name”: , “title”: , “company”: , “phone”: , “address_id”: } • Addresses: { “_id”: , “street”: , “city”: , “state”: , “zip_code”: , “country”: }
    • 22. Embedding • Contacts: { “_id”: , “name”: , “title”: , “company”: , “address”: { “street”: , “city”: , “state”: , “zip_code”: , “country”: }, “phone”: }
    • 23. Relational Schema • Contact: name, company, title, phone • Address: street, city, state, zip_code
    • 24. Document Schema • Contact: name, company, address (street, city, state, zip_code), title, phone
    • 25. How are they different? Why? • Relational: Contact (name, company, title, phone) and Address (street, city, state, zip_code) • Document: Contact (name, company, address (street, city, state, zip_code), title, phone)
    • 26. Schema Flexibility • { “name”: , “title”: , “company”: , “address”: { “street”: , “city”: , “state”: , “zip_code”: }, “phone”: } • { “name”: , “url”: , “title”: , “company”: , “email”: , “address”: { “street”: , “city”: , “state”: , “zip_code”: }, “phone”: , “fax”: }
    • 27. Example
    • 28. Let’s Look at an Address Book
    • 29. Address Book • What questions do I have? • What are my entities? • What are my associations?
    • 30. Address Book Entity-Relationship (diagram: Contacts [name, company, title] are associated 1:1 with Twitters [name, location, web, bio], Thumbnails [mime_type, data], and Portraits [mime_type, data]; 1:N with Addresses [type, street, city, state, zip_code], Phones [type, number], and Emails [type, address]; and N:N with Groups [name])
    • 31. Associating Entities
    • 32. One to One (address book entity-relationship diagram, as on slide 30)
    • 33. One to One Schema Design Choices • contact references twitter (twitter_id, 1:1) • Contact embeds twitter • May save a fetch? • twitter references contact (contact_id, 1:1) • Redundant to track relationship on both sides • Both references must be updated for consistency
    • 34. One to One General Recommendation • Full contact info all at once – Contact embeds twitter • Parent-child relationship – “contains” • No additional data duplication • Can query or index on embedded field – e.g., “twitter.name”
    • 35. One to Many (address book entity-relationship diagram, as on slide 30)
    • 36. One to Many Schema Design Choices • contact references phones (phone_ids: [ ], 1:N) • Contact embeds phones (N) • Not possible in relational DBs • Save a fetch? • phone references contact (contact_id, N:1) • Redundant to track relationship on both sides • Both references must be updated for consistency
    • 37. One to Many General Recommendation • Full contact info all at once – Contact embeds multiple phones • Parent-children relationship – “contains” • No additional data duplication • Can query or index on any field – e.g., { “phones.type”: “mobile” } • Exceptional cases… – Scaling: maximum document size is 16MB
    • 38. Many to Many (address book entity-relationship diagram, as on slide 30)
    • 39. Many to Many Traditional Relational Association • Join table: Groups (name), GroupContacts (group_id, contact_id), Contacts (name, company, title, phone) • Use arrays instead
    • 40. Many to Many Schema Design Choices • group references contacts (contact_ids: [ ], N:N) • contact references groups (group_ids: [ ], N:N) • Redundant to track relationship on both sides – Both references must be updated for consistency • group embeds contacts (N) • contact embeds groups (N) • Redundant to track relationship on both sides – Duplicated data must be updated for consistency
    • 41. Many to Many General Recommendation • Depends on use case 1. Simple address book • Contact references groups (group_ids: [ ]) 2. Corporate email groups • Group embeds contacts for performance • Exceptional cases – Scaling: maximum document size is 16MB – Scaling may affect performance and working set
    • 42. Document model - holistic and efficient representation (diagram: Groups [name] N:N Contacts [name, company, title]; each contact has twitter [1: name, location, web, bio], thumbnail [1: mime_type, data], addresses [N: type, street, city, state, zip_code], phones [N: type, number], and emails [N: type, address], plus a separate Portraits entity [1: mime_type, data])
    • 43. Contact document example { “name” : “Gary J. Murakami, Ph.D.”, “company” : “MongoDB, Inc.”, “title” : “Lead Engineer”, “twitter” : { “name” : “Gary Murakami”, “location” : “New Providence, NJ”, “web” : “http://www.nobell.org” }, “portrait_id” : 1, “addresses” : , “phones” : , “emails” : }
    • 44. Working Set To reduce the working set, consider… • Reference bulk data, e.g., portrait • Reference less-used data instead of embedding – Extract into referenced child document Also for performance issues with large documents
    • 45. General Recommendations
    • 46. Legacy Migration 1. Copy existing schema & some data to MongoDB 2. Iterate schema design development: measure performance, find bottlenecks, and embed (a) one to one associations first, (b) one to many associations next, (c) many to many associations 3. Migrate full dataset to new schema • New Software Application? Embed by default
    • 47. Embedding over Referencing • Embedding is a bit like pre-joined data – BSON (Binary JSON) document ops are easy for the server • Embed (90/10 following rule of thumb) – When the “one” or “many” objects are viewed in the context of their parent – For performance – For atomicity • Reference – When you need more scaling – For easy consistency with “many to many” associations without duplicated data
    • 48. It’s All About Your Application • Programs+Databases = (Big) Data Applications • Your schema is the impedance matcher – Design choices: normalize/denormalize, reference/embed – Melds programming with MongoDB for best of both – Flexible for development and change • Programs MongoDB = Great Big Data Applications
    • 49. Thank You Mike Friedman Perl Engineer & Evangelist, MongoDB
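
The slides above mention querying an embedded array with a dotted path, e.g., { “phones.type”: “mobile” }, and using arrays of references such as group_ids in place of a join table. The sketch below emulates that matching behavior over plain Python dictionaries so it can run without a server. It is a simplified illustration under stated assumptions: `values_at`, `matches`, and the sample contacts are hypothetical, and real MongoDB matching covers more cases (for example, numeric index path components and whole-array equality).

```python
# Simplified, in-memory emulation of dotted-path equality matching.
# values_at, matches, and the sample data are hypothetical helpers,
# not part of any MongoDB driver.


def values_at(doc, path):
    """Return every value reachable at a dotted path, fanning out over arrays."""
    current = [doc]
    for key in path.split("."):
        next_level = []
        for node in current:
            # An array of sub-documents is searched element by element.
            candidates = node if isinstance(node, list) else [node]
            for item in candidates:
                if isinstance(item, dict) and key in item:
                    next_level.append(item[key])
        current = next_level
    return current


def matches(doc, path, wanted):
    """True if the field equals `wanted` or is an array that contains it."""
    for value in values_at(doc, path):
        if value == wanted:
            return True
        if isinstance(value, list) and wanted in value:
            return True
    return False


contacts = [
    {
        "_id": 1,
        "name": "Pat Example",
        "phones": [
            {"type": "mobile", "number": "555-0100"},
            {"type": "work", "number": "555-0101"},
        ],
        "group_ids": [1, 2],  # array of references to groups
    },
    {
        "_id": 2,
        "name": "Sam Sample",
        "phones": [{"type": "home", "number": "555-0102"}],
        "group_ids": [2],
    },
]

# One to many: contacts that embed at least one mobile phone.
mobile_owners = [c["name"] for c in contacts if matches(c, "phones.type", "mobile")]

# Many to many: members of group 2, found without a join table.
group_2_members = [c["name"] for c in contacts if matches(c, "group_ids", 2)]

print(mobile_owners)    # ['Pat Example']
print(group_2_members)  # ['Pat Example', 'Sam Sample']
```

The same single-field condition covers both the embedded-document case and the array-of-references case, which is why the slides describe arrays as a replacement for the relational join table.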
