
Drop your table ! MongoDB Schema Design


MongoDB's basic unit of storage is a document. Documents can represent rich, schema-free data structures, which gives us several viable alternatives to the normalized, relational model. In this talk, we'll discuss the tradeoffs of various data modeling strategies in MongoDB using several examples. You will learn how to work with documents, how to evolve your schema, and common schema design patterns.

Delivered at Soft Shake '14 and Jug Summer Camp '14


  1. @SoftShakeEvent. DROP TABLE ! Overview of Document Design. Tugdual Grall, Technical Evangelist, MongoDB, @tgrall. Soft Shake '14
  2. {"about" : "me"}: Tugdual "Tug" Grall
      • MongoDB: Technical Evangelist
      • Couchbase: Technical Evangelist
      • eXo: CTO
      • Oracle: Developer/Product Manager
      • Mainly Java/SOA
      • Developer in consulting firms
      • Web: @tgrall, http://blog.grallandco.com, tgrall
      • NantesJUG co-founder
      • Pet project: http://www.resultri.com
      • tug@mongodb.com, tugdual@gmail.com
      Join me at the Gōng-fu I/O.
  3. With "Relations" Between Rows
      | Customer ID | First Name | Last Name | City |
      | 0 | John | Doe | New York |
      | 1 | Mark | Smith | San Francisco |
      | 2 | Jay | Black | Newark |
      | 3 | Meagan | White | London |
      | 4 | Edward | Daniels | Boston |

      | Account Number | Branch ID | Account Type | Customer ID |
      | 10 | 100 | Checking | 0 |
      | 11 | 101 | Savings | 0 |
      | 12 | 101 | IRA | 0 |
      | 13 | 200 | Checking | 1 |
      | 14 | 200 | Savings | 1 |
      | 15 | 201 | IRA | 2 |
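The two tables on this slide are linked only by the Customer ID column in the account rows. As a rough sketch (plain Python dicts standing in for rows, no database involved; the snake_case field names are my own rendering of the column headers), here is what following that foreign key looks like, and what the same data looks like when each customer's accounts are nested inside the customer record the way a document would hold them:

```python
# Rows from the customer and account tables on the slide.
customers = [
    {"id": 0, "first_name": "John", "last_name": "Doe", "city": "New York"},
    {"id": 1, "first_name": "Mark", "last_name": "Smith", "city": "San Francisco"},
    {"id": 2, "first_name": "Jay", "last_name": "Black", "city": "Newark"},
    {"id": 3, "first_name": "Meagan", "last_name": "White", "city": "London"},
    {"id": 4, "first_name": "Edward", "last_name": "Daniels", "city": "Boston"},
]
accounts = [
    {"number": 10, "branch_id": 100, "type": "Checking", "customer_id": 0},
    {"number": 11, "branch_id": 101, "type": "Savings", "customer_id": 0},
    {"number": 12, "branch_id": 101, "type": "IRA", "customer_id": 0},
    {"number": 13, "branch_id": 200, "type": "Checking", "customer_id": 1},
    {"number": 14, "branch_id": 200, "type": "Savings", "customer_id": 1},
    {"number": 15, "branch_id": 201, "type": "IRA", "customer_id": 2},
]


def accounts_for(customer_id):
    """Relational style: scan the account rows for a matching foreign key."""
    return [a for a in accounts if a["customer_id"] == customer_id]


# Document style: each customer carries its own accounts, so the
# customer_id column is no longer needed inside the nested records.
customer_docs = [
    {
        **customer,
        "accounts": [
            {k: v for k, v in a.items() if k != "customer_id"}
            for a in accounts_for(customer["id"])
        ],
    }
    for customer in customers
]
```

Customers 3 and 4 own no accounts, so in the nested form they simply end up with an empty `accounts` list.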
  4. But some things are hard to model. Let's look at a product catalog:
  5. Baseball Bat
      • -3 length to weight ratio
      • 2-5/8" barrel diameter
      • Two-piece construction
      • R2 alloy barrel provides outstanding durability, performance and "pop"
      • R2 composite handle shifts weight into the bat's knob for ultra-fast swing speeds
      • Rifle Barrel design removes weight from the barrel for thinner wall thickness
      • Acoustic barrel offers that sweet-sounding "ping"
      • Contact grip helps eliminate sting and vibration
      • AIR Elite is RIP-IT's® fastest BBCOR bat and the one with the most performance
      • BBCOR certified: approved for high school and collegiate play
      • Includes RIP-IT's® "Love It Or Return It" 30 Day Refund Policy with free return shipping
      • Manufacturer's warranty: 400 days
      • Made in the USA
      • Model: B1403E
  6. Bat Product Table
      | Category | Model | Name | Brand | Length to weight ratio | Barrel Dia | Type | Barrel | Handle | Cert. | Country | Price |
      | Bat | B1403E | Air Elite | RIP-IT | -3 | 2 5/8 | Composite | R2 Alloy | R2 composite | BBCOR | USA | $399.99 |
      | Bat | B1403 | Prototype | RIP-IT | -3 | 2 5/8 | One-piece | R1 Alloy | R1 Alloy | BBCOR | USA | $199.99 |
      | Bat | MCB1B | One | Marucci | -3 | 2 5/8 | One-piece | AZ3000 aluminum | AZ3000 aluminum | BBCOR | Imported | $199.99 |
      | Bat | BB14S1 | S1 | Easton | -3 | 2 5/8 | Composite | IMX | SIC Black Carbon | BBCOR | China | $399.99 |
  7. Let's Add Gloves
      • Size: 12"
      • Infield/Outfield/Pitcher model
      • 2-Piece Web pattern: most popular MLB® pattern among pitchers
      • Pro Stock® American steerhide leather offers rugged durability and a superior feel
      • Dual-Welting™ on "exposed edges" of the fingers helps maintain pocket shape and durability
      • Pro Stock™ hand-designed pattern for unbeatable craftsmanship
      • Dri-Lex® ultra-breathable wrist lining repels moisture from your hand
      • Black leather with rich brown embellishments
      • Pattern: B212
      • Model: WTA2000BBB212
      • Wilson
  8. Bat and Glove Product Table
      | Category | Model | Name | Brand | Length to weight ratio | Barrel Dia | Type | Barrel | Handle | Cert. | Country | Price |
      | Bat | B1403E | Air Elite | RIP-IT | -3 | 2 5/8 | Composite | R2 Alloy | R2 composite | BBCOR | USA | $399.99 |
      | Bat | B1403 | Prototype | RIP-IT | -3 | 2 5/8 | One-piece | R1 Alloy | R1 Alloy | BBCOR | USA | $199.99 |
      | Bat | MCB1B | One | Marucci | -3 | 2 5/8 | One-piece | AL | AL | BBCOR | Imported | $199.99 |
      | Bat | BB14S1 | S1 | Easton | -3 | 2 5/8 | Composite | IMX | SIC Black Carbon | BBCOR | China | $399.99 |

      | Category | Model | Name | Brand | Size | Position | Pattern | Web Pattern | Material | Color | Country | Price |
      | Glove | WTA2000BBB212 | A2000 | Wilson | 12" | Infield | B212 | 2-piece | Leather | black | Vietnam | $299.99 |
      | Glove | PRO112PT | HOH Pro | Rawlings | 11.25" | Outfield | Pro taper | Modified Trap-Eze | Horween Leather | black | China | $229.99 |
  9. Add some baseballs
      • Cover: Full grain leather for excellent durability
      • Core: Cushioned cork core
      • Additions/Technologies: Made to the exact specifications of MLB
      • Stitching/Seams: 108 classic red stitches/Rawlings® Major League seaming
      • League/Certification(s): MLB
      • Balls included per purchase: individual
      • Recommended Age: All ages
      • Model: ROMLB
      • Rawlings
  10. Bat and Glove and Ball Product Table
      | Category | Model | Name | Brand | Length to weight ratio | Barrel Dia | Type | Barrel | Handle | Cert. | Country | Price |
      | Bat | B1403E | Air Elite | RIP-IT | -3 | 2 5/8 | Composite | R2 Alloy | R2 composite | BBCOR | USA | $399.99 |
      | Bat | B1403 | Prototype | RIP-IT | -3 | 2 5/8 | One-piece | R1 Alloy | R1 Alloy | BBCOR | USA | $199.99 |
      | Bat | MCB1B | One | Marucci | -3 | 2 5/8 | One-piece | AL | AL | BBCOR | Imported | $199.99 |
      | Bat | BB14S1 | S1 | Easton | -3 | 2 5/8 | Composite | IMX | SIC Black Carbon | BBCOR | China | $399.99 |

      | Category | Model | Name | Brand | Size | Position | Pattern | Web Pattern | Material | Color | Country | Price |
      | Glove | WTA2000BBB212 | A2000 | Wilson | 12" | Infield | B212 | 2-piece | Leather | black | Vietnam | $299.99 |
      | Glove | PRO112PT | HOH Pro | Rawlings | 11.25" | Outfield | Pro taper | Modified Trap-Eze | Horween Leather | black | China | $229.99 |

      | Category | Model | Name | Brand | Color | Cover | Core | Cert. | Country | Price |
      | Baseball | DICRLLB1PBG | Little League | Rawlings | white | Leather | Cork rubber | Little League | China | $4.99 |
      | Baseball | ROML | MLB | Rawlings | white | Leather | cork |  | China | $6.99 |
  11. Sparse Table
      | Category | Model | Name | Brand | Length to weight ratio | Barrel Dia | Type | Barrel | Handle | Certification | Country | Price | Size | Position | Pattern | Web Pattern | Material | Color | Cover | Core |
      | Bat | B1403E | Air Elite | RIP-IT | -3 | 2 5/8 | Composite | R2 Alloy | R2 composite | BBCOR | USA | $399.99 |  |  |  |  |  |  |  |  |
      | Bat | B1403 | Prototype | RIP-IT | -3 | 2 5/8 | One-piece | R1 Alloy | R1 Alloy | BBCOR | USA | $199.99 |  |  |  |  |  |  |  |  |
      | Bat | MCB1B | One | Marucci | -3 | 2 5/8 | One-piece | AZ3000 aluminum | AZ3000 aluminum | BBCOR | Imported | $199.99 |  |  |  |  |  |  |  |  |
      | Bat | BB14S1 | S1 | Easton | -3 | 2 5/8 | Composite | IMX | SIC Black Carbon | BBCOR | China | $399.99 |  |  |  |  |  |  |  |  |
      | Glove | WTA2000BBB212 | A2000 | Wilson |  |  |  |  |  |  | Vietnam | $299.99 | 12" | Infield | B212 | 2-piece | Leather | black |  |  |
      | Glove | PRO112PT | HOH Pro | Rawlings |  |  |  |  |  |  | China | $229.99 | 11.25" | Outfield | Pro taper | Modified Trap-Eze | Horween Leather | black |  |  |
      | Baseball | DICRLLB1PBG | Little League | Rawlings |  |  |  |  |  | Little League | China | $4.99 |  |  |  |  |  | white | Leather | cork and rubber |
      | Baseball | ROML | MLB | Rawlings |  |  |  |  |  |  | China | $6.99 |  |  |  |  |  | white | Leather | cork |
      Continue adding columns as you add new products.
  12. Maybe this design will work better
      | prodID | property | value |
      | 1 | length/weight | -3 |
      | 1 | barrel dia | 2 5/8 |
      | 1 | type | composite |
      | 1 | certification | BBCOR |
      | … |  |  |
      | 5 | size | 12 |
      | 5 | position | infield |
      | 5 | pattern | B212 |
      | 5 | material | leather |
      | 5 | color | black |
      | … |  |  |
      | 8 | color | white |
      | 8 | cover | leather |
      | 8 | core | cork |

      | prodID | Category | Model | Name | Brand | Country | Price |
      | 1 | Bat | B1403E | Air Elite | RIP-IT | USA | $399.99 |
      | 2 | Bat | B1403 | Prototype | RIP-IT | USA | $199.99 |
      | 3 | Bat | MCB1B | One | Marucci | Imported | $199.99 |
      | 4 | Bat | BB14S1 | S1 | Easton | China | $399.99 |
      | 5 | Glove | WTA2000BBB212 | A2000 | Wilson | Vietnam | $299.99 |
      | 6 | Glove | PRO112PT | HOH Pro | Rawlings | China | $229.99 |
      | 7 | Baseball | DICRLLB1PBG | Little League | Rawlings | China | $4.99 |
      | 8 | Baseball | ROML | MLB | Rawlings | China | $6.99 |
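This key/value layout (often called entity-attribute-value) removes the empty columns, but every read now has to put a product back together. A minimal Python sketch, using only the property rows the slide actually shows (the slide elides the rest with "…") and assuming the value column is a single text column, makes that cost visible; `pivot` is a hypothetical helper, not part of any library:

```python
# Property rows shown on the slide as (prodID, property, value); the slide
# itself elides the remaining rows with "…".
properties = [
    (1, "length/weight", "-3"),
    (1, "barrel dia", "2 5/8"),
    (1, "type", "composite"),
    (1, "certification", "BBCOR"),
    (5, "size", "12"),
    (5, "position", "infield"),
    (5, "pattern", "B212"),
    (5, "material", "leather"),
    (5, "color", "black"),
    (8, "color", "white"),
    (8, "cover", "leather"),
    (8, "core", "cork"),
]

# Base product rows for the three products that have property rows above.
products = {
    1: {"category": "Bat", "model": "B1403E", "name": "Air Elite", "brand": "RIP-IT"},
    5: {"category": "Glove", "model": "WTA2000BBB212", "name": "A2000", "brand": "Wilson"},
    8: {"category": "Baseball", "model": "ROML", "name": "MLB", "brand": "Rawlings"},
}


def pivot(products, properties):
    """Rebuild one record per product from its base row plus its property rows."""
    result = {pid: dict(row) for pid, row in products.items()}
    for pid, prop, value in properties:
        # Assumed single text column for values, so every value is a string.
        result[pid][prop] = value
    return result


catalog = pivot(products, properties)
```

In this sketch `"-3"` and `"12"` come back as strings, and the pivot has to run on every read; both are drawbacks the document model avoids by storing typed fields directly on each product.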
  13. It is hard to iterate. [Diagram: a table with columns Name, Age, Phone and Email, repeatedly extended with a new column and new tables.]
  14. Have to Manage Changes in 3 Places: the application code, the XML config (object-relational mapping), and the DB schema (relational database).
  15. Working with Documents
  16. Match the Data in your Application. Relational vs. MongoDB; the MongoDB side is a single document:
      { first_name: "Paul", surname: "Miller", city: "London", location: [45.123, 47.232], cars: [ { model: "Bentley", year: 1973, value: 100000 }, { model: "Rolls Royce", year: 1965, value: 330000 } ] }
  17. Let's go back to our product catalog
  18. How would we model this in MongoDB? [The slide repeats the Wilson glove description from slide 7: Size 12", Pattern B212, Model WTA2000BBB212.]
  19. We use a document! { category: "glove", model: "PRO112PT", name: "Air Elite", brand: "Rawlings", price: 229.99, available: ISODate("2013-03-31") }
      A document is a set of fields and their values, and field values are typed: here string, number and date.
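In a driver, those types carry through to the application. A small Python sketch in which the dict only stands in for the stored document (no driver is used; as far as I know PyMongo encodes a Python `float` as a BSON double and a `datetime.datetime` as a BSON date); `field_types` is a hypothetical helper for illustration:

```python
from datetime import datetime

# The glove document from the slide, as a Python dict.
glove = {
    "category": "glove",
    "model": "PRO112PT",
    "name": "Air Elite",
    "brand": "Rawlings",
    "price": 229.99,                     # a number, not the string "229.99"
    "available": datetime(2013, 3, 31),  # a real date value
}


def field_types(doc):
    """Map each field name to the Python type name of its value."""
    return {field: type(value).__name__ for field, value in doc.items()}


# Because "available" is a date, it compares chronologically.
available_before_2014 = glove["available"] < datetime(2014, 1, 1)
```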
  20. Documents are rich structures! { category: "glove", model: "PRO112PT", name: "Air Elite", brand: "Rawlings", price: 229.99, available: ISODate("2013-03-31"), position: ["infield", "outfield", "pitcher"] }
      Fields can contain arrays.
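Arrays stay queryable as values: in MongoDB, an equality query such as `db.products.find({ position: "pitcher" })` matches a document when any element of its `position` array equals `"pitcher"` (the collection name `products` is an assumption). The hypothetical Python function below imitates only that one rule on plain dicts; it is not the server's full matching logic:

```python
# The glove document from the slide, reduced to the fields used here.
glove = {
    "category": "glove",
    "model": "PRO112PT",
    "position": ["infield", "outfield", "pitcher"],
}


def matches(doc, field, value):
    """Simplified MongoDB-style equality on one top-level field.

    If the stored value is an array, a scalar query value matches when it
    equals any element; the whole array can also be matched exactly.
    """
    stored = doc.get(field)
    if isinstance(stored, list):
        return value in stored or stored == value
    return stored == value
```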
  21. Documents are rich structures! { category: "glove", model: "PRO112PT", name: "Air Elite", brand: "Rawlings", price: 229.99, available: ISODate("2013-03-31"), position: ["infield", "outfield", "pitcher"], endorsed: { name: "Ryan Howard", team: "Phillies", position: "first base" } }
      Fields can contain sub-documents.
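Fields inside a sub-document are addressed with dot notation, e.g. `{ "endorsed.team": "Phillies" }` in a MongoDB query. A tiny Python helper, hypothetical and written only for this illustration, shows how such a path resolves inside a nested document:

```python
# The glove document from the slide, reduced to the fields used here.
glove = {
    "model": "PRO112PT",
    "brand": "Rawlings",
    "endorsed": {"name": "Ryan Howard", "team": "Phillies", "position": "first base"},
}


def get_path(doc, path):
    """Follow a dotted path such as "endorsed.team" through nested dicts.

    Returns None when a segment is missing. Arrays are deliberately not
    handled in this sketch.
    """
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current
```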
  22. { category: "glove", model: "PRO112PT", name: "Air Elite", brand: "Rawlings", price: 229.99, available: ISODate("2013-03-31"), position: ["infield", "outfield", "pitcher"], endorsed: { name: "Ryan Howard", team: "Phillies", position: "first base" }, history: [ { date: ISODate("2013-03-31"), price: 279.99 }, { date: ISODate("2013-06-01"), price: 259.79 }, { date: ISODate("2013-08-15"), price: 229.99 } ] }
      Fields can contain an array of sub-documents.
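Keeping the price history inside the product means a question like "what did this glove cost on a given day?" needs no second table. A sketch in Python over the slide's `history` array; the function name `price_on` is hypothetical:

```python
from datetime import datetime

# The "history" array from the slide: one sub-document per price change.
history = [
    {"date": datetime(2013, 3, 31), "price": 279.99},
    {"date": datetime(2013, 6, 1), "price": 259.79},
    {"date": datetime(2013, 8, 15), "price": 229.99},
]


def price_on(history, when):
    """Return the price in effect on `when`, i.e. from the latest entry
    dated on or before it, or None if `when` predates the history."""
    applicable = [entry for entry in history if entry["date"] <= when]
    if not applicable:
        return None
    return max(applicable, key=lambda entry: entry["date"])["price"]
```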
  23. Variation is easy with the document model:
      { category: "bat", model: "B1403E", name: "Air Elite", brand: "Rip-IT", price: 399.99, diameter: "2 5/8", barrel: "R2 Alloy", handle: "R2 Composite", type: "composite" }
      { category: "glove", model: "PRO112PT", name: "Air Elite", brand: "Rawlings", price: 229.99, size: 11.25, position: "outfield", pattern: "Pro taper", material: "leather", color: "black" }
      { category: "ball", model: "ROML", name: "MLB", brand: "Rawlings", price: 6.99, cover: "leather", core: "cork", color: "white" }
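One collection can hold all three shapes. The sketch below uses plain Python, with `find` as a hypothetical stand-in for a query and prices written as numbers for all three products. It shows queries on common fields spanning every product, while category-specific fields only match where they exist:

```python
# The three catalog documents from the slide; each has the common fields
# (category, model, name, brand, price) plus its own specific fields.
catalog = [
    {"category": "bat", "model": "B1403E", "name": "Air Elite", "brand": "Rip-IT",
     "price": 399.99, "diameter": "2 5/8", "barrel": "R2 Alloy",
     "handle": "R2 Composite", "type": "composite"},
    {"category": "glove", "model": "PRO112PT", "name": "Air Elite", "brand": "Rawlings",
     "price": 229.99, "size": 11.25, "position": "outfield", "pattern": "Pro taper",
     "material": "leather", "color": "black"},
    {"category": "ball", "model": "ROML", "name": "MLB", "brand": "Rawlings",
     "price": 6.99, "cover": "leather", "core": "cork", "color": "white"},
]


def find(collection, **criteria):
    """Return documents whose fields equal every criterion.

    A document that lacks a queried field simply does not match, so no
    NULL-filled columns are needed for fields that do not apply.
    """
    return [
        doc for doc in collection
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]
```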
  24. Relational Schema Design: focus on data storage
  25. Document Schema Design: focus on data use
  26. Document Design
  27. Schema Design Considerations
      • How do we manipulate the data?
        – Dynamic Ad-Hoc Queries
        – Atomic Updates
        – Aggregation
      • What are the access patterns of the application?
        – Read/Write Ratio
        – Types of Queries / Updates
        – Data life-cycle and growth rate
  28. Let's model something together. How about a business card?
  29. Business Card
  30. Referencing
      Addresses: { "_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA", "zip_code": "95014", "country": "USA" }
      Contacts: { "_id": 2, "name": "Steven Jobs", "title": "VP, New Product Development", "company": "Apple Computer", "phone": "408-996-1010", "address_id": 1 }
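With references, the application performs the join itself: one read for the contact, a second read to follow `address_id`. A minimal sketch with two Python dicts standing in for the Addresses and Contacts collections (the function name is hypothetical):

```python
# The two referenced documents from the slide, keyed by _id.
addresses = {
    1: {"_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino",
        "state": "CA", "zip_code": "95014", "country": "USA"},
}
contacts = {
    2: {"_id": 2, "name": "Steven Jobs", "title": "VP, New Product Development",
        "company": "Apple Computer", "phone": "408-996-1010", "address_id": 1},
}


def contact_with_address(contact_id):
    """Referencing: two separate lookups, joined in application code."""
    contact = contacts[contact_id]              # first read: the contact
    address = addresses[contact["address_id"]]  # second read: follow the reference
    return contact, address
```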
  31. Embedding
      Contacts: { "_id": 2, "name": "Steven Jobs", "title": "VP, New Product Development", "company": "Apple Computer", "address": { "street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA", "zip_code": "95014", "country": "USA" }, "phone": "408-996-1010" }
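Going from the referenced layout to the embedded one is a mechanical transformation, the kind a migration script might perform. A sketch in Python (the function `embed_address` is hypothetical); afterwards the contact and its address come back in a single read:

```python
# Referenced layout, as on the "Referencing" slide.
addresses = {
    1: {"_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino",
        "state": "CA", "zip_code": "95014", "country": "USA"},
}
contact = {"_id": 2, "name": "Steven Jobs", "title": "VP, New Product Development",
           "company": "Apple Computer", "phone": "408-996-1010", "address_id": 1}


def embed_address(contact, addresses):
    """Return a copy of `contact` with the address_id reference replaced by
    an embedded copy of the address (minus the address's own _id)."""
    doc = {k: v for k, v in contact.items() if k != "address_id"}
    address = {k: v for k, v in addresses[contact["address_id"]].items() if k != "_id"}
    doc["address"] = address
    return doc


embedded = embed_address(contact, addresses)
```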
  32. Relational Schema
      • Contact: name, company, title, phone
      • Address: street, city, state, zip_code
  33. Document Schema
      • Contact: name, company, address (embedded: street, city, state, zip_code), title, phone
  34. Schema Flexibility
      { "name": "Steven Jobs", "title": "VP, New Product Development", "company": "Apple Computer", "address": { "street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA", "zip_code": "95014" }, "phone": "408-996-1010" }
      { "name": "Larry Page", "url": "http://google.com/", "title": "CEO", "company": "Google!", "email": "larry@google.com", "address": { "street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301" }, "phone": "650-618-1499", "fax": "650-330-0100" }
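Because fields are optional per document, queries often need to ask whether a field is present at all; MongoDB's operator for that is `$exists`, e.g. `db.contacts.find({ fax: { $exists: true } })` (the collection name `contacts` is an assumption). A plain-Python imitation over the slide's two contacts, abbreviated; `having` is a hypothetical helper:

```python
# The two contacts from the slide, abbreviated to the fields used here.
contacts = [
    {"name": "Steven Jobs", "title": "VP, New Product Development",
     "company": "Apple Computer", "phone": "408-996-1010"},
    {"name": "Larry Page", "url": "http://google.com/", "title": "CEO",
     "email": "larry@google.com", "phone": "650-618-1499", "fax": "650-330-0100"},
]


def having(collection, field):
    """Keep only documents that contain `field`, in the spirit of a
    MongoDB { field: { $exists: true } } filter."""
    return [doc for doc in collection if field in doc]
```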
  35. Example
  36. Let's Look at an Address Book
  37. Address Book Entity-Relationship. [Diagram: entities Groups (name), Contacts (name, company, title), Addresses (type, street, city, state, zip_code), Phones (type, number), Emails (type, address), Twitters (name, location, web, bio), Thumbnails (mime_type, data) and Portraits (mime_type, data), connected by 1 and N cardinality labels.]
  38. Associating Entities
  39. One to One. [Address book entity-relationship diagram, as on slide 37.]
  40. One to One: General Recommendation [Diagram: Contact embeds twitter, 1:1]
      • Full contact info all at once: Contact embeds twitter
      • Parent-child relationship: "contains"
      • No additional data duplication
      • Can query or index on an embedded field, e.g., "twitter.name"
      • Exceptional cases: reference the portrait, which has very large data
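A sketch of that split in Python, using fields from the contact example later in the deck: the small twitter sub-document is embedded, while the large portrait lives elsewhere and is loaded only on demand. The `portraits` dict, its MIME type and its placeholder bytes are assumptions for illustration, as is the helper name:

```python
# Contact with its twitter info embedded (one-to-one, always read together)
# and its portrait only referenced by id (one-to-one, but very large).
contact = {
    "name": "Gary J. Murakami, Ph.D.",
    "twitter": {"name": "Gary Murakami", "location": "New Providence, NJ",
                "web": "http://www.nobell.org"},
    "portrait_id": 1,
}

# Hypothetical separate portraits collection; the bytes are a placeholder,
# not a real image.
portraits = {1: {"_id": 1, "mime_type": "image/jpeg", "data": b"\x00" * 1024}}


def load_portrait(contact):
    """Fetch the large portrait only when a caller actually needs it."""
    return portraits.get(contact.get("portrait_id"))
```

With a live server and PyMongo, an index on the embedded field would be created with `db.contacts.create_index("twitter.name")`; that call is not exercised here.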
  41. One to Many. [Address book entity-relationship diagram, as on slide 37.]
  42. One to Many: General Recommendation [Diagram: Contact embeds phones, 1:N]
      • Full contact info all at once: Contact embeds multiple phones
      • Parent-children relationship: "contains"
      • No additional data duplication
      • Can query or index on any field, e.g., { "phones.type": "mobile" }
      • Exceptional cases: scaling, since the maximum document size is 16MB
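The `{ "phones.type": "mobile" }` query works because dot notation reaches into every element of an array of sub-documents. A simplified imitation in Python (a hypothetical helper; the real server supports far more, such as `$elemMatch` and exact whole-array matches), run against an abbreviated copy of the deck's contact example:

```python
# Abbreviated contact from the deck's contact example.
contact = {
    "name": "Gary J. Murakami, Ph.D.",
    "phones": [{"type": "work", "number": "1-866-237-8815 x8015"}],
    "emails": [
        {"type": "work", "address": "gary.murakami@mongodb.com"},
        {"type": "home", "address": "gjm@nobell.org"},
    ],
}


def match_path(node, path, value):
    """Simplified dotted-path equality, MongoDB style.

    When the path crosses an array, the query matches if any element
    matches the rest of the path.
    """
    if isinstance(node, list):
        return any(match_path(item, path, value) for item in node)
    if not path:
        return node == value
    if not isinstance(node, dict):
        return False
    head, _, rest = path.partition(".")
    if head not in node:
        return False
    return match_path(node[head], rest, value)
```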
  43. Many to Many. [Address book entity-relationship diagram, as on slide 37.]
  44. Many to Many: Traditional Relational Association
      Contacts (name, company, title, phone) and Groups (name) are linked through a join table, GroupContacts (group_id, contact_id). The join table is crossed out: use arrays instead.
  45. Many to Many: General Recommendation [Diagram: group and contact, N:N]
      • Depends on the use case:
        1. Simple address book: Contact references groups (group_ids: [ ])
        2. Corporate email groups: Group embeds contacts for performance
      • Exceptional cases:
        – Scaling: maximum document size is 16MB
        – Scaling may affect performance and working set
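For the simple address book case, each contact keeps an array of group ids and no join table exists. A sketch in Python; the groups, contacts and helper names are all hypothetical. In MongoDB, the `members` lookup would correspond to an equality query on the array field, such as `{ group_ids: 2 }`:

```python
# Hypothetical data: group and contact names are made up for illustration.
groups = {
    1: {"_id": 1, "name": "Family"},
    2: {"_id": 2, "name": "Work"},
}
contacts = [
    {"_id": 10, "name": "Alice", "group_ids": [1, 2]},
    {"_id": 11, "name": "Bob", "group_ids": [2]},
]


def members(group_id):
    """Contacts whose group_ids array contains group_id."""
    return [c for c in contacts if group_id in c["group_ids"]]


def group_names(contact):
    """Resolve a contact's group references to group names."""
    return [groups[gid]["name"] for gid in contact["group_ids"]]
```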
  46. Document model: holistic and efficient representation. [Diagram: Contacts (name, company, title) with addresses, phones, emails, thumbnail and twitter embedded; Groups (name) and Portraits (mime_type, data) kept as separate collections.]
  47. Contact document example
      {
        "name": "Gary J. Murakami, Ph.D.",
        "company": "MongoDB, Inc.",
        "title": "Lead Engineer",
        "twitter": { "name": "Gary Murakami", "location": "New Providence, NJ", "web": "http://www.nobell.org" },
        "portrait_id": 1,
        "addresses": [ { "type": "work", "street": "229 W 43rd St.", "city": "New York", "zip_code": "10036" } ],
        "phones": [ { "type": "work", "number": "1-866-237-8815 x8015" } ],
        "emails": [ { "type": "work", "address": "gary.murakami@mongodb.com" }, { "type": "home", "address": "gjm@nobell.org" } ]
      }
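With straight quotes this example is plain JSON, so it can be checked mechanically. A short Python sketch that parses it with the standard `json` module and reads embedded data with no second lookup (`emails_of_type` is a hypothetical helper):

```python
import json

# The contact document from the slide, as strict JSON.
CONTACT_JSON = """
{
  "name": "Gary J. Murakami, Ph.D.",
  "company": "MongoDB, Inc.",
  "title": "Lead Engineer",
  "twitter": {"name": "Gary Murakami", "location": "New Providence, NJ",
              "web": "http://www.nobell.org"},
  "portrait_id": 1,
  "addresses": [{"type": "work", "street": "229 W 43rd St.",
                 "city": "New York", "zip_code": "10036"}],
  "phones": [{"type": "work", "number": "1-866-237-8815 x8015"}],
  "emails": [{"type": "work", "address": "gary.murakami@mongodb.com"},
             {"type": "home", "address": "gjm@nobell.org"}]
}
"""

contact = json.loads(CONTACT_JSON)


def emails_of_type(contact, kind):
    """Return the addresses of every embedded email with the given type."""
    return [e["address"] for e in contact.get("emails", []) if e["type"] == kind]
```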
  48. General Recommendations
  49. Legacy Migration
      1. Copy the existing schema and some data to MongoDB
      2. Iterate on schema design development: measure performance, find bottlenecks, and embed
         a. one to one associations first
         b. one to many associations next
         c. many to many associations
      3. Migrate the full dataset to the new schema
      New software application? Embed by default.
  50. Embedding over Referencing
      • Embedding is a bit like pre-joined data: BSON (Binary JSON) document operations are easy for the server
      • Embed (following the 90/10 rule of thumb):
        – when the "one" or "many" objects are viewed in the context of their parent
        – for performance
        – for atomicity
      • Reference:
        – when you need more scaling
        – for easy consistency with "many to many" associations without duplicated data
  51. It's All About Your Application
      • Programs + Databases = (Big) Data Applications
      • Your schema is the impedance matcher:
        – Design choices: normalize/denormalize, reference/embed
        – Melds programming with MongoDB for the best of both
        – Flexible for development and change
      • Programs × MongoDB = Great Big Data Applications
  52. @SoftShakeEvent. Thank You! Tugdual Grall, Technical Evangelist, MongoDB, @tgrall. Soft Shake '14
