MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)

Published in: Technology

  1. MongoDB Schema Design: Insights and Tradeoffs. Montse Medina, COO. Saturday, May 5, 12
  2. Social content is useful in context
  3. Social context is useful in context
  4. Algorithms + Infrastructure
  5. Technology Stack: Apache Kafka
  6. Outline: I. Schema design ‣ Relational vs. Document-oriented ‣ Schema-less design ‣ Case study: Publishers & Subscribers II. Lessons learned for schema design III. Things to remember about MongoDB
  7. Outline: I. Schema design ‣ Relational vs. Document-oriented ‣ Schema-less design ‣ Case study: Publishers & Subscribers II. Lessons learned for schema design III. Things to remember about MongoDB
  8. Relational vs. Document-oriented
     Relational: a Users table (id, name): (1, Robert), (2, Monica), (3, Lucas), ... plus a Graph table (from, to): (1, 5), (1, 20), (2, 1), (2, 5), ...
     vs.
     Document-oriented: Users documents { id: 1, name: "Robert", from: [2], to: [5, 20] }, { id: 2, name: "Monica", from: [23], to: [1, 5] }, ...
  9. Find all the “to” edges for user 5
     Graph table (from, to), rows spread over disk blocks: (1, 5), (1, 20), (2, 1), (2, 5), (3, 4), (3, 23), (3, 12), (4, 5), ... Potentially as many disk seeks as “to” edges!
     vs.
     Users document { id: 5, name: "Robert", from: [1, 2, 4], to: [1, 20, 3, 7, 2] }: 1 disk seek guaranteed!
  10. Advantages of doc-oriented schema
      • Avoid joins
      • Disk locality when fetching relations (everything is stored within a doc record)
      Considerations for schema design
      • N-to-many relations == lists
      • Denormalization is more common
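The lookup contrast on slides 8 to 10 can be sketched in memory. This is plain JavaScript, not MongoDB or SQL; the data mirrors the slides, and the names `graphRows`, `userDocs`, `toEdgesRelational`, and `toEdgesDocument` are made up for illustration.

```javascript
// Relational layout: one row per edge, stored apart from the users.
const graphRows = [
  { from: 1, to: 5 },
  { from: 1, to: 20 },
  { from: 2, to: 1 },
  { from: 2, to: 5 },
];

// Document layout: each user document carries its own edge lists.
const userDocs = new Map([
  [1, { id: 1, name: "Robert", from: [2], to: [5, 20] }],
  [2, { id: 2, name: "Monica", from: [23], to: [1, 5] }],
]);

// Relational: every matching edge row has to be visited, and each row
// may live in a different place on disk.
function toEdgesRelational(rows, userId) {
  return rows.filter((row) => row.from === userId).map((row) => row.to);
}

// Document: a single lookup returns the whole list at once.
function toEdgesDocument(docs, userId) {
  const doc = docs.get(userId);
  return doc ? doc.to : [];
}

console.log(toEdgesRelational(graphRows, 1)); // [ 5, 20 ]
console.log(toEdgesDocument(userDocs, 1)); // [ 5, 20 ]
```

Both functions return the same edges; the difference the slides point at is how many separate records must be touched to get them.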
  11. Outline: I. Schema design ‣ Relational vs. Document-oriented ‣ Schema-less design ‣ Case study: Publishers & Subscribers II. Lessons learned for schema design III. Things to remember about MongoDB
  12. Schema-less design
      {id: 1, network: Twitter, name: "Robert", from: [2], to: [5, 20], screenName: "robertE"}
      {id: 2, network: Facebook, name: "Maria", from: [23], to: [1, 5], likes: ["biking", "hiking"]}
      ...
      Leverage the schemaless nature of Mongo, but put protection with types in your code!
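One way to read "put protection with types in your code" is a small validation step between the database and the rest of the application. The sketch below is an assumption about what that could look like: the field names come from slide 12, while `parseUser` and its rules (which fields are required, which are optional) are hypothetical.

```javascript
// Check that a value is a list of integer ids; a missing list becomes [].
function idList(value, field) {
  const list = value === undefined ? [] : value;
  if (!Array.isArray(list) || !list.every((x) => Number.isInteger(x))) {
    throw new TypeError(field + " must be a list of integer ids");
  }
  return list;
}

// Turn a raw, schema-less document into a predictable shape,
// or fail loudly if a required field has the wrong type.
function parseUser(doc) {
  if (typeof doc !== "object" || doc === null) {
    throw new TypeError("user must be an object");
  }
  if (!Number.isInteger(doc.id)) throw new TypeError("id must be an integer");
  if (typeof doc.network !== "string") throw new TypeError("network must be a string");
  if (typeof doc.name !== "string") throw new TypeError("name must be a string");
  return {
    id: doc.id,
    network: doc.network,
    name: doc.name,
    from: idList(doc.from, "from"),
    to: idList(doc.to, "to"),
    // Optional, network-specific fields get safe defaults.
    screenName: typeof doc.screenName === "string" ? doc.screenName : null,
    likes: Array.isArray(doc.likes) ? doc.likes.filter((x) => typeof x === "string") : [],
  };
}

const twitterUser = parseUser({
  id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE",
});
console.log(twitterUser.likes.length); // 0
```

Documents from different networks can then keep different fields in storage while application code always sees the same keys.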
  13. Outline: I. Schema design ‣ Relational vs. Document-oriented ‣ Schema-less design ‣ Case study: Publishers & Subscribers II. Lessons learned for schema design III. Things to remember about MongoDB
  14. Read-Friendly. Case Study: Publishers & Subscribers
  15. Read-Friendly Approach: one copy of the post (“Hi!”) per recipient
      Post: { _id: postId, owner: ownerId, recipient: recipientId, text: "message", ...}
  16. Read-Friendly Approach
      db.posts.find({recipient: uid})
      Sharding key: recipient
      Fast retrieval, easy sharding
      Slow writes, enormous amount of storage
  17. Write-Friendly. Case Study: Publishers & Subscribers
  18. Write-Friendly Approach: a single copy of the post (“Hi!”)
      Post: { _id: postId, owner: oId, text: "message", ...}
  19. Write-Friendly Approach
      db.posts.find({owner: {$in: user.from}})
      Sharding key: ?
      Fast writes, slim storage
      Slow reads, harder queries
  20. Hybrid Approach. Case Study: Publishers & Subscribers
  21. Hybrid Approach: a single copy of the post (“Hi!”) with a recipients list
      Post: { _id: postId, owner: ownerId, recipients: [u1, u2, u3, u5], text: "message", ...}
  22. Hybrid Approach
      db.posts.find({recipients: uId})
      Sharding key: random :)
      Fast writes, slim storage, reasonable read speed
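The three layouts from the case study (slides 14 to 22) can be compared side by side in memory. This is a sketch with made-up data (`alice`, `u1` to `u3`) and hypothetical function names, using arrays in place of the posts collection; it only illustrates how many documents each approach stores and how each feed is read.

```javascript
// Publisher -> subscribers, and the inverse view kept in user.from on the slides.
const subscribers = { alice: ["u1", "u2", "u3"] };
const following = { u1: ["alice"], u2: ["alice"], u3: ["alice"] };

// Read-friendly: one copy of the post per recipient.
function publishReadFriendly(store, owner, text) {
  for (const recipient of subscribers[owner]) {
    store.push({ owner, recipient, text });
  }
}
const feedReadFriendly = (store, uid) => store.filter((p) => p.recipient === uid);

// Write-friendly: a single copy; readers search by the owners they follow.
function publishWriteFriendly(store, owner, text) {
  store.push({ owner, text });
}
const feedWriteFriendly = (store, uid) =>
  store.filter((p) => following[uid].includes(p.owner));

// Hybrid: a single copy that carries the list of recipients.
function publishHybrid(store, owner, text) {
  store.push({ owner, recipients: [...subscribers[owner]], text });
}
const feedHybrid = (store, uid) => store.filter((p) => p.recipients.includes(uid));

const readStore = [];
const writeStore = [];
const hybridStore = [];
publishReadFriendly(readStore, "alice", "Hi!");
publishWriteFriendly(writeStore, "alice", "Hi!");
publishHybrid(hybridStore, "alice", "Hi!");

// Documents stored per approach for one post with three subscribers.
console.log(readStore.length, writeStore.length, hybridStore.length); // 3 1 1
```

All three feeds return the same post for `u2`; what changes is the write cost (three inserts versus one) and what the read has to match on.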
  23. Random sharding is not random! Minimize the number of disk seeks per shard! Diagram labels: Best (impossible for our data); Worse; Optimal solution
  24. Outline: I. Schema design II. Lessons learned for schema design ‣ Indexes ‣ Concurrency ‣ Reducing collection size III. Things to remember about MongoDB
  25. Outline: I. Schema design II. Lessons learned for schema design ‣ Indexes ‣ Concurrency ‣ Reducing collection size III. Things to remember about MongoDB
  26. Indexes: Primary Key
      link: { _id: ObjectId(...), url: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }
      If your data has a natural PK, use it instead of the default ObjectId:
      link: { _id: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }
  27. Indexes: augment your schema to enable the most selective index
      Want all posts that a user can view, sorted by the number of likes
      post: { _id: ObjectId(...), recipients: [...], likes: [...], likesCount: ..., ...}
      Add a new “likesCount” field!
      db.posts.ensureIndex({recipients: 1, likesCount: -1})
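Slide 27 stores the number of likes as its own field so that it can be part of an index. A minimal in-memory sketch of keeping that field in step with the list follows; `like` and `viewableByLikes` are hypothetical helpers, and the sort here stands in for what the `{recipients: 1, likesCount: -1}` index would provide.

```javascript
// Add a like and keep the denormalized counter in step with the list.
function like(post, userId) {
  if (post.likes.includes(userId)) return false; // already liked: nothing changes
  post.likes.push(userId);
  post.likesCount += 1;
  return true;
}

// Posts a user can view, most liked first.
function viewableByLikes(posts, uId) {
  return posts
    .filter((post) => post.recipients.includes(uId))
    .sort((a, b) => b.likesCount - a.likesCount);
}

const posts = [
  { _id: "p1", recipients: ["u1", "u2"], likes: [], likesCount: 0 },
  { _id: "p2", recipients: ["u1"], likes: [], likesCount: 0 },
];

like(posts[1], "u7");
like(posts[1], "u8");
like(posts[0], "u7");
like(posts[0], "u7"); // duplicate like is ignored

console.log(viewableByLikes(posts, "u1").map((post) => post._id)); // [ 'p2', 'p1' ]
```

The counter duplicates information already present in `likes`, which is the denormalization tradeoff the talk accepts in exchange for a selective index.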
  28. Indexes: make sure to use the proper index. Always test with explain()
      db.posts.find({recipients: uId}).sort({date: -1})
      db.posts.ensureIndex({recipients: 1})
      db.posts.ensureIndex({date: 1})
      vs.
      db.posts.ensureIndex({recipients: 1, date: 1}) (slide annotation: date: -1)
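Why the compound index on slide 28 suits the query can be shown with a toy model. The sketch below assumes an index is just a sorted list of `[recipient, date, postId]` entries, which is a simplification of a real B-tree; `buildIndex`, `lowerBound`, and `newestFor` are invented names.

```javascript
// Entries for a toy {recipients: 1, date: 1} index: one entry per
// recipient of each post, sorted by recipient and then by date.
function buildIndex(posts) {
  const entries = [];
  for (const post of posts) {
    for (const recipient of post.recipients) {
      entries.push([recipient, post.date, post._id]);
    }
  }
  entries.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));
  return entries;
}

// Binary search: first position whose recipient is >= key.
function lowerBound(entries, key) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid][0] < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Newest-first posts for one recipient: find the recipient's contiguous
// run of entries, then walk it backwards. No separate sort step is needed.
function newestFor(entries, uId, limit) {
  const start = lowerBound(entries, uId);
  let end = start;
  while (end < entries.length && entries[end][0] === uId) end++;
  const ids = [];
  for (let i = end - 1; i >= start && ids.length < limit; i--) {
    ids.push(entries[i][2]);
  }
  return ids;
}

const posts = [
  { _id: "p1", recipients: ["u1", "u2"], date: 100 },
  { _id: "p2", recipients: ["u2"], date: 300 },
  { _id: "p3", recipients: ["u1", "u2"], date: 200 },
];
const index = buildIndex(posts);
console.log(newestFor(index, "u2", 2)); // [ 'p2', 'p3' ]
```

With two single-field indexes, only one of the filter or the sort can come from an index and the other has to be done afterwards, which is the kind of difference explain() makes visible.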
  29. Outline: I. Schema design II. Lessons learned for schema design ‣ Indexes ‣ Concurrency ‣ Reducing collection size III. Things to remember about MongoDB
  30. Concurrency: try to avoid “save()” in drivers
      thread1: { _id: u1, name: "Robert", from: [u2, u3] }
      thread2: { _id: u1, name: "Bob", from: [] }
      db.users.update({_id: thread1._id}, {$set: {from: thread1.from}})
      db.users.update({_id: thread2._id}, {$set: {name: thread2.name}})
      …but!
      db.users.update({_id: u1}, {$set: {_id: u1, name: ..., }}, true, false)
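The risk behind slide 30 is a lost update: two threads read the same document, each changes a different field, and whichever writes the whole document last wins. The sketch below reproduces that in memory; `save` and `setFields` are stand-ins written for this example, not driver calls.

```javascript
// In-memory stand-in for the users collection.
function makeDb() {
  return { u1: { _id: "u1", name: "Robert", from: ["u2"] } };
}

const clone = (value) => JSON.parse(JSON.stringify(value));

// save(): replace the stored document with the caller's whole copy.
function save(db, doc) {
  db[doc._id] = clone(doc);
}

// $set-style update: overwrite only the named fields.
function setFields(db, id, fields) {
  Object.assign(db[id], clone(fields));
}

// Both threads read the same document, then change different fields.
function runWith(write) {
  const db = makeDb();
  const thread1 = clone(db.u1);
  const thread2 = clone(db.u1);
  thread1.from = ["u2", "u3"]; // thread 1 adds a subscription
  thread2.name = "Bob"; // thread 2 renames the user
  write(db, thread1, { from: thread1.from });
  write(db, thread2, { name: thread2.name });
  return db.u1;
}

const afterSave = runWith((db, doc) => save(db, doc));
const afterSet = runWith((db, doc, changed) => setFields(db, doc._id, changed));

// Whole-document writes: thread 1's new subscription is overwritten.
console.log(JSON.stringify(afterSave)); // {"_id":"u1","name":"Bob","from":["u2"]}
// Field-level writes: both changes survive.
console.log(JSON.stringify(afterSet)); // {"_id":"u1","name":"Bob","from":["u2","u3"]}
```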
  31. Concurrency: atomic commutative operators
      db.users.update({_id: u1}, {$pull: {to: u2}})
      db.posts.update({_id: pId}, {$inc: {likesCount: 1}})
      When updating lists and counters, instead of using $set, rely on $inc, $addToSet, $pull
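"Commutative" on slide 31 means that two such updates give the same document whichever one lands first. The sketch below checks that with toy versions of `$inc`, `$addToSet`, and `$pull`; these are simplified in-memory imitations written for this example, not the server's implementation, and `applyUpdate` is a hypothetical helper.

```javascript
// Toy versions of three update operators, acting on a plain object.
const operators = {
  $inc: (doc, field, amount) => {
    doc[field] = (doc[field] || 0) + amount;
  },
  $addToSet: (doc, field, value) => {
    if (!doc[field].includes(value)) doc[field].push(value);
  },
  $pull: (doc, field, value) => {
    doc[field] = doc[field].filter((item) => item !== value);
  },
};

// Apply an update document such as {$inc: {likesCount: 1}}.
function applyUpdate(doc, update) {
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, value] of Object.entries(fields)) {
      operators[op](doc, field, value);
    }
  }
  return doc;
}

const fresh = () => ({ _id: "p1", likes: ["u1"], likesCount: 1 });

// Two concurrent changes: u2 likes the post, u1 withdraws a like.
const updateA = { $addToSet: { likes: "u2" }, $inc: { likesCount: 1 } };
const updateB = { $pull: { likes: "u1" }, $inc: { likesCount: -1 } };

const ab = applyUpdate(applyUpdate(fresh(), updateA), updateB);
const ba = applyUpdate(applyUpdate(fresh(), updateB), updateA);

console.log(JSON.stringify(ab) === JSON.stringify(ba)); // true
console.log(ab.likesCount, ab.likes.join()); // 1 u2
```

The order independence holds here because the two updates touch different list elements; adding and pulling the same element would not commute.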
  32. Concurrency: no transactions
      user1: { _id: u1, to: [u2, u3], from: [...], ...}
      user2: { _id: u2, to: [...], from: [u1, ...], ...}
      User1 wants to unsubscribe from user2. Ideally we would update both users in one transaction.
      Implement it in your code
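Slide 32 leaves "implement it in your code" open. One possible reading, offered here as an assumption rather than the speaker's method, is to make each single-document step idempotent so the pair can simply be re-run after a failure, and to scan for half-finished pairs. All names below are hypothetical and the collection is an in-memory object.

```javascript
// In-memory stand-in for the users collection.
const users = {
  u1: { _id: "u1", to: ["u2"], from: [] },
  u2: { _id: "u2", to: [], from: ["u1"] },
};

// $pull-like helper: removing an absent value changes nothing,
// so running it twice is harmless.
function pull(doc, field, value) {
  doc[field] = doc[field].filter((item) => item !== value);
}

// Two separate single-document updates, safe to repeat.
function unsubscribe(db, subscriberId, publisherId) {
  pull(db[subscriberId], "to", publisherId);
  pull(db[publisherId], "from", subscriberId);
}

// Find pairs where the publisher still lists a subscriber
// who no longer lists the publisher.
function halfFinished(db) {
  const pairs = [];
  for (const publisher of Object.values(db)) {
    for (const subscriberId of publisher.from) {
      const subscriber = db[subscriberId];
      if (subscriber && !subscriber.to.includes(publisher._id)) {
        pairs.push([subscriberId, publisher._id]);
      }
    }
  }
  return pairs;
}

// Simulate a crash after the first of the two updates.
pull(users.u1, "to", "u2");
const pending = halfFinished(users);
console.log(JSON.stringify(pending)); // [["u1","u2"]]

// Repair by re-running the whole operation for each pending pair.
for (const [subscriberId, publisherId] of pending) {
  unsubscribe(users, subscriberId, publisherId);
}
console.log(halfFinished(users).length); // 0
```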
  33. Outline: I. Schema design II. Lessons learned for schema design ‣ Indexes ‣ Concurrency ‣ Reducing collection size III. Things to remember about MongoDB
  34. Reducing collection size: name your fields with short names!
      post: { owner: ObjectId, messageText: "loving Jetlore", mediaUrl: "www.jetlore.com", mediaTitle: "Jetlore is a user analytics & search platform for social content" }
      vs.
      post: { o: ObjectId, t: "loving Jetlore", mu: "www.jetlore.com", mt: "Jetlore is a user analytics & search platform for social content" }
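Short field names are hard to read in application code, so a thin mapping layer can translate between the two forms. The sketch below uses the names from slide 34; the helpers `toStored` and `fromStored`, the placeholder owner id, and the use of JSON length as a rough proxy for stored size are all assumptions made for this example.

```javascript
// Long names used in code mapped to short names used in storage (from slide 34).
const SHORT_NAMES = { owner: "o", messageText: "t", mediaUrl: "mu", mediaTitle: "mt" };
const LONG_NAMES = Object.fromEntries(
  Object.entries(SHORT_NAMES).map(([longName, shortName]) => [shortName, longName])
);

// Rename the keys of a flat document; unknown keys are kept as they are.
function rename(doc, names) {
  return Object.fromEntries(
    Object.entries(doc).map(([key, value]) => [
      Object.prototype.hasOwnProperty.call(names, key) ? names[key] : key,
      value,
    ])
  );
}

const toStored = (doc) => rename(doc, SHORT_NAMES);
const fromStored = (doc) => rename(doc, LONG_NAMES);

const post = {
  owner: "placeholder-owner-id", // stands in for the ObjectId on the slide
  messageText: "loving Jetlore",
  mediaUrl: "www.jetlore.com",
  mediaTitle: "Jetlore is a user analytics & search platform for social content",
};

const stored = toStored(post);
// Characters saved per document, measured on the JSON text.
const saved = JSON.stringify(post).length - JSON.stringify(stored).length;
console.log(saved); // 28
```

The saving is per document because field names are stored with every document, so it scales with the size of the collection.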
  35. Outline: I. Schema design II. Lessons learned for schema design III. Things to remember about MongoDB ‣ Single lock ‣ ($or + sort) query doesn’t use indexes properly ‣ Indexes with 2 list fields ‣ Record iterators + update
  36. $or & sort query doesn’t use the proper index
      db.posts.find({$or: [{recipients: uId}, {privacy: Public}]}).sort({date: -1})
      db.posts.ensureIndex({recipients: 1, date: -1})
      db.posts.ensureIndex({privacy: 1, date: -1})
      Indexes with 2 list fields (MongoDB cannot build one compound index over two array fields)
      post: { _id: ObjectId(...), recipients: [...], links: [...], ... }
      db.posts.ensureIndex({recipients: 1, links: 1})
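The slide names the $or plus sort problem but not a fix. One possible application-side workaround, which is an assumption and not something stated in the talk, is to run the two branches as separate sorted queries (each able to use its own index) and merge the results in code. The sketch below shows only the merge step on in-memory arrays; `mergeNewestFirst` is a hypothetical helper.

```javascript
// Merge two lists that are each already sorted newest-first,
// dropping posts that appear in both, and stop at the limit.
function mergeNewestFirst(a, b, limit) {
  const seen = new Set();
  const merged = [];
  let i = 0;
  let j = 0;
  while (merged.length < limit && (i < a.length || j < b.length)) {
    const takeA = j >= b.length || (i < a.length && a[i].date >= b[j].date);
    const next = takeA ? a[i++] : b[j++];
    if (!seen.has(next._id)) {
      seen.add(next._id);
      merged.push(next);
    }
  }
  return merged;
}

// Stand-ins for the results of the two single-branch queries.
const forUser = [
  { _id: "p4", date: 400 },
  { _id: "p2", date: 200 },
];
const publicPosts = [
  { _id: "p4", date: 400 },
  { _id: "p3", date: 300 },
  { _id: "p1", date: 100 },
];

console.log(mergeNewestFirst(forUser, publicPosts, 3).map((post) => post._id)); // [ 'p4', 'p3', 'p2' ]
```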
  37. Record iterators + updating
      var posts = db.posts.find().skip(n).limit(t)
      while (posts.hasNext()) {
        var post = posts.next()
        db.posts.update({_id: post._id}, {$set: {text: NewText}})
      }
      Sort by a field that will not change:
      var posts = db.posts.find().sort({date: 1}).skip(n).limit(t)
      or rename the old collection:
      db.posts.renameCollection("oldPosts")
      var posts = db.oldPosts.find().skip(n).limit(t)
      while (posts.hasNext()) {
        var post = posts.next()
        db.posts.update({_id: post._id}, {$set: {text: NewText}})
      }
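The hazard on slide 37 can be reproduced with a toy model. The sketch assumes that an update may relocate a document to the end of the collection's natural order, which is a simplification chosen for illustration; `ToyCollection` and `rewriteAll` are invented for this example and page through the collection with skip and limit as the slide does.

```javascript
// Toy collection in which every update moves the document to the end
// of natural order (an assumed model of a relocating storage engine).
class ToyCollection {
  constructor(docs) {
    this.docs = docs.map((doc) => ({ ...doc }));
  }

  find({ sortBy = null, skip = 0, limit = Infinity } = {}) {
    const ordered = sortBy
      ? [...this.docs].sort((a, b) => a[sortBy] - b[sortBy])
      : [...this.docs];
    return ordered.slice(skip, skip + limit).map((doc) => ({ ...doc }));
  }

  update(id, fields) {
    const at = this.docs.findIndex((doc) => doc._id === id);
    const [doc] = this.docs.splice(at, 1);
    this.docs.push(Object.assign(doc, fields));
  }
}

// Page through the collection, updating every document that is seen,
// and count how often each document was visited.
function rewriteAll(collection, pageSize, sortBy) {
  const visits = {};
  const total = collection.docs.length;
  for (let skip = 0; skip < total; skip += pageSize) {
    for (const post of collection.find({ sortBy, skip, limit: pageSize })) {
      visits[post._id] = (visits[post._id] || 0) + 1;
      collection.update(post._id, { text: "new text" });
    }
  }
  return visits;
}

const seed = [
  { _id: "a", date: 1 },
  { _id: "b", date: 2 },
  { _id: "c", date: 3 },
  { _id: "d", date: 4 },
];

// Natural order: a and b are processed twice, c and d never.
console.log(JSON.stringify(rewriteAll(new ToyCollection(seed), 2, null))); // {"a":2,"b":2}
// Sorted by a field that does not change: every document exactly once.
console.log(JSON.stringify(rewriteAll(new ToyCollection(seed), 2, "date"))); // {"a":1,"b":1,"c":1,"d":1}
```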
  38. The takeaways
      I. What is more important?
         • Writes: optimize for easy inserts/updates
         • Reads: optimize for easy querying
      II. Denormalize to enable the most selective index
      III. Concurrency: design to leverage commutative operators
  39. Thank you! Try our tech powered by
