MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)

Published in: Technology

  1. MongoDB Schema Design: Insights and Tradeoffs
     Montse Medina, COO
     Saturday, May 5, 12
  2. Social content is useful in context
  3. Social context is useful in context
  4. Algorithms + Infrastructure
  5. Technology Stack: Apache Kafka
  6. Outline
     I. Schema design
        ‣ Relational vs. Document-oriented
        ‣ Schema-less design
        ‣ Case study: Publishers & Subscribers
     II. Lessons learned for schema design
     III. Things to remember about MongoDB
  8. Relational vs. Document-oriented
     Document-oriented (Users collection):
       { id: 1, name: "Robert", from: [2], to: [5, 20] }
       { id: 2, name: "Monica", from: [23], to: [1, 5] }
       ...
     vs. relational (Users and Graph tables):
       Users            Graph
       id  name         from  to
       1   Robert       1     5
       2   Monica       1     20
       3   Lucas        2     1
       ...              2     5
                        ...
  9. Find all the "to" edges for user 5
     Relational (Graph table, stored in disk blocks):
       from  to
       1     5
       1     20
       2     1
       2     5
       3     4
       3     23
       3     12
       4     5
       ...
       Potentially as many disk seeks as "to" edges!
     vs. document-oriented (Users):
       { id: 5, name: "Robert", from: [1, 2, 4], to: [1, 20, 3, 7, 2] }
       1 disk seek guaranteed!
  10. Advantages of doc-oriented schema
      • Avoid joins
      • Disk locality when fetching relations (everything is stored within a doc record)
      Considerations for schema design
      • N to Many relations == Lists
      • Denormalization is more common
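The contrast on slides 8 to 10 can be sketched in plain JavaScript, with no database involved. The sample rows come from slide 8; the helper names `toEdgesRelational` and `toEdgesDocument` are illustrative assumptions, and counting rows touched is only a rough stand-in for disk seeks.

```javascript
// Relational layout: one row per edge in a separate Graph table (data from slide 8).
const graphRows = [
  { from: 1, to: 5 },
  { from: 1, to: 20 },
  { from: 2, to: 1 },
  { from: 2, to: 5 },
];

// Document layout: each user record embeds its own edge lists (data from slide 8).
const userDocs = new Map([
  [1, { id: 1, name: "Robert", from: [2], to: [5, 20] }],
  [2, { id: 2, name: "Monica", from: [23], to: [1, 5] }],
]);

// Hypothetical helper: collect the "to" edges by filtering edge rows.
// Without an index this touches every row, and matching rows may live anywhere.
function toEdgesRelational(userId) {
  let rowsTouched = 0;
  const edges = [];
  for (const row of graphRows) {
    rowsTouched += 1;
    if (row.from === userId) edges.push(row.to);
  }
  return { edges, rowsTouched };
}

// Hypothetical helper: a single record fetch returns the whole list.
function toEdgesDocument(userId) {
  const doc = userDocs.get(userId);
  return { edges: doc ? doc.to : [], recordsFetched: doc ? 1 : 0 };
}

console.log(toEdgesRelational(1)); // edges gathered from the edge table
console.log(toEdgesDocument(1));   // the embedded list from one record
```

Both helpers return the same edges; the difference is how many separate records had to be visited to collect them.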
  12. Schema-less design
      { id: 1, network: Twitter, name: "Robert", from: [2], to: [5, 20], screenName: "robertE" }
      { id: 2, network: Facebook, name: "Maria", from: [23], to: [1, 5], likes: ["biking", "hiking"] }
      ...
      Leverage the schemaless nature of Mongo, but put protection with types in your code!
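One way to read "put protection with types in your code" is to validate each document at the boundary between the driver and the application. The sketch below is a hypothetical plain-JavaScript validator; the function `parseUser` and its rules are assumptions for illustration and are not shown in the talk.

```javascript
// Hypothetical validator: fields shared by every user document are required,
// while network-specific fields (screenName, likes) stay optional.
function isIdList(value) {
  return Array.isArray(value) && value.every((x) => Number.isInteger(x));
}

function parseUser(doc) {
  if (!Number.isInteger(doc.id)) throw new TypeError("id must be an integer");
  if (typeof doc.name !== "string") throw new TypeError("name must be a string");
  if (typeof doc.network !== "string") throw new TypeError("network must be a string");
  if (!isIdList(doc.from) || !isIdList(doc.to)) {
    throw new TypeError("from and to must be lists of integer ids");
  }
  if (doc.screenName !== undefined && typeof doc.screenName !== "string") {
    throw new TypeError("screenName must be a string when present");
  }
  if (doc.likes !== undefined &&
      !(Array.isArray(doc.likes) && doc.likes.every((x) => typeof x === "string"))) {
    throw new TypeError("likes must be a list of strings when present");
  }
  return doc;
}

// Both document shapes from slide 12 pass (network values written as strings here).
parseUser({ id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE" });
parseUser({ id: 2, network: "Facebook", name: "Maria", from: [23], to: [1, 5], likes: ["biking", "hiking"] });
```

The database stays schemaless, so new optional fields need no migration, while malformed documents fail loudly in application code.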
  14. Case Study: Publishers & Subscribers
      Read-Friendly
  15. Read-Friendly Approach
      [Diagram: the message "Hi!" appears three times]
      Post: { _id: postId, owner: ownerId, recipient: recipientId, text: "message", ... }
  16. Read-Friendly Approach
      db.posts.find({recipient: uid})
      Sharding Key: recipient
      Fast retrieval, easy sharding
      Slow writes, enormous amount of storage
  17. Case Study: Publishers & Subscribers
      Write-Friendly
  18. Write-Friendly Approach
      [Diagram: the message "Hi!" appears once]
      Post: { _id: postId, owner: oId, text: "message", ... }
  19. Write-Friendly Approach
      db.posts.find({owner: {$in: user.from}})
      Sharding Key: ?
      Fast writes, slim storage
      Slow reads, harder queries
  20. Case Study: Publishers & Subscribers
      Hybrid Approach
  21. Hybrid Approach
      [Diagram: the message "Hi!" appears once]
      Post: { _id: postId, owner: ownerId, recipients: [u1, u2, u3, u5], text: "message", ... }
  22. Hybrid Approach
      db.posts.find({recipients: uId})
      Sharding Key: random :)
      Fast writes, slim storage, reasonable read speed
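The three approaches in the case study can be compared with a small in-memory sketch. The publisher `u0`, its three subscribers, and all function names are hypothetical; the filters only imitate the three `find()` calls from slides 16, 19 and 22 and say nothing about real query cost.

```javascript
// Hypothetical data: one publisher (u0) with three subscribers.
const subscribers = ["u1", "u2", "u3"];
const following = { u1: ["u0"], u2: ["u0"], u3: ["u0"] }; // the "from" list per user

// Read-friendly: one copy of the post per recipient.
function storeReadFriendly(owner, text) {
  return subscribers.map((recipient) => ({ owner, recipient, text }));
}
// Write-friendly: a single post with no recipient information.
function storeWriteFriendly(owner, text) {
  return [{ owner, text }];
}
// Hybrid: a single post that carries the recipient list.
function storeHybrid(owner, text) {
  return [{ owner, recipients: [...subscribers], text }];
}

// In-memory equivalents of the find() calls on slides 16, 19 and 22.
const readReadFriendly = (posts, uid) => posts.filter((p) => p.recipient === uid);
const readWriteFriendly = (posts, uid) => posts.filter((p) => following[uid].includes(p.owner));
const readHybrid = (posts, uid) => posts.filter((p) => p.recipients.includes(uid));

const readFriendlyPosts = storeReadFriendly("u0", "Hi!");
const writeFriendlyPosts = storeWriteFriendly("u0", "Hi!");
const hybridPosts = storeHybrid("u0", "Hi!");

// Number of documents written for a single post under each approach.
console.log(readFriendlyPosts.length, writeFriendlyPosts.length, hybridPosts.length);
```

The read-friendly layout writes one document per subscriber, while the other two write one document per post. The hybrid layout keeps the read a single-field match on `recipients` instead of an `$in` over the reader's `from` list.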
  23. Random sharding is not random!
      Minimize the number of disk seeks per shard!
      [Diagram labels: "Best (impossible for our data)", "Worse", "Optimal solution"]
  24. Outline
      I. Schema design
      II. Lessons learned for schema design
          ‣ Indexes
          ‣ Concurrency
          ‣ Reducing collection size
      III. Things to remember about MongoDB
  26. Indexes: Primary Key
      link: {
        _id: ObjectId(...),
        url: "www.jetlore.com",
        title: "Jetlore is a search platform for social content",
        description: "..."
      }
      link: {
        _id: "www.jetlore.com",
        title: "Jetlore is a search platform for social content",
        description: "..."
      }
      If your data has a natural PK, use it instead of the default ObjectId
  27. Indexes: Augment your schema to enable the most selective index
      Want all posts that a user can view sorted by the number of likes
      post: {
        _id: ObjectId(...),
        recipients: [...],
        likes: [...],
        likesCount: ...,
        ...
      }
      Add a new "likesCount" field!
      db.posts.ensureIndex({recipients: 1, likesCount: -1})
  28. Indexes: Make sure to use the proper index
      db.posts.find({recipients: uId}).sort({date: -1})
      db.posts.ensureIndex({recipients: 1})
      db.posts.ensureIndex({date: 1})
      vs
      db.posts.ensureIndex({recipients: 1, date: 1})   [slide annotation: date: -1]
      Always test with explain()
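Why a compound index on `recipients` and `date` suits the query on slide 28 can be sketched with a sorted array standing in for the index. This is a simplified model under stated assumptions: the sample posts, the `entries` array and `findSorted` are hypothetical, dates are plain numbers, and a real index is a B-tree searched in logarithmic time, not a linear scan.

```javascript
// Hypothetical posts; recipients is a list, so each post yields several index entries.
const posts = [
  { _id: 1, recipients: ["u1", "u2"], date: 3 },
  { _id: 2, recipients: ["u2"], date: 1 },
  { _id: 3, recipients: ["u1"], date: 2 },
];

// One entry per (recipient, date) pair, ordered by recipient, then newest date first.
const entries = posts
  .flatMap((p) => p.recipients.map((r) => ({ recipient: r, date: p.date, id: p._id })))
  .sort((x, y) => {
    if (x.recipient < y.recipient) return -1;
    if (x.recipient > y.recipient) return 1;
    return y.date - x.date;
  });

// All entries for one recipient are contiguous and already in date order,
// so the query needs no separate sort step.
function findSorted(uid) {
  const start = entries.findIndex((e) => e.recipient === uid);
  if (start === -1) return [];
  const ids = [];
  for (let i = start; i < entries.length && entries[i].recipient === uid; i++) {
    ids.push(entries[i].id);
  }
  return ids;
}

console.log(findSorted("u1"));
console.log(findSorted("u2"));
```

With two single-field indexes, one index can narrow by recipient or order by date, but not both at once, which is what `explain()` would help reveal on a real deployment.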
  30. Concurrency: Try to avoid "save()" in drivers
      thread1: { _id: u1, name: "Robert", from: [u2, u3] }
      thread2: { _id: u1, name: "Bob", from: [] }
      db.users.update({_id: thread1._id}, {$set: {from: thread1.from}})
      db.users.update({_id: thread2._id}, {$set: {name: thread2.name}})
      …but!
      db.users.update({_id: u1}, {$set: {_id: u1, name: ..., }}, true, false)   (upsert: true, multi: false)
  31. Concurrency: Atomic Commutative Operators
      db.users.update({_id: u1}, {$pull: {to: u2}})
      db.posts.update({_id: pId}, {$inc: {likesCount: 1}})
      When updating lists and counters, instead of using $set, rely on $inc, $addToSet, $pull
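The difference between whole-document saves and commutative operators can be illustrated without a database. In this sketch `stored`, `save`, `inc` and `addToSet` are hypothetical in-memory stand-ins; they imitate only the arithmetic of `$inc` and `$addToSet`, which MongoDB applies atomically to a single document on the server.

```javascript
// In-memory stand-in for a single stored document (not a real MongoDB collection).
let stored = { _id: "p1", likesCount: 0, likes: [] };

// save()-style write: replaces the whole document with the caller's copy.
function save(doc) {
  stored = JSON.parse(JSON.stringify(doc));
}
// Minimal imitations of $inc and $addToSet, applied to the stored state.
function inc(field, by) {
  stored[field] += by;
}
function addToSet(field, value) {
  if (!stored[field].includes(value)) stored[field].push(value);
}

// Two threads read the same snapshot, then each saves its own modified copy.
const copy1 = JSON.parse(JSON.stringify(stored));
const copy2 = JSON.parse(JSON.stringify(stored));
copy1.likesCount += 1; copy1.likes.push("u1");
copy2.likesCount += 1; copy2.likes.push("u2");
save(copy1);
save(copy2); // overwrites the like recorded by thread 1
const afterSave = { count: stored.likesCount, likes: stored.likes.length };

// The same two likes expressed as operators; the order of application does not matter.
stored = { _id: "p1", likesCount: 0, likes: [] };
inc("likesCount", 1); addToSet("likes", "u2");
inc("likesCount", 1); addToSet("likes", "u1");
const afterOperators = { count: stored.likesCount, likes: stored.likes.length };

console.log(afterSave, afterOperators);
```

The save-based run ends with one like recorded although two users liked the post; the operator-based run keeps both, because each operation describes a change to the stored state and does not overwrite it.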
  32. Concurrency: No Transactions
      user1: { _id: u1, to: [u2, u3], from: [...], ... }
      user2: { _id: u2, to: [...], from: [u1, ...], ... }
      User1 wants to unsubscribe from user2.
      Ideally we would update both users in one transaction
      Implement it in your code
  34. Reducing collection size: Name your fields with short names!
      post: {
        owner: ObjectId,
        messageText: "loving Jetlore",
        mediaUrl: "www.jetlore.com",
        mediaTitle: "Jetlore is a user analytics & search platform for social content"
      }
      vs
      post: {
        o: ObjectId,
        t: "loving Jetlore",
        mu: "www.jetlore.com",
        mt: "Jetlore is a user analytics & search platform for social content"
      }
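A rough estimate of what the renaming on slide 34 saves: BSON stores every field name inside every document, so shortening a name saves its length difference once per document. The sketch below is a back-of-the-envelope calculation; `bytesSavedPerDoc` and the collection size are hypothetical, and it counts field-name bytes only.

```javascript
// Field-name pairs from slide 34: long name -> short name.
const renames = { owner: "o", messageText: "t", mediaUrl: "mu", mediaTitle: "mt" };

// Bytes saved per document equal the summed difference in name lengths
// (ASCII names assumed, so one character is one byte).
function bytesSavedPerDoc(map) {
  return Object.entries(map).reduce(
    (sum, [longName, shortName]) => sum + (longName.length - shortName.length),
    0
  );
}

const perDoc = bytesSavedPerDoc(renames);
const docCount = 100000000; // hypothetical collection size, not a figure from the talk
console.log(perDoc, "bytes per document,", (perDoc * docCount) / 1e9, "GB across the collection");
```

The saving per document is small, so the technique matters mainly for collections with very many small documents, where field names make up a noticeable share of each record.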
  35. Outline
      I. Schema design
      II. Lessons learned for schema design
      III. Things to remember about MongoDB
          ‣ Single lock
          ‣ ($or + sort) query doesn't use indexes properly
          ‣ Indexes with 2 list fields
          ‣ Record iterators + update
  36. $or & sort query doesn't use the proper index
      db.posts.find({$or: [{recipients: uId}, {privacy: Public}]}).sort({date: -1})
      db.posts.ensureIndex({recipients: 1, date: -1})
      db.posts.ensureIndex({privacy: 1, date: -1})
      Indexes with 2 list fields
      post: {
        _id: ObjectId(...),
        recipients: [...],
        links: [...],
        ...
      }
      db.posts.ensureIndex({recipients: 1, links: 1})
  37. Record iterators + updating
      var posts = db.posts.find().skip(n).limit(t)
      while (posts.hasNext()) {
        var post = posts.next()
        db.posts.update({_id: post._id}, {$set: {text: NewText}})
      }
      Sort by a field that will not change or rename the old collection
      var posts = db.posts.find().sort({date: 1}).skip(n).limit(t)
      db.posts.renameCollection("oldPosts")
      var posts = db.oldPosts.find().skip(n).limit(t)
      while (posts.hasNext()) {
        var post = posts.next()
        db.posts.update({_id: post._id}, {$set: {text: NewText}})
      }
  38. The takeaways
      I. What is more important?
         • Writes: Optimize for easy inserts/updates
         • Reads: Optimize for easy querying
      II. Denormalize to enable the most selective index
      III. Concurrency: design to leverage commutative operators
  39. Thank you! Try our tech powered by
