MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)
Published in: Technology

Transcript

  • 1. MongoDB Schema Design: Insights and Tradeoffs. Montse Medina, COO. Saturday, May 5, 12
  • 2. Social content is useful in context
  • 3. Social context is useful in context
  • 4. Algorithms + Infrastructure
  • 5. Technology Stack: Apache Kafka
  • 6. Outline I. Schema design ‣ Relational vs. Document-oriented ‣ Schema-less design ‣ Case study: Publishers & Subscribers II. Lessons learned for schema design III. Things to remember about MongoDB
  • 7. Outline I. Schema design ‣ Relational vs. Document-oriented ‣ Schema-less design ‣ Case study: Publishers & Subscribers II. Lessons learned for schema design III. Things to remember about MongoDB
  • 8. Relational vs. Document-oriented
    Relational:
    ‣ Users table (id, name): (1, Robert), (2, Monica), (3, Lucas), ...
    ‣ Graph table (from, to): (1, 5), (1, 20), (2, 1), (2, 5), ...
    vs.
    Document-oriented (Users collection):
    ‣ { id: 1, name: "Robert", from: [2], to: [5, 20] }
    ‣ { id: 2, name: "Monica", from: [23], to: [1, 5] }
    ‣ ...
  • 9. Find all the "to" edges for user 5
    Relational, Graph table (from, to) spread across disk blocks: (1, 5), (1, 20), (2, 1), (2, 5), (3, 4), (3, 23), (3, 12), (4, 5), ...
    ‣ Potentially as many disk seeks as "to" edges!
    vs.
    Document-oriented, Users: { id: 5, name: "Robert", from: [1, 2, 4], to: [1, 20, 3, 7, 2] }
    ‣ 1 disk seek guaranteed!
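The contrast on the two slides above can be sketched in plain JavaScript. This is only an in-memory illustration using the sample rows from slide 8: the arrays and the `Map` stand in for a table and a collection, the function names are hypothetical, and nothing here measures real disk seeks.

```javascript
// In-memory sketch only: plain arrays stand in for a table and a collection.
// Relational layout: one row per (from, to) pair in a separate edge table.
const graphRows = [
  { from: 1, to: 5 }, { from: 1, to: 20 },
  { from: 2, to: 1 }, { from: 2, to: 5 },
];

// Document layout: each user record embeds its own "to" list.
const users = new Map([
  [1, { id: 1, name: "Robert", from: [2], to: [5, 20] }],
  [2, { id: 2, name: "Monica", from: [23], to: [1, 5] }],
]);

// Relational style: gather every matching row; on disk these rows
// may live in different blocks.
function toEdgesFromRows(rows, userId) {
  return rows.filter((row) => row.from === userId).map((row) => row.to);
}

// Document style: a single lookup returns the list with the user record.
function toEdgesFromDoc(docs, userId) {
  const doc = docs.get(userId);
  return doc ? doc.to : [];
}
```

Both functions return the same edges for user 1; the difference the slide points at is how many separate records have to be touched to get them.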
  • 10. Advantages of doc-oriented schema
    ‣ Avoid joins
    ‣ Disk locality when fetching relations (everything is stored within a doc record)
    Considerations for schema design
    ‣ N-to-many relations == lists
    ‣ Denormalization is more common
  • 11. Outline I. Schema design ‣ Relational vs. Document-oriented ‣ Schema-less design ‣ Case study: Publishers & Subscribers II. Lessons learned for schema design III. Things to remember about MongoDB
  • 12. Schema-less design
    { id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE" }
    { id: 2, network: "Facebook", name: "Maria", from: [23], to: [1, 5], likes: ["biking", "hiking"] }
    ...
    Leverage the schemaless nature of Mongo, but put protection with types in your code!
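One way to read "protection with types in your code" is a small validation step before a document is written. The sketch below is an assumption about what that could look like; `validateUser` and its error messages are hypothetical, and only the field names come from the slide.

```javascript
// Hypothetical validator; field names follow the slide's example documents.
function validateUser(doc) {
  const errors = [];
  if (typeof doc.id !== "number") errors.push("id must be a number");
  if (typeof doc.name !== "string") errors.push("name must be a string");
  for (const field of ["from", "to"]) {
    const list = doc[field];
    if (!Array.isArray(list) || !list.every((x) => typeof x === "number")) {
      errors.push(field + " must be a list of numbers");
    }
  }
  // Network-specific fields are optional but type-checked when present.
  if ("screenName" in doc && typeof doc.screenName !== "string") {
    errors.push("screenName must be a string");
  }
  if ("likes" in doc &&
      !(Array.isArray(doc.likes) && doc.likes.every((x) => typeof x === "string"))) {
    errors.push("likes must be a list of strings");
  }
  return errors; // an empty list means the document passed
}
```

The collection stays schema-less, so Twitter and Facebook users can carry different optional fields, while the shared fields keep a predictable type.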
  • 13. Outline I. Schema design ‣ Relational vs. Document-oriented ‣ Schema-less design ‣ Case study: Publishers & Subscribers II. Lessons learned for schema design III. Things to remember about MongoDB
  • 14. Read-Friendly. Case Study: Publishers & Subscribers
  • 15. Read-Friendly Approach
    [Diagram: three "Hi!" messages]
    Post: { _id: postId, owner: ownerId, recipient: recipientId, text: "message", ... }
  • 16. Read-Friendly Approach
    db.posts.find({recipient: uid})
    Sharding Key: recipient
    ‣ Fast retrieval, easy sharding
    ‣ Slow writes, enormous amount of storage
  • 17. Write-Friendly. Case Study: Publishers & Subscribers
  • 18. Write-Friendly Approach
    [Diagram: one "Hi!" message]
    Post: { _id: postId, owner: oId, text: "message", ... }
  • 19. Write-Friendly Approach
    db.posts.find({owner: {$in: user.from}})
    Sharding Key: ?
    ‣ Fast writes, slim storage
    ‣ Slow reads, harder queries
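The storage and query tradeoff between the read-friendly and write-friendly approaches can be sketched with in-memory arrays. The function names are hypothetical and the arrays only stand in for the `posts` collection; the comments note which shell query from the slides each reader mirrors.

```javascript
// In-memory sketch; an array stands in for the posts collection.

// Read-friendly: store one copy of the post per recipient (fan-out on write).
function publishReadFriendly(posts, owner, recipients, text) {
  for (const recipient of recipients) {
    posts.push({ owner, recipient, text });
  }
}

// Mirrors db.posts.find({recipient: uid}).
function feedReadFriendly(posts, uid) {
  return posts.filter((p) => p.recipient === uid);
}

// Write-friendly: store the post once and resolve recipients at read time.
function publishWriteFriendly(posts, owner, text) {
  posts.push({ owner, text });
}

// Mirrors db.posts.find({owner: {$in: user.from}}).
function feedWriteFriendly(posts, user) {
  return posts.filter((p) => user.from.includes(p.owner));
}
```

With three subscribers, the read-friendly publish stores three documents for one message, while the write-friendly publish stores one and pushes the work into the `$in`-style read.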
  • 20. Hybrid Approach. Case Study: Publishers & Subscribers
  • 21. Hybrid Approach
    [Diagram: one "Hi!" message]
    Post: { _id: postId, owner: ownerId, recipients: [u1, u2, u3, u5], text: "message", ... }
  • 22. Hybrid Approach
    db.posts.find({recipients: uId})
    Sharding Key: random :)
    ‣ Fast writes, slim storage, reasonable read speed
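A minimal in-memory sketch of the hybrid approach, again with hypothetical function names: the post is stored once, but it carries its recipient list, so a read is a simple membership match. In MongoDB, a query such as `{recipients: uId}` on an array field matches documents whose array contains that value, which is what `feedHybrid` imitates.

```javascript
// In-memory sketch; an array stands in for the posts collection.

// One document per post, with the recipient list embedded.
function publishHybrid(posts, owner, recipients, text) {
  posts.push({ owner, recipients: [...recipients], text });
}

// Mirrors db.posts.find({recipients: uId}): match when the list contains uId.
function feedHybrid(posts, uId) {
  return posts.filter((p) => p.recipients.includes(uId));
}
```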
  • 23. Random sharding is not random!
    Minimize the number of disk seeks per shard!
    [Diagram labels: Best (impossible for our data), Worse, Optimal solution]
  • 24. Outline I. Schema design II. Lessons learned for schema design ‣ Indexes ‣ Concurrency ‣ Reducing collection size III. Things to remember about MongoDB
  • 25. Outline I. Schema design II. Lessons learned for schema design ‣ Indexes ‣ Concurrency ‣ Reducing collection size III. Things to remember about MongoDB
  • 26. Indexes: Primary Key
    If your data has a natural PK, use it instead of the default ObjectId.
    link: { _id: ObjectId(...), url: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }
    link: { _id: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }
  • 27. Indexes: Augment your schema to enable the most selective index
    Want all posts that a user can view, sorted by the number of likes.
    post: { _id: ObjectId(...), recipients: [...], likes: [...], likesCount: ..., ... }
    Add a new "likesCount" field!
    db.posts.ensureIndex({recipients: 1, likesCount: -1})
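The idea of a denormalized `likesCount` can be sketched in memory as follows. The helper names are hypothetical, and the sort here is an ordinary array sort rather than an index scan; the point is only that the counter is kept in step with the `likes` list so that it can serve as a sort key.

```javascript
// In-memory sketch; helper names are hypothetical.

// Keep the denormalized counter in step with the likes list.
function addLike(post, userId) {
  if (!post.likes.includes(userId)) {
    post.likes.push(userId);
    post.likesCount += 1;
  }
}

// Posts a user can view, most liked first. An index on
// {recipients: 1, likesCount: -1} is meant to serve this access pattern.
function topPostsFor(posts, uId) {
  return posts
    .filter((p) => p.recipients.includes(uId))
    .sort((a, b) => b.likesCount - a.likesCount);
}
```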
  • 28. Indexes: Make sure to use the proper index
    db.posts.find({recipients: uId}).sort({date: -1})
    Always test with explain()
    db.posts.ensureIndex({recipients: 1})
    db.posts.ensureIndex({date: 1})
    vs.
    db.posts.ensureIndex({recipients: 1, date: 1}) [slide annotation: date: -1]
  • 29. Outline I. Schema design II. Lessons learned for schema design ‣ Indexes ‣ Concurrency ‣ Reducing collection size III. Things to remember about MongoDB
  • 30. Concurrency: Try to avoid "save()" in drivers
    thread1: { _id: u1, name: "Robert", from: [u2, u3] }
    thread2: { _id: u1, name: "Bob", from: [] }
    db.users.update({_id: thread1._id}, {$set: {from: thread1.from}})
    db.users.update({_id: thread2._id}, {$set: {name: thread2.name}})
    …but!
    db.users.update({_id: u1}, {$set: {_id: u1, name: ..., }}, true, false)
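Why a whole-document save is risky can be shown with a small in-memory simulation. This is a sketch under assumptions: the `Map` stands in for the `users` collection, `save` and `setFields` are hypothetical stand-ins for a full-document overwrite and a `$set` update, and the two "threads" simply run one after the other from the same stale read.

```javascript
// In-memory sketch; a Map stands in for the users collection.
function save(store, doc) {
  store.set(doc._id, { ...doc }); // replaces the whole stored document
}

function setFields(store, id, fields) {
  Object.assign(store.get(id), fields); // touches only the named fields
}

function freshStore() {
  return new Map([["u1", { _id: "u1", name: "Robert", from: [] }]]);
}

// Both threads read the same document, then change different fields.
function runWithSave() {
  const store = freshStore();
  const thread1 = { ...store.get("u1"), from: ["u2", "u3"] };
  const thread2 = { ...store.get("u1"), name: "Bob" };
  save(store, thread1);
  save(store, thread2); // overwrites thread1's "from" with the stale []
  return store.get("u1");
}

function runWithSet() {
  const store = freshStore();
  const thread1 = { ...store.get("u1"), from: ["u2", "u3"] };
  const thread2 = { ...store.get("u1"), name: "Bob" };
  setFields(store, "u1", { from: thread1.from });
  setFields(store, "u1", { name: thread2.name });
  return store.get("u1");
}
```

In `runWithSave` the second save silently drops the first thread's `from` list, whereas in `runWithSet` both changes survive because each update names only the field it owns.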
  • 31. Concurrency: Atomic Commutative Operators
    db.users.update({_id: u1}, {$pull: {to: u2}})
    db.posts.update({_id: pId}, {$inc: {likesCount: 1}})
    When updating lists and counters, rely on $inc, $addToSet, and $pull instead of $set.
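The value of commutative operators can be illustrated with a toy, in-memory interpreter for three of them. `applyUpdate` and `applyAll` are hypothetical helpers that cover only the simple cases used on the slide; they are not MongoDB's implementation and ignore everything else the real operators support.

```javascript
// Toy in-memory versions of $inc, $addToSet and $pull (simple cases only).
function applyUpdate(doc, update) {
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount;
  }
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    const list = doc[field] || (doc[field] = []);
    if (!list.includes(value)) list.push(value);
  }
  for (const [field, value] of Object.entries(update.$pull || {})) {
    doc[field] = (doc[field] || []).filter((x) => x !== value);
  }
  return doc;
}

// Apply a sequence of updates to a copy of the starting document.
function applyAll(start, updates) {
  const doc = JSON.parse(JSON.stringify(start));
  for (const update of updates) applyUpdate(doc, update);
  return doc;
}
```

Two concurrent `{$inc: {likesCount: 1}}` updates leave the counter at +2 whichever arrives first, whereas two `$set` updates computed from the same stale read would both write the same value and lose one like.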
  • 32. Concurrency: No Transactions
    user1: { _id: u1, to: [u2, u3], from: [...], ... }
    user2: { _id: u2, to: [...], from: [u1, ...], ... }
    User1 wants to unsubscribe from user2. Ideally we would update both users in one transaction.
    Implement it in your code.
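The slide does not say how the two-document update is implemented in code. One possible reading, offered here only as an assumption, is to issue two single-document `$pull`-style updates that are each idempotent, so that a retry after a partial failure converges to the same state. The `unsubscribe` helper and the `Map` below are hypothetical.

```javascript
// In-memory sketch; a Map stands in for the users collection.
function unsubscribe(users, subscriberId, publisherId) {
  const subscriber = users.get(subscriberId);
  const publisher = users.get(publisherId);
  // Step 1: like {$pull: {to: publisherId}} on the subscriber.
  subscriber.to = subscriber.to.filter((id) => id !== publisherId);
  // Step 2: like {$pull: {from: subscriberId}} on the publisher.
  publisher.from = publisher.from.filter((id) => id !== subscriberId);
}
```

Because each step only removes a value that may already be gone, calling `unsubscribe` again after a crash between the two steps is harmless.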
  • 33. Outline I. Schema design II. Lessons learned for schema design ‣ Indexes ‣ Concurrency ‣ Reducing collection size III. Things to remember about MongoDB
  • 34. Reducing collection size: Name your fields with short names!
    post: { owner: ObjectId, messageText: "loving Jetlore", mediaUrl: "www.jetlore.com", mediaTitle: "Jetlore is a user analytics & search platform for social content" }
    vs.
    post: { o: ObjectId, t: "loving Jetlore", mu: "www.jetlore.com", mt: "Jetlore is a user analytics & search platform for social content" }
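Field names are stored in every document, so shorter names save space on each one. A rough way to see the effect is to compare serialized sizes, sketched below. This uses JSON text length as a stand-in for the stored size, which is an assumption: MongoDB stores BSON, so the absolute numbers differ, and the `owner` value here is a placeholder string rather than a real ObjectId.

```javascript
// Rough illustration: JSON length as a proxy for stored size (not BSON).
const longNames = {
  owner: "owner-id-placeholder",
  messageText: "loving Jetlore",
  mediaUrl: "www.jetlore.com",
  mediaTitle: "Jetlore is a user analytics & search platform for social content",
};

const shortNames = {
  o: longNames.owner,
  t: longNames.messageText,
  mu: longNames.mediaUrl,
  mt: longNames.mediaTitle,
};

function jsonSize(doc) {
  return JSON.stringify(doc).length;
}

// Characters saved per document purely from shorter field names.
const savedPerDoc = jsonSize(longNames) - jsonSize(shortNames);
```

The saving is the same for every document regardless of its values, so it scales linearly with the size of the collection.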
  • 35. Outline I. Schema design II. Lessons learned for schema design III. Things to remember about MongoDB ‣ Single lock ‣ ($or + sort) query doesn't use indexes properly ‣ Indexes with 2 list fields ‣ Record iterators + update
  • 36. $or & sort query doesn't use the proper index
    db.posts.find({$or: [{recipients: uId}, {privacy: Public}]}).sort({date: -1})
    db.posts.ensureIndex({recipients: 1, date: -1})
    db.posts.ensureIndex({privacy: 1, date: -1})
    Indexes with 2 list fields
    post: { _id: ObjectId(...), recipients: [...], links: [...], ... }
    db.posts.ensureIndex({recipients: 1, links: 1})
  • 37. Record iterators + updating
    var posts = db.posts.find().skip(n).limit(t)
    while (posts.hasNext()) {
      var post = posts.next()
      db.posts.update({_id: post._id}, {$set: {text: NewText}})
    }
    Sort by a field that will not change:
    var posts = db.posts.find().sort({date: 1}).skip(n).limit(t)
    or rename the old collection:
    db.posts.renameCollection("oldPosts")
    var posts = db.oldPosts.find().skip(n).limit(t)
    while (posts.hasNext()) {
      var post = posts.next()
      db.posts.update({_id: post._id}, {$set: {text: NewText}})
    }
  • 38. The takeaways
    I. What is more important?
    ‣ Writes: Optimize for easy inserts/updates
    ‣ Reads: Optimize for easy querying
    II. Denormalize to enable the most selective index
    III. Concurrency: design to leverage commutative operators
  • 39. Thank you! Try our tech powered by