Data Modeling for the Real World

951 views

Published on

This talk examines four real-world use cases for MongoDB document-based data modeling. We examine the implications of several possible solutions for each problem.

0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
951
On SlideShare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
23
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Data Modeling for the Real World

  1. 1. Perl Engineer & Evangelist, 10genMike Friedman#MongoDBdaysSchema DesignFour Real-World UseCases
  2. 2. Single Table EnAgenda• Why is schema design important• 4 Real World Schemas– Inbox– History– IndexedAttributes– Multiple Identities• Conclusions
  3. 3. Why is Schema Designimportant?• Largest factor for a performant system• Schema design with MongoDB is different• RDBMS – "What answers do I have?"• MongoDB – "What question will I have?"
  4. 4. #1 - Message Inbox
  5. 5. Let’s getSocial
  6. 6. Sending Messages?
  7. 7. Design Goals• Efficiently send new messages to recipients• Efficiently read inbox
  8. 8. Reading my Inbox?
  9. 9. 3 Approaches (there aremore)• Fan out on Read• Fan out on Write• Fan out on Write with Bucketing
  10. 10. // Shard on "from"db.shardCollection( "mongodbdays.inbox", { from: 1 } )// Make sure we have an index to handle inbox readsdb.inbox.ensureIndex( { to: 1, sent: 1 } )msg = {from: "Joe",to: [ "Bob", "Jane" ],sent: new Date(),message: "Hi!",}// Send a messagedb.inbox.save( msg )// Read my inboxdb.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )Fan out on read
  11. 11. Fan out on read – I/OShard1 Shard 2Shard3SendMessage
  12. 12. Fan out on read – I/OShard1 Shard 2Shard3ReadInboxSendMessage
  13. 13. Considerations• Write: One document per message sent• Read: Find all messages with my own name inthe recipient field• Read: Requires scatter-gather on shardedcluster• A lot of random I/O on a shard to find everything
  14. 14. // Shard on “recipient” and “sent”db.shardCollection( "mongodbdays.inbox", { ”recipient”: 1, ”sent”: 1 } )msg = {from: "Joe",to: [ "Bob", "Jane" ],sent: new Date(),message: "Hi!",}// Send a messagefor ( recipient in msg.to ) {msg.recipient = msg.to[recipient]db.inbox.save( msg );}// Read my inboxdb.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )Fan out on write
  15. 15. Fan out on write – I/OShard1Shard2Shard3SendMessage
  16. 16. Fan out on write – I/OReadInboxSendMessageShard1Shard2Shard3
  17. 17. Considerations• Write: One document per recipient• Read: Find all of the messages with me as therecipient• Can shard on recipient, so inbox reads hit oneshard• But still lots of random I/O on the shard
  18. 18. // Shard on "owner / sequence"db.shardCollection( "mongodbdays.inbox",{ owner: 1, sequence: 1 } )db.shardCollection( "mongodbdays.users", { user_name: 1 } )msg = {from: "Joe",to: [ "Bob", "Jane" ],sent: new Date(),message: "Hi!",}Fan out on write with buckets
  19. 19. // Send a messagefor( recipient in msg.to) {count = db.users.findAndModify({query: { user_name: msg.to[recipient] },update: { "$inc": { "msg_count": 1 } },upsert: true,new: true }).msg_count;sequence = Math.floor(count / 50);db.inbox.update({owner: msg.to[recipient], sequence: sequence },{ $push: { "messages": msg } },{ upsert: true } );}// Read my inboxdb.inbox.find( { owner: "Joe" } ).sort ( { sequence: -1 } ).limit( 2 )Fan out on write with buckets
  20. 20. Fan out on write with buckets• Each “inbox” document is an array of messages• Append a message onto “inbox” of recipient• Bucket inboxes so there’s not too manymessages per document• Can shard on recipient, so inbox reads hit oneshard• 1 or 2 documents to read the whole inbox
  21. 21. Fan out on write with buckets – I/OShard1Shard2Shard3SendMessage
  22. 22. Shard1Shard2Shard3Fan out on write with buckets – I/OReadInboxSendMessage
  23. 23. #2 – History
  24. 24. Design Goals• Need to retain a limited amount of history e.g.– Hours, Days, Weeks– May be legislative requirement (e.g. HIPPA, SOX, DPA)• Need to query efficiently by– match– ranges
  25. 25. 3 Approaches (there aremore)• Bucket by Number of messages• Fixed size array• Bucket by date + TTL collections
  26. 26. db.inbox.find(){ owner: "Joe", sequence: 25,messages: [{ from: "Joe",to: [ "Bob", "Jane" ],sent: ISODate("2013-03-01T09:59:42.689Z"),message: "Hi!"},…] }// Query with a date rangedb.inbox.find ({owner: "friend1",messages: {$elemMatch: {sent:{$gte: ISODate("…") }}}})// Remove elements based on a datedb.inbox.update({owner: "friend1" },{ $pull: { messages: {sent: { $gte: ISODate("…") } } } } )Bucket by number ofmessages
  27. 27. Considerations• Shrinking documents, space can be reclaimedwith– db.runCommand ( { compact: <collection> } )• Removing the document after the last element inthe array as been removed– { "_id" : …, "messages" : [ ], "owner" : "friend1","sequence" : 0 }
  28. 28. msg = {from: "Your Boss",to: [ "Bob" ],sent: new Date(),message: "CALL ME NOW!"}// 2.4 Introduces $each, $sort and $slice for $pushdb.messages.update({ _id: 1 },{ $push: { messages: { $each: [ msg ],$sort: { sent: 1 },$slice: -50 }}})Fixed Size Array
  29. 29. Considerations• Need to compute the size of the array based onretention period
  30. 30. // messages: one doc per user per daydb.inbox.findOne(){_id: 1,to: "Joe",sequence: ISODate("2013-02-04T00:00:00.392Z"),messages: [ ]}// Auto expires data after 31536000 seconds = 1 yeardb.messages.ensureIndex( { sequence: 1 },{ expireAfterSeconds: 31536000 } )TTL Collections
  31. 31. #3 – Indexed Attributes
  32. 32. Design Goal• Application needs to stored a variable number ofattributes e.g.– User defined Form– Meta Data tags• Queries needed– Equality– Range based• Need to be efficient, regardless of the number ofattributes
  33. 33. 2 Approaches (there aremore)• Attributes as Embedded Document• Attributes as Objects in an Array
  34. 34. db.files.insert( { _id: "local.0",attr: { type: "text", size: 64,created: ISODate("..." } } )db.files.insert( { _id: "local.1",attr: { type: "text", size: 128} } )db.files.insert( { _id: "mongod",attr: { type: "binary", size: 256,created: ISODate("...") } } )// Need to create an index for each item in the sub-documentdb.files.ensureIndex( { "attr.type": 1 } )db.files.find( { "attr.type": "text"} )// Can perform range queriesdb.files.ensureIndex( { "attr.size": 1 } )db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )Attributes as a Sub-Document
  35. 35. Considerations• Each attribute needs an Index• Each time you extend, you add an index• Lots and lots of indexes
  36. 36. db.files.insert( {_id: "local.0",attr: [ { type: "text" },{ size: 64 },{ created: ISODate("...") } ] } )db.files.insert( { _id: "local.1",attr: [ { type: "text" },{ size: 128 } ] } )db.files.insert( { _id: "mongod",attr: [ { type: "binary" },{ size: 256 },{ created: ISODate("...") } ] } )db.files.ensureIndex( { attr: 1 } )Attributes as Objects in Array
  37. 37. Considerations• Only one index needed on attr• Can support range queries, etc.• Index can be used only once per query
  38. 38. #4 – Multiple Identities
  39. 39. Design Goal• Ability to look up by a number of differentidentities e.g.• Username• Email address• FB Handle• LinkedIn URL
  40. 40. 2 Approaches (there aremore)• Identifiers in a single document• Separate Identifiers from Content
  41. 41. db.users.findOne(){ _id: "joe",email: "joe@example.com,fb: "joe.smith", // facebookli: "joe.e.smith", // linkedinother: {…}}// Shard collection by _iddb.shardCollection("mongodbdays.users", { _id: 1 } )// Create indexes on each keydb.users.ensureIndex( { email: 1} )db.users.ensureIndex( { fb: 1 } )db.users.ensureIndex( { li: 1 } )Single Document by User
  42. 42. Read by _id (shard key)Shard 1 Shard 2 Shard 3find( { _id: "joe"} )
  43. 43. Read by email (non-shardkey)Shard 1 Shard 2 Shard 3find ( { email: joe@example.com })
  44. 44. Considerations• Lookup by shard key is routed to 1 shard• Lookup by other identifier is scatter gatheredacross all shards• Secondary keys cannot have a unique index
  45. 45. // Create unique indexdb.identities.ensureIndex( { identifier : 1} , { unique: true} )// Create a document for each users documentdb.identities.save({ identifier : { hndl: "joe" }, user: "1200-42" } )db.identities.save({ identifier : { email: "joe@abc.com" }, user: "1200-42" } )db.identities.save({ identifier : { li: "joe.e.smith" }, user: "1200-42" } )// Shard collection by _iddb.shardCollection( "mydb.identities", { identifier : 1 } )// Create unique indexdb.users.ensureIndex( { _id: 1} , { unique: true} )// Shard collection by _iddb.shardCollection( "mydb.users", { _id: 1 } )Document per Identity
  46. 46. Read requires 2 readsShard 1 Shard 2 Shard 3db.identities.find({"identifier" : {"hndl" : "joe" }})db.users.find( { _id: "1200-42"})
  47. 47. Considerations• Lookup to Identities is a routed query• Lookup to Users is a routed query• Unique indexes available• Must do two queries per lookup
  48. 48. Conclusion
  49. 49. Summary• Multiple ways to model a domain problem• Understand the key uses cases of your app• Balance between ease of query vs. ease of write• Random I/O should be avoided
  50. 50. Perl Engineer & Evangelist, 10genMike Friedman#MongoDBdaysThank You
  51. 51. Next Sessions at 3:405th Floor:West Side Ballroom 3&4:Advanced Replication InternalsWest Side Ballroom 1&2: Building a High-Performance DistributedTask Queue on MongoDBJuilliard Complex: WhiteBoard Q&ALyceum Complex: Ask the Experts7th Floor:Empire Complex: Managing a Maturing MongoDB EcosystemSoHo Complex: MongoDB Indexing Constraints and CreativeSchemas

×