Technical Director, 10gen@jonnyeight alvin@10gen.com alvinonmongodb.comAlvin Richards#MongoDBdaysSchema Design4 Real World...
Single Table EnAgenda• Why is schema design important• 4 Real World Schemas– Inbox– History– IndexedAttributes– Multiple I...
Why is Schema Designimportant?• Largest factor for a performant system• Schema design with MongoDB is different• RBMS – "W...
#1 - Message Inbox
Let’s getSocial
Sending Messages?
Design Goals• Efficiently send new messages to recipients• Efficiently read inbox
Reading my Inbox?
3 Approaches (there aremore)• Fan out on Read• Fan out on Write• Fan out on Write with Bucketing
// Shard on "from"db.shardCollection( "mongodbdays.inbox", { from: 1 } )// Make sure we have an index to handle inbox read...
Fan out on read – SendMessageShard 1 Shard 2 Shard 3SendMessage
Fan out on read – Inbox ReadShard 1 Shard 2 Shard 3ReadInbox
Considerations• 1 document per message sent• Multiple recipients in an array key• Reading an inbox is finding all messages...
// Shard on “recipient” and “sent”db.shardCollection( "mongodbdays.inbox", { ”recipient”: 1, ”sent”: 1 } )msg = {from: "Jo...
Fan out on write – SendMessageShard 1 Shard 2 Shard 3SendMessage
Fan out on write– Read InboxShard 1 Shard 2 Shard 3ReadInbox
Considerations• 1 document per recipient• Reading my inbox is just finding all of themessages with me as the recipient• Ca...
Fan out on write with buckets• Each “inbox” document is an array of messages• Append a message onto “inbox” of recipient• ...
// Shard on “owner / sequence”db.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )db.shardCollection( "mon...
Bucketed fan out on write -SendShard 1 Shard 2 Shard 3SendMessage
Bucketed fan out on write -ReadShard 1 Shard 2 Shard 3ReadInbox
#2 – History
Design Goals• Need to retain a limited amount of history e.g.– Hours, Days, Weeks– May be legislative requirement (e.g. HI...
3 Approaches (there aremore)• Bucket by Number of messages• Fixed size Array• Bucket by Date + TTL Collections
db.inbox.find(){ owner: "Joe", sequence: 25,messages: [{ from: "Joe",to: [ "Bob", "Jane" ],sent: ISODate("2013-03-01T09:59...
Considerations• Shrinking documents, space can be reclaimedwith– db.runCommand ( { compact: <collection> } )• Removing the...
msg = {from: "Your Boss",to: [ "Bob" ],sent: new Date(),message: "CALL ME NOW!"}// 2.4 Introduces $each, $sort and $slice ...
Considerations• Need to compute the size of the array based onretention period
// messages: one doc per user per daydb.inbox.findOne(){_id: 1,to: "Joe",sequence: ISODate("2013-02-04T00:00:00.392Z"),mes...
#3 – Indexed Attributes
Design Goal• Application needs to stored a variable number ofattributes e.g.– User defined Form– Meta Data tags• Queries n...
2 Approaches (there aremore)• Attributes• Attributes as Objects in an Array
db.files.insert( { _id: "local.0",attr: { type: "text", size: 64,created: ISODate("2013-03-01T09:59:42.689Z" } } )db.files...
Considerations• Each attribute needs an Index• Each time you extend, you add an index• Lots and lots of indexes
db.files.insert( { _id: "local.0",attr: [ { type: "text" }, { size: 64 },{ created: ISODate("2013-03-01T09:59:42.689Z" } ]...
// Range queriesdb.files.find( { attr: { $gt: { size:64 }, $lte: { size: 16384 } } } )db.files.find( { attr:{ $gte: { crea...
#4 – Multiple Identities
Design Goal• Ability to look up by a number of differentidentities e.g.• Username• Email address• FB Handle• LinkedIn URL
2 Approaches (there aremore)• Identifiers in a single document• Separate Identifiers from Content
db.users.findOne(){ _id: "joe",email: "joe@example.com,fb: "joe.smith", // facebookli: "joe.e.smith", // linkedinother: {…...
Read by _id (shard key)Shard 1 Shard 2 Shard 3find( { _id: "joe"} )
Read by email (non-shardkey)Shard 1 Shard 2 Shard 3find ( { email: joe@example.com })
Considerations• Lookup by shard key is routed to 1 shard• Lookup by other identifier is scatter gatheredacross all shards•...
// Create unique indexdb.identities.ensureIndex( { identifier : 1} , { unique: true} )// Create a document for each users ...
Read requires 2 readsShard 1 Shard 2 Shard 3db.identities.find({"identifier" : {"hndl" : "joe" }})db.users.find( { _id: "1...
Solution• Lookup to Identities is a routed query• Lookup to Users is a routed query• Unique indexes available
Conclusion
Summary• Multiple ways to model a domain problem• Understand the key uses cases of your app• Balance between ease of query...
Technical Director, 10gen@jonnyeight alvin@10gen.com alvinonmongodb.comAlvin Richards#MongoDBdaysThank You
MongoDB London 2013: Data Modeling Examples from the Real World presented by Alvin Richards, 10gen
Upcoming SlideShare
Loading in...5
×

MongoDB London 2013: Data Modeling Examples from the Real World presented by Alvin Richards, 10gen

2,130

Published on

In this session, we'll examine schema design insights and trade-offs using real world examples. We'll look at three example applications: building an email inbox, selecting a shard key for a large scale web application, and using MongoDB to store user profiles. From these examples you should leave the session with an idea of the advantages and disadvantages of various approaches to modeling your data in MongoDB. Attendees should be well versed in basic schema design and familiar with concepts in the morning's basic schema design talk. No beginner topics will be covered in this session.

0 Comments
7 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,130
On Slideshare
0
From Embeds
0
Number of Embeds
6
Actions
Shares
0
Downloads
64
Comments
0
Likes
7
Embeds 0
No embeds

No notes for slide

MongoDB London 2013: Data Modeling Examples from the Real World presented by Alvin Richards, 10gen

  1. 1. Technical Director, 10gen@jonnyeight alvin@10gen.com alvinonmongodb.comAlvin Richards#MongoDBdaysSchema Design4 Real World Use Cases
  2. 2. Single Table EnAgenda• Why is schema design important• 4 Real World Schemas– Inbox– History– IndexedAttributes– Multiple Identities• Conclusions
  3. 3. Why is Schema Designimportant?• Largest factor for a performant system• Schema design with MongoDB is different• RBMS – "What answers do I have?"• MongoDB – "What question will I have?"
  4. 4. #1 - Message Inbox
  5. 5. Let’s getSocial
  6. 6. Sending Messages?
  7. 7. Design Goals• Efficiently send new messages to recipients• Efficiently read inbox
  8. 8. Reading my Inbox?
  9. 9. 3 Approaches (there aremore)• Fan out on Read• Fan out on Write• Fan out on Write with Bucketing
  10. 10. // Shard on "from"db.shardCollection( "mongodbdays.inbox", { from: 1 } )// Make sure we have an index to handle inbox readsdb.inbox.ensureIndex( { to: 1, sent: 1 } )msg = {from: "Joe",to: [ "Bob", "Jane" ],sent: new Date(),message: "Hi!",}// Send a messagedb.inbox.save( msg )// Read my inboxdb.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )Fan out on read
  11. 11. Fan out on read – SendMessageShard 1 Shard 2 Shard 3SendMessage
  12. 12. Fan out on read – Inbox ReadShard 1 Shard 2 Shard 3ReadInbox
  13. 13. Considerations• 1 document per message sent• Multiple recipients in an array key• Reading an inbox is finding all messages withmy own name in the recipient field• Requires scatter-gather on sharded cluster• Then a lot of random IO on a shard to findeverything
  14. 14. // Shard on “recipient” and “sent”db.shardCollection( "mongodbdays.inbox", { ”recipient”: 1, ”sent”: 1 } )msg = {from: "Joe”,to: [ "Bob", "Jane" ],sent: new Date(),message: "Hi!",}// Send a messagefor ( recipient in msg.to ) {msg.recipient = recipientdb.inbox.save( msg );}// Read my inboxdb.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )Fan out on write
  15. 15. Fan out on write – SendMessageShard 1 Shard 2 Shard 3SendMessage
  16. 16. Fan out on write– Read InboxShard 1 Shard 2 Shard 3ReadInbox
  17. 17. Considerations• 1 document per recipient• Reading my inbox is just finding all of themessages with me as the recipient• Can shard on recipient, so inbox reads hit oneshard• But still lots of random IO on the shard
  18. 18. Fan out on write with buckets• Each “inbox” document is an array of messages• Append a message onto “inbox” of recipient• Bucket inbox documents so there’s not too manyper document• Can shard on recipient, so inbox reads hit oneshard• 1 or 2 documents to read the whole inbox
  19. 19. // Shard on “owner / sequence”db.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )db.shardCollection( "mongodbdays.users", { user_name: 1 } )msg = {from: "Joe",to: [ "Bob", "Jane" ],sent: new Date(),message: "Hi!",}// Send a messagefor( recipient in msg.to) {count = db.users.findAndModify({query: { user_name: msg.to[recipient] },update: { "$inc": { "msg_count": 1 } },upsert: true,new: true }).msg_count;sequence = Math.floor(count / 50);db.inbox.update( { owner: msg.to[recipient], sequence: sequence },{ $push: { "messages": msg } },{ upsert: true } );}// Read my inboxdb.inbox.find( { owner: "Joe" } ).sort ( { sequence: -1 } ).limit( 2 )Fan out on write – withbuckets
  20. 20. Bucketed fan out on write -SendShard 1 Shard 2 Shard 3SendMessage
  21. 21. Bucketed fan out on write -ReadShard 1 Shard 2 Shard 3ReadInbox
  22. 22. #2 – History
  23. 23. Design Goals• Need to retain a limited amount of history e.g.– Hours, Days, Weeks– May be legislative requirement (e.g. HIPPA, SOX, DPA)• Need to query efficiently by– match– ranges
  24. 24. 3 Approaches (there aremore)• Bucket by Number of messages• Fixed size Array• Bucket by Date + TTL Collections
  25. 25. db.inbox.find(){ owner: "Joe", sequence: 25,messages: [{ from: "Joe",to: [ "Bob", "Jane" ],sent: ISODate("2013-03-01T09:59:42.689Z"),message: "Hi!"},…] }// Query with a date rangedb.inbox.find ( { owner: "friend1",messages: {$elemMatch: { sent: { $gte: ISODate("2013-04-04…") }}}})// Remove elements based on a datedb.inbox.update( { owner: "friend1" },{ $pull:{ messages: { sent: { $gte: ISODate("2013-04-04…") } } } } )Inbox – Bucket by #messages
  26. 26. Considerations• Shrinking documents, space can be reclaimedwith– db.runCommand ( { compact: <collection> } )• Removing the document after the last element inthe array as been removed– { "_id" : …, "messages" : [ ], "owner" :"friend1", "sequence" : 0 }
  27. 27. msg = {from: "Your Boss",to: [ "Bob" ],sent: new Date(),message: "CALL ME NOW!"}// 2.4 Introduces $each, $sort and $slice for $pushdb.messages.update({ _id: 1 },{ $push: { messages: { $each: [ msg ],$sort: { sent: 1 },$slice: -50 }}})Maintain the latest – FixedSize Array
  28. 28. Considerations• Need to compute the size of the array based onretention period
  29. 29. // messages: one doc per user per daydb.inbox.findOne(){_id: 1,to: "Joe",sequence: ISODate("2013-02-04T00:00:00.392Z"),messages: [ ]}// Auto expires data after 31536000 seconds = 1 yeardb.messages.ensureIndex( { sequence: 1 },{ expireAfterSeconds: 31536000 } )TTL Collections
  30. 30. #3 – Indexed Attributes
  31. 31. Design Goal• Application needs to stored a variable number ofattributes e.g.– User defined Form– Meta Data tags• Queries needed– Equality– Range based• Need to be efficient, regardless of the number ofattributes
  32. 32. 2 Approaches (there aremore)• Attributes• Attributes as Objects in an Array
  33. 33. db.files.insert( { _id: "local.0",attr: { type: "text", size: 64,created: ISODate("2013-03-01T09:59:42.689Z" } } )db.files.insert( { _id:"local.1",attr: { type: "text", size: 128} } )db.files.insert( { _id:"mongod",attr: { type: "binary", size: 256,created: ISODate("2013-04-01T18:13:42.689Z") } } )// Need to create an index for each item in the sub-documentdb.files.ensureIndex( { "attr.type": 1 } )db.files.find( { "attr.type": "text"} )// Can perform range queriesdb.files.ensureIndex( { "attr.size": 1 } )db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )Attributes as a Sub-Document
  34. 34. Considerations• Each attribute needs an Index• Each time you extend, you add an index• Lots and lots of indexes
  35. 35. db.files.insert( { _id: "local.0",attr: [ { type: "text" }, { size: 64 },{ created: ISODate("2013-03-01T09:59:42.689Z" } ] } )db.files.insert( { _id: "local.1",attr: [ { type: "text" }, { size: 128 } ] } )db.files.insert( { _id: "mongod",attr: [ { type: "binary" }, { size: 256 },{ created: ISODate("2013-04-01T18:13:42.689Z") } ] } )db.files.ensureIndex( { attr: 1 } )Attributes as Objects in Array
  36. 36. // Range queriesdb.files.find( { attr: { $gt: { size:64 }, $lte: { size: 16384 } } } )db.files.find( { attr:{ $gte: { created: ISODate("2013-02-01T00:00:01.689Z") } } } )// Multiple condition – Only the first predicate on the query can use the Index// ensure that this is the most selective.// Index Intersection will allow multiple indexes, see SERVER-3071db.files.find( { $and: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } },{ attr: { $gt: { size:128 }, $lte: { size: 16384 } } }] } )// Each $or can use an indexdb.files.find( { $or: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } },{ attr: { $gt: { size:128 }, $lte: { size: 16384 } } }] } )Queries
  37. 37. #4 – Multiple Identities
  38. 38. Design Goal• Ability to look up by a number of differentidentities e.g.• Username• Email address• FB Handle• LinkedIn URL
  39. 39. 2 Approaches (there aremore)• Identifiers in a single document• Separate Identifiers from Content
  40. 40. db.users.findOne(){ _id: "joe",email: "joe@example.com,fb: "joe.smith", // facebookli: "joe.e.smith", // linkedinother: {…}}// Shard collection by _iddb.shardCollection("mongodbdays.users", { _id: 1 } )// Create indexes on each keydb.users.ensureIndex( { email: 1} )db.users.ensureIndex( { fb: 1 } )db.users.ensureIndex( { li: 1 } )Single Document by User
  41. 41. Read by _id (shard key)Shard 1 Shard 2 Shard 3find( { _id: "joe"} )
  42. 42. Read by email (non-shardkey)Shard 1 Shard 2 Shard 3find ( { email: joe@example.com })
  43. 43. Considerations• Lookup by shard key is routed to 1 shard• Lookup by other identifier is scatter gatheredacross all shards• Secondary keys cannot have a unique index
  44. 44. // Create unique indexdb.identities.ensureIndex( { identifier : 1} , { unique: true} )// Create a document for each users documentdb.identities.save( { identifier : { hndl: "joe" }, user: "1200-42" } )db.identities.save( { identifier : { email: "joe@example.com" }, user: "1200-42" } )db.identities.save( { identifier : { li: "joe.e.smith" }, user: "1200-42" } )// Shard collection by _iddb.shardCollection( "mongodbdays.identities", { identifier : 1 } )// Create unique indexdb.users.ensureIndex( { _id: 1} , { unique: true} )// Create a docuemnt that holds all the other user attributesdb.users.save( { _id: "1200-42", ... } )// Shard collection by _iddb.shardCollection( "mongodbdays.users", { _id: 1 } )Document per Identity
  45. 45. Read requires 2 readsShard 1 Shard 2 Shard 3db.identities.find({"identifier" : {"hndl" : "joe" }})db.users.find( { _id: "1200-42"})
  46. 46. Solution• Lookup to Identities is a routed query• Lookup to Users is a routed query• Unique indexes available
  47. 47. Conclusion
  48. 48. Summary• Multiple ways to model a domain problem• Understand the key uses cases of your app• Balance between ease of query vs. ease of write• Random IO should be avoided
  49. 49. Technical Director, 10gen@jonnyeight alvin@10gen.com alvinonmongodb.comAlvin Richards#MongoDBdaysThank You
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×