Perl Engineer & Evangelist, 10gen
Mike Friedman
Data Modeling Examples in
the Real World
Single Table En
Agenda
• Why is schema design important
• 4 Real World Schemas
– Inbox
– History
– IndexedAttributes
– Mul...
Why is Schema Design
important?
• Largest factor for a performant system
• Schema design with MongoDB is different
• RDBMS...
#1 - Message Inbox
Let’s get
Social
Sending Messages
?
Design Goals
• Efficiently send new messages to recipients
• Efficiently read inbox
Reading my Inbox
?
3 Approaches (there are
more)
• Fan out on Read
• Fan out on Write
• Fan out on Write with Bucketing
// Shard on "from"
db.shardCollection( "mongodbdays.inbox", { from: 1 } )
// Make sure we have an index to handle inbox re...
Fan out on read – I/O
Shard
1 Shard 2
Shard
3
Send
Message
Fan out on read – I/O
Shard
1 Shard 2
Shard
3
Read
Inbox
Send
Message
Considerations
• Write: One document per message sent
• Read: Find all messages with my own name in
the recipient field
• ...
// Shard on “recipient” and “sent”
db.shardCollection( "mongodbdays.inbox", { ”recipient”: 1, ”sent”: 1 } )
msg = {
from: ...
Fan out on write – I/O
Shard
1
Shard
2
Shard
3
Send
Message
Fan out on write – I/O
Read
Inbox
Send
Message
Shard
1
Shard
2
Shard
3
Considerations
• Write: One document per recipient
• Read: Find all of the messages with me as the
recipient
• Can shard o...
// Shard on "owner / sequence"
db.shardCollection( "mongodbdays.inbox",
{ owner: 1, sequence: 1 } )
db.shardCollection( "m...
// Send a message
for( recipient in msg.to) {
count = db.users.findAndModify({
query: { user_name: msg.to[recipient] },
up...
Fan out on write with buckets
• Each “inbox” document is an array of messages
• Append a message onto “inbox” of recipient...
Fan out on write with buckets – I/O
Shard
1
Shard
2
Shard
3
Send
Message
Shard
1
Shard
2
Shard
3
Fan out on write with buckets – I/O
Read
Inbox
Send
Message
#2 – History
Design Goals
• Need to retain a limited amount of history e.g.
– Hours, Days, Weeks
– May be legislative requirement (e.g....
3 Approaches (there are
more)
• Bucket by Number of messages
• Fixed size array
• Bucket by date + TTL collections
db.inbox.find()
{ owner: "Joe", sequence: 25,
messages: [
{ from: "Joe",
to: [ "Bob", "Jane" ],
sent: ISODate("2013-03-01T...
Considerations
• Shrinking documents, space can be reclaimed
with
– db.runCommand ( { compact: '<collection>' } )
• Removi...
msg = {
from: "Your Boss",
to: [ "Bob" ],
sent: new Date(),
message: "CALL ME NOW!"
}
// 2.4 Introduces $each, $sort and $...
Considerations
• Need to compute the size of the array based on
retention period
// messages: one doc per user per day
db.inbox.findOne()
{
_id: 1,
to: "Joe",
sequence: ISODate("2013-02-04T00:00:00.392Z"...
#3 – Indexed Attributes
Design Goal
• Application needs to stored a variable number of
attributes e.g.
– User defined Form
– Meta Data tags
• Quer...
2 Approaches (there are
more)
• Attributes as Embedded Document
• Attributes as Objects in an Array
db.files.insert( { _id: "local.0",
attr: { type: "text", size: 64,
created: ISODate("..." } } )
db.files.insert( { _id: "l...
Considerations
• Each attribute needs an Index
• Each time you extend, you add an index
• Lots and lots of indexes
db.files.insert( {_id: "local.0",
attr: [ { type: "text" },
{ size: 64 },
{ created: ISODate("...") } ] } )
db.files.inser...
Considerations
• Only one index needed on attr
• Can support range queries, etc.
• Index can be used only once per query
#4 – Multiple Identities
Design Goal
• Ability to look up by a number of different
identities e.g.
• Username
• Email address
• FB Handle
• LinkedI...
2 Approaches (there are
more)
• Identifiers in a single document
• Separate Identifiers from Content
db.users.findOne()
{ _id: "joe",
email: "joe@example.com,
fb: "joe.smith", // facebook
li: "joe.e.smith", // linkedin
othe...
Read by _id (shard key)
Shard 1 Shard 2 Shard 3
find( { _id: "joe"} )
Read by email (non-shard
key)
Shard 1 Shard 2 Shard 3
find ( { email: joe@example.com }
)
Considerations
• Lookup by shard key is routed to 1 shard
• Lookup by other identifier is scatter gathered
across all shar...
// Create unique index
db.identities.ensureIndex( { identifier : 1} , { unique: true} )
// Create a document for each user...
Read requires 2 reads
Shard 1 Shard 2 Shard 3
db.identities.find({"identifier" : {
"hndl" : "joe" }})
db.users.find( { _id...
Considerations
• Lookup to Identities is a routed query
• Lookup to Users is a routed query
• Unique indexes available
• M...
Conclusion
Summary
• Multiple ways to model a domain problem
• Understand the key uses cases of your app
• Balance between ease of qu...
Perl Engineer & Evangelist, 10gen
Mike Friedman
#MongoDBdays
Thank You
Next Sessions at 3:40
5th Floor:
West Side Ballroom 3&4:Advanced Replication Internals
West Side Ballroom 1&2: Building a ...
Webinar: Data Modeling Examples in the Real World
Upcoming SlideShare
Loading in …5
×

Webinar: Data Modeling Examples in the Real World

4,888 views

Published on

In this session, we'll examine schema design insights and trade-offs using real world examples. We'll look at three example applications: building an email inbox, selecting a shard key for a large scale web application, and using MongoDB to store user profiles. From these examples you should leave the session with an idea of the advantages and disadvantages of various approaches to modeling your data in MongoDB. Attendees should be well versed in basic schema design.

0 Comments
12 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
4,888
On SlideShare
0
From Embeds
0
Number of Embeds
3,252
Actions
Shares
0
Downloads
76
Comments
0
Likes
12
Embeds 0
No embeds

No notes for slide

Webinar: Data Modeling Examples in the Real World

  1. 1. Perl Engineer & Evangelist, 10gen Mike Friedman Data Modeling Examples in the Real World
  2. 2. Single Table En Agenda • Why is schema design important • 4 Real World Schemas – Inbox – History – IndexedAttributes – Multiple Identities • Conclusions
  3. 3. Why is Schema Design important? • Largest factor for a performant system • Schema design with MongoDB is different • RDBMS – "What answers do I have?" • MongoDB – "What question will I have?"
  4. 4. #1 - Message Inbox
  5. 5. Let’s get Social
  6. 6. Sending Messages ?
  7. 7. Design Goals • Efficiently send new messages to recipients • Efficiently read inbox
  8. 8. Reading my Inbox ?
  9. 9. 3 Approaches (there are more) • Fan out on Read • Fan out on Write • Fan out on Write with Bucketing
  10. 10. // Shard on "from" db.shardCollection( "mongodbdays.inbox", { from: 1 } ) // Make sure we have an index to handle inbox reads db.inbox.ensureIndex( { to: 1, sent: 1 } ) msg = { from: "Joe", to: [ "Bob", "Jane" ], sent: new Date(), message: "Hi!", } // Send a message db.inbox.save( msg ) // Read my inbox db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } ) Fan out on read
  11. 11. Fan out on read – I/O Shard 1 Shard 2 Shard 3 Send Message
  12. 12. Fan out on read – I/O Shard 1 Shard 2 Shard 3 Read Inbox Send Message
  13. 13. Considerations • Write: One document per message sent • Read: Find all messages with my own name in the recipient field • Read: Requires scatter-gather on sharded cluster • A lot of random I/O on a shard to find everything
  14. 14. // Shard on “recipient” and “sent” db.shardCollection( "mongodbdays.inbox", { ”recipient”: 1, ”sent”: 1 } ) msg = { from: "Joe", to: [ "Bob", "Jane" ], sent: new Date(), message: "Hi!", } // Send a message for ( recipient in msg.to ) { msg.recipient = msg.to[recipient] db.inbox.save( msg ); } // Read my inbox db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } ) Fan out on write
  15. 15. Fan out on write – I/O Shard 1 Shard 2 Shard 3 Send Message
  16. 16. Fan out on write – I/O Read Inbox Send Message Shard 1 Shard 2 Shard 3
  17. 17. Considerations • Write: One document per recipient • Read: Find all of the messages with me as the recipient • Can shard on recipient, so inbox reads hit one shard • But still lots of random I/O on the shard
  18. 18. // Shard on "owner / sequence" db.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } ) db.shardCollection( "mongodbdays.users", { user_name: 1 } ) msg = { from: "Joe", to: [ "Bob", "Jane" ], sent: new Date(), message: "Hi!", } Fan out on write with buckets
  19. 19. // Send a message for( recipient in msg.to) { count = db.users.findAndModify({ query: { user_name: msg.to[recipient] }, update: { "$inc": { "msg_count": 1 } }, upsert: true, new: true }).msg_count; sequence = Math.floor(count / 50); db.inbox.update({ owner: msg.to[recipient], sequence: sequence }, { $push: { "messages": msg } }, { upsert: true } ); } // Read my inbox db.inbox.find( { owner: "Joe" } ) .sort ( { sequence: -1 } ).limit( 2 ) Fan out on write with buckets
  20. 20. Fan out on write with buckets • Each “inbox” document is an array of messages • Append a message onto “inbox” of recipient • Bucket inboxes so there’s not too many messages per document • Can shard on recipient, so inbox reads hit one shard • 1 or 2 documents to read the whole inbox
  21. 21. Fan out on write with buckets – I/O Shard 1 Shard 2 Shard 3 Send Message
  22. 22. Shard 1 Shard 2 Shard 3 Fan out on write with buckets – I/O Read Inbox Send Message
  23. 23. #2 – History
  24. 24. Design Goals • Need to retain a limited amount of history e.g. – Hours, Days, Weeks – May be legislative requirement (e.g. HIPPA, SOX, DPA) • Need to query efficiently by – match – ranges
  25. 25. 3 Approaches (there are more) • Bucket by Number of messages • Fixed size array • Bucket by date + TTL collections
  26. 26. db.inbox.find() { owner: "Joe", sequence: 25, messages: [ { from: "Joe", to: [ "Bob", "Jane" ], sent: ISODate("2013-03-01T09:59:42.689Z"), message: "Hi!" }, … ] } // Query with a date range db.inbox.find ({owner: "friend1", messages: { $elemMatch: {sent:{$gte: ISODate("…") }}}}) // Remove elements based on a date db.inbox.update({owner: "friend1" }, { $pull: { messages: { sent: { $gte: ISODate("…") } } } } ) Bucket by number of messages
  27. 27. Considerations • Shrinking documents, space can be reclaimed with – db.runCommand ( { compact: '<collection>' } ) • Removing the document after the last element in the array as been removed – { "_id" : …, "messages" : [ ], "owner" : "friend1", "sequence" : 0 }
  28. 28. msg = { from: "Your Boss", to: [ "Bob" ], sent: new Date(), message: "CALL ME NOW!" } // 2.4 Introduces $each, $sort and $slice for $push db.messages.update( { _id: 1 }, { $push: { messages: { $each: [ msg ], $sort: { sent: 1 }, $slice: -50 } } } ) Fixed Size Array
  29. 29. Considerations • Need to compute the size of the array based on retention period
  30. 30. // messages: one doc per user per day db.inbox.findOne() { _id: 1, to: "Joe", sequence: ISODate("2013-02-04T00:00:00.392Z"), messages: [ ] } // Auto expires data after 31536000 seconds = 1 year db.messages.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } ) TTL Collections
  31. 31. #3 – Indexed Attributes
  32. 32. Design Goal • Application needs to stored a variable number of attributes e.g. – User defined Form – Meta Data tags • Queries needed – Equality – Range based • Need to be efficient, regardless of the number of attributes
  33. 33. 2 Approaches (there are more) • Attributes as Embedded Document • Attributes as Objects in an Array
  34. 34. db.files.insert( { _id: "local.0", attr: { type: "text", size: 64, created: ISODate("..." } } ) db.files.insert( { _id: "local.1", attr: { type: "text", size: 128} } ) db.files.insert( { _id: "mongod", attr: { type: "binary", size: 256, created: ISODate("...") } } ) // Need to create an index for each item in the sub-document db.files.ensureIndex( { "attr.type": 1 } ) db.files.find( { "attr.type": "text"} ) // Can perform range queries db.files.ensureIndex( { "attr.size": 1 } ) db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } ) Attributes as a Sub- Document
  35. 35. Considerations • Each attribute needs an Index • Each time you extend, you add an index • Lots and lots of indexes
  36. 36. db.files.insert( {_id: "local.0", attr: [ { type: "text" }, { size: 64 }, { created: ISODate("...") } ] } ) db.files.insert( { _id: "local.1", attr: [ { type: "text" }, { size: 128 } ] } ) db.files.insert( { _id: "mongod", attr: [ { type: "binary" }, { size: 256 }, { created: ISODate("...") } ] } ) db.files.ensureIndex( { attr: 1 } ) Attributes as Objects in Array
  37. 37. Considerations • Only one index needed on attr • Can support range queries, etc. • Index can be used only once per query
  38. 38. #4 – Multiple Identities
  39. 39. Design Goal • Ability to look up by a number of different identities e.g. • Username • Email address • FB Handle • LinkedIn URL
  40. 40. 2 Approaches (there are more) • Identifiers in a single document • Separate Identifiers from Content
  41. 41. db.users.findOne() { _id: "joe", email: "joe@example.com, fb: "joe.smith", // facebook li: "joe.e.smith", // linkedin other: {…} } // Shard collection by _id db.shardCollection("mongodbdays.users", { _id: 1 } ) // Create indexes on each key db.users.ensureIndex( { email: 1} ) db.users.ensureIndex( { fb: 1 } ) db.users.ensureIndex( { li: 1 } ) Single Document by User
  42. 42. Read by _id (shard key) Shard 1 Shard 2 Shard 3 find( { _id: "joe"} )
  43. 43. Read by email (non-shard key) Shard 1 Shard 2 Shard 3 find ( { email: joe@example.com } )
  44. 44. Considerations • Lookup by shard key is routed to 1 shard • Lookup by other identifier is scatter gathered across all shards • Secondary keys cannot have a unique index
  45. 45. // Create unique index db.identities.ensureIndex( { identifier : 1} , { unique: true} ) // Create a document for each users document db.identities.save( { identifier : { hndl: "joe" }, user: "1200-42" } ) db.identities.save( { identifier : { email: "joe@abc.com" }, user: "1200-42" } ) db.identities.save( { identifier : { li: "joe.e.smith" }, user: "1200-42" } ) // Shard collection by _id db.shardCollection( "mydb.identities", { identifier : 1 } ) // Create unique index db.users.ensureIndex( { _id: 1} , { unique: true} ) // Shard collection by _id db.shardCollection( "mydb.users", { _id: 1 } ) Document per Identity
  46. 46. Read requires 2 reads Shard 1 Shard 2 Shard 3 db.identities.find({"identifier" : { "hndl" : "joe" }}) db.users.find( { _id: "1200-42"} )
  47. 47. Considerations • Lookup to Identities is a routed query • Lookup to Users is a routed query • Unique indexes available • Must do two queries per lookup
  48. 48. Conclusion
  49. 49. Summary • Multiple ways to model a domain problem • Understand the key uses cases of your app • Balance between ease of query vs. ease of write • Random I/O should be avoided
  50. 50. Perl Engineer & Evangelist, 10gen Mike Friedman #MongoDBdays Thank You
  51. 51. Next Sessions at 3:40 5th Floor: West Side Ballroom 3&4:Advanced Replication Internals West Side Ballroom 1&2: Building a High-Performance Distributed Task Queue on MongoDB Juilliard Complex: WhiteBoard Q&A Lyceum Complex: Ask the Experts 7th Floor: Empire Complex: Managing a Maturing MongoDB Ecosystem SoHo Complex: MongoDB Indexing Constraints and Creative Schemas

×