Choosing a Shard Key: Four Real-World Use Cases
Shaun Verch, Server Engineer, 10gen
#oscon
Choosing a shard key can be difficult, and the factors involved largely depend on your use case. In fact, there is no such thing as a perfect shard key; there are design tradeoffs inherent in every decision. This presentation goes through those tradeoffs, as well as the different types of shard keys available in MongoDB, such as hashed and compound shard keys.

  1. Choosing a Shard Key: Four Real-World Use Cases. Shaun Verch, Server Engineer, 10gen. #oscon
  2. Schema Design: Four Real-World Use Cases. Shaun Verch, Server Engineer, 10gen. #oscon
  3. Agenda
      • Why is schema design important
      • 4 Real World Schemas
        – Inbox
        – History
        – Indexed Attributes
        – Multiple Identities
      • Conclusions
  4. Why is Schema Design important?
      • Largest factor for a performant system
      • Schema design with MongoDB is different
      • RDBMS – "What answers do I have?"
      • MongoDB – "What question will I have?"
  5. #1 – Message Inbox
  6. Let’s get Social
  7. Sending Messages ?
  8. Design Goals
      • Efficiently send new messages to recipients
      • Efficiently read inbox
  9. Reading my Inbox ?
  10. 3 Approaches (there are more)
      • Fan out on Read
      • Fan out on Write
      • Fan out on Write with Bucketing
  11. Fan out on read
      // Shard on "from"
      sh.shardCollection( "oscon.inbox", { from: 1 } )
      // Make sure we have an index to handle inbox reads
      db.inbox.ensureIndex( { to: 1, sent: 1 } )
      msg = {
        from: "Joe",
        to: [ "Bob", "Jane" ],
        sent: new Date(),
        message: "Hi!"
      }
      // Send a message
      db.inbox.save( msg )
      // Read my inbox
      db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )
  12. Fan out on read – I/O (diagram labels: Shard 1, Shard 2, Shard 3; Send Message)
  13. Fan out on read – I/O (diagram labels: Shard 1, Shard 2, Shard 3; Read Inbox; Send Message)
  14. Considerations
      • Write: One document per message sent
      • Read: Find all messages with my own name in the recipient field
      • Read: Requires scatter-gather on sharded cluster
      • A lot of random I/O on a shard to find everything
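
The scatter-gather cost in the considerations above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `shardForSender` is a hypothetical hash-based stand-in for MongoDB's routing (the slides shard on ranges of "from"), and the three arrays stand in for three shards.

```javascript
// Toy model of "fan out on read": one document per message, placed by sender.
const SHARD_COUNT = 3;

// Hypothetical placement function, not MongoDB's real chunk routing.
function shardForSender(value) {
  let h = 0;
  for (const ch of String(value)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % SHARD_COUNT;
}

// Write path: a single document lands on a single shard.
function sendMessage(cluster, msg) {
  cluster[shardForSender(msg.from)].push(msg);
}

// Read path: the filter is on "to", which is not the shard key,
// so every shard has to be queried (scatter-gather).
function readInboxScatter(cluster, user) {
  let shardsQueried = 0;
  const found = [];
  for (const shard of cluster) {
    shardsQueried++;
    for (const m of shard) {
      if (m.to.includes(user)) found.push(m);
    }
  }
  found.sort((a, b) => b.sent - a.sent); // newest first
  return { found, shardsQueried };
}

const readCluster = [[], [], []];
sendMessage(readCluster, { from: "Joe", to: ["Bob", "Jane"], sent: 1, message: "Hi!" });
sendMessage(readCluster, { from: "Jane", to: ["Bob"], sent: 2, message: "Lunch?" });

const bobInbox = readInboxScatter(readCluster, "Bob");
console.log(bobInbox.found.length, "messages from", bobInbox.shardsQueried, "shards");
```

Whatever the placement function does, the read loop visits all three shards, which is the point the slide makes about reads on a non-shard-key field.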
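  15. Fan out on write
      // Shard on "recipient" and "sent"
      sh.shardCollection( "oscon.inbox", { "recipient": 1, "sent": 1 } )
      msg = {
        from: "Joe",
        to: [ "Bob", "Jane" ],
        sent: new Date(),
        message: "Hi!"
      }
      // Send a message: write one copy per recipient. Each copy is a new
      // object, because save() adds an _id to the object it is given, so
      // reusing msg would not create a second document.
      for (var i = 0; i < msg.to.length; i++) {
        db.inbox.save( { from: msg.from, to: msg.to, sent: msg.sent,
                         message: msg.message, recipient: msg.to[i] } )
      }
      // Read my inbox
      db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )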
  16. Fan out on write – I/O (diagram labels: Shard 1, Shard 2, Shard 3; Send Message)
  17. Fan out on write – I/O (diagram labels: Shard 1, Shard 2, Shard 3; Read Inbox; Send Message)
  18. Considerations
      • Write: One document per recipient
      • Read: Find all of the messages with me as the recipient
      • Can shard on recipient, so inbox reads hit one shard
      • But still lots of random I/O on the shard
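
The trade described in the considerations above (more writes, routed reads) can be sketched the same way. As before, this is an illustrative in-memory model, not MongoDB code: `shardForRecipient` is a hypothetical placement function and the arrays stand in for shards.

```javascript
// Toy model of "fan out on write": one document per recipient, placed by recipient.
const WRITE_SHARDS = 3;

// Hypothetical placement function, not MongoDB's real chunk routing.
function shardForRecipient(value) {
  let h = 0;
  for (const ch of String(value)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % WRITE_SHARDS;
}

// Write path: one copy per recipient, each possibly on a different shard.
function fanOutSend(cluster, msg) {
  let copiesWritten = 0;
  for (const recipient of msg.to) {
    const copy = { ...msg, recipient }; // fresh object per recipient
    cluster[shardForRecipient(recipient)].push(copy);
    copiesWritten++;
  }
  return copiesWritten;
}

// Read path: the filter is on the shard key, so exactly one shard is queried.
function readInboxRouted(cluster, user) {
  const shard = cluster[shardForRecipient(user)];
  const found = shard
    .filter((m) => m.recipient === user)
    .sort((a, b) => b.sent - a.sent); // newest first
  return { found, shardsQueried: 1 };
}

const writeCluster = [[], [], []];
const copies = fanOutSend(writeCluster, { from: "Joe", to: ["Bob", "Jane"], sent: 1, message: "Hi!" });
fanOutSend(writeCluster, { from: "Jane", to: ["Bob"], sent: 2, message: "Lunch?" });

const routedInbox = readInboxRouted(writeCluster, "Bob");
console.log(copies, "copies written;", routedInbox.found.length, "messages read from 1 shard");
```

The read is cheap in terms of shards touched, but each message is still its own document, which matches the slide's remark that random I/O on the shard remains.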
  19. Fan out on write with buckets
      // Shard on "owner / sequence"
      sh.shardCollection( "oscon.inbox", { owner: 1, sequence: 1 } )
      sh.shardCollection( "oscon.users", { user_name: 1 } )
      msg = {
        from: "Joe",
        to: [ "Bob", "Jane" ],
        sent: new Date(),
        message: "Hi!"
      }
  20. Fan out on write with buckets
      // Send a message
      for (var i = 0; i < msg.to.length; i++) {
        // Atomically bump the recipient's message counter and read the new value
        count = db.users.findAndModify({
          query: { user_name: msg.to[i] },
          update: { "$inc": { "msg_count": 1 } },
          upsert: true,
          new: true
        }).msg_count;
        // 50 messages per bucket
        sequence = Math.floor(count / 50);
        db.inbox.update(
          { owner: msg.to[i], sequence: sequence },
          { $push: { "messages": msg } },
          { upsert: true }
        );
      }
      // Read my inbox
      db.inbox.find( { owner: "Joe" } ).sort( { sequence: -1 } ).limit( 2 )
  21. Fan out on write with buckets
      • Each “inbox” document holds an array of messages
      • Append a message onto “inbox” of recipient
      • Bucket inboxes so there are not too many messages per document
      • Can shard on recipient, so inbox reads hit one shard
      • 1 or 2 documents to read the whole inbox
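
To see how the counter-based bucketing above distributes messages, here is a minimal in-memory sketch. The two `Map` objects are hypothetical stand-ins for the `users` and `inbox` collections; only the arithmetic (`Math.floor(count / 50)` on a counter read after incrementing) is taken from the slides.

```javascript
const BUCKET_SIZE = 50;

// Hypothetical stand-ins for the two collections.
const msgCounts = new Map(); // user_name -> msg_count
const inboxBuckets = new Map(); // "owner/sequence" -> array of messages

function deliver(owner, msg) {
  // Mirrors findAndModify with $inc and new: true:
  // the counter is read after it has been incremented.
  const count = (msgCounts.get(owner) || 0) + 1;
  msgCounts.set(owner, count);

  const sequence = Math.floor(count / BUCKET_SIZE);
  const key = owner + "/" + sequence;
  if (!inboxBuckets.has(key)) inboxBuckets.set(key, []); // mirrors upsert: true
  inboxBuckets.get(key).push(msg); // mirrors $push
  return sequence;
}

for (let i = 1; i <= 120; i++) {
  deliver("Bob", { from: "Joe", message: "msg " + i });
}

const bucketSizes = [0, 1, 2].map((s) => inboxBuckets.get("Bob/" + s).length);
console.log(bucketSizes); // [ 49, 50, 21 ]
```

Because the counter starts at 1, bucket 0 holds counts 1 to 49 and every later bucket holds 50. Reading the two highest-sequence buckets, as the slide's `limit( 2 )` does, therefore returns between 1 and 100 of the most recent messages in this model.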
  22. Fan out on write with buckets – I/O (diagram labels: Shard 1, Shard 2, Shard 3; Send Message)
  23. Fan out on write with buckets – I/O (diagram labels: Shard 1, Shard 2, Shard 3; Read Inbox; Send Message)
  24. #2 – History
  25. Design Goals
      • Need to retain a limited amount of history, e.g.
        – Hours, Days, Weeks
        – May be a legislative requirement (e.g. HIPAA, SOX, DPA)
      • Need to query efficiently by
        – match
        – ranges
  26. 3 Approaches (there are more)
      • Bucket by Number of messages
      • Fixed size array
      • Bucket by date + TTL collections
  27. Bucket by number of messages
      db.inbox.find()
      { owner: "Joe", sequence: 25,
        messages: [
          { from: "Joe",
            to: [ "Bob", "Jane" ],
            sent: ISODate("2013-03-01T09:59:42.689Z"),
            message: "Hi!" },
          …
        ] }
      // Query with a date range
      db.inbox.find( { owner: "friend1",
                       messages: { $elemMatch: { sent: { $gte: ISODate("…") } } } } )
      // Remove elements based on a date
      db.inbox.update( { owner: "friend1" },
                       { $pull: { messages: { sent: { $gte: ISODate("…") } } } } )
  28. Considerations
      • Shrinking documents: space can be reclaimed with
        – db.runCommand( { compact: '<collection>' } )
      • Remove the document after the last element in the array has been removed
        – { "_id" : …, "messages" : [ ], "owner" : "friend1", "sequence" : 0 }
  29. Fixed Size Array
      msg = {
        from: "Your Boss",
        to: [ "Bob" ],
        sent: new Date(),
        message: "CALL ME NOW!"
      }
      // 2.4 introduces $each, $sort and $slice for $push
      db.messages.update(
        { _id: 1 },
        { $push: { messages: { $each: [ msg ],
                               $sort: { sent: 1 },
                               $slice: -50 } } }
      )
  30. Considerations
      • Need to compute the size of the array based on retention period
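
A rough way to reason about the fixed-size array is to model the `$push` modifiers on a plain array and derive the cap from the retention period. This is a sketch under assumptions: `arraySizeFor` is a hypothetical sizing rule (expected messages per day times days retained) that does not appear in the slides, and `pushCapped` only mimics the effect of `$each`, `$sort: { sent: 1 }` and a negative `$slice` for a cap of at least 1.

```javascript
// Mimics $push with $each, $sort: { sent: 1 } and $slice: -maxLen on a plain array.
function pushCapped(messages, newMsgs, maxLen) {
  const merged = messages.concat(newMsgs);
  merged.sort((a, b) => a.sent - b.sent); // oldest first
  return merged.slice(-maxLen); // keep only the newest maxLen entries
}

// Hypothetical sizing rule: expected messages per day times days retained.
function arraySizeFor(messagesPerDay, retentionDays) {
  return Math.ceil(messagesPerDay * retentionDays);
}

const cap = arraySizeFor(5, 10); // 50, the value used on the slide
let history = [];
for (let t = 1; t <= 60; t++) {
  history = pushCapped(history, [{ sent: t, message: "m" + t }], cap);
}

console.log(cap, history.length, history[0].sent, history[history.length - 1].sent);
// 50 50 11 60
```

If traffic exceeds the estimate, the oldest entries fall off before the retention period ends, so the estimate has to be conservative when retention is a hard requirement.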
  31. TTL Collections
      // messages: one doc per user per day
      db.inbox.findOne()
      {
        _id: 1,
        to: "Joe",
        sequence: ISODate("2013-02-04T00:00:00.392Z"),
        messages: [ ]
      }
      // Auto expires data after 31536000 seconds = 1 year
      db.messages.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )
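
The arithmetic behind the TTL slide can be checked with a few lines of plain JavaScript. This is a sketch, not MongoDB behaviour in detail: `dayBucket` and `isExpired` are hypothetical helpers, and the real TTL monitor runs periodically, so deletion is not instantaneous.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Retention period in days -> value for expireAfterSeconds.
function expireAfterSeconds(days) {
  return days * SECONDS_PER_DAY;
}

// Hypothetical helper: truncate a timestamp to midnight UTC,
// giving one bucket key per user per day.
function dayBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// Hypothetical helper: a document becomes eligible for removal once its
// indexed date is older than the TTL window.
function isExpired(bucketDate, now, ttlSeconds) {
  return (now.getTime() - bucketDate.getTime()) / 1000 > ttlSeconds;
}

const ttl = expireAfterSeconds(365);
const bucket = dayBucket(new Date("2013-02-04T09:59:42.689Z"));

console.log(ttl); // 31536000
console.log(bucket.toISOString()); // 2013-02-04T00:00:00.000Z
console.log(isExpired(bucket, new Date("2013-03-01T00:00:00Z"), ttl)); // false
console.log(isExpired(bucket, new Date("2014-02-05T00:00:01Z"), ttl)); // true
```

Since expiry applies to whole documents, a day bucket disappears as a unit, which is why the bucket granularity (one document per user per day here) also sets the granularity of retention.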
  32. #3 – Indexed Attributes
  33. Design Goal
      • Application needs to store a variable number of attributes, e.g.
        – User defined Form
        – Meta Data tags
      • Queries needed
        – Equality
        – Range based
      • Need to be efficient, regardless of the number of attributes
  34. 2 Approaches (there are more)
      • Attributes as Embedded Document
      • Attributes as Objects in an Array
  35. Attributes as a Sub-Document
      db.files.insert( { _id: "local.0",
                         attr: { type: "text", size: 64,
                                 created: ISODate("...") } } )
      db.files.insert( { _id: "local.1",
                         attr: { type: "text", size: 128 } } )
      db.files.insert( { _id: "mongod",
                         attr: { type: "binary", size: 256,
                                 created: ISODate("...") } } )
      // Need to create an index for each item in the sub-document
      db.files.ensureIndex( { "attr.type": 1 } )
      db.files.find( { "attr.type": "text" } )
      // Can perform range queries
      db.files.ensureIndex( { "attr.size": 1 } )
      db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )
  36. Considerations
      • Each attribute needs an Index
      • Each time you extend, you add an index
      • Lots and lots of indexes
  37. Attributes as Objects in Array
      db.files.insert( { _id: "local.0",
                         attr: [ { type: "text" },
                                 { size: 64 },
                                 { created: ISODate("...") } ] } )
      db.files.insert( { _id: "local.1",
                         attr: [ { type: "text" },
                                 { size: 128 } ] } )
      db.files.insert( { _id: "mongod",
                         attr: [ { type: "binary" },
                                 { size: 256 },
                                 { created: ISODate("...") } ] } )
      db.files.ensureIndex( { attr: 1 } )
  38. Considerations
      • Only one index needed on attr
      • Can support range queries, etc.
      • Index can be used only once per query
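
One way to picture why a single index on `attr` is enough is to think of it as holding one entry per array element. The sketch below builds such a structure in memory. It is an analogy only: `buildAttrIndex` and `findByAttr` are hypothetical helpers, the string keys are not how MongoDB stores index entries, and only equality lookups are modelled.

```javascript
// Sample documents shaped like the slide's array-of-objects layout.
const files = [
  { _id: "local.0", attr: [{ type: "text" }, { size: 64 }] },
  { _id: "local.1", attr: [{ type: "text" }, { size: 128 }] },
  { _id: "mongod", attr: [{ type: "binary" }, { size: 256 }] },
];

// One index for all attributes: one entry per array element.
function buildAttrIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const element of doc.attr) {
      const [name, value] = Object.entries(element)[0];
      const key = name + "=" + JSON.stringify(value);
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

// Equality lookup against the single index.
function findByAttr(index, name, value) {
  return index.get(name + "=" + JSON.stringify(value)) || [];
}

// A new kind of attribute needs no new index, only new entries.
files.push({ _id: "notes", attr: [{ type: "text" }, { owner: "joe" }] });
const attrIndex = buildAttrIndex(files);

console.log(findByAttr(attrIndex, "type", "text")); // [ 'local.0', 'local.1', 'notes' ]
console.log(findByAttr(attrIndex, "owner", "joe")); // [ 'notes' ]
```

Contrast this with the sub-document layout, where the `owner` attribute would have needed its own `ensureIndex` call before it could be queried efficiently.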
  39. #4 – Multiple Identities
  40. Design Goal
      • Ability to look up by a number of different identities, e.g.
        – Username
        – Email address
        – FB Handle
        – LinkedIn URL
  41. 2 Approaches (there are more)
      • Identifiers in a single document
      • Separate Identifiers from Content
  42. Single Document by User
      db.users.findOne()
      { _id: "joe",
        email: "joe@example.com",
        fb: "joe.smith",      // facebook
        li: "joe.e.smith",    // linkedin
        other: {…}
      }
      // Shard collection by _id
      sh.shardCollection( "oscon.users", { _id: 1 } )
      // Create indexes on each key
      db.users.ensureIndex( { email: 1 } )
      db.users.ensureIndex( { fb: 1 } )
      db.users.ensureIndex( { li: 1 } )
  43. Read by _id (shard key): find( { _id: "joe" } ) (diagram labels: Shard 1, Shard 2, Shard 3)
  44. Read by email (non-shard key): find( { email: "joe@example.com" } ) (diagram labels: Shard 1, Shard 2, Shard 3)
  45. Considerations
      • Lookup by shard key is routed to 1 shard
      • Lookup by other identifier is scatter-gathered across all shards
      • Secondary keys cannot have a unique index
  46. Document per Identity
      // Create unique index
      db.identities.ensureIndex( { identifier: 1 }, { unique: true } )
      // Create one identity document for each of the user's identifiers
      db.identities.save( { identifier: { hndl: "joe" }, user: "1200-42" } )
      db.identities.save( { identifier: { email: "joe@abc.com" }, user: "1200-42" } )
      db.identities.save( { identifier: { li: "joe.e.smith" }, user: "1200-42" } )
      // Shard collection by identifier
      sh.shardCollection( "mydb.identities", { identifier: 1 } )
      // Create unique index
      db.users.ensureIndex( { _id: 1 }, { unique: true } )
      // Shard collection by _id
      sh.shardCollection( "mydb.users", { _id: 1 } )
  47. Read requires 2 reads (diagram labels: Shard 1, Shard 2, Shard 3)
      db.identities.find( { "identifier": { "hndl": "joe" } } )
      db.users.find( { _id: "1200-42" } )
  48. Considerations
      • Lookup to Identities is a routed query
      • Lookup to Users is a routed query
      • Unique indexes available
      • Must do two queries per lookup
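
The two-step lookup summarised above can be sketched with two in-memory maps. This is an illustration under assumptions: the `Map` objects stand in for the `identities` and `users` collections, and `addIdentity` and `findUser` are hypothetical helpers; the identifier values and the user id "1200-42" are the ones from the slides.

```javascript
// Stand-ins for the two collections, each keyed by its shard key.
const identityDocs = new Map(); // "kind:value" -> user id
const userDocs = new Map([["1200-42", { _id: "1200-42", other: {} }]]);

// Mirrors the unique index on identifier: a second insert of the same key fails.
function addIdentity(kind, value, userId) {
  const key = kind + ":" + value;
  if (identityDocs.has(key)) throw new Error("duplicate identifier: " + key);
  identityDocs.set(key, userId);
}

// Two keyed reads: identifier -> user id, then user id -> user document.
function findUser(kind, value) {
  const userId = identityDocs.get(kind + ":" + value); // read 1, routed by identifier
  if (userId === undefined) return null;
  return userDocs.get(userId) || null; // read 2, routed by _id
}

addIdentity("hndl", "joe", "1200-42");
addIdentity("email", "joe@abc.com", "1200-42");
addIdentity("li", "joe.e.smith", "1200-42");

console.log(findUser("email", "joe@abc.com")._id); // 1200-42
console.log(findUser("hndl", "nobody")); // null
```

Both reads are lookups by key, which corresponds to the slide's point that each query is routed to a single shard at the cost of a second round trip.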
  49. Conclusion
  50. Summary
      • Multiple ways to model a domain problem
      • Understand the key use cases of your app
      • Balance between ease of query vs. ease of write
      • Random I/O should be avoided
  51. Thank You. Shaun Verch, Server Engineer, 10gen. #oscon