Choosing a Shard key
 


Choosing a shard key can be difficult, and the factors involved largely depend on your use case. In fact, there is no such thing as a perfect shard key; there are design tradeoffs inherent in every decision. This presentation goes through those tradeoffs, as well as the different types of shard keys available in MongoDB, such as hashed and compound shard keys.

    Choosing a Shard key: Presentation Transcript

    • Choosing a Shard Key: Four Real-World Use Cases. Shaun Verch, Server Engineer, 10gen. #oscon
    • Schema Design: Four Real-World Use Cases. Shaun Verch, Server Engineer, 10gen. #oscon
    • Agenda • Why is schema design important? • 4 Real World Schemas – Inbox – History – Indexed Attributes – Multiple Identities • Conclusions
    • Why is Schema Design important? • Largest factor for a performant system • Schema design with MongoDB is different • RDBMS – "What answers do I have?" • MongoDB – "What question will I have?"
    • #1 - Message Inbox
    • Let’s get Social
    • Sending Messages?
    • Design Goals • Efficiently send new messages to recipients • Efficiently read inbox
    • Reading my Inbox?
    • 3 Approaches (there are more) • Fan out on Read • Fan out on Write • Fan out on Write with Bucketing
    • Fan out on read
      // Shard on "from"
      sh.shardCollection( "oscon.inbox", { from: 1 } )
      // Make sure we have an index to handle inbox reads
      db.inbox.ensureIndex( { to: 1, sent: 1 } )
      msg = {
        from: "Joe",
        to: [ "Bob", "Jane" ],
        sent: new Date(),
        message: "Hi!"
      }
      // Send a message
      db.inbox.save( msg )
      // Read my inbox
      db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )
    • Fan out on read – I/O (diagram: Send Message; Shard 1, Shard 2, Shard 3)
    • Fan out on read – I/O (diagram: Send Message and Read Inbox; Shard 1, Shard 2, Shard 3)
    • Considerations • Write: One document per message sent • Read: Find all messages with my own name in the recipient field • Read: Requires scatter-gather on sharded cluster • A lot of random I/O on a shard to find everything
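
The scatter-gather cost described above can be illustrated without a database. What follows is a minimal sketch in plain JavaScript, not mongo shell code: the three in-memory arrays stand in for shards, and `shardFor`, `makeCluster`, `sendFanOutOnRead` and `readFanOutOnRead` are hypothetical names invented for this illustration. The routing function is a toy, not MongoDB's chunk routing.

```javascript
// Hypothetical in-memory model of a 3-shard cluster (an assumption for illustration).
const SHARD_COUNT = 3;

// Toy routing function: maps a shard key value to a shard number.
function shardFor(keyValue) {
  let sum = 0;
  for (const ch of String(keyValue)) sum += ch.charCodeAt(0);
  return sum % SHARD_COUNT;
}

function makeCluster() {
  return Array.from({ length: SHARD_COUNT }, () => []);
}

// Fan out on read: one document per message, placed by the sender ("from").
function sendFanOutOnRead(cluster, msg) {
  cluster[shardFor(msg.from)].push(msg);
  return 1; // number of shards written
}

// The inbox query filters on "to", which is not the shard key,
// so every shard has to be asked (scatter-gather).
function readFanOutOnRead(cluster, user) {
  let shardsTouched = 0;
  const inbox = [];
  for (const shard of cluster) {
    shardsTouched++;
    for (const m of shard) {
      if (m.to.includes(user)) inbox.push(m);
    }
  }
  inbox.sort((a, b) => b.sent - a.sent); // newest first
  return { inbox, shardsTouched };
}

const cluster = makeCluster();
sendFanOutOnRead(cluster, { from: "Joe", to: ["Bob", "Jane"], sent: new Date(1), message: "Hi!" });
sendFanOutOnRead(cluster, { from: "Ann", to: ["Bob"], sent: new Date(2), message: "Hello" });
const bobResult = readFanOutOnRead(cluster, "Bob");
```

In this model a send always writes to one shard, while a read always visits all three, whatever the inbox contains.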
    • Fan out on write
      // Shard on "recipient" and "sent"
      sh.shardCollection( "oscon.inbox", { "recipient": 1, "sent": 1 } )
      msg = {
        from: "Joe",
        to: [ "Bob", "Jane" ],
        sent: new Date(),
        message: "Hi!"
      }
      // Send a message: one copy per recipient
      for (var i = 0; i < msg.to.length; i++) {
        msg.recipient = msg.to[i]
        db.inbox.save( msg );
      }
      // Read my inbox
      db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )
    • Fan out on write – I/O (diagram: Send Message; Shard 1, Shard 2, Shard 3)
    • Fan out on write – I/O (diagram: Read Inbox and Send Message; Shard 1, Shard 2, Shard 3)
    • Considerations • Write: One document per recipient • Read: Find all of the messages with me as the recipient • Can shard on recipient, so inbox reads hit one shard • But still lots of random I/O on the shard
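
To make the trade-off concrete, here is a small plain-JavaScript sketch of fan out on write. It is an in-memory model under stated assumptions, not mongo shell code: `shardFor`, `makeCluster`, `sendFanOutOnWrite` and `readFanOutOnWrite` are hypothetical names, and the routing function is a toy stand-in for real shard routing.

```javascript
// Hypothetical in-memory model of a 3-shard cluster (an assumption for illustration).
const SHARD_COUNT = 3;

// Toy routing function: maps a shard key value to a shard number.
function shardFor(keyValue) {
  let sum = 0;
  for (const ch of String(keyValue)) sum += ch.charCodeAt(0);
  return sum % SHARD_COUNT;
}

function makeCluster() {
  return Array.from({ length: SHARD_COUNT }, () => []);
}

// Fan out on write: one copy of the message per recipient,
// each placed on the shard that owns that recipient.
function sendFanOutOnWrite(cluster, msg) {
  const shardsWritten = new Set();
  for (const recipient of msg.to) {
    const copy = Object.assign({}, msg, { recipient });
    const shard = shardFor(recipient);
    cluster[shard].push(copy);
    shardsWritten.add(shard);
  }
  return shardsWritten.size;
}

// The inbox query filters on the shard key, so only one shard is consulted.
function readFanOutOnWrite(cluster, user) {
  const inbox = cluster[shardFor(user)]
    .filter((m) => m.recipient === user)
    .sort((a, b) => b.sent - a.sent); // newest first
  return { inbox, shardsTouched: 1 };
}

const cluster = makeCluster();
sendFanOutOnWrite(cluster, { from: "Joe", to: ["Bob", "Jane"], sent: new Date(1), message: "Hi!" });
const bobInbox = readFanOutOnWrite(cluster, "Bob");
const totalDocs = cluster.reduce((n, shard) => n + shard.length, 0);
```

The write cost grows with the number of recipients (two documents here), and in exchange each inbox read stays on a single shard.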
    • Fan out on write with buckets
      // Shard on "owner / sequence"
      sh.shardCollection( "oscon.inbox", { owner: 1, sequence: 1 } )
      sh.shardCollection( "oscon.users", { user_name: 1 } )
      msg = {
        from: "Joe",
        to: [ "Bob", "Jane" ],
        sent: new Date(),
        message: "Hi!"
      }
    • Fan out on write with buckets
      // Send a message
      for (var i = 0; i < msg.to.length; i++) {
        // Atomically increment the recipient's message counter
        var count = db.users.findAndModify({
          query: { user_name: msg.to[i] },
          update: { "$inc": { "msg_count": 1 } },
          upsert: true,
          new: true
        }).msg_count;
        // Roughly 50 messages per bucket
        var sequence = Math.floor(count / 50);
        db.inbox.update(
          { owner: msg.to[i], sequence: sequence },
          { $push: { "messages": msg } },
          { upsert: true }
        );
      }
      // Read my inbox: the two most recent buckets
      db.inbox.find( { owner: "Joe" } ).sort( { sequence: -1 } ).limit( 2 )
    • Fan out on write with buckets • Each “inbox” document holds an array of messages • Append a message onto the “inbox” of each recipient • Bucket inboxes so there are not too many messages per document • Can shard on recipient, so inbox reads hit one shard • 1 or 2 documents to read the whole inbox
    • Fan out on write with buckets – I/O (diagram: Send Message; Shard 1, Shard 2, Shard 3)
    • Fan out on write with buckets – I/O (diagram: Read Inbox and Send Message; Shard 1, Shard 2, Shard 3)
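
The bucket arithmetic is easy to check in isolation. The sketch below is plain JavaScript with two `Map` objects standing in for the users and inbox collections; `deliver`, `readInbox`, `msgCounts` and `buckets` are hypothetical names for this illustration, and only the `Math.floor(count / 50)` rule is taken from the slides.

```javascript
const BUCKET_SIZE = 50; // same divisor as Math.floor(count / 50) in the slides

// In-memory stand-ins for the two collections (assumptions, not MongoDB code).
const msgCounts = new Map(); // user_name -> msg_count
const buckets = new Map();   // "owner/sequence" -> { owner, sequence, messages }

function deliver(owner, msg) {
  // Mirrors findAndModify with $inc and new: true: increment, then read.
  const count = (msgCounts.get(owner) || 0) + 1;
  msgCounts.set(owner, count);
  const sequence = Math.floor(count / BUCKET_SIZE);
  const key = owner + "/" + sequence;
  // Mirrors update with upsert: true and $push.
  if (!buckets.has(key)) buckets.set(key, { owner, sequence, messages: [] });
  buckets.get(key).messages.push(msg);
  return sequence;
}

// Mirrors find({ owner }).sort({ sequence: -1 }).limit(2).
function readInbox(owner, limit = 2) {
  return [...buckets.values()]
    .filter((b) => b.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, limit);
}

// Deliver 120 messages to one recipient, then read the newest two buckets.
for (let n = 1; n <= 120; n++) {
  deliver("Bob", { from: "Joe", message: "message " + n });
}
const newest = readInbox("Bob");
```

Because the counter is incremented before the division, bucket 0 ends up with 49 messages and each later full bucket with 50. After 120 deliveries the read returns bucket 2 (21 messages) and bucket 1 (50 messages).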
    • #2 – History
    • Design Goals • Need to retain a limited amount of history, e.g. – Hours, Days, Weeks – May be a legislative requirement (e.g. HIPAA, SOX, DPA) • Need to query efficiently by – match – ranges
    • 3 Approaches (there are more) • Bucket by Number of messages • Fixed size array • Bucket by date + TTL collections
    • Bucket by number of messages
      db.inbox.find()
      {
        owner: "Joe",
        sequence: 25,
        messages: [
          { from: "Joe", to: [ "Bob", "Jane" ], sent: ISODate("2013-03-01T09:59:42.689Z"), message: "Hi!" },
          …
        ]
      }
      // Query with a date range
      db.inbox.find( { owner: "friend1", messages: { $elemMatch: { sent: { $gte: ISODate("…") } } } } )
      // Remove elements based on a date
      db.inbox.update( { owner: "friend1" }, { $pull: { messages: { sent: { $gte: ISODate("…") } } } } )
    • Considerations • Documents shrink as elements are pulled; space can be reclaimed with – db.runCommand ( { compact: '<collection>' } ) • Remove the document after the last element in the array has been removed – { "_id" : …, "messages" : [ ], "owner" : "friend1", "sequence" : 0 }
    • Fixed Size Array
      msg = {
        from: "Your Boss",
        to: [ "Bob" ],
        sent: new Date(),
        message: "CALL ME NOW!"
      }
      // 2.4 introduces $each, $sort and $slice for $push
      db.messages.update(
        { _id: 1 },
        { $push: { messages: { $each: [ msg ], $sort: { sent: 1 }, $slice: -50 } } }
      )
    • Considerations • Need to compute the size of the array based on retention period
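
As a rough illustration of what the capped push does and how the array size might be derived, here is a plain-JavaScript sketch. `pushCapped` imitates the combined effect of `$each`, `$sort: { sent: 1 }` and a negative `$slice` on an ordinary array, and `arraySizeFor` is a hypothetical sizing helper; the message rate and retention period it takes are assumptions you would measure for your own application.

```javascript
// Plain-JavaScript stand-in for $push with $each, $sort and $slice.
function pushCapped(messages, newMessages, maxLength) {
  if (maxLength <= 0) return [];                // a slice of 0 empties the array
  const merged = messages.concat(newMessages);  // $each: append all new items
  merged.sort((a, b) => a.sent - b.sent);       // $sort: { sent: 1 }, oldest first
  return merged.slice(-maxLength);              // $slice: -maxLength, keep the newest
}

// Hypothetical sizing helper: expected messages over the retention period.
function arraySizeFor(messagesPerDay, retentionDays) {
  return Math.ceil(messagesPerDay * retentionDays);
}

const existing = [
  { sent: new Date(1), message: "first" },
  { sent: new Date(2), message: "second" },
  { sent: new Date(3), message: "third" },
];
const capped = pushCapped(existing, [{ sent: new Date(4), message: "fourth" }], 3);
const sliceSize = arraySizeFor(5, 10);
```

With a cap of 3, pushing a fourth message drops the oldest one. A user receiving about 5 messages a day with a 10-day retention period would need a slice size of 50, which happens to match the `-50` in the slide.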
    • TTL Collections
      // messages: one doc per user per day
      db.messages.findOne()
      {
        _id: 1,
        to: "Joe",
        sequence: ISODate("2013-02-04T00:00:00.392Z"),
        messages: [ ]
      }
      // Auto expires data after 31536000 seconds = 1 year
      db.messages.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )
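
The two numbers this approach depends on, the TTL in seconds and the per-day bucket date, can be computed as in the sketch below. It is plain JavaScript under stated assumptions: `expireAfterSecondsFor` and `dayBucket` are hypothetical helper names, and truncating to midnight UTC is one possible bucketing choice rather than something the slides prescribe.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Convert a retention period in days to a value for expireAfterSeconds.
function expireAfterSecondsFor(retentionDays) {
  return retentionDays * SECONDS_PER_DAY;
}

// Truncate a timestamp to midnight UTC so that all messages a user
// receives on one day share the same "sequence" date.
function dayBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

const oneYear = expireAfterSecondsFor(365);
const bucket = dayBucket(new Date("2013-02-04T13:45:10.392Z"));
```

A 365-day retention period gives 31536000 seconds, the value used in the slide, and any timestamp on 4 February 2013 (UTC) maps to the same bucket date.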
    • #3 – Indexed Attributes
    • Design Goal • Application needs to store a variable number of attributes, e.g. – User-defined form – Metadata tags • Queries needed – Equality – Range based • Need to be efficient, regardless of the number of attributes
    • 2 Approaches (there are more) • Attributes as Embedded Document • Attributes as Objects in an Array
    • Attributes as a Sub-Document
      db.files.insert( { _id: "local.0", attr: { type: "text", size: 64, created: ISODate("...") } } )
      db.files.insert( { _id: "local.1", attr: { type: "text", size: 128 } } )
      db.files.insert( { _id: "mongod", attr: { type: "binary", size: 256, created: ISODate("...") } } )
      // Need to create an index for each item in the sub-document
      db.files.ensureIndex( { "attr.type": 1 } )
      db.files.find( { "attr.type": "text" } )
      // Can perform range queries
      db.files.ensureIndex( { "attr.size": 1 } )
      db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )
    • Considerations • Each attribute needs an Index • Each time you extend, you add an index • Lots and lots of indexes
    • Attributes as Objects in Array
      db.files.insert( { _id: "local.0", attr: [ { type: "text" }, { size: 64 }, { created: ISODate("...") } ] } )
      db.files.insert( { _id: "local.1", attr: [ { type: "text" }, { size: 128 } ] } )
      db.files.insert( { _id: "mongod", attr: [ { type: "binary" }, { size: 256 }, { created: ISODate("...") } ] } )
      // A single index covers every attribute
      db.files.ensureIndex( { attr: 1 } )
    • Considerations • Only one index needed on attr • Can support range queries, etc. • Index can be used only once per query
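
Moving from the sub-document shape to the array shape is a mechanical transformation. The following is a small plain-JavaScript sketch of it, with `toAttrArray` and `fromAttrArray` as hypothetical helper names that an application might apply before writing and after reading.

```javascript
// Convert { type: "text", size: 64 } into [ { type: "text" }, { size: 64 } ].
function toAttrArray(attrs) {
  return Object.keys(attrs).map((name) => ({ [name]: attrs[name] }));
}

// Convert the array form back into a single attribute object.
function fromAttrArray(attrArray) {
  return Object.assign({}, ...attrArray);
}

const asArray = toAttrArray({ type: "text", size: 64 });
const roundTrip = fromAttrArray(asArray);
```

Each attribute becomes its own one-key object, which is what lets a single index on `attr` cover attributes that are added later.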
    • #4 – Multiple Identities
    • Design Goal • Ability to look up by a number of different identities, e.g. – Username – Email address – FB handle – LinkedIn URL
    • 2 Approaches (there are more) • Identifiers in a single document • Separate Identifiers from Content
    • Single Document by User
      db.users.findOne()
      {
        _id: "joe",
        email: "joe@example.com",
        fb: "joe.smith",   // facebook
        li: "joe.e.smith", // linkedin
        other: {…}
      }
      // Shard collection by _id
      sh.shardCollection( "oscon.users", { _id: 1 } )
      // Create indexes on each key
      db.users.ensureIndex( { email: 1 } )
      db.users.ensureIndex( { fb: 1 } )
      db.users.ensureIndex( { li: 1 } )
    • Read by _id (shard key) (diagram: Shard 1, Shard 2, Shard 3) find( { _id: "joe" } )
    • Read by email (non-shard key) (diagram: Shard 1, Shard 2, Shard 3) find( { email: "joe@example.com" } )
    • Considerations • Lookup by shard key is routed to 1 shard • Lookup by other identifier is scatter gathered across all shards • Secondary keys cannot have a unique index
    • Document per Identity
      // Create unique index
      db.identities.ensureIndex( { identifier: 1 }, { unique: true } )
      // Create a document for each of the user's identities
      db.identities.save( { identifier: { hndl: "joe" }, user: "1200-42" } )
      db.identities.save( { identifier: { email: "joe@abc.com" }, user: "1200-42" } )
      db.identities.save( { identifier: { li: "joe.e.smith" }, user: "1200-42" } )
      // Shard collection by identifier
      sh.shardCollection( "mydb.identities", { identifier: 1 } )
      // Create unique index
      db.users.ensureIndex( { _id: 1 }, { unique: true } )
      // Shard collection by _id
      sh.shardCollection( "mydb.users", { _id: 1 } )
    • Read requires 2 reads (diagram: Shard 1, Shard 2, Shard 3)
      db.identities.find( { "identifier": { "hndl": "joe" } } )
      db.users.find( { _id: "1200-42" } )
    • Considerations • Lookup to Identities is a routed query • Lookup to Users is a routed query • Unique indexes available • Must do two queries per lookup
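
The two-step lookup can be sketched with two in-memory maps. This is plain JavaScript rather than mongo shell code, and `identities`, `users`, `identityKey`, `addIdentity` and `findUser` are hypothetical names for illustration; the duplicate check stands in for the unique index on `identifier`.

```javascript
// In-memory stand-ins for the two collections (assumptions, not MongoDB code).
const identities = new Map(); // "kind:value" -> user _id
const users = new Map();      // user _id -> user document

function identityKey(kind, value) {
  return kind + ":" + value;
}

// Register one identity; rejecting duplicates plays the role of the unique index.
function addIdentity(kind, value, userId) {
  const key = identityKey(kind, value);
  if (identities.has(key)) throw new Error("duplicate identifier: " + key);
  identities.set(key, userId);
}

// Two lookups per read: identity -> user _id, then user _id -> user document.
function findUser(kind, value) {
  const userId = identities.get(identityKey(kind, value)); // query 1
  if (userId === undefined) return null;
  return users.get(userId) || null;                        // query 2
}

users.set("1200-42", { _id: "1200-42", name: "Joe" });
addIdentity("hndl", "joe", "1200-42");
addIdentity("email", "joe@abc.com", "1200-42");
addIdentity("li", "joe.e.smith", "1200-42");
const byEmail = findUser("email", "joe@abc.com");
```

Both lookups are keyed on the value that each collection is sharded by in the slides, which is why each step can be routed to a single shard at the price of a second round trip.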
    • Conclusion
    • Summary • Multiple ways to model a domain problem • Understand the key use cases of your app • Balance ease of query against ease of write • Random I/O should be avoided
    • Thank You. Shaun Verch, Server Engineer, 10gen. #oscon