• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Data Modeling for the Real World
 

Data Modeling for the Real World

on

  • 358 views

This talk examines four real-world use cases for MongoDB document-based data modeling. We examine the implications of several possible solutions for each problem.

This talk examines four real-world use cases for MongoDB document-based data modeling. We examine the implications of several possible solutions for each problem.

Statistics

Views

Total Views
358
Views on SlideShare
356
Embed Views
2

Actions

Likes
0
Downloads
5
Comments
0

2 Embeds 2

http://jfeeds.carsmantra.com 1
http://wp.joshlabs.in 1

Accessibility

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Data Modeling for the Real World Data Modeling for the Real World Presentation Transcript

    • Perl Engineer & Evangelist, 10genMike Friedman#MongoDBdaysSchema DesignFour Real-World UseCases
    • Single Table EnAgenda• Why is schema design important• 4 Real World Schemas– Inbox– History– IndexedAttributes– Multiple Identities• Conclusions
    • Why is Schema Designimportant?• Largest factor for a performant system• Schema design with MongoDB is different• RDBMS – "What answers do I have?"• MongoDB – "What question will I have?"
    • #1 - Message Inbox
    • Let’s getSocial
    • Sending Messages?
    • Design Goals• Efficiently send new messages to recipients• Efficiently read inbox
    • Reading my Inbox?
    • 3 Approaches (there aremore)• Fan out on Read• Fan out on Write• Fan out on Write with Bucketing
    • // Shard on "from"db.shardCollection( "mongodbdays.inbox", { from: 1 } )// Make sure we have an index to handle inbox readsdb.inbox.ensureIndex( { to: 1, sent: 1 } )msg = {from: "Joe",to: [ "Bob", "Jane" ],sent: new Date(),message: "Hi!",}// Send a messagedb.inbox.save( msg )// Read my inboxdb.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )Fan out on read
    • Fan out on read – I/OShard1 Shard 2Shard3SendMessage
    • Fan out on read – I/OShard1 Shard 2Shard3ReadInboxSendMessage
    • Considerations• Write: One document per message sent• Read: Find all messages with my own name inthe recipient field• Read: Requires scatter-gather on shardedcluster• A lot of random I/O on a shard to find everything
    • // Shard on “recipient” and “sent”db.shardCollection( "mongodbdays.inbox", { ”recipient”: 1, ”sent”: 1 } )msg = {from: "Joe",to: [ "Bob", "Jane" ],sent: new Date(),message: "Hi!",}// Send a messagefor ( recipient in msg.to ) {msg.recipient = msg.to[recipient]db.inbox.save( msg );}// Read my inboxdb.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )Fan out on write
    • Fan out on write – I/OShard1Shard2Shard3SendMessage
    • Fan out on write – I/OReadInboxSendMessageShard1Shard2Shard3
    • Considerations• Write: One document per recipient• Read: Find all of the messages with me as therecipient• Can shard on recipient, so inbox reads hit oneshard• But still lots of random I/O on the shard
    • // Shard on "owner / sequence"db.shardCollection( "mongodbdays.inbox",{ owner: 1, sequence: 1 } )db.shardCollection( "mongodbdays.users", { user_name: 1 } )msg = {from: "Joe",to: [ "Bob", "Jane" ],sent: new Date(),message: "Hi!",}Fan out on write with buckets
    • // Send a messagefor( recipient in msg.to) {count = db.users.findAndModify({query: { user_name: msg.to[recipient] },update: { "$inc": { "msg_count": 1 } },upsert: true,new: true }).msg_count;sequence = Math.floor(count / 50);db.inbox.update({owner: msg.to[recipient], sequence: sequence },{ $push: { "messages": msg } },{ upsert: true } );}// Read my inboxdb.inbox.find( { owner: "Joe" } ).sort ( { sequence: -1 } ).limit( 2 )Fan out on write with buckets
    • Fan out on write with buckets• Each “inbox” document is an array of messages• Append a message onto “inbox” of recipient• Bucket inboxes so there’s not too manymessages per document• Can shard on recipient, so inbox reads hit oneshard• 1 or 2 documents to read the whole inbox
    • Fan out on write with buckets – I/OShard1Shard2Shard3SendMessage
    • Shard1Shard2Shard3Fan out on write with buckets – I/OReadInboxSendMessage
    • #2 – History
    • Design Goals• Need to retain a limited amount of history e.g.– Hours, Days, Weeks– May be legislative requirement (e.g. HIPPA, SOX, DPA)• Need to query efficiently by– match– ranges
    • 3 Approaches (there aremore)• Bucket by Number of messages• Fixed size array• Bucket by date + TTL collections
    • db.inbox.find(){ owner: "Joe", sequence: 25,messages: [{ from: "Joe",to: [ "Bob", "Jane" ],sent: ISODate("2013-03-01T09:59:42.689Z"),message: "Hi!"},…] }// Query with a date rangedb.inbox.find ({owner: "friend1",messages: {$elemMatch: {sent:{$gte: ISODate("…") }}}})// Remove elements based on a datedb.inbox.update({owner: "friend1" },{ $pull: { messages: {sent: { $gte: ISODate("…") } } } } )Bucket by number ofmessages
    • Considerations• Shrinking documents, space can be reclaimedwith– db.runCommand ( { compact: <collection> } )• Removing the document after the last element inthe array as been removed– { "_id" : …, "messages" : [ ], "owner" : "friend1","sequence" : 0 }
    • msg = {from: "Your Boss",to: [ "Bob" ],sent: new Date(),message: "CALL ME NOW!"}// 2.4 Introduces $each, $sort and $slice for $pushdb.messages.update({ _id: 1 },{ $push: { messages: { $each: [ msg ],$sort: { sent: 1 },$slice: -50 }}})Fixed Size Array
    • Considerations• Need to compute the size of the array based onretention period
    • // messages: one doc per user per daydb.inbox.findOne(){_id: 1,to: "Joe",sequence: ISODate("2013-02-04T00:00:00.392Z"),messages: [ ]}// Auto expires data after 31536000 seconds = 1 yeardb.messages.ensureIndex( { sequence: 1 },{ expireAfterSeconds: 31536000 } )TTL Collections
    • #3 – Indexed Attributes
    • Design Goal• Application needs to stored a variable number ofattributes e.g.– User defined Form– Meta Data tags• Queries needed– Equality– Range based• Need to be efficient, regardless of the number ofattributes
    • 2 Approaches (there aremore)• Attributes as Embedded Document• Attributes as Objects in an Array
    • db.files.insert( { _id: "local.0",attr: { type: "text", size: 64,created: ISODate("..." } } )db.files.insert( { _id: "local.1",attr: { type: "text", size: 128} } )db.files.insert( { _id: "mongod",attr: { type: "binary", size: 256,created: ISODate("...") } } )// Need to create an index for each item in the sub-documentdb.files.ensureIndex( { "attr.type": 1 } )db.files.find( { "attr.type": "text"} )// Can perform range queriesdb.files.ensureIndex( { "attr.size": 1 } )db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )Attributes as a Sub-Document
    • Considerations• Each attribute needs an Index• Each time you extend, you add an index• Lots and lots of indexes
    • db.files.insert( {_id: "local.0",attr: [ { type: "text" },{ size: 64 },{ created: ISODate("...") } ] } )db.files.insert( { _id: "local.1",attr: [ { type: "text" },{ size: 128 } ] } )db.files.insert( { _id: "mongod",attr: [ { type: "binary" },{ size: 256 },{ created: ISODate("...") } ] } )db.files.ensureIndex( { attr: 1 } )Attributes as Objects in Array
    • Considerations• Only one index needed on attr• Can support range queries, etc.• Index can be used only once per query
    • #4 – Multiple Identities
    • Design Goal• Ability to look up by a number of differentidentities e.g.• Username• Email address• FB Handle• LinkedIn URL
    • 2 Approaches (there aremore)• Identifiers in a single document• Separate Identifiers from Content
    • db.users.findOne(){ _id: "joe",email: "joe@example.com,fb: "joe.smith", // facebookli: "joe.e.smith", // linkedinother: {…}}// Shard collection by _iddb.shardCollection("mongodbdays.users", { _id: 1 } )// Create indexes on each keydb.users.ensureIndex( { email: 1} )db.users.ensureIndex( { fb: 1 } )db.users.ensureIndex( { li: 1 } )Single Document by User
    • Read by _id (shard key)Shard 1 Shard 2 Shard 3find( { _id: "joe"} )
    • Read by email (non-shardkey)Shard 1 Shard 2 Shard 3find ( { email: joe@example.com })
    • Considerations• Lookup by shard key is routed to 1 shard• Lookup by other identifier is scatter gatheredacross all shards• Secondary keys cannot have a unique index
    • // Create unique indexdb.identities.ensureIndex( { identifier : 1} , { unique: true} )// Create a document for each users documentdb.identities.save({ identifier : { hndl: "joe" }, user: "1200-42" } )db.identities.save({ identifier : { email: "joe@abc.com" }, user: "1200-42" } )db.identities.save({ identifier : { li: "joe.e.smith" }, user: "1200-42" } )// Shard collection by _iddb.shardCollection( "mydb.identities", { identifier : 1 } )// Create unique indexdb.users.ensureIndex( { _id: 1} , { unique: true} )// Shard collection by _iddb.shardCollection( "mydb.users", { _id: 1 } )Document per Identity
    • Read requires 2 readsShard 1 Shard 2 Shard 3db.identities.find({"identifier" : {"hndl" : "joe" }})db.users.find( { _id: "1200-42"})
    • Considerations• Lookup to Identities is a routed query• Lookup to Users is a routed query• Unique indexes available• Must do two queries per lookup
    • Conclusion
    • Summary• Multiple ways to model a domain problem• Understand the key uses cases of your app• Balance between ease of query vs. ease of write• Random I/O should be avoided
    • Perl Engineer & Evangelist, 10genMike Friedman#MongoDBdaysThank You
    • Next Sessions at 3:405th Floor:West Side Ballroom 3&4:Advanced Replication InternalsWest Side Ballroom 1&2: Building a High-Performance DistributedTask Queue on MongoDBJuilliard Complex: WhiteBoard Q&ALyceum Complex: Ask the Experts7th Floor:Empire Complex: Managing a Maturing MongoDB EcosystemSoHo Complex: MongoDB Indexing Constraints and CreativeSchemas