Schema Design - Real world use case

Implementing social inbox or chronological feeds using MongoDB

  • You define your schema when saving documents and creating indexes. There are functional goals and performance goals. In an RDBMS, you implement your domain model in the canonical way, following normalization practices, and afterwards answer your queries using relational mechanisms such as joins and GROUP BY. In MongoDB, you first identify your queries and typical access patterns, and then design your schema around them.
  • Let’s go to our first example.
  • Social media applications. Chronological feeds. All those platforms provide some level of messaging among their users.
  • The message that I write here needs to be sent to hundreds or thousands of users. How do we structure this in MongoDB?
  • This feed is unique per user; it’s 100% personalized.
  • The simplest approach, the first idea that comes to mind. We’ll use the Mongo shell for our code samples. The ‘to’ field is an array; when MongoDB filters on an array field it matches any element, similar to the SQL ‘IN’ operator. It’s a really easy solution to implement.
  • No need to touch more than one shard, great for horizontal scalability!
  • Reading is close to the worst-case scenario; thank God we have an index.
  • Writes are fast. Reads are close to the worst case. For a very read-heavy application this is not a good approach: retrieving all these documents when reading the inbox takes a lot of IO.
  • It’s the opposite of the situation we faced in the first scenario.
  • Efficient when reading messages but less efficient when writing. Reading still involves a lot of random IO, since we don’t control where MongoDB stores each document; this is where the third solution helps us.
  • This is not an obvious solution. It’s not going to be your first idea, unless you have a lot of experience with MongoDB.
  • Let’s look at this findAndModify in detail. The sequence takes the recipient’s total message count, divides it by 50 and rounds it down; this is a pagination or bucketing algorithm in which sequence is the page number. We push the message onto the end of the array, and each document contains at most 50 messages. It seems like a lot of work for sending a message.
  • Writing is the same amount of work as in the previous solution, actually a bit more.
  • Reading is much better in this case, because I retrieve only one or two documents to build my inbox, and I do so using an index. For applications with really high read traffic this optimization is very important.
  • Tweets are an example of a history application. Read a time window of messages.
  • Give me everything between 6 and 4 months ago.
  • This is similar to our previous case, using sequence for pagination. The update operation is atomic at the document level.
  • Using the $pull operator we shrink documents and produce fragmentation. You can fix that by running compact periodically, maybe from a cron job. Compaction is slow and it locks, so a good alternative is to run it on secondaries. Remember to delete the document once you have removed all of its messages.
  • With this approach, instead of deleting messages, we keep only the latest messages whenever we insert.
  • We need to know the size of the array: make it adaptive, based on the user, or overestimate it.
  • Another approach would be to set sequence to a datetime in the future and set expireAfterSeconds to 0. TTL collections are quite popular for this kind of expiration.
  • Schema design in non-relational databases is not trivial. There is no single solution as in an RDBMS. There is nothing mathematically proven like normal forms. Which solution is best depends on your users and how they use your application.

    1. 1. #MongoDBDays Schema Design Real World Use Case Matias Cascallares Consulting Engineer, MongoDB matias.cascallares@mongodb.com
    2. 2. Agenda • Why is schema design important • A real world use case – Social Inbox – History • Conclusions
    3. 3. Why is Schema Design important? • Largest factor for a performant system • Schema design with MongoDB is different • RDBMS – "What answers do I have?" • MongoDB – "What question will I have?"
    4. 4. #1 – Message Inbox
    5. 5. Let’s get Social
    6. 6. Sending Messages ?
    7. 7. Reading my Inbox ?
    8. 8. Design Goals • Efficiently send new messages to recipients • Efficiently read inbox
    9. 9. 3 Approaches (there are more) • Fan out on Read • Fan out on Write • Fan out on Write with Bucketing
    10. 10. Fan out on read

            // Shard on "from"
            sh.shardCollection( "mongodbdays.inbox", { from: 1 } )

            // Make sure we have an index to handle inbox reads
            db.inbox.ensureIndex( { to: 1, sent: 1 } )

            msg = {
                from: "Matias",
                to: [ "Bob", "Jane" ],
                sent: new Date(),
                message: "Hi!"
            }

            // Send a message
            db.inbox.save( msg )

            // Read my inbox
            db.inbox.find( { to: "Matias" } ).sort( { sent: -1 } )

            Schema Design, Matias Cascallares
    11. 11. Fan out on read – IO: Send Message [diagram: Shard 1, Shard 2, Shard 3]
    12. 12. Fan out on read – IO: Read Inbox [diagram: Shard 1, Shard 2, Shard 3]
    13. 13. Considerations • Write: one document per message sent • Reading my inbox means finding all messages with my own name in the recipient field • Read: requires scatter-gather on sharded cluster • Then a lot of random IO on a shard to find everything
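    14. 14. Fan out on write

            // Shard on "recipient" and "sent"
            sh.shardCollection( "mongodbdays.inbox", { recipient: 1, sent: 1 } )

            msg = {
                from: "Matias",
                to: [ "Bob", "Jane" ],
                sent: new Date(),
                message: "Hi!"
            }

            // Send a message: one copy per recipient.
            // forEach iterates over the names (for...in would yield the array indexes).
            msg.to.forEach( function( recipient ) {
                // Build a fresh copy so each save creates a new document instead of
                // reusing an _id that a previous save may have set on the object
                db.inbox.save( { recipient: recipient, from: msg.from, to: msg.to,
                                 sent: msg.sent, message: msg.message } )
            } )

            // Read my inbox
            db.inbox.find( { recipient: "Matias" } ).sort( { sent: -1 } )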
    15. 15. Fan out on write – IO: Send Message [diagram: Shard 1, Shard 2, Shard 3]
    16. 16. Fan out on write – IO: Read Inbox [diagram: Shard 1, Shard 2, Shard 3]
    17. 17. Considerations • Write: one document per recipient • Reading my inbox is just finding all of the messages with me as the recipient • Can shard on recipient, so inbox reads hit one shard • But still lots of random IO on the shard
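
The trade-off between the first two approaches can be sketched without a database. The following is a minimal in-memory model, not MongoDB code: plain arrays stand in for collections, and the function names (sendFanOutOnRead, sendFanOutOnWrite, readInboxFanOutOnRead, readInboxFanOutOnWrite) are hypothetical helpers introduced only for this illustration.

```javascript
// In-memory model only: plain arrays stand in for MongoDB collections.
// All function names below are hypothetical, chosen for this sketch.

function sendFanOutOnRead(collection, msg) {
  // One document per message; recipients stay in the "to" array.
  collection.push({ ...msg });
  return 1; // number of documents written
}

function sendFanOutOnWrite(collection, msg) {
  // One document per recipient; each copy carries its own "recipient" field.
  for (const recipient of msg.to) {
    collection.push({ ...msg, recipient });
  }
  return msg.to.length; // number of documents written
}

function readInboxFanOutOnRead(collection, user) {
  // Must look inside every message's "to" array.
  return collection
    .filter((doc) => doc.to.includes(user))
    .sort((a, b) => b.sent - a.sent);
}

function readInboxFanOutOnWrite(collection, user) {
  // Matches on a single scalar field, which can also serve as the shard key.
  return collection
    .filter((doc) => doc.recipient === user)
    .sort((a, b) => b.sent - a.sent);
}

const msg = { from: "Matias", to: ["Bob", "Jane"], sent: new Date(), message: "Hi!" };
const readStyle = [];
const writeStyle = [];
const writesOnRead = sendFanOutOnRead(readStyle, msg);
const writesOnWrite = sendFanOutOnWrite(writeStyle, msg);

console.log("documents written:", writesOnRead, "vs", writesOnWrite);
console.log("Bob's inbox size:",
  readInboxFanOutOnRead(readStyle, "Bob").length,
  readInboxFanOutOnWrite(writeStyle, "Bob").length);
```

Both models return the same inbox; what differs is where the work happens: fan out on read writes once and searches at read time, while fan out on write pays one write per recipient so that a read can target a single key.
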
    18. 18. Fan out on write with buckets

            // Shard on "owner / sequence"
            sh.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
            sh.shardCollection( "mongodbdays.users", { user_name: 1 } )

            msg = {
                from: "Matias",
                to: [ "Bob", "Jane" ],
                sent: new Date(),
                message: "Hi!"
            }
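    19. 19. Fan out on write with buckets

            // Send a message
            // forEach iterates over the names (for...in would yield the array indexes)
            msg.to.forEach( function( recipient ) {
                // Atomically increment the recipient's message counter and read it back
                count = db.users.findAndModify({
                    query: { user_name: recipient },
                    update: { "$inc": { "msg_count": 1 } },
                    upsert: true,
                    new: true
                }).msg_count;

                // Bucket (page) number: 50 messages per document
                sequence = Math.floor(count / 50);

                // Append the message to that bucket, creating the bucket if needed
                db.inbox.update(
                    { owner: recipient, sequence: sequence },
                    { $push: { "messages": msg } },
                    { upsert: true }
                );
            } )

            // Read my inbox: the two most recent buckets
            db.inbox.find( { owner: "Matias" } ).sort( { sequence: -1 } ).limit( 2 )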
    20. 20. Fan out on write with buckets • Each “inbox” document is an array of messages • Append a message onto “inbox” of recipient • Bucket inboxes so there’s not too many messages per document • Can shard on recipient, so inbox reads hit one shard • 1 or 2 documents to read the whole inbox
    21. 21. Fan out on write with buckets – IO: Send Message [diagram: Shard 1, Shard 2, Shard 3]
    22. 22. Fan out on write with buckets – IO: Read Inbox [diagram: Shard 1, Shard 2, Shard 3]
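
The bucketing arithmetic from the "Fan out on write with buckets" slides can be traced in plain JavaScript. This is a sketch under stated assumptions, not MongoDB code: two Map objects stand in for the users and inbox collections, and deliver and readInbox are hypothetical names. Only the counter increment and the Math.floor(count / 50) step are taken from the slides.

```javascript
const BUCKET_SIZE = 50; // from the slides: count / 50, rounded down

// Stand-ins for the collections (assumption: in-memory Maps, no database).
const counters = new Map(); // user_name -> msg_count
const inbox = new Map();    // "owner/sequence" -> bucket document

function deliver(recipient, msg) {
  // Mirrors findAndModify with $inc and new: true: the count is read AFTER the increment.
  const count = (counters.get(recipient) || 0) + 1;
  counters.set(recipient, count);

  const sequence = Math.floor(count / BUCKET_SIZE);
  const key = `${recipient}/${sequence}`;

  // Mirrors update with $push and upsert: true.
  if (!inbox.has(key)) {
    inbox.set(key, { owner: recipient, sequence, messages: [] });
  }
  inbox.get(key).messages.push(msg);
  return sequence;
}

function readInbox(owner) {
  // Mirrors find({ owner }).sort({ sequence: -1 }).limit(2).
  return [...inbox.values()]
    .filter((bucket) => bucket.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, 2);
}

for (let i = 1; i <= 120; i++) {
  deliver("Bob", { from: "Matias", message: `message ${i}` });
}

const latest = readInbox("Bob");
console.log(latest.map((b) => `sequence ${b.sequence}: ${b.messages.length} messages`));
```

One detail this trace makes visible: because the counter is read after it is incremented, counts 1 to 49 land in sequence 0 and count 50 already starts sequence 1, so the first bucket holds 49 messages and later full buckets hold 50.
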
    23. 23. #2 – History
    24. 24. Design Goals • Need to retain a limited amount of history, e.g. – Number of items – Hours, Days, Weeks – May be a legislative requirement (e.g. HIPAA, SOX, DPA) • Need to query efficiently by – match – ranges
    25. 25. 3 Approaches (there are more) • Bucket by number of messages • Fixed size array • Bucket by date + TTL Collections
    26. 26. Bucket by number of messages

            db.inbox.find()
            {
                owner: "Matias",
                sequence: 25,
                messages: [
                    { from: "Matias", to: [ "Bob", "Jane" ],
                      sent: ISODate("2013-03-01T09:59:42.689Z"), message: "Hi!" },
                    …
                ]
            }

            // Query with a date range
            db.inbox.find( { owner: "Matias",
                             messages: { $elemMatch: { sent: { $gt: ISODate("…") } } } } )

            // Remove elements based on a date
            // (multi: true so every bucket of this owner is updated, not only the first match)
            db.inbox.update( { owner: "Matias" },
                             { $pull: { messages: { sent: { $lt: ISODate("…") } } } },
                             { multi: true } )
    27. 27. Considerations • Shrinking documents, space can be reclaimed with – db.runCommand ( { compact: '<collection>' } ) • Remove the document after the last element in the array has been removed – { "_id" : …, "messages" : [ ], "owner" : "Bob", "sequence" : 0 }
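
The two clean-up steps above (pull old messages, then drop buckets left empty) can be illustrated with a small in-memory sketch. This is not MongoDB code: an array of bucket documents stands in for the inbox collection, and pruneOlderThan is a hypothetical helper name. The sample dates other than the one shown on the slide are made up for the example.

```javascript
// In-memory model: an array of bucket documents stands in for the inbox collection.
// pruneOlderThan is a hypothetical helper, not a MongoDB API.
function pruneOlderThan(buckets, owner, cutoff) {
  const kept = [];
  for (const bucket of buckets) {
    if (bucket.owner !== owner) {
      kept.push(bucket); // other owners are left untouched
      continue;
    }
    // Equivalent in spirit to $pull with { sent: { $lt: cutoff } }.
    const messages = bucket.messages.filter((m) => m.sent >= cutoff);
    // Drop the bucket entirely once its array is empty.
    if (messages.length > 0) {
      kept.push({ ...bucket, messages });
    }
  }
  return kept;
}

const buckets = [
  { owner: "Matias", sequence: 0, messages: [
    { sent: new Date("2013-01-10T00:00:00Z"), message: "old" },
  ] },
  { owner: "Matias", sequence: 1, messages: [
    { sent: new Date("2013-01-20T00:00:00Z"), message: "old too" },
    { sent: new Date("2013-03-01T09:59:42.689Z"), message: "Hi!" },
  ] },
  { owner: "Bob", sequence: 0, messages: [
    { sent: new Date("2013-01-05T00:00:00Z"), message: "untouched" },
  ] },
];

const remaining = pruneOlderThan(buckets, "Matias", new Date("2013-02-01T00:00:00Z"));
console.log(remaining.map((b) => `${b.owner}/${b.sequence}: ${b.messages.length}`));
```

In a real collection the two steps would be separate operations, which is why the slide calls out removing the emptied document explicitly.
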
    28. 28. Maintain the latest – Fixed size array

            msg = {
                from: "Your Boss",
                to: [ "Bob" ],
                sent: new Date(),
                message: "CALL ME NOW!"
            }

            // 2.4 introduces $each, $sort and $slice modifiers for $push
            db.messages.update(
                { _id: 1 },
                { $push: { messages: { $each: [ msg ],
                                       $sort: { sent: 1 },
                                       $slice: -50 } } }
            )
    29. 29. Considerations • Need to compute the size of the array based on retention period
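
To see what the fixed-size array keeps, the effect of $push with $each, $sort: { sent: 1 } and $slice: -50 can be imitated on a plain array. This is a sketch, assuming in-memory data: pushSortSlice and arraySizeFor are hypothetical helpers, and the sizing formula (expected messages per day times retention days) is only one possible way to estimate the array size, not something stated on the slides.

```javascript
// Imitates $push with $each, $sort: { sent: 1 } and a negative $slice on a plain array.
// pushSortSlice is a hypothetical helper, not a MongoDB API.
function pushSortSlice(messages, newMessages, keep) {
  if (keep <= 0) return []; // guard: slice(-0) would return everything
  return messages
    .concat(newMessages)                // $each: append all new elements
    .sort((a, b) => a.sent - b.sent)    // $sort: { sent: 1 }, oldest first
    .slice(-keep);                      // $slice: -keep, retain the newest `keep`
}

// Assumed sizing rule for illustration: expected daily volume times retention period.
function arraySizeFor(messagesPerDay, retentionDays) {
  return Math.ceil(messagesPerDay * retentionDays);
}

const keep = arraySizeFor(5, 10); // 5 messages per day, kept for 10 days
let stored = [];
for (let i = 0; i < 60; i++) {
  const msg = { from: "Your Boss", sent: new Date(i * 1000), message: `msg ${i}` };
  stored = pushSortSlice(stored, [msg], keep);
}

console.log(keep, stored.length, stored[0].message, stored[stored.length - 1].message);
```

Since the array is capped at insert time, no separate delete pass is needed; the cost is that an underestimated size silently drops messages that are still inside the retention period.
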
    30. 30. TTL Collections

            // messages: one doc per user per day
            db.inbox.findOne()
            {
                _id: 1,
                to: "Joe",
                sequence: ISODate("2013-02-04T00:00:00.392Z"),
                messages: [ ]
            }

            // Auto expires data after 31536000 seconds = 1 year
            // (index created on the same collection that holds the documents above)
            db.inbox.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )
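
The date arithmetic behind the TTL approach can be checked in plain JavaScript. This is a sketch under assumptions: dayBucket and expiresAt are hypothetical helpers, and truncating to UTC midnight is an assumed way of producing the per-day sequence value. The second call mirrors the variant where sequence is set to a future datetime and expireAfterSeconds is 0.

```javascript
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60; // 31536000, the value on the slide

// Assumed bucketing rule: one document per user per UTC day.
function dayBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// A TTL index makes a document eligible for deletion once
// (indexed date + expireAfterSeconds) is in the past.
function expiresAt(sequence, expireAfterSeconds) {
  return new Date(sequence.getTime() + expireAfterSeconds * 1000);
}

const sent = new Date("2013-03-01T09:59:42.689Z");
const sequence = dayBucket(sent);

// Variant 1: sequence is the bucket day, index uses expireAfterSeconds = one year.
const expiry = expiresAt(sequence, ONE_YEAR_SECONDS);

// Variant 2: store the expiry date itself in sequence, index uses expireAfterSeconds = 0.
const expiryWithZero = expiresAt(expiry, 0);

console.log(sequence.toISOString(), expiry.toISOString(), expiryWithZero.toISOString());
```

Deletion happens in a background task that runs periodically, so a document can outlive its computed expiry by a short time.
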
    31. 31. Conclusion
    32. 32. Summary • Multiple ways to model a domain problem • Understand the key use cases of your app • Balance between ease of query vs. ease of write • Random IO should be avoided • Scatter/gather should be avoided
    33. 33. Questions?
    34. 34. #MongoDBDays Thank You Matias Cascallares Consulting Engineer, MongoDB matias.cascallares@mongodb.com
