• Like

Thanks for flagging this SlideShare!

Oops! An error has occurred.

MongoDB Indexing Constraints and Creative Schemas

  • 750 views
Published

 

Published in Technology , Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
750
On SlideShare
0
From Embeds
0
Number of Embeds
8

Actions

Shares
Downloads
23
Comments
0
Likes
1

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. MongoDB Indexing Constraints & Creative Schemas Chris Winslett chris@mongohq.com Thursday, June 27, 13
  • 2. My Background • For the past year, I’ve looked at MongoDB logs at least once every day. • We routinely answer the question “how can I improve performance?” Thursday, June 27, 13
  • 3. Who’s this talk for? • New to MongoDB • Seeing some slow operations, and need help debugging • Running database operations on a sizeable deploy • I have a MongoDB deployment, and I’ve hit a performance wall Thursday, June 27, 13
  • 4. What should you learn? Know where to look on a running MongoDB to uncover slowness, and discuss solutions. MongoDB has performance “patterns”. How to think about improving performance. And . . . Thursday, June 27, 13
  • 5. Schema Design Design with the end in mind. Thursday, June 27, 13
  • 6. MongoDB Indexing Constraints • One index per query * • One range operator per query ($) • Range operator must be last field in index • Using RAM well * except $or, but the sin with $or is appending a sort to the query. Thursday, June 27, 13
  • 7. The Tools • `mongostat` for MongoDB Behavior • `tail` the logs for current options • `iostat` for disk util • `top -c` for CPU usage Thursday, June 27, 13
  • 8. First, a Simple One query getmore command res faults locked db ar|aw netIn netOut conn time 129 4 7 126m 2 my_db:0.0% 3|0 27k 445k 42 15:36:54 64 4 3 126m 0 my_db:0.0% 5|0 12k 379k 42 15:36:55 65 7 8 126m 0 my_db:0.1% 3|0 15k 230k 42 15:36:56 65 3 3 126m 1 my_db:0.0% 3|0 13k 170k 42 15:36:57 66 1 6 126m 1 my_db:0.0% 0|0 14k 262k 42 15:36:58 32 8 5 126m 0 my_db:0.0% 5|0 5k 445k 42 15:36:59 a truncated mongostat Alerted due to high CPU Thursday, June 27, 13
  • 9. log [conn73454] query my_db.my_collection query: { $query: { publisher: "US Weekly" }, orderby: { publishAt: -1 } } ntoreturn:5 ntoskip:0 nscanned:33236 scanAndOrder:1 keyUpdates:0 numYields: 21 locks(micros) r:317266 nreturned:5 reslen:3127 178ms Thursday, June 27, 13
  • 10. Solution { $query: { publisher: "US Weekly" }, orderby: { publishedAt: -1 } } db.my_collection.ensureIndex({“publisher”: 1, publishedAt: -1}, {background: true}) We are fixing this query With this index I would show you the logs, but now they are silent. Thursday, June 27, 13
  • 11. The Pattern Inefficient Read Queries from in-memory table scans cause high CPU load Caused by not matching indexes to queries. Thursday, June 27, 13
  • 12. Example 2 MongoDB Twitter-ish Feed Customer was building a network graph of users. Thursday, June 27, 13
  • 13. Naive Method { creator_id: ObjectId(), status:“This is so awesome!” } Statuses Users { _id: ObjectId(), friends: [array-o-friends] } db.status.find({creator_id: {$in: [array-o-friends]}}).sort({_id: -1}) Query Thursday, June 27, 13
  • 14. Solution { creator_id: ObjectId(), friends_of_creator: [array-of-viewers], status:“This is so awesome!” } Statuses Users { _id: ObjectId(), friends: [array-o-friends] } db.statuses.find({friends_of_creator: ObjectId()}).sort({_id: -1}) Query Thursday, June 27, 13
  • 15. The Pattern With graphs, query on viewable by. What worked with minimal documents was not scaling. Thursday, June 27, 13
  • 16. Similar Issues - Messages { sender_id: ObjectId(), recipient_id: ObjectId(), message:“This is so awesome!” } Naive { sender_id: ObjectId(), recipient_id: ObjectId(), participants: [ObjectId(), ObjectId()], thread_id: ObjectId(), message:“This is so awesome!” } Solution db.messages.find({participants: ObjectId()}).sort({_id: -1}) Query db.messages.find({$or: [{sender_id: ObjectId()}, {recipient_id: ObjectId()]}).sort({_id: -1}) Naive Query Thursday, June 27, 13
  • 17. Example 3 insert query update delete getmore command faults locked % idx miss % qr|qw ar|aw *0 *0 *0 *0 0 1|0 1422 0 0 0|0 50|0 *0 6 *0 *0 0 6|0 575 0 0 0|0 51|0 *0 3 *0 *0 0 1|0 1047 0 0 0|0 50|0 *0 2 *0 *0 0 3|0 1660 0 0 0|0 50|0 a truncated mongostat Alerted on high CPU Thursday, June 27, 13
  • 18. tail [initandlisten] connection accepted from .... [conn4229724] authenticate: { authenticate: .... [initandlisten] connection accepted from .... [conn4229725] authenticate: { authenticate: ..... [conn4229717] query ..... 102ms [conn4229725] query ..... 140ms amazingly quiet Thursday, June 27, 13
  • 19. currentOp > db.currentOP() { "inprog" : [ { "opid" : 66178716, "lockType" : "read", "secs_running" : 760, "op" : "query", "ns" : "my_db.my_collection", "query" : { keywords: $in: [“keyword1”,“keyword2”], tags: $in: [“tags1”,“tags2”] }, orderby: { “created_at”: -1 }, "numYields" : 21 } ] } Thursday, June 27, 13
  • 20. Solution > db.currentOP().inprog.filter(function(row) { return row.secs_running > 100 && row.op == "query" }).forEach(function(row) { db.killOp(row.opid) }) Return Stability to Database Disable query, and refactor schema. Thursday, June 27, 13
  • 21. Refactoring I have one word for you,“Schema” Thursday, June 27, 13
  • 22. Example 4 A map reduce has gradually run slower and slower. Thursday, June 27, 13
  • 23. Finding Offenders Find the time of the slowest query of the day: grep '[0-9]{3,100}ms$' $MONGODB_LOG | awk '{print $NF}' | sort -n Thursday, June 27, 13
  • 24. Slowest Map Reduce my_db.$cmd command: { mapreduce: "my_collection", map: function() {}, query: { $or: [ { object.type: "this" }, { object.type: "that" } ], time: { $lt: new Date(1359025311290), $gt: new Date(1358420511290) }, object.ver: 1, origin: "tnh" }, out: "my_new_collection", reduce: function(keys, vals) { ....} } ntoreturn:1 keyUpdates:0 numYields: 32696 locks(micros) W:143870 r:511858643 w:6279425 reslen:140 421185ms Thursday, June 27, 13
  • 25. Solution Query is slow because it has multiple multi-value operators: $or, $gte, and $lte Problem Solution Change schema to use an “hour_created” attribute: hour_created: “%Y-%m-%d %H” Create an index on “hour_created” with followed by “$or” values. Query using the new “hour_created.” Thursday, June 27, 13
  • 26. Words of caution 2 / 4 solutions were to add an index. New indexes as a solution scales poorly. Thursday, June 27, 13
  • 27. Sometimes . . . It is best to do nothing, except add shards / add hardware. Go back to the drawing board on the design. Thursday, June 27, 13
  • 28. Bad things happen to good databases? • ORMs • Manage your indexes and queries. • Constraints will set you free. Thursday, June 27, 13
  • 29. Road Map for Refactoring • Measure, measure, measure. • Find your slowest queries and determine if they can be indexed • Rephrase the problem you are solving by asking “How do I want to query my data?” Thursday, June 27, 13
  • 30. Thank you! • Questions? • E-mail me: chris@mongohq.com Thursday, June 27, 13