MongoDB Indexing Constraints and Creative Schemas

1,540 views
1,410 views

Published on

Published in: Technology, Education
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,540
On SlideShare
0
From Embeds
0
Number of Embeds
8
Actions
Shares
0
Downloads
30
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

MongoDB Indexing Constraints and Creative Schemas

  1. 1. MongoDB Indexing Constraints & Creative Schemas Chris Winslett chris@mongohq.com Thursday, June 27, 13
  2. 2. My Background • For the past year, I’ve looked at MongoDB logs at least once every day. • We routinely answer the question “how can I improve performance?” Thursday, June 27, 13
  3. 3. Who’s this talk for? • New to MongoDB • Seeing some slow operations, and need help debugging • Running database operations on a sizeable deploy • I have a MongoDB deployment, and I’ve hit a performance wall Thursday, June 27, 13
  4. 4. What should you learn? Know where to look on a running MongoDB to uncover slowness, and discuss solutions. MongoDB has performance “patterns”. How to think about improving performance. And . . . Thursday, June 27, 13
  5. 5. Schema Design Design with the end in mind. Thursday, June 27, 13
  6. 6. MongoDB Indexing Constraints • One index per query * • One range operator per query ($) • Range operator must be last field in index • Using RAM well * except $or, but the sin with $or is appending a sort to the query. Thursday, June 27, 13
  7. 7. The Tools • `mongostat` for MongoDB Behavior • `tail` the logs for current options • `iostat` for disk util • `top -c` for CPU usage Thursday, June 27, 13
  8. 8. First, a Simple One query getmore command res faults locked db ar|aw netIn netOut conn time 129 4 7 126m 2 my_db:0.0% 3|0 27k 445k 42 15:36:54 64 4 3 126m 0 my_db:0.0% 5|0 12k 379k 42 15:36:55 65 7 8 126m 0 my_db:0.1% 3|0 15k 230k 42 15:36:56 65 3 3 126m 1 my_db:0.0% 3|0 13k 170k 42 15:36:57 66 1 6 126m 1 my_db:0.0% 0|0 14k 262k 42 15:36:58 32 8 5 126m 0 my_db:0.0% 5|0 5k 445k 42 15:36:59 a truncated mongostat Alerted due to high CPU Thursday, June 27, 13
  9. 9. log [conn73454] query my_db.my_collection query: { $query: { publisher: "US Weekly" }, orderby: { publishAt: -1 } } ntoreturn:5 ntoskip:0 nscanned:33236 scanAndOrder:1 keyUpdates:0 numYields: 21 locks(micros) r:317266 nreturned:5 reslen:3127 178ms Thursday, June 27, 13
  10. 10. Solution { $query: { publisher: "US Weekly" }, orderby: { publishedAt: -1 } } db.my_collection.ensureIndex({“publisher”: 1, publishedAt: -1}, {background: true}) We are fixing this query With this index I would show you the logs, but now they are silent. Thursday, June 27, 13
  11. 11. The Pattern Inefficient Read Queries from in-memory table scans cause high CPU load Caused by not matching indexes to queries. Thursday, June 27, 13
  12. 12. Example 2 MongoDB Twitter-ish Feed Customer was building a network graph of users. Thursday, June 27, 13
  13. 13. Naive Method { creator_id: ObjectId(), status:“This is so awesome!” } Statuses Users { _id: ObjectId(), friends: [array-o-friends] } db.status.find({creator_id: {$in: [array-o-friends]}}).sort({_id: -1}) Query Thursday, June 27, 13
  14. 14. Solution { creator_id: ObjectId(), friends_of_creator: [array-of-viewers], status:“This is so awesome!” } Statuses Users { _id: ObjectId(), friends: [array-o-friends] } db.statuses.find({friends_of_creator: ObjectId()}).sort({_id: -1}) Query Thursday, June 27, 13
  15. 15. The Pattern With graphs, query on viewable by. What worked with minimal documents was not scaling. Thursday, June 27, 13
  16. 16. Similar Issues - Messages { sender_id: ObjectId(), recipient_id: ObjectId(), message:“This is so awesome!” } Naive { sender_id: ObjectId(), recipient_id: ObjectId(), participants: [ObjectId(), ObjectId()], thread_id: ObjectId(), message:“This is so awesome!” } Solution db.messages.find({participants: ObjectId()}).sort({_id: -1}) Query db.messages.find({$or: [{sender_id: ObjectId()}, {recipient_id: ObjectId()]}).sort({_id: -1}) Naive Query Thursday, June 27, 13
  17. 17. Example 3 insert query update delete getmore command faults locked % idx miss % qr|qw ar|aw *0 *0 *0 *0 0 1|0 1422 0 0 0|0 50|0 *0 6 *0 *0 0 6|0 575 0 0 0|0 51|0 *0 3 *0 *0 0 1|0 1047 0 0 0|0 50|0 *0 2 *0 *0 0 3|0 1660 0 0 0|0 50|0 a truncated mongostat Alerted on high CPU Thursday, June 27, 13
  18. 18. tail [initandlisten] connection accepted from .... [conn4229724] authenticate: { authenticate: .... [initandlisten] connection accepted from .... [conn4229725] authenticate: { authenticate: ..... [conn4229717] query ..... 102ms [conn4229725] query ..... 140ms amazingly quiet Thursday, June 27, 13
  19. 19. currentOp > db.currentOP() { "inprog" : [ { "opid" : 66178716, "lockType" : "read", "secs_running" : 760, "op" : "query", "ns" : "my_db.my_collection", "query" : { keywords: $in: [“keyword1”,“keyword2”], tags: $in: [“tags1”,“tags2”] }, orderby: { “created_at”: -1 }, "numYields" : 21 } ] } Thursday, June 27, 13
  20. 20. Solution > db.currentOP().inprog.filter(function(row) { return row.secs_running > 100 && row.op == "query" }).forEach(function(row) { db.killOp(row.opid) }) Return Stability to Database Disable query, and refactor schema. Thursday, June 27, 13
  21. 21. Refactoring I have one word for you,“Schema” Thursday, June 27, 13
  22. 22. Example 4 A map reduce has gradually run slower and slower. Thursday, June 27, 13
  23. 23. Finding Offenders Find the time of the slowest query of the day: grep '[0-9]{3,100}ms$' $MONGODB_LOG | awk '{print $NF}' | sort -n Thursday, June 27, 13
  24. 24. Slowest Map Reduce my_db.$cmd command: { mapreduce: "my_collection", map: function() {}, query: { $or: [ { object.type: "this" }, { object.type: "that" } ], time: { $lt: new Date(1359025311290), $gt: new Date(1358420511290) }, object.ver: 1, origin: "tnh" }, out: "my_new_collection", reduce: function(keys, vals) { ....} } ntoreturn:1 keyUpdates:0 numYields: 32696 locks(micros) W:143870 r:511858643 w:6279425 reslen:140 421185ms Thursday, June 27, 13
  25. 25. Solution Query is slow because it has multiple multi-value operators: $or, $gte, and $lte Problem Solution Change schema to use an “hour_created” attribute: hour_created: “%Y-%m-%d %H” Create an index on “hour_created” with followed by “$or” values. Query using the new “hour_created.” Thursday, June 27, 13
  26. 26. Words of caution 2 / 4 solutions were to add an index. New indexes as a solution scales poorly. Thursday, June 27, 13
  27. 27. Sometimes . . . It is best to do nothing, except add shards / add hardware. Go back to the drawing board on the design. Thursday, June 27, 13
  28. 28. Bad things happen to good databases? • ORMs • Manage your indexes and queries. • Constraints will set you free. Thursday, June 27, 13
  29. 29. Road Map for Refactoring • Measure, measure, measure. • Find your slowest queries and determine if they can be indexed • Rephrase the problem you are solving by asking “How do I want to query my data?” Thursday, June 27, 13
  30. 30. Thank you! • Questions? • E-mail me: chris@mongohq.com Thursday, June 27, 13

×