MongoDB - Monitoring & queueing

1,737 views

Published on

Building a queueing system in MongoDB and monitoring your cluster. Presentation by David Mytton at MongoSF May 2011 and MongoDB London User Group July 2011.

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,737
On SlideShare
0
From Embeds
0
Number of Embeds
658
Actions
Shares
0
Downloads
31
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

MongoDB - Monitoring & queueing

  1. 1. MongoDB Queuing & Monitoring
  2. 2. Queuingwww.flickr.com/photos/triplexpresso/496995086/
  3. 3. Queuing☺ Redundancywww.flickr.com/photos/triplexpresso/496995086/
  4. 4. Queuing☺ Redundancy☺ Atomicitywww.flickr.com/photos/triplexpresso/496995086/
  5. 5. Queuing☺ Redundancy☺ Atomicity Speedwww.flickr.com/photos/triplexpresso/496995086/
  6. 6. Queuing☺ Redundancy☺ Atomicity Speed☹ GCwww.flickr.com/photos/triplexpresso/496995086/
  7. 7. Queuing☺ Redundancywww.flickr.com/photos/triplexpresso/496995086/
  8. 8. Queuing☺ Redundancy☺ Knownwww.flickr.com/photos/triplexpresso/496995086/
  9. 9. It’s a little different,but not entirely new.
  10. 10. Keep it in RAM. Obviously.www.flickr.com/photos/comedynose/4388430444/
  11. 11. How do you know? > db.stats() { ! "collections" : 3, ! "objects" : 379970142, ! "avgObjSize" : 146.4554114991488, ! "dataSize" : 55648683504, 51GB ! "storageSize" : 61795435008, ! "numExtents" : 64, ! "indexes" : 1, ! "indexSize" : 21354514128, 19GB ! "fileSize" : 100816388096, ! "ok" : 1 }http://www.flickr.com/photos/comedynose/4388430444/
  12. 12. Where should it go? Should it be in What? memory? Indexes Always Data If you canhttp://www.flickr.com/photos/comedynose/4388430444/
  13. 13. How you’ll know1) Slow queries Thu Oct 14 17:01:11 [conn7410] update sd.apiLog query: { c: "android/setDeviceToken", a: 1466, u: "blah", ua: "Server Density Android" } 51926mswww.flickr.com/photos/tonivc/2283676770/
  14. 14. How you’ll know2) Timeouts cursor timed out (20000 ms)
  15. 15. How you’ll know3) Disk i/o spikeswww.flickr.com/photos/daddo83/3406962115/
  16. 16. Watch your storage1) Pre-alloc
  17. 17. Watch your storage2) Sharding maxSize
  18. 18. Watch your storage3) Logging --quiet db.runCommand("logRotate"); killall -SIGUSR1 mongod
  19. 19. Watch your storage4) Journaling david@rs2b ~: ls -alh /mongodbdata/journal/ total 538M drwxrwxr-x 2 david david 29 Mar 20 16:50 . drwx------ 4 david david 4.0K Mar 13 09:50 .. -rw------- 1 david david 538M Mar 20 17:00 j._862 -rw------- 1 david david 88 Mar 20 17:00 lsn
  20. 20. db.serverStatus()
  21. 21. db.serverStatus()1) Used connectionswww.flickr.com/photos/armchaircaver/2061231069/
  22. 22. db.serverStatus()2) Available connections
  23. 23. Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open filesFri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open filesFri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open filesFri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open filesFri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open filesFri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open filesFri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open filesFri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open filesFri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open filesFri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open filesFri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open filesFri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open filesFri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open filesFri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1b") failed: No address associated with hostnameFri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1d") failed: No address associated with hostnameFri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1c") failed: No address associated with hostnameFri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2b") failed: No address associated with hostnameFri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2d") failed: No address associated with hostnameFri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2c") failed: No address associated with hostnameFri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2a") failed: No address associated with hostnameFri Nov 19 17:24:32 [conn2268] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters:[ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 }MessagingPort say send() errno:9 Bad file descriptor (NONE)Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2d:27018 socket exceptionFri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE)Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2c:27018 socket exceptionFri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE)Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2a:27018 socket exceptionFri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1a") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1b") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1d") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1c") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2b") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2d") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2c") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2a") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2b") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2d") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2c") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2a") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1b") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1d") failed: No address associated with hostnameFri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1c") failed: No address associated with hostnameFri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1b") failed: No address associated with hostnameFri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1d") failed: No address associated with hostnameFri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1c") failed: No address associated with hostnameFri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2b") failed: No address associated with hostnameFri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2d") failed: No address associated with hostnameFri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2c") failed: No address associated with hostnameFri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2a") failed: No address associated with hostnameFri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostnameFri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostnameFri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostnameFri Nov 19 17:24:34 [conn2343] trying reconnect to rs2d:27018
  24. 24. connPoolStats> db.runCommand("connPoolStats"){! "hosts" : {! ! "config1:27019" : {! ! ! "available" : 2,! ! ! "created" : 6! ! },! ! "set1/rs1a:27018,rs1b:27018" : {! ! ! "available" : 1,! ! ! "created" : 249! ! }, ...! },! "totalAvailable" : 5,! "totalCreated" : 1002,! "numDBClientConnection" : 3490,! "numAScopedConnection" : 3,}
  25. 25. db.serverStatus()3) Index counters "indexCounters" : { ! ! "btree" : { ! ! ! "accesses" : 15180175, ! ! ! "hits" : 15178725, ! ! ! "misses" : 1450, ! ! ! "resets" : 0, ! ! ! "missRatio" : 0.00009551932 ! ! } ! },
  26. 26. db.serverStatus()4) Op counterswww.flickr.com/photos/cosmic_bandita/2395369614/
  27. 27. db.serverStatus()5) Background flushingPicture is unrelated! Mmm, ice cream.
  28. 28. db.serverStatus()6) Dur
  29. 29. rs.status() { ! "_id" : 1, ! "name" : "rs3b:27018", ! "health" : 1, ! "state" : 2, ! "stateStr" : "SECONDARY", ! "uptime" : 1886098, ! "optime" : { ! ! "t" : 1291252178000, ! ! "i" : 13 ! }, ! "optimeDate" : ISODate("2010-12-02T01:09:38Z"), "lastHeartbeat" : ISODate("2010-12-02T01:09:38Z") },www.ex-astris-scientia.org/inconsistencies/ent_vs_tng.htm (yes it’s a replicator from Star Trek)
  30. 30. rs.status()1) myState Value Meaning 0 Starting up (phase 1) 1 Primary 2 Secondary 3 Recovering 4 Fatal error 5 Starting up (phase 2) 6 Unknown state 7 Arbiter 8 Downen.wikipedia.org/wiki/State_of_matter
  31. 31. rs.status()2) Optime "optimeDate" : ISODate("2010-12-02T01:09:38Z")www.flickr.com/photos/robbie73/4244846566/
  32. 32. rs.status()3) Heartbeat "lastHeartbeat" : ISODate("2010-12-02T01:09:38Z")www.flickr.com/photos/drawblindfaith/3400981091/
  33. 33. mongostat
  34. 34. mongostat 1) faultsPicture is unrelated! Snowmobile in Norway.
  35. 35. mongostat2) lockedwww.flickr.com/photos/bbusschots/4541573665/
  36. 36. mongostat3) index misswww.flickr.com/photos/gareandkitty/276471187/
  37. 37. mongostat4) queues
  38. 38. mongostat5) Diagnostics
  39. 39. Current operations db.currentOp(); { ! ! ! "opid" : "shard1:299939199", ! ! ! "active" : true, ! ! ! "lockType" : "write", ! ! ! "waitingForLock" : false, ! ! ! "secs_running" : 15419, ! ! ! "op" : "remove", ! ! ! "ns" : "sd.metrics", ! ! ! "query" : { ! ! ! ! "accId" : 1391, ! ! ! ! "tA" : { ! ! ! ! ! "$lte" : ISODate("2010-11-24T19:53:00Z") ! ! ! ! } ! ! ! }, ! ! ! "client" : "10.121.12.228:44426", ! ! ! "desc" : "conn" ! ! },www.flickr.com/photos/jeffhester/2784666811/
  40. 40. Monitoring toolsServer Density
  41. 41. Monitoring toolswww.mongomonitor.com
  42. 42. App store for sysadminsplugins.serverdensity.com
  43. 43. Recap
  44. 44. RecapKeep it in RAM
  45. 45. RecapKeep it in RAMWatch your storage
  46. 46. RecapKeep it in RAMWatch your storagedb.serverStatus()rs.status()
  47. 47. David Mytton @davidmyttondavid@boxedice.comwww.mongomonitor.comWoop Japan!

×