MongoDB Health Tips
It’s a little different,
but not entirely new.
Keep it in RAM. Obviously.




www.flickr.com/photos/comedynose/4388430444/
How do you know?

                   >   db.stats()
                   {
                   !    "collections" : 3,
                   !    "objects" : 379970142,
                   !    "avgObjSize" : 146.4554114991488,
                   !    "dataSize" : 55648683504,           51GB
                   !    "storageSize" : 61795435008,
                   !    "numExtents" : 64,
                   !    "indexes" : 1,
                   !    "indexSize" : 21354514128,          19GB
                   !    "fileSize" : 100816388096,
                   !    "ok" : 1
                   }


http://www.flickr.com/photos/comedynose/4388430444/
Where should it go?


                                                     Should it be in
                            What?
                                                       memory?


                            Indexes                      Always


                               Data                     If you can



http://www.flickr.com/photos/comedynose/4388430444/
How you’ll know

1) Slow queries

                 Thu Oct 14 17:01:11 [conn7410] update sd.apiLog
                query: { c: "android/setDeviceToken", a: 1466, u:
                 "blah", ua: "Server Density Android" } 51926ms




www.flickr.com/photos/tonivc/2283676770/
How you’ll know

2) Timeouts

    cursor timed out (20000 ms)
How you’ll know

3) Disk i/o spikes




www.flickr.com/photos/daddo83/3406962115/
Watch your storage

1) Pre-alloc
Watch your storage

2) Sharding maxSize
Watch your storage

3) Logging
              --quiet


    db.runCommand("logRotate");


      killall -SIGUSR1 mongod
Watch your storage

4) Journaling
    david@rs2b   ~: ls -alh /mongodbdata/journal/
    total 538M
    drwxrwxr-x   2   david   david   29 Mar 20   16:50   .
    drwx------   4   david   david 4.0K Mar 13   09:50   ..
    -rw-------   1   david   david 538M Mar 20   17:00   j._862
    -rw-------   1   david   david   88 Mar 20   17:00   lsn
db.serverStatus()
db.serverStatus()

1) Used connections




www.flickr.com/photos/armchaircaver/2061231069/
db.serverStatus()

2) Available connections
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2268] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters:
[ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 }
MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2d:27018 socket exception
Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2c:27018 socket exception
Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2a:27018 socket exception
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2d:27018
connPoolStats
>   db.runCommand("connPoolStats")
{
!   "hosts" : {
!   ! "config1:27019" : {
!   ! ! "available" : 2,
!   ! ! "created" : 6
!   ! },
!   ! "set1/rs1a:27018,rs1b:27018" : {
!   ! ! "available" : 1,
!   ! ! "created" : 249
!   ! },
        ...
!   },
!   "totalAvailable" : 5,
!   "totalCreated" : 1002,
!   "numDBClientConnection" : 3490,
!   "numAScopedConnection" : 3,
}
db.serverStatus()

3) Index counters
     "indexCounters" : {
     ! ! "btree" : {
     ! ! ! "accesses" : 15180175,
     ! ! ! "hits" : 15178725,
     ! ! ! "misses" : 1450,
     ! ! ! "resets" : 0,
     ! ! ! "missRatio" : 0.00009551932
     ! ! }
     ! },
db.serverStatus()

4) Op counters




www.flickr.com/photos/cosmic_bandita/2395369614/
db.serverStatus()

5) Background flushing




Picture is unrelated! Mmm, ice cream.
db.serverStatus()

6) Dur
rs.status()

       {
       !     "_id" : 1,
       !     "name" : "rs3b:27018",
       !     "health" : 1,
       !     "state" : 2,
       !     "stateStr" : "SECONDARY",
       !     "uptime" : 1886098,
       !     "optime" : {
       !     ! "t" : 1291252178000,
       !     ! "i" : 13
       !     },
       !     "optimeDate" : ISODate("2010-12-02T01:09:38Z"),
             "lastHeartbeat" : ISODate("2010-12-02T01:09:38Z")
       },


www.ex-astris-scientia.org/inconsistencies/ent_vs_tng.htm (yes it’s a replicator from Star Trek)
rs.status()

1) myState
                               Value           Meaning
                                 0   Starting up (phase 1)
                                 1   Primary
                                 2   Secondary
                                 3   Recovering
                                 4   Fatal error
                                 5   Starting up (phase 2)
                                 6   Unknown state
                                 7   Arbiter
                                 8   Down
en.wikipedia.org/wiki/State_of_matter
rs.status()

2) Optime

         "optimeDate" : ISODate("2010-12-02T01:09:38Z")




www.flickr.com/photos/robbie73/4244846566/
rs.status()

3) Heartbeat

         "lastHeartbeat" : ISODate("2010-12-02T01:09:38Z")




www.flickr.com/photos/drawblindfaith/3400981091/
mongostat
mongostat

 1) faults




Picture is unrelated! Snowmobile in Norway.
mongostat

2) locked




www.flickr.com/photos/bbusschots/4541573665/
mongostat

3) index miss




www.flickr.com/photos/gareandkitty/276471187/
mongostat

4) queues
mongostat

5) Diagnostics
Current operations
    db.currentOp();
    {
    ! ! ! "opid" : "shard1:299939199",
    ! ! ! "active" : true,
    ! ! ! "lockType" : "write",
    ! ! ! "waitingForLock" : false,
    ! ! ! "secs_running" : 15419,
    ! ! ! "op" : "remove",
    ! ! ! "ns" : "sd.metrics",
    ! ! ! "query" : {
    ! ! ! ! "accId" : 1391,
    ! ! ! ! "tA" : {
    ! ! ! ! ! "$lte" : ISODate("2010-11-24T19:53:00Z")
    ! ! ! ! }
    ! ! ! },
    ! ! ! "client" : "10.121.12.228:44426",
    ! ! ! "desc" : "conn"
    ! ! },
www.flickr.com/photos/jeffhester/2784666811/
Monitoring tools

Run yourself




    Ganglia
Monitoring tools

Server Density
Monitoring tools




www.mongomonitor.com
Recap
Recap

Keep it in RAM
Recap

Keep it in RAM

Watch your storage
Recap

Keep it in RAM

Watch your storage

db.serverStatus()

rs.status()
David Mytton

 @davidmytton

david@boxedice.com

www.mongomonitor.com

Woop Japan!

Monitoring MongoDB (MongoUK)

  • 1.
  • 2.
    It’s a littledifferent, but not entirely new.
  • 3.
    Keep it inRAM. Obviously. www.flickr.com/photos/comedynose/4388430444/
  • 4.
    How do youknow? > db.stats() { ! "collections" : 3, ! "objects" : 379970142, ! "avgObjSize" : 146.4554114991488, ! "dataSize" : 55648683504, 51GB ! "storageSize" : 61795435008, ! "numExtents" : 64, ! "indexes" : 1, ! "indexSize" : 21354514128, 19GB ! "fileSize" : 100816388096, ! "ok" : 1 } http://www.flickr.com/photos/comedynose/4388430444/
  • 5.
    Where should itgo? Should it be in What? memory? Indexes Always Data If you can http://www.flickr.com/photos/comedynose/4388430444/
  • 6.
    How you’ll know 1)Slow queries Thu Oct 14 17:01:11 [conn7410] update sd.apiLog query: { c: "android/setDeviceToken", a: 1466, u: "blah", ua: "Server Density Android" } 51926ms www.flickr.com/photos/tonivc/2283676770/
  • 7.
    How you’ll know 2)Timeouts cursor timed out (20000 ms)
  • 8.
    How you’ll know 3)Disk i/o spikes www.flickr.com/photos/daddo83/3406962115/
  • 9.
  • 10.
    Watch your storage 2)Sharding maxSize
  • 11.
    Watch your storage 3)Logging --quiet db.runCommand("logRotate"); killall -SIGUSR1 mongod
  • 12.
    Watch your storage 4)Journaling david@rs2b ~: ls -alh /mongodbdata/journal/ total 538M drwxrwxr-x 2 david david 29 Mar 20 16:50 . drwx------ 4 david david 4.0K Mar 13 09:50 .. -rw------- 1 david david 538M Mar 20 17:00 j._862 -rw------- 1 david david 88 Mar 20 17:00 lsn
  • 13.
  • 14.
  • 15.
  • 16.
    Fri Nov 1917:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1b") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1d") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1c") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2b") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2d") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2c") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2a") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2268] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters: [ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 } MessagingPort say send() errno:9 Bad file descriptor (NONE) Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2d:27018 socket exception Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE) Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2c:27018 socket exception Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE) Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2a:27018 socket exception Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1a") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1b") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1d") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1c") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2b") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2d") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2c") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2a") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2b") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2d") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2c") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2a") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1b") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1d") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1c") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1b") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1d") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1c") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2b") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2d") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2c") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2a") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2d:27018
  • 17.
    connPoolStats > db.runCommand("connPoolStats") { ! "hosts" : { ! ! "config1:27019" : { ! ! ! "available" : 2, ! ! ! "created" : 6 ! ! }, ! ! "set1/rs1a:27018,rs1b:27018" : { ! ! ! "available" : 1, ! ! ! "created" : 249 ! ! }, ... ! }, ! "totalAvailable" : 5, ! "totalCreated" : 1002, ! "numDBClientConnection" : 3490, ! "numAScopedConnection" : 3, }
  • 18.
    db.serverStatus() 3) Index counters "indexCounters" : { ! ! "btree" : { ! ! ! "accesses" : 15180175, ! ! ! "hits" : 15178725, ! ! ! "misses" : 1450, ! ! ! "resets" : 0, ! ! ! "missRatio" : 0.00009551932 ! ! } ! },
  • 19.
  • 20.
  • 21.
  • 22.
    rs.status() { ! "_id" : 1, ! "name" : "rs3b:27018", ! "health" : 1, ! "state" : 2, ! "stateStr" : "SECONDARY", ! "uptime" : 1886098, ! "optime" : { ! ! "t" : 1291252178000, ! ! "i" : 13 ! }, ! "optimeDate" : ISODate("2010-12-02T01:09:38Z"), "lastHeartbeat" : ISODate("2010-12-02T01:09:38Z") }, www.ex-astris-scientia.org/inconsistencies/ent_vs_tng.htm (yes it’s a replicator from Star Trek)
  • 23.
    rs.status() 1) myState Value Meaning 0 Starting up (phase 1) 1 Primary 2 Secondary 3 Recovering 4 Fatal error 5 Starting up (phase 2) 6 Unknown state 7 Arbiter 8 Down en.wikipedia.org/wiki/State_of_matter
  • 24.
    rs.status() 2) Optime "optimeDate" : ISODate("2010-12-02T01:09:38Z") www.flickr.com/photos/robbie73/4244846566/
  • 25.
    rs.status() 3) Heartbeat "lastHeartbeat" : ISODate("2010-12-02T01:09:38Z") www.flickr.com/photos/drawblindfaith/3400981091/
  • 26.
  • 27.
    mongostat 1) faults Pictureis unrelated! Snowmobile in Norway.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
    Current operations db.currentOp(); { ! ! ! "opid" : "shard1:299939199", ! ! ! "active" : true, ! ! ! "lockType" : "write", ! ! ! "waitingForLock" : false, ! ! ! "secs_running" : 15419, ! ! ! "op" : "remove", ! ! ! "ns" : "sd.metrics", ! ! ! "query" : { ! ! ! ! "accId" : 1391, ! ! ! ! "tA" : { ! ! ! ! ! "$lte" : ISODate("2010-11-24T19:53:00Z") ! ! ! ! } ! ! ! }, ! ! ! "client" : "10.121.12.228:44426", ! ! ! "desc" : "conn" ! ! }, www.flickr.com/photos/jeffhester/2784666811/
  • 33.
  • 34.
  • 38.
  • 39.
  • 40.
  • 41.
    Recap Keep it inRAM Watch your storage
  • 42.
    Recap Keep it inRAM Watch your storage db.serverStatus() rs.status()
  • 43.