MongoDB Operational Best Practices (mongosf2012)

Learn MongoDB operational best practices through examples from the field.

  1. Operational Best Practices: Tales from the field
  2. The Plan
     ● Review support cases
       ○ Taken from real issues
       ○ Names/IPs/dates changed to protect identities
     ● Analyze reported issues
     ● Distill best practices
     ● Summarize takeaways
     ● Repeat...
  3. Scenario 1
     ● Fire, it is on fire!
     ● Users notice response times of 1-3 sec
     ● App logs show timeouts
     ● Server logs show socket exceptions
  4. Scenario 1 - Diagnostics
     ● Logs
     ● Understanding the timeouts
       ○ Client read timeout set
       ○ Connection closed/discarded
       ○ Symptom, not cause
     ● Server connection exceptions
       ○ Match timing of client timeouts
       ○ Symptom, not cause
  5. Scenario 1 - Monitoring
     Graphs speak a thousand words
  6. Scenario 1 - Takeaways
     ● Monitor logs
       ○ Alert, escalate
       ○ Correlate
     ● Disk
       ○ Monitor
       ○ Moved to RAID (10)
     ● Instrument/monitor app
     ● Know your application and application (write) characteristics
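The "correlate" step above can be sketched in a few lines. This is a minimal illustration, not the tooling used in the original case: the timestamps, the function name `correlate`, and the 2-second window are all assumptions made for the example. It pairs each client timeout with any server socket exception logged close to it in time.

```python
from datetime import datetime, timedelta


def correlate(client_timeouts, server_exceptions, window_seconds=2):
    """Pair each client timeout with server exceptions logged within the window."""
    window = timedelta(seconds=window_seconds)
    pairs = []
    for client_ts in client_timeouts:
        nearby = [s for s in server_exceptions if abs(s - client_ts) <= window]
        pairs.append((client_ts, nearby))
    return pairs


# Hypothetical timestamps, as if already parsed out of app and server logs.
client = [
    datetime(2012, 5, 4, 10, 0, 5),
    datetime(2012, 5, 4, 10, 7, 30),
]
server = [
    datetime(2012, 5, 4, 10, 0, 4),
    datetime(2012, 5, 4, 10, 0, 6),
    datetime(2012, 5, 4, 11, 0, 0),
]

matched = correlate(client, server)
for client_ts, nearby in matched:
    print(client_ts.isoformat(), "->", len(nearby), "server exception(s) nearby")
```

A timeout with matching server exceptions confirms that both logs describe the same event; it does not by itself identify the cause, which in this scenario was the disk.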
  7. Scenario 2
     ● Alerts warn that server is running hot
     ● Random (small) slowdowns
     ● Increased traffic/queries
  8. Scenario 2 - Symptoms
     ● High CPU use
     ● Similar query pattern
  9. Scenario 2 - Diagnostics
     ● Turn on DB profiling
     ● Look at logs
     Identify the query patterns that take longest or run most frequently, and run explain on them
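One way to find the query patterns worth explaining is to group profiler entries by their "shape" (namespace plus the set of fields queried). The sketch below assumes simplified, hypothetical profiler documents with `ns`, `query`, and `millis` fields; real `system.profile` documents carry more fields and their layout varies by server version.

```python
from collections import defaultdict


def query_shape(query):
    """Reduce a query document to its sorted field names, ignoring values."""
    return tuple(sorted(query.keys()))


def summarize(entries):
    """Group entries by (namespace, shape); return groups sorted by total time."""
    stats = defaultdict(lambda: {"count": 0, "total_millis": 0})
    for entry in entries:
        key = (entry["ns"], query_shape(entry.get("query", {})))
        stats[key]["count"] += 1
        stats[key]["total_millis"] += entry.get("millis", 0)
    return sorted(stats.items(), key=lambda kv: kv[1]["total_millis"], reverse=True)


# Hypothetical, simplified profiler entries (field names and values are made up).
sample_entries = [
    {"ns": "app.scenario2", "query": {"status": "open", "owner": 7}, "millis": 99},
    {"ns": "app.scenario2", "query": {"owner": 9, "status": "closed"}, "millis": 101},
    {"ns": "app.users", "query": {"email": "a@example.com"}, "millis": 3},
]

summary = summarize(sample_entries)
for (ns, shape), s in summary:
    print(ns, shape, s["count"], "calls,", s["total_millis"], "ms total")
```

Sorting by total time surfaces both slow queries and cheap queries that run very often, which are the two cases the slide calls out.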
  10. Scenario 2 - Explain
      db.scenario2.find({...}).sort({...}).explain()
      {
          "cursor" : "BtreeCursor ABC",
          "nscanned" : 160677,
          "nscannedObjects" : 12015,
          "n" : 55,
          "millis" : 99,
          "scanAndOrder" : true,
          "indexBounds" : {...}
      }
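The explain output above shows two warning signs: far more index entries scanned than documents returned, and an in-memory sort (`scanAndOrder`). A small sketch of how such a check might be automated follows; the function name `explain_warnings` and the ratio threshold of 10 are assumptions chosen for illustration, and the elided `indexBounds` field is left out.

```python
def explain_warnings(explain, max_scan_ratio=10):
    """Return human-readable warnings for a legacy-format explain document."""
    warnings = []
    returned = explain.get("n", 0)
    scanned = explain.get("nscanned", 0)
    if returned and scanned / returned > max_scan_ratio:
        warnings.append(
            "scanned %d index entries to return %d documents" % (scanned, returned)
        )
    elif not returned and scanned:
        warnings.append("scanned %d index entries and returned nothing" % scanned)
    if explain.get("scanAndOrder"):
        warnings.append("results sorted in memory; the index does not cover the sort")
    return warnings


# Values copied from the slide.
slide_explain = {
    "cursor": "BtreeCursor ABC",
    "nscanned": 160677,
    "nscannedObjects": 12015,
    "n": 55,
    "millis": 99,
    "scanAndOrder": True,
}

found = explain_warnings(slide_explain)
for warning in found:
    print("WARNING:", warning)
```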
  11. Scenario 2 - Diagnostics
      ● Create a compound index
        ○ Used for criteria and sort
        ○ Reduced CPU dramatically
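The slides elide the actual query and sort fields, so the following is only a sketch of how a key specification serving both criteria and sort could be assembled. The field names `status`, `owner`, and `created_at` are hypothetical, and placing equality fields before sort fields is a common guideline assumed here rather than something the deck states.

```python
ASCENDING = 1
DESCENDING = -1


def compound_index_keys(equality_fields, sort_spec):
    """Build an ordered key list: equality criteria first, then sort fields."""
    keys = [(field, ASCENDING) for field in equality_fields]
    seen = set(equality_fields)
    for field, direction in sort_spec:
        if field not in seen:  # do not index the same field twice
            keys.append((field, direction))
            seen.add(field)
    return keys


# Hypothetical fields; the real ones are not shown in the slides.
index_keys = compound_index_keys(
    ["status", "owner"],
    [("created_at", DESCENDING)],
)
print(index_keys)
```

A list of `(field, direction)` pairs in this form is what PyMongo's `Collection.create_index` accepts, so the result could be passed to it directly against a real collection.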
  12. Scenario 2 - Takeaways
      ● Performance test/analyze system behavior
      ● Load test before deployment
      ● Alert on abnormal states
      ● High CPU is a sign of poorly indexed queries
      ● Rolling upgrade for indexes
  13. Scenario 3
      ● General slowdown on login
      ● High disk utilization
  14. Scenario 3 - Diagnostics
      iostat
      Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz     await    svctm   %util
      sdp        0.00    0.00  0.50  0.00   27.86    0.00     56.00    149.58  20320.00  2010.00  100.00
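The iostat line above shows a device at 100% utilization with an average wait of more than 20 seconds per request. As a rough sketch of how such output could feed an alert, the snippet below parses the header and row from the slide; the thresholds in `is_saturated` are illustrative assumptions, not recommended values.

```python
HEADER = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util"
ROW = "sdp 0.00 0.00 0.50 0.00 27.86 0.00 56.00 149.58 20320.00 2010.00 100.00"


def parse_iostat_row(header, row):
    """Map the column names of an extended iostat header onto one device row."""
    names = header.split()[1:]  # drop the leading "Device:" label
    fields = row.split()
    device = fields[0]
    values = [float(v) for v in fields[1:]]
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return device, dict(zip(names, values))


def is_saturated(stats, await_ms_limit=100.0, util_limit=90.0):
    """Flag a device whose wait time or utilization exceeds the assumed limits."""
    return stats["await"] > await_ms_limit or stats["%util"] >= util_limit


device, stats = parse_iostat_row(HEADER, ROW)
print(device, "await(ms):", stats["await"], "util(%):", stats["%util"])
print("saturated:", is_saturated(stats))
```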
  15. Scenario 3
      $ blockdev --report
      RO    RA   SSZ   BSZ   StartSec            Size   Device
      rw  8096   512  4048          0   1099494850560   /dev/sdp
      Huge read-ahead of 4MB
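The "4MB" figure follows from the RA column, which `blockdev` reports in 512-byte sectors. A short arithmetic check, using the value from the slide (the helper name `readahead_bytes` is made up for this example):

```python
SECTOR_BYTES = 512  # blockdev reports read-ahead in 512-byte sectors


def readahead_bytes(ra_sectors, sector_bytes=SECTOR_BYTES):
    """Convert a read-ahead setting in sectors to bytes."""
    return ra_sectors * sector_bytes


ra_bytes = readahead_bytes(8096)  # RA value from the blockdev report above
ra_mib = ra_bytes / (1024 * 1024)
print("read-ahead: %d bytes (%.2f MiB)" % (ra_bytes, ra_mib))
```

With read-ahead this large, each small random read can pull roughly 4 MB from disk, which is consistent with the saturated device seen in the iostat output.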
  16. Scenario 3 - Takeaways
      ● Pay attention to disk configurations
      ● Load testing would have found this early
      ● MongoDB depends on the OS a lot
      ● Connect the dots from disproportionate effects
  17. Best Practices Learned
      ● System provisioning
        ○ Capacity
        ○ Performance
        ○ Scale
        ○ Configuration
      ● Logs
        ○ Review
        ○ Alert
        ○ Rotate and collect (per cluster)
  18. Best Practices Learned
      ● Query/index analysis
        ○ Database Profiler
        ○ Run explain periodically (sampled)
        ○ Instrument code, generate metrics
      ● Plan/test rollouts
        ○ Rolling upgrade for replica set
        ○ Generate indexes on secondaries first
        ○ Name services, use redirection
  19. Thanks, more refs
      Please take a look at http://mongodb.org (docs)
      ● Ask on the mongodb-user group
      ● Use MMS or historic monitoring
        ○ Watch for trends
        ○ Create alerts
        ○ Forecast capacity for provisioning
      ● logrotate Unix command
      ● Monitor disk with munin or the like
      ● iostat, dstat, vmstat, free, netstat
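"Watch for trends" and "forecast capacity" can be as simple as fitting a straight line to historic usage. The sketch below uses an ordinary least-squares fit over made-up daily disk-usage samples; the data, the capacity figure, and the function names are all hypothetical, and real growth is rarely this linear.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days, used_gb, capacity_gb):
    """Estimate days remaining after the last sample, or None if usage is not growing."""
    slope, intercept = linear_fit(days, used_gb)
    if slope <= 0:
        return None
    return (capacity_gb - intercept) / slope - days[-1]


# Hypothetical samples: day index and disk usage in GB.
sample_days = [0, 1, 2, 3]
sample_used = [100.0, 110.0, 120.0, 130.0]

remaining = days_until_full(sample_days, sample_used, capacity_gb=200.0)
print("estimated days until full:", remaining)
```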
  20. Questions