MongoDB Operational Best Practices (mongosf2012)

2. The Plan ● Review support cases ○ Taken from real issues ○ Names/IPs/dates changed to protect identities ● Analyze reported issues ● Distill best practices ● Summarize takeaways ● Repeat...

3. Scenario 1 ● Fire, it is on fire! ● Users notice response times of 1-3 sec ● App logs show timeouts ● Server logs show socket exceptions

4. Scenario 1 - Diagnostics ● Logs ● Understanding the timeouts ○ Client read timeout set ○ Connection closed/discarded ○ Symptom not cause ● Server connection exceptions ○ Match timing of client timeouts ○ Symptom not cause
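The "match timing" step on this slide can be sketched as a small script. This is only an illustration: the function name `correlateEvents`, the 2-second window, and the sample timestamps are assumptions, and extracting timestamps from the actual client and server logs is not shown.

```javascript
// Sketch: check whether each client timeout has a server socket exception nearby in time.
// Inputs are epoch-millisecond timestamps already extracted from the logs (extraction not shown).
function correlateEvents(clientTimeouts, serverExceptions, windowMs = 2000) {
  return clientTimeouts.map((t) => ({
    timeout: t,
    // True when at least one server exception falls within +/- windowMs of the timeout.
    matched: serverExceptions.some((s) => Math.abs(s - t) <= windowMs),
  }));
}

// Hypothetical sample data, not taken from the support case.
const clientTimeouts = [1000, 61000, 125000];
const serverExceptions = [1500, 60200, 300000];

const correlated = correlateEvents(clientTimeouts, serverExceptions);
const matchedCount = correlated.filter((c) => c.matched).length;
console.log(`${matchedCount} of ${correlated.length} client timeouts line up with server exceptions`);
```

A high match rate supports the slide's reading that both log entries are symptoms of the same underlying problem rather than separate causes.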

5. Scenario 1 - Monitoring ● Graphs speak a thousand words

6. Scenario 1 - Takeaways ● Monitor logs ○ Alert, escalate ○ Correlate ● Disk ○ Monitor ○ Moved to RAID (10) ● Instrument/Monitor App ● Know your application and its (write) characteristics

7. Scenario 2 ● Alerts warn that server is running hot ● Random (small) slowdowns ● Increased traffic/queries

8. Scenario 2 - Symptoms ● High CPU use ● Similar query patterns

9. Scenario 2 - Diagnostics ● Turn on DB Profiling ● Look at logs ● Identify the query patterns that take longest or run most frequently, and run explain on them
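One way to pick out the "longest or most frequent" patterns is to group profiler entries by query shape. The sketch below assumes entries shaped like legacy `system.profile` documents (with `ns`, `query`, and `millis` fields); the helper names and the sample entries are hypothetical.

```javascript
// Sketch: rank query patterns from profiler entries by total time spent.
// A "pattern" here is the namespace plus the sorted list of top-level query fields.
function queryShape(query) {
  return Object.keys(query).sort().join(",");
}

function rankPatterns(entries) {
  const byPattern = new Map();
  for (const e of entries) {
    const key = `${e.ns} {${queryShape(e.query)}}`;
    const agg = byPattern.get(key) || { pattern: key, count: 0, totalMillis: 0, maxMillis: 0 };
    agg.count += 1;
    agg.totalMillis += e.millis;
    agg.maxMillis = Math.max(agg.maxMillis, e.millis);
    byPattern.set(key, agg);
  }
  // Highest total time first; sort by count instead to find the most frequent patterns.
  return [...byPattern.values()].sort((a, b) => b.totalMillis - a.totalMillis);
}

// Hypothetical profiler entries, for illustration only.
const profileEntries = [
  { ns: "app.scenario2", query: { status: "open", owner: 1 }, millis: 99 },
  { ns: "app.scenario2", query: { owner: 2, status: "closed" }, millis: 120 },
  { ns: "app.users", query: { _id: 7 }, millis: 1 },
];

const ranked = rankPatterns(profileEntries);
console.log(ranked[0]);
```

The top entries of such a ranking are the candidates to run explain on.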

10. Scenario 2 - Explain
db.scenario2.find({...}).sort({...}).explain()
{
    "cursor" : "BtreeCursor ABC",
    "nscanned" : 160677,
    "nscannedObjects" : 12015,
    "n" : 55,
    "millis" : 99,
    "scanAndOrder" : true,
    "indexBounds" : {...}
}
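The numbers in this explain output can be read mechanically. The sketch below checks two signals visible on the slide: how many index entries were scanned per returned document, and whether the sort ran in memory (`scanAndOrder`). The function name `assessExplain` and the ratio threshold of 10 are assumptions chosen for illustration, not MongoDB defaults.

```javascript
// Sketch: flag inefficiency signals in a legacy-format explain() document.
function assessExplain(explain, maxScanRatio = 10) {
  const findings = [];
  // Index entries examined per document returned; guard against n = 0.
  const scanRatio = explain.nscanned / Math.max(explain.n, 1);
  if (scanRatio > maxScanRatio) {
    findings.push(`scanned about ${Math.round(scanRatio)} index entries per returned document`);
  }
  if (explain.scanAndOrder) {
    findings.push("results were sorted in memory (scanAndOrder), index order was not used for the sort");
  }
  return { scanRatio, findings };
}

// Values copied from the slide; the elided indexBounds field is left out.
const slideExplain = {
  cursor: "BtreeCursor ABC",
  nscanned: 160677,
  nscannedObjects: 12015,
  n: 55,
  millis: 99,
  scanAndOrder: true,
};

const explainReport = assessExplain(slideExplain);
console.log(explainReport.findings.join("\n"));
```

For the slide's values, 160677 scanned entries for 55 results works out to roughly 2921 entries per returned document, and both checks fire.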

11. Scenario 2 - Diagnostics ● Create a compound index ○ Used for criteria and sort ○ Reduced CPU usage dramatically
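A compound index that serves both the criteria and the sort can be sketched as a key specification with the criteria fields first and the sort fields after them. The slides elide the actual query, so the field names below (`userId`, `status`, `createdAt`) are hypothetical, and the equality-then-sort ordering is a common rule of thumb rather than something the deck states.

```javascript
// Sketch: build a compound index key spec from equality criteria fields and a sort spec.
function compoundIndexSpec(equalityFields, sortSpec) {
  const spec = {};
  // Equality criteria fields go first, ascending.
  for (const field of equalityFields) {
    spec[field] = 1;
  }
  // Sort fields follow, keeping the sort direction; skip fields already present.
  for (const [field, direction] of Object.entries(sortSpec)) {
    if (!(field in spec)) {
      spec[field] = direction;
    }
  }
  return spec;
}

// Hypothetical fields; the real query on the slide is elided as {...}.
const indexSpec = compoundIndexSpec(["userId", "status"], { createdAt: -1 });
console.log(JSON.stringify(indexSpec)); // {"userId":1,"status":1,"createdAt":-1}

// In the mongo shell the spec would be passed to the index-creation helper, e.g.
// db.scenario2.ensureIndex(indexSpec) in 2012-era shells, createIndex in current ones.
```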

12. Scenario 2 - Takeaways ● Performance test/analyze system behavior ● Load test before deployment ● Alert on abnormal states ● High CPU is often a sign of poorly indexed queries ● Rolling upgrade for indexes

13. Scenario 3 ● General slowdown on login ● High disk utilization

14. Scenario 3 - Diagnostics ● iostat
Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz     await    svctm   %util
sdp        0.00    0.00  0.50  0.00   27.86    0.00     56.00    149.58  20320.00  2010.00  100.00
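The iostat row above can be checked automatically. The sketch below parses a header line and a device line into named fields and flags high `await` and `%util`. The function names and the alert limits (50 ms await, 90% utilization) are assumptions for illustration; suitable limits depend on the hardware.

```javascript
// Sketch: parse one iostat device row and flag saturation signals.
function parseIostatRow(headerLine, valueLine) {
  const headers = headerLine.trim().split(/\s+/).slice(1); // drop the "Device:" label
  const [device, ...values] = valueLine.trim().split(/\s+/);
  const stats = { device };
  headers.forEach((name, i) => {
    stats[name] = Number(values[i]);
  });
  return stats;
}

function diskWarnings(stats, limits = { awaitMs: 50, utilPct: 90 }) {
  const warnings = [];
  if (stats["await"] > limits.awaitMs) {
    warnings.push(`${stats.device}: await ${stats["await"]} ms exceeds ${limits.awaitMs} ms`);
  }
  if (stats["%util"] >= limits.utilPct) {
    warnings.push(`${stats.device}: utilization ${stats["%util"]}% at or above ${limits.utilPct}%`);
  }
  return warnings;
}

// Header and values copied from the slide.
const iostatHeader = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util";
const iostatValues = "sdp 0.00 0.00 0.50 0.00 27.86 0.00 56.00 149.58 20320.00 2010.00 100.00";

const diskStats = parseIostatRow(iostatHeader, iostatValues);
const diskAlerts = diskWarnings(diskStats);
console.log(diskAlerts.join("\n"));
```

With the slide's values, both checks fire: requests wait about 20 seconds on average and the device is 100% utilized while completing only 0.5 reads per second.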

15. Scenario 3
$ blockdev --report
RO    RA  SSZ   BSZ  StartSec           Size  Device
rw  8096  512  4048         0  1099494850560  /dev/sdp
● Huge read-ahead of about 4 MB (RA is reported in 512-byte sectors)
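The "4MB" figure follows from the RA column, which blockdev reports in 512-byte sectors. A minimal worked conversion, with the function name `readAheadBytes` chosen here for illustration:

```javascript
// Sketch: convert the blockdev RA value (512-byte sectors) into bytes and MiB.
const SECTOR_BYTES = 512;

function readAheadBytes(raSectors) {
  return raSectors * SECTOR_BYTES;
}

const raBytes = readAheadBytes(8096); // RA value from the slide
const raMiB = raBytes / (1024 * 1024);
console.log(`${raBytes} bytes, about ${raMiB.toFixed(2)} MiB`);
```

So 8096 sectors is 4,145,152 bytes, just under 4 MiB of read-ahead on each read, which is presumably why the slide calls it huge for a workload of small random reads.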

16. Scenario 3 - Takeaways ● Pay attention to disk configurations ● Load testing would have found this early ● MongoDB depends on the OS a lot ● Connect the dots from disproportionate effects

17. Best Practices Learned ● System provisioning ○ Capacity ○ Performance ○ Scale ○ Configuration ● Logs ○ Review ○ Alert ○ Rotate and collect (per cluster)

18. Best Practices Learned ● Query/Index Analysis ○ Database Profiler ○ Run explain periodically (sampled) ○ Instrument code, generate metrics ● Plan/test rollouts ○ Rolling upgrade for Replica Set ○ Generate indexes on secondaries first ○ Name services, use redirection

19. Thanks, more refs ● Please take a look at http://mongodb.org (docs) ● Ask on mongodb-user group ● Use MMS or historic monitoring ○ Watch for trends ○ Create alerts ○ Forecast capacity for provisioning ● logrotate unix command ● Monitor disk - munin or the like ● iostat, dstat, vmstat, free, netstat

20. Questions
