Operational Best Practices
Tales from the field
The Plan
● Review support cases
  ○ Taken from real issues
  ○ Names/ips/dates changed to protect identities
● Analyze reported issues
● Distill best practices
● Summarize takeaways

● Repeat...
Scenario 1
● Fire, it is on fire!

● Users notice response times of 1-3 sec
● App logs show timeouts
● Server logs show socket exceptions
Scenario 1 - Diagnostics
● Logs

● Understanding the timeouts
  ○ Client read timeout set
  ○ Connection closed/discarded
  ○ Symptom, not cause

● Server connection exceptions
  ○ Match timing of client timeouts
  ○ Symptom, not cause
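
One way to check that the server exceptions match the timing of the client timeouts is to pair the two event streams by timestamp. A minimal sketch, assuming both logs have already been parsed into epoch-millisecond timestamps; the function name `correlate`, the sample values, and the 2-second window are illustrative and not taken from the support case.

```javascript
// Pair each client timeout with the server exceptions that occurred within
// windowMs of it. Timestamps are epoch milliseconds (assumed pre-parsed).
function correlate(clientTimeouts, serverExceptions, windowMs) {
  return clientTimeouts.map(function (t) {
    var nearby = serverExceptions.filter(function (s) {
      return Math.abs(s - t) <= windowMs;
    });
    return { clientTs: t, matchedServerTs: nearby };
  });
}

// Made-up sample data, for illustration only.
var clientTimeouts = [1000, 61000, 125000];
var serverExceptions = [1400, 61900, 300000];

var pairs = correlate(clientTimeouts, serverExceptions, 2000);
var matched = pairs.filter(function (p) {
  return p.matchedServerTs.length > 0;
});
console.log(matched.length + " of " + pairs.length +
  " client timeouts have a nearby server exception");
```

A high match rate supports the reading that both log entries are symptoms of the same underlying event, not independent problems.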
Scenario 1 - Monitoring
Graphs speak a thousand words
Scenario 1 - Takeaways
● Monitor Logs
  ○ Alert, escalate
  ○ Correlate
● Disk
  ○ Monitor
  ○ Moved to RAID (10)
● Instrument/Monitor App
● Know your application and its (write) characteristics
Scenario 2
● Alerts warn that server is running hot
● Random (small) slowdowns
● Increased traffic/queries
Scenario 2 - Symptoms
● High CPU use
● Similar query pattern
Scenario 2 - Diagnostics
● Turn on DB Profiling
● Look at logs


Identify the query patterns that take longest or run most frequently, then run explain on them
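
Ranking query patterns by frequency and total time can be sketched as below. This is an illustration under assumptions: the entries are invented, and their field names (`ns`, `query`, `millis`) are assumed to resemble what the database profiler records; check the profiler output of your own version before relying on them.

```javascript
// Describe a query by the sorted list of its field names, ignoring values,
// so that queries differing only in their arguments group together.
function queryShape(query) {
  return Object.keys(query).sort().join(",");
}

// Group profiler-style entries by namespace and query shape, then rank the
// groups by total time spent.
function rankByShape(entries) {
  var byShape = {};
  entries.forEach(function (e) {
    var key = e.ns + " {" + queryShape(e.query) + "}";
    if (!byShape[key]) {
      byShape[key] = { shape: key, count: 0, totalMillis: 0, maxMillis: 0 };
    }
    var g = byShape[key];
    g.count += 1;
    g.totalMillis += e.millis;
    g.maxMillis = Math.max(g.maxMillis, e.millis);
  });
  return Object.keys(byShape)
    .map(function (k) { return byShape[k]; })
    .sort(function (a, b) { return b.totalMillis - a.totalMillis; });
}

// Hypothetical entries; collection and field names are invented.
var sample = [
  { ns: "app.users", query: { email: "a@example.com" }, millis: 3 },
  { ns: "app.orders", query: { userId: 7, status: "open" }, millis: 120 },
  { ns: "app.orders", query: { status: "open", userId: 9 }, millis: 95 },
  { ns: "app.users", query: { email: "b@example.com" }, millis: 4 }
];

var ranked = rankByShape(sample);
ranked.forEach(function (g) {
  console.log(g.shape + " count=" + g.count +
    " total=" + g.totalMillis + "ms max=" + g.maxMillis + "ms");
});
```

The shape at the top of the ranking is the first candidate for explain.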
Scenario 2 - Explain
db.scenario2.find({...}).sort({...}).explain()
{
     "cursor" : "BtreeCursor ABC",
     "nscanned" : 160677,
     "nscannedObjects" : 12015,
     "n" : 55,
     "millis" : 99,
     "scanAndOrder" : true,
     "indexBounds" : {...}
}
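
To make the explain numbers easier to read, the sketch below computes how much work the query did per returned document. The values are copied from the output above; the helper name `assessExplain` and the report fields are illustrative.

```javascript
// Values copied from the explain() output in this scenario.
var explainOutput = {
  cursor: "BtreeCursor ABC",
  nscanned: 160677,
  nscannedObjects: 12015,
  n: 55,
  millis: 99,
  scanAndOrder: true
};

// Summarize the work done per returned document.
function assessExplain(e) {
  var returned = Math.max(e.n, 1); // avoid dividing by zero
  return {
    keysScannedPerResult: e.nscanned / returned,
    docsScannedPerResult: e.nscannedObjects / returned,
    sortedInMemory: e.scanAndOrder === true
  };
}

var report = assessExplain(explainOutput);
console.log("index entries scanned per result: " +
  report.keysScannedPerResult.toFixed(1));
console.log("documents scanned per result: " +
  report.docsScannedPerResult.toFixed(1));
console.log("in-memory sort: " + report.sortedInMemory);
```

Roughly 2,900 index entries and over 200 documents are examined for each of the 55 results, and `scanAndOrder: true` means the results were sorted after retrieval because index order could not be used for the sort.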
Scenario 2 - Diagnostics
● Create a compound index
  ○ Used for criteria and sort
  ○ Reduced CPU dramatically
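
The deck elides the actual fields, so the sketch below uses a hypothetical query to show what an index "used for criteria and sort" might look like. It follows a common rule of thumb (equality criteria first, then sort fields); the field names `status` and `createdAt` and the helper `compoundIndexSpec` are invented, and range criteria would need separate consideration.

```javascript
// Build a compound index key spec: equality-criteria fields first, then the
// sort fields with their sort direction, so one index can serve both.
function compoundIndexSpec(criteriaFields, sortSpec) {
  var spec = {};
  criteriaFields.forEach(function (f) {
    spec[f] = 1;
  });
  Object.keys(sortSpec).forEach(function (f) {
    if (!(f in spec)) {
      spec[f] = sortSpec[f];
    }
  });
  return spec;
}

// Hypothetical query: find({ status: "open" }).sort({ createdAt: -1 })
var spec = compoundIndexSpec(["status"], { createdAt: -1 });
console.log(JSON.stringify(spec)); // {"status":1,"createdAt":-1}
```

The resulting object is the kind of key document passed to the shell's index-creation helper (`ensureIndex` in shells of this era, `createIndex` in later ones).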
Scenario 2 - Takeaways
● Performance test/analyze system behavior
● Load test before deployment
● Alert on abnormal states
● High CPU is a sign of poorly indexed queries
● Rolling upgrade for indexes
Scenario 3
● General slowdown on login
● High disk utilization
Scenario 3 - Diagnostics
iostat
Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz     await    svctm   %util
sdp        0.00    0.00  0.50  0.00   27.86    0.00     56.00    149.58  20320.00  2010.00  100.00
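
The iostat row is easier to reason about with each value attached to its column name. A minimal sketch that parses the header and row shown above; the function names and the alert thresholds are assumptions chosen for illustration, not recommended values.

```javascript
// Column names and values copied from the iostat output in this scenario.
var header = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util";
var row = "sdp 0.00 0.00 0.50 0.00 27.86 0.00 56.00 149.58 20320.00 2010.00 100.00";

// Zip the header and one data row into an object keyed by column name.
function parseIostatRow(headerLine, dataLine) {
  var names = headerLine.trim().split(/\s+/);
  var values = dataLine.trim().split(/\s+/);
  var out = { device: values[0] };
  for (var i = 1; i < names.length; i++) {
    out[names[i]] = parseFloat(values[i]);
  }
  return out;
}

// Thresholds are illustrative assumptions.
function looksSaturated(stats, maxAwaitMs, maxUtilPct) {
  return stats["await"] > maxAwaitMs || stats["%util"] >= maxUtilPct;
}

var stats = parseIostatRow(header, row);
console.log(stats.device + " await=" + stats["await"] + "ms util=" +
  stats["%util"] + "% queue=" + stats["avgqu-sz"]);
console.log("saturated: " + looksSaturated(stats, 50, 95));
```

The device completes only 0.5 reads per second, yet requests wait about 20 seconds on average and the disk is 100% utilized, which is the disproportion the scenario goes on to explain.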
Scenario 3
$ blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  8096   512  4048          0   1099494850560   /dev/sdp

Huge read-ahead of about 4 MB (RA is 8096 sectors of 512 bytes)
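
The 4 MB figure follows from the RA column, which blockdev reports in 512-byte sectors. A small worked calculation; the function name is illustrative.

```javascript
// blockdev reports read-ahead (RA) in 512-byte sectors.
var SECTOR_BYTES = 512;

function readAheadBytes(raSectors) {
  return raSectors * SECTOR_BYTES;
}

var ra = 8096; // RA value from the blockdev --report output above
var bytes = readAheadBytes(ra);
console.log(bytes + " bytes = " +
  (bytes / (1024 * 1024)).toFixed(2) + " MiB"); // 4145152 bytes = 3.95 MiB
```

With a read-ahead this large, each small random read can pull roughly 4 MB from disk, which is consistent with a device that is saturated while serving very few requests.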
Scenario 3 - Takeaways
● Pay attention to disk configurations
● Load testing would have found this early
● MongoDB depends on the OS a lot
● Connect the dots from disproportionate effects
Best Practices Learned
● System provisioning
  ○ Capacity
  ○ Performance
  ○ Scale
  ○ Configuration
● Logs
  ○ Review
  ○ Alert
  ○ Rotate and collect (per cluster)
Best Practices Learned
● Query/Index Analysis
  ○ Database Profiler
  ○ Run explain periodically (sampled)
  ○ Instrument code, generate metrics
● Plan/test rollouts
  ○ Rolling upgrade for Replica Set
  ○ Generate indexes on secondaries first
  ○ Name services, use redirection
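
The "secondaries first" ordering for a rolling change can be sketched as a simple plan. This is an illustration only: the host names are hypothetical, the member list is hand-written instead of read from a live replica set, and the primary is assumed to step down before its turn.

```javascript
// Order replica set members for a rolling change: secondaries first,
// the primary last (after it has stepped down).
function rollingOrder(members) {
  var secondaries = members.filter(function (m) {
    return m.state === "SECONDARY";
  });
  var primaries = members.filter(function (m) {
    return m.state === "PRIMARY";
  });
  return secondaries.concat(primaries).map(function (m) {
    return m.host;
  });
}

// Hypothetical three-member replica set.
var members = [
  { host: "db1.example.net:27017", state: "PRIMARY" },
  { host: "db2.example.net:27017", state: "SECONDARY" },
  { host: "db3.example.net:27017", state: "SECONDARY" }
];
console.log(rollingOrder(members).join(" -> "));
```

Working through the secondaries one at a time keeps the set available, since only one member is out of rotation at any point.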
Thanks, more refs
Please take a look at http://mongodb.org (docs)

● Ask on mongodb-user group
● Use MMS or historic monitoring
   ○ Watch for trends
   ○ Create alerts
   ○ Forecast capacity for provisioning
● logrotate unix command
● monitor disk - munin or the like
● iostat, dstat, vmstat, free, netstat
Questions

MongoDB Operational Best Practices (mongosf2012)
