SlideShare a Scribd company logo
1 of 20
Download to read offline
Operational Best
   Practices
Tales from the field
The Plan
● Review support cases
  ○ Taken from real issues
  ○ Names/ips/dates changed to protect identities
● Analyze reported issues
● Distill best practices
● Summarize takeaways

● Repeat...
Scenario 1
● Fire, it is on fire!

● Users notice response time takes 1-3 sec
● App logs show timeouts
● Server log show socket exceptions
Scenario 1 - Diagnostics
● Logs

● Understanding the timeouts
   ○ Client read timeout set
   ○ Connection closed/discarded
   ○ Symptom not cause


● Server connection exceptions
  ○ Match timing of client timeouts
   ○ Symptom not cause
Scenario 1 - Monitoring
Graphs speak a thousand words
Scenario 1 - Takeaways
● Monitor Logs
  ○ Alert, escalate
  ○ Correlate
● Disk
   ○ Monitor
   ○ Moved to RAID (10)
● Instrument/Monitor App
● Know your application and application (write)
  characteristics
Scenario 2
● Alerts warn that server is running hot
● Random (small) slowdowns
● Increased traffic/queries
Scenario 2 - Symptoms
High use cpu




Similar query
pattern
Scenario 2 - Diagnostics
● Turn on DB Profiling
● Look at logs


Identify query patterns taking longest or with
highest frequency and run explain
Scenario 2 - Explain
db.scenario2.find({...}).sort({...}).explain() {
     "cursor" : "BtreeCursor ABC",
     "nscanned" : 160677,
     "nscannedObjects" : 12015,
     "n" : 55,
     "millis" : 99,
     "scanAndOrder" : true,
     "indexBounds" : {...} }
Scenario 2 - Diagnostics
● Create a compound index
  ○ Used for criteria and sort
  ○ Reduced CPU dramatically
Scenario 2 - Takeaways
●   Performance test/analyze system behavior
●   Load test before deployment
●   Alert on abnormal states
●   High CPU is a sign of poorly indexed
●   Rolling upgrade for indexes
Scenario 3
● General slowdown on login
● High disk utilization
Scenario 3 - Diagnostics
iostat
Device:   rrqm/s wrqm/s r/s   w/s rsec/s wsec/s avgrq-sz avgqu-sz await   svctm %util
sdp       0.00 0.00 0.50      0.00 27.86 0.00 56.00 149.58        20320.00 2010.00 100.00
Scenario 3
$ blockdev --report
RO RA SSZ BSZ StartSec       Size    Device
rw 8096 512 4048    0 1099494850560 /dev/sdp


Huge read-ahead of 4MB
Scenario 3 - Takeaways
●   Pay attention to disk configurations
●   Load testing would have found this early
●   MongoDB depends on the OS a lot
●   Connect the dots from disportionate effects
Best Practices Learned
● System provisioning
  ○   Capacity
  ○   Performance
  ○   Scale
  ○   Configuration
● Logs
  ○ Review
  ○ Alert
  ○ Rotate and collect (per cluster)
Best Practices Learned
● Query/Index Analysis
  ○ Database Profiler
  ○ Run explain periodically (sampled)
  ○ Instrument code, generate metrics
● Plan/test rollouts
  ○ Rolling upgrade for Replica Set
  ○ Generate indexes on secondaries first
  ○ Name services, use redirection
Thanks, more refs
Please take a look at http://mongodb.org (docs)

● Ask on mongodb-user group
● Use MMS or historic monitoring
   ○ Watch for trends
   ○ Create alerts
   ○ Forecast capacity for provisioning
● logrotate unix command
● monitor disk - munin or the like
● iostat, dstat, vmstat, free, netstat
Questions

More Related Content

Viewers also liked

Java Development with MongoDB
Java Development with MongoDBJava Development with MongoDB
Java Development with MongoDBScott Hernandez
 
Petty Cash Management - How To Manage Logs and Transactions
Petty Cash Management - How To Manage Logs and TransactionsPetty Cash Management - How To Manage Logs and Transactions
Petty Cash Management - How To Manage Logs and TransactionsDavid Olson
 
Managing the logs of your (Rails) applications - RailsWayCon 2011
Managing the logs of your (Rails) applications - RailsWayCon 2011Managing the logs of your (Rails) applications - RailsWayCon 2011
Managing the logs of your (Rails) applications - RailsWayCon 2011lennartkoopmann
 
"Grand Challenges" of Log Management
"Grand Challenges" of Log Management"Grand Challenges" of Log Management
"Grand Challenges" of Log ManagementAnton Chuvakin
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaSpringPeople
 
Gaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsGaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsAmazon Web Services
 

Viewers also liked (8)

Java Development with MongoDB
Java Development with MongoDBJava Development with MongoDB
Java Development with MongoDB
 
Petty Cash Management - How To Manage Logs and Transactions
Petty Cash Management - How To Manage Logs and TransactionsPetty Cash Management - How To Manage Logs and Transactions
Petty Cash Management - How To Manage Logs and Transactions
 
Managing the logs of your (Rails) applications - RailsWayCon 2011
Managing the logs of your (Rails) applications - RailsWayCon 2011Managing the logs of your (Rails) applications - RailsWayCon 2011
Managing the logs of your (Rails) applications - RailsWayCon 2011
 
"Grand Challenges" of Log Management
"Grand Challenges" of Log Management"Grand Challenges" of Log Management
"Grand Challenges" of Log Management
 
Log Files
Log FilesLog Files
Log Files
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & Kibana
 
Gaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsGaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your Logs
 
Logs management
Logs managementLogs management
Logs management
 

Similar to MongoDB Operational Best Practices (mongosf2012)

How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...Red Hat Developers
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningPetar Djekic
 
Security Monitoring for big Infrastructures without a Million Dollar budget
Security Monitoring for big Infrastructures without a Million Dollar budgetSecurity Monitoring for big Infrastructures without a Million Dollar budget
Security Monitoring for big Infrastructures without a Million Dollar budgetJuan Berner
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Guglielmo Iozzia
 
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Hernan Costante
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLSeveralnines
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafkaconfluent
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and MetricsRicardo Lourenço
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1Ruslan Meshenberg
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia DatabasesJaime Crespo
 
Oksana Safronova - Will you detect it or not? How to check if security team i...
Oksana Safronova - Will you detect it or not? How to check if security team i...Oksana Safronova - Will you detect it or not? How to check if security team i...
Oksana Safronova - Will you detect it or not? How to check if security team i...NoNameCon
 
20240515 - Chicago PUG - Clustering in PostgreSQL: Because one database serve...
20240515 - Chicago PUG - Clustering in PostgreSQL: Because one database serve...20240515 - Chicago PUG - Clustering in PostgreSQL: Because one database serve...
20240515 - Chicago PUG - Clustering in PostgreSQL: Because one database serve...Umair Shahid
 
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Redis Labs
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseBig Data Spain
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kevin Lynch
 
Clustering in PostgreSQL - Because one database server is never enough (and n...
Clustering in PostgreSQL - Because one database server is never enough (and n...Clustering in PostgreSQL - Because one database server is never enough (and n...
Clustering in PostgreSQL - Because one database server is never enough (and n...Umair Shahid
 

Similar to MongoDB Operational Best Practices (mongosf2012) (20)

How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuning
 
Performance Whackamole (short version)
Performance Whackamole (short version)Performance Whackamole (short version)
Performance Whackamole (short version)
 
Security Monitoring for big Infrastructures without a Million Dollar budget
Security Monitoring for big Infrastructures without a Million Dollar budgetSecurity Monitoring for big Infrastructures without a Million Dollar budget
Security Monitoring for big Infrastructures without a Million Dollar budget
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
 
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and Metrics
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Stream Processing Overview
Stream Processing OverviewStream Processing Overview
Stream Processing Overview
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
Oksana Safronova - Will you detect it or not? How to check if security team i...
Oksana Safronova - Will you detect it or not? How to check if security team i...Oksana Safronova - Will you detect it or not? How to check if security team i...
Oksana Safronova - Will you detect it or not? How to check if security team i...
 
20240515 - Chicago PUG - Clustering in PostgreSQL: Because one database serve...
20240515 - Chicago PUG - Clustering in PostgreSQL: Because one database serve...20240515 - Chicago PUG - Clustering in PostgreSQL: Because one database serve...
20240515 - Chicago PUG - Clustering in PostgreSQL: Because one database serve...
 
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Dynomite @ Redis Conference 2016
Dynomite @ Redis Conference 2016Dynomite @ Redis Conference 2016
Dynomite @ Redis Conference 2016
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas Weise
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Clustering in PostgreSQL - Because one database server is never enough (and n...
Clustering in PostgreSQL - Because one database server is never enough (and n...Clustering in PostgreSQL - Because one database server is never enough (and n...
Clustering in PostgreSQL - Because one database server is never enough (and n...
 

More from Scott Hernandez

MongoDB 2.8 Replication Internals: Fitting it all together
MongoDB 2.8 Replication Internals: Fitting it all togetherMongoDB 2.8 Replication Internals: Fitting it all together
MongoDB 2.8 Replication Internals: Fitting it all togetherScott Hernandez
 
Advanced Replication Internals
Advanced Replication InternalsAdvanced Replication Internals
Advanced Replication InternalsScott Hernandez
 
Realtime Analytics with MongoDB Counters (mongonyc 2012)
Realtime Analytics with MongoDB Counters (mongonyc 2012)Realtime Analytics with MongoDB Counters (mongonyc 2012)
Realtime Analytics with MongoDB Counters (mongonyc 2012)Scott Hernandez
 
MongoDB Datacenter Awareness (mongosf2012)
MongoDB Datacenter Awareness (mongosf2012)MongoDB Datacenter Awareness (mongosf2012)
MongoDB Datacenter Awareness (mongosf2012)Scott Hernandez
 
Mongo sf easy java persistence
Mongo sf   easy java persistenceMongo sf   easy java persistence
Mongo sf easy java persistenceScott Hernandez
 
MongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaMongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaScott Hernandez
 
MongoDB: Mastering the shell
MongoDB: Mastering the shellMongoDB: Mastering the shell
MongoDB: Mastering the shellScott Hernandez
 
MongoDB: Backup, Restore, and DR
MongoDB: Backup, Restore, and DRMongoDB: Backup, Restore, and DR
MongoDB: Backup, Restore, and DRScott Hernandez
 
What's new in the MongoDB Java Driver (2.5)?
What's new in the MongoDB Java Driver (2.5)?What's new in the MongoDB Java Driver (2.5)?
What's new in the MongoDB Java Driver (2.5)?Scott Hernandez
 
MongoDB Aug2010 SF Meetup
MongoDB Aug2010 SF MeetupMongoDB Aug2010 SF Meetup
MongoDB Aug2010 SF MeetupScott Hernandez
 
MongoDB: tips, trick and hacks
MongoDB: tips, trick and hacksMongoDB: tips, trick and hacks
MongoDB: tips, trick and hacksScott Hernandez
 
Mastering the MongoDB Javascript Shell
Mastering the MongoDB Javascript ShellMastering the MongoDB Javascript Shell
Mastering the MongoDB Javascript ShellScott Hernandez
 

More from Scott Hernandez (13)

MongoDB 2.8 Replication Internals: Fitting it all together
MongoDB 2.8 Replication Internals: Fitting it all togetherMongoDB 2.8 Replication Internals: Fitting it all together
MongoDB 2.8 Replication Internals: Fitting it all together
 
Advanced Replication Internals
Advanced Replication InternalsAdvanced Replication Internals
Advanced Replication Internals
 
Realtime Analytics with MongoDB Counters (mongonyc 2012)
Realtime Analytics with MongoDB Counters (mongonyc 2012)Realtime Analytics with MongoDB Counters (mongonyc 2012)
Realtime Analytics with MongoDB Counters (mongonyc 2012)
 
MongoDB Datacenter Awareness (mongosf2012)
MongoDB Datacenter Awareness (mongosf2012)MongoDB Datacenter Awareness (mongosf2012)
MongoDB Datacenter Awareness (mongosf2012)
 
Mongo sf easy java persistence
Mongo sf   easy java persistenceMongo sf   easy java persistence
Mongo sf easy java persistence
 
MongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaMongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with Morphia
 
MongoDB: Mastering the shell
MongoDB: Mastering the shellMongoDB: Mastering the shell
MongoDB: Mastering the shell
 
MongoDB: Backup, Restore, and DR
MongoDB: Backup, Restore, and DRMongoDB: Backup, Restore, and DR
MongoDB: Backup, Restore, and DR
 
A Brief MongoDB Intro
A Brief MongoDB IntroA Brief MongoDB Intro
A Brief MongoDB Intro
 
What's new in the MongoDB Java Driver (2.5)?
What's new in the MongoDB Java Driver (2.5)?What's new in the MongoDB Java Driver (2.5)?
What's new in the MongoDB Java Driver (2.5)?
 
MongoDB Aug2010 SF Meetup
MongoDB Aug2010 SF MeetupMongoDB Aug2010 SF Meetup
MongoDB Aug2010 SF Meetup
 
MongoDB: tips, trick and hacks
MongoDB: tips, trick and hacksMongoDB: tips, trick and hacks
MongoDB: tips, trick and hacks
 
Mastering the MongoDB Javascript Shell
Mastering the MongoDB Javascript ShellMastering the MongoDB Javascript Shell
Mastering the MongoDB Javascript Shell
 

Recently uploaded

Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024Stephen Perrenod
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideStefan Dietze
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024Lorenzo Miniero
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...FIDO Alliance
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe中 央社
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...panagenda
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTopCSSGallery
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxFIDO Alliance
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfFIDO Alliance
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...FIDO Alliance
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...FIDO Alliance
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Hiroshi SHIBATA
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdfMuhammad Subhan
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech
 

Recently uploaded (20)

Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 

MongoDB Operational Best Practices (mongosf2012)

  • 1. Operational Best Practices Tales from the field
  • 2. The Plan ● Review support cases ○ Taken from real issues ○ Names/ips/dates changed to protect identities ● Analyze reported issues ● Distill best practices ● Summarize takeaways ● Repeat...
  • 3. Scenario 1 ● Fire, it is on fire! ● Users notice response time takes 1-3 sec ● App logs show timeouts ● Server log show socket exceptions
  • 4. Scenario 1 - Diagnostics ● Logs ● Understanding the timeouts ○ Client read timeout set ○ Connection closed/discarded ○ Symptom not cause ● Server connection exceptions ○ Match timing of client timeouts ○ Symptom not cause
  • 5. Scenario 1 - Monitoring Graphs speak a thousand words
  • 6. Scenario 1 - Takeaways ● Monitor Logs ○ Alert, escalate ○ Correlate ● Disk ○ Monitor ○ Moved to RAID (10) ● Instrument/Monitor App ● Know your application and application (write) characteristics
  • 7. Scenario 2 ● Alerts warn that server is running hot ● Random (small) slowdowns ● Increased traffic/queries
  • 8. Scenario 2 - Symptoms High use cpu Similar query pattern
  • 9. Scenario 2 - Diagnostics ● Turn on DB Profiling ● Look at logs Identify query patterns taking longest or with highest frequency and run explain
  • 10. Scenario 2 - Explain db.scenario2.find({...}).sort({...}).explain() { "cursor" : "BtreeCursor ABC", "nscanned" : 160677, "nscannedObjects" : 12015, "n" : 55, "millis" : 99, "scanAndOrder" : true, "indexBounds" : {...} }
  • 11. Scenario 2 - Diagnostics ● Create a compound index ○ Used for criteria and sort ○ Reduced CPU dramatically
  • 12. Scenario 2 - Takeaways ● Performance test/analyze system behavior ● Load test before deployment ● Alert on abnormal states ● High CPU is a sign of poorly indexed ● Rolling upgrade for indexes
  • 13. Scenario 3 ● General slowdown on login ● High disk utilization
  • 14. Scenario 3 - Diagnostics iostat Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sdp 0.00 0.00 0.50 0.00 27.86 0.00 56.00 149.58 20320.00 2010.00 100.00
  • 15. Scenario 3 $ blockdev --report RO RA SSZ BSZ StartSec Size Device rw 8096 512 4048 0 1099494850560 /dev/sdp Huge read-ahead of 4MB
  • 16. Scenario 3 - Takeaways ● Pay attention to disk configurations ● Load testing would have found this early ● MongoDB depends on the OS a lot ● Connect the dots from disportionate effects
  • 17. Best Practices Learned ● System provisioning ○ Capacity ○ Performance ○ Scale ○ Configuration ● Logs ○ Review ○ Alert ○ Rotate and collect (per cluster)
  • 18. Best Practices Learned ● Query/Index Analysis ○ Database Profiler ○ Run explain periodically (sampled) ○ Instrument code, generate metrics ● Plan/test rollouts ○ Rolling upgrade for Replica Set ○ Generate indexes on secondaries first ○ Name services, use redirection
  • 19. Thanks, more refs Please take a look at http://mongodb.org (docs) ● Ask on mongodb-user group ● Use MMS or historic monitoring ○ Watch for trends ○ Create alerts ○ Forecast capacity for provisioning ● logrotate unix command ● monitor disk - munin or the like ● iostat, dstat, vmstat, free, netstat