Webinar: Operational Best Practices

This webinar will cover best practices around dev/ops and general operations for those already familiar with the basics of MongoDB. Topics will include team roles around data model design, monitoring, hardware configurations, replication, and horizontal scaling.

Published in: Technology

  1. #MongoDB Operational Best Practices. Asya Kamsky, Senior Solutions Architect, 10gen
  2. Best Practices == More Value: How to get more sleep while your MongoDB cluster hums along. Operational Best Practices, Asya Kamsky
  3. The Agenda
     • Roles and responsibilities
     • Schema design and application performance
     • Hardware
     • Replication
     • Sharding
     • Monitoring
  4. Roles and Responsibilities
  5. Roles and Responsibilities
     [Diagram labels: Application; Data needs; Schema design; Read and write patterns; Indexing strategy; Network, firewalls, security; Hardware: RAM, CPU, disk...]
  6. Roles and Responsibilities
     [Diagram labels: Application; Data needs; Schema design; Read and write patterns; Indexing strategy; Upgrades; MONITORING; Maintenance; Backups; Network, firewalls, security; Hardware: RAM, CPU, disk...]
  7. Roles and Responsibilities
     [Diagram labels: Application Developer; Network Admin; Data Architect; System Admin; DBA]
  8. Schema Design and Application Performance
  9. Schema and Performance
     DATA != SCHEMA
     In MongoDB, correct schema design is essential for optimal application performance.
  10. Schema and Performance
      Indexing is essential. Multiple types of indexes are supported.
  11. Schema and Performance
      Understanding actual performance
      • Monitoring, measuring, benchmarking, optimizing
      • Logs, query plan, application, ad-hoc testing
  12. Hardware
  13. Hardware
      • Memory
      • Storage
      • CPU - speed
      • CPU - number of cores
      Impact on performance in that order!
  14. Replica Sets
  15. Replica Sets and Application
      [Diagram labels: Client Application; Driver; Write; Read; Primary; Secondary; Secondary]
  16. Replica Set – HA
      [Diagram: Node 1 (Secondary), Node 2 (Secondary), Node 3 (Primary); labels: Heartbeat, Replication, Replication]
  17. Replica Set – Failure
      [Diagram: Node 1 (Secondary), Node 2 (Secondary), Node 3; labels: Heartbeat, Primary Election]
  18. Replica Set – Failover
      [Diagram: Node 1 (Secondary), Node 2 (Primary), Node 3; labels: Heartbeat, Replication]
  19. Replica Set – Recovery
      [Diagram: Node 1 (Secondary), Node 2 (Primary), Node 3 (Recovery); labels: Heartbeat, Replication, Replication]
  20. Replica Set – Reestablished
      [Diagram: Node 1 (Secondary), Node 2 (Primary), Node 3 (Secondary); labels: Heartbeat, Replication, Replication]
  21. Replica Sets
      • Primary purpose:
        – High Availability with automatic failover
        – Disaster Recovery
        – No-down-time maintenance
        – No application changes on reconfiguration
        – Extra copies of data for "special" read workloads
      • Full benefit achieved with advance planning
  22. Replica Sets
      • Full benefit achieved with advance planning
        – Determine your SLA/HA requirements
        – Determine your DR requirements
        – Understand impact of node, network, DC failure
        – Understand all available RS features: priority scores, hidden, delayed, tags
        – Monitor and proactively remedy potential problems
        – Practice recovery from disastrous failure
  23. Replica Sets
      • Best Practices for Configuration
        – Odd number of voting replica members
        – Size the oplog appropriately for high volume loads
        – Use multiple Data Centers/Availability Zones
        – Use DNS names for node configuration
        – Add hidden delayed-replication member as "insurance"
        – All replica set nodes should have same capacity
      • Operation
        – Upgrade secondaries first (primary last)
        – Maintenance on secondaries first (primary last)
        – Use rs.stepDown() command
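Several of the configuration guidelines on this slide can be checked mechanically. The following is a minimal sketch, assuming a configuration document shaped like the one the `rs.conf()` shell helper returns (a `members` list whose entries carry `host`, `votes`, `priority`, `hidden` and a delay field). The function names, the sample hostnames and the use of the older `slaveDelay` field name (newer MongoDB releases call it `secondaryDelaySecs`) are assumptions made for illustration.

```python
import ipaddress


def is_ip_literal(host):
    """Return True if the host part of 'host:port' is a raw IP address."""
    name = host.rsplit(":", 1)[0]
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def check_replica_set_config(config):
    """Return a list of warnings for a replica set config document."""
    warnings = []
    members = config["members"]

    # A member votes unless 'votes' is explicitly set to 0.
    voters = [m for m in members if m.get("votes", 1) > 0]
    if len(voters) % 2 == 0:
        warnings.append("even number of voting members: %d" % len(voters))

    for m in members:
        if is_ip_literal(m["host"]):
            warnings.append("member %s uses an IP address, not a DNS name" % m["host"])
        # Assumption: the delay is stored under the older 'slaveDelay' name.
        delayed = m.get("slaveDelay", 0) > 0
        if delayed and not (m.get("hidden", False) and m.get("priority", 1) == 0):
            warnings.append("delayed member %s should be hidden with priority 0" % m["host"])

    if not any(m.get("slaveDelay", 0) > 0 for m in members):
        warnings.append('no delayed member configured as "insurance"')
    return warnings


# Hypothetical four-member set: three regular members plus a hidden delayed one.
sample = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db1.example.net:27017"},
        {"_id": 1, "host": "db2.example.net:27017"},
        {"_id": 2, "host": "10.0.0.5:27017"},
        {"_id": 3, "host": "db4.example.net:27017", "priority": 0,
         "hidden": True, "slaveDelay": 3600},
    ],
}
for warning in check_replica_set_config(sample):
    print(warning)
```

In this sample the delayed member still votes, so the set has four voters and the check flags it; setting `votes` to 0 on that member is one way to get back to an odd count.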
  24. Sharded Clusters
  25. Sharding
      [Diagram labels: App Server (×3); Mongos (×3); Config Server (×3); Shard (×3), each showing Node 1 and Secondary]
  26. Sharded Clusters
      • Keys to successful sharding:
        – Pick a good shard key
        – Make config servers resilient
        – Shard before you "have to"
      • Good shard key is essential to achieving scaling
  27. Sharded Clusters
      • Good shard key is essential to achieving scaling. A good shard key:
        – Distributes your writes across all shards
        – Allows majority of reads to be "targeted" (not scatter-gather)
        – Exists in every document
        – Has sufficiently high cardinality
        – Allows you to take advantage of advanced features - tag aware balancing
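Two of the shard key properties on this slide, presence in every document and sufficiently high cardinality, can be estimated from a sample of documents before sharding. The sketch below is one way to do that in plain Python; the function name `evaluate_shard_key` and the sample `orders` documents are hypothetical, and the check says nothing about how writes are distributed over time or whether reads can be targeted.

```python
from collections import Counter


def evaluate_shard_key(docs, field):
    """Summarize how a candidate shard key field behaves in a document sample."""
    present = [d[field] for d in docs if field in d]
    counts = Counter(present)
    total = len(docs)
    return {
        # True only if the field appears in every sampled document.
        "in_every_document": len(present) == total,
        # Number of distinct values seen (a rough cardinality estimate).
        "distinct_values": len(counts),
        # Share of documents holding the single most common value;
        # a large share means many documents would land in one chunk.
        "largest_value_share": (max(counts.values()) / total) if counts else 0.0,
    }


# Hypothetical sample documents.
orders = [
    {"order_id": 1, "customer": "alice", "status": "open"},
    {"order_id": 2, "customer": "bob", "status": "open"},
    {"order_id": 3, "customer": "carol", "status": "closed"},
    {"order_id": 4, "customer": "alice", "status": "open"},
]
print(evaluate_shard_key(orders, "status"))
print(evaluate_shard_key(orders, "customer"))
```

On this sample, `status` has only two distinct values and one of them covers three quarters of the documents, which is the kind of low-cardinality field the slide warns against.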
  28. Sharded Clusters
      • Config Servers
        – Three must be available to automatically balance data
        – All three must be "in sync"
          • if one becomes unavailable, the others go read-only
        – At least one must be available to avoid disaster
          • without the information inside the config servers it's not possible to determine which shards contain which ranges of data!
      • Must stop balancing during backup
  29. Sharded Clusters
      • Shard before you "have to"
        – Balancing data is an intensive process
        – If an existing cluster is near full capacity, balancing may impact response time of the application
        – Planning to shard well in advance gives more time
          • to provision new hardware
          • to select a good shard key
          • to understand advanced sharding features (tagging)
  30. Sharded Clusters
      • Other best practices
        – Three config servers
        – Each shard is a replica set
        – Test what you run
          • use the same topology in QA as in production
        – Monitor
          • RAM
          • disk I/O
          • total storage
          • MongoDB throughput
  31. Monitoring
  32. Monitoring
      • Multiple CLI and internal status commands
        • mongostat; mongotop; db.serverStatus()
      • MMS
      • Plug-ins for munin, Nagios, cacti, etc.
      • Integration via SNMP to other tools
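The counters that `db.serverStatus()` exposes are cumulative, so throughput has to be derived by comparing two readings. The following is a minimal sketch of that arithmetic, assuming two snapshots shaped like the `opcounters` section of the `serverStatus` output and taken a known number of seconds apart. The function name `ops_per_second` and the snapshot values are made up for illustration.

```python
def ops_per_second(before, after, interval_seconds):
    """Convert two cumulative opcounters snapshots into per-second rates."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        op: (after[op] - before[op]) / interval_seconds
        for op in before
        if op in after
    }


# Illustrative snapshots (made-up numbers) taken 10 seconds apart.
first = {"insert": 1000, "query": 5000, "update": 200, "delete": 10}
second = {"insert": 1500, "query": 9000, "update": 260, "delete": 10}
rates = ops_per_second(first, second, 10)
print(rates)
```

A negative rate would indicate that the counters were reset between readings, for example by a server restart, so a real collector would need to discard such an interval.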
  33. MongoDB Monitoring Service (MMS)
      Free, cloud-based service for monitoring and alerts
  34. MongoDB Monitoring Service (MMS)
      Free, cloud-based service for monitoring and alerts
      • Charts, custom dashboards and automated alerting
      • Tracks 100+ metrics – performance, resource utilization, availability and response times
      • 10,000+ users
  35. A Picture Speaks a Thousand Words
  36. Symptoms
      • High CPU use
      • Similar query pattern
  37. Diagnostics - iostat
      Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz     await    svctm   %util
      sdp        0.00    0.00  0.50  0.00   27.86    0.00     56.00    149.58  20320.00  2010.00  100.00
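The iostat row on this slide shows a device at 100% utilization with an average wait (`await`, reported in milliseconds) of about 20 seconds. A small sketch of how such a row could be parsed and flagged automatically is given below; the function names and the two thresholds (`util_limit`, `await_limit_ms`) are assumptions chosen for illustration, not recommended values.

```python
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s "
          "avgrq-sz avgqu-sz await svctm %util")
ROW = "sdp 0.00 0.00 0.50 0.00 27.86 0.00 56.00 149.58 20320.00 2010.00 100.00"


def parse_iostat_row(header, row):
    """Pair each iostat column name with the numeric value from one device row."""
    names = header.split()[1:]          # drop the leading "Device:" label
    fields = row.split()
    stats = dict(zip(names, (float(v) for v in fields[1:])))
    return fields[0], stats


def looks_saturated(stats, util_limit=90.0, await_limit_ms=100.0):
    """Flag a device that is both nearly 100% busy and slow to answer requests."""
    return stats["%util"] >= util_limit and stats["await"] >= await_limit_ms


device, stats = parse_iostat_row(HEADER, ROW)
print(device, stats["await"], stats["%util"], looks_saturated(stats))
```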
  38. Monitoring
      • mongostat
  39. Monitoring
      • mongotop
  40. Monitoring Best Practices
      • Monitor Logs
        – Alert, escalate
        – Correlate
      • Disk
        – Monitor
      • Instrument/Monitor App (including logs!)
      • Know your application and application (write) characteristics
  41. Monitoring Best Practices
      • Performance test/analyze system behavior
      • Load test before deployment
      • Selectively use database profiling during testing
      • Alert on abnormal states
      • High CPU is often a sign of a poorly indexed query
  42. Best Practices Summary
  43. Best Practices
      • Pre-deployment
        – Learn
        – Plan
        – Prototype/Benchmark
        – Execute
      • During deployment
        – Monitor
        – Continue planning
        – Evolve
  44. System provisioning
      • Capacity
      • Performance
      • Scale
      • Configuration
  45. Logs
      • Review
      • Alert
      • Rotate and collect (per cluster)
  46. Query/Index Analysis
      • Database Profiler
      • Run explain periodically (sampled)
      • Instrument code, generate metrics
      • Look for similar patterns to find root cause
  47. Hardware Configuration
      • Pay attention to disk configurations
      • Load testing will find some misconfigurations
      • MongoDB depends on the OS a lot
  48. Plan/Test Rollouts
      • Rolling upgrade for Replica Set
      • Generate indexes on secondaries first
      • Name services, use redirection
  49. More References
      • Please take a look at http://docs.mongodb.org
      • Ask questions on mongodb-user group
      • Use MMS or historic monitoring
        – Watch for trends
        – Create alerts
        – Forecast capacity for provisioning
      • Utilize all available resources
        – 10gen offers paid public and on-site training & free web-based classes
        – consulting services
        – pre-production and production support
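Forecasting capacity from historic monitoring data can start with something as simple as a straight-line fit. The sketch below fits a least-squares line to (day, used GB) samples and estimates how many days remain until a given capacity is reached. The function name `days_until_full`, the sample readings and the assumption of linear growth are all illustrative; real growth is often not linear.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days after the last sample until usage reaches capacity_gb.

    samples is a list of (day, used_gb) pairs. Returns None when the fitted
    trend is flat or shrinking.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if n < 2 or sxx == 0:
        raise ValueError("need samples from at least two different days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx                      # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope
    return day_full - samples[-1][0]


# Made-up readings: storage used (GB) on days 0, 10, 20 and 30.
history = [(0, 100), (10, 120), (20, 140), (30, 160)]
print(days_until_full(history, capacity_gb=200))
```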
  50. #MongoSV Thank You. Asya Kamsky, Senior Solutions Architect, 10gen