
Managing a Healthy Couchbase Server Deployment


“…An ounce of prevention is worth a pound of cure…” Using customer use cases, we’ll review how to identify a healthy cluster and spot issues before they become problems for your application. The talk will cover Couchbase monitoring and explain how to interpret key statistics to identify common problems.

Published in: Software

  1. Managing a Healthy Couchbase Server Deployment
       Dean Proctor | Solutions Architect, Couchbase
  2. Operator Tasks (©2015 Couchbase Inc.)
       1. Size
       2. Install
       3. Monitor
       4. Maintain
  3. Resources
       • Sizing
           – Docs: couchbase.com keyword “sizing”
           – Presentation: connect15.couchbase.com keyword “sizing”
       • Installation
           – Docs: couchbase.com keyword “installation”
           – Presentation: connect15.couchbase.com keyword “tuning”
       • Maintenance
           – Docs: couchbase.com keyword “admin troubleshooting”
  4. Monitoring
  5. How to get stats?
       • REST API
       • CLI
       • File
       • Web UI
  6. Stats via REST
       • http://<server ip>:8091/pools/default/buckets/<bucket>/stats
           – Metrics for all components available
           – JSON format
           – Last 60 measurements returned by default
           – Supports zoom and haveTStamp options
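A minimal sketch of polling the REST endpoint above, using only the Python standard library. The helper names, the basic-auth credentials, the example `zoom` value and the assumed response layout (per-metric lists nested under `op` → `samples`) are illustrative assumptions, not taken from the slides; check them against the JSON your cluster returns.

```python
import base64
import json
import urllib.request
from urllib.parse import quote, urlencode


def stats_url(host, bucket, zoom=None):
    """Build the bucket stats URL shown on the slide; zoom is optional."""
    url = f"http://{host}:8091/pools/default/buckets/{quote(bucket, safe='')}/stats"
    if zoom:
        url += "?" + urlencode({"zoom": zoom})
    return url


def fetch_stats(host, bucket, user, password, zoom=None):
    """GET the stats document with HTTP basic auth and decode the JSON body."""
    request = urllib.request.Request(stats_url(host, bucket, zoom))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def latest_samples(payload):
    """Return the most recent value of every metric.

    Assumption: the payload nests per-metric lists under op -> samples.
    """
    samples = payload["op"]["samples"]
    return {name: values[-1] for name, values in samples.items() if values}
```

With a reachable node, `latest_samples(fetch_stats(host, bucket, user, password))` would give one number per metric, which is usually the shape an external monitoring system wants.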
  7. Stats via CLI
       • /opt/couchbase/bin/cbstats all
           – 303 statistics available
           – ep_engine specific metrics
           – Key/value format
           – Integrates with traditional monitoring systems
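Because the CLI output is key/value text, it can be turned into a dictionary with a few lines of parsing. The sketch below assumes each stat is printed as `name:` followed by whitespace and a value; the function name is hypothetical and the parser takes already-captured text rather than running the tool itself.

```python
import re

# A stat line is assumed to look like "  name:   value".
_STAT_LINE = re.compile(r"^\s*(\S+?):\s+(.*\S)\s*$")


def parse_key_values(text):
    """Parse key/value stat lines; integer values are converted to int."""
    stats = {}
    for line in text.splitlines():
        match = _STAT_LINE.match(line)
        if not match:
            continue  # skip blank lines and anything that is not "key: value"
        key, raw = match.groups()
        try:
            stats[key] = int(raw)
        except ValueError:
            stats[key] = raw  # keep non-integer values as text
    return stats
```

The resulting dictionary can then be forwarded to whatever monitoring system is already in place.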
  8. Stats via File
       • /opt/couchbase/var/lib/couchbase/logs/stats.log
           – Minutely dump of stats from ns_server and ep_engine
           – Mixed format: Erlang and key/value
           – Requires custom parsers
  9. Stats via Web UI
       • Admin UI – Monitoring
       • System monitoring: stats per bucket on the entire cluster
       • 120+ stats collected from the entire cluster
       • View stats by aggregate
       • Click the ellipsis to view a stat on a per-server basis
       • Tooltip provides a description and the stats used for calculating it
  10. Built-in Alerts
  11. External Monitoring Systems
  12. Key Couchbase Metrics
  13. Memory
       • “cache miss ratio”
           – Goal: <1% unless you explicitly sized for something greater
       • “memory used” reaching the “high watermark”
           – Goal: Infrequent occurrence
       • “temp oom”
           – Goal: 0
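The memory goals above can be expressed as a small check. This is a sketch under assumptions: the function names are hypothetical, the miss ratio is taken to be background (disk) fetches divided by gets, and the caller is expected to pass values measured over the same interval.

```python
def cache_miss_ratio(bg_fetches, gets):
    """Percentage of reads served from disk; 0.0 when there were no reads."""
    return 100.0 * bg_fetches / gets if gets else 0.0


def memory_warnings(bg_fetches, gets, mem_used, high_watermark, temp_oom,
                    max_miss_pct=1.0):
    """Compare memory metrics with the goals on the slide."""
    warnings = []
    ratio = cache_miss_ratio(bg_fetches, gets)
    if ratio >= max_miss_pct:
        warnings.append(f"cache miss ratio {ratio:.2f}% is not below {max_miss_pct}%")
    if mem_used >= high_watermark:
        warnings.append("memory used has reached the high watermark")
    if temp_oom > 0:
        warnings.append(f"{temp_oom} temp oom errors (goal is 0)")
    return warnings
```

`max_miss_pct` is a parameter because the slide allows a higher ratio when the cluster was explicitly sized for it.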
  14. Disk
       • “disk write queue”
           – Goal: Peaks under 1 million items per node
       • “fragmentation” (docs and views)
           – Goal: Under 2x%
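A rough sketch of the two disk checks. Fragmentation is computed here as the share of the on-disk file that is not live data, which is an assumption about how the metric is defined; the fragmentation threshold is a required argument because the slide does not give an exact figure. All names are hypothetical.

```python
def fragmentation_pct(data_size, file_size):
    """Share of the on-disk file not occupied by live data, as a percentage."""
    if file_size <= 0:
        return 0.0
    return 100.0 * (file_size - data_size) / file_size


def disk_warnings(queue_peak_items, data_size, file_size,
                  max_fragmentation_pct, max_queue_items=1_000_000):
    """Compare per-node disk metrics with the goals on the slide."""
    warnings = []
    if queue_peak_items >= max_queue_items:
        warnings.append(f"disk write queue peaked at {queue_peak_items} items")
    frag = fragmentation_pct(data_size, file_size)
    if frag >= max_fragmentation_pct:
        warnings.append(f"fragmentation {frag:.1f}% is not under {max_fragmentation_pct}%")
    return warnings
```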
  15. DCP and XDCR
       • “items remaining” in the “DCP queues” section
           – Goal: very small number (workload dependent)
       • “mutations” in the “Outbound XDCR” section
           – Goal: small number (workload dependent)
  16. vBuckets
       • “active vBuckets”
           – Goal: 1024 * number of buckets across entire cluster
       • “replica vBuckets”
           – Goal: 1024 * total number of replicas across cluster
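The vBucket goals are simple arithmetic. The sketch below reads "total number of replicas" as the sum of the configured replica count of each bucket, which is an interpretation rather than something the slide spells out; the function name is hypothetical.

```python
VBUCKETS_PER_BUCKET = 1024  # figure used on the slide


def expected_vbuckets(replicas_per_bucket):
    """Return (active, replica) vBucket totals expected across the cluster.

    replicas_per_bucket holds one entry per bucket: its replica count.
    """
    active = VBUCKETS_PER_BUCKET * len(replicas_per_bucket)
    replica = VBUCKETS_PER_BUCKET * sum(replicas_per_bucket)
    return active, replica
```

For two buckets configured with 1 and 2 replicas, `expected_vbuckets([1, 2])` gives 2048 active and 3072 replica vBuckets; a lower observed count suggests vBuckets are missing.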
  17. Key System Metrics
  18. Processes
       • “beam.smp”
           – Goal: >=3
       • “memcached”
           – Goal: 1
       • “ntpd”
           – Goal: 1, time in sync between nodes
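One way to check the process counts is to count command names from a process listing. The sketch assumes the input is one command name per line, for example the captured output of `ps -e -o comm=` on Linux; the function names are hypothetical, and clock synchronisation between nodes is not covered.

```python
from collections import Counter


def process_counts(ps_output):
    """Count occurrences of each command name (one name per line)."""
    return Counter(line.strip() for line in ps_output.splitlines() if line.strip())


def process_warnings(counts):
    """Compare process counts with the goals on the slide."""
    warnings = []
    if counts["beam.smp"] < 3:
        warnings.append(f"beam.smp: {counts['beam.smp']} running, expected at least 3")
    if counts["memcached"] != 1:
        warnings.append(f"memcached: {counts['memcached']} running, expected 1")
    if counts["ntpd"] != 1:
        warnings.append(f"ntpd: {counts['ntpd']} running, expected 1")
    return warnings
```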
  19. Ports
       • Admin
           – 8091, 18091
       • CRUD API
           – 11210
           – Custom bucket ports
       • Query APIs
           – 8092, 18092
           – 8093, 9101, 9102
  20. Disk
       • Free space: data, indexes, logs
           – Goal: > 20% at peak util
       • Disk IO utilization
           – Goal: < 90% sustained
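The free-space goal can be checked with the standard library's `shutil.disk_usage`. This sketch covers free space only, not IO utilization; the function names are hypothetical, and the paths to check (data, index and log directories) are left to the caller.

```python
import shutil


def free_pct(path):
    """Percentage of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.free / usage.total


def low_space(paths, min_free_pct=20.0):
    """Return {path: free percentage} for paths below the free-space goal."""
    result = {}
    for path in paths:
        pct = free_pct(path)
        if pct < min_free_pct:
            result[path] = pct
    return result
```

Since the goal applies at peak utilization, the check is most useful when sampled during the busiest period rather than once a day.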
  21. Memory
       • Free memory (including buffers/cache)
           – Goal: > 15% at peak util
       • Swap used
           – Goal: 0
       • memcached, beam.smp memory usage
           – Goal: beam.smp stable over time
           – memcached <= Couchbase server quota
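On Linux, the first two goals can be derived from `/proc/meminfo`. The sketch below parses text in that format and treats free memory as `MemFree + Buffers + Cached`, matching "including buffers/cache"; the function names are hypothetical and per-process usage of memcached and beam.smp is not covered.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_summary(info):
    """Free memory (including buffers/cache) as a percentage, plus swap used."""
    available = info["MemFree"] + info["Buffers"] + info["Cached"]
    return {
        "free_pct": 100.0 * available / info["MemTotal"],
        "swap_used_kb": info["SwapTotal"] - info["SwapFree"],
    }
```

A node meets the goals when `free_pct` stays above 15 at peak and `swap_used_kb` is 0.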
  22. Network
       • Active connections
           – Goal: < 10,000 per node
       • Connections in CLOSE_WAIT
           – Goal: consistently low
       • Bandwidth utilization
           – Goal: < 80% sustained
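Connection states can be counted from `/proc/net/tcp` on Linux, where the fourth column holds the state as a hex code (`01` is ESTABLISHED, `08` is CLOSE_WAIT). The sketch treats ESTABLISHED sockets as "active connections", which is an assumption, and it does not read the separate IPv6 table or measure bandwidth.

```python
from collections import Counter

# Linux TCP state codes of interest; everything else is grouped as OTHER.
TCP_STATES = {"01": "ESTABLISHED", "08": "CLOSE_WAIT"}


def tcp_state_counts(proc_net_tcp_text):
    """Count sockets per state from /proc/net/tcp-style text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) > 3:
            counts[TCP_STATES.get(fields[3], "OTHER")] += 1
    return counts
```

Tracking `counts["CLOSE_WAIT"]` over time shows whether it stays consistently low, which matters more than any single reading.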
  23. CPU
       • User % vs other
           – Goal: User % > sum(other)
       • Steal %
           – Goal: 0
       • Load
           – Goal: average peak < 80% * num_cores
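A sketch of the CPU goals using two samples of the aggregate `cpu` line from `/proc/stat` on Linux. Reading "other" as every non-idle field except user time is an interpretation, not something the slide defines, and the function names are hypothetical.

```python
# Leading fields of the "cpu" line in /proc/stat, in order.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Map the leading counters of a /proc/stat cpu line to field names."""
    parts = line.split()
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def cpu_warnings(before, after, load, cores):
    """Compare CPU usage between two samples with the goals on the slide."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    # Assumption: "other" means all busy time that is not user time.
    other = sum(v for name, v in delta.items() if name not in ("user", "idle"))
    warnings = []
    if delta["user"] <= other:
        warnings.append("user time does not exceed the sum of other busy time")
    if delta["steal"] > 0:
        warnings.append("steal time is non-zero")
    if load >= 0.8 * cores:
        warnings.append(f"load {load} is not below 80% of {cores} cores")
    return warnings
```

On a live node, `load` and `cores` could come from `os.getloadavg()` and `os.cpu_count()`.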
  24. Logs
  25. Log Monitoring
       • /opt/couchbase/var/lib/couchbase/logs/error.log
       • /opt/couchbase/var/lib/couchbase/logs/xdcr_errors.log
           – Alert on everything
           – Filter once you understand normal behavior for your environment
       • Check here first:
           – couchbase.com keyword “admin troubleshooting”
           – support.couchbase.com knowledge base
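The "alert on everything, then filter" approach can be sketched as a function that reports every log line unless it matches a pattern already classified as normal. The function name is hypothetical and the patterns are entirely site-specific; none are supplied here.

```python
import re


def unexpected_lines(lines, known_benign_patterns=()):
    """Return log lines that match none of the known-benign regex patterns.

    With no patterns supplied, every non-blank line is returned,
    which corresponds to alerting on everything.
    """
    compiled = [re.compile(pattern) for pattern in known_benign_patterns]
    return [
        line for line in lines
        if line.strip() and not any(rx.search(line) for rx in compiled)
    ]
```

Starting with an empty pattern list and adding entries only after a message has been investigated keeps the filter from hiding problems that are not yet understood.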
  26. Questions?
