Keeping_your_cluster_healthy_Couchbase_SF2013
  • Prevention is better than cure! In this talk I will explain how to make sure a cluster is healthy and how to spot issues before they become problems for your application. We will also cover Couchbase monitoring, explaining what the key stats mean and how to interpret them to identify common problems. In addition to monitoring, we will cover the new Health Check tool and how it can be used as a regular part of cluster health monitoring.
  • The agenda of this talk is built around four questions: What does a healthy Couchbase Cluster look like? How do you monitor your Couchbase Cluster? What key system metrics should you look for? What Health-Check tool does Couchbase provide?
  • So What is a Healthy Couchbase Cluster?
  • A cluster is in good health when it is able to run without any issues while still providing satisfactory performance and having enough capacity to handle unknown loads.
  • That is, in terms of a Couchbase Cluster. Response time for the application: consistent read-write response time that meets the application's requirements, and consistent high throughput. In terms of capacity: available memory for the cluster, disk write rate, CPU utilization and I/O utilization.
  • Now that we know what a healthy Couchbase Cluster should look like, the next question is how to monitor it.
  • In a nutshell, this is what the Couchbase Monitoring System provides: real-time traffic graphs for the entire cluster, with per-server and per-bucket drill-down. All the stats are also available through the REST API.
  • Let me quickly show you screenshots of the Couchbase Admin UI Console, which is the Admin's best friend. We're looking at the initial landing page of the Couchbase Admin UI, i.e. the Cluster Overview page, which gives you a snapshot of the health of the cluster.
  • The second screenshot is the Server Nodes page, which provides a list of all the active, failed-over and pending servers (those yet to be made active). This page also gives Admins the ability to add, remove or fail over servers from the cluster.
  • The Monitoring System is where an Admin will spend most of their time; this page provides 200+ stats from the entire cluster.
  • Logging System provides logs for every event that has occurred in the cluster.
  • Stats. Stats timing. Demo 3 - Stats - check how we are on time. Load: -h localhost -i1000000 -M1000 -m9900 -t8 -K sharon -c500000 -l
  • Add CPU

Transcript

  • 1. 1
  • 2. 2 Anil Kumar Technical Product Manager Keeping your Couchbase Cluster Healthy
  • 3. 3 Today’s Diagnostics
      • What does a Healthy Cluster look like
      • What does an Unhealthy Cluster look like
      • How to Monitor your Couchbase Cluster
      • What Key System Metrics to look for
      • How the CB Health-Check tool helps
  • 4. 4 What does a Healthy Cluster look like
  • 5. 5 Healthy Couchbase Cluster
  • 6. 6 Healthy Couchbase Cluster
      • ‘Active vBuckets’ count across all the servers should be equal to “1024”
      • ‘Replica vBuckets’ count across all the servers should be equal to “1024 * <number of replicas configured>”
      • ‘Cache Miss Ratio’ and ‘Disk Reads per Sec’ across all the servers should be equal to “0”
      • Items in ‘TAP Queue’ and ‘Disk Queue’ across all servers should stay at low numbers
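
The four checks on this slide can be sketched as a small function over a snapshot of cluster-wide totals. This is a minimal illustration, not a Couchbase tool: the `ClusterSnapshot` field names are hypothetical rather than real stat names, and `queue_limit` is an assumed threshold, since the slide only says the queues should be low without giving a number.

```python
from dataclasses import dataclass
from typing import List

TOTAL_VBUCKETS = 1024  # from the slide: active vBuckets should total 1024


@dataclass
class ClusterSnapshot:
    """Cluster-wide totals. Field names are hypothetical, not Couchbase stat names."""
    active_vbuckets: int
    replica_vbuckets: int
    num_replicas: int
    cache_miss_ratio: float
    disk_reads_per_sec: float
    tap_queue_items: int
    disk_queue_items: int


def health_findings(s: ClusterSnapshot, queue_limit: int = 1000) -> List[str]:
    """Return the failed checks; an empty list means every check passed.

    queue_limit is an assumed threshold for "low" queue sizes.
    """
    findings = []
    if s.active_vbuckets != TOTAL_VBUCKETS:
        findings.append(f"active vBuckets = {s.active_vbuckets}, expected {TOTAL_VBUCKETS}")
    expected_replicas = TOTAL_VBUCKETS * s.num_replicas
    if s.replica_vbuckets != expected_replicas:
        findings.append(f"replica vBuckets = {s.replica_vbuckets}, expected {expected_replicas}")
    if s.cache_miss_ratio > 0:
        findings.append(f"cache miss ratio = {s.cache_miss_ratio}, expected 0")
    if s.disk_reads_per_sec > 0:
        findings.append(f"disk reads per sec = {s.disk_reads_per_sec}, expected 0")
    if s.tap_queue_items > queue_limit:
        findings.append(f"TAP queue holds {s.tap_queue_items} items")
    if s.disk_queue_items > queue_limit:
        findings.append(f"disk queue holds {s.disk_queue_items} items")
    return findings
```

Returning a list of findings instead of a single boolean keeps the reason for an unhealthy verdict visible, which is what an operator needs first.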
  • 7. 7 What does an Unhealthy Cluster look like
  • 8. 8 Symptoms of an Unhealthy Couchbase Cluster
  • 9. 9 Symptoms of an Unhealthy Couchbase Cluster
      • ‘Memory used’ equal to ‘High Water Mark’ means active items are being evicted from RAM
      • Under a sustained ‘Write rate’, a ‘Drain rate’ much lower than the ‘Fill rate’ means the Disk Queue is full
      • TMP Out Of Memory means memory usage is at or above 90% of the bucket memory quota
      • ‘Disk Reads per Sec’ and ‘Cache Miss Ratio’ growing into the hundreds or thousands means there is no more memory capacity
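
The two memory symptoms on this slide reduce to simple comparisons. The sketch below assumes the caller already has the three values in the same unit (for example bytes); the function and parameter names are illustrative only and are not part of any Couchbase API.

```python
from typing import List


def memory_symptoms(mem_used: int, high_water_mark: int, bucket_quota: int) -> List[str]:
    """Flag the memory symptoms named on the slide.

    All three arguments must use the same unit. Names are illustrative.
    """
    symptoms = []
    if mem_used >= high_water_mark:
        # Slide: memory used at the high water mark means items are evicted from RAM.
        symptoms.append("memory used has reached the high water mark")
    # Integer arithmetic avoids float rounding right at the 90% boundary.
    if mem_used * 100 >= 90 * bucket_quota:
        # Slide: TMP Out Of Memory occurs at or above 90% of the bucket quota.
        symptoms.append("memory used is at or above 90% of the bucket quota")
    return symptoms
```

For example, `memory_symptoms(950, 850, 1000)` reports both symptoms, while `memory_symptoms(500, 850, 1000)` reports none.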
  • 10. 10 How to Monitor your Couchbase Cluster
  • 11. 12 Couchbase Monitoring System
      • Real-time traffic graphs
      • REST API accessible: you can extend your existing monitoring system to capture stats from Couchbase through the REST APIs
      • Per bucket, per node and aggregate statistics
      • Monitor inter-node traffic
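
As a rough sketch of pulling stats into an existing monitoring system over REST, the code below builds a per-bucket stats URL and extracts the most recent sample for chosen stats. Several things here are assumptions to verify against the documentation for your server version: the `/pools/default/buckets/<bucket>/stats` path on port 8091, the `op.samples` layout of the response, and the stat names in the usage note. The helper names are hypothetical.

```python
import base64
import json
from typing import Dict, Iterable
from urllib.request import Request, urlopen


def bucket_stats_url(host: str, bucket: str, port: int = 8091) -> str:
    """Build the per-bucket stats URL (path and port are assumptions)."""
    return f"http://{host}:{port}/pools/default/buckets/{bucket}/stats"


def fetch_bucket_stats(host: str, bucket: str, user: str, password: str) -> dict:
    """GET the stats document using HTTP basic auth and parse it as JSON."""
    req = Request(bucket_stats_url(host, bucket))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def latest_samples(payload: dict, names: Iterable[str]) -> Dict[str, float]:
    """Pick the newest value of each requested stat.

    Assumes payload["op"]["samples"] maps stat name -> list of samples,
    oldest first. Stats that are missing or empty are skipped.
    """
    samples = payload.get("op", {}).get("samples", {})
    return {n: samples[n][-1] for n in names if samples.get(n)}
```

A monitoring job might call `latest_samples(fetch_bucket_stats("localhost", "default", user, password), ["ep_queue_size", "ep_bg_fetched"])` on a schedule and forward the result to its own alerting.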
  • 12. 13 Couchbase Admin UI – Cluster Overview. Cluster Overview Page: Cluster RAM Usage; Cluster Disk Usage; Buckets Deployed in Cluster; Servers Deployed in Cluster; Cluster Rebalance Progress Indicator
  • 13. 14 Couchbase Admin UI – Server Nodes. Server Node Page: List Active Servers; Expand individual Servers; Servers Ready for Rebalance
  • 14. 15 Couchbase Admin UI – Server Nodes. Additional Server Details: Rebalance Progress Indicator in detail; Keys transferred, Keys yet to be transferred; Memory utilization on this Server; Disk utilization on this Server
  • 15. 16 Couchbase Admin UI – Monitoring System. Monitoring Stats per Bucket on entire Cluster: 120+ Stats collected from entire cluster; View stats in aggregate; Click eclipse to view this stat on a per-Server basis; Tooltip provides description and stats used for calculating
  • 16. 17 Couchbase Admin UI – Logging System. Log Event page: Logs all events occurring on the cluster; With timestamp; Server where event occurred
  • 17. 18 vBucket Resources: Active State, Replica State, Pending State, Total; Active vBuckets, Replica vBuckets
  • 18. 19 vBucket Resources. The vBucket stats section displays information for all vBucket types within the cluster, shown for each of the vBucket states: Active, Replica and Pending.
      • vBuckets - Number of vBuckets
      • items - Number of items within the vBucket
      • resident% - Percentage of items within the vBuckets that are resident (in RAM)
      • new items per sec - Number of new items created in vBuckets
      • ejections per sec - Number of items ejected per second within the vBuckets
      • user data in RAM - Size of user data within vBuckets that are resident in RAM
      • metadata in RAM - Size of item metadata within the vBuckets that are resident in RAM
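
The resident% figure is the share of items held in RAM. A minimal worked example follows, assuming you have an item count and a count of items not resident in RAM; the function name and inputs are illustrative, and treating an empty bucket as 100% resident is a choice made here, not something the slides state.

```python
def resident_percent(total_items: int, non_resident_items: int) -> float:
    """Percentage of items resident in RAM.

    An empty bucket is reported as 100% resident (an assumption of this sketch).
    """
    if total_items == 0:
        return 100.0
    return 100.0 * (total_items - non_resident_items) / total_items
```

With 1,000 items of which 250 are on disk only, `resident_percent(1000, 250)` gives 75.0. A falling value alongside rising ejections per second suggests the working set is outgrowing RAM.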
  • 19. 20 Disk Queues: Active State, Replica State, Pending State, Total
  • 20. 21 Disk Queues. The Disk Queues stats section displays information for data being placed into the disk queue, shown for each of the disk queue states: Active, Replica and Pending.
      • items - Number of items waiting to be written to disk for this bucket
      • fill rate - Number of items per second being added to the disk queue
      • drain rate - Number of items actually written to disk from the disk queue
      • average age - Average age of items (in seconds) within the disk queue
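
Fill rate and drain rate together tell you whether the disk queue is shrinking or growing. The sketch below estimates how long the current backlog takes to clear, assuming both rates are in items per second and stay constant; the function name is illustrative.

```python
from typing import Optional


def seconds_to_drain(items: int, fill_rate: float, drain_rate: float) -> Optional[float]:
    """Estimate seconds until the disk queue is empty at constant rates.

    Returns None when the queue is not shrinking (drain rate <= fill rate),
    which is the case that calls for attention.
    """
    if items == 0:
        return 0.0
    net_drain = drain_rate - fill_rate  # items removed from the queue per second
    if net_drain <= 0:
        return None
    return items / net_drain
```

For instance, 10,000 queued items with a fill rate of 500/s and a drain rate of 1,500/s clear in 10 seconds, while swapping those two rates returns `None` because the queue keeps growing.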
  • 21. 24 XDCR Stats. The Outgoing XDCR stats section displays information about the XDCR operations that support cross datacenter replication from the current cluster to a destination cluster. The Incoming XDCR stats section displays information about the XDCR operations coming into the current cluster from a remote cluster.
  • 22. 27 What Key System Metrics to look for
  • 23. 29 Key System Metrics
      • Working set doesn’t fit in RAM
      • Cache miss rate / disk fetches
      • Disk I/O not keeping up
      • Disk Write queue size
      • Internal replication lag
      • TAP queues
      • Indexing not keeping up
      • XDCR lag
  • 24. 31 Couchbase Cluster Health Check Tool
  • 25. 32 What is CBHealthChecker Tool. Report provides:
      • ALERT user on issues where immediate action is required.
      • Easy way to indicate to user whether their Cluster is healthy.
      • List all the servers in the Cluster and indicate whether they’re healthy
      • Summary of Cluster-wide metrics
      • Important stats on each Bucket
      • Lists important metrics on a per-Bucket basis
      • Important stats on each Node
      • Lists the important metrics on a per-Node basis
      • WARNING indicators to point out issues that need to be addressed before they become problems.
  • 26. 33 How to Run CBHealthChecker Tool
      • You can find this tool in the following locations, depending upon your platform:
      • Usage
      • Couchbase Documentation – CBHealthChecker Tool: www.couchbase.com/docs/couchbase-manual-2.1.0/couchbase-admin-cmdline-cbhealthchecker.html
  • 27. 34 Sample Healthy Report: Time periods; Categories to jump to; Expanding this provides detailed Sizing info; Cluster-wide stats analyzed
  • 28. 35 Sample Unhealthy Report: User needs to take action; Details about what action
  • 29. 36 Q & A
  • 30. 37 Thank you! anil@couchbase.com @anilkumar1129 Get Couchbase Server at http://www.couchbase.com/download
  • 31. 38