Anil Kumar
Technical Product Manager
Keeping your Couchbase Cluster Healthy

Keeping_your_cluster_healthy_Couchbase_SF2013

Published in: Technology, Health & Medicine
  • Prevention is better than cure! In this talk I will explain how to make sure a cluster is healthy and how to spot issues before they become problems for your application. We will also cover Couchbase monitoring and explain what the key stats mean and how to interpret them to identify common problems. In addition to monitoring, we will cover the new Health Check tool and how it can be used as a regular part of cluster health monitoring.
  • The agenda of this talk is built around four questions: What does a healthy Couchbase cluster look like? How do you monitor your Couchbase cluster? Which key system metrics should you look for? What health-check tool does Couchbase provide?
  • So, what is a healthy Couchbase cluster?
  • A cluster is in good health when it is able to run without any issues while still providing satisfactory performance and having enough capacity to handle unexpected load.
  • In terms of a Couchbase cluster, that means two things. Response time for the application: consistent read-write response time that meets the application's requirements, and consistent high throughput. Capacity: available memory for the cluster, disk write rate, CPU utilization and I/O utilization.
  • Once we know what a healthy Couchbase cluster should look like, the next question is how to monitor it.
  • In a nutshell, this is what the Couchbase monitoring system provides: real-time traffic graphs for the entire cluster, with per-server and per-bucket drill-down. All the stats are also available through the REST API.
  • Let me quickly show you screenshots of the Couchbase Admin UI console, which is the admin's best friend. We're looking at the initial landing page of the Couchbase Admin UI, the Cluster Overview page, which gives you a snapshot of the health of the cluster.
  • The second screenshot is the Server Nodes page, which lists all the active, failed-over and pending servers (pending servers are yet to be made active). This page also lets admins add, remove or fail over servers in the cluster.
  • The Monitoring System page is where admins will spend most of their time; it provides 200+ stats from the entire cluster.
  • The Logging System provides a log of every event that has occurred in the cluster.
  • Stats. Stats timing. Demo 3 (Stats): check how we are on time. Load: -h localhost -i1000000 -M1000 -m9900 -t8 -K sharon -c500000 -l
  • Add CPU
  • Keeping_your_cluster_healthy_Couchbase_SF2013

    1. (no text)
    2. Keeping your Couchbase Cluster Healthy. Anil Kumar, Technical Product Manager
    3. Today’s Diagnostics
       • What does a Healthy Cluster look like
       • What does an Unhealthy Cluster look like
       • How to Monitor your Couchbase Cluster
       • What Key System Metrics to look for
       • How the CB Health-Check tool helps
    4. What does a Healthy Cluster look like
    5. Healthy Couchbase Cluster
    6. Healthy Couchbase Cluster
       • ‘Active vBuckets’ count across all the servers should be equal to 1024
       • ‘Replica vBuckets’ count across all the servers should be equal to 1024 × <number of replicas configured>
       • ‘Cache Miss Ratio’ and ‘Disk Reads per Sec’ across all the servers should be equal to 0
       • Items in ‘TAP Queue’ and ‘Disk Queue’ across all servers should stay low
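The healthy-cluster criteria on this slide lend themselves to a simple automated check. The following is a minimal Python sketch, not part of Couchbase: the `ClusterStats` container, its field names and the `max_queue_items` threshold are hypothetical and introduced only for illustration. It assumes the cluster-wide figures have already been collected, for example by reading them off the Admin UI.

```python
from dataclasses import dataclass
from typing import List

TOTAL_VBUCKETS = 1024  # expected active vBucket count, as stated on the slide


@dataclass
class ClusterStats:
    """Cluster-wide figures for one bucket (hypothetical container)."""
    active_vbuckets: int
    replica_vbuckets: int
    num_replicas: int
    cache_miss_ratio: float
    disk_reads_per_sec: float
    tap_queue_items: int
    disk_queue_items: int


def health_problems(stats: ClusterStats, max_queue_items: int = 1000) -> List[str]:
    """Return a list of deviations from the healthy-cluster criteria.

    max_queue_items is an assumed threshold; the slide only says the
    queues should hold low numbers of items.
    """
    problems = []
    if stats.active_vbuckets != TOTAL_VBUCKETS:
        problems.append(f"active vBuckets = {stats.active_vbuckets}, expected {TOTAL_VBUCKETS}")
    expected_replicas = TOTAL_VBUCKETS * stats.num_replicas
    if stats.replica_vbuckets != expected_replicas:
        problems.append(f"replica vBuckets = {stats.replica_vbuckets}, expected {expected_replicas}")
    if stats.cache_miss_ratio > 0 or stats.disk_reads_per_sec > 0:
        problems.append("reads are going to disk (cache miss ratio or disk reads above 0)")
    if stats.tap_queue_items > max_queue_items or stats.disk_queue_items > max_queue_items:
        problems.append("TAP queue or disk queue is backing up")
    return problems
```

An empty list means every criterion on the slide is met; each string in a non-empty list names one criterion that failed.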
    7. What does an Unhealthy Cluster look like
    8. Symptoms of an Unhealthy Couchbase Cluster
    9. Symptoms of an Unhealthy Couchbase Cluster
       • ‘Memory used’ equal to the ‘High Water Mark’ means Active items are being evicted from RAM
       • Under a sustained ‘Write rate’, a ‘Drain rate’ much lower than the ‘Fill rate’ means the Disk Queue is full
       • ‘TMP Out Of Memory’ means memory usage is at or above 90% of the bucket memory quota
       • ‘Disk Reads per Sec’ and ‘Cache Miss Ratio’ growing into the hundreds or thousands means there is no more memory capacity
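The symptoms on this slide can be expressed as threshold comparisons. The sketch below is a minimal Python illustration under stated assumptions: the function name, parameter names, and the two tunable thresholds (`drain_lag_factor` for "much lower" and `disk_read_alarm` for "hundreds") are hypothetical choices, not values given in the talk. Only the 90% quota figure comes from the slide.

```python
from typing import List


def unhealthy_symptoms(
    mem_used: float,
    high_water_mark: float,
    bucket_quota: float,
    fill_rate: float,
    drain_rate: float,
    disk_reads_per_sec: float,
    drain_lag_factor: float = 0.5,   # assumed reading of "much lower"
    disk_read_alarm: float = 100.0,  # assumed reading of "hundreds"
) -> List[str]:
    """Return the symptoms from the slide that the given figures exhibit."""
    symptoms = []
    if mem_used >= high_water_mark:
        symptoms.append("memory used reached the high water mark: active items are being evicted")
    if mem_used >= 0.9 * bucket_quota:
        symptoms.append("memory used is at or above 90% of the bucket quota: expect TMP Out Of Memory")
    if fill_rate > 0 and drain_rate < drain_lag_factor * fill_rate:
        symptoms.append("drain rate is far below fill rate: the disk queue is filling")
    if disk_reads_per_sec >= disk_read_alarm:
        symptoms.append("disk reads are in the hundreds or more: no more memory capacity")
    return symptoms
```

Memory figures must share one unit (for example bytes), and both rates must be in items per second.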
    10. How to Monitor your Couchbase Cluster
    11. Couchbase Monitoring System
        • Real-time traffic graphs
        • REST API accessible
        • Can extend your existing monitoring system to capture stats from Couchbase through the REST APIs
        • Per bucket, per node and aggregate statistics
        • Monitor inter-node traffic
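As a rough illustration of capturing stats through the REST API from an existing monitoring system, here is a minimal Python sketch using only the standard library. It assumes that bucket statistics are served at `/pools/default/buckets/<bucket>/stats` on the admin port 8091 and that the JSON response carries recent samples under `op.samples`; check both against the REST API documentation for your Couchbase version. The helper names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_stats_request(host, bucket, user, password, port=8091):
    """Build an authenticated request for a bucket's stats (assumed endpoint)."""
    path = "/pools/default/buckets/" + urllib.parse.quote(bucket, safe="") + "/stats"
    url = f"http://{host}:{port}{path}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def latest_samples(payload):
    """Keep only the most recent value of each stat (assumed op.samples layout)."""
    samples = payload.get("op", {}).get("samples", {})
    return {
        name: values[-1]
        for name, values in samples.items()
        if isinstance(values, list) and values
    }


def fetch_latest_stats(host, bucket, user, password):
    """Fetch the stats document over HTTP and reduce it to the latest samples."""
    request = build_stats_request(host, bucket, user, password)
    with urllib.request.urlopen(request, timeout=10) as response:
        return latest_samples(json.load(response))
```

A monitoring agent could call `fetch_latest_stats("localhost", "default", "Administrator", "password")` on a schedule and forward the resulting dictionary; the host, bucket and credentials shown are placeholders.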
    12. Couchbase Admin UI – Cluster Overview
        • Cluster Overview Page
        • Cluster RAM Usage
        • Cluster DISK Usage
        • Buckets Deployed in Cluster
        • Servers Deployed in Cluster
        • Cluster Rebalance Progress Indicator
    13. Couchbase Admin UI – Server Nodes
        • Server Node Page
        • List Active Servers
        • Expand individual Servers
        • Servers Ready for Rebalance
    14. Couchbase Admin UI – Server Nodes
        • Additional Server Details
        • Rebalance Progress Indicator in detail: Keys transferred, Keys yet to be transferred
        • Memory utilization on this Server
        • Disk utilization on this Server
    15. Couchbase Admin UI – Monitoring System
        • Monitoring Stats per Bucket on entire Cluster
        • 120+ Stats collected from entire cluster
        • View aggregated stats
        • Click eclipse to view this stat on a per-Server basis
        • Tooltip provides description and stats used for calculating
    16. Couchbase Admin UI – Logging System
        • Log Event page
        • Logs all events occurring on the cluster
        • With timestamp
        • Server where event occurred
    17. vBucket Resources
        • Active State
        • Replica State
        • Pending State
        • Total
        • Active vBuckets
        • Replica vBuckets
    18. vBucket Resources
        The vBucket stats section displays information for all vBucket types within the cluster, shown for each of the vBucket states: Active, Replica and Pending.
        • vBuckets: number of vBuckets
        • items: number of items within the vBuckets
        • resident %: percentage of items within the vBuckets that are resident (in RAM)
        • new items per sec: number of new items created in the vBuckets per second
        • ejections per sec: number of items ejected per second within the vBuckets
        • user data in RAM: size of user data within the vBuckets that is resident in RAM
        • metadata in RAM: size of item metadata within the vBuckets that is resident in RAM
    19. Disk Queues
        • Active State
        • Replica State
        • Pending State
        • Total
    20. Disk Queues
        The Disk Queues stats section displays information about data being placed into the disk queue, shown for each of the disk queue states: Active, Replica and Pending.
        • items: number of items waiting to be written to disk for this bucket
        • fill rate: number of items per second being added to the disk queue
        • drain rate: number of items per second actually written to disk from the disk queue
        • average age: average age of items (in seconds) within the disk queue
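The disk queue figures above combine into a simple estimate of whether the queue is shrinking and how long it would take to empty. This is a minimal Python sketch under the assumption that fill rate and drain rate stay constant; the function name is hypothetical and the calculation is an illustration, not something the Admin UI reports.

```python
from typing import Optional


def seconds_to_drain(items: int, fill_rate: float, drain_rate: float) -> Optional[float]:
    """Estimate how long the disk queue needs to empty at constant rates.

    items      -- items currently waiting in the disk queue
    fill_rate  -- items per second being added to the queue
    drain_rate -- items per second written from the queue to disk

    Returns the estimated seconds until the queue is empty, or None when
    the queue is not shrinking (drain rate does not exceed fill rate).
    """
    if items <= 0:
        return 0.0
    net_drain = drain_rate - fill_rate
    if net_drain <= 0:
        return None
    return items / net_drain
```

A `None` result under sustained writes corresponds to the unhealthy case where disk I/O is not keeping up with the write rate.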
    21. XDCR Stats
        The Outgoing XDCR stats section displays information about the XDCR operations that support cross datacenter replication from the current cluster to a destination cluster. The Incoming XDCR stats section displays information about the XDCR operations coming into the current cluster from a remote cluster.
    22. What Key System Metrics to look for
    23. Key System Metrics
        • Working set doesn’t fit in RAM
        • Cache miss rate / disk fetches
        • Disk I/O not keeping up
        • Disk Write queue size
        • Internal replication lag
        • TAP queues
        • Indexing not keeping up
        • XDCR lag
    24. Couchbase Cluster Health Check Tool
    25. What is CBHealthChecker Tool
        Report provides:
        • ALERT user on issues where immediate action is required
        • Easy way to indicate to user whether their Cluster is healthy
        • List all the servers in the Cluster and indicate whether they’re healthy
        • Summary of Cluster-wide metrics
        • Important stats on each Bucket
        • Lists important metrics on a per-Bucket basis
        • Important stats on each Node
        • Lists important metrics on a per-Node basis
        • WARNING indicators to point out issues that need to be addressed before they become a problem
    26. How to Run CBHealthChecker Tool
        • You can find this tool in the following locations, depending upon your platform:
        • Usage
        Couchbase Documentation – CBHealthChecker Tool: www.couchbase.com/docs/couchbase-manual-2.1.0/couchbase-admin-cmdline-cbhealthchecker.html
    27. Sample Healthy Report
        • Time periods
        • Categories to jump to
        • Expanding this provides detailed Sizing info
        • Cluster-wide stats analyzed
    28. Sample Unhealthy Report
        • User needs to take action
        • Details about what action
    29. Q & A
    30. Thank you!
        anil@couchbase.com
        @anilkumar1129
        Get Couchbase Server at http://www.couchbase.com/download
    31. (no text)
