Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
1. ©2016 Couchbase Inc.
Monitoring Production Deployments
The Tools – LinkedIn
Alex Ma – Principal Architect – Couchbase
Michael Kehoe – Staff Site Reliability Engineer - LinkedIn
2. ©2016 Couchbase Inc.
Overview
• Monitoring Tools
• Making sense of the data
• External Monitoring Integrations
• Summary
3. ©2016 Couchbase Inc. 3
Alex Ma
Principal Architect, Strategic Accounts
alex@couchbase.com
4. ©2016 Couchbase Inc. 4
Michael Kehoe
Staff Site Reliability Engineer (SRE) - LinkedIn
mkehoe@linkedin.com
• Production-SRE team
• Member of CBVT
• Australian!
• Contact
• linkedin.com/in/michaelkkehoe
• @matrixtek
9. ©2016 Couchbase Inc. 9
Monitoring Tools – Couchbase REST API
• http://docs.couchbase.com/admin/admin/REST/rest-bucket-stats.html
• GET /pools/default/buckets/[bucket-name]/stats
• JSON output format
• 60 samples returned per metric
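A minimal sketch of polling this endpoint. It assumes the admin port is 8091 with HTTP basic auth, and that the JSON response nests the per-metric arrays under `op` → `samples`; the host, bucket name, and credentials are placeholders, and the helper names are hypothetical.

```python
import base64
import json
import urllib.request


def stats_url(host, bucket, port=8091):
    """Build the bucket stats URL from the endpoint on the slide."""
    return f"http://{host}:{port}/pools/default/buckets/{bucket}/stats"


def latest_samples(payload):
    """Return the most recent sample of every metric.

    Assumes the shape {"op": {"samples": {metric: [v1, v2, ...]}}}.
    """
    samples = payload.get("op", {}).get("samples", {})
    return {name: values[-1] for name, values in samples.items() if values}


def fetch_bucket_stats(host, bucket, user, password):
    """GET the stats document and decode the JSON body."""
    request = urllib.request.Request(stats_url(host, bucket))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

A collector would typically call `fetch_bucket_stats(...)` on a schedule and pass the result through `latest_samples(...)` before shipping it elsewhere.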
10. ©2016 Couchbase Inc. 10
Monitoring Tools - cbstats
• http://docs.couchbase.com/admin/admin/CLI/cbstats-intro.html
• Command-line tool for viewing stats
• 333+ available stats
• Cumulative and snapshot values
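Cumulative counters only become useful for graphing once two polls are differenced, while snapshot values can be used as read. A minimal sketch of that step, assuming the stats have already been parsed into dictionaries; the function name and the choice of which keys are cumulative are illustrative.

```python
def per_second_rates(previous, current, interval_seconds, cumulative_keys):
    """Turn cumulative counters from two polls into per-second rates."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for key in cumulative_keys:
        if key not in previous or key not in current:
            continue
        delta = current[key] - previous[key]
        # A negative delta suggests the counter was reset (for example by a
        # restart); skip it rather than report a negative rate.
        if delta >= 0:
            rates[key] = delta / interval_seconds
    return rates
```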
11. ©2016 Couchbase Inc. 11
Monitoring Tools - cbstats
• Average value size = ep_value_size / (curr_items_tot - ep_num_non_resident)
• ep_value_size = Amount of RAM used to hold values in this bucket for this node
• curr_items_tot = Total count of active/replica items in this bucket for this node
• ep_num_non_resident = Total number of items not resident in RAM
• 9567135872 / (28733039 - 26582747) ≈ 4449.23 bytes
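The calculation above can be sketched as a small helper. The function name is hypothetical; the three inputs are the cbstats values named on the slide.

```python
def average_value_size(ep_value_size, curr_items_tot, ep_num_non_resident):
    """Average size in bytes of the values currently resident in RAM."""
    resident_items = curr_items_tot - ep_num_non_resident
    if resident_items <= 0:
        raise ValueError("no resident items to average over")
    return ep_value_size / resident_items


# Numbers from the slide: roughly 4449.23 bytes per resident value.
size = average_value_size(9567135872, 28733039, 26582747)
print(f"{size:.2f} bytes")
```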
12. ©2016 Couchbase Inc. 12
Monitoring Tools - cbstats
• cbstats can be pointed at a specific host and port
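A minimal sketch of calling cbstats against a given host and port and parsing the result. It assumes `cbstats` is on the PATH, that the data port is 11210, and that the output is one `name: value` pair per line; bucket and credential flags differ between versions and are left out here.

```python
import subprocess


def parse_cbstats(text):
    """Parse 'name: value' lines into a dict, converting numbers where possible."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon, since some stat names contain colons.
        name, sep, value = line.strip().rpartition(":")
        if not sep or not name:
            continue
        name, value = name.strip(), value.strip()
        try:
            stats[name] = int(value)
        except ValueError:
            try:
                stats[name] = float(value)
            except ValueError:
                stats[name] = value
    return stats


def run_cbstats(host="localhost", port=11210, group="all"):
    """Run cbstats against host:port and return the parsed stats."""
    result = subprocess.run(
        ["cbstats", f"{host}:{port}", group],
        capture_output=True, text=True, check=True,
    )
    return parse_cbstats(result.stdout)
```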
13. ©2016 Couchbase Inc. 13
Monitoring Tools - cbstats
• cbstats timings
• Histogram that shows the timing of a number of internal operations
• Commit to disk, background IO operations, GET ops
• http://docs.couchbase.com/admin/admin/CLI/CBstats/cbstats-timing.html
14. ©2016 Couchbase Inc. 14
Monitoring Tools - Queries
• http://developer.couchbase.com/documentation/server/current/tools/query-monitoring.html
• http://localhost:8093/admin/vitals
15. ©2016 Couchbase Inc. 15
Monitoring Tools - htop
• htop | top | vmstat | /proc
• Core Utilization
• Customization
16. ©2016 Couchbase Inc. 16
Monitoring Tools - iostat
• IO Utilization
• Average wait times
• Read/Write requests
• Determine Capacity
18. ©2016 Couchbase Inc. 18
Monitoring Tools - iftop
• See where traffic is coming from
• Measure replication throughput
• Verify Capacity
20. ©2016 Couchbase Inc. 20
Key Statistics
Metrics to Consider:
• Couchbase-Server
• Client application
• Disk
• Network
22. ©2016 Couchbase Inc. 22
Key Statistics – Couchbase Server
Metrics to Consider:
• Operations
• Cache miss (ep_cache_miss_rate)
• Active/Replica vbuckets (vb_active_num/vb_replica_num)
• Percentage of items in memory (vb_active_resident_items_ratio)
• Disk Queue (ep_diskqueue_items)
• Misdirected Requests (ep_num_not_my_vbuckets)
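A minimal sketch of checking some of these metrics against thresholds, assuming the stats are already in a dictionary. The threshold values are hypothetical and need tuning per workload; a counter such as ep_num_not_my_vbuckets is probably better watched as a rate of change than as an absolute value, so it is left out.

```python
# Hypothetical thresholds; tune them to the workload being monitored.
THRESHOLDS = {
    "ep_cache_miss_rate": ("max", 5.0),
    "vb_active_resident_items_ratio": ("min", 90.0),
    "ep_diskqueue_items": ("max", 1_000_000),
}


def evaluate(stats, thresholds=THRESHOLDS):
    """Return a list of human-readable alerts for breached thresholds."""
    alerts = []
    for name, (kind, limit) in thresholds.items():
        if name not in stats:
            continue
        value = stats[name]
        if kind == "max" and value > limit:
            alerts.append(f"{name}={value} is above {limit}")
        elif kind == "min" and value < limit:
            alerts.append(f"{name}={value} is below {limit}")
    return alerts
```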
23. ©2016 Couchbase Inc. 23
Key Statistics – Couchbase Client
Metrics to Consider:
• Call-time latency
• Measure GETs and SETs separately
• Hit-rate
• Is the hit-rate what you expected?
• Errors
• Timeouts retrieving objects
• Unable to reach Couchbase-Server
• See http://developer.couchbase.com/documentation/server/4.0/sdks/java-2.2/event-bus-metrics.html
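A minimal sketch of collecting these client-side metrics in application code, independent of any SDK. The class and method names are hypothetical; the wrapped callables stand in for whatever GET/SET calls the client library exposes.

```python
import math
import time
from collections import defaultdict


class ClientMetrics:
    """Tracks per-operation latency, GET hit rate, and error counts."""

    def __init__(self):
        self.latencies = defaultdict(list)  # operation name -> seconds
        self.errors = defaultdict(int)      # exception type name -> count
        self.hits = 0
        self.misses = 0

    def timed(self, operation, func, *args, **kwargs):
        """Call func, recording its duration under the operation name."""
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            self.errors[type(exc).__name__] += 1
            raise
        finally:
            self.latencies[operation].append(time.perf_counter() - start)

    def record_get(self, found):
        """Count a GET as a hit or a miss."""
        if found:
            self.hits += 1
        else:
            self.misses += 1

    def hit_rate(self):
        """Fraction of GETs that found a value, or None with no data."""
        total = self.hits + self.misses
        return self.hits / total if total else None

    def percentile(self, operation, fraction):
        """Nearest-rank percentile of recorded latencies, or None if empty."""
        values = sorted(self.latencies.get(operation, []))
        if not values:
            return None
        rank = max(1, math.ceil(fraction * len(values)))
        return values[min(rank, len(values)) - 1]
```

Keeping "get" and "set" as separate operation names gives the separate latency series the slide asks for.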
25. ©2016 Couchbase Inc. 25
Key Statistics – Disk
Metrics to Consider:
• Disk Space
• Compaction
• Rebalance
• Disk IO
• Can the disk sustain the required IOPS?
• Disk Queue
26. ©2016 Couchbase Inc. 26
Key Statistics – Network
Metrics to Consider:
• Network connectivity
• Connections
• Capacity/Utilization
27. ©2016 Couchbase Inc. 27
Key Statistics – Network – Connectivity
• Ping - simple network connectivity test
• Firewalls – make sure you have the correct ports open
• See http://developer.couchbase.com/documentation/server/current/install/install-ports.html
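Beyond ping, a TCP connect test shows whether a firewall is blocking a given port. A minimal sketch using only the standard library; the helper names are hypothetical, and the ports to probe should come from the install-ports page linked above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports):
    """Map each port to whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in ports}
```

For example, `check_ports("cb-node-1", [8091, 8093, 11210])` would probe the admin, query, and data ports on a placeholder host name.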
28. ©2016 Couchbase Inc. 28
Key Statistics – Network – Connections
• File-descriptor limits
• Connections in CLOSE_WAIT state
• Collect stats from /proc/net/tcp
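A minimal sketch of counting connections per state from /proc/net/tcp on Linux. The fourth column of each row is the socket state as a hex code, with 08 meaning CLOSE_WAIT; only a few codes are named here, and IPv6 sockets live in /proc/net/tcp6, which this sketch does not read.

```python
from collections import Counter

# Subset of the kernel's TCP state codes as they appear in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_tcp_states(text):
    """Count sockets per state from the contents of /proc/net/tcp."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


def read_proc_net_tcp(path="/proc/net/tcp"):
    """Read the file and return the per-state counts."""
    with open(path) as handle:
        return count_tcp_states(handle.read())
```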
29. ©2016 Couchbase Inc. 29
Key Statistics – Network – Capacity/Utilization
• Practical network capacity is ~85-90% of theoretical
• E.g. 1Gb/s network interface can do 850-900Mb/s
• Congested networks are problematic
• Higher latency on responses
• Slower replication
• Collect stats from /proc/net/dev
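A minimal sketch of turning two readings of /proc/net/dev into link utilization. It relies on the standard Linux layout (two header lines, then per interface eight receive fields followed by eight transmit fields, with bytes first in each group); the link speed is supplied by the caller.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        iface, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 16:
            continue
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization(before, after, interval_seconds, link_bits_per_second):
    """Return (rx, tx) utilization as fractions of link capacity."""
    capacity_bits = interval_seconds * link_bits_per_second
    rx_bits = (after[0] - before[0]) * 8
    tx_bits = (after[1] - before[1]) * 8
    return rx_bits / capacity_bits, tx_bits / capacity_bits
```

Sampling the file twice, a known number of seconds apart, and comparing the result with the ~85-90% practical ceiling gives a simple congestion check.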
30. ©2016 Couchbase Inc. 30
Key Statistics – Network – Capacity/Utilization
• Practical network capacity is ~85-90% of theoretical (1250 MB/s)
• E.g. 1Gb/s network interface can do 850-900Mb/s
Average object size (bytes)      4,096
ID length (bytes)                   32
Metadata size (bytes)               56
Reads                          100,000
Writes                          60,000
Replica count                        1
Read network utilization   421,600,000
Write network utilization  502,080,000
Total network utilization  923,680,000  (1.25 billion theoretical max)
Remaining bandwidth        276,320,000
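The read, write, and total figures above can be reproduced with the sketch below. The per-operation byte counts are inferred from the numbers rather than stated on the slide: a write is assumed to carry object + ID + metadata once per copy (active plus replicas), and a read is assumed to carry the ID one extra time.

```python
def network_utilization(object_size, id_length, metadata_size,
                        reads, writes, replicas):
    """Return (read_total, write_total, total) in bytes.

    Assumption inferred from the slide's totals: each write moves
    object + ID + metadata once per copy, and each read moves the
    same payload plus the ID a second time.
    """
    write_bytes = object_size + id_length + metadata_size
    read_bytes = write_bytes + id_length
    read_total = reads * read_bytes
    write_total = writes * write_bytes * (1 + replicas)
    return read_total, write_total, read_total + write_total


read_total, write_total, total = network_utilization(
    4096, 32, 56, reads=100_000, writes=60_000, replicas=1)
print(read_total, write_total, total)  # 421600000 502080000 923680000
```

Subtracting the total from 1,250,000,000 gives 326,320,000; the remaining figure of 276,320,000 in the table corresponds to a capacity of 1,200,000,000, so the baseline used for that row is unclear.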
33. ©2016 Couchbase Inc. 33
External Monitoring Integrations – Write your own
Getting Started
• Use Couchbase REST API
• Pipe ‘cbstats’ output
34. ©2016 Couchbase Inc.
Using Couchbase REST API
• Examples
• Datadog – http://lnkd.in/cb-datadog
• This Example – http://lnkd.in/cb-stats-collector
41. ©2016 Couchbase Inc. 41
Summary
Important to have monitoring in place
Understand the metrics you monitor
• What causes them
• How to remediate
Editor's Notes - Service is slow, think it's Couchbase