
Cassandra Day Atlanta 2016 - Monitoring Cassandra

Overview of monitoring Apache Cassandra


  1. 1. CASSANDRA DAY ATLANTA 2016 MONITORING CASSANDRA Aaron Morton @aaronmorton CEO Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
  2. 2. About The Last Pickle. Work with clients to deliver and improve Apache Cassandra based solutions. Apache Cassandra Committer and DataStax MVPs. Based in New Zealand, Australia, France & USA.
  3. 3. Metrics Monitoring & Alerting Insights
  4. 4. Coda Hale / Yammer / Dropwizard
  5. 5. Metrics <dependency groupId="io.dropwizard.metrics" artifactId="metrics-core" version="3.1.0" />
  6. 6. Metrics Separate Collection from Reporting.
  7. 7. Metrics Collection Metrics are always collected.
  8. 8. Metrics Metrics have a dotted notation name, timestamp, and value e.g. com.thelastpickle.presenters.count=2
  9. 9. Metric Types Gauge. A simple value.
  10. 10. Metric Types Ratio Gauge. A ratio between two values.
  11. 11. Metric Types Histograms. The distribution of values in a stream of data.
  12. 12. Histograms Quantiles (e.g. 75th, 95th) calculated using reservoir sampling. (Check docs.)
  13. 13. Histograms Default Exponentially Decaying Reservoirs, (roughly) the last five minutes of data, exponential weighting towards newer data. (Check docs.)
  14. 14. Metric Types Meter. Measures the per second rate at which a set of events occur.
  15. 15. Meter Three different exponentially-weighted moving average rates: 1, 5, and 15 minutes
  16. 16. Metric Types Timer. Histogram of duration and rate of events.
  17. 17. Reporting Reporters run in the Cassandra process, pushing metrics to external services.
  18. 18. Reporters ConsoleReporter, GraphiteReporter, InfluxDBReporter, RiemannReporter, …
  19. 19. Reporters In Cassandra Configuration file: metrics-reporter-config-sample.yaml
  20. 20. Reporters In Cassandra
          graphite:
            - period: 10
              timeunit: 'SECONDS'
              prefix: 'cassandra.prod.ip_1_2_3_4.'
              hosts:
                - host: '1.2.3.4'
                  port: 2003
              predicate:
                color: "white"
                useQualifiedName: true
                patterns:
                  - "^org.apache.cassandra.metrics.+"
  21. 21. metrics-reporter-config Configures Metrics reporters. github.com/addthis/metrics-reporter-config
  22. 22. metrics-reporter-config Supports: Ganglia Graphite Riemann
  23. 23. JMX Cassandra creates JMX MBeans for each Metric.
  24. 24. JMX
  25. 25. Reporters Reporters may change the name of measures, e.g. 95thPercentile == p95
  26. 26. Metrics Monitoring & Alerting Insights
  27. 27. Monitoring and Alerting Use what you like and what works for you.
  28. 28. Monitoring Platforms OpsCenter, Grafana & Graphite, Datadog, Riemann
  29. 29. Metrics Monitoring & Alerting Insights
  30. 30. Names? All under org.apache.cassandra.metrics
  31. 31. Scale? Latency: microseconds. Rates: per second. Data: bytes.
  32. 32. Percentiles? 75thPercentile 95thPercentile 99thPercentile
  33. 33. Rates? OneMinuteRate
  34. 34. Request Throughput - All Requests ClientRequest.$REQUEST.Latency.1MinuteRate where $REQUEST is one of CASRead, CASWrite, RangeSlice, Read, ViewWrite, Write
  35. 35. A Note On Requests We will focus on Read and Write, but there are others: CAS*, RangeSlice, ViewWrite
  36. 36. Request Throughput - Per Table Table.$KEYSPACE.$TABLE. ReadLatency.1MinuteRate WriteLatency.1MinuteRate
  37. 37. Request Latency - All Requests ClientRequest. Write.Latency.95percentile Read.Latency.95percentile
  38. 38. Request Latency - Per Table Table.$KEYSPACE.$TABLE.CoordinatorReadLatency.95percentile
  39. 39. Local Latency - Per Table Table.$KEYSPACE.$TABLE. WriteLatency.95percentile ReadLatency.95percentile
  40. 40. Local Read Path Table.$KEYSPACE.$TABLE. KeyCacheHitRate.value BloomFilterFalseRatio.value LiveScannedHistogram.95percentile TombstoneScannedHistogram.95percentile SSTablesPerReadHistogram.95percentile
  41. 41. Memory Usage Table.$KEYSPACE.$TABLE. BloomFilterOffHeapMemoryUsed.value IndexSummaryOffHeapMemoryUsed.value MemtableOnHeapSize.value MemtableOffHeapSize.value
  42. 42. Clients Client.connectedNativeClients.value CQL.PreparedStatementsRatio.value CQL.PreparedStatementsEvicted.value
  43. 43. Client Errors ClientRequest. $REQUEST.Unavailables.1MinuteRate $REQUEST.Timeouts.1MinuteRate $REQUEST.Failures.1MinuteRate
  44. 44. Inconsistency Storage.TotalHints.count HintedHandOffManager.Hints_created-$IP_ADDRESS.count Connection.TotalTimeouts.1MinuteRate Connection.$IP_ADDRESS.Timeouts.1MinuteRate
  45. 45. Inconsistency Will also want to monitor dropped messages, later…
  46. 46. Eventual Consistency ReadRepair.Attempted.1MinuteRate ReadRepair.RepairedBackground.1MinuteRate ReadRepair.RepairedBlocking.1MinuteRate
  47. 47. Server Errors Storage.Exceptions.count
  48. 48. Disk Usage Storage.Load.count Table.$KEYSPACE.$TABLE. TotalDiskSpaceUsed.count
  49. 49. Compactions Compaction.PendingTasks.value Compaction.TotalCompactionsCompleted.1MinuteRate Table.$KEYSPACE.$TABLE.PendingCompactions.value
  50. 50. Thread Pool Performance ThreadPools.request. MutationStage.PendingTasks.value ReadStage.PendingTasks.value CounterMutationStage.PendingTasks.value RequestResponseStage.PendingTasks.value ViewMutationStage.PendingTasks.value
  51. 51. Thread Pool Performance DroppedMessage. MUTATION.Dropped.1MinuteRate READ.Dropped.1MinuteRate
  52. 52. Thread Pool Performance DroppedMessage. $VERB.InternalDroppedLatency.95thPercentile $VERB.CrossNodeDroppedLatency.95thPercentile
  53. 53. Commit Log Performance CommitLog. PendingTasks.Value WaitingOnSegmentAllocation.95thPercentile WaitingOnCommit.Value
  54. 54. Thanks.
  55. 55. Aaron Morton @aaronmorton Co-Founder & Principal Consultant www.thelastpickle.com
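
Slides 14 and 15 describe a Meter as a per second rate smoothed into 1, 5, and 15 minute exponentially-weighted moving averages. The following is a minimal sketch of that idea, not the library's actual code. It assumes the rate is recomputed on a fixed 5 second tick and that the smoothing factor is derived from the window length; the names `Ewma`, `alpha_for`, and `TICK_SECONDS` are hypothetical and chosen only for illustration.

```python
import math

TICK_SECONDS = 5  # assumption: the moving average is updated every 5 seconds


def alpha_for(minutes, tick_seconds=TICK_SECONDS):
    """Smoothing factor for a moving average over a window of `minutes`."""
    return 1.0 - math.exp(-tick_seconds / 60.0 / minutes)


class Ewma:
    """Exponentially-weighted moving average of an event rate (events/second)."""

    def __init__(self, alpha, tick_seconds=TICK_SECONDS):
        self.alpha = alpha
        self.tick_seconds = tick_seconds
        self.uncounted = 0   # events seen since the last tick
        self.rate = None     # None until the first tick has happened

    def mark(self, n=1):
        """Record that n events occurred."""
        self.uncounted += n

    def tick(self):
        """Fold the events of the last interval into the moving average."""
        instant_rate = self.uncounted / self.tick_seconds
        self.uncounted = 0
        if self.rate is None:
            # The first tick seeds the average with the observed rate.
            self.rate = instant_rate
        else:
            # Move the average a fraction alpha towards the newest observation.
            self.rate += self.alpha * (instant_rate - self.rate)
        return self.rate


# Usage: three averages, mirroring the 1, 5, and 15 minute rates on slide 15.
one_minute = Ewma(alpha_for(1))
five_minute = Ewma(alpha_for(5))
fifteen_minute = Ewma(alpha_for(15))
```

Shorter windows get a larger smoothing factor, so the 1 minute rate reacts to a burst or a stall faster than the 15 minute rate. That is one reason the throughput and error slides chart the 1 minute rate.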

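
The naming scheme from slides 20 and 30 onwards can be sketched as follows: metric names live under org.apache.cassandra.metrics, templates such as Table.$KEYSPACE.$TABLE are filled in per table, the reporter whitelist pattern selects which names are sent, and the configured prefix is prepended on the way to Graphite. This is an illustrative sketch only. The helper names (`full_name`, `is_whitelisted`, `graphite_line`) are hypothetical, the exact matching semantics of the reporter's predicate are an assumption, and the output line follows Graphite's plaintext format of path, value, and timestamp.

```python
import re
from string import Template

METRICS_ROOT = "org.apache.cassandra.metrics"        # slide 30
PREFIX = "cassandra.prod.ip_1_2_3_4."                # prefix from slide 20
WHITELIST = [r"^org.apache.cassandra.metrics.+"]     # pattern from slide 20


def full_name(template, **params):
    """Expand $KEYSPACE / $TABLE style placeholders into a full metric name."""
    return METRICS_ROOT + "." + Template(template).substitute(**params)


def is_whitelisted(name, patterns=WHITELIST):
    """Assumed predicate: a name is reported if any whitelist pattern matches."""
    return any(re.search(p, name) for p in patterns)


def graphite_line(name, value, timestamp, prefix=PREFIX):
    """Format one sample as a Graphite plaintext line: 'path value timestamp'."""
    return f"{prefix}{name} {value} {timestamp}\n"


# Usage with a made-up keyspace and table.
name = full_name("Table.$KEYSPACE.$TABLE.ReadLatency.95percentile",
                 KEYSPACE="shop", TABLE="orders")
if is_whitelisted(name):
    line = graphite_line(name, 1234.5, 1460000000)
```

Putting the host identifier in the prefix, as the slide's configuration does, keeps per-node series separate in Graphite while the rest of the path stays identical across the cluster.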