
Cassandra Metrics


Presentation on metrics for the 2014 Cassandra Summit

Published in: Technology


  1. 1. Monitor Everything By: Chris Lohfink Cassandra Summit 2014
  2. 2. About Me
     ● Sr. Engineer at Pythian
       o Lead of Cassandra Practice
     ● Remote in Minnesota
     ● DataStax MVP for Cassandra ‘14
     ● Interests
       o Java, Clojure, Python dev
       o Data science
       o Hobbyist electronics
     #CassandraSummit 2014
  3. 3. About Pythian
     Pythian is a global data outsourcing and consulting company that specializes in optimizing and managing mission-critical data systems. Pythian blends the world’s leading data experts with advanced, secure service delivery processes to create the industry’s best standard of care for its clients. Since its inception, Pythian has managed some of the world’s largest, most business-critical data infrastructures.
     ● 10,000: Pythian currently manages more than 10,000 systems.
     ● 350: Pythian currently employs more than 350 people in 25 countries worldwide.
     ● 1997: Pythian was founded in 1997.
  4. 4. About Cassandra
     ● No Single Point of Failure
     ● Fault Tolerant
     ● Awesome properties for an operations team that does not want to get up at 3am
  5. 5. About Cassandra
     ● Nothing should be set up and forgotten about
     ● Easy to do with Cassandra though
       o Fault tolerance on a properly configured setup handles a single node being down or having temporary performance issues
       o No back pressure on writes until there is a lot of trouble
  6. 6. Utilize the fault tolerance buffer
     ● Need to observe and react to current issues
     ● Predict future issues
     ● Divide this into two approaches
       o Proactive
       o Reactive
  7. 7. Proactive
     ● Daily and weekly checkups to prevent and predict problems
       o Capacity
       o Performance bottlenecks
       o Data modeling issues
  8. 8. Reactive
     ● Something about best laid plans…
       o Hardware failures
       o Bugs
       o Malicious or non-malicious users
     ● Alarms, Pager Duty
  9. 9. Common element
     ● Data is needed to
       o form alerts
       o find anomalies
       o trend
       o debug
  10. 10. Metrics
      ● Window to the application
        o "Bridge the gap" (Coda Hale)
  11. 11. Gathering Metrics SOURCES Cassandra Environment OpsCenter Logs JMX CPU, Disk, Network Nodetool JVM, GC #CassandraSummit 2014
  12. 12. Metrics but of course… Without context, the data is just pretty graphs
  13. 13. JMX
      ● Java Management Extensions
      ● Complex… very engineered
      ● Resources represented as objects with attributes and operations
      ● Used for monitoring or as input
  14. 14. JMX
      ● The annoying gateway to metrics
        o Poor tooling - requires Java
        o Slow, memory leaks
        o Historically and currently frustrating for ops (pre 2.0.8)
      ● Connection sequence between the client (you) and Cassandra
        o Client: init connection to port 7199
        o Cassandra: reply with hostname:port (port in 1024-65535) for the RMI connection
        o Client: gets new hostname:port, drops old connection and attempts to connect
        o Connected!
  15. 15. JMX
      ● Visual
        o jconsole
        o visualvm
      ● Command line
        o jmxterm
        o jmxsh
      ● MX4J
      ● Jolokia
  16. 16. JMX [domain]:[key]=[value],[key2]=[value2]... #CassandraSummit 2014
  17. 17. JMX [domain]:[key]=[value],[key2]=[value2]... com.pythian:site=blog,type=views,target=post1 #CassandraSummit 2014
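The naming pattern on this slide can be taken apart mechanically. The following is a minimal sketch, not the JMX implementation: it assumes simple unquoted values (real ObjectNames may also contain quoted values and wildcards, which this ignores), and the function name `parse_object_name` is made up for illustration.

```python
from typing import Dict, Tuple


def parse_object_name(name: str) -> Tuple[str, Dict[str, str]]:
    """Split 'domain:key=value,key2=value2' into (domain, properties).

    Simplification: quoted values containing ',' or '=' are not handled.
    """
    domain, sep, prop_text = name.partition(":")
    if not sep or not domain or not prop_text:
        raise ValueError(f"expected domain:key=value,... but got {name!r}")
    properties: Dict[str, str] = {}
    for pair in prop_text.split(","):
        key, eq, value = pair.partition("=")
        if not eq:
            raise ValueError(f"property without '=': {pair!r}")
        properties[key.strip()] = value.strip()
    return domain, properties


# The example name from the slide.
domain, props = parse_object_name("com.pythian:site=blog,type=views,target=post1")
print(domain)
print(props)
```

The same function applies to the Cassandra names shown later in the deck, such as `org.apache.cassandra.metrics:type=ThreadPools,...`.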
  20. 20. JMX Domains
      org.apache.cassandra.
      ● db
      ● internal
      ● net
      ● request
  21. 21. JMX Beans
      org.apache.cassandra.metrics
      ● db
      ● internal
      ● net
      ● request
  22. 22. JMX
      org.apache.cassandra.metrics:type=
      ● Cache
      ● Client
      ● ClientRequest
      ● ClientRequestMetrics
      ● ColumnFamily
      ● CommitLog
      ● Compaction
      ● DroppedMessage
      ● FileCache
      ● Keyspace
      ● Storage
      ● ThreadPools
  23. 23. JMX
      org.apache.cassandra.metrics
      ● type=*, scope=*, name=*
      ● type=ThreadPools, path=*, scope=*, name=*
      ● type=ColumnFamily, keyspace=*, scope=*, name=*
      ● type=Keyspace, keyspace=*, name=*
  24. 24. Metrics
      ● Toolkit called Metrics, for metrics
        o By Coda Hale @ Yammer
      ● Easy to use
      ● Easy to read (if you know Java)
      ● Popular
  25. 25. Types of Metrics
      ● Gauge
        o instantaneous value
      ● Counter
        o number that can be incremented & decremented
      ● Meter
        o rate of events over time (1/5/15 min moving avg)
      ● Histogram
        o representation of statistical distribution
          - 50, 75, 95, 98, 99, 99.9 percentile
          - average, median, min, max, standard deviation
      ● Timer
        o rate of events (meter)
        o histogram of duration
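To make the metric types above concrete, here is a simplified sketch in Python. It is not the Metrics library's code: the histogram keeps every sample instead of a bounded reservoir, percentiles use the nearest-rank rule, and the meter is reduced to a single exponentially weighted moving average that is ticked by hand. All class names are illustrative.

```python
import math


class Counter:
    """A number that can be incremented and decremented."""

    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n

    def dec(self, n=1):
        self.count -= n


class Histogram:
    """Statistical distribution of recorded values (keeps all samples)."""

    def __init__(self):
        self.samples = []

    def update(self, value):
        self.samples.append(value)

    def percentile(self, p):
        if not self.samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples)
        # Nearest-rank: smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p / 100.0 * len(ordered)))
        return ordered[rank - 1]


class MovingAverageRate:
    """Events per second, smoothed over roughly `window_seconds`."""

    def __init__(self, window_seconds=60.0, tick_seconds=5.0):
        self.tick_seconds = tick_seconds
        self.alpha = 1.0 - math.exp(-tick_seconds / window_seconds)
        self.uncounted = 0
        self.rate = None

    def mark(self, n=1):
        self.uncounted += n

    def tick(self):
        """Call once per `tick_seconds` to fold new events into the average."""
        instant = self.uncounted / self.tick_seconds
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant
        else:
            self.rate += self.alpha * (instant - self.rate)


latencies = Histogram()
for micros in range(1, 101):
    latencies.update(micros)
print("75th percentile:", latencies.percentile(75))

one_minute = MovingAverageRate(window_seconds=60.0)
for _ in range(12):
    one_minute.mark(50)   # 50 events in each 5 second tick
    one_minute.tick()
print("one minute rate:", one_minute.rate, "events/second")
```

A timer, in these terms, is a rate object and a histogram of durations updated together on every call.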
  26. 26. JMX
      ● 75th percentile is 683 MICROSECONDS (75% took 683us or less)
      ● One minute rate is 13,915 calls per SECOND
  27. 27. JMX
      ● Overwhelming at first
      ● Hard to tell what the metrics mean without the source
      ● Names move around a lot
      ● Fortunately there is nodetool
  28. 28. Nodetool
      ● JMX command line wrapper
      ● Many options
      ● Operations and diagnostic procedures
      ● For reactive analysis
        o ad hoc, spot checks
  29. 29. Nodetool tpstats
      nodetool tpstats
      org.apache.cassandra.metrics:type=ThreadPools,path={internal|request|transport},scope={*},name={ActiveTasks|PendingTasks|CompletedTasks|CurrentlyBlockedTasks|TotalBlockedTasks}
      Pool Name               Active  Pending  Completed  Blocked  All time blocked
      ReadStage                    0        0     113702        0                 0
      RequestResponseStage         0        0          0        0                 0
      MutationStage                0        0     164503        0                 0
      ...
      InternalResponseStage        0        0          0        0                 0
      HintedHandoff                0        0          0        0                 0
      Message type        Dropped
      RANGE_SLICE               0
      READ_REPAIR               0
      ...
      REQUEST_RESPONSE          0
      COUNTER_MUTATION          0
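For spot checks it can help to turn this output into data. The sketch below parses text shaped like the sample on the slide and flags pools with pending or blocked tasks and message types with drops. The column layout is assumed from the slide and may differ between Cassandra versions; the function names are illustrative.

```python
SAMPLE = """\
Pool Name               Active  Pending  Completed  Blocked  All time blocked
ReadStage                    0        0     113702        0                 0
RequestResponseStage         0        0          0        0                 0
MutationStage                0        0     164503        0                 0
InternalResponseStage        0        0          0        0                 0
HintedHandoff                0        0          0        0                 0

Message type        Dropped
RANGE_SLICE               0
READ_REPAIR               0
REQUEST_RESPONSE          1
COUNTER_MUTATION          0
"""


def parse_tpstats(text):
    """Return (pools, dropped) dictionaries from tpstats-style text."""
    pools, dropped = {}, {}
    for line in text.splitlines():
        parts = line.split()
        # Pool rows: a name followed by five integer columns.
        if len(parts) == 6 and all(p.isdigit() for p in parts[1:]):
            active, pending, completed, blocked, all_blocked = map(int, parts[1:])
            pools[parts[0]] = {
                "active": active,
                "pending": pending,
                "completed": completed,
                "blocked": blocked,
                "all_time_blocked": all_blocked,
            }
        # Dropped-message rows: a name followed by one integer.
        elif len(parts) == 2 and parts[1].isdigit():
            dropped[parts[0]] = int(parts[1])
    return pools, dropped


def find_problems(pools, dropped):
    """List human-readable warnings for backed-up pools and dropped messages."""
    problems = []
    for name, stats in pools.items():
        if stats["pending"] > 0 or stats["blocked"] > 0:
            problems.append(f"{name}: pending={stats['pending']} blocked={stats['blocked']}")
    for message_type, count in dropped.items():
        if count > 0:
            problems.append(f"{message_type}: dropped={count}")
    return problems


pools, dropped = parse_tpstats(SAMPLE)
for problem in find_problems(pools, dropped):
    print(problem)
```

The sample deliberately sets one dropped REQUEST_RESPONSE so the check has something to report.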
  30. 30. Staged Event Driven Architecture
      ● Decomposes a complex event system
      ● Set of stages (thread pools)
      ● Queue between each stage
      ● Shares many of the pros and cons of SOA
  31. 31. Staged Event Driven Architecture
      [Diagram: Client Request, ReadStage (Threads x32), RequestResponse Threads, ReadRepairStage Threads and Messaging Service, spread across Node 1 and Node 2; legend: box = Task]
  32. 32. Staged Event Driven Architecture
      ● Possible to overrun the processing capabilities of a stage that is not in the request's feedback loop (e.g. ReadRepairStage)
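The stage-and-queue idea can be sketched in a few lines. This is an illustration of the pattern only, not Cassandra's implementation: each stage owns a queue and a pool of worker threads, hands its result to the next stage, and exposes the pending and completed counts that tpstats-style tooling would report. The stage names are borrowed from the slides purely as labels.

```python
import queue
import threading


class Stage:
    """A thread pool fed by its own queue, optionally chained to a next stage."""

    def __init__(self, name, handler, workers, next_stage=None):
        self.name = name
        self.handler = handler
        self.next_stage = next_stage
        self.tasks = queue.Queue()
        self.completed = 0
        self._lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item):
        self.tasks.put(item)

    def pending(self):
        """Tasks waiting in this stage's queue."""
        return self.tasks.qsize()

    def drain(self):
        """Block until every submitted task has been processed."""
        self.tasks.join()

    def _run(self):
        while True:
            item = self.tasks.get()
            try:
                result = self.handler(item)
                if self.next_stage is not None:
                    self.next_stage.submit(result)
                with self._lock:
                    self.completed += 1
            finally:
                self.tasks.task_done()


responses = []
respond = Stage("RequestResponseStage", responses.append, workers=2)
read = Stage("ReadStage", lambda key: key.upper(), workers=4, next_stage=respond)

for key in ["a", "b", "c"]:
    read.submit(key)

read.drain()      # every read has been handed to the next stage
respond.drain()   # every response has been recorded
print(sorted(responses), read.completed, respond.completed)
```

Nothing here limits how fast `read` can fill the queue of `respond`; that is the overrun risk the slide describes for stages outside the request's feedback loop.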
  39. 39. Nodetool tpstats
      nodetool tpstats
      org.apache.cassandra.metrics:type=ThreadPools,path={internal|request|transport},scope={*},name={ActiveTasks|PendingTasks|CompletedTasks|CurrentlyBlockedTasks|TotalBlockedTasks}
      Pool Name               Active  Pending  Completed  Blocked  All time blocked
      ReadStage                    0        0     113702        0                 0
      RequestResponseStage         0        0          0        0                 0
      MutationStage                0        0     164503        0                 0
      ...
      InternalResponseStage        0        0          0        0                 0
      HintedHandoff                0        0          0        0                 0
      Message type        Dropped
      RANGE_SLICE               0
      READ_REPAIR               0
      ...
      REQUEST_RESPONSE          0
      COUNTER_MUTATION          0
      [Callout: RequestResponse Threads]
  40. 40. Nodetool tpstats
      nodetool tpstats
      org.apache.cassandra.metrics:type=ThreadPools,path={internal|request|transport},scope={*},name={ActiveTasks|PendingTasks|CompletedTasks|CurrentlyBlockedTasks|TotalBlockedTasks}
      Pool Name               Active  Pending  Completed  Blocked  All time blocked
      ReadStage                    0        0     113702        0                 0
      RequestResponseStage         0        0          0        0                 0
      MutationStage                0        0     164503        0                 0
      ...
      InternalResponseStage        0        0          0        0                 0
      HintedHandoff                0        0          0        0                 0
      Message type        Dropped
      RANGE_SLICE               0
      READ_REPAIR               0
      ...
      REQUEST_RESPONSE          1
      COUNTER_MUTATION          0
      [Callout: RequestResponse Threads]
  41. 41. Nodetool tpstats
      nodetool tpstats
      org.apache.cassandra.metrics:type=ThreadPools,path={internal|request|transport},scope={*},name={ActiveTasks|PendingTasks|CompletedTasks|CurrentlyBlockedTasks|TotalBlockedTasks}
      More at: http://www.evidencebasedit.com/guide-to-cassandra-thread-pools
  42. 42. Nodetool cfhistograms
      nodetool cfhistograms {keyspace} {table}
      org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}
      SSTables per Read
        1 sstables: 98554
        2 sstables: 4534
      Write Latency (microseconds)
        No Data
      Read Latency (microseconds)
        10 us: 2
        12 us: 17
        14 us: 96
        17 us: 208
        20 us: 677
        24 us: 3081
        29 us: 4552
        35 us: 3559
  43. 43. Read Write Path: mile high overview
      [Diagram: Writes and Reads flowing through the Memtable and SSTable]
  49. 49. Nodetool cfhistograms 1.1
      nodetool cfhistograms {keyspace} {table}
      org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}
      Offset    SSTables  Write Latency  Read Latency  Row Size  Column Count
      1             3579              0             0         0             0
      2                0              0             0         0             0
      . . .
      35               0              0             0         0             0
      42               0              0            27         0             0
      50               0              0           187         0             0
      60               0             10           460         0             0
      72               0            200           689         0             0
      86               0            663           552         0             0
      103              0            796           367         0             0
      124              0            297           736         0             0
      149              0            265           243         0             0
      179              0            460           263         0             0
      . . .
      25109160         0              0             0         0             0
  50. 50. Nodetool cfhistograms #CassandraSummit 2014 https://gist.github.com/clohfink/6068003
  51. 51. Nodetool cfhistograms 2.1
      nodetool cfhistograms {keyspace} {table}
      org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}
      Keyspace/Table histograms
      Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                                 (micros)      (micros)         (bytes)
      50%             1.00          10.00        524.00             310           5
      75%             1.00          11.75        888.00             310           5
      95%             1.00          15.00       4843.75             310           5
      98%             1.00          17.00       9658.90             310           5
      99%             1.00          19.00      12306.47             310           5
      Min             0.00           0.00         68.00              30           0
      Max             2.00      219386.00      45383.00             310           5
  52. 52. Nodetool cfstats
      nodetool cfstats {-i} {keyspace}.{table}
      org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}
      Keyspace: Keyspace1
        Read Count: 11207
        Read Latency: 0.047931114482020164 ms.
        Write Count: 17598
        Write Latency: 0.053502954881236506 ms.
        Pending Tasks: 0
          Table: Standard1
          SSTable count: 3
          Space used (live), bytes: 9088955
          Space used (total), bytes: 9088955
          Space used by snapshots (total), bytes: 0
          SSTable Compression Ratio: 0.3672150946
          Memtable cell count: 0
          Memtable data size, bytes: 0
          Memtable switch count: 3
          Local read count: 11207
          Local read latency: 0.048 ms
          Local write count: 17598
          Local write latency: 0.054 ms
          Pending tasks: 0
          Bloom filter false positives: 0
          Bloom filter false ratio: 0.00000
          Bloom filter space used, bytes: 11688
          Compacted partition minimum bytes: 1110
          Compacted partition maximum bytes: 126934
          Compacted partition mean bytes: 2730
          Average live cells per slice: 0.0
          Average tombstones per slice: 0.0
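For daily or weekly checkups, the key/value lines of this output are easy to collect into a dictionary and compare against limits. The sketch below handles the output of a single table as shown on the slide (a few lines are included as sample data). The thresholds are made-up placeholders for illustration, not recommendations, and the function names are hypothetical.

```python
SAMPLE = """\
Table: Standard1
SSTable count: 3
Space used (live), bytes: 9088955
Local read count: 11207
Local read latency: 0.048 ms
Local write latency: 0.054 ms
Compacted partition maximum bytes: 126934
Compacted partition mean bytes: 2730
Average tombstones per slice: 0.0
"""


def parse_cfstats(text):
    """Collect 'name: value' lines into a dict of strings (single table only)."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def number(value):
    """Leading numeric token of a value such as '0.048 ms' as a float."""
    return float(value.split()[0])


def check_table(stats, max_sstables=20, max_partition_bytes=100 * 1024 * 1024,
                max_tombstones_per_slice=100.0):
    """Compare a few fields against placeholder limits; return warnings."""
    warnings = []
    if number(stats["SSTable count"]) > max_sstables:
        warnings.append("many SSTables: reads may touch several files")
    if number(stats["Compacted partition maximum bytes"]) > max_partition_bytes:
        warnings.append("very large partition: possible data modeling issue")
    if number(stats["Average tombstones per slice"]) > max_tombstones_per_slice:
        warnings.append("many tombstones per read")
    return warnings


stats = parse_cfstats(SAMPLE)
print(stats["Table"], check_table(stats) or "no warnings")
```

Tracking the same fields over time gives the trending data that the proactive approach relies on.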
  56. 56. Nodetool cfstats
      nodetool cfstats {-i} {keyspace}.{table}
      org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}
      Keyspace: Keyspace1
        Read Count: 11207
        Read Latency: 0.047931114482020164 ms.
        Write Count: 17598
        Write Latency: 0.053502954881236506 ms.
        Pending Tasks: 0
          Table: Standard1
          SSTable count: 3
          SSTables in each level: [14/4, 1, 0, …, 0]
          Space used (live), bytes: 9088955
          Space used (total), bytes: 9088955
          Space used by snapshots (total), bytes: 0
          SSTable Compression Ratio: 0.3672150946
          Memtable cell count: 0
          Memtable data size, bytes: 0
          Memtable switch count: 3
          Local read count: 11207
          Local read latency: 0.048 ms
          Local write count: 17598
          Local write latency: 0.054 ms
          Pending tasks: 0
          Bloom filter false positives: 0
          Bloom filter false ratio: 0.00000
          Bloom filter space used, bytes: 11688
          Compacted partition minimum bytes: 1110
          Compacted partition maximum bytes: 126934
          Compacted partition mean bytes: 2730
          Average live cells per slice: 0.0
          Average tombstones per slice: 0.0
  65. 65. Nodetool proxyhistograms
      nodetool proxyhistograms
      org.apache.cassandra.metrics:type=ClientRequest,scope={Read|Write|RangeSlice},name=Latency
      $ nodetool proxyhistograms
      proxy histograms
      Read Latency (microseconds)
        61214 us: 1
      Write Latency (microseconds)
        103 us: 22
        124 us: 142
        149 us: 297
        179 us: 1190
        215 us: 1823
        258 us: 2091
        ...
  66. 66. Nodetool
      Much more!! http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html
  67. 67. Reporting Interface
      ● Default: JMX, Console, Csv, Slf4j
      ● Addons: Ganglia, Graphite
      ● Community: Cassandra, StatsD, NewRelic, Splunk, Cloudwatch, Kafka, Riemann, TempDB, Munin, Riak, InfluxDB, Sematext, MongoDB, OpenTSDB, Librato, … MORE
  68. 68. Reporting Interface
      ● Configurable with yaml
        o console, csv, ganglia, graphite, riemann
      ● Create a reporter with a premain agent
        o compile a new jar with a manifest
        o add it to the classpath
        o add the javaagent in cassandra-env.sh
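Whatever reporter is configured, it ends up pushing named values to a backend at an interval. As one illustration, the sketch below builds lines in Graphite's plaintext format (`path value timestamp`, one metric per line, conventionally sent to TCP port 2003). It is not the reporter shipped with Cassandra or the Metrics library; the metric path and host name are hypothetical.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host, port=2003, timeout=5.0):
    """Send already formatted lines to a Graphite (carbon) listener."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)


# Hypothetical metric path, with values taken from the tpstats sample in the slides.
lines = [
    graphite_line("cassandra.node1.ThreadPools.ReadStage.CompletedTasks", 113702, 1410000000),
    graphite_line("cassandra.node1.ThreadPools.ReadStage.PendingTasks", 0, 1410000000),
]
print("".join(lines), end="")
# send_metrics(lines, host="graphite.example.com")  # hypothetical host, left disabled
```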
  69. 69. Garbage Collection
      ● Death, taxes, and a stop-the-world GC
      ● Common issue to all JVM-based applications
  70. 70. Garbage Collection
      Enable GC logging
      ● Virtually no overhead
      ● Can be very helpful in diagnosing performance issues
  71. 71. Garbage Collection
      JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
      JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
      JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
      JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
      JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
      JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
      JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1"
      JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"
      JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
      JVM_OPTS="$JVM_OPTS -XX:+UseGCLogFileRotation"
      JVM_OPTS="$JVM_OPTS -XX:NumberOfGCLogFiles=10"
      JVM_OPTS="$JVM_OPTS -XX:GCLogFileSize=10M"
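Once GC logging is on, the pause lines written because of `-XX:+PrintGCApplicationStoppedTime` are simple to summarize. The sketch below looks for the "Total time for which application threads were stopped" lines and reports the count, total, and longest pause. The sample lines are invented for illustration, exact log formatting varies by JVM version, and the 0.5 second threshold is an arbitrary placeholder.

```python
import re

STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def stopped_times(lines):
    """Yield each application-stopped duration (seconds) found in the log lines."""
    for line in lines:
        match = STOPPED.search(line)
        if match:
            yield float(match.group(1))


def summarize(lines, threshold_seconds=0.5):
    """Count pauses, total and longest pause, and pauses at or above the threshold."""
    pauses = list(stopped_times(lines))
    return {
        "count": len(pauses),
        "total_seconds": sum(pauses),
        "max_seconds": max(pauses, default=0.0),
        "long_pauses": sum(1 for p in pauses if p >= threshold_seconds),
    }


# Invented sample lines in the general shape of a HotSpot GC log.
sample_log = [
    "2014-09-10T12:00:01.100+0000: 10.500: Total time for which application threads were stopped: 0.25 seconds",
    "2014-09-10T12:00:05.300+0000: 14.700: [GC (Allocation Failure) ...]",
    "2014-09-10T12:00:09.900+0000: 19.300: Total time for which application threads were stopped: 1.5 seconds",
]
print(summarize(sample_log))
```

To run it against a real log, pass an open file object instead of the list, for example the `gc.log` path configured above.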
  73. 73. Garbage Collection
      Could be its own talk. Honorable mentions:
      ● https://github.com/chewiebug/GCViewer
      ● http://jworks.idv.tw/GcWeb/
      ● Python, R, Octave
  74. 74. Logging
      ● /var/log/cassandra/system.log
        o provides a rolling log
        o log4j
      ● /var/log/cassandra/output.log
        o captured standard error and standard out
        o truncated on restart
      ● System logs
        o syslog, dmesg, etc
  75. 75. OS Metrics #CassandraSummit 2014 Shout-out: http://www.brendangregg.com/linuxperf.html
  76. 76. JVM
      ● Heap
        o GC logs
        o JMX
      ● Threads
        o jvmtop
        o jstack (+htop)
        o kill -3
        o JMX
  77. 77. And Everything #CassandraSummit 2014
  78. 78. Questions ? #CassandraSummit 2014
