
Metrics lightning talk


Introduction to thread pool metrics in Cassandra

Published in: Technology

  1. Cassandra Metrics. By: Chris Lohfink
  2. About Me
     • Engineer at Blackbird
     • Worked with C* since 0.8 (3 years)
     • 7 years as a Java/Python developer
     • Interests
       o Data Science
       o Hobbyist Electronics
       o Development
  3. About Cassandra
     • Fault tolerant to a fault
       o Easy to ignore until it gets bad
     • Like all other systems:
       o If there are not many events, no one pays attention to it
       o If there are a lot of events, you need to keep an eye on it
       o When things happen, you need information to diagnose them quickly
     Basically...
  4. Blackbird
  5. Lots of Metrics
     A lot of data, but without context or understanding it is not much use … but you have lots of pretty graphs
  6. Disclaimer
     This is not all of the important metrics; in fact, it is missing many critical ones:
     • Heap
     • OS metrics
     • Latencies
     • Log messages
  7. An Example for a little background
     [Diagram. Labels: ClientRequest; Messaging Service; ReadStage (Threads x32); RequestResponse (Threads); ReadRepairStage (Threads); three queues marked 2^31-1]
  8. Cassandra Key Metrics
     ● Cassandra's internal messaging is based on SEDA (staged event-driven architecture), with many asynchronous elements
     ● It is easy to overrun the processing capabilities of a stage that is not in the request's feedback loop (e.g. ReadRepairStage)
  9. Access the metrics
     ● nodetool tpstats

         Pool Name                Active  Pending  Completed  Blocked  All time blocked
         ReadStage                     0        0     113702        0                 0
         RequestResponseStage          0        0          0        0                 0
         MutationStage                 0        0     164503        0                 0
         ...
         InternalResponseStage         0        0          0        0                 0
         HintedHandoff                 0        0          0        0                 0

         Message type        Dropped
         RANGE_SLICE               0
         READ_REPAIR               0
         ...
         REQUEST_RESPONSE          0
         COUNTER_MUTATION          0

     ● JMX: org.apache.cassandra.request:type=* and org.apache.cassandra.internal:type=*
     ● Metrics Reporter

     MBean attribute (tpstats name): description
     • ActiveCount (Active): Number of tasks pulled off the queue with a thread currently processing them.
     • PendingTasks (Pending): Number of tasks in the queue waiting for a thread.
     • CompletedTasks (Completed): Number of tasks completed.
     • CurrentlyBlockedTasks (Blocked): When a pool reaches its core pool size (configurable or set per stage, more below) it will begin queuing until the max size is reached. When this is reached it will block until there is room in the queue.
     • TotalBlockedTasks (All time blocked): Total number of tasks that have been blocked.
  10. Examples
      • Read/Mutation Stage
        o Too many reads/writes, disk failure, poor tuning
      • ReplicateOnWrite (CounterMutationStage in 2.1+)
        o High throughput of counter increments
      • FlushWriter
        o Writes overrunning disk capabilities, poor tuning
        o Large collections
      • GossipStage
        o vnodes + many servers (pre 2.0.3)
  11. Questions?
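
Slide 8 warns that a stage outside the request's feedback loop can be overrun. A toy model may make that concrete. This is only a sketch, not Cassandra code: `simulate_stage` and all of its numbers are hypothetical, and it assumes a fixed pool of worker threads in front of a queue large enough to be treated as unbounded (which is how the 2^31-1 labels on slide 7 read).

```python
def simulate_stage(arrivals_per_tick, threads, tasks_per_thread_per_tick, ticks):
    """Return the pending-queue depth after each tick for a single stage.

    Hypothetical model: tasks arrive at a constant rate, nothing slows the
    producer down, and the queue is treated as unbounded.
    """
    capacity = threads * tasks_per_thread_per_tick  # tasks the pool finishes per tick
    pending = 0
    history = []
    for _ in range(ticks):
        pending += arrivals_per_tick       # new tasks are queued without push-back
        pending -= min(pending, capacity)  # worker threads drain what they can
        history.append(pending)
    return history


# Hypothetical load: 32 threads, each finishing 10 tasks per tick (320 per tick).
keeping_up = simulate_stage(300, 32, 10, 5)  # arrivals below capacity
overrun = simulate_stage(400, 32, 10, 5)     # arrivals above capacity

print(keeping_up)
print(overrun)
```

When arrivals stay under capacity the pending count returns to zero every tick; once they exceed it, the backlog grows by the difference each tick and never recovers on its own. That growing number is what the Pending column reports.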
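
The `nodetool tpstats` output on slide 9 is plain text, so it can be scraped when JMX or a metrics reporter is not wired up. The sketch below assumes the exact layout shown on that slide (five numeric columns per pool, one per message type); other Cassandra versions print different columns. `parse_tpstats`, `flag_pools`, and the `max_pending` threshold are hypothetical names and values chosen for illustration.

```python
def parse_tpstats(text):
    """Split tpstats text into ({pool: stats}, {message_type: dropped})."""
    fields = ["active", "pending", "completed", "blocked", "all_time_blocked"]
    pools, dropped = {}, {}
    section = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Pool":          # header of the thread pool table
            section = "pools"
        elif parts[0] == "Message":     # header of the dropped-message table
            section = "dropped"
        elif not all(p.isdigit() for p in parts[1:]):
            continue                    # skip "..." or any row we do not understand
        elif section == "pools" and len(parts) == 6:
            pools[parts[0]] = dict(zip(fields, map(int, parts[1:])))
        elif section == "dropped" and len(parts) == 2:
            dropped[parts[0]] = int(parts[1])
    return pools, dropped


def flag_pools(pools, dropped, max_pending=100):
    """Return warning strings; max_pending is an arbitrary example threshold."""
    warnings = []
    for name, stats in pools.items():
        if stats["pending"] > max_pending:
            warnings.append(f"{name}: {stats['pending']} pending")
        if stats["blocked"] > 0 or stats["all_time_blocked"] > 0:
            warnings.append(f"{name}: tasks blocked")
    for message_type, count in dropped.items():
        if count > 0:
            warnings.append(f"{message_type}: {count} dropped")
    return warnings


# Rows copied from slide 9.
SAMPLE = """\
Pool Name                Active  Pending  Completed  Blocked  All time blocked
ReadStage                     0        0     113702        0                 0
MutationStage                 0        0     164503        0                 0

Message type        Dropped
RANGE_SLICE               0
READ_REPAIR               0
"""

pools, dropped = parse_tpstats(SAMPLE)
print(pools["ReadStage"]["completed"])
print(flag_pools(pools, dropped))
```

On the idle node from the slide nothing is flagged. A stage from slide 10 that is backing up would show a rising Pending count or a non-zero Dropped count, and a sensible threshold depends on the stage and the workload.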