Metrics lightning talk

This talk covers key Cassandra metrics to monitor. It notes that because Cassandra is so fault tolerant, problems are easy to ignore until they get bad, and that when something does go wrong you need information to diagnose it quickly. It then describes Cassandra's staged thread pools, such as ReadStage, RequestResponseStage, and MutationStage, and shows how to read their active, pending, completed, and blocked task counts with nodetool tpstats and JMX. It also cautions that metrics without context are of little use, and notes that it leaves out many critical metrics, such as heap and OS metrics.

Blackbird
About Me
• Engineer at Blackbird
• Worked with C* since 0.8 (3 years)
• 7 years as a Java/Python developer
• Interests
o Data Science
o Hobbyist Electronics
o Development

About Cassandra
• Fault tolerant to a fault
o Easy to ignore until it gets bad
• Like all other systems:
o If there are few events, no one pays attention to it
o If there are a lot of events, you need to keep an eye on it
o When things happen, you need information to diagnose them quickly
Basically...

Lots of Metrics
A lot of data, but without context or understanding it isn't much use
… but you have lots of pretty graphs

Disclaimer
This is not all of the important metrics; in fact, it is missing many critical ones:
• Heap
• OS metrics
• Latencies
• Log messages

An example, for a little background
[Diagram: ClientRequest, ReadStage (x32 threads), RequestResponse, and ReadRepairStage thread pools connected through the MessagingService; queues labeled 2^31-1]

Cassandra Key Metrics
● Cassandra's internal messaging is based on SEDA (staged event-driven architecture), with many asynchronous elements
● It's easy to overrun the processing capacity of a stage that is not in the request's feedback loop (e.g. ReadRepairStage)
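To see why a stage outside the request's feedback loop is the easy one to overrun, here is a toy model in Python. It is an illustration under simplifying assumptions, not Cassandra code. Each tick, a stage completes at most `service_rate` tasks. In the closed-loop case, a fixed number of clients each keep one task in flight and submit again only after it completes, the way a client waits for its response. In the open-loop case, work arrives at a fixed `arrival_rate` regardless of completions, roughly how background work such as read repair gets fired off. The function `simulate_stage` and its parameters are hypothetical names.

```python
def simulate_stage(ticks, service_rate, arrival_rate=None, clients=None):
    """Toy model of a single SEDA stage; returns the pending count after each tick.

    Pass `arrival_rate` for an open loop (work keeps arriving no matter what),
    or `clients` for a closed loop (each client waits for its task to finish).
    This is a hypothetical illustration, not how Cassandra schedules work.
    """
    if (arrival_rate is None) == (clients is None):
        raise ValueError("pass exactly one of arrival_rate or clients")
    pending = 0
    idle_clients = clients or 0  # closed loop: clients ready to submit a task
    history = []
    for _ in range(ticks):
        if arrival_rate is not None:
            pending += arrival_rate        # open loop: ignores completions
        else:
            pending += idle_clients        # closed loop: only idle clients submit
            idle_clients = 0
        done = min(service_rate, pending)  # the stage's threads drain the queue
        pending -= done
        if arrival_rate is None:
            idle_clients += done           # each completion frees its client
        history.append(pending)
    return history


print(simulate_stage(5, service_rate=8, clients=10))       # [2, 2, 2, 2, 2]
print(simulate_stage(5, service_rate=8, arrival_rate=10))  # [2, 4, 6, 8, 10]
```

With the same service rate, the closed-loop backlog stays bounded by the number of clients, while the open-loop backlog grows by `arrival_rate - service_rate` every tick. In tpstats terms, that is a Pending count that only ever climbs.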

Access the metrics
● nodetool tpstats
    Pool Name                Active   Pending   Completed   Blocked   All time blocked
    ReadStage                     0         0      113702         0                  0
    RequestResponseStage          0         0           0         0                  0
    MutationStage                 0         0      164503         0                  0
    ...
    InternalResponseStage         0         0           0         0                  0
    HintedHandoff                 0         0           0         0                  0

    Message type           Dropped
    RANGE_SLICE                  0
    READ_REPAIR                  0
    ...
    REQUEST_RESPONSE             0
    COUNTER_MUTATION             0
● JMX: org.apache.cassandra.request:type=* and org.apache.cassandra.internal:type=*
● Metrics Reporter

MBean attribute         tpstats name       Description
ActiveCount             Active             Number of tasks pulled off the queue that a thread is currently processing
PendingTasks            Pending            Number of tasks in the queue waiting for a thread
CompletedTasks          Completed          Number of tasks completed
CurrentlyBlockedTasks   Blocked            When a pool reaches its core pool size (configurable or set per stage, more below), it begins queuing until the max size is reached; once that is reached, it blocks until there is room in the queue
TotalBlockedTasks       All time blocked   Total number of tasks that have been blocked
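Because `nodetool tpstats` prints plain text, a small script can turn it into numbers to alert on. The Python sketch below assumes the column layout shown above (Pool Name, Active, Pending, Completed, Blocked, All time blocked) and that pool names contain no spaces; other Cassandra versions may print different columns, so check your own output before relying on it. `parse_tpstats` and `backed_up` are hypothetical helper names.

```python
def parse_tpstats(text):
    """Parse the thread-pool table of `nodetool tpstats` output.

    Assumes the six-column layout shown on the slide. Returns
    {pool_name: {"active", "pending", "completed", "blocked",
    "all_time_blocked"} -> int}. Parsing stops at the "Message type" table.
    """
    columns = ["active", "pending", "completed", "blocked", "all_time_blocked"]
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "Pool":
            continue  # blank line or header row
        if parts[0] == "Message":
            break  # start of the dropped-messages table
        if len(parts) != 6 or not all(p.isdigit() for p in parts[1:]):
            continue  # skip "..." or rows that don't match the assumed layout
        pools[parts[0]] = dict(zip(columns, map(int, parts[1:])))
    return pools


def backed_up(pools):
    """Names of pools with queued (Pending) or currently Blocked tasks, sorted."""
    return sorted(name for name, stats in pools.items()
                  if stats["pending"] > 0 or stats["blocked"] > 0)
```

On a live node, the input could come from `subprocess.run(["nodetool", "tpstats"], capture_output=True, text=True).stdout`. A single nonzero reading may only be a burst, so comparing successive samples gives the numbers some context.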

Examples
• Read/Mutation Stage
o Too many reads/writes, disk failure, poor tuning
• ReplicateOnWrite (CounterMutationStage in 2.1+)
o High throughput of counter increments
• FlushWriter
o Writes overrunning disk capabilities, poor tuning
o Large collections
• GossipStage
o vnodes + many servers (pre 2.0.3)
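The examples above can double as a first-pass triage table. A minimal Python sketch, assuming you already have per-pool Pending counts (for instance, parsed from `nodetool tpstats`): it maps each stage named on the slide to the causes listed there. The exact pool names are an assumption, since they vary by version (the slide notes ReplicateOnWrite became CounterMutationStage in 2.1, and "ReplicateOnWriteStage" is assumed here as its tpstats name); `STAGE_CAUSES` and `triage` are hypothetical names.

```python
# Causes taken from the slide. Pool names are assumed to match the tpstats
# output of your Cassandra version; adjust the keys if they differ.
STAGE_CAUSES = {
    "ReadStage": ["too many reads", "disk failure", "poor tuning"],
    "MutationStage": ["too many writes", "disk failure", "poor tuning"],
    "ReplicateOnWriteStage": ["high throughput of counter increments"],  # pre-2.1
    "CounterMutationStage": ["high throughput of counter increments"],   # 2.1+
    "FlushWriter": ["writes overrunning disk capabilities", "poor tuning",
                    "large collections"],
    "GossipStage": ["vnodes with many servers (pre 2.0.3)"],
}


def triage(pending_by_pool):
    """Return {pool: likely causes} for each known pool with pending tasks."""
    return {pool: STAGE_CAUSES[pool]
            for pool, pending in sorted(pending_by_pool.items())
            if pending > 0 and pool in STAGE_CAUSES}


print(triage({"ReadStage": 0, "FlushWriter": 3, "ReadRepairStage": 7}))
```

Pools without an entry (ReadRepairStage in the call above) are left out rather than guessed at; the table only encodes what the slide lists.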

Questions?

Cassandra Metrics
By: Chris Lohfink