SlideShare a Scribd company logo
1 of 69
Download to read offline
Improving the Observability of
Cassandra, Kafka and Kubernetes
applications with Prometheus and
OpenTracing
Paul Brebner
Technology Evangelist
instaclustr.com
Observability track, ApacheCon 2019, Tuesday, 10th September, Las Vegas, USA
https://www.apachecon.com/acna19/s/#/scheduledEvent/1031
As distributed applications
grow more complex,
dynamic, and massively
scalable, “observability”
becomes more critical.
Observability is the practice
of using metrics, monitoring
and distributed tracing to
understand how a system
works.
Observability
Critical
As distributed cloud
applications grow more
complex, dynamic, and
massively scalable,
“observability” becomes
more critical.
Observability is the practice
of using metrics, monitoring
and distributed tracing to
understand how a system
works.
And find the invisible cows
Observability
Critical
As distributed cloud
applications grow more
complex, dynamic, and
massively scalable,
“observability” becomes
more critical.
Observability is the practice
of using metrics, monitoring
and distributed tracing to
understand how a system
works.
And find the invisible cows
Observability
Critical
Open
APM
Land
scape
Lots
of
options
https://openapm.io/landscape
Open
APM
Land
scape
Pick
some
https://openapm.io/landscape
Open
APM
Land
scape
In this talk we’ll explore two complementary Open
Source technologies:
- Prometheus for monitoring application metrics, and
- OpenTracing and Jaeger for distributed tracing.
We’ll discover how they improve the observability of
- an Anomaly Detection application,
- deployed on AWS Kubernetes, and
- using Instaclustr managed Apache Cassandra and
Kafka clusters.
Goal
To increase the
observability of an
anomaly detection
application
Kubernetes
Cluster
?
Cloud
context
Running across
Kafka, Cassandra,&
Kubernetes Clusters
Observability
Goal 1: Metrics
T
P
S
TPS
TPS
T
i
m
e
R
o
w
s
Anomalies
Producer rate
Consumer rate
Anomaly checks rate
Detector duration Rows returned
anomaly rate
Observability
Goal 2: Distributed
Tracing
1
2
3
Overview
Prometheus for Monitoring
OpenTracing for Distributed Tracing
Conclusions, and Cassandra Auto Scaling use case
Monitoring
with
Prometheus
Popular Open
Source monitoring
system from
Soundcloud
Now Cloud Native
Computing
Foundation (CNCF)
Prometheus
Monitoring of
applications and
servers
Pull-based
Architecture &
Components…
Prometheus
Server
Server
responsible for service discovery,
pulling metrics from monitored
applications, storing metrics, and
analysis of time series data
Prometheus
GUI
Built in simple graphing GUI, and
native support for Grafana
Prometheus
Optional
Push gateway
Alerting
Optional push gateway and
alerting
Optional push gateway and alerting
Prometheus
How does metrics
capture work?
Instrumentation and Agents (Exporters)
- Client libraries for instrumenting applications in
multiple programming languages
- Java client collects JVM metrics and enables
custom application metrics
- Node exporter for host hardware metrics
Prometheus
Data Model
■ Metrics
● Time series data
ᐨ timestamp and value; name, key:value pairs
● By convention name includes
ᐨ thing being monitored, logical type, and units
ᐨ e.g. http_requests_total, http_duration_seconds
■ Prometheus automatically adds labels
● Job, host:port
■ Metric types (only relevant for instrumentation)
● Counter (increasing values)
● Gauge (values up and down)
● Histogram
● Summary
Target
metrics
Business metric
(Anomaly checks/s)
Diagnostic metrics T
P
S
TPS
TPS
T
i
m
e
R
o
w
s
Anomalies
Producer rate
Consumer rate
Anomaly checks rate
Detector duration Rows returned
anomaly rate
Steps
Basic
■ Create and register Prometheus Metric types
● (e.g. Counter) for each timeseries type (e.g. throughputs) including
name and units
■ Instrument the code
● e.g. increment the count, using name of the component (e.g.
producer, consumer, etc) as label
■ Create HTTP server in code
■ Tell Prometheus where to scrape from (config file)
■ Run Prometheus Server
■ Browse to Prometheus server
■ View and select metrics, check that there’s data
■ Construct expression
■ Graph the expression
■ Run and configure Grafana for better graphs
Instrumentation
Counter example
// Use a single Counter for throughput metrics
// for all stages of the pipeline
// stages are distinguished by labels
static final Counter pipelineCounter = Counter
.build()
.name(appName + "_requests_total")
.help("Count of executions of pipeline stages")
.labelNames("stage")
.register();
. . .
// After successful execution of each stage:
// increment producer/consumer/detector rate count
pipelineCounter.labels(“producer”).inc();
. . .
pipelineCounter.labels(“consumer”).inc();
. . .
pipelineCounter.labels(“detector”).inc();
Instrumentation
Gauge example
// A Gauge can go up and down
// Used to measure the current value of some variable.
// pipelineGauge will measure duration of each labelled stage
static final Gauge pipelineGauge = Gauge
.build()
.name(appName + "_duration_seconds")
.help("Gauge of stage durations in seconds")
.labelNames("stage")
.register();
. . .
// in detector pipeline, compute duration and set
long duration = nowTime – startTime;
pipelineGauge.labels(”detector”).setToTime(duration);
HTTP Server
For metric pulls
// Metrics are pulled by Prometheus
// Create an HTTP server as the endpoint to pull from
// If there are multiple processes running on the same server
// then you need different port numbers
// Add IPs and port numbers to the Prometheus configuration
// file.
HTTPServer server = null;
try {
server = new HTTPServer(1234);
} catch (IOException e) {
e.printStackTrace();
}
Using
Prometheus
Configure
Run
■ Configure Prometheus with IP and Ports to poll.
● Edit the default Prometheus.yml file
● Includes polling frequency, timeouts etc
● Ok for testing but doesn’t scale for production systems
■ Get, install and run Prometheus.
● Initially just running locally.
Graphs
Counter
■ Browse to Prometheus Server URL
■ No default dashboards
■ View and select metrics
■ Execute them to graph
■ Counter value increases over time
Rate
Graph using irate
function
■ Enter expressions, e.g. irate function
■ Expression language has multiple data types and many
functions
Gauge
graph
Pipeline stage
durations in
seconds
■ Doesn’t need a function as it’s a Gauge
Grafana
Prometheus GUI ok
for debugging
Grafana better for
production
■ Install and run Grafana
■ Browse to Grafana URL, create a Prometheus data
source, add a Prometheus Graph.
■ Can enter multiple Prometheus expressions and graph
them on the same graph.
■ Example shows rate and duration metrics
Simple Test
configuration
Prometheus Server
outside Kubernetes
cluster, pulls metrics
from Pods
Dynamic/many
Pods are a
challenge
■ IP addresses to pull from are dynamic
● Have to update Prometheus pull configurations
● In production too many Pods to do this manually
Prometheus
on
Kubernetes
A few extra steps
makes life easier
■ Create and register Prometheus Metric types
● (e.g. Counter) for each timeseries type (e.g. throughputs) including name and
units
■ Instrument the code
● e.g. increment the count, using name of the component (e.g. producer,
consumer, etc) as label
■ Create HTTP server in code
■ Run Prometheus Server on Kubernetes cluster,
using Kubernetes Operator
■ Configure so it dynamically monitors selected Pods
■ Enable ingress and external access to Prometheus
server
■ Browse to Prometheus server
■ View and select metrics, check that there’s data
■ Construct expression
■ Graph the expression
■ Run and configure Grafana for better graphs
Prometheus
In production on
Kubernetes
Use Prometheus
Operator
Prometheus
In production on
Kubernetes
Use Prometheus
Operator
1 Install Prometheus
Operator and Run
Prometheus
In production on
Kubernetes
Use Prometheus
Operator
1 Install Prometheus
Operator and Run
2 Configure Service Objects to
monitor Pods
Prometheus
In production on
Kubernetes
Use Prometheus
Operator
1 Install Prometheus
Operator and Run
2 Configure Service Objects to
monitor Pods
3 Configure ServiceMonitors to
discover Service Objects
Prometheus
In production on
Kubernetes
Use Prometheus
Operator
1 Install Prometheus
Operator and Run
2 Configure Service Objects to
monitor Pods
3 Configure ServiceMonitors to
discover Service Objects
4 Configure Prometheus objects
to specify which ServiceMonitors
should be included
Prometheus
In production on
Kubernetes
Use Prometheus
Operator
1 Install Prometheus
Operator and Run
2 Configure Service Objects to
monitor Pods
3 Configure ServiceMonitors to
discover Service Objects
4 Configure Prometheus objects
to specify which ServiceMonitors
should be included
5 Allow ingress to Prometheus
by using a Kubernetes
NodePort Service
6 Create Role-based access
control rules for both
Prometheus and Prometheus
Operator
7 Configure AWS EC2 firewalls
Weavescope
Prometheus now
magically monitors
Pods as they come
and go
Showing
Prometheus
monitoring Pods
Prometheus
Operator
Pods
OpenTracing
Use Case:
Topology Maps
■ Prometheus collects and displays metric aggregations
● No dependency or order information, no single events
■ Distributed tracing shows “call tree” (causality, timing) for
each event
■ And Topology Maps
OpenTracing
Standard API for
distributed tracing
■ Specification, not implementation
■ Need
● Application instrumentation
● OpenTracing tracer
Traced Applications API Tracer implementations
Open Source, Datadog
Spans
Smallest logical unit
of work in
distributed system
■ Spans are smallest logical unit of work
● Have name, start time, duration, associated component
■ Simplest trace is a single span
Trace
Multi-span trace
■ Spans can be related
● ChildOf = synchronous dependency (wait)
● FollowsFrom = asynchronous relationships (no wait)
■ A Trace is a DAG of Spans.
● 1 or more Spans.
Instrumentation
■ Language specific client instrumentation
● Used to create spans in the application within the same process
■ Contributed libraries for frameworks
● E.g. Elasticsearch, Cassandra, Kafka etc
● Used to create spans across process boundaries (Kafka producers
-> consumers)
■ Choose and Instantiate a Tracer implementation
// Example instrumentation for consumer -> detector spans
static Tracer tracer = initTracer(”AnomaliaMachina");
. . .
Span span1 = tracer.buildSpan(”consumer").start();
. . .
span1.finish();
Span span2 = tracer
.buildSpan(”detector")
.addReference(References.CHILD_OF, span1.context())
.start();
. . .
span2.finish();
Steps
Tracing
across
process
boundaries
Inject/extract
metadata
■ To trace across process boundaries (processes,
servers, clouds) OpenTracing injects metadata into
the cross-process call flows to build traces across
heterogeneous systems.
■ Inject and extract a spanContext, how depends on
protocol.
How to do
this for
Kafka?
Producer
Automatically
inserts a span
context into Kafka
headers using
Interceptors
// Register tracer with GlobalTracer:
GlobalTracer.register(tracer);
// Add TracingProducerInterceptor to sender properties:
senderProps.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG,
TracingProducerInterceptor.class.getName());
// Instantiate KafkaProducer
KafkaProducer<Integer, String> producer = new
KafkaProducer<>(senderProps);
// Send
producer.send(...);
// 3rd party library
// https://github.com/opentracing-contrib/java-kafka-client
Consumer
side
Extract spanContext
// Once you have a consumer record, extract
// the span context and
// create a new FOLLOWS_FROM span
SpanContext spanContext
= tracer.extract(Format.Builtin.TEXT_MAP, new
MyHeadersMapExtractAdapter(record.headers(),
false));
newSpan =
tracer.buildSpan("consumer").addReference(Refe
rences.FOLLOWS_FROM, spanContext).start();
Jaeger
Tracer
Open Source Tracer
Uber/CNCF
Jaeger
Tracer
How to use?
• Tracers can have different architectures and protocols
• Jaeger should scale well in production as
• It can use Cassandra and Spark
• Uses adaptive sampling
• Need to instantiate a Jaeger tracer in your code
Jaeger
GUI
■ Install and start Jaeger
■ Browse to Jaeger URL
■ Find traces by name, operation, and filter.
■ Select to drill down for more detail.
Jaeger
Single trace
■ Insight into total trace time, relationships and times
of spans
■ This is a trace of a single event through the
anomaly detector pipeline
● Producer (async)
● Consumer (async)
● Detector (async, with sync children)
ᐨ CassandraWrite
ᐨ CassandraRead
ᐨ AnomalyDetector
Jaeger
Dependencies view
■ Correctly shows anomaly detector topology
■ Only metric is number of spans observed
■ Can’t select subset of traces, or filter
■ Force directed view, select node and highlights
dependencies
Kafka
Challenge
Multiple Kafka topic
topologies
■ More complex example (application simulates
complex event flows across topics)
■ Show dependencies between source, intermediate
and sink Kafka topics.
Final
observable
system
• Enhanced
application
observability
• Metrics from
application
• Traces across
application and
Kafka cluster
Conclusions
Observations &
Alternatives
■ Topology view is basic (c.f. some commercial APMs)
■ Still need Prometheus for metrics
● in theory OpenTracing has everything needed for metrics.
■ Other OpenTracing tracers may be worth trying, e.g.
Datadog
■ OpenCensus is a competing approach.
■ Manual instrumentation is tedious and potentially
error prone, commercial APMs use Java agents with
byte-code injection instead, any Open Source?
■ The future? Kubernetes based service mesh
frameworks can construct traces for microservices
without instrumentation
● as they have visibility into how Pods interact with each other and
external systems
● and Pods only contain a single microservice, not a monolithic
application
● E.g. Kiali, observability for Istio, www.kiali.io
Results
Scaled out to 48
Cassandra nodes
Approx 600 cores
for whole system
109 Pods for
Prometheus to
monitor
Producer rate metric
(9 Pods)
Peak Producer rate = 2.3 Million events/s
Prometheus was critical for collecting, computing and displaying
the metrics, from multiple Pods
Business
metric
Detector rate
100 Pods
220,000 anomaly
checks/s computed
from 100 stacked
metrics
Anomaly Checks/s = 220,000
Prometheus was critical for tuning the system to achieve near perfect linear
scalability - used metrics for consumer and detector rate to tune thread pool
sizes to optimize anomaly checks/s, for increasingly bigger systems.
OpenTracing and Jaeger was useful during test deployment
- to check/debug if components were working together as expected
Cassandra &
OpenTracing
Visibility into
Cassandra clusters?
■ OpenTracing the example application was
● Across Kafka producers/consumers
● And within the Kubernetes deployed application
■ What options are there for improved visibility of
tracing across Cassandra clusters?
■ Instaclustr managed service
● OpenTracing support for the C* driver
● May not require any support from C* clusters
● https://github.com/opentracing-contrib/java-cassandra-driver
■ Self-managed clusters
● end-to-end OpenTracing through a C* cluster
● May require support from C* cluster
● https://github.com/thelastpickle/cassandra-zipkin-tracing
Cassandra &
Prometheus
Visibility into
Cassandra clusters?
■ Prometheus monitoring of the example application
● limited to application metrics collected from Kubernetes Pods
■ What options are there for integration with Casandra
Cluster metrics?
?
Cassandra &
Prometheus
Visibility into
Cassandra clusters?
Option 1
Self-managed
clusters
■ Instaclustr's Open Source Tools For Cassandra -
LDAP/Kerberos, Prometheus Exporter, Debug
Tooling and K8s Operator
● Adam Zegelin
● Wednesday, 11th Sep, 16:45 - 17:35
● Mesquite
■ Prometheus cassandra-exporter
● Exports Cassandra metrics to Prometheus
● https://github.com/instaclustr/cassandra-exporter
■ Kubernetes Operator for Cassandra
● The Cassandra operator will create the appropriate objects to inform the
Prometheus operator about the metrics endpoints available from Cassandra
● https://github.com/instaclustr/cassandra-operator/
Cassandra &
Prometheus
Visibility into
Cassandra clusters?
Option 2
Instaclustr managed
service
■ Instaclustr managed Cassandra
● NEW Instaclustr Prometheus Monitoring API
● Works with both Cassandra and Kafka
● Exports default Cassandra and Kafka metrics, Server metrics, and extra
managed service metrics
Use Case:
Cassandra
Auto scaling
Using Prometheus,
Instaclustr Managed
Cassandra with
Prometheus
Monitoring API and
Provisioning API
Cassandra
Auto scaling
1 Cassandra Cluster
metrics
Cassandra
Auto scaling
2 Load increasing
0
20
40
60
80
100
120
0 50 100 150 200 250
CPU%
Time (m)
Cassandra Cluster CPU %
Max Capacity CPU % (used for regression)
Cassandra
Auto scaling
3 Regression
prediction
0
20
40
60
80
100
120
0 50 100 150 200 250 300
CPU%
Time (m)
Cassandra Cluster CPU %
Max Capacity CPU % (used for regression) Regression
Cluster
predicted to
hit Max
Capacity
Regression
prediction
Cassandra
Auto scaling
4 Alert when it’s time
to trigger a cluster
resize (taking into
account predicted
rate of increase,
resizing time,
reduced cluster
capacity during
resize, but leave as
late as possible to
prevent unnecessary
resizing)
0
50
100
150
200
250
0 50 100 150 200 250 300 350
CAPACITY(%)
TIME (M)
RESIZE BY NODE
CPU % (used for regression) Regression Capacity (resize by node)
Latest time to
trigger resize
by node Reduced
cluster
capacity
during resize
Regression
prediction
Resize
time
Resized
capacity
Cassandra
Auto scaling
5 Use alert to trigger
5a Kubernetes Pod
scaling and
Pod scaling
Cassandra
Auto scaling
5 Use alert to trigger
5a Kubernetes Pod
scaling and
5b Dynamic Resizing
of Cassandra Cluster
(Using Webhooks?)
Pod scaling
Dynamic Resizing
More
information?
Blogs and another
talk
■ Anomalia Machina Blog Series (10 parts)
● https://www.instaclustr.com/anomalia-machina-10-final-results-
massively-scalable-anomaly-detection-with-apache-kafka-and-
cassandra/
■ All my blogs
● https://www.instaclustr.com/paul-brebner/
■ Kafka, Cassandra and Kubernetes at Scale -
Real-time Anomaly detection on 19 billion events
a day
● Paul Brebner
● Thursday, 12th Sep, 13:00 - 13:50
● Savoy
The End
Instaclustr Managed Platform for Open Source
www.instaclustr.com/platform/

More Related Content

What's hot

Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)confluent
 
IoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache KafkaIoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache Kafkaconfluent
 
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!confluent
 
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...HostedbyConfluent
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...confluent
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKai Wähner
 
Event sourcing - what could possibly go wrong ? Devoxx PL 2021
Event sourcing  - what could possibly go wrong ? Devoxx PL 2021Event sourcing  - what could possibly go wrong ? Devoxx PL 2021
Event sourcing - what could possibly go wrong ? Devoxx PL 2021Andrzej Ludwikowski
 
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...confluent
 
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud ServicesBuild a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Servicesconfluent
 
How to Build an Apache Kafka® Connector
How to Build an Apache Kafka® ConnectorHow to Build an Apache Kafka® Connector
How to Build an Apache Kafka® Connectorconfluent
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsTom Van den Bulck
 
UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...
UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...
UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...confluent
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...confluent
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Guido Schmutz
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connectconfluent
 
Streams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQLStreams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQLconfluent
 
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...confluent
 
Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro...
Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro...Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro...
Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro...confluent
 
What is Apache Kafka®?
What is Apache Kafka®?What is Apache Kafka®?
What is Apache Kafka®?confluent
 

What's hot (20)

Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
 
IoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache KafkaIoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache Kafka
 
Chti jug - 2018-06-26
Chti jug - 2018-06-26Chti jug - 2018-06-26
Chti jug - 2018-06-26
 
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
 
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
 
Event sourcing - what could possibly go wrong ? Devoxx PL 2021
Event sourcing  - what could possibly go wrong ? Devoxx PL 2021Event sourcing  - what could possibly go wrong ? Devoxx PL 2021
Event sourcing - what could possibly go wrong ? Devoxx PL 2021
 
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...
 
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud ServicesBuild a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
 
How to Build an Apache Kafka® Connector
How to Build an Apache Kafka® ConnectorHow to Build an Apache Kafka® Connector
How to Build an Apache Kafka® Connector
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka Streams
 
UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...
UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...
UDF/UDAF: the extensibility framework for KSQL (Hojjat Jafapour, Confluent) K...
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 
Streams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQLStreams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQL
 
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
 
Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro...
Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro...Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro...
Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! (Ro...
 
What is Apache Kafka®?
What is Apache Kafka®?What is Apache Kafka®?
What is Apache Kafka®?
 

Similar to ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kubernetes applications with Prometheus and OpenTracing

Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataGetInData
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Brian Brazil
 
Prometheus - Utah Software Architecture Meetup - Clint Checketts
Prometheus - Utah Software Architecture Meetup - Clint CheckettsPrometheus - Utah Software Architecture Meetup - Clint Checketts
Prometheus - Utah Software Architecture Meetup - Clint Checkettsclintchecketts
 
MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)Lucas Jellema
 
2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf
2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf
2022-05-23-DevOps pro Europe - Managing Apps at scale.pdfŁukasz Piątkowski
 
Slack in the Age of Prometheus
Slack in the Age of PrometheusSlack in the Age of Prometheus
Slack in the Age of PrometheusGeorge Luong
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...GetInData
 
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Brian Brazil
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaArvind Kumar G.S
 
Prometheus kubernetes tech talk
Prometheus kubernetes tech talkPrometheus kubernetes tech talk
Prometheus kubernetes tech talkChandresh Pancholi
 
Webinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computingWebinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computingCREATE-NET
 
Monitoring Your AWS EKS Environment with Datadog
Monitoring Your AWS EKS Environment with DatadogMonitoring Your AWS EKS Environment with Datadog
Monitoring Your AWS EKS Environment with DatadogDevOps.com
 
MuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHubMuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHubAlfonso Martino
 

Similar to ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kubernetes applications with Prometheus and OpenTracing (20)

Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
 
Prometheus - Utah Software Architecture Meetup - Clint Checketts
Prometheus - Utah Software Architecture Meetup - Clint CheckettsPrometheus - Utah Software Architecture Meetup - Clint Checketts
Prometheus - Utah Software Architecture Meetup - Clint Checketts
 
MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)
 
About QTP 9.2
About QTP 9.2About QTP 9.2
About QTP 9.2
 
About Qtp_1 92
About Qtp_1 92About Qtp_1 92
About Qtp_1 92
 
About Qtp 92
About Qtp 92About Qtp 92
About Qtp 92
 
2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf
2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf
2022-05-23-DevOps pro Europe - Managing Apps at scale.pdf
 
Slack in the Age of Prometheus
Slack in the Age of PrometheusSlack in the Age of Prometheus
Slack in the Age of Prometheus
 
Prometheus with Grafana - AddWeb Solution
Prometheus with Grafana - AddWeb SolutionPrometheus with Grafana - AddWeb Solution
Prometheus with Grafana - AddWeb Solution
 
Monitoring with Prometheus
Monitoring with PrometheusMonitoring with Prometheus
Monitoring with Prometheus
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
 
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
 
LoadTracer
LoadTracer LoadTracer
LoadTracer
 
Prometheus kubernetes tech talk
Prometheus kubernetes tech talkPrometheus kubernetes tech talk
Prometheus kubernetes tech talk
 
Webinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computingWebinar Monitoring in era of cloud computing
Webinar Monitoring in era of cloud computing
 
Monitoring Your AWS EKS Environment with Datadog
Monitoring Your AWS EKS Environment with DatadogMonitoring Your AWS EKS Environment with Datadog
Monitoring Your AWS EKS Environment with Datadog
 
MuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHubMuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHub
 
Prometheus and Grafana
Prometheus and GrafanaPrometheus and Grafana
Prometheus and Grafana
 

More from Paul Brebner

The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...Paul Brebner
 
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining PhilosophersApache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining PhilosophersPaul Brebner
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaSpinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaPaul Brebner
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Paul Brebner
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardPaul Brebner
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardPaul Brebner
 
A Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaA Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaPaul Brebner
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Paul Brebner
 
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Paul Brebner
 
Grid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialGrid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialPaul Brebner
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Paul Brebner
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Paul Brebner
 
0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980'sPaul Brebner
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...Paul Brebner
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...Paul Brebner
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache KafkaPaul Brebner
 
Automatic Performance Modelling from Application Performance Management (APM)...
Automatic Performance Modelling from Application Performance Management (APM)...Automatic Performance Modelling from Application Performance Management (APM)...
Automatic Performance Modelling from Application Performance Management (APM)...Paul Brebner
 

More from Paul Brebner (20)

The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
 
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining PhilosophersApache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaSpinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache Kafka
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
A Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaA Visual Introduction to Apache Kafka
A Visual Introduction to Apache Kafka
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
 
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
 
Grid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialGrid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and Potential
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 
Automatic Performance Modelling from Application Performance Management (APM)...
Automatic Performance Modelling from Application Performance Management (APM)...Automatic Performance Modelling from Application Performance Management (APM)...
Automatic Performance Modelling from Application Performance Management (APM)...
 

Recently uploaded

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kubernetes applications with Prometheus and OpenTracing

  • 1. Improving the Observability of Cassandra, Kafka and Kubernetes applications with Prometheus and OpenTracing Paul Brebner Technology Evangelist instaclustr.com Observability track, ApacheCon 2019, Tuesday, 10th September, Las Vegas, USA https://www.apachecon.com/acna19/s/#/scheduledEvent/1031
  • 2. As distributed applications grow more complex, dynamic, and massively scalable, “observability” becomes more critical. Observability is the practice of using metrics, monitoring and distributed tracing to understand how a system works. Observability Critical
  • 3. As distributed cloud applications grow more complex, dynamic, and massively scalable, “observability” becomes more critical. Observability is the practice of using metrics, monitoring and distributed tracing to understand how a system works. And find the invisible cows Observability Critical
  • 4. As distributed cloud applications grow more complex, dynamic, and massively scalable, “observability” becomes more critical. Observability is the practice of using metrics, monitoring and distributed tracing to understand how a system works. And find the invisible cows Observability Critical
  • 7. Open APM Land scape In this talk we’ll explore two complementary Open Source technologies: - Prometheus for monitoring application metrics, and - OpenTracing and Jaeger for distributed tracing. We’ll discover how they improve the observability of - an Anomaly Detection application, - deployed on AWS Kubernetes, and - using Instaclustr managed Apache Cassandra and Kafka clusters.
  • 8. Goal To increase the observability of an anomaly detection application Kubernetes Cluster ?
  • 10. Observability Goal 1: Metrics T P S TPS TPS T i m e R o w s Anomalies Producer rate Consumer rate Anomaly checks rate Detector duration Rows returned anomaly rate
  • 12. 1 2 3 Overview Prometheus for Monitoring OpenTracing for Distributed Tracing Conclusions, and Cassandra Auto Scaling use case
  • 13. Monitoring with Prometheus Popular Open Source monitoring system from Soundcloud Now Cloud Native Computing Foundation (CNCF)
  • 15. Prometheus Server Server responsible for service discovery, pulling metrics from monitored applications, storing metrics, and analysis of time series data
  • 16. Prometheus GUI Built in simple graphing GUI, and native support for Grafana
  • 17. Prometheus Optional Push gateway Alerting Optional push gateway and alerting Optional push gateway and alerting
  • 18. Prometheus How does metrics capture work? Instrumentation and Agents (Exporters) - Client libraries for instrumenting applications in multiple programming languages - Java client collects JVM metrics and enables custom application metrics - Node exporter for host hardware metrics
  • 19. Prometheus Data Model ■ Metrics ● Time series data ᐨ timestamp and value; name, key:value pairs ● By convention name includes ᐨ thing being monitored, logical type, and units ᐨ e.g. http_requests_total, http_duration_seconds ■ Prometheus automatically adds labels ● Job, host:port ■ Metric types (only relevant for instrumentation) ● Counter (increasing values) ● Gauge (values up and down) ● Histogram ● Summary
  • 20. Target metrics Business metric (Anomaly checks/s) Diagnostic metrics T P S TPS TPS T i m e R o w s Anomalies Producer rate Consumer rate Anomaly checks rate Detector duration Rows returned anomaly rate
  • 21. Steps Basic ■ Create and register Prometheus Metric types ● (e.g. Counter) for each timeseries type (e.g. throughputs) including name and units ■ Instrument the code ● e.g. increment the count, using name of the component (e.g. producer, consumer, etc) as label ■ Create HTTP server in code ■ Tell Prometheus where to scrape from (config file) ■ Run Prometheus Server ■ Browse to Prometheus server ■ View and select metrics, check that there’s data ■ Construct expression ■ Graph the expression ■ Run and configure Grafana for better graphs
  • 22. Instrumentation Counter example // Use a single Counter for throughput metrics // for all stages of the pipeline // stages are distinguished by labels static final Counter pipelineCounter = Counter .build() .name(appName + "_requests_total") .help("Count of executions of pipeline stages") .labelNames("stage") .register(); . . . // After successful execution of each stage: // increment producer/consumer/detector rate count pipelineCounter.labels(“producer”).inc(); . . . pipelineCounter.labels(“consumer”).inc(); . . . pipelineCounter.labels(“detector”).inc();
  • 23. Instrumentation Gauge example // A Gauge can go up and down // Used to measure the current value of some variable. // pipelineGauge will measure duration of each labelled stage static final Gauge pipelineGauge = Gauge .build() .name(appName + "_duration_seconds") .help("Gauge of stage durations in seconds") .labelNames("stage") .register(); . . . // in detector pipeline, compute duration and set long duration = nowTime – startTime; pipelineGauge.labels(”detector”).setToTime(duration);
  • 24. HTTP Server For metric pulls // Metrics are pulled by Prometheus // Create an HTTP server as the endpoint to pull from // If there are multiple processes running on the same server // then you need different port numbers // Add IPs and port numbers to the Prometheus configuration // file. HTTPServer server = null; try { server = new HTTPServer(1234); } catch (IOException e) { e.printStackTrace(); }
  • 25. Using Prometheus Configure Run ■ Configure Prometheus with IP and Ports to poll. ● Edit the default Prometheus.yml file ● Includes polling frequency, timeouts etc ● Ok for testing but doesn’t scale for production systems ■ Get, install and run Prometheus. ● Initially just running locally.
  • 26. Graphs Counter ■ Browse to Prometheus Server URL ■ No default dashboards ■ View and select metrics ■ Execute them to graph ■ Counter value increases over time
  • 27. Rate Graph using irate function ■ Enter expressions, e.g. irate function ■ Expression language has multiple data types and many functions
  • 28. Gauge graph Pipeline stage durations in seconds ■ Doesn’t need a function as it’s a Gauge
  • 29. Grafana Prometheus GUI ok for debugging Grafana better for production ■ Install and run Grafana ■ Browse to Grafana URL, create a Prometheus data source, add a Prometheus Graph. ■ Can enter multiple Prometheus expressions and graph them on the same graph. ■ Example shows rate and duration metrics
  • 30. Simple Test configuration Prometheus Server outside Kubernetes cluster, pulls metrics from Pods Dynamic/many Pods are a challenge ■ IP addresses to pull from are dynamic ● Have to update Prometheus pull configurations ● In production too many Pods to do this manually
  • 31. Prometheus on Kubernetes A few extra steps makes life easier ■ Create and register Prometheus Metric types ● (e.g. Counter) for each timeseries type (e.g. throughputs) including name and units ■ Instrument the code ● e.g. increment the count, using name of the component (e.g. producer, consumer, etc) as label ■ Create HTTP server in code ■ Run Prometheus Server on Kubernetes cluster, using Kubernetes Operator ■ Configure so it dynamically monitors selected Pods ■ Enable ingress and external access to Prometheus server ■ Browse to Prometheus server ■ View and select metrics, check that there’s data ■ Construct expression ■ Graph the expression ■ Run and configure Grafana for better graphs
  • 33. Prometheus In production on Kubernetes Use Prometheus Operator 1 Install Prometheus Operator and Run
  • 34. Prometheus In production on Kubernetes Use Prometheus Operator 1 Install Prometheus Operator and Run 2 Configure Service Objects to monitor Pods
  • 35. Prometheus In production on Kubernetes Use Prometheus Operator 1 Install Prometheus Operator and Run 2 Configure Service Objects to monitor Pods 3 Configure ServiceMonitors to discover Service Objects
  • 36. Prometheus In production on Kubernetes Use Prometheus Operator 1 Install Prometheus Operator and Run 2 Configure Service Objects to monitor Pods 3 Configure ServiceMonitors to discover Service Objects 4 Configure Prometheus objects to specify which ServiceMonitors should be included
  • 37. Prometheus In production on Kubernetes Use Prometheus Operator 1 Install Prometheus Operator and Run 2 Configure Service Objects to monitor Pods 3 Configure ServiceMonitors to discover Service Objects 4 Configure Prometheus objects to specify which ServiceMonitors should be included 5 Allow ingress to Prometheus by using a Kubernetes NodePort Service 6 Create Role-based access control rules for both Prometheus and Prometheus Operator 7 Configure AWS EC2 firewalls
  • 38. Weavescope Prometheus now magically monitors Pods as they come and go Showing Prometheus monitoring Pods Prometheus Operator Pods
  • 39. OpenTracing Use Case: Topology Maps ■ Prometheus collects and displays metric aggregations ● No dependency or order information, no single events ■ Distributed tracing shows “call tree” (causality, timing) for each event ■ And Topology Maps
  • 40. OpenTracing Standard API for distributed tracing ■ Specification, not implementation ■ Need ● Application instrumentation ● OpenTracing tracer Traced Applications API Tracer implementations Open Source, Datadog
  • 41. Spans Smallest logical unit of work in distributed system ■ Spans are smallest logical unit of work ● Have name, start time, duration, associated component ■ Simplest trace is a single span
  • 42. Trace Multi-span trace ■ Spans can be related ● ChildOf = synchronous dependency (wait) ● FollowsFrom = asynchronous relationships (no wait) ■ A Trace is a DAG of Spans. ● 1 or more Spans.
  • 43. Instrumentation ■ Language specific client instrumentation ● Used to create spans in the application within the same process ■ Contributed libraries for frameworks ● E.g. Elasticsearch, Cassandra, Kafka etc ● Used to create spans across process boundaries (Kafka producers -> consumers) ■ Choose and Instantiate a Tracer implementation // Example instrumentation for consumer -> detector spans static Tracer tracer = initTracer(”AnomaliaMachina"); . . . Span span1 = tracer.buildSpan(”consumer").start(); . . . span1.finish(); Span span2 = tracer .buildSpan(”detector") .addReference(References.CHILD_OF, span1.context()) .start(); . . . span2.finish(); Steps
  • 44. Tracing across process boundaries Inject/extract metadata ■ To trace across process boundaries (processes, servers, clouds) OpenTracing injects metadata into the cross-process call flows to build traces across heterogeneous systems. ■ Inject and extract a spanContext, how depends on protocol.
  • 45. How to do this for Kafka? Producer Automatically inserts a span context into Kafka headers using Interceptors // Register tracer with GlobalTracer: GlobalTracer.register(tracer); // Add TracingProducerInterceptor to sender properties: senderProps.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, TracingProducerInterceptor.class.getName()); // Instantiate KafkaProducer KafkaProducer<Integer, String> producer = new KafkaProducer<>(senderProps); // Send producer.send(...); // 3rd party library // https://github.com/opentracing-contrib/java-kafka-client
  • 46. Consumer side Extract spanContext // Once you have a consumer record, extract // the span context and // create a new FOLLOWS_FROM span SpanContext spanContext = tracer.extract(Format.Builtin.TEXT_MAP, new MyHeadersMapExtractAdapter(record.headers(), false)); newSpan = tracer.buildSpan("consumer").addReference(Refe rences.FOLLOWS_FROM, spanContext).start();
  • 48. Jaeger Tracer How to use? • Tracers can have different architectures and protocols • Jaeger should scale well in production as • It can use Cassandra and Spark • Uses adaptive sampling • Need to instantiate a Jaeger tracer in your code
  • 49. Jaeger GUI ■ Install and start Jaeger ■ Browse to Jaeger URL ■ Find traces by name, operation, and filter. ■ Select to drill down for more detail.
  • 50. Jaeger Single trace ■ Insight into total trace time, relationships and times of spans ■ This is a trace of a single event through the anomaly detector pipeline ● Producer (async) ● Consumer (async) ● Detector (async, with sync children) ᐨ CassandraWrite ᐨ CassandraRead ᐨ AnomalyDetector
  • 51. Jaeger Dependencies view ■ Correctly shows anomaly detector topology ■ Only metric is number of spans observed ■ Can’t select subset of traces, or filter ■ Force directed view, select node and highlights dependencies
  • 52. Kafka Challenge Multiple Kafka topic topologies ■ More complex example (application simulates complex event flows across topics) ■ Show dependencies between source, intermediate and sink Kafka topics.
  • 53. Final observable system • Enhanced application observability • Metrics from application • Traces across application and Kafka cluster
  • 54. Conclusions Observations & Alternatives ■ Topology view is basic (c.f. some commercial APMs) ■ Still need Prometheus for metrics ● in theory OpenTracing has everything needed for metrics. ■ Other OpenTracing tracers may be worth trying, e.g. Datadog ■ OpenCensus is a competing approach. ■ Manual instrumentation is tedious and potentially error prone, commercial APMs use Java agents with byte-code injection instead, any Open Source? ■ The future? Kubernetes based service mesh frameworks can construct traces for microservices without instrumentation ● as they have visibility into how Pods interact with each other and external systems ● and Pods only contain a single microservice, not a monolithic application ● E.g. Kiali, observability for Istio, www.kiali.io
  • 55. Results Scaled out to 48 Cassandra nodes Approx 600 cores for whole system 109 Pods for Prometheus to monitor Producer rate metric (9 Pods) Peak Producer rate = 2.3 Million events/s Prometheus was critical for collecting, computing and displaying the metrics, from multiple Pods
  • 56. Business metric Detector rate 100 Pods 220,000 anomaly checks/s computed from 100 stacked metrics Anomaly Checks/s = 220,000 Prometheus was critical for tuning the system to achieve near perfect linear scalability - used metrics for consumer and detector rate to tune thread pool sizes to optimize anomaly checks/s, for increasingly bigger systems. OpenTracing and Jaeger was useful during test deployment - to check/debug if components were working together as expected
  • 57. Cassandra & OpenTracing Visibility into Cassandra clusters? ■ OpenTracing the example application was ● Across Kafka producers/consumers ● And within the Kubernetes deployed application ■ What options are there for improved visibility of tracing across Cassandra clusters? ■ Instaclustr managed service ● OpenTracing support for the C* driver ● May not require any support from C* clusters ● https://github.com/opentracing-contrib/java-cassandra-driver ■ Self-managed clusters ● end-to-end OpenTracing through a C* cluster ● May require support from C* cluster ● https://github.com/thelastpickle/cassandra-zipkin-tracing
  • 58. Cassandra & Prometheus Visibility into Cassandra clusters? ■ Prometheus monitoring of the example application ● limited to application metrics collected from Kubernetes Pods ■ What options are there for integration with Casandra Cluster metrics? ?
  • 59. Cassandra & Prometheus Visibility into Cassandra clusters? Option 1 Self-managed clusters ■ Instaclustr's Open Source Tools For Cassandra - LDAP/Kerberos, Prometheus Exporter, Debug Tooling and K8s Operator ● Adam Zegelin ● Wednesday, 11th Sep, 16:45 - 17:35 ● Mesquite ■ Prometheus cassandra-exporter ● Exports Cassandra metrics to Prometheus ● https://github.com/instaclustr/cassandra-exporter ■ Kubernetes Operator for Cassandra ● The Cassandra operator will create the appropriate objects to inform the Prometheus operator about the metrics endpoints available from Cassandra ● https://github.com/instaclustr/cassandra-operator/
  • 60. Cassandra & Prometheus Visibility into Cassandra clusters? Option 2 Instaclustr managed service ■ Instaclustr managed Cassandra ● NEW Instaclustr Prometheus Monitoring API ● Works with both Cassandra and Kafka ● Exports default Cassandra and Kafka metrics, Server metrics, and extra managed service metrics
  • 61. Use Case: Cassandra Auto scaling Using Prometheus, Instaclustr Managed Cassandra with Prometheus Monitoring API and Provisioning API
  • 63. Cassandra Auto scaling 2 Load increasing 0 20 40 60 80 100 120 0 50 100 150 200 250 CPU% Time (m) Cassandra Cluster CPU % Max Capacity CPU % (used for regression)
  • 64. Cassandra Auto scaling 3 Regression prediction 0 20 40 60 80 100 120 0 50 100 150 200 250 300 CPU% Time (m) Cassandra Cluster CPU % Max Capacity CPU % (used for regression) Regression Cluster predicted to hit Max Capacity Regression prediction
  • 65. Cassandra Auto scaling 4 Alert when it’s time to trigger a cluster resize (taking into account predicted rate of increase, resizing time, reduced cluster capacity during resize, but leave as late as possible to prevent unnecessary resizing) 0 50 100 150 200 250 0 50 100 150 200 250 300 350 CAPACITY(%) TIME (M) RESIZE BY NODE CPU % (used for regression) Regression Capacity (resize by node) Latest time to trigger resize by node Reduced cluster capacity during resize Regression prediction Resize time Resized capacity
  • 66. Cassandra Auto scaling 5 Use alert to trigger 5a Kubernetes Pod scaling and Pod scaling
  • 67. Cassandra Auto scaling 5 Use alert to trigger 5a Kubernetes Pod scaling and 5b Dynamic Resizing of Cassandra Cluster (Using Webhooks?) Pod scaling Dynamic Resizing
  • 68. More information? Blogs and another talk ■ Anomalia Machina Blog Series (10 parts) ● https://www.instaclustr.com/anomalia-machina-10-final-results- massively-scalable-anomaly-detection-with-apache-kafka-and- cassandra/ ■ All my blogs ● https://www.instaclustr.com/paul-brebner/ ■ Kafka, Cassandra and Kubernetes at Scale - Real-time Anomaly detection on 19 billion events a day ● Paul Brebner ● Thursday, 12th Sep, 13:00 - 13:50 ● Savoy
  • 69. The End Instaclustr Managed Platform for Open Source www.instaclustr.com/platform/