Citi Tech Talk: Monitoring and Performance

The objective of the engagement is for Citi to gain an understanding of, and a path forward for, monitoring their Confluent Platform. Topics: platform monitoring; maintenance and upgrade.

Introduction to Monitoring Confluent Platform
Akhilesh Dubey | Confluent
Ishan Dwivedi | Citibank
Agenda
01 Confluent Platform Monitoring
What can you monitor in Confluent Platform?
02 Monitoring using Control Center
Overview of monitoring through Confluent Control Center
03 JMX Monitoring
Overview of JMX metrics and third-party monitoring stacks: AppDynamics and Prometheus/Grafana
04 Alerting
Alerting capabilities available through Confluent Control Center and ITRS
01. Confluent Platform Components
Confluent Platform Components
https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/
[Figure: Confluent Platform reference architecture. Components shown: Application, Sticky Load Balancer, REST Proxy, Kafka Brokers (each Broker + Rebalancer), ZooKeeper Nodes, Schema Registry (Leader and Follower), Confluent Control Center, Application Clients, Kafka Streams microservices, Kafka Connect (Worker + Connectors or Replicator), and ksqlDB Servers.]
What components need monitoring?
● Resources (CPU, disk, memory, network I/O)
● JVM
● Kafka Brokers
● ZooKeeper
● Connect
● Schema Registry
● REST Proxy
● Clients (producers/consumers)
Where do I even start?
Start with the basics:
● Do I have a monitoring solution today (agents, storage, dashboards)?
● Most components emit JMX metrics. These can be watched and exported to a JMX collector (AppDynamics, Prometheus, etc.) for alerting or visualization.
● Resources (set alerts at 60% utilization to prompt investigation):
○ CPU
○ Disk free (Kafka cannot run if your disk is full)
○ Network I/O
○ Open file handles
○ JVM (enable and monitor garbage collection times)
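The 60% guideline for resources can be sketched as a simple threshold check. This is a minimal illustration rather than anything shown in the talk: the function names, the resource labels, and the sample values are assumptions, and only the Python standard library is used.

```python
import shutil

# Threshold from the slide: alert at 60% utilization so there is time to investigate.
ALERT_THRESHOLD = 0.60


def disk_used_fraction(path):
    """Return the fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def resources_to_investigate(utilization, threshold=ALERT_THRESHOLD):
    """Return the sorted names of resources whose utilization (0.0-1.0)
    is at or above the threshold."""
    return sorted(name for name, value in utilization.items() if value >= threshold)


if __name__ == "__main__":
    # Hypothetical sample; in practice these values come from your monitoring agents.
    sample = {
        "cpu": 0.35,
        "disk": disk_used_fraction("."),  # point this at the Kafka log directory
        "open_file_handles": 0.72,
    }
    print(resources_to_investigate(sample))
```

Alerting well below 100% is the point of the guideline: a disk that is 60% full still leaves time to add capacity or shorten retention before the broker stops.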
Brokers
You can use Control Center for an opinionated view of what is happening right now.
Brokers generate many metrics using JMX MBeans:
● Under Replicated Partitions
● Offline Partitions
● Total Time MS
● ISR Shrink Rate
● and many more (https://support.confluent.io/hc/en-us/articles/230419288-Monitoring-Kafka)
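As a rough illustration of how the broker metrics listed above might feed alerting, the sketch below evaluates a snapshot of values that has already been collected (for example by a JMX exporter). The MBean names are the standard Kafka ones. The snapshot dictionary, the function name, and the alert rules (any non-zero value is worth a look) are assumptions made for illustration.

```python
# Standard Kafka broker MBeans for three of the metrics named on the slide.
UNDER_REPLICATED = "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"
OFFLINE_PARTITIONS = "kafka.controller:type=KafkaController,name=OfflinePartitionsCount"
ISR_SHRINKS = "kafka.server:type=ReplicaManager,name=IsrShrinksPerSec"


def broker_alerts(snapshot):
    """Return a list of human-readable alerts for one broker's metric snapshot.

    `snapshot` maps MBean name -> latest numeric value. Missing metrics are
    treated as 0 here (an assumption); a real check should also alert when a
    metric stops being reported.
    """
    alerts = []
    if snapshot.get(UNDER_REPLICATED, 0) > 0:
        alerts.append("under-replicated partitions present")
    if snapshot.get(OFFLINE_PARTITIONS, 0) > 0:
        alerts.append("offline partitions present")
    if snapshot.get(ISR_SHRINKS, 0) > 0:
        alerts.append("ISR is shrinking")
    return alerts
```

For example, `broker_alerts({UNDER_REPLICATED: 3})` flags the broker, while an empty or all-zero snapshot produces no alerts.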
ZooKeeper
ZooKeeper is crucial to the operation of a Kafka cluster.
Four-letter words give a quick status (RUOK, MNTR, STAT).
ZooKeeper also generates many metrics using JMX MBeans:
● AvgRequestLatency (per node)
● OutstandingRequests (per node)
Monitor which ZooKeeper nodes are leaders (they tend to be busiest).
How many clients and watchers:
● NumAliveConnections
● WatchCount
https://zookeeper.apache.org/doc/current/zookeeperJMX.html
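The four-letter words can be sent over a plain TCP connection to the ZooKeeper client port. The sketch below is a minimal example under stated assumptions: the helper names are hypothetical, and `localhost:2181` in the usage note is a placeholder. Note that ZooKeeper 3.5 and later only answer commands listed in the `4lw.commands.whitelist` server setting.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a four-letter word (e.g. "ruok", "mntr", "stat") and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(reply):
    """Parse `mntr` output (one tab-separated key/value pair per line) into a dict."""
    metrics = {}
    for line in reply.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            metrics[key.strip()] = value.strip()
    return metrics
```

A typical call would be `parse_mntr(four_letter_word("localhost", 2181, "mntr"))`; keys such as `zk_server_state`, `zk_avg_latency`, `zk_outstanding_requests`, `zk_num_alive_connections`, and `zk_watch_count` correspond to the items listed above.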
Clients
Very important to monitor producers and consumers too!
Confluent monitoring interceptors are available to see end-to-end lag in Control Center.
JMX metrics are also available in producers and consumers.
Consumers:
● records-lag/records-lag-max
● bytes-consumed-rate
● records-consumed-rate
● fetch-rate
Producers:
● request-rate
● request-latency-avg
● response-rate
● outgoing-byte-rate
● io-wait-time-ns-avg
● batch-size-avg
● compression-rate-avg
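Consumer lag is, roughly, how far a consumer's position trails the end of each partition's log, and `records-lag-max` is the largest such value across partitions. The arithmetic can be sketched as follows; the function names and the offset dictionaries are hypothetical inputs for illustration, not a Kafka client API.

```python
def partition_lag(log_end_offsets, consumer_positions):
    """Return per-partition lag: log end offset minus the consumer's position.

    Both arguments map a partition identifier to an offset. A partition with
    no recorded position is treated as position 0 (an assumption), and lag is
    never reported as negative.
    """
    return {
        partition: max(end - consumer_positions.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    }


def max_lag(log_end_offsets, consumer_positions):
    """Return the largest per-partition lag, or 0 when there are no partitions."""
    lags = partition_lag(log_end_offsets, consumer_positions)
    return max(lags.values(), default=0)
```

A lag that keeps growing over time usually matters more than any single reading, since it means the consumer is not keeping up with the producers.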
02. Monitoring using Control Center
Why do we monitor?
● Confluent Platform is the central nervous system for a business, and potentially a Kafka-based single source of truth.
● Kafka operators need to provide guarantees to the business that Kafka is working properly and delivering data in real time. They need to identify and triage problems so they can solve them before end users are affected. As a result, monitoring your Kafka deployments is an operational must-have.
● Monitoring helps provide assurance that all your services are working properly, meeting SLAs and addressing business needs.
● Here are some common business-level questions:
1. Are applications receiving all data?
2. Are my business applications showing the latest data?
3. Why are the applications running slowly?
4. Do we need to scale up?
5. Can any data get lost?
6. Will there be service interruptions?
7. Are there assurances in case of a disaster event?
We will see how Control Center can help to answer all those questions and where and when you require an additional monitoring stack.
  • 12. 12 You can deploy Confluent Control Center for out-of-the-box Kafka cluster monitoring so you don’t have to build your own monitoring system. Control Center makes it easy to manage the entire Confluent Platform. Control Center is a web-based application that allows you to manage your cluster, to monitor Kafka clusters in predefined dashboards and to alert on triggers.
  • 13. ● Kafka exposes hundreds of JMX metrics. Some of them are per broker, per client, per topic and per partition, so the number of metrics scales up as the cluster grows. For an average-size Kafka cluster, the number of metrics can very quickly grow into the thousands! ● A common pitfall of generic monitoring tools is to import pretty much all available metrics. But even with a comprehensive list of metrics, there is a limit to what can be achieved with no Kafka context or Kafka expertise to determine which metrics are important and which ones are not. ○ People end up referring to just the two or three charts that they actually understand. ○ Meanwhile, they ignore all the other charts because they don’t understand them. ○ It can generate a lot of noise as people spend time chasing “issues” that aren’t impactful to the services or, worse, that obscure real problems. ● Control Center was designed to help operators identify the most important things to monitor in Kafka, including the cluster and the client applications producing messages to and consuming messages from the cluster. The metrics swamp 13
  • 14. Control Center A walkthrough of the features
  • 15. 15 ● Cluster Overview provides insight into the well-being of the Kafka cluster from the cluster perspective, and allows you to drill down to the broker level, topic level, connect cluster level and KSQL level perspectives ● Multiple clusters can be monitored with a single Control Center and it also supports Multi-Cluster Schema Registry ● Requires Confluent Metrics Reporter to be installed and enabled Cluster Overview
  • 16. 16 ● Brokers Overview provides a succinct view of essential Kafka metrics for brokers in a cluster: ○ Throughput for production and consumption ○ Broker uptime ○ Partition replica status (including URP) ○ Apache ZooKeeper status ○ Active Controller ○ Disk usage and distribution ○ System metrics for network and request pool usage ● Clicking on a panel gives you a historical view of its metrics 👇👇👇 Brokers Overview
  • 17. 17 ● Brokers Metrics page provides historical data for the following panels: ○ Production metrics ○ Consumption metrics ○ Broker uptime metrics ○ Partition replicas metrics ○ System usage ○ Disk usage Brokers Metrics page
  • 18. 18 ● You can add, view, edit, and delete topics using the Control Center topic management interface ● Message Browser ● Manage Schemas for Topics ○ Avro, JSON-Schema and Protobuf ○ ⚠ Options to view and edit schemas through the user interface are available only for schemas that use the default TopicNameStrategy ○ Multi-Cluster Schema Registry ● Metrics: ○ Production throughput and failed production requests ○ Consumption throughput and failed consumption requests, % messages consumed (requires Monitoring Interceptors) and end-to-end latency (requires Monitoring Interceptors) ○ Availability (URP and out-of-sync followers and observers) ○ Consumer Lag Topics
  • 19. 19 ● Provides the convenience of managing connectors for multiple Kafka Connect clusters. ● Use Control Center to: ○ Add a connector by completing UI fields. Note: specific procedure when RBAC is used. ○ Add a connector by uploading a connector configuration file ○ Download connector configuration files to reuse in another connector or cluster, or to use as a template. ○ Edit a connector configuration and relaunch it. ○ Pause a running connector; resume a paused connector. ○ Delete a connector. ○ View the status of connectors in Connect clusters. Connect
  • 20. 20 ● Control Center provides the convenience of running streaming queries on one or more ksqlDB clusters within its graphical user interface ● Use ksqlDB to: ○ View a summary of all ksqlDB applications connected to Control Center. ○ Search for a ksqlDB application being managed by the Control Center instance. ○ Browse topic messages. ○ View the number of running queries, registered streams, and registered tables for each ksqlDB application. ○ Navigate to the ksqlDB Editor, Streams, Tables, Flow View and Running Queries for each ksqlDB application. ksqlDB
  • 21. 21 ● View all consumer groups for all topics in a cluster ● Use Consumers menu to: ○ View all consumer groups for a cluster in the All consumer groups page ○ View consumer lag across all topics in a cluster ○ View consumption metric for a consumer group (only available if monitoring interceptors are set) ○ Set up consumer group alerts Consumers
  • 22. 22 ● You can set up alerts in Control Center based on 4 component triggers: ○ Broker ■ Bytes in ■ Bytes out ■ Fetch request latency ■ Production request count ■ Production request latency ○ Cluster ■ Cluster down ■ Leader election rate ■ Offline topic partitions ■ Unclean election count ■ Under replicated topic partitions ■ ZooKeeper status ■ ZooKeeper expiration rate ○ Consumer Group ■ Average latency (ms) ■ Consumer lag ■ Consumer lead ■ Consumption difference ■ Maximum latency (ms) ○ Topic ■ Bytes in ■ Bytes out ■ Out of sync replica count ■ Production request count ■ Under-replicated topic partitions ● Notifications are possible via email, PagerDuty or Slack Alerts
  • 23. 23 ● Cluster settings ○ Change cluster name (also possible using configuration file) ○ Update dynamic settings without any restart required ○ Download broker configuration ● Status and License menu ○ Processing status: status of Control Center (Running or Not Running). Consumption data and Broker data (message throughput) are shown in real time for the last 30 minutes ○ Set or update license And more...
  • 24. 03. JMX Metrics and Monitoring Stacks Overview of JMX metrics and 3rd party monitoring stacks
  • 25. ● Kafka brokers and Java client applications (Kafka Connect, Kafka Streams, Producer/Consumer, etc.) expose hundreds of internal JMX (Java Management Extensions) metrics ● Important JMX metrics to monitor: ○ Broker metrics ○ ZooKeeper metrics ○ Producer metrics ○ Consumer metrics ○ ksqlDB & Kafka Streams metrics ○ Kafka Connect metrics ● It’s key to have a dashboard that lets you know whether everything is OK at a glance ● Multiple monitoring stacks are available. Choose the one that is already used in your company JMX metrics 25
  • 26. 26 Java: Client JMX metrics • Java Kafka Client applications expose some internal JMX (Java Management Extensions) metrics • Many users run JMX exporters to feed these metrics into their monitoring systems (AppDynamics, Grafana, etc.) • Important Client JMX metrics to monitor: General producer metrics and producer throttling-time; Consumer metrics; ksqlDB & Kafka Streams metrics; Kafka Connect metrics • Prometheus is a popular open-source monitoring solution that uses JMX-Exporter to extract the metrics. The exporter can be configured to extract and forward only the metrics desired. • Here is a demo of JMX-Exporter/Prometheus/Grafana
  • 27. 27 Typical Data pipeline pattern(s) for Client Metrics: Clients emitting JMX → JMX Client → Observability App.
    Prometheus/Grafana: Java Producer running in JVM, producing to Kafka Cluster → JMX Exporter (jmx_prometheus_javaagent configured as agent on the JVM, exposing producer /metrics endpoint) → Prometheus (configured with a job that scrapes producer /metrics endpoint) → Grafana (configured with Prometheus as datasource).
    AppDynamics: Connect running in JVM, producing to Kafka Cluster → JMX Client (e.g. appdynamics-agent) → AppDynamics.
  • 28. 28 Client Throttling • Depending on your cluster configuration, you may be restricted to specific throughputs for your client application • If your client applications exceed these rates, the quotas on the brokers will detect it and the client application requests will be throttled by the brokers. • If your clients are being throttled, consider two options: Modify your application to optimize its throughput, if possible (read the section Optimizing for Throughput for more details) Upgrade to a cluster configuration with higher limits • ℹ Metrics API can give you some indication of throughput from server side, but it doesn’t provide throughput metrics on the client side.
  • 29. 29 Client Throttling To get throttling metrics per producer and consumer, monitor the following client JMX metrics:
    Metric | Description
    kafka.producer:type=producer-metrics,client-id=([-.\w]+),name=produce-throttle-time-avg | The average time in ms that a request was throttled by a broker
    kafka.producer:type=producer-metrics,client-id=([-.\w]+),name=produce-throttle-time-max | The maximum time in ms that a request was throttled by a broker
    kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+),name=fetch-throttle-time-avg | The average time in ms that a broker spent throttling a fetch request
    kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+),name=fetch-throttle-time-max | The maximum time in ms that a broker spent throttling a fetch request
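The throttle metric names embed the client id, so one pattern can pick out which clients are being throttled. A minimal Python sketch, assuming the metrics have already been collected into a dict keyed by the full metric name in the form listed on the slide; the function name, sample client ids and values are hypothetical:

```python
import re

# Pattern mirrors the metric names listed on the slide; `[-.\w]+` matches the client id.
THROTTLE_METRIC = re.compile(
    r"kafka\.(producer|consumer):type=(?:producer-metrics|consumer-fetch-manager-metrics),"
    r"client-id=([-.\w]+),name=((?:produce|fetch)-throttle-time-(?:avg|max))"
)


def throttled_clients(samples: dict, threshold_ms: float = 0.0) -> dict:
    """Map client id -> {metric name: value} for throttle metrics above threshold_ms."""
    result = {}
    for mbean, value in samples.items():
        match = THROTTLE_METRIC.fullmatch(mbean)
        if match and value > threshold_ms:
            _, client_id, metric = match.groups()
            result.setdefault(client_id, {})[metric] = value
    return result


# Hypothetical samples; only throttle metrics with a non-zero value are reported.
SAMPLES = {
    "kafka.producer:type=producer-metrics,client-id=orders-producer,name=produce-throttle-time-avg": 12.5,
    "kafka.producer:type=producer-metrics,client-id=audit.producer-1,name=produce-throttle-time-max": 0.0,
    "kafka.consumer:type=consumer-fetch-manager-metrics,client-id=billing_consumer,name=fetch-throttle-time-max": 40.0,
    "kafka.producer:type=producer-metrics,client-id=orders-producer,name=request-rate": 100.0,
}

print(throttled_clients(SAMPLES))
```

Any sustained non-zero throttle time is worth investigating, so the default threshold is 0 ms; raise it if brief throttling is expected.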
  • 30. 30 AppDynamics • AppDynamics provides ability to do JMX monitoring of Java applications • Machine and application server monitoring can be combined to generate and monitor relevant Confluent Platform component metrics.
  • 32. AppDynamics: KaaS JMX Metrics Drill Down View 32
  • 35. 35 Prometheus/Grafana • Prometheus is a popular open-source monitoring solution which uses JMX-Exporter to extract the metrics. The exporter can be configured to extract and forward only the metrics desired. • An example of JMX-Exporter/Prometheus/Grafana monitoring stack deployed on top of Confluent cp-demo is available here Prometheus exporter (JMX-Exporter)
  • 39. • JMX metrics are only available for Java-based clients. • Librdkafka applications can be configured to emit internal metrics at a fixed interval (disabled by default) by setting the statistics.interval.ms configuration property to a value > 0 and registering a stats_cb (or similar, depending on language) • All statistics are described here • Emits JSON object string: Librdkafka: Client statistics 39
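Since the statistics arrive as a JSON string, the callback usually just parses it and keeps a few fields. A minimal Python sketch, assuming the documented librdkafka layout (`client_id`, `msg_cnt`, and `topics` → `partitions` → `consumer_lag`); the function name and the sample payload are hypothetical, and with confluent-kafka-python such a function could be passed as the `stats_cb` configuration entry alongside `statistics.interval.ms`:

```python
import json


def summarize_stats(stats_json: str) -> dict:
    """Extract a few fields from a librdkafka statistics JSON string."""
    stats = json.loads(stats_json)
    lag_by_partition = {}
    for topic, topic_stats in stats.get("topics", {}).items():
        for partition, part_stats in topic_stats.get("partitions", {}).items():
            lag = part_stats.get("consumer_lag", -1)
            # Skip the internal "-1" partition and partitions with unknown lag (-1).
            if partition != "-1" and lag >= 0:
                lag_by_partition[f"{topic}-{partition}"] = lag
    return {
        "client_id": stats.get("client_id"),
        "queued_messages": stats.get("msg_cnt", 0),
        "consumer_lag": lag_by_partition,
    }


# Hypothetical, heavily trimmed payload; real statistics contain many more fields.
SAMPLE = json.dumps({
    "name": "rdkafka#consumer-1",
    "client_id": "billing-consumer",
    "type": "consumer",
    "msg_cnt": 0,
    "topics": {
        "payments": {
            "partitions": {
                "0": {"consumer_lag": 42},
                "1": {"consumer_lag": -1},
                "-1": {"consumer_lag": -1},
            }
        }
    },
})

print(summarize_stats(SAMPLE))
```

The returned dict can then be forwarded to whichever metrics client the application already uses.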
  • 40. Using prometheus-net/prometheus-net, starting up a MetricsServer to export metrics to Prometheus Prometheus/Grafana: Librdkafka: .NET example 40
  • 42. Monitor Consumer Lag All different ways to monitor consumer lag
  • 43. ● It is important to monitor your application’s consumer lag, which is the number of records for any partition that the consumer is behind in the log ● For "real-time" consumer applications, where the consumer is meant to be processing the newest messages with as little latency as possible, consumer lag should be monitored closely. ● Most "real-time" applications will want little-to-no consumer lag, because lag introduces end-to-end latency. Monitoring Consumer Lag 43
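Consumer lag per partition is the log end offset minus the offset the consumer group has committed. A minimal Python sketch of that arithmetic, assuming the offsets have already been fetched by some other means; the function name and sample offsets are hypothetical, and treating a partition with no committed offset as fully unread is a simplification:

```python
def partition_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Lag per partition = log end offset minus the group's committed offset."""
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        committed = committed_offsets.get(partition)
        if committed is None:
            # Simplification: no committed offset yet, count the whole partition as lag.
            lag[partition] = end_offset
        else:
            lag[partition] = max(end_offset - committed, 0)
    return lag


# Hypothetical offsets for a three-partition topic, keyed by (topic, partition).
END_OFFSETS = {("orders", 0): 1500, ("orders", 1): 980, ("orders", 2): 300}
COMMITTED = {("orders", 0): 1500, ("orders", 1): 950}

lag = partition_lag(END_OFFSETS, COMMITTED)
print("total lag:", sum(lag.values()), "max lag:", max(lag.values()))
```

A steadily growing total or maximum across successive samples is the signal to alert on, rather than any single reading.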
  • 44. Consumer lag is available in Consumers section from navigation bar: #1: Using Control Center 44
  • 45. If you use Java consumers, you can capture JMX metrics and monitor records-lag-max. Note: the consumer’s records-lag-max JMX metric calculates lag by comparing the offset most recently seen by the consumer to the most recent offset in the log, which is a more real-time measurement. #2: Using JMX (Java client only) 45
    Metric | Description
    kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+),records-lag-max | The maximum lag in terms of number of records for any partition in this window. An increasing value over time is your best indication that the consumer group is not keeping up with the producers.
  • 46. Refer to this Knowledge Base article for full details. Create a properties file containing your security details. Example: #3: Using kafka-consumer-groups CLI 46
  • 47. 47 #4: Using kafka-lag-exporter and Prometheus/Grafana • lightbend/kafka-lag-exporter is a 3rd party tool (not supported by Confluent) that uses Kafka's Admin API describeConsumerGroups() method to get consumer lags and export them to Prometheus. • An out-of-the-box Grafana dashboard is available
  • 48. 04. Alerting Overview of Alerting capabilities through Confluent Control Center and ITRS
  • 49. 49 Alerts ● As seen earlier, setting up alerts can be done through Control Center, but also using your monitoring stack based on JMX metrics ● Alert on what’s important: Under-replicated partitions is a good start ● Alerting on SLAs is even better: especially when measured from a client point of view
  • 50. Key Alerts 50
    Cluster/Broker: • UnderReplicatedPartitions > 0 * • OfflinePartitionsCount > 0 * • UnderMinIsrPartitionCount > 0 • ActiveControllerCount != 1 • AtMinIsrPartitionCount > 0 • RequestHandlerAvgIdlePercent < 40% • NetworkProcessorAvgIdlePercent < 40% • RequestQueueSize (establish the baseline during normal/peak production load and alert if a deviation occurs) • TotalTimeMs,request=(Produce|FetchConsumer|FetchFollower)
    OS: • Disk usage > 60% (minor), > 80-90% (major) • CPU usage > 60% over 5 minutes (generally caused by SSL connections or old clients causing down conversions) • Network IO usage > 60% • File handle usage > 60%
    JVM Monitoring: • G1 YoungGeneration CollectionTime • G1 OldGeneration CollectionTime • GC time > 30%
    Connect: • connector=(*) status • connector=(*),task=(.*) status
    Zookeeper: • AvgRequestLatency > 10ms over 30 seconds (disk latency is high; check `iostat -x` for await time, and `top`) • NumAliveConnections: make sure you are not close to the maximum as set with maxClientCnxns • OutstandingRequests: should be below 10 in general
    The Four Letter Words: mntr and ruok (enabled with -Dzookeeper.4lw.commands.whitelist=*). Example: `$ echo ruok | nc localhost 2181` returns `imok`
    * alert can also be set with Control Center
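Threshold alerts like these can be expressed as a small rule table that is evaluated against each metrics sample. A minimal Python sketch, assuming the metrics are already collected into a dict, that the idle-percent metrics have been converted to a 0-100 scale, and that ActiveControllerCount has been summed across brokers; `DiskUsagePercent` and the function name are hypothetical:

```python
import operator

# (metric name, comparison, threshold) taken from the key alerts list.
RULES = [
    ("UnderReplicatedPartitions", operator.gt, 0),
    ("OfflinePartitionsCount", operator.gt, 0),
    ("UnderMinIsrPartitionCount", operator.gt, 0),
    ("ActiveControllerCount", operator.ne, 1),  # cluster-wide sum, must be exactly 1
    ("RequestHandlerAvgIdlePercent", operator.lt, 40),
    ("NetworkProcessorAvgIdlePercent", operator.lt, 40),
    ("DiskUsagePercent", operator.gt, 60),  # hypothetical name for OS disk usage
]


def firing_alerts(metrics: dict) -> list:
    """Return the names of rules whose condition holds; missing metrics are skipped."""
    return [
        name
        for name, compare, threshold in RULES
        if name in metrics and compare(metrics[name], threshold)
    ]


# Hypothetical sample of current values.
SAMPLE = {
    "UnderReplicatedPartitions": 3,
    "OfflinePartitionsCount": 0,
    "ActiveControllerCount": 1,
    "RequestHandlerAvgIdlePercent": 35.0,
    "DiskUsagePercent": 55,
}

print(firing_alerts(SAMPLE))
```

In practice the same rules would normally live in the alerting layer of the monitoring stack; the sketch only shows how the conditions combine.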
  • 51. C3 Alerts: Configuring an Alert Trigger 51