Configuring Kafka Connect To Be Successful At Scale
Travis Sweet
Sr. Technical Support Engineer
About Me
Travis Sweet
I’ve been supporting Java applications since
August 2008, i.e. for 15+ years. More than 5 of
those years have been spent supporting
Confluent Platform. In 2023 my focus has been
on Kafka Connect.
If I had my own theme music for this session
it would be:
We Gon’ Be Alright by Tye Tribbett
I like fries and ice cream.*
@jtsweet
Agenda
1. High Level Overview of Kafka Connect
2. Purpose
3. Foundation
4. Kafka Connect Has To Do More Than Just Run
Overview of Kafka Connect
● Kafka Connect is a wonderful piece of computer software that allows users to
connect to external systems to put data into motion:
○ Source systems (where data comes from)
■ The data is produced (written to Kafka brokers)
○ Sink systems (where the data goes to, its new home)
■ The data is consumed (written to the new destination system)
Purpose
● The house represents Kafka Connect Clusters that have not been configured
to be successful at scale.
● Similarities between the house and Kafka Connect Clusters.
● When, where, and why does this happen?
● We can prevent this!
● Please spread the word!
Foundation
● CPUs/Cores
● Memory
○ With respect to Kafka Connect (the JVM, or Java Virtual Machine)
○ There are 3 memory areas
■ Metaspace memory area - replaced PermGen as of Java 8
■ Native memory area
■ Java heap
○ Total memory footprint of a Java process
■ It is the sum of the 3 memory areas
○ What does the JVM try to do with regard to garbage collection?
■ By design the JVM will try to keep heap utilization between 40% and 70% of the Java
heap.
Let's Apply What We Have Learned So Far
● Our Kafka Connect worker will be running on a machine with 16 GB of RAM
○ How much of the RAM goes to Kafka Connect?
■ On average the operating system will use around 2 GB of RAM for its own use.
○ That leaves us 14 GB of RAM for Kafka Connect, assuming that, as a best practice, no additional
components are running on the machine.
○ Now we take the remaining 14 GB of RAM and apply it to our 3 memory areas:
■ Java heap
● -Xms2G -Xmx6G
○ This leaves us with 8 GB of RAM (14 GB - 6 GB = 8 GB) to be shared between
■ Native memory area
■ Metaspace memory area
○ Remember the JVM’s ideal heap utilization of 40% to 70% of the Java heap
■ With a 6 GB maximum heap size we have an ideal range of 2.4 GB to 4.2 GB
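The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers on the slide: the function name, parameter names, and dictionary keys are made up for this example, and the 2 GB operating system reserve and the 40% to 70% range are the talk's rules of thumb, not values read from a real machine.

```python
# Sketch of the sizing arithmetic from the slide; all names are illustrative.

def memory_budget(total_ram_gb, os_reserved_gb, max_heap_gb,
                  low_util=0.40, high_util=0.70):
    """Return the slide's memory split for one Kafka Connect worker (in GB)."""
    available_gb = total_ram_gb - os_reserved_gb   # RAM left for the JVM process
    off_heap_gb = available_gb - max_heap_gb       # native memory + Metaspace
    return {
        "available_gb": available_gb,
        "max_heap_gb": max_heap_gb,
        "off_heap_gb": off_heap_gb,
        "ideal_heap_low_gb": round(max_heap_gb * low_util, 2),
        "ideal_heap_high_gb": round(max_heap_gb * high_util, 2),
    }


# The 16 GB machine from the slide with -Xmx6G.
budget = memory_budget(total_ram_gb=16, os_reserved_gb=2, max_heap_gb=6)
for name, value in budget.items():
    print(f"{name}: {value}")
```

Changing max_heap_gb shows how quickly the room left for native memory and Metaspace shrinks as the heap grows.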
Kafka Connect Has To Do More Than Run
● How do we investigate the Java heap utilization of our Kafka Connect worker?
○ Enable verbose garbage collection (verbose gc) and heap dump generation
■ How do I enable it or turn it on?
● Enabling verbose gc and heap dump generation on Java 8
● Enabling verbose gc and heap dump generation on Java 11 and Java 17
-Xlog:gc*,gc+ref=debug,gc+heap=debug,gc+age=trace:file=/tmp/gc.log:time,uptime,level,tags:filecount=5,filesize=100m
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/tmp/heapdumpsFromJVM
The same disclaimer applies: the process has to be able to write to the locations specified.
The location for the heap dump must have enough free space for the dump to be written out
completely, without being truncated.
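Once the GC log is being written, heap utilization after each collection can be read from the "before->after(capacity)" summary on the pause lines. The Python below is a minimal sketch of that idea. The sample line is hypothetical, written in the shape JDK 11+ unified logging commonly prints for a G1 young pause; real lines vary by collector and JDK version. The helper name is also made up for this example.

```python
import re

# Matches the "before->after(capacity)" heap summary, e.g. "3100M->1536M(6144M)".
HEAP_SUMMARY = re.compile(r"(\d+)([KMG])->(\d+)([KMG])\((\d+)([KMG])\)")
TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def heap_after_gc(line):
    """Return (used_after_mb, capacity_mb, utilization), or None if no summary."""
    match = HEAP_SUMMARY.search(line)
    if match is None:
        return None
    _, _, after, after_unit, capacity, capacity_unit = match.groups()
    used_mb = int(after) * TO_MB[after_unit]
    capacity_mb = int(capacity) * TO_MB[capacity_unit]
    return used_mb, capacity_mb, used_mb / capacity_mb


# Hypothetical sample line, not taken from a real Kafka Connect worker.
sample = ("[2023-09-26T10:15:30.123+0000][12.345s][info][gc] GC(7) "
          "Pause Young (Normal) (G1 Evacuation Pause) 3100M->1536M(6144M) 12.345ms")

used_mb, capacity_mb, utilization = heap_after_gc(sample)
print(f"heap after GC: {used_mb:.0f} MB of {capacity_mb:.0f} MB ({utilization:.0%})")
```

The figure in parentheses is the heap capacity at the time of the pause, which can be lower than -Xmx while the heap is still growing, so compare against the configured maximum when checking the 40% to 70% range.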
○ Enable JMX metrics for Kafka Connect
● Should you enable all of the above?
○ Yes, 100%
● Do your connect logs show the specific connector and specific task?
○ Enable Mapped Diagnostic Context (MDC) logging in your Kafka Connect environments
■ Do you have examples?
Logs without Mapped Diagnostic Context enabled
INFO Using multi thread/connection supporting pooling connection manager (io.searchbox.client.JestClientFactory)
INFO Using default GSON instance (io.searchbox.client.JestClientFactory)
INFO Node Discovery disabled... (io.searchbox.client.JestClientFactory)
INFO Idle connection reaping disabled... (io.searchbox.client.JestClientFactory)
Logs with Mapped Diagnostic Context enabled
INFO [sink-elastic-orders-00|task-0] Using multi thread/connection supporting pooling connection manager (io.searchbox.client.JestClientFactory:223)
INFO [sink-elastic-orders-00|task-0] Using default GSON instance (io.searchbox.client.JestClientFactory:69)
INFO [sink-elastic-orders-00|task-0] Node Discovery disabled... (io.searchbox.client.JestClientFactory:86)
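With the connector and task in every line, the logs become easy to slice per task. The Python below is a small sketch that pulls the "[connector|task-N]" prefix out of lines like the ones above and counts lines per task. The regular expression and helper name are made up for this example, and it only handles the prefix shape shown on the slide; other MDC scopes would need a broader pattern.

```python
import re
from collections import Counter

# "[connector|task-N]" prefix as it appears in the slide's sample output.
MDC_PREFIX = re.compile(r"\[(?P<connector>[^|\]]+)\|task-(?P<task>\d+)\]")


def connector_and_task(line):
    """Return (connector_name, task_id), or None when the line has no MDC prefix."""
    match = MDC_PREFIX.search(line)
    if match is None:
        return None
    return match.group("connector"), int(match.group("task"))


lines = [
    "INFO [sink-elastic-orders-00|task-0] Using default GSON instance "
    "(io.searchbox.client.JestClientFactory:69)",
    "INFO [sink-elastic-orders-00|task-0] Node Discovery disabled... "
    "(io.searchbox.client.JestClientFactory:86)",
    # Same message without MDC: nothing ties it to a connector or task.
    "INFO Using default GSON instance (io.searchbox.client.JestClientFactory)",
]

per_task = Counter(connector_and_task(line) for line in lines)
for key, count in per_task.items():
    print(key, count)
```

Lines without the prefix fall into the None bucket, which is exactly the problem MDC logging removes.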
● Which connect protocol should I be using in my environment?
○ Are you running Replicator as a connector?*
● Java Runtime Environment (JRE) vs. Java Development Kit (JDK)
You can find me @
X (Previously Known As Twitter): RevJ1980
LinkedIn: linkedin.com/in/james-t-sweet-01465582
Links:
File descriptors and mmap settings
How to enable verbose gc and heap dump Knowledge Base Article
JMX Metrics for Kafka Connect
Mapped Diagnostic Context (MDC)
Replicator
Deep Dive into Kafka Connect Protocol
Theme music - We Gon’ Be Alright by Tye Tribbett
Questions*

More Related Content

Similar to Configuring Kafka Connect To Be Successful At Scale

Making the big data ecosystem work together with Python & Apache Arrow, Apach...
Making the big data ecosystem work together with Python & Apache Arrow, Apach...Making the big data ecosystem work together with Python & Apache Arrow, Apach...
Making the big data ecosystem work together with Python & Apache Arrow, Apach...Holden Karau
 
Making the big data ecosystem work together with python apache arrow, spark,...
Making the big data ecosystem work together with python  apache arrow, spark,...Making the big data ecosystem work together with python  apache arrow, spark,...
Making the big data ecosystem work together with python apache arrow, spark,...Holden Karau
 
Query Parallelism in PostgreSQL: What's coming next?
Query Parallelism in PostgreSQL: What's coming next?Query Parallelism in PostgreSQL: What's coming next?
Query Parallelism in PostgreSQL: What's coming next?PGConf APAC
 
Jvm tuning in a rush! - Lviv JUG
Jvm tuning in a rush! - Lviv JUGJvm tuning in a rush! - Lviv JUG
Jvm tuning in a rush! - Lviv JUGTomek Borek
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streamingdatamantra
 
Scala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkScala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkDemi Ben-Ari
 
Stateful stream processing with kafka and samza
Stateful stream processing with kafka and samzaStateful stream processing with kafka and samza
Stateful stream processing with kafka and samzaGeorge Li
 
Apache spark on planet scale
Apache spark on planet scaleApache spark on planet scale
Apache spark on planet scaleDenis Chapligin
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad ranaData Con LA
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedHostedbyConfluent
 
JVM Performance Tuning
JVM Performance TuningJVM Performance Tuning
JVM Performance TuningJeremy Leisy
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)DataWorks Summit
 
3 Flink Mistakes We Made So You Won't Have To
3 Flink Mistakes We Made So You Won't Have To3 Flink Mistakes We Made So You Won't Have To
3 Flink Mistakes We Made So You Won't Have ToHostedbyConfluent
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scalesamthemonad
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScyllaDB
 
#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future DesignPivotalOpenSourceHub
 
Software Profiling: Java Performance, Profiling and Flamegraphs
Software Profiling: Java Performance, Profiling and FlamegraphsSoftware Profiling: Java Performance, Profiling and Flamegraphs
Software Profiling: Java Performance, Profiling and FlamegraphsIsuru Perera
 
Building maps for apps in the cloud - a Softlayer Use Case
Building maps for  apps in the cloud - a Softlayer Use CaseBuilding maps for  apps in the cloud - a Softlayer Use Case
Building maps for apps in the cloud - a Softlayer Use CaseTiman Rebel
 

Similar to Configuring Kafka Connect To Be Successful At Scale (20)

Making the big data ecosystem work together with Python & Apache Arrow, Apach...
Making the big data ecosystem work together with Python & Apache Arrow, Apach...Making the big data ecosystem work together with Python & Apache Arrow, Apach...
Making the big data ecosystem work together with Python & Apache Arrow, Apach...
 
Making the big data ecosystem work together with python apache arrow, spark,...
Making the big data ecosystem work together with python  apache arrow, spark,...Making the big data ecosystem work together with python  apache arrow, spark,...
Making the big data ecosystem work together with python apache arrow, spark,...
 
Query Parallelism in PostgreSQL: What's coming next?
Query Parallelism in PostgreSQL: What's coming next?Query Parallelism in PostgreSQL: What's coming next?
Query Parallelism in PostgreSQL: What's coming next?
 
Jvm tuning in a rush! - Lviv JUG
Jvm tuning in a rush! - Lviv JUGJvm tuning in a rush! - Lviv JUG
Jvm tuning in a rush! - Lviv JUG
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streaming
 
Scala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkScala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache spark
 
Stateful stream processing with kafka and samza
Stateful stream processing with kafka and samzaStateful stream processing with kafka and samza
Stateful stream processing with kafka and samza
 
Apache spark on planet scale
Apache spark on planet scaleApache spark on planet scale
Apache spark on planet scale
 
ForkJoinPools and parallel streams
ForkJoinPools and parallel streamsForkJoinPools and parallel streams
ForkJoinPools and parallel streams
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
 
JVM Performance Tuning
JVM Performance TuningJVM Performance Tuning
JVM Performance Tuning
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)
 
3 Flink Mistakes We Made So You Won't Have To
3 Flink Mistakes We Made So You Won't Have To3 Flink Mistakes We Made So You Won't Have To
3 Flink Mistakes We Made So You Won't Have To
 
NUMA and Java Databases
NUMA and Java DatabasesNUMA and Java Databases
NUMA and Java Databases
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scale
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/Day
 
#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design
 
Software Profiling: Java Performance, Profiling and Flamegraphs
Software Profiling: Java Performance, Profiling and FlamegraphsSoftware Profiling: Java Performance, Profiling and Flamegraphs
Software Profiling: Java Performance, Profiling and Flamegraphs
 
Building maps for apps in the cloud - a Softlayer Use Case
Building maps for  apps in the cloud - a Softlayer Use CaseBuilding maps for  apps in the cloud - a Softlayer Use Case
Building maps for apps in the cloud - a Softlayer Use Case
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonHostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonHostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonHostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonHostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLHostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsHostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Configuring Kafka Connect To Be Successful At Scale

  • 1. Configuring Kafka Connect To Be Successful At Scale Travis Sweet Sr. Technical Support Engineer
  • 2. About Me Travis Sweet I have been supporting Java applications since August 2008, i.e. 15+ years. More than 5 of those years have been spent supporting Confluent Platform. In 2023 my focus has been on Kafka Connect. If I had my own theme music for this session it would be: We Gon’ Be Alright by Ty Tribbett I like fries and ice cream.* @jtsweet
  • 3. Agenda 1. High Level Overview of Kafka Connect 2. Purpose 3. Foundation 4. Kafka Connect Has To Do More Than Just Run @jtsweet
  • 4. Overview of Kafka Connect
    ● Kafka Connect is a wonderful piece of computer software that allows users to connect to external systems to put data into motion:
      ○ Source systems (where data comes from)
        ■ The data is produced (written to Kafka brokers)
      ○ Sink systems (where the data goes to, its new home)
        ■ The data is consumed (written to the new destination system)
  • 5. Agenda 1. High Level Overview of Kafka Connect 2. Purpose 3. Foundation 4. Kafka Connect Has To Do More Than Just Run @jtsweet
  • 11. Purpose
    ● The house represents Kafka Connect Clusters that have not been configured to be successful at scale.
    ● Similarities between the house and Kafka Connect Clusters.
    ● When, where, and why does this happen?
    ● We can prevent this!
    ● Please spread the word!
  • 12. Agenda 1. High Level Overview of Kafka Connect 2. Purpose 3. Foundation 4. Kafka Connect Has To Do More Than Just Run @jtsweet
  • 23. Foundation
    ● Memory
      ○ With respect to Kafka Connect (JVM, or Java Virtual Machine)
      ○ There are 3 memory areas
        ■ Metaspace memory area (known as PermGen prior to Java 8)
        ■ Native memory area
        ■ Java heap
      ○ Total memory footprint of a Java process
        ■ It is the sum of the 3 memory areas
      ○ What does the JVM try to do with regard to garbage collection?
        ■ By design, the JVM will try to keep heap utilization between 40% and 70% of the Java heap.
  • 31. Let's Apply What We Have Learned So Far
    ● Our Kafka Connect worker will be running on a machine with 16 GB of RAM
      ○ How much of the RAM goes to Kafka Connect?
        ■ On average, the operating system will use around 2 GB of RAM for its own use.
      ○ That leaves us 14 GB of RAM for Kafka Connect; as a best practice, no additional components run on the machine.
      ○ Now we take the remaining 14 GB of RAM and apply it to our 3 memory areas:
        ■ Java heap
          ● -Xms2G -Xmx6G
      ○ This leaves us with 8 GB of RAM (14 GB - 6 GB = 8 GB) to be shared between:
        ■ Native memory area
        ■ Metaspace memory area
      ○ Remember the JVM’s ideal heap utilization of 40% to 70% of the Java heap
        ■ With a 6 GB maximum heap size, the ideal range is 2.4 GB to 4.2 GB (40% and 70% of 6 GB)
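
The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration of the sizing walk-through: the names `MemoryBudget`, `plan_memory`, and `classify_heap_usage` are hypothetical, and the 40% to 70% band is the rule of thumb quoted in the talk, not a value read from a running JVM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryBudget:
    connect_gb: float          # RAM left for the Kafka Connect JVM after the OS share
    non_heap_gb: float         # shared by the native and Metaspace memory areas
    ideal_heap_low_gb: float   # 40% of the maximum heap
    ideal_heap_high_gb: float  # 70% of the maximum heap


def plan_memory(total_ram_gb, os_reserved_gb, max_heap_gb):
    """Split a machine's RAM the way the slide does (hypothetical helper)."""
    connect_gb = total_ram_gb - os_reserved_gb
    if max_heap_gb > connect_gb:
        raise ValueError("maximum heap does not fit in the RAM left for Kafka Connect")
    return MemoryBudget(
        connect_gb=connect_gb,
        non_heap_gb=connect_gb - max_heap_gb,
        ideal_heap_low_gb=round(0.40 * max_heap_gb, 2),
        ideal_heap_high_gb=round(0.70 * max_heap_gb, 2),
    )


def classify_heap_usage(used_gb, budget):
    """Say whether an observed heap usage sits below, within, or above the ideal band."""
    if used_gb < budget.ideal_heap_low_gb:
        return "below"
    if used_gb > budget.ideal_heap_high_gb:
        return "above"
    return "within"


# The example from the slide: 16 GB machine, about 2 GB for the OS, -Xmx6G.
budget = plan_memory(total_ram_gb=16, os_reserved_gb=2, max_heap_gb=6)
```

A worker whose heap usage repeatedly lands in the "above" band after garbage collection is a candidate for a larger heap or fewer tasks per worker.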
  • 32. Agenda 1. High Level Overview of Kafka Connect 2. Purpose 3. Foundation 4. Kafka Connect Has To Do More Than Just Run @jtsweet
  • 33. Kafka Connect Has To Do More Than Run ● How do we investigate the Java heap utilization of our Kafka Connect worker? @jtsweet
  • 34. Kafka Connect Has To Do More Than Run ● How do we investigate the Java heap utilization of our Kafka Connect worker? ○ Enable verbose garbage collection (verbose gc) and heap dump generation ■ How do I enable it or turn it on? @jtsweet
  • 35. Kafka Connect Has To Do More Than Run ● Enabling verbose gc and heap dump generation on Java 8 @jtsweet
  • 36. Kafka Connect Has To Do More Than Run
    ● Enabling verbose gc and heap dump generation on Java 11 and Java 17
      -Xlog:gc*,gc+ref=debug,gc+heap=debug,gc+age=trace:file=/tmp/gc.log:time,uptime,level,tags:filecount=5,filesize=100m
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:HeapDumpPath=/tmp/heapdumpsFromJVM
    The same disclaimer applies: the process must be able to write to the locations specified. The location for the heap dump must have enough free space for the dump to be written out completely, without being truncated.
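
The disclaimer above can be checked before the worker is started. The following is a minimal sketch, assuming it is run as the same operating-system user as the Kafka Connect worker. The function name `check_jvm_output_paths` is hypothetical; the sketch treats the heap dump path as a directory (HotSpot also accepts a file name there) and uses the maximum heap size as a rough upper bound for the size of a heap dump.

```python
import os
import shutil


def check_jvm_output_paths(gc_log_path, heap_dump_dir, max_heap_bytes):
    """Return a list of problems with the GC log and heap dump locations."""
    problems = []
    gc_log_dir = os.path.dirname(gc_log_path) or "."

    # Both locations must exist and be writable by the current user.
    for label, directory in (("GC log", gc_log_dir), ("heap dump", heap_dump_dir)):
        if not os.path.isdir(directory):
            problems.append(f"{label} directory does not exist: {directory}")
        elif not os.access(directory, os.W_OK | os.X_OK):
            problems.append(f"{label} directory is not writable: {directory}")

    # Assumption: a heap dump can be roughly as large as the maximum heap,
    # so that much free space is needed to avoid a truncated dump.
    if os.path.isdir(heap_dump_dir):
        free_bytes = shutil.disk_usage(heap_dump_dir).free
        if free_bytes < max_heap_bytes:
            problems.append(
                f"only {free_bytes} bytes free in {heap_dump_dir}, "
                f"but the heap can grow to {max_heap_bytes} bytes"
            )
    return problems
```

For the settings on this slide and a 6 GB maximum heap, a call would look like `check_jvm_output_paths("/tmp/gc.log", "/tmp/heapdumpsFromJVM", 6 * 1024**3)`; an empty list means no problem was found.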
  • 40. Kafka Connect Has To Do More Than Run
    ● How do we investigate the Java heap utilization of our Kafka Connect worker?
      ○ Enable verbose garbage collection (verbose gc)
      ○ Enable JMX metrics for Kafka Connect
    ● Should you enable all of the above?
      ○ Yes, 100%
    ● Do your connect logs show the specific connector and specific task?
      ○ Enable Mapped Diagnostic Context (MDC) logging in your Kafka Connect environments
        ■ Do you have examples?
  • 41. Kafka Connect Has To Do More Than Run
    Logs without Mapped Diagnostic Context enabled
      INFO Using multi thread/connection supporting pooling connection manager (io.searchbox.client.JestClientFactory)
      INFO Using default GSON instance (io.searchbox.client.JestClientFactory)
      INFO Node Discovery disabled... (io.searchbox.client.JestClientFactory)
      INFO Idle connection reaping disabled... (io.searchbox.client.JestClientFactory)
  • 42. Kafka Connect Has To Do More Than Run
    Logs with Mapped Diagnostic Context enabled
      INFO [sink-elastic-orders-00|task-0] Using multi thread/connection supporting pooling connection manager (io.searchbox.client.JestClientFactory:223)
      INFO [sink-elastic-orders-00|task-0] Using default GSON instance (io.searchbox.client.JestClientFactory:69)
      INFO [sink-elastic-orders-00|task-0] Node Discovery disabled... (io.searchbox.client.JestClientFactory:86)
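
The practical benefit of the MDC prefix is that log lines can be attributed to a connector and task mechanically. The sketch below groups lines by that prefix; it assumes the line shape shown on the slide (level, then `[connector|task]`, then the message), and the names `MDC_LINE` and `group_by_task` are hypothetical. Real log lines usually carry a timestamp as well, which this pattern does not handle.

```python
import re
from collections import defaultdict

# Level, then "[connector|scope]", then the message, as in the slide's example.
MDC_LINE = re.compile(
    r"^(?P<level>[A-Z]+) \[(?P<connector>[^|\]]+)\|(?P<scope>[^\]]+)\] (?P<message>.*)$"
)


def group_by_task(lines):
    """Group log messages by (connector, scope); lines without MDC context go under None."""
    grouped = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        match = MDC_LINE.match(line)
        if match:
            key = (match["connector"], match["scope"])
            grouped[key].append(match["message"])
        else:
            grouped[None].append(line)
    return dict(grouped)


sample = [
    "INFO [sink-elastic-orders-00|task-0] Using default GSON instance (io.searchbox.client.JestClientFactory:69)",
    "INFO [sink-elastic-orders-00|task-0] Node Discovery disabled... (io.searchbox.client.JestClientFactory:86)",
    "INFO Idle connection reaping disabled... (io.searchbox.client.JestClientFactory)",
]
grouped = group_by_task(sample)
```

Without the prefix, every line falls into the `None` bucket, which is the situation the previous slide illustrates: the messages exist, but nothing ties them to a specific connector or task.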
  • 45. Kafka Connect Has To Do More Than Run
    ● How do we investigate the Java heap utilization of our Kafka Connect worker?
      ○ Enable verbose garbage collection (verbose gc)
      ○ Enable JMX metrics for Kafka Connect
    ● Should you enable all of the above?
      ○ Yes, 100%
    ● Do your connect logs show the specific connector and specific task?
      ○ Enable Mapped Diagnostic Context (MDC) logging in your Kafka Connect environments
        ■ Do you have examples?
    ● Which connect protocol should I be using in my environment?
      ○ Are you running Replicator as a connector?*
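
Once verbose gc is enabled, heap utilization after each collection can be read from the log. The sketch below is an assumption-laden illustration: it expects pause lines of the shape `GC(n) Pause ... <before>M-><after>M(<committed>M)`, which is how unified JVM logging commonly reports pauses, and the sample line is made up rather than taken from a real worker. The names `PAUSE` and `heap_after_gc` are hypothetical.

```python
import re

# Matches e.g. "GC(7) Pause Young (Normal) (G1 Evacuation Pause) 4100M->3000M(6144M)".
PAUSE = re.compile(
    r"GC\((?P<id>\d+)\) Pause.*? (?P<before>\d+)M->(?P<after>\d+)M\((?P<committed>\d+)M\)"
)


def heap_after_gc(lines, max_heap_mb):
    """Return (gc_id, used_mb_after_gc, fraction_of_max_heap) for each pause line."""
    results = []
    for line in lines:
        match = PAUSE.search(line)
        if match:
            after_mb = int(match["after"])
            results.append((int(match["id"]), after_mb, after_mb / max_heap_mb))
    return results


# Made-up sample line in the decorated format (time, uptime, level, tags).
sample_log = [
    "[2024-04-22T10:00:00.000+0000][12.345s][info][gc] GC(7) Pause Young (Normal) "
    "(G1 Evacuation Pause) 4100M->3000M(6144M) 15.200ms",
    "[2024-04-22T10:00:00.001+0000][12.346s][info][gc,heap] GC(7) Eden regions: 10->0(12)",
]
observations = heap_after_gc(sample_log, max_heap_mb=6144)
```

The fraction in each tuple can be compared with the 40% to 70% band from the Foundation section; JMX metrics give the same information continuously rather than per collection.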
  • 46. Kafka Connect Has To Do More Than Run ● Java Runtime Environment (JRE) vs. Java Development Kit (JDK) @jtsweet
  • 47. Agenda 1. High Level Overview of Kafka Connect 2. Purpose 3. Foundation 4. Kafka Connect Has To Do More Than Just Run @jtsweet
  • 48. You can find me @
    X (previously known as Twitter): RevJ1980
    LinkedIn: linkedin.com/in/james-t-sweet-01465582
  • 49. Links:
    ● File descriptors and mmap settings
    ● How to enable verbose gc and heap dump Knowledge Base Article
    ● JMX Metrics for Kafka Connect
    ● Mapped Diagnostic Context (MDC)
    ● Replicator
    ● Deep Dive into Kafka Connect Protocol
    ● Theme music - We Gon’ Be Alright by Ty Tribbett