Configuring Kafka Connect To Be Successful At Scale

Travis Sweet
Sr. Technical Support Engineer

About Me
Travis Sweet
I have been supporting Java applications since August 2008, which is more than 15 years. For more than 5 of those years I have been supporting Confluent Platform. In 2023 my focus has been on Kafka Connect.
If I had my own theme music for this session it would be:
We Gon’ Be Alright by Tye Tribbett
I like fries and ice cream.*
@jtsweet

Agenda
1. High Level Overview of Kafka Connect
2. Purpose
3. Foundation
4. Kafka Connect Has To Do More Than Just Run

Overview of Kafka Connect
● Kafka Connect is a wonderful piece of computer software that allows users to
connect to external systems to put data into motion:
○ Source systems (where data comes from)
■ The data is produced (written to Kafka brokers)
○ Sink systems (where the data goes to, its new home)
■ The data is consumed (written to the new destination system)

Purpose
● The house represents Kafka Connect Clusters that have not been configured
to be successful at scale.
● Similarities between the house and Kafka Connect Clusters.
● When, where, and why does this happen?
● We can prevent this!
● Please spread the word!

Foundation
● CPUs/Cores
● Memory
○ With respect to Kafka Connect (JVM, the Java Virtual Machine)
○ There are 3 memory areas
■ Metaspace memory area (replaced PermGen as of Java 8)
■ Native memory area
■ Java heap
○ Total memory footprint of a Java process
■ It is the sum of the 3 memory areas
○ What does the JVM try to do with regard to garbage collection?
■ By design the JVM will try to keep the heap utilization between 40% and 70% of the Java heap.
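The 40% to 70% band can be turned into a quick check. The following is a minimal sketch, not part of the original slides: the function names `heap_utilization` and `classify_utilization` are hypothetical, and the used and maximum heap values are assumed to come from whatever monitoring you already have (for example, a GC log or a JMX memory metric).

```python
def heap_utilization(used_bytes: int, max_bytes: int) -> float:
    """Return heap utilization as a fraction of the maximum heap size."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return used_bytes / max_bytes


def classify_utilization(used_bytes: int, max_bytes: int,
                         low: float = 0.40, high: float = 0.70) -> str:
    """Classify utilization against the 40% to 70% band from the slide.

    Returns "below", "within", or "above". The band limits are parameters
    so they can be adjusted; the defaults mirror the slide.
    """
    utilization = heap_utilization(used_bytes, max_bytes)
    if utilization < low:
        return "below"
    if utilization > high:
        return "above"
    return "within"


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical reading: 3 GiB used out of a 6 GiB maximum heap.
    print(classify_utilization(3 * GIB, 6 * GIB))
```

A single reading says little on its own; what matters is where utilization settles after garbage collection over time.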

Let's Apply What We Have Learned So Far
● Our Kafka Connect worker will be running on a machine with 16 GB of RAM
○ How much of the RAM goes to Kafka Connect?
■ On average the operating system will use around 2 GB of RAM for its own use.
○ That leaves 14 GB of RAM for Kafka Connect, assuming that, as a best practice, no additional components are running on the machine.
○ Now we take the remaining 14 GB of RAM and apply it to our 3 memory areas:
■ Metaspace memory area
■ Native memory area
■ Java heap
● -Xms2G -Xmx6G
○ This leaves us with 8 GB of RAM (14 GB - 6 GB = 8 GB) to be used between the Metaspace and native memory areas
○ Remember the JVM’s ideal heap utilization of 40% to 70% of the Java heap
■ With a 6 GB maximum heap size, the ideal range is 2.4 GB to 4.2 GB
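The arithmetic above can be reproduced for other machine sizes. This is a small sketch under the slide's assumptions (about 2 GB reserved for the operating system, nothing else running on the machine); the `MemoryBudget` class and its field names are hypothetical and only illustrate the calculation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MemoryBudget:
    """Memory plan for a single Kafka Connect worker, in GB."""
    total_ram_gb: float
    os_reserved_gb: float
    max_heap_gb: float  # the value given to -Xmx

    def __post_init__(self) -> None:
        if self.max_heap_gb > self.available_for_connect_gb:
            raise ValueError("max heap exceeds the RAM left for Kafka Connect")

    @property
    def available_for_connect_gb(self) -> float:
        # RAM left once the operating system has taken its share.
        return self.total_ram_gb - self.os_reserved_gb

    @property
    def non_heap_gb(self) -> float:
        # RAM left for the Metaspace and native memory areas.
        return self.available_for_connect_gb - self.max_heap_gb

    def ideal_heap_range_gb(self, low: float = 0.40,
                            high: float = 0.70) -> Tuple[float, float]:
        # The 40% to 70% utilization band applied to the maximum heap.
        return (round(self.max_heap_gb * low, 2),
                round(self.max_heap_gb * high, 2))


if __name__ == "__main__":
    budget = MemoryBudget(total_ram_gb=16, os_reserved_gb=2, max_heap_gb=6)
    print("Available for Kafka Connect (GB):", budget.available_for_connect_gb)
    print("Left for Metaspace and native memory (GB):", budget.non_heap_gb)
    print("Ideal heap utilization range (GB):", budget.ideal_heap_range_gb())
```

With the slide's numbers (16 GB total, 2 GB for the operating system, 6 GB maximum heap) this gives 14 GB, 8 GB, and a range of 2.4 GB to 4.2 GB.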

Kafka Connect Has To Do More Than Run
● How do we investigate the Java heap utilization of our Kafka Connect worker?
○ Enable verbose garbage collection (verbose gc) and heap dump generation
■ How do I enable it or turn it on?
● Enabling verbose gc and heap dump generation on Java 8
● Enabling verbose gc and heap dump generation on Java 11 and Java 17
-Xlog:gc*,gc+ref=debug,gc+heap=debug,gc+age=trace:file=/tmp/gc.log:time,uptime,level,tags:filecount=5,filesize=100m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdumpsFromJVM
Same disclaimer: the process has to be able to write to the locations specified. The location for the heap dump must be large enough for the dump to be written out completely without being truncated.
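The disclaimer above can be checked before the worker is started. The following is a rough pre-flight sketch, not something from the original slides: the function name `check_heap_dump_location` is hypothetical, it assumes the heap dump path is a directory, and it assumes a heap dump can be roughly as large as the maximum heap (-Xmx). Run it as the same operating system user that runs the Kafka Connect worker, otherwise the permission check is not meaningful.

```python
import os
import shutil
from typing import List


def check_heap_dump_location(path: str, max_heap_bytes: int) -> List[str]:
    """Return a list of problems found; an empty list means none were found.

    Assumption: a heap dump may need about as much disk space as the
    maximum heap size, so free space is compared against max_heap_bytes.
    """
    problems: List[str] = []
    if not os.path.isdir(path):
        problems.append(f"{path} is not an existing directory")
        return problems
    # Writing a new file into a directory needs write and execute permission.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    free_bytes = shutil.disk_usage(path).free
    if free_bytes < max_heap_bytes:
        problems.append(
            f"{path} has {free_bytes} bytes free, less than the "
            f"{max_heap_bytes} bytes a full heap dump might need"
        )
    return problems


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Path and heap size taken from the slides (-XX:HeapDumpPath, -Xmx6G).
    for problem in check_heap_dump_location("/tmp/heapdumpsFromJVM", 6 * GIB):
        print("WARNING:", problem)
```

The same idea applies to the GC log directory, although the rotated GC logs configured above are capped at a much smaller size.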

○ Enable JMX metrics for Kafka Connect
● Should you enable all of the above?
○ Yes, 100%
● Do your connect logs show the specific connector and specific task?
○ Enable Mapped Diagnostic Context (MDC) logging in your Kafka Connect environments
■ Do you have examples?

Logs without Mapped Diagnostic Context enabled
INFO Using multi thread/connection supporting pooling connection manager (io.searchbox.client.JestClientFactory)
INFO Using default GSON instance (io.searchbox.client.JestClientFactory)
INFO Node Discovery disabled... (io.searchbox.client.JestClientFactory)
INFO Idle connection reaping disabled... (io.searchbox.client.JestClientFactory)

Logs with Mapped Diagnostic Context enabled
INFO [sink-elastic-orders-00|task-0] Using multi thread/connection supporting pooling connection manager (io.searchbox.client.JestClientFactory:223)
INFO [sink-elastic-orders-00|task-0] Using default GSON instance (io.searchbox.client.JestClientFactory:69)
INFO [sink-elastic-orders-00|task-0] Node Discovery disabled... (io.searchbox.client.JestClientFactory:86)
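One practical benefit of the bracketed prefix is that logs can be filtered per connector and per task. The sketch below is an illustration only: the regular expression is written against the sample lines shown above (level, then `[connector|scope]`, then the message), and the names `LogContext` and `parse_mdc_line` are hypothetical. Real worker logs may use a different pattern, so treat the expression as an assumption to adjust.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "INFO [sink-elastic-orders-00|task-0] message text".
MDC_LINE = re.compile(
    r"(?P<level>[A-Z]+) "
    r"\[(?P<connector>[^|\]]+)\|(?P<scope>[^\]]+)\] "
    r"(?P<message>.*)"
)


class LogContext(NamedTuple):
    level: str
    connector: str
    scope: str  # for example "task-0"
    message: str


def parse_mdc_line(line: str) -> Optional[LogContext]:
    """Extract connector and task context, or None if the line has none."""
    match = MDC_LINE.search(line.strip())
    if match is None:
        return None
    return LogContext(
        level=match.group("level"),
        connector=match.group("connector"),
        scope=match.group("scope"),
        message=match.group("message"),
    )


if __name__ == "__main__":
    sample = ("INFO [sink-elastic-orders-00|task-0] Using default GSON "
              "instance (io.searchbox.client.JestClientFactory:69)")
    print(parse_mdc_line(sample))
```

A line without the context prefix, like those in the first log sample, returns `None`, which is exactly the case where you cannot tell which connector or task wrote it.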

● Which connect protocol should I be using in my environment?
○ Are you running Replicator as a connector?*

● Java Runtime Environment (JRE) vs. Java Development Kit (JDK)

You can find me @
X (Previously Known As Twitter): RevJ1980
LinkedIn:
linkedin.com/in/james-t-sweet-01465582

Links:
File descriptors and mmap settings
How to enable verbose gc and heap dump Knowledge Base Article
JMX Metrics for Kafka Connect
Mapped Diagnostic Context (MDC)
Replicator
Deep Dive into Kafka Connect Protocol
Theme music - We Gon’ Be Alright by Tye Tribbett
