2. Lack of etiquette and manners is a huge turn-off.
KnolX Etiquette
Punctuality
Respect KnolX session timings. Please do not join a session more than 5 minutes after its start time.
Feedback
Make sure to submit constructive feedback for all sessions, as it is very helpful for the presenter.
Silent Mode
Please keep your window on mute.
Avoid Disturbance
Avoid leaving your window unmuted after asking a question.
3. Agenda
● What is Kafka Connect
● Core Concepts
● Features of Kafka Connect
● Demo
● Architecture of Kafka Connect
4. What is Kafka Connect?
Apache Kafka is a distributed, resilient, fault-tolerant platform and a well-known name in the world of Big Data. It is one of the most widely used distributed streaming platforms.
It is a framework for storing, reading, and analyzing streaming data. It is a durable, publish-subscribe messaging system that exchanges data between processes, applications, and servers.
6. Kafka Connect Terminologies
Some important terms that help in understanding Kafka Connect:
● Connectors
● Tasks
● Workers
● Transforms
● Converters
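To make these terms concrete, here is a rough sketch of how most of them surface in a single connector configuration. The connector name, file path, and topic are hypothetical, and `term_for` is only an illustrative helper; the class names and config keys are the ones shipped with Apache Kafka as far as I am aware. Workers do not appear in the config at all: they are the processes that load it and run the resulting tasks.

```python
# Hypothetical connector definition; name, file path, and topic are illustrative.
connector_config = {
    "name": "demo-file-source",
    # Connector: the plugin class that knows how to talk to the external system.
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    # Tasks: upper bound on the parallel units of work the connector may create.
    "tasks.max": "1",
    "file": "/tmp/demo-input.txt",
    "topic": "demo-topic",
    # Converters: serialize Connect records to and from the bytes stored in Kafka.
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    # Transforms: lightweight per-record changes applied in the Connect pipeline.
    "transforms": "MakeMap",
    "transforms.MakeMap.type": "org.apache.kafka.connect.transforms.HoistField$Value",
    "transforms.MakeMap.field": "line",
}


def term_for(key: str) -> str:
    """Map a config key to the Kafka Connect term it illustrates."""
    if key == "connector.class":
        return "connector"
    if key == "tasks.max":
        return "task"
    if key.endswith(".converter"):
        return "converter"
    if key == "transforms" or key.startswith("transforms."):
        return "transform"
    return "other setting"


for key, value in connector_config.items():
    print(f"{term_for(key):14} {key} = {value}")
```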
7. Standalone vs Distributed Mode
Standalone
● A single process runs both connectors and tasks.
● Configuration uses .properties files.
● Very easy to get started with; useful for development and testing.
● Not fault tolerant, not scalable, hard to monitor.
Distributed
● Multiple workers run connectors and tasks.
● Configuration is performed through a REST API.
● Easy to scale and fault tolerant (tasks are rebalanced if a worker dies).
● Useful for production deployment of connectors.
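The configuration difference between the two modes can be sketched as follows: the same connector settings are rendered once as .properties text for standalone mode and once as a JSON request for the distributed-mode REST API. The connector name, file path, topic, and the `to_properties` / `to_rest_request` helpers are illustrative assumptions; the worker address `http://localhost:8083` is assumed to be the default REST listener. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical connector settings shared by both modes.
config = {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/demo-input.txt",
    "topic": "demo-topic",
}


def to_properties(name: str, config: dict) -> str:
    """Standalone mode: render the connector as key=value lines for a .properties file."""
    lines = [f"name={name}"] + [f"{key}={value}" for key, value in config.items()]
    return "\n".join(lines) + "\n"


def to_rest_request(name: str, config: dict,
                    base_url: str = "http://localhost:8083") -> urllib.request.Request:
    """Distributed mode: build a POST /connectors request carrying the same settings."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


print(to_properties("demo-file-source", config))

request = to_rest_request("demo-file-source", config)
print(request.get_method(), request.full_url)
# With a distributed worker running, urllib.request.urlopen(request) would submit it.
```

In standalone mode the rendered text would be saved to a file and passed to the worker on the command line at startup, whereas in distributed mode the request can be sent to any worker in the cluster while it is running.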
8. Different types of Kafka Connectors
Source Connector:
A source connector collects data from a system. Source systems can be entire databases, stream tables, or message brokers.
A source connector could also collect metrics from application servers into Kafka topics, making the data available for stream processing with low latency.
Sink Connector:
A sink connector delivers data from Kafka topics into other systems, which might be indexes such as Elasticsearch, batch systems such as Hadoop, or any kind of database.
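The direction of data flow for the two connector types can be illustrated with a toy in-memory model. This is not the Kafka Connect API: the topic is a plain list, the external systems are plain Python objects, and `run_source` / `run_sink` are hypothetical stand-ins for what real connectors do against real systems.

```python
from typing import Dict, List

Record = Dict[str, object]


def run_source(source_rows: List[Record], topic: List[Record]) -> None:
    """Source side: copy rows from an external system INTO the topic."""
    for row in source_rows:
        topic.append(row)


def run_sink(topic: List[Record], datastore: Dict[object, Record],
             key_field: str = "id") -> None:
    """Sink side: deliver records FROM the topic into another system."""
    for record in topic:
        datastore[record[key_field]] = record


# Stand-ins for an upstream database table, a Kafka topic, and a downstream index.
database_rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
topic: List[Record] = []
search_index: Dict[object, Record] = {}

run_source(database_rows, topic)   # external system -> Kafka
run_sink(topic, search_index)      # Kafka -> external system

print(len(topic), sorted(search_index))
```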
9. Kafka Connect - High level
● Source connectors to get data from common data sources
● Sink connectors to publish that data to common data stores
● Makes it easy for non-experienced developers to quickly get their data reliably into Kafka
● Part of your ETL pipeline
● Scaling made easy, from small pipelines to company-wide pipelines
● Re-usable code!
10. Kafka Connect Use Cases
Here are a few common ways Kafka Connect is used:
● Streaming Data Pipelines
● Writing to Datastores from an Application
● Evolve Processing from Old Systems to New
11. Features
● It simplifies the development, deployment, and management of connectors.
● It scales up to large clusters by leveraging the distributed nature of Kafka, and scales down to setups for development, testing, and small production deployments.
● Kafka Connect handles the offset commit process for us.
● Kafka Connect uses the existing group management protocol; we can add more workers to scale up a Kafka Connect cluster.
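The scaling and fault-tolerance behaviour can be pictured with a simplified sketch in which tasks are spread over the available workers and redistributed when a worker leaves. The round-robin `assign_tasks` helper and the worker and task names are assumptions for illustration only; the assignment logic Kafka Connect actually runs on top of the group management protocol is more involved.

```python
from typing import Dict, List


def assign_tasks(workers: List[str], tasks: List[str]) -> Dict[str, List[str]]:
    """Spread tasks over workers round-robin (a simplification, not Connect's algorithm)."""
    assignment: Dict[str, List[str]] = {worker: [] for worker in workers}
    if not workers:
        return assignment
    for index, task in enumerate(tasks):
        assignment[workers[index % len(workers)]].append(task)
    return assignment


# Hypothetical cluster: six tasks running on three workers.
tasks = [f"demo-connector-{i}" for i in range(6)]
before = assign_tasks(["worker-a", "worker-b", "worker-c"], tasks)

# worker-c dies: the remaining workers take over its tasks after a rebalance.
after = assign_tasks(["worker-a", "worker-b"], tasks)

for worker, assigned in after.items():
    print(worker, assigned)
```

Adding a worker is the same operation in reverse: the task list stays fixed and each worker ends up with a smaller share.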