8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style

1
Riding the Streaming
Wave
DIY style:
Using & Building Kafka Connect
Plugins with Confluent Open Source
Konstantine Karantasis, Software Engineer

2
Intro
Contributor:
• Apache Kafka
• Confluent Open Source
• Confluent Enterprise

5
… and Answer
Streaming data platform as the central nervous system of data architecture.

13
Kafka Connect: Out-of-the-box
Get for free:
• Distributed deployment that scales.
• Lean multi-tenancy with packaging/classloading isolation.
• REST API to manage plugins.
• Easy to use interfaces for automatic and periodic as well as
manual tracking of progress.

14
Kafka Connect: Out-of-the-box
more:
• Native integration with monitoring
platforms such as Confluent
Control Center.
• Metrics (more coming up soon).
• Continuous open source
development.

15
Kafka Connect Plugins: A Developer’s Perspective
An ecosystem of plugins:
• Connectors
• Transforms
• Converters
How data flow through Connect API:

16
Kafka Connect Plugin Types
Connectors are the richest plugins. Two types:
• Source Connectors
• Sink Connectors
Connectors may support both structured and unstructured data:
• Converters and the Schema Registry
Transforms:
• Is data useful as is? Or can use some basic transformations?

17
Where to start
Sounds great! How do I start?
(or else: Why Confluent Open Source for Connect plugin dev)

18
Where to start
Iterate fast during development with Confluent CLI:

19
Classloading Isolation: Development with peace of mind
Use the plugin.path worker configuration property
my-plugins (included in the plugin.path )
kafka-connect-foo-connector
JAR files, sample configs, licenses,
etc.

20
Classloading Isolation: Development with peace of mind
Workers isolate the JARs for each connector, transform,
and converter to prevent conflicts.
my-plugins (included in the plugin.path )
kafka-connect-foo-connector
kafka-connect-bar-connector
kafka-connect-baz-uber.jar

21
Kafka Connect in Action
Let’s build a simple stream of data with Kafka Connect.
• Find a dev-friendly public feed (e.g. meetup rsvps)
• Start a simple source connector (here: file source connector)
Demo with Confluent CLI

22
Source Connector Dev: Basic Concepts
• Keep track of Source Connector’s progress: User-defined
source offsets.
• It’s a distributed system. Design for multiple workers/multiple
tasks per worker.
• Map your data to topic-partitions efficiently.
• At-least-once semantics for Source Connectors.

23
Kafka Connect in Action
• Next, start a sink connector to extend the stream
(here: s3 sink connector)
Demo with Confluent CLI

24
Sink Connector Dev: Basic Concepts
• Sink Connectors utilize Kafka consumer offsets to track
progress.
• By default at-least once semantics with automatic and periodic
offset commits.
• But if the Sink allows for determinism and idempotence, sink
connectors can be exactly-once!
• Examples: S3 and HDFS Connectors.

25
Debugging Kafka Connect and Connect Plugins
This looked nice, but also pretty ideal for Connector development.
How do I debug my Connect plugins?
• Set the right environment variables for Confluent CLI
export CONNECT_DEBUG=y;
export DEBUG_SUSPEND_FLAG=y;
• also: export CONFLUENT_CURRENT=/your/fav/dir/location
(doesn’t have to be have to be /tmp)
• Restart Kafka Connect worker
• Attach a remote debugger using your favorite IDE

26
Package and Publish your connector
So you built your Kafka Connect plugin!
• Currently follow a commonly used pattern using maven-
assembly-plugin or shade plugin
• maven-kafka-connect-plugin coming up soon.
• confluent.io/connectors

27
Summary
Why develop with Kafka Connect?
• Lot’s of functionality for free (scalability, multi-tenancy, mgmt, monitoring)
• Quick on-boarding
• Active community
How to develop your Kafka Connect plugins?
• Use Confluent CLI
• Use classloading isolation with plugin.path
• Debug your connector with your favorite IDE
• Extend existing open source plugins or build your own!
• Stay tuned, the best is yet to come!

8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to 8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style

Similar to 8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style (20)

More from Athens Big Data

More from Athens Big Data (20)

Recently uploaded

Recently uploaded (20)

8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style