@jwfbean | @confluentinc
Building a Kafka Connector
Verified Integrations Program
Speakers
Jeff Bean
Partner Solution Architect
Lisa Sensmeier
Partner Marketing
Todd McGrath
Partner Solution Architect
Agenda
● Why Kafka Connect
● Kafka Connect Architecture
● Building a Connector
● Common Issues
● Examples
● Verification Criteria & Process
When Pipelines Attack
@jwfbean | @confluentinc
5
[Diagram: point-to-point pipelines sprawl between apps, stream processing, search, KV stores, an RDBMS, a DW, real-time analytics, and monitoring]
@jwfbean | @confluentinc
6
[Diagram: a producer application writes to Kafka (PRODUCER); a consumer application reads from Kafka (CONSUMER)]
@jwfbean | @confluentinc
7
[Diagram: on the producer side, a source connector feeds SMTs and a converter inside Kafka Connect before records reach Kafka; on the consumer side, records flow from Kafka through a converter and SMTs to a sink connector]
@jwfbean | @confluentinc
9
Connectors
● A logical job that copies data in and out of Apache Kafka®
● Maintains tasks
● Provides lifecycle and configuration information
● Ultimately a JAR available to the connect JVM
● Stateless! Use Kafka topics or source / target for state
@jwfbean | @confluentinc
10
[Diagram: Running - workers exposing REST APIs host a connector (C), its tasks (T), and other connectors’ tasks; start(…) has been called on the connector and its tasks, and the framework repeatedly calls poll(…) on the running tasks]
@jwfbean | @confluentinc
11
Kafka Connect - Connector and Task Lifecycles
[Diagram: Starting - a connector configuration is submitted through the REST API, validated, and deployed; start(…) is called on the connector (C)]
@jwfbean | @confluentinc
12
Kafka Connect - Connector and Task Lifecycles
[Diagram: Running - the framework calls taskConfigs(…) on the connector (C) and starts the resulting tasks (T)]
@jwfbean | @confluentinc
13
Tasks
● Lifecycle managed by a connector
● Runs in a worker
● Manages one or more topic partitions
● The assignment is dynamic at runtime
● Does the actual copying and transforming of things
@jwfbean | @confluentinc
14
Workers
● Actual JVM processes running on a computer of some kind
● Tasks get allocated to workers
● Can run in standalone or distributed mode
○ Standalone is good for dev, one-offs, and conference demos
○ Distributed good for scale and fault tolerance
@jwfbean | @confluentinc
15
Recovery from Failures
[Diagram: three workers, each exposing a REST API, share a connector (C), its tasks (T), and other connectors’ tasks]
@jwfbean | @confluentinc
16
Recovery from Failures
[Diagram: a worker has failed; its connector and tasks are rebalanced onto the remaining workers]
@jwfbean | @confluentinc
17
Stopping
[Diagram: stop() is called on the connector and on each of its tasks; other connectors and their tasks keep running]
@jwfbean | @confluentinc
18
Delivery Guarantees
● Framework-managed offsets
● At-least-once default
● Exactly once available with connector support
@jwfbean | @confluentinc
19
Converters
@jwfbean | @confluentinc
20
Converters
● Convert input data to bytes for storing in Kafka
● Convert bytes read from Kafka into output for somewhere else
● Sit between the connector and Kafka in either direction
@jwfbean | @confluentinc
21
Single Message Transform
(SMT)
Single Message Transforms
Modify events before storing in Kafka:
● Mask sensitive information
● Add identifiers
● Tag events
● Lineage/provenance
● Remove unnecessary columns
Modify events going out of Kafka:
● Route high priority events to faster datastores
● Direct events to different Elasticsearch indexes
● Cast data types to match destination
● Remove unnecessary columns
@jwfbean | @confluentinc
23
Where SMTs Live
@jwfbean | @confluentinc
24
Built-in Transformations
● InsertField – Add a field using either static data or record metadata
● ReplaceField – Filter or rename fields
● MaskField – Replace field with valid null value for the type (0, empty string, etc.)
● ValueToKey – Set the key to one of the value’s fields
● HoistField – Wrap the entire event as a single field inside a Struct or a Map
● ExtractField – Extract a specific field from Struct and Map and include only this field in results
● SetSchemaMetadata – Modify the schema name or version
● TimestampRouter – Modify the topic of a record based on the original topic and timestamp, which is useful when a sink needs to write to different tables or indexes based on timestamps
● RegexRouter – Modify the topic of a record based on the original topic, a replacement string, and a regular expression
Kafka Connect API
31
SourceConnector and SinkConnector
class Connector {
String version();
ConfigDef config();
Class<? extends Task> taskClass();
void start(Map<String, String> props);
List<Map<String, String>> taskConfigs(int maxTasks);
void stop();
...
}
32
SourceConnector and SinkConnector
class SinkConnector extends Connector {}
class SourceConnector extends Connector {}
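To make the shape of the API concrete, here is a minimal sketch of a source connector. The class names and the “hosts” setting are hypothetical; only the overridden methods come from the Connect API.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;

public class MySourceConnector extends SourceConnector {
  private Map<String, String> props;

  @Override public String version() { return "0.1.0"; }

  @Override public ConfigDef config() {
    // One hypothetical setting; real connectors define many more
    return new ConfigDef().define("hosts", ConfigDef.Type.STRING,
        ConfigDef.Importance.HIGH, "Comma-separated hosts to read from");
  }

  @Override public Class<? extends Task> taskClass() { return MySourceTask.class; }

  @Override public void start(Map<String, String> props) { this.props = props; }

  @Override public List<Map<String, String>> taskConfigs(int maxTasks) {
    // Simplest strategy: every task gets the same configuration
    List<Map<String, String>> configs = new ArrayList<>();
    for (int i = 0; i < maxTasks; i++) configs.add(new HashMap<>(props));
    return configs;
  }

  @Override public void stop() { /* nothing allocated in this sketch */ }
}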
@jwfbean | @confluentinc
33
Starting a connector
void start(Map<String, String> props);
•The user-specified connector configuration
•Optionally talk to external system to get information about tasks
•Determine the task configuration
•Optionally start thread(s) to monitor external system, and if needed ask for task reconfiguration
@jwfbean | @confluentinc
34
Task configs
List<Map<String, String>> taskConfigs(int maxTasks);
•Tell Connect the configurations for each of the tasks
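When the connector knows its units of work, taskConfigs(…) usually partitions them across the tasks. A sketch, assuming a hypothetical comma-separated “tables” setting saved in start():

@Override
public List<Map<String, String>> taskConfigs(int maxTasks) {
  List<String> tables = Arrays.asList(props.get("tables").split(","));
  int numGroups = Math.min(tables.size(), maxTasks);
  // Round-robin the tables into at most maxTasks disjoint groups
  List<List<String>> groups = new ArrayList<>();
  for (int i = 0; i < numGroups; i++) groups.add(new ArrayList<>());
  for (int i = 0; i < tables.size(); i++) groups.get(i % numGroups).add(tables.get(i));

  List<Map<String, String>> taskConfigs = new ArrayList<>(numGroups);
  for (List<String> group : groups) {
    Map<String, String> config = new HashMap<>(props);
    config.put("task.tables", String.join(",", group)); // read back in the task's start()
    taskConfigs.add(config);
  }
  return taskConfigs;
}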
@jwfbean | @confluentinc
35
Considerations - Testing
•Design for testability
•Unit tests and integration tests in your build
-look for reusable integration tests (PR5516) using embedded ZK & Kafka
•Continuous integration for regression tests
•System tests use real ZK, Kafka, Connect, and external systems
•Performance and soak tests
•See the Connect Verification Guide for more detailed information on testing.
@jwfbean | @confluentinc
36
SourceConnector and SinkConnector - Stopping
void stop();
•Notification that the connector is being stopped
•Clean up any resources that were allocated so far
37
SourceTask
abstract class SourceTask implements Task {
protected SourceTaskContext context;
void start(Map<String, String> props);
List<SourceRecord> poll() throws InterruptedException;
void stop();
...
}
@jwfbean | @confluentinc
38
SourceTask - Starting
•This task’s configuration that was created by your connector
•Read previously committed offsets to know where in the external system to start reading
•Create any resources it might need
- Connections to external system
- Buffers, queues, threads, etc.
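Reading previously committed offsets in start() might look like this sketch; the “filename” / “position” keys are hypothetical and simply mirror whatever Map the connector uses when building its SourceRecords:

@Override
public void start(Map<String, String> props) {
  this.filename = props.get("filename"); // hypothetical task config key
  Map<String, String> sourcePartition = Collections.singletonMap("filename", filename);
  // Ask Connect for the last offset it committed for this source partition
  Map<String, Object> offset = context.offsetStorageReader().offset(sourcePartition);
  this.position = (offset == null) ? 0L : (Long) offset.get("position");
  // ...then open connections, buffers, threads, etc.
}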
@jwfbean | @confluentinc
39
SourceTask - getting records
List<SourceRecord> poll() throws InterruptedException;
● Called frequently
● Get the next batch of records that Connect should write to Kafka
● Block until there are “enough” records to return
● Return null if no records right now
● For systems that push data:
● use a separate thread to receive and process the data and enqueue it
● poll() then dequeues and returns records (see the sketch below)
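For a push-based system, a common pattern is a receiver thread that enqueues records while poll() drains the queue. A sketch:

private final BlockingQueue<SourceRecord> queue = new LinkedBlockingQueue<>();
// ...a receiver thread started in start() calls queue.put(record)

@Override
public List<SourceRecord> poll() throws InterruptedException {
  // Block briefly for the first record so we neither spin nor stall shutdown
  SourceRecord first = queue.poll(500, TimeUnit.MILLISECONDS);
  if (first == null) {
    return null; // no records right now; Connect will call poll() again
  }
  List<SourceRecord> batch = new ArrayList<>();
  batch.add(first);
  queue.drainTo(batch); // grab whatever else is already buffered
  return batch;
}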
@jwfbean | @confluentinc
40
SourceRecord
topic : String
partition : Integer
keySchema : Schema
key : Object
valueSchema : Schema
value : Object
timestamp : Long
headers : Headers
sourcePartition : Map
sourceOffset : Map
• Topic where the record is to be written
• Optional partition # for the topic
• Optional key and the schema that describes it
• Optional value and the schema that describes it
• Optional headers
• Optional timestamp
• Source “partition” and “offset"
- Describes where this record originated
- Defined by connector, used only by connector
- Connect captures last partition+offset that it writes, periodically committed to
connect-offsetstopic
- When task starts up, it reads these to know where it should start
- TIP: reuse the same partition Map instance
• Serialized via Converters
SourceRecord
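Building one record inside poll() might look like this sketch (the topic name and the partition/offset keys are hypothetical):

// Reuse the same sourcePartition Map instance across records (see TIP above)
Map<String, String> sourcePartition = Collections.singletonMap("filename", filename);
Map<String, Long> sourceOffset = Collections.singletonMap("position", nextPosition);

SourceRecord record = new SourceRecord(
    sourcePartition, sourceOffset,
    "my-topic",                   // topic to write to
    null,                         // partition: let the partitioner decide
    Schema.STRING_SCHEMA, key,    // optional key and its schema
    Schema.STRING_SCHEMA, value); // value and its schema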
@jwfbean | @confluentinc
41
Connect Schemas
Schema
name : String
version : String
doc : String
type : Type
parameters : Map
fields : List<Field>
Field
name : String
schema : Schema
index : int
• Name (required), version, and documentation
• Type of schema: primitive, Map, Array, Struct
• Whether the value described by the schema is optional
• Optional metadata parameters
• Information about the structure:
- For Struct schemas, the fields
- For Map schemas, the key schema and value schema
- For Array schemas, the element schema
• Name of field that this schema describes
• Schema for field value
• Index of field within the Struct
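A sketch of building a Struct schema and a conforming value with SchemaBuilder (names hypothetical):

Schema userSchema = SchemaBuilder.struct()
    .name("com.example.User")
    .version(1)
    .field("id", Schema.INT64_SCHEMA)
    .field("email", Schema.OPTIONAL_STRING_SCHEMA) // optional field
    .build();

Struct user = new Struct(userSchema)
    .put("id", 42L)
    .put("email", "user@example.com");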
@jwfbean | @confluentinc
42
SourceTask - Stopping
void stop();
•Clean up resources that were allocated so far
43
SinkTask
abstract class SinkTask implements Task {
protected SinkTaskContext context;
void start(Map<String, String> props);
void open(Collection<TopicPartition> partitions);
void put(Collection<SinkRecord> records);
Map<TopicPartition, OffsetAndMetadata> preCommit(
Map<TopicPartition, OffsetAndMetadata> currentOffsets
);
void close(Collection<TopicPartition> partitions);
void stop();
...
}
@jwfbean | @confluentinc
44
SinkTask - Starting
void start(Map<String, String> props);
•This task’s configuration that was created by your connector
- includes topics and topics.regex property
•Start the task and create any resources it might need
@jwfbean | @confluentinc
45
SinkTask - Assigned topic partitions
void open(Collection<TopicPartition> partitions);
•Optionally pre-allocate writers and resources
•Consumer will by default start based on its own committed offsets
•Or use context.offset(...) to set the desired starting point
@jwfbean | @confluentinc
46
SinkTask - Process records
void put(Collection<SinkRecord> records);
● Called frequently with the next batch of records returned from the consumer
● Size of batch depends on
● availability of records in our assigned topic partitions, and
● consumer settings in the worker, including:
● consumer.fetch.min.bytes and consumer.fetch.max.bytes
● consumer.max.poll.interval.ms and consumer.max.poll.records
● Either write directly to the external system, or buffer them and write when there are “enough” to send (see the sketch below)
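A buffering put() might look like this sketch; the flush threshold and writeBatchToExternalSystem(...) are hypothetical:

private final List<SinkRecord> buffer = new ArrayList<>();
private static final int FLUSH_SIZE = 1000; // hypothetical threshold

@Override
public void put(Collection<SinkRecord> records) {
  buffer.addAll(records);
  if (buffer.size() >= FLUSH_SIZE) {
    writeBatchToExternalSystem(buffer); // hypothetical external client call
    buffer.clear();
  }
}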
@jwfbean | @confluentinc
47
SinkRecord
• Topic, partition, and offset from where the record was consumed
• Optional key and the schema that describes it
• Optional value and the schema that describes it
• Timestamp and whether it’s create/append/none
• Headers
- ordered list of name-value pairs
- utility to convert value to desired type, if possible
• Deserialized via Converters
topic : String
partition : Integer
keySchema : Schema
key : Object
valueSchema : Schema
value : Object
timestamp : Long
timestampType : enum
offset : long
headers : Headers
SinkRecord
@jwfbean | @confluentinc
48
Sink Connectors and Exactly Once Semantics (EOS)
• Must track the offsets of records that were written to the external system
- typically involves atomically storing offsets in the external system
- for convenience can optionally still have the consumer commit those offsets
• When a task restarts and is assigned topic partitions, it
- looks in the external system for the next offset for each topic partition
- sets in the task context the exact offsets where to start consuming
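A sketch of the restart side of that pattern, in open(); lookupOffsetInExternalSystem(...) is hypothetical and would read the offset stored atomically alongside the data:

@Override
public void open(Collection<TopicPartition> partitions) {
  Map<TopicPartition, Long> startOffsets = new HashMap<>();
  for (TopicPartition tp : partitions) {
    Long next = lookupOffsetInExternalSystem(tp.topic(), tp.partition());
    if (next != null) {
      startOffsets.put(tp, next); // resume exactly after the last durable write
    }
  }
  context.offset(startOffsets); // tell the framework's consumer where to start
}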
@jwfbean | @confluentinc
49
SinkTask - Committing offsets
Map<TopicPartition, OffsetAndMetadata> preCommit(
    Map<TopicPartition, OffsetAndMetadata> currentOffsets);
● Called periodically based upon the worker’s offset.flush.interval.ms
● defaults to 60 seconds
● Supplied the current offsets for consumed records as of the last call to put(…)
● Return the offsets that the consumer should actually commit
● should reflect what was actually written so far to the external system for EOS
● or return null if the consumer should not commit offsets
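A sketch of a preCommit(…) that only commits what was durably written; flushedUpTo(...) is hypothetical:

@Override
public Map<TopicPartition, OffsetAndMetadata> preCommit(
    Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
  Map<TopicPartition, OffsetAndMetadata> committable = new HashMap<>();
  for (TopicPartition tp : currentOffsets.keySet()) {
    // Next offset this task has durably written to the external system
    long flushed = flushedUpTo(tp);
    committable.put(tp, new OffsetAndMetadata(flushed));
  }
  return committable; // or null to suppress the consumer commit entirely
}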
@jwfbean | @confluentinc
50
SinkTask
void close(Collection<TopicPartition> partitions);
•Close any writers / resources for these topic partitions
@jwfbean | @confluentinc
51
void stop();
Considerations
@jwfbean | @confluentinc
53
Considerations - Configuration
•Only way to control a connector
•Make as simple as possible, but no simpler
•Use validators and recommenders in ConfigDef
•Strive for backward compatibility
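A sketch of a ConfigDef using a validator and the PASSWORD type (setting names hypothetical):

static final ConfigDef CONFIG_DEF = new ConfigDef()
    .define("connection.url", ConfigDef.Type.STRING,
        ConfigDef.Importance.HIGH, "URL of the external system")
    .define("connection.password", ConfigDef.Type.PASSWORD,
        ConfigDef.Importance.HIGH,
        "Password; the PASSWORD type keeps it out of logs and REST responses")
    .define("batch.size", ConfigDef.Type.INT, 1000,
        ConfigDef.Range.atLeast(1), // validator: reject non-positive sizes
        ConfigDef.Importance.MEDIUM, "Records per write to the external system");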
@jwfbean | @confluentinc
54
Considerations - Ordering
•Each Kafka topic partition is totally ordered
•Design connectors around this notion
@jwfbean | @confluentinc
55
Considerations - Scaling
•Use tasks to parallelize work
-each task runs in its own thread
-you can start other threads, but always clean them up properly
•Sink connectors - typically use # of tasks specified by user
•Source connectors - sometimes only one task (or a few) makes sense
@jwfbean | @confluentinc
56
Considerations - Version Compatibility
•Connectors are by definition multi-body problems
-external system
-Connect runtime
-Kafka broker
•Specify what versions can be used together
@jwfbean | @confluentinc
57
Considerations - Resilience
•What are the failure modes?
•Can you retry (forever with exponential backoff) rather than fail?
-consider RetriableException
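A sketch of distinguishing transient from permanent failures in a sink task; the exception types and backoffMs() are hypothetical stand-ins for your client library:

@Override
public void put(Collection<SinkRecord> records) {
  try {
    writeBatchToExternalSystem(records); // hypothetical client call
  } catch (TransientTimeoutException e) {
    context.timeout(backoffMs());    // optionally grow the retry interval
    throw new RetriableException(e); // framework redelivers this batch
  } catch (AuthenticationFailedException e) {
    throw new ConnectException(e);   // permanent: retrying won't help
  }
}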
@jwfbean | @confluentinc
58
Considerations - Logging
•Don’t overuse ERROR or WARN
•Use INFO level so users know what’s happening under normal operation and when misbehaving
•Use DEBUG so users can diagnose why it’s behaving strangely
•Use TRACE for you
@jwfbean | @confluentinc
59
Considerations - Security
•Use PASSWORD configuration type in ConfigDef
•Communicate securely with external system
•Don’t log records (except maybe at TRACE)
@jwfbean | @confluentinc
60
Considerations - Naming things
•Don’t use org.apache.kafka.* packages
•These are reserved for use by the Apache Kafka project
@jwfbean | @confluentinc
61
Considerations - Licensing
•Indicate your license, whether Open Source License or proprietary
FAQs
Lifecycle Management
● How to make sure systems are ready before sending them records?
○ SinkTaskContext.pause() / SinkTaskContext.resume() - when target is not ready
○ SinkTask.open() - called when partition is assigned to a sink task
○ Connector.start() - database creation
○ taskConfigs()
@jwfbean | @confluentinc
Ordering
● Apache Kafka guarantees ordering, but the connector developer has to ensure it as well
● Parallel transactions might complete out of order on the target system
@jwfbean | @confluentinc
DLQ
● There’s an open request for API access to the DLQ
● The DLQ is just a Kafka topic; it can be written to by a producer, but this violates Kafka Connect guidelines
● The DLQ is for records the connector cannot process, not for errors
@jwfbean | @confluentinc
Push vs Pull
● The Kafka Connect source / sink paradigm “reverses the polarity” of the Kafka producer / consumer paradigm on purpose.
○ Producers PUSH to Kafka; a source connector PULLs from the source (the framework PUSHes to Kafka)
○ Consumers PULL from Kafka; a sink connector PUSHes to the sink (the framework PULLs from Kafka)
● What if I want a target to PULL via a sink connector?
○ The connector still PULLs
○ The sink task can call SinkTaskContext.pause(), check for liveness on the target, and then SinkTaskContext.resume() when ready (see the sketch below)
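A sketch of that pause / resume pattern; targetIsHealthy() and the assignedPartitions array (saved in open()) are hypothetical:

@Override
public void put(Collection<SinkRecord> records) {
  if (!targetIsHealthy()) {
    context.pause(assignedPartitions); // stop consuming for now
    context.timeout(5_000);            // ask to be retried in ~5 seconds
    throw new RetriableException("target not ready");
  }
  context.resume(assignedPartitions);
  writeBatchToExternalSystem(records); // hypothetical client call
}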
Examples
68
MongoDB
•Source and sink connectors
•Source: next release of the MongoDB CDC connector from Debezium
•Sink: next release of HPG’s community connector
•https://www.mongodb.com/blog/post/getting-started-with-the-mongodb-connector-for-apache-kafka-and-mongodb-atlas
•https://github.com/mongodb/mongo-kafka
69
Snowflake
•https://docs.snowflake.net/manuals/user-guide/kafka-connector.html
•https://github.com/snowflakedb/snowflake-kafka-connector
70
Neo4J
•https://www.confluent.io/blog/kafka-connect-neo4j-sink-plugin
•https://github.com/neo4j-contrib/neo4j-streams/tree/master/kafka-connect-neo4j
71
Rockset
•https://rockset.com/blog/kafka-connect-plugin-for-rockset/
72
Confluent Supported
Examples
•S3:
https://github.com/confluentinc/kafka-connect-storage-cloud
•Elasticsearch:
https://github.com/confluentinc/kafka-connect-elasticsearch
@jwfbean | @confluentinc
Verification Criteria
76
Sign up, ask questions, or start the process
confluent.io/Verified-Integrations-Program/
Email
vip@confluent.io
Kafka Summit San Francisco
kafka-summit.org/
code KS19Online25 for 25% off
Q&A
How to Build an Apache Kafka® Connector

More Related Content

What's hot

Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Jean-Paul Azar
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignMichael Noll
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaJiangjie Qin
 
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...confluent
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache KafkaChhavi Parasher
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka IntroductionAmita Mirajkar
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connectconfluent
 
So You Want to Write a Connector?
So You Want to Write a Connector? So You Want to Write a Connector?
So You Want to Write a Connector? confluent
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformJean-Paul Azar
 
Exactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache KafkaExactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache Kafkaconfluent
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetesconfluent
 
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKai Wähner
 
Reliability Guarantees for Apache Kafka
Reliability Guarantees for Apache KafkaReliability Guarantees for Apache Kafka
Reliability Guarantees for Apache Kafkaconfluent
 

What's hot (20)

Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connect
 
So You Want to Write a Connector?
So You Want to Write a Connector? So You Want to Write a Connector?
So You Want to Write a Connector?
 
KSQL Intro
KSQL IntroKSQL Intro
KSQL Intro
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platform
 
Exactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache KafkaExactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache Kafka
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetes
 
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
 
Reliability Guarantees for Apache Kafka
Reliability Guarantees for Apache KafkaReliability Guarantees for Apache Kafka
Reliability Guarantees for Apache Kafka
 

Similar to How to Build an Apache Kafka® Connector

Kafka meetup - kafka connect
Kafka meetup -  kafka connectKafka meetup -  kafka connect
Kafka meetup - kafka connectYi Zhang
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkEvan Chan
 
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...HostedbyConfluent
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...HostedbyConfluent
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka MeetupCliff Gilmore
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingGuozhang Wang
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsDataWorks Summit
 
Exactly-once Data Processing with Kafka Streams - July 27, 2017
Exactly-once Data Processing with Kafka Streams - July 27, 2017Exactly-once Data Processing with Kafka Streams - July 27, 2017
Exactly-once Data Processing with Kafka Streams - July 27, 2017confluent
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsGuozhang Wang
 
London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)Landoop Ltd
 
Designing Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDesigning Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDatabricks
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...Flink Forward
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsKevin Webber
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streamsYoni Farin
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Databricks
 
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka StreamsKafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streamsconfluent
 

Similar to How to Build an Apache Kafka® Connector (20)

Kafka meetup - kafka connect
Kafka meetup -  kafka connectKafka meetup -  kafka connect
Kafka meetup - kafka connect
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
 
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and Analytics
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
 
Exactly-once Data Processing with Kafka Streams - July 27, 2017
Exactly-once Data Processing with Kafka Streams - July 27, 2017Exactly-once Data Processing with Kafka Streams - July 27, 2017
Exactly-once Data Processing with Kafka Streams - July 27, 2017
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka Streams
 
London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)
 
Designing Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDesigning Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things Right
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka Streams
 
Stream Processing made simple with Kafka
Stream Processing made simple with KafkaStream Processing made simple with Kafka
Stream Processing made simple with Kafka
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streams
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
 
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka StreamsKafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
 

More from confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

More from confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Recently uploaded

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

How to Build an Apache Kafka® Connector

  • 1. @jwfbean | @confluentinc Building a Kafka Connector Verified Integrations Program
  • 2. Speakers Jeff Bean Partner Solution Architect Lisa Sensmeier Partner Marketing Todd McGrath Partner Solution Architect
  • 3. Agenda ● Why Kafka Connect ● Kafka Connect Architecture ● Building a Connector ● Common Issues ● Examples ● Verification Criteria & Process
  • 7. @jwfbean | @confluentinc 7 PRODUCER CONSUMER Sink ConnectorSMTsSource Connector ConverterSMTs Converter KAFKA CONNECT KAFKA CONNECT
  • 8. @jwfbean | @confluentinc 8 PRODUCER CONSUMER Sink ConnectorSMTsSource Connector ConverterSMTs Converter KAFKA CONNECT KAFKA CONNECT
  • 9. @jwfbean | @confluentinc 9 Connectors ● A logical job that copies data in and out of Apache Kafka® ● Maintains tasks ● Provides lifecycle and configuration information ● Ultimately a JAR available to the connect JVM ● Stateless! Use Kafka topics or source / target for state
  • 10. @jwfbean | @confluentinc 10RESTAPI RESTAPI RESTAPI C TTTT TTTT Other Task(s) Other Task(s) Other Task(s) start(…) poll(…) poll(…) poll(…) poll(…) Other Connector(s) start(…) start(…) start(…) TTTT Running
  • 11. @jwfbean | @confluentinc 11 Kafka Connect - Connector and Task Lifecycles REST API REST API RESTAPI KAFKA CONNECT validate CC start(…) deploy Starting
  • 12. @jwfbean | @confluentinc 12 Kafka Connect - Connector and Task Lifecycles RESTAPI RESTAPI RESTAPI C taskConfigs(…) TTTT Running
  • 13. @jwfbean | @confluentinc 13 Tasks ● Lifecycle managed by a connector ● Runs in a worker ● Manages one or more topic partitions ● The assignment is dynamic at runtime ● Does the actual copying and transforming of things
  • 14. @jwfbean | @confluentinc 14 Workers ● Actual JVM processes running on a computer of some kind ● Tasks get allocated to workers ● Can run in standalone or distributed mode ○ Standalone is good for dev, one-offs, and conference demos ○ Distributed good for scale and fault tolerance
  • 15. @jwfbean | @confluentinc 15RESTAPI RESTAPI RESTAPI C TTTT TTTT Other Task(s) Other Task(s) Other Task(s) Other Connector(s) Recovery from Failures
  • 17. @jwfbean | @confluentinc 17RESTAPI RESTAPI C TTTT TTTT Stopping stop() stop() stop()stop() stop() Other Task(s) Other Connector(s ) Other Task(s)
  • 18. @jwfbean | @confluentinc 18 Delivery Guarantees ● Framework-managed offsets ● At-least-once default ● Exactly once available with connector support
  • 20. @jwfbean | @confluentinc 20 Converters ● Convert input data to bytes for storing in Kafka ● Convert input data from bytes to output to somewhere else ● Sit between the connector and Kafka in either direction
  • 21. @twitter_handle | #hashtag | @confluentinc 21 Single Message Transform (SMT)
  • 22. Single Message Transforms ● Mask sensitive information ● Add identifiers ● Tag events ● Lineage/provenance ● Remove unnecessary columns ● Route high priority events to faster datastores ● Direct events to different Elasticsearch indexes ● Cast data types to match destination ● Remove unnecessary columns Modify events before storing in Kafka: Modify events going out of Kafka:
  • 24. @jwfbean | @confluentinc 24 Built-in Transformations ● InsertField – Add a field using either static data or record metadata ● ReplaceField – Filter or rename fields ● MaskField – Replace field with valid null value for the type (0, empty string, etc.) ● ValueToKey – Set the key to one of the value’s fields ● HoistField – Wrap the entire event as a single field inside a Struct or a Map ● ExtractField – Extract a specific field from Struct and Map and include only this field in results ● SetSchemaMetadata – Modify the schema name or version ● TimestampRouter – Modify the topic of a record based on original topic and timestamp, which is useful when using a sink that needs to write to different tables or indexes based on timestamps ● RegexpRouter – Modify the topic of a record based on the original topic, replacement string, and a regular expression
  • 26. 3 1 SourceConnector and SinkConnector class Connector { String version(); ConfigDef config(); Class<? extends Task> taskClass(); void start(Map<String, String> props); List<Map<String, String>> taskConfigs(int maxTasks); void stop(); ... }
  • 27. 3 2 SourceConnector and SinkConnector class SinkConnector extends Connector {} class SourceConnector extends Connector {}
  • 28. @jwfbean | @confluentinc 33 Starting a connector void start(Map<String, String> props); •The user-specified connector configuration •Optionally talk to external system to get information about tasks •Determine the task configuration •Optionally start thread(s) to monitor external system, and if needed ask for task reconfiguration
  • 29. @jwfbean | @confluentinc 34 Task configs List<Map<String, String>> taskConfigs(int maxTasks); •Tell Connect the configurations for each of the tasks
  • 30. @jwfbean | @confluentinc 35 Considerations - Testing •Design for testability •Unit tests and integration tests in your build -look for reusable integration tests (PR5516) using embedded ZK & Kafka •Continuous integration for regression tests •System tests use real ZK, Kafka, Connect, and external systems •Performance and soak tests •See the Connect Verification Guide for more detailed information on testing.
  • 31. @jwfbean | @confluentinc 36 SourceConnector and SinkConnector - Stoppingvoid stop(); •Notification that the connector is being stopped •Clean up any resources that were allocated so far
  • 32. 3 7 SourceTask abstract class SourceTask implements Task { protected SourceTaskContext context; void start(Map<String, String> props); List<SourceRecord> poll() throws InterruptedException; void stop(); ... }
  • 33. @jwfbean | @confluentinc 38 SourceTask - Starting •This task’s configuration that was created by your connector •Read previously committed offsets to know where in the external system to start reading •Create any resources it might need - Connections to external system - Buffers, queues, threads, etc.
  • 34. @jwfbean | @confluentinc 39 SourceTask - getting records •The topic partitions assigned to this task's consumer •Optionally pre-allocate writers and resources •Consumer will by default start based on its own committed offsets •Or use context.offsets(...) to set the desired starting point ● Called frequently ● Get the next batch of records that Connect should write to Kafka ● block until there are “enough” records to return ● return null if no records right now ● For systems that push data ● use separate thread to receive and process the data and enqueue ● poll then dequeues and returns records List<SourceRecord> poll() throws InterruptedException;
  • 35. SourceRecord

    SourceRecord
        topic : String
        partition : Integer
        keySchema : Schema
        key : Object
        valueSchema : Schema
        value : Object
        timestamp : Long
        headers : Headers
        sourcePartition : Map
        sourceOffset : Map

  ● Topic where the record is to be written
  ● Optional partition # for the topic
  ● Optional key and the schema that describes it
  ● Optional value and the schema that describes it
  ● Optional headers
  ● Optional timestamp
  ● Source “partition” and “offset”
    - Describe where this record originated
    - Defined by the connector, used only by the connector
    - Connect captures the last partition+offset that it writes, periodically committed to the connect-offsets topic
    - When a task starts up, it reads these to know where it should start
    - TIP: reuse the same partition Map instance
  ● Serialized via Converters
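  Constructing a SourceRecord then looks like the sketch below; the "orders" table, topic name, and row-based offset are hypothetical, but the constructor is the real Connect API:

    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.source.SourceRecord;

    import java.util.Collections;
    import java.util.Map;

    public class SourceRecordExample {
        // Reuse the same partition Map instance across records (see tip above)
        private static final Map<String, String> SOURCE_PARTITION =
                Collections.singletonMap("table", "orders");

        static SourceRecord recordFor(long rowId, String payload) {
            // The source offset is what lets a restarted task resume
            Map<String, Long> sourceOffset = Collections.singletonMap("row", rowId);
            return new SourceRecord(
                    SOURCE_PARTITION, sourceOffset,
                    "orders-topic",                 // destination topic
                    null,                           // let Kafka pick the partition
                    Schema.STRING_SCHEMA, String.valueOf(rowId),  // key + schema
                    Schema.STRING_SCHEMA, payload);               // value + schema
        }
    }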
  • 36. Connect Schemas

    Schema
        name : String
        version : String
        doc : String
        type : Type
        parameters : Map
        fields : List<Field>

    Field
        name : String
        schema : Schema
        index : int

  Schema:
  ● Name (required), version, and documentation
  ● Type of schema: primitive, Map, Array, Struct
  ● Whether the value described by the schema is optional
  ● Optional metadata parameters
  ● Information about the structure:
    - For Struct schemas, the fields
    - For Map schemas, the key schema and value schema
    - For Array schemas, the element schema
  Field:
  ● Name of the field that this schema describes
  ● Schema for the field value
  ● Index of the field within the Struct
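  In code, schemas are usually built with SchemaBuilder. A small sketch with a hypothetical Order type:

    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.SchemaBuilder;
    import org.apache.kafka.connect.data.Struct;

    public class OrderSchema {
        // A named, versioned Struct schema with two required fields
        // and one optional field
        static final Schema ORDER_SCHEMA = SchemaBuilder.struct()
                .name("com.example.Order")
                .version(1)
                .field("id", Schema.INT64_SCHEMA)
                .field("product", Schema.STRING_SCHEMA)
                .field("note", Schema.OPTIONAL_STRING_SCHEMA)
                .build();

        static Struct exampleValue() {
            return new Struct(ORDER_SCHEMA)
                    .put("id", 42L)
                    .put("product", "widget");  // "note" is optional
        }
    }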
  • 37. SourceTask - Stopping

    void stop();

  ● Clean up resources that were allocated so far
  • 38. SinkTask

    abstract class SinkTask implements Task {
        protected SinkTaskContext context;
        void start(Map<String, String> props);
        void open(Collection<TopicPartition> partitions);
        void put(Collection<SinkRecord> records);
        Map<TopicPartition, OffsetAndMetadata> preCommit(
            Map<TopicPartition, OffsetAndMetadata> currentOffsets
        );
        void close(Collection<TopicPartition> partitions);
        void stop();
        ...
    }
  • 39. SinkTask - Starting

    void start(Map<String, String> props);

  ● This task’s configuration that was created by your connector
    - includes the topics and topics.regex properties
  ● Start the task and create any resources it might need
  • 40. SinkTask - Assigned topic partitions

    void open(Collection<TopicPartition> partitions);

  ● The topic partitions assigned to this task’s consumer
  ● Optionally pre-allocate writers and resources
  ● The consumer will by default start from its own committed offsets
  ● Or use context.offset(...) to set the desired starting point
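  A sketch of overriding the starting point in open(…); lookupNextOffsetInTarget is a hypothetical query against the external system:

    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.connect.sink.SinkTask;

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;

    public abstract class OffsetAwareSinkTask extends SinkTask {
        @Override
        public void open(Collection<TopicPartition> partitions) {
            Map<TopicPartition, Long> startOffsets = new HashMap<>();
            for (TopicPartition tp : partitions) {
                // Hypothetical lookup: ask the target where we left off
                Long next = lookupNextOffsetInTarget(tp);
                if (next != null) startOffsets.put(tp, next);
            }
            // Rewind the consumer to exactly those positions
            if (!startOffsets.isEmpty()) context.offset(startOffsets);
        }

        protected abstract Long lookupNextOffsetInTarget(TopicPartition tp);
    }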
  • 41. SinkTask - Process records

    void put(Collection<SinkRecord> records);

  ● Called frequently with the next batch of records returned from the consumer
  ● Size of the batch depends on:
    - availability of records in the assigned topic partitions, and
    - consumer settings in the worker, including:
      ○ consumer.fetch.min.bytes and consumer.fetch.max.bytes
      ○ consumer.max.poll.interval.ms and consumer.max.poll.records
  ● Either write records directly to the external system, or buffer them and write when there are “enough” to send
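  A buffering put(…) might look like this sketch; BATCH_SIZE and writeBatchToTarget are illustrative, not part of the API:

    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;
    import java.util.Map;

    public class BufferingSinkTask extends SinkTask {
        private static final int BATCH_SIZE = 500;
        private final List<SinkRecord> buffer = new ArrayList<>();

        @Override
        public void start(Map<String, String> props) { /* connect to target */ }

        @Override
        public void put(Collection<SinkRecord> records) {
            buffer.addAll(records);
            if (buffer.size() >= BATCH_SIZE) {
                writeBatchToTarget(buffer);  // hypothetical bulk write
                buffer.clear();
            }
        }

        private void writeBatchToTarget(List<SinkRecord> batch) {
            // issue one bulk request to the external system
        }

        @Override
        public void stop() { /* close connections */ }

        @Override
        public String version() { return "0.1.0"; }
    }

  A buffering task like this should also flush its buffer in preCommit(…) or flush(…) so that committed offsets never run ahead of what was actually written.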
  • 42. SinkRecord

    SinkRecord
        topic : String
        partition : Integer
        keySchema : Schema
        key : Object
        valueSchema : Schema
        value : Object
        timestamp : Long
        timestampType : enum
        offset : long
        headers : Headers

  ● Topic, partition, and offset from where the record was consumed
  ● Optional key and the schema that describes it
  ● Optional value and the schema that describes it
  ● Timestamp and whether it’s create/append/none
  ● Headers
    - ordered list of name-value pairs
    - utility to convert a value to the desired type, if possible
  ● Deserialized via Converters
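  Reading those fields from a record is straightforward; in this sketch the "product" field is hypothetical and the value is assumed to be a Struct:

    import org.apache.kafka.connect.data.Struct;
    import org.apache.kafka.connect.sink.SinkRecord;

    public class SinkRecordAccess {
        static String describe(SinkRecord record) {
            // Where the record came from in Kafka
            String origin = record.topic() + "-" + record.kafkaPartition()
                    + "@" + record.kafkaOffset();
            // With a Struct value, fields are read by name
            Struct value = (Struct) record.value();
            return origin + ": product=" + value.getString("product");
        }
    }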
  • 43. Sink Connectors and Exactly Once Semantics (EOS)
  ● Must track the offsets of records that were written to the external system
    - typically involves atomically storing offsets in the external system
    - for convenience, can optionally still have the consumer commit those offsets
  ● When a task restarts and is assigned topic partitions, it:
    - looks in the external system for the next offset for each topic partition
    - sets the exact offsets where to start consuming via the task context
  • 44. SinkTask - Committing offsets

    Map<TopicPartition, OffsetAndMetadata> preCommit(
        Map<TopicPartition, OffsetAndMetadata> currentOffsets);

  ● Called periodically based upon the worker’s offset.flush.interval.ms
    - defaults to 60 seconds
  ● Supplied with the current offsets for consumed records as of the last call to put(…)
  ● Return the offsets that the consumer should actually commit
    - should reflect what was actually written so far to the external system, for EOS
    - or return null if the consumer should not commit offsets
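  A sketch tying put(…) and preCommit(…) together; in a real EOS connector the flushed map would be derived from offsets stored atomically in the external system rather than kept only in memory:

    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;

    public class TrackingSinkTask extends SinkTask {
        // Offset of the next record to consume, per partition, updated
        // only after the external system acknowledges the write
        private final Map<TopicPartition, Long> flushed = new HashMap<>();

        @Override
        public void put(Collection<SinkRecord> records) {
            for (SinkRecord r : records) {
                writeToTarget(r);  // hypothetical synchronous write
                flushed.put(new TopicPartition(r.topic(), r.kafkaPartition()),
                        r.kafkaOffset() + 1);
            }
        }

        @Override
        public Map<TopicPartition, OffsetAndMetadata> preCommit(
                Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
            // Only let the consumer commit what the target has acknowledged
            Map<TopicPartition, OffsetAndMetadata> committable = new HashMap<>();
            for (Map.Entry<TopicPartition, Long> e : flushed.entrySet()) {
                committable.put(e.getKey(), new OffsetAndMetadata(e.getValue()));
            }
            return committable;
        }

        private void writeToTarget(SinkRecord r) { /* ... */ }

        @Override public void start(Map<String, String> props) { }
        @Override public void stop() { }
        @Override public String version() { return "0.1.0"; }
    }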
  • 45. SinkTask

    void close(Collection<TopicPartition> partitions);

  ● Close any writers / resources for these topic partitions
  • 48. Considerations - Configuration
  ● Configuration is the only way to control a connector
  ● Make it as simple as possible, but no simpler
  ● Use validators and recommenders in ConfigDef (see the sketch below)
  ● Strive for backward compatibility
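  A sketch of a ConfigDef using validators and the PASSWORD type. The config keys are hypothetical; recommenders, which suggest values in UIs, take a ConfigDef.Recommender and are omitted here for brevity:

    import org.apache.kafka.common.config.ConfigDef;

    public class MyConnectorConfig {
        static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define("connection.url", ConfigDef.Type.STRING,
                    ConfigDef.Importance.HIGH, "URL of the external system")
            .define("connection.password", ConfigDef.Type.PASSWORD,
                    ConfigDef.Importance.HIGH,
                    "Password; the PASSWORD type keeps it out of logs")
            .define("batch.size", ConfigDef.Type.INT, 500,
                    ConfigDef.Range.between(1, 10000),            // validator
                    ConfigDef.Importance.MEDIUM, "Records per bulk write")
            .define("mode", ConfigDef.Type.STRING, "append",
                    ConfigDef.ValidString.in("append", "upsert"), // validator
                    ConfigDef.Importance.MEDIUM, "Write mode");
    }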
  • 49. Considerations - Ordering
  ● Each Kafka topic partition is totally ordered
  ● Design connectors around this notion
  • 50. Considerations - Scaling
  ● Use tasks to parallelize work
    - each task runs in its own thread
    - you can start other threads, but always clean them up properly
  ● Sink connectors typically use the number of tasks specified by the user
  ● Source connectors - only one task (or just a few) may make sense
  • 51. Considerations - Version Compatibility
  ● Connectors are by definition multi-body problems:
    - the external system
    - the Connect runtime
    - the Kafka broker
  ● Specify which versions can be used together
  • 52. Considerations - Resilience
  ● What are the failure modes?
  ● Can you retry (forever, with exponential backoff) rather than fail?
    - consider RetriableException (see the sketch below)
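  A sketch of distinguishing transient from permanent failures in a sink task; using IOException as the transient signal is an illustrative assumption:

    import org.apache.kafka.connect.errors.ConnectException;
    import org.apache.kafka.connect.errors.RetriableException;
    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    import java.io.IOException;
    import java.util.Collection;
    import java.util.Map;

    public class ResilientSinkTask extends SinkTask {
        @Override
        public void put(Collection<SinkRecord> records) {
            try {
                writeToTarget(records);  // hypothetical write
            } catch (IOException e) {
                // Transient: Connect will redeliver this batch after a
                // retry backoff instead of killing the task
                throw new RetriableException("Target temporarily unavailable", e);
            } catch (IllegalArgumentException e) {
                // Permanent: fail fast so the problem is visible
                throw new ConnectException("Bad record", e);
            }
        }

        private void writeToTarget(Collection<SinkRecord> records) throws IOException { }

        @Override public void start(Map<String, String> props) { }
        @Override public void stop() { }
        @Override public String version() { return "0.1.0"; }
    }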
  • 53. Considerations - Logging
  ● Don’t overuse ERROR or WARN
  ● Use INFO so users know what’s happening under normal operation and when the connector is misbehaving
  ● Use DEBUG so users can diagnose why it’s behaving strangely
  ● Use TRACE for yourself, the connector developer
  • 54. Considerations - Security
  ● Use the PASSWORD configuration type in ConfigDef
  ● Communicate securely with the external system
  ● Don’t log records (except maybe at TRACE)
  • 55. Considerations - Naming things
  ● Don’t use org.apache.kafka.* packages
  ● These are reserved for use by the Apache Kafka project
  • 56. Considerations - Licensing
  ● Indicate your license, whether an open source license or proprietary
  • 57. FAQs
  • 58. Lifecycle Management
  ● How to make sure systems are ready before sending them records?
    ○ SinkTaskContext.pause() / SinkTaskContext.resume() - when the target is not ready
    ○ SinkTask.open() - called when a partition is assigned to a sink task
    ○ Connector.start() - e.g., create the target database
    ○ taskConfigs()
  • 59. Ordering
  ● Apache Kafka guarantees ordering within a partition, but the connector developer has to preserve it as well
  ● Parallel transactions might complete out of order on the target system
  • 60. DLQ
  ● There’s an open request for API access to the DLQ
  ● The DLQ is just a Kafka topic; it can be written to by a producer, but this violates Kafka Connect guidelines
  ● The DLQ is for records the connector cannot process, not for general errors
  • 61. Push vs Pull
  ● The Kafka Connect source / sink paradigm “reverses the polarity” of the Kafka producer / consumer paradigm on purpose:
    ○ Producers PUSH to Kafka; a source connector PULLs from the source (the framework PUSHes to Kafka)
    ○ Consumers PULL from Kafka; a sink connector PUSHes to the sink (the framework PULLs from Kafka)
  ● What if I want a target to PULL via a sink connector?
    ○ The connector PULLs
    ○ The sink task can call SinkTaskContext.pause(), check for liveness on the target, and then SinkTaskContext.resume() when ready (see the sketch below)
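  A rough sketch of that pause/resume pattern; targetIsReady() is a hypothetical liveness check:

    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.connect.errors.RetriableException;
    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    import java.util.Collection;
    import java.util.Map;

    public class PausingSinkTask extends SinkTask {
        private boolean paused = false;

        @Override
        public void put(Collection<SinkRecord> records) {
            TopicPartition[] assigned =
                    context.assignment().toArray(new TopicPartition[0]);
            if (!targetIsReady()) {  // hypothetical liveness check
                // Stop the consumer from fetching more until the target recovers;
                // the RetriableException makes Connect redeliver this batch later
                context.pause(assigned);
                paused = true;
                throw new RetriableException("Target not ready; pausing consumption");
            }
            if (paused) {
                context.resume(assigned);
                paused = false;
            }
            // ... write records ...
        }

        private boolean targetIsReady() { return true; }

        @Override public void start(Map<String, String> props) { }
        @Override public void stop() { }
        @Override public String version() { return "0.1.0"; }
    }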
  • 63. MongoDB
  ● Source and sink connector
  ● Source: next release of the MongoDB CDC connector from Debezium
  ● Sink: next release of HPG’s community connector
  ● https://www.mongodb.com/blog/post/getting-started-with-the-mongodb-connector-for-apache-kafka-and-mongodb-atlas
  ● https://github.com/mongodb/mongo-kafka
  • 70. Sign up, ask questions, or start the process
    confluent.io/Verified-Integrations-Program/
    Email vip@confluent.io
    Kafka Summit San Francisco: kafka-summit.org/ - code KS19Online25 for 25% off
  Q&A