"Kafka Connect and Apache Flink are popular frameworks for data integration and processing. Both provide external connectivity through a connector SDK. Kafka Connect mainly focuses on moving data in and out of Kafka, whereas connectors in Flink primarily move data from and to external sources and sinks for direct use in processing. While Kafka Connect has a robust, widely covered connector ecosystem and a popular SDK, connector interfaces in Flink are relatively new and still evolving.
How different are the two connector ecosystems? Should you write a Kafka Connector or a Flink Connector?
This talk covers the main connector interfaces in Flink. We will give a high-level overview of two main FLIPs and briefly touch on some high-level Flink components that enable connectors to work. We will compare and contrast these with the connector interfaces in Kafka, touching on state management, validation models, and more, and then briefly dive into some practical use cases enabled by connectors across the two frameworks.
Attendees will learn how to develop a simple source connector in Flink through a sample Datadog connector.
Existing Kafka Connector developers will be able to appreciate the key differences and features of the Flink connector SDK.
This session will also help attendees understand some key considerations involved in writing connectors across these two frameworks, along with some best practices."
18. Summary: Kafka Connect vs. Flink Connector
State management
  Kafka Connect: The framework handles source offset management. The connector can’t override this behaviour.
  Flink Connector: Handled automatically with default behaviour, but the connector can override this.
Serialization
  Kafka Connect: Connectors send data to converters in workers.
  Flink Connector: Connectors have to serialize and deserialize.
Internal data representation
  Kafka Connect: Custom objects: SourceRecord and SinkRecord.
  Flink Connector: User defined.
Delivery semantics (best possible)
  Kafka Connect: Source: exactly once. Sink: at least once.
  Flink Connector: Source: exactly once. Sink: exactly once.
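The difference between the two sink guarantees in the summary can be sketched with a toy simulation. This is only an illustration under stated assumptions: the class and function names (AtLeastOnceSink, TransactionalSink, run) are hypothetical and are not part of the Kafka Connect or Flink APIs. It assumes the source rewinds to the offset stored with the last checkpoint after a failure, and that an exactly-once sink stages its writes and publishes them only when a checkpoint completes.

```python
"""Toy model of sink delivery semantics. Not Kafka Connect or Flink API."""


class AtLeastOnceSink:
    """Hypothetical sink that writes every record straight to the target."""

    def __init__(self):
        self.external = []  # what the external system has durably stored

    def write(self, record):
        self.external.append(record)

    def on_checkpoint(self):
        pass  # nothing to commit: writes are already visible

    def on_restart(self):
        pass  # nothing can be rolled back


class TransactionalSink:
    """Hypothetical sink that publishes staged records only on checkpoint."""

    def __init__(self):
        self.external = []
        self.staged = []  # written but not yet committed

    def write(self, record):
        self.staged.append(record)

    def on_checkpoint(self):
        self.external.extend(self.staged)
        self.staged.clear()

    def on_restart(self):
        self.staged.clear()  # uncommitted writes are discarded


def run(sink, records, checkpoint_every, fail_at):
    """Push records through sink, failing once just before offset fail_at."""
    committed_offset = 0  # source position saved with the last checkpoint
    offset = 0
    failed = False
    while offset < len(records):
        if offset == fail_at and not failed:
            failed = True
            sink.on_restart()
            offset = committed_offset  # source rewinds to last checkpoint
            continue
        sink.write(records[offset])
        offset += 1
        if offset % checkpoint_every == 0:
            sink.on_checkpoint()
            committed_offset = offset
    sink.on_checkpoint()  # final checkpoint at end of input
    return sink.external


records = ["a", "b", "c", "d", "e"]
print(run(AtLeastOnceSink(), records, checkpoint_every=2, fail_at=3))
# ['a', 'b', 'c', 'c', 'd', 'e']
print(run(TransactionalSink(), records, checkpoint_every=2, fail_at=3))
# ['a', 'b', 'c', 'd', 'e']
```

In this model the replayed record "c" reaches the external system twice through the at-least-once sink, while the staged sink drops its uncommitted write on restart and publishes each record once. Real connectors need the external system to support transactions or idempotent writes to get the second behaviour.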
20. Use cases of Connectors and Kafka
Real-time database replication
Integrating multiple systems with Kafka
Capture CDC events from your source system
Database lookups
Real-time analytics
Enrichment
Joins
Real-time materialised views