While many companies are embracing Apache Kafka as their core event streaming platform, they may still have events they want to unlock in other systems. Kafka Connect provides a common API for developers to do just that, and the number of open-source connectors available is growing rapidly. The IBM MQ sink and source connectors allow you to flow messages between your Apache Kafka cluster and your IBM MQ queues. In this session I will share our lessons learned and top tips for building a Kafka Connect connector. I'll explain how a connector is structured, how the framework calls it, and some of the things to consider when providing configuration options. The more Kafka Connect connectors the community creates the better, as it will enable everyone to unlock the events in their existing systems.
Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley & Andrew Schofield, IBM United Kingdom), Kafka Summit NYC 2019
Slide showing the difference between Kafka and MQ: Kafka keeps a replayable stream history, while MQ focuses on reliable delivery of each message
SourceConnector – imports data from an external system into Kafka
SinkConnector – exports data from Kafka to an external system
Run a cluster of worker processes, starting them using the CLI.
When you start a worker you give it a group ID; connectors submitted to the cluster can run on any worker, and their tasks are spread across the workers -> parallelism
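As a sketch of that workflow: the distributed worker script ships with Apache Kafka, and connectors are then submitted to any worker over the Connect REST API (port 8083 by default). The connector name and config values below are hypothetical placeholders, not from the talk.

```shell
# Start a distributed worker; the properties file carries the group.id
# that identifies which Connect cluster this worker joins.
bin/connect-distributed.sh config/connect-distributed.properties

# Submit a connector to any worker's REST API; the cluster decides
# which workers actually run the connector and its tasks.
curl -X POST -H "Content-Type: application/json" \
  http://localhost:8083/connectors \
  -d '{"name": "my-source", "config": {
        "connector.class": "com.example.MySourceConnector",
        "tasks.max": "4"}}'
```

These commands require a running Kafka cluster and are shown for illustration only.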
# The converters control conversion of data between the internal Kafka Connect representation and the messages in Kafka.
# Choose one key converter and one value converter; the alternatives are shown commented out.
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
#key.converter=org.apache.kafka.connect.storage.StringConverter
#key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
#value.converter=org.apache.kafka.connect.storage.StringConverter
#value.converter=org.apache.kafka.connect.json.JsonConverter
You have the Connector class, which handles the configuration and decides how the work is split into tasks
The Task class, which does the actual processing of data into a format for Kafka
And then optional transformations, which can modify records in flight
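The Connector/Task split above can be illustrated with a simplified model. This is not the real org.apache.kafka.connect API; the class and method names are illustrative, but the shape mirrors Connector.taskConfigs(maxTasks), where the connector divides its work (here, a list of queues) into at most maxTasks task configurations.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the Connector/Task split (NOT the real
// org.apache.kafka.connect API): the Connector validates config and
// divides the work; each Task then moves the data for its share.
public class ConnectorModel {

    // Mirrors Connector.taskConfigs(maxTasks): split a list of queues
    // round-robin across at most maxTasks task configurations.
    static List<List<String>> taskConfigs(List<String> queues, int maxTasks) {
        int numTasks = Math.min(maxTasks, queues.size());
        List<List<String>> configs = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) configs.add(new ArrayList<>());
        for (int i = 0; i < queues.size(); i++) {
            configs.get(i % numTasks).add(queues.get(i));
        }
        return configs;
    }

    public static void main(String[] args) {
        // Ask for up to 2 tasks; the framework would start one Task per config.
        for (List<String> cfg : taskConfigs(List.of("Q1", "Q2", "Q3"), 2)) {
            System.out.println("task -> " + cfg);
        }
    }
}
```

The framework, not the connector, decides where each resulting task runs in the worker cluster.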
Start (Connector) – parse config; only called on a "clean" connector
Start (Task) – initialize and one-time setup
Poll – get new records from the third-party system, blocking if no data is available
Commit and CommitRecord – optional methods to keep track of offsets internally
CommitRecord – commits an individual SourceRecord when the callback from the producer client is received, or if a record is filtered out by a transformation
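The source-task lifecycle above can be sketched as a toy class. Again this is a simplified stand-in, not the real SourceTask API: start() runs once, the framework then calls poll() in a loop (blocking briefly when the external system is empty), and commitRecord() fires once a record is acknowledged.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Simplified sketch of the source-task lifecycle (not the real
// org.apache.kafka.connect.source.SourceTask API).
public class SourceLifecycle {
    final BlockingQueue<String> external = new LinkedBlockingQueue<>();
    int committed = 0;

    // One-time setup: here we just pre-load the fake external system.
    void start() { external.addAll(List.of("msg-1", "msg-2")); }

    // Block briefly when there is no data, as poll() should.
    String poll() throws InterruptedException {
        return external.poll(100, TimeUnit.MILLISECONDS);
    }

    // Track the offset once Kafka has acknowledged the record.
    void commitRecord(String record) { committed++; }

    public static void main(String[] args) throws InterruptedException {
        SourceLifecycle task = new SourceLifecycle();
        task.start();
        String rec;
        while ((rec = task.poll()) != null) {   // the framework's poll loop
            System.out.println("produced " + rec);
            task.commitRecord(rec);             // producer callback fired
        }
        System.out.println("committed " + task.committed);
    }
}
```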
Put – write records to the third-party system
Flush – optional method to prompt flushing of all records that have been 'put'
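The put/flush contract for a sink can be sketched the same way (again a simplified stand-in, not the real SinkTask API): put() may buffer records, and by the time flush() returns, everything buffered must have reached the external system, because Connect commits consumer offsets afterwards.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Simplified sketch of a sink task (not the real
// org.apache.kafka.connect.sink.SinkTask API).
public class SinkLifecycle {
    final List<String> buffer = new ArrayList<>();
    final List<String> delivered = new ArrayList<>();

    // put() is allowed to buffer rather than write immediately.
    void put(Collection<String> records) { buffer.addAll(records); }

    // Called before Connect commits offsets: nothing buffered may be lost.
    void flush() {
        delivered.addAll(buffer);   // "write" to the external system
        buffer.clear();
    }

    public static void main(String[] args) {
        SinkLifecycle task = new SinkLifecycle();
        task.put(List.of("a", "b"));
        task.put(List.of("c"));
        task.flush();
        System.out.println("delivered " + task.delivered);
    }
}
```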
Kafka Connect provides scalability and reliability when connecting systems
Look for an existing connector before writing your own
Writing your own has some subtleties
Your external system's API and Kafka Connect might not align