IBM Event Streams / Apache Kafka
© 2019 IBM Corporation

Lessons learned building a connector using Kafka Connect
Kate Stanley and Andrew Schofield
Kafka Summit NY 2019
"Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems"
IBM MQ

MESSAGE QUEUING: assured delivery
EVENT STREAMING: stream history
IBM MQ

[Diagram: four applications connected to IBM MQ through MQ clients]
[Diagram: Kafka client applications on one side and MQ client applications on the other, with Kafka Connect bridging Apache Kafka and IBM MQ]
Getting started with Kafka Connect
Getting started with Kafka Connect
$ ls libs
connect-api-2.1.1.jar
connect-basic-auth-extension-2.1.1.jar
connect-file-2.1.1.jar
connect-json-2.1.1.jar
connect-runtime-2.1.1.jar
connect-transforms-2.1.1.jar
$ ls bin
connect-distributed.sh
connect-standalone.sh
$ bin/connect-standalone.sh config/connect-standalone.properties connector1.properties [connector2.properties]

$ bin/connect-distributed.sh config/connect-distributed.properties --bootstrap.servers localhost:9092 --group.id connect
Running distributed mode

[Diagram: a cluster of Connect workers; each worker exposes the REST API]
Getting started with Kafka Connect

$ curl http://localhost:8083/connector-plugins
[
  {
    "class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "type": "sink",
    "version": "2.1.1"
  },
  {
    "class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "type": "source",
    "version": "2.1.1"
  }
]
Getting started with Kafka Connect
$ echo ‘{
"name":"kate-file-load",
"config":{"connector.class":"FileStreamSource",
"file":"config/server.properties",
"topic":"kafka-config-topic"}}’ |
curl -X POST -d @- http://localhost:8083/connectors
--header "content-Type:application/json"
$ curl http://localhost:8083/connectors
["kate-file-load"]
© 2019 IBM Corporation
Writing a connector
Key considerations – partitions and topics

file.txt:
1. Start
2. The beginning
3. The middle
4. Conclusion
5. Ending
6. Finish
[Diagram: the source connector reads file.txt and splits its lines across a topic with two partitions: partition 1 holds "Start", "The middle", "Ending"; partition 2 holds "The beginning", "Conclusion", "Finish"]
[Diagram: a sink connector consumes both partitions and writes file-copy.txt; all six lines arrive, but in partition order ("Start", "The middle", "Ending", "The beginning", "Conclusion", "Finish") rather than the original order, because ordering is only guaranteed within a partition]
Key considerations – Data formats

[Diagram: data moves between the external system format, the Kafka Connect internal format, and the Kafka record format]
Converters translate between the Kafka Connect internal format and the Kafka record format. Kafka ships with:

org.apache.kafka.connect.converters.ByteArrayConverter
org.apache.kafka.connect.storage.StringConverter
org.apache.kafka.connect.json.JsonConverter
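For illustration, here is a minimal sketch (my addition, assuming the Kafka Connect jars are on the classpath) of how a converter round-trips data between the Connect internal format and the bytes stored in a Kafka record:

import java.util.Collections;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.storage.StringConverter;

public class ConverterDemo {
    public static void main(String[] args) {
        StringConverter converter = new StringConverter();
        converter.configure(Collections.emptyMap(), false); // false = value converter

        // Connect internal format (schema + value) -> bytes for the Kafka record
        byte[] bytes = converter.fromConnectData("my-topic", Schema.STRING_SCHEMA, "hello");

        // Bytes from the Kafka record -> Connect internal format
        SchemaAndValue back = converter.toConnectData("my-topic", bytes);
        System.out.println(back.value()); // prints "hello"
    }
}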
Implementing the API
Anatomy of a connector

[Diagram: a connector class plus the connector tasks it creates]
[Diagram: the connector and its tasks are spread across the Connect workers in the cluster]
Lifecycle of a connector

Connector: initialize, then parse and validate config
Methods: version(), config(), validate(config), start(config)
Connector config

@Override
public ConfigDef config() {
    ConfigDef configDef = new ConfigDef();
    configDef.define("config_option", Type.STRING, Importance.HIGH, "Config option.");
    return configDef;
}

$ curl -X PUT -d '{"connector.class": "MyConnector"}' \
    --header "Content-Type: application/json" \
    http://localhost:8083/connector-plugins/MyConnector/config/validate
{"configs": [{
  "definition": {"name": "config_option", "importance": "HIGH", "default_value": null, …},
  "value": {
    "errors": ["Missing required configuration \"config_option\" which has no default value."],
    …
}
Lifecycle of a connector

Connector: initialize, parse and validate config, create tasks
Methods: version(), config(), validate(config), start(config), taskClass(), taskConfigs(max), stop()
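To make the create-tasks step concrete, here is a sketch of a hypothetical MySourceConnector (the method names are the real SourceConnector API; MySourceTask is the task sketched below):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;

public class MySourceConnector extends SourceConnector {
    private Map<String, String> props;

    @Override
    public void start(Map<String, String> props) {
        this.props = props; // parse and validate the config here
    }

    @Override
    public Class<? extends Task> taskClass() {
        return MySourceTask.class; // the Task implementation the workers run
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Divide the work across tasks; this trivial version gives every
        // task an identical copy of the connector config, so the runtime
        // starts up to maxTasks interchangeable tasks
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            configs.add(new HashMap<>(props));
        }
        return configs;
    }

    @Override
    public void stop() { } // release any resources held by the connector

    @Override
    public ConfigDef config() {
        return new ConfigDef(); // see the config() example above
    }

    @Override
    public String version() {
        return "0.0.1";
    }
}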
Lifecycle of a connector

Connector: initialize, parse and validate config, create tasks
Source Task: initialize, then running
Methods: version(), start(config), poll(), commit(), commitRecord(record), stop()
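And a matching minimal SourceTask skeleton (again a hypothetical sketch, not the IBM connector; poll() blocks briefly and returns a batch of SourceRecords, or null when there is nothing to send):

import java.util.Collections;
import java.util.List;
import java.util.Map;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class MySourceTask extends SourceTask {
    private volatile boolean running;

    @Override
    public void start(Map<String, String> props) {
        running = true; // one-time setup: connect to the external system
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        Thread.sleep(1000); // stand-in for blocking on the external system
        if (!running) {
            return null;
        }
        SourceRecord record = new SourceRecord(
                Collections.singletonMap("source", "demo"), // source partition
                Collections.singletonMap("position", 0L),   // source offset
                "my-topic", Schema.STRING_SCHEMA, "hello"); // placeholder value
        return Collections.singletonList(record);
    }

    @Override
    public void commitRecord(SourceRecord record) {
        // optional: called when the producer acks an individual record
    }

    @Override
    public void stop() {
        running = false; // called asynchronously; signal poll() to wind down
    }

    @Override
    public String version() {
        return "0.0.1";
    }
}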
Lifecycle of a connector

Connector: initialize, parse and validate config, create tasks
Sink Task: initialize, then running
Methods: version(), start(config), put(records), flush(offsets), stop()
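The sink side looks similar; a minimal hypothetical skeleton:

import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class MySinkTask extends SinkTask {
    @Override
    public void start(Map<String, String> props) {
        // one-time setup: connect to the external system
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        // write the batch to the external system (may buffer internally)
        for (SinkRecord record : records) {
            System.out.printf("%s-%d@%d: %s%n", record.topic(),
                    record.kafkaPartition(), record.kafkaOffset(), record.value());
        }
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
        // optional: push any buffered records before offsets are committed
    }

    @Override
    public void stop() { } // release resources

    @Override
    public String version() {
        return "0.0.1";
    }
}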
Kafka Connect and IBM MQ
It’s easy to connect IBM MQ to Apache Kafka
IBM has created a pair of connectors, available as source code or as part of IBM Event Streams
Source connector
From MQ queue to Kafka topic
https://github.com/ibm-messaging/kafka-connect-mq-source
Sink connector
From Kafka topic to MQ queue
https://github.com/ibm-messaging/kafka-connect-mq-sink
Fully supported by IBM for customers with support entitlement for IBM Event Streams
Connecting IBM MQ to Apache Kafka

The connectors are deployed into a Kafka Connect runtime, which runs between IBM MQ and Apache Kafka.

[Diagram: MQ client applications put messages to the TO.KAFKA.Q queue and get messages from the FROM.KAFKA.Q queue on IBM MQ; one Kafka Connect worker runs the MQ source connector, copying messages from TO.KAFKA.Q to the FROM.MQ.TOPIC topic, and another runs the MQ sink connector, copying records from the TO.MQ.TOPIC topic to FROM.KAFKA.Q]
Running Kafka Connect on a mainframe

IBM MQ Advanced for z/OS VUE provides support for the Kafka Connect workers to be deployed onto z/OS Unix System Services using bindings connections to MQ.

[Diagram: the same source and sink connector topology as before, but the Kafka Connect workers run in Unix System Services on z/OS and use bindings connections to IBM MQ Advanced for z/OS VUE]
Design of the MQ sink connector
MQ sink connector

[Diagram: the sink connector reads Kafka records (key plus byte[] value) from the TO.MQ.TOPIC topic; a Converter turns each record into a SinkRecord with a schema and a value that may be complex, and a MessageBuilder turns the SinkRecord into an MQ message (MQMD, optional MQRFH2, payload) put to the FROM.KAFKA.Q queue]
Sink task – Design

The sink connector is relatively simple: the interface is synchronous and fits MQ quite well. Balancing efficiency against resource limits is the key.

put(Collection<SinkRecord> records)
  Converts Kafka records to MQ messages and sends them in a transaction
  Always requests a flush to avoid hitting MQ transaction limits

flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets)
  Commits any pending sends

This batches messages into MQ without excessively large batches.
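A sketch of that put()/flush() pattern (my own illustration under assumptions, not the actual IBM connector code; context.requestCommit() is the real SinkTaskContext hook for asking the framework to flush soon, and the MQ helpers are hypothetical stubs):

import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class MQSinkTaskSketch extends SinkTask {
    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            // build an MQ message and send it inside the current MQ transaction
            sendUnderTransaction(buildMessage(record));
        }
        // ask the framework to call flush() soon, keeping the MQ transaction
        // well below the queue manager's uncommitted-message limit
        context.requestCommit();
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
        commitMqTransaction(); // commit all pending sends
    }

    @Override
    public void start(Map<String, String> props) { }

    @Override
    public void stop() { }

    @Override
    public String version() { return "0.0.1"; }

    // hypothetical MQ helpers, stubbed out for the sketch
    private Object buildMessage(SinkRecord record) { return record.value(); }
    private void sendUnderTransaction(Object message) { }
    private void commitMqTransaction() { }
}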
Design of the MQ source connector
MQ source connector

[Diagram: the source connector gets MQ messages (MQMD, optional MQRFH2, payload) from the TO.KAFKA.Q queue; a RecordBuilder turns each message into a SourceRecord with a schema and a value that may be complex, and a Converter serializes it to a Kafka record with a byte[] value and a null key, written to the Kafka topic]
Source task – Original design

The source connector is more complicated: it's multi-threaded and asynchronous, which fits MQ less naturally.

List<SourceRecord> poll()
  Waits for up to 30 seconds for MQ messages and returns them as a batch
  Multiple calls to poll() could contribute to a single MQ transaction

commit()
  Asynchronously commits the active MQ transaction

This works quite well, but commit() is too infrequent under load, which causes throttling. commit() does ensure that the most recent batch of polled messages has been acked by Kafka, but it doesn't quite feel like the right way to do it.
Source task – Revised design

Changed so that each call to poll() comprises a single MQ transaction; commit() is no longer used in normal operation.

List<SourceRecord> poll()
  Waits for the records from the previous poll() to be acked by Kafka
  Commits the active MQ transaction (the previous batch)
  Waits for up to 30 seconds for MQ messages and returns them as a new batch

commitRecord(SourceRecord record)
  Just counts up the acks for the records sent

MQ transactions are much shorter-lived, the task no longer throttles under load, and this feels a much better fit for the design of Kafka Connect.
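A sketch of the revised poll() shape (again my own illustration, not the IBM connector source; the MQ helpers are hypothetical stubs and the ack bookkeeping is a simple counter):

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class MQSourceTaskSketch extends SourceTask {
    private final AtomicInteger unackedRecords = new AtomicInteger(0);

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        // 1. Wait until Kafka has acked every record from the previous batch
        while (unackedRecords.get() > 0) {
            Thread.sleep(10);
        }

        // 2. Commit the active MQ transaction: the previous batch is now
        //    safely in Kafka, so its messages can leave the queue
        commitMqTransaction();

        // 3. Wait up to 30 seconds for new MQ messages under a fresh
        //    transaction and return them as the next batch
        List<SourceRecord> batch = receiveBatch(30_000);
        unackedRecords.set(batch.size());
        return batch;
    }

    @Override
    public void commitRecord(SourceRecord record) {
        // just count the acks; poll() decides when to commit from the count
        unackedRecords.decrementAndGet();
    }

    @Override
    public void start(Map<String, String> props) { }

    @Override
    public void stop() { }

    @Override
    public String version() { return "0.0.1"; }

    // hypothetical MQ helpers, stubbed out for the sketch
    private void commitMqTransaction() { }
    private List<SourceRecord> receiveBatch(long timeoutMs) {
        return Collections.emptyList();
    }
}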
Stopping a source task is tricky

stop() is called on the SourceTask to indicate that the task should stop, but it runs asynchronously with respect to the polling and commit threads, so you can't be sure whether poll() or commit() is currently active or about to be called. Since poll() and commit() may both want access to the MQ connection, it's not clear when it's safe to close it.

KIP-419: Safely notify Kafka Connect SourceTask is stopped
  Adds a stopped() method to SourceTask that is guaranteed to be the final call to the task

[Diagram: task states uninitialized, initialized, running, stopping, driven by initialize(), start(), stop(), and finally stopped(); poll(), commit(), and commitRecord() can still be in flight while stopping]
Summary

[Diagram: a connector and its tasks spread across the Connect workers]
Summary
Over 80 connectors
IBM MQ
HDFS
Elasticsearch
MySQL
JDBC
MQTT
CoAP
+ many others
Summary

Connector: initialize, parse and validate config, create tasks
Sink Task: initialize, then running
Source Task: initialize, then running
Thank you
IBM Event Streams: ibm.com/cloud/event-streams
Kate Stanley @katestanley91
Andrew Schofield https://medium.com/@andrew_schofield
Kafka Connect: https://kafka.apache.org/documentation/#connect
https://github.com/ibm-messaging/kafka-connect-mq-source
https://github.com/ibm-messaging/kafka-connect-mq-sink


Editor's Notes

  • #5 Slide to show difference between Kafka and MQ: stream history vs reliable delivery
  • #8 SourceConnector – import from another system; SinkConnector – export to another system
  • #12 Run a cluster of worker processes and start them using the CLI. When you start a worker, give it a group ID; connectors will run on any worker and their tasks can be placed on any worker -> parallelism
  • #18 The converters control conversion of data between the internal Kafka Connect representation and the messages in Kafka: key.converter and value.converter can each be set to org.apache.kafka.connect.converters.ByteArrayConverter, org.apache.kafka.connect.storage.StringConverter, or org.apache.kafka.connect.json.JsonConverter
  • #26 You have the connector class, which is used to connect to Kafka; the task class, which does the processing of data into a format for Kafka; and then optional transformations
  • #28 Start - parse config - only called on a "clean" Connector
  • #31 Start – initialize and one-time setup; Poll – get new records from the third-party system, block if no data; Commit and CommitRecord – optional methods to keep track of offsets internally; CommitRecord – commit an individual SourceRecord when the callback from the producer client is received, or if a record is filtered by a transformation
  • #32 Put – write records to the third-party system; Flush – optional method to prompt flushing of all records that have been 'put'
  • #45 Provides scalability and reliability when connecting systems
  • #46 Look out for existing connectors
  • #47 Writing your own has some subtleties
  • #48 Your external system’s API and Kafka Connect might not align