IBM Event Streams / Apache Kafka
© 2019 IBM Corporation
Lessons learned building a connector using
Kafka Connect
Kate Stanley and Andrew Schofield
Kafka Summit NY 2019
“Kafka Connect is a tool for scalably and reliably
streaming data between Apache Kafka and
other systems”
IBM MQ
MESSAGE QUEUING: Assured delivery
EVENT STREAMING: Stream history
IBM MQ
[Diagram: four client apps connected to IBM MQ via MQ clients]
[Diagram: Kafka Connect bridges IBM MQ and Apache Kafka; MQ client apps connect to IBM MQ, while Kafka client apps consume from Kafka]
Getting started with Kafka Connect
Getting started with Kafka Connect
$ ls libs
connect-api-2.1.1.jar
connect-basic-auth-extension-2.1.1.jar
connect-file-2.1.1.jar
connect-json-2.1.1.jar
connect-runtime-2.1.1.jar
connect-transforms-2.1.1.jar
$ ls bin
connect-distributed.sh
connect-standalone.sh
Getting started with Kafka Connect
$ bin/connect-standalone.sh config/connect-standalone.properties
connector1.properties [connector2.properties]
$ bin/connect-distributed.sh config/connect-distributed.properties --
bootstrap.servers localhost:9092 --group.id connect
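The standalone worker reads its settings from a properties file. A minimal sketch, based on the stock `config/connect-standalone.properties` shipped with Kafka (adjust the bootstrap servers and offset file path for your environment):

```properties
# Kafka cluster the worker connects to
bootstrap.servers=localhost:9092

# Converters for record keys and values
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter

# Standalone mode stores source offsets in a local file
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
```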
Running distributed mode
Running distributed mode
[Diagram: three Connect workers forming a cluster]
Running distributed mode
[Diagram: three Connect workers, each exposing the REST API]
Getting started with Kafka Connect
$ curl http://localhost:8083/connector-plugins
[
{
"class":"org.apache.kafka.connect.file.FileStreamSinkConnector",
"type":"sink",
"version":"2.1.1"
},
{
"class":"org.apache.kafka.connect.file.FileStreamSourceConnector",
"type":"source",
"version":"2.1.1"
}
]
Getting started with Kafka Connect
$ echo '{
"name":"kate-file-load",
"config":{"connector.class":"FileStreamSource",
"file":"config/server.properties",
"topic":"kafka-config-topic"}}' |
curl -X POST -d @- http://localhost:8083/connectors
--header "Content-Type:application/json"
$ curl http://localhost:8083/connectors
["kate-file-load"]
Writing a connector
Key considerations – partitions and topics
file.txt
1. Start
2. The beginning
3. The middle
4. Conclusion
5. Ending
6. Finish
[Diagram: a source connector reads file.txt and publishes to a topic with two partitions: lines 1, 3 and 5 land on partition 1, and lines 2, 4 and 6 on partition 2]
[Diagram: a sink connector consumes both partitions and writes file-copy.txt in partition order (1, 3, 5, 2, 4, 6), so the copy does not preserve the original line order]
Key considerations – Data formats
[Diagram: data flows between the external system format, the Kafka Connect internal format, and the Kafka record format]
org.apache.kafka.connect.converters.ByteArrayConverter
org.apache.kafka.connect.storage.StringConverter
org.apache.kafka.connect.json.JsonConverter
Implementing the API
Anatomy of a connector
[Diagram: one connector class manages multiple connector tasks]
[Diagram: the connector and its tasks are distributed across three Connect workers]
Lifecycle of a connector
[Diagram: a Connector moves from initialize to parsing and validating config; the worker calls version(), config(), validate(config), then start(config)]
Connector config
@Override
public ConfigDef config() {
    ConfigDef configDef = new ConfigDef();
    configDef.define("config_option", Type.STRING, Importance.HIGH, "Config option.");
    return configDef;
}
$ curl -X PUT -d '{"connector.class":"MyConnector"}'
http://localhost:8083/connector-plugins/MyConnector/config/validate
{"configs": [{
"definition": {"name": "config_option", "importance": "HIGH", "default_value": null, …},
"value": {
"errors": ["Missing required configuration \"config_option\" which has no default value."],
…
}
Lifecycle of a connector
[Diagram: after parsing and validating config, the connector creates tasks via taskClass() and taskConfigs(max); stop() ends the lifecycle]
Lifecycle of a connector
[Diagram: a Source Task moves from initialize to running; the worker calls version() and start(config), then repeatedly poll(), commit() and commitRecord(record), and finally stop()]
Lifecycle of a connector
[Diagram: a Sink Task moves from initialize to running; the worker calls version() and start(config), then repeatedly put(records) and flush(offsets), and finally stop()]
Kafka Connect and IBM MQ
It’s easy to connect IBM MQ to Apache Kafka
IBM has created a pair of connectors, available as source code or as part of IBM Event Streams
Source connector
From MQ queue to Kafka topic
https://github.com/ibm-messaging/kafka-connect-mq-source
Sink connector
From Kafka topic to MQ queue
https://github.com/ibm-messaging/kafka-connect-mq-sink
Fully supported by IBM for customers with support entitlement for IBM Event Streams
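Submitting the source connector through the REST API looks roughly like the sketch below. The property names follow the connector's README at the time of writing (check the GitHub repository for the current set); the queue manager, channel and connection details are placeholders:

```json
{
  "name": "mq-source",
  "config": {
    "connector.class": "com.ibm.eventstreams.connect.mqsource.MQSourceConnector",
    "mq.queue.manager": "QM1",
    "mq.connection.name.list": "localhost(1414)",
    "mq.channel.name": "DEV.APP.SVRCONN",
    "mq.queue": "TO.KAFKA.Q",
    "mq.record.builder": "com.ibm.eventstreams.connect.mqsource.builders.DefaultRecordBuilder",
    "topic": "from-mq"
  }
}
```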
Connecting IBM MQ to Apache Kafka
The connectors are deployed into a Kafka Connect runtime
This runs between IBM MQ and Apache Kafka
[Diagram: MQ client apps put to TO.KAFKA.Q and get from FROM.KAFKA.Q; the MQ source connector copies TO.KAFKA.Q to the FROM.MQ.TOPIC Kafka topic, and the MQ sink connector copies the TO.MQ.TOPIC Kafka topic to FROM.KAFKA.Q, each running in a Kafka Connect worker]
Running Kafka Connect on a mainframe
IBM MQ Advanced for z/OS VUE provides support for the Kafka Connect workers to be deployed onto z/OS Unix System Services using bindings connections to MQ
[Diagram: the source and sink connectors run in Kafka Connect workers under Unix System Services, attached to IBM MQ Advanced for z/OS VUE over bindings connections]
Design of the MQ sink connector
MQ sink connector
[Diagram: a Kafka record's key and value (byte[]) from the TO.MQ.TOPIC are deserialized by the Converter into a SinkRecord with a schema and a possibly complex value; the MessageBuilder then builds an MQ message (payload, MQMD, optional MQRFH2), which the sink connector puts to FROM.KAFKA.Q]
Sink task – Design
Sink connector is relatively simple
The interface is synchronous and fits MQ quite well
Balancing efficiency with resource limits is the key
put(Collection<SinkRecord> records)
Converts Kafka records to MQ messages and sends in a transaction
Always requests a flush to avoid hitting MQ transaction limits
flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets)
Commits any pending sends
This batches messages into MQ without excessively large batches
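A sink connector instance is configured like any other connector. A sketch of the JSON submitted to the REST API (property names follow the connector's README at the time of writing; connection details are placeholders):

```json
{
  "name": "mq-sink",
  "config": {
    "connector.class": "com.ibm.eventstreams.connect.mqsink.MQSinkConnector",
    "topics": "to-mq",
    "mq.queue.manager": "QM1",
    "mq.connection.name.list": "localhost(1414)",
    "mq.channel.name": "DEV.APP.SVRCONN",
    "mq.queue": "FROM.KAFKA.Q",
    "mq.message.builder": "com.ibm.eventstreams.connect.mqsink.builders.DefaultMessageBuilder"
  }
}
```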
Design of the MQ source connector
MQ source connector
[Diagram: the source connector gets MQ messages from TO.KAFKA.Q; the RecordBuilder turns each message (payload, MQMD, optional MQRFH2) into a SourceRecord with a schema and a possibly complex value, and the Converter serializes it into a Kafka record (null key, byte[] value) for the target Kafka topic]
Source task – Original design
Source connector is more complicated
It’s multi-threaded and asynchronous which fits MQ less naturally
List<SourceRecord> poll()
Waits for up to 30 seconds for MQ messages, which are returned as a batch
Multiple calls to poll() could contribute to an MQ transaction
commit()
Asynchronously commits the active MQ transaction
Works quite well, but commit() is too infrequent under load, which causes throttling
commit() does ensure that the most recent batch of messages polled has been acked by
Kafka, but it doesn’t quite feel like the right way to do it
Source task – Revised design
Changed so each call to poll() comprises a single MQ transaction
commit() is no longer used in normal operation
List<SourceRecord> poll()
Waits for records from the previous poll() to be acked by Kafka
Commits the active MQ transaction – the previous batch
Waits for up to 30 seconds for MQ messages, which are returned as a new batch
commitRecord(SourceRecord record)
Just counts up the acks for the records sent
MQ transactions are much shorter-lived
No longer throttles under load
Feels a much better fit for the design of Kafka Connect
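The revised pattern can be sketched as follows. This is an illustrative stand-alone model, not the connector's actual code: the MQ connection is faked with an in-memory list, and waiting for Kafka acks is reduced to a counter check.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the revised source-task pattern: each poll()
// first commits the MQ transaction covering the previous batch (once all
// of its records have been acked via commitRecord), then reads a new batch.
class RevisedPollSketch {
    private final List<String> mqQueue = new ArrayList<>(); // stands in for MQ
    private int acks = 0;          // acks counted by commitRecord()
    private int lastBatchSize = 0; // records handed out by the previous poll()
    int commits = 0;               // completed "MQ transactions"

    void mqPut(String msg) { mqQueue.add(msg); }

    // Stand-in for SourceTask.commitRecord(record): just count the acks.
    void commitRecord() { acks++; }

    // Stand-in for SourceTask.poll(): exactly one MQ transaction per call.
    List<String> poll() {
        // 1. The previous batch must be fully acked by Kafka.
        //    (A real task would block here; this sketch just checks.)
        if (acks < lastBatchSize) {
            throw new IllegalStateException("previous batch not yet acked");
        }
        // 2. Commit the MQ transaction covering the previous batch.
        if (lastBatchSize > 0) {
            commits++;
            acks = 0;
        }
        // 3. Get a new batch of messages under a fresh MQ transaction.
        List<String> batch = new ArrayList<>(mqQueue);
        mqQueue.clear();
        lastBatchSize = batch.size();
        return batch;
    }
}
```

Because the commit happens at the start of the next poll(), each MQ transaction lives only as long as one batch, which is what removes the throttling seen in the original design.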
Stopping a source task is tricky
stop() is called on SourceTask to indicate the task should stop
It runs asynchronously with respect to the polling and commit threads
Can’t be sure whether poll() or commit() are currently active or will be called very soon
Since poll() and commit() may both want access to the MQ connection
It’s not clear when it’s safe to close it
KIP-419: Safely notify Kafka Connect SourceTask is stopped
Adds a stopped() method to SourceTask that is guaranteed to be the final call to the task
[Diagram: task states uninitialized → initialized → running → stopping, driven by initialize(), start(), stop() and stopped(); poll(), commit() and commitRecord() may run during both running and stopping]
Summary
[Diagram: a connector and its tasks distributed across Connect workers]
Summary
Over 80 connectors
IBM MQ
HDFS
Elasticsearch
MySQL
JDBC
MQTT
CoAP
+ many others
Summary
[Diagram: connector lifecycle recap: initialize, parse and validate config, create tasks; source and sink tasks initialize and run]
Thank you
IBM Event Streams: ibm.com/cloud/event-streams
Kate Stanley @katestanley91
Andrew Schofield https://medium.com/@andrew_schofield
Kafka Connect: https://kafka.apache.org/documentation/#connect
https://github.com/ibm-messaging/kafka-connect-mq-source
https://github.com/ibm-messaging/kafka-connect-mq-sink

More Related Content

What's hot

What's hot (20)

Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overview
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources
 
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CISecure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
 
Flink on Kubernetes operator
Flink on Kubernetes operatorFlink on Kubernetes operator
Flink on Kubernetes operator
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registry
 
IBM MQ vs Apache ActiveMQ
IBM MQ vs Apache ActiveMQIBM MQ vs Apache ActiveMQ
IBM MQ vs Apache ActiveMQ
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connect
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
kafka
kafkakafka
kafka
 

Similar to Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley & Andrew Schofield, IBM United Kingdom) Kafka Summit NYC 2019

Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMwareEvent Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
HostedbyConfluent
 
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaKafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Guido Schmutz
 

Similar to Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley & Andrew Schofield, IBM United Kingdom) Kafka Summit NYC 2019 (20)

Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley &...
Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley &...Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley &...
Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley &...
 
Technology choices for Apache Kafka and Change Data Capture
Technology choices for Apache Kafka and Change Data CaptureTechnology choices for Apache Kafka and Change Data Capture
Technology choices for Apache Kafka and Change Data Capture
 
Connecting mq&amp;kafka
Connecting mq&amp;kafkaConnecting mq&amp;kafka
Connecting mq&amp;kafka
 
Kafka with IBM Event Streams - Technical Presentation
Kafka with IBM Event Streams - Technical PresentationKafka with IBM Event Streams - Technical Presentation
Kafka with IBM Event Streams - Technical Presentation
 
Virtual Meetup Sweden - Reacting to an event driven world
Virtual Meetup Sweden - Reacting to an event driven worldVirtual Meetup Sweden - Reacting to an event driven world
Virtual Meetup Sweden - Reacting to an event driven world
 
Fast Kafka Apps! (Edoardo Comar and Mickael Maison, IBM) Kafka Summit London ...
Fast Kafka Apps! (Edoardo Comar and Mickael Maison, IBM) Kafka Summit London ...Fast Kafka Apps! (Edoardo Comar and Mickael Maison, IBM) Kafka Summit London ...
Fast Kafka Apps! (Edoardo Comar and Mickael Maison, IBM) Kafka Summit London ...
 
JSpring Virtual 2020 - Reacting to an event-driven world
JSpring Virtual 2020 - Reacting to an event-driven worldJSpring Virtual 2020 - Reacting to an event-driven world
JSpring Virtual 2020 - Reacting to an event-driven world
 
DevNexus - Reacting to an event driven world
DevNexus - Reacting to an event driven worldDevNexus - Reacting to an event driven world
DevNexus - Reacting to an event driven world
 
Running Kafka in Kubernetes: A Practical Guide (Katherine Stanley, IBM United...
Running Kafka in Kubernetes: A Practical Guide (Katherine Stanley, IBM United...Running Kafka in Kubernetes: A Practical Guide (Katherine Stanley, IBM United...
Running Kafka in Kubernetes: A Practical Guide (Katherine Stanley, IBM United...
 
Jfokus - Reacting to an event-driven world
Jfokus - Reacting to an event-driven worldJfokus - Reacting to an event-driven world
Jfokus - Reacting to an event-driven world
 
What's new in MQ 9.1.* on z/OS
What's new in MQ 9.1.* on z/OSWhat's new in MQ 9.1.* on z/OS
What's new in MQ 9.1.* on z/OS
 
Kubernetes Apache Kafka
Kubernetes Apache KafkaKubernetes Apache Kafka
Kubernetes Apache Kafka
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Ver...
Towards Flink 2.0:  Unified Batch & Stream Processing - Aljoscha Krettek, Ver...Towards Flink 2.0:  Unified Batch & Stream Processing - Aljoscha Krettek, Ver...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Ver...
 
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQCloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
 
Spring Cloud Stream with Kafka
Spring Cloud Stream with KafkaSpring Cloud Stream with Kafka
Spring Cloud Stream with Kafka
 
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMwareEvent Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
 
App modernization on AWS with Apache Kafka and Confluent Cloud
App modernization on AWS with Apache Kafka and Confluent CloudApp modernization on AWS with Apache Kafka and Confluent Cloud
App modernization on AWS with Apache Kafka and Confluent Cloud
 
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaKafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
 
How kafka is transforming hadoop, spark & storm
How kafka is transforming hadoop, spark & stormHow kafka is transforming hadoop, spark & storm
How kafka is transforming hadoop, spark & storm
 

More from confluent

More from confluent (20)

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 

Recently uploaded

Recently uploaded (20)

WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 

Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley & Andrew Schofield, IBM United Kingdom) Kafka Summit NYC 2019

  • 1. IBM Event StreamsApache Kafka © 2019 IBM Corporation Lessons learned building a connector using Kafka Connect Kate Stanley and Andrew Schofield Kafka Summit NY 2019
  • 2. © 2019 IBM Corporation “Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems”
  • 3. © 2019 IBM Corporation IBM MQ
  • 4. © 2019 IBM Corporation MESSAGE QUEUING EVENT STREAMING Assured delivery Stream history
  • 5. © 2019 IBM Corporation IBM MQ MQ CLIENT APP MQ CLIENT APP MQ CLIENT APP MQ CLIENT APP
  • 6. © 2019 IBM Corporation KAFKA CLIENT APP KAFKA CLIENT APP KAFKA CONNECT IBM MQ MQ CLIENT APP MQ CLIENT APP MQ CLIENT APP MQ CLIENT APP
  • 7. © 2019 IBM Corporation
  • 8. © 2019 IBM Corporation Getting started with Kafka Connect
  • 9. © 2019 IBM Corporation Getting started with Kafka Connect $ ls libs connect-api-2.1.1.jar connect-basic-auth-extension-2.1.1.jar connect-file-2.1.1.jar connect-json-2.1.1.jar connect-runtime-2.1.1.jar connect-transforms-2.1.1.jar $ ls bin connect-distributed.sh connect-standalone.sh
  • 10. © 2019 IBM Corporation Getting started with Kafka Connect $ ls libs connect-api-2.1.1.jar connect-basic-auth-extension-2.1.1.jar connect-file-2.1.1.jar connect-json-2.1.1.jar connect-runtime-2.1.1.jar connect-transforms-2.1.1.jar $ ls bin connect-distributed.sh connect-standalone.sh $ bin/connect-standalone.sh config/connect-standalone.properties connector1.properties [connector2.properties] $ bin/connect-distributed.sh config/connect-distributed.properties -- bootstrap.servers localhost:9092 --group.id connect
  • 11. © 2019 IBM Corporation Running distributed mode
  • 12. © 2019 IBM Corporation CONNECT WORKER CONNECT WORKER CONNECT WORKER Running distributed mode
  • 13. © 2019 IBM Corporation CONNECT WORKER CONNECT WORKER CONNECT WORKER API API API Running distributed mode
  • 14. © 2019 IBM Corporation Getting started with Kafka Connect $ curl http://localhost:8083/connector-plugins [ { "class":"org.apache.kafka.connect.file.FileStreamSinkConnector", "type":"sink", "version":"2.1.1” }, { "class":"org.apache.kafka.connect.file.FileStreamSourceConnector", "type":"source", "version":"2.1.1” } ]
  • 15. © 2019 IBM Corporation Getting started with Kafka Connect $ echo ‘{ "name":"kate-file-load", "config":{"connector.class":"FileStreamSource", "file":"config/server.properties", "topic":"kafka-config-topic"}}’ | curl -X POST -d @- http://localhost:8083/connectors --header "content-Type:application/json" $ curl http://localhost:8083/connectors ["kate-file-load"]
  • 16. © 2019 IBM Corporation Writing a connector
  • 17. © 2019 IBM Corporation Key considerations – partitions and topics
  • 18. © 2019 IBM Corporation file.txt 1. Start 2. The beginning 3. The middle 4. Conclusion 5. Ending 6. Finish Key considerations – partitions and topics
  • 19. © 2019 IBM Corporation file.txt 1. Start 2. The beginning 3. The middle 4. Conclusion 5. Ending 6. Finish 1. Start 3. The middle 5. Ending 2. The beginning 4. Conclusion 6. Finish SOURCE CONNECTOR Key considerations – partitions and topics Topic Partition 1 Partition 2
  • 20. © 2019 IBM Corporation Key considerations – partitions and topics file-copy.txt Partition 1 file.txt 1. Start 2. The beginning 3. The middle 4. Conclusion 5. Ending 6. Finish 1. Start 3. The middle 5. Ending 2. The beginning 4. Conclusion 6. Finish Partition 2 SOURCE CONNECTOR SINK CONNECTOR 1. Start 3. The middle 5. Ending 2. The beginning 4. Conclusion 6. Finish Topic
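The split in the diagrams above, with alternate lines landing on alternate partitions, can be sketched in plain Java; the `linesToPartitions` helper is illustrative and not part of the Connect API:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {
    // Assign each line of the source file to a partition round-robin,
    // mirroring the two-partition split in the diagram.
    static List<List<String>> linesToPartitions(List<String> lines, int partitions) {
        List<List<String>> result = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < lines.size(); i++) {
            result.get(i % partitions).add(lines.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> file = List.of("Start", "The beginning", "The middle",
                                    "Conclusion", "Ending", "Finish");
        List<List<String>> topic = linesToPartitions(file, 2);
        System.out.println(topic.get(0)); // [Start, The middle, Ending]
        System.out.println(topic.get(1)); // [The beginning, Conclusion, Finish]
    }
}
```

Ordering is preserved within each partition but not across partitions, which is why the sink side reassembles the file in a different order.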
  • 21. © 2019 IBM Corporation Key considerations – Data formats
  • 22. © 2019 IBM Corporation EXTERNAL SYSTEM FORMAT KAFKA RECORD FORMAT KAFKA CONNECT INTERNAL FORMAT Key considerations – Data formats
  • 23. © 2019 IBM Corporation Key considerations – Data formats EXTERNAL SYSTEM FORMAT KAFKA RECORD FORMAT KAFKA CONNECT INTERNAL FORMAT org.apache.kafka.connect.converters.ByteArrayConverter org.apache.kafka.connect.storage.StringConverter org.apache.kafka.connect.json.JsonConverter
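The converter is chosen in the worker properties file; the three built-in converters above are selected like this:

```properties
# Converters control conversion between the internal Connect
# representation and the bytes stored in Kafka
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# JsonConverter can optionally embed the schema in each record
value.converter.schemas.enable=false
```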
  • 24. © 2019 IBM Corporation Implementing the API
  • 25. © 2019 IBM Corporation Anatomy of a connector CONNECTOR TASK CONNECTOR CLASS CONNECTOR TASK CONNECTOR TASK
  • 26. © 2019 IBM Corporation Anatomy of a connector CONNECTOR TASK CONNECTOR TASK CONNECTOR TASK CONNECTOR TASK CONNECTOR Connect worker Connect worker Connect worker
  • 27. © 2019 IBM Corporation version() config() validate(config) start(config) Connector initialize parse and validate config Lifecycle of a connector
  • 28. © 2019 IBM Corporation Connector config @Override public ConfigDef config() { ConfigDef configDef = new ConfigDef(); configDef.define("config_option", Type.STRING, Importance.HIGH, "Config option."); return configDef; } $ curl -X PUT -d '{"connector.class":"MyConnector"}' http://localhost:8083/connector-plugins/MyConnector/config/validate {"configs": [{ "definition": {"name": "config_option", "importance": "HIGH", "default_value": null, …}, "value": { "errors": ["Missing required configuration \"config_option\" which has no default value."], … }
  • 29. © 2019 IBM Corporation version() config() validate(config) start(config) taskClass() taskConfigs(max) Connector initialize parse and validate config create tasks Lifecycle of a connector stop()
  • 30. © 2019 IBM Corporation Source Task initialize running stop() poll() commit() commitRecord(record) version() start(config) Connector initialize parse and validate config create tasks Lifecycle of a connector
  • 31. © 2019 IBM Corporation Lifecycle of a connector Connector initialize parse and validate config create tasks Sink Task initialize running stop() put(records) flush(offsets) version() start(config)
  • 32. © 2019 IBM Corporation Kafka Connect and IBM MQ
  • 33. © 2019 IBM Corporation It’s easy to connect IBM MQ to Apache Kafka IBM has created a pair of connectors, available as source code or as part of IBM Event Streams Source connector From MQ queue to Kafka topic https://github.com/ibm-messaging/kafka-connect-mq-source Sink connector From Kafka topic to MQ queue https://github.com/ibm-messaging/kafka-connect-mq-sink Fully supported by IBM for customers with support entitlement for IBM Event Streams
  • 34. © 2019 IBM Corporation Connecting IBM MQ to Apache Kafka The connectors are deployed into a Kafka Connect runtime This runs between IBM MQ and Apache Kafka CLIENT IBM MQ TO.KAFKA.Q FROM.KAFKA.Q Kafka Connect worker FROM.MQ.TOPIC Kafka Connect worker MQ SINK CONNECTOR TO.MQ.TOPIC MQ SOURCE CONNECTOR CLIENT
  • 35. © 2019 IBM Corporation IBM MQ Advanced for z/OS VUE provides support for the Kafka Connect workers to be deployed onto z/OS Unix System Services using bindings connections to MQ Running Kafka Connect on a mainframe BINDINGS IBM MQ Advanced for z/OS VUE TO.KAFKA.Q FROM.KAFKA.Q Kafka Connect worker FROM.MQ.TOPIC Kafka Connect worker MQ SINK CONNECTOR TO.MQ.TOPIC MQ SOURCE CONNECTOR BINDINGS Unix System Services
  • 36. © 2019 IBM Corporation Design of the MQ sink connector
  • 37. © 2019 IBM Corporation MQ sink connector Converter MessageBuilder TO.MQ.TOPIC SinkRecord Value (may be complex) Schema Kafka Record Value byte[] Key MQ Message Payload MQMD (MQRFH2) MQ SINK CONNECTOR FROM.KAFKA.Q
  • 38. © 2019 IBM Corporation Sink task – Design Sink connector is relatively simple The interface is synchronous and fits MQ quite well Balancing efficiency with resource limits is the key put(Collection<SinkRecord> records) Converts Kafka records to MQ messages and sends in a transaction Always requests a flush to avoid hitting MQ transaction limits flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets) Commits any pending sends This batches messages into MQ without excessively large batches
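The put/flush pattern described above can be sketched in plain Java; `MqSession` is a hypothetical stand-in for the MQ client, not the real JMS API:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class SinkBatchSketch {
    // Hypothetical stand-in for the MQ client: sends join the current
    // transaction and only become visible when commit() is called.
    static class MqSession {
        final List<String> inFlight = new ArrayList<>();
        final List<String> committed = new ArrayList<>();
        void send(String message) { inFlight.add(message); }
        void commit() { committed.addAll(inFlight); inFlight.clear(); }
    }

    final MqSession session = new MqSession();
    private boolean flushRequested = false;

    // put(): convert each Kafka record to an MQ message and send it
    // inside the current transaction, then request a flush so the
    // transaction never grows past MQ's limits.
    public void put(Collection<String> records) {
        for (String record : records) {
            session.send(record);
        }
        flushRequested = true;
    }

    // flush(): commit the pending sends as one MQ transaction,
    // batching messages without excessively large batches.
    public void flush() {
        if (flushRequested) {
            session.commit();
            flushRequested = false;
        }
    }

    public static void main(String[] args) {
        SinkBatchSketch task = new SinkBatchSketch();
        task.put(List.of("a", "b", "c"));
        task.flush();
        System.out.println(task.session.committed); // [a, b, c]
    }
}
```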
  • 39. © 2019 IBM Corporation Design of the MQ source connector
  • 40. © 2019 IBM Corporation MQ source connector RecordBuilder Converter TO.MQ.TOPIC Source Record Value (may be complex) Schema MQ Message Kafka Record Null Record MQ SOURCE CONNECTOR TO.KAFKA.Q Value byte[] Payload MQMD (MQRFH2)
  • 41. © 2019 IBM Corporation Source task – Original design Source connector is more complicated It's multi-threaded and asynchronous, which fits MQ less naturally List<SourceRecord> poll() Waits for up to 30 seconds for MQ messages and returns them as a batch Multiple calls to poll() could contribute to an MQ transaction commit() Asynchronously commits the active MQ transaction Works quite well but commit() is too infrequent under load, which causes throttling commit() does ensure that the most recent batch of messages polled have been acked by Kafka, but it doesn't quite feel like the right way to do it
  • 42. © 2019 IBM Corporation Source task – Revised design Changed so each call to poll() comprises a single MQ transaction commit() is no longer used in normal operation List<SourceRecord> poll() Waits for records from the previous poll() to be acked by Kafka Commits the active MQ transaction – the previous batch Waits for up to 30 seconds for MQ messages and returns them as a new batch commitRecord(SourceRecord record) Just counts up the acks for the records sent MQ transactions are much shorter-lived No longer throttles under load Feels a much better fit for the design of Kafka Connect
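The revised flow, waiting for the previous batch's acks, committing it, then fetching the next batch, can be sketched in plain Java; `SourcePollSketch` and its fields are illustrative stand-ins, not the real connector code:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SourcePollSketch {
    private final AtomicInteger acksOutstanding = new AtomicInteger(0);
    private List<String> previousBatch = List.of();
    int committedBatches = 0;

    // Producer callback for each record Kafka has acknowledged
    // (commitRecord in the real SourceTask API).
    public void commitRecord(String record) {
        acksOutstanding.decrementAndGet();
    }

    // Each poll() is one MQ transaction: the previous batch is only
    // committed once every record in it has been acked by Kafka.
    public List<String> poll(List<String> nextBatch) {
        while (acksOutstanding.get() > 0) {
            Thread.onSpinWait(); // real code would wait with a timeout
        }
        if (!previousBatch.isEmpty()) {
            committedBatches++; // stands in for the MQ transaction commit
        }
        previousBatch = nextBatch;
        acksOutstanding.set(nextBatch.size());
        return nextBatch;
    }

    public static void main(String[] args) {
        SourcePollSketch task = new SourcePollSketch();
        List<String> batch = task.poll(List.of("m1", "m2"));
        batch.forEach(task::commitRecord); // Kafka acks arrive
        task.poll(List.of("m3"));          // now commits the first batch
        System.out.println(task.committedBatches); // 1
    }
}
```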
  • 43. © 2019 IBM Corporation Stopping a source task is tricky stop() is called on SourceTask to indicate the task should stop Running asynchronously with respect to the polling and commit threads Can't be sure whether poll() or commit() are currently active or will be called very soon Since poll() and commit() may both want access to the MQ connection It's not clear when it's safe to close it KIP-419: Safely notify Kafka Connect SourceTask is stopped Adds a stopped() method to SourceTask that is guaranteed to be the final call to the task uninitialized initialize() initialized running stopping start() stop() stopped() poll() commit() commitRecord() poll() commit() commitRecord()
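One way to make that safe today, assuming a single shared connection, is to guard it with a lock so stop() only closes it between MQ calls; this is an illustrative sketch, not the connector's actual implementation:

```java
import java.util.concurrent.locks.ReentrantLock;

public class StopSafetySketch {
    // Hypothetical stand-in for the MQ connection.
    static class MqConnection {
        volatile boolean closed = false;
        void close() { closed = true; }
        String receive() { return closed ? null : "message"; }
    }

    final MqConnection connection = new MqConnection();
    private final ReentrantLock connectionLock = new ReentrantLock();
    private volatile boolean stopping = false;

    // poll() (and commit()) hold the lock for the duration of any MQ
    // call, and check the stopping flag before starting work.
    public String poll() {
        connectionLock.lock();
        try {
            if (stopping) return null;
            return connection.receive();
        } finally {
            connectionLock.unlock();
        }
    }

    // stop() sets the flag first, then waits for the lock, so the
    // connection is only closed while no MQ call is in flight.
    public void stop() {
        stopping = true;
        connectionLock.lock();
        try {
            connection.close();
        } finally {
            connectionLock.unlock();
        }
    }

    public static void main(String[] args) {
        StopSafetySketch task = new StopSafetySketch();
        System.out.println(task.poll()); // message
        task.stop();
        System.out.println(task.poll()); // null
    }
}
```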
  • 44. © 2019 IBM Corporation Summary CONNECTOR TASK CONNECTOR TASK CONNECTOR TASK CONNECTOR TASK CONNECTOR Connect worker Connect worker Connect worker
  • 45. © 2019 IBM Corporation Summary Over 80 connectors IBM MQ HDFS Elasticsearch MySQL JDBC MQTT CoAP + many others
  • 46. © 2019 IBM Corporation Summary Connector initialize parse and validate config create tasks Sink Task initialize running Source Task initialize running
  • 47. © 2019 IBM Corporation Summary
  • 48. © 2019 IBM Corporation Thank you IBM Event Streams: ibm.com/cloud/event-streams Kate Stanley @katestanley91 Andrew Schofield https://medium.com/@andrew_schofield Kafka Connect: https://kafka.apache.org/documentation/#connect https://github.com/ibm-messaging/kafka-connect-mq-source https://github.com/ibm-messaging/kafka-connect-mq-sink

Editor's Notes

  1. Slide to show difference between Kafka and MQ: stream history vs reliable delivery
  2. SourceConnector – import from other system SinkConnector – export to other system
  3. Run a cluster of worker processes -> start them using the CLI. Then when you start a connector, give it an id; connectors will run on any worker and put tasks on any worker -> parallelism
  4. # The converters control conversion of data between the internal Kafka Connect representation and the messages in Kafka. key.converter=org.apache.kafka.connect.converters.ByteArrayConverter key.converter=org.apache.kafka.connect.storage.StringConverter key.converter=org.apache.kafka.connect.json.JsonConverter value.converter=org.apache.kafka.connect.converters.ByteArrayConverter value.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.json.JsonConverter
  11. You have the connector class which is used to connect to Kafka The task class which does the processing of data into a format for Kafka And then optional transformations
  12. Start - Parse config - Only called on “clean” Connector
  13. Start – initialize and one-time setup Poll - Get new records from the third-party system, block if no data Commit and CommitRecord – Optional methods to keep track of offsets internally CommitRecord - Commit an individual SourceRecord when the callback from the producer client is received, or if a record is filtered by a transformation.
  14. Put - Write records to third-party system Flush - Optional method to prompt flushing all records that have been ‘put’
  15. Provides scalability and reliability when connecting systems
  16. Look out for existing connectors
  17. Writing your own has some subtleties
  18. Your external system’s API and Kafka Connect might not align