IBM Event Streams / Apache Kafka
© 2019 IBM Corporation
Lessons learned building a connector using
Kafka Connect
Kate Stanley and Andrew Schofield
Kafka Summit NY 2019
“Kafka Connect is a tool for scalably and reliably
streaming data between Apache Kafka and
other systems”
IBM MQ
MESSAGE QUEUING: Assured delivery
EVENT STREAMING: Stream history
IBM MQ
[Diagram: four client apps connected to IBM MQ via MQ clients]
[Diagram: Kafka Connect bridges IBM MQ and Apache Kafka; MQ client apps connect to IBM MQ, while Kafka client apps consume from Kafka]
Getting started with Kafka Connect
Getting started with Kafka Connect
$ ls libs
connect-api-2.1.1.jar
connect-basic-auth-extension-2.1.1.jar
connect-file-2.1.1.jar
connect-json-2.1.1.jar
connect-runtime-2.1.1.jar
connect-transforms-2.1.1.jar
$ ls bin
connect-distributed.sh
connect-standalone.sh
Getting started with Kafka Connect
$ bin/connect-standalone.sh config/connect-standalone.properties
connector1.properties [connector2.properties]
$ bin/connect-distributed.sh config/connect-distributed.properties --
bootstrap.servers localhost:9092 --group.id connect
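The standalone worker reads its settings from a properties file. A minimal sketch, based on the stock `config/connect-standalone.properties` shipped with Kafka (adjust the bootstrap servers and offset file path for your environment):

```properties
# Kafka cluster the worker connects to
bootstrap.servers=localhost:9092

# Converters for record keys and values
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter

# Standalone mode stores source offsets in a local file
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
```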
Running distributed mode
Running distributed mode
[Diagram: three Connect workers forming a cluster]
Running distributed mode
[Diagram: three Connect workers, each exposing the REST API]
Getting started with Kafka Connect
$ curl http://localhost:8083/connector-plugins
[
{
"class":"org.apache.kafka.connect.file.FileStreamSinkConnector",
"type":"sink",
"version":"2.1.1"
},
{
"class":"org.apache.kafka.connect.file.FileStreamSourceConnector",
"type":"source",
"version":"2.1.1"
}
]
Getting started with Kafka Connect
$ echo '{
"name":"kate-file-load",
"config":{"connector.class":"FileStreamSource",
"file":"config/server.properties",
"topic":"kafka-config-topic"}}' |
curl -X POST -d @- http://localhost:8083/connectors
--header "Content-Type:application/json"
$ curl http://localhost:8083/connectors
["kate-file-load"]
Writing a connector
Key considerations – partitions and topics
file.txt
1. Start
2. The beginning
3. The middle
4. Conclusion
5. Ending
6. Finish
[Diagram: a source connector reads file.txt and publishes to a topic with two partitions: lines 1, 3 and 5 land on partition 1, and lines 2, 4 and 6 on partition 2]
[Diagram: a sink connector consumes both partitions and writes file-copy.txt in partition order (1, 3, 5, 2, 4, 6), so the copy does not preserve the original line order]
Key considerations – Data formats
[Diagram: data flows between the external system format, the Kafka Connect internal format, and the Kafka record format]
org.apache.kafka.connect.converters.ByteArrayConverter
org.apache.kafka.connect.storage.StringConverter
org.apache.kafka.connect.json.JsonConverter
Implementing the API
Anatomy of a connector
[Diagram: one connector class manages multiple connector tasks]
[Diagram: the connector and its tasks are distributed across three Connect workers]
Lifecycle of a connector
[Diagram: a Connector moves from initialize to parsing and validating config; the worker calls version(), config(), validate(config), then start(config)]
Connector config
@Override
public ConfigDef config() {
    ConfigDef configDef = new ConfigDef();
    configDef.define("config_option", Type.STRING, Importance.HIGH, "Config option.");
    return configDef;
}
$ curl -X PUT -d '{"connector.class":"MyConnector"}'
http://localhost:8083/connector-plugins/MyConnector/config/validate
{"configs": [{
"definition": {"name": "config_option", "importance": "HIGH", "default_value": null, …},
"value": {
"errors": ["Missing required configuration \"config_option\" which has no default value."],
…
}
Lifecycle of a connector
[Diagram: after parsing and validating config, the connector creates tasks via taskClass() and taskConfigs(max); stop() ends the lifecycle]
Lifecycle of a connector
[Diagram: a Source Task moves from initialize to running; the worker calls version() and start(config), then repeatedly poll(), commit() and commitRecord(record), and finally stop()]
Lifecycle of a connector
[Diagram: a Sink Task moves from initialize to running; the worker calls version() and start(config), then repeatedly put(records) and flush(offsets), and finally stop()]
Kafka Connect and IBM MQ
It’s easy to connect IBM MQ to Apache Kafka
IBM has created a pair of connectors, available as source code or as part of IBM Event Streams
Source connector
From MQ queue to Kafka topic
https://github.com/ibm-messaging/kafka-connect-mq-source
Sink connector
From Kafka topic to MQ queue
https://github.com/ibm-messaging/kafka-connect-mq-sink
Fully supported by IBM for customers with support entitlement for IBM Event Streams
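Submitting the source connector through the REST API looks roughly like the sketch below. The property names follow the connector's README at the time of writing (check the GitHub repository for the current set); the queue manager, channel and connection details are placeholders:

```json
{
  "name": "mq-source",
  "config": {
    "connector.class": "com.ibm.eventstreams.connect.mqsource.MQSourceConnector",
    "mq.queue.manager": "QM1",
    "mq.connection.name.list": "localhost(1414)",
    "mq.channel.name": "DEV.APP.SVRCONN",
    "mq.queue": "TO.KAFKA.Q",
    "mq.record.builder": "com.ibm.eventstreams.connect.mqsource.builders.DefaultRecordBuilder",
    "topic": "from-mq"
  }
}
```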
Connecting IBM MQ to Apache Kafka
The connectors are deployed into a Kafka Connect runtime
This runs between IBM MQ and Apache Kafka
[Diagram: MQ client apps put to TO.KAFKA.Q and get from FROM.KAFKA.Q; the MQ source connector copies TO.KAFKA.Q to the FROM.MQ.TOPIC Kafka topic, and the MQ sink connector copies the TO.MQ.TOPIC Kafka topic to FROM.KAFKA.Q, each running in a Kafka Connect worker]
Running Kafka Connect on a mainframe
IBM MQ Advanced for z/OS VUE provides support for the Kafka Connect workers to be deployed onto z/OS Unix System Services using bindings connections to MQ
[Diagram: the source and sink connectors run in Kafka Connect workers under Unix System Services, attached to IBM MQ Advanced for z/OS VUE over bindings connections]
Design of the MQ sink connector
MQ sink connector
[Diagram: a Kafka record's key and value (byte[]) from the TO.MQ.TOPIC are deserialized by the Converter into a SinkRecord with a schema and a possibly complex value; the MessageBuilder then builds an MQ message (payload, MQMD, optional MQRFH2), which the sink connector puts to FROM.KAFKA.Q]
Sink task – Design
Sink connector is relatively simple
The interface is synchronous and fits MQ quite well
Balancing efficiency with resource limits is the key
put(Collection<SinkRecord> records)
Converts Kafka records to MQ messages and sends in a transaction
Always requests a flush to avoid hitting MQ transaction limits
flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets)
Commits any pending sends
This batches messages into MQ without excessively large batches
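A sink connector instance is configured like any other connector. A sketch of the JSON submitted to the REST API (property names follow the connector's README at the time of writing; connection details are placeholders):

```json
{
  "name": "mq-sink",
  "config": {
    "connector.class": "com.ibm.eventstreams.connect.mqsink.MQSinkConnector",
    "topics": "to-mq",
    "mq.queue.manager": "QM1",
    "mq.connection.name.list": "localhost(1414)",
    "mq.channel.name": "DEV.APP.SVRCONN",
    "mq.queue": "FROM.KAFKA.Q",
    "mq.message.builder": "com.ibm.eventstreams.connect.mqsink.builders.DefaultMessageBuilder"
  }
}
```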
Design of the MQ source connector
MQ source connector
[Diagram: the source connector gets MQ messages from TO.KAFKA.Q; the RecordBuilder turns each message (payload, MQMD, optional MQRFH2) into a SourceRecord with a schema and a possibly complex value, and the Converter serializes it into a Kafka record (null key, byte[] value) for the target Kafka topic]
Source task – Original design
Source connector is more complicated
It’s multi-threaded and asynchronous which fits MQ less naturally
List<SourceRecord> poll()
Waits for up to 30 seconds for MQ messages, which are returned as a batch
Multiple calls to poll() could contribute to an MQ transaction
commit()
Asynchronously commits the active MQ transaction
Works quite well, but commit() is too infrequent under load, which causes throttling
commit() does ensure that the most recent batch of messages polled has been acked by
Kafka, but it doesn’t quite feel like the right way to do it
Source task – Revised design
Changed so each call to poll() comprises a single MQ transaction
commit() is no longer used in normal operation
List<SourceRecord> poll()
Waits for records from the previous poll() to be acked by Kafka
Commits the active MQ transaction – the previous batch
Waits for up to 30 seconds for MQ messages, which are returned as a new batch
commitRecord(SourceRecord record)
Just counts up the acks for the records sent
MQ transactions are much shorter-lived
No longer throttles under load
Feels a much better fit for the design of Kafka Connect
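The revised pattern can be sketched as follows. This is an illustrative stand-alone model, not the connector's actual code: the MQ connection is faked with an in-memory list, and waiting for Kafka acks is reduced to a counter check.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the revised source-task pattern: each poll()
// first commits the MQ transaction covering the previous batch (once all
// of its records have been acked via commitRecord), then reads a new batch.
class RevisedPollSketch {
    private final List<String> mqQueue = new ArrayList<>(); // stands in for MQ
    private int acks = 0;          // acks counted by commitRecord()
    private int lastBatchSize = 0; // records handed out by the previous poll()
    int commits = 0;               // completed "MQ transactions"

    void mqPut(String msg) { mqQueue.add(msg); }

    // Stand-in for SourceTask.commitRecord(record): just count the acks.
    void commitRecord() { acks++; }

    // Stand-in for SourceTask.poll(): exactly one MQ transaction per call.
    List<String> poll() {
        // 1. The previous batch must be fully acked by Kafka.
        //    (A real task would block here; this sketch just checks.)
        if (acks < lastBatchSize) {
            throw new IllegalStateException("previous batch not yet acked");
        }
        // 2. Commit the MQ transaction covering the previous batch.
        if (lastBatchSize > 0) {
            commits++;
            acks = 0;
        }
        // 3. Get a new batch of messages under a fresh MQ transaction.
        List<String> batch = new ArrayList<>(mqQueue);
        mqQueue.clear();
        lastBatchSize = batch.size();
        return batch;
    }
}
```

Because the commit happens at the start of the next poll(), each MQ transaction lives only as long as one batch, which is what removes the throttling seen in the original design.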
Stopping a source task is tricky
stop() is called on SourceTask to indicate the task should stop
It runs asynchronously with respect to the polling and commit threads
Can’t be sure whether poll() or commit() are currently active or will be called very soon
Since poll() and commit() may both want access to the MQ connection
It’s not clear when it’s safe to close it
KIP-419: Safely notify Kafka Connect SourceTask is stopped
Adds a stopped() method to SourceTask that is guaranteed to be the final call to the task
[Diagram: task states uninitialized → initialized → running → stopping, driven by initialize(), start(), stop() and stopped(); poll(), commit() and commitRecord() may run during both running and stopping]
Summary
[Diagram: a connector and its tasks distributed across Connect workers]
Summary
Over 80 connectors
IBM MQ
HDFS
Elasticsearch
MySQL
JDBC
MQTT
CoAP
+ many others
Summary
[Diagram: connector lifecycle recap: initialize, parse and validate config, create tasks; source and sink tasks initialize and run]
Thank you
IBM Event Streams: ibm.com/cloud/event-streams
Kate Stanley @katestanley91
Andrew Schofield https://medium.com/@andrew_schofield
Kafka Connect: https://kafka.apache.org/documentation/#connect
https://github.com/ibm-messaging/kafka-connect-mq-source
https://github.com/ibm-messaging/kafka-connect-mq-sink

More Related Content

What's hot

What's hot (20)

Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overview
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources
 
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CISecure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
 
Flink on Kubernetes operator
Flink on Kubernetes operatorFlink on Kubernetes operator
Flink on Kubernetes operator
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registry
 
IBM MQ vs Apache ActiveMQ
IBM MQ vs Apache ActiveMQIBM MQ vs Apache ActiveMQ
IBM MQ vs Apache ActiveMQ
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connect
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
kafka
kafkakafka
kafka
 

Similar to Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley & Andrew Schofield, IBM United Kingdom) Kafka Summit NYC 2019

Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMwareEvent Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
HostedbyConfluent
 
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaKafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Guido Schmutz
 

Similar to Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley & Andrew Schofield, IBM United Kingdom) Kafka Summit NYC 2019 (20)

Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley &...
Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley &...Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley &...
Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley &...
 
Technology choices for Apache Kafka and Change Data Capture
Technology choices for Apache Kafka and Change Data CaptureTechnology choices for Apache Kafka and Change Data Capture
Technology choices for Apache Kafka and Change Data Capture
 
Connecting mq&amp;kafka
Connecting mq&amp;kafkaConnecting mq&amp;kafka
Connecting mq&amp;kafka
 
Kafka with IBM Event Streams - Technical Presentation
Kafka with IBM Event Streams - Technical PresentationKafka with IBM Event Streams - Technical Presentation
Kafka with IBM Event Streams - Technical Presentation
 
Virtual Meetup Sweden - Reacting to an event driven world
Virtual Meetup Sweden - Reacting to an event driven worldVirtual Meetup Sweden - Reacting to an event driven world
Virtual Meetup Sweden - Reacting to an event driven world
 
Fast Kafka Apps! (Edoardo Comar and Mickael Maison, IBM) Kafka Summit London ...
Fast Kafka Apps! (Edoardo Comar and Mickael Maison, IBM) Kafka Summit London ...Fast Kafka Apps! (Edoardo Comar and Mickael Maison, IBM) Kafka Summit London ...
Fast Kafka Apps! (Edoardo Comar and Mickael Maison, IBM) Kafka Summit London ...
 
JSpring Virtual 2020 - Reacting to an event-driven world
JSpring Virtual 2020 - Reacting to an event-driven worldJSpring Virtual 2020 - Reacting to an event-driven world
JSpring Virtual 2020 - Reacting to an event-driven world
 
DevNexus - Reacting to an event driven world
DevNexus - Reacting to an event driven worldDevNexus - Reacting to an event driven world
DevNexus - Reacting to an event driven world
 
Running Kafka in Kubernetes: A Practical Guide (Katherine Stanley, IBM United...
Running Kafka in Kubernetes: A Practical Guide (Katherine Stanley, IBM United...Running Kafka in Kubernetes: A Practical Guide (Katherine Stanley, IBM United...
Running Kafka in Kubernetes: A Practical Guide (Katherine Stanley, IBM United...
 
Jfokus - Reacting to an event-driven world
Jfokus - Reacting to an event-driven worldJfokus - Reacting to an event-driven world
Jfokus - Reacting to an event-driven world
 
What's new in MQ 9.1.* on z/OS
What's new in MQ 9.1.* on z/OSWhat's new in MQ 9.1.* on z/OS
What's new in MQ 9.1.* on z/OS
 
Kubernetes Apache Kafka
Kubernetes Apache KafkaKubernetes Apache Kafka
Kubernetes Apache Kafka
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Ver...
Towards Flink 2.0:  Unified Batch & Stream Processing - Aljoscha Krettek, Ver...Towards Flink 2.0:  Unified Batch & Stream Processing - Aljoscha Krettek, Ver...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Ver...
 
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQCloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
 
Spring Cloud Stream with Kafka
Spring Cloud Stream with KafkaSpring Cloud Stream with Kafka
Spring Cloud Stream with Kafka
 
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMwareEvent Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
 
App modernization on AWS with Apache Kafka and Confluent Cloud
App modernization on AWS with Apache Kafka and Confluent CloudApp modernization on AWS with Apache Kafka and Confluent Cloud
App modernization on AWS with Apache Kafka and Confluent Cloud
 
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaKafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
 
How kafka is transforming hadoop, spark & storm
How kafka is transforming hadoop, spark & stormHow kafka is transforming hadoop, spark & storm
How kafka is transforming hadoop, spark & storm
 

More from confluent

More from confluent (20)

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 

Recently uploaded

Recently uploaded (20)

WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 

Lessons Learned Building a Connector Using Kafka Connect (Katherine Stanley & Andrew Schofield, IBM United Kingdom) Kafka Summit NYC 2019

  • 1. IBM Event StreamsApache Kafka © 2019 IBM Corporation Lessons learned building a connector using Kafka Connect Kate Stanley and Andrew Schofield Kafka Summit NY 2019
  • 2. © 2019 IBM Corporation “Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems”
  • 3. © 2019 IBM Corporation IBM MQ
  • 4. © 2019 IBM Corporation MESSAGE QUEUING EVENT STREAMING Assured delivery Stream history
  • 5. © 2019 IBM Corporation IBM MQ MQ CLIENT APP MQ CLIENT APP MQ CLIENT APP MQ CLIENT APP
  • 6. © 2019 IBM Corporation KAFKA CLIENT APP KAFKA CLIENT APP KAFKA CONNECT IBM MQ MQ CLIENT APP MQ CLIENT APP MQ CLIENT APP MQ CLIENT APP
  • 7. © 2019 IBM Corporation
  • 8. © 2019 IBM Corporation Getting started with Kafka Connect
  • 9. © 2019 IBM Corporation Getting started with Kafka Connect $ ls libs connect-api-2.1.1.jar connect-basic-auth-extension-2.1.1.jar connect-file-2.1.1.jar connect-json-2.1.1.jar connect-runtime-2.1.1.jar connect-transforms-2.1.1.jar $ ls bin connect-distributed.sh connect-standalone.sh
  • 10. © 2019 IBM Corporation Getting started with Kafka Connect $ ls libs connect-api-2.1.1.jar connect-basic-auth-extension-2.1.1.jar connect-file-2.1.1.jar connect-json-2.1.1.jar connect-runtime-2.1.1.jar connect-transforms-2.1.1.jar $ ls bin connect-distributed.sh connect-standalone.sh $ bin/connect-standalone.sh config/connect-standalone.properties connector1.properties [connector2.properties] $ bin/connect-distributed.sh config/connect-distributed.properties -- bootstrap.servers localhost:9092 --group.id connect
  • 11. © 2019 IBM Corporation Running distributed mode
  • 12. © 2019 IBM Corporation CONNECT WORKER CONNECT WORKER CONNECT WORKER Running distributed mode
  • 13. © 2019 IBM Corporation CONNECT WORKER CONNECT WORKER CONNECT WORKER API API API Running distributed mode
  • 14. © 2019 IBM Corporation Getting started with Kafka Connect $ curl http://localhost:8083/connector-plugins [ { "class":"org.apache.kafka.connect.file.FileStreamSinkConnector", "type":"sink", "version":"2.1.1” }, { "class":"org.apache.kafka.connect.file.FileStreamSourceConnector", "type":"source", "version":"2.1.1” } ]
  • 15. © 2019 IBM Corporation Getting started with Kafka Connect $ echo ‘{ "name":"kate-file-load", "config":{"connector.class":"FileStreamSource", "file":"config/server.properties", "topic":"kafka-config-topic"}}’ | curl -X POST -d @- http://localhost:8083/connectors --header "content-Type:application/json" $ curl http://localhost:8083/connectors ["kate-file-load"]
  • 16. © 2019 IBM Corporation Writing a connector
  • 17. © 2019 IBM Corporation Key considerations – partitions and topics
  • 18. © 2019 IBM Corporation file.txt 1. Start 2. The beginning 3. The middle 4. Conclusion 5. Ending 6. Finish Key considerations – partitions and topics
  • 19. © 2019 IBM Corporation file.txt 1. Start 2. The beginning 3. The middle 4. Conclusion 5. Ending 6. Finish 1. Start 3. The middle 5. Ending 2. The beginning 4. Conclusion 6. Finish SOURCE CONNECTOR Key considerations – partitions and topics Topic Partition 1 Partition 2
  • 20. © 2019 IBM Corporation Key considerations – partitions and topics file-copy.txt Partition 1 file.txt 1. Start 2. The beginning 3. The middle 4. Conclusion 5. Ending 6. Finish 1. Start 3. The middle 5. Ending 2. The beginning 4. Conclusion 6. Finish Partition 2 SOURCE CONNECTOR SINK CONNECTOR 1. Start 3. The middle 5. Ending 2. The beginning 4. Conclusion 6. Finish Topic
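The split in the diagrams above, with alternate lines landing on alternate partitions, can be sketched in plain Java; the `linesToPartitions` helper is illustrative and not part of the Connect API:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {
    // Assign each line of the source file to a partition round-robin,
    // mirroring the two-partition split in the diagram.
    static List<List<String>> linesToPartitions(List<String> lines, int partitions) {
        List<List<String>> result = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < lines.size(); i++) {
            result.get(i % partitions).add(lines.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> file = List.of("Start", "The beginning", "The middle",
                                    "Conclusion", "Ending", "Finish");
        List<List<String>> topic = linesToPartitions(file, 2);
        System.out.println(topic.get(0)); // [Start, The middle, Ending]
        System.out.println(topic.get(1)); // [The beginning, Conclusion, Finish]
    }
}
```

Ordering is preserved within each partition but not across partitions, which is why the sink side reassembles the file in a different order.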
  • 21. © 2019 IBM Corporation Key considerations – Data formats
  • 22. © 2019 IBM Corporation EXTERNAL SYSTEM FORMAT KAFKA RECORD FORMAT KAFKA CONNECT INTERNAL FORMAT Key considerations – Data formats
  • 23. © 2019 IBM Corporation Key considerations – Data formats EXTERNAL SYSTEM FORMAT KAFKA RECORD FORMAT KAFKA CONNECT INTERNAL FORMAT org.apache.kafka.connect.converters.ByteArrayConverter org.apache.kafka.connect.storage.StringConverter org.apache.kafka.connect.json.JsonConverter
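The converter is chosen in the worker properties file; the three built-in converters above are selected like this:

```properties
# Converters control conversion between the internal Connect
# representation and the bytes stored in Kafka
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# JsonConverter can optionally embed the schema in each record
value.converter.schemas.enable=false
```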
  • 24. © 2019 IBM Corporation Implementing the API
  • 25. © 2019 IBM Corporation Anatomy of a connector CONNECTOR TASK CONNECTOR CLASS CONNECTOR TASK CONNECTOR TASK
  • 26. © 2019 IBM Corporation Anatomy of a connector CONNECTOR TASK CONNECTOR TASK CONNECTOR TASK CONNECTOR TASK CONNECTOR Connect worker Connect worker Connect worker
  • 27. © 2019 IBM Corporation version() config() validate(config) start(config) Connector initialize parse and validate config Lifecycle of a connector
  • 28. © 2019 IBM Corporation Connector config @Override public ConfigDef config() { ConfigDef configDef = new ConfigDef(); configDef.define("config_option", Type.STRING, Importance.HIGH, "Config option."); return configDef; } $ curl -X PUT -d '{"connector.class":"MyConnector"}' http://localhost:8083/connector-plugins/MyConnector/config/validate {"configs": [{ "definition": {"name": "config_option", "importance": "HIGH", "default_value": null, …}, "value": { "errors": ["Missing required configuration \"config_option\" which has no default value."], … }
  • 29. © 2019 IBM Corporation version() config() validate(config) start(config) taskClass() taskConfigs(max) Connector initialize parse and validate config create tasks Lifecycle of a connector stop()
  • 30. © 2019 IBM Corporation Source Task initialize running stop() poll() commit() commitRecord(record) version() start(config) Connector initialize parse and validate config create tasks Lifecycle of a connector
  • 31. © 2019 IBM Corporation Lifecycle of a connector Connector initialize parse and validate config create tasks Sink Task initialize running stop() put(records) flush(offsets) version() start(config)
  • 32. © 2019 IBM Corporation Kafka Connect and IBM MQ
  • 33. © 2019 IBM Corporation It’s easy to connect IBM MQ to Apache Kafka IBM has created a pair of connectors, available as source code or as part of IBM Event Streams Source connector From MQ queue to Kafka topic https://github.com/ibm-messaging/kafka-connect-mq-source Sink connector From Kafka topic to MQ queue https://github.com/ibm-messaging/kafka-connect-mq-sink Fully supported by IBM for customers with support entitlement for IBM Event Streams
  • 34. © 2019 IBM Corporation Connecting IBM MQ to Apache Kafka The connectors are deployed into a Kafka Connect runtime This runs between IBM MQ and Apache Kafka CLIENT IBM MQ TO.KAFKA.Q FROM.KAFKA.Q Kafka Connect worker FROM.MQ.TOPIC Kafka Connect worker MQ SINK CONNECTOR TO.MQ.TOPIC MQ SOURCE CONNECTOR CLIENT
  • 35. © 2019 IBM Corporation IBM MQ Advanced for z/OS VUE provides support for the Kafka Connect workers to be deployed onto z/OS Unix System Services using bindings connections to MQ Running Kafka Connect on a mainframe BINDINGS IBM MQ Advanced for z/OS VUE TO.KAFKA.Q FROM.KAFKA.Q Kafka Connect worker FROM.MQ.TOPIC Kafka Connect worker MQ SINK CONNECTOR TO.MQ.TOPIC MQ SOURCE CONNECTOR BINDINGS Unix System Services
  • 36. © 2019 IBM Corporation Design of the MQ sink connector
  • 37. © 2019 IBM Corporation MQ sink connector Converter MessageBuilder TO.MQ.TOPIC SinkRecord Value (may be complex) Schema Kafka Record Value byte[] Key MQ Message Payload MQMD (MQRFH2) MQ SINK CONNECTOR FROM.KAFKA.Q
  • 38. © 2019 IBM Corporation Sink task – Design Sink connector is relatively simple The interface is synchronous and fits MQ quite well Balancing efficiency with resource limits is the key put(Collection<SinkRecord> records) Converts Kafka records to MQ messages and sends in a transaction Always requests a flush to avoid hitting MQ transaction limits flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets) Commits any pending sends This batches messages into MQ without excessively large batches
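The put/flush pattern described above can be sketched in plain Java; `MqSession` is a hypothetical stand-in for the MQ client, not the real JMS API:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class SinkBatchSketch {
    // Hypothetical stand-in for the MQ client: sends join the current
    // transaction and only become visible when commit() is called.
    static class MqSession {
        final List<String> inFlight = new ArrayList<>();
        final List<String> committed = new ArrayList<>();
        void send(String message) { inFlight.add(message); }
        void commit() { committed.addAll(inFlight); inFlight.clear(); }
    }

    final MqSession session = new MqSession();
    private boolean flushRequested = false;

    // put(): convert each Kafka record to an MQ message and send it
    // inside the current transaction, then request a flush so the
    // transaction never grows past MQ's limits.
    public void put(Collection<String> records) {
        for (String record : records) {
            session.send(record);
        }
        flushRequested = true;
    }

    // flush(): commit the pending sends as one MQ transaction,
    // batching messages without excessively large batches.
    public void flush() {
        if (flushRequested) {
            session.commit();
            flushRequested = false;
        }
    }

    public static void main(String[] args) {
        SinkBatchSketch task = new SinkBatchSketch();
        task.put(List.of("a", "b", "c"));
        task.flush();
        System.out.println(task.session.committed); // [a, b, c]
    }
}
```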
  • 39. © 2019 IBM Corporation Design of the MQ source connector
  • 40. © 2019 IBM Corporation MQ source connector RecordBuilder Converter TO.MQ.TOPIC Source Record Value (may be complex) Schema MQ Message Kafka Record Null Record MQ SOURCE CONNECTOR TO.KAFKA.Q Value byte[] Payload MQMD (MQRFH2)
  • 41. © 2019 IBM Corporation Source task – Original design Source connector is more complicated It's multi-threaded and asynchronous, which fits MQ less naturally List<SourceRecord> poll() Waits for up to 30 seconds for MQ messages and returns them as a batch Multiple calls to poll() could contribute to an MQ transaction commit() Asynchronously commits the active MQ transaction Works quite well but commit() is too infrequent under load, which causes throttling commit() does ensure that the most recent batch of messages polled have been acked by Kafka, but it doesn't quite feel like the right way to do it
  • 42. © 2019 IBM Corporation Source task – Revised design Changed so each call to poll() comprises a single MQ transaction commit() is no longer used in normal operation List<SourceRecord> poll() Waits for records from the previous poll() to be acked by Kafka Commits the active MQ transaction – the previous batch Waits for up to 30 seconds for MQ messages and returns them as a new batch commitRecord(SourceRecord record) Just counts up the acks for the records sent MQ transactions are much shorter-lived No longer throttles under load Feels a much better fit for the design of Kafka Connect
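The revised flow, waiting for the previous batch's acks, committing it, then fetching the next batch, can be sketched in plain Java; `SourcePollSketch` and its fields are illustrative stand-ins, not the real connector code:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SourcePollSketch {
    private final AtomicInteger acksOutstanding = new AtomicInteger(0);
    private List<String> previousBatch = List.of();
    int committedBatches = 0;

    // Producer callback for each record Kafka has acknowledged
    // (commitRecord in the real SourceTask API).
    public void commitRecord(String record) {
        acksOutstanding.decrementAndGet();
    }

    // Each poll() is one MQ transaction: the previous batch is only
    // committed once every record in it has been acked by Kafka.
    public List<String> poll(List<String> nextBatch) {
        while (acksOutstanding.get() > 0) {
            Thread.onSpinWait(); // real code would wait with a timeout
        }
        if (!previousBatch.isEmpty()) {
            committedBatches++; // stands in for the MQ transaction commit
        }
        previousBatch = nextBatch;
        acksOutstanding.set(nextBatch.size());
        return nextBatch;
    }

    public static void main(String[] args) {
        SourcePollSketch task = new SourcePollSketch();
        List<String> batch = task.poll(List.of("m1", "m2"));
        batch.forEach(task::commitRecord); // Kafka acks arrive
        task.poll(List.of("m3"));          // now commits the first batch
        System.out.println(task.committedBatches); // 1
    }
}
```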
  • 43. © 2019 IBM Corporation Stopping a source task is tricky stop() is called on SourceTask to indicate the task should stop Running asynchronously with respect to the polling and commit threads Can't be sure whether poll() or commit() are currently active or will be called very soon Since poll() and commit() may both want access to the MQ connection It's not clear when it's safe to close it KIP-419: Safely notify Kafka Connect SourceTask is stopped Adds a stopped() method to SourceTask that is guaranteed to be the final call to the task uninitialized initialize() initialized running stopping start() stop() stopped() poll() commit() commitRecord() poll() commit() commitRecord()
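One way to make that safe today, assuming a single shared connection, is to guard it with a lock so stop() only closes it between MQ calls; this is an illustrative sketch, not the connector's actual implementation:

```java
import java.util.concurrent.locks.ReentrantLock;

public class StopSafetySketch {
    // Hypothetical stand-in for the MQ connection.
    static class MqConnection {
        volatile boolean closed = false;
        void close() { closed = true; }
        String receive() { return closed ? null : "message"; }
    }

    final MqConnection connection = new MqConnection();
    private final ReentrantLock connectionLock = new ReentrantLock();
    private volatile boolean stopping = false;

    // poll() (and commit()) hold the lock for the duration of any MQ
    // call, and check the stopping flag before starting work.
    public String poll() {
        connectionLock.lock();
        try {
            if (stopping) return null;
            return connection.receive();
        } finally {
            connectionLock.unlock();
        }
    }

    // stop() sets the flag first, then waits for the lock, so the
    // connection is only closed while no MQ call is in flight.
    public void stop() {
        stopping = true;
        connectionLock.lock();
        try {
            connection.close();
        } finally {
            connectionLock.unlock();
        }
    }

    public static void main(String[] args) {
        StopSafetySketch task = new StopSafetySketch();
        System.out.println(task.poll()); // message
        task.stop();
        System.out.println(task.poll()); // null
    }
}
```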
  • 44. © 2019 IBM Corporation Summary CONNECTOR TASK CONNECTOR TASK CONNECTOR TASK CONNECTOR TASK CONNECTOR Connect worker Connect worker Connect worker
  • 45. © 2019 IBM Corporation Summary Over 80 connectors IBM MQ HDFS Elasticsearch MySQL JDBC MQTT CoAP + many others
  • 46. © 2019 IBM Corporation Summary Connector initialize parse and validate config create tasks Sink Task initialize running Source Task initialize running
  • 47. © 2019 IBM Corporation Summary
  • 48. © 2019 IBM Corporation Thank you IBM Event Streams: ibm.com/cloud/event-streams Kate Stanley @katestanley91 Andrew Schofield https://medium.com/@andrew_schofield Kafka Connect: https://kafka.apache.org/documentation/#connect https://github.com/ibm-messaging/kafka-connect-mq-source https://github.com/ibm-messaging/kafka-connect-mq-sink

Editor's Notes

  1. Slide to show difference between Kafka and MQ: stream history vs reliable delivery
  2. SourceConnector – import from other system SinkConnector – export to other system
  3. Run a cluster of worker processes -> start them using the CLI. Then when you start a connector, give it an id; connectors will run on any worker and put tasks on any worker -> parallelism
  4. # The converters control conversion of data between the internal Kafka Connect representation and the messages in Kafka. key.converter=org.apache.kafka.connect.converters.ByteArrayConverter key.converter=org.apache.kafka.connect.storage.StringConverter key.converter=org.apache.kafka.connect.json.JsonConverter value.converter=org.apache.kafka.connect.converters.ByteArrayConverter value.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.json.JsonConverter
  11. You have the connector class which is used to connect to Kafka The task class which does the processing of data into a format for Kafka And then optional transformations
  12. Start - Parse config - Only called on “clean” Connector
  13. Start – initialize and one-time setup Poll - Get new records from the third-party system, block if no data Commit and CommitRecord – Optional methods to keep track of offsets internally CommitRecord - Commit an individual SourceRecord when the callback from the producer client is received, or if a record is filtered by a transformation.
  14. Put - Write records to third-party system Flush - Optional method to prompt flushing all records that have been ‘put’
  15. Provides scalability and reliability when connecting systems
  16. Look out for existing connectors
  17. Writing your own has some subtleties
  18. Your external system’s API and Kafka Connect might not align