From Zero to Hero with Kafka® Connect
Perry Krol
perry@confluent.io @perkrol
What is Kafka® Connect?
Streaming Integration with Kafka® Connect
(Diagram: Sources → Kafka® Connect → Kafka® Brokers → Kafka® Connect → Sinks)
Zero Coding!
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url": "jdbc:mysql://asgard:3306/demo",
  "table.whitelist": "sales,orders,customers"
}
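As an illustration only (the connector name, worker address, and REST call are assumptions, not from the slide), a config like this could be submitted to a Connect worker's REST API:

$ curl -s -X PUT -H "Content-Type: application/json" \
    --data '{ "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
              "connection.url": "jdbc:mysql://asgard:3306/demo",
              "table.whitelist": "sales,orders,customers" }' \
    http://localhost:8083/connectors/jdbc-source-demo/config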
Streaming Integration with Kafka® Connect
(Diagram: RDBMS, AWS S3, and HDFS integrated with Kafka® through multiple Kafka® Connect instances)
Writing to Data Stores from Kafka®
(Diagram: Apps produce to Kafka®, and Kafka® Connect writes the data on to a data store)
Evolve Processing from Old Systems to New
(Diagram: the Existing App writes to an RDBMS, which Kafka® Connect streams into Kafka®)
Evolve Processing from Old Systems to New
(Diagram: the Existing App keeps writing to the RDBMS, Kafka® Connect streams the data into Kafka®, and a New App consumes it from Kafka® directly)
Kafka® Connect Demo
Demo
https://github.com/confluentinc/demo-scene/tree/master/kafka-connect-zero-to-hero
(Diagram: data flows through Kafka® Connect into Kafka®, and on through Kafka® Connect to Elasticsearch)
Kafka® Connect REST API - Tricks
http://go.rmoff.net/connector-status
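A sketch of this kind of status trick (worker address and output format are assumptions, not taken from the link):

# List each connector together with the state of its connector and tasks
for CONNECTOR in $(curl -s http://localhost:8083/connectors | jq -r '.[]'); do
  echo -n "$CONNECTOR: "
  curl -s "http://localhost:8083/connectors/$CONNECTOR/status" | jq -c '[.connector.state, .tasks[].state]'
done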
Kafka® Connect REST API - Tricks
http://go.rmoff.net/selectively-delete-connectors
(h/t to @madewithtea)
Configuring Kafka® Connect
Inside the API - connectors - transforms - converters
Kafka® Connect Basics
(Diagram: Source → Kafka® Connect → Kafka®)
Connectors
(Diagram: Source → Kafka® Connect → Kafka®, with the Connector as the component talking to the Source)
Connectors
"config": {
  [...]
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "connection.url": "jdbc:postgresql://postgres:5432/",
  "topics": "asgard.demo.orders",
}
Connectors
(Diagram: the Connector reads native data from the Source and turns it into Connect Records)
Converters
(Diagram: the Connector produces Connect Records from the native data; the Converter then serialises them to bytes[] for Kafka®)
Serialisation & Schemas
Confluent Schema Registry
AVRO, JSON, Protobuf, CSV
https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf
The Confluent Schema Registry
(Diagram: Source → Kafka® Connect → Kafka® → Kafka® Connect → Target; the AVRO messages travel through Kafka®, while the AVRO schema is written to and read from the Schema Registry)
Converters
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
Set as a global default per-worker; optionally can be overridden per-connector
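For example (a sketch; the converter choice is illustrative), a per-connector override goes into the connector's own config and takes precedence over the worker default:

"config": {
  [...]
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": "false"
}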
What About Internal Converters?
value.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
key.internal.value.converter=org.apache.kafka.connect.json.JsonConverter
value.internal.value.converter=org.apache.kafka.connect.json.JsonConverter
key.internal.value.converter.bork.bork.bork=org.apache.kafka.connect.json.JsonConverter
key.internal.value.please.just.work.converter=org.apache.kafka.connect.json.JsonConverter
Internal converters are for internal use by Kafka Connect only, and have been deprecated as of Apache Kafka 2.0.
You should not change these, and from Apache Kafka 2.0 onwards you will get a warning if you do try to configure them.
Single Message Transforms
(Diagram: Source → Connector → Transform(s) → Converter → Kafka®)
Single Message Transforms
"config": {
  [...]
  "transforms": "addDateToTopic,labelFooBar",
  "transforms.addDateToTopic.type": "org.apache.kafka.connect.transforms.TimestampRouter",
  "transforms.addDateToTopic.topic.format": "${topic}-${timestamp}",
  "transforms.addDateToTopic.timestamp.format": "YYYYMM",
  "transforms.labelFooBar.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
  "transforms.labelFooBar.renames": "delivery_address:shipping_address",
}
Transforms config: "transforms" lists which transforms to apply; "transforms.<alias>.*" holds the config per transform.
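As a worked illustration of the effect (topic name and timestamp are assumptions, not from the slides):

# Before the transforms: topic "asgard.demo.orders", value contains field "delivery_address"
# After TimestampRouter (record timestamped May 2019): topic becomes "asgard.demo.orders-201905"
# After ReplaceField$Value: field "delivery_address" is renamed to "shipping_address"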
Extensible
(Diagram: Connector, Transform(s), and Converter are all pluggable points in the pipeline)
Confluent Hub
Online library of pre-packaged and ready-to-install extensions or add-ons for Confluent Platform and Apache Kafka®:
● Connectors
● Transforms
● Converters
Easily install the components that suit your needs into your local environment with the Confluent Hub client command line tool.
https://hub.confluent.io
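For example (component name and version are illustrative), installing a connector plugin with the Confluent Hub client:

$ confluent-hub install confluentinc/kafka-connect-jdbc:latest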
Deploying Kafka® Connect
Connectors - Tasks - Workers
Connectors & Tasks
(Diagram: a JDBC Source connector and an S3 Sink connector; the JDBC Source runs as JDBC Task #1 and JDBC Task #2, the S3 Sink as S3 Task #1)
Tasks & Workers
(Diagram: JDBC Task #1, JDBC Task #2, and S3 Task #1 all run inside a single Worker)
Kafka® Connect Standalone Worker
(Diagram: JDBC Task #1, JDBC Task #2, and S3 Task #1 run in one standalone Worker, which keeps its own Offsets)
“Scaling” the Standalone Worker
(Diagram: the tasks are split across two standalone Workers, each keeping its own Offsets)
Fault-Tolerant? Nope!
Kafka® Connect Distributed Worker
(Diagram: a Kafka® Connect Cluster with one Worker running JDBC Task #1, JDBC Task #2, and S3 Task #1; Offsets, Config, and Status are shared via Kafka®)
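A minimal sketch of the worker settings behind distributed mode (values are illustrative; the storage topics hold the Offsets, Config, and Status shown above):

bootstrap.servers=localhost:9092
group.id=connect-cluster-demo
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status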
Scaling the Distributed Worker
(Diagram: a second Worker joins the Kafka® Connect Cluster and the tasks are spread across both Workers)
Fault-Tolerant? Yeah!
Distributed Worker - Fault Tolerance
(Diagram: when one Worker in the Kafka® Connect Cluster disappears, its tasks are rebalanced onto the remaining Worker)
Multiple Distributed Clusters
(Diagram: Kafka® Connect Cluster #1 runs S3 Task #1 and JDBC Task #1, Kafka® Connect Cluster #2 runs JDBC Task #2; each cluster has its own Offsets, Config, and Status)
Troubleshooting Kafka® Connect
Troubleshooting Kafka® Connect
(Status check: the Connector is RUNNING, but its Task is FAILED)
$ curl -s "http://localhost:8083/connectors/source-debezium-orders/status" | jq '.connector.state'
"RUNNING"
$ curl -s "http://localhost:8083/connectors/source-debezium-orders/status" | jq '.tasks[0].state'
"FAILED"
Troubleshooting Kafka® Connect
$ curl -s "http://localhost:8083/connectors/source-debezium-orders-00/status" | jq '.tasks[0].trace'
"org.apache.kafka.connect.errors.ConnectException
    at io.debezium.connector.mysql.AbstractReader.wrap(AbstractReader.java:230)
    at io.debezium.connector.mysql.AbstractReader.failed(AbstractReader.java:197)
    at io.debezium.connector.mysql.BinlogReader$ReaderThreadLifecycleListener.onCommunicationFailure(BinlogReader.java:1018)
    at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:950)
    at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:580)
    at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:825)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
    at com.github.shyiko.mysql.binlog.io.ByteArrayInputStream.read(ByteArrayInputStream.java:190)
    at com.github.shyiko.mysql.binlog.io.ByteArrayInputStream.readInteger(ByteArrayInputStream.java:46)
    at com.github.shyiko.mysql.binlog.event.deserialization.EventHeaderV4Deserializer.deserialize(EventHeaderV4Deserializer.java:35)
    at com.github.shyiko.mysql.binlog.event.deserialization.EventHeaderV4Deserializer.deserialize(EventHeaderV4Deserializer.java:27)
    at com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer.nextEvent(EventDeserializer.java:212)
    at io.debezium.connector.mysql.BinlogReader$1.nextEvent(BinlogReader.java:224)
    at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:922)
    ... 3 more"
The Log is the Source of Truth
$ confluent log connect
$ docker-compose logs kafka-connect
$ cat /var/log/kafka/connect.log
Troubleshooting Kafka® Connect
“Task is being killed and will not recover until manually restarted”
Symptom, not Cause
Troubleshooting Kafka® Connect
(Log excerpt highlighting two separate errors: Error #1 and Error #2)
Common Errors with Kafka® Connect
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Mismatched Converters
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
(Diagram: the topic holds JSON messages, but the sink is configured with "value.converter": "AvroConverter"; the messages are not AVRO)
Use the correct Converter for the source data
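A sketch of the fix in this case (assuming the topic really holds schemaless JSON):

"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false"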
Mixed Serialisation Methods
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
(Diagram: the topic holds a mix of AVRO and JSON messages, but the sink is configured with "value.converter": "AvroConverter"; some messages are not AVRO)
Use error handling to deal with bad messages
Error Handling & Dead Letter Queues (DLQ)
https://cnfl.io/connect-dlq
Handled
● Convert (read/write from Kafka®, [de]-serialisation)
● Transform
Not Handled
● Start (connections to a data store)
● Poll / Put (read/write from/to the data store)*
* Can be retried by Kafka® Connect
Fail Fast
(Diagram: Source Topic Messages → Sink Messages; with the default settings, a bad message fails the task)
You Only Live Once...
(Diagram: Source Topic Messages → Sink Messages; bad messages are skipped)
errors.tolerance=all
Dead Letter Queue
(Diagram: Source Topic Messages → Sink Messages, with bad messages routed to a Dead Letter Queue topic)
errors.tolerance=all
errors.deadletterqueue.topic.name=my_dlq
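A fuller sketch of the sink-side error-handling settings (values are illustrative):

errors.tolerance=all
errors.deadletterqueue.topic.name=my_dlq
errors.deadletterqueue.topic.replication.factor=1
errors.deadletterqueue.context.headers.enable=true
errors.log.enable=true
errors.log.include.messages=true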
Re-processing the Dead Letter Queue
(Diagram: the AVRO Sink reads the source topic; messages it cannot process land on the Dead Letter Queue, which a JSON Sink then reads and writes on to the target)
Schema where are You...
No fields found using key and value schemas for table: foo-bar
JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields
Schema where are You...
Schema (from Schema Registry):
$ http localhost:8081/subjects[...] | jq '.schema|fromjson'
{
  "type": "record",
  "name": "Value",
  "namespace": "asgard.demo.ORDERS",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "order_id", "type": ["null", "int"], "default": null },
    { "name": "customer_id", "type": ["null", "int"], "default": null },
Messages (AVRO):
$ kafkacat -b localhost:9092 -C -t mysql-debezium-asgard.demo.ORDERS
QF@Land RoverDefender 90SheffieldSwift LLC,54258 Michigan Parkway(2019-05-09T13:42:28Z(2019-05-09T13:42:28Z
Schema where are You...
Messages (JSON):
{
  "id": 7,
  "order_id": 7,
  "customer_id": 849,
  "order_total_usd": 98062.21,
  "make": "Pontiac",
  "model": "Aztek",
  "delivery_city": "Leeds",
  "delivery_company": "Price-Zieme",
  "delivery_address": "754 Becker Way",
  "CREATE_TS": "2019-05-09T13:42:28Z",
  "UPDATE_TS": "2019-05-09T13:42:28Z"
}
No Schema!
No fields found using key and value schemas for table: foo-bar
Schema where are You...
You need a schema!
Use Avro, or use JSON with schemas.enable=true
Either way, you need to re-configure the producer of the data
Or, use KSQL to impose a schema on the JSON/CSV data and re-serialise it to Avro 👍🏻
No fields found using key and value schemas for table: foo-bar
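A minimal sketch of the KSQL approach (stream and column names are illustrative, not from the slides):

-- Impose a schema on the JSON topic, then re-serialise it to Avro
CREATE STREAM ORDERS_JSON (ID INT, ORDER_ID INT, CUSTOMER_ID INT)
  WITH (KAFKA_TOPIC='orders-json', VALUE_FORMAT='JSON');
CREATE STREAM ORDERS_AVRO WITH (VALUE_FORMAT='AVRO') AS
  SELECT * FROM ORDERS_JSON;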
Schema where are You...
Messages (JSON, with the schema embedded per message):
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int32",
        "optional": false,
        "field": "id"
      },
      [...]
    ],
    "optional": true,
    "name": "asgard.demo.ORDERS.Value"
  },
  "payload": {
    "id": 611,
    "order_id": 111,
    "customer_id": 851,
    "order_total_usd": 182190.93,
    "make": "Kia",
    "model": "Sorento",
    "delivery_city": "Swindon",
    "delivery_company": "Lehner, Kuvalis and Schaefer",
    "delivery_address": "31 Cardinal Junction",
    "CREATE_TS": "2019-05-09T13:54:59Z",
    "UPDATE_TS": "2019-05-09T13:54:59Z"
  }
}
JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields
25% OFF Standard Priced Conference Pass
Confluent Community Discount Code: KS19Meetup
11 November 2019, Steigenberger Frankfurter Hof - cnfl.io/cse19frankfurt
13 November 2019, NOVOTEL Zürich City West - cnfl.io/cse19zurich
Speakers: Ben Stopford (Office of the CTO, Confluent), Axel Löhn (Senior Project Manager, Deutsche Bahn), Kai Waehner (Technologist, Confluent), Ralph Debusmann (IoT Solution Architect, Bosch Power Tools)