Integrating Apache Kafka with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren’t working. This talk will discuss the key design concepts within Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We’ll do a live demo of building pipelines with Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we’ll go hands-on in methodically diagnosing and resolving common issues encountered with Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Kafka Connect in containers.
From Zero to Hero with Kafka Connect
@rmoff #kafkasummit
Streaming Integration with Kafka Connect

[Diagram: sources such as syslog stream through Kafka Connect into the Kafka brokers, and Kafka Connect streams data out of Kafka to sinks such as Amazon S3 and Google BigQuery.]
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url": "jdbc:mysql://asgard:3306/demo",
  "table.whitelist": "sales,orders,customers"
}
https://docs.confluent.io/current/connect/
Look Ma, No Code!
Streaming Pipelines

[Diagram: RDBMS → Kafka Connect → Kafka → Kafka Connect → Amazon S3 / HDFS]
Kafka Connect
Writing to data stores from Kafka

[Diagram: App → Kafka → Kafka Connect → Data Store]
Evolve processing from old systems to new

[Diagram: an existing app writes to an RDBMS; Kafka Connect streams the database changes into Kafka so a new app can consume them without a direct dependency on the old system.]
REST API - tricks
http://go.rmoff.net/connector-status
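The trick behind that link is a one-liner against the Connect REST API that lists connectors and flags any failed tasks. As a rough illustration of the same idea, here is a Python sketch that applies that filter to a sample response shaped like `GET /connectors?expand=status` (the connector names and data below are made up, and the response shape is an assumption for illustration):

```python
import json

# Sample response shaped like GET /connectors?expand=status
# (connector names and states are hypothetical, for illustration only)
sample = json.loads("""
{
  "jdbc_source_mysql": {"status": {"name": "jdbc_source_mysql",
    "connector": {"state": "RUNNING"},
    "tasks": [{"id": 0, "state": "FAILED", "trace": "..."}]}},
  "es_sink": {"status": {"name": "es_sink",
    "connector": {"state": "RUNNING"},
    "tasks": [{"id": 0, "state": "RUNNING"}]}}
}
""")

def failed_tasks(status_by_connector):
    """Return (connector, task id) pairs for any task not in RUNNING state."""
    failures = []
    for name, body in status_by_connector.items():
        for task in body["status"]["tasks"]:
            if task["state"] != "RUNNING":
                failures.append((name, task["id"]))
    return failures

print(failed_tasks(sample))  # -> [('jdbc_source_mysql', 0)]
```

The same filtering is usually done on the command line with curl and jq; the Python version just makes the logic explicit.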
REST API - tricks
http://go.rmoff.net/selectively-delete-connectors
(h/t to @madewithtea)
Configuring Kafka Connect
Inside the API - connectors, transforms, converters
Kafka Connect basics

[Diagram: Source → Kafka Connect → Kafka]
Connectors

[Diagram: Source → Connector (inside Kafka Connect) → Kafka]
Connectors
"config": {
  [...]
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "connection.url": "jdbc:postgresql://postgres:5432/",
  "topics": "asgard.demo.orders"
}
Connectors

[Diagram: Source → Connector → Kafka Connect → Kafka. The connector pulls native data from the source system and turns it into Connect Records.]
Converters

[Diagram: Source → Connector → Converter → Kafka. The converter serialises each Connect Record into bytes[] for writing to Kafka.]
Serialisation & Schemas
Avro -> Confluent Schema Registry
Other formats: Protobuf, JSON, CSV
https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf
The Confluent Schema Registry

[Diagram: the source-side Kafka Connect writes Avro messages to Kafka and registers the Avro schema with the Schema Registry; the target-side Kafka Connect reads the Avro messages and retrieves the schema from the Schema Registry to deserialise them.]
Converters
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
Set as a global default per worker; can optionally be overridden per connector
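For instance, a worker defaulting to Avro (as above) could have one connector override its value converter to JSON; the per-connector keys mirror the worker property names. A hedged fragment (connector names illustrative):

```json
"config": {
  [...]
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": "false"
}
```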
What about internal converters?
value.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
key.internal.value.converter=org.apache.kafka.connect.json.JsonConverter
value.internal.value.converter=org.apache.kafka.connect.json.JsonConverter
key.internal.value.converter.bork.bork.bork=org.apache.kafka.connect.json.JsonConverter
key.internal.value.please.just.work.converter=org.apache.kafka.connect.json.JsonConverter
Single Message Transforms

[Diagram: Source → Connector → Transform(s) → Converter → Kafka]
Single Message Transforms
"config": {
  [...]
  "transforms": "addDateToTopic,labelFooBar",
  "transforms.addDateToTopic.type": "org.apache.kafka.connect.transforms.TimestampRouter",
  "transforms.addDateToTopic.topic.format": "${topic}-${timestamp}",
  "transforms.addDateToTopic.timestamp.format": "YYYYMM",
  "transforms.labelFooBar.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
  "transforms.labelFooBar.renames": "delivery_address:shipping_address"
}
The "transforms" list names the transforms to run, in order; each transform is then configured through its own "transforms.<alias>.*" properties.
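Conceptually, the labelFooBar transform above just renames fields in each record's value as it passes through. This Python sketch mimics what ReplaceField$Value's renames config does to a single record; it is a rough model for illustration, not the actual Java implementation:

```python
def replace_field_value(record_value, renames):
    """Rough model of ReplaceField$Value's 'renames' config:
    'old1:new1,old2:new2' renames matching keys and leaves the rest alone."""
    mapping = dict(pair.split(":") for pair in renames.split(","))
    return {mapping.get(field, field): value for field, value in record_value.items()}

record = {"order_id": 7, "delivery_address": "some street"}
print(replace_field_value(record, "delivery_address:shipping_address"))
# -> {'order_id': 7, 'shipping_address': 'some street'}
```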
Extensible

[Diagram: connectors, transforms, and converters are all pluggable.]
Confluent Hub
hub.confluent.io
The log is the source of truth
$ confluent log connect
$ docker-compose logs kafka-connect
$ cat /var/log/kafka/connect.log
Kafka Connect
Symptom, not Cause
"Task is being killed and will not recover until manually restarted"
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Mismatched converters
"value.converter": "AvroConverter"
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
The messages are not Avro -> use the correct Converter for the source data
Mixed serialisation methods
"value.converter": "AvroConverter"
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Some messages are not Avro -> use error handling to deal with bad messages
Error Handling and DLQ
Handled:
- Convert (read/write from Kafka, [de]serialisation)
- Transform
Not handled:
- Start (connections to a data store)
- Poll / Put (read/write from/to the data store)*
* can be retried by Connect
https://cnfl.io/connect-dlq
Fail Fast

[Diagram: by default, Kafka Connect stops the task on the first message it cannot process between the source topic and the sink.]
YOLO ¯\_(ツ)_/¯

[Diagram: with errors.tolerance=all, Kafka Connect skips messages it cannot process and keeps running.]

errors.tolerance=all
Dead Letter Queue

[Diagram: with a dead letter queue configured, Kafka Connect routes messages that fail to a separate Kafka topic instead of dropping them.]

errors.tolerance=all
errors.deadletterqueue.topic.name=my_dlq
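Putting these settings together, a sink connector's error-handling block might look like the following sketch; the topic name is illustrative, and the extra errors.* keys shown (logging and context headers, which record why each message failed) are standard Connect error-handling options:

```json
"config": {
  [...]
  "errors.tolerance": "all",
  "errors.deadletterqueue.topic.name": "my_dlq",
  "errors.deadletterqueue.topic.replication.factor": "1",
  "errors.deadletterqueue.context.headers.enable": "true",
  "errors.log.enable": "true",
  "errors.log.include.messages": "true"
}
```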
Re-processing the Dead Letter Queue

[Diagram: the primary Kafka Connect pipeline (Avro sink) sends failed messages to the dead letter queue; a second Kafka Connect pipeline (JSON sink) consumes from the dead letter queue to re-process them.]
No fields found using key and value schemas for table: foo-bar
No fields found using key and value schemas for table: foo-bar
JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields
No fields found using key and value schemas for table: foo-bar
JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields
Schema, where art thou?
No fields found using key and value schemas for table: foo-bar
Schema, where art thou? -> No schema!
Messages (JSON):
{
  "id": 7,
  "order_id": 7,
  "customer_id": 849,
  "order_total_usd": 98062.21,
  "make": "Pontiac",
  "model": "Aztek",
  "delivery_city": "Leeds",
  "delivery_company": "Price-Zieme",
  "delivery_address": "754 Becker W
  "CREATE_TS": "2019-05-09T13:42:28
  "UPDATE_TS": "2019-05-09T13:42:28
}
No fields found using key and value schemas for table: foo-bar
Schema, where art thou?
-> You need a schema!
-> Use Avro, or use JSON with schemas.enable=true
Either way, you need to re-configure the producer of the data.
Or, use KSQL to impose a schema on the JSON/CSV data and reserialise it to Avro!
https://www.confluent.io/blog/data-wrangling-apache-kafka-ksql
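If you control the producing side, the JSON-with-schema option corresponds to these standard converter settings, shown here in worker-properties form:

```properties
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
```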
JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields
Schema, where art thou?
Messages (JSON), schema embedded per message:
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int32",
        "optional": false,
        "field": "id"
      },
      [...]
    ],
    "optional": true,
    "name": "asgard.demo.ORDERS"
  },
  "payload": {
    "id": 611,
    "order_id": 111,
    "customer_id": 851,
    "order_total_usd": 1821
    "make": "Kia",
    "model": "Sorento",
    "delivery_city": "Swind
JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields
Schema, where art thou?
-> Your JSON must be structured as expected.
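A minimal Python sketch of the envelope the JsonConverter expects when schemas.enable=true: exactly two top-level fields, "schema" and "payload". The field values below are illustrative:

```python
import json

# The envelope JsonConverter requires when schemas.enable=true:
# a "schema" describing the types, and a "payload" carrying the data.
# (Field values are hypothetical, for illustration.)
message = {
    "schema": {
        "type": "struct",
        "fields": [
            {"type": "int32", "optional": False, "field": "id"},
            {"type": "string", "optional": True, "field": "delivery_city"},
        ],
        "optional": True,
        "name": "asgard.demo.ORDERS",
    },
    "payload": {"id": 611, "delivery_city": "Leeds"},
}

encoded = json.dumps(message)   # what actually lands on the topic
decoded = json.loads(encoded)

# No extra top-level fields allowed: only "schema" and "payload"
assert set(decoded) == {"schema", "payload"}
print(decoded["payload"]["id"])  # -> 611
```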
Using KSQL to apply schema to your data
JSON
Avro
Kafka
Connect
KSQL
CREATE STREAM ORDERS_JSON
(id INT,
order_total_usd DOUBLE,
delivery_city VARCHAR)
WITH (KAFKA_TOPIC='orders-json',
VALUE_FORMAT='JSON');
CREATE STREAM ORDERS_AVRO WITH (
VALUE_FORMAT='AVRO',
KAFKA_TOPIC='orders-avro')
AS SELECT * FROM ORDERS_JSON;
Kafka Connect images on Docker Hub
confluentinc/cp-kafka-connect-base
confluentinc/cp-kafka-connect (bundled connectors: kafka-connect-elasticsearch, kafka-connect-jdbc, kafka-connect-hdfs, […])
Adding connectors to a container

[Diagram: start from confluentinc/cp-kafka-connect-base and add connector JARs from Confluent Hub.]
At runtime

kafka-connect:
  image: confluentinc/cp-kafka-connect:5.2.1
  environment:
    CONNECT_PLUGIN_PATH: '/usr/share/java,/usr/share/confluent-hub-components'
  command:
    - bash
    - -c
    - |
      confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:1.0.0
      /etc/confluent/docker/run
http://rmoff.dev/ksln19-connect-docker
Build a new image

FROM confluentinc/cp-kafka-connect:5.2.1
ENV CONNECT_PLUGIN_PATH="/usr/share/java,/usr/share/confluent-hub-components"
RUN confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:1.0.0
Automating connector creation

# Download JDBC drivers
cd /usr/share/java/kafka-connect-jdbc/
curl https://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-8.0.13.tar.gz | tar xz
#
# Now launch Kafka Connect
/etc/confluent/docker/run &
#
# Wait for the Kafka Connect listener
while [ $$(curl -s -o /dev/null -w %{http_code} http://$$CONNECT_REST_ADVERTISED_HOST_NAME:$…
  echo -e $$(date) " Kafka Connect listener HTTP state: " $$(curl -s -o /dev/null -w %{http_…
  sleep 5
done
#
# Create JDBC Source connector
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
  "name": "jdbc_source_mysql_00",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://mysql:3306/demo",
    "connection.user": "connect_user",
    "connection.password": "asgard",
    "topic.prefix": "mysql-00-",
    "table.whitelist": "demo.customers"
  }
}'
#
# Don't let the container die
sleep infinity

http://rmoff.dev/ksln19-connect-docker
Confluent Community Components
Apache Kafka with a bunch of cool stuff! For free!

[Diagram: the Confluent Platform stack. Apache Kafka® (Core | Connect API | Streams API); Data Compatibility (Schema Registry); SQL Stream Processing (KSQL); Development and Connectivity (Clients | Connectors | REST Proxy | CLI); Monitoring & Administration (Confluent Control Center | Security); Operations (Replicator | Auto Data Balancing). It connects sources such as database changes, log events, IoT data, and web events with targets such as CRM, data warehouses, databases, Hadoop, monitoring, analytics, and custom real-time applications. Runs customer self-managed in the datacenter or public cloud, or Confluent fully managed in Confluent Cloud.]
http://cnfl.io/book-bundle
http://cnfl.io/slack
@rmoff #kafkasummit
Please rate my session in the Kafka Summit app :-)