Kafka Connect / Debezium
Stream MySQL events to Kafka
About me
Kasun Don
Software Engineer - London
AWIN AG | Eichhornstraße 3 | 10785 Berlin
Telephone +49 (0)30 5096910 | info@awin.com | www.awin.com
• Automation & DevOps enthusiast
• Hands-on Big Data engineering
• Open-source contributor
Why Stream MySQL Events (CDC)?
• Integrations with Legacy Applications
Avoid dual writes when integrating with legacy systems.
• Smart Cache Invalidation
Automatically invalidate entries in a cache as soon as the record(s) for entries change or are removed.
• Monitoring Data Changes
Immediately react to data changes committed by application/user.
• Data Warehousing
Propagate committed operations atomically to ETL-style solutions.
• Event Sourcing (CQRS)
A totally ordered stream of events updates the read-only views asynchronously, while writes are recorded as normal.
Apache Kafka
Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable,
and durable.
Producers → Kafka → Consumers
(multiple producers publish to the cluster; multiple consumers subscribe independently)
Kafka Connect
Connectors – The logical job definition responsible for managing the copying of data between Kafka and another system.
There are two types of connectors:
• Source connectors import data from another system into Kafka
• Sink connectors export data from Kafka to another system
Workers – The processes that schedule and execute connectors and tasks.
There are two main types of workers: standalone and distributed.
Tasks – The units of work that actually copy the data assigned to them by a connector.
The connector configuration sets the maximum number of tasks a connector may run.
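As a sketch of standalone mode, a single worker runs from one properties file plus one connector properties file; all values and file names below are illustrative, not fixed:

```properties
# worker.properties (standalone mode, illustrative values)
bootstrap.servers=kafka-broker:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Standalone mode keeps offsets in a local file rather than Kafka topics
offset.storage.file.filename=/tmp/connect.offsets
```

A worker would then typically be launched with `connect-standalone worker.properties my-connector.properties`; distributed mode instead stores configuration, offsets, and status in Kafka topics.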
Kafka Connect - Overview
Data Source → Kafka Connect → Kafka → Kafka Connect → Data Sink
Kafka Connect – Configuration
Common Connector Configuration
• name - Unique name for the connector. Attempting to register again with the same name will
fail.
• connector.class - The Java class for the connector
• tasks.max - The maximum number of tasks that should be created for this connector. The
connector may create fewer tasks if it cannot achieve this level of parallelism.
Please note that connector configuration might vary, see specific connector documentation for
more information.
Distributed Mode - Worker Configuration
bootstrap.servers - A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
group.id - A unique string that identifies the Connect cluster group this worker belongs to.
config.storage.topic - The topic to store connector and task configuration data in. This must be the same for all
workers with the same group.id.
offset.storage.topic - The topic to store offset data for connectors in. This must be the same for all workers with the
same group.id.
status.storage.topic - The name of the topic where connector and task status updates are stored.
For more distributed mode worker configuration : http://docs.confluent.io/current/connect/userguide.html#configuring-workers
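The distributed-mode settings listed above can be collected into a worker properties file; the topic names and group id below are examples only:

```properties
# connect-distributed.properties (illustrative values)
bootstrap.servers=kafka-broker:9092
group.id=kafka-connect-cluster-1
# These three topics must be identical for every worker sharing the group.id
config.storage.topic=kafka-connect-config
offset.storage.topic=kafka-connect-offset
status.storage.topic=kafka-connect-status
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
```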
Kafka Connect – Running an Instance
It is recommended to run Kafka Connect in containerized environments such as Kubernetes, Mesos, Docker Swarm, or
YARN.
Kafka Connect distributed mode exposes port 8083 by default to serve the management REST interface.
Kafka Connect does not automatically handle restarting or scaling workers which means your existing clustering solutions can continue to be used transparently. –
Confluent.io
$ docker run -d \
> --name=kafka-connect \
> --net=host \
> -e CONNECT_BOOTSTRAP_SERVERS="kafka-broker:9092" \
> -e CONNECT_GROUP_ID="group_1" \
> -e CONNECT_CONFIG_STORAGE_TOPIC="kafka-connect-config" \
> -e CONNECT_OFFSET_STORAGE_TOPIC="kafka-connect-offset" \
> -e CONNECT_STATUS_STORAGE_TOPIC="kafka-connect-status" \
> -e CONNECT_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
> -e CONNECT_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
> -e CONNECT_INTERNAL_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
> -e CONNECT_INTERNAL_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
> -e CONNECT_LOG4J_LOGGERS="io.debezium.connector.mysql=INFO" \
> -v /opt/kafka-connect/jars:/etc/kafka-connect/jars \
> --restart always \
> confluentinc/cp-kafka-connect:3.3.0
Debezium Connector
What is Debezium ?
Debezium is an open-source distributed platform for change data capture (CDC); its MySQL connector streams changes
from the server's row-level binary logs. Debezium is built on top of the Kafka Connect framework and gains fault
tolerance and high availability from the Apache Kafka ecosystem. It captures all row-level changes committed to each
database table, in the order they were committed.
Supported Databases
Debezium currently supports the following databases:
• MySQL
• MongoDB
• PostgreSQL
For more Information : http://debezium.io/docs/connectors/
Debezium Connector – MySQL Configuration
Enable binary logs
server-id = 1000001
log_bin = mysql-bin
binlog_format = row
binlog_row_image = full
expire_logs_days = 5
Optionally enable GTIDs (recommended for clustered topologies)
gtid_mode = on
enforce_gtid_consistency = on
MySQL user with sufficient privileges
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION
CLIENT ON *.* TO 'debezium' IDENTIFIED BY password';
Supported MySQL topologies
• MySQL standalone
• MySQL master and slave
• Highly Available MySQL clusters
• Multi-Master MySQL
• Hosted MySQL, e.g. Amazon RDS and Amazon Aurora
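One way to sanity-check the settings above on a running server, using standard MySQL statements (the expected values in the comments assume the configuration shown here):

```sql
-- Verify the binlog configuration Debezium depends on
SHOW VARIABLES LIKE 'log_bin';          -- expect ON
SHOW VARIABLES LIKE 'binlog_format';    -- expect ROW
SHOW VARIABLES LIKE 'binlog_row_image'; -- expect FULL
SHOW VARIABLES LIKE 'gtid_mode';        -- ON only if GTIDs were enabled
```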
Debezium Connector – MySQL Connector Configuration
Example Configuration
{
  "name": "example-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "127.0.0.1",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "mysql-example",
    "database.whitelist": "db1",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "dbhistory.mysql-example"
  }
}
For more configuration : http://debezium.io/docs/connectors/mysql/
Debezium Connector – Add Connector to Kafka Connect
For more configuration : http://debezium.io/docs/connectors/mysql/
More REST Endpoints : https://docs.confluent.io/current/connect/managing.html#using-the-rest-interface
List Available Connector plugins
$ curl -s http://kafka-connect:8083/connector-plugins
[
{
"class": "io.confluent.connect.jdbc.JdbcSinkConnector"
},
{
"class": "io.confluent.connect.jdbc.JdbcSourceConnector"
},
{
"class": "io.debezium.connector.mysql.MySqlConnector"
},
{
"class": "org.apache.kafka.connect.file.FileStreamSinkConnector"
},
{
"class": "org.apache.kafka.connect.file.FileStreamSourceConnector"
}
]
Add connector
$ curl -s -X POST -H "Content-Type: application/json" --data @connector-config.json http://kafka-connect:8083/connectors
Remove connector
$ curl -X DELETE http://kafka-connect:8083/connectors/example-connector
Debezium Connector – Sample CDC Events
INSERT - the new row is carried in "after" ("op": "c"):
{
  "schema": {},
  "payload": {
    "before": null,
    "after": {
      "id": 1004,
      "first_name": "Anne Marie",
      "last_name": "Kretchmar",
      "email": "annek@noanswer.org"
    },
    "source": {
      "name": "mysql-server-1",
      "server_id": 223344,
      "ts_sec": 1465581,
      "gtid": null,
      "file": "mysql-bin.000003",
      "pos": 805,
      "row": 0,
      "snapshot": null
    },
    "op": "c",
    "ts_ms": 1465581902461
  }
}
DELETE - the old row is carried in "before" ("op": "d"):
{
  "schema": {},
  "payload": {
    "before": {
      "id": 1004,
      "first_name": "Anne Marie",
      "last_name": "Kretchmar",
      "email": "annek@noanswer.org"
    },
    "after": null,
    "source": {
      "name": "mysql-server-1",
      "server_id": 223344,
      "ts_sec": 1465889,
      "gtid": null,
      "file": "mysql-bin.000003",
      "pos": 806,
      "row": 0,
      "snapshot": null
    },
    "op": "d",
    "ts_ms": 1465581902500
  }
}
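A consumer of these events can branch on the "op" field of the payload. A minimal sketch in Python using only the standard library; the handler name and the sample string are illustrative, with the sample modeled on the delete event above:

```python
import json

def handle_event(raw: str) -> str:
    """Classify a Debezium MySQL CDC event and summarize the affected row."""
    payload = json.loads(raw)["payload"]
    op = payload["op"]  # "c" = create, "u" = update, "d" = delete
    if op == "c":
        return f"INSERT id={payload['after']['id']}"
    if op == "u":
        return f"UPDATE id={payload['after']['id']}"
    if op == "d":
        # Deletes carry the old row state in "before"; "after" is null
        return f"DELETE id={payload['before']['id']}"
    return f"UNKNOWN op={op}"

# Sample payload shaped like the DELETE event shown above
sample = json.dumps({
    "payload": {
        "before": {"id": 1004, "email": "annek@noanswer.org"},
        "after": None,
        "op": "d",
        "ts_ms": 1465581902500,
    }
})
print(handle_event(sample))  # DELETE id=1004
```

In practice the raw string would come from a Kafka consumer subscribed to the connector's topic; the dispatch logic stays the same.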
Useful Links
Kafka Connect – User Guide
http://docs.confluent.io/2.0.0/connect/userguide.html
Debezium – Interactive tutorial
http://debezium.io/docs/tutorial/
Debezium – MySQL connector
http://debezium.io/docs/connectors/mysql/
Kafka Connect – REST Endpoints
http://docs.confluent.io/2.0.0/connect/userguide.html#rest-interface
Debezium Support/User Group
User :: https://gitter.im/debezium/user
Dev :: https://gitter.im/debezium/dev
Kafka Connect – Connectors
https://www.confluent.io/product/connectors/
Q & A
Thank you
http://linkedin.com/in/kasundon
