How to integrate your database with Kafka & CDC
Abdullah Zidan
About
We are talking about streaming integration of data, not just bulk static copies of the data.
Two Options
● using the JDBC connector for Kafka Connect
● using a log-based Change Data Capture (CDC) tool which integrates with
Kafka Connect.
Kafka Connect
The Kafka Connect API is a core component of Apache
Kafka, introduced in version 0.9. It provides scalable and
resilient integration between Kafka and other systems.
It runs separately from the Kafka brokers.
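As a quick illustration, the sketch below talks to a Connect worker through its REST API (the worker is assumed to be listening on the default localhost:8083) and prints the state of every registered connector:

```python
import json
import urllib.request

# Each Kafka Connect worker exposes a REST API (port 8083 by default).
CONNECT_URL = "http://localhost:8083"  # assumed local worker

def list_connectors():
    """Return the names of all connectors registered on the Connect cluster."""
    with urllib.request.urlopen(f"{CONNECT_URL}/connectors") as resp:
        return json.loads(resp.read())

def connector_status(name):
    """Return the running state of a connector and its tasks."""
    with urllib.request.urlopen(f"{CONNECT_URL}/connectors/{name}/status") as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    for name in list_connectors():
        print(name, connector_status(name)["connector"]["state"])
```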
Connectors
● Scale-out of ingest and egress across nodes for greater throughput
● Automatic restart and failover of tasks in the event of node failure
● Automatic offset management
● Automatic preservation of source data schema
● Utilisation of data’s schema to create target objects (e.g. Hive tables when streaming to HDFS, RDBMS
tables when streaming to a database)
● Schema evolution and compatibility support (in conjunction with the Confluent Schema Registry)
● Automatic serialisation and deserialisation of data
● Single Message Transformations
● Exactly once processing semantics (on supported connectors)
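To show how some of these features surface in practice, here is a minimal sketch of Single Message Transformation settings that could be merged into any connector's configuration. The transform classes are the stock ones shipped with Apache Kafka; the field and topic values are purely illustrative.

```python
# Single Message Transformations (SMTs) applied to every record a connector handles.
smt_config = {
    # Chain of transformation aliases, applied in order.
    "transforms": "addSource,route",

    # InsertField adds a static field recording where the record came from.
    "transforms.addSource.type": "org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.addSource.static.field": "source_system",
    "transforms.addSource.static.value": "orders-db",  # illustrative value

    # RegexRouter rewrites the topic name, e.g. to add an environment prefix.
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "(.*)",
    "transforms.route.replacement": "prod-$1",
}
```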
JDBC plugin for Kafka Connect
The Confluent JDBC Connector for Kafka Connect enables you to
stream data to and from Kafka and any RDBMS that supports JDBC.
It can stream entire schemas or just individual tables.
It can pull the entire contents (bulk), or do an incremental fetch of data that’s changed since the last poll
using a numeric key column, an update timestamp, or both.
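The following is a hedged sketch of registering a JDBC source connector in incremental mode through the Connect REST API. The connector name, connection URL, table, and column names are placeholders, and the Connect worker is assumed to be on localhost:8083.

```python
import json
import urllib.request

# Configuration for the Confluent JDBC source connector.
# "timestamp+incrementing" mode uses both an update timestamp and a numeric
# key column to fetch only rows changed since the last poll.
jdbc_source = {
    "name": "jdbc-orders-source",                            # illustrative name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db:5432/shop",  # placeholder URL
        "connection.user": "kafka",
        "connection.password": "secret",
        "table.whitelist": "orders",          # stream just one table
        "mode": "timestamp+incrementing",
        "incrementing.column.name": "id",
        "timestamp.column.name": "updated_at",
        "topic.prefix": "jdbc-",              # topic becomes jdbc-orders
        "poll.interval.ms": "5000",
    },
}

# Register the connector with the Kafka Connect REST API (assumed local worker).
req = urllib.request.Request(
    "http://localhost:8083/connectors",
    data=json.dumps(jdbc_source).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```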
CDC Options
● Query-based CDC (poll the tables with SQL, as the JDBC connector does)
● Log-based CDC (read changes directly from the database's transaction log)
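As one example of the log-based approach, Debezium is a commonly used CDC tool that plugs into Kafka Connect. The sketch below registers its PostgreSQL connector, which reads the database's write-ahead log instead of polling tables. Host, credentials, database, and table names are placeholders, and property names follow recent Debezium releases, so they may differ slightly between versions.

```python
import json
import urllib.request

# A log-based CDC source: Debezium's PostgreSQL connector streams changes
# from the write-ahead log into Kafka topics.
debezium_source = {
    "name": "debezium-shop-source",                          # illustrative name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "db",                           # placeholder host
        "database.port": "5432",
        "database.user": "kafka",
        "database.password": "secret",
        "database.dbname": "shop",
        "plugin.name": "pgoutput",            # logical decoding plugin
        "topic.prefix": "shop",               # topics become shop.<schema>.<table>
        "table.include.list": "public.orders",
    },
}

# Register it with the same Kafka Connect REST API used for the JDBC connector.
req = urllib.request.Request(
    "http://localhost:8083/connectors",
    data=json.dumps(debezium_source).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```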