Technology Choices for Kafka and Change Data Capture
Kate Stanley and Andrew Schofield
Apache Kafka London Meetup October 2019
IBM Event Streams / Apache Kafka
© 2019 IBM Corporation 2
Change Data Capture identifies and captures the changes to a data store as a stream of Kafka events
Point-to-point data integration
[Diagram: the master database integrated point-to-point with each consumer: recovery database, audit log, query cache and application]
It’s publish/subscribe for data
[Diagram: master database, recovery database, audit log, query cache and application, connected through publish/subscribe]
Technology choices
These different approaches have all been used successfully
1. Data store natively generates a feed of changes
2. Repeated queries, with optimization or restrictions
3. Log scanning
Why use Kafka with CDC?
Kafka has lots of connectors to other systems
It acts as a buffer, loosening coupling between source and target
Publish/subscribe, instead of point-to-point
Makes it easy to process the CDC stream as events in Kafka client application code
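To make the last point concrete, here is a minimal Python sketch of client application code that keeps a query cache up to date from a CDC stream. It assumes the events have already been read from Kafka by a consumer and that they use a hypothetical {"op", "key", "row"} shape invented for this example; each real CDC connector defines its own record format.

```python
from typing import Any, Dict, Iterable

# Hypothetical change-event shape used only for this sketch:
#   {"op": "insert" | "update" | "delete", "key": <row id>, "row": <new row>}
ChangeEvent = Dict[str, Any]
Cache = Dict[Any, Dict[str, Any]]


def apply_changes(cache: Cache, events: Iterable[ChangeEvent]) -> Cache:
    """Keep an in-memory query cache in step with a stream of change events."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            cache[key] = event["row"]  # the latest row image wins
        elif op == "delete":
            cache.pop(key, None)  # ignore deletes for keys never seen
        else:
            raise ValueError(f"unknown op: {op!r}")
    return cache


if __name__ == "__main__":
    events = [
        {"op": "insert", "key": 0, "row": {"name": "John Smith", "amount": 20}},
        {"op": "insert", "key": 1, "row": {"name": "Daisy Williams", "amount": 25}},
        {"op": "update", "key": 0, "row": {"name": "John Smith", "amount": 30}},
        {"op": "delete", "key": 1},
    ]
    print(apply_changes({}, events))  # {0: {'name': 'John Smith', 'amount': 30}}
```

Because the cache only ever applies events in the order they arrive, it relies on Kafka preserving order per key (per partition), which is one reason CDC connectors usually key records by the row's primary key.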
Kafka Connect JDBC source
JDBC source connector
Uses JDBC to connect to any compliant relational database, e.g. Oracle, Microsoft SQL Server, Db2, MySQL and PostgreSQL
Requires a Kafka Connect runtime
Can bulk copy tables with any columns
To receive only the changes, tables need particular columns (an incrementing ID, a timestamp, or both)
Open source: https://github.com/confluentinc/kafka-connect-jdbc
Configuring the JDBC connector
$ curl -X PUT -H "Content-Type: application/json" \
    -d '{"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector"}' \
    http://localhost:8083/connector-plugins/JdbcSourceConnector/config/validate
Required config options:
name
connector.class
connection.url – JDBC connection URL
topic.prefix – prefix to prepend to table names to form topic names
mode – bulk, incrementing, timestamp or timestamp+incrementing
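As a sketch of how these options fit together, the hypothetical Python helper below assembles the JSON body for creating a JDBC source connector through the Kafka Connect REST API (POST /connectors expects {"name": ..., "config": {...}}). It also checks the column settings each mode needs, as described on the mode slides. The connector name, JDBC URL, topic prefix and "id" column are illustrative assumptions; the connector itself validates many more settings than this helper does.

```python
import json
from typing import Dict, Optional

# Extra column settings each mode needs (from the mode descriptions in this talk)
REQUIRED_BY_MODE = {
    "bulk": [],
    "incrementing": ["incrementing.column.name"],
    "timestamp": ["timestamp.column.name"],
    "timestamp+incrementing": ["timestamp.column.name", "incrementing.column.name"],
}


def jdbc_source_request(name: str, connection_url: str, topic_prefix: str,
                        mode: str, extra: Optional[Dict[str, str]] = None) -> str:
    """Build a JSON body for creating a JDBC source connector (hypothetical helper)."""
    if mode not in REQUIRED_BY_MODE:
        raise ValueError(f"unknown mode {mode!r}")
    config = {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": connection_url,
        "topic.prefix": topic_prefix,
        "mode": mode,
    }
    config.update(extra or {})
    missing = [key for key in REQUIRED_BY_MODE[mode] if key not in config]
    if missing:
        raise ValueError(f"mode {mode!r} also needs: {', '.join(missing)}")
    # Kafka Connect's POST /connectors expects {"name": ..., "config": {...}}
    return json.dumps({"name": name, "config": config}, indent=2)


if __name__ == "__main__":
    print(jdbc_source_request(
        "jdbc-customers",                    # hypothetical connector name
        "jdbc:mysql://localhost:3306/demo",  # hypothetical database URL
        "mysql-",                            # topics become mysql-<table>
        "incrementing",
        {"incrementing.column.name": "id"},  # hypothetical ID column
    ))
```

Assuming the printed body is saved to body.json, it could be submitted with curl -X POST -H "Content-Type: application/json" --data @body.json http://localhost:8083/connectors.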
Incrementing mode
Use a strictly incrementing column on each table to detect only new rows.
Requires incrementing.column.name to be set
Does not detect modifications or deletions of existing rows
ID column must be present on all tables
Identifier must be in a single column

id  First name  Surname   Amount
0   John        Smith     20
1   Daisy       Williams  25
2   Laura       Thomas    15
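The behaviour of incrementing mode can be simulated with Python's built-in sqlite3 module. This sketch stores the highest id seen as the offset and asks only for rows above it, which shows why updates and deletes go unnoticed. The table name, column names and the extra "Sam Jones" row are hypothetical, and the SQL is a simplification, not the connector's exact query.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str, str, int]


def make_customers() -> sqlite3.Connection:
    """In-memory table holding the three sample rows from the slide."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, "
                 "first_name TEXT, surname TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)",
                     [(0, "John", "Smith", 20),
                      (1, "Daisy", "Williams", 25),
                      (2, "Laura", "Thomas", 15)])
    return conn


def poll_incrementing(conn: sqlite3.Connection, last_id: int) -> Tuple[List[Row], int]:
    """One polling cycle: fetch rows whose id is above the stored offset."""
    rows = conn.execute("SELECT id, first_name, surname, amount FROM customers "
                        "WHERE id > ? ORDER BY id", (last_id,)).fetchall()
    return rows, (rows[-1][0] if rows else last_id)


if __name__ == "__main__":
    conn = make_customers()
    rows, offset = poll_incrementing(conn, -1)  # all three rows, offset 2
    conn.execute("UPDATE customers SET amount = 99 WHERE id = 0")  # not detected
    conn.execute("DELETE FROM customers WHERE id = 1")             # not detected
    conn.execute("INSERT INTO customers VALUES (3, 'Sam', 'Jones', 10)")
    rows, offset = poll_incrementing(conn, offset)
    print(rows)  # [(3, 'Sam', 'Jones', 10)]
```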
Timestamp mode
Use a timestamp column to detect new and modified rows.

timestamp            First name  Surname   Amount
2019-10-09 18:10:15  John        Smith     20
2019-10-09 18:17:36  Daisy       Williams  25
2019-10-09 18:57:12  Laura       Thomas    15

Requires timestamp.column.name to be set
Timestamp column must be updated with each write
Timestamp column must be monotonically increasing
Timestamp column must be present on all tables
Does not guarantee that all updated data is delivered, since timestamps aren’t unique
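A small sqlite3 simulation shows why non-unique timestamps can lose data. In this sketch the offset is the latest timestamp seen; a second row that becomes visible after a poll but carries the same timestamp (for example because its transaction committed late) is never returned. The table layout and the shared timestamp for the second row are hypothetical, and the SQL is simplified compared with the real connector.

```python
import sqlite3


def make_ts_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (ts TEXT, first_name TEXT, "
                 "surname TEXT, amount INTEGER)")
    return conn


def poll_timestamp(conn: sqlite3.Connection, last_ts: str):
    """One polling cycle: fetch rows whose timestamp is after the stored offset.

    Timestamps are 'YYYY-MM-DD HH:MM:SS' strings, which sort correctly as text.
    """
    rows = conn.execute("SELECT ts, first_name, surname, amount FROM customers "
                        "WHERE ts > ? ORDER BY ts", (last_ts,)).fetchall()
    return rows, (rows[-1][0] if rows else last_ts)


if __name__ == "__main__":
    conn = make_ts_table()
    conn.execute("INSERT INTO customers VALUES "
                 "('2019-10-09 18:10:15', 'John', 'Smith', 20)")
    rows, offset = poll_timestamp(conn, "")  # picks up John; offset is 18:10:15
    # A second row becomes visible after the poll with the same timestamp
    conn.execute("INSERT INTO customers VALUES "
                 "('2019-10-09 18:10:15', 'Daisy', 'Williams', 25)")
    rows, offset = poll_timestamp(conn, offset)
    print(rows)  # [] : Daisy's row is never delivered
```

Updates are still detected as long as each write moves the timestamp forward, which is why the column must be updated with each write.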
Timestamp+Incrementing mode
Uses both a timestamp column and an incrementing id column.
Detects new and updated rows.
More robust than timestamp alone, since the combination of id and timestamp should be unique.

timestamp            id  First name  Surname   Amount
2019-10-09 18:10:15  0   John        Smith     20
2019-10-09 18:17:36  1   Daisy       Williams  25
2019-10-09 18:57:12  2   Laura       Thomas    15
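Using the (timestamp, id) pair as the offset closes the gap from timestamp mode: a row with the same timestamp but a higher id is still picked up. This sqlite3 sketch shows the idea with a simplified predicate; the connector's actual SQL may differ, for example by also adding an upper bound on the timestamp. Table and column names are hypothetical.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[str, int, str, str, int]


def make_ts_id_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (ts TEXT, id INTEGER, first_name TEXT, "
                 "surname TEXT, amount INTEGER)")
    return conn


def poll_ts_inc(conn: sqlite3.Connection, last_ts: str,
                last_id: int) -> Tuple[List[Row], str, int]:
    """One polling cycle using the (timestamp, id) pair as the offset.

    A row is new if its timestamp is later than the offset, or if it has the
    same timestamp but a higher id. Ordering by (ts, id) keeps the offset valid.
    """
    rows = conn.execute(
        "SELECT ts, id, first_name, surname, amount FROM customers "
        "WHERE ts > ? OR (ts = ? AND id > ?) ORDER BY ts, id",
        (last_ts, last_ts, last_id)).fetchall()
    if rows:
        last_ts, last_id = rows[-1][0], rows[-1][1]
    return rows, last_ts, last_id


if __name__ == "__main__":
    conn = make_ts_id_table()
    conn.execute("INSERT INTO customers VALUES "
                 "('2019-10-09 18:10:15', 0, 'John', 'Smith', 20)")
    rows, ts, last_id = poll_ts_inc(conn, "", -1)
    # Same timestamp as John, but a higher id, so it is still delivered
    conn.execute("INSERT INTO customers VALUES "
                 "('2019-10-09 18:10:15', 1, 'Daisy', 'Williams', 25)")
    rows, ts, last_id = poll_ts_inc(conn, ts, last_id)
    print(rows)  # [('2019-10-09 18:10:15', 1, 'Daisy', 'Williams', 25)]
```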
JDBC source connector
LICENSE: Confluent Community License Agreement Version 1.0
Building the JDBC connector from source
1. git clone https://github.com/confluentinc/kafka-connect-jdbc.git
2. cd kafka-connect-jdbc
3. Edit the pom.xml:
   a) Comment out the Confluent parts of the pom.xml
   b) Add a version
   c) Comment out checkstyle
   d) Add Java 8 enforcement
   e) Add versions for dependencies
4. mvn install -DskipTests
Building the JDBC connector from source
1. git clone https://github.com/confluentinc/kafka.git (Apache 2.0 license)
2. cd kafka
   gradle
   ./gradlew installAll
3. git clone https://github.com/confluentinc/common.git (Apache 2.0 license)
4. cd common
   mvn install
5. git clone https://github.com/confluentinc/kafka-connect-jdbc.git (Confluent Community license)
6. cd kafka-connect-jdbc
   mvn install
Running the JDBC connector
Check that the JDBC driver has been loaded (drivers for SQLite and PostgreSQL are included by default):
1. Increase the log level to DEBUG
2. Check that the JDBC driver JAR is in the 'Loading plugin urls' list
3. Check for an 'Added plugin' line immediately after
Example, adding the MySQL driver to the classpath:
CLASSPATH=/Users/katherinestanley/connectors/mysql-connector-java-8.0.17.jar ./bin/connect-distributed.sh config/connect-distributed.properties
https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector
Debezium
Debezium
Debezium is an open-source platform for change data capture using Kafka Connect
Supports MySQL, MongoDB, PostgreSQL and SQL Server; incubating: Oracle, Cassandra, Db2 (soon)
Each supported database has its own connector code
Underlying technology depends on the database
MySQL uses log scanning, SQL Server uses special CDC tables created by the database, …
Open source: https://github.com/debezium/debezium
Proper open licence: Apache 2.0
Debezium – log scanning
[Diagram: Debezium, running in a Kafka Connect worker, reads the database log (T1 Ins, T2 Upd, T1 Ins, T2 Ins, T1 Del, T1 Cmt, T2 Pre, T2 Cmt) and publishes the row changes (Ins Upd Ins Ins Del)]
Debezium MySQL
Uses log scanning – requires configuration of row-based binary logs
WRITE_ROWS for row insert
UPDATE_ROWS for row update
DELETE_ROWS for row delete
QUERY for all kinds of miscellaneous stuff, including transaction commit
Nice and efficient, but connector code is very specific to MySQL internal details
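The translation from binlog row events to change records can be sketched in a few lines of Python. This is a toy model, not Debezium's implementation: the event tuples are hypothetical, and transaction boundaries, schemas and multi-row events are ignored. The op codes c, u and d are the ones Debezium uses in its change events.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

Row = Optional[Dict[str, Any]]
# Hypothetical, simplified binlog event: (event_type, before_image, after_image).
# Real MySQL row events also carry table ids, column metadata and several rows.
BinlogEvent = Tuple[str, Row, Row]

# Debezium change-event "op" codes: c = create, u = update, d = delete
OP_CODES = {"WRITE_ROWS": "c", "UPDATE_ROWS": "u", "DELETE_ROWS": "d"}


def to_change_events(events: Iterable[BinlogEvent]) -> List[Dict[str, Any]]:
    """Turn row events into change records; other event types are skipped here."""
    changes = []
    for event_type, before, after in events:
        op = OP_CODES.get(event_type)
        if op is None:
            continue  # e.g. QUERY: no row change in this simplified model
        changes.append({"op": op, "before": before, "after": after})
    return changes


if __name__ == "__main__":
    log = [
        ("WRITE_ROWS", None, {"id": 0, "amount": 20}),
        ("UPDATE_ROWS", {"id": 0, "amount": 20}, {"id": 0, "amount": 30}),
        ("QUERY", None, None),
        ("DELETE_ROWS", {"id": 0, "amount": 30}, None),
    ]
    print([change["op"] for change in to_change_events(log)])  # ['c', 'u', 'd']
```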
Database replication
[Diagram: in the source database, a capture program reads the database log (T1 Ins, T2 Upd, T1 Ins, T2 Ins, T1 Del, T1 Cmt, T2 Pre, T2 Cmt) and records changes to the source table in a change data table; in the target database, an apply program writes those changes to the target table]
Debezium – replication tables
[Diagram: in the source database, a capture program reads the database log (T1 Ins, T2 Upd, T1 Ins, T2 Ins, T1 Del, T1 Cmt, T2 Pre, T2 Cmt) and records changes to the source table in a change data table; Debezium, running in a Kafka Connect worker, reads the change data table and publishes the changes (Ins Upd Ins Ins Del)]
How can I try it?
Try the totally excellent Docker-based tutorial
https://debezium.io/documentation/reference/0.10/tutorial.html
Record formatting
The default is comprehensive and very verbose
{
  "schema" : {
  },
  "payload" : {
    "op": "u",
    "source": {
      ...
    },
    "ts_ms" : "...",
    "before" : {
      "field1" : "oldvalue1",
      "field2" : "oldvalue2"
    },
    "after" : {
      "field1" : "newvalue1",
      "field2" : "newvalue2"
    }
  }
}
Record formatting
Just use the provided ExtractNewRecordState SMT (single message transform) to reduce each verbose record to the new row state:
{
  "field1" : "newvalue1",
  "field2" : "newvalue2"
}
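The effect of ExtractNewRecordState on the JSON form shown above can be approximated in Python. This sketch is not the SMT's code: it works on a plain dict rather than a Kafka Connect record, and it simply returns None for deletes to stand in for the SMT's default of dropping delete events (the real SMT has options for deletes and tombstones, so check the Debezium documentation for your version).

```python
from typing import Any, Dict, Optional


def extract_new_record_state(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Approximate ExtractNewRecordState on the JSON form of one Debezium record.

    Creates ("c"), updates ("u") and snapshot reads ("r") are reduced to the
    "after" row image. Deletes ("d") return None in this sketch.
    """
    payload = record["payload"]
    if payload["op"] == "d":
        return None
    return payload["after"]


if __name__ == "__main__":
    envelope = {
        "schema": {},
        "payload": {
            "op": "u",
            "before": {"field1": "oldvalue1", "field2": "oldvalue2"},
            "after": {"field1": "newvalue1", "field2": "newvalue2"},
        },
    }
    print(extract_new_record_state(envelope))
    # {'field1': 'newvalue1', 'field2': 'newvalue2'}
```

In a connector configuration, the real SMT is typically enabled with transforms=unwrap and transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState, where "unwrap" is just an alias of your choosing.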
IBM InfoSphere Data Replication
IBM InfoSphere Data Replication
Enterprise-grade CDC built exclusively on log scanning
Focus on performance and transactionality
Can be customised with user code
Does not use Kafka Connect, because it needs tighter control over publishing
IIDR architecture
[Diagram: on the source server, the CDC source engine reads the database log (T1 Ins, T2 Upd, T1 Ins, T2 Ins, T1 Del, T1 Cmt, T2 Pre, T2 Cmt) and sends the changes (Ins Upd Ins Ins Del) to the CDC target engine on the target server, which parses, transforms and publishes them through writers; a management console is also shown]
Four Time-Interleaved Source Database Transactions
Transaction 1: Op1(Tab2) Op2(Tab3) Op3(Tab2) Commit
Transaction 2: Op1(Tab2) Op2(Tab2) Op3(Tab3) Op4(Tab2) Commit
Transaction 3: Op1(Tab1) Commit
Transaction 4: Op1(Tab1) Commit
TIME →
Transactionally Consistent Consumer
Recreates the order of operations in the source database across multiple topics and partitions, with no duplicates
Uses a "commitstream" topic to maintain transaction metadata
User topic data is not modified
Kafka records can be written out of strict order and TCC sorts it all out
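The core idea of a transactionally consistent consumer can be modelled in Python: buffer records from every topic, drop duplicates, and release them transaction by transaction in the order given by commit metadata. This is a simplified batch model written for illustration, not IIDR's TCC algorithm or API; the record tuples, the (txn_id, op_count) commit entries and the commit order in the example are all hypothetical. The operations match the four-transaction example above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (topic, txn_id, op_seq, payload) as read from any topic partition, in any order
Record = Tuple[str, str, int, str]


def replay_in_commit_order(records: Iterable[Record],
                           commitstream: List[Tuple[str, int]]
                           ) -> List[Tuple[str, int, str, str]]:
    """Return operations in source commit order, each exactly once.

    commitstream lists (txn_id, op_count) in the order the source committed.
    Replay stops at the first transaction whose operations have not all
    arrived, because later transactions must not overtake it.
    """
    buffered: Dict[str, Dict[int, Tuple[str, str]]] = defaultdict(dict)
    for topic, txn_id, op_seq, payload in records:
        # setdefault keeps the first copy, so redelivered duplicates are dropped
        buffered[txn_id].setdefault(op_seq, (topic, payload))

    ordered = []
    for txn_id, op_count in commitstream:
        ops = buffered.get(txn_id, {})
        if len(ops) < op_count:
            break  # incomplete transaction: wait for more records
        for op_seq in sorted(ops):
            topic, payload = ops[op_seq]
            ordered.append((txn_id, op_seq, topic, payload))
    return ordered


if __name__ == "__main__":
    records = [  # arrival order across topics, including one duplicate
        ("Tab2", "T2", 2, "Op2"), ("Tab1", "T3", 1, "Op1"),
        ("Tab2", "T1", 1, "Op1"), ("Tab3", "T1", 2, "Op2"),
        ("Tab2", "T1", 3, "Op3"), ("Tab2", "T2", 1, "Op1"),
        ("Tab3", "T2", 3, "Op3"), ("Tab2", "T2", 4, "Op4"),
        ("Tab1", "T4", 1, "Op1"), ("Tab2", "T1", 1, "Op1"),
    ]
    commits = [("T1", 3), ("T3", 1), ("T2", 4), ("T4", 1)]  # hypothetical order
    for txn_id, op_seq, topic, _ in replay_in_commit_order(records, commits):
        print(txn_id, op_seq, topic)
```

The user topics are only read, never rewritten, which matches the point that user topic data is not modified.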
Summary
There is a variety of open-source and commercial CDC options for Kafka
Choice depends largely on desired throughput, flexibility, semantics and cost
IBM Cloud - London
This is a group for anyone interested in learning about
#IBMCloud, the cloud built for business. You can be an
existing #IBMCloud user, or someone who has never touched
the #IBMCloud before. Meetup topics will vary and can be of
interest to developers, administrators, or even business
leaders!
We are interested in using amazing tech to grow business and
make the world a better place. Some of the technology topics
that we will talk about are: cloud platforms, artificial
intelligence, blockchain, analytics, automation, cloud services
/ APIs, data science, integration, application development,
and governance.
Humanizing your chatbot, how I digress!
Site Reliability Engineer to the rescue!
Blockchain: The Good, The Bad and The Ugly!
Unlocking the power of automation with AI and ML
Innovate with APIs (App Mod #2)
Sign up at:
https://www.meetup.com/IBM-Cloud-London/
to come along and take part at our events!
Thank you
Kate Stanley @katestanley91
Andrew Schofield https://medium.com/@andrew_schofield
Links: https://kafka.apache.org/documentation/#connect
https://github.com/confluentinc/kafka-connect-jdbc
https://debezium.io
https://github.com/debezium
IBM Event Streams: ibm.biz/aboutEventStreams
Ame 2269 ibm mq high availability
 
Ame 4166 ibm mq appliance
Ame 4166 ibm mq applianceAme 4166 ibm mq appliance
Ame 4166 ibm mq appliance
 
Connecting IBM MessageSight to the Enterprise
Connecting IBM MessageSight to the EnterpriseConnecting IBM MessageSight to the Enterprise
Connecting IBM MessageSight to the Enterprise
 
Introduction to IBM MessageSight
Introduction to IBM MessageSightIntroduction to IBM MessageSight
Introduction to IBM MessageSight
 

Recently uploaded

May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
abdulrafaychaudhry
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
abdulrafaychaudhry
 
Nidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, TipsNidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, Tips
vrstrong314
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 

Recently uploaded (20)

May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
 
Nidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, TipsNidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, Tips
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 

Technology choices for Apache Kafka and Change Data Capture

  • 1. Technology Choices for Kafka and Change Data Capture. Kate Stanley and Andrew Schofield, Apache Kafka London Meetup, October 2019. IBM Event Streams, Apache Kafka.
  • 2. Change Data Capture identifies and captures the changes to a data store © 2019 IBM Corporation 2
  • 3. Change Data Capture identifies and captures the changes to a data store as a stream of Kafka events.
  • 4. Point-to-point data integration. Diagram: MASTER DATABASE, APPLICATION.
  • 5. Point-to-point data integration. Diagram: MASTER DATABASE, RECOVERY DATABASE, AUDIT LOG, QUERY CACHE, APPLICATION.
  • 6. It’s publish/subscribe for data. Diagram: MASTER DATABASE, RECOVERY DATABASE, AUDIT LOG, QUERY CACHE, APPLICATION.
  • 7. Technology choices. These different approaches have all been used successfully: 1. the data store natively generates a feed of changes; 2. repeated queries, with optimizations or restrictions; 3. log scanning.
  • 8. Why use Kafka with CDC? Kafka has many connectors to other systems. It acts as a buffer, loosening the coupling between source and target. It is publish/subscribe instead of point-to-point. It makes it easy to process the CDC stream as events in Kafka client application code.
  • 9. Kafka Connect JDBC source
  • 13. JDBC source connector. Uses JDBC to connect to any compliant relational database, e.g. Oracle, Microsoft SQL Server, DB2, MySQL and Postgres. Requires a Kafka Connect runtime. Can bulk copy tables with any columns; to receive just the changes, particular columns are needed. Open source: https://github.com/confluentinc/kafka-connect-jdbc
  • 15. Configuring the JDBC connector. Validate a configuration through the Kafka Connect REST API:
      $ curl -X PUT -H "Content-Type: application/json" -d '{"connector.class":"io.confluent.connect.jdbc.JdbcSourceConnector"}' http://localhost:8083/connector-plugins/JdbcSourceConnector/config/validate
    Required config options: name; connector.class; connection.url (JDBC connection URL); topic.prefix (prefix to prepend to table names).
  • 16. Configuring the JDBC connector
  • 17. Configuring the JDBC connector. Required config options: name; connector.class; connection.url (JDBC connection URL); topic.prefix (prefix to prepend to table names); mode (bulk, incrementing, timestamp, or timestamp+incrementing).
  • 19. Incrementing mode. Use a strictly incrementing column on each table to detect only new rows. Requires incrementing.column.name to be set. Does not detect modifications or deletions of existing rows. The ID column must be present on all tables, and the identifier must be in a single column. Example table:
      id | First name | Surname  | Amount
      0  | John       | Smith    | 20
      1  | Daisy      | Williams | 25
      2  | Laura      | Thomas   | 15
  • 22. Timestamp mode. Use a timestamp column to detect new and modified rows. Requires timestamp.column.name to be set. The timestamp column must be updated with each write, must be monotonically increasing, and must be present on all tables. Does not guarantee that all updated data is delivered, since timestamps aren’t unique. Example table:
      timestamp           | First name | Surname  | Amount
      2019-10-09 18:10:15 | John       | Smith    | 20
      2019-10-09 18:17:36 | Daisy      | Williams | 25
      2019-10-09 18:57:12 | Laura      | Thomas   | 15
  • 23. Timestamp+Incrementing mode. Uses both a timestamp column and an incrementing id column, and detects new and updated rows. More robust than timestamp alone, since the combination of id and timestamp should be unique. Example table:
      timestamp           | id | First name | Surname  | Amount
      2019-10-09 18:10:15 | 0  | John       | Smith    | 20
      2019-10-09 18:17:36 | 1  | Daisy      | Williams | 25
      2019-10-09 18:57:12 | 2  | Laura      | Thomas   | 15
  • 24. JDBC source connector. LICENSE: Confluent Community License Agreement Version 1.0
  • 26. Building the JDBC connector from source:
      1. git clone https://github.com/confluentinc/kafka-connect-jdbc.git
      2. Edit the pom.xml: a) comment out the Confluent parts of the pom.xml; b) add a version; c) comment out checkstyle; d) add Java 8 enforcement; e) add versions for dependencies.
      3. cd kafka-connect-jdbc && mvn install -DskipTests
  • 27. Building the JDBC connector from source, building its Confluent dependencies first:
      1. git clone https://github.com/confluentinc/kafka.git (Apache 2.0 license)
      2. cd kafka && gradle && ./gradlew installAll
      3. git clone https://github.com/confluentinc/common.git (Apache 2.0 license)
      4. cd common && mvn install
      5. git clone https://github.com/confluentinc/kafka-connect-jdbc.git (Confluent Community license)
      6. cd kafka-connect-jdbc && mvn install
  • 28. Running the JDBC connector. You must check that the JDBC driver has been loaded (the SQLite and Postgres drivers are included by default):
      1. Increase the log level to DEBUG.
      2. Check that the JDBC driver JAR is in the ‘Loading plugin urls’ list.
      3. Check for an ‘Added plugin’ line immediately after it.
    Example of starting a worker with the MySQL driver on the classpath:
      CLASSPATH=/Users/katherinestanley/connectors/mysql-connector-java-8.0.17.jar ./bin/connect-distributed.sh config/connect-distributed.properties
    See https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector
  • 29. Debezium
  • 30. Debezium. Debezium is an open-source platform for change data capture using Kafka Connect. Supported: MySQL, MongoDB, PostgreSQL, SQL Server; incubator: Oracle, Cassandra, Db2 (soon). Each supported database has separate code, and the underlying technology depends on the database: MySQL uses log scanning, SQL Server uses special CDC tables created by the database, … Open source: https://github.com/debezium/debezium. Proper open licence: Apache 2.0.
  • 31. Debezium – log scanning. Diagram: Debezium, running in a Kafka Connect worker, reads the database log (T1 Ins, T2 Upd, T1 Ins, T2 Ins, T1 Del, T1 Cmt, T2 Pre, T2 Cmt) and publishes the changes (Ins, Upd, Ins, Ins, Del).
  • 32. Debezium MySQL. Uses log scanning, which requires row-based binary logging to be configured. Binary log events: WRITE_ROWS for row insert; UPDATE_ROWS for row update; DELETE_ROWS for row delete; QUERY for all kinds of miscellaneous stuff, including transaction commit. Nice and efficient, but the connector code is very specific to MySQL internal details.
  • 33. Database replication. Diagram: in the source database, a capture program reads the database log (T1 Ins, T2 Upd, T1 Ins, T2 Ins, T1 Del, T1 Cmt, T2 Pre, T2 Cmt) and records the changes to the source table in a change data table; in the target database, an apply program applies them to the target table.
  • 34. Debezium – replication tables. Diagram: in the source database, the capture program reads the database log (T1 Ins, T2 Upd, T1 Ins, T2 Ins, T1 Del, T1 Cmt, T2 Pre, T2 Cmt) into a change data table; Debezium, running in a Kafka Connect worker, publishes the changes (Ins, Upd, Ins, Ins, Del) from it.
  • 35. How can I try it? Try the totally excellent Docker-based tutorial: https://debezium.io/documentation/reference/0.10/tutorial.html
  • 36. Record formatting. The default is comprehensive and very verbose: { "schema" : { }, "payload" : { "op": "u", "source": { ... }, "ts_ms" : "...", "before" : { "field1" : "oldvalue1", "field2" : "oldvalue2" }, "after" : { "field1" : "newvalue1", "field2" : "newvalue2" } } }
  • 37. Record formatting. Just use the provided ExtractNewRecordState SMT, which reduces the full change event shown on slide 36 to just the new row state: { "field1" : "newvalue1", "field2" : "newvalue2" }
  • 38. IBM InfoSphere Data Replication
  • 39. IBM InfoSphere Data Replication. Enterprise-grade CDC built exclusively on log scanning. Focus on performance and transactionality. Can be customised with user code. Does not use Kafka Connect because it wants tighter control over publishing.
  • 40. IIDR architecture. Diagram: on the source server, the CDC source engine reads the database log (T1 Ins, T2 Upd, T1 Ins, T2 Ins, T1 Del, T1 Cmt, T2 Pre, T2 Cmt), parses and transforms the changes, and sends them (Ins, Upd, Ins, Ins, Del) to the CDC target engine on the target server, where writers publish them. A management console is also shown.
  • 41. Four time-interleaved source database transactions, with time running left to right. Transaction 1: Op1(Tab2), Op2(Tab3), Op3(Tab2), Commit. Transaction 2: Op1(Tab2), Op2(Tab2), Op3(Tab3), Op4(Tab2), Commit. Transaction 3: Op1(Tab1), Commit. Transaction 4: Op1(Tab1), Commit. (…er 15, 2018 / © 2018 IBM Corporation)
  • 42. Transactionally Consistent Consumer. Recreates the order of operations in the source database across multiple topics and partitions, with no duplicates. Uses a “commitstream” topic to maintain transaction metadata. User topic data is not modified. Kafka records can be written out of strict order, and the TCC sorts it all out.
  • 43. Summary
  • 44. Summary. There is a variety of open-source and commercial CDC options for Kafka. The choice depends largely on the desired throughput, flexibility, semantics and cost.
  • 45. IBM Cloud - London. This is a group for anyone interested in learning about #IBMCloud, the cloud built for business. You can be an existing #IBMCloud user, or someone who has never touched the #IBMCloud before. Meetup topics will vary and can be of interest to developers, administrators, or even business leaders! We are interested in using amazing tech to grow business and make the world a better place. Some of the technology topics that we will talk about are: cloud platforms, artificial intelligence, blockchain, analytics, automation, cloud services / APIs, data science, integration, application development, and governance. Humanizing your chatbot, how I digress! Site Reliability Engineer to the rescue! Blockchain: The Good, The Bad and The Ugly! Unlocking the power of automation with AI and ML. Innovate with APIs (App Mod #2). Sign up at https://www.meetup.com/IBM-Cloud-London/ to come along and take part in our events!
  • 46. Thank you. Kate Stanley (@katestanley91); Andrew Schofield (https://medium.com/@andrew_schofield). Links: https://kafka.apache.org/documentation/#connect, https://github.com/confluentinc/kafka-connect-jdbc, https://debezium.io, https://github.com/debezium. IBM Event Streams: ibm.biz/aboutEventStreams
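The JDBC connector options listed on slides 15 and 17 can also be assembled in code. The following is a minimal Python sketch that builds the same validation request as the curl command on slide 15 but does not send it. It assumes a Connect worker on the default REST port 8083. The connector name `jdbc-payments`, the MySQL URL for a database called `shop`, the topic prefix and the column names `ts` and `id` are all hypothetical. Only the option names and the plugin class come from the slides.

```python
import json
import urllib.request

# Assumption: a Kafka Connect worker listening on the default REST port.
CONNECT_URL = "http://localhost:8083"

# Hypothetical connector settings; only the option names come from the slides.
jdbc_config = {
    "name": "jdbc-payments",
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://localhost:3306/shop",
    "topic.prefix": "mysql-",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "ts",
    "incrementing.column.name": "id",
}


def build_validate_request(base_url: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) PUT /connector-plugins/<plugin>/config/validate."""
    # The path segment names the plugin; here it is the simple class name.
    plugin = config["connector.class"].rsplit(".", 1)[-1]
    return urllib.request.Request(
        f"{base_url}/connector-plugins/{plugin}/config/validate",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_validate_request(CONNECT_URL, jdbc_config)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it. The JSON response includes an `error_count` field and per-option error messages. To actually create the connector, the same settings would go in the `config` object of a `POST /connectors` request.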
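Incrementing mode (slide 19) comes down to repeatedly querying for rows whose ID is greater than the last one seen. The sketch below imitates that polling loop with Python's built-in `sqlite3` module and the example rows from the slide. It illustrates the technique and is not the connector's code. The table name `payments`, the helper `poll_incrementing` and the extra row for "Sam Jones" are hypothetical. The second poll shows why an update to an existing row is never picked up.

```python
import sqlite3


def poll_incrementing(conn, table, id_col, last_id):
    """Return rows with id_col > last_id in id order, plus the new offset.

    Table and column names are interpolated directly, so they must be trusted
    identifiers; only the offset is passed as a bound parameter.
    """
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE {id_col} > ? ORDER BY {id_col} ASC",
        (last_id,),
    ).fetchall()
    new_offset = rows[-1][id_col] if rows else last_id
    return rows, new_offset


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be read by column name
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, first_name TEXT, "
    "surname TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?, ?)",
    [(0, "John", "Smith", 20), (1, "Daisy", "Williams", 25), (2, "Laura", "Thomas", 15)],
)

# First poll: start below the smallest id, so every existing row is new.
first_rows, offset = poll_incrementing(conn, "payments", "id", -1)
first_ids = [row["id"] for row in first_rows]

# Modify an existing row and add a hypothetical new one.
conn.execute("UPDATE payments SET amount = 30 WHERE id = 1")
conn.execute("INSERT INTO payments VALUES (3, 'Sam', 'Jones', 40)")

# Second poll: only the new row appears; the update to id 1 is invisible.
second_rows, offset = poll_incrementing(conn, "payments", "id", offset)
second_ids = [row["id"] for row in second_rows]
print(first_ids, second_ids, offset)
```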
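Slide 22 warns that timestamp mode can miss rows because timestamps aren't unique. Slide 23 says that combining the timestamp with an incrementing ID is more robust. The `sqlite3` sketch below contrasts the two polling predicates on the slides' example data.

The WHERE clauses are a simplified assumption of what such a poller does. As we understand it, the real connector also bounds each query by an upper time limit, which is omitted here. The table name `payments`, the column name `ts` and the late row with id 3 are hypothetical. Row 3 stands for a row stamped in the same second as the last one delivered but committed after that poll.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE payments (ts TEXT, id INTEGER PRIMARY KEY, first_name TEXT, "
    "surname TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?, ?, ?)",
    [
        ("2019-10-09 18:10:15", 0, "John", "Smith", 20),
        ("2019-10-09 18:17:36", 1, "Daisy", "Williams", 25),
        ("2019-10-09 18:57:12", 2, "Laura", "Thomas", 15),
    ],
)


def poll_timestamp(conn, last_ts):
    # Timestamp mode: strictly newer timestamps only.
    return conn.execute(
        "SELECT * FROM payments WHERE ts > ? ORDER BY ts ASC", (last_ts,)
    ).fetchall()


def poll_timestamp_incrementing(conn, last_ts, last_id):
    # Timestamp+incrementing: newer timestamps, or the same timestamp with a higher id.
    return conn.execute(
        "SELECT * FROM payments WHERE ts > ? OR (ts = ? AND id > ?) "
        "ORDER BY ts ASC, id ASC",
        (last_ts, last_ts, last_id),
    ).fetchall()


# Initial polls: "" sorts before any timestamp string, so all rows are returned.
ts_rows = poll_timestamp(conn, "")
ti_rows = poll_timestamp_incrementing(conn, "", -1)
ts_offset = ts_rows[-1]["ts"]
ti_offset = (ti_rows[-1]["ts"], ti_rows[-1]["id"])

# A row with the same timestamp as the stored offset is committed late.
conn.execute(
    "INSERT INTO payments VALUES ('2019-10-09 18:57:12', 3, 'Sam', 'Jones', 40)"
)

missed = [r["id"] for r in poll_timestamp(conn, ts_offset)]
caught = [r["id"] for r in poll_timestamp_incrementing(conn, *ti_offset)]
print(missed, caught)
```

The combined predicate is not bulletproof either. It still assumes that a late-committing row with the same timestamp has a higher ID than the stored offset.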
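Slide 37 recommends Debezium's ExtractNewRecordState single message transform (SMT). On a connector, it is enabled through the standard Kafka Connect `transforms` properties shown in the dictionary below. The alias `unwrap` is an arbitrary name.

The function after the dictionary is only a rough Python approximation of the SMT's effect on the JSON event from slide 36, not its implementation. The real SMT has configurable handling for deletes and tombstones, whereas this sketch simply returns `None` when there is no `after` state. The `source` and `ts_ms` values in the example event are placeholders.

```python
# Connector properties that enable the SMT; "unwrap" is an arbitrary alias.
smt_config = {
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
}


def extract_new_record_state(event: dict):
    """Approximate the SMT's effect: keep only the row state after the change."""
    # With JSON schemas enabled the envelope sits under "payload"; otherwise top level.
    payload = event.get("payload", event)
    return payload.get("after")


update_event = {
    "schema": {},
    "payload": {
        "op": "u",
        "source": {},  # placeholder
        "ts_ms": 0,    # placeholder
        "before": {"field1": "oldvalue1", "field2": "oldvalue2"},
        "after": {"field1": "newvalue1", "field2": "newvalue2"},
    },
}
delete_event = {"payload": {"op": "d", "before": {"field1": "oldvalue1"}, "after": None}}

print(extract_new_record_state(update_event))
print(extract_new_record_state(delete_event))
```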
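Slides 41 and 42 describe the Transactionally Consistent Consumer (TCC), which rebuilds source transaction order from records spread across several topics. What follows is a heavily simplified, hypothetical Python sketch of that idea. It is not IBM's implementation or API.

The sketch assumes that each record carries a transaction ID and a sequence number. It also assumes that a commitstream-like list gives the commit order and operation count of each transaction. The consumer deduplicates on (transaction, sequence) and emits whole transactions in commit order. It stops at the first transaction whose records have not all arrived. `Record`, `Commit` and `consistent_order` are invented names. The commit order in the example is also invented, because slide 41 does not state one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    txn_id: str  # hypothetical: source transaction that produced the change
    seq: int     # hypothetical: position of the operation within its transaction
    op: str


@dataclass(frozen=True)
class Commit:
    txn_id: str
    op_count: int  # number of operations the transaction contains


def consistent_order(records, commits):
    """Return (txn_id, op) pairs in source commit order, without duplicates."""
    buffered = {}
    for rec in records:
        # Keying on seq means a redelivered record simply replaces itself.
        buffered.setdefault(rec.txn_id, {})[rec.seq] = rec
    ordered = []
    for commit in commits:
        ops = buffered.get(commit.txn_id, {})
        if len(ops) < commit.op_count:
            break  # incomplete: later transactions must wait behind this one
        ordered.extend((commit.txn_id, ops[s].op) for s in sorted(ops))
    return ordered


# Records as they might arrive from per-table topics (slide 41's transactions).
tab1 = [Record("T3", 1, "Op1(Tab1)"), Record("T4", 1, "Op1(Tab1)")]
tab2 = [
    Record("T1", 1, "Op1(Tab2)"), Record("T2", 1, "Op1(Tab2)"),
    Record("T1", 3, "Op3(Tab2)"), Record("T2", 2, "Op2(Tab2)"),
    Record("T2", 4, "Op4(Tab2)"),
    Record("T1", 1, "Op1(Tab2)"),  # duplicate delivery
]
tab3 = [Record("T2", 3, "Op3(Tab3)"), Record("T1", 2, "Op2(Tab3)")]

# Invented commit order standing in for the commitstream topic.
commits = [Commit("T1", 3), Commit("T3", 1), Commit("T2", 4), Commit("T4", 1)]

result = consistent_order(tab3 + tab1 + tab2, commits)
for txn, op in result:
    print(txn, op)
```

A real consumer would buffer incrementally and with bounded memory. This batch version only illustrates the ordering and deduplication rule.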