Capture the Streams of
Database Changes
Randall Hauch
Founder of Debezium project
@rhauch
Apache Kafka™
2
Producers
Consumers
Apache Kafka Streams API
Apache Kafka Connect API
DB
Change Data Capture Connectors
3
See the list at https://www.confluent.io/product/connectors/
Apache Kafka™
Why capture streams of data changes?
4
DB
Application
Streaming data replication
5
DB
Apache Kafka™
DB2
Streaming analytics and machine learning
6
DB
…
Apache Kafka™
Streaming ETL
7
DB2
Extract Transform Load
DB
Apache Kafka™
Shared data in a microservice architecture
8
Bounded context
DB A
Service A
Apache Kafka™
changes changes changes
other
data
other
data
other
data
Bounded context
DB B
Service B
Bounded context
DB C
Service C
materialized
views
materialized
views
materialized
views
Deconstructed applications
9
DB
Application
Cache
Indexes
Cache
Indexes
DB
Apache Kafka™
Cache
Indexes
Application
(dual writes!)
Kafka
Consumers
How do we get a stream of data changes?
10
DB
Application
?
Apache Kafka™
Consumers
How do we get a stream of data changes?
11
Modify the app to
write out events?
DB
Application
Application 2 Application 3
What about the
other apps that
change data?
Dual writes?!
Apache Kafka™
Consumers
How do we get a stream of data changes?
12
Or we can watch the database
DB
Application
Need a connector to do this
Just install, configure and run it,
and it will adapt
No need to change our apps!
Change data capture!
Kafka Connect
Connector
Databases 101
13
insert row 1
insert row 2
update row 1
insert row 3
delete row 2
insert row 4
update row 2
• Applications modify rows in transactions
• DBMS records the changes in a log,
then updates the tables
• DBMS uses log for recovery, replication, …
- MySQL binlog
- MongoDB oplog
- PostgreSQL WAL
• We can (try to) use the log for CDC*
Application
*mileage may vary
Change Data Capture (CDC) at work
14
• Read the changes from the database
- Using the log or API
- This is the hardest part
• Write them in the same order
• Don’t miss any changes
- Okay, this is hard, too
Table Stream
Change Data Capture (CDC) at work
16
• Read the changes from the database
- Using the log or API
- This is the hardest part
• Write them in the same order
• Don’t miss any changes
- Okay, this is hard, too
Table Stream Table*
Stream-Table Duality
18
We can view a table as a stream
and
We can view a stream as a table
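The duality can be sketched in a few lines of Python. This is only an illustration: the tuple layout `(operation, key, row)` and the function names are assumptions made for the example, not part of Kafka or Debezium.

```python
from typing import Any, Dict, List, Optional, Tuple

# A change is (operation, key, row); row is None for deletes.
# This tuple layout is hypothetical, chosen only for the sketch.
Change = Tuple[str, int, Optional[Dict[str, Any]]]
Table = Dict[int, Dict[str, Any]]


def apply_changes(changes: List[Change]) -> Table:
    """Stream -> table: replay the changes in order to rebuild the rows."""
    table: Table = {}
    for op, key, row in changes:
        if op == "delete":
            table.pop(key, None)
        else:
            # "insert" and "update" both leave the latest state of the row.
            table[key] = dict(row)
    return table


def table_to_changes(table: Table) -> List[Change]:
    """Table -> stream: emit one insert per row currently in the table."""
    return [("insert", key, dict(row)) for key, row in sorted(table.items())]


changes: List[Change] = [
    ("insert", 1, {"name": "a"}),
    ("insert", 2, {"name": "b"}),
    ("update", 1, {"name": "a2"}),
    ("delete", 2, None),
]
table = apply_changes(changes)
print(table)  # {1: {'name': 'a2'}}
```

Replaying `table_to_changes(table)` through `apply_changes` gives back the same table, which is the sense in which either view can be derived from the other.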
What does a change event look like?
20
• Primary/unique key of the row
• Kind of operation: insert, update, delete
• State of the row after the changes
• State of the row before the changes
• Source-specific provenance metadata
- location in the log
- database name, table name
- transaction ID, source timestamp, …
• Capture timestamp
What does a change event look like?
21
• Key
- Primary/unique key of the row
• Value
- Operation
- State of the row after the changes
- State of the row before the changes (if available)
- Source-specific provenance metadata
- Capture timestamp
• Timestamp
This maps perfectly to a Kafka message!
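A rough sketch of that mapping, assuming made-up field names (`op`, `before`, `after`, `source`, `captured_at_ms`) rather than any connector's actual schema. Using the capture timestamp as the message timestamp is also an assumption of this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ChangeEvent:
    """One row-level change; all field names are hypothetical."""
    key: Dict[str, Any]               # primary/unique key of the row
    op: str                           # "insert", "update" or "delete"
    after: Optional[Dict[str, Any]]   # row state after the change
    before: Optional[Dict[str, Any]]  # row state before the change, if available
    source: Dict[str, Any]            # provenance: database, table, log position, ...
    captured_at_ms: int               # when the connector captured the change


def to_kafka_message(event: ChangeEvent) -> Dict[str, Any]:
    """Split the event into the key, value and timestamp of a message."""
    value = {
        "op": event.op,
        "before": event.before,
        "after": event.after,
        "source": event.source,
        "captured_at_ms": event.captured_at_ms,
    }
    # Assumption: the message timestamp is set to the capture time.
    return {"key": event.key, "value": value, "timestamp": event.captured_at_ms}


event = ChangeEvent(
    key={"id": 1},
    op="update",
    after={"id": 1, "email": "new@example.com"},
    before={"id": 1, "email": "old@example.com"},
    source={"db": "inventory", "table": "customers", "log_position": 154},
    captured_at_ms=1700000000000,
)
message = to_kafka_message(event)
print(message["key"], message["value"]["op"])
```

Keying the message by the row's primary key means all changes to one row land in the same partition, so their relative order is preserved.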
Single Message Transforms
22
• Simple transformations for a single message
• Defined as part of Kafka Connect
- Some useful transforms provided in-the-box
- Easily implement your own
• Optionally deploy 1+ transforms with each connector
- Modify messages produced by source connector
- Modify messages sent to sink connectors
• Makes it much easier to mix and match connectors
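Real transforms are Java classes plugged into Kafka Connect. The Python below is only a conceptual sketch of the idea (one message in, one message or nothing out, applied as a chain); the message layout and function names are invented for the illustration.

```python
from typing import Any, Callable, Dict, List, Optional

Message = Dict[str, Any]
Transform = Callable[[Message], Optional[Message]]


def extract_after(message: Message) -> Message:
    """Replace the change envelope with just the row's new state."""
    return {**message, "value": message["value"].get("after")}


def route_to(topic: str) -> Transform:
    """Build a transform that sends every message to a fixed topic."""
    def transform(message: Message) -> Message:
        return {**message, "topic": topic}
    return transform


def apply_chain(message: Message, transforms: List[Transform]) -> Optional[Message]:
    """Run the transforms in order; a transform returning None drops the message."""
    current: Optional[Message] = message
    for transform in transforms:
        current = transform(current)
        if current is None:
            return None
    return current


msg = {
    "topic": "db.inventory.customers",
    "key": {"id": 1},
    "value": {"op": "update", "before": {"id": 1, "name": "a"}, "after": {"id": 1, "name": "b"}},
}
result = apply_chain(msg, [extract_after, route_to("customers")])
print(result)
```

Each transform sees one message at a time and keeps no state, which is what keeps them simple enough to mix and match across connectors.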
Connectors started long after DBs were created
23
• Databases don’t keep all past changes
- The logs are not kept indefinitely
• So CDC connectors often start by taking an initial snapshot
- Capture initial state of every row at that time
- Then capture and apply changes committed after initial copy started
- Transition can be tricky, but is easier if changes are idempotent
- Must handle failure at any point
• Consumers are eventually consistent with upstream sources
- More sophisticated consumers might process source transactions
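A simplified sketch of the snapshot-to-streaming handoff, assuming a single ordered log with integer positions and a hypothetical `(position, op, key, row)` layout. It shows why idempotent changes make the transition easier: the snapshot may already contain some changes that were committed after it started, and re-applying them does no harm.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
# (log position, operation, key, row); row is None for deletes. Hypothetical layout.
LogEntry = Tuple[int, str, int, Optional[Row]]


def snapshot_then_stream(snapshot: Dict[int, Row], log: List[LogEntry],
                         snapshot_start: int) -> Dict[int, Row]:
    """Start from the snapshot, then apply every change logged from snapshot_start on."""
    copy = dict(snapshot)
    for position, op, key, row in log:
        if position < snapshot_start:
            continue  # already reflected in the snapshot
        if op == "delete":
            copy.pop(key, None)  # deleting a missing row is a no-op
        else:
            copy[key] = row      # upsert: same result however often it is applied
    return copy


log: List[LogEntry] = [
    (0, "insert", 1, {"v": "a"}),
    (1, "insert", 2, {"v": "b"}),
    (2, "update", 1, {"v": "a2"}),
    (3, "delete", 2, None),
    (4, "insert", 3, {"v": "c"}),
]
# The snapshot started at position 2 but ran while changes kept arriving:
# it already saw the update to row 1, and read row 2 before it was deleted.
snapshot = {1: {"v": "a2"}, 2: {"v": "b"}}
result = snapshot_then_stream(snapshot, log, snapshot_start=2)
print(result)  # {1: {'v': 'a2'}, 3: {'v': 'c'}}
```

Running the same function again on its own result (as after a failure and restart) yields the same table, which is the property that makes recovery at any point tractable.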
Debezium connectors
24
• MySQL connector
- Multiple MySQL topologies
- GTIDs, DDL and DML, table filters, events mirror table structures
• MongoDB connector
- Replica set or sharded cluster
- Only insert events have “after” state; others have patch operation
• PostgreSQL connector
- Provides server-side logical decoding plugin
- Table filters, events mirror table structures
• SQL Server and Oracle connectors coming next
Using Debezium + Kafka Connect
25
MySQL
Using Debezium + Kafka Connect
26
Apache Kafka™
MySQL
• Use existing Kafka cluster
Using Debezium + Kafka Connect
27
Apache Kafka™
Kafka Connect
MySQL
• Use existing Kafka cluster
• Start Kafka Connect cluster
Using Debezium + Kafka Connect
28
Apache Kafka™
Kafka Connect
MySQL
MySQL
Connector
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s)
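Deploying a connector typically means posting its configuration to the Kafka Connect REST API. The sketch below only builds the request. The host names, connector name and credentials are placeholders, and a real Debezium MySQL deployment needs additional properties (for example the logical server name or topic prefix, and where to store schema history) whose names depend on the Debezium version, so check the connector documentation before using it.

```python
import json
import urllib.request
from typing import Dict


def build_connector_request(connect_url: str, name: str,
                            config: Dict[str, str]) -> urllib.request.Request:
    """Build a POST /connectors request for a Kafka Connect worker."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; this config is deliberately incomplete (see the note above).
mysql_config = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.example.internal",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "change-me",
    "database.server.id": "184054",
}

request = build_connector_request(
    "http://localhost:8083", "inventory-connector", mysql_config
)
print(request.get_method(), request.full_url)
# To actually submit it to a running worker:
# urllib.request.urlopen(request)
```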
Using Debezium + Kafka Connect
29
Apache Kafka™
Kafka Connect
MySQL
MySQL
Connector
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
Using Debezium + Kafka Connect
30
Apache Kafka™
Kafka Connect
MySQL
MySQL
Connector
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
• Consume change events
Using Debezium + Kafka Connect
31
Apache Kafka™
Kafka Connect
MySQL
Consumers
Consumers
Consumers
MySQL
Connector
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
• Pause, undeploy, or redeploy connector at any time
Using Debezium + Kafka Connect
32
Apache Kafka™
Kafka Connect
MySQL
Consumers
Consumers
Consumers
MySQL
Connector
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
• Pause, undeploy, or redeploy connector at any time
• Consumers will keep consuming or block until there are more events
Using Debezium + Kafka Connect
33
Apache Kafka™
Kafka Connect
MySQL
Consumers
Consumers
Consumers
MySQL
Connector
Using Debezium + Kafka Connect
34
Kafka Connect
Apache Kafka™
Kafka Connect
MySQL
Connector
MySQL
PostgreSQL
Connector
PostgreSQL
MySQL
Connector
MySQL
MySQL
Connector
Consumers
Consumers
Consumers
Consumers
Consumers
Consumers
Consumers
DB2
Kafka Connect
Sink
Connector
Create data pipelines for data you already have
36
DB1
Extract
Kafka Streams
Transform Load
Kafka Connect
Source
Connector
Create data pipelines for data you already have
37
DB1
DB2
Extract
Kafka Streams
Transform Load
Kafka Connect
Source
Connector
Kafka Connect
Sink
Connector
DB2
Kafka Streams Kafka Connect
Sink
Connector
Applications
Applications
Create data pipelines for data you already have
38
DB1 DB2
Kafka Streams
Kafka Connect
Source
Connector
Kafka Connect
Sink
Connector
DB2
Kafka Streams Kafka Connect
Sink
Connector
Applications
&
Frameworks
Summary
39
• Just configure and deploy connectors - no custom code!
• Continuously captures changes with low latency and without batching
• Fault tolerant
- failures only cause a delay in processing
- still process events at least once
- avoid dual-write problems
• Use stream processing to combine/merge/join multiple low-level events
• CDC is more complex, but the cost is amortized across multiple systems
• Works with a limited set of DBMSes (for now): those that have APIs for CDC
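Because events are processed at least once, a consumer may see the same event again after a failure. One common way to cope, sketched here under the assumption that events arrive in order and carry a monotonically increasing `position` (a hypothetical field, comparable to an offset within one partition), is to skip anything at or before the last applied position.

```python
from typing import Any, Dict


class IdempotentApplier:
    """Applies change events to a table and ignores redelivered ones."""

    def __init__(self) -> None:
        self.table: Dict[int, Dict[str, Any]] = {}
        self.last_position = -1  # nothing applied yet

    def handle(self, event: Dict[str, Any]) -> bool:
        """Apply the event; return False if it was a duplicate and skipped."""
        if event["position"] <= self.last_position:
            return False
        if event["op"] == "delete":
            self.table.pop(event["key"], None)
        else:
            self.table[event["key"]] = event["after"]
        self.last_position = event["position"]
        return True


e0 = {"position": 0, "op": "insert", "key": 1, "after": {"v": "a"}}
e1 = {"position": 1, "op": "insert", "key": 2, "after": {"v": "b"}}
e2 = {"position": 2, "op": "update", "key": 1, "after": {"v": "a2"}}
e3 = {"position": 3, "op": "delete", "key": 2, "after": None}

applier = IdempotentApplier()
# e1 and e2 are delivered twice, as could happen after a restart.
applied = [applier.handle(e) for e in [e0, e1, e2, e1, e2, e3]]
print(applied)        # [True, True, True, False, False, True]
print(applier.table)  # {1: {'v': 'a2'}}
```

In a real consumer the table and `last_position` would need to be stored together durably, otherwise the position check itself is lost on failure.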
Interested? Want to contribute?
40
debezium.io
@debezium
Thank you!

More Related Content

What's hot

A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practicesconfluent
 
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022HostedbyConfluent
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...confluent
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®confluent
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectKaufman Ng
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
From my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debeziumFrom my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debeziumClement Demonchy
 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafkaconfluent
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connectconfluent
 
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataTimothy Spann
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Kai Wähner
 
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...confluent
 

What's hot (20)

A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practices
 
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
From my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debeziumFrom my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debezium
 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafka
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
 
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
 

Similar to Capture the Streams of Database Changes

Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward
 
Data Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDCData Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDCAbhijit Kumar
 
Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)
Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)
Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)Serena Software
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseWill Gardella
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connectconfluent
 
Oracle11g R2 - Edition Based Redefinition for On Line Application Upgrade
Oracle11g R2 - Edition Based Redefinition for On Line Application UpgradeOracle11g R2 - Edition Based Redefinition for On Line Application Upgrade
Oracle11g R2 - Edition Based Redefinition for On Line Application UpgradeLucas Jellema
 
Evolutionary database design
Evolutionary database designEvolutionary database design
Evolutionary database designSalehein Syed
 
Stream Analytics with SQL on Apache Flink
 Stream Analytics with SQL on Apache Flink Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache FlinkFabian Hueske
 
Database Migrations with Gradle and Liquibase
Database Migrations with Gradle and LiquibaseDatabase Migrations with Gradle and Liquibase
Database Migrations with Gradle and LiquibaseDan Stine
 
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY StyleAthens Big Data
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Fieldconfluent
 
Editioning use in ebs
Editioning use in  ebsEditioning use in  ebs
Editioning use in ebspasalapudi123
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of stateYoni Farin
 
Databus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineDatabus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineSunil Nagaraj
 
Liquibase få kontroll på dina databasförändringar
Liquibase   få kontroll på dina databasförändringarLiquibase   få kontroll på dina databasförändringar
Liquibase få kontroll på dina databasförändringarSqueed
 

Similar to Capture the Streams of Database Changes (20)

Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
Data Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDCData Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDC
 
Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)
Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)
Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and Couchbase
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connect
 
Oracle11g R2 - Edition Based Redefinition for On Line Application Upgrade
Oracle11g R2 - Edition Based Redefinition for On Line Application UpgradeOracle11g R2 - Edition Based Redefinition for On Line Application Upgrade
Oracle11g R2 - Edition Based Redefinition for On Line Application Upgrade
 
Evolutionary database design
Evolutionary database designEvolutionary database design
Evolutionary database design
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
 
Stream Analytics with SQL on Apache Flink
 Stream Analytics with SQL on Apache Flink Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache Flink
 
Database Migrations with Gradle and Liquibase
Database Migrations with Gradle and LiquibaseDatabase Migrations with Gradle and Liquibase
Database Migrations with Gradle and Liquibase
 
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
 
Riding the Streaming Wave DIY style
Riding the Streaming Wave  DIY styleRiding the Streaming Wave  DIY style
Riding the Streaming Wave DIY style
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
 
Editioning use in ebs
Editioning use in  ebsEditioning use in  ebs
Editioning use in ebs
 
Editioning use in ebs
Editioning use in  ebsEditioning use in  ebs
Editioning use in ebs
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of state
 
SOA_BPM_12c_launch_event_SOA_track_deepdive_developerproductivityandperforman...
SOA_BPM_12c_launch_event_SOA_track_deepdive_developerproductivityandperforman...SOA_BPM_12c_launch_event_SOA_track_deepdive_developerproductivityandperforman...
SOA_BPM_12c_launch_event_SOA_track_deepdive_developerproductivityandperforman...
 
Databus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineDatabus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture Pipeline
 
Liquibase få kontroll på dina databasförändringar
Liquibase   få kontroll på dina databasförändringarLiquibase   få kontroll på dina databasförändringar
Liquibase få kontroll på dina databasförändringar
 

More from confluent

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Eraconfluent
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 

More from confluent (20)

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Capture the Streams of Database Changes

  • 1. Capture the Streams of Database Changes Randall Hauch Founder of Debezium project @rhauch
  • 2. Apache Kafka™: Producers, Consumers, Apache Kafka Streams API, Apache Kafka Connect API, DB
  • 3. Change Data Capture Connectors: see the list at https://www.confluent.io/product/connectors/
  • 4. Why capture streams of data changes? Application, DB, Apache Kafka™
  • 6. Streaming analytics and machine learning: DB, …, Apache Kafka™
  • 7. Streaming ETL: DB, Extract, Transform, Load, DB2, Apache Kafka™
  • 8. Shared data in a microservice architecture: three bounded contexts (Service A with DB A, Service B with DB B, Service C with DB C); each sends changes to Apache Kafka™ and receives other data for its materialized views
  • 10. How do we get a stream of data changes? Application, DB, ?, Kafka, Consumers
  • 11. How do we get a stream of data changes? Modify the app to write out events? What about the other apps that change data? Dual writes?! (Application, Application 2, Application 3, DB, Apache Kafka™, Consumers)
  • 12. How do we get a stream of data changes? Or we can watch the database. Need a connector to do this. Just install, configure and run it, and it will adapt. No need to change our apps! Change data capture! (Application, DB, Kafka Connect Connector, Apache Kafka™, Consumers)
  • 13. Databases 101
      • Example changes: insert row 1, insert row 2, update row 1, insert row 3, delete row 2, insert row 4, update row 2
      • Applications modify rows in transactions
      • DBMS records the changes in a log, then updates the tables
      • DBMS uses log for recovery, replication, …
          - MySQL binlog
          - MongoDB oplog
          - PostgreSQL WAL
      • We can (try to) use the log for CDC* (*mileage may vary)
  • 14-17. Change Data Capture (CDC) at work (Table, Stream, Table*)
      • Read the changes from the database
          - Using the log or API
          - This is the hardest part
      • Write them in the same order
      • Don’t miss any changes
          - Okay, this is hard, too
  • 18. Stream-Table Duality: we can view a table as a stream, and we can view a stream as a table
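The duality on this slide can be illustrated with a minimal sketch. It assumes a change is a `(key, value)` pair where a value of `None` means the row was deleted; that representation and the function names are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Assumed representation: (key, new_value); None as the value means "row deleted".
Change = Tuple[str, Optional[dict]]


def stream_to_table(changes: Iterable[Change]) -> Dict[str, dict]:
    """Replay a change stream in order to rebuild the latest state per key."""
    table: Dict[str, dict] = {}
    for key, value in changes:
        if value is None:
            table.pop(key, None)  # delete; harmless if the key is already gone
        else:
            table[key] = value    # insert or update
    return table


def table_to_stream(table: Dict[str, dict]) -> Iterator[Change]:
    """View a table as a stream: one change per current row."""
    for key, value in table.items():
        yield key, value


changes = [
    ("1", {"name": "Ann"}),
    ("2", {"name": "Bob"}),
    ("1", {"name": "Anne"}),  # update row 1
    ("2", None),              # delete row 2
]
table = stream_to_table(changes)
roundtrip = stream_to_table(table_to_stream(table))
```

Replaying the stream yields the table, and streaming the table back out and replaying it yields the same table again.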
  • 20. What does a change event look like?
      • Primary/unique key of the row
      • Kind of operation: insert, update, delete
      • State of the row after the changes
      • State of the row before the changes
      • Source-specific provenance metadata
          - location in the log
          - database name, table name
          - transaction ID, source timestamp, …
      • Capture timestamp
  • 21. What does a change event look like?
      • Key
          - Primary/unique key of the row
      • Value
          - Operation
          - State of the row after the changes
          - State of the row before the changes (if available)
          - Source-specific provenance metadata
          - Capture timestamp
      • Timestamp
      • This maps perfectly to a Kafka message!
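As a rough sketch of how the pieces listed above could map onto a Kafka message (key, value, timestamp): the class, field names, and JSON encoding below are assumptions for illustration and are not necessarily the format any particular connector emits.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ChangeEvent:
    key: dict               # primary/unique key of the row
    op: str                 # "insert", "update" or "delete"
    before: Optional[dict]  # row state before the change (if available)
    after: Optional[dict]   # row state after the change (None for deletes)
    source: dict            # provenance: log position, database, table, ...
    captured_at_ms: int     # capture timestamp


def to_kafka_record(event: ChangeEvent) -> Tuple[bytes, bytes, int]:
    """Split an event into (message key, message value, message timestamp)."""
    value = {
        "op": event.op,
        "before": event.before,
        "after": event.after,
        "source": event.source,
        "ts_ms": event.captured_at_ms,
    }
    key_bytes = json.dumps(event.key, sort_keys=True).encode("utf-8")
    value_bytes = json.dumps(value, sort_keys=True).encode("utf-8")
    return key_bytes, value_bytes, event.captured_at_ms


event = ChangeEvent(
    key={"id": 1},
    op="update",
    before={"id": 1, "name": "Ann"},
    after={"id": 1, "name": "Anne"},
    source={"db": "inventory", "table": "customers", "log_pos": 154},
    captured_at_ms=1500000000000,
)
key_bytes, value_bytes, timestamp_ms = to_kafka_record(event)
```

Using the row's key as the message key means all changes to one row share a key, so they land in the same partition and keep their relative order.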
  • 22. Single Message Transforms
      • Simple transformations for a single message
      • Defined as part of Kafka Connect
          - Some useful transforms provided in-the-box
          - Easily implement your own
      • Optionally deploy 1+ transforms with each connector
          - Modify messages produced by source connector
          - Modify messages sent to sink connectors
      • Makes it much easier to mix and match connectors
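The idea of a single message transform can be sketched as a function from one record to one record, with several such functions chained per connector. This is a conceptual Python model only: real Kafka Connect transforms are Java classes, and the `Record` type, helper names, and topic name here are hypothetical.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Record:
    topic: str
    key: Optional[dict]
    value: Optional[dict]


Transform = Callable[[Record], Record]


def route_topic(pattern: str, replacement: str) -> Transform:
    """Build a transform that rewrites the topic name with a regex."""
    regex = re.compile(pattern)

    def apply(record: Record) -> Record:
        return replace(record, topic=regex.sub(replacement, record.topic))

    return apply


def mask_fields(*fields: str) -> Transform:
    """Build a transform that hides the named value fields."""

    def apply(record: Record) -> Record:
        if record.value is None:
            return record
        masked = {k: ("****" if k in fields else v) for k, v in record.value.items()}
        return replace(record, value=masked)

    return apply


def apply_chain(record: Record, transforms: List[Transform]) -> Record:
    """Apply each transform in order; each one sees a single message."""
    for transform in transforms:
        record = transform(record)
    return record


chain = [route_topic(r"^dbserver1\.inventory\.(.*)$", r"\1"), mask_fields("email")]
original = Record("dbserver1.inventory.customers", {"id": 1},
                  {"id": 1, "email": "a@example.com"})
transformed = apply_chain(original, chain)
```

Because each transform looks at one message in isolation, the chain stays stateless; anything that needs to combine several messages belongs in stream processing instead.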
  • 23. Connectors started long after DBs were created
      • Databases don’t keep all past changes
          - The logs are not kept indefinitely
      • So CDC connectors often start by taking an initial snapshot
          - Capture initial state of every row at that time
          - Then capture and apply changes committed after initial copy started
          - Transition can be tricky, but is easier if changes are idempotent
          - Must handle failure at any point
      • Consumers are eventually consistent with upstream sources
          - More sophisticated consumers might process source transactions
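A small sketch of why idempotent changes make the snapshot-to-log transition easier. It assumes each event carries the full row state after the change, so re-applying an event that the snapshot already reflects does no harm; the event layout and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def apply_event(table: Dict[int, dict], event: dict) -> None:
    """Apply one change event; safe to apply more than once."""
    if event["op"] == "delete":
        table.pop(event["key"], None)       # deleting a missing row is a no-op
    else:
        table[event["key"]] = event["after"]  # insert/update is an upsert


def replicate(snapshot_rows: Iterable[Tuple[int, dict]],
              change_events: Iterable[dict]) -> Dict[int, dict]:
    """Load the snapshot, then apply every change committed since it started."""
    table = dict(snapshot_rows)
    for event in change_events:
        apply_event(table, event)
    return table


# The snapshot was read while writes continued, so it already reflects
# some of the changes that also appear in the log.
snapshot = [(1, {"qty": 5}), (3, {"qty": 1})]
log_since_snapshot_start: List[dict] = [
    {"op": "update", "key": 1, "after": {"qty": 5}},  # already in snapshot
    {"op": "insert", "key": 3, "after": {"qty": 1}},  # already in snapshot
    {"op": "delete", "key": 2},                        # row 2 already absent
    {"op": "update", "key": 3, "after": {"qty": 2}},  # genuinely new
]

replica = replicate(snapshot, log_since_snapshot_start)
# Redelivery after a failure: replaying the same ordered events changes nothing.
replayed = replicate(replica.items(), log_since_snapshot_start)
```

The same property helps downstream consumers that may see an event more than once after a failure and restart.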
  • 24. Debezium connectors
      • MySQL connector
          - Multiple MySQL topologies
          - GTIDs, DDL and DML, table filters, events mirror table structures
      • MongoDB connector
          - Replica set or sharded cluster
          - Only insert events have “after” state; others have patch operation
      • PostgreSQL connector
          - Provides server-side logical decoding plugin
          - Table filters, events mirror table structures
      • SQL Server and Oracle connectors coming next
  • 25-33. Using Debezium + Kafka Connect (MySQL, MySQL Connector, Kafka Connect, Apache Kafka™, Consumers)
      • Use existing Kafka cluster
      • Start Kafka Connect cluster
      • Deploy Debezium connector(s), begin snapshot, capture changes
      • Consume change events
      • Pause, undeploy, or redeploy connector at any time
      • Consumers will keep consuming or block until there are more events
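The "deploy Debezium connector(s)" step can be sketched as a request to the Kafka Connect REST API, which accepts a connector name and configuration via `POST /connectors`. The host, port, connector name, and credentials below are placeholders, and the Debezium property names follow older releases; they vary between Debezium versions, so treat this as an assumption to check against the documentation for the version in use.

```python
import json
import urllib.request

# Assumed Kafka Connect REST endpoint (8083 is the usual default port).
CONNECT_URL = "http://localhost:8083/connectors"

# Placeholder values; property names may differ in newer Debezium versions.
connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory",
    },
}


def build_deploy_request(url: str, definition: dict) -> urllib.request.Request:
    """Build (but do not send) the HTTP request that registers a connector."""
    body = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_deploy_request(CONNECT_URL, connector)
# To actually deploy against a running Connect cluster:
#     urllib.request.urlopen(request)
```

The same REST API covers the later steps: `PUT /connectors/{name}/pause` and `PUT /connectors/{name}/resume` pause and resume a connector, and `DELETE /connectors/{name}` undeploys it.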
  • 34. Using Debezium + Kafka Connect: Kafka Connect running several MySQL Connectors (MySQL) and a PostgreSQL Connector (PostgreSQL), Apache Kafka™, many Consumers
  • 35. Create data pipelines for data you already have: DB1, Extract (Kafka Connect Source Connector), Transform (Kafka Streams), Load (Kafka Connect Sink Connector), DB2
  • 36. Create data pipelines for data you already have: DB1, Extract (Kafka Connect Source Connector), Transform (Kafka Streams), Load (Kafka Connect Sink Connector), DB2; plus a second Kafka Streams and Kafka Connect Sink Connector feeding DB2
  • 37. Create data pipelines for data you already have: DB1, Kafka Connect Source Connector, Kafka Streams, Kafka Connect Sink Connector, DB2; plus a second Kafka Streams and Kafka Connect Sink Connector feeding DB2; Applications & Frameworks
  • 38. Summary
      • Just configure and deploy connectors
          - no custom code!
      • Continuously captures changes with low latency and without batching
      • Fault tolerant
          - failures only cause a delay in processing
          - still process events at least once
          - avoid dual-write problems
      • Use stream processing to combine/merge/join multiple low-level events
      • CDC is more complex, but the cost is amortized across multiple systems
      • Works with a limited set of DBMSes (for now): those that have APIs for CDC
  • 39. Interested? Want to contribute? debezium.io @debezium