SlideShare a Scribd company logo
CDC patterns in
Hi!
○ Mario Molina.
○ Data Engineer.
○ Working in data & all things related since 2005.
○ You can find me at:
mmolimar_
mmolimar
mmolimar
Kafka as a Data Hub
Needs
● System integration / Data replication.
○ They are usually done via API calls with an orchestration engine or service mesh for
microservices.
○ But… sometimes there are not such APIs or legacy systems might be too complicated.
● Migration.
○ From monolithic to a microservices architecture.
○ From one database type to another.
● Audit trail of changes.
○ Analysis of changes.
○ Identification of patterns.
What is CDC?
● Stands for Change Data Capture.
● Identifies changes that happen in source systems to sync others.
● Observability about what is happening in the applications (data layer).
● Most common scenario in database systems.
● Helps to avoid dual writes.
CHANGES
Patterns - Modification Date
● Adding a column on each table with the date/time for each record (last modification).
● This column must be available in all tables we want to track.
● Be sure this column is reliable set (in the app or even with database triggers).
● Query the data based on a date range.
● Take into account the number of inserts/updates for this range.
● Deletes might be an issue. Use logical deletes instead.
SELECT *
FROM sample_table
WHERE updated_time > now() - interval ‘1 second’
● Database triggers can store the change in the original table into a “shadow” table.
● Store the whole record or just the PK from the original table.
● Must be defined for each table to track.
● Adds overhead into the database.
● Ensure all the triggers are enabled.
Patterns - Triggers
TRIGGER
TABLE
● Changes in database are stored in its transaction log.
● Based on this log, changes are notified.
● There is no overhead in the database and no extra SQL/configs required.
● More reliable.
● Each database implements its own way of representing changes.
Patterns - Log based
Patterns - Diff or custom scripts
● Compute the difference between the previous and the current state.
● Via SQL or custom implementations (based on the data source).
● Might generate more overhead.
● Requires more maintenance.
● More oriented to specific use cases.
Considerations
● Cannot apply same approach for all data sources.
● Understanding of the data changed (context).
● Management of delete records.
● Schema changes in the data source.
● Initial workloads due to huge volume of data.
Some products
● Commercial
○ Oracle Golden Gate - https://www.oracle.com/integration/goldengate
○ Qlik Replicate - https://www.qlik.com/us/data-streaming/data-streaming-cdc
○ HVR - https://www.hvr-software.com/product/change-data-capture
○ AWS Data Migration Service - https://aws.amazon.com/dms
● Open source:
○ Debezium - https://debezium.io
○ Maxwell - https://maxwells-daemon.io
○ MongoDB - https://docs.mongodb.com/kafka-connector
What about Kafka?
App
App
Producers
App
App
App
App
Consumers
App
App
Sources
Connectors
Sinks
App
Streams
App
With Kafka Connect
● We already have a bunch of connectors:
○ Debezium (PostgreSQL, MySQL, SQL Server, MongoDB).
○ MongoDB.
○ Oracle CDC.
○ SQData.
○ TiDB.
○ …
● You can find more in Confluent Hub!
○ https://www.confluent.io/hub
Let’s DO this!
Demo - Database Sync
Kafka
Connect
mongo-kafka
kukulcan
kafka-connect-jdb
c
MongoDB
Kafka
Connect
PostgreSQL
○ A REPL for Apache Kafka.
○ Support POSIX and Windows OS.
○ Written in Scala, Java and Python.
○ Shells in:
○ Ammonite REPL.
○ Scala REPL.
○ JShell.
○ Python shell.
○ APIs for Admin, Producer, Consumer, Connect,
Streams, Schema Registry, and KSQL.
kukulcan
https://github.com/mmolimar/kukulcan
○ Scripts to run the demo in Kukulcan.
○ Source code:
○ https://github.com/mmolimar/meetups
○ Documentation:
○ https://github.com/mmolimar/meetups/tree/master/kafka-cdc
Ammonite scripts
○ Source code:
○ https://github.com/mongodb/mongo-kafka
○ Documentation:
○ https://docs.mongodb.com/kafka-connector
○ Confluent Hub:
○ https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
mongo-kafka
○ Source code:
○ https://github.com/confluentinc/kafka-connect-jdbc
○ Documentation:
○ https://docs.confluent.io/current/connect/kafka-connect-jdbc
○ Confluent Hub:
○ https://www.confluent.io/hub/confluentinc/kafka-connect-jdbc
kafka-connect-jdbc
Getting involved with Apache Kafka
○ Website: http://kafka.apache.org
○ Join the mailing lists:
○ users@kafka.apache.org
○ dev@kafka.apache.org
○ Slack: https://confluentcommunity.slack.com
○ Meetups: https://www.meetup.com/<LOCATION>-Kafka
○ Contribute: https://github.com/apache/kafka
○ Kafka Summit 2021: https://kafka-summit.org
THANKS!
Any questions?
mmolimar
mmolimar
mmolimar_

More Related Content

What's hot

Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
confluent
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Adam Doyle
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
Databricks
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Paris Data Engineers !
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
Databricks
 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data Capture
Databricks
 
From my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debeziumFrom my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debezium
Clement Demonchy
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
Prakash Chockalingam
 
Change Data Streaming Patterns For Microservices With Debezium (Gunnar Morlin...
Change Data Streaming Patterns For Microservices With Debezium (Gunnar Morlin...Change Data Streaming Patterns For Microservices With Debezium (Gunnar Morlin...
Change Data Streaming Patterns For Microservices With Debezium (Gunnar Morlin...
confluent
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Databricks
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
CalvinSim10
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
Kai Wähner
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
Alluxio, Inc.
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
Jun Rao
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
Databricks
 

What's hot (20)

Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data Capture
 
From my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debeziumFrom my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debezium
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Change Data Streaming Patterns For Microservices With Debezium (Gunnar Morlin...
Change Data Streaming Patterns For Microservices With Debezium (Gunnar Morlin...Change Data Streaming Patterns For Microservices With Debezium (Gunnar Morlin...
Change Data Streaming Patterns For Microservices With Debezium (Gunnar Morlin...
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 

Similar to CDC patterns in Apache Kafka®

Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
C4Media
 
Change data capture
Change data captureChange data capture
Change data capture
Ron Barabash
 
How We Added Replication to QuestDB - JonTheBeach
How We Added Replication to QuestDB - JonTheBeachHow We Added Replication to QuestDB - JonTheBeach
How We Added Replication to QuestDB - JonTheBeach
javier ramirez
 
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
javier ramirez
 
Liquibase case study
Liquibase case studyLiquibase case study
Liquibase case study
Vivek Dhayalan
 
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time MLMongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
ScyllaDB
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
NETWAYS
 
Schema migration in agile environmnets
Schema migration in agile environmnetsSchema migration in agile environmnets
Schema migration in agile environmnets
Vivek Dhayalan
 
Megastore by Google
Megastore by GoogleMegastore by Google
Megastore by Google
Ankita Kapratwar
 
Rails DB migrate SAFE.pdf
Rails DB migrate SAFE.pdfRails DB migrate SAFE.pdf
Rails DB migrate SAFE.pdf
GowthamvelPalanivel
 
Cloud Architecture best practices
Cloud Architecture best practicesCloud Architecture best practices
Cloud Architecture best practices
Omid Vahdaty
 
Presentation mongo db munich
Presentation mongo db munichPresentation mongo db munich
Presentation mongo db munich
MongoDB
 
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
Dave Stokes
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and how
Petr Zapletal
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
DataWorks Summit
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
Knoldus Inc.
 
SAP Migration Overview
SAP Migration OverviewSAP Migration Overview
SAP Migration Overview
Sitaram Kotnis
 
Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus
Ashnikbiz
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPAS
Ashnikbiz
 
Migrate to Microservices Judiciously!
Migrate to Microservices Judiciously!Migrate to Microservices Judiciously!
Migrate to Microservices Judiciously!
pflueras
 

Similar to CDC patterns in Apache Kafka® (20)

Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Change data capture
Change data captureChange data capture
Change data capture
 
How We Added Replication to QuestDB - JonTheBeach
How We Added Replication to QuestDB - JonTheBeachHow We Added Replication to QuestDB - JonTheBeach
How We Added Replication to QuestDB - JonTheBeach
 
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
 
Liquibase case study
Liquibase case studyLiquibase case study
Liquibase case study
 
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time MLMongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
 
Schema migration in agile environmnets
Schema migration in agile environmnetsSchema migration in agile environmnets
Schema migration in agile environmnets
 
Megastore by Google
Megastore by GoogleMegastore by Google
Megastore by Google
 
Rails DB migrate SAFE.pdf
Rails DB migrate SAFE.pdfRails DB migrate SAFE.pdf
Rails DB migrate SAFE.pdf
 
Cloud Architecture best practices
Cloud Architecture best practicesCloud Architecture best practices
Cloud Architecture best practices
 
Presentation mongo db munich
Presentation mongo db munichPresentation mongo db munich
Presentation mongo db munich
 
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and how
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
SAP Migration Overview
SAP Migration OverviewSAP Migration Overview
SAP Migration Overview
 
Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPAS
 
Migrate to Microservices Judiciously!
Migrate to Microservices Judiciously!Migrate to Microservices Judiciously!
Migrate to Microservices Judiciously!
 

More from confluent

Il Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazioneIl Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazione
confluent
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
confluent
 
Break data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud ConnectorsBreak data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud Connectors
confluent
 
Building API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructureBuilding API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructure
confluent
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
confluent
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
confluent
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
confluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
confluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
confluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
confluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
confluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
confluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
confluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
confluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
confluent
 

More from confluent (20)

Il Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazioneIl Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazione
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
 
Break data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud ConnectorsBreak data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud Connectors
 
Building API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructureBuilding API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructure
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 

Recently uploaded

Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
rajancomputerfbd
 
Feature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptxFeature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptx
ssuser1915fe1
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
Adam Dunkels
 
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Kunal Gupta
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
shanihomely
 
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
Edge AI and Vision Alliance
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
SynapseIndia
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Muhammad Ali
 
Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
Safe Software
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
maigasapphire
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
Google Developer Group - Harare
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Torry Harris
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
bhumivarma35300
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
moinahousna
 
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
sunilverma7884
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Networks
 
Pigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending PlantPigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending Plant
LINUS PROJECTS (INDIA)
 

Recently uploaded (20)

Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
 
Feature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptxFeature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptx
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
 
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
 
Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
 
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
 
Pigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending PlantPigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending Plant
 

CDC patterns in Apache Kafka®

  • 2. Hi! ○ Mario Molina. ○ Data Engineer. ○ Working in data & all things related since 2005. ○ You can find me at: mmolimar_ mmolimar mmolimar
  • 3. Kafka as a Data Hub
  • 4. Needs ● System integration / Data replication. ○ They are usually done via API calls with an orchestration engine or service mesh for microservices. ○ But… sometimes there are not such APIs or legacy systems might be too complicated. ● Migration. ○ From monolithic to a microservices architecture. ○ From one database type to another. ● Audit trail of changes. ○ Analysis of changes. ○ Identification of patterns.
  • 5. What is CDC? ● Stands for Change Data Capture. ● Identifies changes that happen in source systems to sync others. ● Observability about what is happening in the applications (data layer). ● Most common scenario in database systems. ● Helps to avoid dual writes. CHANGES
  • 6. Patterns - Modification Date ● Adding a column on each table with the date/time for each record (last modification). ● This column must be available in all tables we want to track. ● Be sure this column is reliable set (in the app or even with database triggers). ● Query the data based on a date range. ● Take into account the number of inserts/updates for this range. ● Deletes might be an issue. Use logical deletes instead. SELECT * FROM sample_table WHERE updated_time > now() - interval ‘1 second’
  • 7. ● Database triggers can store the change in the original table into a “shadow” table. ● Store the whole record or just the PK from the original table. ● Must be defined for each table to track. ● Adds overhead into the database. ● Ensure all the triggers are enabled. Patterns - Triggers TRIGGER TABLE
  • 8. ● Changes in database are stored in its transaction log. ● Based on this log, changes are notified. ● There is no overhead in the database and no extra SQL/configs required. ● More reliable. ● Each database implements its own way of representing changes. Patterns - Log based
  • 9. Patterns - Diff or custom scripts ● Compute the difference between the previous and the current state. ● Via SQL or custom implementations (based on the data source). ● Might generate more overhead. ● Requires more maintenance. ● More oriented to specific use cases.
  • 10. Considerations ● Cannot apply same approach for all data sources. ● Understanding of the data changed (context). ● Management of delete records. ● Schema changes in the data source. ● Initial workloads due to huge volume of data.
  • 11. Some products ● Commercial ○ Oracle Golden Gate - https://www.oracle.com/integration/goldengate ○ Qlik Replicate - https://www.qlik.com/us/data-streaming/data-streaming-cdc ○ HVR - https://www.hvr-software.com/product/change-data-capture ○ AWS Data Migration Service - https://aws.amazon.com/dms ● Open source: ○ Debezium - https://debezium.io ○ Maxwell - https://maxwells-daemon.io ○ MongoDB - https://docs.mongodb.com/kafka-connector
  • 13. With Kafka Connect ● We already have a bunch of connectors: ○ Debezium (PostgreSQL, MySQL, SQL Server, MongoDB). ○ MongoDB. ○ Oracle CDC. ○ SQData. ○ TiDB. ○ … ● You can find more in Confluent Hub! ○ https://www.confluent.io/hub
  • 15. Demo - Database Sync Kafka Connect mongo-kafka kukulcan kafka-connect-jdb c MongoDB Kafka Connect PostgreSQL
  • 16. ○ A REPL for Apache Kafka. ○ Support POSIX and Windows OS. ○ Written in Scala, Java and Python. ○ Shells in: ○ Ammonite REPL. ○ Scala REPL. ○ JShell. ○ Python shell. ○ APIs for Admin, Producer, Consumer, Connect, Streams, Schema Registry, and KSQL. kukulcan https://github.com/mmolimar/kukulcan
  • 17. ○ Scripts to run the demo in Kukulcan. ○ Source code: ○ https://github.com/mmolimar/meetups ○ Documentation: ○ https://github.com/mmolimar/meetups/tree/master/kafka-cdc Ammonite scripts
  • 18. ○ Source code: ○ https://github.com/mongodb/mongo-kafka ○ Documentation: ○ https://docs.mongodb.com/kafka-connector ○ Confluent Hub: ○ https://www.confluent.io/hub/mongodb/kafka-connect-mongodb mongo-kafka
  • 19. ○ Source code: ○ https://github.com/confluentinc/kafka-connect-jdbc ○ Documentation: ○ https://docs.confluent.io/current/connect/kafka-connect-jdbc ○ Confluent Hub: ○ https://www.confluent.io/hub/confluentinc/kafka-connect-jdbc kafka-connect-jdbc
  • 20. Getting involved with Apache Kafka ○ Website: http://kafka.apache.org ○ Join the mailing lists: ○ users@kafka.apache.org ○ dev@kafka.apache.org ○ Slack: https://confluentcommunity.slack.com ○ Meetups: https://www.meetup.com/<LOCATION>-Kafka ○ Contribute: https://github.com/apache/kafka ○ Kafka Summit 2021: https://kafka-summit.org