Friends don’t let friends do dual-writes!
Introducing Change Data
Capture with Debezium
Cheng Kuan Gan
Senior Specialist Solution Architect
Red Hat APAC
CHANGE DATA CAPTURE

Agenda
● The Issue with Dual Writes
○ What's the problem?
○ Change data capture to the rescue!
● CDC Use Cases & Patterns
○ Replication
○ Audit Logs
○ Microservices
● Practical Matters
○ Deployment Topologies
○ Running on Kubernetes
○ Single Message Transforms

Better Solution
Stream change events from the database
Diagram: Order Service -> database -> Change Data Capture -> C | C | U | C | U | U | D -> Search Index, Cache
C - Create
U - Update
D - Delete
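To make the change stream concrete, here is a minimal Python sketch of a consumer that applies create, update and delete events to a cache and a toy search index. The event tuples, the `item` field and the `apply_event` helper are all hypothetical; a real consumer would read Debezium events from Kafka rather than from a list.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical event shape: (op, key, row), with op "C", "U" or "D" as in the legend.
Event = Tuple[str, int, Dict[str, Any]]


def apply_event(cache: Dict[int, Dict[str, Any]],
                index: Dict[str, Set[int]],
                event: Event) -> None:
    """Apply one change event to a key-value cache and an inverted index."""
    op, key, row = event
    if op not in ("C", "U", "D"):
        raise ValueError(f"unknown operation: {op}")

    # Drop the previous row state first so updates leave no stale index terms.
    old = cache.pop(key, None)
    if old is not None:
        for term in old["item"].lower().split():
            index[term].discard(key)
            if not index[term]:
                del index[term]

    # Creates and updates store the new row state; deletes store nothing.
    if op in ("C", "U"):
        cache[key] = row
        for term in row["item"].lower().split():
            index.setdefault(term, set()).add(key)


# Same operation order as the stream on the slide: C | C | U | C | U | U | D
events: List[Event] = [
    ("C", 1, {"item": "red bike"}),
    ("C", 2, {"item": "blue bike"}),
    ("U", 1, {"item": "red scooter"}),
    ("C", 3, {"item": "green helmet"}),
    ("U", 2, {"item": "blue scooter"}),
    ("U", 3, {"item": "green bike"}),
    ("D", 1, {}),
]

cache: Dict[int, Dict[str, Any]] = {}
index: Dict[str, Set[int]] = {}
for event in events:
    apply_event(cache, index, event)
```

Because both derived stores are fed from the same ordered stream, they converge on the same state as the database without the service writing to them directly.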

Debezium
Change Data Capture Platform
● CDC for multiple databases
○ Based on transaction logs
○ Snapshotting, Filtering etc.
● Fully open-source, very active community
● Latest version: 1.4
● Production deployments at multiple companies
(e.g. WePay, JW Player, Convoy, Trivago, OYO,
BlaBlaCar etc.)

Red Hat Integration CDC
Supported Databases
● GA Connectors:
○ MySQL
○ Postgres
○ SQL Server
○ MongoDB
○ DB2 (Linux only)
● Developer Preview:
○ Oracle 19 EE (LogMiner)

Advantages of Log-based CDC
Tailing the Transaction Logs
● All data changes are captured
● No polling delay or overhead
● Transparent to writing applications and models
● Can capture deletes
● Can capture old record state and further metadata
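The difference between tailing the log and polling with a query can be illustrated with a small simulation. This is a sketch under assumptions: `LOG` is a made-up list standing in for a transaction log, and `query_based` mimics a poller that selects rows whose `updated_at` is newer than the previous poll. Neither function is how Debezium or any particular polling connector is implemented.

```python
from typing import Dict, List, Optional, Tuple

# (commit time, operation, key, value): a made-up stand-in for a transaction log.
LogEntry = Tuple[int, str, str, Optional[int]]
LOG: List[LogEntry] = [
    (1, "insert", "a", 10),
    (2, "update", "a", 11),
    (3, "update", "a", 12),
    (4, "insert", "b", 20),
    (5, "delete", "b", None),
    (7, "delete", "a", None),
]


def log_based(log: List[LogEntry]) -> List[Tuple[str, str, Optional[int]]]:
    """Log-based capture: every committed change becomes exactly one event."""
    return [(op, key, value) for _, op, key, value in log]


def query_based(log: List[LogEntry], poll_times: List[int]) -> List[Tuple[str, str, int]]:
    """Query-based capture: each poll only sees rows that currently exist."""
    table: Dict[str, Tuple[int, int]] = {}  # key -> (value, updated_at)
    pending = list(log)
    events: List[Tuple[str, str, int]] = []
    last_poll = 0
    for now in poll_times:
        # Apply every write committed up to the time of this poll.
        while pending and pending[0][0] <= now:
            committed_at, op, key, value = pending.pop(0)
            if op == "delete":
                table.pop(key, None)
            else:
                table[key] = (value, committed_at)
        # Equivalent of: SELECT * FROM t WHERE updated_at > :last_poll
        for key, (value, updated_at) in sorted(table.items()):
            if updated_at > last_poll:
                events.append(("upsert", key, value))
        last_poll = now
    return events


log_events = log_based(LOG)
poll_events = query_based(LOG, poll_times=[3, 6, 9])
```

With polls at times 3, 6 and 9, the poller reports a single upsert for row `a` and never sees its intermediate values, the short-lived row `b`, or either delete, while the log-based reader emits all six changes.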

Log vs Query based CDC
                                                 Query-based   Log-based
All data changes are captured                    -             ✓
No polling delay or overhead                     -             ✓
Transparent to writing applications and models   -             ✓
Can capture deletes and old record state         -             ✓
Simple Installation/Configuration                ✓             -

Debezium
Change Event Structure
● Key: PK of table
● Value: Describing the change event
○ Before state,
○ After state,
○ Metadata info
● Serialization formats:
○ JSON
○ Avro
● CloudEvents can be used too
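As an illustration of the key/value structure, the sketch below parses one update event in Python. It assumes the JSON format without the schema wrapper, so the value carries `before`, `after`, `source`, `op` and `ts_ms` at the top level. All field values, the table name and the `describe` helper are invented for the example.

```python
import json
from typing import Any, Dict

# Illustrative event only: every value here is made up.
RAW_KEY = '{"id": 1004}'
RAW_VALUE = json.dumps({
    "before": {"id": 1004, "first_name": "Anne", "email": "annek@example.com"},
    "after": {"id": 1004, "first_name": "Anne Marie", "email": "annek@example.com"},
    "source": {"connector": "mysql", "db": "inventory", "table": "customers"},
    "op": "u",
    "ts_ms": 1610000000000,
})

# Operation codes used in the change event value.
OPERATIONS = {"c": "create", "u": "update", "d": "delete", "r": "read (snapshot)"}


def describe(raw_key: str, raw_value: str) -> Dict[str, Any]:
    """Summarize a change event: which row, which operation, which fields changed."""
    key = json.loads(raw_key)
    value = json.loads(raw_value)
    before = value.get("before") or {}  # null for creates
    after = value.get("after") or {}    # null for deletes
    changed = sorted(
        name for name in set(before) | set(after)
        if before.get(name) != after.get(name)
    )
    source = value["source"]
    return {
        "key": key,
        "operation": OPERATIONS[value["op"]],
        "table": f'{source["db"]}.{source["table"]}',
        "changed_fields": changed,
    }


summary = describe(RAW_KEY, RAW_VALUE)
```

Having both the before and after state in one message is what lets a consumer compute `changed_fields` without querying the database.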

Single Message Transformations
Modify events before storing in Kafka
Image Source: “Penknife, Swiss Army Knife” by Emilian Robert Vicol, used under CC BY 2.0
● Lightweight single message inline transformation
● Format conversions
○ Time/date fields
○ Extract new row state
● Aggregate sharded tables to single topic
● Keep compatibility with existing consumers
● Transformation does not interact with external systems
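Two of the transformations listed above, extracting the new row state and merging sharded tables into one topic, can be sketched in plain Python. Debezium ships its own SMTs for these jobs (`ExtractNewRecordState` and `ByLogicalTableRouter`), configured in Kafka Connect. The code below is not those classes, only a stand-in showing the idea, and the `_shardN` topic naming scheme is a hypothetical example.

```python
import re
from typing import Any, Dict, Optional, Tuple


def extract_new_row_state(value: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Flatten a change event envelope to just the row state after the change.

    This sketch drops deletes and tombstones (None values) by returning None.
    """
    if value is None or value.get("op") == "d":
        return None
    return value["after"]


# Hypothetical naming scheme: one topic per shard, e.g. "...customers_shard2".
SHARD_PATTERN = re.compile(r"^(?P<logical>.+\.customers)_shard\d+$")


def route_topic(topic: str) -> str:
    """Map every shard topic to one logical topic; leave other topics unchanged."""
    match = SHARD_PATTERN.match(topic)
    return match.group("logical") if match else topic


def transform(topic: str,
              value: Optional[Dict[str, Any]]) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Apply both steps to a single message, with no calls to external systems."""
    flat = extract_new_row_state(value)
    if flat is None:
        return None
    return route_topic(topic), flat


created = transform(
    "dbserver1.inventory.customers_shard2",
    {"op": "c", "before": None, "after": {"id": 1, "name": "Anne"}},
)
deleted = transform(
    "dbserver1.inventory.customers_shard2",
    {"op": "d", "before": {"id": 1, "name": "Anne"}, "after": None},
)
```

Each function looks only at the one message it is given, which matches the constraint that a transformation stays lightweight and inline.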

Data Replication
Zero-Code Streaming Pipelines
Diagram: MySQL -> DBZ MySQL (Kafka Connect) -> Apache Kafka
         PostgreSQL -> DBZ PG (Kafka Connect) -> Apache Kafka
         Apache Kafka -> ES Connector (Kafka Connect) -> ElasticSearch
         Apache Kafka -> SQL Connector (Kafka Connect) -> Data Warehouse
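"Zero-code" here means the pipeline is defined by connector configuration submitted to Kafka Connect rather than by application code. The Python sketch below builds such a registration request for a Debezium MySQL source connector. The host names, credentials, server id and topic names are placeholders, the Connect URL is an assumption, and the property names follow the Debezium 1.x MySQL connector as best I know, so check them against the documentation for your version.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumed address of a Kafka Connect worker's REST API.
CONNECT_URL = "http://localhost:8083/connectors"


def mysql_source_connector(name: str, hostname: str, tables: List[str]) -> Dict[str, Any]:
    """Build a connector definition; all values below are placeholders."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "database.hostname": hostname,
            "database.port": "3306",
            "database.user": "debezium",
            "database.password": "change-me",
            "database.server.id": "184054",
            "database.server.name": "dbserver1",
            "table.include.list": ",".join(tables),
            "database.history.kafka.bootstrap.servers": "kafka:9092",
            "database.history.kafka.topic": "schema-changes.inventory",
        },
    }


def build_request(connector: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) the POST that registers the connector."""
    return urllib.request.Request(
        CONNECT_URL,
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    mysql_source_connector("inventory-connector", "mysql", ["inventory.customers"])
)
# urllib.request.urlopen(request) would submit it to a running Connect worker.
```

A sink connector for the search index or the data warehouse would be registered in the same way with its own `connector.class` and settings.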

A Trucking Company Improves ELT Performance with Debezium
Source: Logs & Offsets: (Near) Real Time ELT with Apache Kafka + Snowflake
Low latency, zero data loss, and low maintenance are key to maintaining the user experience and data democratization
● The ELT system was not able to scale once headcount exceeded 700.
● Data that used to take 10-15 minutes to import now takes 1-2 hours.
● Some larger datasets see latencies of 6+ hours.
Modernized ELT improved significantly with Debezium

Auditing
CDC and a bit of Kafka Streams
Source: http://bit.ly/debezium-auditlogs
Diagram (built up over five slides): CRM Service -> Source DB -> DBZ (Kafka Connect) -> Apache Kafka
Transactions table in the Source DB:
Id     User      Use Case
tx-1   Bob       Create Customer
tx-2   Sarah     Delete Customer
tx-3   Rebecca   Update Customer
Topics in Apache Kafka: Customer Events, Transactions
Customer Events + Transactions -> Kafka Streams -> Enriched Customers
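The enrichment step can be sketched as a lookup of each customer change event against the transaction metadata by transaction id. This is plain Python standing in for the Kafka Streams application, not Kafka Streams code. The transaction rows come from the table above, while the customer events and their field names (`tx_id`, `op`, `customer_id`) are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Transaction metadata as captured from the Transactions table.
TRANSACTIONS: Dict[str, Dict[str, str]] = {
    "tx-1": {"user": "Bob", "use_case": "Create Customer"},
    "tx-2": {"user": "Sarah", "use_case": "Delete Customer"},
    "tx-3": {"user": "Rebecca", "use_case": "Update Customer"},
}

# Hypothetical customer change events, each tagged with its transaction id.
CUSTOMER_EVENTS: List[Dict[str, Any]] = [
    {"tx_id": "tx-1", "op": "c", "customer_id": 1},
    {"tx_id": "tx-3", "op": "u", "customer_id": 1},
    {"tx_id": "tx-2", "op": "d", "customer_id": 1},
    {"tx_id": "tx-9", "op": "u", "customer_id": 2},
]


def enrich(events: List[Dict[str, Any]],
           transactions: Dict[str, Dict[str, str]]
           ) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Join change events with transaction metadata on the transaction id.

    Events whose metadata has not arrived yet are returned separately so
    they can be retried instead of being dropped.
    """
    enriched: List[Dict[str, Any]] = []
    pending: List[Dict[str, Any]] = []
    for event in events:
        metadata = transactions.get(event["tx_id"])
        if metadata is None:
            pending.append(event)
        else:
            enriched.append({**event, **metadata})
    return enriched, pending


enriched, pending = enrich(CUSTOMER_EVENTS, TRANSACTIONS)
```

The result is an audit trail in which every data change also records who made it and for which use case, without the application writing a separate audit log.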

Microservices
Mono to micro: Strangler Pattern
Photo: “Strangler vines on trees, seen on the Mount Sorrow hike” by cynren, under CC BY SA 2.0
● Extract microservice for single component(s)
● Keep write requests against running monolith
● Stream changes to extracted microservice
● Test new functionality
● Switch over, evolve schema only afterwards
Mono to micro: Strangler Pattern (diagram sequence)
● Step 1: The monolith owns Customer.
● Step 2: A Router sends reads and writes to the monolith's Customer; CDC with a Transformation streams changes to Customer' in the microservice, which serves reads.
● Step 3: The Router sends reads and writes to both sides, with CDC on each path.
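The router's role in the strangler pattern can be sketched as a small routing rule: reads for extracted paths go to the new microservice, while writes stay with the monolith until the switch-over. The `StranglerRouter` class, the path prefixes and the target names are all hypothetical, and a real deployment would typically do this in an API gateway or reverse proxy.

```python
from dataclasses import dataclass, field
from typing import Set

READ_METHODS = {"GET", "HEAD"}


@dataclass
class StranglerRouter:
    """Decide per request whether the monolith or the microservice handles it."""
    read_prefixes: Set[str] = field(default_factory=set)   # reads already extracted
    write_prefixes: Set[str] = field(default_factory=set)  # writes already switched over

    def target(self, method: str, path: str) -> str:
        is_read = method.upper() in READ_METHODS
        prefixes = self.read_prefixes if is_read else self.write_prefixes
        if any(path.startswith(prefix) for prefix in prefixes):
            return "microservice"
        return "monolith"


# During migration: customer reads are served by the microservice, which is
# kept up to date through CDC; all writes still go to the monolith.
router = StranglerRouter(read_prefixes={"/customers"})
read_target = router.target("GET", "/customers/42")
write_target = router.target("POST", "/customers")
other_target = router.target("GET", "/orders/1")

# After switch-over: writes for the extracted component move as well.
router.write_prefixes.add("/customers")
write_target_after = router.target("POST", "/customers")
```

Keeping the routing decision in one place means the switch-over is a configuration change rather than a change to every caller.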