©Instaclustr Pty Limited, 2021
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL® Source Connector
Paul Brebner
Technology Evangelist, Instaclustr
December 2021
Instaclustr Managed Platform
A complete ecosystem to support mission-critical open source big data applications
This Talk Focuses On
Technologies
• Debezium CDC Use Case
• PostgreSQL® (source database)
• Kafka® + Kafka Connect® (streaming)
• Elasticsearch/OpenSearch® (sink system)
Open Source
• There’s nothing specific to our platform
• I used Instaclustr managed Kafka and Elasticsearch
Which Came First?
(Source: Shutterstock)
The state or the event?
What if you have state and want events?
Events and you want state?
Or, how can we speed up an Elephant (PostgreSQL) to be as fast as a Cheetah (Kafka)?
Cheetahs are the fastest land animal (top speed 120 km/h; they can accelerate from 0 to 100 km/h in 3 seconds), three times faster than elephants (40 km/h)
(Source: Shutterstock)
1. The Debezium PostgreSQL Connector
• The Debezium PostgreSQL connector captures row-level database changes and streams them to Kafka via Kafka Connect
• Runs as a Kafka source connector
• How does it get PostgreSQL change events? Does it poll with queries?
1. The Debezium PostgreSQL Connector
• As of PostgreSQL 10, there is a logical replication stream mode, called pgoutput, that is natively supported by PostgreSQL
• This means that a Debezium PostgreSQL connector can consume that replication stream [as a client] without the need for additional plug-ins
(Diagram: pgoutput → logical replication stream → pgoutput client)
1. The Debezium PostgreSQL Connector: Run It
• Download the Debezium PostgreSQL connector
• Deploy it:
o Upload to AWS S3 bucket
o Synchronise with Instaclustr managed Kafka Connect
o "io.debezium.connector.postgresql.PostgresConnector" will be in the list of available connectors on the console
• Configure PostgreSQL
o Set wal_level (write-ahead log level) to logical (the 3rd, non-default level; requires a server restart)
o Create a Debezium user with REPLICATION and LOGIN permissions
o These steps need PostgreSQL admin permissions
• Configure the Debezium connector and run it
o plugin.name must be changed from its default to pgoutput; also needs the PG username/password and IP
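The PostgreSQL steps above could be scripted roughly as follows. This is a minimal sketch, not the setup used in the talk: the role name debezium and the helpers quote_ident, quote_literal and debezium_setup_sql are hypothetical, and the function only builds the SQL text for an admin to review and run.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a PostgreSQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def debezium_setup_sql(user: str, password: str) -> list[str]:
    """Build (but do not execute) the statements a PostgreSQL admin would run."""
    return [
        # Takes effect only after the server is restarted.
        "ALTER SYSTEM SET wal_level = 'logical';",
        # REPLICATION allows replication connections; LOGIN allows connecting at all.
        f"CREATE ROLE {quote_ident(user)} WITH REPLICATION LOGIN "
        f"PASSWORD {quote_literal(password)};",
    ]


# Hypothetical role name and placeholder password, for illustration only.
for statement in debezium_setup_sql("debezium", "pg_password"):
    print(statement)
```

On a hosted PostgreSQL service these statements may not be available to you, which is why the slide notes that admin permissions are needed.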
Configure and Run
curl https://KafkaConnectIP:8083/connectors -X POST \
  -H 'Content-Type: application/json' -k -u kc_username:kc_password \
  -d '{
  "name": "debezium-test1",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "PG_IP",
    "database.port": "5432",
    "database.user": "pg_username",
    "database.password": "pg_password",
    "database.dbname": "postgres",
    "database.server.name": "test1",
    "plugin.name": "pgoutput"
  }
}'
If it worked you will see a single task running; tasks.max can only be 1
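One way to check for "a single task running" is to inspect the status document that Kafka Connect returns from GET /connectors/debezium-test1/status. The sketch below only interprets such a document; the sample is hand-written (the worker_id values are made up) and the helper names are hypothetical.

```python
def running_tasks(status: dict) -> int:
    """Count tasks reported as RUNNING in a Kafka Connect status document."""
    return sum(1 for task in status.get("tasks", []) if task.get("state") == "RUNNING")


def looks_healthy(status: dict) -> bool:
    """True when the connector itself and exactly one task are RUNNING."""
    connector_ok = status.get("connector", {}).get("state") == "RUNNING"
    return connector_ok and running_tasks(status) == 1


# Hand-written sample in the shape of a connector status response.
sample = {
    "name": "debezium-test1",
    "connector": {"state": "RUNNING", "worker_id": "worker-1:8083"},
    "tasks": [{"id": 0, "state": "RUNNING", "worker_id": "worker-1:8083"}],
    "type": "source",
}
print(looks_healthy(sample))
```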
Exploring the Debezium PostgreSQL Connector Change Data Events
A terrifying “Giraffosaurus” (T-Raffe?)!
(Source: Shutterstock)
CRUD? What Operations Result in Change?
Create/Insert? Yes
Read? No
Update? Yes
Delete? Yes
Table -> Topic Mapping: What Does the Kafka Record Look Like?
• CDC events from PostgreSQL: server + schema + table → Kafka topic: server.schema.table (e.g. test1.public.test1)
• Example insert record (table has id, v1 and v2 integer columns):
• Struct{after=Struct{id=1,v1=2,v2=3},source=Struct{version=1.6.1.Final,connector=postgresql,name=test1,ts_ms=1632457564326,db=postgres,sequence=["1073751912","1073751912"],schema=public,table=test1,txId=612,lsn=1073751968},op=c,ts_ms=1632457564351}
• Operation types: “op=c” (insert), “op=u” (update), “op=d” (delete)
• For insert and update there’s an “after” record with the id and values after the transaction committed
• For delete there’s a “before” record which shows only the id, with NULL for the other values
• And lots of metadata
• The documentation led me to believe I would be seeing JSON with schema metadata. What’s wrong?
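The example topic used later in the talk, test1.public.test1, is made up of the server name, schema and table that also appear in the record's source metadata. A small sketch of that naming and of the three operation codes listed on the slide (the helper names are hypothetical):

```python
# Operation codes as listed on the slide.
OPERATIONS = {"c": "insert", "u": "update", "d": "delete"}


def topic_for(server: str, schema: str, table: str) -> str:
    """Build the topic name from the connector's server name, schema and table."""
    return f"{server}.{schema}.{table}"


def describe(op: str) -> str:
    """Translate an op code into a readable name; anything else is 'unknown'."""
    return OPERATIONS.get(op, "unknown")


print(topic_for("test1", "public", "test1"))
print(describe("c"))
```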
Updated Connector Configuration and JSON Record Example
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "true"
With schemas enabled the records are verbose (each has a schema and a payload). Turn schemas off ("false") and each record carries the payload only. An insert now looks like this:
{"before":null,"after":{"id":10,"v1":10,"v2":10},"source":{"version":"1.6.1.Final","connector":"postgresql","name":"test1","ts_ms":1632717503331,"snapshot":"false","db":"postgres","sequence":"[\"1946172256\",\"1946172256\"]","schema":"public","table":"test1","txId":1512,"lsn":59122909632,"xmin":null},"op":"c","ts_ms":1632717503781,"transaction":null}
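To make the "events to state" direction concrete, here is a minimal sketch that replays schemaless change events into an in-memory table keyed by id. The function name apply_change and the sample update and delete events are hypothetical; the events are trimmed to the fields the sketch reads.

```python
import json


def apply_change(state: dict, raw_event: str, key: str = "id") -> dict:
    """Apply one schemaless change event (JSON text) to an in-memory table."""
    event = json.loads(raw_event)
    op = event["op"]
    if op in ("c", "u"):
        row = event["after"]
        state[row[key]] = row  # insert, or overwrite the existing row
    elif op == "d":
        state.pop(event["before"][key], None)  # remove the row if present
    return state


insert = '{"before": null, "after": {"id": 10, "v1": 10, "v2": 10}, "op": "c"}'
update = '{"before": null, "after": {"id": 10, "v1": 11, "v2": 10}, "op": "u"}'
delete = '{"before": {"id": 10, "v1": null, "v2": null}, "after": null, "op": "d"}'

table = {}
for raw in (insert, update):
    apply_change(table, raw)
print(table)
apply_change(table, delete)
print(table)
```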
Two T’s - Truncations - Transactions
• Truncate
o Is also a PostgreSQL operation (removes all rows from a table)
o What would you expect to happen?
o Lots of deletes? No, nothing
o Truncate events are turned off by default
• Transactions
o PostgreSQL is a real SQL transactional database
o What happens when multiple tables are changed in a single transaction?
o You get multiple Kafka records, with the same transaction ID
o The transaction ID can optionally be written to another Kafka topic
• Note that to process Truncations and Transactions the Kafka sink connector needs to be pretty intelligent, and the semantics will depend on the target sink system
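Because records from one transaction share a transaction ID, a consumer can regroup them. The sketch below groups events by the txId field found in the source metadata of the example records; the second table name (test2) and the function name are hypothetical.

```python
from collections import defaultdict


def group_by_transaction(events: list[dict]) -> dict:
    """Group change events by the transaction ID in their source metadata."""
    groups = defaultdict(list)
    for event in events:
        groups[event["source"]["txId"]].append(event)
    return dict(groups)


# Trimmed, hand-written events: one transaction touching two tables, then another.
events = [
    {"op": "c", "source": {"table": "test1", "txId": 612}},
    {"op": "c", "source": {"table": "test2", "txId": 612}},
    {"op": "u", "source": {"table": "test1", "txId": 613}},
]
for tx_id, members in group_by_transaction(events).items():
    print(tx_id, [e["source"]["table"] for e in members])
```

Grouping alone does not say when a transaction is complete, which is one reason a sink connector needs extra logic to handle transactions properly.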
Debezium PostgreSQL Connector Throughput
How fast can a Debezium PostgreSQL Connector run?
(Source: Shutterstock)
1 Task Only
Throughput limited to 7,000 events/s per task
But the PostgreSQL server is capable of 41,000 inserts/s (about 6x more, from previous tests)
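The gap between the two figures can be worked out directly. This sketch assumes throughput would scale linearly with the number of single-task connectors, which the talk did not test.

```python
import math

TASK_EVENTS_PER_S = 7_000   # measured limit for the single Debezium task
PG_INSERTS_PER_S = 41_000   # PostgreSQL insert rate from the earlier tests

ratio = PG_INSERTS_PER_S / TASK_EVENTS_PER_S
# Assumption: each extra single-task connector adds another 7,000 events/s.
connectors_needed = math.ceil(ratio)
print(f"ratio: {ratio:.2f}, single-task connectors needed: {connectors_needed}")
```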
Solutions?
Most workloads will be more balanced between reads/writes
One lane may be fine!
(Sources: Paul Brebner & Shutterstock)
Solutions?
A wider bridge?
(Source: Paul Brebner)
Solutions?
Multiple connectors, 1 per table?
Multiple replication slots
Solutions?
Multiple connectors, 1 per table?
Odd Behaviour
One connector watching 2 tables
Multiple changes to 1 before the other
Changes in 1st table all processed before any changes in the other (10m delay!)
Multiple connectors may be best practice
Only 1 table at a time is processed?
(Source: Shutterstock)
What if there are lots of tables (and databases)?
Better? 1 connector per table “group” (tables common to a service, tables with similar change rates, etc.)
(Diagram: Tables 1 to 6, Debezium Connector 1 and 2, Service 1 and 2)
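A connector per table group could be generated from one base configuration. This is a sketch under assumptions, not the configuration used in the talk: the group names and table names are made up, and the per-connector properties (table.include.list, slot.name, and a distinct database.server.name) are the Debezium 1.x property names as I understand them, so check them against the documentation for your version.

```python
import json


def connector_configs(server: str, groups: dict[str, list[str]], base: dict) -> list[dict]:
    """Build one connector definition per table group, each with its own slot."""
    configs = []
    for group, tables in groups.items():
        config = dict(base)  # copy so the shared settings are not mutated
        config.update({
            "database.server.name": f"{server}_{group}",  # distinct logical name
            "slot.name": f"debezium_{group}",             # one replication slot each
            "table.include.list": ",".join(tables),       # schema-qualified tables
        })
        configs.append({"name": f"debezium-{group}", "config": config})
    return configs


base = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
}
groups = {
    "service1": ["public.table1", "public.table2", "public.table3"],
    "service2": ["public.table4", "public.table5", "public.table6"],
}
for definition in connector_configs("test1", groups, base):
    print(json.dumps(definition, indent=2))
```

Each printed definition has the same shape as the body posted to /connectors earlier, with the database connection settings still to be added to base.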
Streaming Debezium PostgreSQL Connector Change Data Capture Events Into Elasticsearch With Kafka Sink Connectors
The final metamorphosis, from Cheetah (Kafka) to Rhino (Elasticsearch)!
(Source: Shutterstock)
What Can You Do With the CDC Data Once It’s in Kafka?
Stream it into 1 or more sink systems, e.g. Elasticsearch
Pipeline Blog Series
Berlin Beer Pipes?
(Source: Paul Brebner)
Reuse Kafka Elasticsearch sink connectors
Worked well with schemaless JSON data
Camel Sink Connector?
Missing a class (“org.elasticsearch.rest.BytesRestResponse”)
Gave up!
(Source: Shutterstock)
Tried the Lenses Connector
Example configuration
To process 7,000 events/s you need more tasks, partitions, and Elasticsearch shards, and probably the Bulk API!
curl https://KC_IP:8083/connectors/elastic-sink-tides/config -k -u KC_user:KC_password \
  -X PUT -H 'Content-Type: application/json' -d '
{
  "connector.class": "com.datamountaineer.streamreactor.connect.elastic7.ElasticSinkConnector",
  "tasks.max": 100,
  "topics": "test1.public.test1",
  "connect.elastic.hosts": "ES_IP",
  "connect.elastic.port": 9201,
  "connect.elastic.kcql": "INSERT INTO test-index SELECT * FROM test1.public.test1",
  "connect.elastic.use.http.username": "ES_user",
  "connect.elastic.use.http.password": "ES_password"
}'
All Events Are “Inserts” Into Elasticsearch
But we have “before” and “after”?!
Get rid of the “before” part of each event with a Single Message Transformation on the source connector side
curl https://KC_IP:8083/connectors -X POST -H 'Content-Type: application/json' -k \
  -u kc_user:kc_password -d '{
  "name": "debezium-test1",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "pg_ip",
    "database.port": "5432",
    "database.user": "pg_user",
    "database.password": "pg_password",
    "database.dbname": "postgres",
    "database.server.name": "test1",
    "plugin.name": "pgoutput",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
  }
}'
This “event flattening” SMT extracts the after field from a Debezium change event and creates a simple Kafka record with the after field contents.
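The basic effect of event flattening can be imitated in a few lines. This is only an illustration of "keep the after field": the function name flatten is hypothetical, and the real SMT has further options for deletes and tombstones that are not modelled here.

```python
def flatten(envelope: dict):
    """Mimic the basic effect of event flattening: keep only the 'after' row."""
    return envelope.get("after")


# Trimmed, hand-written change events in the schemaless envelope format.
insert = {"before": None, "after": {"id": 10, "v1": 10, "v2": 10}, "op": "c"}
delete = {"before": {"id": 10, "v1": None, "v2": None}, "after": None, "op": "d"}

print(flatten(insert))
print(flatten(delete))
```

The delete case shows the catch: once only the after field is kept, a delete carries no row at all.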
How would we process updates and deletes?
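One possible direction, sketched under assumptions rather than taken from the talk, is for the sink side to read the unflattened change event and choose an action per operation, keyed by the row id. The action names ("upsert", "delete") and the function sink_action are hypothetical and not tied to any particular sink connector.

```python
def sink_action(event: dict, key: str = "id") -> tuple:
    """Map one change event to (action, document id, document)."""
    op = event["op"]
    if op in ("c", "u"):
        row = event["after"]
        return ("upsert", row[key], row)   # same id overwrites the old document
    if op == "d":
        return ("delete", event["before"][key], None)
    raise ValueError(f"unsupported operation: {op!r}")


update = {"before": None, "after": {"id": 10, "v1": 11, "v2": 10}, "op": "u"}
delete = {"before": {"id": 10, "v1": None, "v2": None}, "after": None, "op": "d"}

print(sink_action(update))
print(sink_action(delete))
```

Using the row id as the document id is what turns a stream of "inserts" into updates in place, and it needs the before field that flattening discards for deletes.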
A Clever Test/Trick?!
Previous Tidal data ⇨ Elasticsearch pipeline, V2 modified to use PostgreSQL as sink
Pipeline 1: Tidal Data (REST source connector) → PostgreSQL (Events → State)
So I used the PostgreSQL Tidal data as the source system!
Simple test as only have “inserts”
Pipeline 2: PostgreSQL → Elasticsearch (State → Events → State)
Kibana Visualization of Tidal Data ⇨ Kafka Connect ⇨ PostgreSQL ⇨ Kafka Connect ⇨ Elasticsearch ⇨ Kibana
Solving the Chicken or Egg Dilemma
i.e. It doesn’t matter as long as we get to eat the omelet
(Source: Shutterstock)
Debezium PostgreSQL Conclusions
PostgreSQL Configuration
• Required to run the Debezium Source Connector
• Not yet supported in Instaclustr’s managed PG service
1 Task Only
• Limits throughput
• Issues with multiple tables per connector?
• Best practice to run multiple connectors, maybe 1 per group of “related” tables?
CDC Events
• Complex Kafka record structure
• Metadata and data
• Schema or schemaless?
• Truncate?
• Transactions?
Sink Connectors
• May need customization to understand CDC events and process them correctly for the target sink system
Debezium PostgreSQL Connector - NOTES
■ This talk covers a generic open source solution
● Using Debezium
● PostgreSQL
● Apache Kafka Connect
● OpenSearch
■ For hosted PostgreSQL
● You may need help with PostgreSQL configuration from cloud providers
■ But may be tricky to configure correctly
● For high throughput
● Many databases and tables
● For unbalanced changes across multiple tables
● I also didn’t test failover scenarios
■ The Debezium PostgreSQL Connector with Instaclustr’s Managed PostgreSQL service is on the roadmap for 2022
Further Information
Blogs
■ www.instaclustr.com/paul-brebner/
■ Lots of blogs using open source technologies: PostgreSQL, Apache Kafka, Apache Cassandra, Apache Spark, Apache ZooKeeper, Redis, Elasticsearch/OpenSearch, Cadence (new), etc.
■ For interesting use cases: IoT, ML, anomaly detection, geospatial, fintech, pipelines, etc.
■ Free Trial on homepage for all of these technologies
www.instaclustr.com
info@instaclustr.com
@instaclustr
THANK YOU!

Automatic Performance Modelling from Application Performance Management (APM)...
 

Recently uploaded

SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Source Connector

  • 1. ©Instaclustr Pty Limited, 2021 Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL® Source Connector Paul Brebner Technology Evangelist, Instaclustr December 2021
  • 2. Instaclustr Managed Platform A complete ecosystem to support mission critical open source big data applications
  • 3. This Talk Focuses On Technologies • Debezium CDC Use Case • PostgreSQL® (source database) • Kafka® + Kafka Connect® (streaming) • Elasticsearch/OpenSearch® (sink system) Open Source • There’s nothing specific to our platform • I used Instaclustr managed Kafka and Elasticsearch
  • 4. (Source: Shutterstock) Which Came First?
  • 5. Which Came First? (Source: Shutterstock) The state or the event? What if you have state and want events? Events and you want state?
  • 6. Or, how can you speed up an Elephant (PostgreSQL) to be as fast as a Cheetah (Kafka)? Cheetahs are the fastest land animal (top speed 120 km/hr; they can accelerate from 0 to 100 km/hr in 3 seconds), three times faster than elephants (40 km/hr). (Source: Shutterstock)
  • 7. 1. The Debezium PostgreSQL Connector • The Debezium PostgreSQL connector captures row-level database changes and streams them to Kafka via Kafka Connect. • Runs as a Kafka source connector • How does it get PostgreSQL change events? Does it poll with queries?
  • 8. 1. The Debezium PostgreSQL Connector • Since PostgreSQL 10, there is a logical replication stream mode, called pgoutput, that is natively supported by PostgreSQL • This means that a Debezium PostgreSQL connector can consume that replication stream [as a client] without the need for additional plug-ins (Diagram labels: pgoutput, logical replication stream, pgoutput client)
  • 9. 1. The Debezium PostgreSQL Connector: Run It
      • Download the Debezium PostgreSQL connector
      • Deploy it:
          o Upload to AWS S3 bucket
          o Synchronise with Instaclustr managed Kafka Connect
          o "io.debezium.connector.postgresql.PostgresConnector" will be in the list of available connectors on the console
      • Configure PostgreSQL
          o Set wal_level (write-ahead log level) to logical (the 3rd, non-default level; requires a server restart)
          o Create a Debezium user with REPLICATION and LOGIN permissions
          o These steps need PostgreSQL admin permissions
      • Configure the Debezium connector and run it
          o plugin.name must be set to pgoutput (it is not the default); you also need the PG username/password and IP
  • 10. Configure and Run
      curl https://KafkaConnectIP:8083/connectors -X POST -H 'Content-Type: application/json' -k -u kc_username:kc_password -d '{
        "name": "debezium-test1",
        "config": {
          "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
          "database.hostname": "PG_IP",
          "database.port": "5432",
          "database.user": "pg_username",
          "database.password": "pg_password",
          "database.dbname": "postgres",
          "database.server.name": "test1",
          "plugin.name": "pgoutput"
        }
      }'
      If it worked, you will see a single task running; tasks.max can only be 1.
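The registration body above can also be generated rather than typed by hand. The following is a minimal sketch, assuming you only want to build the JSON; `build_debezium_config` is a hypothetical helper name, and the placeholder host and credentials are the ones from the slide.

```python
import json


def build_debezium_config(name, pg_host, pg_user, pg_password,
                          dbname="postgres", server_name="test1", pg_port=5432):
    """Build the body for POST /connectors (hypothetical helper, not part of Debezium)."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "database.hostname": pg_host,
            "database.port": str(pg_port),
            "database.user": pg_user,
            "database.password": pg_password,
            "database.dbname": dbname,
            "database.server.name": server_name,
            # pgoutput ships with PostgreSQL 10+, so no extra plug-in is needed
            "plugin.name": "pgoutput",
        },
    }


payload = build_debezium_config("debezium-test1", "PG_IP", "pg_username", "pg_password")
body = json.dumps(payload, indent=2)  # pass this string to curl -d or an HTTP client
print(body)
```

Keeping the configuration as a dictionary makes it easy to add converter or transform settings later without editing a long quoted string.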
  • 11. Exploring the Debezium PostgreSQL Connector Change Data Events A terrifying “Giraffosaurus” (T-Raffe?)! (Source: Shutterstock)
  • 12. CRUD? What Operations Result in Change? Create/Insert? Yes Read? No Update? Yes Delete? Yes
  • 13. Table -> Topic Mapping: What Does the Kafka Record Look Like?
      • CDC events from PostgreSQL: server + schema + table -> Kafka topic: server.schema.table (e.g. test1.public.test1)
      • Example insert record (table has id, v1 and v2 integer columns):
        Struct{after=Struct{id=1,v1=2,v2=3},source=Struct{version=1.6.1.Final,connector=postgresql,name=test1,ts_ms=1632457564326,db=postgres,sequence=["1073751912","1073751912"],schema=public,table=test1,txId=612,lsn=1073751968},op=c,ts_ms=1632457564351}
      • Operation types: “op=c” (insert), “op=u” (update), “op=d” (delete)
      • For insert and update there’s an “after” record with the id and values after the transaction committed
      • For delete there’s a “before” record which shows the id and NULL values only
      • And lots of metadata
      • Documentation led me to believe I would be seeing JSON with schema metadata? What’s wrong?
  • 14. Updated Connector Configuration and JSON Record Example
      "value.converter": "org.apache.kafka.connect.json.JsonConverter"
      "value.converter.schemas.enable": "true"
      "key.converter": "org.apache.kafka.connect.json.JsonConverter"
      "key.converter.schemas.enable": "true"
      With schemas enabled the output is verbose, with both schema and payload records. Turn them off (set both schemas.enable properties to "false") and you get the implicit payload only. An insert now looks like this:
      {"before":null,"after":{"id":10,"v1":10,"v2":10},"source":{"version":"1.6.1.Final","connector":"postgresql","name":"test1","ts_ms":1632717503331,"snapshot":"false","db":"postgres","sequence":"[\"1946172256\",\"1946172256\"]","schema":"public","table":"test1","txId":1512,"lsn":59122909632,"xmin":null},"op":"c","ts_ms":1632717503781,"transaction":null}
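To make the record structure concrete, here is a small sketch of how a consumer might pick a schemaless JSON change event apart. `summarize_change_event` and `OP_NAMES` are hypothetical names, the sample is a trimmed copy of the insert shown on the slide, and the topic name is rebuilt from the source metadata on the assumption that the default server.schema.table naming is in use.

```python
import json

# Operation codes as listed on the slides
OP_NAMES = {"c": "insert", "u": "update", "d": "delete"}

# Trimmed copy of the insert event from the slide (some metadata fields omitted)
SAMPLE = (
    '{"before":null,"after":{"id":10,"v1":10,"v2":10},'
    '"source":{"version":"1.6.1.Final","connector":"postgresql","name":"test1",'
    '"db":"postgres","schema":"public","table":"test1","txId":1512,"lsn":59122909632},'
    '"op":"c","ts_ms":1632717503781}'
)


def summarize_change_event(raw):
    """Return the topic, operation, transaction id and row carried by one event."""
    event = json.loads(raw)
    source = event["source"]
    # Deletes carry the row in "before"; inserts and updates carry it in "after"
    row = event["before"] if event["op"] == "d" else event["after"]
    return {
        "topic": f'{source["name"]}.{source["schema"]}.{source["table"]}',
        "operation": OP_NAMES.get(event["op"], event["op"]),
        "tx_id": source["txId"],
        "row": row,
    }


summary = summarize_change_event(SAMPLE)
print(summary)
```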
  • 15. Two T’s: Truncations and Transactions
      • Truncate
          o Is also a PostgreSQL operation (removes all rows from a table)
          o What would you expect to happen?
          o Lots of deletes? No, nothing
          o Truncate events are turned off by default
      • Transactions
          o PostgreSQL is a real SQL transactional database
          o What happens when multiple tables are changed in a single transaction?
          o You get multiple Kafka records, with the same transaction ID
          o The transaction ID can optionally be written to another Kafka topic
      • Note that to process Truncations and Transactions the Kafka sink connector needs to be pretty intelligent, and the semantics will depend on the target sink system
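Because every record from one transaction carries the same transaction ID, a downstream consumer can regroup them. This is a minimal sketch under the assumption that events arrive as already-parsed dictionaries with the `source.txId` field seen in the earlier examples; `group_by_transaction` and the sample table names are hypothetical.

```python
from collections import defaultdict


def group_by_transaction(events):
    """Group parsed change events by their source transaction ID, keeping arrival order."""
    groups = defaultdict(list)
    for event in events:
        groups[event["source"]["txId"]].append(event)
    return dict(groups)


# Hypothetical events: one transaction touching two tables, then a second transaction
events = [
    {"op": "c", "source": {"table": "orders", "txId": 612}, "after": {"id": 1}},
    {"op": "u", "source": {"table": "stock", "txId": 612}, "after": {"id": 7}},
    {"op": "c", "source": {"table": "orders", "txId": 613}, "after": {"id": 2}},
]

by_tx = group_by_transaction(events)
for tx_id, tx_events in by_tx.items():
    tables = [e["source"]["table"] for e in tx_events]
    print(tx_id, tables)
```

Grouping alone does not tell you when a transaction is complete; that is where the optional transaction metadata topic would come in.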
  • 16. Debezium PostgreSQL Connector Throughput How fast can a Debezium PostgreSQL Connector run? (Source: Shutterstock)
  • 17. Debezium PostgreSQL Connector Throughput 1 Task Only (Source: Shutterstock)
  • 18. Debezium PostgreSQL Connector Throughput 1 Task Only Throughput limited to 7,000 events/s per task
  • 19. Debezium PostgreSQL Connector Throughput 1 Task Only But PostgreSQL server is capable of 41,000 inserts/s (6x more, from previous tests)
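The size of the gap is simple arithmetic on the two measured rates. The sketch below assumes, optimistically, that throughput scales linearly with the number of single-task connectors; that scaling was not measured in the talk.

```python
import math

CONNECTOR_EVENTS_PER_S = 7_000   # measured limit for one Debezium task
POSTGRES_INSERTS_PER_S = 41_000  # measured PostgreSQL insert rate from earlier tests

ratio = POSTGRES_INSERTS_PER_S / CONNECTOR_EVENTS_PER_S
# Assumption: linear scaling, so round up to whole connectors
connectors_needed = math.ceil(ratio)

print(f"gap: {ratio:.1f}x, connectors needed under linear scaling: {connectors_needed}")
```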
  • 20. Solutions? Most workloads will be more balanced between reads/writes One lane may be fine! (Sources: Paul Brebner & Shutterstock)
  • 21. Solutions? A wider bridge? (Source: Paul Brebner)
  • 22. Solutions? Multiple connectors 1 per table?
  • 23. Solutions? Multiple connectors, 1 per table? Multiple replication slots
  • 24. Odd Behaviour: One connector watching 2 tables, with multiple changes to 1 before the other. Changes in the 1st table were all processed before any changes in the other (10m delay!). Only 1 table at a time is processed? Multiple connectors may be best practice. (Source: Shutterstock)
  • 25. What if there are lots of tables (and databases)? Better? 1 connector per table “group” (tables common to a service, tables with similar change rates, etc.) (Diagram labels: Tables 1 to 6, Debezium Connectors 1 and 2, Services 1 and 2)
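One way to picture "1 connector per table group" is to generate a connector definition per group from a shared base configuration. This is a sketch only: `connector_configs`, the group names and the table names are hypothetical, and `table.include.list` and `slot.name` are Debezium connector properties that do not appear on the slides, so check them against the documentation for your Debezium version.

```python
def connector_configs(base_config, table_groups):
    """Return one connector definition per table group (hypothetical helper)."""
    configs = []
    for group_name, tables in table_groups.items():
        config = dict(base_config)  # copy so the groups do not share state
        config["database.server.name"] = group_name          # unique logical name per connector
        config["slot.name"] = f"debezium_{group_name}"        # one replication slot per connector
        config["table.include.list"] = ",".join(tables)       # schema-qualified table names
        configs.append({"name": f"debezium-{group_name}", "config": config})
    return configs


base = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "PG_IP",
    "database.dbname": "postgres",
    "plugin.name": "pgoutput",
}
groups = {
    "service1": ["public.table1", "public.table2", "public.table3"],
    "service2": ["public.table4", "public.table5", "public.table6"],
}

configs = connector_configs(base, groups)
for c in configs:
    print(c["name"], c["config"]["slot.name"], c["config"]["table.include.list"])
```

Each replication slot retains WAL on the PostgreSQL server until it is consumed, so the number of groups is a trade-off rather than a free choice.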
  • 26. Streaming Debezium PostgreSQL Connector Change Data Capture Events Into Elasticsearch With Kafka Sink Connectors The final metamorphosis, from Cheetah (Kafka) to Rhino (Elasticsearch)! (Source: Shutterstock)
  • 27. What Can You Do With the CDC Data Once It’s in Kafka? Stream it into 1 or more sink systems, e.g. Elasticsearch
  • 28. Pipeline Blog Series Berlin Beer Pipes? (Source: Paul Brebner) Reuse Kafka Elasticsearch sink connectors Worked well with schemaless JSON data
  • 29. Apache Camel Sink Connector? Missing a class (“org.elasticsearch.rest.BytesRestResponse”) Gave up! (Source: Shutterstock)
  • 30. Tried the Lenses Connector: Example configuration
      To process 7,000 events/s you need more tasks, partitions, and Elasticsearch shards, and probably the BULK API!
      curl https://KC_IP:8083/connectors/elastic-sink-tides/config -k -u KC_user:KC_password -X PUT -H 'Content-Type: application/json' -d '{
        "connector.class": "com.datamountaineer.streamreactor.connect.elastic7.ElasticSinkConnector",
        "tasks.max": 100,
        "topics": "test1.public.test1",
        "connect.elastic.hosts": "ES_IP",
        "connect.elastic.port": 9201,
        "connect.elastic.kcql": "INSERT INTO test-index SELECT * FROM test1.public.test1",
        "connect.elastic.use.http.username": "ES_user",
        "connect.elastic.use.http.password": "ES_password"
      }'
  • 31. All Events Are “Inserts” Into Elasticsearch
      But we have “before” and “after”?! Get rid of before events with a Single Message Transformation (SMT) on the source connector side:
      curl https://KC_IP:8083/connectors -X POST -H 'Content-Type: application/json' -k -u kc_user:kc_password -d '{
        "name": "debezium-test1",
        "config": {
          "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
          "database.hostname": "pg_ip",
          "database.port": "5432",
          "database.user": "pg_user",
          "database.password": "pg_password",
          "database.dbname": "postgres",
          "database.server.name": "test1",
          "plugin.name": "pgoutput",
          "value.converter": "org.apache.kafka.connect.json.JsonConverter",
          "value.converter.schemas.enable": "false",
          "key.converter": "org.apache.kafka.connect.json.JsonConverter",
          "key.converter.schemas.enable": "false",
          "transforms": "unwrap",
          "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
        }
      }'
      This “event flattening” SMT extracts the after field from a Debezium change event and creates a simple Kafka record with the after field contents.
  • 32. All Events Are “Inserts” Into Elasticsearch But we have “before” and “after”?! How would we process updates and deletes?
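One possible answer to the updates-and-deletes question is to keep the full change event (no flattening) and let sink-side logic map each operation to an action keyed by the row id. The sketch below is purely illustrative: `to_sink_action` is a hypothetical function, not the behaviour of any sink connector mentioned in the talk, and it assumes the table has an `id` key column as in the earlier examples.

```python
def to_sink_action(event, key_field="id"):
    """Map one full Debezium change event to an (action, key, document) tuple."""
    op = event["op"]
    if op == "d":
        # Deletes only carry the old row, so the key comes from "before"
        return ("delete", event["before"][key_field], None)
    if op in ("c", "u"):
        # Inserts and updates both become an upsert of the new row state
        doc = event["after"]
        return ("index", doc[key_field], doc)
    # Anything else (for example truncate events, if enabled) is ignored here
    return ("skip", None, None)


insert = {"op": "c", "before": None, "after": {"id": 10, "v1": 10, "v2": 10}}
update = {"op": "u", "before": None, "after": {"id": 10, "v1": 11, "v2": 10}}
delete = {"op": "d", "before": {"id": 10, "v1": None, "v2": None}, "after": None}

actions = [to_sink_action(e) for e in (insert, update, delete)]
for action in actions:
    print(action)
```

Using the row id as the document key means an update overwrites the earlier document instead of adding a new one, which is the state-from-events behaviour the sink would need.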
  • 33. A Clever Test/Trick?! Previous Tidal data ⇨ Elasticsearch pipeline, V2 modified to use PostgreSQL as sink. Pipeline 1: Tidal Data (REST source connector) ⇨ PostgreSQL
  • 34. A Clever Test/Trick?! Pipeline 1: Tidal Data (REST source connector) ⇨ PostgreSQL. Pipeline 2: PostgreSQL ⇨ Elasticsearch. So I used the PostgreSQL Tidal data as the source system! Simple test, as there are only “inserts”. (Diagram labels: Events, State, State, Events, State)
  • 35. Kibana Visualization of Tidal Data ⇨ Kafka Connect ⇨ PostgreSQL ⇨ Kafka Connect ⇨ Elasticsearch ⇨ Kibana
  • 36. Solving the Chicken or Egg Dilemma i.e. It doesn’t matter as long as we get to eat the omelet (Source: Shutterstock)
  • 37. Debezium PostgreSQL Conclusions
      • PostgreSQL Configuration: required to run the Debezium Source Connector; not yet supported in Instaclustr’s managed PG service
      • 1 Task Only: limits throughput. Issues with multiple tables per connector? Best practice to run multiple connectors, maybe 1 per group of “related” tables?
      • CDC Events: complex Kafka record structure; metadata and data; schema or schemaless? Truncate? Transactions?
      • Sink Connectors: may need customization to understand CDC events and process them correctly for the target sink system
  • 38. Debezium PostgreSQL Connector - NOTES
      ■ This talk covers a generic open source solution
          ● Using Debezium
          ● PostgreSQL
          ● Apache Kafka Connect
          ● OpenSearch
      ■ For hosted PostgreSQL
          ● You may need help with PostgreSQL configuration from cloud providers
      ■ But it may be tricky to configure correctly
          ● For high throughput
          ● For many databases and tables
          ● For unbalanced changes across multiple tables
          ● I also didn’t test failover scenarios
      ■ The Debezium PostgreSQL Connector with Instaclustr’s Managed PostgreSQL service is on the roadmap for 2022
  • 39. Further Information
      ■ Blogs: www.instaclustr.com/paul-brebner/
      ■ Lots of blogs using open source technologies: PostgreSQL, Apache Kafka, Apache Cassandra, Apache Spark, Apache Zookeeper, Redis, Elasticsearch/OpenSearch, Cadence (new), etc.
      ■ For interesting use cases: IoT, ML, anomaly detection, geospatial, fintech, pipelines, etc.
      ■ Free Trial on homepage for all of these technologies