Modern event-based/streaming distributed systems embrace the idea that change is inevitable and actually desirable! Without being change-aware, systems are inflexible, can't evolve or react, and are simply incapable of keeping up with real-time real-world data. But how can we speed up an "Elephant" (PostgreSQL) to be as fast as a "Cheetah" (Kafka)? In this talk, we'll introduce the Debezium PostgreSQL Connector; explain how to deploy, configure, and run it on a Kafka Connect cluster; explore the semantics and format of the change data events (including schemas and table-to-topic mapping); and test the performance. Finally, we'll show how to stream the change data events into an example downstream system, Elasticsearch, using an open source sink connector.
Presentation for PostgresConf.CN and PGConf.Asia 2021 https://www.highgo.ca/2022/01/19/2021-pg-asia-conference-delivered-another-successful-online-conference-again/
1. The Debezium PostgreSQL Connector — Run It
• Download the Debezium PostgreSQL connector
• Deploy it:
o Upload it to an AWS S3 bucket
o Synchronise with the Instaclustr managed Kafka Connect cluster
o "io.debezium.connector.postgresql.PostgresConnector" will then appear in the list of available connectors on the console
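Once deployed, the connector's availability can also be checked from the command line via the Kafka Connect REST API. A sketch, assuming the Connect cluster listens on the default port 8083 (the host name is a placeholder, and a managed service such as Instaclustr may additionally require basic-auth credentials):

```shell
# List the connector plugins installed on the Kafka Connect cluster.
# Replace connect-host with your cluster's address.
curl -s http://connect-host:8083/connector-plugins

# A successful deployment shows an entry whose "class" is
# "io.debezium.connector.postgresql.PostgresConnector".
```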
• Configure PostgreSQL
o Set wal_level (write-ahead log) to logical (the third, non-default level; requires a server restart)
o Create a Debezium user with the REPLICATION and LOGIN privileges
o Both steps require PostgreSQL admin permissions
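The PostgreSQL side of the setup above might look like the following sketch. The role name and password are placeholders, and on a managed PostgreSQL service the provider's console may perform some of these steps for you:

```shell
# Raise the WAL level to logical; the setting only takes effect
# after the server is restarted.
psql -U postgres -c "ALTER SYSTEM SET wal_level = logical;"

# After restarting PostgreSQL, verify the new level:
psql -U postgres -c "SHOW wal_level;"

# Create a dedicated user for Debezium with replication and
# login privileges (placeholder name and password).
psql -U postgres -c "CREATE ROLE debezium WITH REPLICATION LOGIN PASSWORD 'secret';"
```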
• Configure the Debezium connector and run it
o plugin.name must be set to pgoutput (overriding the default); the connector also needs the PostgreSQL username/password and IP address
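Pulling the configuration together, the connector can be registered via the Kafka Connect REST API. This is a sketch: the connector name, host names, credentials, and database name are placeholders, and the property names follow the Debezium 1.x PostgreSQL connector, which was current at the time of this talk:

```shell
# Register a Debezium PostgreSQL source connector on the Connect cluster.
curl -s -X POST http://connect-host:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "debezium-postgres-source",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "plugin.name": "pgoutput",
      "database.hostname": "postgres-host",
      "database.port": "5432",
      "database.user": "debezium",
      "database.password": "secret",
      "database.dbname": "mydb",
      "database.server.name": "pgserver1"
    }
  }'
```

In Debezium 1.x, database.server.name also serves as the logical prefix for the Kafka topics the connector writes change events to.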