Introducing Change Data Capture with Debezium

  1. Friends don’t let friends do dual-writes! Introducing Change Data Capture with Debezium. Cheng Kuan Gan, Senior Specialist Solution Architect, Red Hat APAC
  2. Agenda ● The Issue with Dual Writes: What's the problem? Change data capture to the rescue! ● CDC Use Cases & Patterns: Replication, Audit Logs, Microservices ● Practical Matters: Deployment Topologies, Running on Kubernetes, Single Message Transforms
  3. Common Problem: updating multiple resources. Order Service → Database
  4. Common Problem: updating multiple resources. Order Service → Database, Cache
  5. Common Problem: updating multiple resources. Order Service → Database, Cache, Search Index
  6. Common Problem: updating multiple resources. Order Service → Database, Cache, Search Index
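The failure mode behind these slides can be shown in a minimal sketch (plain Python with hypothetical in-memory stores standing in for the real database, cache, and search index): when the order service writes to each resource in turn, a failure between writes leaves them inconsistent.

```python
# Dual-write problem: three independent writes, no shared transaction.
database, cache, search_index = {}, {}, {}

def place_order(order_id, order, cache_fails=False):
    database[order_id] = order          # write 1 succeeds
    if cache_fails:
        raise RuntimeError("cache unavailable")
    cache[order_id] = order             # write 2 may never happen
    search_index[order_id] = order      # write 3 may never happen

place_order("o-1", {"item": "book"})
try:
    place_order("o-2", {"item": "pen"}, cache_fails=True)
except RuntimeError:
    pass

# "o-2" now exists in the database but nowhere else: the stores disagree.
assert "o-2" in database
assert "o-2" not in cache and "o-2" not in search_index
```

Retrying does not fix this in general (the retry itself can fail halfway), which is why the deck moves to streaming changes out of the database instead.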
  7. Friends Don't Let Friends Do Dual Writes!
  8. Better Solution: stream change events from the database. Order Service
  9. Better Solution: stream change events from the database. Order Service → C | C | U | C | U | U | D (Change Data Capture: C = Create, U = Update, D = Delete)
  10. Better Solution: stream change events from the database. Order Service → Change Data Capture (C = Create, U = Update, D = Delete) → Search Index, Cache
  11. Change Data Capture with Debezium: Debezium is an open-source distributed platform for change data capture
  12. Debezium: Change Data Capture Platform ● CDC for multiple databases ○ Based on transaction logs ○ Snapshotting, filtering, etc. ● Fully open source, very active community ● Latest version: 1.4 ● Production deployments at multiple companies (e.g. WePay, JW Player, Convoy, Trivago, OYO, BlaBlaCar)
  13. Red Hat Integration CDC: Supported Databases ● GA connectors ○ MySQL ○ PostgreSQL ○ SQL Server ○ MongoDB ○ DB2 (Linux only) ● Developer Preview ○ Oracle 19 EE (LogMiner)
  14. Advantages of Log-based CDC: tailing the transaction logs ● All data changes are captured ● No polling delay or overhead ● Transparent to writing applications and models ● Can capture deletes ● Can capture old record state and additional metadata
  15. Log- vs. Query-based CDC:
      Feature                                          Query-based   Log-based
      All data changes are captured                    -             yes
      No polling delay or overhead                     -             yes
      Transparent to writing applications and models   -             yes
      Can capture deletes and old record state         -             yes
      Simple installation/configuration                yes           -
  16. Debezium: Change Event Structure ● Key: primary key of the table ● Value: describes the change event ○ before state ○ after state ○ source metadata ● Serialization formats: ○ JSON ○ Avro ● The CloudEvents format can be used too
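The key/value structure above can be illustrated with a simplified example (values are hypothetical; the optional schema portion and most source-metadata fields are omitted). For an update to a customers row, a Debezium event looks roughly like:

```json
{
  "key": { "id": 1004 },
  "value": {
    "before": { "id": 1004, "email": "old@example.com" },
    "after":  { "id": 1004, "email": "new@example.com" },
    "source": { "connector": "mysql", "db": "inventory", "table": "customers" },
    "op": "u",
    "ts_ms": 1611234567000
  }
}
```

The "op" field is "c" for inserts, "u" for updates, and "d" for deletes; for a delete, "after" is null.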
  17. Single Message Transformations: modify events before storing in Kafka (Image: “Penknife, Swiss Army Knife” by Emilian Robert Vicol, used under CC BY 2.0) ● Lightweight inline transformation of single messages ● Format conversions ○ Time/date fields ○ Extract new row state ● Aggregate sharded tables into a single topic ● Keep compatibility with existing consumers ● Transformations do not interact with external systems
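As a concrete illustration of the "extract new row state" item, Debezium ships an ExtractNewRecordState SMT that flattens the before/after envelope down to the row state; a connector configuration fragment enabling it might look like this (a sketch; the option values shown are illustrative):

```json
{
  "transforms": "unwrap",
  "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
  "transforms.unwrap.drop.tombstones": "false",
  "transforms.unwrap.add.fields": "op,table"
}
```

This keeps existing consumers working against plain row records while the full change envelope stays available to consumers that want it.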
  18. Change Data Capture Use Cases & Patterns
  19. Data Replication (Zero-Code Streaming Pipelines): MySQL, PostgreSQL, Apache Kafka topics
  20. Data Replication (Zero-Code Streaming Pipelines): source and sink Kafka Connect clusters sit between the databases and Apache Kafka
  21. Data Replication (Zero-Code Streaming Pipelines): Debezium MySQL and Debezium Postgres source connectors stream changes into Apache Kafka
  22. Data Replication (Zero-Code Streaming Pipelines): an ES sink connector propagates the change events to Elasticsearch
  23. Data Replication (Zero-Code Streaming Pipelines): a SQL sink connector propagates the change events to a data warehouse
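"Zero-code" here means the whole pipeline is wired up through configuration. A sketch of registering a Debezium MySQL source connector with the Kafka Connect REST API (hostnames, credentials, and topic names are placeholder values for illustration):

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

POSTing this JSON to the Connect REST endpoint (e.g. `/connectors`) starts the connector; sink connectors for Elasticsearch or a data warehouse are registered the same way.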
  24. A Trucking Company Improves ELT Performance with Debezium (Source: Logs & Offsets: (Near) Real-Time ELT with Apache Kafka + Snowflake). Low latency, zero data loss, and low maintenance are key to maintaining the user experience and data democratization. ● The ELT system could not scale once employee growth exceeded 700+ ● Data that used to take 10-15 minutes to import took 1-2 hours ● Some larger datasets saw latencies of 6+ hours ● The modernized ELT pipeline improved significantly with Debezium
  25. Data Replication: Zero-Code Streaming Pipelines (Source: Logs & Offsets: (Near) Real-Time ELT with Apache Kafka + Snowflake)
  26. Auditing (CDC and a bit of Kafka Streams): the CRM Service writes to its source DB; a Debezium connector on Kafka Connect streams the changes into Apache Kafka. Source: http://bit.ly/debezium-auditlogs
  27. Auditing: the CRM Service also records transaction metadata (Id | User | Use Case): tx-1 | Bob | Create Customer; tx-2 | Sarah | Delete Customer; tx-3 | Rebecca | Update Customer
  28. Auditing: two topics result, Customer Events and Transactions
  29. Auditing: a Kafka Streams application consumes both topics
  30. Auditing: the Kafka Streams application joins them into an Enriched Customers topic
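The enrichment step above can be sketched in plain Python (this is not actual Kafka Streams code; topic contents and field names are illustrative): each change event carries a transaction id, and the Transactions topic maps that id to the user and use case that caused the change.

```python
# Contents of the "Transactions" topic: audit metadata keyed by tx id.
transactions = {
    "tx-1": {"user": "Bob",     "use_case": "Create Customer"},
    "tx-2": {"user": "Sarah",   "use_case": "Delete Customer"},
    "tx-3": {"user": "Rebecca", "use_case": "Update Customer"},
}

# Contents of the "Customer Events" topic: Debezium change events.
customer_events = [
    {"tx_id": "tx-1", "customer_id": 1001, "op": "c"},
    {"tx_id": "tx-3", "customer_id": 1001, "op": "u"},
]

def enrich(event):
    """Join one change event with its transaction metadata; the Kafka
    Streams app performs the equivalent join between the two topics."""
    return {**event, **transactions[event["tx_id"]]}

# The "Enriched Customers" topic: change events plus who did what, and why.
enriched_customers = [enrich(e) for e in customer_events]
assert enriched_customers[0]["user"] == "Bob"
```

In the real application the Transactions side is a table (latest metadata per tx id) and the join is performed continuously as events arrive.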
  31. Auditing: CDC and a bit of Kafka Streams. Source: http://bit.ly/debezium-auditlogs
  32. Microservices: Microservices Data Exchange ● Propagate data between different services without coupling ● Each service keeps optimised views locally
  33. Microservices: Outbox Pattern. Source: http://bit.ly/debezium-outbox-pattern
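The core of the outbox pattern can be sketched as follows (a hedged illustration with SQLite standing in for the service database; table and column names are assumptions, though an aggregate type/id/payload layout is typical): the business write and the event insert share one local transaction, and Debezium then captures the outbox table from the transaction log, so there is no dual write.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    aggregate_type TEXT, aggregate_id TEXT, type TEXT, payload TEXT)""")

def place_order(order_id, item):
    # One transaction: the order row and the outbox event commit together
    # or not at all, so the captured event stream never diverges from state.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, item))
        conn.execute(
            "INSERT INTO outbox (aggregate_type, aggregate_id, type, payload) "
            "VALUES (?, ?, ?, ?)",
            ("Order", str(order_id), "OrderCreated",
             json.dumps({"id": order_id, "item": item})))

place_order(42, "book")
row = conn.execute("SELECT type, payload FROM outbox").fetchone()
assert row[0] == "OrderCreated"
```

Downstream services consume the outbox events from Kafka; the outbox table itself can be emptied immediately after insert, since the log already saw the row.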
  34. Microservices: Mono to Micro, the Strangler Pattern (Photo: “Strangler vines on trees, seen on the Mount Sorrow hike” by cynren, under CC BY-SA 2.0) ● Extract a microservice for single component(s) ● Keep write requests against the running monolith ● Stream changes to the extracted microservice ● Test the new functionality ● Switch over; evolve the schema only afterwards
  35. Mono to Micro: Strangler Pattern. Starting point: the monolith owns the Customer component
  36. Mono to Micro: Strangler Pattern. A Router keeps sending reads/writes to the monolith's Customer component; CDC plus a transformation keeps the extracted Customer' service in sync, and it initially serves reads only
  37. Mono to Micro: Strangler Pattern. After switch-over the Router sends reads/writes to the extracted service; CDC runs in both directions to keep the monolith's Customer data in sync
  38. Demo
  39. Demo: two Kafka Connect clusters and Apache Kafka
  40. Running on OpenShift: getting the best cloud-native Apache Kafka running on enterprise Kubernetes
  41. Running on OpenShift: Cloud-native Apache Kafka ● Provides ○ Container images for Apache Kafka, Kafka Connect, ZooKeeper, and MirrorMaker ○ Kubernetes Operators for managing/configuring Apache Kafka clusters, topics, and users ○ Kafka Consumer, Producer, and Admin clients, Kafka Streams ● Upstream community: Strimzi
  42. Running on OpenShift: Deployment via Operators ● YAML-based custom resources for Kafka/Connect clusters, topics, etc. ● The Operator applies the configuration ● Advantages ○ Automated deployment and scaling ○ Simplified upgrades ○ Portability across clouds
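Such a deployment can be sketched as a Strimzi custom resource (a hedged example; the resource name, cluster address, and API version are illustrative assumptions): the Cluster Operator watches for this YAML and turns it into a running Kafka Connect deployment.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: debezium-connect
  annotations:
    # Let connectors also be managed as KafkaConnector custom resources.
    strimzi.io/use-connector-resources: "true"
spec:
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  config:
    group.id: connect-cluster
    offset.storage.topic: connect-offsets
    config.storage.topic: connect-configs
    status.storage.topic: connect-status
```

Applying an updated version of this resource is how scaling and upgrades happen; the Operator reconciles the running deployment to match the YAML.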
  43. V0000000 linkedin.com/company/red-hat youtube.com/user/RedHatVideos facebook.com/redhatinc twitter.com/RedHat 43 Red Hat is the world’s leading provider of enterprise open source software solutions. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500. Thank you Optional section marker or title