"Initial snapshots are a core feature of Debezium: when setting up a new CDC connector, existing tables can be scanned in order to export their full state to consumers, before starting to capture changes from the transaction log. While this works great in general, a few questions came up again and again in the Debezium community over time:
* How to re-snapshot just a single table?
* How to pause and resume long-running snapshots?
* How to run snapshots in parallel to reading changes from the log?
All this, and more, becomes possible with the notion of incremental snapshots. In this session you'll learn how this innovative scheme of interleaving snapshot queries and log-based change events works under the hood and how it solves common tasks when running CDC pipelines. We'll also discuss advanced topics like parallelising snapshots and customising snapshot contents."
4. #DebeziumSnapshotting @gunnarmorling
Gunnar Morling
● Software engineer at Decodable
● Former project lead of Debezium
● kcctl 🧸, JfrUnit, ModiTect, MapStruct
● Spec Lead for Bean Validation 2.0
● Java Champion
14. #DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
The Paper
● “DBLog: A Watermark Based Change-Data-Capture Framework”, by Andreas Andreakis and Ioannis Papapanagiotou
● Key idea: interleave snapshot events and events from TX log
https://arxiv.org/pdf/2010.12597v1.pdf
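The key idea on this slide can be illustrated with a small, simplified sketch of the watermark scheme: a chunk of rows is read between a low and a high watermark written to the log, and any chunk row whose key also changed inside that window is dropped, because the log event is newer. All names here (`Event`, `interleave`, the `kind` values) are hypothetical and chosen for illustration; this is not the code of DBLog or Debezium, and it ignores chunking, persistence and error handling.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Event:
    # kind is one of: "change" (from the TX log), "low"/"high" (watermarks),
    # or "read" (a snapshot row). These labels are assumptions for this sketch.
    kind: str
    key: Optional[int] = None
    value: Optional[str] = None


def interleave(log: Iterable[Event], chunk: Dict[int, str]) -> Iterator[Event]:
    """Merge one snapshot chunk into the stream of log events.

    `chunk` holds the rows selected between the low and high watermark.
    """
    remaining = dict(chunk)   # snapshot rows that are still safe to emit
    window_open = False       # True between the low and the high watermark

    for event in log:
        if event.kind == "low":
            window_open = True
        elif event.kind == "high":
            window_open = False
            # Emit the surviving snapshot rows before any later log events.
            for key, value in remaining.items():
                yield Event("read", key, value)
            remaining = {}
        else:
            if window_open:
                # The row changed while the chunk was being read, so the
                # snapshot copy may be stale: discard it, keep the log event.
                remaining.pop(event.key, None)
            yield event


log = [
    Event("change", 1, "a1"),   # before the window: does not affect the chunk
    Event("low"),
    Event("change", 2, "b2"),   # inside the window: supersedes chunk row 2
    Event("high"),
    Event("change", 3, "c2"),   # after the window: emitted after the chunk
]
chunk = {1: "a1", 2: "b1", 3: "c1"}

for event in interleave(log, chunk):
    print(event.kind, event.key, event.value)
```

In this run, the stale snapshot row for key 2 is never emitted, while the rows for keys 1 and 3 appear as "read" events right after the high watermark, ahead of the later change to key 3.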