KAFKA Summit EMEA 2022
Andrea Gioia
CTO at Quantyca
Co-Founder at Blindata
Matteo Cimini
Data Architect at Quantyca
Handling Eventual Consistency in a Transactional World
Who are we?
Not an easy question to answer but keeping it simple...
Andrea Gioia
CTO & Partner
andrea.gioia@quantyca.it
Matteo Cimini
Data Architect
Quantyca is a privately owned technological
consulting firm specialized in data and metadata
management based in Italy
quantyca.it
matteo.cimini@quantyca.it
Digital Integration Hub
Where we left off
System of Engagement
System of Insight
System of Records
Legacy
Systems
Application
Layer
Digital
Integration
Hub
API Gateway
Event-Based Integration Layer
High-Performance Data Store
Microservices
Metadata Management
Data offloaded from legacy systems is aggregated into read
models stored in dedicated low-latency, high-performance
data stores accessible via APIs, events, or batch.
The data store synchronizes with the back ends via
event-driven integration patterns.
Benefits
○ Responsive user experience
○ Offload legacy systems from expensive workloads
generated by front-end services
○ Support legacy refactoring
○ Align services to the business domain
○ Enable real-time analytics
○ Foster a data-centric approach to integration
PROS
+ Can handle very high
throughput
CONS
- Not a good fit for complex
event processing
- TCO may not be optimal for
huge data volumes
PROS
+ Low Latency
+ Can handle very high
throughput
+ Simplified schema
management
CONS
- Not a good fit for complex
stateful transformations
- Can have some performance
issues at very high throughput
PROS
+ Largely used by service
developers, probably already
present in the architecture
+ Simplified schema
management
CONS
- Not a good fit for complex
event processing
- Can have some performance
issues at very high throughput
PROS
+ SQL Compliant
+ Transactional (ACID)
+ TCO can be optimized selecting
the right storage strategy
between RAM and disk
CONS
- Can have some performance
issues at very high throughput
- Low latency is not guaranteed
Event Store
The Kafka option
RDBMS NoSQL DB
Streaming
Platform
Distributed Cache
Event store on Kafka
Main consistency challenges
TRANSACTIONAL CONSISTENCY: The read model must be
consistent, from a transactional point of view, with the
upstream source aggregate offloaded from the source
system. Kafka is not a transactional system.
ORDERING CONSISTENCY: The read model must only be
updated in a forward direction: older states cannot replace
newer ones. For complex source aggregates it is very easy
to have events delivered out of order.
HISTORICAL CONSISTENCY: It must be possible to recreate
the read model at any time from scratch without
information loss. Infinite retention in Kafka is possible, but
it is not always a feasible option.
Event Store
Target
Read
Model
Source
Aggregate
Event store on Kafka
Source System Event Store
Technical
Events
(Speed &
Fidelity)
Domain
Events
(Trusted
Views)
High
Performance
Data Store
Business
Events
(Ease of
consumption)
Commands Micro/Mini
Services
READ
WRITE
Handling consistency challenges in a DIH
Aggregate
Read
Model
Handling consistency challenges at the source
Outbox pattern
Source System Streaming Platform
Technical
Events
(Speed &
Fidelity)
Domain
Events
(Trusted
Views)
DESCRIPTION
The source system is modified so that it inserts
messages/events into an outbox table as part of the local
transaction.
The modification can be performed at the code or database
level (e.g., triggers or materialized views).
The connector that offloads data to the streaming platform
is triggered by the outbox table.
PROS
+ It has no overhead in terms of latency and throughput
+ It does not generate extra workloads at the source
CONS
- It’s not always possible to modify the source to
implement this pattern
OUTBOX Table
COMMIT TRX
INSERT
UPDATE
DELETE
INSERT
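A minimal sketch of the outbox pattern in Python, using sqlite3 to stand in for the source database; the `orders`/`outbox` tables and the `OrderPlaced` event type are hypothetical names, not from the talk. The key point is that the business row and its event are written in one local transaction, so the connector reading the outbox table can never see an event for uncommitted data:

```python
import json
import sqlite3

# Hypothetical schema: a business table plus the outbox table that a
# CDC connector would stream to the platform.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "aggregate_id INTEGER, event_type TEXT, payload TEXT)"
)

def place_order(order_id: int, status: str) -> None:
    """Write the business row AND its event in ONE local transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO orders (id, status) VALUES (?, ?)",
                     (order_id, status))
        conn.execute(
            "INSERT INTO outbox (aggregate_id, event_type, payload) "
            "VALUES (?, ?, ?)",
            (order_id, "OrderPlaced",
             json.dumps({"id": order_id, "status": status})),
        )

place_order(42, "CREATED")
rows = conn.execute("SELECT aggregate_id, event_type FROM outbox").fetchall()
print(rows)  # [(42, 'OrderPlaced')]
```

In a real deployment the second insert is typically produced by a trigger or by application code, and the outbox table is drained by a connector rather than queried directly.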
Handling consistency challenges at the source
Callback Pattern
Source System Streaming Platform
Technical
Events
(Speed &
Fidelity)
Domain
Events
(Trusted
Views)
DESCRIPTION
All changes to tables that are part of the same aggregate
are mapped to the same topic as technical events whose
payload contains only the aggregate ID and the transaction ID.
For every transaction ID, a stream processor queries the
legacy database, extracts the modified aggregate by filtering
on its ID, and publishes it as the payload of a new domain event.
To reduce the workload on the legacy system, the stream
processor can query a read replica.
PROS
+ Does not require any modification at the source
CONS
- Even if the footprint at the source is smaller than that of
a standard polling solution, it can still be non-trivial,
especially for high-throughput transactional sources
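A sketch of the callback pattern's core step, again with sqlite3 standing in for the legacy read replica; the `customer` table and field names are illustrative. The technical event carries only identifiers, and the processor "calls back" to the source to fetch the full aggregate state for the domain event:

```python
import json
import sqlite3

# Simulated legacy read replica (illustrative schema).
replica = sqlite3.connect(":memory:")
replica.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, tier TEXT)")
replica.execute("INSERT INTO customer VALUES (7, 'ACME', 'GOLD')")

def on_technical_event(event: dict) -> dict:
    """The technical event carries only ids; the full aggregate
    state is fetched from the (replica of the) source system."""
    row = replica.execute(
        "SELECT id, name, tier FROM customer WHERE id = ?",
        (event["aggregate_id"],),
    ).fetchone()
    return {  # domain event: a trusted view of the whole aggregate
        "tx_id": event["tx_id"],
        "payload": {"id": row[0], "name": row[1], "tier": row[2]},
    }

domain_event = on_technical_event({"aggregate_id": 7, "tx_id": "tx-001"})
print(json.dumps(domain_event))
```

This is what makes the pattern non-trivial at high throughput: every captured transaction triggers a query against the source (or its replica).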
Handling consistency challenges in Kafka
In-flight handling pattern
Domain
Events
(Trusted
Views)
Kafka Streams
Ecosystem
Buffering
Events
Closing
Transactions
Ordering
Transactions
Transactions
Metadata
DESCRIPTION
All events that are part of the same aggregate are buffered in the
same changelog topic until the corresponding END-transaction
event has been captured.
Closed transactions are then ordered by a punctuation function
and mapped to the corresponding domain events.
PROS
+ It has minimal overhead in terms of latency and throughput
+ It does not generate extra workloads at the source
+ Does not require any modification at the source
CONS
- Requires writing stateful applications
- Requires Processor API capabilities (low-level state stores,
punctuators, etc.)
- Transaction metadata must contain
END-transaction information
Streaming Platform
Technical
Events
(Speed &
Fidelity)
Transactions
Metadata
Buffering Layer
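The buffering logic above can be sketched in plain Python; in a real Kafka Streams application the dictionaries below would be state stores and `punctuate()` would be a punctuator registered via the Processor API, so everything here (names, event shapes) is a simplified stand-in:

```python
from collections import defaultdict

buffer = defaultdict(list)   # tx_id -> buffered technical events
closed = set()               # tx_ids whose END marker has arrived

def on_technical_event(tx_id: str, seq: int, payload: str) -> None:
    """Buffer each change event under its transaction id."""
    buffer[tx_id].append((seq, payload))

def on_transaction_metadata(tx_id: str, is_end: bool) -> None:
    """Mark the transaction closed once the END event is captured."""
    if is_end:
        closed.add(tx_id)

def punctuate() -> list:
    """Flush closed transactions as ordered domain events."""
    domain_events = []
    for tx_id in sorted(closed):
        events = sorted(buffer.pop(tx_id))  # restore in-tx order by seq
        domain_events.append({"tx_id": tx_id,
                              "changes": [p for _, p in events]})
    closed.clear()
    return domain_events

# Events may arrive out of order; nothing is emitted before END.
on_technical_event("tx-1", 2, "UPDATE orders")
on_technical_event("tx-1", 1, "INSERT orders")
on_transaction_metadata("tx-1", True)
out = punctuate()
print(out)  # [{'tx_id': 'tx-1', 'changes': ['INSERT orders', 'UPDATE orders']}]
```

Note how the out-of-order technical events are held back and re-sequenced, which is exactly what protects the read model's ordering consistency.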
Handling consistency challenges in the fast storage
Cross-docking pattern
Technical
Events
(Speed &
Fidelity)
Domain
Events
(Trusted
Views)
Fast Storage
Closing & Ordering
Transactions
Data Model
Business Events
(Ease of consumption)
Streaming Platform
DESCRIPTION
Events that are part of the same aggregate are
buffered in the same buffering table until the
corresponding END-transaction event has been
captured.
Closed transactions are then ordered by a
micro-batch processor and mapped to the
corresponding domain events.
PROS
+ It has medium overhead in terms of latency
and throughput
+ Stateful solution on stateful storage
(typically strongly consistent)
+ The Fast Storage is SQL compliant in most cases
+ Business events can be enriched with
external information
CONS
- Requires adding a component (Fast Storage) to
the overall architecture (DIH)
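Since the Fast Storage is usually SQL compliant, the same buffering-and-draining logic can live in a pair of tables plus a micro-batch query. The sketch below uses sqlite3 as a stand-in for the Fast Storage; the `buffer`/`tx_meta` tables and payload strings are hypothetical:

```python
import sqlite3

# "Fast Storage" buffering layer (illustrative schema).
fs = sqlite3.connect(":memory:")
fs.execute("CREATE TABLE buffer (tx_id TEXT, seq INTEGER, payload TEXT)")
fs.execute("CREATE TABLE tx_meta (tx_id TEXT PRIMARY KEY, closed INTEGER)")

def ingest(tx_id: str, seq: int, payload: str) -> None:
    fs.execute("INSERT INTO buffer VALUES (?, ?, ?)", (tx_id, seq, payload))

def close_tx(tx_id: str) -> None:
    """Record that the END-transaction event has been captured."""
    fs.execute("INSERT OR REPLACE INTO tx_meta VALUES (?, 1)", (tx_id,))

def micro_batch() -> list:
    """Drain closed transactions in order; map them to domain events."""
    rows = fs.execute(
        "SELECT b.tx_id, b.payload FROM buffer b "
        "JOIN tx_meta m ON m.tx_id = b.tx_id "
        "WHERE m.closed = 1 ORDER BY b.tx_id, b.seq").fetchall()
    events: dict = {}
    for tx_id, payload in rows:
        events.setdefault(tx_id, []).append(payload)
    fs.execute("DELETE FROM buffer WHERE tx_id IN "
               "(SELECT tx_id FROM tx_meta WHERE closed = 1)")
    fs.execute("DELETE FROM tx_meta WHERE closed = 1")
    return [{"tx_id": t, "changes": c} for t, c in sorted(events.items())]

ingest("tx-1", 2, "UPDATE")
ingest("tx-1", 1, "INSERT")
ingest("tx-2", 1, "DELETE")   # tx-2 not yet closed: stays buffered
close_tx("tx-1")
batch = micro_batch()
print(batch)  # [{'tx_id': 'tx-1', 'changes': ['INSERT', 'UPDATE']}]
```

Compared with the in-flight variant, the state lives in strongly consistent storage instead of Kafka Streams state stores, at the cost of micro-batch latency.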
Takeaways
In a complex and heterogeneous architecture, not all consumers can handle eventual consistency.
There are different solutions to enforce consistency in an event-driven architecture like the Digital Integration Hub.
There are no free lunches, anyway: every solution comes with pros and cons, and it is important to evaluate them in context.
The general rule of thumb is to
○ use the outbox-pattern-based solution for every newly implemented custom source (and for all legacy sources you are
allowed to modify)
○ decide between the in-flight handling and cross-docking pattern-based solutions for existing sources, considering the
latency trade-off and the skill set of your engineering team
What’s next…
We are working to define solution templates at the platform level, in order to provide data product teams with
consistency-preserving services in a self-service way, through a declarative interface. More on this next year ;-)
Questions?
Feel free to ask
matteo.cimini@quantyca.it
andrea.gioia@quantyca.it
Corso Milano, 45 / 20900 Monza (MB)
T. +39 039 9000 210 / F. +39 039 9000 211 / @ info@quantyca.it
www.quantyca.it
