Apache Kafka® and the Data Mesh
James Gollan
Senior Solutions Engineer, Confluent
Gnanaguru (Guru) Sattanathan
Senior Solutions Engineer, Confluent
Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Agenda
- Opening & Introduction
- Data Mesh: a brief recap
- Apache Kafka & Data Mesh
- How to get started?
- Demo
What is Data Mesh?
Several historical influences
- Domain-driven design (DDD)
- Microservices
- Data marts
- Event streaming
- "Data on the Inside / Data on the Outside"
Data Mesh: A First Look
[Diagram: domains such as Retail, Core Banking, Institutional, ... each exposing their own data products.]
The Principles of a Data Mesh
1. Domain-driven Decentralization: local autonomy per domain (organizational concerns)
2. Data as a First-class Product: product thinking, a "microservice for data"
3. Federated Governance: interoperability across domains, network effects (organizational concerns)
4. Self-serve Data Platform: infrastructure as a platform, across domains
Principle 1: Domain-driven Decentralization
Objective: ensure data is owned by those who truly understand it.
- Anti-pattern: responsibility for data becomes the domain of the DWH team (centralized data ownership).
- Pattern: ownership of a data asset is given to the "local" team that is most familiar with it (decentralized data ownership).
Principle 2: Data as a First-Class Product
- Objective: make shared data discoverable, addressable, trustworthy, and secure, so other teams can make good use of it.
- Data is treated as a true product, not a by-product. This product thinking is important to prevent data chauvinism.
Principle 3: Self-serve Data Platform
- Central infrastructure that provides real-time and historical data on demand.
- Objective: make domains autonomous in their execution through rapid data provisioning.
Principle 4: Federated Governance
- Objective: independent data products can interoperate and create network effects.
- Establish global standards, such as governance rules, that apply to all data products in the mesh.
- Ideally, these global standards and rules are applied automatically by the platform.
[Diagram: multiple domains sit on a shared self-serve data platform. What is decided locally by a domain, and what is decided globally (implemented and enforced by the platform)?]
Must balance decentralization against centralization. No silver bullet!
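As an illustrative sketch (not from the deck), one platform-enforced global rule might be that every published event must declare its owning domain and a schema version, while everything inside the payload remains a local concern. The field names and function below are hypothetical:

```python
# Hypothetical mesh-wide contract, enforced by the platform before publishing:
# every event must carry an owning domain and a schema version.
REQUIRED_FIELDS = {"domain", "schema_version", "payload"}

def enforce_global_rules(event: dict) -> dict:
    """Reject events that violate the global governance contract."""
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"event violates global rules, missing: {sorted(missing)}")
    return event

# A compliant event passes through unchanged; the payload's shape is
# decided locally by the owning domain.
ok = enforce_global_rules(
    {"domain": "retail", "schema_version": 1, "payload": {"order_id": 42}}
)
```

In practice this role is played by schema registries and automated policy checks rather than hand-written validators.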
Why Apache Kafka?
Paradigm for Data-at-Rest: Relational Databases
Databases support:
- Slow, daily batch processing
- Simple, static real-time queries
Spaghetti: Data architectures often lack rigour
Kafka provides a solution: the implementation.
Centralize an immutable stream of facts. Decentralize the freedom to act, adapt, and change.
Messaging reimagined as a first-class data system:
01. Publish & subscribe to streams of events
02. Store your event streams
03. Process & analyze your event streams
Why is Event Streaming a good fit for meshing?
[Diagram: a Kafka topic as an ordered log of events, addressed by offsets 0, 1, 2, ...]
Streams are real-time, low latency ⇒ Propagate data immediately.
Streams are highly scalable ⇒ Handle today’s massive data volumes.
Streams are stored, replayable ⇒ Capture real-time & historical data.
Streams are immutable ⇒ Auditable source of record.
Streams are addressable, discoverable, … ⇒ Meet key criteria for mesh data.
Streams are popular for Microservices ⇒ Adapting to Data Mesh is often easy.
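The stored, replayable, immutable properties above can be sketched with a toy append-only log. This is an illustration of the log model Kafka is built on, not Kafka's actual API:

```python
class EventLog:
    """Toy append-only log: immutable, offset-addressed, replayable."""

    def __init__(self):
        self._events = []

    def append(self, event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the newly appended event

    def replay(self, from_offset: int = 0):
        # Consumers pick their own starting offset, so real-time tailing and
        # historical reprocessing use the same read path.
        return iter(self._events[from_offset:])


log = EventLog()
for e in ["order_created", "order_paid", "order_shipped"]:
    log.append(e)

# Full replay yields the auditable history; a later offset yields only new events.
assert list(log.replay(0)) == ["order_created", "order_paid", "order_shipped"]
assert list(log.replay(1)) == ["order_paid", "order_shipped"]
```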
Onboarding existing data
[Diagram: source connectors feed a data product through its input data ports.]
Use Kafka connectors to stream data from cloud services and existing systems into the mesh.
https://www.confluent.io/hub/
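As a sketch, a self-managed JDBC source connector that streams a database table into the mesh is configured along these lines; the connector name, connection details, and topic prefix are hypothetical placeholders:

```json
{
  "name": "retail-orders-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://retail-db:3306/orders",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "retail.orders."
  }
}
```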
Instantly Connect Popular Data Sources & Sinks
- 90+ Confluent-supported connectors (self-managed)
- 20+ partner-supported connectors (self-managed), e.g. Data Diode
- 30+ fully managed connectors in Confluent Cloud, a growing list including Amazon S3, Azure Blob Storage, Kinesis, Redshift, Event Hubs, Data Lake Gen 2, and Cloud Dataproc
Event Streaming inside a data product
1. Stream data from other DPs or internal systems into ksqlDB (input data ports).
2. Use ksqlDB to filter, process, join, aggregate, and analyze.
3. Stream data to internal systems or the outside (output data ports); pull queries can drive a request/response API.
Use ksqlDB, Kafka Streams apps, etc. for processing data in motion.
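As a sketch of such processing, the ksqlDB statements below derive a curated output stream and a materialized table that pull queries (and hence a req/res API) can read. All stream, table, and column names are hypothetical:

```sql
-- Hypothetical: enrich an input data port's raw stream by joining a
-- customers reference table, producing a curated output stream.
CREATE STREAM retail_orders_enriched AS
  SELECT o.order_id, o.amount, c.region
  FROM orders_raw o
  JOIN customers c ON o.customer_id = c.customer_id
  EMIT CHANGES;

-- A materialized aggregate; pull queries against this table can back
-- a request/response API.
CREATE TABLE revenue_by_region AS
  SELECT region, SUM(amount) AS revenue
  FROM retail_orders_enriched
  GROUP BY region;
```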
Event Streaming inside a data product (database-centric variant)
1. Stream data from other Data Products into your local DB (e.g. MySQL) via a sink connector.
2. DB client apps work as usual.
3. Stream data to the outside via a source connector, with CDC and e.g. the Outbox Pattern, ksqlDB, etc.
Use Kafka connectors and CDC to "streamify" classic databases.
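The Outbox Pattern from step 3 can be sketched with SQLite standing in for the classic database. In a real deployment a CDC connector (e.g. Debezium) tails the outbox table; the `relay_outbox` function below is a hypothetical stand-in for that connector:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " topic TEXT, payload TEXT)"
)

def place_order(order_id: int, amount: float) -> None:
    # Business write and outbox event commit in ONE transaction, so an
    # event is published if and only if the state change happened.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("retail.orders", json.dumps({"order_id": order_id, "amount": amount})),
        )

def relay_outbox() -> list:
    # Stand-in for a CDC connector: drain the outbox to the event stream.
    with conn:
        rows = conn.execute("SELECT topic, payload FROM outbox ORDER BY id").fetchall()
        conn.execute("DELETE FROM outbox")
    return [(topic, json.loads(payload)) for topic, payload in rows]

place_order(1, 99.5)
events = relay_outbox()
```

The key design point is atomicity: because the outbox row shares the business transaction, downstream consumers never see an event for a write that rolled back.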
Ease of replication across the Mesh
- Cluster Linking & other replication capabilities move data products across the mesh.
- Events are the interface to the mesh; with a stream processor (ksqlDB) inside the data product, a query can be the interface instead.
developer.confluent.io
How to get started?