ThoughtWorks' Zhamak Dehghani's observations on these traditional approaches' failure modes inspired her to develop an alternative big-data management architecture that she aptly named the Data Mesh. It represents a paradigm shift that draws from modern distributed architecture and is founded on the principles of domain-driven design, a self-serve platform, and product thinking applied to data. In the last decade, Apache Kafka has established a new category of data-management infrastructure for data in motion that has been leveraged in modern distributed data architectures.
Evolution from EDA to Data Mesh with Kafka
1. Evolution from EDA to Data Mesh
aka
Data In Motion
A distributed approach to unlock the value of enterprise data
Andreas Sittler, Sr. Solutions Engineering Consultant
2. Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Agenda
01 $whoami
02 Data Mesh: Motivation / Principles
03 Apache Kafka (short)
04 Data Mesh. Revisited. Powered By Kafka / Confluent.
3. Buzzwords…
4. $whoami
• Diploma in Physics (Hamburg / CERN)
• Background (focus)
• Messaging / Integration / EAI
• Workflow / BPM / DCM
• Companies
• Milestone/Template Software/Level8
• TIBCO
• Pegasystems
• Confluent
6. Data Mesh Founder
Zhamak Dehghani
Director of Emerging Technologies | Data Mesh Founder | Member of Tech Advisory Boards
Zhamak is a principal technology consultant at ThoughtWorks with a focus on distributed systems architecture and digital platform strategy at enterprise scale. She is a member of the ThoughtWorks Technology Advisory Board and contributes to the creation of the ThoughtWorks Technology Radar.
8. The Great Divide Of Data
Operational Data Plane: running the business, serving the users.
Analytical Data Plane: optimizing the business, improving the user experience.
9. Data Architectures & Organization Today
[Diagram: BIG DATA PLATFORM: Ingest → Process → Serve.]
● Centralized architecture
● Technically decomposed
● Hyper-specialized silo delivery
10. Historic Influences
[Diagram: Data Marts, DDD, Microservices, and Event Streaming as historic influences on the Data Mesh, in which Data Products (e.g., Inventory, Orders, Shipments) live inside Domains.]
11. The Principles of a Data Mesh
1. Data ownership by domain
2. Data as a product
3. Data governed wherever it is
4. Data available everywhere, self-serve
12. The Principles of a Data Mesh
1. Domain-driven Decentralization: local autonomy (organizational concern)
2. Data as a First-class Product: product thinking, a "microservice for data"
3. Federated Governance: interoperability and network effects (organizational concern)
4. Self-serve Data Platform: infra tooling, across domains
15. Domain Responsibility: a practical example
1. Joe in Inventory has a problem with Order data.
2. Inventory items are going negative because of bad Order data.
3. He could fix the data up locally in the Inventory domain and get on with his job.
4. Or, better, he contacts Alice in Orders and gets it fixed at the source. This is more reliable, as Joe doesn't fully understand the Orders process.
5. Ergo, Alice needs to be a responsible and responsive "Data Product Owner", so everyone benefits from the fix to Joe's problem.
[Diagram: the Orders Domain (Alice) publishes Order Data consumed by Inventory, Billing, and Recommendations; the Shipment Domain handles Shipping Data.]
16. Alice must define herself as a Data Product Owner
Requires:
- Tools for managing raised issues
- Pre-agreed SLAs
- A mindset shift to being a product owner for data
17. Data Product: a "microservice for the data world"
• A data product is a node on the data mesh, situated within a domain.
• Produces, and possibly consumes, high-quality data within the mesh.
• Encapsulates all the elements required for its function, namely data + code + infrastructure.
[Diagram, the 'Quantum Architecture' of an example data product, "Items about to expire": Data (data and metadata, including history), Code (creates, manipulates, serves, etc. that data), and Infra (powers the data, e.g. storage, and the code, e.g. run, deploy, monitor).]
20. Let's use an immutable log to share data!
[Diagram: producers write to the end of a numbered log, offsets 1-10.]
Kafka producers write to an append-only, immutable sequence of messages, ordered by arrival:
● Sequential writes only
● No random disk access
● All operations are O(1)
● Highly efficient
21. A log is like a queue, but re-readable :-D
[Diagram: "Consumer" A and "Consumer" B each scan the log at their own position.]
"Better than a queue"-like behavior: Kafka consumer groups allow parallel, in-order consumption of data, which shared queues in traditional message brokers do not support.
● Sequential reads only
● Start at any offset
● All operations are O(1)
● Highly efficient
Slow consumers don't back up the broker: THE STREAM GOES ON.
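The two slides above can be sketched in a few lines of plain Python. This is an illustrative stand-in for the idea only, not the real Kafka protocol or client API: an append-only list plays the log, and each consumer tracks its own offset.

```python
# Minimal sketch of a Kafka-style log: an append-only record list plus
# per-consumer offsets. Producers only write at the end; consumers read
# sequentially from any offset and can re-read at will.
class Log:
    def __init__(self):
        self._records = []  # append-only; records are never mutated

    def append(self, record):
        """Sequential write at the end; returns the record's offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        """Sequential read starting at any chosen offset."""
        return self._records[offset:]

log = Log()
for event in ["order-1", "order-2", "order-3"]:
    log.append(event)

# Two consumers at independent positions: a slow consumer (B) does not
# block the log or consumer A, and B can always catch up or re-read.
offset_a, offset_b = 3, 1
assert log.read(offset_a) == []                      # A is caught up
assert log.read(offset_b) == ["order-2", "order-3"]  # B lags behind
```

Note how a "queue" semantics falls out for free: consuming is just advancing an offset, so nothing is ever destroyed by a read.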
22. Kafka topics are designed as a commit log that captures events in a durable, scalable way
[Diagram: a topic with Partitions 0, 1, and 2. Producers append writes at the new end of each partition while old records remain readable; "Consumer" A (offset=4) and "Consumer" B (offset=7) read the same partition at independent positions.]
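How records land in a particular partition can also be sketched in plain Python. This is a simplified stand-in (Kafka's default partitioner uses murmur2 on the key bytes; the hash below is just a stable toy): records with the same key always map to the same partition, which is what preserves per-key ordering.

```python
# Illustrative sketch of key-based partitioning: all events for one key
# land in the same partition, in write order.
def partition_for(key: str, num_partitions: int) -> int:
    # Toy stable hash; real Kafka hashes the key bytes with murmur2.
    return sum(key.encode()) % num_partitions

NUM_PARTITIONS = 3
partitions = {p: [] for p in range(NUM_PARTITIONS)}

events = [("order-42", "created"), ("order-7", "created"),
          ("order-42", "paid"), ("order-42", "shipped")]
for key, value in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

# Every event for order-42 is in one partition, in the order written:
p42 = partition_for("order-42", NUM_PARTITIONS)
assert [v for k, v in partitions[p42] if k == "order-42"] == \
       ["created", "paid", "shipped"]
```

Partitioning is also what lets consumer groups parallelize: each partition is consumed by at most one member of a group, so per-partition order survives parallel consumption.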
23. How else is Kafka different from traditional messaging queues?
● Topic partitions are replicated to maximize fault tolerance. In addition to partitioning topics, each partition can be replicated across multiple brokers to ensure high uptime even if a broker is lost.
● Producers and consumers scale independently from brokers. Production and consumption rates (e.g., a spike or a slow-consumer issue) have no effect on the broker. THE STREAM GOES ON.
● Event streams can be enriched in real time with stream processing. ksqlDB and Kafka Streams enable event streams to be processed "in flight" rather than with a separate batch solution.
25. The Principles of a Data Mesh
1. Data ownership by domain
2. Data as a product
3. Data governed wherever it is
4. Data available everywhere, self-serve
26. The Principles of a Data Mesh
1. Domain-driven Decentralization: local autonomy (organizational concern)
2. Data as a First-class Product: product thinking, a "microservice for data"
3. Federated Governance: interoperability and network effects (organizational concern)
4. Self-serve Data Platform: infra tooling, across domains
27. Connectivity within the mesh lends itself...
[Diagram: Data Products inside Domains, linked into a Data Mesh.]
28. ...naturally to Event Streaming with Kafka
[Diagram: the same Domains and Data Products, now connected through Kafka.]
The mesh is a logical view, not a physical one!
29. Event Streaming is Pub/Sub, not Point-to-Point
[Diagram: Data Products write (publish) to persisted streams and independently read (consume) from other streams.]
Data producers are scalably decoupled from consumers.
30. Why is Event Streaming a good fit for meshing?
Streams are real-time, low latency ⇒ Propagate data immediately.
Streams are highly scalable ⇒ Handle today's massive data volumes.
Streams are stored, replayable ⇒ Capture real-time & historical data.
Streams are immutable ⇒ Auditable source of record.
Streams are addressable, discoverable, … ⇒ Meet key criteria for mesh data.
Streams are popular for Microservices ⇒ Adapting to Data Mesh is often easy.
31. How to get data into & out of a data product
A data product exposes Input Data Ports and Output Data Ports. Data typically flows through them in one of three ways:
1. Snapshot via nightly ETL
2. Continuous stream
3. Snapshot via request/response API
32. Onboarding existing data
[Diagram: Source Connectors feeding a Data Product's Input Data Ports.]
Use Kafka connectors to stream data from cloud services and existing systems into the mesh.
https://www.confluent.io/hub/
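As a hedged sketch of what such onboarding looks like in practice, the snippet below builds a Kafka Connect source-connector configuration for pulling an existing database table into the mesh. The settings follow the Confluent JDBC source connector; the connector name, database URL, column, and topic prefix are made-up placeholders for illustration.

```python
import json

# Example (illustrative) Kafka Connect configuration: continuously stream
# new rows of an "inventory" database into Kafka topics.
connector_config = {
    "name": "inventory-source",                      # placeholder name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "tasks.max": "1",
        "connection.url": "jdbc:mysql://db.example.internal:3306/inventory",
        "mode": "incrementing",                      # stream only new rows
        "incrementing.column.name": "id",
        "topic.prefix": "inventory.",                # one topic per table
    },
}

# This JSON body would be submitted to the Kafka Connect REST API
# (POST /connectors) to start the connector.
body = json.dumps(connector_config)
assert json.loads(body)["config"]["mode"] == "incrementing"
```

The point of the connector model: the domain team declares *what* to onboard; the Connect runtime handles the streaming, scaling, and offset tracking.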
33. Data product: what's happening inside
[Diagram: between the Input Data Ports and Output Data Ports, pick your favorite technologies.]
Data on the inside is how the domain team solves specific problems internally; this doesn't matter to other domains.
34. Event Streaming inside a data product
Use ksqlDB, Kafka Streams apps, etc. for processing data in motion:
1. Stream data from other DPs or internal systems into ksqlDB.
2. Use ksqlDB to filter, process, join, aggregate, and analyze.
3. Stream data to internal systems or the outside. Pull queries can drive a req/res API.
35. ksqlDB: Transform data from across the mesh
[Diagram: a ksqlDB-based Data Product drawing on the Inventory, Orders, Shipments, and Finance domains.]
Join and transform data taken from the mesh (real-time ETL pattern).
36. ksqlDB: Query data in the mesh
1. Create a materialized view for your use case.
2. Query data in the mesh.
[Diagram: ksqlDB serving queries across the Inventory, Orders, Shipments, and Finance domains.]
37. Event Streaming inside a data product
Use Kafka connectors and CDC to "streamify" classic databases (MySQL shown as an example):
1. Stream data from other Data Products into your local DB via a sink connector.
2. DB client apps work as usual.
3. Stream data to the outside via a source connector, with CDC and e.g. the Outbox Pattern, ksqlDB, etc.
38. Use a Schema Registry
[Diagram: a Schema Registry shared by the Inventory, Orders, Shipments, and Finance domains.]
Confluent Schema Registry:
● Supports Avro, Protobuf, and JSON Schema
● Can be used with event streams and other technologies
39. Dealing with data change: schemas & versioning
Example output port schemas:
V1: user, product, quantity
V2: userAnonymized, product, quantity
Publish evolving streams with backward/forward-compatible schemas. Publish versioned streams for breaking changes. Also, when needed, data can be fully reprocessed by replaying history.
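What "compatible evolution" means for consumers can be sketched for the V1→V2 change above (the `user` field renamed to `userAnonymized`). This is an illustrative stand-in: a consumer that normalizes both versions to one internal shape, the kind of compatibility that Schema Registry checks for at publish time.

```python
# Illustrative consumer coping with two schema versions of an order record:
#   V1: user, product, quantity
#   V2: userAnonymized, product, quantity
def read_order(record: dict) -> dict:
    """Normalize V1 and V2 order records to one internal shape."""
    # Prefer the V2 field, fall back to V1 for older records in the stream.
    user = record.get("userAnonymized", record.get("user"))
    return {"user": user,
            "product": record["product"],
            "quantity": record["quantity"]}

v1 = {"user": "alice", "product": "book", "quantity": 2}
v2 = {"userAnonymized": "u-93af", "product": "book", "quantity": 2}

assert read_order(v1)["user"] == "alice"
assert read_order(v2)["user"] == "u-93af"
```

For a breaking change, where no such fallback exists, the slide's advice applies instead: publish a new versioned stream and let consumers migrate.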
40. The Principles of a Data Mesh
1. Domain-driven Decentralization: local autonomy (organizational concern)
2. Data as a First-class Product: product thinking, a "microservice for data"
3. Federated Governance: interoperability and network effects (organizational concern)
4. Self-serve Data Platform: infra tooling, across domains
42. Attach Metadata to Schemas/Topics
43. Search by Data Product
44. The Principles of a Data Mesh
1. Domain-driven Decentralization: local autonomy (organizational concern)
2. Data as a First-class Product: product thinking, a "microservice for data"
3. Federated Governance: interoperability and network effects (organizational concern)
4. Self-serve Data Platform: infra tooling, across domains
45. A Data Mesh is one logical cluster, but often many real ones
[Diagram: each Data Product may run its own cluster for internal use.]
In the cloud, clusters are free!
46. Lineage is even more important for these larger, more complex implementations
48. Monolith to Microservices
[Diagram: a Monolith decomposed into Microservices, connected via a Service Mesh.]
49. Monolith to Data Mesh
[Diagram: a Monolithic Data Lake decomposed into a Data Mesh.]
50. Centralized Event Streams. Decentralized Data Products.
Centralize an immutable stream of facts (Kafka). Decentralize the freedom to act, adapt, and change.
51. Data Mesh Journey
Start here, in order of increasing difficulty to execute:
1. Principle 1: data should have one owner, the team that creates it.
2. Principle 2: data is your product; all exposed data should be good data.
3. Principle 3: get access to any data immediately and painlessly, be it historical or real-time.
Principle 4, governance (with standards, security, lineage, etc.), is a cross-cutting concern throughout.
53. Starter Links
• Podcast: https://developer.confluent.io/podcast/why-data-mesh-ft-ben-stopford
• Practical tutorial: https://www.confluent.io/ko-kr/blog/how-to-build-a-data-mesh-using-event-streams/ (hosted version: https://www.confluent-data-mesh-prototype.com/)
• Real-life examples:
• https://developer.confluent.io/use-case/financial-services/saxo-banks-data-mesh-architecture/
• https://www.confluent.io/blog/distributed-domain-driven-architecture-data-mesh-best-practices/
55. Learn More
Learn more about using Kafka to develop a Data Mesh, and explore how to build a cloud-native Data Mesh using Confluent's fully managed, serverless Apache Kafka® service, at https://developer.confluent.io/learn-kafka/data-mesh
Get started today with Confluent Cloud: cnfl.io/confluent-cloud (promo code: DATAMESH101)