Apache Kafka® and the Data Mesh
James Gollan
Senior Solutions Engineer, Confluent
Gnanaguru (Guru) Sattanathan
Senior Solutions Engineer, Confluent
Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Agenda
- Opening & Introduction
- Data Mesh: a brief recap
- Apache Kafka & Data Mesh
- How to get started?
- Demo
What is Data Mesh?
Several historical influences
- Domain-driven design (DDD)
- Microservices
- Data marts
- Event streaming
- "Data on the Inside / Data on the Outside"
Data Mesh: A First Look
[Diagram: domains such as Retail, Core Banking, Institutional, ... each exposing their own data products.]
The Principles of a Data Mesh
1. Domain-driven Decentralization: local autonomy per domain (organizational concerns)
2. Data as a First-class Product: product thinking, a "microservice for data"
3. Federated Governance: interoperability across domains, network effects (organizational concerns)
4. Self-serve Data Platform: infrastructure as a platform, across domains
Principle 1: Domain-driven Decentralization
Objective: ensure data is owned by those who truly understand it.
- Anti-pattern: responsibility for data becomes the domain of the DWH team (centralized data ownership).
- Pattern: ownership of a data asset is given to the "local" team that is most familiar with it (decentralized data ownership).
Principle 2: Data as a First-Class Product
- Objective: make shared data discoverable, addressable, trustworthy, and secure, so other teams can make good use of it.
- Data is treated as a true product, not a by-product. This product thinking is important to prevent data chauvinism.
Principle 3: Self-serve Data Platform
- Central infrastructure that provides real-time and historical data on demand.
- Objective: make domains autonomous in their execution through rapid data provisioning.
Principle 4: Federated Governance
- Objective: independent data products can interoperate and create network effects.
- Establish global standards, such as governance rules, that apply to all data products in the mesh.
- Ideally, these global standards and rules are applied automatically by the platform.
[Diagram: multiple domains sit on a shared self-serve data platform. What is decided locally by a domain, and what is decided globally (implemented and enforced by the platform)?]
Must balance decentralization against centralization. No silver bullet!
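As an illustrative sketch (not from the deck), one platform-enforced global rule might be that every published event must declare its owning domain and a schema version, while everything inside the payload remains a local concern. The field names and function below are hypothetical:

```python
# Hypothetical mesh-wide contract, enforced by the platform before publishing:
# every event must carry an owning domain and a schema version.
REQUIRED_FIELDS = {"domain", "schema_version", "payload"}

def enforce_global_rules(event: dict) -> dict:
    """Reject events that violate the global governance contract."""
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"event violates global rules, missing: {sorted(missing)}")
    return event

# A compliant event passes through unchanged; the payload's shape is
# decided locally by the owning domain.
ok = enforce_global_rules(
    {"domain": "retail", "schema_version": 1, "payload": {"order_id": 42}}
)
```

In practice this role is played by schema registries and automated policy checks rather than hand-written validators.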
Why Apache Kafka?
Paradigm for Data-at-Rest: Relational Databases
Databases support:
- Slow, daily batch processing
- Simple, static real-time queries
Spaghetti: Data architectures often lack rigour
Kafka provides a solution: the implementation.
Centralize an immutable stream of facts. Decentralize the freedom to act, adapt, and change.
Messaging reimagined as a first-class data system:
01. Publish & subscribe to streams of events
02. Store your event streams
03. Process & analyze your event streams
Why is Event Streaming a good fit for meshing?
[Diagram: a Kafka topic as an ordered log of events, addressed by offsets 0, 1, 2, ...]
Streams are real-time, low latency ⇒ Propagate data immediately.
Streams are highly scalable ⇒ Handle today’s massive data volumes.
Streams are stored, replayable ⇒ Capture real-time & historical data.
Streams are immutable ⇒ Auditable source of record.
Streams are addressable, discoverable, … ⇒ Meet key criteria for mesh data.
Streams are popular for Microservices ⇒ Adapting to Data Mesh is often easy.
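The stored, replayable, immutable properties above can be sketched with a toy append-only log. This is an illustration of the log model Kafka is built on, not Kafka's actual API:

```python
class EventLog:
    """Toy append-only log: immutable, offset-addressed, replayable."""

    def __init__(self):
        self._events = []

    def append(self, event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the newly appended event

    def replay(self, from_offset: int = 0):
        # Consumers pick their own starting offset, so real-time tailing and
        # historical reprocessing use the same read path.
        return iter(self._events[from_offset:])


log = EventLog()
for e in ["order_created", "order_paid", "order_shipped"]:
    log.append(e)

# Full replay yields the auditable history; a later offset yields only new events.
assert list(log.replay(0)) == ["order_created", "order_paid", "order_shipped"]
assert list(log.replay(1)) == ["order_paid", "order_shipped"]
```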
Onboarding existing data
[Diagram: source connectors feed a data product through its input data ports.]
Use Kafka connectors to stream data from cloud services and existing systems into the mesh.
https://www.confluent.io/hub/
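As a sketch, a self-managed JDBC source connector that streams a database table into the mesh is configured along these lines; the connector name, connection details, and topic prefix are hypothetical placeholders:

```json
{
  "name": "retail-orders-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://retail-db:3306/orders",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "retail.orders."
  }
}
```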
Instantly Connect Popular Data Sources & Sinks
- 90+ Confluent-supported connectors (self-managed)
- 20+ partner-supported connectors (self-managed), e.g. Data Diode
- 30+ fully managed connectors in Confluent Cloud, a growing list including Amazon S3, Azure Blob Storage, Kinesis, Redshift, Event Hubs, Data Lake Gen 2, and Cloud Dataproc
Event Streaming inside a data product
1. Stream data from other DPs or internal systems into ksqlDB (input data ports).
2. Use ksqlDB to filter, process, join, aggregate, and analyze.
3. Stream data to internal systems or the outside (output data ports); pull queries can drive a request/response API.
Use ksqlDB, Kafka Streams apps, etc. for processing data in motion.
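As a sketch of such processing, the ksqlDB statements below derive a curated output stream and a materialized table that pull queries (and hence a req/res API) can read. All stream, table, and column names are hypothetical:

```sql
-- Hypothetical: enrich an input data port's raw stream by joining a
-- customers reference table, producing a curated output stream.
CREATE STREAM retail_orders_enriched AS
  SELECT o.order_id, o.amount, c.region
  FROM orders_raw o
  JOIN customers c ON o.customer_id = c.customer_id
  EMIT CHANGES;

-- A materialized aggregate; pull queries against this table can back
-- a request/response API.
CREATE TABLE revenue_by_region AS
  SELECT region, SUM(amount) AS revenue
  FROM retail_orders_enriched
  GROUP BY region;
```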
Event Streaming inside a data product (database-centric variant)
1. Stream data from other Data Products into your local DB (e.g. MySQL) via a sink connector.
2. DB client apps work as usual.
3. Stream data to the outside via a source connector, with CDC and e.g. the Outbox Pattern, ksqlDB, etc.
Use Kafka connectors and CDC to "streamify" classic databases.
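The Outbox Pattern from step 3 can be sketched with SQLite standing in for the classic database. In a real deployment a CDC connector (e.g. Debezium) tails the outbox table; the `relay_outbox` function below is a hypothetical stand-in for that connector:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " topic TEXT, payload TEXT)"
)

def place_order(order_id: int, amount: float) -> None:
    # Business write and outbox event commit in ONE transaction, so an
    # event is published if and only if the state change happened.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("retail.orders", json.dumps({"order_id": order_id, "amount": amount})),
        )

def relay_outbox() -> list:
    # Stand-in for a CDC connector: drain the outbox to the event stream.
    with conn:
        rows = conn.execute("SELECT topic, payload FROM outbox ORDER BY id").fetchall()
        conn.execute("DELETE FROM outbox")
    return [(topic, json.loads(payload)) for topic, payload in rows]

place_order(1, 99.5)
events = relay_outbox()
```

The key design point is atomicity: because the outbox row shares the business transaction, downstream consumers never see an event for a write that rolled back.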
Ease of replication across the Mesh
- Cluster Linking & other replication capabilities move data products across the mesh.
- Events are the interface to the mesh; with a stream processor (ksqlDB) inside the data product, a query can be the interface instead.
developer.confluent.io
How to get started?