Apache Kafka and the Data Mesh
Ben Stopford, Michael G. Noll
Office of the CTO, Confluent
Kafka Summit Americas, September 14-15, 2021
Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
What is Data Mesh?
Data Marts, DDD, Microservices, Event Streaming
[Diagram: a Data Mesh connecting Data Products, grouped by Domain: Inventory, Orders, Shipments, ...]
The Principles of a Data Mesh
1. Data ownership by domain
2. Data as a product
3. Data available everywhere, self serve
4. Data governed wherever it is
The Principles of a Data Mesh: A First Look
1. Domain-driven Decentralization: Local Autonomy (Organizational Concerns)
2. Data as a First-class Product: Product thinking, “Microservice for Data”
3. Self-serve Data Platform: Infra Tooling, Across Domains
4. Federated Governance: Interoperability, Network Effects (Organizational Concerns)
Principle 1: Domain-driven Decentralization
Objective: Ensure data is owned by those that truly understand it.
Pattern (Decentralized Data Ownership): Ownership of a data asset is given to the “local” team that is most familiar with it.
Anti-pattern (Centralized Data Ownership): Responsibility for data becomes the domain of the DWH team.
Data Mesh is about Connectivity
“Instead of collecting, you want to come up with a model that allows connectivity of the data.”
Zhamak Dehghani
Practical example
[Diagram: Orders Domain (Alice) and Shipment Domain; Order Data and Shipping Data; Inventory (Joe), Billing, Recommendations]
1. Joe in Inventory has a problem with Order data.
2. Inventory items are going negative because of bad Order data.
3. He could fix the data up locally in the Inventory domain and get on with his job.
4. Or, better, he contacts Alice in Orders and gets it fixed at the source. This is more reliable, as Joe doesn’t fully understand the Orders process.
5. Ergo, Alice needs to be a responsible and responsive “Data Product Owner”, so everyone benefits from the fix to Joe’s problem.
Recommendations: Domain-driven Decentralization
Learn from DDD:
• Use a standard language and nomenclature for data.
• Business users should understand a data flow diagram.
• The stream of events should create a shared narrative that is business-user comprehensible.
Principle 2: Data as a First-Class Product
• Objective: Make shared data discoverable, addressable, trustworthy, secure, so other teams can make good use of it.
• Data is treated as a true product, not a by-product.
This product thinking is important to prevent data chauvinism.
Data product, a “microservice for the data world”
• Data product is a node on the data mesh, situated within a domain.
• Produces (and possibly consumes) high-quality data within the mesh.
• Encapsulates all the elements required for its function, namely data + code + infrastructure.
Example Data Product: “Items about to expire”
• Data: Data and metadata, including history
• Code: Creates, manipulates, serves, etc. that data
• Infra: Powers the data (e.g., storage) and the code (e.g., run, deploy, monitor)
Connectivity within the mesh lends itself naturally to Event Streaming with Kafka
[Diagram: Data Products in Domains, connected through the Data Mesh]
Mesh is a logical view, not physical!
Event Streaming is Pub/Sub, not Point-to-Point
[Diagram: Data Products write (publish) to a persisted stream; other Data Products read (consume) from it independently, alongside other streams]
Data producers are scalably decoupled from consumers.
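The decoupling of producers from consumers can be illustrated with a small in-memory sketch. This is not Kafka client code: the `Stream` class, its `publish` and `read` methods, and the sample order events are hypothetical names and data chosen only for illustration. The point it shows is that the producer appends to a persisted log without knowing who reads it, and each consumer keeps its own position.

```python
class Stream:
    """Hypothetical in-memory stand-in for a persisted, append-only stream."""

    def __init__(self):
        self._events = []   # persisted events, never modified after append
        self._offsets = {}  # read position per consumer name

    def publish(self, event):
        """Append an event; the producer knows nothing about consumers."""
        self._events.append(event)

    def read(self, consumer):
        """Return events this consumer has not seen yet and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:]
        self._offsets[consumer] = len(self._events)
        return batch


orders = Stream()
orders.publish({"orderId": 1, "product": "apple", "quantity": 2})
orders.publish({"orderId": 2, "product": "pear", "quantity": 1})

# Inventory reads first; Billing subscribes later and still sees every event.
inventory_batch = orders.read("inventory")
orders.publish({"orderId": 3, "product": "apple", "quantity": 5})
billing_batch = orders.read("billing")
inventory_next = orders.read("inventory")
```

Because offsets are tracked per consumer, adding a new reader requires no change on the publishing side, which is the property a point-to-point integration lacks.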
Why is Event Streaming a good fit for meshing?
Streams are real-time, low latency ⇒ Propagate data immediately.
Streams are highly scalable ⇒ Handle today’s massive data volumes.
Streams are stored, replayable ⇒ Capture real-time & historical data.
Streams are immutable ⇒ Auditable source of record.
How to get data into & out of a data product
[Diagram: a Data Product with Input Data Ports and Output Data Ports]
Options for both input and output data ports:
• Snapshot via Nightly ETL
• Snapshot via Req/Res API
• Continuous Stream
Onboarding existing data
[Diagram: Source Connectors feeding the Input Data Ports of a Data Product]
Use Kafka connectors to stream data from cloud services and existing systems into the mesh.
https://www.confluent.io/hub/
Data product: what’s happening inside
[Diagram: a Data Product between its Input Data Ports and Output Data Ports; inside: …pick your favorites...]
Data on the Inside: How the domain team solves specific problems internally doesn’t matter to other domains.
Event Streaming inside a data product
[Diagram: Input Data Ports, ksqlDB, Output Data Ports, and a Req/Res API served by Pull Queries]
1. Stream data from other DPs or internal systems into ksqlDB.
2. ksqlDB to filter, process, join, aggregate, analyze.
3. Stream data to internal systems or the outside. Pull queries can drive a req/res API.
Use ksqlDB, Kafka Streams apps, etc. for processing data in motion.
Event Streaming inside a data product
[Diagram: Input Data Ports, Sink Connector, MySQL, Source Connector, Output Data Ports]
1. Stream data from other Data Products into your local DB.
2. DB client apps work as usual.
3. Stream data to the outside with CDC and e.g. the Outbox Pattern, ksqlDB, etc.
Use Kafka connectors and CDC to “streamify” classic databases.
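The Outbox Pattern mentioned for the output side can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` instead of MySQL; the table names, the `place_order` and `relay` functions, and the event shape are all assumptions made for this sketch. In a real deployment a CDC source connector would typically tail the outbox table instead of the polling `relay` function shown here.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER)")
db.execute("CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")


def place_order(order_id, product, quantity):
    """Write the business row and its outgoing event in one transaction."""
    with db:  # commits both inserts together, or rolls both back on error
        db.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, product, quantity))
        event = {"type": "OrderPlaced", "orderId": order_id,
                 "product": product, "quantity": quantity}
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay(publish):
    """Stand-in for CDC: forward outbox rows in order, then remove them."""
    with db:
        rows = db.execute("SELECT seq, payload FROM outbox ORDER BY seq").fetchall()
        for seq, payload in rows:
            publish(json.loads(payload))
            db.execute("DELETE FROM outbox WHERE seq = ?", (seq,))


published = []
place_order(1, "apple", 2)
try:
    place_order(1, "pear", 1)  # duplicate key: nothing is committed, no event emitted
except sqlite3.IntegrityError:
    pass
relay(published.append)
```

The design choice is that DB client apps keep writing to the database as usual, while the event leaves the data product only if the local transaction committed.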
Dealing with data change: schemas & versioning
[Diagram: the Output Data Ports of a Data Product publish two versioned streams]
V1 - user, product, quantity
V2 - userAnonymized, product, quantity
Publish evolving streams with back/forward-compatible schemas.
Publish versioned streams for breaking changes.
Also, when needed, data can be fully reprocessed by replaying history.
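The difference between an evolving stream and a versioned stream can be sketched with the V1 and V2 field lists from the slide. The rule used here is a deliberate simplification and an assumption of this sketch: adding a field counts as compatible, while removing or retyping a field counts as breaking. Real schema registries apply richer rules (for example around default values). The field types, the `V1_1` schema, and the `orders-v2` naming scheme are hypothetical.

```python
V1 = {"user": "string", "product": "string", "quantity": "int"}
V2 = {"userAnonymized": "string", "product": "string", "quantity": "int"}
V1_1 = {**V1, "currency": "string"}  # hypothetical additive change


def removed_fields(old, new):
    """Fields that existing readers rely on but the new schema no longer has."""
    return sorted(set(old) - set(new))


def changed_types(old, new):
    """Fields present in both schemas whose type differs."""
    return sorted(f for f in old if f in new and old[f] != new[f])


def is_breaking(old, new):
    """Simplified rule: removing or retyping a field breaks consumers."""
    return bool(removed_fields(old, new) or changed_types(old, new))


def target_stream(base, version, old, new):
    """Keep the stream name for compatible changes; add a version suffix otherwise."""
    return f"{base}-v{version}" if is_breaking(old, new) else base


evolving = target_stream("orders", 2, V1, V1_1)
versioned = target_stream("orders", 2, V1, V2)
```

Under this rule, replacing `user` with `userAnonymized` is breaking, so V2 goes to its own versioned stream, while the additive change stays on the existing one.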
Find Data Products
Recommendations: Data as a First-class Product
1. Data-on-the-Outside is harder to change, but it has more value in a holistic sense.
a. Use schemas as a contract and to find data.
b. Handle incompatible schema changes using Dual Schema Upgrade Window pattern.
2. Get data from the source, not from intermediaries. Think: Demeter's law applied to data.
a. Otherwise, proliferation of ‘slightly corrupt’ data within the mesh. “Game of Telephone”.
b. Event Streaming makes it easy to subscribe to data from authoritative sources.
3. Change data at the source, including error fixes. Don’t “fix data up” locally.
4. Some data sources will be difficult to turn into first-class data products. Example: Batch-based sources that lose event-level data or are not reproducible.
a. Use Event Streaming plus CDC, Outbox Pattern, etc. to integrate these into the mesh.
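One way to read the Dual Schema Upgrade Window recommendation is that, for a limited time, the owner publishes both the old and the new schema version so consumers can migrate at their own pace. The sketch below follows that reading, which is an assumption of this example and not spelled out on the slide. The stream names, the `publish_order` function, and the use of a truncated SHA-256 hash as "anonymization" are all hypothetical.

```python
import hashlib
from collections import defaultdict

streams = defaultdict(list)  # hypothetical stream name -> published events


def to_v1(order):
    """Old contract: user, product, quantity."""
    return {"user": order["user"], "product": order["product"],
            "quantity": order["quantity"]}


def to_v2(order):
    """New contract: userAnonymized replaces user (hashing is illustrative only)."""
    anonymized = hashlib.sha256(order["user"].encode()).hexdigest()[:12]
    return {"userAnonymized": anonymized, "product": order["product"],
            "quantity": order["quantity"]}


def publish_order(order, upgrade_window_open):
    """Always publish V2; keep publishing V1 only while the window is open."""
    streams["orders-v2"].append(to_v2(order))
    if upgrade_window_open:
        streams["orders-v1"].append(to_v1(order))


publish_order({"user": "joe", "product": "apple", "quantity": 2}, upgrade_window_open=True)
publish_order({"user": "alice", "product": "pear", "quantity": 1}, upgrade_window_open=False)
```

Once the window closes, only the V2 stream keeps growing, and remaining V1 consumers have a clear signal that they need to move.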
Why Self-service Matters
Trade Surveillance System
● Data from 13 sources
● Some sources publish events
● Needed both historical and real-time data
● Historical data from database extracts arranged with dev team.
● Format of events different to format of extracts
● 9 months of effort to get 13 sources into the new system.
Principle 3: Self-serve Data Platform
Central infrastructure that provides real-time and historical data on demand
Objective: Make domains autonomous in their execution through rapid data provisioning
Consuming real-time & historical data from the mesh
1) Separate Systems for Real-time and Historical Data (Lambda Architecture)
- Considerations:
  - Difficult to correlate real-time with historical “snapshot” data
  - Two systems to manage
  - Unlike event streams, snapshots have less granularity
2) One System for Real-time and Historical Data (Kappa Architecture)
- Considerations:
  - Operational complexity (addressed in Confluent Cloud)
  - Downsides of immutability of regular streams: e.g., events cannot easily be altered or deleted
  - Storage cost (addressed in Confluent Cloud, in Apache Kafka with KIP-405)
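The granularity point can be made concrete with a small sketch. It assumes a made-up stream of stock-level changes for one item (the event shape and the `stock_levels` function are hypothetical). A consumer that only receives the nightly snapshot sees the final level, while a consumer of the stream can reconstruct every intermediate state, including the moment stock went negative.

```python
# Stock-level changes for one item over a day, in arrival order (made-up data).
events = [
    {"item": "apple", "delta": +10},
    {"item": "apple", "delta": -12},  # bad order: stock briefly goes negative
    {"item": "apple", "delta": +5},
]


def stock_levels(events, start=0):
    """Fold the stream into running stock levels, one entry per event."""
    level, levels = start, []
    for event in events:
        level += event["delta"]
        levels.append(level)
    return levels


levels = stock_levels(events)
nightly_snapshot = levels[-1]  # all that a snapshot-only consumer would see
negative_offsets = [i for i, lvl in enumerate(levels) if lvl < 0]
```

Here the snapshot looks healthy, and only the event-level history reveals the problem at offset 1. Deriving historical state from the same stream that carries real-time data is the idea behind using one system for both.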
What this can look like in practice
Browse Schemas
With ksqlDB the data mesh is queryable and decentralized.
[Diagram: a ksqlDB stream processor feeding a Destination Data Port]
Query is the interface to the mesh.
Events are the interface to the mesh.
Mesh is one logical cluster. Data product has another.
Data Product has its own cluster for internal use.
Principle 4: Federated Governance
• Objective: Independent data products can interoperate and create network effects.
• Establish global standards, like governance, that apply to all data products in the mesh.
• Ideally, these global standards and rules are applied automatically by the platform.
[Diagram: several Domains on top of the Self-serve Data Platform]
What is decided locally by a domain? What is decided globally (implemented and enforced by platform)?
Must balance between Decentralization vs. Centralization. No silver bullet!
Example standard: Identifying customers globally
• Define how data is represented, so you can join and correlate data across different domains.
• Use data contracts, schemas, registries, etc. to implement and enforce such standards.
• Use Event Streaming to retrofit historical data to new requirements, standards.
[Diagram: the same identifier, customerId=29639, shown across several data products]
SELECT … FROM orders o
LEFT JOIN shipments s
ON o.customerId = s.customerId
EMIT CHANGES;
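The effect of a shared customer identifier can be sketched outside ksqlDB as well. The following is a plain, one-shot left join over two small in-memory lists, not a continuous streaming query like the one above. Apart from customerId=29639, all field names and values are made up for illustration, and `left_join` is a hypothetical helper.

```python
orders = [
    {"orderId": "o1", "customerId": 29639, "product": "apple"},
    {"orderId": "o2", "customerId": 11111, "product": "pear"},
]
shipments = [
    {"shipmentId": "s1", "customerId": 29639, "status": "shipped"},
]


def left_join(left, right, key):
    """Left join: keep every left row; pair it with None when nothing matches."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], [None]):
            joined.append({"order": left_row, "shipment": right_row})
    return joined


result = left_join(orders, shipments, "customerId")
```

The join only works because both domains represent the customer the same way. If Orders and Shipments used different identifiers, the lookup by `customerId` would silently match nothing.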
Example standard: Detect errors and recover with Streams
• Use strategies like logging, data profiling, data lineage, etc. to detect errors in the mesh.
• Streams are very helpful to detect errors and identify cause-effect relationships.
• Streams let you recover and fix errors: e.g., replay & reprocess historical data.
[Diagram: My App reads offsets 0 1 2 3 4 5 6 7 8 9 from the Output Data Ports of a Data Product]
Bug? Error? Rewind to start of stream, then reprocess.
If needed, tell the origin data product to fix problematic data at the source.
Event Streams give you a powerful Time Machine.
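The rewind-and-reprocess idea can be sketched with a consumer that tracks its own offset. This is an in-memory illustration, not Kafka consumer code: the `App` class, its `poll` and `rewind` methods, and the sample quantities are hypothetical. The first handler contains a deliberate bug, and the fix is applied by resetting the offset to the start of the stream and recomputing the derived state.

```python
stream = [{"offset": i, "quantity": q} for i, q in enumerate([2, 1, 5, 3])]


class App:
    """Hypothetical consumer that tracks its own offset into the stream."""

    def __init__(self, handler):
        self.handler = handler
        self.offset = 0
        self.total = 0

    def poll(self, stream):
        """Process all events from the current offset onward."""
        for event in stream[self.offset:]:
            self.total = self.handler(self.total, event)
        self.offset = len(stream)

    def rewind(self, handler):
        """Swap in corrected logic, discard derived state, restart from offset 0."""
        self.handler, self.offset, self.total = handler, 0, 0


def buggy(total, event):
    return total + 1  # bug: counts events instead of summing quantities


def fixed(total, event):
    return total + event["quantity"]


app = App(buggy)
app.poll(stream)
wrong_total = app.total

app.rewind(fixed)
app.poll(stream)
right_total = app.total
```

Because the stream itself is never modified, the corrected result is derived from exactly the same events that produced the wrong one.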
Example standard: Tracking data lineage with Streams
• Lineage must work across domains and data products—and systems, clouds, data centers.
• Event streaming is a foundational technology for this.
Recommendations: Federated Governance
1. Be pragmatic: Don’t expect governance systems to be perfect.
a. They are a map that helps you navigate the data-landscape of your company.
b. But there will always be roads that have changed or have not been mapped.
2. Governance is more a process (i.e., an organizational concern) than a technology.
3. Beware of centralized data models, which can become slow to change. Where they must exist, use processes & tooling like GitHub to collaborate and change quickly. Good luck! 🙂
Implementing a Data Mesh
Data Mesh Journey
Principle 1: Data should have one owner: the team that creates it.
Principle 2: Data is your product: All exposed data should be good data.
Principle 3: Get access to any data immediately and painlessly, be it historical or real-time.
Principle 4: Governance, with standards, security, lineage, etc. (cross-cutting concerns)
[Diagram: steps 1, 2, 3 along an axis labeled “Difficulty to execute”, with a “Start Here” marker]
Implement a Data Mesh: Cheat Sheet
- Secure event streams
  Access to event streams is permissioned by a central body.
- Connect from any database
  Sink Connectors are made available for all supported database types to ease the provisioning of new output data ports in the mesh.
- Central user interface
  - Discovery and registration of event streams
  - Searching schemas for data of interest
  - Previewing event streams
  - Requesting access to event streams
  - Data lineage views
- Centralize data in motion
  Introduce a central event streaming platform.
- Nominate data owners
  Firm owners for all key datasets in the organization. Make ownership information broadly accessible.
- Data on demand
  Events are either stored in Kafka indefinitely or can be republished by data products on demand.
- Handle Schema Change
  Owners publish schema information to the mesh. Process introduced for schema change approval.
developer.confluent.io
• Free Courses on all things Kafka and Event Streaming
• 50+ Design Patterns for Event Streaming
• And more: Quickstarts, Tutorials, ...
Recommended free courses: Data Mesh, Event Sourcing, Kafka 101
Thank you!
@benstopford
@miguno

Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 

Apache Kafka and the Data Mesh | Michael Noll, Confluent

  • 1. Apache Kafka and the Data Mesh Ben Stopford, Michael G. Noll Office of the CTO, Confluent Kafka Summit Americas, September 14-15, 2021
  • 2. Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc. What is Data Mesh? Data Marts, DDD, Microservices, Event Streaming. [Diagram: a Data Mesh of Data Products grouped by Domain: Inventory, Orders, Shipments, ...]
  • 3. The Principles of a Data Mesh: Data ownership by domain; Data as a product; Data governed wherever it is; Data available everywhere, self-serve.
  • 5. The Principles of a Data Mesh: Domain-driven Decentralization (local autonomy; organizational concerns); Data as a First-class Product (product thinking, “microservice for data”); Federated Governance (interoperability, network effects; organizational concerns); Self-serve Data Platform (infra tooling, across domains).
  • 6. Principle 1: Domain-driven Decentralization. Objective: Ensure data is owned by those that truly understand it. Pattern: Ownership of a data asset is given to the “local” team that is most familiar with it (decentralized data ownership). Anti-pattern: Responsibility for data becomes the domain of the DWH team (centralized data ownership).
  • 8. Data Mesh is about Connectivity: “Instead of collecting, you want to come up with a model that allows connectivity of the data.” (Zhamak Dehghani)
  • 9. Practical example: 1. Joe in Inventory has a problem with Order data. 2. Inventory items are going negative because of bad Order data. 3. He could fix the data up locally in the Inventory domain and get on with his job. 4. Or, better, he contacts Alice in Orders and gets it fixed at the source. This is more reliable, as Joe doesn’t fully understand the Orders process. 5. Ergo, Alice needs to be a responsible and responsive “Data Product Owner”, so everyone benefits from the fix to Joe’s problem. [Diagram labels: Orders Domain (Alice), Shipment Domain, Order Data, Shipping Data, Inventory (Joe), Billing, Recommendations.]
  • 11. Recommendations: Domain-driven Decentralization. Learn from DDD: • Use a standard language and nomenclature for data. • Business users should understand a data flow diagram. • The stream of events should create a shared narrative that is business-user comprehensible.
  • 13. Principle 2: Data as a First-Class Product. • Objective: Make shared data discoverable, addressable, trustworthy, secure, so other teams can make good use of it. • Data is treated as a true product, not a by-product. This product thinking is important to prevent data chauvinism.
  • 14. Data product, a “microservice for the data world”. • A data product is a node on the data mesh, situated within a domain. • Produces, and possibly consumes, high-quality data within the mesh. • Encapsulates all the elements required for its function, namely data + code + infrastructure. Data: data and metadata, including history. Code: creates, manipulates, serves, etc. that data. Infra: powers the data (e.g., storage) and the code (e.g., run, deploy, monitor). Example data product: “Items about to expire”.
  • 15. Connectivity within the mesh lends itself... [Diagram: Data Mesh of Data Products grouped by Domain.]
  • 16. ...naturally to Event Streaming with Kafka. Mesh is a logical view, not physical!
  • 17. Event Streaming is Pub/Sub, not Point-to-Point. Data products write (publish) to a persisted stream; other data products read (consume) it independently and may publish other streams. Data producers are scalably decoupled from consumers.
  • 18. Why is Event Streaming a good fit for meshing? Streams are real-time, low latency ⇒ Propagate data immediately. Streams are highly scalable ⇒ Handle today’s massive data volumes. Streams are stored, replayable ⇒ Capture real-time & historical data. Streams are immutable ⇒ Auditable source of record.
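The pub/sub and replay properties above can be illustrated with a minimal in-memory sketch. This is not Kafka client code: `Stream` and `Consumer` are hypothetical stand-ins for a single topic partition and a consumer with its own offset, assumed here only to show how a stored, immutable log lets consumers read independently and lets a late subscriber replay history.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """Append-only log of events (hypothetical stand-in for one topic partition)."""
    _events: list = field(default_factory=list)

    def publish(self, event) -> int:
        """Append an event and return its offset; existing events are never modified."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, offset: int) -> list:
        """Return a copy of all events at or after the given offset."""
        return list(self._events[offset:])


@dataclass
class Consumer:
    """Tracks its own read position, independent of producers and other consumers."""
    stream: Stream
    offset: int = 0

    def poll(self) -> list:
        events = self.stream.read(self.offset)
        self.offset += len(events)
        return events


orders = Stream()
orders.publish({"orderId": 1, "product": "apple", "quantity": 3})
orders.publish({"orderId": 2, "product": "pear", "quantity": 1})

inventory = Consumer(orders)
first_batch = inventory.poll()    # both events published so far

orders.publish({"orderId": 3, "product": "apple", "quantity": 2})

billing = Consumer(orders)        # late subscriber starts at offset 0
replayed = billing.poll()         # replays the full history
second_batch = inventory.poll()   # only the event it has not yet seen
```

The producer never learns who reads the stream, and adding `billing` required no change to `orders` or `inventory`, which is the decoupling the slide describes.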
  • 19. How to get data into & out of a data product: through Input Data Ports and Output Data Ports. Each port can be a snapshot via nightly ETL, a snapshot via req/res API, or a continuous stream.
  • 20. Onboarding existing data: use Kafka source connectors on the input data ports to stream data from cloud services and existing systems into the mesh. https://www.confluent.io/hub/
  • 21. Data product: what’s happening inside. Data on the Inside: how the domain team solves specific problems internally (…pick your favorites...) doesn’t matter to other domains. Only the input and output data ports are visible to them.
  • 22. Event Streaming inside a data product: use ksqlDB, Kafka Streams apps, etc. for processing data in motion. Stream data from other DPs or internal systems into ksqlDB. Use ksqlDB to filter, process, join, aggregate, analyze. Stream data to internal systems or the outside. Pull queries can drive a req/res API.
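As a rough illustration of "aggregate, then serve via pull queries", the sketch below folds a stream of events into a keyed table and answers point lookups against it. It is plain Python, not ksqlDB; the event shape (`product`, `delta`) and the function names are assumptions made for the example.

```python
from collections import defaultdict


def materialize_stock(events):
    """Fold a stream of stock-change events into a table keyed by product."""
    table = defaultdict(int)
    for event in events:
        table[event["product"]] += event["delta"]
    return dict(table)


def pull_query(table, product):
    """Point-in-time lookup against the materialized table, as a req/res API might do."""
    return table.get(product)


# Hypothetical input stream for an inventory data product.
stock_events = [
    {"product": "apple", "delta": 10},
    {"product": "pear", "delta": 4},
    {"product": "apple", "delta": -3},
]

stock_table = materialize_stock(stock_events)
apple_stock = pull_query(stock_table, "apple")
```

In a streaming system the table would be updated continuously as events arrive; the batch fold here only shows the relationship between the event stream and the queryable state derived from it.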
  • 23. Event Streaming inside a data product: use Kafka connectors and CDC to “streamify” classic databases (e.g., MySQL). A sink connector streams data from other data products into your local DB. DB client apps work as usual. A source connector streams data to the outside with CDC and, e.g., the Outbox Pattern, ksqlDB, etc.
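The Outbox Pattern mentioned above can be sketched with the standard-library `sqlite3` module. The table layout, the `place_order` and `relay_outbox` helpers, and the polling relay are all assumptions for illustration; in a CDC setup a connector would typically read the outbox table from the database log instead of polling it.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, published INTEGER DEFAULT 0)"
)


def place_order(order_id, product, quantity):
    """Write the business row and its event in one transaction, so neither exists alone."""
    event = {"type": "OrderPlaced", "orderId": order_id, "product": product, "quantity": quantity}
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, product, quantity))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_outbox(publish):
    """Pass unpublished outbox rows to `publish` in insertion order, then mark them published."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


published = []  # stands in for the output stream of the data product
place_order(1, "apple", 3)
place_order(2, "pear", 1)
relayed = relay_outbox(published.append)
```

Because the order row and the outbox row share a transaction, a failed order write leaves no orphan event behind, and client applications keep using the database as usual.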
  • 24. Dealing with data change: schemas & versioning. Publish evolving streams with back/forward-compatible schemas. Publish versioned streams for breaking changes, e.g. V1 (user, product, quantity) and V2 (userAnonymized, product, quantity). Also, when needed, data can be fully reprocessed by replaying history.
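To make the distinction between compatible and breaking changes concrete, here is a deliberately simplified compatibility check. It assumes schemas are plain dicts of field name to `{"type": ..., "default": ...}` and applies one rule loosely modeled on Avro-style resolution; it is not the rule set of any particular schema registry. The field types and the `V1_1` variant are invented for the example.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """True if a consumer using `reader` can decode data written with `writer`.

    Simplified rule: shared fields must have the same type, and any field the
    reader expects but the writer omits must declare a default.
    """
    for name, spec in reader.items():
        if name in writer:
            if writer[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            return False
    return True


def is_backward_compatible(new: dict, old: dict) -> bool:
    """New consumers can still read data written with the old schema."""
    return can_read(reader=new, writer=old)


def is_forward_compatible(new: dict, old: dict) -> bool:
    """Old consumers can still read data written with the new schema."""
    return can_read(reader=old, writer=new)


V1 = {
    "user": {"type": "string"},
    "product": {"type": "string"},
    "quantity": {"type": "int"},
}
# Compatible evolution: add an optional field with a default (hypothetical).
V1_1 = {**V1, "note": {"type": "string", "default": ""}}
# Breaking change from the slide: `user` is replaced by `userAnonymized`.
V2 = {
    "userAnonymized": {"type": "string"},
    "product": {"type": "string"},
    "quantity": {"type": "int"},
}

v1_1_ok = is_backward_compatible(V1_1, V1) and is_forward_compatible(V1_1, V1)
v2_ok = is_backward_compatible(V2, V1) or is_forward_compatible(V2, V1)
```

Under this rule `V1_1` can evolve in place on the same stream, while `V2` fails in both directions and would be published as a separate versioned stream.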
  • 28. Recommendations: Data as a First-class Product. 1. Data-on-the-Outside is harder to change, but it has more value in a holistic sense. a. Use schemas as a contract and to find data. b. Handle incompatible schema changes using the Dual Schema Upgrade Window pattern. 2. Get data from the source, not from intermediaries. Think: Demeter's law applied to data. a. Otherwise, “slightly corrupt” data proliferates within the mesh, as in a “Game of Telephone”. b. Event Streaming makes it easy to subscribe to data from authoritative sources. 3. Change data at the source, including error fixes. Don’t “fix data up” locally. 4. Some data sources will be difficult to turn into first-class data products. Example: Batch-based sources that lose event-level data or are not reproducible. a. Use Event Streaming plus CDC, Outbox Pattern, etc. to integrate these into the mesh.
  • 30. Why Self-service Matters. Example: a Trade Surveillance System. ● Data from 13 sources. ● Some sources publish events. ● Needed both historical and real-time data. ● Historical data came from database extracts arranged with the dev team. ● The format of the events differed from the format of the extracts. ● 9 months of effort to get 13 sources into the new system.
  • 32. Principle 3: Self-serve Data Platform 32 Central infrastructure that provides real-time and historical data on demand Objective: Make domains autonomous in their execution through rapid data provisioning
  • 33. Consuming real-time & historical data from the mesh 1) Separate Systems for Real-time and Historical Data (Lambda Architecture) - Considerations: - Difficult to correlate real-time data with historical “snapshot” data - Two systems to manage - Snapshots have less granularity than event streams 2) One System for Real-time and Historical Data (Kappa Architecture) - Considerations: - Operational complexity (addressed in Confluent Cloud) - Downsides of immutability of regular streams: e.g. altering or deleting events - Storage cost (addressed in Confluent Cloud, in Apache Kafka with KIP-405)
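The Kappa-style option can be illustrated with a small sketch in which one append-only log serves both historical and real-time reads: a consumer starts at offset 0 to catch up on history, then keeps polling the same log for new events. The `EventLog` and `Consumer` classes are hypothetical in-memory stand-ins, not a Kafka client API.

```python
class EventLog:
    """Append-only log; a stand-in for a single stream partition."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the appended event

    def read_from(self, offset):
        return self._events[offset:]


class Consumer:
    """Tracks its own position, so history and live data use the same read path."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.position = start_offset

    def poll(self):
        batch = self.log.read_from(self.position)
        self.position += len(batch)
        return batch


log = EventLog()
for delta in (5, -2, 7):
    log.append({"item": "widget", "delta": delta})

consumer = Consumer(log, start_offset=0)
history = consumer.poll()                   # catch up on everything already stored
log.append({"item": "widget", "delta": -1})
live = consumer.poll()                      # same API, now returning only the new event

stock = sum(event["delta"] for event in history + live)
```

Because history and live data share one format and one read path, there is nothing to correlate between a snapshot system and a streaming system, which is the main difficulty listed for the Lambda option.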
  • 34. What this can look like in practice [Screenshot: Browse Schemas]
  • 35. With ksqlDB the data mesh is queryable and decentralized. [Diagram labels: Destination, Data Port, STREAM PROCESSOR, ksqlDB] Query is the interface to the mesh. Events are the interface to the mesh.
  • 36. Mesh is one logical cluster. Each data product has another: its own cluster for internal use.
  • 38. Principle 4: Federated Governance • Objective: Independent data products can interoperate and create network effects. • Establish global standards, like governance, that apply to all data products in the mesh. • Ideally, these global standards and rules are applied automatically by the platform. [Diagram: four domains and the self-serve data platform] What is decided locally by a domain? What is decided globally (implemented and enforced by the platform)? You must balance decentralization against centralization. No silver bullet!
  • 39. Example standard: Identifying customers globally • Define how data is represented, so you can join and correlate data across different domains. • Use data contracts, schemas, registries, etc. to implement and enforce such standards. • Use Event Streaming to retrofit historical data to new requirements and standards. [Diagram: customerId=29639 shown four times across the mesh] Example query: SELECT … FROM orders o LEFT JOIN shipments s ON o.customerId = s.customerId EMIT CHANGES;
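To show why a shared customer identifier matters, here is a rough batch sketch of the left join from the query above. The record contents (other than customerId=29639, which appears on the slide) and the `left_join` helper are hypothetical, and unlike the `EMIT CHANGES` query this runs once over fixed lists rather than continuously over streams.

```python
from collections import defaultdict

# Hypothetical records from two domains that follow the same customerId standard.
orders = [
    {"orderId": "o1", "customerId": 29639},
    {"orderId": "o2", "customerId": 11111},
]
shipments = [
    {"shipmentId": "s1", "customerId": 29639},
]


def left_join(left, right, key):
    """Pair every left row with each matching right row, or with None if there is no match."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for left_row in left:
        # .get() avoids creating empty index entries for unmatched keys.
        matches = index.get(left_row[key]) or [None]
        for right_row in matches:
            joined.append((left_row, right_row))
    return joined


result = left_join(orders, shipments, "customerId")
```

The join only works because both domains represent the customer the same way; if one domain used a different key or format, the lookup would silently return no matches.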
  • 40. Example standard: Detect errors and recover with Streams • Use strategies like logging, data profiling, data lineage, etc. to detect errors in the mesh. • Streams are very helpful to detect errors and identify cause-effect relationships. • Streams let you recover and fix errors: e.g., replay & reprocess historical data. [Diagram: a data product's output data port as a stream with offsets 0 to 9, consumed by “My App”] Bug? Error? Rewind to the start of the stream, then reprocess. If needed, tell the origin data product to fix problematic data at the source. Event Streams give you a powerful Time Machine.
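The rewind-and-reprocess idea can be sketched as follows, assuming the consumer keeps no state that cannot be rebuilt from the stream. The event contents and both handler functions are hypothetical; the point is only that the retained log allows the result to be recomputed after a bug fix.

```python
# Hypothetical retained stream: quantity changes at offsets 0..3.
events = [{"offset": i, "qty": q} for i, q in enumerate([3, -1, 4, -2])]


def buggy_handler(total, event):
    # Bug (for illustration): negative quantities are silently dropped.
    return total + max(event["qty"], 0)


def fixed_handler(total, event):
    return total + event["qty"]


def process(log, handler, from_offset=0):
    """Fold the handler over the log, starting at the given offset."""
    total = 0
    for event in log[from_offset:]:
        total = handler(total, event)
    return total


before = process(events, buggy_handler)                # result produced by the buggy app
after = process(events, fixed_handler, from_offset=0)  # rewind to the start, then reprocess
```

If the error were in the events themselves rather than in the consumer, reprocessing alone would not help, and the fix would belong in the origin data product, as the slide notes.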
  • 41. Example standard: Tracking data lineage with Streams • Lineage must work across domains and data products, and also across systems, clouds, and data centers. • Event streaming is a foundational technology for this. [Figure label: On-premise]
  • 42. Recommendations: Federated Governance 1. Be pragmatic: Don’t expect governance systems to be perfect. a. They are a map that helps you navigate the data landscape of your company. b. But there will always be roads that have changed or have not been mapped. 2. Governance is more a process (i.e., an organizational concern) than a technology. 3. Beware of centralized data models, which can become slow to change. Where they must exist, use processes & tooling like GitHub to collaborate and change quickly. Good luck! 🙂
  • 44. Data Mesh Journey Principle 1: Data should have one owner: the team that creates it. Principle 2: Data is your product: All exposed data should be good data. Principle 3: Get access to any data immediately and painlessly, be it historical or real-time. Principle 4: Governance, with standards, security, lineage, etc. (cross-cutting concerns). [Figure: principles 1, 2, 3 plotted against difficulty to execute, with a “Start Here” marker]
  • 47. Implement a Data Mesh: Cheat Sheet - Centralize data in motion: Introduce a central event streaming platform. - Nominate data owners: Firm owners for all key datasets in the organization. Make ownership information broadly accessible. - Data on demand: Events are either stored in Kafka indefinitely or can be republished by data products on demand. - Handle schema change: Owners publish schema information to the mesh. A process is introduced for schema change approval. - Secure event streams: Access to event streams is permissioned by a central body. - Connect from any database: Sink Connectors are made available for all supported database types to ease the provisioning of new output data ports in the mesh. - Central user interface: discovery and registration of event streams; searching schemas for data of interest; previewing event streams; requesting access to event streams; data lineage views.
  • 48. developer.confluent.io • Free Courses on all things Kafka and Event Streaming • 50+ Design Patterns for Event Streaming • And more: Quickstarts, Tutorials, ...
  • 49. Recommended free courses: Data Mesh, Event Sourcing, Kafka 101