Data Mesh 101
@yourtwitterhandle | developer.confluent.io
What’s Covered
1 What Is Data Mesh?
2 Hands On: Creating Your Own Data Mesh
3 Data Ownership by Domain
4 Data as a Product
5 Data Available Everywhere, Self-Service
6 Data Governed Wherever It Is
7 Implementing a Data Mesh
8 Hands On: Exploring the Data Mesh
9 Hands On: Creating a Data Product
What Is Data Mesh?
@yourtwitterhandle | developer.confluent.io
What Is Data Mesh?
(Diagram: Data Mesh builds on ideas from DDD, Microservices, Event Streaming, and Data Marts. The DATA MESH is made up of domains such as Inventory, Orders, Shipments, and more, each exposing data products.)
@yourtwitterhandle | developer.confluent.io
The Principles of a Data Mesh
1. Data ownership by domain
2. Data as a product
3. Data available everywhere, self-serve
4. Data governed wherever it is
@yourtwitterhandle | developer.confluent.io
Spaghetti: Data Architectures Often Lack Rigor
@yourtwitterhandle | developer.confluent.io
Centralized Event Streams. Decentralized Data Products.
(Diagram: many apps connected to one another via a central Kafka cluster.)
Centralize an immutable stream of facts. Decentralize the freedom to act, adapt, and change.
@yourtwitterhandle | developer.confluent.io
A First Look
(Diagram: the DATA MESH is made up of domains, e.g., Inventory, Orders, Shipments, and more, each exposing data products.)
GET STARTED TODAY
Confluent Cloud
cnfl.io/confluent-cloud
PROMO CODE: DATAMESH101
$101 of free usage
Hands On: Creating Your Own Data Mesh
@yourtwitterhandle | developer.confluent.io
Hands On: Creating Your Own Data Mesh
Data Ownership by Domain
@yourtwitterhandle | developer.confluent.io
The Principles of a Data Mesh
1. Domain-Driven Decentralization: Local Autonomy (Organizational Concerns)
2. Data as a First-Class Product
3. Self-Serve Data Platform
4. Federated Governance
@yourtwitterhandle | developer.confluent.io
Principle 1: Domain-Driven Decentralization
Objective: Ensure data is owned by those who truly understand it
Pattern: Ownership of a data asset is given to the "local" team that is most familiar with it (decentralized data ownership).
Anti-pattern: Responsibility for data becomes the domain of the DWH team (centralized data ownership).
@yourtwitterhandle | developer.confluent.io
Practical Example
1. Joe in Inventory has a problem with Order data.
2. Inventory items are going negative because of bad Order data.
3. He could fix the data up locally in the Inventory domain and get on with his job.
4. Or, better, he contacts Alice in Orders and gets it fixed at the source. This is more reliable, as Joe doesn't fully understand the Orders process.
5. Ergo, Alice needs to be a responsible and responsive "Data Product Owner", so everyone benefits from the fix to Joe's problem.
(Diagram: Alice's Orders Domain publishes Order Data, consumed by Inventory (Joe), Billing, and Recommendations; the Shipment Domain publishes Shipping Data.)
@yourtwitterhandle | developer.confluent.io
Recommendations: Domain-Driven Decentralization
Learn from DDD:
● Use a standard language and nomenclature for data.
● Business users should understand a data flow diagram.
● The stream of events should create a shared narrative that is business-user
comprehensible.
Data as a Product
@yourtwitterhandle | developer.confluent.io
The Principles of a Data Mesh
1. Domain-Driven Decentralization: Local Autonomy (Organizational Concerns)
2. Data as a First-Class Product: Product thinking, a "Microservice for Data"
3. Self-Serve Data Platform
4. Federated Governance
@yourtwitterhandle | developer.confluent.io
Principle 2: Data as a First-Class Product
Objective: Make shared data discoverable, addressable, trustworthy, and secure, so other teams can make good use of it.
● Data is treated as a true product, not a by-product.
This product thinking is important to prevent data chauvinism.
@yourtwitterhandle | developer.confluent.io
Data Product, a "Microservice for the Data World"
● A data product is a node on the data mesh, situated within a domain.
● It produces—and possibly consumes—high-quality data within the mesh.
● It encapsulates all the elements required for its function, namely data + code + infrastructure (e.g., an "Items About to Expire" data product):
○ Data: the data and metadata, including history
○ Code: creates, manipulates, serves, etc. that data
○ Infra: powers the data (e.g., storage) and the code (e.g., run, deploy, monitor)
@yourtwitterhandle | developer.confluent.io
Connectivity Within the Mesh Lends Itself …
DATA MESH
Domain
Data Product
@yourtwitterhandle | developer.confluent.io
…Naturally to Event Streaming with Apache Kafka®
DATA MESH
Domain
Data Product
Mesh is a logical view, not physical!
@yourtwitterhandle | developer.confluent.io
Event Streaming Is the Pub/Sub, Not Point-to-Point
Data products write (publish) to persisted streams; other data products read (consume) those streams, and other streams, independently.
Data producers are scalably decoupled from consumers.
@yourtwitterhandle | developer.confluent.io
Why Is Event Streaming a Good Fit for Meshing?
● Streams are real-time, low latency ⇒ Propagate data immediately.
● Streams are highly scalable ⇒ Handle today's massive data volumes.
● Streams are stored, replayable ⇒ Capture real-time & historical data.
● Streams are immutable ⇒ Auditable source of record.
● Streams are addressable, discoverable, … ⇒ Meet key criteria for the mesh.
@yourtwitterhandle | developer.confluent.io
How to Get Data Into and Out of a Data Product
Data flows through a data product's input data ports and output data ports, in either direction, as:
1. Snapshot via Nightly ETL
2. Continuous Stream
3. Snapshot via Req/Res API
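For instance, with option 2 (continuous stream), an output data port can be a schema-backed ksqlDB stream over a public topic. A minimal sketch follows; the topic, column names, and partition count are hypothetical, not from the course.

CREATE STREAM orders_output_port (
  orderId    VARCHAR KEY,
  customerId VARCHAR,
  product    VARCHAR,
  quantity   INT
) WITH (
  KAFKA_TOPIC  = 'orders.events',  -- the addressable topic other domains consume
  VALUE_FORMAT = 'AVRO',           -- the registered schema acts as the data contract
  PARTITIONS   = 6
);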
@yourtwitterhandle | developer.confluent.io
Onboarding Existing Data
Use Kafka source connectors to stream data from cloud services and existing systems into the mesh, via a data product's input data ports.
https://www.confluent.io/hub/
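A minimal sketch of onboarding an existing database table through an input data port, assuming the Confluent JDBC source connector is installed; the connection details, table name, and topic prefix are hypothetical.

CREATE SOURCE CONNECTOR legacy_orders_source WITH (
  'connector.class'          = 'io.confluent.connect.jdbc.JdbcSourceConnector',
  'connection.url'           = 'jdbc:mysql://legacy-db:3306/shop',
  'table.whitelist'          = 'orders',
  'mode'                     = 'incrementing',
  'incrementing.column.name' = 'order_id',
  'topic.prefix'             = 'legacy.'   -- rows land in the topic legacy.orders
);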
@yourtwitterhandle | developer.confluent.io
Data Product: What's Happening Inside
Data on the Inside: how the domain team solves specific problems internally, between its input and output data ports. This doesn't matter to other domains; pick your favorite technologies.
@yourtwitterhandle | developer.confluent.io
Event Streaming Inside a Data Product
Use ksqlDB, Kafka Streams apps, etc., for processing data in motion:
1. Stream data from other DPs or internal systems into ksqlDB (input data ports).
2. Use ksqlDB to filter, process, join, aggregate, and analyze.
3. Stream data to internal systems or the outside (output data ports). Pull queries can drive a req/res API.
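A minimal sketch of steps 1-3, with hypothetical stream, table, and column names: aggregate an input-port stream into a materialized table, then serve it with a pull query.

-- Steps 1+2: continuously aggregate the input-port stream into a table.
CREATE TABLE inventory_by_item AS
  SELECT itemId,
         LATEST_BY_OFFSET(quantity) AS quantity
  FROM inventory_events
  GROUP BY itemId;

-- Step 3: a pull query against the materialized table can back a req/res API.
SELECT quantity FROM inventory_by_item WHERE itemId = 'SKU-123';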
@yourtwitterhandle | developer.confluent.io
Event Streaming Inside a Data Product
Use Kafka connectors and CDC to "streamify" classic databases:
1. DB client apps work as usual (e.g., against MySQL).
2. Stream data from other Data Products into your local DB via a sink connector (input data ports).
3. Stream data to the outside via a source connector, using CDC and e.g. the Outbox Pattern, ksqlDB, etc. (output data ports).
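A sketch of step 2 using the Confluent JDBC sink connector; the connection details and topic name are hypothetical. Step 3 would typically pair a CDC source connector, such as Debezium's MySQL connector, with the Outbox Pattern.

CREATE SINK CONNECTOR orders_to_local_db WITH (
  'connector.class' = 'io.confluent.connect.jdbc.JdbcSinkConnector',
  'connection.url'  = 'jdbc:mysql://localhost:3306/inventory',
  'topics'          = 'orders.events',  -- a stream published by another data product
  'auto.create'     = 'true'            -- create the target table if it is missing
);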
@yourtwitterhandle | developer.confluent.io
Dealing with Data Change: Schemas and Versioning
● Publish evolving streams with back/forward-compatible schemas.
● Publish versioned streams for breaking changes, e.g.:
○ V1: user, product, quantity
○ V2: userAnonymized, product, quantity
Also, when needed, data can be fully reprocessed by replaying history.
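As a hedged sketch of publishing the V2 stream alongside V1 (the stream names, topic, and the use of ksqlDB's MASK function are illustrative choices):

CREATE STREAM orders_v2
  WITH (KAFKA_TOPIC = 'orders.v2', VALUE_FORMAT = 'AVRO') AS
  SELECT MASK(`user`) AS userAnonymized,  -- breaking change: user is anonymized
         product,
         quantity
  FROM orders_v1
  EMIT CHANGES;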
@yourtwitterhandle | developer.confluent.io
Recommendations: Data as a First-Class Product
1. Data-on-the-Outside is harder to change, but it has more value in a holistic sense.
a. Use schemas as a contract.
b. Handle incompatible schema changes using the Dual Schema Upgrade Window pattern.
2. Get data from the source, not from intermediaries. Think: Demeter's law applied to data.
a. Otherwise, "slightly corrupt" data proliferates within the mesh, like a game of telephone.
b. Event Streaming makes it easy to subscribe to data from authoritative sources.
3. Change data at the source, including error fixes. Don't "fix data up" locally.
4. Some data sources will be difficult to turn into first-class data products, for example batch-based sources that lose event-level data or are not reproducible.
a. Use Event Streaming plus CDC, the Outbox Pattern, etc. to integrate these into the mesh.
@yourtwitterhandle | developer.confluent.io
Learn More
Find out how to implement a Data Product for data in motion with:
1. Apache Kafka 101 course
2. Kafka Connect 101 course
3. Building Data Pipelines with Apache Kafka & Confluent course
4. ksqlDB 101 course
developer.confluent.io/learn-kafka/
Data Available Everywhere, Self-Service
@yourtwitterhandle | developer.confluent.io
The Principles of a Data Mesh
1. Domain-Driven Decentralization: Local Autonomy (Organizational Concerns)
2. Data as a First-Class Product: Product thinking, a "Microservice for Data"
3. Self-Serve Data Platform: Infra Tooling, Across Domains
4. Federated Governance
@yourtwitterhandle | developer.confluent.io
Why Self-Service Matters
Trade Surveillance System:
● Data from 13 sources
● Some sources publish events
● Needed both historical and real-time data
● Historical data came from database extracts arranged with each dev team
● The format of the events differed from the format of the extracts
● Result: 9 months of effort to get the 13 sources into the new system
@yourtwitterhandle | developer.confluent.io
Principle 3: Self-Serve Data Platform
Central infrastructure that provides real-time and historical data on demand
Objective: Make domains autonomous in their execution through rapid data
provisioning
@yourtwitterhandle | developer.confluent.io
Consuming Real-Time & Historical Data from the Mesh
1) Separate Systems for Real-Time and Historical Data (Lambda Architecture)
● Considerations:
○ Difficult to correlate real-time data with historical "snapshot" data
○ Two systems to manage
○ Snapshots have less granularity than event streams
2) One System for Real-Time and Historical Data (Kappa Architecture)
● Considerations:
○ Operational complexity (addressed in Confluent Cloud)
○ Downsides of immutability of regular streams: e.g., altering or deleting events
○ Storage cost (addressed in Confluent Cloud, and in Apache Kafka with KIP-405)
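With the Kappa approach, a single push query can serve both historical and real-time data by reading the stream from the beginning. A minimal sketch, assuming an orders stream is registered in ksqlDB:

SET 'auto.offset.reset' = 'earliest';

-- Replays all stored events first, then keeps emitting new ones as they arrive.
SELECT * FROM orders EMIT CHANGES;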
@yourtwitterhandle | developer.confluent.io
What This Can Look Like in Practice: Browse Schemas
@yourtwitterhandle | developer.confluent.io
Implementation: Database Inside Out
(Diagram: source databases feed, via connectors, into messaging that remembers, then into a stream processor such as ksqlDB, which materializes databases/views; connectors also feed downstream databases.)
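A minimal sketch of this "database inside out" idea in ksqlDB, with hypothetical stream and column names: a shipments stream is materialized into a continuously updated table (the databases/views in the diagram above) that can then be queried like a database table.

-- Materialize a view from the stream.
CREATE TABLE shipments_by_customer AS
  SELECT customerId,
         COUNT(*) AS shipmentCount
  FROM shipments
  GROUP BY customerId;

-- Query the view on demand, as you would a database table.
SELECT shipmentCount FROM shipments_by_customer WHERE customerId = '29639';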
@yourtwitterhandle | developer.confluent.io
Data Mesh Is Queryable and Decentralized with ksqlDB
● Events are the interface to the mesh.
● With a stream processor such as ksqlDB feeding a destination data port, a query is also an interface to the mesh.
@yourtwitterhandle | developer.confluent.io
The Mesh Is One Logical Cluster. A Data Product Has Another.
A data product has its own cluster for internal use.
@yourtwitterhandle | developer.confluent.io
Learn More
Find out more about how to use Kafka and event streaming:
1. Apache Kafka 101 course
2. Event Sourcing and Event Storage with Apache Kafka course
Learn how to create materialized views and process event streams:
1. Inside ksqlDB course
developer.confluent.io/learn-kafka/
Data Governed Wherever It Is
@yourtwitterhandle | developer.confluent.io
The Principles of a Data Mesh
1. Domain-Driven Decentralization: Local Autonomy (Organizational Concerns)
2. Data as a First-Class Product: Product thinking, a "Microservice for Data"
3. Self-Serve Data Platform: Infra Tooling, Across Domains
4. Federated Governance: Interoperability, Network Effects (Organizational Concerns)
@yourtwitterhandle | developer.confluent.io
Principle 4: Federated Governance
Objective: Independent data products can interoperate and create network effects.
● Establish global standards, like governance, that apply to all data products in the mesh.
● Ideally, these global standards and rules are applied automatically by the self-serve data platform.
● Decide what is decided locally by a domain, and what is decided globally (implemented and enforced by the platform).
You must balance decentralization vs. centralization. There is no silver bullet!
@yourtwitterhandle | developer.confluent.io
Example Standard: Identifying Customers Globally
● Define how data is represented (e.g., customerId=29639), so you can join and correlate data across different domains.
● Use data contracts, schemas, registries, etc. to implement and enforce such standards.
● Use Event Streaming to retrofit historical data to new requirements and standards.

SELECT … FROM orders o
  LEFT JOIN shipments s
  ON o.customerId = s.customerId
EMIT CHANGES;
@yourtwitterhandle | developer.confluent.io
Example Standard: Detect Errors and Recover with Streams
● Use strategies like logging, data profiling, data lineage, etc. to detect errors in the mesh.
● Streams are very helpful for detecting errors and identifying cause-effect relationships.
● Streams let you recover and fix errors: e.g., replay and reprocess historical data.
Bug? Error? Rewind to the start of the stream, then reprocess: event streams give you a powerful time machine. If needed, tell the origin data product to fix problematic data at the source.
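A sketch of the replay-and-recover path in ksqlDB, with the stream, column, and filter below being illustrative: rewind to the earliest offset and rebuild a repaired stream from the full history.

SET 'auto.offset.reset' = 'earliest';

-- Reprocess the entire history, dropping the corrupt events detected earlier.
CREATE STREAM orders_repaired AS
  SELECT *
  FROM orders
  WHERE quantity > 0
  EMIT CHANGES;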
@yourtwitterhandle | developer.confluent.io
Example Standard: Tracking Data Lineage with Streams
● Lineage must work across domains and data products—and systems, clouds, data centers.
● Event streaming is a foundational technology for this.
@yourtwitterhandle | developer.confluent.io
Recommendations: Federated Governance
1. Be pragmatic: Don’t expect governance systems to be perfect.
a. They are a map that helps you navigate the data landscape of your company.
b. But there will always be roads that have changed or have not been mapped.
2. Governance is more a process—i.e., an organizational concern—than a technology.
3. Beware of centralized data models, which can become slow to change. Where they
must exist, use processes & tooling like GitHub to collaborate and change quickly.
Good luck! 🙂
Implementing a Data Mesh
@yourtwitterhandle | developer.confluent.io
Data Mesh Journey
Start with Principle 1 and work up; the difficulty to execute increases with each step.
Principle 1: Data should have one owner: the team that creates it.
Principle 2: Data is your product. All exposed data should be good data.
Principle 3: Get access to any data immediately and painlessly, be it historical or real-time.
Principle 4: Governance, with standards, security, lineage, etc. (cross-cutting concerns).
@yourtwitterhandle | developer.confluent.io
Implement a Data Mesh
● Centralize data in motion
Introduce a central event streaming
platform.
● Nominate data owners
Firm owners for all key datasets in the
organization. Make ownership information
broadly accessible.
● Data on demand.
Events are either stored in Kafka indefinitely
or can be republished by data products on
demand.
● Handle Schema Change
Owners publish schema information to the
mesh. Process introduced for schema
change approval.
● Secure event streams
Access to event streams is permissioned by
a central body.
● Connect from any database
Sink Connectors are made available for all
supported database types to ease the
provisioning of new output data ports in the
mesh.
● Central user interface
○ Discovery and registration of event streams
○ Searching schemas for data of interest
○ Previewing event streams
○ Requesting access to event streams
○ Data lineage views
Hands On: Exploring the Data Mesh
@yourtwitterhandle | developer.confluent.io
Hands On: Exploring the Data Mesh
Hands On: Creating a Data Product
@yourtwitterhandle | developer.confluent.io
Hands On: Creating a Data Product
Your Apache Kafka® journey begins here
developer.confluent.io