How to govern and secure a Data Mesh?


search lineage manage
quality policies

Provides data
Data Producer Team
Data Consumer Team
Data Engineers & Explorers
Provides data
Agnostic Operational Data Platform
Supports making data available
Supports data discovery
Supports data provision and data discovery

Challenge #1: Centralized Teams
BIG DATA PLATFORM: Ingest, Process, Serve
Requirements, Data
Data Producers: Domain Expertise, Direct Influence on Data Quality
Central Data Team: Data Engineering Capability, Responsible for Data Quality
Data Consumers: Interest in Data Quality, Data Application Experience
Failure Symptoms for Creating Data Driven Value: FAIL TO BOOTSTRAP, FAIL TO SCALE SOURCES, FAIL TO SCALE CONSUMERS, FAIL TO MATERIALIZE DATA-DRIVEN VALUE
Data Architectures & Organization Today: Centralized Architecture, Technically Decomposed, Hyper-Specialized Silo Delivery
Risks of Creating a Disconnect between Data Owners and Users

Challenge #2: Data Sharing & System Decoupling
Get away from Point-to-Point Data Sharing

Need: Operational Data Platform
Scalable and completely decoupled architecture for high-quality, self-service access to real-time data streams
Diagram: sources and data products
Combine and enrich data from anywhere to anywhere for real-time data sharing and greater reuse

Challenge #3: Bridging the
Operational & Analytical Worlds
DATA PIPELINES
OPERATIONAL SYSTEMS: Domain 1 Operational Database, Domain 2 Operational Database, Domain 3 Operational Database
ANALYTICAL SYSTEMS: Data Lake, Lake House, Data Mart, Data Warehouse, ML/AI, Reports & Dashboards

Analytical Repositories
The reality with today’s data integration strategy
A giant mess of monolithic point-to-point connections with data fidelity and governance challenges
Operational Repository

Challenge #4: The “modern” data stack is built on a legacy paradigm
Core processing
systems
External data
Unstructured data
Systems of Record
Browser mobile
Logs
Telemetry
SaaS apps
…
Infrastructure
Data Sources
Data Warehouse
Operational
Systems
Extract/Load
1
2
Transform
3
Reverse ETL
4
4
BI Tools
SaaS
Applications

Batch-based, low-fidelity, stale data is unsuitable for real-time operational and BI use cases
Immature governance and observability create data access conflict between IT ops and engineering teams
Infra-heavy data processing leads to scale and performance challenges and high overall TCO
Over-reliance on central data teams with limited domain knowledge creates innovation bottlenecks
Inflexible monolithic design results in multiple siloed, purpose-built pipelines, increasing sprawl
Stale data, rigid engineering, and poor lineage and governance slow developer agility and innovation
Today’s data integration approaches create a chaotic and unscalable data foundation

Data Mesh - Domain Data as a Product
Planning, Transformation, Visualization (each with its own ETL)
Domain data products, each exposed through an API: Loyalty Program, Products, Customer Data, Risk, Payments

Data Mesh is an Architecture with Implementation
customer analytics
Operational Data Platform
Analytical Data Platform
transaction system
fraud detection
payments
customer onboarding
bank account
Search listing by term
asset management
Identity Provider
Policy Provider
Data Catalog
Auditing
Self-Service Data Portal
Enterprise Infrastructure

Cloud Data Systems
Data Stores (e.g. PostgreSQL, MongoDB Atlas, MySQL, Oracle DB)
Application Data (e.g. Salesforce, ServiceNow, GitHub, Zendesk)
Log Data & Messaging Systems (e.g. MQTT, Azure Service Bus, Azure Event Hubs, Solace)
ksqlDB
Confluent
Source connectors (optional: SMT)
Sink connectors (optional: SMT)
Power your operational and analytical systems with real-time streaming data
OLTP Systems: MongoDB Atlas, Amazon DynamoDB, Azure Cosmos DB, Google BigTable, Cassandra, Redis, PostgreSQL, MySQL
OLAP Systems: Amazon Redshift, Snowflake, Google BigQuery, Azure Synapse Analytics, Databricks Delta Lake, Amazon S3, Google Cloud Storage, Azure Blob Storage
Stream Governance
Pre-built Connectors

Data Mesh is an Architecture with Implementation
Data Stores, Logs, 3rd Party Apps, Custom Apps / Microservices, Device Logs, ...
Event-Streaming Applications: Real-time Inventory, Real-time Fraud Detection, Real-time Customer 360, Machine Learning Models, Real-time Data Transformation, ...
Universal Event Pipeline
S3, SaaS apps, App, Mainframes, Snowflake, Splunk

Confluent has proved to be ready EVERYWHERE
Private Cloud: Deploy on premises with Confluent Platform
Public/Multi-Cloud: Leverage a fully managed service with Confluent Cloud
Hybrid Cloud: Build a persistent bridge from datacenter to cloud with Cluster Linking

Validate the Event
Classify the Event
Connect the Event
Discover the Event
Protect the Event

Areas of Investment
Freedom of Choice
Event Streaming Platform
DATA CATALOG: Classify and understand the meaning of the data in Kafka.
DATA QUALITY: Enforce and understand the quality of the data in Kafka.
DATA LINEAGE: Follow and understand the flow of the data in Kafka.
DATA POLICIES: Enforce policies around who can see and do what.
SELF-SERVICE: Decentralized data management in Kafka.
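
Sketch: "policies around who can see and do what" can be pictured as a small role-based check. This is an illustration only, not a Confluent API. The role names, topic prefixes, and operation names below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One grant: a role may perform some operations on topics with a prefix."""
    role: str
    topic_prefix: str
    operations: frozenset


# Hypothetical grants, for illustration only.
POLICIES = [
    Policy("payments-producer", "payments.", frozenset({"read", "write"})),
    Policy("analytics-consumer", "payments.", frozenset({"read"})),
]


def is_allowed(role: str, topic: str, operation: str) -> bool:
    """Return True if any policy grants the role this operation on the topic."""
    return any(
        p.role == role
        and topic.startswith(p.topic_prefix)
        and operation in p.operations
        for p in POLICIES
    )
```

Anything not explicitly granted is denied, so an unknown role or an unlisted operation is rejected by default.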

Confluent’s Areas of Investment
Freedom of Choice
EVENTS CATALOG: Understand the meaning of the data in Kafka
Search and discover: Metadata index | UI & API access
Classification: Tags | Generic key values
Central metadata repository: Technical metadata | Business metadata
EVENTS QUALITY: Understand the quality of the data in Kafka
Profiling: Data insights
Monitor quality: API | UI
Schema validation: Client side | Broker side
EVENTS LINEAGE: Understand the flow of the data in Kafka
Point in time lineage: Lookup lineage by date
Inter cluster lineage: Flow of data across clusters
Intra cluster lineage: Flow of data inside a cluster
Enterprise license
Fully Managed Cloud Service | Self-managed Software
Apache Kafka
Live
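
Sketch: the schema validation item above, on the client side, amounts to checking an event against an agreed shape before it is produced. The example below is a plain-Python illustration and does not use Schema Registry. The schema format (field name mapped to a Python type) and the payment fields are assumptions made for this example.

```python
# Hypothetical schema: field name -> required Python type.
PAYMENT_SCHEMA = {"payment_id": str, "amount": float, "currency": str}


def validate_event(event: dict, schema: dict) -> list:
    """Return a list of problems found; an empty list means the event is valid."""
    problems = []
    # Check every field the schema requires, in schema order.
    for field, expected in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    # Flag fields the schema does not know about.
    for field in event:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems
```

A producer could call `validate_event` before sending and refuse to publish when the returned list is not empty. A broker-side check would apply the same idea at the point where the event is accepted.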

Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
stream catalog
Increase collaboration and productivity
with self-service data discovery
Tag and classify data to increase the value of your catalog
Share what
you build
Find what
you need

stream catalog
Tags
Key value pairs
TECHNICAL METADATA BUSINESS METADATA
INDEX
ENTITY TYPES
Owner
...
Name
Creation date
Integration with 3rd parties
stream catalog

Custom Producer, Custom Consumer, Source Connector, Sink Connector, ksqlDB, KStreams
Events Catalog
Entity type system: Cluster, Topic, Schema
Tag system: PII, PCI, Sensitive, ...
Kafka technical metadata
Kafka business metadata
Kafka lineage metadata: topic, producer, consumer
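
Sketch: a catalog built from an entity type system and a tag system can be modeled as entities that carry tags and can be searched by them. The class and method names below are hypothetical and do not mirror any Confluent API. The entity types and tag names are taken from the slide; the topic and schema names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    entity_type: str  # "cluster", "topic", or "schema"
    tags: set = field(default_factory=set)


class EventsCatalog:
    """In-memory catalog keyed by (entity type, name)."""

    def __init__(self):
        self._entities = {}

    def register(self, entity: Entity) -> None:
        self._entities[(entity.entity_type, entity.name)] = entity

    def tag(self, entity_type: str, name: str, tag: str) -> None:
        # Raises KeyError if the entity was never registered.
        self._entities[(entity_type, name)].tags.add(tag)

    def search(self, tag: str) -> list:
        """Return the names of all entities carrying the tag, sorted."""
        return sorted(e.name for e in self._entities.values() if tag in e.tags)
```

For example, after tagging a hypothetical topic `customers` as `PII`, `catalog.search("PII")` returns that topic's name.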

Business Metadata
Team: name, person_of_contact, cost_center
Domain: name, team, boundary
Owner: name, phone, email
Data_product: name, tier, owner
github: url, repo
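
Sketch: the business metadata entities listed above could be expressed as typed records. The field names come from the slide. The field types, the nesting of Team inside Domain and Owner inside DataProduct, and the sample values are assumptions made for this example.

```python
from dataclasses import dataclass, asdict


@dataclass
class Team:
    name: str
    person_of_contact: str
    cost_center: str


@dataclass
class Domain:
    name: str
    team: Team
    boundary: str


@dataclass
class Owner:
    name: str
    phone: str
    email: str


@dataclass
class DataProduct:
    name: str
    tier: str
    owner: Owner


@dataclass
class Github:
    url: str
    repo: str


# Hypothetical sample values.
owner = Owner(name="Jane Doe", phone="+41 00 000 00 00", email="jane@example.com")
product = DataProduct(name="payments", tier="gold", owner=owner)

# asdict() flattens nested dataclasses into plain dictionaries,
# which is a convenient shape for attaching metadata to a catalog entry.
metadata = asdict(product)
```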

Next-gen data lifecycle with Confluent
Sources: Database, SaaS app, Custom producer, System of Record
Connect: Continuously stream data to Confluent
Govern: Tag and secure data streams
Enrich: Process and cleanse data
Build: Create ready-to-share, ready-to-use data products
Share: Multicast to any destination
Topics 1 to 6, Topics A to D, Data Product 1, Data Product 2
Destinations: Real-time Apps, SaaS Apps, Data Warehouse, Dashboards

Overall Architecture
Kafka
search lineage manage
quality policies
Integration
RBAC / ABAC / PBAC
Events Quality (Broker interceptors, content validation rules)
on-prem hosted
Technical metadata, Business metadata, Lineage metadata
REST API, GraphQL
Events Portal
Events Catalog
Schema Registry
Connectors
Client Applications: Java, .NET, Python, ..
REST Proxy
Analytics
Data storage
Data Catalog
Non-streaming data sources: DB, MQ, Host

Confluent acts as the Central Nervous System to connect all of your apps & data systems
Databases
Data Warehouses
AWS, Azure, GCP
Legacy Infra / Mainframes
Custom Apps
SaaS Apps
Legacy Apps

Inter-cluster lineage
API
Time series lineage
Interoperability
Catalog integration
Confluent focuses on..
Intra-cluster lineage
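
Sketch: intra-cluster lineage can be pictured as a directed graph of producers, topics, and consumers, where following the edges answers "what is downstream of this topic?". The edge list and all names below are hypothetical, and this is not how any specific product stores lineage.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream, downstream).
EDGES = [
    ("orders-service", "orders"),
    ("orders", "enrich-app"),
    ("enrich-app", "orders-enriched"),
    ("orders-enriched", "warehouse-sink"),
]


def downstream(edges, start):
    """Return every node reachable from start, in breadth-first order."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

Inter-cluster lineage would add edges that cross cluster boundaries, and point-in-time lineage would keep a timestamp on each edge so the graph can be rebuilt for a given date.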

SWITZERLAND
Backbone for a scalable business
Networks & Data Mesh
Modern Payments in Cloud
Central Data Platform
Convergence of BI & CX
IoT & Microservices
Governed Data Mesh

Thank you!
Vielen Dank!
Merci beaucoup!
