Architect’s Open-Source Guide for a Data Mesh Architecture


Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?

In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges in implementing Data Mesh systems and focus on the role open-source projects play in them. Projects like Apache Spark can play a key part in implementing the standardized infrastructure platform of a Data Mesh. We will examine the landscape of open-source data engineering projects that are useful in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on the work (culture, tools, mindset) needed to make Data Mesh more accessible to engineers in the industry.

The audience will leave with a good understanding of the benefits of Data Mesh architecture, its common challenges, and the role of Apache Spark and other open-source projects in implementing it in real systems.

This session is targeted at architects, decision-makers, data engineers, and system designers.


  1. Architect’s Open-Source Guide for a Data Mesh Architecture. Lena Hall, Microsoft
  2. Lena Hall, Director at Microsoft Azure Engineering (lenadroid)
     ✓ Architecture
     ✓ Cloud
     ✓ Data
     ✓ ML/AI
  3. Entry Point
     • How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh: https://martinfowler.com/articles/data-monolith-to-mesh.html
     • Data Mesh Principles and Logical Architecture: https://martinfowler.com/articles/data-mesh-principles.html
     • Slack for Data-Mesh-Learning: https://launchpass.com/data-mesh-learning
  4. Talk Snapshot
     • What is Data Mesh
     • When is Data Mesh a Good Idea
     • Core Principles and Concepts
     • Example: Drone Delivery Service
     • Challenges
     • OSS and Open Standards
  5. When and Why Data Mesh
  6. Data Mesh is Not For Everyone
  7. Challenges Indicating Data Mesh May Be Considered
  8. Drone Delivery Service
  9. WHYs
     • Ambiguity in Ownership and Responsibility
     • Slow Change due to Coupling to Monolithic System
     • Data Engineering Resources Bottleneck
  10. Ideas Composing Data Mesh Concept
  11. Core Ideas
      ✓ Decentralized teams and data ownership
  12. Core Ideas
      ✓ Decentralized teams and data ownership
      ✓ Data Products powered by Domain Driven Design
  13. High-Level View of a Data Product
  14. Core Ideas
      ✓ Decentralized teams and data ownership
      ✓ Data Products powered by Domain Driven Design
      ✓ Self-serve Shared Data Infrastructure
  15. Core Ideas
      ✓ Decentralized teams and data ownership
      ✓ Data Products powered by Domain Driven Design
      ✓ Self-serve Shared Data Infrastructure
      ✓ Global Federated Governance
  16. Drone Delivery Service Data Products
  17. (diagram only; no text)
  18. Core Principles for Data Products
  19. Core Principles for Data Products: DISCOVERABLE
  20. Core Principles for Data Products: DISCOVERABLE, SELF-DESCRIBING
  21. Core Principles for Data Products: DISCOVERABLE, SELF-DESCRIBING, ADDRESSABLE
  22. Core Principles for Data Products: DISCOVERABLE, SELF-DESCRIBING, ADDRESSABLE, SECURE
  23. Core Principles for Data Products: DISCOVERABLE, SELF-DESCRIBING, ADDRESSABLE, SECURE, TRUSTWORTHY
  24. Core Principles for Data Products: DISCOVERABLE, SELF-DESCRIBING, ADDRESSABLE, SECURE, TRUSTWORTHY, INTEROPERABLE
  25. Cheat Sheet for Planning Data Products
      Input Ports Questions
      • Data Source: Where is the data coming from? An external dataset or another data product?
      • Data Format: What is the format of the source input?
      • Rate of Updates: How frequently does the input need to be updated?
      Output Ports Questions
      • End-consumers: Who are the end-users of the data product?
      • Data purpose: What are they planning to do with the data outputs?
      • Data access: Who needs to have access?
      • Data address: How do they prefer to access the data output?
      • Data Format: What format of the data do they expect?
      Identity and Permission Policies Questions
      • Which resources is this data product allowed to access?
      • Which data products or users can read which output ports of this data product?
      • Are all sensitive resources this data product offers protected according to their required privacy standards (e.g. HIPAA, GDPR, PII, CCPA, etc.)?
      • Is the permissions policy stored and managed in the same package as the data product?
      Data Product Action Questions
      • What action needs to happen to produce the outcomes for the end-users?
      • What adjustments, transformations, filters, updates, or quality improvements does the input data require?
      Operational Questions
      • How can this data product be discovered, and how should it be described to other data products that might want to consume it?
      • Which metadata and information should it make available to the end-users?
      • Where and how should data product versioning be managed during updates to ensure consistency with how the end-users consume it?
      • Which SLAs or SLOs does the data product provide?
      • Which product success metrics (adoption, usage, quality) can this data product expose and keep track of?
      • Is the automation/resource orchestration logic stored in the same package?
      Other Questions
      • Is this data product free of tight coupling to any other data source, data product, or resource that would make it non-interoperable?
      • Does this data product follow the global governance standards and practices defined by the organization?
      • Does this data product have any implementation details that could interfere with its portability?
  26. Self-Serve Shared Infrastructure
  27. Types of Workloads Within a Data Product (diagram: real-time data ingestion, processing, object storage, columnar storage, incoming requests, web service)
  28. It can look like this (diagram: web service, processing, Azure Data Lake)
  29. Or, it can look like this (diagram: web service, Google Storage)
  30. Self-Serve Shared Infrastructure
      • Shared platform for streaming ingestion
      • Shared platform for raw data storage
      • Shared platform for columnar data storage
      • Shared platform for container workloads
      • Shared platform for continuous delivery
      • Shared platform for observability
      • Data catalogue
      • And more, depending on the organization
  31. Data Mesh (diagram: data products that are DISCOVERABLE, SELF-DESCRIBING, ADDRESSABLE, SECURE, TRUSTWORTHY, and INTEROPERABLE, built on shared platforms for streaming ingestion, raw data storage, columnar data storage, container workloads, continuous delivery, and observability, plus a data catalogue)
  32. Wait, What About the OSS Tools for Data Mesh?
  33. Challenges with Data Mesh
  34. Challenges
      • Cost questions
      • Lack of end-to-end examples
      • Effort to shift from centralized architecture to decentralization-friendly techniques
      • Automation required to enable creating data products
      • Underestimating the importance of organizational aspects
  35. Considerations for Technology Choices
  36. Considerations for Technology Choices
      • Workload sharing and multi-tenancy
      • No-copy data and compute mobility support
      • Granularity of access control
      • Richness of automation and extensibility capabilities
      • Flexibility and elasticity
      • Provider-agnostic/multi-cloud operations support
      • Variety of limitations (quotas, data volume, resource count, etc.)
      • Open Standards, Open Protocols, Open-Source Integrations
  37. Examples of Data Mesh-friendly Technologies
  38. Technology landscape (diagram; individual logos not recoverable), by category:
      • Data Catalogue, Data Lineage, Data Governance
      • OSS Data Analytics, Data Processing, Data Querying
      • Cloud Storage: Amazon S3, Azure Data Lake, Google Storage
      • Open Formats
      • Data Ingestion, Streaming
      • Data Orchestration, Workflows
      • OSS Storage Products for Data Analytics and Processing
      • Data Visualization and BI Tools
      • Data Experimentation
      • Cross-Platform Concepts and Tools
      • Multi and Hybrid Cloud Tools: Anthos, Azure Arc
      • Infrastructure Automation
  39. Data Governance Systems
      • Metadata
      • Data lineage
      • Data schemas
      • Data relationships
      • Data classification
      • Data security
      • Data catalog
  40. Open Formats
      • Open standard
      • Atomic updates, serializable isolation, transactions
      • Concurrent operations
      • Versioning, rollbacks, time travel
      • Schema evolution
      • Scale, efficiency, data volumes
      • Compatibility with existing data stores and languages
  41. Data Platforms (Cloud or OSS)
      • Separation of storage and compute
      • Support for no-copy data sharing
      • Bringing compute to data
      • Fine-grained access permissions
      • Support for automation and resource management
      • Open standards and interoperability with other platforms and tools for governance, visualization, analytics, etc.
  42. Multi-Cloud Infrastructure Management
      • Terraform: open-source infrastructure-as-code software tool that enables you to safely and predictably create, change, and improve infrastructure.
      • Pulumi: open-source infrastructure-as-code SDK that enables you to create, deploy, and manage infrastructure on any cloud, using your favorite languages.
      • Crossplane: assemble infrastructure from multiple vendors, and expose higher-level self-service APIs for application teams to consume, without having to write any code.
  43. Multi-Cloud Workload Portability
      • Azure Arc: build cloud-native apps anywhere, at scale. Run Azure services in any Kubernetes environment, whether it’s on-premises, multi-cloud, or at the edge.
      • Google Anthos: a modern application management platform that provides a consistent development and operations experience for cloud and on-premises environments.
  44. Kubernetes Open-Standard Technologies (not an exhaustive list)
      • Open Application Model: an open standard for defining cloud-native apps. Implementation: KubeVela, https://kubevela.io/docs/concepts
      • Open Policy Agent: declarative policy as code; enables portability and combination with infrastructure as code. https://www.openpolicyagent.org/docs/latest
      • Service Catalog: provision managed services and make them available within a Kubernetes cluster. https://kubernetes.io/docs/concepts/extend-kubernetes/service-catalog/
  45. Benefits Brought by Data Mesh
      • Data quality
      • Tailored resource and focus allocation
      • Organizational cohesion while allowing flexibility
      • Reduced complexity
      • Democratized value creation
      • Better understanding of value and innovation opportunities
      • More consistent and faster change
  46. Important Focus Areas for Technology Providers
      • Open Standards, Open Protocols, Open-Source Integrations
      • Workload sharing and multi-tenancy
      • No-copy data and compute mobility support
      • Granularity of access control
      • Richness of automation and extensibility capabilities
      • Flexibility and elasticity
      • Provider-agnostic/multi-cloud operations support
      • Variety of limitations (quotas, data volume, resource count, etc.)
  47. Data Mesh will drive better Interoperability, Open Standards, and Data Quality in the Industry
  48. Thank you! Follow lenadroid for more insights
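
The data product principles on slides 19-24 can be made concrete with a small catalog. The following is a minimal Python sketch under stated assumptions, not something from the talk: the `dataproduct://domain/name/vN` address scheme, the `DataProductEntry` and `DataCatalog` names, and the example product are all hypothetical. It illustrates three of the six principles: a product is discoverable (found by tag), addressable (one stable address per version), and self-describing (it carries its own description and schema).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProductEntry:
    """Catalog entry describing one data product version (hypothetical schema)."""
    domain: str
    name: str
    version: int
    description: str          # self-describing: what the product contains
    schema: dict[str, str]    # self-describing: column name -> type
    tags: frozenset[str] = frozenset()

    @property
    def address(self) -> str:
        # Addressable: one stable identifier per product version.
        # The "dataproduct://" scheme is an illustrative assumption.
        return f"dataproduct://{self.domain}/{self.name}/v{self.version}"


class DataCatalog:
    """In-memory stand-in for a shared data catalogue service."""

    def __init__(self) -> None:
        self._entries: dict[str, DataProductEntry] = {}

    def register(self, entry: DataProductEntry) -> str:
        if entry.address in self._entries:
            raise ValueError(f"address already registered: {entry.address}")
        self._entries[entry.address] = entry
        return entry.address

    def discover(self, tag: str) -> list[str]:
        # Discoverable: consumers find products by tag instead of asking the owning team.
        return sorted(addr for addr, e in self._entries.items() if tag in e.tags)

    def resolve(self, address: str) -> DataProductEntry:
        # Raises KeyError for unknown addresses.
        return self._entries[address]


if __name__ == "__main__":
    catalog = DataCatalog()
    catalog.register(DataProductEntry(
        domain="deliveries",
        name="completed-deliveries",  # hypothetical product
        version=1,
        description="One row per completed drone delivery.",
        schema={"delivery_id": "string", "drone_id": "string", "completed_at": "timestamp"},
        tags=frozenset({"delivery", "drone"}),
    ))
    print(catalog.discover("drone"))  # ['dataproduct://deliveries/completed-deliveries/v1']
```

In a real mesh this role would be played by the shared data catalogue listed on slide 30; the dictionary here only stands in for one.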
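
The cheat sheet on slide 25 can double as a review checklist. The sketch below is illustrative and not from the talk: it models a data product's input ports, output ports, readers, and SLOs as plain Python dataclasses and reports which cheat-sheet questions a spec leaves unanswered. All class, field, and product names (`DataProductSpec`, `review`, the drone telemetry product, the owning team) are hypothetical, and the checks cover only a handful of the questions.

```python
from dataclasses import dataclass, field


@dataclass
class InputPort:
    source: str        # external dataset or another data product's address
    data_format: str   # e.g. "json", "parquet"
    update_rate: str   # e.g. "streaming", "hourly"


@dataclass
class OutputPort:
    name: str
    data_format: str
    access_method: str                              # e.g. "sql", "rest", "files"
    readers: set[str] = field(default_factory=set)  # principals allowed to read; "*" = everyone
    sensitive: bool = False                         # holds regulated data such as PII


@dataclass
class DataProductSpec:
    name: str
    owner: str
    inputs: list[InputPort]
    outputs: list[OutputPort]
    description: str = ""
    slos: dict[str, str] = field(default_factory=dict)  # e.g. {"freshness": "< 1h"}


def review(spec: DataProductSpec) -> list[str]:
    """Return the cheat-sheet questions this spec leaves unanswered (a small subset)."""
    issues = []
    if not spec.description:
        issues.append("not self-describing: add a description")
    if not spec.inputs:
        issues.append("no input ports: where is the data coming from?")
    if not spec.outputs:
        issues.append("no output ports: who are the end-users?")
    for port in spec.outputs:
        if not port.readers:
            issues.append(f"output '{port.name}': no readers granted")
        elif port.sensitive and "*" in port.readers:
            issues.append(f"output '{port.name}': sensitive data is readable by everyone")
    if not spec.slos:
        issues.append("no SLAs/SLOs declared")
    return issues


if __name__ == "__main__":
    spec = DataProductSpec(
        name="drone-telemetry-hourly",   # hypothetical product
        owner="fleet-team",              # hypothetical owning team
        inputs=[InputPort("iot/telemetry", "json", "streaming")],
        outputs=[OutputPort("hourly_stats", "parquet", "sql", readers={"*"}, sensitive=True)],
    )
    for issue in review(spec):
        print(issue)
    # not self-describing: add a description
    # output 'hourly_stats': sensitive data is readable by everyone
    # no SLAs/SLOs declared
```

Keeping such a spec in the same package as the data product also answers the cheat-sheet questions about storing permissions and automation logic alongside it.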
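
Slide 39 lists data lineage and data classification among the concerns of data governance systems. One way to see how the two interact is a lineage graph in which classification labels flow downstream, so a dataset derived from PII-labelled data is treated as sensitive too. This is a minimal sketch assuming a conservative propagation rule (labels are never dropped automatically); the class name, dataset names, and the rule itself are illustrative assumptions, not the behavior of any particular governance product.

```python
from collections import deque
from typing import Iterable


class LineageGraph:
    """Datasets, 'derived from' edges, and per-dataset classification labels."""

    def __init__(self) -> None:
        self.upstreams: dict[str, set[str]] = {}  # dataset -> direct upstream datasets
        self.labels: dict[str, set[str]] = {}     # dataset -> its own labels, e.g. {"PII"}

    def add_dataset(self, name: str, labels: Iterable[str] = ()) -> None:
        self.upstreams.setdefault(name, set())
        self.labels.setdefault(name, set()).update(labels)

    def add_edge(self, upstream: str, downstream: str) -> None:
        """Record that `downstream` is derived from `upstream`."""
        self.add_dataset(upstream)
        self.add_dataset(downstream)
        self.upstreams[downstream].add(upstream)

    def all_upstreams(self, name: str) -> set[str]:
        """Every dataset `name` transitively derives from (breadth-first, cycle-safe)."""
        seen: set[str] = set()
        queue = deque(self.upstreams.get(name, ()))
        while queue:
            current = queue.popleft()
            if current not in seen:
                seen.add(current)
                queue.extend(self.upstreams.get(current, ()))
        return seen

    def effective_labels(self, name: str) -> set[str]:
        """Own labels plus every label inherited from upstream datasets."""
        result = set(self.labels.get(name, ()))
        for upstream in self.all_upstreams(name):
            result |= self.labels[upstream]
        return result


if __name__ == "__main__":
    g = LineageGraph()
    g.add_dataset("customers_raw", labels={"PII"})   # hypothetical datasets
    g.add_edge("customers_raw", "orders_enriched")
    g.add_edge("drone_telemetry", "orders_enriched")
    g.add_edge("orders_enriched", "daily_revenue")
    print(sorted(g.all_upstreams("daily_revenue")))  # ['customers_raw', 'drone_telemetry', 'orders_enriched']
    print(g.effective_labels("daily_revenue"))        # {'PII'}
```

A real governance system would typically also allow an explicit, audited declassification step (for example after anonymization) instead of inheriting labels forever.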
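
Slide 40 lists properties expected from open formats: atomic updates, versioning, rollbacks, time travel, and schema evolution. The toy class below illustrates those ideas in memory only. It is an assumption-laden sketch, not how any real table format is implemented: real open table formats record commits in a transaction log or metadata snapshots on storage and coordinate concurrent writers, which this sketch ignores. `VersionedTable` and its methods are hypothetical names.

```python
import copy
from typing import Optional


class VersionedTable:
    """Toy in-memory table that keeps an immutable snapshot for every committed version."""

    def __init__(self, schema: dict[str, str]) -> None:
        # Version 0 is the empty table with its initial schema (column name -> type).
        self._versions = [{"schema": dict(schema), "rows": []}]

    @property
    def version(self) -> int:
        return len(self._versions) - 1

    def read(self, version: Optional[int] = None) -> list[dict]:
        """Read the latest snapshot, or an older one by number (time travel)."""
        v = self.version if version is None else version
        return copy.deepcopy(self._versions[v]["rows"])

    def commit(self, new_rows: list[dict], add_columns: Optional[dict[str, str]] = None) -> int:
        """Append rows atomically: all of them land in one new version, or none do."""
        current = self._versions[-1]
        schema = dict(current["schema"])
        for column, col_type in (add_columns or {}).items():
            if schema.get(column, col_type) != col_type:
                raise ValueError(f"cannot change the type of column {column!r}")
            schema[column] = col_type  # schema evolution: additive changes only
        for row in new_rows:
            unknown = set(row) - set(schema)
            if unknown:
                # Reject the whole batch, so no partial write ever becomes visible.
                raise ValueError(f"columns not in schema: {sorted(unknown)}")
        rows = current["rows"] + copy.deepcopy(new_rows)
        self._versions.append({"schema": schema, "rows": rows})
        return self.version

    def rollback(self, version: int) -> int:
        """Restore an older snapshot by committing a copy of it as a new version."""
        self._versions.append(copy.deepcopy(self._versions[version]))
        return self.version


if __name__ == "__main__":
    table = VersionedTable({"delivery_id": "string"})
    table.commit([{"delivery_id": "d1"}])                          # version 1
    try:
        table.commit([{"delivery_id": "d2"}, {"weight_kg": 1.5}])  # rejected as a whole
    except ValueError as err:
        print("rejected:", err)
    table.commit([{"delivery_id": "d2", "weight_kg": 1.5}],
                 add_columns={"weight_kg": "double"})             # version 2, schema evolves
    print(table.version, len(table.read()), len(table.read(version=1)))  # 2 2 1
    table.rollback(1)                                              # version 3 equals version 1
    print(table.read())  # [{'delivery_id': 'd1'}]
```

Rolling back by appending a new version, rather than deleting history, keeps older versions available for time travel.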
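
The infrastructure-as-code tools on slide 42 broadly share a declarative idea: you describe the desired infrastructure, and the tool computes and applies the difference from what already exists. The sketch below illustrates only that diffing step in plain Python. It does not use the Terraform, Pulumi, or Crossplane APIs, and the resource names and settings are hypothetical.

```python
Resources = dict[str, dict]  # resource name -> settings (hypothetical shape)


def plan(desired: Resources, actual: Resources) -> dict[str, list[str]]:
    """List the changes needed to move `actual` to `desired`."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(n for n in desired.keys() & actual.keys() if desired[n] != actual[n]),
        "delete": sorted(actual.keys() - desired.keys()),
    }


def apply(desired: Resources, actual: Resources) -> Resources:
    """Simulate executing the plan; returns the new state without mutating `actual`."""
    changes = plan(desired, actual)
    result = {name: dict(settings) for name, settings in actual.items()}
    for name in changes["delete"]:
        del result[name]
    for name in changes["create"] + changes["update"]:
        result[name] = dict(desired[name])
    return result


if __name__ == "__main__":
    desired = {"storage/deliveries-raw": {"tier": "hot"}, "stream/drone-telemetry": {"partitions": 8}}
    actual = {"storage/deliveries-raw": {"tier": "cool"}, "stream/legacy-events": {"partitions": 2}}
    print(plan(desired, actual))
    # {'create': ['stream/drone-telemetry'], 'update': ['storage/deliveries-raw'], 'delete': ['stream/legacy-events']}
    print(plan(desired, apply(desired, actual)))
    # {'create': [], 'update': [], 'delete': []}
```

Because the plan after applying is empty, re-running the same definition is safe, which is what lets a self-serve platform hand such definitions to many domain teams.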
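
Slide 44 mentions Open Policy Agent for declarative policy as code, and slide 25 asks which users or data products can read which output ports and whether the permissions policy lives in the same package as the data product. The sketch below imitates that idea in plain Python: access rules are data, and a small function evaluates them with default-deny and deny-overrides-allow semantics. It is not OPA's API or the Rego language; the rule shape, the `dataproduct://` addresses, and the team names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rule:
    """One declarative rule, kept as data next to the data product it governs."""
    effect: str      # "allow" or "deny"
    principal: str   # user, team, or data product asking for access
    action: str      # e.g. "read"
    resource: str    # e.g. an output port address


def _matches(pattern: str, value: str) -> bool:
    # A trailing "*" is a prefix wildcard; a bare "*" therefore matches anything.
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    return pattern == value


def decide(rules: Iterable[Rule], principal: str, action: str, resource: str) -> bool:
    """Default deny; a matching deny rule always overrides a matching allow rule."""
    matched = [
        rule for rule in rules
        if _matches(rule.principal, principal)
        and _matches(rule.action, action)
        and _matches(rule.resource, resource)
    ]
    if any(rule.effect == "deny" for rule in matched):
        return False
    return any(rule.effect == "allow" for rule in matched)


if __name__ == "__main__":
    # Hypothetical policy for the "deliveries" domain.
    policy = [
        Rule("allow", "analytics-team", "read", "dataproduct://deliveries/*"),
        Rule("deny", "*", "*", "dataproduct://deliveries/customer-addresses/*"),
    ]
    print(decide(policy, "analytics-team", "read", "dataproduct://deliveries/completed-deliveries/v1"))   # True
    print(decide(policy, "analytics-team", "read", "dataproduct://deliveries/customer-addresses/v1"))     # False
    print(decide(policy, "analytics-team", "write", "dataproduct://deliveries/completed-deliveries/v1"))  # False
```

Default deny means a new output port stays closed until a rule grants access, which fits the SECURE principle on slides 22-24.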