Presentation on Data Mesh: a paradigm shift toward a modern distributed architecture that treats domains as first-class concerns, views "data as a product," and enables each domain to own its data pipelines.
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you or should you consider data mesh as the approach for your analytics platform? And most important - how can Snowflake help?
Given in Montreal on 14-Dec-2021
Architect’s Open-Source Guide for a Data Mesh Architecture - Databricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Data Lakehouse, Data Mesh, and Data Fabric (r1) - James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Enabling a Data Mesh Architecture with Data Virtualization - Denodo
Watch full webinar here: https://bit.ly/3rwWhyv
The Data Mesh architectural design was first proposed in 2019 by Zhamak Dehghani, principal technology consultant at Thoughtworks, a technology company that is closely associated with the development of distributed agile methodology. A data mesh is a distributed, de-centralized data infrastructure in which multiple autonomous domains manage and expose their own data, called “data products,” to the rest of the organization.
Organizations turn to a data mesh architecture when they experience shortcomings in highly centralized architectures, such as the lack of domain-specific expertise in data teams, the inflexibility of centralized data repositories in meeting the specific needs of different departments within large organizations, and the slowness of centralized data infrastructures in provisioning data and responding to changes.
In this session, Pablo Alvarez, Global Director of Product Management at Denodo, explains how data virtualization is your best bet for implementing an effective data mesh architecture.
You will learn:
- How data mesh architecture not only enables better performance and agility, but also self-service data access
- The requirements for “data products” in the data mesh world, and how data virtualization supports them
- How data virtualization enables domains in a data mesh to be truly autonomous
- Why a data lake is not automatically a data mesh
- How to implement a simple, functional data mesh architecture using data virtualization
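As a rough illustration of the virtualization idea behind the last point, the sketch below (all class and source names are hypothetical, not Denodo APIs) exposes one logical view over two domain sources without copying their data into a central store:

```python
# Toy data virtualization layer: queries are delegated to the domain
# sources at read time; nothing is replicated centrally, so each
# domain stays autonomous over its own storage.
class VirtualView:
    def __init__(self, name):
        self.name = name
        self._sources = []   # callables returning rows from each domain

    def register_source(self, fetch_rows):
        self._sources.append(fetch_rows)

    def query(self, predicate=lambda row: True):
        # Federate: pull from every registered domain, filter centrally.
        for fetch in self._sources:
            for row in fetch():
                if predicate(row):
                    yield row

customers = VirtualView("all_customers")
customers.register_source(lambda: [{"id": 1, "region": "EU"}])  # CRM domain
customers.register_source(lambda: [{"id": 2, "region": "US"}])  # billing domain

eu = list(customers.query(lambda r: r["region"] == "EU"))
print(eu)  # [{'id': 1, 'region': 'EU'}]
```

A real virtualization engine also pushes predicates down to the sources and optimizes across them; this sketch only shows the federation shape.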
Data Architecture, Solution Architecture, Platform Architecture — What’s the ... - DATAVERSITY
A solid data architecture is critical to the success of any data initiative. But what is meant by “data architecture”? Throughout the industry, there are many different “flavors” of data architecture, each with its own unique value and use cases for describing key aspects of the data landscape. Join this webinar to demystify the various architecture styles and understand how they can add value to your organization.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Modernizing to a Cloud Data Architecture - Databricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how elastic compute models’ benefits help one customer scale their analytics and AI workloads and best practices from their experience on a successful migration of their data and workloads to the cloud.
This is Part 3 of the series on Data Mesh, looking at the intersection of microservices architecture concepts, data integration/replication technologies, and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to set up the demo are included here as a PDF for viewers.
Data Mesh is a new socio-technical approach to data architecture, first described by Zhamak Dehghani and popularised through a guest blog post on Martin Fowler's site.
Since then, community interest has grown, due to Data Mesh's ability to explain and address the frustrations that many organisations are experiencing as they try to get value from their data. The 2022 publication of Zhamak's book on Data Mesh further provoked conversation, as have the growing number of experience reports from companies that have put Data Mesh into practice.
So what's all the fuss about?
On one hand, Data Mesh is a new approach in the field of big data. On the other hand, Data Mesh is an application of the lessons we have learned from domain-driven design and microservices to a data context.
In this talk, Chris and Pablo will explain how Data Mesh relates to current thinking in software architecture and the historical development of data architecture philosophies. They will outline what benefits Data Mesh brings, what trade-offs it comes with and when organisations should and should not consider adopting it.
Making Data Timelier and More Reliable with Lakehouse Technology - Matei Zaharia
Enterprise data architectures usually contain many systems—data lakes, message queues, and data warehouses—that data must pass through before it can be analyzed. Each transfer step between systems adds a delay and a potential source of errors. What if we could remove all these steps? In recent years, cloud storage and new open source systems have enabled a radically new architecture: the lakehouse, an ACID transactional layer over cloud storage that can provide streaming, management features, indexing, and high-performance access similar to a data warehouse. Thousands of organizations including the largest Internet companies are now using lakehouses to replace separate data lake, warehouse and streaming systems and deliver high-quality data faster internally. I’ll discuss the key trends and recent advances in this area based on Delta Lake, the most widely used open source lakehouse platform, which was developed at Databricks.
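The core lakehouse mechanism described above, an ACID commit log layered over otherwise dumb storage, can be caricatured in a few lines of stdlib Python. This is a deliberately simplified toy, not how Delta Lake is actually implemented:

```python
import json
import os
import tempfile

# Toy lakehouse transaction log: data files are immutable, and a
# numbered JSON log records which files belong to each table version.
# Readers get a consistent snapshot by replaying committed entries.
class TableLog:
    def __init__(self, root):
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def commit(self, added_files):
        version = len(os.listdir(self.log_dir))
        entry = os.path.join(self.log_dir, f"{version:020d}.json")
        # Write-then-rename makes the commit atomic: readers never
        # observe a half-written log entry.
        tmp = entry + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"add": added_files}, f)
        os.rename(tmp, entry)
        return version

    def snapshot(self):
        files = []
        for name in sorted(os.listdir(self.log_dir)):
            with open(os.path.join(self.log_dir, name)) as f:
                files.extend(json.load(f)["add"])
        return files

root = tempfile.mkdtemp()
log = TableLog(root)
log.commit(["part-000.parquet"])
log.commit(["part-001.parquet"])
print(log.snapshot())  # ['part-000.parquet', 'part-001.parquet']
```

The real Delta Lake protocol adds checkpoints, remove actions, schema metadata, and optimistic concurrency control on top of this append-only log idea.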
This describes a conceptual-model approach to designing an enterprise data fabric: the set of hardware and software infrastructure, tools, and facilities used to implement, administer, manage, and operate data operations across the entire span of the enterprise's data. That span covers all data activities, including acquisition, transformation, storage, distribution, integration, replication, availability, security, protection, disaster recovery, presentation, analytics, preservation, retention, backup, retrieval, archival, recall, deletion, monitoring, and capacity planning, across all data storage platforms, enabling use by applications to meet the data needs of the enterprise.
The conceptual data fabric model represents a rich picture of the enterprise’s data context. It embodies an idealised and target data view.
Designing a data fabric enables the enterprise to respond to and take advantage of key related data trends:
• Internal and External Digital Expectations
• Cloud Offerings and Services
• Data Regulations
• Analytics Capabilities
It enables the IT function to demonstrate positive data leadership. It shows the IT function is able and willing to respond to business data needs. It allows the enterprise to meet data challenges:
• More and more data of many different types
• Increasingly distributed platform landscape
• Compliance and regulation
• Newer data technologies
• Shadow IT where the IT function cannot deliver IT change and new data facilities quickly
It is concerned with the design of an open and flexible data fabric that improves the responsiveness of the IT function and reduces shadow IT.
Intuit's Data Mesh - Data Mesh Learning Community meetup 5.13.2021 - Tristan Baker
Past, present, and future of data mesh at Intuit. This deck describes a vision and strategy for improving data worker productivity through a Data Mesh approach to organizing data and holding data producers accountable. Delivered at the inaugural Data Mesh Learning meetup on 5/13/2021.
Five Things to Consider About Data Mesh and Data Governance - DATAVERSITY
Data mesh was among the most discussed and controversial enterprise data management topics of 2021. One of the reasons people struggle with data mesh concepts is that we still have a lot of open questions that we are not thinking about:
Are you thinking beyond analytics? Are you thinking about all possible stakeholders? Are you thinking about how to be agile? Are you thinking about standardization and policies? Are you thinking about organizational structures and roles?
Join data.world VP of Product Tim Gasper and Principal Scientist Juan Sequeda for an honest, no-bs discussion about data mesh and its role in data governance.
How a Semantic Layer Makes Data Mesh Work at Scale - DATAVERSITY
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn’t address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component to supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
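One way to picture the hub-and-spoke idea from the points above: the hub publishes shared metric definitions once, and each spoke team computes against them instead of re-implementing the logic. The names and metric formulas below are made up for illustration, not taken from any semantic layer product:

```python
# Toy semantic layer: the central "hub" owns the metric definitions;
# domain "spoke" teams apply them to their own data, so every team
# computes "revenue" the same way.
SEMANTIC_LAYER = {
    "revenue": lambda rows: sum(r["amount"] for r in rows if not r["refunded"]),
    "order_count": lambda rows: len(rows),
}

def compute(metric, rows):
    # Spokes call the shared definition rather than writing their own SQL.
    return SEMANTIC_LAYER[metric](rows)

emea_orders = [
    {"amount": 100.0, "refunded": False},
    {"amount": 40.0, "refunded": True},
]
print(compute("revenue", emea_orders))  # 100.0
```

In a real semantic layer the definitions would be declarative (models, dimensions, measures) and compiled to queries, but the governance benefit is the same: one agreed meaning per metric, many consumers.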
A Work of Zhamak Dehghani
Principal consultant
ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next-generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift away from the centralized paradigm of the lake, or its predecessor, the data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first-class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
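The "data as a product" shift named above can be made concrete with a toy sketch: each domain publishes its data behind an explicit contract instead of dumping raw tables into a central lake. The class and field names here are illustrative assumptions, not part of any data mesh specification:

```python
from dataclasses import dataclass

# Toy "data product" contract: a domain team publishes its data with
# an accountable owner, a schema, and an explicit freshness guarantee,
# making the product discoverable and trustworthy by other domains.
@dataclass(frozen=True)
class DataProduct:
    domain: str               # owning domain, e.g. "orders"
    name: str                 # product name within the domain
    owner: str                # accountable team
    schema: dict              # column name -> type: the published contract
    freshness_sla_hours: int  # how stale the data is allowed to get

    @property
    def address(self) -> str:
        # Globally unique, discoverable address for the product.
        return f"{self.domain}.{self.name}"

orders = DataProduct(
    domain="orders",
    name="daily_order_facts",
    owner="orders-analytics-team",
    schema={"order_id": "string", "amount": "decimal", "placed_at": "timestamp"},
    freshness_sla_hours=24,
)
print(orders.address)  # orders.daily_order_facts
```

The point of the contract is organizational, not technical: consumers depend on the published schema and SLA, and the owning domain is accountable for keeping them.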
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Dragan Berić - DataScienceConferenc1
Dragan Berić will take a deep dive into Lakehouse architecture, a game-changing concept bridging the best elements of data lake and data warehouse. The presentation will focus on the Delta Lake format as the foundation of the Lakehouse philosophy, and Databricks as the primary platform for its implementation.
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghani - HostedbyConfluent
Organizations have been chasing the dream of data democratization, unlocking and accessing data at scale to serve their customers and business, for over half a century, since the early days of data warehousing. They have been trying to reach this dream through multiple generations of architectures, such as the data warehouse and the data lake, through a Cambrian explosion of tools and a large amount of investment to build their next data platform. Despite the intention and the investments, the results have been middling.
In this keynote, Zhamak shares her observations on the failure modes of a centralized paradigm of a data lake, and its predecessor data warehouse.
She introduces Data Mesh, a paradigm shift in big data management that draws from modern distributed architecture: considering domains as the first class concern, applying self-sovereignty to distribute the ownership of data, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
This talk introduces the principles underpinning data mesh and Zhamak's recent learnings in creating a path to bring data mesh to life in your organization.
Data Catalog for Better Data Discovery and Governance - Denodo
Watch full webinar here: https://buff.ly/2Vq9FR0
Data catalogs are en vogue, answering critical data governance questions like “Where does all my data reside?” “What other entities are associated with my data?” “What are the definitions of the data fields?” and “Who accesses the data?” Data catalogs maintain the necessary business metadata to answer these questions and many more. But that’s not enough: to be useful, data catalogs need to deliver these answers to business users right within the applications they use.
In this session, you will learn:
*How data catalogs enable enterprise-wide data governance regimes
*What key capability requirements should you expect in data catalogs
*How data virtualization combines dynamic data catalogs with delivery
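The catalog idea in the bullets above boils down to keeping business metadata next to each dataset so governance questions can be answered without opening the data itself. A minimal sketch, with all dataset and field names invented for illustration:

```python
# Toy data catalog: per-dataset business metadata (location, field
# definitions, steward) answers "where is it?" and "what does this
# field mean?" without touching the underlying data.
catalog = {}

def register(name, location, field_docs, steward):
    catalog[name] = {
        "location": location,
        "fields": field_docs,
        "steward": steward,
    }

def describe(name, field):
    # Look up the business definition of one field in one dataset.
    return catalog[name]["fields"][field]

register(
    "sales.orders",
    location="s3://warehouse/sales/orders",
    field_docs={"amount": "Gross order value in EUR, before refunds"},
    steward="sales-data-team",
)
print(describe("sales.orders", "amount"))
```

A production catalog adds lineage, access logs, and search on top, but the core is exactly this mapping from dataset names to curated metadata.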
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes... - Dr. Arif Wider
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - Europe’s biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain-focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture and will outline the ongoing efforts to make creation of data products as simple as applying a template.
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Master the Multi-Clustered Data Warehouse - Snowflake - Matillion
Snowflake is one of the most powerful, efficient data warehouses on the market today—and we joined forces with the Snowflake team to show you how it works!
In this webinar:
- Learn how to optimize Snowflake
- Hear insider tips and tricks on how to improve performance
- Get expert insights from Craig Collier, Technical Architect from Snowflake, and Kalyan Arangam, Solution Architect from Matillion
- Find out how leading brands like Converse, Duo Security, and Pets at Home use Snowflake and Matillion ETL to make data-driven decisions
- Discover how Matillion ETL and Snowflake work together to modernize your data world
- Learn how to utilize the impressive scalability of Snowflake and Matillion
Modernizing to a Cloud Data ArchitectureDatabricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how elastic compute models’ benefits help one customer scale their analytics and AI workloads and best practices from their experience on a successful migration of their data and workloads to the cloud.
this is part 3 of the series on Data Mesh ... looking at the intersection of microservices architecture concepts, data integration / replication technologies and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to setup the demo are included here as a PDF for viewers.
Data Mesh is a new socio-technical approach to data architecture, first described by Zhamak Dehghani and popularised through a guest blog post on Martin Fowler's site.
Since then, community interest has grown, due to Data Mesh's ability to explain and address the frustrations that many organisations are experiencing as they try to get value from their data. The 2022 publication of Zhamak's book on Data Mesh further provoked conversation, as have the growing number of experience reports from companies that have put Data Mesh into practice.
So what's all the fuss about?
On one hand, Data Mesh is a new approach in the field of big data. On the other hand, Data Mesh is application of the lessons we have learned from domain-driven design and microservices to a data context.
In this talk, Chris and Pablo will explain how Data Mesh relates to current thinking in software architecture and the historical development of data architecture philosophies. They will outline what benefits Data Mesh brings, what trade-offs it comes with and when organisations should and should not consider adopting it.
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
Enterprise data architectures usually contain many systems—data lakes, message queues, and data warehouses—that data must pass through before it can be analyzed. Each transfer step between systems adds a delay and a potential source of errors. What if we could remove all these steps? In recent years, cloud storage and new open source systems have enabled a radically new architecture: the lakehouse, an ACID transactional layer over cloud storage that can provide streaming, management features, indexing, and high-performance access similar to a data warehouse. Thousands of organizations including the largest Internet companies are now using lakehouses to replace separate data lake, warehouse and streaming systems and deliver high-quality data faster internally. I’ll discuss the key trends and recent advances in this area based on Delta Lake, the most widely used open source lakehouse platform, which was developed at Databricks.
This describes a conceptual model approach to designing an enterprise data fabric. This is the set of hardware and software infrastructure, tools and facilities to implement, administer, manage and operate data operations across the entire span of the data within the enterprise across all data activities including data acquisition, transformation, storage, distribution, integration, replication, availability, security, protection, disaster recovery, presentation, analytics, preservation, retention, backup, retrieval, archival, recall, deletion, monitoring, capacity planning across all data storage platforms enabling use by applications to meet the data needs of the enterprise.
The conceptual data fabric model represents a rich picture of the enterprise’s data context. It embodies an idealised and target data view.
Designing a data fabric enables the enterprise respond to and take advantage of key related data trends:
• Internal and External Digital Expectations
• Cloud Offerings and Services
• Data Regulations
• Analytics Capabilities
It enables the IT function demonstrate positive data leadership. It shows the IT function is able and willing to respond to business data needs. It allows the enterprise to meet data challenges
• More and more data of many different types
• Increasingly distributed platform landscape
• Compliance and regulation
• Newer data technologies
• Shadow IT where the IT function cannot deliver IT change and new data facilities quickly
It is concerned with the design an open and flexible data fabric that improves the responsiveness of the IT function and reduces shadow IT.
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
Past, present and future of data mesh at Intuit. This deck describes a vision and strategy for improving data worker productivity through a Data Mesh approach to organizing data and holding data producers accountable. Delivered at the inaugural Data Mesh Leaning meetup on 5/13/2021.
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
Data mesh was among the most discussed and controversial enterprise data management topics of 2021. One of the reasons people struggle with data mesh concepts is we still have a lot of open questions that we are not thinking about:
Are you thinking beyond analytics? Are you thinking about all possible stakeholders? Are you thinking about how to be agile? Are you thinking about standardization and policies? Are you thinking about organizational structures and roles?
Join data.world VP of Product Tim Gasper and Principal Scientist Juan Sequeda for an honest, no-bs discussion about data mesh and its role in data governance.
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn’t address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component to supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
A Work of Zhamak Dehghani
Principal consultant
ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
https://fast.wistia.net/embed/iframe/vys2juvzc3?videoFoam
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
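To make "treating data as a product" concrete, one possible sketch (all names and fields here are hypothetical illustrations, not from the article) is a small contract object that a domain team would publish alongside its data:

```python
from dataclasses import dataclass

@dataclass
class DataProduct:
    """A domain-owned data product: discoverable, addressable, and with
    explicit ownership -- the unit of architecture in a data mesh."""
    name: str
    domain: str                    # owning domain team, e.g. "orders"
    owner: str                     # accountable product owner
    address: str                   # where consumers fetch it (URI, table, topic)
    schema: dict                   # field name -> type: the published contract
    freshness_sla_hours: int = 24  # an example quality guarantee

    def describe(self) -> str:
        return f"{self.domain}/{self.name} owned by {self.owner} at {self.address}"

# A domain team publishing one product of its own data:
orders = DataProduct(
    name="daily_orders",
    domain="orders",
    owner="orders-data-team",
    address="s3://lake/orders/daily/",
    schema={"order_id": "string", "amount": "decimal", "placed_at": "timestamp"},
)
print(orders.describe())
```

The point of the sketch is that ownership, address, schema, and quality guarantees travel with the data, rather than living in a central team's head.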
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
Dragan Berić will take a deep dive into Lakehouse architecture, a game-changing concept bridging the best elements of data lake and data warehouse. The presentation will focus on the Delta Lake format as the foundation of the Lakehouse philosophy, and Databricks as the primary platform for its implementation.
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
Organizations have been chasing the dream of data democratization, unlocking and accessing data at scale to serve their customers and business, for over half a century, from the early days of data warehousing. They have been trying to reach this dream through multiple generations of architectures, such as the data warehouse and the data lake, through a Cambrian explosion of tools and a large amount of investment to build their next data platform. Despite the intention and the investments, the results have been middling.
In this keynote, Zhamak shares her observations on the failure modes of a centralized paradigm of a data lake, and its predecessor data warehouse.
She introduces Data Mesh, a paradigm shift in big data management that draws from modern distributed architecture: considering domains as the first class concern, applying self-sovereignty to distribute the ownership of data, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
This talk introduces the principles underpinning data mesh and Zhamak's recent learnings in creating a path to bring data mesh to life in your organization.
Data Catalog for Better Data Discovery and GovernanceDenodo
Watch full webinar here: https://buff.ly/2Vq9FR0
Data catalogs are in vogue, answering critical data governance questions like "Where does all my data reside?" "What other entities are associated with my data?" "What are the definitions of the data fields?" and "Who accesses the data?" Data catalogs maintain the necessary business metadata to answer these questions and many more. But that's not enough. For them to be useful, data catalogs need to deliver these answers to business users right within the applications they use.
In this session, you will learn:
*How data catalogs enable enterprise-wide data governance regimes
*What key capability requirements should you expect in data catalogs
*How data virtualization combines dynamic data catalogs with delivery
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - Europe's biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain-focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture and will outline the ongoing efforts to make creation of data products as simple as applying a template.
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Master the Multi-Clustered Data Warehouse - SnowflakeMatillion
Snowflake is one of the most powerful, efficient data warehouses on the market today—and we joined forces with the Snowflake team to show you how it works!
In this webinar:
- Learn how to optimize Snowflake
- Hear insider tips and tricks on how to improve performance
- Get expert insights from Craig Collier, Technical Architect from Snowflake, and Kalyan Arangam, Solution Architect from Matillion
- Find out how leading brands like Converse, Duo Security, and Pets at Home use Snowflake and Matillion ETL to make data-driven decisions
- Discover how Matillion ETL and Snowflake work together to modernize your data world
- Learn how to utilize the impressive scalability of Snowflake and Matillion
The Shifting Landscape of Data IntegrationDATAVERSITY
Enterprises and organizations from every industry and scale are working to leverage data to achieve their strategic objectives — whether they are to be more profitable, effective, risk-tolerant, prepared, sustainable, and/or adaptable in an ever-changing world. Data has exploded in volume during the last decade as humans and machines alike produce data at an exponential pace. Also, exciting technologies have emerged around that data to improve our abilities and capabilities around what we can do with data.
Behind this data revolution, there are forces at work, causing enterprises to shift the way they leverage data and accelerate the demand for leverageable data. Organizations (and the climates in which they operate) are becoming more and more complex. They are also becoming increasingly digital and, thus, dependent on how data informs, transforms, and automates their operations and decisions. With increased digitization comes an increased need for both scale and agility at scale.
In this session, we have undertaken an ambitious goal of evaluating the current vendor landscape and assessing which platforms have made, or are in the process of making, the leap to this new generation of Data Management and integration capabilities.
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Denodo
Watch full webinar here: https://bit.ly/3dudL6u
It's not if you move to the cloud, but when. Most organisations are well underway with migrating applications and data to the cloud. In fact, most organisations - whether they realise it or not - have a multi-cloud strategy. Single, hybrid, or multi-cloud…the potential benefits are huge - flexibility, agility, cost savings, scaling on-demand, etc. However, the challenges can be just as large and daunting. A poorly managed migration to the cloud can leave users frustrated at their inability to get to the data that they need and IT scrambling to cobble together a solution.
In this session, we will look at the challenges facing data management teams as they migrate to cloud and multi-cloud architectures. We will show how the Denodo Platform can:
- Reduce the risk and minimise the disruption of migrating to the cloud.
- Make it easier and quicker for users to find the data that they need - wherever it is located.
- Provide a uniform security layer that spans hybrid and multi-cloud environments.
Govern and Protect Your End User InformationDenodo
Watch this Fast Data Strategy session with speakers Clinton Cohagan, Chief Enterprise Data Architect, Lawrence Livermore National Lab & Nageswar Cherukupalli, Vice President & Group Manager, Infosys here: https://buff.ly/2k8f8M5
In its recent report “Predictions 2018: A year of reckoning”, Forrester predicts that 80% of firms affected by GDPR will not comply with the regulation by May 2018. Of those noncompliant firms, 50% will intentionally not comply.
Compliance doesn’t have to be this difficult! What if you have an opportunity to facilitate compliance with a mature technology and significant cost reduction? Data virtualization is a mature, cost-effective technology that enables privacy by design to facilitate compliance.
Attend this session to learn:
• How data virtualization provides a compliance foundation with data catalog, auditing, and data security.
• How you can enable single enterprise-wide data access layer with guardrails.
• Why data virtualization is a must-have capability for compliance use cases.
• How Denodo’s customers have facilitated compliance.
ADV Slides: Data Pipelines in the Enterprise and ComparisonDATAVERSITY
Despite the many, varied, and legitimate data platforms that exist today, data seldom lands once in its perfect spot for the long haul of usage. Data is continually on the move in an enterprise into new platforms, new applications, new algorithms, and new users. The need for data integration in the enterprise is at an all-time high.
Solutions that meet these criteria are often called data pipelines. These are designed to be used by business users, in addition to technology specialists, for rapid turnaround and agile needs. The field is often referred to as self-service data integration.
Although the stepwise Extraction-Transformation-Loading (ETL) remains a valid approach to integration, ELT, which uses the power of the database processes for transformation, is usually the preferred approach. The approach can often be schema-less and is frequently supported by the fast Apache Spark back-end engine, or something similar.
In this session, we look at the major data pipeline platforms. Data pipelines are well worth exploring for any enterprise data integration need, especially where your source and target are supported, and transformations are not required in the pipeline.
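The ELT pattern described above, where transformation is pushed into the database engine after loading, can be sketched with a minimal, hypothetical example. SQLite stands in here for a warehouse engine, and the table and column names are illustrative only:

```python
import sqlite3

# Extract + Load: land raw records in the target database untransformed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, country TEXT)")
rows = [(1, 1250, "us"), (2, 990, "de"), (3, 4300, "us")]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

# Transform: the "T" of ELT runs inside the database engine via SQL,
# rather than in an external tool before loading (as classic ETL would).
conn.execute("""
    CREATE TABLE orders AS
    SELECT id,
           amount_cents / 100.0 AS amount_dollars,
           UPPER(country)       AS country
    FROM raw_orders
""")

for country, total in conn.execute(
    "SELECT country, SUM(amount_dollars) FROM orders GROUP BY country ORDER BY country"
):
    print(country, total)
```

In a production pipeline the same shape holds, only the engine changes: the raw layer is loaded as-is, and set-based SQL (often generated by a pipeline tool) produces the curated tables.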
Data Mesh is a decentralized architecture in which the unit of architecture is a domain-driven data set treated as a product. Each product is owned by the domain or team that knows the data most intimately, whether they create it or consume and re-share it, with specific roles carrying the accountability and responsibility to provide that data as a product. Complexity is abstracted away into a self-serve infrastructure layer, so that teams can create these products much more easily.
Data lakes are central repositories that store large volumes of structured, unstructured, and semi-structured data. They are ideal for machine learning use cases and support SQL-based access and programmatic distributed data processing frameworks. Data lakes can store data in the same format as its source systems or transform it before storing it. They support native streaming and are best suited for storing raw data without an intended use case. Data quality and governance practices are crucial to avoid a data swamp. Data lakes enable end-users to leverage insights for improved business performance and enable advanced analytics.
Rapidly Enable Tangible Business Value through Data VirtualizationDenodo
Watch full webinar here: https://bit.ly/3EEU2vK
Uber, the world's largest taxi company, owns no fleet; Airbnb, the largest accommodation provider, owns no real estate. These companies grew fast, globally, and with little investment by building thin layers on top of complex systems of others' goods and services while owning the customer interface. In digital transformation, data minimization is sometimes very useful for delivering business value rapidly without physical data redundancy, especially for seamless data migration from OLTP, OLAP, and legacy platforms, giving quick access to data domains and data products for incremental value until the desired architecture and data estate evolve. To achieve this, data virtualization logically allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at the source or where it is physically located, and can provide a single customer view of the overall data. When implementing a next-generation solution leveraging DV, there are key considerations and caveats that call for a focused long-term strategy, a target-state architecture, and well-chosen use cases.
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Denodo
Watch full webinar here: https://bit.ly/3hgOSwm
Data Lake technologies have been in constant evolution in recent years, with each iteration promising to fix what previous ones failed to accomplish. Several data lake engines are hitting the market with better ingestion, governance, and acceleration capabilities that aim to create the ultimate data repository. But isn't that the promise of a logical architecture with data virtualization too? So, what's the difference between the two technologies? Are they friends or foes? This session will explore the details.
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...Denodo
Watch full webinar here: https://bit.ly/3fBpO2M
Data Fabric has been a hot topic, and Gartner has termed it one of the top strategic technology trends for 2022. Noticeably, many mid-to-large organizations are also starting to adopt this logical data fabric architecture, while others are still curious about how it works.
With a better understanding of data fabric, you will be able to architect a logical data fabric to enable agile data solutions that honor enterprise governance and security, support operations with automated recommendations, and ultimately, reduce the cost of maintaining hybrid environments.
In this on-demand session, you will learn:
- What is a data fabric?
- How is a physical data fabric different from a logical data fabric?
- Which one should you use and when?
- What’s the underlying technology that makes up the data fabric?
- Which companies are successfully using it and for what use case?
- How can I get started and what are the best practices to avoid pitfalls?
Traditionally, data integration has meant compromise. No matter how rapidly data architects and developers could complete a project before its deadline, speed would always come at the expense of quality. On the other hand, if they focused on delivering a quality project, it would generally drag on for months, thus exceeding its deadline. Finally, if the teams concentrated on both quality and rapid delivery, the costs would invariably exceed the budget. Regardless of which path you chose, the end result would be less than desirable. This led some experts to revisit the scope of data integration, which is the focus of this write-up.
Modern Data Management for Federal ModernizationDenodo
Watch full webinar here: https://bit.ly/2QaVfE7
Faster, more agile data management is at the heart of government modernization. However, traditional data delivery systems are limited in their ability to realize a modernized and future-proof data architecture.
This webinar will address how data virtualization can modernize existing systems and enable new data strategies. Join this session to learn how government agencies can use data virtualization to:
- Enable governed, inter-agency data sharing
- Simplify data acquisition, search and tagging
- Streamline data delivery for transition to cloud, data science initiatives, and more
Data and Application Modernization in the Age of the Cloudredmondpulver
Data modernization is key to unlocking the full potential of your IT investments, both on premises and in the cloud. Enterprises and organizations of all sizes rely on their data to power advanced analytics, machine learning, and artificial intelligence.
Yet the path to modernizing legacy data systems for the cloud is full of pitfalls that cost time, money, and resources. These issues include high hardware and staffing costs, difficulty moving data and analytical processes to cloud environments, and inadequate support for real-time use cases. These issues delay delivery timelines and increase costs, impacting the return on investment for new, cutting-edge applications.
Watch this webinar in which James Kobielus, TDWI senior research director for data management, explores how enterprises are modernizing their mainframe data and application infrastructures in the cloud to sustain innovation and drive efficiencies. Kobielus will engage John de Saint Phalle, senior product manager at Precisely, in a discussion that addresses the following key questions:
- When should enterprises consider migrating and replicating all their data assets to modern public clouds vs. retaining some on-premises in hybrid deployments?
- How should enterprises modernize their legacy data and application infrastructures to unlock innovation and value in the age of cloud computing?
- What are the key investments that enterprises should make to modernize their data pipelines to deliver better AI/ML applications in the cloud?
- What is the optimal data engineering workflow for building, testing, and operationalizing high-quality modern AI/ML applications in the cloud?
- What value does real-time replication play in migrating data and applications to modern cloud data architectures?
- What challenges do enterprises face in ensuring and maintaining the integrity, fitness, and quality of the data that they migrate to modern clouds?
- What tools and methodologies should enterprise application developers use to refactor and transform legacy data applications that have migrated to modern clouds?
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I'll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I'll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft's version of the data mesh.
Sharing a presentation highlighting some key aspects to be taken into consideration while harnessing your Digital Transformation projects as a Digital Intelligence enabler for your enterprise
Managing Large Amounts of Data with SalesforceSense Corp
Critical "design skew" problems and solutions - Engaging Big Objects, MuleSoft, Snowflake and Tableau at the right time
Salesforce's ability to handle large workloads and participate in high-consumption, mobile-application-powering technologies continues to evolve. Pub/sub models and the investment in adjacent properties like Snowflake, Kafka, and MuleSoft have broadened the development scope of Salesforce. Solutions now range from internal and in-platform applications to fueling world-scale mobile applications and integrations. Unfortunately, guidance on the extended capabilities is not well understood or documented. Knowing when to move your solution to a higher order is an important architect skill.
In this webinar, Paul McCollum, UXMC and Technical Architect at Sense Corp, will present an overview of data and architecture considerations. You’ll learn to identify reasons and guidelines for updating your solutions to larger-scale, modern reference infrastructures, and when to introduce products like Big Objects, Kafka, MuleSoft, and Snowflake.
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)Denodo
Watch full webinar here: https://bit.ly/3saONRK
COVID-19 has pushed every industry and organization to embrace digital transformation at scale, upending the way many businesses will operate for the foreseeable future. Organizations no longer tolerate monolithic and centralized data architecture; they are embracing flexibility, modularity, and distributed data architecture to help drive innovation and modernize processes.
The pandemic has compelled organizations to accelerate their digital transformation initiatives and look for smarter and more agile ways to manage and leverage their corporate data assets. Data governance has become challenging in the ever-increasing complexity and distributed nature of the data ecosystem. Interoperability, collaboration and trust in data are imperative for a business to succeed. Data needs to be easily accessible and fit for purpose.
In this session, Denodo experts will discuss 5 key trends that are expected to be top of mind for CIOs and CDOs;
- Distributed Data Environments
- Decision Intelligence
- Modern Data Architecture
- Composable Data & Analytics
- Hyper-personalized Experiences
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
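For reference, the Monolithic PageRank baseline that the abstract compares against is the standard power iteration, where every vertex is processed in each iteration. Below is a minimal, illustrative sketch (the toy graph and names are hypothetical, not from the report); note it handles dead ends by spreading their rank uniformly, which is the precondition Levelwise PageRank removes by other means:

```python
def pagerank(graph, damping=0.85, iters=100):
    """Monolithic PageRank by power iteration.
    graph: dict mapping node -> list of outgoing neighbours."""
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    for _ in range(iters):
        # Teleport term: every vertex receives (1 - d) / n each iteration.
        new = {v: (1.0 - damping) / n for v in graph}
        for v, outs in graph.items():
            if outs:
                # Distribute v's rank equally along its out-edges.
                share = damping * ranks[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:
                # Dead end: spread its rank uniformly over all vertices.
                for u in graph:
                    new[u] += damping * ranks[v] / n
        ranks = new
    return ranks

g = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}  # "d" is a dead end
r = pagerank(g)
```

Every vertex is touched in every iteration here; the levelwise variant instead processes strongly connected components one topological level at a time, which is what enables distribution without per-iteration communication.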
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
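The automated data validation idea in point 4 above can be sketched as a rule-driven check run at the source, before data flows downstream. This is a minimal illustration; the column names and rules are hypothetical:

```python
def validate(rows, rules):
    """Run declarative quality rules over records.
    rules: list of (column, check_fn, message).
    Returns a list of (row_index, column, message) violations."""
    violations = []
    for i, row in enumerate(rows):
        for column, check, message in rules:
            if not check(row.get(column)):
                violations.append((i, column, message))
    return violations

# Example rules: required key present, amount numeric and non-negative.
rules = [
    ("customer_id", lambda v: v is not None, "missing customer_id"),
    ("amount", lambda v: isinstance(v, (int, float)) and v >= 0,
     "negative or non-numeric amount"),
]
rows = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": None, "amount": -5},
]
problems = validate(rows, rules)
```

Catching the violations at row index 1 here, at the source, is what prevents the downstream issues the point above describes; production tools apply the same shape at much larger scale.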
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. The Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
3. Trends which transform our data landscapes
• Increase of computing power: a massive increase of computing power, driven by hardware innovation (SSD storage, in-memory, GPU), lets us move the data to the compute.
• Eco-system connectivity: cloud and APIs make it easier to integrate. Software & Platform as a Service (SaaS, PaaS) offerings will push connectivity and API usage even further.
• Explosion of tools: new (open-source) concepts are introduced, such as NoSQL database types, blockchain, new database designs, distributed models (Hadoop), new analytical tooling, etc.
• Exponential growth of (outside) data: external (open data, social) and internal data, structured and unstructured, can all be used for delivering more insight.
• Increased regulatory attention: stronger regulatory requirements, such as GDPR and BCBS 239. Data quality and data lineage become more important.
• The read/write ratio increases: the read/write ratio changes because of intensive data consumption. Data is read much more, with increased real-time consumption and more search.
4. Every application that creates data needs and will have a database
Application A Application B
Consequently, when we have two applications, each application has its own 'database'. When there is interoperability between these two applications, we expect data to be transferred from one application to the other.
Every application that creates data, at least in the context of data management, needs and will have a database. Even stateless applications that create data have "databases"; in those scenarios the database typically sits in RAM or in a temp file.
5. We can’t escape from data integration
Application A Application B
The always-required data transformation lies in the fact that an application's database schema is designed to meet that application's specific requirements. Since requirements differ from application to application, the schemas are expected to differ, and data integration is always required when moving data around.
A crucial aspect of data transfer is that data integration is always right around the corner. Whether you do ETL or ELT, virtual or physical, batch or real-time, there is no escape from the data integration dilemma.
Data integration
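The point above can be made concrete with a minimal sketch: two applications model the same customer differently, so moving a record from one to the other always requires a transformation step. All field names and the country mapping here are hypothetical.

```python
# Two applications, two schemas: Application A stores a full name and a
# free-text country; Application B wants split names and an ISO code.
# Moving data from A to B therefore always needs a transformation.

def transform_a_to_b(record_a: dict) -> dict:
    """Map Application A's customer schema onto Application B's."""
    first, _, last = record_a["full_name"].partition(" ")
    return {
        "firstName": first,
        "lastName": last,
        # B stores country as an ISO code; A stores a free-text label.
        "countryCode": {"Netherlands": "NL", "Germany": "DE"}.get(
            record_a["country"], "XX"
        ),
    }

row = {"full_name": "Ada Lovelace", "country": "Netherlands"}
print(transform_a_to_b(row))
# {'firstName': 'Ada', 'lastName': 'Lovelace', 'countryCode': 'NL'}
```

Whether this mapping runs as ETL, ELT, or behind a virtual view, the transformation itself cannot be skipped.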
6. The roles of data provider and data consumer will frame any architecture
Applications are either data providers or data consumers and, as we will see, sometimes both. These concepts will frame our future architecture.
Data provider
• The providing application is the application where the data is created (data origination) and provided from.
• The data in the application is expected to be known and owned by its owner.
• Must provide a form of backward compatibility to guarantee stable consumption.
• Can be external as well, which requires conformation on data exchange.
Data consumer
• The consuming application is the application where the data is required within a specific context, e.g., for commercial purposes, management decisions, risk, etc.
• Typically has unique and diverse needs.
• A consuming application may be both a data provider and a data consumer.
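The provider's backward-compatibility obligation can be sketched as a simple schema check: a new interface version may only add fields, never drop ones that consumers already rely on. The field names are illustrative, not a prescribed contract format.

```python
# Provider-side backward-compatibility guarantee: additive schema
# changes are safe; removing or renaming a field breaks stable
# consumption by downstream applications.

def is_backward_compatible(old_schema: set, new_schema: set) -> bool:
    """Every field consumers already rely on must still exist."""
    return old_schema.issubset(new_schema)

v1 = {"customer_id", "name", "email"}
v2_ok = v1 | {"loyalty_tier"}        # additive change: fine
v2_bad = {"customer_id", "name"}     # dropped 'email': breaking

print(is_backward_compatible(v1, v2_ok))   # True
print(is_backward_compatible(v1, v2_bad))  # False
```

A real implementation would also compare types and nullability, but the principle is the same: the provider, not the consumer, carries the compatibility burden.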
7. Problem with existing architectures
There is a deep assumption that centralization is the solution to data management. This includes centralizing all data and management activities using one central team, building one data platform, using one ETL framework, using one canonical model, etc.
Centralized architecture: transactional sources (data providers) feed a central engineering team, which serves analytical consumers (data consumers).
• Single team with centralized knowledge and book of work
• Centralized pipelines for all extraction / ingestion activities
• Centralized transformations applied for harmonized data
• Central platform serves as a large integration database: all execution and analysis is done on the same platform
8. Business drivers for moving to data mesh
Many enterprises are saddled with outdated data architectures that do not scale to the needs of large multi-disciplinary organizations:
• Lack of data ownership
• Lack of data quality
• Difficult to see interdependencies
• Model conflicts across business concerns
• Tremendous effort of integration and coordination leads to bypasses
• Siloed teams: business and IT work in silos
• Disconnect between data producers and data consumers
• Central team becomes the bottleneck
• Difficult to apply policy and governance
• Hard to see dependencies (technical debt)
• Small changes become too risky due to unexpected consequences
• Technical ownership, rather than data ownership
9. Paradigm shift towards domain-ownership
The paradigm shift is a new type of eco-system architecture: a shift towards a modern distributed architecture that allows domain-specific data and views "data-as-a-product," enabling each domain to handle its own data pipelines.
Source-oriented domains (data providers delivering data products) feed consumer-specific transformations in consumption-oriented domains (data consumers), supported by governance and a domain-agnostic platform infrastructure.
10. Governance Topologies: Different Approaches
Three approaches sit on a spectrum from centralised (control) to distributed (agility).
Governed Mesh: build out common core services with the flexibility to bolt on domain-specific customisations.
✔ Pros: consistent core processes; enables domain specialisation; encourages self-service; offers flexibility.
❌ Cons: increased management overhead; requires governance and data asset indexing.
Harmonised Mesh: leverage common policies and templates that ensure baseline security and compatibility.
✔ Pros: consistent core design; enables domain specialisation; encourages self-service; offers organisational flexibility.
❌ Cons: increased management overhead; requires strong governance and cataloguing.
Highly Federated Mesh: complete autonomy for groups to implement their own stack in different environments.
✔ Pros: offers flexibility; reduced time to market.
❌ Cons: poor visibility across the platform; incompatible interfaces; capability duplication and increased costs; "Russian doll" data integration; creates technology debt.
11. Governed mesh
• Central indexing: the Core Services Provider pattern enforces domains to always distribute data via a central hub.
• The Core Services Provider better addresses the time-variant and non-volatile concerns of large data consumers, since it can facilitate orchestration for data-dependent domains.
• The Centralized Platform Mesh better enforces data governance standards: you can, for example, block distribution of low-quality data.
• The Centralized Platform Mesh can be complemented with Master Data Management and Data Quality tools.
• Increased governance and overhead; the central team might become the bottleneck.
Example node blueprint: nodes are sub-partitioned by domains (D1, D2, D3), and each node is an instance of the blueprint. A central data and integration hub (data lake) connects domains #1-#6; each node carries data products, data virtualisation, platform integration, data sources, data teams, and common services (i.e. monitoring, key management, config repo).
12. Harmonised Mesh
• The Azure Harmonised Mesh allows multiple groups within an organisation to operate their own analytics platform whilst adhering to common policies and standards.
• The central data hub hosts the data catalogue, mesh-wide audit capabilities, monitoring, and services for automation, data discovery, metadata registration, etc.
• The central data platform group defines blueprints that encompass baseline security, policies, capabilities and standards.
• New nodes are instantiated based on these blueprints, which encompass key capabilities to enable enterprise analytics (i.e., storage, integration components, monitoring, key management, ELT, analytical engines, and automation).
• Node instances can be augmented to serve respective business requirements, i.e., deploying additional domains, or customising domains and data products within the node.
• Nodes are typically split by org-division, business function, or region.
Diagram: a central hub surrounded by nodes; each node contains data products, domains #1-#6, data virtualisation, platform integration, data sources, and data teams.
13. Highly federated
• Highly federated allows complete autonomy for groups to implement their own stack in different environments.
• Allows greater flexibility for special domains, e.g., experiments or fast time to market.
• Allows mixed governance approaches, e.g., small domains typically distribute via central hubs, while larger domains distribute themselves.
• Might create a lot of political infighting over who controls the data and/or where data sovereignty is needed.
• Poor visibility across the platform.
• Incompatible interfaces.
• Capability duplication, increased costs.
• "Russian doll" data integration.
• Creates technology debt.
Diagram: a central hub alongside autonomous groups, each with their own data products, data virtualisation, platform integration, data sources, and data teams.
14. Proposed Architecture: paradigm shift towards distributed data, domain-driven, self-service and data products
Data domains: this is about decomposing the architecture around domains: the origin and knowledge of data. Data domains are boundaries that represent knowledge, behavior, laws and activities around data. They are aligned with application or business capabilities.
Data products: this is about treating data as products: stable, read-optimized and ready for consumption. A data product is data from a domain (data source) to which data transformation has been applied for improved readability.
Data platform: this is about delivering a self-serve data platform that abstracts away the technical complexity. It is centered around automation, self-service onboarding, global interoperability standards, and so on.
Data community: this is about building a culture that conforms to the same set of standards, such as data quality, security, etc. This requires topologies, discoverable metadata repositories, a data marketplace and data democratization capabilities.
15. Example functional domain decomposition of an Airline company
Domains include: online ticket management; discount and loyalty management; offline ticket management; bookings & commissions; delay and resolution management; advertising and marketing; customer management; reservation and planning; recruitment & employee management; aerospace engineer management; personnel management; purser management; groundcrew personnel management; baggage handling and lost items; pilot management; ramp agent passenger service; airplane maintenance; engines and spares management; fuel optimization; flight plan and overview management; flight optimization planning; aviation insurance management; airport and lounges management; labor and logistics management; assets and financing; income and taxes management; cost management; partnership and communication; IT services management; emission and fare trading management; car leasing and pick-up services; regulatory procedures management.
These domains are grouped into four areas: customer services management, staff and personnel management, airflight management, and supporting services management.
16. Example: Collaboration between different domains
Diagram: the customer management, discount and loyalty management, and baggage handling and lost items domains each expose a data product via standard services; data integration connects the data products across domains.
17. There is a long list of data governance-related tasks
The following guidance ensures better data ownership, data usability and data platform usage:
• Define data interoperability standards, such as protocols, file formats and data types.
• Define required metadata: schema, classifications, business terms, attribute relationships, etc.
• Define the data filtering approach: reserved column names, encapsulated metadata, etc.
• Determine the level of granularity of partitioning (domain, application, component, etc.).
• Set up conditions for onboarding new data: data quality criteria, structure of data, external data, etc.
• Define data product guidance (grouping of data, reference data, data types, etc.).
• Define a requirements contract or data sharing repository.
• Define governance roles (data owner, application owner, data steward, data user, platform owner, etc.).
• Establish capabilities for lineage ingestion, define a procedure for lineage delivery, and define a unique hash key for data lineage.
• Define the lineage level of granularity (application, table, column).
• Determine classifications, tags, and scanning rules.
• Define conditions for data consumption (via secure views, a secure layer, ETL, etc.).
• Decide how to organize the data lake (containers, folders, sub-folders, etc.).
• Define data profiling and life-cycle management criteria (move after 7 years, etc.).
• Define enterprise reference data (key identifiers, enrichment process, etc.).
• Define the approach for log and historical data processing: transactional data, master data, reference data.
• Define the process for redeliveries and reconciliation (data versioning).
• Align with Enterprise Architecture on technology choices: what services are allowed by what domains; what services are reserved.
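Several of the items on this list (a requirements contract, required metadata, governance roles, consumption conditions, life-cycle criteria) tend to converge in a data-sharing contract per data product. The sketch below shows one as plain data; every field name and value is a hypothetical example, not a prescribed format.

```python
# A minimal data-sharing contract covering a few governance items:
# interoperability standard, required metadata, a governance role,
# a consumption condition, and a life-cycle criterion.

contract = {
    "data_product": "customer-profiles",
    "owner": "customer-management-domain",         # governance role: data owner
    "file_format": "parquet",                      # interoperability standard
    "schema": {"customer_id": "string", "segment": "string"},
    "classification": "confidential",              # classification/scanning tag
    "consumption": "secure views only",            # condition for consumption
    "retention": "move to archive after 7 years",  # life-cycle criterion
}

def validate(contract: dict) -> bool:
    """Reject contracts missing the minimum governance fields."""
    required = {"data_product", "owner", "file_format", "schema"}
    return required.issubset(contract)

print(validate(contract))  # True
```

Storing such contracts in a shared repository gives the governance team one place to check onboarding conditions before a product is published.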
18. What are Data Domains?
Diagram: marketing, customer services, and order management domains each expose data products (search keywords, promotions, top-selling products, orders, customer profiles) via integration services built on operational systems.
• A domain is simply a collection of people, typically organized around a common business purpose.
• Domains create and serve data products to other domains and end users, independently from other domains.
• They ensure data is accessible, usable, available, and meets the quality criteria defined.
• They evolve data products based on user feedback, and retire data products when they become irrelevant.
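The domain responsibilities above (create, serve, retire data products independently) can be sketched as a tiny object model. Class and product names are hypothetical illustrations, not an API from the deck.

```python
# A domain as "a collection of people organized around a common business
# purpose" that independently publishes and retires data products.

class Domain:
    def __init__(self, name: str):
        self.name = name
        self.products: dict = {}

    def publish(self, product: str, data: list) -> None:
        """Create and serve a data product to other domains/end users."""
        self.products[product] = data

    def retire(self, product: str) -> None:
        """Retire a data product when it becomes irrelevant."""
        self.products.pop(product, None)

marketing = Domain("Marketing")
marketing.publish("search-keywords", ["flights", "baggage"])
print(list(marketing.products))  # ['search-keywords']
```

The key property is that no other domain appears in the code path: publishing and retiring are decisions the owning domain makes on its own.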
19. Recommendation: standardize on 'common driveway patterns'
Diagram: providing domains (e.g., customer management, discount and loyalty management) build data products from their data-producing services using centrally managed data infrastructure capabilities; consuming domains use the same common driveway patterns, via centrally managed capabilities for data consumption, to feed their data-consuming services and downstream consumption.
20. Example: different instances of the same business capability
Diagram: two instances of the customer management capability (customer management #1 and #2) each produce a data product via centrally managed data infrastructure capabilities; within the customer management domain, an aggregated data product combines them into an integrated customer management view for data-consuming services.
21. Example: aggregate creation and sharing newly created data
Diagram: an analytical use case (inner-architecture) within the discount and loyalty management domain integrates several data products into aggregated data; the newly created data is published as a new data product, via centrally managed data infrastructure capabilities, for downstream consumption.
22. A data and integration mesh must provide a set of common design patterns to address complex integration challenges
# | Pattern type | Pattern description | Data distribution | Application integration
1.1 | CQRS | Data products, using batch publishing, are most efficient when dealing with larger quantities of data processing, such as advanced analytics or business intelligence. | X |
2.1 | (RESTful) APIs and callback APIs | APIs that operate within SOA are meant for strongly consistent reads and commands. The communication in this model goes directly between two applications. | X | X
2.2 | Read APIs | APIs that are provided via data products are for reading eventually consistent data, because there is a slight delay between the state of the application and building a data product. | X |
3.1 | Event streaming | Event brokers are most suitable for processing, distributing and routing messages, such as event notifications, change-state detections, and so on. | X | X
3.2 | Message queueing | The mediator topology is more useful when all requests need to go through a central mediator which posts messages to queues. This suits orchestrating a complex series of events in a workflow, or cases where error handling and transactional integrity are more important. | | X
3.3 | Event-carried state transfer | Event brokers are suitable for event-carried state transfer and building up history. This can be useful when an application wants to access larger volumes of another application's data without calling the source, or to comply with more complex security requirements, such as GDPR. | X |
Diagram: a data-providing domain exposes API products via an API gateway for operational commands and strongly consistent reads (2.1); data products with secure consumption via Synapse views, fed by batch publishers and API-based ingestion (1.1, 2.2); and event products via an event broker for eventually consistent reads and event-carried state transfer (3.3), event notifications (3.1), and message queuing (3.2). The patterns span both the application integration architecture and the data architecture.
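Pattern 3.3 (event-carried state transfer) is the least self-explanatory of the table, so here is a minimal sketch: a consumer maintains its own local copy of another domain's data by applying state-change events, instead of calling the source application. The event shapes are hypothetical.

```python
# Event-carried state transfer: the consumer builds and keeps its own
# state from the producer's change events, so it never has to query the
# source application directly.

def apply_event(state: dict, event: dict) -> dict:
    if event["type"] == "CustomerCreated":
        state[event["id"]] = event["data"]
    elif event["type"] == "CustomerUpdated":
        state[event["id"]].update(event["data"])
    elif event["type"] == "CustomerDeleted":
        state.pop(event["id"], None)
    return state

events = [
    {"type": "CustomerCreated", "id": "c1", "data": {"name": "Ada"}},
    {"type": "CustomerUpdated", "id": "c1", "data": {"tier": "gold"}},
]
state: dict = {}
for e in events:
    state = apply_event(state, e)
print(state)  # {'c1': {'name': 'Ada', 'tier': 'gold'}}
```

Because the consumer only ever sees events, the producer can also withhold or redact fields per subscriber, which is why the pattern can help with security requirements such as GDPR.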
23. Data Domain nuances and considerations
Best practices from the field:
• The move towards a domain-oriented structure is a transition. Instead of mapping out everything upfront, you can work out your domain list organically as you onboard new providers and consumers into your architecture.
• Domains should align with the business model, strategies and business processes. The best practice is to use business capabilities as a reference model, and to study common terminology (the ubiquitous language) and overlapping data requirements.
• When choosing application boundaries, be aware that the word "application" means different things to different people. Domain modelling and domain-driven design play a vital role in Enterprise Architecture (EA).
• As a general principle, your domains should never directly talk to systems or applications from other domains. Always use anti-corruption layers: data products, API products or events. A best practice for enforcement is to apply isolation, for example network segregation.
• It is the ubiquitous language, a formalized and agreed representation of the language, that the engineers, the experts and the users share to understand each other. This language is typically stored in a central data catalogue.
• Setting boundaries covers both business granularity and technical granularity:
• Business granularity starts with a top-down decomposition of the business concerns: the analysis of the highest-level functional context, scope (i.e., 'bounded context') and activities. These must be divided into smaller 'areas', use cases and business objectives. This exercise requires good business knowledge and expertise in how to divide business processes, domains, functions, etc. efficiently.
• Technical granularity is performed towards specific goals such as reusability, flexibility (easy adaptation to frequent functional changes), performance, security and scalability. The key point of balance is making the right trade-offs. Business users might use the same data, but if the technical requirements conflict with each other, it might be better to separate concerns. For example, if one business task needs to intensively aggregate data and another needs to quickly select individual records, it can be better to separate these conflicting concerns. The same might apply to flexibility: one business task might require daily changes, while another must remain stable for at least a quarter. Again, you should consider separating the concerns.
24. Best practices for overlapping contexts
Different integration patterns can be used when multiple domain contexts and relationships exist. The examples below show three different domains with overlapping concerns; the diagram variants range from separate data products per domain, via "Domain #1 + shared" and a fully shared domain, to one domain for #1, #2 and #3.
• Partnership: the integration logic is coordinated in an ad hoc manner. All domains cooperate with and regard each other's needs. A big commitment is needed from everybody, because no one can change the shared logic freely.
• Customer-supplier: can be used if one domain is strong and willing to take ownership of the data and the needs of downstream consumers. The drawback of this pattern can be conflicting concerns, forcing downstream teams to negotiate deliverables and schedule priorities.
• Conformist: can be used to conform all domains entirely to all requirements. This pattern can also be a choice when the integration work is extremely complex, when no other parties are allowed to have control, or when vendor packages are used.
• Separate ways: can be used if the associated costs of duplication are preferred over reusability. This pattern is typically chosen when high flexibility and agility are required by the different domains.
25. What are Data Products?
Data products:
• Are data which is made available for broad consumption.
• Are aligned to the domain: business functions and goals.
• Inherit the ubiquitous language.
• Are optimized (transformed) for readability: complex application models are abstracted away.
• Are decoupled from the operational/transactional application.
• Use sub-products, which are logically organized around subject areas.
• Are not conformed to the specific needs of data consumers.
• Are captured directly from the source, not obfuscated via other systems.
• Are semantically consistent across all delivery methods: batch, event-driven and API-based.
• Remain compatible from the moment they are created.
• Adhere to central interoperability standards.
Data product ownership:
• Each data product has a data (product) owner.
• Data owners are responsible for the governance, metadata, quality and transformations.
• Newly created data leads to new data ownership.
• An owner may delegate its responsibilities for sub-products.
Diagram: a data product is owned, contains sub-products, and can set requirements.
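The characteristics above can be collected into a small descriptor object: domain-aligned, owned, versioned for compatibility, and organized into sub-products. The field names and values are illustrative assumptions, not a standard schema.

```python
# A data product descriptor capturing the properties listed above.

from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str
    domain: str                     # aligned to business functions and goals
    owner: str                      # each data product has a data owner
    interface_version: str = "1.0"  # remains compatible once created
    sub_products: list = field(default_factory=list)  # subject areas

customers = DataProduct(
    name="customer-profiles",
    domain="customer-management",
    owner="jane.doe",
    sub_products=["contact-details", "preferences"],
)
print(customers.owner, customers.sub_products)
```

Registering such descriptors in a catalogue is one way to make ownership and sub-product structure discoverable to consuming domains.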
26. Consumer- and provider model facilitated via centralized governance
Metadata: ownership, definitions, technical schemas, interfaces, consumption, security, logging, etc.
Source-aligned domains represent the reality of the business as closely as possible. Consumer-aligned domains hold analytically transformed data, which fits the needs of a specific use case.
Principles for success:
• Data is managed and delivered throughout the domains.
• New data results in new ownership.
• Metadata must be captured to help the organization gain confidence in data.
• Data consumers can also become data providers. If so, they must adhere to the same principles.
• Decouple producers from consumers.
• Optimize for intensive data consumption.
• Decouple when crossing the boundaries.
• Domain boundaries are infrastructure-, network-, and organization-agnostic.
27. Providers may utilize one or multiple data distribution components at the same time. If so, the same principles apply
Principles for success:
• Hide the application's technical details.
• The ubiquitous language is the language for communication.
• Interfaces must have a certain level of maturity and stability.
• Data should be consistent across all patterns.
Additional guidance:
• No raw data! Encapsulate legacy or complex systems. A consuming team might act as a provider by abstracting complexity and guaranteeing interface compatibility.
• External providers: use the conformation pattern or mediation via an additional team.
28. If your chain of data distribution is engineered correctly, you can automatically extract indicators of interface stability, data quality, lineage, schema information, etc.
Diagram: along the chain, each hop captures ownership, context and security classifications; transformation lineage; usage and application statistics; and data quality.
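Two of the indicators named above can be sketched for a single data product delivery: a completeness score (data quality) and a hash key identifying the lineage of the delivered dataset. The hashing scheme and source/transformation names are assumptions for illustration, not a prescribed standard.

```python
# Automatically extracted indicators for one delivery of a data product.

import hashlib

rows = [
    {"customer_id": "c1", "email": "a@example.com"},
    {"customer_id": "c2", "email": None},  # missing value
]

# Data quality indicator: fraction of non-null 'email' values.
completeness = sum(r["email"] is not None for r in rows) / len(rows)

# Lineage indicator: a stable hash over source -> product -> transformation,
# usable as the unique lineage key mentioned in the governance task list.
lineage_key = hashlib.sha256(
    "crm-db|customer-profiles|mask-pii-v2".encode()
).hexdigest()[:12]

print(completeness)      # 0.5
print(len(lineage_key))  # 12
```

Emitting these values on every delivery is what makes quality and lineage observable across the chain without manual reporting.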
29. Team Topologies as a delivery approach for fast flow development
• BLUE: domain-aligned teams (business users and IT engineers), organized per data domain, building data products.
• GREY: enabling teams handling overarching subjects, like data distribution and data science.
• RED: platform team(s) building the self-service platform and generic capabilities.
• GREEN: governance team(s) defining policies and standards.
30. Example reference architecture for governed mesh; small-sized company
Diagram: in a Data Landing Zone, data product teams onboard data via a data-onboarding team into Data Lake Services (data products) on Azure Data Lake Store Gen2, capturing read-optimized domain data. Azure Data Factory performs transformations to read-optimized data products, Azure Event Hubs covers real-time integration with operational systems and data-driven applications, and a Databricks shared service supports the data engineering team. Synapse Analytics (serverless for ad-hoc use and exploration) serves self-service BI, semantic models, and analytical applications. A data governance team operates Azure Purview from the Data Management Landing Zone.
31. Example reference architecture; application integration
Diagram: domain teams publish domain-oriented APIs via Azure API Management (API management team) and distribute data via Azure Event Hubs (data distribution team), enabling both data-driven and application-integration scenarios against Data Management Services. Logic apps are used for aggregation and/or experience APIs, and Azure Kubernetes Services (container team, with a high-performing web application enablement team) hosts external-facing front-end and high-performing web applications; a platform team provides platform enablement, modern application development enablement, and real-time application integration for operational systems, analytical applications, and data-driven applications.
32. Example reference architecture for governed mesh; using landing zones to optimize distribution and consumption of data
Diagram: the same building blocks as the small-company architecture (Azure Event Hubs; Azure Data Lake Gen2 for capturing read-optimized domain data; data product teams and a data-onboarding team; Azure Data Factory for transformations to read-optimized data products; Data Lake Services for data products; a Databricks shared service; Synapse Analytics, serverless for ad-hoc use and exploration; a data engineering team; and a data governance team with Azure Purview in the Data Management Landing Zone), now split across a Data Landing Zone for data distribution enablement and a Data Landing Zone for data consumption enablement, serving real-time applications and operational systems, self-service BI and semantic models, analytical applications, and data-driven applications.
33. Example reference architecture for harmonized mesh; using landing zones for larger domains
Diagram: the Data Management Landing Zone (data governance team, Azure Purview) governs multiple Data Landing Zones for data distribution enablement, one per larger domain. Each landing zone repeats the same blueprint: Azure Event Hubs; Azure Data Lake Gen2 for capturing read-optimized domain data; data product teams and a data-onboarding team; Azure Data Factory for transformations to read-optimized data products; Data Lake Services (data products); a Databricks shared service; Synapse Analytics (serverless for ad-hoc use and exploration); and a data engineering team, serving real-time applications and operational systems, self-service BI and semantic models, analytical applications, and data-driven applications.
34. [Diagram] Example reference architecture for a harmonized mesh, using multiple data management landing zones and many data landing zones for a world-wide distributed organization. Several Data Management Landing Zones (each with a data governance team and Azure Purview) oversee many Data Landing Zones for data distribution enablement. Each Data Landing Zone repeats the same internal pattern: a Databricks shared service, data-driven applications with their data products, Azure Event Hubs and Azure Data Lake Gen2 for capturing read-optimized domain data, data product teams and a data-onboarding team handling data integration, Synapse Analytics (serverless, for ad-hoc queries and exploration), and Azure Data Factory (transformations to read-optimized data products) run by a data engineering team over the Data Lake Services (data products). Consumers include real-time applications and operational systems, self-service BI and semantic models, and analytical applications.
35. Data entity-based approach (guidance only)

A data governance committee, made up of data owners, helps oversee data governance company-wide and acts as an authority. Ultimately, the data governance committee sets the rules and policies for the data governance initiative; it receives and reviews reports on new procedures, policies, and protocols. The operating model is independent of location and line of business, and thus must be aligned with your business domains.

[Diagram] A sponsor (CIO/CDO) chairs the Enterprise Data Governance Committee. For each data entity (e.g., Customer, Orders, Product) there is a data owner supported by domain data steward(s), business domain SMEs, an IT data architect, and a dispute resolution process. Each business domain (e.g., order management, customer services, marketing) runs a Data Governance Working Group of data stewards, backed by a virtual SME community.
36. Control board and working groups (guidance only)

[Diagram] At the executive level, the Enterprise Data Governance Committee brings together the data owners, a DG leader, selected SMEs, and an IT lead (e.g., the lead enterprise architect). Below it sit multiple Data Governance Working Groups, each with domain data steward(s), business domain SMEs, an IT data architect, and a data owner.

Typical Data Governance Committee activities:
• Makes strategic and tactical decisions
• Approves standards for interoperability, metadata, data sharing, contract management, etc.
• Sets global and domain-oriented data services
• Determines domain boundaries
• Approves enterprise classifications
• Approves data product and lineage guidance
• Approves dispute handling and policy documents
• Assigns data owner roles
• Assigns remediation of high-priority issues
37. Data Governance process

One of the most important aspects of data governance is having a well-documented data-onboarding and consumption framework. The underlying process, as illustrated below, is typically outlined in a RACI matrix describing who is responsible, accountable, consulted, and informed for a given enforcement, process, or artifact such as a policy or standard.

Data onboarding process: identify data → register & assign ownership → catalogue, remediate & refine → onboard data products.
Data consumption process: define use case → define data needs → consult data owner(s) → identify data → obtain approval(s) → register contract → register lineage.
Data controlling process: raise data issues → register new data creation → inject usage policies.
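The RACI idea above can be sketched as a small lookup table. This is a minimal illustration only; the step names follow the onboarding process on this slide, but the role names and assignments are invented for the example, not taken from any specific framework.

```python
# Hypothetical RACI matrix for the data-onboarding process:
# R = responsible, A = accountable, C = consulted, I = informed.
RACI = {
    "identify data":               {"data owner": "A", "data steward": "R",
                                    "IT architect": "C", "consumer": "I"},
    "register & assign ownership": {"data owner": "A", "data steward": "R",
                                    "IT architect": "I", "consumer": "I"},
    "onboard data products":       {"data owner": "A", "data product team": "R",
                                    "data steward": "C", "consumer": "I"},
}

def roles_with(step: str, letter: str) -> list:
    """Return all roles holding the given RACI letter for a process step."""
    return sorted(role for role, r in RACI.get(step, {}).items() if r == letter)

# A common RACI rule: every step has exactly one accountable role.
for step in RACI:
    assert len(roles_with(step, "A")) == 1
```

Keeping the matrix as plain data makes it easy to validate rules like "exactly one accountable role per step" automatically.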
38. Why Data Governance is easier on public cloud
• Data locality is easier: everything is metadata-driven (management groups, tagging, labeled resources, policies, etc.)
• Governance enforcement is easier: consistency via policies, hub-spoke deployment models, subscription boundaries, etc.
• No need to maintain copies, thanks to technologies like Azure Synapse and polyglot persistence (virtualized instances, fast queries)
• Wide availability of powerful tools to process data at scale
• Security in a hybrid world can be better enforced via Azure policies, Azure Arc, Azure Monitor, managed identities, audit logging, data retention policies, fine-grained access controls, etc.
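The "everything is metadata-driven" point can be illustrated with a toy policy-as-code check. The resource shapes and required tag names below are invented for the sketch; this is not an Azure API, just the pattern that cloud policy engines apply.

```python
# Hypothetical check: every resource must carry the governance tags
# that metadata-driven policies key off.
REQUIRED_TAGS = {"owner", "domain", "confidentiality"}

def policy_violations(resources):
    """Return the names of resources missing one or more required tags."""
    return [r["name"] for r in resources
            if not REQUIRED_TAGS <= set(r.get("tags", {}))]

resources = [
    {"name": "lake-sales",
     "tags": {"owner": "sales", "domain": "orders", "confidentiality": "internal"}},
    {"name": "lake-legacy", "tags": {"owner": "it"}},  # missing governance tags
]
assert policy_violations(resources) == ["lake-legacy"]
```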
39. Companies typically may go through any or all these stages
• Define first use case(s)
• Deploy first data management
landing zone
• Define first (ingestion) pattern
(e.g., batch parquet)
• Develop first data product
(ingested raw, abstracted to
product)
• Determine 'just-enough'
governance
• Define metadata requirements
(application information,
schema metadata)
• Register first data consumer
(manual process)
• Refine target architecture
• Deploy additional data
management landing zones
• Extend with second, third and
fourth data products
• Realize data product metadata
repository (database or excel)
• Implement first set of controls
(data quality, schema validation)
• Realize consuming pipeline
(taking input as output)
• Establish data ownership
• Implement self-service
registration, metadata ingestion
• Offer additional transformation
patterns (transformation
framework, ETL tools, etc.)
• Enrich controls on provider side
(glossary, lineage, linkage)
• Implement consuming process:
approvals, use case metadata,
deploy secure views by hand
• Establish data governance
control board
• Apply automation: automatic
secure view provisioning
• Deploy strong data governance,
setup dispute body
• Finalize data product guidelines
• Define additional
interoperability standard
• Develop self-service data
consumption process
• Develop data query, self-
service, catalogue, lineage
capabilities, etc.
• Develop additional data
marketplace capabilities.
Stage 1 – First landing
zone
Stage 2 – Additional
data domains
Stage 3 – Improve
consumption readiness
Stage 4 – Critical
governance components
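Stage 1's "define metadata requirements" and Stage 2's "data product metadata repository" can start as simply as one record per product. A minimal sketch; the field names and the manual registration method are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Minimal metadata record for a registered data product (illustrative fields)."""
    name: str
    domain: str
    owner: str
    schema_version: str = "1.0"
    consumers: list = field(default_factory=list)

    def register_consumer(self, consumer: str) -> None:
        # Stage 1: registration is a manual process; later stages automate it.
        if consumer not in self.consumers:
            self.consumers.append(consumer)

orders = DataProduct(name="orders", domain="order-management", owner="orders-team")
orders.register_consumer("marketing-analytics")
assert orders.consumers == ["marketing-analytics"]
```

A repository of such records (in a database, or even Excel at first, as the slide notes) is enough to answer "who owns this product and who consumes it?"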
40. Enterprise Scale Analytics and AI – Solution Narrative & Architecture

Solution Brief
The Enterprise Scale Analytics and AI Framework is intended to provide a robust, distributed, and scalable Azure analytics environment for large enterprise customers. Incorporating the tenets of the Well-Architected Framework, it follows a hub-and-spoke model for centralized governance and controls while allowing logical or functional business units to operate individual landing zones for their analytics workloads. The centralized Data Management Subscription allows for collaboration and information sharing without compromising security. Additionally, the framework provides organizational guidance to align with best practice and maintain security boundaries. Recommendations are outlined across operational teams and end-user personas to ensure that all relevant needs can be met. Throughout the development of this framework, Customer Architecture & Engineering has been working side-by-side with the customer and our Product & Engineering teams.

Architecture
[Diagram] Shown in a basic version and an enterprise-scaled version: a central Data Management Subscription governing one or more Data Landing Zones, each containing data products (each Data Landing Zone is an Azure subscription).

Key Components
Critical Data Management Subscription services – central to each customer environment is the Data Management Subscription, which depends on the following services for facilitating metadata, security, and governance of the entire ecosystem:
• Azure Purview
• Azure Log Analytics
• Azure Key Vault
• Azure Active Directory
• Azure Private Link
• Azure Virtual Network

Core Data Landing Zone services – for each Data Landing Zone deployment, the requested data elements and use cases will most commonly use the following services:
• Azure Synapse Analytics
• Azure Databricks
• Azure Data Factory
• Azure Data Lake Storage

An objective of this framework is to keep all aspects of operations on the Azure platform, including 3rd-party components where necessary.

Scenario
• Enterprise Scale Analytics and AI allows multiple groups within an organisation to operate their own analytics platform whilst adhering to common policies and standards.
• The central Data Management Subscription hosts the data catalogue, mesh-wide audit capabilities, monitoring, and auxiliary services for automation.
• The central data platform group defines policies that encompass baseline security, capabilities, and standards.
• New Data Landing Zones are instantiated from these blueprints, which encompass the key capabilities to enable enterprise analytics (e.g., storage, monitoring, key management, ELT, analytical engines, and automation).
• Data Landing Zones can be augmented to serve the respective business requirements, e.g., deploying additional domains, or customising domains and data products within the Data Landing Zone.
• Data Landing Zones are typically split by org division, function, or region.

Customer Scenario
• Committed to agility and self-service analytics
• Already using ADLS Gen2 or migrating from ADLS Gen1
• Minimum deployment of 1 x Data Management subscription and 1 x Data Landing Zone
• Expandable by adding Data Landing Zones as business needs change or grow

Design Principles
• Data domain and data product democratization
• Microservices-driven design
• Policy-driven governance
• Single control and management plane
• Align with Azure-native design and roadmaps

Source: Customer Architecture and Engineering (CAE)
41. Cloud Adoption Framework for Data Management and Analytics: https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/scenarios/data-management/

Typical next steps – start with a common value proposition:
• Create a common vision for data use aligned with the goals of the organization.
• Allocate data governance representatives throughout the organization. Gather support.
• Stress the importance of data ownership: robust data, stable data consumption, increased customer satisfaction, new business opportunities.
• Stress that higher quality creates greater trust in data.
• Lay out common definitions of "data architecture" and "data governance". Make them relevant for the rest of the organization.
• Identify the most difficult data challenges facing stakeholders and determine how data governance can address them.
• Establish a data governance body and its target operating model.
• Create improved understanding and transparency around processes.
• Identify milestones and metrics for your data governance proposition. These might include:
  • Reduction of time spent finding and collecting data
  • Reduction of time spent resolving data inconsistencies and errors
  • Time saved by streamlining data processes
  • Improvements in data quality
  • Expanded use and new use cases
  • Regulatory compliance, data privacy, and security goals
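Milestone metrics such as "reduction of time finding data" only mean something against a baseline, so a first step is simply recording one. A minimal sketch with invented numbers (hours per month for two of the metrics above):

```python
# Hypothetical governance KPIs: hours spent per month, before vs. now.
baseline = {"find_and_collect_data": 120, "resolve_inconsistencies": 80}
current  = {"find_and_collect_data": 78,  "resolve_inconsistencies": 60}

def reduction_pct(metric: str) -> float:
    """Percentage reduction of a metric versus its baseline."""
    before, after = baseline[metric], current[metric]
    return round(100 * (before - after) / before, 1)

assert reduction_pct("find_and_collect_data") == 35.0
assert reduction_pct("resolve_inconsistencies") == 25.0
```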
42. Progress through the data governance maturity model (Ungoverned → Stage 1 → Stage 2 → Fully governed)

People
• Executive sponsorship: no stakeholder executive sponsor → stakeholder sponsor in place (Stage 1 onward)
• Roles: no roles and responsibilities defined → roles and responsibilities defined (Stage 1 onward)
• Control board: no DG control board → DG control board in place but no ability → DG control board in place with data (Stage 2 onward)
• Working groups: no DG working groups (through Stage 1) → some DG working groups in place → all DG working groups in place
• Data owners: no data owners accountable for data (through Stage 1) → some data owners in place → all data owners in place
• Data stewards: none appointed with responsibility for data quality → some data stewards in place for DQ, but scope too broad (e.g., a whole department) → data stewards in place and assigned to DG working groups for specific data (Stage 2 onward)
• Data privacy: no one accountable (through Stage 1) → CPO accountable for privacy (no tools) → CPO accountable for privacy, with tools
• Access security: no one accountable → IT accountable for access security → IT Security accountable for access security → IT Security accountable for access security and responsible for enforcing privacy
• Trusted data assets: no one to produce them → data publisher identified and accountable for producing trusted data (Stage 1 onward)
• SMEs: none identified for data entities → some SMEs identified but not engaged → SMEs identified and in DG working groups (Stage 2 onward)

Process
• Business vocabulary: no common business vocabulary → common business vocabulary started in a glossary → established → complete
• Data discovery: no way to know where data is located, its quality, or whether it is sensitive → data catalog with automatic data discovery, profiling & sensitive-data detection on some systems → on all structured data → on structured & unstructured data in all systems, with full auto-tagging
• Policy authoring: no process to govern authoring or maintenance of policies and rules → governance of data access security policy authoring & maintenance on some systems → governance of data access security, privacy & retention policy authoring & maintenance (Stage 2 onward)
• Policy enforcement: no way to enforce policies & rules → piecemeal enforcement of data access security policies & rules across systems, with no catalog integration → enforcement of data access security and privacy policies and rules across systems, with catalog integration → enforcement of data access security, privacy & retention policies and rules across all systems
• Monitoring: no processes to monitor data quality, data privacy, or data access security → some ability to monitor data quality and privacy (e.g., queries) → monitoring and stewardship of DQ & data privacy on core systems with DBMS masking → on all systems with dynamic masking
• Trusted data assets: none available → development started on a small set of trusted data assets using data fabric software → several core trusted data assets created using data fabric → continuous delivery of trusted data assets with an enterprise data marketplace
• Violation detection: no way to know whether a policy violation occurred, or process to act if it did → data access security violation detection in some systems → in all systems (Stage 2 onward)
• Vulnerability testing: no process → limited vulnerability testing process → vulnerability testing process on all systems (Stage 2 onward)
• Master data: no common process for master data creation, maintenance & sync → MDM with common master-data CRUD & sync processes for a single entity → for some data entities → for all master data entities
43. Progress through the data governance maturity model (continued)

Policies
• Classification: no data governance classification schemes for confidentiality & retention → scheme for confidentiality → scheme for both confidentiality and retention (Stage 2 onward)
• Data quality: no policies & rules to govern data quality → started in the common vocabulary in the business glossary → defined in the common vocabulary in the catalog business glossary (Stage 2 onward)
• Access security: no policies & rules → some policies & rules created in different technologies → policies & rules for data access security & privacy consolidated in the data catalog using the classification scheme → data access security, privacy and retention consolidated in the data catalog using the classification schemes and enforced everywhere
• Privacy: no policies & rules → some policies & rules → consolidated with access security in the data catalog using the classification scheme → plus retention, enforced everywhere
• Retention: no policies & rules (through Stage 1) → some policies & rules → consolidated with access security and privacy in the data catalog and enforced everywhere
• Master data: no policies & rules to govern master data maintenance → for a single master data entity → for some master data entities → for all master data entities

Technology
• Data catalog: none with auto data discovery, profiling & sensitive-data detection → purchased (Stage 1 onward)
• Data fabric: no data fabric software with multi-cloud, edge and data centre connectivity → purchased, with catalog integration (Stage 1 onward)
• Lineage: no metadata lineage → metadata lineage available in the data catalog for trusted assets developed using the fabric (Stage 1 onward)
• Stewardship: no data stewardship tools → available as part of the data fabric software (Stage 1 onward)
• Access security: no data access security tool → data access security in multiple technologies → enforced in all systems (fully governed)
• Privacy enforcement: no data privacy enforcement software (through Stage 1) → data privacy enforcement in some DBMSs → in all data stores
• MDM: no master data management system → single-entity MDM system → multi-entity MDM system (Stage 2 onward)
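One way to read the model above: an organization's overall maturity is bounded by its weakest dimension. A sketch of that reading (the numeric stage encoding and the "minimum" rule are my own interpretation, not part of the model):

```python
# Stages encoded 0..3: ungoverned, Stage 1, Stage 2, fully governed.
STAGES = ["ungoverned", "stage 1", "stage 2", "fully governed"]

def overall_maturity(dimensions: dict) -> str:
    """Overall maturity is the minimum across People/Process/Policies/Technology."""
    return STAGES[min(dimensions.values())]

scores = {"people": 2, "process": 1, "policies": 2, "technology": 3}
assert overall_maturity(scores) == "stage 1"
```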
44. Cloud Adoption Framework (CAF)

[Diagram] Single landing zone: you're just starting, or prefer to stay in control. One Data Management Landing Zone (Purview, master data management, data quality, Key Vault, policies) governs a single Data Landing Zone holding several data domains, each with its data products, alongside subscriptions for operational applications and other subscriptions.
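The single-landing-zone deployment can be thought of as a blueprint that every new zone must satisfy before it goes live. A sketch: the service names echo these CAF slides, but the blueprint shape and the check itself are invented for illustration.

```python
# Services the CAF slides place in each zone type, as plain data.
BLUEPRINT = {
    "data_management_landing_zone": {"Purview", "Master Data Management",
                                     "Data Quality", "Key Vault", "Policies"},
    "data_landing_zone": {"Data Lake Storage", "Data Factory", "Synapse Analytics"},
}

def missing_services(zone_type: str, deployed: set) -> set:
    """Return which blueprint services are absent from a deployed zone."""
    return BLUEPRINT[zone_type] - deployed

assert missing_services("data_management_landing_zone",
                        {"Purview", "Key Vault", "Policies"}) \
       == {"Master Data Management", "Data Quality"}
```

In practice this role is played by infrastructure-as-code templates and Azure Policy rather than a hand-rolled check.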
45. Cloud Adoption Framework (CAF)

[Diagram] Source system- and consumer-aligned landing zones: the Data Management Landing Zone (Purview, master data management, data quality, Key Vault, policies) governs a consumer-aligned Data Landing Zone with several data domains and their data products, plus a source system-aligned Data Landing Zone with its own data domain and data products, alongside other subscriptions.
46. Cloud Adoption Framework (CAF)

[Diagram] Hub, generic, and special data landing zones: the Data Management Landing Zone (Purview, master data management, data quality, Key Vault, policies) governs a distribution-hub Data Landing Zone, Data Landing Zones for the main subsidiaries (both providing and consuming), and a special Data Landing Zone, each holding data domains with their data products.
47. Cloud Adoption Framework (CAF)

[Diagram] Functionally and regionally aligned data landing zones: the Data Management Landing Zone (Purview, master data management, data quality, Key Vault, policies) governs one Data Landing Zone per functional area (#1, #2, … #n), each with data domains and their data products, alongside other subscriptions.
48. Cloud Adoption Framework (CAF)

[Diagram] Large-scale enterprise requiring different data management zones: two (or more) Data Management Landing Zones, each governing several Data Landing Zones with their data domains and data products, alongside other subscriptions.
49. High-level platform design and governance

[Diagram] Providing domains run data platform instances that ingest raw data (technical, unstructured, various file types) from data sources through an anti-corruption layer, and publish data products that are read-optimized, immutable, and organized around subject areas, keeping latest, historical, archive, and active versions. Data providers can also register newly created data as data products. Consuming domains run their own data platform instances, with pipelines that build aggregates (highly reusable data, latest and historical), data marts, and a DWH (dimensional models) on top of the providers' data products.
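The anti-corruption layer's job is to translate source-system records into the provider's read-optimized, subject-oriented model, so consumers never see source-system quirks. A toy sketch; the source field names and target shape are invented for the example.

```python
def to_data_product(raw: dict) -> dict:
    """Anti-corruption layer sketch: map a source-system record into the
    domain's read-optimized data-product shape."""
    return {
        "subject": "customer",
        "customer_id": raw["CUST_NO"],      # source naming stays out of the product
        "name": raw["NM"].strip().title(),  # normalize sloppy source values
        "is_active": raw["STATUS"] == "A",  # decode source status codes
    }

raw_row = {"CUST_NO": "42", "NM": "  ada lovelace ", "STATUS": "A"}
assert to_data_product(raw_row) == {
    "subject": "customer", "customer_id": "42",
    "name": "Ada Lovelace", "is_active": True,
}
```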
50. Azure Solution Architecture

[Diagram] Data sources land via Azure Event Hubs, Azure Functions, and pipelines into a Raw (L1) zone; Spark jobs, overseen by a platform team and a governance team with Azure Purview, transform the data into Curated (L2) and Combined (L3) zones. Data warehousing and data sharing are served by Synapse serverless and dedicated pools and Azure Data Share; reporting and analytics by Analysis Services, Machine Learning, and Cognitive Services. Real-time and operational applications are served by Azure Event Hubs, Azure Functions, and logic apps for aggregation and/or experience behind external-facing front-end applications, with application teams and data teams providing the data, enabling services, reporting, and new data.
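The Raw (L1) → Curated (L2) → Combined (L3) flow can be sketched as two small transformations. In the real architecture these would be Spark jobs over the lake zones; the in-memory rows and field names here are stand-ins for illustration.

```python
# Toy layered flow over in-memory rows (stand-in for the Spark jobs above).
def curate(raw_rows):
    """L1 -> L2: drop malformed rows and normalize types."""
    return [{"order_id": r["id"], "amount": float(r["amt"])}
            for r in raw_rows if r.get("id") and r.get("amt") is not None]

def combine(curated_rows):
    """L2 -> L3: aggregate into a read-optimized, consumer-facing shape."""
    total = sum(r["amount"] for r in curated_rows)
    return {"order_count": len(curated_rows), "total_amount": total}

l1 = [{"id": "a1", "amt": "10.5"}, {"id": None, "amt": "3"}, {"id": "a2", "amt": "4.5"}]
assert combine(curate(l1)) == {"order_count": 2, "total_amount": 15.0}
```

The point of the layering is that each zone has a contract: L2 rows are always well-typed, so L3 aggregation never has to re-handle source malformations.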