The document provides an overview of data mesh principles and hands-on examples for implementing a data mesh. It discusses key concepts of a data mesh including data ownership by domain, treating data as a product, making data available everywhere through self-service, and federated governance of data wherever it resides. Hands-on examples are provided for creating a data mesh topology with Apache Kafka as the underlying infrastructure, developing data products within domains, and exploring consumption of real-time and historical data from the mesh.
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you or should you consider data mesh as the approach for your analytics platform? And most important - how can Snowflake help?
Given in Montreal on 14-Dec-2021
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - europe’s biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture and will outline the ongoing efforts to make creation of data products as simple as applying a template.
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Enabling a Data Mesh Architecture with Data VirtualizationDenodo
Watch full webinar here: https://bit.ly/3rwWhyv
The Data Mesh architectural design was first proposed in 2019 by Zhamak Dehghani, principal technology consultant at Thoughtworks, a technology company that is closely associated with the development of distributed agile methodology. A data mesh is a distributed, de-centralized data infrastructure in which multiple autonomous domains manage and expose their own data, called “data products,” to the rest of the organization.
Organizations leverage data mesh architecture when they experience shortcomings in highly centralized architectures, such as the lack domain-specific expertise in data teams, the inflexibility of centralized data repositories in meeting the specific needs of different departments within large organizations, and the slow nature of centralized data infrastructures in provisioning data and responding to changes.
In this session, Pablo Alvarez, Global Director of Product Management at Denodo, explains how data virtualization is your best bet for implementing an effective data mesh architecture.
You will learn:
- How data mesh architecture not only enables better performance and agility, but also self-service data access
- The requirements for “data products” in the data mesh world, and how data virtualization supports them
- How data virtualization enables domains in a data mesh to be truly autonomous
- Why a data lake is not automatically a data mesh
- How to implement a simple, functional data mesh architecture using data virtualization
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I’ll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft version of the data mesh.
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
Past, present and future of data mesh at Intuit. This deck describes a vision and strategy for improving data worker productivity through a Data Mesh approach to organizing data and holding data producers accountable. Delivered at the inaugural Data Mesh Leaning meetup on 5/13/2021.
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you or should you consider data mesh as the approach for your analytics platform? And most important - how can Snowflake help?
Given in Montreal on 14-Dec-2021
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - europe’s biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture and will outline the ongoing efforts to make creation of data products as simple as applying a template.
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Enabling a Data Mesh Architecture with Data VirtualizationDenodo
Watch full webinar here: https://bit.ly/3rwWhyv
The Data Mesh architectural design was first proposed in 2019 by Zhamak Dehghani, principal technology consultant at Thoughtworks, a technology company that is closely associated with the development of distributed agile methodology. A data mesh is a distributed, de-centralized data infrastructure in which multiple autonomous domains manage and expose their own data, called “data products,” to the rest of the organization.
Organizations leverage data mesh architecture when they experience shortcomings in highly centralized architectures, such as the lack domain-specific expertise in data teams, the inflexibility of centralized data repositories in meeting the specific needs of different departments within large organizations, and the slow nature of centralized data infrastructures in provisioning data and responding to changes.
In this session, Pablo Alvarez, Global Director of Product Management at Denodo, explains how data virtualization is your best bet for implementing an effective data mesh architecture.
You will learn:
- How data mesh architecture not only enables better performance and agility, but also self-service data access
- The requirements for “data products” in the data mesh world, and how data virtualization supports them
- How data virtualization enables domains in a data mesh to be truly autonomous
- Why a data lake is not automatically a data mesh
- How to implement a simple, functional data mesh architecture using data virtualization
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I’ll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft version of the data mesh.
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
Past, present and future of data mesh at Intuit. This deck describes a vision and strategy for improving data worker productivity through a Data Mesh approach to organizing data and holding data producers accountable. Delivered at the inaugural Data Mesh Leaning meetup on 5/13/2021.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
Organizations have been chasing the dream of data democratization, unlocking and accessing data at scale to serve their customers and business, for over a half a century from early days of data warehousing. They have been trying to reach this dream through multiple generations of architectures, such as data warehouse and data lake, through a cambrian explosion of tools and a large amount of investments to build their next data platform. Despite the intention and the investments the results have been middling.
In this keynote, Zhamak shares her observations on the failure modes of a centralized paradigm of a data lake, and its predecessor data warehouse.
She introduces Data Mesh, a paradigm shift in big data management that draws from modern distributed architecture: considering domains as the first class concern, applying self-sovereignty to distribute the ownership of data, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
This talk introduces the principles underpinning data mesh and Zhamak's recent learnings in creating a path to bring data mesh to life in your organization.
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
this is part 3 of the series on Data Mesh ... looking at the intersection of microservices architecture concepts, data integration / replication technologies and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to setup the demo are included here as a PDF for viewers.
Presentation on Data Mesh: The paradigm shift is a new type of eco-system architecture, which is a shift left towards a modern distributed architecture in which it allows domain-specific data and views “data-as-a-product,” enabling each domain to handle its own data pipelines.
Product-thinking is making a big impact in the data world with the rise of Data Products, Data Product Managers, data mesh, and treating “Data as a Product.” But Honest, No-BS: What is a Data Product? And what key questions should we ask ourselves while developing them? Tim Gasper (VP of Product, data.world), will walk through the Data Product ABCs as a way to make treating data as a product way simpler: Accountability, Boundaries, Contracts and Expectations, Downstream Consumers, and Explicit Knowledge.
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Databricks
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
Data mesh was among the most discussed and controversial enterprise data management topics of 2021. One of the reasons people struggle with data mesh concepts is we still have a lot of open questions that we are not thinking about:
Are you thinking beyond analytics? Are you thinking about all possible stakeholders? Are you thinking about how to be agile? Are you thinking about standardization and policies? Are you thinking about organizational structures and roles?
Join data.world VP of Product Tim Gasper and Principal Scientist Juan Sequeda for an honest, no-bs discussion about data mesh and its role in data governance.
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
MDM, data quality, data architecture, and more. At the same time, combining these foundational data management approaches with other innovative techniques can help drive organizational change as well as technological transformation. This webinar will provide practical steps for creating a data foundation for effective digital transformation.
A Work of Zhamak Dehghani
Principal consultant
ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
https://fast.wistia.net/embed/iframe/vys2juvzc3?videoFoam
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
With the aid of any number of data management and processing tools, data flows through multiple on-prem and cloud storage locations before it’s delivered to business users. As a result, IT teams — including IT Ops, DataOps, and DevOps — are often overwhelmed by the complexity of creating a reliable data pipeline that includes the automation and observability they require.
The answer to this widespread problem is a centralized data pipeline orchestration solution.
Join Stonebranch’s Scott Davis, Global Vice President and Ravi Murugesan, Sr. Solution Engineer to learn how DataOps teams orchestrate their end-to-end data pipelines with a platform approach to managing automation.
Key Learnings:
- Discover how to orchestrate data pipelines across a hybrid IT environment (on-prem and cloud)
- Find out how DataOps teams are empowered with event-based triggers for real-time data flow
- See examples of reports, dashboards, and proactive alerts designed to help you reliably keep data flowing through your business — with the observability you require
- Discover how to replace clunky legacy approaches to streaming data in a multi-cloud environment
- See what’s possible with the Stonebranch Universal Automation Center (UAC)
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...Jochem van Grondelle
Recently the concept of a ‘data mesh’ was introduced by Zhamak Deghani to solve architectural and organizational challenges with getting value from data at scale more logically and efficiently, built around four principles:
* Domain-oriented decentralized data ownership
* Data as a product
* Self-serve data infrastructure as a platform
* Federated computational governance
This presentation will initially deep-dive into the ‘data mesh’ and how it fundamentally differs from the typical data lake architectures used today. Subsequently, it describes OLX Europe’s current data platform state aimed partially towards a more decentralized data architecture, covering its analytical data platform, data infrastructure, data discovery, and data privacy.
Finally, it will see to what extent the main principles around the ‘data mesh’ can be applied to a future vision for our data platform and what advantages and challenges implementing such a vision can bring for OLX and other companies.
For more information on data mesh principles, check out the original article by Zhamak: https://martinfowler.com/articles/data-mesh-principles.html.
Apache Kafka and the Data Mesh | Michael Noll, ConfluentHostedbyConfluent
Data mesh is a relatively recent term that describes a set of principles that good modern data systems uphold. A kind of “microservices” for the data-centric world. While the data mesh is not technology-specific as a pattern, the building of systems that adopt and implement data mesh principles have a relatively long history under different guises.
In this talk, we share our recommendations and picks of what every developer should know about building a streaming data mesh with Kafka. We introduce the four principles of the data mesh: domain-driven decentralization, data as a product, self-service data platform, and federated governance. We then cover topics such as the differences between working with event streams versus centralized approaches and highlight the key characteristics that make streams a great fit for implementing a mesh, such as their ability to capture both real-time and historical data. We’ll examine how to onboard data from existing systems into a mesh, modelling the communication within the mesh, how to deal with changes to your domain’s “public” data, give examples of global standards for governance, and discuss the importance of taking a product-centric view on data sources and the data sets they share.
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentHostedbyConfluent
Data mesh is a relatively recent term that describes a set of principles that good modern data systems uphold. A kind of “microservices” for the data-centric world. While the data mesh is not technology-specific as a pattern, the building of systems that adopt and implement data mesh principles have a relatively long history under different guises.
In this talk, we share our recommendations and picks of what every developer should know about building a streaming data mesh with Kafka. We introduce the four principles of the data mesh: domain-driven decentralization, data as a product, self-service data platform, and federated governance. We then cover topics such as the differences between working with event streams versus centralized approaches and highlight the key characteristics that make streams a great fit for implementing a mesh, such as their ability to capture both real-time and historical data. We’ll examine how to onboard data from existing systems into a mesh, modelling the communication within the mesh, how to deal with changes to your domain’s “public” data, give examples of global standards for governance, and discuss the importance of taking a product-centric view on data sources and the data sets they share.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
Organizations have been chasing the dream of data democratization, unlocking and accessing data at scale to serve their customers and business, for over a half a century from early days of data warehousing. They have been trying to reach this dream through multiple generations of architectures, such as data warehouse and data lake, through a cambrian explosion of tools and a large amount of investments to build their next data platform. Despite the intention and the investments the results have been middling.
In this keynote, Zhamak shares her observations on the failure modes of a centralized paradigm of a data lake, and its predecessor data warehouse.
She introduces Data Mesh, a paradigm shift in big data management that draws from modern distributed architecture: considering domains as the first class concern, applying self-sovereignty to distribute the ownership of data, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
This talk introduces the principles underpinning data mesh and Zhamak's recent learnings in creating a path to bring data mesh to life in your organization.
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
this is part 3 of the series on Data Mesh ... looking at the intersection of microservices architecture concepts, data integration / replication technologies and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to setup the demo are included here as a PDF for viewers.
Presentation on Data Mesh: The paradigm shift is a new type of eco-system architecture, which is a shift left towards a modern distributed architecture in which it allows domain-specific data and views “data-as-a-product,” enabling each domain to handle its own data pipelines.
Product-thinking is making a big impact in the data world with the rise of Data Products, Data Product Managers, data mesh, and treating “Data as a Product.” But Honest, No-BS: What is a Data Product? And what key questions should we ask ourselves while developing them? Tim Gasper (VP of Product, data.world), will walk through the Data Product ABCs as a way to make treating data as a product way simpler: Accountability, Boundaries, Contracts and Expectations, Downstream Consumers, and Explicit Knowledge.
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Databricks
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
Data mesh was among the most discussed and controversial enterprise data management topics of 2021. One of the reasons people struggle with data mesh concepts is we still have a lot of open questions that we are not thinking about:
Are you thinking beyond analytics? Are you thinking about all possible stakeholders? Are you thinking about how to be agile? Are you thinking about standardization and policies? Are you thinking about organizational structures and roles?
Join data.world VP of Product Tim Gasper and Principal Scientist Juan Sequeda for an honest, no-bs discussion about data mesh and its role in data governance.
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
MDM, data quality, data architecture, and more. At the same time, combining these foundational data management approaches with other innovative techniques can help drive organizational change as well as technological transformation. This webinar will provide practical steps for creating a data foundation for effective digital transformation.
A Work of Zhamak Dehghani
Principal consultant
ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
https://fast.wistia.net/embed/iframe/vys2juvzc3?videoFoam
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
With the aid of any number of data management and processing tools, data flows through multiple on-prem and cloud storage locations before it’s delivered to business users. As a result, IT teams — including IT Ops, DataOps, and DevOps — are often overwhelmed by the complexity of creating a reliable data pipeline that includes the automation and observability they require.
The answer to this widespread problem is a centralized data pipeline orchestration solution.
Join Stonebranch’s Scott Davis, Global Vice President and Ravi Murugesan, Sr. Solution Engineer to learn how DataOps teams orchestrate their end-to-end data pipelines with a platform approach to managing automation.
Key Learnings:
- Discover how to orchestrate data pipelines across a hybrid IT environment (on-prem and cloud)
- Find out how DataOps teams are empowered with event-based triggers for real-time data flow
- See examples of reports, dashboards, and proactive alerts designed to help you reliably keep data flowing through your business — with the observability you require
- Discover how to replace clunky legacy approaches to streaming data in a multi-cloud environment
- See what’s possible with the Stonebranch Universal Automation Center (UAC)
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...Jochem van Grondelle
Recently the concept of a ‘data mesh’ was introduced by Zhamak Deghani to solve architectural and organizational challenges with getting value from data at scale more logically and efficiently, built around four principles:
* Domain-oriented decentralized data ownership
* Data as a product
* Self-serve data infrastructure as a platform
* Federated computational governance
This presentation will initially deep-dive into the ‘data mesh’ and how it fundamentally differs from the typical data lake architectures used today. Subsequently, it describes OLX Europe’s current data platform state aimed partially towards a more decentralized data architecture, covering its analytical data platform, data infrastructure, data discovery, and data privacy.
Finally, it will see to what extent the main principles around the ‘data mesh’ can be applied to a future vision for our data platform and what advantages and challenges implementing such a vision can bring for OLX and other companies.
For more information on data mesh principles, check out the original article by Zhamak: https://martinfowler.com/articles/data-mesh-principles.html.
Apache Kafka and the Data Mesh | Michael Noll, ConfluentHostedbyConfluent
Data mesh is a relatively recent term that describes a set of principles that good modern data systems uphold. A kind of “microservices” for the data-centric world. While the data mesh is not technology-specific as a pattern, the building of systems that adopt and implement data mesh principles have a relatively long history under different guises.
In this talk, we share our recommendations and picks of what every developer should know about building a streaming data mesh with Kafka. We introduce the four principles of the data mesh: domain-driven decentralization, data as a product, self-service data platform, and federated governance. We then cover topics such as the differences between working with event streams versus centralized approaches and highlight the key characteristics that make streams a great fit for implementing a mesh, such as their ability to capture both real-time and historical data. We’ll examine how to onboard data from existing systems into a mesh, modelling the communication within the mesh, how to deal with changes to your domain’s “public” data, give examples of global standards for governance, and discuss the importance of taking a product-centric view on data sources and the data sets they share.
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentHostedbyConfluent
Data mesh is a relatively recent term that describes a set of principles that good modern data systems uphold. A kind of “microservices” for the data-centric world. While the data mesh is not technology-specific as a pattern, the building of systems that adopt and implement data mesh principles have a relatively long history under different guises.
In this talk, we share our recommendations and picks of what every developer should know about building a streaming data mesh with Kafka. We introduce the four principles of the data mesh: domain-driven decentralization, data as a product, self-service data platform, and federated governance. We then cover topics such as the differences between working with event streams versus centralized approaches and highlight the key characteristics that make streams a great fit for implementing a mesh, such as their ability to capture both real-time and historical data. We’ll examine how to onboard data from existing systems into a mesh, modelling the communication within the mesh, how to deal with changes to your domain’s “public” data, give examples of global standards for governance, and discuss the importance of taking a product-centric view on data sources and the data sets they share.
Evolution from EDA to Data Mesh: Data in Motionconfluent
Thoughtworks Zhamak Dehghani observations on these traditional approaches’s failure modes, inspired her to develop an alternative big data management architecture that she aptly named the Data Mesh. This represents a paradigm shift that draws from modern distributed architecture and is founded on the principles of domain-driven design, self-serve platform, and product thinking with Data. In the last decade Apache Kafka has established a new category of data management infrastructure for data in motion that has been leveraged in modern distributed data architectures.
Off-Label Data Mesh: A Prescription for Healthier DataHostedbyConfluent
"Data mesh is a relatively recent architectural innovation, espoused as one of the best ways to fix analytic data. We renegotiate aged social conventions by focusing on treating data as a product, with a clearly defined data product owner, akin to that of any other product. In addition, we focus on building out a self-service platform with integrated governance, letting consumers safely access and use the data they need to solve their business problems.
Data mesh is prescribed as a solution for _analytical data_, so that conventionally analytical results (think weekly sales or monthly revenue reports) can be more accurately and predictably computed. But what about non-analytical business operations? Would they not also benefit from data products backed by self-service capabilities and dedicated owners? If you've ever provided a customer with an analytical report that differed from their operational conclusions, then this talk is for you.
Adam discusses the resounding successes he has seen from applying data mesh _off-label_ to both analytical and operational domains. The key? Event streams. Well-defined, incrementally updating data products that can power both real-time and batch-based applications, providing a single source of data for a wide variety of application and analytical use cases. Adam digs into the common areas of success seen across numerous clients and customers and provides you with a set of practical guidelines for implementing your own minimally viable data mesh.
Finally, Adam covers the main social and technical hurdles that you'll encounter as you implement your own data mesh. Learn about important data use cases, data domain modeling techniques, self-service platforms, and building an iteratively successful data mesh."
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaKai Wähner
If there were a buzzword of the hour, it would certainly be "data mesh"! This new architectural paradigm unlocks analytic data at scale and enables rapid access to an ever-growing number of distributed domain datasets for various usage scenarios.
As such, the data mesh addresses the most common weaknesses of the traditional centralized data lake or data platform architecture. And the heart of a data mesh infrastructure must be real-time, decoupled, reliable, and scalable.
This presentation explores how Apache Kafka, as an open and scalable decentralized real-time platform, can be the basis of a data mesh infrastructure and - complemented by many other data platforms like a data warehouse, data lake, and lakehouse - solve real business problems.
There is no silver bullet or single technology/product/cloud service for implementing a data mesh. The key outcome of a data mesh architecture is the ability to build data products; with the right tool for the job.
A good data mesh combines data streaming technology like Apache Kafka or Confluent Cloud with cloud-native data warehouse and data lake architectures from Snowflake, Databricks, Google BigQuery, et al.
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
Tinder’s Quickfire Pipeline powers all things data at Tinder. It was originally built using AWS Kinesis Firehoses and has since been extended to use both Kafka and other event buses. It is the core of Tinder’s data infrastructure. This rich data flow of both client and backend data has been extended to service a variety of needs at Tinder, including Experimentation, ML, CRM, and Observability, allowing backend developers easier access to shared client side data. We perform this using many systems, including Kafka, Spark, Flink, Kubernetes, and Prometheus. Many of Tinder’s systems were natively designed in an RPC first architecture.
Things we’ll discuss decoupling your system at scale via event-driven architectures include:
– Powering ML, backend, observability, and analytical applications at scale, including an end to end walk through of our processes that allow non-programmers to write and deploy event-driven data flows.
– Show end to end the usage of dynamic event processing that creates other stream processes, via a dynamic control plane topology pattern and broadcasted state pattern
– How to manage the unavailability of cached data that would normally come from repeated API calls for data that’s being backfilled into Kafka, all online! (and why this is not necessarily a “good” idea)
– Integrating common OSS frameworks and libraries like Kafka Streams, Flink, Spark and friends to encourage the best design patterns for developers coming from traditional service oriented architectures, including pitfalls and lessons learned along the way.
– Why and how to avoid overloading microservices with excessive RPC calls from event-driven streaming systems
– Best practices in common data flow patterns, such as shared state via RocksDB + Kafka Streams as well as the complementary tools in the Apache Ecosystem.
– The simplicity and power of streaming SQL with microservices
Modern Data Management for Federal ModernizationDenodo
Watch full webinar here: https://bit.ly/2QaVfE7
Faster, more agile data management is at the heart of government modernization. However, Traditional data delivery systems are limited in realizing a modernized and future-proof data architecture.
This webinar will address how data virtualization can modernize existing systems and enable new data strategies. Join this session to learn how government agencies can use data virtualization to:
- Enable governed, inter-agency data sharing
- Simplify data acquisition, search and tagging
- Streamline data delivery for transition to cloud, data science initiatives, and more
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...Denodo
Watch full webinar here: https://bit.ly/3fBpO2M
Data Fabric has been a hot topic in town and Gartner has termed it as one of the top strategic technology trends for 2022. Noticeably, many mid-to-large organizations are also starting to adopt this logical data fabric architecture while others are still curious about how it works.
With a better understanding of data fabric, you will be able to architect a logical data fabric to enable agile data solutions that honor enterprise governance and security, support operations with automated recommendations, and ultimately, reduce the cost of maintaining hybrid environments.
In this on-demand session, you will learn:
- What is a data fabric?
- How is a physical data fabric different from a logical data fabric?
- Which one should you use and when?
- What’s the underlying technology that makes up the data fabric?
- Which companies are successfully using it and for what use case?
- How can I get started and what are the best practices to avoid pitfalls?
Most data visualisation solutions today still work on data sources which are stored persistently in a data store, using the so called “data at rest” paradigms. More and more data sources today provide a constant stream of data, from IoT devices to Social Media streams. These data stream publish with high velocity and messages often have to be processed as quick as possible. For the processing and analytics on the data, so called stream processing solutions are available. But these only provide minimal or no visualisation capabilities. One was is to first persist the data into a data store and then use a traditional data visualisation solution to present the data.
If latency is not an issue, such a solution might be good enough. An other question is which data store solution is necessary to keep up with the high load on write and read. If it is not an RDBMS but an NoSQL database, then not all traditional visualisation tools might already integrate with the specific data store. An other option is to use a Streaming Visualisation solution. They are specially built for streaming data and often do not support batch data. A much better solution would be to have one tool capable of handling both, batch and streaming data. This talk presents different architecture blueprints for integrating data visualisation into a fast data solution and highlights some of the products available to implement these blueprints.
This a talk that I gave at BioIT World West on March 12, 2019. The talk was called: A Gen3 Perspective of Disparate Data:From Pipelines in Data Commons to AI in Data Ecosystems.
Big Data Fabric: A Necessity For Any Successful Big Data InitiativeDenodo
Watch this webinar in full here: https://buff.ly/2IxM8Iy
Watch all webinars from the Denodo Packed Lunch webinar series here: https://buff.ly/2IR3q6w
While big data initiatives have become necessary for any business to generate actionable insights, big data fabric has become a necessity for any successful big data initiative. The best of breed big data fabrics should deliver actionable insights to the business users with minimal effort, provide end-to-end security to the entire enterprise data platform and provide real-time data integration, while delivering self-service data platform to business users.
Attend this session to learn how big data fabric enabled by data virtualization:
• Provides lightning fast self-service data access to business users
• Centralizes data security, governance and data privacy
• Fulfills the promise of data lakes to provide actionable insights
Getting real-time analytics for devices/application/business monitoring from trillions of events and petabytes of data like companies Netflix, Uber, Alibaba, Paypal, Ebay, Metamarkets do.
Microsoft Fabric is the next version of Azure Data Factory, Azure Data Explorer, Azure Synapse Analytics, and Power BI. It brings all of these capabilities together into a single unified analytics platform that goes from the data lake to the business user in a SaaS-like environment. Therefore, the vision of Fabric is to be a one-stop shop for all the analytical needs for every enterprise and one platform for everyone from a citizen developer to a data engineer. Fabric will cover the complete spectrum of services including data movement, data lake, data engineering, data integration and data science, observational analytics, and business intelligence. With Fabric, there is no need to stitch together different services from multiple vendors. Instead, the customer enjoys end-to-end, highly integrated, single offering that is easy to understand, onboard, create and operate.
This is a hugely important new product from Microsoft and I will simplify your understanding of it via a presentation and demo.
Agenda:
What is Microsoft Fabric?
Workspaces and capacities
OneLake
Lakehouse
Data Warehouse
ADF
Power BI / DirectLake
Resources
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo
Watch full webinar here: https://buff.ly/46pRfV7
This Denodo session explores the power of data virtualization, shedding light on its architecture, customer value, and a diverse range of use cases. Attendees will discover how the Denodo Platform enables seamless connectivity to various data sources while effortlessly combining, cleansing, and delivering data through 5 differentiated use cases.
Architecture: Delve into the core architecture of the Denodo Platform and learn how it empowers organizations to create a unified virtual data layer. Understand how data is accessed, integrated, and delivered in a real-time, agile manner.
Value for the Customer: Explore the tangible benefits that Denodo offers to its customers. From cost savings to improved decision-making, discover how the Denodo Platform helps organizations derive maximum value from their data assets.
Five Different Use Cases: Uncover five real-world use cases where Denodo's data virtualization platform has made a significant impact. From data governance to analytics, Denodo proves its versatility across a variety of domains.
- Logical Data Fabric
- Self Service Analytics
- Data Governance
- 360 degree of Entities
- Hybrid/Multi-Cloud Integration
Watch this illuminating session to gain insights into the transformative capabilities of the Denodo Platform.
"Going Offline", one of the hottest mobile app trendsDerek Baron
One of the hottest trends in mobile is "going offline", yet organizations are faced with a tripling of time and cost when adding offline functionality to a business app. According to Forrester Research, the ability to work offline is "the most important and difficult mobile feature...and will be a consideration for nearly every modern application".
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
2. @yourtwitterhandle | developer.confluent.io
What’s Covered
1 What Is Data Mesh?
2 Hands On: Creating Your Own Data Mesh
3 Data Ownership by Domain
4 Data as a Product
5 Data Available Everywhere, Self-Service
6 Data Governed Wherever It Is
7 Implementing a Data Mesh
8 Hands On: Exploring the Data Mesh
9 Hands On: Creating a Data Product
5. @yourtwitterhandle | developer.confluent.io
The Principles of a Data Mesh
Data ownership by
domain
Data as a product Data governed
wherever it is
Data available
everywhere, self
serve
1 2 3 4
13. @yourtwitterhandle | developer.confluent.io
The Principles of a Data Mesh
Domain-driven
Decentralization
Data as a First
Class Product
Federated
Governance
Self-Serve Data
Platform
1 2 3 4
Local Autonomy
(Organizational Concerns)
14. @yourtwitterhandle | developer.confluent.io
Principle 1: Domain-Driven Decentralization
Objective: Ensure data is owned by those that truly understand it
Pattern: Ownership of a data asset given to the
“local” team that is most familiar with it
Centralized
Data Ownership
Decentralized
Data Ownership
Anti-pattern: responsibility for data becomes
the domain of the DWH team
15. @yourtwitterhandle | developer.confluent.io
Practical Example
Shipping Data
Joe
1. Joe in Inventory has a problem with Order
data.
2. Inventory items are going negative,
because of bad Order data.
3. He could fix the data up locally in the
Inventory domain, and get on with his job.
4. Or, better, he contacts Alice in Orders and
get it fixed at the source. This is more
reliable as Joe doesn’t fully understand
the Orders process.
5. Ergo, Alice needs be an responsible &
responsive “Data Product Owner”, so
everyone benefits from the fix to Joe’s
problem.
Orders Domain Shipment Domain
Order Data
Inventory Billing Recommendations
Alice
16. @yourtwitterhandle | developer.confluent.io
Recommendations: Domain-Driven
Decentralization
Learn from DDD:
● Use a standard language and nomenclature for data.
● Business users should understand a data flow diagram.
● The stream of events should create a shared narrative that is business-user
comprehensible.
18. @yourtwitterhandle | developer.confluent.io
The Principles of a Data Mesh
Domain-driven
Decentralization
Data as a First-
Class Product
Federated
Governance
Self-Serve Data
Platform
1 2 3 4
Local Autonomy
(Organizational Concerns)
Product thinking,
“Microservice for Data”
19. @yourtwitterhandle | developer.confluent.io
Principle 2: Data as a First-Class Product
Objective: Make shared data discoverable, addressable, trustworthy, secure, so
other teams can make good use of it.
● Data is treated as a true product, not a by-product.
This product thinking is
important to prevent
data chauvinism.
20. @yourtwitterhandle | developer.confluent.io
Domain
Data Product
Data Product, a “Microservice for the
Data World”
● Data product is a node on the data mesh, situated within a domain.
● Produces—and possibly consumes—high-quality data within the mesh.
● Encapsulates all the elements required for its function, namely data + code + infrastructure.
Infra
Code
Data
Creates, manipulates,
serves, etc. that data
Powers the data (e.g., storage) and the
code (e.g., run, deploy, monitor)
“Items About to Expire”
Data Product
Data and metadata,
including history
23. @yourtwitterhandle | developer.confluent.io
Data
Product
Event Streaming Is the Pub/Sub, Not
Point to Point
stream
(persisted) other streams
write
(publish)
read
(consume)
independently
Data producers are scalably decoupled from consumers
Data
Product
Data
Product
Data
Product
24. @yourtwitterhandle | developer.confluent.io
Why Is Event Streaming a Good Fit for Meshing?
Streams are real-time, low latency ⇒ Propagate data
immediately.
Streams are highly scalable ⇒ Handle today’s
massive data volumes.
Streams are stored, replayable ⇒ Capture real-time &
historical data.
Streams are immutable ⇒ Auditable
source of record.
Streams are addressable, discoverable, … ⇒ Meet key criteria for mesh
Data
Product
Data
Product
1 2 3 4 5 6 7
25. @yourtwitterhandle | developer.confluent.io
How to Get Data Into and Out of a
Data Product
Snapshot via
Nighty ETL
Continuous
Stream
Snapshot via
Req/Res API
Snapshot via
Nighty ETL
Snapshot via
Req/Res API
1
2
3
Data
Product
Output
Data
Ports
Input
Data
Ports
Continuous
Stream
26. @yourtwitterhandle | developer.confluent.io
Onboarding Existing Data
Source
Connectors
Use Kafka connectors to stream data from cloud
services and existing systems into the mesh
Data
Product
Input
Data
Ports
https://www.confluent.io/hub/
27. @yourtwitterhandle | developer.confluent.io
Data Product: What’s Happening Inside
Data on the Inside: HOW the domain team solves specific problems
internally? This doesn’t matter to other domains.
…pick your favorites...
Output
Data
Ports
Input
Data
Ports
28. @yourtwitterhandle | developer.confluent.io
Event Streaming Inside a Data Product
Use ksqlDB, Kafka Streams apps, etc., for processing data in motion
ksqlDB to filter,
process, join,
aggregate, analyze
Stream data from
other DPs or internal
systems into ksqlDB
1 2 Stream data to internal
systems or the outside.
Pull queries can drive a
req/res API.
3
Req/Res API
Pull Queries
Output
Data
Ports
Input
Data
Ports
29. @yourtwitterhandle | developer.confluent.io
Event Streaming Inside a Data Product
Use Kafka connectors and CDC to “streamify” classic databases
DB client apps
work as usual
Stream data from
other Data Products
into your local DB
1 2 Stream data to the
outside with CDC and
e.g. the Outbox
Pattern, ksqlDB, etc.
3
Sink
Connector
Source
Connector
MySQL
Output
Data
Ports
Input
Data
Ports
30. @yourtwitterhandle | developer.confluent.io
Dealing with Data Change: Schemas and
Versioning
Publish evolving streams with back/forward-compatible schemas.
Publish versioned streams for breaking changes.
V1 - user, product, quantity
V2 - userAnonymized, product, quantity
Data
Product
Output
Data
Ports
Also, when needed, data can be fully reprocessed by replaying history.
31. @yourtwitterhandle | developer.confluent.io
Recommendations: Data as a First-Class
Product
1. Data-on-the-Outside is harder to change, but it has more value in a holistic sense.
a. Use schemas as a contract.
b. Handle incompatible schema changes using Dual Schema Upgrade Window pattern.
2. Get data from the source, not from intermediaries. Think: Demeter's law applied to data.
a. Otherwise, proliferation of ‘slightly corrupt’ data within the mesh. “Game of Telephone”.
b. Event Streaming makes it easy to subscribe to data from authoritative sources.
3. Change data at the source, including error fixes. Don’t “fix data up” locally.
4. Some data sources will be difficult to turn into first-class data products. Example: Batch-
based sources that lose event-level data or are not reproducible.
a. Use Event Streaming plus CDC, Outbox Pattern, etc. to integrate these into the mesh.
32. @yourtwitterhandle | developer.confluent.io
Learn More
Find out how to implement a Data Product for data in motion with:
1. Apache Kafka 101 course
2. Kafka Connect 101 course
3. Building Data Pipelines with Apache Kafka & Confluent course
4. ksqlDB 101 course
developer.confluent.io/learn-kafka/
34. @yourtwitterhandle | developer.confluent.io
The Principles of a Data Mesh
Domain-driven
Decentralization
Data as a First
Class Product
Federated
Governance
Self-Serve Data
Platform
1 2 3 4
Local Autonomy
(Organizational Concerns)
Product thinking,
“Microservice for Data”
Infra Tooling,
Across Domains
35. @yourtwitterhandle | developer.confluent.io
Why Self-Service Matters
Trade Surveillance System
● Data from 13 sources
● Some sources publish events
● Needed both historical and real-time data
● Historical data from database extracts arranged with dev team.
● Format of events different to format of extracts
● 9 months of effort to get 13 sources into the new system.
36. @yourtwitterhandle | developer.confluent.io
Principle 3: Self-Serve Data Platform
Central infrastructure that provides real-time and historical data on demand
Objective: Make domains autonomous in their execution through rapid data
provisioning
37. @yourtwitterhandle | developer.confluent.io
Consuming Real-Time & Historical Data from
the Mesh
1) Separate Systems for Real-Time and Historical Data (Lambda Architecture)
● Considerations:
○ Difficulty to correlate real-time with historical “snapshot” data
○ Two systems to manage
○ Unlike event streams, snapshots have less granularity
1) One System for Real-Time and Historical Data (Kappa Architecture)
● Considerations:
○ Operational complexity (addressed in Confluent Cloud)
○ Downsides of immutability of regular streams: e.g. altering or deleting events
○ Storage cost (addressed in Confluent Cloud, in Apache Kafka with KIP-405)
40. @yourtwitterhandle | developer.confluent.io
Data Mesh Is Queryable and Decentralized with
ksqlDB
Destination
Data Port
Query is the interface
to the mesh
Events are the
interface to the mesh
STREAM
PROCESSOR
42. @yourtwitterhandle | developer.confluent.io
Learn More
Find out more about how to use Kafka and event streaming:
1. Apache Kafka 101 course
2. Event Sourcing and Event Storage with Apache Kafka course
Learn how to create materialized views and process event streams:
1. Inside ksqlDB course
developer.confluent.io/learn-kafka/
44. @yourtwitterhandle | developer.confluent.io
The Principles of a Data Mesh
Domain-driven
Decentralization
Data as a First
Class Product
Federated
Governance
Self-Serve Data
Platform
1 2 3 4
Local Autonomy
(Organizational Concerns)
Product thinking,
“Microservice for Data”
Infra Tooling,
Across Domains
Interoperability, Network Effects
(Organizational Concerns)
45. @yourtwitterhandle | developer.confluent.io
Domain Domain Domain
Domain
Principle 4: Federated Governance
Objective: Independent data products can interoperate and create network effects.
● Establish global standards, like governance, that apply to all data products in the mesh.
● Ideally, these global standards and rules are applied automatically by the platform.
Self-Serve Data Platform
What is decided
locally by a domain?
What is globally?
(implemented and
enforced by
platform)
Must balance between Decentralization vs. Centralization. No silver bullet!
46. @yourtwitterhandle | developer.confluent.io
SELECT … FROM orders o
LEFT JOIN shipments s
ON o.customerId = s.customerId
EMIT CHANGES;
Example Standard: Identifying Customers
Globally
● Define how data is represented, so you can join and correlate data across different domains.
● Use data contracts, schemas, registries, etc. to implement and enforce such standards.
● Use Event Streaming to retrofit historical data to new requirements, standards.
Domain
Data Product
customerId=29639
customerId=29639
customerId=29639
customerId=29639
47. @yourtwitterhandle | developer.confluent.io
Example Standard: Detect Errors and Recover
with Stream
● Use strategies like logging, data profiling, data lineage, etc. to detect errors in the mesh.
● Streams are very helpful to detect errors and identify cause-effect relationships.
● Streams let you recover and fix errors: e.g., replay & reprocess historical data.
My App
Bug? Error? Rewind to start of
stream, then reprocess.
Event Streams
give you a powerful
Time Machine.
0 1 2 3 4 5 6 7 8 9
If needed, tell the
origin data product to
fix problematic data at
the source.
Data
Product
Output
Data
Ports
48. @yourtwitterhandle | developer.confluent.io
Example Standard: Tracking Data Lineage
with Streams
● Lineage must work across domains and data products—and systems, clouds, data centers.
● Event streaming is a foundational technology for this.
Domain
Data Product
49. @yourtwitterhandle | developer.confluent.io
Recommendations: Federated Governance
1. Be pragmatic: Don’t expect governance systems to be perfect.
a. They are a map that helps you navigate the data-landscape of your company.
b. But there will always be roads that have changed or have not been mapped.
2. Governance is more a process—i.e., an organizational concern—than a technology.
3. Beware of centralized data models, which can become slow to change. Where they
must exist, use processes & tooling like GitHub to collaborate and change quickly.
Good luck! 🙂
51. @yourtwitterhandle | developer.confluent.io
Data Mesh Journey
Principle 1
Data should have one owner:
the team that creates it.
Principle 2
Data is your product:
All exposed data should be
good data.
Principle 3
Get access to any data
immediately and painlessly, be
it historical or real-time.
Principle 4: Governance, with standards, security, lineage, etc.
(cross-cutting concerns)
Difficulty
to execute
Start Here
1
2
3
52. @yourtwitterhandle | developer.confluent.io
Implement a Data Mesh
● Centralize data in motion
Introduce a central event streaming
platform.
● Nominate data owners
Firm owners for all key datasets in the
organization. Make ownership information
broadly accessible.
● Data on demand.
Events are either stored in Kafka indefinitely
or can be republished by data products on
demand.
● Handle Schema Change
Owners publish schema information to the
mesh. Process introduced for schema
change approval.
● Secure event streams
Access to event streams is permissioned by
a central body.
● Connect from any database
Sink Connectors are made available for all
supported database types to ease the
provisioning of new output data ports in the
mesh.
● Central user interface
Discovery and registration of event streams
Searching schemas for data of interest
Previewing event streams
Requesting to access event streams.
Data lineage views