Designing and implementing
Data Mesh at your company
In partnership with:
Participating meetups in
Boston
NYC
Chicago
Toronto
Montreal
Who we are
/in/royhasson/
/in/jasonfhall/
Roy Hasson - Head of product @ Upsolver
Jason Hall - Sr. Solutions Architect @ Upsolver
Ex-AWS
- Product for Amazon Athena, AWS Glue and AWS Lake Formation
- Founding member of AWS Data Lake and Data Mesh initiatives
- Guiding and supporting Data Mesh implementations with customers
- Works with customers to plan and implement data pipeline strategies
- Helps to ensure successful data projects from inception to production
The challenge: make a big impact, faster
Business users are saying:
It takes too long to onboard new data
Central IT/data teams are a bottleneck
Can’t find, understand and access data
Takes too long to make small tweaks
Engineering users are saying:
We don’t understand business needs
Too many requests and tweaks
Integrations are complex and fragile
Difficult to hire good data engineers
Trying to solve the challenge with existing patterns
Data Lake (build to suit) - https://aws.amazon.com/big-data/what-is-a-data-lake/
Lakehouse (decoupled) - https://databricks.com/product/data-lakehouse
Data Warehouse (hybrid) - https://www.snowflake.com/blog/data-cloud-hybrid-data-warehouse-data-lake/
These solutions do not work on their own
Data lake
- Too low level, integrations are manual and complex
- Encourages inconsistent implementations, difficult to secure
- Open and vibrant community
Lakehouse
- Fewer tool options, simpler to implement, manual integrations
- Encourages centralization and lock-in
- Vibrant community in parts of the stack (storage and core engine)
Hybrid DWH
- 3-4 primary vendors to choose from, vertically integrated
- Encourages centralization and lock-in
- Limited by the vendor’s roadmap
This is not what we’re talking about
https://future.a16z.com/emerging-architectures-modern-data-infrastructure/
…this - Introducing Data Mesh
https://martinfowler.com/articles/data-monolith-to-mesh.html
Flexible organization design aligned to business needs
Flexible organization design and self-service tooling
Data domains - Autonomous units with ownership and accountability. A domain can produce data for other domains, consume data from them, or both.
Data infrastructure as a platform - Build once, use everywhere. Enables consistent tooling, engineering and security best practices, and ease of integration.
Data as a product - Data assets are treated like products: delivered in a reliable, consistent and secure manner, and easily discoverable and accessible across the org.
Overarching governance - Procedures and guidelines to secure, audit and control the quality of data in the organization.
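One way to make the domain and data-product ideas concrete is a small descriptor model. This is a minimal sketch, not something from the talk: the class names `DataProduct` and `Domain`, their fields, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for one data product published by a domain."""
    name: str
    domain: str            # owning domain, e.g. "payments"
    owner: str             # accountable person or team
    description: str
    schema: dict           # column name -> type name
    consumers: tuple = ()  # domains that have been granted access


@dataclass
class Domain:
    """A domain can produce data products, consume others' products, or both."""
    name: str
    produces: list = field(default_factory=list)
    consumes: list = field(default_factory=list)

    def publish(self, product: DataProduct) -> None:
        # Ownership rule (an assumption): a domain only publishes what it owns.
        if product.domain != self.name:
            raise ValueError("a domain may only publish products it owns")
        self.produces.append(product)

    def subscribe(self, product: DataProduct) -> None:
        # Access rule (an assumption): consumers must be listed on the product.
        if self.name not in product.consumers:
            raise PermissionError(f"{self.name} is not entitled to {product.name}")
        self.consumes.append(product)


payments = Domain("payments")
marketing = Domain("marketing")
settled = DataProduct(
    name="settled_transactions",
    domain="payments",
    owner="payments-data-team",
    description="Settled card transactions, one row per transaction",
    schema={"txn_id": "string", "amount": "double"},
    consumers=("marketing",),
)
payments.publish(settled)
marketing.subscribe(settled)
```

Keeping the owner and the consumer list on the product itself is a design choice made here for brevity; a real platform would more likely hold entitlements in the catalog or governance layer.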
Why Data Mesh at JPMC
Source JPMC July 2021 @ Data Mesh Learning Meetup - https://youtu.be/7iazNKG8XQo
High level Data Mesh design @ JPMC
Source AWS @ https://aws.amazon.com/blogs/big-data/how-jpmorgan-chase-built-a-data-mesh-architecture-to-drive-significant-value-to-enhance-their-enterprise-data-platform/
A single data domain built on an open data lake architecture
Source JPMC July 2021 @ Data Mesh Learning Meetup - https://youtu.be/7iazNKG8XQo
Creating a mesh with multiple data domains
Source JPMC July 2021 @ Data Mesh Learning Meetup - https://youtu.be/7iazNKG8XQo
Why Data Mesh at Intuit
Source Intuit July 2021 @ Data Mesh Learning Meetup - https://youtu.be/tNcxoASumB8
Intuit Data Mesh data products
Intuit data mesh strategy @ https://medium.com/intuit-engineering/intuits-data-mesh-strategy-778e3edaa017
Why Data Mesh at Zalando
Source Zalando @ Spark + AI Summit 2020 - https://youtu.be/eiUhV56uVUc
Moving to a Data Mesh at Zalando
Source Zalando @ Spark + AI Summit 2020 - https://youtu.be/eiUhV56uVUc
What can we learn from JPMC, Intuit and Zalando
1. Primary drivers - Autonomy, ownership and data-as-a-product
2. Sharing - producer/consumer model
3. Common data infrastructure - improve cost, scale and management overhead
a. JPMC opted to build its own data lake
b. Zalando used Databricks Lakehouse as a base for their platform
c. Intuit created an open platform letting data domains choose
4. Central catalog - unified data asset discoverability, collaboration and entitlements
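The central catalog in point 4 combines discoverability with entitlements. The following is a rough in-memory sketch of that combination, assuming a hypothetical `Catalog` class; it does not represent the catalog used by JPMC, Intuit or Zalando.

```python
class Catalog:
    """Hypothetical central catalog: discovery plus entitlements in one place."""

    def __init__(self):
        self._entries = {}       # product name -> metadata
        self._entitlements = {}  # product name -> set of domains allowed to read

    def register(self, name, owner_domain, description, tags=()):
        self._entries[name] = {
            "owner_domain": owner_domain,
            "description": description,
            "tags": {t.lower() for t in tags},
        }
        # The owning domain can always read its own product.
        self._entitlements[name] = {owner_domain}

    def grant(self, name, consumer_domain):
        self._entitlements[name].add(consumer_domain)

    def search(self, keyword):
        """Return product names whose name, description or tags match."""
        kw = keyword.lower()
        return sorted(
            name
            for name, entry in self._entries.items()
            if kw in name.lower()
            or kw in entry["description"].lower()
            or kw in entry["tags"]
        )

    def can_read(self, name, domain):
        return domain in self._entitlements.get(name, set())


catalog = Catalog()
catalog.register("orders_daily", "sales", "Daily order totals", tags=["orders"])
catalog.register("web_sessions", "marketing", "Sessionized clickstream")
catalog.grant("orders_daily", "finance")
```

Anyone can search the catalog, but reads are checked per domain, which mirrors the split between discoverability and entitlements in the list above.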
What to consider when getting started
1. What are the primary outcomes when implementing Data Mesh?
a. Autonomy - eliminating bottlenecks
b. Ownership and accountability - single owner, governance, quality and hygiene of data
c. Sharing - share and collaborate with teams to do more with data
d. Data products and data as code
2. Data infra - build vs. buy
a. Is owning the infra business critical?
b. Do you have the resources? How long will it take to build? How invested will you be two years from now?
c. Can you build some and buy some?
3. What are the most important outputs you need to deliver?
a. Ownership and discoverability = unified catalog
b. Autonomy = producer/consumer, data contracts
c. Data as code = GitOps + dbt/python + data contracts
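A data contract can start as a small, version-controlled definition that producers check records against before publishing. This is a minimal sketch under assumptions: the `CONTRACT` layout, the product name and the `validate_record` helper are hypothetical and do not follow any specific contract standard or tool.

```python
# Hypothetical contract, kept in Git next to the pipeline code.
CONTRACT = {
    "product": "orders_daily",
    "owner": "sales-domain",
    "columns": {"order_id": str, "amount": float, "created_at": str},
    "required": ["order_id", "amount"],
}


def validate_record(record, contract):
    """Return a list of contract violations; an empty list means the record conforms."""
    errors = []
    # Required columns must be present and non-null.
    for col in contract["required"]:
        if record.get(col) is None:
            errors.append(f"missing required column: {col}")
    # Every column must be declared in the contract and have the declared type.
    for col, value in record.items():
        expected = contract["columns"].get(col)
        if expected is None:
            errors.append(f"unexpected column: {col}")
        elif value is not None and not isinstance(value, expected):
            errors.append(
                f"{col}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


good = validate_record({"order_id": "A1", "amount": 9.5}, CONTRACT)
bad = validate_record({"order_id": "A1", "amount": "9.5", "extra": 1}, CONTRACT)
```

Running a check like this in CI for the producing pipeline is one way to tie the contract to a GitOps workflow: a schema change that breaks consumers fails the build instead of failing downstream.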
What to avoid early on
1. Don’t try to solve loosely defined problems
a. What does governance mean to you?
b. What does self-service analytics mean?
2. Don’t expand your scope, reduce it
a. Focus on outputs you need to deliver on your primary business outcomes
3. Don’t overcomplicate your architecture
a. Try to avoid doing everything that seems cool today
b. Build on top of best practices and familiar patterns - simpler to support and find help
c. Avoid vendor and technology lock-in
d. The more you build, the more you need to maintain. Avoid unnecessary tech debt
Getting started with organizational autonomy
Extending to make discovery and understanding easier
Starting with data as a product
Summary
● Data Mesh is an organizational pattern - get your company on board
● Identify the primary business outcomes you want to deliver with Data Mesh
● Focus on what you need to build now to deliver on an outcome soon
● Ensure data has clear ownership and accountability (quality, SLA, etc.)
● Treat data as a product
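Accountability for an SLA is easier to enforce when the SLA can be checked by a machine. As an illustration only, a freshness SLA could be tested as below; the function name `meets_freshness_sla` and the six-hour threshold are assumptions, not figures from the talk.

```python
from datetime import datetime, timedelta, timezone


def meets_freshness_sla(last_updated, max_age, now=None):
    """True if the data product was refreshed within the agreed max_age."""
    # `now` is injectable so the check can be tested deterministically.
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age


# Hypothetical SLA: the owning domain promises data no older than 6 hours.
sla = timedelta(hours=6)
checked_at = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = meets_freshness_sla(
    datetime(2022, 1, 1, 8, 0, tzinfo=timezone.utc), sla, now=checked_at
)
stale = meets_freshness_sla(
    datetime(2022, 1, 1, 5, 0, tzinfo=timezone.utc), sla, now=checked_at
)
```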
Demo architecture and data flow
Thank you
Join the Upsolver Community
to continue the conversation
upsolver.com
/in/royhasson/
/in/jasonfhall/
Actually, there is such a thing as a free lunch…*
Schedule a Demo:
Sign Up for SQLake:
Last Resort…Email the Sales Guy:
* $20 DoorDash gift card for everyone who schedules a demo

Boston Data Engineering: Designing and Implementing Data Mesh at Your Company with Upsolver
