Designing and implementing
Data Mesh at your company
In partnership with:
Participating meetups in
Boston
NYC
Chicago
Toronto
Montreal
Who we are
/in/royhasson/
/in/jasonfhall/
Roy Hasson - Head of product @ Upsolver
Jason Hall - Sr. Solutions Architect @ Upsolver
Ex-AWS
- Product for Amazon Athena, AWS Glue and AWS Lake Formation
- Founding member of AWS Data Lake and Data Mesh initiatives
- Guiding and supporting Data Mesh implementations with customers
- Works with customers to plan and implement data pipeline strategies
- Helps to ensure successful data projects from inception to production
The challenge: making a big impact, faster
Business users are saying:
It takes too long to onboard new data
Central IT/data teams are a bottleneck
Can’t find, understand and access data
Takes too long to make small tweaks
Engineering users are saying:
We don’t understand business needs
Too many requests and tweaks
Integrations are complex and fragile
Difficult to hire good data engineers
Trying to solve the challenge with existing patterns
Data Lake (build to suit) - https://aws.amazon.com/big-data/what-is-a-data-lake/
Lakehouse (decoupled) - https://databricks.com/product/data-lakehouse
Data Warehouse (hybrid) - https://www.snowflake.com/blog/data-cloud-hybrid-data-warehouse-data-lake/
These solutions do not work on their own
Data lake
- Too low level, integrations are manual and complex
- Encourages inconsistent implementations, difficult to secure
- Open and vibrant community
Lakehouse
- Fewer tool options, simpler to implement, manual integrations
- Encourages centralization and lock-in
- Vibrant community in parts of the stack (storage and core engine)
Hybrid DWH
- 3-4 primary vendors to choose from, vertically integrated
- Encourages centralization and lock-in
- Limited by the vendor’s roadmap
This is not what we’re talking about
https://future.a16z.com/emerging-architectures-modern-data-infrastructure/
…this - Introducing Data Mesh
https://martinfowler.com/articles/data-monolith-to-mesh.html
Flexible organization design aligned to business needs
Flexible organization design and self-service tooling
Data domains - Autonomous units with ownership and accountability. Domains can produce data for, and/or consume data from, other domains.
Data infrastructure as a platform - Build once, use everywhere. Enables consistent tooling, engineering and security best practices, and ease of integration.
Data as a product - Data assets are treated like products: delivered in a reliable, consistent and secure manner, and easily discoverable and accessible across the org.
Overarching governance - Procedures and guidelines to secure, audit and control the quality of data in the organization.
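To make the domain and data-product ideas above concrete, here is a minimal Python sketch. It is not taken from the talk: the class names, field names and the example domains ("payments", "marketing") are all hypothetical, chosen only to show a domain that owns what it publishes and another domain that consumes it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A data asset published by a domain (all field names are illustrative)."""
    name: str
    domain: str            # the owning domain is accountable for this product
    owner: str             # a single accountable contact
    description: str = ""


@dataclass
class Domain:
    """An autonomous unit that can produce and/or consume data products."""
    name: str
    produces: dict = field(default_factory=dict)   # product name -> DataProduct
    consumes: set = field(default_factory=set)     # qualified "domain.name" keys

    def publish(self, name, owner, description=""):
        # The publishing domain is recorded as the owner of the product.
        product = DataProduct(name, self.name, owner, description)
        self.produces[name] = product
        return product

    def subscribe(self, product):
        # Consumers reference products by a qualified "domain.name" key.
        self.consumes.add(f"{product.domain}.{product.name}")


# Hypothetical domains and product, for illustration only.
payments = Domain("payments")
marketing = Domain("marketing")
settled = payments.publish("settled_transactions", owner="payments-data@example.com")
marketing.subscribe(settled)
```

A domain can appear on both sides: nothing stops `marketing` from publishing its own products while it consumes from `payments`.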
Why Data Mesh at JPMC
Source JPMC July 2021 @ Data Mesh Learning Meetup - https://youtu.be/7iazNKG8XQo
High level Data Mesh design @ JPMC
Source AWS @ https://aws.amazon.com/blogs/big-data/how-jpmorgan-chase-built-a-data-mesh-architecture-to-drive-significant-value-to-enhance-their-enterprise-data-platform/
A single data domain built on an open data lake architecture
Source JPMC July 2021 @ Data Mesh Learning Meetup - https://youtu.be/7iazNKG8XQo
Creating a mesh with multiple data domains
Source JPMC July 2021 @ Data Mesh Learning Meetup - https://youtu.be/7iazNKG8XQo
Why Data Mesh at Intuit
Source Intuit July 2021 @ Data Mesh Learning Meetup - https://youtu.be/tNcxoASumB8
Intuit Data Mesh data products
Intuit data mesh strategy @ https://medium.com/intuit-engineering/intuits-data-mesh-strategy-778e3edaa017
Why Data Mesh at Zalando
Source Zalando @ Spark + AI Summit 2020 - https://youtu.be/eiUhV56uVUc
Moving to a Data Mesh at Zalando
Source Zalando @ Spark + AI Summit 2020 - https://youtu.be/eiUhV56uVUc
What can we learn from JPMC, Intuit and Zalando
1. Primary drivers - Autonomy, ownership and data-as-a-product
2. Sharing - producer/consumer model
3. Common data infrastructure - improve cost, scale and management overhead
a. JPMC opted to build its own data lake
b. Zalando used Databricks Lakehouse as a base for their platform
c. Intuit created an open platform letting data domains choose
4. Central catalog - unified data asset discoverability, collaboration and entitlements
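As a rough illustration of what a central catalog does (discoverability plus entitlements), the following Python sketch keeps both in memory. It does not represent any of the three companies' implementations; the `Catalog` class, its method names and the example tags are assumptions made for this example.

```python
class Catalog:
    """A toy central catalog: register products, search them, manage access."""

    def __init__(self):
        self._entries = {}       # qualified name -> metadata
        self._entitlements = {}  # qualified name -> domains allowed to read

    def register(self, domain, name, owner, tags=()):
        key = f"{domain}.{name}"
        self._entries[key] = {"domain": domain, "owner": owner, "tags": set(tags)}
        # The producing domain can always read its own product.
        self._entitlements.setdefault(key, {domain})
        return key

    def search(self, tag):
        # Discoverability: find every registered product carrying a tag.
        return sorted(k for k, meta in self._entries.items() if tag in meta["tags"])

    def grant(self, key, consumer_domain):
        if key not in self._entries:
            raise KeyError(f"unknown data product: {key}")
        self._entitlements[key].add(consumer_domain)

    def can_read(self, key, consumer_domain):
        return consumer_domain in self._entitlements.get(key, set())


# Hypothetical usage.
catalog = Catalog()
orders_key = catalog.register("sales", "daily_orders", "sales-data@example.com",
                              tags=("orders", "daily"))
before_grant = catalog.can_read(orders_key, "finance")
catalog.grant(orders_key, "finance")
after_grant = catalog.can_read(orders_key, "finance")
```

In practice the catalog would be a shared service rather than an in-process object, but the two responsibilities stay the same: one place to find data products and one place to decide who may read them.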
What to consider when getting started
1. What are the primary outcomes when implementing Data Mesh?
a. Autonomy - eliminating bottlenecks
b. Ownership and accountability - single owner, governance, quality and hygiene of data
c. Sharing - share and collaborate with teams to do more with data
d. Data products and data as code
2. Data infra - build vs. buy
a. Is owning the infra business critical?
b. Do you have the resources? How long will it take to build? How invested will you be two years from now?
c. Can you build some and buy some?
3. What are the most important outputs you need to deliver?
a. Ownership and discoverability = unified catalog
b. Autonomy = producer/consumer, data contracts
c. Data as code = GitOps + dbt/python + data contracts
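One way to read "data contracts" and "data as code" together is a contract that lives in the repository next to the pipeline and is checked on every batch. The sketch below is a plain-Python assumption of what that could look like; the contract layout, the product name and the `validate` helper are hypothetical and are not the format of dbt or any other specific tool.

```python
# A contract kept in version control alongside the pipeline code (illustrative).
CONTRACT = {
    "product": "sales.daily_orders",
    "version": 1,
    "columns": {"order_id": str, "amount": float, "currency": str},
}


def validate(rows, contract):
    """Return a list of violations; an empty list means the batch conforms."""
    expected = contract["columns"]
    violations = []
    for i, row in enumerate(rows):
        # Columns the producer promised but did not deliver.
        for col in sorted(expected.keys() - row.keys()):
            violations.append(f"row {i}: missing column {col!r}")
        # Columns delivered with the wrong type.
        for col, typ in expected.items():
            if col in row and not isinstance(row[col], typ):
                violations.append(
                    f"row {i}: {col!r} should be {typ.__name__}, "
                    f"got {type(row[col]).__name__}"
                )
    return violations


good = [{"order_id": "a1", "amount": 10.0, "currency": "USD"}]
bad = [{"order_id": "a2", "amount": "12"}]
```

Run in CI (the GitOps part), a non-empty result from `validate` would fail the producer's build before a breaking change reaches consumers.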
What to avoid early on
1. Don’t try to solve loosely defined problems
a. What does governance mean to you?
b. What does self-service analytics mean?
2. Don’t expand your scope, reduce it
a. Focus on outputs you need to deliver on your primary business outcomes
3. Don’t overcomplicate your architecture
a. Try to avoid doing everything that seems cool today
b. Build on top of best practices and familiar patterns - simpler to support and find help
c. Avoid vendor and technology lock-in
d. The more you build, the more you need to maintain. Avoid unnecessary tech debt
Getting started with organizational autonomy
Extending to make discovery and understanding easier
Starting with data as a product
Summary
● Data Mesh is an organizational pattern - get your company on board
● Identify the primary business outcomes you want to deliver with DM
● Focus on what you need to build now to deliver on an outcome soon
● Ensure data has clear ownership and accountability (quality, SLA, etc.)
● Treat data as a product
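Accountability for an SLA only works if the SLA is checked. As a minimal sketch, assuming a freshness SLA expressed as a maximum age (the function name, the six-hour threshold and the timestamps are invented for this example), a check in Python could look like this:

```python
from datetime import datetime, timedelta, timezone


def freshness_breached(last_updated, max_age, now=None):
    """Return True when a data product is older than its agreed maximum age."""
    now = now or datetime.now(timezone.utc)
    return (now - last_updated) > max_age


# Hypothetical SLA: the product must be refreshed at least every 6 hours.
sla = timedelta(hours=6)
checked_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = freshness_breached(datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc), sla, checked_at)
stale = freshness_breached(datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc), sla, checked_at)
```

The owning domain would alert on a breach and report it to consumers, which is what makes the ownership visible.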
Demo architecture and data flow
Thank you
Join the Upsolver Community
to continue the conversation
upsolver.com
/in/royhasson/
/in/jasonfhall/
Actually, there is such a thing as a free lunch*
Schedule a Demo:
Sign Up for SQLake:
Last Resort… Email the Sales Guy:
* $20 DoorDash gift card for everyone who schedules a demo

Boston Data Engineering: Designing and Implementing Data Mesh at Your Company with Upsolver
