Data Mesh is a relatively new approach that helps companies do more with data, faster. It requires both organizational and technical changes: enabling autonomy and self-service, treating data as a product, and encouraging secure collaboration.
In this session, we will discuss practical approaches you can implement today to help your company start benefiting from Data Mesh. We'll show you how to create autonomy by splitting responsibility between data producers and consumers, how to share datasets, and how to make data discovery easy.
We'll show a demo with producers building an ingestion pipeline that publishes datasets to consumer accounts (data mesh domains). SQL templates will be provided for members to follow along and build on their own.
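The SQL templates shown in the demo were SQLake-specific and provided to attendees. As a rough, hypothetical sketch of the producer-side pattern they illustrate (table and column names here are invented, and SQLite stands in for the lake engine), a producer ingests raw events, curates them, and publishes a clean dataset for consumers:

```python
import sqlite3

# Hypothetical sketch of a producer pipeline (names invented, SQLite as a
# stand-in engine): ingest raw events, curate them, and publish a dataset
# that consumer domains can query.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1. Raw landing zone, owned by the producer domain.
cur.execute("CREATE TABLE raw_orders (order_id TEXT, amount REAL, status TEXT)")
cur.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("o1", 10.0, "complete"), ("o2", -5.0, "error"), ("o3", 7.5, "complete")],
)

# 2. Curated, published dataset - the producer filters out bad records so
#    consumers always see clean data.
cur.execute(
    """
    CREATE TABLE published_orders AS
    SELECT order_id, amount
    FROM raw_orders
    WHERE status = 'complete' AND amount >= 0
    """
)

rows = cur.execute(
    "SELECT order_id, amount FROM published_orders ORDER BY order_id"
).fetchall()
print(rows)  # [('o1', 10.0), ('o3', 7.5)]
```

In a real mesh the "publish" step would write to shared storage or a consumer account rather than a local table, but the split between a raw, producer-owned zone and a curated, consumer-facing dataset is the same.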
We'll present these use cases built with data mesh design patterns:
1. A multi-tenant data lake that allows data producers to share datasets with consumers outside of the organization (third parties).
2. A security data lake that allows different teams to publish curated logs to their local Elasticsearch clusters for analysis, and to a central data lake for retention, auditing and historical analysis.
We'll also discuss managing data contracts/schemas between producers and consumers, to enable ownership and better data quality when sharing datasets.
Meetup: https://www.meetup.com/boston-data-engineering/events/291383661/
Video: https://youtu.be/lIcmomYZ3mo
Boston Data Engineering: Designing and Implementing Data Mesh at Your Company with Upsolver
1. Designing and implementing Data Mesh at your company
In partnership with:
Participating meetups in
Boston
NYC
Chicago
Toronto
Montreal
2. Who we are
Roy Hasson - Head of Product @ Upsolver (/in/royhasson/)
- Ex-AWS: product for Amazon Athena, AWS Glue and AWS Lake Formation
- Founding member of AWS Data Lake and Data Mesh initiatives
- Guides and supports Data Mesh implementations with customers
Jason Hall - Sr. Solutions Architect @ Upsolver (/in/jasonfhall/)
- Works with customers to plan and implement data pipeline strategies
- Helps ensure successful data projects from inception to production
3. The challenge: making a big impact, quickly
Business users are saying:
It takes too long to onboard new data
Central IT/data teams are a bottleneck
Can’t find, understand and access data
Takes too long to make small tweaks
Engineering users are saying:
We don’t understand business needs
Too many requests and tweaks
Integrations are complex and fragile
Difficult to hire good data engineers
4. Trying to solve the challenge with existing patterns
- Data Lake - build to suit (https://aws.amazon.com/big-data/what-is-a-data-lake/)
- Lakehouse - decoupled (https://databricks.com/product/data-lakehouse)
- Data Warehouse - hybrid (https://www.snowflake.com/blog/data-cloud-hybrid-data-warehouse-data-lake/)
5. These solutions do not work on their own
Data lake
- Too low level, integrations are manual and complex
- Encourages inconsistent implementations, difficult to secure
- Open and vibrant community
Lakehouse
- Fewer tool options; simpler to implement, but integrations are still manual
- Encourages centralization and lock-in
- Vibrant community in parts of the stack (storage and core engine)
Hybrid DWH
- 3-4 primary vendors to choose from, vertically integrated
- Encourages centralization and lock-in
- Limited by the vendor’s roadmap
6. This is not what we’re talking about
https://future.a16z.com/emerging-architectures-modern-data-infrastructure/
7. …this - Introducing Data Mesh
https://martinfowler.com/articles/data-monolith-to-mesh.html
Flexible organization design aligned to business needs
8. Flexible organization design and self-service tooling
Data domains - Autonomous units with ownership and accountability. Domains can produce and/or consume data with other domains.
Data infrastructure as a platform - Build once, use everywhere. Enables consistent tooling, engineering and security best practices, and ease of integration.
Data as a product - Data assets are treated like products: delivered in a reliable, consistent and secure manner, and easily discoverable and accessible across the org.
Overarching governance - Procedures and guidelines to secure, audit and control the quality of data in the organization.
9. Why Data Mesh at JPMC
Source JPMC July 2021 @ Data Mesh Learning Meetup - https://youtu.be/7iazNKG8XQo
10. High level Data Mesh design @ JPMC
Source AWS @ https://aws.amazon.com/blogs/big-data/how-jpmorgan-chase-built-a-data-mesh-architecture-to-drive-significant-value-to-enhance-their-enterprise-data-platform/
11. A single data domain built on an open data lake architecture
Source JPMC July 2021 @ Data Mesh Learning Meetup - https://youtu.be/7iazNKG8XQo
12. Creating a mesh with multiple data domains
Source JPMC July 2021 @ Data Mesh Learning Meetup - https://youtu.be/7iazNKG8XQo
13. Why Data Mesh at Intuit
Source Intuit July 2021 @ Data Mesh Learning Meetup - https://youtu.be/tNcxoASumB8
14. Intuit Data Mesh data products
Intuit data mesh strategy @ https://medium.com/intuit-engineering/intuits-data-mesh-strategy-778e3edaa017
15. Why Data Mesh at Zalando
Source Zalando @ Spark + AI Summit 2020 - https://youtu.be/eiUhV56uVUc
16. Moving to a Data Mesh at Zalando
Source Zalando @ Spark + AI Summit 2020 - https://youtu.be/eiUhV56uVUc
17. What can we learn from JPMC, Intuit and Zalando
1. Primary drivers - Autonomy, ownership and data-as-a-product
2. Sharing - producer/consumer model
3. Common data infrastructure - improve cost, scale and management overhead
a. JPMC opted to build their own data lake
b. Zalando used Databricks Lakehouse as a base for their platform
c. Intuit created an open platform letting data domains choose
4. Central catalog - unified data asset discoverability, collaboration and entitlements
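As an illustration of what the central catalog in point 4 provides (this is a toy sketch, not any vendor's API; all names are invented), producer domains register data products with ownership metadata, and consumers discover them by tag:

```python
# Toy sketch of a central data catalog (not any vendor's API):
# producer domains register data products; consumers discover them by tag.
catalog = []

def register(domain, name, owner, tags):
    """A producer domain registers a data product with ownership metadata."""
    catalog.append({"domain": domain, "name": name, "owner": owner, "tags": set(tags)})

def discover(tag):
    """A consumer searches the catalog by tag - the discoverability piece."""
    return [p["name"] for p in catalog if tag in p["tags"]]

# Two domains publish products; names are invented for illustration.
register("payments", "published_orders", "payments-team", ["orders", "finance"])
register("security", "curated_logs", "secops-team", ["logs", "audit"])

print(discover("orders"))  # ['published_orders']
print(discover("audit"))   # ['curated_logs']
```

Real implementations (AWS Glue Data Catalog, an internal metadata service, etc.) add entitlements and lineage on top, but the register/discover contract is the core idea.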
18. What to consider when getting started
1. What are the primary outcomes when implementing Data Mesh?
a. Autonomy - eliminating bottlenecks
b. Ownership and accountability - single owner, governance, quality and hygiene of data
c. Sharing - share and collaborate with teams to do more with data
d. Data products and data as code
2. Data infra - build vs. buy
a. Is owning the infra business critical?
b. Do you have the resources? How long will it take to build? How invested will you be two years from now?
c. Can you build some and buy some?
3. What are the most important outputs you need to deliver?
a. Ownership and discoverability = unified catalog
b. Autonomy = producer/consumer, data contracts
c. Data as code = GitOps + dbt/python + data contracts
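One way to make the "data contracts" in points 3b and 3c concrete (a minimal stdlib-only sketch; the schema and field names are invented, and real setups typically use dbt contracts or a schema registry instead) is for the producer to validate every record against the agreed schema before publishing:

```python
# Minimal sketch of a data contract check (field names invented):
# producer and consumer agree on a schema, and the producer validates
# every record against it before publishing to the mesh.
CONTRACT = {"order_id": str, "amount": float, "currency": str}

def validate(record, contract=CONTRACT):
    """Return a list of violations; an empty list means the record conforms."""
    violations = []
    for field, expected_type in contract.items():
        if field not in record:
            violations.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            violations.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return violations

good = {"order_id": "o1", "amount": 10.0, "currency": "USD"}
bad = {"order_id": "o2", "amount": "ten"}

print(validate(good))  # []
print(validate(bad))   # ['wrong type for amount: str', 'missing field: currency']
```

Running the check in CI (the GitOps angle) means a producer cannot merge a pipeline change that breaks the contract its consumers depend on.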
19. What to avoid early on
1. Don’t try to solve loosely defined problems
a. What does governance mean to you?
b. What does self-service analytics mean?
2. Don’t expand your scope, reduce it
a. Focus on outputs you need to deliver on your primary business outcomes
3. Don’t overcomplicate your architecture
a. Try to avoid doing everything that seems cool today
b. Build on top of best practices and familiar patterns - simpler to support and find help
c. Avoid vendor and technology lock-in
d. The more you build, the more you need to maintain. Avoid unnecessary tech debt
23. Summary
● Data Mesh is an organizational pattern - get your company on-board
● Identify the primary business outcomes you want to deliver with Data Mesh
● Focus on what you need to build now to deliver on an outcome soon
● Ensure data has clear ownership and accountability (quality, SLA, etc.)
● Treat data as a product
25. Thank you
Join the Upsolver Community
to continue the conversation
upsolver.com
/in/royhasson/
/in/jasonfhall/
26. Actually, there is such a thing as a free lunch…*
Schedule a Demo | Sign Up for SQLake
Last resort… email the sales guy
* $20 DoorDash gift card for everyone who schedules a demo