Data Mesh: What is it, who is it for, and who is it definitely not for?
What are its foundational principles, and how could we bring some of them into our current analytical data architectures?
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Parts 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products; previously, Jeff was an independent architect for the US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and “Adaptive Information,” a frequent keynote speaker at industry conferences, an author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Modernizing to a Cloud Data Architecture - Databricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their experience of a successful migration of their data and workloads to the cloud.
Presentation on Data Mesh: the paradigm shift is a new type of ecosystem architecture, a shift toward a modern distributed architecture that allows domain-specific data, views “data as a product,” and enables each domain to handle its own data pipelines.
Databricks CEO Ali Ghodsi introduces Databricks Delta, a new data management system that combines the scale and cost-efficiency of a data lake, the performance and reliability of a data warehouse, and the low latency of streaming.
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you, or should you, consider data mesh as the approach for your analytics platform? And most importantly: how can Snowflake help?
Given in Montreal on 14-Dec-2021
Architect’s Open-Source Guide for a Data Mesh Architecture - Databricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted at architects, decision-makers, data engineers, and system designers.
Data Architecture Strategies: Data Architecture for Digital Transformation - DATAVERSITY
MDM, data quality, data architecture, and more: combining these foundational data management approaches with other innovative techniques can help drive organizational change as well as technological transformation. This webinar will provide practical steps for creating a data foundation for effective digital transformation.
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Building a Logical Data Fabric using Data Virtualization (ASEAN) - Denodo
Watch full webinar here: https://bit.ly/3FF1ubd
In the recent Building the Unified Data Warehouse and Data Lake report by leading industry analyst firm TDWI, 64% of organizations stated that the objective of a unified data warehouse and data lake is to get more business value, and 84% of organizations polled felt that a unified approach to data warehouses and data lakes was either extremely or moderately important.
In this session, you will learn how your organization can apply a logical data fabric, together with the associated technologies of machine learning, artificial intelligence, and data virtualization, to reduce time to value and thereby increase the overall business value of your data assets.
KEY TAKEAWAYS:
- How a Logical Data Fabric is the right approach to assist organizations to unify their data.
- The advanced features of a Logical Data Fabric that assist with the democratization of data, providing an agile and governed approach to business analytics and data science.
- How a Logical Data Fabric with Data Virtualization enhances your legacy data integration landscape to simplify data access and encourage self-service.
Data Lakehouse, Data Mesh, and Data Fabric (r1) - James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines - Eric Kavanagh
Synthesis Webcast with Eric Kavanagh and Tamr
DataOps is an emerging set of practices, processes, and technologies for building and automating data pipelines to meet business needs quickly. As these pipelines become more complex and development teams grow in size, organizations need better collaboration and development processes to govern the flow of data and code from one step of the data lifecycle to the next – from data ingestion and transformation to analysis and reporting.
DataOps is not something that can be implemented all at once or in a short period of time. DataOps is a journey that requires a cultural shift. DataOps teams continuously search for new ways to cut waste, streamline steps, automate processes, increase output, and get it right the first time. The goal is to increase agility and shorten cycle times, while reducing data defects, giving developers and business users greater confidence in data analytic output.
This webcast examines how organizations adopt DataOps practices in the field. It will review results of an Eckerson Group survey that sheds light on the rate and scope of DataOps adoption. It will also describe case studies of organizations that have successfully implemented DataOps practices, the challenges they have encountered and benefits they’ve received.
Tune into our webcast to learn:
- User perceptions of DataOps
- The rate of DataOps adoption by industry and other demographic variables
- DataOps adoption by technique and component (e.g., agile, test automation, orchestration, continuous integration/continuous delivery); a small test-automation sketch follows this list
- Key challenges organizations face with DataOps
- Key benefits organizations experience with DataOps
- Best practices in doing DataOps
- Case studies and anecdotes of DataOps at companies
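To make the test-automation component above concrete, here is a minimal sketch of the kind of automated data check a DataOps pipeline might run before promoting a batch. It assumes pandas; the table and the rules are purely illustrative, not taken from the webcast.

```python
import pandas as pd

def validate_orders(df: pd.DataFrame) -> list:
    """Return a list of data-quality failures; an empty list means the batch passes."""
    failures = []
    if df["order_id"].duplicated().any():
        failures.append("duplicate order_id values")
    if (df["amount"] < 0).any():
        failures.append("negative amounts")
    if df["customer_id"].isna().any():
        failures.append("missing customer_id")
    return failures

# Gate a (made-up) batch before it flows downstream.
batch = pd.DataFrame({
    "order_id": [1, 2, 2],
    "customer_id": ["a", None, "c"],
    "amount": [10.0, -5.0, 3.5],
})
problems = validate_orders(batch)
if problems:
    raise ValueError(f"Batch rejected: {problems}")
```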
Intuit's Data Mesh - Data Mesh Learning Community meetup 5.13.2021 - Tristan Baker
Past, present and future of data mesh at Intuit. This deck describes a vision and strategy for improving data worker productivity through a Data Mesh approach to organizing data and holding data producers accountable. Delivered at the inaugural Data Mesh Learning meetup on 5/13/2021.
Data Lake Architecture – Modern Strategies & Approaches - DATAVERSITY
Data Lake or Data Swamp? By now, we’ve likely all heard the comparison. Data Lake architectures have the opportunity to provide the ability to integrate vast amounts of disparate data across the organization for strategic business analytic value. But without a proper architecture and metadata management strategy in place, a Data Lake can quickly devolve into a swamp of information that is difficult to understand. This webinar will offer practical strategies to architect and manage your Data Lake in a way that optimizes its success.
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes... - Dr. Arif Wider
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - Europe’s biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain-focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture and will outline the ongoing efforts to make creation of data products as simple as applying a template.
Data Warehouse Design and Best Practices - Ivo Andreev
A data warehouse is a database designed for query and analysis rather than for transaction processing. An appropriate design leads to a scalable, balanced and flexible architecture that is capable of meeting both present and long-term future needs. This session covers a comparison of the main data warehouse architectures together with best practices for the logical and physical design that support staging, load and querying.
An introduction to data mesh and the motivations behind it: the failure modes of previous big data management paradigms. Zhamak Dehghani's proposal compares and contrasts data mesh with existing approaches to big data management, presenting the technical components that underpin the software architecture.
Data Architecture, Solution Architecture, Platform Architecture — What’s the ... - DATAVERSITY
A solid data architecture is critical to the success of any data initiative. But what is meant by “data architecture”? Throughout the industry, there are many different “flavors” of data architecture, each with its own unique value and use cases for describing key aspects of the data landscape. Join this webinar to demystify the various architecture styles and understand how they can add value to your organization.
A work by Zhamak Dehghani, Principal Consultant, ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor, the data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first-class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
Democratizing Data Quality Through a Centralized Platform - Databricks
Bad data leads to bad decisions and broken customer experiences. Organizations depend on complete and accurate data to power their business, maintain efficiency, and uphold customer trust. With thousands of datasets and pipelines running, how do we ensure that all data meets quality standards, and that expectations are clear between producers and consumers? Investing in shared, flexible components and practices for monitoring data health is crucial for a complex data organization to rapidly and effectively scale.
At Zillow, we built a centralized platform to meet our data quality needs across stakeholders. The platform is accessible to engineers, scientists, and analysts, and seamlessly integrates with existing data pipelines and data discovery tools. In this presentation, we will provide an overview of our platform’s capabilities, including:
Giving producers and consumers the ability to define and view data quality expectations using a self-service onboarding portal
Performing data quality validations using libraries built to work with Spark (a minimal sketch follows this list)
Dynamically generating pipelines that can be abstracted away from users
Flagging data that doesn’t meet quality standards at the earliest stage and giving producers the opportunity to resolve issues before use by downstream consumers
Exposing data quality metrics alongside each dataset to provide producers and consumers with a comprehensive picture of health over time
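As a rough illustration of the capabilities listed above, here is a minimal sketch of a Spark-based null-ratio check; it is not Zillow's actual platform, and the column names and thresholds are invented for the example.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").appName("dq-sketch").getOrCreate()

# An illustrative dataset standing in for a production table.
df = spark.createDataFrame(
    [(1, "seattle", 520000.0), (2, None, 730000.0), (3, "denver", None)],
    ["listing_id", "city", "price"],
)

# Producer-defined expectations: maximum allowed null ratio per column.
expectations = {"city": 0.10, "price": 0.10}

total = df.count()
for column, max_null_ratio in expectations.items():
    nulls = df.filter(F.col(column).isNull()).count()
    ratio = nulls / total
    status = "PASS" if ratio <= max_null_ratio else "FAIL"
    print(f"{column}: null ratio {ratio:.2f} ({status})")
```

In a real platform these expectations would come from something like the self-service onboarding portal described above, with failing data flagged before downstream use.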
Tech talk on what Azure Databricks is, why you should learn it and how to get started. We'll use PySpark and talk about some real-life examples from the trenches, including the pitfalls of accidentally leaving your clusters running and receiving a huge bill ;)
After this you will hopefully switch to Spark-as-a-service and get rid of your HDInsight/Hadoop clusters.
This is part 1 of an 8 part Data Science for Dummies series:
Databricks for dummies
Titanic survival prediction with Databricks + Python + Spark ML
Titanic with Azure Machine Learning Studio
Titanic with Databricks + Azure Machine Learning Service
Titanic with Databricks + MLS + AutoML
Titanic with Databricks + MLFlow
Titanic with DataRobot
Deployment, DevOps/MLops and Operationalization
Product-thinking is making a big impact in the data world with the rise of Data Products, Data Product Managers, data mesh, and treating “Data as a Product.” But Honest, No-BS: What is a Data Product? And what key questions should we ask ourselves while developing them? Tim Gasper (VP of Product, data.world), will walk through the Data Product ABCs as a way to make treating data as a product way simpler: Accountability, Boundaries, Contracts and Expectations, Downstream Consumers, and Explicit Knowledge.
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy... - Databricks
A traditional data team has roles including data engineer, data scientist, and data analyst. However, many organizations are finding success by integrating a new role – the analytics engineer. The analytics engineer develops a code-based data infrastructure that can serve both analytics and data science teams. He or she develops re-usable data models using the software engineering practices of version control and unit testing, and provides the critical domain expertise that ensures that data products are relevant and insightful. In this talk we’ll cover the role and skill set of the analytics engineer, and discuss how dbt, an open source programming environment, empowers anyone with a SQL skill set to fulfill this new role on the data team. We’ll demonstrate how to use dbt to build version-controlled data models on top of Delta Lake, test both the code and our assumptions about the underlying data, and orchestrate complete data pipelines on Apache Spark™.
Scaling and Modernizing Data Platform with Databricks - Databricks
Today a Data Platform is expected to process and analyze a multitude of sources spanning batch files, streaming sources, backend databases, REST APIs, and more. There is clearly a need for a standardized platform that scales and stays flexible, letting data engineers and data scientists focus on the business problems rather than managing the infrastructure and backend services. Another key aspect of the platform is multi-tenancy, to isolate workloads and track cost usage per tenant.
In this talk, Richa Singhal and Esha Shah will cover how to build a scalable Data Platform using Databricks and deploy your data pipelines effectively while managing the costs. The following topics will be covered:
Key tenets of a Data Platform
Setup multistage environment on Databricks
Build data pipelines locally and test on Databricks cluster
CI/CD for data pipelines with Databricks
Orchestrating pipelines using Apache Airflow – Change Data Capture using Databricks Delta (a minimal Airflow sketch follows this list)
Leveraging Databricks Notebooks for Analytics and Data Science teams
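To make the orchestration topic concrete, here is a minimal sketch of a two-step Airflow DAG sequencing an ingest-then-transform pipeline. The task bodies are placeholders and the DAG name is hypothetical; the talk's actual jobs are not described here.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    # Placeholder: land change data in a staging location.
    print("ingesting change data")

def transform():
    # Placeholder: merge staged changes into an analytics table.
    print("transforming staged data")

with DAG(
    dag_id="example_cdc_pipeline",  # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    ingest_task >> transform_task  # transform runs only after ingest succeeds
```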
Agile BI Development Through Automation - Manta Tools
How can code life cycle automation satisfy the growing demands in modern enterprise business intelligence?
Whilst an agile approach to BI development is useful for delivering value in general, the use of advanced automation techniques can also save significant resources, prevent production errors, and shorten time to market.
Speakers from Data To Value, Manta Tools, Volkswagen and M&G Investments presented and discussed different approaches to agile BI development. Take a look!
"Implementing Data Mesh: Six Ways That Can Improve the Odds of Your Success" is a whitepaper authored by Ranganath Ramakrishna of LTIMindtree. The whitepaper introduces the concept of Data Mesh, a socio-technical paradigm that aims to help organizations fully leverage the value of their analytical data.
When writing this new paper, my main objective was to provide a clear understanding of where the term "Big Data" comes from, why the term is so popular now, what it really means, and what its implications for businesses can be. Because the full power of Big Data can be revealed only through analytics, I provide a description of widely recognized analytical techniques to help you figure out how, used in conjunction with Big Data, analytics can boost business performance.
I expect that by the end of this paper:
- you will smile the next time you read or hear the terms big data, Hadoop, or analytics :)
- you will understand the technologies that operate behind the scenes when one talks about "Big Data"
- you will know how to "make sense" of Big Data using analytics
- you will get a basic idea of data mining techniques used in business in general and with Big Data in particular
- you will be able to follow the latest news about Big Data
Data Product Management by Tinder Group PM - Product School
Main Takeaways:
- What is Data Product Management
- Who is a Data Product Manager and what do they do
- How and where to get started to get a role in Data Product Management
Information is at the heart of all architecture disciplines & why Conceptual ... - Christopher Bradley
Information is at the heart of all of the architecture disciplines, such as Business Architecture and Applications Architecture, and conceptual data modelling helps with this. Also, data modelling, which helps inform this, has been wrongly taught in many universities as being just for database design.
chris.bradley@dmadvisors.co.uk
Make compliance fulfillment count double - Dirk Ortloff
This whitepaper gives an overview of the requirements and approaches needed to make your compliance initiative count double: not only fulfilling compliance, but going the next step and bringing your documentation and knowledge handling to a stage where future projects can learn from previous successes and mistakes. This will make your R&D department ready for future challenges, faster markets and global partnerships.
Library systems are no longer ‘stand-alone’. Global technology influences are driving the market more than ever. There is a risk that the solutions libraries provide remain detached from truly meeting the real needs of many users - staff, academics, researchers and students.
Instead of library systems, or even ‘next generation’ library services platforms, we need to think in terms of the wider library technology ‘ecosystem’. That changes how we make our decisions about the products we buy and the services libraries deliver.
Smarter Analytics: Supporting the Enterprise with Automation - Inside Analysis
The Briefing Room with Barry Devlin and WhereScape
Live Webcast on June 10, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=5230c31ab287778c73b56002bc2c51a
The data warehouse is intended to support analysis by making the right data available to the right people in a timely fashion. But conditions change all the time, and when data doesn’t keep up with the business, analysts quickly turn to workarounds. This leads to ungoverned and largely un-managed side projects, which trade short-term wins for long-term trouble. One way to keep everyone happy is by creating an integrated environment that pulls data from all sources, and is capable of automating both the model development and delivery of analyst-ready data.
Register for this episode of The Briefing Room to hear data warehousing pioneer and Analyst Barry Devlin as he explains the critical components of a successful data warehouse environment, and how traditional approaches must be augmented to keep up with the times. He’ll be briefed by WhereScape CEO Michael Whitehead, who will showcase his company’s data warehousing automation solutions. He’ll discuss how a fast, well-managed and automated infrastructure is the key to empowering faster, smarter, repeatable decision making.
Visit InsideAnalysis.com for more information.
Agile, Automated, Aware: How to Model for Success - Inside Analysis
The Briefing Room with David Loshin and Embarcadero
Live Webcast October 27, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/onstage/g.php?MTID=eea9877b71c653c499c809c5693eae8fe
Data management teams face some tough challenges these days. Organizations need business-driven visibility that enables understanding and awareness of enterprise data assets – without worrying about definitions and change management. But with information architectures evolving into a hybrid mix of data objects and data services built over relational databases as well as big data stores, serving up accurately defined, reusable data can become a complex issue.
Register for this episode of The Briefing Room to learn from veteran Analyst David Loshin as he explains the importance of agile, automated workflows in today’s enterprise. He’ll be briefed by Ron Huizenga of Embarcadero, who will discuss how his company’s ER/Studio suite approaches data modeling and management from a modern architecture standpoint. He will explain that unifying the way information is represented can not only eliminate the need for costly workarounds, but also foster collaboration between data architects, developers and business users.
Visit InsideAnalysis.com for more information.
Boston Data Engineering: Designing and Implementing Data Mesh at Your Company... - Boston Data Engineering
Data Mesh is a fairly new approach to help companies do more with data, faster. It requires both organizational and technical changes to enable autonomy and self-service, treat data as a product and encourage secure collaboration.
In this session, we will discuss practical approaches you can implement today to help your company start benefiting from Data Mesh. We'll show you how to create autonomy by splitting responsibility between data producers and consumers, share datasets and make data discovery easy.
We'll show a demo with producers building an ingestion pipeline that publishes datasets to consumer accounts (data mesh domains). SQL templates will be provided for members to follow along and build on their own.
We'll present these use cases built with data mesh design patterns:
1. A multi-tenant data lake that allows data producers to share datasets with consumers outside of the organization (3rd parties).
2. A security data lake that allows different teams to publish curated logs to their local Elasticsearch clusters for analysis, and to a central data lake for retention, auditing and historical analysis.
We'll also discuss managing data contracts/schemas between producers and consumers, to enable ownership and better data quality when sharing datasets.
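As a rough sketch of the producer-consumer contract idea described above (not the session's actual SQL templates), a published schema could be validated in plain Python like this; the contract fields are invented for the example.

```python
# An illustrative data contract: the producer publishes this schema,
# and consumers validate incoming records against it.
CONTRACT = {
    "event_id": str,
    "occurred_at": str,  # ISO-8601 timestamp
    "amount": float,
}

def conforms(record: dict) -> bool:
    """Check that a record has exactly the contracted fields and types."""
    if set(record) != set(CONTRACT):
        return False
    return all(isinstance(record[name], kind) for name, kind in CONTRACT.items())

good = {"event_id": "e1", "occurred_at": "2023-03-01T12:00:00Z", "amount": 9.99}
bad = {"event_id": "e2", "amount": "9.99"}  # missing field, wrong type
print(conforms(good), conforms(bad))  # True False
```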
Meetup: https://www.meetup.com/boston-data-engineering/events/291383661/
Video: https://youtu.be/lIcmomYZ3mo
Business in the Driver’s Seat – An Improved Model for Integration - Inside Analysis
The Briefing Room with Dr. Robin Bloor and WhereScape
Live Webcast on September 30, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=bfff40f7c9645fc398770ea11152b148
The fueling of information systems will always require some effort, but a confluence of innovations is fundamentally changing how quickly and accurately it can be done. Gone are long cycle times for development. Today, organizations can embrace a more rapid and collaborative approach for building analytical applications and data warehouses. The key is to have business experts working hand-in-hand with data professionals as the solutions take shape, thus expediting the speed to valuable insights.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he explains the changing nature of information design. He’ll be briefed by WhereScape President Mark Budzinski, who will discuss his company’s data warehouse automation solutions and how they enable collaborative development. He will share use cases that illustrate how, by aligning business and IT, organizations can enable faster and more agile data warehouse development.
Visit InsideAnalysis.com for more information.
Data. It keeps coming up time and time again: on our social media feeds, in our client conversations, and of course as the driver behind never-before-seen tools like ChatGPT.
But how can you do more with the data your organisation has and produces? What is data engineering and big data, and how can you enable data-driven decision-making within your organisation?
Hear from Nabi Rezvani—Lead Data Engineer—and Gaurav Thadani—Lead Software Engineer at DiUS on the latest trends, use cases and real-life examples of how our clients are using data and analytics to improve their decision making, customer experiences and business operations.
Also joining us are Jonathan Gomez—Head of Data Platforms at Wesfarmers OneDigital OnePass—and John Sullivan—CEO at ChargeFox—on their own [big and small] data journeys, along with the lessons they’ve learned along the way.
Watch the presentation on YouTube: https://youtu.be/ccghOfcdGN8
This presentation takes a look at the architectural constructs used for building business intelligence systems, and at how they are used in business processes to improve marketing, better serve customers, and maximize organizational efficiency.
Data Integration is a key part of many of today’s data management challenges: from data warehousing, to MDM, to mergers & acquisitions. Issues can arise not only in trying to align technical formats from various databases and legacy systems, but in trying to achieve common business definitions and rules.
Join this webinar to see how a data model can help with both of these challenges – from ‘bottom-up’ technical integration, to the ‘top-down’ business alignment.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
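For readers new to the topic, here is a minimal sketch of the standard (monolithic) PageRank baseline by power iteration in plain Python; it is not the report's Levelwise implementation, and it handles dead ends by spreading their rank uniformly.

```python
def pagerank(graph, damping=0.85, tol=1e-6, max_iter=100):
    """graph: dict mapping each vertex to a list of outgoing neighbors."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dead ends (no out-links) is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if graph[v]:
                share = damping * rank[v] / len(graph[v])
                for u in graph[v]:
                    new[u] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank

print(pagerank({"a": ["b"], "b": ["a", "c"], "c": []}))
```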
Quantitative Data Analysis: Reliability Analysis (Cronbach Alpha), Common Method... - 2023240532
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha) (a worked sketch follows this list)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
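As a worked illustration of the reliability analysis above, here is a small Python sketch computing Cronbach's alpha from a respondents-by-items score matrix; the data are made up for the example.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: a (respondents x items) matrix of scale item scores."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative 5-respondent, 3-item Likert data.
data = np.array([
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 3],
    [4, 4, 5],
])
print(round(cronbach_alpha(data), 3))
```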
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting primitives for graph : SHORT REPORT / NOTES - Subhajit Sahu
Notes on adjusting primitives for graph algorithms such as PageRank. Compressed Sparse Row (CSR) is an adjacency-list based graph representation (a small CSR sketch follows these notes).
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
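Since the notes revolve around CSR, here is a minimal sketch of building a CSR representation from an edge list in plain Python; the variable names are illustrative and this is not the report's code.

```python
def to_csr(num_vertices, edges):
    """Build CSR arrays (offsets, targets) from (src, dst) edge pairs."""
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets

offsets, targets = to_csr(3, [(0, 1), (0, 2), (1, 2)])
# Neighbors of vertex v are targets[offsets[v]:offsets[v + 1]].
print(offsets, targets)  # [0, 2, 3, 3] [1, 2, 2]
```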
No, it won’t ‘save us’….
There is NO quick fix to become ‘Data Driven’, or ‘information supported’, or whatever you want to call what we are doing here…
But there are Data Mesh aspects that make sense!
…if you ignore most of the (tech) vendor washing…
As in…
About me...
Rogier Werschkull
21 years ‘in the field’ of data, data warehousing and business intelligence
Data architecture advice, data modeling, data engineering, data-analytics product owner
Blogger, trainer, conference speaker
Contact details:
www.linkedin.com/in/rogierwerschkull/
rogier@rogerdata.nl
@rwerschkull
“Every single company I've worked at and talked to has the same problem without a single exception so far: poor data quality... Either there's incomplete data, missing data, duplicative data.”
Ruslan Belkin, former VP of Engineering @ Twitter and Salesforce
What is Data Mesh? (1)
Data mesh tries (again?) to combat the ‘Analytics Misery’ we tend to create…
HOW? By decentralizing most ‘data warehousing concerns’ to individual business domains. As in:
1. In either the operational source system
2. Or in a decentralized DWH team that sits ‘closer’ to the source systems
And by ‘calling out’ the required organizational / cultural change to accomplish this…
What is Data Mesh? (2)
Data mesh is described in the official Data Mesh book by Zhamak Dehghani (Thoughtworks) as follows:
‘Data mesh is a decentralized sociotechnical approach to share, access, and manage analytical data in complex and large-scale environments—within or across organizations’
For WHO and WHEN?
Only for large organizations, as in:
- Having a lot of source systems
- Having a lot of employees
- Where there are clear / separated business domains
Only for large organizations that…
- are not afraid to experiment, and can live with the current absence of viable implementation patterns
- have a mature (centralized?) data / analytics department
- preferably can influence the design / development of the operational applications they use
Data mesh is foremost about people and processes:
- About changing the data-analytics ‘culture’
- About the process of collaborating on creating decentralized, valuable data products within a ‘business domain’, and sharing data between these domains
NOT about technology!
(Slide diagram: People / Process / Data / Technology)
It’s not that I don’t agree with the core data mesh principles; it’s the way they are explained: ‘quite academic’.
Implementation guidelines are still missing
- Which Zhamak also clearly mentions: it is still an emerging concept!
But even then, some vital context is truly missing.
A major issue I have with the book (that we need to counter): I really have doubts about the amount of data integration experience the contributors have, given that
- it states that folks building DWHs are still striving for the ‘single version of the truth’ (if you lived 10 years ago, yes…)
- there is no mention at all of modern ELM-based data modeling patterns (that are there to help data integration)
Quite often the book’s content does not help in this respect…
18.
It’s about doing this analytical work with data, somehow, somewhere. Data warehousing addresses four ‘concerns’ (a minimal sketch of the last two follows after this slide):
Structure information so it can be consumed easily, shaped for diverse types of users, use cases and tools (Subject Oriented)
Reliable, durable integration / unification of data (Integrated)
Register the history of data and the history of changes to it (Time-Variant)
Store data you receive once, protected from ungoverned deletion (Non-Volatile)
DWH
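To make the last two concerns concrete: a minimal, illustrative Python sketch of an insert-only historical staging area. The class and method names are my own invention; a real implementation would live in a database table, not in memory.

```python
from datetime import datetime, timezone

class HistoricalStagingArea:
    """Append-only store: time-variant (every version kept with a load
    timestamp) and non-volatile (nothing is ever updated or deleted)."""

    def __init__(self):
        self._rows = []  # stands in for an insert-only database table

    def load(self, record_source: str, records: list[dict]) -> None:
        load_ts = datetime.now(timezone.utc)
        for record in records:
            # Store each record RAW, exactly as received, plus load metadata
            self._rows.append({
                "record_source": record_source,
                "load_ts": load_ts,
                "payload": dict(record),
            })

    def history(self, key_field: str, key_value) -> list[dict]:
        # Full history of changes for one business key, oldest first
        return sorted(
            (r for r in self._rows if r["payload"].get(key_field) == key_value),
            key=lambda r: r["load_ts"],
        )

hsa = HistoricalStagingArea()
hsa.load("crm", [{"customer_id": 42, "email": "old@example.com"}])
hsa.load("crm", [{"customer_id": 42, "email": "new@example.com"}])
print(len(hsa.history("customer_id", 42)))  # -> 2: both versions retained
```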
19.
Common DWH patterns overview
For each data-analytics architecture: what is it, its architecture solution features, its most common development style, and how it covers the four data warehousing ‘concerns’ (Subject Oriented / Integrated / Time-Variant / Non-Volatile):

Data lake
• What is it: a repository of raw data of any type for analytics purposes
• Architecture solution features: decentralized tech; on-premise or cloud
• Most common development style: decentralized ‘puddles of lake’
• Subject Oriented: no
• Integrated: no
• Time-Variant: yes, in general implemented as a file store
• Non-Volatile: yes, in general implemented as a file store

DWH 3.0 / ‘Lakehouse’
• What is it: the re-merger of data lake and ‘classical’ DWH concerns, also known as the ‘Modern DWH’
• Architecture solution features: centralized tech; cloud based
• Most common development style: centralized or decentralized, depending on business complexity
• Subject Oriented: yes, via database transformation rules
• Integrated: yes, via database integration rules
• Time-Variant: yes, via a database Historical Staging Area
• Non-Volatile: yes, via a database Historical Staging Area

Data Mesh
• What is it: distributed data architecture that pushes down ‘DWH concerns’ to the source / ‘business domain’
• Architecture solution features: highly decentralized tech; on-premise or cloud
• Most common development style: highly decentralized by definition; focus on ‘data as a product’
• Subject Oriented: yes, but mainly pushed to the ‘business domains’
• Integrated: yes, local within the business domain and centralized via a ‘knowledge graph’-like ‘mapping’
• Time-Variant: yes, but pushed to the ‘business domains’
• Non-Volatile: yes, but pushed to the ‘business domains’

Data Fabric
• What is it: distributed data architecture where ‘time-variant / non-volatile’ concerns are pushed down to the source systems
• Architecture solution features: centralized tech; on-premise or cloud; sources decentralized
• Most common development style: centralized or decentralized, depending on business complexity
• Subject Oriented: yes, via centralized virtual transformation rules
• Integrated: yes, via centralized virtual integration logic
• Time-Variant: what the operational system provides, or by creating a Historical Staging Area in an analytical database
• Non-Volatile: what the operational system provides, or by creating a Historical Staging Area in an analytical database
21.
1. Principle of domain ownership
• Analytical data should be owned by either the source system or its main consumers
2. Data as a product
• Build data artifacts with a true product (management) mindset
3. Self-service data platform
• Use (shared) infrastructure as a platform (in the cloud?) to build this
4. Federated computational governance
• A data governance operating model based on federated decision-making and accountability
Based on these foundational principles…
24.
Feasible
Can the product be made in an acceptable time and for acceptable costs?
Valuable
What are the desires of my customers?
What is my market?
• How to do marketing?
What is the USP?
What price is justified?
Are my customers happy?
Usable
Is the product being used?
Is the product easy (enough) to use?
Are my customers happy?
Some examples of the work you’ll need to do here!
25.
To see ‘data / information as a product’, it practically needs to be:
Discoverable
An easy, Google-like way to find data sets
Addressable
The product needs a permanent unique identifier that stays stable over time
Understandable
The product needs to be accompanied by metadata that describes WHAT something is
Trustworthy and ‘truthful’
The product needs to have a lot of data quality metrics and lineage metadata attached
Natively accessible
Accessible via any interface that suits the consumer, e.g. as an API / via ODBC-SQL / as a stream ‘topic’
Interoperable and composable
The product needs to be accompanied by metadata on HOW it can be combined with other products
Valuable on its own
Usable without the need to first combine it with other data products
Secure
Data security / privacy needs to work on the product without needing ‘something else’
(A sketch of what these principles could look like as product metadata follows below.)
Data Mesh Data Product principles
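As a thought experiment, the principles above could travel as explicit metadata on every data product. A minimal sketch using only the Python standard library; all field names are hypothetical, not taken from the book:

```python
from dataclasses import dataclass, field

@dataclass
class DataProductDescriptor:
    # Addressable: a permanent identifier that stays stable over time
    product_id: str
    # Understandable: metadata describing WHAT the data is
    name: str
    description: str
    # Domain ownership: which business domain publishes this product
    owner_domain: str
    # Trustworthy and 'truthful': quality metrics and lineage travel with the product
    quality_metrics: dict[str, float] = field(default_factory=dict)
    lineage: list[str] = field(default_factory=list)
    # Natively accessible: every interface the consumer can use (API / ODBC-SQL / topic)
    access_interfaces: list[str] = field(default_factory=list)
    # Interoperable and composable: HOW to combine it with other products
    join_keys: list[str] = field(default_factory=list)
    # Secure: policies enforced on the product itself, not via 'something else'
    security_policies: list[str] = field(default_factory=list)

# Discoverable: descriptors like this would be pushed to a searchable catalog
customer_product = DataProductDescriptor(
    product_id="dp-customer-001",
    name="Customer profile",
    description="One row per customer, current profile attributes",
    owner_domain="sales",
    quality_metrics={"completeness": 0.98},
    lineage=["crm.customers", "web.signups"],
    access_interfaces=["odbc-sql", "rest-api"],
    join_keys=["customer_id"],
    security_policies=["mask-email-for-non-privileged"],
)
```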
26.
…an extension / repackaging of the existing FAIR data principles
https://www.go-fair.org/fair-principles/
This is not completely new…
How FAIR maps to the Data Mesh product principles:
• Findable → Discoverable; Understandable
• Accessible → Addressable; Natively accessible; Secure
• Interoperable → Trustworthy and ‘truthful’; Interoperable and composable; Valuable on its own
• Reusable → Natively accessible; Trustworthy and ‘truthful’; Understandable
29.
‘Pushing down’ DWH concerns to the operational systems will likely be a long journey
In addition, a lot of the tech mentioned in the book to cover some vital aspects of the data mesh does not exist yet
The alternative I see…
Use the DWH 3.0 / Lakehouse pattern
Make sure to cover the mentioned Data Mesh principles there
• I think the key there is to use Data Vault or another ELM-based data modeling style as an enabler
Overall, this would be my starting point when MVP-ing a ‘meshy architecture’
30.
Subject Oriented
Create domain-specific and centralized hubs
Create domain-specific satellites
Integrated
Domain-specific hubs should (obviously) be integrated within a domain
Centralized hubs should be fed from all domains and / or a central MDM source
Centralized ‘same-as’ links should be created to enable integrating across domains
• Domain-specific satellites can then be shared too
Time-Variant & Non-Volatile
Before the Subject Oriented / Integrated step, data should be loaded RAW into a Historical Staging Area first
Implementing the ‘DWH concerns’ using Data Vault (a minimal sketch follows below)
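A minimal Python sketch of the hub / satellite / ‘same-as’ link mechanics described above. In-memory dicts stand in for database tables, and all names are illustrative; a real load would target your Lakehouse / DWH platform:

```python
import hashlib
from datetime import datetime, timezone

def hash_key(business_key: str) -> str:
    # Data Vault style hash key over the (normalized) business key; MD5 is common
    return hashlib.md5(business_key.strip().upper().encode()).hexdigest()

# In-memory stand-ins for database tables
hub_customer: dict[str, dict] = {}   # Subject Oriented: one row per business key
sat_customer: list[dict] = []        # descriptive attributes, full history
link_same_as: list[dict] = []        # cross-domain 'same-as' integration

def load_hub(business_key: str, record_source: str) -> str:
    hk = hash_key(business_key)
    # Hubs are insert-only: only the first sighting of a key is stored
    hub_customer.setdefault(hk, {
        "hash_key": hk, "business_key": business_key,
        "load_ts": datetime.now(timezone.utc), "record_source": record_source,
    })
    return hk

def load_satellite(hk: str, attributes: dict, record_source: str) -> None:
    # Satellites carry the time-variant context for a hub key
    sat_customer.append({
        "hash_key": hk, "load_ts": datetime.now(timezone.utc),
        "record_source": record_source, **attributes,
    })

def load_same_as_link(hk_a: str, hk_b: str, record_source: str) -> None:
    # A 'same-as' link asserts two business keys denote one real-world entity,
    # enabling integration across domains without touching either domain
    link_same_as.append({
        "hash_key_a": hk_a, "hash_key_b": hk_b,
        "load_ts": datetime.now(timezone.utc), "record_source": record_source,
    })

# Two domains know the same customer under different keys:
hk_sales = load_hub("CUST-42", "sales")
hk_support = load_hub("ACC-9001", "support")
load_satellite(hk_sales, {"email": "a@example.com"}, "sales")
load_same_as_link(hk_sales, hk_support, "mdm")
```

Note how the hash key, not the source entity name, is what stays stable over time; that is what makes a hub a good ‘addressable’ anchor.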
32.
Implementing ‘data as a product’ in the DWH 3.0 / Lakehouse pattern
Product principle → my first implementation ‘idea’ / suggestion:
Discoverable
• Data product metadata should be pushed or pulled from all domains towards a data catalog with a good search interface where NO MANUAL CURATION is needed!
Addressable
• If source entity names change, the product entity names should remain stable. That’s also the purpose of a GOOD Data Vault hub
Trustworthy and ‘truthful’
• Data quality tests should be part of each data product; not having them should block releasing it (see the sketch after this list)
• The data catalog mentioned under Discoverable should handle lineage
Natively accessible
• Next to storing data products ‘in a database’ and using ODBC / JDBC to access them, create a data API on top of each data product, or make sure it can be used via an API too
• Database systems with ‘low friction’ data sharing capabilities could help here
Interoperable and composable
• Embed metadata from parent / child data products in the data product itself
• Again, a data catalog plays a central role here
Valuable on its own
• ELM-based data modeling patterns (like Data Vault) plus a datamart modeling style like Kimball / Dimensional is still the way to go
Secure
• Use the native row- and column-level security features of modern cloud-based analytical databases
• Register these policies as metadata
• This requires data product consumers to consume using named accounts only
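For the ‘Trustworthy’ row above, one way to make missing or failing data quality tests literally block a release. A minimal sketch; the gate function and test names are hypothetical, not a specific product’s API:

```python
from typing import Callable

def release_data_product(name: str,
                         rows: list[dict],
                         quality_tests: list[Callable[[list[dict]], bool]]) -> bool:
    # No tests attached -> no release: quality is not an afterthought
    if not quality_tests:
        print(f"BLOCKED: {name} has no data quality tests attached")
        return False
    failed = [t.__name__ for t in quality_tests if not t(rows)]
    if failed:
        print(f"BLOCKED: {name} failed tests: {failed}")
        return False
    # Here you would also push metadata and lineage to the data catalog
    print(f"RELEASED: {name}")
    return True

def no_null_customer_ids(rows: list[dict]) -> bool:
    return all(r.get("customer_id") is not None for r in rows)

def unique_customer_ids(rows: list[dict]) -> bool:
    ids = [r["customer_id"] for r in rows]
    return len(ids) == len(set(ids))

release_data_product("customer_profile",
                     [{"customer_id": 1}, {"customer_id": 2}],
                     [no_null_customer_ids, unique_customer_ids])
```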
34.
All relevant data mesh info is collected at https://datameshlearning.com/user-stories/
Scott Hirleman drives this initiative
He also hosts the accompanying Data Mesh Radio podcast
• Listen to Shane Gibson’s (Knowledge Gap presenter) episode here: https://daappod.com/data-mesh-radio/repeatable-patterns-and-data-mesh-shane-gibson/
Check out the user journeys here: https://datameshlearning.com/user-stories/
LAST: Where can I find more info?
Our industry is still immature: https://www.linkedin.com/pulse/note-piergiuseppe-bill-inmon/
Summary:
- One of the signs of immaturity of our industry is the practice of depending on vendors to lead the industry.
- Because we are a young and immature industry, there are new advancements that occur every day.
- Because of the newness of our industry there are very few principles. There are new toys. There are new gadgets.
- When a new product or technology comes into the marketplace, the vendor thinks that it is their duty to remove everything that has come before.
- There is a secret to combatting the vendors who are telling you that you are dated and old. The secret is to deliver business value to your end user.
What does this number mean?
Yes, it is the failure rate of BI, analytics and classical data warehousing initiatives
But also of big data, data lake, IoT or AI projects. It is the amount of ML work that never sees the light of day in your production environment
And I don’t make this up; it is being said again and again by the likes of Gartner, Forrester, CIO.com, Cisco
There are 2 reasons
My take on the primary reason WHY this lasting failure is still happening: we are still not addressing the data quality problem structurally
Quality is still an afterthought
It is not only Bill Inmon saying this; the quote above is from a guy who worked at Salesforce, a modern cloud-based SaaS company
In my opinion, solving / addressing / governing these data quality issues is implicitly the core of what data warehousing methodology should address
IMHO: it is ‘just’ a NEW form of Decentralized Data Warehousing
Is data mesh an architecture?
Is it a list of principles?
Is it an operating model?
After all, we rely on the classification of patterns as a major cognitive function to understand the structure of our world.
Hence, I have decided to classify data mesh as a sociotechnical paradigm: an approach that recognizes the interactions between people and the technical architecture and solutions in complex organizations
When people have actually read the data mesh book: even Zhamak herself writes that it is still an emerging concept, and that a lot of the tech needed to build what she describes conceptually DOES NOT EVEN EXIST (YET).
As such, no one can actually claim to ‘sell’ a data mesh or claim to have built a ‘full-fledged’ one.
The only claims that could be true are:
that people are ‘on a journey’ towards creating a data mesh
that vendors sell a tech component that might be applied when designing / building a data mesh.
That is quite a lot of people that will need to be protected, so read up!
Data mesh calls for a fundamental shift in the assumptions, architecture, technical solutions, and social structure of our organizations, in how we manage, use, and own analytical data:
Organizationally, it shifts from centralized ownership of data by specialists who run the data platform technologies to a decentralized data ownership model pushing ownership and accountability of the data back to the business domains where data is produced from or is used.
Architecturally, it shifts from collecting data in monolithic warehouses and lakes to connecting data through a distributed mesh of data products accessed through standardized protocols.
Technologically, it shifts from technology solutions that treat data as a byproduct of running pipeline code to solutions that treat data and code that maintains it as one lively autonomous unit.
Operationally, it shifts data governance from a top-down centralized operational model with human interventions to a federated model with computational policies embedded in the nodes on the mesh.
Principally, it shifts our value system from data as an asset to be collected to data as a product to serve and delight the data users (internal and external to the organization).
Infrastructurally, it shifts from two sets of fragmented and point-to-point integrated infrastructure services—one for data and analytics and the other for applications and operational systems—to a well-integrated set of infrastructure for both operational and data systems.
Data warehousing is an activity, supported by a methodology. It has nothing to do with technology directly; it’s about addressing these data-analytical concerns
These four words say NOTHING about technology. Zip. Nada. They describe what functionally needs to happen.
In traditional DWH modeling approaches you still do this work ‘in one go’
But to be fair, that really is a problem:
What about
modelling time vs added value,
reverse engineering,
starting with a data first / data centric architecture?
Agility
https://www.gartner.com/smarterwithgartner/gartner-top-10-data-and-analytics-trends-for-2021/
https://www.slideshare.net/ParisDataEngineers/delta-lake-oss-create-reliable-and-performant-data-lake-by-quentin-ambard
Data Lakehouse: https://www.snowflake.com/guides/what-data-lakehouse
https://databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html
https://medium.com/snowflake/selling-the-data-lakehouse-a9f25f67c906
Delta Lake: https://docs.databricks.com/delta/index.html
Data Mesh:
https://francois-nguyen.blog/2021/03/07/towards-a-data-mesh-part-1-data-domains-and-teams-topologies/
https://martinfowler.com/articles/data-monolith-to-mesh.html
https://dpgmedia-engineering.medium.com/ddd-data-area-at-dpg-media-f0130e4d9766
If something is feasible, then you can do it without too much difficulty. When someone asks “Is it feasible?” the person is asking if you’ll be able to get something done.
= Capable of being done with means at hand and circumstances as they are.
Synonyms: executable, practicable, viable, workable
All this is work that (NOW) often does not happen, yet makes sense to think about, determine and measure!
The teams have the responsibility to provide data that is easily discoverable, understandable, accessible, and usable, known as data products. There are established roles such as data product owners in each cross-functional domain team that are responsible for data and sharing it successfully
This is missing from the book!
Build your own:
Building a decentralized DWH in a database
Using centralized, configurable cloud-native infra (Snowflake, BigQuery, Databricks)
And therefore I don’t believe in cloud DWH as the answer that suddenly makes data warehousing successful. Who said this?