A paradigm shift
in analytical data
management
architecture
© 2020 ThoughtWorks
“Everything starts
with a story...”
- Joseph Campbell (1904-1987)
American professor of literature
and author
[Chart, 2018 vs 2019: 66% ⬆, 19% ⬇]
The inconvenient truth
NewVantage Partners Releases 2019 Big Data and AI Executive Survey (link)
Accelerated
investment in
Big Data/AI
Unproven business results
On every measure
- Create data-driven culture
- Treating data as business assets
- Competing on data and analytics
FAILED
Zhamak Dehghani @zhamakd 2019
OPERATIONAL
DATA
Running the business
and serving the users
ANALYTICAL
(BIG)
DATA
Optimizing the
business and user
experience
Optimized for
application/services logic
Captures current state of
applications
Transactional (CRUD)
Data on the inside -
API-based access
Polyglot - graph database,
no-sql, document store,
relational database
Optimized for analytical
logic - training machine
learning models, create
reports
Historical and read-only
Data on the outside -
events, files, tables
Polyglot: object store, big
table, streams
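The contrast between the two kinds of data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names OperationalStore, AnalyticalLog and Fact, and the claim-1 record, are hypothetical and do not come from the talk. The operational side overwrites current state, while the analytical side only appends timestamped facts.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


class OperationalStore:
    """Hypothetical transactional store: keeps only the current state."""

    def __init__(self):
        self._rows = {}

    def upsert(self, key, value):
        # An update overwrites whatever was stored before.
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


@dataclass(frozen=True)
class Fact:
    """Hypothetical immutable, timestamped record."""
    key: str
    value: str
    recorded_at: datetime


class AnalyticalLog:
    """Hypothetical analytical store: append-only, read as history."""

    def __init__(self):
        self._facts = []

    def append(self, key, value, recorded_at):
        self._facts.append(Fact(key, value, recorded_at))

    def history(self, key):
        # A tuple is returned so callers cannot mutate the history.
        return tuple(f for f in self._facts if f.key == key)


store = OperationalStore()
log = AnalyticalLog()
for day, status in [(1, "submitted"), (2, "approved")]:
    recorded_at = datetime(2020, 1, day, tzinfo=timezone.utc)
    store.upsert("claim-1", status)
    log.append("claim-1", status, recorded_at)

current_status = store.get("claim-1")
status_history = [f.value for f in log.history("claim-1")]
```

The operational store answers "what is the claim's status now", while the log can answer "how did the status change over time", which is the kind of question analytical workloads ask.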
@zhamakd | © 2020 ThoughtWorks
ETL!
Let's orient
What do we really mean by data?
https://martinfowler.com/bliki/DataLake.html
ETL!
Power BI
Google BigQuery
Operational
Data
Analytical
Data
Accessed via SQL
Data Warehouse
https://martinfowler.com/bliki/DataLake.html
ELT!
Operational
Data
Analytical
Data
Accessed via files
Analytical
Data
Accessed via SQL, APIs
Google Cloud Storage
Azure Data Lake Storage
AWS S3
Apache Airflow
Azure Data Factory
Data Lake
https://cloud.google.com/solutions/build-a-data-lake-on-gcp
Operational
Data
ELT!
Analytical
Data
Accessed via files
Analytical
Data
Accessed via SQL, APIs
Data Lake
On Cloud
BIG DATA | AI
PLATFORM
Ubiquitous data
Innovation agenda
Centralized | Monolithic
Hyper-specialized Silo
Hyper-specialized
Data | ML
Engineers
Architecture orthogonal
to the axis of change
Architecture decomposition around pipeline stages
FEATURES
DATA-ORIENTED CHANGE
Disconnected execution
BIG DATA
PLATFORM
HYPER-SPECIALIZED SILO
DELIVERY
CENTRALIZED
ARCHITECTURE
DISCONNECTED
EXECUTION
FAIL TO MATERIALIZE
DATA-DRIVEN VALUE
FAIL TO SCALE
CONSUMERS
FAIL TO SCALE
SOURCES
FAIL TO
BOOTSTRAP
Failure symptoms
Where do we
go from here?
BIG DATA
PLATFORM
What it could look like …
Domains aligned
with shared aggregates
Domains aligned
with the origin of data
Facts & reality of
business
Immutable timed events
Historical snapshots
Change less frequently
Permanently captured
Domains aligned with
the consumption
Fit for consumer
purpose
Aggregation /
Projections /
Transformed
Change more often
Can be recreated
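A minimal sketch of the two kinds of domain data, assuming a hypothetical ClaimEvent record and a hypothetical claims_per_member projection (neither is from the talk). The source-aligned events are immutable and kept permanently; the consumer-aligned view is a derived aggregate that can be thrown away and rebuilt from them at any time.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ClaimEvent:
    """Hypothetical source-aligned fact: immutable and timestamped."""
    claim_id: str
    member_id: str
    kind: str
    occurred_on: date


def claims_per_member(events):
    """Hypothetical consumer-aligned projection.

    It is a pure function of the source events, so it can be
    recreated whenever the consumer's needs change.
    """
    return Counter(e.member_id for e in events if e.kind == "submitted")


# Source-aligned data: captured once, never edited.
SOURCE_EVENTS = (
    ClaimEvent("c1", "m1", "submitted", date(2020, 1, 1)),
    ClaimEvent("c1", "m1", "approved", date(2020, 1, 3)),
    ClaimEvent("c2", "m1", "submitted", date(2020, 1, 4)),
    ClaimEvent("c3", "m2", "submitted", date(2020, 1, 5)),
)

projection = claims_per_member(SOURCE_EVENTS)
rebuilt = claims_per_member(SOURCE_EVENTS)
```

Because the projection holds no state of its own, rebuilding it from the same events always gives the same result.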
Decompose data around domains
Distribute the ownership
Domain Data
Product Owner
Domain Data
Product
SHARED | DISCOVERABLE
TRUSTWORTHY
SELF-DESCRIBING
ADDRESSABLE
INTEROPERABLE
SECURE
Serve data as a product
Delight the consumer with ease of data discovery and use
Domain Data
Product
Polyglot
Input Data Ports
Polyglot
Output Data Ports
Control Ports
Data product is the architecture quantum
Always historical, read-only access to data
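One way to picture a data product with its three kinds of ports is the sketch below. The DataProduct class, its method names and the "records" and "csv" formats are assumptions made for illustration; the talk does not prescribe an API. Input ports receive data, output ports serve read-only copies in more than one format, and the control port exposes metadata about the product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Rows = List[dict]


@dataclass
class DataProduct:
    """Hypothetical data product with input, output and control ports."""
    name: str
    owner: str
    transform: Callable[[Rows], Rows]
    _inputs: Dict[str, Rows] = field(default_factory=dict, init=False)

    def input_port(self, port: str, records: Rows) -> None:
        # Each named input port accumulates the records it receives.
        self._inputs.setdefault(port, []).extend(records)

    def output_port(self, fmt: str = "records"):
        # Output is computed from the inputs and handed out as a copy,
        # so consumers get read-only access to the product's data.
        rows = self.transform([r for rs in self._inputs.values() for r in rs])
        if fmt == "records":
            return tuple(dict(r) for r in rows)
        if fmt == "csv":
            if not rows:
                return ""
            cols = list(rows[0])
            lines = [",".join(cols)]
            lines += [",".join(str(r[c]) for c in cols) for r in rows]
            return "\n".join(lines)
        raise ValueError(f"unsupported output format: {fmt}")

    def control_port(self) -> dict:
        # Metadata a platform or catalog could read.
        return {
            "name": self.name,
            "owner": self.owner,
            "input_ports": sorted(self._inputs),
            "output_formats": ["records", "csv"],
        }


claims = DataProduct(
    name="claims",
    owner="claims-domain-team",
    transform=lambda rows: [r for r in rows if r["status"] == "open"],
)
claims.input_port("cdc", [{"id": 1, "status": "open"},
                          {"id": 2, "status": "closed"}])
```

Serving the same data as records and as CSV stands in for the "polyglot" output ports on the slide; a real product would more likely expose files, tables or streams.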
Enable autonomy
Abstract technical complexity in self-serve data infrastructure
Data Infra Team
Data | ML Infrastructure as a
Platform
Data product blueprint
Data product creation
Unified access patterns
Discoverability
Access Control
Polyglot storage
SLO and monitoring
Pipeline orchestration
Data product CI/CD
Automating Governance
...
Data Infra as a Platform
Global Governance
| Open Standards
Data product
boundaries & definition
guidelines
Define platform fitness
functions
& global services
Model common data
elements & metadata
Domain Data
Product Owners
&
Platform
Product Owners
Build an ecosystem
Create federated, computational governance to enable
interoperability & ecosystem thinking
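"Computational" governance suggests that global rules are checked by code rather than by review meetings. A minimal sketch, assuming a hypothetical descriptor dictionary per data product and a hypothetical POLICIES table (the field names address, schema, owner and access_policy are illustrative, not a standard):

```python
# Hypothetical global policies, each expressed as an executable rule
# over a data product descriptor.
POLICIES = {
    "addressable": lambda d: bool(d.get("address")),
    "self-describing": lambda d: bool(d.get("schema")),
    "owned": lambda d: bool(d.get("owner")),
    "secure": lambda d: d.get("access_policy") in {"restricted", "public"},
}


def check_descriptor(descriptor):
    """Return the names of the policies the descriptor violates."""
    return [name for name, rule in POLICIES.items() if not rule(descriptor)]


compliant = {
    "address": "mesh://claims/daily-snapshots",
    "schema": {"claim_id": "string", "status": "string"},
    "owner": "claims-domain-team",
    "access_policy": "restricted",
}
# Same descriptor with the schema removed.
incomplete = {k: v for k, v in compliant.items() if k != "schema"}

compliant_violations = check_descriptor(compliant)
incomplete_violations = check_descriptor(incomplete)
```

A platform could run a check like this in each data product's CI/CD pipeline, so that domains keep their autonomy while the shared rules are still enforced everywhere.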
Data Infra as a Platform
Global Governance
& Open Standards
Execute
THROUGH
ITERATIONS OF
CONNECTED
INTELLIGENCE
ACTIONS
DATA INTELLIGENCE
Pillars of the Data Mesh paradigm shift
DOMAIN ORIENTED
DECENTRALIZATION
DATA AS A
PRODUCT
SELF-SERVE DATA
INFRA
AS A PLATFORM
FEDERATED
COMPUTATIONAL
GOVERNANCE
Example
Insurance Claims Domain
CALL CENTER
CLAIMS
SYSTEM
CALL
CENTER
CLAIMS
DATA
PRODUCT
CDC INPUT
DATA PORT
CALL CENTER
CLAIMS
DAILY SNAPSHOTS
ONLINE
CLAIMS
SYSTEM
ONLINE
CLAIMS
DATA
PRODUCT
EVENTS INPUT
DATA PORT
ONLINE
CLAIMS
INFINITE EVENTS
LOG
ONLINE
CLAIMS
DAILY SNAPSHOTS
CLAIMS
DATA
PRODUCT
INPUT
DATA PORT
CLAIMS
DAILY SNAPSHOTS
CLAIMS
LIVE EVENTS
MEMBERS DOMAIN
MEMBERS
DATA
PRODUCT
MEMBERS
DAILY SNAPSHOTS
CLAIMS
DATA
PRODUCT
INPUT
DATA PORT
CLAIMS
DAILY SNAPSHOTS
CLAIMS
LIVE EVENTS
INSURANCE CLAIMS DOMAIN
MEMBERS ASSISTANCE DOMAIN
MEMBERS
ASSISTANCE
DATA
PRODUCT
INPUT
DATA PORT
MEMBERS
REQUIRING
ASSISTANCE
DAILY SNAPSHOTS
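The example can be read as a small graph of data products. The sketch below encodes it under one assumption that the slide labels do not state outright: that the claims product consumes the call center and online claims products, and that the members assistance product consumes the claims and members products. The identifiers are hypothetical shorthand for the slide labels.

```python
# Upstream data products per data product (connections are assumed
# from the diagram, not stated in the slide text).
UPSTREAMS = {
    "call-center-claims": [],
    "online-claims": [],
    "claims": ["call-center-claims", "online-claims"],
    "members": [],
    "members-assistance": ["claims", "members"],
}


def lineage(product, upstreams=UPSTREAMS):
    """List every upstream product, each after its own upstreams."""
    ordered = []

    def visit(name):
        for parent in upstreams[name]:
            if parent not in ordered:
                visit(parent)
                ordered.append(parent)

    visit(product)
    return ordered


assistance_lineage = lineage("members-assistance")
claims_lineage = lineage("claims")
```

Walking the graph this way shows which products the members assistance snapshots ultimately depend on, without any central pipeline knowing about all of them.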
What is next
“The best way to
predict the future
is to create it.”
- Alan Kay (1940)
Computer scientist
Paradigm Shift
FROM → TO
Centralized Ownership & Governance → Decentralized Ownership & Federated Governance
Monolithic → Distributed
Pipeline as first-class concern → Domain as first-class concern
Data as a by-product → Data as a product
“A different language is a different vision of life”
(Federico Fellini)
FROM → TO
INGESTING → SERVING
EXTRACT | LOAD | ONBOARD → DISCOVER | CONSUME | LINK
FLOW DATA THROUGH PIPELINES → PUBLISH DATA VIA PORTS
CENTRALIZED LAKE | WAREHOUSE | PLATFORM → ECOSYSTEM OF DATA AS PRODUCTS
https://www.infoq.com/articles/architecture-trends-2020/
“I see the industry’s question of What is Data Mesh, is changing
to How to do Data Mesh in the new year, and of course How to do
Data Mesh Right in the following years as the adoption grows.”
https://martinfowler.com/articles/data-monolith-to-mesh.html
OPERATIONAL
DATA
ANALYTICAL
(BIG)
DATA
ETL!
ANALYTICAL
(BIG)
DATA
Convergence of operational
and analytical worlds
While respecting the different users they serve and the different
technology stacks that support each ...
Zhamak Dehghani
@zhamakd
Thank you

[XConf Brasil 2020] Data mesh