#DataDevOps
A MANIFESTO FOR A DEVOPS-LIKE
CULTURE SHIFT IN DATA &
ANALYTICS
SEBASTIAN HEROLD
DR. ARIF WIDER
2018-06-26
MUNICH
Sebastian Herold
Big Data Architect @ Zalando
@heroldamus
Previously 7 years @ Scout24
TDWI’18 Munich - DataDevOps - Sebastian Herold & Arif Wider
AGENDA
Data Challenges
AI Empowerment
Data Architecture
Data-Driven Company
Data Manifesto
DataDevOps Culture
ZALANDO IN NUMBERS
> 300,000 product choices
~ 4.5 billion EURO revenue 2017
> 75% of visits via mobile devices
> 200 million visits per month
> 23 million active customers
> 15,000 employees in Europe
17 countries
~ 2,000 brands
as at June 2018
ZALANDO IN NUMBERS
> 2 GB/s on Kafka read
> 2000 People in Tech
> 250 Dev Teams
> 2000 MSTR Users
> 260 AWS Accounts
> 150 Data Scientists
DATA CHALLENGES
FRAUD
DATA CHALLENGES
PRICING &
FORECASTING
DATA CHALLENGES
PERSONALISATION
DATA CHALLENGES
SIZING
DATA CHALLENGES
VISUAL SEARCH
DATA CHALLENGES
MANY
MORE
AI EMPOWERMENT
INSTITUTIONALISING
MACHINE LEARNING
AT SCALE
INNOVATION
TOP-DOWN vs BOTTOM-UP
AI SKILLS SHIFT
[Chart: AI maturity from beginners to experts, plotted over time (2017, 2018, 2019)]
FACETS OF INSTITUTIONALISING
Processes, Infrastructure, Data Quality
Education, Marketing, Serving, Events, Data
Metadata, Sharability, Compliance, Connectivity, Guilds
DIFFERENT OFFERS FOR DIFFERENT PEOPLE & STEPS
DATA PRODUCT JOURNEY: Learn, Define, Explore, Extract, Model, Serve, Observe
TEAM AI MATURITY: “LEVEL ZERO”, ANALYTICS & REPORTING, AI EXPERTS
Offers: Basic Training Offers, BI Consulting, AI Consulting, AI Literacy Training, Expert Training, Data Science Guild
Tools: MS Excel, MSTR, Data Catalog incl. metadata, SQL Engine / SuperSet, Kafka, Jupyter Notebook Hub, ETL, RStudio, Shiny, Spark
WHERE WE CAME FROM: DISTRIBUTED DATA PLATFORMS
ZALON’S DATA PLATFORM
FASHION STORE’S DATA PLATFORM
OTHER BUSINESS UNIT’S DATA PLATFORM
BI PLATFORM
OTHER BUSINESS UNIT’S DATA PLATFORM
CENTRAL DATA PLATFORM
WHERE WE WANT TO GO: INTEGRATED DATA PLATFORM
FASHION STORE’S DATA PLATFORM
OTHER BUSINESS UNIT’S DATA PLATFORM
OTHER BUSINESS UNIT’S DATA PLATFORM
CENTRAL DATA PLATFORM
ZALON’S DATA PLATFORM
DATA PLATFORM DESIGN PRINCIPLES
CLOUD FIRST: INNOVATION, SCALABILITY, FLEXIBILITY
DATA FRESHNESS & QUALITY: STREAMING, MICRO-BATCHING
MERGE ANALYTICS AND DATA SCIENCE: BI, AI
EMPOWER CONSUMERS AND PRODUCERS: SELF-SERVICE, RESPONSIBILITIES, TOOLING, ROLES, METADATA, PROCESSES
DATA PLATFORM ARCHITECTURE
Data Sources: DBs, MicroServices, GAP/Others
DATA PLATFORM ARCHITECTURE
Data Sources: DBs, MicroServices, GAP/Others
Ingestion: Event-Bus, Batch or Delta Loads + CDC, Connectors, Data Gateway
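The slide names "Batch or Delta Loads" without further detail. As a rough illustration only, and assuming a watermark-based approach (one common way to do delta loads, not necessarily the one used here), a delta load could be sketched in Python as follows. The `updated_at` field and the `delta_load` function are hypothetical names; CDC, which typically reads a database change log instead, is not shown.

```python
def delta_load(source_rows, last_watermark):
    """Return the rows changed since last_watermark plus the new watermark.

    Assumes every row carries a comparable `updated_at` value
    (a hypothetical field used only for this illustration).
    """
    changed = [row for row in source_rows if row["updated_at"] > last_watermark]
    # If nothing changed, keep the old watermark.
    new_watermark = max(
        (row["updated_at"] for row in changed), default=last_watermark
    )
    return changed, new_watermark


# Hypothetical source table; integers stand in for timestamps.
rows = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 20},
    {"id": 3, "updated_at": 30},
]
changed, watermark = delta_load(rows, last_watermark=10)
```

In this sketch the first run picks up rows 2 and 3 and moves the watermark to 30; a second run with that watermark returns nothing new. A full batch load would simply ignore the watermark and copy every row.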
DATA PLATFORM ARCHITECTURE
Data Sources: DBs, MicroServices, GAP/Others
Ingestion: Event-Bus, Batch or Delta Loads + CDC, Connectors, Data Gateway
Storage: Data Storage, Data Catalog, Model Repo
Legend: Metadata Flow, Data Flow
DATA PLATFORM ARCHITECTURE
Data Sources: DBs, MicroServices, GAP/Others
Ingestion: Event-Bus, Batch or Delta Loads + CDC, Connectors, Data Gateway
Storage: Data Storage, Data Catalog, Model Repo
Processing: Orchestration, Batch Process., Stream Process., Acceleration Layer, SQL Engine, Model Serving
Legend: Metadata Flow, Data Flow
DATA PLATFORM ARCHITECTURE
Data Sources: DBs, MicroServices, GAP/Others
Ingestion: Event-Bus, Batch or Delta Loads + CDC, Connectors, Data Gateway
Storage: Data Storage, Data Catalog, Model Repo
Processing: Orchestration, Batch Process., Stream Process., Acceleration Layer, SQL Engine, Model Serving
Access: BI Tools, SQL/Apps, Notebooks, Data Catalog UI
Legend: Metadata Flow, Data Flow
DATA PLATFORM ARCHITECTURE
Governance: Processes & Glossary
Data Sources: DBs, MicroServices, GAP/Others
Ingestion: Event-Bus, Batch or Delta Loads + CDC, Connectors, Data Gateway
Storage: Data Storage, Data Catalog, Model Repo
Processing: Orchestration, Batch Process., Stream Process., Acceleration Layer, SQL Engine, Model Serving
Access: BI Tools, SQL/Apps, Notebooks, Data Catalog UI
Legend: Metadata Flow, Data Flow
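The architecture distinguishes a data flow from a metadata flow. A minimal Python sketch of that separation might look like the following. It is an assumption-laden illustration, not the actual platform: `DataStorage`, `DataCatalog`, `ingest`, and `query` are hypothetical names, and in-memory dictionaries stand in for the real storage and catalog components.

```python
class DataStorage:
    """Hypothetical in-memory stand-in for the Data Storage component."""

    def __init__(self):
        self._objects = {}

    def write(self, location, records):
        self._objects.setdefault(location, []).extend(records)

    def read(self, location):
        return list(self._objects.get(location, []))


class DataCatalog:
    """Hypothetical catalog: holds metadata only, never the data itself."""

    def __init__(self):
        self._entries = {}

    def register(self, name, location, schema):
        self._entries[name] = {"location": location, "schema": schema}

    def lookup(self, name):
        return self._entries[name]


def ingest(name, records, storage, catalog):
    """Write records to storage (data flow) and describe them in the catalog (metadata flow)."""
    location = f"lake/{name}"
    storage.write(location, records)                # data flow
    schema = sorted(records[0]) if records else []  # field names of the first record
    catalog.register(name, location, schema)        # metadata flow
    return location


def query(name, storage, catalog):
    """Access layer: find the dataset through the catalog, then read it from storage."""
    entry = catalog.lookup(name)
    return storage.read(entry["location"])


storage, catalog = DataStorage(), DataCatalog()
ingest("orders", [{"id": 1, "user": "a"}], storage, catalog)
orders = query("orders", storage, catalog)
```

The point of the sketch is that consumers never need to know where data lives: they ask the catalog by name and follow the location it returns.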
MERGE OF BI AND DATA SCIENCE JOURNEY
BI Product Journey ≈ AI Product Journey
Learn Define Explore Extract Model Serve Observe
MERGE OF BI AND DATA SCIENCE JOURNEY
BI Product Journey ≈ AI Product Journey
Learn Define Explore Extract Model Serve Observe
LET’S FOCUS ON THE TECHNICAL PART !
BI Product
Journey
BI PRODUCT JOURNEY
EXPLORATION
Explore Extract Model Serve Observe
Data Catalog UI
BI PRODUCT JOURNEY
EXTRACTION
Explore Extract Model Serve Observe
BI PRODUCT JOURNEY
MODELING
Explore Extract Model Serve Observe
BI PRODUCT JOURNEY
SERVING
Explore Extract Model Serve Observe
BI PRODUCT JOURNEY
OBSERVING
Explore Extract Model Serve Observe
AI Product
Journey
AI PRODUCT JOURNEY
EXPLORATION
Explore Extract Model Serve Observe
Data Catalog UI
AI PRODUCT JOURNEY
EXTRACTION
Explore Extract Model Serve Observe
AI PRODUCT JOURNEY
MODELING
Explore Extract Model Serve Observe
Model Repo
AI PRODUCT JOURNEY
SERVING
Explore Extract Model Serve Observe
Panda Serving
AI PRODUCT JOURNEY
OBSERVING
Explore Extract Model Serve Observe
THE DATA DRIVEN COMPANY
“The McKinsey Global Institute indicates that data driven organizations
are 23 times more likely to acquire customers, 6 times as likely to retain
those customers, and 19 times as likely to be profitable as a result.”
What does “data-driven” mean?
● Data is a key asset of the company
● All decisions in the company (products and processes) are data-driven, i.e. based on objective data insights
● Data Analytics and Data Science are commonplace in the company
● A company-wide data architecture is in place
● Company-wide data governance rules are in place
Source: https://www.mckinsey.com/business-functions/marketing-and-sales/our-insights/five-facts-how-customer-analytics-boosts-corporate-performance
DATA MANIFESTO
THEMES FOR A
DATA-DRIVEN COMPANY
AT SCALE
METRIC
CONSUMER
DATA LANDSCAPE
DATA
PRODUCER
THEMES FOR DATA AT SCALE
METRIC
CONSUMER
DATA LANDSCAPE
DATA
PRODUCER
THEMES FOR DATA AT SCALE
Autonomy
Alignment
Ownership
Platform
Transparency
METRIC
CONSUMER
DATA LANDSCAPE
DATA
PRODUCER
THEMES FOR DATA AT SCALE
Autonomy
METRIC
CONSUMER
DATA LANDSCAPE
DATA
PRODUCER
THEMES FOR DATA AT SCALE
Autonomy
Alignment
METRIC
CONSUMER
DATA LANDSCAPE
DATA
PRODUCER
THEMES FOR DATA AT SCALE
Autonomy
Alignment
Ownership
METRIC
CONSUMER
DATA PLATFORM
DATA LANDSCAPE
DATA
PRODUCER
THEMES FOR DATA AT SCALE
Autonomy
Alignment
Ownership
Platform
METRIC
CONSUMER
DATA PLATFORM
DATA LANDSCAPE
DATA
PRODUCER
THEMES FOR DATA AT SCALE
Autonomy
Alignment
Ownership
Platform
Transparency
ROLES & RESPONSIBILITIES
AWS
CENTRAL DATA LAKE ON S3
DATA CATALOG
DATA INFRA
PRODUCER: CHECKOUT SERVICE
CONSUMER: SPECIAL OFFER SERVICE
ROLES & RESPONSIBILITIES
AWS
CENTRAL DATA LAKE ON S3
DATA CATALOG
DATA INFRA
ORDER EVENTS
EVENT METADATA
PRODUCER: CHECKOUT SERVICE
CONSUMER: SPECIAL OFFER SERVICE
ROLES & RESPONSIBILITIES
AWS
CENTRAL DATA LAKE ON S3
DATA CATALOG
DATA INFRA
INGESTION TEMPLATE
ORDER EVENTS
EVENT METADATA
PRODUCER: CHECKOUT SERVICE
CONSUMER: SPECIAL OFFER SERVICE
ROLES & RESPONSIBILITIES
AWS
CENTRAL DATA LAKE ON S3
DATA CATALOG
DATA INFRA
INGESTION TEMPLATE
ORDER EVENTS
EVENT METADATA
VIEW: ORDER HISTORY BY USER
PRODUCER: CHECKOUT SERVICE
CONSUMER: SPECIAL OFFER SERVICE
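The producer/consumer split on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not Zalando's implementation: the `OrderEvent` fields, the `EVENT_METADATA` dictionary, and the `order_history_by_user` function are hypothetical names, and an in-memory list stands in for the central data lake on S3.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical order event emitted by the producer (checkout service)."""
    order_id: str
    user_id: str
    total_eur: float


# Hypothetical event metadata the producer would register in the data catalog.
EVENT_METADATA = {
    "name": "order_events",
    "owner": "checkout-service",
    "fields": {"order_id": "string", "user_id": "string", "total_eur": "float"},
}


def order_history_by_user(events):
    """Consumer-side view: group order events by user, keeping arrival order."""
    history = defaultdict(list)
    for event in events:
        history[event.user_id].append(event)
    return dict(history)


# In-memory stand-in for the data lake (an assumption for illustration).
lake = [
    OrderEvent("o-1", "u-1", 59.90),
    OrderEvent("o-2", "u-2", 120.00),
    OrderEvent("o-3", "u-1", 19.95),
]
view = order_history_by_user(lake)
```

The producer owns both the events and their metadata, while the consumer (here, a special offer service) builds its own view on top without needing the producer team's help.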
DATADEVOPS
A CULTURE OF
DISTRIBUTED RESPONSIBILITIES
ABOUT DATA & ANALYTICS
DATADEVOPS
WHAT IS DEVOPS?
Distributed Ops skills
Shared Ops responsibilities
Self-service platforms
Cross-functional dev teams
DATADEVOPS
WHAT IS DATADEVOPS?
Distributed Data skills
Shared Data responsibilities
Self-service Data platform
Cross-functional product teams
Consequences for Product Teams
‣ Think about data & reporting
‣ Deliver your data to the lake
‣ Provide metadata
‣ Eat your own dog food: Consume your own data
Benefits for Product Teams
‣ Work with data independently
‣ No dependencies on data teams
‣ It’s easy to consume data produced by other teams
‣ Faster product & measurement iterations
THANKS!
QUESTIONS?

DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics