A talk given by Dr. Arif Wider (ThoughtWorks) and Sebastian Herold (Zalando) at OOP 2018 in Munich.
Abstract:
More and more companies migrate their monolithic applications to a microservices architecture. However, maintaining a consistent and usable data landscape has only become more challenging as a result: huge amounts of structured and unstructured data, and hundreds of data sources.
Furthermore, data-driven product development multiplies the analytics requirements: every product team needs constantly updated and specially tailored metrics, which often combine product-specific data with company-wide data.
Having a centralized data team does not scale in this setting as it becomes the bottleneck between data producers and data consumers.
We created a Manifesto of seven principles that break with the traditional separation of roles and show a path for dealing with distributed data in a federated and scalable fashion. This leads to DataDevOps: a culture shift, similar to DevOps, in which application developers own their data and take over responsibilities for data & analytics.
Learn about our experiences and best practices with facilitating this cultural transformation at Scout24, the provider of Europe’s largest online markets for cars and real estate.
DataDevOps - A Manifesto on Shared Data Responsibility in Times of Microservices - Dr. Arif Wider
A talk by Sebastian Herold (Scout24) and Arif Wider (ThoughtWorks)
Abstract:
More and more companies successfully migrate their monolithic applications to a microservices architecture. However, maintaining a consistent and usable data landscape has only become more challenging as a result: unstructured data, huge amounts of data, and hundreds of data sources. Having a centralized data team does not scale in this setting, as it becomes the bottleneck between application developers and business analysts.
We created a Data Manifesto of seven principles that break with traditional role separations and show a path for dealing with distributed data in a federated and scalable fashion. This leads to DataDevOps: a culture where application developers also own their data. Learn about our experiences facilitating this cultural transformation at Scout24, the provider of Europe’s largest online marketplaces for cars and real estate.
Hear from our executive team about: key developments in the industry, focusing on data quality, internationalization, and business intelligence; recent accomplishments in Standards and key upcoming projects; an IDW platform update, including an outline of the next-generation platform's new features and the schedule for rollout; an introduction to the powerful features of the new IDX platform and the transition schedule; and a summary of IDEA’s roadmap for the coming years.
What’s the problem? The data is in silos. Business and IT are both demanding a unified view of data to help provide solutions to today’s business challenges, but you can’t use the tools and technologies that created the problem to solve the problem. Enter the Multi-Model database. In this session John Biedebach introduces a trusted and secure approach to data integration using Multi-Model databases. The data we want to integrate has already been modeled, so we’ll discuss how to load information as-is into a Multi-Model database to leverage the models that already exist in the data. We’ll then apply our own models to our data in place to rapidly deliver answers to business questions while providing value from harmonized information directly to consumers. We’ll also discuss the characteristics of a Multi-Model database and the benefits of a Multi-Model approach, including:
How to get unified views across disparate data models and formats within a single database
The benefits of a single product vs multi-product Multi-Model approach to data integration
The importance of agility in data access and delivery through APIs, interfaces, and indexes
How to scale a multi-model database while still providing ACID capabilities and security
How to determine where a multi-model database fits in your existing architecture
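To make the "one database, several models" idea above concrete, here is a minimal toy sketch in plain Python (a hypothetical in-memory store, not any real multi-model product): the same records are loaded as-is and can then be queried both as documents and as a graph. The `_id` naming convention used to derive edges is an assumption for illustration only.

```python
# Toy illustration (NOT a real multi-model database): the same records,
# loaded "as-is", are queryable through two different models.
class TinyMultiModelStore:
    def __init__(self):
        self.docs = {}    # document model: id -> record dict
        self.edges = []   # graph model: (source, relation, target) triples

    def load(self, doc_id, doc):
        """Load a record as-is; index its foreign keys as graph edges."""
        self.docs[doc_id] = doc
        for key, value in doc.items():
            # Assumed convention: fields ending in "_id" reference other records.
            if key.endswith("_id"):
                self.edges.append((doc_id, key[:-3], value))

    def find_docs(self, **criteria):
        """Document-style query: exact match on fields."""
        return [d for d in self.docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]

    def neighbors(self, doc_id, relation):
        """Graph-style query: follow edges with the given relation."""
        return [dst for src, rel, dst in self.edges
                if src == doc_id and rel == relation]


store = TinyMultiModelStore()
store.load("o1", {"type": "order", "total": 99, "customer_id": "c7"})
store.load("c7", {"type": "customer", "name": "Ada"})

orders = store.find_docs(type="order")      # document view of the data
owner = store.neighbors("o1", "customer")   # graph view of the same data
```

The point of the sketch is the session's core claim: one copy of the data, several models applied to it in place, rather than separate document and graph products each holding their own copy.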
During this Big Data Warehousing Meetup, we discussed how graph databases work, shared some real world use cases, and showed a live demo of the world’s leading graph database, Neo4J. Pitney Bowes demonstrated their new MDM product developed on a graph database.
For more information, check out the other slides from this meetup or visit our website at www.casertaconcepts.com
Collaborative Data UX Design - Virtually and Physically - Datentreiber
Many data products fail, partly because users do not understand or accept the software. To avoid this, analytics solutions such as KPI dashboards should be designed together with their users; this is especially true for the user interface.
At the Data Brain Meetup, Martin Szugat from Datentreiber showed three wireframing tools for sketching UI designs collaboratively with users:
1) the virtual collaboration tool Miro,
2) the PowerPoint add-on PowerMockup and
3) the physical Dashboard Wireframing Kit.
This is a copy of http://www.isim.ac.in/Infovision%202012/presentations/sunilshirguppilinkedin.pdf, saved for archival purposes. All rights reserved by the source linked above.
Denodo DataFest 2016: Metadata and Data: Search and Exploration - Denodo
Watch the full session: Denodo DataFest 2016 sessions: https://goo.gl/ptQMW7
What matters most to analysts and decision makers is finding the right data within seconds. Data virtualization incorporates a rich metadata catalog and a graphical interface for self-service users.
In this session, you will learn:
• How to discover, search, explore, curate and share trusted data assets in a governed manner
• How to view and utilize the complete lineage of data assets
• Ways to infer patterns in data and metadata
This session is part of the Denodo DataFest 2016 event. You can also watch more Denodo DataFest sessions on demand here: https://goo.gl/VXb6M6
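The "complete lineage" bullet above can be pictured with a small sketch (hypothetical asset names, plain Python, not the Denodo API): lineage is a directed dependency graph, and viewing an asset's full lineage means collecting everything it transitively depends on.

```python
# Hedged toy example: model data lineage as a directed graph and collect
# the full upstream lineage of one asset by transitive traversal.
def upstream_lineage(asset, dependencies):
    """Return every asset that `asset` transitively depends on.

    `dependencies` maps an asset name to the list of assets it is
    directly derived from (one edge per transformation step).
    """
    seen, stack = set(), list(dependencies.get(asset, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dependencies.get(current, []))
    return seen


# Hypothetical catalog: a dashboard built on a view over two tables.
deps = {
    "sales_dashboard": ["sales_view"],
    "sales_view": ["orders_table", "customers_table"],
    "orders_table": ["orders_source"],
}
lineage = upstream_lineage("sales_dashboard", deps)
```

A real metadata catalog stores these edges automatically as views are defined; the traversal itself is no more than this.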
Denodo Data Innovation Award: Creating a Logical Data Fabric to Digitize City... - Denodo
Watch full webinar here: https://bit.ly/3J22EjE
The much-sought-after Denodo Data Innovation Award: who will be the winner this year? Listen as two customers duel it out. You determine the winner.
Design Thinking for Data Superwomen & Supermen - Datentreiber
Martin Szugat from Datentreiber held a keynote at the Predictive Analytics World Business Conference on November 14th, 2018 to share his knowledge on how to transform a business with an individual and successful data strategy for interdisciplinary teams.
As one of the largest processors and controllers of global information, IBM has embarked on a global program towards GDPR compliance readiness. Using the same methodology, services, and solutions as it does with clients, this session will demonstrate how this process can serve as a model for GDPR readiness at any large enterprise, and how that model can then be a basis for complying with other regulatory requirements and a framework for future business transformation and opportunity. Specifics will include:
• A summary of the needs and opportunities of the GDPR regulation
• With the time left, where you are and what can still be done
• A prescriptive phased methodology of execution
• Core solution technical measures and capabilities
• Key GDPR actionable outcomes by stakeholder
The focus is on discovering, mapping, and managing personal data for GDPR, along with data protection and compliance, on Hadoop in a sustainable way.
Speaker
Richard Hogg, Global GDPR Evangelist, IBM
Multi-Cloud Data Integration with Data Virtualization (APAC) - Denodo
Watch full webinar here: https://bit.ly/3cnw5MW
More and more organizations are adopting multi-cloud strategies to provide greater flexibility, cost savings, and performance optimization. Even when organizations commit to a single cloud provider, they often have data and applications spread across different cloud regions to support different business units or geographies. The result is a highly distributed infrastructure that makes finding and accessing the data needed for reporting and analytics even more challenging.
The Denodo Platform Multi-Location Architecture provides quick and easy managed access to data while still providing local control to the 'data owners' and complying with local privacy and data protection regulations (think GDPR and CCPA!).
In this on-demand session, you will learn about:
- The challenges facing organizations as they adopt multi-cloud data strategies
- How the Denodo Platform provides a managed data access layer across the organization
- The different multi-location architectures that can maximize local control over data while still making it readily available
- How organizations have benefited from using the Denodo Platform as a multi-cloud data access layer
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics - Dr. Arif Wider
A talk by Sebastian Herold & Dr. Arif Wider at TDWI 2018 Munich.
Abstract:
More and more companies migrate their monolithic applications to a microservices architecture. However, maintaining a consistent and usable data landscape has only become more challenging as a result: huge amounts of structured and unstructured data, and hundreds of data sources.
Furthermore, data-driven product development multiplies the analytics requirements: every product team needs constantly updated and specially tailored metrics, which often combine product-specific data with company-wide data.
Having a centralized data team does not scale in this setting as it becomes the bottleneck between data producers and data consumers.
We created a Manifesto based on five general themes that break with the traditional separation of roles and show a path for dealing with distributed data in a federated and scalable fashion. This leads to DataDevOps: a culture shift, similar to DevOps, in which application developers own their data and take over responsibilities for data & analytics.
Learn about our experiences and best practices with facilitating this cultural transformation at Zalando, one of Europe's largest online fashion platforms.
This presentation covers everything a beginner learning Power BI must know: why Power BI is the best choice for data visualization, what it is used for, what the career opportunities are, and more.
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L... - Tyler Wishnoff
Simplify data lake governance, no matter how much data you work with and how many data sources and BI tools you manage. This presentation offers all you need to develop your own strategy for smarter data lake governance. Learn more at: https://kyligence.io/
‘Edge’ Technologies: a new language of innovation - DXC Eclipse
Microsoft Dynamics 365: Continue Your Transformation Journey.
‘Edge’ Technologies: a new language of innovation.
We will demystify the new language of ‘Edge’ Technologies – Common Data Service, Power Apps, Flow, Cortana Intelligence Suite, BOT Framework.
Presented by Henrik Mozart - Senior Technology Specialist, DXC Eclipse
Powering Asurion's Connected Home Platform with Spark Structured Streaming, D... - Databricks
Asurion’s Connected Home simplifies the complexities of operation, setup, and management of connected devices and services.
Leveraging the latest advancements in machine learning, big data, and real-time processing, we have built a platform capable of keeping the connected world connected and continually learning.
Solving this technical challenge requires running multiple continuous applications capable of transactional data storage, multi-level aggregations, joins, and execution of ML models, with data privacy and security at the core.
The Asurion Connected Home platform achieves this goal by using Spark Structured Streaming, Delta Lake, and MLflow on Databricks.
Data Thinking Preview - Predictive Analytics World for Industry 4.0 - Datentreiber
Now is the time to strengthen your skills and knowledge: in the virtual workshop “Data Thinking”, we give you the chance to learn a proven method and free open-source tools for designing useful data products and more successful data projects. With this free preview, you can get an overview of the workshop and an insight into how to apply design thinking to data science & analytics.
Read the full post at https://www.fourquadrant.com/gartner-go-to-market-strategy/
Gartner's IT Predictions
Key technology drivers that will impact go-to-market strategy and tactics include intelligent things, the collection of massive amounts of data, and artificial intelligence and machine learning.
Gartner identifies 3 key themes that form the basis for the Top 10 strategic technology trends:
- Intelligent
- Digital
- and Mesh
The technologies noted above are at the front end of the technology adoption curve, but they are expected to break out of their emerging state and stand to have substantial disruptive potential across industries.
Read Pragmatic Posts on B2B Marketing - https://www.fourquadrant.com/marketing-resource-blog/
Download Go to Market Templates (FREE) - https://www.fourquadrant.com/marketing-tempates/
View the Go to Market PowerPoint Slide Library - https://www.fourquadrant.com/marketing-slides/
Leverage Proven Go to Market Planning Templates - https://www.fourquadrant.com/products/
What makes it worth becoming a Data Engineer? - Hadi Fadlallah
This presentation explains data engineering for non-computer-science students and why it is worth becoming a data engineer. I used this presentation while working as an on-demand instructor at Nooreed.com.
Data DevOps - Arif Wider and Sean Gustafson (ThoughtWorks Live) - Thoughtworks
To support the successful production, consumption and governance of data needed to establish a data-driven product team, Scout24 (Europe’s largest online marketplace for cars and real estate) and ThoughtWorks created a manifesto of seven principles for DataDevOps.
The Scout24 Data Landscape Manifesto: Building an Opinionated Data Platform - Rising Media Ltd.
The Scout24 Data Landscape Manifesto is the formalization of our opinions on how a successful data-driven company should approach data. In a truly data-driven company, no manager, no salesperson, no engineer and no data scientist can do their job properly without easy access to large amounts of high-quality data. It is Sean's mandate to create a platform that encourages the production of high-quality data and enables engagement with data by all employees. He and his team are opinionated about how all producers and consumers of data need to be active participants in the data platform, to make data-driven decisions and to be responsible for the data they produce. And he built the data platform with 'nudges' that reward data usage that matches his vision for a data-driven company. In this talk, Sean will present the Scout24 Data Landscape Manifesto and will show how the strong opinions it contains enabled him to successfully migrate from a classic centralized data warehouse to a decentralized, scalable, cloud-based data platform at AutoScout24 and ImmobilienScout24 that is core to their analytics and machine learning activities.
The Briefing Room with Dr. Robin Bloor and RedPoint Global
Live Webcast Jan. 13, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=c847c54220dfb80841f3e0c63664fd08
Context is king in the realm of Big Data. With enough perspective on a customer or prospect, organizations can fine-tune their offerings in game-changing ways. Today's cutting-edge companies are viewing their customers within the context of a decade or more of interactions, and across multiple channels. How so? Real-time integration with social media and other customer channels can now result in actionable insights with serious potential.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor, as he describes the changing landscape of data flow, and how that impacts enterprise responsiveness. He'll be briefed by George Corugedo of RedPoint Global, who will explain how companies are leveraging Hadoop's YARN architecture to deliver a whole new array of highly responsive, data-driven enterprise applications. He'll demonstrate how RedPoint's platform running inside Hadoop can enable a wide range of both real-time and strategic data management functionality, all of which can be applied to any number of critical business processes.
Visit InsideAnalysis.com for more information.
By Thoughtworks | Building data as a product: The key to unlocking Data Mesh'... - Ingrid Buenaventura
Building data as a product: The key to unlocking Data Mesh's potential
Data as a product is an exciting concept. It brings product thinking to datasets and facilitates a data-driven culture by encouraging teams to share data rather than keep it in a silo. Once we are convinced of the philosophy of data as a product, the immediate question is how to build one.
In this talk we will uncover ways to design and architect data as a product that meets the needs of a business use case. We will also discuss creating a blueprint for better resiliency via contracts and service level objectives.
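The "contracts and service level objectives" idea mentioned above can be sketched in a few lines of plain Python. This is a hedged toy example with a hypothetical schema and field names, not the speakers' actual framework: a data product publishes a contract, records are validated against it before being shared, and a simple freshness SLO guards resiliency for consumers.

```python
# Toy data contract (hypothetical schema): field name -> expected type.
DATA_CONTRACT = {
    "order_id": str,
    "amount": float,
    "currency": str,
}


def contract_violations(record, contract=DATA_CONTRACT):
    """Return a list of contract violations for one record (empty = valid)."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


def freshness_slo_met(minutes_since_update, slo_minutes=60):
    """A minimal service level objective: data must be fresher than the SLO."""
    return minutes_since_update <= slo_minutes


good = contract_violations({"order_id": "o1", "amount": 9.5, "currency": "EUR"})
bad = contract_violations({"order_id": "o1", "amount": "9.5"})
```

In practice the contract would be versioned and enforced at the data product's boundary, so consumers can depend on its shape and freshness without talking to the producing team.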
Speakers: Harmeet Sokhi, Lead Data Consultant, Thoughtworks and Vishal Srivastava, Senior Data Engineer, Thoughtworks
Harmeet has extensive experience in Cloud, data engineering and machine learning operations. She has worked on designing large enterprise-scale data applications and has also implemented mature machine learning engineering solutions for clients in several industries. She is always in the pursuit of learning and keeps herself current in the ever-changing technology landscape. She is an experienced team leader who helps address challenges, both technical and non-technical, to deliver highly credible results.
Vishal is a Senior Data Engineer with DevOps skills who has worked across a range of industries. He has experience in establishing cloud infrastructure foundations, event-driven data lake, data visualisation, master data management, data quality and data governance frameworks. He is passionate about real time event driven distributed systems. Vishal has used these experiences to enable use cases which help businesses realise real value from data.
From the Data Work Out event:
Performant and scalable Data Science with Dataiku DSS and Snowflake
Managing the whole process of setting up a machine learning environment from end-to-end becomes significantly easier when using cloud-based technologies. The ability to provision infrastructure on demand (IaaS) solves the problem of manually requesting virtual machines. It also provides immediate access to compute resources whenever they are needed. But that still leaves the administrative overhead of managing the ML software and the platform to store and manage the data.
A fully managed end-to-end machine learning platform like Dataiku Data Science Studio (DSS) that enables data scientists, machine learning experts, and even business users to quickly build, train and host machine learning models at scale, needs to access data from many different sources and can also access data provided by Snowflake. Storing data in Snowflake has three significant advantages: a single source of truth, shorten the data preparation cycle, scale as you go.
Emerging Trends in Multimodal Data Collection - Miovision Fall 2016Miovision
Miovision has been collecting data for public agencies and engineering firms around the world for almost a decade. Our huge database of traffic counts and classifications, alongside our relationships with these agencies give us a unique perspective on emerging trends.
In this webinar our Product Marketing Manager, Cam Davies, will be sharing his insights on the globally emerging trends in multimodal data collection and how Miovision is innovating their products to meet market demand.
Trends in Bike and Pedestrian Data Collection
Innovations in Data Collection Equipment
Modernized Traffic Data Management
These slides from EMA VP of Research, Shawn Rogers, and Actian VP of Solutions & Product Marketing, John Santaferraro will help you:
Learn how your peers are using Big Data for success to create transformational value
Discover how diverse data and platform ecosystems create opportunity and complexity
Examine the key differences between early Big Data projects and Big Data 2.0 projects
Explore where Big Data is heading in the future and how innovative companies will enable execution
Identify the gaps and challenges to success as well as the drivers for change and opportunity.
Getting started with Hadoop on the Cloud with BluemixNicolas Morales
Silicon Valley Code Camp -- October 11, 2014.
Session: Getting started with Hadoop on the Cloud.
Hadoop and Cloud is an almost perfect marriage. Hadoop is a distributed computing framework that leverages a cluster built on commodity hardware. The Cloud simplifies provisioning of machines and software. Getting started with Hadoop on the Cloud makes it simple to provision your environment quickly and actually get started using Hadoop. IBM Bluemix has democratized Hadoop for the masses! This session will provide a brief introduction to what Hadoop is, how does cloud work and will then focus on how to get started via a series of demos. We will conclude with a discussion around the tutorials and public datasets - all of the tools needed to get you started quickly.
Learn more about BigInsights for Hadoop: https://developer.ibm.com/hadoop/
Data has been around for a long time. But only in two formats ANALOG and DIGITAL. Recently at an ever increasing rate DIGITAL DATA is growing exponentially year over year. Understand the best practice in Data Integration.
Data and its Role in Your Digital TransformationVMware Tanzu
Data plays a big role in building the kinds of experiences demanded by the market today. In this session, we’ll unpack what goes into building a data-driven app, case studies of how organizations have successfully overcome siloed data and analytics to bring new predictive features into their applications, and what your next steps for data should be on your digital transformation journey.
Speaker: Les Klein, EMEA CTO Data, Pivotal
Role of Data in Digital TransformationVMware Tanzu
Data plays a big role in building the kinds of experiences demanded by the market today. In this session, we’ll unpack what goes into building a data-driven app, case studies of how organizations have successfully overcome siloed data and analytics to bring new predictive features into their applications, and what your next steps for data should be on your digital transformation journey.
Speaker: Les Klein, EMEA CTO Data, Pivotal
Data Integration for Both Self-Service Analytics and IT Users Senturus
See a cloud solution that enables data integration for applications such as Salesforce, NetSuite, Workday, Amazon Redshift and Microsoft Azure. View the webinar video recording and download this deck: http://www.senturus.com/resources/data-integration-tool-for-both-business-and-it-users/.
The rapid growth in self-service business analytics has created tremendous value for organizations, but in many cases has created tension between technical and business users. Technical teams have built solid data warehouses filled with trusted data from source systems such as sales, finance, and operations. Business teams are gaining tremendous insights by analyzing data warehouse information with traditional and new data discovery tools such as Cognos, Business Objects, Tableau, and Power BI.
The Informatica Cloud is a best-of-both-worlds solution that combines data integration for both business and IT users. It allows the following: 1) IT incorporates the business analyst’s data integration routines into the core, trusted data warehouse, 2) Business analysts can do data integration from both cloud-based and on-premise data sources, 3) Business analyst can use the industrial-strength data integration engine that IT teams have loved for years and 4) Integration for apps such as Salesforce, NetSuite, Workday, Amazon Redshift, Microsoft Azure, Marketo, SAP, Oracle and SQL Server.
Senturus, a business analytics consulting firm, has a resource library with hundreds of free recorded webinars, trainings, demos and unbiased product reviews. Take a look and share them with your colleagues and friends: http://www.senturus.com/resources/.
Big Data in Action – Real-World Solution ShowcaseInside Analysis
The Briefing Room with Radiant Advisors and IBM
Live Webcast on February 25, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=53c9b7fa2000f98f5b236747e3602511
The power of Big Data depends heavily upon the context in which it's used, and most organizations are just beginning to figure out where, how and when to leverage it. One key to success is integration with existing information systems, many of which still rely on relational database technologies. Finding ways to blend these two worlds can help companies generate measurable business value in fairly short order.
Register for this episode of The Briefing Room to hear Analysts Lindy Ryan and John O'Brien as they explain how the combination of traditional Business Intelligence with Big Data Analytics can provide game-changing results in today's information economy. They'll be briefed by Eric Poulin and Paul Flach of Stream Integration who will share best practices for designing and implementing Big Data solutions. They'll discuss the components of IBM BigInsights, and explain how BigSheets can empower non-technical users who need to explore self-structured data.
Visit InsideAnlaysis.com for more information.
IDC Portugal | Como Libertar os Seus Dados com Virtualização de DadosDenodo
Watch full webinar here: https://bit.ly/3w1LoDi
Os dados se tornaram o ativo mais crítico para qualquer empresa ter sucesso nesta era de transformação digital.
Nesta sessão, Paul Moxon da Denodo irá explicar como funciona a virtualização de dados e como pode ajudar as organizações a responder melhor às necessidades de negócios, integrando dados de várias fontes de dados, também minimizando custos e tempo, e aumentando a quantidade de dados disponibilizados em geral.
Para melhor compreensão, Mariana Pinto da Passio Consulting apresentará uma demonstração ao vivo da Plataforma Denodo.
¿Cómo las manufacturas están evolucionando hacia la Industria 4.0 con la virt...Denodo
Watch full webinar here: https://bit.ly/3cbpipB
Uno de los sectores en los que la transformación digital está teniendo un efecto más disruptivo es el de la fabricación. Líderes del sector manufacturero están apostando por el Big Data, la computación en la nube, la inteligencia artificial y el Internet de las Cosas (IoT) entre otras tecnologías, además de contemplar la llegada de la 5G, con el fin de:
- Automatizar los procesos de manera eficiente, para permitir una mayor producción en menor tiempo
- Crear valor añadido en los productos manufacturados
- Conectar la planta industrial con el punto de venta
- Impulsar el análisis en tiempo real de datos provenientes de diferentes cadenas de producción
Sin embargo, para alcanzar estos objetivos y llevar a cabo esta revolución tecnológica, también conocida como industria 4.0, las manufacturas tienen que enfrentarse a una serie de desafíos no negligentes. El sector industrial es el que genera más datos en el mundo, y en la era digital, la velocidad, la diversidad y el volumen exponencial de los datos pueden superar las arquitecturas de TI tradicionales. Además, la mayoría de los fabricantes se enfrentan a silos de datos, lo que hace que su tratamiento sea lento y costoso. Necesitan entonces una plataforma de TI fiable que permita integrar, centralizar y analizar datos de distintas fuentes y diferentes formatos de manera ágil y segura para poner la información al servicio del negocio.
Los expertos de Enki y Denodo te proponen este seminario online para descubrir qué es la virtualización de datos, y por qué líderes del sector apuestan por esta tecnología innovadora para optimizar su estrategia de TI y conseguir un ROI significativo gracias a un acceso más rápido, simple y unificado a los datos industriales.
RWDG Slides: Building Data Governance Through Data StewardshipDATAVERSITY
Data stewards play an important role in Data Governance solutions. That is why it is critical that organizations get data stewardship right when setting up their program. The data is governed by people. Some people will even tell you that the discipline should be called people governance.
Bob Seiner has a lot to say on this subject. In this RWDG webinar, Bob shares the reasons why you must build your Data Governance program through the stewardship of the data. There is no governance without formal accountability for data. People become stewards when their relationship to data is formalized. It is the only way.
This webinar will focus on:
• The definition of data stewardship that MUST be adopted
• The critical role stewardship plays in governing data
• What it means to formalize accountability
• Why everybody in the organization is a data steward
• How to build Data Governance through stewardship
CIO priorities and Data Virtualization: Balancing the Yin and Yang of the ITDenodo
Watch here: https://bit.ly/3iGMsH6
Today’s CIOs carry a paradoxical responsibility of balancing the yin and yang of the Business – IT interface. That is, "Backroom IT’s quest for Stability" with the “Frontline Business’ need for Agility".
A paradox that is no longer optional, but is essential. A paradox that defines the business competitiveness, business survival, and business sustainability. Also enables the visibility to the fuzzy future.
“Trusted Data Foundation with Data Virtualization” provides a powerful ammunition in the hands of the CIO, to effectively balance these Yin and Yang at the speed of the business. In a trusted, compliant, auditable, flexible and regulated fashion.
Find out more on how you can enhance the competitive edge for your business in the CIO special webinar from COMPEGENCE and DENODO.
Similar to DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics (20)
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - europe’s biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture and will outline the ongoing efforts to make creation of data products as simple as applying a template.
Continuous Intelligence: Keeping your AI Application in ProductionDr. Arif Wider
A talk by Arif Wider & Emily Gorcenski presented at NDC Porto '20
Abstract:
It is already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, Continuous Delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months. Nevertheless, in the data science world Continuous Delivery is rarely been applied holistically.
This is partly due to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as-is to machine learning projects. Learn how we used our expertise in both fields to adapt practices and tools to allow for Continuous Intelligence–the practice of delivering AI applications continuously.
Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...Dr. Arif Wider
A talk about applying Continuous Delivery to Machine Learning (CD4ML) presented by Arif Wider from ThoughtWorks at NDC Sydney Conference 2019.
Abstract:
It is already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, Continuous Delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months. Nevertheless, in the data science world Continuous Delivery is rarely been applied holistically.
This is partly due to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as-is to machine learning projects. Learn how we used our expertise in both fields to adapt practices and tools to allow for Continuous Intelligence–the practice of delivering AI applications continuously.
Continuous Intelligence: Moving Machine Learning into Production ReliablyDr. Arif Wider
A workshop by Danilo Sato, Christoph Windheuser, Emily Gorcenski, and Arif Wider, given at Strata Data Conference 2019 in London.
Abstract:
So you want to include a machine learning component in your IT systems? The process is a little more involved than clicking through an AI tutorial on your laptop. It’s not just the first working model you run that you need to consider; you also need to think about things like integration, scaling, and testing. What’s more, postlaunch, you’ll want to continuously adapt your model to respond to the changing environment.
ThoughtWorks pioneered continuous delivery—a set of tools and processes that ensure that software under development can be reliably released to production at any time and with high frequency.
Danilo Sato and Christoph Windheuser demonstrate how to apply continuous delivery to machine learning—what’s known as continuous intelligence. In a live scenario, you’ll change a machine learning model in a development environment, test its new performance, and, depending on the outcome, automatically deploy the new model into a production environment. The tech stack for this scenario will be Python, DVC (Data Science Version Control), and GoCD.
Continuous Intelligence: Keeping your AI Application in ProductionDr. Arif Wider
A talk by Emily Gorcenski and Arif Wider presented a Strata Data Conference 2019 in London.
Abstract:
It’s already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, continuous delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months.
Nevertheless, in the data science world, continuous delivery is rarely applied holistically—due in part to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as is to machine learning projects.
Arif Wider and Emily Gorcenski explore continuous delivery (CD) for AI/ML along with case studies for applying CD principles to data science workflows. Join in to learn how they drew on their expertise to adapt practices and tools to allow for continuous intelligence—the practice of delivering AI applications continuously.
Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...Dr. Arif Wider
How we applied continuous delivery to data science to create a high-performance & quickly evolving data product. Presented at Predictive Analytics World Business London 2016 by Arif Wider (ThoughtWorks) and Christian Deger (AutoScout24).
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
Even though at surface level ‘java.lang.OutOfMemoryError’ appears as one single error; underlyingly there are 9 types of OutOfMemoryError. Each type of OutOfMemoryError has different causes, diagnosis approaches and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Strategies for Successful Data Migration Tools.pptxvarshanayak241
Data migration is a complex but essential task for organizations aiming to modernize their IT infrastructure and leverage new technologies. By understanding common challenges and implementing these strategies, businesses can achieve a successful migration with minimal disruption. Data Migration Tool like Ask On Data play a pivotal role in this journey, offering features that streamline the process, ensure data integrity, and maintain security. With the right approach and tools, organizations can turn the challenge of data migration into an opportunity for growth and innovation.
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
1. DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
Dr. Arif Wider & Sebastian Herold
Munich, Feb 7th, 2018
2. Dr. Arif Wider
- Senior Consultant/Dev
- Scala/FP enthusiast
- ThoughtWorks Germany data strategy group
@arifwider
Sebastian Herold
- Chief Data Architect @Scout24 until Dec
- BigData Architect @Zalando from Jan
- Data Evangelist
@heroldamus
3. Road to Microservice Architecture – How we started in 2007
[Diagram: the 2007 monolith — a Web Tier and Middle Tier on top of a Core DB; Core DB and CRM feed a Staging area and the DWH; a BI Tool on the DWH serves the Analyst and is maintained by the BI Dev]
DataDevOps – Data Manifesto | Sebastian Herold & Arif Wider
4. Road to Microservice Architecture – How things got complicated in 2011
[Diagram: 2011 — alongside the monolith (Web Tier, Middle Tier, Core DB), a first standalone APP with its own MySQL store and a revenue-generating ($$$) API appears; CRM, Staging, DWH and the BI Tool still serve the Analyst and BI Dev]
5. Road to MicroService Architecture – How we sliced the monolith in 2013
[Diagram: the monolith is sliced into services with polyglot storage (APP/MySQL, EXP/Mongo, SEA/Elastic) kept in sync via a Sync APP; a Hadoop cluster with a REST API in front collects event data; Analysts, BI Devs, and Data Engineers (DE) work against DWH and Hadoop]
6. Road to MicroService Architecture – How a central data team doesn’t scale
[Diagram: by 2015, ever more apps (partly on AWS) feed the landscape; all data flows into DWH and Hadoop through the central team of BI Devs and Data Engineers, which becomes the bottleneck between data producers and Analysts]
7. Road to MicroService Architecture – How we rearchitected our Data Landscape
[Diagram: in 2017, a Central Data Lake on S3 becomes the leading integration point; apps, Core DB, CRM, and the REST API feed the lake; DWH and BI Tool sit on top for Analysts, supported by a merged team of Data Engineers and BI Devs]
8. Scout24 wants to become a truly data-driven company
Fast & easy data-driven product development… …supported by Data & Analytics
9. Scout24 wants to become a truly data-driven company
Everywhere in the company… …without bloating up D’n’A
Image source: https://www.oddsemiconductorservices.com/
11. SCOUT24 DATA LANDSCAPE MANIFESTO
#1 Preamble
Data is a key asset of our company.
12. SCOUT24 DATA LANDSCAPE MANIFESTO
#2 Our Responsibility
We, Data & Analytics, are responsible for providing a solid Data Platform as well as clear guidelines and training on how to participate in the Data Landscape.
[Diagram: D’n’A provides the Data Platform within the Data Landscape]
13. SCOUT24 DATA LANDSCAPE MANIFESTO
#3 Data Autonomy, Not Anarchy
Data autonomy puts data producers & data consumers in control of their data & of their metrics and thereby allows us to be data-driven at scale, but this comes with responsibility.
[Diagram: Data Producers and Consumers interact on the Data Platform provided by D’n’A within the Data Landscape]
14. Roles & Responsibilities
[Diagram: a Checkout service (Producer) and a Special offer service (Consumer) interact via the Central Data Lake on S3 and a Data Catalog, both provided by D’n’A]
15. SCOUT24 DATA LANDSCAPE MANIFESTO
#4 Producer’s Responsibility
Data producers are responsible for publishing data to the central Data Lake, for the data's quality, and for publishing metadata that makes it easy to find and consume the data.
[Diagram: the Producer publishes Data plus Metadata to the Data Platform]
16. Roles & Responsibilities
[Diagram: the Checkout service publishes order events to the Central Data Lake on S3 and registers them in the Data Catalog]
17. Roles & Responsibilities
[Diagram: as before, but the Checkout service publishes its order events via an Ingestion Template provided by D’n’A]
18. SCOUT24 DATA LANDSCAPE MANIFESTO
#5 Consumer’s Responsibility
Data consumers are responsible for the definition & visualization of metrics and for driving the implementation and maintenance of these metrics.
[Diagram: Producer and Consumer on the Data Platform, with the Consumer owning metrics]
19. Roles & Responsibilities
[Diagram: the Special offer service (Consumer) builds a view “order history by user” on top of the order events in the Central Data Lake on S3, using the Ingestion Template]
20. SCOUT24 DATA LANDSCAPE MANIFESTO
#6 Exception: Core KPIs
We, Data & Analytics, take full ownership of and responsibility for the few top company-wide core KPIs.
[Diagram: D’n’A owns the Core metric on the Data Platform, between Producer and Consumer]
21. Roles & Responsibilities
[Diagram: alongside the teams’ own view “order history by user”, D’n’A maintains the core view “revenue generated from orders by segments”, consumed by Analysts via the BI Tool]
22. SCOUT24 DATA LANDSCAPE MANIFESTO
#7 Transparency Over Continuity
We value data transparency over data continuity, which means we may break metric comparability if it is for the cause of enabling better insights.
23. SCOUT24 DATA LANDSCAPE MANIFESTO
The Ultimate Goal
A federal landscape of data producers and consumers with just enough rules to ensure seamless cooperation without severely impeding autonomy.
[Diagram: Producers publish Data and Metadata, Consumers build Data products, D’n’A provides the Data Platform and the Core metric]
24. Consequences for Product Development Teams?
- Think about data & reporting
- Deliver your data to the lake
- Provide metadata (schema, descriptions, versions)
- Eat your own dog food: consume your own data for reporting -> take responsibility for data quality
25. Benefits for Product Development Teams?
- Work with data independently
- No dependencies on data teams
- Company data is curated and it’s easy to consume data produced by other teams
27. Learnings and lessons
Publish exhaustive, general, and denormalized event data
Avoid consumer-specific tailoring of the data you publish
Consume your own data, e.g. for KPI reports
Try out ad-hoc analytics notebooks to get better insights
Inform data producers if you rely on their data
Invest in documentation and guidelines for your data platform to keep your support effort low
Perspective of a data engineer
-> in reality much more complex -> simplified here to keep things clear
Let’s go back 10 years to 2007 (some parts are even older than that)
Application: clean 3-tier architecture
Web Tier
Middle Tier
Operative Oracle DB
(click) Analysts wanted to create reports
(click) own DWH so as not to block the Core DB with analytical queries
Core DB -> Staging -> DWH -> BI Tool
2011:
more and more systems needed to be integrated into the DWH
the one-size-fits-all database approach doesn’t scale anymore; load profiles diverge
paying large amounts of money to Oracle
2013 (4 years ago):
Beginning of the chaos
DB scaling problem solved -> denormalization of data: own DB for search and for detail pages -> synchronization of the data
More microservices showed up that provide data
More unstructured data that does not fit into classical relational data storage
Built a Hadoop cluster
Not for inserting single events
REST API in front: collects events of the same type, packages them into bigger chunks, and copies them to HDFS
Easy reporting for applications
JSON is the new standard for business reporting, completely different from the previous relational world
Standardization through company-wide unique IDs
Direct connection to BI tools
More and more analysts and data scientists work directly on the cluster using Hive, Spark, etc.
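The core idea behind that ingestion REST API can be sketched in a few lines: events of the same type are buffered and only written out in bigger chunks, because HDFS copes far better with a few large files than with many tiny ones. This is an illustrative sketch, not Scout24's actual implementation; class and method names and the chunk size are made up.

```python
# Sketch of chunked event ingestion (names and sizes are illustrative).
from collections import defaultdict

class EventBatcher:
    def __init__(self, chunk_size=1000):
        self.chunk_size = chunk_size       # events per flushed chunk
        self.buffers = defaultdict(list)   # event type -> pending events
        self.flushed = []                  # stand-in for "copied to HDFS"

    def accept(self, event_type, payload):
        """Collect one event; flush this type's buffer once it is full."""
        buf = self.buffers[event_type]
        buf.append(payload)
        if len(buf) >= self.chunk_size:
            # In the real system this chunk would be written as one HDFS file.
            self.flushed.append((event_type, list(buf)))
            buf.clear()

batcher = EventBatcher(chunk_size=2)
batcher.accept("order", {"id": 1})
batcher.accept("pageview", {"url": "/home"})
batcher.accept("order", {"id": 2})   # second "order" event triggers a flush
print(len(batcher.flushed))          # -> 1
```

Single events never hit storage directly; only full chunks do, which keeps the file count on the cluster manageable.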
2015:
We had complete chaos
More and more applications
Cloud strategy -> in the long run we should move everything to AWS
Most of the time we were maintaining mappings
My team needed to collect metadata all the time and deeply understand the different domains
Central bottleneck for the whole company
No one could introduce new data or change data without us
People got mad at us
We needed to change something
2017:
(click) Merge BI Developers and Data Engineers into one team
(click) establish a central data lake within AWS
Leading system for structured and unstructured data; easy to connect/join things
Why S3?
Cheap & reliable, at least cheaper and more reliable than most of the people in the room could provide
Integrated into most current big data technologies
Accessible by many clusters at once
The performance disadvantage is not that big; intermediate results are sometimes kept in HDFS
(click) DWH is just a cache for analytical queries
(click) old applications in our on-premise data center still use the Hadoop REST API
(click) direct exports from databases
(click) CRM imports and exports data
(click) new applications stream data through Kinesis Firehose
These were the requirements: dev teams can easily ingest data into the data platform, and data can be joined
Of course, this is a bird’s-eye view; in reality it’s much more complex
And then another topic came along, but Arif will tell you about it
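To make "easy to connect/join things" concrete: a common convention for S3 data lakes is to encode team, dataset, and date partitions in the object key, so engines like Hive or Spark can prune partitions when querying. The bucket layout and naming below are assumptions for illustration, not Scout24's actual scheme.

```python
# Hypothetical partitioned S3 key layout for event chunks in a data lake.
from datetime import datetime, timezone

def lake_key(team, event_type, ts, sequence):
    """Build a partitioned S3 object key for one chunk of events."""
    return (
        f"{team}/{event_type}/"
        f"year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/"
        f"{event_type}-{sequence:06d}.json.gz"
    )

ts = datetime(2018, 2, 7, tzinfo=timezone.utc)
print(lake_key("checkout", "order_events", ts, 42))
# -> checkout/order_events/year=2018/month=02/day=07/order_events-000042.json.gz
```

Because every producer uses the same key scheme, any consumer (or any cluster, since S3 is shared) can locate and join datasets without asking the producing team.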
- Because of microservices, the amount and heterogeneity of data sources has multiplied.
- Sebastian explained nicely how this can be tackled with a more appropriate technical approach.
- However, in parallel to this technical development, there was also a strong push for data-driven product development happening at Scout.
- What does this mean? A culture of experimentation (small cycles)
- …this means that now the number of data consumers in the company has also multiplied.
- These consumers want to correlate their specific data with the general data warehouse data.
- D’n’A wants to help, but the company does not want to spend the resources to multiply the data team equally.
- As a result, the data team was increasingly becoming a bottleneck and the frustration on both sides went up
- Often because of unclear responsibilities, or a distribution of responsibilities that had not changed since 2007
- Therefore we realised that it is not enough to put the technical organisation on a new solid foundation; the way people interact with data and manage responsibilities for data also needed a new foundation.
- To signal the new thinking, we had the idea to formulate a Data Landscape Manifesto which we as a company would agree on.
- This is about roles, responsibilities, and common values
- It consists of 7 principles, each based on an assumption or a belief from which we derived that principle.
We believe that collecting & analyzing data is crucial to understand our business, our customers, and the market in order to provide the right services & products
Although this is nothing surprising these days, we wanted to start with this in order to ensure a common understanding of why all of this is important in the first place.
--> Loosely coupled (Microservices), strongly ALIGNED (Jez Humble, Adrian Cockcroft)
We therefore believe that everyone in the company must have easy access to the data available, and it must be easy to publish data that can be used by others. This requires a solid Data Platform: easy-to-use tools, reliable infrastructure, and simple guidelines for publishing & consuming data.
…
This is our core responsibility (and we wanted to start with this side).
The data landscape is the playground on which data producers and data consumers interact. We provide the platform and the clear guidelines, but we do not own that space.
The reason for this is that we believe…
We believe that an exhaustive centralized data management does not allow us to scale to the level of data creation and consumption we aspire to as a company, because it creates a bottleneck and introduces accidental, indirect dependencies. Instead, we believe that data autonomy is the only way for data usage to scale across the company. However, for data autonomy to not become data anarchy, there has to be a clear set of basic rules and responsibilities.
Data autonomy puts…
Introduce roles
We believe that extensive data availability, data discoverability, and data usability are crucial and that – at scale – no one can ensure this other than the one controlling the source where the data is originally generated.
We believe that the stakeholder of a metric has to be the single owner of that metric and its definition, and has to drive its implementation.
Without a single source of truth about what a metric means, we risk that multiple diverging and possibly contradicting understandings and implementations develop over time.
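One lightweight way to get such a single source of truth is to define the metric exactly once as code, so that every report reuses the same definition instead of re-implementing it. A minimal sketch, assuming invented field names and an invented fraud flag:

```python
# The one company-wide definition of "revenue": paid, non-fraud orders.
# Field names ("amount", "status", "fraud") are made-up examples.
def revenue(orders):
    return sum(
        o["amount"]
        for o in orders
        if o["status"] == "paid" and not o.get("fraud", False)
    )

orders = [
    {"amount": 100.0, "status": "paid"},
    {"amount": 50.0, "status": "paid", "fraud": True},  # excluded: fraud
    {"amount": 80.0, "status": "open"},                 # excluded: not paid
]
print(revenue(orders))  # -> 100.0
```

If a consumer team wants "revenue" in a dashboard, it imports this definition rather than writing its own SQL, so divergent interpretations cannot develop silently.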
We believe that a minimum level of company-wide comparability & reliability of core KPIs is crucial for leading the company in the right direction.
The management is the owner of these core KPIs, and the data group represents the management here in terms of metric ownership.
We believe that transparency is crucial for understanding what a metric means.
If month-to-month comparability must never break, there is no way to continuously improve metrics and their transparency based on new insights.
To stay with the example: if we come to understand that a certain number of orders are actually fraud, then we want to report the actual real revenue.
A federal landscape of data producers and consumers with just enough rules to ensure seamless cooperation without severely impeding autonomy.
What does it mean for product development teams in their day-to-day business?
(click) Think about data:
Reporting: how to structure the data?
Which database should I use? At least in AWS there are tons of options
Maybe you need to maintain it yourself
(click) They need to bring in the data themselves (supported by the data platform team and documentation)
(click) They need to provide metadata:
Schema
Description
Connectivity (IDs matching other IDs in the lake)
Versioning
(click) Eat your own dog food: use your delivered data for your own reporting
Twist in responsibility: data quality is managed by the producer
-> understand the reporting infrastructure
-> take the view of a data consumer and understand what other people do with the data
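The metadata the notes list (schema, description, connectivity, versioning) could look roughly like the descriptor below. This is a hypothetical catalog format invented for illustration; all field names are assumptions, not Scout24's actual data catalog schema.

```python
# Hypothetical metadata descriptor a producing team publishes to the catalog.
import json

order_events_metadata = {
    "dataset": "checkout.order_events",
    "description": "One event per completed order in the checkout service.",
    "version": 2,                    # bumped on breaking schema changes
    "schema": {
        "order_id": "string",
        "user_id": "string",         # company-wide ID, joins with other datasets
        "amount_eur": "double",
        "occurred_at": "timestamp",
    },
    "join_keys": ["user_id"],        # connectivity: IDs matching IDs in the lake
}

def required_fields_present(meta):
    """Minimal guideline check a data platform team could enforce on publish."""
    return all(k in meta for k in ("dataset", "description", "version", "schema"))

print(required_fields_present(order_events_metadata))  # -> True
print(json.dumps(order_events_metadata["schema"], indent=2))
```

A check like `required_fields_present` is the kind of "simple guideline" the platform team can automate, so discoverability does not depend on anyone's goodwill.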
What is the benefit?
No waiting for the data team -> work independently
Their own data and data from other teams are easier to use and can be integrated with each other, because everybody is using the same paradigm
So we just heard: more responsibility and required skills on the one side, but in return fewer dependencies and decreased cycle time on the other side.
This sounds a lot like what DevOps is preaching.
…
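The first two learnings (publish exhaustive, denormalized data; avoid consumer-specific tailoring) can be illustrated with a small contrast. All event fields and values here are invented for the example:

```python
# Tailored for one known consumer: useless to anyone who later needs the
# user segment or the price.
tailored_event = {"order_id": "o-1", "product_id": "p-9"}

# Exhaustive & denormalized: user and product attributes are inlined, so any
# consumer can derive its own view without joining against the producer's
# internal databases.
general_event = {
    "order_id": "o-1",
    "occurred_at": "2018-02-07T10:00:00Z",
    "user": {"user_id": "u-7", "segment": "private"},
    "product": {"product_id": "p-9", "category": "listing", "price_eur": 29.9},
}

# A consumer-specific metric (e.g. revenue by segment) then becomes a simple
# projection over the general events:
def segment_revenue(events):
    out = {}
    for e in events:
        seg = e["user"]["segment"]
        out[seg] = out.get(seg, 0.0) + e["product"]["price_eur"]
    return out

print(segment_revenue([general_event]))  # -> {'private': 29.9}
```

Publishing the general form costs the producer a little more up front but spares it from fielding one bespoke export request per consumer later.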