To support the successful production, consumption and governance of data needed to establish a data-driven product team, Scout24 (Europe’s largest online marketplace for cars and real estate) and ThoughtWorks created a manifesto of seven principles for DataDevOps.
4. DataDevOps | Sean Gustafson & Arif Wider
HOW WE STARTED IN 2007
[Figure: 2007 landscape with web, middle tier and core DB, plus CRM, staging, DWH and BI tool, used by BI developers and analysts]
HOW THINGS GOT COMPLICATED IN 2011
[Figure: 2011 landscape; apps, an API and MySQL are added to the 2007 components (web, middle tier, core DB, CRM, staging, DWH, BI tool)]
HOW WE SLICED THE MONOLITH IN 2013
[Figure: 2013 landscape; the monolith is sliced into several services with their own APIs and apps, plus Elasticsearch, SEA, sync and DE components; Hadoop with a REST API and MySQL sit alongside the staging area, DWH and BI tool]
HOW A CENTRAL DATA TEAM DOESN’T SCALE
[Figure: 2015 landscape; still more services, apps and MySQL databases, some on AWS, plus MongoDB, Elasticsearch, EXP and SEA components, all alongside the one central data stack (Hadoop, REST API, staging, DWH, BI tool)]
HOW WE RE-ARCHITECTED OUR DATA LANDSCAPE IN 2017
[Figure: 2017 landscape; services and apps on AWS publish to a central data lake on S3, alongside the existing core DB, CRM, DWH and BI tool]
[Figure: Plan-Do-Check-Act cycle]
Fast & easy data-driven product development… …supported by Data & Analytics (D’n’A)
Everywhere in the company… …without bloating up D’n’A
[Figure: many Plan-Do-Check-Act cycles running across the company]
SCOUT24 DATA LANDSCAPE MANIFESTO
#1 Preamble
Data is a key asset of our company.
#2 Central Data Team’s Responsibility
We, Data & Analytics, are responsible for providing a solid Data Platform as well as clear guidelines and training on how to participate in the Data Landscape.
[Figure: D’n’A, the Data Platform and the Data Landscape]
#3 Data Autonomy, Not Anarchy
Data autonomy puts data producers and data consumers in control of their data and their metrics, and thereby allows us to be data-driven at scale, but this comes with responsibility.
[Figure: the Data Landscape with a data producer, a metric consumer and the D’n’A Data Platform]
ROLES & RESPONSIBILITIES
[Figure: the central data lake on S3 (AWS) and the data catalog, provided by D’n’A; the checkout service acts as producer and the special offer service as consumer]
#4 Producer’s Responsibility
Data producers are responsible for publishing data to the central Data Lake, for the data's quality, and for publishing metadata that makes it easy to find and consume the data.
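As an illustration of the producer side of this principle, the following Python sketch pairs a dataset with the metadata a producer might publish and shows how a catalog can use that metadata to make the data easy to find. `DatasetMetadata`, `DataCatalog`, the field names and the S3 location are all hypothetical; this is an in-memory stand-in, not Scout24’s actual catalog or any real catalog API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical metadata a producer publishes alongside its data."""
    name: str
    owner: str          # producing team or service
    location: str       # where the data lives in the lake (illustrative URI)
    description: str
    schema: dict = field(default_factory=dict)  # column name -> type


class DataCatalog:
    """In-memory stand-in for a data catalog (not a real catalog API)."""

    def __init__(self):
        self._entries = {}

    def publish(self, meta):
        # The producer owns its metadata: refuse entries consumers could not use.
        if not meta.description.strip() or not meta.schema:
            raise ValueError(f"{meta.name}: description and schema are required")
        self._entries[meta.name] = meta

    def search(self, keyword):
        # Match the keyword against dataset names, descriptions and column names.
        kw = keyword.lower()
        return sorted(
            name
            for name, meta in self._entries.items()
            if kw in name.lower()
            or kw in meta.description.lower()
            or any(kw in column.lower() for column in meta.schema)
        )


catalog = DataCatalog()
catalog.publish(DatasetMetadata(
    name="order_events",
    owner="checkout-service",
    location="s3://example-data-lake/order_events/",
    description="One record per completed checkout order.",
    schema={"order_id": "string", "user_id": "string", "ts": "timestamp"},
))
print(catalog.search("user_id"))  # ['order_events']
```

Rejecting entries without a description or schema is one way a platform could make "publish useful metadata" a hard requirement rather than a convention.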
[Figure: the producer delivers data and metadata into the Data Landscape on the D’n’A Data Platform]
[Figure: Roles & responsibilities, continued; the checkout service (producer) publishes order events to the central data lake on S3 and event metadata to the data catalog, for use by the special offer service (consumer)]
[Figure: Roles & responsibilities, continued; as before, with D’n’A additionally providing an ingestion template for publishing the order events and their metadata]
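An ingestion template of this kind could take care of the storage layout so that producers only hand over their events. The sketch below assumes newline-delimited JSON files in Hive-style daily partitions (`dt=YYYY-MM-DD`) and epoch-second timestamps; the bucket name, key layout and the `partition_events` function are hypothetical, and the function only computes object keys and payloads rather than calling any AWS API.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def partition_events(events, dataset, prefix="s3://example-data-lake"):
    """Group events into Hive-style daily partitions (dt=YYYY-MM-DD).

    Returns {object key: newline-delimited JSON payload}. Nothing is
    uploaded; the bucket name and key layout are illustrative assumptions.
    """
    by_day = defaultdict(list)
    for event in events:
        # Assumes "ts" holds epoch seconds; partition by the UTC calendar date.
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        by_day[day].append(json.dumps(event, sort_keys=True))
    return {
        f"{prefix}/{dataset}/dt={day}/part-00000.jsonl": "\n".join(lines) + "\n"
        for day, lines in sorted(by_day.items())
    }


order_events = [
    {"order_id": "o1", "user_id": "u1", "ts": 1500000000},  # 2017-07-14 UTC
    {"order_id": "o2", "user_id": "u2", "ts": 1500086400},  # 2017-07-15 UTC
]
for key in partition_events(order_events, "order_events"):
    print(key)
```

Keeping the layout decision inside the template means every producer ends up with the same partitioning scheme without having to agree on it team by team.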
#5 Consumer’s Responsibility
Data consumers are responsible for the definition & visualization of metrics and for driving the implementation and maintenance of these metrics.
[Figure: the metric consumer, the producer and the D’n’A Data Platform within the Data Landscape]
[Figure: Roles & responsibilities, continued; the special offer service (consumer) defines a view, ORDER HISTORY BY USER, over the order events published by the checkout service]
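The consumer-side view can be sketched in the same spirit. In a data lake it would more likely be a SQL view over the order events table; the plain-Python function below, `order_history_by_user` (a hypothetical name, with assumed `user_id`, `order_id` and `ts` fields), only shows the transformation the consumer owns: grouping each user’s orders in chronological order.

```python
from collections import defaultdict


def order_history_by_user(order_events):
    """Hypothetical consumer view: each user's order IDs, oldest first."""
    history = defaultdict(list)
    # Sort once by timestamp so every user's list ends up chronological.
    for event in sorted(order_events, key=lambda e: e["ts"]):
        history[event["user_id"]].append(event["order_id"])
    return dict(history)


events = [
    {"order_id": "o2", "user_id": "u1", "ts": 20},
    {"order_id": "o1", "user_id": "u1", "ts": 10},
    {"order_id": "o3", "user_id": "u2", "ts": 15},
]
print(order_history_by_user(events))  # {'u1': ['o1', 'o2'], 'u2': ['o3']}
```

The producer never has to know about this view: it only publishes order events, and the consumer defines and maintains the derived data it needs.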
#6 Transparency Over Continuity
We value data transparency over data continuity, which means we may break metric comparability if doing so enables better insights.
[Figure: the Data Landscape with producer, metric consumer, a core metric and the D’n’A Data Platform]
The Ultimate Goal
A federal landscape of data producers and consumers with just enough norms to ensure seamless co-operation without severely impeding autonomy.
[Figure: data producer, metadata, metric consumer and the D’n’A Data Platform within the Data Landscape]
Consequences for Product Teams
‣ Think about data & reporting
‣ Deliver your data to the lake
‣ Provide metadata
‣ Eat your own dog food: consume your own data
Benefits for Product Teams
‣ Work with data independently
‣ No dependencies on data teams
‣ It’s easy to consume data produced by other teams
‣ Faster product & measurement iterations
How to convince everyone to go along?
‣ ‘Nudge’ them to participate
‣ Promote the platform
‣ Refuse new use cases in the Data Warehouse
Design ‘nudges’ into the Platform
Make the Data Lake easier to use than any alternative:
‣ automatic table publishing and partition detection
‣ backup and disaster recovery
‣ access control for restricted data
‣ optimized file formats (e.g. Parquet) for efficiency
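Partition detection is one of these nudges that is easy to sketch. Assuming the lake uses the common Hive-style `column=value` directory layout, the hypothetical `detect_partitions` function below scans object keys under a table prefix and returns the distinct partition specs a platform could then register for the table. It illustrates the idea only and is not the platform's actual implementation.

```python
import re

# Hive-style partition directory: "<column>=<value>"
_PARTITION_SEGMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=([^/]+)$")


def detect_partitions(keys, table_prefix):
    """Return the distinct partition specs found under table_prefix.

    table_prefix should end with "/" so that "order_events/" does not
    also match keys of a table such as "order_events_v2/".
    """
    partitions = set()
    for key in keys:
        if not key.startswith(table_prefix):
            continue
        # Directory segments between the table prefix and the file name.
        segments = key[len(table_prefix):].strip("/").split("/")[:-1]
        spec = []
        for segment in segments:
            match = _PARTITION_SEGMENT.match(segment)
            if match is None:
                break  # stop at the first non-partition directory
            spec.append(match.groups())
        if spec:
            partitions.add(tuple(spec))
    return sorted(partitions)


keys = [
    "order_events/dt=2017-07-14/part-00000.jsonl",
    "order_events/dt=2017-07-14/part-00001.jsonl",
    "order_events/dt=2017-07-15/part-00000.jsonl",
    "order_events/_SUCCESS",
    "other_table/dt=2017-07-14/part-00000.jsonl",
]
print(detect_partitions(keys, "order_events/"))
# [(('dt', '2017-07-14'),), (('dt', '2017-07-15'),)]
```

Because producers only write files, detecting and registering partitions on their behalf removes a manual step, which is exactly the kind of convenience that makes the lake the easier path.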
Learnings and lessons
‣ Change needs to be technological, organizational and cultural
‣ Build features whose benefits counteract resistance
‣ Communication is key
MOST IMPORTANTLY:
Have a strong opinion about how the
company should use data and build a
platform that pushes toward that vision.