www.scout24.com
The Scout24 Data Landscape Manifesto:
Building an Opinionated Data Platform
Predictive Analytics World Berlin | 13.11.2018 | Sean Gustafson
5 Core Geographies and an overall presence in 18 countries
80m Household Reach
2 Major Household Brand Names
Scout24 AG
• SDAX
• € 489 million revenue (2017)
Our technical evolution
How we started in 2007
[Architecture diagram (2007): Web Tier, Middle Tier, Core DB, CRM, DWH with Staging, BI Tool. Roles: Analyst, BI Dev.]
How things got complicated in 2011
[Architecture diagram (2011): Web, API, Middle Tier, Core DB, CRM, an APP with its own MySQL, DWH with Staging, BI Tool, with a "$$$" cost marker. Roles: Analyst, BI Dev.]
How we sliced the monolith in 2013
[Architecture diagram (2013): Web, API, several APPs each with their own MySQL, Core DB, EXP (Mongo), SEA (Elastic), Sync APP, several APIs, REST API, HADOOP, CRM, DWH with Staging, BI Tool. Roles: Analyst, BI Dev, DE.]
How a central data team doesn’t scale
[Architecture diagram (2015): the 2013 landscape (Web, API, Core DB, EXP on Mongo, SEA on Elastic, Sync APP, APIs, REST API, HADOOP, CRM, DWH with Staging, BI Tool) plus a growing number of APPs, some with their own MySQL, on AWS. Roles: Analyst, BI Dev, DE.]
How we re-architected our Data Landscape
[Architecture diagram (2017): Core DB and many APPs on AWS, CRM, REST API, Central Data Lake on S3, Presto, BI Tool, Spark, R, etc. Roles: Analyst, BI Dev, DE, DS.]
Our organizational evolution
[Architecture diagram (2017), repeated: Core DB and many APPs on AWS, CRM, REST API, Central Data Lake on S3, Presto, BI Tool, Spark, R, etc. Roles: Analyst, BI Dev, DE, DS.]
[Organization diagram: roles Analyst, BI Dev, DE, DS; teams Data Platform Engineering, Analysts (in residence), Data Scientists (in residence), Central Analysts Team.]
Scout24 wants to become a truly data-driven company
Fast & easy data-driven product development…
…supported by Data & Analytics
Scout24 wants to become a truly data-driven company
Everywhere in the company...
...without bloating up DnA (Data & Analytics)
Our cultural revolution
SCOUT24
DATA LANDSCAPE
MANIFESTO
ROLES, RESPONSIBILITIES, AND VALUES
FOR A DATA-DRIVEN COMPANY AT SCALE
#1 Preamble
Data is a key asset of our company.
#2 Our Responsibility
We, Data & Analytics, are responsible for providing a solid Data Platform as well as clear guidelines and training on how to participate in the Data Landscape.
[Diagram labels: Data Platform, DnA, Data Landscape]
#3 Data Autonomy, Not Anarchy
Data autonomy puts data producers & data consumers in control of their data & of their metrics and thereby allows us to be data-driven at scale, but this comes with responsibility.
[Diagram labels: Data Platform, Data, Producer, Consumer, DnA, Data Landscape]
#4 Producer’s Responsibility
Data producers are responsible for publishing data to the central Data Lake, for the data's quality, and for publishing metadata that makes it easy to find and consume the data.
[Diagram labels: Data Platform, Metadata, Data, Producer, DnA, Data Landscape]
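One way to picture the producer's metadata duty is a small descriptor published alongside each dataset. The following is only a sketch under stated assumptions: the slides do not say what the metadata looks like at Scout24, so the class `DatasetMetadata`, its field names, and the example bucket are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetMetadata:
    """Hypothetical descriptor a producer publishes next to a dataset."""
    name: str
    owner_team: str
    description: str
    location: str  # assumed to be a prefix in the central data lake
    columns: Dict[str, str] = field(default_factory=dict)  # column name -> description

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        required = ("name", "owner_team", "description", "location")
        missing = [attr for attr in required if not getattr(self, attr).strip()]
        if not self.columns:
            missing.append("columns")
        return missing


# Example: a producer forgot the description and the column documentation.
meta = DatasetMetadata(
    name="listings",
    owner_team="search",
    description="",
    location="s3://example-bucket/lake/listings/",  # hypothetical location
)
problems = meta.missing_fields()
```

A check like `missing_fields` could run at publish time so that incomplete metadata is flagged to the producing team rather than discovered later by consumers.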
#5 Consumer’s Responsibility
Data consumers are responsible for the definition & visualization of metrics and for driving the implementation and maintenance of these metrics.
[Diagram labels: Data Platform, Producer, Consumer, DnA, Data Landscape]
#6 Exception: Core KPIs
We, Data & Analytics, take full ownership of and responsibility for the few top company-wide core KPIs.
[Diagram labels: Data Platform, Producer, Consumer, DnA, Data Landscape, Core metric]
#7 Transparency Over Continuity
We value data transparency over data continuity, which means we may break metric comparability if doing so enables better insights.
[Diagram labels: Data Platform, Producer, Consumer, DnA, Data Landscape, Core metric]
The Ultimate Goal
A federal landscape of data producers and consumers with just enough rules to ensure seamless co-operation without severely impeding autonomy.
[Diagram labels: Data Platform, Metadata, Data, Producer, Consumer, DnA, Data Landscape, Core metric]
Data Warehouse vs. Data Platform

Data Warehouse      Data Platform
Centralized         Federated
Control             Autonomy
Perfection          Scale
Pull                Push
Product is Data     Product is Platform
Reporting           Reporting, Ad hoc Analytics, Machine Learning
How to convince them to go along?
→ ‘Nudge’ them to participate
→ Promote the platform
→ Refuse new use cases in Data Warehouse
Result:
Product teams have much higher responsibility
Design ‘nudges’ into the Platform
Make the Data Lake easier to use than any alternative:
- automatic table publishing, partition detection
- backup and disaster recovery
- access control for restricted data
- optimize file formats (e.g. parquet) for efficiency
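As an illustration of what "partition detection" could mean in practice, here is a minimal sketch. It assumes Hive-style `column=value` path segments in object keys; the slides do not state which layout or mechanism the Scout24 platform actually uses, and the function name `detect_partitions` and the example keys are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def detect_partitions(keys: List[str], table_prefix: str) -> Dict[str, List[str]]:
    """Collect partition columns and their values from Hive-style object keys."""
    found = defaultdict(set)
    for key in keys:
        if not key.startswith(table_prefix):
            continue  # object belongs to a different table
        # Directory segments between the table prefix and the file name.
        segments = key[len(table_prefix):].strip("/").split("/")[:-1]
        for segment in segments:
            if "=" in segment:
                column, value = segment.split("=", 1)
                found[column].add(value)
    return {column: sorted(values) for column, values in found.items()}


# Hypothetical object keys as they might appear under a data lake bucket.
example_keys = [
    "lake/listings/year=2018/month=10/part-0000.parquet",
    "lake/listings/year=2018/month=11/part-0000.parquet",
    "lake/other/year=2017/part-0000.parquet",
]
partitions = detect_partitions(example_keys, "lake/listings/")
```

With detection of this kind, a producer only has to write files into a conventional folder layout; registering the partitions with the query engine becomes the platform's job, which is the sort of convenience the nudges aim for.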
Learnings and lessons
→ Change needs to be technological, organizational, and cultural
→ Build features whose benefits counteract resistance
→ Communication is key
Most importantly:
Have a strong opinion about how your company should use data, and build a platform that pushes toward that vision.
