DATA SCIENCE
DEVELOPMENT
LIFECYCLE
EVERYONE TALKS ABOUT IT,
NOBODY REALLY KNOWS HOW
TO DO IT AND EVERYONE THINKS
EVERYONE ELSE IS DOING IT
René Traue, Christian Lindenlaub
Global Data Science
© GfK 20-Nov-19 ▪ Data Science Development Lifecycle | René Traue, Christian Lindenlaub | Predictive Analytics World Business Berlin 2019
Data Science Development Lifecycle @ GfK
Photo by Dino Reichmuth on Unsplash
LET’S TAKE YOU ON A JOURNEY
Data Scientist
Data Scientist
Data Scientist
Data Scientist
GfK Point of Sales Data
World's no. 1 provider of data on consumer technology products
90+ countries
Global coverage and
local market expertise
Point-of-sale data from >425,000
sample shops worldwide
14.5 million SKUs in our database,
1.5 million new SKUs per year
Channels
(online & offline)
700+ product groups tracked
and 15,000 features
Volume, value, features,
pricing, distribution …
Is the equation that simple?
=Data Scientist
Data Scientist
Data Scientist
Data Scientist
2017
Data Scientist
Data Scientist
Data Scientist
Data Scientist
2019
Data Scientist
Data Scientist
Data Scientist
Scrum Master
Product Owner
Machine Learning
Engineer
Software Engineer
Client
Software Engineer
Data Scientist
Product Owner
Machine Learning
Engineer
As we are building Software – use Scrum
Product backlog
Data Scientist
Data Scientist
Data Scientist
Product Owner
Data Scientist
Product
vision
As we are building Software – use Scrum
Product backlog
Data Scientist
Data Scientist
Data Scientist
Scrum Master
Product Owner
Data Scientist
Research & Development
SCRUM
Product
vision
As we are building software – use Scrum
As a user, I want to know …
Description:
Step 1: Try approach A
Step 2: Evaluate results
…
Definition of Done:
…
Acceptance Criteria:
…
38
5
3
Data Scientist
Data Scientist
Data Scientist
Scrum Master
Product Owner
Data Scientist
Planning Meeting
two days later…
Photo by chuttersnap on Unsplash
Re-Specification II
another Planning Meeting
As a user, I want to know …
Description:
Step 1: Try approach B
Step 2: Evaluate results
…
Definition of Done:
…
Acceptance Criteria:
…
131
5
8
Data Scientist
Data Scientist
Data Scientist
Scrum Master
Product Owner
Data Scientist
one day later…
Photo by chuttersnap on Unsplash
Re-Specification III
another Planning Meeting
As a user, I want to know …
Description:
Step 1: Try approach C
Step 2: Evaluate results
…
Definition of Done:
…
Acceptance Criteria:
…
88
8
3
Data Scientist
Data Scientist
Data Scientist
Scrum Master
Product Owner
Data Scientist
Lost in documentation …
“Individuals and interactions
over processes and tools” ?!?
Data Scientist
Data Scientist
Data Scientist
Data Scientist
Scrum Master
Product Owner
“Data Scientist work estimation:
Building a plane
vs.
exploring what flying is”
Photo by Ross Parmly on Unsplash
We are doing research!
We are doing Research!
Data Scientist
Data Scientist Data Scientist
Data Scientist
In Progress Testing Done
We are doing research!
Let’s have fewer formalities
We are doing Research!
As a user, I want to know …
Description:
Try approach ABC
… and less specific
with the tickets
Data Scientist
Data Scientist Data Scientist
Data Scientist
Acceptance criteria:
Result has to be negative...
Happy Data Scientists, but unhappy stakeholders
How far are we?
When can we
release?
Data Scientist
Data Scientist Data Scientist
Data Scientist
Scrum Master
Product Owner
How to solve the
dilemma?
Photo by Olav Ahrens Røtne on Unsplash
How to solve the dilemma?
Research & Development
SCRUM
Research ≠ Production
Engineering & Integration
Research
Research ≠ Production
Use Kanban in Research
Engineering & Integration
Research
[KANBAN]
Product backlog
Product
vision
requirements
prioritization,
basic acceptance
criteria,
risk assessment,
estimation
Research ≠ Production
Use Kanban in Research
Engineering & Integration
Research
ToDo In Progress Testing Done
[KANBAN]
Product backlog
Product
vision
Research ≠ Production
Use Kanban in Research
Engineering & Integration
Research
ToDo In Progress Testing Done
[KANBAN]
Product backlog
Product
vision
Validate results with
PM/Clients
acceptance criteria
(refinement)
Research ≠ Production
Use Kanban in Research
Engineering & Integration
Research
ToDo In Progress Testing Done
[KANBAN]
Product backlog
Product
vision
Iterative process
Validate results with
PM/Clients
acceptance criteria
(refinement)
Research ≠ Production
Create Transparency within Kanban mode
Engineering & Integration
Research
[KANBAN]
Product backlog
Product
vision
Iterative process
Daily Standup
Storytelling
Retrospective
Planning
Validate results with
PM/Clients
acceptance criteria
(refinement)
Research ≠ Production
Let’s integrate the Prototype
Engineering & Integration
Research
[KANBAN]
Product backlog
Product
vision
Iterative process
Daily Standup
Storytelling
Retrospective
Planning
Development backlog
Validate results with
PM/Clients
acceptance criteria
(refinement)
“Software is built by Software Engineers”
+
Software Engineer
Software Engineer
Data Scientist
Data Scientist
Data Scientist
Data Scientist
Data Scientist
Data Scientist
Data Scientist
Data Scientist
Software Engineer
Software Engineer
How to solve the
dilemma?
Photo by Olav Ahrens Røtne on Unsplash
Adapt team setup
Add Machine Learning Engineers to the team
Software Engineer
Software Engineer
Data Scientist
Data Scientist
Data Scientist
Data Scientist
Machine Learning
Engineer
Machine Learning
Engineer
Research ≠ Production
Let’s productionize the Prototype
Engineering & Integration
Research
[KANBAN]
Product backlog
Product
vision
Iterative process
Daily Standup
Storytelling
Retrospective
Planning
acceptance criteria
(refinement)
Validate results with
PM/Clients
Development backlog
Develop, test,
release
Develop, test,
release
[SCRUM]
RELEASE PARTY
Data Scientist
Data Scientist
Data Scientist
Scrum Master
Product Owner
Machine Learning
Engineer
Machine Learning
Engineer
Software Engineer
Software Engineer
Data Scientist
Product Owner
MVP
And Now?
Data Scientist
Data Scientist
Data Scientist
Scrum Master
Product Owner
Machine Learning
Engineer
Machine Learning
Engineer
Software Engineer
Software Engineer
Data Scientist
Product Owner
MVP
Clients Start Using It
Feedback and Questions arrive …
PRESCRIPTIVE ANALYTICS SOLUTION
Price KPI 1 in November 2019:
0.87
Client
Price KPI 1 seems
pretty high…
Feedback and Questions arrive …
Data Scientist
Data Scientist
Data Scientist
Data Scientist
Price KPI 1 seems
pretty high.
Why is item XY not
shown in week 48?
How to interpret this
number over time?
How is this value
exactly calculated?
And the running system needs to be monitored
Data Scientist
Data Scientist
Data Scientist
Data Scientist
We generated new results
for October – please
evaluate if they still make
sense!
Product Owner
Incident-based handling…
▪ Every request == exploration task
▪ No data-driven monitoring/debugging
beside model diagnostics
We generated new results
for October – please
evaluate if they still make
sense!
Price KPI 1 seems
too big…
Why is item XY not
shown in week 48?
How to interpret this
number over time?
How is this value
exactly calculated?
…while development goes on.
Data Scientist
Data Scientist
Data Scientist
Data Scientist
We generated new results
for October – please
evaluate if they still make
sense!
Price KPI 1 seems
too big…
Why is item XY not
shown in week 48?
How to interpret this
number over time?
How is this value
exactly calculated?
Product backlog
Incident-based Quality Assurance
▪ Torn between development and production
▪ Expensive exploration tasks to answer requests
▪ Development constantly interrupted by urgent feedback tasks
Data Scientist
Data Scientist
Data Scientist
Data Scientist
How to solve the
dilemma?
Photo by Olav Ahrens Røtne on Unsplash
Establish a Quality Assurance system to
✓Answer client requests
✓Monitor behavior in production
Photo by Olav Ahrens Røtne on Unsplash
Quality Assurance System
▪ Quality assurance is based on metadata called
→ Quality Performance Indicators [QPIs]:
- “QPIs are measures that quantify module behavior”
- QPIs are calculated during pipeline execution and help to persist intermediate results
- Every data point in the front end has its own set of QPIs attached
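A minimal sketch of this idea, not GfK's implementation: a data point carries its QPIs as metadata, and a pipeline module records a measurement while it runs. `DataPoint`, `attach_qpi`, and `impute_missing` are hypothetical names, and mean imputation is only a stand-in for whatever a real module does.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataPoint:
    """One front-end value (a KPI) plus the QPIs recorded while computing it."""
    kpi_name: str
    value: float
    qpis: Dict[str, float] = field(default_factory=dict)

    def attach_qpi(self, name: str, qpi_value: float) -> None:
        # Called by a pipeline module to persist an intermediate measurement.
        self.qpis[name] = qpi_value


def impute_missing(prices: List[Optional[float]], point: DataPoint) -> List[float]:
    """Hypothetical module: fill gaps with the mean and record the imputed share."""
    known = [p for p in prices if p is not None]
    mean = sum(known) / len(known)
    point.attach_qpi("share_imputed_observations",
                     (len(prices) - len(known)) / len(prices))
    return [mean if p is None else p for p in prices]


point = DataPoint(kpi_name="Price KPI 1", value=0.87)
filled = impute_missing([10.0, None, 12.0, 14.0], point)
print(point.qpis)
```

The QPI is a by-product of the module's normal work, so no separate exploration task is needed later to find out how many observations were imputed.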
Establish a Quality Assurance system to
✓Answer client requests
✓Monitor behavior in production
Photo by Olav Ahrens Røtne on Unsplash
Answer Client Requests
PRESCRIPTIVE ANALYTICS SOLUTION
Price KPI 1 in November 2019:
0.87
Client
Price KPI 1 seems
pretty high…
Quality Assurance System
Price KPI 1 in November 2019:
0.87
KPI
Price KPI1:
0.87
PRESCRIPTIVE ANALYTICS SOLUTION
Quality Assurance System
Model Accuracy:
0.89
Accuracy of the model used to estimate this value.
Revenue Base:
2.48 million €
Sum of the revenue of the transactions behind the measure.
Share of imputed observations:
0.0075
Share of the observations with imputation.
Variance last 5 weeks:
0.35
Variance of the measure over the last n observations.
QPI QPI QPI QPI QPI
KPI
Price KPI1:
0.87
Removed Outliers:
0.03
Share of observations not considered due to outlier reduction
Quality Assurance System
Model Accuracy:
0.89
Accuracy of the model used to estimate this value.
Revenue Base:
2.48 million €
Sum of the revenue of the transactions behind the measure.
Share of imputed observations:
0.0075
Share of the observations with imputation.
Price Variance last 5 weeks:
0.35
Variance of prices in the last 5 weeks.
QPI QPI QPI QPI QPI
KPI
Price KPI1:
0.87
Removed Outliers:
0.03
Share of observations not considered due to outlier reduction
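A QPI such as "variance over the last n observations" could be computed along these lines. This is a sketch under assumptions: the slides do not say whether population or sample variance is meant (population variance is used here), and `variance_last_n` and the weekly values are made up for illustration.

```python
from statistics import pvariance
from typing import List, Optional


def variance_last_n(values: List[float], n: int = 5) -> Optional[float]:
    """Population variance of the last n observations; None if fewer than two."""
    window = values[-n:]
    if len(window) < 2:
        return None
    return pvariance(window)


# Hypothetical weekly prices; only the last five weeks enter the QPI.
weekly_prices = [9.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(variance_last_n(weekly_prices))
```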
Quality Assurance System
PRESCRIPTIVE ANALYTICS SOLUTION
Price KPI 1 in November 2019:
0.87
Client
Price KPI 1 seems
pretty high…
Client
Price KPI 1 seems
pretty high…
Quality Assurance System
QPI
Data Scientist
You are right – we checked the
results and it's caused by a high
variance in the prices over the
last months.
Price Variance last 5 weeks:
0.35
Variance of prices in the last 5 weeks.
Client
Price KPI 1 seems
pretty high…
Quality Assurance System
QPI
Data Scientist
You are right – we checked the
results and it's caused by a high
variance in the prices over the
last months.
Price Variance last 5 weeks:
0.35
Variance of prices in the last 5 weeks.
QPI
Promo Intensity:
0.89
Share of sales under promotion.
We also see a super high
promotion intensity on this
item. The Price KPI is strongly
influenced by changes in the
promotion patterns.
Quality Assurance System
QPI
Data Scientist
You are right – we checked the
results and it's caused by a high
variance in the prices over the
last months.
Price
Varia
QPI
Sh
We also see a super high
promotion intensity on this
item. The Price KPI is strongly
influenced by changes in the
promotion patterns.
How to approach user/client requests
▪ Increase share of client requests answerable by
QPIs
Quality Assurance System
QPI
Operations
Price
Varia
QPI
Sh
We also see a super high
promotion intensity on this
item. The Price KPI is strongly
influenced by changes in the
promotion patterns.
How to approach user/client requests
▪ Increase share of client requests answerable by
QPIs
▪ Free up development resources by including
operations
You are right – we checked the
results and it's caused by a high
variance in the prices over the
last months.
Quality Assurance System
We also see a super high
promotion intensity on this
item. The Price KPI is strongly
influenced by changes in the
promotion patterns.
How to approach user/client requests
▪ Increase share of client requests answerable by
QPIs
▪ Free up development resources by including
operations
▪ Self-service evidence functionality
PRESCRIP
Price KPI
This value is caused by a high
variance in the prices over the
last months.
Establish a Quality Assurance system to
✓Answer client requests
✓Monitor behavior in production
Photo by Olav Ahrens Røtne on Unsplash
Monitor correct behavior in production
Data
Module FrontEnd
Monitor correct behavior in production
Introduce additional Quality Assurance layers
Data
Module FrontEnd
Output
QA
Monitor correct behavior in production
Goal:
Understand for every data point in
the front end if the quality is good
enough.
Data
Module FrontEnd
Output
QA
Monitor correct behavior in production
Model Accuracy:
0.89
Accuracy of the model used to estimate this value.
Revenue Base:
2.48 million €
Sum of the revenue of the transactions behind the measure.
Share of imputed observations:
0.0075
Share of the observations with imputation.
QPI QPI QPI QPI QPI
KPI
Price KPI1:
0.87
Removed Outliers:
0.03
Share of observations not considered due to outlier reduction
Price Variance last 5 weeks:
0.35
Variance of prices in the last 5 weeks.
Monitor correct behavior in production
Model Accuracy:
0.89
Accuracy of the model used to estimate this value.
Revenue Base:
2.48 million €
Sum of the revenue of the transactions behind the measure.
Share of imputed observations:
0.0075
Share of the observations with imputation.
QPI QPI QPI QPI QPI
KPI
If Removed Outliers > 0.04:
WARN
Rule
If Removed Outliers > 0.06:
BLOCK
Rule
If Model Accuracy < 0.75:
BLOCK
Rule
Price KPI1:
0.87
Removed Outliers:
0.03
Share of observations not considered due to outlier reduction
Price Variance last 5 weeks:
0.35
Variance of prices in the last 5 weeks.
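The three rules on this slide can be sketched as a small rule table evaluated against a data point's QPIs. The thresholds are the ones shown above; the names (`Status`, `Rule`, `evaluate`), the "most severe status wins" policy, and skipping rules whose QPI is missing are assumptions for illustration.

```python
from enum import IntEnum
from typing import Callable, Dict, List, NamedTuple


class Status(IntEnum):
    PASS = 0
    WARN = 1
    BLOCK = 2


class Rule(NamedTuple):
    qpi: str                           # name of the QPI the rule inspects
    violated: Callable[[float], bool]  # True if the rule fires
    status: Status                     # status assigned when it fires


RULES: List[Rule] = [
    Rule("removed_outliers", lambda v: v > 0.04, Status.WARN),
    Rule("removed_outliers", lambda v: v > 0.06, Status.BLOCK),
    Rule("model_accuracy", lambda v: v < 0.75, Status.BLOCK),
]


def evaluate(qpis: Dict[str, float], rules: List[Rule] = RULES) -> Status:
    """Return the most severe status triggered by any rule; PASS if none fire."""
    status = Status.PASS
    for rule in rules:
        value = qpis.get(rule.qpi)
        # Assumption: a rule is skipped when its QPI was not recorded.
        if value is not None and rule.violated(value):
            status = max(status, rule.status)
    return status


# QPI values taken from the slide: accuracy 0.89, removed outliers 0.03.
slide_qpis = {"model_accuracy": 0.89, "removed_outliers": 0.03}
print(evaluate(slide_qpis).name)
```

With the values from the slide no rule fires, so the data point passes; raising the removed-outlier share to 0.05 would yield WARN, and 0.07 would yield BLOCK.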
Monitor correct behavior in production
Data
Module FrontEnd
Output
QA
Output of the QA-System
Data Module FrontEnd
Output
QA
              BLOCK    WARN     PASS
Transactions  0.2 %    0.1 %    99.7 %
Revenue       0.8 %    2.8 %    96.4 %
Sales units   1.1 %    3.7 %    95.2 %
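A summary like the one above could be produced by weighting each data point's QA status by transactions, revenue, or sales units. The sketch below assumes every data point already has a status; the rows and their numbers are invented for illustration and are not the figures from the slide.

```python
from collections import defaultdict
from typing import Dict, List


def status_shares(rows: List[dict], weight_key: str) -> Dict[str, float]:
    """Percentage of the total weight that falls into each QA status."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["status"]] += row[weight_key]
    grand_total = sum(totals.values())  # assumed to be non-zero
    return {s: 100.0 * totals[s] / grand_total for s in ("BLOCK", "WARN", "PASS")}


# Hypothetical data points with their QA status and weights.
rows = [
    {"status": "PASS", "transactions": 90, "revenue": 800.0},
    {"status": "WARN", "transactions": 8, "revenue": 150.0},
    {"status": "BLOCK", "transactions": 2, "revenue": 50.0},
]
for key in ("transactions", "revenue"):
    print(key, status_shares(rows, key))
```

Reporting the same split under several weights shows whether the blocked data points are commercially negligible or carry a large share of revenue.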
What do we mean by scaling?
It’s not a technical problem:
- Not about running the modules on more data or faster
It’s a methodological challenge:
- If the method for building a churn model works for one of our e-commerce stores, could we
use it for our other stores as well?
- If a predictive maintenance model runs well for the machine in Plant A, could we use the
same approach for Plant B?
- If we can apply a module to the data for TVs in Germany, does it also work for washing
machines in France?
Let‘s scale…
Data Scientist
Data Scientist
Data Scientist Machine Learning
Engineer
Machine Learning
Engineer
Data Scientist
Let‘s scale…
Product Owner
We got great client feedback
for our MVP for the first
country. Could we scale the
product around the globe
now?
Data Scientist
Data Scientist
Data Scientist Machine Learning
Engineer
Machine Learning
Engineer
Data Scientist
Let‘s scale…
Data Scientist
Data Scientist
Data Scientist Machine Learning
Engineer
Machine Learning
Engineer
Data Scientist
Product Owner
We got great client feedback
for our MVP for the first
country. Could we scale the
product around the globe
now?
Let‘s scale…
Data Scientist
Data Scientist
Data Scientist Machine Learning
Engineer
Machine Learning
Engineer
Data Scientist
Product Owner
„It seems like the pipeline
does not run through for
45 % of the countries. And
the error messages differ
between them.“
We got great client feedback
for our MVP for the first
country. Could we scale the
product around the globe
now?
Let‘s scale…
Data Scientist
Data Scientist
Data Scientist Machine Learning
Engineer
Machine Learning
Engineer
Data Scientist
Product Owner
„It seems like the pipeline
does not run through for
45 % of the countries. And
the error messages differ
between them.“
„The outlier detection seems
to be too strict for smaller
countries. We basically filter
out everything here.“
„We see different market
behaviour in the Asian
market. Our approach does
not work there.“
„Data Quality really differs
between the markets.“
We got great client feedback
for our MVP for the first
country. Could we scale the
product around the globe
now?
What happened?
▪ Solution „overfitted“ to suit a subset of the total data
▪ Set-up of a dedicated scaling team required
▪ Too many versions for different data sources
▪ Overall process has been time-consuming
Data Scientist
Data Scientist
Data Scientist Machine Learning
Engineer
Machine Learning
Engineer
Data Scientist
Let‘s scale…
Data Scientist
Data Scientist
Data Scientist Machine Learning
Engineer
Machine Learning
Engineer
Data Scientist
Product Owner
„OK, we have provided the
results now for 350 category-
country combinations over 60
weeks – could you please
evaluate the results?“
?!?
What happened?
▪ Solution „overfitted“ to suit a subset of the total data
▪ Set-up of a dedicated scaling team required
▪ Too many versions for different data sources
▪ Overall process has been time-consuming
▪ The “human”-evaluation process did not scale
Data Scientist
Data Scientist
Data Scientist Machine Learning
Engineer
Machine Learning
Engineer
Data Scientist
How to solve the
dilemma?
Photo by Olav Ahrens Røtne on Unsplash
✓Consider Scaling During Research
✓Automate Evaluation Of New Data
Photo by Olav Ahrens Røtne on Unsplash
Consider Scaling during Research
▪ „Broad“ data samples available during development
Consider Scaling during Research
▪ „Broad“ data samples available during development
▪ Make Scaling a step in the KANBAN-process
▪ Use QPIs to check hypotheses/observations
Product backlog
Research
[KANBAN]
Product
vision
Iterative process
Daily Standup
Storytelling
Retrospective
Planning
Validation with
PM/Clients
Check
Scaling
Consider Scaling During Research
Examples
▪ Use QPIs to check assumptions/findings across the data landscape
▪ “I see that the data source does not contain negative values in column XY”
→ Make a QPI out of it
▪ “I see that my filter reduces 0.9% of the observations”
→ Make a QPI out of it
▪ “Using a Random Forest gives me an accuracy of 0.87”
→ Make a QPI out of it
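Turning such observations into QPIs could look like the sketch below: each finding becomes a small function that can be evaluated on every data source, not only on the one used during research. The function names and the two sample data sources are hypothetical.

```python
from typing import Dict, List, Sequence


def share_negative(values: Sequence[float]) -> float:
    """QPI for 'column XY contains no negative values'."""
    return sum(1 for v in values if v < 0) / len(values)


def share_filtered(n_before: int, n_after: int) -> float:
    """QPI for 'my filter removes x % of the observations'."""
    return (n_before - n_after) / n_before


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """QPI for 'the model reaches an accuracy of x'."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data landscape: the assumption checked on one source
# is evaluated on every source.
data_landscape: Dict[str, List[float]] = {
    "TVs / Germany": [199.0, 249.0, 399.0, 179.0],
    "Washing machines / France": [349.0, -20.0, 429.0, 389.0],
}
negative_share = {name: share_negative(v) for name, v in data_landscape.items()}
print(negative_share)
print(share_filtered(1000, 991))
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

In this made-up example the "no negative values" assumption holds for the first source but not for the second, which is the kind of difference that otherwise surfaces only when the pipeline is rolled out.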
Consider Scaling during Research
▪ „Broad“ data samples available during development
▪ Make Scaling a step in the KANBAN-process
▪ Use QPI rules to check hypotheses/observations
Product backlog
Research
[KANBAN]
Product
vision
Iterative process
Daily Standup
Storytelling
Retrospective
Planning
Validation with
PM/Clients
Check
Scaling
✓Consider Scaling During Research
✓Automate Evaluation Of New Data
Photo by Olav Ahrens Røtne on Unsplash
Automate Evaluation Of New Data
One Country/Category - Recap
Data Module
FrontEnd
Module
Output
QA
              BLOCK    WARN     PASS
Transactions  0.2 %    0.1 %    99.7 %
Revenue       0.8 %    2.8 %    96.4 %
Sales units   1.1 %    3.7 %    95.2 %
Automate Evaluation Of New Data
Data Module
FrontEnd
Module
Output
QA
              BLOCK    WARN     PASS
Transactions  0.2 %    0.1 %    99.7 %
Revenue       0.8 %    2.8 %    96.4 %
Sales units   1.1 %    3.7 %    95.2 %
READY FOR
USAGE
NEEDS
REFINEMENT
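Sorting country/category combinations into "ready for usage" and "needs refinement" could be automated roughly as follows. The slides do not state the decision criterion; the sketch assumes a maximum tolerated share of blocked revenue, and both the threshold and the sample figures are hypothetical.

```python
from typing import Dict, List, Tuple

Combo = Tuple[str, str]  # (category, country)


def triage(block_share: Dict[Combo, float],
           max_block_share: float = 1.0) -> Tuple[List[Combo], List[Combo]]:
    """Split combinations by their share of blocked revenue (in percent)."""
    ready, refine = [], []
    for combo, share in sorted(block_share.items()):
        # Assumption: at or below the threshold counts as ready for usage.
        (ready if share <= max_block_share else refine).append(combo)
    return ready, refine


# Hypothetical QA output per combination: percentage of revenue with status BLOCK.
block_share = {
    ("TV", "DE"): 0.2,
    ("TV", "FR"): 4.5,
    ("Washing Machines", "FR"): 1.0,
}
ready, refine = triage(block_share)
print("READY FOR USAGE:", ready)
print("NEEDS REFINEMENT:", refine)
```

Only the combinations in the second list would then need a person to look at them, which is the point of automating the evaluation.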
Development backlog
Product backlog
Engineering & Integration
Research
Scaling
Triage and process feedback from
clients and industry experts
Maintenance & Support
[KANBAN]
[SCRUM]
Scaling Backlog/
Data Landscape
Module Execution Output QA-Check
Develop, test,
release
Develop, test,
release
Product
vision
Iterative process
Daily Standup
Storytelling
Retrospective
Planning
Needs refinement for
Scaling Backlog
Requirements
prioritization,
acceptance criteria,
estimation
Validation with
PM/Clients,
acceptance criteria
(refinement)
Check
Scaling
2019
Data Scientist
Data Scientist
Data Scientist
Scrum Master
Product Owner
Machine Learning
Engineer
Software Engineer
Client
Software Engineer
Data Scientist
Product Owner
Machine Learning
Engineer
2021?
▪ How do QPIs work in an online calculation mode?
▪ Limits of a rule-based quality evaluation?
Questions?
Germany
Traue, René
Senior Data Scientist
+49 911 395 2381
rene.traue@gfk.com
Germany
Lindenlaub, Christian
Data Scientist
+49 911 395 4371
christian.Lindenlaub@gfk.com

Data Science Development Lifecycle - Everyone Talks About it, Nobody Really Knows How to Do it and Everyone Else is Doing It - Christian Lindenlaub and Rene Traue

  • 1.
    DATA SCIENCE DEVELOPMENT LIFECYCLE: EVERYONE TALKS ABOUT IT, NOBODY REALLY KNOWS HOW TO DO IT AND EVERYONE THINKS EVERYONE ELSE IS DOING IT. René Traue, Christian Lindenlaub, Global Data Science
  • 2.
    © GfK 20-Nov-19 ▪ Data Science Development Lifecycle | René Traue, Christian Lindenlaub | Predictive Analytics World Business Berlin 2019. Data Science Development Lifecycle @ GfK
  • 3.
  • 4.
    LET’S TAKE YOU ON A JOURNEY (Photo by Dino Reichmuth on Unsplash)
  • 5.
    Team: Data Scientist (×4)
  • 6.
  • 7.
    GfK Point of Sales Data: World's no. 1 provider for data on consumer technology products. 90+ countries: global coverage and local market expertise. Point-of-sale data from >425,000 sample shops worldwide. 14.5 million SKUs in our database, 1.5 million new SKUs per year. Channels (online & offline). 700+ product groups tracked and 15,000 features. Volume, value, features, pricing, distribution …
  • 8.
    Is the equation that simple? = Data Scientist (×4)
  • 9.
    2017. Team: Data Scientist (×4)
  • 10.
    2019. Team: Data Scientist (×4), Scrum Master, Product Owner (×2), Machine Learning Engineer (×2), Software Engineer (×2), Client
  • 11.
    2019. Team: Data Scientist (×4), Scrum Master, Product Owner (×2), Machine Learning Engineer (×2), Software Engineer (×2), Client
  • 12.
  • 13.
    As we are building software – use Scrum. Diagram: Product vision, Product backlog. Team: Data Scientist (×4), Product Owner
  • 14.
    As we are building software – use Scrum. Diagram: Product vision, Product backlog, Research & Development [SCRUM]. Team: Data Scientist (×4), Scrum Master, Product Owner
  • 15.
    As we are building software – use Scrum. Planning Meeting. Ticket: “As a user, I want to know …” Description: Step 1: Try approach A. Step 2: Evaluate results. … Definition of Done: … Acceptance Criteria: … Numbers shown on the slide: 38 5 3. Team: Data Scientist (×4), Scrum Master, Product Owner
  • 16.
    Two days later… (Photo by chuttersnap on Unsplash)
  • 17.
    Re-Specification II: another Planning Meeting. Ticket: “As a user, I want to know …” Description: Step 1: Try approach B. Step 2: Evaluate results. … Definition of Done: … Acceptance Criteria: … Numbers shown on the slide: 131 5 8. Team: Data Scientist (×4), Scrum Master, Product Owner
  • 18.
    One day later… (Photo by chuttersnap on Unsplash)
  • 19.
    Re-Specification III: another Planning Meeting. Ticket: “As a user, I want to know …” Description: Step 1: Try approach C. Step 2: Evaluate results. … Definition of Done: … Acceptance Criteria: … Numbers shown on the slide: 88 8 3. Team: Data Scientist (×4), Scrum Master, Product Owner
  • 20.
    Lost in documentation … “Individuals and interactions over processes and tools” ?!? Team: Data Scientist (×4), Scrum Master, Product Owner
  • 21.
    “Data Scientist work estimation: Building a plane vs. exploring what flying is” (Photo by Ross Parmly on Unsplash)
  • 22.
    We are doing research! Speech bubble: “We are doing Research!” Team: Data Scientist (×4)
  • 23.
    We are doing research! Let’s have fewer formalities … and be less specific with the tickets. Speech bubble: “We are doing Research!” Board columns: In Progress, Testing, Done. Ticket: “As a user, I want to know …” Description: Try approach ABC. Acceptance criteria: Result has to be negative... Team: Data Scientist (×4)
  • 24.
    Happy Data Scientists, but unhappy stakeholders. “How far are we?” “When can we release?” Team: Data Scientist (×4), Scrum Master, Product Owner
  • 25.
    How to solve the dilemma? (Photo by Olav Ahrens Røtne on Unsplash)
  • 26.
    How to solve the dilemma? Research & Development SCRUM
  • 27.
    Research ≠ Production. Diagram: Research; Engineering & Integration
  • 28.
    Research ≠ Production: use Kanban in Research. Diagram: Product vision; Product backlog (requirements prioritization, basic acceptance criteria, risk assessment, estimation); Research [KANBAN]; Engineering & Integration
  • 29.
    Research ≠ Production: use Kanban in Research. Diagram: Product vision; Product backlog; Research [KANBAN] with board columns ToDo, In Progress, Testing, Done; Engineering & Integration
  • 30.
    Research ≠ Production: use Kanban in Research. Diagram: Product vision; Product backlog; Research [KANBAN] with board columns ToDo, In Progress, Testing, Done; validate results with PM/Clients, acceptance criteria (refinement); Engineering & Integration
  • 31.
    Research ≠ Production: use Kanban in Research. Diagram: Product vision; Product backlog; Research [KANBAN] with board columns ToDo, In Progress, Testing, Done; iterative process; validate results with PM/Clients, acceptance criteria (refinement); Engineering & Integration
  • 32.
    Research ≠ Production: create transparency within Kanban mode. Diagram: Product vision; Product backlog; Research [KANBAN]; iterative process with Daily Standup, Storytelling, Retrospective, Planning; validate results with PM/Clients, acceptance criteria (refinement); Engineering & Integration
  • 33.
    Research ≠ Production: let’s integrate the prototype. Diagram: Product vision; Product backlog; Research [KANBAN]; iterative process with Daily Standup, Storytelling, Retrospective, Planning; validate results with PM/Clients, acceptance criteria (refinement); Development backlog; Engineering & Integration
  • 34.
  • 35.
    “Software is built by Software Engineers”. Team: Data Scientist (×4) + Software Engineer (×2)
  • 36.
    Team: Data Scientist (×4), Software Engineer (×2)
  • 37.
    Team: Data Scientist (×4), Software Engineer (×2)
  • 38.
    How to solve the dilemma? (Photo by Olav Ahrens Røtne on Unsplash)
  • 39.
    Adapt team setup: add Machine Learning Engineers to the team. Team: Software Engineer (×2), Data Scientist (×4), Machine Learning Engineer (×2)
  • 40.
    Research ≠ Production: let’s productionize the prototype. Diagram: Product vision; Product backlog; Research [KANBAN]; iterative process with Daily Standup, Storytelling, Retrospective, Planning; validate results with PM/Clients, acceptance criteria (refinement); Development backlog; Engineering & Integration [SCRUM]: develop, test, release
  • 41.
  • 42.
    RELEASE PARTY: MVP. Team: Data Scientist (×4), Scrum Master, Product Owner (×2), Machine Learning Engineer (×2), Software Engineer (×2)
  • 43.
    And now? MVP. Team: Data Scientist (×4), Scrum Master, Product Owner (×2), Machine Learning Engineer (×2), Software Engineer (×2)
  • 44.
    Clients start using it
  • 45.
    Feedback and questions arrive … PRESCRIPTIVE ANALYTICS SOLUTION: Price KPI 1 in November 2019: 0.87. Client: “Price KPI 1 seems pretty high…”
  • 46.
    Feedback and questions arrive … “Price KPI 1 seems pretty high.” “Why is item XY not shown in week 48?” “How to interpret this number over time?” “How is this value exactly calculated?” Team: Data Scientist (×4)
  • 47.
    And the running system needs to be monitored. Product Owner: “We generated new results for October – please evaluate if they still make sense!” Team: Data Scientist (×4)
  • 48.
    Incident-based handling… ▪ Every request == exploration task ▪ No data-driven monitoring/debugging besides model diagnostics. Requests: “We generated new results for October – please evaluate if they still make sense!” “Price KPI 1 seems too big…” “Why is item XY not shown in week 48?” “How to interpret this number over time?” “How is this value exactly calculated?”
  • 49.
    …while development goes on. Product backlog. Requests: “We generated new results for October – please evaluate if they still make sense!” “Price KPI 1 seems too big…” “Why is item XY not shown in week 48?” “How to interpret this number over time?” “How is this value exactly calculated?” Team: Data Scientist (×4)
  • 50.
    Incident-based Quality Assurance ▪ Torn between development and production ▪ Expensive exploration tasks to answer requests ▪ Development constantly interrupted by urgent feedback tasks. Team: Data Scientist (×4)
  • 51.
    How to solve the dilemma? (Photo by Olav Ahrens Røtne on Unsplash)
  • 52.
    Establish a Quality Assurance system to ✓ Answer client requests ✓ Monitor behavior in production (Photo by Olav Ahrens Røtne on Unsplash)
  • 53.
    Quality Assurance System ▪ The quality assurance is based on metadata called Quality Performance Indicators [QPIs]: - “QPIs are measures that quantify module behavior” - QPIs are calculated during pipeline execution and help to persist intermediate results - Every data point in the front end has its particular set of QPIs attached
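A minimal Python sketch of how a QPI could be recorded during pipeline execution and attached to a front-end data point. The slides do not show an implementation, so the names `DataPoint`, `outlier_step` and `build_price_kpi`, the price bounds and the mean-price placeholder KPI are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from statistics import pvariance
from typing import Dict, List


@dataclass
class DataPoint:
    """One value shown in the front end, plus the QPIs recorded for it."""
    name: str
    value: float
    qpis: Dict[str, float] = field(default_factory=dict)


def outlier_step(prices: List[float], low: float, high: float,
                 qpis: Dict[str, float]) -> List[float]:
    """Hypothetical pipeline module: drop prices outside [low, high]
    and record the share of removed observations as a QPI."""
    if not prices:
        raise ValueError("no observations")
    kept = [p for p in prices if low <= p <= high]
    qpis["removed_outliers"] = 1 - len(kept) / len(prices)
    return kept


def build_price_kpi(prices: List[float], low: float, high: float) -> DataPoint:
    """Run the module and attach every recorded QPI to the result."""
    qpis: Dict[str, float] = {}
    kept = outlier_step(prices, low, high, qpis)
    if not kept:
        raise ValueError("all observations were removed as outliers")
    qpis["price_variance"] = pvariance(kept)
    kpi = sum(kept) / len(kept)  # placeholder KPI (mean price), not GfK's formula
    return DataPoint("Price KPI 1", kpi, qpis)
```

Because the QPIs travel with the data point, a later question about a single value can be answered from `qpis` without re-running an exploration task.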
  • 54.
    Establish a Quality Assurance system to ✓ Answer client requests ✓ Monitor behavior in production (Photo by Olav Ahrens Røtne on Unsplash)
  • 55.
    Answer client requests. PRESCRIPTIVE ANALYTICS SOLUTION: Price KPI 1 in November 2019: 0.87. Client: “Price KPI 1 seems pretty high…”
  • 56.
    Quality Assurance System. PRESCRIPTIVE ANALYTICS SOLUTION: Price KPI 1 in November 2019: 0.87. KPI: Price KPI 1: 0.87
  • 57.
    Quality Assurance System. KPI: Price KPI 1: 0.87. QPIs: Model Accuracy: 0.89 (accuracy of the model used to estimate this value). Revenue Base: €2.48 million (sum of revenue of the transactions behind the measure). Share of imputed observations: 0.0075 (share of the observations with imputation). Variance last 5 weeks: 0.35 (variance a measure has over the last n observations). Removed Outliers: 0.03 (share of observations not considered due to outlier reduction)
  • 58.
    Quality Assurance System. KPI: Price KPI 1: 0.87. QPIs: Model Accuracy: 0.89 (accuracy of the model used to estimate this value). Revenue Base: €2.48 million (sum of revenue of the transactions behind the measure). Share of imputed observations: 0.0075 (share of the observations with imputation). Price Variance last 5 weeks: 0.35 (variance of prices in the last 5 weeks). Removed Outliers: 0.03 (share of observations not considered due to outlier reduction)
  • 59.
    Quality Assurance System. PRESCRIPTIVE ANALYTICS SOLUTION: Price KPI 1 in November 2019: 0.87. Client: “Price KPI 1 seems pretty high…”
  • 60.
    Quality Assurance System. Client: “Price KPI 1 seems pretty high…” Data Scientist: “You are right – we checked the results and it’s caused by a high variance in the prices over the last months.” QPI: Price Variance last 5 weeks: 0.35 (variance of prices in the last 5 weeks)
  • 61.
    Quality Assurance System. Client: “Price KPI 1 seems pretty high…” Data Scientist: “You are right – we checked the results and it’s caused by a high variance in the prices over the last months.” QPI: Price Variance last 5 weeks: 0.35 (variance of prices in the last 5 weeks). Data Scientist: “We also see a super high promotion intensity on this item. The Price KPI is strongly influenced by changes in the promotion patterns.” QPI: Promo Intensity: 0.89 (share of sales under promotion)
  • 62.
    Quality Assurance System. How to approach user/client requests ▪ Increase share of client requests answerable by QPIs. Data Scientist: “You are right – we checked the results and it’s caused by a high variance in the prices over the last months.” “We also see a super high promotion intensity on this item. The Price KPI is strongly influenced by changes in the promotion patterns.” QPI labels cut off on the slide: Price Varia, Sh
  • 63.
    Quality Assurance System. How to approach user/client requests ▪ Increase share of client requests answerable by QPIs ▪ Free up development resources by including operations. Operations: “You are right – we checked the results and it’s caused by a high variance in the prices over the last months.” “We also see a super high promotion intensity on this item. The Price KPI is strongly influenced by changes in the promotion patterns.” QPI labels cut off on the slide: Price Varia, Sh
  • 64.
    Quality Assurance System. How to approach user/client requests ▪ Increase share of client requests answerable by QPIs ▪ Free up development resources by including operations ▪ Self-service evidence functionality. Front-end labels cut off on the slide: PRESCRIP, Price KPI. Evidence text: “This value is caused by a high variance in the prices over the last months.” “We also see a super high promotion intensity on this item. The Price KPI is strongly influenced by changes in the promotion patterns.”
  • 65.
    Establish a Quality Assurance system to ✓ Answer client requests ✓ Monitor behavior in production (Photo by Olav Ahrens Røtne on Unsplash)
  • 66.
    Monitor correct behavior in production. Diagram: Data, Module, FrontEnd
  • 67.
    Monitor correct behavior in production: introduce additional Quality Assurance layers. Diagram: Data, Module, FrontEnd, Output QA
  • 68.
    Monitor correct behavior in production. Goal: Understand for every data point in the front end if the quality is good enough. Diagram: Data, Module, FrontEnd, Output QA
  • 69.
    Monitor correct behavior in production. KPI: Price KPI 1: 0.87. QPIs: Model Accuracy: 0.89 (accuracy of the model used to estimate this value). Revenue Base: €2.48 million (sum of revenue of the transactions behind the measure). Share of imputed observations: 0.0075 (share of the observations with imputation). Removed Outliers: 0.03 (share of observations not considered due to outlier reduction). Price Variance last 5 weeks: 0.35 (variance of prices in the last 5 weeks)
  • 70.
    Monitor correct behavior in production. KPI: Price KPI 1: 0.87. QPIs: Model Accuracy: 0.89 (accuracy of the model used to estimate this value). Revenue Base: €2.48 million (sum of revenue of the transactions behind the measure). Share of imputed observations: 0.0075 (share of the observations with imputation). Removed Outliers: 0.03 (share of observations not considered due to outlier reduction). Price Variance last 5 weeks: 0.35 (variance of prices in the last 5 weeks). Rules: If Removed Outliers > 0.04: WARN. If Removed Outliers > 0.06: BLOCK. If Model Accuracy < 0.75: BLOCK
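The three rules on this slide can be read as a small rule table. The following Python sketch uses the thresholds from the slide; everything else (the `Rule` type, the QPI key names, the "most severe status wins" policy, and treating a missing QPI as "no rule fires") is an assumption for illustration, not the authors' implementation.

```python
import operator
from typing import Callable, Dict, List, NamedTuple


class Rule(NamedTuple):
    qpi: str                                  # QPI the rule looks at
    compare: Callable[[float, float], bool]   # e.g. operator.gt
    threshold: float
    status: str                               # "WARN" or "BLOCK"


# Thresholds as shown on the slide; key names are hypothetical.
RULES: List[Rule] = [
    Rule("removed_outliers", operator.gt, 0.04, "WARN"),
    Rule("removed_outliers", operator.gt, 0.06, "BLOCK"),
    Rule("model_accuracy", operator.lt, 0.75, "BLOCK"),
]

SEVERITY = {"PASS": 0, "WARN": 1, "BLOCK": 2}


def evaluate(qpis: Dict[str, float], rules: List[Rule] = RULES) -> str:
    """Return the most severe status triggered by any rule, else PASS."""
    status = "PASS"
    for rule in rules:
        value = qpis.get(rule.qpi)
        if value is None:
            continue  # assumption: a missing QPI triggers no rule
        fired = rule.compare(value, rule.threshold)
        if fired and SEVERITY[rule.status] > SEVERITY[status]:
            status = rule.status
    return status
```

With the QPI values shown on the slide (Removed Outliers 0.03, Model Accuracy 0.89) no rule fires, so the data point would pass.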
  • 71.
    Monitor correct behavior in production. Diagram: Data, Module, FrontEnd, Output QA
  • 72.
    Output of the QA system. Diagram: Data, Module, FrontEnd, Output QA. Share by QA status (BLOCK / WARN / PASS): Transactions 0.2 % / 0.1 % / 99.7 %; Revenue 0.8 % / 2.8 % / 96.4 %; Sales units 1.1 % / 3.7 % / 95.2 %
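A table like the one on this slide could be produced by weighting each data point's QA status by a measure such as transactions or revenue. This is a hedged Python sketch; the record layout (`status`, `transactions`, `revenue`) and the function name `status_shares` are hypothetical, and the example figures are invented, not the slide's data.

```python
from collections import defaultdict
from typing import Dict, List

STATUSES = ("BLOCK", "WARN", "PASS")


def status_shares(records: List[dict], measure: str) -> Dict[str, float]:
    """Share of `measure` (e.g. revenue) that falls into each QA status."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["status"]] += record[measure]
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("measure sums to zero; shares are undefined")
    return {s: totals.get(s, 0.0) / grand_total for s in STATUSES}


# Invented example records, one per front-end data point.
records = [
    {"status": "PASS", "transactions": 90, "revenue": 50.0},
    {"status": "WARN", "transactions": 8, "revenue": 30.0},
    {"status": "BLOCK", "transactions": 2, "revenue": 20.0},
]
by_transactions = status_shares(records, "transactions")
by_revenue = status_shares(records, "revenue")
```

Reporting the same statuses under several measures shows, as on the slide, that a small share of blocked transactions can still carry a larger share of revenue.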
  • 73.
  • 74.
    What do we mean by scaling? It’s not a technical problem: - Not technically running the modules with more data/faster. It’s a methodological challenge: - If the method to generate a churn model works for one of our e-commerce stores, could we use this for our other stores as well? - If a predictive maintenance model runs well for the machine in Plant A, could we use the same approach for Plant B? - If we can apply a module on the data for TVs in Germany, does it also work for washing machines in France?
  • 75.
    Let’s scale… Team: Data Scientist (×4), Machine Learning Engineer (×2)
  • 76.
    Let’s scale… Product Owner: “We got great client feedback for our MVP for the first country. Could we scale the product around the globe now?” Team: Data Scientist (×4), Machine Learning Engineer (×2)
  • 77.
    Let’s scale… Product Owner: “We got great client feedback for our MVP for the first country. Could we scale the product around the globe now?” Team: Data Scientist (×4), Machine Learning Engineer (×2)
  • 78.
    Let’s scale… Product Owner: “We got great client feedback for our MVP for the first country. Could we scale the product around the globe now?” Team: “It seems like the pipeline does not run through for 45 % of the countries. And the error messages differ between them.” Team: Data Scientist (×4), Machine Learning Engineer (×2)
  • 79.
    Let’s scale… Product Owner: “We got great client feedback for our MVP for the first country. Could we scale the product around the globe now?” Team: “It seems like the pipeline does not run through for 45 % of the countries. And the error messages differ between them.” “The outlier detection seems to be too strict for smaller countries. We basically filter out everything here.” “We see different market behaviour in the Asian market. Our approach does not work there.” “Data quality really differs between the markets.” Team: Data Scientist (×4), Machine Learning Engineer (×2)
  • 80.
    What happened? ▪ Solution “overfitted” to suit a subset of the total data ▪ Set-up of a dedicated scaling team required ▪ Too many versions for different data sources ▪ Overall process has been time-consuming. Team: Data Scientist (×4), Machine Learning Engineer (×2)
  • 81.
    Let’s scale… Product Owner: “OK, we have now provided the results for 350 category-country combinations over 60 weeks – could you please evaluate the results?” Team: ?!? Team: Data Scientist (×4), Machine Learning Engineer (×2)
  • 82.
    What happened? ▪ Solution “overfitted” to suit a subset of the total data ▪ Set-up of a dedicated scaling team required ▪ Too many versions for different data sources ▪ Overall process has been time-consuming ▪ The “human” evaluation process did not scale. Team: Data Scientist (×4), Machine Learning Engineer (×2)
  • 83.
    How to solve the dilemma? (Photo by Olav Ahrens Røtne on Unsplash)
  • 84.
    ✓ Consider scaling during research ✓ Automate evaluation of new data (Photo by Olav Ahrens Røtne on Unsplash)
  • 85.
    ✓ Consider scaling during research ✓ Automate evaluation of new data (Photo by Olav Ahrens Røtne on Unsplash)
  • 86.
    Consider scaling during research ▪ “Broad” data samples available during development
  • 87.
    Consider scaling during research ▪ “Broad” data samples available during development ▪ Make scaling a step in the Kanban process ▪ Use QPIs to check hypotheses/observations. Diagram: Product vision; Product backlog; Research [KANBAN]; iterative process with Daily Standup, Storytelling, Retrospective, Planning; validation with PM/Clients; Check Scaling
  • 88.
    Consider scaling during research: examples ▪ Use QPIs to check assumptions/findings across the data landscape ▪ “I see that the data source does not contain negative values in column XY” → Make a QPI out of it ▪ “I see that my filter removes 0.9 % of the observations” → Make a QPI out of it ▪ “Using a Random Forest gives me an accuracy of 0.87” → Make a QPI out of it
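The "make a QPI out of it" examples can be sketched in Python as small functions that turn a one-off observation into a number, which is then checked across many country/category datasets. The function names, the dataset keys and the `holds` predicate are hypothetical; the slides do not describe how the checks are implemented.

```python
from typing import Callable, Dict, List


def share_negative(values: List[float]) -> float:
    """QPI for 'column XY contains no negative values': share of negatives."""
    return sum(1 for v in values if v < 0) / len(values)


def share_filtered(n_before: int, n_after: int) -> float:
    """QPI for 'my filter removes x % of the observations'."""
    return (n_before - n_after) / n_before


def check_across_landscape(
    datasets: Dict[str, List[float]],
    qpi: Callable[[List[float]], float],
    holds: Callable[[float], bool],
) -> Dict[str, float]:
    """Compute one QPI per dataset and return those where the
    research-time assumption does not hold."""
    violations: Dict[str, float] = {}
    for key, values in datasets.items():
        value = qpi(values)
        if not holds(value):
            violations[key] = value
    return violations


# Invented data: the assumption holds for one dataset but not the other.
datasets = {
    "TV/Germany": [1.0, 2.0],
    "Washing Machines/France": [1.0, -1.0, 2.0, 3.0],
}
violations = check_across_landscape(
    datasets, share_negative, holds=lambda share: share == 0.0
)
```

Here `violations` lists only the dataset where the "no negative values" finding breaks, which is the kind of signal a "Check Scaling" step would need.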
  • 89.
    Consider scaling during research ▪ “Broad” data samples available during development ▪ Make scaling a step in the Kanban process ▪ Use QPI rules to check hypotheses/observations. Diagram: Product vision; Product backlog; Research [KANBAN]; iterative process with Daily Standup, Storytelling, Retrospective, Planning; validation with PM/Clients; Check Scaling
  • 90.
    ✓ Consider scaling during research ✓ Automate evaluation of new data (Photo by Olav Ahrens Røtne on Unsplash)
  • 91.
    Automate evaluation of new data: one country/category (recap). Diagram: Data, Module, FrontEnd, Module Output QA. Share by QA status (BLOCK / WARN / PASS): Transactions 0.2 % / 0.1 % / 99.7 %; Revenue 0.8 % / 2.8 % / 96.4 %; Sales units 1.1 % / 3.7 % / 95.2 %
  • 92.
    Automate evaluation of new data
  • 93.
    Automate evaluation of new data
  • 94.
    Automate evaluation of new data. Diagram: Data, Module, FrontEnd, Module Output QA
  • 95.
    Automate evaluation of new data. Diagram: Data, Module, FrontEnd, Module Output QA. Share by QA status (BLOCK / WARN / PASS): Transactions 0.2 % / 0.1 % / 99.7 %; Revenue 0.8 % / 2.8 % / 96.4 %; Sales units 1.1 % / 3.7 % / 95.2 %
  • 96.
    Automate evaluation of new data. Diagram: Data, Module, FrontEnd, Module Output QA. Share by QA status (BLOCK / WARN / PASS): Transactions 0.2 % / 0.1 % / 99.7 %; Revenue 0.8 % / 2.8 % / 96.4 %; Sales units 1.1 % / 3.7 % / 95.2 %. Outcome labels: READY FOR USAGE, NEEDS REFINEMENT
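One way the two outcome labels could be assigned automatically is to compare each dataset's PASS shares with a minimum threshold. The slides do not state the decision rule, so the 0.95 threshold, the "all measures must pass" policy and the function names in this Python sketch are assumptions.

```python
from typing import Dict


def readiness(pass_shares: Dict[str, float], min_pass_share: float = 0.95) -> str:
    """Label one country/category combination from its PASS shares.

    Assumption: every measure must reach `min_pass_share`.
    """
    if all(share >= min_pass_share for share in pass_shares.values()):
        return "READY FOR USAGE"
    return "NEEDS REFINEMENT"


def triage(results: Dict[str, Dict[str, float]],
           min_pass_share: float = 0.95) -> Dict[str, str]:
    """Label every country/category combination in one pass."""
    return {key: readiness(shares, min_pass_share)
            for key, shares in results.items()}


results = {
    # PASS shares from the slide's table.
    "combination A": {"transactions": 0.997, "revenue": 0.964, "sales_units": 0.952},
    # Invented second combination with a weak revenue share.
    "combination B": {"transactions": 0.990, "revenue": 0.900, "sales_units": 0.930},
}
labels = triage(results)
```

Combinations labelled NEEDS REFINEMENT would then be candidates for the scaling backlog instead of manual review of every result.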
  • 97.
    Full lifecycle diagram: Product vision; Product backlog (requirements prioritization, acceptance criteria, estimation); Research [KANBAN]; iterative process with Daily Standup, Storytelling, Retrospective, Planning; validation with PM/Clients, acceptance criteria (refinement); Check Scaling; Development backlog; Engineering & Integration [SCRUM]: develop, test, release; Scaling: Scaling Backlog/Data Landscape, Module Execution, Output QA-Check, needs refinement for Scaling Backlog; Maintenance & Support: triage and process feedback from clients and industry experts
  • 98.
  • 99.
    2019. Team: Data Scientist (×4), Scrum Master, Product Owner (×2), Machine Learning Engineer (×2), Software Engineer (×2), Client
  • 100.
    2021? ▪ What influence do QPIs have in an online-calculation mode? ▪ Limits of a rule-based quality evaluation?
  • 101.
    Questions? Contacts: René Traue, Senior Data Scientist, Germany, +49 911 395 2381, rene.traue@gfk.com; Christian Lindenlaub, Data Scientist, Germany, +49 911 395 4371, christian.Lindenlaub@gfk.com