Slides for three presentations at Coolblue's Behind the Scenes Data Science event on 2018-03-22
Speakers:
- Andres Martinez (Data Science @ Coolblue)
- Matthias Schuurmans (forecasts)
- Daan Marechal (recommendations)
Behind The Scenes Data Science Coolblue 2018-03-22
1. Andres Martinez | Manager Data Science | a.martinez@coolblue.nl | 22-03-2018
2. Agenda
● What Data Scientist means at Coolblue.
● Delivering data science solutions in an agile, data-driven company.
● Organization.
Data Science at Coolblue
3. Descriptive:
What is happening now based on incoming data.
Diagnostic:
What happened and why.
Predictive:
An analysis of likely scenarios of what might happen. The deliverables
are usually a predictive forecast.
Prescriptive:
This type of analysis reveals what actions should be taken.
Analytics outputs
4. ● … should contain the underlying dynamics of the process we want to predict.
● … is accurate enough to create scenarios and anticipate actions.
Where we focus our efforts
A good predictive model...
5. ● … should contain the underlying dynamics of the process we want to predict.
Drivers impact
A good predictive model...
Diagnosis
Prediction
8. ● … is accurate enough to create scenarios and anticipate actions.
Future scenarios and actions
A good predictive model...
today
Prediction
Prescription
n-people required
9. Our definition of Data Scientist
It is about implementation:
● Statistical analysis, model estimation, ...
● Industrialization: managing models' lifecycle at scale
10. Power is nothing without control
We take care...
● A model is only valid for a bounded period.
● Continuous monitoring and
adjustment.
22. The strength is in the team
Boosting performance!
● Appropriate tasks and responsibilities.
● A single individual is not enough: team
really matters!
● Knowledge sharing.
● There is not a single recipe.
23. The three components:
Build technical solutions
when there is value in it!
Flexible & agile organization
Data Science across
Coolblue through close
cooperation
Validation Implementation
Production
Exploration and validation: work in
domains/knowledge centers in close
cooperation with Business Analysis.
Collaboration
24. Problem understanding
Hypothesis creation
Data gathering
Feature engineering
Model selection and/or estimation
Model evaluation
Generalization
Implementation in production
Full stack DS vs. Pioneers
Core team
Data
Scientist
Satellite
Data
Scientist
25. Domain-a
Head of Tech
Manager
Data Science
Data Science Satellite 1
Team Lead Tech
Scrum team
Team Lead Tech
Scrum team
Data Science Satellite 2
Data Science Satellite m
Organization
Domain-b
Domain-a
Domain-b
Domain-c
Tech principles & scrum methodology Research & PoC
26. Andres Martinez | Manager Data Science | a.martinez@coolblue.nl | 22-03-2018
36. Shipments forecast
● Just enough people in the warehouses
● 3 warehouses: Parcel, XL and Whitegoods
● 3 horizons: 7 days, 14 days and 364 days
● Nice data
Context
47. Shipments forecast evaluation
● Cross Validation and KPIs
○ Percentage below 10% error
○ Root Mean Squared Error
○ Mean Absolute Percentage Error
● Extra attention for special cases
○ Christmas
● Interpretability / transparency
○ Effects of features
● Stability
Good forecast?
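The three KPIs listed above are straightforward to compute. A minimal sketch, where the function name and the example numbers are illustrative rather than Coolblue's actual implementation:

```python
import numpy as np

def forecast_kpis(actual, predicted):
    """Compute the three evaluation KPIs for one forecast series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ape = np.abs((actual - predicted) / actual)  # absolute percentage error per day
    return {
        "pct_below_10pct_error": np.mean(ape < 0.10),         # share of days within 10% error
        "rmse": np.sqrt(np.mean((actual - predicted) ** 2)),  # Root Mean Squared Error
        "mape": np.mean(ape),                                 # Mean Absolute Percentage Error
    }

# Example: a week of shipment counts vs. a forecast (made-up numbers)
kpis = forecast_kpis([100, 120, 90, 110, 105, 95, 130],
                     [102, 115, 95, 108, 100, 98, 140])
```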
63. Portfolio Demand forecast
Features Models
Trend Regularized Regression
Seasonality Neural Networks
Holidays Support Vector Regressors
Events Weighted Average
Lag targets MARS
Polynomials Decision Trees
Dummified Feature subsets
64. Good forecast?
Dealt with automatically:
○ Cross Validation and KPIs
○ Stability
Ability to investigate manually:
○ Extra attention for special cases
○ Interpretability / transparency
66. Forecasting
● Forecasting is very important for planning
● Pick the best model
○ Smart feature engineering
○ Relevant models and parameters
○ Grid search and decide based on error metrics, stability, transparency
● Calculate using best model every day
● Use cloud when appropriate
● Automate and monitor everything!
Summary
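The "pick a best model" step above can be sketched as a small holdout comparison: fit each candidate on a training window, score on the held-out tail, keep the lowest error. The two candidates here are hypothetical stand-ins for the real grid (regularized regression, neural networks, MARS, ...):

```python
import numpy as np

# Hypothetical candidates: each maps a name to a function that takes a
# training series and a horizon and returns a forecast of that length.
candidates = {
    "naive_last": lambda train, h: np.repeat(train[-1], h),
    # Weighted average of the last 7 points (assumes len(train) >= 7)
    "weighted_avg": lambda train, h: np.repeat(
        np.average(train[-7:], weights=range(1, 8)), h),
}

def pick_best_model(series, horizon):
    """Hold out the last `horizon` points and score each candidate by RMSE."""
    train, holdout = series[:-horizon], series[-horizon:]
    scores = {name: np.sqrt(np.mean((fit(train, horizon) - holdout) ** 2))
              for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

series = np.array([100, 105, 98, 110, 112, 108, 115, 118, 116, 120], dtype=float)
best, scores = pick_best_model(series, horizon=3)
```

In the real pipeline the same selection would also weigh stability and transparency, not just the error metric.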
69. Recommender Systems are software tools and techniques providing suggestions
for items to be of use to a user. The suggestions provided are aimed at supporting
their users in various decision-making processes.
Increase satisfaction and
boost sales
Pretty well known…
Recommender systems
78. What should we use!?
Nearest Neighbors
Decision Trees
Rule-based Classifiers
Bayesian Classifiers
Artificial Neural Networks
Support Vector Machines
Ensembles of Classifiers
Most popular and fundamental techniques used
Collaborative filtering, content-based filtering, data mining methods and context-aware
methods.
K-means
Other alternatives to K-means
Association Rule Mining
Other ad-hoc methods
Classification
Cluster analysis
Others
79. These are the typical features...
So, what do we have here?
● Gender
● Region
● Specified interests
● Purchase history
● etc.
82. Talking to our customers!?
A product sequence is like a phrase
83. This helps in deciding the model
● Several thousands products to be recommended
● It does not seem to depend on gender, region, etc.
● Each customer views a very personal set of products
● Try to respond with a new personal set of products
Brief summary after some analysis:
84. Recurrent Neural Network
Nearest Neighbors
Decision Trees
Rule-based Classifiers
Bayesian Classifiers
Artificial Neural Networks
Support Vector Machines
Ensembles of Classifiers
Among the possibilities
K-means
Other alternatives to K-means
Association Rule Mining
Other ad-hoc methods
Classification
Cluster analysis
Others
85. Let’s see what the literature says
● Not many papers about RNN and
recommender systems
● All papers are very recent: 2016, 2017
● Results are very promising, but there are no figures from real tests yet (only offline experiments).
86. We set up the benchmark
Still, we believe it's worth a try!
We are in the research phase… we could try a quick PoC.
For more information:
Cole MacLean, Barbara Garza, and Suren Oganesian. A recurrent neural network based subreddit recommendation
system. 2017.
87. Evaluation
Mean Average Precision @ k
● Average Precision @ k looks at a ranked set of k recommended items
● Checks whether the relevant item is in the recommended set
● Relevant item in position 5 of the top 5: AP@5 = 0.20
● Relevant item in position 1 of the top 5: AP@5 = 1
● Mean Average Precision @ k is the mean of all AP@k’s
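With a single relevant item per sequence, as in this setup, AP@k reduces to the reciprocal of that item's rank within the top k. A minimal sketch (function names and example items are illustrative):

```python
def average_precision_at_k(recommended, relevant_item, k=5):
    """AP@k with one relevant item: 1/position if it is in the top k, else 0."""
    top_k = recommended[:k]
    if relevant_item in top_k:
        return 1.0 / (top_k.index(relevant_item) + 1)
    return 0.0

def map_at_k(recommendation_lists, relevant_items, k=5):
    """Mean of AP@k over all (recommendation list, relevant item) pairs."""
    return sum(average_precision_at_k(recs, rel, k)
               for recs, rel in zip(recommendation_lists, relevant_items)) / len(relevant_items)

# The two cases from the slide: relevant item in position 5, and in position 1
ap_last = average_precision_at_k(["a", "b", "c", "d", "e"], "e")   # AP@5 = 0.20
ap_first = average_precision_at_k(["e", "b", "c", "d", "a"], "e")  # AP@5 = 1.0
```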
89. Computationally expensive
● 4 CPUs Locally: ~120 hrs (estimated)
● 16 vCPUs in the cloud: ~36 hours
● GPU (NVIDIA Tesla k80) in the cloud: ~8 hrs
● GPU (NVIDIA Tesla P100) in the cloud: ~4.5 hrs
About the timing.
92. TensorFlow & Google Cloud Platform
Some notes about the training
● Python
● For the PoC we have used Jupyter notebooks
● TensorFlow for the Neural Network
● All models have been trained in GCP Compute
Engine
93. ● Start from the beginning: how are we going to measure success?
● Understand your data
● What is the current status? Literature? Benchmark
● Offline test and fine tuning: PoC
● A/B testing
● Future steps for industrializing it: Core Team
Wrap-up
Summary
Create your own Fact/slogan here: https://coolblueblauwdruk.nl/en/huisstijl/feit-slogan-generator
Run Azkaban UAT STS
BQ:
SELECT
creation_timestamp,
warehouse_id,
DATE(forecast.timestamp) AS forecast_date,
forecast.value
FROM
[coolblue-bi-platform-uat:forecasts.short_term_shipments_with_reallocations]
WHERE
creation_timestamp > timestamp('2018-03-22T16:00:00')
ORDER BY
forecast.timestamp,
warehouse_id
Quick show of Data Science landscape dashboard, click through to short term shipments
Quick show of performance and stability tabs in Shiny
Quick show of the GFM design file, run, show output
Run DF calculate for 1k products, prepare well! Show table before, cluster upping, CPU usage, cluster downing, table after
BQ:
SELECT
creation_datetime,
forecast_start_date,
product_id,
value,
model_queue_id
FROM
[coolblue-bi-platform-dev:demand_forecast.forecast]
WHERE
creation_datetime > DATETIME('2018-03-22T14:00:00')
ORDER BY
product_id
Check Shiny individual product 638470 while cluster is upping
Run DF optimize for 5 products, prepare well! Show table before, cluster upping, CPU usage, cluster downing, table after
BQ:
SELECT
mq.product_id,
mq.model_queue_id,
mq.insert_datetime,
m.model_description
FROM
[coolblue-bi-platform-dev:demand_forecast.model_queue] mq
INNER JOIN
[coolblue-bi-platform-dev:demand_forecast.model] m
ON
m.model_id = mq.model_id
WHERE
mq.insert_datetime > DATETIME('2018-03-22T14:00:00')
Satellite: meaning that I work in the domains and explore the available data and models that we could use, depending on the goal. I make proof of concepts and once we can show what the added value of a model is, the Core team is going to productionize the model and automate all the processes needed. Today I am going to show you a recent project that I did, which is about creating a recommender system.
Suggestions → not necessarily personal
Research → personalized leads to higher customer satisfaction/loyalty → boost in sales
Increasing datasets → hot topics
Recommender systems have been a hot topic for a few decades now, and because of the ever-increasing datasets it is a very interesting problem for data scientists to solve. At Coolblue we have lots of data, so it is exciting to use this data to improve the customer experience. Once we create a good recommender system, we can make a big impact by targeting each and every customer individually!
At Coolblue we have 45,000 different products, which makes it difficult for customers to find exactly what they are looking for. There is a great variety of products to choose from, but customers only have limited time available to browse through all the options. It is therefore very important for us to show customers relevant products as early as possible in their customer journey. We have to think about solutions that help our customers find interesting products. For this reason we investigated the possibility of improving the current logic behind our personal recommendations.
At Coolblue we already have personalized recommendations. These recommendations are made by looking at recent behavior and purchases of customers. The recommendations that we are going to generate need to be better than the current recommendations. But.. what is better? How can we measure the performance of personal recommendations?
We can measure this by A/B testing. A/B testing is a useful tool to measure the performance of two different variations of webpages or e-mails. We have set up an A/B test within our weekly newsletter, sent by email marketing. For us right now it is FASTER to obtain results when testing it in the email domain. For a proof of concept, implementing the obtained personal recommendations on the website would be too difficult and would take up too much time. In the email A/B test, we send half of the customers the current recommendations and the other half our new personal recommendations. The main metric that we are looking at is the product click through rate. This is essentially the share of customers that click on a personal recommendation. We seek to increase this metric, meaning that the engagement / interaction with the e-mail will be higher. When this metric increases it also means that more customers will land on our website and therefore are more likely to buy a product.
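The CTR comparison described here is a standard two-proportion test. A sketch with made-up counts (not the real campaign numbers), using a normal-approximation z-test:

```python
from math import sqrt, erf

def ctr_z_test(clicks_a, sends_a, clicks_b, sends_b):
    """Two-proportion z-test on click-through rates (normal approximation)."""
    p_a, p_b = clicks_a / sends_a, clicks_b / sends_b
    p_pool = (clicks_a + clicks_b) / (sends_a + sends_b)   # pooled CTR under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / sends_a + 1 / sends_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
    return z, p_value

# Illustrative numbers only: current recommendations vs. new recommendations
z, p = ctr_z_test(clicks_a=500, sends_a=50_000, clicks_b=800, sends_b=50_000)
```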
So now I have discussed how we are going to measure the performance of personal recommendations. Let me jump forward in time and show you the A/B test itself...
This is the e-mail that we sent, as you can see, the only difference in the e-mail is the products that are recommended. The other parts of the e-mail are exactly the same in both variations.
As you can see, the results are extremely good! The performance of our recommendations is significantly better than that of the current logic. The CTR on the products increases by 60%, which is huge. Moreover, the people that click on the products also have a 5% higher probability of buying something when they land on the Coolblue website. This means that we provide high-quality recommendations that not only raise awareness of products, but also create desire: people actually want to buy the products that we recommend. With this A/B test we have proved the value of productionizing the model that we built.
Ok, so how did we do this?
OVERWHELMED First of all, we did some deep research. There are many well-known methods to recommend products to customers. The main techniques are collaborative filtering, content-based filtering and context-aware methods. I will not go into detail on these techniques, but I do want to mention that we can use classification techniques to predict the most relevant product for each customer. Next to that, we can use clustering techniques to group products in order to find similar products.
DO THESE FEATURES DRIVE RECOMMENDATIONS? Usually, recommender systems look at customer features. For instance, when looking at gender, the model is basically looking for products that are sold more often to females than to males and will then boost these products to female customers, and vice versa. The same happens for region, etc. It basically segments the customers into different groups, and each customer in a group gets recommended the same products. But are these features really good drivers to predict the right products for the right customer? Let's say we know that, according to her specifics, a customer is not likely to buy an Apple MacBook, maybe because we know that she always buys the less expensive products in a category. What happens if she is looking for laptops and it turns out that she is constantly looking at MacBooks? When we look at her features, we will not recommend her a MacBook, because she is not going to buy expensive products, right? But why would she look at the MacBooks then? She is telling us what she is interested in through her browsing behavior: a MacBook!
So, we have come up with a product sequence. The customer is interacting with us through the products that she sees on our website. This sequence is ordered so it means that we should be able to extract patterns and very insightful information from these sequences.
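Turning such ordered sessions into supervised examples can be sketched as "products seen so far → next product"; the session data below is hypothetical:

```python
def make_training_pairs(sessions):
    """Turn each ordered session into (prefix, next-product) training examples."""
    pairs = []
    for session in sessions:
        for i in range(1, len(session)):
            pairs.append((session[:i], session[i]))  # products seen so far -> next product
    return pairs

# Hypothetical sessions of product IDs
sessions = [[101, 102, 103], [200, 201]]
pairs = make_training_pairs(sessions)
# e.g. ([101, 102], 103) says: after viewing 101 and then 102, the customer viewed 103
```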
And then we have this. The model would generate a new sequence of products, which we can recommend.
This means that, in a very abstract way, we are trying to create a chatbot. The customers interact with this chatbot implicitly and the model produces new sentences, containing products. This is where natural language processing comes in. We can think of these sequences as sentences, but instead of words the sentences consist of product IDs. If we find a model from NLP, we should be able to input our sentences and the model would be able to extract relationships between words, which in our case are products. Models from NLP are also able to generate new sequences following the same patterns and logic based on the input data. Next to this, when we put in an unfinished sequence, it is able to finish the sequence!
So, to summarize: we have thousands of products that we can recommend in order to increase customer satisfaction and to create awareness of new products. After analyzing our data we noticed that the customer features did not drive which product is going to be bought in the future. So we need another way to personalize the content, which we do using the set of products seen within each session. These sequences are extremely personal and give us valuable information about the customer. They tell us which products are likely to be seen together, and by using millions of these sequences we are able to extract patterns and recreate customer flows.
Notice that we are dealing with a multiclass classification problem with as many classes as we have products, which makes artificial neural networks a natural candidate.
Next to that, by following the intuition I mentioned before, we are aiming for a model that can handle sequential data. This means that we should be able to use recurrent neural networks, which originate in NLP. Recurrent Neural Networks are extremely effective in modelling sequential data, which is what we need. They are capable of generating sequences following the same patterns.
Pattern recognition
Sequence modelling
Multi class classification
As many classes as products
Ok, so let's research what has already been done in recommender systems using Recurrent Neural Networks. It turns out they are actually not used a lot. We found a few papers discussing the use of recurrent neural networks in recommender systems, but only one unpublished article follows the same intuition as ours. The theoretical performance of the proposed model is promising, but it is only measured using offline experiments. This means the recommended products are not tested on real customers; they are just evaluated on a test set in the data.
The theoretical performance in this paper is based on a metric called Mean Average Precision@k.
This is a metric that measures the average precision of a set of predicted items. The set of predicted items is cut off at k, so it is only looking at the top k products in the prediction. Okay but what is average precision? It basically compares the set of recommended products with the actual relevant item. Notice that the model is trained to predict the next item in the sequence, meaning that there is only one relevant item per sequence. If this relevant product is in the recommended set of products the precision goes up by a certain amount. This amount is based on the position of this relevant product in the recommended set of products. This means that it matters in which order the recommended set of products is presented. Basically, when the relevant item is in the first position, the average precision is 1, when it is in the 5th position the average precision is 1/5. Then, the Mean Average Precision is the average of the average precisions of all sequences in the test set. We are going to compare our results with the MAP@5 in this paper since we are not going to recommend more than 5 products.
So, let me show you what a recurrent neural network looks like. This architecture reveals why recurrent neural networks are so effective at modelling sequences: they involve timesteps. In our case, we can input a product at each timestep and then predict the last product in the sequence. As mentioned before, this means that there is only one correct product. After we input a product at the first timestep, the recurrent layer computes what to let through to the next timestep. The next timestep then receives a new product, plus the valuable information from the previous timestep. This means that the network can remember long-term patterns. The output is a probability vector over the products, so during training we can compare the output vector with the actual next product. Using this feedback, the model learns to alter the parameters in the recurrent layers in order to improve its predictions.
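Since the deck names TensorFlow, the architecture described here (embedding, a recurrent layer carrying state across timesteps, and a softmax over all products) might be sketched as below; the layer sizes and sequence length are illustrative assumptions, not the talk's actual configuration.

```python
import tensorflow as tf

NUM_PRODUCTS = 45_000   # one output class per product
SEQ_LEN = 10            # padded session length (assumption)

# Embedding -> recurrent layer over timesteps -> softmax over all products
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=NUM_PRODUCTS, output_dim=64),
    tf.keras.layers.LSTM(128),                        # carries information across timesteps
    tf.keras.layers.Dense(NUM_PRODUCTS, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# A forward pass on a batch of two padded product-ID sequences yields one
# probability vector over all products per sequence.
probs = model(tf.zeros((2, SEQ_LEN), dtype=tf.int32))
```

Training would then feed padded prefixes of product IDs with the next product ID as the label.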
Okay, now that we have chosen a model and searched for a benchmark, it's time to train our own recurrent neural network. We train our model using 1 million sequences, which makes it computationally expensive. We started by experimenting locally on 4 CPUs, but soon found out that training a model long enough to become a good classifier would take approximately 120 hours. This is without fine-tuning parameters, so if we changed some parameters, we would have to wait 120 hours before seeing the effects. As you can understand, this is not the way to go. So we started computing in the cloud, and this significantly reduced the training time: on a GPU in the cloud it was reduced to 4.5 hours.
And this means we can fine-tune our models and see the effects way faster! As a result, this led to better models.
After experimenting and finetuning the model parameters we have obtained an MAP@5 of 0.091, which is very close to the paper discussed before. We do have to keep in mind however that our estimation is based on 20% more items than the paper, which makes it significantly harder to obtain the same MAP@5!
We have trained all models for the proof of concept using Jupyter notebooks. For the neural networks we used the TensorFlow package for Python. As mentioned before we trained the model using the Google Cloud Compute Engine.