Andres Martinez | Manager Data Science | a.martinez@coolblue.nl | 22-03-2018
Agenda
● What being a Data Scientist means at Coolblue.
● Delivering data science solutions in an agile, data-driven company.
● Organization.
Data Science at Coolblue
Descriptive:
What is happening now based on incoming data.
Diagnostic:
What happened and why.
Predictive:
An analysis of likely scenarios of what might happen. The deliverables
are usually a predictive forecast.
Prescriptive:
This type of analysis reveals what actions should be taken.
Analytics outputs
● … should contain the underlying dynamics of the process we want to predict.
● … is accurate enough to create scenarios and anticipate actions.
Where we focus our efforts
A good predictive model...
● … should contain the underlying dynamics of the process we want to predict.
Drivers impact
A good predictive model...
Diagnosis
Prediction
● … is accurate enough to create scenarios and anticipate actions.
Future scenarios and actions
A good predictive model...
today
Prediction
Prescription
n-people required
Our definition of Data Scientist
It is about implementation
Statistical Analysis, model
estimation,...
industrialization
managing models'
lifecycle at scale
Power is nothing without control
We take care...
● The period during which a model is
valid is always bounded.
● Continuous monitoring and
adjustment.
Two main responsibilities
On-time and precise
Accuracy and precision in our
predictive models
figures delivered on time
High expectations
Revenue of 1.2 billion in 2017
Generalize, monitor, orchestrate
Data Science Agile delivering at Coolblue
● Generalize the solution.
● Continuous deployment pipeline.
● Feedback adjustment and monitoring.
● Statistical summarization.
● Overarching logic to orchestrate the procedure.
Where to start?
Measure performance
We set up KPIs!
Outputs can be inputs as well
Avoid silos: inputs and outputs are connected across solutions.
Synergies between solutions
Manual and continuous efforts…
Well maintained path!
The strength is in the team
Boosting performance!
● Appropriate tasks and responsibilities.
● A single individual is not enough: team
really matters!
● Knowledge sharing.
● There is no single recipe.
The three components:
Build technical solutions
when there is value in it!
Flexible & agile organization
Data Science across
Coolblue through close
cooperation
Validation Implementation
Production
Exploration and validation: work in
domains/knowledge centers in close
cooperation with Business Analysis.
Collaboration
Problem understanding
Hypothesis creation
Data gathering
Feature engineering
Model selection and/or estimation
Model evaluation
Generalization
Implementation in production
Full stack DS vs. Pioneers
Core team
Data
Scientist
Satellite
Data
Scientist
Domain-a
Head of Tech
Manager
Data Science
Data Science Satellite 1
Team Lead Tech
Scrum team
Team Lead Tech
Scrum team
Data Science Satellite 2
Data Science Satellite m
Organization
Domain-b
Domain-a
Domain-b
Domain-c
Tech principles & scrum methodology Research & PoC
Matthias Schuurmans | Data Scientist | m.schuurmans@coolblue.nl | 22-03-2018
Forecasts
● Overview
● Shipments forecast
○ Production and monitoring
○ Evaluation
● Demand forecast
○ Production and monitoring
○ Evaluation
Agenda
Overview
● Planning
○ Package on time
○ Fast response from customer service
○ Setting realistic targets
○ Products in stock
Why care?
Ordered before 23:59, delivered tomorrow
Overview
Overview
Product needs to be in stock
Enough people to pick/pack
Overview
Forecasts
Operational
Producttype
~4.5k
producttypes
~50k forecasts!
Demand
~45k products
Invoices
2 countries
Shipments
3 warehouses
Shipments forecast
Shipments forecast
● Just enough people in the warehouses
● 3 warehouses: Parcel, XL and Whitegoods
● 3 horizons: 7 days, 14 days and 364 days
● Nice data
Context
Shipments forecast
Evaluation Production
Shipments forecast production
Shipments forecast evaluation
Good features? Good models /
parameters?
Good data? Good forecast?
Good features and models?
Portfolio Shipments forecast
Features
● Trend
● Seasonality
● Holidays
● Events
● Lag targets
● Polynomials
● Dummies
Models
● Regularized Regression
● XGBoost
● Ensembles
● Feature subsets
● Target transformations
● Customized weights
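As an illustration of the feature portfolio, here is a minimal sketch of how a few of the listed features (trend, day-of-week dummies, a holiday flag and a lagged target) could be built for a single day. The function name `build_features`, the dictionary layout and the sample history are assumptions made for this example; the slides do not show the actual feature pipeline.

```python
from datetime import date, timedelta


def build_features(day, history, start, holidays=frozenset(), lag_days=7):
    """Build one feature row for `day` (hypothetical helper, not from the slides).

    history maps a date to the observed shipments on that date;
    start is the first date of the series.
    """
    # Trend: number of days since the start of the series.
    features = {"trend": (day - start).days}
    # Day-of-week dummies; Monday (weekday 0) is the reference level.
    for weekday in range(1, 7):
        features[f"dow_{weekday}"] = 1 if day.weekday() == weekday else 0
    # Holiday flag from a caller-supplied set of dates.
    features["holiday"] = 1 if day in holidays else 0
    # Lagged target: the observed value lag_days earlier, None if unknown.
    features[f"lag_{lag_days}"] = history.get(day - timedelta(days=lag_days))
    return features


# Hypothetical history, for illustration only.
start = date(2018, 1, 1)  # a Monday
history = {start + timedelta(days=i): 100 + i for i in range(14)}
row = build_features(date(2018, 1, 10), history, start, holidays={date(2018, 1, 1)})
print(row)
```

Polynomial terms and event flags would be added in the same way, as extra keys on the row.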
Shipments forecast evaluation
● Cross Validation and KPIs
○ Percentage below 10% error
○ Root Mean Squared Error
○ Mean Absolute Percentage Error
● Extra attention for special cases
○ Christmas
● Interpretability / transparency
○ Effects of features
● Stability
Good forecast?
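The three KPIs named above can be computed in a few lines. This is a sketch using the usual textbook definitions; the function name `forecast_kpis`, the strict "below 10%" comparison and the sample numbers are assumptions, not figures from the talk.

```python
import math


def forecast_kpis(actuals, forecasts, threshold=0.10):
    """Return share of forecasts below the error threshold, RMSE and MAPE."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("actuals and forecasts must be non-empty and equally long")
    # Absolute percentage errors; assumes every actual value is non-zero.
    ape = [abs(f - a) / abs(a) for a, f in zip(actuals, forecasts)]
    squared = [(f - a) ** 2 for a, f in zip(actuals, forecasts)]
    return {
        "share_below_threshold": sum(e < threshold for e in ape) / len(ape),
        "rmse": math.sqrt(sum(squared) / len(squared)),
        "mape": sum(ape) / len(ape),
    }


# Hypothetical daily shipment counts, for illustration only.
actuals = [100.0, 200.0, 400.0, 500.0]
forecasts = [105.0, 180.0, 400.0, 600.0]
kpis = forecast_kpis(actuals, forecasts)
print(kpis)
```

MAPE is undefined when an actual value is zero, which is one reason to track RMSE alongside it.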
Shipments forecast evaluation
Ordered before 23:59, delivered tomorrow
Demand forecast
● Just enough products in stock
● ~45,000 products
● Forecast ‘demand’ of a product 7 days ahead
Context
Demand forecast
● “Demand” not perfectly measurable
● Sparse data
● Volatile data
● Start at 01:00, ready at 05:00
Challenges
Demand forecast
Evaluation Production
Demand forecast
Demand forecast
Demand forecast production
Demand forecast production
Demand forecast evaluation
Good features and models?
Portfolio Demand forecast
Features
● Trend
● Seasonality
● Holidays
● Events
● Lag targets
● Polynomials
● Dummified
Models
● Regularized Regression
● Neural Networks
● Support Vector Regressors
● Weighted Average
● MARS
● Decision Trees
● Feature subsets
Good forecast?
Deal with automatically:
○ Cross Validation and KPIs
○ Stability
Ability to investigate manually:
○ Extra attention for special cases
○ Interpretability / transparency
Demand forecast evaluation
Forecasting
● Forecasting is very important for planning
● Pick a best model
○ Smart feature engineering
○ Relevant models and parameters
○ Grid search, then decide based on error metrics, stability and transparency
● Calculate using best model every day
● Use cloud when appropriate
● Automate and monitor everything!
Summary
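The "pick a best model" step can be sketched as a small grid search scored with rolling-origin cross validation. The two candidate models, the MAPE criterion and the sample series below are assumptions chosen to keep the example short; the production portfolio is far larger, and this sketch scores on error only, leaving stability and transparency out.

```python
def moving_average(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def seasonal_naive(history, season):
    """Forecast the next value as the value observed one season ago."""
    return history[-season]


def rolling_origin_mape(series, model, param, min_train):
    """One-step-ahead MAPE over an expanding training window."""
    errors = []
    for t in range(min_train, len(series)):
        forecast = model(series[:t], param)
        errors.append(abs(forecast - series[t]) / abs(series[t]))
    return sum(errors) / len(errors)


def pick_best(series, grid, min_train):
    """Score every (model, parameter) pair and return the lowest-error one."""
    scored = [
        (rolling_origin_mape(series, model, param, min_train), name, param)
        for name, model, params in grid
        for param in params
    ]
    return min(scored)


# Hypothetical series with a weekly pattern, for illustration only.
series = [10, 12, 14, 16, 18, 30, 40] * 4
grid = [
    ("moving_average", moving_average, [3, 7]),
    ("seasonal_naive", seasonal_naive, [7]),
]
best = pick_best(series, grid, min_train=7)
print(best)
```

On this perfectly periodic series the seasonal naive model wins with zero error; on real data the ranking would depend on the series.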
Daan Marechal | Satellite Data Scientist | d.marechal@coolblue.nl | 22-03-2018
Recommender Systems are software tools and techniques providing suggestions
for items to be of use to a user. The suggestions provided are aimed at supporting
their users in various decision-making processes.
Increase satisfaction and
boost sales
Pretty well known…
Recommender systems
Impacting
Recommender systems
45,000 different products
Let’s start from the beginning
What do we want to achieve?
The goal is to get better results than our current method for recommending products.
Product Click Through Rate
How are we going to measure it?
Once we have a solution, let’s run an A/B test in an email marketing campaign.
Allow me to jump to the very end...
A/B test… fingers crossed
A
B
Success!
A
B
Product CTR
+60%
Conversion
+5%
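For readers who want to see how an uplift like the one above is evaluated, here is a minimal sketch of a relative uplift calculation with a two-proportion z-test. It assumes B is the new recommender and A the current method. The click and recipient counts are hypothetical, chosen only so that the relative uplift comes out at +60%; they are not Coolblue's figures, which the slides do not give.

```python
import math


def relative_uplift(rate_a, rate_b):
    """Relative change of variant B over variant A."""
    return (rate_b - rate_a) / rate_a


def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference between two click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only.
clicks_a, n_a = 500, 10_000
clicks_b, n_b = 800, 10_000
uplift = relative_uplift(clicks_a / n_a, clicks_b / n_b)
z, p_value = two_proportion_z(clicks_a, n_a, clicks_b, n_b)
print(f"uplift: {uplift:+.0%}, z = {z:.2f}, p = {p_value:.2g}")
```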
How did we do it?
Let’s get serious
What should we use!?
Most popular and fundamental techniques used
Collaborative filtering, content-based filtering, data mining methods and context-aware
methods.
Classification
● Nearest Neighbors
● Decision Trees
● Rule-based Classifiers
● Bayesian Classifiers
● Artificial Neural Networks
● Support Vector Machines
● Ensembles of Classifiers
Cluster analysis
● K-means
● Other alternatives to K-means
Others
● Association Rule Mining
● Other ad-hoc methods
These are the typical features...
So, what do we have here?
● Gender
● Region
● Specified interests
● Purchase history
● etc.
Customer interactions
Lisa
How to make it really personal?
We should suggest...
Customers view a set of products
Next sequence of products
Talking to our customers!?
A product sequence is like a phrase
This helps in deciding the model
● Several thousand products to be recommended
● It does not seem to depend on gender, region, etc.
● Each customer views a very personal set of products
● Try to respond with a new personal set of products
Brief summary after some analysis:
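Treating a product sequence like a phrase suggests preparing the data the way text is prepared for a language model: each product becomes a token, and each prefix of a viewing sequence is paired with the product that follows it. The sketch below shows one way to do that. The function names, the zero-padding scheme and the sample sessions are assumptions; the slides do not describe the actual input encoding.

```python
def build_vocabulary(sessions):
    """Map each product id to an integer index; 0 is reserved for padding."""
    vocab = {}
    for session in sessions:
        for product in session:
            vocab.setdefault(product, len(vocab) + 1)
    return vocab


def to_training_pairs(sessions, vocab, max_len=5):
    """Turn each viewing sequence into (left-padded prefix, next product) pairs."""
    pairs = []
    for session in sessions:
        ids = [vocab[p] for p in session]
        for i in range(1, len(ids)):
            # Keep at most max_len of the most recent products as input.
            prefix = ids[max(0, i - max_len):i]
            padded = [0] * (max_len - len(prefix)) + prefix
            pairs.append((padded, ids[i]))
    return pairs


# Hypothetical viewing sessions, for illustration only.
sessions = [["tv", "soundbar", "hdmi_cable"], ["laptop", "mouse"]]
vocab = build_vocabulary(sessions)
pairs = to_training_pairs(sessions, vocab, max_len=3)
print(pairs)
```

Pairs in this shape can be fed to a sequence model that predicts the index of the next product.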
Recurrent Neural Network
Among the possibilities
Let’s see what the literature says
● Not many papers about RNN and
recommender systems
● All papers are very recent: 2016, 2017
● Results are very promising but still there are
no figures of real tests (only offline
experiments).
We set up the benchmark
Still we believe it’s worth the try!
We are in the research phase… we could try a quick PoC.
For more information:
Cole MacLean, Barbara Garza, and Suren Oganesian. A recurrent neural network based subreddit recommendation
system. 2017.
Evaluation
Mean Average Precision @ k
● Average Precision @ k looks at a ranked set of k recommended items
● Checks whether relevant item is in the recommended set
Relevant item at rank 1: 1 0 0 0 0 → AP@5 = 1
Relevant item at rank 5: 0 0 0 0 1 → AP@5 = 0.20
● Mean Average Precision @ k is the mean of all AP@k’s
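A minimal sketch of the metric, assuming the common definition in which the precision at each hit is summed and divided by the smaller of k and the number of relevant items. The function names are illustrative, and the recommended list is assumed to contain no duplicates.

```python
def average_precision_at_k(recommended, relevant, k=5):
    """AP@k for one ranked list; `relevant` holds the items the user actually chose."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at_k(all_recommended, all_relevant, k=5):
    """MAP@k: the mean of AP@k over all users."""
    scores = [
        average_precision_at_k(recommended, relevant, k)
        for recommended, relevant in zip(all_recommended, all_relevant)
    ]
    return sum(scores) / len(scores)


ranked = ["a", "b", "c", "d", "e"]
ap_first = average_precision_at_k(ranked, {"a"})  # relevant item at rank 1
ap_last = average_precision_at_k(ranked, {"e"})   # relevant item at rank 5
map_at_5 = mean_average_precision_at_k([ranked, ranked], [{"a"}, {"e"}])
print(ap_first, ap_last, map_at_5)
```

With a single relevant item per user, AP@k reduces to 1 divided by the rank of that item, or 0 if it is not in the top k.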
Typical RNN architecture
Computationally expensive
● 4 CPUs locally: ~120 hours (estimated)
● 16 vCPUs in the cloud: ~36 hours
● GPU (NVIDIA Tesla K80) in the cloud: ~8 hours
● GPU (NVIDIA Tesla P100) in the cloud: ~4.5 hours
About the timing.
Faster experimenting
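To put the timings in perspective, the short calculation below derives the speed-up of each setup relative to the local 4-CPU estimate. The hours are the approximate figures from the slide; the dictionary keys are just labels for this example.

```python
# Approximate training times in hours, as listed on the slide.
training_hours = {
    "4 CPUs, local (estimated)": 120.0,
    "16 vCPUs, cloud": 36.0,
    "Tesla K80 GPU, cloud": 8.0,
    "Tesla P100 GPU, cloud": 4.5,
}

baseline = training_hours["4 CPUs, local (estimated)"]
speedups = {name: baseline / hours for name, hours in training_hours.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x the speed of the local baseline")
```

By these numbers the P100 run is roughly 27 times faster than the local estimate.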
Nice, we got it!
MAP@5: ~0.0925!
TensorFlow & Google Cloud Platform
Some notes about the training
● Python
● For the PoC we have used Jupyter notebooks
● TensorFlow for the Neural Network
● All models have been trained in GCP Compute
Engine
● Start from the beginning: how are we going to measure success?
● Understand your data
● What is the current status? Literature? Benchmark?
● Offline test and fine tuning: PoC
● A/B testing
● Future steps for industrializing it: Core Team
Wrap-up
Summary

Behind The Scenes Data Science Coolblue 2018-03-22

Editor's Notes

  1. Create your own Fact/slogan here: https://coolblueblauwdruk.nl/en/huisstijl/feit-slogan-generator
  2. Create your own Fact/slogan here: https://coolblueblauwdruk.nl/en/huisstijl/feit-slogan-generator
  3. Run Azkaban UAT STS.
     BQ: SELECT creation_timestamp, warehouse_id, DATE(forecast.timestamp) AS forecast.date, forecast.value FROM [coolblue-bi-platform-uat:forecasts.short_term_shipments_with_reallocations] WHERE creation_timestamp > timestamp('2018-03-22T16:00:00') ORDER BY forecast.timestamp, warehouse_id
     Quick show of Data Science landscape dashboard, click through to short term shipments.
     Quick show of performance and stability tabs in Shiny.
  4. Quick show of the GFM design file, run, show output
  5. Run DF calculate for 1k products, prepare well! Show table before, cluster upping, CPU usage, cluster downing, table after.
     BQ: SELECT creation_datetime, forecast_start_date, product_id, value, model_queue_id FROM [coolblue-bi-platform-dev:demand_forecast.forecast] WHERE creation_datetime > DATETIME('2018-03-22T14:00:00') ORDER BY product_id
     Check Shiny individual product 638470 while cluster is upping.
  6. Run DF optimize for 5 products, prepare well! Show table before, cluster upping, CPU usage, cluster downing, table after.
     BQ: SELECT mq.product_id, mq.model_queue_id, mq.insert_datetime, m.model_description FROM [coolblue-bi-platform-dev:demand_forecast.model_queue] mq INNER JOIN [coolblue-bi-platform-dev:demand_forecast.model] m ON m.model_id = mq.model_id WHERE mq.insert_datetime > DATETIME('2018-03-22T14:00:00')
  7. Create your own Fact/slogan here: https://coolblueblauwdruk.nl/en/huisstijl/feit-slogan-generator
  8. Satellite: meaning that I work in the domains and explore the available data and models that we could use, depending on the goal. I make proofs of concept, and once we can show the added value of a model, the Core team productionizes it and automates all the processes needed. Today I am going to show you a recent project of mine, which is about creating a recommender system.
  9. Suggestions → not necessarily personal. Research → personalized suggestions lead to higher customer satisfaction/loyalty → boost in sales.
  10. Increasing datasets → hot topic. Recommender systems have been a hot topic for a few decades, and because of ever-increasing datasets they are a very interesting problem for data scientists to solve. At Coolblue we have lots of data, so it is exciting to use this data to improve the customer experience. Once we create a good recommender system, we can make a big impact by targeting each and every customer individually!
  11. At Coolblue we have 45,000 different products, which makes it difficult for customers to find exactly what they are looking for. There is a great variety of products to choose from, but customers only have limited time to browse through all the options. It is therefore very important for us to show customers relevant products as early as possible in their customer journey. We have to think about solutions that help our customers find interesting products. For this reason we investigated the possibility of improving the current logic behind our personal recommendations.
  12. At Coolblue we already have personalized recommendations. These recommendations are made by looking at the recent behavior and purchases of customers. The recommendations that we are going to generate need to be better than the current ones. But what is better? How can we measure the performance of personal recommendations?
  13. We can measure this by A/B testing. A/B testing is a useful tool for comparing the performance of two variations of a webpage or e-mail. We have set up an A/B test within our weekly newsletter, which is sent by email marketing. For us, right now, it is faster to obtain results by testing in the email domain. For a proof of concept, implementing the new personal recommendations on the website would be too difficult and would take too much time. In the email A/B test, we send half of the customers the current recommendations and the other half our new personal recommendations. The main metric that we look at is the product click-through rate. This is essentially the share of customers that click on a personal recommendation. We seek to increase this metric, meaning that the engagement with the e-mail will be higher. When this metric increases, it also means that more customers land on our website and are therefore more likely to buy a product.
  14. So now I have discussed how we are going to measure the performance of personal recommendations. Let me jump forward in time and show you the A/B test itself...
  15. This is the e-mail that we sent. As you can see, the only difference between the two variations is the products that are recommended. The other parts of the e-mail are exactly the same in both variations.
  16. As you can see, the results are extremely good! The performance of our recommendations is significantly better than that of the recommendations provided by the current logic. The CTR of the products increases by 60%, which is huge. Moreover, the people who click on the products also have a 5% higher probability of buying something when they land on the Coolblue website. This means that we provide high-quality recommendations that not only raise awareness of products but also create desire: people actually want to buy the products that we recommended. With this A/B test we have proved the value of productionizing the model that we built.
  17. Ok, so how did we do this?
  18. OVERWHELMED. First of all, we did some deep research. There are many well-known methods for recommending products to customers. The main techniques are collaborative filtering, content-based filtering and context-aware methods. I will not go into detail on these techniques, but I do want to mention that we can use classification techniques to predict the most relevant product for each customer. Next to that, we can use clustering techniques to group products in order to find similar products.
  19. DO THESE FEATURES DRIVE RECOMMENDATIONS? Usually, recommender systems look at customer features. For instance, when looking at gender, the model basically looks for products that are sold more often to females than to males and then boosts these products for female customers, and vice versa. Of course, the same happens for region etc. It basically segments the customers into groups, and every customer in a group gets the same recommendations. But are these features really good drivers for predicting the right products for the right customer? Let’s say we know that, according to her characteristics, a customer is not likely to buy an Apple MacBook, maybe because we know that she always buys the less expensive products in a category. What happens if she is looking for laptops and it turns out that she is constantly looking at MacBooks? When we look at her features, we will not recommend her a MacBook because she is not going to buy expensive products, right? But why would she look at the MacBooks then? Basically she is telling us what she is interested in through her browsing behavior: a MacBook!
  20. So, we have come up with a product sequence. The customer interacts with us through the products that she views on our website. This sequence is ordered, which means that we should be able to extract patterns and very insightful information from these sequences.
  21. And then we have this. The model would generate a new sequence of products, which we can recommend.
  22. This means that, in a very abstract way, we are trying to create a chatbot. The customers interact with this chatbot implicitly and the model produces new sentences containing products. This is where natural language processing comes in. We can think of these sequences as sentences, but instead of words the sentences consist of product IDs. If we take a model from NLP, we should be able to input our sentences and the model should be able to extract relationships between words, which in our case are products. Models from NLP are also able to generate new sequences following the same patterns and logic as the input data. On top of this, when we put in an unfinished sequence, the model is able to finish it!
  23. So, to summarize: we have thousands of products that we can recommend in order to increase customer satisfaction and to create awareness of new products. After analyzing our data we noticed that customer features did not drive which product is going to be bought in the future. So we need another way to personalize the content, which we do using the set of products seen within each session. These sequences are extremely personal and give us valuable information about the customer. They tell us which products are likely to be seen together, and by using millions of these sequences we are able to extract patterns and recreate the flows of customers.
  24. Notice that we are dealing with a multi-class classification problem with as many classes as we have products, which makes artificial neural networks a natural candidate. Next to that, following the intuition I mentioned before, we are aiming for a model that can handle sequential data. This means that we should be able to use recurrent neural networks, which are widely used in NLP. Recurrent neural networks are extremely effective at modelling sequential data, which is what we need. They are capable of generating sequences following the same patterns. Pattern recognition. Sequence modelling. Multi-class classification. As many classes as products.
  25. Ok, so let’s research what has already been done in recommender systems using recurrent neural networks. It turns out that they are actually not used a lot. We found a few papers discussing the use of recurrent neural networks in recommender systems, but only one unpublished article follows the same intuition as ours. The theoretical performance of the model proposed there is promising, but it is only measured in offline experiments. This means the recommended products are not tested on real customers; they are just evaluated on a test set from the data.
  26. The theoretical performance in this paper is based on a metric called Mean Average Precision@k.
  27. This is a metric that measures the average precision of a set of predicted items. The set of predicted items is cut off at k, so it only looks at the top k products in the prediction. Okay, but what is average precision? It basically compares the set of recommended products with the actual relevant item. Notice that the model is trained to predict the next item in the sequence, meaning that there is only one relevant item per sequence. If this relevant product is in the recommended set of products, the precision goes up by a certain amount. This amount is based on the position of the relevant product in the recommended set, which means that the order in which the recommended products are presented matters. Basically, when the relevant item is in the first position, the average precision is 1; when it is in the 5th position, the average precision is 1/5; when it is not in the top k at all, the average precision is 0. The Mean Average Precision is then the average of the average precisions of all sequences in the test set. We are going to compare our results with the MAP@5 in this paper, since we are not going to recommend more than 5 products.
  28. So, let me show you what a recurrent neural network looks like. The architecture reveals why recurrent neural networks are so effective at modelling sequences: it involves timesteps. In our case, we can input a product at each timestep and then predict the last product in the sequence. As mentioned before, this means that there is only 1 correct product. After we input a product in the first timestep, the recurrent layer computes what to let through to the next timestep. The next timestep then receives a new product together with the valuable information from the previous timestep. This means that the network can remember long-term patterns. The output is a probability vector over the products, so during training we can compare the output vector with the actual next product. Using this feedback, the model learns to adjust the parameters in the recurrent layers in order to improve the predictions.
  29. Okay, now that we have chosen a model and found a benchmark, it’s time to train our own recurrent neural network. We train our model using 1 million sequences, which makes it computationally expensive. We started experimenting locally on 4 CPUs but soon found out that training a model long enough to become a good classifier would take approximately 120 hours. This is without fine-tuning parameters, so if we changed some parameters, we would have to wait 120 hours before seeing the effects. As you can understand, this is not the way to go. So we started computing in the cloud, which significantly reduced the training time: on a GPU in the cloud it went down to 4.5 hours.
  30. And this means we can fine-tune our models and see the effects way faster! As a result, this led to better models.
  31. After experimenting and fine-tuning the model parameters we obtained a MAP@5 of 0.091, which is very close to the result in the paper discussed before. We do have to keep in mind, however, that our estimate is based on 20% more items than the paper, which makes it significantly harder to obtain the same MAP@5!
  32. We have trained all models for the proof of concept using Jupyter notebooks. For the neural networks we used the TensorFlow package for Python. As mentioned before we trained the model using the Google Cloud Compute Engine.
  33. Create your own Fact/slogan here: https://coolblueblauwdruk.nl/en/huisstijl/feit-slogan-generator
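
The MAP@k metric described in note 27 can be illustrated with a minimal Python sketch. It assumes, as the note states, exactly one relevant item per sequence (the actual next product). The function names and product IDs below are hypothetical and chosen only for illustration; this is not the code used in the project.

```python
from typing import Hashable, Sequence


def average_precision_at_k(recommended: Sequence[Hashable],
                           relevant: Hashable,
                           k: int = 5) -> float:
    """AP@k for a single sequence with exactly one relevant item.

    Returns 1/position if the relevant item appears in the top k
    (positions counted from 1), and 0.0 otherwise.
    """
    for position, item in enumerate(recommended[:k], start=1):
        if item == relevant:
            return 1.0 / position
    return 0.0


def mean_average_precision_at_k(recommendations: Sequence[Sequence[Hashable]],
                                relevant_items: Sequence[Hashable],
                                k: int = 5) -> float:
    """Average of AP@k over all test sequences."""
    if len(recommendations) != len(relevant_items):
        raise ValueError("need one relevant item per recommendation list")
    if not recommendations:
        return 0.0
    total = sum(
        average_precision_at_k(recommended, relevant, k)
        for recommended, relevant in zip(recommendations, relevant_items)
    )
    return total / len(recommendations)


# Hypothetical product IDs, for illustration only.
recommended_per_sequence = [
    [101, 102, 103, 104, 105],  # next product 101 is at position 1 -> AP = 1
    [201, 202, 203, 204, 205],  # next product 205 is at position 5 -> AP = 1/5
    [301, 302, 303, 304, 305],  # next product 999 is not in the top 5 -> AP = 0
]
actual_next_product = [101, 205, 999]

map_at_5 = mean_average_precision_at_k(
    recommended_per_sequence, actual_next_product, k=5
)
print(round(map_at_5, 3))  # 0.4
```

With one relevant item per sequence, MAP@k reduces to the mean reciprocal rank truncated at k, which is why the order of the recommended products matters.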