Arnoud de Munnik & Jerry Vos, wehkamp
Applied Machine Learning for
Ranking Products in an
Ecommerce Setting
#UnifiedDataAnalytics #SparkAISummit
Jerry Vos
Data Scientist @wehkamp since 2001
Education: Econometrics
Arnoud de Munnik
Data Scientist @wehkamp since 2011
Education: Marketing Research
Agenda
• Intro wehkamp
• E-commerce ranking problem
• Our learning-to-rank pipeline
• Ranking model
• Q&A
the online department store for families in the
Netherlands
our history: where we come from
1952 - first advertisement
1955 - first catalog
1995 - first steps online
2010 - completely online
2018 - mobile first
2019 - a great shop experience
over 2,000 brands
C&A // Vingino // Hunkemöller // Mango // Tommy Hilfiger // Scotch & Soda // ONLY
HK Living // House Doctor // Woood // Bloomingville // Zuiver // whkmp’s own
our categories
Fashion // Home & garden // Electronics // Entertainment // Household // Sports & Leisure // Beauty & Health
>400,000 products
>500,000 daily visitors
661 million sales 18/19
11 million packages
>950 colleagues
60% of customers shopping mobile
72% of our customers are female
Our journey
• We work(ed) with a traditional corporate data warehouse
• Need: ML, flexibility, speed, enabling, etc.
• 2 years ago: pilot Spark on Databricks
– Challenges: Training of people, data in cloud
• Today:
– Transformation to Databricks / Cloud (S3)
– Lots of new (ML) products/prototypes and colleagues on DB platform
Machine learning @ wehkamp
Recommenders
Forecasting
Image classification
Search
Personalisation
Product ranking
Fraud detection
And a lot more
Ranking problem for ecommerce
User searches for ‘jeans’
Relevant?
We return 4401 products
Ranking problem for ecommerce
User navigates to ‘ladies jeans’ overview page
Relevant?
We return 2176 products
Ranking problem for ecommerce
● Consider a visit to a ‘product overview page’ (for example ‘ladies jeans’) as a user query
● Main problem: order the returned products so that the most relevant ones for the user query come first
● How good is this list?
● Suppose we know how relevant each item is; can we define an overall score for the relevancy of this list?
● Yes we can: the answer is NDCG (Normalized Discounted Cumulative Gain)
https://en.wikipedia.org/wiki/Discounted_cumulative_gain
Ranking problem for ecommerce
● Suppose we know the relevancy scores of the products in the list, in the order they are shown
● Add a correction for position i via log2(i+1)
● Divide each relevancy score by its position correction and sum to get a score: discounted cumulative gain, DCG (7.84)
● Do the same for this list in perfect order to get an Ideal DCG (IDCG). That score will be: 9.00
● Divide DCG / IDCG = normalized discounted cumulative gain, NDCG (0.87)
Ranking problem for ecommerce
i   relevancy   log2(i+1)   relevancy / log2(i+1)
1   2           1.00        2.00
2   3           1.58        1.89
3   4           2.00        2.00
4   1           2.32        0.43
5   3           2.58        1.16
6   1           2.81        0.36
Sum (DCG):                  7.84
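The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch using only the relevancy scores from the table; the function names `dcg` and `ndcg` are chosen here for illustration and do not come from the talk.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), with positions i starting at 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG of the list divided by the DCG of the same items in ideal (descending) order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Relevancy scores in the order the products are shown in the table.
shown_order = [2, 3, 4, 1, 3, 1]

print(round(dcg(shown_order), 2))                        # 7.84
print(round(dcg(sorted(shown_order, reverse=True)), 2))  # 9.0
print(round(ndcg(shown_order), 2))                       # 0.87
```

A list that is already in perfect order has an NDCG of 1.0, so the score is comparable across queries that return different numbers of products.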
Ranking problem for ecommerce
Relevancy scores: explain the scores with features
- Title match
- Article match
- Reviews
- Seasonality
- Price
- …
Maximize the NDCG by giving weight to features
Learning to rank pipeline
Special thanks to Wikimedia
MjoLniR: https://github.com/wikimedia/search-MjoLniR
Pipeline
1. Data collection
2. Click model: for relevancy scores
3. Feature generation: for explaining relevance
4. Ranking model: for estimating weights to features
5. Serve model (Elasticsearch LTR): for productionising
6. Evaluation (Tableau)
Efforts
• Initial effort of building pipeline:
2 data scientists and 1 data engineer (for search and Product Overview Page) for a couple of months
• New click/ranking model:
1 data scientist can train, test and push a new ranking model to production within 1 hour
Data collection
● Source: raw Google Analytics feed (daily)
● Per product list (i.e. search, overview page):
○ ProductID
○ Position / Page
○ Impression / Click
● Challenges:
○ tagging is different for web and app
○ devices have different display formats
Click model
Reality: we don’t know the relevancy scores, so we use a click model.
Goal: determine the relevance of products in each SOP/POP
Approach: predict the relevance of products based on their impressions and clicks, given their position
• Clicks over Expected Clicks (COEC)
– Corrected for small search queries
– In our case: better results, easier to train & explain
• DBN click model (https://github.com/varepsilon/clickmodels)
– Paper: Chapelle, O. and Zhang, Y. 2009. A dynamic Bayesian network click model for web search ranking. WWW (2009)
COEC click model
Example
COEC click model
search phrase   Product Id                    clicks   Expected clicks   Bucket aka relevancy
Jeans           0123456                       250      50                3
Jeans           6543210                       200      20                4
Jeans           3211231                       300      300               2
Jeans           4566543                       400      800               1
Jeans           9997979 (random product id)   -        -                 0
Note: add random data (random product ids get relevancy 0)
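The COEC idea can be sketched as follows. The ratio itself (clicks divided by expected clicks) is from the slides. The rest is assumed for illustration: the way expected clicks are built from per-position click-through rates, the `prior` smoothing used to stand in for the correction for small search queries, and all function and variable names. The bucket thresholds are not given in the talk, so the sketch only ranks products by their ratio.

```python
def expected_clicks(impressions_by_position, ctr_by_position):
    """Clicks a product would get if it were clicked at the average rate of each position.

    impressions_by_position: {position: number of impressions of the product there}
    ctr_by_position: {position: average click-through rate at that position}
    """
    return sum(n * ctr_by_position[pos] for pos, n in impressions_by_position.items())

def coec(clicks, expected, prior=0.0):
    """Clicks over expected clicks.

    `prior` adds the same pseudo-count to both terms, pulling products with little
    data towards 1.0 (an assumed smoothing, not necessarily the one used in the talk).
    """
    return (clicks + prior) / (expected + prior)

# (clicks, expected clicks) for the 'Jeans' rows of the table above.
slide_rows = {
    "0123456": (250, 50),
    "6543210": (200, 20),
    "3211231": (300, 300),
    "4566543": (400, 800),
}
ratios = {pid: coec(c, e) for pid, (c, e) in slide_rows.items()}
print(ratios)  # {'0123456': 5.0, '6543210': 10.0, '3211231': 1.0, '4566543': 0.5}

# Sorting by ratio reproduces the order of the buckets 4, 3, 2, 1 in the table.
print(sorted(ratios, key=ratios.get, reverse=True))
# ['6543210', '0123456', '3211231', '4566543']

# Hypothetical position CTRs: 100 impressions at position 1, 200 at position 2.
print(expected_clicks({1: 100, 2: 200}, {1: 0.10, 2: 0.05}))  # about 20 expected clicks
```

A ratio above 1 means a product is clicked more often than its positions alone would explain, and a ratio below 1 means less often.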
Demo clickmodel
Query: “Flared jeans”
Relevancy: 1
Relevancy: 4
Feature generation
Try to explain and predict which attributes (i.e. features) of products (with respect to the user query) contribute to their relevance score
Feature examples
- Title match
- Description match
- Tf-idf
- …
- Popularity
- Discount / Promo
- Seasonality
- Reviews
- Days online
- Brand
- …
● Limit the number of features to < 100 (latency issues)
● For POP features we did not use OHE (one-hot encoding), but a Bayesian encoder to limit the number of features
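The slides do not say which Bayesian encoder was used. As an illustration of why such an encoder keeps the feature count low, here is a minimal sketch of one common form, a smoothed target mean: each category value (for example a brand) becomes a single number instead of one one-hot column per value. The function name, the `prior_weight` parameter and the sample data are all hypothetical.

```python
from collections import defaultdict

def bayesian_target_encode(categories, targets, prior_weight=10.0):
    """Map each category to a smoothed mean of the target.

    encoded = (sum of targets in category + prior_weight * global mean)
              / (count in category + prior_weight)
    Categories with few rows stay close to the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {
        category: (sums[category] + prior_weight * global_mean)
        / (counts[category] + prior_weight)
        for category in counts
    }

# Hypothetical data: brand per product and its relevancy label from the click model.
brands = ["A", "A", "A", "A", "B"]
relevancy = [4, 3, 4, 3, 0]

encoding = bayesian_target_encode(brands, relevancy, prior_weight=2.0)
print(encoding)  # one numeric value per brand, usable as a single feature column
```

With thousands of brands, one-hot encoding would add thousands of columns, while this encoding adds one.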
Feature generation
[Diagram. Components: feature notebooks, Kafka, search processor, ES index, S3. Labels: query “jeans”; log results with feature values]
● Initial training and query building with snapshot data
29#UnifiedDataAnalytics #SparkAISummit
Feature
notebooks
Delta pre-
processed
data
Delta
feature
DB
Fetch
features
and send
to ES
Seasonality
estimate per
article type
OpenWeatherMap
API
AGG
view/sales/promo
data
Timeseries models
with prediction à
Scaled via Pandas
UDF
ES index
Feature generation
[Diagram. Components: feature notebooks, Kafka, search processor, ES index, S3. Labels: query “jeans”; log results with feature values. Steps: add click model labels; train model; add model to ES]
Ranking model
• Many machine learning techniques are available
• The Elasticsearch LTR plugin supports XGBoost
• XGBoost → eXtreme Gradient Boosting
– Variant of the gradient boosting technique (tree-based model)
– Captures non-linear relationships
– Good results (e.g. Kaggle competitions)
– Easy to use, tune, and evaluate
– Fast (parallel computation on a single machine, plus cluster support, e.g. Spark)
• XGBoost has lots of parameters to tune; we use Hyperopt to help with that
https://hyperopt.github.io/hyperopt/
• XGBoost offers rank:ndcg as an objective
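A ranking objective needs to know which rows belong to the same query, because NDCG is computed per result list. The sketch below shows one way this might be wired up with XGBoost's native API. The parameter values, the function names and the assumption that rows arrive sorted by query are illustrative and not taken from the talk; `xgboost` is imported inside the training function so the helper can be tried without it.

```python
from itertools import groupby

def group_sizes(query_ids):
    """Number of consecutive rows per query; rows must already be sorted by query."""
    return [len(list(rows)) for _, rows in groupby(query_ids)]

# Illustrative values only; in the talk these are tuned with Hyperopt.
PARAMS = {
    "objective": "rank:ndcg",   # the ranking objective named on the slide
    "eval_metric": "ndcg@10",
    "eta": 0.1,
    "max_depth": 6,
}

def train_ranker(features, labels, query_ids, num_boost_round=100):
    """Train a ranking model.

    features: 2-D NumPy array of feature values, one row per (query, product)
    labels: relevancy label per row (e.g. the click model bucket 0-4)
    query_ids: query identifier per row, sorted so equal ids are adjacent
    """
    import xgboost as xgb  # imported lazily; requires the xgboost package

    dtrain = xgb.DMatrix(features, label=labels)
    dtrain.set_group(group_sizes(query_ids))  # tells XGBoost where each result list ends
    return xgb.train(PARAMS, dtrain, num_boost_round=num_boost_round)

print(group_sizes(["jeans", "jeans", "jeans", "dress", "dress"]))  # [3, 2]
```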
Ranking model
• Each Hyperopt run stores its best parameters in MLflow
Ranking model
After training, store information in MLflow:
• Feature importances (SHAP)
• Test and training datasets
• Feature map
• XGBoost model
Ranking model
For examining the feature importance of each model we use SHAP
https://github.com/slundberg/shap
[Figure: SHAP feature importance plot for features F1, F2, F3, F4, …]
Serve model
With a few lines of code we save our model to the Elasticsearch index
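The code from this slide is not in the transcript. As a rough sketch of what such an upload might involve, the block below builds the request body that the Elasticsearch Learning to Rank plugin accepts for an XGBoost model. It is not the authors' code: the feature set name, model name and the single hand-written tree are hypothetical, and the request is only constructed, not sent.

```python
import json

def ltr_model_payload(model_name, tree_dumps):
    """Build the body for the LTR plugin's create-model call.

    tree_dumps: list of JSON strings, one per tree, in the shape produced by
    XGBoost's Booster.get_dump(dump_format="json").
    """
    definition = "[" + ",".join(tree_dumps) + "]"
    return {
        "model": {
            "name": model_name,
            "model": {"type": "model/xgboost+json", "definition": definition},
        }
    }

def create_model_path(feature_set):
    """Path of the create-model endpoint for a stored feature set."""
    return f"/_ltr/_featureset/{feature_set}/_createmodel"

# Hypothetical single-tree dump; XGBoost dumps follow "yes" when feature < split_condition.
example_tree = json.dumps({
    "nodeid": 0, "depth": 0, "split": "popularity_score",
    "split_condition": 0.006, "yes": 1, "no": 2, "missing": 1,
    "children": [{"nodeid": 1, "leaf": -0.2}, {"nodeid": 2, "leaf": 0.3}],
})

payload = ltr_model_payload("pop_ranking_model", [example_tree])
print(create_model_path("pop_features"))
print(json.dumps(payload, indent=2))
```

The `split` names in the dump must match the feature names in the feature set, which is one reason to keep the feature map next to the model.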
Serve model
[Figure: the model as a sequence of decision-tree splits (Split 1, Split 2, Split 3). Example tree: Node 0 tests “Popularity score > 0.006?” (Yes / No) and leads to Node 1 and Node 2; a further split tests “Title match > 9.25?” (Yes / No) and leads to Node 3 and Node 4.]
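The splits shown on these slides can be read as nested if/else tests on feature values. The sketch below walks such a tree to score one product. The two thresholds are the ones on the slides; the leaf values, the feature keys and the choice to hang the title-match split under the “Yes” branch are assumptions, since the figure itself is not recoverable from the transcript.

```python
def score_tree(node, features):
    """Walk one tree from the root to a leaf and return the leaf value."""
    while "leaf" not in node:
        branch = "yes" if features[node["feature"]] > node["threshold"] else "no"
        node = node[branch]
    return node["leaf"]

def score_model(trees, features):
    """A boosted model scores a product by summing the leaf values of all its trees."""
    return sum(score_tree(tree, features) for tree in trees)

# Thresholds from the slides; leaf values are made up for the example.
TREE = {
    "feature": "popularity_score", "threshold": 0.006,
    "yes": {
        "feature": "title_match", "threshold": 9.25,
        "yes": {"leaf": 0.5},
        "no": {"leaf": 0.25},
    },
    "no": {"leaf": -0.25},
}

popular_good_match = {"popularity_score": 0.01, "title_match": 10.0}
unpopular = {"popularity_score": 0.001, "title_match": 10.0}

print(score_tree(TREE, popular_good_match))           # 0.5
print(score_tree(TREE, unpopular))                    # -0.25
print(score_model([TREE, TREE], popular_good_match))  # 1.0
```

At query time the search engine evaluates every tree for each candidate product, which is why the number of features and trees affects latency.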
Evaluation
• Use A/B testing to check whether ranking models outperform the standard implementation. Tests are configured with PlanOut
https://github.com/facebook/planout
• An automated Tableau report shows the results of the A/B test
• We report quite a few metrics, but look mainly at:
- Click-Through Rate
- Revenue per session
- Paul Score https://www.mediawiki.org/wiki/Wikimedia_Discovery/Search/Glossary
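An A/B test needs each visitor to land in the same variant on every visit. The sketch below illustrates that idea with a plain hash of the experiment name and user id; it is not PlanOut's API, and the experiment name, variant names and counts are hypothetical.

```python
import hashlib

def assign_variant(user_id, experiment, variants=("control", "ltr_model")):
    """Deterministically map a user to a variant by hashing experiment name and user id."""
    digest = hashlib.sha1(f"{experiment}.{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]

def click_through_rate(clicks, impressions):
    """Share of impressions that led to a click; 0.0 when nothing was shown."""
    return clicks / impressions if impressions else 0.0

# The same user always gets the same variant within one experiment.
first = assign_variant("user-42", "pop_ranking_test")
second = assign_variant("user-42", "pop_ranking_test")
print(first == second)  # True

# Hypothetical counts per variant, as they might feed a report.
print(click_through_rate(30, 1000))
print(click_through_rate(36, 1000))
```

Including the experiment name in the hash keeps assignments in one test independent of assignments in another.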
Our journey ahead
• For search: build multiple models for multiple categories, based on search phrase classification
• Add more product specific attributes
• Test with personalisation
Wrap up
Automating a learning-to-rank pipeline requires a lot of different parts
working together.
- Google Analytics
- Databricks / Spark
- Elasticsearch
- S3
- XGBoost
- Hyperopt / SHAP
- MLflow
- Planout
- Tableau
Questions?
45#UnifiedDataAnalytics #SparkAISummit
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT

More Related Content

What's hot

Recommendation system
Recommendation system Recommendation system
Recommendation system Vikrant Arya
 
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...Sease
 
Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewYONG ZHENG
 
Machine Learning for retail and ecommerce
Machine Learning for retail and ecommerceMachine Learning for retail and ecommerce
Machine Learning for retail and ecommerceAndrei Lopatenko
 
Deploying End-to-End Deep Learning Pipelines with ONNX
Deploying End-to-End Deep Learning Pipelines with ONNXDeploying End-to-End Deep Learning Pipelines with ONNX
Deploying End-to-End Deep Learning Pipelines with ONNXDatabricks
 
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...Benjamin Le
 
Machine Learning at Netflix Scale
Machine Learning at Netflix ScaleMachine Learning at Netflix Scale
Machine Learning at Netflix ScaleAish Fenton
 
Recommendation system
Recommendation systemRecommendation system
Recommendation systemAkshat Thakar
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architectureLiang Xiang
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender SystemsDavid Zibriczky
 
Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Faisal Siddiqi
 
Boston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender SystemsBoston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender SystemsJames Kirk
 
Learned Embeddings for Search and Discovery at Instacart
Learned Embeddings for  Search and Discovery at InstacartLearned Embeddings for  Search and Discovery at Instacart
Learned Embeddings for Search and Discovery at InstacartSharath Rao
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System ExplainedCrossing Minds
 
Learning to rank
Learning to rankLearning to rank
Learning to rankBruce Kuo
 
Recommender systems for E-commerce
Recommender systems for E-commerceRecommender systems for E-commerce
Recommender systems for E-commerceAlexander Konduforov
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsJaya Kawale
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineNYC Predictive Analytics
 

What's hot (20)

Recommendation system
Recommendation system Recommendation system
Recommendation system
 
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
 
Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick View
 
Machine Learning for retail and ecommerce
Machine Learning for retail and ecommerceMachine Learning for retail and ecommerce
Machine Learning for retail and ecommerce
 
Deploying End-to-End Deep Learning Pipelines with ONNX
Deploying End-to-End Deep Learning Pipelines with ONNXDeploying End-to-End Deep Learning Pipelines with ONNX
Deploying End-to-End Deep Learning Pipelines with ONNX
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
 
Machine Learning at Netflix Scale
Machine Learning at Netflix ScaleMachine Learning at Netflix Scale
Machine Learning at Netflix Scale
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architecture
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender Systems
 
Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019
 
Boston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender SystemsBoston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender Systems
 
Learned Embeddings for Search and Discovery at Instacart
Learned Embeddings for  Search and Discovery at InstacartLearned Embeddings for  Search and Discovery at Instacart
Learned Embeddings for Search and Discovery at Instacart
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System Explained
 
Learning to rank
Learning to rankLearning to rank
Learning to rank
 
Recommender systems for E-commerce
Recommender systems for E-commerceRecommender systems for E-commerce
Recommender systems for E-commerce
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
 

Similar to Applied Machine Learning for Ranking Products in an Ecommerce Setting

Data Integration and Marketing Attribution
Data Integration and Marketing Attribution Data Integration and Marketing Attribution
Data Integration and Marketing Attribution ROIVENUE™
 
Lean Startup + Story Mapping = Awesome Products Faster
Lean Startup + Story Mapping = Awesome Products FasterLean Startup + Story Mapping = Awesome Products Faster
Lean Startup + Story Mapping = Awesome Products FasterBrad Swanson
 
Understanding Web Analytics and Google Analytics
Understanding Web Analytics and Google AnalyticsUnderstanding Web Analytics and Google Analytics
Understanding Web Analytics and Google AnalyticsPrathamesh Kulkarni
 
Adobe Business.pptx
Adobe Business.pptxAdobe Business.pptx
Adobe Business.pptxAnkush Kapil
 
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...Databricks
 
Real-time Market Basket Analysis for Retail with Hadoop
Real-time Market Basket Analysis for Retail with HadoopReal-time Market Basket Analysis for Retail with Hadoop
Real-time Market Basket Analysis for Retail with HadoopDataWorks Summit
 
Embedded Analytics Campaign Strategy
Embedded Analytics Campaign StrategyEmbedded Analytics Campaign Strategy
Embedded Analytics Campaign StrategyJessica Legg
 
The Triangle - A universal method of working with digital analytics and marke...
The Triangle - A universal method of working with digital analytics and marke...The Triangle - A universal method of working with digital analytics and marke...
The Triangle - A universal method of working with digital analytics and marke...Robert Børlum-Bach
 
mrkt354lecture4i-140209143215-phpapp02.pptx
mrkt354lecture4i-140209143215-phpapp02.pptxmrkt354lecture4i-140209143215-phpapp02.pptx
mrkt354lecture4i-140209143215-phpapp02.pptxNeelamSheoliha2
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...Alok Singh
 
Introduction to Google Analytics
Introduction to Google AnalyticsIntroduction to Google Analytics
Introduction to Google AnalyticsAVIK BAL
 
Supercharge your Salesforce with 10 Awesome tips & tricks
Supercharge your Salesforce with 10 Awesome tips & tricksSupercharge your Salesforce with 10 Awesome tips & tricks
Supercharge your Salesforce with 10 Awesome tips & tricksYeurDreamin'
 
Connecting social media to e-commerce
Connecting social media to e-commerceConnecting social media to e-commerce
Connecting social media to e-commercekrsenthamizhselvi
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Christopher Gutknecht
 
CoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core OperationsCoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core OperationsDataBench
 
CoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core OperationsCoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core Operationst_ivanov
 
E-Commerce and MongoDB at Backcountry.com
E-Commerce and MongoDB at Backcountry.comE-Commerce and MongoDB at Backcountry.com
E-Commerce and MongoDB at Backcountry.comMongoDB
 
Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1SigOpt
 
How to Apply Machine Learning by Lyft Senior Product Manager
How to Apply Machine Learning by Lyft Senior Product ManagerHow to Apply Machine Learning by Lyft Senior Product Manager
How to Apply Machine Learning by Lyft Senior Product ManagerProduct School
 
Building ML models for smart retail
Building ML models for smart retailBuilding ML models for smart retail
Building ML models for smart retailAlbert Y. C. Chen
 

Similar to Applied Machine Learning for Ranking Products in an Ecommerce Setting (20)

Data Integration and Marketing Attribution
Data Integration and Marketing Attribution Data Integration and Marketing Attribution
Data Integration and Marketing Attribution
 
Lean Startup + Story Mapping = Awesome Products Faster
Lean Startup + Story Mapping = Awesome Products FasterLean Startup + Story Mapping = Awesome Products Faster
Lean Startup + Story Mapping = Awesome Products Faster
 
Understanding Web Analytics and Google Analytics
Understanding Web Analytics and Google AnalyticsUnderstanding Web Analytics and Google Analytics
Understanding Web Analytics and Google Analytics
 
Adobe Business.pptx
Adobe Business.pptxAdobe Business.pptx
Adobe Business.pptx
 
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
 
Real-time Market Basket Analysis for Retail with Hadoop
Real-time Market Basket Analysis for Retail with HadoopReal-time Market Basket Analysis for Retail with Hadoop
Real-time Market Basket Analysis for Retail with Hadoop
 
Embedded Analytics Campaign Strategy
Embedded Analytics Campaign StrategyEmbedded Analytics Campaign Strategy
Embedded Analytics Campaign Strategy
 
The Triangle - A universal method of working with digital analytics and marke...
The Triangle - A universal method of working with digital analytics and marke...The Triangle - A universal method of working with digital analytics and marke...
The Triangle - A universal method of working with digital analytics and marke...
 
mrkt354lecture4i-140209143215-phpapp02.pptx
mrkt354lecture4i-140209143215-phpapp02.pptxmrkt354lecture4i-140209143215-phpapp02.pptx
mrkt354lecture4i-140209143215-phpapp02.pptx
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
 
Introduction to Google Analytics
Introduction to Google AnalyticsIntroduction to Google Analytics
Introduction to Google Analytics
 
Supercharge your Salesforce with 10 Awesome tips & tricks
Supercharge your Salesforce with 10 Awesome tips & tricksSupercharge your Salesforce with 10 Awesome tips & tricks
Supercharge your Salesforce with 10 Awesome tips & tricks
 
Connecting social media to e-commerce
Connecting social media to e-commerceConnecting social media to e-commerce
Connecting social media to e-commerce
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
 
CoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core OperationsCoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core Operations
 
CoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core OperationsCoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core Operations
 
E-Commerce and MongoDB at Backcountry.com
E-Commerce and MongoDB at Backcountry.comE-Commerce and MongoDB at Backcountry.com
E-Commerce and MongoDB at Backcountry.com
 
Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1
 
How to Apply Machine Learning by Lyft Senior Product Manager
How to Apply Machine Learning by Lyft Senior Product ManagerHow to Apply Machine Learning by Lyft Senior Product Manager
How to Apply Machine Learning by Lyft Senior Product Manager
 
Building ML models for smart retail
Building ML models for smart retailBuilding ML models for smart retail
Building ML models for smart retail
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2



Applied Machine Learning for Ranking Products in an Ecommerce Setting

  • 2. Arnoud de Munnik & Jerry Vos, wehkamp Applied Machine Learning for Ranking Products in an Ecommerce Setting #UnifiedDataAnalytics #SparkAISummit
  • 3. Jerry Vos: Data Scientist @wehkamp since 2001; education: Econometrics. Arnoud de Munnik: Data Scientist @wehkamp since 2011; education: Marketing Research.
  • 4. Agenda • Intro wehkamp • E-commerce ranking problem • Our learning-to-rank pipeline • Ranking model • Q&A
  • 5. the online department store for families in the Netherlands
  • 6. Our history (where we come from): 1952 - first advertisement; 1955 - first catalog; 1995 - first steps online; 2010 - completely online; 2018 - mobile first; 2019 - a great shop experience
  • 7. Over 2,000 brands: C&A // Vingino // Hunkemöller // Mango // Tommy Hilfiger // Scotch & Soda // ONLY // HK Living // House Doctor // Woood // Bloomingville // Zuiver // whkmp’s own. Our categories: Fashion // Home & garden // Electronics // Entertainment // Household // Sports & Leisure // Beauty & Health. Key figures: >400,000 products; >500,000 daily visitors; 661 million sales 18/19; 11 million packages; >950 colleagues; 60% of customers shopping mobile; 72% of our customers are female.
  • 8. Our journey • We work(ed) with a traditional corporate data warehouse • Need: ML, flexibility, speed, enabling, etc. • 2 years ago: pilot Spark on Databricks – Challenges: training of people, data in cloud • Today: – Transformation to Databricks / Cloud (S3) – Lots of new (ML) products/prototypes and colleagues on DB platform
  • 9. Machine learning @ wehkamp: Recommenders, Forecasting, Image classification, Search, Personalisation, Product ranking, Fraud detection, and a lot more
  • 11. Ranking problem for ecommerce
  • 12. Ranking problem for ecommerce: a user searches for ‘jeans’ and we return 4401 products. Are they relevant?
  • 13. Ranking problem for ecommerce: a user navigates to the ‘ladies jeans’ overview page and we return 2176 products. Are they relevant?
  • 14. Ranking problem for ecommerce ● Consider a visit to a ‘product overview page’ (for example ‘ladies jeans’) as a user query ● Main problem: given a user query, order the returned products so that the most relevant ones come first
  • 15. Ranking problem for ecommerce ● How good is this list? ● Suppose we know how relevant each item is; can we define an overall score for the relevancy of this list? ● Yes we can: the answer is NDCG (Normalized Discounted Cumulative Gain) https://en.wikipedia.org/wiki/Discounted_cumulative_gain
  • 16. Ranking problem for ecommerce ● Suppose we know the relevancy scores; let’s rank them ● Let’s add a correction for position via log2(i+1) ● Divide and sum to get a score: the discounted cumulative gain (DCG = 7.84) ● Do the same for this list in perfect order to get the ideal DCG (IDCG = 9.00) ● Divide DCG / IDCG to get the normalized discounted cumulative gain (NDCG = 0.87)
      i   relevancy   log2(i+1)   relevancy / log2(i+1)
      1   2           1.00        2.00
      2   3           1.58        1.89
      3   4           2.00        2.00
      4   1           2.32        0.43
      5   3           2.58        1.16
      6   1           2.81        0.36
      Sum (DCG): 7.84
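The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming the linear-gain form of DCG that the slide's numbers imply (relevancy divided by log2(i+1), not the exponential 2^rel - 1 variant); the function names `dcg` and `ndcg` are illustrative and not taken from the talk.

```python
import math


def dcg(relevancies):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), with i starting at 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevancies, start=1))


def ndcg(relevancies):
    """DCG of the list divided by the DCG of the same items in ideal (descending) order."""
    ideal = dcg(sorted(relevancies, reverse=True))
    return dcg(relevancies) / ideal if ideal > 0 else 0.0


# Relevancy scores in the order shown on the slide.
ranked = [2, 3, 4, 1, 3, 1]

dcg_value = dcg(ranked)
idcg_value = dcg(sorted(ranked, reverse=True))
ndcg_value = ndcg(ranked)

print(f"DCG={dcg_value:.2f} IDCG={idcg_value:.2f} NDCG={ndcg_value:.2f}")
# prints: DCG=7.84 IDCG=9.00 NDCG=0.87
```

An NDCG of 1.0 would mean the list is already in the ideal order, so a ranking model tries to push this value as close to 1.0 as possible across queries.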
  • 17. Ranking problem for ecommerce: explain the relevancy scores with features (title match, article match, reviews, seasonality, price, …), then maximize the NDCG by giving weight to features
  • 18. Learning to rank pipeline
  • 20. Pipeline: (1) Data collection; (2) Click model, for relevancy scores; (3) Feature generation, for explaining relevance; (4) Ranking model, for estimating weights to features; (5) Serve model (ElasticSearch LTR), for productionising; (6) Evaluation (Tableau)
  • 21. Efforts • Initial effort of building the pipeline: 2 data scientists and 1 data engineer (for search and Product Overview Page) for a couple of months • New click/ranking model: 1 data scientist can train, test and push a new ranking model to production within 1 hour
  • 22. Data collection ● Source: raw Google Analytics feed (daily) ● Per product list (i.e. search, overview page): ○ ProductID ○ Position / Page ○ Impression / Click ● Challenges: ○ tagging is different for web and app ○ devices have different display formats
  • 23. Click model. Reality: we don’t know the relevancy scores, so we use a click model. Goal: determine the relevance of products in each SOP/POP. Approach: predict the relevance of products based on their impressions and clicks, given their position • Clicks over Expected Clicks (COEC) • Corrected for small search queries. In our case: better results, easier to train & explain • DBN click model (https://github.com/varepsilon/clickmodels) • Paper: Dynamic Bayesian Network (DBN) model: Chapelle, O. and Zhang, Y. 2009. A dynamic Bayesian network click model for web search ranking. WWW (2009)
  • 25. COEC click model
      search phrase   product id                    clicks   expected clicks   bucket (aka relevancy)
      Jeans           0123456                       250      50                3
      Jeans           6543210                       200      20                4
      Jeans           3211231                       300      300               2
      Jeans           4566543                       400      800               1
      Jeans           9997979 (random product id)   -        -                 0
      The last row shows added random data: a random product id with relevancy 0.
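A COEC score is the ratio of observed clicks to the clicks one would expect from the positions at which a product was shown. The sketch below illustrates that idea under loud assumptions: the bucket boundaries in `BUCKET_BOUNDARIES` are hypothetical and were picked only so that the example rows above land in the buckets shown on the slide, the per-position click-through rates are invented, and the correction for small search queries mentioned in the talk is left out.

```python
from bisect import bisect_right

# Hypothetical boundaries on the COEC ratio (not from the talk); they split
# the ratio into four relevancy buckets 1..4.
BUCKET_BOUNDARIES = [0.75, 2.0, 7.5]


def expected_clicks(impressions_by_position, ctr_by_position):
    """Expected clicks: impressions at each position times the average CTR of that position."""
    return sum(n * ctr_by_position[pos] for pos, n in impressions_by_position.items())


def coec(clicks, expected):
    """Clicks over expected clicks."""
    return clicks / expected


def relevancy_bucket(clicks, expected):
    """Relevancy bucket 1..4 from the COEC ratio; 0 when there is no click data (random rows)."""
    if clicks is None or not expected:
        return 0
    return 1 + bisect_right(BUCKET_BOUNDARIES, coec(clicks, expected))


# Rows from the slide: (product id, clicks, expected clicks).
rows = [
    ("0123456", 250, 50),
    ("6543210", 200, 20),
    ("3211231", 300, 300),
    ("4566543", 400, 800),
    ("9997979", None, None),  # randomly added product without click data
]
buckets = {pid: relevancy_bucket(clicks, expected) for pid, clicks, expected in rows}

# Invented example of how an expected-clicks value could arise.
example_expected = expected_clicks({1: 100, 2: 200}, {1: 0.25, 2: 0.125})
```

Dividing by expected clicks is what removes position bias: a product shown mostly at the top of the list needs more clicks to reach the same bucket than one shown far down.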
  • 26. Demo click model. Query: “Flared jeans”; example products with relevancy 1 and relevancy 4
  • 27. Feature generation: try to explain and predict which attributes (i.e. features) of products (with respect to the user query) contribute to their relevance score. Feature examples: title match, description match, tf-idf, …; popularity, discount / promo, seasonality, reviews, days online, brand, … ● Limit the number of features to < 100 (latency issues) ● For POP features we did not use OHE (one-hot encoding), but a Bayesian encoder to limit the number of features
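The slide does not say which Bayesian encoder was used. As an illustration only, the sketch below uses a smoothed mean (m-estimate) target encoding, one common encoder of that family: each category value is replaced by a single number, its mean relevancy shrunk towards the global mean, so a categorical attribute such as brand costs one feature instead of one column per brand. The function names, the `prior_weight` parameter and the brand labels are all hypothetical.

```python
from collections import defaultdict


def fit_smoothed_target_encoder(categories, targets, prior_weight=10.0):
    """Return ({category: encoded value}, global mean).

    encoded = (sum of targets in category + prior_weight * global_mean)
              / (count in category + prior_weight)
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    encoding = {
        category: (sums[category] + prior_weight * global_mean)
        / (counts[category] + prior_weight)
        for category in counts
    }
    return encoding, global_mean


def encode(category, encoding, global_mean):
    """Encode one value; unseen categories fall back to the global mean."""
    return encoding.get(category, global_mean)


# Hypothetical data: brand per product and its relevancy label.
brands = ["brand_a", "brand_a", "brand_a", "brand_b"]
relevancy = [4, 4, 4, 0]

encoding, global_mean = fit_smoothed_target_encoder(brands, relevancy, prior_weight=1.0)
```

A larger `prior_weight` pulls rare categories harder towards the global mean, which limits overfitting on brands with only a handful of products.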
  • 28. Feature generation (diagram). Components: feature notebooks, Kafka, search processor, ES index, S3. A query such as “jeans” is run against the ES index and the results are logged with feature values ● Initial training and query building with snapshot data
  • 29. Feature generation (diagram). Components: feature notebooks, Delta pre-processed data, Delta feature DB, ES index; features are fetched and sent to ES. Labels: seasonality estimate per article type; OpenWeatherMap API; AGG view/sales/promo data; time-series models with prediction → scaled via Pandas UDF
  • 30. Feature generation (diagram). Components: feature notebooks, Kafka, search processor, ES index, S3. A query such as “jeans” is run and the results are logged with feature values; then: add click model labels, train model, add model to ES
  • 32. Ranking model • Many machine learning techniques could be used • The Elasticsearch LTR plugin supports XGBoost • XGBoost → eXtreme Gradient Boosting – Variant of the gradient boosting technique (tree-based model) – Non-linearity – Good results (e.g. Kaggle competitions) – Easy to use, tune, and evaluate – Fast (parallel computation on a single machine, but also cluster support, e.g. Spark) • XGBoost has lots of parameters to tune; we use Hyperopt to help with this https://hyperopt.github.io/hyperopt/ • XGBoost has rank:ndcg as an objective option
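XGBoost's ranking objectives need to know which training rows belong to the same query, so that NDCG is computed per query rather than over the whole dataset. The sketch below prepares that grouping and a parameter dictionary with the `rank:ndcg` objective named on the slide. The values for `eta` and `max_depth` and the example query ids are hypothetical placeholders (the talk tunes parameters with Hyperopt), and the training call is shown only as a comment so the block runs without xgboost installed.

```python
from itertools import groupby


def group_sizes(query_ids):
    """Sizes of consecutive runs of equal query ids; rows must already be sorted by query."""
    return [len(list(run)) for _, run in groupby(query_ids)]


# Parameters for a learning-to-rank model; only the objective comes from the talk.
params = {
    "objective": "rank:ndcg",  # optimise NDCG within each query group
    "eval_metric": "ndcg",
    "eta": 0.1,                # hypothetical learning rate
    "max_depth": 6,            # hypothetical tree depth
}

# Hypothetical training rows, one query id per row, sorted by query.
query_ids = ["jeans", "jeans", "jeans", "ladies jeans", "ladies jeans"]
sizes = group_sizes(query_ids)

# With xgboost installed, training would look roughly like this
# (features and relevancy_labels are assumed to be aligned with query_ids):
#   import xgboost as xgb
#   dtrain = xgb.DMatrix(features, label=relevancy_labels)
#   dtrain.set_group(sizes)
#   booster = xgb.train(params, dtrain, num_boost_round=100)
```

The relevancy labels here would be the buckets produced by the click model, and the feature columns would be the generated features.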
  • 33. Ranking model • Each hyperopt run stores the result of its best parameters in MLflow
  • 34. Ranking model. After training, store information in MLflow: • Feature importances (SHAP) • Test and training datasets • Feature map • XGBoost model
  • 35. Ranking model. For examining the feature importance of each model we use SHAP https://github.com/slundberg/shap (plot of SHAP values for features F1, F2, F3, F4, …)
  • 36. Serve model. With a few lines of code we save our model to an Elasticsearch index
  • 39. Serve model (tree diagram). Node 0 asks “Popularity score > 0.006?”; its No and Yes branches lead to Node 1 and Node 2
  • 40. Serve model (tree diagram, continued). Node 0 asks “Popularity score > 0.006?”; a further node asks “Title match > 9.25?”, with its own Yes and No branches; the tree now has Node 0 to Node 4
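The tree on these slides can be read as nested if/else rules that end in a leaf score, and a boosted model adds up the leaf scores of all its trees. The sketch below mirrors the two splits from the slide; everything else is an assumption: the leaf values are invented, the “Title match” split is placed under the “No” branch because that is how the flattened diagram reads, and the dictionary layout is an illustrative format rather than XGBoost's own model dump, which encodes splits differently.

```python
# One tree, following the slide's questions. Leaf values are hypothetical.
TREE = {
    "feature": "popularity_score",
    "threshold": 0.006,
    "yes": {"leaf": 0.4},
    "no": {
        "feature": "title_match",
        "threshold": 9.25,
        "yes": {"leaf": 0.2},
        "no": {"leaf": -0.3},
    },
}


def score_tree(tree, features):
    """Walk from the root to a leaf, answering each 'feature > threshold?' question."""
    node = tree
    while "leaf" not in node:
        branch = "yes" if features[node["feature"]] > node["threshold"] else "no"
        node = node[branch]
    return node["leaf"]


def score_model(trees, features):
    """A boosted model's raw score is the sum of the leaf values of all its trees."""
    return sum(score_tree(tree, features) for tree in trees)


popular = {"popularity_score": 0.01, "title_match": 0.0}
good_title = {"popularity_score": 0.001, "title_match": 10.0}
neither = {"popularity_score": 0.001, "title_match": 5.0}

scores = [score_tree(TREE, f) for f in (popular, good_title, neither)]
```

Products are then sorted by the model score in descending order, which is the step the search engine performs at query time once the model is stored there.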
  • 41. Evaluation • Use A/B testing to check whether ranking models outperform the standard implementation. Tests are configured with Planout https://github.com/facebook/planout • An automated Tableau report shows the results of the A/B test • We report quite a few metrics, but most importantly look at: - Click-Through Rate - Revenue per session - Paul Score https://www.mediawiki.org/wiki/Wikimedia_Discovery/Search/Glossary
  • 43. Our journey ahead • For search: build multiple models for multiple categories, based on search-phrase classification • Add more product-specific attributes • Test with personalisation
  • 44. Wrap up. Automating a learning-to-rank pipeline requires a lot of different parts working together: - Google Analytics - Databricks / Spark - Elasticsearch - S3 - XGBoost - Hyperopt / SHAP - MLflow - Planout - Tableau