Counterfactual Learning to Rank: Personalized Recommendations in Ecommerce
Alex Egg | @eggie5
Outline
● 2 Stage IR System
● Candidate Selection
● Ranking
● Personalization (Modeling Interactions)
● Features (for recommenders)
● Log Feedback
● Biased Data (Counterfactuals & Reinforcement)
● Training
● Tuning
● Deployment
● Evaluation
● Ops
[Figure panels: Restaurant Recommendations | Menu/Dish Recommendations | Rest/Dish/Cuisine Search | Cuisine Recommendations]
Two-Stage Information Retrieval System
2 Stages:
● Candidates
● Rankings
Candidate Selection (Recall)
Motivation: We can’t rank the whole catalog in SLA
Fast/High recall set << Catalogue
● metadata-based filters: eg select items in user’s fav cuisines or genre
● Item co-occurrences: eg clusters that belong to your past items
● k-nearest neighbors: eg find similar items in R^n space (see ANN later)
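The k-nearest-neighbors bullet can be sketched in a few lines. This is a minimal brute-force illustration, assuming items are already embedded as vectors in R^n and that cosine similarity is the similarity measure; the item names and vectors are made up for the example. A production system would replace the linear scan with an approximate nearest-neighbor (ANN) index.

```python
import math

# Hypothetical item embeddings in R^3 (illustrative values only).
ITEM_VECTORS = {
    "margherita_pizza": [0.9, 0.1, 0.0],
    "pepperoni_pizza":  [0.8, 0.2, 0.1],
    "calzone":          [0.7, 0.1, 0.3],
    "pad_thai":         [0.1, 0.9, 0.2],
    "green_curry":      [0.0, 0.8, 0.3],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_candidates(query_item, k):
    """Return the k items most similar to query_item (excluding itself)."""
    query = ITEM_VECTORS[query_item]
    scored = [
        (cosine(query, vec), name)
        for name, vec in ITEM_VECTORS.items()
        if name != query_item
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


candidates = knn_candidates("margherita_pizza", k=2)
print(candidates)  # → ['pepperoni_pizza', 'calzone']
```

The candidate set stays small (k items) regardless of catalog size, which is what lets the ranking stage fit in the SLA.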
Ranking (Precision)
Rank Candidates w/ high precision using Supervised Learning
● Classification
○ Binomial: P( click | u, i )
○ Multinomial: P( I | u ) → Autoencoders
● Ranking
○ Pointwise, pairwise, listwise
* Choice of approach is a product of your supervision labels: binary feedback or relevance labels
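To make the pointwise / pairwise / listwise distinction concrete, here is a minimal sketch of one common loss from each family for a single list of items. The specific choices (log loss, pairwise logistic loss, softmax cross-entropy) are assumptions picked for illustration; the slides do not say which losses were used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pointwise_loss(scores, labels):
    """Log loss on each (score, label) independently; labels in {0, 1}."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(s)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(scores)


def pairwise_loss(scores, labels):
    """Logistic loss on every (more relevant, less relevant) pair."""
    losses = []
    for s_i, y_i in zip(scores, labels):
        for s_j, y_j in zip(scores, labels):
            if y_i > y_j:  # item i should rank above item j
                losses.append(math.log(1.0 + math.exp(-(s_i - s_j))))
    return sum(losses) / len(losses) if losses else 0.0


def listwise_loss(scores, labels):
    """Softmax cross-entropy over the whole list."""
    log_norm = math.log(sum(math.exp(s) for s in scores))
    label_sum = sum(labels)
    return -sum((y / label_sum) * (s - log_norm) for s, y in zip(scores, labels))


labels = [1, 0, 0]               # only the first item was clicked
good_scores = [2.0, 0.0, -1.0]   # clicked item scored highest
bad_scores = [-1.0, 0.0, 2.0]    # clicked item scored lowest

for name, loss in [("pointwise", pointwise_loss),
                   ("pairwise", pairwise_loss),
                   ("listwise", listwise_loss)]:
    print(f"{name:9s} good={loss(good_scores, labels):.3f} "
          f"bad={loss(bad_scores, labels):.3f}")
```

All three losses are lower for the ordering that puts the clicked item first; they differ in whether they look at items one at a time, in pairs, or as a whole list.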
Supervised Learning Task w/ sparse categorical variables:
$f(X) = y,\quad D = (X, y),\quad X = (U, R),\quad U, R \in \mathbb{R}^1,\quad y \in \{0, 1\}$
Linear Model: $P(y \mid X) = \sigma(Xw) = \sigma(u_1 w_1 + r_1 w_2)$
U(is_french) = {1, -1}
R(is_french) = {1, -1}
X=[1, 1] ← french lover + french rest
X=[1, -1] ← french lover + non-french rest
X=[-1, 1] ← french hater + french rest
X=[-1,-1] ← french hater + non-french rest
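A quick worked check of what the linear model can and cannot express. The assumption here, not stated on the slide, is that the desired behavior is a "taste match": a French lover with a French restaurant and a French hater with a non-French restaurant should both score higher than the two mismatched pairs. The sketch searches a grid of weights and finds none that achieves this.

```python
import itertools
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# The four (u1, r1) combinations from the slide.
ROWS = {
    "french lover + french rest":     (1, 1),
    "french lover + non-french rest": (1, -1),
    "french hater + french rest":     (-1, 1),
    "french hater + non-french rest": (-1, -1),
}
MATCHES = [(1, 1), (-1, -1)]      # assumed "good" pairings
MISMATCHES = [(1, -1), (-1, 1)]   # assumed "bad" pairings


def linear_score(x, w):
    """P(y | X) = sigmoid(u1*w1 + r1*w2)."""
    u1, r1 = x
    return sigmoid(u1 * w[0] + r1 * w[1])


def separates(w):
    """True if every match scores above every mismatch under weights w."""
    return all(
        linear_score(m, w) > linear_score(mm, w)
        for m in MATCHES
        for mm in MISMATCHES
    )


grid = [i / 2 for i in range(-6, 7)]  # weights from -3.0 to 3.0 in steps of 0.5
working = [w for w in itertools.product(grid, repeat=2) if separates(w)]
print("weight pairs that rank both matches above both mismatches:", len(working))
# → 0

# Why: the two matches get opposite logits, +(w1+w2) and -(w1+w2),
# so whatever helps one hurts the other.
w = (1.0, 1.0)
for label, x in ROWS.items():
    print(f"{label:32s} {linear_score(x, w):.2f}")
```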
Personalization (Modeling Interactions)
Feature Crosses: (2nd-order)
$\sigma(\phi(X) w) = \sigma(u_1 w_1 + r_1 w_2 + u_1 r_1 w_3)$
X=[1, 1, 1] ← french lover + french rest
X=[1, -1, -1] ← french lover + non-french rest
X=[-1, 1, -1] ← french hater + french rest
X=[-1,-1, 1] ← french hater + non-french rest
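The four rows above can be reproduced with a small feature map. Under an illustrative "taste match" assumption (matched user/restaurant pairs should outscore mismatched ones, which is my reading rather than something the slide states), the cross term u1*r1 is +1 for both matches and -1 for both mismatches, so a single positive weight on it is enough. The weights below are picked by hand, not learned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def phi(u1, r1):
    """Feature map with the 2nd-order cross: [u1, r1, u1*r1]."""
    return [u1, r1, u1 * r1]


def crossed_score(x, w):
    """sigmoid(phi(X) . w) = sigmoid(u1*w1 + r1*w2 + u1*r1*w3)."""
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)))


ROWS = [
    ("french lover + french rest",      1,  1),
    ("french lover + non-french rest",  1, -1),
    ("french hater + french rest",     -1,  1),
    ("french hater + non-french rest", -1, -1),
]

w = [0.0, 0.0, 2.0]  # hand-picked: all weight on the cross term

scores = {}
for label, u1, r1 in ROWS:
    x = phi(u1, r1)
    scores[label] = crossed_score(x, w)
    print(f"X={x!s:12s} {label:32s} {scores[label]:.2f}")
# → matches score 0.88, mismatches score 0.12
```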
Go Deep!: 2-layer MLP
How to model nth-Order Interactions?
● Explicit & implicit feature crosses (very sparse feature space, expensive)
● Combinations of explicit and implicit (wide & deep)!
Deep & Cross Network
Are multiplicative crosses enough? FMs → MLPs
Recent studies [1, 2] found that DNNs are inefficient at even approximately modeling 2nd- or 3rd-order feature crosses.
● What is the advantage of DCN?
○ Efficient Explicit Feature Crosses
[1] Latent Cross: Making Use of Context in Recurrent Recommender Systems
[2] Deep & Cross Network for Ad Click Predictions
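For readers who have not seen it, this is a minimal sketch of the cross layer from the Deep & Cross Network paper cited as [2]: x_{l+1} = x_0 (x_l . w_l) + b_l + x_l. The weights are fixed by hand to make the explicit crosses visible; nothing below reflects how the model in the talk was configured or trained.

```python
def cross_layer(x0, xl, w, b):
    """One DCN cross layer: x_{l+1} = x0 * (xl . w) + b + xl."""
    scalar = sum(xl_i * w_i for xl_i, w_i in zip(xl, w))  # xl . w
    return [x0_i * scalar + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


# Two input features, written u and r to echo the earlier slides.
u, r = 2.0, 3.0
x0 = [u, r]
zero_bias = [0.0, 0.0]

# Layer 1 with w = [0, 1]: the scalar is r, so x1[0] = u*r + u.
x1 = cross_layer(x0, x0, w=[0.0, 1.0], b=zero_bias)
print(x1)  # → [8.0, 12.0]

# Layer 2 with w = [1, 0]: the scalar is x1[0], so x2[0] = u*(u*r + u) + u*r + u,
# which contains the 3rd-order term u*u*r.
x2 = cross_layer(x0, x1, w=[1.0, 0.0], b=zero_bias)
print(x2)  # → [24.0, 36.0]
```

Each stacked layer raises the highest interaction order by one while adding only one weight vector and one bias vector of the input's size.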
Features
Sparse categorical variables → embeddings
Examples:
● User
● Item
● Context
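A minimal sketch of what "sparse categorical variables → embeddings" means in practice: each categorical ID indexes a row of a dense table, and the looked-up rows are concatenated into the model input. The vocabularies, the embedding width of 4, and the random initialization are all assumptions for illustration; in a real model the tables are learned parameters.

```python
import random

random.seed(0)
EMBED_DIM = 4  # assumed width, illustrative only


def make_table(vocab):
    """One dense vector per category value, randomly initialized."""
    return {v: [random.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)] for v in vocab}


# Hypothetical vocabularies for the three feature groups on the slide.
TABLES = {
    "user":    make_table(["u_17", "u_42"]),
    "item":    make_table(["pizza", "pad_thai", "burger"]),
    "context": make_table(["lunch", "dinner"]),
}


def embed(example):
    """Look up each categorical feature and concatenate the vectors."""
    dense = []
    for field in ("user", "item", "context"):
        dense.extend(TABLES[field][example[field]])
    return dense


x = embed({"user": "u_42", "item": "pizza", "context": "dinner"})
print(len(x))  # → 12
```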
Log Feedback
Full-feedback → Partial-feedback (logs)
● Log Feedback
● Biased Data
● Evaluation Paradox
Log Feedback
D = (x, y, r)
● x: context (user)
● y: action (item/ranking)
● r: reward (feedback click/order)
Feedback
● Explicit Feedback: like/stars
● Implicit Feedback: watch/click/order
Tradeoff: quantity/quality
Biased Data P( y | X )
Feedback
● Organic/Full-Feedback
○ Common in Academia, Rare in industry (ADS-16, MSLR-WEB30k)
● Bandit/partially-observed-Feedback
○ Click logs from industrial applications
Analogy: What is your favourite color: red or black? => red
P(y=red | X=🧐) = 1 ← is this actually true?
“Missing, not at random”
Apply this analogy to any recsys you use: netflix, spotify, amzn, grubhub
Evaluation (Thought Experiment)
● Classic train/test split to predict the test set accurately...
● Dataset of production system logs D=(x,y,r) ...
● What is the value of predicting the test set accurately?
● Is the test-set a reflection of organic user behavior? (No) Or a reflection of the logging policy!? (Yes)
● There is a difference between a prediction and a recommendation (A recommendation is an intervention)
● Bandit feedback is the product of a logging policy
● Logging policy is the previous generation recommender, ie the dataset (logs)
Goal
● Supervised Learning: Predict test-set
● Actual: Predict user behavior
Test-set != user behavior
Counterfactual Learning
● Selection Bias
● Position Bias
Randomization and stochasticity
Causal Embeddings for Recommendation. Bonner, Vasile. Recsys ‘18
Selection Bias (Randomization)
Bias from Feedback loops
Add Exploration → Stochasticity (Randomization)
● Random Exploration w/ ϵ-Greedy Bandit
● Causal Embeddings: Jointly factorize unbiased and greedy embeddings
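A minimal sketch of ϵ-greedy exploration, assuming a single slot is filled from model scores and that, with probability ϵ, it is filled uniformly at random instead. The useful by-product is that every shown item has a known, non-zero propensity that can be logged. The scores and ϵ = 0.1 are invented for the example.

```python
import random


def epsilon_greedy(scores, epsilon, rng):
    """Pick one item; return (item, propensity of having picked it)."""
    items = list(scores)
    greedy = max(items, key=scores.get)
    if rng.random() < epsilon:
        choice = rng.choice(items)     # explore: uniform over all items
    else:
        choice = greedy                # exploit: current best guess
    # Probability this policy assigns to `choice`, whichever branch fired.
    propensity = epsilon / len(items)
    if choice == greedy:
        propensity += 1.0 - epsilon
    return choice, propensity


rng = random.Random(0)
scores = {"pizza": 0.9, "pad_thai": 0.6, "burger": 0.2}  # hypothetical model scores

counts = {item: 0 for item in scores}
for _ in range(10_000):
    item, p = epsilon_greedy(scores, epsilon=0.1, rng=rng)
    counts[item] += 1

print(counts)  # pizza is shown about 93% of the time, the others about 3% each
```

Logging `p` alongside each impression is what later makes inverse propensity scoring possible.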
Position Bias (Randomization)
Bias from devices
Inverse Propensity Scoring
Compute inverse propensities 1/b_i across ranks for random bucket
Offset loss:
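The propensity step on this slide can be sketched as follows, under stated assumptions: in the random bucket items are shuffled, so any difference in click rate between ranks is attributed to position alone; b_i is taken to be the click rate at rank i relative to rank 1; and clicks from production logs are then up-weighted by 1/b_i in a pointwise log loss. The click counts are invented. This is one simple way to apply IPS to position bias and is not a reconstruction of the slide's offset loss, whose formula is not reproduced here.

```python
import math

# Hypothetical random-bucket logs: impressions and clicks per rank.
RANDOM_BUCKET = {
    1: {"impressions": 1000, "clicks": 100},
    2: {"impressions": 1000, "clicks": 50},
    3: {"impressions": 1000, "clicks": 25},
}


def position_propensities(bucket):
    """b_i = CTR at rank i divided by CTR at rank 1."""
    top = bucket[1]
    return {
        rank: (s["clicks"] * top["impressions"]) / (s["impressions"] * top["clicks"])
        for rank, s in bucket.items()
    }


def ips_weighted_log_loss(examples, b):
    """Pointwise log loss where each click is weighted by 1 / b_rank."""
    total = 0.0
    for p_click, clicked, rank in examples:
        if clicked:
            total += -(1.0 / b[rank]) * math.log(p_click)
        else:
            total += -math.log(1.0 - p_click)
    return total / len(examples)


b = position_propensities(RANDOM_BUCKET)
print(b)  # → {1: 1.0, 2: 0.5, 3: 0.25}

# (model's predicted click probability, observed click, rank shown at)
production_logs = [
    (0.30, 1, 1),
    (0.30, 1, 3),   # same prediction, but the click happened at rank 3
    (0.10, 0, 2),
]
loss = ips_weighted_log_loss(production_logs, b)
print(round(loss, 3))
```

A click at rank 3 counts four times as much as a click at rank 1 here, because users were a quarter as likely to click at that position for reasons unrelated to relevance.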
Counterfactual Evaluation
● Partial Information
● Full Information
● Partial Information w/ bandit feedback
Medical Analogy
Patient  Bypass  Stent  Drugs
1        0       -      -
2        -       1      -
3        -       -      1
4        -       0      -
5        -       -      1
6        1       -      -
7        -       -      1
8        -       -      0
9        0       -      -
10       -       1      -
11       1       -      -
Partial Information Setting
Counterfactual Thinking
Treating Heart Attacks
● Treatments: Y: [bypass, stent, drugs]
● Outcomes δ_i: 5-year survival (0/1)
Which treatment is best??
● Drugs 3/4 🏅
● Stent 2/3
● Bypass 2/4
Really? 🤔
Patient Bypass Stent Drugs
1 0 1 0
2 1 1 0
3 0 0 1
4 0 0 0
5 0 1 1
6 1 0 0
7 1 0 1
8 0 1 0
9 0 1 0
10 1 1 0
11 1 1 1
Full Information Setting
Treatment Effects
Example:
● Bypass = 5/11 = .45
● Stent = 7/11 = .63🏅
● Drugs = 4/11 = .36
Patient  Bypass  Stent  Drugs  P_B  P_S  P_D
1        0       1      0      .3   .6   .1
2        1       1      0      .4   .5   .1
3        0       0      1      .1   .1   .8
4        0       0      0      .6   .3   .1
5        0       1      1      .2   .1   .7
6        1       0      0      .4   .2   .4
7        1       0      1      .1   .1   .8
8        0       1      0      .1   .8   .1
9        0       1      0      .3   .3   .4
10       1       1      0      .3   .2   .1
11       1       1      1      .4   .4   .2
Partial Information Setting w/ Bandit Feedback
Assignment
$\hat{R}_{IPS}(y) = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathbb{I}(y_i = y)}{p_i}\, \delta(x_i, y_i)$
● Bypass = 1/11 (0/.3 + 1/.4 + 0/.3 + 1/.4) = .45
● Stent = 1/11 (1/.5 + 0/.3 + 1/.2) = .63🏅
● Drugs = 1/11 (1/.8 + 1/.7 + 1/.8 + 0/.1) = .36
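The three estimates in the medical analogy can be reproduced directly. The sketch below takes the full-information outcomes and the propensity table as printed on the slides. The treatment assigned to each patient is inferred from the terms in the IPS sums (patients 1, 6, 9, 11 → bypass; 2, 4, 10 → stent; 3, 5, 7, 8 → drugs), which is my reconstruction and should be treated as an assumption.

```python
TREATMENTS = ("bypass", "stent", "drugs")

# Full-information outcomes (5-year survival) per patient, as on the slide.
OUTCOMES = {
    1:  (0, 1, 0),  2:  (1, 1, 0),  3:  (0, 0, 1),  4:  (0, 0, 0),
    5:  (0, 1, 1),  6:  (1, 0, 0),  7:  (1, 0, 1),  8:  (0, 1, 0),
    9:  (0, 1, 0),  10: (1, 1, 0),  11: (1, 1, 1),
}

# Logging-policy propensities (P_B, P_S, P_D), copied as printed.
# Only the propensity of the treatment actually given is used below.
PROPENSITY = {
    1:  (.3, .6, .1),  2:  (.4, .5, .1),  3:  (.1, .1, .8),  4:  (.6, .3, .1),
    5:  (.2, .1, .7),  6:  (.4, .2, .4),  7:  (.1, .1, .8),  8:  (.1, .8, .1),
    9:  (.3, .3, .4),  10: (.3, .2, .1),  11: (.4, .4, .2),
}

# Treatment actually given (inferred from the IPS terms; an assumption).
ASSIGNED = {1: "bypass", 2: "stent", 3: "drugs", 4: "stent", 5: "drugs",
            6: "bypass", 7: "drugs", 8: "drugs", 9: "bypass", 10: "stent",
            11: "bypass"}

n = len(OUTCOMES)


def naive(t):
    """Average outcome among patients who happened to receive t."""
    k = TREATMENTS.index(t)
    got = [OUTCOMES[i][k] for i in ASSIGNED if ASSIGNED[i] == t]
    return sum(got) / len(got)


def full_information(t):
    """Average outcome if everyone had received t (needs counterfactuals)."""
    k = TREATMENTS.index(t)
    return sum(OUTCOMES[i][k] for i in OUTCOMES) / n


def ips(t):
    """(1/n) * sum over patients given t of outcome / propensity."""
    k = TREATMENTS.index(t)
    return sum(
        OUTCOMES[i][k] / PROPENSITY[i][k]
        for i in ASSIGNED if ASSIGNED[i] == t
    ) / n


for t in TREATMENTS:
    print(f"{t:6s} naive={naive(t):.2f}  full={full_information(t):.2f}  ips={ips(t):.2f}")
```

The naive averages pick drugs as the best treatment, while both the full-information averages and the IPS estimates pick stent. The IPS function only reads outcomes for the treatment each patient actually received, so it needs nothing beyond the logs plus propensities.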
Off-policy Evaluation, eg 🎉 Offline AB Testing🎉
1. Policy: Deterministic y = f(x) → Stochastic p ~ π(x)
2. Log propensities: D=(x,y,r,p)
3. Build IPS Estimator
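The three steps above can be sketched end to end on simulated data. Everything in the simulation is an assumption made for illustration: the click probabilities, the context-free softmax logging policy, and the deterministic target policy being evaluated. The point is only that, because the logger is stochastic and records the propensity p of each action, the IPS estimate recovers the target policy's value from logs alone, whereas a plain average over the logs measures the logging policy.

```python
import math
import random

rng = random.Random(0)
ACTIONS = ["pizza", "pad_thai", "burger"]
# True click probabilities; hidden from the estimator, used only to simulate.
TRUE_CLICK_PROB = {"pizza": 0.10, "pad_thai": 0.30, "burger": 0.05}


def logging_policy():
    """Step 1: a stochastic policy; here a softmax over fixed scores."""
    scores = {"pizza": 2.0, "pad_thai": 0.0, "burger": 0.0}
    z = sum(math.exp(s) for s in scores.values())
    return {a: math.exp(s) / z for a, s in scores.items()}


def collect_logs(n):
    """Step 2: log D = (x, y, r, p), including the propensity of the shown action."""
    pi = logging_policy()
    logs = []
    for _ in range(n):
        y = rng.choices(ACTIONS, weights=[pi[a] for a in ACTIONS])[0]
        r = 1 if rng.random() < TRUE_CLICK_PROB[y] else 0
        logs.append({"x": None, "y": y, "r": r, "p": pi[y]})  # no context in this toy
    return logs


def ips_value(logs, target_action):
    """Step 3: IPS estimate for a policy that always shows target_action."""
    return sum(rec["r"] / rec["p"] for rec in logs if rec["y"] == target_action) / len(logs)


logs = collect_logs(50_000)
naive_ctr = sum(rec["r"] for rec in logs) / len(logs)
estimate = ips_value(logs, "pad_thai")
print(f"CTR seen in logs (logging policy): {naive_ctr:.3f}")
print(f"IPS estimate for 'always pad_thai': {estimate:.3f}  (simulated truth 0.300)")
```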
Counterfactual Evaluation
Experiments & Results
Personalized Policy
● Evaluation
● Interaction Modeling
● Multi-relevance Feedback
● Congregated Search
● Market Generalization
● GPU Workflows
Evaluation
Offline Metrics
● NDCG
● MRR
● AUC
● Gini-lorenz
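For reference, minimal implementations of three of the offline metrics listed, for a single ranked list with binary relevance. The formulas are the standard textbook ones (log2 discount for NDCG, reciprocal rank of the first relevant item, pairwise-comparison AUC); the example list is invented, and the slides do not specify cutoffs or tie handling.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a list already in ranked order."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def reciprocal_rank(relevances):
    """1 / rank of the first relevant item, or 0 if none is relevant.

    MRR is the mean of this value over all queries or sessions.
    """
    for pos, rel in enumerate(relevances):
        if rel > 0:
            return 1.0 / (pos + 1)
    return 0.0


def auc(scores, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Relevance of items in the order the model ranked them (1 = ordered).
ranked_relevance = [0, 1, 0, 1]
print(round(ndcg(ranked_relevance), 3))
print(reciprocal_rank(ranked_relevance))   # → 0.5

scores = [0.9, 0.8, 0.3, 0.2]  # model scores for the same four items
print(auc(scores, ranked_relevance))       # → 0.25
```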
Hyperparameter Tuning
● Vertical Scaling across GPUs (p3) w/
● Exhaustive Search over 66 combinations. w/o concurrency would take ~200h.
Online Metrics
● Conversion Rate (Conv)
● Orders/Visitor (OPV)
● Revenue
● Diversity
● Fairness
Offline Evaluation
Metric Corr w/ Conv
MRR .55 📈
AUC .51
NDCG .48
loss .02
Surrogate Metric: Can we get directional estimates of online metrics, offline?
Can we design a metric that tracks conversion rate?
On-Policy Evaluation
● Most-popular policy vs Personalization Policy
● Personalization policy +20% & improves over time
● Diversity: ~5 cuisines/slate, 60% unique
On-Policy Evaluation: Fairness
Inequality (Lorenz Curve + Gini Index)
Quantifies inequality (ie impressions across merchants or wealth across populations)
The EE variant is more equitable than the MP variant.
Variant   Gini
Baseline  .60
MP        .59
EE        .49 🏅
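A minimal sketch of the Lorenz curve and Gini index computed from per-merchant impression counts, using the standard sorted-values formula G = Σ_i (2i − n − 1) x_i / (n Σ_i x_i). The impression counts are invented to show the two extremes plus a skewed case; they are not the data behind the table above.

```python
def lorenz_curve(values):
    """Cumulative share of the total held by the smallest k items, k = 0..n."""
    ordered = sorted(values)
    total = sum(ordered)
    curve, running = [0.0], 0.0
    for v in ordered:
        running += v
        curve.append(running / total)
    return curve


def gini(values):
    """Gini index: 0 = perfectly equal, (n-1)/n = one item gets everything."""
    ordered = sorted(values)
    n = len(ordered)
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(ordered, start=1))
    return weighted / (n * sum(ordered))


# Hypothetical impressions per merchant under three policies.
equal = [100, 100, 100, 100]
skewed = [10, 20, 70, 300]
winner_takes_all = [0, 0, 0, 400]

for name, impressions in [("equal", equal), ("skewed", skewed),
                          ("winner_takes_all", winner_takes_all)]:
    print(f"{name:17s} gini={gini(impressions):.3f}")
```

A lower Gini index means impressions are spread more evenly across merchants, which is the sense in which one variant is "more equitable" than another.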
Experiment: Causal Embeddings
Hypothesis: If we use the uniform data in a principled manner we can increase performance by overcoming selection bias.
Experiment:
● Random
● Biased
● Random ∪ Biased
● CausE
Results: Principled use of the uniform data was beneficial
Causal Embeddings for Recommendation. Bonner, Vasile. Recsys ‘18
Variant              AUC
Random (small data)  .56
Biased               .72
Random ∪ Biased      .73
Causal               .74
Experiment: Interaction Modeling
Hypothesis: MLPs are universal function approximators?
Experiment: Evaluate MLP against feature crosses
Results: MLP does not capture full interactions
Model   NDCG (unintentful)  MRR (unintentful)  AUC (unintentful)
Random  .511                .216               .500
UMP     .615                .582               .653
MLP     .627                .586               .689
DCN     .657 (+4.7%)        .617 (+5.2%)       .695 (+0.8%)
Experiment: Multi-relevance Feedback
Sources of Feedback:
● Impressions
● Clicks
● Orders
Metric Disagreement
Online Eval or Off-policy Eval
Feedback         NDCG (unintentful)  MRR (unintentful)  AUC (unintentful)
Orders & Clicks  .668                .633               .675
Clicks           .665 (0%)           .600 (-5%)         .757 (+12%)
Experiment: Global vs Local Models (Markets)
If your recommender operates in markets of varying sizes with distinct cultural/taste patterns, it’s important that your recs are high-quality in all markets.
● Operational Pain
● Market Sparsity
Model   NDCG (unintentful)  MRR (unintentful)  AUC (unintentful)
Local   .617                0.557              0.635
Global  .749 (+21%)         0.709 (+27.2%)     0.736 (+12%)
Experiment: GPU Data Pipelines
● IO → CPU → RAM → GPU RAM → GPU
● IO Bound
○ Sequential Data Access: Libsvm → TFRecords, GPU: 4% → 90%
○ tf.data pipelines are CPU only
○ Vmap: batch → map
○ Prefetch (to GPU memory)
Device              Step/s      Train Time (h)  Batch Size
CPU (64 cores)      0.58        47              448
K80 (4992 cores)    3.2         10              448
V100* (5120 cores)  8.5 (2-3x)  3.2 (3x)        448
* Not using Tensor-Cores/FP16
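The input-pipeline changes listed on this slide might look like this in tf.data. This is a minimal sketch, not the pipeline from the talk: the feature names, shapes, file path, and batch size are assumptions, and it writes a tiny TFRecord file first so that it is self-contained. Batching before `map` lets `tf.io.parse_example` parse a whole batch per call, which is how I read "batch → map"; `prefetch` overlaps input preparation with training.

```python
import os
import tempfile

import tensorflow as tf

# Write a small TFRecord file (hypothetical schema: 3 dense features + a label).
path = os.path.join(tempfile.mkdtemp(), "train.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for i in range(10):
        example = tf.train.Example(features=tf.train.Features(feature={
            "features": tf.train.Feature(
                float_list=tf.train.FloatList(value=[float(i), i + 1.0, i + 2.0])),
            "label": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[i % 2])),
        }))
        writer.write(example.SerializeToString())

FEATURE_SPEC = {
    "features": tf.io.FixedLenFeature([3], tf.float32),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def parse_batch(serialized):
    """Parse a whole batch of serialized examples in one vectorized call."""
    parsed = tf.io.parse_example(serialized, FEATURE_SPEC)
    return parsed["features"], parsed["label"]


dataset = (
    tf.data.TFRecordDataset(path)        # sequential reads from one file
    .batch(4)                            # batch first ...
    .map(parse_batch, num_parallel_calls=tf.data.AUTOTUNE)  # ... then map
    .prefetch(tf.data.AUTOTUNE)          # prepare the next batch while training
)
# To stage batches in GPU memory, tf.data.experimental.prefetch_to_device
# can be applied as the final transformation instead of .prefetch().

shapes = [tuple(x.shape) for x, _ in dataset]
print(shapes)  # → [(4, 3), (4, 3), (2, 3)]
```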
Thank You
Alex Egg | @eggie5

GTC 2021: Counterfactual Learning to Rank in E-commerce

  • 1. Counterfactual Learning to Rank: Personalized Recommendations in Ecommerce. Alex Egg | @eggie5
  • 2. Outline ● 2 Stage IR System ● Candidate Selection ● Ranking ● Personalization (Modeling Interactions) ● Features (for recommenders) ● Log Feedback ● Biased Data (Counterfactuals & Reinforcement) ● Training ● Tuning ● Deployment ● Evaluation ● Ops
  • 5. Two-Stage Information Retrieval System 2 Stages: ● Candidates ● Rankings
  • 6. Candidate Selection (Recall) Motivation: We can’t rank the whole catalog within the SLA. Fast, high-recall set << catalog ● Metadata-based filters: e.g. select items in the user’s favorite cuisines or genres ● Item co-occurrences: e.g. clusters that your past items belong to ● k-nearest neighbors: e.g. find similar items in $\mathbb{R}^n$ space (see ANN later)
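As a rough illustration of the k-nearest-neighbors option on this slide, the following brute-force sketch retrieves candidates by cosine similarity. The item vectors, the user vector, and the name `top_k_candidates` are assumptions made for this example; the talk points to approximate nearest neighbor (ANN) search for production use, which this exact search does not implement.

```python
import numpy as np

def top_k_candidates(query, item_vectors, k):
    """Return indices of the k items most similar to `query` (cosine similarity)."""
    items = item_vectors / np.linalg.norm(item_vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = items @ q
    # Sort by descending similarity and keep the first k indices.
    return np.argsort(-scores)[:k]

# Hypothetical 2-d item embeddings and a user (query) embedding.
item_vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
user_vector = np.array([1.0, 0.0])

candidates = top_k_candidates(user_vector, item_vectors, k=2)
```

Only the small candidate set returned here would be handed to the ranking stage.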
  • 7. Ranking (Precision) Rank candidates w/ high precision using supervised learning ● Classification ○ Binomial: P( click | u, i ) ○ Multinomial: P( I | u ) → Autoencoders ● Ranking ○ Pointwise, pairwise, listwise * The choice of approach depends on your supervision labels: binary feedback or relevance labels
  • 8. Supervised Learning Task w/ sparse categorical variables: $f(X) = y$, $D = (X, y)$, $X = (U, R)$, $U, R \in \mathbb{R}^1$, $y \in \{0, 1\}$
    Linear Model: $P(y \mid X) = \sigma(Xw) = \sigma(u_1 w_1 + r_1 w_2)$
    U(is_french) = {1, -1}, R(is_french) = {1, -1}
    X=[1, 1] ← french lover + french rest
    X=[1, -1] ← french lover + non-french rest
    X=[-1, 1] ← french hater + french rest
    X=[-1,-1] ← french hater + non-french rest
    Personalization (Modeling Interactions)
    Feature Crosses (2nd-order): $\sigma(\phi(X) w) = \sigma(u_1 w_1 + r_1 w_2 + u_1 r_1 w_3)$
    X=[1, 1, 1] ← french lover + french rest
    X=[1, -1, -1] ← french lover + non-french rest
    X=[-1, 1, -1] ← french hater + french rest
    X=[-1,-1, 1] ← french hater + non-french rest
    Go Deep!: 2-layer MLP
    How to model nth-order interactions? ● Explicit & implicit feature crosses (very sparse feature space, expensive) ● Combinations of explicit and implicit (wide & deep)!
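The french lover/hater example can be checked numerically. The sketch below compares a linear model with one that adds the cross term u1 * r1. The weight values are hypothetical and chosen by hand for illustration, not learned from data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Rows are (u1, r1): +1 = french lover / french restaurant, -1 = the opposite.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)

def phi(X):
    """Append the 2nd-order feature cross u1 * r1 as a third column."""
    return np.column_stack([X, X[:, 0] * X[:, 1]])

w_linear = np.array([1.0, 1.0])      # hypothetical weights (w1, w2)
w_cross = np.array([0.0, 0.0, 2.0])  # hypothetical weights (w1, w2, w3)

p_linear = sigmoid(X @ w_linear)
p_cross = sigmoid(phi(X) @ w_cross)
```

With the cross term, both matching pairs (lover + french, hater + non-french) score above 0.5 and both mismatches score below it. No choice of linear weights can do this, since the pattern is an XOR of the two inputs.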
  • 9. Deep & Cross Network. Are multiplicative crosses enough? FMs → MLPs. Recent studies [1, 2] found that DNNs are inefficient at even approximately modeling 2nd- or 3rd-order feature crosses. ● What is the advantage of DCN? ○ Efficient explicit feature crosses
    [1] Latent Cross: Making Use of Context in Recurrent Recommender Systems.
    [2] Deep & Cross Network for Ad Click Predictions.
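For readers unfamiliar with DCN, here is a minimal sketch of a single cross layer as described in the Deep & Cross Network paper cited on this slide: x_{l+1} = x_0 (x_l · w_l) + b_l + x_l. The input values and weights are hypothetical, and this is not the model trained in the talk.

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    """One explicit cross layer: x_{l+1} = x0 * (xl . w) + b + xl."""
    return x0 * (xl @ w) + b + xl

x0 = np.array([1.0, 2.0])   # hypothetical input features
w = np.array([0.0, 1.0])    # hypothetical layer weights
b = np.zeros(2)

# First cross layer: the output contains the product x0[0] * x0[1] explicitly.
x1 = cross_layer(x0, x0, w, b)
```

Each layer adds only one weight vector and one bias vector, which is why explicit crosses stay cheap compared with enumerating every feature combination.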
  • 10. Features Sparse categorical variables → embeddings Examples: ● User ● Item ● Context
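A small sketch of what "sparse categorical variables → embeddings" means in practice: an embedding lookup is equivalent to multiplying a one-hot vector by a weight matrix. The vocabulary, dimension, and random table below are assumptions for illustration; in a real model the table is learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary for a sparse categorical feature (user id).
vocab = {"user_1": 0, "user_2": 1, "user_3": 2}
embedding_dim = 4
embedding_table = rng.normal(size=(len(vocab), embedding_dim))

def embed(key):
    """Look up the dense vector for a categorical value."""
    return embedding_table[vocab[key]]

# The same result via an explicit one-hot encoding.
one_hot = np.zeros(len(vocab))
one_hot[vocab["user_2"]] = 1.0
dense_via_one_hot = one_hot @ embedding_table
```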
  • 11. Log Feedback Full-feedback → Partial-feedback (logs) ● Log Feedback ● Biased Data ● Evaluation Paradox
  • 12. Log Feedback D = (x, y, r) ● x: context (user) ● y: action (item/ranking) ● r: reward (feedback click/order) Feedback ● Explicit Feedback: like/stars ● Implicit Feedback: watch/click/order Tradeoff: quantity/quality
  • 13. Biased Data P( y | X ) Feedback ● Organic/Full-Feedback ○ Common in Academia, Rare in industry (ADS-16, MSLR-WEB30k) ● Bandit/partially-observed-Feedback ○ Click logs from industrial applications Analogy: What is your favourite color: red or black? => red P(y=red | X=🧐) = 1 ← is this actually true? “Missing, not at random” Apply this analogy to any recsys you use: netflix, spotify, amzn, grubhub
  • 14. Evaluation (Thought Experiment) ● Classic train/test split to predict the test set accurately... ● Dataset of production system logs D=(x,y,r) ... ● What is the value of predicting the test set accurately? ● Is the test set a reflection of organic user behavior? (No) Or a reflection of the logging policy!? (Yes) ● There is a difference between a prediction and a recommendation (a recommendation is an intervention) ● Bandit feedback is the product of a logging policy ● Logging policy is the previous generation recommender, i.e. the dataset (logs)
    Goal
    Supervised Learning: Predict test-set
    Actual: Predict user behavior
    Test-set != user behavior
  • 15. Counterfactual Learning ● Selection Bias ● Position Bias Randomization and stochasticity
  • 16. Selection Bias (Randomization) Bias from feedback loops. Add Exploration → Stochasticity (Randomization) ● Random exploration w/ ϵ-greedy bandit ● Causal Embeddings: jointly factorize unbiased and greedy embeddings (Causal Embeddings for Recommendation. Bonner, Vasile. RecSys ’18)
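The ϵ-greedy exploration mentioned on this slide could look roughly like the sketch below. It also returns the propensity of the chosen action, which later slides rely on for counterfactual evaluation. The scores, the value of epsilon, and the function name are assumptions for illustration.

```python
import numpy as np

def epsilon_greedy(scores, epsilon, rng):
    """Sample an item index; return (action, probability of that action)."""
    n = len(scores)
    greedy = int(np.argmax(scores))
    # Spread epsilon uniformly over all items; the rest goes to the greedy item.
    probs = np.full(n, epsilon / n)
    probs[greedy] += 1.0 - epsilon
    action = int(rng.choice(n, p=probs))
    return action, float(probs[action])

rng = np.random.default_rng(0)
scores = np.array([0.2, 0.7, 0.1])  # hypothetical model scores for 3 items

logs = [epsilon_greedy(scores, epsilon=0.1, rng=rng) for _ in range(1000)]
greedy_share = np.mean([action == 1 for action, _ in logs])
```

Because every item has a non-zero probability of being shown, the logged data covers actions the greedy policy alone would never take.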
  • 17. Position Bias (Randomization) Bias from devices. Inverse Propensity Scoring: compute inverse propensities $1/b_i$ across ranks for a random bucket. Offset loss:
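The loss formula did not survive in this transcript, so the following is only a sketch of one common way to apply inverse propensities, not the slide's own loss. It assumes that b_i is estimated as the click-through rate at rank i in the randomized bucket relative to rank 1, and that clicked examples are weighted by 1/b_i in a log loss. All counts are hypothetical.

```python
import numpy as np

# Hypothetical impressions and clicks per rank from a randomized bucket.
random_impressions = np.array([1000, 1000, 1000, 1000])
random_clicks = np.array([100, 50, 25, 20])

ctr = random_clicks / random_impressions
b = ctr / ctr[0]               # assumed propensity of rank i relative to rank 1
inverse_propensity = 1.0 / b   # 1 / b_i

def ips_weighted_log_loss(y, p, rank):
    """Mean log loss where clicked examples are weighted by 1 / b_rank."""
    weights = np.where(y == 1, inverse_propensity[rank], 1.0)
    losses = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float(np.mean(weights * losses))

y = np.array([1, 1])         # both examples were clicked
p = np.array([0.5, 0.5])     # model predictions
rank = np.array([0, 2])      # zero-based display positions
loss = ips_weighted_log_loss(y, p, rank)
```

A click at a low position counts for more because it was less likely to be seen at all.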
  • 18. Counterfactual Evaluation ● Partial Information ● Full Information ● Partial Information w/ bandit feedback Medical Analogy
  • 19. Partial Information Setting: Counterfactual Thinking. Treating Heart Attacks ● Treatments Y: [bypass, stent, drugs] ● Outcomes $\delta_i$: 5-year survival (0/1) Which treatment is best? ● Drugs 3/4 🏅 ● Stent 2/3 ● Bypass 2/4 Really? 🤔
    Patient  Bypass  Stent  Drugs
    1        0       -      -
    2        -       1      -
    3        -       -      1
    4        -       0      -
    5        -       -      1
    6        1       -      -
    7        -       -      1
    8        -       -      0
    9        0       -      -
    10       -       1      -
    11       1       -      -
  • 20. Full Information Setting: Treatment Effects. Example: ● Bypass = 5/11 = .45 ● Stent = 7/11 = .63 🏅 ● Drugs = 4/11 = .36
    Patient  Bypass  Stent  Drugs
    1        0       1      0
    2        1       1      0
    3        0       0      1
    4        0       0      0
    5        0       1      1
    6        1       0      0
    7        1       0      1
    8        0       1      0
    9        0       1      0
    10       1       1      0
    11       1       1      1
  • 21. Partial Information Setting w/ Bandit Feedback
    Outcomes and assignment propensities:
    Patient  Bypass  Stent  Drugs  P_B  P_S  P_D
    1        0       1      0      .3   .6   .1
    2        1       1      0      .4   .5   .1
    3        0       0      1      .1   .1   .8
    4        0       0      0      .6   .3   .1
    5        0       1      1      .2   .1   .7
    6        1       0      0      .4   .2   .4
    7        1       0      1      .1   .1   .8
    8        0       1      0      .1   .8   .1
    9        0       1      0      .3   .3   .4
    10       1       1      0      .3   .2   .1
    11       1       1      1      .4   .4   .2
    $\hat{R}_{IPS}(y) = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathbb{I}(y_i = y)}{p_i} \, \delta(x_i, y_i)$
    ● Bypass = 1/11 (0/.3 + 1/.4 + 0/.3 + 1/.4) = .45
    ● Stent = 1/11 (1/.5 + 0/.3 + 1/.2) = .63 🏅
    ● Drugs = 1/11 (1/.8 + 1/.7 + 1/.8 + 0/.1) = .36
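The IPS sums on this slide can be reproduced with a few lines of Python. The per-patient treatment assignment below is an inference from the terms in those sums and the propensity table, not something stated explicitly in the transcript. The names `logs` and `ips_estimate` are made up for this sketch.

```python
# One entry per patient 1..11:
# (assigned treatment, logged propensity of that treatment, observed outcome)
logs = [
    ("bypass", 0.3, 0),
    ("stent", 0.5, 1),
    ("drugs", 0.8, 1),
    ("stent", 0.3, 0),
    ("drugs", 0.7, 1),
    ("bypass", 0.4, 1),
    ("drugs", 0.8, 1),
    ("drugs", 0.1, 0),
    ("bypass", 0.3, 0),
    ("stent", 0.2, 1),
    ("bypass", 0.4, 1),
]

def ips_estimate(logs, treatment):
    """Inverse propensity scoring estimate of the survival rate under `treatment`."""
    n = len(logs)
    return sum(outcome / p for y, p, outcome in logs if y == treatment) / n

estimates = {t: ips_estimate(logs, t) for t in ("bypass", "stent", "drugs")}
```

The estimates come out to 5/11 for bypass, 7/11 for stent, and about 0.357 for drugs. These agree with the full-information averages of 5/11, 7/11, and 4/11, and stent ranks first, the opposite of the naive per-treatment averages.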
  • 22. Counterfactual Evaluation: Off-policy Evaluation, e.g. 🎉 Offline A/B Testing 🎉 1. Policy: Deterministic y = f(x) → Stochastic y ~ π(x) 2. Log propensities: D=(x,y,r,p) 3. Build IPS Estimator
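The three steps can be sketched end to end with a small simulation. Everything here is an assumption for illustration: the three-item catalog, the `true_reward` click probabilities, and both policies are made up, and the estimator is the standard IPS value estimate, the mean of π_target(y|x) · r / p.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
true_reward = np.array([0.1, 0.5, 0.9])  # hypothetical click probability per item

def logging_policy(x):
    # Step 1: a stochastic policy that prefers item 0 but explores all items.
    return softmax(np.array([1.0, 0.5, 0.0]))

def target_policy(x):
    # New policy to evaluate offline: always show item 2.
    return np.array([0.0, 0.0, 1.0])

# Step 2: log D = (x, y, r, p), including the propensity p of the shown item.
logs = []
for _ in range(20000):
    x = None  # no context features in this toy example
    pi = logging_policy(x)
    y = int(rng.choice(3, p=pi))
    r = float(rng.random() < true_reward[y])
    logs.append((x, y, r, float(pi[y])))

# Step 3: IPS estimate of the target policy's expected reward.
def ips_value(logs, policy):
    return float(np.mean([policy(x)[y] * r / p for x, y, r, p in logs]))

estimate = ips_value(logs, target_policy)
```

The estimate should land near 0.9, the true reward of item 2, even though the logging policy showed that item least often.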
  • 23. Experiments & Results Personalized Policy ● Evaluation ● Interaction Modeling ● Multi-relevance Feedback ● Congregated Search ● Market Generalization ● GPU Workflows
  • 24. Evaluation
    Offline Metrics ● NDCG ● MRR ● AUC ● Gini-Lorenz
    Hyperparameter Tuning ● Vertical scaling across GPUs (p3) w/ ● Exhaustive search over 66 combinations; without concurrency it would take ~200h.
    Online Metrics ● Conversion Rate (Conv) ● Orders/Visitor (OPV) ● Revenue ● Diversity ● Fairness
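For reference, two of the offline ranking metrics listed here can be computed as follows. This is a minimal sketch using the common exponential-gain form of DCG; the talk does not say which variant or library was used.

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain with gain 2^rel - 1 and log2 position discount."""
    rel = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum((2.0 ** rel - 1.0) / discounts))

def ndcg(relevances):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def mrr(ranked_lists):
    """Mean reciprocal rank of the first relevant item in each ranked list."""
    reciprocal_ranks = []
    for rels in ranked_lists:
        hits = [i for i, r in enumerate(rels) if r > 0]
        reciprocal_ranks.append(1.0 / (hits[0] + 1) if hits else 0.0)
    return float(np.mean(reciprocal_ranks))

# Relevance labels in the order the ranker returned the items.
perfect = ndcg([1, 0, 0])
second_place = ndcg([0, 1])
mean_rr = mrr([[0, 1], [1, 0]])
```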
  • 25. Offline Evaluation. Surrogate Metric: Can we get directional estimates of online metrics offline? Can we design a metric that tracks conversion rate?
    Metric  Corr w/ Conv
    MRR     .55 📈
    AUC     .51
    NDCG    .48
    loss    .02
  • 26. On-Policy Evaluation ● Most-popular policy vs Personalization Policy ● Personalization policy +20% & improves over time ● Diversity: ~5 cuisines/slate, 60% unique
  • 27. On-Policy Evaluation: Fairness. Inequality (Lorenz Curve + Gini Index): quantifies inequality (e.g. impressions across merchants or wealth across populations). The EE variant is more equitable than the MP variant.
    Variant   Gini
    Baseline  .60
    MP        .59
    EE        .49 🏅
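A Gini index over impressions per merchant can be computed with the standard sorted-rank formula, as in the sketch below. The impression counts are hypothetical and are not the data behind the numbers on this slide.

```python
import numpy as np

def gini(values):
    """Gini index: 0 = perfectly equal, approaching 1 = maximally unequal."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    # Equivalent to the mean absolute difference divided by twice the mean.
    return float(2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1) / n)

# Hypothetical impressions per merchant under two policies.
equal_exposure = gini([10, 10, 10, 10])
winner_takes_all = gini([0, 0, 0, 40])
```

A lower value means impressions are spread more evenly across merchants.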
  • 28. Experiment: Causal Embeddings. Hypothesis: If we use the uniform data in a principled manner, we can increase performance by overcoming selection bias. Experiment: ● Random ● Biased ● Random ∪ Biased ● CausE Results: Principled use of uniform data was beneficial. (Causal Embeddings for Recommendation. Bonner, Vasile. RecSys ’18)
    Variant              AUC
    Random (small data)  .56
    Biased               .72
    Random ∪ Biased      .73
    Causal               .74
  • 29. Experiment: Interaction Modeling. Hypothesis: MLPs are universal function approximators? Experiment: Evaluate MLP against feature crosses. Results: MLP does not capture full interactions.
    Model   NDCG (unintentful)  MRR (unintentful)  AUC (unintentful)
    Random  .511                .216               .500
    UMP     .615                .582               .653
    MLP     .627                .586               .689
    DCN     .657 (+4.7%)        .617 (+5.2%)       .695 (+0.8%)
  • 30. Experiment: Multi-relevance Feedback. Sources of Feedback: ● Impressions ● Clicks ● Orders. Metric Disagreement: Online Eval or Off-policy Eval
    Feedback         NDCG (unintentful)  MRR (unintentful)  AUC (unintentful)
    Orders & Clicks  .668                .633               .675
    Clicks           .665 (0%)           .600 (-5%)         .757 (+12%)
  • 31. Experiment: Global vs Local Models (Markets). If your recommender operates in markets of varying sizes with distinct cultural/taste patterns, it’s important that your recs are high-quality in all markets. ● Operational Pain ● Market Sparsity
    Model   NDCG (unintentful)  MRR (unintentful)  AUC (unintentful)
    Local   .617                0.557              0.635
    Global  .749 (+21%)         0.709 (+27.2%)     0.736 (+12%)
  • 32. Experiment: GPU Data Pipelines ● IO → CPU → RAM → GPU RAM → GPU ● IO Bound ○ Sequential Data Access: Libsvm → TFRecords, GPU: 4% → 90% ○ tf.data pipelines are CPU only ○ Vmap: batch → map ○ Prefetch (to GPU memory)
    Device              Step/s      Train Time (h)  Batch Size
    CPU (64 cores)      0.58        47              448
    K80 (4992 cores)    3.2         10              127
    V100* (5120 cores)  8.5 (2-3x)  3.2 (3x)        448
    * Not using Tensor Cores/FP16
  • 34. Thank You Alex Egg | @eggie5