SlideShare a Scribd company logo
Recap: Designing a more Efficient Estimator for
Off-policy Evaluation in Bandits with Large Action
Spaces
Ajinkya More, Linas Baltrunas, Nikos Vlassis, Justin Basilico
REVEAL Workshop at RecSys 2019
Copenhagen, Denmark, Sept 20, 2019
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 1 / 19
Introduction
Contextual bandits are a common modeling tool in modern
personalization systems (e.g. artwork personalization at Netflix,
playlist ordering on Spotify, ad ranking in web search)
Evaluating bandit algorithms via online A/B testing is expensive
Off-policy evaluation is a preferred alternative
Goal: A sensitive, efficient estimator that aligns with online metrics
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 2 / 19
Challenge
Several metrics proposed in literature have limitations in personalization
settings:
Replay, IPS, SNIPS - High variance
Few matches with many arms
Requires very large amounts of data
Direct Method, Doubly Robust - Require a reward model
This model is used to evaluate other models
Introduces modeling bias
E.g. what features to use in the reward model?
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 3 / 19
Overview
This paper:
Introduce a new off-policy evaluation metric: Recap
Trades off bias for significantly lowering variance in off-policy evaluation
Model free - No need to define reward model
Study properties via simulated data
Investigate alignment with online metric
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 4 / 19
Notation
Let A = {1, 2, ..., K} be the set of actions.
In trial t ∈ {1, 2, ..., n}, the environment chooses a context xt and in
response the agent chooses an arm at ∈ A.
The environment then reveals a reward ra,t ∈ [0, 1].
Denote the probability of the logging policy taking action a for
context x as π(x, a) and the logged action at trial t as aπ.
We want to estimate the expected reward E[ra] of a deterministic
target policy τ given data from a potentially different logging policy.
We assume the target policy is mediated by a scorer or a ranker, that
assigns a real valued score s(x, a) to each arm and selects the arm
argmaxa∈As(x, a).
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 5 / 19
IPS, SNIPS
Inverse Propensity Scoring (IPS):
ˆV τ
IPS =
Σn
t=0raπ
I(aτ =aπ)
π(xt ,aπ)
n
(1)
Self-Normalized Inverse Propensity Scoring (SNIPS):
ˆV τ
SNIPS =
Σn
t=0raπ
I(aτ =aπ)
π(xt ,aπ)
Σn
t=0
I(aτ =aπ)
π(xt ,aπ)
(2)
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 6 / 19
Recap
ˆV τ
Recap =
Σn
t=0raπ
RR
π(xt ,aπ)
Σn
t=0
RR
π(xt ,aπ)
(3)
where
RR =
1
Σa∈AI(s(x, a) ≥ s(x, aπ))
is simply the reciprocal rank of the logged action according to the scores
assigned by the target policy.
It assigns a“partial credit” when the logged action and target policy action
are different (which may be seen as “stochastifying” a deterministic
policy).
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 7 / 19
Recap(m)
More generally, for m ∈ N (or even m ∈ R+), we define,
ˆV τ
Recap(m) =
Σn
t=0raπ
RRm
π(xt ,aπ)
Σn
t=0
RRm
π(xt ,aπ)
(4)
ˆV τ
Recap(1) = ˆV τ
Recap
As m → ∞, ˆV τ
Recap(m) → ˆV τ
SNIPS
Thus, m allows us to smoothly trade off bias for variance.
Though we find m = 1 to work well in practice.
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 8 / 19
Simulations
Set up
1 The rewards for arm a are drawn from a Bernoulli distribution with
parameter θa = 1/a
2 The arm scores s(x, a) are drawn from a normal distribution N(µ, σ)
where σ = 0.1 and µ is drawn from a uniform distribution with
support [0, 0.2].
3 In each trial, the logging policy samples the action from a
multinomial distribution over the arms.
4 The target policy is argmaxa∈As(x, a).
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 9 / 19
Behavior in small action spaces
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 10 / 19
Behavior in large action spaces
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 11 / 19
Variance: Small number of arms
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 12 / 19
Variance: Medium number of arms
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 13 / 19
Variance: Large number of arms
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 14 / 19
Risk
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 15 / 19
Online–Offline correlation
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 16 / 19
Summary
Recap: An off-policy estimator for bandits based on MRR
Trades off a small increase in bias (compared with IPS) for
significantly reduced variance compared to both IPS and SNIPS
Recap uses all the available data making it more efficient with small
logged data or large numbers of arms
Doesn’t require a reward model as compared to Direct Method and
Doubly Robust
Signs of alignment with online metrics
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 17 / 19
Ongoing Work
Analyze some theoretical properties of the estimator
Bias/Variance/Risk of the estimator
Closed form expressions
Asymptotic properties
Is the estimator consistent?
Deviation bounds
Training models to optimize Recap
Connection to Learning to Rank approaches
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 18 / 19
Thank you!
Questions?
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 19 / 19

More Related Content

What's hot

Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
Justin Basilico
 
Data council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at NetflixData council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at Netflix
Grace T. Huang
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
Parmeshwar Khurd
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Anoop Deoras
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
Justin Basilico
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Sudeep Das, Ph.D.
 
Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018 Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018
Fernando Amat
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
Justin Basilico
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
Justin Basilico
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
Justin Basilico
 
Shallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemShallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender System
Anoop Deoras
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time Travel
Faisal Siddiqi
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
Justin Basilico
 
RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020
RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020
RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020
Zachary Schendel
 
Interactive Recommender Systems
Interactive Recommender SystemsInteractive Recommender Systems
Interactive Recommender Systems
Roelof van Zwol
 
Reward Innovation for long-term member satisfaction
Reward Innovation for long-term member satisfactionReward Innovation for long-term member satisfaction
Reward Innovation for long-term member satisfaction
Jiangwei Pan
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
Förderverein Technische Fakultät
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
Justin Basilico
 
Recommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at NetflixRecommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at Netflix
Jiangwei Pan
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
MLconf
 

What's hot (20)

Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Data council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at NetflixData council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at Netflix
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 
Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018 Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
 
Shallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemShallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender System
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time Travel
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
 
RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020
RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020
RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020
 
Interactive Recommender Systems
Interactive Recommender SystemsInteractive Recommender Systems
Interactive Recommender Systems
 
Reward Innovation for long-term member satisfaction
Reward Innovation for long-term member satisfactionReward Innovation for long-term member satisfaction
Reward Innovation for long-term member satisfaction
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
Recommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at NetflixRecommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at Netflix
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 

More from Justin Basilico

Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Justin Basilico
 
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Justin Basilico
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
Justin Basilico
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
Justin Basilico
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
Justin Basilico
 

More from Justin Basilico (7)

Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
 

Recently uploaded

National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 

Recently uploaded (20)

National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 

Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Bandits with Large Action Spaces

  • 1. Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Bandits with Large Action Spaces Ajinkya More, Linas Baltrunas, Nikos Vlassis, Justin Basilico REVEAL Workshop at RecSys 2019 Copenhagen, Denmark, Sept 20, 2019 Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 1 / 19
  • 2. Introduction Contextual bandits are a common modeling tool in modern personalization systems (e.g. artwork personalization at Netflix, playlist ordering on Spotify, ad ranking in web search) Evaluating bandit algorithms via online A/B testing is expensive Off-policy evaluation is a preferred alternative Goal: A sensitive, efficient estimator that aligns with online metrics Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 2 / 19
  • 3. Challenge Several metrics proposed in literature have limitations in personalization settings: Replay, IPS, SNIPS - High variance Few matches with many arms Requires very large amounts of data Direct Method, Doubly Robust - Require a reward model This model is used to evaluate other models Introduces modeling bias E.g. what features to use in the reward model? Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 3 / 19
  • 4. Overview This paper: Introduce a new off-policy evaluation metric: Recap Trades off bias for significantly lowering variance in off-policy evaluation Model free - No need to define reward model Study properties via simulated data Investigate alignment with online metric Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 4 / 19
  • 5. Notation Let A = {1, 2, ..., K} be the set of actions. In trial t ∈ {1, 2, ..., n}, the environment chooses a context xt and in response the agent chooses an arm at ∈ A. The environment then reveals a reward ra,t ∈ [0, 1]. Denote the probability of the logging policy taking action a for context x as π(x, a) and the logged action at trial t as aπ. We want to estimate the expected reward E[ra] of a deterministic target policy τ given data from a potentially different logging policy. We assume the target policy is mediated by a scorer or a ranker, that assigns a real valued score s(x, a) to each arm and selects the arm argmaxa∈As(x, a). Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 5 / 19
  • 6. IPS, SNIPS Inverse Propensity Scoring (IPS): ˆV τ IPS = Σn t=0raπ I(aτ =aπ) π(xt ,aπ) n (1) Self-Normalized Inverse Propensity Scoring (SNIPS): ˆV τ SNIPS = Σn t=0raπ I(aτ =aπ) π(xt ,aπ) Σn t=0 I(aτ =aπ) π(xt ,aπ) (2) Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 6 / 19
  • 7. Recap ˆV τ Recap = Σn t=0raπ RR π(xt ,aπ) Σn t=0 RR π(xt ,aπ) (3) where RR = 1 Σa∈AI(s(x, a) ≥ s(x, aπ)) is simply the reciprocal rank of the logged action according to the scores assigned by the target policy. It assigns a“partial credit” when the logged action and target policy action are different (which may be seen as “stochastifying” a deterministic policy). Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 7 / 19
  • 8. Recap(m) More generally, for m ∈ N (or even m ∈ R+), we define, ˆV τ Recap(m) = Σn t=0raπ RRm π(xt ,aπ) Σn t=0 RRm π(xt ,aπ) (4) ˆV τ Recap(1) = ˆV τ Recap As m → ∞, ˆV τ Recap(m) → ˆV τ SNIPS Thus, m allows us to smoothly trade off bias for variance. Though we find m = 1 to work well in practice. Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 8 / 19
  • 9. Simulations Set up 1 The rewards for arm a are drawn from a Bernoulli distribution with parameter θa = 1/a 2 The arm scores s(x, a) are drawn from a normal distribution N(µ, σ) where σ = 0.1 and µ is drawn from a uniform distribution with support [0, 0.2]. 3 In each trial, the logging policy samples the action from a multinomial distribution over the arms. 4 The target policy is argmaxa∈As(x, a). Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 9 / 19
  • 10. Behavior in small action spaces Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 10 / 19
  • 11. Behavior in large action spaces Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 11 / 19
  • 12. Variance: Small number of arms Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 12 / 19
  • 13. Variance: Medium number of arms Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 13 / 19
  • 14. Variance: Large number of arms Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 14 / 19
  • 15. Risk Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 15 / 19
  • 16. Online–Offline correlation Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 16 / 19
  • 17. Summary Recap: An off-policy estimator for bandits based on MRR Trades off a small increase in bias (compared with IPS) for significantly reduced variance compared to both IPS and SNIPS Recap uses all the available data making it more efficient with small logged data or large numbers of arms Doesn’t require a reward model as compared to Direct Method and Doubly Robust Signs of alignment with online metrics Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 17 / 19
  • 18. Ongoing Work Analyze some theoretical properties of the estimator Bias/Variance/Risk of the estimator Closed form expressions Asymptotic properties Is the estimator consistent? Deviation bounds Training models to optimize Recap Connection to Learning to Rank approaches Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 18 / 19
  • 19. Thank you! Questions? Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 19 / 19