SlideShare a Scribd company logo
Recap: Designing a more Efficient Estimator for
Off-policy Evaluation in Bandits with Large Action
Spaces
Ajinkya More, Linas Baltrunas, Nikos Vlassis, Justin Basilico
REVEAL Workshop at RecSys 2019
Copenhagen, Denmark, Sept 20, 2019
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 1 / 19
Introduction
Contextual bandits are a common modeling tool in modern
personalization systems (e.g. artwork personalization at Netflix,
playlist ordering on Spotify, ad ranking in web search)
Evaluating bandit algorithms via online A/B testing is expensive
Off-policy evaluation is a preferred alternative
Goal: A sensitive, efficient estimator that aligns with online metrics
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 2 / 19
Challenge
Several metrics proposed in literature have limitations in personalization
settings:
Replay, IPS, SNIPS - High variance
Few matches with many arms
Requires very large amounts of data
Direct Method, Doubly Robust - Require a reward model
This model is used to evaluate other models
Introduces modeling bias
E.g. what features to use in the reward model?
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 3 / 19
Overview
This paper:
Introduce a new off-policy evaluation metric: Recap
Trades off bias for significantly lowering variance in off-policy evaluation
Model free - No need to define reward model
Study properties via simulated data
Investigate alignment with online metric
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 4 / 19
Notation
Let A = {1, 2, ..., K} be the set of actions.
In trial t ∈ {1, 2, ..., n}, the environment chooses a context xt and in
response the agent chooses an arm at ∈ A.
The environment then reveals a reward ra,t ∈ [0, 1].
Denote the probability of the logging policy taking action a for
context x as π(x, a) and the logged action at trial t as aπ.
We want to estimate the expected reward E[ra] of a deterministic
target policy τ given data from a potentially different logging policy.
We assume the target policy is mediated by a scorer or a ranker, that
assigns a real valued score s(x, a) to each arm and selects the arm
argmaxa∈As(x, a).
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 5 / 19
IPS, SNIPS
Inverse Propensity Scoring (IPS):
ˆV τ
IPS =
Σn
t=0raπ
I(aτ =aπ)
π(xt ,aπ)
n
(1)
Self-Normalized Inverse Propensity Scoring (SNIPS):
ˆV τ
SNIPS =
Σn
t=0raπ
I(aτ =aπ)
π(xt ,aπ)
Σn
t=0
I(aτ =aπ)
π(xt ,aπ)
(2)
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 6 / 19
Recap
ˆV τ
Recap =
Σn
t=0raπ
RR
π(xt ,aπ)
Σn
t=0
RR
π(xt ,aπ)
(3)
where
RR =
1
Σa∈AI(s(x, a) ≥ s(x, aπ))
is simply the reciprocal rank of the logged action according to the scores
assigned by the target policy.
It assigns a“partial credit” when the logged action and target policy action
are different (which may be seen as “stochastifying” a deterministic
policy).
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 7 / 19
Recap(m)
More generally, for m ∈ N (or even m ∈ R+), we define,
ˆV τ
Recap(m) =
Σn
t=0raπ
RRm
π(xt ,aπ)
Σn
t=0
RRm
π(xt ,aπ)
(4)
ˆV τ
Recap(1) = ˆV τ
Recap
As m → ∞, ˆV τ
Recap(m) → ˆV τ
SNIPS
Thus, m allows us to smoothly trade off bias for variance.
Though we find m = 1 to work well in practice.
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 8 / 19
Simulations
Set up
1 The rewards for arm a are drawn from a Bernoulli distribution with
parameter θa = 1/a
2 The arm scores s(x, a) are drawn from a normal distribution N(µ, σ)
where σ = 0.1 and µ is drawn from a uniform distribution with
support [0, 0.2].
3 In each trial, the logging policy samples the action from a
multinomial distribution over the arms.
4 The target policy is argmaxa∈As(x, a).
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 9 / 19
Behavior in small action spaces
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 10 / 19
Behavior in large action spaces
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 11 / 19
Variance: Small number of arms
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 12 / 19
Variance: Medium number of arms
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 13 / 19
Variance: Large number of arms
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 14 / 19
Risk
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 15 / 19
Online–Offline correlation
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 16 / 19
Summary
Recap: An off-policy estimator for bandits based on MRR
Trades off a small increase in bias (compared with IPS) for
significantly reduced variance compared to both IPS and SNIPS
Recap uses all the available data making it more efficient with small
logged data or large numbers of arms
Doesn’t require a reward model as compared to Direct Method and
Doubly Robust
Signs of alignment with online metrics
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 17 / 19
Ongoing Work
Analyze some theoretical properties of the estimator
Bias/Variance/Risk of the estimator
Closed form expressions
Asymptotic properties
Is the estimator consistent?
Deviation bounds
Training models to optimize Recap
Connection to Learning to Rank approaches
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 18 / 19
Thank you!
Questions?
Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 19 / 19

More Related Content

What's hot

Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
Yves Raimond
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
Jaya Kawale
 
Data council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at NetflixData council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at Netflix
Grace T. Huang
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time Travel
Faisal Siddiqi
 
Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019
Faisal Siddiqi
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
Linas Baltrunas
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
Jaya Kawale
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
Justin Basilico
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Sudeep Das, Ph.D.
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
Xavier Amatriain
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
Harald Steck
 
Netflix Recommendations - Beyond the 5 Stars
Netflix Recommendations - Beyond the 5 StarsNetflix Recommendations - Beyond the 5 Stars
Netflix Recommendations - Beyond the 5 Stars
Xavier Amatriain
 
Shallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemShallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender System
Anoop Deoras
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
Justin Basilico
 
Recommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at NetflixRecommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at Netflix
Jiangwei Pan
 
Fact Store at Scale for Netflix Recommendations with Nitin Sharma and Kedar S...
Fact Store at Scale for Netflix Recommendations with Nitin Sharma and Kedar S...Fact Store at Scale for Netflix Recommendations with Nitin Sharma and Kedar S...
Fact Store at Scale for Netflix Recommendations with Nitin Sharma and Kedar S...
Databricks
 
Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018 Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018
Fernando Amat
 
Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick View
YONG ZHENG
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
Justin Basilico
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
Justin Basilico
 

What's hot (20)

Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
 
Data council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at NetflixData council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at Netflix
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time Travel
 
Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
 
Netflix Recommendations - Beyond the 5 Stars
Netflix Recommendations - Beyond the 5 StarsNetflix Recommendations - Beyond the 5 Stars
Netflix Recommendations - Beyond the 5 Stars
 
Shallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemShallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender System
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
 
Recommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at NetflixRecommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at Netflix
 
Fact Store at Scale for Netflix Recommendations with Nitin Sharma and Kedar S...
Fact Store at Scale for Netflix Recommendations with Nitin Sharma and Kedar S...Fact Store at Scale for Netflix Recommendations with Nitin Sharma and Kedar S...
Fact Store at Scale for Netflix Recommendations with Nitin Sharma and Kedar S...
 
Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018 Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018
 
Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick View
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 

More from Justin Basilico

Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Justin Basilico
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
Justin Basilico
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
Justin Basilico
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
Justin Basilico
 

More from Justin Basilico (6)

Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
 

Recently uploaded

LeadMagnet IQ Review: Unlock the Secret to Effortless Traffic and Leads.pdf
LeadMagnet IQ Review:  Unlock the Secret to Effortless Traffic and Leads.pdfLeadMagnet IQ Review:  Unlock the Secret to Effortless Traffic and Leads.pdf
LeadMagnet IQ Review: Unlock the Secret to Effortless Traffic and Leads.pdf
SelfMade bd
 
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
bellared2
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
Google Developer Group - Harare
 
The History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal EmbeddingsThe History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal Embeddings
Zilliz
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
Matthias Neugebauer
 
Tailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer InsightsTailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer Insights
SynapseIndia
 
Step-By-Step Process to Develop a Mobile App From Scratch
Step-By-Step Process to Develop a Mobile App From ScratchStep-By-Step Process to Develop a Mobile App From Scratch
Step-By-Step Process to Develop a Mobile App From Scratch
softsuave
 
The Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - CoatueThe Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - Coatue
Razin Mustafiz
 
How UiPath Discovery Suite supports identification of Agentic Process Automat...
How UiPath Discovery Suite supports identification of Agentic Process Automat...How UiPath Discovery Suite supports identification of Agentic Process Automat...
How UiPath Discovery Suite supports identification of Agentic Process Automat...
DianaGray10
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
shanihomely
 
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
alexjohnson7307
 
UX Webinar Series: Drive Revenue and Decrease Costs with Passkeys for Consume...
UX Webinar Series: Drive Revenue and Decrease Costs with Passkeys for Consume...UX Webinar Series: Drive Revenue and Decrease Costs with Passkeys for Consume...
UX Webinar Series: Drive Revenue and Decrease Costs with Passkeys for Consume...
FIDO Alliance
 
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision MakingConnector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
DianaGray10
 
Zaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdfZaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdf
AmandaCheung15
 
Camunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptxCamunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptx
ZachWylie3
 
Redefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI CapabilitiesRedefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI Capabilities
Priyanka Aash
 
Sonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdfSonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdf
SubhamMandal40
 
Retrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with RagasRetrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with Ragas
Zilliz
 
Semantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software DevelopmentSemantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software Development
Baishakhi Ray
 
Integrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecaseIntegrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecase
shyamraj55
 

Recently uploaded (20)

LeadMagnet IQ Review: Unlock the Secret to Effortless Traffic and Leads.pdf
LeadMagnet IQ Review:  Unlock the Secret to Effortless Traffic and Leads.pdfLeadMagnet IQ Review:  Unlock the Secret to Effortless Traffic and Leads.pdf
LeadMagnet IQ Review: Unlock the Secret to Effortless Traffic and Leads.pdf
 
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
 
The History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal EmbeddingsThe History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal Embeddings
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
 
Tailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer InsightsTailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer Insights
 
Step-By-Step Process to Develop a Mobile App From Scratch
Step-By-Step Process to Develop a Mobile App From ScratchStep-By-Step Process to Develop a Mobile App From Scratch
Step-By-Step Process to Develop a Mobile App From Scratch
 
The Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - CoatueThe Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - Coatue
 
How UiPath Discovery Suite supports identification of Agentic Process Automat...
How UiPath Discovery Suite supports identification of Agentic Process Automat...How UiPath Discovery Suite supports identification of Agentic Process Automat...
How UiPath Discovery Suite supports identification of Agentic Process Automat...
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
 
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
 
UX Webinar Series: Drive Revenue and Decrease Costs with Passkeys for Consume...
UX Webinar Series: Drive Revenue and Decrease Costs with Passkeys for Consume...UX Webinar Series: Drive Revenue and Decrease Costs with Passkeys for Consume...
UX Webinar Series: Drive Revenue and Decrease Costs with Passkeys for Consume...
 
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision MakingConnector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
 
Zaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdfZaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdf
 
Camunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptxCamunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptx
 
Redefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI CapabilitiesRedefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI Capabilities
 
Sonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdfSonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdf
 
Retrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with RagasRetrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with Ragas
 
Semantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software DevelopmentSemantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software Development
 
Integrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecaseIntegrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecase
 

Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Bandits with Large Action Spaces

  • 1. Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Bandits with Large Action Spaces Ajinkya More, Linas Baltrunas, Nikos Vlassis, Justin Basilico REVEAL Workshop at RecSys 2019 Copenhagen, Denmark, Sept 20, 2019 Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 1 / 19
  • 2. Introduction Contextual bandits are a common modeling tool in modern personalization systems (e.g. artwork personalization at Netflix, playlist ordering on Spotify, ad ranking in web search) Evaluating bandit algorithms via online A/B testing is expensive Off-policy evaluation is a preferred alternative Goal: A sensitive, efficient estimator that aligns with online metrics Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 2 / 19
  • 3. Challenge Several metrics proposed in literature have limitations in personalization settings: Replay, IPS, SNIPS - High variance Few matches with many arms Requires very large amounts of data Direct Method, Doubly Robust - Require a reward model This model is used to evaluate other models Introduces modeling bias E.g. what features to use in the reward model? Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 3 / 19
  • 4. Overview This paper: Introduce a new off-policy evaluation metric: Recap Trades off bias for significantly lowering variance in off-policy evaluation Model free - No need to define reward model Study properties via simulated data Investigate alignment with online metric Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 4 / 19
  • 5. Notation Let A = {1, 2, ..., K} be the set of actions. In trial t ∈ {1, 2, ..., n}, the environment chooses a context xt and in response the agent chooses an arm at ∈ A. The environment then reveals a reward ra,t ∈ [0, 1]. Denote the probability of the logging policy taking action a for context x as π(x, a) and the logged action at trial t as aπ. We want to estimate the expected reward E[ra] of a deterministic target policy τ given data from a potentially different logging policy. We assume the target policy is mediated by a scorer or a ranker, that assigns a real valued score s(x, a) to each arm and selects the arm argmaxa∈As(x, a). Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 5 / 19
  • 6. IPS, SNIPS Inverse Propensity Scoring (IPS): ˆV τ IPS = Σn t=0raπ I(aτ =aπ) π(xt ,aπ) n (1) Self-Normalized Inverse Propensity Scoring (SNIPS): ˆV τ SNIPS = Σn t=0raπ I(aτ =aπ) π(xt ,aπ) Σn t=0 I(aτ =aπ) π(xt ,aπ) (2) Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 6 / 19
  • 7. Recap ˆV τ Recap = Σn t=0raπ RR π(xt ,aπ) Σn t=0 RR π(xt ,aπ) (3) where RR = 1 Σa∈AI(s(x, a) ≥ s(x, aπ)) is simply the reciprocal rank of the logged action according to the scores assigned by the target policy. It assigns a“partial credit” when the logged action and target policy action are different (which may be seen as “stochastifying” a deterministic policy). Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 7 / 19
  • 8. Recap(m) More generally, for m ∈ N (or even m ∈ R+), we define, ˆV τ Recap(m) = Σn t=0raπ RRm π(xt ,aπ) Σn t=0 RRm π(xt ,aπ) (4) ˆV τ Recap(1) = ˆV τ Recap As m → ∞, ˆV τ Recap(m) → ˆV τ SNIPS Thus, m allows us to smoothly trade off bias for variance. Though we find m = 1 to work well in practice. Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 8 / 19
  • 9. Simulations Set up 1 The rewards for arm a are drawn from a Bernoulli distribution with parameter θa = 1/a 2 The arm scores s(x, a) are drawn from a normal distribution N(µ, σ) where σ = 0.1 and µ is drawn from a uniform distribution with support [0, 0.2]. 3 In each trial, the logging policy samples the action from a multinomial distribution over the arms. 4 The target policy is argmaxa∈As(x, a). Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 9 / 19
  • 10. Behavior in small action spaces Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 10 / 19
  • 11. Behavior in large action spaces Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 11 / 19
  • 12. Variance: Small number of arms Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 12 / 19
  • 13. Variance: Medium number of arms Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 13 / 19
  • 14. Variance: Large number of arms Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 14 / 19
  • 15. Risk Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 15 / 19
  • 16. Online–Offline correlation Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 16 / 19
  • 17. Summary Recap: An off-policy estimator for bandits based on MRR Trades off a small increase in bias (compared with IPS) for significantly reduced variance compared to both IPS and SNIPS Recap uses all the available data making it more efficient with small logged data or large numbers of arms Doesn’t require a reward model as compared to Direct Method and Doubly Robust Signs of alignment with online metrics Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 17 / 19
  • 18. Ongoing Work Analyze some theoretical properties of the estimator Bias/Variance/Risk of the estimator Closed form expressions Asymptotic properties Is the estimator consistent? Deviation bounds Training models to optimize Recap Connection to Learning to Rank approaches Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 18 / 19
  • 19. Thank you! Questions? Ajinkya, Linas, Nikos, Justin Recap Metric REVEAL, RecSys 2019 19 / 19