Reward innovation for
long-term member
satisfaction
Gary Tang, Jiangwei Pan, Henry
Wang and Justin Basilico
Goal
Create a personalized homepage to help
members find content to watch and enjoy
that
maximizes long-term member satisfaction.
Long-term satisfaction for Netflix
Member enjoys watching Netflix,
so continues the subscription
and tells their friends about it
Batch learning from bandit feedback
Production policy
● show recommendations to the member
Member gives feedback on recommendations
● immediate: skip/play a show
● long-term: cancel/renew subscription
Goal
● train a policy to maximize the long-term reward
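To make the batch setting concrete, here is a minimal sketch of estimating a new policy's average reward from logs collected by the production policy, using inverse propensity scoring (IPS). The slides do not say which estimator is used; IPS, the LoggedImpression schema, and the toy data are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LoggedImpression:
    """One logged recommendation from the production policy (hypothetical schema)."""
    context: str          # member/page context, simplified to a string
    action: str           # the item that was shown
    logging_prob: float   # probability the production policy showed this item
    reward: float         # observed (proxy) reward for this impression


def ips_value(
    logs: Sequence[LoggedImpression],
    target_prob: Callable[[str, str], float],
) -> float:
    """IPS estimate of the average reward the target policy would obtain."""
    total = 0.0
    for row in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_prob(row.context, row.action) / row.logging_prob
        total += weight * row.reward
    return total / len(logs)


# Toy logs: the production policy picks one of two shows uniformly at random.
logs = [
    LoggedImpression("member_a", "show_1", 0.5, 1.0),
    LoggedImpression("member_a", "show_2", 0.5, 0.0),
    LoggedImpression("member_b", "show_1", 0.5, 0.0),
    LoggedImpression("member_b", "show_2", 0.5, 1.0),
]


def always_show_1(context: str, action: str) -> float:
    """A deterministic candidate policy that always recommends show_1."""
    return 1.0 if action == "show_1" else 0.0


estimate = ips_value(logs, always_show_1)
print(estimate)
```

Training a policy in this setting amounts to searching for the policy whose estimated reward is highest, which is why the choice of reward matters so much.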
Challenges with long-term retention
Noisy signal
● influenced by external factors
Insensitive signal
● only sensitive for “borderline” members
Delayed signal
● need to wait a long time, and hard to attribute
Proxy reward
Train policy to optimize proxy reward
● highly correlated with long-term
satisfaction
● sensitive to individual
recommendations
Immediate feedback as proxy
● e.g. start to play a show
Delayed feedback could be more aligned
● e.g. completing a show
Delayed proxy reward
Need to wait a long time to observe the
delayed reward
● cannot be used in training
immediately
● can hurt cold-starting of the model
Don’t want to wait? Predict the delayed reward
● use all user actions up to policy training
time
Train policy to maximize predicted reward
Note: the predicted reward cannot be used
online, as it uses post-recommendation user
actions
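A minimal sketch of the idea, assuming "completing a show" is the delayed reward and "episodes watched so far" is the only post-recommendation action available at training time. The Impression schema, the bucketing, and the frequency-based predictor are illustrative assumptions, not the model described in the talk.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Impression:
    """A recommended show plus what the member did afterwards (hypothetical schema)."""
    episodes_watched: int       # post-recommendation actions seen so far
    completed: Optional[bool]   # None while the delayed outcome is still unknown


def bucket_of(imp: Impression) -> int:
    """Coarse feature: episodes watched, capped at 3."""
    return min(imp.episodes_watched, 3)


def fit_completion_rates(matured: List[Impression]) -> Dict[int, float]:
    """Empirical completion rate per bucket, fit on impressions with known outcomes."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [completions, total]
    for imp in matured:
        counts[bucket_of(imp)][0] += int(bool(imp.completed))
        counts[bucket_of(imp)][1] += 1
    return {b: done / total for b, (done, total) in counts.items()}


def training_reward(imp: Impression, rates: Dict[int, float], prior: float = 0.0) -> float:
    """Use the observed delayed reward when available, otherwise the prediction."""
    if imp.completed is not None:
        return 1.0 if imp.completed else 0.0
    return rates.get(bucket_of(imp), prior)


matured = [
    Impression(0, False), Impression(0, False),
    Impression(2, True), Impression(2, False),
    Impression(5, True),
]
rates = fit_completion_rates(matured)

recent = Impression(episodes_watched=2, completed=None)  # outcome not yet observed
print(training_reward(recent, rates))
```

The prediction is only a training label here. It reads actions that happen after the recommendation, so it is not available when the policy serves a request.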
The ideas are not new
Related work:
● Long-term optimization in recommenders
● Reward shaping in reinforcement learning
● Online reward optimization
We focus on reward innovation as an important
product development workstream at Netflix
Integrating reward in bandit
Reward component
provides the objective
of bandit policy
training data
Reward innovation
● Ideation: what aspect of long-term satisfaction hasn’t been
captured as a reward?
○ Requires balancing perspectives of ML, business, and
psychology. Not easy!
● Development: how do we compute this new reward for every
recommended item?
○ Collecting immediate feedback, predicting delayed feedback
● Evaluation: is this a good reward for the bandit
policy to maximize?
Reward infrastructure
Common development patterns
● Computing immediate proxy
rewards
● Predicting delayed proxy rewards
● Combining multiple rewards
● Sharing rewards across multiple
recommender components
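One way these patterns could fit together is a small registry of named reward functions that are combined into a single training reward. This is a sketch under assumptions: the event fields, the two reward functions, the weighted sum, and the weights are all hypothetical, and the slides do not describe how rewards are actually combined.

```python
from typing import Callable, Dict, Mapping

Event = Mapping[str, float]
RewardFn = Callable[[Event], float]


def play_reward(event: Event) -> float:
    """Immediate proxy: 1 if the member started playing the show."""
    return 1.0 if event.get("played", 0.0) > 0 else 0.0


def completion_reward(event: Event) -> float:
    """Delayed proxy: predicted probability of completing the show."""
    return event.get("predicted_completion", 0.0)


def combine(rewards: Dict[str, RewardFn], weights: Dict[str, float]) -> RewardFn:
    """Build one reward function as a weighted sum of named components."""
    def combined(event: Event) -> float:
        return sum(weights[name] * fn(event) for name, fn in rewards.items())
    return combined


# The same registry could be shared by several recommender components.
REWARDS: Dict[str, RewardFn] = {
    "play": play_reward,
    "completion": completion_reward,
}
homepage_reward = combine(REWARDS, {"play": 0.25, "completion": 0.75})

event = {"played": 1.0, "predicted_completion": 0.5}
print(homepage_reward(event))
```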
Reward evaluation
● Online testing is expensive (time, resources)
● Use offline evaluation to determine promising
reward candidates for online testing
● Challenge: hard to compare policies trained with
different rewards
Offline reward evaluation
● compare policies along multiple
reward axes using OPE
● choose a small number of candidate
policies on the Pareto front
● compare candidate policies online
using long-term user satisfaction
metrics
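The selection step can be sketched as follows, assuming each policy already has an OPE estimate on every reward axis and that higher is better on all of them. The policy names and scores are made-up numbers for illustration only.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is at least as good on every axis and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Names of policies that no other policy dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(
            dominates(other, s)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Hypothetical OPE estimates per policy: [play reward, completion reward].
ope_scores = {
    "policy_play": [0.30, 0.10],
    "policy_completion": [0.22, 0.18],
    "policy_blend": [0.27, 0.16],
    "policy_weak": [0.20, 0.09],
}

candidates = pareto_front(ope_scores)
print(candidates)
```

Policies on the front trade one reward axis against another, so offline evaluation alone cannot rank them; that is the part left to the online test.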
Practical learnings
● Reward normalization: dynamic range of reward
can affect SGD training dynamics
● Reward features: pairing a reward with a correlated
feature tends to improve the model’s ability to
optimize that reward
● Reward alignment: make different parts of the
overall recommender system “point in the same
direction”
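As an illustration of the first point, a reward with a wide dynamic range (for example, minutes watched) can be standardized before training so that gradient magnitudes do not depend on the reward's raw scale. The slides do not specify a normalization scheme; z-scoring and the sample values are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def standardize(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Shift and scale rewards to roughly zero mean and unit variance."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps guards against division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


watch_minutes = [0.0, 30.0, 60.0, 90.0]   # hypothetical raw rewards
normalized = standardize(watch_minutes)
print(normalized)
```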
Summary
● Proxy rewards: we can train bandit policies using
proxy rewards to optimize long-term member
satisfaction
● Art and science: coming up with good reward
hypotheses requires both
● Supporting infrastructure: we develop
infrastructure to help iterate on new hypotheses
quickly
Open challenges
● Proxy rewards: how can we identify proxy rewards that are
aligned with long-term satisfaction in a more principled
way?
● Reinforcement learning: can we use reinforcement
learning to optimize long-term reward directly for
recommendations?
Thank you!
Questions?
Jiangwei Pan
jpan@netflix.com

Reward Innovation for long-term member satisfaction

  • 1. Reward innovation for long-term member satisfaction Gary Tang, Jiangwei Pan, Henry Wang and Justin Basilico
  • 2. Goal Create a personalized homepage to help members find content to watch and enjoy that maximizes long-term member satisfaction.
  • 3. Member enjoys watching Netflix So continues the subscription and tells their friends about it Long-term satisfaction for Netflix
  • 4. Batch learning from bandit feedback Production policy ● show recommendations to the member Member gives feedback on recommendations ● immediate: skip/play a show ● long-term: cancel/renew subscription Goal ● train a policy to maximize the long-term reward
  • 6. Challenges with long-term retention Noisy signal Influenced by external factors Nonsensitive signal Only sensitive for “borderline” members Delayed signal Need to wait a long time & hard to attribute
  • 7. Proxy reward Train policy to optimize proxy reward ● highly correlated with long-term satisfaction ● sensitive to individual recommendations Immediate feedback as proxy ● e.g. start to play a show Delayed feedback could be more aligned ● e.g. completing a show
  • 8. Delayed proxy reward Need to wait a long time to observe the delayed reward ● cannot be used in training immediately ● can hurt cold-starting of the model Don’t want to wait? Predict the delayed reward ● use all user actions up to policy training time Train policy to maximize predicted reward Note: the predicted reward cannot be used online, as it uses post-recommendation user actions
  • 9. The ideas are not new Related work: ● Long-term optimization in recommenders ● Reward shaping in reinforcement learning ● Online reward optimization We focus on reward innovation as an important product development workstream at Netflix
  • 10. Integrating reward in bandit The reward component provides the objective in the bandit policy's training data
  • 11. Reward innovation ● Ideation: what aspect of long-term satisfaction hasn’t been captured as a reward? ○ Requires balancing perspectives of ML, business, and psychology. Not easy! ● Development: how do we compute this new reward for every recommended item? ○ Collecting immediate feedback, predicting delayed feedback ● Evaluation: is this a good reward for the bandit policy to maximize?
  • 12. Reward infrastructure Common development patterns ● Computing immediate proxy rewards ● Predicting delayed proxy rewards ● Combining multiple rewards ● Sharing rewards across multiple recommender components
  • 13. Reward evaluation ● Online testing is expensive (time, resources) ● Use offline evaluation to determine promising reward candidates for online testing ● Challenge: hard to compare policies trained with different rewards
  • 14. Offline reward evaluation ● compare policies along multiple reward axes using OPE ● choose a small number of candidate policies on the Pareto front ● compare candidate policies online using long-term user satisfaction metrics
  • 15. Practical learnings ● Reward normalization: dynamic range of reward can affect SGD training dynamics ● Reward features: pairing a reward with a correlated feature tends to improve the model’s ability to optimize that reward ● Reward alignment: make different parts of the overall recommender system “point in the same direction”
  • 16. Summary ● Proxy rewards: we can train bandit policies using proxy rewards to optimize long-term member satisfaction ● Art and science: to come up with good reward hypotheses ● Supporting infrastructure: we develop infrastructure to help iterate on new hypotheses quickly
  • 17. Open challenges ● Proxy rewards: how can we identify proxy rewards that are aligned with long-term satisfaction in a more principled way? ● Reinforcement learning: can we use reinforcement learning to optimize long-term reward directly for recommendations?
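
The offline reward evaluation on slide 14 (compare policies along multiple reward axes using OPE, then keep the ones on the Pareto front) could look roughly like the sketch below. It is only an illustration under stated assumptions: the slides do not say which OPE estimator is used, so clipped inverse propensity scoring (IPS) is swapped in here, and every name, reward axis, policy, and number in the code is hypothetical toy data.

```python
import numpy as np

def ips_value(rewards, target_probs, logging_probs, clip=10.0):
    """Clipped IPS estimate of a policy's average reward (assumed estimator)."""
    weights = np.minimum(target_probs / logging_probs, clip)
    return float(np.mean(weights * rewards))

def evaluate_policies(logged_rewards, logging_probs, policy_probs):
    """Estimate each candidate policy's value on every reward axis.

    logged_rewards: dict, axis name -> rewards observed for the logged actions
    logging_probs:  production-policy probabilities of the logged actions
    policy_probs:   dict, policy name -> that policy's probabilities
                    for the same logged actions
    """
    axes = sorted(logged_rewards)
    names = sorted(policy_probs)
    table = np.array([
        [ips_value(logged_rewards[a], policy_probs[n], logging_probs) for a in axes]
        for n in names
    ])
    return names, axes, table

def pareto_front(points):
    """Indices of rows not dominated by any other row (higher is better)."""
    points = np.asarray(points, dtype=float)
    front = []
    for i, p in enumerate(points):
        dominated = any(
            np.all(q >= p) and np.any(q > p)
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Hypothetical logged data: four impressions, two reward axes.
logging_probs = np.array([0.5, 0.5, 0.5, 0.5])
logged_rewards = {
    "play": np.array([1.0, 0.0, 1.0, 0.0]),
    "completion": np.array([0.0, 0.0, 1.0, 0.0]),
}
# Hypothetical candidate policies, e.g. trained with different rewards.
policy_probs = {
    "policy_a": np.array([0.9, 0.1, 0.5, 0.5]),
    "policy_b": np.array([0.3, 0.7, 0.9, 0.1]),
    "policy_c": np.array([0.1, 0.9, 0.1, 0.9]),
}

names, axes, table = evaluate_policies(logged_rewards, logging_probs, policy_probs)
candidates = [names[i] for i in pareto_front(table)]
print(axes, table, candidates)
```

In this toy data, policy_a scores higher on play and policy_b higher on completion, so neither dominates the other and both stay as candidates for online testing, while policy_c is worse on both axes and is dropped.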