
Balancing Discovery and Continuation in Recommendations

Our objective for the Netflix recommendation engine is to create a personalized experience for our members, making it easier for them to find a video to watch and enjoy. When members log on to the service, they may be in one or a combination of several watching modes: discovering new content to watch, continuing a partially watched movie or a TV show they have been binging on, playing a title they added to their list during an earlier session, etc. If, for example, we can reasonably predict when a member is more likely to be in continuation mode, and which videos they are more likely to resume, it makes sense to place those videos in more prominent positions on the home page. In this talk we focus on understanding discovery vs. continuation behavior and explain how we have used machine learning to improve the member experience by learning a personalized balance between the two modes. As a case study, we focus on a recent change to the personalization of a row of recommendations called “Continue Watching,” which appears on the Netflix member homepage on the website and in the apps and currently drives a significant proportion of member streaming hours.

  1. Balancing Discovery and Continuation in Recommendations § Hossein Taghavi, with Ashok Chandrashekar, Linas Baltrunas, and Justin Basilico § RecSysTV 2016
  2. Outline § Background: Netflix recommendations § Recommending for different modes of watching § Case study: Continue Watching row § Conclusions
  3. Evolution of Netflix: 2006 and 2016
  4. Netflix Scale § > 83M members § > 190 countries § > 1000 device types § > 3.7B hours of content streamed every month § 36% of peak US downstream traffic
  5. § Recommendations through predicted star rating § Contest: § Accuracy measured by root mean squared error (RMSE) § Improve by 10% = $1 million! § Data size: § 100M ratings (back then “almost massive”)
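The contest scored predicted star ratings by RMSE, and the prize target was a 10% RMSE reduction. As a minimal sketch, assuming plain Python lists of predicted and actual ratings (the numbers below are made up for illustration), the metric and the relative improvement can be computed as:

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual star ratings."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def relative_improvement(baseline_rmse, new_rmse):
    """Fractional RMSE reduction relative to a baseline (0.10 means 10%)."""
    return (baseline_rmse - new_rmse) / baseline_rmse


actual = [4, 3, 5, 2]                          # hypothetical true ratings
baseline = rmse([3.5, 3.5, 3.5, 3.5], actual)  # constant-prediction baseline
model = rmse([3.8, 3.1, 4.6, 2.4], actual)     # hypothetical model output
print(f"baseline={baseline:.3f} model={model:.3f} "
      f"improvement={relative_improvement(baseline, model):.1%}")
```

Because RMSE squares each error, a few large misses cost more than many small ones, which is why the relative improvement is measured on the same held-out ratings for both predictors.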
  6. Recommendation System: Ideal State § Turn on Netflix, and the absolute best content for you would automatically start playing
  7. Meanwhile… § Create a page of recommendations where the titles you are most likely to watch and enjoy are shown on the most visible parts of the page
  8. Everything is a Recommendation § Row selection & ordering § Title ranking § Recommendations are driven by machine learning algorithms § Over 80% of what members watch comes from our recommendations
  9. How the Homepage is Built § The titles are organized as rows § Ordering of titles within rows depends on the row type § Selection and ordering of rows: § Personalized page generation algorithm § Also some business rules and constraints § Balance thematic coherence, relevance, and diversity
  10. Various Types of Member Interactions/Feedback § Plays § How long, pause, rewind, skip, etc. § Rating and social § Rate, like, share § Context § Time, location, device, language § Interactions § Scrolling, opening a title page, search, list add
  11. Building the Recommendations is Data Driven § Try an idea offline using historical data to see if it would have made better recommendations § Offline metrics: AUC, nDCG, Recall, … § If it did, deploy a live A/B test to see if it performs well in production § Primary metric: member retention § Diagram: Idea / Problem, Data, Algorithm, Model, Metrics, A/B Testing
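Two of the offline ranking metrics named on the slide can be sketched in a few lines. This is a minimal illustration with binary relevance (an item is either relevant, e.g. played, or not); the function names and toy data are hypothetical, and graded-relevance variants of nDCG also exist.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k positions."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the best possible DCG."""
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 gets weight 1.
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0


ranking = ["a", "b", "c", "d"]  # hypothetical model ranking
played = {"b"}                  # hypothetical title the member actually played
print(recall_at_k(ranking, played, 2), ndcg_at_k(ranking, played, 4))
```

Recall only asks whether the played title made the cut, while nDCG also rewards placing it closer to the top, which matters for a row where the first positions are the most visible.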
  12. For More Reading § Netflix tech blog
  13. Building recommendation algorithms that are balanced for different modes of watching
  14. What Is the Most Likely Title You Will Watch? § The same one you watched last time! § A large portion of watching hours is spent in continue-watching mode
  15. Different Modes of Watching § Continuation: Resume a recently watched TV show or movie § List: Play a title previously added to My List § Rewatch: Rewatch a title enjoyed in the past § Discovery: Discover a new title to watch
  16. Recommending for Different Modes: Approach 1 § Build one unified model for ranking the titles in each row and one for ranking the rows § Optimized for the likelihood of play/enjoyment from the page § Benefits: § Fewer models to maintain § Fewer A/B tests
  17. Approach 1: Challenges § Members behave differently in different modes § Different row types are designed for different behaviors § Hard to capture and balance all of that in one objective § E.g., simply ranking titles by likelihood of play would fill the page with already-watched titles → poor member experience § Recommendations for different modes have different sensitivities to member actions § Continuation recs may react immediately to watching activity, My List recs to My List add/remove activity, etc.
  18. Approach 2: Dedicated Models + Blend § Build a separate model for each mode § Blend the results on the page § Blending can be done through a model trained offline or a parameter tuned online § E.g., one or more dedicated rows for each mode § Pro: § More modular; provides more intuitive knobs for balancing § Con: § Less elegant; more maintenance
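A blend with an online-tunable knob might look like the following. This is a hypothetical sketch: it assumes each dedicated model emits a row-level score on a comparable scale (in practice that would require calibration), and the per-mode weights stand in for the parameters that the slide says could be tuned online. The row names and scores are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    mode: str     # "continuation", "list", "rewatch", or "discovery"
    score: float  # row score from that mode's dedicated model (hypothetical)


def blend_rows(rows, mode_weights):
    """Order rows by dedicated-model score scaled by a per-mode weight.

    Modes missing from mode_weights default to a weight of 1.0. The weights
    are the knobs an online A/B test could tune.
    """
    return sorted(rows,
                  key=lambda r: r.score * mode_weights.get(r.mode, 1.0),
                  reverse=True)


rows = [Row("Continue Watching", "continuation", 0.6),
        Row("Trending Now", "discovery", 0.8),
        Row("My List", "list", 0.5)]
boosted = blend_rows(rows, {"continuation": 1.5})
print([r.name for r in boosted])
```

Raising only the continuation weight moves the Continue Watching row above the discovery row, while each dedicated model stays untouched; that separation is the modularity the slide lists as a pro.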
  19. Case Study: Continue Watching Row
  20. Continue Watching Row: The Past § The CW row was shown on only some devices § Videos sorted by recency of last watch § Row placement on the page set by business rules § On the website, only a single CW title was shown § A very significant fraction of plays are continuations § CW deserved better treatment
  21. Objective § Unify the CW row across devices § Optimize the row in two dimensions: § Row position on the page § Place it higher when the member is more likely to resume a video § Re-order the titles within the CW row § By their likelihood of being resumed in the current session
  22. Some Intuitive Patterns § A member may be more likely to want to § Resume a video if: § In the middle of binging a TV show § Partially watched a movie recently § Often watched it around this time of day, at this location, or on the current device § Discover a new title if: § Just finished a movie or completed all episodes of a show § Hasn’t watched anything recently § Is a relatively new member
  23. Building a Recommendation Model for CW § Feature Brainstorm § Training Data § Models and Metrics § Implementation
  24. Feature Ideas § Member-level: § Member’s subscription: tenure, country, language § How active the member has been recently § Member’s past ratings, genre preferences, etc.
  25. Feature Ideas § Video and member’s previous interactions with it: § How recently the video was added to the catalog, watched, ... § How much of the movie/show was watched § Video metadata: § Type and genre of video, # episodes § E.g., kids titles may be re-watched more § What else is in the catalog § Popularity and relevance of the video § How often members resume this video
  26. Feature Ideas § Contextual: § Time of day and day of week § Location at various resolutions § Device
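The three feature families above can be assembled into one feature dictionary per (member, video, context) triple. The sketch below is illustrative only: the `ViewingRecord` schema and every feature name are hypothetical, and a real system would draw on many more signals (ratings, genre preferences, catalog popularity, location).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ViewingRecord:
    """A member's past interaction with one video (hypothetical schema)."""
    video_id: str
    last_watched: datetime
    fraction_watched: float  # share of the movie or show already watched, 0.0 to 1.0
    is_episodic: bool        # TV show vs. movie
    last_device: str


def cw_features(record, now, device, recency_rank, tenure_days):
    """Build one feature dict for a (member, video, context) triple."""
    return {
        # Member-level
        "tenure_days": float(tenure_days),
        # Video and the member's previous interactions with it
        "days_since_last_watch": (now - record.last_watched).total_seconds() / 86400,
        "fraction_watched": record.fraction_watched,
        "is_episodic": float(record.is_episodic),
        "recency_rank": float(recency_rank),
        # Contextual
        "hour_of_day": float(now.hour),
        "day_of_week": float(now.weekday()),
        "same_device_as_last_play": float(device == record.last_device),
    }


rec = ViewingRecord("v1", datetime(2016, 9, 1, 20, 0), 0.4, True, "iphone")
features = cw_features(rec, datetime(2016, 9, 3, 20, 0), "iphone",
                       recency_rank=1, tenure_days=365)
print(features)
```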
  27. Title Ranking Model § Training data § Continuation sessions § Which of the recently watched titles was played? § Model § Learning to rank: linear/ensembles/… § Optimize for how well we rank the played title among the other titles
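One way to turn continuation sessions into learning-to-rank training data is to pair the played title with each CW title that was not played, then fit a linear scorer with a pairwise logistic loss. The slide leaves the model family open, so this is one hypothetical instance in plain Python, with made-up two-dimensional features (say, a recency signal and the fraction watched).

```python
import math
import random


def make_pairs(sessions):
    """sessions: list of (candidate feature vectors, index of the played title).

    Returns (played, not_played) feature-vector pairs for pairwise ranking.
    """
    pairs = []
    for candidates, played in sessions:
        for j, feats in enumerate(candidates):
            if j != played:
                pairs.append((candidates[played], feats))
    return pairs


def train_pairwise(pairs, dim, epochs=200, lr=0.1, seed=0):
    """SGD on the pairwise logistic loss log(1 + exp(-w . (x_pos - x_neg)))."""
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for pos, neg in pairs:
            diff = [p - n for p, n in zip(pos, neg)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # d(loss)/dw = -sigmoid(-margin) * diff, so step along +diff.
            g = 1.0 / (1.0 + math.exp(margin))
            w = [wi + lr * g * di for wi, di in zip(w, diff)]
    return w


def score(w, feats):
    """Linear score used to order the CW titles within a session."""
    return sum(wi * fi for wi, fi in zip(w, feats))


# Hypothetical sessions: [recency signal, fraction watched] per CW title.
sessions = [
    ([[1.0, 0.2], [0.5, 0.9], [0.2, 0.5]], 1),
    ([[0.9, 0.8], [0.3, 0.1]], 0),
]
w = train_pairwise(make_pairs(sessions), dim=2)
print(w)
```

The pairwise loss directly targets the slide's objective: it only cares whether the played title outscores the others in its own session, not about absolute score values.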
  28. Title Ranking Model: Performance § Baseline: Ranking by recency of last play § Recency rank was also an important feature in the model § Metrics significantly higher than the baseline § E.g., significant lift in precision § A/B testing also showed improvements
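The recency baseline and a precision-style metric are easy to state in code. In this sketch "precision" is taken to mean precision@1 (was the top-ranked CW title the one played?), with mean reciprocal rank alongside; both metric choices and the toy sessions are assumptions, not the exact metrics behind the slide.

```python
def rank_by_recency(candidates):
    """Baseline ranker. candidates: list of (title, days_since_last_watch)."""
    return [title for title, _ in sorted(candidates, key=lambda c: c[1])]


def precision_at_1(rankings, played):
    """Share of sessions whose top-ranked title is the title actually played."""
    return sum(r[0] == p for r, p in zip(rankings, played)) / len(played)


def mean_reciprocal_rank(rankings, played):
    """Average of 1 / (rank of the played title), with ranks starting at 1."""
    return sum(1.0 / (r.index(p) + 1) for r, p in zip(rankings, played)) / len(played)


sessions = [[("A", 1), ("B", 3), ("C", 7)], [("D", 0.5), ("E", 2)]]
played = ["B", "D"]  # hypothetical titles resumed in each session
baseline = [rank_by_recency(s) for s in sessions]
print(precision_at_1(baseline, played), mean_reciprocal_rank(baseline, played))
```

A learned ranker would be scored with the same functions on the same sessions, so any lift is measured against recency on equal footing.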
  29. Row Placement Model § Objective § Estimate the likelihood of continuation vs. discovery § Map that likelihood to a position on the page § Simplification: § Fix two candidate positions on the page and apply a threshold § Tune the threshold to optimize some accuracy metric
  30. Row Placement Model: Training § Training data § Randomly select sessions with plays globally § Model § Binary classification of continuation vs. discovery sessions § Evaluated using classification and ranking metrics
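A binary continuation-vs-discovery classifier can be as simple as logistic regression. The sketch below trains one by batch gradient descent in plain Python; the slide does not say which classifier was used, and the two session-level features (whether a partially watched title exists, and a scaled time since the last play) and the labels are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=500, lr=0.5):
    """Batch gradient descent for logistic regression.

    y: 1 = continuation session, 0 = discovery session.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss for one example: (prediction - label) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that the session is a continuation session."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [has partially watched title, scaled days since last play]
X = [[1, 0.1], [1, 0.2], [0, 0.9], [0, 0.7], [1, 0.3], [0, 1.0]]
y = [1, 1, 0, 0, 1, 0]
w, b = train_logistic(X, y)
print(predict_proba(w, b, [1, 0.1]), predict_proba(w, b, [0, 0.9]))
```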
  31. Row Placement Model: Performance § Metrics § Achieved high classification metrics for predicting continuation vs. discovery § Error types: § False positives → CW occupies the top of the page unnecessarily § False negatives → Difficult for the member to find the CW title § Placing the row § Threshold trades off FP and FN → Hard to tune offline § Tuned the threshold by A/B testing
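The threshold trade-off can be made concrete by counting both error types at a few thresholds. In this hypothetical sketch the two candidate positions are row indices 0 and 3 (placeholders), and the probabilities and labels are made up. Offline counts only expose the trade-off; they do not say how costly each error is to members, which is why the slide tunes the threshold by A/B testing.

```python
def place_cw_row(p_continuation, threshold, top_position=0, lower_position=3):
    """Map P(continuation) to one of two fixed candidate row positions."""
    return top_position if p_continuation >= threshold else lower_position


def error_counts(probs, labels, threshold):
    """Count placement errors. labels: 1 = continuation session, 0 = discovery.

    FP: CW placed on top, but the member was discovering.
    FN: member was continuing, but CW was placed lower.
    """
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    return fp, fn


probs = [0.9, 0.7, 0.4, 0.2, 0.6]
labels = [1, 0, 1, 0, 1]
for t in (0.3, 0.5, 0.8):
    print(t, error_counts(probs, labels, t))
```

Raising the threshold removes false positives but adds false negatives, so no single offline threshold is obviously best.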
  32. Reusing the Title Ranking Model § Use the title-level scores § Calibrate scores to get the probability $P_t$ of continuation for each CW title $t$ § Aggregate into an overall probability of continuation § E.g., assuming independence: $P_{CW} = 1 - \prod_{t \in CW} (1 - P_t)$ § Pro: Avoids maintaining two separate models § Con: Not as accurate as a dedicated model
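The aggregation step is short once per-title probabilities exist. The slide does not name a calibration method; the sketch below swaps in Platt scaling (a logistic fit of raw scores to outcomes) with placeholder parameters `a` and `b`, which in practice would be fit on held-out sessions. The raw scores are hypothetical.

```python
import math


def platt_calibrate(score, a, b):
    """Platt scaling: P = 1 / (1 + exp(a * score + b)); a is typically negative."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def p_continue_any(title_probs):
    """P_CW = 1 - prod over t of (1 - P_t), assuming independence across titles."""
    p_none = 1.0
    for p in title_probs:
        p_none *= 1.0 - p  # probability that no CW title is resumed
    return 1.0 - p_none


raw_scores = [1.2, -0.3, 0.4]  # hypothetical ranking-model scores for three CW titles
probs = [platt_calibrate(s, a=-2.0, b=0.5) for s in raw_scores]
print([round(p, 3) for p in probs], round(p_continue_any(probs), 3))
```

Because each factor $(1 - P_t)$ is at most 1, $P_{CW}$ never falls below the largest single-title probability; if resumptions are positively correlated, the independence assumption overstates $P_{CW}$.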
  33. Context Awareness § A title ranks highest at the same time of day and on the same device as its last play § Experiment: § Played “Sid the Science Kid” on iPhone § Played “Narcos” on the website → Different ranking on iPhone and web
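The experiment can be mimicked with a toy scorer that rewards matching the device and time of day of a title's last play. The bonus weights, base scores, and hours below are hypothetical, chosen only to reproduce the qualitative effect; hours wrap around midnight, so the time gap is measured on a 24-hour circle.

```python
def hour_gap(h1, h2):
    """Circular distance in hours between two hours of the day (0-23)."""
    d = abs(h1 - h2) % 24
    return min(d, 24 - d)


def context_score(base, last_device, last_hour, device, hour,
                  device_bonus=0.5, time_bonus=0.5):
    """Base relevance plus hypothetical bonuses for matching the last play's context."""
    device_match = 1.0 if device == last_device else 0.0
    time_match = 1.0 - hour_gap(last_hour, hour) / 12.0  # 1 at same hour, 0 at 12 h apart
    return base + device_bonus * device_match + time_bonus * time_match


# Hypothetical last-play context per title: (device, hour of day).
last_play = {"Sid the Science Kid": ("iphone", 8), "Narcos": ("web", 22)}


def rank(device, hour):
    """Order titles by context score for the current device and hour."""
    return sorted(last_play,
                  key=lambda t: context_score(0.5, *last_play[t], device, hour),
                  reverse=True)


print(rank("iphone", 9), rank("web", 22))
```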
  34. Serving the CW Row in Production § Scores cannot be precomputed → must be computed in real or near-real time § Some features are context dependent § Row should refresh each time a member watches a title § Need to push updates to clients to keep the row fresh § Latency bottleneck: data transfers from the cache to the computation backend § Requires careful backend engineering § Fallback strategy: if computation fails, can use recency ranking
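The fallback on the slide can be sketched as a latency budget around model scoring. This is a hypothetical single-process illustration: `model_score`, the 50 ms default budget, and the thread-pool approach are assumptions, not a description of the production system.

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)


def rank_by_recency(candidates):
    """Fallback: most recently watched first. candidates: (title, days_since_last_watch)."""
    return [title for title, _ in sorted(candidates, key=lambda c: c[1])]


def rank_cw_row(candidates, model_score, timeout_s=0.05):
    """Rank with the model under a latency budget; fall back to recency on failure.

    model_score is a hypothetical callable returning one score per candidate.
    """
    def by_model():
        scores = model_score(candidates)
        order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
        return [candidates[i][0] for i in order]

    future = _pool.submit(by_model)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Timeout or model error. A timed-out worker may keep running,
        # but the member still gets a usable row.
        return rank_by_recency(candidates)


def broken_model(candidates):
    raise RuntimeError("scoring backend unavailable")


row = [("Narcos", 2.0), ("Sid the Science Kid", 0.5)]
print(rank_cw_row(row, broken_model))
```

Recency is a natural fallback here because it needs no model and was already the baseline ranking, so a failure degrades the row rather than emptying it.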
  35. Conclusions and Future Directions
  36. Conclusions § Important to understand different modes of behavior § Continuation is a key driver of streaming hours § Improving CW recommendations improves member experience § A/B testing showed a significant boost in user engagement § Future: § Incorporate the placement of the CW row (and others) into the main page construction model § When can we automatically start resuming a title?
  37. Questions? Upcoming blog post on this topic at: Job openings: