
Homepage Personalization at Spotify

An overview of Homepage personalization at Spotify

  1. Homepage Personalization at Spotify. Oğuz Semerci, Aloïs Gruson, Clay Gibson, Ben Lacker, Catherine Edwards, Vladan Radosavljevic
  2. Spotify is a global audio subscription service. By the numbers: 232M monthly active users, 108M premium subscribers, 79 markets, 50M+ tracks, 450k+ podcast titles.
  3. What’s at stake on the Homepage? The Homepage is the first thing you see when you open the app, and it is many things at once: a discovery tool, a personal music assistant, a marketplace for artists and their fans. Spotify’s mission is to unlock the potential of human creativity, by giving a million creative artists the opportunity to live off their art and billions of fans the opportunity to enjoy and be inspired by it. Personalization is powerful here: the content space is challenging in both its volume and its variety.
  4. Talk outline. 01 More on the Spotify Homepage. 02 Overview of the ranking algorithm and the bandit policy. 03 Sanity checks used in practice for policy debiasing and model behavior.
  5. Homepage organization. The Homepage is made up of cards: podcast shows or episodes, albums, playlists, radio stations, artist pages, etc. Cards are organized into shelves. [Figure: a Homepage with two example shelves, Shelf A and Shelf B.]
  6. Each user is eligible for hundreds of candidate shelves, which can be editorially or programmatically curated. Shelves pull from a pool of millions of cards. All shelf candidates and their respective cards are ranked in real time when you load Home. [Figure: the recommendation funnel, from editorial and programmatic curation through embedding-network ranking; example shelves include Made for X, Your Favorite Albums, Similar to Y, Recommended for Today, Iconic 80s Soundtracks, and Discovered in Greenwich Village.]
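The funnel can be pictured as a two-stage ranker: score and truncate the candidate shelves, then rank the cards inside each surviving shelf. Below is a minimal sketch under that reading; the names (Shelf, rank_home, the scoring callables) are invented for illustration and are not Spotify's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Shelf:
    name: str
    source: str                          # "editorial" or "programmatic"
    cards: List[str] = field(default_factory=list)

def rank_home(user_id: str,
              candidates: List[Shelf],
              shelf_score: Callable[[str, Shelf], float],
              card_score: Callable[[str, str], float],
              n_shelves: int = 10,
              n_cards: int = 20) -> List[Shelf]:
    """Stage 1: rank candidate shelves; stage 2: rank cards within each."""
    top = sorted(candidates, key=lambda s: shelf_score(user_id, s),
                 reverse=True)[:n_shelves]
    for shelf in top:
        shelf.cards = sorted(shelf.cards,
                             key=lambda c: card_score(user_id, c),
                             reverse=True)[:n_cards]
    return top

# Toy usage with constant scorers standing in for the real models:
home = rank_home("user_42",
                 [Shelf("Made for X", "programmatic", ["card_a", "card_b"]),
                  Shelf("Iconic 80s Soundtracks", "editorial", ["card_c"])],
                 shelf_score=lambda u, s: float(len(s.cards)),
                 card_score=lambda u, c: 1.0)
```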
  7. Ranking Algorithm and Bandit Policy
  8. Homepage ranking as an end-to-end ML problem: the ranking algorithm serves recommendations; user feedback is logged (interactions such as clicks, likes, and streams); and the ranking algorithm is trained on the logged feedback.
  9. Consequences of feedback loops. Without randomization in the feedback loop, you risk homogenized user behavior (Chaney et al. 2018), diminishing diversity over time (Nguyen et al. 2014), and poor representation of the long tail (Mehrotra et al. 2018). Continuous exploration and content-pool expansion help (Jiang et al. 2019).
  10. The same loop, with one addition: introduce exploration into how recommendations are served.
  11. The exploration policy introduces randomness into what is served, and the ranking algorithm is now trained on the logged feedback together with the policy's propensities.
  12. Ways to introduce exploration. Fully randomized experiment: randomize the Homepage for a small fraction of users. Random data collection: randomize the Homepage for a small fraction of requests. Bandit policy: explore/exploit as the Homepage is assembled (McInerney et al. 2018). Bandit approaches are becoming popular: artwork personalization at Netflix (Amat et al. 2018), news article recommendation at Yahoo (Chu et al. 2012), personalization at Amazon Music (ICML 2019), and the REVEAL ’19 workshop here.
  13. Explore/exploit on the Homepage: an example of an epsilon-greedy policy for ranking the Spotify Homepage. Three card candidates have predicted stream rates of 0.8, 0.7, and 0.2.
  14. First slot: the policy exploits and places the card with the highest predicted stream rate (0.8). Its propensity is π = (1 − ε) + ε/3, since it could be chosen either greedily or as one of three equally likely random picks.
  15. Second slot: the policy explores and places the 0.2 card. Its propensity is π = ε/2, since it could only have been chosen as one of the two remaining equally likely random picks.
  16. Third slot: only the 0.7 card remains, so its propensity is π = 1.
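The three slides above walk through one pass of the policy. Here is a minimal sketch of such an epsilon-greedy ranker with propensity logging; the function and variable names are illustrative, but the logged propensities match the slides: π = (1 − ε) + ε/k for the greedy pick among k remaining cards, π = ε/k for a non-greedy pick, and π = 1 when a single card remains.

```python
import random

def epsilon_greedy_rank(cards, scores, epsilon=0.1, rng=random):
    """Rank cards epsilon-greedily, logging the propensity of each pick.

    At each slot: with probability 1 - epsilon place the remaining card
    with the highest predicted stream rate, otherwise place a card drawn
    uniformly from the k remaining. The logged propensity pi is the total
    probability of the card that got the slot, which is exactly what the
    counterfactual training step needs for importance weighting.
    """
    remaining = list(cards)
    ranking = []
    while remaining:
        k = len(remaining)
        best = max(remaining, key=lambda c: scores[c])
        if rng.random() < 1 - epsilon:
            chosen = best                    # exploit
        else:
            chosen = rng.choice(remaining)   # explore (may also pick best)
        pi = (1 - epsilon) + epsilon / k if chosen == best else epsilon / k
        ranking.append((chosen, pi))
        remaining.remove(chosen)
    return ranking

# The three-card example from the slides (predicted stream rates 0.8, 0.7, 0.2):
print(epsilon_greedy_rank(["A", "B", "C"], {"A": 0.8, "B": 0.7, "C": 0.2}))
```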
  17. Training the reward model*: counterfactual inference for the model parameters. (*Explore, Exploit, Explain: Personalizing Explainable Recommendations with Bandits. J. McInerney, B. Lacker, S. Hansen, K. Higley, H. Bouchard, A. Gruson & R. Mehrotra. RecSys 2018.)
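The counterfactual idea is to weight each logged example by the inverse of the propensity recorded at serving time, so the reward model is trained as if the data had been collected uniformly. Below is a minimal sketch on synthetic data: the features, rewards, and propensities are random stand-ins, and the clipped inverse-propensity logistic regression is a generic recipe, not the exact objective of McInerney et al.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))            # user/card features at impression time
reward = rng.integers(0, 2, size=n)    # e.g. whether the card was streamed
pi = rng.uniform(0.05, 1.0, size=n)    # propensity logged by the policy

# Inverse-propensity weights, clipped to keep the variance under control.
w = np.minimum(1.0 / pi, 20.0)

# Weighted maximum likelihood: rarely shown impressions count for more,
# undoing the logging policy's exposure bias.
reward_model = LogisticRegression().fit(X, reward, sample_weight=w)
```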
  18. Research directions & practical challenges. Many research directions we work on: designing better reward models (REVEAL, talk by Mounia Lalmas); optimizing for the marketplace (Marketplaces tutorial, Rishabh and Ben); careful feature engineering to mitigate feedback-loop side effects and rank new content better; and creating a more representative Homepage (Henriette Cramer in the Responsible Recommendation panel). But we also need something like integration tests, so that we are confident we have the basics right.
  19. Sanity checks used in practice: three examples.
  20. Sanity checks for policy debiasing. We need a way to validate that policy debiasing yields roughly unbiased training data. Method: remove position bias by using only training data from the top position; train a linear model with a single feature (shelf_name) to predict a metric that is observable online (CTR); and compare the prediction from the debiased model to the outcome observed during exploration in that position.
  21. Sanity checks for policy debiasing, continued. [Figure: model predictions compared with observed outcomes, with importance sampling and without importance sampling.]
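In code, the check reduces to comparing two per-shelf estimates at the top position. Here is a sketch assuming a hypothetical log DataFrame with columns shelf_name, position, clicked, propensity, and is_exploration; since a linear model with shelf_name as its only feature reduces to a per-shelf estimate, we compute the importance-sampled CTR directly.

```python
import pandas as pd

def debiasing_check(logs: pd.DataFrame) -> pd.DataFrame:
    """Compare IPS-debiased CTR with exploration CTR, per shelf, at position 0."""
    top = logs[logs["position"] == 0]
    served = top[~top["is_exploration"]]
    explored = top[top["is_exploration"]]

    # Self-normalized IPS estimate of each shelf's CTR from policy traffic.
    w = 1.0 / served["propensity"]
    weighted_clicks = (served["clicked"] * w).groupby(served["shelf_name"]).sum()
    ips_ctr = weighted_clicks / w.groupby(served["shelf_name"]).sum()

    # Ground truth: plain CTR on the uniformly randomized exploration traffic.
    explore_ctr = explored.groupby("shelf_name")["clicked"].mean()

    # If debiasing works, the two columns should roughly agree per shelf.
    return pd.DataFrame({"ips_ctr": ips_ctr, "explore_ctr": explore_ctr})
```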
  22. Sanity checks for problem-specific model behavior. Aggregate ranking metrics (e.g. NDCG) have low resolution and offer little visibility into model behavior. But stakeholders, including product strategy, artists, curators, and users, have expectations about what the model should do in specific situations. We build trust in the model, internally and externally, by creating metrics around these expectations and using them as sanity checks.
  23. Favorite-shelf-position sanity check. Music has repetitive consumption patterns, and users behave habitually on Home. If a user has a clear preference for a specific shelf, the model should rank that shelf high on the page, regardless of what it is. A user has a “favorite” shelf if a significant share of their consumption can be attributed to that shelf; we measure the average row where that shelf is placed for those users. [Figure: favorite-shelf position under modelA and modelB for shelfX, shelfY, and shelfZ.]
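A sketch of this metric follows, assuming two hypothetical tables: `consumption` with columns user_id, shelf_name, streams, and `rankings` with columns user_id, shelf_name, row_position. The “significant amount” threshold is a placeholder.

```python
import pandas as pd

def favorite_shelf_position(consumption: pd.DataFrame,
                            rankings: pd.DataFrame,
                            threshold: float = 0.5) -> float:
    """Average row at which each user's favorite shelf is placed.

    A user's favorite shelf is one responsible for at least `threshold`
    of their streams. Lower is better: the model should place that shelf
    near the top regardless of which shelf it is.
    """
    per_shelf = consumption.groupby(["user_id", "shelf_name"])["streams"].sum()
    totals = consumption.groupby("user_id")["streams"].sum()
    share = per_shelf.div(totals, level="user_id")

    favorites = share[share >= threshold].reset_index()[["user_id", "shelf_name"]]
    placed = rankings.merge(favorites, on=["user_id", "shelf_name"])
    return placed["row_position"].mean()
```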
  24. Daily & hourly patterns sanity check. “Why don’t I see Peaceful Piano at the top of my homepage every night?” We zoom into repetitive consumption patterns and habitual behavior, and measure whether the row position is higher at the right time of day, where applicable. [Figure: stream rate over the course of the day.]
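And a sketch of the time-of-day check, again with illustrative names, assuming impression logs with a timestamp and the row at which a given shelf was placed:

```python
import pandas as pd

def position_by_hour(logs: pd.DataFrame, shelf: str) -> pd.Series:
    """Average row position of `shelf`, bucketed by hour of day.

    For shelves with a habitual time-of-day audience, the curve should
    dip (the shelf should move up the page) around the usual listening
    hours, e.g. evenings for "Peaceful Piano".
    """
    rows = logs[logs["shelf_name"] == shelf].copy()
    rows["hour"] = pd.to_datetime(rows["timestamp"]).dt.hour
    return rows.groupby("hour")["row_position"].mean()
```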
  25. Conclusions. 01 The motivation for exploration when collecting training data. 02 Methods for collection policies, with an epsilon-greedy example. 03 Three examples of simple sanity checks we use in production while navigating the complex ecosystem of Homepage personalization.
  26. Thank you!

References:
[1] Lihong Li, Wei Chu, John Langford, Robert E. Schapire. A Contextual-Bandit Approach to Personalized News Article Recommendation. arXiv:1003.0146.
[2] Rishabh Mehrotra, James McInerney, Hugues Bouchard, Mounia Lalmas, Fernando Diaz. Towards a Fair Marketplace: Counterfactual Evaluation of the Trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems. CIKM ’18, ACM, New York, NY, USA, 2243-2251, 2018.
[3] Allison J. B. Chaney, Brandon Stewart, Barbara Engelhardt. How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility. arXiv:1710.11214.
[4] J. McInerney, B. Lacker, S. Hansen, K. Higley, H. Bouchard, A. Gruson, R. Mehrotra. Explore, Exploit, Explain: Personalizing Explainable Recommendations with Bandits. ACM Conference on Recommender Systems (RecSys), October 2018.
[5] Ray Jiang, Silvia Chiappa, Tor Lattimore, Andras Gyorgy, Pushmeet Kohli. Degenerate Feedback Loops in Recommender Systems. arXiv:1902.10730.
[6] Thorsten Joachims, Adith Swaminathan, Tobias Schnabel. Unbiased Learning-to-Rank with Biased Feedback. arXiv:1608.04468.
[7] Fernando Amat, Ashok Chandrashekar, Tony Jebara, Justin Basilico. Artwork Personalization at Netflix. Proceedings of the 12th ACM Conference on Recommender Systems (RecSys ’18), 2018.

https://www.spotifyjobs.com
