
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainable Recommendations with Multi-objective Contextual Bandits


  1. 1. Recommendations in a Marketplace Rishabh Mehrotra, Research Scientist, Spotify Research, London, UK (rishabhm@spotify.com) 29 March 2019 Outline: - Introduction to Marketplaces - Relevance vs Fairness trade-off - Multi-objective Contextual Bandits
  5. 5. Today’s Talk Phase I: User-centric RecSys (Bandit: Explore, Exploit, Explain) Phase II: Inject one competing objective (Relevance vs Fairness) Phase III: Multi-stakeholder Bandits User centric Multi- Stakeholder
  6. 6. Traditional RecSys Approaches
  7. 7. Approaches for RecSys Collaborative Filtering, i.e. matrix factorization
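For concreteness, here is a minimal matrix-factorization sketch (our illustration, not code from the talk); the function name and hyperparameters are assumptions. It learns user and item latent factors by SGD on observed (user, item, rating) triples.

```python
import numpy as np

def factorize(ratings, n_users, n_items, k=16, lr=0.01, reg=0.1, epochs=20):
    """Vanilla matrix factorization: approximate R[u, i] by P[u] . Q[i],
    trained with SGD on the observed (user, item, rating) triples."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))  # user latent factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            pu = P[u].copy()
            err = r - pu @ Q[i]                   # prediction error
            P[u] += lr * (err * Q[i] - reg * pu)  # gradient step, user side
            Q[i] += lr * (err * pu - reg * Q[i])  # gradient step, item side
    return P, Q

# Recommend to user u by scoring unseen items: scores = P[u] @ Q.T
```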
  8. 8. Approaches for RecSys Collaborative Filtering -- extended, i.e. Tensor factorization AAAI 2010: Collaborative Filtering Meets Mobile Recommendation: A User-Centered Approach
  9. 9. Approaches for RecSys Latent variable models RecSys 2015: A probabilistic model for using social networks in personalized item recommendation
  10. 10. Approaches for RecSys Neural Embeddings User Embedding
  11. 11. Approaches for RecSys Neural Embeddings User Embedding … with Side Information RecSys 2016: Meta-Prod2Vec - Product Embeddings Using Side-Information for Recommendation
  12. 12. Approaches for RecSys Neural Embeddings User Embedding … with Side Information Joint User-Item Embedding WSDM 2017: Joint Deep Modeling of Users and Items Using Reviews for Recommendation
  13. 13. Approaches for RecSys Neural Collaborative Ranking WWW 2017: Neural Collaborative Filtering
  14. 14. Approaches for RecSys Variants of Recommendation Styles: - Short vs long term - Cold start or cohort based - Multi-view & multi-interest models - Multi-task recommendation SIGIR 2012: Modeling the Impact of Short- and Long-Term Behavior on Search Personalization
  15. 15. Approaches for RecSys Variants of Recommendation Styles: - Short vs long term - Cold start & cohort based - Multi-view & multi-interest models - Multi-task recommendation SIGIR 2014: Cohort Modeling for Enhanced Personalized Search
  16. 16. Approaches for RecSys Variants of Recommendation Styles: - Short vs long term - Cold start or cohort based - Multi-view & multi-interest models - Multi-task recommendation RecSys 2013: Nonlinear Latent Factorization by Embedding Multiple User Interests
  17. 17. Approaches for RecSys Variants of Recommendation Styles: - Short vs long term - Cold start or cohort based - Multi-view & multi-interest models - Multi-task recommendation KDD 2018: Perceive Your Users in Depth: Learning Universal User Representations from Multiple E-commerce Tasks
  18. 18. Approaches for RecSys
  19. 19. Approaches for RecSys What do they have in common?
  20. 20. Approaches for RecSys What do they have in common? User centric focus
  21. 21. Traditional RecSys: User Centric ● User centric nature of systems: ○ Recommendation models catered to users: ■ user needs ■ user interests ■ user behavior & interactions ■ personalization ○ Evaluation approaches for user satisfaction ■ Measuring user engagement ■ Optimizing for user satisfaction ■ User centric metrics *WSDM 2018 Tutorial on metrics of user engagement; Mounia Lalmas, et al [link]
  22. 22. Two-sided Marketplace Marketplace: Intermediaries that help facilitate economic interaction between two or more sets of agents
  23. 23. Two-sided Marketplace ARTISTS FANS Marketplace: Intermediaries that help facilitate economic interaction between two or more sets of agents
  24. 24. Recommendation in 2-sided Marketplace Stakeholder(s): User; Artists; Advertisers / Campaign(s); Platform provider
  25. 25. Recommendation in 2-sided Marketplace Stakeholder(s) and their metrics: User: streams, engagement levels, reach / depth / retention, downstreams (saves, artist views), other proxies of user satisfaction; Artists: exposure, audience growth; Advertisers / Campaign(s): revenue; Platform provider: LTV, diversity
  26. 26. Recommendation Strategy Select an arm (i.e. card)
  27. 27. Recommendation Strategy Select an arm (i.e. card)
  28. 28. Recommendation Strategy Select an arm (i.e. card) user-centric
  29. 29. A user-centric ML model is not meant to optimize for multiple, potentially competing objectives
  30. 30. Recommendation Strategy Recommendation strategy = ??
  31. 31. Recommendation Strategy f(𝞹1 , 𝞹2 , 𝞹3 , 𝞹4 )
  32. 32. Recommendation Strategy Select an arm (i.e. card) user-centric
  33. 33. Recommendation Strategy Select an arm (i.e. card) user-centric artist-centric Spotify economics
  34. 34. user-centric artist-centric Spotify economics Recommendation Strategy Solution: find optimal recommendations which satisfy multiple objectives!
  35. 35. user-centric artist-centric Spotify economics Recommendation Strategy Multi-objective Optimization Aliases: Multi-objective Multi-sided Multi-criteria Multi-stakeholder Multi-attribute Multi-agent
  36. 36. Disclaimer ● Multi-objective ML has been around for decades ● Past work on constrained optimization in industrial setting ○ WWW 2015: Constrained Optimization for Homepage Relevance (LinkedIn) ○ SIGIR 2012: Personalized Click Shaping through Lagrangian Duality for Online Recommendation ○ arXiv 2018: Joint Revenue Optimization at Etsy (Etsy) ○ SIGIR 2018: Turning Clicks into Purchases: Revenue Optimization for Product Search in E-Commerce (Etsy) ○ KDD 2011: Click Shaping to Optimize Multiple Objectives (Yahoo!) ● Why this talk then? ○ Most past approaches work in Learning to Rank setting ○ Relatively less work in interaction ML or RL, specifically bandit setting
  37. 37. Today’s Talk Phase I: User-centric RecSys (Bandit: Explore, Exploit, Explain) Phase II: Inject one competing objective (Relevance vs Fairness) Phase III: Multi-stakeholder Bandits User centric Multi- Stakeholder
  38. 38. Phase II: Relevance - Fairness trade-off Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems Rishabh Mehrotra, James McInerney, Hugues Bouchard, Mounia Lalmas, Fernando Diaz (CIKM 2018)
  39. 39. Pitfalls of User Centric RecSys Recommendations based on predicted relevance result in Superstar Economics Suppliers want a fair opportunity to be presented to the users Blindly optimizing for relevance might have a detrimental impact on supplier fairness
  40. 40. Research Question: Relevance ← Satisfaction → Fairness
  41. 41. Key Definitions Relevance: We identify a recommendation as relevant if it closely resembles the user's interest profile (embedding-based representations of users & tracks) User Satisfaction: Defined as a subjective measure of the utility of recommendations. We rely on implicit feedback based on behavioral signals (e.g. # tracks played)
  42. 42. Key Definitions Fairness: - numerous attempts to define fairness [FAT*’18, ICML’18] - unlikely that there will be a universal definition appropriate across all applications
  43. 43. Key Definitions Fairness*: - numerous attempts to define fairness [FAT*’18, ICML’18] - unlikely that there will be a universal definition appropriate across all applications ● Statistical bias ● Group fairness ○ Demographic parity ○ Equal Pos Pred. Value ○ Equal Neg Pred. Value ○ Equal False + Rate ○ Equal False - Rate ○ Accuracy equity ● Blindness ● Individual fairness ○ Equal thresholds ○ Similarity metric ● Process fairness (feature rating) ● Diversity (various definitions) ● Representational harms ○ Stereotype mirroring ○ Cross-dataset generalization ○ Bias in representation learning ○ Bias amplification FAT* 2018 Tutorial: 21 definitions of fairness and their politics [link] ICML 2018 Tutorial: Defining and Designing Fair Algorithms [link] #algo-bias Confluence page [link]
  44. 44. Key Definitions Fairness*: Define group fairness: a set of tracks is fair if it contains tracks from artists that belong to different groups (i.e. popularity bins/tiers). Concave aggregation rewards spreading tracks across groups: group counts (2, 1, 1) beat (4, 0, 0) since (√2 + √1 + √1) > (√4 + √0 + √0) *Framework amenable to other interpretations and definitions of fairness * Representative & Informative Query Selection for Learning to Rank using Submodular Functions, Rishabh Mehrotra, Emine Yilmaz, SIGIR 2015
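A minimal sketch of this group-fairness score under the slide's square-root aggregation; the bin labels and function name are our assumptions for illustration.

```python
import math
from collections import Counter

def fairness_score(artist_bins):
    """Concave (sqrt) aggregation of per-bin track counts: spreading a
    set's tracks across artist popularity bins scores higher than
    concentrating them in one bin (diminishing returns per bin)."""
    counts = Counter(artist_bins)
    return sum(math.sqrt(c) for c in counts.values())

# The slide's example, with hypothetical bin labels: (2, 1, 1) beats (4, 0, 0)
fairness_score(["head", "mid", "tail", "head"])   # sqrt(2)+1+1 ~= 3.41
fairness_score(["head", "head", "head", "head"])  # sqrt(4)     =  2.00
```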
  45. 45. Recommendation Policies Policy I: Optimizing Relevance
  46. 46. Recommendation Policies Policy I: Optimizing Relevance Policy II: Optimizing Fairness
  47. 47. Recommendation Policies Policy I: Optimizing Relevance Policy II: Optimizing Fairness Policy III: Probabilistic Policy
  48. 48. Recommendation Policies Policy I: Optimizing Relevance Policy II: Optimizing Fairness Policy III: Probabilistic Policy Policy IV: Trade-off Relevance & Fairness
  49. 49. Recommendation Policies System designers are wary of negatively impacting user satisfaction → avoid showing less relevant content
  50. 50. Recommendation Policies Policy V: Guaranteed Relevance System designers are wary of negatively impacting user satisfaction → avoid showing less relevant content This policy guarantees relevance to be above a certain threshold
  51. 51. Leverage User Specific Traits? (i.e. user tolerance)
  52. 52. Recommendation Policies Conjecture: Users have varying extent of sensitivity towards fair content ● Some users more flexible than others around the distribution of artists recommended
  53. 53. Recommendation Policies Conjecture: Users have varying extent of sensitivity towards fair content ● Some users more flexible than others around the distribution of artists recommended User Fairness Affinity: Computed as: difference in user satisfaction when recommended relevant content, versus when recommended fair content
  54. 54. Recommendation Policies Policy VI: Adaptive Policy Extreme case view: ● optimize for relevance for users with negative affinity scores ● optimize for fairness for users with a positive score
  55. 55. Summary of Recommendation Policies Policy I: Optimizing Relevance Policy II: Optimizing Fairness Policy III: Probabilistic Policy Policy IV: Trade-off Relevance & Fairness Policy V: Guaranteed Relevance Policy VI: Adaptive Policy I Policy VII: Adaptive Policy II
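To make the policy menu concrete, here is a minimal sketch of how Policies IV-VI could score candidates; the scoring functions, threshold, and affinity signal are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def tradeoff_policy(rel, fair, beta):
    # Policy IV: interpolate relevance and fairness; beta=1 recovers
    # Policy I (pure relevance), beta=0 recovers Policy II (pure fairness).
    return beta * rel + (1.0 - beta) * fair

def guaranteed_relevance_policy(rel, fair, beta, rel_min):
    # Policy V: same trade-off, but candidates below a relevance
    # threshold are excluded so user satisfaction is protected.
    score = tradeoff_policy(rel, fair, beta)
    return np.where(rel >= rel_min, score, -np.inf)

def adaptive_policy(rel, fair, affinity):
    # Policy VI (extreme-case view): relevance for users with negative
    # fairness affinity, fairness for users with positive affinity.
    return rel if affinity < 0 else fair

# rel / fair are per-candidate scores; pick the argmax under a policy.
rel = np.array([0.9, 0.6, 0.4])
fair = np.array([0.2, 0.5, 0.9])
best = int(np.argmax(guaranteed_relevance_policy(rel, fair, beta=0.7, rel_min=0.5)))
```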
  56. 56. How does this trade-off fare?
  57. 57. Experiments: Trade-off Analysis ● Optimizing for Fairness hurts satisfaction ○ 35% decline in SAT ○ Motivates the need for a trade-off
  58. 58. Experiments: Trade-off Analysis ● Optimizing for Fairness hurts satisfaction ○ 35% decline in SAT ○ Motivates the need for a trade-off ● Gradual improvement in SAT as we move from β=0 (Fairness) to β=1 (Relevance) ○ 10% lift in SAT at the halfway point ○ Sharp increase in SAT beyond β=0.7
  59. 59. Experiments: Impact of Guarantees ● Guaranteeing relevance helps improve SAT ○ Higher maximum SAT score (0.84 vs 0.64)
  60. 60. Experiments: Incorporating User Tolerance Adaptive policies fare better than ● Only Fairness & only Relevance ● Interleaved (max SAT 0.65) ○ Over 12% improvement in SAT
  61. 61. Experiments: Incorporating User Tolerance Adaptive policies fare better than ● Only Fairness & only Relevance ● Interleaved (max SAT 0.65) ○ Over 12% improvement in SAT Adaptive policies: major gains in Fairness, without severe losses in Relevance
  62. 62. Experiments: Holistic View Cost vs Benefit analysis Compute loss in fairness, loss in relevance & gain in SAT.
  63. 63. Experiments: Holistic View Cost vs Benefit analysis Simple interpolation -- no good region (high SAT loss or high fairness loss) ProbPolicy: balancing with β=0.7 gives best results Guaranteed R: hurts fairness Adaptive policy: best overall trade-off
  64. 64. Summary: Phase II Relevance vs Fairness - Trading off Relevance ← SAT → Fairness is better than blindly optimizing for relevance - A user-tolerance-aware model helps! - There is benefit in considering objectives beyond just User SAT; this motivates the need for considering multiple stakeholder objectives
  65. 65. Today’s Talk Phase I: User-centric RecSys (Bandit: Explore, Exploit, Explain) Phase II: Inject one competing objective (Relevance vs Fairness) Phase III: Multi-stakeholder Bandits User centric Multi- Stakeholder
  66. 66. Phase III: Multi-objective Models for Marketplaces Multi-objective Linear Contextual Bandits via Generalised Gini Function Niannan Xue, Rishabh Mehrotra, Mounia Lalmas (under review)
  67. 67. Multi-objective Contextual Bandits Select an arm (i.e. card) user-centric artist-centric business economics
  68. 68. Multi-objective (MO) Contextual Bandits f(𝞹1 , 𝞹2 , 𝞹3 , 𝞹4 )
  69. 69. Multi-objective Contextual Bandits f(.): Generalized Gini Index - Ordered weighted averaging (OWA) - Respects the Pigou-Dalton transfer principle: prefers allocations that are more equitable
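A minimal sketch of the Generalized Gini aggregation (the geometric weight choice is our assumption; the paper may use different weights): sort the per-objective rewards ascending and apply non-increasing weights, so the worst-off objective weighs most, which is what yields the Pigou-Dalton preference for equitable reward vectors.

```python
import numpy as np

def ggi(rewards, weights=None):
    """Generalized Gini Index: ordered weighted average of the reward
    vector sorted ascending, with non-increasing weights so the
    worst-performing objective receives the largest weight."""
    r = np.sort(np.asarray(rewards, dtype=float))       # ascending order
    w = weights if weights is not None else 2.0 ** -np.arange(len(r))
    return float(np.dot(w, r))

# Pigou-Dalton in action: the more equitable vector scores higher,
# even though both vectors sum to 1.0.
ggi([0.4, 0.6])  # 1.0*0.4 + 0.5*0.6 = 0.70
ggi([0.2, 0.8])  # 1.0*0.2 + 0.5*0.8 = 0.60
```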
  70. 70. Proposed: Multi-Objective Contextual Bandits via GGI ● Goal: Find an arm selection strategy ○ probability distribution based on which an arm (i.e. recommendation) is selected
  71. 71. Proposed: Multi-Objective Contextual Bandits via GGI ● Goal: Find an arm selection strategy ○ probability distribution based on which a recommendation is selected ● For a bandit instance at round t, we are given features with
  72. 72. Proposed: Multi-Objective Contextual Bandits via GGI ● Goal: Find an arm selection strategy ○ probability distribution based on which a recommendation is selected ● For a bandit instance at round t, we are given features with ● If we choose arm k, we observe linear reward where
  73. 73. Proposed: Multi-Objective Contextual Bandits via GGI ● Goal: Find an arm selection strategy ○ probability distribution based on which a recommendation is selected ● For a bandit instance at round t, we are given features with ● If we choose arm k, we observe linear reward where ● If vectorial mean feedback for each arm is known: ○ Find optimal arm via full sweep
  74. 74. Proposed: Multi-Objective Contextual Bandits via GGI ● Goal: Find an arm selection strategy ○ probability distribution based on which a recommendation is selected ● For a bandit instance at round t, we are given features with ● If we choose arm k, we observe linear reward where ● If vectorial mean feedback for each arm is known: ○ Find optimal arm via full sweep ● But it's not known; it's context dependent ○ Optimal policy given by:
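The formulas on these slides were shown as images; the following is a hedged reconstruction of the standard setup they describe (the notation is ours, not necessarily the paper's).

```latex
% Round t: one d-dimensional context per arm, K arms, D objectives
x_{t,1}, \dots, x_{t,K} \in \mathbb{R}^{d}
% Pulling arm k yields a D-dimensional noisy linear reward
r_{t,k} = \Theta^{\top} x_{t,k} + \varepsilon_{t,k}, \qquad \Theta \in \mathbb{R}^{d \times D}
% If the mean reward vectors \mu_k = \Theta^{\top} x_{t,k} were known,
% the optimal mixed policy would maximize the GGI of the expected reward:
\alpha^{*} = \arg\max_{\alpha \in \Delta_K} \; G_{w}\Big( \sum_{k=1}^{K} \alpha_k \, \mu_k \Big)
```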
  75. 75. Proposed Multi-Objective Model Problem setup: ➔ K = Number of arms ➔ D = Number of objectives ➔ Robustness of the algorithm ➔ Ridge regression regularisation
  76. 76. Proposed Multi-Objective Model Params initialisation: ➔ Uniform strategy ➔ Auxiliary matrices for analytical solution to ridge regression
  77. 77. Proposed Multi-Objective Model Linear realizability: ➔ Observe all contexts ➔ Estimate mean rewards ◆ via ℓ2-regularised least-squares ridge regression
  78. 78. Proposed Multi-Objective Model Online Gradient Descent: ➔ Non-vanishing step size ➔ Project a[t] back onto A
  79. 79. Proposed Multi-Objective Model Action and Update: - Sample arm kt based on the distribution a[t] - Observe reward from user - Update the model
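Putting slides 75-79 together, a minimal end-to-end sketch; the class and function names, the simplex-projection helper, and the subgradient form are our assumptions, chosen to be consistent with the steps above rather than copied from the paper.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex ("project a[t] back onto A")
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1.0)[0][-1]
    return np.maximum(v - (css[rho] - 1.0) / (rho + 1), 0.0)

class MOGGIBandit:
    def __init__(self, K, D, d, lam=1.0, eta=0.05):
        self.K, self.D, self.d = K, D, d       # arms, objectives, feature dim
        self.A = lam * np.eye(d)               # ridge Gram matrix
        self.B = np.zeros((D, d))              # per-objective moment vectors
        self.a = np.full(K, 1.0 / K)           # mixed strategy (uniform init)
        self.eta = eta                         # non-vanishing OGD step size
        self.w = 2.0 ** -np.arange(D)          # non-increasing GGI weights

    def step(self, X):
        # X: (K, d) observed contexts; ridge estimate of mean reward vectors
        theta = np.linalg.solve(self.A, self.B.T)        # (d, D)
        R_hat = X @ theta                                # (K, D)
        order = np.argsort(self.a @ R_hat)               # objectives, worst first
        grad = R_hat[:, order] @ self.w                  # GGI supergradient w.r.t. a
        self.a = project_simplex(self.a + self.eta * grad)
        return int(np.random.choice(self.K, p=self.a))   # sample arm k_t

    def update(self, x, r):
        # x: (d,) context of the chosen arm; r: (D,) observed reward vector
        self.A += np.outer(x, x)
        self.B += np.outer(r, x)
```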
  80. 80. Is it going to work?
  81. 81. Is it going to work? ● Theoretically: Is the regret bounded? ● Regret bounds in past papers: ○ ICML 2017: Provably Optimal Algorithms for Generalized Linear Contextual Bandits ○ ICML 2013: Thompson Sampling for Contextual Bandits with Linear Payoffs ○ NIPS 2011: Improved Algorithms for Linear Stochastic Bandits ○ AISTATS 2011: Contextual Bandits with Linear Payoff Functions ● We derive the regret bounds for multi-objective contextual bandits
  82. 82. Overall regret bounded by: - Sublinear in T (i.e. the number of rounds) - Increases with the robustness parameter
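The bound itself appeared as an image on the slide. By analogy with the single-objective linear-bandit results cited above (e.g. the Õ(d√T) bound of NIPS 2011), a plausible shape is sketched below; this is our reconstruction, not the paper's exact statement.

```latex
% Assumed shape, consistent with "sublinear in T" on the slide:
R_{\mathrm{GGI}}(T) \;=\; \tilde{O}\!\big( d \sqrt{T} \big)
% with the hidden constant growing with the robustness parameter
```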
  83. 83. Exciting Offline Results
  84. 84. Experiments I: Multi- vs Single- Objectives Use-case: all objectives are user interaction based metrics (no competing business objective yet) - Clicks - Stream time - Business streams - Total number of songs played
  85. 85. Experiments I: Multi- vs Single- Objectives Use-case: all objectives are user interaction based metrics - Clicks - Stream time - Business streams - Total number of songs played ● Optimizing for different objectives impacts other objectives ○ If you want more clicks, optimize for clicks
  86. 86. Experiments I: Multi- vs Single- Objectives Use-case: all objectives are user interaction based metrics - Clicks - Stream time - Business streams - Total number of songs played ● Optimizing for different objectives impacts other objectives ○ If you want more clicks, optimize for clicks ● Multi-objective model performs much better
  87. 87. Experiments I: Multi- vs Single- Objectives
  88. 88. Experiments I: Multi- vs Single- Objectives Use-case: all objectives are user interaction based metrics - Clicks - Stream time - Business streams - Total number of songs played ● Optimizing for different objectives impacts other objectives ○ If you want more clicks, optimize for clicks ● Multi-objective model performs much better Optimizing for multiple interaction metrics performs better for each metric than directly optimizing that metric
  89. 89. Experiments II: Add Competing Objective ● Competing objectives: ○ User interaction objectives: clicks, streams, no. of songs played, stream length ○ Add: a business objective, (say) gender exposure ● Significant gains in business objective
  90. 90. Experiments II: Add Competing Objective ● Competing objectives: ○ User interaction objectives: clicks, streams, no. of songs played, stream length ○ Add: a business objective, (say) gender exposure ● Significant gains in business objective … without loss in user centric metrics
  91. 91. Experiments II: Add Competing Objective ● Competing objectives: ○ User interaction objectives: clicks, streams, no. of songs played, stream length ○ Add: a business objective, (say) gender exposure ● Significant gains in business objective … without loss in user centric metrics Not necessarily a Zero-Sum Game … perhaps we “can” get gains in business objectives without loss in user centric objectives
  92. 92. Experiments III: Ways of doing Multi-Objective ● Naive multi-objective doesn't work! ● Proposed multi-objective model performs better than: ○ ε-greedy multi-objective
  93. 93. Experiments III: Ways of doing Multi-Objective ● Naive multi-objective doesn't work! ● Proposed multi-objective model performs better than: ○ ε-greedy multi-objective How we do multi-objective ML matters a lot!
  94. 94. Summary: Phase III Multi-objective Models for Marketplaces - Optimizing for multiple interaction metrics performs better for each metric than directly optimizing that metric - Not necessarily a zero-sum game: perhaps we "can" get gains in business objectives without loss in user centric objectives - How we do multi-objective ML matters
  95. 95. Today’s Talk Phase I: User-centric RecSys (Bandit: Explore, Exploit, Explain) Phase II: Inject one competing objective (Relevance vs Fairness) Phase III: Multi-stakeholder Bandits User centric Multi- Stakeholder
  96. 96. Thank you! Rishabh Mehrotra Research Scientist, Spotify Research London, UK rishabhm@spotify.com
