
Metrics, Engagement & Personalization

These are the slides of the keynote I gave at UMAP 2019 (User Modeling, Adaptation and Personalization), held in Larnaca in June 2019. The theme of the conference this year was "Making Personalization Transparent: Giving Control Back To The User". My talk focused on the first part of this theme.

When users interact with the recommendations served to them, they leave behind fine-grained traces of interaction patterns, which can be leveraged to predict how satisfying their experience was. This talk will present various works and personal thoughts on how to measure user engagement. It will discuss the definition and development of metrics of user satisfaction that can be used as a proxy of user engagement, and will include cases of good, bad and ugly scenarios. An important message is that, to make personalization transparent, it is important to consider the heterogeneity of both users and content to formalise the notion of satisfaction, and in turn to design the appropriate satisfaction metrics to capture it. One way to do this is to consider the following angles: 1) Understanding intents; 2) Optimizing for the right metric; 3) Acting on segmentation; and 4) Thinking about diversity.


  1. 1. Metrics, Engagement & Personalization Mounia Lalmas
  2. 2. A little bit about me. 1998 to 2008: from Lecturer to Professor at Queen Mary University of London, working on information retrieval, structured document retrieval and running INEX, a worldwide evaluation framework for XML retrieval. 2008 to 2010: Microsoft Research/Royal Academy of Engineering Research Professor at the University of Glasgow, working on quantum-inspired models of interactive information retrieval. 2011 to 2013: Visiting Principal Research Scientist at Yahoo Labs Barcelona, working on user engagement in search, social media and news. 2013 to 2017: Director of Research at Yahoo (now Verizon Media) London, working on advertising sciences. 2017 to now: Director of Research & Head of Tech Research in Personalization at Spotify London, working on personalisation and discovery. J Lehmann, M Lalmas, E Yom-Tov & G Dupret. Models of User Engagement. UMAP 2012, Montreal, 16-20 July 2012.
  3. 3. PZN Offsite 2019 Outline About How
  4. 4. Outline: About, User engagement, Metrics, Interpretations
  5. 5. Outline: About, User engagement, Metrics, Interpretations
  6. 6. What is user engagement? User engagement is the quality of the user experience that emphasizes the positive aspects of interaction, in particular the desire to use the technology longer and more often. S Attfield, G Kazai, M Lalmas & B Piwowarski. Towards a science of user engagement (Position Paper). WSDM Workshop on User Modelling for Web Applications, 2011.
  7. 7. Why is it important to engage users? Users have increasingly enhanced expectations about their interactions with technology … resulting in increased competition amongst the providers of online services. utilitarian factors (e.g. usability) → hedonic and experiential factors of interaction (e.g. fun, fulfillment) → user engagement M Lalmas, H O’Brien and E Yom-Tov. Measuring user engagement. Morgan & Claypool Publishers, 2014.
  8. 8. The engagement life cycle Point of engagement Period of engagement Disengagement Re-engagement How engagement starts (Acquisition & Activation) Aesthetics & novelty in sync with user interests & contexts. Ability to maintain user attention and interests Main part of engagement and the focus of this talk. Loss of interests leads to passive usage & even stopping usage Identifying users that are likely to churn often undertaken. Engage again after becoming disengaged Triggered by relevance, novelty, convenience, remembering past positive experience sometimes as result of campaign strategy.
  9. 9. New Users Acquisition Active Users Activation Disengagement Dormant Users Churn Disengagement Re-engagement Period of engagement relates to user behaviour with the product during a session and across sessions. The engagement life cycle
  10. 10. The engagement life cycle: the quality of the user experience during and across sessions. People remember satisfactory experiences and want to repeat them. We need metrics to quantify the quality of the user experience → metrics of satisfaction.
  11. 11. Outline: About, User engagement, Metrics, Interpretations
  12. 12. Measures, metrics & KPIs. Measurement: the process of obtaining one or more quantity values that can reasonably be attributed to a quantity. Measure: a number derived from taking a measurement, e.g. number of clicks. Metric: in contrast, a calculation over measures, e.g. click-through rate. Key performance indicator (KPI): a quantifiable measure demonstrating how effectively key business objectives are being achieved, e.g. conversion rate. A measure can be used as a metric but not all metrics are measures; a KPI is a metric but not all metrics are KPIs.
  13. 13. Three levels of metrics: 1. Business metrics (KPIs); 2. Behavioral metrics (online metrics); 3. Optimization metrics (objective metrics to train personalization algorithms).
  14. 14. Optimization metrics quantify how users engage within a session and act as a proxy of satisfaction: follow, post, percentage completion, dwell time, abandonment rate, click to stream, impression to click, long click, save.
  15. 15. Why several metrics? Games Users spend much time per visit. Search Users come frequently but do not stay long. Social media Users come frequently & stay long. Niche Users come on average once a week. News Users come periodically. Service Users visit site when needed.
  16. 16. Why several metrics? Playlists fall into quadrants (leaning in vs leaning back, active vs occupied), each with its own metrics. Pure discovery sets, trending tracks, Fresh Finds → downstreams, artist discoveries, # or % of tracks sampled. Sleep, chill at home, ambient sounds → session time. Workout, study, gaming → session time, skip rate. Hits flagships, decades, moods → skip rate, downstreams.
  17. 17. Outline: About, User engagement, Metrics, Interpretations
  18. 18. Click. The bad: what is the value of a click?
  19. 19. Click-through rate = the ratio of users who click on a specific link to the total number of users who view a page, email, advertisement, etc. The most used optimization metric.
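As a minimal illustration of the measure vs metric distinction above (function name is mine, not from the talk): clicks and impressions are measures; click-through rate is the metric calculated from them.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR: clicks (a measure) divided by impressions (a measure).
    Returns 0.0 when there are no impressions to avoid division by zero."""
    if impressions == 0:
        return 0.0
    return clicks / impressions

# e.g. 30 clicks over 1200 impressions gives a CTR of 0.025 (2.5%)
rate = click_through_rate(30, 1200)
```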
  20. 20. interest-specific search media (periodic) e-commerce media (daily) J Lehmann, M Lalmas, E Yom-Tov & G Dupret. Models of User Engagement. UMAP 2012. Type of engagement depends on the structure of the site and content.
  21. 21. Abandonment in search = no click on the search result page. User is dissatisfied → bad abandonment. User found result(s) on the search result page → good abandonment. Cursor trail length: total distance (pixels) traveled by the cursor on the search result page; shorter for good abandonment. Movement time: total time (seconds) the cursor moved on the search result page; longer when answers are in the snippet (good abandonment). Cursor speed: average cursor speed (pixels/second); slower when answers are in the snippet (good abandonment). J Huang, R White & S Dumais. No clicks, no problem: using cursor movements to understand and improve search. CHI 2011.
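A minimal sketch of how the three cursor signals above could be computed from a raw cursor trace (the function name and the (x, y, t) trace format are assumptions, not from the cited paper):

```python
import math

def cursor_features(trace):
    """Compute trail length, movement time and average speed from a
    cursor trace given as (x_pixels, y_pixels, t_seconds) samples."""
    trail_length = 0.0   # total pixel distance traveled
    movement_time = 0.0  # total seconds during which the cursor moved
    for (x0, y0, t0), (x1, y1, t1) in zip(trace, trace[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)
        if dist > 0:  # only count intervals where the cursor moved
            trail_length += dist
            movement_time += t1 - t0
    speed = trail_length / movement_time if movement_time > 0 else 0.0
    return trail_length, movement_time, speed
```

Signals like these feed a classifier that separates good from bad abandonment.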
  22. 22. Dwell time is a better proxy for user interest in a news article than clicks. An efficient way to reduce click-bait. Optimizing for dwell time led to an increase in click-through rates. X Yi, L Hong, E Zhong, N Nan Liu & S Rajan. Beyond Clicks: Dwell Time for Personalization. RecSys 2014. H Lu, M Zhang, W Ma, Y Shao, Y Liu & S Ma. Quality Effects on User Preferences and Behaviors in Mobile News Streaming User Modeling. WWW 2019.
  23. 23. Accidental clicks do not reflect the post-click experience. [Figure: dwell time distributions of apps X and Y for a given ad, each with a distinct peak.] G Tolomei, M Lalmas, A Farahat & A Haines. Data-driven identification of accidental clicks on mobile ads with applications to advertiser cost discounting and click-through rate prediction. Journal of Data Science and Analytics, 2018.
  24. 24. A Turpin & F Scholer. User performance versus precision measures for simple search tasks. SIGIR 2006. Similar time taken to find first relevant document whatever the number of retrieved relevant documents.
  25. 25. Dwell time. The bad: what does spending time really mean?
  26. 26. Dwell time = contiguous time spent on a site or web page. Dwell time varies by site type. Dwell time has a relatively large variance even for the same site. average and variance of dwell time of 50 sites E Yom-Tov, M Lalmas, R Baeza-Yates, G Dupret, J Lehmann & P Donmez. Measuring Inter-Site Engagement. BigData 2013.
  27. 27. Reading cursor heatmap of relevant document vs scanning cursor heatmap of non-relevant document Q Guo & E Agichtein. Beyond dwell time: estimating document relevance from cursor movements and other post-click searcher behavior. WWW 2012.
  28. 28. Reading a relevant long document vs scanning a long non-relevant document Q Guo & E Agichtein. Beyond dwell time: estimating document relevance from cursor movements and other post-click searcher behavior. WWW 2012.
  29. 29. Dwell time used as a proxy of landing page quality. Dwell time on non-mobile-optimized landing pages was comparable to, and even higher than, on mobile-optimized ones. M Lalmas, J Lehmann, G Shaked, F Silvestri & G Tolomei. Promoting Positive Post-click Experience for In-Stream Yahoo Gemini Users. KDD Industry track 2015.
  30. 30. Bias. The ugly: who and what do we amplify? Optimizing for engagement alone has consequences.
  31. 31. (unfair) computational bias = discrimination that is systemic and unfair in favoring certain individuals or groups over others in an algorithmic system. data bias = a systemic distortion in the data that compromises its representativeness. B Friedman & H Nissenbaum. Bias in computer systems. TOIS 1996. A Olteanu, E Kıcıman, C Castillo & F Diaz. A Critical Review of Online Social Data: Limitations, Ethical Challenges, and Current Solutions. Tutorial @ KDD 2017.
  32. 32. Harms of allocation withhold opportunity or resources. Harms of representation reinforce subordination along the lines of identity, stereotypes. K Crawford. The Trouble With Bias. Keynote N(eur)IPS 2017. Under-serving
  33. 33. Conditions compared: top most popular tweets vs top most popular tweets + geographically diverse. Being from a central or peripheral location makes a difference. Peripheral users did not perceive the timeline as being diverse. E Graells, M Lalmas & R Baeza-Yates. Encouraging Diversity- and Representation-Awareness in Geographically Centralized Content. IUI 2016.
  34. 34. How. The good: Understanding intents; Optimizing for the right metric; Acting on segmentation; Thinking about diversity.
  35. 35. Understanding intents
  36. 36. Home. Three intent models. Passively listening: quickly access playlists or saved music; play music matching a mood or activity; find music to play in the background. Actively engaging: discover new music to listen to now; save new music or follow new playlists for later; explore artists or albums more deeply. Intent is important to interpret user interaction. Considering intent and learning across intents improves the ability to infer user satisfaction by 20%. R Mehrotra, M Lalmas, D Kenney, T Lim-Meng & G Hashemian. Jointly Leveraging Intent and Interaction Signals to Predict User Satisfaction with Slate Recommendations. WWW 2019.
  37. 37. Search. INTENT: what users want to do. MINDSET: how users think about results. Understanding intent helps understand users' perceptions of success in search. Success rate is more sensitive than click-through rate. P Ravichandran, J Garcia-Gathright, C Hosey, B St. Thomas & J Thom. Developing Evaluation Metrics for Instant Search Using Mixed Methods. SIGIR 2019. A Li, J Thom, P Ravichandran, C Hosey, B St. Thomas & J Garcia-Gathright. Search Mindsets: Understanding Focused and Non-Focused Information Seeking in Music Search. WWW 2019. C Hosey, L Vujović, B St. Thomas, J Garcia-Gathright & J Thom. Just Give Me What I Want: How People Use and Evaluate Music Search. CHI 2019.
  38. 38. Understanding intent is hard: user research, quantitative research, intent models, intent-aware optimisation, common intents. It is important to consider user intent to predict satisfaction, define the optimization metric, or interpret a metric. N Su, J He, Y Liu, M Zhang & S Ma. User Intent, Behaviour, and Perceived Satisfaction in Product Search. WSDM 2018. J Cheng, C Lo & J Leskovec. Predicting Intent Using Activity Logs: How Goal Specificity and Temporal Range Affect User Behavior. WWW 2017.
  39. 39. Optimizing for the right metric
  40. 40. Home. Using playlist consumption time to inform the metric to optimize for: the Spotify Home reward function uses the mean consumption time, co-clustered by user group x playlist type. Optimizing for mean consumption time led to +22.24% in predicted stream rate. Defining the reward per user x playlist cluster led to a further +13%. P Dragone, R Mehrotra & M Lalmas. Deriving User- and Content-specific Rewards for Contextual Bandits. WWW 2019.
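A toy sketch of the per-cluster idea above (the actual reward in Dragone et al. differs; function names, the binary reward, and the log format are assumptions): estimate a mean consumption time per user-group x playlist-type cell, then score a new stream against its own cell rather than a global mean.

```python
from collections import defaultdict

def fit_cluster_means(logs):
    """logs: iterable of (user_group, playlist_type, consumption_time).
    Returns the mean consumption time per (user group x playlist type)
    cell, a stand-in for the co-clustered reward baseline."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, ptype, time in logs:
        totals[(group, ptype)] += time
        counts[(group, ptype)] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}

def reward(user_group, playlist_type, consumption_time, means):
    """Illustrative binary reward: did this stream reach its own
    cluster's mean consumption time?"""
    return 1.0 if consumption_time >= means[(user_group, playlist_type)] else 0.0
```

The point of the sketch: a "long" stream for a sleep playlist and for a discovery playlist are judged against different baselines.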
  41. 41. Landing page. Dwell time is the time until the user returns to the publisher, used as a proxy of the quality of the landing page. A positive post-click experience ("long" clicks) has an effect on users clicking on ads again. M Lalmas, J Lehmann, G Shaked, F Silvestri & G Tolomei. Promoting Positive Post-click Experience for In-Stream Yahoo Gemini Users. KDD Industry track 2015.
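A "long" click is usually operationalized with a dwell-time threshold; a minimal sketch (the 30-second default is an illustrative assumption, not a value from the cited paper):

```python
def is_long_click(dwell_seconds: float, threshold_seconds: float = 30.0) -> bool:
    """A click counts as 'long' when the user dwells on the landing
    page at least `threshold_seconds` before returning to the
    publisher. The 30s default is illustrative only."""
    return dwell_seconds >= threshold_seconds
```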
  42. 42. Choosing the metric is important: the personalization algorithm will be very good at optimizing for the chosen metric. Qualitative research, correlation vs causation, interaction, contribution, involvement. X Yi, L Hong, E Zhong, N Nan Liu & S Rajan. Beyond clicks: dwell time for personalization. RecSys 2014. M Lalmas, H O'Brien & E Yom-Tov. Measuring user engagement. Morgan & Claypool Publishers, 2014. J Lehmann, M Lalmas, E Yom-Tov & G Dupret. Models of User Engagement. UMAP 2012.
  43. 43. Acting on segmentation
  44. 44. Genre diversity: a measure of user listening diversity (specialist vs generalist). Listening diversity = number of genres liked in the past x months. Liking a genre = having affinity for at least y artists in that genre. By segmenting users into specialists vs generalists, we observed different retention behaviours. Paper in preparation. I Waller & A Anderson. Generalists and Specialists: Using Community Embeddings to Quantify Activity Diversity in Online Platforms. WWW 2019.
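The diversity definition above can be sketched directly (the function name and input format are mine; the window x and the threshold y are left as parameters, as on the slide):

```python
from collections import defaultdict

def listening_diversity(plays, min_artists_per_genre=2):
    """plays: iterable of (artist, genre) pairs a user listened to in
    the chosen window. A genre counts as 'liked' when the user has
    affinity for at least `min_artists_per_genre` distinct artists in
    it; the returned score is the number of liked genres."""
    artists_by_genre = defaultdict(set)
    for artist, genre in plays:
        artists_by_genre[genre].add(artist)
    return sum(1 for artiststs in ()) if False else sum(
        1 for artists in artists_by_genre.values()
        if len(artists) >= min_artists_per_genre)
```

Low scores flag specialists, high scores generalists, which the slide reports correlates with different retention behaviours.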
  45. 45. Landing page quality. Users tend to spend more time on finance ads than on beauty ads. Optimizing for dwell time must account for the type of content. N Barbieri, F Silvestri & M Lalmas. Improving Post-Click User Engagement on Native Ads via Survival Analysis. WWW 2016.
  46. 46. Optimizing for segmentation: who? what? when? where? why? Segmentation helps personalization algorithms perform for users and contents across the spectrum. J Yan, W Chu & R White. Cohort modeling for enhanced personalized search. SIGIR 2014. S Goel, A Broder, E Gabrilovich & B Pang. Anatomy of the long tail: ordinary people with extraordinary tastes. WSDM 2010. R White, S Dumais & J Teevan. Characterizing the influence of domain expertise on web search behavior. WSDM 2009.
  47. 47. Thinking about diversity
  48. 48. Satisfaction. Optimizing for multiple satisfaction objectives together performs better than single-metric optimization. Satisfaction metrics include clicks, stream time, number of songs played, etc. The model learns more relevant patterns of user satisfaction with more optimization metrics. Paper under review.
  49. 49. R Mehrotra, J McInerney, H Bouchard, M Lalmas & F Diaz. Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems. CIKM 2018. Playlist is deemed fair if it contains tracks from artists with different popularity groups. Very few sets have both high relevance & high fairness. “Fairness” Relevance Popularity Gains in fairness possible without severe loss of satisfaction. Adaptive policies aware of user receptiveness perform better.
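The fairness notion above can be sketched with a simple proxy (the function, the bucket count, and the score itself are illustrative assumptions; the paper's actual fairness measure differs): a playlist scores higher when its tracks span more artist-popularity groups.

```python
def popularity_fairness(track_popularity_groups, n_buckets=10):
    """Fraction of distinct artist-popularity buckets represented in a
    playlist. `track_popularity_groups` lists each track's artist
    popularity bucket (e.g. ints 0 to n_buckets-1). A playlist drawing
    only on head artists covers few buckets and scores low."""
    return len(set(track_popularity_groups)) / n_buckets
```

Such a score can then be traded off against relevance, as in the cited counterfactual evaluation.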
  50. 50. When thinking diversity, personalization algorithms become informed about what and who they serve. H Cramer, J Wortman-Vaughan, K Holstein, H Wallach, H Daume, M Dudík, S Reddy & J Garcia-Gathright. Algorithmic bias in practice. FAT* Industry Translation Tutorial, 2019. P Shah, A Soni & T Chevalier. Online Ranking with Constraints: A Primal-Dual Algorithm and Applications to Web Traffic-Shaping. KDD 2017. D Agarwal & S Chatterjee. Constrained optimization for homepage relevance. WWW 2015. algorithmic bias data bias over-index awarenessunder-index Understanding diversity
  51. 51. One more thing. The very good.
  52. 52. Making personalization work: qualitative & quantitative research; KPIs & business metrics; algorithms; training & datasets; optimization metrics; evaluation (offline & online); measurement & signals; features (item, user, context); bias.
  53. 53. Intra-session engagement measures user engagement during a session. Inter-session engagement measures user engagement across sessions (next day, next week, next month, etc.) and relates to KPIs and business metrics.
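The intra/inter split presupposes a sessionization step; a minimal sketch (the 30-minute inactivity gap is a common but assumed choice, not from the talk):

```python
from datetime import timedelta

def sessionize(events, gap_minutes=30):
    """Group a user's event timestamps into sessions: a new session
    starts after an inactivity gap longer than `gap_minutes`.
    Intra-session metrics are computed within each group; inter-session
    metrics (e.g. number of sessions, days active) across groups."""
    sessions = []
    gap = timedelta(minutes=gap_minutes)
    for t in sorted(events):
        if sessions and t - sessions[-1][-1] <= gap:
            sessions[-1].append(t)   # continues the current session
        else:
            sessions.append([t])     # starts a new session
    return sessions
```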
  54. 54. Why inter-session metrics? Intra-session measures can easily mislead, especially over a short time period. R Kohavi, A Deng, B Frasca, R Longbotham, T Walker & Y Xu. Trustworthy online controlled experiments: Five puzzling outcomes explained. KDD 2012.
  55. 55. We do not optimize for inter-session metrics. Correlation vs causation: inter-session metrics ● do not capture how users engage during a session ● may not deliver much, if anything, in terms of improving personalization algorithms ● it is not yet clear how personalization algorithms can learn from them. Instead, we optimize (and monitor) intra-session metrics, but those that move inter-session metrics. L Hong & M Lalmas. Tutorial on Online User Engagement: Metrics and Optimization. WWW 2019. H Hohnhold, D O'Brien & D Tang. Focusing on the Long-term: It's Good for Users and Business. KDD 2015. G Dupret & M Lalmas. Absence time and user engagement: Evaluating Ranking Functions. WSDM 2013.
  56. 56. Let us recap Making personalization transparent.
  57. 57. Understanding intents; Optimizing for the right metric; Acting on segmentation; Thinking about diversity. User intents help inform metric optimization and metric interpretation. Segmentation helps adapt personalization models. Diversity, intents and segmentation help make personalization transparent.
  58. 58. Thank you