Today, I had the great honor of giving the opening keynote at the 8th AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2020), held virtually this year. HCOMP is the home of the human computation and crowdsourcing community, which works on frameworks, methods and systems that bring together people and machine intelligence to achieve better results. I decided to completely revamp a previous talk to focus on the so-called "human in the loop", showing how we incorporate humans in the loop to personalise at scale, drawing on some of our research at Spotify. I am sharing the slides here for general interest.
2. Spotify’s mission is to unlock the potential of human creativity — by giving a million creative artists the opportunity to live off their art and billions of fans the opportunity to enjoy and be inspired by it.
5. humancomputation.com
“HCOMP is the home of the human computation and crowdsourcing community. It’s the premier venue for presenting latest findings from research and practice into frameworks, methods and systems that bring together people and machine intelligence to achieve better results.”
The role of “human in the loop”
when personalising at scale.
Part 1: About putting humans in the loop
Part 2: How Spotify puts these ideas into practice
9. What is user engagement?
“User engagement is the quality of the user experience that emphasizes the positive aspects of interaction – in particular the fact of wanting to use the technology longer and often.”
S Attfield, G Kazai, M Lalmas & B Piwowarski. Towards a Science of User Engagement (Position Paper). WSDM Workshop on User Modelling for Web Applications, 2011.
10. The engagement life cycle
Point of engagement: how engagement starts (acquisition & activation); aesthetics & novelty in sync with user interests & contexts.
Period of engagement: the ability to maintain user attention and interest; often the focus of personalisation algorithms, and the focus of this talk.
Disengagement: loss of interest leads to passive usage or even stopping usage; identifying users that are likely to churn is often undertaken here.
Re-engagement: engaging again after becoming disengaged; triggered by relevance, novelty, convenience or remembering past positive experiences, sometimes as the result of a campaign strategy.
16. Why several metrics?
Games: users visit infrequently but stay a long time.
Search: users visit frequently but do not stay for long.
Social media: users visit frequently and stay a long time.
Niche: users visit once a week.
News: users visit periodically, e.g. morning and evening.
Service: users visit a site when needed.
17. Why several metrics for Spotify?
Playlists differ in listening patterns.
Search has a particular engagement pattern.
Engagement varies by media type and freshness.
Home has its own “star” engagement pattern.
18. Why several metrics for Spotify playlists?
Leaning in. Playlist types: Pure discovery sets, Trending tracks, Fresh Finds. Playlist metrics: Downstreams, Artist discoveries, # or % of tracks sampled.
Leaning back. Playlist types: Sleep, Chill at home, Ambient sounds. Playlist metric: Session time.
Occupied. Playlist types: Workout, Study, Gaming. Playlist metrics: Session time, Skip rate.
Active. Playlist types: Hits flagships, Decades, Moods. Playlist metrics: Skip rate, Downstreams.
20. What is human in the loop?
For the purpose of this talk
“Human-in-the-loop or HITL is defined as a model that requires human interaction.”
Wikipedia
“Human-in-the-loop (HITL) is a branch of artificial intelligence that leverages both human and
machine intelligence to create machine learning models.”
Appen, January 15, 2019
Some thinking around designing AI systems with humans in the loop:
● L von Ahn & L Dabbish. Designing Games with a Purpose. Communications of the ACM 2008.
● S Amershi, D Weld, M Vorvoreanu, A Fourney, B Nushi, P Collisson, J Suh, S Iqbal, PN Bennett, K Inkpen, J Teevan, R Kikin-Gil & E Horvitz. Guidelines for Human-AI Interaction. CHI 2019.
● G Bansal, B Nushi, E Kamar, WS Lasecki, DS Weld & E Horvitz. Beyond accuracy: The role of mental models in human-AI team performance. HCOMP 2019.
22. Understanding intents
What do users want on Home?
What do users want from Search?
Optimizing for the right metric
How are users satisfied with Search?
How are users satisfied with Playlists?
Acting on segmentation
What music do users listen to?
How do users listen to music?
Thinking about diversity
How to consider diversity in user satisfaction?
How to consider diversity in content?
How to incorporate human in the loop
to personalise at scale?
Some answers through the lens of our research at Spotify
23. Part II: How
Understanding intents
Optimising for the right metric
Acting on segmentation
Thinking about diversity
25. What do users want on Home?
[1] R Mehrotra, M Lalmas, D Kenney, T Lim-Meng & G Hashemian. Jointly Leveraging Intent and Interaction Signals to Predict User Satisfaction with Slate Recommendations. WWW 2019.
Knowing user intent on Home helps interpret users' implicit feedback.
Passively listening: quickly access playlists or saved music (2); play music matching mood or activity (4); find music to play in the background (6).
Actively engaging: discover new music to listen to now (3); to find X (5); save new music or follow new playlists for later (7); explore artists or albums more deeply (8).
Other: Home is the default screen (1).
Considering intent, and learning across intents, improves the ability to infer user satisfaction by 20%.
26. What do users want from Search?
[2] C Hosey, L Vujović, B St. Thomas, J Garcia-Gathright & J Thom. Just Give Me What I Want: How People Use and Evaluate Music Search. CHI 2019.
[3] A Li, J Thom, P Ravichandran, C Hosey, B St. Thomas & J Garcia-Gathright. Search Mindsets: Understanding Focused and Non-Focused Information Seeking in Music Search. WWW 2019.
Focused (one specific thing in mind): users either find it or not; the quickest and easiest path to results is important.
Open (a seed of an idea in mind): anything from nothing good enough, to good enough, to better than good enough; users are willing to try things out, but still want to fulfil their intent.
Exploratory (a path to explore): difficult for users to assess how it went, though they may be able to answer in relative terms; users expect to be active when in an exploratory mindset, and effort is expected.
How users think about results relates to how they use Spotify tabs.
27. Understanding intent is hard
It is important to consider user intent to predict satisfaction, define an optimisation metric, or interpret a metric.
K Shu, S Mukherjee, G Zheng, A Hassan Awadallah, M Shokouhi & S Dumais. Learning with Weak Supervision for Email Intent Detection. SIGIR 2020.
J Thom, A Nazarian, R Brillman, H Cramer & S Mennicken. "Play Music": User Motivations and Expectations for Non-Specific Voice Queries. ISMIR 2020.
N Martelaro, S Mennicken, J Thom, H Cramer & W Ju. Using Remote Controlled Speech Agents to Explore Music Experience in Context. DIS 2020.
N Su, J He, Y Liu, M Zhang & S Ma. User Intent, Behaviour, and Perceived Satisfaction in Product Search. WSDM 2018.
J Cheng, C Lo & J Leskovec. Predicting Intent Using Activity Logs: How Goal Specificity and Temporal Range Affect User Behavior. WWW 2017.
29. How are users satisfied with Search?
[4] P Chandar, J Garcia-Gathright, C Hosey, B St Thomas & J Thom. Developing Evaluation Metrics for Instant Search Using Mixed Methods. SIGIR 2019.
Users evaluate their search experience in terms of effort and success. A search session has three stages: Type (the user communicates with us), Consider (the user evaluates what we show them) and Decide (the user ends the search session). Effort depends on the user's mindset (focused, open, exploratory); success depends on the user's goal (listen, organize, share).
Success rate, a composite metric of all success-related behaviors, is more sensitive than click-through rate.
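To make that last point concrete, here is a toy sketch (this is not Spotify's actual pipeline, and the event names are made up) of why a composite success rate can be more sensitive than click-through rate: a session can succeed without a click, so the composite metric credits it while CTR does not.

```python
# Hypothetical event names; any success-related behaviour counts as success.
SUCCESS_EVENTS = {"play", "save", "follow", "add_to_playlist"}

def click_through_rate(sessions):
    # Fraction of sessions with at least one click.
    clicked = sum(1 for s in sessions if "click" in s["events"])
    return clicked / len(sessions)

def success_rate(sessions):
    # Fraction of sessions with at least one success-related behaviour.
    succeeded = sum(1 for s in sessions if SUCCESS_EVENTS & set(s["events"]))
    return succeeded / len(sessions)

sessions = [
    {"events": ["type", "click", "play"]},  # clicked and played
    {"events": ["type", "save"]},           # success without a click
    {"events": ["type"]},                   # abandoned
]
print(click_through_rate(sessions))  # 0.333...
print(success_rate(sessions))        # 0.666...
```

The second session is a satisfied user that CTR counts as a failure, which is exactly the sensitivity gap the composite metric closes.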
30. How are users satisfied with playlists?
[5] P Dragone, R Mehrotra & M Lalmas. Deriving User- and Content-specific Rewards for Contextual Bandits. WWW 2019.
Using playlist consumption time to inform the metric to optimise for playlist satisfaction on Home: the reward is the mean of consumption time, computed per co-cluster of user group x playlist type.
Optimizing for mean consumption time led to +22.24% in predicted stream rate; defining the metric per user x playlist cluster led to a further +13%.
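As an illustration of the idea in [5] (the clustering itself is assumed given, and the data below is made up, not Spotify's), computing the reward per (user group x playlist type) cell rather than globally is just a grouped mean:

```python
from collections import defaultdict

def cell_mean_consumption(logs):
    """logs: iterable of (user_group, playlist_type, consumption_seconds).

    Returns mean consumption time per (user group x playlist type) cell,
    which can serve as a per-cell reward for a contextual bandit.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum, count]
    for user_group, playlist_type, seconds in logs:
        cell = totals[(user_group, playlist_type)]
        cell[0] += seconds
        cell[1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}

logs = [
    ("focused", "workout", 900), ("focused", "workout", 1100),
    ("lean_back", "sleep", 3600),
]
rewards = cell_mean_consumption(logs)
print(rewards[("focused", "workout")])  # 1000.0
```

A global mean would blur sleep playlists (long sessions) together with workout playlists (shorter ones); per-cell rewards keep each context comparable to itself.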
31. Personalisation algorithms will be very good at
optimising for the chosen metric.
L Hong & M Lalmas. Tutorial on Online User Engagement: Metrics and Optimization. WWW 2019 & KDD 2020.
H Hohnhold, D O’Brien & D Tang. Focusing on the Long-term: It’s Good for Users and Business. KDD 2015.
G Dupret & M Lalmas. Absence time and user engagement: Evaluating Ranking Functions. WSDM 2013.
X Yi, L Hong, E Zhong, NN Liu & S Rajan. Beyond Clicks: Dwell Time for Personalization. RecSys 2014.
M Lalmas, H O’Brien & E Yom-Tov. Measuring user engagement. Morgan & Claypool Publishers, 2014.
J Lehmann, M Lalmas, E Yom-Tov and G Dupret. Models of User Engagement. UMAP 2012.
Choosing the metric is important
33. What music do users listen to?
[6] S Way, J Garcia-Gathright & H Cramer. Local Trends in Global Music Streaming. ICWSM 2020.
Despite access to a global catalog, countries are increasingly streaming their own, local music: local music is on the rise. Global music trade is strongly shaped by language and geography. We used gravity modeling to study how these relationships are changing over time, around the world.
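The gravity model mentioned above can be sketched as follows. This is the classic functional form from trade modelling (flows grow with the sizes of the two markets and decay with distance); the exponents and numbers here are illustrative placeholders, not the values fitted in [6].

```python
def gravity_flow(mass_i, mass_j, distance, alpha=1.0, beta=1.0, gamma=2.0):
    """Classic gravity model: flow ~ (m_i^alpha * m_j^beta) / d^gamma.

    mass_* could be market sizes (e.g. stream counts), distance a
    geographic or cultural distance. Exponents are illustrative.
    """
    return (mass_i ** alpha) * (mass_j ** beta) / (distance ** gamma)

# Two market pairs of equal size, one nearby and one far away:
near = gravity_flow(10.0, 5.0, 1.0)
far = gravity_flow(10.0, 5.0, 4.0)
print(near, far)  # 50.0 3.125
```

Fitting the exponents on observed streaming flows is what lets one track how the pull of language and geography changes over time.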
34. How do users listen to music?
[7] A Anderson, L Maystre, R Mehrotra, I Anderson & M Lalmas. Algorithmic Effects on the Diversity of Consumption on Spotify. WWW 2020.
The Generalist-Specialist (GS) score places each user on a spectrum from specialist to generalist. Generalists and specialists exhibit different retention and conversion behaviors: generalists churn less and convert more than specialists.
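A minimal sketch of a generalist-specialist style score, in the spirit of [7] but simplified (the toy vectors below stand in for learned track embeddings; this is not the paper's exact formulation): score a user by the mean cosine similarity between their streamed tracks and their own average taste vector. Specialists, whose tracks all resemble each other, score near 1; generalists score lower.

```python
import numpy as np

def gs_score(track_vectors):
    """Mean cosine similarity between each track and the user's centroid."""
    vectors = np.asarray(track_vectors, dtype=float)
    center = vectors.mean(axis=0)
    center = center / np.linalg.norm(center)
    # Normalise each track vector, then take similarity to the centroid.
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return float((normed @ center).mean())

# Toy 2-d "embeddings": a narrow taste vs. a scattered one.
specialist = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
generalist = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
print(gs_score(specialist) > gs_score(generalist))  # True
```

With a score like this in hand, users can be segmented along the spectrum and their retention and conversion compared across segments.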
35. Optimizing for segmentation
Segmentation helps personalisation algorithms perform for users and content across the spectrum.
C Hansen, C Hansen, L Maystre, R Mehrotra, B Brost, F Tomasi & M Lalmas. Contextual and Sequential User Embeddings for Large-Scale Music Recommendation. RecSys 2020.
A Epps-Darling, R Takeo & H Cramer. Female Creator Representation in Music Streaming. ISMIR 2020.
J Yan, W Chu & R White. Cohort Modeling for Enhanced Personalized Search. SIGIR 2014.
S Goel, A Broder, E Gabrilovich & B Pang. Anatomy of the Long Tail: Ordinary People with Extraordinary Tastes. WSDM 2010.
R White, S Dumais & J Teevan. Characterizing the Influence of Domain Expertise on Web Search Behavior. WSDM 2009.
37. How to consider diversity in user satisfaction?
[8] R Mehrotra, N Xue & M Lalmas. Bandit based Optimization of Multiple Objectives on a Music Streaming Platform. KDD 2020.
Optimizing for multiple satisfaction objectives together (clicks, streaming time, total number of tracks played) performs better than single-metric optimisation: optimising for multiple satisfaction metrics performs better for each metric than directly optimising that metric alone. With more optimisation metrics, the model learns more relevant patterns of user satisfaction, exploiting the positive correlation between objectives and taking a holistic view of the user experience.
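For intuition only: [8] uses a proper multi-objective bandit, but the simplest way to see how several objectives can drive one policy is a scalarised reward. The weights and normalisation scales below are made-up placeholders, not anything tuned at Spotify.

```python
def scalarised_reward(clicks, stream_seconds, tracks_played,
                      weights=(0.3, 0.5, 0.2),
                      scales=(1.0, 3600.0, 20.0)):
    """Combine the three objectives into one reward in [0, 1].

    Each objective is normalised by an illustrative scale and capped
    at 1, then mixed with illustrative weights that sum to 1.
    """
    objectives = (clicks, stream_seconds, tracks_played)
    normalised = [min(value / scale, 1.0) for value, scale in zip(objectives, scales)]
    return sum(w * n for w, n in zip(weights, normalised))

print(scalarised_reward(1, 1800, 10))  # 0.65
```

A fixed scalarisation bakes in one trade-off for all users; the appeal of the multi-objective formulation in [8] is precisely that it does not have to commit to a single weighting up front.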
38. How to consider diversity of content?
[9] C Hansen, R Mehrotra, C Hansen, B Brost, L Maystre & M Lalmas. Shifting Consumption towards Diverse Content on Music Streaming Platforms. WSDM 2021.
Personalisation algorithms need to explicitly optimise for content diversity. As personalisation algorithms increase in complexity, they improve satisfaction, but at the cost of content diversity. The choice of personalisation algorithm matters more when considering diversity than when considering satisfaction only.
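One common way to make the relevance-versus-diversity trade-off explicit (a hedged sketch in the spirit of [9], not the paper's actual method) is a greedy MMR-style re-ranker: each pick balances an item's relevance against its similarity to what has already been selected.

```python
import numpy as np

def cosine(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rerank(items, k, trade_off=0.7):
    """items: list of (item_id, relevance, embedding) tuples.

    Greedily picks k items, scoring each candidate by
    trade_off * relevance - (1 - trade_off) * max similarity to
    already-selected items (classic MMR).
    """
    selected, candidates = [], list(items)
    while candidates and len(selected) < k:
        def mmr(item):
            _, relevance, embedding = item
            if not selected:
                return relevance
            redundancy = max(cosine(embedding, chosen[2]) for chosen in selected)
            return trade_off * relevance - (1 - trade_off) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [item_id for item_id, _, _ in selected]

# Toy catalog: two near-duplicate pop playlists and one jazz playlist.
items = [("pop_hits", 0.95, [1, 0]), ("pop_more", 0.90, [1, 0]), ("jazz", 0.60, [0, 1])]
print(rerank(items, 2, trade_off=0.5))  # ['pop_hits', 'jazz']
```

With diversity in the objective, the second pop playlist loses its slot to the less relevant but novel jazz one; a relevance-only ranker would have served two near-duplicates.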
39. When thinking about diversity, personalisation
algorithms become informed about what and
who they serve.
M Abdool, M Haldar, P Ramanathan, T Sax, L Zhang, A Manasawala, S Yang, B Turnbull, Q Zhang & T Legrand. Managing Diversity in Airbnb Search. KDD 2020.
H Cramer, J Wortman-Vaughan, K Holstein, H Wallach, H Daume, M Dudík, S Reddy & J Garcia-Gathright. Algorithmic bias in practice. FAT* Industry Translation Tutorial, 2019.
H Steck. Calibrated recommendations. RecSys 2019.
P Shah, A Soni & T Chevalier. Online Ranking with Constraints: A Primal-Dual Algorithm and Applications to Web Traffic-Shaping. KDD 2017.
D Agarwal & S Chatterjee. Constrained optimization for homepage relevance. WWW 2015.
Understanding diversity
40. Let us recap how to incorporate human in the loop when personalising at scale.
41. Recap
1. Understanding intents
2. Optimising for the right metric
3. Acting on segmentation
4. Thinking about diversity
User intents help inform metric optimisation & interpretation. Segmentation helps adapt personalisation algorithms. Intent, segmentation & diversity help bring the human in the loop into our personalisation algorithms.