Recommending and
Searching
Research @ Spotify
Mounia Lalmas
May 2019
What we do at Spotify
Spotify’s mission is to
unlock the potential of
human creativity — by
giving a million creative
artists the opportunity
to live off their art and
billions of fans the
opportunity to enjoy
and be inspired by it.
Our team mission:
Match fans and artists in a personal and relevant way.
ARTISTS FANS
What does it
mean to match
fans and artists
in a personal
and relevant
way?
songs
playlists
podcasts
...
catalog
search
browse
talk
users
What does it mean to match fans and artists
in a personal and relevant way?
Artists
Fans
“We conclude that information retrieval and
information filtering are indeed two sides of
the same coin. They work together to help
people get the information needed to
perform their tasks.”
Information filtering and information retrieval: Two sides of the same coin? NJ Belkin & WB Croft,
Communications of the ACM, 1992.
“We can conclude that recommender
systems and search are also two sides of
the same coin at Spotify. They work
together to help fans get the music they will
enjoy listening to.”
PULL
PARADIGM
PUSH
PARADIGM
is this the case?
Home … the push paradigm
Home
Home is the default screen of the mobile app
for all Spotify users worldwide.
It surfaces the best of what Spotify has to
offer, for every situation, personalized
playlists, new releases, old favorites, and
undiscovered gems.
Help users find something they are going to
enjoy listening to, quickly.
BaRT → User → Streaming
Explore, Exploit, Explain: Personalizing Explainable Recommendations with Bandits. J McInerney, B Lacker, S Hansen, K Higley, H Bouchard, A Gruson & R Mehrotra, RecSys 2018.
BaRT: Machine learning algorithm for
Spotify Home
BaRT (Bandits for Recommendations as Treatments)
How to rank playlists (cards) in each shelf first, and then how to rank the shelves?
Explore vs Exploit
Flip a coin with a given probability of tails
If heads, pick the best card in M according to predicted reward r → EXPLOIT
If tails, pick a card from M at random → EXPLORE
BaRT: Multi-armed bandit algorithm for Home
https://hackernoon.com/reinforcement-learning-part-2-152fb510cc54
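The coin-flip step above is the classic epsilon-greedy policy. A minimal sketch (illustrative names, not Spotify's actual implementation; `predicted_reward` stands in for BaRT's learned reward model):

```python
import random

def choose_card(cards, predicted_reward, epsilon=0.1):
    """Epsilon-greedy selection: with probability epsilon pick a card
    at random (EXPLORE); otherwise pick the card with the highest
    predicted reward (EXPLOIT)."""
    if random.random() < epsilon:
        return random.choice(cards)          # EXPLORE
    return max(cards, key=predicted_reward)  # EXPLOIT
```

With `epsilon=0` the policy always exploits; with `epsilon=1` it always explores.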
Success is captured by the reward function
Reward
Binarised Streaming Time
BaRT → User → Streaming
success is when user
streams the playlist
for at least 30s
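The binarised streaming-time reward described above can be sketched in one line (the 30-second threshold is the one stated on the slide; the function name is illustrative):

```python
def reward(streaming_time_s, threshold_s=30):
    """Binarised streaming time: success (1) if the user streamed the
    recommended playlist for at least threshold_s seconds, else 0."""
    return 1 if streaming_time_s >= threshold_s else 0
```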
Is success the same for all playlists?
Consumption time of a sleep playlist is longer than average playlist consumption time.
Jazz listeners consume Jazz and other playlists for longer periods than the average user.
one reward function
for all users and all
playlists
success independent of
user and playlist
one reward function
per user x playlist
success depends on user and
playlist
too granular, sparse, noisy,
costly to generate & maintain
one reward function
per group of users x
playlists
success depends on group of
users listening to group of
playlists
Personalizing the reward function for BaRT
Co-clustering using streaming time
users
playlists
user
groups
playlist groups
Dhillon, Mallela & Modha, "Information-theoretic co-clustering”, KDD 2003.
group = cluster
group of user x playlist = co-cluster
user type playlist type
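Once users and playlists are grouped, one reward threshold can be derived per co-cluster. A pure-Python sketch of that idea, assuming the group assignments are already given (e.g. from the information-theoretic co-clustering cited above); all names are illustrative:

```python
def cocluster_thresholds(stream_time, user_groups, playlist_groups):
    """Given a users x playlists matrix of streaming times and a group
    label for each user (row) and playlist (column), compute one reward
    threshold per (user group, playlist group) co-cluster as the mean
    streaming time inside that block."""
    thresholds = {}
    for ug in set(user_groups):
        for pg in set(playlist_groups):
            block = [stream_time[i][j]
                     for i, gu in enumerate(user_groups) if gu == ug
                     for j, gp in enumerate(playlist_groups) if gp == pg]
            thresholds[(ug, pg)] = sum(block) / len(block)
    return thresholds
```

A stream then counts as a success when its time exceeds the threshold of its own co-cluster, so a sleep playlist and a pop playlist are judged on different scales.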
Deriving User- and Content-specific Rewards for
Contextual Bandits. P Dragone, R Mehrotra & M Lalmas.
WWW 2019.
Using playlist consumption time to inform the metric
to optimise for (Home reward function)
Optimizing for mean consumption time led to +22.24% in predicted
stream rate. Defining per user x playlist cluster led to further +13%.
mean of consumption time
co-clustering user group x
playlist type
Metrics
Affinity-type features are more important than
generic (age/day) features.
Jointly Leveraging Intent and Interaction Signals to
Predict User Satisfaction with Slate Recommendations.
R Mehrotra, M Lalmas, D Kenney, T Lim-Meng & G
Hashemian. WWW 2019.
Three intent models: intent is important to interpret
user interaction
Passively Listening
- quickly access playlists or saved music
- play music matching mood or activity
- find music to play in background
Actively Engaging
- discover new music to listen to now
- save new music or follow new playlists for later
- explore artists or albums more deeply
User intent
Machine learning across intents on Home is
better than intent-agnostic machine learning
Considering intent improves ability to infer user satisfaction by 20%.
Towards a Fair Marketplace: Counterfactual Evaluation of
the trade-off between Relevance, Fairness & Satisfaction
in Recommendation Systems. R Mehrotra, J McInerney,
H Bouchard, M Lalmas & F Diaz. CIKM 2018.
Playlist is deemed diverse if it
contains tracks from artists with
different popularity groups.
Very few sets have both high
relevance & high diversity.
Diversity Relevance
Diversity
A recommender system optimizing for relevance
may not have a high diversity estimate.
Gains in fairness possible without severe loss of satisfaction.
Adaptive policies aware of user receptiveness perform better.
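The diversity notion on this slide has a direct sketch: a playlist counts as diverse when its tracks span artists from more than one popularity group (the function name and the two-group cutoff are illustrative assumptions, not the paper's exact definition):

```python
def is_diverse(artist_popularity_groups, min_groups=2):
    """A playlist is deemed diverse if its tracks come from artists
    spanning at least min_groups distinct popularity groups."""
    return len(set(artist_popularity_groups)) >= min_groups
```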
Offline evaluation framework to launch,
evaluate and archive machine learning
studies, ensuring reproducibility and allowing
sharing across teams.
Offline Evaluation to Make Decisions About Playlist
Recommendation Algorithms. A Gruson, P Chandar, C
Charbuillet, J McInerney, S Hansen, D Tardieu & B
Carterette, WSDM 2019.
Offline
evaluation
Search … pull paradigm
Large catalog
40M+ songs, 3B+ playlists
2K+ microgenres
Many languages
79 markets
Different modalities
Typed, voice
Heterogeneous content
Music, podcast
Various granularities
Song, artist, playlist, podcast
Various goals
Focus, discover, lean-back, mood,
activity
Searching for music
Overview of the user journey in search
TYPE/TALK
User
communicates
with us
CONSIDER
User evaluates
what we show
them
DECIDE
User ends the
search session
INTENT
What the user
wants to do
MINDSET
How the user
thinks about
results
Search is instantaneous … at each keystroke
m my my_ my_f my_fav
s sa satt sat sati statis
Search is instantaneous
… the search logs for “satisfaction”
From prefix to query
→ What is the actual query?
→ What is a click vs prefix vs query?
prefix
query
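One simple way to answer "what is the actual query?" in an instant-search log is to treat the last prefix typed before a click as the query. A hedged sketch of that segmentation (event format and names are assumptions for illustration):

```python
def segment_session(events):
    """Segment an instant-search log into (query, clicked_item) pairs.
    Each event is ('prefix', text) or ('click', item); the query is
    taken to be the last prefix typed before each click."""
    pairs, last_prefix = [], None
    for kind, value in events:
        if kind == "prefix":
            last_prefix = value
        elif kind == "click" and last_prefix is not None:
            pairs.append((last_prefix, value))
            last_prefix = None
    return pairs
```

Trailing prefixes with no click are an abandoned (possibly unsatisfied) search and produce no pair.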
A user can approach any intent with any mindset
FOCUSED
One specific thing in mind
OPEN
A seed of an idea in mind
EXPLORATORY
A path to explore
LISTEN
Have a listening session
ORGANIZE
Curate for future listening
SHARE
Connect with friends
FACT CHECK
Find specific information
EXPLORATORY mindset seems rare and likely better served by other features such as Browse.
LISTEN and ORGANIZE are most prominent intents & associated with lean-back vs lean-in behavior.
FOCUSED
One specific thing in mind
OPEN
A seed of an idea in mind
EXPLORATORY
A path to explore
● Find it or not
● Quickest/easiest
path to results is
important
● From nothing good
enough, good enough
to better than good
enough
● Willing to try things out
● But still want to fulfil
their intent
● Difficult for users to
assess how it went
● May be able to answer
in relative terms
● Users expect to be
active when in an
exploratory mindset
● Effort is expected
User
mindsets
Just Give Me What I Want: How People Use and Evaluate
Music Search. C Hosey, L Vujović, B St. Thomas, J
Garcia-Gathright & J Thom, CHI 2019.
How the user thinks about results
Focused
mindset
Search Mindsets: Understanding Focused and
Non-Focused Information Seeking in Music Search. A Li, J
Thom, P Ravichandran, C Hosey, B St. Thomas & J
Garcia-Gathright, WWW 2019.
Understanding mindset helps us understand
search satisfaction.
65% of searches were focused.
When users search with a Focused Mindset
Put MORE effort in search.
Scroll down and click on lower rank results.
Click MORE on album/track/artist and LESS on
playlist.
MORE likely to save/add but LESS likely to
stream directly.
Developing Evaluation Metrics for Instant Search Using
Mixed Methods. P Ravichandran, J Garcia-Gathright, C
Hosey, B St. Thomas & J Thom. SIGIR 2019.
Success rate is more
sensitive than
click-through rate.
Metrics
Users evaluate their experience on search
based on two main factors: success and effort
TYPE
User communicates
with us
CONSIDER
User evaluates what
we show them
DECIDE
User ends the
search session
EFFORT SUCCESS
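The contrast between click-through rate and success rate can be sketched as follows. Here a session is a dict of clicks and post-click streaming times, and the 30-second streaming threshold is borrowed from the Home reward slide as an assumed success proxy, not the paper's exact metric:

```python
def click_through_rate(sessions):
    """Fraction of search sessions with at least one click."""
    return sum(bool(s["clicks"]) for s in sessions) / len(sessions)

def success_rate(sessions, threshold_s=30):
    """Fraction of sessions where a click led to a stream of at least
    threshold_s seconds, i.e. the click actually satisfied the user."""
    return sum(any(t >= threshold_s for t in s["stream_times"])
               for s in sessions) / len(sessions)
```

A clicked-then-abandoned result raises click-through rate but not success rate, which is why the latter is the more sensitive metric.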
Voice … the pull & push paradigm?
Search by
voice
Users ask Spotify to play music without
saying what they would like to hear
→ open mindset
Play
Spotify
Play music
Play music
from Spotify
Play me
some music
Play the
music
Play my
Spotify
Play some
music on
Spotify
Play some
music
Play music
on Spotify
Non-specific querying is a way for a user to effortlessly start
a listening session via voice.
Non-specific querying is a way to remove the burden of
choice when a user is open to lean-back listening.
User education matters as users will not engage in a
use-case they do not know about.
Trust and control are central to a positive experience. Users
need to trust the system enough to try it out.
Search by voice
Search as push paradigm
Some final words
Qualitative & quantitative research
KPIs & business metrics
Algorithms
Training & Datasets
Optimization metrics
Evaluation offline & online
Measurement & signals
Features
(item)
Features
(user)
Features
(context) Bias
Making machine learning work … at Spotify
Qualitative & quantitative research
KPIs & business metrics
Algorithms
Training & Datasets
Optimization metrics
Evaluation offline & online
Measurement & signals
Features
(item)
Features
(user)
Features
(context) Bias
Making machine learning work … in this talk
conversational search (voice)
intent & mindset
BaRT
reward function
for BaRT
diversity
ML-Lab search
metrics
Thank you!
