Spotify Playlist Recommender
Table of Contents
1. The task
2. The dataset
3. Metrics
4. Proposed Solutions
5. EDA
6. Result
The task
The goal of the challenge is to develop a system for the task of
automatic playlist continuation. Given a set of playlist features,
participants' systems shall generate a list of recommended tracks
that can be added to that playlist, thereby 'continuing' the playlist.
We define the task formally as follows:
Input
Given N incomplete playlists
Output
A list of 500 recommended candidate tracks, ordered by
relevance in decreasing order.
The dataset
The Million Playlist Dataset (MPD) contains 1,000,000 playlists
created by users on the Spotify platform. These playlists were
created during the period of January 2010 through October 2017.
Test set
To replicate the competition's test set, we hold out some playlists
from the original dataset such that:
All tracks in the challenge set appear in the MPD
All holdout tracks appear in the MPD
Test set size: 1000 playlists
Train set
The train set is the original dataset minus the tracks held out for
the test set.
Metrics
G: the ground truth set of tracks
R: the ordered list of recommended tracks
The size of a set or list is denoted by |·|
R-precision
R-precision is the number of retrieved relevant tracks divided by
the number of known relevant tracks (i.e., the number of withheld
tracks):
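The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the official evaluation code: the function name `r_precision` is hypothetical, and truncating R to its first |G| entries before counting hits is an assumption about how "retrieved" is meant here.

```python
def r_precision(ground_truth, recommended):
    """Share of withheld tracks found among the first |G| recommendations."""
    g = set(ground_truth)
    if not g:
        return 0.0
    # Assumption: only the first |G| recommendations count as "retrieved".
    top = recommended[:len(g)]
    return len(g & set(top)) / len(g)


# Toy example: 4 withheld tracks, 2 of them among the first 4 recommendations.
score = r_precision({"a", "b", "c", "d"}, ["a", "x", "b", "y", "c"])
```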
Normalized discounted cumulative gain (NDCG)
Discounted cumulative gain (DCG) measures the ranking quality of
the recommended tracks, increasing when relevant tracks are
placed higher in the list.
The ideal DCG or IDCG is, in our case, equal to:
If the intersection of G and R is empty, then the DCG is equal to 0.
The NDCG metric is then calculated as:
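The formulas on this slide did not survive as text, so the sketch below is only one plausible reading. It assumes binary relevance, a discount of 1 at rank 1 and log2(i) from rank 2 onwards, and an ideal ranking that places all |G| withheld tracks first. The function names are hypothetical.

```python
import math


def _discount(rank):
    # Assumed discount: 1 for the first position, log2(rank) afterwards.
    return 1.0 if rank == 1 else math.log2(rank)


def ndcg(ground_truth, recommended):
    """NDCG with binary relevance: DCG of R divided by the ideal DCG."""
    g = set(ground_truth)
    dcg = sum(1.0 / _discount(rank)
              for rank, track in enumerate(recommended, start=1)
              if track in g)
    # Ideal case (assumption): every withheld track sits at the top of the list.
    idcg = sum(1.0 / _discount(rank) for rank in range(1, len(g) + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Relevant tracks at ranks 2 and 4: DCG = 1/log2(2) + 1/log2(4) = 1.5, IDCG = 2.
example = ndcg({"a", "b"}, ["x", "a", "y", "b"])
```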
Recommended Songs clicks
Recommended Songs is a Spotify feature that, given a set of tracks
in a playlist, recommends 10 tracks to add to the playlist. The list
can be refreshed to produce 10 more tracks. Recommended Songs
clicks is the number of refreshes needed before a relevant track is
encountered. It is calculated as follows:
If the metric does not exist (i.e. if there is no relevant track in R), a
value of 51 is picked (which is 1 + the maximum number of clicks
possible).
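As a rough sketch of the description above: find the rank of the first relevant track and count how many pages of 10 had to be skipped to reach it. The function and parameter names are hypothetical; the page size of 10 and the fallback value of 51 are taken from the text.

```python
def recommended_songs_clicks(ground_truth, recommended, page_size=10, missing_value=51):
    """Number of refreshes needed before the first relevant track shows up."""
    g = set(ground_truth)
    for rank, track in enumerate(recommended, start=1):
        if track in g:
            # Ranks 1-10 need no refresh, ranks 11-20 need one, and so on.
            return (rank - 1) // page_size
    # No relevant track anywhere in R.
    return missing_value


filler = [f"x{i}" for i in range(10)]          # ten irrelevant tracks
clicks = recommended_songs_clicks({"a"}, filler + ["a"])  # relevant track at rank 11
```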
Proposed Solutions
For our problem, we treat each playlist as a "user" and each song as
an "item". Because the dataset does not provide much information
about each song, we do not use content-based filtering. Instead, we
focus on:
KNN
Collaborative Filtering
Frequent Pattern Growth
Matrix Factorization
Collaborative Filtering (CF)
Playlist-based CF: From the similarity between playlists and
how other playlists "rate" (include or not) a track, I can infer the
current playlist's "rating" for that track.
Song-based CF: From the similarity between songs and how the
current playlist "rates" other songs, I can infer its "rating" for
the current song.
Advantages:
easy to implement
new data can be added easily and incrementally
does not need item content
scales well with correlated items
Disadvantages:
depends on human ratings
cold-start problem for new users and items
sparsity of the rating matrix
limited scalability for large datasets
1. Construct the playlist-song matrix
"1" means that the song is included in the playlist and "0" otherwise.
For example, playlist 1 contains song 2 and song 3, and song 2 is
also included in playlist 2.
2. Calculate the cosine similarity between songs (song-song) or
between playlists (playlist-playlist). For playlist-playlist
similarity, we take each row as a vector, while for song-song
similarity we take each column as a vector.
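The two steps above can be sketched with plain Python lists. The toy playlists are hypothetical (playlist 1 and playlist 2 follow the example in the text, playlist 3 is invented), and a real run on the MPD would need a sparse matrix rather than nested lists.

```python
import math

# Hypothetical toy data; playlist 3 is made up for illustration.
playlists = {
    "playlist 1": ["song 2", "song 3"],
    "playlist 2": ["song 2"],
    "playlist 3": ["song 1", "song 3"],
}

# Step 1: binary playlist-song matrix (rows = playlists, columns = songs).
songs = sorted({s for tracks in playlists.values() for s in tracks})
matrix = [[1 if s in tracks else 0 for s in songs]
          for tracks in playlists.values()]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Step 2: rows give playlist-playlist similarity, columns give song-song similarity.
playlist_sim = [[cosine(r1, r2) for r2 in matrix] for r1 in matrix]
columns = list(zip(*matrix))
song_sim = [[cosine(c1, c2) for c2 in columns] for c1 in columns]
```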
For playlist-playlist, the prediction that playlist p contains song s
is the weighted sum of all other playlists' entries for song s, where
each weight is the cosine similarity between that playlist and the
input playlist p. The result is then normalized:

$$\hat{r}_{ps} = \frac{\sum_{p'} \mathrm{sim}(p, p')\, r_{p's}}{\sum_{p'} |\mathrm{sim}(p, p')|}$$

With song-song, we simply replace the similarity matrix of playlists
by that of songs:

$$\hat{r}_{ps} = \frac{\sum_{s'} \mathrm{sim}(s, s')\, r_{ps'}}{\sum_{s'} |\mathrm{sim}(s, s')|}$$
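A small sketch of both prediction rules on a toy binary matrix follows. The function names are hypothetical, and excluding the playlist (or song) itself from the sum is an assumption based on the phrase "all other playlists".

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_playlist_based(matrix, p, s):
    """Similarity-weighted average of other playlists' entries for song s."""
    num = den = 0.0
    for q, row in enumerate(matrix):
        if q == p:
            continue  # assumption: the playlist itself is left out
        sim = cosine(matrix[p], row)
        num += sim * row[s]
        den += abs(sim)
    return num / den if den else 0.0


def predict_song_based(matrix, p, s):
    """Similarity-weighted average of playlist p's entries for other songs."""
    columns = list(zip(*matrix))
    num = den = 0.0
    for t, col in enumerate(columns):
        if t == s:
            continue  # assumption: the song itself is left out
        sim = cosine(columns[s], col)
        num += sim * matrix[p][t]
        den += abs(sim)
    return num / den if den else 0.0


# Rows are playlists, columns are songs (hypothetical toy data).
toy = [[0, 1, 1],
       [0, 1, 0],
       [1, 0, 1]]
by_playlist = predict_playlist_based(toy, p=1, s=2)
by_song = predict_song_based(toy, p=1, s=2)
```

In this toy case the only playlist similar to playlist index 1 contains the song, so the playlist-based score is 1, while the song-based score is lower because one of the two similar songs is absent from the playlist.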
KNN
Playlist-based: Find the playlists most similar to the current
playlist and add all of their songs that are not already in it.
Song-based: Find the songs most similar to the songs currently in
the playlist and add them to the playlist.
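The playlist-based variant could look roughly like the sketch below. It assumes cosine similarity on binary vectors (which reduces to the overlap divided by the square root of the product of the two playlist sizes) and ranks candidate songs by the summed similarity of the neighbours that contain them. The names `knn_recommend`, `k` and `top_n` are hypothetical.

```python
import math
from collections import Counter


def knn_recommend(train, seed_tracks, k=2, top_n=5):
    """Recommend tracks from the k training playlists most similar to the seed."""
    seed = set(seed_tracks)
    scored = []
    for pid, tracks in train.items():
        overlap = len(seed & set(tracks))
        if overlap:
            # Cosine similarity of two binary vectors.
            sim = overlap / math.sqrt(len(seed) * len(set(tracks)))
            scored.append((sim, pid))
    neighbours = sorted(scored, reverse=True)[:k]

    votes = Counter()
    for sim, pid in neighbours:
        for track in train[pid]:
            if track not in seed:  # skip songs already in the playlist
                votes[track] += sim
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked[:top_n]]


# Hypothetical toy training playlists.
train = {"p1": ["a", "b", "c"], "p2": ["a", "b", "d"], "p3": ["x", "y"]}
suggestions = knn_recommend(train, ["a", "b"])
```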
Advantage:
Same as Collaborative Filtering
Disadvantage:
Need to maintain a large similarity matrix
Cold start problem
Frequent Pattern Growth
Given the tracks currently in a playlist, I look at other playlists to
find tracks that frequently occur together with the current tracks
and add them.
Advantage:
Scalability
No problem with sparsity
Disadvantage:
Expensive model building
1. The first step of FP-growth is to calculate item frequencies and
identify frequent items.
2. The second step of FP-growth uses a prefix tree (FP-tree)
structure to encode transactions without explicitly generating
candidate sets, which are usually expensive to generate.
3. After the second step, the frequent itemsets can be extracted
from the FP-tree.
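The three steps above can be sketched in pure Python on a handful of toy playlists. This is a compact teaching version, not the Spark implementation the slides refer to. All names (`Node`, `build_tree`, `fp_growth`, `mine`, `recommend`) are hypothetical, and the final `recommend` helper is only one simple way to turn frequent itemsets into suggestions.

```python
from collections import Counter, defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_support):
    # Step 1: item frequencies, keeping only frequent items.
    freq = Counter()
    for items, weight in weighted_transactions:
        for item in set(items):
            freq[item] += weight
    frequent = {i: c for i, c in freq.items() if c >= min_support}
    # Step 2: insert each transaction into the FP-tree, most frequent item first.
    root = Node(None, None)
    links = defaultdict(list)  # item -> every tree node holding that item
    for items, weight in weighted_transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                links[item].append(child)
            child.count += weight
            node = child
    return frequent, links


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Step 3: extract frequent itemsets by recursing on conditional pattern bases."""
    frequent, links = build_tree(weighted_transactions, min_support)
    result = {}
    for item, support in frequent.items():
        itemset = suffix | {item}
        result[itemset] = support
        base = []  # prefix paths leading to each occurrence of item
        for node in links[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_support, itemset))
    return result


def mine(playlists, min_support):
    return fp_growth([(tracks, 1) for tracks in playlists], min_support)


def recommend(itemsets, current, top_n=10):
    """Score tracks that co-occur with the current tracks in frequent itemsets."""
    current = set(current)
    scores = Counter()
    for itemset, support in itemsets.items():
        if itemset & current:
            for track in itemset - current:
                scores[track] += support
    return [t for t, _ in scores.most_common(top_n)]


# Hypothetical toy playlists.
toy_playlists = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
itemsets = mine(toy_playlists, min_support=3)
```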
Matrix Factorization
Decompose the playlist-song matrix into a product of smaller
matrices and use these matrices to make inferences.
Matrix factorization can be done with orthogonal factorization
(SVD), probabilistic factorization (PMF) or non-negative
factorization (NMF).
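To make the idea concrete, here is a minimal sketch that factorizes a toy playlist-song matrix into two small factor matrices with stochastic gradient descent on a regularized squared error. This is a generic factorization for illustration, not SVD or NMF specifically, and not something the slides implement. All names and hyperparameters (`factorize`, `k`, `lr`, `reg`) are hypothetical.

```python
import random


def factorize(matrix, k=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Return factors P (playlists x k) and Q (songs x k) with matrix ~ P Q^T."""
    rng = random.Random(seed)
    n, m = len(matrix), len(matrix[0])
    P = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n)]
    Q = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(m)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(m):
                pred = sum(P[i][f] * Q[j][f] for f in range(k))
                err = matrix[i][j] - pred
                for f in range(k):
                    p, q = P[i][f], Q[j][f]
                    # Gradient step on squared error with L2 regularization.
                    P[i][f] += lr * (err * q - reg * p)
                    Q[j][f] += lr * (err * p - reg * q)
    return P, Q


def squared_error(matrix, P, Q):
    k = len(P[0])
    return sum((matrix[i][j] - sum(P[i][f] * Q[j][f] for f in range(k))) ** 2
               for i in range(len(matrix)) for j in range(len(matrix[0])))


# Hypothetical toy playlist-song matrix.
toy = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 1, 1],
       [0, 1, 1, 1]]
before = squared_error(toy, *factorize(toy, epochs=0))
after = squared_error(toy, *factorize(toy))
```

The predicted score for playlist i and song j is the dot product of row i of P and row j of Q; songs with the highest scores that are not yet in the playlist would be the recommendations.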
Advantage:
Better addresses the sparsity and scalability problems
Improves prediction performance
Disadvantage:
Expensive model building
Trade-off between prediction performance and scalability
Loss of information from dimensionality reduction
Due to the large memory use when computing the smaller matrices,
I will not implement Matrix Factorization.
Word2vec
We can use Word2vec to reduce the dimensionality of the matrix and
perform KNN in the resulting lower-dimensional space.
Exploratory Data Analysis
Number of:
playlists: 1,000,000
Tracks: 66,346,428
Unique tracks: 2,262,292
Unique albums: 734,684
Unique Titles: 92,944
Distribution of: Playlist length, Number of Albums / Playlist,
Number of Artists / Playlist, Number of Edits / Playlist, Number
of Followers / Playlist, Number of Tracks / Playlist
As we can see, all distributions are strongly skewed (most of the
mass sits at low values, with a long tail to the right), which means
that if we want a typical value we should use the median, not the
mean.
Median playlist length: 11422438.0 (apparently the total duration in
milliseconds, roughly 3.2 hours)
Median number of albums per playlist: 37.0
Median number of artists per playlist: 29.0
Median number of edits per playlist: 29.0
Median number of followers per playlist: 1.0
Median number of tracks per playlist: 49.0
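A tiny illustration of why the median is the safer summary for skewed counts follows. The follower numbers are invented for the example and are not taken from the MPD.

```python
import statistics

# Hypothetical follower counts: most playlists have very few followers,
# one playlist has many.
followers = [1, 1, 1, 1, 2, 2, 3, 5, 1000]

mean_followers = statistics.mean(followers)      # pulled up by the single outlier
median_followers = statistics.median(followers)  # stays at a typical value
```

Here the mean is above 100 even though eight of the nine playlists have five followers or fewer, while the median is 2.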
Top 20 songs in Spotify playlists
Top 20 artists in Spotify playlists
Result
Method                                      R-precision  NDCG    Song Click  Time Taken (s)
Playlist-based KNN                          0.7766       1.6010  0.0         41.42
Song-based KNN                              0.7847       0.7975  0.0         4183
Word2Vec + Song-based                       0.0030       0.004   10.35       -
Word2Vec + Playlist-based                   0.0171       0.015   8.086       -
Playlist-based CF (get top-K rated songs)   0.7844       0.8011  0.0         approx. 12000
FP Growth                                   -            -       -           -
Conclusion
Playlist-based and song-based KNN perform really well on this
dataset. Thanks to multiprocessing, inference is really fast.
However, the similarity matrix and the playlist-song matrix have to
be built beforehand.
Collaborative filtering shows results comparable to KNN but takes
much longer at inference time.
Word2Vec dimension reduction failed to capture the similarity
between songs / playlists. This suggests that although a playlist
is a sequence of songs, it behaves differently from a text
sequence.
Further improvement
Implement FP-growth in Spark and compare with current
solutions
Matrix Factorization
