Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filtering with Playlist Title Information
Presentation slides for the paper "Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filtering with Playlist Title Information", whose authors were accepted as a presenting team for the RecSys Challenge 2018.
1.
Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filtering with Playlist Title Information
Jaehun Kim, Minz Won, Cynthia C. S. Liem, Alan Hanjalic
3.
First attempt : WRMF
● Good Old MF
○ Weighted Regularized Matrix Factorization [1]
■ Developed for implicit feedback
■ ALS* optimization : fast and reliable
■ Only 2~3 hyperparameters
*Alternating Least Squares (or Coordinate Descent)
R ~= U x V
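The factorization above can be sketched with NumPy as follows. This is a minimal illustration of ALS for implicit feedback in the style of [1], not the authors' implementation. The function name `wrmf_als`, the confidence rule `1 + alpha * R`, the default values, and the convention that V is stored as tracks x factors (so that R ~= U @ V.T) are all assumptions made for the example.

```python
import numpy as np


def wrmf_als(R, n_factors=8, alpha=40.0, reg=0.1, n_iters=10, seed=0):
    """Factorize a playlist x track matrix R so that R ~= U @ V.T.

    Hypothetical helper: n_factors, alpha and reg stand in for the
    "2~3 hyperparameters" mentioned on the slide.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = R.shape
    U = rng.normal(scale=0.1, size=(n_rows, n_factors))
    V = rng.normal(scale=0.1, size=(n_cols, n_factors))
    P = (R > 0).astype(float)  # binary preference: was the track in the playlist?
    C = 1.0 + alpha * R        # confidence weight for every cell (assumed form)

    def solve_side(X, Y, P_side, C_side):
        # With Y fixed, each row of X has a closed-form weighted
        # ridge-regression solution: (Y^T C_i Y + reg*I) x_i = Y^T C_i p_i
        YtY = Y.T @ Y
        ridge = reg * np.eye(Y.shape[1])
        for i in range(X.shape[0]):
            c = C_side[i]
            A = YtY + (Y.T * (c - 1.0)) @ Y + ridge
            b = (Y.T * c) @ P_side[i]
            X[i] = np.linalg.solve(A, b)

    for _ in range(n_iters):
        solve_side(U, V, P, C)      # update playlist factors
        solve_side(V, U, P.T, C.T)  # update track factors
    return U, V


# Toy example: two groups of playlists with disjoint tracks.
R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
U, V = wrmf_als(R, n_factors=2, n_iters=15)
scores = U @ V.T  # higher score = stronger predicted affinity
```

A playlist without any seed tracks has an all-zero row in R, so its factor row carries no information; this is the cold-start case the following slides address.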
8.
First attempt : WRMF
● Good Old MF
○ Already reasonable performance
○ Except No-Seed case => Cold Start Problem
● Any metadata or content for playlist?
○ Playlist titles!
11.
Playlist Titles
● Text information (implicitly) represents the playlist
● Some key statistics
# of titles (MPD + Challenge Set): 1,010,000
# of unique titles: 93,250
# of unique titles (stemmed): 49,808
Single word: ~60%
Two words or fewer: ~92%
=> 1. Playlist titles ~= words
13.
Playlist Titles
● Noisiness
Category: Examples
Special characters: //Pretty Little Liars//, ** some tunes, ?!?
Repeated characters: Yaaaas, summerrrr, partayyy
Shortened words: Chillin, Temp, favss
Abbreviated words: loml, jb, IDFK, jjjj
Symbolic expressions:
Multiple languages: otoño, 電台收藏, アニメ
=> 2. Standard word-level approaches may NOT work
14.
Playlist Titles
● Playlist titles ~= words
● Standard word-level approaches may not work
● Character-level approach : Character N-gram
16.
Character N-gram
Input text: “Character N-gram” (␣ marks a space)
Unigram (1-gram): C, h, a, r, a, c, t, e, r, ␣, N, -, g, r, a, m
Bigram (2-gram): Ch, ha, ar, ra, ac, ct, te, er, r␣, ␣N, N-, -g, ...
Trigram (3-gram): Cha, har, ara, rac, act, cte, ter, er␣, r␣N, ␣N-, ...
Quadrogram (4-gram): Char, hara, arac, ract, acte, cter, ter␣, er␣N, ...
...
Bag-of-Character-N-gram
[Figure: a title represented as a vector of 1/0 entries over an n-gram vocabulary (Cha, har, ter, ..., abc, zeb, jkl, ...)]
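The n-gram extraction above can be sketched in a few lines of Python. The helper names `char_ngrams` and `bag_of_char_ngrams`, the lowercasing step, and the choice of n = 1..3 are assumptions for illustration; the slides do not state which preprocessing or n values were used. The bag here stores counts, whereas the slide's figure shows 1/0 entries; taking the keys of the bag gives the binary variant.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bag_of_char_ngrams(title, n_values=(1, 2, 3)):
    """Count character n-grams of a title (lowercased: an assumption)."""
    bag = Counter()
    normalized = title.lower()
    for n in n_values:
        bag.update(char_ngrams(normalized, n))
    return bag


trigrams = char_ngrams("Character", 3)
# trigrams == ['Cha', 'har', 'ara', 'rac', 'act', 'cte', 'ter']

# Noisy variants of a title still share most of their n-grams.
shared = set(bag_of_char_ngrams("summer")) & set(bag_of_char_ngrams("summerrrr"))
```

Because "summerrrr" contains every n-gram of "summer", the two titles remain close even though a word-level vocabulary would treat them as unrelated tokens.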
17.
Title-based RecSys
NGRAM: Similarity Based
● Build a bag-of-n-grams for each playlist (Train + Test Set)
● For each testing playlist
○ Find the M closest playlists in the Train set using cosine distance
○ Collect tracks from the retrieved playlists
○ Recommend the L most popular tracks
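The steps above can be sketched as a small nearest-neighbour recommender. This is an illustrative sketch under stated assumptions, not the authors' code: the names `ngram_bag`, `cosine_similarity` and `recommend`, the n-gram sizes, the lowercasing, and the toy training data are all hypothetical. Ranking by highest cosine similarity is equivalent to ranking by lowest cosine distance (1 - similarity).

```python
import math
from collections import Counter


def ngram_bag(title, n_values=(1, 2, 3)):
    """Bag of character n-grams for a title (lowercasing is an assumption)."""
    t = title.lower()
    return Counter(t[i:i + n] for n in n_values for i in range(len(t) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, train, m=2, l=3):
    """Recommend the l most popular tracks among the m closest training playlists.

    train is a list of (title, track_ids) pairs.
    """
    query = ngram_bag(title)
    ranked = sorted(
        train,
        key=lambda playlist: cosine_similarity(query, ngram_bag(playlist[0])),
        reverse=True,
    )
    counts = Counter(track for _, tracks in ranked[:m] for track in tracks)
    return [track for track, _ in counts.most_common(l)]


# Hypothetical toy training set.
train = [
    ("summer vibes", ["a", "b"]),
    ("summerrrr", ["a", "c"]),
    ("workout", ["x", "y"]),
]
top = recommend("summer", train, m=2, l=1)
```

For a real training set of about a million titles, the pairwise loop would need to be replaced by sparse matrix products or an approximate nearest-neighbour index; the sketch only shows the logic.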
18.
Title-based RecSys
NGRAM: Similarity Based
[Figure: Bag-of-Character-N-gram vectors of the testing playlists are compared with those of the training playlists by cosine distance; the L most popular tracks among the M closest training playlists are recommended.]
42.
Takeaways
● MF is still powerful
● Setting up the right (internal) evaluation is more important than the model
● Software engineering DOES MATTER
○ Since scalability DOES MATTER
○ Since hyperparameter tuning DOES MATTER
● Deep learning is not a magic wand
○ No Free Lunch
○ It costs a LOT
● Content-based algorithms still give a small (but significant) gain to CF
44.
References
[1] Hu, Yifan, Yehuda Koren, and Chris Volinsky. "Collaborative filtering for implicit feedback datasets." Eighth IEEE International Conference on Data Mining (ICDM'08). IEEE, 2008.
[2] Wang, Xinxi, and Ye Wang. "Improving content-based and hybrid music recommendation using deep learning." Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014.
[3] Van den Oord, Aäron, Sander Dieleman, and Benjamin Schrauwen. "Deep content-based music recommendation." Advances in Neural Information Processing Systems. 2013.