
Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filtering with Playlist Title Information


Presentation slides for the paper "Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filtering with Playlist Title Information", whose authors were accepted as a presenting team for the RecSys Challenge 2018.



  1. Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filtering with Playlist Title Information
       Jaehun Kim, Minz Won, Cynthia C. S. Liem, Alan Hanjalic
  2. Preliminary Approach
  3. First attempt: WRMF
       ● Good Old MF
           ○ Weighted Regularized Matrix Factorization [1]
               ■ Developed for implicit feedback
               ■ ALS* optimization: fast and reliable
               ■ Only 2 to 3 hyperparameters
       ● R ≈ U × V
       *Alternating Least Squares (or Coordinate Descent)
  4. First attempt: WRMF
  5. First attempt: WRMF
  6. First attempt: WRMF
       [Figure labels: NO SEED, Average, Winner]
  7. First attempt: WRMF
       ● Good Old MF
           ○ Already reasonable performance
           ○ Except in the no-seed case => Cold Start Problem
       ● Any metadata or content for the playlists?
  8. First attempt: WRMF
       ● Any metadata or content for the playlists?
           ○ Playlist titles!
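
The WRMF baseline summarized on slide 3 can be illustrated with a minimal sketch of the implicit-feedback ALS updates from [1]. The function name `wrmf_als`, the default hyperparameters (`n_factors`, `alpha`, `reg`, `n_iters`) and the toy matrix are assumptions made for illustration, not the authors' code or settings. The sketch uses dense NumPy arrays for readability; data at the scale of the challenge would likely need sparse matrices.

```python
import numpy as np


def wrmf_als(R, n_factors=2, alpha=40.0, reg=0.1, n_iters=15, seed=0):
    """Factorize a playlist-by-track matrix R as R ~= U @ V.T (Hu et al. style)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = R.shape
    U = rng.normal(scale=0.01, size=(n_rows, n_factors))
    V = rng.normal(scale=0.01, size=(n_cols, n_factors))
    P = (R > 0).astype(float)       # binary preference p_ui
    C = 1.0 + alpha * R             # confidence c_ui = 1 + alpha * r_ui
    reg_eye = reg * np.eye(n_factors)

    def update(X, Y, P_side, C_side):
        # With Y fixed, each row of X has a closed-form weighted ridge solution:
        # x_i = (Y^T C_i Y + reg * I)^-1 Y^T C_i p_i
        for i in range(X.shape[0]):
            weighted_Y = Y * C_side[i][:, None]
            A = weighted_Y.T @ Y + reg_eye
            b = weighted_Y.T @ P_side[i]
            X[i] = np.linalg.solve(A, b)

    for _ in range(n_iters):
        update(U, V, P, C)          # playlist factors
        update(V, U, P.T, C.T)      # track factors
    return U, V


# Toy example (hypothetical data): two groups of playlists and tracks.
R = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
U, V = wrmf_als(R)
scores = U @ V.T                    # higher score = stronger recommendation
```

A playlist with no seed tracks has an all-zero row in `R`, so its factor vector carries no information; this is the cold-start case that motivates using playlist titles.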
  9. Playlist Titles
  10. Playlist Titles
       ● Text information (implicitly) represents the playlist
       ● Some key statistics
           # of titles (MPD + Challenge Set)    1,010,000
           # of unique titles                   93,250
           # of unique titles (stemmed)         49,808
           Single word                          ~60%
           Two words or fewer                   ~92%
  11. Playlist Titles
       1. Playlist titles ≈ words
  12. Playlist Titles
       ● Noisiness
           Categories              Examples
           Special characters      //Pretty Little Liars//, ** some tunes, ?!?
           Repeated characters     Yaaaas, summerrrr, partayyy
           Shortened words         Chillin, Temp, favss
           Abbreviated words       loml, jb, IDFK, jjjj
           Symbolic expressions
           Multiple languages      otoño, 電台收藏, アニメ
  13. Playlist Titles
       2. Standard word-level approaches may NOT work
  14. Playlist Titles
       ● Playlist titles ≈ words
       ● Standard word-level approaches may not work
       ● Character-level approach: Character N-GRAM
  15. Character N-gram (␣ marks a space)
       Input text             “Character N-gram”
       Unigram (1-gram)       C, h, a, r, a, c, t, e, r, ␣, N, -, g, r, a, m
       Bigram (2-gram)        Ch, ha, ar, ra, ac, ct, te, er, r␣, ␣N, N-, -g, ...
       Trigram (3-gram)       Cha, har, ara, rac, act, cte, ter, er␣, r␣N, ␣N-, ...
       Quadrogram (4-gram)    Char, hara, arac, ract, acte, cter, ter␣, er␣N, ...
       ...                    ...
  16. Character N-gram
       ● Bag-of-Character-N-gram: the title becomes a vector with one entry per n-gram in the vocabulary (Cha, har, ..., abc, zeb, ...), e.g. 1 1 0 1 0 1 0 ...
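
The extraction on slides 15 and 16 can be sketched in a few lines of Python. The helper names `char_ngrams` and `bag_of_char_ngrams` and the choice of n = 1, 2, 3 are illustrative assumptions; the slides do not state which n-gram orders or which text normalization were used.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bag_of_char_ngrams(title, n_values=(1, 2, 3)):
    """Count every character n-gram of the given orders in a title."""
    bag = Counter()
    for n in n_values:
        bag.update(char_ngrams(title, n))
    return bag


trigrams = char_ngrams("Character N-gram", 3)
bag = bag_of_char_ngrams("summerrrr")
```

Because the units are characters rather than words, noisy variants such as "summerrrr" still share most of their n-grams with "summer".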
  17. Title-based RecSys NGRAM: Similarity Based
       ● Build bag-of-n-grams for each playlist (Train + Test Set)
       ● For each testing playlist
           ○ Find the M closest playlists in the Train set using cosine distance
           ○ Collect tracks from the retrieved playlists
           ○ Recommend the L most popular tracks
  18. Title-based RecSys NGRAM: Similarity Based
       [Figure: Bag-of-Character-N-gram vectors of testing playlists are compared with training playlists by cosine distance; the L most popular tracks among the M closest training playlists are recommended]
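
The similarity-based procedure on slides 17 and 18 can be sketched as follows. The function names, the n-gram orders, the default values of `M` and `L`, and the toy playlists are all assumptions made for illustration; ranking by highest cosine similarity is equivalent to ranking by smallest cosine distance.

```python
import math
from collections import Counter


def title_vector(title, n_values=(1, 2, 3)):
    """Bag of character n-grams for one playlist title."""
    bag = Counter()
    for n in n_values:
        bag.update(title[i:i + n] for i in range(len(title) - n + 1))
    return bag


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(test_title, train_playlists, M=2, L=3):
    """train_playlists is a list of (title, track_ids) pairs."""
    query = title_vector(test_title)
    # Rank training playlists from most to least similar title.
    ranked = sorted(
        train_playlists,
        key=lambda p: cosine_similarity(query, title_vector(p[0])),
        reverse=True,
    )
    # Pool tracks from the M closest playlists and keep the L most frequent.
    counts = Counter()
    for _, tracks in ranked[:M]:
        counts.update(tracks)
    return [track for track, _ in counts.most_common(L)]


# Hypothetical training data for illustration only.
train = [
    ("summer vibes", ["a", "b", "c"]),
    ("summerrrr", ["a", "b", "d"]),
    ("workout", ["x", "y", "z"]),
]
recommended = recommend("summer", train, M=2, L=2)
```

This approach needs only a title, so it can serve playlists that have no seed tracks at all.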
  19. Title-based RecSys :: Model Based
  20. Title-based RecSys :: Model Based
       1. (pre-trained) Transfer 'Title info'
  21. Title-based RecSys :: Model Based
       1. 2. Substitute
  22. Title-based RecSys :: Model Based
       1. 2. ? X. Wang et al. (2014) [2], A. Van den Oord et al. (2013) [3]
  23. Title-based RecSys :: Model Based
       1. 2. + ?!
  24. Proposed Model
  25. Multi-Objective Collab. Filtering
       MF for Pre-trained Factors
       Main objective function for
  26. Multi-Objective Collab. Filtering
       1. 2.
  27. Multi-Objective Collab. Filtering
       1. 2.
  28. RNCF: Recurrent Neural CF
  29. Results
  30. Results: Overall
       ● It works! (10th on final Leaderboard)
  31. Results: Overall
       ● Superior over simple baselines
  32. Results: Overall
       ● WRMF > SVD
  33. HRNCF: Hybrid RNCF
       ● R ≈ U × V
       ● Filling missing factors
  34. Results: Overall
       ● HRNCF: best of both worlds
  35. Results: Overall
       ● Simple NGRAM distance may have given us very similar performance
  36. Hyperparameters: WRMF
  37. Hyperparameters: WRMF
  38. Hyperparameters: RNCF
  39. Hyperparameters: RNCF
  40. Hyperparameters: RNCF
  41. Discussion & Take away
  42. Take away
       ● MF is still powerful
       ● Getting the (internal) evaluation setup right is more important than the model
       ● Software engineering DOES MATTER
           ○ Since scalability DOES MATTER
           ○ Since hyper-parameter tuning DOES MATTER
       ● Deep learning is not a magic wand
           ○ No Free Lunch
           ○ It costs a LOT
       ● Content-based algorithms still give a small (but significant) gain to CF
  43. Thank you!
       Code: https://github.com/eldrin/recsys18-spotify-spotif-ai
  44. References
       [1] Hu, Yifan, Yehuda Koren, and Chris Volinsky. "Collaborative filtering for implicit feedback datasets." Eighth IEEE International Conference on Data Mining (ICDM '08). IEEE, 2008.
       [2] Wang, Xinxi, and Ye Wang. "Improving content-based and hybrid music recommendation using deep learning." Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014.
       [3] Van den Oord, Aäron, Sander Dieleman, and Benjamin Schrauwen. "Deep content-based music recommendation." Advances in Neural Information Processing Systems. 2013.
