Comparing topic models for a movie recommendation system webist2014

Presentation at the WEBIST Conference 2014, Barcelona, Spain

  1. 1. Sonia Bergamaschi, Laura Po and Serena Sorrentino Department of Engineering “Enzo Ferrari”, University of Modena and Reggio Emilia, Italy Comparing Topic Models for a Movie Recommendation System
  2. 2.  Recommendation systems 
  3. 3.   
  4. 4. Their performance suffers greatly when little information about the user's preferences is available
  5. 5. Movie plots, without knowing any user preferences; Topic Models
  6. 6. Local database; movie selected by the user; NO personal information; NO user preferences
  7. 7.  
  8. 8. Internet Movie Database; Open Movie Database
  9. 9. Cast&Crew, Movie, Person: • IMDB Movie Collection: 1,861,736 • IMDB Personality Collection: 3,165,235 • TMDB Film Collection: 20,861 • IMDB Cast Collection: 24,662,392 • TMDB Person Collection: 234,986 • TMDB Production Collection: 225,494 • English Dbpedia Movie Collection: 164,508 • English Dbpedia Crew Collection: 6,102 • German Dbpedia Movie Collection: 164,508 • German Dbpedia Crew Collection: 866 • English Dbpedia Actor Collection: 6,151 • German Dbpedia Actor Collection: 1,039
  10. 10. 1. Plot Vectorization; 2. Weights Computation; 3. Matrix Reduction by using Topic Models; 4. Movie Similarity Computation
  11. 11. Plot-by-keyword matrix: rows are plots (plot a, plot b, plot c), columns are keywords (keyword1, keyword2, …); the entry w_{b,2} is the weight of keyword 2 according to plot b
  12. 12. lower; add movies without re-computing; find similar movies
  13. 13.  
  14. 14. LSA: T (d x k) = DT (d x z) x S (z x z) x K (z x k), where T is the Document by Keyword Matrix, DT the Document by Topic Matrix, S the Topic by Topic Matrix and K the Topic by Keyword Matrix; 204,000 plots x 220,000 keywords are reduced to 204,000 plots x 500 topics. LDA: P(k|d) (d x k) = P(z|d) (d x z) x P(k|z) (z x k), where P(k|d) is the Document distribution over Keywords, P(z|d) the Document distribution over Topics and P(k|z) the Topic distribution over Keywords; 204,000 plots x 220,000 keywords are reduced to 204,000 plots x 50 topics. A test on the IMDb database: of about 1.8 million multimedia items, only 204,000 have a plot available.
  15. 15. LSA makes it possible to select plots that are more closely related to the themes of the target plot
  16. 16.    Off-line tests
  17. 17. • 20 users • 18 movies • the top 6 recommendations from both LSA and LDA • 594 evaluations collected
  18. 18. LDA does not perform well on movie recommendations: it is not able to suggest movies of the same saga, and it suggests erroneous entries for movies that have a short plot. LSA achieves good performance on movie recommendations: it is able to suggest movies of the same saga and also unknown movies related to the target one.
  19. 19. • 30 users • 18 movies • the top 6 recommendations from both LDA and IMDb • 146 evaluations collected
  20. 20.   
  21. 21.      
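
The four-step pipeline on slide 10, with LSA as the topic model in step 3, could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the tf-idf weighting, the whitespace tokenization, the use of NumPy's SVD, the toy plots and all function names are assumptions made for this example.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(plots):
    """Steps 1-2: document-by-keyword matrix with tf-idf weights (assumed scheme)."""
    tokenized = [plot.lower().split() for plot in plots]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    index = {word: j for j, word in enumerate(vocab)}
    # Number of plots in which each keyword occurs at least once.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    matrix = np.zeros((len(plots), len(vocab)))
    for i, tokens in enumerate(tokenized):
        for word, count in Counter(tokens).items():
            idf = math.log(len(plots) / doc_freq[word])
            matrix[i, index[word]] = count * idf
    return matrix, vocab


def lsa_embed(matrix, n_topics):
    """Step 3: truncated SVD; each row is a plot expressed in topic space."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :n_topics] * s[:n_topics]


def cosine_similarities(vectors, target):
    """Step 4: cosine similarity between the target plot and every plot."""
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0  # guard against all-zero rows
    unit = vectors / norms[:, None]
    return unit @ unit[target]


def recommend(plots, target, n_topics=2, top_n=1):
    """Return indices of the top_n plots most similar to the target plot."""
    matrix, _ = tfidf_matrix(plots)
    sims = cosine_similarities(lsa_embed(matrix, n_topics), target)
    ranked = [int(i) for i in np.argsort(-sims) if i != target]
    return ranked[:top_n]


# Hypothetical toy "plots", already reduced to keywords.
PLOTS = [
    "alien space ship war",
    "alien space ship battle",
    "love romance paris wedding",
    "love romance paris summer",
]

print(recommend(PLOTS, target=0))
print(recommend(PLOTS, target=2))
```

In this sketch a new target only needs a similarity lookup against the reduced matrix; with the 204,000 x 220,000 matrix from slide 14, a sparse matrix and a truncated solver would be needed instead of the dense SVD used here.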
