Raiders of the Lost Star

An introduction to recommender systems research at Telefonica, from November 2007, just when that research was starting.

  1. Raiders of the Lost Star: Coping with Information Overflow through Recommendations. Xavier Amatriain, Researcher, Telefónica I+D, November 27, 2007. “Give us the content” © 2007 Telefónica Investigación y Desarrollo, S.A. Unipersonal
  2. Index <ul><li>01 Introduction </li></ul><ul><li>02 Recommender Systems </li></ul><ul><li>03 The Netflix Prize </li></ul><ul><li>04 The Sparsity Problem </li></ul><ul><li>05 Working with the Data </li></ul><ul><li>06 So what does Telefonica get out of all this? </li></ul><ul><li>07 Conclusions </li></ul>
  3. A little icebreaker: What movie do you like?
  4. Information overload: “People read around 10 MB worth of material a day, hear 400 MB a day, and see one MB of information every second” (The Economist, November 2006)
  5. Tell me what you like... <ul><li>Tell me what you like and I will tell you who you are </li></ul><ul><li>Tell me who you know and I will tell you what you like </li></ul><ul><li>Tell me what you have and I will tell you what you need </li></ul>
  6. The value of recommendations <ul><li>Netflix: 2/3 of the movies rented were recommended </li></ul><ul><li>Google News: recommendations generate 38% more clickthrough </li></ul><ul><li>Amazon: 35% of sales come from recommendations </li></ul><ul><li>Choicestream: 28% of people would buy more music if they found what they liked </li></ul>
  7. 02 Recommender Systems
  8. The “Recommender problem” <ul><li>Estimate a utility function that automatically predicts how much a user will like an item she does not yet know, based on: </li></ul><ul><ul><li>Past behavior </li></ul></ul><ul><ul><li>Relations to other users </li></ul></ul><ul><ul><li>Item similarity </li></ul></ul><ul><ul><li>... </li></ul></ul>
  9. Approaches to Recommendation <ul><li>Collaborative Filtering </li></ul><ul><ul><li>Recommend items based only on how other users have previously rated those items </li></ul></ul><ul><ul><li>User-based </li></ul></ul><ul><ul><ul><li>Find users similar to me and recommend what they liked </li></ul></ul></ul><ul><ul><li>Item-based </li></ul></ul><ul><ul><ul><li>Find items similar to those I have previously liked </li></ul></ul></ul><ul><li>Content-based </li></ul><ul><ul><li>Recommend based on features inherent to the items </li></ul></ul>
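The two collaborative-filtering variants on this slide can be sketched in Python on a toy ratings dictionary. This is a minimal sketch, not the method used at Telefónica: the data is made up, and the `cosine` similarity (missing ratings treated as 0) and the similarity-weighted average used for prediction are assumptions chosen for simplicity.

```python
import math

# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}.
RATINGS = {
    "ana":   {"Alien": 5, "Brazil": 3, "Casablanca": 4},
    "ben":   {"Alien": 4, "Brazil": 2, "Casablanca": 5, "Dune": 4},
    "carla": {"Alien": 1, "Brazil": 5, "Dune": 2},
    "dani":  {"Brazil": 4, "Casablanca": 1, "Dune": 1},
}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (missing keys count as 0)."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def user_based(ratings, user, item):
    """User-based CF: similarity-weighted average of other users' ratings of `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        s = cosine(ratings[user], theirs)  # how similar is this user to me?
        num += s * theirs[item]
        den += abs(s)
    return num / den if den else None


def by_item(ratings):
    """Transpose user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for u, row in ratings.items():
        for i, r in row.items():
            items.setdefault(i, {})[u] = r
    return items


def item_based(ratings, user, item):
    """Item-based CF: similarity-weighted average of the user's own ratings of similar items."""
    items = by_item(ratings)
    target = items.get(item, {})
    num = den = 0.0
    for j, r in ratings[user].items():
        if j == item:
            continue
        s = cosine(target, items[j])  # how similar is item j to the target item?
        num += s * r
        den += abs(s)
    return num / den if den else None


print(user_based(RATINGS, "ana", "Dune"), item_based(RATINGS, "ana", "Dune"))
```

Both predictors return a weighted average of existing ratings, so with non-negative similarities their output stays on the 1 to 5 scale. Item-to-item similarities can also be precomputed offline, which is often cited as a practical advantage of the item-based variant.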
  10. What works? <ul><li>What works clearly depends on the recommender's domain: domain-specific modeling </li></ul><ul><li>However, in the general case, the best standalone approach has been shown to be (currently) item-based collaborative filtering. </li></ul><ul><ul><li>Other approaches can be hybridized with it to improve results in specific cases (e.g., the cold-start problem) </li></ul></ul>
  11. 03 The Netflix Prize
  12. The Netflix Prize <ul><li>500,000 users × 17,000 movie titles, 100M ratings, and $1M if you “only” improve the existing system by 10% (from 0.95 to 0.85 RMSE) </li></ul><ul><ul><li>This is what Netflix thinks a 10% improvement is worth to their business </li></ul></ul><ul><ul><li>29K contestants on 23K teams from 165 countries </li></ul></ul><ul><ul><li>19K valid submissions from 2,700 teams; 59 submissions in the “last 24 hours” </li></ul></ul>
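RMSE, the Prize's measure of success, and the 10% target can be checked with a few lines of Python. A minimal sketch; the function names are hypothetical. Note that 10% below 0.95 is 0.855, so the slide's 0.85 is a rounded figure.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of ratings."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("expected two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def improvement(baseline_rmse, new_rmse):
    """Relative improvement over a baseline RMSE, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


baseline = 0.95                   # existing system's RMSE, as quoted on the slide
target = baseline * (1 - 0.10)    # RMSE needed for a 10% improvement
print(f"target RMSE ~ {target:.3f}")  # prints "target RMSE ~ 0.855"
```

Because errors are squared before averaging, RMSE penalizes a few large mistakes (predicting 5 stars for a 1-star movie) much more than many small ones.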
  13. The Netflix Prize <ul><li>First conclusion: it is extremely simple to reach “reasonable” recommendations and extremely difficult to improve on them. </li></ul>
  14. The Netflix Prize <ul><li>Apart from the (extremely unlikely) possibility of winning the $1M, it is a great source of data and of measurable improvement: </li></ul><ul><ul><li>100M ratings from 1 to 5 </li></ul></ul><ul><ul><li>Measure of success: RMSE (root mean squared error) </li></ul></ul><ul><li>The most successful teams are using item-based collaborative filtering, some sort of matrix factorization (such as SVD), and... </li></ul>
  15. The Netflix Prize <ul><li>The current leader is at an 8.5% improvement (blending 107 individual predictors built with all sorts of techniques) </li></ul><ul><li>Many teams are merging </li></ul>
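Blending many predictors can be illustrated with ordinary least squares: fit one weight per predictor, plus an intercept, against known ratings. This is an assumed, simplified scheme for illustration; the slide does not say how the leading team blended its 107 predictors, and the data below is synthetic.

```python
import numpy as np


def fit_blend(preds, target):
    """Least-squares weights (one per predictor, plus an intercept) for a linear blend.

    preds: shape (n_ratings, n_predictors); target: shape (n_ratings,)."""
    X = np.column_stack([preds, np.ones(len(target))])
    weights, *_ = np.linalg.lstsq(X, target, rcond=None)
    return weights


def apply_blend(preds, weights):
    """Combine predictor outputs with weights returned by fit_blend."""
    X = np.column_stack([preds, np.ones(preds.shape[0])])
    return X @ weights


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Made-up data: two imperfect predictors of the same synthetic ratings.
rng = np.random.default_rng(0)
true = rng.integers(1, 6, size=1000).astype(float)
preds = np.column_stack([
    true + rng.normal(0.0, 1.0, size=1000),        # predictor A: unbiased but noisy
    0.8 * true + rng.normal(0.5, 0.7, size=1000),  # predictor B: biased, less noisy
])
weights = fit_blend(preds, true)
blended = apply_blend(preds, weights)
print(rmse(preds[:, 0], true), rmse(preds[:, 1], true), rmse(blended, true))
```

On the data it was fitted to, the blend can never do worse than any single predictor, because giving that predictor weight 1 is one of the candidate solutions. That guarantee does not carry over to unseen ratings, which is why blends are usually fitted on a separate held-out set.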
  16. 04 The Sparsity Problem
  17. The Sparsity Problem <ul><li>If you represent the Netflix rating data as a User/Movie matrix you get... </li></ul><ul><ul><li>500,000 × 17,000 = 8,500M positions </li></ul></ul><ul><ul><li>Of which only 100M (about 1.2%) are non-zero! </li></ul></ul><ul><li>Methods of dimensionality reduction </li></ul><ul><ul><li>Matrix Factorization </li></ul></ul><ul><ul><li>Clustering </li></ul></ul><ul><ul><li>Projection (PCA...) </li></ul></ul>
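The arithmetic behind this slide, plus a rough storage estimate, in Python. The byte sizes are assumptions (a 4-byte float per cell for a dense matrix; a 4-byte user id, 2-byte movie id and 1-byte rating per stored rating), chosen only to show the order of magnitude.

```python
users, movies, ratings = 500_000, 17_000, 100_000_000

cells = users * movies     # positions in the full User/Movie matrix
density = ratings / cells  # fraction of positions that actually hold a rating

dense_bytes = cells * 4               # assumption: one 4-byte float per cell
sparse_bytes = ratings * (4 + 2 + 1)  # assumption: (user id, movie id, rating) per rating

print(f"{cells:,} cells, density {density:.2%}")
# prints "8,500,000,000 cells, density 1.18%"
print(f"dense ~{dense_bytes / 1e9:.0f} GB, sparse ~{sparse_bytes / 1e9:.1f} GB")
# prints "dense ~34 GB, sparse ~0.7 GB"
```

Storing only the known ratings is roughly 50 times smaller here, but it does not solve the statistical problem: most user/movie pairs still have no evidence at all, which is what the dimensionality-reduction methods try to address.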
  18. Dimensionality Reduction <ul><li>Matrix Factorization </li></ul><ul><ul><li>So far this is the “winning horse” </li></ul></ul><ul><ul><li>In particular, the Singular Value Decomposition (SVD) method (Simon Funk's modified SVD) </li></ul></ul><ul><li>Clustering </li></ul><ul><ul><li>Similar results can be obtained, but at a higher computational cost (so far, many “traditional” algorithms such as k-NN have been tried, with varying results). </li></ul></ul>
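A minimal matrix-factorization sketch in the spirit of Simon Funk's approach: learn a small factor vector per user and per movie by stochastic gradient descent over the observed ratings only, with L2 regularization. Funk's original trained one feature at a time; this common variant trains all `k` factors jointly, and the hyperparameters (`k`, `lr`, `reg`, `epochs`) are illustrative assumptions, not tuned values.

```python
import numpy as np


def train_mf(triples, n_users, n_items, k=10, lr=0.01, reg=0.02, epochs=50, seed=0):
    """Fit rating ~ mu + P[u] . Q[i] by SGD over observed (user, item, rating) triples only."""
    rng = np.random.default_rng(seed)
    P = rng.normal(0.0, 0.1, (n_users, k))  # user factors
    Q = rng.normal(0.0, 0.1, (n_items, k))  # item factors
    mu = float(np.mean([r for _, _, r in triples]))  # global mean rating
    for _ in range(epochs):
        for idx in rng.permutation(len(triples)):
            u, i, r = triples[idx]
            err = r - (mu + P[u] @ Q[i])
            pu = P[u].copy()  # keep the old user vector for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return mu, P, Q


def predict(model, u, i):
    """Predicted rating of item i by user u."""
    mu, P, Q = model
    return mu + P[u] @ Q[i]


# Made-up example: a rank-2 "true" score matrix with about 60% of entries observed.
rng = np.random.default_rng(1)
true = 3 + rng.normal(size=(20, 2)) @ rng.normal(size=(15, 2)).T
observed = [(int(u), int(i), float(true[u, i]))
            for u, i in zip(*np.nonzero(rng.random(true.shape) < 0.6))]
model = train_mf(observed, 20, 15, k=5, epochs=200)
```

Unlike a textbook SVD, this never fills in or touches the missing entries, which is what makes it usable on a matrix that is roughly 99% empty.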
  19. Our approach to Dimensionality Reduction <ul><li>We are experimenting with message-passing clustering algorithms </li></ul><ul><ul><li>Affinity Propagation (Frey & Dueck, Science, February 2007) </li></ul></ul>
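Affinity Propagation clusters data by repeatedly passing two kinds of messages between points, “responsibilities” and “availabilities”, until a set of exemplar points emerges. The following is a compact NumPy sketch of the published update rules, assuming negative squared Euclidean distance as the similarity and the median similarity as every point's preference (a common default). It is not the code used in these experiments, and the damping factor and fixed iteration count are illustrative.

```python
import numpy as np


def affinity_propagation(X, damping=0.5, iters=200):
    """Cluster the rows of X; returns (exemplar indices, cluster label per row)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Similarity: negative squared Euclidean distance between points.
    S = -np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    # Preference (self-similarity): median of the off-diagonal similarities.
    S[np.diag_indices(n)] = np.median(S[~np.eye(n, dtype=bool)])
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k))), i != k
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[np.diag_indices(n)] = R.diagonal()
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        A_new[np.diag_indices(n)] = self_avail
        A = damping * A + (1 - damping) * A_new
    # Points whose self-availability plus self-responsibility is positive are exemplars.
    exemplars = np.flatnonzero((A + R).diagonal() > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax(S[:, exemplars], axis=1)  # assign each point to its most similar exemplar
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels
```

Unlike k-means, the number of clusters is not fixed in advance; it emerges from the preference values, with higher preferences yielding more exemplars.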
  20. But wait... is this all about tweaking algorithms? 05 Working with the Data
  21. What about the data? <ul><li>Data massaging </li></ul><ul><ul><li>Denoising: can we remove outliers and/or estimate noise? </li></ul></ul><ul><ul><ul><li>We are working on estimating the noise inherent in an absolute, quantized rating system. </li></ul></ul></ul><ul><ul><li>Removing global effects </li></ul></ul><ul><ul><ul><li>User tendencies (e.g., to rate higher than others) </li></ul></ul></ul><ul><ul><ul><li>Movie tendencies </li></ul></ul></ul><ul><ul><ul><li>Cross tendencies (movie vs. time...) </li></ul></ul></ul>
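Removing global effects can be sketched as a sequence of simple estimates: a global mean, then a per-movie offset, then a per-user offset fitted to whatever the movie offset leaves unexplained. A minimal sketch; the order of the effects and the shrinkage constant `shrink` (which pulls offsets estimated from few ratings toward zero) are assumptions, and the time-dependent cross tendencies mentioned on the slide are left out.

```python
from collections import defaultdict


def remove_global_effects(triples, shrink=25.0):
    """Subtract a global mean, then per-movie and per-user offsets, from (user, movie, rating) triples.

    Returns (mu, movie_bias, user_bias, residuals)."""
    triples = list(triples)
    mu = sum(r for _, _, r in triples) / len(triples)

    # Movie tendency: average deviation from mu, shrunk toward 0 for rarely rated movies.
    sums, counts = defaultdict(float), defaultdict(int)
    for _, m, r in triples:
        sums[m] += r - mu
        counts[m] += 1
    movie_bias = {m: sums[m] / (counts[m] + shrink) for m in sums}

    # User tendency (e.g. to rate higher than others), fitted after the movie effect.
    sums, counts = defaultdict(float), defaultdict(int)
    for u, m, r in triples:
        sums[u] += r - mu - movie_bias[m]
        counts[u] += 1
    user_bias = {u: sums[u] / (counts[u] + shrink) for u in sums}

    residuals = [(u, m, r - mu - movie_bias[m] - user_bias[u]) for u, m, r in triples]
    return mu, movie_bias, user_bias, residuals


data = [("ana", "A", 5), ("ana", "B", 5), ("ben", "A", 3), ("ben", "B", 1)]
mu, movie_bias, user_bias, residuals = remove_global_effects(data, shrink=0.0)
print(mu, movie_bias, user_bias)
# prints 3.5 {'A': 0.5, 'B': -0.5} {'ana': 1.5, 'ben': -1.5}
```

The residuals are what a collaborative-filtering or factorization model would then be trained on, so it no longer has to spend capacity explaining that “ana” simply rates everything high.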
  22. Approaching the sparsity problem <ul><li>A different (although complementary) approach to reducing data sparsity is to improve the data set itself. </li></ul><ul><li>Two possibilities: </li></ul><ul><ul><li>Content-based approach </li></ul></ul><ul><ul><ul><li>“Group” similar items because they share important features (such as genre or director in films) to reduce dimensions </li></ul></ul></ul><ul><ul><ul><li>Add editorial data from external sources </li></ul></ul></ul><ul><ul><li>User-based approach </li></ul></ul><ul><ul><ul><li>Are there users “out there” who can provide the missing data? </li></ul></ul></ul>
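Grouping items that share important features can be sketched with Jaccard similarity over feature sets and a greedy, threshold-based grouping. The feature sets, the 0.5 threshold and the greedy strategy are illustrative assumptions, not the grouping method used in this work.

```python
def jaccard(a, b):
    """Share of features two items have in common (0 = none, 1 = identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_items(features, threshold=0.5):
    """Greedily put each item into the first group whose seed item is similar enough."""
    groups = []  # list of (seed feature set, member item names)
    for item, feats in features.items():
        for seed, members in groups:
            if jaccard(seed, feats) >= threshold:
                members.append(item)
                break
        else:
            groups.append((feats, [item]))  # no similar group: this item seeds a new one
    return [members for _, members in groups]


# Hypothetical editorial features (genre and director) for four films.
FEATURES = {
    "Alien": {"sci-fi", "horror", "dir:Ridley Scott"},
    "Blade Runner": {"sci-fi", "dir:Ridley Scott"},
    "Annie Hall": {"comedy", "dir:Woody Allen"},
    "Manhattan": {"comedy", "drama", "dir:Woody Allen"},
}
print(group_items(FEATURES))
# prints [['Alien', 'Blade Runner'], ['Annie Hall', 'Manhattan']]
```

Ratings can then be pooled per group, so four sparse movie columns become two denser ones.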
  23. User-oriented data approach <ul><li>Adding “expert” users might help in clustering the data set </li></ul><ul><li>We are crawling the web to find complementary information from users such as critics, or from users of services similar to Netflix </li></ul>
  24. Algorithms + data + all those other things <ul><li>Serendipity </li></ul><ul><li>User Interface </li></ul><ul><li>Architecture </li></ul><ul><li>... </li></ul>
  25. 06 What does Telefonica get out of all this?
  26. Some TEF projects using RS <ul><li>Online picture repository </li></ul><ul><li>IPTV program recommendation (Imagenio) </li></ul><ul><li>Personalized Advertisement Placement (hyper-targeting) </li></ul><ul><li>Music recommendation on the cell phone </li></ul><ul><li>Product recommendation for online stores </li></ul>[Diagram: Recommendation Systems across Multimedia, Entertainment, E-commerce, Social Networking, News/Blogs/Portals and Communities; labels: Platform, Products and Services, Commercialization, Content Packaging and Design, Devices, Access, Customers]
  27. 07 Conclusions <ul><li>A key technology in the coming years </li></ul><ul><li>Many areas to improve and a large, unexplored research field </li></ul><ul><ul><li>An area related to many traditional disciplines: Computer Science, Statistics, Economics, Sociology... </li></ul></ul><ul><li>Research results are immediately applicable </li></ul><ul><ul><li>And they generate revenue </li></ul></ul><ul><li>The core approach is reusable in many cases </li></ul>