Raiders of the Lost Star


    Raiders of the Lost Star - Presentation Transcript

    1. TELEFÓNICA I+D, November 27, 2007
      Raiders of the Lost Star: Coping with Information Overflow through Recommendations
      Xavier Amatriain, Researcher
      © 2007 Telefónica Investigación y Desarrollo, S.A. Unipersonal
      Give us the content
      Index
      • 01 Introduction
      • 02 Recommender Systems
      • 03 The Netflix Prize
      • 04 The Sparsity Problem
      • 05 Working with the Data
      • 06 So what does Telefónica get out of all this?
      • 07 Conclusions
    2. A little icebreaker: What movie do you like?
    3. Information overload: “People read around 10 MB worth of material a day, hear 400 MB a day, and see one MB of information every second” (The Economist, November 2006)
    4. Tell me what you like...
      • Tell me what you like and I will tell you who you are
      • Tell me who you know and I will tell you what you like
      • Tell me what you have and I will tell you what you need
    5. The value of recommendations
      • Netflix: 2/3 of the movies rented were recommended
      • Google News: recommendations generate 38% more clickthrough
      • Amazon: 35% of sales come from recommendations
      • Choicestream: 28% of people would buy more music if they found what they liked
    6. 02 Recommender Systems
    7. The “Recommender problem”
      • Estimate a utility function that automatically predicts how much a user will like an item she does not yet know, based on:
        • Past behavior
        • Relations to other users
        • Item similarity
        • ...
    8. Approaches to Recommendation
      • Collaborative Filtering
        • Recommend items based only on how other users have previously rated those items
        • User-based
          • Find users similar to me and recommend what those users liked
        • Item-based
          • Find items similar to those I have previously liked
      • Content-based
        • Recommend based on features inherent to the items
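The user-based variant above can be made concrete with a small sketch. It is not the method used in this talk; it assumes ratings live in a dict mapping each user to a dict of item → rating, uses cosine similarity between users (Pearson correlation is another common choice) and averages the ratings of the `k` most similar users. The names `cosine`, `predict_user_based` and `k` are hypothetical.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse rating vectors (missing = 0)."""
    common = a.keys() & b.keys()
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    if not common or norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    return dot / (norm_a * norm_b)


def predict_user_based(ratings: Ratings, user: str, item: str, k: int = 2) -> Optional[float]:
    """Similarity-weighted average of the ratings that the k users most similar
    to `user` gave to `item`; None if no positively similar user rated it."""
    neighbours = sorted(
        ((cosine(ratings[user], ratings[v]), ratings[v][item])
         for v in ratings if v != user and item in ratings[v]),
        reverse=True,
    )
    top = [(s, r) for s, r in neighbours[:k] if s > 0]
    if not top:
        return None
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


# Toy data (hypothetical): ann agrees with bob and disagrees with cid.
ratings = {
    "ann": {"a": 5, "b": 1},
    "bob": {"a": 5, "b": 1, "c": 5},
    "cid": {"a": 1, "b": 5, "c": 1},
}
print(predict_user_based(ratings, "ann", "c"))  # ≈ 3.62, pulled towards bob's 5
```

Raw cosine treats a missing rating as 0, which is one reason mean-centred measures are often preferred in practice.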
    9. What works?
      • What works clearly depends on the domain of the recommender: domain-specific modeling
      • However, in the general case, the best single approach has (so far) been shown to be item-based collaborative filtering.
        • Other approaches can be hybridized to improve results in specific cases (cold-start problem...)
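Since item-based collaborative filtering is singled out above, here is a hedged sketch of one common variant: item-item similarity computed as an adjusted cosine (each rating centred on its user's mean), followed by a weighted average of the user's own ratings on the most similar items. The data layout, the `k` parameter and the fallback to the user's mean are illustrative assumptions, not details from the talk.

```python
import math
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def adjusted_cosine(ratings: Ratings, means: Dict[str, float], i: str, j: str) -> float:
    """Similarity of items i and j over the users who rated both, with each
    rating centred on that user's mean (removes per-user rating scale)."""
    num = den_i = den_j = 0.0
    for user, row in ratings.items():
        if i in row and j in row:
            di, dj = row[i] - means[user], row[j] - means[user]
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / math.sqrt(den_i * den_j)


def predict_item_based(ratings: Ratings, user: str, target: str, k: int = 20) -> float:
    """Weighted average of the user's own ratings on the k items most similar
    to `target`; falls back to the user's mean if no positive neighbour exists."""
    means = {u: sum(row.values()) / len(row) for u, row in ratings.items()}
    scored = sorted(
        ((adjusted_cosine(ratings, means, target, j), r)
         for j, r in ratings[user].items() if j != target),
        reverse=True,
    )
    top = [(s, r) for s, r in scored[:k] if s > 0]
    if not top:
        return means[user]
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


ratings = {
    "u1": {"a": 5, "b": 5, "c": 1},
    "u2": {"a": 4, "b": 4, "c": 2},
    "u3": {"a": 1, "b": 1, "c": 5},
    "u4": {"a": 5, "c": 1},  # u4 has not rated b
}
print(predict_item_based(ratings, "u4", "b"))  # b behaves like a, so close to u4's 5
```

Item-item similarities depend only on the rating matrix, so in a production setting they can be precomputed offline, which is part of why this approach scales well.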
    10. 03 The Netflix Prize
    11. The Netflix Prize
      • 500,000 users × 17,000 movie titles, 100M ratings: $1M if you “only” improve the existing system by 10% (from 0.95 to 0.85 RMSE)
        • This is what Netflix thinks a 10% improvement is worth for their business
        • 29K contestants on 23K teams from 165 countries.
        • 19K valid submissions from 2700 teams; 59 submissions in the “last 24 hours”
    12. The Netflix Prize
      • First conclusion: it is extremely simple to reach “reasonable” recommendations and extremely difficult to improve on them.
    13. The Netflix Prize
      • Apart from the (extremely unlikely) possibility of winning the $1M, it is a great source of data and of measurable improvement.
        • 100M ratings from 1 to 5
        • Measure of success: RMSE
      • Most successful teams use item-based collaborative filtering and some sort of matrix factorization (such as SVD) and...
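RMSE, the measure of success listed above, and the relative improvement the prize is stated in, fit in a few lines. The function names are illustrative.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predictions and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("expected two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def improvement(baseline_rmse: float, new_rmse: float) -> float:
    """Relative RMSE reduction over the baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))
print(improvement(0.95, 0.855))  # ≈ 10.0
```

A 10% reduction from 0.95 lands at 0.855, so the “0.85” target above is a rounded figure.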
    14. The Netflix Prize
      • The current leader is at an 8.5% improvement (blending 107 individual predictors built with all sorts of techniques)
      • Many teams are merging
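The slide does not say how the leader blends its 107 predictors. One common approach, assumed here, is a linear blend whose weights are fit by (ridge) least squares on a held-out set of ratings; the sketch below solves the normal equations directly. The names `fit_blend`, `blend` and `reg` are hypothetical.

```python
from typing import List, Sequence


def fit_blend(preds: List[Sequence[float]], targets: Sequence[float],
              reg: float = 1e-3) -> List[float]:
    """Ridge-regression weights for a linear blend of predictors.

    preds[n][m] is predictor m's estimate for held-out rating n and targets[n]
    is the true rating. Solves (P^T P + reg * I) w = P^T y by Gaussian elimination.
    """
    m = len(preds[0])
    A = [[sum(row[i] * row[j] for row in preds) + (reg if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(row[i] * y for row, y in zip(preds, targets)) for i in range(m)]
    for col in range(m):
        # partial pivoting for numerical stability
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # back substitution
    w = [0.0] * m
    for i in reversed(range(m)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, m))) / A[i][i]
    return w


def blend(row: Sequence[float], weights: Sequence[float]) -> float:
    """Blended prediction for one rating from the individual predictions."""
    return sum(p * w for p, w in zip(row, weights))


probe = [[3.8, 4.1], [2.2, 2.9], [4.6, 4.4], [1.5, 2.0]]  # two toy predictors
truth = [4, 3, 5, 1]
w = fit_blend(probe, truth)
print(w, blend([3.0, 3.5], w))
```

Appending a constant 1.0 to every row adds an intercept term to the blend.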
    15. 04 The Sparsity Problem
    16. The Sparsity Problem
      • If you represent the Netflix rating data in a User/Movie matrix you get...
        • 500,000 × 17,000 = 8,500M positions
        • Of these, only 100M (about 1.2%) are non-zero!
      • Methods of dimensionality reduction
        • Matrix Factorization
        • Clustering
        • Projection (PCA ...)
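The figures on this slide can be checked with a few lines of arithmetic. The 4-bytes-per-cell assumption (a dense float32 matrix) is ours, added only to show why a dense representation is impractical.

```python
def density(n_users: int, n_items: int, n_ratings: int) -> float:
    """Fraction of the user x item matrix that actually holds a rating."""
    return n_ratings / (n_users * n_items)


def dense_bytes(n_users: int, n_items: int, bytes_per_cell: int = 4) -> int:
    """Memory for a dense matrix, e.g. 4 bytes per float32 cell."""
    return n_users * n_items * bytes_per_cell


d = density(500_000, 17_000, 100_000_000)
print(f"{d:.2%} filled, {1 - d:.2%} missing")          # 1.18% filled, 98.82% missing
print(dense_bytes(500_000, 17_000) / 1e9, "GB dense")  # 34.0 GB dense
```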
    17. Dimensionality Reduction
      • Matrix Factorization
        • This is so far the “winning horse”
        • In particular the Singular Value Decomposition method (Simon Funk's modified SVD)
      • Clustering
        • Similar results can be obtained, but at a higher computational cost (so far, many “traditional” algorithms such as k-NN have been tried, with varying results).
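The “winning horse” can be illustrated with a small stochastic-gradient matrix factorization in the spirit of Simon Funk's approach. This is a simplification, not a reproduction of his code: it learns all factors at once (Funk trained one feature at a time), models a rating as the global mean plus a dot product of user and item factor vectors, and the learning rate, regularization and epoch count are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

Triple = Tuple[str, str, float]  # (user, item, rating)


def train_mf(ratings: List[Triple], n_factors: int = 2, lr: float = 0.01,
             reg: float = 0.02, epochs: int = 200, seed: int = 0):
    """Fit r_ui ~ mu + p_u . q_i by SGD over the observed ratings only."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    # sorted() keeps initialisation reproducible regardless of set ordering
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)]
         for u in sorted({u for u, _, _ in ratings})}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)]
         for i in sorted({i for _, i, _ in ratings})}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - (mu + sum(a * b for a, b in zip(pu, qi)))
            for f in range(n_factors):
                pf, qf = pu[f], qi[f]
                # gradient step on squared error with L2 regularization
                pu[f] += lr * (err * qf - reg * pf)
                qi[f] += lr * (err * pf - reg * qf)
    return mu, P, Q


def predict(model, user: str, item: str) -> float:
    mu, P, Q = model
    return mu + sum(a * b for a, b in zip(P[user], Q[item]))


def train_rmse(model, ratings: List[Triple]) -> float:
    se = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


ratings = [("a", "x", 5), ("a", "y", 1), ("b", "x", 5), ("b", "y", 1),
           ("c", "x", 1), ("c", "y", 5), ("d", "x", 1), ("d", "y", 5),
           ("e", "x", 5)]  # e has not rated y
model = train_mf(ratings, n_factors=1, lr=0.05, epochs=300)
print(round(train_rmse(model, ratings), 2), round(predict(model, "e", "y"), 2))
```

Only observed ratings are visited, so the cost per epoch grows with the 100M known entries rather than the 8,500M cells of the full matrix.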
    18. Our approach to Dimensionality Reduction
      • We are experimenting with message-passing clustering algorithms
        • Affinity Propagation (Frey & Dueck, Science, February 2007)
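Affinity Propagation exchanges two kinds of messages, responsibilities and availabilities, until a set of exemplars emerges. The sketch below follows the published update rules on 1-D toy points; the negative-squared-distance similarity, the median preference, damping 0.5 and 200 iterations are illustrative assumptions, and how the experiments mentioned here applied it to Netflix users or items is not described. This dense version stores n × n messages, so it only suits small examples.

```python
import statistics
from typing import List

Matrix = List[List[float]]


def similarity_matrix(points: List[float]) -> Matrix:
    """Negative squared distances; the diagonal holds each point's
    'preference' to be an exemplar, set here to the median similarity."""
    n = len(points)
    S = [[-(points[i] - points[k]) ** 2 for k in range(n)] for i in range(n)]
    pref = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = pref
    return S


def affinity_propagation(S: Matrix, damping: float = 0.5, iterations: int = 200) -> List[int]:
    """Return, for each point, the index of its exemplar (-1 if none emerged)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            first = vals[best]
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            others = sum(pos) - pos[k]
            for i in range(n):
                new = others if i == k else min(0.0, R[k][k] + others - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [-1] * n
    # each point joins its most similar exemplar; exemplars label themselves
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
print(affinity_propagation(similarity_matrix(points)))
```

Unlike k-means, the number of clusters is not fixed in advance; it follows from the preference values on the diagonal.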
    19. But wait... Is this all about tweaking algorithms? (05 Working with the Data)
    20. What about the data?
      • Data massaging
        • Denoising – can we remove outliers and/or estimate noise?
          • We are working on estimating the noise inherent in the absolute, quantized rating system.
        • Remove global effects
          • User tendencies (e.g. to rate higher than others)
          • Movie tendencies
          • Cross tendencies (movie vs. time...)
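Removing the user and movie tendencies listed above is often done with a baseline of the form global mean + movie bias + user bias, each bias a shrunk average of residuals so that sparsely rated movies and users stay near zero. The sketch below covers only those two effects (no time or other cross effects), and the shrinkage constants are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, float]  # (user, item, rating)


def global_effects(ratings: List[Triple], reg_item: float = 25.0, reg_user: float = 10.0):
    """Estimate the global mean, item (movie) biases and user biases.

    Each bias is sum(residual) / (reg + count), so items or users with
    few ratings are shrunk towards zero.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)

    item_sum: Dict[str, float] = defaultdict(float)
    item_n: Dict[str, int] = defaultdict(int)
    for _, i, r in ratings:
        item_sum[i] += r - mu
        item_n[i] += 1
    b_item = {i: item_sum[i] / (reg_item + item_n[i]) for i in item_sum}

    user_sum: Dict[str, float] = defaultdict(float)
    user_n: Dict[str, int] = defaultdict(int)
    for u, i, r in ratings:
        user_sum[u] += r - mu - b_item[i]  # residual after the movie effect
        user_n[u] += 1
    b_user = {u: user_sum[u] / (reg_user + user_n[u]) for u in user_sum}
    return mu, b_user, b_item


def remove_global_effects(ratings: List[Triple], mu, b_user, b_item) -> List[Triple]:
    """Residual ratings that a CF or factorization model would then fit."""
    return [(u, i, r - mu - b_user[u] - b_item[i]) for u, i, r in ratings]


ratings = [("ann", "m1", 5), ("ann", "m2", 4), ("bob", "m1", 3), ("bob", "m2", 2)]
mu, b_user, b_item = global_effects(ratings, reg_item=0.0, reg_user=0.0)
print(remove_global_effects(ratings, mu, b_user, b_item))  # all residuals 0.0 here
```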
    21. Approaching the sparsity problem
      • A different (though complementary) approach to reducing data sparsity is to improve the data set itself.
      • Two possibilities:
        • Content-based approach
          • “Group” items that share important features (such as genre or director in films) to reduce dimensions
          • Add editorial data from external sources
        • User-based approach
          • Are there users “out there” who can provide the missing data?
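The content-based option above, grouping items that share features such as genre or director, can be sketched with Jaccard similarity over feature sets. The feature encoding (strings such as "genre:adventure"), the 0.3 threshold and the toy catalogue are assumptions for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of features two items have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(features: Dict[str, Set[str]], item: str,
                  threshold: float = 0.3) -> List[str]:
    """Items whose feature overlap with `item` reaches `threshold`, most similar first."""
    target = features[item]
    scores = {other: jaccard(target, feats)
              for other, feats in features.items() if other != item}
    return sorted((o for o, s in scores.items() if s >= threshold),
                  key=lambda o: -scores[o])


catalogue = {
    "Raiders of the Lost Ark": {"genre:adventure", "director:Spielberg"},
    "Indiana Jones and the Temple of Doom": {"genre:adventure", "director:Spielberg"},
    "Jaws": {"genre:thriller", "director:Spielberg"},
    "Alien": {"genre:sci-fi", "director:Scott"},
}
print(similar_items(catalogue, "Raiders of the Lost Ark"))
# ['Indiana Jones and the Temple of Doom', 'Jaws']
```

Because it needs no ratings, this kind of grouping can also help with the cold-start cases mentioned earlier.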
    22. User-oriented data approach
      • Adding “expert” users might help in clustering the data set
      • We are crawling the web for complementary user data, such as ratings from critics or from users of services similar to Netflix
    23. Algorithms + data + all those other things
      • Serendipity
      • User Interface
      • Architecture
      • ...
    24. 06 What does Telefónica get out of all this?
    25. Some TEF projects using RS
      • Online picture repository
      • IPTV program recommendation (Imagenio)
      • Personalized Advertisement Placement (hyper-targeting)
      • Music recommendation on the cell phone
      • Product recommendation for online stores
      [Diagram: Recommendation Systems across Platform, Products and Services, and Commercialization. Labels: Multimedia, Entertainment, E-commerce, Social Networking, News/Blogs/Portals, Communities, Content Packaging and Design, Devices, Access, Customers]
    26. 07 Conclusions
      • Key technology in the coming years
      • Many areas to improve and large unexplored research field
        • Area related to many traditional disciplines: Computer Science, Statistics, Economics, Sociology...
      • Research results immediately applicable
        • And generate revenues
      • Core approach is reusable in many cases