Recommender Systems

A survey of current recommendation technologies, including our latest research at IMPCA, Curtin University of Technology.

  1. RecSys: Recommender Systems. Tran The Truyen, http://truyen.vietlabs.com
  2. The world is an over-crowded place
  3. They all want to get our attention
  4. We are overloaded • Thousands of news articles and blog posts each day • Millions of movies, books and music tracks online • In Hanoi, > 50 TV channels and thousands of programs each day • In New York, several thousand ad messages reach us per day
  5. But we really need and consume only a few of them!
  6. Sometimes, all we need is this
  7. Or, just this! [Image: a "DON'T DISTURB" door sign]
  8. Help me!
  9. Can Google help? • Yes, but only when we really know what we are looking for • What if I just want some interesting music tracks? – Btw, what does "interesting" mean, exactly?
  10. Can Facebook help? • Yes, I tend to find my friends' stuff interesting • What if I have only a few friends, and what they like does not always attract me?
  11. Can experts help? • Yes, but it won't scale well – Everyone receives exactly the same advice! • It is what they like, not what I like! – Take movies: what gets expert approval does not guarantee the attention of the masses
  12. OK, here is the idea, called RecSys (speech bubble: "I like these bits") • To recommend to us something we may like – It may not be popular – The world is long-tailed • How? – Based on our history of using services – Based on other people who are like us – Ever heard of "collective intelligence"?
  13. Hang on, what is long-tailed? • Popularised by Chris Anderson, Wired 2004 [Figure: three distribution shapes: short-tailed, bell-shaped, and long-tailed]
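To make "long-tailed" concrete, here is a tiny illustration; the 1/rank (Zipf) popularity curve is an assumption for the sake of the example, not data from any real catalogue:

```python
# Illustrative only: an idealised Zipf (1/rank) popularity curve over 10,000 items.
import numpy as np

ranks = np.arange(1, 10001)
popularity = 1.0 / ranks                  # popularity falls off with rank
share_top_100 = popularity[:100].sum() / popularity.sum()
print(f"top 1% of items capture {share_top_100:.0%} of all attention")
# ... yet the remaining 99% (the long tail) still add up to nearly half
```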
  14. Ever heard of • GroupLens? • Amazon recommendation? • Netflix Cinematch? • Google News personalization? • Netflix Prize $1mil challenge? • Strands? • TiVo? • Findory?
  15. Want some evidence? (Celma & Lamere, ISMIR 2007) • Netflix: – 2/3 of rented movies come from recommendations • Google News: – 38% more click-throughs are due to recommendations • Amazon: – 35% of sales come from recommendations
  16. What can be recommended? • Advertising messages • Tags • Investment choices • News articles • Restaurants • Online mates (Dating services) • Cafes • Future friends (Social network sites) • Music tracks • Courses in e-learning • Movies • Drug components • TV programs • Research papers • Books • Citations • Clothes • Code modules • Supermarket goods • Programmers
  17. But, what do recommender systems do, exactly? 1. Predict how much you may like a certain product/service 2. Compose a list of N best items for you 3. Compose a list of N best users for a certain product/service 4. Explain to you why these items are recommended to you 5. Adjust the prediction and recommendation based on your feedback and other people
  18. Graph representation [Figure: a bipartite graph linking users (Me, My friend, You, Another guy) to items (Titanic, Taken, Panda); a "?" edge marks the rating to predict]
  19. We must also take good care of • Data normalisation • Removal or reduction of noise • Protection of users' privacy • Attacks: someone may just not like your system
  20. Task 1: Preference prediction • Collaborative filtering – User-based method – Item-based method – Matrix Factorization • Content-based filtering • Hybrid: – Linear/sequential/switching combination – Semi-Restricted Boltzmann Machines
  21. Collaborative filtering (1) • User-based method (1994, GroupLens) – Many people liked "Kungfu Panda" – Can you tell how much I will like it? – The idea is to pick about 20-50 people who share a similar taste with me; how much I like it then depends on how much THEY liked it – In short: you may like it because your "friends" liked it [Figure: an 8 x 8 user-item rating matrix]
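A minimal sketch of the user-based idea, assuming a made-up toy ratings matrix where 0 means "not rated"; a real system would also mean-center ratings (Pearson correlation) and require a minimum rating overlap between users:

```python
import numpy as np

def predict_user_based(R, u, i, k=2):
    """Predict user u's rating of item i from the k most similar users."""
    sims = []
    for v in range(R.shape[0]):
        if v == u or R[v, i] == 0:        # skip self and non-raters of item i
            continue
        both = (R[u] > 0) & (R[v] > 0)    # items rated by both u and v
        if not both.any():
            continue
        a, b = R[u, both], R[v, both]
        sims.append((v, a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
    top = sorted(sims, key=lambda t: -t[1])[:k]   # k nearest neighbours
    return sum(s * R[v, i] for v, s in top) / sum(s for _, s in top)

R = np.array([[5, 4, 0, 3],               # rows: users, columns: items
              [4, 5, 3, 0],
              [1, 0, 5, 4],
              [0, 2, 4, 5]], dtype=float)
print(round(predict_user_based(R, u=0, i=2), 2))   # -> about 3.47
```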
  22. Collaborative filtering (2) • Item-based method (2001, deployed at Amazon) – I have watched so many good and bad movies – Would you recommend that I watch "Taken"? – The idea is to pick from my previous list the 20-50 movies that share a similar audience with "Taken"; how much I will like it then depends on how much I liked those earlier movies – In short: I tend to watch this movie because I have watched those movies … or – People who have watched those movies also liked this movie (Amazon style) [Figure: the same 8 x 8 user-item rating matrix]
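The item-based variant on the same toy data: the only real change is that similarity is now computed between item columns, and the prediction averages over the items the user has already rated (a sketch, not Amazon's actual algorithm):

```python
import numpy as np

R = np.array([[5, 4, 0, 3],               # rows: users, columns: items
              [4, 5, 3, 0],
              [1, 0, 5, 4],
              [0, 2, 4, 5]], dtype=float)

norms = np.linalg.norm(R, axis=0)
S = (R.T @ R) / np.outer(norms, norms)    # item-item cosine similarity

u, i, k = 0, 2, 2                         # predict user 0's rating of item 2
rated = list(np.where(R[u] > 0)[0])       # items the user has already rated
top = sorted(rated, key=lambda j: -S[i, j])[:k]   # most similar rated items
pred = sum(S[i, j] * R[u, j] for j in top) / sum(S[i, j] for j in top)
print(round(pred, 2))                     # -> about 3.38
```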
  23. Collaborative filtering (3) • Matrix Factorization (2006, Netflix challenge) – You may have watched thousands of movies – But perhaps I can tell that these movies belong to 10 groups, like Action, Sci-Fi, Animation, etc. – So 10 numbers are enough to describe your taste, e.g. ~ [0.1 0.3 0.2 0.9 0.5 0.4 0.7 0.3 0.8 1.5] – Likewise, "Titanic" has been watched by millions of people, but perhaps … 10 numbers are enough to describe its features – Magic: these hidden aspects can be discovered automatically by Matrix Factorization!
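A minimal matrix-factorization sketch trained by stochastic gradient descent, in the spirit of the Netflix-era approach; the ratings, the K = 2 hidden dimensions, and the learning rate and regularisation constants are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, K = 4, 4, 2          # K hidden 'taste' dimensions
ratings = [(0, 0, 5), (0, 1, 4), (0, 3, 3), (1, 0, 4), (1, 1, 5), (1, 2, 3),
           (2, 0, 1), (2, 2, 5), (2, 3, 4), (3, 1, 2), (3, 2, 4), (3, 3, 5)]

P = 0.1 * rng.standard_normal((num_users, K))   # user factors
Q = 0.1 * rng.standard_normal((num_items, K))   # item factors

lr, reg = 0.05, 0.02
for epoch in range(200):
    for u, i, r in ratings:
        pu = P[u].copy()
        err = r - pu @ Q[i]                     # prediction error on this rating
        P[u] += lr * (err * Q[i] - reg * pu)    # gradient step on user factors
        Q[i] += lr * (err * pu - reg * Q[i])    # ... and on item factors

print(round(float(P[0] @ Q[2]), 2))  # fill in the missing (user 0, item 2) cell
```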
  24. Problems with collaborative filtering • Scale – Netflix (2007): 5M users, 50K movies, 1.4B ratings • Sparse data – I have rated only one book at Amazon! • Cold start – New users and items have no history • Popularity bias – Everyone reads "Harry Potter" • Hacking – Someone fakes that whoever reads "Harry Potter" also reads "Kama Sutra"
  25. Content-based method • Web page: words, hyperlinks, images, tags, comments, titles, URL, topic • Music: genre, rhythm, melody, harmony, lyrics, meta data, artists, bands, press releases, expert reviews, loudness, energy, time, spectrum, duration, frequency, pitch, key, mode, mood, style, tempo • User: age, sex, job, location, time, income, education, language, family status, hobbies, general interests, Web usage, computer usage, fan club membership, opinion, comments, tags, mobile usage • Context: time, location, mobility, activity, socializing, emotion
  26. Content-based method (2) • Can we acquire those content pieces automatically? – Fairly easy for text – Difficult for music and video, where we must work from the digital signal; e.g. music genre classification reaches 60-80% accuracy – A lot of noise, e.g. misplaced tags – Attacks • What can we do with these? – Compute similarity between items or users – Query items that are similar to a given item – Match an item's content against a user's profile
  27. Content-based method (3) • Measuring similarity – Cosine, TF-IDF as in standard Information Retrieval – KL-divergence for the probability-oriented – Euclidean distance, after dimensionality reduction if you want – Anything you can imagine!
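The measures above in a few lines, applied to made-up toy feature vectors; in practice the vectors would come from the content features listed on the previous slides:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; note it is not symmetric."""
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

a = np.array([0.1, 0.3, 0.6])   # e.g. genre proportions of one music track
b = np.array([0.2, 0.3, 0.5])
print(cosine(a, b), kl_divergence(a, b), euclidean(a, b))
```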
  28. Hybrid: Semi-Restricted Boltzmann Machines (2009, IMPCA) • A probabilistic combination of – Item-based method – User-based method – Matrix Factorization – (Maybe) content-based method • It looks like a Neural Network – But it is not really one ☺ • It really is a type of Markov random field, which is, in turn, a type of Graphical Model – Self-advertising: I work on this stuff for a living! [Figure: a network of binary units connecting User A, User B, User C and Item X]
  29. But, what do recommender systems do, exactly? 1. Predict how much you may like a certain product/service 2. Compose a list of N best items for you 3. Compose a list of N best users for a certain product/service 4. Explain to you why these items are recommended to you 5. Adjust the prediction and recommendation based on your feedback and other people
  30. Task 2,3: Top-N recommendation • Top-N item list: – Find similar users, collect what they like – Filter out those the user has rated – Rank the remaining items by considering • The number of times each item is liked by those users • The popularity of the item • The associated ratings • The similarity between each item in the list and what the user has rated • Switching the role of item to user, we may have top-N user list
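A bare-bones top-N sketch following the recipe above: find similar users, collect what they liked, drop what the target user has already rated, and rank by similarity-weighted votes (toy data again; the popularity and content terms in the list are omitted for brevity):

```python
import numpy as np

def top_n(R, u, n=2, k=2):
    norms = np.linalg.norm(R, axis=1) + 1e-9
    sims = (R @ R[u]) / (norms * norms[u])     # cosine to every other user
    sims[u] = -np.inf                          # exclude the user themselves
    neighbours = np.argsort(-sims)[:k]         # k most similar users
    scores = np.zeros(R.shape[1])
    for v in neighbours:
        scores += sims[v] * R[v]               # similarity-weighted votes
    scores[R[u] > 0] = -np.inf                 # filter out already-rated items
    return [int(j) for j in np.argsort(-scores)[:n]]

R = np.array([[5, 4, 0, 3, 0],                 # rows: users, columns: items
              [4, 5, 3, 0, 2],
              [1, 0, 5, 4, 5],
              [0, 2, 4, 5, 4]], dtype=float)
print(top_n(R, u=0))                           # item ids to recommend
```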
  31. But, what do recommender systems do, exactly? 1. Predict how much you may like a certain product/service 2. Compose a list of N best items for you 3. Compose a list of N best users for a certain product/service 4. Explain to you why these items are recommended to you 5. Adjust the prediction and recommendation based on your feedback and other people
  32. Task 4: Explanation • This is a current hit … • More on this artist … • Try something from similar artists … • Someone similar to you also likes this … • As you listened to that, you may want this … • These two go together … • This is most popular in your group … • This is highly rated … • Try something new …
  33. Task 4: Explanation (2) • Examples from Strands.com – Welcome back (recently viewed) – For you today – New for you – Hot / Most popular of this type – Other people also do this … – Similar or related products – Complementary accessories – This goes with this … – Gift idea – Shopping assistant
  34. But, what do recommender systems do, exactly? 1. Predict how much you may like a certain product/service 2. Compose a list of N best items for you 3. Compose a list of N best users for a certain product/service 4. Explain to you why these items are recommended to you 5. Adjust the prediction and recommendation based on your feedback and other people
  35. Task 5: Online updating • New items and users come each hour or minute • The two worlds: – Most songs and books are still interesting for a long time (the tail is really long) – Most news articles are read on the day and forgotten next day • But tracking back is useful to follow an event or scandal • Online updating large-scale neighbour-based systems is NOT easy at all
  36. Evaluation • How do we know the recommendation is good? – How good is good? – Measures should be automated • Practice: training/testing split (e.g. 80/20) • Popular criteria – Prediction error: ZOE (zero-one error), MAE, RMSE – Hit recall/precision/F-measure, rank utility, ROC curve
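A sketch of the training/testing protocol with MAE and RMSE; the synthetic ratings and the global-mean predictor are placeholders standing in for a real dataset and a real recommender:

```python
import random
import numpy as np

rng = np.random.default_rng(1)
ratings = [(u, int(i), float(rng.integers(1, 6)))   # synthetic (user, item, rating)
           for u in range(20)
           for i in rng.choice(50, size=10, replace=False)]
random.shuffle(ratings)
split = int(0.8 * len(ratings))                     # 80/20 train/test split
train, test = ratings[:split], ratings[split:]

global_mean = np.mean([r for _, _, r in train])
predict = lambda u, i: global_mean     # stand-in for any real recommender

errors = np.array([r - predict(u, i) for u, i, r in test])
print("MAE :", round(float(np.mean(np.abs(errors))), 3))
print("RMSE:", round(float(np.sqrt(np.mean(errors ** 2))), 3))
```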
  37. Evaluation (2) • Yet little on – Relevance – Usefulness – % Increase in purchase – % Reduction in cost – Novelty/surprise/long-tails – Diversity – Coverage – Explainability
  38. A question: Can we make use of these information sources? • Blogs • Social Media • Online comments • Online stores • Review sites • Locations • Mobility
  39. A case study: Strands • Services for any online retailer – Retailers send product and purchase information to the Strands server (one retailer per account) through APIs – Strands returns recommendations for each visitor • The same logic applies to social media servers • moneyStrands for personal financial management (e.g. investment recommendation) • MyStrands for music personalization
  40. Want more practical hints? • New books: – Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007 – Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009 • Check out real deployments at: – TechCrunch – ReadWriteWeb
  41. Want more of the state of the art? • Research in Recommender Systems is becoming mainstream, as evidenced by the recent ACM RecSys conference. • Other places: – ICWSM: Weblog and Social Media – WebKDD: Web Knowledge Discovery and Data Mining – WWW: The original WWW conference – SIGIR: Information Retrieval – ACM KDD: Knowledge Discovery and Data Mining – ICML: Machine Learning
  42. Questions left to you • Will you trust such Recommender Systems? • Will you implement and deploy one here? • Will you do research? – PhD scholarships available (as of 19/4/09) – See http://truyen.vietlabs.com/scholarship.html – Warning: you are going to waste 3-5 years of your youth!
