Collaborative filtering getting_started

Transcript of "Collaborative filtering getting_started"

  1. Collaborative Filtering Algorithms: Getting Started
     Vivek A. Ganesan, vivganes@gmail.com
     Big Data Gods Meetup, Santa Clara, CA, May 13, 2013
  2. Before we start
     o A BIG thank you to our sponsors: Big Data Cloud
       o Meeting Space
       o Support
       o Check out their big data training
     Copyright 2013, Vivek A. Ganesan, All rights reserved
  3. Introduction
     o Program Outline
       o This is an opt-in program, it is FREE! (as in beer)
       o We do social coding (which means you share your code as open source, Apache v2 license)
       o Program duration = 1 month, weekly sprints
       o Weekly meetup (topical + social coding + Q/A)
       o A weekend hackathon (Sat. afternoon) on alternate weeks (deep technical immersion)
       o Demo at the end of the program
  4. Agenda
     o Introduction to CF Algorithms
     o When to use CF?
     o Metrics
     o Exercise
     o Questions?
  5. Introduction to CF Algorithms
     o A family of algorithms used to predict
       o the preference of a user for an item, given
       o a matrix of user preferences for items, where
       o preferences must be expressed numerically (e.g. user ratings of an item on a 1 to 5 integer scale)
     o Collaborative, because it only looks at user preferences and does not take into account user or item attributes
     o Filtering is math speak for selecting a subset
  6. CF : Common sense version
     o Out of a large group of users who have rated items:
       o Pick a “small” subset of users who are “similar” to you
     o Now, for an item that you have not yet rated but your “similar” users have rated:
       o Figure out an “average” rating for the item from your “similar” group of users
       o Weigh it with your rating history and predict a rating
  7. CF : Visual
     User/Movie                     Sleepless in Seattle   Titanic   Terminator 2
     Alice                          5                      5         3
     Bob                            1                      3         5
     Chandra                        3                      5         4
     Dawood                         2                      3         5
     Eduardo (you or active user)   2                      4         ?
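The ratings table above could be held in memory as a nested dictionary. This is only a sketch: the helper names `unrated_items` and `co_rated_items` are assumptions made for illustration and do not come from the slides.

```python
# Ratings from the slide, stored as user -> {movie: rating}.
# A missing key means "not rated yet" (Eduardo has not rated Terminator 2).
ratings = {
    "Alice":   {"Sleepless in Seattle": 5, "Titanic": 5, "Terminator 2": 3},
    "Bob":     {"Sleepless in Seattle": 1, "Titanic": 3, "Terminator 2": 5},
    "Chandra": {"Sleepless in Seattle": 3, "Titanic": 5, "Terminator 2": 4},
    "Dawood":  {"Sleepless in Seattle": 2, "Titanic": 3, "Terminator 2": 5},
    "Eduardo": {"Sleepless in Seattle": 2, "Titanic": 4},
}


def unrated_items(ratings, user):
    """Movies that at least one user has rated but `user` has not."""
    all_items = {movie for row in ratings.values() for movie in row}
    return sorted(all_items - set(ratings[user]))


def co_rated_items(ratings, a, b):
    """Movies rated by both users: the only basis for comparing them."""
    return sorted(set(ratings[a]) & set(ratings[b]))


print(unrated_items(ratings, "Eduardo"))
print(co_rated_items(ratings, "Eduardo", "Bob"))
```

A dictionary of dictionaries is one convenient way to represent a sparse matrix, since most users rate only a small fraction of the items.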
  8. A sample approach
     o Compute Eduardo’s “similarity” to all other users
     o Pick the three users “most similar” to Eduardo
     o Weigh their ratings for Terminator 2 by their degree of similarity to Eduardo
     o Make sure that the predicted rating is within the given scale (0 to 5)
     o … and predict Eduardo’s rating for Terminator 2
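The steps of this sample approach can be sketched end to end as follows. The function names `similarity` and `predict` and the parameters `k`, `lo` and `hi` are assumptions for illustration. The similarity used is the 1/(1 + Euclidean distance) score from the deck, computed over the movies both users have rated, and the final clamp is one simple way to keep the result within the scale.

```python
import math

ratings = {
    "Alice":   {"Sleepless in Seattle": 5, "Titanic": 5, "Terminator 2": 3},
    "Bob":     {"Sleepless in Seattle": 1, "Titanic": 3, "Terminator 2": 5},
    "Chandra": {"Sleepless in Seattle": 3, "Titanic": 5, "Terminator 2": 4},
    "Dawood":  {"Sleepless in Seattle": 2, "Titanic": 3, "Terminator 2": 5},
    "Eduardo": {"Sleepless in Seattle": 2, "Titanic": 4},
}


def similarity(a, b):
    """1 / (1 + Euclidean distance) over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0  # nothing in common, so no evidence of similarity
    distance = math.sqrt(sum((a[m] - b[m]) ** 2 for m in shared))
    return 1.0 / (1.0 + distance)


def predict(ratings, active, item, k=3, lo=0.0, hi=5.0):
    """Similarity-weighted average of the k most similar users' ratings."""
    # Step 1: similarity of the active user to every user who rated the item.
    scored = [
        (similarity(ratings[active], row), row[item])
        for user, row in ratings.items()
        if user != active and item in row
    ]
    # Step 2: keep the k most similar users.
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no usable neighbours
    # Step 3: weigh the ratings by similarity and normalise by the weights.
    estimate = sum(sim * rating for sim, rating in neighbours) / total
    # Step 4: keep the prediction inside the rating scale.
    return min(hi, max(lo, estimate))


print(round(predict(ratings, "Eduardo", "Terminator 2"), 3))
```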
  9. Step 1 : Measuring Similarity
     o Start with a distance metric
       o There are several: let’s pick Euclidean, for example
       o In n dimensions: the square root of the sum of squared differences
     o Convert it to a similarity score (0 to 1)
       o 1/(1 + Euclidean distance) (adding 1 to avoid division by zero)
       o Approaches 0 for a poor match, equals 1 for a perfect match
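A minimal sketch of these two formulas, assuming each user is represented as a list of ratings for the same movies in the same order. The function names are illustrative only.

```python
import math


def euclidean_distance(u, v):
    """Square root of the sum of squared differences between two rating vectors."""
    if len(u) != len(v):
        raise ValueError("both vectors must cover the same items")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similarity(u, v):
    """Map a distance in [0, inf) to a score in (0, 1]; identical vectors give 1."""
    return 1.0 / (1.0 + euclidean_distance(u, v))


# Eduardo and Dawood on (Sleepless in Seattle, Titanic), from the ratings table.
print(euclidean_distance([2, 4], [2, 3]))  # 1.0
print(similarity([2, 4], [2, 3]))          # 0.5
```

Because the distance is never negative, the score never reaches 0 exactly. On a bounded rating scale the largest possible distance is finite, so the lowest score is small but positive.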
  10. CF : Distances & Similarities
     Eduardo’s Euclidean distance and similarity to each user:
     User      Distance   Similarity
     Alice     3.16       0.24
     Bob       1.414      0.414
     Chandra   1.414      0.414
     Dawood    1          0.5
     • Pick the top three users most similar to Eduardo:
       • Dawood, Bob and Chandra
     • Weigh their ratings for Terminator 2 by their degree of similarity to Eduardo:
       • (0.414 x 5) + (0.414 x 4) + (0.5 x 5) = 6.226
     • Oops, too big a rating (0 to 5 scale)!
     • Divide by the sum of similarities (0.414 + 0.414 + 0.5 = 1.328)
     • Answer: 6.226/1.328 = 4.688 (our prediction)
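The arithmetic on this slide can be checked with a short script. This is a sketch using only the ratings from the deck; the variable names are assumptions. With unrounded similarities the weighted sum comes out as 6.228 rather than 6.226, but the prediction still rounds to 4.688.

```python
import math

# Eduardo's ratings for (Sleepless in Seattle, Titanic).
eduardo = (2, 4)

# For each other user: ratings for the same two movies, then Terminator 2.
others = {
    "Alice":   ((5, 5), 3),
    "Bob":     ((1, 3), 5),
    "Chandra": ((3, 5), 4),
    "Dawood":  ((2, 3), 5),
}

table = {}
for name, (shared, terminator2) in others.items():
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(eduardo, shared)))
    table[name] = (distance, 1.0 / (1.0 + distance), terminator2)
    print(f"{name:8s} distance={distance:.3f} similarity={table[name][1]:.3f}")

# The three most similar users, by similarity score.
top3 = sorted(table, key=lambda name: table[name][1], reverse=True)[:3]

weighted_sum = sum(table[n][1] * table[n][2] for n in top3)
similarity_sum = sum(table[n][1] for n in top3)
prediction = weighted_sum / similarity_sum

print(top3)
print(round(weighted_sum, 3), round(similarity_sum, 3), round(prediction, 3))
```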
  11. Improvements
     o Some users rate movies consistently higher and others rate them consistently lower
     o Adjust for this by weighing each similar user’s distance from their own mean rating (instead of the raw rating), and then adding the mean rating of the active user
     o Consult the GroupLens paper for details
     o Use other measures that correct for “grade inflation”, e.g. Pearson’s correlation
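One way to read the mean adjustment is sketched below. Several choices here are assumptions rather than anything stated in the deck: each neighbour's mean is taken over all of that neighbour's ratings, the similarity is still 1/(1 + Euclidean distance), and the name `predict_mean_centered` is made up for this example. The GroupLens paper remains the reference for the exact formulation.

```python
import math

ratings = {
    "Alice":   {"Sleepless in Seattle": 5, "Titanic": 5, "Terminator 2": 3},
    "Bob":     {"Sleepless in Seattle": 1, "Titanic": 3, "Terminator 2": 5},
    "Chandra": {"Sleepless in Seattle": 3, "Titanic": 5, "Terminator 2": 4},
    "Dawood":  {"Sleepless in Seattle": 2, "Titanic": 3, "Terminator 2": 5},
    "Eduardo": {"Sleepless in Seattle": 2, "Titanic": 4},
}


def similarity(a, b):
    """1 / (1 + Euclidean distance) over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return 1.0 / (1.0 + math.sqrt(sum((a[m] - b[m]) ** 2 for m in shared)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def predict_mean_centered(ratings, active, item, k=3, lo=0.0, hi=5.0):
    """Active user's mean plus the weighted average of neighbours' deviations."""
    active_mean = mean(ratings[active].values())
    scored = []
    for user, row in ratings.items():
        if user == active or item not in row:
            continue
        # How far this neighbour's rating sits from their own average
        # (assumption: the average is taken over all of their ratings).
        deviation = row[item] - mean(row.values())
        scored.append((similarity(ratings[active], row), deviation))
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return active_mean  # fall back to the user's own average
    estimate = active_mean + sum(sim * dev for sim, dev in neighbours) / total
    return min(hi, max(lo, estimate))


print(round(predict_mean_centered(ratings, "Eduardo", "Terminator 2"), 3))
```

Under these assumptions the prediction drops from 4.688 to roughly 4.25, because Eduardo's own average rating (3) is lower than the raw ratings of his neighbours. Pearson's correlation is not shown: with only two co-rated movies per pair it can only be +1, -1 or undefined, so this data set is too small to illustrate it.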
  12. A recommendation engine
     o Imagine a much larger data set of users and movie ratings
     o Do the same math for all users against all other users
     o Then predict ratings for the movies that each user has not yet rated
     o For a given user, pick the N movies with the highest predicted ratings and recommend those
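A small sketch of such an engine follows. The ratings for "Aliens" and "Notting Hill" are hypothetical, added only so that there is more than one movie to rank. The functions `predict` and `recommend` and their parameters `k` and `n` are likewise assumptions for illustration.

```python
import math

# Slide ratings plus two HYPOTHETICAL movies (Aliens, Notting Hill).
ratings = {
    "Alice":   {"Sleepless in Seattle": 5, "Titanic": 5, "Terminator 2": 3,
                "Aliens": 2, "Notting Hill": 5},
    "Bob":     {"Sleepless in Seattle": 1, "Titanic": 3, "Terminator 2": 5,
                "Aliens": 4, "Notting Hill": 1},
    "Chandra": {"Sleepless in Seattle": 3, "Titanic": 5, "Terminator 2": 4,
                "Aliens": 4, "Notting Hill": 4},
    "Dawood":  {"Sleepless in Seattle": 2, "Titanic": 3, "Terminator 2": 5,
                "Aliens": 5, "Notting Hill": 2},
    "Eduardo": {"Sleepless in Seattle": 2, "Titanic": 4},
}


def similarity(a, b):
    """1 / (1 + Euclidean distance) over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return 1.0 / (1.0 + math.sqrt(sum((a[m] - b[m]) ** 2 for m in shared)))


def predict(ratings, user, item, k=3):
    """Similarity-weighted average rating from the k most similar users."""
    scored = [
        (similarity(ratings[user], row), row[item])
        for other, row in ratings.items()
        if other != user and item in row
    ]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    return sum(sim * r for sim, r in top) / total if total else None


def recommend(ratings, user, n=2, k=3):
    """The n unrated movies with the highest predicted ratings for `user`."""
    candidates = {m for row in ratings.values() for m in row} - set(ratings[user])
    predictions = [(m, predict(ratings, user, m, k)) for m in candidates]
    predictions = [(m, p) for m, p in predictions if p is not None]
    return sorted(predictions, key=lambda pair: pair[1], reverse=True)[:n]


# "The same math for all users": users who rated everything get an empty list.
for user in ratings:
    print(user, [(m, round(p, 2)) for m, p in recommend(ratings, user)])
```

This brute-force version compares every user with every other user for each prediction, which is fine for a toy table but becomes expensive on a much larger data set.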
  13. Questions? Comments?
     Thank You!
     E-mail: vivganes@gmail.com
     Twitter: onevivek
