What are product recommendations, and how do they work?
The content of this presentation is based on:

Chapters 1, 2 and 4 of the following book: Owen, Anil, Dunning, Friedman. Mahout in Action. Shelter Island, NY: Manning Publications Co., 2012.

The chapter “Discussion of Similarity Metrics” of the following publication: Shanley, Philip. Data Mining Portfolio.

Transcript

  • 1. CC 2.0 by Horia Varlan | http://flic.kr/p/7vjmof
  • 2. Agenda (Namics Conference 2012, September 1, 2012)
    •  What are product recommenders?
       •  Introducing recommenders
       •  A simple example
       •  Recommender evaluation
    •  How do they work?
       •  Machine learning tool – Apache Mahout
  • 3. Intro – About Sentric
    •  Spin-off of MeMo News AG, the leading provider of Social Media Monitoring & Analytics in Switzerland
    •  Big Data expert, focused on Hadoop, HBase and Solr
    •  Objective: transforming data into insights
  • 4. CC 2.0 by Dennis Wong | http://flic.kr/p/6C3RuV  
  • 5. Introducing Recommenders – The Patterns
    •  Each day we form opinions about things we like, don’t like, and don’t even care about.
    •  People tend to like things …
       •  that similar people like
       •  that are similar to other things they like
    •  These patterns can be used to predict such likes and dislikes.
  • 6. Introducing Recommenders – Strategies for Discovering New Things
    user-based – look at what people with similar tastes seem to like.
  • 7. Introducing Recommenders – Strategies for Discovering New Things
    item-based – figure out which items are like the ones you already like (again by looking at others’ apparent preferences).
  • 8. Introducing Recommenders – Strategies for Discovering New Things
    content-based – suggest items based on particular attributes (again by looking at others’ apparent preferences).
  • 9. Introducing Recommenders – The Definition of Recommendation
    Recommendation is all about predicting patterns of taste, and using them to discover new and desirable things you didn’t already know about.
    Collaborative filtering: producing recommendations based on, and only based on, knowledge of users’ relationships to items.
    Recommender families: user-based, item-based, content-based.
  • 10. CC 2.0 by Will Scullin | http://flic.kr/p/6K9jb8  
  • 11. A Simple user-based Example – The Workflow
    Let’s start with a simple example:
    Create input data → Create a recommender → Analyse the output
  • 12. A Simple user-based Example – Input Data
    •  Recommendations are based on the input data.
    •  Data takes the form of preferences – associations from users to items.
    Example (user,item,value triples; user 1 has a preference of 3.0 for item 102):
       1,101,5.0   2,101,2.0   3,101,2.5   4,101,5.0   5,101,4.0
       1,102,3.0   2,102,2.5   3,104,4.0   4,103,3.0   5,102,3.0
       1,103,2.5   2,103,5.0   3,105,4.5   4,104,4.5   5,103,2.0
                   2,104,2.0   3,107,5.0   4,106,4.0   5,104,4.0
                                                       5,105,3.5
                                                       5,106,4.0
    These values might be ratings on a scale of 1 to 5, where 1 indicates items the user can’t stand, and 5 indicates favorites.
  • 13. A Simple user-based Example – Trend Visualization
    •  Positive preferences are visualized (in petrol) as links between the users (1–5) and the items (101–107).
    •  All other preferences are recognized as negative – the user doesn’t seem to like the item that much (red, dotted).
    [Diagram: users 1–5 linked to items 101–107 according to the input data above.]
  • 14. A Simple user-based Example – Trend Visualization
    •  Users 1 and 5 seem to have similar tastes: both like 101, like 102 a little less, and like 103 less still.
    •  Users 1 and 4 seem to have similar tastes: both seem to like 101 and 103 identically.
    •  Users 1 and 2 have tastes that seem to run counter to each other.
  • 15. A Simple user-based Example – Analyzing the Output
    So what product might be recommended to user 1? Obviously not 101, 102 or 103 – user 1 already knows about these.
  • 16. A Simple user-based Example – Analyzing the Output
    The output could be: [item:104, value:4.257081]
    The recommender engine produced this because it estimated user 1’s preference for 104 to be about 4.3, and that was the highest among all the items eligible for recommendation.
    Questions:
    •  Is this the best recommendation for user 1?
    •  What exactly is a good recommendation?
  • 17. CC 2.0 by larsaaboe | http://flic.kr/p/7nJpV8  
  • 18. A Simple user-based Example – Evaluating a Recommender
    Goal: evaluate how closely the estimated preferences match the actual preferences.
    How?
    Prepare test data → Split the data set (70% for training, 30% for test) → Run with the training data → Produce estimates of the test preferences → Compare the estimates with the real preferences → Analyse / calculate a score → Experiment with other recommenders
  • 19. A Simple user-based Example – Evaluating a Recommender
    Example evaluation output for a particular recommender engine:
                  Item 1   Item 2   Item 3
    Actual        3.0      5.0      4.0
    Estimate      3.5      2.0      5.0
    Difference    0.5      3.0      1.0
    Average distance  = (0.5 + 3.0 + 1.0) / 3 = 1.5
    Root-mean-square  = √((0.5² + 3.0² + 1.0²) / 3) = 1.8484
    Note: a score of 0.0 would mean perfect estimation.
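The two scores from the table above are easy to compute directly. A minimal sketch in plain Java (the class and method names are illustrative, not Mahout's evaluator API):

```java
// Sketch of the two evaluation scores: average absolute difference and
// root-mean-square error between actual and estimated preferences.
class EvalScores {
    // mean of |actual - estimate| over all test preferences
    static double averageDistance(double[] actual, double[] estimate) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimate[i]);
        }
        return sum / actual.length;
    }

    // square root of the mean squared difference; penalizes large misses more
    static double rmse(double[] actual, double[] estimate) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double d = actual[i] - estimate[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        // the three test items from the slide
        double[] actual   = {3.0, 5.0, 4.0};
        double[] estimate = {3.5, 2.0, 5.0};
        System.out.println(averageDistance(actual, estimate)); // 1.5
        System.out.println(rmse(actual, estimate));            // ≈ 1.8484
    }
}
```

Note how the RMSE (≈1.8484) comes out higher than the average distance (1.5) because the big miss on item 2 is squared before averaging.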
  • 20. CC 2.0 by amtrak_russ | http://flic.kr/p/6fAPej  
  • 21. Apache Mahout – In a Nutshell
    •  Open-source machine learning library from Apache (Java)
    •  Can be used for large data collections – it’s scalable, built upon Apache Hadoop
    •  Implements algorithms such as classification, recommenders, clustering
    •  Incubates a number of techniques and algorithms
    •  ML is hyped! But …
  • 22. A Simple user-based Example – Create a Recommender

    ```java
    import java.io.File;
    import java.util.List;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    class RecommenderExample {
      public static void main(String[] args) throws Exception {
        // load the (userID,itemID,value) preferences
        DataModel model = new FileDataModel(new File("example.csv"));
        // user-user similarity based on the Pearson correlation
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // neighborhood of the 2 most similar users
        UserNeighborhood neighborhood =
            new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender =
            new GenericUserBasedRecommender(model, neighborhood, similarity);
        // one recommendation for user 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 1);
        for (RecommendedItem recommendation : recommendations) {
          System.out.println(recommendation);
        }
      }
    }
    ```
  • 23. A user-based Recommender – Component Interaction
    [Diagram: the Application talks to the Recommender interface; the Recommender draws on the DataModel, UserSimilarity and UserNeighborhood interfaces.]
  • 24. Algorithms – UserNeighborhood
    •  NearestNUserNeighborhood: a neighborhood around user 1 is chosen to consist of the three most similar users: 5, 4, and 2.
    •  ThresholdUserNeighborhood: defines a neighborhood of most-similar users with a similarity threshold.
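The two neighborhood strategies can be sketched in a few lines of plain Java. This toy version works on precomputed similarity scores rather than on a Mahout `DataModel`, and the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy sketch of the two neighborhood strategies from the slide, applied
// to a map of precomputed user→similarity scores (not Mahout's API).
class NeighborhoodSketch {
    // nearest-N: the N users most similar to the target user
    static List<Long> nearestN(Map<Long, Double> similarities, int n) {
        List<Map.Entry<Long, Double>> entries =
            new ArrayList<>(similarities.entrySet());
        entries.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> neighbors = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            neighbors.add(entries.get(i).getKey());
        }
        return neighbors;
    }

    // threshold: every user whose similarity meets the threshold
    static List<Long> aboveThreshold(Map<Long, Double> similarities,
                                     double threshold) {
        List<Long> neighbors = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarities.entrySet()) {
            if (e.getValue() >= threshold) neighbors.add(e.getKey());
        }
        return neighbors;
    }

    public static void main(String[] args) {
        // hypothetical similarities of users 2..5 to user 1
        Map<Long, Double> sims = Map.of(2L, 0.3, 3L, -0.5, 4L, 0.7, 5L, 0.9);
        System.out.println(nearestN(sims, 3));         // [5, 4, 2]
        System.out.println(aboveThreshold(sims, 0.5)); // users 4 and 5
    }
}
```

The trade-off is the same one the slide illustrates: nearest-N always yields exactly N neighbors regardless of how similar they actually are, while a threshold yields only genuinely similar users but possibly very few of them.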
  • 25. Algorithms – User Similarity
    Implementations of the UserSimilarity interface define a notion of similarity between two users. Implementations should return values in the range -1.0 to 1.0, with 1.0 representing perfect similarity.
    Implementations include: EuclideanDistanceSimilarity, PearsonCorrelationSimilarity, UncenteredCosineSimilarity, LogLikelihoodSimilarity, TanimotoCoefficientSimilarity, …
  • 26. Algorithms – User Similarity
    Similarity between data objects can be represented in a variety of ways:
    •  The distance between data objects as the sum of the distances of each attribute of the data objects (e.g. Euclidean distance)
    •  Measuring how the attributes of both data objects change with respect to the variation of the mean value of the attributes (Pearson correlation coefficient)
    •  Using the word frequencies of each document, the normalized dot product of the frequencies can be used as a measure of similarity (cosine similarity)
    •  And a few more …
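Two of these notions can be sketched concretely on the example data. The code below (plain Java, illustrative names) maps the Euclidean distance into a similarity via 1 / (1 + distance) – a common choice, though not necessarily the exact formula Mahout's EuclideanDistanceSimilarity uses – and computes the standard Pearson correlation coefficient:

```java
// Sketch of two similarity notions: a Euclidean-distance-based similarity
// mapped into (0, 1] via 1/(1 + distance), and the Pearson correlation
// coefficient in [-1, 1]. Plain Java; names are illustrative.
class SimilaritySketch {
    static double euclideanSimilarity(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        // small distance -> similarity near 1, large distance -> near 0
        return 1.0 / (1.0 + Math.sqrt(sum));
    }

    static double pearsonCorrelation(double[] a, double[] b) {
        double meanA = 0.0, meanB = 0.0;
        for (int i = 0; i < a.length; i++) { meanA += a[i]; meanB += b[i]; }
        meanA /= a.length;
        meanB /= b.length;
        // covariance over the product of standard deviations
        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (int i = 0; i < a.length; i++) {
            cov  += (a[i] - meanA) * (b[i] - meanB);
            varA += (a[i] - meanA) * (a[i] - meanA);
            varB += (b[i] - meanB) * (b[i] - meanB);
        }
        return cov / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        // users 1 and 5 on their shared items 101, 102, 103 (slide 12 data)
        double[] user1 = {5.0, 3.0, 2.5};
        double[] user5 = {4.0, 3.0, 2.0};
        System.out.println(euclideanSimilarity(user1, user5));
        System.out.println(pearsonCorrelation(user1, user5)); // close to 1.0
    }
}
```

Run on the deck's own data, the Pearson correlation between users 1 and 5 comes out close to 1.0, matching the earlier observation that the two "seem to have similar tastes".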
  • 27. Euclidean Distance – Mathematically & Plot
    Similarity between two data objects. [Scatter plot: users 1–5 plotted by their preferences for item 101 (x-axis) against item 102 (y-axis).]
  • 28. Pearson Correlation – Mathematically & Plot
    Similarity between two data objects. [Scatter plot: items plotted by user 1’s preference (x-axis) against user 5’s preference (y-axis).]
  • 29. Thank you! Questions? Jean-Pierre König, jean-pierre.koenig@sentric.ch (Namics Conference 2012)
  • 30. A Simple user-based Example – Literature & Links
    •  References – the content of this presentation is based on:
       •  Chapters 1, 2 and 4 of the following book: Owen, Anil, Dunning, Friedman. Mahout in Action. Shelter Island, NY: Manning Publications Co., 2012.
       •  The chapter “Discussion of Similarity Metrics” of the following publication: Shanley, Philip. Data Mining Portfolio.
    •  Links: http://bitly.com/bundles/jpkoenig/1