0
Adam Warski

Recommendation
systems

5/11/2013 Warszawa-JUG

Introduction using Mahout

@adamwarski
Agenda

❖

What is a recommender?!

❖

Types of recommenders, input data!

❖

Recommendations with Mahout: single node!

❖...
Who Am I?
❖

Day: coding @ SoftwareMill!

❖

Afternoon: playgrounds, Duplos, etc.!

❖

Evenings: blogging, open-source!
❖
...
Me + recommenders
❖

Yap.TV!

❖

Recommends shows to users
basing on their favorites!
❖
❖

❖

from Yap!
from Facebook!

So...
Some background

5/11/2013 Warszawa-JUG

@adamwarski
Information …

❖

Retrieval!
❖

❖

given a query, select items!

Filtering!
❖

given a user, filter out irrelevant items

5...
What can be recommended?

❖

Items to users!

❖

Items to items!

❖

Users to users!

❖

Users to items

5/11/2013 Warszaw...
Input data

❖

(user, item, rating) tuples!
❖

❖

rating: e.g. 1-5 stars!

(user, item) tuples!
❖

unary

5/11/2013 Warsza...
Input data
❖

Implicit!
❖
❖

reads!

❖
❖

clicks!

watching a video!

Explicits!
❖

explicit rating!

❖

favorites

5/11/2...
Prediction vs recommendation
❖

Recommendations!
❖
❖

❖

suggestions!
top-N!

Predictions!
❖

ratings

5/11/2013 Warszawa-...
Collaborative Filtering

❖

Something more than search keywords!
❖
❖

❖

tastes!
quality!

Collaborative: data from many u...
Content-based recommenders

❖

Define features and feature values!

❖

Describe each item as a vector of features

5/11/201...
Content-based recommenders
❖

User vector!
❖
❖

❖

e.g. counting user tags!
sum of item vectors!

Prediction: cosine betwe...
User-User CF

❖

Measure similarity between users!

❖

Prediction: weighted combination of existing ratings

5/11/2013 War...
User-User CF

❖

Domain must be scoped so that there’s agreement!

❖

Individual preferences should be stable

5/11/2013 W...
User similarity

❖

Many different metrics!
!

❖

Pearson correlation:

5/11/2013 Warszawa-JUG

@adamwarski
User neighbourhood
❖

Threshold on similarity!

❖

Top-N neighbours!
❖

❖

25-100: good starting point!

Variations:!
❖

t...
Predictions in UU-CF

5/11/2013 Warszawa-JUG

@adamwarski
Item-Item CF

❖

Similar to UU-CF, only with items!

❖

Item similarity!
❖

Pearson correlation on item ratings!

❖

co-oc...
Evaluating recommenders
❖

How to tell if a recommender is good?!
❖
❖

❖

compare implementations!
are the recommendations...
Leave one out

❖

Remove one preference!

❖

Rebuild the model!

❖

See if it is recommended!

❖

Or, see what is the pred...
Some metrics

❖

Mean absolute error:!

❖

Mean squared error

5/11/2013 Warszawa-JUG

@adamwarski
Precision/recall
❖

Precision: % of recommended items that are relevant!

❖

Recall: % of relevant items that are recommen...
Problems
❖

Novelty: hard to measure!
❖

❖

leave one out: punishes recommenders which
recommend new items!

Test on human...
Diversity/serendipity
❖

Increase diversity:!
❖
❖

❖

top-n!
as items come in, remove the ones that are too similar
to pri...
Netflix challenge

❖

$1m for the best recommender!

❖

Used mean error!

❖

Winning algorithm won on predicting low ratin...
Mahout single-node

5/11/2013 Warszawa-JUG

@adamwarski
5/11/2013 Warszawa-JUG

@adamwarski
5/11/2013 Warszawa-JUG

@adamwarski
Mahout single-node
❖

User-User CF!

❖

Item-Item CF!

❖

Various metrics!
❖
❖

Pearson!

❖
❖

Euclidean!

Log-likelihood!...
Let’s code

5/11/2013 Warszawa-JUG

@adamwarski
Mahout multi-node
❖

Based on Hadoop!

❖

Pre-computed recommendations!

❖

Item-based recommenders!
❖

❖

Co-occurrence m...
Running Mahout w/ Hadoop
hadoop jar mahout-core-0.8-job.jar !
! org.apache.mahout.cf.taste.hadoop.item.RecommenderJob !
! ...
Links
❖

http://www.warski.org!

❖

http://mahout.apache.org/!

❖

http://grouplens.org/!

❖

http://graphlab.org/graphchi...
Thanks!

❖

Questions?!

❖

Stickers ->!

❖

adam@warski.org

5/11/2013 Warszawa-JUG

@adamwarski
Upcoming SlideShare
Loading in...5
×

Recommendation systems with Mahout: introduction

2,022

Published on

An introduction to recommendation systems: types of recommenders, types of data (input), user-user collaborative filtering, item-item collaborative filtering, running Mahout on a single node and on multiple nodes (on Hadoop)

Published in: Technology, Education
0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,022
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
88
Comments
0
Likes
4
Embeds 0
No embeds

No notes for slide

Transcript of "Recommendation systems with Mahout: introduction"

  1. 1. Adam Warski Recommendation systems 5/11/2013 Warszawa-JUG Introduction using Mahout @adamwarski
  2. 2. Agenda ❖ What is a recommender?! ❖ Types of recommenders, input data! ❖ Recommendations with Mahout: single node! ❖ Recommendations with Mahout: multiple nodes 5/11/2013 Warszawa-JUG @adamwarski
  3. 3. Who Am I? ❖ Day: coding @ SoftwareMill! ❖ Afternoon: playgrounds, Duplos, etc.! ❖ Evenings: blogging, open-source! ❖ Original author of Hibernate Envers! ❖ ElasticMQ, Veripacks, MacWire! ❖ http://www.warski.org 5/11/2013 Warszawa-JUG @adamwarski
  4. 4. Me + recommenders ❖ Yap.TV! ❖ Recommends shows to users basing on their favorites! ❖ ❖ ❖ from Yap! from Facebook! Some business rules 5/11/2013 Warszawa-JUG @adamwarski
  5. 5. Some background 5/11/2013 Warszawa-JUG @adamwarski
  6. 6. Information … ❖ Retrieval! ❖ ❖ given a query, select items! Filtering! ❖ given a user, filter out irrelevant items 5/11/2013 Warszawa-JUG @adamwarski
  7. 7. What can be recommended? ❖ Items to users! ❖ Items to items! ❖ Users to users! ❖ Users to items 5/11/2013 Warszawa-JUG @adamwarski
  8. 8. Input data ❖ (user, item, rating) tuples! ❖ ❖ rating: e.g. 1-5 stars! (user, item) tuples! ❖ unary 5/11/2013 Warszawa-JUG @adamwarski
  9. 9. Input data ❖ Implicit! ❖ ❖ reads! ❖ ❖ clicks! watching a video! Explicits! ❖ explicit rating! ❖ favorites 5/11/2013 Warszawa-JUG @adamwarski
  10. 10. Prediction vs recommendation ❖ Recommendations! ❖ ❖ ❖ suggestions! top-N! Predictions! ❖ ratings 5/11/2013 Warszawa-JUG @adamwarski
  11. 11. Collaborative Filtering ❖ Something more than search keywords! ❖ ❖ ❖ tastes! quality! Collaborative: data from many users 5/11/2013 Warszawa-JUG @adamwarski
  12. 12. Content-based recommenders ❖ Define features and feature values! ❖ Describe each item as a vector of features 5/11/2013 Warszawa-JUG @adamwarski
  13. 13. Content-based recommenders ❖ User vector! ❖ ❖ ❖ e.g. counting user tags! sum of item vectors! Prediction: cosine between two vectors! ❖ profile! ❖ item 5/11/2013 Warszawa-JUG @adamwarski
  14. 14. User-User CF ❖ Measure similarity between users! ❖ Prediction: weighted combination of existing ratings 5/11/2013 Warszawa-JUG @adamwarski
  15. 15. User-User CF ❖ Domain must be scoped so that there’s agreement! ❖ Individual preferences should be stable 5/11/2013 Warszawa-JUG @adamwarski
  16. 16. User similarity ❖ Many different metrics! ! ❖ Pearson correlation: 5/11/2013 Warszawa-JUG @adamwarski
  17. 17. User neighbourhood ❖ Threshold on similarity! ❖ Top-N neighbours! ❖ ❖ 25-100: good starting point! Variations:! ❖ trust networks, e.g. friends in a social network 5/11/2013 Warszawa-JUG @adamwarski
  18. 18. Predictions in UU-CF 5/11/2013 Warszawa-JUG @adamwarski
  19. 19. Item-Item CF ❖ Similar to UU-CF, only with items! ❖ Item similarity! ❖ Pearson correlation on item ratings! ❖ co-occurences 5/11/2013 Warszawa-JUG @adamwarski
  20. 20. Evaluating recommenders ❖ How to tell if a recommender is good?! ❖ ❖ ❖ compare implementations! are the recommendations ok?! Business metrics! ❖ what leads to increased sales 5/11/2013 Warszawa-JUG @adamwarski
  21. 21. Leave one out ❖ Remove one preference! ❖ Rebuild the model! ❖ See if it is recommended! ❖ Or, see what is the predicted rating 5/11/2013 Warszawa-JUG @adamwarski
  22. 22. Some metrics ❖ Mean absolute error:! ❖ Mean squared error 5/11/2013 Warszawa-JUG @adamwarski
  23. 23. Precision/recall ❖ Precision: % of recommended items that are relevant! ❖ Recall: % of relevant items that are recommended! ❖ Precision@N, recall@N Recall Ideal recs Precision Recs 5/11/2013 Warszawa-JUG All items @adamwarski
  24. 24. Problems ❖ Novelty: hard to measure! ❖ ❖ leave one out: punishes recommenders which recommend new items! Test on humans! ❖ A/B testing 5/11/2013 Warszawa-JUG @adamwarski
  25. 25. Diversity/serendipity ❖ Increase diversity:! ❖ ❖ ❖ top-n! as items come in, remove the ones that are too similar to prior items! Increase serendipity:! ❖ downgrade popular items 5/11/2013 Warszawa-JUG @adamwarski
  26. 26. Netflix challenge ❖ $1m for the best recommender! ❖ Used mean error! ❖ Winning algorithm won on predicting low ratings 5/11/2013 Warszawa-JUG @adamwarski
  27. 27. Mahout single-node 5/11/2013 Warszawa-JUG @adamwarski
  28. 28. 5/11/2013 Warszawa-JUG @adamwarski
  29. 29. 5/11/2013 Warszawa-JUG @adamwarski
  30. 30. Mahout single-node ❖ User-User CF! ❖ Item-Item CF! ❖ Various metrics! ❖ ❖ Pearson! ❖ ❖ Euclidean! Log-likelihood! Support for evaluation 5/11/2013 Warszawa-JUG @adamwarski
  31. 31. Let’s code 5/11/2013 Warszawa-JUG @adamwarski
  32. 32. Mahout multi-node ❖ Based on Hadoop! ❖ Pre-computed recommendations! ❖ Item-based recommenders! ❖ ❖ Co-occurrence matrix! Matrix decomposition! ❖ Alternating Least Squares 5/11/2013 Warszawa-JUG @adamwarski
  33. 33. Running Mahout w/ Hadoop hadoop jar mahout-core-0.8-job.jar ! ! org.apache.mahout.cf.taste.hadoop.item.RecommenderJob ! ! --booleanData ! ! --similarityClassname SIMILARITY_LOGLIKELIHOOD ! ! --output output ! ! --input input/data.dat 5/11/2013 Warszawa-JUG @adamwarski
  34. 34. Links ❖ http://www.warski.org! ❖ http://mahout.apache.org/! ❖ http://grouplens.org/! ❖ http://graphlab.org/graphchi/! ❖ https://class.coursera.org/recsys-001/class! ❖ https://github.com/adamw/mahout-pres 5/11/2013 Warszawa-JUG @adamwarski
  35. 35. Thanks! ❖ Questions?! ❖ Stickers ->! ❖ adam@warski.org 5/11/2013 Warszawa-JUG @adamwarski
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×