Exploring content recommendation

A study of the Mahout recommender and experiences using it

Published in: Technology, Education

  1. Exploring content recommendation. Felipe Besson (@fmbesson), March 2013
  2. Why is recommendation important?
     “A lot of times, people don't know what they want until you show it to them.” (Steve Jobs)
     “We don't make money when we sell things; we make money when we help customers make purchase decisions.” (Jeff Bezos, Amazon)
  3. Apache Mahout: an Apache project to build scalable machine learning libraries
     ● Focused on large data sets
     ● Adaptation of standard machine learning algorithms
     ● Runs on Apache Hadoop (MapReduce paradigm) … or on a non-Hadoop node
  4. Who is using Mahout?
     Source: https://cwiki.apache.org/MAHOUT/powered-by-mahout.html
  5. Supported core algorithms
     ● Classification
     ● Clustering
     ● Recommendation
     ● Pattern Mining
     ● Regression
     ● Dimension Reduction
     ● Evolutionary Algorithms
     ● Vector Similarity
  6. Mahout Recommender: collaborative filtering
     People often get the best recommendations from someone with similar taste
     ● People tend to like things that are similar to other things they like
     ● There are patterns in people's likes and dislikes
     Example (John and Bob): movie1 movie1 movie2 movie2 movie42 movie4 movie5
     Will Bob like movie4 and movie5?
  7. Mahout Recommender
     Available recommenders
     ● Item based
     ● User based
     Execution modes
     ● Taste: online but not distributed
     ● Hadoop: offline (batch) but distributed
     Parameters
     ● Many coefficients to calculate user and item similarity and neighborhood
     ● Data model abstractions
  8. Mahout Recommender (Hadoop)
     Input: user_id, item_id, preference_value (optional)
         1, 23, 0.9
         1, 15, 0.5
         1, 89, 0.1
         2, 11, 0.3
         2, 15, 0.2
         9, 10, 0.5
         9, 99, 0.9
         9, 11, 0.1
         8, 11, 0.5
         ...
     Output: user_id [recommended_item, score]
         1: [10, 0.93; 11, 0.84; … ]
         2: [23, 0.72; 17, 0.60; … ]
         8: [121, 0.98; 23, 0.78; … ]
         17: [12, 0.89; 32, 0.56; … ]
         42: [129, 0.92; 98, 0.45; … ]
         ...
  9. 1st try! Movie recommendation
     Netflix dataset (http://www.netflixprize.com/)
     ● # of user tastes: 2,817,131
     ● # of movies: 17,770
     ● # of users: 472,891
     Environment and performance
     ● Hadoop pseudo-distributed
     ● Computer
       ● Intel® Core™ i5-3317U CPU @ 1.70GHz × 4
       ● 6 GB RAM
     ● Total time: ~16 minutes
  10. How to run?
      1. Copy the input file to HDFS (Hadoop Distributed File System)
             hadoop fs -put qualifying.txt /netflix/input/data.txt
      2. Run the recommender
             hadoop jar core/target/mahout-core-0.8-SNAPSHOT-job.jar \
               org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
               -Dmapred.input.dir=/netflix/input/data.txt \
               -Dmapred.output.dir=/netflix/output \
               --numRecommendations 10 \
               --similarityClassname SIMILARITY_LOGLIKELIHOOD
  11. Results
      Recommender analyzer
      ● https://github.com/besson/recommender_analyzer
      ● http://rec-analyzer.herokuapp.com/
  12. Results
  13. References
      Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman. Mahout in Action. Manning Publications, 2011.
  14. Thanks. Felipe Besson @fmbesson
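
The collaborative filtering idea from the "Mahout Recommender" slide (people with similar taste are a good source of recommendations) can be illustrated with a small Python sketch. This is not Mahout's implementation: it simply scores items a user has not seen by the Jaccard similarity between users' liked sets. The split of movies between John and Bob is an assumed reading of the slide's example, and the function names are hypothetical.

```python
from collections import defaultdict

# Assumed reading of the slide's example: which movies each person liked.
likes = {
    "John": {"movie1", "movie2", "movie4", "movie5"},
    "Bob": {"movie1", "movie2", "movie42"},
}


def jaccard(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, likes):
    """Score each item the user has not liked yet by summing the
    similarity of every other user who liked it (user-based filtering)."""
    scores = defaultdict(float)
    for other, items in likes.items():
        if other == user:
            continue
        similarity = jaccard(likes[user], items)
        for item in items - likes[user]:
            scores[item] += similarity
    # Highest score first; ties broken by item name for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


bob_recs = recommend("Bob", likes)
print(bob_recs)
```

Under these assumptions John and Bob share two of five distinct movies, so movie4 and movie5 are both suggested to Bob with the same score. Mahout offers several similarity measures (the run on slide 10 uses log-likelihood); Jaccard is used here only because it is easy to verify by hand.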

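The input and output layouts shown on the "Mahout Recommender (Hadoop)" slide can be handled with a few lines of Python. The sketch below is only an illustration of those layouts as the slide depicts them: the function names are hypothetical, the row `9, 10` without a preference is invented to exercise the optional field, and the default of 1.0 for a missing preference is an assumption of this sketch.

```python
def parse_preferences(lines, default=1.0):
    """Parse 'user_id, item_id[, preference_value]' rows into
    {user_id: {item_id: preference}}. A missing preference falls back
    to `default` (an assumption made for this sketch)."""
    prefs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        user_id, item_id = int(fields[0]), int(fields[1])
        has_value = len(fields) > 2 and fields[2] != ""
        value = float(fields[2]) if has_value else default
        prefs.setdefault(user_id, {})[item_id] = value
    return prefs


def format_recommendations(user_id, recs):
    """Render recommendations in the layout shown on the slide:
    'user_id: [item, score; item, score]'."""
    body = "; ".join(f"{item}, {score:.2f}" for item, score in recs)
    return f"{user_id}: [{body}]"


sample = [
    "1, 23, 0.9",
    "1, 15, 0.5",
    "1, 89, 0.1",
    "2, 11, 0.3",
    "2, 15, 0.2",
    "9, 10",  # hypothetical row with no preference value
]
prefs = parse_preferences(sample)
line = format_recommendations(1, [(10, 0.93), (11, 0.84)])
print(prefs[1])
print(line)
```

Grouping preferences by user in a nested dictionary keeps lookups simple for small experiments; for data the size of the Netflix set, the Hadoop job described in the slides is the intended route.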