Exploring content recommendation
A study and experiences with Mahout recommender

Published in: Technology, Education
Transcript of "Exploring content recommendation"

  1. Exploring content recommendation. Felipe Besson, @fmbesson. March 2013.
  2. Why is recommendation important?
     “A lot of times, people don't know what they want until you show it to them.” Steve Jobs
     “We don't make money when we sell things; we make money when we help customers make purchase decisions.” Jeff Bezos, Amazon
  3. Mahout: an Apache project to build scalable machine learning libraries
     ● Focused on large data sets
     ● Adaptation of standard machine learning algorithms
     ● Runs on Apache Hadoop (map/reduce paradigm) … or on a non-Hadoop node
  4. Who is using Mahout? Source: https://cwiki.apache.org/MAHOUT/powered-by-mahout.html
  5. Supported core algorithms
     ● Classification
     ● Clustering
     ● Recommendation
     ● Pattern Mining
     ● Regression
     ● Dimension Reduction
     ● Evolutionary Algorithms
     ● Vector Similarity
  6. Mahout Recommender: collaborative filtering
     People often get the best recommendation from someone with similar taste
     ● People tend to like things that are similar to other things they like
     ● There are patterns in people's likes and dislikes
     Example (diagram labels): John, Bob; movie1, movie1; movie2, movie2; movie42, movie4, movie5. Will Bob like movie4 and movie5?
  7. Mahout Recommender
     Available recommenders
     ● Item based
     ● User based
     Execution modes
     ● Taste: online but not distributed
     ● Hadoop: offline (batch) but distributed
     Parameters
     ● Many coefficients for calculating user and item similarity and neighborhood
     ● Data model abstractions
  8. Mahout Recommender (Hadoop)
     Input: user_id, item_id, preference_value (optional)
       1, 23, 0.9
       1, 15, 0.5
       1, 89, 0.1
       2, 11, 0.3
       2, 15, 0.2
       9, 10, 0.5
       9, 99, 0.9
       9, 11, 0.1
       8, 11, 0.5
       ...
     Output: user_id [recommended_item, score]
       1: [10, 0.93; 11, 0.84; … ]
       2: [23, 0.72; 17, 0.60; … ]
       8: [121, 0.98; 23, 0.78; … ]
       17: [12, 0.89; 32, 0.56; … ]
       42: [129, 0.92; 98, 0.45; … ]
       ...
  9. 1st try! Movie recommendation
     Netflix data set (http://www.netflixprize.com/)
     ● # of user tastes: 2,817,131
     ● # of movies: 17,770
     ● # of users: 472,891
     Environment and performance
     ● Hadoop pseudo-distributed
     ● Computer: Intel® Core™ i5-3317U CPU @ 1.70GHz × 4, 6 GB RAM
     ● Total time: ~16 minutes
  10. How to run?
     1. Copy the input file to HDFS (Hadoop distributed file system)
       hadoop fs -put qualifying.txt /netflix/input/data.txt
     2. Run the recommender
       hadoop jar core/target/mahout-core-0.8-SNAPSHOT-job.jar \
         org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
         -Dmapred.input.dir=/netflix/input/data.txt \
         -Dmapred.output.dir=/netflix/output \
         --numRecommendations 10 \
         --similarityClassname SIMILARITY_LOGLIKELIHOOD
  11. Results: Recommender analyzer
     https://github.com/besson/recommender_analyzer
     http://rec-analyzer.herokuapp.com/
  12. Results
  13. References
     Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman. Mahout in Action. Manning Publications, 2011.
  14. Thanks. Felipe Besson, @fmbesson
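
To make the collaborative-filtering idea from slide 6 and the input/output layout from slide 8 more concrete, here is a minimal Python sketch of a user-based recommender. It is not Mahout's implementation: the function names (`parse`, `cosine`, `recommend`, `format_output`) are hypothetical, cosine similarity is swapped in for simplicity (the run on slide 10 uses the item-based job with log-likelihood similarity), and a missing preference_value is assumed to default to 1.0.

```python
from collections import defaultdict
from math import sqrt

# Sample rows in the "user_id, item_id, preference_value" layout from slide 8.
RAW = """\
1, 23, 0.9
1, 15, 0.5
1, 89, 0.1
2, 11, 0.3
2, 15, 0.2
9, 10, 0.5
9, 99, 0.9
9, 11, 0.1
8, 11, 0.5
"""


def parse(text):
    """Build {user_id: {item_id: preference}} from comma-separated rows."""
    prefs = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = [f.strip() for f in line.split(",")]
        user, item = int(fields[0]), int(fields[1])
        # Assumption: when the optional preference is absent, use 1.0.
        prefs[user][item] = float(fields[2]) if len(fields) > 2 else 1.0
    return dict(prefs)


def cosine(a, b):
    """Cosine similarity between two users' preference dicts."""
    shared = set(a) & set(b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if not shared or norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(a[i] * b[i] for i in shared) / (norm_a * norm_b)


def recommend(prefs, user, how_many=10):
    """Score items the user has not rated by a similarity-weighted average."""
    totals, weights = defaultdict(float), defaultdict(float)
    for other, other_prefs in prefs.items():
        if other == user:
            continue
        sim = cosine(prefs[user], other_prefs)
        if sim <= 0:
            continue  # ignore users with no overlap in taste
        for item, value in other_prefs.items():
            if item not in prefs[user]:
                totals[item] += sim * value
                weights[item] += sim
    ranked = sorted(
        ((item, totals[item] / weights[item]) for item in totals),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:how_many]


def format_output(user, recs):
    """Render recommendations in the 'user_id: [item, score; ...]' style of slide 8."""
    body = "; ".join(f"{item}, {score:.2f}" for item, score in recs)
    return f"{user}: [{body}]"


if __name__ == "__main__":
    prefs = parse(RAW)
    for user in sorted(prefs):
        print(format_output(user, recommend(prefs, user)))
```

With so few rows, each candidate item is usually backed by a single neighbor, so the scores simply echo that neighbor's preference; on a real data set the weighted average blends many neighbors.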