Exploring content recommendation

A study of the Apache Mahout recommender and experiences using it


Transcript

  • 1. Exploring content recommendation Felipe Besson @fmbesson March, 2013
  • 2. Why is recommendation important? “A lot of times, people don't know what they want until you show it to them.” (Steve Jobs) “We don't make money when we sell things; we make money when we help customers make purchase decisions.” (Jeff Bezos, Amazon)
  • 3. Apache Mahout: an Apache project to build scalable machine learning libraries ● Focused on large data sets ● Adaptation of standard machine learning algorithms ● Runs on Apache Hadoop (map/reduce paradigm) … or on a non-Hadoop node
  • 4. Who is using Mahout? Source: https://cwiki.apache.org/MAHOUT/powered-by-mahout.html
  • 5. Supported core algorithms ● Classification ● Clustering ● Recommendation ● Pattern Mining ● Regression ● Dimension Reduction ● Evolutionary Algorithms ● Vector Similarity
  • 6. Mahout Recommender: collaborative filtering. People often get the best recommendations from someone with similar taste ● People tend to like things that are similar to other things they like ● There are patterns in people's likes and dislikes. Diagram labels: John, Bob; movie1, movie1, movie2, movie2, movie42, movie4, movie5. Will Bob like movie4 and movie5?
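
The idea behind collaborative filtering can be illustrated with a minimal sketch. This is not Mahout's implementation: the users, items, and the choice of Tanimoto (Jaccard) overlap as the similarity measure are all assumptions made for illustration. Items the target user has not seen are scored by the summed similarity of the users who liked them.

```python
from collections import defaultdict

# Hypothetical likes; these users and items are illustrative, not taken from the slides.
likes = {
    "ana": {"m1", "m2", "m3", "m4"},
    "ben": {"m1", "m2"},
    "cat": {"m7", "m8"},
}


def tanimoto(a, b):
    """Tanimoto (Jaccard) overlap between two sets of liked items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, likes, top_n=3):
    """Score unseen items by the summed similarity of the users who liked them."""
    scores = defaultdict(float)
    for other, items in likes.items():
        if other == user:
            continue
        sim = tanimoto(likes[user], items)
        if sim == 0.0:
            continue  # users with nothing in common contribute nothing
        for item in items - likes[user]:
            scores[item] += sim
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


print(recommend("ben", likes))
```

Here "ben" shares two of four items with "ana" and nothing with "cat", so only ana's remaining items are suggested.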
  • 7. Mahout Recommender Available recommenders ● Item based ● User based Execution modes ● Taste: online but not distributed ● Hadoop: offline (batch) but distributed Parameters ● Many coefficients to calculate user and item similarity and neighborhood ● Data model abstractions
  • 8. Mahout Recommender (Hadoop)
    Input: user_id, item_id, preference_value (optional)
      1, 23, 0.9
      1, 15, 0.5
      1, 89, 0.1
      2, 11, 0.3
      2, 15, 0.2
      9, 10, 0.5
      9, 99, 0.9
      9, 11, 0.1
      8, 11, 0.5
      ...
    Output: user_id: [recommended_item, score; ...]
      1: [10, 0.93; 11, 0.84; … ]
      2: [23, 0.72; 17, 0.60; … ]
      8: [121, 0.98; 23, 0.78; … ]
      17: [12, 0.89; 32, 0.56; … ]
      42: [129, 0.92; 98, 0.45; … ]
      ...
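
As a rough illustration of the input and output shapes on this slide, the sketch below parses preference lines into a per-user dictionary and prints recommendations in the slide's notation. The function names are hypothetical, the default preference of 1.0 for a missing value is an assumption, and the notation mirrors the slide rather than the exact files Mahout reads and writes.

```python
from collections import defaultdict

# A few input rows copied from the slide: user_id, item_id, preference_value.
SAMPLE_INPUT = """\
1, 23, 0.9
1, 15, 0.5
1, 89, 0.1
2, 11, 0.3
2, 15, 0.2
"""


def parse_preferences(text):
    """Return {user_id: {item_id: preference}} from comma-separated lines."""
    prefs = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        user_id, item_id = int(fields[0]), int(fields[1])
        # The preference value is optional; defaulting to 1.0 is an assumption.
        value = float(fields[2]) if len(fields) > 2 else 1.0
        prefs[user_id][item_id] = value
    return dict(prefs)


def format_recommendations(user_id, scored_items):
    """Render one output line in the slide's notation."""
    body = "; ".join(f"{item}, {score:.2f}" for item, score in scored_items)
    return f"{user_id}: [{body}]"


prefs = parse_preferences(SAMPLE_INPUT)
print(prefs[1])
print(format_recommendations(1, [(10, 0.93), (11, 0.84)]))
```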
  • 9. 1st try! Movie recommendation with the Netflix data set (http://www.netflixprize.com/) ● # of user tastes: 2,817,131 ● # of movies: 17,770 ● # of users: 472,891 Environment and performance ● Hadoop pseudo-distributed ● Computer ● Intel® Core™ i5-3317U CPU @ 1.70GHz × 4 ● 6 GB RAM ● Total time: ~16 minutes
  • 10. How to run?
    1. Copy the input file to HDFS (Hadoop Distributed File System):
       hadoop fs -put qualifying.txt /netflix/input/data.txt
    2. Run the recommender:
       hadoop jar core/target/mahout-core-0.8-SNAPSHOT-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=/netflix/input/data.txt -Dmapred.output.dir=/netflix/output --numRecommendations 10 --similarityClassname SIMILARITY_LOGLIKELIHOOD
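
The command above selects a log-likelihood similarity. As a hedged sketch of what such a measure computes, the code below implements Dunning's log-likelihood ratio over a 2×2 co-occurrence table. The count names (k11 = both items, k12 and k21 = one item only, k22 = neither) and the final mapping 1 - 1/(1 + LLR) onto the range [0, 1) reflect my understanding of the usual formulation and should be treated as assumptions rather than as Mahout's exact code.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 co-occurrence table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(k11, k12, k21, k22):
    """Map the ratio onto [0, 1); this scaling is an assumption."""
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))


# Independent items score near zero; items that always co-occur score high.
print(llr_similarity(10, 10, 10, 10))
print(llr_similarity(10, 0, 0, 10))
```

A score near zero means the two items co-occur about as often as chance would predict, while a score near one means the co-occurrence is unlikely to be accidental.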
  • 11. Results Recommender analyzer https://github.com/besson/recommender_analyzer http://rec-analyzer.herokuapp.com/
  • 12. Results
  • 13. References: Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman. Mahout in Action. Manning Publications, 2011.
  • 14. Thanks Felipe Besson @fmbesson