Building a Large-Scale Recommendation Engine


A quick survey of recommendation strategies and an introduction to Mahout

  • Netflix: 7 days to 1 day. 30M watches per day.
  • Unexpected discovery!
  • This is effective when you have a lot more users than items.
  • 2% of users provide feedback
  • Make captions more visible and also Likes button on the far left.
  • Access log case: lots of robot traffic. What would be the business case for Polyvore? Where is your traffic coming from? What are users' intentions? Sizes of users and items. Seasonality.
  • Transcript

    • 1. Building Recommendation Engine
        Keeyong Han, Jan 2013
    • 2. Table of Contents
        1. What is Recommendation?
        2. Different Recommendation Strategies
        3. Introduction of Hadoop/Mahout
        4. Building Recommendation Engine with Hadoop/Mahout
        5. How to use Mahout
        6. Q&A
    • 3. What is Recommendation?
    • 4. Definition of Recommendation Engine
        "A recommendation system provides information or items that are likely to be of interest to a user, in an automated fashion" - Alpa Jain from Twitter
        "Serve the right item to users in an automated fashion to optimize long-term business objectives" - Deepak Agarwal from Yahoo
    • 5. Examples
        • Related Product (Amazon)
        • Movie Recommendation (Netflix)
        • News Contents (Yahoo)
        • Online Dating (eHarmony)
        • Search Autocomplete (Google)
        • Connection Recommendation (LinkedIn)
        • Song Recommendation (Pandora)
        • Walmart – (Physical) Store Layout
    • 6. Why Recommendation?
        • A way for users to find content of interest (from large selections) with less effort.
            o A natural route to personalization!
            o Serendipity factor
        • For companies, a good way to introduce new and unknown content
    • 7. Different Recommendation Strategies: Item vs. User
    • 8. Item based recommendation (1)
        1. Content-based Item Recommendation.
            o Using metadata from the item, compute similarity between items.
                i. Description, price, category and so on
                ii. Normalize these into a feature vector (numeric values)
                    • You can think of it as a point in N-dimensional space.
                iii. Compute the distances between vectors.
                    • Euclidean Distance Score
                    • Cosine Similarity Score
                    • Pearson Correlation Score
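The three scores named on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the deck's implementation: the item names, the three-element feature layout, and the `1 / (1 + distance)` mapping for the Euclidean score are all assumptions made for the example.

```python
import math

def euclidean_score(a, b):
    """Turn Euclidean distance into a score in (0, 1]; 1.0 means identical."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

# Hypothetical items encoded as [normalized price, is_dress, is_shoe].
items = {
    "dress_a": [0.30, 1.0, 0.0],
    "dress_b": [0.35, 1.0, 0.0],
    "boot_c": [0.80, 0.0, 1.0],
}

for name in ("dress_b", "boot_c"):
    print(name, round(cosine_similarity(items["dress_a"], items[name]), 3))
```

With these made-up vectors the two dresses score much closer to each other than either does to the boot, which is the behavior a content-based recommender relies on.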
    • 9. Item based recommendation (2)
        2. Collaborative Filtering.
            o Leverage users' collective intelligence
                • Similar users tend to like similar items
                • Amazon's product recommendation is a very good and famous example
            o Will look at this in more detail
    • 10. User based recommendation
        • First group users into different clusters
            o Represent users as feature vectors
                • Information about users: geo-location, gender, age, …
                • Items users liked or rated
            o K-nearest neighbors (KNN) is used a lot
        • From each cluster, find representative items
            o Some kind of graph traversal
            o Highest rated items
            o Most liked items
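A rough sketch of the user-based idea, under stated assumptions: each user is represented only by the set of items they liked, Jaccard overlap stands in for the distance measure, and "representative items" are taken to be the most liked items among the k nearest neighbors. The slide does not prescribe any of these choices; user and item names are hypothetical.

```python
from collections import Counter

def jaccard(a, b):
    """Overlap between two sets of liked items, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def nearest_neighbors(target, likes, k):
    """Return the k users most similar to `target` (ties broken by user id)."""
    scored = [(jaccard(likes[target], items), user)
              for user, items in likes.items() if user != target]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for _, user in scored[:k]]

def recommend_from_neighbors(target, likes, k=2, top_n=3):
    """Most liked items among the neighbors that the target has not liked yet."""
    counts = Counter()
    for neighbor in nearest_neighbors(target, likes, k):
        counts.update(likes[neighbor] - likes[target])
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical like data: user id -> set of liked item ids.
likes = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "d"},
    "u3": {"a", "c", "d"},
    "u4": {"x", "y"},
}

print(recommend_from_neighbors("u1", likes))
```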
    • 11. Challenges of Recommendation Engine
        • Cold Start
            o For new users and/or items, there is no information to leverage.
        • Sparse Data
            o Item reviews or purchases are not very common.
        • Scalability Issue
            o The bigger the data gets, the more computation is needed.
    • 12. Introduction of Hadoop/Mahout
    • 13. What is Hadoop?
        • An open source distributed computation and storage platform modeled after the Google File System and the MapReduce framework
        • Perfect fit for large scale batch offline processing but not for realtime processing
        • Widely used in many companies
    • 14. What is Mahout?
        • An open source machine learning library written in Java.
            1. Standalone
            2. MapReduce.
                o Supports large scale batch offline processing.
        • Covers the following
            o Recommendation/Collaborative Filtering.
            o Classification: Supervised Learning.
            o Clustering: Unsupervised Learning.
    • 15. Building Recommendation Engine with Hadoop/Mahout
    • 16. Typical Architecture
        1. Data Collection: web server logs, MySQL tables, ... (explicit feedback and implicit feedback)
        2. Input Data Pre-processing (ETL, Filtering, …)
        3. Recommendation Data Building (Mahout)
        4. Output Data Post-processing (Re-ordering)
        5. Load Final Data To Serving Layer
        6. Recommendation Serving Layer: MySQL, NoSQL, Solr/ElasticSearch, ...
        Platform label in the diagram: Hadoop
    • 17. Use Case: Polyvore – Item Page
        • Item in question
        • Content Based Recommendation
        • Collaborative Filtering
    • 18. Use Case: Polyvore – Home Page
        • Personalized Recommendation
    • 19. People who liked this also like ...
        • This is based on "Collaborative Filtering"
        • Construct co-occurrence matrix or item similarity matrix – S[NxN]
            o Increment S[i,j] and S[j,i] if item i and item j are liked by the same user
            o Repeat this for all users for their liked items
        • For item k, find the most co-occurred items (from column k or row k) as recommendations.
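The co-occurrence construction on this slide can be sketched as follows. This is an in-memory illustration under assumptions (a nested dictionary instead of a dense NxN array, hypothetical user and item ids); at scale the deck delegates this step to Mahout on Hadoop.

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence(user_likes):
    """S[i][j] counts how many users liked both item i and item j."""
    s = defaultdict(lambda: defaultdict(int))
    for liked in user_likes.values():
        # Each unordered pair of items liked by the same user counts once.
        for i, j in combinations(sorted(set(liked)), 2):
            s[i][j] += 1
            s[j][i] += 1
    return s

def also_liked(s, item, top_n=3):
    """Read row `item` of S and return the most co-occurring items."""
    row = s.get(item, {})
    ranked = sorted(row.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:top_n]]

# Hypothetical like data: user id -> liked item ids.
user_likes = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b"],
    "u3": ["b", "c"],
    "u4": ["a", "b", "d"],
}

s = build_cooccurrence(user_likes)
print(also_liked(s, "a"))
```

Because S is symmetric, reading row k or column k gives the same answer, which is why the slide treats them as interchangeable.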
    • 20. Personalized Recommendation
        • This is based on "Collaborative Filtering"
        • Extension of the previous topic
        • Computation-wise, it is a matrix multiplication
            a. First, build a similarity matrix (S) for items
            b. Next, build a preference vector (P) for the user
            c. Next, multiply the matrix from (a) by the vector from (b): R = S x P
            d. Lastly, sort the elements of the final vector R
    • 21. Polyvore Example
        • Assumption:
            o N items and M users. Users can only like (no rating)
        • Create item similarity matrix S (NxN)
            o This will be used as recommendations in Item page
        • Create user preference vector P (Nx1)
            o Set P(i) to 1 for every item i liked by the user in question
        • Multiply S by P
            o Sort result elements by the score
            o This will be personalized item recommendation
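The multiplication R = S x P can be sketched in plain Python. The matrix values and item ids below are hypothetical, and dropping items the user already liked is an extra step assumed here for illustration; the slides only say to sort the result by score.

```python
def recommend(similarity, items, liked, top_n=3):
    """Compute R = S x P for one user and return the top-scoring new items."""
    index = {item: n for n, item in enumerate(items)}
    # P is a 0/1 preference vector: 1 where the user liked the item.
    p = [0] * len(items)
    for item in liked:
        p[index[item]] = 1
    # R[i] = sum over j of S[i][j] * P[j]
    r = [sum(row[j] * p[j] for j in range(len(items))) for row in similarity]
    # Assumption: skip items the user already liked and items with zero score.
    candidates = [(score, item) for item, score in zip(items, r)
                  if item not in liked and score > 0]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in candidates[:top_n]]

# Hypothetical 4-item co-occurrence matrix S (symmetric, zero diagonal).
items = ["a", "b", "c", "d"]
S = [
    [0, 3, 1, 1],
    [3, 0, 2, 1],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
]

print(recommend(S, items, {"c", "d"}))
```

For a user who liked only "c" and "d", R works out to [2, 3, 0, 0], so "b" ranks ahead of "a".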
    • 22. How to use Mahout?
        • ItemSimilarityJob class
            • Main class to compute the co-occurrence matrix.
        • RecommenderJob class
            • Main class to generate personalized recommendations.
        hadoop jar mahout-core-0.8-job.jar \
            org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
            -Dmapred.input.dir=input/user-item-rating.txt \
            -Dmapred.output.dir=output \
            --usersFile input/users.txt \
            --booleanData \
            --similarityClassname SIMILARITY_COOCCURRENCE \
            --minPrefsPerUser 2 \
            --maxPrefsPerUser 50000
        This runs a total of 10 MapReduce jobs to generate the final recommendations for users.
    • 23. How to use Mahout? (Contd)
        • Input File: user-item-rating.txt
            o userID,itemID[,rating] per line.
        • How to compute similarity between items
            o The --similarityClassname parameter determines this
                • CooccurrenceCountSimilarity
                • LogLikelihoodSimilarity
                • TanimotoCoefficientSimilarity
                • CityBlockSimilarity
                • CosineSimilarity
                • PearsonCorrelationSimilarity
                • EuclideanDistanceSimilarity
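To make the input format and one of the listed measures concrete, here is a small sketch that reads `userID,itemID[,rating]` lines and computes a Tanimoto coefficient between two items. It is an illustration only, not Mahout's code: the sample ids are made up, and the optional rating column is ignored, on the assumption that likes are boolean.

```python
import csv
import io
from collections import defaultdict

def read_preferences(lines):
    """Parse userID,itemID[,rating] rows into item id -> set of user ids."""
    users_by_item = defaultdict(set)
    for row in csv.reader(lines):
        if not row:
            continue
        user_id, item_id = row[0].strip(), row[1].strip()
        users_by_item[item_id].add(user_id)  # rating, if present, is ignored
    return users_by_item

def tanimoto(users_by_item, item_a, item_b):
    """Users who liked both items divided by users who liked either one."""
    a = users_by_item.get(item_a, set())
    b = users_by_item.get(item_b, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical contents of user-item-rating.txt.
sample = io.StringIO("1,101,5.0\n1,102\n2,101\n2,103,3.0\n3,101\n3,102\n")
prefs = read_preferences(sample)

print(round(tanimoto(prefs, "101", "102"), 3))
```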
    • 24. How to use Mahout? (Contd)
        • Final Output
            o UserID [(ItemID,Score),(ItemID,Score),...]
            o ...
        • Load this from HDFS to a serving layer
            o Relational Database
            o Search Engine
            o NoSQL
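A minimal sketch of the loading step, using SQLite as a stand-in for the relational serving layer. The concrete line encoding `userID<TAB>[itemID:score,itemID:score]` is an assumption about how the output is written as text; adjust `parse_line` if your files differ. The table name and columns are hypothetical, and the lines are assumed to have been copied out of HDFS already.

```python
import sqlite3

def parse_line(line):
    """Split 'user<TAB>[item:score,item:score]' into (user, [(item, score)])."""
    user_id, _, payload = line.strip().partition("\t")
    pairs = []
    for chunk in payload.strip("[]").split(","):
        if not chunk:
            continue
        item_id, _, score = chunk.partition(":")
        pairs.append((item_id, float(score)))
    return user_id, pairs

def load(lines, conn):
    """Insert one row per (user, item), keeping the output order as position."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recommendations ("
        "user_id TEXT, item_id TEXT, score REAL, position INTEGER, "
        "PRIMARY KEY (user_id, item_id))"
    )
    for line in lines:
        if not line.strip():
            continue
        user_id, pairs = parse_line(line)
        rows = [(user_id, item_id, score, position)
                for position, (item_id, score) in enumerate(pairs, start=1)]
        conn.executemany(
            "INSERT OR REPLACE INTO recommendations VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(["1\t[103:4.5,104:3.0]", "2\t[101:5.0]"], conn)
top = conn.execute(
    "SELECT item_id FROM recommendations WHERE user_id = ? ORDER BY position",
    ("1",),
).fetchall()
print(top)
```

Storing the position alongside the score lets the serving layer return items in the precomputed order without re-sorting.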
    • 25. Lessons
        • Need to understand the business domain
            o This takes time and effort
        • Garbage In, Garbage Out
            o Filtering is very important
        • Start with a simple approach
            o And then improve gradually
        • Having an automated pipeline is very important
            o It makes more experiments possible with less effort
            o Remember you will have to do lots of experiments
            o But it is hard and takes time to build
    • 26. Next stage of recommendation?
        • Need realtime & scalable recommendation technology.
        • Recommendation As A Service.
    • 27. Q&A