Recommendation engines


Published in: Technology

  1. Cegeka AI/ML Competence Center
     Recommendation engines: Theory and intro to
     Georgian Micsa
  2. Georgian Micsa
     ● Software engineer with 6+ years of experience, mainly in Java but also JavaScript and .NET
     ● Interested in OOD, architecture and agile software development methodologies
     ● Currently working as Senior Java Developer @ Cegeka
  3. What is it?
     ● Recommender/recommendation system/engine/platform
     ● A subclass of information filtering systems
     ● Predicts the rating or preference that a user would give to a new item (music, books, movies, people, groups, etc.)
     ● Can use a model built from the characteristics of items (content-based approaches)
     ● Can use the user's social environment (collaborative filtering approaches)
  4. Examples
     ●
       ○ Recommends additional books
       ○ "Frequently bought together" books
       ○ Implemented using a sparse matrix of book co-occurrences
     ● Pandora Radio
       ○ Plays music with similar characteristics
       ○ Content-based filtering based on properties of the song/artist
       ○ Also based on user feedback
       ○ Users emphasize or de-emphasize certain characteristics
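To make the co-occurrence idea concrete, here is a minimal plain-Java sketch (not the implementation used by the service above, which the slide does not describe): every basket increments a counter for each pair of items it contains, and "frequently bought together" is simply the highest counters in an item's row. The class `CooccurrenceSketch`, the basket-of-strings input and the nested-map sparse representation are illustrative assumptions.

```java
import java.util.*;

/** Hypothetical sketch: a sparse item-item co-occurrence matrix built from purchase baskets. */
final class CooccurrenceSketch {
    // Sparse matrix: only pairs that actually co-occur get an entry.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Adds one basket: every ordered pair of distinct items in it is counted once. */
    void addBasket(Set<String> basket) {
        for (String a : basket) {
            for (String b : basket) {
                if (!a.equals(b)) {
                    counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
    }

    /** Number of baskets containing both items (0 if they never co-occur). */
    int count(String a, String b) {
        return counts.getOrDefault(a, Collections.emptyMap()).getOrDefault(b, 0);
    }

    /** Returns up to howMany items most often bought together with the given item. */
    List<String> boughtTogether(String item, int howMany) {
        List<Map.Entry<String, Integer>> row =
                new ArrayList<>(counts.getOrDefault(item, Collections.emptyMap()).entrySet());
        // Highest count first; ties broken by item ID so the order is deterministic.
        row.sort((x, y) -> {
            int c = Integer.compare(y.getValue(), x.getValue());
            return c != 0 ? c : x.getKey().compareTo(y.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(howMany, row.size()); i++) {
            result.add(row.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        CooccurrenceSketch m = new CooccurrenceSketch();
        m.addBasket(new HashSet<>(Arrays.asList("dune", "foundation", "hyperion")));
        m.addBasket(new HashSet<>(Arrays.asList("dune", "foundation")));
        m.addBasket(new HashSet<>(Arrays.asList("dune", "neuromancer")));
        System.out.println(m.boughtTogether("dune", 2)); // [foundation, hyperion]
    }
}
```

Because only co-occurring pairs are stored, memory grows with the number of distinct pairs actually seen rather than with the square of the catalogue size.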
  5. Examples 2
     ●
       ○ Collaborative filtering
       ○ Recommends songs by observing the tracks played by a user and comparing them to the behaviour of other users
       ○ Suggests songs played by users with similar interests
     ● Netflix
       ○ Movie predictions
       ○ Hybrid approach
       ○ Collaborative filtering based on the user's previous ratings and watching behaviour (compared to other users)
       ○ Content-based filtering based on characteristics of movies
  6. Collaborative filtering
     ● Collects and analyzes a large amount of information on users' behaviors, activities or preferences
     ● Predicts what users will like based on their similarity to other users
     ● Does not rely on the content of the items
     ● Measures user similarity or item similarity
     ● Many algorithms:
       ○ k-nearest neighbors (k-NN)
       ○ Pearson correlation
       ○ etc.
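A minimal sketch of the Pearson correlation between two users, computed only over the items both have rated. The class name and the choice to return NaN when the correlation is undefined are assumptions of this example; it is not Mahout's `PearsonCorrelationSimilarity`.

```java
import java.util.*;

/** Hypothetical sketch: Pearson correlation between two users' ratings on co-rated items. */
final class PearsonSketch {
    /** Returns NaN when fewer than two items are co-rated or either user's ratings have zero variance. */
    static double pearson(Map<String, Double> u, Map<String, Double> v) {
        List<String> common = new ArrayList<>();
        for (String item : u.keySet()) {
            if (v.containsKey(item)) common.add(item);
        }
        int n = common.size();
        if (n < 2) return Double.NaN;
        // Means are taken over the co-rated items only.
        double meanU = 0, meanV = 0;
        for (String item : common) {
            meanU += u.get(item);
            meanV += v.get(item);
        }
        meanU /= n;
        meanV /= n;
        double num = 0, denU = 0, denV = 0;
        for (String item : common) {
            double du = u.get(item) - meanU, dv = v.get(item) - meanV;
            num += du * dv;
            denU += du * du;
            denV += dv * dv;
        }
        if (denU == 0 || denV == 0) return Double.NaN;
        return num / Math.sqrt(denU * denV);
    }

    public static void main(String[] args) {
        Map<String, Double> alice = Map.of("a", 5.0, "b", 3.0, "c", 4.0);
        Map<String, Double> bob = Map.of("a", 4.0, "b", 2.0, "c", 3.0, "d", 1.0);
        System.out.println(pearson(alice, bob)); // 1.0: same taste, shifted by one star
    }
}
```

Pearson correlation ignores a constant offset between users, so a harsh rater and a generous rater with the same preferences still come out as perfectly similar.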
  7. Collaborative filtering 2
     ● Build a model from the user's profile by collecting explicit and implicit data
     ● Explicit data:
       ○ Asking a user to rate an item on a sliding scale
       ○ Asking a user to rank a collection of items from favorite to least favorite
       ○ Presenting two items to a user and asking him/her to choose the better one
       ○ Asking a user to create a list of items that he/she likes
     ● Implicit data:
       ○ Observing the items that a user views in an online store
       ○ Analyzing item/user viewing times
       ○ Keeping a record of the items that a user purchases online
       ○ Obtaining a list of items that a user has listened to or watched
       ○ Analyzing the user's social network and discovering similar likes and dislikes
  8. Collaborative filtering 3
     ● Collaborative filtering approaches often suffer from three problems:
       ○ Cold start: a large amount of existing data on a user is needed to make accurate recommendations
       ○ Scalability: a large amount of computation power is often necessary to calculate recommendations
       ○ Sparsity: the number of items sold on major e-commerce sites is extremely large. Even the most active users will have rated only a small subset of the overall database, so even the most popular items have very few ratings.
  9. Content-based filtering
     ● Based on information about, and characteristics of, the items
     ● Tries to recommend items similar to those a user liked in the past (or is examining in the present)
     ● Uses an item profile (a set of discrete attributes and features)
     ● Builds a content-based user profile as a weighted vector of item features
     ● The weights denote the importance of each feature to the user
     ● To compute the weights:
       ○ average values of the rated item vectors
       ○ Bayesian classifiers, cluster analysis, decision trees and artificial neural networks
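A sketch of the "average values of the rated item vectors" option: the user profile is the rating-weighted average of the feature vectors of the items the user rated, and candidate items are scored by cosine similarity to that profile. The feature encoding, the class name and the use of cosine similarity for scoring are assumptions for illustration; the sketch also assumes at least one rated item and equally sized vectors.

```java
import java.util.*;

/** Hypothetical sketch of content-based scoring with a weighted feature profile. */
final class ContentProfileSketch {
    /** Profile = average of the rated items' feature vectors, each weighted by its rating. */
    static double[] profile(List<double[]> ratedItems, List<Double> ratings) {
        int dims = ratedItems.get(0).length;
        double[] p = new double[dims];
        double totalWeight = 0;
        for (int i = 0; i < ratedItems.size(); i++) {
            double w = ratings.get(i);
            totalWeight += w;
            for (int d = 0; d < dims; d++) p[d] += w * ratedItems.get(i)[d];
        }
        if (totalWeight != 0) {
            for (int d = 0; d < dims; d++) p[d] /= totalWeight;
        }
        return p;
    }

    /** Cosine similarity between the profile and an item's features (0 if either vector is all zeros). */
    static double score(double[] profile, double[] item) {
        double dot = 0, np = 0, ni = 0;
        for (int d = 0; d < profile.length; d++) {
            dot += profile[d] * item[d];
            np += profile[d] * profile[d];
            ni += item[d] * item[d];
        }
        return (np == 0 || ni == 0) ? 0 : dot / Math.sqrt(np * ni);
    }

    public static void main(String[] args) {
        // Features: [action, comedy, drama]; the user rated two movies 5 and 3.
        double[] p = profile(Arrays.asList(new double[] {1, 0, 0}, new double[] {1, 0, 1}),
                Arrays.asList(5.0, 3.0));
        System.out.println(Arrays.toString(p)); // [1.0, 0.0, 0.375]
        System.out.println(score(p, new double[] {0, 1, 0})); // 0.0: a pure comedy does not match
        System.out.println(score(p, new double[] {1, 0, 0}) > score(p, new double[] {0, 0, 1})); // true
    }
}
```

User feedback on individual attributes could then be applied by scaling the corresponding profile entries up or down before scoring.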
  10. Content-based filtering 2
      ● Can collect feedback from the user to assign higher or lower weights to certain attributes
      ● Cross-content recommendation: music, videos, products, discussions, etc. from different services can be recommended based on news browsing
      ● Popular for movie recommendations: Internet Movie Database, See This Next, etc.
  11. Hybrid recommender systems
      ● Combine collaborative filtering and content-based filtering
      ● Can be implemented in several ways:
        ○ making content-based and collaborative predictions separately and then combining them
        ○ adding content-based capabilities to a collaborative approach (and vice versa)
        ○ unifying the approaches into one model
      ● Studies have shown that hybrid methods can provide more accurate recommendations than pure approaches
      ● Help overcome the cold start and sparsity problems
      ● Used by Netflix and See This Next
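The first hybrid strategy (separate predictions, then combine) can be as simple as a weighted blend. In this hypothetical sketch, `alpha` weights the collaborative prediction, and when one side has no prediction (NaN), as with a brand-new item in the cold-start case, the other one is used alone; both the weighting scheme and the fallback rule are assumptions.

```java
/** Hypothetical sketch of a weighted hybrid: blend two independently computed predictions. */
final class HybridSketch {
    /**
     * alpha is the weight of the collaborative prediction, in [0, 1]. If one predictor has no
     * answer (NaN), e.g. the collaborative side for a cold-start item, the other one is used alone.
     */
    static double blend(double collaborative, double contentBased, double alpha) {
        if (alpha < 0 || alpha > 1) throw new IllegalArgumentException("alpha must be in [0, 1]");
        boolean hasCf = !Double.isNaN(collaborative);
        boolean hasCb = !Double.isNaN(contentBased);
        if (hasCf && hasCb) return alpha * collaborative + (1 - alpha) * contentBased;
        if (hasCf) return collaborative;
        if (hasCb) return contentBased;
        return Double.NaN; // neither approach can say anything about this item
    }

    public static void main(String[] args) {
        System.out.println(blend(4.0, 2.0, 0.75)); // 3.5
        System.out.println(blend(Double.NaN, 3.0, 0.75)); // 3.0: cold start, content-based only
    }
}
```

In practice `alpha` would be tuned on held-out data, or made to grow with the number of ratings an item has.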
  12. What is Apache Mahout?
      ● A scalable machine learning library
      ● Apache License
      ● Scales to reasonably large datasets (core algorithms implemented in Map/Reduce, runnable on Hadoop)
      ● Distributed and non-distributed algorithms
      ● Community
      ● Use cases:
        ○ Clustering (group items that are topically related)
        ○ Classification (learn to assign categories to documents)
        ○ Frequent itemset mining (find items that appear together)
        ○ Recommendation mining (find items a user might like)
  13. Non-distributed recommenders
      ● Non-distributed, non-Hadoop, collaborative recommender algorithms
      ● Used directly from Java, or through an external server that exposes the recommendation logic to your application via web services and HTTP
      ● Key interfaces:
        ○ DataModel: CSV files or database
        ○ UserSimilarity: computes similarity between users
        ○ ItemSimilarity: computes similarity between items
        ○ UserNeighborhood: determines the neighborhood of most similar users for a given user
        ○ Recommender: produces recommendations
      ● Different implementations based on your needs
      ● Input format: UserId,ItemId,[Preference or Rating]
      ● The preference is not needed for associations (e.g. pages viewed by users)
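A plain-Java sketch of reading the UserId,ItemId,[Preference] format into nested maps. This is not Mahout's `FileDataModel`; storing 1.0 when the preference column is missing is an assumption made here to represent a bare association, and the class name is hypothetical.

```java
import java.io.*;
import java.util.*;

/** Hypothetical parser for lines of the form userId,itemId[,preference] (not Mahout's FileDataModel). */
final class PreferenceCsvSketch {
    /** Returns userId -> (itemId -> preference); a missing preference is stored as 1.0 (association only). */
    static Map<Long, Map<Long, Double>> parse(Reader in) {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue; // tolerate blank lines
                String[] f = line.split(",");
                if (f.length < 2 || f.length > 3) throw new IllegalArgumentException("Bad line: " + line);
                long user = Long.parseLong(f[0].trim());
                long item = Long.parseLong(f[1].trim());
                double value = f.length == 3 ? Double.parseDouble(f[2].trim()) : 1.0;
                prefs.computeIfAbsent(user, k -> new HashMap<>()).put(item, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return prefs;
    }

    public static void main(String[] args) {
        String csv = "1,101,5.0\n1,102,3.0\n2,101\n"; // last line: association without a preference
        System.out.println(parse(new StringReader(csv)));
    }
}
```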
  14. User-based recommender example
      // Mahout Taste classes (org.apache.mahout.cf.taste.*), plus java.io.File and java.util.List
      DataModel model = new FileDataModel(new File("data.txt"));
      UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(model);
      // Optional: infer missing preferences as the user's average preference
      userSimilarity.setPreferenceInferrer(new AveragingPreferenceInferrer(model));
      // Neighborhood of the 3 most similar users
      UserNeighborhood neighborhood = new NearestNUserNeighborhood(3, userSimilarity, model);
      Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, userSimilarity);
      Recommender cachingRecommender = new CachingRecommender(recommender);
      // Top 10 recommendations for user 1234
      List<RecommendedItem> recommendations = cachingRecommender.recommend(1234, 10);
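What the user-based recommender computes can be sketched without Mahout: score all other users, keep the N most similar as the neighborhood, and estimate the rating as a similarity-weighted average of the neighbors' ratings for the item. The class and method names are hypothetical, keeping only positive similarities is an assumption, and the similarity function is passed in so that, for example, a Pearson correlation can be plugged in.

```java
import java.util.*;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch of user-based CF: nearest-N neighborhood plus similarity-weighted average. */
final class UserBasedSketch {
    /** Estimates user's rating for item from its n nearest neighbors; NaN if no neighbor rated it. */
    static double predict(Map<String, Map<String, Double>> ratings, String user, String item, int n,
                          ToDoubleBiFunction<Map<String, Double>, Map<String, Double>> similarity) {
        Map<String, Double> target = ratings.get(user);
        // 1. Score every other user; keep positive similarities only.
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> e : ratings.entrySet()) {
            if (e.getKey().equals(user)) continue;
            double s = similarity.applyAsDouble(target, e.getValue());
            if (!Double.isNaN(s) && s > 0) scored.add(new AbstractMap.SimpleEntry<>(e.getKey(), s));
        }
        // 2. The neighborhood is the n most similar users.
        scored.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<Map.Entry<String, Double>> neighborhood = scored.subList(0, Math.min(n, scored.size()));
        // 3. Similarity-weighted average of the neighbors' ratings for the item.
        double weighted = 0, totalSim = 0;
        for (Map.Entry<String, Double> neighbor : neighborhood) {
            Double r = ratings.get(neighbor.getKey()).get(item);
            if (r != null) {
                weighted += neighbor.getValue() * r;
                totalSim += neighbor.getValue();
            }
        }
        return totalSim == 0 ? Double.NaN : weighted / totalSim;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("ann", Map.of("matrix", 5.0));
        ratings.put("bob", Map.of("matrix", 5.0, "alien", 4.0));
        ratings.put("cid", Map.of("matrix", 1.0, "alien", 2.0));
        // Hypothetical similarity for the demo: 1 / (1 + |difference on "matrix"|).
        ToDoubleBiFunction<Map<String, Double>, Map<String, Double>> sim =
                (p, q) -> 1.0 / (1.0 + Math.abs(p.get("matrix") - q.get("matrix")));
        System.out.println(predict(ratings, "ann", "alien", 1, sim)); // 4.0
    }
}
```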
  15. Item-based recommender example
      DataModel model = new FileDataModel(new File("data.txt"));
      // Construct the list of pre-computed item-item similarities
      Collection<GenericItemSimilarity.ItemItemSimilarity> correlations = ...;
      ItemSimilarity itemSimilarity = new GenericItemSimilarity(correlations);
      Recommender recommender = new GenericItemBasedRecommender(model, itemSimilarity);
      Recommender cachingRecommender = new CachingRecommender(recommender);
      // Top 10 recommendations for user 1234
      List<RecommendedItem> recommendations = cachingRecommender.recommend(1234, 10);
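The item-based variant can be sketched in the same spirit: given pre-computed item-item similarities, a user's rating for an unseen item is estimated as the similarity-weighted average of that user's own ratings on related items. The class and method names, the positive-similarity filter and the tie-breaking are assumptions; this is not Mahout's `GenericItemBasedRecommender`.

```java
import java.util.*;

/** Hypothetical sketch of item-based CF with pre-computed item-item similarities. */
final class ItemBasedSketch {
    private final Map<String, Map<String, Double>> sims = new HashMap<>();

    /** Registers a symmetric similarity between two items. */
    void addSimilarity(String a, String b, double value) {
        sims.computeIfAbsent(a, k -> new HashMap<>()).put(b, value);
        sims.computeIfAbsent(b, k -> new HashMap<>()).put(a, value);
    }

    /** Weighted average of the user's own ratings on items similar to the target; NaN if none. */
    double predict(Map<String, Double> userRatings, String item) {
        Map<String, Double> row = sims.getOrDefault(item, Collections.emptyMap());
        double weighted = 0, totalSim = 0;
        for (Map.Entry<String, Double> rated : userRatings.entrySet()) {
            Double s = row.get(rated.getKey());
            if (s != null && s > 0) {
                weighted += s * rated.getValue();
                totalSim += s;
            }
        }
        return totalSim == 0 ? Double.NaN : weighted / totalSim;
    }

    /** Ranks unrated items by predicted rating (ties by item ID) and returns the top howMany. */
    List<String> recommend(Map<String, Double> userRatings, int howMany) {
        List<String> candidates = new ArrayList<>();
        for (String item : sims.keySet()) {
            if (!userRatings.containsKey(item) && !Double.isNaN(predict(userRatings, item))) {
                candidates.add(item);
            }
        }
        candidates.sort(Comparator.comparingDouble((String item) -> predict(userRatings, item)).reversed()
                .thenComparing(Comparator.naturalOrder()));
        return candidates.subList(0, Math.min(howMany, candidates.size()));
    }

    public static void main(String[] args) {
        ItemBasedSketch sketch = new ItemBasedSketch();
        sketch.addSimilarity("a", "b", 0.9);
        sketch.addSimilarity("b", "c", 0.5);
        sketch.addSimilarity("c", "d", 0.8);
        Map<String, Double> user = Map.of("a", 5.0, "c", 1.0);
        System.out.println(sketch.recommend(user, 2)); // [b, d]
    }
}
```

Item-item similarities tend to change more slowly than user-user ones, which is one reason they are often pre-computed, as on the slide above.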
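  16. Recommender evaluation
      For preference data models:
      DataModel myModel = ...;
      RecommenderBuilder builder = new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel model) throws TasteException {
          // Build and return the Recommender to evaluate, e.g. the user-based one from slide 14:
          UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
          UserNeighborhood neighborhood = new NearestNUserNeighborhood(3, similarity, model);
          return new GenericUserBasedRecommender(model, neighborhood, similarity);
        }
      };
      RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
      // Train on 90% of each user's preferences, evaluate over 100% of the users.
      // Recent Mahout versions also take a DataModelBuilder argument (null = default).
      double evaluation = evaluator.evaluate(builder, null, myModel, 0.9, 1.0);
      // Lower is better: average absolute difference between estimated and actual preferences
      For boolean data models, precision and recall can be computed.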
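The two kinds of metrics mentioned here can be sketched directly, given predictions or recommendations that have already been produced. Mahout's evaluators also handle the training/test split, which this sketch leaves out; for precision and recall on boolean data, Mahout provides `GenericRecommenderIRStatsEvaluator`. The class and method names below are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch of the evaluation metrics, computed on already-made predictions. */
final class EvaluationSketch {
    /** Mean absolute difference between estimated and actual preferences (lower is better). */
    static double averageAbsoluteDifference(double[] estimated, double[] actual) {
        if (estimated.length != actual.length || estimated.length == 0) {
            throw new IllegalArgumentException("need equally sized, non-empty arrays");
        }
        double sum = 0;
        for (int i = 0; i < estimated.length; i++) sum += Math.abs(estimated[i] - actual[i]);
        return sum / estimated.length;
    }

    /** Precision = relevant items among those recommended / number recommended. */
    static double precision(List<String> recommended, Set<String> relevant) {
        if (recommended.isEmpty()) return 0;
        return (double) hits(recommended, relevant) / recommended.size();
    }

    /** Recall = relevant items among those recommended / number of relevant items. */
    static double recall(List<String> recommended, Set<String> relevant) {
        if (relevant.isEmpty()) return 0;
        return (double) hits(recommended, relevant) / relevant.size();
    }

    private static int hits(List<String> recommended, Set<String> relevant) {
        int h = 0;
        for (String item : recommended) {
            if (relevant.contains(item)) h++;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(averageAbsoluteDifference(new double[] {4, 3, 5}, new double[] {5, 3, 3})); // 1.0
        List<String> recommended = Arrays.asList("a", "b", "c", "d");
        Set<String> relevant = new HashSet<>(Arrays.asList("a", "c", "x"));
        System.out.println(precision(recommended, relevant)); // 0.5
        System.out.println(recall(recommended, relevant));
    }
}
```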
  17. Distributed item-based
      ● Mahout offers two Hadoop Map/Reduce jobs aimed at supporting item-based collaborative filtering
      ●
        ○ computes all similar items
        ○ input is a CSV file with the format userID,itemID,value
        ○ output is a file of itemIDs with their associated similarity values
        ○ different configuration options, e.g. the similarity measure to use (co-occurrence, Euclidean distance, Pearson correlation, etc.)
      ●
        ○ completely distributed item-based recommender
        ○ input is a CSV file with the format userID,itemID,value
        ○ output is a file of userIDs with associated recommended itemIDs and their scores
        ○ also has configuration options
  18. Mahout tips
      ● Start with non-distributed recommenders
      ● 100M user-item associations can be handled as a real-time recommender by a modern server with 4GB of heap available
      ● Beyond this scale, distributed algorithms make sense
      ● Data can be sampled; noisy and old data can be pruned
      ● Ratings: GenericItemBasedRecommender and PearsonCorrelationSimilarity
      ● Preferences (boolean data): GenericBooleanPrefItemBasedRecommender and LogLikelihoodSimilarity
      ● Content-based item-item similarity => implement your own ItemSimilarity
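`LogLikelihoodSimilarity` is based on Dunning's log-likelihood ratio. The sketch below computes that ratio from a 2x2 table of user counts for two items, using the usual entropy formulation; the class name is hypothetical, and the final mapping to a similarity in [0, 1) via 1 - 1/(1 + LLR) is an assumption to check against the Mahout source.

```java
/** Hypothetical sketch of Dunning's log-likelihood ratio for two items over a population of users. */
final class LogLikelihoodSketch {
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy: xLogX(sum) minus the sum of xLogX(count). */
    private static double entropy(long... counts) {
        long sum = 0;
        double result = 0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    /**
     * k11 = users with both items, k12 = only item A, k21 = only item B, k22 = neither.
     * Close to 0 when the items occur independently; grows with the strength of association.
     */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    /** One way to squash the ratio into [0, 1) for use as a similarity (assumed mapping). */
    static double similarity(long k11, long k12, long k21, long k22) {
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }

    public static void main(String[] args) {
        System.out.println(logLikelihoodRatio(10, 10, 10, 10)); // close to 0: independent items
        System.out.println(logLikelihoodRatio(100, 0, 0, 100)); // about 277.26: strongly associated
    }
}
```

Unlike raw co-occurrence counts, the ratio does not reward popular items just for being popular, which is why it suits boolean preference data.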
  19. Mahout tips 2
      ● CSV files
        ○ FileDataModel
        ○ push new files periodically
      ● Database
        ○ XXXJDBCDataModel
        ○ ReloadFromJDBCDataModel
      ● Offline or live recommendations?
        ○ Distributed algorithms => offline periodic computations
        ○ Data is pushed periodically as CSV files or into the DB
        ○ SlopeOneRecommender deals with updates quickly
        ○ Real-time update of the DataModel and refresh of the recommender after certain events (user rates an item, etc.)
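Slope One handles updates cheaply because it only keeps, for each item pair, the sum of rating differences and the number of users behind it, so a new rating touches only the pairs involving that item. Below is a minimal sketch of weighted Slope One built on that idea; it is not Mahout's `SlopeOneRecommender`, and the class and method names are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch of weighted Slope One with running rating-difference tables. */
final class SlopeOneSketch {
    // diffSum[j][i] = sum over users who rated both of (r_j - r_i); count[j][i] = number of such users.
    private final Map<String, Map<String, Double>> diffSum = new HashMap<>();
    private final Map<String, Map<String, Integer>> count = new HashMap<>();

    /** Folds one user's ratings into the difference tables (call once per user). */
    void addUser(Map<String, Double> ratings) {
        for (Map.Entry<String, Double> j : ratings.entrySet()) {
            for (Map.Entry<String, Double> i : ratings.entrySet()) {
                if (j.getKey().equals(i.getKey())) continue;
                diffSum.computeIfAbsent(j.getKey(), k -> new HashMap<>())
                        .merge(i.getKey(), j.getValue() - i.getValue(), Double::sum);
                count.computeIfAbsent(j.getKey(), k -> new HashMap<>())
                        .merge(i.getKey(), 1, Integer::sum);
            }
        }
    }

    /** Predicts the rating of item j, weighting each average difference by its number of users. */
    double predict(Map<String, Double> ratings, String j) {
        Map<String, Double> sums = diffSum.getOrDefault(j, Collections.emptyMap());
        Map<String, Integer> counts = count.getOrDefault(j, Collections.emptyMap());
        double numerator = 0;
        int denominator = 0;
        for (Map.Entry<String, Double> i : ratings.entrySet()) {
            Integer c = counts.get(i.getKey());
            if (c == null) continue; // no user rated both j and i
            double avgDiff = sums.get(i.getKey()) / c;
            numerator += (avgDiff + i.getValue()) * c;
            denominator += c;
        }
        return denominator == 0 ? Double.NaN : numerator / denominator;
    }

    public static void main(String[] args) {
        SlopeOneSketch s = new SlopeOneSketch();
        s.addUser(Map.of("a", 5.0, "b", 3.0));
        s.addUser(Map.of("a", 4.0, "b", 2.0, "c", 3.0));
        System.out.println(s.predict(Map.of("a", 2.0, "c", 4.0), "b")); // 1.0
    }
}
```

Adding a single new rating incrementally would mean updating only the pairs between the new item and that user's existing ratings; that bookkeeping is omitted here for brevity.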