Recommendation Engine
Akshat Thakar
Precursor
Awareness about Analytics
• Jargon Buster
• Recommendation System for Web/Digital Analytics
• Technology
• Sentiment Analysis
Clustering
• Algorithm - K-means
• Similarity Measurement - Euclidean
Recommendation
• Collaborative-based filtering
• Item based
• User Based
• Content-based filtering
• Similarity Measurement - Pearson, Tanimoto
Classification
• Regression
• Decision Tree
• SVM
• NN
• NLP
• Voice Recognition
• Video Analytics
Content Based, Collaborative Filtering [CF] and Hybrid Recommendation System
• Content Based systems focus on properties of items. Similarity of items is determined by measuring the similarity in their properties.
Needs History Data.
• Collaborative-Filtering systems focus on the relationship between users and items. Similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items.
Source: http://infolab.stanford.edu/~ullman/mmds/ch9.pdf
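To make the content-based idea concrete, here is a minimal sketch that compares items purely by their properties. The catalog, the feature encoding and the choice of cosine similarity are all assumptions made for illustration; the slide does not prescribe any particular measure.

```python
from math import sqrt

# Hypothetical item properties encoded as numeric features:
# (is_comedy, is_horror, recency on a 0..1 scale). All values are made up.
catalog = {
    "movie_a": (1.0, 0.0, 0.9),
    "movie_b": (1.0, 0.0, 1.0),
    "movie_c": (0.0, 1.0, 1.0),
}

def cosine(p, q):
    """Cosine of the angle between two property vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = sqrt(sum(x * x for x in p)) * sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

def similar_items(name, items):
    """Rank every other item by property similarity to `name`, best first."""
    scores = [
        (other, cosine(items[name], props))
        for other, props in items.items()
        if other != name
    ]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)

ranked = similar_items("movie_a", catalog)
```

No ratings are involved here: two comedies score as similar even if no user has rated both, which is the main contrast with collaborative filtering.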
CF - User Similarity
How users are similar?
[Diagram: Data Model (User Id, Item Id, Rating), Similarity Notion, User Neighborhood and User Based Recommender, marked #1 #2 #3]
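The user-based pipeline on this slide (data model, similarity notion, user neighborhood, recommender) can be sketched in a few lines. This is a simplified illustration, not Mahout's implementation: the ratings are invented, the similarity is assumed to be 1 / (1 + Euclidean distance), and all function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Data model: (user id, item id, rating) triples. Values are made up.
ratings = [
    (1, "A", 5.0), (1, "B", 3.0), (1, "C", 4.0),
    (2, "A", 5.0), (2, "B", 3.0), (2, "D", 4.0),
    (3, "A", 1.0), (3, "B", 5.0), (3, "E", 2.0),
]

def build_model(triples):
    """Index the triples as {user: {item: rating}}."""
    model = defaultdict(dict)
    for user, item, rating in triples:
        model[user][item] = rating
    return dict(model)

def similarity(model, u, v):
    """Similarity notion (assumed): 1 / (1 + Euclidean distance) over co-rated items."""
    common = model[u].keys() & model[v].keys()
    if not common:
        return 0.0
    dist = sqrt(sum((model[u][i] - model[v][i]) ** 2 for i in common))
    return 1.0 / (1.0 + dist)

def neighborhood(model, user, n):
    """User neighborhood: the n users most similar to `user`."""
    others = [(similarity(model, user, v), v) for v in model if v != user]
    others.sort(reverse=True)
    return [(v, s) for s, v in others[:n] if s > 0]

def recommend(model, user, n_neighbors=2, top=3):
    """User-based recommender: similarity-weighted average of neighbor ratings."""
    totals, weights = defaultdict(float), defaultdict(float)
    for v, s in neighborhood(model, user, n_neighbors):
        for item, rating in model[v].items():
            if item not in model[user]:  # only suggest unseen items
                totals[item] += s * rating
                weights[item] += s
    scored = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top]

model = build_model(ratings)
recs = recommend(model, 1)
```

User 2 rates the shared items exactly like user 1, so user 2's item D ends up ranked above user 3's item E.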
CF - Item Similarity
How items are similar?
[Diagram: Data Model (User Id, Item Id, Rating), Similarity Notion, Item-neighborhood and Item Based Recommender, marked #1 #2 #3]
Source: http://www.theregister.co.uk/2006/08/15/beer_diapers/
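A rough sketch of the item-based variant, in the spirit of the beer-and-diapers story: items count as similar when the same users buy them together. The purchase data is invented, ratings are dropped for brevity, and plain co-occurrence counting is an assumed stand-in for whatever similarity notion a real system would use.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchases as (user id, item id) pairs.
purchases = [
    (1, "beer"), (1, "diapers"),
    (2, "beer"), (2, "diapers"), (2, "wipes"),
    (3, "diapers"), (3, "wipes"),
    (4, "beer"), (4, "chips"),
]

def build_baskets(pairs):
    """Group the pairs as {user: set of items bought}."""
    baskets = defaultdict(set)
    for user, item in pairs:
        baskets[user].add(item)
    return dict(baskets)

def cooccurrence(baskets):
    """Count, for each ordered item pair, how many users bought both."""
    counts = defaultdict(int)
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(baskets, counts, user, top=2):
    """Score unseen items by how often they co-occur with the user's items."""
    owned = baskets[user]
    scores = defaultdict(int)
    for (a, b), count in counts.items():
        if a in owned and b not in owned:
            scores[b] += count
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

baskets = build_baskets(purchases)
counts = cooccurrence(baskets)
recs = recommend(baskets, counts, 4)
```

User 4 bought beer, and beer co-occurs most often with diapers, so diapers come out on top.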
Similarity Notion
• Pearson Correlation - measures the tendency of the numbers [User Preferences] to move together proportionally. When this tendency is high, the correlation is close to 1.
• Spearman Correlation - correlation computed on the rank of each user preference instead of its raw value.
• Euclidean Distance - based on the distance between users. The smaller the distance, the more similar the users.
• Tanimoto Coefficient - based on the number of items in common, relative to the number of items either user has.
• LogLikelihood Similarity
How to code?
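One possible answer to "How to code?", as a plain-Python sketch of three of the measures listed on this slide (Spearman and log-likelihood are left out). The function names and the two sample users are hypothetical, and turning Euclidean distance into a similarity via 1 / (1 + distance) is an assumption of this sketch.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation over items both users rated; 0.0 if undefined."""
    common = sorted(u.keys() & v.keys())
    n = len(common)
    if n < 2:
        return 0.0
    xs = [u[i] for i in common]
    ys = [v[i] for i in common]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def euclidean_similarity(u, v):
    """1 / (1 + distance) over co-rated items, so a smaller distance scores higher."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dist = sqrt(sum((u[i] - v[i]) ** 2 for i in common))
    return 1.0 / (1.0 + dist)

def tanimoto(u, v):
    """Items in common divided by items either user has; rating values are ignored."""
    a, b = set(u), set(v)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Two made-up users as {item: rating}.
alice = {"A": 5.0, "B": 3.0, "C": 1.0}
bob = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 5.0}

p = pearson(alice, bob)
e = euclidean_similarity(alice, bob)
t = tanimoto(alice, bob)
```

Alice and Bob rank A, B and C in the same order with proportional gaps, so Pearson comes out close to 1 even though their absolute ratings differ, while the Euclidean score is pulled down by those differences.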
How Similarity Definition affects Neighborhood formation?
Source: http://www.slideshare.net/Cataldo/apache-mahout-tutorial-recommendation-20132014, Mahout In Action
Threshold based neighborhood
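As a small illustration of neighborhood formation, the sketch below builds a threshold-based neighborhood from a table of similarity scores. The scores are invented, and the fixed-size variant is added here only for contrast; it is not named on the slide.

```python
# Hypothetical similarity scores between one target user and four others.
similarities = {"u2": 0.92, "u3": 0.75, "u4": 0.40, "u5": 0.10}

def by_score(sims):
    """All (user, score) pairs, most similar first."""
    return sorted(sims.items(), key=lambda kv: kv[1], reverse=True)

def threshold_neighborhood(sims, minimum):
    """Keep every user whose similarity is at least `minimum`."""
    return [user for user, score in by_score(sims) if score >= minimum]

def nearest_n_neighborhood(sims, n):
    """Keep the n most similar users, however low their scores are."""
    return [user for user, _ in by_score(sims)[:n]]

strict = threshold_neighborhood(similarities, 0.7)
fixed = nearest_n_neighborhood(similarities, 3)
```

Because the neighborhood is derived from the scores, swapping the similarity definition changes the scores and therefore which users pass the threshold.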
Evaluation
• Evaluate Top n Recommendations
• Precision and Recall
                          Relevant         Non Relevant
Search Result Shown       True Positive    False Positive
Search Result Not Shown   False Negative   True Negative
Source: https://en.wikipedia.org/wiki/Precision_and_recall
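A minimal sketch of evaluating the top n recommendations with precision and recall. The recommended list and the set of relevant items are made up, and `precision_recall_at_n` is a hypothetical helper name.

```python
def precision_recall_at_n(recommended, relevant, n):
    """Precision and recall for the first n recommended items.

    precision = true positives / items shown
    recall    = true positives / all relevant items
    """
    top_n = recommended[:n]
    hits = sum(1 for item in top_n if item in relevant)  # true positives
    precision = hits / len(top_n) if top_n else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up ranked recommendations and the items the user actually liked.
recommended = ["A", "B", "C", "D", "E"]
relevant = {"A", "C", "F"}

precision, recall = precision_recall_at_n(recommended, relevant, 3)
```

In this example B is a false positive (shown but not relevant) and F is a false negative (relevant but not shown).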
System Solutioning - More than Algorithm Accuracy
• Business Goal Injection
• Novelty - avoiding repeated recommendations
• Diversity - How diverse are recommended items? Does it include all sub topics?
• Positive Feedback
• Negative Feedback
Source: http://www.slideshare.net/Zhenv5/diversity-and-novelty-for-recommendation-system
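Novelty and diversity can be checked with very little code. The sketch below is one simple interpretation, not a standard metric: novelty is handled by filtering out items already shown, and diversity is measured as the share of sub-topics covered. The item names, topics and helper functions are all hypothetical.

```python
def drop_already_shown(candidates, shown):
    """Novelty: skip items the user has already been recommended."""
    return [item for item in candidates if item not in shown]

def topic_coverage(recommended, topic_of, all_topics):
    """Diversity: fraction of the known sub-topics that appear in the list."""
    if not all_topics:
        return 0.0
    covered = {topic_of[item] for item in recommended}
    return len(covered & set(all_topics)) / len(all_topics)

# Made-up candidates, recommendation history and topic labels.
candidates = ["A", "B", "C", "D"]
shown_before = {"B"}
topic_of = {"A": "tech", "B": "tech", "C": "sports", "D": "tech"}
all_topics = ["tech", "sports", "music"]

fresh = drop_already_shown(candidates, shown_before)
coverage = topic_coverage(fresh, topic_of, all_topics)
```

Here the fresh list covers tech and sports but not music, so a system tuned only for accuracy could still score poorly on diversity.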
Technology
• Mahout - Hadoop (optional), Java. Many stable algorithms.
• R - RHadoop. Many statistics packages.
• Spark - Emerging technology. Algorithms are still being added.

Editor's Notes

  • #4 Recommendation borrows a lot of statistical methods and ML techniques, so the same algorithm can also appear under clustering or another area. The algorithms we discuss today are the frequently used ones.
  • #5 How many of you like Gmail, Android and Google Search? Tech talks, technical blogs, books. I do not know buying behavior, but I know social behavior.
  • #6 Facebook news. Tech talks, blogs, reading books. Next certification course.
  • #7 Baby items. Beer and diapers. Evaluate both User Similarity vs Item Similarity. How to keep items in the same aisle.
  • #8 School of thought. Preference values. Rank.
  • #11 Baby items for old age people, or gap between two kids, two laptop purchases. Google searches.