Recommendation Systems
By: D.Robin Reni (AI Intern - SPM)
Mentor: C.Aneesh (AI Lead - SPM)
What is a Recommendation System?
It is an automated system that recommends relevant items to a user based on their previous interactions with other items in the system.
Where does it come from?
> Search Engines & Internet
> Information Retrieval
> Machine Learning
> E-Commerce & Advertising
> Data
Has it become serious?
WHY DO WE USE IT ?
Better Customer
Experience
User Personalization
Increase Revenue
Example :
NEED TO KNOW
User : A user in a recommender system is the party that is
receiving and acting on the recommendations.
Item : An item in a recommender system is the passive party
that is being recommended to the users.
Interactions can be positive or negative, and explicit (e.g., ratings) or implicit (e.g., clicks, views).
Representations
A (typically) low-dimensional vector that encodes the feature information about the user or item.
Often called an "embedding," "latent user/item," or "latent representation."
The representation size, which is the dimension of the latent space, is often referred to as "components."
Example : Matrix Representation
HOW DOES IT WORK?
THREE MAJOR METHODOLOGIES
Content-Based | Collaborative Filtering | Hybrid Systems
Content-Based
Recommendation
Systems
Filter the key topics from the documents the user has shown interest in and interacted with, and train the model on those keywords to recommend relevant documents to the user.
Example: Content-Based

User Profile
         Action  Rescue  President
User 1   1       1       0
User 2   1       0       1

Item Profile
              Action  Rescue  President
Olympus       1       1       1
White House   1       1       0
London        1       0       1

User-Item Interaction
         Olympus  White House  London
User 1   2        2            ?
User 2   ?        1            2
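A minimal NumPy sketch of this example (assuming, as the table suggests, that each interaction score is the dot product of a user profile and an item profile; the matrices below are copied from the tables above):

```python
import numpy as np

# User profiles over the keywords [Action, Rescue, President]
users = np.array([[1, 1, 0],    # User 1
                  [1, 0, 1]])   # User 2

# Item profiles over the same keywords
items = np.array([[1, 1, 1],    # Olympus
                  [1, 1, 0],    # White House
                  [1, 0, 1]])   # London

# Content-based score = dot product of user profile and item profile
scores = users @ items.T
print(scores)
# [[2 2 1]
#  [2 1 2]]
# The known cells (2, 2, 1, 2) match the interaction table above; the "?" cells
# come out as 1 (User 1 / London) and 2 (User 2 / Olympus).
```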
Collaborative
Filtering
Identify similarities from the user-item interactions and find the most similar users/items for the target user; this similarity data acts as the interaction dataset for the recommendation system.
Example: Collaborative Filtering

User-Item Interactions
         Movie 1  Movie 2  Movie 3  Movie 4  Movie 5
User 1   2        ?        4        3
User 2   2        2        3        4
User 3   1        2        3        2
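A small sketch of user-based collaborative filtering on an interaction matrix shaped like the one above (the ratings are the ones shown; the missing "?" is treated as 0, a common simplification):

```python
import numpy as np

# Toy user-item rating matrix from the example; 0 stands in for the missing "?"
R = np.array([[2., 0., 4., 3.],
              [2., 2., 3., 4.],
              [1., 2., 3., 2.]])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Similarity of each user to User 1 (row 0)
sims = np.array([cosine(R[0], R[u]) for u in range(R.shape[0])])

# Predict User 1's missing rating (column 1) as a similarity-weighted
# average of the other users' ratings for that item
others = [u for u in range(R.shape[0]) if u != 0]
weights = sims[others]
prediction = weights @ R[others, 1] / weights.sum()
print(round(prediction, 2))   # 2.0 here, since both neighbors rated it 2
```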
Hybrid
Recommendation
Systems
A system that combines content-based filtering and collaborative filtering can take advantage of both the representation of the content and the similarities among users.
Example: Hybrid Systems

         Comedy  Historical  Adventure  User 4
User 1   1       0.3         0.6        -
User 2   0.9     0.1         0.7        +
User 3   0.8     1           0.6        ?
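One simple way to build a hybrid (an illustrative sketch, not the only design): score items with a content-based model and a collaborative-filtering model separately, then blend the two score vectors with a tunable weight.

```python
import numpy as np

# Hypothetical per-item scores for one user from each component
content_scores = np.array([0.8, 0.2, 0.6])        # from item/genre features
collaborative_scores = np.array([0.5, 0.9, 0.4])  # from similar users

alpha = 0.6                                       # blend weight, tuned on validation data
hybrid_scores = alpha * content_scores + (1 - alpha) * collaborative_scores

print(int(np.argmax(hybrid_scores)))              # index of the item to recommend first
```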
Framework of Recommendation Systems
• Interactions
• User Features
• User Representations
• Item Features
• Item Representations
• Prediction
• Learning
• Evaluation
Evaluation Metrics
Precision: the fraction of the recommended items that are relevant.
Recall: the fraction of all relevant items that appear in the recommendations.
AUC score: the probability that a randomly chosen positive example has a higher score than a randomly chosen negative example. A perfect score is 1.0.
Example: Precision & Recall Tradeoff

Recommended items: Car, Mobile, Headset, Tshirt, Watch
Relevant items: Car, Watch, Perfume

Precision: 2 / 5 = 0.4 (1.0 is a perfect score)
Recall: 2 / 3 ≈ 0.67 (1.0 is a perfect score)
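The same calculation as a quick Python check:

```python
recommended = {"Car", "Mobile", "Headset", "Tshirt", "Watch"}
relevant = {"Car", "Watch", "Perfume"}

hits = recommended & relevant               # items both recommended and relevant
precision = len(hits) / len(recommended)    # 2 / 5 = 0.4
recall = len(hits) / len(relevant)          # 2 / 3 ≈ 0.67

print(precision, round(recall, 2))
```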
TOOLS FOR RECOMMENDATION SYSTEMS
LightFM
LightFM is a Python implementation of a number of popular
recommendation algorithms for both implicit and explicit feedback,
including efficient implementation of BPR and WARP ranking
losses. It's easy to use, fast (via multithreaded model estimation),
and produces high quality results.
FRAMEWORK OF LIGHTFM
• Interactions: *
• User Features: *
• User Representation: Linear
• Item Features: *
• Item Representation: Linear
• Prediction: Dot product
• Learning: Logistic, WARP, BPR
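A minimal LightFM sketch of this pipeline, using the MovieLens dataset bundled with the library and the WARP loss (the hyperparameters are illustrative, not tuned):

```python
from lightfm import LightFM
from lightfm.datasets import fetch_movielens
from lightfm.evaluation import precision_at_k

# MovieLens 100k, keeping only ratings of 4+ as positive interactions
data = fetch_movielens(min_rating=4.0)

# Latent-factor model trained with the WARP ranking loss
model = LightFM(no_components=30, loss="warp")
model.fit(data["train"], epochs=10, num_threads=2)

# Precision@10 on the held-out test interactions
print(precision_at_k(model, data["test"], k=10).mean())
```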
Example:

User Profile Representation
         Aerospace  Medicine  Analytics  Transport
John     1          -1        1          1
Laura    -1         1         1          0
Tim      0          -1        1          1

Item Profile Representation
                       Aerospace  Medicine  Analytics  Transport
Flight Project         1          0         1          1
Drug Discovery         0          1         1          0
Automobile Incubation  0          0         1          1
After taking the dot product (making predictions):

         Flight Project  Drug Discovery  Automobile Incubation
John     1               -1              0
Laura    -1              1               0
Tim      0               -1              1

1 = user liked it
-1 = user disliked it
0 = the user has not interacted with it; predict the recommendation level
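The prediction step as a NumPy sketch. Note that the table above lists the interaction labels (liked / disliked / unknown); the dot products computed here are the raw affinity scores, which a link or loss function (such as the logistic function on the next slide) then turns into recommendation levels.

```python
import numpy as np

# User profile representations over [Aerospace, Medicine, Analytics, Transport]
users = np.array([[1, -1, 1, 1],   # John
                  [-1, 1, 1, 0],   # Laura
                  [0, -1, 1, 1]])  # Tim

# Item profile representations over the same features
items = np.array([[1, 0, 1, 1],    # Flight Project
                  [0, 1, 1, 0],    # Drug Discovery
                  [0, 0, 1, 1]])   # Automobile Incubation

# Dot product of each user and item representation = predicted affinity
scores = users @ items.T
print(scores)
# scores[0, 2] == 2 is John vs Automobile Incubation,
# the value used in the logistic example below.
```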
Learn using Loss Functions
Four kinds of loss functions are used to optimize the recommendations:
• Logistic
• Bayesian Personalized Ranking (BPR)
• Weighted Approximate-Rank Pairwise (WARP)
• k-OS WARP
Logistic: used when both positive and negative interactions are present.
Example: estimate the probability that Automobile Incubation should be recommended to John.
Find x: compute the dot product of the Automobile Incubation features and John's features:
x = 1*0 + (-1)*0 + 1*1 + 1*1 = 2
Logistic function: f(x) = 1 / (1 + e^(-x)), so f(2) = 1 / (1 + e^(-2)) ≈ 0.88
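The same computation in plain NumPy (a sketch of the idea, not LightFM's internals):

```python
import numpy as np

john = np.array([1, -1, 1, 1])          # John's profile representation
automobile = np.array([0, 0, 1, 1])     # Automobile Incubation's profile

x = john @ automobile                   # dot product = 2
probability = 1.0 / (1.0 + np.exp(-x))  # logistic (sigmoid) link
print(x, round(probability, 2))         # 2 0.88
```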
WARP (Example)
Consider recommending a project for Tim.

                  Flight Project  Drug Discovery  Automobile Incubation
Tim (predicted)   0.2             0.57            0.6
Actual output     0               1               0

x1: Flight Project, x2: Drug Discovery, x3: Automobile Incubation
Pairwise(x1, x2): x1 < x2, so the positive item x2 ranks higher: no loss.
Pairwise(x2, x3): x3 > x2, so the negative item x3 ranks above the positive x2: a loss is incurred.
Loss(x2, x3) = ln((X - 1) / N) * (x3 - x2) = ln((3 - 1) / 2) * (0.6 - 0.57) = ln(1) * 0.03 = 0,
where X is the total number of items and N is the number of negative items sampled before the violating one was found (here 2).
To optimize the loss, use stochastic gradient descent.
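A toy sketch of the WARP sampling idea for this example (a hypothetical helper that mirrors the slide's arithmetic; LightFM's actual implementation uses a hinge margin and repeated random sampling):

```python
import math

# Tim's predicted scores; Drug Discovery is the (only) positive item
scores = {"Flight Project": 0.2, "Drug Discovery": 0.57, "Automobile Incubation": 0.6}
positive = "Drug Discovery"
negatives = [item for item in scores if item != positive]

num_items = len(scores)   # X = 3
trials = 0
loss = 0.0
for negative in negatives:                        # sample negatives until one outranks the positive
    trials += 1
    if scores[negative] > scores[positive]:       # violating negative found
        rank_estimate = (num_items - 1) / trials  # (X - 1) / N
        loss = math.log(rank_estimate) * (scores[negative] - scores[positive])
        break

print(trials, loss)   # 2 trials -> ln(1) * 0.03 = 0.0, as on the slide
```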
Benchmarking of LightFM
Annoy
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with
Python bindings to search for points in space that are close to a given
query point. It also creates large read-only file-based data structures
that are mmapped into memory so that many processes may share the
same data.
Annoy provides only the prediction (nearest-neighbor search) part.
Annoy Procedures (sketched in code below):
• Collect the users and their item interactions
• Index and add the item vectors for each user
• Specify the number of trees (n_trees) to build
• Load the indexed vectors
• Using KNN (k-nearest neighbors) and the chosen distance function, the n_trees are built over the data
• Get the relevant items for a particular user by searching its k nearest neighbors
• Evaluate the recommended items by precision
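A minimal Annoy sketch following those steps (random vectors stand in for learned item representations; the dimension, item count, and tree count are illustrative):

```python
import random
from annoy import AnnoyIndex

f = 40                                   # representation (embedding) dimension
index = AnnoyIndex(f, "angular")         # angular distance ~ cosine similarity

# Add item vectors to the index (random here; learned embeddings in practice)
for item_id in range(1000):
    vector = [random.gauss(0, 1) for _ in range(f)]
    index.add_item(item_id, vector)

index.build(10)                          # build 10 trees
index.save("items.ann")

# Load (mmap) the index and fetch the 5 items nearest to item 0
loaded = AnnoyIndex(f, "angular")
loaded.load("items.ann")
print(loaded.get_nns_by_item(0, 5))
```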
Annoy Mapping
Benchmarking of Annoy
FAISS
(Facebook AI Similarity Search)
Faiss is a library for efficient similarity search and clustering of
dense vectors. It contains algorithms that search in sets of vectors
of any size, up to ones that possibly do not fit in RAM. It also
contains supporting code for evaluation and parameter tuning.
FAISS Procedures (sketched in code below):
• Get a data frame with user_id, item_id, and ratings
• Set the dimension value (if a user can interact with n items, then n_items is the dimension)
• Set the number of users and create a NumPy vector of the rating values for each user
• Set the number of queries and their values
• Create an index using the dimension value and one of FAISS's index classes
• Search the top-k items for each user (FAISS implements KNN and finds the nearest items using Euclidean distance)
• Evaluate the recommended items with cross-validation and by calculating their precision values
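A minimal FAISS sketch of those steps (random vectors stand in for the per-user rating vectors; the dimension and sizes are illustrative):

```python
import numpy as np
import faiss

d = 64                                                # dimension = number of items
np.random.seed(0)
xb = np.random.random((1000, d)).astype("float32")    # database: one vector per user
xq = np.random.random((5, d)).astype("float32")       # query users

index = faiss.IndexFlatL2(d)                          # exact search with Euclidean (L2) distance
index.add(xb)                                         # index the user vectors
distances, neighbors = index.search(xq, 10)           # top-10 nearest neighbors per query
print(neighbors[0])                                   # users most similar to query 0
```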
Non-Metric Space
Library
( NMSLib )
Non-Metric Space Library (NMSLIB) is an efficient cross-platform
similarity search library and a toolkit for evaluation of similarity
search methods. The goal of the project is to create an effective
and comprehensive toolkit for searching in generic non-metric
spaces.
NMSLib Procedures (sketched in code below):
• Form the interaction matrix
• Split the interaction matrix into a data_matrix and a query_matrix
• Specify the threading properties, such as the index thread quantity
• Define the number of nearest neighbors (k) to calculate
• Index the data_matrix and specify the distance algorithm
• Search the nearest neighbors for the query matrix
• Using a gold-standard (brute-force) method, calculate the true nearest neighbors for the query matrix
• Calculate the recall
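A minimal NMSLIB sketch of those steps (random matrices stand in for the split interaction matrix; the HNSW method and cosine space are common choices, not the only ones):

```python
import numpy as np
import nmslib

np.random.seed(0)
data_matrix = np.random.random((1000, 64)).astype(np.float32)
query_matrix = np.random.random((10, 64)).astype(np.float32)

# HNSW index with cosine similarity as the distance measure
index = nmslib.init(method="hnsw", space="cosinesimil")
index.addDataPointBatch(data_matrix)
index.createIndex({"post": 2}, print_progress=False)

# k nearest neighbors for every row of the query matrix
results = index.knnQueryBatch(query_matrix, k=10, num_threads=4)
ids, distances = results[0]
print(ids)
```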
Alternating Least
Squares (ALS) using the
Implicit Library
Implicit is a Python collaborative-filtering toolkit that uses matrix factorization to learn representations. It includes factorization classes for ALS and BPR.
Singular Value Decomposition
ALS Procedures (sketched in code below):
• Collect the interaction matrix and the item and user latent factors.
• In ALS, we factorize the interaction matrix in an SVD-like form, i.e. R ≈ U E P^T, where U holds the user latent factors, P the item latent factors, and E the latent-factor weights.
• After predicting the values, optimize U E P^T using ALS and stochastic gradient descent.
• Cost Function:
• Matching Solution:
• Now get the predicted interaction matrix.
• Evaluate the model by comparing the predicted and the original interaction matrices.
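A minimal sketch with the implicit library (this assumes implicit >= 0.5, where fit() takes a user-by-item CSR matrix; the toy matrix and hyperparameters are illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix
from implicit.als import AlternatingLeastSquares

# Toy user-item interaction matrix (rows = users, columns = items)
interactions = csr_matrix(np.array([[1, 0, 3, 0],
                                    [0, 2, 0, 1],
                                    [4, 0, 0, 2]], dtype=np.float32))

# ALS matrix factorization into user and item latent factors
model = AlternatingLeastSquares(factors=2, regularization=0.01, iterations=15)
model.fit(interactions)                       # assumes implicit >= 0.5 (user-item orientation)

# Top-2 recommendations for user 0, excluding items already interacted with
item_ids, scores = model.recommend(0, interactions[0], N=2)
print(item_ids, scores)
```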
CHALLENGES
1. Cold Start Problem
2. Observing User Interactions
3. Complex Algorithms
4. Unpredictability
5. Long Tail Concern
6. Data Availability
FACTS YOU'D LIKE TO KNOW
1. 35% of the purchases you make on Amazon are driven by its recommendation systems.
2. 70% of the videos each user watches on YouTube are suggested by Google's automated recommendation engine.
3. 75% of what people watch on Netflix is recommended by the recommendation system.
4. Deploying a recommendation engine saves up to $1 billion for the e-commerce giants, says a Netflix market study report.
THINGS YOU CAN TRY
DATASETS
Movies : https://www.kaggle.com/sengzhaotoo/movielens-small
Jokes : http://eigentaste.berkeley.edu/dataset/
Wikipedia: https://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia
Music : http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/index.html
Product: http://jmcauley.ucsd.edu/data/amazon/links.html
KERNELS
1. https://www.kaggle.com/robinreni/recommendation-benchmarking-iia
2. https://www.kaggle.com/axelderomblay/udacity-workshop-on-recommendation-systems
3. https://www.kaggle.com/toorkp/wsdm-recommendations
4. https://www.kaggle.com/tanetboss/user-clustering-for-anime-recommendation
5. https://www.kaggle.com/abhinav97dutt/book-recommendation-collaborative-filteringt/
THAT'S IT
