Basic functions and terminology of recommendation systems, with some algorithmic implementations on sample datasets to aid understanding. It explains all the layers of the RS framework.
2. What is a Recommendation System?
It is an automated system that recommends
relevant items to a user based on that user's previous
interactions with other items in the system.
3. Where does it come from?
> Search Engines & Internet
> Information Retrieval
> Machine Learning
> E-Commerce & Advertising
> Data
8. User : A user in a recommender system is the party that is
receiving and acting on the recommendations.
Item : An item in a recommender system is the passive party
that is being recommended to the users.
10. Representations
A (typically) low-dimensional vector that encodes the feature information about the
user or item.
Often called an “embedding,” “latent user/item,” or “latent representation.”
Representation size, which is the dimension of the latent space, is often referred to as
“components.”
14. Content Based Recommendation Systems
Extract the key topics from the documents the user was
interested in and interacted with, and train the model
on those keywords so that it can suggest relevant documents
to the user.
15. Example : Content Based
User Profile
           Action   Rescue   President
User 1     1        1        0
User 2     1        0        1
Item Profile
              Action   Rescue   President
Olympus       1        1        1
White House   1        1        0
London        1        0        1
User Item Interaction
           Olympus   White House   London
User 1     2         2             ?
User 2     ?         1             2
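A minimal NumPy sketch of this example, assuming each score is the dot product of a user profile row and an item profile row (the number of shared features). Under that assumption the known entries of the interaction table are reproduced, and the two "?" cells get a value.

```python
import numpy as np

# Feature order: Action, Rescue, President (values copied from the example).
user_profile = np.array([
    [1, 1, 0],  # User 1
    [1, 0, 1],  # User 2
])
item_profile = np.array([
    [1, 1, 1],  # Olympus
    [1, 1, 0],  # White House
    [1, 0, 1],  # London
])

# Score of every user for every item = number of features they share.
scores = user_profile @ item_profile.T

items = ["Olympus", "White House", "London"]
for user, row in zip(["User 1", "User 2"], scores):
    print(user, dict(zip(items, row.tolist())))
```

Under this assumption User 1 scores London at 1 and User 2 scores Olympus at 2.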
16. Collaborative Filtering
Identify similarities in how users interact with items, and find
the most similar users or items for the target user. This similarity data
acts as the interaction dataset for the recommendation system.
17. Example : Collaborative Filtering
Movie 1 Movie 2 Movie 3 Movie 4 Movie 5
User 1 2 ? 4 3
User 2 2 2 3 4
User 3 1 2 3 2
User Item Interactions
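A small sketch of user-based collaborative filtering, assuming cosine similarity over co-rated movies and a similarity-weighted average for the prediction. The ratings matrix is illustrative (four movies, hypothetical values), not the exact table above.

```python
import numpy as np

# Hypothetical ratings (rows: users, columns: movies); np.nan marks "?".
ratings = np.array([
    [2.0, np.nan, 4.0, 3.0],
    [2.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 2.0],
])

def cosine_on_common(a, b):
    """Cosine similarity over the movies both users have rated."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    return float(a[mask] @ b[mask] / (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask])))

def predict(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other in range(len(ratings)):
        if other == user or np.isnan(ratings[other, item]):
            continue
        sim = cosine_on_common(ratings[user], ratings[other])
        num += sim * ratings[other, item]
        den += abs(sim)
    return num / den

predicted = predict(ratings, user=0, item=1)
print(round(predicted, 2))
```

Both neighbours rated the missing movie 2, so the weighted average is also 2 whatever the similarities are.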
18. Hybrid Recommendation Systems
A system that combines content-based filtering and
collaborative filtering can take advantage of both the
representation of the content and the similarities
among users.
19. Example : Hybrid Systems
           Comedy   Historical   Adventure   User 4
User 1     1        0.3          0.6         -
User 2     0.9      0.1          0.7         +
User 3     0.8      1            0.6         ?
20. Framework of Recommendation Systems
Interactions
User Features
User Representations
Item Features
Item Representations
Prediction
Learning
Evaluation
21. Evaluation Metrics
Precision: The fraction of the recommended items that are relevant.
Recall: The fraction of the relevant items that are recommended
by the recommender system.
AUC_Score: The probability that a randomly chosen positive example has a higher
score than a randomly chosen negative example. A perfect score is 1.0.
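The AUC definition can be checked directly by counting pairs. This is a minimal sketch; the function name and the scores are hypothetical, and ties are assumed to count as half a win.

```python
def auc_score(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical model scores for one user: 5 of the 6 pairs are ordered correctly.
print(auc_score([0.9, 0.6], [0.7, 0.2, 0.1]))
```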
22. Example : Precision & Recall Tradeoff
Recommended Items
Car
Mobile
Headset
Tshirt
Watch
Relevant Items
Car
Watch
Perfume
Precision: 2 / 5 = 0.4 (1.0 is the best score)
Recall: 2 / 3 ≈ 0.67 (1.0 is the best score)
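The same calculation as a short Python sketch, using the two item lists from this example:

```python
recommended = ["Car", "Mobile", "Headset", "Tshirt", "Watch"]
relevant = {"Car", "Watch", "Perfume"}

# Items that are both recommended and relevant.
hits = [item for item in recommended if item in relevant]

precision = len(hits) / len(recommended)   # share of recommendations that are relevant
recall = len(hits) / len(relevant)         # share of relevant items that were recommended
print(f"precision={precision:.2f} recall={recall:.2f}")
```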
24. LightFM
LightFM is a Python implementation of a number of popular
recommendation algorithms for both implicit and explicit feedback,
including efficient implementation of BPR and WARP ranking
losses. It's easy to use, fast (via multithreaded model estimation),
and produces high quality results.
25. FRAMEWORK OF LIGHTFM
Interactions           *
User Features          *
User Representation    Linear
Item Features          *
Item Representation    Linear
Prediction             Dot-Product
Learning               Logistic, WARP, BPR
26. Example :
User Profile Representation
         Aerospace   Medicine   Analytics   Transport
John     1           -1         1           1
Laura    -1          1          1           0
Tim      0           -1         1           1
Item Profile Representation
                        Aerospace   Medicine   Analytics   Transport
Flight Project          1           0          1           1
Drug Discovery          0           1          1           0
Automobile Incubation   0           0          1           1
27. After taking Dot Product (Making Prediction)
         Flight Project   Drug Discovery   Automobile Incubation
John     1                -1               0
Laura    -1               1                0
Tim      0                -1               1
1 = User liked it
-1 = User disliked it
0 = User has not interacted with it, so the recommendation level must be predicted
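A minimal NumPy sketch of the prediction step, assuming the raw score is the plain dot product of the user and item profile rows from slide 26. Raw scores are unbounded, so they need not equal the 1 / -1 / 0 values in the table above, which read as interaction labels.

```python
import numpy as np

# Feature order: Aerospace, Medicine, Analytics, Transport.
users = np.array([
    [1, -1, 1, 1],   # John
    [-1, 1, 1, 0],   # Laura
    [0, -1, 1, 1],   # Tim
])
items = np.array([
    [1, 0, 1, 1],    # Flight Project
    [0, 1, 1, 0],    # Drug Discovery
    [0, 0, 1, 1],    # Automobile Incubation
])

# raw_scores[u, i] is the dot product of user u's and item i's features.
raw_scores = users @ items.T
print(raw_scores)
```

John's raw score for Automobile Incubation is 2, the value of X used in the logistic example.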
28. Learn using Loss Functions
Four kinds of loss functions are used to optimize the recommendations
• Logistic
• Bayesian Personalized Ranking (BPR)
• Weighted Approximate-Rank Pairwise (WARP)
• k-OS WARP
29. Logistic : Used when both positive (+ve) and negative (-ve) interactions are present
Example: Suppose you need the probability that
Automobile Incubation should be recommended to John.
Find x: Calculate the dot product of the Automobile Incubation features and John's
features
i.e : x = 1*0 + (-1)*0 + 1*1 + 1*1
= 2
Logistic Function: f(x) = 1 / (1 + e^{-x}), so f(2) = 1 / (1 + e^{-2}) ≈ 0.88
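The arithmetic can be verified with a few lines of Python, using John's and Automobile Incubation's feature rows from slide 26:

```python
import math

john = [1, -1, 1, 1]        # Aerospace, Medicine, Analytics, Transport
automobile = [0, 0, 1, 1]   # Automobile Incubation

# Dot product of the two feature vectors.
x = sum(u * i for u, i in zip(john, automobile))

# Logistic (sigmoid) function maps the raw score to a probability.
probability = 1.0 / (1.0 + math.exp(-x))
print(x, round(probability, 2))
```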
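30. WARP ( Example )
Consider that we are recommending a project to Tim.
                   Flight Project   Drug Discovery   Automobile
Tim (predicted)    0.2              0.57             0.6
Actual Output      0                1                0
x1: Flight Project x2: Drug Discovery x3: Automobile
Pairwise(x1, x2) : x1 < x2, so the positive item x2 is ranked above x1 : No loss
Pairwise(x2, x3) : x3 > x2, so the negative item x3 is ranked above the positive item x2 : Loss
Loss(x2, x3) = ln((X - 1) / N) * (x3 - x2) = ln((3 - 1) / 2) * (0.6 - 0.57) = ln(1) * 0.03 = 0
Here X = 3 is the number of items and N = 2 is the number of negative items examined before the violation was found.
To optimize the loss, use stochastic gradient descent.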
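A simplified sketch of the WARP step from this example. It assumes negatives are examined in a fixed order and uses the loss form shown on the slide. Production implementations typically sample negatives at random and add a margin, so treat `warp_loss` as a hypothetical helper, not LightFM's code.

```python
import math

def warp_loss(scores, positive, negatives):
    """Simplified WARP step: scan negatives until one outranks the positive item."""
    n_items = len(scores)
    for trials, negative in enumerate(negatives, start=1):
        if scores[negative] > scores[positive]:        # ranking violation found
            rank_estimate = (n_items - 1) // trials    # floor((X - 1) / N)
            return math.log(rank_estimate) * (scores[negative] - scores[positive])
    return 0.0  # no negative outranks the positive item

scores = {"Flight Project": 0.2, "Drug Discovery": 0.57, "Automobile": 0.6}
loss = warp_loss(scores, "Drug Discovery", ["Flight Project", "Automobile"])
print(loss)
```

The violation is found on the second negative, so the rank estimate is 1 and the loss is zero, as in the worked example.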
32. Annoy
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with
Python bindings to search for points in space that are close to a given
query point. It also creates large read-only file-based data structures
that are mmapped into memory so that many processes may share the
same data.
Annoy provides only the prediction part of the framework.
33. Annoy Procedures :
• Collect the users and their item interactions
• Index the items and add them to the user vector
• Specify the number of trees to build (n_trees)
• Load the indexed vectors
• Using KNN (K Nearest Neighbors) and the distance
function, build the n_trees for the data
• Get the relevant items for a particular user by
searching its K nearest neighbors
• Evaluate the recommended items by precision
36. FAISS (Facebook Artificial Intelligence Similarity Search)
Faiss is a library for efficient similarity search and clustering of
dense vectors. It contains algorithms that search in sets of vectors
of any size, up to ones that possibly do not fit in RAM. It also
contains supporting code for evaluation and parameter tuning.
37. FAISS Procedures :
• Get the data frame with columns such as user_id, item_id, ratings
• Set the dimension value (if a user can interact with n items, then n_items is
the dimension)
• Set the number of users and create a NumPy vector with the rating values for each
user
• Set the number of queries and their values
• Create an index using the dimension value and the index function of Faiss
• Search the top-k items for the user (FAISS performs the KNN search and
finds the nearest items using Euclidean distance)
• Evaluate the recommended items with cross-validation and by calculating
their precision values
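The steps above can be sketched in plain NumPy as an exact (brute-force) Euclidean search, which is what a flat L2 index such as Faiss's `IndexFlatL2` computes. The triples and the `search_l2` helper are hypothetical, used only for illustration.

```python
import numpy as np

# Hypothetical (user_id, item_id, rating) triples.
triples = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 5), (3, 1, 4)]
n_users, n_items = 4, 3   # n_items is the vector dimension

# One dense float32 rating vector per user (0 = no interaction).
vectors = np.zeros((n_users, n_items), dtype=np.float32)
for user_id, item_id, rating in triples:
    vectors[user_id, item_id] = rating

def search_l2(database, queries, k):
    """Exact k-NN: squared Euclidean distance from each query to every database row."""
    diff = queries[:, None, :] - database[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    idx = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx

# Query with user 0's own vector: the nearest row is user 0, then user 3.
distances, neighbours = search_l2(vectors, vectors[:1], k=2)
print(neighbours[0], distances[0])
```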
38. Non-Metric Space Library (NMSLib)
Non-Metric Space Library (NMSLIB) is an efficient cross-platform
similarity search library and a toolkit for evaluation of similarity
search methods. The goal of the project is to create an effective
and comprehensive toolkit for searching in generic non-metric
spaces.
39. NMSLib Procedures :
• Form the interaction matrix
• Split the interaction matrix into data_matrix and query_matrix
• Specify the thread properties, such as index_thread_quantity
• Define the number of nearest neighbors (K) to calculate
• Index the data_matrix and specify the distance algorithm
• Search the nearest neighbors for the query matrix
• Calculate the nearest neighbors for the query matrix using the gold-standard (exact) method
• Calculate the recall
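The final recall step can be sketched as follows, assuming the approximate and gold-standard searches each return a list of K neighbour ids per query. The ids and the `recall_at_k` helper are hypothetical.

```python
def recall_at_k(approx_neighbours, exact_neighbours):
    """Average fraction of exact neighbours recovered per query."""
    total = 0.0
    for approx, exact in zip(approx_neighbours, exact_neighbours):
        total += len(set(approx) & set(exact)) / len(exact)
    return total / len(exact_neighbours)

# Hypothetical neighbour ids for two queries, K = 3.
exact = [[4, 7, 1], [2, 9, 5]]     # gold standard (brute force)
approx = [[4, 1, 8], [2, 9, 5]]    # returned by the approximate index
print(round(recall_at_k(approx, exact), 3))
```

The first query recovers 2 of 3 true neighbours and the second recovers all 3, giving an average recall of about 0.833.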
40. Alternating Least Squares (ALS) using Implicit Library
Implicit is a Python collaborative filter toolkit that uses matrix
factorization to learn representations. Includes factorization
classes for ALS and BPR.
42. ALS Procedures :
• Collect the interaction matrix and the item and user latent factors.
• In ALS, we find the matrix factorization of the interaction matrix using SVD, i.e. R =
U Σ P^T
• After predicting the values, optimize U Σ P^T by using ALS or Stochastic
Gradient Descent
Σ corresponds to the latent factor weights
Cost Function :
• Matching Solution :
• Now get the predicted interaction matrix.
• Evaluate the model by comparing the predicted and the original interaction matrix
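A compact NumPy sketch of the alternating updates, assuming the common regularized squared-error cost ||R - U V^T||^2 + reg * (||U||^2 + ||V||^2). Every entry of R is treated as observed, and the latent weights are absorbed into U and V. The `als` function, its parameters, and the ratings are illustrative, not the Implicit library's implementation.

```python
import numpy as np

def als(R, n_factors=2, reg=0.1, n_iters=20, seed=0):
    """Factorize R ~= U @ V.T by alternating ridge-regression solves."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, n_factors))
    V = rng.normal(scale=0.1, size=(n_items, n_factors))
    eye = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # Fix V and solve for U, then fix U and solve for V.
        U = np.linalg.solve(V.T @ V + eye, V.T @ R.T).T
        V = np.linalg.solve(U.T @ U + eye, U.T @ R).T
    return U, V

# Hypothetical interaction matrix (rows: users, columns: items).
R = np.array([[2.0, 2.0, 4.0, 3.0],
              [2.0, 2.0, 3.0, 4.0],
              [1.0, 2.0, 3.0, 2.0]])
U, V = als(R)
predicted = U @ V.T                       # predicted interaction matrix
print(np.round(predicted, 2))
print("reconstruction error:", np.linalg.norm(R - predicted))
```

Each half-step has a closed-form solution because the cost is quadratic in U when V is fixed, and in V when U is fixed.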
43. CHALLENGES
1. Cold Start Problem
2. Observing User Interactions
3. Complex Algorithms
4. Unpredictability
5. Long Tail Concern
6. Data Availability
44. FACTS YOU MAY LIKE TO KNOW
1. 35% of the purchases made on Amazon come from recommendation systems.
2. 70% of the videos that users watch on YouTube are recommended by Google's automated
recommendation engine.
3. 75% of what people watch on Netflix is recommended by the
recommendation system.
4. Deploying a recommendation engine saves up to $1 billion for e-commerce giants,
says the Netflix Market Study Report.
45. THINGS YOU CAN TRY
DATASETS
Movies : https://www.kaggle.com/sengzhaotoo/movielens-small
Jokes : http://eigentaste.berkeley.edu/dataset/
Wikipedia: https://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia
Music : http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/index.html
Product: http://jmcauley.ucsd.edu/data/amazon/links.html
KERNELS
1. https://www.kaggle.com/robinreni/recommendation-benchmarking-iia
2. https://www.kaggle.com/axelderomblay/udacity-workshop-on-recommendation-systems
3. https://www.kaggle.com/toorkp/wsdm-recommendations
4. https://www.kaggle.com/tanetboss/user-clustering-for-anime-recommendation
5. https://www.kaggle.com/abhinav97dutt/book-recommendation-collaborative-filteringt/