RECOMMENDATION SYSTEM
By Anamta Sayyed
OUTLINE
 Introduction to Recommendations
 Model for Recommendations
 Types of Recommendations
 Collaborative Filtering with Examples
 Content-Based Filtering with Examples
 R Code for Collaborative Filtering (Item-Based)
 Hybrid Recommendation
 Recommendation Technologies
WHAT IS A RECOMMENDATION SYSTEM?
 A data-filtering tool that uses algorithms & data to
recommend items
 It works automatically, analyzing a user's needs &
identifying which items a user might buy along with
other items
RECOMMENDATIONS
 Software tools and techniques providing
suggestions for items/products/people etc.
 The suggestions support various decision-making
processes.
 Aimed at individuals who lack sufficient personal
experience or data to make a decision.
 It's an information filtering system that predicts the
rating or preference a user would give to an item.
Ex. What item to buy. What music to listen to. What
online news to read.
EXAMPLES
WANT SOME EVIDENCE?
 Netflix:
2/3 of rented movies come from recommendations
 Google News:
38% more clicks are due to recommendations
 Amazon:
35% of sales come from recommendations
RECOMMENDATION PROCESS
TYPES
Recommendation System
 Content-Based Approach
 Collaborative Filtering (CF)
 Hybrid Model (Content-Based + CF)
TYPES..
COLLABORATIVE FILTERING
• It recommends items based on similarity measures between users &
items.
COLLABORATIVE FILTERING TYPES
COLLABORATIVE FILTERING
MEASURING SIMILARITIES
 Jaccard Distance
dj(I1,I2) = (│A∪B│ - │A∩B│) / │A∪B│
 Cosine Distance
cos(A,B) = ∑(A*B) / (√∑(A*A) * √∑(B*B))
 Rounding the Data
e.g. ratings from 1-5:
3/4/5 → 1
1/2 → Null
 Many More...
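The two measures above can be checked on a pair of hypothetical 0/1 "listened" vectors (the vectors here are made up for illustration):

```r
a <- c(1, 1, 0, 1, 0)
b <- c(1, 0, 0, 1, 1)

# Jaccard distance: (|A U B| - |A n B|) / |A U B|
jaccard_dist <- function(x, y) {
  union_size     <- sum(x | y)
  intersect_size <- sum(x & y)
  (union_size - intersect_size) / union_size
}

# Cosine similarity: sum(A*B) / (sqrt(sum(A*A)) * sqrt(sum(B*B)))
cosine_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x * x)) * sqrt(sum(y * y)))
}

jaccard_dist(a, b)  # 0.5 (union = 4, intersection = 2)
cosine_sim(a, b)    # 2/3, since sum(a*b) = 2 and both norms are sqrt(3)
```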
R CODE FOR ITEM-BASED COLLABORATIVE
FILTERING
The data set contains information about users and
which songs they have listened to on Last.FM.
# Read data from the Last.FM frequency matrix
data <- read.csv(file.choose())
head(data[,c(1,3:8)])
# Drop the user column
data.content <- (data[,!(names(data) %in% c("user"))])
# Create a helper function to calculate the cosine
# between two vectors
getCosine <- function(x,y)
{
  this.cosine <- sum(x*y) / (sqrt(sum(x*x)) * sqrt(sum(y*y)))
  return(this.cosine)
}
# Create a placeholder to store the results of our cosine
# similarities (a data frame listing item vs. item)
data.content.similarity <- matrix(NA,
  nrow=ncol(data.content), ncol=ncol(data.content),
  dimnames=list(colnames(data.content), colnames(data.content)))
head(data.content.similarity)
We will now fill in the similarities between items.
Let's fill in those empty spaces with cosine similarities.
# Loop through the columns
for(i in 1:ncol(data.content)) {
# Loop through the columns for each column
for(j in 1:ncol(data.content)) {
# Fill in placeholder with cosine similarities
data.content.similarity[i,j] <-
getCosine(as.matrix(data.content[i]),as.matrix(data.content[j]))
}
}
data.content.similarity <- as.data.frame(data.content.similarity)
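For larger matrices the double loop above gets slow. As a sketch (not from the slides), the same item-item cosine matrix can be built in a few lines of matrix algebra; the toy user-by-item matrix here is made up:

```r
m <- matrix(c(1, 0, 1,
              1, 1, 0,
              0, 1, 1), nrow = 3, byrow = TRUE)  # toy user x item matrix
colnames(m) <- c("song1", "song2", "song3")

cross <- t(m) %*% m                  # dot products between item columns
norms <- sqrt(diag(cross))           # length of each item vector
sim   <- cross / (norms %o% norms)   # cosine similarity, item vs. item
sim["song1", "song2"]                # the same value getCosine() would return
```

Each item's similarity with itself is 1, and the matrix is symmetric, just as in the looped version.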
# Get the top 10 neighbours for each item and convert the data frame to matrix form
data.content.neighbours <- matrix(NA,
  nrow=ncol(data.content.similarity), ncol=11,
  dimnames=list(colnames(data.content.similarity)))
# Then we need to find the neighbours. This is another loop, but it runs much
# faster
for(i in 1:ncol(data.content))
{
  data.content.neighbours[i,] <-
    (t(head(n=11, rownames(data.content.similarity[order
    (data.content.similarity[,i],decreasing=TRUE),][i]))))
}
We use t() to rotate the result, since the neighbour matrix is shaped
differently from the similarity matrix.
head(data.content.neighbours)
This means that for those listening to ABBA we would recommend
Madonna and Robbie Williams.
Likewise, for people listening to AC/DC we would recommend
the Red Hot Chili Peppers and Metallica.
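The neighbour table ranks similar items; to score items for one particular user, a common next step is a similarity-weighted sum over what that user already listened to. A minimal sketch, with a made-up similarity matrix and listening history:

```r
sim <- matrix(c(1.0, 0.5, 0.0,
                0.5, 1.0, 0.5,
                0.0, 0.5, 1.0), nrow = 3,
              dimnames = list(c("abba", "madonna", "metallica"),
                              c("abba", "madonna", "metallica")))
history <- c(abba = 1, madonna = 0, metallica = 0)  # user listened to ABBA only

scores <- as.vector(history %*% sim)          # weight each item by its similarity
names(scores) <- colnames(sim)
sort(scores[history == 0], decreasing = TRUE) # rank the items not yet heard
```

Here Madonna outranks Metallica because she is more similar to the user's one listened item.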
CONTENT BASED
• It is based on the properties of the items recommended.
• E.g. if a user reads big data articles, then articles
related to big data will be recommended to that user.
• It is based on the topic or features of the product.
What can we do with these?
 Query items that are similar to these items
 Match an item's content against a user's profile
Measuring Similarity
• Cosine, TF-IDF as in standard information retrieval.
• Euclidean distance; dimensionality reduction if you want.
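A minimal sketch of content-based matching: two made-up article "documents" and a user profile, compared by cosine over term-frequency vectors (IDF weighting is omitted here for brevity):

```r
docs <- c(article1 = "big data hadoop data",
          article2 = "music festival rock")
profile <- "big data analytics"

terms <- unique(unlist(strsplit(c(docs, profile), " ")))
tf <- function(txt) {
  words <- strsplit(txt, " ")[[1]]
  sapply(terms, function(t) sum(words == t))   # raw term counts
}
m <- rbind(tf(docs[1]), tf(docs[2]), tf(profile))

cosine <- function(x, y) sum(x * y) / (sqrt(sum(x * x)) * sqrt(sum(y * y)))
cosine(m[3, ], m[1, ])  # profile vs. article1 -- high (shares "big", "data")
cosine(m[3, ], m[2, ])  # profile vs. article2 -- 0, no shared terms
```

So the big data article would be recommended to this user, matching the example above.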
DIFFERENCE
HYBRID
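One simple hybrid is a weighted blend of the two approaches: combine a collaborative score and a content score for the same candidate items. The items and weights below are illustrative, not from the slides:

```r
cf_score      <- c(itemA = 0.9, itemB = 0.2, itemC = 0.5)  # from collaborative filtering
content_score <- c(itemA = 0.1, itemB = 0.8, itemC = 0.6)  # from content matching
w <- 0.7                                                   # trust CF a bit more

hybrid <- w * cf_score + (1 - w) * content_score
sort(hybrid, decreasing = TRUE)                            # itemA ranks first
```

The weight w would normally be tuned on held-out data rather than chosen by hand.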
EXAMPLE
SOURCE OF INFORMATION
 Explicit ratings on a numeric/5-star/3-star etc. scale
 Explicit binary ratings (thumbs up/thumbs down)
 Implicit information, e.g.,
– who bookmarked/linked to the item?
– how many times was it viewed?
– how many units were sold?
– how long did users read the page?
 Item descriptions/features
 User profiles/preferences
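Implicit signals like the ones listed usually need converting before the CF code above can consume them. A sketch (an assumption, not from the slides) turning play counts into a 0/1 "liked" signal:

```r
plays <- c(song1 = 27, song2 = 1, song3 = 0, song4 = 5)
liked <- as.integer(plays >= 2)   # threshold is arbitrary; tune per data set
names(liked) <- names(plays)
liked                             # song1 and song4 count as liked
```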
RECOMMENDATION TECHNOLOGIES
THANK YOU!
