2. OUTLINE
Introduction to Recommendations
Model for Recommendations
Recommendation Types
Collaborative Filtering with Examples
Content-based Filtering with Examples
R code for Collaborative Filtering (item-based)
Hybrid Recommendation
Recommendation Technologies
3. WHAT IS A RECOMMENDATION SYSTEM?
Data-filtering tools that use algorithms and data to recommend items.
They work automatically: they analyze the needs of users and identify which items a user could buy along with other items.
4. RECOMMENDATIONS
Software tools and techniques that provide suggestions for items, products, people, etc.
The suggestions relate to various decision-making processes.
Aimed at individuals who lack sufficient personal experience or data to make a decision.
An information filtering system that predicts the rating or preference a user would give to an item.
Ex.: what item to buy, what music to listen to, what online news to read.
7. WANT SOME EVIDENCE?
Netflix:
2/3 of rented movies come from recommendations
Google News:
38% more clicks are due to recommendations
Amazon:
35% of sales come from recommendations
14. MEASURING SIMILARITIES
Jaccard Distance
$$d_J(I_1, I_2) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$
Cosine Distance
$$\cos = \frac{\sum (A \cdot B)}{\sqrt{\sum (A \cdot A)} \cdot \sqrt{\sum (B \cdot B)}}$$
Rounding the Data
e.g. ratings from 1-5:
3/4/5 → 1
1/2 → Null
Many more...
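The three measures above can be sketched in a few lines of base R. This is a minimal illustration, not code from the original deck: the function names `jaccard_distance`, `cosine_similarity` and `round_ratings` and the vectors `item1` and `item2` are hypothetical, and the sketch assumes that a non-zero, non-missing entry means "this user interacted with the item".

```r
# Jaccard distance between two item vectors (one entry per user).
# Assumption: any non-zero, non-missing entry counts as set membership.
jaccard_distance <- function(a, b) {
  in_a <- !is.na(a) & a != 0
  in_b <- !is.na(b) & b != 0
  union_size <- sum(in_a | in_b)
  if (union_size == 0) return(NA_real_)  # undefined when both sets are empty
  (union_size - sum(in_a & in_b)) / union_size
}

# Cosine of the angle between two vectors, as in the formula above.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a * a)) * sqrt(sum(b * b)))
}

# Rounding 1-5 ratings: 3, 4 and 5 become 1; 1 and 2 become missing (NA).
round_ratings <- function(r) {
  ifelse(r >= 3, 1, NA)
}

item1 <- c(1, 0, 1, 1, 0)
item2 <- c(1, 1, 0, 1, 0)
jaccard_distance(item1, item2)   # union 4, intersection 2, so (4 - 2) / 4 = 0.5
cosine_similarity(item1, item2)  # 2 / (sqrt(3) * sqrt(3)) = 2/3
round_ratings(c(5, 2, 3, 1, 4))  # 1 NA 1 NA 1
```

Note that the cosine formula yields a similarity (1 for identical directions); if a distance is needed, one common convention is to use 1 minus this value.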
15. R CODE FOR ITEM BASED COLLABORATIVE FILTERING
The data set contains information about users and which songs they have listened to on Last.FM.
# Read data from the Last.FM frequency matrix
data <- read.csv(file.choose())
head(data[, c(1, 3:8)])
# Drop the "user" column
data.content <- data[, !(names(data) %in% c("user"))]
16. # Create a helper function to calculate the cosine between two vectors
getCosine <- function(x, y)
{
  this.cosine <- sum(x*y) / (sqrt(sum(x*x)) * sqrt(sum(y*y)))
  return(this.cosine)
}
# Create a placeholder to store the results of our cosine similarities (matrix listing item vs. item)
data.content.similarity <- matrix(NA, nrow=ncol(data.content), ncol=ncol(data.content),
  dimnames=list(colnames(data.content), colnames(data.content)))
17. head(data.content.similarity)
# The placeholder will now hold the similarities between items.
# Let's fill in those empty spaces with cosine similarities.
# Loop through the columns
for(i in 1:ncol(data.content)) {
  # Loop through the columns for each column
  for(j in 1:ncol(data.content)) {
    # Fill in placeholder with cosine similarities
    data.content.similarity[i,j] <-
      getCosine(as.matrix(data.content[i]), as.matrix(data.content[j]))
  }
}
# Convert the filled matrix to a data frame
data.content.similarity <- as.data.frame(data.content.similarity)
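The double loop above computes one cosine per pair of columns. As a hedged alternative, the same item-by-item matrix can be obtained with matrix algebra. The sketch below runs on a hypothetical toy listening matrix (`toy`, with made-up values) rather than the Last.FM file, and it assumes no column is all zeros, since a zero column would produce NaN.

```r
# Hypothetical toy listening matrix: rows are users, columns are artists.
toy <- data.frame(
  abba      = c(1, 1, 0, 0),
  madonna   = c(1, 1, 0, 1),
  metallica = c(0, 0, 1, 1)
)

m <- as.matrix(toy)
dot_products <- crossprod(m)        # t(m) %*% m: dot product of every column pair
norms <- sqrt(diag(dot_products))   # Euclidean norm of each column
toy.similarity <- dot_products / outer(norms, norms)
round(toy.similarity, 3)
```

Each entry of `toy.similarity` is the dot product of two columns divided by the product of their norms, which is exactly what `getCosine` returns for that pair, so the diagonal is 1 and artists with no listeners in common score 0.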
18. # Get the top 10 neighbours for each item; the placeholder is a matrix.
# ncol=11 because each item normally ranks first in its own list (similarity 1).
data.content.neighbours <- matrix(NA, nrow=ncol(data.content.similarity), ncol=11,
  dimnames=list(colnames(data.content.similarity)))
# Then we need to find the neighbours. This is another loop, but it runs much faster.
for(i in 1:ncol(data.content))
{
  data.content.neighbours[i,] <-
    (t(head(n=11, rownames(data.content.similarity[order(data.content.similarity[,i], decreasing=TRUE),][i]))))
}
# We use t() to rotate the result, since the neighbours matrix is shaped differently.
19. head(data.content.neighbours)
This means that for those listening to Abba we would recommend Madonna and Robbie Williams.
Likewise, for people listening to ACDC we would recommend the Red Hot Chili Peppers and Metallica.
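To make the reading of the neighbours table concrete, here is a small self-contained sketch. The similarity values and the names `sim`, `artists`, `top_n` and `neighbours` are made up for illustration and do not come from the Last.FM data. It ranks artists by decreasing similarity and drops the artist itself, which otherwise takes the first place.

```r
# Hypothetical 4 x 4 similarity matrix standing in for data.content.similarity.
artists <- c("abba", "madonna", "acdc", "metallica")
sim <- matrix(
  c(1.0, 0.8, 0.1, 0.2,
    0.8, 1.0, 0.3, 0.1,
    0.1, 0.3, 1.0, 0.9,
    0.2, 0.1, 0.9, 1.0),
  nrow = 4, byrow = TRUE, dimnames = list(artists, artists)
)

# For each artist, rank all artists by decreasing similarity,
# remove the artist itself, and keep the top_n remaining names.
top_n <- 2
neighbours <- t(sapply(artists, function(a) {
  ranked <- names(sort(sim[, a], decreasing = TRUE))
  ranked[ranked != a][seq_len(top_n)]
}))

neighbours["abba", ]  # "madonna" "metallica"
neighbours["acdc", ]  # "metallica" "madonna"
```

Each row of `neighbours` is then the recommendation list for listeners of that artist, most similar first.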
20. CONTENT BASED
•It is based on the properties of the items being recommended.
•E.g. if a user reads articles about big data, other articles related to big data will be recommended to that user.
•It is based on the topic or features of the product.
21. What can we do with these?
Query: items that are similar to these items
Match: an item's content against the user's profile
Measuring Similarity
• Cosine, TF-IDF, as in standard information retrieval.
• Euclidean distance; dimensionality reduction if you want.
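The matching step can be sketched with TF-IDF weights and cosine similarity in base R. This is an illustration under stated assumptions, not the deck's method: the article texts, the names `docs`, `tfidf`, `profile` and `scores`, and the choice to use the single article the user has read as the user profile are all hypothetical.

```r
# Hypothetical article descriptions, made up for illustration.
docs <- c(
  a1 = "big data spark cluster",
  a2 = "big data hadoop storage",
  a3 = "football match result"
)

tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per article, one column per term.
tf <- t(vapply(tokens,
               function(tk) as.numeric(table(factor(tk, levels = vocab))),
               numeric(length(vocab))))
colnames(tf) <- vocab

# Inverse document frequency: log(N / number of articles containing the term).
idf <- log(nrow(tf) / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, `*`)

cosine <- function(x, y) sum(x * y) / (sqrt(sum(x * x)) * sqrt(sum(y * y)))

# Assumed user profile: the TF-IDF vector of the one article the user read.
profile <- tfidf["a1", ]
scores <- sapply(c("a2", "a3"), function(d) cosine(profile, tfidf[d, ]))
names(which.max(scores))  # "a2"
```

The other big data article shares the terms "big" and "data" with the profile, so it scores above the football article, which shares none and scores 0.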
25. SOURCES OF INFORMATION
• Explicit ratings on a numeric scale (5-star, 3-star, etc.)
• Explicit binary ratings (thumbs up/thumbs down)
• Implicit information, e.g.,
– who bookmarked/linked to the item?
– how many times was it viewed?
– how many units were sold?
– how long did users read the page?
• Item descriptions/features
• User profiles/preferences