collaborative filtering

Combining Content-based and Collaborative Filtering
Gabriela Polčicová
Pavol Návrat
Department of Computer Science and Engineering, Slovak University of Technology
polcicova@dcs.elf.stuba.sk
navrat@elf.stuba.sk

Overview
• Information Filtering and its Types
• Combined Method
• Experiment with Information Filtering Methods
• Conclusions

Information Filtering (1)
• Definition: delivery of relevant information to the people who need it
• Types of Information Filtering
– Content-based: for textual documents
– Collaborative: for communities of users
• Interests
– information about interests is stored in profiles
– users express opinions about documents as ratings
• Ratings {i, j, $r_{ij}$}
– $r_{ij}$ is the value of the rating given by user i to item j
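A minimal sketch, assuming Python and an in-memory dictionary layout, of turning rating triples {i, j, $r_{ij}$} into per-user profiles. The names `ratings` and `to_profiles` and the sample values are hypothetical.

```python
from collections import defaultdict

# Rating triples (user i, item j, value r_ij); the sample values are made up.
ratings = [
    ("u1", "m1", 5),
    ("u1", "m2", 3),
    ("u2", "m1", 4),
]

def to_profiles(triples):
    """Group rating triples into per-user dictionaries {item: rating}."""
    profiles = defaultdict(dict)
    for user, item, value in triples:
        profiles[user][item] = value
    return dict(profiles)

profiles = to_profiles(ratings)
```

Later steps (similarity, estimates) can then look up a user's rating of an item as `profiles[user][item]`.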

Information Filtering (2)
• Filter: learning interests → estimating the value of a rating → choosing recommendations
• Input: rated items {user, item, value}; unrated items {user, item}
• Output: recommendations {user, item, estimation}
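The three filter stages can be sketched as a small pipeline. This is only an illustration of the data flow: the "learned interest" is each user's mean rating (a placeholder standing in for CBF or CF, not either method), and the threshold used when choosing recommendations is an assumed parameter. All names are hypothetical.

```python
from statistics import mean

def learn_interests(rated):
    """Rated items (user, item, value) -> placeholder model: each user's mean rating."""
    by_user = {}
    for user, _item, value in rated:
        by_user.setdefault(user, []).append(value)
    return {user: mean(values) for user, values in by_user.items()}

def estimate(model, unrated, default):
    """Unrated items (user, item) -> (user, item, estimation) triples."""
    return [(user, item, model.get(user, default)) for user, item in unrated]

def choose(estimates, threshold):
    """Keep estimates at or above the (assumed) threshold as recommendations."""
    return [(user, item, est) for user, item, est in estimates if est >= threshold]

rated = [("u1", "m1", 5), ("u1", "m2", 4), ("u2", "m1", 2)]
unrated = [("u1", "m3"), ("u2", "m3")]
model = learn_interests(rated)
global_mean = mean(value for _, _, value in rated)  # fallback for unknown users
recommendations = choose(estimate(model, unrated, global_mean), threshold=4)
```

Swapping `learn_interests` and `estimate` for a content-based or collaborative estimator leaves the input and output formats of the filter unchanged.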

Content-based Filtering (1)
• Basic idea
– recommending documents based on the content and properties of the documents
• Profile
– consists of keywords with assigned weights
– only documents matching the profile are recommended
• Recommendations
– based on objectively measurable properties

Content-based Filtering (2)
• Documents rated by the user (documents, ratings) → documents of interest → PROFILE (keywords and phrases with weights)
• Documents unrated by the user → documents matching the profile => recommended documents

Collaborative Filtering (1)
• Basic idea
– automating “word of mouth”
– leveraging the opinions of like-minded users when making decisions
• Schema
– collecting users’ opinions
– searching for like-minded users
– making recommendations

Collaborative Filtering (2)
• Profile of the current user is compared with the profiles of users 1 to 5
• Documents from like-minded users' profiles => recommended documents

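• Similarity measure: Pearson Correlation Coefficient
$$k_{ci} = \frac{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)^2 \, \sum_{j \in I_{ci}} (r_{ij} - \bar{r}_i)^2}}$$
• Recommendations computation: weighted sum of ratings
$$\hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci}}{\sum_{i \in U_{cj}} |k_{ci}|}$$
– c: current user; $\bar{r}_c$, $\bar{r}_i$: mean ratings of users c and i; $I_{ci}$: items rated by both c and i; $U_{cj}$: users who rated item j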
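A minimal Python sketch of the two formulas above. Assumptions not stated on the slide: each user's mean $\bar{r}$ is taken over all of that user's ratings, a coefficient is 0 when there are no co-rated items or zero variance, and the function names and sample profiles are hypothetical.

```python
from math import sqrt

def mean_rating(profile):
    """Mean of all ratings in a {item: rating} profile (assumed definition of r-bar)."""
    return sum(profile.values()) / len(profile)

def pearson(current, other):
    """k_ci computed over the co-rated items I_ci."""
    common = current.keys() & other.keys()
    if not common:
        return 0.0
    rc, ri = mean_rating(current), mean_rating(other)
    num = sum((current[j] - rc) * (other[j] - ri) for j in common)
    den = sqrt(sum((current[j] - rc) ** 2 for j in common)
               * sum((other[j] - ri) ** 2 for j in common))
    return num / den if den else 0.0

def predict(current, others, item):
    """Weighted-sum estimate of the current user's rating of item, or None."""
    num = den = 0.0
    for other in others:
        if item not in other:  # only users who rated the item belong to U_cj
            continue
        k = pearson(current, other)
        num += (other[item] - mean_rating(other)) * k
        den += abs(k)
    return mean_rating(current) + num / den if den else None

current = {"A": 1, "B": 3, "C": 5}
u1 = {"A": 1, "B": 3, "C": 5, "D": 7}  # agrees with the current user
u2 = {"A": 5, "B": 3, "C": 1, "D": 1}  # disagrees with the current user
```

A negatively correlated neighbour still contributes: its below-average rating of D, multiplied by a negative $k_{ci}$, pushes the estimate up.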

Combined Method
• For each user, missing ratings are estimated by the content-based filtering method
• Searching for like-minded users
– computing coefficient $k_{ci}$ between the current and the i-th user (from ratings only)
– computing coefficient $k'_{ci}$ between the current and the i-th user (from both ratings and estimates)
• New recommendations computation
– a weighted sum of ratings (weighted by $k_{ci}$) and of ratings together with estimates (weighted by $k'_{ci}$)
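A sketch of the first two steps of the combined method, assuming Python dictionaries for profiles. The CBF estimates are hypothetical literal values standing in for the content-based filter's output, and `augment` is an illustrative helper, not the authors' code. As in plain CF, user means are assumed to be taken over each (augmented) profile.

```python
from math import sqrt

def pearson(a, b):
    """Pearson coefficient over co-rated items; means over each full profile (assumption)."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    ma = sum(a.values()) / len(a)
    mb = sum(b.values()) / len(b)
    num = sum((a[j] - ma) * (b[j] - mb) for j in common)
    den = sqrt(sum((a[j] - ma) ** 2 for j in common)
               * sum((b[j] - mb) ** 2 for j in common))
    return num / den if den else 0.0

def augment(ratings, cbf_estimates):
    """Fill a user's missing ratings with CBF estimates; real ratings take precedence."""
    return {**cbf_estimates, **ratings}

current = {"A": 1, "B": 3}
other = {"B": 3, "C": 5}
# Hypothetical CBF estimates for the items each user has not rated.
current_est = {"C": 5}
other_est = {"A": 1}

k = pearson(current, other)  # k_ci: ratings only
k_prime = pearson(augment(current, current_est), augment(other, other_est))  # k'_ci
```

In this toy case the two users share only item B, so $k_{ci}$ rests on a single item and comes out as -1; on the augmented profiles they agree on A, B and C and $k'_{ci}$ is 1.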

Datasets for Experiments
• Data:
– EachMovie: users’ ratings for movies
www.research.digital.com/SRC/eachmovie/
– IMDB: textual information for CBF (movie descriptions)
www.imdb.com/
• Datasets:
– A: ratings from the period up to Mar 1, 1996
(810 ratings from 71 users)
– B: ratings from the period up to Mar 15, 1996
– C: ratings from the period up to Apr 1, 1996

EachMovie Data and Constant Method
[Figure: Percentage of ratings in EachMovie, share of each rating value (1 to 6) in datasets A, B and C]
• Constant Method: $\hat{r}_{cj} = 5$ for every user and item

Experiments with Combination of Content-based and Collaborative Filtering (2)
• Dataset → divide the dataset into a training set (90%) and a test set (10%)
• Apply the filtering methods to the training and test sets: Content-based Filtering method, Collaborative Filtering method, Combined Filtering method, Constant method → recommendations
• Evaluate the methods’ performance by comparing each method's recommendations with the test set
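A sketch of the 90% / 10% split, assuming a uniformly random split over rating triples with a fixed seed; the slide does not say how ratings were assigned to the two sets. `split_ratings` and the generated sample ratings are hypothetical.

```python
import random

def split_ratings(triples, test_fraction=0.1, seed=0):
    """Randomly split rating triples into (training set, test set)."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# 100 synthetic (user, item, rating) triples with ratings on a 1..6 scale.
ratings = [(f"u{u}", f"m{m}", (u + m) % 6 + 1) for u in range(10) for m in range(10)]
train, test = split_ratings(ratings)
```

Each method is then given the same `train` set and evaluated on the same `test` set, so their results stay comparable.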

Metrics
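• Coverage = percentage of items for which the method is able to compute estimates
• Accuracy = $\dfrac{|R \cap L| + |\bar{R} \cap \bar{L}|}{|L| + |\bar{L}|}$
• Precision = $\dfrac{|R \cap L|}{|R|}$, Recall = $\dfrac{|R \cap L|}{|L|}$
• F-measure = $\dfrac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
• NMAE = $\dfrac{\sum |r_{ij} - \hat{r}_{ij}|}{n \cdot s}$
– R: set of recommended items; L: set of liked items; $\bar{R}$, $\bar{L}$: their complements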
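The metrics can be sketched as follows, with R, L and the evaluated items as Python sets. Two assumptions: the complements in Accuracy are taken with respect to the set of evaluated items, and the NMAE constant s is taken to be the width of the rating scale (5 for ratings 1 to 6); the slide defines neither. Function names are hypothetical.

```python
def coverage(n_estimated, n_requested):
    """Fraction of requested items for which the method produced an estimate."""
    return n_estimated / n_requested if n_requested else 0.0

def precision(R, L):
    return len(R & L) / len(R) if R else 0.0

def recall(R, L):
    return len(R & L) / len(L) if L else 0.0

def f_measure(R, L):
    p, r = precision(R, L), recall(R, L)
    return 2 * p * r / (p + r) if p + r else 0.0

def accuracy(R, L, items):
    """Liked items recommended plus non-liked items not recommended, over all items."""
    not_R, not_L = items - R, items - L
    # |L| + |not L| equals |items| when L is a subset of items
    return (len(R & L) + len(not_R & not_L)) / len(items)

def nmae(pairs, s):
    """pairs: (true rating, estimate); s: assumed width of the rating scale."""
    return sum(abs(r - e) for r, e in pairs) / (len(pairs) * s)

items = set(range(1, 11))
R = {1, 2, 3, 4}     # recommended
L = {2, 3, 4, 5, 6}  # liked
```

Here precision is 3/4, recall 3/5, and accuracy (3 + 4) / 10, since items 7 to 10 are correctly neither recommended nor liked.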

Results of Experiments
[Figure: Coverage, Accuracy and F-measure of the CF, CBF, combined and constant methods on datasets A, B and C]

Conclusions
• Combining content-based and collaborative filtering might help in the initial phase, when few ratings are available
Future work
• Weighting of the coefficients
• Comparing the method with other methods

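Content-based Filtering - Vector Representation of Documents and Profiles
• Dictionary: D = ( … , computer, … , learning, … , machine, … )
• Document_j is represented by a vector of TF-IDF weights over D, e.g.
$W_j = (0, \ldots, 0, 0.5, 0, \ldots, 0, 0.3, 0, \ldots, 0, 0.2, 0, \ldots, 0)$
where the nonzero entries are the TF-IDF weights of the terms computer, learning and machine
• Profile of user i: $profile_i = \sum_{j=1}^{n} r_{ij} \cdot W_j$
• Similarity: $Sim(W, Profile) = \dfrac{W \cdot Profile}{|W| \cdot |Profile|}$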
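A sketch of the vector representation, profile and similarity in Python. The TF-IDF variant (term frequency normalised by document length, idf = ln(N / df)) is one common choice and an assumption, since the slide does not specify it; the toy documents and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """docs: {doc_id: list of tokens} -> {doc_id: {term: TF-IDF weight}}."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        vectors[doc_id] = {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
    return vectors

def build_profile(user_ratings, vectors):
    """profile_i = sum over rated documents j of r_ij * W_j."""
    profile = defaultdict(float)
    for doc_id, rating in user_ratings.items():
        for term, weight in vectors[doc_id].items():
            profile[term] += rating * weight
    return dict(profile)

def cosine(a, b):
    """Sim(W, Profile): normalised dot product of two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = {
    "d1": ["computer", "learning", "machine"],
    "d2": ["cooking", "recipe"],
    "d3": ["machine", "learning", "data"],
}
vectors = tfidf_vectors(docs)
profile = build_profile({"d1": 5}, vectors)  # the user rated d1 with 5
```

An unrated document sharing terms with the profile (d3) gets a positive similarity, while one with no shared terms (d2) scores 0.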

Collaborative Filtering - Example
A B C D E F G
current 1 4 5
1 3 5 1 2
2 1 3 2 5
3 5 1 4 5
4 1 4 2 4
5 2 4 2 5
2

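• Similarity measure: Pearson Correlation Coefficient ($k'_{ci}$ is computed in the same way from ratings and CBF estimates)
$$k_{ci} = \frac{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)^2 \, \sum_{j \in I_{ci}} (r_{ij} - \bar{r}_i)^2}}$$
• Recommendations computation: weighted sum of ratings and estimates
$$\hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci} + \sum_{i \in U'_{cj}} (r^{CBF}_{ij} - \bar{r}^{CBF}_i)\, k'_{ci}}{\sum_{i \in U_{cj}} |k_{ci}| + \sum_{i \in U'_{cj}} |k'_{ci}|}$$
– $r^{CBF}_{ij}$, $\bar{r}^{CBF}_i$: values taken from user i's ratings together with CBF estimates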
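A direct transcription of the combined estimate, assuming the coefficients and user means have already been computed (for example with the Pearson coefficient used for plain CF). Each term is passed as a `(rating, user_mean, coefficient)` tuple; this input layout and the function name are hypothetical.

```python
from itertools import chain

def combined_estimate(mean_c, rating_terms, estimate_terms):
    """
    mean_c:         mean rating of the current user c
    rating_terms:   (r_ij, mean_i, k_ci) for users i in U_cj, from ratings only
    estimate_terms: (r_ij^CBF, mean_i^CBF, k'_ci) for users i in U'_cj,
                    from ratings together with CBF estimates
    Returns the combined estimate, or None if all coefficients are zero.
    """
    terms = list(chain(rating_terms, estimate_terms))
    num = sum((r - m) * k for r, m, k in terms)  # both numerator sums
    den = sum(abs(k) for _, _, k in terms)       # both denominator sums
    return mean_c + num / den if den else None
```

For example, `combined_estimate(3.0, [(5, 4.0, 0.5)], [(2, 3.0, -1.0)])` gives 3 + (0.5 + 1.0) / 1.5 = 4.0.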

Experiments with Combination of Content-based and Collaborative Filtering (1)
• Content-based Filtering Method (CBF)
– documents and profiles: vector representation - weighted
keywords (TF-IDF)
– estimation computation: normalized dot product of
document and profile vectors
• Collaborative Filtering (CF)
– Pearson correlation coefficient
– weighted sum of ratings
• Combination of CF and CBF
– Pearson correlation coefficients
– weighted sum of ratings and CBF estimates
• Constant Method ($\hat{r}_{cj} = 5$)
