Cold-Start Management with Cross-Domain Collaborative Filtering and Tags

EC-Web - August 2013, Prague, Czech Republic
Manuel Enrich, Matthias Braunhofer, and Francesco Ricci
Free University of Bozen - Bolzano
Piazza Domenicani 3, 39100 Bolzano, Italy
{menrich,mbraunhofer,fricci}@unibz.it

Outline
• Recommender Systems and the Cold-Start Problem
• State of the Art
• Tag-Based Rating Prediction Models
• Experimental Evaluation
• Conclusions and Future Work

Recommender Systems (RSs)
• Goal: recommend new, relevant items to users based on their feedback
• Explicit feedback (ratings) vs. implicit feedback (purchase / browsing
history)
• Two basic technical approaches:
• Collaborative filtering (CF)
• Content-based

Cold-Start Problem
• CF RSs suffer from the cold-start problem
• New user problem: How do you recommend to a new user?
• New item problem: How do you recommend a new item with no ratings?
• Content-based RSs overcome the new item problem
[Figure: example user-item rating matrices in which unknown ratings are marked "?"]

Cross-Domain CF (1/2)
• Technique that uses user ratings in one (auxiliary) domain to improve the
recommendation accuracy in another (target) domain (Berkovsky et al.,
2007)
• Example:
[Figure: ratings in the target domain and ratings in the auxiliary domain are
fed to the recommender system, which outputs recommended content in the target
domain]

Cross-Domain CF (2/2)
• Main limitation: limited applicability
• It fails when the domains share no common users or items
• Example:
[Figure: rating matrix of the target domain next to the rating matrix of the
auxiliary domain, with unknown ratings marked "?"]

Additional Knowledge Sources
• Extend existing rating prediction models by incorporating additional
sources of information about the users and items to better predict user
preferences (Koren and Bell, 2011; Baltrunas et al., 2012)
• Utilize implicit feedback, demographic data, contextual factors, ...
• Main limitations:
• Learning the models requires extensive training sets (i.e., browsing /
purchase histories, ratings in context)
• Training sets are specific to the application’s target domain

Tag-Induced Cross-Domain CF
• Exploits user-generated tags that are shared across domains to link their
users and items (Shi et al., 2011)
• Cross-domain similarities calculated based on user-assigned tags are
used to constrain matrix factorization
• Main limitations:
• Depends on a similarity function that might influence the recommendation
quality
• Requires the target user to have tagged several items

Our Solution: Tag-Based Prediction Models
• Main assumption: it is possible to exploit the information about how users
tag and rate items in a particular domain to improve the prediction accuracy
in another domain
• Example:
[Figure: target domain knowledge and auxiliary domain knowledge shown as rating
matrices linked by the shared tag "Exciting"; the unknown target rating "?" for
an item tagged "Exciting" is predicted as 5]

Latent Factor Models
• Each user u and item i are associated with latent factor vectors p_u and q_i
• The dot product captures the user’s predicted overall interest in the item:
$\hat{r}_{ui} = p_u q_i^T$
• Factor vectors are learned using:
• Stochastic gradient descent
• Alternating least squares
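As a minimal sketch of the stochastic gradient descent option, the following code learns p_u and q_i on a toy rating list. The squared-error loss with L2 regularization, the learning rate, the number of factors, and the ratings themselves are all assumptions chosen for illustration; they are not taken from the slides.

```python
import random

def dot(a, b):
    """Dot product of two equally long factor vectors."""
    return sum(x * y for x, y in zip(a, b))

def train_sgd(ratings, n_users, n_items, n_factors=2,
              lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn user factors p and item factors q from (user, item, rating) triples."""
    rng = random.Random(seed)
    p = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(p[u], q[i])          # prediction error for this rating
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]      # keep old values for a simultaneous update
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q

def training_mae(ratings, p, q):
    """Mean absolute error of the model on the ratings it was trained on."""
    return sum(abs(r - dot(p[u], q[i])) for u, i, r in ratings) / len(ratings)

# Hypothetical toy data: (user index, item index, rating).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

p_init, q_init = train_sgd(ratings, n_users=3, n_items=3, epochs=0)  # untrained factors
p, q = train_sgd(ratings, n_users=3, n_items=3)
mae_before = training_mae(ratings, p_init, q_init)
mae_after = training_mae(ratings, p, q)
```

Comparing `mae_before` with `mae_after` shows how far the training error drops on this toy data; it says nothing about accuracy on held-out ratings.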

Incorporating Tags
• Consider that the item i has been tagged with a set of tags T(i)
• We can use an additional set of factor vectors y_t, one for each tag t,
expressing how strongly an item annotated with tag t loads on the factors
• The rating prediction function is now:
$\hat{r}_{ui} = p_u \left( q_i^T + \sum_{t \in T(i)} y_t^T \right)$
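The tag-extended prediction can be sketched in a few lines. The function name, the dictionary `y` that maps a tag to its factor vector, and all numbers below are hypothetical and serve only to show how the tag vectors shift the item vector before the dot product is taken.

```python
def predict_with_tags(p_u, q_i, item_tags, y):
    """Return p_u . (q_i + sum of y_t over the tags attached to the item)."""
    combined = list(q_i)                    # copy so q_i itself is not modified
    for t in item_tags:
        for f, value in enumerate(y[t]):
            combined[f] += value
    return sum(p * c for p, c in zip(p_u, combined))

# Hypothetical two-factor example; all numbers are made up for illustration.
p_u = [1.0, 0.5]
q_i = [2.0, 1.0]
y = {"exciting": [0.5, 0.0], "slow": [-1.0, 0.0]}

plain = predict_with_tags(p_u, q_i, [], y)                      # 2.5
tagged = predict_with_tags(p_u, q_i, ["exciting", "slow"], y)   # 2.0
```

With no tags the function reduces to the plain latent factor prediction, so the tag vectors act as a correction on top of q_i.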

1st Proposed Model: UserItemTags
• Main idea: user ratings for an item may also be correlated with the specific
tags the user attached to the item
• Assumption: target user has tagged the item
$\hat{r}_{ui} = p_u \left( q_i^T + \frac{1}{|T_u(i)|} \sum_{t \in T_u(i)} y_t^T \right)$
p_u : latent factor vector of user u
q_i : latent factor vector of item i
T_u(i) : set of tags assigned by user u to item i
y_t : latent factor vector of tag t
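A possible way to train UserItemTags with stochastic gradient descent is sketched below. The update rules follow from differentiating a squared-error loss with L2 regularization, which is an assumption here, as are the learning rate, the fallback to the plain model when the user attached no tags, and the toy numbers.

```python
def tag_offset(tags, y, n_factors):
    """Average of the tag factor vectors; zeros when there are no tags (assumed fallback)."""
    if not tags:
        return [0.0] * n_factors
    return [sum(y[t][f] for t in tags) / len(tags) for f in range(n_factors)]

def predict(p_u, q_i, tags, y):
    """UserItemTags prediction: p_u . (q_i + mean of y_t over T_u(i))."""
    offset = tag_offset(tags, y, len(q_i))
    return sum(p * (q + o) for p, q, o in zip(p_u, q_i, offset))

def sgd_step(r_ui, p_u, q_i, tags, y, lr=0.01, reg=0.02):
    """Update p_u, q_i and the used tag vectors in place for one observed rating."""
    offset = tag_offset(tags, y, len(q_i))
    err = r_ui - predict(p_u, q_i, tags, y)
    for f in range(len(p_u)):
        pu, qi = p_u[f], q_i[f]            # old values, so all updates use the same state
        p_u[f] += lr * (err * (qi + offset[f]) - reg * pu)
        q_i[f] += lr * (err * pu - reg * qi)
        for t in tags:
            y[t][f] += lr * (err * pu / len(tags) - reg * y[t][f])
    return err

# Hypothetical example: the user rated the item 4.0 and tagged it twice.
p_u = [1.0, 0.5]
q_i = [2.0, 1.0]
y = {"exciting": [0.5, 0.0], "slow": [-1.0, 0.0]}
user_tags = ["exciting", "slow"]

before = predict(p_u, q_i, user_tags, y)          # 2.25
err = sgd_step(4.0, p_u, q_i, user_tags, y)       # error before the update: 1.75
after = predict(p_u, q_i, user_tags, y)
```

After the single step, `after` lies closer to the observed rating than `before`, which is the behaviour one expects from a descent step with a small learning rate.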

2nd Proposed Model: UserItemRelTags
• Main idea: same as before, except that we consider only relevant tags (i.e.,
tags that have a statistically significant influence on the ratings)
• Assumption: target user has tagged the item
$\hat{r}_{ui} = p_u \left( q_i^T + \frac{1}{|TR_u(i)|} \sum_{t \in TR_u(i)} y_t^T \right)$
TR_u(i) : set of relevant tags assigned by user u to item i
y_t : latent factor vector of tag t
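The slides do not say which significance test decides whether a tag is relevant. The sketch below uses a two-sided permutation test on the difference in mean rating with and without the tag, purely as a stand-in; the test, the threshold `alpha`, the function names, and the toy data are all assumptions.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(with_tag, without_tag, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in mean rating."""
    rng = random.Random(seed)
    observed = abs(mean(with_tag) - mean(without_tag))
    pooled = list(with_tag) + list(without_tag)
    k = len(with_tag)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:k]) - mean(pooled[k:]))
        if diff >= observed - 1e-12:       # small tolerance for float rounding
            hits += 1
    return (hits + 1) / (n_perm + 1)

def relevant_tags(tagged_ratings, alpha=0.05):
    """Return the tags whose presence goes with a significantly different rating.

    tagged_ratings is a list of (rating, set_of_tags) pairs.
    """
    all_tags = set().union(*(tags for _, tags in tagged_ratings))
    relevant = set()
    for t in sorted(all_tags):
        with_t = [r for r, tags in tagged_ratings if t in tags]
        without_t = [r for r, tags in tagged_ratings if t not in tags]
        if with_t and without_t and permutation_p_value(with_t, without_t) < alpha:
            relevant.add(t)
    return relevant

# Hypothetical data: "exciting" goes with high ratings, "boring" with low ones,
# and "paperback" is spread evenly over both.
tagged_ratings = [
    (5, {"exciting", "paperback"}), (5, {"exciting", "paperback"}),
    (4, {"exciting", "paperback"}), (5, {"exciting"}),
    (5, {"exciting"}), (4, {"exciting"}),
    (1, {"boring", "paperback"}), (2, {"boring", "paperback"}),
    (1, {"boring", "paperback"}), (2, {"boring"}),
    (1, {"boring"}), (2, {"boring"}),
]
relevant = relevant_tags(tagged_ratings)
```

On this toy data the filter keeps "exciting" and "boring" and drops "paperback", so only the first two would contribute tag vectors to the prediction.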

3rd Proposed Model: ItemRelTags
• Main idea: target user rating for an item can be better predicted by modeling
how tags overall influence the item’s ratings
• Advantage: doesn’t require the target user to have tagged the item
$\hat{r}_{ui} = p_u \left( q_i^T + \frac{1}{|TRo_i|} \sum_{t \in TR(i)} TRo_i(t) \, y_t^T \right)$
TR(i) : set of relevant tags assigned to item i
TRo_i : relevant tags applied to item i (incl. duplicates)
TRo_i(t) : number of occurrences of tag t on item i
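The occurrence-weighted sum in ItemRelTags can be sketched as follows. The function name, the representation of tag assignments as a flat list with duplicates, the set `relevant`, and the numbers are hypothetical; the fallback to the plain model when the item has no relevant tags is also an assumption.

```python
from collections import Counter

def predict_item_rel_tags(p_u, q_i, tag_assignments, y, relevant):
    """p_u . (q_i + occurrence-weighted mean of y_t over the item's relevant tags)."""
    counts = Counter(t for t in tag_assignments if t in relevant)  # TRo_i(t)
    total = sum(counts.values())                                   # |TRo_i|
    combined = list(q_i)
    if total:                               # no relevant tags: plain p_u . q_i
        for t, n in counts.items():
            for f in range(len(combined)):
                combined[f] += n * y[t][f] / total
    return sum(p * c for p, c in zip(p_u, combined))

# Hypothetical example: all tags any user applied to the item, with duplicates.
p_u = [1.0, 0.5]
q_i = [2.0, 1.0]
y = {"exciting": [0.4, 0.0], "slow": [-0.8, 0.4]}
relevant = {"exciting", "slow"}
assignments = ["exciting", "exciting", "exciting", "slow", "paperback"]

prediction = predict_item_rel_tags(p_u, q_i, assignments, y, relevant)
```

Here "exciting" occurs three times and "slow" once among the four relevant assignments, so the offset is 0.75 times the "exciting" vector plus 0.25 times the "slow" vector, and the prediction comes out at about 2.65. No tag from the target user is needed, which matches the advantage stated on the slide.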

Used Datasets
• 2 tagged rating datasets
Total number of ratings                                       24,564  24,564
Unique users                                                   2,026     283
Unique items                                                   5,088  12,554
Unique tags                                                    9,486   4,708
Tag assignments                                               44,805  78,239
Average ratings per user                                       12.12   86.80
Average tags per rating                                         1.82    3.18
% of tags overlapping with the tags used in the other domain   14.54   29.31
* The statistics refer to the datasets after performing some pre-processing
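The tag overlap row can be read as the share of a domain's unique tags that also appear in the other domain. A minimal sketch of that computation, assuming exact string matching between tags and using made-up tag sets, is:

```python
def tag_overlap_percent(own_tags, other_tags):
    """Percentage of the unique tags in own_tags that also occur in other_tags."""
    own, other = set(own_tags), set(other_tags)
    if not own:
        return 0.0
    return 100.0 * len(own & other) / len(own)

# Hypothetical tag vocabularies for two domains.
movie_tags = ["exciting", "funny", "classic", "sci-fi"]
book_tags = ["exciting", "classic", "poetry"]

movies_overlap = tag_overlap_percent(movie_tags, book_tags)   # 50.0
books_overlap = tag_overlap_percent(book_tags, movie_tags)
```

The measure is asymmetric because the same shared tags are divided by different vocabulary sizes, which is why the table reports a separate percentage for each dataset.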

Cross-Domain Recommendations
Evaluation Design (1/2)
• 2 results for each model
• MovieLens as target and LibraryThing as auxiliary domain
• LibraryThing as target and MovieLens as auxiliary domain
• SVD model (Koren and Bell, 2011) used as baseline system
• Only data coming from the target domain used for training
• Rating prediction accuracy measured in terms of:
• Mean Absolute Error (MAE)
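MAE averages the absolute differences between the actual and the predicted ratings over all N test ratings, i.e. MAE = (1/N) Σ |r_ui − r̂_ui|. A minimal sketch with made-up ratings:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical test ratings and model predictions.
actual = [5.0, 3.0, 4.0, 1.0]
predicted = [4.5, 3.5, 2.0, 1.0]

mae = mean_absolute_error(actual, predicted)   # (0.5 + 0.5 + 2.0 + 0.0) / 4 = 0.75
```

Lower values are better, and every error counts in proportion to its size regardless of its sign.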

Cross-Domain Recommendations
Evaluation Design (2/2)
• (Extended) 10-fold cross-validation scheme:
• Break up the data from the target domain into 10 pieces
• Treat one piece as the test dataset and fit the model incrementally, each
time adding a further 10% of the other nine pieces (which, together with the
data from the auxiliary domain, form the training data)
• Repeat for each of the 10 pieces
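One way to read this scheme is sketched below: for every fold, the training set starts with all auxiliary-domain data plus 10% of the remaining target data and grows in 10% steps. The function name, the shuffling, the way folds are cut, and the toy records are assumptions; the slides do not give these details.

```python
import random

def incremental_cv_splits(target, auxiliary, n_folds=10, n_steps=10, seed=0):
    """Yield (fold, step, train, test) for the incremental cross-validation scheme."""
    rng = random.Random(seed)
    shuffled = list(target)
    rng.shuffle(shuffled)
    folds = [shuffled[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        rest = [x for j, fold in enumerate(folds) if j != k for x in fold]
        for step in range(1, n_steps + 1):
            cut = round(len(rest) * step / n_steps)   # 10%, 20%, ..., 100% of the rest
            # All auxiliary data is always part of the training set.
            yield k, step, list(auxiliary) + rest[:cut], test

# Hypothetical (user, item, rating) records for the two domains.
target = [("u%d" % n, "i%d" % n, 3.0) for n in range(100)]
auxiliary = [("a%d" % n, "b%d" % n, 4.0) for n in range(30)]

splits = list(incremental_cv_splits(target, auxiliary))
```

With 10 folds and 10 steps this yields 100 train/test pairs. Averaging the MAE over the folds for each step gives one point per usage level of target-domain data.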

Evaluation Results (1/2)
Average MAEs using MovieLens as target and LibraryThing as auxiliary domain
[Figure: line chart of average MAE (y-axis, 0.74 to 0.98) against usage of data
from the target domain (x-axis, 10% to 100%) for SVD, UserItemTags,
UserItemRelTags, and ItemRelTags]

Average MAEs using LibraryThing as target and MovieLens as auxiliary domain
[Figure: line chart of average MAE (y-axis, 0.76 to 0.90) against usage of data
from the target domain (x-axis, 10% to 100%) for SVD, UserItemTags,
UserItemRelTags, and ItemRelTags]

Single-Domain Recommendations
Evaluation Design
• Check the performance of the models using only rating and tagging data in
the target domain
• In each of the 10 iterations, one split used as test and the remaining data
as training set
• Training set is split into 10 further parts used for incremental training
• SVD used as a baseline model

Comparison of models’ MAEs - single vs. cross-domain (MovieLens target)
[Figure: line chart of average MAE (y-axis, 0.74 to 0.98) against usage of data
(x-axis, 10% to 100%) for SVD, UserItemRelTags, UserItemRelTags (cross-domain),
ItemRelTags, and ItemRelTags (cross-domain)]

Comparison of models’ MAEs - single vs. cross-domain (LibraryThing target)
[Figure: line chart of average MAE (y-axis, 0.76 to 0.92) against usage of data
(x-axis, 10% to 100%) for SVD, UserItemRelTags, UserItemRelTags (cross-domain),
ItemRelTags, and ItemRelTags (cross-domain)]

Conclusions
• Novel cross-domain recommendation approaches
• improve the prediction accuracy on a target domain using rating and
tagging data from an auxiliary domain (assuming that there is a good tag
overlap)
• very useful in the cold-start situation (i.e., when a small amount of training
data in the target domain is available)
• improve the rating prediction also in a single-domain scenario (i.e., using
only rating and tagging data in the target domain)

Future Work
• Extended evaluation
• Better understanding of how the algorithm performance correlates with the
characteristics of the datasets
• Usage of other datasets / comparison with other cross-domain RSs
• Analysis of fields of application
• Exploitation in context-aware RSs
• Generation of more diverse recommendations

Questions?
Thank you.
