Recommender systems suffer from the new user problem, i.e., the difficulty of making accurate predictions for users who have rated only a few items. Moreover, they usually compute recommendations for items in just one domain, such as movies, music, or books. In this paper we deal with such a cold-start situation by exploiting cross-domain recommendation techniques, i.e., we suggest items to a user in one target domain by using ratings of other users in a completely disjoint auxiliary domain. We present three rating prediction models that make use of information about how users tag items in an auxiliary domain, and how these tags correlate with the ratings, to improve the rating prediction task in a different target domain. We show that the proposed techniques can effectively deal with the considered cold-start situation, provided that the tags used in the two domains overlap.
Cold-Start Management with Cross-Domain Collaborative Filtering and Tags
1. EC-Web - August 2013, Prague, Czech Republic
Cold-Start Management with Cross-Domain
Collaborative Filtering and Tags
Manuel Enrich, Matthias Braunhofer, and Francesco Ricci
Free University of Bozen - Bolzano
Piazza Domenicani 3, 39100 Bolzano, Italy
{menrich,mbraunhofer,fricci}@unibz.it
Outline
• Recommender Systems and the Cold-Start Problem
• State of the Art
• Tag-Based Rating Prediction Models
• Experimental Evaluation
• Conclusions and Future Work
Recommender Systems (RSs)
• Goal: recommend new, relevant items to users based on their feedback
• Explicit feedback (ratings) vs. implicit feedback (purchase / browsing
history)
• Two basic technical approaches:
• Collaborative filtering (CF)
• Content-based
Cold-Start Problem
• CF RSs suffer from the cold-start problem
• New user problem: How do you recommend to a new user?
• New item problem: How do you recommend a new item with no ratings?
• Content-based RSs overcome the new item problem
[Figure: example rating matrices with known ratings and unknown (?) entries]
Cross-Domain CF (1/2)
• Technique that uses user ratings in one (auxiliary) domain to improve the
recommendation accuracy in another (target) domain (Berkovsky et al.,
2007)
• Example:
[Figure: ratings in the target domain and ratings in the auxiliary domain feed the
recommender system, which outputs recommended content in the target domain]
Cross-Domain CF (2/2)
• Main limitation: limited applicability
• It fails when the domains share no common users or items
• Example:
[Figure: example rating matrices for the target domain and the auxiliary domain]
Additional Knowledge Sources
• Extend existing rating prediction models by incorporating additional
sources of information about the users and items to better predict user
preferences (Koren and Bell, 2011; Baltrunas et al., 2012)
• Utilize implicit feedback, demographic data, contextual factors, ...
• Main limitations:
• Extensive training sets (i.e., browsing / purchase histories, ratings in
context) are required to learn the models
• Training sets are specific to the application’s target domain
Tag-Induced Cross-Domain CF
• Exploits user-generated tags that are shared across domains to link their
users and items (Shi et al., 2011)
• Cross-domain similarities calculated based on user-assigned tags are
used to constrain matrix factorization
• Main limitations:
• Depends on a similarity function that might influence the recommendation
quality
• Requires the target user to have tagged several items
Our Solution: Tag-Based Prediction Models
• Main assumption: it is possible to exploit the information about how users
tag and rate items in a particular domain to improve the prediction accuracy
in another domain
• Example:
[Figure: target domain knowledge and auxiliary domain knowledge shown as rating
matrices with items tagged "Exciting"; the unknown rating (?) of an item tagged
"Exciting" is then filled in as 5]
Latent Factor Models
• Each user u and item i are associated with latent factor vectors p_u and q_i
• Dot product captures the predicted user’s overall interest in the item:
$$\hat{r}_{ui} = p_u q_i^T$$
• Factor vectors are learned using:
• Stochastic gradient descent
• Alternating least squares
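As a minimal sketch of how such factor vectors might be learned with stochastic gradient descent (this is not the authors' implementation; the helper names `train_mf` and `predict`, the learning rate, the regularization weight, and the toy ratings are all illustrative assumptions):

```python
import random

def train_mf(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Learn p_u and q_i by SGD on the regularized squared error.

    ratings: list of (user, item, rating) triples (hypothetical toy format).
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Small random initialization of the latent factor vectors.
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(p[u], q[i]))
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on both vectors, using the old values.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q

def predict(p, q, u, i):
    """Predicted rating: dot product of p_u and q_i."""
    return sum(a * b for a, b in zip(p[u], q[i]))

toy = [("u1", "a", 5), ("u1", "b", 3), ("u2", "a", 4),
       ("u2", "c", 1), ("u3", "b", 2), ("u3", "c", 5)]
p, q = train_mf(toy)
print(round(predict(p, q, "u1", "a"), 2))
```

Alternating least squares would instead fix one set of vectors and solve for the other in closed form; the SGD variant is shown only because it is the shorter of the two.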
Incorporating Tags
• Consider that the item i has been tagged with a set of tags T(i)
• We can use an additional set of factor vectors y_t, one for each tag t,
expressing how strongly an item annotated with tag t loads on the factors
• The rating prediction function is now:
$$\hat{r}_{ui} = p_u \Big( q_i^T + \sum_{t \in T(i)} y_t^T \Big)$$
1st Proposed Model: UserItemTags
• Main idea: user ratings for an item may also be correlated with the specific
tags the user attached to the item
• Assumption: target user has tagged the item
$$\hat{r}_{ui} = p_u \Big( q_i^T + \frac{1}{|T_u(i)|} \sum_{t \in T_u(i)} y_t^T \Big)$$
p_u: latent factor vector of user u
q_i: latent factor vector of item i
T_u(i): set of tags assigned by user u to item i
y_t: latent factor vector of tag t
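The UserItemTags prediction can be sketched as follows. This is an illustration only: the function name, the plain-list vector representation, and the fallback to the plain dot product when the user attached no tags are assumptions, not details taken from the slides.

```python
def predict_user_item_tags(p_u, q_i, tag_vectors, user_tags):
    """Dot product of p_u with (q_i + average of y_t over the user's tags).

    p_u, q_i: latent factor vectors as lists of floats.
    tag_vectors: dict mapping each tag t to its factor vector y_t.
    user_tags: tags the user assigned to the item, i.e. T_u(i).
    """
    combined = list(q_i)
    if user_tags:
        # Add the mean tag vector (the 1/|T_u(i)| normalization).
        for f in range(len(combined)):
            combined[f] += sum(tag_vectors[t][f] for t in user_tags) / len(user_tags)
    # Assumed fallback: with no tags, only q_i contributes.
    return sum(a * b for a, b in zip(p_u, combined))

y = {"exciting": [1.0, 0.0], "funny": [0.0, 1.0]}
print(predict_user_item_tags([1.0, 2.0], [0.5, 0.5], y, ["exciting", "funny"]))
```

Averaging rather than summing the tag vectors keeps the tag contribution on the same scale regardless of how many tags the user attached.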
2nd Proposed Model: UserItemRelTags
• Main idea: same as before, except that we consider only relevant tags (i.e.,
tags that have a statistically significant influence on the ratings)
• Assumption: target user has tagged the item
$$\hat{r}_{ui} = p_u \Big( q_i^T + \frac{1}{|TR_u(i)|} \sum_{t \in TR_u(i)} y_t^T \Big)$$
p_u: latent factor vector of user u
q_i: latent factor vector of item i
TR_u(i): set of relevant tags assigned by user u to item i
y_t: latent factor vector of tag t
3rd Proposed Model: ItemRelTags
• Main idea: target user rating for an item can be better predicted by modeling
how tags overall influence the item’s ratings
• Advantage: doesn’t require the target user to have tagged the item
$$\hat{r}_{ui} = p_u \Big( q_i^T + \frac{1}{|\mathit{TRo}_i|} \sum_{t \in TR(i)} \mathit{TRo}_i(t)\, y_t^T \Big)$$
p_u: latent factor vector of user u
q_i: latent factor vector of item i
TR(i): set of relevant tags assigned to item i
TRo_i: relevant tags applied to item i (incl. duplicates)
TRo_i(t): tag occurrences of tag t in item i
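A minimal sketch of the ItemRelTags prediction, in which each relevant tag vector is weighted by how often the tag was applied to the item. The function name and the list-based input format are assumptions made for illustration, and selecting which tags count as relevant is taken as already done.

```python
from collections import Counter

def predict_item_rel_tags(p_u, q_i, tag_vectors, relevant_assignments):
    """Dot product of p_u with (q_i + occurrence-weighted mean of y_t).

    relevant_assignments: all relevant tag assignments on the item, with
    duplicates (TRo_i); a tag applied three times appears three times.
    """
    counts = Counter(relevant_assignments)  # TRo_i(t) for each distinct tag
    total = sum(counts.values())            # |TRo_i|
    combined = list(q_i)
    if total:
        for tag, n in counts.items():
            y_t = tag_vectors[tag]
            for f in range(len(combined)):
                combined[f] += n * y_t[f] / total
    return sum(a * b for a, b in zip(p_u, combined))

y = {"exciting": [1.0, 0.0], "slow": [0.0, 1.0]}
tags_on_item = ["exciting", "exciting", "exciting", "slow"]
print(predict_item_rel_tags([1.0, 2.0], [0.0, 0.0], y, tags_on_item))
```

Because the tags come from all users of the item, the prediction does not depend on the target user having tagged anything.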
Used Datasets
• 2 tagged rating datasets
Total number of ratings                                            24,564    24,564
Unique users                                                        2,026       283
Unique items                                                        5,088    12,554
Unique tags                                                         9,486     4,708
Tag assignments                                                    44,805    78,239
Average ratings per user                                            12.12     86.80
Average tags per rating                                              1.82      3.18
% of tags overlapping with the tags used in the other domain        14.54     29.31
* The statistics refer to the datasets after performing some pre-processing
Cross-Domain Recommendations
Evaluation Design (1/2)
• 2 results for each model
• MovieLens as target and LibraryThing as auxiliary domain
• LibraryThing as target and MovieLens as auxiliary domain
• SVD model (Koren and Bell, 2011) used as baseline system
• Only data coming from the target domain used for training
• Rating prediction accuracy measured in terms of:
• Mean Absolute Error (MAE)
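For reference, MAE is the mean of the absolute differences between predicted and actual ratings. A small sketch (the function name and the example values are illustrative, not taken from the experiments):

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all test ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Errors of 1, 0 and 2 average to 1.0.
print(mean_absolute_error([4.0, 3.0, 5.0], [5.0, 3.0, 3.0]))
```

Lower values are better; an MAE of 0.8 means predictions are off by 0.8 rating points on average.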
Cross-Domain Recommendations
Evaluation Design (2/2)
• (Extended) 10-fold cross validation scheme:
• Break up the data from target domain into 10 pieces
• Treat one piece as the test dataset and fit the model incrementally, adding a
further 10% of the other nine pieces at each step (these, together with the
data from the auxiliary domain, form the training data)
• Repeat for each of the 10 pieces
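The scheme above can be sketched as a generator of train/test splits. This is one reading of the slide, not the authors' code: the function name, the shuffling, and the way the 10% increments are cut are assumptions.

```python
import random

def extended_cross_validation(target, auxiliary, n_folds=10, n_steps=10, seed=0):
    """Yield (fold, fraction, train, test) tuples.

    For every fold, the remaining target data is added to the training set
    in n_steps equal increments; the auxiliary data is always included.
    """
    data = list(target)
    random.Random(seed).shuffle(data)
    folds = [data[k::n_folds] for k in range(n_folds)]
    for k, test in enumerate(folds):
        rest = [x for j, fold in enumerate(folds) if j != k for x in fold]
        for step in range(1, n_steps + 1):
            cutoff = len(rest) * step // n_steps
            train = list(auxiliary) + rest[:cutoff]
            yield k, step / n_steps, train, test

runs = list(extended_cross_validation(range(100), ["aux1", "aux2"]))
print(len(runs))
```

Each yielded split would be used to train a model and compute its error on `test`; averaging per fraction over the folds gives one point per usage level of the target data.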
[Figure: schematic of the target domain and auxiliary domain data used in each step]
Cross-Domain Recommendations
Evaluation Results (1/2)
Average MAEs using MovieLens as target and LibraryThing as auxiliary domain
[Figure: line chart of average MAE (y-axis, 0.74 to 0.98) versus usage of data from
the target domain (x-axis, 10% to 100%) for SVD, UserItemTags, UserItemRelTags,
and ItemRelTags]
Cross-Domain Recommendations
Evaluation Results (2/2)
Average MAEs using LibraryThing as target and MovieLens as auxiliary domain
[Figure: line chart of average MAE (y-axis, 0.76 to 0.90) versus usage of data from
the target domain (x-axis, 10% to 100%) for SVD, UserItemTags, UserItemRelTags,
and ItemRelTags]
Single-Domain Recommendations
Evaluation Design
• Check the performance of the models using only rating and tagging data in
the target domain
• (Extended) 10-fold cross validation scheme:
• In each of the 10 iterations, one split used as test and the remaining data
as training set
• Training set is split into 10 further parts used for incremental training
• SVD used as a baseline model
Conclusions
• Novel cross-domain recommendation approaches
• improve the prediction accuracy on a target domain using rating and
tagging data from an auxiliary domain (assuming that there is a good tag
overlap)
• very useful in the cold-start situation (i.e., when a small amount of training
data in the target domain is available)
• improve the rating prediction also in a single-domain scenario (i.e., using
only rating and tagging data in the target domain)
Future Work
• Extended evaluation
• Better correlation of the algorithm performance to the characteristics of the
datasets
• Usage of other datasets / comparison with other cross-domain RSs
• Analysis of fields of application
• Exploitation in context-aware RSs
• Generation of more diverse recommendations