Cold-Start Management with Cross-Domain Collaborative Filtering and Tags

  • 706 views
Uploaded on

Recommender systems suffer from the new user problem, i.e., the difficulty to make accurate predictions for users that have rated only few items. Moreover, they usually compute recommendations for …

Recommender systems suffer from the new user problem, i.e., the difficulty to make accurate predictions for users that have rated only few items. Moreover, they usually compute recommendations for items just in one domain, such as movies, music, or books. In this paper we deal with such a cold-start situation exploiting cross-domain recommendation techniques, i.e., we suggest items to a user in one target domain by using ratings of other users in a, completely disjoint, auxiliary domain. We present three rating prediction models that make use of information about how users tag items in an auxiliary domain, and how these tags correlate with the ratings to improve the rating prediction task in a different target domain. We show that the proposed techniques can effectively deal with the considered cold-start situation, given that the tags used in the two domains overlap.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
706
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
15
Comments
0
Likes
3

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. EC-Web - August 2013, Prague, Czech Republic Cold-Start Management with Cross-Domain Collaborative Filtering and Tags Manuel Enrich, Matthias Braunhofer, and Francesco Ricci Free University of Bozen - Bolzano Piazza Domenicani 3, 39100 Bolzano, Italy {menrich,mbraunhofer,fricci}@unibz.it
  • 2. EC-Web - August 2013, Prague, Czech Republic Outline 2 • Recommender Systems and the Cold-Start Problem • State of the Art • Tag-Based Rating Prediction Models • Experimental Evaluation • Conclusions and Future Work
  • 3. EC-Web - August 2013, Prague, Czech Republic Outline 2 • Recommender Systems and the Cold-Start Problem • State of the Art • Tag-Based Rating Prediction Models • Experimental Evaluation • Conclusions and Future Work
  • 4. EC-Web - August 2013, Prague, Czech Republic Recommender Systems (RSs) • Goal: recommend new, relevant items to users based on their feedback • Explicit feedback (ratings) vs. implicit feedback (purchase / browsing history) • Two basic technical approaches: • Collaborative filtering (CF) • Content-based 3
  • 5. EC-Web - August 2013, Prague, Czech Republic Cold-Start Problem • CF RSs suffer from the cold-start problem • New user problem: How do you recommend to a new user? • New item problem: How do you recommend a new item with no ratings? • Content-based RSs overcome the new item problem 4 5 ? 3 2 45 ? 43 ? ?? 5 ? 3 2 4 ?5 ? ? 4 ?3
  • 6. EC-Web - August 2013, Prague, Czech Republic Outline 5 • Recommender Systems and the Cold-Start Problem • State of the Art • Tag-Based Rating Prediction Models • Experimental Evaluation • Conclusions and Future Work
  • 7. EC-Web - August 2013, Prague, Czech Republic Cross-Domain CF (1/2) • Technique that uses user ratings in one (auxiliary) domain to improve the recommendation accuracy in another (target) domain (Berkovsky et al., 2007) • Example: 6 Ratings in target domain Recommender System Recommended content in target domain Ratings in auxiliary domain
  • 8. EC-Web - August 2013, Prague, Czech Republic Cross-Domain CF (2/2) • Main limitation: its limited applicability • It fails when no common users / items are shared among the domains • Example: 7 5 ? 3 4 4 3 2 ?5 3 4 2 5 3 4 4 15 Ratings in target domain Ratings in auxiliary domain
  • 9. EC-Web - August 2013, Prague, Czech Republic Additional Knowledge Sources • Extend existing rating prediction models by incorporating additional sources of information about the users and items to better predict user preferences (Koren and Bell, 2011; Baltrunas et al., 2012) • Utilize implicit feedback, demographic data, contextual factors, ... • Main limitations: • Extensive training sets (i.e., browsing / purchase histories, ratings in context) to learn the models are required • Training sets are specific to the application’s target domain 8
  • 10. EC-Web - August 2013, Prague, Czech Republic Tag-Induced Cross-Domain CF • Exploits user-generated tags that are shared across domains to link their users and items (Shi et al., 2011) • Cross-domain similarities calculated based on user-assigned tags are used to constrain matrix factorization • Main limitations: • Depends on a similarity function that might influence the recommendation quality • Requires the target user to have tagged several items 9
  • 11. EC-Web - August 2013, Prague, Czech Republic Outline 10 • Recommender Systems and the Cold-Start Problem • State of the Art • Tag-Based Rating Prediction Models • Experimental Evaluation • Conclusions and Future Work
  • 12. EC-Web - August 2013, Prague, Czech Republic Our Solution: Tag-Based Prediction Models • Main assumption: it is possible to exploit the information about how users tag and rate items in a particular domain to improve the prediction accuracy in another domain • Example: 11 Target domain knowledge Auxiliary domain knowledge 5 ? 3 4 4 3 2 ?5 3 4 2 5 3 4 4 15 ExcitingExciting ? Exciting
  • 13. EC-Web - August 2013, Prague, Czech Republic Our Solution: Tag-Based Prediction Models • Main assumption: it is possible to exploit the information about how users tag and rate items in a particular domain to improve the prediction accuracy in another domain • Example: 11 Target domain knowledge Auxiliary domain knowledge 5 ? 3 4 4 3 2 ?5 3 4 2 5 3 4 4 15 ExcitingExciting 5 Exciting
  • 14. EC-Web - August 2013, Prague, Czech Republic Latent Factor Models • Each user u and item i are associated with latent factor vectors pu and qi • Dot product captures the predicted user’s overall interest in the item: • Factor vectors are learned using: • Stochastic gradient descent • Alternating least squares 12 ˆrui = puqi T
  • 15. EC-Web - August 2013, Prague, Czech Republic Incorporating Tags • Consider that the item i has been tagged with some tags T(i) • We can use an additional set of factor vectors, one for each tag t, yt - expressing how much an item that was annotated with tag t is loading the factors • The rating prediction function is now: 13 ˆrui = pu (qi T + yt T ) t∈T (i) ∑
  • 16. EC-Web - August 2013, Prague, Czech Republic 1st Proposed Model: UserItemTags • Main idea: user ratings for an item may be also correlated with the specific tags the user attached to the item • Assumption: target user has tagged the item 14 ˆrui = pu (qi T + 1 Tu (i) yt T ) t∈Tu (i) ∑ pu : latent factor vector of user u qi : latent factor vector of item i Tu(i) : set of tags assigned by user u to item i yt : latent factor vector of tag t
  • 17. EC-Web - August 2013, Prague, Czech Republic 2nd Proposed Model: UserItemRelTags • Main idea: same as before, except that we consider only relevant tags (i.e., tags that have a statistically significant influence on the ratings) • Assumption: target user has tagged the item 15 ˆrui = pu (qi T + 1 TRu (i) yt T ) t∈TRu (i) ∑ pu : latent factor vector of user u qi : latent factor vector of item i TRu(i) : set of relevant tags assigned by user u to item i yt : latent factor vector of tag t
  • 18. EC-Web - August 2013, Prague, Czech Republic 3rd Proposed Model: ItemRelTags • Main idea: target user rating for an item can be better predicted by modeling how tags overall influence the item’s ratings • Advantage: doesn’t require the target user to have tagged the item 16 ˆrui = pu (qi T + 1 TRoi TRoi (t)yt T ) t∈TR(i) ∑ pu : latent factor vector of user u qi : latent factor vector of item i TR(i) : set of relevant tags assigned to item i TRoi: relevant tags applied to item i (incl. duplicates) TRoi(t): tag occurrences of tag t in item i
  • 19. EC-Web - August 2013, Prague, Czech Republic Outline 17 • Recommender Systems and the Cold-Start Problem • State of the Art • Tag-Based Rating Prediction Models • Experimental Evaluation • Conclusions and Future Work
  • 20. EC-Web - August 2013, Prague, Czech Republic Used Datasets • 2 tagged rating datasets 18 Total number of ratings 24,564 24,564 Unique users 2,026 283 Unique items 5,088 12,554 Unique tags 9,486 4,708 Tag assignments 44,805 78,239 Average ratings per user 12.12 86.80 Average tags per rating 1.82 3.18 % of tags overlapping with the tags used in the other domain 14.54 29.31 * The statistics refer to the datasets after performing some pre-processing
  • 21. EC-Web - August 2013, Prague, Czech Republic Cross-Domain Recommendations Evaluation Design (1/2) • 2 results for each model • MovieLens as target and LibraryThing as auxiliary domain • LibraryThing as target and MovieLens as auxiliary domain • SVD model (Koren and Bell, 2011) used as baseline system • Only data coming from the target domain used for training • Rating prediction accuracy measured in terms of: • Mean Absolute Error (MAE) 19
  • 22. EC-Web - August 2013, Prague, Czech Republic 20 Cross-Domain Recommendations Evaluation Design (2/2) • (Extended) 10-fold cross validation scheme:
  • 23. EC-Web - August 2013, Prague, Czech Republic 20 Cross-Domain Recommendations Evaluation Design (2/2) • (Extended) 10-fold cross validation scheme: • Break up the data from target domain into 10 pieces Target domain Auxiliary domain
  • 24. EC-Web - August 2013, Prague, Czech Republic 20 Cross-Domain Recommendations Evaluation Design (2/2) • (Extended) 10-fold cross validation scheme: • Break up the data from target domain into 10 pieces • Treat one piece as test dataset and fit the model incrementally by adding 10% from the other nine pieces (which together with the data from the auxiliary domain are now the training data) Target domain Auxiliary domain Target domain Auxiliary domain
  • 25. EC-Web - August 2013, Prague, Czech Republic 20 Cross-Domain Recommendations Evaluation Design (2/2) • (Extended) 10-fold cross validation scheme: • Break up the data from target domain into 10 pieces • Treat one piece as test dataset and fit the model incrementally by adding 10% from the other nine pieces (which together with the data from the auxiliary domain are now the training data) Target domain Auxiliary domain Target domain Auxiliary domain
  • 26. EC-Web - August 2013, Prague, Czech Republic 20 Cross-Domain Recommendations Evaluation Design (2/2) • (Extended) 10-fold cross validation scheme: • Break up the data from target domain into 10 pieces • Treat one piece as test dataset and fit the model incrementally by adding 10% from the other nine pieces (which together with the data from the auxiliary domain are now the training data) Target domain Auxiliary domain Target domain Auxiliary domain
  • 27. EC-Web - August 2013, Prague, Czech Republic 20 Cross-Domain Recommendations Evaluation Design (2/2) • (Extended) 10-fold cross validation scheme: • Break up the data from target domain into 10 pieces • Treat one piece as test dataset and fit the model incrementally by adding 10% from the other nine pieces (which together with the data from the auxiliary domain are now the training data) Target domain Auxiliary domain Target domain Auxiliary domain
  • 28. EC-Web - August 2013, Prague, Czech Republic 20 Cross-Domain Recommendations Evaluation Design (2/2) • (Extended) 10-fold cross validation scheme: • Break up the data from target domain into 10 pieces • Treat one piece as test dataset and fit the model incrementally by adding 10% from the other nine pieces (which together with the data from the auxiliary domain are now the training data) Target domain Auxiliary domain Target domain Auxiliary domain
  • 29. EC-Web - August 2013, Prague, Czech Republic 20 Cross-Domain Recommendations Evaluation Design (2/2) • (Extended) 10-fold cross validation scheme: • Break up the data from target domain into 10 pieces • Treat one piece as test dataset and fit the model incrementally by adding 10% from the other nine pieces (which together with the data from the auxiliary domain are now the training data) Target domain Auxiliary domain Target domain Auxiliary domain
  • 30. EC-Web - August 2013, Prague, Czech Republic 20 Cross-Domain Recommendations Evaluation Design (2/2) • (Extended) 10-fold cross validation scheme: • Break up the data from target domain into 10 pieces • Treat one piece as test dataset and fit the model incrementally by adding 10% from the other nine pieces (which together with the data from the auxiliary domain are now the training data) Target domain Auxiliary domain Target domain Auxiliary domain
  • 31. EC-Web - August 2013, Prague, Czech Republic 20 Cross-Domain Recommendations Evaluation Design (2/2) • (Extended) 10-fold cross validation scheme: • Break up the data from target domain into 10 pieces • Treat one piece as test dataset and fit the model incrementally by adding 10% from the other nine pieces (which together with the data from the auxiliary domain are now the training data) Target domain Auxiliary domain Target domain Auxiliary domain
  • 32. EC-Web - August 2013, Prague, Czech Republic 20 Cross-Domain Recommendations Evaluation Design (2/2) • (Extended) 10-fold cross validation scheme: • Break up the data from target domain into 10 pieces • Treat one piece as test dataset and fit the model incrementally by adding 10% from the other nine pieces (which together with the data from the auxiliary domain are now the training data) Target domain Auxiliary domain Target domain Auxiliary domain
  • 33. EC-Web - August 2013, Prague, Czech Republic 20 Cross-Domain Recommendations Evaluation Design (2/2) • (Extended) 10-fold cross validation scheme: • Break up the data from target domain into 10 pieces • Treat one piece as test dataset and fit the model incrementally by adding 10% from the other nine pieces (which together with the data from the auxiliary domain are now the training data) Target domain Auxiliary domain Target domain Auxiliary domain
  • 34. EC-Web - August 2013, Prague, Czech Republic 20 Cross-Domain Recommendations Evaluation Design (2/2) • (Extended) 10-fold cross validation scheme: • Break up the data from target domain into 10 pieces • Treat one piece as test dataset and fit the model incrementally by adding 10% from the other nine pieces (which together with the data from the auxiliary domain are now the training data) • Repeat Target domain Auxiliary domain Target domain Auxiliary domain Target domain Auxiliary domain
  • 35. EC-Web - August 2013, Prague, Czech Republic Cross-Domain Recommendations Evaluation Results (1/2) Average MAEs using MovieLens as target and LibraryThing as auxiliary domain 21 0.74% 0.76% 0.78% 0.8% 0.82% 0.84% 0.86% 0.88% 0.9% 0.92% 0.94% 0.96% 0.98% 10%% 20%% 30%% 40%% 50%% 60%% 70%% 80%% 90%% 100%% Average'MAE' Usage'of'data'from'target'domain' SVD% UserItemTags% UserItemRelTags% ItemRelTags%
  • 36. EC-Web - August 2013, Prague, Czech Republic Cross-Domain Recommendations Evaluation Results (2/2) Average MAEs using LibraryThing as target and MovieLens as auxiliary domain 22 0.76% 0.78% 0.8% 0.82% 0.84% 0.86% 0.88% 0.9% 10%% 20%% 30%% 40%% 50%% 60%% 70%% 80%% 90%% 100%% Average'MAE' Usage'of'data'from'the'target'domain' SVD% UserItemTags% UserItemRelTags% ItemRelTags%
  • 37. EC-Web - August 2013, Prague, Czech Republic Single-Domain Recommendations Evaluation Design • Check the performance of the models using only rating and tagging data in the target domain • (Extended) 10-fold cross validation scheme: • In each of the 10 iterations, one split used as test and the remaining data as training set • Training set is split into 10 further parts used for incremental training • SVD used as a baseline model 23
  • 38. EC-Web - August 2013, Prague, Czech Republic Single-Domain Recommendations Evaluation Results (1/2) Comparison of models’ MAEs - single vs. cross-domain (MovieLens target) 24 0.74% 0.76% 0.78% 0.8% 0.82% 0.84% 0.86% 0.88% 0.9% 0.92% 0.94% 0.96% 0.98% 10%% 20%% 30%% 40%% 50%% 60%% 70%% 80%% 90%% 100%% Average'MAE' Usage'of'data' SVD% UserItemRelTags% UserItemRelTags%(cross@domain)% ItemRelTags% ItemRelTags%(cross@domain)%
  • 39. EC-Web - August 2013, Prague, Czech Republic Single-Domain Recommendations Evaluation Results (2/2) Comparison of models’ MAEs - single vs. cross-domain (LibraryThing target) 25 0.76% 0.78% 0.8% 0.82% 0.84% 0.86% 0.88% 0.9% 0.92% 10%% 20%% 30%% 40%% 50%% 60%% 70%% 80%% 90%% 100%% Average'MAE' Usage'of'data' SVD% UserItemRelTags% UserItemRelTags%(cross@domain)% ItemRelTags% ItemRelTags%(cross@domain)%
  • 40. EC-Web - August 2013, Prague, Czech Republic Outline 26 • Recommender Systems and the Cold-Start Problem • State of the Art • Tag-Based Rating Prediction Models • Experimental Evaluation • Conclusions and Future Work
  • 41. EC-Web - August 2013, Prague, Czech Republic Conclusions • Novel cross-domain recommendation approaches • improve the prediction accuracy on a target domain using rating and tagging data from an auxiliary domain (assuming that there is a good tag overlap) • very useful in the cold-start situation (i.e., when a small amount of training data in the target domain is available) • improve the rating prediction also in a single-domain scenario (i.e., using only rating and tagging data in the target domain) 27
  • 42. EC-Web - August 2013, Prague, Czech Republic Future Work • Extended evaluation • Better correlation of the algorithm performance to the characteristics of the datasets • Usage of other datasets / comparison with other cross-domain RSs • Analysis of fields of application • Exploitation in context-aware RSs • Generation of more diverse recommendations 28
  • 43. EC-Web - August 2013, Prague, Czech Republic Questions? Thank you.