Using Social- and Pseudo-Social Networks to Improve Recommendation Quality
1 ITWP2011, Barcelona. Using Social- and Pseudo-Social Networks to Improve Recommendation Quality. Alan Said, Ernesto W. De Luca, Sahin Albayrak
2 Abstract
• The accumulated amount of data in the digital universe reached 1.2 zettabytes in 2010 (1 zettabyte = 1 billion terabytes), a 50% increase since 2008.
• Websites accumulate an increasingly wide variety of data on their users, without necessarily using it.
• This paper asks: how can this data be used to improve recommendations?
3 Outline: Introduction Recommender Systems Problem statement Dataset Statistics Social and Pseudo-Social networks Approach Results
4 Introduction
• IMDb, one of the first online recommender systems, turned 20 on October 17th, 2010.
• Since their beginning, recommender systems have produced recommendations for their users through relatively simple techniques.
• Today’s online systems contain more information about their users, and we should use it.
• Which information is important?
5 The Problem
• What should we do with the heaps of information available?
• What should be used, and how, to improve recommendations or to learn how to improve them?
• How should we treat
  • friendships?
  • comments?
  • idols?
  • common interests?
• How important are these in terms of recommendation quality?
6 Dataset
• From the movie domain: Moviepilot.de, Germany’s largest movie recommendation community
  • 1M+ users
  • 13M ratings
  • 50K movies
• Subset used here: 10,000 randomly selected users with a minimum of 30 ratings
  • 1.5M ratings
  • 50,000 comments
  • 4,000 friendships
  • 170,000 idols
  • 25,000 “diggs”
7 Social and Pseudo-Social Networks
• Social networks
  • Explicit statements of friendship between users
• Pseudo-social networks
  • Users commenting on the same movie
  • Users being fans of the same people
  • Users “digging” the same news articles, trailers, etc.
• 38% of ratings performed by users with friends
• 45% of ratings performed by users with comments
• 77% of ratings performed by users who are fans
• 29% of ratings performed by users who “digg”
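To make the idea of a pseudo-social network concrete, here is a minimal sketch that links two users whenever they interacted with the same item (for example, commented on the same movie). The function name `pseudo_social_edges`, the `(user, item)` event format, and the toy data are assumptions made for illustration; the slides do not describe how the networks were actually built.

```python
from collections import defaultdict
from itertools import combinations


def pseudo_social_edges(events):
    """Link users who share at least one item.

    events: iterable of (user, item) pairs, e.g. (user, commented_movie).
    Returns a set of undirected edges stored as sorted (u, v) tuples.
    """
    users_by_item = defaultdict(set)
    for user, item in events:
        users_by_item[item].add(user)

    edges = set()
    for users in users_by_item.values():
        # Every pair of users who touched the same item becomes an edge.
        for u, v in combinations(sorted(users), 2):
            edges.add((u, v))
    return edges


# Hypothetical comment log: ann and bob commented on m1, ann and cat on m2.
comments = [("ann", "m1"), ("bob", "m1"), ("cat", "m2"), ("ann", "m2")]
edges = pseudo_social_edges(comments)
```

The same function would apply unchanged to fan relations or “diggs” by swapping the event list.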
8 The Approach
• Augment k-nearest-neighbor neighborhoods with information from (pseudo-)social networks
• Use standard Pearson similarity
• Increase the similarity of users who are in the same networks so that they are added to the neighborhood
9 The Approach [Figure: standard neighborhood vs. augmented neighborhood]
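The approach can be sketched roughly as follows: compute Pearson similarity over co-rated items, raise the similarity of users who share a (pseudo-)social link, and take the top k. The slides do not say how the similarity is increased, so the additive `boost` parameter, the function names, and the toy ratings are all assumptions for illustration only.

```python
from math import sqrt


def pearson(ra, rb):
    """Pearson correlation over co-rated items; 0.0 when undefined."""
    common = sorted(set(ra) & set(rb))
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = sqrt(sum((ra[i] - ma) ** 2 for i in common)) * sqrt(
        sum((rb[i] - mb) ** 2 for i in common)
    )
    return num / den if den else 0.0


def augmented_neighbors(user, ratings, edges, k=2, boost=0.5):
    """Top-k neighbors after boosting users linked to `user` in `edges`.

    The additive boost is a hypothetical choice, not the authors' formula.
    """
    scores = {}
    for other in ratings:
        if other == user:
            continue
        sim = pearson(ratings[user], ratings[other])
        if (user, other) in edges or (other, user) in edges:
            sim += boost  # assumed way of "increasing the similarity"
        scores[other] = sim
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))[:k]


# Hypothetical toy data.
ratings = {
    "ann": {"m1": 5, "m2": 3, "m3": 4, "m4": 2},
    "bob": {"m1": 5, "m2": 3, "m3": 4, "m4": 2},
    "cat": {"m1": 4, "m2": 3, "m3": 5, "m4": 2},
    "dan": {"m1": 2, "m2": 4, "m3": 3, "m4": 5},
}
standard = augmented_neighbors("ann", ratings, edges=set(), k=1)
augmented = augmented_neighbors("ann", ratings, edges={("ann", "cat")}, k=1)
```

With an empty edge set this reduces to a standard neighborhood; with the link between ann and cat, cat overtakes bob as the nearest neighbor.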
10 Motivation
• Similarity metrics (Pearson, Jaccard, etc.) are based on co-ratings
• Popular items often heighten similarities without adding “value”
  • e.g. movies like “The Matrix” and “The Lord of the Rings” often receive similar (high) ratings, even from users who do not share taste
• Adding importance to users who share other interests filters out some of the effects of popular items
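The effect of popular items on a co-rating-based metric can be illustrated with Jaccard similarity over the sets of rated movies. The two users and the niche titles below are invented for this example, and removing popular items is used only to expose the effect; it is not the method proposed in the slides.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two item sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


popular = {"The Matrix", "The Lord of the Rings"}

# Hypothetical users who overlap only on the two popular movies.
ann = {"The Matrix", "The Lord of the Rings", "niche_drama"}
bob = {"The Matrix", "The Lord of the Rings", "niche_horror"}

with_popular = jaccard(ann, bob)
without_popular = jaccard(ann - popular, bob - popular)
```

Here all of the measured similarity comes from the popular titles: once they are set aside, the two users have nothing in common.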
12 Conclusion
• Social and interaction (co-commenting, etc.) networks seem to hold more information than standard CF (collaborative filtering) is able to identify
• Similarity metrics do not always tell the complete truth
• To do:
  • Find items that are important for establishing similarity between users
  • Investigate what other information can be used for measuring similarities