Clustering Technique for Collaborative Filtering Recommendation and Application to Venue Recommendation
Published in: Education, Technology

0 Comments
5 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
3,615
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
131
Comments
0
Likes
5
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • Pham Manh Cuong
  • Transcript

    • 1. Clustering Techniques for Collaborative Filtering and the Application to Venue Recommendation
      • Manh Cuong Pham, Yiwei Cao, Ralf Klamma
      • Information Systems and Database Technology, RWTH Aachen, Germany
      • I-KNOW 2010, Graz, Austria, September 01, 2010
    • 2. Agenda
      • Introduction
      • Clustering techniques for collaborative filtering
      • Case study: venue recommendation
        • Data sets: DBLP and CiteSeerX
        • User-based
        • Item-based
      • Conclusions and Outlook
    • 3. Introduction
      • Recommender systems: help users deal with information overload
      • Components of a recommender system [Burke 2002]
        • Set of users, set of items (products)
        • Implicit/explicit user rating on items
        • Additional information: trust, collaboration, etc.
        • Algorithms for generating recommendations
      • Recommendation techniques [Adomavicius and Tuzhilin 2005]
        • Collaborative Filtering (CF) [Breese et al. 1998]
          • Memory-based algorithms: user-based, item-based [Sarwar 2001]
          • Model-based algorithms: Bayesian network [Breese 1998]; clustering [Ungar 1998]; rule-based [Sarwar 2000]; machine learning on graphs [Zhou 2005, 2008]; PLSA [Hofmann 1999]; matrix factorization [Koren 2009]
        • Content-based recommendation [Sarwar et al. 2001]
        • Hybrid approaches [Burke 2002]
    • 4. Clustering and Collaborative Filtering
      • [Figure labels: User clustering, Item clustering, Cluster 1, Cluster 2, item-based CF]
      • Problems: large-scale data; sparse rating matrix; diversity of users and items
      • Previous approaches: Clustering based on ratings
        • K-means, Metis, etc. [Rashid 2006, Xue 2005, O’Connor 2001]
      • Our approach
        • Clustering based on additional information: relationships between users, items
        • Improves both efficiency and accuracy
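
The idea on this slide, running CF inside clusters instead of over the whole rating matrix, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes cluster labels are already available (the hypothetical `cluster_of` mapping), uses a user-based variant with cosine similarity, and all names and ratings are made up.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dict: item -> rating)."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend_in_cluster(target, ratings, cluster_of, top_n=3):
    """Rank items unseen by `target`, using only neighbours from the same cluster."""
    neighbours = [u for u in ratings
                  if u != target and cluster_of[u] == cluster_of[target]]
    scores = {}
    for u in neighbours:
        sim = cosine(ratings[target], ratings[u])
        if sim <= 0:
            continue  # neighbours with no overlap contribute nothing
        for item, rating in ratings[u].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

# Hypothetical toy data: three database researchers and one HCI researcher.
ratings = {
    "alice": {"VLDB": 1.0, "SIGMOD": 0.5},
    "bob": {"VLDB": 0.8, "ICDE": 0.6},
    "carol": {"SIGMOD": 0.9, "EDBT": 0.4},
    "dave": {"CHI": 1.0, "UIST": 0.7},
}
cluster_of = {"alice": 0, "bob": 0, "carol": 0, "dave": 1}  # assumed given
recs = recommend_in_cluster("alice", ratings, cluster_of)
```

Restricting the neighbour search to one cluster is where the efficiency gain would come from: similarities are computed only against cluster members, not against every user.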
    • 5. Evaluation: Venue Recommendation
      • Recommend venues (conferences, journals, workshops) to researchers
      • User-based CF
        • Populate user-item matrix using venue participation history
        • Ratings: normalized venue publication counts
        • User-clustering: co-authorship network
      • Item-based CF
        • Similarity between venues based on citation
        • Similarity measure: cosine
        • Venue clustering: similarity network
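
A minimal sketch of how the user-item matrix might be populated from participation history. The slide does not say how publication counts are normalized; dividing by each author's total count is an assumption made here purely for illustration, and the function and variable names are hypothetical.

```python
from collections import Counter, defaultdict

def build_ratings(publications):
    """Turn (author, venue) pairs, one per authored paper, into implicit ratings.

    Assumption: a rating is the author's publication count at a venue
    divided by the author's total publication count.
    """
    counts = defaultdict(Counter)
    for author, venue in publications:
        counts[author][venue] += 1
    ratings = {}
    for author, venue_counts in counts.items():
        total = sum(venue_counts.values())
        ratings[author] = {v: c / total for v, c in venue_counts.items()}
    return ratings

# Hypothetical participation history.
publications = [
    ("a1", "VLDB"), ("a1", "VLDB"), ("a1", "ICDE"), ("a1", "SIGMOD"),
    ("a2", "CHI"),
]
ratings = build_ratings(publications)
```

With this normalization every author's ratings sum to 1, so prolific and occasional authors become comparable.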
    • 6. Data Sets
      • DBLP (http://www.informatik.uni-trier.de/~ley/db/)
        • 788,259 author names
        • 1,226,412 publications
        • 3,490 venues (conferences, workshops, journals)
      • CiteSeerX (http://citeseerx.ist.psu.edu/)
        • 7,385,652 publications (including publications in reference lists)
        • 22,735,240 citations
        • Over 4 million author names
      • Combination
        • Canopy clustering [McCallum 2000]
        • Result: 864,097 matched pairs
        • On average, each venue cites 2,306 times and is cited 2,037 times
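
Canopy clustering groups records with a cheap distance so that expensive matching only has to compare records sharing a canopy. The sketch below shows the general technique. The slide does not give the distance or thresholds used for the DBLP/CiteSeerX combination, so the Jaccard distance on title words, the threshold values, and the titles are all assumptions.

```python
def tokens(title):
    """Lower-cased word set of a title."""
    return set(title.lower().split())

def jaccard_distance(a, b):
    """Cheap distance: 1 minus the Jaccard overlap of the two word sets."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)

def canopy_clusters(items, loose, tight, distance=jaccard_distance):
    """Return possibly overlapping canopies; requires tight <= loose."""
    if tight > loose:
        raise ValueError("tight threshold must not exceed loose threshold")
    remaining = list(items)
    canopies = []
    while remaining:
        centre = remaining[0]  # deterministic pick; the original picks at random
        # Everything within the loose threshold joins this canopy.
        canopies.append([x for x in items if distance(centre, x) <= loose])
        # Everything within the tight threshold can no longer become a centre.
        remaining = [x for x in remaining if distance(centre, x) > tight]
    return canopies

# Hypothetical titles as they might appear in two different sources.
titles = [
    "clustering for collaborative filtering",
    "clustering techniques for collaborative filtering",
    "finding community structure in very large networks",
    "finding community structure in large networks",
]
canopies = canopy_clusters(titles, loose=0.5, tight=0.3)
```

Candidate pairs for exact matching are then drawn only from within each canopy.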
    • 7. User-based CF: Author Clustering
      • Data: DBLP
      • Two test cases, for the years 2005 and 2006
        • Clustering of co-authorship networks
        • 2005 network: 478,108 nodes; 1,427,196 edges
        • 2006 network: 544,601 nodes; 1,686,867 edges
        • Prediction of the venue participation
      • Clustering algorithm
        • Density-based algorithm [Clauset 2004]
        • Obtained modularity: 0.829 and 0.82
      • Cluster size distribution follows a power law
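
The modularity values quoted above measure how much denser the clusters are internally than expected by chance. A minimal sketch of the standard definition for an unweighted, undirected graph, Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside community c, d_c the summed degree of its nodes, and m the total number of edges. The toy graph is hypothetical, and this computes the score only, not the clustering itself.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # edges with both endpoints in the community
    degree = defaultdict(int)    # summed node degree per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical co-authorship graph: two triangles joined by one edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in "abcdef"}
q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Putting every node into a single community always gives Q = 0, so the values above 0.8 reported on the slide indicate a strongly clustered network.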
    • 8. User-based CF: Performance
      • Precision for 1,000 randomly chosen authors
      • Precision computed at the 11 standard recall levels 0%, 10%, ..., 100%
      • Results
        • Clustering performs better
        • Improvement is not significant
        • Better efficiency
      • Further improvement
        • Different networks: citation
        • Overlapping clustering
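
Precision at the 11 standard recall levels is commonly computed with interpolation: the precision at recall level r is the highest precision observed at any recall of at least r. The slide does not say whether interpolation was used, so the sketch below assumes this standard information-retrieval definition; the ranked list and relevant set are hypothetical.

```python
def eleven_point_precision(ranked, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("need at least one relevant item")
    points = []  # (recall, precision) measured at each relevant hit
    hits = 0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    levels = [i / 10 for i in range(11)]
    # Small tolerance guards against floating-point error in the comparison.
    return [max((p for r, p in points if r >= level - 1e-12), default=0.0)
            for level in levels]

# Hypothetical example: venues recommended to one author, two of which
# the author actually published in during the test year.
ranked = ["A", "B", "C", "D", "E"]
relevant = {"A", "D"}
curve = eleven_point_precision(ranked, relevant)
```

Averaging such curves over the sampled authors gives one precision value per recall level.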
    • 9. Item-based CF: Venue Network Creation and Clustering
      • Knowledge network
        • Aggregate bibliographic coupling counts at venue level
        • Undirected graph G(V, E), where V: venues, E: edges weighted by cosine similarity
        • Threshold:
        • Clustering: density-based algorithm [Newman 2004, Clauset 2004]
        • Network visualization: force-directed paradigm [Fruchterman 1991]
      • Knowledge flow network (for venue ranking, see Pham & Klamma 2010)
        • Aggregate bibliographic coupling counts at venue level
        • Threshold: citation counts >= 50
        • Domains from Microsoft Academic Search (http://academic.research.microsoft.com/)
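
A minimal sketch of building the weighted venue graph G(V, E) from venue-level count vectors. The similarity threshold is not given in the transcript, so `min_similarity` is a hypothetical parameter with a placeholder value, and the venues and counts are made up.

```python
from itertools import combinations
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dict: key -> count)."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def venue_network(coupling, min_similarity):
    """Undirected edges (a, b) -> cosine weight, kept if weight >= min_similarity.

    `coupling` maps each venue to its aggregated counts per cited source.
    """
    edges = {}
    for a, b in combinations(sorted(coupling), 2):
        sim = cosine(coupling[a], coupling[b])
        if sim >= min_similarity:
            edges[(a, b)] = sim
    return edges

# Hypothetical venue-level count vectors.
coupling = {
    "VLDB": {"TODS": 3, "TKDE": 4},
    "ICDE": {"TODS": 3, "TKDE": 4},
    "EDBT": {"TODS": 4, "TOCHI": 3},
    "CHI": {"TOCHI": 5},
}
edges = venue_network(coupling, min_similarity=0.5)  # placeholder threshold
```

The resulting edge dictionary can then be passed to a community-detection or layout routine for clustering and visualization.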
    • 10. Knowledge Network: the Visualization
    • 11. Knowledge Network: Clustering
    • 12. Interdisciplinary Venues: Top Betweenness Centrality
    • 13. High Prestige Series: Top PageRank
    • 14. Conclusions and Future Research
      • Clustering and recommender systems
        • Advantage of using additional information for clustering
        • Application of clustering for both user-based and item-based CF
        • Key issue: impact of the communities (clusters) on the quality of recommendations; non-overlapping communities vs. overlapping communities
      • Outlook
        • Further evaluation: trust networks clustering, paper and potential collaborator recommendation
        • Datasets: Epinions, Last.fm, etc.
        • Digital libraries in Web 2.0: Mendeley, ResearchGate, etc.