Top-N Recommender Systems: Revisiting Item Neighborhood Methods

George Karypis
Department of Computer Science & Engineering
University of Minnesota
karypis@cs.umn.edu
http://www.cs.umn.edu/~karypis

Abstract

Top-N recommender systems are designed to generate a ranked list of items that a user will find useful based on the user’s prior activity. These systems have become ubiquitous and are an essential tool for information filtering and (e-)commerce. Over the years, collaborative filtering, which derives these recommendations by leveraging the past activities of groups of users, has emerged as the most prominent approach for solving this problem. Of the many methods that have been developed, item-based nearest neighbor algorithms are among the simplest and yet best-performing methods for Top-N recommender systems. These methods rank the items to be recommended based on how similar they are to the items in a user’s prior activity history, using various co-occurrence similarity measures.

In this talk we present our recent work on these item-based neighborhood methods, which has substantially improved the accuracy of the predictions. One shortcoming of traditional item-based neighborhood methods is that they rely on a similarity measure that needs to be specified a priori. To address this problem we developed a class of item-based neighborhood methods that directly estimate a sparse item-item similarity matrix from the training data. This similarity matrix is estimated using a structural equation modeling (SEM) framework, which requires each column of the user-item matrix to be approximated as a sparse aggregation of some other columns. These other columns correspond to the learned neighbors, and their aggregation weights correspond to the learned similarities.
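To make the column-wise estimation concrete, the following is a minimal sketch of the idea described above, not the implementation presented in the talk. It approximates each column of a small user-item matrix as a sparse combination of the other columns. Several details are assumptions that the abstract does not state: the weights are constrained to be non-negative, sparsity is encouraged with an L1 penalty alongside an L2 penalty, and the problem is solved by plain coordinate descent. The function names (`learn_item_weights`, `learn_similarity_matrix`, `recommend`) and the toy data are hypothetical.

```python
def learn_item_weights(A, j, l1=0.1, l2=0.1, n_iters=100):
    """Approximate column j of A as a sparse, non-negative combination
    of the other columns (assumed objective, solved per column):
        0.5*||a_j - A w||^2 + 0.5*l2*||w||^2 + l1*||w||_1,
        subject to w >= 0 and w[j] = 0.
    """
    n_users, n_items = len(A), len(A[0])
    w = [0.0] * n_items
    # Residual a_j - A w; equals a_j while w is all zeros.
    residual = [A[u][j] for u in range(n_users)]
    col_sq = [sum(A[u][k] ** 2 for u in range(n_users)) for k in range(n_items)]
    for _ in range(n_iters):
        for k in range(n_items):
            if k == j or col_sq[k] == 0:
                continue  # an item may not be its own neighbor
            # Correlation of column k with the residual that excludes k's own contribution.
            rho = sum(A[u][k] * (residual[u] + A[u][k] * w[k]) for u in range(n_users))
            # Closed-form coordinate update: soft-threshold, then clip at zero.
            new_w = max(rho - l1, 0.0) / (col_sq[k] + l2)
            delta = new_w - w[k]
            if delta != 0.0:
                for u in range(n_users):
                    residual[u] -= A[u][k] * delta
                w[k] = new_w
    return w


def learn_similarity_matrix(A, l1=0.1, l2=0.1):
    """Return W with W[k][j] = learned weight of item k when reconstructing item j."""
    n_items = len(A[0])
    columns = [learn_item_weights(A, j, l1, l2) for j in range(n_items)]
    return [[columns[j][k] for j in range(n_items)] for k in range(n_items)]


def recommend(A, W, user, n):
    """Score unseen items as a weighted sum over the user's history; return the top n."""
    n_items = len(W)
    history = {k for k in range(n_items) if A[user][k] != 0}
    scores = {
        j: sum(A[user][k] * W[k][j] for k in history)
        for j in range(n_items) if j not in history
    }
    return sorted(scores, key=lambda j: -scores[j])[:n]


# Toy binary user-item matrix: rows are users, columns are items.
A = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
W = learn_similarity_matrix(A)
recs = recommend(A, W, user=0, n=2)
print(recs)
```

Because each column is fitted independently, the columns could be estimated in parallel. The penalty values `l1` and `l2` are arbitrary here and would need tuning on held-out data.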

A second shortcoming of item-based neighborhood methods is that the item-item similarity measures rely on co-occurrences, which become problematic when the datasets are very sparse and the number of item pairs with sufficiently many co-occurrences is small. To address this problem we extended the SEM framework to estimate a factored version of the item-item similarity matrix. This factored representation projects the items into a lower-dimensional space, which allows for meaningful similarity estimates between items that never co-occurred in the original user-item matrix. In addition to the above, we also discuss and present results from our work to enhance the above SEM models by incorporating item side information to further improve the Top-N recommendation accuracy and to address the item cold-start recommendation problem.

Bio

George Karypis is a professor in the Department of Computer Science & Engineering at the University of Minnesota, Twin Cities. His research interests span the areas of data mining, bioinformatics, cheminformatics, high performance computing, information retrieval, collaborative filtering, and scientific computing. His research has resulted in the development of software libraries for serial and parallel graph partitioning (METIS and ParMETIS), hypergraph partitioning (hMETIS), parallel Cholesky factorization (PSPASES), collaborative filtering-based recommendation algorithms (SUGGEST), clustering high dimensional datasets (CLUTO), finding frequent patterns in diverse datasets (PAFI), and protein secondary structure prediction (YASSPP). He has coauthored over 200 papers on these topics and a book titled “Introduction to Parallel Computing” (Publ. Addison Wesley, 2003, 2nd edition). In addition, he is
serving on the program committees of many conferences and workshops on these topics, and on the editorial boards of the IEEE Transactions on Knowledge and Data Engineering, Social Network Analysis and Data Mining Journal, International Journal of Data Mining and Bioinformatics, the journal on Current Proteomics, Advances in Bioinformatics, and Biomedicine and Biotechnology.