Enhanced Vector Space Models for Content-based Recommender Systems
 


My Contribution for the RecSys 2010 Doctoral Consortium, 4th ACM Conference on Recommender Systems, Barcelona, Spain, 26-30 Sept 2010

Presentation Transcript

  • ACM Recommender Systems 2010 Barcelona, Spain Enhanced Vector Space Models for Content-based Recommender Systems Cataldo Musto - cataldomusto@di.uniba.it University of Bari “Aldo Moro” (Italy), SWAP Research Group ACM Recsys 2010 Doctoral Symposium 26.09.10
  • outline 2/30
    • Motivations
    • Goals
    • Analysis of Vector Space Models
    • Enhanced Vector Space Models
      • Random Indexing-based model
      • Semantic Vectors-based model
    • Experimental Evaluation
    • Open Issues
    • Future Works
    Cataldo Musto, Enhanced Vector Space Models for Content-based Recommender Systems - ACM RecSys 2010 Doctoral Symposium - Barcelona, Spain - 26.09.10
  • vector space model 3/30
    [Figure: item 1, item 2, ..., item n in the vector space]
  • vector space model 4/30
    • Introduced by Salton in 1975
    • Given a set of documents and N features describing them, the VSM builds an N-dimensional Vector Space
    • Each item is represented as a point in the Vector Space
    • Application: Information Retrieval
      • Query: a point in the Vector Space
      • Assumption: the documents nearest to the query in the Vector Space are the most relevant ones
      • Cosine Similarity is used to compute the similarity between query and documents
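
The retrieval assumption above can be sketched in a few lines of Python. This is only an illustration: the three-feature space, the item vectors and the query are made-up toy values, not data from the talk.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy 3-feature space; the feature counts are invented for illustration.
documents = {
    "item 1": [2, 0, 1],
    "item 2": [0, 3, 0],
    "item n": [1, 1, 1],
}
query = [1, 0, 1]

# Rank documents by decreasing cosine similarity to the query.
ranking = sorted(documents, key=lambda name: cosine(query, documents[name]), reverse=True)
```

Because cosine similarity depends only on the angle between vectors, a long document and a short one with the same term proportions are ranked identically.
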
  • idea 5/30
    • To investigate the impact of Vector Space Models in the area of Information Filtering (IF)
    • “Information Filtering & Information Retrieval: two sides of the same coin?”, Belkin & Croft, 1992
    • Strong Analogies between IF and Information Retrieval (IR)
      • Documents to be retrieved vs. Items to be filtered
      • Query vs. User Profiles
      • Both IF and IR can share the same weighting techniques (TF/IDF) and similarity measures (Cosine similarity)
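
As a reminder of what the shared TF/IDF weighting does, here is a minimal sketch. TF/IDF has many variants; this one assumes raw term counts and `log(N / df)`, which is not necessarily the variant used in the talk, and the tiny corpus is invented.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    weighted = []
    for doc in corpus:
        tf = Counter(doc)  # raw term counts in this document
        weighted.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weighted

# Invented toy corpus of three tokenized item descriptions.
corpus = [
    ["space", "war", "robot"],
    ["love", "war"],
    ["robot", "robot", "love"],
]
weights = tf_idf(corpus)
```

A term that occurs in every document gets weight 0 under this variant, so it contributes nothing to the cosine similarity.
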
  • vsm analysis 6/30
    • Strong Points
      • State-of-the-art model for the IR community
      • Clean and solid formalism
      • Simplicity of calculations between objects in a VSM
  • vsm analysis (2) 7/30
    • Weak Points
      • High Dimensionality
        • NLP operations (stopword elimination, stemming and so on)
      • Not incremental
        • The whole Vector Space has to be generated from scratch whenever a new item is added to the repository
      • Does not manage the latent semantics of documents
        • Any permutation of the terms in a document has the same VSM representation!
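
The last weak point is easy to check with a bag-of-words representation. The two sentences below are invented examples; whitespace tokenization is an assumption made for brevity.

```python
from collections import Counter

def bag_of_words(text):
    """Term-count representation: word order is discarded."""
    return Counter(text.lower().split())

a = bag_of_words("the dog bites the man")
b = bag_of_words("the man bites the dog")

# Same terms with the same counts, so the two VSM representations coincide.
same = a == b
```
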
  • goals 8/30
    • To introduce tools and techniques able to overcome these drawbacks
    • Random Indexing
      • Dimensionality reduction technique
      • Sahlgren, 2005
    • Semantic Vectors
      • Java open-source package
      • Widdows, 2007
  • random indexing 9/30
    • Random Indexing (RI) is an incremental and effective technique for dimensionality reduction
    • Introduced by Sahlgren in 2005
    • Based on the so-called “Distributional Hypothesis”
      • “Words that occur in the same context tend to have similar meanings”
      • “Meaning is its use” (Wittgenstein)
  • how does it work? 10/30
    • Random Indexing reduces the m-dimensional term/doc matrix to a new k-dimensional matrix
    • How?
      • By multiplying the original matrix with a random one, built in an incremental way
      • Formally: $A_{n,m} R_{m,k} = B_{n,k}$, with $k \ll m$
    • After projection, the distance between points in the vector space is approximately preserved
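
A minimal sketch of the projection A R = B in plain Python. The sizes `n`, `m`, `k`, the `nonzero_prob` parameter and the random input matrix are illustrative assumptions; R is generated in one go here for brevity rather than incrementally, and no rescaling of B is applied.

```python
import random

def random_ternary_matrix(m, k, nonzero_prob=0.1, seed=0):
    """m x k matrix with entries in {-1, 0, +1}; most entries are 0."""
    rng = random.Random(seed)

    def entry():
        r = rng.random()
        if r < nonzero_prob / 2:
            return 1
        if r < nonzero_prob:
            return -1
        return 0

    return [[entry() for _ in range(k)] for _ in range(m)]

def matmul(a, b):
    """Plain-Python product of an n x m matrix with an m x k matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

n, m, k = 4, 1000, 50          # 4 items, 1000 original features, 50 reduced ones
rng = random.Random(1)
A = [[rng.randint(0, 3) for _ in range(m)] for _ in range(n)]  # toy count matrix
R = random_ternary_matrix(m, k)
B = matmul(A, R)               # each item is now a k-dimensional vector
```

Each row of B replaces an m-dimensional item vector with a k-dimensional one, so later similarity computations cost O(k) instead of O(m).
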
  • random matrix 11/30
    • How is the random matrix built?
    • The whole process is based on the concept of “context”
      • Given a term, its “context” is the set of other words it co-occurs with
    • The matrix is built in an iterative and incremental way
      • The vector representing each document depends on the terms that occur in it
      • The vector representing each term depends on its context (the other terms it co-occurs with)
  • item representation 12/30
    • A context vector is assigned to each term. This vector has a fixed dimension (k) and can contain only values in {-1, 0, 1}. Values are distributed in a random way, but the number of non-zero elements is much smaller than k.
    • The Vector Space representation of a term is obtained by summing the context vectors of the terms it co-occurs with.
    • The Vector Space representation of a document (item) is obtained by summing the context vectors of the terms that occur in it
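
The three steps above can be sketched as follows. The values of `K` and `NONZERO`, the toy corpus, and the choice of "same document" as the co-occurrence context are assumptions for illustration. The random ternary vectors are what the slide calls context vectors (Sahlgren's own terminology calls them index vectors).

```python
import random

K = 20        # reduced dimensionality (assumed value)
NONZERO = 4   # non-zero entries per random vector (assumed value)

_rng = random.Random(42)
_random_vectors = {}

def random_vector(term):
    """Return the fixed sparse ternary vector of a term, creating it on first use."""
    if term not in _random_vectors:
        vec = [0] * K
        for pos in _rng.sample(range(K), NONZERO):
            vec[pos] = _rng.choice((-1, 1))
        _random_vectors[term] = vec
    return _random_vectors[term]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def document_vector(tokens):
    """Sum the random vectors of the terms that occur in the document."""
    total = [0] * K
    for term in tokens:
        total = add(total, random_vector(term))
    return total

def term_vector(term, corpus):
    """Sum the random vectors of the terms sharing a document with `term`."""
    total = [0] * K
    for tokens in corpus:
        if term in tokens:
            for other in tokens:
                if other != term:
                    total = add(total, random_vector(other))
    return total

corpus = [["space", "war", "robot"], ["love", "war"]]
doc_vectors = [document_vector(doc) for doc in corpus]
war_vector = term_vector("war", corpus)
```

Adding a new document only requires summing the vectors of its terms (creating new ones for unseen terms), which is what makes the approach incremental.
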
  • ...summing up 13/30
    • Random Indexing
      • Dimensionality reduction technique
        • Similar to LSA
      • Incremental
        • Tremendous saving of computational resources
      • Manages the semantics of documents
        • The position of a document (item) in the vector space depends on the position of the terms that occur in the document
        • The position of a term depends on the position of the other terms it co-occurs with!
  • recommendation models 14/30
    • We developed two different recommendation models
    • Both are based on a vector space built through Random Indexing
      • Random Indexing-based model (RI)
      • Semantic Vectors-based model (SV)
  • profile representation 15/30
    • What about the user profiles?
    • Assumption
      • The information coming from documents (items) that the user liked in the past could be a reliable source of information for building user profiles
      • The Vector Space representation of a user profile is obtained by combining the context vectors of all the documents that the user liked in the past.
    • Definition of RI-based and SV-based models
      • The difference lies in the way they exploit the vector space to build user profiles
  • RI-based approach 16/30
    [Formula figure: VSM representation of the RI-based profile for user u; labels: Documents, Rate, Threshold]
  • RI-based approach 17/30
    • The simplest user profile
    • Combines the information coming from previously liked documents in a uniform way
      • Different ratings are not managed!
    • Definition of a weighted counterpart, called W-RI
      • Weighted Random Indexing
  • W-RI-based approach 18/30
    [Formula figure: VSM representation of the W-RI-based profile for user u; labels: Documents, Rate, Threshold]
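
The profile formulas themselves did not survive in the transcript, so the following is only a plausible reading of the labels (documents, rating, threshold), not the author's exact definition. It assumes the RI profile sums the vectors of documents rated above a threshold, and that W-RI scales each vector by `rating / max_rating`; the strict `>` comparison, the function names and the toy data are all assumptions.

```python
def ri_profile(doc_vectors, ratings, threshold):
    """Unweighted profile: sum the vectors of documents rated above the threshold."""
    dim = len(next(iter(doc_vectors.values())))
    profile = [0.0] * dim
    for doc_id, rating in ratings.items():
        if rating > threshold:  # assumed: strictly above the threshold counts as "liked"
            profile = [p + x for p, x in zip(profile, doc_vectors[doc_id])]
    return profile

def w_ri_profile(doc_vectors, ratings, threshold, max_rating=5):
    """Weighted profile: like ri_profile, but each vector is scaled by rating / max_rating."""
    dim = len(next(iter(doc_vectors.values())))
    profile = [0.0] * dim
    for doc_id, rating in ratings.items():
        if rating > threshold:
            weight = rating / max_rating  # assumed normalization of the rating
            profile = [p + weight * x for p, x in zip(profile, doc_vectors[doc_id])]
    return profile

# Toy document vectors and ratings on a 1-5 scale (invented values).
doc_vectors = {"d1": [1, 0, -1], "d2": [0, 2, 0], "d3": [1, 1, 1]}
ratings = {"d1": 5, "d2": 4, "d3": 2}

p_ri = ri_profile(doc_vectors, ratings, threshold=3)
p_wri = w_ri_profile(doc_vectors, ratings, threshold=3)
```

In the weighted variant a document rated 5 pulls the profile harder than one rated 4, which is the distinction the unweighted profile ignores.
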
  • W-RI-based approach 19/30
    • Both models inherit a classical problem of VSM
      • User profiles are modeled only according to positive preferences
      • In classical text classifiers (Naive Bayes, SVM, etc.) both positive and negative preferences are modeled
    • Definition of a Semantic Vectors (SV) based model to tackle this problem
  • semantic vectors 20/30
    • Open-source package written in Java
    • Implements a Random Indexing-based approach for document indexing
    • Integrates a negation operator based on quantum mechanics
      • Queries such as “A NOT B” are allowed!
      • Projection of vector A onto the subspace orthogonal to the one generated by vector B
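
The projection described in the last bullet can be written directly: "A NOT B" is A minus its component along B. This is a standalone Python sketch of that operation, not a call into the Semantic Vectors package; the function names and example vectors are illustrative.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def negate(a, b):
    """Return 'a NOT b': a minus its projection onto b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)  # nothing to remove when b is the zero vector
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]

a = [3.0, 1.0, 0.0]
b = [1.0, 0.0, 0.0]
a_not_b = negate(a, b)  # component of a along b is removed
```

The result is orthogonal to B, so its cosine similarity with B (and with anything parallel to B) is zero.
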
  • SV-based approach 21/30
    [Formula figure: VSM representation of the SV-based profile for user u; labels: Positive User Profile Vector, Negative User Profile Vector]
  • W-SV-based approach 22/30
    [Formula figure: VSM representation of the W-SV-based profile for user u; labels: Positive User Profile Vector, Negative User Profile Vector]
  • recommendation step 23/30
    • Given a user profile u and a set of items, we can suppose that the most relevant items for u are the nearest ones in the vector space
    • RI and W-RI: submission of a query based on the user profile vector [formula not preserved in the transcript]
    • SV and W-SV: submission of a query based on the positive and negative profile vectors p+ and p- [formula not preserved in the transcript]
      • Returns the items with as many features as possible from p+ and as few as possible from p-
    • Cosine Similarity to rank the items
      • Items whose similarity is under a certain threshold are labeled as non-relevant and filtered
    • Recommendation of the items with the highest similarity
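
A minimal sketch of the ranking and filtering step, assuming the query vector has already been built from the profile. The `recommend` function, the `threshold` and `top_n` parameters, and the toy item vectors are hypothetical names and values chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(query, items, threshold, top_n):
    """Rank items by cosine similarity to the query and drop those under the threshold."""
    scored = [(cosine(query, vec), item_id) for item_id, vec in items.items()]
    relevant = [(score, item_id) for score, item_id in scored if score >= threshold]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in relevant[:top_n]]

# Toy item vectors and profile-derived query (invented values).
items = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [1.0, 1.0]}
query = [1.0, 0.0]

recommended = recommend(query, items, threshold=0.5, top_n=2)
```
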
  • experimental evaluation 24/30
    • MovieLens 100k dataset
      • Content-based information crawled from Wikipedia
      • Movies without a Wikipedia entry were deleted
      • 613 users, 520 items, 40k ratings
    • 5-fold cross validation
    • Average Precision @1, @3, @5, @7, @10
    • NLP processing: stopword elimination
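
The slide does not define "Average Precision @k". One common reading, assumed here, is precision at cutoff k averaged over users; the sketch below follows that reading and uses invented rankings and relevance sets.

```python
def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k ranked items that are relevant."""
    top_k = ranked_items[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / k

def mean_precision_at_k(rankings, relevant, k):
    """Average precision@k over all users (assumed meaning of 'Average Precision @k')."""
    scores = [precision_at_k(rankings[u], relevant[u], k) for u in rankings]
    return sum(scores) / len(scores)

# Invented example: per-user ranked recommendations and liked items.
rankings = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z"]}
relevant = {"u1": {"a", "c"}, "u2": {"y"}}

avp_at_1 = mean_precision_at_k(rankings, relevant, 1)
avp_at_3 = mean_precision_at_k(rankings, relevant, 3)
```

With 5-fold cross validation, this value would be computed on each held-out fold and then averaged across the five folds.
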
  • experimental design 25/30
    • Experiment 1
      • Does the weighting scheme improve the predictive accuracy of the recommendation models?
    • Experiment 2
      • Does the introduction of a negation operator improve the predictive accuracy of the recommendation models?
  • results - experiment 1 26/30
    [Figure: two bar charts of average precision (AVP@1, AVP@5, AVP@10), comparing RI with W-RI and SV with W-SV]
    • Our weighting model (even in this naive form) improves the predictive accuracy of both RI-based and SV-based models
  • results - experiment 2 27/30
    [Figure: two bar charts of average precision (AVP@1, AVP@5, AVP@10), comparing RI with SV and W-RI with W-SV]
    • The integration of a negation operator based on quantum mechanics improves the predictive accuracy in both the unweighted (SV vs. RI) and the weighted (W-SV vs. W-RI) setting
  • results 28/30
                        RI     W-RI   SV     W-SV   Bayes
    Av-Precision@1      85.93  86.33  85.97  86.78  86.39
    Av-Precision@3      85.78  85.97  86.19  86.33  85.97
    Av-Precision@5      85.75  86.10  85.99  86.16  85.83
    Av-Precision@7      85.61  85.92  85.88  85.95  85.77
    Av-Precision@10     85.45  85.76  85.76  85.83  85.75
    • W-SV improves the Average Precision with respect to the Naive Bayes approach (currently implemented in our recommender system) at every cutoff; SV does so from @3 onwards and W-RI from @5 onwards (tying at @3), while unweighted RI stays slightly below Bayes
  • conclusions 29/30
    • Investigation of the impact of enhanced VSM in the area of content-based recommender systems
      • Use of Random Indexing for dimensionality reduction
      • Definition of RI-based and SV-based models
    • Encouraging experimental results
      • First results improve the predictive accuracy obtained by classical content-based filtering techniques (e.g. Bayes)
  • open issues & future works 30/30
    • Work-in-progress
      • Experimental Evaluation on a classical TF/IDF-based VSM
    • Open Issues
      • Looking for a state-of-the-art dataset for the evaluation of content-based recommendation models
    • Future Work
      • Comparison of the predictive accuracy with different NLP steps (stemming, entity recognition, POS-tagging and so on)
      • Integration of Social Media (Facebook, Twitter, LinkedIn) for building accurate user profiles by skipping the training step
      • Integration of Linked Data-based representation (by exploiting DBpedia data) to exploit explicit relationships between concepts
  • http://www.di.uniba.it/~swap/ discussion Cataldo Musto - cataldomusto@di.uniba.it University of Bari (Italy), SWAP Research Group ACM Recsys 2010 Doctoral Symposium