Random Indexing for Content-based Recommender Systems

Presentation for IIR 2011 - Italian Information Retrieval Workshop (Milano, Italy, 28.01.11)

    1. IIR 2011 - Italian Information Retrieval Workshop, Milano, Italy
       Random Indexing for Content-based Recommender Systems
       Cataldo Musto - cataldomusto@di.uniba.it
       Pasquale Lops, Marco de Gemmis, Giovanni Semeraro
       University of Bari “Aldo Moro” (Italy), SWAP Research Group
       28.01.11
    2. outline
       • Introduction
         • Analysis of Vector Space Models
         • Content-based Recommender Systems
       • Random Indexing for Content-based Recommender Systems
         • Introducing Random Indexing
         • Recommendation models
       • Experimental Evaluation
       • Open Issues
       • Future Work
       C.Musto, P.Lops, M.de Gemmis, G.Semeraro: Random Indexing for Content-based Recommender Systems - IIR 2011 Workshop - Milano, Italy - 28.01.11
    3. vector space model
       • Weak points
         • High dimensionality
         • Not incremental
         • Does not capture the latent semantics of documents
         • Does not handle negative preferences
    4. recommender systems
       • A specific type of Information Filtering system that attempts to recommend information items (films, television, video on demand, music, books, etc.) that are likely to be of interest to the user
       • Content-based Recommender Systems
         • The degree of interest is inferred by comparing the textual features extracted from the item with the features stored in the user profile
    5. goals
       • To investigate the impact of VSM in the area of content-based recommender systems
       • To introduce techniques able to overcome typical VSM issues
         • Random Indexing
           • Dimensionality reduction technique (Sahlgren, 2005)
         • Negation Operator
           • Based on Quantum Logic (Widdows, 2007)
    6. random indexing
       • Random Indexing (RI) is an incremental and effective technique for dimensionality reduction
       • Introduced by Sahlgren in 2005
       • Based on the so-called “Distributional Hypothesis”
         • “Words that occur in the same context tend to have similar meanings”
         • “Meaning is its use” (Wittgenstein)
    7. how does it work?
       • Random Indexing reduces the m-dimensional term/doc matrix to a new k-dimensional matrix
       • How?
         • By multiplying the original matrix with a random one, built in an incremental way
         • Formally: $A_{n,m} R_{m,k} = B_{n,k}$, with $k \ll m$
       • After projection, the distances between points in the vector space are approximately preserved
    8. building the matrix
       • A context vector is assigned to each term. This vector has a fixed dimension (k) and can contain only values in {-1, 0, 1}. Values are distributed randomly, but the number of non-zero elements is much smaller than k.
       • The Vector Space representation of a term is obtained by summing the context vectors of the terms it co-occurs with.
       • The Vector Space representation of a document (item) is obtained by summing the context vectors of the terms that occur in it.
    9. profile representation
       • What about the user profiles?
       • Assumption
         • The documents (items) that the user liked in the past could be a reliable source of information for building user profiles
       • The Vector Space representation of a user profile is obtained by combining the vector space representations of all the documents that the user liked in the past.
    10. RI-based approach
       • [Figure: VSM representation of the RI-based profile for user u; labels: Documents, Rating, Threshold]
    11. wRI-based approach
       • [Figure labels: Documents, Rating, Threshold]
       • Higher weight given to the documents with higher rating
    12. negation operator
       • Both models inherit a classical problem of VSM
         • User profiles are modeled only according to positive preferences
         • In classical text classifiers (Naive Bayes, SVM, etc.) both positive and negative preferences are modeled
       • Introduction of a Negation Operator based on Quantum Logic to tackle this problem
         • Queries such as “A NOT B” are allowed
         • Projection of vector A onto the subspace orthogonal to the one generated by vector B
         • Implemented in the Semantic Vectors* open-source package
       (*) http://code.google.com/p/semanticvectors/
    13. SV-based approach
       • [Figure: VSM representation of the SV-based profile for user u; labels: Positive User Profile Vector, Negative User Profile Vector]
    14. wSV-based approach
       • [Figure: VSM representation of the wSV-based profile for user u; labels: Positive User Profile Vector, Negative User Profile Vector]
    15. recommendation step
       • Given a user profile u and a set of items, we can suppose that the most relevant items for u are the nearest ones in the vector space
       • RI and wRI: submission of a query based on the user profile vector
       • SV and wSV: submission of a query of the form p+ NOT p-, built from the positive (p+) and negative (p-) user profile vectors
         • Returns the items with as many features as possible from p+ and as few features as possible from p-
       • Cosine similarity is used to rank the items
       • Items whose similarity is below a certain threshold are labeled as non-relevant and filtered out
       • The items with the highest similarity w.r.t. the profile (in which the liked documents are combined) are recommended
    16. experimental design
       • Dataset
         • Based on MovieLens, enriched with content crawled from Wikipedia
         • 613 users, 520 items, 25k terms, 40k ratings
       • Experiment 1
         • Does the weighting scheme improve the predictive accuracy of the recommendation models?
       • Experiment 2
         • Does the introduction of a negation operator improve the predictive accuracy of the recommendation models?
    17. results
                          RI      W-RI    SV      W-SV    Bayes
       Av-Precision@1     85.93   86.33   85.97   86.78   86.39
       Av-Precision@3     85.78   85.97   86.19   86.33   85.97
       Av-Precision@5     85.75   86.10   85.99   86.16   85.83
       Av-Precision@7     85.61   85.92   85.88   85.95   85.77
       Av-Precision@10    85.45   85.76   85.76   85.83   85.75
       • W-SV improves the Average Precision with respect to the Naive Bayes approach (currently implemented in our recommender system) at every cut-off; SV improves on it from @3 onward and W-RI from @5 onward (matching it at @3), while plain RI stays slightly below it
    18. conclusions
       • Investigation of the impact of Random Indexing in the area of content-based recommender systems
         • Use of Random Indexing for dimensionality reduction
         • Introduction of a Negation Operator based on Quantum Logic
       • Encouraging experimental results
         • First results improve the predictive accuracy obtained by classical content-based filtering techniques (e.g. Bayes)
       • Work in progress
         • To compare results with classical TF-IDF-based VSM, LSA, Rocchio and so on
    19. discussion
       Thanks for your attention
       Cataldo Musto - cataldomusto@di.uniba.it
       University of Bari (Italy), SWAP Research Group
       IIR 2011 - Italian Information Retrieval Workshop
       http://www.di.uniba.it/~swap/
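The projection on slide 7 ($A_{n,m} R_{m,k} = B_{n,k}$) can be illustrated with a small pure-Python sketch. It builds a sparse random matrix R whose rows are ternary vectors (values in {-1, 0, 1}, few non-zeros, as described on slide 8) and computes B = A R. The function names, the sizes, the number of non-zero entries and the 1/sqrt(nonzeros) rescaling are assumptions made for illustration; the rescaling only puts projected distances on the same scale as the original ones. An actual Random Indexing implementation never materializes A; this shows the formal view only.

```python
import math
import random


def random_index_vector(k, nonzeros, rng):
    """Sparse ternary vector of length k with `nonzeros` entries set to +1 or -1."""
    vec = [0] * k
    for pos in rng.sample(range(k), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec


def random_matrix(m, k, nonzeros, seed=0):
    """R (m x k): one sparse random row per original dimension."""
    rng = random.Random(seed)
    return [random_index_vector(k, nonzeros, rng) for _ in range(m)]


def project(A, R):
    """Compute B = A R, i.e. A_{n,m} R_{m,k} = B_{n,k}, skipping zero cells."""
    k = len(R[0])
    B = []
    for row in A:
        out = [0.0] * k
        for i, a in enumerate(row):
            if a == 0:
                continue  # term/doc matrices are sparse: skip empty cells
            for j, r in enumerate(R[i]):
                if r != 0:
                    out[j] += a * r
        B.append(out)
    return B


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    n, m, k, nonzeros = 3, 2000, 300, 10
    rng = random.Random(1)
    # Hypothetical sparse rows: roughly 5% of the cells hold a small count.
    A = [[rng.randint(1, 3) if rng.random() < 0.05 else 0 for _ in range(m)]
         for _ in range(n)]
    R = random_matrix(m, k, nonzeros)
    B = project(A, R)
    scale = 1 / math.sqrt(nonzeros)  # expected squared norm grows by `nonzeros`
    print(f"distance before: {euclidean(A[0], A[1]):.2f}, "
          f"after projection: {euclidean(B[0], B[1]) * scale:.2f}")
```

The two printed distances are typically close but not identical: the preservation is only approximate and improves as k grows.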
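Slide 8 can be sketched as follows. Each term lazily receives a random ternary context vector, so new terms and documents can be added incrementally. As a simplifying assumption, the whole document serves as the co-occurrence context (RI setups often use a sliding window instead), and tokenization is a plain lower-case split. The function name `build_ri_vectors` and all parameter values are hypothetical.

```python
import random


def build_ri_vectors(docs, k=100, nonzeros=4, seed=42):
    """Return (context_vectors, term_vectors, doc_vectors) for a list of texts."""
    rng = random.Random(seed)
    context = {}  # term -> random ternary context vector (values in {-1, 0, 1})

    def context_vector(term):
        if term not in context:  # created on first sight: the process is incremental
            vec = [0] * k
            for pos in rng.sample(range(k), nonzeros):
                vec[pos] = rng.choice((-1, 1))
            context[term] = vec
        return context[term]

    term_vectors = {}
    doc_vectors = []
    for doc in docs:
        terms = doc.lower().split()
        for t in terms:
            context_vector(t)
        # Document vector: sum of the context vectors of the terms occurring in it.
        dvec = [0] * k
        for t in terms:
            dvec = [a + b for a, b in zip(dvec, context[t])]
        doc_vectors.append(dvec)
        # Term vector: sum of the context vectors of the terms it co-occurs with.
        for t in terms:
            tvec = term_vectors.setdefault(t, [0] * k)
            for other in terms:
                if other != t:
                    for j, v in enumerate(context[other]):
                        tvec[j] += v
    return context, term_vectors, doc_vectors
```

For example, with the two documents "random indexing reduces dimensionality" and "random projection preserves distances", the vector of "random" accumulates the context vectors of all six other terms, since it co-occurs with each of them once.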
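Slides 9 to 11 build the profile from the vectors of the documents a user liked. The slides do not give the exact formulas, so the sketch below assumes that "liked" means a rating at or above a threshold and, for wRI, that each liked document is weighted by its raw rating. Both rules and the function names are illustrative assumptions.

```python
def add_scaled(acc, vec, weight):
    """Return acc + weight * vec, element-wise."""
    return [a + weight * v for a, v in zip(acc, vec)]


def ri_profile(doc_vectors, ratings, threshold):
    """RI: plain sum of the vectors of the documents rated >= threshold (assumed rule)."""
    k = len(next(iter(doc_vectors.values())))
    profile = [0.0] * k
    for doc_id, rating in ratings.items():
        if rating >= threshold:
            profile = add_scaled(profile, doc_vectors[doc_id], 1.0)
    return profile


def wri_profile(doc_vectors, ratings, threshold):
    """wRI: like RI, but each liked document is scaled by its rating (assumed weight)."""
    k = len(next(iter(doc_vectors.values())))
    profile = [0.0] * k
    for doc_id, rating in ratings.items():
        if rating >= threshold:
            profile = add_scaled(profile, doc_vectors[doc_id], float(rating))
    return profile


if __name__ == "__main__":
    # Hypothetical 3-dimensional document vectors and 1-5 ratings for one user.
    docs = {"d1": [1, 0, 1], "d2": [0, 1, 0], "d3": [1, 1, 0]}
    ratings = {"d1": 5, "d2": 2, "d3": 4}
    print(ri_profile(docs, ratings, threshold=4))   # [2.0, 1.0, 1.0]
    print(wri_profile(docs, ratings, threshold=4))  # [9.0, 4.0, 5.0]
```

The only difference between the two models is the weight passed to `add_scaled`, which matches the slide 11 remark that documents with higher ratings get a higher weight.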
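The negation operator on slide 12 projects A onto the orthogonal complement of the subspace spanned by the negated vectors. For a single vector b this reduces to a - (a·b / b·b) b; with several negated vectors, they are first orthonormalized (Gram-Schmidt here) and each component is removed in turn. This is a plain-Python sketch of the idea, not the Semantic Vectors implementation; the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt: orthonormal basis of span(vectors), dropping dependent ones."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def negate(a, negatives):
    """a NOT b1 NOT b2 ...: projection of a orthogonal to span(negatives)."""
    result = list(a)
    for b in orthonormal_basis(negatives):
        c = dot(result, b)
        result = [ri - c * bi for ri, bi in zip(result, b)]
    return result


if __name__ == "__main__":
    print(negate([1, 1, 0], [[1, 0, 0]]))  # [0.0, 1.0, 0.0]
```

The result has zero similarity with every negated vector, which is what lets a query keep the features of A while discarding those of B.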
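For slides 13 and 14, a hedged sketch: documents rated at or above a threshold feed the positive profile p+, the remaining rated documents feed the negative profile p-, and the query is p+ NOT p- (the single-vector case of the negation operator). The wSV weights used here (the rating for liked documents, max_rating + 1 - rating for disliked ones) are a hypothetical choice, as are the function names; the slides only show that both vectors are built.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sv_profiles(doc_vectors, ratings, threshold, weighted=False, max_rating=5):
    """Return (p_pos, p_neg). weighted=False ~ SV, weighted=True ~ wSV (assumed weights)."""
    k = len(next(iter(doc_vectors.values())))
    p_pos, p_neg = [0.0] * k, [0.0] * k
    for doc_id, rating in ratings.items():
        vec = doc_vectors[doc_id]
        if rating >= threshold:
            w = float(rating) if weighted else 1.0
            p_pos = [p + w * v for p, v in zip(p_pos, vec)]
        else:
            # Hypothetical: the lower the rating, the stronger the negative evidence.
            w = float(max_rating + 1 - rating) if weighted else 1.0
            p_neg = [p + w * v for p, v in zip(p_neg, vec)]
    return p_pos, p_neg


def not_query(p_pos, p_neg):
    """p+ NOT p-: remove from p+ its component along p-."""
    nn = dot(p_neg, p_neg)
    if nn == 0:
        return list(p_pos)  # no negative evidence: the query is just p+
    c = dot(p_pos, p_neg) / nn
    return [p - c * n for p, n in zip(p_pos, p_neg)]
```

With a liked document [1, 1, 0] and a disliked document [1, 0, 0], the query becomes [0.0, 1.0, 0.0]: the feature shared with the disliked item is removed.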
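The recommendation step on slide 15 ranks items by cosine similarity to the query vector and filters out those under a threshold. In the sketch below, the threshold value, the `top_n` cut-off and the function names are assumptions; the query can be an RI/wRI profile or a p+ NOT p- vector.

```python
import math


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0  # treat empty vectors as unrelated
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def recommend(query, item_vectors, threshold=0.0, top_n=10):
    """Return up to top_n (item, similarity) pairs, most similar first."""
    scored = ((item, cosine(query, vec)) for item, vec in item_vectors.items())
    # Items below the threshold are labeled non-relevant and dropped.
    relevant = [(item, s) for item, s in scored if s >= threshold]
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return relevant[:top_n]
```

Cosine similarity ignores vector length, so a profile built from many documents is not favored over one built from a few.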
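The results on slide 17 are reported as Av-Precision@k, but the slides do not define the metric. The sketch below reads it as precision@k (the share of the top-k recommended items that the user actually found relevant) averaged over users; another common reading is ranked-retrieval Average Precision truncated at k. That interpretation, the handling of lists shorter than k, and the function names are all assumptions.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the first k ranked items that are in `relevant`."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for item in top if item in relevant) / len(top)


def average_precision_at_k(runs, k):
    """Mean of precision@k over users; `runs` holds one (ranked, relevant) pair per user."""
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)
```

When a user has fewer than k ranked items, this sketch divides by the number of items actually returned rather than by k.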
