Presentation for IIR 2011 - Italian Information Retrieval Workshop (Milano, Italy, 28.01.11)

Published in:
Education


- 1. Random Indexing for Content-based Recommender Systems • IIR 2011 - Italian Information Retrieval Workshop, Milano, Italy, 28.01.11 • Cataldo Musto (cataldomusto@di.uniba.it), Pasquale Lops, Marco de Gemmis, Giovanni Semeraro • University of Bari “Aldo Moro” (Italy), SWAP Research Group
- 2. outline • Introduction • Analysis of Vector Space Models • Content-based Recommender Systems • Random Indexing for Content-based Recommender Systems • Introducing Random Indexing • Recommendation models • Experimental Evaluation • Open Issues • Future Work [Slide footer: C. Musto, P. Lops, M. de Gemmis, G. Semeraro: Random Indexing for Content-based Recommender Systems - IIR 2011 Workshop - Milano, Italy - 28.01.11]
- 3. vector space model • Weak Points • High dimensionality • Not incremental • Does not manage the latent semantics of documents • Does not manage negative preferences
- 4. recommender systems • A specific type of Information Filtering system that attempts to recommend information items (films, television, video on demand, music, books, etc.) that are likely to be of interest to the user • Content-based Recommender Systems • The degree of interest is inferred by comparing the textual features extracted from the item with the features stored in the user profile
- 5. goals • To investigate the impact of VSM in the area of content-based recommender systems • To introduce techniques able to overcome typical VSM issues • Random Indexing • Dimensionality reduction technique (Sahlgren, 2005) • Negation Operator • Based on Quantum Logic (Widdows, 2007)
- 6. random indexing • Random Indexing (RI) is an incremental and effective technique for dimensionality reduction • Introduced by Sahlgren in 2005 • Based on the so-called “Distributional Hypothesis” • “Words that occur in the same context tend to have similar meanings” • “Meaning is its use” (Wittgenstein)
- 7. how does it work? • Random Indexing reduces the m-dimensional term/doc matrix to a new k-dimensional matrix • How? By multiplying the original matrix with a random one, built in an incremental way • Formally: $A_{n,m} R_{m,k} = B_{n,k}$, with $k \ll m$ • After projection, the distance between points in the vector space is approximately preserved
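
The projection on this slide can be tried out numerically. The following is a minimal sketch, not the authors' implementation: it assumes a dense Gaussian random matrix (the slides use sparse ternary vectors instead), and the names `random_projection_matrix`, `project` and `euclidean` are hypothetical helpers introduced only for illustration. It computes $B = A R$ and compares one pairwise distance before and after the projection.

```python
import math
import random

def random_projection_matrix(m, k, seed=0):
    """Build an m x k matrix with entries drawn from N(0, 1/k) (assumed choice)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(m)]

def project(A, R):
    """Compute B = A R, where A is n x m and R is m x k."""
    k = len(R[0])
    B = []
    for row in A:
        out = [0.0] * k
        for j, a in enumerate(row):
            if a != 0.0:
                r = R[j]
                for c in range(k):
                    out[c] += a * r[c]
        B.append(out)
    return B

def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Toy data: n documents described by m features, reduced to k dimensions.
rng = random.Random(1)
n, m, k = 5, 1000, 200
A = [[rng.random() for _ in range(m)] for _ in range(n)]
B = project(A, random_projection_matrix(m, k))

# Ratio between the projected and the original distance of two documents.
ratio = euclidean(B[0], B[1]) / euclidean(A[0], A[1])
print(f"distance ratio after projection: {ratio:.3f}")
```

The ratio should stay close to 1 and tends to get closer as k grows, which is the sense in which distances survive the projection (the Johnson-Lindenstrauss lemma gives the formal bound).
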
- 8. building the matrix • A context vector is assigned to each term. This vector has a fixed dimension (k) and it can contain only values in {-1, 0, 1}. Values are distributed in a random way, but the number of non-zero elements is much smaller than the number of zeros. • The Vector Space representation of a term is obtained by summing the context vectors of the terms it co-occurs with. • The Vector Space representation of a document (item) is obtained by summing the context vectors of the terms that occur in it
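
The construction described on this slide can be sketched as follows. This is an illustrative sketch under stated assumptions: the dimension `k=100`, the number of non-zero entries `nonzero=4`, and the choice of "same document" as the co-occurrence window are not given in the slides, and `context_vector`, `add_into` and `build_vectors` are hypothetical names.

```python
import random

def context_vector(k, nonzero, rng):
    """Sparse ternary vector: `nonzero` entries are +1 or -1, the rest are 0."""
    vec = [0] * k
    for pos in rng.sample(range(k), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec

def add_into(target, vec):
    """Add vec to target in place."""
    for i, x in enumerate(vec):
        target[i] += x

def build_vectors(docs, k=100, nonzero=4, seed=0):
    """Return (context vectors, term vectors, document vectors)."""
    rng = random.Random(seed)
    context = {}
    for doc in docs:
        for term in doc:
            if term not in context:
                context[term] = context_vector(k, nonzero, rng)
    term_vecs = {term: [0] * k for term in context}
    doc_vecs = []
    for doc in docs:
        d = [0] * k
        for term in doc:
            # Document vector: sum of the context vectors of its terms.
            add_into(d, context[term])
            # Term vector: sum of the context vectors of co-occurring terms
            # (assumption: two terms co-occur when they share a document).
            for other in doc:
                if other != term:
                    add_into(term_vecs[term], context[other])
        doc_vecs.append(d)
    return context, term_vecs, doc_vecs

docs = [["space", "alien", "ship"], ["alien", "invasion"], ["romance", "paris"]]
context, term_vecs, doc_vecs = build_vectors(docs)
```

Because every update is a plain vector addition, a new document can be folded in without recomputing anything already built, which is what makes the technique incremental.
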
- 9. profile representation • What about the user profiles? • Assumption • The information coming from documents (items) that the user liked in the past could be a reliable source of information for building user profiles • The Vector Space representation of a user profile is obtained by combining the context vectors of all the documents that the user liked in the past.
- 10. RI-based approach • VSM representation of the RI-based profile for user u [formula not recovered; labels: Documents, Rating Threshold]
- 11. wRI-based approach • Higher weight given to the documents with higher rating [formula not recovered; labels: Documents, Rating Threshold]
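
The two profile models can be sketched in a few lines. The slide formulas are not available in this transcript, so the details here are assumptions: a document counts as liked when its rating is strictly above the threshold, and the weighted variant scales each liked document by `rating / max_rating`. The function names are hypothetical.

```python
def add_scaled(target, vec, weight=1.0):
    """Add weight * vec to target in place."""
    for i, x in enumerate(vec):
        target[i] += weight * x

def ri_profile(doc_vecs, ratings, threshold):
    """RI-style profile: sum of the vectors of the liked documents."""
    k = len(next(iter(doc_vecs.values())))
    profile = [0.0] * k
    for doc_id, rating in ratings.items():
        if rating > threshold:  # assumed: strictly above the threshold
            add_scaled(profile, doc_vecs[doc_id])
    return profile

def wri_profile(doc_vecs, ratings, threshold, max_rating):
    """wRI-style profile: liked documents weighted by rating / max_rating (assumed)."""
    k = len(next(iter(doc_vecs.values())))
    profile = [0.0] * k
    for doc_id, rating in ratings.items():
        if rating > threshold:
            add_scaled(profile, doc_vecs[doc_id], rating / max_rating)
    return profile

doc_vecs = {"d1": [1, 0, -1], "d2": [0, 1, 1], "d3": [1, 1, 0]}
ratings = {"d1": 5, "d2": 4, "d3": 2}

p_ri = ri_profile(doc_vecs, ratings, threshold=3)
p_wri = wri_profile(doc_vecs, ratings, threshold=3, max_rating=5)
```

In this toy example `d3` is ignored by both models, while `d1` contributes more than `d2` to the weighted profile because it received the higher rating.
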
- 12. negation operator • Both models inherit a classical problem of VSM • User profiles are modeled only according to positive preferences • In classical text classifiers (Naive Bayes, SVM, etc.) both positive and negative preferences are modeled • Introduction of a Negation Operator based on Quantum Logic to tackle this problem • Queries such as “A NOT B” are allowed • Projection of vector A onto the subspace orthogonal to the one generated by vector B • Implemented in the Semantic Vectors open-source package (http://code.google.com/p/semanticvectors/)
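
For a single negated vector, the projection described on this slide reduces to subtracting from A its component along B. The sketch below shows that case only; it is not the Semantic Vectors API, and `dot` and `negate` are hypothetical helper names.

```python
def dot(u, v):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(u, v))

def negate(a, b):
    """Return "a NOT b": a projected onto the subspace orthogonal to b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)  # nothing to remove when b is the zero vector
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]

a = [3.0, 4.0, 0.0]
b = [1.0, 0.0, 0.0]
result = negate(a, b)
print(result)  # [0.0, 4.0, 0.0]
```

The result has zero dot product with `b`, so an item that looks only like `b` gets zero cosine similarity to the query. Negating several vectors at once would require orthogonalising them first, which this sketch does not cover.
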
- 13. SV-based approach • VSM representation of the SV-based profile for user u [formula not recovered; labels: Positive User Profile Vector, Negative User Profile Vector]
- 14. wSV-based approach • VSM representation of the wSV-based profile for user u [formula not recovered; labels: Positive User Profile Vector, Negative User Profile Vector]
- 15. recommendation step • Given a user profile for u and a set of items, we can suppose that the most relevant items for u are the nearest ones in the vector space • RI and wRI: Submission of a query based on p+ • SV and wSV: Submission of a query based on p+ NOT p- • Returns the items with as many features as possible from p+ and as few as possible from p- • Cosine Similarity to rank the items • Items whose similarity is under a certain threshold are labeled as non-relevant and filtered • Recommendation of the items with the highest similarity w.r.t. the user profile
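
The ranking step can be sketched as below, assuming item vectors and profile vectors live in the same space. The threshold value, the `top_n` cut-off and the helper names (`cosine`, `negate`, `recommend`) are illustrative assumptions, not values or names from the slides.

```python
import math

def dot(u, v):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot(u, v) / (nu * nv)

def negate(a, b):
    """Return "a NOT b": a projected onto the subspace orthogonal to b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]

def recommend(query, items, threshold=0.0, top_n=3):
    """Rank items by cosine similarity to the query, dropping those under the threshold."""
    scored = [(cosine(query, vec), item_id) for item_id, vec in items.items()]
    relevant = [pair for pair in scored if pair[0] >= threshold]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in relevant[:top_n]]

items = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [1.0, 1.0, 0.0],
    "d": [-1.0, 0.0, 0.0],
}
p_pos = [1.0, 1.0, 0.0]  # positive profile (toy values)
p_neg = [0.0, 1.0, 0.0]  # negative profile (toy values)

ri_style = recommend(p_pos, items, threshold=0.5)                 # query on p+
sv_style = recommend(negate(p_pos, p_neg), items, threshold=0.5)  # query on p+ NOT p-
```

With these toy vectors the p+ query ranks `c` first and keeps `b`, whereas the p+ NOT p- query drops `b`, the item that matches only the negative profile.
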
- 16. experimental design • Dataset • Based on MovieLens, enriched with contents crawled from Wikipedia • 613 users, 520 items, 25k terms, 40k ratings • Experiment 1 • Does the weighting scheme improve the predictive accuracy of the recommendation models? • Experiment 2 • Does the introduction of a negation operator improve the predictive accuracy of the recommendation models?
- 17. results

  | Metric | RI | W-RI | SV | W-SV | Bayes |
  |---|---|---|---|---|---|
  | Av-Precision@1 | 85.93 | 86.33 | 85.97 | 86.78 | 86.39 |
  | Av-Precision@3 | 85.78 | 85.97 | 86.19 | 86.33 | 85.97 |
  | Av-Precision@5 | 85.75 | 86.10 | 85.99 | 86.16 | 85.83 |
  | Av-Precision@7 | 85.61 | 85.92 | 85.88 | 85.95 | 85.77 |
  | Av-Precision@10 | 85.45 | 85.76 | 85.76 | 85.83 | 85.75 |

  • W-SV improves the Average Precision with respect to the Naive Bayes approach (currently implemented in our recommender system) at every cutoff; SV and W-RI improve on it at most cutoffs, while unweighted RI stays slightly below it
- 18. conclusions • Investigation of the impact of Random Indexing in the area of content-based recommender systems • Use of Random Indexing for dimensionality reduction • Introduction of a Negation Operator based on Quantum Logic • Encouraging experimental results • First results improve the predictive accuracy obtained by classical content-based filtering techniques (e.g. Bayes) • Work in progress • To compare results with classical TF/IDF-based VSM, LSA, Rocchio and so on
- 19. discussion • Thanks for your attention • http://www.di.uniba.it/~swap/ • Cataldo Musto - cataldomusto@di.uniba.it • University of Bari (Italy), SWAP Research Group • IIR 2011 - Italian Information Retrieval Workshop
