Ph.D. Defense - Enhanced Vector Space Models for Content-based Recommender Systems
08.06.12

PhD defense: Enhanced Vector Space Models for Content-based Recommender Systems.

Presentation Transcript

  • Università degli Studi di Bari ‘Aldo Moro’ - Ph.D. Program in Computer Science (Dottorato di Ricerca in Informatica), Cycle XXIV - Enhanced Vector Space Models for Content-based Recommender Systems - Cataldo Musto, Ph.D. Candidate - Supervisor: Prof. Giovanni Semeraro - 08.06.12
  • what will we talk about in the next 40 minutes? (Cataldo Musto - Enhanced Vector Space Models for Content-based Recommender Systems - Ph.D. defense - University of Bari Aldo Moro, Italy - 08.06.12)
  • life is all a matter of decisions
  • decision-making is actually challenging
  • we need to hold as much knowledge as possible
  • Leibniz: “In things which are absolutely indifferent there can be no choice and consequently no option or will.”
  • information age: knowledge is spread through the Web
  • social media changed the rules for information management and knowledge acquisition
  • exponential growth of the available information
  • it is physiologically impossible to follow the information flow in real time
  • how much information?
  • every day we interact with 393 bits of information per second
  • the human brain can absorb 126 bits of information per second
  • we can handle 126 bits of information; we deal with 393 bits of information; ratio: more than 3x; consequence: Information Overload (Source: Adrian C. Ott, The 24-Hour Customer)
  • Information Overload
  • paradox of choice (Barry Schwartz, TED talk “Why more is less”)
  • Buridan’s ass paradox: Two alternatives. The ass cannot decide. It starves.
  • Is information overload actually unbearable?
  • “It is not information overload. It is filter failure” (Clay Shirky, talk @ Web 2.0 Expo)
  • Solution: we need to improve the techniques for filtering information
  • Information Filtering (IF): “To expose users only to the information that is relevant for them, thus avoiding information overload.” To filter, as kids do when they play with sand.
  • IF applications. Example: Recommender Systems. Relevant items (movies, news, books, etc.) are pushed to the user according to her needs.
  • Recommender Systems are an effective way to face the Information Overload problem
  • example: Amazon.com Recommendations
  • Information Retrieval (IR): “Finding relevant pieces of information in a collection of (usually unstructured) data”
  • IR applications. Example: Search Engines. Relevant documents are returned to the user according to her query.
  • IR vs. IF
      • IR and IF are two closely related research areas
      • Same goal: to optimize and ease access to (unstructured) data sources
      • “Two sides of the same coin” (*)
      (*) N. Belkin, W. Croft: “Information Filtering and Information Retrieval: Two Sides of the Same Coin”, Communications of the ACM, Volume 35, Issue 12, pp. 29-38, 1992
  • IR vs. IF: differences
      • Small differences
      • Representation of user needs
      • Query in IR, user profile in IF
      • Convergence between IR and IF
      • Personalized Search!
  • Ph.D. dissertation: Research Question. Is it possible to exploit the convergence between IR and IF to introduce a recommendation framework based on IR techniques?
  • outline.
  • outline (1/2)
      • recommender systems
      • content-based recommender systems (CBRS)
      • vector space models
      • VSM for CBRS
      • strengths and weaknesses
  • outline (2/2)
      • eVSM: enhanced vector space models
      • semantics in VSMs
      • dimensionality reduction in VSMs
      • modeling negation in VSMs
      • applications and experimental evaluation
      • movie recommendation
      • Philips TV-guides personalization
  • recommender systems.
  • definition: Recommender Systems have the goal of guiding users in a personalized way to interesting or useful objects in a large space of possible options. Burke, 2002 (*)
      (*) Robin D. Burke: Hybrid Recommender Systems: Survey and Experiments. UMUAI, volume 12, issue 4, 331-370 (2002)
  • suggestions
      • Examples: books or news to read; music to listen to; movies worth watching; restaurants, etc.
  • Some maths (1/2)
      • Let
      • U: set of users
      • I: set of items
      • Given
      • user u ∈ U
      • item i ∈ I
  • Some maths (2/2)
      • A recommender system should predict how relevant item i is for user u by defining a scoring function
      • f: U×I→[0,1]
      • The items with the highest value of f are labeled as relevant and returned to the user
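The scoring-function view of recommendation can be sketched in a few lines. This is only an illustration: the user name, the item names, the scores in `TOY_SCORES` and the helper `toy_f` are hypothetical stand-ins for whatever function f a real system would learn.

```python
from typing import Callable, Iterable, List, Tuple


def recommend(user: str,
              items: Iterable[str],
              f: Callable[[str, str], float],
              top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank items for one user with a scoring function f: U x I -> [0, 1]."""
    scored = [(item, f(user, item)) for item in items]
    # Highest score first; ties are broken alphabetically for reproducibility.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Hypothetical, hand-written relevance scores standing in for a learned f.
TOY_SCORES = {
    ("alice", "football news"): 0.9,
    ("alice", "sports news"): 0.8,
    ("alice", "politics news"): 0.1,
}


def toy_f(user: str, item: str) -> float:
    """Look up the toy score; unknown (user, item) pairs score 0."""
    return TOY_SCORES.get((user, item), 0.0)


ranking = recommend("alice",
                    ["politics news", "sports news", "football news"],
                    toy_f, top_n=2)
print(ranking)
```

The items with the highest f come back first, which is the "labeled as relevant and returned to the user" step of the slide.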
  • classes of RSs
      • In the literature, many approaches for building RSs have been introduced.
      • Collaborative Recommender Systems
      • Content-based Recommender Systems (FOCUS)
      • Knowledge-based Recommender Systems
      • Demographic-based Recommender Systems
      • Social Recommender Systems
      • Hybrid Recommender Systems
  • content-based recommenders: suggest items similar to those the user liked in the past
  • content-based recommenders: key concepts
      • Each item has to be described through a set of textual features
      • Movie plots, content of news, book summaries
  • content-based recommenders: key concepts
      • The user profile contains the features that often occur in the items the user liked
      • The profile of a user interested in basketball will contain keywords related to it (example: basketball teams, players or competitions)
  • content-based recommenders: key concepts
      • Recommendations are provided by calculating the overlap between the features stored in the user profile and those that occur in the item.
      • The bigger the overlap, the higher the relevance
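A minimal sketch of the overlap idea, under stated assumptions: the slides do not say how the overlap is measured, so the Jaccard coefficient is used here as one plausible choice, and the feature sets are made up for illustration.

```python
def overlap_score(profile: set, item_features: set) -> float:
    """Jaccard overlap in [0, 1] between profile features and item features.

    Jaccard is an assumption of this sketch, not the measure used in the talk.
    """
    if not profile or not item_features:
        return 0.0
    return len(profile & item_features) / len(profile | item_features)


# Hypothetical profile of a user interested in basketball.
profile = {"basketball", "nba", "playoffs", "lakers"}
item_a = {"nba", "playoffs", "lakers", "finals"}   # shares 3 features with the profile
item_b = {"election", "parliament", "vote"}        # shares none

score_a = overlap_score(profile, item_a)
score_b = overlap_score(profile, item_b)
print(score_a, score_b)
```

The item that shares more features with the profile gets the higher score, so it would be recommended first.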
  • content-based recommenders: example: news recommendations
      [Figure: Items and User Profile. The user is interested in news articles about sports, football, cycling, etc.]
      [Figure: Items and Recommendations, with items marked ♥ or X]
  • main building block: vector space model, the most adopted IR model (*)
      (*) Gerard Salton: A Vector Space Model for Automatic Indexing, Communications of the ACM, vol. 18, nr. 11, pages 613-620
  • vector space model (VSM)
      • Given a set of n features (vocabulary)
      • $f = \{f_1, f_2, \ldots, f_n\}$
      • Given a set of M items
      • Each document (item) is represented as a point in an n-dimensional vector space
      • $I = (w_{f_1}, \ldots, w_{f_n})$, where $w_{f_i}$ is the weight of feature $f_i$ in the item
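To make the representation concrete, here is a small sketch that turns item descriptions into vectors over a shared vocabulary. The weighting scheme is an assumption: the slide only speaks of "the weight" of a feature, so raw term counts are used, and the two toy descriptions are invented.

```python
from collections import Counter
from typing import List


def build_vocabulary(texts: List[str]) -> List[str]:
    """Collect the n features f1..fn as the sorted set of lower-cased tokens."""
    return sorted({token for text in texts for token in text.lower().split()})


def item_vector(text: str, vocabulary: List[str]) -> List[int]:
    """Represent one item as (w_f1, ..., w_fn); weights are raw term counts here."""
    counts = Counter(text.lower().split())
    return [counts[feature] for feature in vocabulary]


# Hypothetical item descriptions.
items = {
    "football news": "football match goal football",
    "politics news": "election vote parliament",
}
vocabulary = build_vocabulary(list(items.values()))
vectors = {name: item_vector(text, vocabulary) for name, text in items.items()}

print(vocabulary)
for name, vector in vectors.items():
    print(name, vector)
```

Every item ends up as a point in the same n-dimensional space, with n equal to the vocabulary size.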
  • VSM representation
      [Figure: vector space with items labeled football news, sports news, politics news, politics news]
  • research question: Is it possible to exploit VSM for a recommendation scenario?
  • VSM for CBRS: how to adapt it?
      • In VSM each item is represented as a vector
      • The user profile needs a vector space representation as well
      • How?
      • For example, by combining the vectors of the items (documents) the user liked in the past
  • VSM representation
      [Figure: the same vector space with the user profile added]
      • Recommendation task seen as similarity calculation between vectors
      • The recommender system suggests football news and sports news
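The adaptation described in these slides can be sketched as follows. Two choices are assumptions of this sketch rather than statements from the talk: "combining" is implemented as a plain vector sum, and similarity is measured with the cosine. The five features and all vectors are hypothetical.

```python
import math
from typing import List


def add_vectors(vectors: List[List[float]]) -> List[float]:
    """Combine item vectors component by component (plain sum)."""
    return [sum(components) for components in zip(*vectors)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Feature order (hypothetical): football, goal, sport, election, vote
liked_in_the_past = [
    [2, 1, 0, 0, 0],
    [1, 0, 2, 0, 0],
]
profile = add_vectors(liked_in_the_past)

candidates = {
    "football news": [1, 1, 0, 0, 0],
    "sports news": [0, 0, 1, 0, 0],
    "politics news A": [0, 0, 0, 1, 1],
    "politics news B": [0, 0, 0, 2, 1],
}
ranking = sorted(((name, cosine(profile, vector)) for name, vector in candidates.items()),
                 key=lambda pair: (-pair[1], pair[0]))
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With this toy data the football and sports items rank above the politics items, which share no feature with the profile and therefore score zero.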
  • Can this model be improved? Yes.
  • VSM weaknesses: Modeling Negation
      • VSM does not model negative evidence
      • The vector space representation depends only on the features that occur in the document; no assumption is made about the features that do not occur
      • What a specific user dislikes is not considered
  • VSM weaknesses: High Dimensionality
      • As the number of documents grows, the number of features grows as well
      • Large vector spaces are difficult to manage
  • VSM weaknesses: Language issues
      • Does not manage the latent semantics of documents
      • String matching-based approach
      • A CBRS based on VSM cannot understand the information it manages (e.g. “apple”?)
  • VSM weaknesses: Language issues
      • Representation is language-dependent
      • A user profile built in one language cannot be exploited to recommend items described in another language
      • It would be good to receive (e.g.) recommendations about news written by English newspapers even if I expressed my interest only in Italian news articles!
  • How to tackle these issues?
  • a novel recommendation framework based on VSM: eVSM, enhanced Vector Space Model (*)
      (*) Cataldo Musto: Enhanced Vector Space Models for Content-based Recommender Systems, RECSYS 2010, pages 361-364
  • eVSM goals
      • To introduce a CBRS based on VSM
      • To tackle the representation issues of VSM
      • No Semantics
      • High Dimensionality
      • No modeling of Negative Information
      • Language-dependent recommendations
  • a novel recommendation framework based on VSM: eVSM
      • step 1: modeling semantics
      • step 2: dimensionality reduction
      • step 3: modeling negation
      • step 4: building user profiles
      • step 5: providing suggestions
  • how to improve the semantic modeling in VSMs? distributional models (Firth, 1957)
      Firth, J.R. A synopsis of linguistic theory 1930-1955. In Studies in Linguistic Analysis, pp. 1-32, 1957.
  • distributional models: “meaning is its use” (L. Wittgenstein)
  • distributional models: insight. By analyzing a large corpus of textual data it is possible to infer information about the usage (about the meaning) of the terms.
  • distributional models: term/context matrix
      [Table: terms t1 to t4 as rows, contexts c1 to c9 as columns; a ✔ marks each context a term occurs in (t1: 4 contexts, t2: 4, t3: 3, t4: 4)]
  • distributional models
      • Key: definition of what the ‘context’ is
      • Different granularities are possible
      • Document
      • Paragraph
      • Sentence
      • Sliding window of words
  • distributional models: beer vs. glass: good overlap
      [Table: same term/context matrix]
  • distributional models: beer vs. spoon: no overlap
      [Table: same term/context matrix]
  • distributional models: recap
      • models for representing terms/documents in large vector spaces
      • light semantics
      • it is simple to calculate similarities between words
      • but the high dimensionality problem gets even worse!
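The beer/glass/spoon comparison can be reproduced on a toy corpus. Everything concrete here is an assumption of the sketch: the five sentences are invented, each sentence is taken as one context, occurrences are binary, and the cosine is used to quantify the overlap between rows.

```python
import math
from typing import Dict, List

# Hypothetical corpus; each sentence is one context (c1..c5).
contexts = [
    "a glass of cold beer",
    "he drank beer from a glass",
    "she ordered another beer",
    "eat the soup with a spoon",
    "a silver spoon for the soup",
]


def term_context_matrix(contexts: List[str], terms: List[str]) -> Dict[str, List[int]]:
    """Row per term, column per context: 1 if the term occurs in the context."""
    tokenized = [set(context.split()) for context in contexts]
    return {term: [1 if term in tokens else 0 for tokens in tokenized]
            for term in terms}


def cosine(a: List[int], b: List[int]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


matrix = term_context_matrix(contexts, ["beer", "glass", "spoon"])
sim_beer_glass = cosine(matrix["beer"], matrix["glass"])
sim_beer_spoon = cosine(matrix["beer"], matrix["spoon"])
print(matrix)
print(sim_beer_glass, sim_beer_spoon)
```

Terms that share contexts (beer, glass) get a high similarity, while terms that never co-occur (beer, spoon) get zero. Note that the matrix has one column per context, which is why the dimensionality grows with the corpus.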
  • a novel recommendation framework based on VSM: eVSM
      • step 1: modeling semantics
      • step 2: dimensionality reduction
      • step 3: modeling negation
      • step 4: building user profiles
      • step 5: providing suggestions
  • Random Indexing (Sahlgren, 2005)
      Sahlgren, M. An Introduction to Random Indexing. Proceedings of the Methods and Applications of Semantic Indexing Workshop, TKE 2005.
  • dimensionality reduction: random indexing
      • Strengths
      • Incremental approach
      • Based on the distributional hypothesis
      • Builds a small-scale semantic vector space representation
  • random indexing
      • Input: n-dimensional term-document matrix
      • Output: k-dimensional term-context matrix
      • k << n
      • Approximation built upon the distributional hypothesis
      • Based on contexts, but much more compact!
  • random indexing: dimensionality reduction
      [Figure: term/document matrix (terms t1 to t5 × documents d1 ... dn) reduced to a term/context matrix (terms t1 to t5 × contexts c1 ... ck), with n >> k]
      • k is a simple parameter of the model
      • the smaller k, the higher the efficiency and the loss of information
  • random indexing: some literature
      • Roots
      • Sparse distributed representations (Kanerva, 1988)
      • Studies about Random Projection
      • State of the art applications
      • Clustering text documents (Kohonen, 2000)
      • Image data compression (Bingham, 2001)
      • Information Retrieval (Basile, 2010)
      • Collaborative filtering (Ciesielczyk, 2010)
      • Never exploited for CBRS.
  • How to obtain the smaller k-dimensional representation?
  • random indexing algorithm
      • (1) Definition of the context: Document? Paragraph? Sentence? Word?
      • (2) Each ‘context’ is assigned a context vector
      • Dimension of the vector = k
      • Allowed values = {-1, 0, 1}
      • Constraint: the number of non-zero elements has to be much smaller than k
      • Values distributed in a random way
  • random indexing: context vectors (k=8)
      rc1 = (0, 0, -1, 1, 0, 0, 0, 0)
      rc2 = (1, 0, 0, 0, 0, 0, 0, -1)
      rc3 = (0, 0, 0, 0, 0, -1, 1, 0)
      rc4 = (-1, 1, 0, 0, 0, 0, 0, 0)
      rc5 = (0, 0, 0, -1, 1, 0, 0, 0)
  • random indexing algorithm
      • (3) The vector space representation of a term t is obtained by combining the random vectors of the contexts it occurs in.
      • Example: t1 occurs in contexts c1 and c2
      rc1 = (0, 0, -1, 1, 0, 0, 0, 0)
      rc2 = (1, 0, 0, 0, 0, 0, 0, -1)
      t1 = rc1 + rc2 = (1, 0, -1, 1, 0, 0, 0, -1)
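The worked example for step (3) can be checked with a few lines of code. The context vectors are the five k=8 vectors from the slides (taking rc4 = (-1, 1, 0, 0, 0, 0, 0, 0)); the helper name `term_vector` is hypothetical.

```python
# Context vectors from the slides (k = 8).
rc = {
    "c1": (0, 0, -1, 1, 0, 0, 0, 0),
    "c2": (1, 0, 0, 0, 0, 0, 0, -1),
    "c3": (0, 0, 0, 0, 0, -1, 1, 0),
    "c4": (-1, 1, 0, 0, 0, 0, 0, 0),
    "c5": (0, 0, 0, -1, 1, 0, 0, 0),
}


def term_vector(context_ids):
    """Sum, component by component, the random vectors of the given contexts."""
    return tuple(sum(components)
                 for components in zip(*(rc[context] for context in context_ids)))


# t1 occurs in contexts c1 and c2.
t1 = term_vector(["c1", "c2"])
print(t1)  # (1, 0, -1, 1, 0, 0, 0, -1)
```

The result matches the vector given for t1 on the slide, which confirms that "combining" is a component-wise sum in this example.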
  • random indexing algorithm • (3) The vector space representation of a term t is obtained by combining the random vectors of the contexts it occurs in. Output: WordSpace. • (4) The vector space representation of a document d is obtained by combining the vector space representations of the terms that occur in the document. Output: DocSpace.
  • random indexing [figure: WordSpace matrix with terms t1 ... t5 as rows and contexts c1 ... ck as columns; DocSpace matrix with documents d1 ... d5 as rows and the same contexts c1 ... ck as columns]. Uniform representation: comparison between terms in the WordSpace, comparison between documents in the DocSpace.
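The slide example for steps (3) and (4) can be sketched as follows. This is a minimal illustration, assuming that "combining" means element-wise summation (which matches the t1 example on the slide) and that the second entry of rc4 is 1. The term `t2`, the document `d1`, and the helper names are hypothetical and are not taken from the talk.

```python
from typing import Dict, Iterable, List

Vector = List[int]

# Context vectors from the slide (k = 8, one +1 and one -1 each).
# Assumption: the second entry of rc4 is 1.
CONTEXT_VECTORS: Dict[str, Vector] = {
    "c1": [0, 0, -1, 1, 0, 0, 0, 0],
    "c2": [1, 0, 0, 0, 0, 0, 0, -1],
    "c3": [0, 0, 0, 0, 0, -1, 1, 0],
    "c4": [-1, 1, 0, 0, 0, 0, 0, 0],
    "c5": [0, 0, 0, -1, 1, 0, 0, 0],
}


def combine(vectors: Iterable[Vector], k: int) -> Vector:
    """Element-wise sum of k-dimensional vectors (assumed combination rule)."""
    total = [0] * k
    for vec in vectors:
        for i, value in enumerate(vec):
            total[i] += value
    return total


def term_vector(contexts: Iterable[str], k: int = 8) -> Vector:
    """Step (3): combine the random vectors of the contexts a term occurs in."""
    return combine((CONTEXT_VECTORS[c] for c in contexts), k)


def document_vector(term_vectors: Iterable[Vector], k: int = 8) -> Vector:
    """Step (4): combine the vectors of the terms that occur in a document."""
    return combine(term_vectors, k)


t1 = term_vector(["c1", "c2"])   # the slide example: t1 occurs in c1 and c2
t2 = term_vector(["c2", "c3"])   # hypothetical second term
d1 = document_vector([t1, t2])   # hypothetical document containing t1 and t2
print(t1)  # [1, 0, -1, 1, 0, 0, 0, -1]
```

Rows of the WordSpace are vectors such as `t1`, and rows of the DocSpace are vectors such as `d1`; both live in the same k-dimensional space.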
  • Dimensionality reduction is built upon a set of random vectors. Does it sound weird?
  • random indexing theoretical basis • Johnson-Lindenstrauss Lemma (*) • Distances between points are approximately preserved. • Constraint: orthogonal vectors • Random Indexing vectors are nearly orthogonal. • The loss of information depends on the parameter k. (*) Johnson, W. and Lindenstrauss, J. Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 1984.
  • random indexing: Johnson-Lindenstrauss lemma
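The claim that Random Indexing vectors are nearly orthogonal can be checked empirically. The sketch below is only an illustration under assumed settings: the dimension (1000), the number of non-zero entries (10), the number of vectors (20), and the seed are hypothetical choices, not values from the talk.

```python
import math
import random
from itertools import combinations


def random_index_vector(k, nonzeros, rng):
    """Sparse ternary vector: `nonzeros` entries set to +1 or -1, the rest 0."""
    vec = [0] * k
    for pos in rng.sample(range(k), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec


def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(42)  # fixed seed so the run is repeatable
vectors = [random_index_vector(1000, 10, rng) for _ in range(20)]

# Absolute cosine for every pair: values close to 0 mean "nearly orthogonal".
cosines = [abs(cosine(a, b)) for a, b in combinations(vectors, 2)]
mean_abs_cosine = sum(cosines) / len(cosines)
max_abs_cosine = max(cosines)
print(mean_abs_cosine, max_abs_cosine)
```

With these settings two vectors rarely share a non-zero position, so most pairs have cosine exactly 0 and the average stays close to 0.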
  • a novel recommendation framework based on VSM: eVSM • step 1: modeling semantics • step 2: dimensionality reduction • step 3: modeling negation • step 4: building user profiles • step 5: providing suggestions
  • quantum negation (Widdows, 2007). Sahlgren, M. An Introduction to Random Indexing. Proceedings of the Methods and Applications of Semantic Indexing Workshop, TKE 2005.
  • negation in VSMs: state of the art • State-of-the-art approaches: poor theoretical background • Post-retrieval filtering, Rocchio Algorithm (Rocchio, 1971) • Widdows proposed a different point of view • Negation viewed as a form of orthogonality between vectors • Vision inherited from Quantum Logic
  • negation in VSMs: Quantum Negation • Some theory • Given vector a and vector b • Through quantum negation it is possible to define a vector a not b (a ∧¬b) • Projection of vector a onto the subspace orthogonal to the one generated by vector b
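For a single vector b, the projection described above reduces to subtracting from a its component along b. A minimal sketch follows; the function names and the example vectors are illustrative and are not taken from the talk.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def negate(a, b):
    """a NOT b: the component of a that is orthogonal to b.

    Computed as a - (a.b / b.b) * b. If b is the zero vector there is
    nothing to remove, so a is returned unchanged.
    """
    bb = dot(b, b)
    if bb == 0:
        return list(a)
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]


a = [1.0, 2.0, 0.0]
b = [0.0, 1.0, 0.0]
a_not_b = negate(a, b)
print(a_not_b)          # [1.0, 0.0, 0.0]
print(dot(a_not_b, b))  # 0.0: the result has no component along b
```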
  • quantum negation: application to CBRS • Vector A models positive feedback • Information about what a user likes • Vector B models negative feedback • Information about what a user does not like • Vector A not B combines both information sources
  • eVSM building blocks - recap • Distributional Models • Light semantic modeling • Random Indexing (Sahlgren, 2005) • Incremental technique for dimensionality reduction • Quantum Negation (Widdows, 2007) • Negation operator based on Quantum Logic
  • eVSM building blocks - recap • A content-based recommendation framework needs to: • Represent items • Build user profiles • Provide suggestions • Random Indexing and Quantum Negation provide a novel representation model.
  • a novel recommendation framework based on VSM: eVSM • step 1: modeling semantics • step 2: dimensionality reduction • step 3: modeling negation • step 4: building user profiles • step 5: providing suggestions
  • eVSM building user profiles • Represent profiles in eVSM • Vector space representation • Obtained by combining the vectors of the items the user liked • How? • Four different profiling models
  • User Profiles - Random Indexing-based (RI): VSM representation of the RI-based profile for user u [formula with callouts: items, rating, threshold]
  • User Profiles - Quantum Negation-based (QN): VSM representation of the QN-based profile for user u [formula with callouts: positive user profile vector, negative user profile vector]
  • User Profiles - Weighted Random Indexing-based (w-RI): higher weight given to the documents with higher rating [formula with callouts: items, rating, threshold]
  • User Profiles - Weighted Quantum Negation-based (w-QN): VSM representation of the w-QN-based profile for user u [formula with callouts: positive user profile vector, negative user profile vector]
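The four profiling models can be sketched as below. The formulas on the slides did not survive in this transcript, so everything here is an assumption made for illustration: liked items are those rated at or above the threshold, the weighted positive profile scales each item by rating / max_rating, the weighted negative profile scales by (max_rating - rating) / max_rating, and the QN profiles apply orthogonal negation to the positive and negative vectors. The item vectors, ratings, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthogonal_negation(a: Vector, b: Vector) -> Vector:
    """a NOT b: remove from a its component along b."""
    bb = dot(b, b)
    if bb == 0.0:
        return list(a)
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]


def build_profiles(item_vectors: Dict[str, Vector], ratings: Dict[str, int],
                   threshold: int, max_rating: int,
                   weighted: bool) -> Tuple[Vector, Vector]:
    """Return (positive, negative) profile vectors for one user.

    Assumed scheme: items rated >= threshold feed the positive profile,
    the others feed the negative one; weights are hypothetical.
    """
    k = len(next(iter(item_vectors.values())))
    positive = [0.0] * k
    negative = [0.0] * k
    for item, rating in ratings.items():
        liked = rating >= threshold
        if weighted:
            weight = rating / max_rating if liked else (max_rating - rating) / max_rating
        else:
            weight = 1.0
        target = positive if liked else negative
        for i, value in enumerate(item_vectors[item]):
            target[i] += weight * value
    return positive, negative


items = {"d1": [1.0, 0.0, 0.0], "d2": [0.0, 1.0, 0.0], "d3": [0.0, 1.0, 1.0]}
ratings = {"d1": 5, "d2": 4, "d3": 1}

ri, ri_neg = build_profiles(items, ratings, threshold=3, max_rating=5, weighted=False)
qn = orthogonal_negation(ri, ri_neg)        # QN profile
w_ri, w_neg = build_profiles(items, ratings, threshold=3, max_rating=5, weighted=True)
w_qn = orthogonal_negation(w_ri, w_neg)     # w-QN profile
print(ri, qn, w_ri, w_qn)
```

In this toy case the disliked item `d3` pushes the QN profiles away from the third dimension, which the plain RI profile cannot express.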
  • a novel recommendation framework based on VSM: eVSM • step 1: modeling semantics • step 2: dimensionality reduction • step 3: modeling negation • step 4: building user profiles • step 5: providing suggestions
  • eVSM providing suggestions - monolingual scenario [figure: DocSpace matrix with items d1 ... d4 and profile p as rows, contexts c1 ... ck as columns]. All the items are vectors in a DocSpace; the profile p is a vector in the same DocSpace; similarity is calculated between p and each item.
  • Some maths (1/2) • Let • U be the set of users • I be the set of items • Given • an active user u ∈ U
  • Some maths (2/2) • For each pair (u, i_j) • Both user u and item i_j have a vector space representation • u = (f_u1, f_u2, ..., f_un) • i_j = (f_i1, f_i2, ..., f_in) • Calculate sim(u, i_j) • Cosine similarity • Order the items i_j by descending similarity • Return the top-k elements
  • Similarity-based recommendations: relevance of an item is seen as a form of similarity. The most similar items are returned to the target user.
  • What about multilanguage recommendations?
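The recommendation step (cosine similarity between the profile and each item, descending order, top-k) can be sketched as follows. The profile, the item vectors, and the function names are hypothetical examples.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(profile: Vector, items: Dict[str, Vector], k: int) -> List[str]:
    """Rank items by descending cosine similarity to the profile, keep the top k."""
    scored = [(cosine(profile, vec), item) for item, vec in items.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]


profile = [1.0, 0.5, -0.5]          # hypothetical user profile vector
items = {
    "i1": [1.0, 0.0, 0.0],
    "i2": [0.0, 1.0, 1.0],
    "i3": [1.0, 1.0, 0.0],
}
top2 = recommend(profile, items, k=2)
print(top2)  # ['i3', 'i1']
```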
  • eVSM providing suggestions - multilingual scenario • eVSM for multilingual recommendations • Assumption • The distribution of the terms is (almost) language-independent (e.g. drink / bere, beer / birra, glass / bicchiere) • The position of the concept of beer in a WordSpace will always be the same, regardless of the language!
  • (english) WordSpace: beer, wine, spoon, dog
  • (italian) WordSpace: birra, vino, cucchiaio, cane. Relationships between terms hold regardless of the language!
  • eVSM providing suggestions - multilingual scenario [figure: parallel DocSpaces for L1 (italian) and L2 (english), each with documents d1 ... d5 as rows and contexts c1 ... ck as columns, built upon the same set of random vectors]. The user profile p_L1 is learned in L1 (italian); we can project the user profile into the DocSpace of english items and compute the similarity of the italian profile with english items to build multilingual recommendations.
  • Multilingual recommendations come at no cost, thanks to the distributional hypothesis.
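The idea of parallel spaces built upon the same set of random vectors can be illustrated with a toy example. This is only a sketch under a strong assumption: each context (for instance, the description of the same movie) is available in both languages and receives one shared random vector. The toy corpora, the dimension, and the helper names are hypothetical.

```python
import random
from typing import Dict, List

Vector = List[int]


def make_context_vectors(context_ids: List[str], k: int, nonzeros: int,
                         seed: int) -> Dict[str, Vector]:
    """One sparse ternary random vector per context, shared by both languages."""
    rng = random.Random(seed)
    vectors = {}
    for cid in context_ids:
        vec = [0] * k
        for pos in rng.sample(range(k), nonzeros):
            vec[pos] = rng.choice((-1, 1))
        vectors[cid] = vec
    return vectors


def build_word_space(corpus: Dict[str, List[str]],
                     context_vectors: Dict[str, Vector], k: int) -> Dict[str, Vector]:
    """Sum, for each term, the random vectors of the contexts it occurs in."""
    space: Dict[str, Vector] = {}
    for cid, terms in corpus.items():
        for term in terms:
            vec = space.setdefault(term, [0] * k)
            for i, value in enumerate(context_vectors[cid]):
                vec[i] += value
    return space


# Hypothetical aligned contexts: the same context id in both languages.
english = {"c1": ["drink", "beer", "glass"], "c2": ["wine", "glass"], "c3": ["dog"]}
italian = {"c1": ["bere", "birra", "bicchiere"], "c2": ["vino", "bicchiere"], "c3": ["cane"]}

K = 100
shared = make_context_vectors(["c1", "c2", "c3"], k=K, nonzeros=4, seed=7)
en_space = build_word_space(english, shared, K)
it_space = build_word_space(italian, shared, K)

# Terms with the same distribution over contexts get the same vector.
print(en_space["beer"] == it_space["birra"])
print(en_space["glass"] == it_space["bicchiere"])
```

Because both spaces use the same dimensions, a profile built from vectors in one language can be compared directly with item vectors in the other.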
  • experimental evaluation: applications
  • evaluation of eVSM • selected experiments • movie recommendation • monolingual scenario • Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis: Random Indexing and Negative User Preferences for Enhancing Content-Based Recommender Systems. EC-Web 2011: 270-281 • multilingual scenario • Cataldo Musto, Fedelucio Narducci, Pierpaolo Basile, Pasquale Lops, Marco de Gemmis, Giovanni Semeraro: Cross-Language Information Filtering: Word Sense Disambiguation vs. Distributional Models. AI*IA 2011 • EPG personalization • Cataldo Musto, Fedelucio Narducci, Pasquale Lops, Giovanni Semeraro, Marco de Gemmis, Mauro Barbieri, Jan H. M. Korst, Verus Pronk, Ramon Clout: Enhanced Semantic TV-Show Representation for Personalized Electronic Program Guides. UMAP 2012 (to be presented)
  • movie recommendation: ‘in vitro’ experiments • Goal: to provide users with recommendations about movies worth watching. • Subset of the 100k MovieLens dataset + Wikipedia content • Monolingual and multilingual settings
  • monolingual experiment: parameter tuning • Size of context vectors • k = 50, 100, 200, 400 • 99% reduction of DocSpace • original size: 25k • Profiling models • RI, w-RI, QN, w-QN • Weighted vs. unweighted • With negation vs. without negation
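The "99% reduction" figure can be checked with simple arithmetic, assuming that "25k" is the number of dimensions of the original DocSpace and that the reduced space has k dimensions. The helper name is hypothetical.

```python
ORIGINAL_SIZE = 25_000  # "original size: 25k", assumed to be the dimension count


def reduction(k: int, original: int = ORIGINAL_SIZE) -> float:
    """Fraction of dimensions removed when going from `original` to `k`."""
    return 1 - k / original


# Percentage reduction for each context vector size used in the tuning.
reductions = {k: round(100 * reduction(k), 1) for k in (50, 100, 200, 400)}
print(reductions)  # {50: 99.8, 100: 99.6, 200: 99.2, 400: 98.4}
```

Under this reading, the reduction ranges from 98.4% (k = 400) to 99.8% (k = 50), which is consistent with the rounded 99% on the slide.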
  • experimental design: experiments • Experiment 1 • Do the weighting scheme and the introduction of a negation operator improve the predictive accuracy of the recommendation models? • Experiment 2 • How does the model perform with respect to other state-of-the-art approaches?
  • experiment 1, size=100, MovieLens dataset [bar chart: P@1, P@3, P@5, P@10 for RI, W-RI, QN, W-QN; y-axis from 84 to 87]. Weighted vs. unweighted: improvement under 0.2% (highlighted peak: +0.52). However, differences are not statistically significant.
  • experiment 1, size=400, MovieLens dataset [bar chart: P@1, P@3, P@5, P@10 for RI, W-RI, QN, W-QN; y-axis from 84 to 87]. Negation vs. no negation: improvement under 0.5%.
  • experiment 1, size=100, MovieLens dataset [bar chart: P@1, P@3, P@5, P@10 for RI, W-RI, QN, W-QN; y-axis from 84 to 87]. Some exceptions: at P@1 and P@3, comparing W-RI vs. W-QN (highlighted gaps: +1.08 and +0.77), the use of the negation operator improves the accuracy in a significant way. Peaks in P@1 and P@3 are statistically significant.
  • experiment 1, size=100, MovieLens dataset [bar chart: P@1, P@3, P@5, P@10 for RI, W-RI, QN, W-QN; y-axis from 84 to 87]. Generally speaking, the W-QN configuration outperforms the others. The combined use of weighting and negation significantly improves the accuracy (highlighted gap: +1.4%).
  • experiment 1: impact of negation operator and weighting scheme [table: statistical significance (✔) by context vector size (50, 100, 200, 400) and metric; P@1: ✔ for three sizes; P@3: ✔ for three sizes; P@5: none; P@10: ✔ for one size]. The combined use of weighting and negation significantly improves the accuracy.
  • experiment 2, size=400, MovieLens dataset [bar chart: P@1, P@3, P@5, P@10 for eVSM, VSM, LSI, Bayes; y-axis from 84 to 87]. Gap always around 1%. Significant improvement.
  • experiment 2: eVSM vs. state-of-the-art approaches [table: statistical significance (✔) by context vector size (100, 400) and metric; P@1: ✔ for both sizes; P@3: ✔ for both sizes; P@5: none; P@10: ✔ for one size]. With vectors of size=400, eVSM significantly outperforms state-of-the-art approaches.
  • multilingual scenario: experimental design • Experiment 1 • ENG-ITA: learning user profiles in English, recommending items in Italian • Experiment 2 • ITA-ENG: learning user profiles in Italian, recommending items in English • Experiment 3 • ENG-ENG: learning user profiles in English, recommending items in English • Experiment 4 • ITA-ITA: learning user profiles in Italian, recommending items in Italian
  • experimental results - P@5 (experiment: w-QN, w-RI) • eng-ita: 84.65, 84.65 • ita-eng: 84.85, 84.63 • eng-eng: 85.23, 85.29 • ita-ita: 85.27, 84.84 • Outcome: monolingual slightly better than multilingual, but no significant difference between results; multilingual recommendations are as good as monolingual ones.
  • EPGs personalization: ‘in vitro’ experiments • Personalization of TV shows for EPGs • Philips - Aprico.tv dataset
  • Watchmi plug-in developed by Aprico.tv: ‘in vitro’ experiments
  • scenario: EPGs personalization • A user profile can be built as a set of genres (program types) the user likes • The goal is to provide the user with a set of suggestions • What TV shows should she watch?
  • dataset: Aprico.tv data • TV shows gathered from a set of 47 German-language channels • Provided by Axel Springer • TV shows' textual features • title, synopsis, description • program type (Movie, Sport, Documentary, Magazine, etc.)
  • description of the task • retrieval task • Given a set of program types and a repository of TV shows • We want to retrieve the shows that belong to a specific program type
  • retrieval task: comparison of the approaches • Random Indexing (baseline) • Each program type has a (compressed) vector space representation • Each TV show has a (compressed) vector space representation • Cosine similarity • Given a program type (input), a set of n TV shows is returned, ordered by descending cosine similarity
  • retrieval task: comparison of the approaches • Random Indexing + Quantum Negation • As for the classification task • For each program type both a positive and a negative vector space representation are provided • Quantum Negation is used to model the TV shows that do not belong to a certain program type • Cosine similarity • Given a program type (input), a set of n TV shows is returned, ordered by descending cosine similarity
  • Philips Aprico.tv dataset • TV shows: 133,579 • program types: 17 • features: 306,006 • Dataset largely unbalanced, e.g. 40k TV series, 25k documentaries, 15k movies, but only 2k sport TV shows.
  • details • Metric • P@n • Parameters • Dimension of the vectors • 500, 1000, 1500, 2000 • Minimum number of occurrences • 1, 3
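The P@n metric used in these experiments is the fraction of relevant items among the first n results. A minimal sketch follows; the ranked list and the relevant set are hypothetical.

```python
from typing import List, Set


def precision_at_n(ranked: List[str], relevant: Set[str], n: int) -> float:
    """Fraction of the first n ranked items that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / n


# Hypothetical ranking of TV shows for one program type.
ranked = ["s1", "s2", "s3", "s4", "s5"]
relevant = {"s1", "s3", "s4"}

p_at_1 = precision_at_n(ranked, relevant, 1)
p_at_3 = precision_at_n(ranked, relevant, 3)
p_at_5 = precision_at_n(ranked, relevant, 5)
print(p_at_1, p_at_3, p_at_5)
```

Here one of the first result, two of the first three, and three of the first five are relevant.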
  • retrieval task results - P@n [chart; highlighted values: 82.6% and 66.3%]
  • retrieval task results - P@n [chart; highlighted values: 65.9% and 45.2%]
  • retrieval task results - P@n [chart; highlighted values: 58.1% and 36.5%]
  • EPGs personalization: recap of the experiment • Good results: around 85% precision in the first ten results • RI + QN significantly outperforms RI (around +20%) • Size of context vectors does not influence the precision
  • other experiments: a brief summary
  • other experiments: a brief summary • Content-based Movie Recommendation • MovieLens dataset • Size of context vectors does not affect the predictive accuracy • Yahoo! WebScope dataset • Similar outcomes • Weighting did not provide a significant improvement; negation provides a significant improvement • EPGs Personalization • RI for the classification task • RI performs worse than VSM in the task of classifying TV shows • Comparable results in terms of macro-average among program types
  • other experiments: a brief summary • Content-based Music Recommendation • Platform for Music Recommendation • Extraction of user profiles by mining Facebook data, eVSM for providing recommendations. Evaluation of the effectiveness of social media for recommendation tasks. • Platform for playlist personalization • Personalization based on eVSM works better than a personalization baseline based on DBpedia (Linked Data)
  • recap and contributions.
  • information overload.
  • IR-based CBRS.
  • building blocks: distributional models.
  • building blocks: random indexing.
  • building blocks: quantum negation.
  • contributions: richer representation based on VSM.
  • contributions: framework for multilingual recommendations
  • contributions: eVSM
  • Now we can answer the Research Question: Yes. It is possible to exploit IR-based techniques in the CBRS area for developing a novel content-based recommendation framework.
  • future research.
  • evaluation with user-based metrics (serendipity, novelty, unexpectedness)
  • open knowledge sources and social media for CBRS.
  • linked data.
  • modeling context.
  • end.
  • questions?
  • “Whatever you can do, or dream you can do, begin. Boldness has genius, magic and power in it. Begin it now.” Goethe. Thank you.