Ph.D. Defense - Enhanced Vector Space Models for Content-based Recommender Systems


Published on 08.06.12
PhD defense: Enhanced Vector Space Models for Content-based Recommender Systems.

  • Transcript

    • 1. Università degli Studi di Bari ‘Aldo Moro’ - Ph.D. Program in Computer Science (Dottorato di Ricerca in Informatica), Cycle XXIV - Enhanced Vector Space Models for Content-based Recommender Systems - Cataldo Musto, Ph.D. Candidate - Supervisor: Prof. Giovanni Semeraro - 08.06.12
    • 2. what will we talk about in the next 40 minutes? (Cataldo Musto - Enhanced Vector Space Models for Content-based Recommender Systems - Ph.D. defense - University of Bari Aldo Moro, Italy - 08.06.12)
    • 3. life is all a matter of decisions
    • 5. decision-making is actually challenging
    • 8. we need to hold as much knowledge as possible
    • 9. Leibniz: “In things which are absolutely indifferent there can be no choice and consequently no option or will.”
    • 10. information age: knowledge is spread through the Web
    • 11. social media changed the rules for information management and knowledge acquisition
    • 12. exponential growth of the available information
    • 14. it is physiologically impossible to follow the information flow in real time
    • 15. how much information?
    • 16. every day we interact with 393 bits of information per second
    • 17. the human brain can absorb 126 bits of information per second
    • 18. we can handle 126 bits of information per second; we deal with 393 bits of information per second. Ratio: more than 3x. Consequence: Information Overload. (Source: Adrian C. Ott, The 24-Hour Customer)
    • 19. Information Overload
    • 24. paradox of choice (Barry Schwartz, TED talk “Why more is less”)
    • 25. Buridan’s ass paradox: two alternatives. The ass cannot decide. It starves.
    • 26. Is the information overload actually unbearable?
    • 27. “It is not information overload. It is filter failure.” Clay Shirky, talk @ Web 2.0 Expo
    • 28. Solution: we need to improve the techniques for filtering information
    • 29. Information Filtering (IF): “To expose users only to the information that is relevant to them, thus avoiding information overload.” To filter, as kids do when they play with sand.
    • 30. IF applications. Example: Recommender System. Relevant items (movies, news, books, etc.) are pushed to the user according to her needs.
    • 31. Recommender Systems are an effective way to face the Information Overload problem
    • 32. example: Amazon.com Recommendations
    • 33. Information Retrieval (IR): “Finding relevant pieces of information in a collection of (usually unstructured) data”
    • 34. IR applications. Example: Search Engines. Relevant documents are returned to the user according to her query.
    • 35. IR vs. IF
        • IR and IF are two closely related research areas
        • Same goal: to optimize and ease access to (unstructured) data sources
        • “Two sides of the same coin” (*)
        • (*) N. Belkin, W. Croft: “Information Filtering and Information Retrieval: Two sides of the same coin”, Communications of the ACM, Volume 35, Issue 12, pp. 29-38, 1992
    • 36. IR vs. IF: differences
        • Small differences
        • Representation of user needs
        • Query in IR, user profile in IF
        • Convergence between IR and IF
        • Personalized Search!
    • 38. Ph.D. dissertation, Research Question: Is it possible to exploit the convergence between IR and IF to introduce a recommendation framework based on IR techniques?
    • 39. outline
    • 40. outline (1/2)
        • recommender systems
        • content-based recommender systems (CBRS)
        • vector space models
        • VSM for CBRS
        • strengths and weaknesses
    • 41. outline (2/2)
        • eVSM: enhanced vector space models
        • semantics in VSMs
        • dimensionality reduction in VSMs
        • modeling negation in VSMs
        • applications and experimental evaluation
        • movie recommendation
        • Philips TV-guides personalization
    • 42. recommender systems
    • 43. definition: Recommender Systems have the goal of guiding users in a personalized way to interesting or useful objects in a large space of possible options. Burke, 2002 (*)
        • (*) Robin D. Burke: Hybrid Recommender Systems: Survey and Experiments. UMUAI, volume 12, issue 4, 331-370 (2002)
    • 44. suggestions. Examples:
        • books or news to read
        • music to listen to
        • movies worth watching
        • restaurants, etc.
    • 45. Some maths (1/2)
        • Let U be the set of users and I the set of items
        • Given a user u ∈ U and an item i ∈ I
    • 46. Some maths (2/2)
        • A recommender system should predict how relevant item i is for user u by defining a scoring function f: U×I→[0,1]
        • The items with the highest value of f are labeled as relevant and returned to the user
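
The scoring-function view of slide 46 can be illustrated with a minimal sketch. The names `top_k`, `toy_score` and the `TOY_SCORES` table are hypothetical placeholders introduced here, not part of the talk; any function returning a value in [0, 1] for a (user, item) pair could be plugged in.

```python
from typing import Callable, Iterable, List, Tuple


def top_k(user: str,
          items: Iterable[str],
          score: Callable[[str, str], float],
          k: int = 3) -> List[Tuple[str, float]]:
    """Rank items for one user by a scoring function f(u, i) in [0, 1]."""
    scored = [(item, score(user, item)) for item in items]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical relevance table standing in for a real scoring function.
TOY_SCORES = {
    ("alice", "football news"): 0.9,
    ("alice", "sports news"): 0.7,
    ("alice", "politics news"): 0.1,
}


def toy_score(user: str, item: str) -> float:
    """Look up the toy score; unknown pairs get 0.0."""
    return TOY_SCORES.get((user, item), 0.0)


ITEMS = ["football news", "sports news", "politics news"]
print(top_k("alice", ITEMS, toy_score, k=2))
```

The recommender itself is then only the choice of `score`; the rest of the talk is about how to build that function from item content.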
    • 47. classes of RSs: many approaches for building RSs have been introduced in the literature.
        • Collaborative Recommender Systems
        • Content-based Recommender Systems
        • Knowledge-based Recommender Systems
        • Demographic-based Recommender Systems
        • Social Recommender Systems
        • Hybrid Recommender Systems
    • 48. classes of RSs, focus of this work: Content-based Recommender Systems
    • 49. content-based recommenders: suggest items similar to those liked in the past by the user
    • 50. content-based recommenders, key concepts
        • Each item has to be described through a set of textual features
        • Movie plots, content of news, book summaries
    • 51. content-based recommenders, key concepts
        • The user profile contains the features that often occur in the items the user liked
        • The profile of a user interested in basketball will contain related keywords (example: basketball teams, players or competitions)
    • 52. content-based recommenders, key concepts
        • Recommendations are provided by calculating the overlap between the features stored in the user profile and those that occur in the item
        • The bigger the overlap, the higher the relevance
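
A minimal sketch of the overlap idea from slide 52, assuming profiles and items are plain keyword sets. The normalization by the item's feature count is an assumption made here to keep the score in [0, 1]; the slides do not prescribe a specific overlap measure, and the keyword sets are invented for illustration.

```python
def overlap_score(profile: set, item_features: set) -> float:
    """Fraction of the item's features that also occur in the user profile."""
    if not item_features:
        return 0.0
    return len(profile & item_features) / len(item_features)


# Hypothetical profile of a user interested in basketball.
PROFILE = {"basketball", "nba", "playoffs", "lakers"}

# Hypothetical items described by keyword features.
ITEM_A = {"nba", "playoffs", "finals", "lakers"}
ITEM_B = {"election", "parliament", "vote", "nba"}

print(overlap_score(PROFILE, ITEM_A))  # 3 of 4 features shared
print(overlap_score(PROFILE, ITEM_B))  # 1 of 4 features shared
```

The bigger the overlap, the higher the score, so `ITEM_A` would be ranked above `ITEM_B` for this user.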
    • 53. content-based recommenders, example: news recommendations. User profile: the user is interested in news articles about sports, football, cycling, etc. [figure: items, with those the user liked marked ♥]
    • 54. content-based recommenders, example: news recommendations [figure: items and recommendations, with ♥ marks]
    • 55. content-based recommenders, example: news recommendations [figure: items and recommendations, with ♥ and X marks]
    • 57. main building block: the vector space model, the most adopted IR model (*)
        • (*) Gerard Salton: A Vector Space Model for Automatic Indexing, Communications of the ACM, vol. 18, nr. 11, pages 613–620
    • 58. vector space model (VSM)
        • Given a set of n features (vocabulary) $f = \{f_1, f_2, \ldots, f_n\}$
        • Given a set of M items
        • Each document (item) is represented as a point in an n-dimensional vector space
        • $I = (w_{f_1}, \ldots, w_{f_n})$, where $w_{f_i}$ is the weight of feature $f_i$ in the item
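
As an illustration of the representation on slide 58, the sketch below turns short texts into weight vectors over a shared vocabulary. Raw term frequency is used as the weight, which is an assumption: the slides do not say which weighting scheme is adopted. The documents and the helper names `build_vocabulary` and `to_vector` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(docs: Dict[str, str]) -> List[str]:
    """Collect the sorted set of features (here: lowercase tokens)."""
    return sorted({tok for text in docs.values() for tok in text.lower().split()})


def to_vector(text: str, vocabulary: List[str]) -> List[float]:
    """Represent a text as (w_f1, ..., w_fn), with term frequency as weight."""
    counts = Counter(text.lower().split())
    return [float(counts[feature]) for feature in vocabulary]


# Hypothetical two-document collection.
DOCS = {
    "d1": "football match football",
    "d2": "election vote",
}
VOCAB = build_vocabulary(DOCS)
VECTORS = {name: to_vector(text, VOCAB) for name, text in DOCS.items()}

print(VOCAB)
print(VECTORS)
```

Each item becomes a point in a space with one dimension per vocabulary feature, so the dimensionality grows with the vocabulary.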
    • 59. VSM representation [figure: vector space containing football news, sports news and two politics news items]
    • 60. research question: Is it possible to exploit VSM for a recommendation scenario?
    • 61. VSM for CBRS: how to adapt it?
        • In VSM each item is represented as a vector
        • The user profile needs a vector space representation as well
        • How? For example, by combining the vectors of the items (documents) the user liked in the past
    • 62. VSM representation [figure: the user profile vector in the same space as football news, sports news and politics news items]
    • 63. VSM representation: the recommendation task is seen as a similarity calculation between vectors [figure: same space as slide 62]
    • 64. VSM representation: the recommender system suggests football and sports news [figure: same space as slide 62]
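
A minimal sketch of slides 61 to 64, under two assumptions that the slides leave open: "combining" the liked item vectors is done by summing them, and similarity is measured with the cosine. The three feature axes and all vector values are invented for illustration.

```python
import math
from typing import List


def add_vectors(vectors: List[List[float]]) -> List[float]:
    """Component-wise sum of equally sized vectors."""
    return [sum(column) for column in zip(*vectors)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical feature axes: (football, sports, politics).
LIKED = [[3.0, 1.0, 0.0], [2.0, 1.0, 0.0]]   # items the user liked in the past
PROFILE = add_vectors(LIKED)                  # user profile vector

CANDIDATES = {
    "football news": [4.0, 1.0, 0.0],
    "sports news": [1.0, 3.0, 0.0],
    "politics news": [0.0, 0.0, 2.0],
}

# Rank candidate items by similarity to the profile, most similar first.
RANKING = sorted(CANDIDATES,
                 key=lambda name: cosine(PROFILE, CANDIDATES[name]),
                 reverse=True)
print(PROFILE, RANKING)
```

With these toy values the football and sports items end up above the politics item, mirroring the outcome pictured on slide 64.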
    • 65. Can this model be improved? Yes.
    • 66. VSM weaknesses: Modeling Negation
        • VSM does not model negative evidence
        • The vector space representation depends only on the features that occur in the document; no assumptions are made about the features that do not occur
        • What a specific user dislikes is not considered
    • 67. VSM weaknesses: High Dimensionality
        • As the number of documents grows, the number of features grows as well
        • Large vector spaces are difficult to manage
    • 68. VSM weaknesses: Language issues
        • Does not manage the latent semantics of documents
        • String matching-based approach
        • A CBRS based on VSM cannot understand the information it manages (example: apple?)
    • 69. VSM weaknesses: Language issues
        • Representation is language-dependent
        • A user profile built in one language cannot be exploited to recommend items described in another language
        • It would be good to receive (e.g.) recommendations of news from English newspapers even if I expressed my interest only in Italian news articles!
    • 70. How to tackle these issues?
    • 71. a novel recommendation framework based on VSM: eVSM, enhanced Vector Space Model (*)
        • (*) Cataldo Musto: Enhanced Vector Space Models for Content-based Recommender Systems, RECSYS 2010, pages 361-364
    • 72. eVSM goals
        • To introduce a CBRS based on VSM
        • To address the representation issues of VSM: no semantics, high dimensionality, no modeling of negative information, language-dependent recommendations
    • 73. a novel recommendation framework based on VSM: eVSM
        • step 1: modeling semantics
        • step 2: dimensionality reduction
        • step 3: modeling negation
        • step 4: building user profiles
        • step 5: providing suggestions
    • 74. how to improve the semantic modeling in VSMs? distributional models (Firth, 1957)
        • Firth, J.R. A synopsis of linguistic theory 1930-1955. In Studies in Linguistic Analysis, pp. 1-32, 1957.
    • 75. distributional models: “meaning is its use” (L. Wittgenstein)
    • 76. distributional models, insight: by analyzing a large corpus of textual data it is possible to infer information about the usage (about the meaning) of the terms.
    • 77. distributional models: example
    • 78. distributional models: term/context matrix [table: terms t1-t4 as rows, contexts c1-c9 as columns; a ✔ marks each context in which a term occurs]
    • 79. distributional models
        • Key: the definition of what the ‘context’ is
        • Different granularities are possible: document, paragraph, sentence, sliding window of words
    • 81. distributional models: beer vs. glass, good overlap [table: same term/context matrix]
    • 82. distributional models: beer vs. spoon, no overlap [table: same term/context matrix]
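
The beer/glass/spoon comparison can be sketched with a toy term/context matrix. The ✔ positions of the original table did not survive in the transcript, so the occurrence sets below are invented purely for illustration; the cosine over binary rows is likewise an assumption about how "overlap" is measured.

```python
import math

# Hypothetical term/context occurrences (one set of context ids per term).
OCCURRENCES = {
    "beer": {"c1", "c2", "c4", "c7"},
    "glass": {"c1", "c2", "c4", "c8"},
    "spoon": {"c3", "c5", "c9"},
}


def context_cosine(a: set, b: set) -> float:
    """Cosine between two binary rows of a term/context matrix."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


print(context_cosine(OCCURRENCES["beer"], OCCURRENCES["glass"]))  # shared contexts
print(context_cosine(OCCURRENCES["beer"], OCCURRENCES["spoon"]))  # no shared context
```

Terms that are used in the same contexts get similar rows, which is the "light semantics" the distributional approach provides without any external knowledge source.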
    • 83. distributional models: recap
        • models for representing terms/documents in large vector spaces
        • light semantics
        • it is simple to calculate similarities between words
        • but the high dimensionality problem gets even worse!
    • 84. a novel recommendation framework based on VSM: eVSM
        • step 1: modeling semantics
        • step 2: dimensionality reduction
        • step 3: modeling negation
        • step 4: building user profiles
        • step 5: providing suggestions
    • 85. Random Indexing (Sahlgren, 2005)
        • Sahlgren, M. An Introduction to Random Indexing. Proceedings of the Methods and Applications of Semantic Indexing Workshop, TKE 2005.
    • 86. dimensionality reduction: random indexing
        • Strengths
        • Incremental approach
        • Based on the distributional hypothesis
        • Builds a small-scale semantic vector space representation
    • 87. random indexing
        • Input: n-dimensional term-document matrix
        • Output: k-dimensional term-context matrix, with k << n
        • Approximation built upon the distributional hypothesis
        • Based on contexts, but much more compact!
    • 88. random indexing, dimensionality reduction [figure: a term/document matrix (terms t1-t5, documents d1...dn) is reduced to a term/context matrix (terms t1-t5, contexts c1...ck), with n >> k]
    • 89. random indexing, dimensionality reduction: k is a simple parameter of the model
    • 90. random indexing, dimensionality reduction: the smaller k, the higher the efficiency and the loss of information
    • 91. random indexing: some literature
        • Roots: sparse distributed representations (Kanerva, 1988); studies about Random Projection
        • State-of-the-art applications: clustering text documents (Kohonen, 2000); image data compression (Bingham, 2001); Information Retrieval (Basile, 2010); collaborative filtering (Cisielczyk, 2010)
        • Never exploited for CBRS.
    • 92. How to obtain the smaller k-dimensional representation?
    • 93. random indexing algorithm
        • (1) Definition of the context: document? paragraph? sentence? word?
        • (2) Each ‘context’ is assigned a context vector
        • Dimension of the vector = k
        • Allowed values = {-1, 0, 1}
        • Constraint: the non-zero elements have to be far fewer than the zero elements
        • Values distributed in a random way
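
Step (2) can be sketched as follows. The function name `random_context_vector`, the `nonzero` parameter and the choice to alternate +1 and -1 are assumptions made for this illustration; the slides only require a k-dimensional vector over {-1, 0, 1} with few, randomly placed non-zero entries.

```python
import random
from typing import List


def random_context_vector(k: int, nonzero: int, rng: random.Random) -> List[int]:
    """Build a sparse k-dimensional vector over {-1, 0, 1}.

    `nonzero` positions are drawn at random; they are filled with +1 and -1
    alternately, so the two signs are (almost) balanced.
    """
    if nonzero > k:
        raise ValueError("nonzero must not exceed k")
    vector = [0] * k
    positions = rng.sample(range(k), nonzero)
    for n, position in enumerate(positions):
        vector[position] = 1 if n % 2 == 0 else -1
    return vector


rng = random.Random(42)  # fixed seed only to make the sketch reproducible
rc = random_context_vector(k=8, nonzero=2, rng=rng)
print(rc)
```

In a realistic setting k would be in the hundreds or thousands and `nonzero` a small handful, which keeps the vectors cheap to store and to add.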
    • 94. random indexing: context vectors, k=8
        • rc1 = (0, 0, -1, 1, 0, 0, 0, 0)
        • rc2 = (1, 0, 0, 0, 0, 0, 0, -1)
        • rc3 = (0, 0, 0, 0, 0, -1, 1, 0)
        • rc4 = (-1, 1, 0, 0, 0, 0, 0, 0)
        • rc5 = (0, 0, 0, -1, 1, 0, 0, 0)
    • 95. random indexing algorithm
        • (3) The vector space representation of a term t is obtained by combining the random vectors of the contexts it occurs in.
        • Example: t1 ∈ {c1, c2}, with the context vectors of slide 94
    • 96. random indexing algorithm
        • (3) The vector space representation of a term t is obtained by combining the random vectors of the contexts it occurs in.
        • Example: t1 ∈ {c1, c2}
        • rc1 = (0, 0, -1, 1, 0, 0, 0, 0)
        • rc2 = (1, 0, 0, 0, 0, 0, 0, -1)
        • t1 = rc1 + rc2 = (1, 0, -1, 1, 0, 0, 0, -1)
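
The worked example of step (3) can be reproduced in a few lines, using the context vectors rc1, rc2 and rc3 given on the slides. Reading "combining" as a component-wise sum matches the t1 shown on slide 96; the helper name `term_vector` is hypothetical.

```python
from typing import Dict, List

# Context vectors copied from the slides (k = 8).
CONTEXT_VECTORS: Dict[str, List[int]] = {
    "c1": [0, 0, -1, 1, 0, 0, 0, 0],
    "c2": [1, 0, 0, 0, 0, 0, 0, -1],
    "c3": [0, 0, 0, 0, 0, -1, 1, 0],
}


def term_vector(contexts: List[str]) -> List[int]:
    """Sum the random vectors of all contexts a term occurs in."""
    rows = [CONTEXT_VECTORS[c] for c in contexts]
    return [sum(column) for column in zip(*rows)]


t1 = term_vector(["c1", "c2"])  # t1 occurs in contexts c1 and c2
print(t1)  # [1, 0, -1, 1, 0, 0, 0, -1]
```

Because the update is a plain addition, a new context can be folded into an existing term vector at any time, which is what makes the approach incremental.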
    • 97. random indexing algorithm • (3) The vector space representation of a term t is obtained by combining the random vectors of the contexts it occurs in. • (4) The vector space representation of a document d is obtained by combining the vector space representation of the terms that occur in the document.
    • 98. random indexing algorithm • (3) The vector space representation of a term t is obtained by combining the random vectors of the contexts it occurs in. Output: WordSpace • (4) The vector space representation of a document d is obtained by combining the vector space representation of the terms that occur in the document.
    • 99. random indexing algorithm • (3) The vector space representation of a term t is obtained by combining the random vectors of the contexts it occurs in. • (4) The vector space representation of a document d is obtained by combining the vector space representation of the terms that occur in the document. Output: DocSpace
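Step (4) can be sketched in the same way. The vector for t1 is the one from the worked example; the vectors for t2 and t3 and the two documents are hypothetical. Counting a term once per occurrence (so that repeated terms weigh more) is an assumption of this sketch, not something the slides state.

```python
def add_vectors(vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(components) for components in zip(*vectors)]

# WordSpace: one vector per term.
word_space = {
    "t1": [1, 0, -1, 1, 0, 0, 0, -1],  # from the worked example
    "t2": [0, 0, 0, 0, 0, -1, 1, 0],   # hypothetical
    "t3": [-1, 1, 0, 0, 0, 0, 0, 0],   # hypothetical
}

# Hypothetical documents, given as lists of term occurrences.
documents = {
    "d1": ["t1", "t2"],
    "d2": ["t2", "t3", "t2"],
}

# DocSpace: one vector per document, in the same k dimensions as the WordSpace.
doc_space = {
    doc: add_vectors([word_space[term] for term in terms])
    for doc, terms in documents.items()
}
print(doc_space["d1"])  # [1, 0, -1, 1, 0, -1, 1, -1]
print(doc_space["d2"])  # [-1, 1, 0, 0, 0, -2, 2, 0]
```

Because terms and documents live in the same k-dimensional space, the same similarity function can compare either kind of vector.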
    • 100. random indexing • WordSpace: terms t1 ... t5 as rows over the columns c1 ... ck; comparison between terms • DocSpace: documents d1 ... d5 as rows over the columns c1 ... ck; comparison between documents • Uniform representation
    • 101. Dimensionality reduction is obtained upon a set of random vectors. Does it sound weird?
    • 102. random indexing theoretical basis • Johnson-Lindenstrauss Lemma (*) • Distances between points are approximately preserved. • Constraint: orthogonal vectors • Random Indexing vectors are nearly orthogonal. • The loss of information depends on the parameter k • (*) Johnson, W. and Lindenstrauss, J. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 1984
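The near-orthogonality claim can be checked empirically. The sketch below draws sparse ternary vectors and measures the mean absolute cosine over all pairs; values close to 0 mean the vectors are nearly orthogonal. The sizes (k=1000, 20 non-zero entries, 30 vectors) and the balanced +1/-1 split are assumptions chosen for illustration, not settings taken from the defense.

```python
import math
import random

def make_context_vector(k, nonzero, rng):
    """Sparse ternary vector with nonzero/2 entries at +1 and nonzero/2 at -1."""
    vector = [0] * k
    positions = rng.sample(range(k), nonzero)
    for index, pos in enumerate(positions):
        vector[pos] = 1 if index < nonzero // 2 else -1
    return vector

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

rng = random.Random(0)
k, nonzero, count = 1000, 20, 30
vectors = [make_context_vector(k, nonzero, rng) for _ in range(count)]

pairs = [(i, j) for i in range(count) for j in range(i + 1, count)]
mean_abs_cos = sum(abs(cosine(vectors[i], vectors[j])) for i, j in pairs) / len(pairs)
print(f"mean |cosine| over {len(pairs)} pairs: {mean_abs_cos:.4f}")
```

Lowering k while keeping the number of non-zero entries fixed makes overlaps more likely, which is one way to see why the loss of information depends on k.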
    • 103. random indexing Johnson-Lindenstrauss lemma
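The formula shown on this slide did not survive in the transcript. For orientation, a common textbook statement of the lemma is given here; it is not necessarily the exact form presented in the defense, and the symbols (ε, n, d, C, f) are the usual textbook ones rather than the author's.

```latex
\[
\text{Let } 0 < \varepsilon < 1 \text{ and let } X \subset \mathbb{R}^{d} \text{ be a set of } n \text{ points. If }
k \ge C\,\varepsilon^{-2}\ln n \text{ for a suitable constant } C,
\]
\[
\text{then there is a linear map } f:\mathbb{R}^{d}\to\mathbb{R}^{k} \text{ such that, for all } u, v \in X,
\]
\[
(1-\varepsilon)\,\lVert u-v\rVert^{2} \;\le\; \lVert f(u)-f(v)\rVert^{2} \;\le\; (1+\varepsilon)\,\lVert u-v\rVert^{2}.
\]
```

Read in the context of Random Indexing: projecting onto k random, nearly orthogonal directions keeps pairwise distances approximately intact, with the distortion controlled by k.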
    • 104. eVSM: a novel recommendation framework based on VSM • step 1: modeling semantics • step 2: dimensionality reduction • step 3: modeling negation • step 4: building user profiles • step 5: providing suggestions
    • 105. quantum negation (Widdows, 2007) • Sahlgren, M. An Introduction to Random Indexing. Proceedings of the Methods and Applications of Semantic Indexing Workshop, TKE 2005.
    • 106. negation in VSMs state of the art • State-of-the-art approaches: poor theoretical background • Post-retrieval filtering, Rocchio Algorithm (Rocchio, 1971) • Widdows proposed a different point of view • Negation viewed as a form of orthogonality between vectors • Vision inherited from Quantum Logic
    • 107. negation in VSMs Quantum Negation • Some theory • Given vector a and vector b • Through quantum negation it is possible to define a vector a not b (a ∧¬b) • Projection of vector a onto the subspace orthogonal to the one generated by vector b
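The projection described above can be written as a NOT b = a - ((a·b) / (b·b)) b. The following sketch implements that formula with plain lists; the function names `dot` and `negate` and the example vectors are illustrative.

```python
def dot(a, b):
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))

def negate(a, b):
    """Return a NOT b: the component of a that is orthogonal to b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)  # nothing to remove when b is the zero vector
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]

a = [3.0, 1.0, 0.0]
b = [1.0, 0.0, 0.0]
a_not_b = negate(a, b)
print(a_not_b)          # [0.0, 1.0, 0.0]
print(dot(a_not_b, b))  # 0.0
```

The result has zero inner product with b, which is the sense in which negation is modeled as orthogonality.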
    • 108. quantum negation application to CBRS • Vector A models positive feedback • Information about what a user likes • Vector B models negative feedback • Information about what a user does not like • Vector A not B combines both information sources
    • 109. eVSM building blocks - recap • Distributional Models • Light semantic modeling • Random Indexing (Sahlgren, 2005) • Incremental technique for dimensionality reduction • Quantum Negation (Widdows, 2007) • Negation operator based on Quantum Logic
    • 110. eVSM building blocks - recap • A content-based recommendation framework needs to: • Represent items • Build user profiles • Provide suggestions • Random Indexing and Quantum Negation provide a novel representation model.
    • 111. eVSM: a novel recommendation framework based on VSM • step 1: modeling semantics • step 2: dimensionality reduction • step 3: modeling negation • step 4: building user profiles • step 5: providing suggestions
    • 112. eVSM building user profiles • Represent profiles in eVSM • Vector space representation • Obtained by combining the vectors of the items the user liked • How? • Four different profiling models
    • 113. User Profiles: Random Indexing-based (RI) • [Formula; labeled parts: items, rating, threshold] • VSM representation of RI-based profile for user u
    • 114. User Profiles: Quantum Negation-based (QN) • [Formula; labeled parts: positive user profile vector, negative user profile vector] • VSM representation of QN-based profile for user u
    • 115. User Profiles: Weighted Random Indexing-based (w-RI) • [Formula; labeled parts: items, rating, threshold] • Higher weight given to the documents with higher rating
    • 116. User Profiles: Weighted Quantum Negation-based (w-QN) • [Formula; labeled parts: positive user profile vector, negative user profile vector] • VSM representation of w-QN-based profile for user u
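The profile formulas themselves are not preserved in this transcript, so the following is only a sketch of how the four models could be realized from the surviving labels. Everything numeric here is an assumption: the rating scale (1 to 5), the weights (`rating / max_rating` for liked items, `(max_rating - rating) / max_rating` for disliked ones), and the example items. The function name `build_profile` is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def negate(a, b):
    """a NOT b: remove from a its component along b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]

def build_profile(ratings, item_vectors, threshold,
                  weighted=False, use_negation=False, max_rating=5):
    """Sketch of the RI, w-RI, QN and w-QN profiles (weights are assumptions)."""
    k = len(next(iter(item_vectors.values())))
    positive = [0.0] * k
    negative = [0.0] * k
    for item, rating in ratings.items():
        vec = item_vectors[item]
        if rating >= threshold:
            # Liked item: contributes to the positive profile vector.
            w = rating / max_rating if weighted else 1.0
            positive = [p + w * x for p, x in zip(positive, vec)]
        elif use_negation:
            # Disliked item: contributes to the negative profile vector.
            w = (max_rating - rating) / max_rating if weighted else 1.0
            negative = [n + w * x for n, x in zip(negative, vec)]
    return negate(positive, negative) if use_negation else positive

# Hypothetical items and ratings.
item_vectors = {"m1": [1, 0, 0, 0], "m2": [0, 1, 0, 0], "m3": [0, 1, 1, 0]}
ratings = {"m1": 5, "m2": 4, "m3": 1}

ri = build_profile(ratings, item_vectors, threshold=3)
w_ri = build_profile(ratings, item_vectors, threshold=3, weighted=True)
qn = build_profile(ratings, item_vectors, threshold=3, use_negation=True)
w_qn = build_profile(ratings, item_vectors, threshold=3, weighted=True, use_negation=True)
print(ri, w_ri, qn, w_qn, sep="\n")
```

In both negation-based variants the resulting profile is orthogonal to the vector built from the disliked items.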
    • 117. eVSM: a novel recommendation framework based on VSM • step 1: modeling semantics • step 2: dimensionality reduction • step 3: modeling negation • step 4: building user profiles • step 5: providing suggestions
    • 118. eVSM providing suggestions - monolingual scenario • DocSpace: items d1 ... d4 and profile p as rows over the columns c1 ... ck • All the items are vectors in a DocSpace
    • 119. eVSM providing suggestions - monolingual scenario • DocSpace: items d1 ... d4 and profile p as rows over the columns c1 ... ck • The profile is a vector in a DocSpace
    • 120. eVSM providing suggestions - monolingual scenario • DocSpace: items d1 ... d4 and profile p as rows over the columns c1 ... ck • Similarity calculation between p and each item
    • 121. Some maths (1/2) • Let • U be the set of users • I be the set of items • Given • active user u ∈ U
    • 122. Some maths (2/2) • For each pair (u, i_j) • For both user u and item i a vector space representation is provided • u = (f_u1, f_u2, ..., f_un) • i = (f_i1, f_i2, ..., f_in) • Calculate sim(u, i_j) • Cosine similarity • Order the items i_j by descending similarity • Return the top-k elements
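The ranking procedure above (cosine similarity between the profile and every item, descending order, top-k) can be sketched as follows. The function names and the example vectors are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(profile, item_vectors, top_k):
    """Rank items by descending cosine similarity to the profile."""
    scored = [(cosine(profile, vec), item) for item, vec in item_vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_k]]

# Hypothetical user profile and item vectors in the same DocSpace.
profile = [1.0, 0.5, -0.5, 0.0]
items = {
    "i1": [1.0, 0.0, 0.0, 0.0],
    "i2": [0.0, 1.0, 1.0, 0.0],
    "i3": [1.0, 1.0, 0.0, 0.0],
}
print(recommend(profile, items, top_k=2))  # ['i3', 'i1']
```

Here i2 scores 0 because it is orthogonal to the profile, so it is ranked last.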
    • 123. Similarity-based recommendations • Relevance of an item seen as a form of similarity • The most similar items are returned to the target user
    • 124. What about multilingual recommendations?
    • 125. eVSM providing suggestions - multilingual scenario • eVSM for multilingual recommendations • Assumption • The distribution of the terms is (almost) language-independent • Example term pairs: drink / bere, beer / birra, glass / bicchiere
    • 126. eVSM providing suggestions - multilingual scenario • eVSM for multilingual recommendations • Assumption • The distribution of the terms is (almost) language-independent • The position of the concept of beer in a WordSpace will always be the same, regardless of the language!
    • 127. (english) WordSpace: beer, wine, spoon, dog
    • 128. (italian) WordSpace: birra, vino, cucchiaio, cane • Relationships between terms stay the same regardless of the language!
    • 129. eVSM providing suggestions - multilingual scenario • DocSpace for L1 (italian) and DocSpace for L2 (english): documents d1 ... d5 as rows over the columns c1 ... ck • Parallel DocSpaces built upon the same set of random vectors
    • 130. eVSM providing suggestions - multilingual scenario • DocSpace for L1 and DocSpace for L2: parallel DocSpaces built upon the same set of random vectors • p_L1: user profile in L1 (italian)
    • 131. eVSM providing suggestions - multilingual scenario • DocSpace for L1 and DocSpace for L2: parallel DocSpaces built upon the same set of random vectors • We can project the user profile p_L1 into the DocSpace of English items
    • 132. eVSM providing suggestions - multilingual scenario • DocSpace for L1 and DocSpace for L2: parallel DocSpaces built upon the same set of random vectors • Similarity computations of the Italian profile p_L1 with English items to build multilingual recommendations
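A toy sketch of the multilingual idea follows. The slides only state that the parallel DocSpaces are built upon the same set of random vectors; how contexts are aligned across languages is not described here, so this sketch simply assumes that each context exists in an Italian and an English version sharing one random vector. The tiny corpora and all names are hypothetical.

```python
import math

def add_vectors(vectors):
    return [sum(components) for components in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# One random vector per aligned context, shared by both languages (assumption).
context_vectors = {
    "c1": [0, 0, -1, 1, 0, 0, 0, 0],
    "c2": [1, 0, 0, 0, 0, 0, 0, -1],
    "c3": [0, 0, 0, 0, 0, -1, 1, 0],
}
contexts_it = {"c1": ["birra", "bicchiere"], "c2": ["birra", "bere"], "c3": ["cane"]}
contexts_en = {"c1": ["beer", "glass"], "c2": ["beer", "drink"], "c3": ["dog"]}

def build_word_space(contexts):
    """Sum, for every term, the vectors of the contexts it occurs in."""
    space = {}
    for ctx, terms in contexts.items():
        for term in terms:
            current = space.get(term, [0] * len(context_vectors[ctx]))
            space[term] = add_vectors([current, context_vectors[ctx]])
    return space

ws_it = build_word_space(contexts_it)
ws_en = build_word_space(contexts_en)

# Italian profile compared directly with English items.
profile_it = add_vectors([ws_it["birra"], ws_it["bere"]])
items_en = {
    "d_beer": add_vectors([ws_en["beer"], ws_en["glass"]]),
    "d_dog": add_vectors([ws_en["dog"]]),
}
scores = {name: cosine(profile_it, vec) for name, vec in items_en.items()}
print(scores)
```

In this toy setup "birra" and "beer" end up with identical vectors because they occur in the same aligned contexts, so the Italian profile is closer to the English item about beer than to the one about dogs.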
    • 133. Multilingual recommendations come at no cost, thanks to the distributional hypothesis.
    • 134. experimental evaluation applications
    • 135. evaluation of eVSM • selected experiments • movie recommendation • monolingual scenario • Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis: Random Indexing and Negative User Preferences for Enhancing Content-Based Recommender Systems. EC-Web 2011, 270-281 • multilingual scenario • Cataldo Musto, Fedelucio Narducci, Pierpaolo Basile, Pasquale Lops, Marco de Gemmis, Giovanni Semeraro: Cross-Language Information Filtering: Word Sense Disambiguation vs. Distributional Models. AI*IA 2011 • EPG personalization • Cataldo Musto, Fedelucio Narducci, Pasquale Lops, Giovanni Semeraro, Marco de Gemmis, Mauro Barbieri, Jan H. M. Korst, Verus Pronk, Ramon Clout: Enhanced Semantic TV-Show Representation for Personalized Electronic Program Guides. UMAP 2012 (to be presented)
    • 136. movie recommendation ‘in vitro’ experiments • Goal: to provide users with recommendations about movies worth watching. • Subset of 100k MovieLens dataset + Wikipedia content • Monolingual and multilingual settings
    • 137. monolingual experiment parameter tuning • Size of context vectors • k = 50, 100, 200, 400 • 99% reduction of DocSpace • original size: 25k • Profiling models • RI, w-RI, QN, w-QN • Weighted vs. unweighted • With negation vs. without negation
    • 138. experimental design experiments • Experiment 1 • Do the weighting scheme and the introduction of a negation operator improve the predictive accuracy of the recommendation models? • Experiment 2 • How does the model perform with respect to other state-of-the-art approaches?
    • 139. experiment 1: size=100, MovieLens dataset • [Bar chart: P@1, P@3, P@5 and P@10 for RI, WRI, QN and WQN; y-axis from 84 to 87] • Weighted vs. unweighted: improvement under 0.2%
    • 140. experiment 1: size=100, MovieLens dataset • [Bar chart: P@1, P@3, P@5 and P@10 for RI, WRI, QN and WQN; y-axis from 84 to 87] • Weighted vs. unweighted: improvement under 0.2%
    • 141. experiment 1: size=100, MovieLens dataset • [Bar chart: P@1, P@3, P@5 and P@10 for RI, WRI, QN and WQN; y-axis from 84 to 87] • Peak: +0.52 • However, differences are not statistically significant
    • 142. experiment 1: size=400, MovieLens dataset • [Bar chart: P@1, P@3, P@5 and P@10 for RI, WRI, QN and WQN; y-axis from 84 to 87] • Negation vs. no negation: improvement under 0.5%
    • 143. experiment 1: size=100, MovieLens dataset • [Bar chart: P@1, P@3, P@5 and P@10 for RI, WRI, QN and WQN; y-axis from 84 to 87] • Gap: +1.08 • Some exceptions: P@1 and P@3, comparing W-RI vs. W-QN
    • 144. experiment 1: size=100, MovieLens dataset • [Bar chart: P@1, P@3, P@5 and P@10 for RI, WRI, QN and WQN; y-axis from 84 to 87] • Gap: +0.77 • The use of the negation operator improves the accuracy in a significant way.
    • 145. experiment 1: size=100, MovieLens dataset • [Bar chart: P@1, P@3, P@5 and P@10 for RI, WRI, QN and WQN; y-axis from 84 to 87] • Gap: +1.08 • Peaks in P@1 and P@3 are statistically significant
    • 146. experiment 1: size=100, MovieLens dataset • [Bar chart: P@1, P@3, P@5 and P@10 for RI, WRI, QN and WQN; y-axis from 84 to 87] • Generally speaking, the W-QN configuration outperforms the others.
    • 147. experiment 1: size=100, MovieLens dataset • [Bar chart: P@1, P@3, P@5 and P@10 for RI, WRI, QN and WQN; y-axis from 84 to 87] • Gap: +1.4% • The combined use of weighting and negation significantly improves the accuracy
    • 148. experiment 1: impact of negation operator and weighting scheme • context vectors - size: 50, 100, 200, 400 • P@1: ✔ ✔ ✔ • P@3: ✔ ✔ ✔ • P@5: • P@10: ✔ • ✔ = statistical significance
    • 149. experiment 1: impact of negation operator and weighting scheme • context vectors - size: 50, 100, 200, 400 • P@1: ✔ ✔ ✔ • P@3: ✔ ✔ ✔ • P@5: • P@10: ✔ • The combined use of weighting and negation significantly improves the accuracy
    • 150. experiment 2: size=400, MovieLens dataset • [Bar chart: P@1, P@3, P@5 and P@10 for eVSM, VSM, LSI and Bayes; y-axis from 84 to 87] • Gap always around 1%
    • 151. experiment 2: size=400, MovieLens dataset • [Bar chart: P@1, P@3, P@5 and P@10 for eVSM, VSM, LSI and Bayes; y-axis from 84 to 87] • Significant improvement
    • 152. experiment 2: eVSM vs. state-of-the-art approaches • context vectors - size: 100, 400 • P@1: ✔ ✔ • P@3: ✔ ✔ • P@5: • P@10: ✔ • ✔ = statistical significance
    • 153. experiment 2: eVSM vs. state-of-the-art approaches • context vectors - size: 100, 400 • P@1: ✔ ✔ • P@3: ✔ ✔ • P@5: • P@10: ✔ • With vectors of size=400 eVSM significantly outperforms state-of-the-art approaches.
    • 154. multilingual scenario experimental design • Experiment 1 • ENG-ITA: learning user profiles in English, recommending items in Italian • Experiment 2 • ITA-ENG: learning user profiles in Italian, recommending items in English • Experiment 3 • ENG-ENG: learning user profiles in English, recommending items in English • Experiment 4 • ITA-ITA: learning user profiles in Italian, recommending items in Italian
    • 155. experimental results - P@5 (experiment: w-QN / w-RI) • eng-ita: 84.65 / 84.65 • ita-eng: 84.85 / 84.63 • eng-eng: 85.23 / 85.29 • ita-ita: 85.27 / 84.84 • outcome: monolingual slightly better than multilingual
    • 156. experimental results - P@5 (experiment: w-QN / w-RI) • eng-ita: 84.65 / 84.65 • ita-eng: 84.85 / 84.63 • eng-eng: 85.23 / 85.29 • ita-ita: 85.27 / 84.84 • outcome: no significant difference between results
    • 157. experimental results - P@5 (experiment: w-QN / w-RI) • eng-ita: 84.65 / 84.65 • ita-eng: 84.85 / 84.63 • eng-eng: 85.23 / 85.29 • ita-ita: 85.27 / 84.84 • outcome: multilingual recommendations as good as monolingual ones
    • 158. EPGs personalization ‘in vitro’ experiments • Personalization of TV shows for EPGs • Philips - Aprico.tv dataset
    • 159. Watchmi plug-in developed by Aprico.tv ‘in vitro’ experiments
    • 160. scenario EPGs personalization • A user profile can be built as a set of genres (program types) the user likes • The goal is to provide the user with a set of suggestions • What TV shows should she watch?
    • 161. dataset Aprico.tv data • TV shows gathered from a set of 47 German-language channels • Provided by Axel Springer • TV shows textual features • title, synopsis, description • program type (Movie, Sport, Documentary, Magazine, etc.)
    • 162. description of the task • retrieval task • Given a set of program types and a repository of TV shows • We want to retrieve the shows that belong to a specific program type
    • 163. retrieval task Comparison of the approaches • Random Indexing (baseline) • Each program type has a (compressed) vector space representation • Each TV show has a (compressed) vector space representation • Cosine Similarity • Given a program type (input), a set of n TV shows is returned, ordered by descending cosine similarity
    • 164. retrieval task Comparison of the approaches • Random Indexing + Quantum Negation • As for the classification task. • For each program type both a positive and a negative vector space representation are provided • Quantum Negation used to model the TV shows that do not belong to a certain program type • Cosine Similarity • Given a program type (input), a set of n TV shows is returned, ordered by descending cosine similarity
    • 165. Philips Aprico.tv dataset • TV shows: 133,579 • program types: 17 • features: 306,006 • Dataset largely unbalanced, e.g. 40k TV series, 25k documentaries, 15k movies but only 2k sport TV shows.
    • 166. details • Metric • P@n • Parameters • Dimension of the vectors • 500, 1000, 1500, 2000 • Minimum number of occurrences • 1, 3
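P@n (precision at n) is the fraction of the first n returned items that are relevant. A minimal sketch, with hypothetical show identifiers, is shown below; dividing by n even when fewer than n items are returned is one common convention and is an assumption here.

```python
def precision_at_n(ranked_items, relevant_items, n):
    """Fraction of the top-n ranked items that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    top = ranked_items[:n]
    hits = sum(1 for item in top if item in relevant_items)
    return hits / n

# Hypothetical ranking returned for one program type, and the shows
# that truly belong to that program type.
ranked = ["s1", "s2", "s3", "s4", "s5"]
relevant = {"s1", "s3", "s4"}

print(precision_at_n(ranked, relevant, 1))  # 1.0
print(precision_at_n(ranked, relevant, 5))  # 0.6
```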
    • 167. retrieval task results - P@n: 82.6%, 66.3%
    • 168. retrieval task results - P@n: 65.9%, 45.2%
    • 169. retrieval task results - P@n: 58.1%, 36.5%
    • 170. EPGs personalization recap of the experiment • Good results. Around 85% precision in the first ten results • RI + QN significantly outperforms RI (around +20%) • Size of context vectors does not influence the precision
    • 171. other experiments a brief summary
    • 172. other experiments a brief summary • Content-based Movie Recommendation • MovieLens dataset • Size of context vectors does not affect the predictive accuracy • Yahoo! WebScope dataset • Similar outcomes • Weighting did not provide a significant improvement; negation provides a significant improvement • EPGs Personalization • RI for Classification task • RI performs worse than VSM in the task of classifying TV shows • Comparable results in terms of macro-average among program types
    • 173. other experiments a brief summary • Content-based Music Recommendation • Platform for Music Recommendation • Extraction of user profiles by mining Facebook data, eVSM for providing recommendations. Evaluation of the effectiveness of social media for recommendation tasks. • Platform for playlist personalization • Personalization based on eVSM works better than a personalization baseline based on DBpedia (Linked Data)
    • 174. recap and contributions.
    • 175. information overload.
    • 176. IR-based CBRS.
    • 177. building blocks distributional models.
    • 178. building blocks random indexing.
    • 179. building blocks quantum negation.
    • 180. contributions richer representation based on VSM.
    • 181. contributions framework for multilingual recommendations
    • 182. contributions eVSM
    • 183. Now we can answer the Research Question: Yes. It is possible to exploit IR-based techniques in CBRS area for developing a novel content-based recommendation framework.
    • 184. future research.
    • 185. evaluation with user-based metrics (serendipity, novelty, unexpectedness)
    • 186. open knowledge sources and social media for CBRS.
    • 187. linked data.
    • 188. modeling context.
    • 189. end.
    • 190. questions?
    • 191. “Qualunque cosa tu possa fare, qualunque sogno tu possa sognare, comincia. L’audacia reca in sé genialità, magia e forza. Comincia ora.” Goethe. “Whatever you can do, or dream you can do, begin. Boldness has genius, magic and power in it. Begin it now.” Thank you.
