The document discusses using random indexing and quantum negation for TV show retrieval and classification. It presents an approach using these techniques to build a recommender system for personalized electronic program guides. The key tasks are classifying TV shows into categories like movies, sports, and documentaries, and retrieving shows that users would like based on their preferences and past viewing behavior. The document outlines issues with existing vector space models and how random indexing and quantum negation can help address these issues for real-world recommendation scenarios.
Random Indexing and Quantum Negation for TV-Shows Retrieval and Classification
1. High Tech Campus, Philips Research
Eindhoven, Netherlands
Random Indexing and Quantum
Negation for TV-Shows
Retrieval and Classification
Cataldo Musto, Ph.D. Student
cataldomusto@di.uniba.it - cataldo.musto@philips.com
University of Bari “Aldo Moro” (Italy), SWAP Research Group
Philips Research Center - Eindhoven (Netherlands) - HI&E Group
14.07.11
2. outline
• part 1: introduction
• information overload, personalization, information filtering, recommender
systems
• part 2: approaches
• vector space model, random indexing, quantum negation
• part 3: scenario
• tv-show recommendation, description of the data, description of the tasks
• part 4: experimental evaluation
• results, discussion, future work
C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
3. part 1: introduction
what are we talking about?
4. TV
5. text messages
6. phone calls
7. internet navigation
8. scenario
• Daily interaction with electronic
devices
• eMail, Web navigation, Social
media, instant messaging
• Continuous flow of
information
• In 2007, 500,000 terabytes of information were produced on the Web in one year
• Including telephone, radio, TV and so on, the total reaches 18 exabytes of data!
9. information overload
• Consequences:
cognitive overload
• It is impossible to
effectively deal with
this surplus of
information
• It is difficult to quickly
find the information
we really need
11. information filtering
“An information filtering system is a system that removes redundant or unwanted information from an information stream using automated methods” (Wikipedia)
12. information filtering systems
• How do they work?
• Usually, in three steps
• Training Step
• User Modeling
• Filtering
13. Step 1:
Training
14. Step 2:
User Modeling
15. Step 3:
Filtering
16. recommender systems
• A specific type of information filtering system that attempts to recommend information items (films, television, video on demand, music, books, etc.) that are likely to be of interest to the user
• Every day we interact with recommender systems, even if we do not know it!
17. Amazon
18. YouTube
19. recommendation approaches
• Content-based filtering
• No interactions between users. Each user is an atomic entity
• Prerequisite: each item to be recommended has to be described through a set of textual features
• The user profile stores the features that often occur in the items she likes
• Assumption: if a user usually likes items whose descriptions often contain a specific feature, we can assume that she will also like items with that feature in the future
• e.g.
• If User_A likes a news item with the features “Football” and “Internazionale FC”
• We can recommend her other news items about either Football or Internazionale FC
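The content-based idea above can be sketched in a few lines of Python. This is only an illustration, not the system described in the talk: the function names, the feature sets and the candidate items are all hypothetical, and the scoring rule (sum of profile counts) is an assumption chosen for simplicity.

```python
from collections import Counter

def build_profile(liked_items):
    """Count how often each feature occurs in the items the user liked."""
    profile = Counter()
    for features in liked_items:
        profile.update(features)
    return profile

def score(profile, item_features):
    """Score an item by summing the profile counts of its features."""
    return sum(profile[feature] for feature in item_features)

# Hypothetical items liked by User_A, each described by textual features.
liked = [{"Football", "Internazionale FC"}, {"Football", "Serie A"}]
profile = build_profile(liked)

# Hypothetical candidate news items.
candidates = {
    "news_1": {"Football", "Transfer market"},
    "news_2": {"Cooking", "Recipes"},
}
ranking = sorted(candidates,
                 key=lambda name: score(profile, candidates[name]),
                 reverse=True)
print(ranking)
```

Items sharing features with the profile are ranked first; no information from other users is needed, which is what makes each user an atomic entity.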
20. part 2: approaches
vector space model, random indexing,quantum negation
21. vector space model
• Introduced by Salton in
1975
• Given a set of M documents (items) d = (d_1, ..., d_M)
• Given N features describing the documents
• Each document (item) is represented in an N-dimensional vector space
• The whole corpus is represented in an N×M matrix called the term/document matrix
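As a rough illustration of the term/document matrix, the sketch below builds an N×M matrix of raw term counts from three toy documents. The helper name `term_document_matrix` and the sample documents are made up for this example; raw counts are an assumption, and any other weighting could be used in their place.

```python
def term_document_matrix(docs):
    """Return the sorted vocabulary (N terms) and an N x M matrix of raw counts."""
    vocab = sorted({term for doc in docs for term in doc})
    matrix = [[doc.count(term) for doc in docs] for term in vocab]
    return vocab, matrix

# Three toy documents (M = 3), already tokenized.
docs = [
    ["football", "match", "football"],
    ["movie", "drama"],
    ["football", "movie"],
]
vocab, matrix = term_document_matrix(docs)

# One row per term, one column per document.
for term, row in zip(vocab, matrix):
    print(term, row)
```

Each column is the vector of one document in the N-dimensional space spanned by the vocabulary.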
22. vector space model
• VSM in a recommendation scenario
• Document: point in the vector space
• User profile: point in the vector space
• e.g. built as the sum of the vector space representation of the documents
liked in the past by the user
• Goal: to find the documents that are the most relevant ones for that user profile
• Assumption
• the most similar documents in the vector space are the most
relevant ones
• Cosine Similarity to compute the similarity between query and
documents
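A minimal sketch of this scenario, assuming documents are already available as count vectors over a shared vocabulary: the user profile is the sum of the liked documents, and candidates are ranked by cosine similarity to it. All vectors and names (`show_a`, `show_b`) are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def vector_sum(vectors):
    """Component-wise sum of a non-empty list of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]

# Hypothetical vectors of the documents the user liked in the past.
liked_docs = [[2, 0, 1, 0], [1, 0, 0, 0]]
user_profile = vector_sum(liked_docs)

# Hypothetical candidate documents in the same vector space.
candidates = {"show_a": [1, 0, 1, 0], "show_b": [0, 3, 0, 1]}
ranked = sorted(candidates,
                key=lambda name: cosine(user_profile, candidates[name]),
                reverse=True)
print(user_profile, ranked)
```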
23. vsm analysis (2)
• Weak Points
• Not incremental
• The whole Vector Space has to be generated from scratch
whenever a new item is added to the repository
• High Dimensionality
• NLP operations (stopwords elimination, stemming and so on)
• Does not manage negative evidence
• The vector space representation only depends on the features that occur in the document; no assumption is made about the features that do not occur
• Does not manage the latent semantics of documents
• Any permutation of the terms in a document has the same VSM representation!
24. idea
• To introduce tools and techniques
able to overcome these drawbacks
• Random Indexing
• Dimensionality reduction technique
Sahlgren, 2005
• Quantum Negation
• Based on Quantum Logic
Widdows, 2007
25. random indexing
• Random Indexing (RI) is an incremental and effective
technique for dimensionality reduction
• Distributional Models
• Assumption: we can infer information about terms by analyzing how they are used in a large corpus of data
• Based on the so-called “Distributional Hypothesis”
• “Words that occur in the same context tend to have
similar meanings”
• “Meaning is its use” (Wittgenstein)
26. how does it work?
Random Indexing reduces the original high-dimensional term/doc matrix to a new lower-dimensional matrix
27. how does it work?
• How?
• By multiplying the original
matrix with a random
one, built in an incremental
way
• formally: A_{n,m} · R_{m,k} = B_{n,k}
• k << m
• After projection, the distance between points in the vector space is approximately preserved
• Johnson-Lindenstrauss Lemma
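The projection can be tried out with a small experiment. The sketch below is a generic random projection, not the exact construction used in the talk: it assumes a dense Gaussian random matrix scaled by 1/sqrt(k), a common choice for illustrating the Johnson-Lindenstrauss Lemma, and the sizes m and k are arbitrary.

```python
import math
import random

def random_matrix(m, k, rng):
    """m x k matrix with Gaussian entries, scaled so that norms are preserved on average."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(m)]

def project(row, R):
    """Multiply a 1 x m row vector by the m x k matrix R."""
    k = len(R[0])
    out = [0.0] * k
    for value, r_row in zip(row, R):
        if value:
            for j in range(k):
                out[j] += value * r_row[j]
    return out

def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

rng = random.Random(0)          # fixed seed, for repeatability only
m, k = 1000, 300                # original and reduced dimensionality (k < m)
R = random_matrix(m, k, rng)

# Two arbitrary points in the original m-dimensional space.
a = [rng.random() for _ in range(m)]
b = [rng.random() for _ in range(m)]

# Ratio between the distance after and before projection; it should be close to 1.
ratio = distance(project(a, R), project(b, R)) / distance(a, b)
print(round(ratio, 3))
```

The ratio is close to 1 but not exactly 1: the lemma guarantees that distances are preserved up to a small distortion that shrinks as k grows.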
28. random matrix
• How is the random matrix built?
• The whole process is based on the concept of “context”
• Given a term, its “context” could be the whole document, a
paragraph, a sentence, a sliding window of words and so on.
• The definition of the context influences the structure of the
matrix
• The matrix is built in an iterative and incremental way
• The vector representing each document depends on the terms
that occur in it
• The vector representing each term depends on its context
29. item representation
• A context vector is assigned to each context (for simplicity, we assume the whole document as context)
• This vector has a fixed dimension (k) and can contain only values in {-1, 0, 1}. Values are distributed randomly, but the number of non-zero elements is much smaller than the number of zeros.
• The vector space representation of a term is obtained by summing the context vectors of all its contexts (the documents it occurs in).
• The vector space representation of a document (item) is obtained by summing the context vectors of the terms that occur in it
• Output: lower-dimensional vector space representation
based on random context vectors
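The construction above can be sketched as follows, taking the whole document as context. This is a toy illustration under stated assumptions: the dimension `k`, the number of non-zero entries `nonzero`, the seed and the sample documents are all arbitrary, and a term occurring several times in a document simply adds that document's random vector several times.

```python
import random

def index_vector(k, nonzero, rng):
    """Sparse random vector of size k: `nonzero` entries set to +1 or -1, the rest 0."""
    vec = [0] * k
    for pos in rng.sample(range(k), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec

def random_indexing(docs, k=16, nonzero=4, seed=0):
    rng = random.Random(seed)
    # One random vector per context (here: per document).
    context = [index_vector(k, nonzero, rng) for _ in docs]

    # Term vector = sum of the random vectors of the documents it occurs in.
    term_vectors = {}
    for doc, ctx in zip(docs, context):
        for term in doc:
            tv = term_vectors.setdefault(term, [0] * k)
            for j in range(k):
                tv[j] += ctx[j]

    # Document vector = sum of the vectors of the terms that occur in it.
    doc_vectors = []
    for doc in docs:
        dv = [0] * k
        for term in doc:
            for j in range(k):
                dv[j] += term_vectors[term][j]
        doc_vectors.append(dv)
    return context, term_vectors, doc_vectors

docs = [["football", "match"], ["movie", "drama"], ["football", "movie"]]
context, term_vectors, doc_vectors = random_indexing(docs)
print(doc_vectors[0])
```

Adding a new document only requires drawing one more random vector and updating the vectors of the terms it contains, which is why the approach is incremental.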
30. quantum negation
• Random Indexing is still not capable of managing negative evidence
• RI can be coupled with the Quantum Negation (QN) operator
• Definition inherited from Quantum Logic
• Negation as a form of orthogonality between vectors
• Given two vectors A and B, we can define the vector A NOT B
• It represents the projection of the vector A onto the subspace orthogonal to the one generated by the vector B
• In a recommendation scenario, this operator can combine two vectors: the first one modeling positive evidence and the second one modeling negative evidence
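For two single vectors, the orthogonal projection described above reduces to A NOT B = A - ((A·B) / (B·B)) B. The sketch below applies this formula to two small hypothetical vectors; the names `positive` and `negative` are only illustrative stand-ins for the positive and negative evidence of a user.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def negate(a, b):
    """Return a NOT b: the component of a that is orthogonal to b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)  # nothing to remove when b is the null vector
    coeff = dot(a, b) / bb
    return [x - coeff * y for x, y in zip(a, b)]

positive = [3.0, 1.0, 0.0]   # hypothetical vector built from liked items
negative = [1.0, 0.0, 0.0]   # hypothetical vector built from disliked items

profile = negate(positive, negative)
print(profile)                   # [0.0, 1.0, 0.0]
print(dot(profile, negative))    # 0.0: the result is orthogonal to the negative evidence
```

Because the result is orthogonal to the negative vector, its cosine similarity with anything that points purely in the disliked direction is zero.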
31. ...summing up
• VSM is an effective model for document retrieval
• It can be exploited in recommendation scenarios
• It suffers from some well-known drawbacks
• Solutions
• Random Indexing is an incremental and effective approach that can tackle the high-dimensionality problem
• Quantum Negation can effectively model negative evidence
• The combined use of RI and QN is a good
alternative to VSM, especially for real-life scenarios
32. part 3: scenario
tv-shows recommendation
33. Scenario:
EPG (Electronic Program Guides)
personalization
34. scenario
• Given a set of TV shows, we want to provide the user with a set of suggestions about the shows that she should watch, according to her preferences
35. approach
Currently the recommendation
model is implemented through
the Vector Space Model (VSM)
36. data
• TV shows gathered from a set of 47 German-language broadcast channels
• Each TV show is described
through a set of textual
features (title, synopsis,
description, etc.) gathered from an
XML feed
• Each TV-Show is mapped to a fixed
program type (Movie, Sport,
Documentary, Magazine, etc.)
37. problems
• How to represent the data?
• We compared two approaches
• Bag of Words (BOW)
• Tag.me
• What are the typical use cases?
• We identified two tasks
• Classification Task
• Retrieval Task
38. data representation
• Bag of Words
• Each item i is described through the
words that appear in the text
• Weighting of the words
• Counting of the occurrences,
normalization, TF-IDF weighting, etc.
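One possible weighting scheme is sketched below. It assumes a plain TF-IDF variant (term frequency normalized by document length, IDF = log(M / df)) and non-empty, already tokenized documents; the talk does not state which variant was used, so this is an illustration only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dictionary per document (documents must be non-empty)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weighted

docs = [
    ["football", "match", "football"],
    ["movie", "drama"],
    ["football", "movie"],
]
weights = tf_idf(docs)
print(weights[0])
```

In the first document, "match" gets a higher weight than "football" even though it occurs less often, because "football" also appears in another document and is therefore less discriminative.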
39. BOW representation
• To improve BOW representation
• Textual descriptions are usually very noisy
• Full of uninformative words
• Further processing can improve
the classical BOW representation
• Stopword removal: filtering of all the
uninformative words (articles, adverbs,
adjectives and so on)
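Stopword removal amounts to a simple filter over the tokens. The stopword set below is a tiny illustrative subset of German function words, not the list used in the talk, and the sample sentence is invented.

```python
# Illustrative subset only; a real list would be much longer.
GERMAN_STOPWORDS = {"der", "die", "das", "und", "ein", "eine", "in", "mit"}

def remove_stopwords(tokens, stopwords=GERMAN_STOPWORDS):
    """Keep only the tokens whose lowercase form is not a stopword."""
    return [token for token in tokens if token.lower() not in stopwords]

tokens = "Der Kommissar und die Zeugin in Berlin".split()
filtered = remove_stopwords(tokens)
print(filtered)  # ['Kommissar', 'Zeugin', 'Berlin']
```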
40. data representation
• Tag.me
• Online tool developed by the University
of Pisa (Italy)
• Goal: to identify Wikipedia concepts that
occur in the text
• Idea: to process original text through Tag.me
in order to avoid noise and provide a novel
representation based on high-level
Wikipedia concepts
41. tag.me web interface
42. final output
BOW
Tag.me
43. description of the tasks
• task 1: classification
• Given a flow of TV shows, we want to classify them against the set of program types
• task 2: retrieval
• Given a set of program types and a repository of TV shows, we want to retrieve the shows that belong to a specific program type
44. VSM for TV shows classification
• Steps
• 1) Build a vector space for the tv shows
• 2) Build a vector for each program type
• 3) Use cosine similarity to compare tv shows
and program types
• 4) Assign the TV show to the program type that got
the highest cosine similarity
45. VSM for TV shows classification
• Step 1: build a vector space
representation of the TV-shows
• For each TV show we collected a set of words
from the synopsis and the title of the show
• We filtered the words against a
fixed list of 996 German stopwords
• We calculated the TF-IDF score of each word in each
document
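Step 1 can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the tiny `STOPWORDS` set stands in for the 996-word German list, the whitespace tokenizer and the `tf * log(N / df)` weighting are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for the fixed list of 996 German stopwords.
STOPWORDS = {"der", "die", "das", "und", "ein", "eine"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (t.strip(".,;:!?\"'()") for t in text.lower().split())
    return [t for t in tokens if t and t not in STOPWORDS]

def tfidf_vectors(documents):
    """Return one {term: tf-idf weight} dict per document (title + synopsis)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors
```

With this weighting a word that occurs in every show gets weight 0, so it no longer contributes to the comparison between shows.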
46. VSM for TV shows classification
• Step 2: build a vector for each
program type
• Given the vector space representation of
each document
• The vector space representation of each
program type is the sum of the
vector space representations of each tv-
show that belongs to that program type
47. VSM for TV shows classification
• Given a set of TV shows
• T = {s_1, ..., s_n}
• Given a set of program types
• P = {t_1, ..., t_m}
• We define a function pt: T → P
• It returns the program type of a TV show
• We can build the set S(t_i) as the set of the TV shows that belong to t_i
• S(t_i) = { s_j ∈ T : pt(s_j) = t_i }
48. VSM for TV shows classification
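• Given the set S(t_i) with a cardinality of k, the vector space
representation of the program type t_i is simply given by the sum of
the vectors of the k TV shows in S(t_i):
$$\vec{t_i} = \sum_{s_j \in S(t_i)} \vec{s_j}$$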
49. VSM for TV shows classification
• Step 3 and Step 4
• Given the vector space representation of both
program types and tv shows
• Use cosine similarity to compare each TV
show against the set of the program types
• We assigned the TV show to the program type
that got the highest cosine similarity
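Steps 2 to 4 can be sketched as below. This is an illustrative sketch only: vectors are assumed to be sparse `{term: weight}` dicts, and the names `build_type_vectors` and `classify` are hypothetical.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def build_type_vectors(show_vectors, show_types):
    """Step 2: each program type vector is the sum of its shows' vectors."""
    sums = defaultdict(lambda: defaultdict(float))
    for vec, ptype in zip(show_vectors, show_types):
        for term, weight in vec.items():
            sums[ptype][term] += weight
    return {ptype: dict(vec) for ptype, vec in sums.items()}

def classify(show_vector, type_vectors):
    """Steps 3 and 4: pick the program type with the highest cosine similarity."""
    return max(type_vectors, key=lambda p: cosine(show_vector, type_vectors[p]))
```

Because cosine similarity normalizes vector length, using the plain sum instead of the average of the show vectors does not change which program type wins.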
50. RI for TV shows classification
• Steps
• 1) Build a vector space for the tv shows
• 2) Reduce the vector space through the
Random Indexing algorithm
• 3) Build a vector for each program type on the (reduced)
vector space
• 4) Use cosine similarity to compare tv shows and
program types
• 5) Assign the TV show to the program type that got the
highest cosine similarity
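The reduction in step 2 can be illustrated with a simplified sketch. It assumes that every term receives a sparse ternary random vector and that a TV show is represented by the weighted sum of the random vectors of its terms; the `nonzeros` parameter, the seeding and the function names are hypothetical, and the minimum-occurrence filter and multiple training cycles are left out.

```python
import random

def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary vector: `nonzeros` positions set to +1 or -1, the rest 0."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec

def reduce_vectors(show_vectors, dim=500, nonzeros=10, seed=0):
    """Map sparse {term: weight} vectors to dense vectors of length `dim`."""
    rng = random.Random(seed)
    index_vectors = {}  # one random vector per term, created on first use
    reduced = []
    for weights in show_vectors:
        acc = [0.0] * dim
        for term, weight in weights.items():
            if term not in index_vectors:
                index_vectors[term] = make_index_vector(dim, nonzeros, rng)
            for i, x in enumerate(index_vectors[term]):
                if x:
                    acc[i] += weight * x
        reduced.append(acc)
    return reduced, index_vectors
```

A new show can be added by summing the random vectors of its terms, without recomputing the vectors of the shows already in the space.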
51. RI for TV shows retrieval
• Steps
• 1) Build a vector space for the tv shows
• 2) Reduce the vector space through the Random
Indexing algorithm
• 3) Build a positive vector for each program type on the
(reduced) vector space
• 4) Use cosine similarity to compare tv shows and
program types
• 5) Rank the tv shows and assign the first N to
the program type
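Steps 4 and 5 amount to a ranking by cosine similarity. A minimal sketch, assuming dense vectors stored as lists and a hypothetical `retrieve` helper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def retrieve(type_vector, shows, n):
    """Rank show ids by similarity to the program type vector, keep the first n."""
    ranked = sorted(shows, key=lambda sid: cosine(shows[sid], type_vector), reverse=True)
    return ranked[:n]
```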
52. RI+QN for TV shows retrieval
• Steps
• 1) Build a vector space for the tv shows
• 2) Reduce the vector space through the Random Indexing
algorithm
• 3) Build a positive vector for each program type on the
(reduced) vector space
• 4) Build a negative vector for each program type
on the (reduced) vector space
• 5) Use cosine similarity to compare tv shows with
both positive and negative program types vectors
• 6) Rank the tv shows and assign the first N to the program type
53. RI+QN for TV shows retrieval
• Given a set of TV shows
• T = {s_1, ..., s_n}
• Given a set of program types
• P = {t_1, ..., t_m}
• We define a function pt: T → P
• It returns the program type of a TV show
• We can build the set S(t_i) as the set of the TV shows that belong to t_i
• S(t_i) = { s_j ∈ T : pt(s_j) = t_i }
54. RI+QN for TV shows retrieval
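• Given the set S(t_i) and its complement, with cardinalities k and z
respectively, the positive and negative vector space representations of the
program type t_i are given by the sums of the show vectors in each set:
$$\vec{t_i}^{+} = \sum_{s_j \in S(t_i)} \vec{s_j} \qquad \vec{t_i}^{-} = \sum_{s_j \notin S(t_i)} \vec{s_j}$$
• The positive and negative vectors are combined in order to emphasize the
features that occur in the positive vector and avoid the ones that occur
in the negative one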
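One common way to combine the two vectors is orthogonal negation, where the positive vector is projected onto the subspace orthogonal to the negative one: a NOT b = a - ((a · b) / (b · b)) b. The sketch below assumes that this is the form of quantum negation used here; the function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two dense vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

def negate(positive, negative):
    """Return `positive` NOT `negative`: remove the component along `negative`."""
    denom = dot(negative, negative)
    if denom == 0.0:
        # Nothing to remove when the negative vector is all zeros.
        return list(positive)
    scale = dot(positive, negative) / denom
    return [p - scale * n for p, n in zip(positive, negative)]
```

The resulting vector has zero dot product with the negative vector, so a show that looks only like the negative examples gets a cosine similarity of 0 with the program type.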
55. ...summing up
• Classification task
• Comparison of VSM and RI
• We built a vector space
• We applied RI to reduce the vector space
• We classified TV shows in the complete vector space and in the reduced
one, comparing the accuracy
• Retrieval task
• Comparison of RI and RI+QN
• We built a vector space
• We applied RI to reduce the vector space
• We built both positive and negative program type vectors and applied QN
• We retrieved TV shows and compared RI without negation against
RI with negation
56. part 4: experimental evaluation
results, discussion, future work
57. dataset
• TV shows: 133,579
• Program types: 17
• Features (BOW): 306,006
• Features (Tag.me): 74,599
• Avg features (BOW): 42.11
• Avg features (Tag.me): 9.21
58. experimental design
• 10-fold cross validation
• Dataset split into 10 partitions
• 9 partitions for training the models, the
remaining one for testing
• Results averaged over all the
partitions
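The protocol can be sketched as a generator of train/test splits. The shuffling, the seed and the name `k_fold_splits` are assumptions made for illustration.

```python
import random

def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs; each partition is used once as the test set."""
    items = list(items)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

The reported figure is then the mean of the per-split scores.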
59. metrics
• classification task
• precision =
• retrieval task
• precision @n =
• precision @k% =
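A sketch of the first two metrics under their usual definitions, which are assumed here: precision as the fraction of shows assigned to the correct program type, and precision@n as the fraction of the first n retrieved shows that belong to the requested program type. precision@k% would follow the same pattern once the cutoff is expressed as a percentage; how that percentage is taken is not assumed here.

```python
def precision(predicted, actual):
    """Fraction of shows whose predicted program type matches the actual one."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(predicted)

def precision_at_n(ranked_ids, relevant_ids, n):
    """Fraction of the first n ranked shows that are in `relevant_ids`."""
    top = ranked_ids[:n]
    return sum(1 for sid in top if sid in relevant_ids) / n
```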
60. tuning of parameters
• Random Indexing algorithm
• Dimension of the vectors
• Classification task: 500, 700
• Retrieval task: 500, 1000, 1500, 2000
• Minimum number of occurrences
• Classification task: 2
• Retrieval task: 1, 3
• Training Cycles
• Classification task: 1, 2
• Retrieval task: 1
61. classification task - results
size   occur.   cycles   Tag.me   BOW
500    2        1        37.38    42.91
700    2        1        40.28    47.76
500    2        1        44.61    54.32
700    2        1        45.33    54.33
62. classification task: comparison
68.7
54.3 54.3
47.7
42.9
63. classification - results per program type
64. classification task - outcomes
• BOW better than Tag.me
• Representation too poor
• Difficult to learn a solid and effective model for text classification
• The dimension of the vector space and the second training cycle affect the
predictive accuracy
• RI does not outperform the baseline
• Vector space reduced by over 99% (from 133579 to 500 or 700 dimensions)
• Too much loss of information
• but
• Broken down by program type, Random Indexing got better results in
10 out of 17 program types
• The reasons for this need to be investigated
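The "over 99%" figure can be checked with a few lines of arithmetic, using the original size 133579 and the two reduced sizes 500 and 700 quoted above. The helper name `reduction_percent` is hypothetical.

```python
def reduction_percent(original_dim, reduced_dim):
    """Percentage of dimensions removed when going from original to reduced."""
    return 100.0 * (1.0 - reduced_dim / original_dim)

for dim in (500, 700):
    print(dim, round(reduction_percent(133579, dim), 2))
```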
65. retrieval task - bow - p@n
82.6%
66.3%
66. retrieval task - bow - p@n
65.9%
45.2%
67. retrieval task - bow - p@n
58.1%
36.5%
68. retrieval task - bow - p@k%
86.0%
58.1%
69. retrieval task - bow - p@k%
55.4%
35.4%
70. retrieval task - tagme - p@n
61.9%
47.9%
71. retrieval task - tagme - p@n
53.7%
40.9%
72. retrieval task - tagme - p@n
51.6%
39.0%
73. retrieval task - tagme - p@k%
76.6%
57.9%
74. retrieval task - tagme - p@k%
49.6%
35.4%
75. retrieval task - overview
82.6%
61.9%
76. retrieval task - overview
65.0%
53.0%
77. retrieval task - overview
58.3%
53.2%
78. retrieval task - outcomes
• BOW always better than Tag.me
• Between 5 and 20% difference
• Parameters do not affect the accuracy
• QN operator improves the retrieval
accuracy by almost 20%
79. conclusions & future work
• In scenarios where the recommender system has to deal with a continuous flow of
information, the VSM is not suitable
• RI is able to effectively address typical VSM drawbacks
• Classification task
• Even if its accuracy is lower, these preliminary results need to be further
investigated, for example by testing the algorithm with different values
of the parameters
• Is a loss in precision acceptable for an algorithm that provides a big
improvement in scalability and efficiency?
• Retrieval task
• QN improves the predictive accuracy of the model in the
retrieval task
• QN is a novel operator, so this is an important outcome with good
scientific impact
80. Thanks for your
attention.
Cataldo Musto, Ph.D. Student
cataldomusto@di.uniba.it - cataldo.musto@philips.com
University of Bari “Aldo Moro” (Italy), SWAP Research Group
Philips Research Center - Eindhoven (Netherlands) - HI&E Group
14.07.11