A comparison of Lexicon-based approaches for Sentiment Analysis of microblog posts
1. DART 2014
8th International Workshop on
Information Filtering and Retrieval
Pisa (Italy)
December 10, 2014
A comparison of lexicon-based
approaches for Sentiment Analysis
of microblog posts
Cataldo Musto, Giovanni Semeraro, Marco Polignano
(Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)
2. Outline
• Background
• Sentiment Analysis
• Lexicon-based approaches
• Methodology
• State-of-the-art lexicons
• Experiments
• Conclusions
Cataldo Musto, Giovanni Semeraro, Marco Polignano
A comparison of lexicon-based approaches for sentiment analysis of microblog posts. DART 2014 Workshop, Pisa (Italy), 10.12.2014
3. Background
One minute on the Web
4. Background
One minute on the Web
Information Overload
5. Background
Information Overload
Obstacle or Opportunity?
6. Opportunities
(Social) Content Analytics
Insight: aggregate raw human-generated data to extract valuable people-based findings
7. Social Content Analytics
Applications
- Real-time polls
- Social CRM
- Online brand monitoring
All these applications share a common denominator
8. Social Content Analytics
Applications
- Real-time polls
- Social CRM
- Online brand monitoring
All these applications share a common denominator:
they all need a methodology to automatically associate an opinion and/or a polarity to each piece of content
9. Social Content Analytics
Applications
- Real-time polls
- Social CRM
- Online brand monitoring
All these applications share a common denominator:
they all need a methodology to automatically associate an opinion and/or a polarity to each piece of content
Solution: Sentiment Analysis
10. Sentiment Analysis
Definition
“It is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes” (*)
(*) Pang, Bo, and Lillian Lee. "Opinion Mining and Sentiment Analysis." Foundations and Trends in Information Retrieval, 2008.
11. Sentiment Analysis
Definition
“It is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes” (*)
(*) Pang, Bo, and Lillian Lee. "Opinion Mining and Sentiment Analysis." Foundations and Trends in Information Retrieval, 2008.
We will focus on the polarity detection task
12. Sentiment Analysis
State of the art
• Supervised approaches (machine learning-based)
• Unsupervised approaches (lexicon-based)
13. Sentiment Analysis
Supervised approaches
Learn a classification model relying on labeled examples ("Dog" vs. "Man?" image-classification example in the slide)
14. Sentiment Analysis
Unsupervised approaches
Rely on external lexical resources that associate a polarity score with each term (e.g. frustration: - -, joy: +++).
The sentiment of the content depends on the sentiment of the terms which compose it.
15. Sentiment Analysis
Supervised vs Unsupervised
Supervised. Pros: higher accuracy (*) (**). Cons: requires pre-labeled examples.
Unsupervised. Pros: no training needed; several lexical resources available. Cons: accuracy depends on the lexical resources.
(*) Nakov, Preslav, et al. "SemEval-2013 Task 2: Sentiment Analysis in Twitter." Proceedings of SemEval 2013.
(**) Rosenthal, Sara, et al. "SemEval-2014 Task 9: Sentiment Analysis in Twitter." Proceedings of SemEval 2014.
16. Sentiment Analysis
Supervised vs Unsupervised
Supervised: higher accuracy, but requires pre-labeled examples. Unsupervised: no training needed and several lexical resources available, but accuracy depends on the lexical resources.
We focus on lexicon-based approaches
17. Contributions
1. We propose a novel unsupervised lexicon-based approach for sentiment analysis
2. We provide a comparison of lexical resources for sentiment analysis of microblog posts
18. Methodology
Lexicon-based approach
Insight: the polarity of a textual content (e.g. a microblog post) depends on the polarity of the microphrases which compose it.
19. Methodology
Lexicon-based approach
Insight: the polarity of a textual content (e.g. a microblog post) depends on the polarity of the microphrases which compose it.
A microphrase is built whenever a splitting cue is found in the text
20. Methodology
Lexicon-based approach
Insight: the polarity of a textual content (e.g. a microblog post) depends on the polarity of the microphrases which compose it.
A microphrase is built whenever a splitting cue is found in the text.
Conjunctions, adverbs and punctuation are used as splitting cues
21. Methodology
Lexicon-based approach
Insight: the polarity of a textual content (e.g. a microblog post) depends on the polarity of the microphrases which compose it.
A microphrase is built whenever a splitting cue is found in the text. Conjunctions, adverbs and punctuation are used as splitting cues.
Example: “I don’t like this food, it’s terrible”
22. Methodology
Lexicon-based approach
Insight: the polarity of a textual content (e.g. a microblog post) depends on the polarity of the microphrases which compose it.
A microphrase is built whenever a splitting cue is found in the text. Conjunctions, adverbs and punctuation are used as splitting cues.
Example: “I don’t like this food, it’s terrible” splits into microphrase m1 = “I don’t like this food” and m2 = “it’s terrible”, with the comma as splitting cue.
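The splitting step above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' implementation, and the cue set below is a hypothetical minimal subset of the conjunctions, adverbs and punctuation marks used in the paper:

```python
import re

# Hypothetical minimal set of splitting cues (the paper's full list is larger):
# a few conjunctions, adverbs and punctuation marks.
SPLITTING_CUES = {",", ";", ".", "!", "?", "and", "but", "however"}

def microphrases(text):
    """Split a post into microphrases whenever a splitting cue is found."""
    # Tokenize into words (keeping apostrophes, e.g. "don't") and punctuation.
    tokens = re.findall(r"\w+'?\w*|[,;.!?]", text.lower())
    phrases, current = [], []
    for tok in tokens:
        if tok in SPLITTING_CUES:
            # Close the current microphrase at the cue.
            if current:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(tok)
    if current:
        phrases.append(" ".join(current))
    return phrases

print(microphrases("I don't like this food, it's terrible"))
# → ["i don't like this food", "it's terrible"]
```

The comma acts as the splitting cue in the slide's example, yielding exactly the two microphrases m1 and m2.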
23. Methodology
Lexicon-based approach
Insight: the polarity of a textual content (e.g. a microblog post) depends on the polarity of the microphrases which compose it.
pol(T) = Σ_{i=1..k} pol(m_i), where the tweet T = {m_1 … m_k} is its set of microphrases
24. Methodology
Lexicon-based approach
Insight: the polarity of a microphrase depends on the polarity of the terms which compose it.
pol(T) = Σ_{i=1..k} pol(m_i), tweet T = {m_1 … m_k}
pol(m_i) = Σ_{j=1..n} score(t_j), microphrase m_i = {t_1 … t_n}, with t_j a term
25. Methodology
Four variants proposed
Basic:
pol(T) = Σ_{i=1..k} pol(m_i)
pol(m_i) = Σ_{j=1..n} score(t_j)
26. Methodology
Four variants proposed
Basic:
pol(T) = Σ_{i=1..k} pol(m_i)
pol(m_i) = Σ_{j=1..n} score(t_j)
Normalized:
pol(m_i) = (1/|m_i|) Σ_{j=1..n} score(t_j)
The score of each microphrase is normalized according to its length.
27. Methodology
Four variants proposed
Basic:
pol(T) = Σ_{i=1..k} pol(m_i)
pol(m_i) = Σ_{j=1..n} score(t_j)
Normalized:
pol(m_i) = (1/|m_i|) Σ_{j=1..n} score(t_j)
Emphasized:
pol(m_i) = Σ_{j=1..n} score(t_j) * w(t_j)
Specific categories (adverbs, verbs, adjectives) and valence shifters (intensifiers & downtoners) are given a higher weight; several weights have been evaluated.
28. Methodology
Four variants proposed
Basic:
pol(T) = Σ_{i=1..k} pol(m_i)
pol(m_i) = Σ_{j=1..n} score(t_j)
Normalized:
pol(m_i) = (1/|m_i|) Σ_{j=1..n} score(t_j)
Emphasized:
pol(m_i) = Σ_{j=1..n} score(t_j) * w(t_j)
Normalized-Emphasized (combination):
pol(m_i) = (1/|m_i|) Σ_{j=1..n} score(t_j) * w(t_j)
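The four variants can be sketched compactly, assuming microphrases have already been extracted. The lexicon and the weight function below are invented toy stand-ins (the real scores come from the lexical resources compared in the next slides, and the paper evaluates several weighting schemes):

```python
# Hypothetical toy lexicon: term -> polarity score in [-1, 1].
LEXICON = {"like": 0.6, "terrible": -0.9, "don't": -0.3}

# Hypothetical stand-in for w(t): the paper emphasizes specific
# categories (adverbs, verbs, adjectives) and valence shifters;
# here a fixed set of terms gets a fixed higher weight.
EMPHASIZED = {"terrible"}

def score(term):
    return LEXICON.get(term, 0.0)

def w(term):
    return 2.0 if term in EMPHASIZED else 1.0

def pol(microphrases, variant="basic"):
    """pol(T) = sum over microphrases of pol(m_i), per the chosen variant."""
    total = 0.0
    for m in microphrases:
        terms = m.split()
        # Emphasized variants multiply each term score by its weight w(t_j).
        s = sum(score(t) * (w(t) if "emphasized" in variant else 1.0)
                for t in terms)
        # Normalized variants divide by the microphrase length |m_i|.
        if "normalized" in variant:
            s /= len(terms)
        total += s
    return total

phrases = ["i don't like this food", "it's terrible"]
for v in ("basic", "normalized", "emphasized", "normalized-emphasized"):
    print(v, round(pol(phrases, v), 3))
```

On this toy example all four variants come out negative, as expected for the slide's sentence; normalization mainly damps the contribution of the longer first microphrase.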
29. Methodology
We have a problem
Basic: pol(T) = Σ_{i=1..k} pol(m_i), pol(m_i) = Σ_{j=1..n} score(t_j)
Normalized: pol(m_i) = (1/|m_i|) Σ_{j=1..n} score(t_j)
Emphasized: pol(m_i) = Σ_{j=1..n} score(t_j) * w(t_j)
Normalized-Emphasized: pol(m_i) = (1/|m_i|) Σ_{j=1..n} score(t_j) * w(t_j)
30. Methodology
We have a problem: how to calculate score(t_j)?
Basic: pol(T) = Σ_{i=1..k} pol(m_i), pol(m_i) = Σ_{j=1..n} score(t_j)
Normalized: pol(m_i) = (1/|m_i|) Σ_{j=1..n} score(t_j)
Emphasized: pol(m_i) = Σ_{j=1..n} score(t_j) * w(t_j)
Normalized-Emphasized: pol(m_i) = (1/|m_i|) Σ_{j=1..n} score(t_j) * w(t_j)
31. Solution
32. Lexical Resources
State of the art
We evaluated four state-of-the-art resources for sentiment analysis:
• SentiWordNet: http://sentiwordnet.isti.cnr.it
• WordNet Affect: http://wndomains.fbk.eu/wnaffect.html
• SenticNet: http://sentic.net
• MPQA: http://mpqa.cs.pitt.edu
33. Lexical Resources: SentiWordNet (*)
Each WordNet synset is provided with three different sentiment scores (positivity, negativity, objectivity).
(*) Baccianella, Stefano, Andrea Esuli, and Fabrizio Sebastiani. "SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining." LREC, Vol. 10, 2010.
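To make the (positivity, negativity, objectivity) triple concrete, here is a toy lookup in the SentiWordNet style. The entries are invented for illustration; a real system would query the actual resource (e.g. through NLTK's sentiwordnet corpus reader), and collapsing the triple to positivity minus negativity is one common convention, not necessarily the one used in the paper:

```python
# Invented sample entries in the SentiWordNet style:
# term -> (positivity, negativity, objectivity), with pos + neg + obj = 1.
SWN_SAMPLE = {
    "good":     (0.75, 0.00, 0.25),
    "terrible": (0.00, 0.88, 0.12),
    "food":     (0.00, 0.00, 1.00),
}

def term_polarity(term):
    """Collapse the triple into a single score: positivity minus negativity.
    Unknown terms are treated as fully objective (neutral)."""
    pos, neg, _obj = SWN_SAMPLE.get(term, (0.0, 0.0, 1.0))
    return pos - neg

print(term_polarity("good"))      # 0.75
print(term_polarity("terrible"))  # -0.88
```

Purely objective terms such as "food" contribute a score of 0, so only opinionated terms move the polarity of a microphrase.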
34. Lexical Resources: WordNet Affect(*)
A WordNet extension: affect-related synsets are mapped to an A-Label, e.g. euphoria —> positive-emotion, illness —> physical state.
(*) Strapparava, Carlo, and Alessandro Valitutti. "WordNet Affect: an Affective Extension of WordNet." LREC. Vol. 4. 2004.
35. Lexical Resources: SenticNet(*)
Inspired by the Hourglass of Emotions model. Each term is represented on the grounds of the intensity of four basic emotional dimensions (sensitivity, aptitude, attention, pleasantness). The activation level of each dimension defines 16 basic emotions.
(*) Cambria, Erik, Daniel Olsher, and Dheeraj Rajagopal. "SenticNet 3: A Common and Common-Sense Knowledge Base for Cognition-Driven Sentiment Analysis." Twenty-Eighth AAAI Conference on Artificial Intelligence. 2014.
36. Lexical Resources: SenticNet(*)
According to the triggered emotions, each term is provided with an aggregated polarity score.
37. Lexical Resources: SenticNet(*)
SenticNet models a sentiment score for some bigrams and trigrams as well!
38. Lexical Resources: MPQA(*)
Each term is (manually) provided with a discrete sentiment score: +1 positive, 0 neutral, -1 negative.
(*) Wilson, Theresa, Janyce Wiebe, and Paul Hoffmann. "Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis." Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2005.
39. Lexical Resources: Comparison

Resource         Coverage (terms)
SentiWordNet     117,659
WordNet Affect   200
SenticNet        14,000
MPQA             8,222
41. Lexical Resources: Score calculation (SentiWordNet)
Given a term tj, score(tj) is the mean of the sentiment scores of all the possible synsets of tj.
score(good) = (0.75 + 0 + 1 + 1) / 4 = 0.687
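The mean-of-synsets rule can be checked with a few lines; the four synset scores for "good" below are the values from the slide, hardcoded rather than retrieved from SentiWordNet:

```python
# score(t_j) with SentiWordNet: the mean of the sentiment scores of all
# possible synsets of the term. Synset scores hardcoded from the slide.

def sentiwordnet_score(synset_scores):
    """Mean sentiment score over the term's synsets (0.0 if none)."""
    if not synset_scores:
        return 0.0
    return sum(synset_scores) / len(synset_scores)

good = [0.75, 0.0, 1.0, 1.0]
print(sentiwordnet_score(good))  # 0.6875, reported as 0.687 on the slide
```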
42. Lexical Resources: Score calculation (WordNet Affect)
Given a term tj, the WordNet Affect hierarchy is climbed until an A-Label that occurs in SentiWordNet is found; tj then inherits the sentiment score of that A-Label.
score(good) = score(benevolence) = 0.339
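The hierarchy-climbing rule can be sketched with a toy parent map. The hierarchy and the SentiWordNet coverage below are invented stand-ins for the real resources; only the good —> benevolence example and its 0.339 score come from the slide:

```python
# WordNet Affect scoring: climb the hierarchy from the term until an
# A-Label covered by SentiWordNet is found, and inherit its score.
# The parent map and the coverage dict are toy stand-ins.

parent = {"good": "benevolence", "benevolence": "positive-emotion"}
swn_scores = {"benevolence": 0.339}  # hypothetical SentiWordNet coverage

def wordnet_affect_score(term):
    node = term
    while node is not None:
        if node in swn_scores:        # found a label covered by SentiWordNet
            return swn_scores[node]
        node = parent.get(node)       # climb one level up the hierarchy
    return 0.0                        # no covered ancestor: score unknown

print(wordnet_affect_score("good"))  # 0.339, inherited from 'benevolence'
```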
43. Lexical Resources: Score calculation (SenticNet)
Given a term tj, the SenticNet API is queried and the sentiment score is extracted.
score(good) = 0.883
44. Lexical Resources: Score calculation (MPQA)
Given a term tj, the MPQA lexicon is queried and the sentiment score is extracted.
score(good) = 1
45. Methodology
46. Experimental Evaluation: Research Questions
1. How do the different versions of the algorithm perform with respect to state-of-the-art datasets?
2. What is the best lexical resource to detect the polarity of microblog posts?
47. Experimental Evaluation: Description of the datasets
• SemEval-2013: 14,435 Tweets (8,180 training, 3,255 test); labels: Positive, Negative, Neutral
• STS Dataset: 1,600,000 Tweets (only 359 used for test); labels: Positive, Negative
48. Experimental Evaluation: Statistics about Coverage

Lexicon          SemEval-2013-Test   STS-Test
Vocabulary Size  18,309              6,711
SentiWordNet     4,314               883
WordNet-Affect   149                 48
MPQA             897                 224
SenticNet        1,497               326
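Coverage figures like those above can be computed by intersecting a dataset's vocabulary with each lexicon's term set. The sets below are toy examples, not the real SemEval/STS vocabularies:

```python
# Coverage: how many vocabulary types of a test set appear in a lexicon.
# Vocabulary and lexicons are toy sets for illustration.

def coverage(vocabulary, lexicon_terms):
    """Number of vocabulary terms covered by the lexicon."""
    return len(set(vocabulary) & set(lexicon_terms))

vocab = {"good", "bad", "movie", "plot", "lol"}
lexicons = {
    "lexicon-A": {"good", "bad", "awful"},
    "lexicon-B": {"good"},
}
for name, terms in sorted(lexicons.items()):
    print(name, coverage(vocab, terms))  # lexicon-A 2, lexicon-B 1
```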
49. Experiment 1: Intra-Lexicons evaluation
50. Experiment 1
SemEval :: SentiWordNet
Accuracy: Basic 57.67, Normalized 58.10, Emphasized 58.65, Norm-Emph 58.99
(norm vs. norm+emph significant, p < 0.0001)
Emphasis and Normalization improve the accuracy.
51. Experiment 1
SemEval :: WordNet Affect
Accuracy: Basic 53.92, Normalized 55.05, Emphasized 53.95, Norm-Emph 55.08
(gaps not significant)
Emphasis and Normalization improve the accuracy.
52. Experiment 1
SemEval :: MPQA
Accuracy: Basic 58.03, Normalized 57.97, Emphasized 58.25, Norm-Emph 58.10
(gaps not significant)
Emphasis improves the accuracy; Normalization doesn't.
53. Experiment 1
SemEval :: SenticNet
Accuracy: Basic 48.69, Normalized 47.25, Emphasized 48.29, Norm-Emph 48.08
(norm vs. norm+emph significant, p < 0.0001)
No improvement.
54. Experiment 1
General Outcomes (SentiWordNet, WordNet Affect, MPQA, SenticNet)
1. Emphasis leads to improvements (7 out of 8 comparisons).
2. Normalization doesn't (1 out of 4 comparisons).
55. Experiment 1
STS :: SentiWordNet
Accuracy: Basic 71.87, Normalized 72.42, Emphasized 71.31, Norm-Emph 71.59
(gaps not significant)
Normalization improves the accuracy; Emphasis doesn't.
56. Experiment 1
STS :: WordNet Affect
Accuracy: Basic 62.95, Normalized 62.67, Emphasized 62.96, Norm-Emph 62.95
(gaps not significant)
Emphasis improves the accuracy; Normalization doesn't.
57. Experiment 1
STS :: MPQA
Accuracy: Basic 69.54, Normalized 70.75, Emphasized 69.92, Norm-Emph 70.76
(gaps not significant)
Both Emphasis and Normalization improve the accuracy.
58. Experiment 1
STS :: SenticNet
Accuracy: Basic 74.37, Normalized 74.65, Emphasized 74.65, Norm-Emph 73.82
(gaps not significant)
Normalization improves the accuracy; Emphasis doesn't.
59. Experiment 1
General Outcomes on STS (SentiWordNet, WordNet Affect, MPQA, SenticNet)
1. Inconsistent behavior (normalization typically improves the accuracy, emphasis doesn't).
2. Little statistical significance (small dataset).
60. Experiment 2: Inter-Lexicons evaluation
61. Experiment 2
Comparison between lexicons

Accuracy (%)     SemEval-2013   STS
SentiWordNet     58.99          72.42
SenticNet        48.69          74.65
WordNet-Affect   55.08          62.96
MPQA             58.25          70.76
62. Experiment 2
Comparison between lexicons
SentiWordNet is the best-performing configuration on SemEval data.
63. Experiment 2
Comparison between lexicons
MPQA performs well on SemEval data.
64. Experiment 2
Comparison between lexicons
SenticNet behaves inconsistently: worst on SemEval, best on STS.
65. Experiment 2
Comparison between lexicons
Reason: SenticNet can hardly classify neutral Tweets (threshold learning?).
66. Experiment 2
Comparison between lexicons
SentiWordNet and MPQA confirm their performance on STS.
67. Experiment 2
Comparison between lexicons
Poor coverage negatively influences WordNet Affect's performance.
68. Experiment 2
Statistical Analysis
Significance of each lexicon's gap from the best-performing one:
SemEval-2013: SentiWordNet best; SenticNet p < 0.0001; WordNet-Affect p < 0.001; MPQA p < 0.50
STS: SenticNet best; SentiWordNet p < 0.42; WordNet-Affect p < 0.0001; MPQA p < 0.11
69. Experiment 2
Conclusions
Best-performing lexicons: SentiWordNet on SemEval-2013 (58.99), SenticNet on STS (74.65).
70. Conclusions
71. Lessons Learned
Investigation of the effectiveness of lexical resources in polarity classification of microblog posts: a comparison of 4 state-of-the-art resources (SentiWordNet, SenticNet, MPQA, WordNet Affect).
Research Question: What is the impact of each lexical resource on the task of polarity classification?
1. MPQA and SentiWordNet typically outperform the other resources (an interesting result, given MPQA's smaller coverage).
2. SenticNet's behavior deserves deeper investigation.
72. Future Research
• Evaluation on different datasets and with more lexical resources
• Better tuning of parameters (classification threshold), integration of more complex syntactic structures, merging of lexical resources
• Integration of the algorithm into a recommendation framework, exploiting sentiment-based information to model user interests