MTech Seminar Presentation [IIT-Bombay]

Resources for Sentiment Analysis
Seminar Presentation
Sagar Ahire
133050073
IIT Bombay
02 May, 2014

Roadmap
1 Introduction
2 Sentiwordnet
3 SO-CAL
4 Wordnet-Affect
5 Indian-Language Sentiwordnets
6 Conclusions

Introduction

Introduction Overview
Overview
This presentation covers lexical resources for sentiment analysis.
Four resources are covered, each using a different approach for
representation and creation:
Sentiwordnet, created automatically, with 3 graded scores per synset
SO-CAL, created manually, with a graded score per word
Wordnet-Affect, created semi-automatically, with affect information for
each synset
Indian-Language Sentiwordnet, created by projecting the English
Sentiwordnet

Introduction Sentiment Analysis
Sentiment Analysis
Sentiment Analysis: Determining the opinion expressed in a text
Approaches:
Classifier-based
Lexicon-based

Why Lexicon-based Approach?
The classifier-based approach has the following drawbacks:
Domain Specificity (Example: Movie reviews mentioning ‘writer’,
‘plot’, etc.) [Bro01]
Lack of Context (Example: ‘good’ vs ‘not good’ vs ‘not very good’)
The lexicon-based approach aims at solving these problems.

Introduction Sentiment Lexicons
Sentiment Lexicons
A sentiment lexicon is a database of entries of the form
(lexical unit, sentiment).
Choices for lexical unit:
Word
Word sense
Phrase, etc.
Choices for sentiment:
Fixed categorization into ‘positive’ and ‘negative’
Graded sets like ‘strongly positive’, ‘mildly positive’, ‘neutral’, ‘mildly
negative’, ‘strongly negative’
Score in an interval like [0, 1] or [−1, +1]

Approaches for Creation
Manual
Automatic

Sentiwordnet

Introduction to Sentiwordnet
Sentiwordnet [ES06] is an automatically generated sentiment lexicon made
using Wordnet. Its salient features are:
High coverage
Support for graded sentiment labels
Support for both sentiment classification and subjectivity detection

Sentiwordnet Structure
Structure of Sentiwordnet
Sentiwordnet = Wordnet + Sentiment Information.
Each synset s is given three sentiment scores:
Positive score Pos(s)
Negative score Neg(s)
Objective score Obj(s)
Pos(s) + Neg(s) + Obj(s) = 1
Example Synset
beautiful: Pos = 0.75, Neg = 0.00, Obj = 0.25
URL: http://sentiwordnet.isti.cnr.it/search.php?q=beautiful
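
The constraint Pos(s) + Neg(s) + Obj(s) = 1 means only two of the three scores need to be stored. A minimal sketch of one way to hold such a triple follows. The class name SynsetScores is hypothetical and is not the Sentiwordnet file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynsetScores:
    """Sentiment scores of one synset (hypothetical container)."""
    pos: float
    neg: float

    def __post_init__(self):
        # Both scores must leave a non-negative remainder for Obj(s).
        if self.pos < 0 or self.neg < 0 or self.pos + self.neg > 1:
            raise ValueError("Pos and Neg must be >= 0 and sum to at most 1")

    @property
    def obj(self) -> float:
        # Obj(s) is the remainder, so that Pos + Neg + Obj = 1.
        return 1.0 - self.pos - self.neg


# The example synset from the slide.
beautiful = SynsetScores(pos=0.75, neg=0.00)
print(beautiful.pos, beautiful.neg, beautiful.obj)
```

Deriving Obj(s) instead of storing it keeps the three scores consistent by construction.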

Sentiwordnet Creation
Creation Steps
The top-level steps in the algorithm to create Sentiwordnet are as follows:
1 Selection of seed set
2 Expansion using Wordnet’s semantic relations
3 Training of a team of ternary classifiers
4 Classification of each Wordnet synset using the classifiers

SO-CAL

Introduction to SO-CAL
SO-CAL is a system that uses a manually-constructed lexicon. Its salient
features are:
Highly detailed lexicon
Graded sentiment labels
Low coverage, but high accuracy

SO-CAL Structure
Features Used
SO-CAL classifies words into various features and treats each feature
differently in the lexicon. They are:
Adjectives
Nouns, Verbs, Adverbs and Multiwords
Intensifiers and Downtoners
Negation
Irrealis Blocking

Structure of SO-CAL
Sentiment scoring:
Words are scored in [−5, +5]
Intensifiers and negation further act upon these scores
Examples
good: +3
monstrosity: −5
masterpiece: +5

Wordnet-Affect

Introduction to Wordnet-Affect
Wordnet-Affect [SV04] is a semi-automatically generated sentiment lexicon
made using Wordnet. It associates affective information with each
synset. Its salient features are:
Highly detailed
Ability to handle sentiment differently depending on emotion

Wordnet-Affect Structure
Structure of Wordnet-Affect
Wordnet-Affect = Wordnet + Affect Information.
Affect is represented using the following:
An a-label which represents the emotion,
The valency which indicates the sentiment.
The a-labels form a tree of emotions starting at a root node, with each
leaf node corresponding to a synset.
The valency can be any of positive, negative, neutral or ambiguous.

root
  mental-state
    cognitive-state
    affective-state
      mood
      emotion
        positive-emotion
          joy
            elation
          love
            worship
        negative-emotion
          sadness
            melancholy
          shame
            embarrassment
          . . .
        . . .
  physical-state
  . . .
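
The a-label tree can be sketched as nested dictionaries. The nesting below is one reading of the slide's diagram (for example, elation under joy) and leaves out the elided branches. Reading a valency off the path, as the hypothetical helper `valency` does, is a simplification: in Wordnet-Affect the valency is stored as separate information, and the ‘ambiguous’ value is not modeled here.

```python
# Fragment of the a-label hierarchy as nested dicts (label -> children).
A_LABELS = {
    "root": {
        "mental-state": {
            "cognitive-state": {},
            "affective-state": {
                "mood": {},
                "emotion": {
                    "positive-emotion": {
                        "joy": {"elation": {}},
                        "love": {"worship": {}},
                    },
                    "negative-emotion": {
                        "sadness": {"melancholy": {}},
                        "shame": {"embarrassment": {}},
                    },
                },
            },
        },
        "physical-state": {},
    }
}


def path_to(label, tree=A_LABELS, prefix=()):
    """Return the labels from the root down to `label`, or None if absent."""
    for name, children in tree.items():
        here = prefix + (name,)
        if name == label:
            return here
        found = path_to(label, children, here)
        if found is not None:
            return found
    return None


def valency(label):
    """Coarse valency read off the path (hypothetical simplification)."""
    path = path_to(label)
    if path is None:
        return None
    if "positive-emotion" in path:
        return "positive"
    if "negative-emotion" in path:
        return "negative"
    return "neutral"


print(" > ".join(path_to("elation")))
print(valency("embarrassment"))
```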

Wordnet-Affect Creation
Creation Steps
Wordnet-Affect was created using the following steps:
Manual creation of initial resource
Automatic expansion using Wordnet relations

Indian-Language Sentiwordnets

Introduction to Indian-Language Sentiwordnets
Indian-language Sentiwordnets can be created using Wordnet projection
[JRB10]. This approach has the following salient features:
Easy to create once backing resources are available
No duplication of effort
Use of tried-and-tested representations

Indian-Language Sentiwordnets Creation
Creation Steps
The process of projecting a Sentiwordnet has the following steps:
Fetch a synset from the English Sentiwordnet.
Find the corresponding Hindi synset using Indowordnet.
Assign sentiment scores from English synset to Hindi synset.
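
The three projection steps can be sketched over plain dictionaries. The function name and the synset identifiers (en-001, hi-101) are made up for illustration. A real implementation would read the English Sentiwordnet and the Indowordnet synset links instead.

```python
def project_sentiwordnet(english_scores, synset_links):
    """Copy (pos, neg, obj) scores from English synsets to linked synsets."""
    projected = {}
    for en_id, scores in english_scores.items():  # step 1: fetch a synset
        hi_id = synset_links.get(en_id)           # step 2: find its Hindi link
        if hi_id is None:
            continue                              # no counterpart, skip it
        projected[hi_id] = scores                 # step 3: assign the scores
    return projected


# Toy data with hypothetical identifiers.
english_scores = {
    "en-001": (0.75, 0.00, 0.25),
    "en-002": (0.00, 0.625, 0.375),
}
synset_links = {"en-001": "hi-101"}

hindi_scores = project_sentiwordnet(english_scores, synset_links)
print(hindi_scores)
```

English synsets without a linked Hindi synset are skipped, so the projected resource can be no larger than the set of linked synsets.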

Conclusions

A Comparison of the Resources
Criterion         SWN         SO-CAL    WN-Affect       IL-SWN
Sentiment         3 x [0, 1]  [−5, +5]  Affect          3 x [0, 1]
Lexical Unit      Synset      Word      Synset          Synset
Backing Resource  Wordnet     None      Wordnet         SWN + Indowordnet
Creation          Automatic   Manual    Semi-automatic  Projection
No. of Entries    117,000     5,000     900             16,000

Concluding Remarks
To conclude, there are three choices in making a sentiment lexicon:
Creation Approach: Manual, Automatic, Semi-Automatic or
Projection
Lexical Unit: Word, Synset or Higher Representations
Sentiment: Labels, Graded Scores or Affect Information

Concluding Remarks: Creation Approach
Manual Approach           Automatic Approach
High annotation accuracy  Low annotation accuracy
High time investment      Low time investment
More detail supported     Less detail supported

Concluding Remarks: Lexical Unit
Word                                   Synset
Unreliable for polysemous words        Reliable for polysemous words
No pre-processing required             Requires WSD
Projection is comparatively difficult  Projection is comparatively easy

Concluding Remarks: Sentiment
Graded scores have been shown to be better than mere labels in general.
Moreover, a graded score resource can always be converted to a
label-based resource.
Affect information can help in specialized circumstances.
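
Converting a graded-score resource to a label-based one can be as simple as thresholding the difference between the positive and negative scores. The rule, the function name and the default threshold below are assumptions chosen for illustration, not a method from any of the resources covered.

```python
def to_label(pos, neg, threshold=0.0):
    """Map graded Pos/Neg scores to a coarse label (hypothetical rule)."""
    diff = pos - neg
    if diff > threshold:
        return "positive"
    if diff < -threshold:
        return "negative"
    return "neutral"


print(to_label(0.75, 0.00))
print(to_label(0.00, 0.50))
print(to_label(0.25, 0.25))
```

The conversion only goes one way: the labels discard the grading, so the scores cannot be recovered from them.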

Future Work
Possible directions in the future:
Automatic resources for higher-level lexical units like phrases, trees,
etc.
Manual resources for synsets
Manual lexicons for Indian languages
Techniques for building dynamic resources to incorporate ‘netspeak’
and other slang

References
[Bro01] Julian Brooke, A semantic approach to automatic text sentiment
analysis, M.A. thesis, Stanford University, 2001.
[ES06] Andrea Esuli and Fabrizio Sebastiani, SentiWordNet: A publicly
available lexical resource for opinion mining, Proceedings of the 5th
Conference on Language Resources and Evaluation (LREC-06), 2006,
pp. 417–422.
[Esu08] Andrea Esuli, Automatic generation of lexical resources for opinion
mining: Models, algorithms and applications, Ph.D. thesis, Università
di Pisa, 2008.
[Fel98] Christiane Fellbaum, Wordnet: An electronic lexical database, A
Bradford Book, 1998.
[HM97] Vasileios Hatzivassiloglou and Kathleen R. McKeown, Predicting the
semantic orientation of adjectives, Proceedings of the 35th Annual
Meeting of the Association for Computational Linguistics and Eighth
Conference of the European Chapter of the Association for
Computational Linguistics, Association for Computational Linguistics,
1997, pp. 174–181.
[JRB10] Aditya Joshi, Balamurali A R, and Pushpak Bhattacharyya, A
fall-back strategy for sentiment analysis in Hindi: a case study,
Proceedings of ICON 2010: 8th International Conference on Natural
Language Processing, Macmillan Publishers, India, 2010.
[KMMdR04] Jaap Kamps, Maarten Marx, Robert J. Mokken, and Maarten
de Rijke, Using WordNet to measure semantic orientations of
adjectives, Proceedings of LREC-04, 4th International Conference on
Language Resources and Evaluation, 2004, pp. 1115–1118.
[RW03] Ellen Riloff and Janyce Wiebe, Learning extraction patterns for
subjective expressions, Proceedings of the 2003 Conference on
Empirical Methods in Natural Language Processing, Association for
Computational Linguistics, 2003, pp. 105–112.
[SV04] Carlo Strapparava and Alessandro Valitutti, WordNet-Affect: an
affective extension of WordNet, Proceedings of the 4th International
Conference on Language Resources and Evaluation (LREC-04), 2004,
pp. 1083–1086.
[TL03] Peter D. Turney and Michael L. Littman, Measuring praise and
criticism: Inference of semantic orientation from association, ACM
Transactions on Information Systems 21 (2003), no. 4, 315–346.

Additional Slides Wordnet
Wordnet
Wordnet [Fel98] is a lexical database organized by word sense. The
fundamental unit of storage is called a synset.
An Example Synset
brilliant, superb: of surpassing excellence
“a brilliant performance”; “a superb actor”
URL: http://wordnetweb.princeton.edu/perl/webwn?s=brilliant

Semantic Relations in Wordnet
Wordnet synsets are linked to each other by relations called semantic
relations. Some of them are:
Antonymy
Meronymy
Hypernymy
Hyponymy
Similar to, etc.
These relations are helpful in creating the training set for classifying
synsets to create Sentiwordnet.

Additional Slides Background
Sentiment Classification
Initial work that automatically detected the sentiment of a word led to
today’s lexicons. This included:
Use of conjunction-separated adjectives [HM97]
PMI-based Extraction using Web Queries [TL03]
Graph Expansion using Wordnet [KMMdR04]
Classification using Wordnet Glosses [Esu08]

Subjectivity Detection
Work that identifies whether a term is indeed subjective is necessary to
filter out objective words from sentiment classification. This includes:
Adapting Wordnet Glosses to Subjectivity Detection [Esu08]
Bootstrapping Subjective Expressions from a Corpus [RW03]

Additional Slides Structure of SO-CAL
Adjectives
Adjectives were collected from a 500-document corpus and annotated with
a sentiment score from −5 to +5.
Examples
good: +3
sleazy: −3

Nouns, Verbs, Adverbs, Multiwords
This was extended to other parts of speech and multiword expressions, for
a total of about 5,000 words.
Examples
monstrosity: −5
masterpiece: +5
inspire: +2
funny: +2 vs. act funny: −1

Intensifiers and Downtoners
Intensifiers are words that increase sentiment intensity, while downtoners
are words that reduce it, for example ‘extraordinarily’ and ‘somewhat’
respectively.
Intensifiers and downtoners are modeled as percentage modifiers.
Examples
slightly: −50%
extraordinarily: +50%

Negation
Negation is modeled as a numeric shift of value 4 towards the opposite
sentiment.
Examples
good: +3 ⇒ not good: −1
atrocious: −5 ⇒ not atrocious: −1
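
The intensifier and negation rules can be tried out on the example words. This is a minimal sketch, not the SO-CAL implementation: the function names are hypothetical, and applying a percentage modifier as score × (1 + percentage) is an assumed reading of "percentage modifier".

```python
LEXICON = {"good": 3, "atrocious": -5}                    # scores from the slides
MODIFIERS = {"slightly": -0.50, "extraordinarily": 0.50}  # percentage modifiers
NEGATION_SHIFT = 4                                        # shift used for negation


def intensify(score, modifier):
    """Scale a score by a percentage modifier (assumed multiplicative)."""
    return score * (1 + MODIFIERS[modifier])


def negate(score):
    """Shift a score by 4 toward the opposite polarity."""
    if score > 0:
        return score - NEGATION_SHIFT
    if score < 0:
        return score + NEGATION_SHIFT
    return score


print(negate(LEXICON["good"]))                        # not good
print(negate(LEXICON["atrocious"]))                   # not atrocious
print(intensify(LEXICON["good"], "slightly"))         # slightly good
print(intensify(LEXICON["good"], "extraordinarily"))  # extraordinarily good
```

Shifting instead of flipping the sign means ‘not atrocious’ stays mildly negative instead of becoming +5.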

Irrealis Blocking
An irrealis marker is a word that indicates that the sentiment may not be
reliable because the event hasn’t actually happened. For example, ‘would’,
‘expect’, ‘if’, quotation marks, etc.
Sentences with irrealis markers are ignored for sentiment analysis.
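
Irrealis blocking amounts to a filter applied before scoring. The sketch below uses only the markers named above and a deliberately crude tokenizer. Both the marker set and the helper names are illustrative assumptions, not SO-CAL's actual marker list.

```python
IRREALIS_MARKERS = {"would", "expect", "if"}  # markers named on the slide


def is_blocked(sentence):
    """True if the sentence has an irrealis marker or a quotation mark."""
    # Crude tokenization: lowercase, strip commas and periods, split on spaces.
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    has_marker = any(tok in IRREALIS_MARKERS for tok in tokens)
    has_quote = '"' in sentence
    return has_marker or has_quote


def scorable_sentences(sentences):
    """Keep only the sentences that are not blocked."""
    return [s for s in sentences if not is_blocked(s)]


review = ["The plot is good.", "I would expect a good plot."]
print(scorable_sentences(review))
```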

Additional Slides Sentiwordnet Creation
Seed Set
Two seed sets are created:
Lp for positive synsets
Ln for negative synsets
Each synset representation consists of:
The terms
The definition
The sample phrases
Explicit indication of negation

Wordnet Expansion
Relations of Wordnet used for expansion:
Direct antonymy
Similarity
Derived from
Pertains to
Attribute
Also see
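
Seed-set expansion over these relations can be sketched on a toy graph. The sketch assumes that direct antonymy moves a synset to the opposite set while the other relations keep it in the same set. The words stand in for synsets, and the graph data is invented.

```python
def expand(seed_pos, seed_neg, same_polarity, antonyms, iterations):
    """Grow the two seed sets along Wordnet-style links."""
    pos, neg = set(seed_pos), set(seed_neg)
    for _ in range(iterations):
        # Polarity-preserving links stay in the same set; antonyms cross over.
        new_pos = {t for s in pos for t in same_polarity.get(s, ())}
        new_pos |= {t for s in neg for t in antonyms.get(s, ())}
        new_neg = {t for s in neg for t in same_polarity.get(s, ())}
        new_neg |= {t for s in pos for t in antonyms.get(s, ())}
        pos |= new_pos
        neg |= new_neg
    return pos, neg


# Toy graph (hypothetical data).
same_polarity = {"good": ["fine"], "fine": ["splendid"], "bad": ["awful"]}
antonyms = {"good": ["bad"], "bad": ["good"]}

pos, neg = expand({"good"}, set(), same_polarity, antonyms, iterations=2)
print(sorted(pos), sorted(neg))
```

More iterations give a larger but noisier training set.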

Classifiers
8 classifiers were created, differing in:
No. of iterations of expansion (0, 2, 4, 6)
Learning algorithm (SVM, Rocchio)
Each ternary classifier is a combination of 2 binary classifiers:
Positive vs. Not Positive
Negative vs. Not Negative
The results are combined as:
              Positive   Not Positive
Negative      Objective  Negative
Not Negative  Positive   Objective
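
The combination table translates directly into a small function. The second function assumes, following the Sentiwordnet paper [ES06], that a synset's scores are the fractions of the ternary classifiers that vote for each label. The function names and the vote list are illustrative.

```python
from collections import Counter

LABELS = ("Positive", "Negative", "Objective")


def ternary_label(is_positive, is_negative):
    """Combine the outputs of the two binary classifiers as in the table."""
    if is_positive and not is_negative:
        return "Positive"
    if is_negative and not is_positive:
        return "Negative"
    return "Objective"  # both fired or neither fired


def committee_scores(votes):
    """Fraction of ternary classifiers voting for each label (assumed rule)."""
    counts = Counter(votes)
    return {label: counts[label] / len(votes) for label in LABELS}


# Hypothetical votes of the 8 classifiers for one synset.
votes = ["Positive"] * 6 + ["Objective"] * 2
print(committee_scores(votes))
```

Under this reading, 6 positive votes and 2 objective votes out of 8 give Pos = 0.75, Neg = 0.00 and Obj = 0.25, the same scores as the ‘beautiful’ example.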
