2. What is SA & OM? (Sentiment Analysis & Opinion Mining)
• Identify the orientation of opinion in a piece of text
• Can be generalized to a wider set of emotions
Examples:
– "The movie was fabulous!" (positive)
– "The movie stars Mr. X" (objective, no opinion)
– "The movie was horrible!" (negative)
3. Motivation
• Knowing sentiment is a very natural ability of a human being. Can a machine be trained to do it?
• SA aims at extracting sentiment-related knowledge, especially from the huge amount of information on the internet
• Can be used broadly to understand opinion in a set of documents
4. Tripod of Sentiment Analysis
Sentiment Analysis rests on three fields:
• Cognitive Science
• Natural Language Processing
• Machine Learning
6. Challenges
• Contrasts with standard text-based categorization
• Domain dependence
• Sarcasm
• Dissatisfied expressions

Contrast with text categorization: in text categorization, the mere presence of words is indicative of the category. This is not the case with sentiment analysis.

Domain dependence: the sentiment of a word is relative to the domain. Example: "unpredictable" is negative for the steering of a car, but positive in a movie review (an unpredictable plot).

Sarcasm: sarcasm uses words of one polarity to represent the opposite polarity. Example: "The perfume is so amazing that I suggest you wear it with your windows shut."

Dissatisfied expressions: the sentences/words that contradict the overall sentiment of the set are in the majority. Example: "The actors are good, the music is brilliant and appealing. Yet, the movie fails to strike a chord."
7. SentiWordNet
• Lexical resource for sentiment analysis
• Built on top of WordNet synsets
• Attaches sentiment-related information (positivity, negativity, objectivity scores) to synsets
9. Building SentiWordNet
• Ln, Lo, Lp are the three seed sets (negative, objective, positive)
• Iteratively expand the seed sets through K steps
• Train the classifier on the expanded sets
10. Expansion of seed sets
• Starting from Lp and Ln, the sets at the end of the kth step are called Tr(k,p) and Tr(k,n)
• Tr(k,o) is the set of synsets present in neither Tr(k,p) nor Tr(k,n)
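The iterative expansion can be sketched as a breadth-wise traversal of polarity-preserving relations. The relation map and synset ids below are hypothetical stand-ins for WordNet's actual relations (the real procedure also follows polarity-inverting relations such as antonymy):

```python
# Sketch of SentiWordNet-style seed-set expansion (hypothetical data).
# `related` maps each synset id to synsets reachable via assumed
# polarity-preserving relations (e.g. "similar-to", "also-see").

def expand(seed, related, k):
    """Expand a seed set through k steps of relation traversal."""
    current = set(seed)
    for _ in range(k):
        frontier = set()
        for synset in current:
            frontier.update(related.get(synset, []))
        current |= frontier
    return current

# Toy relation graph over made-up synset ids
related = {
    "good.a.01": ["nice.a.01"],
    "nice.a.01": ["pleasant.a.01"],
    "bad.a.01": ["awful.a.01"],
}

Tr_1_p = expand({"good.a.01"}, related, k=1)  # Tr(1,p): one expansion step
Tr_2_p = expand({"good.a.01"}, related, k=2)  # Tr(2,p): two expansion steps
```

Each step only adds synsets reachable from the current set, so larger K yields larger (and noisier) training sets, which matches the precision/recall trade-off noted on the next slide.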
11. Committee of classifiers
• Train a committee of classifiers of different types and different K-values for the given data
• Observations:
– Low values of K give high precision and low recall
– Accuracy in determining positivity or negativity, however, remains almost constant
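One way a committee's votes can be turned into graded positive/negative/objective scores is a simple normalized-vote scheme; this is a sketch under that assumption, not necessarily the exact aggregation used in the original work:

```python
# Hedged sketch: graded sentiment scores from a committee vote.
# Each (hypothetical) committee member labels a synset "p", "n", or "o";
# the fraction of votes per label gives scores that sum to 1.
from collections import Counter

def committee_scores(votes):
    counts = Counter(votes)
    total = len(votes)
    return {label: counts.get(label, 0) / total for label in ("p", "n", "o")}

votes = ["p", "p", "o", "p", "n", "o", "p", "o"]  # 8 committee members
scores = committee_scores(votes)
# scores == {"p": 0.5, "n": 0.125, "o": 0.375}
```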
12. WordNet-Affect
• Similar to SentiWordNet (an earlier work)
• WordNet-Affect = WordNet + annotated affective concepts in a hierarchical order
• The hierarchy is called "affective domain labels", e.g.:
– behaviour
– personality
– cognitive state
14. Constructing the graph
• Why graphs? To model item-specific and pairwise information independently.
• Nodes and edges?
– Nodes: the sentences of the document, plus a source s and a sink t; the source and sink represent the two classes of sentences
– Edges: weighted with either of the two scores below
• Individual scores: Ind_sub(s_i) predicts whether sentence s_i is subjective or not
• Association scores: predict whether two sentences should have the same subjectivity level
– T: threshold, the maximum distance up to which sentences may be considered proximal
– f: the decaying function
– i, j: the position numbers of the sentences
– In Pang and Lee's formulation, assoc(s_i, s_j) = f(j − i) when j − i ≤ T, and 0 otherwise
15. Constructing the graph
• Build an undirected graph G with vertices {v1, v2, …, s, t} (the sentences plus source s and sink t)
• Add edges (s, vi), each with weight ind1(xi)
• Add edges (t, vi), each with weight ind2(xi)
• Add edges (vi, vk) with weight assoc(vi, vk)
• Partition cost of a cut (C1, C2): the sum of the weights of the edges crossing the cut; in Pang and Lee's formulation, cost = Σ_{x in C1} ind2(x) + Σ_{x in C2} ind1(x) + Σ_{xi in C1, xk in C2} assoc(xi, xk), and the minimum cut gives the subjective/objective split
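The partition cost on this graph can be sketched as follows. The scores are made up, and the brute-force minimization over all splits stands in for a real max-flow/min-cut solver:

```python
# Minimal sketch of the min-cut partition cost (hypothetical scores).
# ind1/ind2 are each sentence's affinity to the source class and the
# sink class; assoc holds pairwise association weights.
from itertools import chain, combinations

def partition_cost(C1, C2, ind1, ind2, assoc):
    """Cost of cutting the sentences into C1 (source side) and C2 (sink side)."""
    cost = sum(ind2[x] for x in C1)           # C1 members lose their sink edges
    cost += sum(ind1[x] for x in C2)          # C2 members lose their source edges
    cost += sum(assoc.get((i, k), 0) + assoc.get((k, i), 0)
                for i in C1 for k in C2)      # association edges across the cut
    return cost

ind1 = {0: 0.9, 1: 0.8, 2: 0.1}   # toy subjectivity scores
ind2 = {0: 0.1, 1: 0.2, 2: 0.9}   # toy objectivity scores
assoc = {(0, 1): 0.5, (1, 2): 0.2}

# Brute-force the minimum-cost split of 3 sentences (2^3 subsets)
best = min(
    (partition_cost(C1, [x for x in ind1 if x not in C1], ind1, ind2, assoc),
     tuple(C1))
    for C1 in chain.from_iterable(combinations(ind1, r)
                                  for r in range(len(ind1) + 1))
)
# best split: sentences 0 and 1 on the subjective side, sentence 2 objective
```

With these toy numbers the association edge (0, 1) pulls sentences 0 and 1 onto the same side, illustrating how pairwise information reshapes the purely per-sentence decision.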
19. Approach 1: Using adjectives
• Many adjectives have high sentiment value, while others are neutral:
– A "beautiful" bag
– A "wooden" bench
– An "embarrassing" performance
• One idea is to augment adjectives in WordNet with this polarity information
20. Setup
• Two anchor words, "excellent" and "poor" (the extremes of the polarity spectrum), were chosen
• The PMI of each adjective with respect to these anchors is calculated:
Polarity Score(W) = PMI(W, excellent) − PMI(W, poor)
(Figure: a word's PMI is measured against both anchors.)
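The polarity score can be sketched from co-occurrence counts. All counts below are made up, and the window-based PMI estimate is an assumption about how the statistics would be gathered:

```python
# Hedged sketch of the anchor-word polarity score (toy counts, not real
# corpus statistics). PMI(w, a) = log2( p(w, a) / (p(w) * p(a)) ),
# estimated from hypothetical co-occurrence counts over N windows.
import math

N = 1_000_000  # hypothetical total number of context windows

def pmi(cooc, count_w, count_a):
    return math.log2((cooc / N) / ((count_w / N) * (count_a / N)))

def polarity_score(cooc_excellent, cooc_poor, count_w, count_exc, count_poor):
    return (pmi(cooc_excellent, count_w, count_exc)
            - pmi(cooc_poor, count_w, count_poor))

# "fabulous" co-occurs far more with "excellent" than with "poor"
score = polarity_score(cooc_excellent=120, cooc_poor=5,
                       count_w=2000, count_exc=50_000, count_poor=40_000)
# score > 0  → positive orientation
```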
21. Experimentation
• The K-means clustering algorithm is applied to the polarity scores
• The resulting clusters contain words with similar polarities
• These words can be linked using an "isopolarity link" in WordNet
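Since the polarity scores are one-dimensional, K-means reduces to a very small computation. This is an illustrative Lloyd's-algorithm sketch over made-up scores; a real experiment would use a library implementation:

```python
# Illustrative 1-D K-means over hypothetical polarity scores.

def kmeans_1d(values, centers, iters=20):
    clusters = [[] for _ in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            # assign each value to its nearest center
            idx = min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            clusters[idx].append(v)
        # recompute each center as the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

scores = [-4.2, -3.9, -0.1, 0.2, 3.8, 4.5]   # hypothetical polarity scores
centers, clusters = kmeans_1d(scores, centers=[-4.0, 0.0, 4.0])
# three clusters emerge: negative, near-neutral, and positive words
```

Words landing in the same cluster are the candidates for the "isopolarity links" mentioned above.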
22. Results
• Three clusters were seen
• The majority of words had negative polarity scores
• Obscure words, the ones that are not very common, were removed by selecting adjectives with a familiarity count of 3
23. Approach 2: Using Adverb-Adjective Combinations (AACs)
• Calculate sentiment value based on the effect of adverbs on adjectives
• Linguistic categories of adverbs:
– Adverbs of affirmation: certainly
– Adverbs of doubt: possibly
– Strong intensifying adverbs: extremely
– Weak intensifying adverbs: scarcely
– Negation and minimizers: never
24. Moving towards computation…
• Based on the type of adverb, the score of the resultant AAC is adjusted
• Example of an axiom: "extremely good" is more positive than "good"
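The adverb categories can be modelled as multipliers on an adjective's base score. The multipliers and base scores below are illustrative assumptions, not the paper's actual axioms:

```python
# Hedged sketch of adverb-adjective combination (AAC) scoring.
# Base scores and adverb effects are made-up values; results are
# clipped to [-1, 1].

ADJ_SCORE = {"good": 0.6, "bad": -0.6}   # hypothetical base scores

ADVERB_EFFECT = {
    "certainly": 1.2,   # affirmation: strengthens
    "possibly": 0.6,    # doubt: weakens
    "extremely": 1.5,   # strong intensifier
    "scarcely": 0.3,    # weak intensifier / minimizer
    "never": -1.0,      # negation: flips polarity
}

def aac_score(adverb, adjective):
    raw = ADVERB_EFFECT.get(adverb, 1.0) * ADJ_SCORE[adjective]
    return max(-1.0, min(1.0, raw))

# satisfies the axiom: "extremely good" is more positive than "good"
```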
26. Scoring the sentiment on a topic
• Rel(t): sentences in d that refer to topic t
• s: a sentence in Rel(t)
• Appl+(s): AACs with a positive score in s
• Appl−(s): AACs with a negative score in s
• Return strength =
27. Findings
• APSr with r = 0.35 worked best (better correlation with human subjects)
– Adjectives are more important than adverbs in terms of sentiment
• AACs give better precision and recall compared to adjectives alone
29. Lexicon
Sample entries (each marks which argument sends the sentiment and which receives it, subj./obj.):
• b VB bolt subj (the subject receives the sentiment)
• b VB lack obj ~subj (the object sends the sentiment; the subject receives it, inverted)
30. Lexicon
• The notation also allows "S+" characters
• Similar to regular expressions
• E.g., "to put S+ to risk"
– The favorability of the subject depends on the favorability of "S+"
31. Example
Sentence: "The movie lacks a good story."
Lexicon entries applied:
– g JJ good obj. marks the positive phrase, giving "The movie lacks S+."
– b VB lack obj ~subj. transfers the sentiment of S+ (inverted) to the subject
Steps:
1) Consider a context window of up to five words
2) Shallow-parse the sentence
3) Calculate the sentiment value step by step, based on the lexicon, substituting "S+" characters at each step
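The stepwise procedure can be sketched as below. The lexicon and the word-level matching are simplified stand-ins for the shallow-parse-based method; the entries mirror the "good"/"lacks" example above:

```python
# Hedged sketch of stepwise lexicon-based sentiment transfer.
# Adjective polarities and transfer verbs are toy lexicon entries.
import re

ADJ_POLARITY = {"good": +1, "horrible": -1}     # "g JJ good" style entries

# transfer verbs: multiplier applied when sentiment moves from the
# object to the subject ("b VB lack obj ~subj" inverts the polarity)
TRANSFER_VERBS = {"lacks": -1, "has": +1}

def sentence_polarity(sentence):
    words = re.findall(r"[a-z]+", sentence.lower())
    polarity = 0
    for i, w in enumerate(words):
        if w in TRANSFER_VERBS:
            # find the first polar adjective in the object (after the verb)
            for obj_word in words[i + 1:]:
                if obj_word in ADJ_POLARITY:
                    polarity = TRANSFER_VERBS[w] * ADJ_POLARITY[obj_word]
                    break
    return polarity

# "lacks" inverts the positive sentiment of "good" → negative overall
p = sentence_polarity("The movie lacks a good story.")
```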
33. Applications
• Review-related analysis
• Developing "hate mail filters" analogous to "spam mail filters"
• Question answering (opinion-oriented questions may require different treatment)
34. Conclusion & Future Work
• Lexical resources have been developed to capture the sentiment-related properties of words
• Subjective extracts provide better accuracy of sentiment prediction
• Several approaches use algorithms like Naïve Bayes and clustering to perform sentiment analysis
• The cognitive angle of Sentiment Analysis can be explored in the future
35. References (1/2)
• Tetsuya Nasukawa, Jeonghee Yi. "Sentiment Analysis: Capturing Favorability Using Natural Language Processing". In K-CAP '03, Florida, pages 1-8, 2003.
• Alekh Agarwal, Pushpak Bhattacharyya. "Augmenting WordNet with Polarity Information on Adjectives". In K-CAP '03, Florida, pages 1-8, 2003.
• Andrea Esuli, Fabrizio Sebastiani. "SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining".
• Jiawei Han, Micheline Kamber. "Data Mining: Concepts and Techniques", 2nd edition, pages 310-330.
• http://wordnet.princeton.edu
• Farah Benamara, Carmine Cesarano, Antonio Picariello, VS Subrahmanian et al. "Sentiment Analysis: Adjectives and Adverbs are Better than Adjectives Alone". In ICWSM 2007, Boulder, CO, USA, 2007.
36. References (2/2)
• Jon M. Kleinberg. "Authoritative Sources in a Hyperlinked Environment". IBM Research Report RJ 10076, May 1997, pages 1-34.
• www.cs.uah.edu/~jrushing/cs696-summer2004/notes/Ch8Supp.ppt
• Bo Pang, Lillian Lee. "Opinion Mining and Sentiment Analysis". Foundations and Trends in Information Retrieval, Vol. 2, Nos. 1-2, pages 1-135, 2008.
• Bo Pang, Lillian Lee. "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts". In Proceedings of the 42nd ACL, pages 271-278, 2004.
• http://www.cse.iitb.ac.in/~veeranna/ppt/Wordnet-Affect.ppt