Sentiment Analysis in Twitter with
Lightweight Discourse Analysis
-Subhabrata Mukherjee & Pushpak Bhattacharyya
Presented By: Naveen Kumar
Outline
 Introduction to Sentiment Analysis
 Motivation
 Introduction : Approach
 Discourse Relations
◦ Discourse Relations Critical for Sentiment Analysis
◦ Semantic Operators Influencing Discourse Relations
 Algorithm Proposed
◦ Feature Variables
◦ Steps for different semantics operators and discourse relations
 Feature Vector Classification
 Evaluation
◦ Data Set Statistics
◦ Accuracy Measures
 Conclusion
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 2
What is Sentiment Analysis ?
 Classification on polarity of given text .
 Example : “@abc Good to see your government is working
for common people”
 Approaches : Opinion Mining
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 3
Introduction to Sentiment Analysis:
 Usage of Sentiment Analysis Applications : Brand
Monitoring , Business Planning , Risk Analysis and many
others
 Sentiment analysis is much more than simplistically
subtracting the number of “negative” words from the
number of “positive” in a document or message in order to
produce a score.
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 4
Introduction to Sentiment Analysis(Contd..) :
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 5
Fig : Approaches for designing Sentiment Analysis Application
Introduction to Sentiment Analysis (Contd..):
 Most Common Techniques Used for Sentiment Analysis:
◦ POS : Finding adjectives as they are indicators for opinion
◦ Term Presence and Frequency
◦ Opinion Dictionaries
◦ Negations : ‘not good’ is equivalent to bad
◦ Point-wise Mutual Information
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 6
Introduction to Sentiment Analysis (Contd..):
 Data expansion in recent years is one of the key factor
which increased the role of Sentiment Analysis in business.
 Looking ahead , we can see a true social democracy in
different domains which harness the opinion of crowd
rather than few experts
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 7
Motivation
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 8
▪ Example :
I’m quite excited about Tintin , despite not really liking original
comics.
 This sentence is positive although there is equal number of
positive and negative words.
 “Despite” gives more weight to previous discourse segment.
Motivation
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 9
▪ Example :
Think I’ll stay with whole sci-fi shit but this time …. classic movie

 This sentence is positive although there is equal number of
positive and negative words.
 “But” gives more weight to following discourse segment.
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 10
Motivation
 Discourse elements in text alters the polarity of text
 Sentiment Analysis classification should also take discourse
elements.
 This paper takes into consideration : connectives, modals,
conditionals and negation in a sentence for polarity
detection
 In this work bag of word model incorporates discourse
information which impacts the sentiment of a sentence
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 11
Motivation : Traditional Approach in
Discourse Analysis
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 12
• Traditional approaches for discourse analysis are mostly based on
Rhetorical Structure Theory (RTS) proposed by Mann (1988)
• But , most of these theories are focused on structured text.
• Tweet : “@abc I am gonna party 2day ….. wanna join?”
• Limitations:
• Parsers and taggers do fail frequently in unstructured text like
tweets
• Heavy linguistic processes increases the processing time and
slow down the entire application.
Introduction: Approach
 The approach deals with noisy and unstructured data as in
Tweets
 This works well even with the structured reviews and gives
a better accuracy compared to state of the art system.
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 13
Introduction: Approach
 Approach in this paper , identifies features for Sentiment
Analysis classification which incorporates discourse
information to give better sentiment classification accuracy
 Features here can include bag of words, along with domain
specific features like emoticons, hashtags, etc.
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 14
Discourse Relations
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 15
 Coherently Structured Discourse is collection of sentences related to each other.
Coherence Relation Conjunction
Cause-Effect because ; and so
Violated Expectations although; but; while
Condition If…(then) ; as long as ; while
Similarity and ; (and) similarity
Contrast by contrast ; but
Temporal Sequences (and) then ; first , second , ..before;
after ; while
Attribution according to … ; …said ; claim that …
; maintain that … ; stated
Example for example ; for instance
Elaboration also ; futhermore ; in addition
Generalization In general
Table : Contentful Conjunctions used to illustrate Coherence Relations
Discourse Relations Critical for Sentiment
Analysis
1. Violated Expectation and Contrast :
◦ Violating expectation conjunctions oppose or refute the neighboring discourse segment.
◦ Categorized in two : Conj_Fol and Conj_Prev
◦ Example :
 “He wasn’t prepared for the exam but he did pretty well”
 “I loved rides at Disney World despite having acrophobia”
◦ The discourse segment following “but” should be given more weight. In Example 2, the discourse
segment preceding “despite” should be given more weight.
2. Conclusive or Inferential Conjunctions : Set of conjunctions : Conj_infer. The discourse segment
following them should be given more weight.
Eg : @User I wasn’t much satisfied with ur so-called gud phone and subsequently decided to reject it.
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 16
Discourse Relations Critical for Sentiment
Analysis (Contd..)
3. Conditionals :
◦ The presence of “if” , tones down the final polarity as it introduces a hypothetical
situation in the context.
◦ Example : “If Micromax improved its battery life , it wud hv been gr8 phone”
4. Other Discourse Relations : Sentences having other discourse conjunctions which
has no contrasting or conflicting information can be handled by bag of word approach.
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 17
Semantic Operators Influencing Discourse
Relations
 Example : “You may like the movie despite the bad
reviews”
 Here ‘despite’ gives more weight to ‘like’
 But ‘may’ pulls down the weight as it creates uncertaininty
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 18
Semantic Operators Influencing Discourse
Relations (Contd..)
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 19
Relations Attributes
Conj_Fol but, however, nevertheless , otherwise , yet ,
still , nonetheless
Conj_Prev till , until , despite , in spite , though , although
Conj_Infer therefore , furthermore , consequently , thus ,
as a result , eventually , hence
Conditionals if
Strong_Mod might , could , can , would , may
Weak_Mod should , ought to , need not , shall , will , must
Neg not, neither , never , no , nor
Table : Discourse Relations and Semantic Operators Essential for Sentiment Analysis
Semantic Operators Influencing Discourse
Relations (Contd..)
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 20
1. Modals :
• Strong_Mod is the set of modals that express a higher degree of
uncertainity in any situation.
•Eg: He may be a rising star
• Weak_Mod is the set of modals that express lesser degree of
uncertainty and more emphasis on
certain events or situations.
Eg: Our civil service should work without interference.
Semantic Operators Influencing Discourse
Relations (Contd..)
2. Negation :
 The negation operator (Example: not, neither, nor, nothing
etc.) inverts the sentiment of the word following it.
Eg : I do not like Nokia but I like Samsung.
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 21
Algorithm Proposed :
 Discourse Relations and Semantic Operators are used to
create feature vectors
 Let user post R consist of ‘m’ sentences si(i=1…m) where
si consist of n words
wij (i=1…m , j=1..n)
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 22
Algorithm Proposed (Cond..): Feature
Variables
 Let A be the set of all discourse relations as described
before.
 Feature Variables :
 fij : Weight of the word in sentence , initialized to 1
 flipij : Indicates weather the polarity of word should be flipped or not
 hypij : Indicates the presence of conditional or strong modal in sentence
 Iterate the algorithm for every sentence :
for i = 1 … m
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 23
Algorithm Proposed (Cond..): Step 1
 Deals with conditional and strong modals
 Algo :
 for j =1..n
hypij = 0;
if wij == Conditional OR Strong _Mod :
hypij = 1;
end if
end for
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 24
Algorithm Proposed (Cond..): Step 2
 Deals with Conj_Fol and Conj_Infer
 Algo :
 for j =1..n
fij = 1;
if wij == Conj_Fol OR Conj_Infer :
for k=j+1 .. N and w != A
fik + = 1;
end for
end if
end for
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 25
Algorithm Proposed (Cond..): Step 3
 Deals with Conj_Prev
 Algo :
 for j =1..n
fij = Taken from Step 2;
if wij == Conj_Prev :
for k=1 .. j-1 and w != A
fik + = 1;
end for
end if
end for
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 26
Algorithm Proposed (Cond..): Step 4
 Deals with Negation
 Algo :
 for j =1..n
flipij = 1;
if wij == Neg :
for k=1 .. Neg_Window and w != (Conj_Prev OR Conj_Fol)
flipi,j+k = -1;
end for
end if
end for
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 27
Feature Vector Classification
 We devised a lexicon based
system as well as a supervised
system for feature vector
 Lexicon Based Classification :
◦ Here :
 fij : weight of each word in sentence
 flipij : indicates polarity of word should
be flipped or not
 hypij : indicates the presence of
conditional or strong_mod
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 28
Feature Vector Classification (Contd..)
 Supervised Classification :
◦ SVM’s are used to classify the set of feature vectors {flipij, wij, fij
and hypij}.
◦ Features used :
 N-grams
 Stop Words
 Feature Weight
 Modal and Conditional Indicator
 Stemming
 Negation
 Emoticons
 POS
 Feature Space
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 29
Evaluation : Data Sets Statistics
 Data Set 1 :
◦ 8507 tweets are collected based on a total of around 2000 different entities from
20 different domains.
◦ Four classes - positive, negative, objective-not-spam and objective-spam.
 Data Set 2 :
◦ Hashtags #positive, #joy, #excited, #happy etc. are used to collect tweets
bearing positive sentiment,
◦ Hashtags like #negative, #sad, #depressed, #gloomy, #disappointed etc. are
used to collect negative tweets.
 Data Set 3 :
◦ Travel Review Dataset contains 595 polarity-tagged documents for each of the
positive and negative classes.
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 30
Evaluation : Data Sets Statistics
 Base Line System : C-Feel-It (Joshi et al., 2011) Rule Based System
 The only difference between the two systems is the handling of
connectives , modals, conditionals and negation
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 31
Manually Annotated DataSet 1
#Positive #Negative #ObjectiveNotSpam #Objective #Total
2548 1209 2757 1993 8507
Auto Annotated DataSet 2
#Positive #Negative Total
7348 7866 15214
Table : Twitter Datasets 1 and 2 Statistics
Evaluation : Accuracy Measures
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 32
Table : Accuracy Comparision b/w Base Line System and Discourse System using Lexicon in Data Set 1 and 2
Accuracy Measure for 2 Class Classification : DataSet 1
Base Line System Discourse System
68.58 72.81
Accuracy Measure for 3 Class Classification : DataSet 1
Base Line System Discourse System
57.2 61.31
Accuracy Measure for 2 Class Classification : DataSet 2
Base Line System Discourse System
80.55 84.91
Evaluation : Accuracy Measures
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 33
Fig : Accuracy Comparision b/w Base Line System and Discourse System using Lexicon in Data Set 1 and 2
0
10
20
30
40
50
60
70
80
90
Accuracy Measure for 2 Class Classification
: DataSet 1
Accuracy Measure for 3 Class Classification
: DataSet 1
Accuracy Measure for 2 Class Classification
: DataSet 2
BaseLineSystem
Discourse System
Evaluation : Accuracy Measures
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 34
Table : Accuracy Comparison b/w Base Line SVM System and Discourse SVM System in Data Set 1 and 2
Accuracy Measure for 2 Class Classification : DataSet 1
Base Line System Discourse System
69.49 70.75
Accuracy Measure for 3 Class Classification : DataSet 1
Base Line System Discourse System
63.11 64.23
Accuracy Measure for 2 Class Classification : DataSet 2
Base Line System Discourse System
91.99 93.01
Evaluation : Accuracy Measures
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 35
Fig : Accuracy Comparison b/w Base Line SVM System and Discourse SVM System in Data Set 1 and 2
0
10
20
30
40
50
60
70
80
90
100
Accuracy Measure for 2 Class Classification
: DataSet 1
Accuracy Measure for 3 Class Classification
: DataSet 1
Accuracy Measure for 2 Class Classification
: DataSet 2
BaseLineSystem
Discourse System
Evaluation : Accuracy Measures
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 36
•This evaluation is used to determine whether our discourse based approach
performs well for structured text as well
Sentiment Evaluation Criterion Accuracy
Baseline Bag-of-Words Model 69.62
Bag-of-Words Model + Discourse 71.78
Table : Accuracy Comparision between Bag of Word and Siscourse System using Lexicon in DataSet 3
Evaluation : Accuracy Measures
 Supervised Classification Approach under 4 scenarios :
◦ When only unigrams are used
◦ When only sense of unigrams are used
◦ When unigrams are used along with their senses (synset-id’s)
◦ When unigrams are used with senses and discourse features
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 37Note : IWSD : Iterative Word Sense Disambiguation ((Khapra et al., 2010)
System Accuracy(%)
Only Unigrams 84.9
Only IWSD Sense of Unigrams 85.48
Unigrams + IWSD Sense of Unigrams 86.08
Unigrams + IWSD Sense of Unigrams +
Discourse 88.13
Table : Accuracy Comparison In Dataset 3
Evaluation : Accuracy Measures
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 38Note : IWSD : Iterative Word Sense Disambiguation ((Khapra et al., 2010)
Figure: Accuracy Comparison In Dataset 3
84.9
85.48
86.08
88.13
83
84
85
86
87
88
89
Unigram IWSD Sense of Unigram Unigram + IWSD Sense of Unigram Unigram + IWSD Sense of Unigram +
Discourse
Accuracy(%)
Conclusion
 Most of the works on unstructured data uses bag of words
approach , ignores discourse markers
 Evaluation on different dataset show that incorporating
discourse information improves the performance by 2-4%
 The approach presented can deal with unstructured data as
it doesn’t need to parse the document.
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 39
Questions for You
 What feature variables are used in the algorithm for
incorporating discourse information?
 What is the advantage of the presented approach over
traditional approaches ?
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 40
Sentiment Analysis in Twitter with Lightweight
Discourse Analysis 41

Sentiment Analysis in Twitter with Lightweight Discourse Analysis

  • 1.
    Sentiment Analysis inTwitter with Lightweight Discourse Analysis -Subhabrata Mukherjee & Pushpak Bhattacharyya Presented By: Naveen Kumar
  • 2.
    Outline  Introduction toSentiment Analysis  Motivation  Introduction : Approach  Discourse Relations ◦ Discourse Relations Critical for Sentiment Analysis ◦ Semantic Operators Influencing Discourse Relations  Algorithm Proposed ◦ Feature Variables ◦ Steps for different semantics operators and discourse relations  Feature Vector Classification  Evaluation ◦ Data Set Statistics ◦ Accuracy Measures  Conclusion Sentiment Analysis in Twitter with Lightweight Discourse Analysis 2
  • 3.
    What is SentimentAnalysis ?  Classification on polarity of given text .  Example : “@abc Good to see your government is working for common people”  Approaches : Opinion Mining Sentiment Analysis in Twitter with Lightweight Discourse Analysis 3
  • 4.
    Introduction to SentimentAnalysis:  Usage of Sentiment Analysis Applications : Brand Monitoring , Business Planning , Risk Analysis and many others  Sentiment analysis is much more than simplistically subtracting the number of “negative” words from the number of “positive” in a document or message in order to produce a score. Sentiment Analysis in Twitter with Lightweight Discourse Analysis 4
  • 5.
    Introduction to SentimentAnalysis(Contd..) : Sentiment Analysis in Twitter with Lightweight Discourse Analysis 5 Fig : Approaches for designing Sentiment Analysis Application
  • 6.
    Introduction to SentimentAnalysis (Contd..):  Most Common Techniques Used for Sentiment Analysis: ◦ POS : Finding adjectives as they are indicators for opinion ◦ Term Presence and Frequency ◦ Opinion Dictionaries ◦ Negations : ‘not good’ is equivalent to bad ◦ Point-wise Mutual Information Sentiment Analysis in Twitter with Lightweight Discourse Analysis 6
  • 7.
    Introduction to SentimentAnalysis (Contd..):  Data expansion in recent years is one of the key factor which increased the role of Sentiment Analysis in business.  Looking ahead , we can see a true social democracy in different domains which harness the opinion of crowd rather than few experts Sentiment Analysis in Twitter with Lightweight Discourse Analysis 7
  • 8.
    Motivation Sentiment Analysis inTwitter with Lightweight Discourse Analysis 8 ▪ Example : I’m quite excited about Tintin , despite not really liking original comics.  This sentence is positive although there is equal number of positive and negative words.  “Despite” gives more weight to previous discourse segment.
  • 9.
    Motivation Sentiment Analysis inTwitter with Lightweight Discourse Analysis 9 ▪ Example : Think I’ll stay with whole sci-fi shit but this time …. classic movie   This sentence is positive although there is equal number of positive and negative words.  “But” gives more weight to following discourse segment.
  • 10.
    Sentiment Analysis inTwitter with Lightweight Discourse Analysis 10
  • 11.
    Motivation  Discourse elementsin text alters the polarity of text  Sentiment Analysis classification should also take discourse elements.  This paper takes into consideration : connectives, modals, conditionals and negation in a sentence for polarity detection  In this work bag of word model incorporates discourse information which impacts the sentiment of a sentence Sentiment Analysis in Twitter with Lightweight Discourse Analysis 11
  • 12.
    Motivation : TraditionalApproach in Discourse Analysis Sentiment Analysis in Twitter with Lightweight Discourse Analysis 12 • Traditional approaches for discourse analysis are mostly based on Rhetorical Structure Theory (RTS) proposed by Mann (1988) • But , most of these theories are focused on structured text. • Tweet : “@abc I am gonna party 2day ….. wanna join?” • Limitations: • Parsers and taggers do fail frequently in unstructured text like tweets • Heavy linguistic processes increases the processing time and slow down the entire application.
  • 13.
    Introduction: Approach  Theapproach deals with noisy and unstructured data as in Tweets  This works well even with the structured reviews and gives a better accuracy compared to state of the art system. Sentiment Analysis in Twitter with Lightweight Discourse Analysis 13
  • 14.
    Introduction: Approach  Approachin this paper , identifies features for Sentiment Analysis classification which incorporates discourse information to give better sentiment classification accuracy  Features here can include bag of words, along with domain specific features like emoticons, hashtags, etc. Sentiment Analysis in Twitter with Lightweight Discourse Analysis 14
  • 15.
    Discourse Relations Sentiment Analysisin Twitter with Lightweight Discourse Analysis 15  Coherently Structured Discourse is collection of sentences related to each other. Coherence Relation Conjunction Cause-Effect because ; and so Violated Expectations although; but; while Condition If…(then) ; as long as ; while Similarity and ; (and) similarity Contrast by contrast ; but Temporal Sequences (and) then ; first , second , ..before; after ; while Attribution according to … ; …said ; claim that … ; maintain that … ; stated Example for example ; for instance Elaboration also ; futhermore ; in addition Generalization In general Table : Contentful Conjunctions used to illustrate Coherence Relations
  • 16.
    Discourse Relations Criticalfor Sentiment Analysis 1. Violated Expectation and Contrast : ◦ Violating expectation conjunctions oppose or refute the neighboring discourse segment. ◦ Categorized in two : Conj_Fol and Conj_Prev ◦ Example :  “He wasn’t prepared for the exam but he did pretty well”  “I loved rides at Disney World despite having acrophobia” ◦ The discourse segment following “but” should be given more weight. In Example 2, the discourse segment preceding “despite” should be given more weight. 2. Conclusive or Inferential Conjunctions : Set of conjunctions : Conj_infer. The discourse segment following them should be given more weight. Eg : @User I wasn’t much satisfied with ur so-called gud phone and subsequently decided to reject it. Sentiment Analysis in Twitter with Lightweight Discourse Analysis 16
  • 17.
    Discourse Relations Criticalfor Sentiment Analysis (Contd..) 3. Conditionals : ◦ The presence of “if” , tones down the final polarity as it introduces a hypothetical situation in the context. ◦ Example : “If Micromax improved its battery life , it wud hv been gr8 phone” 4. Other Discourse Relations : Sentences having other discourse conjunctions which has no contrasting or conflicting information can be handled by bag of word approach. Sentiment Analysis in Twitter with Lightweight Discourse Analysis 17
  • 18.
    Semantic Operators InfluencingDiscourse Relations  Example : “You may like the movie despite the bad reviews”  Here ‘despite’ gives more weight to ‘like’  But ‘may’ pulls down the weight as it creates uncertaininty Sentiment Analysis in Twitter with Lightweight Discourse Analysis 18
  • 19.
    Semantic Operators InfluencingDiscourse Relations (Contd..) Sentiment Analysis in Twitter with Lightweight Discourse Analysis 19 Relations Attributes Conj_Fol but, however, nevertheless , otherwise , yet , still , nonetheless Conj_Prev till , until , despite , in spite , though , although Conj_Infer therefore , furthermore , consequently , thus , as a result , eventually , hence Conditionals if Strong_Mod might , could , can , would , may Weak_Mod should , ought to , need not , shall , will , must Neg not, neither , never , no , nor Table : Discourse Relations and Semantic Operators Essential for Sentiment Analysis
  • 20.
    Semantic Operators InfluencingDiscourse Relations (Contd..) Sentiment Analysis in Twitter with Lightweight Discourse Analysis 20 1. Modals : • Strong_Mod is the set of modals that express a higher degree of uncertainity in any situation. •Eg: He may be a rising star • Weak_Mod is the set of modals that express lesser degree of uncertainty and more emphasis on certain events or situations. Eg: Our civil service should work without interference.
  • 21.
    Semantic Operators InfluencingDiscourse Relations (Contd..) 2. Negation :  The negation operator (Example: not, neither, nor, nothing etc.) inverts the sentiment of the word following it. Eg : I do not like Nokia but I like Samsung. Sentiment Analysis in Twitter with Lightweight Discourse Analysis 21
  • 22.
    Algorithm Proposed : Discourse Relations and Semantic Operators are used to create feature vectors  Let user post R consist of ‘m’ sentences si(i=1…m) where si consist of n words wij (i=1…m , j=1..n) Sentiment Analysis in Twitter with Lightweight Discourse Analysis 22
  • 23.
    Algorithm Proposed (Cond..):Feature Variables  Let A be the set of all discourse relations as described before.  Feature Variables :  fij : Weight of the word in sentence , initialized to 1  flipij : Indicates weather the polarity of word should be flipped or not  hypij : Indicates the presence of conditional or strong modal in sentence  Iterate the algorithm for every sentence : for i = 1 … m Sentiment Analysis in Twitter with Lightweight Discourse Analysis 23
  • 24.
    Algorithm Proposed (Cond..):Step 1  Deals with conditional and strong modals  Algo :  for j =1..n hypij = 0; if wij == Conditional OR Strong _Mod : hypij = 1; end if end for Sentiment Analysis in Twitter with Lightweight Discourse Analysis 24
  • 25.
    Algorithm Proposed (Cond..):Step 2  Deals with Conj_Fol and Conj_Infer  Algo :  for j =1..n fij = 1; if wij == Conj_Fol OR Conj_Infer : for k=j+1 .. N and w != A fik + = 1; end for end if end for Sentiment Analysis in Twitter with Lightweight Discourse Analysis 25
  • 26.
    Algorithm Proposed (Cond..):Step 3  Deals with Conj_Prev  Algo :  for j =1..n fij = Taken from Step 2; if wij == Conj_Prev : for k=1 .. j-1 and w != A fik + = 1; end for end if end for Sentiment Analysis in Twitter with Lightweight Discourse Analysis 26
  • 27.
    Algorithm Proposed (Cond..):Step 4  Deals with Negation  Algo :  for j =1..n flipij = 1; if wij == Neg : for k=1 .. Neg_Window and w != (Conj_Prev OR Conj_Fol) flipi,j+k = -1; end for end if end for Sentiment Analysis in Twitter with Lightweight Discourse Analysis 27
  • 28.
    Feature Vector Classification We devised a lexicon based system as well as a supervised system for feature vector  Lexicon Based Classification : ◦ Here :  fij : weight of each word in sentence  flipij : indicates polarity of word should be flipped or not  hypij : indicates the presence of conditional or strong_mod Sentiment Analysis in Twitter with Lightweight Discourse Analysis 28
  • 29.
    Feature Vector Classification(Contd..)  Supervised Classification : ◦ SVM’s are used to classify the set of feature vectors {flipij, wij, fij and hypij}. ◦ Features used :  N-grams  Stop Words  Feature Weight  Modal and Conditional Indicator  Stemming  Negation  Emoticons  POS  Feature Space Sentiment Analysis in Twitter with Lightweight Discourse Analysis 29
  • 30.
    Evaluation : DataSets Statistics  Data Set 1 : ◦ 8507 tweets are collected based on a total of around 2000 different entities from 20 different domains. ◦ Four classes - positive, negative, objective-not-spam and objective-spam.  Data Set 2 : ◦ Hashtags #positive, #joy, #excited, #happy etc. are used to collect tweets bearing positive sentiment, ◦ Hashtags like #negative, #sad, #depressed, #gloomy, #disappointed etc. are used to collect negative tweets.  Data Set 3 : ◦ Travel Review Dataset contains 595 polarity-tagged documents for each of the positive and negative classes. Sentiment Analysis in Twitter with Lightweight Discourse Analysis 30
  • 31.
    Evaluation : DataSets Statistics  Base Line System : C-Feel-It (Joshi et al., 2011) Rule Based System  The only difference between the two systems is the handling of connectives , modals, conditionals and negation Sentiment Analysis in Twitter with Lightweight Discourse Analysis 31 Manually Annotated DataSet 1 #Positive #Negative #ObjectiveNotSpam #Objective #Total 2548 1209 2757 1993 8507 Auto Annotated DataSet 2 #Positive #Negative Total 7348 7866 15214 Table : Twitter Datasets 1 and 2 Statistics
  • 32.
    Evaluation : AccuracyMeasures Sentiment Analysis in Twitter with Lightweight Discourse Analysis 32 Table : Accuracy Comparision b/w Base Line System and Discourse System using Lexicon in Data Set 1 and 2 Accuracy Measure for 2 Class Classification : DataSet 1 Base Line System Discourse System 68.58 72.81 Accuracy Measure for 3 Class Classification : DataSet 1 Base Line System Discourse System 57.2 61.31 Accuracy Measure for 2 Class Classification : DataSet 2 Base Line System Discourse System 80.55 84.91
  • 33.
    Evaluation : AccuracyMeasures Sentiment Analysis in Twitter with Lightweight Discourse Analysis 33 Fig : Accuracy Comparision b/w Base Line System and Discourse System using Lexicon in Data Set 1 and 2 0 10 20 30 40 50 60 70 80 90 Accuracy Measure for 2 Class Classification : DataSet 1 Accuracy Measure for 3 Class Classification : DataSet 1 Accuracy Measure for 2 Class Classification : DataSet 2 BaseLineSystem Discourse System
  • 34.
    Evaluation : AccuracyMeasures Sentiment Analysis in Twitter with Lightweight Discourse Analysis 34 Table : Accuracy Comparison b/w Base Line SVM System and Discourse SVM System in Data Set 1 and 2 Accuracy Measure for 2 Class Classification : DataSet 1 Base Line System Discourse System 69.49 70.75 Accuracy Measure for 3 Class Classification : DataSet 1 Base Line System Discourse System 63.11 64.23 Accuracy Measure for 2 Class Classification : DataSet 2 Base Line System Discourse System 91.99 93.01
  • 35.
    Evaluation : AccuracyMeasures Sentiment Analysis in Twitter with Lightweight Discourse Analysis 35 Fig : Accuracy Comparison b/w Base Line SVM System and Discourse SVM System in Data Set 1 and 2 0 10 20 30 40 50 60 70 80 90 100 Accuracy Measure for 2 Class Classification : DataSet 1 Accuracy Measure for 3 Class Classification : DataSet 1 Accuracy Measure for 2 Class Classification : DataSet 2 BaseLineSystem Discourse System
  • 36.
    Evaluation : AccuracyMeasures Sentiment Analysis in Twitter with Lightweight Discourse Analysis 36 •This evaluation is used to determine whether our discourse based approach performs well for structured text as well Sentiment Evaluation Criterion Accuracy Baseline Bag-of-Words Model 69.62 Bag-of-Words Model + Discourse 71.78 Table : Accuracy Comparision between Bag of Word and Siscourse System using Lexicon in DataSet 3
  • 37.
    Evaluation : AccuracyMeasures  Supervised Classification Approach under 4 scenarios : ◦ When only unigrams are used ◦ When only sense of unigrams are used ◦ When unigrams are used along with their senses (synset-id’s) ◦ When unigrams are used with senses and discourse features Sentiment Analysis in Twitter with Lightweight Discourse Analysis 37Note : IWSD : Iterative Word Sense Disambiguation ((Khapra et al., 2010) System Accuracy(%) Only Unigrams 84.9 Only IWSD Sense of Unigrams 85.48 Unigrams + IWSD Sense of Unigrams 86.08 Unigrams + IWSD Sense of Unigrams + Discourse 88.13 Table : Accuracy Comparison In Dataset 3
  • 38.
    Evaluation : AccuracyMeasures Sentiment Analysis in Twitter with Lightweight Discourse Analysis 38Note : IWSD : Iterative Word Sense Disambiguation ((Khapra et al., 2010) Figure: Accuracy Comparison In Dataset 3 84.9 85.48 86.08 88.13 83 84 85 86 87 88 89 Unigram IWSD Sense of Unigram Unigram + IWSD Sense of Unigram Unigram + IWSD Sense of Unigram + Discourse Accuracy(%)
  • 39.
    Conclusion  Most ofthe works on unstructured data uses bag of words approach , ignores discourse markers  Evaluation on different dataset show that incorporating discourse information improves the performance by 2-4%  The approach presented can deal with unstructured data as it doesn’t need to parse the document. Sentiment Analysis in Twitter with Lightweight Discourse Analysis 39
  • 40.
    Questions for You What feature variables are used in the algorithm for incorporating discourse information?  What is the advantage of the presented approach over traditional approaches ? Sentiment Analysis in Twitter with Lightweight Discourse Analysis 40
  • 41.
    Sentiment Analysis inTwitter with Lightweight Discourse Analysis 41