1. Sarcasm detection poses a challenge for sentiment analysis systems as sarcasm involves stating the opposite sentiment from what is meant. This "Achilles heel" is important to address from both business and research perspectives.
2. The document describes a solution for sarcasm detection that uses features extracted from pretrained convolutional neural networks for sentiment analysis and emotion detection, combined with features from a baseline model.
3. Evaluation on a test set showed improved performance over the baseline models, with future work including collecting more data and exploring attention mechanisms and recurrent neural networks. Addressing sarcasm detection was presented as an important problem at the intersection of natural language processing and domain knowledge.
4. Sarcasm
● Greek: sarkázein (speak bitterly, use of irony to mock)
● French: sarcasme
● Nuanced form of language where the speaker often explicitly states the opposite of
what she implies.
● Deliberately means the opposite of what is said on the surface.
“This talk looks like great fun ;)”
5. Importance of sarcasm detection
Business Perspective:
● Organizations tap into social media for public opinion on their products &
services and real time customer assistance.
● To assist this, sentiment analysis is a key offering in any and every CRM tool.
● Customers often use sarcasm to express their frustration with
products/services.
6. ● Most sentiment analysis systems (SAS) fail to detect sarcasm and wrongly
infer the sentiment
● Both systems were fooled by the word “love”.
● Most SAS lack the sophistication needed to detect sarcasm.
[Screenshots: Stanford’s sentiment analysis demo and Aylien’s sentiment analysis demo]
7. ● This places extra burden on customer care teams.
● Owing to the volume and velocity of traffic, the subtlety of the language, and
background & cultural differences, agents can miss sarcasm completely.
● Missing/Misinterpreting = PR disasters for brands.
8. Research Perspective:
● Much like QnA, text summarization, machine translation, sarcasm detection
involves complexity of language and is believed to be a much harder task.
● Any progress in sarcasm detection is a positive step towards pushing the
boundaries of NLP.
● Only recently have researchers started to look into it.
● With improvement in our understanding and approaches to sentiment
analysis, researchers started focusing on more difficult cases
○ Aspect based sentiment analysis
○ Sarcasm detection
So, be it from a business or a research perspective, it is worth
investing time and energy in sarcasm detection.
9. ● Sarcasm: “a sharp, bitter, or cutting expression or remark; a bitter gibe or taunt”.
● Sarcasm is negative sentiment.
○ You are never sarcastically positive
What makes sarcasm detection difficult?
● It is deliberate - people employ play of language.
● It is subtle: often just a single word, phrase, or punctuation mark here and there.
● Even humans can find it hard to understand.
● Sarcasm is often used on social media platforms like Twitter.
● Sarcasm on Twitter comes with additional challenges: fewer word cues (280
characters), spelling mistakes, acronyms, slang words, an ever-evolving vocabulary.
10. Problem Statement
Business problem: Build a sentiment analysis system capable of handling
sarcasm.
Abstract problem: Given an unlabeled tweet T from user U, a solution should
automatically detect if T is sarcastic or not.
[Flowchart: text → Sarcasm? → Yes: negative sentiment; No: regular sentiment analysis]
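The flow above can be sketched as a simple two-stage pipeline. The `is_sarcastic` and `sentiment_of` predictors below are hypothetical stand-ins for the trained models:

```python
def analyze(text, is_sarcastic, sentiment_of):
    """Two-stage pipeline: sarcastic tweets are forced to negative sentiment,
    everything else falls through to the regular sentiment model."""
    if is_sarcastic(text):
        return "negative"
    return sentiment_of(text)

# Toy stand-ins for the two trained models, for illustration only.
demo = analyze("I love waiting on hold for hours",
               is_sarcastic=lambda t: "love" in t and "waiting" in t,
               sentiment_of=lambda t: "positive" if "love" in t else "neutral")
print(demo)  # → negative
```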
11. Scope and Assumptions
● Consider the following sarcasm: “If Hillary wins, she will surely be pleased to
recall Monica each time she enters the Oval Office”.
Detecting this requires:
● Anaphora resolution
● Fact extraction
● Logical reasoning
● Such complex cases are beyond the scope of this work.
● Further, we assume all information necessary to detect sarcasm is contained
in the same sentence (Twitter data).
[Detecting sarcasm in paragraphs and articles is a much harder problem]
12. Dataset
Manually identified sources for sarcasm:
● Hashtags : #sarcasm, #not, #irony
● Handles : @sarcastic_us, @heissarcastic, @SarcasmMsg ….
What is not sarcasm ? Everything else.
● For this we also used Twitter datasets for sentiment analysis.
● Being short (280 characters), tweets usually contain all information necessary to
detect sarcasm in the same sentence.
● After cleaning, we were left with ~100K data points, ~50K per class.
● Built test data of 20K data points in similar fashion, but from a different timeline.
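The labeling scheme above (marker hashtags/handles = sarcasm, everything else = not sarcasm) can be sketched as follows. The hashtag and handle lists come from the slide; the cleanup step strips the label markers so a classifier cannot simply memorize them:

```python
SARCASM_TAGS = {"#sarcasm", "#not", "#irony"}                      # from the slide
SARCASM_HANDLES = {"@sarcastic_us", "@heissarcastic", "@SarcasmMsg"}

def label_and_clean(tweet):
    """Label a tweet by its source markers, then strip those markers so the
    classifier cannot simply learn the hashtag itself."""
    tags_lower = {h.lower() for h in SARCASM_TAGS}
    tokens = tweet.split()
    sarcastic = any(t.lower() in tags_lower or t in SARCASM_HANDLES
                    for t in tokens)
    cleaned = " ".join(t for t in tokens if t.lower() not in tags_lower)
    return cleaned, int(sarcastic)

text, y = label_and_clean("Great, another Monday #not")
print(text, y)  # → Great, another Monday 1
```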
13. Literature Survey
● Until very recently, hand coded features were used extensively.
○ Unigrams, bigrams, trigrams, n-grams, dictionary-based lexical features.
○ Pragmatic features such as emoticons, capitalization, punctuation.
○ Presence of a positive sentiment in close proximity of a negative situation phrase as a
feature for sarcasm detection.
○ Features based on frequency (gap between rare and common words)
○ Incongruity: number of times a word is followed by a word of opposite polarity.
○ # positive words, # negative words, length of longest sequence without polarity flip.
[Tsur et al., 2010; González-Ibáñez et al., 2011; Riloff et al., 2013; Buschmeier et al.,
2014; Joshi et al., 2015]
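Some of the pragmatic features listed above are easy to sketch in plain Python. The emoticon set here is illustrative, not the one used in the cited papers:

```python
EMOTICONS = {":)", ":(", ";)", ":D"}   # illustrative emoticon set

def pragmatic_features(tweet):
    """Pragmatic feature vector: emoticon count, all-caps token count,
    and exclamation/question punctuation counts."""
    tokens = tweet.split()
    return {
        "emoticons": sum(t in EMOTICONS for t in tokens),
        "all_caps": sum(t.isupper() and len(t) > 1 for t in tokens),
        "exclaims": tweet.count("!"),
        "questions": tweet.count("?"),
    }

print(pragmatic_features("GREAT service, waited 2 hours !! ;)"))
```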
14. ● These features are based on certain observations in the dataset. Thus, they
are mostly dataset specific.
● While this can give great performance on the dataset in hand, from a product
point of view one would like to have more robust features.
● Features that are not brittle and generalize to other datasets.
● Hence, people started to apply deep learning (DL) models.
15. Baseline
● Treated this as a binary classification problem.
● Single layer RNNs (LSTM, GRU).
● Failed to generalize (F1 score of ~68%).
● Owing to not having enough data, they overfit very quickly.
● Simple CNN did far better (F1 score of ~76%)
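For reference, the F1 scores quoted above are the harmonic mean of precision and recall on the sarcastic (positive) class; a minimal computation:

```python
def f1_score(y_true, y_pred):
    """Binary F1 on the positive (sarcastic = 1) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 2 true positives, 1 false positive, 1 false negative → F1 = 2/3
print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```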
16. Need for stronger signals
Literature on sarcasm detection has typically used 3 clues :
1. Sentiment
2. Emotion
3. Personality
Let us understand each one of them in detail.
17. Sentiment
● Most sarcastic sentences show a shift in sentiment
“I love the pain present in the breakups”
(shift in sentiment)
● There is a contradiction between sentiment of “love” and “pain of breakups”.
This is a hallmark of sarcasm.
● Thus, including sentiment clues should help in sarcasm detection.
Traditionally this was done via sentiment lexicons.
○ # negative words, # positive words, # sentiment shifts across adjacent words
● Instead, we use features extracted from a neural network trained for sentiment.
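The traditional lexicon-based counts mentioned above can be sketched as follows. The tiny word lists are illustrative stand-ins for a real sentiment lexicon:

```python
POSITIVE = {"love", "great", "awesome"}   # toy lexicon, illustrative only
NEGATIVE = {"pain", "hate", "awful"}

def lexicon_features(tweet):
    """# positive words, # negative words, and # sentiment shifts between
    consecutive polar words, as in the lexicon-based baselines."""
    polarity = [1 if w in POSITIVE else -1 if w in NEGATIVE else 0
                for w in tweet.lower().split()]
    polar = [p for p in polarity if p != 0]          # keep only polar words
    shifts = sum(a != b for a, b in zip(polar, polar[1:]))
    return {"pos": polarity.count(1), "neg": polarity.count(-1), "shifts": shifts}

# The slide's example shows exactly one shift: "love" (+) next to "pain" (-).
print(lexicon_features("I love the pain present in the breakups"))
```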
18. Emotion
● Emotion: feelings such as happiness, anger, jealousy, grief, etc. One can
have many emotions simultaneously. Subjective in nature.
● Sentiment: opinion or mental attitude produced by emotions about something.
This is much more objective.
● Sarcastic sentences are rich in emotions.
● “My stellar programming career: job offer; Ctrl-C, Ctrl-V; resignation. Repeat.”
● Pain, sadness, anger, disgust, etc.
● Thus, including emotion clues should help.
● We use features extracted from a neural network trained for emotion.
19. Personality
● There is a body of work that argues that sarcasm is not just a linguistic
phenomenon but also a behavioral one, i.e., it is not just about what is
being said but also about who is saying it.
● That is, sarcasm is user specific: some users have a stronger tendency to be
sarcastic than others*.
● This body of work factors in the history of the user in question to derive
features to model behavioral aspect. For this, they use past tweets of the
user.
● Researchers have shown substantial gains using personality based signals.
* There are systematic studies that establish a positive correlation between the ability to create & understand sarcasm and higher cognitive ability.
20. Personality
● However, from the viewpoint of a production system this is super challenging.
● One has to either store features from every user’s past timeline,
● or retrieve a user’s history at run time and compute features on the fly.
● Given typical volumes, both these choices have severe implications for
throughput and resources.
● Hence, we did not use personality based indicators.
21. Our solution in a nutshell
[Architecture diagram: text is fed to three models — a sentiment model, an emotion model, and the baseline model (CNNs / linear models); their sentiment features, emotion features, and baseline features are combined and fed to a final classifier]
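The fusion step in the diagram amounts to concatenating the penultimate-layer activations of the three models and training one final classifier on top. A numpy sketch, with random vectors standing in for the real extracted features and illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for penultimate-layer features from the three frozen models.
sentiment_feats = rng.normal(size=(4, 64))   # batch of 4 tweets, 64-d each
emotion_feats   = rng.normal(size=(4, 64))
baseline_feats  = rng.normal(size=(4, 128))

# Fusion: simple concatenation, then a linear final classifier on top.
fused = np.concatenate([sentiment_feats, emotion_feats, baseline_feats], axis=1)
W = rng.normal(size=(fused.shape[1],))       # weights of a linear classifier
logits = fused @ W
probs = 1.0 / (1.0 + np.exp(-logits))        # sigmoid → P(sarcastic)
print(fused.shape, probs.shape)  # → (4, 256) (4,)
```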
23. Sentiment Model
● The objective is to build a sentiment model whose second-to-last layer will
be used to extract features.
● We built a (standard) CNN for this.
○ Alternating layers of convolution and max pooling
○ Followed by fully connected layers
○ Softmax output
● Text is tokenized (we used the tweet tokenizer by Alan Ritter)
● The embedding layer is initialized using pretrained GloVe word embeddings
for Twitter.
24. Sentiment Model
● To train this network we used a dataset for sentiment analysis.
○ 3 classes - negative, positive and neutral
○ Public dataset + custom data
● All convolutions are 1D convolutions.
○ The height of the filter varies.
○ The width of the filter is the same as the embedding dimension.
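Concretely, a filter of height h slides over h-token windows of the embedding matrix, and max-over-time pooling keeps one value per filter. A numpy sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_tokens, embed_dim = 10, 50
X = rng.normal(size=(n_tokens, embed_dim))    # embedded tweet: tokens × dims

def conv1d_maxpool(X, filt):
    """One 1D convolution over token windows followed by max-over-time pooling.
    The filter width equals the embedding dimension; only the height varies."""
    h = filt.shape[0]
    feats = [np.sum(X[i:i + h] * filt) for i in range(X.shape[0] - h + 1)]
    return max(feats)

# Filters of varying heights act as 2-, 3-, and 4-gram detectors, one value each.
features = [conv1d_maxpool(X, rng.normal(size=(h, embed_dim))) for h in (2, 3, 4)]
print(len(features))  # → 3
```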
26. Emotion Model
● Build an emotion model whose second-to-last layer will be used to extract
features.
● We built a (standard) CNN for this.
○ Alternating layers of convolution and max pooling
○ Followed by fully connected layers
○ Softmax output
● Text is tokenized
● The embedding layer is initialized using pretrained GloVe word embeddings
for Twitter.
27. Emotion Model
● To train this network we used a dataset for emotion analysis.
○ 6 classes - anger, disgust, surprise, sadness, joy and fear
○ Public dataset + custom data
● All convolutions are 1D convolutions.
28. Model Architecture
● Owing to the scarcity of data, we kept the networks simple.
● The embedding layer was frozen, not fine-tuned.
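Initializing a frozen embedding layer from pretrained vectors amounts to a fixed lookup table that is never updated during training. A numpy sketch, with a toy "GloVe" dictionary standing in for the real vector file:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 5

# Toy stand-in for pretrained GloVe-for-Twitter vectors.
glove = {"love": rng.normal(size=dim), "pain": rng.normal(size=dim)}
vocab = ["<unk>", "love", "pain", "breakups"]

# Build the embedding matrix: pretrained row where available, small random otherwise.
E = np.stack([glove.get(w, rng.normal(size=dim) * 0.1) for w in vocab])
E.flags.writeable = False            # "frozen": the matrix is never fine-tuned

idx = {w: i for i, w in enumerate(vocab)}
tweet = [idx.get(w, 0) for w in "i love pain".split()]   # OOV → <unk>
embedded = E[tweet]                  # lookup: tokens × dim
print(E.shape, embedded.shape)  # → (4, 5) (3, 5)
```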
29. Our solution in a nutshell
[Architecture diagram, revisited: text is fed to the pretrained sentiment model, the pretrained emotion model, and the pretrained baseline model (CNNs / linear models); their sentiment features, emotion features, and baseline features are combined and fed to a final classifier]
30. Results
● Test Data came from a different timeline.
● ~20K balanced test set.
31. Future work
● Train our own word embeddings
● Character n-gram embeddings
● Retry RNNs
● Attention networks
● Collect more data!
○ Collecting right data for negative class (not sarcasm) is very important.
○ Adding public sentiment datasets to the negative class helped us a lot.
● It will be interesting to see the impact of factoring in user behaviour.
32. Learnings
● Sarcasm detection is an important problem.
● It is difficult:
○ long term dependencies
○ subtle changes of word or punctuation can flip the polarity
○ Needs facts and external knowledge
● Present sentiment analysis systems are bad at detecting sarcasm.
● Pretrained (sub-task) specific CNNs can work in text as well.
● This is an example of domain knowledge + Deep Learning.
● Data collection strategy is important
○ Comprehensively collecting what is not sarcasm.
○ Adding public sentiment datasets to the negative class helped us a lot.