Sarcasm Detection
David Boyhan
CS-410
December 9, 2016
University of Illinois -
Urbana Champaign
MCS – DSO
Dboyhan2@illinois.edu
ABSTRACT
In this paper, I describe efforts to use natural language processing
techniques to distinguish sarcastic comments from non-sarcastic
comments.
Keywords
Sentiment Analysis, Sarcasm, Natural Language Processing
1. INTRODUCTION
Sentiment analysis is a branch of natural language processing
(“NLP”) intended to identify the “polarity” (positive or negative)
of a comment or communication. This is useful in situations such
as customer support and product review analysis, where the goal is
to determine whether the feedback being submitted is positive or
negative. However, sentiment analysis is frequently stymied by
sarcastic comments. This project is an attempt to use different NLP
techniques to discriminate sarcastic from non-sarcastic comments.
2. DEFINITIONS
Sarcasm is defined by the Oxford English Dictionary as “A sharp,
bitter, or cutting expression or remark; a bitter gibe or taunt.” [1]
However, in more common usage, sarcasm is typically conflated
with an intentionally ironic statement. [2] In other words, a
sarcastic remark is most typically a statement whose intended
meaning is the exact opposite of its literal meaning. For example,
“I love waiting on line” or “I hate receiving thoughtful gifts.” In
each case, the communicator or speaker obviously means the
opposite of the actual statement.
In NLP terms, sarcasm can be defined as false positive or false
negative sentiment classification. That is, using conventional
sentiment analysis would yield the wrong result for a sarcastic
comment.
3. CREATING OR FINDING A CORPUS
3.1 Web-based Tagging
The first part of this project, as proposed, was to develop a web-
based tool for marking sarcastic comments in a browser (such as
tweets or user comments): using a JavaScript bookmarklet or a
Chrome extension, the highlighted text would be tagged and
automatically added to a “sarcasm corpus.” Unfortunately, this
portion of the project was not successful. JavaScript in particular,
and web browsers in general, are specifically designed to isolate
web content from the native OS. As a result, although it is
relatively easy to use JavaScript to manipulate highlighted text
within the page, including copying it to the OS’ clipboard, writing
that text out to a local file is extremely difficult, because allowing
it would be a significant security risk. Work-arounds are possible,
such as OS-specific apps that automatically “sweep” clipboard
contents into a local file, but they would no longer be purely
web-based or OS-agnostic, and they seemed significantly beyond
the scope of CS410. As a result, this portion of the project was
suspended.
3.2 Pre-Tagged Sarcasm Corpora
In the absence of a tool and the resources to develop a large
tagged sarcasm corpus of my own, it was necessary to find
pre-tagged corpora elsewhere. Two resources in particular were
identified and reviewed. The first, developed at Fordham
University, [3] was built by running sentiment analysis on Amazon
product reviews and combining the result with the number of stars
assigned to the product in each review. In simplest terms, the
corpus identifies ironic or sarcastic comments by cross-indexing
positive polarity against low star ratings. [4] My concern with this
corpus was that, in reviewing samples, there did not appear to be
many features beyond the star ratings that would help develop a
sarcasm model. A toy sketch of the cross-indexing idea follows.
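The sketch below is my own illustration, not code from the Fordham project; the tiny word lists merely stand in for a real sentiment analyzer, and the thresholds are arbitrary:

```java
import java.util.Set;

// Toy illustration of the "positive text, low stars" heuristic: estimate a crude
// polarity score from small word lists, then flag reviews whose polarity and star
// rating point in opposite directions as candidate ironic/sarcastic reviews.
public class StarMismatch {
    private static final Set<String> POSITIVE = Set.of("love", "great", "excellent", "amazing");
    private static final Set<String> NEGATIVE = Set.of("hate", "terrible", "awful", "useless");

    // Crude lexicon-based polarity in [-1, 1]; a real system would use a trained model.
    static double polarity(String text) {
        int pos = 0, neg = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (POSITIVE.contains(token)) pos++;
            if (NEGATIVE.contains(token)) neg++;
        }
        return (pos + neg == 0) ? 0.0 : (pos - neg) / (double) (pos + neg);
    }

    // Positive-sounding text with a low star rating (or the reverse) is a candidate.
    static boolean candidateSarcasm(String reviewText, int stars) {
        double p = polarity(reviewText);
        return (p > 0 && stars <= 2) || (p < 0 && stars >= 4);
    }

    public static void main(String[] args) {
        // Prints "true": the text reads positive but the rating is one star.
        System.out.println(candidateSarcasm("Great, yet another excellent paperweight.", 1));
    }
}
```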
Accordingly, I identified a second tagged sarcasm corpus, which I
subsequently selected for analysis. This second corpus was
developed at the University of California, Santa Cruz [5] and
consists of quotation/response pairs mined from political debates
on multiple internet forums and tagged, through Amazon’s
Mechanical Turk, into “sarcastic” and “not-sarcastic” categories.
There were
approximately 5,000 tagged comment/response pairs, split
roughly 50/50 between sarcastic and non-sarcastic.
4. PROCESSING & ANALYSIS
In order to build an NLP model using the tagged corpus, I used
the WEKA 3 Data Mining toolset [6]. WEKA has a number of
NLP tools built in and has a package management tool that allows
the addition of other algorithms, such as SVM.
In order to use WEKA, the corpus needed to be converted from its
tagged CSV format into WEKA’s custom ARFF format (a
conversion sketch follows the list below). As part of that
conversion, I prepared several sub-corpora to work with:
Quote and Response – Stop Words Removed
Quote and Response – Stop Words Retained
Quote Only – Stop Words Removed
Response Only – Stop Words Retained
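The following is a minimal sketch of the conversion step using WEKA’s standard Java converter classes; the file names are placeholders, and the per-corpus variations (dropping the quote or response column, removing stop words) would be layered on top of this:

```java
import java.io.File;

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

// Minimal CSV-to-ARFF conversion using WEKA's converter classes.
public class CsvToArff {
    public static void main(String[] args) throws Exception {
        // Load the tagged quote/response pairs from CSV (placeholder file name).
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("sarcasm_v2.csv"));
        Instances data = loader.getDataSet();

        // Write the data back out in WEKA's ARFF format as one of the sub-corpora.
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("quote_response.arff"));
        saver.writeBatch();
    }
}
```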
I then applied multiple NLP categorization algorithms to the
different corpora, including naïve Bayes, k-nearest neighbor
(“KNN”), and support vector machines (“SVM”). Unfortunately,
as shown below, the results were significantly poorer than I
anticipated:
Naive Bayes
Correctly classified:   2424 (51.66%)
Incorrectly classified: 2268 (48.34%)

                TP Rate   FP Rate   Precision   Recall   F-Measure
Sarcastic       0.613     0.580     0.514       0.613    0.559
Not Sarcastic   0.420     0.387     0.521       0.420    0.465
Weighted Avg.   0.517     0.483     0.517       0.517    0.512

KNN
Correctly classified:   2431 (51.81%)
Incorrectly classified: 2261 (48.19%)

                TP Rate   FP Rate   Precision   Recall   F-Measure
Sarcastic       0.422     0.385     0.522       0.422    0.467
Not Sarcastic   0.615     0.578     0.515       0.615    0.561
Weighted Avg.   0.518     0.482     0.519       0.518    0.514

SVM
Correctly classified:   2356 (50.21%)
Incorrectly classified: 2336 (49.79%)

                TP Rate   FP Rate   Precision   Recall   F-Measure
Sarcastic       0.405     0.401     0.503       0.405    0.449
Not Sarcastic   0.599     0.595     0.502       0.599    0.546
Weighted Avg.   0.502     0.498     0.502       0.502    0.497
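For reference, the figures above are the per-class true-positive rate, false-positive rate, precision, recall, and F-measure that WEKA reports. The sketch below shows how such an evaluation can be run against one of the sub-corpora with WEKA’s Java API; the file name, the use of 10-fold cross-validation, k = 5 for KNN, and SMO as the SVM implementation are illustrative assumptions rather than the exact configuration used:

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.SMO;
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

// Bag-of-words classification of one sub-corpus with three WEKA classifiers.
public class SarcasmEval {
    public static void main(String[] args) throws Exception {
        // Load a sub-corpus (placeholder file name) and mark the last attribute
        // (sarcastic / not-sarcastic) as the class label.
        Instances raw = DataSource.read("quote_response.arff");
        raw.setClassIndex(raw.numAttributes() - 1);

        // Convert the free-text attributes into word-count features; the class
        // attribute is carried through by the filter.
        StringToWordVector bow = new StringToWordVector();
        bow.setInputFormat(raw);
        Instances data = Filter.useFilter(raw, bow);

        // Naive Bayes, KNN (IBk with k = 5), and SMO standing in for the SVM.
        Classifier[] models = { new NaiveBayes(), new IBk(5), new SMO() };
        for (Classifier model : models) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(model, data, 10, new Random(1)); // 10-fold CV
            System.out.printf("%s: %.2f%% correct, weighted F-measure %.3f%n",
                    model.getClass().getSimpleName(),
                    eval.pctCorrect(), eval.weightedFMeasure());
        }
    }
}
```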
Clearly, based on both Precision and F-Measure, the results were
extremely mediocre, bordering on little better than coin flipping.
What was particularly surprising was that presumably more
“reliable” algorithms, such as KNN and SVM, actually yielded
poorer results than algorithms considered less robust, such as
naïve Bayes.
These results were consistent across all four sub-corpora. Further
attempts to improve performance, such as retaining or eliminating
punctuation, layering models, and similar efforts, were equally
disappointing.
5. BARRIERS TO RECOGNIZING
SARCASM
Given the poor performance of all of these algorithms, I began
researching in more detail the issues associated with sarcasm
recognition.
5.1 Humans Are Very Bad at Recognizing
Sarcasm
A significant body of research supports the assertion that human
beings are almost as bad at identifying written sarcasm as the
models discussed above. Anecdotally, there are two separate
wikiHow entries on identifying sarcasm [7] and, in 2014, the
U.S. Secret Service put in a request for software that would
identify sarcasm in communications. [8] In a 2005 study, only
56% of participants correctly identified sarcastic vs. non-sarcastic
comments when the comments were sent as e-mail messages. [9]
In a 2006 survey, 55% of respondents incorrectly believed they
were providing an example of a sarcastic comment. [10]
The 2005 study found that when the same messages were
transmitted through a voice recording, the recipient interpreted the
emotion correctly 73% of the time, consistent with senders’
expectations. [11]
Two features in particular are critical in correctly identifying
sarcasm in written communications: context and emphasis. [12]
5.2 Context
The identity, beliefs and demographics of the communicator are
essential to correctly identifying sarcasm.
For example, a 10-year-old boy’s views on the latest Transformers
movie are likely to be very different from those of a 45-year-old
woman. That difference would be essential in determining whether
the phrase “I can’t wait to spend my Saturday at the new
Transformers movie” is sarcastic or not.
Similarly, a positive comment about an entertainer posted on a
fan-site is far more likely to be genuine (non-sarcastic) than one
posted on a site devoted to critics of that entertainer. Capturing
this context may be simple for human beings. In the NLP arena,
however, with the exception of the posting location, most, if not
all, of the context associated with the communicator is hidden.
5.3 Emphasis
Emphasis (or, in the case of vocal communication, facial
expression and speech inflection) is similarly critical in
identifying sarcasm, especially in the absence of other context.
In electronic communications, the author will often attempt to
provide emphasis through the use of punctuation, spacing, font
formatting (capitalization, italics, bold, underlining) and non-
word emoticons (e.g., the ubiquitous colon-parenthesis side-ways
smiley-face). [13]
For example, compare the sentences “Oh, yes, he did an excellent
job.” and “Oh yes … he did an EXCELLENT job …” Although
the only difference is formatting and punctuation, the difference
in intent is obvious. In typical NLP pre-processing, most if not all
of this information is discarded.
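None of this emphasis information was used in the models above, but as an illustration of the kind of surface features that could be retained rather than discarded, the sketch below counts a few of them; the particular features and regular expressions are my own assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Counts simple "emphasis" markers that standard pre-processing usually discards:
// all-caps tokens, ellipses, repeated terminal punctuation, and ASCII emoticons.
public class EmphasisFeatures {
    private static final Pattern ALL_CAPS       = Pattern.compile("\\b[A-Z]{2,}\\b");
    private static final Pattern ELLIPSIS       = Pattern.compile("\\.{3}|…");
    private static final Pattern REPEATED_PUNCT = Pattern.compile("[!?]{2,}");
    private static final Pattern EMOTICON       = Pattern.compile("[:;]-?[)(DPp]");

    static Map<String, Integer> extract(String text) {
        Map<String, Integer> features = new LinkedHashMap<>();
        features.put("allCapsTokens", count(ALL_CAPS, text));
        features.put("ellipses",      count(ELLIPSIS, text));
        features.put("repeatedPunct", count(REPEATED_PUNCT, text));
        features.put("emoticons",     count(EMOTICON, text));
        return features;
    }

    private static int count(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int n = 0;
        while (matcher.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        // Prints {allCapsTokens=1, ellipses=2, repeatedPunct=0, emoticons=1}
        System.out.println(extract("Oh yes ... he did an EXCELLENT job ... :)"));
    }
}
```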
6. RE-EXAMINING THE CORPUS
Unfortunately, as discussed above, various attempts to improve
model performance, including retaining stop-words and limiting
the analysis to either the initial comment or the response, were
unsuccessful. There were also no apparent methods for adding
additional context or for recovering emphasis that had been lost.
Consequently, I felt it was appropriate to re-examine the tagged
corpus. Although I had performed an initial review and validation
of the corpus, I did not review a significant number of the tags.
The tags themselves were provided by Amazon Mechanical Turk
reviewers. I therefore reviewed approximately 10% of the overall
tags applied (500 tags). The results were surprising, but perhaps
should not have been, given the research discussed above on how
poorly humans assess written communication for sarcasm.
Of the 500 tags reviewed, I felt that 239 were incorrectly tagged.
For example, the following quote and response pair was tagged as
sarcastic:
Quote: “Still dumb to compare the two. Both are no
good. Theft is better then murder...however a theif
bragging because he didn't murder is a fool..Chris Rock
actually did a funny skit on this.”
Response: “So why did you say pot was worse than
alcohol? Just because both are apparently bad, doesn't
make one 'worse'. I'll point out that you compared the
two first, so saying it's dumb to compare the two's
shooting yourself in the foot.”
There does not appear to be any irony intended in either the initial
quote or the response. Nor, in fact, does the exchange appear to
be particularly cutting or bitter.
Given that, in a 10% sample, I found that approximately half of
the tags were subject to challenge, it is not surprising that none of
the NLP models performed much better than chance: if nearly half
of the labels are themselves questionable, even a classifier that
perfectly recognized “true” sarcasm would agree with the supplied
tags only slightly more than half the time, which is roughly what
the models achieved. Further, the discrepancies were completely
consistent with the 2005 and 2006 studies discussed above.
7. NEXT STEPS & CONCLUSION
In order to perform a more accurate analysis, there is a need for
either a more consistently tagged corpus or access to additional
context and emphasis information. Clearly, sarcasm identification
is highly subjective and is unlikely to be improved unless the
human classifiers have a more formal background in rhetoric.
Accordingly, it may be worthwhile to explore the use of selected
“expert groups” to tag documents, rather than a broader pool such
as Mechanical Turk workers. A more difficult issue is one of
identifying communicator context and retaining written emphasis.
Unformatted, “text only” services such as Twitter limit font
formatting but at least permit the expanded emoji set and
punctuation-based formatting. Further, there may be significant
information about the individual communicator available through
their social media profile. Accordingly, a more realistic “next
step” would be to build a corpus that first creates a demographic
profile for a Twitter user (or similar social media commentator)
and then associates tweets and comments with that person for
categorization. With both pieces of data, identification of sarcasm
may be significantly simpler. However, as noted above, using
comments in a vacuum without any context appears to be
comparable to or worse than human assessment and will likely
remain so.
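As a rough sketch of the kind of record such a context-aware corpus might store for each comment, the following pairs the communicator context and emphasis-preserving text discussed above with a label; the field names are illustrative assumptions, not a proposed schema:

```java
// One corpus entry pairing a comment with communicator context and the raw,
// emphasis-preserving text. Field names are illustrative only.
public record TaggedComment(
        String authorId,        // social media handle or an anonymized identifier
        Integer authorAge,      // demographic context, if the profile exposes it
        String authorInterests, // e.g. self-described interests from the profile bio
        String postedOn,        // where the comment appeared (fan site, critic forum, ...)
        String text,            // the comment itself, punctuation and formatting retained
        boolean sarcastic       // the human-assigned label
) {}
```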
8. REFERENCES
[1] Oxford English Dictionary (entry for “Sarcasm”)
http://www.oed.com/viewdictionaryentry/Entry/170938
[2] “Irony” - “The expression of one's meaning by using
language that normally signifies the opposite, typically for
humorous or emphatic effect; esp. (in earlier use) the use of
approbatory language to imply condemnation or contempt
(cf. sarcasm n.).” Oxford English Dictionary, (Online)
http://www.oed.com/view/Entry/99565#eid64994
[3] Sarcasm Corpus – Review of Amazon Products,
http://storm.cis.fordham.edu/~filatova/SarcasmCorpus.html
[4] Filatova, Elena, Irony and Sarcasm: Corpus Generation and
Analysis Using Crowdsourcing,
https://pdfs.semanticscholar.org/830f/d0969fbc1e5f2e112aa9a948b50216150c0a.pdf
[5] Sarcasm Corpus – V2, https://nlds.soe.ucsc.edu/sarcasm2
[6] Weka 3: Data Mining in Java,
http://www.cs.waikato.ac.nz/ml/weka/
[7] http://www.wikihow.com/Detect-Sarcasm-in-Writing and
http://www.wikihow.com/Tell-if-Someone-Is-Being-Sarcastic
[8] Zezima, Katie, “The Fix: The Secret Service wants software
that detects social media sarcasm. Yeah, sure it will work.”,
The Washington Post, June 3, 2014
[9] Kruger, Justin; Epley, Nicholas; Parker, Jason; Ng, Zhi-Wen,
Egocentrism over e-mail: Can we communicate as well as
we think?, Journal of Personality and Social Psychology, Vol
89(6), Dec 2005, 925-936. http://dx.doi.org/10.1037/0022-3514.89.6.925
[10] Galinsky, A. D., Magee, J. C., Inesi, M. E., & Gruenfeld, D.
H. (2006). Power and perspectives not taken. Psychological
Science, 17(12), 1068-1074. doi: 10.1111/j.1467-
9280.2006.01824.x
(http://pss.sagepub.com/content/17/12/1068.full)
[11] Kruger, et al., see [9].
[12] Riordan, M. A., & Trichtinger, L. A. (2016). Overconfidence
at the Keyboard: Confidence and accuracy in interpreting
affect in e‐mail exchanges. Human Communication
Research. doi: 10.1111/hcre.12093
[13] Thompson, D., & Filik, R. (2016). Sarcasm in written
communication: Emoticons are efficient markers of
intention. Journal of Computer‐Mediated Communication,
21(2), 105-120. doi: 10.1111/jcc4.12156