This document discusses the current limitations and challenges of sentiment analysis. It summarizes that while sentiment analysis systems can detect the polarity of sentiment at a basic level, they struggle with more complex linguistic expressions that change sentiment. Specifically:
- Systems are largely binary (positive vs negative) and cannot capture varying degrees of sentiment.
- Common linguistic constructions like negation, conjunct reversal, and context dependence still break many systems.
- Averaging sentiment scores obscures meaningful differences in sentiment within text.
- Data used to train systems contains inherent cultural biases that are learned and reproduced.
20. Far from being solved
Minimally changing sentences breaks current sentiment analysis systems
Systems can only handle the ends of the sentiment spectrum (+/-)
Inherent bias in the data
22. Systems fail to capture the difference
If not for its show-offy back-and-forthing of time, the movie would be a banal, pointlessly depressing exercise. (+)
Despite its show-offy back-and-forthing of time, the movie is a banal, pointlessly depressing exercise. (-)
23. Systems fail to capture the difference
Using mostly unknown and first-time actors, Loach spins a passable coming-of-age tale, which should please his fans and provides a diversion for the rest of us. (+)
Using mostly unknown and first-time actors, Loach spins a passable coming-of-age tale, which ought to have pleased his fans and provided a diversion for the rest of us. (-)
24. We are far from robust sentiment analysis
[Mahler et al. 2017]
26. “Standard” negation
This is one of the worst movies of the year. (-)
This is not one of the worst movies of the year. (-)
0% break
Director Jordan fails to deliver a film worthy of such mythical figure. (-)
Director Jordan did not fail to deliver a film worthy of such mythical figure. (+)
83% break
58% break, n=6
27. Post-fixed negation
Exciting heist film for teens and their families. (+)
Exciting heist film for teens and their families – not! (-)
A fun bit of action comedy with a bang up cast. (+)
A fun bit of action comedy with a bang up cast – not! (-)
100% break, n=2
29. Conjunct reversal
Munich ricochets all over the place, but it hits its target dead-on. (+)
Munich hits its target dead-on, but ricochets all over the place. (-)
100% break
The sentiments are right on the money, but the execution never quite filled me with holiday cheer. (-)
The execution never quite filled me with holiday cheer, but the sentiments are right on the money. (+)
67% break
71% break, n=7
30. Do systems understand the linguistic structure?
Systems seem to understand the individual words to detect sentiment
But they cannot yet handle more complex linguistic expressions
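
One way to see why word-level understanding is not enough: a scorer that only sums per-word sentiment cannot tell the two halves of a conjunct-reversal pair apart, because it ignores word order. The sketch below assumes a tiny hypothetical lexicon (`LEXICON` and its scores are invented for illustration and are not taken from any system discussed in the talk).

```python
import re

# Hypothetical toy lexicon: the words and scores are illustrative assumptions.
LEXICON = {"ricochets": -1.0, "dead-on": 1.0}

def bag_of_words_score(sentence):
    """Sum the lexicon scores of the words in a sentence, ignoring word order."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
    return sum(LEXICON.get(token, 0.0) for token in tokens)

original = "Munich ricochets all over the place, but it hits its target dead-on."
reversed_pair = "Munich hits its target dead-on, but ricochets all over the place."

# Both sentences contain the same sentiment-bearing words, so the scores match.
print(bag_of_words_score(original), bag_of_words_score(reversed_pair))
```

Any model whose decision depends only on which words occur, and not on where the "but" puts the emphasis, will assign both sentences the same label and so get at most one of them right.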
34. Phrases with sentiment score 0.55556
a film as Byatt fans could hope for
A sly dissection of the inanities of the contemporary music business and a rather sad story of the difficulties
a sad, superior human
a joke
Angel presents events partly from the perspective of Aurelie and Christelle and infuses the film with the sensibility of a particularly nightmarish fairytale
Raw annotator scores, in slide order: 13,15,15; 13,15,15; 9,13,21; 8,15,20; 9,17,17
35. Averages obscure differences
In most datasets sentiment scores are averaged
But this conflates the data in a way that obscures differences
Another common measure: majority voting
But data with variance is informative too!
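
To make the point concrete, here is a small sketch that computes the mean and the spread of the raw score triples listed on the slide with sentiment score 0.55556. The choice of population standard deviation as the measure of spread is an assumption made for illustration.

```python
from statistics import mean, pstdev

# Raw annotator scores (1-25 slider) as listed on the slide.
RATINGS = [(13, 15, 15), (9, 13, 21), (8, 15, 20), (9, 17, 17)]

def summarize(scores):
    """Return the mean and population standard deviation of one score triple."""
    return mean(scores), pstdev(scores)

for scores in RATINGS:
    avg, spread = summarize(scores)
    print(scores, round(avg, 3), round(spread, 3))
```

Every triple has the same mean, yet the annotators agree closely on some phrases and disagree widely on others. Once only the average is kept, that difference is gone.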
39. Data representation: word embeddings
Based on distributional semantics
With long, wispy eyelashes and a vibrant blue face, Bernie the southern cassowary has a look that rivals even the fanciest of peacocks.
Cassowaries stand between 1.5-2 meters in height with both sexes similar in appearance.
How Dangerous Are Cassowaries, Really? - Scientific American Blog ...
Scientists currently recognize three living species of cassowary—all of which are restricted to New Guinea, northeastern Australia, and nearby islands.
42. What other things are dangerous?
What other things live in Australia?
Most Dangerous Cancers in Men and Women Infographic
Why are cancer cells dangerous?
The koala is an arboreal herbivorous marsupial native to Australia.
The AKF estimates that there are likely to be less than 80,000 Koalas remaining in Australia today and it could be as low as 43,000.
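
The distributional idea behind these examples can be sketched with a few of the sentences above: a word is characterized by the words it appears with. This is only a co-occurrence sketch, not an embedding model, and treating a whole sentence as the context window is an assumption made for brevity.

```python
import re
from collections import defaultdict

# Sentences copied from the slides above.
SENTENCES = [
    "How Dangerous Are Cassowaries, Really?",
    "Why are cancer cells dangerous?",
    "What other things live in Australia?",
    "The koala is an arboreal herbivorous marsupial native to Australia.",
]

def build_contexts(sentences):
    """Map each word to the set of other words it shares a sentence with."""
    contexts = defaultdict(set)
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        for token in tokens:
            contexts[token] |= tokens - {token}
    return contexts

contexts = build_contexts(SENTENCES)
print(sorted(contexts["dangerous"]))
print(sorted(contexts["australia"]))
```

In this tiny sample, "dangerous" ends up associated with both cassowaries and cancer, and "australia" with koalas. A representation built this way reflects whatever associations the training text happens to contain.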
They annotated every phrase in a sentence for sentiment, which gives a great picture of compositionality.
Concretely, they annotated phrases for sentiment with a slider bar labeled from very negative (1) to very positive (25). Each phrase was scored by at least 3 annotators.
The scores were then averaged and mapped to the range 0 to 1. Again, as far as I can tell, nothing was done to ensure Turkers were not spamming.
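
A minimal sketch of the averaging and rescaling step, assuming a linear mapping of the 1 to 25 slider onto the range 0 to 1. The notes do not state the exact formula, but this mapping reproduces the 0.55556 shown on the slides. The function name `normalize` is hypothetical.

```python
def normalize(scores, low=1, high=25):
    """Average raw slider scores and rescale the mean linearly to [0, 1].

    The linear mapping is an assumption; it is not spelled out in the notes.
    """
    average = sum(scores) / len(scores)
    return (average - low) / (high - low)

print(round(normalize([13, 15, 15]), 5))  # 0.55556
```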
They took care to bin the scores sensibly, in a way that matches the annotations: a 5-way split as well as a binary split.
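
The binning could look roughly like the sketch below. The equal-width cutoffs and the decision to drop the neutral bin in the binary split are assumptions; the notes only say that a 5-way and a binary split exist.

```python
LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]
# Assumed bins: [0, 0.2], (0.2, 0.4], (0.4, 0.6], (0.6, 0.8], (0.8, 1.0]
CUTOFFS = [0.2, 0.4, 0.6, 0.8]

def five_way(score):
    """Map a score in [0, 1] to one of five classes using the assumed cutoffs."""
    for cutoff, label in zip(CUTOFFS, LABELS):
        if score <= cutoff:
            return label
    return LABELS[-1]

def binary(score):
    """Collapse the five classes to two; neutral items get no label (None)."""
    label = five_way(score)
    if label == "neutral":
        return None
    return "negative" if "negative" in label else "positive"

print(five_way(0.55556), binary(0.55556))
```

Under these assumptions, every phrase with score 0.55556 falls into the neutral bin and would be left out of the binary task.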
Recursive Neural Tensor Network trained on all nodes.
All phrases: classification for every n-gram, and then restrict the data to full sentences (not broken down).
These are the best results they achieved, but the picture is the same for all the other models they tested: a big drop on the 5-way classification when the model is trained at sentence level.
Two tasks. Builder: build sentiment analysis systems.
Breaker: minimally edit items from the test sets so that the system would “break”, where “break” means that it assigns the correct sentiment to exactly one of the items in the pair.
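
The "break" criterion can be written down directly. In the sketch below, the per-pair break rate is taken to be the fraction of systems that break on that pair, which is an assumption about how the percentages on the slides were computed. The function names and the example predictions are hypothetical.

```python
def breaks(gold_pair, predicted_pair):
    """True if the system is correct on exactly one of the two items."""
    correct = [gold == pred for gold, pred in zip(gold_pair, predicted_pair)]
    return sum(correct) == 1

def break_rate(gold_pair, predictions_by_system):
    """Fraction of systems that break on a pair (assumed definition)."""
    broken = sum(breaks(gold_pair, pred) for pred in predictions_by_system)
    return broken / len(predictions_by_system)

gold = ("+", "-")
# Invented predictions from three imaginary systems, for illustration only.
predictions = [("+", "+"), ("+", "-"), ("-", "-")]
print(break_rate(gold, predictions))
```

Note that a system that gets both items wrong does not count as broken under this definition; only the asymmetric case does.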
All systems miss these examples.
Here we see the raw scores: remember the scale goes from 1 to 25.
The last two examples in each category are full sentences.
And data for which high inter-annotator agreement isn’t obtained is thrown away.
Macro-average: the goal here was to see how much including the complicated class impacts the results. You can probably get better results with a better machine learning algorithm.
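
A toy sketch of why a macro-average is sensitive to one hard class. The labels and predictions below are invented for illustration, and using F1 as the per-class metric is an assumption; the notes do not say which metric was averaged.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)

# Invented toy data: the system never predicts the "complicated" class.
gold = ["pos", "pos", "neg", "neg", "complicated", "complicated"]
pred = ["pos", "pos", "neg", "neg", "pos", "neg"]

print(round(macro_f1(gold, pred), 3))          # all three classes
print(round(macro_f1(gold[:4], pred[:4]), 3))  # complicated items left out
```

Because each class counts equally in the macro-average, a class the system never gets right pulls the score down sharply even when it is a small share of the data.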
Sentiment analysis systems work well on the data they have been trained on.