The document discusses detecting sentiment-based contradictions in text. It defines contradictions as situations where two sentences are unlikely to be true together based on sentiment. It motivates the need to analyze contradictions due to diversity and change of views expressed on social media and other online forums. The paper proposes a three step approach to detect sentiment-based contradictions: topic identification, sentiment extraction and aggregation, and contradiction extraction.
2. Introduction
Contradictions, what are they?
‣ Contradictions in text are situations where
’two sentences are extremely unlikely to be true together’
‣ Contradictions may be of different types, for example:
antonymy: hot - cold, light - dark, good - bad
negation: nice - not nice, i love you - i love you not
mismatches: the solar system has 8 planets - there are 9 planets
sentiments: i like this book - this reading makes me sick
‣ Sentiment Contradictions may occur due to:
diversity of views
change of views
2
3. Introduction
Motivation
‣ There are many services where users publish their opinions:
blogs, wikis, forums, social networks and others
‣ Sentiment analysis is used to:
learn customers attitude to a product or its features
analyze people's reaction to some event
‣ Such problems require scalable sentiment aggregation, which is:
diversity-preserving
time-aware
3
4. Contradiction
Detection
Contradiction Detection Pipeline
Text Text Topic1 Topic1
topic1 statistics contr1
Text topic1 opinion1 contr2
topic2 topic2 Topic2 Topic2
opinion2 statistics contr3
Text Topic Opinion Opinion Contradiction
Extraction Identification Extraction Aggregation Extraction
s
on
es al
s
lu tic
ti
on
pa eb
cs
s
ic
xt
Va tis
s
pi
ni
W
d
ge
Te
ra
To
a
pi
St
nt
O
Co
4
5. Contradiction
Detection
Formulating the Problem
‣ Sentiment S is a real number in the range [-1,1], reflecting opinion polarity
‣ Aggregated Sentiment µS is the mean value over sentiments in the collection
‣ Simultaneous Contradiction. when two groups of documents express a very
different sentiment on the same topic, in the same time interval.
‣ Change of Sentiment. when two groups of documents express a very different
sentiment on the same topic, but in consequitive time intervals.
5
6. cuss later, be restricted to those, that were postedA problem with the above We
tive sentiments compensate each other. within the window w.
way of set as D(w), sentiment arises when there = D(w) .
denote thiscalculating topicand n as its cardinality, n exists a large
ally deter-
ON D ETECTION ). Contradiction
number of documents with aggregated sentiment µ close to zero
very low sentiment values (neutral doc-
sideration. con- In this example, a value of S
ch have high Detection
uments). In this case, the value of µS will be drawn close to zero,
implies a high level of contradiction situation of the contradiction.
because of positive and nega-
e involvedabove
ng regions without necessarily reflecting the true
tive Therefore, we suggest to additionally consider the variance of the
sentiments compensate each other. A problem with the above
way sentiments along topic sentiment arises when there exists σSlarge
of calculating with their mean value. The sentiment variance a2
onsider the popu-Contradiction Preserving Sentiment Aggregation
ECTION ). number of documents with very low sentiment values (neutral doc-
is defined as follows:
ons may indicate
high con-
mmunity. Due to uments). In this case, the valuenof µS will be drawn close to zero,
1
2 2
ons above first without necessarily reflecting
olution to the σS = the (Si −situation of the contradiction.
true µS ) (1)
n i=1
direct extension. Therefore, we suggest to additionally consider the variance of the
According to the above definition, when The sentiment variance 2
general, and can sentiments along with their mean value.there is a large uncertainty σS
the popu-
e above problem, is defined the follows: sentiment of a collection of documents on a
about as collective
y indicate
eating contradic- particular topic, the topic sentiment variance is large as well.
egated Sentiment.
y. Due to Figure 1 shows two example sentiment distributions. Distribution
2 1 n 2
to the first A with µS close to zero = a high i − µS ) indicates a very con- (1)
σS and (Svariance
n i=1
tradictive topic. Distribution B shows a far less contradictive topic
ION
extension. with sentiment mean µS in the positive range and low variance. For
e aand can ap- According to the above definition, when there is a large uncertainty
l, three step
example, a group of documents with µS close to zero and a high
problem, about the collective sentiment Figurecollectionvery contradictive,on a
variance (distribution A on the of a 1) will be of documents
contradic- particular topic, the topicsentiment µSvariance is large as positive
and another group with sentiment shifted to negative or well.
c pair, and
Sentiment.
e texts.
Figure 1 low variance is likely to sentimentcontradictive (distribution
with shows two example be far less distributions. Distribution
A with µtheclose to zero and a high a large number of neutral sen- con-
B on
S
Figure 1). When assuming variance indicates a very
methods, or adap- timents in the collection, we have two opposite trends: the average
tradictive topic. Distribution B shows a far less contradictive topic
sentiment moves towards zero and sentiment variance decreases.
steps as ’prepro-
ng. The focus of withIf these trends will µS in the positive range and lowdocuments For
e step ap-
sentiment mean compensate each other, the neutral variance.
oach. example, a group of documents with µmuch. to zero and a high
would not affect the contradiction value S close
6
7. range [−1, 1]
mplies ais a real number in theINGLE -Tbecause that indicates the
opic first need determine all the different topics and of positive and
eding hightolevelPof contradiction shiftingthen calculateETECTIONnega- a particular to identify contradicting opinions we need to
T the
ROBLEM 1 (S OPIC C
ve sentiments compensate eachTτ , and topic Tproblemtext. the
ity of the author’saopinioninterval
corresponding For given time on
sentiments.
window3 w. ). In
for contradictions inother. AONTRADICTION D regions ofabove order to be able
aexpressedtime the time Nega- For
in a with
, identify
Contradiction
define a measure of contradiction. Assume that we want to look
topic T , represent where arisesand positive is exceeding large Measuring aContradictions For a particular
a predefinedset of documents D, which T opinions 4.2 for contradictions in shifting time window3 w.
ay positive values threshold ρ.the sentiment a contradiction level for we use for calculation, will
size w, negative
andof calculating topic -T OPIC C ONTRADICTION D ETECTION ). a when there exists In order to, the set of documents D,contradictingfor calculation, will to
P ROBLEM 1some INGLE
(S topic T be able to identify which we use opinions we need
later, ofwhile time interval τto those,user-defined. theposted within thedefine a measureWecontradiction. Assume the window w. We look
ctively, a given the absolute value T ,sentimenttime regions of thedoc- restricted to those, that were posted within that we want to
be restricted , and topic ofidentify that were
sentimentvalues (neutral window w. of
represents
umber documents with very, low
For The this interval, D(w), and n as we will discuss later,n = be
denote time seta contradiction level As automatically
τ is D(w)
gth of predefined size w, wherevalueeitherµSuser-defined,Tits exceeding deter- zero, this.set asin a shiftingas its cardinality,3n = D(w) . particular
ments).a the opinion. threshold, ρ,as of be will for or is cardinality, for contradictions D(w), and n time window w. For a
deter- In this case, the can
the be drawn close to denote Detection
some thresholdmined in an adaptive fashion based aggregated sentiment µS In this example,documents D, which we use for close to zero will
In this example, a value of on the data under consideration. topicclose to zerovalue of aggregated sentiment µS calculation,
ρ. a
ation. necessarily reflecting the all the situation of the contradiction.T , theaset oflevel of contradiction because of positive and nega-
ithout computing sentiments for true topics in a datasetwe also needbe restricted high implies
t from timeimpliescan,high level of contradiction because of positive sentiments compensate each postedAwithin the window w. We
We
a also user-defined. As we will discuss are involved
determine individual texts, that
tive andto those, that were other. problem with the above
nega-
olved The interval, τ is
in contradictions, as follows. consider the variance of thelater,
herefore, we suggesteither becompensateoreach other. Amultipledenote thiscalculating topicand n as its cardinality, nexists a large
to additionally there = D(w) .
user-defined, automatically problem 2 the as D(w), sentiment
mpute the polarity on some topic aggregated overdeter- set
tive sentiments (A LL -T OPICS C ONTRADICTION D ETECTION ). with of of above with very lowarises whenvalues (neutral doc-
way
an adaptive their mean value. ThePreserving Sentiment Aggregation close to zero,
Contradiction time periods).con- In numberhighdocuments contradictionS because of positive and nega-
the threshold, ρ, can
P ROBLEM 2
ntiments along differenttime based onas welltopicssentimentwhen therethis example, a case, the value of µsentimentdrawn µS close to zero
mined in waywithgiven authors, τ ,thesentiment which have variance σS
(that may span For acalculating topic data under consideration.
of fashion interval identify as T , arises high exists a large
uments). In level
this
value of aggregated sentiment
will be
definedcan also determine documents number verythat are involved above implies a necessarilyof
ON ).
We as follows: all the topics in a dataset
number of level, or large with of contradicting regions values without we suggest to additionallysituation of thevariance of the
tradiction
low sentiment tive Therefore, doc-
(neutral reflecting the true
sentiments compensate each other. A problem with the above
contradiction.
in contradictions, asthreshold.
(A follows.
EFNINITION 2 some GGREGATED S ENTIMENT ). The Aggregated of calculating topic sentiment arises when there exists a large consider the
con- uments). Inproblemcase, the value of to consider the popu-
this is interesting if we want µS will be drawn sentiments zero,
1 n
latter collection of documents D on topic T ,
wayclose toalong with their mean value. The sentiment variance σS 2
ment S expressed σ 2
above µP ROBLEM 2larityin a = web topics.i Frequentthe true situation of the contradiction. with very low sentiment values (neutral doc-
The(A LL -T OPICS C ONTRADICTION D ETECTION ).
Raw Sentiments of certain
S
2
(S − µS contradictions may indicate number of documents
without necessarily reflecting ) have high con- (1) defined as follows:
is
fined For the mean"hot" topics,τ whichi=1 topicsinterestsentiments assigned uments). In this case, the valuen of µS will be drawn close to zero,
as a given timevalue over all individual of the community. Due to
n attract
interval , identify the T , which 1
tradiction level, or limitations, in thisof towe only discuss a solution to the the variance of the σS = n true µS )
Therefore, we suggest contradicting regions above first without necessarily reflecting the(Si −situation of the contradiction.
large number paper additionally consider
2 2
space defined on the same range of [−1, 1] as (1)
at collection. µS is 2 i=1
some threshold.contradictionwith to the second one is its setTheT documents with suggest to additionally consider the variance of the
tosentimentsdefinition, whenathere value. where n isTherefore, we a σS
theaboveas follows: µ of mean is S largeextension.
problem, since a solution their 1
along value different direct uncertaintyvariance
of sentiment
ments Aggr. Sentiment
ccording to the Though, n a
= n ∑ work ,
popu- and calculated the approach we propose in this i=1 isigeneral, and can sentiments along with their mean value.there is a large uncertainty σS
According to the above definition, when The sentiment variance 2
as follows:we S to consider the popu-
is definedsolutions for several other variations of the above problem,
The much higher cardinality.
bout the latter D. lead to interesting of a want
ardinality of problem as detection of if collection repeating contradic- on about as topic, the sentiment of a collection large as well.
is
dicate collective sentiment topics with periodically of documents is defined the follows: topic sentiment variance is of documents on a
a collective
articular topic, the topic tothe interest variance isAggregatedthe observations made two example sentiment distributions. Distribution
larity of certain web topics. the contradiction formula
Incorporating sentiment contradictions may indicate
such Frequent particular
ue to with the
n
tions attract most frequentlythe1community. Due 2
2 alternating
of large as to
"hot" topics, whichorpropose the following final formula for computing1con- well.
Sentiment. Figure shows n
above, weVariance of different collections of texts, A with µS close to zero = 1 a high variance)2
e first 1 shows two in paper we σS = distributions.S )
omparingSentiment this values only discuss a solution to the first
gurespace limitations, example sentiment n i=1(Si − µ Distribution
the sentiment (1) σS and (Si − µS indicates a very con- (1)
2
tradiction CONTRADICTION DETECTION
4. values: n
adictions are identified as follows. variancedirect extension.very con- sentiment mean µ in thei=1 arange andcontradictive topic
withproblem, sinceto solution to the second one is its indicates a
a zero and a high
sion. µS close Given the problems described before, we propose a three step ap-
tradictive topic. Distribution B shows far less
with positive low variance. For
S
adictive topic. approach we proposevariations ThereaboveW there is a topic the collective of documentsof whenwill be to of large uncertainty
d can to solutions for several othershows).athatϑislesswhen can
Though, the proach to contradiction in this workincludes: ⋅ general, and
According to theB analysis, fartheσS contradictive large uncertaintysentiment Figure collectionis zero and a high on a
Distribution above definition,is a contradiction example,(distribution A on the with 1)S there very contradictive,
According toathe above definition, µ close a
2 group
lead Contradiction
EFNINITION 3 (C ONTRADICTION= of C problem, aboutvariance (3) a documents
blem, T ,as detectiontheScollective documents, D1 ,contradic- in aparticular topic, the topic sentiment variance is large as well.
topic, about two groups of
Detection ofthe positive range and collection of documents groupa
topics forsentiment (µSa 2low variance. For another on with sentiment µS shifted to negative or positive
each sentence, of )
ϑ+
ith sentiment meanof topics sentiments for each sentence-topicD2 ⊂ D
µ in
such between Detection of with periodically repeating pair, and and
adic- collectiontheofwhere D1the Dfor= µS when the informationFigureon low Figure 1).isWhen assuming less contradictive (distribution
ment particular frequently alternating Aggregatedvariance is a high 1 the variance likely to be far a large number of neutral sen-
or with close
ample, In the denominator, we add a small value, ϑ ≠ 0, whichBallows to Sentiment.
Analysis of sentiments 2 topic,across multiple texts.
with
tionsa groupD, documents with sentiment to zero andlarge as well. two example sentiment distributions. Distribution
most topic, topic shows
riance Figure shows Figure 1) will be very 2 D adap- A with µ close to zero
ment. (distributionlevel of two example when (µ )distributions. timents in the collection,and a high variance trends: a very con-
eyed aboutlimit the 1one existingcontradiction refer tobetween asis1closetradictiveStopic. Distribution have two opposite variancethe average topic
considerably moreWe will sentiment contradictive,Distribution
T is tationsA and two can be achieved using existing methods, ’prepro- to sentimentThe towardswe B shows a far less contradictive
Steps on the different these steps or and zero. moves
indicates
4. CONTRADICTIONzero and a high variance indicates a very con-
to DETECTION
S zero and sentiment
A with µofand describe themS shiftedensure that focuspositive these trends will µS in the positive range and low variance. For
close methods. briefly to following. The or of withIf
decreases.
nd another nominatorof them.
han within groupcessing’is multiplied
with Ssentiment µ by ϑ in the to negative contradiction values
each calculate contradiction by combining Aggr. Sentimentmean compensate each other, the neutral documents
We one topic. Distribution B shows step ap-
‣ tradictive is then be far we propose a approach. (distribution anot topic documents with Variance.zero and a high
before,
sentiment and Sentiment
Given the problems describedthe contradiction detection three a far less contradictiveaffect the contradiction value much.
would
ith low variancethis paper analysis, that includes: Figure 2(c) shows howEvidently, we need to combine mean and S close of sentiments in
proach fall within the interval [0; 1].contradictive
is likely to less example, group of
a contra- µ to
to contradiction variance
e above definition, we purposely not specify exactly what itvariance (distribution A on the Figure 1) will be verythe contra-
on theDetectionAggregatedmean on aininclose to 0,range and low anotheris high. computing contradictions.toThen, contradictive,
‣dictionsentiment topics perµS the positiveLatent neutral variance. For
with When assuming
p for a sentimenttopicsPreprocessingsentence, theapply theanother one. and a singlevalue C can besentimentas:S shifted negative or positive
ap- Figure of4.1identifying besentence,ϑ large number theDirichlet sen- ϑ values with computed µ
If 1). value depends
forto Sentiment we from
each very different denominator. Smaller of contradiction formula for
s value group of documents with µS close to diction group
ments in the collection, (LDA)havesentence-topic pair, and to work on average and a highlikely to be far less contradictive (distribution
example, contradiction points with µS closethe zero,zero example
emphasize a for algorithm [1], which we extended
For
Detection ofAllocation we each two opposite trends: to the with low variance is
sentiments for
ntiment ‣changes the LDA and topicsentencesseveral most probable topics. difference, making
efine contradiction of(distribution basis,the higher input documents B on the Figure 1). When assuming a large number of neutral sen-
Analysis of sentimentspairwise A onthe Figure 1) will be very contradictive,
The largerlevel [4]. So across ϑ where we evaluate the
variance athe variance, arevalues variance decreases.
on the contradiction.
moves sentenceopinion. Larger multiple texts.
towards for assigned with considered as this
zero and sentiment mask σS2
reement between two begroupofusingsentiment µScollection. negative orvalue
and for can groups each pairequal. In this study, In timents
of contradictions more we theaa continuous senti-
achieved documents in
C=
Steps oneand twocompensatewith existing methods, or adap-to we used in the collection, we have two opposite trends: the average
levels anothereach sentence-topic other, assign neutral documents a positive
shifted (µS )2
(2)
these trends will offor
ase, tations ofwithThen,⋅value−4thewhichlikely effectivegroup contradictivewhere µS is squared so that its units are the same variance decreases.
the similarity5methods., We will refer toindicatesfar for’prepro- assentiment moves towards zero and sentiment as of σ2 .
ofexisting contradiction value to each polarityitsthe opinion exhibiting a
ϑ the 10 in range [-1;1] that these steps lessofserves
ment information within be a
=low variance is was much. as
not affect expressed regarding the in the For the sentiment assignment step, purpose, (distribution
ould cessing’ and describe them brieflydisagreement level. This defi-If these trends will compensate each other, the neutral should S
focus number of neutral sen- the intuition that contradiction values documents
topic. following. The largeof 6
erence point, providing a basic When assuming adistorting the final results.captures
B onbehavior across for fine-grained opinion analysis [7]. Nev- would formula the contradiction value much.
the Figure 1). datasets, without This
stable we to combine mean and variance of sentiments in not affect topics whose sentiment value is close to zero, and
vidently,paper need the
this we is then use contradiction detection approach.
an existing tool be higher for
8. Contradiction
Detection
Contradiction Time Series
We generated time series of raw sentiments
with half of them being gaussian noise
We can see an aggregated sentiment, which
is first positive and later changes to negative
On the contradiction series we get peaks of
contradiction only at the time points t1 and t2
They correspond either to simultaneous
contradiction (t2) or to change of sentiment (t1)
and can be distinguished by change of signs
of !S just before and just after peak points.
7
9. Contradiction
Detection
Contradiction Time Series Storage - TimeTree
We need a scalability on the number of topics and time
interval length, yet the efficiency of access and update.
We need to analyze time series of contradiction
level using different granularities online.
Smaller time windows allow us to detect
more simultaneous contradiction.
Larger ones reveal opposite opinions,
which are sparse across time.
8
10. Contradiction
Detection
Contradiction Time Series Queries
We have the following types of queries:
‣Adaptive-threshold queries
output nodes C > rho • Cparent
‣Fixed-threshold queries
output nodes C > rho
‣Single-topic retrieval
topic is specified
‣All-topic retrieval
retrieve all
9
11. Performance
Tests
Usefulness Evaluation - Setup
‣ We applied our algorithms to a range of data sets:
drug reviews collected from the DrugRatingz.com website
comments to YouTube videos from L3S
comments on postings from Slashdot.com
‣ User evaluation was performed in two independent stages,
during each step we asked users to find sentiment contradictions:
in the first stage, by looking at trends of positive and negative sentiments
in the second stage, by providing a plot of the contradiction level
‣ We measured time and number of clicks, needed to find one contradiction
!T = T2/T1 !N = N2/N1 smaller is better
‣ We also asked users to evaluate the effectiveness of each approach
!D = D2/D1 each value was in the range [1-5], larger is better
‣ Precision was calculated as a fraction of clicks, finding contradiction
!P = P2/P1 P1 = N1/NC P2 = N2/NC larger is better
‣ Then, we compared their improvements for each dataset
10
13. Performance
Tests
Usefulness Evaluation - Results
Dataset Topic name Size ∆D ∆T ∆N P1 P2 ∆P
1)
Drug Ambien 60 1.50 0.60 0.88 0.70 0.81 1.20
Ratingz Yaz 300 1.58 0.93 0.78 0.75 0.95 1.32
Slashdot Int. control 159 1.17 0.89 0.58 0.37 0.63 2.14
YouTube Zune HD 472 2.07 0.68 0.62 0.36 0.61 2.09
Average 1.58 0.77 0.72 0.55 0.75 1.69
Table 1: Evaluation results for different topics.
‣ Stage 2 (based on contradictions) has showed improvement on all measures
‣ The largest many of the automatically identified contradictions were real ones
improvement in precision was achieved for Slashdot and YouTube
‣ Time improvement was by the users).dataset with many contradictionswas
(i.e., verified small for a The results show that our approach (Yaz),
always more successful in suggesting to users time-intervals that
but it was large for dataset with infrequent contradictions (Ambien)
contained contradictions, with an overall average success rate of
‣ In all cases,75%, tool as high asto reduce the search space (number of clicks)
our and allowed 95% (topic "Yaz").
The above results demonstrate that our approach can successfully
identify contradictions in an automated way, and quickly guide
users to the relevant parts of the data.
6.4 Evaluation of Scalability
We evaluate the scalability of the CTree for solving Problems 1
12
14. Performance
Tests
Scalability Evaluation - Experimental Setup
‣Syntetic dataset contained randomly generated sentiments for 1,000 topics
‣We generated sets of 25 single-topic and all-topics queries
‣In queries we used granularities and topics drawn uniformly at random
‣We measured the time needed to execute these queries against the database
as a function of the time interval, and the granularity of the time windows.
‣We report results for both the fixed threshold and the adaptive threshold.
select c1.topic_id, c1.timeBegin, c1.timeEnd from contradictions c1
join contradictions c2 on c1.topic_id = c2.topic_id and c2.granularity = c1.granularity + 1 and
c1.timeBegin is between c2.timeBegin and c2.timeEnd
where c1.contradiction ! c2.contradiction * @threshold and c1.granularity = @window and
(c1.timeBegin is between startDate and endDate or c1.timeEnd is between startDate and endDate);
13
15. Performance
Tests
Scalability Evaluation - Results
1 103 1 105
time, ms
time, ms
Query 1: time interval test Query 2: time interval test
1 102
‣ Linearly scales on the
length of time series
4
1 10
1 101
‣ Fixed threshold shows
1 100
time interval, days
1 103
time interval, days lower answering time
200 400 800 1600 200 400 800 1600
1 103
Query 1: granularity test
1 106
Query 2: granularity test ‣ Increasing granularity
1 102 1 105
makes answering faster
1 101 1 104
‣ All-Topic queries approx.
1 100 1 103 by two orders slower
time, ms
time, ms
granularity, days granularity, days
1 10-1 1 102
1 10 100 1000 1 10 100 1000
dbms adaptive dbms fixed 14
16. Related Work
Related Work - Contradiction Analysis
‣ Contradictions were initially defined as textual inference and analyzed using linguistic
technologies:
(De Marneffe et al., 2008; Harabagui et al., 2006).
‣ Contradictions can be distinguished on a large scale by calculating the entropy or clustering
accuracy:
(Choudhury et.al., 2008; Varlamis et.al., 2008).
‣ Contradictions can also be analyzed using visual inspection of opposite sentiments:
(Chen et al., 2006; Liu et.al., 2005).
‣ Contradictive sentiments can be useful sources for opinion summarization:
(Kim and Zhai, 2009; Paul et.al., 2010).
15