Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mikalai Tsytsarau


Published on

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mikalai Tsytsarau

  1. 1. Mikalai Tsytsarau University of Trento March 28, 2011© WILMAR PHOTOGRAPHY.COM Mikalai Tsytsarau Themis Palpanas Kerstin Denecke University of Trento University of Trento L3S Research Center Scalable Detection of Sentiment-Based Contradictions
  2. 2. Introduction Contradictions, what are they?‣ Contradictions in text are situations where ’two sentences are extremely unlikely to be true together’‣ Contradictions may be of different types, for example: antonymy: hot - cold, light - dark, good - bad negation: nice - not nice, i love you - i love you not mismatches: the solar system has 8 planets - there are 9 planets sentiments: i like this book - this reading makes me sick‣ Sentiment Contradictions may occur due to: diversity of views change of views 2
  3. 3. Introduction Motivation‣ There are many services where users publish their opinions: blogs, wikis, forums, social networks and others‣ Sentiment analysis is used to: learn customers attitude to a product or its features analyze peoples reaction to some event‣ Such problems require scalable sentiment aggregation, which is: diversity-preserving time-aware 3
  4. 4. Contradiction Detection Contradiction Detection Pipeline Text Text Topic1 Topic1 topic1 statistics contr1 Text topic1 opinion1 contr2 topic2 topic2 Topic2 Topic2 opinion2 statistics contr3 Text Topic Opinion Opinion Contradiction Extraction Identification Extraction Aggregation Extraction s on es al s lu tic ti onpa eb cs s ic xt Va tis s pi ni W d ge Te ra To a pi St nt O Co 4
  5. 5. Contradiction Detection Formulating the Problem‣ Sentiment S is a real number in the range [-1,1], reflecting opinion polarity‣ Aggregated Sentiment µS is the mean value over sentiments in the collection‣ Simultaneous Contradiction. when two groups of documents express a very different sentiment on the same topic, in the same time interval.‣ Change of Sentiment. when two groups of documents express a very different sentiment on the same topic, but in consequitive time intervals. 5
  6. 6. cuss later, be restricted to those, that were postedA problem with the above We tive sentiments compensate each other. within the window w. way of set as D(w), sentiment arises when there = D(w) . denote thiscalculating topicand n as its cardinality, n exists a largeally deter-ON D ETECTION ). Contradiction number of documents with aggregated sentiment µ close to zero very low sentiment values (neutral doc-sideration. con- In this example, a value of Sch have high Detection uments). In this case, the value of µS will be drawn close to zero, implies a high level of contradiction situation of the contradiction. because of positive and nega-e involvedaboveng regions without necessarily reflecting the true tive Therefore, we suggest to additionally consider the variance of the sentiments compensate each other. A problem with the above way sentiments along topic sentiment arises when there exists σSlarge of calculating with their mean value. The sentiment variance a2onsider the popu-Contradiction Preserving Sentiment AggregationECTION ). number of documents with very low sentiment values (neutral doc- is defined as follows: ons may indicate high con-mmunity. Due to uments). In this case, the valuenof µS will be drawn close to zero, 1 2 2ons above first without necessarily reflectingolution to the σS = the (Si −situation of the contradiction. true µS ) (1) n i=1 direct extension. Therefore, we suggest to additionally consider the variance of the According to the above definition, when The sentiment variance 2 general, and can sentiments along with their mean value.there is a large uncertainty σS the popu- e above problem, is defined the follows: sentiment of a collection of documents on a about as collectivey indicate eating contradic- particular topic, the topic sentiment variance is large as well.egated Sentiment.y. Due to Figure 1 shows two example sentiment distributions. Distribution 2 1 n 2to the first A with µS close to zero = a high i − µS ) indicates a very con- (1) σS and (Svariance n i=1 tradictive topic. Distribution B shows a far less contradictive topic IONextension. with sentiment mean µS in the positive range and low variance. Fore aand can ap- According to the above definition, when there is a large uncertainty l, three step example, a group of documents with µS close to zero and a high problem, about the collective sentiment Figurecollectionvery contradictive,on a variance (distribution A on the of a 1) will be of documentscontradic- particular topic, the topicsentiment µSvariance is large as positive and another group with sentiment shifted to negative or well.c pair, andSentiment.e texts. Figure 1 low variance is likely to sentimentcontradictive (distribution with shows two example be far less distributions. Distribution A with µtheclose to zero and a high a large number of neutral sen- con- B on S Figure 1). When assuming variance indicates a verymethods, or adap- timents in the collection, we have two opposite trends: the average tradictive topic. Distribution B shows a far less contradictive topic sentiment moves towards zero and sentiment variance decreases. steps as ’prepro- ng. The focus of withIf these trends will µS in the positive range and lowdocuments For e step ap- sentiment mean compensate each other, the neutral variance. oach. example, a group of documents with µmuch. to zero and a high would not affect the contradiction value S close 6
  7. 7. range [−1, 1]mplies ais a real number in theINGLE -Tbecause that indicates the opic first need determine all the different topics and of positive andeding hightolevelPof contradiction shiftingthen calculateETECTIONnega- a particular to identify contradicting opinions we need to T the ROBLEM 1 (S OPIC Cve sentiments compensate eachTτ , and topic Tproblemtext. the ity of the author’saopinioninterval corresponding For given time on sentiments. window3 w. ). In for contradictions inother. AONTRADICTION D regions ofabove order to be able aexpressedtime the time Nega- For in a with , identify Contradiction define a measure of contradiction. Assume that we want to look topic T , represent where arisesand positive is exceeding large Measuring aContradictions For a particular a predefinedset of documents D, which T opinions 4.2 for contradictions in shifting time window3 w. ay positive values threshold ρ.the sentiment a contradiction level for we use for calculation, will size w, negativeandof calculating topic -T OPIC C ONTRADICTION D ETECTION ). a when there exists In order to, the set of documents D,contradictingfor calculation, will to P ROBLEM 1some INGLE (S topic T be able to identify which we use opinions we need later, ofwhile time interval τto those,user-defined. theposted within thedefine a measureWecontradiction. Assume the window w. We look ctively, a given the absolute value T ,sentimenttime regions of thedoc- restricted to those, that were posted within that we want to be restricted , and topic ofidentify that were sentimentvalues (neutral window w. of representsumber documents with very, low For The this interval, D(w), and n as we will discuss later,n = be denote time seta contradiction level As automatically τ is D(w) gth of predefined size w, wherevalueeitherµSuser-defined,Tits exceeding deter- zero, this.set asin a shiftingas its cardinality,3n = D(w) . particularments).a the opinion. threshold, ρ,as of be will for or is cardinality, for contradictions D(w), and n time window w. For adeter- In this case, the can the be drawn close to denote Detection some thresholdmined in an adaptive fashion based aggregated sentiment µS In this example,documents D, which we use for close to zero will In this example, a value of on the data under consideration. topicclose to zerovalue of aggregated sentiment µS calculation, ρ. aation. necessarily reflecting the all the situation of the contradiction.T , theaset oflevel of contradiction because of positive and nega- ithout computing sentiments for true topics in a datasetwe also needbe restricted high implies t from timeimpliescan,high level of contradiction because of positive sentiments compensate each postedAwithin the window w. We We a also user-defined. As we will discuss are involved determine individual texts, that tive andto those, that were other. problem with the above nega-olved The interval, τ is in contradictions, as follows. consider the variance of thelater,herefore, we suggesteither becompensateoreach other. Amultipledenote thiscalculating topicand n as its cardinality, nexists a large to additionally there = D(w) . user-defined, automatically problem 2 the as D(w), sentiment mpute the polarity on some topic aggregated overdeter- set tive sentiments (A LL -T OPICS C ONTRADICTION D ETECTION ). with of of above with very lowarises whenvalues (neutral doc- way an adaptive their mean value. ThePreserving Sentiment Aggregation close to zero, Contradiction time periods).con- In numberhighdocuments contradictionS because of positive and nega- the threshold, ρ, can P ROBLEM 2 ntiments along differenttime based onas welltopicssentimentwhen therethis example, a case, the value of µsentimentdrawn µS close to zero mined in waywithgiven authors, τ ,thesentiment which have variance σS (that may span For acalculating topic data under consideration. of fashion interval identify as T , arises high exists a large uments). In level this value of aggregated sentiment will be definedcan also determine documents number verythat are involved above implies a necessarilyof ON ). We as follows: all the topics in a dataset number of level, or large with of contradicting regions values without we suggest to additionallysituation of thevariance of the tradiction low sentiment tive Therefore, doc- (neutral reflecting the true sentiments compensate each other. A problem with the above contradiction. in contradictions, asthreshold. (A follows. EFNINITION 2 some GGREGATED S ENTIMENT ). The Aggregated of calculating topic sentiment arises when there exists a large consider the con- uments). Inproblemcase, the value of to consider the popu- this is interesting if we want µS will be drawn sentiments zero, 1 n latter collection of documents D on topic T , wayclose toalong with their mean value. The sentiment variance σS 2 ment S expressed σ 2above µP ROBLEM 2larityin a = web topics.i Frequentthe true situation of the contradiction. with very low sentiment values (neutral doc- The(A LL -T OPICS C ONTRADICTION D ETECTION ). Raw Sentiments of certain S 2 (S − µS contradictions may indicate number of documents without necessarily reflecting ) have high con- (1) defined as follows: isfined For the mean"hot" topics,τ whichi=1 topicsinterestsentiments assigned uments). In this case, the valuen of µS will be drawn close to zero, as a given timevalue over all individual of the community. Due to n attract interval , identify the T , which 1 tradiction level, or limitations, in thisof towe only discuss a solution to the the variance of the σS = n true µS ) Therefore, we suggest contradicting regions above first without necessarily reflecting the(Si −situation of the contradiction. large number paper additionally consider 2 2 space defined on the same range of [−1, 1] as (1)at collection. µS is 2 i=1 some threshold.contradictionwith to the second one is its setTheT documents with suggest to additionally consider the variance of the tosentimentsdefinition, whenathere value. where n isTherefore, we a σS theaboveas follows: µ of mean is S largeextension. problem, since a solution their 1 along value different direct uncertaintyvariance of sentimentments Aggr. Sentiment ccording to the Though, n a = n ∑ work ,popu- and calculated the approach we propose in this i=1 isigeneral, and can sentiments along with their mean value.there is a large uncertainty σS According to the above definition, when The sentiment variance 2 as follows:we S to consider the popu- is definedsolutions for several other variations of the above problem, The much higher cardinality.bout the latter D. lead to interesting of a want ardinality of problem as detection of if collection repeating contradic- on about as topic, the sentiment of a collection large as well. isdicate collective sentiment topics with periodically of documents is defined the follows: topic sentiment variance is of documents on a a collectivearticular topic, the topic tothe interest variance isAggregatedthe observations made two example sentiment distributions. Distribution larity of certain web topics. the contradiction formula Incorporating sentiment contradictions may indicate such Frequent particular ue to with the n tions attract most frequentlythe1community. Due 2 2 alternating of large as to "hot" topics, whichorpropose the following final formula for computing1con- well. Sentiment. Figure shows n above, weVariance of different collections of texts, A with µS close to zero = 1 a high variance)2e first 1 shows two in paper we σS = distributions.S )omparingSentiment this values only discuss a solution to the first gurespace limitations, example sentiment n i=1(Si − µ Distribution the sentiment (1) σS and (Si − µS indicates a very con- (1) 2 tradiction CONTRADICTION DETECTION 4. values: n adictions are identified as follows. variancedirect extension.very con- sentiment mean µ in thei=1 arange andcontradictive topic withproblem, sinceto solution to the second one is its indicates a a zero and a high sion. µS close Given the problems described before, we propose a three step ap- tradictive topic. Distribution B shows far less with positive low variance. For Sadictive topic. approach we proposevariations ThereaboveW there is a topic the collective of documentsof whenwill be to of large uncertainty d can to solutions for several othershows).athatϑislesswhen can Though, the proach to contradiction in this workincludes: ⋅ general, and According to theB analysis, fartheσS contradictive large uncertaintysentiment Figure collectionis zero and a high on a Distribution above definition,is a contradiction example,(distribution A on the with 1)S there very contradictive, According toathe above definition, µ close a 2 group lead Contradiction EFNINITION 3 (C ONTRADICTION= of C problem, aboutvariance (3) a documentsblem, T ,as detectiontheScollective documents, D1 ,contradic- in aparticular topic, the topic sentiment variance is large as well. topic, about two groups of Detection ofthe positive range and collection of documents groupa topics forsentiment (µSa 2low variance. For another on with sentiment µS shifted to negative or positive each sentence, of ) ϑ+ ith sentiment meanof topics sentiments for each sentence-topicD2 ⊂ D µ in such between Detection of with periodically repeating pair, and and adic- collectiontheofwhere D1the Dfor= µS when the informationFigureon low Figure 1).isWhen assuming less contradictive (distribution ment particular frequently alternating Aggregatedvariance is a high 1 the variance likely to be far a large number of neutral sen- or with close ample, In the denominator, we add a small value, ϑ ≠ 0, whichBallows to Sentiment. Analysis of sentiments 2 topic,across multiple texts. with tionsa groupD, documents with sentiment to zero andlarge as well. two example sentiment distributions. Distribution most topic, topic shows riance Figure shows Figure 1) will be very 2 D adap- A with µ close to zeroment. (distributionlevel of two example when (µ )distributions. timents in the collection,and a high variance trends: a very con-eyed aboutlimit the 1one existingcontradiction refer tobetween asis1closetradictiveStopic. Distribution have two opposite variancethe average topic considerably moreWe will sentiment contradictive,Distribution T is tationsA and two can be achieved using existing methods, ’prepro- to sentimentThe towardswe B shows a far less contradictive Steps on the different these steps or and zero. moves indicates 4. CONTRADICTIONzero and a high variance indicates a very con- to DETECTION S zero and sentiment A with µofand describe themS shiftedensure that focuspositive these trends will µS in the positive range and low variance. For close methods. briefly to following. The or of withIf decreases.nd another nominatorof them.han within groupcessing’is multiplied with Ssentiment µ by ϑ in the to negative contradiction values each calculate contradiction by combining Aggr. Sentimentmean compensate each other, the neutral documents We one topic. Distribution B shows step ap- ‣ tradictive is then be far we propose a approach. (distribution anot topic documents with Variance.zero and a high before, sentiment and Sentiment Given the problems describedthe contradiction detection three a far less contradictiveaffect the contradiction value much. would ith low variancethis paper analysis, that includes: Figure 2(c) shows howEvidently, we need to combine mean and S close of sentiments in proach fall within the interval [0; 1].contradictive is likely to less example, group of a contra- µ to to contradiction variance e above definition, we purposely not specify exactly what itvariance (distribution A on the Figure 1) will be verythe contra- on theDetectionAggregatedmean on aininclose to 0,range and low anotheris high. computing contradictions.toThen, contradictive, ‣dictionsentiment topics perµS the positiveLatent neutral variance. For with When assuming p for a sentimenttopicsPreprocessingsentence, theapply theanother one. and a singlevalue C can besentimentas:S shifted negative or positive ap- Figure of4.1identifying besentence,ϑ large number theDirichlet sen- ϑ values with computed µ If 1). value depends forto Sentiment we from each very different denominator. Smaller of contradiction formula for s value group of documents with µS close to diction groupments in the collection, (LDA)havesentence-topic pair, and to work on average and a highlikely to be far less contradictive (distribution example, contradiction points with µS closethe zero,zero example emphasize a for algorithm [1], which we extended For Detection ofAllocation we each two opposite trends: to the with low variance is sentiments for ntiment ‣changes the LDA and topicsentencesseveral most probable topics. difference, making efine contradiction of(distribution basis,the higher input documents B on the Figure 1). When assuming a large number of neutral sen- Analysis of sentimentspairwise A onthe Figure 1) will be very contradictive, The largerlevel [4]. So across ϑ where we evaluate the variance athe variance, arevalues variance decreases. on the contradiction. moves sentenceopinion. Larger multiple texts. towards for assigned with considered as this zero and sentiment mask σS2 reement between two begroupofusingsentiment µScollection. negative orvalue and for can groups each pairequal. In this study, In timents of contradictions more we theaa continuous senti- achieved documents in C= Steps oneand twocompensatewith existing methods, or adap-to we used in the collection, we have two opposite trends: the average levels anothereach sentence-topic other, assign neutral documents a positive shifted (µS )2 (2) these trends will offor ase, tations ofwithThen,⋅value−4thewhichlikely effectivegroup contradictivewhere µS is squared so that its units are the same variance decreases. the similarity5methods., We will refer toindicatesfar for’prepro- assentiment moves towards zero and sentiment as of σ2 . ofexisting contradiction value to each polarityitsthe opinion exhibiting a ϑ the 10 in range [-1;1] that these steps lessofserves ment information within be a =low variance is was much. as not affect expressed regarding the in the For the sentiment assignment step, purpose, (distribution ould cessing’ and describe them brieflydisagreement level. This defi-If these trends will compensate each other, the neutral should S focus number of neutral sen- the intuition that contradiction values documents topic. following. The largeof 6erence point, providing a basic When assuming adistorting the final results.captures B onbehavior across for fine-grained opinion analysis [7]. Nev- would formula the contradiction value much. the Figure 1). datasets, without This stable we to combine mean and variance of sentiments in not affect topics whose sentiment value is close to zero, andvidently,paper need the this we is then use contradiction detection approach. an existing tool be higher for
  8. 8. Contradiction Detection Contradiction Time SeriesWe generated time series of raw sentimentswith half of them being gaussian noiseWe can see an aggregated sentiment, whichis first positive and later changes to negativeOn the contradiction series we get peaks ofcontradiction only at the time points t1 and t2They correspond either to simultaneouscontradiction (t2) or to change of sentiment (t1)and can be distinguished by change of signsof !S just before and just after peak points. 7
  9. 9. Contradiction Detection Contradiction Time Series Storage - TimeTreeWe need a scalability on the number of topics and timeinterval length, yet the efficiency of access and update.We need to analyze time series of contradictionlevel using different granularities online.Smaller time windows allow us to detectmore simultaneous contradiction.Larger ones reveal opposite opinions,which are sparse across time. 8
  10. 10. Contradiction Detection Contradiction Time Series QueriesWe have the following types of queries:‣Adaptive-threshold queries output nodes C > rho • Cparent‣Fixed-threshold queries output nodes C > rho‣Single-topic retrieval topic is specified‣All-topic retrieval retrieve all 9
  11. 11. Performance Tests Usefulness Evaluation - Setup‣ We applied our algorithms to a range of data sets: drug reviews collected from the DrugRatingz.com website comments to YouTube videos from L3S comments on postings from Slashdot.com‣ User evaluation was performed in two independent stages,during each step we asked users to find sentiment contradictions: in the first stage, by looking at trends of positive and negative sentiments in the second stage, by providing a plot of the contradiction level‣ We measured time and number of clicks, needed to find one contradiction !T = T2/T1 !N = N2/N1 smaller is better‣ We also asked users to evaluate the effectiveness of each approach !D = D2/D1 each value was in the range [1-5], larger is better‣ Precision was calculated as a fraction of clicks, finding contradiction !P = P2/P1 P1 = N1/NC P2 = N2/NC larger is better‣ Then, we compared their improvements for each dataset 10
  12. 12. Performance Tests Usefulness Evaluation - Workflow Stage 1Stage 2 11
  13. 13. Performance Tests Usefulness Evaluation - Results Dataset Topic name Size ∆D ∆T ∆N P1 P2 ∆P1) Drug Ambien 60 1.50 0.60 0.88 0.70 0.81 1.20 Ratingz Yaz 300 1.58 0.93 0.78 0.75 0.95 1.32 Slashdot Int. control 159 1.17 0.89 0.58 0.37 0.63 2.14 YouTube Zune HD 472 2.07 0.68 0.62 0.36 0.61 2.09 Average 1.58 0.77 0.72 0.55 0.75 1.69 Table 1: Evaluation results for different topics. ‣ Stage 2 (based on contradictions) has showed improvement on all measures ‣ The largest many of the automatically identified contradictions were real ones improvement in precision was achieved for Slashdot and YouTube ‣ Time improvement was by the users).dataset with many contradictionswas (i.e., verified small for a The results show that our approach (Yaz), always more successful in suggesting to users time-intervals that but it was large for dataset with infrequent contradictions (Ambien) contained contradictions, with an overall average success rate of ‣ In all cases,75%, tool as high asto reduce the search space (number of clicks) our and allowed 95% (topic "Yaz"). The above results demonstrate that our approach can successfully identify contradictions in an automated way, and quickly guide users to the relevant parts of the data. 6.4 Evaluation of Scalability We evaluate the scalability of the CTree for solving Problems 1 12
  14. 14. Performance Tests Scalability Evaluation - Experimental Setup‣Syntetic dataset contained randomly generated sentiments for 1,000 topics‣We generated sets of 25 single-topic and all-topics queries‣In queries we used granularities and topics drawn uniformly at random‣We measured the time needed to execute these queries against the databaseas a function of the time interval, and the granularity of the time windows.‣We report results for both the fixed threshold and the adaptive threshold.select c1.topic_id, c1.timeBegin, c1.timeEnd from contradictions c1join contradictions c2 on c1.topic_id = c2.topic_id and c2.granularity = c1.granularity + 1 andc1.timeBegin is between c2.timeBegin and c2.timeEndwhere c1.contradiction ! c2.contradiction * @threshold and c1.granularity = @window and(c1.timeBegin is between startDate and endDate or c1.timeEnd is between startDate and endDate); 13
  15. 15. Performance Tests Scalability Evaluation - Results1 103 1 105 time, ms time, ms Query 1: time interval test Query 2: time interval test1 102 ‣ Linearly scales on the length of time series 4 1 101 101 ‣ Fixed threshold shows1 100 time interval, days 1 103 time interval, days lower answering time 200 400 800 1600 200 400 800 16001 103 Query 1: granularity test 1 106 Query 2: granularity test ‣ Increasing granularity1 102 1 105 makes answering faster1 101 1 104 ‣ All-Topic queries approx.1 100 1 103 by two orders slower time, ms time, ms granularity, days granularity, days1 10-1 1 102 1 10 100 1000 1 10 100 1000 dbms adaptive dbms fixed 14
  16. 16. Related Work Related Work - Contradiction Analysis‣ Contradictions were initially defined as textual inference and analyzed using linguistic technologies: (De Marneffe et al., 2008; Harabagui et al., 2006).‣ Contradictions can be distinguished on a large scale by calculating the entropy or clustering accuracy: (Choudhury et.al., 2008; Varlamis et.al., 2008).‣ Contradictions can also be analyzed using visual inspection of opposite sentiments: (Chen et al., 2006; Liu et.al., 2005).‣ Contradictive sentiments can be useful sources for opinion summarization: (Kim and Zhai, 2009; Paul et.al., 2010). 15
  17. 17. Mikalai TsytsarauThanks for your attention! 16