Stance Detection with Bidirectional Conditional Encoding

EMNLP 2016 Paper: https://arxiv.org/pdf/1606.05464.pdf
Code: https://github.com/sheffieldnlp/stance-conditional

Stance detection is the task of classifying the attitude expressed in a text towards a target such as Hillary Clinton as "positive", "negative" or "neutral". Previous work has assumed that either the target is mentioned in the text or that training data for every target is given. This paper considers the more challenging version of this task, where targets are not always mentioned and no training data is available for the test targets. We experiment with conditional LSTM encoding, which builds a representation of the tweet that is dependent on the target, and demonstrate that it outperforms encoding the tweet and the target independently. Performance is improved further when the conditional model is augmented with bidirectional encoding. We evaluate our approach on the SemEval 2016 Task 6 Twitter Stance Detection corpus, achieving performance second best only to a system trained on semi-automatically labelled tweets for the test target. When such weak supervision is added, our approach achieves state-of-the-art results.


Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, Kalina Bontcheva
{i.augenstein | t.rocktaschel}@cs.ucl.ac.uk, {a.vlachos | k.bontcheva}@sheffield.ac.uk

Twitter Stance Detection with Bidirectional Conditional Encoding

Stance Detection
Classify the attitude of a tweet towards a target as "favor", "against" or "none".
  • Tweet: "No more Hillary Clinton"
  • Target: Donald Trump
  • Stance: FAVOR
Training targets: Climate Change is a Real Concern, Feminist Movement, Atheism, Legalization of Abortion, Hillary Clinton. Testing target: Donald Trump.

Challenges
  • Tweet interpretation depends on the target. Solution: a bidirectional conditional model.
  • Labelled data is not available for the test target. Solutions: 1) domain adaptation; 2) weak labelling.

Bidirectional Conditional Encoding
Bidirectional encoding of the tweet conditioned on a bidirectional encoding of the target. The stance is predicted using the last forward and reversed output representations.

Data
  • SemEval 2016 Task 6 Twitter Stance Detection corpus
  • 5 628 labelled training tweets (1 278 about Hillary Clinton, used for dev)
  • 278 013 unlabelled Donald Trump tweets
  • 395 212 additional unlabelled tweets about all targets
  • 707 Donald Trump testing tweets

Experiments
Stance detection models:
  • Sequence-agnostic: BoWV
  • Target-agnostic: tweet-only LSTM encoding (TweetOnly)
  • Target + tweet, no conditioning: concatenated target and tweet LSTM encodings (Concat)
  • Target + tweet, conditioning: target conditioned on tweet (TarCondTweet); tweet conditioned on target (TweetCondTar); bidirectional conditioning (BiCond)
Word embedding training:
  • Random initialisation
  • Fixed word embeddings trained on tweets
  • Word embeddings pre-trained on tweets, then training continued with the supervised objective

Settings
  • Unseen Target: train on Climate Change Is A Real Concern, Feminist Movement, Atheism, Legalization of Abortion and Hillary Clinton tweets; test on Donald Trump tweets.
  • Weakly Supervised: weakly label Donald Trump tweets using hashtags for training; test on Donald Trump tweets.

Results
[Bar charts: Macro F1 (0.30–0.60) of BoW, TweetOnly, Concat, TarCondTweet, TweetCondTar and BiCond for the Unseen Target and Weakly Supervised models]

Comparison Against SOA:

Model                         | Macro F1
SVM-ngrams-comb (official)    | 0.2843
BiCond (Unseen Target)        | 0.4901
pkudblab (Weakly Supervised*) | 0.5628
BiCond (Weakly Supervised)    | 0.5803

Conclusions
  • Learning conditional sequence representations of targets and tweets works better than learning representations without conditioning.
  • State-of-the-art performance with the weakly supervised model.
  • Good target representations can be obtained with unsupervised pre-training.
  • Sequence representation learning is successful even with small data.

Acknowledgements
This work was partially supported by the European Union, grant No. 611233 PHEME (http://www.pheme.eu), and by the Microsoft Research PhD Scholarship Programme.

References
SemEval 2016 Stance Detection: http://alt.qcri.org/semeval2016/task6/
Code: https://github.com/sheffieldnlp/stance-conditional

[Figure 1: Bidirectional encoding of the tweet conditioned on a bidirectional encoding of the target ($[\overrightarrow{c}_3\ \overleftarrow{c}_1]$). The stance is predicted using the last forward and reversed output representations ($[\overrightarrow{h}_9\ \overleftarrow{h}_4]$). Target: "Legalization of Abortion"; tweet: "A foetus has rights too !"]

Paper excerpt (Section 3):

3.1 Independent Encoding

The target and the tweet are encoded independently by two LSTMs:

$$
\begin{aligned}
H &= \begin{bmatrix} x_t \\ h_{t-1} \end{bmatrix} \\
i_t &= \sigma\left(W^{i} H + b^{i}\right) \\
f_t &= \sigma\left(W^{f} H + b^{f}\right) \\
o_t &= \sigma\left(W^{o} H + b^{o}\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W^{c} H + b^{c}\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Here, $x_t$ is an input vector at time step $t$, $c_t$ denotes the LSTM memory, $h_t \in \mathbb{R}^k$ is an output vector, and the remaining weight matrices and biases are trainable parameters.
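To make these definitions concrete, here is a minimal NumPy sketch of a single LSTM step. It is an illustration only: the names (`init_params`, `lstm_step`) and the initialisation scheme are ours, not the paper's or the repository's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d, k, seed=0):
    """Weights W in R^{k x (d+k)} and biases b in R^k for the three gates
    (i, f, o) and the memory candidate (c); small random values are an
    arbitrary illustrative choice."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("i", "f", "o", "c"):
        params["W_" + name] = 0.1 * rng.standard_normal((k, d + k))
        params["b_" + name] = np.zeros(k)
    return params

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step: map input x_t and previous state [h_{t-1} c_{t-1}]
    to the next state [h_t c_t], following the equations above."""
    H = np.concatenate([x_t, h_prev])                      # H = [x_t; h_{t-1}]
    i = sigmoid(p["W_i"] @ H + p["b_i"])                   # input gate
    f = sigmoid(p["W_f"] @ H + p["b_f"])                   # forget gate
    o = sigmoid(p["W_o"] @ H + p["b_o"])                   # output gate
    c = f * c_prev + i * np.tanh(p["W_c"] @ H + p["b_c"])  # memory c_t
    h = o * np.tanh(c)                                     # output h_t in R^k
    return h, c
```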
We concatenate the two output vector representations and classify the stance using the softmax over a non-linear projection

$$\mathrm{softmax}\left(\tanh\left(W^{ta} h_{target} + W^{tw} h_{tweet} + b\right)\right)$$

into the space of the three classes for stance detection, where $W^{ta}, W^{tw} \in \mathbb{R}^{3 \times k}$ are trainable weight matrices and $b \in \mathbb{R}^3$ is a trainable class bias. This model learns target-independent distributed representations for the tweets and relies on the non-linear projection layer to incorporate the target in the stance prediction.

3.2 Conditional Encoding

In order to learn target-dependent tweet representations, we use conditional encoding as previously applied to the task of recognising textual entailment (Rocktäschel et al., 2016). We use one LSTM to encode the target as a fixed-length vector. Then, we encode the tweet with another LSTM, whose state is initialised with the representation of the target. Finally, we use the last output vector of the tweet LSTM to predict the stance of the target-tweet pair.

Formally, let $(x_1, \ldots, x_T)$ be a sequence of target word vectors, $(x_{T+1}, \ldots, x_N)$ be a sequence of tweet word vectors and $[h_0\ c_0]$ be a start state of zeros. The two LSTMs map input vectors and a previous state to a next state as follows:

$$
\begin{aligned}
[h_1\ c_1] &= \mathrm{LSTM}^{target}(x_1, h_0, c_0) \\
&\;\;\vdots \\
[h_T\ c_T] &= \mathrm{LSTM}^{target}(x_T, h_{T-1}, c_{T-1}) \\
[h_{T+1}\ c_{T+1}] &= \mathrm{LSTM}^{tweet}(x_{T+1}, h_0, c_T) \\
&\;\;\vdots \\
[h_N\ c_N] &= \mathrm{LSTM}^{tweet}(x_N, h_{N-1}, c_{N-1})
\end{aligned}
$$

Finally, the stance of the tweet w.r.t. the target is classified using a non-linear projection

$$c = \tanh(W h_N),$$

where $W \in \mathbb{R}^{3 \times k}$ is a trainable weight matrix. This effectively allows the second LSTM to read the tweet in a target-specific manner, which is crucial since the stance of the tweet depends on the target (recall the Donald Trump example above).

3.3 Bidirectional Conditional Encoding

Bidirectional LSTMs (Graves and Schmidhuber, 2005) have been shown to learn improved representations of sequences by encoding a sequence from left to right and from right to left. Therefore, we adapt the conditional encoding model from Section 3.2 to use bidirectional LSTMs, which represent the target and the tweet using two vectors for each of them, one obtained by reading the sequence left-to-right and one obtained by reading it right-to-left.
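Following the equations in Section 3.2, here is a minimal sketch of conditional encoding on top of the `lstm_step` helper above: the tweet LSTM starts from the state $[h_0\ c_T]$, i.e. the target LSTM's final memory with a zeroed output vector. Again, the names and the toy classifier are illustrative assumptions, not the authors' implementation.

```python
STANCES = ["favor", "against", "none"]

def encode_conditional(target_vecs, tweet_vecs, p_target, p_tweet, k):
    """Encode the tweet with LSTM^tweet, whose memory is initialised with
    the final memory c_T of LSTM^target (conditional encoding)."""
    h, c = np.zeros(k), np.zeros(k)          # start state [h_0 c_0] of zeros
    for x in target_vecs:                    # LSTM^target reads the target
        h, c = lstm_step(x, h, c, p_target)
    h = np.zeros(k)                          # restart from [h_0 c_T]
    for x in tweet_vecs:                     # LSTM^tweet reads the tweet,
        h, c = lstm_step(x, h, c, p_tweet)   # conditioned on the target
    return h                                 # last output vector h_N

def predict_stance(h_final, W):
    """Non-linear projection tanh(W h) into the three stance classes."""
    return STANCES[int(np.argmax(np.tanh(W @ h_final)))]
```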

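The bidirectional conditional model (BiCond) of Section 3.3 can then be sketched as two conditional passes, one over the original sequences and one over the reversed ones, with the final outputs concatenated before the class projection, matching Figure 1. This is our reading of the figure under the same illustrative names as above, not the authors' exact code.

```python
def encode_bicond(target_vecs, tweet_vecs, params, k):
    """Concatenate the last forward and reversed output representations
    of two conditional encoders."""
    h_fwd = encode_conditional(target_vecs, tweet_vecs,
                               params["target_fwd"], params["tweet_fwd"], k)
    h_bwd = encode_conditional(target_vecs[::-1], tweet_vecs[::-1],
                               params["target_bwd"], params["tweet_bwd"], k)
    return np.concatenate([h_fwd, h_bwd])    # representation in R^{2k}

# Toy usage with random word vectors (d-dimensional inputs, k hidden units).
d, k = 5, 8
params = {name: init_params(d, k, seed=s) for s, name in
          enumerate(("target_fwd", "tweet_fwd", "target_bwd", "tweet_bwd"))}
W = 0.1 * np.random.default_rng(42).standard_normal((3, 2 * k))  # R^{3 x 2k}
target = [np.ones(d) for _ in range(3)]   # e.g. "Legalization of Abortion"
tweet = [np.ones(d) for _ in range(6)]    # e.g. "A foetus has rights too !"
print(predict_stance(encode_bicond(target, tweet, params, k), W))
```

Training (the supervised objective over the three classes, word-embedding pre-training, etc.) is omitted; the sketch only shows how the representation is assembled.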