Weakly Supervised
Machine Reading
Isabelle Augenstein
University College London
October 2016
What is Machine Reading?
• Automatic reading (i.e. encoding of text)
• Automatic understanding of text
• Useful ingredients for machine reading
• Representation learning
• Structured prediction
• Generating training data
Machine Reading
[Figure: machine reading example — supporting text "RNNs are a popular method for machine reading", question "What is a good method for machine reading?", formalised as the query method_for(MR, XXX)]
Machine Reading Tasks
• Word Representation Learning
• Output: vector for each word
• Learn relations between words, learn to distinguish words from one
another
• Unsupervised objective: word embeddings
• Sequence Representation Learning
• Output: vector for each sentence / paragraph
• Learn how likely a sequence is given a corpus, and what the most
likely next word is given a preceding sequence of words
• Unsupervised objective: unconditional language models, natural
language generation
• Supervised objective: sequence classification tasks
Machine Reading Tasks
• Pairwise Sequence Representation Learning
• Output: vector for pairs of sentences / paragraphs
• Learn how likely a sequence is given another sequence and a
corpus
• Pairs of sequences can be encoded independently or encoded
conditioned on one another
• Unsupervised objective: conditional language models
• Supervised objective: stance detection, knowledge base slot filling,
question answering
Talk Outline
• Learning emoji2vec Embeddings from their Description
– Word representation learning, generating training data
• Numerically Grounded and KB Conditioned Language
Models
– (Conditional) Sequence representation learning
• Stance Detection with Bidirectional Conditional Encoding
– Conditional sequence representation learning, generating training data
Machine Reading: Word Representation
Learning
[Figure: machine reading example — supporting text "RNNs are a popular method for machine reading", question "What is a good method for machine reading?", formalised as the query method_for(MR, XXX)]
emoji2vec
• Emoji use has increased
• Emoji carry sentiment, which could be useful for tasks such as
sentiment analysis
emoji2vec
• Task: learn representations for emojis
• Problem: many emojis are used infrequently, and typical word
representation learning methods (e.g. word2vec) need to see a
token many times to learn a reliable vector
• Solution: learn emoji representations from their descriptions
emoji2vec
• Method: an emoji's embedding is the sum of the word embeddings
of the words in its description
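A minimal sketch of this step, assuming pre-trained word vectors are available as a plain dict (names and usage below are illustrative, not the released emoji2vec code):

```python
import numpy as np

def emoji_embedding(description, word_vectors, dim=300):
    """Sum the pre-trained word vectors of the description words;
    out-of-vocabulary words are simply skipped."""
    vec = np.zeros(dim)
    for word in description.lower().split():
        if word in word_vectors:
            vec += word_vectors[word]
    return vec

# Hypothetical usage with 300-d GoogleNews-style vectors:
# word_vectors = {...}            # dict: word -> np.ndarray of shape (300,)
# fire = emoji_embedding("fire flame hot", word_vectors)
```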
emoji2vec
• Results
– Emoji vectors are useful in addition to GoogleNews vectors on a
sentiment analysis task
– The analogy task also works for emojis
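The analogy test is the usual vector-offset query over the combined word-and-emoji vector space; a minimal sketch (illustrative helper, not from the paper):

```python
import numpy as np

def analogy(a, b, c, vectors):
    """Return the symbol whose vector is most cosine-similar to
    vec(b) - vec(a) + vec(c), excluding the query symbols themselves."""
    query = vectors[b] - vectors[a] + vectors[c]
    best, best_sim = None, -np.inf
    for sym, vec in vectors.items():
        if sym in (a, b, c):
            continue
        sim = (query @ vec) / (np.linalg.norm(query) * np.linalg.norm(vec) + 1e-9)
        if sim > best_sim:
            best, best_sim = sym, sim
    return best
```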
emoji2vec
• Conclusions
– An alternative source for learning representations (descriptions)
is very useful, especially for rare words
Machine Reading: Sequence Representation
Learning (Unsupervised)
[Figure: machine reading example — supporting text "RNNs are a popular method for machine reading", question "What is a good method for machine reading?", formalised as the query method_for(MR, XXX)]
Numerically Grounded + KB Conditioned
Language Models
Semantic Error Correction with Language Models
Numerically Grounded + KB Conditioned
Language Models
• Problem: clinical data contains many numbers, and many of them
are unseen at test time
• Solution: concatenate the RNN input embeddings with numerical
representations of the tokens
• Problem: in addition to the report, clinical data contains an
incomplete and inconsistent KB entry for each patient; how can it
be used?
• Solution: lexicalise the KB and condition the language model on it
(see the sketch below)
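A minimal PyTorch sketch of the two ideas (an illustration under simplifying assumptions, not the paper's exact architecture): numeric grounding concatenates each token embedding with the token's numeric value, and KB conditioning initialises the language model's state from an encoding of the lexicalised KB.

```python
import torch
import torch.nn as nn

class GroundedConditionedLM(nn.Module):
    """Sketch: an LSTM language model whose inputs are numerically
    grounded and whose state is conditioned on a lexicalised KB."""
    def __init__(self, vocab_size, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.kb_encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.lm = nn.LSTM(emb_dim + 1, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, numeric_values, kb_tokens):
        # tokens, kb_tokens: LongTensor (batch, length)
        # numeric_values: FloatTensor (batch, length), 0.0 for non-numeric tokens
        x = torch.cat([self.emb(tokens), numeric_values.unsqueeze(-1)], dim=-1)
        # Condition the LM on the lexicalised KB by reusing the KB
        # encoder's final (h, c) state as the LM's initial state.
        _, kb_state = self.kb_encoder(self.emb(kb_tokens))
        output, _ = self.lm(x, kb_state)
        return self.out(output)  # next-token logits at every position
```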
Numerically Grounded + KB Conditioned
Language Models
Model MAP P R F1
Random 27.75 5.73 10.29 7.36
Base LM 64.37 39.54 64.66 49.07
Cond 62.76 37.46 62.20 46.76
Num 68.21 44.25 71.19 54.58
Cond+Num 69.14 45.36 71.43 55.48
Semantic Error Correction Results
Numerically Grounded + KB Conditioned
Language Models
• Conclusions
– Accounting for out-of-vocabulary (numeric) tokens at test time
increases performance
– Conditioning on the lexicalised KB helps further, even though its
information partly duplicates the report
Machine Reading: Pairwise Sequence
Representation Learning (Supervised)
[Figure: machine reading example — supporting text "RNNs are a popular method for machine reading", question "What is a good method for machine reading?", formalised as the query method_for(MR, XXX)]
Stance Detection with Conditional Encoding
“@realDonaldTrump is the only honest voice of the
@GOP”
• Task: classify the attitude of a text towards a given target as
“positive”, “negative”, or “neutral”
• The example tweet is positive towards Donald Trump, but
(implicitly) negative towards Hillary Clinton
Stance Detection with Conditional Encoding
• Challenges
– Learn a model that interprets a tweet's stance towards a target
that might not be mentioned in the tweet itself
• Solution: a bidirectional conditional model (see the sketch after
this list)
– Learn a model without labelled training data for the target for
which we are predicting the stance
• Solution 1: use training data labelled for other targets (domain
adaptation setting)
• Solution 2: automatically label training data for the target using
a small set of manually defined hashtags (weakly labelled setting)
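A minimal PyTorch sketch of one direction of the conditional encoder (illustrative, not the published code): the target is read by one LSTM and its final state initialises a second LSTM that reads the tweet; the full bidirectional model runs this left-to-right and right-to-left and concatenates the two final tweet states before classification.

```python
import torch
import torch.nn as nn

class ConditionalEncoder(nn.Module):
    """Sketch: encode the target with one LSTM, then read the tweet
    with a second LSTM initialised from the target's final state."""
    def __init__(self, vocab_size, emb_dim=100, hidden=60, n_labels=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.target_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.tweet_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.clf = nn.Linear(hidden, n_labels)  # FAVOR / AGAINST / NONE

    def forward(self, target_ids, tweet_ids):
        _, target_state = self.target_lstm(self.emb(target_ids))
        out, _ = self.tweet_lstm(self.emb(tweet_ids), target_state)
        return self.clf(out[:, -1, :])  # classify the final tweet state
```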
Stance Detection with Conditional Encoding
• Domain Adaptation Setting
– Train on Legalization of Abortion, Atheism, Feminist Movement,
Climate Change is a Real Concern, and Hillary Clinton tweets;
evaluate on Donald Trump tweets
Model    Stance    P        R        F1
Concat   FAVOR     0.3145   0.5270   0.3939
Concat   AGAINST   0.4452   0.4348   0.4399
Concat   Macro F1                    0.4169
BiCond   FAVOR     0.3033   0.5470   0.3902
BiCond   AGAINST   0.6788   0.5216   0.5899
BiCond   Macro F1                    0.4901
Stance Detection with Conditional Encoding
• Weakly Supervised Setting
– Weakly label Donald Trump tweets using hashtags; evaluate on
Donald Trump tweets (see the sketch after the results table)
Model    Stance    P        R        F1
Concat   FAVOR     0.5506   0.5878   0.5686
Concat   AGAINST   0.5794   0.4883   0.5299
Concat   Macro F1                    0.5493
BiCond   FAVOR     0.6268   0.6014   0.6138
BiCond   AGAINST   0.6057   0.4983   0.5468
BiCond   Macro F1                    0.5803
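The weak labelling step above could look like the following sketch; the hashtag lists and helper are illustrative assumptions, not the exact rules used in the paper.

```python
# Illustrative hashtag rules (assumed, not the paper's exact list).
FAVOR_TAGS = {"#makeamericagreatagain", "#trump2016"}
AGAINST_TAGS = {"#nevertrump", "#dumptrump"}

def weak_label(tweet):
    """Assign a stance label from hashtags, or None if no rule fires."""
    tags = {tok.lower() for tok in tweet.split() if tok.startswith("#")}
    if tags & FAVOR_TAGS and not tags & AGAINST_TAGS:
        return "FAVOR"
    if tags & AGAINST_TAGS and not tags & FAVOR_TAGS:
        return "AGAINST"
    return None  # ambiguous or no matching hashtag: leave unlabelled
```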
Stance Detection with Conditional Encoding
• Other findings
– Pre-training word embeddings on a large in-domain corpus with an
unsupervised objective and then continuing to optimise them towards
the supervised objective works well (see the sketch after this list)
• Better than pre-training without further optimisation, random
initialisation, or GoogleNews embeddings
– LSTM encoding of tweets and targets works better than a
sum-of-word-embeddings baseline, despite the small training set
(7k – 14k instances)
– Almost all instances in which the target is mentioned in the tweet
have a non-neutral stance
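The pre-training finding amounts to initialising the embedding layer with vectors trained on unlabelled in-domain tweets and keeping it trainable; a short PyTorch sketch (the random tensor stands in for the pre-trained vectors):

```python
import torch
import torch.nn as nn

# Stand-in for word2vec vectors trained on a large unlabelled tweet corpus.
pretrained = torch.randn(50000, 100)

# freeze=False keeps the embedding weights trainable, so the supervised
# stance objective continues to optimise them during training.
emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
```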
Stance Detection with Conditional Encoding
• Conclusions
– Modelling the relationship between the sentence pair is important
– Automatically labelling in-domain tweets is even more important
– Learning sequence representations is also a good approach for
small datasets
Thank you!
isabelleaugenstein.github.io
i.augenstein@ucl.ac.uk
@IAugenstein
github.com/isabelleaugenstein
References
Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak,
Sebastian Riedel. emoji2vec: Learning Emoji Representations from
their Description. SocialNLP at EMNLP 2016.
https://arxiv.org/abs/1609.08359
Georgios Spithourakis, Isabelle Augenstein, Sebastian Riedel.
Numerically Grounded Language Models for Semantic Error
Correction. EMNLP 2016. https://arxiv.org/abs/1608.04147
Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, Kalina
Bontcheva. Stance Detection with Bidirectional Conditional
Encoding. EMNLP 2016. https://arxiv.org/abs/1606.05464
Collaborators
Kalina Bontcheva, University of Sheffield
Andreas Vlachos, University of Sheffield
George Spithourakis, UCL
Matko Bošnjak, UCL
Sebastian Riedel, UCL
Tim Rocktäschel, UCL
Ben Eisner, Princeton
