Weakly Supervised
Machine Reading
Isabelle Augenstein
University College London
October 2016
What is Machine Reading?
• Automatic reading (i.e. encoding of text)
• Automatic understanding of text
• Useful ingredients for machine reading
• Representation learning
• Structured prediction
• Generating training data
Machine Reading
[Figure: machine reading example — supporting text "RNNs are a popular method for machine reading", question "What is a good method for machine reading?", formalised as the query method_for(MR, XXX)]
Machine Reading Tasks
• Word Representation Learning
• Output: vector for each word
• Learn relations between words, learn to distinguish words from one
another
• Unsupervised objective: word embeddings
• Sequence Representation Learning
• Output: vector for each sentence / paragraph
• Learn how likely a sequence is given a corpus, and what the most
likely next word is given a preceding sequence of words
• Unsupervised objective: unconditional language models, natural
language generation
• Supervised objective: sequence classification tasks
Machine Reading Tasks
• Pairwise Sequence Representation Learning
• Output: vector for pairs of sentences / paragraphs
• Learn how likely a sequence is given another sequence and a
corpus
• Pairs of sequences can be encoded independently or encoded
conditioned on one another
• Unsupervised objective: conditional language models
• Supervised objective: stance detection, knowledge base slot filling,
question answering
Talk Outline
• Learning emoji2vec Embeddings from their Description
– Word representation learning, generating training data
• Numerically Grounded and KB Conditioned Language
Models
– (Conditional) Sequence representation learning
• Stance Detection with Bidirectional Conditional Encoding
– Conditional sequence representation learning, generating training data
Machine Reading: Word Representation
Learning
[Figure: machine reading example — supporting text "RNNs are a popular method for machine reading", question "What is a good method for machine reading?", formalised as the query method_for(MR, XXX)]
emoji2vec
• Emoji use has increased
• Emoji carry sentiment, which could be useful for tasks such as
sentiment analysis
emoji2vec
• Task: learn representations for emojis
• Problem: many emojis are used infrequently, and typical word
representation learning methods (e.g. word2vec) need to see a
token many times to learn a reliable vector
• Solution: learn emoji representations from their descriptions
emoji2vec
• Method: an emoji's embedding is the sum of the word embeddings
of the words in its description
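A minimal sketch of this step, assuming pre-trained word vectors are available as a plain dict (names and usage below are illustrative, not the released emoji2vec code):

```python
import numpy as np

def emoji_embedding(description, word_vectors, dim=300):
    """Sum the pre-trained word vectors of the description words;
    out-of-vocabulary words are simply skipped."""
    vec = np.zeros(dim)
    for word in description.lower().split():
        if word in word_vectors:
            vec += word_vectors[word]
    return vec

# Hypothetical usage with 300-d GoogleNews-style vectors:
# word_vectors = {...}            # dict: word -> np.ndarray of shape (300,)
# fire = emoji_embedding("fire flame hot", word_vectors)
```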
emoji2vec
• Results
– Emoji vectors are useful in addition to GoogleNews vectors on a
sentiment analysis task
– The analogy task also works for emojis
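The analogy test is the usual vector-offset query over the combined word-and-emoji vector space; a minimal sketch (illustrative helper, not from the paper):

```python
import numpy as np

def analogy(a, b, c, vectors):
    """Return the symbol whose vector is most cosine-similar to
    vec(b) - vec(a) + vec(c), excluding the query symbols themselves."""
    query = vectors[b] - vectors[a] + vectors[c]
    best, best_sim = None, -np.inf
    for sym, vec in vectors.items():
        if sym in (a, b, c):
            continue
        sim = (query @ vec) / (np.linalg.norm(query) * np.linalg.norm(vec) + 1e-9)
        if sim > best_sim:
            best, best_sim = sym, sim
    return best
```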
emoji2vec
• Conclusions
– An alternative source for learning representations (descriptions)
is very useful, especially for rare words
Machine Reading: Sequence Representation
Learning (Unsupervised)
[Figure: machine reading example — supporting text "RNNs are a popular method for machine reading", question "What is a good method for machine reading?", formalised as the query method_for(MR, XXX)]
Numerically Grounded + KB Conditioned
Language Models
Semantic Error Correction with Language Models
Numerically Grounded + KB Conditioned
Language Models
• Problem: clinical data contains many numbers, and many of them
are unseen at test time
• Solution: concatenate the RNN input embeddings with numerical
representations of the tokens
• Problem: in addition to the report, clinical data contains an
incomplete and inconsistent KB entry for each patient; how can it
be used?
• Solution: lexicalise the KB and condition the language model on it
(see the sketch below)
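A minimal PyTorch sketch of the two ideas (an illustration under simplifying assumptions, not the paper's exact architecture): numeric grounding concatenates each token embedding with the token's numeric value, and KB conditioning initialises the language model's state from an encoding of the lexicalised KB.

```python
import torch
import torch.nn as nn

class GroundedConditionedLM(nn.Module):
    """Sketch: an LSTM language model whose inputs are numerically
    grounded and whose state is conditioned on a lexicalised KB."""
    def __init__(self, vocab_size, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.kb_encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.lm = nn.LSTM(emb_dim + 1, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, numeric_values, kb_tokens):
        # tokens, kb_tokens: LongTensor (batch, length)
        # numeric_values: FloatTensor (batch, length), 0.0 for non-numeric tokens
        x = torch.cat([self.emb(tokens), numeric_values.unsqueeze(-1)], dim=-1)
        # Condition the LM on the lexicalised KB by reusing the KB
        # encoder's final (h, c) state as the LM's initial state.
        _, kb_state = self.kb_encoder(self.emb(kb_tokens))
        output, _ = self.lm(x, kb_state)
        return self.out(output)  # next-token logits at every position
```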
Numerically Grounded + KB Conditioned
Language Models
Model MAP P R F1
Random 27.75 5.73 10.29 7.36
Base LM 64.37 39.54 64.66 49.07
Cond 62.76 37.46 62.20 46.76
Num 68.21 44.25 71.19 54.58
Cond+Num 69.14 45.36 71.43 55.48
Semantic Error Correction Results
Numerically Grounded + KB Conditioned
Language Models
• Conclusions
– Accounting for out-of-vocabulary (numeric) tokens at test time
increases performance
– Conditioning on the lexicalised KB helps further, even though its
information partly duplicates the report
Machine Reading: Pairwise Sequence
Representation Learning (Supervised)
[Figure: machine reading example — supporting text "RNNs are a popular method for machine reading", question "What is a good method for machine reading?", formalised as the query method_for(MR, XXX)]
Stance Detection with Conditional Encoding
“@realDonaldTrump is the only honest voice of the
@GOP”
• Task: classify the attitude of a text towards a given target as
“positive”, “negative”, or “neutral”
• The example tweet is positive towards Donald Trump, but
(implicitly) negative towards Hillary Clinton
Stance Detection with Conditional Encoding
• Challenges
– Learn a model that interprets a tweet's stance towards a target
that might not be mentioned in the tweet itself
• Solution: a bidirectional conditional model (see the sketch after
this list)
– Learn a model without labelled training data for the target for
which we are predicting the stance
• Solution 1: use training data labelled for other targets (domain
adaptation setting)
• Solution 2: automatically label training data for the target using
a small set of manually defined hashtags (weakly labelled setting)
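A minimal PyTorch sketch of one direction of the conditional encoder (illustrative, not the published code): the target is read by one LSTM and its final state initialises a second LSTM that reads the tweet; the full bidirectional model runs this left-to-right and right-to-left and concatenates the two final tweet states before classification.

```python
import torch
import torch.nn as nn

class ConditionalEncoder(nn.Module):
    """Sketch: encode the target with one LSTM, then read the tweet
    with a second LSTM initialised from the target's final state."""
    def __init__(self, vocab_size, emb_dim=100, hidden=60, n_labels=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.target_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.tweet_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.clf = nn.Linear(hidden, n_labels)  # FAVOR / AGAINST / NONE

    def forward(self, target_ids, tweet_ids):
        _, target_state = self.target_lstm(self.emb(target_ids))
        out, _ = self.tweet_lstm(self.emb(tweet_ids), target_state)
        return self.clf(out[:, -1, :])  # classify the final tweet state
```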
Stance Detection with Conditional Encoding
• Domain Adaptation Setting
– Train on Legalization of Abortion, Atheism, Feminist Movement,
Climate Change is a Real Concern, and Hillary Clinton tweets;
evaluate on Donald Trump tweets
Model    Stance    P        R        F1
Concat   FAVOR     0.3145   0.5270   0.3939
Concat   AGAINST   0.4452   0.4348   0.4399
Concat   Macro F1                    0.4169
BiCond   FAVOR     0.3033   0.5470   0.3902
BiCond   AGAINST   0.6788   0.5216   0.5899
BiCond   Macro F1                    0.4901
Stance Detection with Conditional Encoding
• Weakly Supervised Setting
– Weakly label Donald Trump tweets using hashtags; evaluate on
Donald Trump tweets (see the sketch after the results table)
Model    Stance    P        R        F1
Concat   FAVOR     0.5506   0.5878   0.5686
Concat   AGAINST   0.5794   0.4883   0.5299
Concat   Macro F1                    0.5493
BiCond   FAVOR     0.6268   0.6014   0.6138
BiCond   AGAINST   0.6057   0.4983   0.5468
BiCond   Macro F1                    0.5803
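The weak labelling step above could look like the following sketch; the hashtag lists and helper are illustrative assumptions, not the exact rules used in the paper.

```python
# Illustrative hashtag rules (assumed, not the paper's exact list).
FAVOR_TAGS = {"#makeamericagreatagain", "#trump2016"}
AGAINST_TAGS = {"#nevertrump", "#dumptrump"}

def weak_label(tweet):
    """Assign a stance label from hashtags, or None if no rule fires."""
    tags = {tok.lower() for tok in tweet.split() if tok.startswith("#")}
    if tags & FAVOR_TAGS and not tags & AGAINST_TAGS:
        return "FAVOR"
    if tags & AGAINST_TAGS and not tags & FAVOR_TAGS:
        return "AGAINST"
    return None  # ambiguous or no matching hashtag: leave unlabelled
```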
Stance Detection with Conditional Encoding
• Other findings
– Pre-training word embeddings on a large in-domain corpus with an
unsupervised objective and then continuing to optimise them towards
the supervised objective works well (see the sketch after this list)
• Better than pre-training without further optimisation, random
initialisation, or GoogleNews embeddings
– LSTM encoding of tweets and targets works better than a
sum-of-word-embeddings baseline, despite the small training set
(7k – 14k instances)
– Almost all instances in which the target is mentioned in the tweet
have a non-neutral stance
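The pre-training finding amounts to initialising the embedding layer with vectors trained on unlabelled in-domain tweets and keeping it trainable; a short PyTorch sketch (the random tensor stands in for the pre-trained vectors):

```python
import torch
import torch.nn as nn

# Stand-in for word2vec vectors trained on a large unlabelled tweet corpus.
pretrained = torch.randn(50000, 100)

# freeze=False keeps the embedding weights trainable, so the supervised
# stance objective continues to optimise them during training.
emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
```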
Stance Detection with Conditional Encoding
• Conclusions
– Modelling the relationship between the sentence pair is important
– Automatically labelling in-domain tweets is even more important
– Learning sequence representations is also a good approach for
small datasets
Thank you!
isabelleaugenstein.github.io
i.augenstein@ucl.ac.uk
@IAugenstein
github.com/isabelleaugenstein
References
Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak,
Sebastian Riedel. emoji2vec: Learning Emoji Representations from
their Description. SocialNLP at EMNLP 2016.
https://arxiv.org/abs/1609.08359
Georgios Spithourakis, Isabelle Augenstein, Sebastian Riedel.
Numerically Grounded Language Models for Semantic Error
Correction. EMNLP 2016. https://arxiv.org/abs/1608.04147
Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, Kalina
Bontcheva. Stance Detection with Bidirectional Conditional
Encoding. EMNLP 2016. https://arxiv.org/abs/1606.05464
Collaborators
Kalina Bontcheva, University of Sheffield
Andreas Vlachos, University of Sheffield
George Spithourakis, UCL
Matko Bošnjak, UCL
Sebastian Riedel, UCL
Tim Rocktäschel, UCL
Ben Eisner, Princeton
