AN EXPLORATION OF NON-LABEL-PRESERVING DATA AUGMENTATIONS FOR ACTIVE LEARNING
1. AN EXPLORATION OF NON-LABEL-PRESERVING DATA AUGMENTATIONS
Jonathan Zarecki
To appear in IJCAI 2020 as "Textual Membership Queries"
2. About me
◦ Jonathan Zarecki
◦ MSc in ML & Active Learning with Prof. Shaul Markovitch (Technion)
◦ Currently pursuing a PhD in CS with Prof. Gal Chechik (BIU & Nvidia)
3. Overview
◦ Potential problems of traditional data augmentations in text
◦ Quick overview of active-learning
◦ Definition of new textual modification operators
◦ Applying heuristic-search with modification operators for active-learning
◦ Empirical evaluation of this method on several datasets
5. Data Augmentations (quickly) – Now with text
EDA – Wei & Zou (EMNLP 19) applies four operations to a sentence such as "Batman is really awesome":
◦ Random deletion → "is really awesome"
◦ Random insertion → "Batman is really not awesome"
◦ Random swap → "Awesome is really Batman"
◦ Synonym replacement → "Batman is really great"
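The four EDA operations can be sketched on token lists. The following is a minimal illustration (not the EDA authors' code), with positions hand-picked to reproduce the slide's examples rather than chosen randomly:

```python
import random

def random_deletion(tokens, i):
    """Drop the token at position i (EDA: random deletion)."""
    return tokens[:i] + tokens[i + 1:]

def random_insertion(tokens, word, i):
    """Insert a word at position i (EDA: random insertion)."""
    return tokens[:i] + [word] + tokens[i:]

def random_swap(tokens, i, j):
    """Swap the tokens at positions i and j (EDA: random swap)."""
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return out

def synonym_replacement(tokens, i, synonyms):
    """Replace the token at position i with a random synonym."""
    out = list(tokens)
    out[i] = random.choice(synonyms)
    return out

sent = "Batman is really awesome".split()
print(random_deletion(sent, 0))                 # ['is', 'really', 'awesome']
print(random_insertion(sent, "not", 3))         # ['Batman', 'is', 'really', 'not', 'awesome']
print(random_swap(sent, 0, 3))                  # ['awesome', 'is', 'really', 'Batman']
print(synonym_replacement(sent, 3, ["great"]))  # ['Batman', 'is', 'really', 'great']
```

In real EDA the positions and synonyms are sampled at random, which is exactly what makes validity hard to guarantee, as the next slide shows.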
6. Data Augmentations (quickly) – Now with text
◦ In textual augmentations it is not always trivial to keep the sentence valid or readable.
(Same EDA examples as the previous slide: random deletion, random insertion, random swap, synonym replacement. EDA – Wei & Zou, EMNLP 19.)
7. Non-Restrictive Textual Augmentations
◦ What happens if we let loose and apply any augmentation we want?
"My favorite movie so far"
→ (add "computer") "My computer favorite movie so far"
→ (remove "far") "My computer favorite movie so"
10. Non-Restrictive Textual Augmentations
◦ But let's leave unreadable sentences aside.
◦ Another important property of more expressive augmentations is that the label might change!
"Batman is really awesome" → "Batman is really bad"
11. Non-Label-Preserving (LP) Augmentations
We want augmentations which will:
1. Change the sentence's meaning significantly
2. Keep the sentence fully readable
(Somewhat) unlike image augmentations.
Using more expressive textual augmentations risks making the resulting sentence gibberish or completely changing its label.
Since we do not know a generated example's label, we arrive at the field of active learning.
12. Overview (progress marker; current item: potential problems of traditional data augmentations in text)
13. Overview (progress marker; current item: quick overview of active-learning)
22. Overview (progress marker; current item: quick overview of active-learning)
23. Overview (progress marker; current item: definition of new textual modification operators)
24. Why are textual modifications hard?
◦ When sentences are not built carefully they can easily become unreadable:
◦ Sentences have to comply with syntactic rules.
◦ But also with semantic rules.
"Took I the dog to" – does not comply with syntactic rules.
"I ate a book for breakfast" – does not "make sense".
25. Modification Operators Definition
◦ First we find all "replaceable words" in the sentence:
◦ Nouns, verbs & adjectives
◦ For each replaceable word we look at the knowledge-base and find words to replace it.
◦ All options returned are the modification operators for a given sentence.
Example – "I hate all the cats":
◦ hate → despise, adore, dislike, detest
◦ cats → dogs, wolves, lions, pigs
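The operator definition above can be sketched as follows. The toy dictionary stands in for the knowledge-base (the talk uses Dependency Word2vec), and the helper names are illustrative, not from the paper:

```python
# Toy knowledge-base mapping replaceable words (nouns, verbs, adjectives)
# to functionally similar candidates.  In the talk this role is played by
# Dependency Word2vec; the dict here is only illustrative.
KB = {
    "hate": ["despise", "adore", "dislike", "detest"],
    "cats": ["dogs", "wolves", "lions", "pigs"],
}

def modification_operators(tokens):
    """Enumerate all single-word-replacement operators for a sentence.

    Each operator is a (position, replacement) pair; applying it yields
    a new, fully readable sentence whose label may have changed.
    """
    return [(i, rep)
            for i, tok in enumerate(tokens)
            for rep in KB.get(tok.lower(), [])]

def apply_operator(tokens, op):
    """Apply one (position, replacement) operator to a token list."""
    i, rep = op
    return tokens[:i] + [rep] + tokens[i + 1:]

sent = "I hate all the cats".split()
ops = modification_operators(sent)
print(len(ops))                                   # 8: four for "hate", four for "cats"
print(" ".join(apply_operator(sent, ops[0])))     # I despise all the cats
```

A real implementation would first POS-tag the sentence to restrict replacements to nouns, verbs and adjectives; the dictionary lookup here skips that step for brevity.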
26. So how does it look?
Starting from "I hate all the cats" (hate speech), the operators produce:
◦ "I hate all the dogs"
◦ "I despise all the cats"
◦ "I adore all the cats"
◦ "I adore all the dogs" (non-hate speech – the label changed)
27. Semantic Knowledge-bases
◦ In order for our modification operators to work we need to find meaningful replacements.
◦ Replacements should be functionally similar – behave the same as the replaced word.
◦ We need a knowledge-base where we can find such words.
◦ Options include word2vec, WordNet and more.
◦ We chose Dependency Word2vec (Levy & Goldberg, ACL 2014) as our knowledge-base.
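Whatever knowledge-base is chosen, candidate replacements are ranked by vector similarity. A minimal sketch with hand-made toy vectors (a real system would load pretrained Dependency Word2vec embeddings from disk; the vectors and function names here are assumptions for illustration):

```python
import math

# Toy 3-d embeddings standing in for a pretrained model such as
# Dependency Word2vec.
VECS = {
    "hate":    (1.0, 0.1, 0.0),
    "despise": (0.9, 0.2, 0.1),
    "adore":   (0.8, 0.3, 0.2),
    "table":   (0.0, 1.0, 0.9),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest(word, k=2):
    """Return the k words most similar to `word` in the knowledge-base."""
    q = VECS[word]
    cands = [(w, cosine(q, v)) for w, v in VECS.items() if w != word]
    return [w for w, _ in sorted(cands, key=lambda p: -p[1])[:k]]

print(nearest("hate"))  # ['despise', 'adore'] – the unrelated "table" is ranked out
```

The whole point of preferring Dependency Word2vec is that its nearest neighbours tend to be functionally similar (drop-in replacements) rather than merely topically related, as the next slides show.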
28. Qualitative analysis of the knowledge-bases
Dependency Word2vec (Levy & Goldberg 2014) introduces a subtle change in the word2vec context.
Example sentence: "Australian scientist discovers star with telescope"
◦ Word2vec: contexts are the linearly adjacent words.
◦ Dependency word2vec: contexts are the words connected by dependency arcs (nsubj, dobj, prep_with).
29. Qualitative analysis of the knowledge-bases
Functional similarity is exhibited very well in Dependency Word2vec. Nearest neighbours of "hogwarts":
◦ w2v: dumbledore, hallows, half-blood, malfoy, snape (related to Harry Potter)
◦ Dep w2v: sunnydale, collinwood, calarts, greendale, millfield (schools)
30. Full example of modification operators
"Batman is really awesome":
◦ Batman → superman, superboy, supergirl, catwoman, aquaman
◦ awesome → terrific, marvelous, wonderful, lousy, awful
Further analysis of 4 different knowledge-bases can be found in the full paper.
31. Overview (progress marker; current item: definition of new textual modification operators)
32. Overview (progress marker; current item: applying heuristic-search with modification operators for active-learning)
33. Stochastic Synthesis Algorithm
◦ A simple way to use the operators is to apply them randomly.
◦ Until enough instances have been generated:
1. Randomly choose an instance from the available examples
2. Apply a random operator to it
3. Return it as a new MQ (membership query)
(Diagram: each example 𝜙ᵢ branches into modified examples 𝜙ᵢ¹, 𝜙ᵢ², 𝜙ᵢ³.)
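The three steps above can be sketched directly. This is a schematic of the stochastic approach, not the paper's implementation; `pool` and `operators_for` stand in for the labeled examples and the operator enumeration from the previous slides:

```python
import random

def stochastic_synthesis(pool, operators_for, n, rng=random):
    """Stochastic MQ synthesis: generate n membership queries by
    repeatedly applying a random operator to a random example."""
    queries = []
    while len(queries) < n:
        sent = rng.choice(pool)               # 1. random instance
        op = rng.choice(operators_for(sent))  # 2. random operator
        queries.append(op(sent))              # 3. return as a new MQ
    return queries

# Toy usage: a single operator that appends an exclamation mark.
pool = ["Batman is really awesome", "I hate all the cats"]
ops = lambda s: [lambda t: t + " !"]
print(stochastic_synthesis(pool, ops, 3))
```

Because every choice is uniform, nothing steers generation toward informative examples; that is the gap the search-based variant below closes.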
34. Using search algorithms to generate examples
◦ Repeatedly applying these operators gives us many options.
◦ Using search algorithms we can actively look for the most informative examples.
◦ But how do we direct the search?
(Diagram: a search tree – applying operators to 𝜙₁ yields 𝜙₂, applying operators to 𝜙₂ yields 𝜙₃, and so on.)
37. Search Heuristic Function
◦ To direct the search we need a function that gives a higher score to more informative instances.
◦ We used existing active-learning functions:
◦ Uncertainty sampling (Lewis & Gale, 1994)
◦ Expected model change (Lindenbaum, Markovitch, & Rusakov, 2004)
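Uncertainty sampling can be written as an entropy score over the model's predicted class probabilities. A minimal sketch (one common formulation of the idea, not necessarily the paper's exact scoring):

```python
import math

def uncertainty(probs):
    """Entropy-based uncertainty sampling score: higher means the model
    is less sure about the instance, so labeling it is more informative."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# A 50/50 prediction is maximally informative; a confident one is not.
print(uncertainty([0.5, 0.5]) > uncertainty([0.99, 0.01]))  # True
print(uncertainty([1.0, 0.0]) == 0.0)  # True – a certain prediction has zero entropy
```

Plugged into a search, this function scores every candidate sentence produced by the modification operators.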
38. Heuristic-Search Generation
◦ Similar to the stochastic approach, but apply a search algorithm to pick the best example.
◦ Until enough instances have been generated:
1. Randomly choose an instance from the available examples
2. Run a heuristic search on that instance
3. Return the result as a new example
(Diagram: uncertainty sampling directs the search from 𝜙₁ through 𝜙₂ to 𝜙₃.)
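The hill-climbing variant of step 2 can be sketched generically. This is a schematic of greedy local search under the stated heuristic, not the paper's code; the toy usage climbs on integers purely to keep the example self-contained:

```python
def hill_climb(start, neighbors, score, max_steps=10):
    """Greedy hill-climbing: repeatedly move to the highest-scoring
    neighbor until no single modification improves the heuristic.

    `neighbors(s)` enumerates states one modification operator away and
    `score` is the active-learning heuristic (e.g. uncertainty sampling).
    """
    current = start
    for _ in range(max_steps):
        cands = neighbors(current)
        if not cands:
            break
        best = max(cands, key=score)
        if score(best) <= score(current):
            break  # local optimum: no operator increases informativeness
        current = best
    return current

# Toy usage on integers: the heuristic peaks at 7, so a climb from 3
# walks upward one step at a time and stops at the peak.
print(hill_climb(3, lambda n: [n - 1, n + 1], lambda n: -abs(n - 7)))  # 7
```

In the talk's setting, `neighbors` is the operator enumeration from slide 25 and `score` is uncertainty sampling; beam search replaces the single `current` state with the k best candidates at each step.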
39. Heuristic-Search Generation
(Same procedure as the previous slide; the diagram advances to the next instance, with uncertainty sampling directing a fresh search from 𝜙′₁ through 𝜙′₂ to 𝜙′₃.)
40. Overview (progress marker; current item: applying heuristic-search with modification operators for active-learning)
41. Overview (progress marker; current item: empirical evaluation of this method on several datasets)
42. Sentence Quality – Human Evaluation
◦ We already discussed how hard generating a readable sentence can be – do these operators cope with that?
◦ We randomly chose 1000 sentences from each category and asked: "Is this sentence fully readable to you?"
◦ Original sentences: 96% answered yes
◦ HS (heuristic-search) sentences: 95% answered yes
◦ Wikipedia LSTM sentences: 21% answered yes
46. Datasets
◦ Sentiment Analysis:
◦ CMR: Cornell sentiment polarity dataset
◦ SST: Stanford sentiment treebank, a sentence sentiment analysis dataset
◦ KS: A Kaggle short sentence sentiment analysis dataset
◦ Subjectivity/Objectivity Detection
◦ SUBJ: Cornell sentence subjective / objective dataset
◦ Offensive-language and Hate-speech Detection:
◦ HS: Hate speech and offensive language classification dataset
47. Compared Methods
◦ Our methods:
◦ Uncertainty sampling Hill-climbing MQ synthesis (US-HC-MQ)
◦ Uncertainty sampling Beam-search MQ synthesis (US-BS-MQ)
◦ Stochastic Synthesis (S-MQ)
◦ Competitor methods:
◦ WordNet-based Synonym-replacement (WNA) (Lecun et al. 2016)
◦ Original examples (IDEAL)
◦ LSTM Generator (RNN) – pretrained on English Wikipedia (uses unlabeled data)
48. Results – Experiment 1
• We can see that our methods consistently improved the initial accuracy.
• The search-based methods are superior on almost all datasets.
53. What did we see?
◦ Potential problems of non-restrictive augmentations in text
◦ Definition of new modification operators (= non-label-preserving augmentations) in the textual domain
◦ Using heuristic-search with modification operators for generating new examples for active-learning
◦ Empirical evaluation of this method on several datasets
54. Thank You! Questions?
(The closing slide applies the operators to a "thanks-for-coming" sentence, "I want to thank you for coming!":
◦ "I want to thank you for arriving!"
◦ "I would-like to thank you for coming!"
◦ "I want to condemn you for coming!"
◦ "I want to thank you for going!")
Editor's Notes
Hello everyone, my name is Jonathan and today I'll take you through a tour of non-label-preserving augmentations: what label-preserving means and why we might not want it, their pros and cons, and my personal work (to appear in IJCAI20) on defining such operators in the textual domain.
Now lets get started
Traditional = מסורתי (Hebrew vocabulary reminder for the speaker)
But we’ll never see augmentations that change this butterfly image to…
Let’s get back to text
Punchline, augmentation in NLP is not trivial like in CV
So the only real “augmentation” from EDA is the synonym replacement.
Even easier to make unreadable sentences
Guaranteed for those who click on that link
If we start with
The label is totally different !
Next slide is the summary
Main motivation slide. Iron out message
Let’s do a quick overview of AL, as we’ll revisit the subject during the rest of the talk
That’s it for the introduction
The experiment was repeated 20 times for statistical significance.
Semantic rules = ‘make sense’
So how did we build the operators in the instance space.
For these reasons we have to make sure that our operators keep the resulting sentences legal English sentences.
What’s a knowledge-base ? We’ll get to that later
In order to find
Dep w2v was designed to exhibit these two properties and this example shows it well.
Where w2v returned words with related meanings, dep w2v returned other scientists.
This property is repeated in many cases; more details can be found in the original dep w2v paper (highly recommended).
In word2vec for example we will get topically related words such as “dc comics” for batman
Next slide is search algs’
Now lets look at how its done
I left notes to relevant papers in the
Next slide is empirical evaluation
The RNN wasn't able to generate proper sentences with only 10 training instances, e.g.:
"I am movie ."
"x this film ."
Original AL setup
The experiment was repeated 20 times for statistical significance.
We test our framework on 5 datasets: 3 sentiment analysis datasets, one subjectivity/objectivity dataset and one hate-speech and offensive language detection dataset.
Objective: "the movie begins in the past where a young boy named sam . . . (attempts to save celebi from a hunter.)"
Subjective: "I really liked the movie it was a-lot of fun"
We compared 2 search-based methods, one using hill-climbing as the search algorithm and another using beam-search. Both used uncertainty sampling as their heuristic.
As there are no other works that perform textual MQs, we chose 3 competitors that do similar augmentation/generation to compare with. First, we used an "upper-limit" method we call IDEAL; this method picks the most informative original examples from the pool.
Show with pointer what I’m talking about
As we can see the search based-methods (blue, red) are superior in almost all datasets.
Another interesting point we can see in the graphs is the squeezing of information I talked about in the example. Initially there is a lot of information to extract from the existing examples and we see high accuracy gains, but after adding a few examples most information has already been extracted and the plot stops rising (converges). This is a good sign showing that our initial intuition still holds in this case.
The experiment was repeated 20 times for statistical significance.
We can see a clear hierarchy here between uncertainty HC, random HC and S-MQ
This reinforces our hypothesis that using the more sophisticated approaches result in better instances. At the very least it results in more label changes.
The idea is very general, and might be useful in other domains where even unlabeled data is scarce.
I wanted to thank everyone who came today for the support along this far-from-simple journey. I especially wanted to thank my professor, Shaul Markovitch, who gave me amazing freedom in this work, supported me all the way, and even let me travel to South America for 3 months during my studies, something decidedly uncommon here in the faculty. So thank you all, and thank you Shaul.