Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems

Paper information
1
• Title
ü Text Processing Like Humans Do: Visually Attacking and
Shielding NLP Systems
• URL
ü https://aclweb.org/anthology/papers/N/N19/N19-1165/
• Author
ü Steffen Eger, Gözde Gül Şahin, Andreas Rücklé, Ji-Ung Lee,
Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar,
Edwin Simpson, Iryna Gurevych
• Conference
ü NAACL2019

Background: visual perturbations to text
2
• Visual perturbations to text are often used to
obfuscate offensive comments in social media
• Those perturbations are considered as a new type of
adversarial attack in NLP
1 4M JUST GO1NG TO K1LL YOU ƒv¢K !!
You are f**ck!ng !d!0t
Adversarial attack:
Make modifications to an input to fool the system, while
the original meaning is still understood by humans

Background:
Advantages of visual perturbations
3
1. They do not require any linguistic knowledge beyond the
character level
2. They are less damaging to human perception than syntax
errors or the insertion of nagations
3. They do not require knowledge of the attacked model
In summary, visual perturbations are easily
applicable to any languages, domains and tasks
Perturbed: 1 4M JUST GO1NG TO K1LL YOU ƒv¢K !!
⇅
Raw: I AM JUST GOING TO KILL YOU FUCK !!

Summary of this paper:
4
• Develop three methods for visual perturbations
• Confirm that humans are robust to visual perturbations
• Confirm that the performance of SOTA NLP models
drops when attacked by visual perturbations
• Develop three methods to shield from visual attacks

5

Proposed visual perturbations
6
Proposed methods perturb input sentences by
replacing each character randomly based on:
• Image-based character embedding (ICES)
• Description-based character embedding (DCES)
• Easy-character embedding (ECES)

7
Image-based character embedding (ICES)
ü retrieve a 24*24 image of the character and convert it into
576 dimensional embedding vector
ü replace characters of the input sentences by their nearest
neighbors in the embedding space
Proposed visual perturbations:
Image-based
c
ć
Ҫ
ą
ă
a
embedding
space

8
Description-based character embedding (DCES)
ü retrieve the description of each Unicode character
ü replace characters by other ones whose description shares
many of the words of the target description
a - latin small letter “a”
à - latin small letter “a” with grave
description:
replace
Descriptions-based

9
Easy-character-based character embedding (ECES)
ü replace characters of the input sentences by manually
defined characters (targets are 52 characters: a-zA-Z)
a → â
b → ḃ
c → ĉ
:
rule: replace
Easy-character-based
manually
defined

• Ten nearest neighbors in different character spaces
• Examples of perturbed and original sentences
Easy-character-based
10
ECES-0.8
flipping probability of perturbations

11
• Confirm that humans are robust to visual
perturbations

12
To evaluate human performances, asked annotators to
recover the original sentences given perturbed text
ü calculate error rate by measuring the normalized edit distance
between the recovered sentence and the original one
Human annotation experiment against
visual perturbation
Flipping probability p
Errorratein%
Humans are very good at understanding visual perturbationsbetter
ECES

13

14
Evaluate the capabilities of SOTA NLP models for
below tasks to deal with visual attacks (by DCES)
• POS tagging (POS)
• Chunking (Chunk)
ü Dataset: CoNLL 2000
ü Model: Bi-LSTM with ELMo
• Grapheme-to-phoneme (G2P)
ü Dataset: Combilex pronunciation of American English
ü Model: Bi-LSTM
• Toxic comment classification (TC)
ü Dataset: Kaggle dataset
ü Model: Feed-forward network with ELMo
Computational experiment against
visual perturbation: settings

15
Show the relative performance s*(p) compared to
the performance of no perturbations s(0)
Computational experiment against
visual perturbation (no shielding)
better
= s*(p)
All systems degrade considerably compared
to the systems with no perturbations
p

16

17
Develop three shielding methods against visual attacks
• Adversarial training (AT)
ü Replace original training examples by perturbed data
• Visual character embedding (CE)
ü Use fixed ICEs to initialize the embeddings of the models
• Rule-based recovery (RBR)
ü Replace each non-standard character in the input with its
nearest standard neighbor in ICES (a-zA-Z + punctuation)
Proposed shielding methods against
visual perturbations

18
Show the performance improvements Δ between
shielding treatments σ(p)/s(0) and original scores s*(p)
visual perturbations: results
better
ΔAT
ΔCE
ΔAT+CE
ΔRBR
p p
= (p)/s(0) s(p)/s(0)<latexit sha1_base64="y0dGgyoE1b/V7cP6jkz5sSEf7iE=">AAAChnichVFNS1tBFD0+q8b4FXUjuEkNSlyY3ieKRSiE6qJLv6KCkfDe6xgH3xdvJgEbui70D7hwVUFEpNv6A9z0D3ThTyguFdy48ObliVip3mFmzpy5586ZGTt0pdJEl21G+5uOzq5Ud7qnt69/IDM4tK6CWuSIkhO4QbRpW0q40hclLbUrNsNIWJ7tig17b6G5v1EXkZKBv6b3Q7HtWVVf7kjH0kxVMm/Li8LV1oeyklXPyoeT71SeJrNTWfWAK5kcFSiO7HNgJiCHJJaCzAnK+IwADmrwIOBDM3ZhQXHbgglCyNw2GsxFjGS8L/AVadbWOEtwhsXsHo9VXm0lrM/rZk0Vqx0+xeUesTKLcfpDp3RNv+mM/tLdf2s14hpNL/s82y2tCCsD30dWb19VeTxr7D6qXvSssYP3sVfJ3sOYad7CaenrXw6uV+dXxhsTdERX7P8HXdIF38Cv3zjHy2LlEGn+APPf534O1qcLJhXM5Zlc8WPyFSmMYgx5fu85FPEJSyjxud/wE79wbqSMgjFrzLVSjbZEM4wnYRTvAfXek4o=</latexit><latexit sha1_base64="y0dGgyoE1b/V7cP6jkz5sSEf7iE=">AAAChnichVFNS1tBFD0+q8b4FXUjuEkNSlyY3ieKRSiE6qJLv6KCkfDe6xgH3xdvJgEbui70D7hwVUFEpNv6A9z0D3ThTyguFdy48ObliVip3mFmzpy5586ZGTt0pdJEl21G+5uOzq5Ud7qnt69/IDM4tK6CWuSIkhO4QbRpW0q40hclLbUrNsNIWJ7tig17b6G5v1EXkZKBv6b3Q7HtWVVf7kjH0kxVMm/Li8LV1oeyklXPyoeT71SeJrNTWfWAK5kcFSiO7HNgJiCHJJaCzAnK+IwADmrwIOBDM3ZhQXHbgglCyNw2GsxFjGS8L/AVadbWOEtwhsXsHo9VXm0lrM/rZk0Vqx0+xeUesTKLcfpDp3RNv+mM/tLdf2s14hpNL/s82y2tCCsD30dWb19VeTxr7D6qXvSssYP3sVfJ3sOYad7CaenrXw6uV+dXxhsTdERX7P8HXdIF38Cv3zjHy2LlEGn+APPf534O1qcLJhXM5Zlc8WPyFSmMYgx5fu85FPEJSyjxud/wE79wbqSMgjFrzLVSjbZEM4wnYRTvAfXek4o=</latexit><latexit sha1_base64="y0dGgyoE1b/V7cP6jkz5sSEf7iE=">AAAChnichVFNS1tBFD0+q8b4FXUjuEkNSlyY3ieKRSiE6qJLv6KCkfDe6xgH3xdvJgEbui70D7hwVUFEpNv6A9z0D3ThTyguFdy48ObliVip3mFmzpy5586ZGTt0pdJEl21G+5uOzq5Ud7qnt69/IDM4tK6CWuSIkhO4QbRpW0q40hclLbUrNsNIWJ7tig17b6G5v1EXkZKBv6b3Q7HtWVVf7kjH0kxVMm/Li8LV1oeyklXPyoeT71SeJrNTWfWAK5kcFSiO7HNgJiCHJJaCzAnK+IwADmrwIOBDM3ZhQXHbgglCyNw2GsxFjGS8L/AVadbWOEtwhsXsHo9VXm0lrM/rZk0Vqx0+xeUesTKLcfpDp3RNv+mM/tLdf2s14hpNL/s82y2tCCsD30dWb19VeTxr7D6qXvSssYP3sVfJ3sOYad7CaenrXw6uV+dXxhsTdERX7P8HXdIF38Cv3zjHy2LlEGn+APPf534O1qcLJhXM5Zlc8WPyFSmMYgx5fu85FPEJSyjxud/wE79wbqSMgjFrzLVSjbZEM4wnYRTvAfXek4o=</latexit><latexit sha1_base64="y0dGgyoE1b/V7cP6jkz5sSEf7iE=">AAAChnichVFNS1tBFD0+q8b4FXUjuEkNSlyY3ieKRSiE6qJLv6KCkfDe6xgH3xdvJgEbui70D7hwVUFEpNv6A9z0D3ThTyguFdy48ObliVip3mFmzpy5586ZGTt0pdJEl21G+5uOzq5Ud7qnt69/IDM4tK6CWuSIkhO4QbRpW0q40hclLbUrNsNIWJ7tig17b6G5v1EXkZKBv6b3Q7HtWVVf7kjH0kxVMm/Li8LV1oeyklXPyoeT71SeJrNTWfWAK5kcFSiO7HNgJiCHJJaCzAnK+IwADmrwIOBDM3ZhQXHbgglCyNw2GsxFjGS8L/AVadbWOEtwhsXsHo9VXm0lrM/rZk0Vqx0+xeUesTKLcfpDp3RNv+mM/tLdf2s14hpNL/s82y2tCCsD30dWb19VeTxr7D6qXvSssYP3sVfJ3sOYad7CaenrXw6uV+dXxhsTdERX7P8HXdIF38Cv3zjHy2LlEGn+APPf534O1qcLJhXM5Zlc8WPyFSmMYgx5fu85FPEJSyjxud/wE79wbqSMgjFrzLVSjbZEM4wnYRTvAfXek4o=</latexit>

19
better
ΔAT
ΔCE
ΔAT+CE
ΔRBR
p p
All tasks other than G2P profit from AT
• AT did not perform well on G2P because missing
tokens are more problematic than other tasks

20
better
ΔAT
ΔCE
ΔAT+CE
ΔRBR
p p
TC and G2P profit from CE
• CE can restore tokens from those neighborhoods in the
embedding space
• CE did not perform well on POS and Chunk because ELMo
might weaken the effect of CE

21
better
ΔAT
ΔCE
ΔAT+CE
ΔRBR
p p
All tasks profit from AT with CE
• The combination of them can boost the effect of each other

22
better
ΔAT
ΔCE
ΔAT+CE
ΔRBR
p p
All tasks profit from RBR lower than AT + CE
• RBR may incorrectly replace input tokens that affect the
performances

23
Show examples of the prediction in TC (flipping prob. = 0.1)
visual perturbations: example
ECES
DCES
ECES
DCES
• Perturbing specific words reduces the score of a non-shielded
approach, while perturbing useless words like ‘he’ has little effect
Answer Prediction

24
Show examples of the prediction in TC (flipping prob. = 0.1)
visual perturbations: example
ECES
DCES
ECES
DCES
• Perturbing specific words reduces the score of a non-shielded
approach, while perturbing useless words like ‘he’ has little effect
• Overall, all the shielding approaches help in various degrees
Answer Prediction

25

Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems

Recommended

Recommended

More Related Content

Similar to Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems

Similar to Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems (20)

Recently uploaded

Recently uploaded (20)

Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems