Bae bert based adversarial examples for text classification

taeseon ryu

Deep learning paper study 91st - BAE

BAE: BERT-based Adversarial Examples for Text Classification
신동진, 박희수, 황소현

Adversarial Example
Explaining and Harnessing Adversarial Examples, ICLR '15

Adversarial Attack in NLP
Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment, AAAI '20

Prior works
- Character-level errors, adding / deleting words
- Unnatural
- Grammatical errors
- Rule-based synonym replacement
- TextFooler (AAAI '20)
- Synonym replacement
- Relies only on token-level similarity
- e.g. The restaurant service was poor
→ broke : inappropriate
→ terrible : context-aware
- Need more context-awareness
⇒ use BERT-MLM (Masked Language Model)

Problem Definition
- Given:
- S: the input sentence
- C: target model
- y: ground-truth class of S
- Goal: generate S_adv
- Semantic similarity: S_adv ~ S
- Misclassification: C(S_adv) != y
- This film offers many delights and surprises ⇒ C(S) = Positive
→ This movie offers enough pleasures and surprises ⇒ C(S_adv) = Negative
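
The two conditions above can be read as a single predicate on a candidate sentence. The following is a minimal sketch, not the method from the paper: `classify` and `similarity` are hypothetical stand-ins for C and the similarity measure, and the threshold value is illustrative only.

```python
from typing import Callable


def is_adversarial(
    s: str,
    s_adv: str,
    y: str,
    classify: Callable[[str], str],
    similarity: Callable[[str, str], float],
    min_similarity: float = 0.8,  # illustrative threshold, not from the slides
) -> bool:
    """Return True if s_adv stays close to s but is no longer classified as y."""
    stays_similar = similarity(s, s_adv) >= min_similarity
    misclassified = classify(s_adv) != y
    return stays_similar and misclassified


# Toy stand-ins (hypothetical): a keyword classifier and word-overlap similarity.
def toy_classify(sentence: str) -> str:
    return "Positive" if "delights" in sentence else "Negative"


def toy_similarity(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


S = "This film offers many delights and surprises"
S_ADV = "This movie offers enough pleasures and surprises"
found = is_adversarial(S, S_ADV, "Positive", toy_classify, toy_similarity,
                       min_similarity=0.3)
```

Word overlap is a crude proxy used here only so the sketch runs without external models; it scores the example pair low even though a human would judge the two sentences close in meaning.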

Threat Model
- Black-box attack
- Only the classification output is accessible
- No access to model weights, gradients, or training data
- Soft-label
- The attacker sees the classification score
↔ hard-label : only the 0/1 class decision

1. Token Importance
- How much token t_i influences the classification
- Changing this token directly, or inserting another token next to it, is likely to change the classification
- Computed as I_i = P(C(S) = y) - P(C(S - {t_i}) = y)
- Example I_i values: This 0.1, film 0.3, offers 0.2, many 0.4, delights 0.9, and 0.1, surprises 0.6
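
The deletion-based score above can be sketched as follows. This is an illustration under assumptions: `toy_positive_prob` is a hypothetical soft-label classifier with made-up weights, standing in for the black-box target model.

```python
from typing import Callable, List


def token_importance(
    tokens: List[str],
    y_prob: Callable[[List[str]], float],
) -> List[float]:
    """Importance of token i = P(y | S) - P(y | S with token i deleted)."""
    base = y_prob(tokens)
    scores = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        scores.append(base - y_prob(reduced))
    return scores


# Hypothetical classifier: P(Positive) grows with sentiment-bearing words.
POSITIVE_WEIGHTS = {"delights": 0.4, "surprises": 0.2}


def toy_positive_prob(tokens: List[str]) -> float:
    return min(1.0, 0.3 + sum(POSITIVE_WEIGHTS.get(t, 0.0) for t in tokens))


tokens = "This film offers many delights and surprises".split()
scores = token_importance(tokens, toy_positive_prob)

# Indices sorted from most to least important; perturbation starts at the front.
ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
```

Each score costs one query to the target model, so a sentence with n tokens needs n + 1 queries for this step.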

2. Mask & predict top-K tokens
- Mask tokens in descending order of importance
- For BAE-I (insert), insert a mask at a position adjacent to the token
- Use BERT-MLM to predict the top K tokens for the mask
This film offers many ? and surprises
? → pleasures, twists, challenges
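
Where the mask goes differs between replace and insert. The sketch below only builds the masked inputs; the helper names are hypothetical, and trying both the left and right neighbour for insertion is an assumption about what "adjacent" means here.

```python
from typing import List

MASK = "[MASK]"  # BERT's mask token


def mask_for_replace(tokens: List[str], i: int) -> List[str]:
    """BAE-R: replace token i with a mask."""
    return tokens[:i] + [MASK] + tokens[i + 1:]


def mask_for_insert(tokens: List[str], i: int) -> List[List[str]]:
    """BAE-I: insert a mask directly to the left or to the right of token i."""
    left = tokens[:i] + [MASK] + tokens[i:]
    right = tokens[:i + 1] + [MASK] + tokens[i + 1:]
    return [left, right]


tokens = "This film offers many delights and surprises".split()
replaced = " ".join(mask_for_replace(tokens, 4))
inserted = [" ".join(t) for t in mask_for_insert(tokens, 4)]
```

The masked strings would then be passed to a masked language model to obtain the top-K fill-ins. One option (an assumption, not stated on the slides) is the Hugging Face `transformers` `fill-mask` pipeline, which accepts a `top_k` argument.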

3. Filter semantic similarity
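- Check that the perturbed sentence actually keeps the meaning of the original
- Uses a sentence similarity encoder based on the Universal Sentence Encoder (USE)
- Grammar check
- For BAE-R, check that the candidate has the same part of speech (POS) as the original token
This film offers many ? and surprises
? → pleasures, twists, challenges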
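
The filtering step can be sketched as a similarity threshold plus a POS check. Everything concrete below is an assumption for illustration: the 2-D vectors, the POS tags, the extra candidate `delightful`, and the 0.8 threshold are made up, and `toy_encode` stands in for a real sentence encoder such as USE.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_candidates(
    original_sentence: str,
    original_token: str,
    candidates: Dict[str, str],  # candidate token -> perturbed sentence
    encode: Callable[[str], Sequence[float]],
    pos_of: Callable[[str], str],
    min_similarity: float = 0.8,  # illustrative threshold
    check_pos: bool = True,       # POS check, as described for BAE-R
) -> List[str]:
    """Keep candidates whose sentence stays similar and whose POS matches."""
    ref = encode(original_sentence)
    kept = []
    for token, sentence in candidates.items():
        if check_pos and pos_of(token) != pos_of(original_token):
            continue
        if cosine(ref, encode(sentence)) >= min_similarity:
            kept.append(token)
    return kept


# Made-up 2-D "meanings" and POS tags, purely for illustration.
TOY_VECTORS = {"delights": (1.0, 0.0), "pleasures": (0.9, 0.1),
               "twists": (0.5, 0.5), "challenges": (0.1, 0.9),
               "delightful": (1.0, 0.0)}
TOY_POS = {"delights": "NOUN", "pleasures": "NOUN", "twists": "NOUN",
           "challenges": "NOUN", "delightful": "ADJ"}


def toy_encode(sentence: str) -> Sequence[float]:
    x = y = 0.0
    for word in sentence.split():
        vx, vy = TOY_VECTORS.get(word, (0.0, 0.0))
        x, y = x + vx, y + vy
    return (x, y)


TEMPLATE = "This film offers many {} and surprises"
cands = {t: TEMPLATE.format(t)
         for t in ["pleasures", "twists", "challenges", "delightful"]}
kept = filter_candidates(TEMPLATE.format("delights"), "delights",
                         cands, toy_encode, TOY_POS.get)
```

With these toy numbers only `pleasures` survives; real USE scores for the slide's candidates may differ.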

4. Apply perturbation
- Classify the generated sentences with the target model
- If several tokens change the prediction: pick the token with the highest USE score
- If no token changes the prediction: pick the token that lowers P(C(S_adv) = y) the most
- Repeat this process
This movie offers enough pleasures and surprises
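
The selection rule for a single position can be sketched as below. The function and the probability tables are hypothetical; the similarity values stand in for USE scores and are made up.

```python
from typing import Callable, Dict, List, Optional, Tuple

Candidate = Tuple[str, float]  # (perturbed sentence, similarity to original)


def choose_perturbation(
    candidates: List[Candidate],
    y: str,
    predict_proba: Callable[[str], Dict[str, float]],
) -> Tuple[Optional[str], bool]:
    """Return (chosen sentence, whether the prediction flipped)."""
    flipped = []       # (similarity, sentence) for candidates that flip
    best_drop = None   # (P(y), sentence) with the lowest P(y) seen so far
    for sentence, sim in candidates:
        probs = predict_proba(sentence)
        predicted = max(probs, key=probs.get)
        if predicted != y:
            flipped.append((sim, sentence))
        if best_drop is None or probs[y] < best_drop[0]:
            best_drop = (probs[y], sentence)
    if flipped:
        return max(flipped)[1], True   # most similar among successful ones
    if best_drop is None:
        return None, False             # no candidates at all
    return best_drop[1], False         # keep going from the biggest drop


def make_model(table: Dict[str, float]) -> Callable[[str], Dict[str, float]]:
    """Hypothetical soft-label model backed by a lookup table of P(Positive)."""
    return lambda s: {"Positive": table[s], "Negative": 1.0 - table[s]}


T = "This film offers many {} and surprises"
cands = [(T.format("pleasures"), 0.95), (T.format("twists"), 0.85),
         (T.format("challenges"), 0.80)]

flips = make_model({cands[0][0]: 0.6, cands[1][0]: 0.45, cands[2][0]: 0.3})
no_flips = make_model({cands[0][0]: 0.9, cands[1][0]: 0.8, cands[2][0]: 0.7})

chosen_a, success_a = choose_perturbation(cands, "Positive", flips)
chosen_b, success_b = choose_perturbation(cands, "Positive", no_flips)
```

When no candidate flips the prediction, the returned sentence becomes the starting point for the next most important token, which is the "repeat" step in the list above.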

Experiments
- Dataset
- Amazon, Yelp, IMDB (sentiment classification)
- MR (sentiment polarity)
- MPQA (opinion polarity)
- Subj (subjective / objective classification)
- Model
- word-LSTM / word-CNN / BERT
- Attacks
- BAE-R (replace)
- BAE-I (insert)
- BAE-R/I (either replace or insert)
- BAE-R+I (replace, and then insert)

1. Automatic Evaluation
- Classification accuracy
- 31.0% → 4.0% : strong attack
- Semantic similarity (from USE)
- TextFooler: 0.747
- BAE-R+I: 0.848
⇒ semantically more similar

2. Effectiveness
- Compare attack success rates when the number of Replace/Insert operations is limited
- Stronger than TextFooler at the same perturbation ratio
- Saturates at 40-50% perturbation (reaches the maximum accuracy drop)

3. Qualitative Examples
- Replacement / Insertion
- TextFooler: complex synonym
- BAE
- more natural
- fewer perturbations

4. Human Evaluation
Human ratings of the original / attacked sentences
- Sentiment accuracy & Naturalness
- R > R+I > TextFooler
✱ In the automatic evaluation, the USE score was R+I > R
⇒ a limitation of USE

5. Replace vs. Insert
- A: (R/I) - R
- B: (R/I) - I
- A > B ⇒ cases that need Insert > cases that need Replace
- C: (R/I) - R - I
- On Subj, the share of C is the largest & Subj is the most robust
⇒ Being able to use both Replace and Insert is important
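
One way to read A, B and C is as set differences over the test examples each attack breaks. That reading is an assumption, and the example ids below are made up.

```python
from typing import Set, Tuple


def breakdown(
    r_success: Set[int],
    i_success: Set[int],
    ri_success: Set[int],
) -> Tuple[Set[int], Set[int], Set[int]]:
    """Split R/I successes by which single operation fails on them."""
    a = ri_success - r_success              # (R/I) - R: replace alone fails
    b = ri_success - i_success              # (R/I) - I: insert alone fails
    c = ri_success - r_success - i_success  # neither operation alone works
    return a, b, c


# Hypothetical ids of test examples successfully attacked by each variant.
r_ok = {1, 2, 3}
i_ok = {2, 3, 4, 5}
ri_ok = {1, 2, 3, 4, 5, 6, 7}

a, b, c = breakdown(r_ok, i_ok, ri_ok)
```

In this toy data `len(a) > len(b)`, mirroring the slide's A > B, and `c` holds the examples that only fall when both operations are available.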
