This presentation discusses robust filtering schemes to defend machine learning systems against adversarial attacks. It outlines three main defense schemes: input filtering, output filtering, and an end-to-end protection scheme. The input filtering scheme uses a genetic algorithm to determine an optimal sequence of filters to detect adversarial examples. The output filtering scheme formulates the detection of adversarial inputs as an outlier detection problem. The end-to-end scheme integrates components for adversarial detection, filtering, and classification into a unified framework for protection. Experimental results show the proposed approaches can effectively detect various adversarial attack types while maintaining high classification accuracy.
3–5. Adversarial Attack (AA) on AI/ML
Types:
• Poisoning Attack: manipulate training data
• Evasion Attack: manipulate input data
• Trojan AI: manipulate the AI architecture (e.g., change weight values)
"Manipulation of training data, of the Machine Learning (ML) model architecture, or of testing data in a way that results in wrong output from the ML."
References [1], [2]; Dasgupta et al., 2020
6. Types of Evasion-Based Attacks
Score-based attack:
• One-pixel attack (not practical)
Patch attack (a human can identify it):
• LaVAN
• DPatch
Gradient attack:
• Basic: FGSM, BIM
• Saliency-map attack: JSMA
• Advanced low-perturbation attacks: CW
Decision attack:
• HopSkipJump attack
• DeepFool attack
Adaptive attack:
• BPDA
FGSM, for example, uses the gradients of the loss with respect to the input image to create a new image that maximizes the loss; this new image is called the adversarial image.
(Ref: Dasgupta et al. 2020)
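A minimal sketch of the FGSM step just described, assuming a PyTorch classifier `model` and inputs normalized to [0, 1]; the epsilon value is an illustrative choice, not a setting from this work.

```python
# Minimal FGSM sketch (PyTorch assumed; epsilon is an illustrative choice).
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Create an adversarial image by stepping along the sign of the input gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # loss w.r.t. the true label y
    loss.backward()                        # populates x.grad
    x_adv = x + epsilon * x.grad.sign()    # move in the loss-increasing direction
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```

Applied to a correctly classified batch, such a step typically produces inputs the same model misclassifies, which is what the detection filters in later slides target.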
7. Defense Strategies for AA
Retrain:
• Generate adversarial examples and retrain the model.
• Limitation: reduces the accuracy of the learning model.
Input reconstruction or transformation:
• Use PCA, low-pass filtering, JPEG compression, or soft-thresholding techniques as a pre-processing step.
• Limitation: vulnerable to adaptive attacks.
Model modification:
• Modify the ML architecture to detect adversarial attacks.
• Limitation: requires modification of the learning models.
Reference [5]
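As a concrete example of the input-transformation family listed above, a minimal sketch of JPEG re-encoding as a pre-processing defense, assuming Pillow; the quality setting is an illustrative choice.

```python
# JPEG re-encoding as an input-transformation defense (Pillow assumed).
import io
from PIL import Image

def jpeg_squeeze(img: Image.Image, quality: int = 75) -> Image.Image:
    """Re-encode the image at lossy quality to wash out low-amplitude noise."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)
```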
9. Adversarial Input Has Noise
Different attack methods produce different types of noise/manipulation styles.
[Figure: a generalized example and a real example of adversarial noise]
10. Filters That Can Detect Some Noises
Observation: clean and adversarial images have a quantifiable noise difference.
[Figure: two examples of filter responses on clean vs. adversarial images]
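A minimal sketch of one such filter-based noise metric: estimate an image's SNR by treating a median-filtered copy as the signal (SciPy/NumPy assumed; the filter choice and window size are illustrative, not the deck's exact filters).

```python
# Quantify an image's noise as the residual left by a smoothing filter.
import numpy as np
from scipy.ndimage import median_filter

def noise_snr_db(img: np.ndarray) -> float:
    """SNR in dB: signal = smoothed image, noise = residual after smoothing."""
    smooth = median_filter(img.astype(float), size=3)
    noise = img.astype(float) - smooth
    return 10.0 * np.log10(np.mean(smooth ** 2) / max(np.mean(noise ** 2), 1e-12))
```

A clean image and its adversarially perturbed counterpart typically yield measurably different scores under a metric like this, which is the observation the slide illustrates.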
11. Low-Noise AAs Are Not Effective in the Physical World
Percentage of adversarial samples becoming ineffective due to environmental factors.
[Chart: minimum adversarial noise per attack type, for print and screen reproduction]
12. Key Research Focus
AAs are transferable, and there are numerous ways to formulate AAs in different ML models.
AAs carry an additional noise signature that is detectable by some filters.
Low-noise AAs are not effective in the physical world.
Countering physical-world AAs and identifying TrojAI can be sufficient if other security policies are effective.
Our initial approach is to focus on adversarial-noise detection in the input data only, instead of studying how these attacks are formulated in the ML model or how the ML model behaves (we treat ML models as inaccessible black boxes).
18. Filter Metrics for Adversarial Detection
[Figure: average histograms for the white color, adversarial vs. clean]
A histogram representation depends on the color of the object being studied, ignoring its shape and texture.
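A minimal sketch of a histogram-based detection metric of this kind, assuming NumPy and uint8 grayscale images; the L1 distance and the averaging scheme are illustrative assumptions, not the authors' exact metric.

```python
# Compare an input's intensity histogram against the clean-data average.
import numpy as np

def avg_histogram(images, bins=256):
    """Average per-image intensity histogram over a batch of clean images."""
    hists = [np.histogram(img, bins=bins, range=(0, 255))[0] for img in images]
    return np.mean(hists, axis=0)

def histogram_score(img, clean_avg, bins=256):
    # L1 distance from the clean-data average histogram; a large distance
    # suggests the kind of noise signature adversarial inputs carry.
    h = np.histogram(img, bins=bins, range=(0, 255))[0]
    return np.abs(h - clean_avg).sum()
```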
25. Determining a Filter Sequence to Detect AAs
Our goal:
• Find an optimal sequence of filters that can detect most attack types using SNR and histogram values.
• Find multiple sets of filter sequences, so that a different sequence can be applied to each input, making the defense system dynamic.
Problem: billions of possible filter-set combinations can exist, so exhaustive search is computationally costly.
A genetic algorithm can be used to search for multiple filter-sequence sets.
26. Use of a Genetic Algorithm (GA)
The GA finds optimal sequences of filters, with a fitness function satisfying three objectives: accuracy, time cost, and diversity of filter families (a sketch follows this slide).
The population is initialized with randomly created filter sequences of variable length, for example:
Individual 1: FT2-FT6-FT7
Individual 2: FT8-FT3-FT10-FT2
Individual 3: FT7-FT5
• Each individual's accuracy is calculated from its rate of detecting adversarial examples.
• Time cost is the time the filter sequence takes to process 100 images.
• Diversity is the number of filter families represented in the sequence.
• Based on these three objectives, a variable-length multi-objective weighted fitness function, with a penalty for longer sequences, sorts the individuals by optimality.
• An elitist strategy keeps the best individuals in a steady-state GA (replacing half of the population); random mutation is also applied to avoid local optima.
• GA evolution terminates at an optimum where the best fitness does not change for a period.
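A hedged sketch of the variable-length, multi-objective GA described above; the weights, population parameters, mutation rate, and the `accuracy`/`time_cost`/`n_families` callables are illustrative assumptions, not the authors' exact settings.

```python
# Steady-state GA over variable-length filter sequences (illustrative parameters).
import random

FILTERS = [f"FT{i}" for i in range(1, 17)]  # the deck's 16 filter types
W_ACC, W_TIME, W_DIV, W_LEN = 1.0, 0.3, 0.2, 0.05  # assumed objective weights

def fitness(seq, accuracy, time_cost, n_families):
    """Weighted multi-objective fitness with a penalty on sequence length."""
    return (W_ACC * accuracy(seq) - W_TIME * time_cost(seq)
            + W_DIV * n_families(seq) - W_LEN * len(seq))

def evolve(fit, pop_size=40, generations=100, patience=10):
    # Variable-length individuals: random filter sequences of length 2-5.
    pop = [random.sample(FILTERS, random.randint(2, 5)) for _ in range(pop_size)]
    best, stale = None, 0
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        if best is not None and fit(pop[0]) <= fit(best):
            stale += 1
            if stale >= patience:  # stop once best fitness stops improving
                break
        else:
            best, stale = pop[0], 0
        elite = pop[: pop_size // 2]  # steady-state: keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            # Variable-length one-point crossover.
            child = a[: random.randrange(1, len(a))] + b[random.randrange(1, len(b)):]
            if random.random() < 0.1:  # random mutation against local optima
                child[random.randrange(len(child))] = random.choice(FILTERS)
            children.append(child)
        pop = elite + children
    return best
```

In use, `fitness` would be bound to the three evaluation callables (e.g., via `functools.partial`) and passed to `evolve` as `fit`.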
36. MOGA: Input-Scheme Experimental Results
[Plot: multi-objective GA run. Objective 1: accuracy (blue); objective 2: time (pink); objective 3: filter-family diversity (black); red: each sequence position; green: Pareto-optimal sets]
Accuracy against FGSM, JSMA, and CW attack samples.
37. MOGA: Experimental Results and Comparison (2)

Adversarial detection    CNN       EAF
Accuracy                 96%       100% *
Training time            1200 s    90 s
Test (100 inputs)        0.75 s    0.09 s

Dataset: MNIST adversarial (600 images)
CNN: 28 convolutional (kernel size 3) -> 2 MaxPool (kernel size 2) -> 28 convolutional -> Flatten -> ReLU -> 128 dense (25 epochs)
EAF: 16 filters
Hardware: 1 GPU, 32 GB RAM
* Used all 60,000 images to generate the range.
38. Limitations of an Input-Filter-Only Defense
• Needs adversarial inputs to be generated.
• Not effective against TrojAI/backdoors.
• Vulnerable to adaptive attacks, as the number of filters is finite.
41. Output Filtering Scheme
We need an adaptive defense strategy that does not modify the learning model and does not require adversarial knowledge.
The output filtering scheme detects adversarial input using only the knowledge of non-adversarial data, converting the task into an outlier-detection problem.
45. Negative Selection Algorithm for Outlier Detection
• Define "self" as the normal pattern of activity or stable behavior of a system/process: a collection of logically split, equal-size segments of a pattern sequence, represented as a multiset S of strings of length l over a finite alphabet.
• Generate a set R of detectors, each of which fails to match any string in S.
• Monitor new observations (of S) for changes by continually testing the detectors against representatives of S. If any detector ever matches, a change (or deviation) must have occurred in the system behavior.
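A minimal sketch of the string-based negative selection steps above, using an r-contiguous-bits matching rule (a common choice in the NSA literature); the string length, r, and detector count are illustrative assumptions.

```python
# String-based negative selection with r-contiguous-bits matching.
import random

def matches(detector: str, s: str, r: int = 4) -> bool:
    """True if detector and s agree on r contiguous positions."""
    return any(detector[i:i + r] == s[i:i + r] for i in range(len(s) - r + 1))

def generate_detectors(self_set, n_detectors=100, length=8, r=4):
    detectors = []
    while len(detectors) < n_detectors:  # may loop long if self covers the space
        cand = "".join(random.choice("01") for _ in range(length))
        # Negative selection: keep only candidates matching no self string.
        if not any(matches(cand, s, r) for s in self_set):
            detectors.append(cand)
    return detectors

def deviates(sample, detectors, r=4):
    # Any detector match signals a deviation from normal (self) behavior.
    return any(matches(d, sample, r) for d in detectors)
```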
46. V-detector Negative Selection Algorithm
Main idea of V-detector: by allowing the detectors to have some variable properties, V-detector enhances the negative selection algorithm in several ways:
• Fewer large detectors are needed to cover the non-self region, saving time and space.
• Small detectors cover holes better.
• Coverage is estimated while the detector set is generated.
• The shapes of detectors, or even the types of matching rules, can also be made variable.
(Reference: Ji and Dasgupta 2005)
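A hedged sketch of the variable-radius idea: spherical detectors over normalized feature vectors in [0, 1]^d, each grown until it touches the self region. The self radius and detector budget are illustrative, not the parameters from Ji and Dasgupta 2005.

```python
# V-detector-style variable-radius detectors (illustrative parameters).
import math
import random

def v_detectors(self_points, self_radius=0.05, n_detectors=200, dim=2):
    detectors = []  # list of (center, radius) pairs
    while len(detectors) < n_detectors:
        c = [random.random() for _ in range(dim)]
        # Variable radius: grow the detector until it touches self, so fewer
        # large detectors cover open non-self space and small ones fill holes.
        r = min(math.dist(c, s) for s in self_points) - self_radius
        if r > 0:
            detectors.append((c, r))
    return detectors

def is_outlier(x, detectors) -> bool:
    return any(math.dist(x, c) <= r for c, r in detectors)
```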
50. Experimental Results
Detection accuracy for different attack types across different classes of the CIFAR and MNIST datasets.
Detection accuracy for binary classification of clean vs. adversarial inputs (all) on the MNIST dataset.
52. Outlier Detection Models
Linear models:
• MCD: Minimum Covariance Determinant (uses the Mahalanobis distances as the outlier scores)
• OCSVM: One-Class Support Vector Machines
• LMDD: Deviation-based Outlier Detection
Proximity-based:
• LOF: Local Outlier Factor
• COF: Connectivity-Based Outlier Factor
• CBLOF: Clustering-Based Local Outlier Factor
• LOCI: Fast outlier detection using the local correlation integral
• HBOS: Histogram-Based Outlier Score
• SOD: Subspace Outlier Detection
• ROD: Rotation-Based Outlier Detection
Probabilistic:
• ABOD: Angle-Based Outlier Detection
• COPOD: Copula-Based Outlier Detection
• FastABOD: Fast Angle-Based Outlier Detection using approximation
• MAD: Median Absolute Deviation
• SOS: Stochastic Outlier Selection
Outlier ensembles:
• IForest: Isolation Forest
• FB: Feature Bagging
• LSCP: Locally Selective Combination of Parallel Outlier Ensembles
• XGBOD: Extreme Boosting Based Outlier Detection (supervised)
• LODA: Lightweight On-line Detector of Anomalies
Neural networks:
• AutoEncoder: Fully connected autoencoder (uses reconstruction error as the outlier score)
• VAE: Variational autoencoder (uses reconstruction error as the outlier score)
• Beta-VAE: Variational autoencoder (with a customized loss term, varying gamma and capacity)
• SO_GAAL: Single-Objective Generative Adversarial Active Learning
• MO_GAAL: Multiple-Objective Generative Adversarial Active Learning
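The abbreviations above match detectors available in the PyOD library (an assumption about the toolkit behind this comparison). A minimal sketch of the one-class setup used here: fit on clean data only and flag outliers as suspect inputs; the contamination rate and random features are illustrative.

```python
# One-class outlier detection on clean data only, using PyOD's IForest.
import numpy as np
from pyod.models.iforest import IForest

X_clean = np.random.rand(500, 64)    # stand-in for clean feature vectors
X_incoming = np.random.rand(10, 64)  # stand-in for inputs to screen

clf = IForest(contamination=0.05)    # assumed contamination rate
clf.fit(X_clean)
labels = clf.predict(X_incoming)             # 0 = inlier (clean), 1 = outlier (suspect)
scores = clf.decision_function(X_incoming)   # higher means more anomalous
```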
53. Comparison with Different Outlier Methods
Comparison of results with different outlier-detection models, contrasting V-detector NSA performance with other OCC methods.
54. Limitations
• The ML model still processes all of the input.
• The detection process is longer, even for trivial adversarial examples.
59. Comparison with Other Methods
F1-score comparison of different detection methods with our proposed method.
Advantages of our proposed method over other detection methods:
• No attack-sample generation needed.
• No ML-model modification.
• Protection against preprocessing-based adaptive attacks.
• Independent of the ML-model architecture and transferable to similar datasets.
60. Summary
• This research conducted an extensive investigation to develop an end-to-end protection mechanism for learning systems.
• For a given problem/dataset, we used a collection of filters (with varying degrees of discriminatory ability) and a genetic algorithm (GA) to find a robust ensemble of filters for AA detection.
• A variable-length MOGA searches for sets of filters that are effective against different types of AAs.
• We devised an adaptive negative-filtering methodology that detects adversarial attacks without modifying the ML model or requiring information about it.
• Our strategy can be implemented in most ML-based systems without expensive retraining.
• Current adaptive attacks are ineffective against our negative-filtering approach, since the filters are regenerated for each input (or batch of inputs).
61. Publications
Patent under submission: System for Dual-Filtering for Learning Systems to Prevent Adversarial Attacks. (APP no: 63/022,323)
by Dipankar Dasgupta, Kishor Datta Gupta
Conference:
•Gupta, Kishor Datta, Dipankar Dasgupta, and Zahid Akhtar. "Adversarial Input Detection Using Image Processing Techniques (IPT)." In 2020 11th
IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), pp. 0309-0315. IEEE, 2020.
https://doi.org/10.1109/UEMCON51285.2020.9298060
•Gupta, Kishor Datta, Dipankar Dasgupta, and Zahid Akhtar. "Applicability issues of evasion-based adversarial attacks and mitigation techniques." In
2020 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1506-1515. IEEE, 2020. https://doi.org/10.1109/SSCI47803.2020.9308589
•Gupta, Kishor Datta, and Dipankar Dasgupta. "Using Negative Detectors for Identifying Adversarial Data Manipulation in Machine Learning." In 2021
International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, July 18–22, 2021.
Journal:
•Gupta, Kishor Datta and Dipankar Dasgupta. “Dual-Filtering (DF) Schemes for Learning Systems to prevent Adversarial Attacks” Journal: Springer
Complex & Intelligent Systems, Manuscript ID: CAIS-D-21-00347, Submission date: March 2021. (under review)
•Gupta, Kishor Datta, and Dipankar Dasgupta. “Adaptive Ensemble of Filters (AEF) to Detect Adversarial Inputs” Journal: ACM Transactions on
Evolutionary Learning and Optimization, Manuscript ID: TELO-2020-45, Submission date: December 2020. (Second Review)
•Gupta, Kishor Datta, Dipankar Dasgupta, and Zahid Akhtar. "Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial
Attacks." Journal: Springer Nature Computer Science, Manuscript ID: SNCS-D-20-01775, Submission date: October 2020. (Minor Revision)
62. Directions for Future Researchers
• Explore deep learning methods to generate filters, instead of searching with a GA.
• Explore explainable-AI methods to make the system more reliable.
• Explore zero-shot learning methods to detect adversarial input using only self data.
63. Different Adversarial Attack Points in a Deployed System
Effective operating-system and communication-channel security is a prerequisite.
65. References
1. D. Dasgupta, Z. Akhtar, and S. Sen. "Machine Learning in Cyber Security: Survey."
2. https://medium.com/onfido-tech/adversarial-attacks-and-defences-for-convolutional-neural-networks-66915ece52e7
3. B. Biggio et al. "Poisoning Attacks against Support Vector Machines," 2013. [https://arxiv.org/abs/1206.6389]
4. C. Szegedy et al. "Intriguing Properties of Neural Networks," 2014. [https://arxiv.org/abs/1312.6199]
5. I. Goodfellow et al. "Explaining and Harnessing Adversarial Examples," 2014. [https://arxiv.org/abs/1412.6572]
6. N. Carlini and D. Wagner. "Towards Evaluating the Robustness of Neural Networks," 2017. [https://arxiv.org/abs/1608.04644]
7. N. Papernot et al. "Practical Black-Box Attacks against Machine Learning," 2017. [https://arxiv.org/abs/1602.02697]
8. I. Goodfellow. "Attacking Machine Learning with Adversarial Examples," 2017. [https://openai.com/blog/adversarial-example-research/]
9. https://medium.com/@ODSC/adversarial-attacks-on-deep-neural-networks-ca847ab1063
10. T. Gu, B. Dolan-Gavitt, and S. Garg. "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain." arXiv preprint arXiv:1708.06733, 2017.
11. Z. Akhtar and D. Dasgupta. "A Brief Survey of Adversarial Machine Learning and Defense Strategies." Technical Report, The University of Memphis.
12. K. D. Gupta, D. Dasgupta, and Z. Akhtar. "Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial Attacks." arXiv preprint arXiv:2007.00337.
13. A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer. "Deflecting Adversarial Attacks with Pixel Deflection." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8571–8580, 2018.
14. N. Carlini. "Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples." 2019.
15. N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Rauber, D. Tsipras, I. Goodfellow, A. Madry, and A. Kurakin. "On Evaluating Adversarial Robustness." arXiv preprint arXiv:1902.06705, 2019.
16. N. Carlini and D. Wagner. "Defensive Distillation Is Not Robust to Adversarial Examples." arXiv preprint arXiv:1607.04311, 2016.
17. N. Carlini and D. Wagner. "Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods." In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14, 2017.
18. N. Carlini and D. Wagner. "MagNet and 'Efficient Defenses Against Adversarial Attacks' Are Not Robust to Adversarial Examples." arXiv preprint arXiv:1711.08478, 2017.