This presentation discusses robust filtering schemes to defend machine learning systems against adversarial attacks. It outlines three main defense schemes: input filtering, output filtering, and an end-to-end protection scheme. The input filtering scheme uses a genetic algorithm to determine an optimal sequence of filters to detect adversarial examples. The output filtering scheme formulates the detection of adversarial inputs as an outlier detection problem. The end-to-end scheme integrates components for adversarial detection, filtering, and classification into a unified framework for protection. Experimental results show the proposed approaches can effectively detect various adversarial attack types while maintaining high classification accuracy.
3–5. Adversarial Attack (AA) on AI/ML
Types:
• Poisoning Attack: manipulate training data
• Evasion Attack: manipulate input data
• Trojan AI: manipulate the AI architecture (e.g., change weight values)
"Manipulation of training data, of the Machine Learning (ML) model architecture, or of testing data in a way that results in wrong output from the ML."
References [1], [2]; Dasgupta et al., 2020
6. Types of Evasion-Based Attacks
Score-based attack:
• One-pixel attack (not practical)
Patch attack (a human can identify it):
• LaVAN
• DPatch
Gradient attack:
• Basic: FGSM, BIM
• Saliency-map attack: JSMA
• Advanced low-perturbation attacks: CW
Decision attack:
• HopSkipJump attack
• DeepFool attack
Adaptive attack:
• BPDA
FGSM, for example, uses the gradients of the loss with respect to the input image to create a new image that maximizes the loss; this new image is called the adversarial image.
(Ref: Dasgupta et al. 2020)
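A minimal sketch of the FGSM step just described, assuming a PyTorch classifier `model` and inputs normalized to [0, 1]; the epsilon value is an illustrative choice, not a setting from this work.

```python
# Minimal FGSM sketch (PyTorch assumed; epsilon is an illustrative choice).
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Create an adversarial image by stepping along the sign of the input gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # loss w.r.t. the true label y
    loss.backward()                        # populates x.grad
    x_adv = x + epsilon * x.grad.sign()    # move in the loss-increasing direction
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```

Applied to a correctly classified batch, such a step typically produces inputs the same model misclassifies, which is what the detection filters in later slides target.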
7. Defense Strategies for AA
Retrain:
• Generate adversarial examples and retrain the model.
• Limitation: reduces the accuracy of the learning model.
Input reconstruction or transformation:
• Use PCA, low-pass filtering, JPEG compression, or soft-thresholding techniques as a pre-processing step.
• Limitation: vulnerable to adaptive attacks.
Model modification:
• Modify the ML architecture to detect adversarial attacks.
• Limitation: requires modification of the learning models.
Reference [5]
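As a concrete example of the input-transformation family listed above, a minimal sketch of JPEG re-encoding as a pre-processing defense, assuming Pillow; the quality setting is an illustrative choice.

```python
# JPEG re-encoding as an input-transformation defense (Pillow assumed).
import io
from PIL import Image

def jpeg_squeeze(img: Image.Image, quality: int = 75) -> Image.Image:
    """Re-encode the image at lossy quality to wash out low-amplitude noise."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)
```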
9. Adversarial Input Has Noise
Different attack methods produce different types of noise/manipulation styles.
[Figure: a generalized example and a real example of adversarial noise]
10. Filters That Can Detect Some Noises
Observation: clean and adversarial images have a quantifiable noise difference.
[Figure: two examples of filter responses on clean vs. adversarial images]
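A minimal sketch of one such filter-based noise metric: estimate an image's SNR by treating a median-filtered copy as the signal (SciPy/NumPy assumed; the filter choice and window size are illustrative, not the deck's exact filters).

```python
# Quantify an image's noise as the residual left by a smoothing filter.
import numpy as np
from scipy.ndimage import median_filter

def noise_snr_db(img: np.ndarray) -> float:
    """SNR in dB: signal = smoothed image, noise = residual after smoothing."""
    smooth = median_filter(img.astype(float), size=3)
    noise = img.astype(float) - smooth
    return 10.0 * np.log10(np.mean(smooth ** 2) / max(np.mean(noise ** 2), 1e-12))
```

A clean image and its adversarially perturbed counterpart typically yield measurably different scores under a metric like this, which is the observation the slide illustrates.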
11. Low-Noise AAs Are Not Effective in the Physical World
Percentage of adversarial samples becoming ineffective due to environmental factors.
[Chart: minimum adversarial noise per attack type, for print and screen reproduction]
12. Key Research Focus
AAs are transferable, and there are numerous ways to formulate AAs in different ML models.
AAs carry an additional noise signature that is detectable by some filters.
Low-noise AAs are not effective in the physical world.
Countering physical-world AAs and identifying TrojAI can be sufficient if other security policies are effective.
Our initial approach is to focus on adversarial-noise detection in the input data only, instead of studying how these attacks are formulated in the ML model or how the ML model behaves (we treat ML models as inaccessible black boxes).
18. Filter Metrics for Adversarial Detection
[Figure: average histograms for the white color, adversarial vs. clean]
A histogram representation depends on the color of the object being studied, ignoring its shape and texture.
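A minimal sketch of a histogram-based detection metric of this kind, assuming NumPy and uint8 grayscale images; the L1 distance and the averaging scheme are illustrative assumptions, not the authors' exact metric.

```python
# Compare an input's intensity histogram against the clean-data average.
import numpy as np

def avg_histogram(images, bins=256):
    """Average per-image intensity histogram over a batch of clean images."""
    hists = [np.histogram(img, bins=bins, range=(0, 255))[0] for img in images]
    return np.mean(hists, axis=0)

def histogram_score(img, clean_avg, bins=256):
    # L1 distance from the clean-data average histogram; a large distance
    # suggests the kind of noise signature adversarial inputs carry.
    h = np.histogram(img, bins=bins, range=(0, 255))[0]
    return np.abs(h - clean_avg).sum()
```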
25. Determining a Filter Sequence to Detect AAs
Our goal:
• Find an optimal sequence of filters that can detect most attack types using SNR and histogram values.
• Find multiple sets of filter sequences, so that a different sequence can be applied to each input, making the defense system dynamic.
Problem: billions of possible filter-set combinations can exist, so exhaustive search is computationally costly.
A genetic algorithm can be used to search for multiple filter-sequence sets.
26. Use of a Genetic Algorithm (GA)
The GA finds optimal sequences of filters, with a fitness function satisfying three objectives: accuracy, time cost, and diversity of filter families (a sketch follows this slide).
The population is initialized with randomly created filter sequences of variable length, for example:
Individual 1: FT2-FT6-FT7
Individual 2: FT8-FT3-FT10-FT2
Individual 3: FT7-FT5
• Each individual's accuracy is calculated from its rate of detecting adversarial examples.
• Time cost is the time the filter sequence takes to process 100 images.
• Diversity is the number of filter families represented in the sequence.
• Based on these three objectives, a variable-length multi-objective weighted fitness function, with a penalty for longer sequences, sorts the individuals by optimality.
• An elitist strategy keeps the best individuals in a steady-state GA (replacing half of the population); random mutation is also applied to avoid local optima.
• GA evolution terminates at an optimum where the best fitness does not change for a period.
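A hedged sketch of the variable-length, multi-objective GA described above; the weights, population parameters, mutation rate, and the `accuracy`/`time_cost`/`n_families` callables are illustrative assumptions, not the authors' exact settings.

```python
# Steady-state GA over variable-length filter sequences (illustrative parameters).
import random

FILTERS = [f"FT{i}" for i in range(1, 17)]  # the deck's 16 filter types
W_ACC, W_TIME, W_DIV, W_LEN = 1.0, 0.3, 0.2, 0.05  # assumed objective weights

def fitness(seq, accuracy, time_cost, n_families):
    """Weighted multi-objective fitness with a penalty on sequence length."""
    return (W_ACC * accuracy(seq) - W_TIME * time_cost(seq)
            + W_DIV * n_families(seq) - W_LEN * len(seq))

def evolve(fit, pop_size=40, generations=100, patience=10):
    # Variable-length individuals: random filter sequences of length 2-5.
    pop = [random.sample(FILTERS, random.randint(2, 5)) for _ in range(pop_size)]
    best, stale = None, 0
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        if best is not None and fit(pop[0]) <= fit(best):
            stale += 1
            if stale >= patience:  # stop once best fitness stops improving
                break
        else:
            best, stale = pop[0], 0
        elite = pop[: pop_size // 2]  # steady-state: keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            # Variable-length one-point crossover.
            child = a[: random.randrange(1, len(a))] + b[random.randrange(1, len(b)):]
            if random.random() < 0.1:  # random mutation against local optima
                child[random.randrange(len(child))] = random.choice(FILTERS)
            children.append(child)
        pop = elite + children
    return best
```

In use, `fitness` would be bound to the three evaluation callables (e.g., via `functools.partial`) and passed to `evolve` as `fit`.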
36. MOGA: Input-Scheme Experimental Results
[Plot: multi-objective GA run. Objective 1: accuracy (blue); objective 2: time (pink); objective 3: filter-family diversity (black); red: each sequence position; green: Pareto-optimal sets]
Accuracy against FGSM, JSMA, and CW attack samples.
37. MOGA: Experimental Results and Comparison (2)

Adversarial detection    CNN       EAF
Accuracy                 96%       100% *
Training time            1200 s    90 s
Test (100 inputs)        0.75 s    0.09 s

Dataset: MNIST adversarial (600 images)
CNN: 28 convolutional (kernel size 3) -> 2 MaxPool (kernel size 2) -> 28 convolutional -> Flatten -> ReLU -> 128 dense (25 epochs)
EAF: 16 filters
Hardware: 1 GPU, 32 GB RAM
* Used all 60,000 images to generate the range.
38. Limitations of an Input-Filter-Only Defense
• Needs adversarial inputs to be generated.
• Not effective against TrojAI/backdoors.
• Vulnerable to adaptive attacks, as the number of filters is finite.
41. Output Filtering Scheme
We need an adaptive defense strategy that does not modify the learning model and does not require adversarial knowledge.
The output filtering scheme detects adversarial input using only the knowledge of non-adversarial data, converting the task into an outlier-detection problem.
45. Negative Selection Algorithm for Outlier Detection
• Define "self" as the normal pattern of activity or stable behavior of a system/process: a collection of logically split, equal-size segments of a pattern sequence, represented as a multiset S of strings of length l over a finite alphabet.
• Generate a set R of detectors, each of which fails to match any string in S.
• Monitor new observations (of S) for changes by continually testing the detectors against representatives of S. If any detector ever matches, a change (or deviation) must have occurred in the system behavior.
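A minimal sketch of the string-based negative selection steps above, using an r-contiguous-bits matching rule (a common choice in the NSA literature); the string length, r, and detector count are illustrative assumptions.

```python
# String-based negative selection with r-contiguous-bits matching.
import random

def matches(detector: str, s: str, r: int = 4) -> bool:
    """True if detector and s agree on r contiguous positions."""
    return any(detector[i:i + r] == s[i:i + r] for i in range(len(s) - r + 1))

def generate_detectors(self_set, n_detectors=100, length=8, r=4):
    detectors = []
    while len(detectors) < n_detectors:  # may loop long if self covers the space
        cand = "".join(random.choice("01") for _ in range(length))
        # Negative selection: keep only candidates matching no self string.
        if not any(matches(cand, s, r) for s in self_set):
            detectors.append(cand)
    return detectors

def deviates(sample, detectors, r=4):
    # Any detector match signals a deviation from normal (self) behavior.
    return any(matches(d, sample, r) for d in detectors)
```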
46. V-detector Negative Selection Algorithm
Main idea of V-detector: by allowing the detectors to have some variable properties, V-detector enhances the negative selection algorithm in several ways:
• Fewer large detectors are needed to cover the non-self region, saving time and space.
• Small detectors cover holes better.
• Coverage is estimated while the detector set is generated.
• The shapes of detectors, or even the types of matching rules, can also be made variable.
(Reference: Ji and Dasgupta 2005)
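A hedged sketch of the variable-radius idea: spherical detectors over normalized feature vectors in [0, 1]^d, each grown until it touches the self region. The self radius and detector budget are illustrative, not the parameters from Ji and Dasgupta 2005.

```python
# V-detector-style variable-radius detectors (illustrative parameters).
import math
import random

def v_detectors(self_points, self_radius=0.05, n_detectors=200, dim=2):
    detectors = []  # list of (center, radius) pairs
    while len(detectors) < n_detectors:
        c = [random.random() for _ in range(dim)]
        # Variable radius: grow the detector until it touches self, so fewer
        # large detectors cover open non-self space and small ones fill holes.
        r = min(math.dist(c, s) for s in self_points) - self_radius
        if r > 0:
            detectors.append((c, r))
    return detectors

def is_outlier(x, detectors) -> bool:
    return any(math.dist(x, c) <= r for c, r in detectors)
```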
50. Experimental Results
Detection accuracy for different attack types across different classes of the CIFAR and MNIST datasets.
Detection accuracy for binary classification of clean vs. adversarial inputs (all) on the MNIST dataset.
52. Outlier Detection Models
Linear models:
• MCD: Minimum Covariance Determinant (uses the Mahalanobis distances as the outlier scores)
• OCSVM: One-Class Support Vector Machines
• LMDD: Deviation-based Outlier Detection
Proximity-based:
• LOF: Local Outlier Factor
• COF: Connectivity-Based Outlier Factor
• CBLOF: Clustering-Based Local Outlier Factor
• LOCI: Fast outlier detection using the local correlation integral
• HBOS: Histogram-Based Outlier Score
• SOD: Subspace Outlier Detection
• ROD: Rotation-Based Outlier Detection
Probabilistic:
• ABOD: Angle-Based Outlier Detection
• COPOD: Copula-Based Outlier Detection
• FastABOD: Fast Angle-Based Outlier Detection using approximation
• MAD: Median Absolute Deviation
• SOS: Stochastic Outlier Selection
Outlier ensembles:
• IForest: Isolation Forest
• FB: Feature Bagging
• LSCP: Locally Selective Combination of Parallel Outlier Ensembles
• XGBOD: Extreme Boosting Based Outlier Detection (supervised)
• LODA: Lightweight On-line Detector of Anomalies
Neural networks:
• AutoEncoder: Fully connected autoencoder (uses reconstruction error as the outlier score)
• VAE: Variational autoencoder (uses reconstruction error as the outlier score)
• Beta-VAE: Variational autoencoder (with a customized loss term, varying gamma and capacity)
• SO_GAAL: Single-Objective Generative Adversarial Active Learning
• MO_GAAL: Multiple-Objective Generative Adversarial Active Learning
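The abbreviations above match detectors available in the PyOD library (an assumption about the toolkit behind this comparison). A minimal sketch of the one-class setup used here: fit on clean data only and flag outliers as suspect inputs; the contamination rate and random features are illustrative.

```python
# One-class outlier detection on clean data only, using PyOD's IForest.
import numpy as np
from pyod.models.iforest import IForest

X_clean = np.random.rand(500, 64)    # stand-in for clean feature vectors
X_incoming = np.random.rand(10, 64)  # stand-in for inputs to screen

clf = IForest(contamination=0.05)    # assumed contamination rate
clf.fit(X_clean)
labels = clf.predict(X_incoming)             # 0 = inlier (clean), 1 = outlier (suspect)
scores = clf.decision_function(X_incoming)   # higher means more anomalous
```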
53. Comparison with Different Outlier Methods
Comparison of results with different outlier-detection models, contrasting V-detector NSA performance with other OCC methods.
54. Limitations
• The ML model still processes all of the input.
• The detection process is longer, even for trivial adversarial examples.
59. Comparison with Other Methods
F1-score comparison of different detection methods with our proposed method.
Advantages of our proposed method over other detection methods:
• No attack-sample generation needed.
• No ML-model modification.
• Protection against preprocessing-based adaptive attacks.
• Independent of the ML-model architecture and transferable to similar datasets.
60. Summary
• This research conducted an extensive investigation to develop an end-to-end protection mechanism for learning systems.
• For a given problem/dataset, we used a collection of filters (with varying degrees of discriminatory ability) and a genetic algorithm (GA) to find a robust ensemble of filters for AA detection.
• A variable-length MOGA searches for sets of filters that are effective against different types of AAs.
• We devised an adaptive negative-filtering methodology that detects adversarial attacks without modifying the ML model or requiring information about it.
• Our strategy can be implemented in most ML-based systems without expensive retraining.
• Current adaptive attacks are ineffective against our negative-filtering approach, since the filters are regenerated for each input (or batch of inputs).
61. Publications
Patent under submission: System for Dual-Filtering for Learning Systems to Prevent Adversarial Attacks. (APP no: 63/022,323)
by Dipankar Dasgupta, Kishor Datta Gupta
Conference:
•Gupta, Kishor Datta, Dipankar Dasgupta, and Zahid Akhtar. "Adversarial Input Detection Using Image Processing Techniques (IPT)." In 2020 11th
IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), pp. 0309-0315. IEEE, 2020.
https://doi.org/10.1109/UEMCON51285.2020.9298060
•Gupta, Kishor Datta, Dipankar Dasgupta, and Zahid Akhtar. "Applicability issues of evasion-based adversarial attacks and mitigation techniques." In
2020 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1506-1515. IEEE, 2020. https://doi.org/10.1109/SSCI47803.2020.9308589
•Gupta, Kishor Datta, and Dipankar Dasgupta. "Using Negative Detectors for Identifying Adversarial Data Manipulation in Machine Learning." In 2021
International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, July 18–22, 2021.
Journal:
•Gupta, Kishor Datta and Dipankar Dasgupta. “Dual-Filtering (DF) Schemes for Learning Systems to prevent Adversarial Attacks” Journal: Springer
Complex & Intelligent Systems, Manuscript ID: CAIS-D-21-00347, Submission date: March 2021. (under review)
•Gupta, Kishor Datta, and Dipankar Dasgupta. “Adaptive Ensemble of Filters (AEF) to Detect Adversarial Inputs” Journal: ACM Transactions on
Evolutionary Learning and Optimization, Manuscript ID: TELO-2020-45, Submission date: December 2020. (Second Review)
•Gupta, Kishor Datta, Dipankar Dasgupta, and Zahid Akhtar. "Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial
Attacks." Journal: Springer Nature Computer Science, Manuscript ID: SNCS-D-20-01775, Submission date: October 2020. (Minor Revision)
62. Directions for Future Researchers
• Explore deep learning methods to generate filters, instead of searching with a GA.
• Explore explainable-AI methods to make the system more reliable.
• Explore zero-shot learning methods to detect adversarial input using only self data.
63. Different Adversarial Attack Points in a Deployed System
Effective operating-system and communication-channel security is a prerequisite.
65. References
1. D. Dasgupta, Z. Akhtar, and S. Sen. "Machine Learning in Cyber Security: Survey."
2. https://medium.com/onfido-tech/adversarial-attacks-and-defences-for-convolutional-neural-networks-66915ece52e7
3. B. Biggio et al. "Poisoning Attacks against Support Vector Machines," 2013. [https://arxiv.org/abs/1206.6389]
4. C. Szegedy et al. "Intriguing Properties of Neural Networks," 2014. [https://arxiv.org/abs/1312.6199]
5. I. Goodfellow et al. "Explaining and Harnessing Adversarial Examples," 2014. [https://arxiv.org/abs/1412.6572]
6. N. Carlini and D. Wagner. "Towards Evaluating the Robustness of Neural Networks," 2017. [https://arxiv.org/abs/1608.04644]
7. N. Papernot et al. "Practical Black-Box Attacks against Machine Learning," 2017. [https://arxiv.org/abs/1602.02697]
8. I. Goodfellow. "Attacking Machine Learning with Adversarial Examples," 2017. [https://openai.com/blog/adversarial-example-research/]
9. https://medium.com/@ODSC/adversarial-attacks-on-deep-neural-networks-ca847ab1063
10. T. Gu, B. Dolan-Gavitt, and S. Garg. "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain." arXiv preprint arXiv:1708.06733, 2017.
11. Z. Akhtar and D. Dasgupta. "A Brief Survey of Adversarial Machine Learning and Defense Strategies." Technical Report, The University of Memphis.
12. K. D. Gupta, D. Dasgupta, and Z. Akhtar. "Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial Attacks." arXiv preprint arXiv:2007.00337.
13. A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer. "Deflecting Adversarial Attacks with Pixel Deflection." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8571–8580, 2018.
14. N. Carlini. "Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples." 2019.
15. N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Rauber, D. Tsipras, I. Goodfellow, A. Madry, and A. Kurakin. "On Evaluating Adversarial Robustness." arXiv preprint arXiv:1902.06705, 2019.
16. N. Carlini and D. Wagner. "Defensive Distillation Is Not Robust to Adversarial Examples." arXiv preprint arXiv:1607.04311, 2016.
17. N. Carlini and D. Wagner. "Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods." In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14, 2017.
18. N. Carlini and D. Wagner. "MagNet and 'Efficient Defenses Against Adversarial Attacks' Are Not Robust to Adversarial Examples." arXiv preprint arXiv:1711.08478, 2017.