This talk describes a study showing that integrating foveation into modern convolutional neural networks improves their robustness to adversarial attacks and common image corruptions. These slides are from a talk given by Muhammad Ahmed Shah at RIKEN AIP, Tokyo, Japan, as part of the TrustML Young Scientist Seminar.
2. Adversarial Attacks on ML Models
[Figure: a clean image x, a small perturbation δ scaled to magnitude ε, and the adversarial image x + δ, which the model misclassifies with high confidence]
Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
10. Adversarial Vulnerability of DNNs
• The adversary searches for points within (an approximation of) the
human’s perceptual boundary for which the ML classifier responds
differently than we do
[Figure: a point labeled "cat" inside the human's perceptual boundary, with the ML classifier's decision boundary cutting through the region]
11. Adversarial Vulnerability of DNNs
• If the classifier accurately models the perceptual boundary, the
adversary would have to find a point outside the boundary to change
the classifier’s output.
[Figure: with an accurate boundary, every point inside the human's perceptual region is still classified "cat"]
12. Adversarial Vulnerability of DNNs
• If the classifier accurately models the perceptual boundary, the adversary would have to find a point outside the boundary to change the classifier's output.
[Figure: the adversary's point now lies outside the boundary, where humans would also say "Not a Cat"]
14. Adversarial Attack Methods
• Projected Gradient Descent (PGD) [Madry+2018]
1. δ ← Π_ε(U(−1, 1))
2. for k = 1 → K:
3.   δ′ ← Π_ε(δ + ∇_δ ℒ(f(x + δ), y))
4.   δ ← Π_x(x + δ′) − x
Π_ε is a projection onto an ℓ_p-norm ball of radius ε; usually the ℓ∞ or ℓ2 norm is used. Π_x projects onto the valid input domain of x, usually [−1, 1] for images. A runnable sketch of this loop is given below.
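Below is a minimal PyTorch sketch of this loop. The hyperparameters (ε, the step size α, and K) are illustrative, and we add the step size and gradient sign used by standard ℓ∞ PGD, which the slide's update rule leaves implicit.

```python
import torch

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, K=10):
    """Sketch of l_inf PGD. Pi_eps is a clamp onto the eps-ball and
    Pi_x a clamp onto the valid image range, here [-1, 1]."""
    loss_fn = torch.nn.CrossEntropyLoss()
    # 1. Initialize delta uniformly inside the eps-ball.
    delta = torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(K):  # 2. for k = 1..K
        delta.requires_grad_(True)
        loss = loss_fn(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            # 3. Ascend the loss, then project back onto the eps-ball (Pi_eps).
            delta = torch.clamp(delta + alpha * grad.sign(), -eps, eps)
            # 4. Project x + delta onto the input domain (Pi_x), recover delta.
            delta = torch.clamp(x + delta, -1, 1) - x
    return (x + delta).detach()
```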
19. SoTA Adversarial Defenses Train the Model to be Robust
• Adversarial Training:
• Basic Algorithm [Madry+2017]
1. for (x, y) ∈ D:
2.   x_adv ← argmax_{x′ ∈ 𝒳, ‖x − x′‖ ≤ ε} ℒ(f_θ(x′), y)
3.   θ ← θ − η ∇_θ ℒ(f_θ(x_adv), y)
• Works well in most practical scenarios (hence "empirical"), but offers no formal guarantee.
• Overfits to the attack type and attack parameters used during training.
A sketch of this training loop is given below.
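A sketch of this loop, reusing the pgd_linf attack sketched earlier; the optimizer and ε are illustrative choices:

```python
import torch

def adversarial_training_epoch(model, loader, optimizer, eps=8/255):
    """One epoch of PGD adversarial training, in the spirit of [Madry+2017]."""
    loss_fn = torch.nn.CrossEntropyLoss()
    for x, y in loader:
        # Inner maximization: approximate the worst-case x' in the eps-ball.
        x_adv = pgd_linf(model, x, y, eps=eps)
        # Outer minimization: a gradient step on the loss at x_adv.
        optimizer.zero_grad()
        loss_fn(model(x_adv), y).backward()
        optimizer.step()
```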
20. Most Defenses Overfit to the Training Configuration
• The robustness is learned, not intrinsic, so different types of attacks can break the defense.
• Small-norm perturbations are only one type of perturbation that humans are invariant to.
• Ideal models should be invariant to all of them!
[Figure: a clean image alongside an ℓ∞ PGD attack, an adversarial patch, real-world adversarial attacks, and common corruptions]
Humans are invariant to all of the above without any specialized training.
21. Hypothesis: The Robustness of Human Vision is Emergent, not Learned
• Perhaps the robustness of human vision is due to mechanisms and
constraints that DNNs do not have.
| Humans | DNNs |
|---|---|
| Neural activations are stochastic | Neural activations are deterministic |
| Highly recurrent | Usually exclusively feed-forward |
| Independent synaptic weights | Tied weights (in convolutional NNs) |
| Constrained receptive fields (usually Gabors) | Arbitrary receptive fields |
| Foveated vision (>90% of the visual field is low-res) | 100% of the visual field is high-res |
| … | … |
23. Hypothesis: Viewing the world at different levels of fidelity makes human vision robust
• Human vision is high-resolution only at the point of fixation and blurry everywhere else (this is called foveation).
• Due to this constraint, humans rely on low-frequency features of the image, such as shape (Geirhos+, 2018).
• Since DNNs are not so constrained, they tend to rely on high-frequency features, such as textures (Geirhos+, 2018).
• Adversarial attacks exploit DNNs by adding high-frequency perturbations to the image (Wang+, 2020).
• We simulate foveation as a preprocessing step in CNNs and evaluate its impact on robustness.
24. What is Foveation?
• There are two types of photoreceptors in the retina:
  • Cones are sensitive to color.
  • Rods are sensitive only to illumination.
25. What is Foveation?
• Cones are densely packed in a small region in the center of the retina called the fovea, but sparse everywhere else.
• Thus, vision has maximum fidelity (acuity) at the fovea and deteriorates in the periphery.
• The perceived image in the periphery is low-resolution and appears desaturated (Hansen+, 2009; Stewart+, 2020).
Image: Cmglee, own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=29924570
Wilkinson, M. O., Anderson, R. S., Bradley, A., and Thibos, L. N. Resolution acuity across the visual field for mesopic and scotopic illumination. Journal of Vision, 20(10):7, 2020. doi: 10.1167/jov.20.10.7.
27. R-Blur: Overview
[Figure: R-Blur pipeline — noise δ ~ 𝒩(0, σ) is added to the fixated image, which is then split and re-weighted by the color and grey acuity maps α_c and α_r]
1. Select fixation point
2. Add Gaussian Noise
3. Split into color and grey channels and
apply adaptive blurring
4. Combine the color and grey channels
28. Selecting the Fixation Point
• Training: random fixation.
• Evaluation: five fixation points at the four corners and the center; the logits are averaged (see the sketch below).
  • Not necessarily optimal.
  • We are currently developing methods for dynamically selecting the fixation point based on the image.
[Figure: the five fixated views are each passed through the DNN and the logits are averaged: y = (1/5) Σ logits]
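A sketch of this five-fixation evaluation; r_blur is the preprocessing sketched over the next few slides, and the corner/center coordinates are our assumption:

```python
import torch

def predict_five_fixations(model, r_blur, x):
    """Average logits over fixations at the four corners and the center."""
    _, H, W = x.shape
    fixations = [(0, 0), (0, W - 1), (H - 1, 0), (H - 1, W - 1),
                 (H // 2, W // 2)]
    logits = [model(r_blur(x, fy, fx).unsqueeze(0)) for fy, fx in fixations]
    return torch.stack(logits).mean(dim=0)  # y = (1/5) * sum of the logits
```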
29. Computing Eccentricity
• Eccentricity ≡ distance from the fixation point.
• Opticians measure eccentricity radially, i.e. as Euclidean distance.
  • This would require extracting circular regions to blur, which is inefficient.
• We use a different distance metric:
  e_{p_x,p_y} = max(|p_x − f_x|, |p_y − f_y|) / W
• Regions with the same eccentricity are squares, so they can be extracted by slicing the image tensor (see the sketch below).
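A sketch of this computation (the function name and tensor layout are ours):

```python
import torch

def eccentricity_map(H, W, fy, fx):
    """Max-norm (chessboard) distance of every pixel from the fixation
    point (fy, fx), normalized by the image width W."""
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    return torch.maximum((xs - fx).abs(), (ys - fy).abs()).float() / W
```

Because the metric is an ℓ∞ distance, iso-eccentricity contours are square rings, which can be sliced directly out of the image tensor instead of masking circles.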
30. Estimating Visual Acuity
• The acuity of color vision decreases exponentially with eccentricity.
• The acuity of grey vision is generally much lower, and is minimal at the fixation point.
• We approximate color and grey acuity as:
  𝒟_C(e; σ_C) = max(Λ(e; 0, σ_C), ς(e; 0, 2.5σ_C))
  𝒟_R(e; σ_R, m) = m(1 − 𝒟_C(e; σ_R))
• Λ and ς are the PDFs of the Laplace and Cauchy distributions.
• We set σ_C = 0.12, σ_R = 0.09, and m = 0.12 (see the sketch below).
Acuity data: Wilkinson et al., Journal of Vision, 20(10):7, 2020.
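A sketch of these acuity curves. The talk calls Λ and ς PDFs; here both are rescaled to peak at 1 so that the acuities lie in [0, 1], which is our assumption:

```python
import torch

def color_acuity(e, sigma_c=0.12):
    """D_C(e): pointwise max of a Laplace-shaped and a Cauchy-shaped curve,
    both centered at 0 and rescaled to peak at 1 (normalization assumed)."""
    laplace = torch.exp(-e.abs() / sigma_c)
    s = 2.5 * sigma_c
    cauchy = 1.0 / (1.0 + (e / s) ** 2)
    return torch.maximum(laplace, cauchy)

def grey_acuity(e, sigma_r=0.09, m=0.12):
    """D_R(e) = m * (1 - D_C(e; sigma_r)): minimal at the fixation point."""
    return m * (1.0 - color_acuity(e, sigma_r))
```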
31. Quantizing Visual Acuity
• The visual acuity at a pixel determines the std. dev. of the Gaussian blur applied to it.
• #kernels = #unique acuity values = #unique eccentricity values = W.
• To improve efficiency, we quantize the estimated visual acuity values.
32. Applying Blur
• We compute the std. dev. of the Gaussian kernel at each pixel as β𝒟(e_{p_x,p_y}), where 𝒟(e_{p_x,p_y}) is the estimated acuity and β = 0.05 (see the sketch below).
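A sketch combining the quantization above with the blur step. The number of levels, the kernel size, and the level placement are illustrative; we follow the slide's σ = β𝒟(e) formula as written, though any monotone mapping from acuity to blur strength plugs in at the same place:

```python
import torch
import torchvision.transforms.functional as TF

def quantized_blur(img, acuity, beta=0.05, n_levels=5):
    """Snap each pixel's acuity to one of n_levels values, then apply one
    Gaussian kernel per level (sigma = beta * D, per the talk)."""
    levels = torch.linspace(float(acuity.min()), float(acuity.max()), n_levels)
    idx = (acuity.unsqueeze(-1) - levels).abs().argmin(dim=-1)
    out = torch.zeros_like(img)
    for i, d in enumerate(levels):
        sigma = max(float(beta * d), 1e-3)  # gaussian_blur requires sigma > 0
        blurred = TF.gaussian_blur(img, kernel_size=9, sigma=sigma)
        out += blurred * (idx == i).to(img.dtype)  # keep this level's pixels
    return out
```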
33. Desaturation via Combination
• The blurred grey and color images are combined pixelwise.
• The pixel weights are the color and grey visual acuity values (an end-to-end sketch follows below).
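An end-to-end sketch tying the previous pieces together; the grey conversion, the noise level, and the normalization of the weights are our assumptions:

```python
import torch

def r_blur(img, fy, fx, sigma_noise=0.125):
    """Sketch of R-Blur: noise -> color/grey split -> adaptive blur -> combine."""
    C, H, W = img.shape
    img = img + sigma_noise * torch.randn_like(img)      # 2. Gaussian noise
    ecc = eccentricity_map(H, W, fy, fx)                 # max-norm eccentricity
    a_c, a_r = color_acuity(ecc), grey_acuity(ecc)       # per-pixel acuities
    grey = img.mean(dim=0, keepdim=True).expand_as(img)  # 3. grey channel
    color_b = quantized_blur(img, a_c)                   #    adaptive blurring
    grey_b = quantized_blur(grey, a_r)
    # 4. Pixelwise combination weighted by the (normalized) acuity maps.
    w = a_c / (a_c + a_r)
    return w * color_b + (1.0 - w) * grey_b
```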
34. Evaluation: Models and Baselines
[Figure: evaluated models, all based on ResNet18 — a plain ResNet baseline, R-Blur, R-Warp, and adversarial training — each evaluated under PGD attack]
35. Evaluation: Datasets
| Dataset | #Train | #Val | #Test |
|---|---|---|---|
| Ecoset | 1.4 M | 28 K | 28 K* |
| Ecoset-10 | 50 K | 1 K | 1 K |
| Ecoset-100 | 480 K | 5 K | 5 K* |

*We use 1,000 testing images.
36. Result #1: R-Blur Improves Empirical Robustness
• Measured accuracy under Auto-PGD (Croce & Hein, 2020).
• R-Blur is much better than the ResNet baseline and R-Warp.
• R-Blur is not as good as adversarial training, but that is expected: AT is trained on adversarially perturbed data.
[Figure: accuracy under Auto-PGD on Ecoset and Ecoset-100]
37. Computing Certifiable Robustness
• Accuracy under Auto-PGD is not a formal guarantee.
• Certifiably Correct @ r ≡ 𝔼_{δ: ‖δ‖₂ ≤ r}[1(f(x + δ) = y*)] ≥ 0.999
  • The model f classifies x certifiably correctly at radius r if it correctly classifies x 99.9% of the time under random perturbations of size at most r.
• Certified Accuracy @ r ≡ 𝔼_{x,y∼D}[CC(x, y; f)]
  • The certified accuracy at radius r is the fraction of test data on which the prediction of f is certifiably correct.
38. Computing Certifiable Robustness
• We used randomized smoothing (Cohen+, 2019) to compute certified accuracy at different perturbation sizes (radii):
1. Perturb the input with 10⁵ noise samples from 𝒩(0, σ).
2. Obtain the model's prediction for each sample.
3. Compute the binomial p-value to determine the maximum r at which the model classifies the image certifiably correctly (a sketch follows below).
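A simplified sketch of this procedure: we certify the true label y directly rather than first estimating the top class as in the full Cohen+ (2019) algorithm, and the batching is illustrative:

```python
import torch
from scipy.stats import binomtest, norm

def certify_radius(model, x, y, sigma, n=10**5, alpha=0.001, batch=1000):
    """Certified l2 radius for input x via randomized smoothing."""
    correct = 0
    with torch.no_grad():
        for _ in range(n // batch):
            # 1. Perturb the input with Gaussian noise samples.
            noisy = x.unsqueeze(0) + sigma * torch.randn(batch, *x.shape)
            # 2. Count how often the model still predicts y.
            correct += (model(noisy).argmax(dim=1) == y).sum().item()
    # 3. One-sided (1 - alpha) lower confidence bound on P[f(x + noise) = y].
    p_low = binomtest(correct, n, alternative="greater").proportion_ci(1 - alpha).low
    if p_low <= 0.5:
        return None                     # abstain: no certificate
    return sigma * norm.ppf(p_low)      # maximum certified radius r
```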
39. Result #2: R-Blur is Certifiably Robust
• G-Noise is a model trained with Gaussian noise only (no blurring).
• The robustness of R-Blur is certifiable.
• R-Blur achieves better robustness than G-Noise against larger perturbations.
[Figure: certified accuracy vs. radius on Ecoset-100 and Ecoset]
40. Result #3: R-Blur is Robust to Common Corruptions
• We want models to be robust to a variety of perturbations, not just those with small ℓ_p norms.
• Images were perturbed by 19 non-adversarial corruptions at 5 severity levels.
• R-Blur achieves higher accuracy than all other models when images are severely corrupted.
[Figure: accuracy under common corruptions on Ecoset-100 and Ecoset]
41. Result #4: All Components of R-Blur Contribute to Robustness
• Removing any single component of R-Blur leads to a reduction in accuracy under adversarial attack.
43. Result #6: Accuracy of R-Blur Depends on the Fixation Point
• A small (but significant) fraction of images is highly sensitive to the fixation point.
• We use an oracle to obtain the model's prediction at 49 fixation points and pick the one at which it is correct (if such a point exists).
44. Result #6: Accuracy of R-Blur Depends on the Fixation Point
• Under oracle fixation-point selection, the accuracy of R-Blur increases to within 2% of the ResNet baseline.
• Methods for predicting the fixation point from the image are part of ongoing work.
[Figure: clean accuracy with 5-fixation averaging vs. oracle fixation selection]
45. Conclusion
• R-Blur significantly improves the robustness of CNNs without being trained on perturbed data.
• The robustness of R-Blur generalizes better than AT to different perturbation types.
• R-Blur shows the promise of biologically motivated approaches to model design, especially as they relate to robustness.
• With predefined fixation points, the clean accuracy of R-Blur is lacking.
• Appropriate fixation-point selection can mitigate the loss in accuracy considerably.
46. References
Ramezani, F., Kheradpisheh, S. R., Thorpe, S. J., and Ghodrati, M. Object categorization in visual periphery is modulated by delayed foveal noise. Journal of Vision, 19(9):1–1, 2019.
Stewart, E. E. M., Valsecchi, M., and Schütz, A. C. A review of interactions between peripheral and foveal vision. Journal of Vision, 20(12):2–2, 2020. ISSN 1534-7362. doi: 10.1167/jov.20.12.2.
Hansen, T., Pracejus, L., and Gegenfurtner, K. R. Color perception in the intermediate periphery of the visual field. Journal of Vision, 9(4):26–26, 2009.
Wang, H., et al. High-frequency component helps explain the generalization of convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
Dapello, J., Marques, T., Schrimpf, M., Geiger, F., Cox, D., and DiCarlo, J. J. Simulating a primary visual cortex at the front of CNNs improves robustness to image perturbations. Advances in Neural Information Processing Systems, 33:13073–13087, 2020.
Mehrer, J., Spoerer, C. J., Jones, E. C., Kriegeskorte, N., and Kietzmann, T. C. An ecologically motivated image dataset for deep learning yields better models of human vision. Proceedings of the National Academy of Sciences, 118(8):e2011417118, 2021.
Krizhevsky, A., Nair, V., and Hinton, G. CIFAR-10 (Canadian Institute for Advanced Research). URL http://www.cs.toronto.edu/~kriz/cifar.html.
Croce, F. and Hein, M. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International Conference on Machine Learning, pp. 2206–2216. PMLR, 2020.
Cohen, J., Rosenfeld, E., and Kolter, Z. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pp. 1310–1320. PMLR, 2019.
Geirhos, R., et al. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In International Conference on Learning Representations, 2018.
Editor's Notes
I’ll start by providing some background on adversarial attacks against ML models
Consider the image on the left, which clearly shows a cute panda. If we pass this image to a deep learning model trained on ImageNet, it also correctly identifies the panda.
Now consider the image on the right. This image is obtained by superimposing the patch of seemingly random noise, delta, shown in the middle, onto the original image. The noise is scaled to a very small magnitude, epsilon, so that the modified image looks identical to the original. However, our ImageNet model, which correctly classified the original image, classifies the modified image as a gibbon with 99% confidence.
I’ll begin by describing the problem of adversarial vulnerability in deep neural networks
We have two classifiers, the human and the ML classifier, and an adversary.
Under the threat model that we are considering, the goal of the adversary is to modify an input sample in such a way that the ML classifier responds to it differently than a human would.
The objective of the ML classifier is to replicate the human’s decision function for the given task.
Consider the case of visual perception.
When the human sees this image, they will recognize it as a cat. When making this decision the human brings a vast amount of experiential and, even, scientific knowledge to bear.
In addition to this specific image, the human will also consider several perceptual variants of the same image as cats, as long as they exhibit the features that the human knows to be characteristic of cats.
The ML classifier, on the other hand, is provided only very sparse information by the human.
For example, the human will tell the classifier that these three images contain cats, without telling it what it actually means to be a cat, and that this image is that of a dog.
Now suppose that these images are arranged as shown in the human’s perceptual space. The image of the dog represents the entire perceptual region that contains dogs.
The classifier’s job is to discriminate between cats and dogs.
With the information the classifier is provided it can learn a boundary, such as this one, that intersects the perceptual region containing cats.
In fact, it can learn many such boundaries, all of which would be equally optimal given the information provided to the model.
However, some of these boundaries make the model vulnerable to adversarial attacks.
To craft its attack, the adversary searches for points within the human’s perceptual boundary for which the ML classifier responds differently than the human.
Since in most cases the human’s perceptual boundary is unknown (even to us humans), the adversary approximates it using a metric ball of a small radius around the data point.
In this example, this region here contains points that the human would consider to be cats, but that the classifier would classify as dogs.
If the classifier were somehow able to accurately model the perceptual boundary of the human, it would be robust to adversarial attacks, at least under the threat model we are considering.
In this case, the adversary cannot find a point within the human's perceptual region for a cat that the classifier will classify as a dog, and vice versa. Any point the adversary picks that changes the decision of the classifier will also change the decision of the human.
One of the simplest techniques is the fast gradient sign method (FGSM). FGSM computes the perturbation by taking the gradient of the loss function w.r.t. the input, x, and scaling its sign by epsilon.
A more powerful technique uses projected gradient descent to iteratively find a perturbation that maximizes the loss while remaining imperceptibly small. Like FGSM, this technique computes the perturbation using the gradient of the loss function w.r.t. x. [START CLICKING FOR ANIMATION]