Adversarial Robustness
Muhammad Ahmed Shah
04/19/2022
Outline
• Part I: Introduction to Adversarial Perturbations
• What are adversarial perturbations?
• How are adversarial perturbations created?
• How to defend against adversarial perturbations?
• Why do adversarial perturbations exist?
• Part II: Past and Current Projects in Our Group
• Towards Adversarial Robustness via Compact Feature Representations
• Biologically Inspired Models for Adversarial Robustness
Adversarial Attacks on ML Models
(Figure: a clean input x, a small perturbation δ with norm bounded by ε, and the resulting adversarial example x + δ, which changes the model's prediction.)
Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
Adversarial Vulnerability of DNNs
• We have two classifiers: the human and the ML classifier. A third party, the adversary, tries to fool the ML classifier.
Adversarial Vulnerability of DNNs
• The ML classifier is trying to replicate the human’s decision function.
Adversarial Vulnerability of DNNs
• To teach the classifier, the human provides it with very sparse feedback ("That's a cat!").
Adversarial Vulnerability of DNNs
• The adversary searches for points within (an approximation of) the human's perceptual boundary for which the ML classifier responds differently than the human does.
Adversarial Vulnerability of DNNs
• If the classifier accurately models the perceptual boundary, the adversary would have to find a point outside the boundary (i.e., something that no longer looks like a cat to the human) to change the classifier's output.
Outline
• Part I: Introduction to Adversarial Perturbations
• What are adversarial perturbations?
• How are adversarial perturbations created?
• How to defend against adversarial perturbations?
• Why do adversarial perturbations exist?
• Part II: Past and Current Projects in Our Group
• Towards Adversarial Robustness via Compact Feature Representations
• Biologically Inspired Models for Adversarial Robustness
Adversarial Attack Methods
• Fast Gradient Sign Method (FGSM) [Goodfellow+2014]
$x_{\mathrm{adv}} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x \mathcal{L}(f(x), y)\big)$
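For concreteness, here is a minimal PyTorch sketch of FGSM under the formula above; `model`, `loss_fn`, and the clipping range are assumptions (a generic classifier, a cross-entropy-style loss, and inputs scaled to [0, 1]), not details from the slides.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon, clip_min=0.0, clip_max=1.0):
    """One-step FGSM: move x by epsilon in the direction of the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # x_adv = x + eps * sign(grad_x L(f(x), y))
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # keep the result in the valid input range
    return x_adv.clamp(clip_min, clip_max).detach()
```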
Adversarial Attack Methods
• Projected Gradient Descent (PGD) [Madry+2018]
1. $\delta \leftarrow \Pi_\epsilon(U(-1, 1))$
2. for $k = 1 \to K$:
3. $\quad \delta' \leftarrow \Pi_\epsilon\big(\delta + \nabla_\delta \mathcal{L}(f(x + \delta), y)\big)$
4. $\quad \delta \leftarrow \Pi_x(x + \delta') - x$
$\Pi_\epsilon$ is a projection onto an $\ell_p$-norm ball of radius $\epsilon$; usually the $\ell_\infty$ or $\ell_2$ norm is used. $\Pi_x$ projects onto the valid input domain of $x$, usually $[-1, 1]$ for images.
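A minimal PyTorch sketch of the PGD loop for the ℓ∞ case; the step size `alpha` and the sign-of-gradient update are common additions not written on the slide, and `model`/`loss_fn` are assumed as in the FGSM sketch.

```python
import torch

def pgd_attack(model, loss_fn, x, y, epsilon, alpha=0.01, steps=40,
               clip_min=0.0, clip_max=1.0):
    """PGD with an l_inf constraint: random start, repeated ascent, projection after each step."""
    # delta = Pi_eps(U(-1, 1)): random initialization inside the epsilon-ball
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
    for _ in range(steps):
        delta = delta.detach().requires_grad_(True)
        loss = loss_fn(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # ascend the loss, then project back onto the l_inf ball of radius epsilon (Pi_eps)
        delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon)
        # keep x + delta inside the valid input domain (Pi_x)
        delta = (x + delta).clamp(clip_min, clip_max) - x
    return (x + delta).detach()
```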
Outline
• Part I: Introduction to Adversarial Perturbations
• What are adversarial perturbations?
• How are adversarial perturbations created?
• How to defend against adversarial perturbations?
• Why do adversarial perturbations exist?
• Part II: Past and Current Projects in Our Group
• Towards Adversarial Robustness via Compact Feature Representations
• Biologically Inspired Models for Adversarial Robustness
Types of Adversarial Defenses
• Empirical Defenses:
• Work well in most practical scenarios (hence empirical)
• No formal proof that they will always work.
• Certifiable Defenses:
• It can be proven (certified) that the model's prediction for a given input does not change when any adversarial perturbation $\delta$ with bounded norm $\|\delta\|$ is added to it.
Adversarial Training
• AT is perhaps the most successful empirical defense against adversarial
attacks.
• Over the years, several variations of AT have been proposed that have
made it more effective.
• Basic Algorithm [Madry+2017]
1. for $(x, y) \in D$:
2. $\quad x_{\mathrm{adv}} \leftarrow \arg\max_{x' \in \mathcal{X},\, \|x - x'\| \le \epsilon} \mathcal{L}(f_\theta(x'), y)$
3. $\quad \theta \leftarrow \theta - \eta\, \nabla_\theta \mathcal{L}(f_\theta(x_{\mathrm{adv}}), y)$
• Note that we create a new adversarial example in each iteration (see the sketch below). Why?
• The model has changed, so the gradients, and thus the adversarial perturbation, will change.
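A minimal PyTorch sketch of the loop above, assuming the `pgd_attack` helper from the earlier sketch (any attack would do) to approximate the inner maximization.

```python
import torch

def adversarial_training_epoch(model, loss_fn, optimizer, loader, epsilon):
    """One epoch of adversarial training: inner max via an attack, outer min via SGD."""
    model.train()
    for x, y in loader:
        # craft a fresh adversarial example against the *current* weights
        x_adv = pgd_attack(model, loss_fn, x, y, epsilon)
        optimizer.zero_grad()
        loss = loss_fn(model(x_adv), y)   # L(f_theta(x_adv), y)
        loss.backward()
        optimizer.step()
```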
Adversarial Training
• Some issues:
• Multiple optimization steps required for computing adversarial perturbations
=> slow and computationally intensive => difficult to scale to larger
models/datasets.
• Robust overfitting – robust accuracy improves on the train set but decreases
on the testing set.
• Overfitting to the attack type and attack parameters used during training.
Incremental Improvements to Adversarial
Training
• Logit Pairing [Kannan+18]:
$\theta \leftarrow \theta - \eta\, \nabla_\theta \big[\mathcal{L}(f_\theta(x_{\mathrm{adv}}), y) + \alpha\, \|f_\theta(x) - f_\theta(x_{\mathrm{adv}})\|\big]$
• TRADES [Zhang+19]:
$x_{\mathrm{adv}} = \arg\max_{x' \in \mathcal{X},\, \|x - x'\| \le \epsilon} \mathcal{L}\big(f_\theta(x), f_\theta(x')\big)$
$\theta \leftarrow \theta - \eta\, \nabla_\theta \big[\mathcal{L}(f_\theta(x), y) + \mathcal{L}\big(f_\theta(x), f_\theta(x_{\mathrm{adv}})\big)/\lambda\big]$
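A sketch of a TRADES-style objective matching the second update above, using a KL divergence between clean and adversarial predictions as the consistency loss ℒ(f(x), f(x_adv)); how `x_adv` is produced and the value of `lambda_` are left as assumptions.

```python
import torch.nn.functional as F

def trades_style_loss(model, x, x_adv, y, lambda_=1.0):
    """Clean cross-entropy plus a clean-vs-adversarial consistency term scaled by 1/lambda."""
    logits_clean = model(x)
    logits_adv = model(x_adv)
    clean_loss = F.cross_entropy(logits_clean, y)
    # KL(f(x) || f(x_adv)) as the consistency loss between clean and adversarial predictions
    consistency = F.kl_div(F.log_softmax(logits_adv, dim=1),
                           F.softmax(logits_clean, dim=1),
                           reduction="batchmean")
    return clean_loss + consistency / lambda_
```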
Speeding Up Adversarial Training [Wong+20]
• Use a single FGSM step instead
of multiple PGD steps during
adversarial training.
• Generally this does not work, but
the authors state that the key is
random initialization of the
perturbation.
• Can train an ImageNet model in 12 hours, compared to the roughly 50 hours required by earlier adversarial training approaches.
State-of-the-Art Adversarial Training [Rebuffi+21]
• Combines model weight averaging, data
augmentation and synthetic data
generation to achieve SOTA robust
accuracy on CIFAR-10
• Weight Averaging (an exponential moving average of the weights; see the sketch below):
1. $\theta_{t+1} = \theta_t - \eta\, \nabla_\theta \mathcal{L}$
2. $\bar{\theta}_{t+1} = \tau\, \theta_{t+1} + (1 - \tau)\, \bar{\theta}_t$
• Uses TRADES to perform AT.
• Key Outcomes:
• Weight Averaging improves robustness
• CutMix data augmentation provides the best
clean-robust accuracy tradeoff
• Increasing the training data improves robust
accuracy
• Using a small amount of synthetic data is beneficial.
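A small sketch of the weight-averaging step above: a separate copy of the model accumulates a moving average of the weights after every optimizer update (the value of τ and the handling of batch-norm buffers are simplifications of mine).

```python
import copy
import torch

def make_weight_averager(model, tau=0.01):
    """theta_bar <- tau * theta + (1 - tau) * theta_bar, applied after each training step."""
    avg_model = copy.deepcopy(model)

    @torch.no_grad()
    def update():
        for p_avg, p in zip(avg_model.parameters(), model.parameters()):
            p_avg.mul_(1.0 - tau).add_(tau * p)

    return avg_model, update

# usage: avg_model, update = make_weight_averager(model); call update() after optimizer.step()
```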
Randomized Smoothing [Cohen+19]
• RS is a very popular certifiable defense.
• Smoothed Classifier $g$:
$g(x) = \arg\max_{c \in \mathcal{Y}} P_{\varepsilon \sim \mathcal{N}(0, \sigma^2 I)}\big[f(x + \varepsilon) = c\big]$
• Smoothing to Robustness:
• Let $p_A$ and $p_B$ be the probabilities of the most probable class and the second-most probable class, respectively, and let $\Phi$ be the standard Gaussian CDF.
• $g$ is robust within an $\ell_2$ ball of radius $R = \frac{\sigma}{2}\big(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\big)$ around $x$.
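The radius above is straightforward to compute; a small SciPy sketch (in practice $p_A$ and $p_B$ are replaced by confidence bounds estimated from noisy samples, as on the next slide).

```python
from scipy.stats import norm

def certified_radius(p_a, p_b, sigma):
    """l2 radius certified by randomized smoothing: R = sigma/2 * (Phi^-1(pA) - Phi^-1(pB))."""
    return 0.5 * sigma * (norm.ppf(p_a) - norm.ppf(p_b))

# e.g. certified_radius(0.9, 0.05, sigma=0.25) is roughly 0.37
```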
Randomized Smoothing [Cohen+19]
• SAMPLEUNDERNOISE:
1. for $i = 1 \to n$:
2. $\quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I)$
3. $\quad c = f(x + \varepsilon)$
4. $\quad counts[c] \mathrel{+}= 1$
• BINOMPVALUE returns the p-value of the two-sided hypothesis test that $n_A \sim \mathrm{Binomial}(n_A + n_B, p)$.
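A sketch of the sampling loop and the binomial test above, assuming `model` returns logits and `x` is a single example with a batch dimension; `scipy.stats.binomtest` supplies the two-sided p-value.

```python
import torch
from collections import Counter
from scipy.stats import binomtest

@torch.no_grad()
def sample_under_noise(model, x, n, sigma):
    """Classify n Gaussian-noised copies of x and count the predicted classes."""
    counts = Counter()
    for _ in range(n):
        eps = sigma * torch.randn_like(x)
        c = model(x + eps).argmax(dim=-1).item()
        counts[c] += 1
    return counts

def top_class_pvalue(counts, p=0.5):
    """p-value of the test that n_A ~ Binomial(n_A + n_B, p) for the two most frequent classes."""
    (c_a, n_a), (_, n_b) = counts.most_common(2)
    return c_a, binomtest(n_a, n_a + n_b, p).pvalue
```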
Outline
• Part I: Introduction to Adversarial Perturbations
• What are adversarial perturbations?
• How are adversarial perturbations created?
• How to defend against adversarial perturbations?
• Why do adversarial perturbations exist?
• Part II: Past and Current Projects in Our Group
• Towards Adversarial Robustness via Compact Feature Representations
• Biologically Inspired Models for Adversarial Robustness
The Classifier is Learning Non-Robust
Features
• The features extracted from the data can be of 3 types:
• Robust:
• Features that are likely to be meaningful to humans as well.
• Non-robust:
• Features that are useful for the task at hand but are either not what humans would use,
or their usefulness is an artifact of the dataset
• Useless:
• Features that cannot be used to improve performance on the task.
Example: XOR
• Now the algorithm has been provided this new table instead
• Target Function: 𝑌 = 𝑋1 xor 𝑋2
• 𝑋3 is a spurious input
X1 X2 X3 Y
0 0 0 0
0 1 0 1
1 0 0 1
1 1 0 0
0 0 1 0
0 1 1 1
1 0 1 1
1 1 1 0
Example: XOR
• The algorithm can learn any of these patterns for the unseen input
combinations
• Only one is right for our target function
• If it learns any of the others, the output for some combinations of X1 and X2
can be made erroneous by choosing the right X3
Observed rows:
X1 X2 X3 Y
0 0 0 0
0 1 0 1
1 0 0 1
1 1 0 0
0 0 1 0
1 1 1 0
Possible completions for the unseen inputs (X1, X2, X3) = (0, 1, 1) and (1, 0, 1): Y = (0, 0), (1, 0), (0, 1), or (1, 1). Only the completion (1, 1) matches the target function.
Example: XOR
• The number of missing patterns is exponential in the number of spurious bits
• The number of possible extensions to the table is exponential in the number of
missing patterns
• The number of ways of adversarially modifying inputs increases super-
exponentially with the number of spurious bits
(The same observed table and candidate completions as on the previous slide.)
With $K$ input bits and $D$ observed rows, there are $2^K - D$ missing patterns and $2^{2^K - D}$ possible extensions of the table.
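A small Python sketch that makes the counting argument concrete for the toy table above (K = 3 input bits, D = 6 observed rows): it enumerates the missing input patterns and the possible completions.

```python
from itertools import product

# observed (X1, X2, X3) -> Y rows from the slide
observed = {
    (0, 0, 0): 0, (0, 1, 0): 1, (1, 0, 0): 1, (1, 1, 0): 0,
    (0, 0, 1): 0, (1, 1, 1): 0,
}

K = 3
missing = [bits for bits in product((0, 1), repeat=K) if bits not in observed]
print(missing)                     # [(0, 1, 1), (1, 0, 1)] -> 2**K - D = 2 missing patterns
print(2 ** len(missing))           # 4 possible extensions = 2**(2**K - D)

# only the extension that assigns Y = X1 xor X2 to the missing rows matches the target
print({bits: bits[0] ^ bits[1] for bits in missing})   # {(0, 1, 1): 1, (1, 0, 1): 1}
```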
Robust and Non-Robust Features in Images
• Ilyas+18 used adversarial attacks
to disentangle robust and non-
robust features
• Non-robust features can provide
good accuracy on clean data.
High Frequency Features [Wang+20]
• [Wang+20] identifies a particular type of
non-robust feature – high frequency
components.
• Human perception generally operates on a bounded band of frequencies, both in visual and in auditory perception.
• HF components in audio are inaudible to humans (e.g., a dog whistle).
• HF components in images look like noise to us, and we try to filter them out.
• The HF components of an image are separated by computing the Fourier transform of the image and thresholding the distance of each frequency component from the centroid of the (shifted) spectrum.
https://homepages.inf.ed.ac.uk/rbf/HIPR2/fourier.htm
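A NumPy sketch of one plausible reading of this separation (the threshold radius r and the use of a centered FFT are assumptions): keep frequencies within radius r of the spectrum's centre as the LF part and the rest as the HF part.

```python
import numpy as np

def split_frequencies(img, r):
    """Split a grayscale image into low- and high-frequency parts via a radial mask in Fourier space."""
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)   # distance from the spectrum's centre
    low_mask = dist <= r
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)))
    high = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)))
    return low, high   # low + high reconstructs img up to numerical error
```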
High Frequency Features [Wang+20]
• It turns out image classifiers are
heavily reliant on high-frequency
components.
• If a trained model is presented with
only the low frequency components, it
performs poorly.
• If it is presented with only the high
frequency components it predicts
correctly
• In the initial epochs, using only the LF components does not harm training accuracy, but in later epochs it does.
• ⇒ Models learn LF components first, then they learn HF components.
High Frequency Features [Wang+20]
• Frequency Components and
Robustness:
• Models that rely heavily on HF components are vulnerable:
• Humans don't rely on HF components, so changing HF components will likely not change the human's prediction.
• If a convolutional filter heavily weights HF components, the model will rely on HF components.
• Hypothesis: Smoothing the
filter will improve robustness
• There is a noticeable
improvement.
Differences between Biological Systems and
ML Models [Firestone+20]
• Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples.
• Humans have foveated vision.
• Adding a foveated filter to images
before passing them to CNNs yields
more semantically meaningful
perturbations
Differences between Biological Systems and
ML Models [Firestone+20]
• Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples.
• Humans have foveated vision.
• Machines have a restricted
vocabulary
1. Crossword
2. Bagel
3. Starfish
4. School bus
Differences between Biological Systems and
ML Models [Firestone+20]
• Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples.
• Humans have foveated vision.
• Machines have a restricted
vocabulary
• Different inductive biases – shape
vs. texture
Differences between Biological Systems and
ML Models [Firestone+20]
• Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples.
• Humans have foveated vision.
• Machines have a restricted vocabulary
• Different inductive biases – shape vs.
texture
• Machines (usually) perform feed
forward processing, while humans can
do recurrent processing
Outline
• Part I: Introduction to Adversarial Perturbations
• What are adversarial perturbations?
• How are adversarial perturbations created?
• How to defend against adversarial perturbations?
• Why do adversarial perturbations exist?
• Part II: Past and Current Projects in Our Group
• Towards Adversarial Robustness via Compact Feature Representations
• Biologically Inspired Models for Adversarial Robustness
Towards Adversarial Robustness via Compact
Feature Representations [Shah+21a]
Recall the Example: XOR
• Now the algorithm has been provided this new table instead
• Target Function: 𝑌 = 𝑋1 xor 𝑋2
• 𝑋3 is a spurious input
X1 X2 X3 Y
0 0 0 0
0 1 0 1
1 0 0 1
1 1 0 0
0 0 1 0
0 1 1 1
1 0 1 1
1 1 1 0
Neurons Are Features in DNNs
• Too few neurons → features are under-specified → low accuracy
• Too many neurons → features are over-specified → adversarial vulnerability
We propose a method to identify superfluous features and remove/reduce their
influence on the model’s output
Identifying Superfluous Features
• Let $\mathbf{f} = [f_1, \ldots, f_n]$ be the feature vector.
• We can decompose each $f_i \in \mathbf{f}$ as
$f_i = \phi(\mathbf{f}_{-i}) + \delta_i$
• $\delta_i$ is the novel information encoded by $f_i$, and $\phi(\mathbf{f}_{-i})$ is the redundant information.
• We quantify the usefulness of $f_i$ as $I(\delta_i, Y)$, where $Y$ is the true label.
• Less useful (superfluous) features are not related to the true label and can be exploited by the adversary.
Decomposing the Features [Shah+20]
• We assume $\phi$ to be a linear function:
$\phi(\mathbf{f}_{-i}) = \mathbf{f}_{-i}\, \mathbf{a}^{(i)}$
• We can find $\mathbf{a}^{(i)}$ by solving
$\mathrm{minimize}_{\mathbf{a} \in \mathbb{R}^{n-1}}\; \mathbb{E}_{x \sim \mathcal{P}}\big[\big(\mathbf{f}_{-i}\, \mathbf{a}^{(i)} - f_i\big)^2\big]$
• We can then compute $\delta_i = f_i - \phi(\mathbf{f}_{-i})$.
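A NumPy sketch of this linear decomposition: regress feature f_i on the remaining features by least squares to get a^{(i)}, and take the residual as δ_i. The activation matrix `F` (examples × features) is an assumed input.

```python
import numpy as np

def decompose_feature(F, i):
    """Return a_i (linear coefficients over the other features) and delta_i (the novel part of f_i)."""
    f_i = F[:, i]                            # the feature being explained
    F_rest = np.delete(F, i, axis=1)         # f_{-i}: all remaining features
    # minimize || F_rest @ a - f_i ||^2, i.e. the phi(f_{-i}) = f_{-i} a^(i) assumption
    a_i, *_ = np.linalg.lstsq(F_rest, f_i, rcond=None)
    delta_i = f_i - F_rest @ a_i             # novel information encoded by f_i
    return a_i, delta_i
```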
Determining the Usefulness of A Feature
[Shah+20]
• Recall: We define the usefulness of 𝑓𝑖 as 𝐼(𝛿𝑖, 𝑌)
• 𝐼 𝛿𝑖, 𝑌 is difficult to compute so we estimate it using two methods:
• First-order approximation:
$I(\delta_i, Y) \approx \nabla_{f_i} \mathcal{L}(X, Y)\, \delta_i$
• MINE [Belghazi+2018]: Uses a neural network to estimate MI.
• We rank the features by their usefulness and remove the least useful
features using LRE-AMC
Removing Features Using LRE-AMC
[Shah+21b]
• Given the neuron-feature equivalence $f_i = \sigma(W_i\, \mathbf{f})$.
• To remove $f_i$ we can simply remove the $i$-th column of the next layer's weight matrix $W$.
• This also removes the influence of $\phi(\mathbf{f}_{-i})$, which may cause the weights in the next layer ($W$) to become sub-optimal.
• To mitigate the error, we make the following adjustment (see the sketch below):
$W_{kj} \leftarrow W_{kj} + W_{ki}\, \mathbf{a}^{(i)}_j$
• After removing the neuron, we fine-tune the network.
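A NumPy sketch of the adjustment above for one fully connected layer; `W_next` (the next layer's weight matrix, shape out × n) and `a_i` (the regression coefficients of feature i on the remaining features, ordered like the remaining columns) are hypothetical names of mine.

```python
import numpy as np

def remove_feature(W_next, i, a_i):
    """Drop column i of the next layer's weights and fold its redundant part into the rest:
    W[k, j] <- W[k, j] + W[k, i] * a_i[j] for every remaining feature j."""
    col_i = W_next[:, i].copy()
    W_reduced = np.delete(W_next, i, axis=1)
    return W_reduced + np.outer(col_i, a_i)
```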
Lossless Redundancy Elimination
• Consider this network:
• $y_1 = w_{11} z_1 + w_{12} z_2 + w_{13} z_3$
• $y_2 = w_{21} z_1 + w_{22} z_2 + w_{23} z_3$
• Suppose $z_1 = z_2 + z_3$.
• We can remove $z_1$ and readjust the weights:
• $w_{12} \leftarrow w_{12} + w_{11}$, $w_{13} \leftarrow w_{13} + w_{11}$
• $w_{22} \leftarrow w_{22} + w_{21}$, $w_{23} \leftarrow w_{23} + w_{21}$
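A quick NumPy check of the worked example above with arbitrary weight values: whenever z1 = z2 + z3, dropping z1 and folding w·1 into the z2 and z3 columns leaves y1 and y2 unchanged.

```python
import numpy as np

W = np.array([[1.0, 2.0, 3.0],      # w11, w12, w13
              [4.0, 5.0, 6.0]])     # w21, w22, w23
z2, z3 = 0.7, -0.2
z = np.array([z2 + z3, z2, z3])     # z1 = z2 + z3

y_before = W @ z
W_new = W[:, 1:] + W[:, [0]]        # w12+w11, w13+w11 (and likewise in the second row)
y_after = W_new @ np.array([z2, z3])
print(np.allclose(y_before, y_after))   # True
```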
Evaluation Setup
• Models
• VGG-16 trained on CIFAR-10
• AlexNet trained on CIFAR-10
• LeNet trained on MNIST
• Attack Settings:
• PGD adversary with various ℓ∞ and ℓ2 constraints
• Baseline:
• Adversarial Training [Madry+17]
• Gaussian Smoothing [Cohen+19]
• Vanilla LRE-AMC (ranks and prunes neurons with low 𝛿𝑖)
Results

Model | Method | −ΔP | Acc_cln | ℓ∞=0.015 | ℓ∞=0.031 | ℓ∞=0.062 | ℓ2=0.5 | ℓ2=1.0 | ℓ2=2.0
LeNet | None | 0 | 99.1 | 1.2 | 0 | 0 | 18 | 3.97 | 1
LeNet | Ours (MI) | 86.8 | 95.7 | 13.1 | 12.1 | 10.2 | 14.9 | 12.7 | 11.5
LeNet | Ours (FO) | 93.9 | 96.2 | 6.3 | 4.5 | 2.8 | 6.3 | 2.8 | 0.8
AlexNet | None | 0 | 77.5 | 8.9 | 0.3 | 0.08 | 7.23 | 0.2 | 0.06
AlexNet | LRE-AMC | 97.7 | 74.6 | 23.7 | 17 | 14.3 | 25 | 17.3 | 14.5
AlexNet | Ours (FO) | 98.3 | 72.3 | 14.6 | 10.1 | 9.5 | 13.2 | 9.8 | 9.2
AlexNet | Ours (MI) | 98.3 | 72.2 | 10.2 | 4.2 | 3.2 | 9.5 | 4.3 | 3.7
VGG-16 | None | 0 | 90.3 | 1.4 | 0 | 0 | 4 | 1.8 | 0.6
VGG-16 | AdvTrain | 0 | 74.9 | 57.1 | 37.1 | 8.6 | 53.2 | 27.6 | 3.5
VGG-16 | GSmooth | 0 | 82.9 | 43.5 | 13.8 | 0.8 | 47.6 | 16.6 | 1.0
VGG-16 | Ours (FO) | 87.7 | 85.6 | 20.0 | 17.4 | 13.3 | 20.6 | 19.3 | 15.8
VGG-16 | LRE-AMC | 84.6 | 87.7 | 11.2 | 9.3 | 5.7 | 11.5 | 9.9 | 6.8
VGG-16 | Ours (MI) | 98.3 | 85.7 | 11.8 | 9.2 | 7.0 | 12.4 | 9.5 | 7.1
• Removing spurious features
significantly improves robustness
• First-order estimation of MI
seems to work better than MINE
• The robustness gains of our techniques generalize to even larger perturbation sizes, where they outperform adversarial training and Gaussian smoothing.
• Note that our method does not employ perturbed data at any point, unlike the other defenses.
Conclusion
• We have shown that pruning neurons that encode superfluous
features improves the robustness of DNNs while making them more
compact.
• Our results appear to contradict [Nakkiran, 2019, Madry+, 2017], who posit that high-capacity models are a pre-requisite for adversarial robustness.
• Our results show that high capacity may be required at training time to learn
robust features, but judiciously removing spurious neurons/features can
make the models much more robust.
Outline
• Part I: Introduction to Adversarial Perturbations
• What are adversarial perturbations?
• How are adversarial perturbations created?
• How to defend against adversarial perturbations?
• Why do adversarial perturbations exist?
• Part II: Past and Current Projects in Our Group
• Towards Adversarial Robustness via Compact Feature Representations
• Biologically Inspired Models for Adversarial Robustness
Differences between Biological Systems and
ML Models [Firestone+20]
• Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples.
• Humans have foveated vision.
• Machines have a restricted vocabulary
• Different inductive biases – shape vs.
texture
• Machines (usually) perform feed
forward processing, while humans
can do recurrent processing
The Bayesian Brain Hypothesis [Parr+18]
• Brain encodes beliefs (in the synaptic weights) about the causes of sensory
data, and these beliefs are updated in response to new sensory
information
• Complete class theorem: there is always a prior belief that renders an observed
behavior Bayes optimal.
• Pathologies related to loss of sensory signals can be explained as there
being no observation to modulate the influence of the prior.
• Autism can be understood as weak prior beliefs about the environment
due to which patients rely to a greater degree on the stimuli obtained from
the environment.
• There are ascending and descending connections in the brain:
• Descending connections carry predictions and ascending connections carry prediction errors.
The Predictive Coding Hypothesis [Rao+99]
• “The approach postulates that
neural networks learn the
statistical regularities of the
natural world, signaling
deviations from such regularities
to higher processing centers. This
reduces redundancy by removing
the predictable, and hence
redundant, components of the
input signal”. [Rao+99]
$I \mid r \sim \mathcal{N}(f(Ur), \sigma_I)$, giving a data term $\propto \|I - f(Ur)\|^2$
$r \mid r^{td} \sim \mathcal{N}(r^{td}, \sigma_{td})$, giving a prior term $\propto \|r - r^{td}\|^2$, plus a regularizer $g(r)$
The Predictive Coding Hypothesis [Rao+99]
• Another way to look at this hypothesis is that the brain is aligning the
observations to its expectation.
• Since the brain mostly observes natural (clean) data its prior is closely
aligned with clean data
• We hypothesize that when we observe adversarially perturbed data,
the brain aligns it with its expectation and, in a way, denoises the data
internally.
Sparse Coding Hypothesis [Paiton+20]
• The neural activation patterns that arise in response to a stimulus
tend to be very sparse.
• This is true deep within cortical areas, as well as in the retina and
Lateral Geniculate Nucleus.
Sparse Coding Hypothesis [Paiton+20]
• Locally Competitive Algorithm (LCA) [Rozell+18]:
$E(u) = \|x - \Phi\, \sigma(u)\|_2^2 + \lambda\, \|\sigma(u)\|_1$
• Used to learn sparse representations from data:
1. for $k = 1 \to K$:
2. $\quad u \leftarrow u - \eta_u\, \nabla_u E(u)$
3. $\quad \Phi \leftarrow \Phi - \eta_\Phi\, \nabla_\Phi E(u)$
• $\nabla_u E(u) = \sigma'(u) \odot \big({-\Phi^T x} + \Phi^T \Phi\, \sigma(u) - \lambda \mathbf{1}\big)$
• Conventionally each activation is computed (somewhat) conditionally independently of the other activations; these are pointwise activations.
• Due to the $\Phi^T \Phi\, \sigma(u)$ term, each activation depends on the activations of all the other neurons around it; these are population activations.
Sparse Coding Hypothesis [Paiton+20] –
Relation to Adversarial Robustness
• An iso-contour is the set of points at which a function takes the same value.
• For point-wise non-linear neurons, the iso-response contours are straight:
• perturbations perpendicular to the weight vector do not change the output;
• perturbations parallel to the weight vector can change the output,
• and they can do so while being arbitrarily far from the weight vector.
• Neurons with population non-linearities (e.g. with horizontal connections) have curved iso-response contours:
• if a perturbation is perpendicular to the weight of one neuron, it might not be perpendicular to the weight of some other neuron, so the outputs of all the neurons will change.
• Curved iso-contours indicate specificity towards a preferred direction
• For highly curved surfaces perturbations need to be parallel and close to the
weight vector
• Curvature increases as over-completeness is increased.
Sparse Coding Hypothesis [Paiton+20]
• Point-wise non-linear neurons – adversarial
attack travels parallel to the target class's
weight vector until it hits the decision
boundary.
• Population non-linear neurons – attack travels
perpendicular to the iso-contours until it
reaches the target class's weight vectors and
then travels along it.
• Perturbation likely to be semantically meaningful
• Key Takeaway:
• Population non-linearities induce robustness by
increasing specificity of the neurons.
Neural Activity Covariance [Hennig+21]
• It has been observed that neurons in the brain exhibit a degree of
covariability.
• Importantly, this covariability pattern remains largely fixed in the
short term, even if it hampers task performance.
• We hypothesize that this covariability pattern might have implications
for robustness.
• The space of possible activation patterns is reduced.
• The adversary cannot modify the image in an arbitrary way; rather, the perturbations must respect the covariability pattern.
• If they do not, the neural activities will be "projected" onto the space that respects the covariability pattern, and under this projection the adversarial perturbation may no longer be adversarial.
Computationalizing Neural Covariability
• A straightforward approach:
• Covariability ≡ Covariance
• Choose a family of probability distributions to model the conditional distribution of the
activations
𝑃(𝑎𝑖|𝑎−𝑖; Σ)
• Update each 𝑎𝑖 to maximize its conditional probability.
• Advantages:
• Simple and interpretable.
• Disadvantages:
• Simple distributions tend to be unimodal, so the activations will be pushed to a single value and thus be rendered useless.
• Complicated multimodal distributions might be harder to learn and would still have a
relatively small number of stationary points.
• Covariance is symmetric by definition, however, we do not have reason to believe that such
symmetry of incoming and outgoing synaptic weights exists in the brain.
Computationalizing Neural Covariability
• Another approach:
• Covariability ≡ Predictability
• Let $a_i \approx f_W(a_{-i})$.
• Update each $a_i$ to optimize a (dis)similarity measure $D(a_i, f_W(a_{-i}))$.
• Advantages:
• Flexibility
• $f_W$ can be selected to give a larger number of stationary points
• Symmetric relationships are not required.
• Starting to look like Boltzmann Machines…
Emergent Architecture
• Synaptic connections are continuously being created and destroyed in
the brain.
• The network of connections, with features like recurrence, feedback, and skip connections, is somewhat emergent.
• We try to simulate that by considering a fully connected network.
A Fully Connected (FC) Network with
Activation Consistency Maximization
• Let $a_i \approx f_{W^l}(a_{-i})$ and let $f_{W^l}(a_{-i}) = \psi(W^l a_{-i})$.
• Alternatively, we can let $a_i \approx f_{W^l}(a)$ and constrain $W^l$ to have zeros on the diagonal.
• Inference Algorithm (see the sketch below):
1. $s \leftarrow W^f x$
2. for $i = 1 \to k$:
3. $\quad a \leftarrow \phi(s)$
4. $\quad s \leftarrow a - \eta_a\, \nabla_a \|a - \psi(W^l a)\|_2^2$ $\quad$ (with $W^l_{ii} = 0,\ \forall i$)
5. $y \leftarrow s_{1:C}$
• During training, run the inference algorithm and backpropagate through it.
• Low-magnitude weights are pruned away after training.
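A minimal PyTorch sketch of the inference loop above for the linear case ψ(z) = z, with φ = ReLU; `W_f`, `W_l`, the step size `eta`, and the assumption that the parameters require gradients (so the loop stays differentiable for training) are all mine, not the slides'.

```python
import torch

def fc_consistency_forward(x, W_f, W_l, num_classes, steps=32, eta=0.1, psi=lambda z: z):
    """Run `steps` iterations of activation-consistency optimization; the first C units are the logits."""
    W_l_nodiag = W_l - torch.diag(torch.diagonal(W_l))     # enforce W_l[i, i] = 0
    s = x @ W_f.t()                                        # s <- W_f x
    for _ in range(steps):
        a = torch.relu(s)                                  # a <- phi(s)
        # consistency objective: every unit should be predictable from the other units
        consistency = ((a - psi(a @ W_l_nodiag.t())) ** 2).sum()
        grad = torch.autograd.grad(consistency, a, create_graph=True)[0]
        s = a - eta * grad                                 # s <- a - eta * grad_a ||a - psi(W_l a)||^2
    return s[:, :num_classes]                              # y <- s_{1:C}
```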
Experimental Setup – Models
• First, we consider the case in which 𝜓(𝑊𝑙𝑎) = 𝑊𝑙𝑎, so 𝑓𝑊𝑙 is linear.
• We consider FC models with 10 and 64 units.
• The 10-unit model has no hidden units – the final activations of the 10 units
are the logits.
• The 64-unit model has 54 hidden units and so has more flexibility and greater
representational power.
• We experiment with 8, 16 and 32 iterations of activation consistency
optimization.
Experimental Setup – Baselines
• As baselines we consider two 2-layer MLPs; the first has 10 units in both layers and the second has 64 units.
• These models have comparable numbers of parameters to the FC models.
• The FC model has $D \times N + N^2 + N$ parameters, while the MLPs have $D \times N + N + N^2 + N + N \times C + C$.
• The performance difference will therefore be due to the additional computation being performed and not due to additional parameters/memorization capabilities.
Experimental Setup – Training and Evaluation
• The models are trained on 10K images from MNIST or FMNIST and
evaluated on 1K images.
• The models are also evaluated on adversarially perturbed data.
• The perturbations are computed using the Projected Gradient
Descent attack
• The attack is repeated several times with different upper bounds on
the ℓ∞ norms of the perturbations.
• The upper bounds we consider are 0.01, 0.02, 0.03, 0.05, 0.1
Results
• The FC models exhibit significantly greater adversarial robustness.
• The FC models suffer slight
degradation in clean
accuracy
• Increasing the number of
units improves the
performance of the MLP
more significantly than the
FC models
• Increasing the number of
iterations improves
performance at higher levels
of perturbation.
(Result plots for MNIST and FMNIST.)
Learned Lateral Matrices
• The lateral matrices exhibit a high level of symmetry.
• This is expected because
𝑓𝑊 is linear.
(Figure: learned lateral matrices of the 10-unit / 32-step models on MNIST and FMNIST.)
Evolution of Activations and Accuracy – FMNIST
• The linear separability and accuracy of some classes increase rapidly across the iterations.
(Figure: activations of the 10-unit and 64-unit models at ε = 0.0, 0.05, and 0.1.)
Optimizing Non-Linear Consistency
• Let 𝜓 = 𝑅𝑒𝐿𝑈
• Hypothesis: Introducing non-linearity will allow for non-symmetric
relationships between neurons
• The experimental setup remains the same
Results
• Some conflicting results.
• Significant
improvements in the 10
unit model
• But less so in the 64 unit
model
• Robustness decreases in
the 64-unit MNIST
model but increases in
the 64-unit FMNIST
model.
Evolution of Activations and Accuracy – 64-
Unit Models
There seems to be little change in the activations for
the MNIST model with ReLU consistency optimization.
(Figure panels: MNIST and FMNIST, linear vs. ReLU consistency optimization.)
Learned Lateral Matrices
• The lateral matrices are
less symmetric now
(Figure: learned lateral matrices of the 10-unit / 32-step models on MNIST and FMNIST.)
Introducing Activity-Input Consistency
Optimization
• Inference Algorithm:
1. $s \leftarrow W^f x$
2. for $i = 1 \to k$:
3. $\quad a \leftarrow \phi(s)$
4. $\quad s \leftarrow a - \eta_a\, \big[\nabla_a \|a - \psi(W^l a)\|_2^2 + \boldsymbol{\nabla_a \|x - \psi(W^b a)\|_2^2}\big]$ $\quad$ (with $W^l_{ii} = 0,\ \forall i$; the bold term is the new input-consistency term)
5. $y \leftarrow s_{1:C}$
• Everything else remains the same.
Motivating Activity-Input Consistency
Optimization
• Recall the predictive coding hypothesis
• The brain tries to make the internal representations at each level “consistent”
with representations at the previous level and the next level.
• It is possible (but possibly unlikely) that activity consistency
optimization modifies the activations to a point that they have no
information about the input.
• When 𝑓𝑊 is linear, there is a trivial solution 𝑎 = 𝟎.
• Adding an additional objective of reconstructing the input discourages the optimization from discarding image-related information.
Results
• Optimizing input-
activity consistency
improves robustness for
FMNIST but not MNIST
• Perhaps, augmentations
like non-linear
consistency and input-
activity consistency are
only needed for more
complex datasets.
Convolutionalizing The Model: Method 1
Method 1: We simply scan the input using the model we used for MNIST and FMNIST
• This method effectively optimizes the consistency between the channels but not between different spatial
coordinates.
Advantages:
• Simple to implement
• Memory efficient – the weight matrices depend
only on the size of the kernel and the number of
channels, not the image size.
Disadvantages:
• Does not optimize consistency between spatial
coordinates.
Convolutionalizing The Model: Method 2
Method 2: Optimizing consistency across both channels and spatial coordinates is too memory intensive so
optimize consistency between spatial coordinates in each channel, but not across channels.
Advantages:
• Optimizes spatial consistency which is known to be
important in images.
Disadvantages:
• Still wasteful of resources: images have local spatial consistency but not global consistency, so it would be better to window the consistency optimization.
• Channel consistency is not optimized.
Convolutionalizing The Model: Method 3
Method 3: We apply the consistency optimization over local windows of the input.
• This method optimizes consistency across the channels and across the spatial coordinates within each window (but not globally).
Advantages:
• Optimizes both channel and local-spatial
consistency
Disadvantages:
• Windowing operation is slow
Experimental Setup – Models
• 𝜓 = 𝑅𝑒𝐿𝑈
• 64 units
• 1 layer
• 32 activity consistency optimization iterations
• 5x5 kernel and 3x3 stride
• ReLU activations
Experimental Setup – Baseline
• 1-layer CNN
• 64 channels
• 5x5 kernel with 3x3 stride
Experimental Setup – Training and Evaluation
• The models are trained on 10K images from CIFAR10 and evaluated
on 1K images.
• The models are also evaluated on adversarially perturbed data.
• The perturbations are computed using the Projected Gradient
Descent attack
• The attack is repeated several times with different upper bounds on
the ℓ∞ norms of the perturbations.
• The upper bounds we consider are 0.008, 0.016, 0.024, 0.032, 0.048, 0.064
Comparing All Methods
• Our method improves
robustness
significantly
• Robustness comes at
the cost of clean
accuracy
Summary and Conclusions
• Integrated biologically inspired mechanisms into deep learning models.
• These mechanisms constrain the activation patterns, i.e., arbitrary patterns become less likely.
• In a way the model is internally denoising the perturbations and moving them
towards activations observed during training – some rudimentary memory
mechanism.
• Experimental results show that this makes the models more robust
• Hypothesis: the adversary has an additional task of ensuring that the perturbations
are “familiar” to the model.
• The experimental results generalize across datasets and model
architectures.
Future Directions
• Refine the convolutional architectures
• We might need to increase depth to improve accuracy
• Depth is required for shift invariance
• Further investigate the similarities between our consistency optimization
approach and Boltzmann machines / Hopfield nets
• There is a hypothesis that signaling between cortical areas in the brain
takes place in a very small “communication subspace” [Kohn+20]
• Neural activations in the source area that lie outside this subspace cause little or no
activity in the target area
• Perhaps quantizing the activity of the model may be a way of implementing this
subspace.
References
[Goodfellow+14] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
[Madry+18] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu, “Towards deep learning models resistant to adversarial attacks,” 2017.
[Kannan+18] Kannan, Harini, Alexey Kurakin, and Ian Goodfellow. "Adversarial logit pairing." arXiv preprint arXiv:1803.06373 (2018).
[Zhang+19] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, and Michael I. Jordan. Theoretically principled trade-off between robustness and accuracy. In ICML, volume 97 of Proceedings of Machine Learning Research, pages 7472–7482. PMLR, 2019.
[Wong+20] Wong, Eric, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training." arXiv preprint arXiv:2001.03994 (2020).
[Rebuffi+21] Rebuffi, Sylvestre-Alvise, et al. "Fixing data augmentation to improve adversarial robustness." arXiv preprint arXiv:2103.01946 (2021).
[Cohen+19] Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter, “Certified adversarial robustness via randomized smoothing,” CoRR, 2019.
[Ilyas+18] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry, “Adversarial examples are not bugs, they are features,” 2019.
[Wang+19] Haohan Wang, Xindi Wu, Pengcheng Yin, and Eric P. Xing, “High frequency component helps explain the generalization of convolutional neural networks,” CoRR, 2019.
[Firestone+20] Firestone, Chaz. "Performance vs. competence in human–machine comparisons." Proceedings of the National Academy of Sciences 117.43 (2020): 26562-26571.
[Shah+21a] Shah, Muhammad A., Raphael Olivier, and Bhiksha Raj. "Towards Adversarial Robustness Via Compact Feature Representations." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE,
2021.
[Nakkiran19] Preetum Nakkiran, “Adversarial robustness may be at odds with simplicity,” 2019.
[Belghazi+2018] Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, and R Devon Hjelm, “Mine: mutual information neural estimation,” arXiv:1801.04062, 2018.
[Shah+21b] Muhammad Shah, Raphael Olivier, and Bhiksha Raj, “Exploiting non-linear redundancy for neural model compression,” in ICPR, 2021.
[Parr+18] Parr, Thomas, Geraint Rees, and Karl J. Friston. "Computational neuropsychology and Bayesian inference." Frontiers in human neuroscience (2018): 61.
[Rao+99] Rao, Rajesh PN, and Dana H. Ballard. "Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects." Nature neuroscience 2.1 (1999): 79-87.
[Paiton+20] Paiton, Dylan M., et al. "Selectivity and robustness of sparse coding networks." Journal of vision 20.12 (2020): 10-10.
[Hennig+21] Hennig, Jay A., et al. "How learning unfolds in the brain: toward an optimization view." Neuron 109.23 (2021): 3720-3735.
[Kohn+20] Kohn, Adam, et al. "Principles of corticocortical communication: proposed schemes and design considerations." Trends in neurosciences 43.9 (2020): 725-737.

More Related Content

What's hot

Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
Prof. Neeta Awasthy
 
Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)
Taehoon Kim
 
Classification vs clustering
Classification vs clusteringClassification vs clustering
Classification vs clustering
Khadija Parween
 
Decision tree
Decision treeDecision tree
Decision tree
ShraddhaPandey45
 
Variational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionVariational Bayes: A Gentle Introduction
Variational Bayes: A Gentle Introduction
Flavio Morelli
 
Linear and Logistics Regression
Linear and Logistics RegressionLinear and Logistics Regression
Linear and Logistics Regression
Mukul Kumar Singh Chauhan
 
Heap Tree.pdf
Heap Tree.pdfHeap Tree.pdf
Heap Tree.pdf
manahilzulfiqar6
 
Robustness of Deep Neural Networks
Robustness of Deep Neural NetworksRobustness of Deep Neural Networks
Robustness of Deep Neural Networks
khalooei
 
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen..."The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
LEE HOSEONG
 
Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]JULIO GONZALEZ SANZ
 
Introduction to Neural networks (under graduate course) Lecture 6 of 9
Introduction to Neural networks (under graduate course) Lecture 6 of 9Introduction to Neural networks (under graduate course) Lecture 6 of 9
Introduction to Neural networks (under graduate course) Lecture 6 of 9
Randa Elanwar
 
Deep Learning & Convolutional Neural Network
Deep Learning & Convolutional Neural NetworkDeep Learning & Convolutional Neural Network
Deep Learning & Convolutional Neural Network
agdatalab
 
Overview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningOverview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep Learning
Khang Pham
 
GBM theory code and parameters
GBM theory code and parametersGBM theory code and parameters
GBM theory code and parameters
Venkata Reddy Konasani
 
Neural Architecture Search: Learning How to Learn
Neural Architecture Search: Learning How to LearnNeural Architecture Search: Learning How to Learn
Neural Architecture Search: Learning How to Learn
Kwanghee Choi
 
Gradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introductionGradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introduction
Christian Perone
 
Recurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: TheoryRecurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: Theory
Andrii Gakhov
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)
Dong Guo
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
홍배 김
 
Enhanced Deep Residual Networks for Single Image Super-Resolution
Enhanced Deep Residual Networks for Single Image Super-ResolutionEnhanced Deep Residual Networks for Single Image Super-Resolution
Enhanced Deep Residual Networks for Single Image Super-Resolution
NAVER Engineering
 

What's hot (20)

Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)
 
Classification vs clustering
Classification vs clusteringClassification vs clustering
Classification vs clustering
 
Decision tree
Decision treeDecision tree
Decision tree
 
Variational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionVariational Bayes: A Gentle Introduction
Variational Bayes: A Gentle Introduction
 
Linear and Logistics Regression
Linear and Logistics RegressionLinear and Logistics Regression
Linear and Logistics Regression
 
Heap Tree.pdf
Heap Tree.pdfHeap Tree.pdf
Heap Tree.pdf
 
Robustness of Deep Neural Networks
Robustness of Deep Neural NetworksRobustness of Deep Neural Networks
Robustness of Deep Neural Networks
 
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen..."The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
 
Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]
 
Introduction to Neural networks (under graduate course) Lecture 6 of 9
Introduction to Neural networks (under graduate course) Lecture 6 of 9Introduction to Neural networks (under graduate course) Lecture 6 of 9
Introduction to Neural networks (under graduate course) Lecture 6 of 9
 
Deep Learning & Convolutional Neural Network
Deep Learning & Convolutional Neural NetworkDeep Learning & Convolutional Neural Network
Deep Learning & Convolutional Neural Network
 
Overview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningOverview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep Learning
 
GBM theory code and parameters
GBM theory code and parametersGBM theory code and parameters
GBM theory code and parameters
 
Neural Architecture Search: Learning How to Learn
Neural Architecture Search: Learning How to LearnNeural Architecture Search: Learning How to Learn
Neural Architecture Search: Learning How to Learn
 
Gradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introductionGradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introduction
 
Recurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: TheoryRecurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: Theory
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Enhanced Deep Residual Networks for Single Image Super-Resolution
Enhanced Deep Residual Networks for Single Image Super-ResolutionEnhanced Deep Residual Networks for Single Image Super-Resolution
Enhanced Deep Residual Networks for Single Image Super-Resolution
 

Similar to adversarial robustness lecture

riken-RBlur-slides.pptx
riken-RBlur-slides.pptxriken-RBlur-slides.pptx
riken-RBlur-slides.pptx
MuhammadAhmedShah2
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
ChenYiHuang5
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
DataRobot
 
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
PyData
 
Optimization Techniques.pdf
Optimization Techniques.pdfOptimization Techniques.pdf
Optimization Techniques.pdf
anandsimple
 
Operations Research.pptx
Operations Research.pptxOperations Research.pptx
Operations Research.pptx
banhi.guha
 
BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...
BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...
BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...
GeekPwn Keen
 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for Regression
Seonho Park
 
IFTA2020 Kei Nakagawa
IFTA2020 Kei NakagawaIFTA2020 Kei Nakagawa
IFTA2020 Kei Nakagawa
Kei Nakagawa
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
Jason Riedy
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learning
Yogendra Singh
 
Deep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptxDeep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptx
FreefireGarena30
 
Model Selection and Validation
Model Selection and ValidationModel Selection and Validation
Model Selection and Validation
gmorishita
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
ananth
 
第19回ステアラボ人工知能セミナー発表資料
第19回ステアラボ人工知能セミナー発表資料第19回ステアラボ人工知能セミナー発表資料
第19回ステアラボ人工知能セミナー発表資料
Takayuki Osogami
 
Training DNN Models - II.pptx
Training DNN Models - II.pptxTraining DNN Models - II.pptx
Training DNN Models - II.pptx
PrabhuSelvaraj15
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلة
Fares Al-Qunaieer
 
Talwalkar mlconf (1)
Talwalkar mlconf (1)Talwalkar mlconf (1)
Talwalkar mlconf (1)
MLconf
 

Similar to adversarial robustness lecture (20)

riken-RBlur-slides.pptx
riken-RBlur-slides.pptxriken-RBlur-slides.pptx
riken-RBlur-slides.pptx
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
 
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
 
Optimization Techniques.pdf
Optimization Techniques.pdfOptimization Techniques.pdf
Optimization Techniques.pdf
 
Operations Research.pptx
Operations Research.pptxOperations Research.pptx
Operations Research.pptx
 
BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...
BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...
BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...
 
gan.pdf
gan.pdfgan.pdf
gan.pdf
 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for Regression
 
IFTA2020 Kei Nakagawa
IFTA2020 Kei NakagawaIFTA2020 Kei Nakagawa
IFTA2020 Kei Nakagawa
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learning
 
Deep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptxDeep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptx
 
Model Selection and Validation
Model Selection and ValidationModel Selection and Validation
Model Selection and Validation
 
06 cs661 qb1_sn
06 cs661 qb1_sn06 cs661 qb1_sn
06 cs661 qb1_sn
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
 
第19回ステアラボ人工知能セミナー発表資料
第19回ステアラボ人工知能セミナー発表資料第19回ステアラボ人工知能セミナー発表資料
第19回ステアラボ人工知能セミナー発表資料
 
Training DNN Models - II.pptx
Training DNN Models - II.pptxTraining DNN Models - II.pptx
Training DNN Models - II.pptx
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلة
 
Talwalkar mlconf (1)
Talwalkar mlconf (1)Talwalkar mlconf (1)
Talwalkar mlconf (1)
 

Recently uploaded

Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdfMASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
goswamiyash170123
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
chanes7
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
Krisztián Száraz
 
Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
Bisnar Chase Personal Injury Attorneys
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
NelTorrente
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Introduction to AI for Nonprofits with Tapp Network
  • 22. Outline • Part I: Introduction to Adversarial Perturbations • What are adversarial perturbations? • How are adversarial perturbations created? • How to defend against adversarial perturbations? • Why do adversarial perturbations exist? • Part II: Past and Current Projects in Our Group • Towards Adversarial Robustness via Compact Feature Representations • Biologically Inspired Models for Adversarial Robustness
• 23. Types of Adversarial Defenses
  • Empirical Defenses:
    • Work well in most practical scenarios (hence "empirical").
    • No formal proof that they will always work.
  • Certifiable Defenses:
    • Come with a proof (certificate) that the model's prediction for a given input does not change if any perturbation with ‖𝛿‖ ≤ 𝜖 is added to it.
• 24. Adversarial Training
  • AT is perhaps the most successful empirical defense against adversarial attacks.
  • Over the years, several variations of AT have been proposed that have made it more effective.
  • Basic algorithm [Madry+2017]:
    1. 𝑓𝑜𝑟 (𝑥, 𝑦) ∈ 𝐷:
    2.   𝑥_adv ← argmax_{𝑥′∈𝒳: ‖𝑥−𝑥′‖≤𝜖} ℒ(𝑓_𝜃(𝑥′), 𝑦)
    3.   𝜃 ← 𝜃 − 𝜂 ∇_𝜃 ℒ(𝑓_𝜃(𝑥_adv), 𝑦)
  • Note that we create a new adversarial example in each iteration. Why?
    • The model has changed, so the gradients change, and thus the adversarial perturbation changes.
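To make the inner-max / outer-min structure concrete, here is a minimal PyTorch sketch of one epoch of PGD-based adversarial training. It is an illustration, not the authors' code; `model`, `loader`, `opt` and the attack hyper-parameters (eps, alpha, steps) are assumed placeholders.

import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Inner maximization: find a perturbation inside the l_inf ball that maximizes the loss.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return torch.clamp(x + delta, 0, 1).detach()

def adversarial_training_epoch(model, loader, opt):
    for x, y in loader:
        x_adv = pgd_perturb(model, x, y)              # a fresh perturbation every iteration
        opt.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()   # outer minimization on the perturbed batch
        opt.step()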
• 25. Adversarial Training
  • Some issues:
    • Multiple optimization steps are required for computing adversarial perturbations => slow and computationally intensive => difficult to scale to larger models/datasets.
    • Robust overfitting – robust accuracy improves on the train set but decreases on the test set.
    • Overfitting to the attack type and attack parameters used during training.
• 26. Incremental Improvements to Adversarial Training
  • Logit Pairing [Kannan+18]: 𝜃 ← 𝜃 − 𝜂 ∇_𝜃 [ ℒ(𝑓_𝜃(𝑥_adv), 𝑦) + 𝛼 ‖𝑓_𝜃(𝑥) − 𝑓_𝜃(𝑥_adv)‖² ]
  • TRADES [Zhang+19]:
    𝑥_adv = argmax_{𝑥′∈𝒳: ‖𝑥−𝑥′‖≤𝜖} ℒ(𝑓_𝜃(𝑥), 𝑓_𝜃(𝑥′))
    𝜃 ← 𝜃 − 𝜂 ∇_𝜃 [ ℒ(𝑓_𝜃(𝑥), 𝑦) + ℒ(𝑓_𝜃(𝑥), 𝑓_𝜃(𝑥_adv)) / 𝜆 ]
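Below is a hedged sketch of a TRADES-style loss in PyTorch, using the KL divergence between the clean and perturbed output distributions as the consistency term ℒ(𝑓_𝜃(𝑥), 𝑓_𝜃(𝑥_adv)); here beta plays the role of 1/𝜆. The model and hyper-parameters are assumptions, not values from the paper.

import torch
import torch.nn.functional as F

def trades_loss(model, x, y, eps=8/255, alpha=2/255, steps=10, beta=6.0):
    # Inner maximization: perturb x to maximally change the model's output distribution.
    p_clean = F.softmax(model(x), dim=1).detach()
    x_adv = (x + 0.001 * torch.randn_like(x)).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean, reduction="batchmean")
        grad, = torch.autograd.grad(kl, x_adv)
        x_adv = torch.min(torch.max(x_adv + alpha * grad.sign(), x - eps), x + eps)
        x_adv = x_adv.clamp(0, 1).detach()
    # Outer objective: clean cross-entropy plus the weighted consistency term.
    robust_kl = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                         F.softmax(model(x), dim=1), reduction="batchmean")
    return F.cross_entropy(model(x), y) + beta * robust_kl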
  • 27. Speeding Up Adversarial Training [Wong+20]
• 28. Speeding Up Adversarial Training [Wong+20]
  • Use a single FGSM step instead of multiple PGD steps during adversarial training.
  • Generally this does not work, but the authors state that the key is random initialization of the perturbation.
  • Can train a robust ImageNet model in about 12 hours, compared to the roughly 50 hours required by previous adversarial training approaches.
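A sketch of the single-step scheme, under the same placeholder names as the earlier adversarial-training sketch; the step sizes are illustrative, not the paper's exact settings.

import torch
import torch.nn.functional as F

def fast_fgsm_training_step(model, opt, x, y, eps=8/255, alpha=10/255):
    # Random initialization inside the l_inf ball -- the ingredient [Wong+20] identify as key.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
    grad, = torch.autograd.grad(loss, delta)
    delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()  # a single FGSM step
    opt.zero_grad()
    F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y).backward()
    opt.step()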
• 30. State-of-the-Art Adversarial Training [Rebuffi+21]
  • Combines model weight averaging, data augmentation and synthetic data generation to achieve SOTA robust accuracy on CIFAR-10.
  • Weight Averaging (an exponential moving average 𝜃̄ of the training weights 𝜃):
    1. 𝜃_{t+1} = 𝜃_t − 𝜂 ∇_𝜃 ℒ
    2. 𝜃̄_{t+1} = 𝜏 𝜃̄_t + (1 − 𝜏) 𝜃_{t+1}
  • Uses TRADES to perform AT.
  • Key outcomes:
    • Weight averaging improves robustness.
    • CutMix data augmentation provides the best clean-robust accuracy tradeoff.
    • Increasing the training data improves robust accuracy.
    • Using a small amount of synthetic data is beneficial.
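A minimal sketch of the weight-averaging step, assuming a separate averaged copy of the model (`avg_model`) is kept alongside the one being trained; 𝜏 is a placeholder value and batch-norm buffers are ignored for brevity.

import copy
import torch

@torch.no_grad()
def update_weight_average(avg_model, model, tau=0.995):
    # theta_bar <- tau * theta_bar + (1 - tau) * theta
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.mul_(tau).add_(p, alpha=1 - tau)

# Usage: avg_model = copy.deepcopy(model); call update_weight_average(avg_model, model)
# after every optimizer step, and evaluate the averaged model.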
• 31. Randomized Smoothing [Cohen+19]
  • RS is a very popular certifiable defense.
  • Smoothed classifier 𝑔:  𝑔(𝑥) = argmax_{𝑐∈𝒴} 𝑃_{𝜀∼𝒩(0, 𝜎²𝐼)}[ 𝑓(𝑥 + 𝜀) = 𝑐 ]
  • Smoothing to robustness:
    • Let 𝑝_A and 𝑝_B be the probabilities of the most probable and second-most probable class, respectively, and let Φ be the standard Gaussian CDF.
    • 𝑔 is robust in an ℓ2 ball of radius 𝑅 = (𝜎/2) (Φ⁻¹(𝑝_A) − Φ⁻¹(𝑝_B)) around 𝑥.
• 32. Randomized Smoothing [Cohen+19]
  • SAMPLEUNDERNOISE:
    1. 𝑓𝑜𝑟 𝑖: 1 → 𝑛:
    2.   𝜖 ∼ 𝒩(0, 𝜎²𝐼)
    3.   𝑐 = 𝑓(𝑥 + 𝜖)
    4.   𝑐𝑜𝑢𝑛𝑡𝑠[𝑐] += 1
  • BINOMPVALUE returns the p-value of the hypothesis test that 𝑛_A ∼ Binomial(𝑛_A + 𝑛_B, 1/2); if it exceeds the significance level, the smoothed classifier abstains.
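A Monte-Carlo sketch of prediction with the smoothed classifier, in the spirit of the procedure above (not the paper's exact PREDICT/CERTIFY code). `base_classifier` is assumed to take a batch of one image; sigma, n and alpha are placeholder values.

import torch
from scipy.stats import binomtest

def smoothed_predict(base_classifier, x, sigma=0.25, n=1000, alpha=0.001):
    counts = {}
    with torch.no_grad():
        for _ in range(n):                                   # SAMPLEUNDERNOISE
            noisy = x + sigma * torch.randn_like(x)          # eps ~ N(0, sigma^2 I)
            c = base_classifier(noisy).argmax(dim=1).item()  # f(x + eps), batch size 1
            counts[c] = counts.get(c, 0) + 1
    top2 = sorted(counts.items(), key=lambda kv: -kv[1])[:2]
    c_a, n_a = top2[0]
    n_b = top2[1][1] if len(top2) > 1 else 0
    # Abstain unless the top class beats the runner-up significantly (the BINOMPVALUE test).
    if binomtest(n_a, n_a + n_b, 0.5).pvalue > alpha:
        return None
    return c_a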
  • 33. Outline • Part I: Introduction to Adversarial Perturbations • What are adversarial perturbations? • How are adversarial perturbations created? • How to defend against adversarial perturbations? • Why do adversarial perturbations exist? • Part II: Past and Current Projects in Our Group • Towards Adversarial Robustness via Compact Feature Representations • Biologically Inspired Models for Adversarial Robustness
• 34. The Classifier is Learning Non-Robust Features
  • The features extracted from the data can be of 3 types:
    • Robust: features that are likely to be meaningful to humans as well.
    • Non-robust: features that are useful for the task at hand but are either not what humans would use, or whose usefulness is an artifact of the dataset.
    • Useless: features that cannot be used to improve performance on the task.
• 35. Example: XOR
  • Now the algorithm has been provided this new table instead.
  • Target function: 𝑌 = 𝑋1 xor 𝑋2
  • 𝑋3 is a spurious input.

    X1 X2 X3 | Y
    0  0  0  | 0
    0  1  0  | 1
    1  0  0  | 1
    1  1  0  | 0
    0  0  1  | 0
    0  1  1  | 1
    1  0  1  | 1
    1  1  1  | 0
• 36. Example: XOR
  • The algorithm can learn any of these patterns for the unseen input combinations.
  • Only one is right for our target function.
  • If it learns any of the others, the output for some combinations of X1 and X2 can be made erroneous by choosing the right X3.
  • [Table: the observed rows of the truth table together with the possible completions of the missing (X1, X2, X3) patterns.]
• 37. Example: XOR
  • The number of missing patterns is exponential in the number of spurious bits: for a dataset of 𝐷 rows over 𝐾 binary inputs there are 2^𝐾 − 𝐷 missing patterns.
  • The number of possible extensions of the table is exponential in the number of missing patterns: 2^(2^𝐾 − 𝐷).
  • The number of ways of adversarially modifying inputs therefore increases super-exponentially with the number of spurious bits.
  • [Table: the partial truth table and its candidate completions, as on the previous slide.]
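A two-line check of the counting argument, assuming D observed rows over K binary inputs (the numbers match the example: K = 3, D = 6 observed rows).

K, D = 3, 6
missing_patterns = 2**K - D                    # input combinations never seen in training
possible_extensions = 2**missing_patterns      # ways to label the unseen combinations
print(missing_patterns, possible_extensions)   # -> 2 4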
• 38. Robust and Non-Robust Features in Images
  • [Ilyas+18] used adversarial attacks to disentangle robust and non-robust features.
  • Non-robust features can provide good accuracy on clean data.
• 39. High Frequency Features [Wang+20]
  • [Wang+20] identifies a particular type of non-robust feature – high frequency (HF) components.
  • Human perception generally operates on a bounded band of frequencies, both visually and aurally.
    • HF components in audio are inaudible to humans (e.g., a dog whistle).
    • HF components in images look like noise to us, and we tend to filter them out.
  • The HF components of an image are separated out by computing the Fourier transform of the image and thresholding each component's distance from the centroid of the spectrum.
  • [Figure: example image and its Fourier transform (log magnitude); source: https://homepages.inf.ed.ac.uk/rbf/HIPR2/fourier.htm]
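A small NumPy sketch of that separation: threshold each frequency component's distance from the centre of the (shifted) spectrum. The cutoff radius r is an arbitrary assumption, not a value from [Wang+20].

import numpy as np

def split_frequencies(img, r=12):
    """img: 2-D grayscale array; returns (low-frequency part, high-frequency part)."""
    F = np.fft.fftshift(np.fft.fft2(img))                  # zero frequency moved to the centre
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)  # distance from the centroid
    low_mask = dist <= r
    low = np.real(np.fft.ifft2(np.fft.ifftshift(F * low_mask)))
    high = np.real(np.fft.ifft2(np.fft.ifftshift(F * ~low_mask)))
    return low, high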
• 40. High Frequency Features [Wang+20]
  • It turns out image classifiers are heavily reliant on high-frequency components.
    • If a trained model is presented with only the low frequency (LF) components, it performs poorly.
    • If it is presented with only the high frequency components, it still predicts correctly.
  • In the initial epochs, using only LF components does not harm training accuracy, but in later epochs it does
    ⇒ models learn LF components first and then learn HF components.
• 41. High Frequency Features [Wang+20]
  • Frequency components and robustness:
    • Models that rely heavily on HF components are vulnerable.
    • Humans don't rely on HF components, so changing them will likely not change the human's prediction.
    • If a convolutional filter heavily weights HF components, the model will rely on HF components.
  • Hypothesis: smoothing the filters will improve robustness.
    • There is a noticeable improvement.
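As a rough illustration of the smoothing hypothesis (not the exact procedure used in [Wang+20]), one can blur each learned convolutional kernel with a small box filter, which attenuates its high-frequency content:

import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def smooth_conv_filters(conv: nn.Conv2d, k: int = 3):
    out_c, in_c, kh, kw = conv.weight.shape
    box = torch.full((1, 1, k, k), 1.0 / (k * k))                   # averaging kernel
    w = conv.weight.view(out_c * in_c, 1, kh, kw)
    smoothed = F.conv2d(w, box, padding=k // 2).view(out_c, in_c, kh, kw)
    conv.weight.copy_(smoothed)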
• 42. Differences between Biological Systems and ML Models [Firestone+20] • Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples. • Humans have foveated vision. • Adding a foveated filter to images before passing them to CNNs yields more semantically meaningful perturbations.
  • 43. Differences between Biological Systems and ML Models [Firestone+20] • Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples. • Humans have foveated vision. • Machines have a restricted vocabulary. 1. Crossword 2. Bagel 3. Starfish 4. School bus
  • 44. Differences between Biological Systems and ML Models [Firestone+20] • Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples. • Humans have foveated vision. • Machines have a restricted vocabulary. • Different inductive biases – shape vs. texture.
  • 45. Differences between Biological Systems and ML Models [Firestone+20] • Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples. • Humans have foveated vision. • Machines have a restricted vocabulary. • Different inductive biases – shape vs. texture. • Machines (usually) perform feed-forward processing, while humans can do recurrent processing.
  • 46. Outline • Part I: Introduction to Adversarial Perturbations • What are adversarial perturbations? • How are adversarial perturbations created? • How to defend against adversarial perturbations? • Why do adversarial perturbations exist? • Part II: Past and Current Projects in Our Group • Towards Adversarial Robustness via Compact Feature Representations • Biologically Inspired Models for Adversarial Robustness
  • 47. Towards Adversarial Robustness via Compact Feature Representations [Shah+21a]
• 48. Recall the Example: XOR
  • Now the algorithm has been provided this new table instead.
  • Target function: 𝑌 = 𝑋1 xor 𝑋2
  • 𝑋3 is a spurious input.

    X1 X2 X3 | Y
    0  0  0  | 0
    0  1  0  | 1
    1  0  0  | 1
    1  1  0  | 0
    0  0  1  | 0
    0  1  1  | 1
    1  0  1  | 1
    1  1  1  | 0
• 49. Neurons Are Features in DNNs
  • Too few neurons ⇒ features are under-specified ⇒ low accuracy
  • Too many neurons ⇒ features are over-specified ⇒ adversarial vulnerability
  • We propose a method to identify superfluous features and remove/reduce their influence on the model's output.
• 50. Identifying Superfluous Features
  • Let 𝒇 = [𝑓1, …, 𝑓𝑛] be the feature vector.
  • We can decompose each 𝑓𝑖 ∈ 𝒇 as 𝑓𝑖 = 𝜙(𝒇_{−i}) + 𝛿𝑖
    • 𝛿𝑖 is the novel information encoded by 𝑓𝑖, and 𝜙(𝒇_{−i}) is the redundant information.
  • We quantify the usefulness of 𝑓𝑖 as 𝐼(𝛿𝑖, 𝑌), where 𝑌 is the true label.
  • Less useful (superfluous) features are not related to the true label and can be exploited by the adversary.
• 51. Decomposing the Features [Shah+20]
  • We assume 𝜙 to be a linear function: 𝜙(𝒇_{−i}) = 𝒇_{−i} 𝒂^{(i)}
  • We can find 𝒂^{(i)} by solving  minimize_{𝒂 ∈ ℝ^{n−1}} 𝔼_{𝑥∼𝒫} ‖𝒇_{−i} 𝒂^{(i)} − 𝑓𝑖‖²
  • We can then compute 𝛿𝑖 = 𝑓𝑖 − 𝜙(𝒇_{−i})
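A minimal NumPy sketch of this decomposition, treating `feats` as a (samples × n) matrix of recorded activations; all names are placeholders.

import numpy as np

def decompose_feature(feats, i):
    f_i = feats[:, i]
    f_rest = np.delete(feats, i, axis=1)
    a_i, *_ = np.linalg.lstsq(f_rest, f_i, rcond=None)  # ordinary least squares fit of phi
    phi = f_rest @ a_i                                   # redundant component phi(f_{-i})
    delta_i = f_i - phi                                  # novel information delta_i
    return a_i, delta_i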
• 52. Determining the Usefulness of a Feature [Shah+20]
  • Recall: we define the usefulness of 𝑓𝑖 as 𝐼(𝛿𝑖, 𝑌).
  • 𝐼(𝛿𝑖, 𝑌) is difficult to compute, so we estimate it using two methods:
    • First-order approximation: 𝐼(𝛿𝑖, 𝑌) ≈ ∇_{𝑓𝑖} ℒ(𝑋, 𝑌) ⋅ 𝛿𝑖
    • MINE [Belghazi+2018]: uses a neural network to estimate MI.
  • We rank the features by their usefulness and remove the least useful features using LRE-AMC.
• 53. Removing Features Using LRE-AMC [Shah+21b]
  • Given the neuron-feature equivalence 𝑓𝑖 = 𝜎(𝑊𝑖 𝒇), removing a feature means removing a neuron.
  • To remove 𝑓𝑖 we can simply remove the 𝑖-th column of the next layer's weight matrix 𝑊.
  • This also removes the influence of 𝜙(𝒇_{−i}), which may cause the weights in the next layer (𝑊) to become sub-optimal.
  • To mitigate the error, we make the following adjustment: 𝑊_{kj} ← 𝑊_{kj} + 𝑊_{ki} 𝒂_j^{(i)}
  • After removing the neuron, we fine-tune the network.
• 54. Lossless Redundancy Elimination
  • Consider this network (inputs 𝑧1, 𝑧2, 𝑧3; outputs 𝑦1, 𝑦2):
    • 𝑦1 = 𝑤11 𝑧1 + 𝑤12 𝑧2 + 𝑤13 𝑧3
    • 𝑦2 = 𝑤21 𝑧1 + 𝑤22 𝑧2 + 𝑤23 𝑧3
  • 55. Lossless Redundancy Elimination
  • Suppose 𝑧1 = 𝑧2 + 𝑧3.
  • 56. Lossless Redundancy Elimination
  • We can remove 𝑧1 and readjust the weights:
    • 𝑤12 ← 𝑤12 + 𝑤11,  𝑤13 ← 𝑤13 + 𝑤11
    • 𝑤22 ← 𝑤22 + 𝑤21,  𝑤23 ← 𝑤23 + 𝑤21
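The same adjustment in general form, as a small NumPy sketch: if neuron i is (approximately) a linear combination of the others with coefficients 𝒂, fold its outgoing weights into the remaining columns before deleting it. `W_next` is the next layer's weight matrix; the names are placeholders.

import numpy as np

def remove_redundant_neuron(W_next, i, a):
    """W_next: (out x n) outgoing weights; a: length n-1 coefficients of z_i on the other z's."""
    keep = [j for j in range(W_next.shape[1]) if j != i]
    return W_next[:, keep] + np.outer(W_next[:, i], a)   # W_kj <- W_kj + W_ki * a_j

# For the example above (z1 = z2 + z3): a = [1, 1], so w11 is added to w12 and w13,
# and w21 is added to w22 and w23.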
• 57. Evaluation Setup
  • Models:
    • VGG-16 trained on CIFAR-10
    • AlexNet trained on CIFAR-10
    • LeNet trained on MNIST
  • Attack settings: PGD adversary with various ℓ∞ and ℓ2 constraints.
  • Baselines:
    • Adversarial Training [Madry+17]
    • Gaussian Smoothing [Cohen+19]
    • Vanilla LRE-AMC (ranks and prunes neurons with low 𝛿𝑖)
• 58. Results

  Robust accuracy (%) under PGD attacks of various strengths; −Δ𝑃 is the parameter reduction (%) and 𝐴𝑐𝑐_cln is clean accuracy (%).

  Model    Method     −Δ𝑃   𝐴𝑐𝑐_cln  ℓ∞=0.015  ℓ∞=0.031  ℓ∞=0.062  ℓ2=0.5  ℓ2=1.0  ℓ2=2.0
  LeNet    None       0     99.1     1.2       0         0         18      3.97    1
  LeNet    Ours (MI)  86.8  95.7     13.1      12.1      10.2      14.9    12.7    11.5
  LeNet    Ours (FO)  93.9  96.2     6.3       4.5       2.8       6.3     2.8     0.8
  AlexNet  None       0     77.5     8.9       0.3       0.08      7.23    0.2     0.06
  AlexNet  LRE-AMC    97.7  74.6     23.7      17        14.3      25      17.3    14.5
  AlexNet  Ours (FO)  98.3  72.3     14.6      10.1      9.5       13.2    9.8     9.2
  AlexNet  Ours (MI)  98.3  72.2     10.2      4.2       3.2       9.5     4.3     3.7
  VGG-16   None       0     90.3     1.4       0         0         4       1.8     0.6
  VGG-16   AdvTrain   0     74.9     57.1      37.1      8.6       53.2    27.6    3.5
  VGG-16   GSmooth    0     82.9     43.5      13.8      0.8       47.6    16.6    1.0
  VGG-16   Ours (FO)  87.7  85.6     20.0      17.4      13.3      20.6    19.3    15.8
  VGG-16   LRE-AMC    84.6  87.7     11.2      9.3       5.7       11.5    9.9     6.8
  VGG-16   Ours (MI)  98.3  85.7     11.8      9.2       7.0       12.4    9.5     7.1

  • Removing spurious features significantly improves robustness.
  • First-order estimation of MI seems to work better than MINE.
  • The robustness gains of our techniques generalize even to larger perturbation sizes, where they outperform adversarial training and Gaussian smoothing.
  • Note that, unlike the other defenses, our method does not employ perturbed data at any point.
• 59. Conclusion
  • We have shown that pruning neurons that encode superfluous features improves the robustness of DNNs while making them more compact.
  • Our results appear to contradict [Nakkiran, 2019; Madry+, 2017], who posit that high-capacity models are a pre-requisite for adversarial robustness.
  • Our results show that high capacity may be required at training time to learn robust features, but judiciously removing spurious neurons/features can make the models much more robust.
  • 60. Outline • Part I: Introduction to Adversarial Perturbations • What are adversarial perturbations? • How are adversarial perturbations created? • How to defend against adversarial perturbations? • Why do adversarial perturbations exist? • Part II: Past and Current Projects in Our Group • Towards Adversarial Robustness via Compact Feature Representations • Biologically Inspired Models for Adversarial Robustness
• 61. Differences between Biological Systems and ML Models [Firestone+20] • Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples. • Humans have foveated vision. • Machines have a restricted vocabulary. • Different inductive biases – shape vs. texture. • Machines (usually) perform feed-forward processing, while humans can do recurrent processing.
• 62. The Bayesian Brain Hypothesis [Parr+18]
  • The brain encodes beliefs (in its synaptic weights) about the causes of sensory data, and these beliefs are updated in response to new sensory information.
  • Complete class theorem: there is always a prior belief that renders an observed behavior Bayes optimal.
  • Pathologies related to loss of sensory signals can be explained as there being no observation to modulate the influence of the prior.
  • Autism can be understood as weak prior beliefs about the environment, due to which patients rely to a greater degree on the stimuli obtained from the environment.
  • There are ascending and descending connections in the brain:
    • Descending connections carry predictions, and ascending connections carry prediction errors.
• 63. The Predictive Coding Hypothesis [Rao+99]
  • "The approach postulates that neural networks learn the statistical regularities of the natural world, signaling deviations from such regularities to higher processing centers. This reduces redundancy by removing the predictable, and hence redundant, components of the input signal." [Rao+99]
  • Generative model (each conditional contributes the corresponding squared-error term to the objective):
    • 𝐼 | 𝑟 ∼ 𝒩(𝑓(𝑈𝑟), 𝜎_I²)  ⇒  ‖𝐼 − 𝑓(𝑈𝑟)‖² term
    • 𝑟 | 𝑟^td ∼ 𝒩(𝑟^td, 𝜎_td²)  ⇒  ‖𝑟 − 𝑟^td‖² term
    • plus a prior term 𝑔(𝑟) on the activations
• 64. The Predictive Coding Hypothesis [Rao+99]
  • Another way to look at this hypothesis is that the brain is aligning its observations with its expectations.
  • Since the brain mostly observes natural (clean) data, its prior is closely aligned with clean data.
  • We hypothesize that when we observe adversarially perturbed data, the brain aligns it with its expectation and, in a way, denoises the data internally.
  • 65. Sparse Coding Hypothesis [Paiton+20] • The neural activation patterns that arise in response to a stimulus tend to be very sparse. • This is true deep within cortical areas, as well as in the retina and Lateral Geniculate Nucleus.
• 66. Sparse Coding Hypothesis [Paiton+20]
  • Locally Competitive Algorithm (LCA) [Rozell+18], used to learn sparse representations from data:
    𝐸(𝑢) = ‖𝑥 − Φ𝜎(𝑢)‖₂² + 𝜆‖𝜎(𝑢)‖₁
    1. 𝑓𝑜𝑟 𝑘: 1 → 𝐾:
    2.   𝑢 ← 𝑢 − 𝜂 ∇_𝑢 𝐸(𝑢)
    3.   Φ ← Φ − 𝜂 ∇_Φ 𝐸(𝑢)
  • ∇_𝑢 𝐸(𝑢) = 𝜎′(𝑢) ⊙ (−Φᵀ𝑥 + ΦᵀΦ𝜎(𝑢) − 𝜆𝟏)
  • Conventionally, each activation is computed (somewhat) conditionally independently of the other activations – these are pointwise activations.
  • Because of the ΦᵀΦ𝜎(𝑢) term, each activation here depends on the activations of all the other neurons around it – these are population activations.
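As a runnable stand-in for the LCA dynamics (it is not the exact LCA update), here is a small ISTA-style solver for the sparse-coding energy ½‖x − Φa‖₂² + λ‖a‖₁; hyper-parameters are assumptions.

import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, Phi, lam=0.1, steps=200):
    eta = 1.0 / np.linalg.norm(Phi, 2) ** 2    # step size from the largest singular value of Phi
    a = np.zeros(Phi.shape[1])
    for _ in range(steps):
        grad = Phi.T @ (Phi @ a - x)           # this term couples every unit to the whole population
        a = soft_threshold(a - eta * grad, eta * lam)
    return a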
• 67. Sparse Coding Hypothesis [Paiton+20] – Relation to Adversarial Robustness
  • An iso-contour is a set of points at which a function has the same value.
  • For point-wise non-linear neurons, the iso-response contours are straight:
    • Perturbations perpendicular to the weight vector do not change the output.
    • Perturbations parallel to the weight vector can change the output, and they can be arbitrarily far from the weights.
  • Neurons with population non-linearities (e.g., with horizontal connections) have curved iso-response contours:
    • If a perturbation is perpendicular to the weight of one neuron, it might not be perpendicular to the weight of some other neuron, so the output of all the neurons will change.
  • Curved iso-contours indicate specificity towards a preferred direction.
    • For highly curved surfaces, perturbations need to be parallel and close to the weight vector.
    • Curvature increases as over-completeness is increased.
• 68. Sparse Coding Hypothesis [Paiton+20]
  • Point-wise non-linear neurons – the adversarial attack travels parallel to the target class's weight vector until it hits the decision boundary.
  • Population non-linear neurons – the attack travels perpendicular to the iso-contours until it reaches the target class's weight vector and then travels along it.
    • The perturbation is likely to be semantically meaningful.
  • Key takeaway: population non-linearities induce robustness by increasing the specificity of the neurons.
• 69. Neural Activity Covariance [Hennig+21]
  • It has been observed that neurons in the brain exhibit a degree of covariability.
  • Importantly, this covariability pattern remains largely fixed in the short term, even if it hampers task performance.
  • We hypothesize that this covariability pattern might have implications for robustness:
    • The space of possible activation patterns is reduced.
    • The adversary cannot modify the image in any arbitrary way; rather, the perturbations must respect the covariability pattern.
    • If they do not, the neural activities will be "projected" onto the space that respects the covariability pattern, and under this projection the adversarial perturbation may no longer be adversarial.
• 70. Computationalizing Neural Covariability
  • A straightforward approach: covariability ≡ covariance
    • Choose a family of probability distributions to model the conditional distribution of the activations, 𝑃(𝑎𝑖 | 𝑎_{−i}; Σ).
    • Update each 𝑎𝑖 to maximize its conditional probability.
  • Advantages: simple and interpretable.
  • Disadvantages:
    • Simple distributions tend to be unimodal, so the activations will be pushed to a single value and thus rendered useless.
    • Complicated multimodal distributions might be harder to learn and would still have a relatively small number of stationary points.
    • Covariance is symmetric by definition; however, we have no reason to believe that such symmetry of incoming and outgoing synaptic weights exists in the brain.
• 71. Computationalizing Neural Covariability
  • Another approach: covariability ≡ predictability
    • Let 𝑎𝑖 ≈ 𝑓_𝑊(𝑎_{−i}).
    • Update each 𝑎𝑖 to minimize a distance 𝐷(𝑎𝑖, 𝑓_𝑊(𝑎_{−i})).
  • Advantages:
    • Flexibility: 𝑓_𝑊 can be selected to give a larger number of stationary points.
    • Symmetric relationships are not required.
  • Starting to look like Boltzmann Machines…
• 72. Emergent Architecture
  • Synaptic connections are continuously being created and destroyed in the brain.
  • The network of connections, with features like recurrence, feedback and skip connections, is somewhat emergent.
  • We try to simulate that by considering a fully connected network.
• 73. A Fully Connected (FC) Network with Activation Consistency Maximization
  • Let 𝑎𝑖 ≈ 𝑓_{𝑊^l}(𝑎_{−i}) and let 𝑓_{𝑊^l}(𝑎_{−i}) = 𝜓(𝑊^l 𝑎_{−i}).
  • Alternatively, we can let 𝑎𝑖 ≈ 𝑓_{𝑊^l}(𝑎) and constrain 𝑊^l to have zeros on the diagonal.
  • Inference algorithm:
    1. 𝑠 ← 𝑊^f 𝑥
    2. 𝑓𝑜𝑟 𝑖: 1 → 𝑘:
    3.   𝑎 ← 𝜙(𝑠)
    4.   𝑠 ← 𝑎 − 𝜂_𝑎 ∇_𝑎 ‖𝑎 − 𝜓(𝑊^l 𝑎)‖₂²    (with 𝑊^l_{ii} = 0, ∀𝑖)
    5. 𝑦 ← 𝑠_{1:C}
  • During training, run the inference algorithm and backprop through it.
  • Low-magnitude weights are pruned away after training.
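A PyTorch sketch of this inference loop for the linear case (𝜓 = identity), with the consistency gradient written in closed form so the whole loop stays differentiable for training. Dimensions, 𝜂 and the ReLU choice for 𝜙 are assumptions.

import torch
import torch.nn as nn

class ConsistencyFC(nn.Module):
    def __init__(self, in_dim, n_units, n_classes, k=32, eta=0.1):
        super().__init__()
        self.W_f = nn.Linear(in_dim, n_units)                           # feed-forward projection
        self.W_l = nn.Parameter(0.01 * torch.randn(n_units, n_units))   # lateral matrix
        self.k, self.eta, self.C = k, eta, n_classes

    def forward(self, x):
        s = self.W_f(x.flatten(1))                            # s <- W_f x
        W_l = self.W_l - torch.diag(torch.diag(self.W_l))     # enforce W_ii = 0
        for _ in range(self.k):
            a = torch.relu(s)                                 # a <- phi(s)
            r = a - a @ W_l.T                                 # residual a - W_l a (psi linear)
            grad = 2 * (r - r @ W_l)                          # closed-form grad of ||r||^2 w.r.t. a
            s = a - self.eta * grad
        return s[:, : self.C]                                 # logits are the first C units

During training, the cross-entropy loss on these logits is backpropagated through all k inference steps, as described on the slide.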
• 74. Experimental Setup – Models
  • First, we consider the case in which 𝜓(𝑊^l 𝑎) = 𝑊^l 𝑎, so 𝑓_{𝑊^l} is linear.
  • We consider FC models with 10 and 64 units.
    • The 10-unit model has no hidden units – the final activations of the 10 units are the logits.
    • The 64-unit model has 54 hidden units and so has more flexibility and greater representational power.
  • We experiment with 8, 16 and 32 iterations of activation consistency optimization.
• 75. Experimental Setup – Baselines
  • As baselines we consider two 2-layer MLPs, the first with 10 units in both layers and the second with 64.
  • These models have comparable numbers of parameters to the FC models:
    • The FC model has 𝐷×𝑁 + 𝑁² + 𝑁 parameters, while the MLPs have (𝐷×𝑁 + 𝑁) + (𝑁² + 𝑁) + (𝑁×𝐶 + 𝐶).
  • Any performance difference will therefore be due to the additional computation being performed, not to additional parameters/memorization capability.
  • 76. Experimental Setup – Training and Evaluation • The models are trained on 10K images from MNIST or FMNIST and evaluated on 1K images. • The models are also evaluated on adversarially perturbed data. • The perturbations are computed using the Projected Gradient Descent attack • The attack is repeated several times with different upper bounds on the ℓ∞ norms of the perturbations. • The upper bounds we consider are 0.01, 0.02, 0.03, 0.05, 0.1
• 77. Results
  • The FC models exhibit significantly greater adversarial robustness.
  • The FC models suffer a slight degradation in clean accuracy.
  • Increasing the number of units improves the performance of the MLP more significantly than that of the FC models.
  • Increasing the number of iterations improves performance at higher levels of perturbation.
  • [Plots: accuracy vs. perturbation size on MNIST and FMNIST]
• 78. Learned Lateral Matrices
  • The lateral matrices exhibit a high level of symmetry.
  • This is expected because 𝑓_𝑊 is linear.
  • [Figure: learned lateral matrices for the 10-unit / 32-step models on MNIST and FMNIST]
• 79. Evolution of Activations and Accuracy – FMNIST
  • The linear separability and accuracy of some classes increase rapidly across the iterations.
  • [Figure: activations and accuracy across iterations for 𝜖 = 0.0, 0.05, 0.1, for the 10-unit and 64-unit models]
  • 80. Optimizing Non-Linear Consistency • Let 𝜓 = 𝑅𝑒𝐿𝑈 • Hypothesis: Introducing non-linearity will allow for non-symmetric relationships between neurons • The experimental setup remains the same
• 81. Results
  • Some conflicting results:
    • Significant improvements in the 10-unit model, but less so in the 64-unit model.
    • Robustness decreases in the 64-unit MNIST model but increases in the 64-unit FMNIST model.
  • [Plots: accuracy vs. perturbation size on MNIST and FMNIST]
• 82. Evolution of Activations and Accuracy – 64-Unit Models
  • There seems to be little change in the activations for the MNIST model with ReLU consistency optimization.
  • [Figure: activations for the linear and ReLU variants on MNIST and FMNIST]
• 83. Learned Lateral Matrices
  • The lateral matrices are less symmetric now.
  • [Figure: learned lateral matrices for the 10-unit / 32-step models on MNIST and FMNIST]
• 84. Introducing Activity-Input Consistency Optimization
  • Inference algorithm:
    1. 𝑠 ← 𝑊^f 𝑥
    2. 𝑓𝑜𝑟 𝑖: 1 → 𝑘:
    3.   𝑎 ← 𝜙(𝑠)
    4.   𝑠 ← 𝑎 − 𝜂_𝑎 ∇_𝑎 [ ‖𝑎 − 𝜓(𝑊^l 𝑎)‖₂² + ‖𝒙 − 𝝍(𝑾^𝒃 𝒂)‖₂² ]    (with 𝑊^l_{ii} = 0, ∀𝑖)
    5. 𝑦 ← 𝑠_{1:C}
  • Everything else remains the same.
• 85. Motivating Activity-Input Consistency Optimization
  • Recall the predictive coding hypothesis: the brain tries to make the internal representations at each level "consistent" with the representations at the previous level and the next level.
  • It is possible (though perhaps unlikely) that activity consistency optimization modifies the activations to the point where they carry no information about the input.
    • When 𝑓_𝑊 is linear, there is a trivial solution 𝑎 = 𝟎.
  • Adding an additional objective of reconstructing the input discourages the optimization from discarding image-related information.
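For the linear case, the modified inner update can be written as a drop-in replacement for the loop body of the earlier FC sketch; `W_b` is the additional matrix mapping activations back to the (flattened) input, and all names are assumptions.

import torch

def consistency_step(a, x_flat, W_l, W_b, eta=0.1):
    """One inner update combining lateral consistency and input reconstruction (linear psi)."""
    r_lat = a - a @ W_l.T                     # a - W_l a
    r_rec = x_flat - a @ W_b.T                # x - W_b a
    grad = 2 * (r_lat - r_lat @ W_l) - 2 * r_rec @ W_b   # gradient of both squared-error terms w.r.t. a
    return a - eta * grad                     # new s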
• 86. Results
  • Optimizing input-activity consistency improves robustness for FMNIST but not MNIST.
  • Perhaps augmentations like non-linear consistency and input-activity consistency are only needed for more complex datasets.
  • [Plots: accuracy vs. perturbation size on MNIST and FMNIST]
• 90. Convolutionalizing The Model: Method 1
  • Method 1: We simply scan the input using the model we used for MNIST and FMNIST.
  • This method effectively optimizes the consistency between the channels but not between different spatial coordinates.
  • Advantages:
    • Simple to implement.
    • Memory efficient – the weight matrices depend only on the size of the kernel and the number of channels, not the image size.
  • Disadvantages:
    • Does not optimize consistency between spatial coordinates.
• 93. Convolutionalizing The Model: Method 2
  • Method 2: Optimizing consistency across both channels and spatial coordinates is too memory intensive, so optimize consistency between spatial coordinates in each channel, but not across channels.
  • Advantages:
    • Optimizes spatial consistency, which is known to be important in images.
  • Disadvantages:
    • Still wasteful in resources, since images have local but not global spatial consistency – better to window the consistency optimization.
    • Channel consistency is not optimized.
• 94. Convolutionalizing The Model: Method 3
  • Method 3: Apply the consistency optimization within local windows of the feature map, so that consistency is optimized across channels and nearby spatial coordinates.
  • Advantages:
    • Optimizes both channel and local-spatial consistency.
  • Disadvantages:
    • The windowing operation is slow.
  • 97. Experimental Setup – Models • 𝜓 = 𝑅𝑒𝐿𝑈 • 64 units • 1 layer • 32 activity consistency optimization iterations • 5x5 kernel and 3x3 stride • ReLU activations
  • 98. Experimental Setup – Baseline • 1-layer CNN • 64 channels • 5x5 kernel with 3x3 stride
  • 99. Experimental Setup – Training and Evaluation • The models are trained on 10K images from CIFAR10 and evaluated on 1K images. • The models are also evaluated on adversarially perturbed data. • The perturbations are computed using the Projected Gradient Descent attack • The attack is repeated several times with different upper bounds on the ℓ∞ norms of the perturbations. • The upper bounds we consider are 0.008, 0.016, 0.024, 0.032, 0.048, 0.064
  • 100. Comparing All Methods • Our method improves robustness significantly • Robustness comes at the cost of clean accuracy
• 101. Summary and Conclusions
  • We integrated biologically inspired mechanisms into deep learning models.
  • These mechanisms constrain the activation patterns, i.e., arbitrary patterns become less likely.
    • In a way, the model is internally denoising the perturbations and moving them towards activations observed during training – a rudimentary memory mechanism.
  • Experimental results show that this makes the models more robust.
    • Hypothesis: the adversary has the additional task of ensuring that the perturbations are "familiar" to the model.
  • The experimental results generalize across datasets and model architectures.
  • 102. Future Directions • Refine the convolutional architectures • We might need to increase depth to improve accuracy • Depth is required for shift invariance • Further investigate the similarities between our consistency optimization approach and Boltzmann machines / Hopfield nets • There is a hypothesis that signaling between cortical areas in the brain takes place in a very small “communication subspace” [Kohn+20] • Neural activations in the source area that lie outside this subspace cause little or no activity in the target area • Perhaps quantizing the activity of the model may be a way of implementing this subspace.
• 103. References
[Goodfellow+14] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
[Madry+18] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. "Towards deep learning models resistant to adversarial attacks." 2017.
[Kannan+18] Kannan, Harini, Alexey Kurakin, and Ian Goodfellow. "Adversarial logit pairing." arXiv preprint arXiv:1803.06373 (2018).
[Zhang+19] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, and Michael I. Jordan. "Theoretically principled trade-off between robustness and accuracy." In ICML, PMLR 97, pages 7472–7482, 2019.
[Wong+20] Wong, Eric, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training." arXiv preprint arXiv:2001.03994 (2020).
[Rebuffi+21] Rebuffi, Sylvestre-Alvise, et al. "Fixing data augmentation to improve adversarial robustness." arXiv preprint arXiv:2103.01946 (2021).
[Cohen+19] Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter. "Certified adversarial robustness via randomized smoothing." CoRR, 2019.
[Ilyas+18] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. "Adversarial examples are not bugs, they are features." 2019.
[Wang+20] Haohan Wang, Xindi Wu, Pengcheng Yin, and Eric P. Xing. "High frequency component helps explain the generalization of convolutional neural networks." CoRR, 2019.
[Firestone+20] Firestone, Chaz. "Performance vs. competence in human–machine comparisons." Proceedings of the National Academy of Sciences 117.43 (2020): 26562-26571.
[Shah+21a] Shah, Muhammad A., Raphael Olivier, and Bhiksha Raj. "Towards Adversarial Robustness Via Compact Feature Representations." ICASSP 2021. IEEE, 2021.
[Nakkiran19] Preetum Nakkiran. "Adversarial robustness may be at odds with simplicity." 2019.
[Belghazi+2018] Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, and R Devon Hjelm. "MINE: mutual information neural estimation." arXiv:1801.04062, 2018.
[Shah+21b] Muhammad Shah, Raphael Olivier, and Bhiksha Raj. "Exploiting non-linear redundancy for neural model compression." In ICPR, 2021.
[Parr+18] Parr, Thomas, Geraint Rees, and Karl J. Friston. "Computational neuropsychology and Bayesian inference." Frontiers in Human Neuroscience (2018): 61.
[Rao+99] Rao, Rajesh P. N., and Dana H. Ballard. "Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects." Nature Neuroscience 2.1 (1999): 79-87.
[Paiton+20] Paiton, Dylan M., et al. "Selectivity and robustness of sparse coding networks." Journal of Vision 20.12 (2020): 10.
[Hennig+21] Hennig, Jay A., et al. "How learning unfolds in the brain: toward an optimization view." Neuron 109.23 (2021): 3720-3735.
[Kohn+20] Kohn, Adam, et al. "Principles of corticocortical communication: proposed schemes and design considerations." Trends in Neurosciences 43.9 (2020): 725-737.

Editor's Notes

  1. I’ll start by providing some background on adversarial attacks against ML models Consider the image on the left, that is clearly showing a cute panda. If we pass this image to a deep learning model trained on imagenet, it also correctly identifies the panda. Now consider the image on the right. This image is obtained by superimposing the patch of seemingly random noise, delta, shown in the middle onto the original image. The noise is scaled to a very small magnitude, epsilon, such that the modified image looks identical to the original. However, our imagenet model, which correctly classified the original image, classifies the modified image as a gibbon with 99% confidence.
  2. I’ll begin by describing the problem of adversarial vulnerability in deep neural networks. We have two classifiers, the human and the ML classifier, and an adversary. Under the threat model that we are considering, the goal of the adversary is to modify an input sample in such a way that the ML classifier responds to it differently than a human.
  3. The objective of the ML classifier is to replicate the human’s decision function for the given task. Consider the case of visual perception. When the human sees this image, they will recognize it as a cat. When making this decision the human brings a vast amount of experiential and, even, scientific knowledge to bear. In addition to this specific image, the human will also consider several perceptual variants of the same image as cats, as long as they exhibit the features that the human knows to be characteristic of cats.
  4. The ML classifier, on the other hand, is provided only very sparse information by the human. For example, the human will tell the classifier that these three images contain cats, without telling it what it actually means to be a cat, and that this image is that of a dog.
  5. Now suppose that these images are arranged as shown in the human’s perceptual space. The image of the dog represents the entire perceptual region that contains dogs. The classifier’s job is to discriminate between cats and dogs. With the information the classifier is provided it can learn a boundary, such as this one, that intersects the perceptual region containing cats. In fact, it can learn many such boundaries, all of which would be equally optimal given the information provided to the model.
  6. In fact, it can learn many such boundaries, all of which would be equally optimal given the information provided to the model.
  7. In fact, it can learn many such boundaries, all of which would be equally optimal given the information provided to the model.
  8. In fact, it can learn many such boundaries, all of which would be equally optimal given the information provided to the model. However, these boundaries make the model vulnerable to adversarial attacks.
  9. To craft its attack, the adversary searches for points within the human’s perceptual boundary for which the ML classifier responds differently than the human. Since in most cases the human’s perceptual boundary is unknown (even to us humans), the adversary approximates it using a metric ball of a small radius around the data point. In this example, this region here contains points that the human would consider as cats, but the classifier would classify as dog
  10. If the classifier was somehow able to accurately model the perceptual boundary of the human, it would be robust to adversarial attacks, at least under the threat model we are considering.
  11. In this case, the adversary cannot find a point within the human’s perceptual region for a cat that the classifier will classify as a dog, and vice versa. Any point that the adversary picks such that it changes the decision of the classifier will also change the decision of the human.
  12. One of the simplest techniques is the fast gradient sign method (FGSM). FGSM computes the perturbation by computing the gradient of the loss function w.r.t. the input, x, and scaling its sign by epsilon.
  13. A more powerful technique can be obtained by using projected gradient descent to find a perturbation that maximizes loss while remaining imperceptibly small. Like FGSM this technique computes the perturbation using the gradient of the loss function w.r.t to x. [START CLICKING FOR ANIMATION]
  14. However, unlike, FGSM, this technique iteratively optimizes the perturbation and, to satisfy the imperceptibility constraint, the perturbation is projected onto a norm ball of radius epsilon.
  15. However, unlike, FGSM, this technique iteratively optimizes the perturbation and, to satisfy the imperceptibility constraint, the perturbation is projected onto a norm ball of radius epsilon.
  16. However, unlike, FGSM, this technique iteratively optimizes the perturbation and, to satisfy the imperceptibility constraint, the perturbation is projected onto a norm ball of radius epsilon.
  17. A more powerful technique can be obtained by using projected gradient descent to find a perturbation that maximizes loss while remaining imperceptibly small. Like FGSM this technique computes the perturbation using the gradient of the loss function w.r.t to x. However, unlike, FGSM, this technique iteratively optimizes the perturbation and, to satisfy the imperceptibility constraint, the perturbation is projected onto a norm ball of radius epsilon.
  18. CLICK – algorithm CLICK - question Pose question why new adversarial examples need to be created in every iteration? CLICK - answer
  19. Logit pairing – try to keep logits close TRADES – adversarial examples crafted to maximally change the logits
  20. CLICK – smoothed classifier CLICK – smoothing to robustness CLICK – Radius calculations Explain the radius calculations on the board.
  21. Draw figures for the cases of predicting correct class, abstaining and predicting the wrong class on the board.
  22. In reality the classifier may only have access to only a part of the complete table, for example just 6 of the 8 rows.
  23. Since the training data is incomplete the classifier must infer a value of Y for each of the missing input combinations. It can do this in 4 ways, as shown here, however only one of the learned outputs corresponds to our target function. In all the other patterns, the value of X3 can be manipulated to change the predicted value of Y. For example, if the algorithm, learns the first pattern then changing the value of X3 from 0 to 1 can result in the classifier incorrectly predicting Y=0 for when X1 is not equal to X2.
  24. From this example, we see that the number of missing patterns can be exponential in the number of spurious bits. For a dataset of size D, over K binary features, there are 2^K – D missing patterns The number of possible extensions to the table is exponential in the number of missing patterns. And therefore, the number of ways of adversarially modifying the inputs is super exponential in the number of spurious bits.
  25. Ask students to guess
  26. Now we provide the algorithm an additional spurious input x3 which doesn’t affect the output. If we provide the algorithm the full table shown here, it can determine that the value of Y is the same for X3=0 and X3=1 if X1 and X2 are fixed, and therefore X3 is a spurious input. However, it is extremely uncommon for the training data to completely specify the target function.
  27. We can apply the same reasoning to neural networks. In a neural network each neuron encodes a feature. To demonstrate this we can use the simple network on the slide that models the decision boundaries shown in the image on the right. The eight neurons in the first layer are feature detectors for the eight boundaries. Each neuron indicates on which side of the respective boundary the input point lies. Similarly the two neurons in the second layer are feature detectors for each square. The left neuron fires if the input point is in the white square while the right neuron fires if the point is in the small black square. Likewise, the output neuron in the final layer is a NAND gate that operates on the features computed by the second layer. [CLICK] Given the equivalence between features and neurons, if a layer has too few neurons, the input to the downstream subnetwork would be under-specified causing the model to perform poorly. If a layer has too many neurons the input to the downstream network is overspecified making it vulnerable to adversarial attacks. [CLICK] In this paper we propose a method to identify superfluous features and remove/reduce their influence on the model’s output.
  28. We can identify superfluous features as follows. Consider a feature vector, f, containing features f_1 to f_n, and let f_-i be the vector with f_i removed. We can decompose each feature f_i into two components, phi(f_-i) and delta_i. Phi(f_-i) is the component of f_i that can be predicted from the values of the other features in f, while delta_i is the novel information encoded by f_i. We can now quantify the usefulness of a feature by computing the mutual information between delta_i and the true label, Y. Features that do not provide a lot of new information about the true label are of limited use for the task at hand, and including them in the model may make it vulnerable to adversarial attacks.
  29. We implement the decomposition as follows: While other settings are possible, in this paper we have assumed phi to be a linear function, so phi(f-i) is computed as the dot product between f-i and a coefficient vector a^i. We can easily find this coefficient vector by using ordinary least squares regression. Now we can compute delta_i by subtracting phi(f-i) from f_i.
  30. After decomposing all the features and obtaining the delta_i’s, the next step is to compute the mutual information between delta_i and Y. Unfortunately computing mutual information is a non-trivial task if the true distribution is not known, and so we will have to approximate it. We have employed two types of approximations: The first is a first-order approximation of the influence of f_i on the correctness of the model’s output. This is computed as the product of delta_i with the gradient of the loss function w.r.t. the feature f_i. The second approximation uses Mutual Information Neural Estimation. After we have quantified the usefulness of all the features, we remove the least useful features from the network using LRE-AMC, a structural pruning technique.
  31. Given the equivalence between neurons and features, removing a feature from the network means removing a neuron. We can consider each f_i to be the output of a neuron (or convolutional filter) in a layer of the network. Since the output of each layer becomes the input features for the next layer, to remove a neuron from the network, we can simply drop the corresponding row or column from the weight matrices of the adjacent layers.
  32. To demonstrate this approach at a high level, consider this simple network with three hidden neurons and two outputs.
  33. Now suppose that the output of the first neuron is just the sum of the outputs of the other two neurons
  34. In this case we can remove, the first neuron and add its outgoing weight to the outgoing weights of the remaining neurons. Specifically, we will add w_11 to w_12 and w_13 and add w_21 to w_22 and w_23 Adjusting the weights is rather straightforward if we know which neurons are linearly dependent on other neurons and what their linear combination weights are. In practice, we will not have this information and so we will need to develop a technique to extract it.
  35. We evaluate our approach on three image recognition models, namely vgg-16 and alexNet trained on CIFAR-10 and LeNet trained on MNIST We attack these models using the PGD approach presented earlier, with perturbations of various l_inf and l_2 norms. We compare the performance of our approach against two highly successful adversarial defensive techniques. We also run experiments with vanilla LRE-AMC to determine if simply pruning redundant neurons improves robustness.
  36. The results of our experiments are presented in the table. We can see that PGD reduces the accuracy of all the models to almost 0%, even with perturbations of small magnitude. If we remove superfluous neurons using our approach, we see that in all the cases the accuracy of the models improves significantly. At the same time we see that these models have up to 98% fewer parameters. Even for the largest perturbations, the models show between 10 and 15% accuracy, whereas the original models had 0% accuracy. We applied adversarial training and Gaussian smoothing to the VGG-16 network. We see that while these techniques yield higher accuracy on smaller perturbations, our approach performs much better on larger perturbations. Most notably, the accuracy of our approach is 4.5 times higher than the accuracy of the adversarially trained model at perturbations with the largest l_2 norm. Furthermore, it is important to note that we did not have to train our models on perturbed data, which also leads to higher accuracy of our models on clean data.
  37. Check transferability across different num iterations
  38. Check transferability across different num iterations
  39. Check transferability across different num iterations