Adversarial Robustness
Muhammad Ahmed Shah
04/19/2022
Outline
• Part I: Introduction to Adversarial Perturbations
• What are adversarial perturbations?
• How are adversarial perturbations created?
• How to defend against adversarial perturbations?
• Why do adversarial perturbations exist?
• Part II: Past and Current Projects in Our Group
• Towards Adversarial Robustness via Compact Feature Representations
• Biologically Inspired Models for Adversarial Robustness
Adversarial Attacks on ML Models
(Figure: a clean input x, a small perturbation δ with norm bounded by ε, and the resulting adversarial example x + δ, which changes the model's prediction.)
Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
Adversarial Vulnerability of DNNs
• We have two classifiers: the human and the ML classifier. A third party, the adversary, tries to fool the ML classifier.
Adversarial Vulnerability of DNNs
• The ML classifier is trying to replicate the human’s decision function.
Adversarial Vulnerability of DNNs
• To teach the classifier, the human provides it with very sparse feedback ("That's a cat!").
Adversarial Vulnerability of DNNs
• The adversary searches for points within (an approximation of) the human's perceptual boundary for which the ML classifier responds differently than the human does.
Adversarial Vulnerability of DNNs
• If the classifier accurately models the perceptual boundary, the adversary would have to find a point outside the boundary (i.e., something that no longer looks like a cat to the human) to change the classifier's output.
Outline
• Part I: Introduction to Adversarial Perturbations
• What are adversarial perturbations?
• How are adversarial perturbations created?
• How to defend against adversarial perturbations?
• Why do adversarial perturbations exist?
• Part II: Past and Current Projects in Our Group
• Towards Adversarial Robustness via Compact Feature Representations
• Biologically Inspired Models for Adversarial Robustness
Adversarial Attack Methods
• Fast Gradient Sign Method (FGSM) [Goodfellow+2014]
$x_{\mathrm{adv}} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x \mathcal{L}(f(x), y)\big)$
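For concreteness, here is a minimal PyTorch sketch of FGSM under the formula above; `model`, `loss_fn`, and the clipping range are assumptions (a generic classifier, a cross-entropy-style loss, and inputs scaled to [0, 1]), not details from the slides.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon, clip_min=0.0, clip_max=1.0):
    """One-step FGSM: move x by epsilon in the direction of the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # x_adv = x + eps * sign(grad_x L(f(x), y))
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # keep the result in the valid input range
    return x_adv.clamp(clip_min, clip_max).detach()
```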
Adversarial Attack Methods
• Projected Gradient Descent (PGD) [Madry+2018]
1. $\delta \leftarrow \Pi_\epsilon(U(-1, 1))$
2. for $k = 1 \to K$:
3. $\quad \delta' \leftarrow \Pi_\epsilon\big(\delta + \nabla_\delta \mathcal{L}(f(x + \delta), y)\big)$
4. $\quad \delta \leftarrow \Pi_x(x + \delta') - x$
$\Pi_\epsilon$ is a projection onto an $\ell_p$-norm ball of radius $\epsilon$; usually the $\ell_\infty$ or $\ell_2$ norm is used. $\Pi_x$ projects onto the valid input domain of $x$, usually $[-1, 1]$ for images.
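A minimal PyTorch sketch of the PGD loop for the ℓ∞ case; the step size `alpha` and the sign-of-gradient update are common additions not written on the slide, and `model`/`loss_fn` are assumed as in the FGSM sketch.

```python
import torch

def pgd_attack(model, loss_fn, x, y, epsilon, alpha=0.01, steps=40,
               clip_min=0.0, clip_max=1.0):
    """PGD with an l_inf constraint: random start, repeated ascent, projection after each step."""
    # delta = Pi_eps(U(-1, 1)): random initialization inside the epsilon-ball
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
    for _ in range(steps):
        delta = delta.detach().requires_grad_(True)
        loss = loss_fn(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # ascend the loss, then project back onto the l_inf ball of radius epsilon (Pi_eps)
        delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon)
        # keep x + delta inside the valid input domain (Pi_x)
        delta = (x + delta).clamp(clip_min, clip_max) - x
    return (x + delta).detach()
```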
Outline
• Part I: Introduction to Adversarial Perturbations
• What are adversarial perturbations?
• How are adversarial perturbations created?
• How to defend against adversarial perturbations?
• Why do adversarial perturbations exist?
• Part II: Past and Current Projects in Our Group
• Towards Adversarial Robustness via Compact Feature Representations
• Biologically Inspired Models for Adversarial Robustness
Types of Adversarial Defenses
• Empirical Defenses:
• Work well in most practical scenarios (hence empirical)
• No formal proof that they will always work.
• Certifiable Defenses:
• It can be proven (certified) that the model's prediction for a given input does not change when any adversarial perturbation $\delta$ with bounded norm $\|\delta\|$ is added to it.
Adversarial Training
• AT is perhaps the most successful empirical defense against adversarial
attacks.
• Over the years, several variations of AT have been proposed that have
made it more effective.
• Basic Algorithm [Madry+2017]
1. for $(x, y) \in D$:
2. $\quad x_{\mathrm{adv}} \leftarrow \arg\max_{x' \in \mathcal{X},\, \|x - x'\| \le \epsilon} \mathcal{L}(f_\theta(x'), y)$
3. $\quad \theta \leftarrow \theta - \eta\, \nabla_\theta \mathcal{L}(f_\theta(x_{\mathrm{adv}}), y)$
• Note that we create a new adversarial example in each iteration (see the sketch below). Why?
• The model has changed, so the gradients, and thus the adversarial perturbation, will change.
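A minimal PyTorch sketch of the loop above, assuming the `pgd_attack` helper from the earlier sketch (any attack would do) to approximate the inner maximization.

```python
import torch

def adversarial_training_epoch(model, loss_fn, optimizer, loader, epsilon):
    """One epoch of adversarial training: inner max via an attack, outer min via SGD."""
    model.train()
    for x, y in loader:
        # craft a fresh adversarial example against the *current* weights
        x_adv = pgd_attack(model, loss_fn, x, y, epsilon)
        optimizer.zero_grad()
        loss = loss_fn(model(x_adv), y)   # L(f_theta(x_adv), y)
        loss.backward()
        optimizer.step()
```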
Adversarial Training
• Some issues:
• Multiple optimization steps required for computing adversarial perturbations
=> slow and computationally intensive => difficult to scale to larger
models/datasets.
• Robust overfitting – robust accuracy improves on the train set but decreases
on the testing set.
• Overfitting to the attack type and attack parameters used during training.
Incremental Improvements to Adversarial
Training
• Logit Pairing [Kannan+18]:
$\theta \leftarrow \theta - \eta\, \nabla_\theta \big[\mathcal{L}(f_\theta(x_{\mathrm{adv}}), y) + \alpha\, \|f_\theta(x) - f_\theta(x_{\mathrm{adv}})\|\big]$
• TRADES [Zhang+19]:
$x_{\mathrm{adv}} = \arg\max_{x' \in \mathcal{X},\, \|x - x'\| \le \epsilon} \mathcal{L}\big(f_\theta(x), f_\theta(x')\big)$
$\theta \leftarrow \theta - \eta\, \nabla_\theta \big[\mathcal{L}(f_\theta(x), y) + \mathcal{L}\big(f_\theta(x), f_\theta(x_{\mathrm{adv}})\big)/\lambda\big]$
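A sketch of a TRADES-style objective matching the second update above, using a KL divergence between clean and adversarial predictions as the consistency loss ℒ(f(x), f(x_adv)); how `x_adv` is produced and the value of `lambda_` are left as assumptions.

```python
import torch.nn.functional as F

def trades_style_loss(model, x, x_adv, y, lambda_=1.0):
    """Clean cross-entropy plus a clean-vs-adversarial consistency term scaled by 1/lambda."""
    logits_clean = model(x)
    logits_adv = model(x_adv)
    clean_loss = F.cross_entropy(logits_clean, y)
    # KL(f(x) || f(x_adv)) as the consistency loss between clean and adversarial predictions
    consistency = F.kl_div(F.log_softmax(logits_adv, dim=1),
                           F.softmax(logits_clean, dim=1),
                           reduction="batchmean")
    return clean_loss + consistency / lambda_
```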
Speeding Up Adversarial Training [Wong+20]
• Use a single FGSM step instead
of multiple PGD steps during
adversarial training.
• Generally this does not work, but
the authors state that the key is
random initialization of the
perturbation.
• Can train an ImageNet model in 12 hours, compared to the roughly 50 hours required by earlier adversarial training approaches.
State-of-the-Art Adversarial Training [Rebuffi+21]
• Combines model weight averaging, data
augmentation and synthetic data
generation to achieve SOTA robust
accuracy on CIFAR-10
• Weight Averaging (an exponential moving average of the weights; see the sketch below):
1. $\theta_{t+1} = \theta_t - \eta\, \nabla_\theta \mathcal{L}$
2. $\bar{\theta}_{t+1} = \tau\, \theta_{t+1} + (1 - \tau)\, \bar{\theta}_t$
• Uses TRADES to perform AT.
• Key Outcomes:
• Weight Averaging improves robustness
• CutMix data augmentation provides the best
clean-robust accuracy tradeoff
• Increasing the training data improves robust
accuracy
• Using a small amount of synthetic data is beneficial.
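A small sketch of the weight-averaging step above: a separate copy of the model accumulates a moving average of the weights after every optimizer update (the value of τ and the handling of batch-norm buffers are simplifications of mine).

```python
import copy
import torch

def make_weight_averager(model, tau=0.01):
    """theta_bar <- tau * theta + (1 - tau) * theta_bar, applied after each training step."""
    avg_model = copy.deepcopy(model)

    @torch.no_grad()
    def update():
        for p_avg, p in zip(avg_model.parameters(), model.parameters()):
            p_avg.mul_(1.0 - tau).add_(tau * p)

    return avg_model, update

# usage: avg_model, update = make_weight_averager(model); call update() after optimizer.step()
```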
Randomized Smoothing [Cohen+19]
• RS is a very popular certifiable defense.
• Smoothed Classifier $g$:
$g(x) = \arg\max_{c \in \mathcal{Y}} P_{\varepsilon \sim \mathcal{N}(0, \sigma^2 I)}\big[f(x + \varepsilon) = c\big]$
• Smoothing to Robustness:
• Let $p_A$ and $p_B$ be the probabilities of the most probable class and the second-most probable class, respectively, and let $\Phi$ be the standard Gaussian CDF.
• $g$ is robust within an $\ell_2$ ball of radius $R = \frac{\sigma}{2}\big(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\big)$ around $x$.
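The radius above is straightforward to compute; a small SciPy sketch (in practice $p_A$ and $p_B$ are replaced by confidence bounds estimated from noisy samples, as on the next slide).

```python
from scipy.stats import norm

def certified_radius(p_a, p_b, sigma):
    """l2 radius certified by randomized smoothing: R = sigma/2 * (Phi^-1(pA) - Phi^-1(pB))."""
    return 0.5 * sigma * (norm.ppf(p_a) - norm.ppf(p_b))

# e.g. certified_radius(0.9, 0.05, sigma=0.25) is roughly 0.37
```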
Randomized Smoothing [Cohen+19]
• SAMPLEUNDERNOISE:
1. for $i = 1 \to n$:
2. $\quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I)$
3. $\quad c = f(x + \varepsilon)$
4. $\quad counts[c] \mathrel{+}= 1$
• BINOMPVALUE returns the p-value of the two-sided hypothesis test that $n_A \sim \mathrm{Binomial}(n_A + n_B, p)$.
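A sketch of the sampling loop and the binomial test above, assuming `model` returns logits and `x` is a single example with a batch dimension; `scipy.stats.binomtest` supplies the two-sided p-value.

```python
import torch
from collections import Counter
from scipy.stats import binomtest

@torch.no_grad()
def sample_under_noise(model, x, n, sigma):
    """Classify n Gaussian-noised copies of x and count the predicted classes."""
    counts = Counter()
    for _ in range(n):
        eps = sigma * torch.randn_like(x)
        c = model(x + eps).argmax(dim=-1).item()
        counts[c] += 1
    return counts

def top_class_pvalue(counts, p=0.5):
    """p-value of the test that n_A ~ Binomial(n_A + n_B, p) for the two most frequent classes."""
    (c_a, n_a), (_, n_b) = counts.most_common(2)
    return c_a, binomtest(n_a, n_a + n_b, p).pvalue
```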
Outline
• Part I: Introduction to Adversarial Perturbations
• What are adversarial perturbations?
• How are adversarial perturbations created?
• How to defend against adversarial perturbations?
• Why do adversarial perturbations exist?
• Part II: Past and Current Projects in Our Group
• Towards Adversarial Robustness via Compact Feature Representations
• Biologically Inspired Models for Adversarial Robustness
The Classifier is Learning Non-Robust
Features
• The features extracted from the data can be of 3 types:
• Robust:
• Features that are likely to be meaningful to humans as well.
• Non-robust:
• Features that are useful for the task at hand but are either not what humans would use,
or their usefulness is an artifact of the dataset
• Useless:
• Features that cannot be used to improve performance on the task.
Example: XOR
• Now the algorithm has been provided this new table instead
• Target Function: 𝑌 = 𝑋1 xor 𝑋2
• 𝑋3 is a spurious input
X1 X2 X3 Y
0 0 0 0
0 1 0 1
1 0 0 1
1 1 0 0
0 0 1 0
0 1 1 1
1 0 1 1
1 1 1 0
Example: XOR
• The algorithm can learn any of these patterns for the unseen input
combinations
• Only one is right for our target function
• If it learns any of the others, the output for some combinations of X1 and X2
can be made erroneous by choosing the right X3
Observed rows:
X1 X2 X3 Y
0 0 0 0
0 1 0 1
1 0 0 1
1 1 0 0
0 0 1 0
1 1 1 0
Possible completions for the unseen inputs (X1, X2, X3) = (0, 1, 1) and (1, 0, 1): Y = (0, 0), (1, 0), (0, 1), or (1, 1). Only the completion (1, 1) matches the target function.
Example: XOR
• The number of missing patterns is exponential in the number of spurious bits
• The number of possible extensions to the table is exponential in the number of
missing patterns
• The number of ways of adversarially modifying inputs increases super-
exponentially with the number of spurious bits
(The same observed table and candidate completions as on the previous slide.)
With $K$ input bits and $D$ observed rows, there are $2^K - D$ missing patterns and $2^{2^K - D}$ possible extensions of the table.
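A small Python sketch that makes the counting argument concrete for the toy table above (K = 3 input bits, D = 6 observed rows): it enumerates the missing input patterns and the possible completions.

```python
from itertools import product

# observed (X1, X2, X3) -> Y rows from the slide
observed = {
    (0, 0, 0): 0, (0, 1, 0): 1, (1, 0, 0): 1, (1, 1, 0): 0,
    (0, 0, 1): 0, (1, 1, 1): 0,
}

K = 3
missing = [bits for bits in product((0, 1), repeat=K) if bits not in observed]
print(missing)                     # [(0, 1, 1), (1, 0, 1)] -> 2**K - D = 2 missing patterns
print(2 ** len(missing))           # 4 possible extensions = 2**(2**K - D)

# only the extension that assigns Y = X1 xor X2 to the missing rows matches the target
print({bits: bits[0] ^ bits[1] for bits in missing})   # {(0, 1, 1): 1, (1, 0, 1): 1}
```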
Robust and Non-Robust Features in Images
• Ilyas+18 used adversarial attacks
to disentangle robust and non-
robust features
• Non-robust features can provide
good accuracy on clean data.
High Frequency Features [Wang+20]
• [Wang+20] identifies a particular type of
non-robust feature – high frequency
components.
• Human perception generally operates on a bounded band of frequencies, both in visual and in auditory perception.
• HF components in audio are inaudible to humans (e.g., a dog whistle).
• HF components in images look like noise to us, and we try to filter them out.
• The HF components of an image are separated by computing the Fourier transform of the image and thresholding the distance of each frequency component from the centroid of the (shifted) spectrum.
https://homepages.inf.ed.ac.uk/rbf/HIPR2/fourier.htm
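A NumPy sketch of one plausible reading of this separation (the threshold radius r and the use of a centered FFT are assumptions): keep frequencies within radius r of the spectrum's centre as the LF part and the rest as the HF part.

```python
import numpy as np

def split_frequencies(img, r):
    """Split a grayscale image into low- and high-frequency parts via a radial mask in Fourier space."""
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)   # distance from the spectrum's centre
    low_mask = dist <= r
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)))
    high = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)))
    return low, high   # low + high reconstructs img up to numerical error
```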
High Frequency Features [Wang+20]
• It turns out image classifiers are
heavily reliant on high-frequency
components.
• If a trained model is presented with
only the low frequency components, it
performs poorly.
• If it is presented with only the high
frequency components it predicts
correctly
• In the initial epochs, using only the LF components does not harm training accuracy, but in later epochs it does.
• ⇒ Models learn LF components first, then they learn HF components.
High Frequency Features [Wang+20]
• Frequency Components and
Robustness:
• Models that rely heavily on HF components are vulnerable:
• Humans don't rely on HF components, so changing HF components will likely not change the human's prediction.
• If a convolutional filter heavily weights HF components, the model will rely on HF components.
• Hypothesis: Smoothing the
filter will improve robustness
• There is a noticeable
improvement.
Differences between Biological Systems and
ML Models [Firestone+20]
• Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples.
• Humans have foveated vision.
• Adding a foveated filter to images
before passing them to CNNs yields
more semantically meaningful
perturbations
Differences between Biological Systems and
ML Models [Firestone+20]
• Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples.
• Humans have foveated vision.
• Machines have a restricted
vocabulary
1. Crossword
2. Bagel
3. Starfish
4. School bus
Differences between Biological Systems and
ML Models [Firestone+20]
• Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples.
• Humans have foveated vision.
• Machines have a restricted
vocabulary
• Different inductive biases – shape
vs. texture
Differences between Biological Systems and
ML Models [Firestone+20]
• Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples.
• Humans have foveated vision.
• Machines have a restricted vocabulary
• Different inductive biases – shape vs.
texture
• Machines (usually) perform feed
forward processing, while humans can
do recurrent processing
Outline
• Part I: Introduction to Adversarial Perturbations
• What are adversarial perturbations?
• How are adversarial perturbations created?
• How to defend against adversarial perturbations?
• Why do adversarial perturbations exist?
• Part II: Past and Current Projects in Our Group
• Towards Adversarial Robustness via Compact Feature Representations
• Biologically Inspired Models for Adversarial Robustness
Towards Adversarial Robustness via Compact
Feature Representations [Shah+21a]
Recall the Example: XOR
• Now the algorithm has been provided this new table instead
• Target Function: 𝑌 = 𝑋1 xor 𝑋2
• 𝑋3 is a spurious input
X1 X2 X3 Y
0 0 0 0
0 1 0 1
1 0 0 1
1 1 0 0
0 0 1 0
0 1 1 1
1 0 1 1
1 1 1 0
Neurons Are Features in DNNs
• Too few neurons → features are under-specified → low accuracy
• Too many neurons → features are over-specified → adversarial vulnerability
We propose a method to identify superfluous features and remove/reduce their
influence on the model’s output
Identifying Superfluous Features
• Let $\mathbf{f} = [f_1, \ldots, f_n]$ be the feature vector.
• We can decompose each $f_i \in \mathbf{f}$ as
$f_i = \phi(\mathbf{f}_{-i}) + \delta_i$
• $\delta_i$ is the novel information encoded by $f_i$, and $\phi(\mathbf{f}_{-i})$ is the redundant information.
• We quantify the usefulness of $f_i$ as $I(\delta_i, Y)$, where $Y$ is the true label.
• Less useful (superfluous) features are not related to the true label and can be exploited by the adversary.
Decomposing the Features [Shah+20]
• We assume $\phi$ to be a linear function:
$\phi(\mathbf{f}_{-i}) = \mathbf{f}_{-i}\, \mathbf{a}^{(i)}$
• We can find $\mathbf{a}^{(i)}$ by solving
$\mathrm{minimize}_{\mathbf{a} \in \mathbb{R}^{n-1}}\; \mathbb{E}_{x \sim \mathcal{P}}\big[\big(\mathbf{f}_{-i}\, \mathbf{a}^{(i)} - f_i\big)^2\big]$
• We can then compute $\delta_i = f_i - \phi(\mathbf{f}_{-i})$.
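A NumPy sketch of this linear decomposition: regress feature f_i on the remaining features by least squares to get a^{(i)}, and take the residual as δ_i. The activation matrix `F` (examples × features) is an assumed input.

```python
import numpy as np

def decompose_feature(F, i):
    """Return a_i (linear coefficients over the other features) and delta_i (the novel part of f_i)."""
    f_i = F[:, i]                            # the feature being explained
    F_rest = np.delete(F, i, axis=1)         # f_{-i}: all remaining features
    # minimize || F_rest @ a - f_i ||^2, i.e. the phi(f_{-i}) = f_{-i} a^(i) assumption
    a_i, *_ = np.linalg.lstsq(F_rest, f_i, rcond=None)
    delta_i = f_i - F_rest @ a_i             # novel information encoded by f_i
    return a_i, delta_i
```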
Determining the Usefulness of A Feature
[Shah+20]
• Recall: We define the usefulness of 𝑓𝑖 as 𝐼(𝛿𝑖, 𝑌)
• 𝐼 𝛿𝑖, 𝑌 is difficult to compute so we estimate it using two methods:
• First-order approximation:
$I(\delta_i, Y) \approx \nabla_{f_i} \mathcal{L}(X, Y)\, \delta_i$
• MINE [Belghazi+2018]: Uses a neural network to estimate MI.
• We rank the features by their usefulness and remove the least useful
features using LRE-AMC
Removing Features Using LRE-AMC
[Shah+21b]
• Given the neuron-feature equivalence $f_i = \sigma(W_i\, \mathbf{f})$.
• To remove $f_i$ we can simply remove the $i$-th column of the next layer's weight matrix $W$.
• This also removes the influence of $\phi(\mathbf{f}_{-i})$, which may cause the weights in the next layer ($W$) to become sub-optimal.
• To mitigate the error, we make the following adjustment (see the sketch below):
$W_{kj} \leftarrow W_{kj} + W_{ki}\, \mathbf{a}^{(i)}_j$
• After removing the neuron, we fine-tune the network.
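A NumPy sketch of the adjustment above for one fully connected layer; `W_next` (the next layer's weight matrix, shape out × n) and `a_i` (the regression coefficients of feature i on the remaining features, ordered like the remaining columns) are hypothetical names of mine.

```python
import numpy as np

def remove_feature(W_next, i, a_i):
    """Drop column i of the next layer's weights and fold its redundant part into the rest:
    W[k, j] <- W[k, j] + W[k, i] * a_i[j] for every remaining feature j."""
    col_i = W_next[:, i].copy()
    W_reduced = np.delete(W_next, i, axis=1)
    return W_reduced + np.outer(col_i, a_i)
```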
Lossless Redundancy Elimination
• Consider this network:
• $y_1 = w_{11} z_1 + w_{12} z_2 + w_{13} z_3$
• $y_2 = w_{21} z_1 + w_{22} z_2 + w_{23} z_3$
• Suppose $z_1 = z_2 + z_3$.
• We can remove $z_1$ and readjust the weights:
• $w_{12} \leftarrow w_{12} + w_{11}$, $w_{13} \leftarrow w_{13} + w_{11}$
• $w_{22} \leftarrow w_{22} + w_{21}$, $w_{23} \leftarrow w_{23} + w_{21}$
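A quick NumPy check of the worked example above with arbitrary weight values: whenever z1 = z2 + z3, dropping z1 and folding w·1 into the z2 and z3 columns leaves y1 and y2 unchanged.

```python
import numpy as np

W = np.array([[1.0, 2.0, 3.0],      # w11, w12, w13
              [4.0, 5.0, 6.0]])     # w21, w22, w23
z2, z3 = 0.7, -0.2
z = np.array([z2 + z3, z2, z3])     # z1 = z2 + z3

y_before = W @ z
W_new = W[:, 1:] + W[:, [0]]        # w12+w11, w13+w11 (and likewise in the second row)
y_after = W_new @ np.array([z2, z3])
print(np.allclose(y_before, y_after))   # True
```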
Evaluation Setup
• Models
• VGG-16 trained on CIFAR-10
• AlexNet trained on CIFAR-10
• LeNet trained on MNIST
• Attack Settings:
• PGD adversary with various ℓ∞ and ℓ2 constraints
• Baseline:
• Adversarial Training [Madry+17]
• Gaussian Smoothing [Cohen+19]
• Vanilla LRE-AMC (ranks and prunes neurons with low 𝛿𝑖)
Results

Model | Method | −ΔP | Acc_cln | ℓ∞=0.015 | ℓ∞=0.031 | ℓ∞=0.062 | ℓ2=0.5 | ℓ2=1.0 | ℓ2=2.0
LeNet | None | 0 | 99.1 | 1.2 | 0 | 0 | 18 | 3.97 | 1
LeNet | Ours (MI) | 86.8 | 95.7 | 13.1 | 12.1 | 10.2 | 14.9 | 12.7 | 11.5
LeNet | Ours (FO) | 93.9 | 96.2 | 6.3 | 4.5 | 2.8 | 6.3 | 2.8 | 0.8
AlexNet | None | 0 | 77.5 | 8.9 | 0.3 | 0.08 | 7.23 | 0.2 | 0.06
AlexNet | LRE-AMC | 97.7 | 74.6 | 23.7 | 17 | 14.3 | 25 | 17.3 | 14.5
AlexNet | Ours (FO) | 98.3 | 72.3 | 14.6 | 10.1 | 9.5 | 13.2 | 9.8 | 9.2
AlexNet | Ours (MI) | 98.3 | 72.2 | 10.2 | 4.2 | 3.2 | 9.5 | 4.3 | 3.7
VGG-16 | None | 0 | 90.3 | 1.4 | 0 | 0 | 4 | 1.8 | 0.6
VGG-16 | AdvTrain | 0 | 74.9 | 57.1 | 37.1 | 8.6 | 53.2 | 27.6 | 3.5
VGG-16 | GSmooth | 0 | 82.9 | 43.5 | 13.8 | 0.8 | 47.6 | 16.6 | 1.0
VGG-16 | Ours (FO) | 87.7 | 85.6 | 20.0 | 17.4 | 13.3 | 20.6 | 19.3 | 15.8
VGG-16 | LRE-AMC | 84.6 | 87.7 | 11.2 | 9.3 | 5.7 | 11.5 | 9.9 | 6.8
VGG-16 | Ours (MI) | 98.3 | 85.7 | 11.8 | 9.2 | 7.0 | 12.4 | 9.5 | 7.1
• Removing spurious features
significantly improves robustness
• First-order estimation of MI
seems to work better than MINE
• The robustness gains of our techniques generalize to even larger perturbation sizes, where they outperform adversarial training and Gaussian smoothing.
• Note that our method does not employ perturbed data at any point, unlike the other defenses.
Conclusion
• We have shown that pruning neurons that encode superfluous
features improves the robustness of DNNs while making them more
compact.
• Our results appear to contradict [Nakkiran, 2019, Madry+, 2017], who posit that high-capacity models are a pre-requisite for adversarial robustness.
• Our results show that high capacity may be required at training time to learn
robust features, but judiciously removing spurious neurons/features can
make the models much more robust.
Outline
• Part I: Introduction to Adversarial Perturbations
• What are adversarial perturbations?
• How are adversarial perturbations created?
• How to defend against adversarial perturbations?
• Why do adversarial perturbations exist?
• Part II: Past and Current Projects in Our Group
• Towards Adversarial Robustness via Compact Feature Representations
• Biologically Inspired Models for Adversarial Robustness
Differences between Biological Systems and
ML Models [Firestone+20]
• Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples.
• Humans have foveated vision.
• Machines have a restricted vocabulary
• Different inductive biases – shape vs.
texture
• Machines (usually) perform feed
forward processing, while humans
can do recurrent processing
The Bayesian Brain Hypothesis [Parr+18]
• Brain encodes beliefs (in the synaptic weights) about the causes of sensory
data, and these beliefs are updated in response to new sensory
information
• Complete class theorem: there is always a prior belief that renders an observed
behavior Bayes optimal.
• Pathologies related to loss of sensory signals can be explained as there
being no observation to modulate the influence of the prior.
• Autism can be understood as weak prior beliefs about the environment
due to which patients rely to a greater degree on the stimuli obtained from
the environment.
• There are ascending and descending connections in the brain:
• Descending connections carry predictions and ascending connections carry prediction errors.
The Predictive Coding Hypothesis [Rao+99]
• “The approach postulates that
neural networks learn the
statistical regularities of the
natural world, signaling
deviations from such regularities
to higher processing centers. This
reduces redundancy by removing
the predictable, and hence
redundant, components of the
input signal”. [Rao+99]
$I \mid r \sim \mathcal{N}(f(Ur), \sigma_I)$, giving a data term $\propto \|I - f(Ur)\|^2$
$r \mid r^{td} \sim \mathcal{N}(r^{td}, \sigma_{td})$, giving a prior term $\propto \|r - r^{td}\|^2$, plus a regularizer $g(r)$
The Predictive Coding Hypothesis [Rao+99]
• Another way to look at this hypothesis is that the brain is aligning the
observations to its expectation.
• Since the brain mostly observes natural (clean) data its prior is closely
aligned with clean data
• We hypothesize that when we observe adversarially perturbed data,
the brain aligns it with its expectation and, in a way, denoises the data
internally.
Sparse Coding Hypothesis [Paiton+20]
• The neural activation patterns that arise in response to a stimulus
tend to be very sparse.
• This is true deep within cortical areas, as well as in the retina and
Lateral Geniculate Nucleus.
Sparse Coding Hypothesis [Paiton+20]
• Locally Competitive Algorithm (LCA) [Rozell+18]:
$E(u) = \|x - \Phi\, \sigma(u)\|_2^2 + \lambda\, \|\sigma(u)\|_1$
• Used to learn sparse representations from data:
1. for $k = 1 \to K$:
2. $\quad u \leftarrow u - \eta_u\, \nabla_u E(u)$
3. $\quad \Phi \leftarrow \Phi - \eta_\Phi\, \nabla_\Phi E(u)$
• $\nabla_u E(u) = \sigma'(u) \odot \big({-\Phi^T x} + \Phi^T \Phi\, \sigma(u) - \lambda \mathbf{1}\big)$
• Conventionally each activation is computed (somewhat) conditionally independently of the other activations; these are pointwise activations.
• Due to the $\Phi^T \Phi\, \sigma(u)$ term, each activation depends on the activations of all the other neurons around it; these are population activations.
Sparse Coding Hypothesis [Paiton+20] –
Relation to Adversarial Robustness
• An iso-contour is the set of points at which a function takes the same value.
• For point-wise non-linear neurons, the iso-response contours are straight:
• perturbations perpendicular to the weight vector do not change the output;
• perturbations parallel to the weight vector can change the output,
• and they can do so while being arbitrarily far from the weight vector.
• Neurons with population non-linearities (e.g. with horizontal connections) have curved iso-response contours:
• if a perturbation is perpendicular to the weight of one neuron, it might not be perpendicular to the weight of some other neuron, so the outputs of all the neurons will change.
• Curved iso-contours indicate specificity towards a preferred direction
• For highly curved surfaces perturbations need to be parallel and close to the
weight vector
• Curvature increases as over-completeness is increased.
Sparse Coding Hypothesis [Paiton+20]
• Point-wise non-linear neurons – adversarial
attack travels parallel to the target class's
weight vector until it hits the decision
boundary.
• Population non-linear neurons – attack travels
perpendicular to the iso-contours until it
reaches the target class's weight vectors and
then travels along it.
• Perturbation likely to be semantically meaningful
• Key Takeaway:
• Population non-linearities induce robustness by
increasing specificity of the neurons.
Neural Activity Covariance [Hennig+21]
• It has been observed that neurons in the brain exhibit a degree of
covariability.
• Importantly, this covariability pattern remains largely fixed in the
short term, even if it hampers task performance.
• We hypothesize that this covariability pattern might have implications
for robustness.
• The space of possible activation patterns is reduced.
• The adversary cannot modify the image in an arbitrary way; rather, the perturbations must respect the covariability pattern.
• If they do not, the neural activities will be "projected" onto the space that respects the covariability pattern, and under this projection the adversarial perturbation may no longer be adversarial.
Computationalizing Neural Covariability
• A straightforward approach:
• Covariability ≡ Covariance
• Choose a family of probability distributions to model the conditional distribution of the
activations
𝑃(𝑎𝑖|𝑎−𝑖; Σ)
• Update each 𝑎𝑖 to maximize its conditional probability.
• Advantages:
• Simple and interpretable.
• Disadvantages:
• Simple distributions tend to be unimodal, so the activations will be pushed to a single value and thus be rendered useless.
• Complicated multimodal distributions might be harder to learn and would still have a
relatively small number of stationary points.
• Covariance is symmetric by definition, however, we do not have reason to believe that such
symmetry of incoming and outgoing synaptic weights exists in the brain.
Computationalizing Neural Covariability
• Another approach:
• Covariability ≡ Predictability
• Let $a_i \approx f_W(a_{-i})$.
• Update each $a_i$ to optimize a (dis)similarity measure $D(a_i, f_W(a_{-i}))$.
• Advantages:
• Flexibility
• $f_W$ can be selected to give a larger number of stationary points
• Symmetric relationships are not required.
• Starting to look like Boltzmann Machines…
Emergent Architecture
• Synaptic connections are continuously being created and destroyed in
the brain.
• The network of connections, with features like recurrence, feedback, and skip connections, is somewhat emergent.
• We try to simulate that by considering a fully connected network.
A Fully Connected (FC) Network with
Activation Consistency Maximization
• Let $a_i \approx f_{W^l}(a_{-i})$ and let $f_{W^l}(a_{-i}) = \psi(W^l a_{-i})$.
• Alternatively, we can let $a_i \approx f_{W^l}(a)$ and constrain $W^l$ to have zeros on the diagonal.
• Inference Algorithm (see the sketch below):
1. $s \leftarrow W^f x$
2. for $i = 1 \to k$:
3. $\quad a \leftarrow \phi(s)$
4. $\quad s \leftarrow a - \eta_a\, \nabla_a \|a - \psi(W^l a)\|_2^2$ $\quad$ (with $W^l_{ii} = 0,\ \forall i$)
5. $y \leftarrow s_{1:C}$
• During training, run the inference algorithm and backpropagate through it.
• Low-magnitude weights are pruned away after training.
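A minimal PyTorch sketch of the inference loop above for the linear case ψ(z) = z, with φ = ReLU; `W_f`, `W_l`, the step size `eta`, and the assumption that the parameters require gradients (so the loop stays differentiable for training) are all mine, not the slides'.

```python
import torch

def fc_consistency_forward(x, W_f, W_l, num_classes, steps=32, eta=0.1, psi=lambda z: z):
    """Run `steps` iterations of activation-consistency optimization; the first C units are the logits."""
    W_l_nodiag = W_l - torch.diag(torch.diagonal(W_l))     # enforce W_l[i, i] = 0
    s = x @ W_f.t()                                        # s <- W_f x
    for _ in range(steps):
        a = torch.relu(s)                                  # a <- phi(s)
        # consistency objective: every unit should be predictable from the other units
        consistency = ((a - psi(a @ W_l_nodiag.t())) ** 2).sum()
        grad = torch.autograd.grad(consistency, a, create_graph=True)[0]
        s = a - eta * grad                                 # s <- a - eta * grad_a ||a - psi(W_l a)||^2
    return s[:, :num_classes]                              # y <- s_{1:C}
```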
Experimental Setup – Models
• First, we consider the case in which 𝜓(𝑊𝑙𝑎) = 𝑊𝑙𝑎, so 𝑓𝑊𝑙 is linear.
• We consider FC models with 10 and 64 units.
• The 10-unit model has no hidden units – the final activations of the 10 units
are the logits.
• The 64-unit model has 54 hidden units and so has more flexibility and greater
representational power.
• We experiment with 8, 16 and 32 iterations of activation consistency
optimization.
Experimental Setup – Baselines
• As baselines we consider two 2-layer MLPs; the first has 10 units in both layers and the second has 64 units.
• These models have comparable numbers of parameters to the FC models.
• The FC model has $D \times N + N^2 + N$ parameters, while the MLPs have $D \times N + N + N^2 + N + N \times C + C$.
• The performance difference will therefore be due to the additional computation being performed and not due to additional parameters/memorization capabilities.
Experimental Setup – Training and Evaluation
• The models are trained on 10K images from MNIST or FMNIST and
evaluated on 1K images.
• The models are also evaluated on adversarially perturbed data.
• The perturbations are computed using the Projected Gradient
Descent attack
• The attack is repeated several times with different upper bounds on
the ℓ∞ norms of the perturbations.
• The upper bounds we consider are 0.01, 0.02, 0.03, 0.05, 0.1
Results
• The FC models exhibit significantly greater adversarial robustness.
• The FC models suffer slight
degradation in clean
accuracy
• Increasing the number of
units improves the
performance of the MLP
more significantly than the
FC models
• Increasing the number of
iterations improves
performance at higher levels
of perturbation.
(Result plots for MNIST and FMNIST.)
Learned Lateral Matrices
• The lateral matrices exhibit a high level of symmetry.
• This is expected because
𝑓𝑊 is linear.
(Figure: learned lateral matrices of the 10-unit / 32-step models on MNIST and FMNIST.)
Evolution of Activations and Accuracy – FMNIST
• The linear separability and accuracy of some classes increase rapidly across the iterations.
(Figure: activations of the 10-unit and 64-unit models at ε = 0.0, 0.05, and 0.1.)
Optimizing Non-Linear Consistency
• Let 𝜓 = 𝑅𝑒𝐿𝑈
• Hypothesis: Introducing non-linearity will allow for non-symmetric
relationships between neurons
• The experimental setup remains the same
Results
• Some conflicting results.
• Significant
improvements in the 10
unit model
• But less so in the 64 unit
model
• Robustness decreases in
the 64-unit MNIST
model but increases in
the 64-unit FMNIST
model.
Evolution of Activations and Accuracy – 64-
Unit Models
There seems to be little change in the activations for
the MNIST model with ReLU consistency optimization.
(Figure panels: MNIST and FMNIST, linear vs. ReLU consistency optimization.)
Learned Lateral Matrices
• The lateral matrices are
less symmetric now
(Figure: learned lateral matrices of the 10-unit / 32-step models on MNIST and FMNIST.)
Introducing Activity-Input Consistency
Optimization
• Inference Algorithm:
1. $s \leftarrow W^f x$
2. for $i = 1 \to k$:
3. $\quad a \leftarrow \phi(s)$
4. $\quad s \leftarrow a - \eta_a\, \big[\nabla_a \|a - \psi(W^l a)\|_2^2 + \boldsymbol{\nabla_a \|x - \psi(W^b a)\|_2^2}\big]$ $\quad$ (with $W^l_{ii} = 0,\ \forall i$; the bold term is the new input-consistency term)
5. $y \leftarrow s_{1:C}$
• Everything else remains the same.
Motivating Activity-Input Consistency
Optimization
• Recall the predictive coding hypothesis
• The brain tries to make the internal representations at each level “consistent”
with representations at the previous level and the next level.
• It is possible (but possibly unlikely) that activity consistency
optimization modifies the activations to a point that they have no
information about the input.
• When 𝑓𝑊 is linear, there is a trivial solution 𝑎 = 𝟎.
• Adding an additional objective of reconstructing the input discourages the optimization from discarding image-related information.
Results
• Optimizing input-
activity consistency
improves robustness for
FMNIST but not MNIST
• Perhaps, augmentations
like non-linear
consistency and input-
activity consistency are
only needed for more
complex datasets.
Convolutionalizing The Model: Method 1
Method 1: We simply scan the input using the model we used for MNIST and FMNIST
• This method effectively optimizes the consistency between the channels but not between different spatial
coordinates.
Advantages:
• Simple to implement
• Memory efficient – the weight matrices depend
only on the size of the kernel and the number of
channels, not the image size.
Disadvantages:
• Does not optimize consistency between spatial
coordinates.
Convolutionalizing The Model: Method 2
Method 2: Optimizing consistency across both channels and spatial coordinates is too memory intensive so
optimize consistency between spatial coordinates in each channel, but not across channels.
Advantages:
• Optimizes spatial consistency which is known to be
important in images.
Disadvantages:
• Still wasteful of resources: images have local spatial consistency but not global consistency, so it would be better to window the consistency optimization.
• Channel consistency is not optimized.
Convolutionalizing The Model: Method 3
Method 3: We apply the consistency optimization over local windows of the input.
• This method optimizes consistency across the channels and across the spatial coordinates within each window (but not globally).
Advantages:
• Optimizes both channel and local-spatial
consistency
Disadvantages:
• Windowing operation is slow
Experimental Setup – Models
• 𝜓 = 𝑅𝑒𝐿𝑈
• 64 units
• 1 layer
• 32 activity consistency optimization iterations
• 5x5 kernel and 3x3 stride
• ReLU activations
Experimental Setup – Baseline
• 1-layer CNN
• 64 channels
• 5x5 kernel with 3x3 stride
Experimental Setup – Training and Evaluation
• The models are trained on 10K images from CIFAR10 and evaluated
on 1K images.
• The models are also evaluated on adversarially perturbed data.
• The perturbations are computed using the Projected Gradient
Descent attack
• The attack is repeated several times with different upper bounds on
the ℓ∞ norms of the perturbations.
• The upper bounds we consider are 0.008, 0.016, 0.024, 0.032, 0.048, 0.064
Comparing All Methods
• Our method improves
robustness
significantly
• Robustness comes at
the cost of clean
accuracy
Summary and Conclusions
• Integrated biologically inspired mechanisms into deep learning models.
• These mechanisms constrain the activation patterns, i.e., arbitrary patterns become less likely.
• In a way the model is internally denoising the perturbations and moving them
towards activations observed during training – some rudimentary memory
mechanism.
• Experimental results show that this makes the models more robust
• Hypothesis: the adversary has an additional task of ensuring that the perturbations
are “familiar” to the model.
• The experimental results generalize across datasets and model
architectures.
Future Directions
• Refine the convolutional architectures
• We might need to increase depth to improve accuracy
• Depth is required for shift invariance
• Further investigate the similarities between our consistency optimization
approach and Boltzmann machines / Hopfield nets
• There is a hypothesis that signaling between cortical areas in the brain
takes place in a very small “communication subspace” [Kohn+20]
• Neural activations in the source area that lie outside this subspace cause little or no
activity in the target area
• Perhaps quantizing the activity of the model may be a way of implementing this
subspace.
References
[Goodfellow+14] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
[Madry+18] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu, “Towards deep learning models resistant to adversarial attacks,” 2017.
[Kannan+18] Kannan, Harini, Alexey Kurakin, and Ian Goodfellow. "Adversarial logit pairing." arXiv preprint arXiv:1803.06373 (2018).
[Zhang+19] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, and Michael I. Jordan. Theoretically principled trade-off between robustness and accuracy. In ICML, volume 97 of Proceedings of Machine Learning Research, pages 7472–7482. PMLR, 2019.
[Wong+20] Wong, Eric, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training." arXiv preprint arXiv:2001.03994 (2020).
[Rebuffi+21] Rebuffi, Sylvestre-Alvise, et al. "Fixing data augmentation to improve adversarial robustness." arXiv preprint arXiv:2103.01946 (2021).
[Cohen+19] Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter, “Certified adversarial robustness via randomized smoothing,” CoRR, 2019.
[Ilyas+18] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry, “Adversarial examples are not bugs, they are features,” 2019.
[Wang+19] Haohan Wang, Xindi Wu, Pengcheng Yin, and Eric P. Xing, “High frequency component helps explain the generalization of convolutional neural networks,” CoRR, 2019.
[Firestone+20] Firestone, Chaz. "Performance vs. competence in human–machine comparisons." Proceedings of the National Academy of Sciences 117.43 (2020): 26562-26571.
[Shah+21a] Shah, Muhammad A., Raphael Olivier, and Bhiksha Raj. "Towards Adversarial Robustness Via Compact Feature Representations." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE,
2021.
[Nakkiran19] Preetum Nakkiran, “Adversarial robustness may be at odds with simplicity,” 2019.
[Belghazi+2018] Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, and R Devon Hjelm, “Mine: mutual information neural estimation,” arXiv:1801.04062, 2018.
[Shah+21b] Muhammad Shah, Raphael Olivier, and Bhiksha Raj, “Exploiting non-linear redundancy for neural model compression,” in ICPR, 2021.
[Parr+18] Parr, Thomas, Geraint Rees, and Karl J. Friston. "Computational neuropsychology and Bayesian inference." Frontiers in human neuroscience (2018): 61.
[Rao+99] Rao, Rajesh PN, and Dana H. Ballard. "Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects." Nature neuroscience 2.1 (1999): 79-87.
[Paiton+20] Paiton, Dylan M., et al. "Selectivity and robustness of sparse coding networks." Journal of vision 20.12 (2020): 10-10.
[Hennig+21] Hennig, Jay A., et al. "How learning unfolds in the brain: toward an optimization view." Neuron 109.23 (2021): 3720-3735.
[Kohn+20] Kohn, Adam, et al. "Principles of corticocortical communication: proposed schemes and design considerations." Trends in neurosciences 43.9 (2020): 725-737.

More Related Content

What's hot

Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
Prof. Neeta Awasthy
 
Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)
Taehoon Kim
 
Classification vs clustering
Classification vs clusteringClassification vs clustering
Classification vs clustering
Khadija Parween
 
Decision tree
Decision treeDecision tree
Decision tree
ShraddhaPandey45
 
Variational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionVariational Bayes: A Gentle Introduction
Variational Bayes: A Gentle Introduction
Flavio Morelli
 
Linear and Logistics Regression
Linear and Logistics RegressionLinear and Logistics Regression
Linear and Logistics Regression
Mukul Kumar Singh Chauhan
 
Heap Tree.pdf
Heap Tree.pdfHeap Tree.pdf
Heap Tree.pdf
manahilzulfiqar6
 
Robustness of Deep Neural Networks
Robustness of Deep Neural NetworksRobustness of Deep Neural Networks
Robustness of Deep Neural Networks
khalooei
 
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen..."The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
LEE HOSEONG
 
Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]JULIO GONZALEZ SANZ
 
Introduction to Neural networks (under graduate course) Lecture 6 of 9
Introduction to Neural networks (under graduate course) Lecture 6 of 9Introduction to Neural networks (under graduate course) Lecture 6 of 9
Introduction to Neural networks (under graduate course) Lecture 6 of 9
Randa Elanwar
 
Deep Learning & Convolutional Neural Network
Deep Learning & Convolutional Neural NetworkDeep Learning & Convolutional Neural Network
Deep Learning & Convolutional Neural Network
agdatalab
 
Overview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningOverview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep Learning
Khang Pham
 
GBM theory code and parameters
GBM theory code and parametersGBM theory code and parameters
GBM theory code and parameters
Venkata Reddy Konasani
 
Neural Architecture Search: Learning How to Learn
Neural Architecture Search: Learning How to LearnNeural Architecture Search: Learning How to Learn
Neural Architecture Search: Learning How to Learn
Kwanghee Choi
 
Gradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introductionGradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introduction
Christian Perone
 
Recurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: TheoryRecurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: Theory
Andrii Gakhov
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)
Dong Guo
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
홍배 김
 
Enhanced Deep Residual Networks for Single Image Super-Resolution
Enhanced Deep Residual Networks for Single Image Super-ResolutionEnhanced Deep Residual Networks for Single Image Super-Resolution
Enhanced Deep Residual Networks for Single Image Super-Resolution
NAVER Engineering
 

What's hot (20)

Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)
 
Classification vs clustering
Classification vs clusteringClassification vs clustering
Classification vs clustering
 
Decision tree
Decision treeDecision tree
Decision tree
 
Variational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionVariational Bayes: A Gentle Introduction
Variational Bayes: A Gentle Introduction
 
Linear and Logistics Regression
Linear and Logistics RegressionLinear and Logistics Regression
Linear and Logistics Regression
 
Heap Tree.pdf
Heap Tree.pdfHeap Tree.pdf
Heap Tree.pdf
 
Robustness of Deep Neural Networks
Robustness of Deep Neural NetworksRobustness of Deep Neural Networks
Robustness of Deep Neural Networks
 
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen..."The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
 
Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]
 
Introduction to Neural networks (under graduate course) Lecture 6 of 9
Introduction to Neural networks (under graduate course) Lecture 6 of 9Introduction to Neural networks (under graduate course) Lecture 6 of 9
Introduction to Neural networks (under graduate course) Lecture 6 of 9
 
Deep Learning & Convolutional Neural Network
Deep Learning & Convolutional Neural NetworkDeep Learning & Convolutional Neural Network
Deep Learning & Convolutional Neural Network
 
Overview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningOverview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep Learning
 
GBM theory code and parameters
GBM theory code and parametersGBM theory code and parameters
GBM theory code and parameters
 
Neural Architecture Search: Learning How to Learn
Neural Architecture Search: Learning How to LearnNeural Architecture Search: Learning How to Learn
Neural Architecture Search: Learning How to Learn
 
Gradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introductionGradient-based optimization for Deep Learning: a short introduction
Gradient-based optimization for Deep Learning: a short introduction
 
Recurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: TheoryRecurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: Theory
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Enhanced Deep Residual Networks for Single Image Super-Resolution
Enhanced Deep Residual Networks for Single Image Super-ResolutionEnhanced Deep Residual Networks for Single Image Super-Resolution
Enhanced Deep Residual Networks for Single Image Super-Resolution
 

Similar to adversarial robustness lecture

riken-RBlur-slides.pptx
riken-RBlur-slides.pptxriken-RBlur-slides.pptx
riken-RBlur-slides.pptx
MuhammadAhmedShah2
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
ChenYiHuang5
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
DataRobot
 
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
PyData
 
Optimization Techniques.pdf
Optimization Techniques.pdfOptimization Techniques.pdf
Optimization Techniques.pdf
anandsimple
 
Operations Research.pptx
Operations Research.pptxOperations Research.pptx
Operations Research.pptx
banhi.guha
 
BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...
BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...
BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...
GeekPwn Keen
 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for Regression
Seonho Park
 
IFTA2020 Kei Nakagawa
IFTA2020 Kei NakagawaIFTA2020 Kei Nakagawa
IFTA2020 Kei Nakagawa
Kei Nakagawa
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
Jason Riedy
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learning
Yogendra Singh
 
Deep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptxDeep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptx
FreefireGarena30
 
Model Selection and Validation
Model Selection and ValidationModel Selection and Validation
Model Selection and Validation
gmorishita
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
ananth
 
第19回ステアラボ人工知能セミナー発表資料
第19回ステアラボ人工知能セミナー発表資料第19回ステアラボ人工知能セミナー発表資料
第19回ステアラボ人工知能セミナー発表資料
Takayuki Osogami
 
Training DNN Models - II.pptx
Training DNN Models - II.pptxTraining DNN Models - II.pptx
Training DNN Models - II.pptx
PrabhuSelvaraj15
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلة
Fares Al-Qunaieer
 
Talwalkar mlconf (1)
Talwalkar mlconf (1)Talwalkar mlconf (1)
Talwalkar mlconf (1)
MLconf
 

Similar to adversarial robustness lecture (20)

riken-RBlur-slides.pptx
riken-RBlur-slides.pptxriken-RBlur-slides.pptx
riken-RBlur-slides.pptx
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
 
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
 
Optimization Techniques.pdf
Optimization Techniques.pdfOptimization Techniques.pdf
Optimization Techniques.pdf
 
Operations Research.pptx
Operations Research.pptxOperations Research.pptx
Operations Research.pptx
 
BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...
BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...
BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D...
 
gan.pdf
gan.pdfgan.pdf
gan.pdf
 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for Regression
 
IFTA2020 Kei Nakagawa
IFTA2020 Kei NakagawaIFTA2020 Kei Nakagawa
IFTA2020 Kei Nakagawa
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 
L1 intro2 supervised_learning
L1 intro2 supervised_learningL1 intro2 supervised_learning
L1 intro2 supervised_learning
 
Deep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptxDeep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptx
 
Model Selection and Validation
Model Selection and ValidationModel Selection and Validation
Model Selection and Validation
 
06 cs661 qb1_sn
06 cs661 qb1_sn06 cs661 qb1_sn
06 cs661 qb1_sn
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
 
第19回ステアラボ人工知能セミナー発表資料
第19回ステアラボ人工知能セミナー発表資料第19回ステアラボ人工知能セミナー発表資料
第19回ステアラボ人工知能セミナー発表資料
 
Training DNN Models - II.pptx
Training DNN Models - II.pptxTraining DNN Models - II.pptx
Training DNN Models - II.pptx
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلة
 
Talwalkar mlconf (1)
Talwalkar mlconf (1)Talwalkar mlconf (1)
Talwalkar mlconf (1)
 

Recently uploaded

Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdfMASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
goswamiyash170123
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
chanes7
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
Krisztián Száraz
 
Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
Bisnar Chase Personal Injury Attorneys
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
NelTorrente
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Introduction to AI for Nonprofits with Tapp Network
  • 22. Outline • Part I: Introduction to Adversarial Perturbations • What are adversarial perturbations? • How are adversarial perturbations created? • How to defend against adversarial perturbations? • Why do adversarial perturbations exist? • Part II: Past and Current Projects in Our Group • Towards Adversarial Robustness via Compact Feature Representations • Biologically Inspired Models for Adversarial Robustness
• 23. Types of Adversarial Defenses
  • Empirical Defenses:
    • Work well in most practical scenarios (hence "empirical").
    • No formal proof that they will always work.
  • Certifiable Defenses:
    • Come with a proof (certificate) that the model's prediction for a given input does not change if any perturbation with ‖𝛿‖ ≤ 𝜖 is added to it.
• 24. Adversarial Training
  • AT is perhaps the most successful empirical defense against adversarial attacks.
  • Over the years, several variations of AT have been proposed that have made it more effective.
  • Basic algorithm [Madry+2017]:
    1. 𝑓𝑜𝑟 (𝑥, 𝑦) ∈ 𝐷:
    2.   𝑥_adv ← argmax_{𝑥′∈𝒳: ‖𝑥−𝑥′‖≤𝜖} ℒ(𝑓_𝜃(𝑥′), 𝑦)
    3.   𝜃 ← 𝜃 − 𝜂 ∇_𝜃 ℒ(𝑓_𝜃(𝑥_adv), 𝑦)
  • Note that we create a new adversarial example in each iteration. Why?
    • The model has changed, so the gradients change, and thus the adversarial perturbation changes.
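To make the inner-max / outer-min structure concrete, here is a minimal PyTorch sketch of one epoch of PGD-based adversarial training. It is an illustration, not the authors' code; `model`, `loader`, `opt` and the attack hyper-parameters (eps, alpha, steps) are assumed placeholders.

import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Inner maximization: find a perturbation inside the l_inf ball that maximizes the loss.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return torch.clamp(x + delta, 0, 1).detach()

def adversarial_training_epoch(model, loader, opt):
    for x, y in loader:
        x_adv = pgd_perturb(model, x, y)              # a fresh perturbation every iteration
        opt.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()   # outer minimization on the perturbed batch
        opt.step()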
• 25. Adversarial Training
  • Some issues:
    • Multiple optimization steps are required for computing adversarial perturbations => slow and computationally intensive => difficult to scale to larger models/datasets.
    • Robust overfitting – robust accuracy improves on the train set but decreases on the test set.
    • Overfitting to the attack type and attack parameters used during training.
• 26. Incremental Improvements to Adversarial Training
  • Logit Pairing [Kannan+18]: 𝜃 ← 𝜃 − 𝜂 ∇_𝜃 [ ℒ(𝑓_𝜃(𝑥_adv), 𝑦) + 𝛼 ‖𝑓_𝜃(𝑥) − 𝑓_𝜃(𝑥_adv)‖² ]
  • TRADES [Zhang+19]:
    𝑥_adv = argmax_{𝑥′∈𝒳: ‖𝑥−𝑥′‖≤𝜖} ℒ(𝑓_𝜃(𝑥), 𝑓_𝜃(𝑥′))
    𝜃 ← 𝜃 − 𝜂 ∇_𝜃 [ ℒ(𝑓_𝜃(𝑥), 𝑦) + ℒ(𝑓_𝜃(𝑥), 𝑓_𝜃(𝑥_adv)) / 𝜆 ]
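Below is a hedged sketch of a TRADES-style loss in PyTorch, using the KL divergence between the clean and perturbed output distributions as the consistency term ℒ(𝑓_𝜃(𝑥), 𝑓_𝜃(𝑥_adv)); here beta plays the role of 1/𝜆. The model and hyper-parameters are assumptions, not values from the paper.

import torch
import torch.nn.functional as F

def trades_loss(model, x, y, eps=8/255, alpha=2/255, steps=10, beta=6.0):
    # Inner maximization: perturb x to maximally change the model's output distribution.
    p_clean = F.softmax(model(x), dim=1).detach()
    x_adv = (x + 0.001 * torch.randn_like(x)).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean, reduction="batchmean")
        grad, = torch.autograd.grad(kl, x_adv)
        x_adv = torch.min(torch.max(x_adv + alpha * grad.sign(), x - eps), x + eps)
        x_adv = x_adv.clamp(0, 1).detach()
    # Outer objective: clean cross-entropy plus the weighted consistency term.
    robust_kl = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                         F.softmax(model(x), dim=1), reduction="batchmean")
    return F.cross_entropy(model(x), y) + beta * robust_kl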
  • 27. Speeding Up Adversarial Training [Wong+20]
• 28. Speeding Up Adversarial Training [Wong+20]
  • Use a single FGSM step instead of multiple PGD steps during adversarial training.
  • Generally this does not work, but the authors state that the key is random initialization of the perturbation.
  • Can train a robust ImageNet model in about 12 hours, compared to the roughly 50 hours required by previous adversarial training approaches.
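A sketch of the single-step scheme, under the same placeholder names as the earlier adversarial-training sketch; the step sizes are illustrative, not the paper's exact settings.

import torch
import torch.nn.functional as F

def fast_fgsm_training_step(model, opt, x, y, eps=8/255, alpha=10/255):
    # Random initialization inside the l_inf ball -- the ingredient [Wong+20] identify as key.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
    grad, = torch.autograd.grad(loss, delta)
    delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()  # a single FGSM step
    opt.zero_grad()
    F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y).backward()
    opt.step()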
• 30. State-of-the-Art Adversarial Training [Rebuffi+21]
  • Combines model weight averaging, data augmentation and synthetic data generation to achieve SOTA robust accuracy on CIFAR-10.
  • Weight Averaging (an exponential moving average 𝜃̄ of the training weights 𝜃):
    1. 𝜃_{t+1} = 𝜃_t − 𝜂 ∇_𝜃 ℒ
    2. 𝜃̄_{t+1} = 𝜏 𝜃̄_t + (1 − 𝜏) 𝜃_{t+1}
  • Uses TRADES to perform AT.
  • Key outcomes:
    • Weight averaging improves robustness.
    • CutMix data augmentation provides the best clean-robust accuracy tradeoff.
    • Increasing the training data improves robust accuracy.
    • Using a small amount of synthetic data is beneficial.
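A minimal sketch of the weight-averaging step, assuming a separate averaged copy of the model (`avg_model`) is kept alongside the one being trained; 𝜏 is a placeholder value and batch-norm buffers are ignored for brevity.

import copy
import torch

@torch.no_grad()
def update_weight_average(avg_model, model, tau=0.995):
    # theta_bar <- tau * theta_bar + (1 - tau) * theta
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.mul_(tau).add_(p, alpha=1 - tau)

# Usage: avg_model = copy.deepcopy(model); call update_weight_average(avg_model, model)
# after every optimizer step, and evaluate the averaged model.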
• 31. Randomized Smoothing [Cohen+19]
  • RS is a very popular certifiable defense.
  • Smoothed classifier 𝑔:  𝑔(𝑥) = argmax_{𝑐∈𝒴} 𝑃_{𝜀∼𝒩(0, 𝜎²𝐼)}[ 𝑓(𝑥 + 𝜀) = 𝑐 ]
  • Smoothing to robustness:
    • Let 𝑝_A and 𝑝_B be the probabilities of the most probable and second-most probable class, respectively, and let Φ be the standard Gaussian CDF.
    • 𝑔 is robust in an ℓ2 ball of radius 𝑅 = (𝜎/2) (Φ⁻¹(𝑝_A) − Φ⁻¹(𝑝_B)) around 𝑥.
• 32. Randomized Smoothing [Cohen+19]
  • SAMPLEUNDERNOISE:
    1. 𝑓𝑜𝑟 𝑖: 1 → 𝑛:
    2.   𝜖 ∼ 𝒩(0, 𝜎²𝐼)
    3.   𝑐 = 𝑓(𝑥 + 𝜖)
    4.   𝑐𝑜𝑢𝑛𝑡𝑠[𝑐] += 1
  • BINOMPVALUE returns the p-value of the hypothesis test that 𝑛_A ∼ Binomial(𝑛_A + 𝑛_B, 1/2); if it exceeds the significance level, the smoothed classifier abstains.
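A Monte-Carlo sketch of prediction with the smoothed classifier, in the spirit of the procedure above (not the paper's exact PREDICT/CERTIFY code). `base_classifier` is assumed to take a batch of one image; sigma, n and alpha are placeholder values.

import torch
from scipy.stats import binomtest

def smoothed_predict(base_classifier, x, sigma=0.25, n=1000, alpha=0.001):
    counts = {}
    with torch.no_grad():
        for _ in range(n):                                   # SAMPLEUNDERNOISE
            noisy = x + sigma * torch.randn_like(x)          # eps ~ N(0, sigma^2 I)
            c = base_classifier(noisy).argmax(dim=1).item()  # f(x + eps), batch size 1
            counts[c] = counts.get(c, 0) + 1
    top2 = sorted(counts.items(), key=lambda kv: -kv[1])[:2]
    c_a, n_a = top2[0]
    n_b = top2[1][1] if len(top2) > 1 else 0
    # Abstain unless the top class beats the runner-up significantly (the BINOMPVALUE test).
    if binomtest(n_a, n_a + n_b, 0.5).pvalue > alpha:
        return None
    return c_a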
  • 33. Outline • Part I: Introduction to Adversarial Perturbations • What are adversarial perturbations? • How are adversarial perturbations created? • How to defend against adversarial perturbations? • Why do adversarial perturbations exist? • Part II: Past and Current Projects in Our Group • Towards Adversarial Robustness via Compact Feature Representations • Biologically Inspired Models for Adversarial Robustness
• 34. The Classifier is Learning Non-Robust Features
  • The features extracted from the data can be of 3 types:
    • Robust: features that are likely to be meaningful to humans as well.
    • Non-robust: features that are useful for the task at hand but are either not what humans would use, or whose usefulness is an artifact of the dataset.
    • Useless: features that cannot be used to improve performance on the task.
• 35. Example: XOR
  • Now the algorithm has been provided this new table instead.
  • Target function: 𝑌 = 𝑋1 xor 𝑋2
  • 𝑋3 is a spurious input.

    X1 X2 X3 | Y
    0  0  0  | 0
    0  1  0  | 1
    1  0  0  | 1
    1  1  0  | 0
    0  0  1  | 0
    0  1  1  | 1
    1  0  1  | 1
    1  1  1  | 0
• 36. Example: XOR
  • The algorithm can learn any of these patterns for the unseen input combinations.
  • Only one is right for our target function.
  • If it learns any of the others, the output for some combinations of X1 and X2 can be made erroneous by choosing the right X3.
  • [Table: the observed rows of the truth table together with the possible completions of the missing (X1, X2, X3) patterns.]
• 37. Example: XOR
  • The number of missing patterns is exponential in the number of spurious bits: for a dataset of 𝐷 rows over 𝐾 binary inputs there are 2^𝐾 − 𝐷 missing patterns.
  • The number of possible extensions of the table is exponential in the number of missing patterns: 2^(2^𝐾 − 𝐷).
  • The number of ways of adversarially modifying inputs therefore increases super-exponentially with the number of spurious bits.
  • [Table: the partial truth table and its candidate completions, as on the previous slide.]
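A two-line check of the counting argument, assuming D observed rows over K binary inputs (the numbers match the example: K = 3, D = 6 observed rows).

K, D = 3, 6
missing_patterns = 2**K - D                    # input combinations never seen in training
possible_extensions = 2**missing_patterns      # ways to label the unseen combinations
print(missing_patterns, possible_extensions)   # -> 2 4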
• 38. Robust and Non-Robust Features in Images
  • [Ilyas+18] used adversarial attacks to disentangle robust and non-robust features.
  • Non-robust features can provide good accuracy on clean data.
• 39. High Frequency Features [Wang+20]
  • [Wang+20] identifies a particular type of non-robust feature – high frequency (HF) components.
  • Human perception generally operates on a bounded band of frequencies, both visually and aurally.
    • HF components in audio are inaudible to humans (e.g., a dog whistle).
    • HF components in images look like noise to us, and we tend to filter them out.
  • The HF components of an image are separated out by computing the Fourier transform of the image and thresholding each component's distance from the centroid of the spectrum.
  • [Figure: example image and its Fourier transform (log magnitude); source: https://homepages.inf.ed.ac.uk/rbf/HIPR2/fourier.htm]
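A small NumPy sketch of that separation: threshold each frequency component's distance from the centre of the (shifted) spectrum. The cutoff radius r is an arbitrary assumption, not a value from [Wang+20].

import numpy as np

def split_frequencies(img, r=12):
    """img: 2-D grayscale array; returns (low-frequency part, high-frequency part)."""
    F = np.fft.fftshift(np.fft.fft2(img))                  # zero frequency moved to the centre
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)  # distance from the centroid
    low_mask = dist <= r
    low = np.real(np.fft.ifft2(np.fft.ifftshift(F * low_mask)))
    high = np.real(np.fft.ifft2(np.fft.ifftshift(F * ~low_mask)))
    return low, high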
• 40. High Frequency Features [Wang+20]
  • It turns out image classifiers are heavily reliant on high-frequency components.
    • If a trained model is presented with only the low frequency (LF) components, it performs poorly.
    • If it is presented with only the high frequency components, it still predicts correctly.
  • In the initial epochs, using only LF components does not harm training accuracy, but in later epochs it does
    ⇒ models learn LF components first and then learn HF components.
• 41. High Frequency Features [Wang+20]
  • Frequency components and robustness:
    • Models that rely heavily on HF components are vulnerable.
    • Humans don't rely on HF components, so changing them will likely not change the human's prediction.
    • If a convolutional filter heavily weights HF components, the model will rely on HF components.
  • Hypothesis: smoothing the filters will improve robustness.
    • There is a noticeable improvement.
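As a rough illustration of the smoothing hypothesis (not the exact procedure used in [Wang+20]), one can blur each learned convolutional kernel with a small box filter, which attenuates its high-frequency content:

import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def smooth_conv_filters(conv: nn.Conv2d, k: int = 3):
    out_c, in_c, kh, kw = conv.weight.shape
    box = torch.full((1, 1, k, k), 1.0 / (k * k))                   # averaging kernel
    w = conv.weight.view(out_c * in_c, 1, kh, kw)
    smoothed = F.conv2d(w, box, padding=k // 2).view(out_c, in_c, kh, kw)
    conv.weight.copy_(smoothed)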
• 42. Differences between Biological Systems and ML Models [Firestone+20] • Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples. • Humans have foveated vision. • Adding a foveated filter to images before passing them to CNNs yields more semantically meaningful perturbations.
  • 43. Differences between Biological Systems and ML Models [Firestone+20] • Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples. • Humans have foveated vision. • Machines have a restricted vocabulary. 1. Crossword 2. Bagel 3. Starfish 4. School bus
  • 44. Differences between Biological Systems and ML Models [Firestone+20] • Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples. • Humans have foveated vision. • Machines have a restricted vocabulary. • Different inductive biases – shape vs. texture.
  • 45. Differences between Biological Systems and ML Models [Firestone+20] • Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples. • Humans have foveated vision. • Machines have a restricted vocabulary. • Different inductive biases – shape vs. texture. • Machines (usually) perform feed-forward processing, while humans can do recurrent processing.
  • 46. Outline • Part I: Introduction to Adversarial Perturbations • What are adversarial perturbations? • How are adversarial perturbations created? • How to defend against adversarial perturbations? • Why do adversarial perturbations exist? • Part II: Past and Current Projects in Our Group • Towards Adversarial Robustness via Compact Feature Representations • Biologically Inspired Models for Adversarial Robustness
  • 47. Towards Adversarial Robustness via Compact Feature Representations [Shah+21a]
• 48. Recall the Example: XOR
  • Now the algorithm has been provided this new table instead.
  • Target function: 𝑌 = 𝑋1 xor 𝑋2
  • 𝑋3 is a spurious input.

    X1 X2 X3 | Y
    0  0  0  | 0
    0  1  0  | 1
    1  0  0  | 1
    1  1  0  | 0
    0  0  1  | 0
    0  1  1  | 1
    1  0  1  | 1
    1  1  1  | 0
• 49. Neurons Are Features in DNNs
  • Too few neurons ⇒ features are under-specified ⇒ low accuracy
  • Too many neurons ⇒ features are over-specified ⇒ adversarial vulnerability
  • We propose a method to identify superfluous features and remove/reduce their influence on the model's output.
• 50. Identifying Superfluous Features
  • Let 𝒇 = [𝑓1, …, 𝑓𝑛] be the feature vector.
  • We can decompose each 𝑓𝑖 ∈ 𝒇 as 𝑓𝑖 = 𝜙(𝒇_{−i}) + 𝛿𝑖
    • 𝛿𝑖 is the novel information encoded by 𝑓𝑖, and 𝜙(𝒇_{−i}) is the redundant information.
  • We quantify the usefulness of 𝑓𝑖 as 𝐼(𝛿𝑖, 𝑌), where 𝑌 is the true label.
  • Less useful (superfluous) features are not related to the true label and can be exploited by the adversary.
• 51. Decomposing the Features [Shah+20]
  • We assume 𝜙 to be a linear function: 𝜙(𝒇_{−i}) = 𝒇_{−i} 𝒂^{(i)}
  • We can find 𝒂^{(i)} by solving  minimize_{𝒂 ∈ ℝ^{n−1}} 𝔼_{𝑥∼𝒫} ‖𝒇_{−i} 𝒂^{(i)} − 𝑓𝑖‖²
  • We can then compute 𝛿𝑖 = 𝑓𝑖 − 𝜙(𝒇_{−i})
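A minimal NumPy sketch of this decomposition, treating `feats` as a (samples × n) matrix of recorded activations; all names are placeholders.

import numpy as np

def decompose_feature(feats, i):
    f_i = feats[:, i]
    f_rest = np.delete(feats, i, axis=1)
    a_i, *_ = np.linalg.lstsq(f_rest, f_i, rcond=None)  # ordinary least squares fit of phi
    phi = f_rest @ a_i                                   # redundant component phi(f_{-i})
    delta_i = f_i - phi                                  # novel information delta_i
    return a_i, delta_i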
• 52. Determining the Usefulness of a Feature [Shah+20]
  • Recall: we define the usefulness of 𝑓𝑖 as 𝐼(𝛿𝑖, 𝑌).
  • 𝐼(𝛿𝑖, 𝑌) is difficult to compute, so we estimate it using two methods:
    • First-order approximation: 𝐼(𝛿𝑖, 𝑌) ≈ ∇_{𝑓𝑖} ℒ(𝑋, 𝑌) ⋅ 𝛿𝑖
    • MINE [Belghazi+2018]: uses a neural network to estimate MI.
  • We rank the features by their usefulness and remove the least useful features using LRE-AMC.
• 53. Removing Features Using LRE-AMC [Shah+21b]
  • Given the neuron-feature equivalence 𝑓𝑖 = 𝜎(𝑊𝑖 𝒇), removing a feature means removing a neuron.
  • To remove 𝑓𝑖 we can simply remove the 𝑖-th column of the next layer's weight matrix 𝑊.
  • This also removes the influence of 𝜙(𝒇_{−i}), which may cause the weights in the next layer (𝑊) to become sub-optimal.
  • To mitigate the error, we make the following adjustment: 𝑊_{kj} ← 𝑊_{kj} + 𝑊_{ki} 𝒂_j^{(i)}
  • After removing the neuron, we fine-tune the network.
• 54. Lossless Redundancy Elimination
  • Consider this network (inputs 𝑧1, 𝑧2, 𝑧3; outputs 𝑦1, 𝑦2):
    • 𝑦1 = 𝑤11 𝑧1 + 𝑤12 𝑧2 + 𝑤13 𝑧3
    • 𝑦2 = 𝑤21 𝑧1 + 𝑤22 𝑧2 + 𝑤23 𝑧3
  • 55. Lossless Redundancy Elimination
  • Suppose 𝑧1 = 𝑧2 + 𝑧3.
  • 56. Lossless Redundancy Elimination
  • We can remove 𝑧1 and readjust the weights:
    • 𝑤12 ← 𝑤12 + 𝑤11,  𝑤13 ← 𝑤13 + 𝑤11
    • 𝑤22 ← 𝑤22 + 𝑤21,  𝑤23 ← 𝑤23 + 𝑤21
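The same adjustment in general form, as a small NumPy sketch: if neuron i is (approximately) a linear combination of the others with coefficients 𝒂, fold its outgoing weights into the remaining columns before deleting it. `W_next` is the next layer's weight matrix; the names are placeholders.

import numpy as np

def remove_redundant_neuron(W_next, i, a):
    """W_next: (out x n) outgoing weights; a: length n-1 coefficients of z_i on the other z's."""
    keep = [j for j in range(W_next.shape[1]) if j != i]
    return W_next[:, keep] + np.outer(W_next[:, i], a)   # W_kj <- W_kj + W_ki * a_j

# For the example above (z1 = z2 + z3): a = [1, 1], so w11 is added to w12 and w13,
# and w21 is added to w22 and w23.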
• 57. Evaluation Setup
  • Models:
    • VGG-16 trained on CIFAR-10
    • AlexNet trained on CIFAR-10
    • LeNet trained on MNIST
  • Attack settings: PGD adversary with various ℓ∞ and ℓ2 constraints.
  • Baselines:
    • Adversarial Training [Madry+17]
    • Gaussian Smoothing [Cohen+19]
    • Vanilla LRE-AMC (ranks and prunes neurons with low 𝛿𝑖)
• 58. Results

  Robust accuracy (%) under PGD attacks of various strengths; −Δ𝑃 is the parameter reduction (%) and 𝐴𝑐𝑐_cln is clean accuracy (%).

  Model    Method     −Δ𝑃   𝐴𝑐𝑐_cln  ℓ∞=0.015  ℓ∞=0.031  ℓ∞=0.062  ℓ2=0.5  ℓ2=1.0  ℓ2=2.0
  LeNet    None       0     99.1     1.2       0         0         18      3.97    1
  LeNet    Ours (MI)  86.8  95.7     13.1      12.1      10.2      14.9    12.7    11.5
  LeNet    Ours (FO)  93.9  96.2     6.3       4.5       2.8       6.3     2.8     0.8
  AlexNet  None       0     77.5     8.9       0.3       0.08      7.23    0.2     0.06
  AlexNet  LRE-AMC    97.7  74.6     23.7      17        14.3      25      17.3    14.5
  AlexNet  Ours (FO)  98.3  72.3     14.6      10.1      9.5       13.2    9.8     9.2
  AlexNet  Ours (MI)  98.3  72.2     10.2      4.2       3.2       9.5     4.3     3.7
  VGG-16   None       0     90.3     1.4       0         0         4       1.8     0.6
  VGG-16   AdvTrain   0     74.9     57.1      37.1      8.6       53.2    27.6    3.5
  VGG-16   GSmooth    0     82.9     43.5      13.8      0.8       47.6    16.6    1.0
  VGG-16   Ours (FO)  87.7  85.6     20.0      17.4      13.3      20.6    19.3    15.8
  VGG-16   LRE-AMC    84.6  87.7     11.2      9.3       5.7       11.5    9.9     6.8
  VGG-16   Ours (MI)  98.3  85.7     11.8      9.2       7.0       12.4    9.5     7.1

  • Removing spurious features significantly improves robustness.
  • First-order estimation of MI seems to work better than MINE.
  • The robustness gains of our techniques generalize even to larger perturbation sizes, where they outperform adversarial training and Gaussian smoothing.
  • Note that, unlike the other defenses, our method does not employ perturbed data at any point.
• 59. Conclusion
  • We have shown that pruning neurons that encode superfluous features improves the robustness of DNNs while making them more compact.
  • Our results appear to contradict [Nakkiran, 2019; Madry+, 2017], who posit that high-capacity models are a pre-requisite for adversarial robustness.
  • Our results show that high capacity may be required at training time to learn robust features, but judiciously removing spurious neurons/features can make the models much more robust.
  • 60. Outline • Part I: Introduction to Adversarial Perturbations • What are adversarial perturbations? • How are adversarial perturbations created? • How to defend against adversarial perturbations? • Why do adversarial perturbations exist? • Part II: Past and Current Projects in Our Group • Towards Adversarial Robustness via Compact Feature Representations • Biologically Inspired Models for Adversarial Robustness
• 61. Differences between Biological Systems and ML Models [Firestone+20] • Biological and artificial systems are constrained in different ways, so comparisons based only on performance are not apples-to-apples. • Humans have foveated vision. • Machines have a restricted vocabulary. • Different inductive biases – shape vs. texture. • Machines (usually) perform feed-forward processing, while humans can do recurrent processing.
• 62. The Bayesian Brain Hypothesis [Parr+18]
  • The brain encodes beliefs (in its synaptic weights) about the causes of sensory data, and these beliefs are updated in response to new sensory information.
  • Complete class theorem: there is always a prior belief that renders an observed behavior Bayes optimal.
  • Pathologies related to loss of sensory signals can be explained as there being no observation to modulate the influence of the prior.
  • Autism can be understood as weak prior beliefs about the environment, due to which patients rely to a greater degree on the stimuli obtained from the environment.
  • There are ascending and descending connections in the brain:
    • Descending connections carry predictions, and ascending connections carry prediction errors.
• 63. The Predictive Coding Hypothesis [Rao+99]
  • "The approach postulates that neural networks learn the statistical regularities of the natural world, signaling deviations from such regularities to higher processing centers. This reduces redundancy by removing the predictable, and hence redundant, components of the input signal." [Rao+99]
  • Generative model (each conditional contributes the corresponding squared-error term to the objective):
    • 𝐼 | 𝑟 ∼ 𝒩(𝑓(𝑈𝑟), 𝜎_I²)  ⇒  ‖𝐼 − 𝑓(𝑈𝑟)‖² term
    • 𝑟 | 𝑟^td ∼ 𝒩(𝑟^td, 𝜎_td²)  ⇒  ‖𝑟 − 𝑟^td‖² term
    • plus a prior term 𝑔(𝑟) on the activations
• 64. The Predictive Coding Hypothesis [Rao+99]
  • Another way to look at this hypothesis is that the brain is aligning its observations with its expectations.
  • Since the brain mostly observes natural (clean) data, its prior is closely aligned with clean data.
  • We hypothesize that when we observe adversarially perturbed data, the brain aligns it with its expectation and, in a way, denoises the data internally.
  • 65. Sparse Coding Hypothesis [Paiton+20] • The neural activation patterns that arise in response to a stimulus tend to be very sparse. • This is true deep within cortical areas, as well as in the retina and Lateral Geniculate Nucleus.
• 66. Sparse Coding Hypothesis [Paiton+20]
  • Locally Competitive Algorithm (LCA) [Rozell+18], used to learn sparse representations from data:
    𝐸(𝑢) = ‖𝑥 − Φ𝜎(𝑢)‖₂² + 𝜆‖𝜎(𝑢)‖₁
    1. 𝑓𝑜𝑟 𝑘: 1 → 𝐾:
    2.   𝑢 ← 𝑢 − 𝜂 ∇_𝑢 𝐸(𝑢)
    3.   Φ ← Φ − 𝜂 ∇_Φ 𝐸(𝑢)
  • ∇_𝑢 𝐸(𝑢) = 𝜎′(𝑢) ⊙ (−Φᵀ𝑥 + ΦᵀΦ𝜎(𝑢) − 𝜆𝟏)
  • Conventionally, each activation is computed (somewhat) conditionally independently of the other activations – these are pointwise activations.
  • Because of the ΦᵀΦ𝜎(𝑢) term, each activation here depends on the activations of all the other neurons around it – these are population activations.
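As a runnable stand-in for the LCA dynamics (it is not the exact LCA update), here is a small ISTA-style solver for the sparse-coding energy ½‖x − Φa‖₂² + λ‖a‖₁; hyper-parameters are assumptions.

import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, Phi, lam=0.1, steps=200):
    eta = 1.0 / np.linalg.norm(Phi, 2) ** 2    # step size from the largest singular value of Phi
    a = np.zeros(Phi.shape[1])
    for _ in range(steps):
        grad = Phi.T @ (Phi @ a - x)           # this term couples every unit to the whole population
        a = soft_threshold(a - eta * grad, eta * lam)
    return a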
• 67. Sparse Coding Hypothesis [Paiton+20] – Relation to Adversarial Robustness
  • An iso-contour is a set of points at which a function has the same value.
  • For point-wise non-linear neurons, the iso-response contours are straight:
    • Perturbations perpendicular to the weight vector do not change the output.
    • Perturbations parallel to the weight vector can change the output, and they can be arbitrarily far from the weights.
  • Neurons with population non-linearities (e.g., with horizontal connections) have curved iso-response contours:
    • If a perturbation is perpendicular to the weight of one neuron, it might not be perpendicular to the weight of some other neuron, so the output of all the neurons will change.
  • Curved iso-contours indicate specificity towards a preferred direction.
    • For highly curved surfaces, perturbations need to be parallel and close to the weight vector.
    • Curvature increases as over-completeness is increased.
• 68. Sparse Coding Hypothesis [Paiton+20]
  • Point-wise non-linear neurons – the adversarial attack travels parallel to the target class's weight vector until it hits the decision boundary.
  • Population non-linear neurons – the attack travels perpendicular to the iso-contours until it reaches the target class's weight vector and then travels along it.
    • The perturbation is likely to be semantically meaningful.
  • Key takeaway: population non-linearities induce robustness by increasing the specificity of the neurons.
• 69. Neural Activity Covariance [Hennig+21]
  • It has been observed that neurons in the brain exhibit a degree of covariability.
  • Importantly, this covariability pattern remains largely fixed in the short term, even if it hampers task performance.
  • We hypothesize that this covariability pattern might have implications for robustness:
    • The space of possible activation patterns is reduced.
    • The adversary cannot modify the image in any arbitrary way; rather, the perturbations must respect the covariability pattern.
    • If they do not, the neural activities will be "projected" onto the space that respects the covariability pattern, and under this projection the adversarial perturbation may no longer be adversarial.
• 70. Computationalizing Neural Covariability
  • A straightforward approach: covariability ≡ covariance
    • Choose a family of probability distributions to model the conditional distribution of the activations, 𝑃(𝑎𝑖 | 𝑎_{−i}; Σ).
    • Update each 𝑎𝑖 to maximize its conditional probability.
  • Advantages: simple and interpretable.
  • Disadvantages:
    • Simple distributions tend to be unimodal, so the activations will be pushed to a single value and thus rendered useless.
    • Complicated multimodal distributions might be harder to learn and would still have a relatively small number of stationary points.
    • Covariance is symmetric by definition; however, we have no reason to believe that such symmetry of incoming and outgoing synaptic weights exists in the brain.
• 71. Computationalizing Neural Covariability
  • Another approach: covariability ≡ predictability
    • Let 𝑎𝑖 ≈ 𝑓_𝑊(𝑎_{−i}).
    • Update each 𝑎𝑖 to minimize a distance 𝐷(𝑎𝑖, 𝑓_𝑊(𝑎_{−i})).
  • Advantages:
    • Flexibility: 𝑓_𝑊 can be selected to give a larger number of stationary points.
    • Symmetric relationships are not required.
  • Starting to look like Boltzmann Machines…
• 72. Emergent Architecture
  • Synaptic connections are continuously being created and destroyed in the brain.
  • The network of connections, with features like recurrence, feedback and skip connections, is somewhat emergent.
  • We try to simulate that by considering a fully connected network.
• 73. A Fully Connected (FC) Network with Activation Consistency Maximization
  • Let 𝑎𝑖 ≈ 𝑓_{𝑊^l}(𝑎_{−i}) and let 𝑓_{𝑊^l}(𝑎_{−i}) = 𝜓(𝑊^l 𝑎_{−i}).
  • Alternatively, we can let 𝑎𝑖 ≈ 𝑓_{𝑊^l}(𝑎) and constrain 𝑊^l to have zeros on the diagonal.
  • Inference algorithm:
    1. 𝑠 ← 𝑊^f 𝑥
    2. 𝑓𝑜𝑟 𝑖: 1 → 𝑘:
    3.   𝑎 ← 𝜙(𝑠)
    4.   𝑠 ← 𝑎 − 𝜂_𝑎 ∇_𝑎 ‖𝑎 − 𝜓(𝑊^l 𝑎)‖₂²    (with 𝑊^l_{ii} = 0, ∀𝑖)
    5. 𝑦 ← 𝑠_{1:C}
  • During training, run the inference algorithm and backprop through it.
  • Low-magnitude weights are pruned away after training.
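A PyTorch sketch of this inference loop for the linear case (𝜓 = identity), with the consistency gradient written in closed form so the whole loop stays differentiable for training. Dimensions, 𝜂 and the ReLU choice for 𝜙 are assumptions.

import torch
import torch.nn as nn

class ConsistencyFC(nn.Module):
    def __init__(self, in_dim, n_units, n_classes, k=32, eta=0.1):
        super().__init__()
        self.W_f = nn.Linear(in_dim, n_units)                           # feed-forward projection
        self.W_l = nn.Parameter(0.01 * torch.randn(n_units, n_units))   # lateral matrix
        self.k, self.eta, self.C = k, eta, n_classes

    def forward(self, x):
        s = self.W_f(x.flatten(1))                            # s <- W_f x
        W_l = self.W_l - torch.diag(torch.diag(self.W_l))     # enforce W_ii = 0
        for _ in range(self.k):
            a = torch.relu(s)                                 # a <- phi(s)
            r = a - a @ W_l.T                                 # residual a - W_l a (psi linear)
            grad = 2 * (r - r @ W_l)                          # closed-form grad of ||r||^2 w.r.t. a
            s = a - self.eta * grad
        return s[:, : self.C]                                 # logits are the first C units

During training, the cross-entropy loss on these logits is backpropagated through all k inference steps, as described on the slide.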
• 74. Experimental Setup – Models
  • First, we consider the case in which 𝜓(𝑊^l 𝑎) = 𝑊^l 𝑎, so 𝑓_{𝑊^l} is linear.
  • We consider FC models with 10 and 64 units.
    • The 10-unit model has no hidden units – the final activations of the 10 units are the logits.
    • The 64-unit model has 54 hidden units and so has more flexibility and greater representational power.
  • We experiment with 8, 16 and 32 iterations of activation consistency optimization.
• 75. Experimental Setup – Baselines
  • As baselines we consider two 2-layer MLPs, the first with 10 units in both layers and the second with 64.
  • These models have comparable numbers of parameters to the FC models:
    • The FC model has 𝐷×𝑁 + 𝑁² + 𝑁 parameters, while the MLPs have (𝐷×𝑁 + 𝑁) + (𝑁² + 𝑁) + (𝑁×𝐶 + 𝐶).
  • Any performance difference will therefore be due to the additional computation being performed, not to additional parameters/memorization capability.
  • 76. Experimental Setup – Training and Evaluation • The models are trained on 10K images from MNIST or FMNIST and evaluated on 1K images. • The models are also evaluated on adversarially perturbed data. • The perturbations are computed using the Projected Gradient Descent attack • The attack is repeated several times with different upper bounds on the ℓ∞ norms of the perturbations. • The upper bounds we consider are 0.01, 0.02, 0.03, 0.05, 0.1
• 77. Results
  • The FC models exhibit significantly greater adversarial robustness.
  • The FC models suffer a slight degradation in clean accuracy.
  • Increasing the number of units improves the performance of the MLP more significantly than that of the FC models.
  • Increasing the number of iterations improves performance at higher levels of perturbation.
  • [Plots: accuracy vs. perturbation size on MNIST and FMNIST]
• 78. Learned Lateral Matrices
  • The lateral matrices exhibit a high level of symmetry.
  • This is expected because 𝑓_𝑊 is linear.
  • [Figure: learned lateral matrices for the 10-unit / 32-step models on MNIST and FMNIST]
• 79. Evolution of Activations and Accuracy – FMNIST
  • The linear separability and accuracy of some classes increase rapidly across the iterations.
  • [Figure: activations and accuracy across iterations for 𝜖 = 0.0, 0.05, 0.1, for the 10-unit and 64-unit models]
  • 80. Optimizing Non-Linear Consistency • Let 𝜓 = 𝑅𝑒𝐿𝑈 • Hypothesis: Introducing non-linearity will allow for non-symmetric relationships between neurons • The experimental setup remains the same
• 81. Results
  • Some conflicting results:
    • Significant improvements in the 10-unit model, but less so in the 64-unit model.
    • Robustness decreases in the 64-unit MNIST model but increases in the 64-unit FMNIST model.
  • [Plots: accuracy vs. perturbation size on MNIST and FMNIST]
• 82. Evolution of Activations and Accuracy – 64-Unit Models
  • There seems to be little change in the activations for the MNIST model with ReLU consistency optimization.
  • [Figure: activations for the linear and ReLU variants on MNIST and FMNIST]
• 83. Learned Lateral Matrices
  • The lateral matrices are less symmetric now.
  • [Figure: learned lateral matrices for the 10-unit / 32-step models on MNIST and FMNIST]
• 84. Introducing Activity-Input Consistency Optimization
  • Inference algorithm:
    1. 𝑠 ← 𝑊^f 𝑥
    2. 𝑓𝑜𝑟 𝑖: 1 → 𝑘:
    3.   𝑎 ← 𝜙(𝑠)
    4.   𝑠 ← 𝑎 − 𝜂_𝑎 ∇_𝑎 [ ‖𝑎 − 𝜓(𝑊^l 𝑎)‖₂² + ‖𝒙 − 𝝍(𝑾^𝒃 𝒂)‖₂² ]    (with 𝑊^l_{ii} = 0, ∀𝑖)
    5. 𝑦 ← 𝑠_{1:C}
  • Everything else remains the same.
• 85. Motivating Activity-Input Consistency Optimization
  • Recall the predictive coding hypothesis: the brain tries to make the internal representations at each level "consistent" with the representations at the previous level and the next level.
  • It is possible (though perhaps unlikely) that activity consistency optimization modifies the activations to the point where they carry no information about the input.
    • When 𝑓_𝑊 is linear, there is a trivial solution 𝑎 = 𝟎.
  • Adding an additional objective of reconstructing the input discourages the optimization from discarding image-related information.
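For the linear case, the modified inner update can be written as a drop-in replacement for the loop body of the earlier FC sketch; `W_b` is the additional matrix mapping activations back to the (flattened) input, and all names are assumptions.

import torch

def consistency_step(a, x_flat, W_l, W_b, eta=0.1):
    """One inner update combining lateral consistency and input reconstruction (linear psi)."""
    r_lat = a - a @ W_l.T                     # a - W_l a
    r_rec = x_flat - a @ W_b.T                # x - W_b a
    grad = 2 * (r_lat - r_lat @ W_l) - 2 * r_rec @ W_b   # gradient of both squared-error terms w.r.t. a
    return a - eta * grad                     # new s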
• 86. Results
  • Optimizing input-activity consistency improves robustness for FMNIST but not MNIST.
  • Perhaps augmentations like non-linear consistency and input-activity consistency are only needed for more complex datasets.
  • [Plots: accuracy vs. perturbation size on MNIST and FMNIST]
• 90. Convolutionalizing The Model: Method 1
  • Method 1: We simply scan the input using the model we used for MNIST and FMNIST.
  • This method effectively optimizes the consistency between the channels but not between different spatial coordinates.
  • Advantages:
    • Simple to implement.
    • Memory efficient – the weight matrices depend only on the size of the kernel and the number of channels, not the image size.
  • Disadvantages:
    • Does not optimize consistency between spatial coordinates.
• 93. Convolutionalizing The Model: Method 2
  • Method 2: Optimizing consistency across both channels and spatial coordinates is too memory intensive, so optimize consistency between spatial coordinates in each channel, but not across channels.
  • Advantages:
    • Optimizes spatial consistency, which is known to be important in images.
  • Disadvantages:
    • Still wasteful in resources, since images have local but not global spatial consistency – better to window the consistency optimization.
    • Channel consistency is not optimized.
• 94. Convolutionalizing The Model: Method 3
  • Method 3: Apply the consistency optimization within local windows of the feature map, so that consistency is optimized across channels and nearby spatial coordinates.
  • Advantages:
    • Optimizes both channel and local-spatial consistency.
  • Disadvantages:
    • The windowing operation is slow.
  • 97. Experimental Setup – Models • 𝜓 = 𝑅𝑒𝐿𝑈 • 64 units • 1 layer • 32 activity consistency optimization iterations • 5x5 kernel and 3x3 stride • ReLU activations
  • 98. Experimental Setup – Baseline • 1-layer CNN • 64 channels • 5x5 kernel with 3x3 stride
  • 99. Experimental Setup – Training and Evaluation • The models are trained on 10K images from CIFAR10 and evaluated on 1K images. • The models are also evaluated on adversarially perturbed data. • The perturbations are computed using the Projected Gradient Descent attack • The attack is repeated several times with different upper bounds on the ℓ∞ norms of the perturbations. • The upper bounds we consider are 0.008, 0.016, 0.024, 0.032, 0.048, 0.064
  • 100. Comparing All Methods • Our method improves robustness significantly • Robustness comes at the cost of clean accuracy
• 101. Summary and Conclusions
  • We integrated biologically inspired mechanisms into deep learning models.
  • These mechanisms constrain the activation patterns, i.e., arbitrary patterns become less likely.
    • In a way, the model is internally denoising the perturbations and moving them towards activations observed during training – a rudimentary memory mechanism.
  • Experimental results show that this makes the models more robust.
    • Hypothesis: the adversary has the additional task of ensuring that the perturbations are "familiar" to the model.
  • The experimental results generalize across datasets and model architectures.
  • 102. Future Directions • Refine the convolutional architectures • We might need to increase depth to improve accuracy • Depth is required for shift invariance • Further investigate the similarities between our consistency optimization approach and Boltzmann machines / Hopfield nets • There is a hypothesis that signaling between cortical areas in the brain takes place in a very small “communication subspace” [Kohn+20] • Neural activations in the source area that lie outside this subspace cause little or no activity in the target area • Perhaps quantizing the activity of the model may be a way of implementing this subspace.
• 103. References
[Goodfellow+14] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
[Madry+18] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. "Towards deep learning models resistant to adversarial attacks." 2017.
[Kannan+18] Kannan, Harini, Alexey Kurakin, and Ian Goodfellow. "Adversarial logit pairing." arXiv preprint arXiv:1803.06373 (2018).
[Zhang+19] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, and Michael I. Jordan. "Theoretically principled trade-off between robustness and accuracy." In ICML, PMLR 97, pages 7472–7482, 2019.
[Wong+20] Wong, Eric, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training." arXiv preprint arXiv:2001.03994 (2020).
[Rebuffi+21] Rebuffi, Sylvestre-Alvise, et al. "Fixing data augmentation to improve adversarial robustness." arXiv preprint arXiv:2103.01946 (2021).
[Cohen+19] Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter. "Certified adversarial robustness via randomized smoothing." CoRR, 2019.
[Ilyas+18] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. "Adversarial examples are not bugs, they are features." 2019.
[Wang+20] Haohan Wang, Xindi Wu, Pengcheng Yin, and Eric P. Xing. "High frequency component helps explain the generalization of convolutional neural networks." CoRR, 2019.
[Firestone+20] Firestone, Chaz. "Performance vs. competence in human–machine comparisons." Proceedings of the National Academy of Sciences 117.43 (2020): 26562-26571.
[Shah+21a] Shah, Muhammad A., Raphael Olivier, and Bhiksha Raj. "Towards Adversarial Robustness Via Compact Feature Representations." ICASSP 2021. IEEE, 2021.
[Nakkiran19] Preetum Nakkiran. "Adversarial robustness may be at odds with simplicity." 2019.
[Belghazi+2018] Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, and R Devon Hjelm. "MINE: mutual information neural estimation." arXiv:1801.04062, 2018.
[Shah+21b] Muhammad Shah, Raphael Olivier, and Bhiksha Raj. "Exploiting non-linear redundancy for neural model compression." In ICPR, 2021.
[Parr+18] Parr, Thomas, Geraint Rees, and Karl J. Friston. "Computational neuropsychology and Bayesian inference." Frontiers in Human Neuroscience (2018): 61.
[Rao+99] Rao, Rajesh P. N., and Dana H. Ballard. "Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects." Nature Neuroscience 2.1 (1999): 79-87.
[Paiton+20] Paiton, Dylan M., et al. "Selectivity and robustness of sparse coding networks." Journal of Vision 20.12 (2020): 10.
[Hennig+21] Hennig, Jay A., et al. "How learning unfolds in the brain: toward an optimization view." Neuron 109.23 (2021): 3720-3735.
[Kohn+20] Kohn, Adam, et al. "Principles of corticocortical communication: proposed schemes and design considerations." Trends in Neurosciences 43.9 (2020): 725-737.

Editor's Notes

  1. I’ll start by providing some background on adversarial attacks against ML models Consider the image on the left, that is clearly showing a cute panda. If we pass this image to a deep learning model trained on imagenet, it also correctly identifies the panda. Now consider the image on the right. This image is obtained by superimposing the patch of seemingly random noise, delta, shown in the middle onto the original image. The noise is scaled to a very small magnitude, epsilon, such that the modified image looks identical to the original. However, our imagenet model, which correctly classified the original image, classifies the modified image as a gibbon with 99% confidence.
  2. I’ll begin by describing the problem of adversarial vulnerability in deep neural networks. We have two classifiers, the human and the ML classifier, and an adversary. Under the threat model that we are considering, the goal of the adversary is to modify an input sample in such a way that the ML classifier responds to it differently than a human.
  3. The objective of the ML classifier is to replicate the human’s decision function for the given task. Consider the case of visual perception. When the human sees this image, they will recognize it as a cat. When making this decision the human brings a vast amount of experiential and, even, scientific knowledge to bear. In addition to this specific image, the human will also consider several perceptual variants of the same image as cats, as long as they exhibit the features that the human knows to be characteristic of cats.
  4. The ML classifier, on the other hand, is provided only very sparse information by the human. For example, the human will tell the classifier that these three images contain cats, without telling it what it actually means to be a cat, and that this image is that of a dog.
  5. Now suppose that these images are arranged as shown in the human’s perceptual space. The image of the dog represents the entire perceptual region that contains dogs. The classifier’s job is to discriminate between cats and dogs. With the information the classifier is provided it can learn a boundary, such as this one, that intersects the perceptual region containing cats. In fact, it can learn many such boundaries, all of which would be equally optimal given the information provided to the model.
  6. In fact, it can learn many such boundaries, all of which would be equally optimal given the information provided to the model.
  7. In fact, it can learn many such boundaries, all of which would be equally optimal given the information provided to the model.
  8. In fact, it can learn many such boundaries, all of which would be equally optimal given the information provided to the model. However, these boundaries make the model vulnerable to adversarial attacks.
  9. To craft its attack, the adversary searches for points within the human’s perceptual boundary for which the ML classifier responds differently than the human. Since in most cases the human’s perceptual boundary is unknown (even to us humans), the adversary approximates it using a metric ball of a small radius around the data point. In this example, this region here contains points that the human would consider as cats, but the classifier would classify as dog
  10. If the classifier was somehow able to accurately model the perceptual boundary of the human, it would be robust to adversarial attacks, at least under the threat model we are considering.
  11. In this case, the adversary cannot find a point within the human’s perceptual region for a cat that the classifier will classify as a dog, and vice versa. Any point that the adversary picks such that it changes the decision of the classifier will also change the decision of the human.
  12. One of the simplest techniques is the fast gradient sign method (FGSM). FGSM computes the perturbation by computing the gradient of the loss function w.r.t. the input, x, and scaling its sign by epsilon.
  13. A more powerful technique can be obtained by using projected gradient descent to find a perturbation that maximizes loss while remaining imperceptibly small. Like FGSM this technique computes the perturbation using the gradient of the loss function w.r.t to x. [START CLICKING FOR ANIMATION]
  14. However, unlike, FGSM, this technique iteratively optimizes the perturbation and, to satisfy the imperceptibility constraint, the perturbation is projected onto a norm ball of radius epsilon.
  15. However, unlike, FGSM, this technique iteratively optimizes the perturbation and, to satisfy the imperceptibility constraint, the perturbation is projected onto a norm ball of radius epsilon.
  16. However, unlike, FGSM, this technique iteratively optimizes the perturbation and, to satisfy the imperceptibility constraint, the perturbation is projected onto a norm ball of radius epsilon.
  17. A more powerful technique can be obtained by using projected gradient descent to find a perturbation that maximizes loss while remaining imperceptibly small. Like FGSM this technique computes the perturbation using the gradient of the loss function w.r.t to x. However, unlike, FGSM, this technique iteratively optimizes the perturbation and, to satisfy the imperceptibility constraint, the perturbation is projected onto a norm ball of radius epsilon.
  18. CLICK – algorithm CLICK - question Pose question why new adversarial examples need to be created in every iteration? CLICK - answer
  19. Logit pairing – try to keep logits close TRADES – adversarial examples crafted to maximally change the logits
  20. CLICK – smoothed classifier CLICK – smoothing to robustness CLICK – Radius calculations Explain the radius calculations on the board.
  21. Draw figures for the cases of predicting correct class, abstaining and predicting the wrong class on the board.
  22. In reality the classifier may only have access to only a part of the complete table, for example just 6 of the 8 rows.
  23. Since the training data is incomplete the classifier must infer a value of Y for each of the missing input combinations. It can do this in 4 ways, as shown here, however only one of the learned outputs corresponds to our target function. In all the other patterns, the value of X3 can be manipulated to change the predicted value of Y. For example, if the algorithm, learns the first pattern then changing the value of X3 from 0 to 1 can result in the classifier incorrectly predicting Y=0 for when X1 is not equal to X2.
  24. From this example, we see that the number of missing patterns can be exponential in the number of spurious bits. For a dataset of size D, over K binary features, there are 2^K – D missing patterns The number of possible extensions to the table is exponential in the number of missing patterns. And therefore, the number of ways of adversarially modifying the inputs is super exponential in the number of spurious bits.
  25. Ask students to guess
  26. Now we provide the algorithm an additional spurious input x3 which doesn’t affect the output. If we provide the algorithm the full table shown here, it can determine that the value of Y is the same for X3=0 and X3=1 if X1 and X2 are fixed, and therefore X3 is a spurious input. However, it is extremely uncommon for the training data to completely specify the target function.
  27. We can apply the same reasoning to neural networks. In a neural network each neuron encodes a feature. To demonstrate this we can use the simple network on the slide that models the decision boundaries shown in the image on the right. The eight neurons in the first layer are feature detectors for the eight boundaries. Each neuron indicates on which side of the respective boundary the input point lies. Similarly the two neurons in the second layer are feature detectors for each square. The left neuron fires if the input point is in the white square while the right neuron fires if the point is in the small black square. Likewise, the output neuron in the final layer is a NAND gate that operates on the features computed by the second layer. [CLICK] Given the equivalence between features and neurons, if a layer has too few neurons, the input to the downstream subnetwork would be under-specified causing the model to perform poorly. If a layer has too many neurons the input to the downstream network is overspecified making it vulnerable to adversarial attacks. [CLICK] In this paper we propose a method to identify superfluous features and remove/reduce their influence on the model’s output.
  28. We can identify superfluous features as follows. Consider a feature vector, f, containing features f_1 to f_n, and let f_-i be the vector with f_i removed. We can decompose each feature f_i into two components, phi(f_-i) and delta_i. Phi(f_-i) is the component of f_i that can be predicted from the values of the other features in f, while delta_i is the novel information encoded by f_i. We can now quantify the usefulness of a feature by computing the mutual information between delta_i and the true label, Y. Features that do not provide a lot of new information about the true label are of limited use for the task at hand, and including them in the model may make it vulnerable to adversarial attacks.
  29. We implement the decomposition as follows: While other settings are possible, in this paper we have assumed phi to be a linear function, so phi(f-i) is computed as the dot product between f-i and a coefficient vector a^i. We can easily find this coefficient vector by using ordinary least squares regression. Now we can compute delta_i by subtracting phi(f-i) from f_i.
  30. After decomposing all the features and obtaining the delta_i’s, the next step is to compute the mutual information between delta_i and Y. Unfortunately computing mutual information is a non-trivial task if the true distribution is not known, and so we will have to approximate it. We have employed two types of approximations: The first is a first-order approximation of the influence of f_i on the correctness of the model’s output. This is computed as the product of delta_i with the gradient of the loss function w.r.t. the feature f_i. The second approximation uses Mutual Information Neural Estimation. After we have quantified the usefulness of all the features, we remove the least useful features from the network using LRE-AMC, a structural pruning technique.
  31. Given the equivalence between neurons and features, removing a feature from the network means removing a neuron. We can consider each f_i to be the output of a neuron (or convolutional filter) in a layer of the network. Since the output of each layer becomes the input features for the next layer, to remove a neuron from the network, we can simply drop the corresponding row or column from the weight matrices of the adjacent layers.
  32. To demonstrate this approach at a high level, consider this simple network with three hidden neurons and two outputs.
  33. Now suppose that the output of the first neuron is just the sum of the outputs of the other two neurons
  34. In this case we can remove, the first neuron and add its outgoing weight to the outgoing weights of the remaining neurons. Specifically, we will add w_11 to w_12 and w_13 and add w_21 to w_22 and w_23 Adjusting the weights is rather straightforward if we know which neurons are linearly dependent on other neurons and what their linear combination weights are. In practice, we will not have this information and so we will need to develop a technique to extract it.
  35. We evaluate our approach on three image recognition models, namely vgg-16 and alexNet trained on CIFAR-10 and LeNet trained on MNIST We attack these models using the PGD approach presented earlier, with perturbations of various l_inf and l_2 norms. We compare the performance of our approach against two highly successful adversarial defensive techniques. We also run experiments with vanilla LRE-AMC to determine if simply pruning redundant neurons improves robustness.
  36. The results of our experiments are presented in the table. We can see that PGD reduces the accuracy of all the models to almost 0%, even with perturbations of small magnitude. If we remove superfluous neurons using our approach, we see that in all the cases the accuracy of the models improves significantly. At the same time we see that these models have up to 98% fewer parameters. Even for the largest perturbations, the models show between 10 and 15% accuracy, whereas the original models had 0% accuracy. We applied adversarial training and Gaussian smoothing to the VGG-16 network. We see that while these techniques yield higher accuracy on smaller perturbations, our approach performs much better on larger perturbations. Most notably, the accuracy of our approach is 4.5 times higher than the accuracy of the adversarially trained model at perturbations with the largest l_2 norm. Furthermore, it is important to note that we did not have to train our models on perturbed data, which also leads to higher accuracy of our models on clean data.
  37. Check transferability across different num iterations
  38. Check transferability across different num iterations
  39. Check transferability across different num iterations