Biologically Inspired Methods for Adversarially Robust Deep Learning
Muhammad Ahmed Shah
Robustness of DNNs to Input Perturbations
• DNNs are sensitive to perturbations that humans are invariant to
• This causes them to fail in counter-intuitive ways and raises questions
about their reliability
(Figure: an example image with predictions "goldfinch" and "crow".)
2
Adversarial Attacks
• Pernicious because they are imperceptible
• Can be realized in the physical world
• Need to defend against them
Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
(Figure: the perturbation 𝛿 and the adversarial example 𝑥 + 𝜖𝛿 with budget ϵ.)
3
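Since the slide cites Goodfellow et al.'s 𝑥 + 𝜖𝛿 construction, the following is a minimal, hedged sketch of that kind of gradient-based attack (an FGSM-style step). It assumes a differentiable PyTorch classifier `model`, inputs scaled to [0, 1], and an illustrative 8/255 budget; none of these specifics come from the slides.

```python
# Minimal FGSM-style sketch of the x + εδ construction (Goodfellow et al., 2014).
# `model`, the [0, 1] input range, and eps = 8/255 are illustrative assumptions.
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8 / 255):
    """Return x + ε·sign(∇_x loss), clipped back to the valid pixel range."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    delta = torch.autograd.grad(loss, x)[0].sign()      # δ: direction that most increases the loss
    return (x + eps * delta).clamp(0.0, 1.0).detach()   # x + εδ, kept a valid image
```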
Various Types of Adversarial Attacks
• ℓ𝑝-bounded perturbations
• Unbounded perturbations
• Parametric perturbations
• Perceptual-distance-bounded perturbations
• Wasserstein-distance-bounded perturbations
• Adversarial patches
• Physically realisable perturbations
Ideally DNNs should be robust to all such perturbations.
But, often, they are not.
4
Defenses Against Adversarial Attacks
• Attack Neutralizers: Image Transforms, Attack Detectors. Do not make the DNN robust; only add computational load.
• Attack Aware Defenses: Adversarial Training, Certified Defenses, Neural Architecture Search, Pruning. Do not generalize.
• Attack Agnostic Defenses: Generalizable.
5
Attack Agnostic Defenses
• Structural Priors: design elements conducive to robustness.
• Biological Priors: biological principles related to robustness.
6
Differences Between Human and DNN Decision Functions Enable Adversarial Attacks
8
(Figure: cartoon of human vs. DNN decision boundaries for images of a cat, showing how a small perturbation crosses the DNN's boundary but not the human's.)
Hypothesis: Aligning DNNs with Human Perception Will Make Them More Robust
9
(Figure: adversarial accuracy vs. similarity to V1 responses, from Dapello+20.)
Overview
10
1. Foveation via Adaptive Blurring and Desaturation
2. Biologically Inspired Audio Features
3. Fixed Interneuron Covariability
4. Recurrence
Biological Priors
11
1. Foveation via Adaptive Blurring and Desaturation
2. Biologically Inspired Audio Features
3. Fixed Interneuron Covariability
4. Recurrence
R-Blur: Foveation via Adaptive Blurring and
Desaturation
12
Shah, M.A., Kashaf, A. and Raj, B., 2023. Training on Foveated Images Improves Robustness to Adversarial Attacks. NeurIPS (2023)
R-Blur: Overview
13
1. Select fixation point
2. Add Gaussian Noise: 𝑥 + 𝛿, 𝛿 ~ 𝒩(0, 𝜎)
3. Split into color and grey channels and apply adaptive blurring
4. Combine the color and grey channels with weights 𝜶𝑐 and 𝜶𝑟
Shah, M.A., Kashaf, A. and Raj, B., 2023. Training on Foveated Images Improves Robustness to Adversarial Attacks. NeurIPS (2023)
Computing Eccentricity
• Eccentricity ≡ Distance from
fixation point
• Generally measured radially, i.e. Euclidean distance
• Need to extract circular regions to blur – inefficient
• We use a different distance metric:
𝑒(𝑝𝑥, 𝑝𝑦) = max(|𝑝𝑥 − 𝑓𝑥|, |𝑝𝑦 − 𝑓𝑦|) / 𝑊
• Regions with same eccentricity are
squares – can be extracted by slicing
the image tensor
14
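A minimal sketch of this ℓ∞ ("square") eccentricity map, assuming a W×H image and a fixation point (f_x, f_y) in pixel coordinates; the function name and API are illustrative, not from the paper.

```python
# Sketch of the eccentricity map e(p_x, p_y) = max(|p_x - f_x|, |p_y - f_y|) / W.
import numpy as np

def eccentricity_map(height, width, fx, fy):
    """Per-pixel eccentricity under the max (ℓ∞) distance, normalized by the image width."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.maximum(np.abs(xs - fx), np.abs(ys - fy)) / width

# Pixels with equal eccentricity form nested squares around the fixation point,
# so each band can be extracted with plain tensor slicing.
```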
Estimating Visual Acuity
• Visual acuity := the ability to perceive
spatial details in the image
• The acuity of color vision decreases
exponentially with eccentricity.
• The acuity of grey vision is generally much lower, and is minimal at the fixation point.
• We approximate color and grey acuity as:
𝒟𝐶(𝑒; 𝜎𝐶) = max{Λ(𝑒; 0, 𝜎𝐶), 𝜍(𝑒; 0, 2.5𝜎𝐶)}
𝒟𝑅(𝑒; 𝜎𝑅, 𝑚) = 𝑚(1 − 𝒟𝐶(𝑒; 𝜎𝑅))
• Λ and 𝜍 are the PDFs of the Laplace and Cauchy distributions
• We set 𝜎𝐶 = 0.12, 𝜎𝑅 = 0.09 and 𝑚 = 0.12
15
(Figure: estimated acuity as a function of eccentricity.)
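A small sketch of the acuity estimates above, using SciPy's Laplace and Cauchy densities with the constants from the slide; only the shape of the curves is meant to matter, and the function names are illustrative.

```python
# D_C(e) = max{Λ(e; 0, σ_C), ς(e; 0, 2.5σ_C)} and D_R(e) = m(1 − D_C(e; σ_R)),
# with the slide's constants σ_C = 0.12, σ_R = 0.09, m = 0.12.
import numpy as np
from scipy.stats import laplace, cauchy

def color_acuity(e, sigma_c=0.12):
    return np.maximum(laplace.pdf(e, loc=0.0, scale=sigma_c),
                      cauchy.pdf(e, loc=0.0, scale=2.5 * sigma_c))

def grey_acuity(e, sigma_r=0.09, m=0.12):
    return m * (1.0 - color_acuity(e, sigma_c=sigma_r))
```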
Quantizing Visual Acuity
• The visual acuity at a pixel
determines the std. dev. of
Gaussian blur applied to it.
• #Kernels = # Unique acuity values
= # unique eccentricity values =
max{𝑊, 𝐻}
• To improve efficiency, we
quantize the estimated visual
acuity values
16
Applying Blur
• We compute the std. dev. of the Gaussian kernel at each pixel as 𝛽𝐷(𝑒(𝑝𝑥, 𝑝𝑦))
• 𝐷(𝑒(𝑝𝑥, 𝑝𝑦)) is the estimated acuity, and 𝛽 = 0.05
17
Desaturation via Combination
• The blurred grey and color images are combined in a pixelwise weighted combination
• The pixel weights are the normalized color and grey visual acuity values
18
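The sketch below strings the preceding slides together: quantize the acuity, blur with σ = β·D(e) (one kernel per quantized level, as stated on the slides), and combine the color and grey copies with normalized-acuity weights. It reuses the illustrative helpers sketched earlier (eccentricity_map, color_acuity, grey_acuity); the Gaussian-noise step, the exact quantization scheme, and the greyscale conversion are simplifying assumptions, not the paper's reference implementation.

```python
# Hedged sketch of R-Blur steps 3-4: quantized adaptive blur + acuity-weighted desaturation.
import numpy as np
from scipy.ndimage import gaussian_filter

def quantize(values, n_levels=16):
    """Snap a map of values to a few levels so only a few blur kernels are needed."""
    levels = np.linspace(values.min(), values.max(), n_levels)
    return levels[np.argmin(np.abs(values[..., None] - levels), axis=-1)]

def adaptive_blur(img, acuity, beta=0.05):
    """Blur each pixel with σ = β·D(e); one gaussian_filter call per unique (quantized) σ."""
    sigma_map = quantize(np.clip(beta * acuity, 0.0, None))   # clip keeps σ non-negative
    out = np.zeros_like(img, dtype=float)
    for sigma in np.unique(sigma_map):
        blurred = gaussian_filter(img, sigma=(sigma, sigma, 0)) if sigma > 0 else img
        out[sigma_map == sigma] = blurred[sigma_map == sigma]
    return out

def r_blur(img, fx, fy):
    """img: H×W×3 array; Gaussian-noise step omitted for brevity."""
    h, w, _ = img.shape
    e = eccentricity_map(h, w, fx, fy)
    d_c, d_r = color_acuity(e), grey_acuity(e)
    grey = img.mean(axis=-1, keepdims=True).repeat(3, axis=-1)   # crude greyscale copy
    color_b, grey_b = adaptive_blur(img, d_c), adaptive_blur(grey, d_r)
    w_c = d_c / (d_c + d_r + 1e-8)                               # normalized acuity weights
    return w_c[..., None] * color_b + (1 - w_c)[..., None] * grey_b
```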
Fixation Point Selection
19
(Figure: fixation point selection pipeline. DeepGaze takes the original image and a predefined initial fixation and proposes fixation points; a ResNet processes the image at each fixation and the outputs are averaged.)
Shah, M.A., Kashaf, A. and Raj, B., 2023. Training on Foveated Images Improves Robustness to Adversarial Attacks. NeurIPS (2023)
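A hedged sketch of the pipeline in the figure: a saliency model proposes fixation points, the classifier sees one R-Blurred view per fixation, and the logits are averaged. `saliency_model`, `classifier`, `top_k_fixations`, and `r_blur` are assumed stand-ins (tensor/array conversions omitted), not the paper's actual interfaces.

```python
# Sketch of multi-fixation classification with averaged outputs.
import torch

def classify_with_fixations(img, saliency_model, classifier, k=5):
    salience = saliency_model(img)               # per-pixel fixation salience (assumed interface)
    fixations = top_k_fixations(salience, k=k)   # hypothetical helper returning k (f_x, f_y) points
    logits = [classifier(r_blur(img, fx, fy)) for fx, fy in fixations]
    return torch.stack(logits).mean(dim=0)       # average the per-fixation predictions
```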
R-Blur Improves Adversarial Robustness of
ResNet
20
Shah, M.A., Kashaf, A. and Raj, B., 2023. Training on Foveated Images Improves Robustness to Adversarial Attacks. NeurIPS (2023)
(Figures: robustness to adversarial attacks on CIFAR-10, Ecoset, and Imagenet for Std. ResNet, 5 Rand. Affine Tfms, and R-Blur.)
21
R-Blur is Robust to Common Corruptions
Accuracy (%) by corruption severity (1–5):
Std. ResNet: 53, 41, 31, 21, 13
AT: 49, 40, 32, 24, 18
R-Blur: 52, 45, 40, 31, 23
Shah, M.A., Kashaf, A. and Raj, B., 2023. Training on Foveated Images Improves Robustness to Adversarial Attacks. NeurIPS (2023)
R-Blur Compares Favorably to Biological
Defenses
22
Shah, M.A., Kashaf, A. and Raj, B., 2023. Training on Foveated Images Improves Robustness to Adversarial Attacks. NeurIPS (2023)
Difference in accuracy (R-Blur - Other):
VOneNet (Dapello+20): Clean -8.2, Adv. +5.3, Com. Cor. +3.2
R-Warp (Vuyyuru+20): Clean -7.2, Adv. +15, Com. Cor. +6.5
Role of Number of Fixations
23
Shah, M.A., Kashaf, A. and Raj, B., 2023. Training on Foveated Images Improves Robustness to Adversarial Attacks. NeurIPS (2023)
Role of Fixation Location
24
Accuracy (%):
Std. ResNet: Clean 70, Adv. 0
R-Blur w. DeepGaze: Clean 61, Adv. 18
R-Blur w. Opt. Fix.: Clean 70, Adv. 20
Shah, M.A., Kashaf, A. and Raj, B., 2023. Training on Foveated Images Improves Robustness to Adversarial Attacks. NeurIPS (2023)
Key Takeaways
• R-Blur significantly improves the robustness of DNNs without being
trained on perturbed data.
• The robustness of R-Blur generalizes better than adversarial training
to different perturbation types.
• R-Blur shows the promise of biologically motivated approaches to
improving the robustness of DNNs.
25
Shah, M.A., Kashaf, A. and Raj, B., 2023. Training on Foveated Images Improves Robustness to Adversarial Attacks. NeurIPS (2023)
Biological Priors
26
1. Foveation via Adaptive Blurring and Desaturation
2. Biologically Inspired Audio Features
3. Fixed Interneuron Covariability
4. Recurrence
Audio Features For Automatic Speech
Recognition
• Early ASR approaches used hand-crafted features -- often inspired by biology, particularly the cochlea
• The simple and popular approach:
• Time-frequency analysis via FFT (sim. basilar membrane)
• Bank of band-pass Mel filters (sim. characteristic frequencies in cochlea)
• Non-linearity (sim. hair cell response)
27
(Pipeline: Waveform → STFT → Spectrogram → Mel filterbank → Log → Log Mel-Spectrogram)
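A standard log-Mel front end matching the pipeline above (STFT → Mel filterbank → log), written with torchaudio; the 16 kHz sample rate, 400-sample FFT, and 80 Mel filters are common illustrative defaults rather than the settings used in this work.

```python
# Log Mel-spectrogram front end (STFT + Mel filterbank + log non-linearity).
import torch
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=400, hop_length=160, n_mels=80)

def log_mel_spectrogram(waveform: torch.Tensor) -> torch.Tensor:
    """waveform: (channels, samples) -> log-Mel features (channels, n_mels, frames)."""
    return torch.log(mel(waveform) + 1e-6)   # small offset avoids log(0)
```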
Bio-plausible Audio Features for ASR
• More bio-plausible feature
extraction methods exist but not
widely used
• Known to improve noise
robustness, but adversarial
robustness not evaluated
• We evaluate the performance of several existing biologically plausible audio features
• We also propose novel features
28
Features Evaluated So Far
Feature: salient characteristics
• Log Spectrogram: time-frequency representation + non-linearity
• Log Mel Spectrogram: triangular RFs with CFs on the Mel scale
• Cochleagram [Feather+23]: Gammatone RFs with CFs on the ERB scale + power-law non-linearity
• Gammatone Spectrogram: same as Cochleagram but computed by transforming the STFT
• Power Normalized Coefficients [Kim+10]: power-normalized Gammatone RFs with CFs on the ERB scale + temporal masking + noise suppression + power-law non-linearity
• Difference of Gammatones: lateral suppression by frequencies around the CF
• Frequency Masked Spectrogram: simulates simultaneous frequency masking
29
Lateral Suppression via Difference of
Gammatone Filters
• Lateral Suppression:
• the response at a given CF may be suppressed by the energy at adjacent
frequencies [Stern & Morgan 12]
• Enhances responses to spectral changes and reduces impact of noise
• Proposal: take a difference of Gammatone filterbank
31
Difference of Gammatone Filters
• Create 2 Gammatone
frequency response curves
with different widths, and
subtract.
• Normalize by sum of positive
values
𝐺𝑑[𝑘] = 𝐺1[𝑘] − 𝐺2[𝑘]
𝐺𝑑[𝑘, 𝑡] ← 𝐺𝑑[𝑘, 𝑡] / Σ𝑡 max{𝐺𝑑[𝑘, 𝑡], 0}
(Figure: amplitude vs. frequency for the two Gammatone responses and their difference.)
Difference of Gammatone Filterbank
(Figures: Power Normalized Gammatone filterbank responses and Normalized Difference of Gammatone filterbank responses; amplitude vs. frequency (FFT bin).)
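A hedged sketch of the construction on the previous slide: two gammatone-like magnitude responses per center frequency with different bandwidths, subtracted and normalized by the sum of positive values. The 4th-order magnitude approximation, the ERB constants, and the 1.5× width ratio are textbook/illustrative assumptions, not values taken from the slides.

```python
# Difference-of-Gammatone (DoG) filterbank sketch in the frequency domain.
import numpy as np

def erb(fc):
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)   # equivalent rectangular bandwidth (Glasberg & Moore)

def gammatone_response(freqs, fc, bw, order=4):
    return (1.0 + ((freqs - fc) / bw) ** 2) ** (-order / 2.0)   # crude gammatone magnitude approximation

def dog_filterbank(freqs, center_freqs, width_ratio=1.5):
    filters = []
    for fc in center_freqs:
        g1 = gammatone_response(freqs, fc, erb(fc))
        g2 = gammatone_response(freqs, fc, width_ratio * erb(fc))   # wider companion filter
        g_d = g1 - g2
        filters.append(g_d / np.maximum(g_d, 0.0).sum())            # normalize by the positive mass
    return np.stack(filters)    # shape: (n_channels, n_fft_bins)
```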
Applying DoG Filterbank
• Convolve the DoG Filterbank over the STFT: 𝑆𝑥^𝐺 = 𝑆𝑥 ∗ 𝐺
• Half-wave Rectify: 𝑆𝑥^𝐺⁺[𝑘, 𝑡] = max{𝑆𝑥^𝐺[𝑘, 𝑡], 0}
• Non-linear Compression: 𝑆𝑥[𝑘, 𝑡] = (𝑆𝑥^𝐺⁺[𝑘, 𝑡])^0.3
34
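A minimal sketch of the three steps above, assuming `spec` is a magnitude STFT of shape (n_fft_bins, n_frames) and `filters` comes from the dog_filterbank sketch; applying the filterbank is written as a matrix product, which is the matrix form of the per-frequency convolution described on the slide.

```python
# Apply the DoG filterbank, half-wave rectify, then compress with the 0.3 power law.
import numpy as np

def apply_dog(spec, filters, power=0.3):
    response = filters @ spec                 # filterbank applied across the frequency axis
    response = np.maximum(response, 0.0)      # half-wave rectification
    return response ** power                  # power-law (0.3) compression
```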
Example: Concord Returned To Its Place
Amidst The Tents
Power Normalized Gammatone Normalized Difference of Gammatone
Time (Window)
Frequency
(FFT
bin)
Frequency
(FFT
bin) Time (Window)
Effect on Robustness
36
• Model: CNN + 16 layer conformer
• 65 non-adversarial audio transforms
• Untargeted gradient-based attack
• SNR-bounded PGD @ 10,20,30,40 dB
• NWER: (1/|Δ|) Σ_{𝛿∈Δ} WER_𝑓(𝛿) / WER_MFCC(𝛿)
Simultaneous Frequency Masking
• High power at a frequency can raise the threshold of hearing for
adjacent frequencies
• Frequencies below the threshold are inaudible, i.e. masked
• Exploited for MP3 compression – more compression in masked spectro-temporal regions
• Proposal: compute the hearing threshold and zero-out the masked regions
37
Frequency Masked Spectrogram
• Estimate the masking threshold
for each (FFT) frequency [Qin+19,
Lin+15]
• Zero-out regions of the
spectrogram where Power-
Spectral Density (PSD) falls below
the threshold
38
(Figures: spectrogram before and after masking; frequency (FFT bin) vs. time (window).)
Estimating the Masking Threshold [Qin+19]
1. Smoothed Normalized PSD
𝑝𝑥(𝑘) = 20·log10(|𝑠𝑥(𝑘)| / 𝑁)
𝑝̄𝑥(𝑘) = 96 − max𝑘{𝑝𝑥(𝑘)} + 𝑝𝑥(𝑘)
𝑝̄𝑥^𝑚(𝑘) = 10·log10(10^(𝑝̄𝑥(𝑘−1)/10) + 10^(𝑝̄𝑥(𝑘)/10) + 10^(𝑝̄𝑥(𝑘+1)/10))
2. Two-sided spreading function (masker 𝑖, maskee 𝑗)
𝑆𝐹(𝑖, 𝑗) = 27·Δ𝑏𝑖𝑗 if Δ𝑏𝑖𝑗 > 0, else 𝐺(𝑖)·Δ𝑏𝑖𝑗
Δ𝑏𝑖𝑗 = 𝑏𝑎𝑟𝑘(𝑓𝑗) − 𝑏𝑎𝑟𝑘(𝑓𝑖)
𝐺(𝑖) = −27 + 0.37·max{𝑝̄𝑥(𝑖) − 40, 0}
39
(Figure: spreading of a masker's threshold onto neighboring maskee frequencies.)
Estimating the Masking Threshold (cont.)
3. Pairwise Threshold
𝑇[𝑖, 𝑗] = 𝑝̄𝑥^𝑚(𝑖) + Δ𝑚(𝑖) + 𝑆𝐹(𝑖, 𝑗)
Δ𝑚(𝑖) = −6.025 − 0.275·𝑏𝑎𝑟𝑘(𝑖)
4. Global Threshold
𝜃𝑥(𝑘) = 10·log10(10^(𝐴𝑇𝐻(𝑘)/10) + Σ𝑖 10^(𝑇[𝑖,𝑘]/10))
40
(Figure: PSD and the estimated masking threshold over time.)
Applying Masking
5. Create a mask: 𝛼𝑥(𝑘) = 𝐼[𝑝̄𝑥^𝑚(𝑘) > 𝜃𝑥(𝑘)]
6. Apply the mask to the spectrogram: 𝑠𝑥(𝑘) = 𝑠𝑥(𝑘) ⊙ 𝛼𝑥(𝑘)
7. Apply non-linearity: 𝑠𝑥(𝑘)^0.3
41
(Figures: spectrogram before and after masking, over time.)
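A small sketch of steps 5–7 above. It assumes the smoothed PSD `p_m` and the global masking threshold `theta` (both in dB, one value per (bin, frame)) have already been estimated as on the preceding slides; those inputs and the array shapes are assumptions.

```python
# Mask the spectrogram where the PSD falls below the masking threshold, then compress.
import numpy as np

def frequency_masked_spectrogram(spec, p_m, theta, power=0.3):
    alpha = (p_m > theta).astype(spec.dtype)   # keep only bins that exceed the masking threshold
    return (spec * alpha) ** power             # zero-out masked bins, then power-law compression
```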
Robustness of All Features
• Model: CNN + 16 layer conformer
• 65 non-adversarial audio transforms
• Untargeted gradient-based attack
• SNR-bounded PGD @ 10,20,30,40 dB
• NWER: (1/|Δ|) Σ_{𝛿∈Δ} WER_𝑓(𝛿) / WER_MFCC(𝛿)
• Gammatone FB generally improves
robustness
• Best against Adv: Difference of
Gammatone
• Best against non-Adv: Gammatone
Spectrogram
42
Clean WER of All Features
• Model: CNN + 16 layer conformer
• All features have low WER
• Gammatone Spectrogram lowest WER
43
Key Takeaway and Future Work
• Certain biological phenomena (lateral suppression) improve robustness to adversarial attacks
• While others (temporal masking) do not
• Gammatone FB generally improves robustness
• The Gammatone Spectrogram has the lowest WER on clean data and non-adversarial perturbations
• Future work: simulate detailed cochlear models (e.g. CARFAC [Lyon 12], Seneff)
• Creating an efficient PyTorch implementation is taking time.
44
Biological Priors
45
1. Foveation via Adaptive Blurring and Desaturation
2. Biologically Inspired Audio Features
3. Fixed Interneuron Covariability
4. Recurrence
Fixed Inter-Neuron Covariability Induces
Adversarial Robustness
46
• Inter-neuron correlations in the brain are
rigid [Hennig+21]
• Inter-neuron correlations in DNNs are
flexible
• Change based on stimulus distribution
Shah, M.A. and Raj, B., Fixed Inter-Neuron Covariability Induces Adversarial Robustness, ICASSP (2024)
SCA Layer
47
Transform the activations so they respect the learned correlations:
1. 𝒖 ← 𝑓(𝒙)
2. For 𝑡: 1 → 𝑇 do
3.   𝒂𝒙 ← 𝜙(𝒖)
4.   𝐽 ← ‖𝒂𝒙 − 𝜙(𝑾𝒈𝒂𝒙 + 𝒃𝒈)‖² + 𝜆‖𝒙 − 𝑾ℎ𝒂𝒙 − 𝒃ℎ‖²
5.   𝒖 ← 𝒖 − 𝜂∇𝒂𝒙𝐽
6. End for
7. 𝒂𝒙 ← 𝜙(𝒖)
(Figure: the correlation map 𝑾𝒈, with its diagonal fixed to 0, applied to the activations 𝒂𝒙.)
Shah, M.A. and Raj, B., Fixed Inter-Neuron Covariability Induces Adversarial Robustness, ICASSP (2024)
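A rough PyTorch sketch of the SCA iteration above. W_g (with its diagonal masked to zero) predicts each unit from the other units, W_h reconstructs the input, and the activations are refined by descending J; the step u ← u − η∇J is our reading of the update on the slide. Layer sizes, T, η, λ, the ReLU choice of φ, and how the weights are trained are illustrative assumptions, not taken from the ICASSP paper.

```python
# Hedged sketch of an SCA-style layer: iteratively adjust activations to respect W_g and W_h.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCALayer(nn.Module):
    def __init__(self, in_dim, hid_dim, T=8, eta=0.1, lam=1.0):
        super().__init__()
        self.f = nn.Linear(in_dim, hid_dim)      # feedforward map giving the initial u
        self.W_g = nn.Linear(hid_dim, hid_dim)   # inter-neuron prediction (diagonal forced to 0)
        self.W_h = nn.Linear(hid_dim, in_dim)    # reconstruction of the input
        self.T, self.eta, self.lam = T, eta, lam
        self.phi = nn.ReLU()

    def forward(self, x):
        u = self.f(x)
        mask = 1.0 - torch.eye(self.W_g.weight.shape[0], device=x.device)
        for _ in range(self.T):
            a = self.phi(u).detach().requires_grad_(True)
            with torch.enable_grad():
                pred = self.phi(F.linear(a, self.W_g.weight * mask, self.W_g.bias))
                J = ((a - pred) ** 2).sum() + self.lam * ((x - self.W_h(a)) ** 2).sum()
            grad = torch.autograd.grad(J, a)[0]
            u = u - self.eta * grad              # u <- u - η ∇_a J
        return self.phi(u)
```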
Result #1: SCA Layer Reduces Change in Inter-
Neuron Correlation
48
(Figures: change in inter-neuron correlation on FMNIST, MNIST, and Speech Commands.)
Shah, M.A. and Raj, B., Fixed Inter-Neuron Covariability Induces Adversarial Robustness, ICASSP (2024)
Results #2: SCA Layer Makes Models More
Robust
49
Shah, M.A. and Raj, B., Fixed Inter-Neuron Covariability Induces Adversarial Robustness, ICASSP (2024)
Difference in accuracy (SCA - MLP):
MNIST: Clean -0.5, Non-Adv Perturb +1.5, Adv Perturb +1.8
FMNIST: Clean +0.7, Non-Adv Perturb +0.5, Adv Perturb +4.6
Speech Commands: Clean -1.3, Non-Adv Perturb -4.2, Adv Perturb +3.3
Biological Priors
50
1. Foveation via Adaptive Blurring and Desaturation
2. Biologically Inspired Audio Features
3. Fixed Interneuron Covariability
4. Recurrence
Recurrent Connections in the Brain
• Recurrent circuits are widespread in the brain [Bullier+01, Briggs+20]
• Lateral connections between neurons in the
same region
• Feedback connections from higher cognitive
areas to lower areas
• May fill in missing information due to crowding or
occlusion [Spoerer+17, Boutin+21]
• Not represented in DNNs
51
Adding Recurrence to DNNs
52
(Figure: a convolutional network unrolled over time steps 𝑡 = 1 … 𝑇. Feedforward pathways are Conv-Pool stages followed by global pooling and a linear projection to the classification output; lateral recurrence connects each stage to itself across time steps, and feedback connections project from higher stages to lower ones via Conv+Upsample.)
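A rough PyTorch sketch of the unrolled architecture in the diagram: Conv-Pool feedforward stages, lateral recurrence within each stage, and Conv+Upsample feedback from the stage above, run for T time steps. Channel counts, T, and the two-stage depth are illustrative and much smaller than anything used in the experiments.

```python
# Hedged sketch of a recurrent CNN with lateral and feedback connections.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentCNN(nn.Module):
    def __init__(self, channels=(3, 32, 64), n_classes=10, T=3):
        super().__init__()
        c0, c1, c2 = channels
        self.T = T
        self.ff1 = nn.Sequential(nn.Conv2d(c0, c1, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.ff2 = nn.Sequential(nn.Conv2d(c1, c2, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.lat1 = nn.Conv2d(c1, c1, 3, padding=1)   # lateral recurrence, stage 1
        self.lat2 = nn.Conv2d(c2, c2, 3, padding=1)   # lateral recurrence, stage 2
        self.fb21 = nn.Conv2d(c2, c1, 3, padding=1)   # feedback: stage 2 -> stage 1
        self.head = nn.Linear(c2, n_classes)

    def forward(self, x):
        h1 = self.ff1(x)
        h2 = self.ff2(h1)
        for _ in range(self.T - 1):
            fb = F.interpolate(self.fb21(h2), size=h1.shape[-2:])   # Conv + upsample feedback
            h1 = torch.relu(self.ff1(x) + self.lat1(h1) + fb)
            h2 = torch.relu(self.ff2(h1) + self.lat2(h2))
        return self.head(h2.mean(dim=(2, 3)))                       # global pooling + linear proj.
```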
Results
53
Adding recurrence improves accuracy on clean and
adversarially perturbed data
(Figure: CIFAR-10 accuracy vs. number of time steps, on clean and adversarially perturbed data.)
Reconstructing From Feedback Signal
• Without constraints degenerate solutions are possible
• Recurrent connections may learn identity functions
• Explicitly encourage models to fill in missing information
54
Reconstructing From Feedback Signal
ℒ(𝑥̂, 𝑥, 𝑦) = ‖𝑥̂ − 𝑥‖² + Xent(𝑥, 𝑦)
55
(Figure: the input with random occlusions passes through the Conv-Pool feedforward pathway to global pooling and a linear projection for classification; a Conv+Upsample feedback pathway with a tanh output produces the reconstruction 𝑥̂, a prediction of the occluded information, which is compared to the original input 𝑥.)
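A minimal sketch of the training objective on this slide: reconstruct the randomly occluded input from the feedback pathway while classifying it. It assumes a `model` that returns both logits and a tanh reconstruction x̂; the `recon_weight` knob is an added assumption, not part of the slide's loss.

```python
# L = ||x_hat - x||^2 + Xent(prediction, y), combining reconstruction and classification.
import torch.nn.functional as F

def recon_classification_loss(model, x_occluded, x_clean, y, recon_weight=1.0):
    logits, x_hat = model(x_occluded)          # x_hat is predicted from the feedback signal
    recon = F.mse_loss(x_hat, x_clean)         # encourage filling in the occluded information
    return recon_weight * recon + F.cross_entropy(logits, y)
```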
Results
56
Adding reconstruction significantly improves accuracy on
adversarially perturbed data,
but reduces accuracy on clean data
Key Takeaways and Future Work
• Scale up experiments to larger models and datasets
• Explore synergies with other works like R-Blur and bio-plausible audio features
57
Summary
58
• Foveation via Adaptive Blurring and Desaturation: improved robustness to adv & non-adv perturbations
• Biologically Inspired Audio Features: lateral suppression improved adv robustness
• Fixed Interneuron Covariability: improved robustness to adv & non-adv perturbations
• Recurrence: improved adv robustness; FB w/ recon yields better robustness
References
P. Benz, C. Zhang, and I. S. Kweon. Batch normalization increases adversarial vulnerability and
decreases adversarial transferability: A non-robust feature perspective. In Proceedings of the
IEEE/CVF International Conference on Computer Vision, pages 7818–7827, 2021.
A. Brandmeyer, R. Lyon, and R. Weiss. Cascade of asymmetric resonators with fast-acting compression cochlear model, 2015.
S. Bubeck and M. Sellke. A universal law of robustness via isoperimetry. Advances in Neural
Information Processing Systems, 34:28811–28822, 2021.
B. Choksi, M. Mozafari, C. Biggs O’May, B. Ador, A. Alamia, and R. VanRullen. Predify: Augmenting deep neural networks with brain-inspired predictive coding dynamics. Advances in Neural Information Processing Systems, 34:14069–14083, 2021.
J. Dapello, T. Marques, M. Schrimpf, F. Geiger, D. Cox, and J. J. DiCarlo. Simulating a primary visual
cortex at the front of cnns improves robustness to image perturbations. Advances in Neural
Information Processing Systems, 33:13073–13087, 2020.
References
J. M. Gant, A. Banburski, and A. Deza. Evaluating the adversarial robustness of a foveated texture
transform module in a cnn. In SVRHM 2021 Workshop@ NeurIPS, 2021.
H. Hermansky, N. Morgan, A. Bayya, and P. Kohn. Rasta-plp speech analysis. In Proc. IEEE Int’l Conf.
Acoustics, speech and signal processing, volume 1, pages 121–124, 1991.
Y. Huang, J. Gornet, S. Dai, Z. Yu, T. Nguyen, D. Tsao, and A. Anandkumar. Neural networks with recurrent
generative feedback. Advances in Neural Information Processing Systems, 33: 535–545, 2020.
J. Kubilius, M. Schrimpf, A. Nayebi, D. Bear, D. L. Yamins, and J. J. DiCarlo. Cornet: Modeling the neural
mechanisms of core object recognition. BioRxiv, page 408385, 2018.
A. Jonnalagadda, W. Y. Wang, B. Manjunath, and M. Eckstein. Foveater: Foveated transformer for image
classification, 2022.
R. Lyon. Computational models of neural auditory processing. In ICASSP’84. IEEE International Conference
on Acoustics, Speech, and Signal Processing, volume 9, pages 41–44. IEEE, 1984.
Lin, Y. and Abdulla, W. H. Principles of psychoacoustics. In Audio Watermark, pp. 15–49. Springer, 2015.
Appendix
61
All Components of R-Blur Improve Robustness
62
R-Blur vs. Gaussian Noise vs. Gaussian Blur
63
R-Blur vs. Gaussian Blur
64
Editor's Notes
  1. DNNs are known to be sensitive to minor perturbations that humans are invariant to. This shortcoming raises questions about their reliability in real-world scenarios where unexpected errors can have serious consequences.
  2. Perhaps the most concerning of these perturbations are adversarial attacks. This is the classic example of an adversarial attack, where the image of a panda is imperceptibly perturbed and the DNN now calls it a gibbon. The imperceptibility and semantic irrelevance of adversarial perturbations make them particularly pernicious, and jarring for human observers. [CLICK] It has been shown that such adversarial perturbations can even be realized in the real world and can be used to compromise DNNs deployed in the wild. For example, researchers showed that putting unusual but benign-looking patterns on clothing can cause DNNs to make errors like not detecting a person when one is present, or misidentifying them. Given the growing pervasiveness of AI in society, it is essential to develop models that are robust to adversarial perturbations, that is, models that can make accurate predictions even in the presence of adversarial perturbations.
  3. Over the years, various types of adversarial attacks have been developed, some of which are shown here. Different types of adversarial attacks generate different types of perturbations. [CLICK] Some of them seek to generate perturbations that are imperceptible. Since imperceptibility is not well defined, different attacks approximate it differently using various distance metrics. [CLICK] Meanwhile other attacks produce perturbations that may be perceptible but that human perception is invariant to. [CLICK] Ideally, we would want DNNs to be robust to all such types of attacks and perturbations, [CLICK] however this is often not the case.
  4. Several methods have been proposed to make DNNs robust to adversarial attacks. Most of these methods fall into two high-level categories: attack neutralizers and attack aware defenses. [CLICK] Attack neutralizers are modules that neutralize or reject adversarially perturbed inputs before they reach the underlying DNNs. Examples of such methods are randomized sequences of image transforms that render the adversarial attacks inert, and attack detectors which reject inputs that are considered to be adversarially perturbed. [CLICK] The shortcoming of these methods is that they do not actually make the DNN more robust, [CLICK] but rather they just increase the computational cost of generating successful perturbations. [CLICK] Attack aware defenses refer to techniques that introduce adversarial perturbations in the model selection process. The most popular of these, and in fact among all defenses, [CLICK] is adversarial training, which trains the DNN on adversarially perturbed data. Other approaches include neural architecture search and pruning based on accuracy on adversarial inputs, as well as certified defenses, which are techniques that are accompanied by theoretical guarantees of robustness under certain conditions. [CLICK] Since attack aware defenses rely on precisely defining and/or simulating adversarial attacks, they tend to overfit to the types of attacks used during model selection, and their robustness does not generalize to other types of attacks. [CLICK] In contrast, a relatively smaller body of work has explored attack agnostic defenses. These methods make the models more robust without introducing adversarial perturbations during training or model selection. As a result, these methods tend to [CLICK] generalize better and provide robustness against diverse adversarial attacks. For this reason, in our work we will explore attack agnostic defenses.
  5. We can think of Attack agnostic defenses as imposing priors on the model design and selection process in order to induce robustness. [CLICK] These are usually Structural priors that are related to design elements of DNNs, like architecture, activation functions, etc. that yield models that are naturally more robust to adversarial attacks without being trained on adversarially perturbed data. Recently a growing body of work has also explored [CLICK] biological priors. These are mechanisms and constraints found in biological perception that contribute to its robustness. Examples of biological priors are foveated vision, greater sensitivity to low frequency audio signals than high frequency ones, and stochastic neural activity. Researchers studying biological robustness priors usually computationalize biological mechanisms and constraints and integrate them into DNNs under the hypothesis that doing so will enhance robustness. In this talk, I will focus on my work on biological priors for robustness.
  6. [CLICK] In our work we consider two types of robustness priors, namely structural priors and biological priors. Structural priors are design elements of DNNs, like architecture, activation functions, etc. that yield models that are naturally more robust to adversarial attacks without being trained on adversarially perturbed data. On the other hand, biological priors are mechanisms and constraints found in biological perception that contribute to its robustness. In this talk, I will focus on our work on biological priors for robustness With regards to biological priors, we consider two subtypes of priors, namely [CLICK] sensory and cognitive. [CLICK] In the domain of vision we have simulated foveation using adaptive blurring and desaturation, and found it to improve robustness In the domain of audio [CLICK] we propose to explore biologically plausible audio features as sensory robustness priors and evaluate their impact on the accuracy and robustness of ASR models [CLICK] Moving to cognitive priors. We have simulated inflexible inter-neuron correlations and have shown that this leads to improved robustness to adversarial and non-adversarial perturbations We have also introduced lateral and feed-back recurrent connections in DNNs, similar to those observed in the brain. We observe that models with these recurrent connections become more robust. We propose to extend this line of research to the domain of audio by designing biologically-plausible DNN modifications that improve the robustness of speech recognition model We also propose to develop methods to align speech-processing DNNs with human salience judgements. To accurately track progress and compare with prior work, we propose to develop a benchmark to measure the robustness of speech recognition DNNs to adversarial and non-adversarial perturbations. [CLICK] We will start with our work on structural robustness priors
  7. Before diving in to the work I’ll first motivate why it makes sense to turn to biology for inspiration when designing robust DNNs. The key observation is that adversarial attacks are possible because the decision function of DNNs differs from the decision function of humans. This is illustrated by the following cartoon. [CLICK] Here the blue region contains all the images that humans would classify as a cat, and this yellow region contains the images humans would classify as dogs However, [CLICK] a DNN classifier trained on images of cats and dogs sampled from the blue and yellow regions may not accurately model the human’s decision function. For example, This decision boundary shown here is optimal for the training data, however it intersects the blue region, and therefore permits adversarial attacks [CLICK] because adversary can perturb an image of a cat such that it will still be classified as a cat by humans, but it will be classified as a dog by the DNN.
  8. Based on this observation it is reasonable to expect that [CLICK] aligning DNNs with Human Perception will make Them More Robust to perturbations, like adversarial attacks, that humans are invariant to. In order to align DNNs with human perception biological priors are introduced into DNNs. [CLICK] Indeed, there is some evidence from prior work that indicates a positive relationship between biological plausibility of DNNs and their robustness to adversarial attacks. This figure from Dapello et.al plots the accuracy of the DNNs on adversarially perturbed inputs against how well the neurons in the said DNNs approximate the responses of the neurons in V1. We can see from the figure that models that better approximate biological perception are also more adversarially robust.
  9. With this motivation in place, I will now provide an overview of the body of work that I will present in this talk. As I mentioned earlier, this talk will be about our work related to identifying and integrating biological robustness priors into DNNs. In our work, we have investigated biological priors related to the various levels of the perceptual system. [CLICK] Starting with the sensory organs. [CLICK]We study the role of the retina by simulating foveation using adaptive blurring and desaturation and found it to improve the robustness of image recognition DNNs to adversarial and non-adversarial perturbations. [CLICK] We study the role of biological auditory system by evaluating the robustness of several popular audio feature extraction methods that seek to simulate aspects of biological auditory processing. We also go on to propose novel feature extraction methods that simulate aspects of biological auditory processing that are not covered by the prior work. [CLICK] We then move up the perceptual system, into the brain where we investigate the role of local and global inter-neuron dynamics in enabling robust perception. [CLICK] Starting with local inter-neuron dynamics, we study the role of inflexible inter-neuron correlations in robust perception by developing a DNN layer that explicitly constrains the inter-neuron correlations. We observe that our proposed layer improves robustness to both adversarial and non-adversarial input perturbations. [CLICK] Going a level up the, we then study the role of global inter-neuron dynamics encoded by wide-spread recurrent connections in the brain. Specifically, we simulate lateral and feedback recurrent connections in DNNs and observe that doing so improves their adversarial robustness. With this high-level overview of our work in place, I’ll start getting into details of our work in each area. [CLICK]
10. Starting with our investigation of the role of retinal foveation and desaturation in robust visual sensing.
  11. To simulate foveation we develop an image filter, called R-Blur, which transforms the image on the left to the one on the right.
12. At a high level, R-Blur performs the following computations: [CLICK] Given an image, first a fixation point is selected, possibly using a DNN. [CLICK] Then Gaussian noise is added to the image to simulate the stochasticity of biological neurons. [CLICK] Then two copies of the image are made, one colored and the other greyscale, to simulate the two types of photoreceptors in the retina. The images are adaptively blurred: in the colored image, the regions close to the fixation point are blurred less than regions further away from it, while in the greyscale image the opposite is done. The numbers here indicate the std. dev. of the Gaussian kernel used to blur the region bounded within the boxes. [CLICK] The colored and greyscale images are then combined in a pixel-wise weighted combination, such that the weights for the colored image are higher near the fixation point and lower further away from it. With this high-level overview in place, I'll go into some details about each step.
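To make the four steps concrete, here is a minimal sketch of the overall R-Blur computation. It assumes helper functions (compute_eccentricity, estimate_acuity, adaptive_blur) that are sketched on the following slides; all names, signatures, and the noise level are our own illustrative choices, not the released implementation.

```python
import numpy as np

def r_blur(img, fixation, noise_std=0.1):
    """High-level R-Blur sketch: Gaussian noise -> color/grey split ->
    adaptive blurring -> pixel-wise combination weighted by estimated acuity.
    noise_std is a placeholder value."""
    noisy = img + np.random.normal(0.0, noise_std, img.shape)            # step 2: noise
    grey = noisy.mean(axis=-1, keepdims=True).repeat(3, axis=-1)          # step 3a: grey copy
    ecc = compute_eccentricity(img.shape[:2], fixation)                   # distance from fixation
    acuity_c, acuity_r = estimate_acuity(ecc)                             # color / grey acuity
    blur_c = adaptive_blur(noisy, acuity_c)   # less blur near fixation   # step 3b: blur
    blur_r = adaptive_blur(grey, acuity_r)    # more blur near fixation
    w = (acuity_c / (acuity_c + acuity_r))[..., None]                     # step 4: combine
    return w * blur_c + (1.0 - w) * blur_r
```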
13. I'll start with how we compute eccentricity, [CLICK] which is the distance from the fixation point. [CLICK] Generally, eccentricity is computed radially, as shown in the polar plot here. [CLICK] Eccentricity defined this way creates circular regions of equal eccentricity around the fixation point, as shown in the contour plot. During foveation we need to extract regions having the same eccentricity and blur them with the same intensity, and due to how arrays are organized in computer memory, extracting circular regions can be highly inefficient. [CLICK] Therefore, in the interest of efficiency, we use the ℓ∞ norm to quantify the distance to the fixation point. Doing so results in square equidistant regions, which can be extracted very efficiently via slicing.
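As an illustration, the ℓ∞ eccentricity map can be computed with plain array broadcasting; the normalization below (by the larger image dimension, so values fall roughly in [0, 1]) is our assumption.

```python
import numpy as np

def compute_eccentricity(shape, fixation):
    """L-infinity distance of every pixel from the fixation point. Regions of
    equal eccentricity are squares, so they can be extracted by tensor slicing."""
    h, w = shape
    fy, fx = fixation
    ys = np.arange(h)[:, None]
    xs = np.arange(w)[None, :]
    return np.maximum(np.abs(ys - fy), np.abs(xs - fx)) / max(h, w)
```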
14. We use the eccentricity of each location to estimate the visual acuity at that location. Visual acuity refers to the ability to perceive spatial details in the image. [CLICK] The acuity of color vision, arising from the cone photoreceptors, is maximal at the fixation point and decays exponentially away from it. [CLICK] Meanwhile, the acuity of monochromatic vision, arising from the rod photoreceptors, is generally low and is minimal at the fixation point. [CLICK] We use the equations shown here to approximate the color and monochrome visual acuity, and the resulting values are plotted in the figure on the right, where the x-axis represents eccentricity and the y-axis represents the estimated acuity. We are not particularly interested in the actual acuity values; rather, we want to model the shape of the curve relating acuity and eccentricity.
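A hedged sketch of these acuity curves: color acuity is modeled as decaying exponentially with eccentricity, and grey acuity as low overall and minimal at fixation. The exact functional forms and constants used in the paper may differ; sigma_c and m below are placeholder parameters.

```python
import numpy as np

def estimate_acuity(ecc, sigma_c=0.12, m=0.12):
    """Approximate color (cone) and grey (rod) acuity as functions of eccentricity."""
    acuity_color = np.exp(-np.abs(ecc) / sigma_c)  # maximal at fixation, exponential decay
    acuity_grey = m * (1.0 - acuity_color)         # low overall, minimal at the fixation point
    return acuity_color, acuity_grey
```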
15. [CLICK] Each unique acuity value requires the image to be blurred with a different intensity, [CLICK] and there can be as many unique acuity values as the longest dimension of the image. For reasonably large images, we would need to apply hundreds of different Gaussian blur kernels per image, which would be prohibitively expensive. [CLICK] To improve efficiency, we quantize the estimated acuity values, bringing the number of unique acuity values, and consequently the number of blurring kernels, down to a manageable number.
16. Since we have quantized the acuity values, we now have square regions of the image with the same acuity value, as shown in the figure to the right. We extract these regions and apply a Gaussian blur to each of them, with the intensity, i.e. the std. dev., of the blurring kernel determined by the estimated acuity. The std. dev. values for each region are indicated by the numbers in the callouts.
17. After blurring, the color and grey images are combined in a pixel-wise convex combination, where the weights of the combination are the estimated color and monochrome acuity values, normalized by their sum.
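Putting the last three steps together, a minimal sketch of quantization, blurring, and the convex combination might look like the following. For clarity it blurs full copies of the image per acuity level and masks them, rather than slicing out square regions as done for efficiency; the number of levels and the acuity-to-std.-dev. mapping are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def adaptive_blur(img, acuity, n_levels=5, max_std=8.0):
    """Quantize the acuity map into a few levels and blur the image once per
    level; lower acuity -> stronger blur (mapping chosen for illustration)."""
    edges = np.linspace(acuity.min(), acuity.max(), n_levels + 1)[1:-1]
    levels = np.digitize(acuity, edges)               # 0 (lowest acuity) .. n_levels-1
    out = np.zeros_like(img, dtype=float)
    for lvl in range(n_levels):
        std = max_std * (1.0 - lvl / (n_levels - 1))  # strongest blur at lowest acuity
        blurred = gaussian_filter(img.astype(float), sigma=(std, std, 0)) if std > 0 else img.astype(float)
        out = np.where((levels == lvl)[..., None], blurred, out)
    return out

def combine(color_blurred, grey_blurred, acuity_c, acuity_r):
    """Pixel-wise convex combination weighted by the normalized acuities."""
    w = (acuity_c / (acuity_c + acuity_r))[..., None]
    return w * color_blurred + (1.0 - w) * grey_blurred
```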
18. To integrate R-Blur with a DNN classifier, such as a ResNet, we apply R-Blur to the original image before passing it to the DNN. Since humans consider several fixation points before reaching a decision, we use a fixation prediction model together with R-Blur to generate a scanpath of fixated images. [CLICK] Given the original image, we set the initial fixation point to an arbitrarily chosen predefined location in the top-left corner and apply R-Blur. [CLICK] We then pass the blurred image to a fixation prediction network, such as DeepGaze. The fixation prediction network is trained on human gaze-tracking data and predicts a heatmap representing, for each pixel location, the likelihood of humans fixating on it. We select the most likely location from this heatmap as the next fixation point in the saccade. We repeat this process to get a scanpath of several fixation points and foveated images. [CLICK] The foveated images are passed through the DNN and its outputs are averaged to obtain the final prediction.
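A rough sketch of the scanpath loop described here. fixation_model stands in for a DeepGaze-style predictor that returns a per-pixel fixation heatmap; the interface names and the number of fixations are our own.

```python
import numpy as np

def classify_with_scanpath(img, r_blur, fixation_model, classifier, n_fixations=5):
    """Generate a scanpath of foveated images and average the classifier outputs."""
    fixation = (0, 0)                       # arbitrary predefined initial fixation (top-left)
    outputs = []
    for _ in range(n_fixations):
        foveated = r_blur(img, fixation)
        outputs.append(classifier(foveated))
        heatmap = fixation_model(foveated)  # likelihood of a human fixating on each pixel
        fixation = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return np.mean(outputs, axis=0)         # averaged outputs -> final prediction
```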
19. We find that models augmented with R-Blur are significantly more robust to adversarial attacks than the vanilla ResNet. This trend is visible on toy datasets such as CIFAR-10, as well as on more complex datasets like ImageNet.
20. We also note that augmenting DNNs with R-Blur makes them more robust to non-adversarial corruptions as well. [CLICK] We observe that a ResNet augmented with R-Blur generally achieves higher accuracy than a vanilla ResNet, and even an adversarially trained (AT) ResNet, on the ImageNet-C benchmark. Interestingly, except at the highest corruption severity, AT performs on par with or worse than even the vanilla ResNet baseline.
21. [CLICK] Simulating foveation via R-Blur significantly improves the robustness of DNNs without training on perturbed data. [CLICK] The robustness conferred by R-Blur generalizes better than adversarial training to different perturbation types. [CLICK] R-Blur demonstrates the promise of biologically motivated approaches to improving the robustness of DNNs.
22. Having studied the role of the retina in robust perception, we extend our investigation to the auditory system, where we study the role of cochlear processing in robust speech recognition.
23. [CLICK] Early methods for automatic speech recognition used hand-crafted features that were inspired by the biological auditory system, and in particular by the cochlea. [CLICK] Many such features were proposed over the years, modeling the cochlea to varying degrees of detail. However, most of these approaches have fallen by the wayside in favor of the following relatively simple approach that simulates only three aspects of the cochlea. [CLICK] First, time-frequency analysis is performed via the FFT to simulate the basilar membrane, resulting in the spectrogram shown here. The x-axis of the spectrogram represents time, the y-axis represents frequency, and the intensity of the color represents the energy at a particular frequency at a given time. [CLICK] Then a bank of band-pass filters on the Mel scale is applied to simulate the characteristic frequencies at different locations in the cochlea. The triangular Mel filterbank is illustrated here. Each triangular curve represents the amplitude response of a particular cochlear location to the range of frequencies indicated on the x-axis. The center frequency of each filter is the characteristic frequency of the corresponding cochlear site, and the center frequencies and widths of the triangular filters are distributed according to the Mel scale. This filterbank is applied to the spectrogram to obtain the estimated response of each filter, or cochlear site. [CLICK] Finally, a log non-linearity is applied to simulate the range of hair cell responses, resulting in the final log-Mel spectrogram. This approach is still used in SOTA ASR models, including OpenAI's Whisper.
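For reference, the standard log-Mel pipeline described above can be computed with librosa roughly as follows; the frame and filterbank sizes shown (80 Mel bands, 25 ms windows at 16 kHz) are common choices rather than values prescribed by the slide.

```python
import numpy as np
import librosa

def log_mel_spectrogram(wav, sr=16000, n_fft=400, hop_length=160, n_mels=80):
    """FFT-based time-frequency analysis -> Mel filterbank -> log non-linearity."""
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels, power=2.0
    )
    return np.log(mel + 1e-10)  # log compression approximates the hair-cell response range

# Usage (hypothetical file): wav, sr = librosa.load("utterance.wav", sr=16000)
#                            feats = log_mel_spectrogram(wav, sr)
```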
24. Despite its success, the aforementioned simple approach does not account for several aspects of cochlear processing that other feature extraction algorithms from prior work have simulated. However, since these aspects of cochlear processing did not yield gains in clean accuracy, they are not widely used in modern speech recognition models. That said, prior work has shown that these more bio-plausible features are more robust to noise, but their impact on adversarial robustness remains to be evaluated. In this work, we train speech recognition models on several biologically plausible feature representations and evaluate their robustness to various forms of corruption, including adversarial attacks. We then go on to propose novel feature extraction methods that integrate aspects of biological audio processing that are linked to robust speech recognition.
25. We started our evaluation with several biologically motivated features that have open-source implementations in Python. These include … We intend to extend this list with more detailed cochlear models from the literature, but implementing them is an involved process. In addition to feature extraction methods from prior work, we have also designed and evaluated [CLICK] two novel approaches that seek to simulate two aspects of biological audition, namely [CLICK] lateral suppression and [CLICK] simultaneous frequency masking.
26. [CLICK] Lateral suppression is the phenomenon whereby the response at a given characteristic frequency may be suppressed by energy at adjacent frequencies. [CLICK] This has the effect of enhancing responses to spectral changes and reducing the impact of noise. One could model this by applying a center-surround Mexican hat filter to the spectrogram. However, that would assume that the spectral sizes of the excitatory and suppressive fields are constant at every spectro-temporal location. [CLICK] In reality, since the spectral resolution of auditory perception is higher at lower frequencies, the excitatory and suppressive fields at lower frequencies should be narrower than those at higher frequencies. The Gammatone filterbank models the size of the excitatory field at different frequencies, which we exploit next.
27. [CLICK] Building on this observation, [CLICK] with a slight modification we can create two Gammatone filterbanks, with the excitatory field of one being slightly wider than the other, and take the difference of the two to obtain a Mexican-hat-like filter whose center and surround widths increase at higher frequencies and decrease at lower frequencies.
28. Concretely, we instantiate two Gammatone filters, with one being 25% wider than the other. The figure to the right, which has frequency on the x-axis and amplitude on the y-axis, shows these two filters in blue and orange. We then subtract the wider filter from the narrower one, and normalize by the sum of the positive values to obtain the filter shown in green.
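A NumPy sketch of this construction, using a common closed-form approximation of the Gammatone magnitude response and ERB bandwidths; the function names and the exact normalization details are our assumptions, not the released code.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth (Hz) at center frequency fc (Glasberg & Moore)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_fb(center_freqs, freqs, width_scale=1.0, order=4):
    """Approximate Gammatone magnitude response: [1 + ((f - fc)/b)^2]^(-order/2)."""
    fc = np.asarray(center_freqs)[:, None]
    b = width_scale * erb(fc)
    return (1.0 + ((np.asarray(freqs)[None, :] - fc) / b) ** 2) ** (-order / 2)

def dog_filterbank(center_freqs, freqs, widen=1.25):
    """Difference-of-Gammatone: narrow filter minus a 25%-wider one, normalized
    by the sum of the positive (excitatory) part of each filter."""
    dog = gammatone_fb(center_freqs, freqs) - gammatone_fb(center_freqs, freqs, widen)
    return dog / np.clip(dog, 0.0, None).sum(axis=1, keepdims=True)

# Applying it to a (n_freq_bins, n_frames) power spectrogram `spec`:
# responses = dog_filterbank(center_freqs, freqs) @ spec
```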
29. Here we show the original power-normalized Gammatone filterbank alongside the Difference-of-Gammatone filterbank derived from it.
30. On the left we have the spectrogram after being filtered by the Gammatone filterbank, and on the right is the same spectrogram after being filtered with the Difference-of-Gammatone filterbank.
31. We find that the Difference-of-Gammatone (DoG) features significantly improve adversarial robustness.
32. Studies from neuroscience have shown that inter-neuron correlations in the brain are highly inflexible: it has been observed that neuronal activations do not violate their correlation patterns even when doing so negatively impacts task performance. In contrast, the inter-neuron correlations in DNNs are highly variable and can change drastically as the input is perturbed. We observe that this variability is linked with the adversarial robustness of DNNs. This can be seen in the figure on the right, which tracks the norm of the change in the inter-neuron correlation matrix as the size of the perturbation is increased. The change in the inter-neuron correlations of a model trained on natural data, shown in orange, is significantly greater than the change for an adversarially trained model, shown in blue. This indicates that adversarial robustness leads to more rigid inter-neuron correlations. In this work we show that the reverse is also true: constraining the inter-neuron correlations to be more rigid makes the model more adversarially robust.
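As an illustration, one way to produce such a curve is to compare the inter-neuron correlation matrices of a layer's activations on clean and perturbed inputs; the choice of the Frobenius norm here is our assumption about the figure.

```python
import numpy as np

def correlation_shift(acts_clean, acts_perturbed):
    """Norm of the change in the inter-neuron correlation matrix.
    acts_*: (n_samples, n_neurons) activations on clean vs. perturbed inputs."""
    c_clean = np.corrcoef(acts_clean, rowvar=False)
    c_pert = np.corrcoef(acts_perturbed, rowvar=False)
    return np.linalg.norm(c_pert - c_clean, ord="fro")
```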
33. To introduce this constraint into DNNs we develop the SCA layer. The SCA layer initially performs the same feedforward computation as a standard DNN layer, but then optimizes its activations such that the activation of each neuron becomes predictable from the activations of all the other neurons. This objective is encoded by the green term in Line 4. To preclude degenerate solutions, we add the blue regularization term, which encourages a_x to retain information about the input.
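The following is a speculative PyTorch sketch of the kind of layer described here: a feedforward map whose activations are refined by a few gradient steps on a self-prediction objective (the "green" term) plus an input-reconstruction regularizer (the "blue" term). The actual SCA parameterization, objective, and inner optimizer may differ from this illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCALayerSketch(nn.Module):
    """Illustrative only: feedforward activations refined so each neuron is
    predictable from the others while staying informative about the input."""
    def __init__(self, d_in, d_out, steps=5, step_size=0.1, lam=1.0):
        super().__init__()
        self.ff = nn.Linear(d_in, d_out)                  # standard feedforward map
        self.pred = nn.Linear(d_out, d_out, bias=False)   # cross-neuron predictor
        self.dec = nn.Linear(d_out, d_in)                 # decoder for the info-retention term
        self.steps, self.step_size, self.lam = steps, step_size, lam

    def forward(self, x):
        a = self.ff(x)
        for _ in range(self.steps):
            # "green" term: predict each neuron from all the others
            # (zeroing the predictor's diagonal keeps the prediction cross-neuron)
            w = self.pred.weight - torch.diag(torch.diagonal(self.pred.weight))
            pred_loss = ((a - F.linear(a, w)) ** 2).mean()
            # "blue" term: activations should retain information about the input
            info_loss = ((self.dec(a) - x) ** 2).mean()
            loss = pred_loss + self.lam * info_loss
            (g,) = torch.autograd.grad(loss, a, create_graph=True)
            a = a - self.step_size * g                    # gradient step on the activations
        return a
```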
34. Experimental results on image and sound recognition tasks show that replacing the linear layers with the SCA layer reduces the change in the inter-neuron correlations induced by input perturbations.
35. Here we present the difference between the accuracy of DNNs with the SCA layer and vanilla MLPs on clean data, and on data perturbed by non-adversarial and adversarial perturbations. We observe that the SCA layer has minimal impact on clean accuracy, while generally enhancing the accuracy on perturbed data. Importantly, we note that models with the SCA layer are consistently more adversarially robust than the MLPs.
36. Recurrent connections are highly prevalent in the brain. Broadly speaking, there are two types of recurrent connections, namely lateral and feedback connections. Lateral connections are between neurons in the same region or at the same level of the perceptual hierarchy. Feedback connections, on the other hand, connect neurons at higher levels of the perceptual hierarchy to neurons at lower levels. Prior work has posited that these feedback pathways are related to robust perception in the presence of crowding or occlusions. In this work we seek to introduce both lateral and feedback connections into convolutional neural networks.
  37. Concretely we modify standard feedforward convolutional neural networks by adding [CLICK] lateral and [CLICK] feedback pathways. The lateral pathways add the output of each layer to its input, while the feedback pathways add the output of each layer to the input of the preceding layer. We implement both types of recurrent connections via convolution layers followed by upsampling. To resolve the recurrent loops we unroll [CLICK] the network for a fixed number of time steps.
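A minimal PyTorch sketch of such an unrolled recurrent CNN with two convolutional blocks; the layer widths, where the recurrent convolutions attach, and the use of F.interpolate for upsampling are our assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentCNNSketch(nn.Module):
    """Two-block CNN with lateral and feedback recurrence, unrolled for T steps.
    With T=1 it reduces to a plain feedforward CNN."""
    def __init__(self, n_classes=10, T=4):
        super().__init__()
        self.T = T
        self.block1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.block2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        # lateral: a block's output fed back to its own input
        self.lat1 = nn.Conv2d(32, 3, 3, padding=1)
        self.lat2 = nn.Conv2d(64, 32, 3, padding=1)
        # feedback: block2's output fed back to block1's input
        self.fb21 = nn.Conv2d(64, 3, 3, padding=1)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):
        h1 = h2 = None
        for _ in range(self.T):
            inp1 = x
            if h1 is not None:  # lateral into block1 (upsampled to the input size)
                inp1 = inp1 + F.interpolate(self.lat1(h1), size=x.shape[-2:])
            if h2 is not None:  # feedback from block2 into block1's input
                inp1 = inp1 + F.interpolate(self.fb21(h2), size=x.shape[-2:])
            h1_new = F.relu(self.block1(inp1))
            inp2 = h1_new
            if h2 is not None:  # lateral into block2
                inp2 = inp2 + F.interpolate(self.lat2(h2), size=h1_new.shape[-2:])
            h2_new = F.relu(self.block2(inp2))
            h1, h2 = h1_new, h2_new
        return self.head(h2.mean(dim=(-2, -1)))  # global average pool + classifier
```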
38. Here we plot the accuracy of the recurrent models for various sizes of adversarial perturbations as the number of unrolled time steps is increased. When only one time step is used, the model is identical to a feedforward CNN. We find that adding recurrence indeed improves the accuracy of the DNN on clean and adversarially perturbed data.
39. It is possible, however, that the recurrent connections are not fully utilized in the absence of constraints that explicitly engage them. Since feedback connections are known to aid perception under occlusion, we explicitly train them to fill in occluded parts of the image.
40. To do this we adopt the following procedure: [CLICK] First, random occlusions are added to the original image. [CLICK] Then the occluded image is passed to the recurrent CNN. [CLICK] We consider the feedback signal originating from the first layer to carry the occluded information, [CLICK] so we add it to the original image to obtain the reconstructed image. [CLICK] The model is then trained to minimize the divergence between the original and reconstructed images, while also being trained to perform classification.
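A sketch of one training step under this reconstruction constraint. The occlude function and the assumption that the model returns both logits and an image-space reconstruction built from its first-layer feedback are our own interface choices, not the paper's.

```python
import torch
import torch.nn.functional as F

def train_step(model, x, y, optimizer, occlude, alpha=1.0):
    """One step of joint classification + reconstruction training (sketch)."""
    x_occ = occlude(x)                      # 1. add random occlusions
    logits, recon = model(x_occ)            # 2-4. forward pass; recon = image + first-layer feedback
    cls_loss = F.cross_entropy(logits, y)   # classification objective
    rec_loss = F.mse_loss(recon, x)         # 5. match the un-occluded image
    loss = cls_loss + alpha * rec_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```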
41. Here we present the accuracy of the recurrent models, with and without the reconstruction constraint, against adversarial perturbations of various sizes. We observe that adding the reconstruction constraint significantly improves the robustness of the CNN to adversarial attacks. However, it does cause a slight degradation in accuracy on unperturbed data.