This presentation shows some of the work carried out as part of my master's thesis on "Mathematical Analysis of Neural Networks" at the TUM Chair of Applied Numerical Analysis under Prof. Dr. Massimo Fornasier. The thesis is a literature review that analyses and contrasts several approaches in the mathematical analysis of neural networks. It focuses on three key aspects: modern and classical approximation theory, robustness and stability of neural networks, and unique identification of network weights. While the three themes carry approximately equal weight in the thesis, this presentation gives only a very short overview of the first and third chapters and focuses on the robustness chapter. See also the full-text version available on SlideShare/LinkedIn.
Main Idea
Overview of the status quo of mathematics of neural networks
Central Question: Why are neural networks so successful in practice?
Focus on three areas:
1 Approximation Theory
2 Stability
3 Unique Identification of Network Parameters
Outline
Overview of the Thesis
1 Approximation Theory
2 Stability
  Adversarial Examples
  Scattering Networks
3 Unique Identification of Network Parameters
Approximation Theory
Universal approximation: Theoretic ability to approximate any (reasonable) function to any arbitrary degree of accuracy by a shallow neural network under mild assumptions on the non-linearity [6][2].
Order of Approximation: Upper and lower bounds on
E(X; Σ_r(σ)) := sup_{f ∈ X} inf_{g ∈ Σ_r(σ)} ‖f − g‖_X
are shown in [11] to be of order
O(r^{−s/d})
for d the dimension, r the number of hidden units, s the degree of smoothness.
Curse of Dimensionality: The number of units in the hidden layer necessary for a fixed accuracy ε is of order O(ε^{−d/s}).
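As a quick numerical illustration of the last bound (my own back-of-the-envelope sketch, not part of the thesis; all constants are ignored), the number of hidden units r ≈ ε^{−d/s} needed for a fixed accuracy explodes with the dimension d:

# Rough illustration of r = O(eps**(-d/s)): hidden units needed for accuracy
# eps when approximating an s-times differentiable function of d variables.

def units_needed(eps: float, d: int, s: int) -> float:
    """Order-of-magnitude estimate only; all constants are dropped."""
    return eps ** (-d / s)

eps, s = 0.1, 2
for d in (2, 4, 8, 16, 32):
    print(f"d = {d:>2}: r ~ {units_needed(eps, d, s):.0e}")
# Prints 1e+01, 1e+02, 1e+04, 1e+08, 1e+16: exponential growth in d.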
Approximation Theory - Breaking the Curse of Dimensionality
Central Question: Why are deep neural networks preferred over shallow ones in practice?
Linear instead of exponential order of approximation with deep neural networks for compositional functions¹ (e.g. f(x1, x2, x3, x4) = h2(h11(x1, x2), h12(x3, x4)))
Locality of the constituent functions as a key to the success of CNNs, rather than weight sharing.
¹ Tomaso Poggio et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. 2017. DOI: 10.1007/s11633-017-1054-2. arXiv: 1611.00740.
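For concreteness, here is a toy compositional function with the binary-tree structure above (the constituent functions h11, h12, h2 are my own arbitrary choices, purely for illustration). A deep network that mirrors this tree only ever has to approximate functions of two variables, which is what yields the improved rates:

import math

# Toy compositional target f(x1, x2, x3, x4) = h2(h11(x1, x2), h12(x3, x4)).
# Each constituent depends on only two variables.

def h11(x1, x2): return math.tanh(x1 * x2)
def h12(x3, x4): return math.sin(x3 + x4)
def h2(u, v):    return u * u + v

def f(x1, x2, x3, x4):
    return h2(h11(x1, x2), h12(x3, x4))

print(f(0.1, 0.2, 0.3, 0.4))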
Unique Identification of Neural Network Parameters
Approximation of a target function given by a one layer neural network using polynomially many samples
Unique identification of the network parameters
Comparison of two approaches using tensor decomposition²,³ and spectral norm optimisation in matrix subspaces⁴
² Hanie Sedghi and Anima Anandkumar. "Provable methods for training neural networks with sparse connectivity". In: arXiv preprint arXiv:1412.2693 (2014).
³ Majid Janzamin, Hanie Sedghi, and Anima Anandkumar. "Beating the perils of non-convexity: Guaranteed training of neural networks using tensor methods". In: arXiv preprint arXiv:1506.08473 (2015).
⁴ Massimo Fornasier, Jan Vybíral, and Ingrid Daubechies. "Robust and Resource Efficient Identification of Shallow Neural Networks by Fewest Samples". In: (2019). arXiv: 1804.01592v2.
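To fix the setting (a sketch in my own notation, not the cited algorithms themselves): the target is a one-hidden-layer network f(x) = Σ_i a_i σ(⟨w_i, x⟩), and the task is to recover {a_i, w_i} (up to the usual permutation and sign/scaling symmetries) from polynomially many point queries of f.

import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 3                       # input dimension, number of hidden units
W = rng.standard_normal((m, d))   # hidden-layer weights w_i (rows)
a = rng.standard_normal(m)        # output weights a_i

def f(x: np.ndarray) -> float:
    """Shallow network f(x) = sum_i a_i * tanh(<w_i, x>)."""
    return float(a @ np.tanh(W @ x))

# The identification methods above query f at sample points x and
# reconstruct (a, W) from those values.
xs = rng.standard_normal((10, d))
samples = [f(x) for x in xs]
print(samples[:3])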
Adversarial Perturbations
Adversarial examples generalise across models [15], [5]
Hypotheses on distribution of adversarial examples in input space [15], [5]
Fast Gradient Sign Method by Goodfellow [5]
Maximise the network loss:
Δx = arg max_{‖r‖_∞ ≤ λ} J(θ, x + r, y)
DeepFool [12] for minimal adversarial perturbations in the l_p norm
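In code, FGSM approximates this maximisation by a single step of size λ in the direction of the sign of the input gradient of the loss. A minimal PyTorch sketch under my own assumptions (any differentiable classifier `model`, cross-entropy standing in for J(θ, x, y), `lam` as the ℓ∞ budget):

import torch
import torch.nn.functional as F

def fgsm(model, x, y, lam):
    """Return x + lam * sign(grad_x J), the FGSM approximation of Delta x."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + lam * x_adv.grad.sign()).detach()

# Usage with a toy linear classifier on random data:
model = torch.nn.Linear(10, 3)
x = torch.randn(4, 10)
y = torch.randint(0, 3, (4,))
x_adv = fgsm(model, x, y, lam=0.1)
print((x_adv - x).abs().max())   # ≈ 0.1, the l_inf perturbation budget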
ADef⁵ - Adversarial Deformations
Deformation of an input image f: [0, 1]² → R with respect to a vector field τ: [0, 1]² → R² as
L_τ f(x) = f(x − τ(x)) ∀x ∈ [0, 1]²
In general, unboundedness of r = f − L_τ f in the L^p norm even for imperceptible deformations
Size of the deformation measured as
‖τ‖_T := max_{s,t ∈ [W]} ‖τ(s, t)‖_2 .
⁵ Rima Alaifari, Giovanni S. Alberti, and Tandri Gauksson. "ADef: an Iterative Algorithm to Construct Adversarial Deformations". In: (2018). arXiv: 1804.07729.
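A minimal sketch of how such a deformation acts on a discrete image (my own implementation choice: bilinear resampling via scipy's map_coordinates; ADef only requires some differentiable interpolation scheme):

import numpy as np
from scipy.ndimage import map_coordinates

def deform(f: np.ndarray, tau: np.ndarray) -> np.ndarray:
    """f: (H, W) image; tau: (2, H, W) vector field. Returns L_tau f."""
    h, w = f.shape
    grid = np.mgrid[0:h, 0:w].astype(float)   # pixel coordinates x
    coords = grid - tau                        # sample f at x - tau(x)
    return map_coordinates(f, coords, order=1, mode="nearest")

rng = np.random.default_rng(0)
f = rng.random((28, 28))                                        # toy "image"
tau = 0.5 * np.stack([np.ones((28, 28)), np.zeros((28, 28))])   # half-pixel shift
g = deform(f, tau)
print(np.abs(f - g).max())   # r = f - L_tau f can be large even for a tiny tau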
ADef⁷
Iterative Construction of Adversarial Examples with Gradient Descent
ADef successfully fools a CNN on MNIST and an Inception-v3 and a ResNet-101 on ImageNet.
Deformations larger in ‖·‖_T than common perturbations in the ‖·‖_∞ norm, though imperceptible to the human eye.
⁷ Rima Alaifari, Giovanni S. Alberti, and Tandri Gauksson. "ADef: an Iterative Algorithm to Construct Adversarial Deformations". In: (2018). arXiv: 1804.07729.
Adversarial Training - A Game Theory Perspective⁸
Cast the optimisation problem of FGSM
Δx = arg max_{‖r‖_∞ ≤ λ} J(θ, x + r, y)
into a game theory framework
(π*, ρ*) := arg min_π arg max_ρ E_{p∼π, r∼ρ} [J(M_p(θ), x + r, y)]
Defence strategy M_p(θ): Stochastic Activation Pruning (SAP)
Sample the activations to keep on the forward pass with probability proportional to their absolute value; rescale the surviving activations.
Apply SAP post-hoc to the pretrained model.
⁸ Guneet S Dhillon et al. "Stochastic activation pruning for robust adversarial defense". In: arXiv preprint arXiv:1803.01442 (2018).
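A minimal sketch of the SAP pruning step under my reading of the description above (the exact sampling scheme and rescaling constants are in Dhillon et al.): activations are sampled with probability proportional to their magnitude, the rest are zeroed, and survivors are rescaled by the inverse of their keep-probability so the layer output stays unbiased in expectation.

import numpy as np

def sap(h: np.ndarray, n_samples: int, rng=np.random.default_rng(0)) -> np.ndarray:
    """Stochastic Activation Pruning applied to an activation vector h."""
    p = np.abs(h) / np.abs(h).sum()                  # sampling probabilities
    idx = rng.choice(h.size, size=n_samples, replace=True, p=p)
    keep = np.zeros(h.size, dtype=bool)
    keep[idx] = True                                  # activations that survive
    scale = np.zeros_like(h)
    scale[keep] = 1.0 / (1.0 - (1.0 - p[keep]) ** n_samples)  # inverse keep-probability
    return h * keep * scale

h = np.array([0.1, -2.0, 0.5, 3.0, -0.05])
print(sap(h, n_samples=3))   # small activations tend to be pruned, large ones kept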
Scattering Networks⁹
Aim: Find an embedding Φ for a signal f that is translation invariant and stable to deformations
Assumption: Labels do not vary under translation, scaling or (slight) deformation
Deformation: L_τ f(x) = f(x − τ(x))
Lipschitz continuity w.r.t. the deformation L_τ: For compact Ω ⊂ R^d, there exists C > 0 with
‖Φ(L_τ f) − Φ(f)‖ ≤ C ‖f‖ (sup_{x∈R^d} |∇τ(x)| + sup_{x∈R^d} |Hτ(x)|)
for all f ∈ L²(R^d) with support in Ω and for all τ ∈ C²(R^d).
⁹ Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
Scattering Networks¹⁰
Due to Lipschitz continuity, linearisation of the deformation in the embedding domain
Interpretation of stability as an operator that maps a non-linear deformation to a linear movement in a linear space that can be captured by a linear classifier
After application of Φ, classification with a linear classifier despite the deformation τ being potentially very non-linear
¹⁰ Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
Why use wavelets?¹¹
Inspiration for the construction of Φ from the Littlewood-Paley wavelet transform
Function representation
W_J f := {f ∗ φ_{2^J}, (f ∗ ψ_λ)_{λ∈Λ_J}}
where Λ_J = {λ = 2^j r : r ∈ G⁺, 2^j > 2^{−J}}.
¹¹ Stéphane Mallat. Scattering Invariant Deep Networks for Classification.
The Scattering Transform
Definition (Scattering Transform)
Define the windowed scattering transform for all paths p = (λ_1, · · · , λ_m) as
S_J[p]f(x) = | · · · ||f ∗ ψ_{λ_1}| ∗ ψ_{λ_2}| · · · | ∗ ψ_{λ_m}| ∗ φ_{2^J}(x)
where φ_{2^J} is a local averaging.
Properties¹²:
Preservation of the L² norm
Translation Invariance
Stability to deformations
¹² Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
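A minimal one-dimensional sketch of the windowed scattering transform along a single path p = (λ1, λ2): iterate "convolve with a wavelet, take the modulus", then average with a low-pass window φ. The Morlet-like filters below are my own simple stand-ins, not Mallat's exact Littlewood-Paley filter bank.

import numpy as np

def morlet(n, freq, sigma):
    """Complex band-pass filter centred at `freq` (cycles/sample)."""
    t = np.arange(n) - n // 2
    return np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma**2))

def gaussian(n, sigma):
    t = np.arange(n) - n // 2
    g = np.exp(-t**2 / (2 * sigma**2))
    return g / g.sum()

def conv(f, h):
    """Circular convolution via FFT, with the kernel h centred at the origin."""
    return np.fft.ifft(np.fft.fft(f) * np.fft.fft(np.fft.ifftshift(h)))

def scattering_path(f, wavelets, phi):
    u = f.astype(complex)
    for psi in wavelets:              # U[lambda]f = |f * psi_lambda|, iterated along the path
        u = np.abs(conv(u, psi))
    return np.real(conv(u, phi))      # final local averaging with phi_{2^J}

n = 256
f = np.sin(2 * np.pi * 0.05 * np.arange(n)) + 0.1 * np.random.default_rng(0).standard_normal(n)
psis = [morlet(n, 0.05, 8), morlet(n, 0.1, 4)]
print(scattering_path(f, psis, gaussian(n, 32))[:5])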
Connection to CNNs¹³
To compute the scattering transform, iterate on the operator
U_J f = {f ∗ φ_{2^J}, (U[λ]f)_{λ∈Λ_J}}
where U[λ]f = |f ∗ ψ_λ|.
Figure: Black nodes: averaging operation. White nodes: application of the operators U[λ_1], U[λ_1, λ_2] etc.
¹³ Stéphane Mallat. "Group invariant scattering". In: Communications on Pure and Applied Mathematics 65.10 (2012), pp. 1331–1398.
Insights - Progressive Linearisation¹⁴
Local linearisation of L_τ from one layer to the next
Lipschitz continuity to deformations preferred over invariance
Assumption that translations/deformations are local symmetries, i.e. that the target function does not vary locally under them.
After linearising local symmetries, project linearly
Classification using a linear classifier, as is usually done in CNNs at the last layer.
¹⁴ Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
References
[1] Rima Alaifari, Giovanni S. Alberti, and Tandri Gauksson. "ADef: an Iterative Algorithm to Construct Adversarial Deformations". In: (2018). arXiv: 1804.07729.
[2] George Cybenko. "Approximation by superpositions of a sigmoidal function". In: Mathematics of Control, Signals and Systems 2.4 (1989), pp. 303–314.
[3] Guneet S Dhillon et al. "Stochastic activation pruning for robust adversarial defense". In: arXiv preprint arXiv:1803.01442 (2018).
[4] Massimo Fornasier, Jan Vybíral, and Ingrid Daubechies. "Robust and Resource Efficient Identification of Shallow Neural Networks by Fewest Samples". In: (2019). arXiv: 1804.01592v2.
[5] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples". In: arXiv preprint arXiv:1412.6572 (2014).
[6] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. "Multilayer feedforward networks are universal approximators". In: Neural Networks 2.5 (1989), pp. 359–366.
[7] Majid Janzamin, Hanie Sedghi, and Anima Anandkumar. "Beating the perils of non-convexity: Guaranteed training of neural networks using tensor methods". In: arXiv preprint arXiv:1506.08473 (2015).
[8] Stéphane Mallat. "Group invariant scattering". In: Communications on Pure and Applied Mathematics 65.10 (2012), pp. 1331–1398.
[9] Stéphane Mallat. Scattering Invariant Deep Networks for Classification.
[10] Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
[11] H. N. Mhaskar. Neural Networks for Optimal Approximation of Smooth and Analytic Functions. Tech. rep. 1996, pp. 164–177.
[12] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. "DeepFool: a simple and accurate method to fool deep neural networks". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, pp. 2574–2582.
[13] Tomaso Poggio et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. 2017. DOI: 10.1007/s11633-017-1054-2. arXiv: 1611.00740.
[14] Hanie Sedghi and Anima Anandkumar. "Provable methods for training neural networks with sparse connectivity". In: arXiv preprint arXiv:1412.2693 (2014).
[15] Christian Szegedy et al. "Intriguing properties of neural networks". In: arXiv preprint arXiv:1312.6199 (2013).
ADef - Finding Adversarial Deformations¹⁵
For the score function F = (F_1, · · · , F_C): Y → R^C define F* = F_k − F_l, where l is the true label and k a target label. Then F*(y) < 0.
Define g: T → R as the map τ ↦ F*(y_τ). g(0) = F*(y) < 0.
Aim: Find a small τ ∈ T such that g(τ) ≥ 0.
Approximate g around 0 as
g(τ) ≈ g(0) + (D_0 g)τ    (1)
where D_0 g is the derivative of g evaluated at zero.
Solve (D_0 g)τ = −g(0).
¹⁵ Rima Alaifari, Giovanni S. Alberti, and Tandri Gauksson. "ADef: an Iterative Algorithm to Construct Adversarial Deformations". In: (2018). arXiv: 1804.07729.
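A minimal numerical sketch of one such linearised step (assumptions mine: the deformation is flattened to a vector, so D_0 g acts by an inner product): the smallest-norm τ solving (D_0 g)τ = −g(0) is a rescaled copy of the derivative.

import numpy as np

def linearised_step(g0: float, d0g: np.ndarray) -> np.ndarray:
    """Smallest l2-norm tau with <d0g, tau> = -g0."""
    return -g0 * d0g / np.dot(d0g, d0g)

g0 = -1.5                                   # F*(y) < 0: currently classified correctly
d0g = np.array([0.2, -0.4, 0.1, 0.05])      # toy derivative of tau -> F*(y_tau)
tau = linearised_step(g0, d0g)
print(tau, np.dot(d0g, tau))                # second value = -g0 = 1.5, i.e. g(tau) ≈ 0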
Insights¹⁶
Theorem (Lipschitz continuity to diffeomorphisms)
There exists C > 0 such that for all f ∈ L²(R^d) with ‖f‖_1 < ∞ and all τ ∈ C²(R^d) with ‖∇τ‖_∞ ≤ 1/2 the following statement holds:
‖S_J[P_J]L_τ f − S_J[P_J]f‖ ≤ C ‖f‖_1 K(τ)    (2)
where
K(τ) = 2^{−J} ‖τ‖_∞ + ‖∇τ‖_∞ max( log(‖Δτ‖_∞ / ‖∇τ‖_∞), 1 ) + ‖Hτ‖_∞.
2^{−J} ‖τ‖_∞ and ‖∇τ‖_∞ quantify the extent to which τ translates and deforms the input respectively.
¹⁶ Stéphane Mallat. "Group invariant scattering". In: Communications on Pure and Applied Mathematics 65.10 (2012), pp. 1331–1398.