Master Thesis
Mathematical Analysis of Neural Networks

Alina Leidinger
TUM
15 June, 2019
Main Idea

- Overview of the status quo of the mathematics of neural networks
- Central Question: Why are neural networks so successful in practice?
- Focus on three areas:
  1. Approximation Theory
  2. Stability
  3. Unique Identification of Network Parameters
Outline

Overview of the Thesis
1. Approximation Theory
2. Stability
   - Adversarial Examples
   - Scattering Networks
3. Unique Identification of Network Parameters
Approximation Theory

- Universal approximation: the theoretical ability of a shallow neural network to approximate any (reasonable) function to an arbitrary degree of accuracy, under mild assumptions on the non-linearity [6], [2].
- Order of approximation: upper and lower bounds on

      E(X; Σ_r(σ)) := sup_{f ∈ X} inf_{g ∈ Σ_r(σ)} ‖f − g‖_X

  are given in [11] as O(r^{−s/d}), for d the input dimension, r the number of hidden units and s the degree of smoothness.
- Curse of dimensionality: the number of hidden units necessary for a fixed accuracy ε is of order O(ε^{−d/s}); a numerical illustration follows below.
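As a rough numerical illustration of this scaling (a sketch that ignores the constants hidden in the O-notation, so the absolute numbers are meaningless; only the growth in d matters):

```python
# Tabulate eps^(-d/s), the order of the number of hidden units required
# for accuracy eps, ignoring all hidden constants.
eps = 0.1
for d in (2, 10, 100):        # input dimension
    for s in (1, 2, 4):       # degree of smoothness
        print(f"d={d:3d}, s={s}: ~{eps ** (-d / s):.3e} hidden units")
```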
Approximation Theory - Breaking the Curse of Dimensionality

- Central Question: Why are deep neural networks preferred over shallow ones in practice?
- Linear instead of exponential order of approximation with deep neural networks for compositional functions^1, e.g. f(x1, x2, x3, x4) = h2(h11(x1, x2), h12(x3, x4)); see the sketch below.
- Locality of the constituent functions, rather than weight sharing, as a key to the success of CNNs.

1. Tomaso Poggio et al. "Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review". 2017. DOI: 10.1007/s11633-017-1054-2. arXiv: 1611.00740.
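A minimal sketch of the compositional structure in the example above (the concrete constituent functions are arbitrary placeholders, chosen only to make the tree structure explicit):

```python
import numpy as np

# Binary-tree compositional function f(x1,...,x4) = h2(h11(x1,x2), h12(x3,x4)).
# Every constituent function is bivariate, so a deep network mirroring this
# tree only ever needs to approximate two-dimensional functions.
h11 = lambda a, b: np.tanh(a + 2.0 * b)
h12 = lambda a, b: np.sin(a * b)
h2  = lambda u, v: u ** 2 + v

def f(x1, x2, x3, x4):
    return h2(h11(x1, x2), h12(x3, x4))

print(f(0.1, 0.2, 0.3, 0.4))
```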
Unique Identification of Neural Network Parameters

- Approximation of a target function, given by a one-layer neural network, using polynomially many samples
- Unique identification of the network parameters
- Comparison of two approaches: tensor decomposition^2,3 and spectral norm optimisation in matrix subspaces^4 (see the toy illustration below)

2. Hanie Sedghi and Anima Anandkumar. "Provable methods for training neural networks with sparse connectivity". In: arXiv preprint arXiv:1412.2693 (2014).
3. Majid Janzamin, Hanie Sedghi, and Anima Anandkumar. "Beating the perils of non-convexity: Guaranteed training of neural networks using tensor methods". In: arXiv preprint arXiv:1506.08473 (2015).
4. Massimo Fornasier, Jan Vybíral, and Ingrid Daubechies. "Robust and Resource Efficient Identification of Shallow Neural Networks by Fewest Samples". 2019. arXiv: 1804.01592v2.
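A toy numpy illustration of why identification from few samples is plausible (this is not the algorithm of [4], only an observation underlying such approaches: Hessians of a shallow network f(x) = Σ_i g(a_i · x) lie in the m-dimensional matrix subspace spanned by the rank-one matrices a_i a_iᵀ):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 6, 3
A = rng.normal(size=(m, d))              # hidden weights a_1, ..., a_m
f = lambda x: np.tanh(A @ x).sum()       # shallow net with unit outer weights

def hessian(f, x, h=1e-4):
    """Symmetric finite-difference Hessian of a scalar function."""
    n = x.size
    H = np.zeros((n, n))
    I = np.eye(n) * h
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + I[i] + I[j]) - f(x + I[i] - I[j])
                       - f(x - I[i] + I[j]) + f(x - I[i] - I[j])) / (4 * h * h)
    return H

# Each Hessian equals sum_i tanh''(a_i . x) a_i a_i^T, so vectorised Hessians
# sampled at random points span a subspace of dimension (at most) m.
Hs = np.stack([hessian(f, rng.normal(size=d)).ravel() for _ in range(10)])
print(np.linalg.matrix_rank(Hs, tol=1e-4))   # -> 3 == m
```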
Stability

- Adversarial Examples
- Scattering Networks
Adversarial Perturbations

- Adversarial examples generalise across models [15], [5]
- Hypotheses on the distribution of adversarial examples in input space [15], [5]
- Fast Gradient Sign Method (FGSM) by Goodfellow et al. [5] (a toy sketch follows this slide): maximise the network loss,

      Δx = arg max_{‖r‖_∞ ≤ λ} J(θ, x + r, y)

- DeepFool [12] for minimal adversarial perturbations in the l_p norm
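A minimal FGSM sketch on a toy model (a logistic regression, so the input gradient is available in closed form; the model, data and λ are illustrative placeholders, not the networks studied in [5]):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, lam=0.1):
    """One FGSM step: perturb x by lam in the sign of the loss gradient.

    For the cross-entropy loss J of the model sigmoid(w.x + b), the input
    gradient is grad_x J = (sigmoid(w.x + b) - y) * w, and the perturbation
    satisfies ||x_adv - x||_inf <= lam by construction.
    """
    grad_x = (sigmoid(w @ x + b) - y) * w
    return x + lam * np.sign(grad_x)

rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.0
x, y = rng.normal(size=5), 1.0           # x with true label y = 1
x_adv = fgsm(x, y, w, b)
# the score of the true class drops, i.e. the loss increases:
print(sigmoid(w @ x + b), "->", sigmoid(w @ x_adv + b))
```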
ADef^5 - Adversarial Deformations

- Deformation of an input image f: [0,1]² → ℝ with respect to a vector field τ: [0,1]² → ℝ² as

      L_τ f(x) = f(x − τ(x))  for all x ∈ [0,1]²

- In general, r = f − L_τ f is unbounded in the L^p norm even for imperceptible deformations; a sketch of applying such a deformation follows below.
- Size of the deformation measured as

      ‖τ‖_T := max_{s,t ∈ [W]} ‖τ(s, t)‖_2.

5. Rima Alaifari, Giovanni S. Alberti, and Tandri Gauksson. "ADef: an Iterative Algorithm to Construct Adversarial Deformations". 2018. arXiv: 1804.07729.
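A hedged sketch of applying such a deformation to a discretised image (bilinear interpolation via scipy's map_coordinates; the smooth shear field τ is an arbitrary illustrative choice, not one produced by ADef):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def deform(f, tau):
    """Evaluate (L_tau f)(x) = f(x - tau(x)) on the pixel grid.

    f   : (H, W) image
    tau : (2, H, W) vector field, components in pixel units
    """
    H, W = f.shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([yy - tau[0], xx - tau[1]])   # sample f at x - tau(x)
    return map_coordinates(f, coords, order=1, mode="nearest")

f = np.random.default_rng(0).random((28, 28))
# a small smooth horizontal shear as an example deformation
shear = 0.5 * np.sin(np.linspace(0.0, np.pi, 28))
tau = np.stack([np.zeros((28, 28)), np.tile(shear[:, None], (1, 28))])
print(deform(f, tau).shape)   # (28, 28)
```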
ADef^6

- Iterative construction of adversarial examples with gradient descent
- ADef successfully fools a CNN on MNIST, and an Inception-v3 and a ResNet-101 on ImageNet.
- Deformations are larger in ‖·‖_T than common perturbations are in the ‖·‖_∞ norm, though still imperceptible to the human eye.

6. Rima Alaifari, Giovanni S. Alberti, and Tandri Gauksson. "ADef: an Iterative Algorithm to Construct Adversarial Deformations". 2018. arXiv: 1804.07729.
Adversarial Training - A Game Theory Perspective^7

- Cast the optimisation problem of FGSM,

      Δx = arg max_{‖r‖_∞ ≤ λ} J(θ, x + r, y),

  into a game theory framework:

      (π*, ρ*) := arg min_π arg max_ρ E_{p∼π, r∼ρ} [J(M_p(θ), x + r, y)]

- Defence strategy M_p(θ): Stochastic Activation Pruning (SAP)
- Drop activations on the forward pass with probability proportional to their absolute value; rescale the remaining activations (see the sketch below).
- Apply SAP post hoc to the pretrained model.

7. Guneet S. Dhillon et al. "Stochastic activation pruning for robust adversarial defense". In: arXiv preprint arXiv:1803.01442 (2018).
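A simplified numpy sketch of the SAP idea: keep activations with probability proportional to their magnitude and reweight the survivors so the layer output is unbiased in expectation. (The Bernoulli keep-mask with inverse-probability rescaling below is a simplification; the paper samples with replacement from the magnitude distribution, so the details differ.)

```python
import numpy as np

def sap(a, keep=0.5, rng=np.random.default_rng(0)):
    """Stochastic Activation Pruning, simplified sketch.

    Each activation survives with probability proportional to its absolute
    value (a fraction `keep` survives on average); survivors are rescaled
    by their inverse keep-probability, so E[output] = a.
    """
    p = np.abs(a) / np.abs(a).sum()           # magnitude distribution
    q = np.minimum(1.0, keep * a.size * p)    # per-unit keep probability
    mask = rng.random(a.shape) < q
    out = np.zeros_like(a)
    out[mask] = a[mask] / q[mask]             # inverse-probability rescaling
    return out

a = np.random.default_rng(1).normal(size=8)
print(a)
print(sap(a))
```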
Scattering Networks
Scattering Networks^8

- Aim: find an embedding Φ for a signal f that is translation invariant and stable to deformations
- Assumption: labels do not vary under translation, scaling or (slight) deformation
- Deformation: L_τ f(x) = f(x − τ(x))
- Lipschitz continuity with respect to the deformation L_τ: for compact Ω ⊂ ℝ^d, there exists C > 0 with

      ‖Φ(L_τ f) − Φ(f)‖ ≤ C ‖f‖ ( sup_{x ∈ ℝ^d} |∇τ(x)| + sup_{x ∈ ℝ^d} |Hτ(x)| )

  for all f ∈ L²(ℝ^d) with support in Ω and for all τ ∈ C²(ℝ^d).

8. Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
Scattering Networks^9

- Due to Lipschitz continuity, deformations are linearised in the embedding domain
- Interpretation of stability: Φ is an operator that maps a non-linear deformation to a linear movement in a linear space, which can then be captured by a linear classifier
- After applying Φ, classification with a linear classifier is possible even though the deformation τ may be highly non-linear

9. Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
Why use wavelets?^10

- Inspiration for the construction of Φ from the Littlewood-Paley wavelet transform
- Function representation

      W_J f := { f ∗ φ_{2^J}, (f ∗ ψ_λ)_{λ ∈ Λ_J} }

  where Λ_J = { λ = 2^j r : r ∈ G^+, 2^j > 2^{−J} }.

10. Stéphane Mallat. Scattering Invariant Deep Networks for Classification.
The Scattering Transform

Definition (Scattering Transform)
Define the windowed scattering transform for all paths p = (λ_1, ..., λ_m) as

      S_J[p]f(x) = | ··· | f ∗ ψ_{λ_1} | ∗ ψ_{λ_2} | ··· | ∗ ψ_{λ_m} | ∗ φ_{2^J}(x),

where φ_{2^J} is a local averaging. A one-dimensional sketch of this cascade follows below.

Properties^11:
- Preservation of the L² norm
- Translation invariance
- Stability to deformations

11. Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
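A minimal one-dimensional sketch of the cascade (the Gaussian low-pass φ_{2^J} and the Morlet-like band-pass filters ψ_λ below are illustrative stand-ins, not Mallat's exact Littlewood-Paley construction):

```python
import numpy as np

def circ_conv(f, g):
    # circular convolution via FFT, with the kernel centred at index 0
    return np.fft.irfft(np.fft.rfft(f) * np.fft.rfft(np.fft.ifftshift(g)), n=len(f))

n, J = 256, 4
t = np.arange(n) - n // 2
phi = np.exp(-0.5 * (t / 2.0**J) ** 2)                      # low-pass phi_{2^J}
phi /= phi.sum()
psis = [np.exp(-0.5 * (t / 2.0**j) ** 2) * np.cos(3.0 * t / 2.0**j) / 2.0**j
        for j in range(J)]                                  # psi_lambda, lambda = 2^j

def scattering_path(f, path):
    """S_J[p]f = | ... | f * psi_{l_1} | * psi_{l_2} | ... | * phi_{2^J}."""
    u = f
    for j in path:                        # U[lambda] f = |f * psi_lambda|
        u = np.abs(circ_conv(u, psis[j]))
    return circ_conv(u, phi)              # final local averaging with phi_{2^J}

f = np.random.default_rng(0).normal(size=n)
print(scattering_path(f, [0, 2]).shape)   # second-order path p = (2^0, 2^2)
```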
Connection to CNNs^12

To compute the scattering transform, iterate on the operator

      U_J f = { f ∗ φ_{2^J}, (U[λ]f)_{λ ∈ Λ_J} }

where U[λ]f = |f ∗ ψ_λ|.

Figure: Black nodes: averaging operation. White nodes: application of the operators U[λ_1], U[λ_1, λ_2], etc.

12. Stéphane Mallat. "Group invariant scattering". In: Communications on Pure and Applied Mathematics 65.10 (2012), pp. 1331–1398.
Insights - Progressive Linearisation^13

- Local linearisation of L_τ from one layer to the next
- Lipschitz continuity to deformations is preferred over invariance
- Assumption that translations/deformations are local symmetries, i.e. that the target function does not vary locally under them
- After linearising local symmetries, project linearly
- Classification using a linear classifier, as is usually done at the last layer of CNNs

13. Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
References I

[1] Rima Alaifari, Giovanni S. Alberti, and Tandri Gauksson. "ADef: an Iterative Algorithm to Construct Adversarial Deformations". 2018. arXiv: 1804.07729.
[2] George Cybenko. "Approximation by superpositions of a sigmoidal function". In: Mathematics of Control, Signals and Systems 2.4 (1989), pp. 303–314.
[3] Guneet S. Dhillon et al. "Stochastic activation pruning for robust adversarial defense". In: arXiv preprint arXiv:1803.01442 (2018).
[4] Massimo Fornasier, Jan Vybíral, and Ingrid Daubechies. "Robust and Resource Efficient Identification of Shallow Neural Networks by Fewest Samples". 2019. arXiv: 1804.01592v2.
[5] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples". In: arXiv preprint arXiv:1412.6572 (2014).
References II

[6] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. "Multilayer feedforward networks are universal approximators". In: Neural Networks 2.5 (1989), pp. 359–366.
[7] Majid Janzamin, Hanie Sedghi, and Anima Anandkumar. "Beating the perils of non-convexity: Guaranteed training of neural networks using tensor methods". In: arXiv preprint arXiv:1506.08473 (2015).
[8] Stéphane Mallat. "Group invariant scattering". In: Communications on Pure and Applied Mathematics 65.10 (2012), pp. 1331–1398.
[9] Stéphane Mallat. Scattering Invariant Deep Networks for Classification.
[10] Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
[11] H. N. Mhaskar. Neural Networks for Optimal Approximation of Smooth and Analytic Functions. Tech. rep. 1996, pp. 164–177.
References III

[12] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. "DeepFool: a simple and accurate method to fool deep neural networks". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, pp. 2574–2582.
[13] Tomaso Poggio et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. 2017. DOI: 10.1007/s11633-017-1054-2. arXiv: 1611.00740.
[14] Hanie Sedghi and Anima Anandkumar. "Provable methods for training neural networks with sparse connectivity". In: arXiv preprint arXiv:1412.2693 (2014).
[15] Christian Szegedy et al. "Intriguing properties of neural networks". In: arXiv preprint arXiv:1312.6199 (2013).
ADef - Finding Adversarial Deformations^14

- For a score function F = (F_1, ..., F_C): Y → ℝ^C, define F_* = F_k − F_l, where l is the true label and k a target label. Then F_*(y) < 0 while y is classified correctly.
- Define g: T → ℝ as the map τ ↦ F_*(y_τ). Then g(0) = F_*(y) < 0.
- Aim: find a small τ ∈ T such that g(τ) ≥ 0.
- Approximate g around 0 as

      g(τ) ≈ g(0) + (D_0 g)τ,  (1)

  where D_0 g is the derivative of g evaluated at zero.
- Solve (D_0 g)τ = −g(0); a schematic sketch of this step follows below.

14. Rima Alaifari, Giovanni S. Alberti, and Tandri Gauksson. "ADef: an Iterative Algorithm to Construct Adversarial Deformations". 2018. arXiv: 1804.07729.
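A schematic numpy sketch of this single linearisation step: since D_0 g is a linear functional, represented by a gradient vector, a minimum-norm solution of (D_0 g)τ = −g(0) is an explicit rescaling of that gradient. (The gradient below is random, standing in for the true derivative obtained by backpropagation through the network and the deformation.)

```python
import numpy as np

def adef_step(g0, grad_g):
    """Minimum-norm solution tau of the linearised equation (D_0 g) tau = -g(0).

    With D_0 g represented by the vector grad_g, the smallest tau (in the
    Euclidean sense) satisfying grad_g . tau = -g0 is
    tau = -g0 * grad_g / ||grad_g||^2.
    """
    return -g0 * grad_g / np.dot(grad_g, grad_g)

rng = np.random.default_rng(0)
grad_g = rng.normal(size=2 * 28 * 28)   # stand-in gradient w.r.t. a 2x28x28 field
g0 = -1.5                               # g(0) = F_*(y) < 0: currently classified correctly
tau = adef_step(g0, grad_g)
print(g0 + grad_g @ tau)                # linearised g(tau) is ~ 0
```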
Insights^15

Theorem (Lipschitz continuity to diffeomorphisms)
There exists C > 0 such that for all f ∈ L²(ℝ^d) with ‖f‖_1 < ∞ and all τ ∈ C²(ℝ^d) with ‖∇τ‖_∞ ≤ 1/2, the following statement holds:

      ‖S_J[P_J]L_τ f − S_J[P_J]f‖ ≤ C ‖f‖_1 K(τ),  (2)

where

      K(τ) = 2^{−J} ‖τ‖_∞ + ‖∇τ‖_∞ max( log( ‖Δτ‖_∞ / ‖∇τ‖_∞ ), 1 ) + ‖Hτ‖_∞.

Here 2^{−J} ‖τ‖_∞ and ‖∇τ‖_∞ quantify the extent to which τ translates and deforms the input, respectively.

15. Stéphane Mallat. "Group invariant scattering". In: Communications on Pure and Applied Mathematics 65.10 (2012), pp. 1331–1398.

Master Thesis Presentation (Subselection of Topics)

  • 1. 1/26 Master Thesis - Mathematical Analysis of Neural Networks Alina Leidinger TUM 15 June, 2019 Alina Leidinger (TUM) Mathematical Analysis of NNs 15 June, 2019 1 / 26
  • 2. 2/26 Main Idea Overview of the status quo of mathematics of neural networks Alina Leidinger (TUM) Mathematical Analysis of NNs 15 June, 2019 2 / 26
  • 3. 2/26 Main Idea Overview of the status quo of mathematics of neural networks Central Question: Why are neural networks so successful in practice? Alina Leidinger (TUM) Mathematical Analysis of NNs 15 June, 2019 2 / 26
  • 4. 2/26 Main Idea Overview of the status quo of mathematics of neural networks Central Question: Why are neural networks so successful in practice? Focus on three areas: 1 Approximation Theory 2 Stability 3 Unique Identification of Network Parameters Alina Leidinger (TUM) Mathematical Analysis of NNs 15 June, 2019 2 / 26
  • 5. 3/26 Outline Overview of the Thesis 1 Approximation Theory 2 Stability 3 Unique Identification of Network Parameters Alina Leidinger (TUM) Mathematical Analysis of NNs 15 June, 2019 3 / 26
  • 6. 3/26 Outline Overview of the Thesis 1 Approximation Theory 2 Stability 3 Unique Identification of Network Parameters Stability Adversarial Examples Scattering Networks Alina Leidinger (TUM) Mathematical Analysis of NNs 15 June, 2019 3 / 26
  • 7. 4/26 Approximation Theory Universal approximation: Theoretic ability to approximate any (reasonable) function to any arbitrary degree of accuracy by a shallow neural network under mild assumptions on the non-linearity [6][2]. Alina Leidinger (TUM) Mathematical Analysis of NNs 15 June, 2019 4 / 26
  • 8. 4/26 Approximation Theory Universal approximation: Theoretic ability to approximate any (reasonable) function to any arbitrary degree of accuracy by a shallow neural network under mild assumptions on the non-linearity [6][2]. Order of Approximation: Upper and lower bounds on E(X; Σr (σ)) := sup f ∈X inf g∈Σr (σ) f − g X are in [11] O(r−s/d ) for d the dimension, r the number of hidden units, s the degree of smoothness. Alina Leidinger (TUM) Mathematical Analysis of NNs 15 June, 2019 4 / 26
  • 9. 4/26 Approximation Theory Universal approximation: Theoretic ability to approximate any (reasonable) function to any arbitrary degree of accuracy by a shallow neural network under mild assumptions on the non-linearity [6][2]. Order of Approximation: Upper and lower bounds on E(X; Σr (σ)) := sup f ∈X inf g∈Σr (σ) f − g X are in [11] O(r−s/d ) for d the dimension, r the number of hidden units, s the degree of smoothness. Curse of Dimensionality: The number of units in the hidden layer necessary for fixed accuracy is in the order O( −d/s). Alina Leidinger (TUM) Mathematical Analysis of NNs 15 June, 2019 4 / 26
  • 10. 5/26 Approximation Theory - Breaking the Curse of Dimensionality Central Question: Why are deep neural networks preferred over shallow ones in practice? 1 Tomaso Poggio et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. 2017. DOI: 10.1007/s11633-017-1054-2. arXiv: 1611.00740. Alina Leidinger (TUM) Mathematical Analysis of NNs 15 June, 2019 5 / 26
  • 11. 5/26 Approximation Theory - Breaking the Curse of Dimensionality Central Question: Why are deep neural networks preferred over shallow ones in practice? Linear instead of exponential order of approximation with deep neural networks for compositional functions1 (e.g. f (x1, x2, x3, x4) = h2(h11(x1, x2), h12(x3, x4))) 1 Tomaso Poggio et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. 2017. DOI: 10.1007/s11633-017-1054-2. arXiv: 1611.00740. Alina Leidinger (TUM) Mathematical Analysis of NNs 15 June, 2019 5 / 26
  • 12. 5/26 Approximation Theory - Breaking the Curse of Dimensionality Central Question: Why are deep neural networks preferred over shallow ones in practice? Linear instead of exponential order of approximation with deep neural networks for compositional functions1 (e.g. f (x1, x2, x3, x4) = h2(h11(x1, x2), h12(x3, x4))) Locality of the constituent functions as a key to the success of CNNs, rather than weight sharing. 1 Tomaso Poggio et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. 2017. DOI: 10.1007/s11633-017-1054-2. arXiv: 1611.00740. Alina Leidinger (TUM) Mathematical Analysis of NNs 15 June, 2019 5 / 26
  • 13. 6/26 Unique Identification of Neural Network Parameters Alina Leidinger (TUM) Mathematical Analysis of NNs 15 June, 2019 6 / 26
7/26
Unique Identification of Neural Network Parameters
Approximation of a target function given by a one-layer neural network using polynomially many samples
Unique identification of the network parameters
Comparison of two approaches: tensor decomposition²,³ and spectral norm optimisation in matrix subspaces⁴ (a related first step is sketched below)
2 Hanie Sedghi and Anima Anandkumar. “Provable methods for training neural networks with sparse connectivity”. In: arXiv preprint arXiv:1412.2693 (2014).
3 Majid Janzamin, Hanie Sedghi, and Anima Anandkumar. “Beating the perils of non-convexity: Guaranteed training of neural networks using tensor methods”. In: arXiv preprint arXiv:1506.08473 (2015).
4 Massimo Fornasier, Jan Vybíral, and Ingrid Daubechies. “Robust and Resource Efficient Identification of Shallow Neural Networks by Fewest Samples”. In: (2019). arXiv: 1804.01592v2.
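To give a flavour of why identification from few samples is possible at all: for $f(x) = \sum_i a_i \sigma(w_i \cdot x)$, every gradient $\nabla f(x) = \sum_i a_i \sigma'(w_i \cdot x)\, w_i$ lies in $\mathrm{span}\{w_1, \ldots, w_m\}$, so a handful of (approximate) gradients already exposes the weight subspace. The sketch below shows only this observation, not the full algorithms of the papers above; all sizes and the finite-difference step are illustrative assumptions:

```python
import numpy as np
rng = np.random.default_rng(0)

d, m = 8, 3                               # input dimension, hidden units (assumed)
W = rng.standard_normal((m, d))           # unknown weights w_1..w_m (rows)
a = rng.standard_normal(m)                # unknown outer weights

f = lambda x: a @ np.tanh(W @ x)          # shallow network f(x) = sum_i a_i tanh(w_i.x)

# Gradients grad f(x) = sum_i a_i tanh'(w_i.x) w_i lie in span{w_i}.
# Estimate gradients by central finite differences at a few random points ...
eps = 1e-5
G = []
for _ in range(20):
    x = rng.standard_normal(d)
    g = [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(d)]
    G.append(g)

# ... and read off the m-dimensional weight subspace from the SVD.
_, svals, Vt = np.linalg.svd(np.array(G))
print(np.round(svals, 4))                 # ~m significant singular values
# Rows of Vt[:m] approximately span span{w_1, ..., w_m}.
```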
8/26
Stability
9/26
Stability
Adversarial Examples
Scattering Networks
10/26
Adversarial Perturbations
Adversarial examples generalise across models [15], [5]
Hypotheses on the distribution of adversarial examples in input space [15], [5]
Fast Gradient Sign Method (FGSM) by Goodfellow et al. [5]: maximise the network loss,
$\Delta x = \arg\max_{\|r\|_\infty \le \lambda} J(\theta, x + r, y)$
DeepFool [12] for minimal adversarial perturbations in the $\ell_p$ norm
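Linearising $J$ in $x$, the maximiser of the constrained problem above is $r = \lambda \cdot \mathrm{sign}(\nabla_x J)$; that single gradient-sign step is FGSM. A minimal sketch with a toy logistic model standing in for the network (the model and all parameters are illustrative assumptions, not from [5]):

```python
import numpy as np

# FGSM on a toy linear classifier with logistic loss: the maximiser of the
# linearised loss over ||r||_inf <= lam is r = lam * sign(grad_x J).
rng = np.random.default_rng(1)
w, b = rng.standard_normal(10), 0.0        # "network" parameters theta
x, y = rng.standard_normal(10), 1.0        # input and its label

def loss_grad_x(x):
    # Gradient of J = log(1 + exp(-y * (w.x + b))) with respect to x.
    z = y * (w @ x + b)
    return -y * w / (1.0 + np.exp(z))

lam = 0.1                                  # perturbation budget
x_adv = x + lam * np.sign(loss_grad_x(x))  # fast gradient sign step
print(np.max(np.abs(x_adv - x)))           # = lam: ||r||_inf sits at the budget
```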
11/26
ADef⁵ - Adversarial Deformations
Deformation of an input image $f : [0,1]^2 \to \mathbb{R}$ with respect to a vector field $\tau : [0,1]^2 \to \mathbb{R}^2$:
$L_\tau f(x) = f(x - \tau(x)) \quad \forall x \in [0,1]^2$
In general, $r = f - L_\tau f$ is unbounded in the $L^p$ norm even for imperceptible deformations
Size of the deformation measured as $\|\tau\|_T := \max_{s,t \in [W]} \|\tau(s,t)\|_2$ (see the sketch below)
5 Rima Alaifari, Giovanni S. Alberti, and Tandri Gauksson. “ADef: an Iterative Algorithm to Construct Adversarial Deformations”. In: (2018). arXiv: 1804.07729.
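A minimal sketch of applying $L_\tau$ to a discrete $W \times W$ image and of the size measure $\|\tau\|_T$; bilinear interpolation via scipy.ndimage.map_coordinates is one possible discretisation, not necessarily the paper's:

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Apply L_tau f(x) = f(x - tau(x)) to a discrete image by resampling f at
# the displaced grid (bilinear interpolation: order=1).
W = 28
f = np.random.default_rng(2).random((W, W))       # toy image
s, t = np.meshgrid(np.arange(W), np.arange(W), indexing="ij")
tau = 0.5 * np.stack([np.sin(2 * np.pi * t / W),  # smooth toy vector field
                      np.cos(2 * np.pi * s / W)])

Ltau_f = map_coordinates(f, [s - tau[0], t - tau[1]], order=1, mode="nearest")

# Size of the deformation: ||tau||_T = max over pixels of |tau(s,t)|_2.
tau_T = np.sqrt((tau ** 2).sum(axis=0)).max()
print(Ltau_f.shape, round(float(tau_T), 3))
```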
13/26
ADef⁷
Iterative construction of adversarial examples with gradient descent
ADef successfully fools a CNN on MNIST, and an Inception-v3 and a ResNet-101 on ImageNet.
Deformations are larger as measured in $\|\cdot\|_T$ than common perturbations are in the $\|\cdot\|_\infty$ norm, yet remain imperceptible to the human eye.
7 Rima Alaifari, Giovanni S. Alberti, and Tandri Gauksson. “ADef: an Iterative Algorithm to Construct Adversarial Deformations”. In: (2018). arXiv: 1804.07729.
14/26
Adversarial Training - A Game Theory Perspective⁸
Cast the optimisation problem of FGSM,
$\Delta x = \arg\max_{\|r\|_\infty \le \lambda} J(\theta, x + r, y),$
into a game-theoretic framework:
$\pi^*, \rho^* := \arg\min_\pi \arg\max_\rho \; \mathbb{E}_{p \sim \pi,\, r \sim \rho}\,[J(M_p(\theta), x + r, y)]$
Defence strategy $M_p(\theta)$: Stochastic Activation Pruning (SAP)
Drop activations on the forward pass with probability proportional to their absolute value; rescale the remaining activations (see the sketch below).
Apply SAP post hoc to the pretrained model.
8 Guneet S. Dhillon et al. “Stochastic activation pruning for robust adversarial defense”. In: arXiv preprint arXiv:1803.01442 (2018).
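A minimal sketch of SAP for a single activation vector, assuming multinomial sampling with inverse-keep-probability rescaling; the number of draws per layer is an illustrative parameter:

```python
import numpy as np

def sap(h, num_draws, rng):
    """Stochastic Activation Pruning for one activation vector h (sketch).

    Sample activations with probability proportional to |h_i|; keep those
    drawn at least once and rescale them by the inverse of their keep
    probability, so the layer output is unbiased in expectation.
    """
    p = np.abs(h) / np.abs(h).sum()             # sampling distribution
    counts = rng.multinomial(num_draws, p)      # num_draws draws, with replacement
    keep = counts > 0
    keep_prob = 1.0 - (1.0 - p) ** num_draws    # P(activation i is kept)
    out = np.zeros_like(h)
    out[keep] = h[keep] / keep_prob[keep]
    return out

rng = np.random.default_rng(3)
h = rng.standard_normal(16)                     # activations of one layer
print(sap(h, num_draws=8, rng=rng))
```

Because the pruning is stochastic, an attacker computing a single gradient sees a randomly thinned network, which is the point of casting the defence as a mixed strategy.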
15/26
Scattering Networks
16/26
Scattering Networks⁹
Aim: find an embedding Φ for a signal f that is translation invariant and stable to deformations
Assumption: labels do not vary under translation, scaling or (slight) deformation
Deformation: $L_\tau f(x) = f(x - \tau(x))$
Lipschitz continuity with respect to deformations $L_\tau$: for compact $\Omega \subset \mathbb{R}^d$, there exists $C > 0$ with
$\|\Phi(L_\tau f) - \Phi(f)\| \le C \|f\| \left( \sup_{x \in \mathbb{R}^d} |\nabla\tau(x)| + \sup_{x \in \mathbb{R}^d} |H\tau(x)| \right)$
for all $f \in L^2(\mathbb{R}^d)$ with support in Ω and all $\tau \in C^2(\mathbb{R}^d)$.
9 Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
17/26
Scattering Networks¹⁰
Due to the Lipschitz continuity, deformations are linearised in the embedding domain.
Interpretation of stability: Φ maps a non-linear deformation to a linear movement in a linear space, which a linear classifier can capture.
After applying Φ, classification with a linear classifier is possible even though the deformation τ may be highly non-linear.
10 Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
18/26
Why use wavelets?¹¹
Inspiration for the construction of Φ from the Littlewood-Paley wavelet transform
Function representation
$W_J f := \{ f * \phi_{2^J},\; (f * \psi_\lambda)_{\lambda \in \Lambda_J} \}$
where $\Lambda_J = \{ \lambda = 2^j r : r \in G^+,\; 2^j > 2^{-J} \}$.
11 Stéphane Mallat. Scattering Invariant Deep Networks for Classification.
19/26
The Scattering Transform
Definition (Windowed Scattering Transform). For every path $p = (\lambda_1, \ldots, \lambda_m)$, define
$S_J[p]f(x) = \big|\cdots\big||f * \psi_{\lambda_1}| * \psi_{\lambda_2}\big| \cdots * \psi_{\lambda_m}\big| * \phi_{2^J}(x),$
where $\phi_{2^J}$ performs a local averaging.
Properties¹²:
Preservation of the $L^2$ norm
Translation invariance
Stability to deformations
A minimal numerical sketch follows below.
12 Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
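A sketch of a 1-D windowed scattering of depth $m \le 2$. The Gaussian frequency-domain filters below are an illustrative stand-in for a proper Littlewood-Paley wavelet family, so this reproduces the structure (wavelet convolution, modulus, averaging), not Mallat's exact construction:

```python
import numpy as np

# Minimal 1-D scattering sketch: iterate "convolve with psi_lambda, take
# modulus", then average every path with the lowpass phi_2J.
N, J = 256, 4
freqs = np.fft.fftfreq(N)

def bandpass(center, width):               # Gaussian bump at +/- center
    return np.exp(-((np.abs(freqs) - center) ** 2) / (2 * width ** 2))

phi_hat = np.exp(-(freqs ** 2) / (2 * (2.0 ** -J / 4) ** 2))   # lowpass
psi_hats = [bandpass(2.0 ** -j, 2.0 ** -j / 4) for j in range(1, J + 1)]

conv = lambda f, g_hat: np.real(np.fft.ifft(np.fft.fft(f) * g_hat))

f = np.sin(2 * np.pi * 8 * np.arange(N) / N)   # toy signal

S = [conv(f, phi_hat)]                          # order-0 coefficient
for p1 in psi_hats:                             # order-1 paths
    u1 = np.abs(conv(f, p1))
    S.append(conv(u1, phi_hat))
    for p2 in psi_hats:                         # order-2 paths
        S.append(conv(np.abs(conv(u1, p2)), phi_hat))
print(len(S), S[0].shape)                       # number of paths, coefficient length
```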
20/26
Connection to CNNs¹³
To compute the scattering transform, iterate the operator
$U_J f = \{ f * \phi_{2^J},\; (U[\lambda]f)_{\lambda \in \Lambda_J} \}$ where $U[\lambda]f = |f * \psi_\lambda|$.
Figure: black nodes are averaging operations; white nodes apply the operators $U[\lambda_1]$, $U[\lambda_1, \lambda_2]$, etc.
13 Stéphane Mallat. “Group invariant scattering”. In: Communications on Pure and Applied Mathematics 65.10 (2012), pp. 1331–1398.
21/26
Insights - Progressive Linearisation¹⁴
Local linearisation of $L_\tau$ from one layer to the next
Lipschitz continuity to deformations is preferred over full invariance
Assumption: translations/deformations are local symmetries, i.e. the target function does not vary locally under them
After linearising local symmetries, project linearly
Classification with a linear classifier, as is usually done at the last layer of CNNs
14 Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
22/26
References I
[1] Rima Alaifari, Giovanni S. Alberti, and Tandri Gauksson. “ADef: an Iterative Algorithm to Construct Adversarial Deformations”. In: (2018). arXiv: 1804.07729.
[2] George Cybenko. “Approximation by superpositions of a sigmoidal function”. In: Mathematics of Control, Signals and Systems 2.4 (1989), pp. 303–314.
[3] Guneet S. Dhillon et al. “Stochastic activation pruning for robust adversarial defense”. In: arXiv preprint arXiv:1803.01442 (2018).
[4] Massimo Fornasier, Jan Vybíral, and Ingrid Daubechies. “Robust and Resource Efficient Identification of Shallow Neural Networks by Fewest Samples”. In: (2019). arXiv: 1804.01592v2.
[5] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. “Explaining and harnessing adversarial examples”. In: arXiv preprint arXiv:1412.6572 (2014).
23/26
References II
[6] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. “Multilayer feedforward networks are universal approximators”. In: Neural Networks 2.5 (1989), pp. 359–366.
[7] Majid Janzamin, Hanie Sedghi, and Anima Anandkumar. “Beating the perils of non-convexity: Guaranteed training of neural networks using tensor methods”. In: arXiv preprint arXiv:1506.08473 (2015).
[8] Stéphane Mallat. “Group invariant scattering”. In: Communications on Pure and Applied Mathematics 65.10 (2012), pp. 1331–1398.
[9] Stéphane Mallat. Scattering Invariant Deep Networks for Classification.
[10] Stéphane Mallat. Understanding deep convolutional networks. 2016. DOI: 10.1098/rsta.2015.0203. arXiv: 1601.04920.
[11] H. N. Mhaskar. Neural Networks for Optimal Approximation of Smooth and Analytic Functions. Tech. rep. 1996, pp. 164–177.
24/26
References III
[12] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. “DeepFool: a simple and accurate method to fool deep neural networks”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, pp. 2574–2582.
[13] Tomaso Poggio et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. 2017. DOI: 10.1007/s11633-017-1054-2. arXiv: 1611.00740.
[14] Hanie Sedghi and Anima Anandkumar. “Provable methods for training neural networks with sparse connectivity”. In: arXiv preprint arXiv:1412.2693 (2014).
[15] Christian Szegedy et al. “Intriguing properties of neural networks”. In: arXiv preprint arXiv:1312.6199 (2013).
25/26
ADef - Finding Adversarial Deformations¹⁵
For a score function $F = (F_1, \ldots, F_C) : Y \to \mathbb{R}^C$, define $F^* = F_k - F_l$, where $l$ is the true label and $k$ a target label, so that $F^*(y) < 0$ for a correctly classified $y$.
Define $g : T \to \mathbb{R}$ as the map $\tau \mapsto F^*(y_\tau)$; then $g(0) = F^*(y) < 0$.
Aim: find a small $\tau \in T$ such that $g(\tau) \ge 0$.
Approximate $g$ around $0$ as
$g(\tau) \approx g(0) + (D_0 g)\tau \quad (1)$
where $D_0 g$ is the derivative of $g$ evaluated at zero.
Solve $(D_0 g)\tau = -g(0)$ (see the sketch below).
15 Rima Alaifari, Giovanni S. Alberti, and Tandri Gauksson. “ADef: an Iterative Algorithm to Construct Adversarial Deformations”. In: (2018). arXiv: 1804.07729.
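One such linearised step as a sketch: a toy smooth score stands in for the network, $D_0 g$ is estimated by finite differences (the actual algorithm differentiates through the network), and the minimum-norm solution of $(D_0 g)\tau = -g(0)$ is taken. Everything concrete here is an illustrative assumption:

```python
import numpy as np

# One ADef-style step: linearise g(tau) = F*(y_tau) around tau = 0 and
# take the minimum-norm tau solving (D0 g) tau = -g(0).
rng = np.random.default_rng(4)
T_dim = 6                                  # dimension of the deformation space T

A = rng.standard_normal(T_dim)             # stand-in for a smooth classifier score
g = lambda tau: -1.0 + A @ tau             # g(0) = F*(y) < 0: correctly classified

# Estimate the derivative D0g by central finite differences.
eps = 1e-6
D0g = np.array([(g(eps * e) - g(-eps * e)) / (2 * eps) for e in np.eye(T_dim)])

# Minimum-norm solution of (D0g) tau = -g(0): project onto the gradient.
tau = -g(np.zeros(T_dim)) * D0g / (D0g @ D0g)
print(g(tau))                              # ~0: the decision boundary is reached
```

In the iterative algorithm this step is repeated, composing the small vector fields, until the classifier's decision changes.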
26/26
Insights¹⁶
Theorem (Lipschitz continuity to diffeomorphisms). There exists $C > 0$ such that for all $f \in L^2(\mathbb{R}^d)$ with $\|f\|_1 < \infty$ and all $\tau \in C^2(\mathbb{R}^d)$ with $\|\nabla\tau\|_\infty \le 1/2$, the following holds:
$\| S_J[P_J] L_\tau f - S_J[P_J] f \| \le C \|f\|_1 K(\tau) \quad (2)$
where
$K(\tau) = 2^{-J} \|\tau\|_\infty + \|\nabla\tau\|_\infty \max\left( \log \frac{\|\Delta\tau\|_\infty}{\|\nabla\tau\|_\infty},\, 1 \right) + \|H\tau\|_\infty.$
Here $2^{-J} \|\tau\|_\infty$ and $\|\nabla\tau\|_\infty$ quantify the extent to which τ translates and deforms the input, respectively.
16 Stéphane Mallat. “Group invariant scattering”. In: Communications on Pure and Applied Mathematics 65.10 (2012), pp. 1331–1398.