An introduction to neural networks
Nathalie Vialaneix & Hyphen
nathalie.vialaneix@inra.fr
http://www.nathalievialaneix.eu &
https://www.hyphen-stat.com/en/
Outline
1 Statistical / machine / automatic learning
   Background and notations
   Underfitting / Overfitting
   Consistency
   Avoiding overfitting
2 (non deep) Neural networks
   Presentation of multi-layer perceptrons
   Theoretical properties of perceptrons
   Learning perceptrons
   Learning in practice
3 Deep neural networks
   Overview
   CNN
Background
Purpose: predict Y from X;
What we have: n observations of (X, Y): (x1, y1), ..., (xn, yn);
What we want: estimate the unknown Y for new values of X: xn+1, ..., xm.
Basics
From (xi, yi)i, definition of a machine Φn such that:
ŷnew = Φn(xnew).
if Y is numeric, Φn is called a regression function (fonction de régression);
if Y is a factor, Φn is called a classifier (classifieur);
Φn is said to be trained or learned from the observations (xi, yi)i.
Desirable properties
accuracy to the observations: predictions made on known data are close to observed values;
generalization ability: predictions made on new data are also accurate.
Conflicting objectives!!
Underfitting/Overfitting (sous/sur-apprentissage)
[Series of figures:]
Function x → y to be estimated
Observations we might have
Observations we do have
First estimation from the observations: underfitting
Second estimation from the observations: accurate estimation
Third estimation from the observations: overfitting
Summary
Errors
training error (measures the accuracy to the observations)
if y is a factor: misclassification rate #{i : ŷi ≠ yi, i = 1, ..., n} / n
if y is numeric: mean square error (MSE) (1/n) Σ_{i=1}^{n} (ŷi − yi)², or root mean square error (RMSE), or pseudo-R²: 1 − MSE / Var((yi)i)
test error: a way to prevent overfitting (it estimates the generalization error) is simple validation:
1 split the data into training/test sets (usually 80%/20%)
2 train Φn from the training dataset
3 compute the test error from the remaining data
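A minimal sketch of simple validation in R (the data frame df and its numeric response y are assumptions used only for illustration, and a linear model stands in for the machine Φn):

```r
set.seed(42)
n <- nrow(df)
train_idx <- sample(n, size = round(0.8 * n))    # 80% of the observations for training
train <- df[train_idx, ]
test  <- df[-train_idx, ]
fit <- lm(y ~ ., data = train)                   # any machine Phi_n would do; a linear model here
train_mse <- mean((predict(fit, newdata = train) - train$y)^2)  # training error
test_mse  <- mean((predict(fit, newdata = test)  - test$y)^2)   # test error (generalization)
```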
Example
[Series of figures:]
Observations
Training/Test datasets
Training/Test errors
Summary
Practical consequences
A machine or an algorithm is a method that:
starts from a given set of prediction functions C
uses the observations (xi, yi)i to pick the "best" prediction function according to what has already been observed.
This is often done by estimating some parameters (e.g., coefficients in linear models, weights in neural networks, ...).
C can depend on user choices (e.g., architecture, number of neurons in neural networks) ⇒ hyper-parameters
parameters are learned from data; hyper-parameters cannot be learned from data.
Hyper-parameters often control the richness of C: they must be carefully tuned to ensure a good accuracy and avoid overfitting.
Model selection / Fair error evaluation
Several strategies can be used to estimate the error of a set C and to select the best family of models C (tuning of hyper-parameters):
split the data into a training/test set (∼ 67%/33%): the model is estimated with the training dataset and a test error is computed with the remaining observations;
cross validation: split the data into L folds. For each fold, train a model without the data included in the current fold and compute the error with the data included in that fold; the average of these L errors is the cross-validation error:
CV error = (1/L) Σ_{l=1}^{L} err(Φ^{−l}, fold l), where Φ^{−l} is the model trained without fold l;
out-of-bag error: create B bootstrap samples (of size n, taken with replacement from the original sample) and compute a prediction based on out-of-bag samples (see next slide).
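A minimal sketch of the cross-validation error in R (again assuming a data frame df with a numeric response y; the linear model only stands in for the machine being tuned):

```r
set.seed(42)
L <- 5
folds <- sample(rep(1:L, length.out = nrow(df)))   # random assignment of observations to folds
cv_errors <- sapply(1:L, function(l) {
  fit <- lm(y ~ ., data = df[folds != l, ])        # train without fold l
  pred <- predict(fit, newdata = df[folds == l, ]) # predict the observations of fold l
  mean((pred - df$y[folds == l])^2)                # error on fold l
})
cv_error <- mean(cv_errors)                        # CV error: average over the L folds
```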
Out-of-bag observations, prediction and error
OOB (Out-Of-Bag) error: error based on the observations not included in the "bag": each bootstrap sample T^b (b = 1, ..., B) is used to train a machine Φ^b; the OOB prediction for an observation (xi, yi) is computed from the machines Φ^b whose bootstrap sample T^b does not contain i, and the OOB error compares these predictions to the observed yi.
Other model selection methods and strategies to avoid overfitting
Main idea: penalize the training error with something that represents the complexity of the model
model selection with an explicit penalization strategy: minimize (∼) error + complexity
Example: BIC/AIC criterion in linear models, −2L + p log n (with L the log-likelihood and p the number of parameters)
avoiding overfitting with an implicit penalization strategy: constrain the parameters w to take their values within a reduced subset
min_w error(w) + λ ‖w‖
with w the parameters of the model (learned) and ‖·‖ typically the ℓ1 or ℓ2 norm
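A minimal sketch of the implicit penalization strategy in R, on a linear model with an ℓ2 penalty; the design matrix X, the response y and the value of λ are illustrative assumptions:

```r
# minimize error(w) + lambda * ||w||^2 with a generic optimizer
penalized_risk <- function(w, X, y, lambda) {
  sum((y - X %*% w)^2) + lambda * sum(w^2)     # squared error + l2 penalty on the weights
}
w0 <- rep(0, ncol(X))                          # starting point
fit <- optim(w0, penalized_risk, X = X, y = y, lambda = 1, method = "BFGS")
fit$par                                        # penalized weight estimates
```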
Outline
1 Statistical / machine / automatic learning
   Background and notations
   Underfitting / Overfitting
   Consistency
   Avoiding overfitting
2 (non deep) Neural networks
   Presentation of multi-layer perceptrons
   Theoretical properties of perceptrons
   Learning perceptrons
   Learning in practice
3 Deep neural networks
   Overview
   CNN
What are (artificial) neural networks?
Common properties
(artificial) "neural networks": general name for supervised and unsupervised methods developed in (vague) analogy to the brain;
combination (network) of simple elements (neurons).
Example of graphical representation: [network diagram, from INPUTS to OUTPUTS]
Different types of neural networks
A neural network is defined by:
1 the network structure;
2 the neuron type.
Standard examples
Multilayer perceptrons (MLP) (perceptron multi-couches): dedicated to supervised problems (classification and regression);
Radial basis function networks (RBF): same purpose but based on local smoothing;
Self-organizing maps (SOM, also sometimes called Kohonen's maps) or topographic maps: dedicated to unsupervised problems (clustering), self-organized;
...
In this talk, focus on MLP.
MLP: Advantages/Drawbacks
Advantages
classification OR regression (i.e., Y can be a numeric variable or a factor);
non-parametric method: flexible;
good theoretical properties.
Drawbacks
hard to train (high computational cost, especially when the input dimension p is large);
overfit easily;
"black box" models (hard to interpret).
References
Advised references:
[Bishop, 1995, Ripley, 1996]: overview of the topic from a learning (more than statistical) perspective
[Devroye et al., 1996, Györfi et al., 2002]: dedicated chapters present the statistical properties of perceptrons
Analogy to the brain
1 a neuron collects signals from neighboring neurons through its dendrites
2 when the total signal is above a given threshold, the neuron is activated
3 ... and a signal is sent to other neurons through the axon
Connections which frequently lead to activating a neuron are reinforced (they tend to have an increasing impact on the destination neuron).
First model of an artificial neuron
[Mc Culloch and Pitts, 1943, Rosenblatt, 1958, Rosenblatt, 1962]
The neuron computes a weighted sum of its inputs x(1), ..., x(p), with weights w1, ..., wp and a bias w0, and fires when this sum is nonnegative:
f : x ∈ R^p ↦ 1_{ Σ_{j=1}^{p} wj x(j) + w0 ≥ 0 }
(artificial) Perceptron
Layers
MLP have one input layer (x ∈ R^p), one output layer (y ∈ R, or values in {1, ..., K − 1}) and several hidden layers;
no connections within a layer;
connections between two consecutive layers only (feedforward).
Example (regression, y ∈ R): a 2-hidden-layer MLP with inputs x = (x(1), ..., x(p)), Layer 1, Layer 2 and output y.
A neuron in MLP
Each neuron combines its inputs v1, v2, v3, ... with weights w1, w2, w3, ..., adds a bias (biais) w0, and applies an activation function h to the result.
Standard activation functions (fonctions de transfert / d'activation)
Biologically inspired: the Heaviside function, h(z) = 0 if z < 0 and 1 otherwise. Main issue: it is not continuous!
Identity: h(z) = z. But the identity activation function gives a linear model when used with one hidden layer: not flexible enough.
Logistic function: h(z) = 1 / (1 + exp(−z)).
Rectified linear (ReLU): h(z) = max(0, z), another popular activation function (useful to model positive real numbers).
General sigmoid: a nondecreasing function h : R → R such that lim_{z→+∞} h(z) = 1 and lim_{z→−∞} h(z) = 0.
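For reference, these activation functions written as plain R functions (illustration only):

```r
heaviside    <- function(z) as.numeric(z >= 0)   # biologically inspired, not continuous
identity_act <- function(z) z                    # identity
logistic     <- function(z) 1 / (1 + exp(-z))    # logistic sigmoid
relu         <- function(z) pmax(0, z)           # rectified linear
curve(logistic, from = -5, to = 5, xlab = "z", ylab = "h(z)")   # quick look at one of them
```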
Focus on one-hidden-layer perceptrons (R package nnet)
Regression case
With inputs x = (x(1), ..., x(p)), first-layer weights w(1)_k and biases w(0)_k (k = 1, ..., Q), and second-layer weights w(2):
Φ(x) = Σ_{k=1}^{Q} w(2)_k h_k( x⊤ w(1)_k + w(0)_k ) + w(2)_0, with h_k a (logistic) sigmoid
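A minimal sketch of fitting such a one-hidden-layer perceptron for regression with the R package nnet (the training data frame train and its numeric response y are assumptions):

```r
library(nnet)
Q <- 5
fit <- nnet(y ~ ., data = train, size = Q,   # size = number of hidden neurons Q
            linout = TRUE,                   # identity output unit (regression case)
            maxit = 500)
pred <- predict(fit, newdata = train)        # Phi(x) for the training observations
```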
Focus on one-hidden-layer perceptrons (R package nnet)
Binary classification case (y ∈ {0, 1})
φ(x) = h0( Σ_{k=1}^{Q} w(2)_k h_k( x⊤ w(1)_k + w(0)_k ) + w(2)_0 )
with h0 the logistic sigmoid or the identity, and decision rule:
Φ(x) = 0 if φ(x) < 1/2, and 1 otherwise.
Focus on one-hidden-layer perceptrons (R package nnet)
Extension to any classification problem in {1, ..., K − 1}
Straightforward extension to multiple classes with a multiple-output perceptron (number of output units equal to K) and a maximum-probability rule for the decision.
Theoretical properties of perceptrons
This section answers two questions:
1 can we approximate any function g : [0, 1]^p → R arbitrarily well with a perceptron?
2 when a perceptron is trained with i.i.d. observations from an arbitrary random variable pair (X, Y), is it consistent? (i.e., does it reach the minimum possible error asymptotically when the number of observations grows to infinity?)
Illustration of the universal approximation property
Simple example
a function to approximate: g : x ∈ [0, 1] ↦ sin(1 / (x + 0.1))
trying to approximate this function (how this is performed is explained later in this talk) with MLPs having different numbers of neurons on their hidden layer
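A minimal sketch of this illustration with the R package nnet (the hidden-layer sizes 2, 5 and 20 are arbitrary choices):

```r
library(nnet)
set.seed(1)
x <- seq(0, 1, length.out = 500)
y <- sin(1 / (x + 0.1))                        # the function g to approximate
x_mat <- matrix(x, ncol = 1)
plot(x, y, type = "l", lwd = 2)
for (Q in c(2, 5, 20)) {                       # increasing numbers of hidden neurons
  fit <- nnet(x_mat, y, size = Q, linout = TRUE, maxit = 2000, trace = FALSE)
  lines(x, predict(fit, x_mat), col = Q, lty = 2)
}
```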
Sketch of the main properties of 1-hidden-layer MLP
1 universal approximation [Pinkus, 1999, Devroye et al., 1996, Hornik, 1991, Hornik, 1993, Stinchcombe, 1999]: 1-hidden-layer MLP can approximate arbitrarily well any (regular enough) function (but, in theory, the number of neurons may have to grow arbitrarily for that)
2 consistency [Farago and Lugosi, 1993, Devroye et al., 1996, White, 1990, White, 1991, Barron, 1994, McCaffrey and Gallant, 1994]: 1-hidden-layer MLP are universally consistent provided the number of neurons grows to +∞ slower than O(n / log n)
⇒ the number of neurons controls the richness of the MLP and has to increase with n
Empirical error minimization
Given i.i.d. observations of (X, Y), (xi, yi)i, how to choose the weights w?
Standard approach: minimize the empirical L2 risk:
Ln(w) = Σ_{i=1}^{n} [Φw(xi) − yi]²
with
yi ∈ R for the regression case;
yi ∈ {0, 1} for the classification case, with the associated decision rule x ↦ 1_{Φw(x) ≥ 1/2}.
But: Ln(w) is not convex in w ⇒ general (non-convex) optimization problem
Optimization with gradient descent
Method: initialize (randomly or with some prior knowledge) the weights w(0) ∈ R^{Qp+2Q+1}
batch approach: for t = 1, ..., T,
w(t + 1) = w(t) − μ(t) ∇w Ln(w(t));
online (or stochastic) approach: write Ln(w) = Σ_{i=1}^{n} [Φw(xi) − yi]² and denote by Ei(w) the i-th term of the sum; then, for t = 1, ..., T, randomly pick i ∈ {1, ..., n} and update:
w(t + 1) = w(t) − μ(t) ∇w Ei(w(t)).
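A minimal sketch of the two update rules in R, on a simple least-squares problem rather than an MLP (the MLP case only changes how the gradient is computed, via backpropagation below); the step sizes and numbers of iterations are arbitrary:

```r
set.seed(1)
n <- 200; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))      # design matrix with an intercept column
y <- X %*% c(1, -2, 0.5) + rnorm(n)

# batch approach: one step uses the full gradient of Ln(w) = sum_i (yi - xi'w)^2
w <- rep(0, p); mu_batch <- 1e-3
for (t in 1:500) w <- w - mu_batch * as.vector(-2 * t(X) %*% (y - X %*% w))

# online / stochastic approach: one step uses the gradient of a single Ei(w)
w_sgd <- rep(0, p); mu_sgd <- 1e-2
for (t in 1:5000) {
  i <- sample(n, 1)
  w_sgd <- w_sgd - mu_sgd * as.vector(-2 * X[i, ] * (y[i] - sum(X[i, ] * w_sgd)))
}
```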
Discussion about practical choices for this approach
the batch version converges (from an optimization point of view) to a local minimum of the error for a good choice of μ(t), but convergence can be slow
the stochastic version is usually inefficient but is useful for large datasets (n large)
more efficient algorithms exist to solve the optimization task. The one implemented in the R package nnet uses higher-order derivatives (the BFGS algorithm), but this is a very active area of research in which many improvements have been made recently
in all cases, the solutions returned are, at best, local minima that strongly depend on the initialization: using more than one initialization is advised, as well as checking the variability among a few runs
Gradient backpropagation method
[Rumelhart and Mc Clelland, 1986]
The gradient backpropagation (rétropropagation du gradient) principle is used to easily compute gradients in perceptrons (and in other types of neural networks).
This way, stochastic gradient descent alternates:
a forward step, which computes the outputs for all observations xi given the current value of the weights w
a backward step, in which the gradient backpropagation principle is used to obtain the gradient at the current weights w
Backpropagation in practice
For a one-hidden-layer perceptron, an observation (xi, yi) and current weights w:
initialize the weights
Forward step: for all k, compute a(1)_k = xi⊤ w(1)_k + w(0)_k
Forward step: for all k, compute z(1)_k = h_k(a(1)_k)
Forward step: compute a(2) = Σ_{k=1}^{Q} w(2)_k z(1)_k + w(2)_0 = Φw(xi)
Backward step: compute ∂Ei/∂w(2)_k = δ(2) × z(1)_k, with δ(2) = ∂Ei/∂a(2) = ∂[h0(a(2)) − yi]²/∂a(2) = 2 h0'(a(2)) × [h0(a(2)) − yi]
Backward step: compute ∂Ei/∂w(1)_kj = δ(1)_k × x(j)_i, with δ(1)_k = ∂Ei/∂a(1)_k = ∂Ei/∂a(2) × ∂a(2)/∂a(1)_k = δ(2) × w(2)_k × h_k'(a(1)_k)
Backward step: compute ∂Ei/∂w(0)_k = δ(1)_k
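A minimal sketch of these forward/backward computations in R, for one observation, logistic hidden units and an identity output (h0(z) = z); this is an illustration of the formulas above, not the nnet implementation:

```r
sigmoid  <- function(z) 1 / (1 + exp(-z))
dsigmoid <- function(z) { s <- sigmoid(z); s * (1 - s) }

backprop_one_obs <- function(xi, yi, W1, w0, w2, w2_0) {
  # W1: Q x p matrix of first-layer weights, w0: Q first-layer biases,
  # w2: Q second-layer weights, w2_0: output bias
  a1 <- as.vector(W1 %*% xi + w0)        # forward: a(1)_k = xi' w(1)_k + w(0)_k
  z1 <- sigmoid(a1)                      # forward: z(1)_k = h_k(a(1)_k)
  a2 <- sum(w2 * z1) + w2_0              # forward: a(2) = Phi_w(xi)
  delta2 <- 2 * (a2 - yi)                # backward: delta(2), h0 identity so h0'(a(2)) = 1
  delta1 <- delta2 * w2 * dsigmoid(a1)   # backward: delta(1)_k = delta(2) w(2)_k h_k'(a(1)_k)
  list(grad_w2   = delta2 * z1,          # dEi/dw(2)_k
       grad_w2_0 = delta2,               # dEi/dw(2)_0
       grad_W1   = outer(delta1, xi),    # dEi/dw(1)_kj = delta(1)_k * x(j)_i
       grad_w0   = delta1)               # dEi/dw(0)_k
}

# example call with arbitrary weights (p = 2, Q = 3):
g <- backprop_one_obs(xi = c(1, 2), yi = 0.5, W1 = matrix(rnorm(6), 3, 2),
                      w0 = rnorm(3), w2 = rnorm(3), w2_0 = 0)
```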
Initialization and stopping of the training algorithm
1 How to initialize the weights? Standard choices: w(1)_jk ∼ N(0, 1/√p) and w(2)_k ∼ N(0, 1/√Q).
In the R package nnet, weights are sampled uniformly in [−0.5, 0.5], or in [−1/max_i x(j)_i, 1/max_i x(j)_i] if X(j) is large.
2 When to stop the algorithm (gradient descent or alike)? Standard choices:
a bound T on the number of iterations
a target value of the error L̂n(w)
a target value of the decrease L̂n(w(t)) − L̂n(w(t + 1))
In the R package nnet, a combination of the three criteria is used and is tunable.
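A hedged sketch of how these choices map onto nnet() arguments (rang for the range of the uniform initial weights; maxit, abstol and reltol for the three stopping criteria); the data frame train_df is an assumption:

```r
library(nnet)
fit <- nnet(y ~ ., data = train_df,
            size = 5, linout = TRUE,        # Q = 5 hidden neurons, regression output
            rang = 0.5,                     # initial weights sampled uniformly in [-0.5, 0.5]
            maxit = 200,                    # bound on the number of iterations
            abstol = 1e-4, reltol = 1e-8)   # stopping thresholds on the error / its decrease
```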
Strategies to avoid overfitting
Properly tune Q with a CV or a bootstrap estimation of the generalization ability of the method
Early stopping: for Q large enough, use a part of the data as a validation set and stop the training (gradient descent) when the empirical error computed on this dataset starts to increase
Weight decay: for Q large enough, penalize the empirical risk with a function of the weights, e.g., L̂n(w) + λ w⊤w
Noise injection: modify the input data with a random noise during the training
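Weight decay is directly available in nnet through the decay argument; a minimal sketch (the values of size and decay are illustrative and should be tuned, e.g., with the CV scheme above):

```r
library(nnet)
fit <- nnet(y ~ ., data = train_df, size = 10,          # deliberately large Q
            linout = TRUE, decay = 0.01, maxit = 500)   # decay = lambda of the weight-decay penalty
# size (Q) and decay (lambda) are hyper-parameters: tune them by cross validation
```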
Outline
1 Statistical / machine / automatic learning
   Background and notations
   Underfitting / Overfitting
   Consistency
   Avoiding overfitting
2 (non deep) Neural networks
   Presentation of multi-layer perceptrons
   Theoretical properties of perceptrons
   Learning perceptrons
   Learning in practice
3 Deep neural networks
   Overview
   CNN
What is "deep learning"?
Learning with neural networks (e.g., a 2-hidden-layer MLP with inputs x = (x(1), ..., x(p)) and output y), except that... the number of neurons is huge!!!
Image taken from https://towardsdatascience.com/
Main types of deep NN
deep MLP
CNN (Convolutional Neural Network): used in image processing (images taken from [Krizhevsky et al., 2012, Zeiler and Fergus, 2014])
RNN (Recurrent Neural Network): used for data with a temporal sequence (in natural language processing, for instance)
Why such a big buzz around deep learning?...
ImageNet results: http://www.image-net.org/challenges/LSVRC/
Image by courtesy of Stéphane Canu
See also: http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
Improvements in learning parameters
optimization algorithm: gradient descent is used, but the choice of the descent step (μ(t)) has been improved [Ruder, 2016] and mini-batches are also often used to speed up the learning (e.g., ADAM [Kingma and Ba, 2017])
avoid overfitting with dropout: randomly remove a fraction p of the neurons (and their connections) during each step of the training (a larger number of iterations is needed to converge, but each update is faster) [Srivastava et al., 2014]
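A hedged sketch of dropout (and the ADAM optimizer) with the keras R package, assuming keras and a backend are installed; the layer sizes, dropout rate and input dimension are illustrative only:

```r
library(keras)
model <- keras_model_sequential() %>%
  layer_dense(units = 128, activation = "relu", input_shape = c(100)) %>%
  layer_dropout(rate = 0.5) %>%   # randomly drop 50% of the units at each training step
  layer_dense(units = 1)
model %>% compile(optimizer = "adam", loss = "mse")   # ADAM with mini-batch gradient descent
# model %>% fit(x_train, y_train, epochs = 20, batch_size = 32)  # x_train / y_train assumed
```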
Main available implementations
Theano: 2008-2017; main dev: Montreal University (Y. Bengio); language: Python
TensorFlow: 2015-...; main dev: Google Brain; languages: C++/Python
Caffe: 2013-...; main dev: Berkeley University; languages: C++/Python/Matlab
PyTorch (and autograd): 2016-...; main dev: community driven (Facebook, Google, ...); language: Python (with strong GPU accelerations)
Keras: 2017-...; main dev: François Chollet / RStudio; high-level API on top of TensorFlow (and CNTK, Theano, ...): Python, R
General architecture of CNN
Image taken from the article https://doi.org/10.3389/fpls.2017.02235
Convolutional layer
Images taken from https://towardsdatascience.com
Pooling layer
Basic principle: select a subsample of the previous layer values
Generally used: max pooling
Image taken from Wikimedia Commons, by Aphex34
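A hedged sketch of the general CNN architecture (convolutional layer, max-pooling layer, then fully connected layers) with the keras R package; the input shape, numbers of filters and layer sizes are illustrative assumptions:

```r
library(keras)
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%   # convolutional layer
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # pooling layer (max pooling)
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax") # one output unit per class
model %>% compile(optimizer = "adam", loss = "categorical_crossentropy", metrics = "accuracy")
```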
References
Barron, A. (1994). Approximation and estimation bounds for artificial neural networks. Machine Learning, 14:115–133.
Bishop, C. (1995). Neural Networks for Pattern Recognition. Oxford University Press, New York, USA.
Devroye, L., Györfi, L., and Lugosi, G. (1996). A Probabilistic Theory for Pattern Recognition. Springer-Verlag, New York, NY, USA.
Farago, A. and Lugosi, G. (1993). Strong universal consistency of neural network classifiers. IEEE Transactions on Information Theory, 39(4):1146–1151.
Györfi, L., Kohler, M., Krzyżak, A., and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag, New York, NY, USA.
Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257.
Hornik, K. (1993). Some new results on neural network approximation. Neural Networks, 6(8):1069–1072.
Kingma, D. P. and Ba, J. (2017). Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Pereira, F., Burges, C., Bottou, L., and Weinberger, K., editors, Proceedings of Neural Information Processing Systems (NIPS 2012), volume 25.
Mc Culloch, W. and Pitts, W. (1943). A logical calculus of ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(4):115–133.
McCaffrey, D. and Gallant, A. (1994). Convergence rates for single hidden layer feedforward networks. Neural Networks, 7(1):115–133.
Pinkus, A. (1999). Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143–195.
Ripley, B. (1996). Pattern Recognition and Neural Networks. Cambridge University Press.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–408.
Rosenblatt, F. (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington, DC, USA.
Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
Rumelhart, D. and Mc Clelland, J. (1986). Parallel Distributed Processing: Exploration in the MicroStructure of Cognition. MIT Press, Cambridge, MA, USA.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958.
Stinchcombe, M. (1999). Neural network approximation of continuous functionals and continuous functions on compactifications. Neural Networks, 12(3):467–477.
White, H. (1990). Connectionist nonparametric regression: multilayer feedforward networks can learn arbitrary mappings. Neural Networks, 3:535–549.
White, H. (1991). Nonparametric estimation of conditional quantiles using neural networks. In Proceedings of the 23rd Symposium on the Interface: Computing Science and Statistics, pages 190–199, Alexandria, VA, USA. American Statistical Association.
Zeiler, M. D. and Fergus, R. (2014). Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV 2014).

Journal club: Validation of cluster analysis results on validation datatuxette
Ā 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?tuxette
Ā 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricestuxette
Ā 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Predictiontuxette
Ā 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelstuxette
Ā 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random foresttuxette
Ā 
PrƩsentation du projet ASTERICS
PrƩsentation du projet ASTERICSPrƩsentation du projet ASTERICS
PrƩsentation du projet ASTERICStuxette
Ā 
PrƩsentation du projet ASTERICS
PrƩsentation du projet ASTERICSPrƩsentation du projet ASTERICS
PrƩsentation du projet ASTERICStuxette
Ā 

More from tuxette (20)

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
Ā 
MĆ©thodes Ć  noyaux pour lā€™intĆ©gration de donnĆ©es hĆ©tĆ©rogĆØnes
MĆ©thodes Ć  noyaux pour lā€™intĆ©gration de donnĆ©es hĆ©tĆ©rogĆØnesMĆ©thodes Ć  noyaux pour lā€™intĆ©gration de donnĆ©es hĆ©tĆ©rogĆØnes
MĆ©thodes Ć  noyaux pour lā€™intĆ©gration de donnĆ©es hĆ©tĆ©rogĆØnes
Ā 
MƩthodologies d'intƩgration de donnƩes omiques
MƩthodologies d'intƩgration de donnƩes omiquesMƩthodologies d'intƩgration de donnƩes omiques
MƩthodologies d'intƩgration de donnƩes omiques
Ā 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
Ā 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
Ā 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
Ā 
ASTERICS : une application pour intƩgrer des donnƩes omiques
ASTERICS : une application pour intƩgrer des donnƩes omiquesASTERICS : une application pour intƩgrer des donnƩes omiques
ASTERICS : une application pour intƩgrer des donnƩes omiques
Ā 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
Ā 
Rserve, renv, flask, Vue.js dans un docker pour intƩgrer des donnƩes omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intƩgrer des donnƩes omiques ...Rserve, renv, flask, Vue.js dans un docker pour intƩgrer des donnƩes omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intƩgrer des donnƩes omiques ...
Ā 
Apprentissage pour la biologie molĆ©culaire et lā€™analyse de donnĆ©es omiques
Apprentissage pour la biologie molĆ©culaire et lā€™analyse de donnĆ©es omiquesApprentissage pour la biologie molĆ©culaire et lā€™analyse de donnĆ©es omiques
Apprentissage pour la biologie molĆ©culaire et lā€™analyse de donnĆ©es omiques
Ā 
Quelques rƩsultats prƩliminaires de l'Ʃvaluation de mƩthodes d'infƩrence de r...
Quelques rƩsultats prƩliminaires de l'Ʃvaluation de mƩthodes d'infƩrence de r...Quelques rƩsultats prƩliminaires de l'Ʃvaluation de mƩthodes d'infƩrence de r...
Quelques rƩsultats prƩliminaires de l'Ʃvaluation de mƩthodes d'infƩrence de r...
Ā 
IntƩgration de donnƩes omiques multi-Ʃchelles : mƩthodes Ơ noyau et autres ap...
IntƩgration de donnƩes omiques multi-Ʃchelles : mƩthodes Ơ noyau et autres ap...IntƩgration de donnƩes omiques multi-Ʃchelles : mƩthodes Ơ noyau et autres ap...
IntƩgration de donnƩes omiques multi-Ʃchelles : mƩthodes Ơ noyau et autres ap...
Ā 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
Ā 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
Ā 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
Ā 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
Ā 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
Ā 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
Ā 
PrƩsentation du projet ASTERICS
PrƩsentation du projet ASTERICSPrƩsentation du projet ASTERICS
PrƩsentation du projet ASTERICS
Ā 
PrƩsentation du projet ASTERICS
PrƩsentation du projet ASTERICSPrƩsentation du projet ASTERICS
PrƩsentation du projet ASTERICS
Ā 

Recently uploaded

Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
Ā 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .Poonam Aher Patil
Ā 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
Ā 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
Ā 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
Ā 
ā¤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number šŸ’¦āœ….
ā¤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number šŸ’¦āœ….ā¤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number šŸ’¦āœ….
ā¤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number šŸ’¦āœ….Nitya salvi
Ā 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
Ā 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
Ā 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
Ā 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
Ā 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
Ā 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
Ā 
High Class Escorts in Hyderabad ā‚¹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ā‚¹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ā‚¹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ā‚¹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
Ā 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
Ā 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptxryanrooker
Ā 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
Ā 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
Ā 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
Ā 
High Profile šŸ” 8250077686 šŸ“ž Call Girls Service in GTB NagaršŸ‘
High Profile šŸ” 8250077686 šŸ“ž Call Girls Service in GTB NagaršŸ‘High Profile šŸ” 8250077686 šŸ“ž Call Girls Service in GTB NagaršŸ‘
High Profile šŸ” 8250077686 šŸ“ž Call Girls Service in GTB NagaršŸ‘Damini Dixit
Ā 

Recently uploaded (20)

Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
Ā 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
Ā 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
Ā 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Ā 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
Ā 
ā¤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number šŸ’¦āœ….
ā¤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number šŸ’¦āœ….ā¤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number šŸ’¦āœ….
ā¤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number šŸ’¦āœ….
Ā 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
Ā 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
Ā 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
Ā 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
Ā 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
Ā 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Ā 
High Class Escorts in Hyderabad ā‚¹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ā‚¹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ā‚¹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ā‚¹7.5k Pick Up & Drop With Cash Payment 969456...
Ā 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Ā 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
Ā 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
Ā 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
Ā 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
Ā 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
Ā 
High Profile šŸ” 8250077686 šŸ“ž Call Girls Service in GTB NagaršŸ‘
High Profile šŸ” 8250077686 šŸ“ž Call Girls Service in GTB NagaršŸ‘High Profile šŸ” 8250077686 šŸ“ž Call Girls Service in GTB NagaršŸ‘
High Profile šŸ” 8250077686 šŸ“ž Call Girls Service in GTB NagaršŸ‘
Ā 

An introduction to neural networks

  • 1. An introduction to neural networks Nathalie Vialaneix & Hyphen nathalie.vialaneix@inra.fr http://www.nathalievialaneix.eu & https://www.hyphen-stat.com/en/ DATE ET LIEU Nathalie Vialaneix & Hyphen | Introduction to neural networks 1/41
  • 2. Outline 1 Statistical / machine / automatic learning Background and notations Underļ¬tting / Overļ¬tting Consistency Avoiding overļ¬tting 2 (non deep) Neural networks Presentation of multi-layer perceptrons Theoretical properties of perceptrons Learning perceptrons Learning in practice 3 Deep neural networks Overview CNN Nathalie Vialaneix & Hyphen | Introduction to neural networks 2/41
  • 3. Outline 1 Statistical / machine / automatic learning Background and notations Underļ¬tting / Overļ¬tting Consistency Avoiding overļ¬tting 2 (non deep) Neural networks Presentation of multi-layer perceptrons Theoretical properties of perceptrons Learning perceptrons Learning in practice 3 Deep neural networks Overview CNN Nathalie Vialaneix & Hyphen | Introduction to neural networks 3/41
  • 4. Background Purpose: predict Y from X; Nathalie Vialaneix & Hyphen | Introduction to neural networks 4/41
  • 5. Background Purpose: predict Y from X; What we have: n observations of (X, Y), (x1, y1), . . . , (xn, yn); Nathalie Vialaneix & Hyphen | Introduction to neural networks 4/41
  • 6. Background Purpose: predict Y from X; What we have: n observations of (X, Y), (x1, y1), . . . , (xn, yn); What we want: estimate unknown Y from new X: xn+1, . . . , xm. Nathalie Vialaneix & Hyphen | Introduction to neural networks 4/41
  • 7. Basics From (xi, yi)i, deļ¬nition of a machine, Ī¦n s.t.: Ė†ynew = Ī¦n (xnew). Nathalie Vialaneix & Hyphen | Introduction to neural networks 5/41
  • 8. Basics From (xi, yi)i, deļ¬nition of a machine, Ī¦n s.t.: Ė†ynew = Ī¦n (xnew). if Y is numeric, Ī¦n is called a regression function fonction de rĆ©gression; if Y is a factor, Ī¦n is called a classiļ¬er classiļ¬eur; Nathalie Vialaneix & Hyphen | Introduction to neural networks 5/41
  • 9. Basics From (xi, yi)i, deļ¬nition of a machine, Ī¦n s.t.: Ė†ynew = Ī¦n (xnew). if Y is numeric, Ī¦n is called a regression function fonction de rĆ©gression; if Y is a factor, Ī¦n is called a classiļ¬er classiļ¬eur; Ī¦n is said to be trained or learned from the observations (xi, yi)i. Nathalie Vialaneix & Hyphen | Introduction to neural networks 5/41
  • 10. Basics From (xi, yi)i, deļ¬nition of a machine, Ī¦n s.t.: Ė†ynew = Ī¦n (xnew). if Y is numeric, Ī¦n is called a regression function fonction de rĆ©gression; if Y is a factor, Ī¦n is called a classiļ¬er classiļ¬eur; Ī¦n is said to be trained or learned from the observations (xi, yi)i. Desirable properties accuracy to the observations: predictions made on known data are close to observed values; Nathalie Vialaneix & Hyphen | Introduction to neural networks 5/41
  • 11. Basics From (xi, yi)i, deļ¬nition of a machine, Ī¦n s.t.: Ė†ynew = Ī¦n (xnew). if Y is numeric, Ī¦n is called a regression function fonction de rĆ©gression; if Y is a factor, Ī¦n is called a classiļ¬er classiļ¬eur; Ī¦n is said to be trained or learned from the observations (xi, yi)i. Desirable properties accuracy to the observations: predictions made on known data are close to observed values; generalization ability: predictions made on new data are also accurate. Nathalie Vialaneix & Hyphen | Introduction to neural networks 5/41
  • 12. Basics From (xi, yi)i, deļ¬nition of a machine, Ī¦n s.t.: Ė†ynew = Ī¦n (xnew). if Y is numeric, Ī¦n is called a regression function fonction de rĆ©gression; if Y is a factor, Ī¦n is called a classiļ¬er classiļ¬eur; Ī¦n is said to be trained or learned from the observations (xi, yi)i. Desirable properties accuracy to the observations: predictions made on known data are close to observed values; generalization ability: predictions made on new data are also accurate. Conļ¬‚icting objectives!! Nathalie Vialaneix & Hyphen | Introduction to neural networks 5/41
  • 13–19. Underfitting/Overfitting sous/sur-apprentissage: a sequence of figures showing the function x → y to be estimated; observations we might have; observations we do have; a first estimation from the observations (underfitting); a second estimation (an accurate estimation); a third estimation (overfitting); and a summary.
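To make the underfitting/overfitting trade-off concrete, here is a minimal R sketch (not part of the original slides, with assumed toy data): polynomials of increasing degree are fitted to noisy observations of a smooth function; the most flexible fit has the lowest training error but generalizes poorly.

```r
# Illustrative sketch of underfitting / overfitting (assumed toy data, not the slides' example)
set.seed(42)
x <- runif(50)
y <- sin(2 * pi * x) + rnorm(50, sd = 0.3)
obs <- data.frame(x = x, y = y)

for (degree in c(1, 4, 20)) {   # too rigid / about right / too flexible
  fit <- lm(y ~ poly(x, degree), data = obs)
  cat("degree", degree, "- training MSE:", mean(residuals(fit)^2), "\n")
}
# the degree 20 fit has the smallest training MSE but generalizes poorly:
# accuracy to the observations and generalization ability are conflicting objectives
```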
  • 20–25. Errors: training error (measures the accuracy to the observations): if y is a factor, the misclassification rate $\frac{\#\{i :\ \hat{y}_i \neq y_i,\ i = 1, \ldots, n\}}{n}$; if y is numeric, the mean square error (MSE) $\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2$, or the root mean square error (RMSE), or the pseudo-$R^2$: $1 - \mathrm{MSE}/\mathrm{Var}((y_i)_i)$. Test error: a way to prevent overfitting (it estimates the generalization error) is the simple validation: 1. split the data into training/test sets (usually 80%/20%); 2. train $\Phi^n$ from the training dataset; 3. compute the test error from the remaining data.
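A minimal R sketch of the simple validation scheme above (the 80%/20% split is the one quoted in the slide; lm() with a polynomial stands in for any machine, and obs is the toy dataset from the previous sketch):

```r
set.seed(1)
n <- nrow(obs)
train_id <- sample(seq_len(n), size = round(0.8 * n))   # 80% training / 20% test
train <- obs[train_id, ]
test  <- obs[-train_id, ]

fit <- lm(y ~ poly(x, 4), data = train)                 # trained on the training set only
mse <- function(truth, pred) mean((truth - pred)^2)
cat("training MSE:", mse(train$y, predict(fit, train)), "\n")
cat("test MSE    :", mse(test$y,  predict(fit, test)), "\n")
```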
  • 26. Example Observations Nathalie Vialaneix & Hyphen | Introduction to neural networks 8/41
  • 27. Example Training/Test datasets Nathalie Vialaneix & Hyphen | Introduction to neural networks 8/41
  • 28. Example Training/Test errors Nathalie Vialaneix & Hyphen | Introduction to neural networks 8/41
  • 29. Example Summary Nathalie Vialaneix & Hyphen | Introduction to neural networks 8/41
  • 30–32. Practical consequences: a machine or an algorithm is a method that starts from a given set of prediction functions C and uses the observations (x_i, y_i)_i to pick the "best" prediction function according to what has already been observed. This is often done by estimating some parameters (e.g., coefficients in linear models, weights in neural networks, ...). C can depend on user choices (e.g., architecture, number of neurons in neural networks) ⇒ hyper-parameters: parameters are learned from the data, whereas hyper-parameters cannot be learned from the data. Hyper-parameters often control the richness of C: they must be carefully tuned to ensure good accuracy and avoid overfitting.
  • 33–35. Model selection / Fair error evaluation: several strategies can be used to estimate the error of a set C and to select the best family of models C (tuning of the hyper-parameters): (i) split the data into a training/test set (∼ 67%/33%): the model is estimated with the training dataset and a test error is computed with the remaining observations; (ii) cross validation: split the data into K folds; for each fold, train a model without the data included in the current fold and compute the error on the data included in that fold (e.g., train without fold 2 to obtain $\Phi^{-2}$ and compute $\mathrm{err}(\Phi^{-2}, \text{fold } 2)$); the average of these K errors is the cross validation error $\frac{1}{K}\sum_{l} \mathrm{err}(\Phi^{-l}, \text{fold } l)$; (iii) out-of-bag error: create B bootstrap samples (size n, taken with replacement in the original sample) and compute a prediction based on the Out-Of-Bag samples (see next slide).
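A minimal R sketch of K-fold cross validation as described above (K = 5 is an assumption; lm() with a polynomial again stands in for the machine, obs for the data, and the degree plays the role of the hyper-parameter to tune):

```r
cv_error <- function(data, degree, K = 5) {
  folds <- sample(rep(1:K, length.out = nrow(data)))                 # random fold assignment
  errs <- sapply(1:K, function(l) {
    fit <- lm(y ~ poly(x, degree), data = data[folds != l, ])        # train without fold l
    mean((data$y[folds == l] - predict(fit, data[folds == l, ]))^2)  # error on fold l
  })
  mean(errs)                                                         # CV error
}
sapply(c(1, 4, 20), function(d) cv_error(obs, d))                    # pick the best degree
```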
  • 36. Out-of-bag observations, prediction and error: the OOB (Out-Of-Bag) error is the error based on the observations not included in the "bag", i.e., for an observation $(x_i, y_i)$, only the models $\Phi^b$ trained on bootstrap samples $T_b$ that do not contain observation i are used to compute its OOB prediction. [Diagram: bootstrap samples $T_1, \ldots, T_b, \ldots, T_B$ and the corresponding trained models $\Phi^1, \ldots, \Phi^b, \ldots, \Phi^B$.]
  • 37. Other model selection methods and strategies to avoid overfitting. Main idea: penalize the training error with something that represents the complexity of the model. Model selection with an explicit penalization strategy: minimize (∼) error + complexity; example: the BIC/AIC criteria in linear models, $-2L + \log(n) \times p$. Avoiding overfitting with an implicit penalization strategy: constrain the parameters w to take their values within a reduced subset, $\min_w\ \mathrm{error}(w) + \lambda \|w\|$, with w the (learned) parameters of the model and $\|\cdot\|$ typically the $\ell_1$ or $\ell_2$ norm.
  • 38. Outline 1 Statistical / machine / automatic learning Background and notations Underļ¬tting / Overļ¬tting Consistency Avoiding overļ¬tting 2 (non deep) Neural networks Presentation of multi-layer perceptrons Theoretical properties of perceptrons Learning perceptrons Learning in practice 3 Deep neural networks Overview CNN Nathalie Vialaneix & Hyphen | Introduction to neural networks 13/41
  • 39–41. What are (artificial) neural networks? Common properties: (artificial) "neural networks" is a general name for supervised and unsupervised methods developed in (vague) analogy to the brain; they are combinations (networks) of simple elements (neurons). Example of graphical representation: [diagram with INPUTS on the left and OUTPUTS on the right].
  • 42–46. Different types of neural networks: a neural network is defined by 1. the network structure and 2. the neuron type. Standard examples: multilayer perceptrons (MLP) perceptron multi-couches, dedicated to supervised problems (classification and regression); radial basis function networks (RBF), same purpose but based on local smoothing; self-organizing maps (SOM, also sometimes called Kohonen's maps) or topographic maps, dedicated to unsupervised problems (clustering), self-organized; ... In this talk, focus on MLP.
  • 47–48. MLP: Advantages/Drawbacks. Advantages: classification OR regression (i.e., Y can be a numeric variable or a factor); non-parametric method: flexible; good theoretical properties. Drawbacks: hard to train (high computational cost, especially when d is large); overfit easily; "black box" models (hard to interpret).
  • 49. References. Advised references: [Bishop, 1995, Ripley, 1996] give an overview of the topic from a learning (more than statistical) perspective; [Devroye et al., 1996, Györfi et al., 2002] present statistical properties of perceptrons in dedicated chapters.
  • 50–52. Analogy to the brain: 1. a neuron collects signals from neighboring neurons through its dendrites; 2. when the total signal is above a given threshold, the neuron is activated; 3. ... and a signal is sent to other neurons through the axon. Connections which frequently lead to activating a neuron are reinforced (they tend to have an increasing impact on the destination neuron).
  • 53. First model of artificial neuron [Mc Culloch and Pitts, 1943, Rosenblatt, 1958, Rosenblatt, 1962]: inputs $x^{(1)}, \ldots, x^{(p)}$ are combined by a weighted sum (weights $w_1, \ldots, w_p$, bias $w_0$) followed by a threshold: $f : x \in \mathbb{R}^p \mapsto \mathbb{1}_{\left\{\sum_{j=1}^{p} w_j x^{(j)} + w_0 \geq 0\right\}}$.
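A minimal R sketch of this first artificial neuron (a weighted sum followed by a hard threshold); the input and weight values below are arbitrary:

```r
neuron <- function(x, w, w0) as.numeric(sum(w * x) + w0 >= 0)   # Heaviside of w'x + w0
neuron(c(0.2, -1.5, 3), w = c(1, 0.5, 0.25), w0 = -0.1)         # returns 0 or 1
```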
  • 54–58. (artificial) Perceptron. Layers: MLPs have one input layer ($x \in \mathbb{R}^p$), one output layer ($y \in \mathbb{R}$ or taking values in $\{1, \ldots, K-1\}$) and several hidden layers; no connections within a layer; connections between two consecutive layers (feedforward). Example (regression, $y \in \mathbb{R}$): a 2-hidden-layer MLP [diagram: INPUTS $x = (x^{(1)}, \ldots, x^{(p)})$, layer 1 with weights $w^{(1)}_{jk}$, layer 2, OUTPUT y].
  • 59–65. A neuron in MLP: inputs $v_1, v_2, v_3$ are weighted by $w_1, w_2, w_3$, summed together with a bias biais $w_0$, and passed through an activation function. Standard activation functions fonctions de transfert / d'activation: biologically inspired, the Heaviside function $h(z) = 0$ if $z < 0$ and $1$ otherwise (main issue: not continuous!); the identity $h(z) = z$ (but the identity activation function gives a linear model if used with one hidden layer: not flexible enough); the logistic function $h(z) = \frac{1}{1 + \exp(-z)}$; another popular activation function (useful to model positive real numbers), the rectified linear unit (ReLU) $h(z) = \max(0, z)$. General sigmoid: a nondecreasing function $h : \mathbb{R} \to \mathbb{R}$ such that $\lim_{z \to +\infty} h(z) = 1$ and $\lim_{z \to -\infty} h(z) = 0$.
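The activation functions listed above are simple to write down; a minimal R sketch:

```r
heaviside  <- function(z) ifelse(z < 0, 0, 1)   # biologically inspired, not continuous
identity_h <- function(z) z                     # gives a linear model if used alone
logistic   <- function(z) 1 / (1 + exp(-z))     # standard (logistic) sigmoid
relu       <- function(z) pmax(0, z)            # rectified linear unit
curve(logistic, from = -6, to = 6)              # tends to 0 at -Inf and to 1 at +Inf
```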
  • 66–69. Focus on one-hidden-layer perceptrons (R package nnet). Regression case: $\Phi(x) = \sum_{k=1}^{Q} w^{(2)}_k\, h_k\!\left(x^\top w^{(1)}_k + w^{(0)}_k\right) + w^{(2)}_0$, with $h_k$ a (logistic) sigmoid. Binary classification case ($y \in \{0, 1\}$): $\varphi(x) = h_0\!\left(\sum_{k=1}^{Q} w^{(2)}_k\, h_k\!\left(x^\top w^{(1)}_k + w^{(0)}_k\right) + w^{(2)}_0\right)$, with $h_0$ a logistic sigmoid or the identity, and decision rule $\Phi(x) = 0$ if $\varphi(x) < 1/2$ and $1$ otherwise. Extension to any classification problem in $\{1, \ldots, K-1\}$: straightforward extension to multiple classes with a multiple-output perceptron (number of output units equal to K) and a maximum probability rule for the decision.
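A minimal sketch of fitting such one-hidden-layer perceptrons with the R package nnet mentioned above (size is the number Q of hidden units; linout = TRUE requests an identity output activation for regression; obs is the toy dataset from the earlier sketches and iris is only used here as a convenient two-class example):

```r
library(nnet)
# regression case
reg <- nnet(y ~ x, data = obs, size = 5, linout = TRUE, maxit = 500, trace = FALSE)
# binary classification case, on a factor response
iris2 <- droplevels(subset(iris, Species != "setosa"))
clf <- nnet(Species ~ ., data = iris2, size = 3, maxit = 500, trace = FALSE)
mean(predict(clf, iris2, type = "class") != iris2$Species)   # training misclassification rate
```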
  • 70–71. Theoretical properties of perceptrons. This section answers two questions: 1. can we approximate any function $g : [0,1]^p \to \mathbb{R}$ arbitrarily well with a perceptron? 2. when a perceptron is trained with i.i.d. observations from an arbitrary random variable pair (X, Y), is it consistent? (i.e., does it reach the minimum possible error asymptotically when the number of observations grows to infinity?)
  • 72–73. Illustration of the universal approximation property. Simple example: a function to approximate, $g : x \in [0,1] \mapsto \sin\!\left(\frac{1}{x + 0.1}\right)$; we try to approximate this function (how this is performed is explained later in this talk) with MLPs having different numbers of neurons on their hidden layer.
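A minimal R sketch of this illustration (the training procedure is the one explained later in the talk): MLPs with an increasing number Q of hidden neurons are fitted to g(x) = sin(1/(x + 0.1)) sampled on [0, 1]; the training error typically decreases as Q grows.

```r
library(nnet)
x <- seq(0, 1, length.out = 200)
g <- sin(1 / (x + 0.1))
df <- data.frame(x = x, g = g)
for (Q in c(2, 5, 20)) {
  fit <- nnet(g ~ x, data = df, size = Q, linout = TRUE, maxit = 2000, trace = FALSE)
  cat("Q =", Q, "- training MSE:", mean((predict(fit, df) - g)^2), "\n")
}
```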
  • 74. Sketch of main properties of 1-hidden-layer MLP: 1. universal approximation [Pinkus, 1999, Devroye et al., 1996, Hornik, 1991, Hornik, 1993, Stinchcombe, 1999]: 1-hidden-layer MLPs can approximate arbitrarily well any (regular enough) function (but, in theory, the number of neurons can grow arbitrarily for that); 2. consistency [Farago and Lugosi, 1993, Devroye et al., 1996, White, 1990, White, 1991, Barron, 1994, McCaffrey and Gallant, 1994]: 1-hidden-layer MLPs are universally consistent when the number of neurons grows to $+\infty$ slower than $O\!\left(\frac{n}{\log n}\right)$ ⇒ the number of neurons controls the richness of the MLP and has to increase with n.
  • 75–77. Empirical error minimization. Given i.i.d. observations of (X, Y), $(x_i, y_i)_i$, how to choose the weights w? Standard approach: minimize the empirical $L_2$ risk $L_n(w) = \sum_{i=1}^{n} \left[f_w(x_i) - y_i\right]^2$, with $y_i \in \mathbb{R}$ for the regression case, and $y_i \in \{0, 1\}$ for the classification case with the associated decision rule $x \mapsto \mathbb{1}_{\{\Phi_w(x) \geq 1/2\}}$. But: $L_n(w)$ is not convex in w ⇒ general optimization problem.
  • 78–79. Optimization with gradient descent. Method: initialize (randomly or with some prior knowledge) the weights $w(0) \in \mathbb{R}^{Qp + 2Q + 1}$. Batch approach: for $t = 1, \ldots, T$, $w(t+1) = w(t) - \mu(t)\, \nabla_w L_n(w(t))$. Online (or stochastic) approach: write $L_n(w) = \sum_{i=1}^{n} \underbrace{\left[f_w(x_i) - y_i\right]^2}_{=E_i}$ and, for $t = 1, \ldots, T$, randomly pick $i \in \{1, \ldots, n\}$ and update $w(t+1) = w(t) - \mu(t)\, \nabla_w E_i(w(t))$.
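A minimal R sketch contrasting the batch and the stochastic updates above; to keep it short, the model is a simple linear least-squares problem rather than an MLP, so the gradients are available in closed form.

```r
set.seed(3)
X <- cbind(1, runif(100))
yv <- X %*% c(1, 2) + rnorm(100, sd = 0.1)
grad_all <- function(w) as.numeric(2 * t(X) %*% (X %*% w - yv))           # full gradient
grad_one <- function(w, i) 2 * X[i, ] * as.numeric(X[i, ] %*% w - yv[i])  # gradient of E_i
w_batch <- w_sgd <- c(0, 0); mu <- 0.001
for (t in 1:2000) {
  w_batch <- w_batch - mu * grad_all(w_batch)      # batch update
  i <- sample(100, 1)                              # online: pick one observation at random
  w_sgd   <- w_sgd - mu * grad_one(w_sgd, i)       # stochastic update
}
rbind(batch = w_batch, sgd = w_sgd)                # both move toward the true (1, 2)
```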
  • 80. Discussion about practical choices for this approach: the batch version converges (from an optimization point of view) to a local minimum of the error for a good choice of µ(t), but convergence can be slow; the stochastic version is usually inefficient but is useful for large datasets (n large); more efficient algorithms exist to solve the optimization task: the one implemented in the R package nnet uses approximate second-order information (the BFGS algorithm), and this is a very active area of research in which many improvements have been made recently; in all cases, the solutions returned are, at best, local minima which strongly depend on the initialization: using more than one initialization state is advised, as well as checking the variability among a few runs.
  • 81–82. Gradient backpropagation method [Rumelhart and Mc Clelland, 1986]: the gradient backpropagation rétropropagation du gradient principle is used to easily compute gradients in perceptrons (or in other types of neural networks). This way, stochastic gradient descent alternates: a forward step, which aims at computing the outputs for all observations $x_i$ given a value of the weights w; and a backward step, in which the gradient backpropagation principle is used to obtain the gradient of the error at the current weights w.
  • 83–89. Backpropagation in practice: initialize the weights. Forward step: for all k, compute $a^{(1)}_k = x_i^\top w^{(1)}_k + w^{(0)}_k$, then $z^{(1)}_k = h_k\!\left(a^{(1)}_k\right)$, then $a^{(2)} = \sum_{k=1}^{Q} w^{(2)}_k z^{(1)}_k + w^{(2)}_0$ (so that $\Phi_w(x) = a^{(2)}$, up to the output activation $h_0$). Backward step: compute $\frac{\partial E_i}{\partial w^{(2)}_k} = \delta^{(2)} \times z_k$ with $\delta^{(2)} = \frac{\partial E_i}{\partial a^{(2)}} = \frac{\partial \left[h_0(a^{(2)}) - y_i\right]^2}{\partial a^{(2)}} = 2\, h_0'(a^{(2)}) \left[h_0(a^{(2)}) - y_i\right]$; then $\frac{\partial E_i}{\partial w^{(1)}_{kj}} = \delta^{(1)}_k \times x_i^{(j)}$ with $\delta^{(1)}_k = \frac{\partial E_i}{\partial a^{(1)}_k} = \frac{\partial E_i}{\partial a^{(2)}} \times \frac{\partial a^{(2)}}{\partial a^{(1)}_k} = \delta^{(2)} \times w^{(2)}_k h_k'\!\left(a^{(1)}_k\right)$; and $\frac{\partial E_i}{\partial w^{(0)}_k} = \delta^{(1)}_k$.
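A minimal R sketch of the forward and backward steps above for a one-hidden-layer perceptron with logistic hidden units and an identity output (so h0'(a) = 1), for a single observation (xi, yi); W1 is the Q x p matrix of first-layer weights, w0_1 the hidden biases, and w2, w2_0 the second-layer weights and bias.

```r
logistic <- function(z) 1 / (1 + exp(-z))
backprop_one <- function(xi, yi, W1, w0_1, w2, w2_0) {
  # forward step
  a1 <- as.numeric(W1 %*% xi + w0_1)      # a_k^(1) = x' w_k^(1) + w_k^(0)
  z1 <- logistic(a1)                      # z_k^(1) = h_k(a_k^(1))
  a2 <- sum(w2 * z1) + w2_0               # a^(2); here Phi_w(x) = a^(2)
  # backward step for E_i = (a^(2) - y_i)^2
  delta2 <- 2 * (a2 - yi)                 # dE_i / da^(2)
  delta1 <- delta2 * w2 * z1 * (1 - z1)   # dE_i / da_k^(1), since logistic'(a) = z(1 - z)
  list(dW1   = outer(delta1, xi),         # dE_i / dw_kj^(1) = delta_k^(1) x^(j)
       dw0_1 = delta1,                    # dE_i / dw_k^(0)  = delta_k^(1)
       dw2   = delta2 * z1,               # dE_i / dw_k^(2)  = delta^(2) z_k^(1)
       dw2_0 = delta2)                    # dE_i / dw_0^(2)  = delta^(2)
}
```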
  • 90–92. Initialization and stopping of the training algorithm. 1. How to initialize the weights? Standard choices: $w^{(1)}_{jk} \sim \mathcal{N}(0, 1/\sqrt{p})$ and $w^{(2)}_k \sim \mathcal{N}(0, 1/\sqrt{Q})$. In the R package nnet, weights are sampled uniformly in $[-0.5, 0.5]$, or in $\left[-\frac{1}{\max_i X_i^{(j)}}, \frac{1}{\max_i X_i^{(j)}}\right]$ if $X^{(j)}$ is large. 2. When to stop the algorithm (gradient descent or alike)? Standard choices: a bound T on the number of iterations; a target value of the error $\hat{L}_n(w)$; a target value of the decrease $\hat{L}_n(w(t)) - \hat{L}_n(w(t+1))$. In the R package nnet, a combination of the three criteria is used and is tunable.
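A minimal sketch of how these choices map to arguments of nnet (argument names as documented in the package: rang controls the range of the uniform initial weights, maxit bounds the number of iterations, abstol and reltol stop on a target error value and on a too-small error decrease):

```r
library(nnet)
fit <- nnet(y ~ x, data = obs, size = 5, linout = TRUE,
            rang = 0.5,                    # initial weights sampled in [-0.5, 0.5]
            maxit = 500,                   # bounded number of iterations
            abstol = 1e-4, reltol = 1e-8,  # stopping on error value / error decrease
            trace = FALSE)
```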
  • 93–96. Strategies to avoid overfitting: properly tune Q with a CV or a bootstrap estimation of the generalization ability of the method; early stopping: for Q large enough, use a part of the data as a validation set and stop the training (gradient descent) when the empirical error computed on this dataset starts to increase; weight decay: for Q large enough, penalize the empirical risk with a function of the weights, e.g., $\hat{L}_n(w) + \lambda \|w\|$; noise injection: modify the input data with a random noise during the training.
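A minimal R sketch combining two of these strategies: the hyper-parameters Q (size) and the weight-decay penalty lambda (the decay argument of nnet) are tuned with the cross validation error from the earlier sketch.

```r
library(nnet)
cv_nnet <- function(data, Q, lambda, K = 5) {
  folds <- sample(rep(1:K, length.out = nrow(data)))
  mean(sapply(1:K, function(l) {
    fit <- nnet(y ~ x, data = data[folds != l, ], size = Q, decay = lambda,
                linout = TRUE, maxit = 1000, trace = FALSE)
    mean((data$y[folds == l] - predict(fit, data[folds == l, ]))^2)
  }))
}
grid <- expand.grid(Q = c(2, 5, 10), lambda = c(0, 0.001, 0.01))
grid$cv <- mapply(function(Q, l) cv_nnet(obs, Q, l), grid$Q, grid$lambda)
grid[which.min(grid$cv), ]   # selected (Q, lambda)
```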
  • 97. Outline 1 Statistical / machine / automatic learning Background and notations Underļ¬tting / Overļ¬tting Consistency Avoiding overļ¬tting 2 (non deep) Neural networks Presentation of multi-layer perceptrons Theoretical properties of perceptrons Learning perceptrons Learning in practice 3 Deep neural networks Overview CNN Nathalie Vialaneix & Hyphen | Introduction to neural networks 33/41
  • 98. What is "deep learning"? Learning with neural networks, except that... the number of neurons is huge!!! [Diagram: a 2-hidden-layer MLP with INPUTS $x = (x^{(1)}, \ldots, x^{(p)})$, layer 1, layer 2 and OUTPUT y; image taken from https://towardsdatascience.com/]
  • 99–101. Main types of deep NN: deep MLP; CNN (Convolutional Neural Network), used in image processing [images taken from Krizhevsky et al., 2012, Zeiler and Fergus, 2014]; RNN (Recurrent Neural Network), used for data with a temporal sequence (in natural language processing for instance).
  • 102. Why such a big buzz around deep learning?... ImageNet results http://www.image-net.org/challenges/LSVRC/ Image by courtesy of Stéphane Canu. See also: http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
  • 103–105. Improvements in learning parameters: optimization algorithm: gradient descent is used, but the choice of the descent step (µ(t)) has been improved [Ruder, 2016] and mini-batches are also often used to speed up learning (e.g., ADAM [Kingma and Ba, 2017]); overfitting is avoided with dropout: randomly removing a fraction p of the neurons (and their connections) during each step of training (a larger number of iterations is needed to converge, but each iteration is faster) [Srivastava et al., 2014]. [Diagram: the 2-hidden-layer MLP with some neurons dropped out.]
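A minimal sketch of dropout with the keras R package listed in the next slide (it assumes keras and a TensorFlow backend are installed; x_train and y_train are placeholder objects, a numeric predictor matrix and a numeric response, not defined here):

```r
library(keras)
model <- keras_model_sequential() %>%
  layer_dense(units = 128, activation = "relu", input_shape = ncol(x_train)) %>%
  layer_dropout(rate = 0.5) %>%            # randomly drop 50% of the units at each step
  layer_dense(units = 1)                   # single output for regression
model %>% compile(optimizer = optimizer_adam(), loss = "mse")
# model %>% fit(x_train, y_train, epochs = 20, batch_size = 32, validation_split = 0.2)
```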
  • 106. Main available implementations:
    Theano: 2008–2017; main dev: Montreal University (Y. Bengio); language: Python.
    TensorFlow: 2015–...; main dev: Google Brain; languages: C++/Python.
    Caffe: 2013–...; main dev: Berkeley University; languages: C++/Python/Matlab.
    PyTorch (and autograd): 2016–...; community driven (Facebook, Google, ...); language: Python (with strong GPU accelerations).
    Keras: 2017–...; main dev: François Chollet / RStudio; high-level API on top of TensorFlow (and CNTK, Theano, ...): Python, R.
  • 107. General architecture of CNN Image taken from the article https://doi.org/10.3389/fpls.2017.02235 Nathalie Vialaneix & Hyphen | Introduction to neural networks 39/41
  • 108–112. Convolutional layer: [series of figures illustrating the convolution operation; images taken from https://towardsdatascience.com]
  • 113. Pooling layer. Basic principle: select a subsample of the previous layer values. Generally used: max pooling. [Image taken from Wikimedia Commons, by Aphex34]
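A minimal sketch of the convolution + pooling pattern with the keras R package (all sizes are illustrative assumptions: 28 x 28 grayscale images and 10 classes):

```r
library(keras)
cnn <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%    # convolutional layer
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%    # max pooling layer
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")  # one output unit per class
cnn %>% compile(optimizer = "adam", loss = "categorical_crossentropy", metrics = "accuracy")
```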
  • 114–116. References
    Barron, A. (1994). Approximation and estimation bounds for artificial neural networks. Machine Learning, 14:115–133.
    Bishop, C. (1995). Neural Networks for Pattern Recognition. Oxford University Press, New York, USA.
    Devroye, L., Györfi, L., and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York, NY, USA.
    Farago, A. and Lugosi, G. (1993). Strong universal consistency of neural network classifiers. IEEE Transactions on Information Theory, 39(4):1146–1151.
    Györfi, L., Kohler, M., Krzyżak, A., and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag, New York, NY, USA.
    Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257.
    Hornik, K. (1993). Some new results on neural network approximation. Neural Networks, 6(8):1069–1072.
    Kingma, D. P. and Ba, J. (2017). Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
    Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Pereira, F., Burges, C., Bottou, L., and Weinberger, K., editors, Proceedings of Neural Information Processing Systems (NIPS 2012), volume 25.
    Mc Culloch, W. and Pitts, W. (1943). A logical calculus of ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(4):115–133.
    McCaffrey, D. and Gallant, A. (1994). Convergence rates for single hidden layer feedforward networks. Neural Networks, 7(1):115–133.
    Pinkus, A. (1999). Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143–195.
    Ripley, B. (1996). Pattern Recognition and Neural Networks. Cambridge University Press.
    Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–408.
    Rosenblatt, F. (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington, DC, USA.
    Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
    Rumelhart, D. and Mc Clelland, J. (1986). Parallel Distributed Processing: Exploration in the MicroStructure of Cognition. MIT Press, Cambridge, MA, USA.
    Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958.
    Stinchcombe, M. (1999). Neural network approximation of continuous functionals and continuous functions on compactifications. Neural Networks, 12(3):467–477.
    White, H. (1990). Connectionist nonparametric regression: multilayer feedforward networks can learn arbitrary mappings. Neural Networks, 3:535–549.
    White, H. (1991). Nonparametric estimation of conditional quantiles using neural networks. In Proceedings of the 23rd Symposium of the Interface: Computing Science and Statistics, pages 190–199, Alexandria, VA, USA. American Statistical Association.
    Zeiler, M. D. and Fergus, R. (2014). Visualizing and understanding convolutional networks. In Proceedings of European Conference on Computer Vision (ECCV 2014).