• Artificial Neural Networks
• Deep Neural Networks and Deep Learning
• Autoencoders and Sparsity
• Convolutional Networks
artificial neural networks
• The central idea is to extract linear
combinations of the inputs as derived features
and then model the target as a nonlinear
function of these features
• A feedforward neural network of depth n is an n-stage
regression or classification model.
The outputs of layer l are called activations. They are
computed by applying an activation function to a linear
combination of the layer's inputs and the bias unit.
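The slide's formula did not survive extraction. A common way to write it is a(l+1) = f(W(l) a(l) + b(l)); the minimal sketch below assumes that form with a sigmoid f. The weights, biases and inputs are made-up illustration values, not taken from the slides.

```python
import math

def sigmoid(z):
    """Logistic function, assumed here as the activation f."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_activations(W, b, a_prev, f=sigmoid):
    """Compute a_j = f(sum_i W[j][i] * a_prev[i] + b[j]) for every unit j."""
    return [
        f(sum(w * a for w, a in zip(row, a_prev)) + b_j)
        for row, b_j in zip(W, b)
    ]

# Hypothetical layer with two inputs and two units.
W = [[0.5, -0.5], [1.0, 1.0]]
b = [0.0, -1.0]
a = layer_activations(W, b, [1.0, 1.0])  # approximately [0.5, 0.731]
```

Stacking calls to `layer_activations`, each fed with the previous layer's output, gives the forward pass of the whole network.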
The soft-max activation function is used in the
last layer (the classifier) for K-class classification.
Two types of activation functions are used:
sigmoid and soft-max.
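The two activation functions can be sketched as follows. The standard definitions are assumed, since the slide's own formulas were lost. Subtracting the maximum inside `softmax` is a numerical-stability detail added here, not something the slides mention.

```python
import math

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(z):
    """Turn K real-valued scores into K probabilities that sum to 1."""
    m = max(z)  # shifting by the maximum avoids overflow and leaves the result unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])  # largest score gets the largest probability
```

The sigmoid acts on each hidden unit independently, while soft-max couples all K outputs so that they can be read as class probabilities.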
When training feedforward networks we
use an average sum-of-squared errors as
an error function
To prevent overfitting, we add a regularization term
to the error function.
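A minimal sketch of such an error function follows. It assumes the regularization is an L2 weight-decay term. The factor 1/2 and the name `lam` for the regularization strength are conventions chosen here, not taken from the slides.

```python
def regularized_error(predictions, targets, weights, lam):
    """Average sum-of-squared errors plus an assumed L2 weight-decay term.

    predictions, targets: lists of output vectors, one per training example.
    weights: list of weight matrices (lists of rows); biases are left out.
    lam: hypothetical regularization strength.
    """
    m = len(predictions)
    data_term = sum(
        0.5 * sum((p - t) ** 2 for p, t in zip(pred, targ))
        for pred, targ in zip(predictions, targets)
    ) / m
    decay_term = 0.5 * lam * sum(
        w ** 2 for matrix in weights for row in matrix for w in row
    )
    return data_term + decay_term

# Two examples with one output each, and a single 1x2 weight matrix.
error = regularized_error([[1.0], [0.0]], [[0.0], [0.0]], [[[1.0, 2.0]]], lam=0.1)
```

With `lam=0` the function reduces to the plain average sum-of-squared errors. Larger values push the weights toward zero.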
deep neural networks and deep learning
• Deep vanilla neural networks perform worse
than neural networks with one or two hidden
layers.
• In theory, deep neural networks have at least
the same expressive power as shallow neural
networks, but in practice they often get stuck in
local optima during the training phase.
• It is important to use a non-linear activation
function f(x) in each hidden layer
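One way to see why the non-linearity matters: without it, two stacked layers collapse into a single linear map, so depth adds no expressive power. The small check below illustrates this with made-up weights. Biases are omitted for brevity.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

W1 = [[1.0, 2.0], [0.0, 1.0]]  # hypothetical first-layer weights
W2 = [[3.0, -1.0]]             # hypothetical second-layer weights
x = [2.0, 5.0]

two_linear_layers = matvec(W2, matvec(W1, x))   # layer 2 applied to layer 1
one_linear_layer = matvec(matmul(W2, W1), x)    # single layer with weights W2 * W1
```

Both results are identical, so the two-layer linear network is no more expressive than a one-layer one. Inserting a non-linear f(x) between the layers breaks this equivalence.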
autoencoders and sparsity
• An autoencoder is a neural network that is
trained to encode an input x into some
representation c(x) so that the input can be
reconstructed from that representation
After successful training, it should decompose the
inputs into a combination of hidden layer activations.
In this way, the trained autoencoder has learned
features of the input.
We can measure the average activation of each neuron in
the second layer over the training set, and add a penalty
to the error function that prevents these averages from
straying too far from a desired mean activation ρ (the
sparsity parameter).
* Kullback-Leibler divergence
The resulting autoencoder is called a sparse autoencoder.
β is called the sparsity constraint and controls
the weight of the sparsity penalty.
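The sparsity penalty can be sketched as below. The sketch assumes the usual form: the Kullback-Leibler divergence between two Bernoulli distributions, summed over the hidden units and scaled by the sparsity weight. In the code, `rho` stands for the desired mean activation and `beta` for the weight of the penalty. The activation values are invented for illustration.

```python
import math

def average_activations(hidden_activations):
    """Mean activation of each hidden unit over all training examples."""
    m = len(hidden_activations)
    n_units = len(hidden_activations[0])
    return [sum(a[j] for a in hidden_activations) / m for j in range(n_units)]

def kl_divergence(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def sparsity_penalty(hidden_activations, rho, beta):
    """Term added to the error function: beta * sum_j KL(rho || rho_hat_j)."""
    rho_hats = average_activations(hidden_activations)
    return beta * sum(kl_divergence(rho, r) for r in rho_hats)

# Two training examples, two hidden units (made-up activations in (0, 1)).
activations = [[0.1, 0.2], [0.1, 0.4]]
penalty = sparsity_penalty(activations, rho=0.1, beta=3.0)
```

The first unit already has mean activation 0.1 and contributes nothing. The second unit averages about 0.3 and is penalized for being too active.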
convolutional networks
• Perform better than vanilla neural networks.
• Inspired by the structure of the human visual
system; they work by exploiting local connections
through two operations (convolution and
sub-sampling / pooling).
• Organized in layers of two types:
• Convolution, Sub-sampling
• Biologically inspired operation that reduces the
dimensionality of the input.
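Assuming the operation described here is sub-sampling (pooling), one common variant is max pooling over non-overlapping windows. The sketch below shows that variant. The slides do not say which pooling function is used, so the choice of max and the 2x2 window are assumptions.

```python
def max_pool(matrix, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            max(matrix[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]

feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool(feature_map)  # 4x4 input becomes a 2x2 output
```

Each output cell summarizes four input cells, so the number of values passed to the next layer drops by a factor of four.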
A single cell of the output matrix is calculated by summing
the elementwise products of the kernel with the patch of
the input matrix I that it covers.
In actual implementation P
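The computation of one output cell can be sketched as follows, with `K` as an assumed name for the kernel and `I` for the input matrix. The sketch does not flip the kernel, which strictly speaking makes it a cross-correlation, as is common in convolutional network implementations. It uses no padding and a stride of 1, both assumptions.

```python
def convolve_valid(I, K):
    """Slide kernel K over input I; each output cell is the sum of
    elementwise products between K and the patch of I it covers."""
    kh, kw = len(K), len(K[0])
    out_h = len(I) - kh + 1
    out_w = len(I[0]) - kw + 1
    return [
        [
            sum(K[u][v] * I[r + u][c + v] for u in range(kh) for v in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

I = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
K = [[1, 0], [0, 1]]  # hypothetical 2x2 kernel
O = convolve_valid(I, K)  # 3x3 input and 2x2 kernel give a 2x2 output
```

The same small kernel is reused at every position, which is what keeps the number of trainable weights low.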
• 784-200-200-200-10 Deep network
• Greedy layerwise training
• Training protocol
• Training Parameters and Methods
greedy layerwise training
• To construct a deep pretrained network of n
layers, divide the learning into n stages.
• In the first stage, train an autoencoder on the
provided training data sans labels.
• Next, map the training data to the feature space.
• The mapped data is then used to train the
next-stage autoencoder.
• The training proceeds layer by layer until the last
layer.
• The last layer is trained as a classifier (not as an
autoencoder) using supervised learning.
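The unsupervised stages above can be sketched as follows. This is a toy illustration under several assumptions: a sigmoid autoencoder with one hidden layer, plain stochastic gradient descent on squared reconstruction error, and tiny invented data. The function names, learning rate and epoch count are hypothetical. The final supervised classifier stage and fine-tuning are left out.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(params, x):
    """Map an input vector to hidden-layer activations."""
    W, b = params
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]

def train_autoencoder(data, n_hidden, epochs=200, lr=0.5, seed=0):
    """Train encoder (W1, b1) and decoder (W2, b2); return only the encoder."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    b2 = [0.0] * n_in
    for _ in range(epochs):
        for x in data:
            h = encode((W1, b1), x)
            y = encode((W2, b2), h)  # reconstruction of x
            # Backpropagate the squared reconstruction error (labels are never used).
            d2 = [(yk - xk) * yk * (1 - yk) for yk, xk in zip(y, x)]
            d1 = [hj * (1 - hj) * sum(d2[k] * W2[k][j] for k in range(n_in))
                  for j, hj in enumerate(h)]
            for k in range(n_in):
                for j in range(n_hidden):
                    W2[k][j] -= lr * d2[k] * h[j]
                b2[k] -= lr * d2[k]
            for j in range(n_hidden):
                for i in range(n_in):
                    W1[j][i] -= lr * d1[j] * x[i]
                b1[j] -= lr * d1[j]
    return W1, b1

def greedy_pretrain(data, hidden_sizes):
    """Stage k trains an autoencoder on the features produced by stage k-1."""
    encoders, features = [], data
    for stage, n_hidden in enumerate(hidden_sizes):
        params = train_autoencoder(features, n_hidden, seed=stage)
        encoders.append(params)
        features = [encode(params, x) for x in features]  # map data to feature space
    return encoders, features

# Invented unlabeled data: four 4-dimensional inputs, two pretraining stages.
data = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
encoders, features = greedy_pretrain(data, hidden_sizes=[3, 2])
```

The returned `features` are what the last, supervised stage would be trained on, and the list of `encoders` supplies the initial weights of the stacked network.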
After training the last stage, the networks n1 through n4
are stacked to form a deep neural network. Use the full
training set to train the deep neural network – this final
step is called fine-tuning.
Fine-tuning modifies the weights W(1) as well, so that adjustments
can be made to the features learned in the first layer.
• Instead of training the network on the full
image we can exploit local connectivity via
convolutional networks, and additionally
restrict the number of trainable parameters
with the use of pooling.
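A rough count illustrates the savings. The 784 inputs (a 28x28 image) and the 200 hidden units match the 784-200-200-200-10 network of this deck. The 8 kernels of size 5x5 and the 2x2 pooling window are hypothetical choices made only for this comparison.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv_params(n_kernels, k):
    """Each k x k kernel is shared across all positions and has one bias."""
    return n_kernels * (k * k + 1)

side, k, n_kernels, pool = 28, 5, 8, 2

fully_connected = dense_params(side * side, 200)   # 157000 trainable parameters
convolutional = conv_params(n_kernels, k)          # 208 trainable parameters

conv_side = side - k + 1                           # 24: feature map side, no padding
pooled_side = conv_side // pool                    # 12: side after 2x2 pooling
inputs_without_pooling = n_kernels * conv_side ** 2    # 4608 values for the next layer
inputs_with_pooling = n_kernels * pooled_side ** 2     # 1152 values for the next layer
```

Pooling itself has no trainable parameters. It shrinks the feature maps, so any layer stacked on top needs fewer weights.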
Activation of the hidden unit i
difference of cnns and autoencoders
• The main difference between an autoencoder and a
convolutional network is the level of network hardwiring.
Convolutional nets are largely hardwired. The convolution
operation is local in the image domain, which means far
sparser connections when viewed as a neural network. The
pooling (subsampling) operation in the image domain is
likewise a hardwired set of connections in the neural
domain. Both act as topological constraints on the network
structure. Given these constraints, training a CNN learns
the best weights for the convolution operation (in practice
there are multiple filters). CNNs are usually used for image
and speech tasks, where the convolutional constraints are a
good fit.
• In contrast, autoencoders specify almost
nothing about the topology of the network.
They are much more general. The idea is to
find a good neural transformation to reconstruct
the input. They are composed of an encoder
(projects the input to the hidden layer) and a
decoder (reprojects the hidden layer to the output).
The hidden layer learns a set of latent features
or latent factors. Linear autoencoders span the
same subspace as PCA. Given a dataset,
they learn a number of basis vectors that explain
the underlying pattern of the data.