1. DEEP LEARNING IN NEURAL NETWORKS
Tanushri Sarma CSI14007
Roshan Chettri CSI14029
2. Deep Learning: Definition
Deep learning is a set of algorithms in machine learning that attempt to
learn in multiple levels, corresponding to different levels of abstraction.
It typically uses artificial neural networks. The levels in these learned
statistical models correspond to distinct levels of concepts, where
higher-level concepts are defined from lower-level ones, and the same
lower-level concepts can help to define many higher-level concepts.
3. Deep Learning Overview
• Train networks with many layers (vs. shallow nets with just a couple
of layers)
• Multiple layers work to build an improved feature space
– First layer learns 1st-order features (e.g. edges…)
– 2nd layer learns higher-order features (combinations of first-layer
features, combinations of edges, etc.)
– In current models layers often learn in an unsupervised mode
and discover general features of the input space – serving
multiple tasks related to the unsupervised instances (image
recognition, etc.)
– Then final layer features are fed into supervised layer(s)
• The entire network is often subsequently fine-tuned with
supervised training, starting from the weights learned in the
unsupervised phase (a minimal sketch of this pipeline follows below)
– Could also do fully supervised versions, etc.
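A minimal sketch of this pipeline, assuming PyTorch is available; the layer sizes, iteration counts and random data below are illustrative placeholders, not values from the slides. Each layer is first pretrained unsupervised as a small autoencoder, then the stacked layers plus a supervised output layer are fine-tuned on labels.

import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(256, 784)             # unlabelled inputs (e.g. flattened images)
y = torch.randint(0, 10, (256,))     # labels, used only in the fine-tuning phase

sizes = [784, 256, 64]               # input -> 1st-layer features -> 2nd-layer features
layers, inputs = [], X
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    # Unsupervised phase: pretrain each layer as a small autoencoder
    # on the output of the layer below it.
    enc, dec = nn.Linear(d_in, d_out), nn.Linear(d_out, d_in)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(100):
        recon = dec(torch.sigmoid(enc(inputs)))
        loss = nn.functional.mse_loss(recon, inputs)
        opt.zero_grad(); loss.backward(); opt.step()
    layers += [enc, nn.Sigmoid()]
    inputs = torch.sigmoid(enc(inputs)).detach()

# Supervised phase: stack the pretrained layers, add an output layer,
# and fine-tune the whole network starting from the pretrained weights.
net = nn.Sequential(*layers, nn.Linear(sizes[-1], 10))
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
for _ in range(100):
    loss = nn.functional.cross_entropy(net(X), y)
    opt.zero_grad(); loss.backward(); opt.step()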
4. A Three-Way Categorization
1. Deep networks for unsupervised or generative learning,
which are intended to capture high-order correlation of the
observed or visible data for pattern analysis or synthesis
purposes when no information about target class labels is
available.
2. Deep networks for supervised learning, which are intended
to directly provide discriminative power for pattern classification
purposes, often by characterizing the posterior distributions
of classes conditioned on the visible data.
3. Hybrid deep networks, where the goal is discrimination which
is assisted, often in a significant way, with the outcomes of
generative or unsupervised deep networks.
7. Deep Learning Architectures
• Deep Neural Networks
• Deep Belief Networks
• Convolutional Neural Networks
• Deep Boltzmann Machines
8. Deep Neural Networks
• A deep neural network (DNN) is an artificial neural network with
multiple hidden layers of units between the input and output
layers. Like shallow ANNs, DNNs can model complex non-linear
relationships. DNN architectures, e.g. for object detection and
parsing, generate compositional models where the object is
expressed as a layered composition of image primitives. The
extra layers enable composition of features from lower layers, giving
the potential of modelling complex data with fewer units than a
similarly performing shallow network.
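As a toy illustration (not from the text), a forward pass through a DNN with several hidden layers in numpy; the layer widths, random weights and input are placeholder assumptions, and a real network would learn its weights by training.

import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

x = rng.random(784)                       # one flattened input (e.g. a 28x28 image)
layer_sizes = [784, 300, 100, 30, 10]     # input, three hidden layers, output

a = x
for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    W = rng.normal(0, 0.01, size=(d_out, d_in))
    b = np.zeros(d_out)
    a = relu(W @ a + b)                   # each layer re-combines the features below it

print(a.shape)                            # (10,) - outputs built from composed features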
9. Deep Belief Networks
• Geoff Hinton (2006)
• Uses greedy layer-wise training, where each layer is an RBM
(Restricted Boltzmann Machine)
• An RBM is a constrained Boltzmann machine with
– No lateral connections within the hidden (h) or visible (x) layers
– Symmetric weights
– No annealing/temperature schedule, but that is all right since
each RBM is not seeking a global minimum, but rather an
incremental transformation of the feature space
– Typically uses probabilistic logistic (sigmoid) nodes, but other
activations are possible (a CD-1 update is sketched after this list)
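A minimal sketch of one contrastive-divergence (CD-1) update for a single RBM, assuming binary visible and hidden units; the sizes, learning rate and training vector below are illustrative placeholders, not values from the slides.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(0, 0.01, size=(n_visible, n_hidden))    # symmetric weights, no lateral links
b = np.zeros(n_visible)                                 # visible biases
c = np.zeros(n_hidden)                                  # hidden biases

v0 = rng.integers(0, 2, size=n_visible).astype(float)   # one binary training vector

# Positive phase: hidden probabilities and a sample given the data.
h0_prob = sigmoid(v0 @ W + c)
h0 = (rng.random(n_hidden) < h0_prob).astype(float)

# Negative phase: one reconstruction (Gibbs) step - no annealing schedule needed.
v1_prob = sigmoid(h0 @ W.T + b)
h1_prob = sigmoid(v1_prob @ W + c)

# Parameter update that nudges the model towards the training data.
W += lr * (np.outer(v0, h0_prob) - np.outer(v1_prob, h1_prob))
b += lr * (v0 - v1_prob)
c += lr * (h0_prob - h1_prob)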
10. Convolutional Neural Networks
• Convolution – each layer combines (merges, smooths) patches from previous layers
– Typically tries to compress large data (images) into a smaller set of
robust features, based on local variations
– Basic convolution can still create many features
• Pooling – this step compresses and smooths the data
– Makes the data invariant to small translational changes
– Usually takes the average or max value across disjoint patches
(a convolution/pooling sketch follows after this list)
• Often the convolution filters and pooling are hand-crafted – not learned – though
tuning can occur
• After this hand-crafted/non-trained/partially trained convolution, the new set of
features is used to train a supervised model
• Requires neighborhood regularities in the input space (e.g. images, the
stationarity property)
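A minimal sketch of one convolution filter followed by 2x2 max pooling in numpy; the toy 8x8 image and the hand-crafted edge filter are assumptions for illustration, not taken from the slides.

import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))                     # one small grayscale "image"
kernel = np.array([[1.0, 0.0, -1.0],           # a simple hand-crafted vertical-edge filter
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

# Convolution: slide the filter over the image and combine each local patch.
kh, kw = kernel.shape
out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
feature_map = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        feature_map[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)

# Pooling: take the max over disjoint 2x2 patches, compressing the feature map
# and making it less sensitive to small translations.
p = 2
ph, pw = feature_map.shape[0] // p, feature_map.shape[1] // p
pooled = feature_map[:ph*p, :pw*p].reshape(ph, p, pw, p).max(axis=(1, 3))

print(feature_map.shape, pooled.shape)         # (6, 6) -> (3, 3)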
12. Deep Boltzmann Machines
• A Deep Boltzmann Machine (DBM) is a type of binary
pairwise Markov random field (an undirected probabilistic graphical
model) with multiple layers of hidden random variables. It is a
network of symmetrically coupled stochastic binary units.
Like DBNs, DBMs benefit from the ability to learn complex and
abstract internal representations of the input in tasks such
as object or speech recognition, using a limited number of
labelled examples to fine-tune the representations built from a
large supply of unlabelled sensory input data.
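As a hedged illustration (not from the text), a joint configuration of a two-hidden-layer DBM can be scored by an energy over its symmetrically coupled binary units; lower energy means higher probability under the Boltzmann distribution. The sizes and random states below are placeholders, and bias terms are omitted for brevity.

import numpy as np

rng = np.random.default_rng(0)
n_v, n_h1, n_h2 = 6, 4, 3

W1 = rng.normal(0, 0.01, size=(n_v, n_h1))     # couplings: visible <-> first hidden layer
W2 = rng.normal(0, 0.01, size=(n_h1, n_h2))    # couplings: first <-> second hidden layer

v  = rng.integers(0, 2, size=n_v).astype(float)     # binary visible units
h1 = rng.integers(0, 2, size=n_h1).astype(float)    # binary hidden units, layer 1
h2 = rng.integers(0, 2, size=n_h2).astype(float)    # binary hidden units, layer 2

# Energy of this joint configuration: p(v, h1, h2) is proportional to exp(-E).
E = -(v @ W1 @ h1) - (h1 @ W2 @ h2)
print(E)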
13. Why GPU in Deep Learning?
• Progress in AI follows a cycle: IDEA → CODE → TRAIN → TEST (and back to IDEA)
14. Why GPU in Deep Learning?
• One important factor that determines our progress in AI is the
latency of going from an idea to a tested model, after which we
can go around the cycle again.
• We need to be able to express our ideas about models in code
quickly and run them quickly on hardware.
• Boiling our model down into something that can actually run could
take years. The latency would be really long.
15. Why GPU in Deep Learning?
• We need something that is programmable. We need to be able to
change our ideas about what our model should look like and just
compile and run them.
• This is one of the main advantages of GPUs, and it is why GPUs have
been so popular in the field of Deep Learning.
• Another thing that is really important is the time it takes to train. This
is where HPC and Parallel Computing come in.