DEEP BELIEF NETS
Hasan Hüseyin Topçu
Deep Learning
OUTLINE
•  Unsupervised Feature Learning
•  Deep vs. Shallow Architectures
•  Restricted Boltzmann Machines
•  Deep Belief Networks
•  Greedy Layer-wise Deep Training Algorithm
•  Conclusion
Unsupervised Feature Learning
•  Transformation of "raw" inputs to a useful representation
•  Most of the available data is unlabeled, so we need an unsupervised way of learning
•  DBNs are graphical models which learn to extract a deep hierarchical representation of the training data.
Deep vs. Shallow Architectures
•  Perceptron, multilayer NNs (which cannot exploit unlabeled data), SVMs, …
•  Shallow architectures contain a fixed feature layer (or basis function) and a weight-combination layer
•  Deep architectures are compositions of many layers of adaptive non-linear components (DBNs, CNNs, …)
Restricted Boltzmann Machines
•  The main building block of a DBN is a bipartite undirected graphical model called the Restricted Boltzmann Machine (RBM).
•  More technically, a Restricted Boltzmann Machine is a stochastic neural network (neural network meaning we have neuron-like units whose binary activations depend on the neighbors they're connected to; stochastic meaning these activations have a probabilistic element) consisting of a layer of visible units and a layer of hidden units.
•  Restriction? To make learning easier, we restrict the network so that no visible unit is connected to any other visible unit and no hidden unit is connected to any other hidden unit.
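As a reference (standard RBM formulation, not from the original slide): with binary visible units v, hidden units h, weights W and biases b, c, an RBM defines the energy and joint distribution

  E(v, h) = -b^T v - c^T h - v^T W h,        P(v, h) = e^{-E(v, h)} / Z

and, thanks to the bipartite restriction, the conditionals factorize into independent sigmoid units:

  P(h_j = 1 | v) = \sigma(c_j + \sum_i v_i W_{ij}),        P(v_i = 1 | h) = \sigma(b_i + \sum_j W_{ij} h_j)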
Deep Belief Networks
•  DBNs can be viewed as a composition of simple, unsupervised networks, i.e. RBMs + sigmoid belief networks
•  The greatest advantage of DBNs is their capability of "learning features", which is achieved by a 'layer-by-layer' learning strategy where the higher-level features are learned from the previous layers
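As a reference (standard DBN formulation, not on the original slide), a DBN with hidden layers h^(1), …, h^(ℓ) models the joint distribution

  P(x, h^(1), …, h^(ℓ)) = P(h^(ℓ-1), h^(ℓ)) \prod_{k=0}^{ℓ-2} P(h^(k) | h^(k+1)),   with h^(0) = x,

where the top two layers P(h^(ℓ-1), h^(ℓ)) form an RBM and each P(h^(k) | h^(k+1)) is a directed sigmoid belief network layer.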
Greedy Layer-wise Deep Training
•  Idea: DBNs can be formed by “stacking” RBMs
•  Each layer is trained as a Restricted Boltzmann Machine.
•  Train layers sequentially starting from bottom (observed data) layer. (Greedy
layer-wise)
•  Each layer learns a higher-level representation of the layer below. The
training criterion does not depend on the labels. (Unsupervised)
Greedy Layer-wise Deep Training
•  The principle of greedy layer-wise unsupervised training can be applied to DBNs with RBMs as the building blocks for each layer [Hinton06], [Bengio07] (a code sketch follows these steps):
•  1. Train the first layer as an RBM that models the raw input x = h^(0) as its visible layer.
•  2. Use that first layer to obtain a representation of the input that will be used as data for the second layer. Two common solutions exist: this representation can be chosen as the mean activations p(h^(1) = 1 | h^(0)) or as samples of p(h^(1) | h^(0)).
•  3. Train the second layer as an RBM, taking the transformed data (samples or mean
activations) as training examples (for the visible layer of that RBM).
•  4. Iterate (2 and 3) for the desired number of layers, each time propagating upward either
samples or mean values.
•  5. Fine-tune all the parameters of this deep architecture with respect to a proxy for the DBN log-likelihood, or with respect to a supervised training criterion (after adding extra learning machinery to convert the learned representation into supervised predictions, e.g. a linear classifier).
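A minimal sketch of steps 1-4, assuming scikit-learn's BernoulliRBM as the per-layer building block (layer sizes and hyperparameters are illustrative, not from the slides):

# Greedy layer-wise pre-training: each layer is an RBM trained on the
# representation produced by the layer below (here: the mean activations).
from sklearn.neural_network import BernoulliRBM

def pretrain_dbn(X, layer_sizes=(256, 128, 64), n_iter=10, lr=0.05):
    """Greedily train a stack of RBMs on data X (values scaled to [0, 1])."""
    rbms = []
    data = X                                    # step 1: h^(0) = raw input x
    for n_hidden in layer_sizes:
        rbm = BernoulliRBM(n_components=n_hidden, learning_rate=lr, n_iter=n_iter)
        rbm.fit(data)                           # steps 1/3: train this layer as an RBM
        data = rbm.transform(data)              # step 2: mean activations p(h^(k+1) = 1 | h^(k))
        rbms.append(rbm)                        # step 4: iterate upward
    return rbms

Here the mean activations are propagated upward; sampling from p(h^(k+1) | h^(k)) is the alternative mentioned in step 2. Step 5 (fine-tuning) corresponds to the DBNs Training slide below.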
Greedy Layer-wise Deep Training (figure slides)
DBNs Training
After layer-wise unsupervised pre-training, good initializations of the network weights are obtained.
Fine-tune the whole network (e.g. by backpropagation / wake-sleep) w.r.t. a supervised criterion.
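A hedged sketch of this fine-tuning step, assuming PyTorch and the rbms list from the pre-training sketch above (neither is from the original slides): the pre-trained RBM weights initialize a feed-forward network, a classifier layer is added on top, and the whole stack is trained with backpropagation.

import torch
import torch.nn as nn

def build_finetune_net(rbms, n_classes=10):
    """Turn a stack of trained BernoulliRBMs into an initialized MLP classifier."""
    layers = []
    for rbm in rbms:
        n_hidden, n_visible = rbm.components_.shape    # components_ is (n_hidden, n_visible)
        linear = nn.Linear(n_visible, n_hidden)
        with torch.no_grad():                          # copy pre-trained weights and hidden biases
            linear.weight.copy_(torch.tensor(rbm.components_, dtype=torch.float32))
            linear.bias.copy_(torch.tensor(rbm.intercept_hidden_, dtype=torch.float32))
        layers += [linear, nn.Sigmoid()]
    layers.append(nn.Linear(n_hidden, n_classes))      # the extra supervised "machinery"
    return nn.Sequential(*layers)

# net = build_finetune_net(rbms)
# Train net with nn.CrossEntropyLoss() and torch.optim.SGD/Adam as usual.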
Conclusion
•  Deep learning represents more intelligent behavior (learning its own features) compared with traditional machine learning methods.
•  A central idea, referred to as greedy layer-wise unsupervised pre-training, was to learn a hierarchy of features one level at a time, using unsupervised feature learning to learn a new transformation at each level to be composed with the previously learned transformations; essentially, each iteration of unsupervised feature learning adds one layer of weights to a deep neural network. Finally, the set of layers can be combined to initialize a deep supervised predictor, such as a neural network classifier, or a deep generative model.
Project
Apply a DBN to the MNIST digit dataset to classify handwritten digits.
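A hedged starting point (my own sketch, not the author's code), using a single BernoulliRBM feature extractor plus logistic regression from scikit-learn; a full DBN would stack several RBMs, as in the pre-training sketch earlier.

# Single-RBM feature extractor + logistic regression on MNIST.
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixels to [0, 1] for the Bernoulli units
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10000, random_state=0)

model = Pipeline([
    ("rbm", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))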
References
•  Dandan Mo. A Survey on Deep Learning: One Small Step Toward AI. 2012.
•  Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18(7):1527–1554, 2006.
•  Yoshua Bengio. Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
Q & A
