Deep learning
  simon@hekovnik.si
Deep/nature

• Brains have a deep architecture
• Humans organize their ideas hierarchically,
  through composition of simpler ideas
Deep/math
• Insufficiently deep architectures can be
  exponentially inefficient
• Distributed (possibly sparse) representations
  are necessary to achieve non-local
  generalization
• Intermediate representations allow sharing
  statistical strength
Enablers
• CPU (GPU) power
• Algorithmic
 • pre-training/network stacking (RBMs,
    auto-encoders)
 • RBM + contrastive divergence
 • parallelization tricks
State of the art

• Unsupervised learning of feature detectors
  for faces and cats from videos
• Billions of units, >9 layers
• Better-than-human recognition of traffic
  signs
Applications
• Sequence prediction (time series, gene
  sequence, natural language modeling, ...)
• Machine vision
• Speech recognition
• Dimensionality reduction
• Classification ...
The good

• State of the art results in many fields
• Unsupervised, semi-supervised
• (at least somewhat) online learning and
  adaptation
• multi-task learning
• (close to) linear scalability
The bad


• Expensive to train
• Hard to inspect/visualize progress for
  non-visual tasks
The ugly

• Hyperparameter and topology selection
  still critical
• Dependence on tricks for practical results
  on real-life datasets
Deep belief networks
DBN key ideas: network stacking

• Greedy layer-wise learning
• Hidden units of level k as visible units of
  level k+1
• (Use backpropagation on whole stack)
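A hedged sketch of the stacking scheme above: each level is trained greedily and its hidden activations become the training data for the next level. `train_layer`, `layer_sizes`, and the returned `(model, transform)` pair are hypothetical placeholders (e.g. an RBM or auto-encoder trainer), not an API from the slides:

```python
def pretrain_stack(data, layer_sizes, train_layer):
    """Greedy layer-wise pretraining; fine-tuning (e.g. backprop over the
    whole stack) would follow separately."""
    stack, x = [], data
    for n_hidden in layer_sizes:
        model, transform = train_layer(x, n_hidden)  # train level k on x
        stack.append(model)
        x = transform(x)  # hidden units of level k act as visible units of level k+1
    return stack
```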
DBN key ideas:
Unsupervised greedy layer-wise learning
+
Supervised top layer
DBN key ideas: RBM
• generative stochastic neural network
• the network has an energy function and we
  are searching for thermal equilibrium
• binary stochastic units; the weights determine
  the units' activation (state-change) probabilities
• learning via contrastive divergence
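For reference, the standard energy function of a binary RBM alluded to above; the notation (visible units v, hidden units h, weights W, biases b and c, partition function Z) is my assumption, not from the slides:

```latex
E(v, h) = -\sum_i b_i v_i - \sum_j c_j h_j - \sum_{i,j} v_i W_{ij} h_j,
\qquad
P(v, h) = \frac{e^{-E(v, h)}}{Z}
```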
DBN key ideas: contrastive divergence

(1) v → h
(2) h → v′
(3) v′ → h′
(4) δw = v⊗h − v′⊗h′
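A minimal numpy sketch of the CD-1 steps above, assuming batched binary data; the function and variable names, shapes, and learning rate are illustrative, not from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_v, b_h, v, lr=0.1, rng=np.random.default_rng(0)):
    """One CD-1 step for a binary RBM: v -> h -> v' -> h', dW ~ v⊗h − v'⊗h'."""
    # (1) v -> h: sample hidden units given the data
    p_h = sigmoid(v @ W + b_h)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # (2) h -> v': reconstruct the visible units
    p_v = sigmoid(h @ W.T + b_v)
    v_rec = (rng.random(p_v.shape) < p_v).astype(float)
    # (3) v' -> h': hidden probabilities of the reconstruction
    p_h_rec = sigmoid(v_rec @ W + b_h)
    # (4) weight update: difference of outer products, averaged over the batch
    W += lr * (v.T @ p_h - v_rec.T @ p_h_rec) / len(v)
    b_v += lr * (v - v_rec).mean(axis=0)
    b_h += lr * (p_h - p_h_rec).mean(axis=0)
    return W, b_v, b_h
```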
DBN key ideas: auto-encoder
• Denoising auto-encoder
  (corrupt and reconstruct the input)
• Sparse coding
  (each item is encoded by strong activation
  of a small set of neurons)
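A small sketch of the denoising idea above, assuming some encoder/decoder pair is available; `encode`, `decode`, and the corruption rate are illustrative placeholders:

```python
import numpy as np

def denoising_loss(x, encode, decode, corruption=0.3, rng=np.random.default_rng(0)):
    """Corrupt the input, reconstruct it, and score against the *clean* input."""
    mask = (rng.random(x.shape) >= corruption).astype(x.dtype)
    x_corrupt = x * mask                 # randomly zero out a fraction of inputs
    x_recon = decode(encode(x_corrupt))  # reconstruct from the corrupted view
    return ((x_recon - x) ** 2).mean()   # squared error vs. the original input
```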
LSTM
LSTM
• RNN with explicit state
• Combination of BPTT and RTRL learning
• Online learning
• Can retain information over arbitrarily long
  periods of time
• Can be trained by artificial evolution
• Can combine LSTM blocks with regular
  units
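A minimal forward pass of one LSTM step with an explicit cell state (the common formulation with input, forget, and output gates); the weight layout and names are assumptions for illustration, and training (BPTT/RTRL or evolution) is not shown:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step: W (input weights), U (recurrent weights), b (biases), each
    stacking the input, forget, output, and candidate blocks side by side."""
    z = x @ W + h_prev @ U + b
    i, f, o, g = np.split(z, 4, axis=-1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c_prev + i * np.tanh(g)  # explicit cell state lets information
    h = o * np.tanh(c)               # persist over arbitrarily long spans
    return h, c
```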
Tricks

• Specialized layers (convolution, max-pooling, ...)
• Multi-column
• Mini-batching, bias, weight momentum,
  parameter scheduling, ...
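One of the listed tricks as a sketch: mini-batch gradient steps with classical weight momentum. The dict-of-arrays layout and the 0.9 default are conventional choices, not from the slides:

```python
def momentum_update(params, grads, velocity, lr=0.01, momentum=0.9):
    """Apply one mini-batch update; `velocity` keeps the running momentum term."""
    for name in params:
        velocity[name] = momentum * velocity[name] - lr * grads[name]
        params[name] = params[name] + velocity[name]
    return params, velocity
```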
DBN vs. LSTM
DBN:
• General purpose
• More flexible
• Bigger community
LSTM:
• Conceptually cleaner
• Simpler and smaller topology
• Faster convergence
People to watch

• G. Hinton (U. Toronto)
• J. Schmidhuber (IDSIA)
• A.Y. Ng (Stanford, Google Brain)
• J. Hawkins (Numenta)
