Deep learning
  simon@hekovnik.si
Deep/nature

• Brains have a deep architecture
• Humans organize their ideas hierarchically,
  through composition of simpler ideas
Deep/math
• Insufficiently deep architectures can be
  exponentially inefficient
• Distributed (possibly sparse) representations
  are necessary to achieve non-local
  generalization
• Intermediate representations allow sharing
  statistical strength
Enablers
• CPU (GPU) power
• Algorithmic
 • pre-training/network stacking (RBMs,
    auto-encoders)
 • RBM + contrastive divergence
 • parallelization tricks
State of the art

• Unsupervised learning of feature detectors
  for faces and cats from videos
• Billions of units, >9 layers
• Better-than-human recognition of traffic
  signs
Applications
• Sequence prediction (time series, gene
  sequence, natural language modeling, ...)
• Machine vision
• Speech recognition
• Dimensionality reduction
• Classification ...
The good

• State of the art results in many fields
• Unsupervised, semi-supervised
• (at least somewhat) online learning and
  adaptation
• multi-task learning
• (close to) linear scalability
The bad


• Expensive to train
• Hard to inspect/visualize progress for
  non-visual tasks
The ugly

• Hyperparameter and topology selection
  still critical
• Dependence on tricks for practical results
  on real-life datasets
Deep belief networks
DBN key ideas: network stacking

• Greedy layer-wise learning
• Hidden units of level k as visible units of
  level k+1
• (Use backpropagation on whole stack)
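A hedged sketch of the stacking scheme above: each level is trained greedily and its hidden activations become the training data for the next level. `train_layer`, `layer_sizes`, and the returned `(model, transform)` pair are hypothetical placeholders (e.g. an RBM or auto-encoder trainer), not an API from the slides:

```python
def pretrain_stack(data, layer_sizes, train_layer):
    """Greedy layer-wise pretraining; fine-tuning (e.g. backprop over the
    whole stack) would follow separately."""
    stack, x = [], data
    for n_hidden in layer_sizes:
        model, transform = train_layer(x, n_hidden)  # train level k on x
        stack.append(model)
        x = transform(x)  # hidden units of level k act as visible units of level k+1
    return stack
```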
DBN key ideas:
Unsupervised greedy layer-wise learning
+
Supervised top layer
DBN key ideas: RBM
• generative stochastic neural network
• the network has an energy function and we
  are searching for thermal equilibrium
• binary stochastic units; the weights determine
  the units' activation (state-change) probabilities
• learning via contrastive divergence
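For reference, the standard energy function of a binary RBM alluded to above; the notation (visible units v, hidden units h, weights W, biases b and c, partition function Z) is my assumption, not from the slides:

```latex
E(v, h) = -\sum_i b_i v_i - \sum_j c_j h_j - \sum_{i,j} v_i W_{ij} h_j,
\qquad
P(v, h) = \frac{e^{-E(v, h)}}{Z}
```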
DBN key ideas: contrastive divergence

(1) v → h
(2) h → v′
(3) v′ → h′
(4) δw = v⊗h − v′⊗h′
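A minimal numpy sketch of the CD-1 steps above, assuming batched binary data; the function and variable names, shapes, and learning rate are illustrative, not from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_v, b_h, v, lr=0.1, rng=np.random.default_rng(0)):
    """One CD-1 step for a binary RBM: v -> h -> v' -> h', dW ~ v⊗h − v'⊗h'."""
    # (1) v -> h: sample hidden units given the data
    p_h = sigmoid(v @ W + b_h)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # (2) h -> v': reconstruct the visible units
    p_v = sigmoid(h @ W.T + b_v)
    v_rec = (rng.random(p_v.shape) < p_v).astype(float)
    # (3) v' -> h': hidden probabilities of the reconstruction
    p_h_rec = sigmoid(v_rec @ W + b_h)
    # (4) weight update: difference of outer products, averaged over the batch
    W += lr * (v.T @ p_h - v_rec.T @ p_h_rec) / len(v)
    b_v += lr * (v - v_rec).mean(axis=0)
    b_h += lr * (p_h - p_h_rec).mean(axis=0)
    return W, b_v, b_h
```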
DBN key ideas: auto-encoder
• Denoising auto-encoder
  (corrupt and reconstruct the input)
• Sparse coding
  (each item is encoded by strong activation
  of a small set of neurons)
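A small sketch of the denoising idea above, assuming some encoder/decoder pair is available; `encode`, `decode`, and the corruption rate are illustrative placeholders:

```python
import numpy as np

def denoising_loss(x, encode, decode, corruption=0.3, rng=np.random.default_rng(0)):
    """Corrupt the input, reconstruct it, and score against the *clean* input."""
    mask = (rng.random(x.shape) >= corruption).astype(x.dtype)
    x_corrupt = x * mask                 # randomly zero out a fraction of inputs
    x_recon = decode(encode(x_corrupt))  # reconstruct from the corrupted view
    return ((x_recon - x) ** 2).mean()   # squared error vs. the original input
```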
LSTM
LSTM
• RNN with explicit state
• Combination of BPTT and RTRL learning
• Online learning
• Can retain information over arbitrarily long
  periods of time
• Can be trained by artificial evolution
• Can combine LSTM blocks with regular
  units
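A minimal forward pass of one LSTM step with an explicit cell state (the common formulation with input, forget, and output gates); the weight layout and names are assumptions for illustration, and training (BPTT/RTRL or evolution) is not shown:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step: W (input weights), U (recurrent weights), b (biases), each
    stacking the input, forget, output, and candidate blocks side by side."""
    z = x @ W + h_prev @ U + b
    i, f, o, g = np.split(z, 4, axis=-1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c_prev + i * np.tanh(g)  # explicit cell state lets information
    h = o * np.tanh(c)               # persist over arbitrarily long spans
    return h, c
```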
Tricks

• Specialized layers (convolution, max-pooling, ...)
• Multi-column
• Mini-batching, bias, weight momentum,
  parameter scheduling, ...
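One of the listed tricks as a sketch: mini-batch gradient steps with classical weight momentum. The dict-of-arrays layout and the 0.9 default are conventional choices, not from the slides:

```python
def momentum_update(params, grads, velocity, lr=0.01, momentum=0.9):
    """Apply one mini-batch update; `velocity` keeps the running momentum term."""
    for name in params:
        velocity[name] = momentum * velocity[name] - lr * grads[name]
        params[name] = params[name] + velocity[name]
    return params, velocity
```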
DBN vs. LSTM
DBN:
• General purpose
• More flexible
• Bigger community
LSTM:
• Conceptually cleaner
• Simpler and smaller topology
• Faster convergence
People to watch

• G. Hinton (U. Toronto)
• J. Schmidhuber (IDSIA)
• A.Y. Ng (Stanford, Google Brain)
• J. Hawkins (Numenta)
