Deep learning
  simon@hekovnik.si
Deep/nature

• Brains have a deep architecture
• Humans organize their ideas hierarchically,
  through composition of simpler ideas
Deep/math
• Insufficiently deep architectures can be
  exponentially inefficient
• Distributed (possibly sparse) representations
  are necessary to achieve non-local
  generalization
• Intermediate representations allow sharing
  statistical strength
Enablers
• CPU (GPU) power
• Algorithmic
 • pre-training/network stacking (RBMs,
    auto-encoders)
 • RBM + contrastive divergence
 • parallelization tricks
State of the art

• Unsupervised learning of feature detectors
  for faces and cats from videos
• Billions of units, >9 layers
• Better-than-human recognition of traffic
  signs
Applications
• Sequence prediction (time series, gene
  sequence, natural language modeling, ...)
• Machine vision
• Speech recognition
• Dimensionality reduction
• Classification ...
The good

• State of the art results in many fields
• Unsupervised, semi-supervised
• (at least somewhat) online learning and
  adaptation
• multi-task learning
• (close to) linear scalability
The bad


• Expensive to train
• Hard to inspect/visualize progress for
  non-visual tasks
The ugly

• Hyperparameter and topology selection
  still critical
• Dependence on tricks for practical results
  on real-life datasets
Deep belief networks
DBN key ideas:
     network stacking

• Greedy layer-wise learning
• Hidden units of level k as visible units of
  level k+1
• (Use backpropagation on whole stack)
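The layer-wise procedure above can be sketched in a few lines of NumPy. The per-layer unsupervised training (an RBM or auto-encoder in practice) is stubbed out with random weights here, so this only illustrates the stacking mechanics, not real learning:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_layer(data, n_hidden):
    # Placeholder for unsupervised training (an RBM or auto-encoder);
    # here we just return random weights of the right shape.
    rng = np.random.default_rng(0)
    return rng.normal(0, 0.01, size=(data.shape[1], n_hidden))

def greedy_stack(data, layer_sizes):
    # Train layers one at a time; each layer's hidden activations
    # become the visible data for the next layer (level k -> k+1).
    weights, x = [], data
    for n_hidden in layer_sizes:
        W = train_layer(x, n_hidden)
        weights.append(W)
        x = sigmoid(x @ W)
    return weights

X = np.random.default_rng(1).random((100, 20))
stack = greedy_stack(X, [16, 8, 4])
```

After the greedy pass, the whole stack can be fine-tuned with backpropagation as the slide notes.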
DBN key ideas:

Unsupervised greedy
 layer-wise learning
          +
Supervised top layer
DBN key ideas: RBM
• generative stochastic neural network
• the network has an energy function and we
  are searching for thermal equilibrium
• binary units; weights are state change
  probabilities
• learning via contrastive divergence
DBN key ideas:
contrastive divergence

(1) v → h
(2) h → v’
(3) v’ → h’
(4) Δw ∝ v⊗h - v’⊗h’
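Steps (1)–(4) map directly onto code. A minimal NumPy sketch of CD-1 for a single training vector, assuming binary sigmoid units and omitting biases for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    # Sample binary unit states from Bernoulli probabilities.
    return (rng.random(p.shape) < p).astype(float)

def cd1_update(v, W, lr=0.1):
    # One CD-1 weight update for an RBM (biases omitted).
    h_prob = sigmoid(v @ W)        # (1) v -> h
    h = sample(h_prob)
    v_rec = sample(sigmoid(h @ W.T))   # (2) h -> v'
    h_rec = sigmoid(v_rec @ W)     # (3) v' -> h'
    # (4) dW = v⊗h - v'⊗h'  (difference of outer products)
    dW = np.outer(v, h_prob) - np.outer(v_rec, h_rec)
    return W + lr * dW

W = rng.normal(0, 0.01, size=(6, 3))   # 6 visible, 3 hidden units
v = np.array([1., 0., 1., 1., 0., 0.])
W = cd1_update(v, W)
```

In practice updates are averaged over mini-batches and bias terms get analogous updates.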
DBN key ideas:
auto-encoder
     • Denoising auto-encoder
       (corrupt and reconstruct
       the input)
     • Sparse coding
       (each item is encoded by
       strong activation of a
       small set of neurons)
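The denoising idea can be sketched as a single gradient step: corrupt the input, encode it, and backpropagate the squared reconstruction error against the *clean* input. This is an illustrative untied-weights version with made-up sizes, not a tuned implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def corrupt(x, p=0.3):
    # Corrupt the input by zeroing a random subset of components.
    return x * (rng.random(x.shape) >= p)

def dae_step(x, We, Wd, lr=0.5):
    # One gradient step of a denoising auto-encoder: encode a
    # corrupted copy of x, reconstruct, compare to the clean x.
    x_t = corrupt(x)
    h = sigmoid(x_t @ We)            # hidden code
    x_hat = sigmoid(h @ Wd)          # reconstruction
    d_out = (x_hat - x) * x_hat * (1 - x_hat)   # squared-error backprop
    d_h = (d_out @ Wd.T) * h * (1 - h)
    Wd -= lr * np.outer(h, d_out)
    We -= lr * np.outer(x_t, d_h)
    return We, Wd

x = rng.random(8)
We = rng.normal(0, 0.1, size=(8, 4))
Wd = rng.normal(0, 0.1, size=(4, 8))
We, Wd = dae_step(x, We, Wd)
```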
LSTM
LSTM
• RNN with explicit state
• Combination of BPTT and RTRL learning
• Online learning
• Can retain information over arbitrarily long
  periods of time
• Can be trained by artificial evolution
• Can combine LSTM blocks with regular
  units
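The "explicit state" bullet can be made concrete with one forward step of a single LSTM block: input, forget and output gates control reads and writes to the cell state c. Weight shapes are illustrative; biases and peephole connections are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, params):
    # One forward step of an LSTM block.
    Wi, Wf, Wo, Wg = params
    z = np.concatenate([x, h])   # current input + previous hidden state
    i = sigmoid(z @ Wi)          # input gate
    f = sigmoid(z @ Wf)          # forget gate: retains state over long spans
    o = sigmoid(z @ Wo)          # output gate
    g = np.tanh(z @ Wg)          # candidate cell update
    c = f * c + i * g            # explicit state carried across time steps
    h = o * np.tanh(c)
    return h, c

n_in, n_cell = 3, 4
params = [rng.normal(0, 0.1, size=(n_in + n_cell, n_cell)) for _ in range(4)]
h, c = np.zeros(n_cell), np.zeros(n_cell)
for x in rng.random((5, n_in)):   # run over a short input sequence
    h, c = lstm_step(x, h, c, params)
```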
Tricks

• Specialized layers (convolution,
  max-pooling, ...)
• Multi-column
• Mini-batching, bias, weight momentum,
  parameter scheduling, ...
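As a small example of a specialized layer, non-overlapping max-pooling reduces each 2×2 patch of a feature map to its maximum:

```python
import numpy as np

def max_pool(x, size=2):
    # Non-overlapping max-pooling on a 2D feature map;
    # trims edges that don't fill a full patch.
    h, w = x.shape
    return (x[:h - h % size, :w - w % size]
            .reshape(h // size, size, w // size, size)
            .max(axis=(1, 3)))

x = np.arange(16.0).reshape(4, 4)
pooled = max_pool(x)   # -> [[5., 7.], [13., 15.]]
```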
DBN vs. LSTM
DBN:
• General purpose
• More flexible
• Bigger community

LSTM:
• Conceptually cleaner
• Simpler and smaller topology
• Faster convergence
People to watch

• G. Hinton (U. Toronto)
• J. Schmidhuber (IDSIA)
• A.Y. Ng (Stanford, Google Brain)
• J. Hawkins (Numenta)
