Deep Learning
Er. Shiva K. Shrestha, ME Computer, NCIT
January 5, 2017
Slide Credit
o Jeff Dean, Google, Large Scale Deep Learning
o Andrew Ng, Deep Learning
o Aditya Khosla & Joseph Lim, Visual Recognition through ML Competition
Structure
◦ General Questions of the World
◦ What is Deep Learning?
◦ Why Deep Learning?
◦ Deep Neural Network Architectures
◦ Deep Learning Applications
◦ Conclusions, Recommendations
How Can We Build More Intelligent Computer Systems?
According to Jeff Dean, Google:
o Need to perceive and understand the world
o Basic speech and vision capabilities
o Language understanding
o User behavior prediction
o …
How can we do this?
According to Jeff Dean, Google:
o Cannot write separate algorithms for each task we want to accomplish
o Need to write general algorithms that learn from observations
o Can we build systems that:
o Generate understanding from raw data
o Solve difficult problems to improve products
o Minimize software engineering effort
Plenty of Data
o Text: trillions of words of English + other languages
o Visual: billions of images and videos
o Audio: thousands of hours of speech per day
o User Activity: queries, result page clicks, map requests, etc.
o Knowledge Graph: billions of labelled relation triples
o …
Image Models
What are these numbers?
What are all these words?
How about these words?
Textual Understanding
“This movie should have NEVER been made. From the poorly done
animation, to the beyond bad acting. I am not sure at what point the
people behind this movie said "Ok, looks good! Lets do it!" I was in
awe of how truly horrid this movie was.”
General Machine Learning Approaches
o Learning by labeled example: Supervised Learning
o e.g. an email spam detector (a minimal code sketch follows this list)
o amazingly effective if you have lots of examples
o Discovering patterns: Unsupervised Learning
o e.g. data clustering
o difficult in practice, but useful if you lack labeled examples
o Feedback right/wrong: Reinforcement Learning
o e.g. learning to play chess by winning or losing
o works well in some domains, becoming more important
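To make the supervised case concrete, here is a minimal sketch of an email spam detector. The toy emails, the labels, and the choice of scikit-learn with bag-of-words features and a Naive Bayes classifier are illustrative assumptions, not from the slides.

```python
# Minimal supervised-learning sketch: an email spam detector.
# Data and model choice are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting at 10am tomorrow",
          "claim your free money", "project report attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()        # bag-of-words features
X = vectorizer.fit_transform(emails)  # learn vocabulary and vectorize

clf = MultinomialNB()
clf.fit(X, labels)                    # learn from labeled examples

# Should lean toward spam (1) for spam-like wording.
print(clf.predict(vectorizer.transform(["claim your free prize"])))
```

With more labeled examples the same pattern scales; as the slide notes, supervised learning is amazingly effective when examples are plentiful.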
Machine Learning
o For many of these problems, we have lots of data.
o ML gives computers the ability to learn without being explicitly programmed.
Approaches
o Decision tree learning
o Association rule learning
o Artificial neural networks
o Deep learning
o Inductive logic programming
o Support vector machines
o Clustering
o Bayesian networks
Approaches …
o Reinforcement learning
o Representation learning
o Similarity and metric learning
o Sparse dictionary learning
o Genetic algorithms
o Rule-based machine learning
o Learning classifier systems
Typical Goal of Machine Learning
Input (I/p) → ML → Output (O/p):
o images/video → label: “Motorcycle”, suggest tags, image search, …
o audio → speech recognition, music classification, speaker identification, …
o text → web search, anti-spam, machine translation, …
Basic Idea of Deep Learning
Is there some way to extract meaningful features from data
even without knowing the task to be performed?
Then, throw in some hierarchical ‘stuff’ to make it ‘deep’
What is Deep Learning?
o The modern reincarnation of ANNs from the 1980s and 90s.
o A collection of simple trainable mathematical units, which
collaborate to compute a complicated function.
o Compatible with all three general ML approaches (supervised, unsupervised, reinforcement)
What is Deep Learning? (2)
o Loosely inspired by what (little) we know about the biological brain.
o AKA:
o Deep Structure Learning
o Hierarchical Learning
o Deep Machine Learning
Deep Learning Definitions
Deep learning is characterized as a class of machine learning algorithms that
o use a cascade of many layers of nonlinear processing units for feature
extraction and transformation.
o are based on the learning of multiple levels of features or representations of
the data.
o are part of the broader machine learning field of learning representations of
data.
o learn multiple levels of representations that correspond to different levels of abstraction.
DL - Why is this hard?
You see this: [photo of the object]
But the camera sees this: [a grid of raw pixel intensity values]
Pixel-based Representation
[Figure: raw motorbike and non-motorbike images fed to a learning algorithm; plotted by raw pixel values (pixel 1 vs. pixel 2), the two classes do not separate]
What We Want
E.g., does it have handlebars? Wheels?
[Figure: raw image → feature representation (handlebars, wheels) → learning algorithm; plotted by these features, motorbikes and non-motorbikes separate cleanly]
Some Feature Representations
SIFT, Spin image, HoG, RIFT, Textons, GLOH [figure panels]
Some Feature Representations (2)
Coming up with features is often difficult, time-consuming, and requires expert knowledge.
The Brain: Potential Motivation for Deep Learning
The auditory cortex learns to see! [Roe et al., 1992]
[Figure: auditory cortex]
The Brain adapts!
[BrainPort; Welsh & Blasch, 1997; Nagel et al., 2005; Constantine-Paton & Law, 2009]
o Seeing with your Tongue
o Human Echolocation (Sonar)
o Haptic belt: Direction Sense
o Implanting a 3rd Eye
Feature Learning Problem
Given a 14x14 image patch x, we can represent it as 196 real numbers (pixel values such as 255, 98, 93, 87, 89, 91, 48, …).
Problem: can we learn a better feature vector to represent this?
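As a minimal sketch of the raw representation (the patch values here are made up), flattening a 14x14 patch gives the 196-dimensional vector that feature learning tries to improve on:

```python
import numpy as np

# Hypothetical 14x14 grayscale patch; values are illustrative.
patch = np.random.randint(0, 256, size=(14, 14))
x = patch.reshape(196).astype(np.float32)  # raw representation: 196 real numbers
print(x.shape)  # (196,)
```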
Why Deep Learning?
Task: Video Activity Recognition (method: accuracy)
o Hessian + ESURF [Willems et al., 2008]: 38%
o Harris3D + HOG/HOF [Laptev et al., 2003, 2004]: 45%
o Cuboids + HOG/HOF [Dollar et al., 2005; Laptev, 2004]: 46%
o Hessian + HOG/HOF [Laptev, 2004; Willems et al., 2008]: 46%
o Dense + HOG/HOF [Laptev, 2004]: 47%
o Cuboids + HOG3D [Klaser, 2008; Dollar et al., 2005]: 46%
o Unsupervised Feature Learning (DL) [Le, Zhou & Ng, 2011]: 52%
Deep Neural Network Architectures
o GMDH: the first deep learning network (1965)
o Convolutional NN
o Neural history compressor
o Recursive NN
o Long short-term memory (LSTM)
o Deep belief networks (DBN)
o Convolutional deep belief networks
o Large memory storage & retrieval NN
o Deep Boltzmann machines
o Stacked (de-noising) auto-encoders
o Deep stacking networks
o Tensor deep stacking networks
o Spike-and-slab RBMs
o Compound hierarchical-deep models
o Deep coding networks
o Deep Q-networks
o Networks with separate memory structures
Neural Network (NN)
[Figure: a 4-layer network with 2 output units; inputs x1, x2, x3 plus a bias unit (+1) feed two hidden layers, each with its own +1 bias unit, and an output layer of 2 units]
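A minimal forward pass for such a network, sketched in NumPy; the layer sizes, random weights, and sigmoid nonlinearity are illustrative assumptions rather than the exact network in the figure.

```python
import numpy as np

def sigmoid(z):
    # Elementwise nonlinearity; one common choice among many.
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# 4-layer network: 3 inputs -> two hidden layers -> 2 outputs (sizes assumed).
sizes = [3, 4, 4, 2]
# One weight matrix per connection; biases play the role of the "+1" units.
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]

def forward(x):
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)  # affine transform plus nonlinearity per layer
    return a

print(forward(np.array([0.5, -1.0, 2.0])))  # two output activations
```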
Unsupervised Feature Learning with a NN
[Figure: a stacked network; inputs x1..x6 plus a bias unit (+1) feed hidden units a1, a2, a3, then b1, b2, b3, then c1, c2, c3]
New representation for the input: use [c1, c2, c3] as the representation to feed to the learning algorithm.
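One common way to learn such a representation without labels is an autoencoder: train the network to reconstruct its own input, then keep the hidden activations as the new features. A minimal sketch follows; the toy data, sizes, learning rate, and tanh units are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # 200 unlabeled 6-dimensional inputs (toy data)

n_in, n_hidden = 6, 3
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))  # encoder weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_in))  # decoder weights
b2 = np.zeros(n_in)
lr = 0.01

for _ in range(500):
    H = np.tanh(X @ W1 + b1)   # hidden code: the learned features
    X_hat = H @ W2 + b2        # reconstruction of the input
    err = X_hat - X
    # Gradient descent on mean squared reconstruction error.
    dW2 = H.T @ err / len(X)
    db2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1 - H**2)  # backprop through tanh
    dW1 = X.T @ dH / len(X)
    db1 = dH.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

features = np.tanh(X @ W1 + b1)  # new 3-d representation, like [c1, c2, c3]
print(features.shape)            # (200, 3)
```

Stacking several such layers, each trained on the previous layer's features, gives the hierarchical 'deep' structure the slide depicts.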
Deep Belief Network
A DBN is an algorithm for learning a feature hierarchy.
Building block: a 2-layer graphical model, the Restricted Boltzmann Machine (RBM).
Additional layers can then be learned one at a time.
[Figure: schematic overview of a deep belief net]
Deep Belief Network (2)
Input: [x1, x2, x3, x4]
Layer 2: [a1, a2, a3]
Layer 3: [b1, b2, b3]
Similar to a sparse auto-encoder in many ways.
Stack RBMs on top of each other to get a DBN.
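For concreteness, a minimal sketch of training one RBM building block with contrastive divergence (CD-1). The toy binary data, layer sizes, and hyperparameters are assumptions; a DBN would train one such layer at a time, feeding each layer's hidden activations to the next.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = (rng.random((100, 4)) > 0.5).astype(float)  # toy binary inputs [x1..x4]

n_visible, n_hidden = 4, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)  # visible biases
b_h = np.zeros(n_hidden)   # hidden biases
lr = 0.1

for _ in range(200):
    # Positive phase: hidden activations given the data.
    p_h = sigmoid(X @ W + b_h)
    h = (rng.random(p_h.shape) < p_h).astype(float)  # sample hidden states
    # Negative phase (CD-1): one reconstruction step.
    p_v = sigmoid(h @ W.T + b_v)
    p_h2 = sigmoid(p_v @ W + b_h)
    # Move weights toward data statistics, away from model statistics.
    W += lr * (X.T @ p_h - p_v.T @ p_h2) / len(X)
    b_v += lr * (X - p_v).mean(axis=0)
    b_h += lr * (p_h - p_h2).mean(axis=0)

features = sigmoid(X @ W + b_h)  # layer-2 representation [a1, a2, a3]
print(features.shape)            # (100, 3)
```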
Convolutional DBN for Audio
[Figure: a spectrogram feeding a layer of detection units, topped by a max pooling unit]
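The convolution-plus-pooling step itself can be sketched in a few lines; the toy spectrogram slice, the filter, and the pooling width are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
spectrogram = rng.random(64)   # toy 1-D spectrogram slice (64 frequency bins)
filt = rng.normal(size=8)      # one (assumed) learned detection filter

# Detection units: slide the filter across the input, apply a nonlinearity.
detections = np.maximum(0, np.convolve(spectrogram, filt, mode="valid"))

# Max pooling unit: keep the strongest response in each window of 4.
pooled = detections[: len(detections) // 4 * 4].reshape(-1, 4).max(axis=1)
print(detections.shape, pooled.shape)  # (57,) (14,)
```

Pooling makes the detected features locally translation-invariant, which is much of the appeal of convolutional architectures.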
Convolutional DBN for Audio (2)
[Figure: spectrogram]
Convolutional DBN for Images
Going Deep
Training set: aligned images of faces [Honglak Lee].
[Figure: the learned feature hierarchy, from pixels to edges to object parts (combinations of edges) to object models]
Applications
o Computer Vision: Object Detection & Recognition
o Speech Recognition
o Speaker Identification
o Web Searches
o Text Classification: Sentiment Analysis
o Translations
o Miscellaneous
o Fine-grained Classification
o Generalization
o Generating Image Captions from Pixels
o …
Applications (2)
Speech Recognition on Android
Impact on Speech Recognition
Text Classification
Results for IMDB Sentiment Classification (long paragraphs)
Translation
o Google Translate:
o As Reuters noted for the first time in July, the seating configuration is exactly
what fuels the battle between the latest devices.
o Neural LSTM Model:
o As Reuters reported for the first time in July, the configuration of seats is
exactly what drives the battle between the latest aircraft.
o Human Translation:
o As Reuters first reported in July, seat layout is exactly what drives the battle
between the latest jets.
Good Fine-grained Classification
Good Generalization
Sensible Errors
Generating Image Captions from Pixels
Work by Oriol Vinyals et al.
Generating Image Captions from Pixels (2)
Generating Image Captions from Pixels (3)
Generating Image Captions from Pixels (4)
Conclusion
Deep Neural Networks are very effective for a wide range of tasks:
o By using parallelism, we can quickly train very large and effective deep neural models on very large datasets.
o They automatically build high-level representations to solve the desired task.
o By using embeddings, they can work with sparse data.
o They are effective in many domains: speech, vision, language modeling, user prediction, language understanding, translation, advertising, …
An important tool in building Intelligent Systems!
Thank You!
Q/A?
Recommendations
o Le, Ranzato, Monga, Devin, Chen, Corrado, Dean & Ng. Building High-Level Features Using Large Scale Unsupervised Learning. ICML 2012.
o Dean, Corrado, et al. Large Scale Distributed Deep Networks. NIPS 2012.
o Mikolov, Chen, Corrado & Dean. Efficient Estimation of Word Representations in Vector Space. http://arxiv.org/abs/1301.3781
o Le & Mikolov. Distributed Representations of Sentences and Documents. ICML 2014. http://arxiv.org/abs/1405.4053
o Vanhoucke, Devin & Heigold. Deep Neural Networks for Acoustic Modeling. ICASSP 2013.
o Sutskever, Vinyals & Le. Sequence to Sequence Learning with Neural Networks. NIPS 2014. http://arxiv.org/abs/1409.3215
o http://research.google.com/papers
o http://research.google.com/people/jeff
