2. Handong Global University
Agenda
Introduction to Deep Learning
Deep Learning Algorithms
Successful Application of Deep Learning
Q&A
3. Handong Global University
Machine Learning
Learn from data
Data-driven approach (vs. knowledge-based approach)
Trainable framework
(deep) neural networks, Bayesian classifier, HMM, MRF, CRF, SVM, etc.
[Figure: training samples are used to train the framework's parameters; the trained framework maps an input to a result]
4. Handong Global University
Knowledge-based vs. Data-driven
Knowledge-based approaches
Intuitive
Dependent on designer’s knowledge
Difficult to justify or improve
Data-driven approaches
Learn from data
Requires training data
Given training data, easy to (re)build
Difficult to understand trained model
Given sufficient training samples,
data-driven approaches outperform knowledge-based approaches
5. Handong Global University
Neural Networks
An artificial neural network is a mathematical
model inspired by biological neural networks.
Intelligence comes from their connection weights
Connection weights are decided by learning or adaptation
[Figure: neural network with inputs x1, x2, …, xn and outputs o1, …, om]
6. Handong Global University
Neural Networks
A neural network is a mathematical model that learns
mappings
Mapping from a vector to another vector (or a scalar value)
Examples)
Pattern → class (classification)
Independent variables → dependent variables (regression)
Information → decision
History → future
[Figure: the network maps an input vector (x1, x2, …, xn) to an output vector (o1, …, om)]
7. Handong Global University
Neural Networks
Neural networks can learn probability distributions from
training samples
Examples)
Approximate joint prob. P(X,Y), or conditional prob. P(X|Y)
Likelihood P(x | class), a posteriori probability P(class | x)
Classification, sampling, restoration
[Figure: the network (inputs x1, x2, …, xn; outputs o1, …, om) is trained on samples { X1, X2, … } drawn from f(X)]
8. Handong Global University
Deep Neural Networks
A deep neural network (DNN) is a neural network with
multiple levels of nonlinear operations.
[Figure: Input → Layer 1 → Layer 2 → … → Output]
9. Handong Global University
Network Depth and Decision Region
[Lippmann87]
Single-layer network: half plane bounded by a hyperplane
Two-layer network: convex (open or closed) regions
Three-layer network: arbitrary regions (complexity limited by the number of nodes)
10. Handong Global University
Why Deep Networks?
Efficient in modeling of complex functions
Representation of some functions needs sufficiently many
layers.
Stepwise abstraction to learn high-level feature
Large capacity
DNN can learn very well from a huge volume of samples
Integrated learning
DNN integrates feature extractor and classifier in a single
network
11. Handong Global University
Stepwise Abstraction
Abstraction from low level representation to high
level representation.
Similar to human perception process
[Figure: Input → Layer 1 → Layer 2 → … → Output]
[Lee12]
12. Handong Global University
Integrated Learning
Deep networks optimize both feature extractor and
classifier in a unified framework.
Conventional system: input → feature extractor → classifier → output
Deep neural network: input → DNN → output
13. Handong Global University
Challenges with Deep Networks
Hard to optimize
Back-propagation algorithm does not work well for deep
fully connected networks starting from random weights
New training algorithms
A large number of parameters
A huge volume of training samples is now available.
Techniques to improve generalization ability
Ex) sparse coding, virtual sample generation, dropout
Requires heavy computation
GPU-based massive parallel processing
H/W implementation (SoC, FPGA)
15. Handong Global University
BP on Deep Network
BP does not work well on deep networks
Error signals from many nodes are blended together and
become dim and vague at the bottom layers
“Diminishing gradient problem”
Error signal at a non-output node i:
$\delta_i = f'(net_i) \sum_j w_{ij} \delta_j$
where j ranges over the nodes that node i feeds through weight w_{ij}
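The error-signal recursion above can be traced numerically. The following is a minimal sketch, assuming sigmoid activations, small uniform random weights, and an error signal of 1.0 at every output node; the function name `backprop_error_signals` and all parameter values are hypothetical choices for illustration, not part of the original slides.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def backprop_error_signals(depth=8, width=10, weight_scale=0.3, seed=0):
    """Return max |delta| per layer, from the output layer down to layer 1."""
    rng = random.Random(seed)
    # weights[l][i][j]: weight from node i in layer l to node j in layer l + 1
    weights = [[[rng.uniform(-weight_scale, weight_scale) for _ in range(width)]
                for _ in range(width)]
               for _ in range(depth)]

    # Forward pass: nets[l] holds the net inputs of layer l + 1.
    x = [rng.uniform(-1.0, 1.0) for _ in range(width)]
    nets = []
    for W in weights:
        net = [sum(x[i] * W[i][j] for i in range(width)) for j in range(width)]
        nets.append(net)
        x = [sigmoid(v) for v in net]

    # Backward pass: delta_i = f'(net_i) * sum_j w_ij * delta_j
    delta = [1.0] * width  # assumed error signal at the output nodes
    magnitudes = [max(abs(d) for d in delta)]
    for l in range(depth - 1, 0, -1):
        net, W = nets[l - 1], weights[l]
        delta = [sigmoid(net[i]) * (1.0 - sigmoid(net[i]))
                 * sum(W[i][j] * delta[j] for j in range(width))
                 for i in range(width)]
        magnitudes.append(max(abs(d) for d in delta))
    return magnitudes


for level, m in enumerate(backprop_error_signals()):
    print(f"{level} layer(s) below the output: max |delta| = {m:.3e}")
```

Under these assumptions the sigmoid derivative is at most 0.25 and the absolute weights out of a node sum to at most 3, so the largest error signal shrinks by a factor of at least 0.75 per layer. Other activations or weight scales would behave differently.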
16. Handong Global University
Agenda
Introduction to Deep Learning
Deep Learning Algorithms
Successful Application of Deep Learning
Q&A
17. Handong Global University
Breakthroughs in Deep Learning
Conventional back-propagation algorithm does not
work well for deep fully-connected networks starting
from random weights.
Layer-wise unsupervised pre-training algorithm
Ex) DBN[Hinton2006], stacked auto-encoders[Bengio2006]
First, move the weights near a good local optimum with an
unsupervised learning algorithm
Then, conventional supervised learning algorithms work fine
Network structure to prevent diminishing gradient
problem
Ex) Convolutional Neural Networks [Fukushima1980][LeCun1998]
18. Handong Global University
Layer-wise Unsupervised Pre-training
Based on generative neural networks
Training procedure
1. Pre-train each layer to reproduce the input by unsupervised
learning algorithm
2. Fine-tune the whole network by a supervised learning
algorithm
Ex) wake-sleep[Hinton2003], back-propagation
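Step 1 of the procedure above can be sketched as greedy training of a stack of auto-encoders. This is only an illustration under stated assumptions: sigmoid encoder, linear decoder, tied weights (the same matrix is used for encoding and decoding), squared reconstruction error, and plain stochastic gradient descent. The names `pretrain_layer` and `pretrain_stack` and all hyperparameters are hypothetical, and the supervised fine-tuning of step 2 is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(W, b, x):
    """Forward propagation: hidden code for input x."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bn)
            for row, bn in zip(W, b)]


def decode(W, c, h):
    """Backward propagation through the same weights: reproduction of the input."""
    return [sum(W[n][p] * h[n] for n in range(len(h))) + c[p]
            for p in range(len(c))]


def pretrain_layer(data, n_hidden, epochs=50, lr=0.1, seed=0):
    """Train one layer to reproduce its own input (unsupervised)."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    c = [0.0] * n_in
    for _ in range(epochs):
        for x in data:
            h = encode(W, b, x)
            r = decode(W, c, h)
            dr = [rp - xp for rp, xp in zip(r, x)]  # dE/dr for E = 0.5 * |r - x|^2
            dnet = [sum(W[n][p] * dr[p] for p in range(n_in)) * h[n] * (1.0 - h[n])
                    for n in range(n_hidden)]
            for n in range(n_hidden):
                for p in range(n_in):
                    # Tied weights: encoder and decoder gradients are summed.
                    W[n][p] -= lr * (dnet[n] * x[p] + dr[p] * h[n])
                b[n] -= lr * dnet[n]
            for p in range(n_in):
                c[p] -= lr * dr[p]
    return W, b, c


def pretrain_stack(data, layer_sizes, **kwargs):
    """Starting from the bottom layer, pre-train each layer on the codes of the one below."""
    layers = []
    for n_hidden in layer_sizes:
        W, b, _ = pretrain_layer(data, n_hidden, **kwargs)
        layers.append((W, b))
        data = [encode(W, b, x) for x in data]
    return layers, data
```

A call such as `pretrain_stack(samples, [3, 2])` returns the pre-trained encoder weights of each layer together with the top-level codes, which would then initialize the network for supervised fine-tuning.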
20. Handong Global University
Layer-wise Unsupervised Pre-training
Starting from bottom layer, train each layer to
reproduce the input
Input → (encoding) → hidden → (decoding) → reproduction of input
[Figure: forward-backward network over Input layer (X0), Layer 1 (X1), Layer 2 (X2), …, Output layer (XN) with weights W1, W2, …, WN; forward propagation for encoding, backward propagation for decoding; 1st phase]
21. Handong Global University
Layer-wise Unsupervised Pre-training
Starting from bottom layer, train each layer to
reproduce the input
Input → (encoding) → hidden → (decoding) → reproduction of input
[Figure: forward-backward network over Input layer (X0), Layer 1 (X1), Layer 2 (X2), …, Output layer (XN) with weights W1, W2, …, WN; forward propagation for encoding, backward propagation for decoding; 2nd phase]
22. Handong Global University
Convolutional Neural Networks
Neocognitron [Fukushima80]
Designed to imitate visual processing of human/animals
Suggested basic concept and network structure of CNN
LeNet [LeCun98]
Simplified node and network structure
Gradient-based learning
Many improvements and extensions
[Simard2003], [Ciresan2011]
Convolutional DBN [Lee2009]
Siamese network [Chopra2005]
Locally connected network [Taigman2014]
Fast training algorithm using FFT [Mathieu2014]
23. Handong Global University
Convolutional Neural Networks
Composed of many heterogeneous layers
Convolution layer - feature extraction
Max-pooling layer - feature abstraction
Fully-connected layers - classification
24. Handong Global University
Convolution Layers
Odd-numbered layers in
low/middle-level of CNN
Nodes on each layer are grouped
into 2D planes (or feature maps)
Each plane is connected to one or
more input planes
Each node computes weighted sum
of input nodes in a small region
All nodes on a plane share weight
set
Extract feature by convolution
operation
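The convolution layer described above can be sketched in a few lines. This is a minimal illustration, assuming "valid" borders, stride 1, and no activation function; like most CNN implementations it computes a sliding weighted sum without flipping the mask. The name `conv_plane` is hypothetical.

```python
def conv_plane(input_planes, kernels, bias=0.0):
    """Compute one output feature map from one or more input planes.

    Every output node applies the same kernels (shared weight set) to a
    small region of each connected input plane and sums the results.
    """
    kh, kw = len(kernels[0]), len(kernels[0][0])
    h, w = len(input_planes[0]), len(input_planes[0][0])
    out = [[bias for _ in range(w - kw + 1)] for _ in range(h - kh + 1)]
    for plane, kernel in zip(input_planes, kernels):
        for r in range(h - kh + 1):
            for c in range(w - kw + 1):
                out[r][c] += sum(kernel[i][j] * plane[r + i][c + j]
                                 for i in range(kh) for j in range(kw))
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
diagonal_mask = [[1, 0],
                 [0, 1]]
print(conv_plane([image], [diagonal_mask]))  # [[6.0, 8.0], [12.0, 14.0]]
```

Because the mask is shared by all nodes on the plane, the layer has only `kh * kw` weights per input plane regardless of image size.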
25. Handong Global University
Max-Pooling Layers
Even-numbered layers in
low/middle-level of CNN
Nodes on each layer are grouped
into planes
Each plane is connected to only
one input plane
Each node chooses maximum
among input nodes in a small
region
Abstract features
Reduces feature dimension
Ignores positional variation of feature
elements
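The max-pooling operation above can be sketched as follows, assuming non-overlapping square regions; the name `max_pool` and the default region size are hypothetical.

```python
def max_pool(plane, size=2):
    """Each output node takes the maximum over a size x size input region."""
    h, w = len(plane), len(plane[0])
    return [[max(plane[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, w - size + 1, size)]
            for r in range(0, h - size + 1, size)]


feature_map = [[1, 2, 0, 0],
               [3, 4, 0, 5],
               [0, 0, 9, 8],
               [1, 0, 7, 6]]
print(max_pool(feature_map))  # [[4, 5], [1, 9]]
```

A feature element that moves within its pooling region leaves the output unchanged, which is how the layer ignores small positional variation while reducing the feature dimension.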
26. Handong Global University
Fully-connected Layers
Top 2-3 layers of the CNN
1D structure
Each node is fully connected to
all input nodes
Each node computes weighted
sum of all input nodes
Classify the input pattern using the high-level
features extracted by the previous layers
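A fully-connected layer reduces to one weighted sum per node over the whole 1D input. The sketch below assumes the class is chosen as the node with the largest output and omits any activation function; `dense` and `classify` are hypothetical names.

```python
def dense(x, W, b):
    """Each node computes a weighted sum of all input nodes plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + bn for row, bn in zip(W, b)]


def classify(scores):
    """Pick the index of the node with the largest output."""
    return max(range(len(scores)), key=lambda k: scores[k])


features = [1, 2]                  # high-level features from earlier layers
W = [[1, 0], [0, 1], [1, 1]]       # one weight row per output node
b = [0, 0, 1]
scores = dense(features, W, b)
print(scores, classify(scores))    # [1, 2, 4] 2
```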
27. Handong Global University
Gradient-based Learning [LeCun98]
Trains the whole network to minimize a single error
function E.
At layer n
At layer n-1
[Figure: Input layer (X0) → W1 → Layer 1 (X1) → W2 → Layer 2 (X2) → … → WN → Output layer (XN)]
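The per-layer equations did not survive in this transcript. As a hedged reconstruction, assuming each layer computes $X_n = F_n(W_n, X_{n-1})$ (the notation $F_n$ is an assumption borrowed from the usual formulation of gradient-based learning, not taken from the slide), the chain rule gives the gradient with respect to the weights at layer n and the error passed down to layer n-1:

```latex
% Assumed layer model: X_n = F_n(W_n, X_{n-1})
\begin{align}
  \frac{\partial E}{\partial W_n}
    &= \frac{\partial F_n}{\partial W}(W_n, X_{n-1}) \,
       \frac{\partial E}{\partial X_n}
    && \text{(at layer } n\text{)} \\
  \frac{\partial E}{\partial X_{n-1}}
    &= \frac{\partial F_n}{\partial X}(W_n, X_{n-1}) \,
       \frac{\partial E}{\partial X_n}
    && \text{(at layer } n-1\text{)}
\end{align}
```

Applying the second equation repeatedly from the output layer XN down to X1 yields the gradient of the single error function E for every weight set W1, …, WN.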
28. Handong Global University
Why Do CNNs Work Well?
The network structure effectively guides learning from 2D
images and prevents the diminishing gradient problem
Sparse connection
Parameter tying
Good at catching 2D structures
Training of convolution masks is effective to learn feature
extraction
Good at handling shape variation
Abstraction in phases
Max pooling
Directly train the network to minimize classification error.
29. Handong Global University
Agenda
Introduction to Deep Learning
Deep Learning Algorithms
Successful Application of Deep Learning
Q&A
30. Handong Global University
Numeral Digit Recognition (MNIST DB)
Support Vector Machines
Neural Nets
Convolutional Neural Networks (Deep Networks)
35. Handong Global University
Deep Learning in Face Recognition
Face recognition flow
1. Detection
2. Alignment (pre-processing)
3. Representation (feature extraction) → CNN
Robust to variation in lighting, expression, …
Alternatives: LBP + PCA/FDA
4. Verification / Classification → Siamese network
Alternatives: Euclidean distance, dot product, χ² distance, SVM, …
36. Handong Global University
DeepFace [Taigman2014]
Facebook AI group, Tel Aviv Univ.
Y. Taigman et al.
Achieved 97.25% on LFW dataset.
cf. Conventional best performance: 96.33% [Cao 2013]
Face recognition procedure
2D and 3D alignment
CNN-based representation
Verification by weighted χ² distance and Siamese network
A huge volume of training data
SFC dataset (4,000 identities × 1,000 samples)
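The weighted χ² distance used for verification can be sketched as below, assuming non-negative feature vectors and per-dimension weights that would be learned separately (learning them is not shown). The name `weighted_chi2` is hypothetical, and skipping dimensions where both features are zero is an assumption made here to avoid division by zero.

```python
def weighted_chi2(f1, f2, weights):
    """Sum over dimensions of w_i * (f1_i - f2_i)^2 / (f1_i + f2_i)."""
    return sum(w * (a - b) ** 2 / (a + b)
               for a, b, w in zip(f1, f2, weights)
               if a + b > 0)  # assumption: skip dimensions where both are zero


face_a = [1.0, 0.0, 0.5]
face_b = [0.0, 1.0, 0.5]
print(weighted_chi2(face_a, face_b, [1.0, 1.0, 1.0]))  # 2.0
```

A smaller distance suggests the two representations belong to the same identity; the decision threshold would have to be chosen on validation data.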
37. Handong Global University
DeepFace [Taigman2014]
Feature extraction by CNN
Train a CNN-based face recognizer
Represent the input face image by the output of (N-1)th
layer
39. Handong Global University
Deep Learning in Speech Recognition
Deep Neural Networks for Acoustic Modeling in
Speech Recognition [Hinton2012]
Deep learning
40. Handong Global University
Deep Learning in Speech Recognition
CNN for speech recognition [Ossama13]
Apply a CNN to the 2D input (frames × frequency bands)
41. Handong Global University
Hangul Recognition
Challenges in Hangul Recognition
A multitude of similar characters
Missing one small stroke often results in misclassification
Ex) 에-애-얘, 괟-괱-괠-팰, 흥-홍-훙-흉
Excessive cursiveness
42. Handong Global University
Deep Learning in Hangul Recognition
Methods | SERI95a | PE92
Kim&Kim01 (structural matching) | 86.3% | 82.2%
Kang&Kim04 (structural matching) | 90.3% | 87.7%
Jang&Kim02 (structural matching + post-processing) | 93.4% | N/A
Kim&Liu11 (MQDF) | 93.71% | 85.99%
Kim (CNN) | 95.96% | 92.92%
Error reduction rates | 35.71% | 42.44%
[Kim2014]