Handong Global University
Deep Learning:
A New Trend in Artificial Intelligence / Machine Learning
In-Jung Kim
Handong Global University
2014. 11. 7.
Handong Global University
Agenda
 Introduction to Deep Learning
 Deep Learning Algorithms
 Successful Application of Deep Learning
 Q&A
Handong Global University
Machine Learning
 Learn from data
 Data-driven approach (vs. knowledge-based approach)
[Figure: a trainable framework whose parameters are learned from training samples; at run time an input is fed to the trained framework to produce a result. Examples of trainable frameworks: (deep) neural networks, Bayesian classifiers, HMM, MRF, CRF, SVM, etc.]
Handong Global University
Knowledge-based vs. Data-driven
 Knowledge-based approaches
 Intuitive
 Dependent on designer’s knowledge
 Difficult to justify or improve
 Data-driven approaches
 Learn from data
 Requires training data
 Given training data, easy to (re)build
 Difficult to understand trained model
 Given sufficient training samples,
 Data-driven approach > knowledge-based approach
Handong Global University
Neural Networks
 An artificial neural network is a mathematical
model inspired by biological neural networks.
 Its intelligence comes from the connection weights
 Connection weights are determined by learning or adaptation
[Figure: an artificial neural network with inputs x1 … xn and outputs o1 … om]
Handong Global University
Neural Networks
 A neural network is a mathematical model for learning
mappings
 Mapping from a vector to another vector (or a scalar value)
Examples)
 Pattern → class (classification)
 Independent variables → dependent variables (regression)
 Information → decision
 History → future
[Figure: a network mapping an input vector (x1 … xn) to an output vector (o1 … om)]
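To make the mapping concrete, here is a minimal sketch (not from the slides) of a small fully connected network in NumPy that maps an input vector to an output vector; the layer sizes and tanh activation are arbitrary illustrative choices.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of a fully connected network: maps an input vector
    to an output vector through successive nonlinear layers."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(W @ a + b)               # hidden layers: nonlinear mapping
    return weights[-1] @ a + biases[-1]      # output layer (linear here)

# Map a 4-dimensional input vector to a 2-dimensional output vector.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 4)), rng.standard_normal((2, 8))]
biases = [np.zeros(8), np.zeros(2)]
print(mlp_forward(rng.standard_normal(4), weights, biases))
```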
Handong Global University
Neural Networks
 Neural networks can learn probability distributions from
training samples
Examples)
 Approximate joint prob. P(X,Y), or conditional prob. P(X|Y)
 Likelihood P(x|θ), a posteriori probability P(θ|x)
 Classification, sampling, restoration
[Figure: a network trained on samples { X1, X2, … } drawn from a distribution f(X)]
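As one hedged illustration of the classification case: when a network's outputs pass through a softmax and it is trained with a cross-entropy loss, the outputs can be read as estimates of the posterior probabilities P(class|x). The scores below are made up for the example.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical output scores of a trained classifier for one input x.
# With softmax outputs and cross-entropy training, these values can be
# interpreted as estimates of the posterior probabilities P(class | x).
scores = np.array([2.1, 0.3, -1.0])
print(softmax(scores))     # roughly [0.83, 0.14, 0.04] -> class 0 is most probable
```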
Handong Global University
Deep Neural Networks
 A deep neural network (DNN) is a neural network with
multiple levels of nonlinear operations.
[Figure: a deep network — Input → Layer 1 → Layer 2 → … → Output]
Handong Global University
Network Depth and Decision Region
[Lippmann87]
[Figure: decision regions achievable at different depths — a single-layer network forms a half plane bounded by a hyperplane; a two-layer network forms convex (open or closed) regions; a three-layer network forms arbitrary regions whose complexity is limited only by the number of nodes]
Handong Global University
Why Deep Networks?
 Efficient at modeling complex functions
 Representing some functions requires sufficiently many
layers.
 Stepwise abstraction to learn high-level features
 Large capacity
 DNN can learn very well from a huge volume of samples
 Integrated learning
 DNN integrates feature extractor and classifier in a single
network
Handong Global University
Stepwise Abstraction
 Abstraction from low-level representations to high-level
representations.
 Similar to human perception process
[Figure: layer-by-layer abstraction from the input up to the output; learned features become more abstract at higher layers] [Lee12]
Handong Global University
Integrated Learning
 Deep networks optimize both feature extractor and
classifier in a unified framework.
 Conventional system
 Deep neural network
[Figure: conventional system — input → feature extractor → classifier → output; deep neural network — input → DNN → output]
Handong Global University
Challenges with Deep Networks
 Hard to optimize
 Back-propagation algorithm does not work well for deep
fully connected networks starting from random weights
 New training algorithms
 A large number of parameters
 A huge volume of training samples is now available.
 Techniques to improve generalization ability
Ex) sparse coding, virtual sample generation, dropout
 Requires heavy computation
 GPU-based massive parallel processing
 H/W implementation (SoC, FPGA)
Handong Global University
The Back-Propagation Algorithm
 Gradient descent algorithm to minimize error E.
[Figure: Input layer (X0) → Layer 1 (X1) → Layer 2 (X2) → … → Output layer (XN), connected by weights W1, W2, …, WN; features propagate forward while the error signal propagates backward]

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial net_j}\,\frac{\partial net_j}{\partial w_{ij}} = \delta_j\, x_i,
\qquad \delta_j = \frac{\partial E}{\partial net_j}$$

$$\delta_i = f'(net_i)\sum_j w_{ij}\,\delta_j$$
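A minimal NumPy sketch of one back-propagation/gradient-descent step on a tiny two-layer network with sigmoid units and squared error (illustrative only; the sizes, data, and learning rate are made up). It follows the formulas above: δ at the output nodes, δ propagated backward, and the weight update ∂E/∂w_ij = δ_j x_i.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One gradient-descent step on a tiny 2-layer network with squared error E.
rng = np.random.default_rng(0)
x = rng.standard_normal(3)                   # input (made up)
t = np.array([1.0, 0.0])                     # target output (made up)
W1 = rng.standard_normal((4, 3)) * 0.5       # layer-1 weights
W2 = rng.standard_normal((2, 4)) * 0.5       # layer-2 weights

# Forward pass
net1 = W1 @ x;  h = sigmoid(net1)
net2 = W2 @ h;  o = sigmoid(net2)

# Backward pass: delta_j = dE/dnet_j
delta2 = (o - t) * o * (1 - o)               # output nodes
delta1 = (W2.T @ delta2) * h * (1 - h)       # delta_i = f'(net_i) * sum_j w_ij delta_j

# Weight update: dE/dw_ij = delta_j * x_i
lr = 0.1
W2 -= lr * np.outer(delta2, h)
W1 -= lr * np.outer(delta1, x)
```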
Handong Global University
BP on Deep Network
 BP does not work well on deep networks
 Error signals from many nodes are blended together
 and become weak and diffuse in the bottom layers
→ the “diminishing gradient” (vanishing gradient) problem
 Error signal at a non-output node i:

$$\delta_i = f'(net_i)\sum_j w_{ij}\,\delta_j$$

[Figure: node i receives the error signals δj of the nodes j above it through the weights wij]
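A small numeric illustration (not from the slides) of why the gradient diminishes: with sigmoid units, f'(net) ≤ 0.25, so the error signal typically shrinks each time it is propagated down a layer. The network size and weight scale below are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Propagate an error signal backward through many sigmoid layers and watch
# its magnitude shrink: f'(net) = f(net)(1 - f(net)) is at most 0.25.
rng = np.random.default_rng(0)
n_nodes, n_layers = 32, 20
delta = rng.standard_normal(n_nodes)                     # error signal at the top layer

for depth in range(1, n_layers + 1):
    W = rng.standard_normal((n_nodes, n_nodes)) * 0.1    # small random weights
    net = rng.standard_normal(n_nodes)
    f_prime = sigmoid(net) * (1 - sigmoid(net))          # <= 0.25 everywhere
    delta = f_prime * (W.T @ delta)        # delta_i = f'(net_i) * sum_j w_ij delta_j
    if depth % 5 == 0:
        print(f"after {depth} layers: |delta| = {np.linalg.norm(delta):.2e}")
```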
Handong Global University
Agenda
 Introduction to Deep Learning
 Deep Learning Algorithms
 Successful Application of Deep Learning
 Q&A
Handong Global University
Breakthroughs in Deep Learning
 Conventional back-propagation algorithm does not
work well for deep fully-connected networks starting
from random weights.
 Layer-wise unsupervised pre-training algorithm
Ex) DBN[Hinton2006], stacked auto-encoders[Bengio2006]
 First, place the weights near a local optimal position by
unsupervised learning algorithm
 Then, conventional supervised learning algorithms work fine
 Network structure to prevent diminishing gradient
problem
Ex) Convolutional Neural Networks [Fukushima1980][LeCun1998]
Handong Global University
Layer-wise Unsupervised Pre-training
 Based on generative neural networks
 Training procedure
1. Pre-train each layer to reproduce the input by unsupervised
learning algorithm
2. Fine-tune the whole network by a supervised learning
algorithm
Ex) wake-sleep[Hinton2003], back-propagation
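One concrete way to realize this procedure is greedy layer-wise pre-training with tied-weight auto-encoders, sketched below in NumPy (a hedged illustration, not the authors' exact algorithm; DBN/RBM-based pre-training follows the same layer-by-layer pattern). Each layer is trained to reconstruct its input; the learned weights then initialize a feed-forward network that is fine-tuned with back-propagation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, epochs=50, lr=0.1, seed=0):
    """Pre-train one layer as a tied-weight auto-encoder:
    encode h = f(x W^T), decode x' = f(h W), minimize reconstruction error."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_hidden, X.shape[1])) * 0.1
    for _ in range(epochs):
        H = sigmoid(X @ W.T)                  # encoding (forward connections)
        R = sigmoid(H @ W)                    # decoding (backward connections)
        dR = (R - X) * R * (1 - R)            # error signal at the reconstruction
        dH = (dR @ W.T) * H * (1 - H)         # error signal at the hidden layer
        grad = H.T @ dR + dH.T @ X            # tied weights: decoder + encoder paths
        W -= lr * grad / len(X)
    return W

# Greedy layer-wise pre-training: layer 1 on the raw input,
# layer 2 on the codes produced by layer 1, and so on.
X = np.random.default_rng(1).random((200, 16))   # made-up training data
W1 = pretrain_layer(X, 8)
H1 = sigmoid(X @ W1.T)
W2 = pretrain_layer(H1, 4, seed=1)
# W1 and W2 then initialize a feed-forward network that is fine-tuned
# with a supervised algorithm such as back-propagation.
```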
Handong Global University
Generative Neural Networks
 Neural networks with forward–backward connections
[Figure: a feed-forward network (Input layer (X0) → Layer 1 (X1) → Layer 2 (X2) → … → Output layer (XN) through weights W1, W2, …, WN) compared with a forward–backward network that adds backward connections; the forward connections are used for “encoding” and the backward connections for “decoding”]
Handong Global University
Layer-wise Unsupervised Pre-training
 Starting from the bottom layer, train each layer to
reproduce its input
Input → encoding → hidden representation → decoding → reproduction of the input
[Figure: forward–backward network; in the 1st phase the bottom layer (weights W1) is trained to reproduce the input X0]
Handong Global University
Layer-wise Unsupervised Pre-training
 Starting from the bottom layer, train each layer to
reproduce its input
Input → encoding → hidden representation → decoding → reproduction of the input
[Figure: the same forward–backward network; in the 2nd phase the next layer (weights W2) is trained in the same way on the codes produced by the layer below]
Handong Global University
Convolutional Neural Networks
 Neocognitron [Fukushima80]
 Designed to imitate visual processing in humans and animals
 Introduced the basic concept and network structure of the CNN
 LeNet [LeCun98]
 Simplified node and network structure
 Gradient-based learning
 Many improvements and extensions
 [Simard2003], [Ciresan2011]
 Convolutional DBN [Lee2009]
 Siamese network [Chopra2005]
 Locally connected network [Taigman2014]
 Fast training algorithm using FFT [Mathieu2014]
Handong Global University
Convolutional Neural Networks
 Composed of many heterogeneous layers
 Convolution layers – feature extraction
 Max-pooling layers – feature abstraction
 Fully-connected layers – classification
Handong Global University
Convolution Layers
 Odd-numbered layers in the low/middle levels of a CNN
 Nodes on each layer are grouped into 2D planes (feature maps)
 Each plane is connected to one or more input planes
 Each node computes a weighted sum of the input nodes in a small region
 All nodes on a plane share the same weight set
 Extract features by a convolution operation
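A hedged NumPy sketch of what a single convolution plane computes: every output node is a weighted sum over a small region of the input plane, and all nodes share the same weight mask (the edge-detecting mask and the random image below are made-up examples).

```python
import numpy as np

def conv2d_valid(plane, mask):
    """One convolution plane: every output node computes a weighted sum
    over a small region of the input plane, and all nodes share the same
    weight mask (parameter tying)."""
    H, W = plane.shape
    kh, kw = mask.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(plane[i:i + kh, j:j + kw] * mask)
    return out

# Example: a 3x3 vertical-edge mask applied to an 8x8 input plane.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
mask = np.array([[1., 0., -1.],
                 [1., 0., -1.],
                 [1., 0., -1.]])
feature_map = conv2d_valid(image, mask)   # 6x6 feature map
```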
Handong Global University
Max-Pooling Layers
 Even-numbered layers in the low/middle levels of a CNN
 Nodes on each layer are grouped into planes
 Each plane is connected to only one input plane
 Each node chooses the maximum among the input nodes in a small region
 Abstracts features
 Reduces feature dimension
 Ignores small positional variations of feature elements
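A short sketch of non-overlapping 2×2 max pooling in NumPy (an illustration; real CNNs may use other window sizes and strides): each output keeps only the maximum of its region, which reduces the feature dimension and absorbs small positional shifts.

```python
import numpy as np

def max_pool(plane, size=2):
    """Non-overlapping max pooling: each output node keeps only the maximum
    of a small region, reducing dimension and absorbing small shifts."""
    H, W = plane.shape
    H, W = H - H % size, W - W % size            # trim to a multiple of `size`
    blocks = plane[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

feature_map = np.arange(36, dtype=float).reshape(6, 6)   # made-up feature map
print(max_pool(feature_map))   # 3x3 map of the block-wise maxima
```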
Handong Global University
Fully-connected Layers
 The top 2–3 layers of a CNN
 1D structure
 Each node is fully connected to all input nodes
 Each node computes a weighted sum of all input nodes
 Classify the input pattern using the high-level features
extracted by the previous layers
Handong Global University
Gradient-based Learning [LeCun98]
 Trains the whole network to minimize a single error
function E.
 At layer n
 At layer n-1
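A standard form of these per-layer equations, consistent with the back-propagation slide above (stated here as an assumption rather than taken from the slide), is:

$$\frac{\partial E}{\partial W_n} = \delta_n\, X_{n-1}^{\top},
\qquad \delta_n = \frac{\partial E}{\partial net_n}$$

$$\delta_{n-1} = f'(net_{n-1}) \odot \left(W_n^{\top} \delta_n\right)$$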
[Figure: Input layer (X0) → Layer 1 (X1) → Layer 2 (X2) → … → Output layer (XN) with weights W1, W2, …, WN]
Handong Global University
Why Do CNNs Work Well?
 The network structure effectively guides learning from 2D
images while preventing the diminishing gradient problem
 Sparse connection
 Parameter tying
 Good at catching 2D structures
 Training of convolution masks is effective to learn feature
extraction
 Good at handling shape variation
 Abstraction in phases
 Max pooling
 Directly train the network to minimize classification error.
Handong Global University
Agenda
 Introduction to Deep Learning
 Deep Learning Algorithms
 Successful Application of Deep Learning
 Q&A
Handong Global University
Handwritten Digit Recognition (MNIST DB)
 Support Vector Machines
 Neural Nets
 Convolutional Neural Networks (Deep Networks)
Handong Global University
Chinese Character Recognition (CASIA DB)
 ICDAR 2013 Competition Result [Yin et al. 2013]
[Chart: competition results — the top-ranked system is based on a CNN]
Handong Global University
Object Image Recognition
 ImageNet Large Scale Visual Recognition Challenge
2013 (ILSVRC2013, http://www.image-net.org)
 1,000 object categories
 Training set: 1,281,167 images
 Validation set: 50,000 images
 Test set: 100,000 images
Handong Global University
Examples of ILSVRC2013 Images
Handong Global University
ILSVRC2013 Results
All of the top-ranked entries are based on CNNs
Handong Global University
Deep Learning in Face Recognition
 Face recognition flow
1. Detection
2. Alignment (pre-processing)
3. Representation (feature extraction) → CNN
 Robust to variation in lighting, expression, …
 Alternatives: LBP + PCA/FDA
4. Verification / Classification → Siamese network
 Alternatives: Euclidean distance, dot product, χ² distance, SVM,
…
Handong Global University
DeepFace [Taigman2014]
 Facebook AI group, Tel Aviv Univ.
 Y. Taigman et al.
 Achieved 97.25% on LFW dataset.
cf. Conventional best performance: 96.33% [Cao 2013]
 Face recognition procedure
 2D and 3D alignment
 CNN-based representation
 Verification by weighted χ² distance and a Siamese network
 A huge volume of training data
 SFC dataset (4,000 identities × 1,000 samples per identity)
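A hedged sketch of the weighted χ² similarity used to compare two face descriptors in the verification step (the feature vectors, dimensionality, weights, and threshold below are placeholders; in the paper the weights are learned, e.g. with a linear SVM):

```python
import numpy as np

def weighted_chi2(f1, f2, w, eps=1e-8):
    """Weighted chi-squared distance between two non-negative feature vectors,
    as used for face verification; smaller means more similar."""
    return np.sum(w * (f1 - f2) ** 2 / (f1 + f2 + eps))

# Hypothetical 8-dimensional face descriptors (assumed non-negative, as for
# features taken from a ReLU layer).
rng = np.random.default_rng(0)
f1, f2 = rng.random(8), rng.random(8)
w = np.ones(8)                       # placeholder weights; learned in practice
same_person = weighted_chi2(f1, f2, w) < 1.0   # threshold is illustrative only
```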
Handong Global University
DeepFace [Taigman2014]
 Feature extraction by CNN
 Train a CNN-based face recognizer
 Represent the input face image by the output of the (N-1)th
layer
Handong Global University
Deep Learning in Speech Recognition
 Hybrid system (HMM + DNN)
Handong Global University
Deep Learning in Speech Recognition
 Deep Neural Networks for Acoustic Modeling in
Speech Recognition [Hinton2012]
Handong Global University
Deep Learning in Speech Recognition
 CNN for speech recognition [Ossama13]
 Apply a CNN to a 2D representation (time frames × frequency bands)
Handong Global University
Hangul Recognition
 Challenges in Hangul Recognition
 A multitude of similar characters
 Missing one small stroke often results in misclassification
Ex) 에-애–얘, 괟-괱-괠-팰, 흥-홍-훙-흉
 Excessive cursiveness
Handong Global University
Deep Learning in Hangul Recognition
Recognition accuracy on handwritten Hangul databases [Kim2014]

Method       Approach                                SERI95a    PE92
Kim&Kim01    Structural matching                     86.3%      82.2%
Kang&Kim04   Structural matching                     90.3%      87.7%
Jang&Kim02   Structural matching + post-processing   93.4%      N/A
Kim&Liu11    MQDF                                    93.71%     85.99%
Kim          CNN                                     95.96%     92.92%
Error reduction rate of the CNN                      35.71%     42.44%
Handong Global University
Q&A