Handong Global University
Deep Learning:
A New Trend in Artificial Intelligence / Machine Learning
In-Jung Kim
Handong Global University
2014. 11. 7.
Handong Global University
Agenda
 Introduction to Deep Learning
 Deep Learning Algorithms
 Successful Application of Deep Learning
 Q&A
Handong Global University
Machine Learning
 Learn from data
 Data-driven approach (vs. knowledge-based approach)
[Figure: a trainable framework whose parameters are learned from training samples; at run time an input is fed to the trained framework to produce a result. Examples of trainable frameworks: (deep) neural networks, Bayesian classifiers, HMM, MRF, CRF, SVM, etc.]
Handong Global University
Knowledge-based vs. Data-driven
 Knowledge-based approaches
 Intuitive
 Dependent on designer’s knowledge
 Difficult to justify or improve
 Data-driven approaches
 Learn from data
 Requires training data
 Given training data, easy to (re)build
 Difficult to understand trained model
 Given sufficient training samples,
 Data-driven approach > knowledge-based approach
Handong Global University
Neural Networks
 An artificial neural network is a mathematical
model inspired by biological neural networks.
 Its intelligence comes from the connection weights
 Connection weights are determined by learning or adaptation
[Figure: an artificial neural network with inputs x1 … xn and outputs o1 … om]
Handong Global University
Neural Networks
 A neural network is a mathematical model for learning
mappings
 Mapping from a vector to another vector (or a scalar value)
Examples)
 Pattern → class (classification)
 Independent variables → dependent variables (regression)
 Information → decision
 History → future
[Figure: a network mapping an input vector (x1 … xn) to an output vector (o1 … om)]
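To make the mapping concrete, here is a minimal sketch (not from the slides) of a small fully connected network in NumPy that maps an input vector to an output vector; the layer sizes and tanh activation are arbitrary illustrative choices.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of a fully connected network: maps an input vector
    to an output vector through successive nonlinear layers."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(W @ a + b)               # hidden layers: nonlinear mapping
    return weights[-1] @ a + biases[-1]      # output layer (linear here)

# Map a 4-dimensional input vector to a 2-dimensional output vector.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 4)), rng.standard_normal((2, 8))]
biases = [np.zeros(8), np.zeros(2)]
print(mlp_forward(rng.standard_normal(4), weights, biases))
```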
Handong Global University
Neural Networks
 Neural networks can learn probability distributions from
training samples
Examples)
 Approximate joint prob. P(X,Y), or conditional prob. P(X|Y)
 Likelihood P(x|θ), a posteriori probability P(θ|x)
 Classification, sampling, restoration
[Figure: a network trained on samples { X1, X2, … } drawn from a distribution f(X)]
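As one hedged illustration of the classification case: when a network's outputs pass through a softmax and it is trained with a cross-entropy loss, the outputs can be read as estimates of the posterior probabilities P(class|x). The scores below are made up for the example.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical output scores of a trained classifier for one input x.
# With softmax outputs and cross-entropy training, these values can be
# interpreted as estimates of the posterior probabilities P(class | x).
scores = np.array([2.1, 0.3, -1.0])
print(softmax(scores))     # roughly [0.83, 0.14, 0.04] -> class 0 is most probable
```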
Handong Global University
Deep Neural Networks
 A deep neural network (DNN) is a neural network with
multiple levels of nonlinear operations.
[Figure: a deep network — Input → Layer 1 → Layer 2 → … → Output]
Handong Global University
Network Depth and Decision Region
[Lippmann87]
[Figure: decision regions achievable at different depths — a single-layer network forms a half plane bounded by a hyperplane; a two-layer network forms convex (open or closed) regions; a three-layer network forms arbitrary regions whose complexity is limited only by the number of nodes]
Handong Global University
Why Deep Networks?
 Efficient at modeling complex functions
 Representing some functions requires sufficiently many
layers.
 Stepwise abstraction to learn high-level features
 Large capacity
 DNN can learn very well from a huge volume of samples
 Integrated learning
 DNN integrates feature extractor and classifier in a single
network
Handong Global University
Stepwise Abstraction
 Abstraction from low-level representations to high-level
representations.
 Similar to human perception process
[Figure: layer-by-layer abstraction from the input up to the output; learned features become more abstract at higher layers] [Lee12]
Handong Global University
Integrated Learning
 Deep networks optimize both feature extractor and
classifier in a unified framework.
 Conventional system
 Deep neural network
[Figure: conventional system — input → feature extractor → classifier → output; deep neural network — input → DNN → output]
Handong Global University
Challenges with Deep Networks
 Hard to optimize
 Back-propagation algorithm does not work well for deep
fully connected networks starting from random weights
 New training algorithms
 A large number of parameters
 A huge volume of training samples is now available.
 Techniques to improve generalization ability
Ex) sparse coding, virtual sample generation, dropout
 Requires heavy computation
 GPU-based massive parallel processing
 H/W implementation (SoC, FPGA)
Handong Global University
The Back-Propagation Algorithm
 Gradient descent algorithm to minimize error E.
[Figure: Input layer (X0) → Layer 1 (X1) → Layer 2 (X2) → … → Output layer (XN), connected by weights W1, W2, …, WN; features propagate forward while the error signal propagates backward]

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial net_j}\,\frac{\partial net_j}{\partial w_{ij}} = \delta_j\, x_i,
\qquad \delta_j = \frac{\partial E}{\partial net_j}$$

$$\delta_i = f'(net_i)\sum_j w_{ij}\,\delta_j$$
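A minimal NumPy sketch of one back-propagation/gradient-descent step on a tiny two-layer network with sigmoid units and squared error (illustrative only; the sizes, data, and learning rate are made up). It follows the formulas above: δ at the output nodes, δ propagated backward, and the weight update ∂E/∂w_ij = δ_j x_i.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One gradient-descent step on a tiny 2-layer network with squared error E.
rng = np.random.default_rng(0)
x = rng.standard_normal(3)                   # input (made up)
t = np.array([1.0, 0.0])                     # target output (made up)
W1 = rng.standard_normal((4, 3)) * 0.5       # layer-1 weights
W2 = rng.standard_normal((2, 4)) * 0.5       # layer-2 weights

# Forward pass
net1 = W1 @ x;  h = sigmoid(net1)
net2 = W2 @ h;  o = sigmoid(net2)

# Backward pass: delta_j = dE/dnet_j
delta2 = (o - t) * o * (1 - o)               # output nodes
delta1 = (W2.T @ delta2) * h * (1 - h)       # delta_i = f'(net_i) * sum_j w_ij delta_j

# Weight update: dE/dw_ij = delta_j * x_i
lr = 0.1
W2 -= lr * np.outer(delta2, h)
W1 -= lr * np.outer(delta1, x)
```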
Handong Global University
BP on Deep Network
 BP does not work well on deep networks
 Error signals from many nodes are blended together
 and become weak and diffuse in the bottom layers
→ the “diminishing gradient” (vanishing gradient) problem
 Error signal at a non-output node i:

$$\delta_i = f'(net_i)\sum_j w_{ij}\,\delta_j$$

[Figure: node i receives the error signals δj of the nodes j above it through the weights wij]
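A small numeric illustration (not from the slides) of why the gradient diminishes: with sigmoid units, f'(net) ≤ 0.25, so the error signal typically shrinks each time it is propagated down a layer. The network size and weight scale below are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Propagate an error signal backward through many sigmoid layers and watch
# its magnitude shrink: f'(net) = f(net)(1 - f(net)) is at most 0.25.
rng = np.random.default_rng(0)
n_nodes, n_layers = 32, 20
delta = rng.standard_normal(n_nodes)                     # error signal at the top layer

for depth in range(1, n_layers + 1):
    W = rng.standard_normal((n_nodes, n_nodes)) * 0.1    # small random weights
    net = rng.standard_normal(n_nodes)
    f_prime = sigmoid(net) * (1 - sigmoid(net))          # <= 0.25 everywhere
    delta = f_prime * (W.T @ delta)        # delta_i = f'(net_i) * sum_j w_ij delta_j
    if depth % 5 == 0:
        print(f"after {depth} layers: |delta| = {np.linalg.norm(delta):.2e}")
```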
Handong Global University
Agenda
 Introduction to Deep Learning
 Deep Learning Algorithms
 Successful Application of Deep Learning
 Q&A
Handong Global University
Breakthroughs in Deep Learning
 Conventional back-propagation algorithm does not
work well for deep fully-connected networks starting
from random weights.
 Layer-wise unsupervised pre-training algorithm
Ex) DBN[Hinton2006], stacked auto-encoders[Bengio2006]
 First, place the weights near a local optimal position by
unsupervised learning algorithm
 Then, conventional supervised learning algorithms work fine
 Network structure to prevent diminishing gradient
problem
Ex) Convolutional Neural Networks [Fukushima1980][LeCun1998]
Handong Global University
Layer-wise Unsupervised Pre-training
 Based on generative neural networks
 Training procedure
1. Pre-train each layer to reproduce the input by unsupervised
learning algorithm
2. Fine-tune the whole network by a supervised learning
algorithm
Ex) wake-sleep[Hinton2003], back-propagation
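One concrete way to realize this procedure is greedy layer-wise pre-training with tied-weight auto-encoders, sketched below in NumPy (a hedged illustration, not the authors' exact algorithm; DBN/RBM-based pre-training follows the same layer-by-layer pattern). Each layer is trained to reconstruct its input; the learned weights then initialize a feed-forward network that is fine-tuned with back-propagation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, epochs=50, lr=0.1, seed=0):
    """Pre-train one layer as a tied-weight auto-encoder:
    encode h = f(x W^T), decode x' = f(h W), minimize reconstruction error."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_hidden, X.shape[1])) * 0.1
    for _ in range(epochs):
        H = sigmoid(X @ W.T)                  # encoding (forward connections)
        R = sigmoid(H @ W)                    # decoding (backward connections)
        dR = (R - X) * R * (1 - R)            # error signal at the reconstruction
        dH = (dR @ W.T) * H * (1 - H)         # error signal at the hidden layer
        grad = H.T @ dR + dH.T @ X            # tied weights: decoder + encoder paths
        W -= lr * grad / len(X)
    return W

# Greedy layer-wise pre-training: layer 1 on the raw input,
# layer 2 on the codes produced by layer 1, and so on.
X = np.random.default_rng(1).random((200, 16))   # made-up training data
W1 = pretrain_layer(X, 8)
H1 = sigmoid(X @ W1.T)
W2 = pretrain_layer(H1, 4, seed=1)
# W1 and W2 then initialize a feed-forward network that is fine-tuned
# with a supervised algorithm such as back-propagation.
```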
Handong Global University
Generative Neural Networks
 Neural networks with forward–backward connections
[Figure: a feed-forward network (Input layer (X0) → Layer 1 (X1) → Layer 2 (X2) → … → Output layer (XN) through weights W1, W2, …, WN) compared with a forward–backward network that adds backward connections; the forward connections are used for “encoding” and the backward connections for “decoding”]
Handong Global University
Layer-wise Unsupervised Pre-training
 Starting from the bottom layer, train each layer to
reproduce its input
Input → encoding → hidden representation → decoding → reproduction of the input
[Figure: forward–backward network; in the 1st phase the bottom layer (weights W1) is trained to reproduce the input X0]
Handong Global University
Layer-wise Unsupervised Pre-training
 Starting from the bottom layer, train each layer to
reproduce its input
Input → encoding → hidden representation → decoding → reproduction of the input
[Figure: the same forward–backward network; in the 2nd phase the next layer (weights W2) is trained in the same way on the codes produced by the layer below]
Handong Global University
Convolutional Neural Networks
 Neocognitron [Fukushima80]
 Designed to imitate visual processing in humans and animals
 Introduced the basic concept and network structure of the CNN
 LeNet [LeCun98]
 Simplified node and network structure
 Gradient-based learning
 Many improvements and extensions
 [Simard2003], [Ciresan2011]
 Convolutional DBN [Lee2009]
 Siamese network [Chopra2005]
 Locally connected network [Taigman2014]
 Fast training algorithm using FFT [Mathieu2014]
Handong Global University
Convolutional Neural Networks
 Composed of many heterogeneous layers
 Convolution layers – feature extraction
 Max-pooling layers – feature abstraction
 Fully-connected layers – classification
Handong Global University
Convolution Layers
 Odd-numbered layers in the low/middle levels of a CNN
 Nodes on each layer are grouped into 2D planes (feature maps)
 Each plane is connected to one or more input planes
 Each node computes a weighted sum of the input nodes in a small region
 All nodes on a plane share the same weight set
 Extract features by a convolution operation
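A hedged NumPy sketch of what a single convolution plane computes: every output node is a weighted sum over a small region of the input plane, and all nodes share the same weight mask (the edge-detecting mask and the random image below are made-up examples).

```python
import numpy as np

def conv2d_valid(plane, mask):
    """One convolution plane: every output node computes a weighted sum
    over a small region of the input plane, and all nodes share the same
    weight mask (parameter tying)."""
    H, W = plane.shape
    kh, kw = mask.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(plane[i:i + kh, j:j + kw] * mask)
    return out

# Example: a 3x3 vertical-edge mask applied to an 8x8 input plane.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
mask = np.array([[1., 0., -1.],
                 [1., 0., -1.],
                 [1., 0., -1.]])
feature_map = conv2d_valid(image, mask)   # 6x6 feature map
```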
Handong Global University
Max-Pooling Layers
 Even-numbered layers in the low/middle levels of a CNN
 Nodes on each layer are grouped into planes
 Each plane is connected to only one input plane
 Each node chooses the maximum among the input nodes in a small region
 Abstracts features
 Reduces feature dimension
 Ignores small positional variations of feature elements
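A short sketch of non-overlapping 2×2 max pooling in NumPy (an illustration; real CNNs may use other window sizes and strides): each output keeps only the maximum of its region, which reduces the feature dimension and absorbs small positional shifts.

```python
import numpy as np

def max_pool(plane, size=2):
    """Non-overlapping max pooling: each output node keeps only the maximum
    of a small region, reducing dimension and absorbing small shifts."""
    H, W = plane.shape
    H, W = H - H % size, W - W % size            # trim to a multiple of `size`
    blocks = plane[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

feature_map = np.arange(36, dtype=float).reshape(6, 6)   # made-up feature map
print(max_pool(feature_map))   # 3x3 map of the block-wise maxima
```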
Handong Global University
Fully-connected Layers
 The top 2–3 layers of a CNN
 1D structure
 Each node is fully connected to all input nodes
 Each node computes a weighted sum of all input nodes
 Classify the input pattern using the high-level features
extracted by the previous layers
Handong Global University
Gradient-based Learning [LeCun98]
 Trains the whole network to minimize a single error
function E.
 At layer n
 At layer n-1
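A standard form of these per-layer equations, consistent with the back-propagation slide above (stated here as an assumption rather than taken from the slide), is:

$$\frac{\partial E}{\partial W_n} = \delta_n\, X_{n-1}^{\top},
\qquad \delta_n = \frac{\partial E}{\partial net_n}$$

$$\delta_{n-1} = f'(net_{n-1}) \odot \left(W_n^{\top} \delta_n\right)$$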
[Figure: Input layer (X0) → Layer 1 (X1) → Layer 2 (X2) → … → Output layer (XN) with weights W1, W2, …, WN]
Handong Global University
Why Do CNNs Work Well?
 The network structure effectively guides learning from 2D
images while preventing the diminishing gradient problem
 Sparse connection
 Parameter tying
 Good at catching 2D structures
 Training of convolution masks is effective to learn feature
extraction
 Good at handling shape variation
 Abstraction in phases
 Max pooling
 Directly train the network to minimize classification error.
Handong Global University
Agenda
 Introduction to Deep Learning
 Deep Learning Algorithms
 Successful Application of Deep Learning
 Q&A
Handong Global University
Handwritten Digit Recognition (MNIST DB)
 Support Vector Machines
 Neural Nets
 Convolutional Neural Networks (Deep Networks)
Handong Global University
Chinese Character Recognition (CASIA DB)
 ICDAR 2013 Competition Result [Yin et al. 2013]
[Chart: competition results — the top-ranked system is based on a CNN]
Handong Global University
Object Image Recognition
 ImageNet Large Scale Visual Recognition Challenge
2013 (ILSVRC2013, http://www.image-net.org)
 1,000 object categories
 Training set: 1,281,167 images
 Validation set: 50,000 images
 Test set: 100,000 images
Handong Global University
Examples of ILSVRC2013 Images
Handong Global University
ILSVRC2013 Results
All of the top-ranked entries are based on CNNs
Handong Global University
Deep Learning in Face Recognition
 Face recognition flow
1. Detection
2. Alignment (pre-processing)
3. Representation (feature extraction) → CNN
 Robust to variation in lighting, expression, …
 Alternatives: LBP + PCA/FDA
4. Verification / Classification → Siamese network
 Alternatives: Euclidean distance, dot product, χ² distance, SVM,
…
Handong Global University
DeepFace [Taigman2014]
 Facebook AI group, Tel Aviv Univ.
 Y. Taigman et al.
 Achieved 97.25% on LFW dataset.
cf. Conventional best performance: 96.33% [Cao 2013]
 Face recognition procedure
 2D and 3D alignment
 CNN-based representation
 Verification by weighted χ² distance and a Siamese network
 A huge volume of training data
 SFC dataset (4,000 identities × 1,000 samples per identity)
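A hedged sketch of the weighted χ² similarity used to compare two face descriptors in the verification step (the feature vectors, dimensionality, weights, and threshold below are placeholders; in the paper the weights are learned, e.g. with a linear SVM):

```python
import numpy as np

def weighted_chi2(f1, f2, w, eps=1e-8):
    """Weighted chi-squared distance between two non-negative feature vectors,
    as used for face verification; smaller means more similar."""
    return np.sum(w * (f1 - f2) ** 2 / (f1 + f2 + eps))

# Hypothetical 8-dimensional face descriptors (assumed non-negative, as for
# features taken from a ReLU layer).
rng = np.random.default_rng(0)
f1, f2 = rng.random(8), rng.random(8)
w = np.ones(8)                       # placeholder weights; learned in practice
same_person = weighted_chi2(f1, f2, w) < 1.0   # threshold is illustrative only
```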
Handong Global University
DeepFace [Taigman2014]
 Feature extraction by CNN
 Train a CNN-based face recognizer
 Represent the input face image by the output of the (N-1)th
layer
Handong Global University
Deep Learning in Speech Recognition
 Hybrid system (HMM + DNN)
Handong Global University
Deep Learning in Speech Recognition
 Deep Neural Networks for Acoustic Modeling in
Speech Recognition [Hinton2012]
Handong Global University
Deep Learning in Speech Recognition
 CNN for speech recognition [Ossama13]
 Apply a CNN to a 2D representation (time frames × frequency bands)
Handong Global University
Hangul Recognition
 Challenges in Hangul Recognition
 A multitude of similar characters
 Missing one small stroke often results in misclassification
Ex) 에-애–얘, 괟-괱-괠-팰, 흥-홍-훙-흉
 Excessive cursiveness
Handong Global University
Deep Learning in Hangul Recognition
Recognition accuracy on handwritten Hangul databases [Kim2014]

Method       Approach                                SERI95a    PE92
Kim&Kim01    Structural matching                     86.3%      82.2%
Kang&Kim04   Structural matching                     90.3%      87.7%
Jang&Kim02   Structural matching + post-processing   93.4%      N/A
Kim&Liu11    MQDF                                    93.71%     85.99%
Kim          CNN                                     95.96%     92.92%
Error reduction rate of the CNN                      35.71%     42.44%
Handong Global University
Q&A