Deep Learning
Lecture (1)
19.10.22 You Sung Min
Bengio, Yoshua, Ian Goodfellow, and Aaron Courville. Deep Learning. Vol. 1. MIT Press, 2017.
0. Introduction
1. Why neural networks?
1. What is a neural network?
2. Universal approximation theorem
3. Why deep neural network?
2. How the network learns
1. Gradient descent
2. Backpropagation
3. Modern deep learning
1. Convolutional neural network
2. Recurrent neural network
Contents
Example of deep learning model
Introduction
Image source : Zeiler & Fergus, 2014
Artificial intelligence
Introduction
History of deep learning
Introduction
Timeline: Biological learning (1943) → Perceptron (1958) → Stochastic gradient descent (1960) → Neocognitron (1980) → Backpropagation, distributed representation (1986) → LSTM (1997) → Deep learning (2006)
History of deep learning
 Size of dataset
Introduction
History of deep learning
 Connections per neuron
Introduction
10: GoogLeNet (2014)
History of deep learning
 Number of neurons
Introduction
1: Perceptron … 20: GoogLeNet
Structure of perceptron (Developed in 1950s)
Why neural networks?
Binary inputs $x_j$, weights $\omega_j$, and threshold $T$:

$$\text{output} = \begin{cases} 0 & \text{if } \sum_j \omega_j x_j \le T \\ 1 & \text{if } \sum_j \omega_j x_j > T \end{cases}$$

equivalently, output 0 if $\sum_j \omega_j x_j - T \le 0$ or 1 if $\sum_j \omega_j x_j - T > 0$

With bias $b = -T$: $z = \sum_j \omega_j x_j + b$, output $y = \phi(z)$, where $\phi$ is called the activation ftn.

Output of a single neuron: $y = \phi\big(\sum_j \omega_j x_j + b\big)$
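A minimal NumPy sketch of this single neuron; the input, weights, and threshold values below are illustrative choices, not from the slides:

```python
import numpy as np

def perceptron(x, w, b):
    """Single perceptron: y = 1 if sum_j w_j * x_j + b > 0, else 0."""
    z = np.dot(w, x) + b           # weighted sum plus bias (b = -T)
    return 1 if z > 0 else 0

# Illustrative values: three binary inputs, hand-picked weights.
x = np.array([1, 0, 1])
w = np.array([0.6, 0.4, 0.3])
b = -0.5                           # bias, i.e., the negative threshold T
print(perceptron(x, w, b))         # -> 1, since 0.6 + 0.3 - 0.5 > 0
```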
Multilayer perceptron (MLP)
Why neural networks?
With $\omega^{(l)}$ and $b^{(l)}$ the weights and biases of layer $l$:

$y_j^{(1)} = \phi\big(\sum_i \omega_{ji}^{(1)} x_i + b_j^{(1)}\big)$

$y_j^{(2)} = \phi\big(\sum_i \omega_{ji}^{(2)} y_i^{(1)} + b_j^{(2)}\big)$

$y^{(3)} = \phi\big(\sum_i \omega_i^{(3)} y_i^{(2)} + b^{(3)}\big)$

Output of a network:

$F(x) = \phi\Big(\sum_i \omega_i^{(3)}\, \phi\big(\sum_k \omega_{ik}^{(2)}\, \phi\big(\sum_l \omega_{kl}^{(1)} x_l + b_k^{(1)}\big) + b_i^{(2)}\big) + b^{(3)}\Big)$
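A hedged NumPy sketch of this stacked forward pass; the layer sizes, random weights, and the sigmoid choice of $\phi$ are all illustrative assumptions:

```python
import numpy as np

def phi(z):
    """Sigmoid activation (one common choice for the activation ftn.)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 1]               # input, two hidden layers, output (illustrative)
Ws = [rng.normal(size=(m, n)) for m, n in zip(sizes[1:], sizes[:-1])]
bs = [rng.normal(size=m) for m in sizes[1:]]

def forward(x):
    """F(x) = phi(W3 @ phi(W2 @ phi(W1 @ x + b1) + b2) + b3)."""
    y = x
    for W, b in zip(Ws, bs):
        y = phi(W @ y + b)         # one layer: affine map, then activation
    return y

print(forward(np.array([0.5, -1.0, 2.0])))
```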
Universal approximation theorem
⇒ On any compact subset of $\mathbb{R}^n$, any continuous function $f$ can be
approximated by a feedforward neural network
with at least one hidden layer
⇒ A neural network with a single hidden layer can approximate any
continuous multivariate function to any desired accuracy
Why neural networks?
$F(x) = \sum_{i=1}^{N} v_i\, \varphi\big(W_i^T x + b_i\big)$, where $\varphi: \mathbb{R} \to \mathbb{R}$ is a nonconstant, bounded, continuous function

$|F(x) - f(x)| < \epsilon$ for all $x$ in a compact subset of $\mathbb{R}^M$
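A rough numerical illustration of this form, assuming tanh for $\varphi$ and fitting only the output weights $v_i$ by least squares; the target $\sin(x)$, the unit count $N$, and the sampling ranges are arbitrary choices, not part of the theorem:

```python
import numpy as np

# Approximate f(x) = sin(x) with F(x) = sum_i v_i * phi(w_i * x + b_i).
# Hidden weights/biases are sampled at random and only v is fitted --
# a sketch of the theorem's statement, not a training recipe.
rng = np.random.default_rng(0)
N = 50                                   # number of hidden units (illustrative)
x = np.linspace(-np.pi, np.pi, 200)
f = np.sin(x)

w = rng.normal(scale=2.0, size=N)
b = rng.uniform(-np.pi, np.pi, size=N)
H = np.tanh(np.outer(x, w) + b)          # phi: bounded, nonconstant, continuous

v, *_ = np.linalg.lstsq(H, f, rcond=None)
F = H @ v
print(np.max(np.abs(F - f)))             # max |F - f| over the grid; typically small
```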
Universal approximation theorem
⇒ Regardless of what function we are trying to learn,
a large MLP will be able to represent that function
But it is not guaranteed that the training algorithm is able to
learn that function:
1. The optimization algorithm may fail to find the right parameters
(weights)
2. The training algorithm might choose the wrong function
due to overfitting (failing to generalize)
: There is no universal procedure to train and generalize
a function (no free lunch theorem; Wolpert, 1996)
Why neural networks?
Universal approximation theorem
⇒ A feedforward network with a single hidden layer is sufficient to
represent any function, but the layer may be infeasibly large and may
fail to learn and generalize correctly
 Why deep neural network?
In many cases, a deeper model can reduce both the required number
of units (neurons) and the generalization error
Why neural networks?
Why deep neural network?
Effect of depth (Goodfellow et al., 2014)
 Street View House Numbers (SVHN) database
Why neural networks?
[Figure: test accuracy vs. number of layers]
Goodfellow, Ian J., et al. "Multi-digit number recognition from street view imagery using
deep convolutional neural networks." arXiv preprint arXiv:1312.6082 (2013)
Why deep neural network?
Curse of dimensionality (→ a statistical challenge)
Let the dimension of the data space be d
and the number of samples required for inference be n
Generally, in practical tasks: $d \gg n$ (far more dimensions than samples)
Why neural networks?
Image source : Nicolas Chapados
[Figure: as the dimension grows (d = 10, 10², 10³), the number of required samples grows rapidly: n₁ < n₂ ≪ n₃]
Why deep neural network?
Local constancy prior (smoothness prior)
 For an input sample $x$ and a small perturbation $\epsilon$,
a well-trained function $f^*$ should satisfy
Why neural networks?
$f^*(x) \approx f^*(x + \epsilon)$
Why deep neural network?
Local constancy prior (smoothness prior)
Models with local kernels at the samples:
$O(k)$ samples are required to distinguish $O(k)$ regions
Deep learning spans the data into subspaces
(distributed representation)
The data are generated by a composition of factors (or
features), potentially at multiple levels in a hierarchy
Why neural networks?
Voronoi diagram
(nearest-neighbor regions)
Why deep neural network?
Manifold hypothesis
Manifold: a connected set of points that can be
approximated well by considering only a small
number of degrees of freedom (or dimensions)
embedded in a higher-dimensional space
Why neural networks?
Why deep neural network?
Manifold hypothesis
Real-world data (sound, images, text, etc.) are highly
concentrated in the data space
Why neural networks?
Random samples in the image space
Why deep neural network?
Manifold hypothesis
Even though the data space is $\mathbb{R}^n$, we don't have to
consider the whole space
We may consider only the neighborhoods of the observed
samples along certain manifolds
Transformations may exist along a manifold
(for example, intensity changes in images)
 Manifolds related to human faces and those related to cats
may differ
Why neural networks?
Why deep neural network?
Manifold hypothesis
Why neural networks?
Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with
deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015)
Why deep neural network?
 Non-linear transform by learning
Linear model: a linear combination of the input $X$
⇒ a linear model with a non-linear transform $\phi(X)$ as
its input
Finding an optimal $\phi(X)$:
Previously: human-knowledge-based transforms
(i.e., handcrafted features)
Deep learning: the transform is learned inside the network
$y = f(x; \theta, \omega) = \phi(x; \theta)^T \omega$
Why neural networks?
Why deep neural network?
Why neural networks?
A hidden layer
$y = f(x; \theta, \omega) = \phi(x; \theta)^T \omega$
Why deep neural network?
Summary
Curse of dimensionality
Local constancy prior
Manifold hypothesis
Nonlinear transform by learning
The dimension of the data space can
be reduced to subsets of manifolds
The required decision regions
can be spanned by subspaces formed
as compositions of factors
Why neural networks?
Learning of the network
To approximate a function $f^*$:
Classifier: $y = f^*(x)$, where $y_i \in$ a finite set
Regression: $y = f^*(x)$, where $y_i \in \mathbb{R}^d$
 A network defines a mapping $y = f(x; \theta)$ and
learns the parameters $\theta$ that approximate the function $f^*$
Due to the non-linearity, global optimization
algorithms (such as convex optimization) are not suitable for
deep learning → iteratively update the parameters to reduce a cost function $C$:
Gradient descent
Backpropagation
How the network learns
Learning of the network
Gradient descent
How the network learns
[Figure: gradient descent on $f_1: \mathbb{R} \to \mathbb{R}$ and on $f_2: \mathbb{R}^n \to \mathbb{R}$]
Learning of the network
Directional derivative of $f$ in the direction of a unit vector $u$:
$\left.\frac{\partial}{\partial \alpha} f(v + \alpha u)\right|_{\alpha = 0} = u^T \nabla_v f(v)$
Minimizing over unit vectors $u$ gives $\cos\theta = -1$, i.e., $u$ pointing opposite the gradient
→ Moving toward the negative gradient decreases $f$
How the network learns
$v' = v - \eta \nabla_v f(v)$  ($\eta$: learning rate)
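A minimal sketch of this update rule on an assumed quadratic bowl; the function, learning rate, and iteration count are illustrative:

```python
import numpy as np

# Gradient descent v' = v - eta * grad f(v) on an illustrative quadratic.
def f(v):
    return v[0] ** 2 + 4.0 * v[1] ** 2

def grad_f(v):
    return np.array([2.0 * v[0], 8.0 * v[1]])

v = np.array([3.0, -2.0])
eta = 0.1                      # learning rate (illustrative)
for _ in range(100):
    v = v - eta * grad_f(v)    # move against the gradient to decrease f
print(v, f(v))                 # v approaches the minimum at the origin
```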
Learning of the network
Backpropagation
How the network learns
Error backpropagation path
For $y = g(x)$ and $z = f(g(x)) = f(y)$, by the chain rule:
$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}$
Learning of the network
Backpropagation
For $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$ and $g: \mathbb{R}^m \to \mathbb{R}^n$, $f: \mathbb{R}^n \to \mathbb{R}$:
$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}$, i.e., $\frac{\partial z}{\partial x_i} = \sum_j \frac{\partial z}{\partial y_j}\,\frac{\partial y_j}{\partial x_i}$
In vector form: $\nabla_x z = \big(\frac{\partial y}{\partial x}\big)^T \nabla_y z$, where $\frac{\partial y}{\partial x}$ is the $n \times m$ Jacobian matrix of $g$
From gradient descent,
How the network learns
$x' = x - \eta \big(\frac{\partial y}{\partial x}\big)^T \nabla_y z$, and for the parameters, $\theta' = \theta - \eta \big(\frac{\partial y}{\partial \theta}\big)^T \nabla_y z$
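A small sketch of the Jacobian form of the chain rule, verified against a numerical gradient; the choices of $g$ (a tanh layer) and $f$ (a squared norm) are illustrative assumptions:

```python
import numpy as np

# z = f(g(x)) with g: R^m -> R^n, f: R^n -> R, using
# grad_x z = (dy/dx)^T grad_y z.
rng = np.random.default_rng(0)
m, n = 4, 3
W = rng.normal(size=(n, m))

def g(x):                      # y = tanh(W x), so dy/dx = diag(1 - y^2) W
    return np.tanh(W @ x)

def f(y):                      # z = 0.5 * ||y||^2, so grad_y z = y
    return 0.5 * np.dot(y, y)

x = rng.normal(size=m)
y = g(x)
grad_y = y                               # gradient of f at y
J = (1.0 - y ** 2)[:, None] * W          # n-by-m Jacobian dy/dx
grad_x = J.T @ grad_y                    # chain rule

# Check against a central-difference numerical gradient.
eps = 1e-6
num = np.array([(f(g(x + eps * e)) - f(g(x - eps * e))) / (2 * eps)
                for e in np.eye(m)])
print(np.max(np.abs(grad_x - num)))      # near zero: analytic matches numeric
```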
Learning of the network
Universal approximation theorem
Gradient descent & Backpropagation
Practical reasons for failure
Optimization
Optimizer (SGD, AdaGrad, RMSprop, Adam, etc.)
Weight initialization
Regularization
Parameter norm penalty ($L^2$, $L^1$)
Augmentation / noise injection (weight noise, label smoothing)
Multitask learning
Parameter sharing (CNN)
Ensemble / Dropout
Adversarial training
How the network learns
(Parameter sharing in a CNN acts as a domain-specific prior)
Convolutional neural network
Convolution vs cross-correlation
Modern deep learning
Convolution: $S(i,j) = (I * K)(i,j) = \sum_m \sum_n I(m,n)\, K(i-m,\, j-n) = (K * I)(i,j) = \sum_m \sum_n I(i-m,\, j-n)\, K(m,n)$
Cross-correlation: $S(i,j) = (I \star K)(i,j) = \sum_m \sum_n I(i+m,\, j+n)\, K(m,n)$
Most CNN implementations actually use cross-correlation, not convolution
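A hand-rolled sketch of both operations (the 5×5 image and 2×2 kernel are illustrative), showing that true convolution equals cross-correlation with a flipped kernel:

```python
import numpy as np

def cross_correlate2d(I, K):
    """S(i, j) = sum_{m,n} I(i+m, j+n) K(m, n)  (valid region only)."""
    kh, kw = K.shape
    H = I.shape[0] - kh + 1
    W = I.shape[1] - kw + 1
    return np.array([[np.sum(I[i:i + kh, j:j + kw] * K)
                      for j in range(W)] for i in range(H)])

def convolve2d(I, K):
    """True convolution: cross-correlation with the kernel flipped."""
    return cross_correlate2d(I, np.flip(K))

I = np.arange(25.0).reshape(5, 5)        # illustrative 5x5 "image"
K = np.array([[1.0, 0.0], [0.0, -1.0]])  # illustrative 2x2 kernel
print(cross_correlate2d(I, K))
print(convolve2d(I, K))                  # differs unless K is symmetric
```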
Convolutional neural network
Significant characteristics of CNN
 Sparse interaction
 Parameter sharing
 Equivariant representation
Sparse interaction
 Kernel size ≪ input size (e.g., a 128-by-128 image and a 3-by-3 kernel)
 For $m$ inputs and $n$ outputs:
fully connected network: $O(m \times n)$ connections
CNN: $O(k \times n)$, where $k$ is the number of connections per output
 In practice, $k$ is several orders of magnitude smaller than $m$
Modern deep learning
[Figure: connectivity and receptive fields, CNN vs. fully connected network]
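A back-of-the-envelope sketch of this count for an assumed 128×128 feature map and 3×3 kernel:

```python
# Illustrative connection counts for a 128x128 input feature map.
m = 128 * 128                  # input units
n = 128 * 128                  # output units (same spatial size, assumed)
k = 3 * 3                      # connections per output for a 3x3 kernel

print(f"fully connected: {m * n:,} connections")   # O(m x n), ~268 million
print(f"convolutional:   {k * n:,} connections")   # O(k x n), ~147 thousand
```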
Convolutional neural network
Parameter sharing
 Learn only one set of parameters (a kernel), shared across every location
 Reduces the required amount of memory
Modern deep learning
[Figure: vertical-edge detection, CNN vs. fully connected network — computation: about 4 billion times more efficient; memory: 178,640 entries needed for the matrix-multiplication version]
Convolutional neural network
Equivariant representation
(translation equivariance)
 A translation of the input → the same translation of the output
Modern deep learning
[Figure: the location of the output feature follows the location of the cat in the input]
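A small sketch of translation equivariance: shifting the input by one pixel shifts the feature map by the same amount (the single-pixel image and all-ones kernel are illustrative):

```python
import numpy as np

def xcorr2d(I, K):
    """Valid cross-correlation, as used by CNN layers."""
    kh, kw = K.shape
    return np.array([[np.sum(I[i:i + kh, j:j + kw] * K)
                      for j in range(I.shape[1] - kw + 1)]
                     for i in range(I.shape[0] - kh + 1)])

I = np.zeros((6, 6)); I[2, 2] = 1.0                 # one bright pixel
K = np.ones((2, 2))                                 # illustrative kernel
S1 = xcorr2d(I, K)
S2 = xcorr2d(np.roll(I, 1, axis=1), K)              # shift input right by 1
print(np.allclose(np.roll(S1, 1, axis=1), S2))      # True (away from borders)
```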
Convolutional neural network
Pooling (translation invariance)
Useful for tasks that care more about whether some feature
exists than about exactly where it is
Modern deep learning
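A minimal sketch of this idea with 1-D max pooling (the feature values are illustrative): the strong response survives pooling even after the input is shifted:

```python
import numpy as np

def max_pool1d(x, width=2):
    """Non-overlapping max pooling over a 1-D feature vector."""
    return x.reshape(-1, width).max(axis=1)

features = np.array([0.1, 0.9, 0.2, 0.1, 0.0, 0.3])
shifted = np.roll(features, 1)     # shift the input by one position

print(max_pool1d(features))        # [0.9 0.2 0.3]
print(max_pool1d(shifted))         # [0.3 0.9 0.1]: the 0.9 response survives
```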
Convolutional neural network
Prior beliefs imposed by convolution and pooling
The function the layer should learn contains only local
interactions and is equivariant to translation
The function the layer learns must be invariant to small
translations
Cf. the Inception module (Szegedy et al., 2015) and
the capsule network (Hinton et al., 2017)
Modern deep learning
Convolutional neural network
Historical significance of CNNs
Since AlexNet won the ImageNet challenge (2012)
Modern deep learning
Convolutional neural network
Historical significance of CNNs
The first deep network trained and operated
successfully with backpropagation
The reason for its success is not entirely clear
Computational efficiency may have made it feasible to
run more experiments to tune the implementation
and hyperparameters
CNNs achieved state of the art on data with a
clear grid-structured topology (such as images)
Modern deep learning
End
Q & A
Editor's Notes

1. A simple model to emulate a single neuron: a perceptron takes binary inputs (x₁, x₂, x₃, …) and produces a single binary output (0 or 1).
  2. By Cmglee - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=20206883
  3. Image source: https://www.cc.gatech.edu/~san37/post/dlhc-cnn/
  4. Image source: https://www.cc.gatech.edu/~san37/post/dlhc-cnn/
  5. Image source: https://www.cc.gatech.edu/~san37/post/dlhc-cnn/
  6. Image source: https://www.topbots.com/14-design-patterns-improve-convolutional-neural-network-cnn-architecture/
7. Computing the output of a 13-layer convolutional neural network requires about 30 billion operations.