Gradient-Based Learning Applied to Document Recognition
Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
Proceedings of the IEEE, 86(11):2278–2324, November 1998
LeNet
Speaker: Chia-Jung Ni
Outline
• History of Representative CNN models
• Three key ideas for CNN
  • Local Receptive Fields
  • Shared Weights
  • Sub-sampling
• Model Architecture
• Implementation
  • Keras
Slide: https://drive.google.com/file/d/12YWNNbqB-_JHl0CrNEl6loINBJoGHgE3/view?usp=sharing
Code: https://drive.google.com/file/d/1wDcDgoF8VSj29ab-cXsN82Q1pxdBiaUx/view?usp=sharing
History of Representative CNN models
• 1980s: CNN proposed
• 1998: LeNet (the first time back-propagation was used to update the model params)
• 2012: AlexNet (the first time GPUs were used to accelerate the computations)
• 2015: VGGNet
• 2015: GoogleNet
• 2016: ResNet
• 2017: DenseNet
Three key ideas : Local Receptive Fields (1/3)
• Why local connectivity? (what)
  • Spatial correlation is local
  • Reduces the # of parameters
Example. WLOG
- 1000x1000 image
- 3x3 filter (kernel)
Fully connected: 10^6 + 1 params per hidden unit.
3x3 local receptive field: 3^2 + 1 = 10 params per hidden unit.
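To make the savings concrete, here is a minimal Keras sketch (my own, not from the deck) that reproduces both counts. Note that Conv2D also shares its weights across positions, which is the next key idea, but the per-hidden-unit count matches the 3x3 figure.

```python
# Illustrative sketch: parameter counts for the 1000x1000 example (TensorFlow 2.x assumed).
import tensorflow as tf

# Fully connected: one hidden unit sees all 10^6 pixels, so 10^6 + 1 parameters.
dense = tf.keras.Sequential([
    tf.keras.Input(shape=(1000, 1000, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),
])

# Local 3x3 receptive field: 3^2 + 1 = 10 parameters (Conv2D also shares them).
conv = tf.keras.Sequential([
    tf.keras.Input(shape=(1000, 1000, 1)),
    tf.keras.layers.Conv2D(1, kernel_size=3),
])

print(dense.count_params())  # 1000001
print(conv.count_params())   # 10
```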
Three key ideas : Shared Weights (2/3)
• Why weight sharing? (where)
  • Statistics are similar at different locations
  • Reduces the # of parameters
Example. WLOG
- # input units (neurons) = 7
- # hidden units = 3
Without sharing: 3 * 3 + 3 = 12 params. With sharing: 3 * 1 + 3 = 6 params.
Three key ideas : Sub-sampling (3/3)
• Why sub-sampling? (size)
  • Sub-sampling the pixels does not change the object
  • Reduces memory consumption
Example. 2x2 pooling with a stride of two on a 4x4 input:

Input:       Max-pooling:  Avg-pooling:
1 2 2 0      2 3           1.5  1.75
1 2 3 2      3 3           1.5  1.75
3 1 3 2
0 2 0 2
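The example above can be reproduced in a few lines of NumPy (an illustrative sketch, not code from the deck):

```python
# Illustrative sketch: 2x2 max- and average-pooling with stride 2 on the slide's 4x4 input.
import numpy as np

x = np.array([[1, 2, 2, 0],
              [1, 2, 3, 2],
              [3, 1, 3, 2],
              [0, 2, 0, 2]], dtype=float)

# Regroup the 4x4 map into non-overlapping 2x2 blocks, shape (2, 2, 2, 2).
blocks = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3)

print(blocks.max(axis=(2, 3)))   # [[2. 3.] [3. 3.]]       max-pooling
print(blocks.mean(axis=(2, 3)))  # [[1.5 1.75] [1.5 1.75]]  avg-pooling
```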
Model Architecture
• Architecture of LeNet-5
  • Two sets of convolutional and average pooling layers
  • Followed by a flattening convolutional layer (C5)
  • Then two fully-connected layers, the last acting as the classifier (RBF units in the
    paper; softmax in modern re-implementations)
Model Architecture – Squashing Function
• Similar to the idea of an activation function
• The feature maps of the first 6 layers (C1, S2, C3, S4, C5, F6) are all passed through
  this nonlinear scaled hyperbolic tangent function:

f(a) = A tanh(Sa), where A = 1.7159 and S = 2/3;
with this choice of params, the equalities f(1) = 1 and f(-1) = -1 are satisfied.

(Plot on the slide: f(a), f'(a), and f''(a).)

Some details
- Symmetric functions yield faster convergence, although learning can become very slow
  when the weights are too large or too small.
- The absolute value of the 2nd derivative of f(a) is maximal at +1 and -1, which also
  improves convergence toward the end of the learning session.
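A minimal sketch of the squashing function (my own, directly following the formula above):

```python
# Illustrative sketch: f(a) = A * tanh(S * a) with A = 1.7159, S = 2/3,
# chosen so that f(1) = 1 and f(-1) = -1.
import numpy as np

A, S = 1.7159, 2.0 / 3.0

def squash(a):
    return A * np.tanh(S * a)

print(np.round(squash(np.array([-1.0, 0.0, 1.0])), 4))  # [-1.  0.  1.]
```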
Model Architecture – 1st layer (1/7)
Convolution layer 1 (C1) with 6 feature maps (filters) of size 5×5, a stride of one, and
'valid' padding on the 32×32 input (output 6@28×28; implementations that feed the raw
28×28 MNIST image use 'same' padding instead).
• Trainable params
  = (weights * input map channels + bias) * output map channels
  = (5*5*1 + 1) * 6 = 156
• Connections
  = (weights * input map channels + bias) * output map channels * output map size
  = (5*5*1 + 1) * 6 * (28*28) = 122,304
Output size: W_l = int((W_{l-1} - F + 2P) / S) + 1,  H_l = int((H_{l-1} - F + 2P) / S) + 1
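As a sanity check (a sketch, not the deck's code), the same counts fall out of a stock Keras Conv2D layer:

```python
# Illustrative sketch: C1 in Keras, verifying (5*5*1 + 1) * 6 = 156 parameters.
import tensorflow as tf

c1 = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),
    tf.keras.layers.Conv2D(6, kernel_size=5, strides=1, padding="valid",
                           activation="tanh"),
])
print(c1.count_params())  # 156
print(c1.output_shape)    # (None, 28, 28, 6)
```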
Model Architecture – 2nd layer (2/7)
Subsampling layer 2 (S2) with a filter size of 2×2, a stride of two, and 'valid' padding
(output 6@14×14).
• Trainable params
  = (trainable coefficient + bias) * output map channels
  = (1 + 1) * 6 = 12
• Connections
  = (kernel size + bias) * output map channels * output map size
  = (2*2 + 1) * 6 * (14*14) = 5,880
Model Architecture – 3rd layer (3/7)
Convolution layer 3 (C3) with 16 feature maps of size 5×5, a stride of one, and 'valid'
padding (output 16@10×10). To keep the number of connections within reasonable bounds
and to break symmetry, the 16 maps are not all connected to all 6 input maps:
• The first 6 feature maps are connected to 3 contiguous input maps
• The second 6 feature maps are connected to 4 contiguous input maps
• The next 3 feature maps are connected to 4 discontinuous input maps
• The last feature map is connected to all 6 input maps
• Trainable params (checked in the sketch below)
  = ∑group [ (weights * input map channels + bias) * output map channels ]
  = (5*5*3 + 1) * 6 + (5*5*4 + 1) * 6 + (5*5*4 + 1) * 3 + (5*5*6 + 1) * 1
  = 456 + 606 + 303 + 151 = 1,516
• Connections
  = trainable params * output map size
  = 1,516 * (10*10) = 151,600
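The grouped sums can be checked with a few lines of Python (an illustrative sketch; the group sizes follow the connection scheme listed above):

```python
# Illustrative sketch: C3's grouped connection scheme.
# (input maps seen, output maps) per group: 6 maps see 3 inputs, 6 see 4, 3 see 4, 1 sees all 6.
groups = [(3, 6), (4, 6), (4, 3), (6, 1)]

params = sum((5 * 5 * n_in + 1) * n_out for n_in, n_out in groups)
connections = params * (10 * 10)  # every output position reuses the same weights

print(params)       # 1516
print(connections)  # 151600
```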
Model Architecture – 4th layer (4/7)
Subsampling layer 4 (S4) with a filter size of 2×2, a stride of two, and 'valid' padding
(output 16@5×5).
• Trainable params
  = (trainable coefficient + bias) * output map channels
  = (1 + 1) * 16 = 32
• Connections
  = (kernel size + bias) * output map channels * output map size
  = (2*2 + 1) * 16 * (5*5) = 2,000
Model Architecture – 5th layer (5/7)
Convolution layer 5 (C5) with 120 feature maps (filters) of size 5×5, a stride of one,
and 'valid' padding (output 120@1×1).
• Trainable params
  = (weights * input map channels + bias) * output map channels
  = (5*5*16 + 1) * 120 = 48,120
• Connections
  = (weights * input map channels + bias) * output map channels * output map size
  = (5*5*16 + 1) * 120 * (1*1) = 48,120
Model Architecture – 6th layer (6/7)
Fully-connected layer (F6) with 84 neuron units.
• Trainable params
  = (weights + bias) * output units
  = (120 + 1) * 84 = 10,164
• Connections
  = (weights + bias) * output units
  = (120 + 1) * 84 = 10,164
Model Architecture – Output layer (7/7)
Output layer with Euclidean Radial Basis Function (RBF) units. The output of each RBF
unit y_i is computed as follows:

y_i = Σ_j (x_j - w_ij)²

The output of a particular RBF can be interpreted as a penalty term measuring the fit
between the input pattern and a model of the class associated with the RBF. In
probabilistic terms, the RBF output can be interpreted as the unnormalized negative
log-likelihood of a Gaussian distribution in the space of configurations of layer F6.

Loss function: the mean squared error (MSE) criterion measures the discrepancy,

E(W) = (1/P) Σ_{p=1}^{P} y_{D_p}(Z^p, W),

where y_{D_p} is the output of the D_p-th RBF unit, that is, the one that corresponds
to the correct class of input pattern Z^p.
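A minimal sketch of this criterion (my own construction, assuming the RBF penalties have already been computed for a small batch):

```python
# Illustrative sketch: E(W) = (1/P) * sum_p y_{D_p}, using precomputed RBF outputs.
import numpy as np

rbf_outputs = np.array([[0.2, 1.5, 3.0],   # pattern 1: penalties for classes 0..2
                        [2.1, 0.1, 4.0]])  # pattern 2
labels = np.array([0, 1])                  # correct class D_p of each pattern

loss = rbf_outputs[np.arange(len(labels)), labels].mean()
print(loss)  # (0.2 + 0.1) / 2 ~= 0.15
```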
Model Architecture (LeNet-5)
Notation: W, H = feature map size; F = filter (kernel) size; S = stride; P = padding.

Layer         | # Channels | Feature Map Size | Filter (Kernel) Size | Stride | Padding | Activation
Input Image   | 1          | 32x32            | -                    | -      | -       | -
1 Convolution | 6          | 28x28            | 5x5                  | 1      | 0       | tanh
2 Avg-Pooling | 6          | 14x14            | 2x2                  | 2      | 0       | tanh
3 Convolution | 16         | 10x10            | 5x5                  | 1      | 0       | tanh
4 Avg-Pooling | 16         | 5x5              | 2x2                  | 2      | 0       | tanh
5 Convolution | 120        | 1x1              | 5x5                  | 1      | 0       | tanh
6 FC          | -          | 84               | -                    | -      | -       | tanh
Output FC     | -          | 10               | -                    | -      | -       | RBF

W_l = int((W_{l-1} - F + 2P) / S) + 1,  H_l = int((H_{l-1} - F + 2P) / S) + 1
Implementation – Download Data Set & Normalize
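The original slide is a code screenshot that did not survive extraction. Below is a minimal sketch of the step the title describes, assuming MNIST from tf.keras.datasets, scaling to [0, 1], and zero-padding to the 32x32 input size; the deck's actual code may differ (e.g., it may keep 28x28 inputs and use 'same' padding in C1).

```python
# Illustrative sketch: download MNIST and normalize.
import numpy as np
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Scale pixel values to [0, 1] and add a channel axis.
x_train = x_train[..., np.newaxis].astype("float32") / 255.0
x_test = x_test[..., np.newaxis].astype("float32") / 255.0

# Zero-pad 28x28 images to the 32x32 input LeNet-5 expects.
x_train = np.pad(x_train, ((0, 0), (2, 2), (2, 2), (0, 0)))
x_test = np.pad(x_test, ((0, 0), (2, 2), (2, 2), (0, 0)))
print(x_train.shape)  # (60000, 32, 32, 1)
```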
Implementation – Define LeNet-5 Model
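Again a sketch rather than the original screenshot: a Keras version of the architecture table above, with the usual modern substitutions. Plain average pooling stands in for the trainable sub-sampling, C3 is fully connected to all 6 input maps (stock Conv2D cannot express the grouped scheme, so it has 2,416 rather than 1,516 parameters there), and a softmax output replaces the RBF layer.

```python
# Illustrative sketch: LeNet-5 in Keras (modernized output layer).
import tensorflow as tf

def build_lenet5():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 1)),
        tf.keras.layers.Conv2D(6, 5, activation="tanh"),    # C1: 6@28x28
        tf.keras.layers.AveragePooling2D(2),                # S2: 6@14x14
        tf.keras.layers.Conv2D(16, 5, activation="tanh"),   # C3: 16@10x10
        tf.keras.layers.AveragePooling2D(2),                # S4: 16@5x5
        tf.keras.layers.Conv2D(120, 5, activation="tanh"),  # C5: 120@1x1
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(84, activation="tanh"),       # F6
        tf.keras.layers.Dense(10, activation="softmax"),    # output (softmax, not RBF)
    ])

model = build_lenet5()
model.summary()
```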
Implementation – Define LeNet-5 Model & Evaluate
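A sketch of the training and evaluation step, continuing from the two snippets above; the optimizer, loss, epoch count, and batch size here are assumptions, not the deck's settings.

```python
# Illustrative sketch: compile, train, and evaluate the model defined above.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(x_train, y_train, epochs=5, batch_size=128,
                    validation_split=0.1)

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"test accuracy: {test_acc:.4f}")
```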
Implementation – Visualize the Training Process
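And a sketch for this last step, plotting the history object returned by fit() above:

```python
# Illustrative sketch: plot training vs. validation accuracy per epoch.
import matplotlib.pyplot as plt

plt.plot(history.history["accuracy"], label="train accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```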
Thanks for listening.
Appendix 1. It is common to zero-pad the border
Example. WLOG
- input 7x7
- 3x3 filter, applied with stride 1
- pad with a 1-pixel border => the output is 7x7: (7 - 3 + 2*1)/1 + 1 = 7
In general, it is common to see CONV layers with stride 1, filters of size FxF, and
zero-padding of (F-1)/2, which preserves the spatial size.
• F = 3 => zero-pad with 1
• F = 5 => zero-pad with 2
W_l = int((W_{l-1} - F + 2P) / S) + 1,  H_l = int((H_{l-1} - F + 2P) / S) + 1
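The formula translates directly into a small helper (an illustrative sketch):

```python
# Illustrative sketch: output size per spatial dimension, int((W - F + 2P) / S) + 1.
def conv_output_size(w, f, s=1, p=0):
    return (w - f + 2 * p) // s + 1

print(conv_output_size(7, 3, s=1, p=1))   # 7   (the appendix example)
print(conv_output_size(32, 5, s=1, p=0))  # 28  (C1)
print(conv_output_size(28, 2, s=2, p=0))  # 14  (S2)
```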
Appendix 2. Sub-Sampling vs. Pooling
• Sub-sampling is simply average-pooling with learnable weights per feature map.
• Sub-sampling is a generalization of average-pooling.
Input:       Avg-pooling:
1 2 2 0      1.5  1.75
1 2 3 2      1.5  1.75
3 1 3 2
0 2 0 2

Sub-sampling: w * (avg-pooling output) + b, where w and b ∈ ℝ are learned per feature map.
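For completeness, a sketch of what such a layer could look like as a custom Keras layer (my own construction; the paper actually sums the 2x2 inputs before applying the coefficient, which differs from averaging only by a constant factor that w can absorb):

```python
# Illustrative sketch: average pooling with a trainable scale w and bias b per feature map.
import tensorflow as tf

class SubSampling2D(tf.keras.layers.Layer):
    def __init__(self, pool_size=2, **kwargs):
        super().__init__(**kwargs)
        self.pool = tf.keras.layers.AveragePooling2D(pool_size)

    def build(self, input_shape):
        channels = input_shape[-1]
        self.w = self.add_weight(name="w", shape=(channels,),
                                 initializer="ones", trainable=True)
        self.b = self.add_weight(name="b", shape=(channels,),
                                 initializer="zeros", trainable=True)

    def call(self, inputs):
        return self.w * self.pool(inputs) + self.b

# With 6 feature maps: (1 coefficient + 1 bias) * 6 = 12 trainable params, as for S2.
```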
Appendix 3. Radial Basis Function (RBF) units

(Diagram on the slide: the 84 F6 outputs x_1, ..., x_84 are fully connected to the 10 RBF
outputs y_1, ..., y_10 through the parameters w_1,1, ..., w_10,84, so Y_{10×1} is computed
from W_{10×84} and X_{84×1}.)

Note.
1) x_j ∈ ℝ is the output of the F6 layer with squashing function f(a) = A tanh(Sa),
   ∀ j = 1, ..., 84.
2) The components of the parameter vectors {w_ij | i = 1, ..., 10; j = 1, ..., 84} were
   chosen by hand and set to -1 or +1, forming stylized 7×12 bitmap images of the target
   classes (per the paper).
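A minimal NumPy sketch of these units (the values below are random placeholders, not real F6 activations):

```python
# Illustrative sketch: Euclidean RBF units, y_i = sum_j (x_j - w_ij)^2.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(84)                 # stand-in for the 84 F6 outputs
W = rng.choice([-1.0, 1.0], size=(10, 84))  # fixed +/-1 parameter vectors

y = ((x[np.newaxis, :] - W) ** 2).sum(axis=1)  # shape (10,): one penalty per class
print(y.argmin())                              # predicted class = smallest penalty
```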
