Digit Recognizer by Convolutional Neural Network (CNN)
Ding Li 2018.06
MNIST database
(Modified National Institute of Standards and Technology database)
60,000 training images; 10,000 testing images
Kaggle Challenge
42,000 training images; 28,000 testing images
Predict the handwritten digit from the pixel values of the input image.
Input: image of a handwritten digit, 784 (28*28) pixel values, pixel color coding [0,255]
Result: label [0,9]
https://en.wikipedia.org/wiki/MNIST_database
[Figure: sample handwritten digits, each 28*28 pixels]
Network architecture (layer: settings -> output size)
Input: 28*28
Conv 1: 3*3, 16 filters -> 26*26*16
Conv 2: 3*3, 16 filters -> 24*24*16
Max pool 1: 2*2 -> 12*12*16
Conv 3: 3*3, 32 filters -> 10*10*32
Conv 4: 3*3, 32 filters -> 8*8*32
Max pool 2: 2*2 -> 4*4*32
Flatten -> 512
Dense 1: fully connected, ReLU -> 512
Dense 2: fully connected, ReLU -> 1024
Dense 3: fully connected, Softmax -> 10 (probability of each digit 0 to 9)
Prediction: the digit with the maximum probability is the predicted label [0,9]
Trainable parameters: 814,778
Well-known CNN architectures: LeNet-5, AlexNet, VGG 16
https://en.wikipedia.org/wiki/Convolutional_neural_network
https://en.wikipedia.org/wiki/Yann_LeCun
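The layer sizes on the architecture slide can be checked with a little arithmetic. The sketch below assumes unpadded 3*3 convolutions with stride 1 and non-overlapping 2*2 pooling, which is what the listed output sizes imply; the helper names are hypothetical and not taken from the author's code. Under these assumptions the convolutional and dense layers shown account for 814,586 parameters, 192 fewer than the 814,778 on the slide, so the remainder presumably belongs to layers the diagram does not show.

```python
def conv_out(size, kernel=3):
    """Output width/height of an unpadded convolution with stride 1."""
    return size - kernel + 1

def conv_params(kernel, in_ch, out_ch):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """Weights plus one bias per output node."""
    return n_in * n_out + n_out

size, channels, total = 28, 1, 0
shapes = []
# (layer type, number of filters) in the order given on the slide
for kind, filters in [("conv", 16), ("conv", 16), ("pool", None),
                      ("conv", 32), ("conv", 32), ("pool", None)]:
    if kind == "conv":
        total += conv_params(3, channels, filters)
        size, channels = conv_out(size), filters
    else:
        size //= 2  # 2*2 max pooling halves width and height, no parameters
    shapes.append((size, size, channels))

flat = size * size * channels  # length of the flattened vector
for n_in, n_out in [(flat, 512), (512, 1024), (1024, 10)]:
    total += dense_params(n_in, n_out)

print(shapes)  # sizes after each conv/pool layer
print(flat, total)
```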
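Convolution example: detecting a vertical edge

Input (6*6):
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0

Filter (kernel), also called feature detector (3*3):
1 0 -1
1 0 -1
1 0 -1

Each output value is the sum of the element-wise products of the filter and one 3*3 window of the input. For the window
10 10 0
10 10 0
10 10 0
the products are
10 0 0
10 0 0
10 0 0
and their sum is 30.

Output (4*4):
0 30 30 0
0 30 30 0
0 30 30 0
0 30 30 0
The columns with value 30 mark the detected edge; the columns with value 0 mean no edge.
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/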
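The edge-detection example can be reproduced in a few lines of plain Python. This is a minimal sketch, not the author's implementation: `convolve2d` is a hypothetical helper that slides the filter over the input without padding and with stride 1, and, as in CNN libraries, it does not flip the kernel.

```python
def convolve2d(image, kernel):
    """Sliding-window sum of products (no padding, stride 1)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(cols)]
            for r in range(rows)]

# 6*6 input: bright left half, dark right half
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]
# 3*3 filter that responds to a left-to-right drop in brightness
vertical_edge = [[1, 0, -1] for _ in range(3)]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 30, 30, 0]
```

The nonzero columns sit exactly where the window straddles the boundary between the bright and dark halves.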
Filter for vertical edges:
1 0 -1
1 0 -1
1 0 -1
Filter for horizontal edges:
1 1 1
0 0 0
-1 -1 -1
Multiple convolutional filters (kernels) can capture different changes of neighborhood pixels.
https://aishack.in/tutorials/image-convolution-examples/
Max Pool
Size: 2 x 2
Input (4*4):
3 8 2 1
7 9 1 1
4 5 2 3
5 6 1 2
Output (2*2), the maximum of each 2 x 2 block:
9 2
6 3
Pooling is applied after a convolutional layer to reduce noise and data size.
Local information is concentrated into higher-level information.
Besides max pooling, there is also average pooling.
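The pooling example can be checked with a short sketch. The `max_pool` helper is hypothetical and assumes non-overlapping windows and input dimensions that are divisible by the window size.

```python
def max_pool(image, size=2):
    """Non-overlapping max pooling (stride equals the window size)."""
    return [[max(image[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(image[0]), size)]
            for r in range(0, len(image), size)]

image = [[3, 8, 2, 1],
         [7, 9, 1, 1],
         [4, 5, 2, 3],
         [5, 6, 1, 2]]

pooled = max_pool(image)
print(pooled)  # [[9, 2], [6, 3]]
```

Replacing `max` with an average over the window would give average pooling.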
Input: $x_0, x_1, x_2$
Output: $a$
Parameters (to be optimized): $w_0, w_1, w_2, b$
$z = w_0 x_0 + w_1 x_1 + w_2 x_2 + b$
$a = f(z)$, where $f$ is the activation function.
The nonlinear activation increases certainty by pushing the output toward "on" or "off".
ReLU: $f(z) = \max(0, z)$
Sigmoid: $f(z) = \frac{1}{1 + e^{-z}}$
http://cs231n.github.io/neural-networks-1/
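A single node with either activation function can be sketched as follows. The input, weight and bias values are made up for illustration and do not come from the slides.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, f):
    """Weighted sum of the inputs plus bias, passed through activation f."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)

x = [1.0, 2.0, 3.0]      # hypothetical inputs x0, x1, x2
w = [0.5, -1.0, 0.25]    # hypothetical weights w0, w1, w2
b = 0.5                  # hypothetical bias

# Here z = 0.5 - 2.0 + 0.75 + 0.5 = -0.25
print(neuron(x, w, b, relu))     # 0.0, the node is "off"
print(neuron(x, w, b, sigmoid))  # a value between 0 and 1
```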
Fully Connected Layers
All the inputs from the previous layer are combined at each node.
$a_0^{[1]} = f(w_{0,0}^{[1]} x_0 + w_{1,0}^{[1]} x_1 + w_{2,0}^{[1]} x_2 + w_{3,0}^{[1]} x_3 + b_0^{[1]})$
$a_1^{[1]} = f(w_{0,1}^{[1]} x_0 + w_{1,1}^{[1]} x_1 + w_{2,1}^{[1]} x_2 + w_{3,1}^{[1]} x_3 + b_1^{[1]})$
...
[Figure: inputs $x_0$ to $x_3$ feed three fully connected layers with nodes $a_0^{[1]}$ to $a_5^{[1]}$, $a_0^{[2]}$ to $a_5^{[2]}$ and $a_0^{[3]}$ to $a_5^{[3]}$]
All the local features extracted in the previous layers are fully connected with different weights to construct global features.
Complicated relationships between inputs can be revealed by deep networks.
https://github.com/drewnoff/spark-notebook-ml-labs/tree/master/labs/DLFramework
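The per-node formulas for a fully connected layer amount to one weighted sum per output node. The sketch below uses a hypothetical `dense` helper with made-up weights, four inputs and only two nodes for brevity; `W[i][j]` plays the role of the weight from input i to node j.

```python
def relu(z):
    return max(0.0, z)

def dense(x, W, b, f=relu):
    """Fully connected layer: every node j combines all inputs x[i]."""
    return [f(sum(W[i][j] * x[i] for i in range(len(x))) + b[j])
            for j in range(len(b))]

x = [1.0, 2.0, 3.0, 4.0]   # inputs x0..x3
W = [[0.1, -0.2],           # hypothetical weights, 4 inputs x 2 nodes
     [0.0, 0.3],
     [-0.1, 0.1],
     [0.2, -0.5]]
b = [0.5, 0.0]              # hypothetical biases

a = dense(x, W, b)
print(a)  # first node is about 1.1, second node is cut to 0.0 by ReLU
```

Stacking several such calls, each taking the previous output as its input, gives a deep network.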
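[Figure: layer L-1 with nodes $a_0^{[L-1]}, a_1^{[L-1]}, a_2^{[L-1]}, \dots, a_{1023}^{[L-1]}$ connected through weights $w_{0,0}^{[L]}, \dots, w_{1023,0}^{[L]}$ to layer L with outputs $\hat{y}_0, \hat{y}_1, \hat{y}_2, \dots, \hat{y}_9$ for the digits 0, 1, 2, ..., 9]
Linear combination of inputs from the previous layer:
$z_0 = w_{0,0}^{[L]} a_0^{[L-1]} + w_{1,0}^{[L]} a_1^{[L-1]} + w_{2,0}^{[L]} a_2^{[L-1]} + \dots + w_{1023,0}^{[L]} a_{1023}^{[L-1]} + b_0^{[L]}$
Softmax, normalize the result:
$\hat{y}_0 = \frac{e^{z_0}}{e^{z_0} + e^{z_1} + e^{z_2} + \dots + e^{z_9}}$ (the predicted probability that the digit is 0)
$\hat{y}_0 + \hat{y}_1 + \hat{y}_2 + \dots + \hat{y}_9 = 1$
Prediction: $\hat{y} = [\hat{y}_0, \hat{y}_1, \hat{y}_2, \dots, \hat{y}_9]$
True value: $y = [y_0, y_1, y_2, \dots, y_9]$
E.g. the true digit is 0:
$\hat{y} = [\mathbf{0.9}, 0.02, 0.01, 0.02, \dots, 0.04]$
$y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]$
Loss function: $L(\hat{y}, y) = -\sum_{i=0}^{9} y_i \log(\hat{y}_i)$
In the example, $L(\hat{y}, y) = -1 \cdot \log 0.9 = 0.046$ (base-10 logarithm; the natural logarithm gives 0.105).
$L \ge 0$; at the best match, $\hat{y} = y$ and $L(\hat{y}, y) = -1 \cdot \log 1 = 0$.
Cost function: $J(w, b) = \frac{1}{m} \sum_{m} L(\hat{y}, y)$, the loss averaged over $m$ samples.
Goal: minimize the cost function.
Reference: A Friendly Introduction to Cross-Entropy Loss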
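Softmax and the cross-entropy loss fit in a few lines of Python. This is a sketch under stated assumptions: the scores in `z` are made up, the natural logarithm is used (the usual convention in deep learning libraries), and subtracting the maximum score before exponentiating is a standard numerical-stability trick that is not mentioned on the slide.

```python
import math

def softmax(z):
    m = max(z)  # subtracting the max leaves the result unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_true, y_pred):
    """L = -sum_i y_i * log(y_hat_i), skipping terms where y_i is 0."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

z = [2.0, 1.0, 0.1] + [0.0] * 7   # hypothetical scores z_0..z_9
y_hat = softmax(z)
y = [1] + [0] * 9                 # one-hot label: the true digit is 0

print(sum(y_hat))                 # the probabilities sum to 1 (up to rounding)
print(cross_entropy(y, y_hat))    # loss for this sample

# The 0.9 example: natural log versus base-10 log
print(-math.log(0.9), -math.log10(0.9))
```

The cost over a batch is then the mean of the per-sample losses.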
[Figure: layers L-2, L-1 and L. Activations $A^{[L-2]}$ feed layer L-1 through $W^{[L-1]}, b^{[L-1]}$; activations $A^{[L-1]}$ feed layer L through $W^{[L]}, b^{[L]}$ to give $\hat{y}_0, \dots, \hat{y}_9$ for the digits 0 to 9. Forward propagation runs from layer L-2 to layer L; backpropagation runs in the opposite direction.]
Goal: minimize the difference between the predicted $\hat{y}$ and the true $y$.
1. With the initial parameters W and b, predict the label $\hat{y}$ with forward propagation and calculate the cost.
2. Optimize the parameters of layer L, $W^{[L]}$ and $b^{[L]}$, assuming the inputs from layer L-1, $A^{[L-1]}$, do not change.
3. Calculate the change of the inputs from layer L-1, $A^{[L-1]}$, that is needed to minimize the cost function, assuming the parameters $W^{[L]}$ and $b^{[L]}$ do not change.
4. Optimize the parameters of layer L-1, $W^{[L-1]}$ and $b^{[L-1]}$, from the desired changes of $A^{[L-1]}$.
5. Proceed like this all the way to the first layer, optimizing the parameters W and b of all layers.
6. One pass of forward propagation and backpropagation over the whole training set is called one epoch. Run multiple epochs until the cost is near its minimum value.
https://en.wikipedia.org/wiki/Backpropagation
https://en.wikipedia.org/wiki/Geoffrey_Hinton
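The backpropagation steps can be traced on a toy network. The sketch below is an illustration under assumptions, not the author's training code: it uses one hidden ReLU layer and a softmax output, a single made-up training sample, plain gradient descent with a hypothetical learning rate of 0.1, and the standard result that for softmax with cross-entropy the gradient with respect to the output scores is $\hat{y} - y$.

```python
import math
import random

def forward(x, W1, b1, W2, b2):
    """Hidden layer (ReLU) followed by output layer (softmax)."""
    z1 = [sum(W1[i][j] * x[i] for i in range(len(x))) + b1[j]
          for j in range(len(b1))]
    a1 = [max(0.0, v) for v in z1]
    z2 = [sum(W2[i][j] * a1[i] for i in range(len(a1))) + b2[j]
          for j in range(len(b2))]
    m = max(z2)
    e = [math.exp(v - m) for v in z2]
    s = sum(e)
    return z1, a1, [v / s for v in e]

def train_step(x, y, W1, b1, W2, b2, lr=0.1):
    # Step 1: forward propagation and loss
    z1, a1, y_hat = forward(x, W1, b1, W2, b2)
    loss = -sum(t * math.log(p) for t, p in zip(y, y_hat) if t > 0)
    # Step 2: gradient at the output layer, holding a1 fixed
    dz2 = [p - t for p, t in zip(y_hat, y)]
    # Step 3: desired change of a1, holding W2 and b2 fixed
    da1 = [sum(W2[i][j] * dz2[j] for j in range(len(dz2)))
           for i in range(len(a1))]
    # Step 4: gradient at the hidden layer (ReLU passes it only where z1 > 0)
    dz1 = [d if z > 0 else 0.0 for d, z in zip(da1, z1)]
    # Gradient-descent updates for both layers
    for i in range(len(a1)):
        for j in range(len(dz2)):
            W2[i][j] -= lr * a1[i] * dz2[j]
    for j in range(len(dz2)):
        b2[j] -= lr * dz2[j]
    for i in range(len(x)):
        for j in range(len(dz1)):
            W1[i][j] -= lr * x[i] * dz1[j]
    for j in range(len(dz1)):
        b1[j] -= lr * dz1[j]
    return loss

random.seed(0)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
b1 = [0.0] * 3
W2 = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
b2 = [0.0] * 2
x, y = [1.0, 0.5], [1, 0]   # one hypothetical sample with a one-hot label

losses = [train_step(x, y, W1, b1, W2, b2) for _ in range(50)]
print(losses[0], losses[-1])  # the loss shrinks as the steps repeat
```

With more layers, steps 3 and 4 repeat once per layer until the first layer is reached.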
Dropout: a proportion of the nodes in a layer is randomly ignored for each training sample.
This technique forces the network to learn features in a distributed way and reduces overfitting.
Dropout is applied after the two pooling layers and after the first two fully connected layers.
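A minimal sketch of dropout on one layer's activations follows. The slides do not give the dropout rate or the exact variant, so the rate of 0.25 and the "inverted" rescaling of the surviving nodes by 1 / (1 - rate) are assumptions made for illustration; the `dropout` helper is hypothetical.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Zero each node with probability `rate`; rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)  # nothing is dropped at prediction time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

random.seed(1)
layer = [1.0] * 10                      # hypothetical activations
dropped = dropout(layer, rate=0.25)     # assumed rate, not from the slides

print(dropped)                                     # zeros mark ignored nodes
print(dropout(layer, rate=0.25, training=False))   # unchanged
```

Because a different random subset is ignored for each sample, no single node can be relied on, which is what pushes the network toward distributed features.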
After 1 hour of training (30 epochs), the network achieved 99.67% accuracy.
Some images are hard even for humans to recognize; more samples like these can help.
Can you predict the true value?
[Figure: input image, the convolution matrices of 16 filters generated from training, and the resulting output]
[Figure: feature maps become deeper and more abstract through the network: 26*26*16, 24*24*16, 12*12*16, 10*10*32, 8*8*32, 4*4*32]
The machine combines the 1024 final values to judge the label [0,9].
[Figure: from light signals to logical signals]
Can we understand machine’s ‘mind’?
• Convolutional neural networks are very powerful for analyzing visual images.
• The convolutional layers capture local features.
• The pooling layers concentrate the local changes and reduce noise and data size.
• The fully connected layers combine all the local features to generate global features.
• The global features are combined to make the final judgement, here the probability of each label [0,9].
• Can humans understand artificial neural networks?
• Is there any similarity between how the brain and a CNN process visual information?
• What is the meaning of the local and global features generated by machines?
• Can humans understand machines’ logic?
Python code of the project at kaggle: https://www.kaggle.com/dingli/digits-recognition-with-cnn-keras

More Related Content

What's hot

DRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive WriterDRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive Writer
Mark Chang
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习Adaboost
Shocky1
 
Machine learning applications in aerospace domain
Machine learning applications in aerospace domainMachine learning applications in aerospace domain
Machine learning applications in aerospace domain
홍배 김
 
Joint unsupervised learning of deep representations and image clusters
Joint unsupervised learning of deep representations and image clustersJoint unsupervised learning of deep representations and image clusters
Joint unsupervised learning of deep representations and image clusters
Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised Hashing
Sean Moran
 
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Universitat Politècnica de Catalunya
 
Pixel RNN to Pixel CNN++
Pixel RNN to Pixel CNN++Pixel RNN to Pixel CNN++
Pixel RNN to Pixel CNN++
Dongheon Lee
 
Seismic data analysis with u net
Seismic data analysis with u netSeismic data analysis with u net
Seismic data analysis with u net
Ding Li
 
Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
Chanuk Lim
 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
Universitat Politècnica de Catalunya
 
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Universitat Politècnica de Catalunya
 
Introducton to Convolutional Nerural Network with TensorFlow
Introducton to Convolutional Nerural Network with TensorFlowIntroducton to Convolutional Nerural Network with TensorFlow
Introducton to Convolutional Nerural Network with TensorFlow
Etsuji Nakai
 
Graph Convolutional Neural Networks
Graph Convolutional Neural Networks Graph Convolutional Neural Networks
Graph Convolutional Neural Networks
신동 강
 
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
Convolutional Neural Networks on Graphs with Fast Localized Spectral FilteringConvolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
SOYEON KIM
 
The Perceptron (D1L2 Deep Learning for Speech and Language)
The Perceptron (D1L2 Deep Learning for Speech and Language)The Perceptron (D1L2 Deep Learning for Speech and Language)
The Perceptron (D1L2 Deep Learning for Speech and Language)
Universitat Politècnica de Catalunya
 
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
홍배 김
 
Deep Learning for Computer Vision: Backward Propagation (UPC 2016)
Deep Learning for Computer Vision: Backward Propagation (UPC 2016)Deep Learning for Computer Vision: Backward Propagation (UPC 2016)
Deep Learning for Computer Vision: Backward Propagation (UPC 2016)
Universitat Politècnica de Catalunya
 
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Universitat Politècnica de Catalunya
 

What's hot (20)

DRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive WriterDRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive Writer
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习Adaboost
 
Machine learning applications in aerospace domain
Machine learning applications in aerospace domainMachine learning applications in aerospace domain
Machine learning applications in aerospace domain
 
Joint unsupervised learning of deep representations and image clusters
Joint unsupervised learning of deep representations and image clustersJoint unsupervised learning of deep representations and image clusters
Joint unsupervised learning of deep representations and image clusters
 
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised Hashing
 
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
 
Pixel RNN to Pixel CNN++
Pixel RNN to Pixel CNN++Pixel RNN to Pixel CNN++
Pixel RNN to Pixel CNN++
 
Seismic data analysis with u net
Seismic data analysis with u netSeismic data analysis with u net
Seismic data analysis with u net
 
Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
 
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
 
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
 
Introducton to Convolutional Nerural Network with TensorFlow
Introducton to Convolutional Nerural Network with TensorFlowIntroducton to Convolutional Nerural Network with TensorFlow
Introducton to Convolutional Nerural Network with TensorFlow
 
Graph Convolutional Neural Networks
Graph Convolutional Neural Networks Graph Convolutional Neural Networks
Graph Convolutional Neural Networks
 
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
Convolutional Neural Networks on Graphs with Fast Localized Spectral FilteringConvolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
 
The Perceptron (D1L2 Deep Learning for Speech and Language)
The Perceptron (D1L2 Deep Learning for Speech and Language)The Perceptron (D1L2 Deep Learning for Speech and Language)
The Perceptron (D1L2 Deep Learning for Speech and Language)
 
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
 
Deep Learning for Computer Vision: Backward Propagation (UPC 2016)
Deep Learning for Computer Vision: Backward Propagation (UPC 2016)Deep Learning for Computer Vision: Backward Propagation (UPC 2016)
Deep Learning for Computer Vision: Backward Propagation (UPC 2016)
 
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
 

Similar to Digit recognizer by convolutional neural network

Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep Learning
Vahid Mirjalili
 
Convolution Neural Network Lecture Slides
Convolution Neural Network Lecture SlidesConvolution Neural Network Lecture Slides
Convolution Neural Network Lecture Slides
AdnanHaider234505
 
Eye deep
Eye deepEye deep
Eye deep
sveitser
 
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
MLconf
 
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
CastLabKAIST
 
Deep Residual Hashing Neural Network for Image Retrieval
Deep Residual Hashing Neural Network for Image RetrievalDeep Residual Hashing Neural Network for Image Retrieval
Deep Residual Hashing Neural Network for Image Retrieval
Edwin Efraín Jiménez Lepe
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from Scratch
Ahmed BESBES
 
Deep learning study 2
Deep learning study 2Deep learning study 2
Deep learning study 2
San Kim
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdf
nyomans1
 
C++ and Deep Learning
C++ and Deep LearningC++ and Deep Learning
C++ and Deep Learning
Oswald Campesato
 
Deep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlowDeep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlow
Oswald Campesato
 
Deep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowDeep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlow
Oswald Campesato
 
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCAccelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
inside-BigData.com
 
Neural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningNeural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learning
Tapas Majumdar
 
Convolutional Neural Network
Convolutional Neural NetworkConvolutional Neural Network
Convolutional Neural Network
Jun Young Park
 
Binary numbersystem
Binary numbersystemBinary numbersystem
Binary numbersystem
Shehrevar Davierwala
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
台灣資料科學年會
 
DeepXplore: Automated Whitebox Testing of Deep Learning
DeepXplore: Automated Whitebox Testing of Deep LearningDeepXplore: Automated Whitebox Testing of Deep Learning
DeepXplore: Automated Whitebox Testing of Deep Learning
Masahiro Sakai
 
The reversible residual network
The reversible residual networkThe reversible residual network
The reversible residual network
ThyrixYang1
 

Similar to Digit recognizer by convolutional neural network (20)

Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep Learning
 
Convolution Neural Network Lecture Slides
Convolution Neural Network Lecture SlidesConvolution Neural Network Lecture Slides
Convolution Neural Network Lecture Slides
 
Eye deep
Eye deepEye deep
Eye deep
 
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
 
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
Deep Residual Hashing Neural Network for Image Retrieval
Deep Residual Hashing Neural Network for Image RetrievalDeep Residual Hashing Neural Network for Image Retrieval
Deep Residual Hashing Neural Network for Image Retrieval
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from Scratch
 
Deep learning study 2
Deep learning study 2Deep learning study 2
Deep learning study 2
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdf
 
C++ and Deep Learning
C++ and Deep LearningC++ and Deep Learning
C++ and Deep Learning
 
Deep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlowDeep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlow
 
Deep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowDeep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlow
 
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCAccelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
 
Neural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningNeural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learning
 
Convolutional Neural Network
Convolutional Neural NetworkConvolutional Neural Network
Convolutional Neural Network
 
Binary numbersystem
Binary numbersystemBinary numbersystem
Binary numbersystem
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
 
DeepXplore: Automated Whitebox Testing of Deep Learning
DeepXplore: Automated Whitebox Testing of Deep LearningDeepXplore: Automated Whitebox Testing of Deep Learning
DeepXplore: Automated Whitebox Testing of Deep Learning
 
The reversible residual network
The reversible residual networkThe reversible residual network
The reversible residual network
 

More from Ding Li

Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
Ding Li
 
Titanic survivor prediction by machine learning
Titanic survivor prediction by machine learningTitanic survivor prediction by machine learning
Titanic survivor prediction by machine learning
Ding Li
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
Ding Li
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
Ding Li
 
Practical data science
Practical data sciencePractical data science
Practical data science
Ding Li
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
Ding Li
 
AI to advance science research
AI to advance science researchAI to advance science research
AI to advance science research
Ding Li
 
Machine learning with graph
Machine learning with graphMachine learning with graph
Machine learning with graph
Ding Li
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer models
Ding Li
 
Great neck school budget 2016-2017 analysis
Great neck school budget 2016-2017 analysisGreat neck school budget 2016-2017 analysis
Great neck school budget 2016-2017 analysis
Ding Li
 
Business Intelligence and Big Data in Cloud
Business Intelligence and Big Data in CloudBusiness Intelligence and Big Data in Cloud
Business Intelligence and Big Data in Cloud
Ding Li
 

More from Ding Li (11)

Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
 
Titanic survivor prediction by machine learning
Titanic survivor prediction by machine learningTitanic survivor prediction by machine learning
Titanic survivor prediction by machine learning
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Practical data science
Practical data sciencePractical data science
Practical data science
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
AI to advance science research
AI to advance science researchAI to advance science research
AI to advance science research
 
Machine learning with graph
Machine learning with graphMachine learning with graph
Machine learning with graph
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer models
 
Great neck school budget 2016-2017 analysis
Great neck school budget 2016-2017 analysisGreat neck school budget 2016-2017 analysis
Great neck school budget 2016-2017 analysis
 
Business Intelligence and Big Data in Cloud
Business Intelligence and Big Data in CloudBusiness Intelligence and Big Data in Cloud
Business Intelligence and Big Data in Cloud
 

Recently uploaded

Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 


Digit recognizer by convolutional neural network

  • 1. Digit Recognizer by Convolutional Neural Network (CNN). Ding Li, 2018.06. Online store: costumejewelry1.com
  • 2. MNIST database (Modified National Institute of Standards and Technology database): 60,000 training images; 10,000 testing images. Kaggle Challenge: 42,000 training images; 28,000 testing images. Task: predict the handwritten digit from the pixel values of the input image. Input: a 28*28 image, i.e. 784 pixels, each coded as a value in [0,255]. Result: a label in [0,9]. https://en.wikipedia.org/wiki/MNIST_database
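The input and label formats on slide 2 can be sketched as follows. This is a minimal illustration, not the author's preprocessing: the helper names `to_image` and `one_hot` are hypothetical, and scaling pixel values to [0, 1] is an assumed (though common) step that the slides do not mention.

```python
def to_image(pixels, width=28, scale=255.0):
    """Reshape a flat list of pixel values into rows and scale them to [0, 1].

    Scaling by 255 is an assumption; the slides only state the [0, 255] range.
    """
    if len(pixels) % width != 0:
        raise ValueError("pixel count must be a multiple of the image width")
    return [
        [value / scale for value in pixels[start:start + width]]
        for start in range(0, len(pixels), width)
    ]


def one_hot(label, num_classes=10):
    """Encode a digit label in [0, 9] as a vector with a single 1."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1 if i == label else 0 for i in range(num_classes)]


# A made-up all-white image: 784 pixels with value 255.
image = to_image([255] * 784)
print(len(image), len(image[0]))  # 28 28
print(one_hot(0))
```

The one-hot vector is the same representation slide 9 uses for the true value y.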
  • 3. Network architecture: Input 28*28 → Conv 1 (3*3, 16 filters) → 26*26*16 → Conv 2 (3*3, 16 filters) → 24*24*16 → Max pool 1 (2*2) → 12*12*16 → Conv 3 (3*3, 32 filters) → 10*10*32 → Conv 4 (3*3, 32 filters) → 8*8*32 → Max pool 2 (2*2) → 4*4*32 → Flatten → 512 → Dense 1 (fully connected, ReLU) → 512 → Dense 2 (fully connected, ReLU) → 1024 → Dense 3 (fully connected, softmax) → 10 probabilities, one for each digit 0 to 9 → the maximum probability gives the predicted label [0,9]. Trainable parameters: 814,778. Related architectures: LeNet-5, AlexNet, VGG 16. https://en.wikipedia.org/wiki/Convolutional_neural_network https://en.wikipedia.org/wiki/Yann_LeCun
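The layer shapes on slide 3 can be checked with a little arithmetic. The sketch below assumes no padding, stride 1, and one bias per filter or node; none of these are stated on the slide, but they reproduce the listed shapes. The function names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Output width of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights plus one bias per node."""
    return n_in * n_out + n_out


def summarize():
    size, channels, total = 28, 1, 0
    shapes = []
    layers = [("conv", 16), ("conv", 16), ("pool", None),
              ("conv", 32), ("conv", 32), ("pool", None)]
    for kind, filters in layers:
        if kind == "conv":
            total += conv_params(channels, filters)
            size, channels = conv_out(size), filters
        else:
            size //= 2  # 2*2 max pooling halves each spatial dimension
        shapes.append((size, size, channels))
    units = size * size * channels  # flatten
    for n_out in (512, 1024, 10):
        total += dense_params(units, n_out)
        units = n_out
    return shapes, total


shapes, total = summarize()
print(shapes)
print(total)
```

Under these assumptions the convolutional and dense layers account for 814,586 parameters. The remaining 192 of the reported 814,778 would be consistent with two trainable values per channel after each convolution (2 × (16 + 16 + 32 + 32) = 192), as batch normalization would add, though the slides do not say this.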
  • 4. Convolution with a filter (kernel) as a feature detector. Input (6*6): every row is 10 10 10 0 0 0. Filter (3*3): every row is 1 0 -1. At each position the filter is multiplied element by element with the 3*3 input window under it and the products are summed. For the window whose rows are 10 10 0, the products are 10 0 0 in every row, so the sum is 30. Output (4*4): every row is 0 30 30 0. The value 30 marks a detected edge; 0 means no edge. https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
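The edge-detection example on slide 4 can be reproduced with a few lines of plain Python. This is a sketch assuming a square kernel, no padding, and stride 1; the function name `convolve2d` is hypothetical. As in most deep-learning libraries, the kernel is applied without flipping.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image and sum the element-wise products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Input and filter taken from the slide.
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]
vertical_edge = [[1, 0, -1] for _ in range(3)]

result = convolve2d(image, vertical_edge)
for row in result:
    print(row)  # each row: [0, 30, 30, 0]
```

The two middle columns respond with 30 because only windows that straddle the boundary between the 10s and the 0s produce a nonzero sum.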
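  • 5. Two example filters: a vertical-edge filter with rows 1 0 -1 / 1 0 -1 / 1 0 -1, and a horizontal-edge filter with rows 1 1 1 / 0 0 0 / -1 -1 -1. Multiple convolutional filters (kernels) can capture different changes among neighboring pixels. https://aishack.in/tutorials/image-convolution-examples/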
  • 6. Max pool, size 2 x 2. Input (4*4): 3 8 2 1 / 7 9 1 1 / 4 5 2 3 / 5 6 1 2. Output (2*2): 9 2 / 6 3. Pooling is applied after the convolutional layers and reduces noise and data size. Local information is concentrated into higher-level information. Besides max pooling, there is also average pooling.
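The 2 x 2 max pool on slide 6 can be written directly. This sketch assumes non-overlapping windows (stride equal to the window size); the function name `max_pool` is hypothetical.

```python
def max_pool(image, size=2):
    """Keep the largest value in each non-overlapping size*size window."""
    return [
        [
            max(image[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(image[0]) - size + 1, size)
        ]
        for r in range(0, len(image) - size + 1, size)
    ]


# Input taken from the slide.
grid = [
    [3, 8, 2, 1],
    [7, 9, 1, 1],
    [4, 5, 2, 3],
    [5, 6, 1, 2],
]
pooled = max_pool(grid)
print(pooled)  # [[9, 2], [6, 3]]
```

Replacing `max` with an average over the same window would give the average pooling the slide mentions.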
  • 7. A single node. Inputs: x0, x1, x2. Output: a. Parameters (to be optimized): w0, w1, w2, b. $z = w_0 x_0 + w_1 x_1 + w_2 x_2 + b$ and $a = f(z)$, where f is the activation function. The activation function is nonlinear and increases certainty (on/off). ReLU: $f(z) = \max(0, z)$. Sigmoid: $f(z) = \frac{1}{1 + e^{-z}}$. http://cs231n.github.io/neural-networks-1/
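The node on slide 7 translates almost one-to-one into code. The input, weight, and bias values below are made up for illustration, and the function name `neuron` is hypothetical.

```python
import math


def relu(z):
    """ReLU: f(z) = max(0, z)."""
    return max(0.0, z)


def sigmoid(z):
    """Sigmoid: f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, f):
    """Compute a = f(z) with z = w0*x0 + w1*x1 + w2*x2 + b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)


# Hypothetical values: z = 0.5 - 2.0 + 0.75 + 0.1 = -0.65
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
b = 0.1

print(neuron(x, w, b, relu))     # negative z is switched "off"
print(neuron(x, w, b, sigmoid))  # a value between 0 and 1
```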
  • 8. Fully connected layers. All the inputs from the previous layer are combined at each node. The diagram shows inputs x0, x1, x2, x3 followed by three layers of six nodes each: $a_0^{[1]}, \dots, a_5^{[1]}$; $a_0^{[2]}, \dots, a_5^{[2]}$; $a_0^{[3]}, \dots, a_5^{[3]}$. For the first layer: $a_0^{[1]} = f(w_{0,0}^{[1]} x_0 + w_{1,0}^{[1]} x_1 + w_{2,0}^{[1]} x_2 + w_{3,0}^{[1]} x_3 + b_0^{[1]})$, $a_1^{[1]} = f(w_{0,1}^{[1]} x_0 + w_{1,1}^{[1]} x_1 + w_{2,1}^{[1]} x_2 + w_{3,1}^{[1]} x_3 + b_1^{[1]})$, and so on. All the local features extracted in previous layers are fully connected with different weights to construct global features. Complicated relationships between inputs can be revealed by deep networks. https://github.com/drewnoff/spark-notebook-ml-labs/tree/master/labs/DLFramework
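A fully connected layer as described on slide 8 is the single-node formula repeated for every node. In this sketch `weights[i][j]` connects input i to node j, matching the slide's $w_{i,j}$ indexing; the function name `dense` and all numeric values are hypothetical.

```python
def relu(z):
    return max(0.0, z)


def dense(x, weights, biases, activation):
    """Compute a_j = f(sum_i weights[i][j] * x[i] + biases[j]) for every node j."""
    return [
        activation(sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j])
        for j in range(len(biases))
    ]


# Four inputs feeding two nodes (made-up numbers).
x = [1.0, 2.0, 3.0, 4.0]
weights = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.0],
    [0.0, -1.0],
]
biases = [0.5, 0.5]

layer_output = dense(x, weights, biases, relu)
print(layer_output)  # [4.5, 0.0]
```

Stacking layers means feeding `layer_output` into another call to `dense` with its own weights and biases.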
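  • 9. Output layer, softmax, and loss. The L-1 layer has 1024 nodes, $a_0^{[L-1]}, a_1^{[L-1]}, a_2^{[L-1]}, \dots, a_{1023}^{[L-1]}$; the L layer has 10 nodes, one for each digit 0, 1, 2, ..., 9.
    Linear combination of inputs from the previous layer: $z_0 = w_{0,0}^{[L]} a_0^{[L-1]} + w_{1,0}^{[L]} a_1^{[L-1]} + w_{2,0}^{[L]} a_2^{[L-1]} + \dots + w_{1023,0}^{[L]} a_{1023}^{[L-1]} + b_0^{[L]}$
    Softmax, normalize the result: $\hat{y}_0 = \frac{e^{z_0}}{e^{z_0} + e^{z_1} + e^{z_2} + \dots + e^{z_9}}$, the predicted probability that y = 0, with $\hat{y}_0 + \hat{y}_1 + \hat{y}_2 + \dots + \hat{y}_9 = 1$.
    Prediction: $\hat{y} = [\hat{y}_0, \hat{y}_1, \hat{y}_2, \dots, \hat{y}_9]$. True value: $y = [y_0, y_1, y_2, \dots, y_9]$.
    E.g. for the digit 0: $\hat{y} = [0.9, 0.02, 0.01, 0.02, \dots, 0.04]$ and $y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]$.
    Loss function: $L(\hat{y}, y) = -\sum_{i=0}^{9} y_i \log(\hat{y}_i)$. For the example, $L(\hat{y}, y) = -1 \cdot \log 0.9 = 0.046$ (base-10 logarithm; with the natural logarithm the value is about 0.105). $L \ge 0$; at the best match, $\hat{y} = y$, $L(\hat{y}, y) = -1 \cdot \log 1 = 0$.
    Cost function: $J(w, b) = \frac{1}{m} \sum_{k=1}^{m} L(\hat{y}^{(k)}, y^{(k)})$, the total loss of the m samples divided by m. Goal: minimize the cost function.
    Reference: A Friendly Introduction to Cross-Entropy Loss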
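The softmax, loss, and cost formulas on slide 9 can be sketched as below. The slide elides part of the example prediction vector, so the middle entries of `y_pred` here are hypothetical fillers chosen only so that the vector sums to 1; just the first entry (0.9) affects the loss. The function names are also hypothetical.

```python
import math


def softmax(z):
    """Normalize scores into probabilities that sum to 1."""
    shift = max(z)  # subtracting the max avoids overflow without changing the result
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_pred, log=math.log):
    """L = -sum_i y_i * log(y_hat_i); terms with y_i = 0 contribute nothing."""
    return -sum(t * log(p) for t, p in zip(y_true, y_pred) if t > 0)


def cost(batch_true, batch_pred, log=math.log):
    """J = average loss over the m samples in the batch."""
    losses = [cross_entropy(t, p, log) for t, p in zip(batch_true, batch_pred)]
    return sum(losses) / len(losses)


y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical completion of the slide's example vector.
y_pred = [0.9, 0.02, 0.01, 0.02, 0.0, 0.0, 0.0, 0.0, 0.01, 0.04]

print(round(cross_entropy(y_true, y_pred, math.log10), 3))  # base-10 log, as on the slide
print(round(cross_entropy(y_true, y_pred), 3))              # natural log
```

The choice of logarithm base only rescales the loss by a constant factor, so it does not change which parameters minimize the cost.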
  • 10. Forward propagation and backpropagation. Forward propagation runs from the L-2 layer activations $A^{[L-2]}$ through $W^{[L-1]}, b^{[L-1]}$ to the L-1 layer activations $A^{[L-1]}$ ($a_0^{[L-1]}, \dots, a_{1023}^{[L-1]}$), then through $W^{[L]}, b^{[L]}$ to the outputs $\hat{y}_0, \hat{y}_1, \hat{y}_2, \dots, \hat{y}_9$; backpropagation runs in the opposite direction. Goal: minimize the difference between the predicted $\hat{y}$ and the true $y$.
    1. With the initial parameters W and b, predict the label $\hat{y}$ with forward propagation and calculate the cost.
    2. Optimize the parameters of the L layer, $W^{[L]}$ and $b^{[L]}$, assuming the inputs from the L-1 layer, $A^{[L-1]}$, do not change.
    3. Calculate the change of the L-1 layer output, $A^{[L-1]}$, that is needed to minimize the cost function, assuming the parameters $W^{[L]}$ and $b^{[L]}$ do not change.
    4. Optimize the parameters of the L-1 layer, $W^{[L-1]}$ and $b^{[L-1]}$, from the desired changes of $A^{[L-1]}$.
    5. Proceed like this all the way to the first layer, optimizing the parameters W and b of all layers.
    6. Running forward propagation and backpropagation once over the whole training set is called one epoch; run multiple epochs until the cost is near its minimum value.
    https://en.wikipedia.org/wiki/Backpropagation https://en.wikipedia.org/wiki/Geoffrey_Hinton
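Steps 1 to 3 on slide 10 can be illustrated for the last layer alone. This is a sketch under stated assumptions, not the author's training code: it uses plain gradient descent with a hypothetical learning rate, a tiny layer (3 inputs, 3 classes) with made-up values, and the standard result that for softmax with cross-entropy the gradient with respect to z is $\hat{y} - y$.

```python
import math


def softmax(z):
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(a_prev, W, b):
    """Step 1: z_j = sum_i W[i][j] * a_prev[i] + b[j], then softmax."""
    z = [sum(a_prev[i] * W[i][j] for i in range(len(a_prev))) + b[j]
         for j in range(len(b))]
    return softmax(z)


def loss(y, y_hat):
    return -sum(t * math.log(p) for t, p in zip(y, y_hat) if t > 0)


def backward_step(a_prev, y, W, b, lr=0.1):
    """One gradient-descent update of the last layer (lr is an assumed value)."""
    y_hat = forward(a_prev, W, b)
    dz = [p - t for p, t in zip(y_hat, y)]  # dL/dz for softmax + cross-entropy
    # Step 3: how the previous layer's output should change, holding W and b fixed.
    da_prev = [sum(W[i][j] * dz[j] for j in range(len(dz)))
               for i in range(len(a_prev))]
    # Step 2: update W and b, holding a_prev fixed.
    new_W = [[W[i][j] - lr * a_prev[i] * dz[j] for j in range(len(dz))]
             for i in range(len(a_prev))]
    new_b = [b[j] - lr * dz[j] for j in range(len(dz))]
    return new_W, new_b, da_prev


# Made-up activations, label, and initial parameters.
a_prev = [1.0, 0.5, -0.5]
y = [1, 0, 0]
W = [[0.1, -0.2, 0.0], [0.0, 0.1, 0.2], [-0.1, 0.0, 0.1]]
b = [0.0, 0.0, 0.0]

before = loss(y, forward(a_prev, W, b))
W, b, da_prev = backward_step(a_prev, y, W, b)
after = loss(y, forward(a_prev, W, b))
print(before, after)  # the loss is lower after the update
```

In a full network, `da_prev` would be passed back to the L-1 layer to update its own parameters (step 4), and so on down to the first layer.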
  • 11. Dropout. A proportion of the nodes in a layer are randomly ignored for each training sample. This technique forces the network to learn features in a distributed way and reduces overfitting. Dropout is applied after the two pooling layers and after the first two fully connected layers.
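The dropout idea on slide 11 can be sketched as a random mask over a layer's outputs. The rescaling of the surviving values by 1 / (1 - rate), often called inverted dropout, is an assumption here; the slide does not describe it. The function name `dropout` and the rate are hypothetical, and the mask is only meant to be applied during training.

```python
import random


def dropout(activations, rate, rng=random):
    """Zero each activation with probability `rate`; rescale the rest.

    The rescaling keeps the expected sum unchanged (an assumed convention).
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the example is repeatable
dropped = dropout([1.0] * 1000, rate=0.25, rng=rng)
fraction_zeroed = dropped.count(0.0) / len(dropped)
print(fraction_zeroed)  # close to the requested rate of 0.25
```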
  • 12. Can you predict the true value? Some images are hard even for humans to recognize; more samples like these can help. After 1 hour of training (30 epochs), the model achieved 99.67% accuracy.
  • 13. Some images are hard even for humans to recognize; more samples like these can help. After 1 hour of training (30 epochs), the model achieved 99.67% accuracy.
  • 14. Input → convolution matrix (16 filters, generated from training) → output.
  • 15. Deeper layers are more abstract: 26*26*16 → 24*24*16 → 12*12*16 → 10*10*32 → 8*8*32 → 4*4*32.
  • 16. The machine combines the 1024 final values to judge the label [0,9]. Light signals → logical signals.
  • 17. Can we understand the machine's 'mind'?
  • 18. Summary and open questions.
      • Convolutional neural networks are very powerful for analyzing visual images.
      • The convolutional layers capture local features.
      • The pooling layers concentrate the local changes and reduce noise and data size.
      • The fully connected layers combine all the local features to generate global features.
      • The global features are combined to make the final judgement, here the probability of each label [0,9].
      • Can humans understand artificial neural networks?
      • Is there any similarity between how the brain and a CNN process visual information?
      • What is the meaning of the local and global features generated by machines?
      • Can humans understand machines' logic?
    Python code of the project at Kaggle: https://www.kaggle.com/dingli/digits-recognition-with-cnn-keras