Deep Learning (2)
Convolutional Neural Networks
PRESENTED BY HENGYANG (TROY) LU
APRIL 22ND, 2017
Outline for Today
Section I. Basics of Convolutional Neural Networks
◦ What is a CNN?
◦ Comparison with traditional Neural Networks
◦ Why do we need CNNs?
◦ Boosting Technologies for CNNs
Section II. More Details of Convolutional Neural Networks
◦ AlexNet: A Network for Classification (the “Equation”)
◦ Optimization Methods in Neural Networks (the Numerical “Solver”)
Section III. Convolutional Neural Networks with Tensorflow and TFlearn
Section I. The Basics
Image from http://parse.ele.tue.nl/cluster/2/CNNArchitecture.jpg
What is a Convolutional Neural Network?
What is convolution?
◦ It is a specialized linear operation.
◦ A 2D convolution is shown on the right. (Images From: community.arm.com)
◦ Strictly speaking, it’s cross-correlation.
◦ In CNNs, all convolution operations are actually cross-correlation.
Convolutional neural networks are neural networks that use convolution in place of general
matrix multiplication in at least one of their layers. They are very powerful at processing data
with a grid-like topology. [1]
[1] Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning
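To make the operation concrete, here is a minimal NumPy sketch (not from the slides) of the 2D cross-correlation that CNN layers actually compute; for a true mathematical convolution the kernel would be flipped first.

import numpy as np

def cross_correlate2d(image, kernel):
    """2D cross-correlation: slide the kernel over the image without flipping it."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))          # 'valid' output, stride 1
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Toy example: 5x5 input, 3x3 averaging kernel -> 3x3 feature map
x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3)) / 9.0
print(cross_correlate2d(x, k).shape)                  # (3, 3)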
Comparison with MLP
In the last lecture we learned about the MLP (multi-layer perceptron), where the operation from one
layer to the neurons in the layer above is a matrix multiplication controlled by weights and biases.
In CNNs, where do those “Neurons” go?
◦ Each neuron corresponds to one element of the feature map produced by the convolution
◦ Weights are shared across spatial positions
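As a rough, assumed illustration (a 28x28 input, as in MNIST later; the numbers are not from the slides), weight sharing is what keeps the parameter count small compared with a fully connected layer:

# Fully connected layer mapping a 28x28 input to a 28x28 output:
fc_params = (28 * 28) * (28 * 28)      # 614,656 weights (plus biases)

# Convolutional layer with one shared 3x3 kernel:
conv_params = 3 * 3                    # 9 weights (plus 1 bias), reused at every position
print(fc_params, conv_params)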
Comparison with MLP
Local Connections
[Figure: connectivity diagrams, panels A, B, and C]
In A, with a convolution kernel of size 3, each activated neuron is affected only by local neurons, unlike in B,
where the connections are full; however, with depth the receptive field can expand, giving neurons effectively
global connections to the lower layers.
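To make the receptive-field growth concrete, here is a small sketch (assuming kernel size 3 and stride 1 at every layer; not from the slides):

def receptive_field(num_layers, kernel_size=3):
    """Receptive field (in input neurons) after stacking stride-1 convolutions."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel_size - 1
    return rf

print([receptive_field(n) for n in (1, 2, 3)])   # [3, 5, 7]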
Why Do We Need Convolutional Neural Networks?
A lot of challenges we could not deal with in the past: now, with CNNs, yes we can! :D
A lot of things we could already do in the past: now, with CNNs, we can do them better!
CNNs represent the current state of the art in classification, object detection, and more.
Now, let’s take a brief look at these achievements…
MNIST Handwritten Digit Recognition
The MNIST database of handwritten digits
◦ Has a training set of 60,000 examples,
◦ Has a test set of 10,000 examples,
◦ Is a subset of a larger set available from NIST (National Institute of Standards and Technology)
◦ The digits have been size-normalized (28x28) and centered in a fixed-size image.
http://simonwinder.com/2015/07/training-neural-nets-on-mnist-digits/
MNIST Classification Record [1]
Classifier | Preprocessing | Best Test Error Rate (%)
Linear Classifiers | deskewing | 7.6
K-Nearest Neighbours | shape-context feature extraction | 0.63
Boosted Stumps | Haar features | 0.87
Non-linear Classifiers | none | 3.3
SVMs | deskewing | 0.56
Neural Nets | none | 0.35
Convolutional Neural Nets | width normalization | 0.23
[1] http://yann.lecun.com/exdb/mnist/
The ImageNet Challenge [1][2]
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object
category classification and detection on hundreds of object categories and millions of images.
◦ The ILSVRC challenge has been run annually since 2010, following in the footsteps of the PASCAL VOC
challenge, which was established in 2005.
◦ ILSVRC 2010: 1,461,406 images and 1,000 object classes.
◦ Images are annotated, and annotations fall into one of two categories
◦ (1) image-level annotation of a binary label for the presence or absence of an object class in the image;
◦ (2) object-level annotation of a tight bounding box and class label around an object instance in the image.
◦ ILSVRC 2017 is the last ILSVRC challenge.
◦ Over the years, several convolutional neural network architectures have won first place:
◦ AlexNet 2012
◦ InceptionNet 2014
◦ Deep Residual Network 2015
[1] http://image-net.org/challenges/LSVRC/2017/
[2] Olga Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge
ImageNet: Diversity of Data
ImageNet: Fine-grained Classes
ImageNet: Tasks
PRISMA
Technology Behind PRISMA [1]
Deep Convolutional Neural Networks
(a) Separate the content and style of an image
(b) Recombine the content of one image with
the style of another image
[1] Leon A. Gatys et al., A Neural Algorithm of Artistic Style
Boosting Technology for CNNs
The first CNN prototypes appeared much earlier, so why have CNNs become so popular only in recent
years?
◦ Huge amounts of data and advanced storage/memory systems
◦ GPU acceleration, which is extremely fast for convolution operations (Nvidia Tesla K40, 1.4 TFlops)
◦ Deep neural network structures
◦ Practical optimization methods for training deep CNNs, such as stochastic gradient descent
◦ Off-the-shelf software packages that are available and easy to use
◦ Progress in both hardware and software makes CNNs the ONE!
Section II: More Details [1]
http://www.ritchieng.com/machine-learning/deep-learning/convs/
[1] Slides in Section II adapted from slides presented by Tugce Tasci and Kyunghee Kim
AlexNet: Dataset
Architecture
Conv L1 → Conv L2 → Conv L3 → Conv L4 → Conv L5 → Fully Connected L6 → Fully Connected L7 → Output Layer L8
Layer 1 (Convolutional)
ReLU Nonlinearity
Local Response Normalization
Overlapping Pooling
Pooling layers summarize the outputs of neighbouring groups of neurons in the same kernel map.
Two important parameters:
◦ Kernel size: z
◦ Stride size: s
◦ If s < z, the max-pooling windows overlap
In the AlexNet experiments, overlapping pooling with s = 2, z = 3 reduces the top-1 and top-5 error rates by 0.4%
and 0.3%, respectively, compared with the non-overlapping case s = 2, z = 2.
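A minimal sketch of overlapping max-pooling (z = 3, s = 2) in the TensorFlow 1.x API used later in Section III; the input shape is an assumption, not from the slides:

import tensorflow as tf

# A batch of feature maps: [batch, height, width, channels]
feature_maps = tf.placeholder(tf.float32, [None, 55, 55, 96])

# Kernel size z = 3, stride s = 2: since s < z, neighbouring windows overlap
pooled = tf.nn.max_pool(feature_maps,
                        ksize=[1, 3, 3, 1],
                        strides=[1, 2, 2, 1],
                        padding='VALID')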
Reduce Overfitting
Train the CNNs: Optimization Techniques
Back-propagation
◦ The sparse connections of CNNs reduce the complexity of back-propagation
◦ The ReLU activation function relieves the vanishing-gradient problem
Stochastic Gradient Descent
Loss Minimization
Slide credit from Nadav Cohen, “Adam: A Method for Stochastic Optimization”
Large-Scale Setting
Slide credit from Nadav Cohen, “Adam: A Method for Stochastic Optimization”
Optimization Methods Requirements
Slide credit from Nadav Cohen, “Adam: A Method for Stochastic Optimization”
Stochastic Gradient Descent (SGD)
Slide credit from Nadav Cohen, “Adam: A Method for Stochastic Optimization”
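As a minimal NumPy sketch (notation assumed: parameters theta, mini-batch gradient grad, learning rate lr; not reproduced from Cohen's slides), one SGD step is simply:

import numpy as np

def sgd_step(theta, grad, lr=0.01):
    """One vanilla SGD update using a mini-batch gradient."""
    return theta - lr * grad

# Toy usage: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta
theta = np.array([1.0, -2.0])
for _ in range(100):
    theta = sgd_step(theta, 2 * theta, lr=0.1)
print(theta)          # approaches [0, 0]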
Variants of the basic SGD
Nesterov’s Accelerated Gradient (NAG)
Adaptive Gradient (AdaGrad)
Root Mean Square Propagation (RMSProp)
Adaptive Moment Estimation (Adam)
NAG
Slide credit from Nadav Cohen, “Adam: A Method for Stochastic Optimization”
AdaGrad
Slide credit from Nadav Cohen, “Adam: A Method for Stochastic Optimization”
RMSProp
Slide credit from Nadav Cohen, “Adam: A Method for Stochastic Optimization”
ADAM
Slide credit from Nadav Cohen, “Adam: A Method for Stochastic Optimization”
ADAM
Slide credit from Nadav Cohen, “Adam: A Method for Stochastic Optimization”
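For reference, a hedged NumPy sketch of the Adam update as described in Kingma and Ba's paper (the hyperparameter values below are the commonly cited defaults, not taken from these slides):

import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and its
    element-wise square (v), with bias correction for the early steps t = 1, 2, ..."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)      # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)      # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v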
Comparisons of Different Optimization Methods
Slide credit from Nadav Cohen, “Adam: A Method for Stochastic Optimization”
Multi-Layer Neural Networks on MNIST
Slide credit from Nadav Cohen, “Adam: A Method for Stochastic Optimization”
Convolutional Neural Networks on CIFAR-10
Slide credit from Nadav Cohen, “Adam: A Method for Stochastic Optimization”
SGD for AlexNet
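A sketch of the update rule reported in the AlexNet paper (momentum 0.9, weight decay 0.0005, batch size 128); the NumPy form below is an illustration, not the authors' code:

import numpy as np

def alexnet_sgd_step(w, v, grad, lr, momentum=0.9, weight_decay=0.0005):
    """Momentum SGD with weight decay, in the form reported for training AlexNet."""
    v = momentum * v - weight_decay * lr * w - lr * grad
    w = w + v
    return w, v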
Results: ILSVRC-2010
Results: ILSVRC-2012
Section III. CNNs with Tensorflow and TFlearn
Images from Peter Goldsborough, A Tour of Tensorflow
Tensorflow
Tensorflow is an open-source library for numerical computation using data flow graphs
◦ Developed by the Google Brain team and Google’s Machine Intelligence research organization
Implementing ML in Tensorflow
◦ In tensorflow, computations are represented using Graphs
◦ Each node is an operation (OP)
◦ Data is represented as Tensors
◦ OP takes Tensors and returns Tensors
Tensorflow Demo Examples, credit from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
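A minimal example of this graph/session model (TensorFlow 1.x API, which was current at the time of this talk; not taken from the demo slides):

import tensorflow as tf

# Build a tiny graph: two constant Tensors and an OP that adds them
a = tf.constant(3.0, name="a")
b = tf.constant(4.0, name="b")
c = tf.add(a, b, name="c")     # the OP takes Tensors and returns a Tensor

# Nothing is computed until the graph is run in a session
with tf.Session() as sess:
    print(sess.run(c))         # 7.0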
Construction of Computational Graph
Follow the 3-step pattern
◦ 1. inference() – Builds the graph as far as is required for running the network forward to make
predictions
◦ 2. loss() – Adds to the inference graph the ops required to generate loss
◦ 3. training() – Adds to the loss graph the ops required to compute and apply gradients
Tensorflow Demo Examples, credit from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
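A sketch of what the three functions might look like for a small classifier (TF 1.x; layer sizes and names are assumptions, not the demo's code):

import tensorflow as tf

def inference(images, hidden_units=128, num_classes=10):
    """Build the forward graph: images -> class scores (logits)."""
    hidden = tf.layers.dense(images, hidden_units, activation=tf.nn.relu)
    return tf.layers.dense(hidden, num_classes)

def loss(logits, labels):
    """Add the ops that compute the loss from logits and integer labels."""
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

def training(loss_op, learning_rate=0.01):
    """Add the ops that compute and apply gradients (one SGD step)."""
    return tf.train.GradientDescentOptimizer(learning_rate).minimize(loss_op)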
Deep Convolutional Networks in Tensorflow
Load the training data, using MNIST
from tensorflow.examples.tutorials.mnist import input_data
Tensorflow Demo Examples, credit from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
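Continuing from the import above, the standard helper downloads and splits the data (the "MNIST_data/" directory name is an assumption):

# one_hot=True returns labels as 10-dimensional one-hot vectors
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# The helper reserves 5,000 of the 60,000 training images for validation
print(mnist.train.num_examples)   # 55000
print(mnist.test.num_examples)    # 10000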
Weight Initialization
Tensorflow Demo Examples, credit from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
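The demo code on this and the following slides appears as images in the original deck; the sketches below follow the standard TensorFlow "Deep MNIST for Experts" tutorial (TF 1.x API; variable names are assumptions). For weight initialization:

import tensorflow as tf

def weight_variable(shape):
    # Small Gaussian noise breaks the symmetry between units
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias_variable(shape):
    # A slightly positive bias helps avoid "dead" ReLU units
    return tf.Variable(tf.constant(0.1, shape=shape))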
Convolution and Pooling
Tensorflow Demo Examples, credit from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
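Continuing the sketch, two helpers wrap the convolution and pooling ops:

def conv2d(x, W):
    # Stride 1 and zero ('SAME') padding keep the spatial size unchanged
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # Non-overlapping 2x2 max-pooling halves the spatial resolution
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')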
First Convolutional Layer
Tensorflow Demo Examples, credit from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
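Continuing the sketch, the first layer maps each 28x28 image to 32 feature maps (filter size and width follow the tutorial and are assumptions):

x = tf.placeholder(tf.float32, [None, 784])
x_image = tf.reshape(x, [-1, 28, 28, 1])          # back to 2D images, 1 channel

W_conv1 = weight_variable([5, 5, 1, 32])          # 5x5 kernels, 1 input channel, 32 maps
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)                   # 28x28 -> 14x14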
Second Convolutional Layer
Tensorflow Demo Examples, credit from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
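The second layer takes the 32 pooled maps and produces 64:

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)                   # 14x14 -> 7x7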
Fully Connected Layer
Tensorflow Demo Examples, credit from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
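The pooled maps are flattened and fed to a fully connected layer (1024 units, as in the tutorial):

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)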
Dropout
Tensorflow Demo Examples, credit from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
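Dropout is applied to the fully connected activations; the keep probability is fed at run time so it can be set to 1.0 during evaluation:

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)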
Readout Layer
Tensorflow Demo Examples, credit from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
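The readout layer maps the 1024 features to 10 class scores:

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2     # unnormalized class scores (logits)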
Train and Evaluate
Tensorflow Demo Examples, credit from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
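Cross-entropy loss, a training op, and an accuracy metric complete the graph (the Adam optimizer and 1e-4 learning rate are the tutorial's choices, assumed here):

y_ = tf.placeholder(tf.float32, [None, 10])       # one-hot labels

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))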
Execute
Tensorflow Demo Examples, credit from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
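Finally, the graph is run in a session; the step count and batch size below follow the tutorial and are not from the original demo:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(20000):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_acc = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
            print("step %d, training accuracy %g" % (i, train_acc))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    print("test accuracy %g" % accuracy.eval(
        feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))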
TFLearn
TFLearn is an abstraction library built on top of Tensorflow that provides high-level building
blocks to quickly construct TensorFlow graphs.
◦ Highly modular interface
◦ Allows rapid chaining of neural network layers, regularization functions, optimizers and other elements
◦ Can be mixed freely with plain Tensorflow code
In the following part, let’s implement the previous CNN model with TFLearn and see how much
easier life becomes!
TFLearn Website http://tflearn.org/
Redo the same thing with TFLearn
Import the packages
TFLearn Website http://tflearn.org/
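The code on the TFLearn slides is likewise shown as images; the sketches below follow the official TFLearn convolutional-network example, with layer sizes mirroring the model above (assumptions, not the original demo). The imports:

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression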
Load MNIST dataset
TFLearn Website http://tflearn.org/
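Loading MNIST through TFLearn's own dataset helper:

import tflearn.datasets.mnist as mnist

X, Y, testX, testY = mnist.load_data(one_hot=True)
X = X.reshape([-1, 28, 28, 1])        # TFLearn layers expect image tensors (NHWC)
testX = testX.reshape([-1, 28, 28, 1])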
Build the convolutional network
TFLearn Website http://tflearn.org/
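The same two-convolutional-layer model as before, now expressed by chaining TFLearn layers:

network = input_data(shape=[None, 28, 28, 1], name='input')
network = conv_2d(network, 32, 5, activation='relu')
network = max_pool_2d(network, 2)
network = conv_2d(network, 64, 5, activation='relu')
network = max_pool_2d(network, 2)
network = fully_connected(network, 1024, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 10, activation='softmax')
network = regression(network, optimizer='adam', learning_rate=0.001,
                     loss='categorical_crossentropy', name='target')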
Training the Network
TFLearn Website http://tflearn.org/
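Training and evaluation reduce to a model object and a single fit call (the epoch count and run_id are assumptions):

model = tflearn.DNN(network, tensorboard_verbose=0)
model.fit({'input': X}, {'target': Y},
          n_epoch=10,
          validation_set=({'input': testX}, {'target': testY}),
          show_metric=True, run_id='convnet_mnist')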
Conclusion
Pros:
◦ Deep Convolutional Neural Networks represent current state-of-the-art techniques in image
classification, object detection and localization
◦ Powerful CNN models include AlexNet, InceptionNet, and Deep Residual Networks
◦ Open-source libraries make it very fast to deploy applications with CNNs
◦ Convolutional Neural Networks can share pre-trained weights, which is the basis for transfer learning
Cons:
◦ The interpretation and inner mechanism of CNNs are not well understood; we do not know exactly why they
work better than previous models
◦ Large amounts of training data and annotations are needed, which may not be practical for some
problems.
Thank You :D
