Convolutional Neural Network
School of Computer Science and Engineering
1. Introduction
2. Convolution
3. ReLU
4. Pooling
5. Example by TensorFlow
1.1 Definition
• A CNN is a specialized kind of neural network for processing data that has a known, grid-like topology, such as time series (a 1D grid) or image data (a 2D grid).
• CNNs are supervised deep learning models used in many fields, including speech recognition, image retrieval, and face recognition.
1.1 Definition
• ImageNet Classification with Deep Convolutional Neural Networks (cited by 9538, NIPS 2012; Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton)
• builds a CNN with 60 million parameters and 650,000 neurons, consisting of five convolutional layers
• A typical CNN is a five-layer architecture consisting of convolution layers, pooling layers, and a classification layer
• Convolution layer: extracts distinctive features from the input image
• Pooling layer: reduces the dimensionality
• Generally, a CNN is trained with the back-propagation algorithm
1.2 Motivation
• MLPs (multi-layer perceptrons) do not scale well
• MLPs ignore pixel correlations
• MLPs are not robust to image transformations
2.1 Why Convolution?
• preserves the spatial relationship between pixels by learning image features over small squares of input data
• detects small, meaningful features such as edges with kernels
[Figure: a 2D convolution example from the Deep Learning book]
2.2 Convolution Example
[Figure: an input matrix convolved with a kernel matrix]
Different filters can detect different features.
[Figure: the convolution operation]
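The convolution examples above can be reproduced in a few lines of plain Python. The following is a minimal NumPy sketch of a 2D "valid" convolution with stride 1 (computed without flipping the kernel, i.e. the cross-correlation that CNN libraries actually use); the 5x5 input and the vertical-edge kernel are illustrative values, not the exact matrices from the slides.

import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over the image and take a dot product at every position
    # ('valid' mode, stride 1, no kernel flip).
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Illustrative 5x5 binary "image" and a 3x3 vertical-edge kernel (assumed values).
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]], dtype=float)
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)

feature_map = conv2d_valid(image, edge_kernel)   # 3x3 feature map
print(feature_map)

Running the same input through a different kernel (for example, a horizontal-edge filter) produces a different feature map, which is the point of the "different filters can detect different features" slide.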
3 ReLU
• Introduces non-linearity: ReLU(x) = max(0, x) is applied element-wise to the feature map
• Other non-linear functions such as tanh or sigmoid can also be used instead of ReLU, but ReLU has been found to perform better in most situations.
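As a minimal sketch (plain NumPy, with made-up feature-map values), ReLU is simply an element-wise max(0, x); tanh and the sigmoid are shown for comparison.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)           # keep positive activations, zero out negatives

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

feature_map = np.array([[-3.0, 1.5],
                        [ 0.2, -0.7]])  # illustrative values
print(relu(feature_map))                # [[0.  1.5] [0.2 0. ]]
print(np.tanh(feature_map))             # alternative non-linearity
print(sigmoid(feature_map))             # alternative non-linearity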
4.1 Motivation of Pooling
• Reduces dimensionality
• In all cases, pooling helps to make the representation approximately invariant to small translations of the input.
• Invariance to local translation can be a very useful property if we care more about whether some feature is present than exactly where it is. For example, when determining whether an image contains a face, we need not know the location of the eyes with pixel-perfect accuracy; we just need to know that there is an eye on the left side of the face and an eye on the right side.
• Types of pooling
• Max (works better in practice)
• Average
• Sum
4.2 Max Pooling
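Since the original figure is not reproduced here, the sketch below shows 2x2 max pooling with stride 2 in plain NumPy; the 4x4 feature-map values are invented for illustration.

import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    # Take the maximum of each size x size window, moving by `stride`.
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 0],
               [2, 1, 9, 8],
               [0, 3, 4, 7]], dtype=float)   # illustrative 4x4 feature map
print(max_pool2d(fm))                        # [[6. 5.] [3. 9.]] -- 4x4 reduced to 2x2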
5 Example by TensorFlow
[Figure: a 28 x 28 MNIST input image]
• zero-padding the 28x28x1 input image to 32x32x1
• applying a 5x5x1x32 convolution to get 28x28x32
• max-pooling down to 14x14x32
• zero-padding the 14x14x32 to 18x18x32
• applying a 5x5x32x64 convolution to get 14x14x64
• max-pooling down to 7x7x64
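The pipeline above can be written compactly with the tf.keras API. This is only a sketch of the Deep MNIST architecture referenced in these slides: padding='same' stands in for the explicit zero-padding steps, and the final 1024-unit fully connected layer and 10-way softmax are assumptions taken from that tutorial rather than from the shape list above.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),                          # 28x28x1 MNIST image
    tf.keras.layers.Conv2D(32, 5, padding='same', activation='relu'),  # 5x5x1x32 conv -> 28x28x32
    tf.keras.layers.MaxPooling2D(2),                                   # -> 14x14x32
    tf.keras.layers.Conv2D(64, 5, padding='same', activation='relu'),  # 5x5x32x64 conv -> 14x14x64
    tf.keras.layers.MaxPooling2D(2),                                   # -> 7x7x64
    tf.keras.layers.Flatten(),                                         # -> 3136 features
    tf.keras.layers.Dense(1024, activation='relu'),                    # assumed fully connected layer
    tf.keras.layers.Dense(10, activation='softmax'),                   # 10 digit classes
])
model.summary()                                                        # prints the shapes listed above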
References
• http://ufldl.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork
• http://cs231n.github.io/convolutional-networks/
• https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
• Deep Learning Book
• http://www.slideshare.net/ssuser06e0c5/explanation-on-tensorflow-example-deep-mnist-for-expert
• http://shuaizhang.tech/2016/12/08/Tensorflow%E6%95%99%E7%A8%8B2-Deep-MNIST-Using-CNN/
