❤ Convolutional Neural Network
Presented by Junho Cho (@junhocho)
© Junho Cho, 2016 1
Convolutional?
© Junho Cho, 2016 2
This is a Neural Network (NN)
© Junho Cho, 2016 3
Machine Learning =
Feature Representation + Classifier
such as ...
© Junho Cho, 2016 4
SIFT feature + SVM
© Junho Cho, 2016 5
HoG feature + Random Forest
© Junho Cho, 2016 6
HoG feature + SVM
© Junho Cho, 2016 7
SIFT feature + Random Forest
© Junho Cho, 2016 8
CNN feature + SVM
© Junho Cho, 2016 9
SIFT feature + Neural Network
© Junho Cho, 2016 10
Problem
Lots of feature extractors, hard feature engineering
Which classifier?
Framework is extremely modular
© Junho Cho, 2016 11
Deep Learning
enables learning the representation and the classifier
End-to-End (learned together)
© Junho Cho, 2016 12
To be End-to-End
All parts of the network are differentiable
For Back-propagation
© Junho Cho, 2016 13
Much easier training (relative to the past)
You don't need to extract features yourself.
Just let the Neural Network
learn features by itself!
Including the classifier!
© Junho Cho, 2016 14
Requires less domain knowledge
Applies to various domains!
Speech, Text, Image, Reinforcement Learning
© Junho Cho, 2016 15
DenseCap : localize and caption (Vision + Natural Language)
© Junho Cho, 2016 16
and Better performance ⭐
© Junho Cho, 2016 17
This is a typical Convolutional
Neural Network (CNN)
[LeNet-5, LeCun 1998]
© Junho Cho, 2016 18
A basic CNN is
composed of
1. Convolution (Conv)
2. Pooling (Subsampling)
3. Rectified Linear Unit (ReLU)
4. Fully Connected layers (FC)
© Junho Cho, 2016 19
Basic CNN
[(Conv-ReLU)*n - POOL] * m - (FC-ReLU) * k - loss
that's it! for real
© Junho Cho, 2016 20
© Junho Cho, 2016 21
Will explain these computations later
© Junho Cho, 2016 22
CNN usage?
Mostly on
Images!
© Junho Cho, 2016 23
Used as an image recognizer
• Object Classification (Recognition)
• Object Detection
• Image Captioning
• Visual Q&A
• Even in AlphaGo
© Junho Cho, 2016 24
Where to begin ...
the History!
© Junho Cho, 2016 25
[LeNet-5, LeCun 1998]
© Junho Cho, 2016 26
Shallow CNNs existed,
but deep CNNs weren't popular:
1. Computationally hard at the time
2. Vanishing gradient problem: can't train deep nets
© Junho Cho, 2016 27
We wanna go
deeeeeeeper
© Junho Cho, 2016 28
Deep Learning
This is now solved thanks to several
advances:
1. Lots of data (ImageNet)
2. Powerful computation (GPU)
3. Some practical techniques (ReLU, Dropout)
© Junho Cho, 2016 29
What is ImageNet?
© Junho Cho, 2016 30
http://image-net.org
ImageNet is an image database containing 14,197,122 images
with labels.
ILSVRC : challenge of Classification/Localization/Detection
© Junho Cho, 2016 31
ImageNet Top-5 Classification
Error
© Junho Cho, 2016 32
Go deeeeeeeper
© Junho Cho, 2016 33
Where do we use it
again?
© Junho Cho, 2016 34
© Junho Cho, 2016 35
© Junho Cho, 2016 36
© Junho Cho, 2016 37
© Junho Cho, 2016 38
© Junho Cho, 2016 39
© Junho Cho, 2016 40
© Junho Cho, 2016 41
© Junho Cho, 2016 42
Neural Art
video link
© Junho Cho, 2016 43
Now let's understand the computation
Conv, ReLU, Pool, FC
© Junho Cho, 2016 44
Reminder: Perceptron
© Junho Cho, 2016 45
Fully Connected (FC) layers
Densely connected.
Computes over all input neurons; spatial information disappears.
© Junho Cho, 2016 46
FC is matrix multiplication.
output = tf.matmul(input, W)
input size: n
output size: m
then W has n × m parameters
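A minimal sketch of an FC layer as a matmul (the sizes 784 and 256 below are arbitrary examples, and the bias term is omitted for brevity):

import tensorflow as tf

n, m = 784, 256                                            # input size n, output size m
inp = tf.placeholder(tf.float32, [None, n])                # a batch of flattened inputs
W = tf.Variable(tf.truncated_normal([n, m], stddev=0.01))  # W holds n * m parameters
out = tf.matmul(inp, W)                                    # output: [batch, m]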
© Junho Cho, 2016 47
Convolution
It keeps spatial information using convolutional filters
© Junho Cho, 2016 48
Reminder: 1D Convolution
© Junho Cho, 2016 49
But We do 2D
convolution on
images
© Junho Cho, 2016 50
© Junho Cho, 2016 51
© Junho Cho, 2016 52
© Junho Cho, 2016 53
© Junho Cho, 2016 54
© Junho Cho, 2016 55
© Junho Cho, 2016 56
© Junho Cho, 2016 57
© Junho Cho, 2016 58
© Junho Cho, 2016 59
Basically, we train these Conv filters
via Back-propagation
© Junho Cho, 2016 60
More hyperparameters in Conv
1. stride: step size of the Conv filter
2. padding: add borders (usually zeros) to the input (output-size formula below)
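The output spatial size follows the standard formula out = (in + 2*padding - kernel_size)/stride + 1; a small helper sketch (the 7x7 input below is just an illustrative value, since the quiz inputs appear only as figures):

def conv_output_size(in_size, kernel_size, stride, padding):
    # standard Conv output-size formula (integer division)
    return (in_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(7, 3, 2, 0))  # 3
print(conv_output_size(7, 3, 2, 1))  # 4
print(conv_output_size(7, 3, 1, 1))  # 7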
© Junho Cho, 2016 61
Summary of Conv
• Slide the Conv filter over the input.
• Maintain spatial info with shared filter weights
• Parameters: kernel_size, filterNum, padding, stride
• Learnable parameters: the filter weights (kernel_size × kernel_size × in_channels × filterNum)
© Junho Cho, 2016 62
In TensorFlow
tf.nn.conv2d(input, filter,
strides, padding,
use_cudnn_on_gpu=None,
data_format=None,
name=None)
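A minimal usage sketch of the call above (the shapes are arbitrary examples):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])                 # [batch, H, W, in_channels]
f = tf.Variable(tf.truncated_normal([3, 3, 1, 32], stddev=0.01))  # [kH, kW, in, out]
y = tf.nn.conv2d(x, f, strides=[1, 1, 1, 1], padding='SAME')      # y: [batch, 28, 28, 32]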
© Junho Cho, 2016 63
© Junho Cho, 2016 64
© Junho Cho, 2016 65
Quiz
© Junho Cho, 2016 66
© Junho Cho, 2016 67
• kernel_size = 3
• Stride = 2
• padding = 0
© Junho Cho, 2016 68
© Junho Cho, 2016 69
• kernel_size = 3
• Stride = 2
• padding = 1
© Junho Cho, 2016 70
© Junho Cho, 2016 71
• kernel_size = 3
• Stride = 1
• padding = 1
© Junho Cho, 2016 72
© Junho Cho, 2016 73
© Junho Cho, 2016 74
© Junho Cho, 2016 75
© Junho Cho, 2016 76
Compare FC and Conv
Local Invariance
figure credit to CS231n
© Junho Cho, 2016 77
CNN is powerful because
• Local invariance
Since the convolution filters are ‘sliding’ over the input image, the
exact location of the object we want to find does not matter
much.
• Fewer parameters
This helps prevent overfitting.
© Junho Cho, 2016 78
Basically, Conv is a special case of FC.
• Doubly block circulant matrix
• Toeplitz matrix
Conv can be implemented with matrix multiplication (see the 1D sketch below)
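A small NumPy illustration of this idea in 1D (hypothetical toy values; each row of T is the kernel shifted by one position):

import numpy as np

x = np.array([1., 2., 3., 4.])
k = np.array([1., -1.])                       # 1D kernel ('valid' sliding)
T = np.array([[1., -1., 0., 0.],
              [0., 1., -1., 0.],
              [0., 0., 1., -1.]])             # Toeplitz-style matrix built from k
print(T @ x)                                  # [-1. -1. -1.]
print(np.convolve(x, k[::-1], mode='valid'))  # same result via convolution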
© Junho Cho, 2016 79
We apply ReLU to add non-linearity to the model.
© Junho Cho, 2016 80
We need an activation function after Conv
© Junho Cho, 2016 81
Rectified Linear Unit
This is an activation function, replacing sigmoid
© Junho Cho, 2016 82
A single Perceptron
cannot solve XOR.
1. Thus bend the dimension (add non-linearity!)
2. Multi-Layer Perceptron
© Junho Cho, 2016 83
Activation functions are like
switches:
they turn each neuron on or off
and add non-linearity to the Neural Network.
Let's test it!
© Junho Cho, 2016 84
Sigmoid has the vanishing gradient problem.
ReLU is practically the best activation function in CNNs
© Junho Cho, 2016 85
But sigmoid and tanh are also occasionally used.
There are also PReLU and LeakyReLU (sketched below)
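Sketches of these activations in TensorFlow (the 0.2 slope for LeakyReLU is an arbitrary example; PReLU learns the slope instead):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None])
relu = tf.nn.relu(x)                 # max(0, x)
leaky_relu = tf.maximum(0.2 * x, x)  # keeps a small gradient for x < 0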
© Junho Cho, 2016 86
Pooling
• makes the representations smaller and more manageable
• reduces the number of parameters and helps prevent overfitting
• operates over each activation map independently
• Average, L2-norm, Max-pooling
© Junho Cho, 2016 87
Max-pooling
Normally use max-pooling because it generally performs better
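A minimal 2x2 max-pooling sketch, in the same style as the model code later (the input shape is an arbitrary example):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 32])
p = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
# p: [batch, 14, 14, 32] -- spatial size halved, channel count unchanged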
© Junho Cho, 2016 88
However, recent models replace pooling with strided Conv
© Junho Cho, 2016 89
Dropout
While training, turn off neurons randomly.
This regularizes the model.
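A minimal dropout sketch (the keep probability is fed at run time: e.g. 0.5 during training and 1.0 at test time, as in the MNIST example later):

import tensorflow as tf

h = tf.placeholder(tf.float32, [None, 625])
p_keep = tf.placeholder(tf.float32)
h_drop = tf.nn.dropout(h, p_keep)  # randomly zeroes neurons, rescales the survivors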
© Junho Cho, 2016 90
How do they look in code?
def model(X, w, w2, w3, w4, w_o, p_keep_conv, p_keep_hidden):
l1a = tf.nn.relu(tf.nn.conv2d(X, w,
strides=[1, 1, 1, 1], padding='SAME'))
# l1a output shape=(?, input_height, input_width, number_of_channels_layer1)
l1 = tf.nn.max_pool(l1a, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='SAME')
# l1 output shape=(?, input_height/2, input_width/2, number_of_channels_layer1)
l1 = tf.nn.dropout(l1, p_keep_conv)
l2a = tf.nn.relu(tf.nn.conv2d(l1, w2,
strides=[1, 1, 1, 1], padding='SAME'))
# l2a output shape=(?, input_height/2, input_width/2, number_of_channels_layer2)
l2 = tf.nn.max_pool(l2a, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='SAME')
# l2 shape=(?, input_height/4, input_width/4, number_of_channels_layer2)
l2 = tf.nn.dropout(l2, p_keep_conv)
l3a = tf.nn.relu(tf.nn.conv2d(l2, w3,
strides=[1, 1, 1, 1], padding='SAME'))
# l3a shape=(?, input_height/4, input_width/4, number_of_channels_layer3)
l3 = tf.nn.max_pool(l3a, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='SAME')
# l3 shape=(?, input_height/8, input_width/8, number_of_channels_layer3)
l3 = tf.reshape(l3, [-1, w4.get_shape().as_list()[0]])
# flatten to (?, input_height/8 * input_width/8 * number_of_channels_layer3)
l3 = tf.nn.dropout(l3, p_keep_conv)
l4 = tf.nn.relu(tf.matmul(l3, w4))
#fully connected_layer
l4 = tf.nn.dropout(l4, p_keep_hidden)
pyx = tf.matmul(l4, w_o)
return pyx
© Junho Cho, 2016 91
Let's analyze famous architectures
© Junho Cho, 2016 92
AlexNet
© Junho Cho, 2016 93
© Junho Cho, 2016 94
© Junho Cho, 2016 95
VGGNet
© Junho Cho, 2016 96
© Junho Cho, 2016 97
© Junho Cho, 2016 98
© Junho Cho, 2016 99
Only uses 3x3 Conv layers.
In terms of receptive field,
two 3x3 layers are better than one 5x5,
and three 3x3 layers are better than one 7x7,
because of fewer parameters (see the arithmetic below).
A network that is too deep is hard to train.
Performance: VGG16 > VGG19
Many researchers fine-tuned this net for
their task because of the simplicity of the
net architecture.
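The parameter counts behind the 3x3 claim, for C input and C output channels (biases ignored; C = 64 is an arbitrary example):

C = 64
print(2 * 3 * 3 * C * C)  # two 3x3 layers:   18*C^2 =  73,728 weights
print(5 * 5 * C * C)      # one 5x5 layer:    25*C^2 = 102,400 weights
print(3 * 3 * 3 * C * C)  # three 3x3 layers: 27*C^2 = 110,592 weights
print(7 * 7 * C * C)      # one 7x7 layer:    49*C^2 = 200,704 weights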
© Junho Cho, 2016 100
GoogLeNet
© Junho Cho, 2016 101
Uses the Inception module.
Enhances scale invariance.
© Junho Cho, 2016 102
ResNet
You can actually train super deep networks!
© Junho Cho, 2016 103
© Junho Cho, 2016 104
Skip-connections make the network easier to optimize!
Learn the residual transformation (sketch below).
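A minimal residual-block sketch (plain two-conv version; real ResNet blocks also use batch norm, and a projection when the shapes change):

import tensorflow as tf

def residual_block(x, w1, w2):
    h = tf.nn.relu(tf.nn.conv2d(x, w1, strides=[1, 1, 1, 1], padding='SAME'))
    h = tf.nn.conv2d(h, w2, strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(h + x)  # skip-connection: the conv layers only learn the residual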
© Junho Cho, 2016 105
© Junho Cho, 2016 106
Batch Normalization
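The standard batch-norm transform, sketched for a conv activation (gamma and beta are learned scale/shift parameters; the channel count here is an arbitrary example):

import tensorflow as tf

channels = 32
x = tf.placeholder(tf.float32, [None, 28, 28, channels])
gamma = tf.Variable(tf.ones([channels]))
beta = tf.Variable(tf.zeros([channels]))
mean, var = tf.nn.moments(x, axes=[0, 1, 2])         # per-channel batch statistics
y = gamma * (x - mean) / tf.sqrt(var + 1e-5) + beta  # normalize, then scale and shift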
© Junho Cho, 2016 107
© Junho Cho, 2016 108
To train your Neural Network.
Keep in mind
1. Data-preparation : preprocessing, input & output
2. Architecture : dimension check
3. Loss function : CrossEntropy? L2?
4. Optimizer : SGD, Adam, ...
© Junho Cho, 2016 109
Optimizer
© Junho Cho, 2016 110
© Junho Cho, 2016 111
© Junho Cho, 2016 112
ADAM optimizer
currently rules.
Just use it.
© Junho Cho, 2016 113
Just replace
SGD: tf.train.GradientDescentOptimizer
with ADAM: tf.train.AdamOptimizer
slide from link
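For example, in the MNIST code later the swap is a one-line change (the learning rate here is an arbitrary example value):

train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)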
© Junho Cho, 2016 114
import tensorflow as tf
Let's do practice!
© Junho Cho, 2016 115
Import libraries!
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
© Junho Cho, 2016 116
function for variables
def init_weights(shape):
return tf.Variable(tf.truncated_normal(shape, stddev=0.01))
© Junho Cho, 2016 117
Define model
def model(X, w, w2, w3, w4, w_o, p_keep_conv, p_keep_hidden):
l1a = tf.nn.relu(tf.nn.conv2d(X, w,
strides=[1, 1, 1, 1], padding='SAME'))
# l1a output shape=(?, input_height, input_width, number_of_channels_layer1)
l1 = tf.nn.max_pool(l1a, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='SAME')
# l1 output shape=(?, input_height/2, input_width/2, number_of_channels_layer1)
l1 = tf.nn.dropout(l1, p_keep_conv)
l2a = tf.nn.relu(tf.nn.conv2d(l1, w2,
strides=[1, 1, 1, 1], padding='SAME'))
# l2a output shape=(?, input_height/2, input_width/2, number_of_channels_layer2)
l2 = tf.nn.max_pool(l2a, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='SAME')
# l2 shape=(?, input_height/4, input_width/4, number_of_channels_layer2)
l2 = tf.nn.dropout(l2, p_keep_conv)
l3a = tf.nn.relu(tf.nn.conv2d(l2, w3,
strides=[1, 1, 1, 1], padding='SAME'))
# l3a shape=(?, input_height/4, input_width/4, number_of_channels_layer3)
l3 = tf.nn.max_pool(l3a, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='SAME')
# l3 shape=(?, input_height/8, input_width/8, number_of_channels_layer3)
l3 = tf.reshape(l3, [-1, w4.get_shape().as_list()[0]])
# flatten to (?, input_height/8 * input_width/8 * number_of_channels_layer3)
l3 = tf.nn.dropout(l3, p_keep_conv)
l4 = tf.nn.relu(tf.matmul(l3, w4))
#fully connected_layer
l4 = tf.nn.dropout(l4, p_keep_hidden)
pyx = tf.matmul(l4, w_o)
return pyx
© Junho Cho, 2016 118
Prepare Data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=False)
X_trn, Y_trn, X_test, Y_test = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels
X_trn = X_trn.reshape(-1, 28, 28, 1) # 28x28x1 input img
X_test = X_test.reshape(-1, 28, 28, 1) # 28x28x1 input img
© Junho Cho, 2016 119
Initialize
w = init_weights([3, 3, 1, 32])
w2 = init_weights([3, 3, 32, 64])
w3 = init_weights([3, 3, 64, 128])
w4 = init_weights([128 * 4 * 4, 625])
w_o = init_weights([625, 10])
X = tf.placeholder(tf.float32, [None, 28, 28, 1])  # input images
Y = tf.placeholder(tf.int64, [None])               # integer class labels
p_keep_conv = tf.placeholder(tf.float32)
p_keep_hidden = tf.placeholder(tf.float32)
py_x = model(X, w, w2, w3, w4, w_o, p_keep_conv, p_keep_hidden)
© Junho Cho, 2016 120
Define loss function
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=py_x, labels=Y))
Select Optimizer
train_op = tf.train.AdagradOptimizer(learning_rate=0.05).minimize(loss)
© Junho Cho, 2016 121
Then Train!
of course with lots of debugging
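A minimal training-loop sketch under the setup above (epoch count, batch size, and keep probabilities are arbitrary choices; it also records the per-epoch losses plotted on the next slide, evaluated on 1000-sample subsets to keep it cheap):

trn_loss_list, test_loss_list = [], []
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(50):
        # one pass of mini-batch SGD over the training set
        for start in range(0, len(X_trn), 128):
            end = start + 128
            sess.run(train_op, feed_dict={X: X_trn[start:end], Y: Y_trn[start:end],
                                          p_keep_conv: 0.8, p_keep_hidden: 0.5})
        # track losses with dropout disabled (keep probabilities = 1.0)
        trn_loss_list.append(sess.run(loss, feed_dict={X: X_trn[:1000], Y: Y_trn[:1000],
                                                       p_keep_conv: 1.0, p_keep_hidden: 1.0}))
        test_loss_list.append(sess.run(loss, feed_dict={X: X_test[:1000], Y: Y_test[:1000],
                                                        p_keep_conv: 1.0, p_keep_hidden: 1.0}))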
© Junho Cho, 2016 122
Monitor accuracy of my model
correct = tf.nn.in_top_k(py_x, Y, 1)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
Monitor my loss function drop
x = np.arange(50)
plt.plot(x, trn_loss_list)
plt.plot(x, test_loss_list)
plt.title("cross entropy loss")
plt.legend(["train loss", "test_loss"])
plt.xlabel("epoch")
plt.ylabel("cross entropy")
© Junho Cho, 2016 123
Tips for starting training
• Good weight initialization
• Random Gaussian initialization
• Famous initializations: Xavier, He
• Or use a pretrained network
• ImageNet-pretrained VGG16!
© Junho Cho, 2016 124
• Don't ignore data preparation
• You need lots of data
• Most annoying and difficult part
• Preprocessing: normalize the input image
• 0~255 >> -1~1 (sketch below)
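A minimal sketch of that normalization in NumPy (the batch here is dummy data):

import numpy as np

images = np.random.randint(0, 256, size=(4, 28, 28, 1)).astype(np.float32)
images = images / 127.5 - 1.0  # maps 0~255 to -1~1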
© Junho Cho, 2016 125
• Visualize training samples and the loss function
• Don't just pray for results
• Monitoring is necessary
© Junho Cho, 2016 126
• Check gradients & adjust the learning rate
• NaN ??!!
• Gradient explosion: lower the learning rate
• Don't be too creative
• Start from base code/architecture like VGG
• Ensemble
• Gain an extra ~2% performance
© Junho Cho, 2016 127
Other Deep Learning Frameworks?
• Torch
• TensorFlow (Keras, Slim, PrettyTensor, ...)
• Caffe
• Theano (Keras, Lasagne)
• MXnet, CNTK, PaddlePaddle, Chainer, ...
© Junho Cho, 2016 128
ConvNet benchmark
© Junho Cho, 2016 129
Hardware
No CPU, use GPU
Not AMD, use NVIDIA graphics cards
© Junho Cho, 2016 130
© Junho Cho, 2016 131
TitanX
© Junho Cho, 2016 132
Graphics cards
VRAM is the most important factor
TitanX: 12GB, 1080/1070: 8GB, 980/970: 4GB
© Junho Cho, 2016 133
The main bottleneck of training time is actually
data I/O or the CPU.
Nice data-loading code is already implemented.
© Junho Cho, 2016 134
Some more CNN applications!
© Junho Cho, 2016 135
Detection
R-CNN, Fast-RCNN, Faster-RCNN
YOLO, Single Shot Detector
© Junho Cho, 2016 136
Segmentation
Fully Convolutional Network, DeepMask
© Junho Cho, 2016 137
© Junho Cho, 2016 138
Deconvolution
Also known as Up-Convolution / Transposed Convolution
Computation is the same as the back-propagation of Convolution
It increases the spatial dimensions
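A minimal transposed-convolution sketch in TensorFlow (the shapes are arbitrary examples; note the filter shape is [kH, kW, out_channels, in_channels] and output_shape must be given explicitly):

import tensorflow as tf

x = tf.placeholder(tf.float32, [1, 14, 14, 32])
f = tf.Variable(tf.truncated_normal([3, 3, 16, 32], stddev=0.01))
y = tf.nn.conv2d_transpose(x, f, output_shape=[1, 28, 28, 16],
                           strides=[1, 2, 2, 1], padding='SAME')  # 14x14 -> 28x28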
© Junho Cho, 2016 139
ETC
© Junho Cho, 2016 140
© Junho Cho, 2016 141
Class Activation Mapping
© Junho Cho, 2016 142
NeuralArt
© Junho Cho, 2016 143
Adversarial Examples
You can fool a CNN classifier.
© Junho Cho, 2016 144
© Junho Cho, 2016 145
© Junho Cho, 2016 146
Generative Adversarial Network
© Junho Cho, 2016 147
© Junho Cho, 2016 148
© Junho Cho, 2016 149
GAN formulation
The Generator network and the Discriminator network try to fool each
other.
Finally, the Generator produces samples that fool the Discriminator.
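This is the standard minimax objective of the original GAN formulation, where G is the Generator and D the Discriminator:

\min_G \max_D \; \mathbb{E}_{x \sim p_\text{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]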
© Junho Cho, 2016 150
SRGAN : Super Resolution GAN
© Junho Cho, 2016 151