The document presents an overview of Convolutional Neural Networks (CNNs), detailing their advantages over traditional neural networks, specific architectures like AlexNet, and their performance in tasks such as image classification and object detection. It highlights the role of deep learning techniques and technologies that have contributed to the rise of CNNs, including improved optimization methods and powerful computational resources. Additionally, it discusses the use of TensorFlow and TFLearn for implementing CNNs, emphasizing the state-of-the-art results achieved with these models while acknowledging ongoing challenges in interpreting their mechanisms.
Overview of CNNs, their importance in deep learning, and presentation structure.
Introduction to CNNs, their operations, local vs. global connections, and their advantages.
MNIST dataset specifications and classification performance records using various techniques.
Introduction to ImageNet Challenge, its data diversity and tasks for object classification.
Discussion on boosting technologies that enhance CNN performance, such as GPU and data availability.
Details on AlexNet architecture, including its layers and components.
Methods for reducing overfitting in CNNs and their effectiveness in enhancing model performance.
Various optimization techniques including SGD, ADA, and comparative methods to improve CNN training.
Introduction to TensorFlow for CNN modeling, showcasing steps for building and training networks.
Implementation of a CNN using TFLearn, highlighting its advantages and comparing it with TensorFlow. Summary of CNNs’ strengths and weaknesses, and a thank-you note concluding the presentation.
Outline for Today
Section I. Basics of Convolutional Neural Networks
◦ What is a CNN?
◦ Comparison with traditional Neural Networks
◦ Why do we need CNNs?
◦ Boosting Technologies for CNNs
Section II. More Details of Convolutional Neural Networks
◦ AlexNet: A Network for Classification, the “Equation”
◦ Optimization Methods in Neural Networks: The Numerical “Solver”
Section III. Convolutional Neural Networks with TensorFlow and TFLearn
Section I. The Basics
Image from http://parse.ele.tue.nl/cluster/2/CNNArchitecture.jpg
What is a Convolutional Neural Network?
What is convolution?
◦ It is a specialized linear operation.
◦ A 2D convolution is shown on the right. (Images From: community.arm.com)
◦ Strictly speaking, it’s cross-correlation.
◦ In CNNs, all convolution operations are actually cross-correlation.
Convolutional neural networks are neural networks that use convolution in place of general
matrix multiplication in at least one of their layers. They are very powerful in processing data
with grid-like topology. [1]
[1] Ian Goodfellow, Yoshua Bengio, Aaron Courville , Deep Learning
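To make the distinction concrete, here is a minimal NumPy sketch (our illustration, not from the slides) of the 2D “convolution” as CNN layers actually compute it, i.e. cross-correlation without kernel flipping:

import numpy as np

def cross_correlate_2d(image, kernel):
    # Slide the kernel over the image WITHOUT flipping it; this is the
    # operation CNN layers actually perform.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A mathematically strict convolution would flip the kernel first:
# cross_correlate_2d(image, np.flip(kernel))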
Comparison with MLP
In the last lecture, we got to know the MLP (multi-layer perceptron), where the operation from one
layer to the neurons in the upper layer is a matrix multiplication controlled by weights and biases.
In CNNs, where do those “Neurons” go?
◦ Each neuron is one element of the matrix produced by the convolution
◦ The weights are shared across all neurons in a feature map (see the sketch below)
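As a concrete illustration of weight sharing (the numbers are ours, not from the slides), compare the weight counts of a fully connected layer and a convolutional layer producing the same 24x24 output from a 28x28 input:

# Fully connected: every input pixel connects to every output neuron.
dense_weights = (28 * 28) * (24 * 24)   # 451,584 weights
# A single 5x5 convolution kernel yields the same 24x24 output,
# but all output neurons share the same 25 weights.
conv_weights = 5 * 5                    # 25 weights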
Comparison with MLP
Local Connections
(Figure: panels A, B, and C contrasting local connections with full connections.)
In A, with a convolution kernel of size 3, the activated neurons are affected only by local neurons, unlike in B,
where the connections are full; however, with depth, the receptive field can expand and gain global connections
to neurons in lower layers (see the sketch below).
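A quick illustration (ours, not from the slides) of how the receptive field of stacked stride-1 convolutions grows with depth:

# For stacked stride-1 convolutions with kernel size k, each extra layer
# widens the receptive field by (k - 1): rf = 1 + depth * (k - 1).
k = 3
for depth in range(1, 6):
    print("depth", depth, "-> receptive field", 1 + depth * (k - 1))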
Why Do We Need Convolutional Neural Networks?
Many challenges we could not deal with in the past can now be handled with CNNs. :D
Many things we could already do, we can now do better with CNNs!
CNNs represent the current state-of-the-art technique in classification, object detection, etc.
Now, let’s take a brief look at these achievements…
MNIST Hand-Written Digit Recognition
The MNIST database of handwritten digits
◦ Has a training set of 60,000 examples,
◦ Has a test set of 10,000 examples,
◦ Is a subset of a larger set available from NIST (National Institute of Standards and Technology)
◦ The digits have been size-normalized (28x28) and centered in a fixed-size image.
http://simonwinder.com/2015/07/training-neural-nets-on-mnist-digits/
MNIST Classification Records [1]
Classifier                 | Preprocessing                    | Best Test Error Rate (%)
Linear Classifiers         | deskewing                        | 7.6
K-Nearest Neighbours       | shape-context feature extraction | 0.63
Boosted Stumps             | Haar features                    | 0.87
Non-linear Classifiers     | none                             | 3.3
SVMs                       | deskewing                        | 0.56
Neural Nets                | none                             | 0.35
Convolutional Neural Nets  | width normalization              | 0.23
[1] http://yann.lecun.com/exdb/mnist/
The ImageNet Challenge[1][2]
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object
category classification and detection on hundreds of object categories and millions of images.
◦ The ILSVRC challenge has been run annually since 2010, following in the footsteps of the PASCAL VOC
challenge, which was established in 2005.
◦ ILSVRC 2010: 1,461,406 images and 1,000 object classes.
◦ Images are annotated, and annotations fall into one of two categories:
◦ (1) image-level annotation of a binary label for the presence or absence of an object class in the image;
◦ (2) object-level annotation of a tight bounding box and class label around an object instance in the image.
◦ ILSVRC 2017 was the last ILSVRC challenge.
◦ Over the years, several convolutional neural network architectures won first place:
◦ AlexNet 2012
◦ InceptionNet 2014
◦ Deep Residual Network 2015
[1] http://image-net.org/challenges/LSVRC/2017/
[2] Olga Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge
Technology Behind PRISMA [1]
Deep Convolutional Neural Networks
(a) Separate the content and style of an image
(b) Recombine the content of one image with
the style of another image
[1] Leon A. Gatys et al, A Neural Algorithm of Artistic Style
Boosting Technologies for CNNs
The first CNN prototypes appeared much earlier, so why have CNNs become super-hot only in recent
years?
◦ Huge amount of data and advanced storage/memory systems
◦ GPU acceleration which is super fast in convolution operations (Nvidia GPU Tesla K40 1.4 TFlops)
◦ Deep neural network structures
◦ Optimization methods for training deep CNNs, like stochastic gradient descent, have been invented
◦ Off-the-shelf software package solutions are available and easy to use
◦ Progress in both hardware and software makes CNNs the ONE!
Section II. More Details [1]
http://www.ritchieng.com/machine-learning/deep-learning/convs/
[1] Slides in Section II adapted from slides presented by Tugce Tasci and Kyunghee Kim
Overlapping Pooling
Pooling summarizes the outputs of neighbouring groups of neurons in the same kernel map.
Two important parameters
◦ Kernel size: z
◦ Stride size: s
◦ If s < z, then the max-pooling is overlapped
In the experiment, overlapped pooling with s = 2, z = 3 reduces the top-1 and top-5 error rates by 0.4%
and 0.3%, respectively, compared with the non-overlapping case s = 2, z = 2.
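A minimal TensorFlow 1.x sketch (ours, mirroring the AlexNet setting above; the input shape is an arbitrary placeholder) of an overlapped max-pooling op with z = 3 and s = 2:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 64])
# Kernel z = 3, stride s = 2: since s < z, the pooling windows overlap.
pooled = tf.nn.max_pool(x, ksize=[1, 3, 3, 1],
                        strides=[1, 2, 2, 1], padding='VALID')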
Training the CNNs: Optimization Techniques
Back-propagation
◦ The sparse connections of CNNs decrease the complexity of back-propagation
◦ The ReLU activation function alleviates the vanishing gradient problem
Stochastic Gradient Descent
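A minimal NumPy sketch of the stochastic gradient descent update (our illustration; grad_fn is a hypothetical function returning the mini-batch gradient):

import numpy as np

def sgd(weights, grad_fn, data, lr=0.01, batch_size=32, epochs=1):
    # Shuffling and sampling mini-batches is the "stochastic" part.
    for _ in range(epochs):
        np.random.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            weights = weights - lr * grad_fn(weights, batch)
    return weights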
Section III. CNNs with TensorFlow and TFLearn
Images from Peter Goldsborough, A Tour of TensorFlow
TensorFlow
TensorFlow is an open-source library for numerical computation using data flow graphs
◦ Developed by the Google Brain Team and Google’s Machine Intelligence research organization
Implementing ML in TensorFlow
◦ In TensorFlow, computations are represented using graphs
◦ Each node is an operation (OP)
◦ Data is represented as Tensors
◦ OP takes Tensors and returns Tensors
TensorFlow demo examples adapted from Jesus Fernandez Bes, “Introduction to convolutional Networks using Tensorflow”
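A minimal sketch of these ideas, assuming the TensorFlow 1.x graph-and-session API that the slides use:

import tensorflow as tf

# Build the graph: each node is an OP, and data flows between them as Tensors.
a = tf.placeholder(tf.float32, shape=[2, 2])
b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
c = tf.matmul(a, b)  # an OP that takes Tensors and returns a Tensor

# Nothing is computed until the graph is run in a session.
with tf.Session() as sess:
    result = sess.run(c, feed_dict={a: [[2.0, 3.0], [4.0, 5.0]]})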
Construction of the Computational Graph
Follow the three-step pattern (a minimal sketch follows the list):
◦ 1. inference() – Builds the graph as far as is required for running the network forward to make
predictions
◦ 2. loss() – Adds to the inference graph the ops required to generate loss
◦ 3. training() – Adds to the loss graph the ops required to compute and apply gradients
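A minimal sketch of the three functions, assuming TensorFlow 1.x and a simple softmax classifier (the layer sizes are our choice, not necessarily the demo’s):

import tensorflow as tf

def inference(images):
    # Build the graph as far as needed to run the network forward.
    hidden = tf.layers.dense(images, 128, activation=tf.nn.relu)
    return tf.layers.dense(hidden, 10)  # logits

def loss(logits, labels):
    # Add the ops required to generate the loss on top of the inference graph.
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits))

def training(loss_op, learning_rate=0.01):
    # Add the ops required to compute and apply gradients.
    return tf.train.GradientDescentOptimizer(learning_rate).minimize(loss_op)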
Load the training data, using MNIST
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)  # downloads MNIST on first run
TFLearn
TFLearn is an abstraction library built on top of TensorFlow that provides high-level building
blocks to quickly construct TensorFlow graphs.
◦ Highly modular interface
◦ Allows rapid chaining of neural network layers, regularization functions, optimizers and other elements
◦ Can be used in hybrid fashion with plain TensorFlow code
In the following part, let’s implement the previous CNN model with TFLearn and see how much
easier life is now!
TFLearn Website http://tflearn.org/
Redo the Same Thing with TFLearn
Import the packages
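A sketch of a LeNet-style MNIST CNN in TFLearn (the exact layer sizes and hyperparameters are our assumptions, not necessarily those of the demo):

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

# Each call chains one more layer onto the network.
net = input_data(shape=[None, 28, 28, 1])
net = conv_2d(net, 32, 5, activation='relu')
net = max_pool_2d(net, 2)
net = conv_2d(net, 64, 5, activation='relu')
net = max_pool_2d(net, 2)
net = fully_connected(net, 1024, activation='relu')
net = dropout(net, 0.5)
net = fully_connected(net, 10, activation='softmax')
net = regression(net, optimizer='adam',
                 loss='categorical_crossentropy', learning_rate=0.001)

model = tflearn.DNN(net)
# model.fit(X, Y, n_epoch=10, validation_set=(testX, testY), show_metric=True)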
Conclusion
Pros:
◦ Deep Convolutional Neural Networks represent the current state-of-the-art techniques in image
classification, object detection and localization
◦ Powerful CNN models such as AlexNet, InceptionNet and Deep Residual Networks are available
◦ Open-source libraries make deploying applications with CNNs very fast
◦ Convolutional Neural Networks can share pre-trained weights, which is the basis for transfer learning
Cons:
◦ The interpretation and mechanisms of CNNs are not well understood; we do not know why they work better
than previous models
◦ Large amounts of training data and annotations are needed, which may not be practical for some
problems.