CNN and its applications by ketaki

PRESENTED BY
PATWARI KETAKI J.
2017MNS010
UNDER THE GUIDANCE OF
DR. MRS. WAGHMARE J. M.

Contents
 Introduction
 ConvNet Layers
 Activation functions
 Applications of CNN
 Image classification
 Traffic sign recognition
 Vehicle detection
 Some other interesting applications
 Conclusion
 References

Introduction
 What is ANN?
 An artificial neural network (ANN) is a system of interconnected artificial “neurons” that exchange
messages with each other.

Fig 2. Training of a neural network
Fig 3. Illustration of a neuron and its mathematical model

 What is a convolutional neural network?
 A CNN is a special case of the ANN. A CNN consists of one or more convolutional
layers, often with a sub-sampling layer, which are followed by one or more fully
connected layers as in a standard neural network.
 Convolution layers play the role of feature extractor.
Fig 4. Typical block diagram of a CNN

ConvNet Layers
1) Convolution layers
 The convolution operation extracts different features of the input. The first
convolution layer extracts low-level features like edges, lines, and corners.
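
A minimal sketch of the convolution operation described above, in plain Python with no padding and a stride of 1. The 4x4 image and the 2x2 vertical-edge kernel are hypothetical values chosen for illustration; in a trained CNN the kernel weights are learned from data. Like most deep learning libraries, the sketch slides the kernel without flipping it (strictly speaking, cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 4x4 image: dark left half (0), bright right half (1).
image = [[0, 0, 1, 1] for _ in range(4)]

# Hypothetical hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the column where the window straddles the dark-to-bright boundary, which is the sense in which a first-layer filter "extracts" an edge.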

2) Pooling Layer/subsampling layer

3) Non-linear layers
 Neural networks in general and CNNs in particular rely on a non-linear “trigger”
function to signal distinct identification of likely features on each hidden layer.
CNNs may use a variety of specific functions such as rectified linear units (ReLUs)
and continuous trigger (non-linear) functions to efficiently implement this non-linear
triggering.
 ReLU (Rectified Linear Unit)
implements the function y = max(x, 0).

 Continuous trigger (non-linear) function
 The non-linear layer operates element by element on each feature map. A
continuous trigger function can be the hyperbolic tangent (Figure I), the absolute value of the
hyperbolic tangent (Figure II), or the sigmoid (Figure III).
Figure I Figure II Figure III
Figure IV : tanh processing

Activation Functions
 Activation functions are non-linearities that take a single number and perform a fixed
mathematical operation on it.
1. Sigmoid:
 This non-linearity takes a real-valued number as input and outputs a value in the
range between 0 and 1.
2. Tanh:
 Tanh can be considered a scaled and shifted version of the sigmoid, outputting values in the
range between -1 and 1.
tanh(x) = 2σ(2x) − 1, where σ is the sigmoid function
3. Leaky ReLU:
 A slight modification of ReLU gives the Leaky ReLU: y = x for x > 0 and y = αx otherwise,
where α is a small positive constant.
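
The activation functions listed above can be sketched in a few lines of Python. This is an illustrative sketch only: the Leaky ReLU slope `alpha=0.01` is a commonly used default assumed here, not a value given in the slides. The loop at the end checks numerically that tanh equals twice the sigmoid of 2x, minus 1.

```python
import math


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x):
    """tanh written as a scaled and shifted sigmoid."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


def relu(x):
    """Rectified linear unit: y = max(x, 0)."""
    return max(x, 0.0)


def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negative inputs keep a small slope (alpha is an assumed value)."""
    return x if x > 0 else alpha * x


# Numerically confirm the tanh/sigmoid relationship at a few sample points.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(tanh_via_sigmoid(x) - math.tanh(x)) < 1e-12
```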

Applications of CNN
 The main categories are as follows:
1. image and pattern recognition
2. speech recognition
3. natural language processing
4. video analysis

Image classification
 AdaBoost can effectively control the number of weak classifiers, which greatly
increases execution efficiency through integration; it has therefore also been applied
to real-time object testing.
 In a CNN, weight sharing effectively reduces parameter redundancy, which results in greater
generalization and higher training efficiency.
 The paper first introduces the basics of convolution,
pooling, weight decay, and dropout in the convolutional neural network, as well as the
theory of ensemble learning.
 Building on the convolutional neural network and the learning principles of AdaBoost, the
paper proposes the Boost Convolutional Neural Network (BoostCNN) and
evaluates it on image classification with the CIFAR-10 benchmark dataset.

 BOOSTING ENSEMBLE LEARNING
 As shown in the figure below, ensemble learning completes a learning task by constructing and
combining several learners: a group of individual learners is trained, and a strategy is
then adopted to combine them.
 Boosting Algorithm
 Boosting is an algorithmic concept that combines weak classifiers to form a strong
classifier with high classification performance.
 Process of boosting classification:
1. Train the first weak classifier (h1) on the initial training dataset.
2. Combine the data misclassified by weak classifier (h1) with new data to form
the training data for a new round, and train the second weak classifier (h2).
3. Combine the data misclassified by weak classifiers (h1) and (h2)
with new data to form the training data for a new round, and train the third weak
classifier (h3).
4. Repeat Steps 2-3 until an adequate number of weak classifiers has been obtained.
5. Form a strong classifier through the weighted voting of all weak classifiers.
 ADABOOST ALGORITHM (Adaptive Boosting)
As its name suggests, AdaBoost is adaptive: each new weak classifier is adjusted according to
the classifiers already trained. It is sensitive to noisy data and outliers, but in some
tasks it resists overfitting well.
The concept of AdaBoost can be described as follows:
 Given the training dataset T = {(x1, y1), (x2, y2), . . . , (xN, yN)}, where xi ∈ X ⊆ R^n
and yi ∈ {−1, +1} is the class label.
 AdaBoost aims to learn a series of weak classifiers from the training data and
then combine these weak classifiers into a strong classifier. Its
computational process can be described as follows:
1. Initialize the weight distribution of the training data; each training sample is given the
same initial weight 1/N, i.e.
$D_1 = (w_{11}, w_{12}, \ldots, w_{1i}, \ldots, w_{1N}), \quad w_{1i} = 1/N, \quad i = 1, 2, \ldots, N$
2. For m = 1, 2, . . . , M, use the training data with the weight distribution
Dm to learn a basic binary classifier:
$G_m(x) : x \to \{-1, +1\}$
3. The classification error rate of Gm(x) on the training dataset is:
$e_m = \sum_{i=1}^{N} w_{mi} \, I(G_m(x_i) \neq y_i)$
4. The coefficient αm of Gm(x), which is the voting weight of
Gm(x) in the final classifier, is calculated:
$\alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m}$
5. The weight distribution of the training dataset is updated:
$D_{m+1} = (w_{m+1,1}, \ldots, w_{m+1,N}), \quad w_{m+1,i} = \frac{w_{mi}}{Z_m} \exp(-\alpha_m y_i G_m(x_i))$
where $Z_m = \sum_{i=1}^{N} w_{mi} \exp(-\alpha_m y_i G_m(x_i))$ is a normalization factor.
6. The linear combination of basic classifiers is established:
$f(x) = \sum_{m=1}^{M} \alpha_m G_m(x)$
The final classifier is then obtained as:
$G(x) = \operatorname{sign}(f(x))$
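
The AdaBoost procedure above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the BoostCNN implementation from the paper: the weak classifiers here are one-dimensional threshold "stumps" rather than Softmax classifiers, and the ten-point dataset is a hypothetical example. The helper names (`train_stump`, `stump_predict`, `adaboost`, `predict`) are invented for this sketch.

```python
import math


def stump_predict(x, thr, polarity):
    """Weak classifier: `polarity` if x < thr, otherwise -polarity."""
    return polarity if x < thr else -polarity


def train_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, thr, polarity) != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n                       # step 1: uniform weights
    ensemble = []
    for _ in range(rounds):                       # step 2: for m = 1..M
        err, thr, pol = train_stump(xs, ys, weights)   # step 3: weighted error
        err = min(max(err, 1e-10), 1 - 1e-10)     # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)   # step 4: voting weight
        ensemble.append((alpha, thr, pol))
        # Step 5: raise the weight of misclassified samples, then normalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, thr, pol))
                   for w, x, y in zip(weights, xs, ys)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble


def predict(ensemble, x):
    """Step 6: sign of the weighted vote of all weak classifiers."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy data a single stump misclassifies at least three points, while the weighted vote of three stumps classifies all ten correctly, which illustrates how weak classifiers combine into a stronger one.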

Proposed BoostCNN structure
BoostCNN training procedure.

 SIMULATION EXPERIMENT AND ANALYSIS
 To analyze the performance of the BoostCNN model, the paper uses a four-layer
convolutional network with a Softmax classifier as the feature learning
architecture, and adopts the AdaBoost algorithm to generate several Softmax
classifiers as the output.
 CIFAR-10 DATASET
 The CIFAR-10 dataset is an image dataset of 60,000 color images (32 × 32)
in 10 categories. 50,000 images are used as
training data and 10,000 images as testing data. As shown in Figure
17, the 10 categories are plane, car, bird, cat, deer, dog, frog, horse, ship, and truck,
and each category includes 6,000 images.
Experimental comparison of CIFAR-10 testing datasets.

CONCLUSION
 The study focuses on how to combine a convolutional neural network with AdaBoost
to enhance the image identification performance of the learning algorithms.
 After the convolutional neural network is trained as a deep feature extraction
model and the original images are converted into deep features, AdaBoost
is used for ensemble learning.
 The conclusions of the paper are as follows:
1) After feature extraction by the deep convolutional neural network, the original
image data are fully abstracted, so traditional learning algorithms can also
effectively learn highly complicated image data.
2) In comparative experiments on the CIFAR-10 image dataset, AdaBoost
generally enhances the performance of base learners by 3%.

Traffic sign recognition
 Traffic sign recognition (TSR) is an important feature of advanced driver
assistance systems, contributing to the safety of drivers and vehicles alike.
 Developing TSR systems requires computer vision techniques, which
can be considered fundamental to the field of pattern recognition in general.
Despite all the previous work and research, traffic sign
detection and recognition remains a very challenging problem, especially when
a real-time processing solution is required.
 In the paper, the authors present a comparative and analytical study of two major
approaches to traffic sign detection and recognition.
 The first approach is based on color segmentation and
convolutional neural networks (C-CNN),
 while the second is based on the fast region-based convolutional neural
network approach (Fast R-CNN).

 COLOR SEGMENTATION & CNN BASED APPROACH
 The C-CNN method selects a set of regions of interest (ROIs) by
applying color thresholding to the input image, thus reducing the search space.
 Then, a trained CNN classifies each ROI (whether it contains a traffic sign or
not), followed by another CNN with the same architecture that recognizes
the detected traffic signs.
 Color thresholding can be applied to a given image or frame using the HSV and
grayscale color spaces.
 After color detection, an additional step is required to verify whether the
corresponding ROIs contain a traffic sign, using a CNN model as a
classifier.
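
The color-thresholding step can be illustrated with a small sketch that uses only Python's standard `colorsys` module. It is not the OpenCV pipeline used by the authors: the hue, saturation, and value limits below are assumed values for "sign red", and the helper names `red_mask` and `bounding_box` are hypothetical.

```python
import colorsys


def red_mask(image_rgb, sat_min=0.5, val_min=0.3):
    """Return a binary mask marking pixels whose HSV values look strongly red."""
    mask = []
    for row in image_rgb:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            is_red_hue = h <= 0.05 or h >= 0.95   # red sits at both ends of the hue circle
            keep = is_red_hue and s >= sat_min and v >= val_min
            mask_row.append(1 if keep else 0)
        mask.append(mask_row)
    return mask


def bounding_box(mask):
    """Return (top, left, bottom, right) of all marked pixels, or None if empty."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, m in enumerate(row) if m]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


# Hypothetical 4x4 frame: gray background with a 2x2 red patch in the middle.
GRAY, RED = (128, 128, 128), (200, 20, 20)
frame = [[GRAY] * 4 for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        frame[r][c] = RED

roi = bounding_box(red_mask(frame))
print(roi)
```

The resulting box is the candidate ROI that would then be passed to the CNN classifier; only this small region, rather than the whole frame, needs to be examined.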

 After the RGB color space is converted into the HSV color space, the CNN is applied as the
classification model.
 It is fairly simple and has 7 layers: 3 convolutional layers and 3 max-pooling layers for
feature extraction, and one fully connected layer as a classifier. The CNN was trained
to recognize two classes: traffic sign / no traffic sign. For that, the authors merged two
datasets: the GTSRB dataset and 30,000 random samples of the CIFAR-10 dataset.
 After classifying the extracted ROIs, a second CNN with the same architecture
as the first one is applied to recognize the detected traffic signs in the ROIs.
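
To make the 7-layer layout concrete, the sketch below traces tensor shapes through three convolution / max-pooling pairs and one fully connected layer. The slides give only the layer counts, so every number here is an assumption for illustration: a 32x32x3 input, "same"-padded convolutions with 32, 64, and 128 filters, and 2x2 pooling.

```python
def conv_same(shape, filters):
    """A 'same'-padded convolution keeps height and width, changes channel count."""
    h, w, _ = shape
    return (h, w, filters)


def max_pool(shape, size=2):
    """Non-overlapping max pooling divides height and width by `size`."""
    h, w, c = shape
    return (h // size, w // size, c)


def trace_shapes(input_shape, filter_counts, num_classes):
    """List (layer name, output shape) for conv/pool pairs followed by one FC layer."""
    shapes = [("input", input_shape)]
    shape = input_shape
    for i, filters in enumerate(filter_counts, start=1):
        shape = conv_same(shape, filters)
        shapes.append((f"conv{i}", shape))
        shape = max_pool(shape)
        shapes.append((f"pool{i}", shape))
    flat = shape[0] * shape[1] * shape[2]
    shapes.append(("flatten", (flat,)))
    shapes.append(("fc", (num_classes,)))
    return shapes


# Assumed sizes; two output classes: traffic sign / no traffic sign.
layers = trace_shapes((32, 32, 3), filter_counts=(32, 64, 128), num_classes=2)
for name, shape in layers:
    print(name, shape)
```

Under these assumptions the feature extractor shrinks the image from 32x32 to 4x4 while deepening it to 128 channels, and the single fully connected layer maps the flattened features to the two class scores.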
 FAST R-CNN BASED APPROACH
 Fast R-CNN was proposed to fix the disadvantages of R-CNN [28] and SPPnet,
while improving their speed and accuracy. The Fast R-CNN method has several
advantages:
1. Higher detection quality than R-CNN and SPPnet.
2. Training is single-stage, using a multi-task loss.
3. No disk storage is required for feature caching.
4. Training can update all network layers.

 The Fast R-CNN network takes as input an entire image and a set of object
proposals that are calculated by an external algorithm. The network first processes
the whole image with several convolutional (conv) and max pooling layers to
produce a conv feature map. Then, for each object proposal a region of interest
(ROI) pooling layer extracts a fixed-length feature vector from the feature map.
 Each feature vector is fed into a sequence of fully connected (fc) layers that finally
branch into two sibling output layers:
 One that produces softmax probability estimates over K object classes plus a
catch-all “background” class.
 Another layer that outputs four real-valued numbers for each of the K object
classes. Each set of 4 values encodes refined bounding-box positions for one of
the K classes.
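
The key idea of the ROI pooling layer, turning proposals of different sizes into feature vectors of one fixed length, can be sketched as follows. This is a simplified single-channel illustration with a hypothetical 4x6 feature map and hypothetical proposals, not the Fast R-CNN implementation; the bin boundaries use floor for the start and ceiling for the end so that no bin is empty.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region `roi` = (top, left, bottom, right), inclusive,
    into a fixed out_h x out_w grid."""
    top, left, bottom, right = roi
    roi_h = bottom - top + 1
    roi_w = right - left + 1
    pooled = []
    for i in range(out_h):
        r0 = top + (i * roi_h) // out_h
        r1 = top + -(-((i + 1) * roi_h) // out_h)      # ceiling division
        row = []
        for j in range(out_w):
            c0 = left + (j * roi_w) // out_w
            c1 = left + -(-((j + 1) * roi_w) // out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Hypothetical 4x6 single-channel feature map with values 0..23.
feature_map = [[r * 6 + c for c in range(6)] for r in range(4)]

# Two proposals of different sizes both yield a 2x2 grid (4 values).
pooled_a = roi_max_pool(feature_map, (0, 0, 3, 3), 2, 2)   # 4x4 region
pooled_b = roi_max_pool(feature_map, (1, 2, 3, 5), 2, 2)   # 3x4 region
print(pooled_a, pooled_b)
```

Because every proposal is reduced to the same grid size, the flattened result can be fed to the same fully connected layers regardless of the proposal's original shape.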

 EXPERIMENTAL RESULTS
 The authors used Python for the implementation. OpenCV was used for color segmentation,
and the Keras deep learning API with the TensorFlow backend was used to train the CNN
models in both approaches.
1. C-CNN APPROACH RESULTS
The authors trained two CNNs with the same architecture: the first classifies
the ROIs extracted by color segmentation of the input frames, and the
second recognizes the detected traffic signs. Training took 6 hours on
an Intel Core i7-3612QM (8 GB, 1 TB).

2. FAST R-CNN APPROACH RESULTS
 With the Fast R-CNN approach, the authors achieved 94.8% accuracy on the test set (the
GTSDB). Training took 3 days on a server with a GTX 1080 Ti ROC 11 GB
graphics card.
 Although Fast R-CNN is a very good approach and has shown interesting results
in pattern detection and recognition in general, its performance is limited when the
training dataset is not large and balanced.

 CONCLUSION
 The paper presented an analytical study of two effective and efficient road sign
detection and recognition approaches.
 The experimental results on the German Traffic Sign Detection and Recognition
datasets indicate that Fast R-CNN is much faster than the C-CNN method and
is invariant to illumination changes.
 On the other hand, although the C-CNN approach is slow and sensitive to
weather conditions, it is invariant to scale and viewing angle.

Vehicle detection
 Vehicle detection and counting in aerial images is important for a wide range of
applications such as urban planning and traffic management.
 Many methods have been introduced in the literature for solving this problem, based
either on shallow learning or on deep learning. However,
these methods suffer from relatively low precision and recall.
 The paper introduces an automated vehicle detection and counting system for aerial
images.
 The proposed system uses a convolutional neural network (CNN) to regress a vehicle
spatial density map across the aerial image. It has been evaluated on two publicly
available datasets, namely Munich and OIRDS.
 The experimental results show that the proposed system is efficient and effective,
and achieves higher precision and recall than the compared methods.

 SHALLOW-LEARNING-BASED METHODS
 The general strategy in this group relies on handcrafted feature extraction
followed by a classifier or a cascade of classifiers.
 One work proposed a system for car counting in aerial images captured by a UAV. The
search space was reduced by selecting the regions where cars might exist using a
supervised classifier, and feature points were then extracted using the scale-invariant feature
transform (SIFT).
 A support vector machine (SVM) was then used to discriminate
between cars and all other objects.
 A car detection system with four steps has also been introduced:
1. The system starts by selecting the areas that might contain cars.
2. Then, two sets of histogram of oriented gradients (HOG) features are extracted for
the vertical and horizontal filtering directions.
3. The discrimination between cars and other objects is performed by one
of three suggested techniques: mutual information measure, normalized cross
correlation, or a combination of the correlation measure with SVM classification.
4. An orientation value is associated with the points
classified as cars. Finally, the points that belong to the same car are merged.

 DEEP-LEARNING-BASED METHODS
 Most of the works in this category use a convolutional neural network for
automatic feature extraction. In one work, a deep convolutional neural network with multi-scale
spatial pyramid pooling (SPP) was employed to extract target patterns
of different sizes. However, the input images had to be pre-processed by a maximum
normed gradient algorithm in order to restore the edges of the objects.
 In another deep learning approach, the input image is
segmented into small homogeneous regions. The features in the
segmented regions are then extracted using a pre-trained convolutional neural network
(CNN) with a sliding-window approach.
 Windows are classified into car and no-car classes using a support vector machine (SVM).
Finally, post-processing such as morphological dilation is applied to smooth
the detected regions and fill the holes.

 The proposed system
1. FULLY CONVOLUTIONAL REGRESSION NETWORK(FCRN)
The proposed system. (a) Training phase. (b) Inference Phase.

3. IMPLEMENTATION DETAILS
 The implementation of the proposed architecture is based on TensorFlow.
 During the training phase, random 224x224 patches were selected from the aerial
images. Each selected patch contains at least one vehicle; patches with no
vehicles were not used for training. To increase the number of training
examples, data augmentation techniques such as rotation, horizontal
and vertical flipping, and shifting were applied.
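
The augmentation operations named above can be sketched on a tiny patch stored as a list of rows. This is an illustration only, not the authors' TensorFlow pipeline: the 2x2 patch, the 90-degree rotation step, and the zero fill used when shifting are assumptions. For density-map regression, the same transform would also have to be applied to the ground-truth density map so that the two stay aligned.

```python
def flip_horizontal(patch):
    """Mirror each row left to right."""
    return [row[::-1] for row in patch]


def flip_vertical(patch):
    """Reverse the order of the rows."""
    return [row[:] for row in patch[::-1]]


def rotate90(patch):
    """Rotate the patch 90 degrees clockwise."""
    return [list(col) for col in zip(*patch[::-1])]


def shift_right(patch, k, fill=0):
    """Shift every row k pixels to the right, padding with `fill` (assumed value)."""
    return [[fill] * k + row[:len(row) - k] for row in patch]


def augment(patch):
    """Return the original patch plus four transformed copies."""
    return [patch,
            flip_horizontal(patch),
            flip_vertical(patch),
            rotate90(patch),
            shift_right(patch, 1)]


# Hypothetical 2x2 patch, small enough to check by eye.
patch = [[1, 2],
         [3, 4]]
for variant in augment(patch):
    print(variant)
```

Each source patch yields several training examples, which is how augmentation increases the effective size of a small aerial dataset.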
 The mean square error target function is used, as follows:
$L(\Theta) = \frac{1}{M} \sum_{i=1}^{M} \| Y_P(X_i; \Theta) - Y_T(X_i) \|^2$
where X is the input patch with M samples, Θ denotes all trainable parameters,
YP is the predicted density map, and YT is the ground truth annotation.
 RMSprop optimizer has been used for updating the parameter values.

 DATASETS DESCRIPTION
 The proposed system has been evaluated on two public datasets: the DLR
Munich vehicle dataset, provided by the Remote Sensing Technology Institute of the
German Aerospace Center, and the Overhead Imagery Research Data Set (OIRDS).
 The Munich dataset contains 20 images (5616 x 3744 pixels) taken by the DLR 3K camera
system at a height of 1000 m above the ground over the area of Munich, Germany.
It contains 3418 cars and 54 trucks annotated in the training image set and
5799 cars and 93 trucks annotated in the testing image set.
 The OIRDS dataset contains 907 aerial images with approximately 1800
annotated vehicles.

 QUANTITATIVE EVALUATION AND COMPARISON
F1: F1 score
Performance Comparison between the proposed method and the state-of-the-art methods.
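
For readers unfamiliar with the metrics used in the comparison, the sketch below computes precision, recall, and the F1 score from detection counts. The counts in the example are made-up numbers for illustration only and are not results from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return precision, recall, f1


# Hypothetical counts: 90 vehicles found, 10 false alarms, 30 vehicles missed.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a detector cannot score well by maximizing one at the expense of the other.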

 CONCLUSION
 A novel vehicle detection and counting method has been introduced using a
convolutional regression neural network. The proposed system uses a
regression model to predict the density map of the input patches. The
output of the FCRN is then thresholded with an empirically chosen value, which yields a binary image.
 Finally, a simple connected component algorithm is used to find the locations
and count of the blobs that represent the detected vehicles. The
proposed architecture outperforms the state-of-the-art methods.
 The proposed system achieved the highest true positive rate and the lowest false alarm rate.
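
The post-processing described above (threshold the predicted density map, then find connected blobs) can be sketched in plain Python. The density values, the threshold of 0.5, and the use of 4-connectivity are assumptions for illustration; the paper's empirical threshold is not given in the slides, and `find_vehicles` is a hypothetical helper name.

```python
from collections import deque


def find_vehicles(density, threshold):
    """Threshold a density map and return the centroid (row, col) of each blob."""
    h, w = len(density), len(density[0])
    binary = [[1 if density[r][c] > threshold else 0 for c in range(w)]
              for r in range(h)]
    seen = [[False] * w for _ in range(h)]
    centers = []
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first search over 4-connected neighbours collects one blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            cy = sum(y for y, _ in cells) / len(cells)
            cx = sum(x for _, x in cells) / len(cells)
            centers.append((cy, cx))
    return centers


# Hypothetical predicted density map containing two separate blobs.
density = [[0.0, 0.1, 0.0, 0.0, 0.0],
           [0.1, 0.9, 0.8, 0.0, 0.0],
           [0.0, 0.7, 0.0, 0.0, 0.6],
           [0.0, 0.0, 0.0, 0.0, 0.9]]

centers = find_vehicles(density, threshold=0.5)
print(len(centers), centers)
```

The number of centroids is the vehicle count, and the centroids themselves give the detected locations.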

Some other interesting applications
 Mixtures of Lightweight Deep Convolutional Neural Networks: Applied to
Agricultural Robotics
 Aircraft Type Recognition Based on Segmentation With Deep Convolutional Neural
Networks
 Deep Convolutional Neural Networks and Data Augmentation for Environmental
Sound Classification
 Fast Deep Neural Networks with Knowledge Guided Training and Predicted
Regions of Interests for Real-time Video Object Detection

Conclusion
 CNNs achieve state-of-the-art performance in pattern and image recognition problems and even
outperform humans in certain cases. Cadence reports best-in-industry results
using proprietary algorithms and architectures with CNNs.
 This presentation covered the CNN and its components along with its applications, and
examined three IEEE papers in detail, covering three different applications:
image classification, traffic sign recognition, and vehicle detection.
 The previous section lists further interesting applications for the reader to explore.

References
[1] Samer Hijazi, Rishi Kumar, and Chris Rowen, IP Group, Cadence, “Using
Convolutional Neural Networks for Image Recognition”.
[2] Neena Aloysius and Geetha M, “A Review on Deep Convolutional Neural
Networks”, International Conference on Communication and Signal Processing,
April 6-8, 2017, India.
[3] Keiron O’Shea and Ryan Nash, “An Introduction to Convolutional Neural
Networks”.
[4] Kaoutar Sefrioui Boujemaa, Afaf Bouhoute, Karim Boubouh, and Ismail Berrada,
“Traffic sign recognition using convolutional neural networks”, 2017 International
Conference on Wireless Networks and Mobile Communications (WINCOM).
[5] Shin-Jye Lee, Tonglin Chen, Lun Yu, and Chin-Hui Lai, “Image
Classification Based on the Boost Convolutional Neural Network”, Digital Object
Identifier 10.1109/ACCESS.2018.2796722.
[6] Hilal Tayara, Kim Gil Soo, and Kil To Chong, “Vehicle Detection and
Counting in High-Resolution Aerial Images Using Convolutional Regression Neural
Network”, Digital Object Identifier 10.1109/ACCESS.2017.DOI.
