2. INTRODUCTION
• Vehicle detection supports applications such as surveillance, traffic management, and
rescue tasks.
• It helps prevent traffic jams and congestion, which in turn reduces
air and noise pollution.
• Surveillance data supports making the right decisions.
3. CHALLENGES
• Small size of the vehicles
• Different types and orientations
• Similarity in visual appearance of vehicles and some other objects
• Detection time in very high-resolution images
5. FIXED-GROUND SENSORS
• Information is collected efficiently using different types of fixed ground sensors,
such as stationary cameras, radar sensors, bridge sensors, and induction loops.
• These give only a partial overview of vehicle density, parking-lot occupancy, and
traffic flow.
• Used for road-network monitoring and planning, traffic statistics, and
optimization.
6. IMAGE-BASED SENSORS
• Two sources: satellites and airplanes or unmanned aerial vehicles
(UAV).
• Give an overall view of the traffic situation in the area of interest.
• Satellites provide images with sub-meter spatial resolution.
• Aerial images provide a higher spatial resolution of 0.1 to 0.5 m.
• Easier data acquisition, low cost, fast acquisition of images, and
environment-friendliness.
• Vehicle detection is posed as a supervised learning problem and solved with a
convolutional neural network (CNN).
7. CONVOLUTIONAL NEURAL NETWORK
• Deep, feed-forward artificial neural networks
• In traditional algorithms, features were hand-engineered.
• Independence from prior knowledge and human effort in feature design
• Consists of an input and an output layer, as well as multiple hidden layers.
• The hidden layers consist of convolutional layers, pooling layers, fully
connected layers and normalization layers
9. CONVOLUTIONAL LAYER
• Core building block of a CNN.
• Consists of a set of learnable filters (kernels).
• The network learns filters that activate when it detects some specific type
of feature at some spatial position in the input.
• Stacking the activation maps for all filters along the depth dimension
forms the full output volume of the convolution layer.
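A minimal sketch of this idea, written in PyTorch (the framework is an assumption, not part of the slides): a single convolutional layer whose learnable filters each produce one activation map, and the maps are stacked along the depth dimension.

```python
# Illustrative sketch (PyTorch assumed): one convolutional layer.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, padding=1)

x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
out = conv(x)                   # 12 learnable filters -> 12 activation maps
print(out.shape)                # torch.Size([1, 12, 32, 32]) - maps stacked along depth
```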
12. • The value of each filter is learned during the training process.
• Convolutions help extract more meaningful information from images.
• By stacking convolutional layers on top of each other, a CNN extracts progressively
more abstract and in-depth information (see the sketch below).
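A small sketch of stacking convolutions (PyTorch assumed; the layer sizes are illustrative): two 3x3 layers give each output value a 5x5 receptive field on the input, which is why deeper stacks capture more abstract information.

```python
# Illustrative sketch (PyTorch assumed): stacked 3x3 convolutions.
import torch
import torch.nn as nn

stack = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),   # sees 3x3 of the input
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # sees 5x5 of the input
)

x = torch.randn(1, 3, 64, 64)
print(stack(x).shape)   # torch.Size([1, 32, 64, 64])
```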
14. CONV2D
• Most common type of convolution layer
• Filters extend through all three channels of an image (Red, Green, and Blue).
• After the convolutions are performed individually for each channel, the results
are added up to get the final convolved output.
• The output of a filter after a convolution operation is called a feature map
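A minimal sketch of this per-channel behaviour (PyTorch assumed; the filter weights are random placeholders): each filter holds one kernel per input channel, and conv2d sums the three per-channel convolutions into a single feature map.

```python
# Illustrative sketch (PyTorch assumed): 2-D convolution over an RGB image.
import torch
import torch.nn.functional as F

image = torch.randn(1, 3, 32, 32)   # RGB input: 3 channels
weight = torch.randn(8, 3, 3, 3)    # 8 filters, each 3 channels deep (3x3 kernels)

feature_maps = F.conv2d(image, weight, padding=1)  # per-channel convs summed per filter
print(feature_maps.shape)           # torch.Size([1, 8, 32, 32]) -> 8 feature maps
```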
16. • Each filter in this layer is randomly initialized from some distribution
(e.g., a Gaussian distribution).
• By having different initialization criteria, each filter gets trained slightly
differently.
• Random initialization ensures that each filter learns to identify different
features.
• Each successive layer can have two to four times the number of filters in
the previous layer. This helps the network learn hierarchical features.
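A minimal sketch of this design (PyTorch assumed; the channel counts are illustrative): each convolutional layer has twice as many filters as the previous one, and the default initialization draws each filter's weights randomly, so every filter starts out, and ends up, slightly different.

```python
# Illustrative sketch (PyTorch assumed): doubling filter counts per layer.
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),  nn.ReLU(), nn.MaxPool2d(2),   # 16 filters
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32 filters
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 filters
)
# Each nn.Conv2d weight tensor is randomly initialized, so the filters
# diverge during training and learn to respond to different features.
```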
17. RELU
• ReLU is the abbreviation of Rectified Linear Unit.
• This layer applies the non-saturating activation function f(x) = max(0, x).
• It increases the nonlinear properties of the
decision function and of the overall network
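A one-line check of f(x) = max(0, x) (PyTorch assumed): negative inputs are zeroed, positive inputs pass through unchanged.

```python
# Illustrative sketch (PyTorch assumed): element-wise ReLU.
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(nn.ReLU()(x))   # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])
```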
19. POOLING
• Non-linear down-sampling.
• Max Pooling, Average Pooling, Sum Pooling
• Max pooling partitions the input image into a set of non-overlapping rectangles and,
for each such sub-region, outputs the maximum.
• Exact location of a feature is less important than its rough location relative
to other features.
20. • Reduce the spatial size of the
representation,
• Reduce the number of parameters
and amount of computation in the
network
• A pooling layer is commonly inserted between successive convolutional layers in a
CNN architecture (see the sketch below).
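A minimal sketch of max pooling (PyTorch assumed): 2x2 non-overlapping windows keep only their maximum, halving the spatial size and the downstream computation.

```python
# Illustrative sketch (PyTorch assumed): 2x2 max pooling.
import torch
import torch.nn as nn

x = torch.arange(16.0).reshape(1, 1, 4, 4)    # 4x4 single-channel input
pooled = nn.MaxPool2d(kernel_size=2)(x)       # non-overlapping 2x2 windows
print(pooled.shape)                           # torch.Size([1, 1, 2, 2])
print(pooled)                                 # each value is the max of one 2x2 block
```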
22. FULLY CONNECTED LAYER
• Finally, after several convolutional and max pooling layers, the high-level
reasoning in the neural network is done via fully connected layers.
• Neurons in a fully connected layer have connections to all activations in
the previous layer, as seen in regular neural networks.
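A minimal sketch of the fully connected stage (PyTorch assumed; the sizes are illustrative): the feature volume is flattened and every neuron of the linear layers connects to all activations of the previous layer.

```python
# Illustrative sketch (PyTorch assumed): fully connected classification head.
import torch
import torch.nn as nn

features = torch.randn(1, 12, 16, 16)          # e.g. output of the last pooling layer
classifier = nn.Sequential(
    nn.Flatten(),                              # 12*16*16 = 3072 activations
    nn.Linear(12 * 16 * 16, 128), nn.ReLU(),
    nn.Linear(128, 10),                        # 10 class scores
)
print(classifier(features).shape)              # torch.Size([1, 10])
```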
24. CNN SUMMARY
• INPUT will hold the raw pixel values of the image,
Ex: An image of width 32, height 32, and with three color channels R,G,B.
• CONV layer will compute the output of neurons that are connected to local
regions in the input,
This results in a volume such as [32x32x12] if we decide to use 12 filters.
• RELU layer will apply an elementwise activation function, such as the max(0,x)
thresholding at zero.
This leaves the size of the volume unchanged ([32x32x12]).
25. • POOL layer will perform a downsampling operation along the
spatial dimensions (width, height),
resulting in volume such as [16x16x12].
• FC (i.e. fully connected) layer will compute the class scores,
resulting in a volume of size [1x1x10], where each of the 10
numbers corresponds to a class score.
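The whole summary can be expressed as a tiny model (PyTorch assumed) that reproduces the volumes above: [32x32x3] -> CONV (12 filters) [32x32x12] -> RELU [32x32x12] -> POOL [16x16x12] -> FC [1x1x10].

```python
# Illustrative sketch (PyTorch assumed): INPUT -> CONV -> RELU -> POOL -> FC.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 12, kernel_size=3, padding=1),  # [32x32x12]
    nn.ReLU(),                                   # [32x32x12]
    nn.MaxPool2d(2),                             # [16x16x12]
    nn.Flatten(),
    nn.Linear(12 * 16 * 16, 10),                 # [1x1x10] class scores
)

x = torch.randn(1, 3, 32, 32)                    # 32x32 RGB input
print(model(x).shape)                            # torch.Size([1, 10])
```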
27. VGG16
• Neural network that performed very well in the ImageNet Large Scale
Visual Recognition Challenge (ILSVRC) in 2014.
• Scored first place on the image localization task and second place on the
image classification task.
• It can classify an image into any one of 1000 object classes.
• It takes an input image of size 224 x 224 x 3 (RGB image).
• It has 16 weight layers in total.
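A minimal sketch of using VGG16 (PyTorch/torchvision assumed, with the torchvision >= 0.13 weights API): the pretrained model takes a 224 x 224 x 3 input and returns scores for the 1000 ImageNet classes.

```python
# Illustrative sketch (torchvision assumed): pretrained VGG16 inference.
import torch
from torchvision import models

vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
vgg16.eval()

x = torch.randn(1, 3, 224, 224)        # one 224x224 RGB image
with torch.no_grad():
    scores = vgg16(x)
print(scores.shape)                    # torch.Size([1, 1000]) class scores
```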
30. FULLY CONVOLUTIONAL REGRESSION
NETWORK
• Used to solve the vehicle detection and counting problem.
• FCRN has two paths: a down-sampling path and an up-sampling path.
• The down-sampling path is the pre-trained VGG-16 network.
• It consists of repeated padded 3 x 3 convolutions, each followed by a rectified linear
unit (ReLU), and a max pooling operation.
• Only the layers up to 'conv5' of the VGG-16 network are used.
31. • The up-sampling path applies de-convolution (transposed convolution) layers.
• Batch normalization is used for fast convergence.
• The input is an image and the output is a density map.
• This enables accurate vehicle detection and localization (see the sketch below).
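A rough sketch of an FCRN-style model (PyTorch assumed; the up-sampling channel counts are illustrative, not the authors' exact configuration): VGG-16 convolutional layers form the down-sampling path, and transposed convolutions with batch normalization bring the features back to input resolution as a single-channel density map.

```python
# Illustrative FCRN-style sketch (PyTorch assumed): VGG-16 down path + deconv up path.
import torch
import torch.nn as nn
from torchvision import models

class FCRN(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.down = vgg.features                 # pre-trained VGG-16 conv layers
        up = []
        in_ch = 512
        for out_ch in (256, 128, 64, 32, 16):    # five 2x up-sampling steps
            up += [nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2),
                   nn.BatchNorm2d(out_ch), nn.ReLU()]
            in_ch = out_ch
        up.append(nn.Conv2d(in_ch, 1, kernel_size=1))  # single-channel density map
        self.up = nn.Sequential(*up)

    def forward(self, x):
        return self.up(self.down(x))

x = torch.randn(1, 3, 224, 224)
print(FCRN()(x).shape)                           # torch.Size([1, 1, 224, 224])
```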
33. SOLUTION
• Using a CNN.
• Learn a mapping function between an image I(x) and a density map D(x):
F : I(x) → D(x), where I ∈ R^(m×n) and D ∈ R^(m×n).
35. GROUND TRUTH PREPARATION
• a, b, and c are the elements of the positive-definite matrix
| a  b |
| b  c |
used for generating the rotated ground truth.
• x and y are inferred from the width and height of the vehicle, and θ is the
orientation of the vehicle.
37. TRAINING THE NETWORK
• During training, an input image and its corresponding ground truth are
given to the FCRN
• The network is trained to minimize the error between the ground truth and the
predicted output.
• During inference, the output of the trained model undergoes an empirical
thresholding.
• A simple connected-component algorithm is then used to return the count and
the locations of the detected vehicles (see the sketch below).
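A minimal sketch of that post-processing (NumPy/SciPy assumed; the threshold value and function name are placeholders): the density map is thresholded and connected-component labelling returns the count and one location per detected vehicle.

```python
# Illustrative sketch (SciPy assumed): threshold + connected components.
import numpy as np
from scipy import ndimage

def count_vehicles(density_map: np.ndarray, threshold: float = 0.5):
    mask = density_map > threshold                    # empirical thresholding
    labels, count = ndimage.label(mask)               # connected components
    centers = ndimage.center_of_mass(mask, labels, range(1, count + 1))
    return count, centers                             # count and (row, col) locations
```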
38. TRAINING THE NETWORK
• During the training phase, 224x224 random patches were selected from the
aerial images.
• The selected patch contains at least one vehicle.
• Thus, patches with no vehicles were not chosen during training.
• To increase the number of training examples, data augmentation
techniques were utilized (see the sketch below).
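A minimal sketch of this sampling and augmentation (NumPy assumed; the annotation format and helper names are hypothetical): a 224x224 patch is drawn at random, kept only if it contains at least one vehicle, and then flipped/rotated.

```python
# Illustrative sketch (NumPy assumed): vehicle-containing patch sampling + augmentation.
import numpy as np

def sample_patch(image, centers, size=224, max_tries=100):
    """image: HxWx3 array; centers: list of (row, col) vehicle centers."""
    h, w = image.shape[:2]
    for _ in range(max_tries):
        top = np.random.randint(0, h - size + 1)
        left = np.random.randint(0, w - size + 1)
        inside = [(r, c) for r, c in centers
                  if top <= r < top + size and left <= c < left + size]
        if inside:                                   # keep only patches with vehicles
            return image[top:top + size, left:left + size], inside
    return None, []

def augment(patch):
    """Simple data augmentation: random flip and 90-degree rotation."""
    if np.random.rand() < 0.5:
        patch = np.fliplr(patch)
    return np.rot90(patch, k=np.random.randint(4))
```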
40. MEAN SQUARE ERROR FUNCTION
• X is the input patch set with M samples, Φ denotes all trainable parameters,
• Y_P is the predicted density map, and Y_T is the ground truth annotation.
• In its standard form, the loss is L(Φ) = (1/M) Σ_{i=1..M} ||Y_P(X_i; Φ) − Y_T(X_i)||².
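A minimal sketch of this loss (PyTorch assumed): the mean squared error between the predicted and ground-truth density maps, averaged over a batch of M patches.

```python
# Illustrative sketch (PyTorch assumed): MSE between density maps.
import torch
import torch.nn as nn

mse = nn.MSELoss()                        # mean over all elements and samples

y_pred = torch.randn(8, 1, 224, 224)      # M = 8 predicted density maps
y_true = torch.randn(8, 1, 224, 224)      # corresponding ground truth
print(mse(y_pred, y_true).item())
```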
42. DATASET
• DLR Munich vehicle dataset provided by Remote Sensing Technology Institute of
the German Aerospace Center and Overhead Imagery Research Data Set (OIRDS)
dataset
• Munich dataset contains 20 images (5616 x 3744 pixels) taken by DLR 3K camera
system at a height of 1000 m above the ground over the area of Munich, Germany.
• This dataset contains 3418 cars and 54 trucks annotated in the training image set,
and 5799 cars and 93 trucks annotated in the testing image set.
• OIRDS dataset contains 907 aerial images with approximately 1800 annotated
vehicles
43. Examples of aerial images in Munich dataset (first row) and OIRDS dataset (second
row).
44. Munich dataset. Green represents true positive cases, yellow represents false negative
cases, and red represents false positive cases.
45. OIRDS dataset. Green represents true positive cases, yellow represents false negative
cases, and red represents false positive cases.
46. Fig: The input patch, the ground truth, the predicted density map, and the result of applying
thresholding and the connected-component algorithm; all vehicles are found successfully.