International Research Journal of Computer Science (IRJCS) ISSN: 2393-9842
Issue 03, Volume 6 (March 2019) www.irjcs.com
PEDESTRIAN DETECTION IN LOW RESOLUTION
VIDEOS USING A MULTI-FRAME HOG-BASED
DETECTOR
Dr. Hisham Sager
Colorado School of Mines
hsager@mines.edu
Dr. William Hoff
Colorado School of Mines
whoff@mines.edu
Manuscript History
Number: IRJCS/RS/Vol.06/Issue03/MRCS10090
Received: 03, March 2019
Final Correction: 13, March 2019
Final Accepted: 21, March 2019
Published: March 2019
Citation: Sager & Hoff (2019). PEDESTRIAN DETECTION IN LOW RESOLUTION VIDEOS USING A MULTI-FRAME
HOG-BASED DETECTOR. IRJCS: International Research Journal of Computer Science, Volume VI, 55-71.
DOI: 10.26562/IRJCS.2019.MRCS10090
Editor: Dr.A.Arul L.S, Chief Editor, IRJCS, AM Publications, India
Copyright: ©2019 This is an open access article distributed under the terms of the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original
author and source are credited.
Abstract-- Detecting pedestrians in low resolution videos is a challenging task, due to the small size of pedestrians
in the images and the limited information they provide. In practical outdoor surveillance scenarios the pedestrian size is
usually small. Existing state-of-the-art pedestrian detection methods that use histogram of oriented gradient
(HOG) features have poor performance in this problem domain. To compensate for the lack of information in a
single frame, we propose a novel detection method that recognizes pedestrians in a short sequence of frames.
Namely, we take the single-frame HOG-based detector and extend it to multiple frames. Our detector is applied to
regions containing potential moving objects. In the case of video taken from a moving camera on an aerial
platform, video stabilization is first performed to register the frames. A classifier is then applied to features
extracted from spatio-temporal volumes surrounding the potential moving objects. On challenging stationary and
aerial video datasets, our detection accuracy outperforms several state-of-the-art algorithms.
I. INTRODUCTION
The task of detecting people in images and video is an important and active area in both academia and industry.
One scenario is the case of outdoor surveillance cameras, which are mounted in a high position and look down
upon a large public area such as a street or plaza. Another scenario is a camera mounted on a moving platform
such as a helicopter or unmanned aerial vehicle (UAV). In these scenarios, the size of people in the images is often
small, and detection can be challenging. In this paper, we focus on the problem of detecting people in low
resolution videos, where the height of the people in the images is on the order of 20 pixels. We also focus on
detecting pedestrians, by which we mean people that are walking.
This problem has many applications, including search and rescue, law enforcement, and border monitoring. When
the size of a pedestrian in an image becomes very small, many shape details are lost, and it is difficult to
distinguish a pedestrian from a non-pedestrian. Figure 1 shows an example of a pedestrian at four different
resolution levels. As described in Section 2, existing algorithms for pedestrian detection do fairly well for high
resolution images, but performance degrades dramatically when the height of pedestrians is 30 pixels or less. In
addition to the small size of the pedestrians in the images, the problem of detecting pedestrians in low resolution
video can be challenging for other reasons.
There can be a wide range of poses and appearance, including a variety of clothing. The lighting can vary and
shadows can be present. Background clutter can have a similar appearance to pedestrians. Pedestrians can be
partially occluded by other objects, or by other pedestrians.
Figure 1: An example pedestrian at four different resolution levels: The height of the pedestrian is 140, 50, 20, and
10 pixels. (Left) Images at actual size. (Right) The same images but stretched for visualization.
The problem is even more challenging in aerial videos. The effective resolution of the video is often degraded due
to motion blur and haze, further reducing the available visual information on shape and appearance. In these
scenarios, video stabilization is often used to compensate for camera motion, in order to help find moving objects
in the scene. However, stabilization is imperfect, especially in the case of rapidly moving cameras. As a result,
many false regions can be identified as moving objects. Another significant challenge is that the camera moves
around frequently and does not dwell for long on a particular portion of the scene. Thus, algorithms that rely on a
long sequence of observations to build up a motion track model may not be applicable. As an example, Figure 2
shows snapshots from a video taken from a small quadrotor UAV flying rapidly over a field [1]. The camera moves
erratically, undergoing large amplitude rotations and translations. As a result, people are usually only within the
field of view for a short time (as briefly as several seconds). Although the size of people varies due to the changing
altitude of the camera, the height of people is often as small as 20 pixels tall. Detecting people in these scenarios is
extremely difficult, due to the low resolution.
Figure 2: Example images from aerial video (4 seconds apart). Size of images is 380×640 pixels.
Although it is very difficult to recognize a person in a single low resolution image, the task is much easier when a
short sequence of images is used. For example, Figure 3 shows a single low resolution frame in which it is difficult
to recognize the object. The right portion of the figure is a sequence of frames in which a subject is performing a
recognizable movement; i.e., walking. Despite the deficiency of recognizable features in the static image, the
movement can be easily recognized when the sequence is put in motion on a screen. This phenomenon is well
known from the pioneering work of Johansson [2], whose moving light display experiments showed convincingly
that the human visual system can easily recognize people in image sequences, even though the images contained
only a few bright spots (attached to their joints). A static image of spots is meaningless to observers, while a
sequence of images creates a vivid perception of a person walking. The most common and successful existing
approaches for pedestrian detection, such as the Dalal detector [3], use histogram of oriented gradient (HOG)
features, with a support vector machine (SVM) classifier. However, these approaches perform poorly when the
height of pedestrians is 30 pixels or less [4; 37].
Figure 3: (Left) Single frame. (Right) Sequence of frames from a video of a walking person.
To compensate for the lack of information in a single frame, we propose a novel detection method that recognizes
pedestrians in a short sequence of frames. Namely, we take the single-frame HOG-based detector and extend it to
multiple frames. Our approach (Section 3) uses HOG features extracted from a spatiotemporal volume of images.
We use a volume composed of up to 32 “slices”, where each slice is a small sub-image of 32×32 pixels. This volume
represents a duration of about one second or less. The idea is that the motion of a person walking is distinctive, and
we can train a classifier to recognize the temporal sequence of feature vectors within the volume. As an example,
consider the images of a walking person shown in the top row of Figure 4. The corresponding HOG features are
shown in the second row. The third row shows images of a moving car. The sequence of corresponding HOG
features (bottom row) of the negative example is visually quite different from that of the positive example.
The main contribution of this work is the development of a novel multi-frame HOG-based pedestrian detector that
is able to detect pedestrians in video, at lower resolutions than previously reported. Our detector
achieves significantly better accuracy than existing detectors on challenging video datasets. The rest of this paper
is organized as follows. We discuss related work in Section 2. Our pedestrian detection method is presented in
Section 3. A detailed description of experimental results is presented in Section 4. Section 5 summarizes
conclusions and future work.
Figure 4: (a) Positive example (pedestrian). (b) Negative example (part of a car passing by a post)
II. RELATED WORK
There is an extensive body of literature on people detection, although there is less work on pedestrian detection in
low-resolution videos, and relatively little work on pedestrian detection in aerial videos. Comprehensive reviews
can be found in [4; 5; 6; 7]. Most work focuses on pedestrian detection in single high-resolution images. Instead of
an explicit model, an implicit representation is learned from examples, using machine learning techniques. These
approaches typically extract features from the image and then apply a classifier to decide if the image contains a
person. Typically, the detection system is applied to sub-images over the entire image, using a sliding window
approach. A multi-scale approach can be used to handle different sizes of the person in the window. Alternatively,
the detection system can be preceded by a region-of-interest selector, which generates initial object hypotheses,
using some simple and fast tests. Then the full person detection system is applied to the candidate windows. The
most common and successful approaches for single frame pedestrian detection use gradient-based features. The
Dalal-Triggs detector [3] used a histogram of oriented gradient (HOG) features, with a support vector machine
(SVM) classifier. A model for the shape of a person is learned from many training examples. The HOG + SVM
approach is still considered a competitive baseline in pedestrian detection.
Although this approach has excellent performance in high resolution images, studies [4] have shown that
performance degrades dramatically when the height of pedestrians is 30 pixels or less. A variation on this
approach is to use deformable part models for detection [9]. The part models are related to a root model using a
set of deformation models representing the expected location of body parts. Although this approach can handle a
wider variety of poses, Park et al. [10] found that the part-based model is not useful for pedestrian heights less
than 90 pixels. The same limitation applies to recent approaches such as deep convolutional neural networks
(CNNs), which have been widely adopted for pedestrian detection [38] and achieve state-of-the-art performance,
but not in low-resolution applications. Recently, a wave of deep CNN-based pedestrian detectors has achieved
good performance on several high-quality pedestrian benchmarks [41; 42], which is not the case for the low
resolution applications targeted by our work.
Contextual information can improve recognition, since in traffic scenes pedestrians are often around vehicles [12;
13; 39]. The approach of [40] proposes a segmentation and context network (SCN) structure that combines
segmentation and context information to improve the accuracy of pedestrian detection. Our work does not use
contextual information, since we wanted to make our approach more general and not limit our domain to traffic
scenes. Other approaches use features that are similar to Haar wavelets [14; 15; 16]. Viola and Jones [14]
popularized this approach and showed its applicability to face detection. The features are differences of
rectangular regions in the images. These are simple and very fast to compute. Although each feature is not very
discriminatory, a large number of features can be chained together to achieve good performance. The method of
AdaBoost is used to train the classifier and select features. In [15], Viola and Jones use Haar-like wavelets to
compute features in pairs of successive images for pedestrian detection.
Jones and Snow [16] extended the above algorithm to make use of 10 images in a sequence. This algorithm is the
closest one to our approach, since it uses a relatively long sequence. They used two types of Haar-like features:
Features applied within each frame, and differences of features between two different frames. On the PETS2001
dataset [17], their detector achieves a detection rate of 84% to 93% at a false positive rate of 10⁻⁶. They were able to
detect pedestrians down to a size of 20 pixels tall, in videos taken from stationary cameras. To get better
performance, one might try to extend the Jones and Snow method to work on longer sequences of images.
However, in this case the number of potential Haar-like features grows to an unmanageable amount. Because of
the large number of feature hypotheses that need to be examined at each stage, the training time can be quite slow
(on the order of weeks).
Other approaches also use the additional information provided by image sequences to improve detection. For
example, [8] uses a two stage classifier which uses the detection scores from previous images to improve the
classification in the current image. Optical flow information can be incorporated into a feature vector along with
image gradient information [18; 19]. In [20], gradient-weighted optical flow is computed from the first frame of the
sequence to detect objects (face or person), and is then convolved with the gradient magnitude for further tracking.
Other work [11; 22; 23] extracts spatiotemporal gradient features from the spatiotemporal volume of images. These methods
were developed to recognize actions in videos. Conceivably these approaches could also be used to detect
pedestrians. However, in low resolution image sequences, it would be difficult to extract local features, since the
volume is so small.
The work of [21] proposed a 3DHOG descriptor that characterizes motion features with co-occurrence spatio-
temporal vectors. To increase discrimination, the HOG-HOF (HOG plus Histogram of Optical Flow) and STHOG (spatio-
temporal HOG) descriptors have been proposed, at the price of very high computational cost. The optical flow-based
features appear to help in high resolution [4], but in low-resolution scenarios, detection results are poor due to
noise, camera jitter, and the limited number of pixels available. There is relatively little work focused on
pedestrian detection in low resolution aerial videos. Most previous work using aerial video performs motion
compensation by registering each image to a reference image. In this way a short term background image can be
computed, which can be used to detect foreground objects using image subtraction (e.g., [24]). The work of [25]
applied a joint global–local information algorithm to suppress background interference and enrich the
description of pedestrians. It is based on extracting features from human body parts, which are not available in low
resolution applications. Some approaches for pedestrian detection in aerial images use the same methods that
were discussed above, namely HOG-like features with an SVM classifier (e.g., [26]) or Haar-like features with an
AdaBoost classifier (e.g., [27]). Other approaches combine these features with additional information. For example
[28] uses shadows cast by people for classification. However, shadow information may not be reliable in low
resolution videos. The only approach that was found that uses image sequences for pedestrian detection in aerial
videos was [24]. This approach computes frequency measures of sub windows to detect the periodic motion of
legs, arms, etc. However, quantitative results were not presented. Some approaches integrate detection and
tracking. For example, [29] extracts hypotheses of body articulations from images; then combines those into
“tracklets” over a small number of frames, using a dynamic model of limbs and body parts.
This requires relatively high resolution images. Other approaches can detect very small pedestrians, but require a
longer sequence of frames. For example, the approach of [30] can detect and track vehicles and people only a few
pixels in size, but uses sequences ranging from tens of seconds to minutes long. The work of [31] checks 2D object
detections for consistency with scene geometry and converts them to 3D tracks. In videos taken from a rapidly
moving UAV, a pedestrian may only be visible in the field of view for a short time (a few seconds). Thus, methods
that require a long sequence of images to build up a track file in order to perform detection are not applicable. In
contrast, our system (described in the next section) only requires a short sequence of images and is applicable to
situations where the pedestrian is only visible for a short time.
III. PEDESTRIAN DETECTION METHOD
Figure 5 shows the architecture of the system, which contains two phases: a training phase and a detection phase.
In the training phase, positive training examples (i.e., volumes containing pedestrians) and negative training
examples (i.e., volumes not containing pedestrians) are created. Features are then extracted from these volumes.
In the detection phase, the binary classifier constructed during the training phase is used to scan over detected
ROIs in sequences of unseen testing images to search for pedestrians.
3.1 Video Stabilization
In the case of aerial video, where the images are taken from a moving camera, stabilization is applied to short
overlapping sequences of 32 frames. We start with the first frame of each sequence and use it as the reference
frame for the sequence. The remaining frames are registered to the first frame. The results are overlapping groups
of 32 frames, which are co-registered. This aids the next step, which is to detect ROIs containing potential moving
objects. In the case of videos taken from stationary cameras, the stabilization step can be skipped, since the
images are already co-registered. We assume that the camera is looking down at the ground, which is
approximately a planar surface. Thus, a homography (projective) transform describes the relationship between
any two images. To compute the transform, Harris corner interest points are matched between the reference
image and each subsequent image. We fit a homography to the matched points, using RANSAC to eliminate
outliers. We then apply the transform to align the current image to the reference image (Figure 6). The
assumption of a planar surface is only an approximation, although it is usually good if the camera is high above the
ground. However, any objects (such as buildings and trees) above the plane will be misregistered, and may
result in ROIs that do not correspond to actual moving objects. Our classifier will subsequently filter these out,
since motion patterns within these ROIs do not match the patterns of a walking person.
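As an illustration, a minimal Python/OpenCV sketch of this registration step follows. The helper names and parameter values are assumptions, and pyramidal Lucas-Kanade tracking is used here only as a convenient stand-in for matching the Harris corners between frames.

```python
import cv2
import numpy as np

def register_to_reference(ref_gray, cur_gray):
    """Warp cur_gray onto ref_gray using a RANSAC-fitted homography."""
    # Detect Harris corner interest points in the reference frame.
    pts_ref = cv2.goodFeaturesToTrack(ref_gray, maxCorners=500,
                                      qualityLevel=0.01, minDistance=7,
                                      useHarrisDetector=True)
    # Match the corners by tracking them into the current frame.
    pts_cur, status, _ = cv2.calcOpticalFlowPyrLK(ref_gray, cur_gray, pts_ref, None)
    ok = status.ravel() == 1
    # Fit a homography with RANSAC to eliminate outlier matches.
    H, _ = cv2.findHomography(pts_cur[ok], pts_ref[ok], cv2.RANSAC, 3.0)
    # Align the current image to the reference image.
    h, w = ref_gray.shape
    return cv2.warpPerspective(cur_gray, H, (w, h))

def stabilize_group(frames):
    """Register one overlapping group of 32 grayscale frames to its first frame."""
    return [frames[0]] + [register_to_reference(frames[0], f) for f in frames[1:]]
```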
3.2 ROI Detection
After stabilization, background subtraction is used to identify potential moving objects in the scene for subsequent
analysis.
Figure 5: The architecture of our overall pedestrian detection system.
A background model for each group of 32 frames is constructed by computing the mean of the frames. Then the
difference between the middle frame in the sequence and the background model is computed.
Initial foreground pixels are identified by thresholding the absolute value of the difference image. In our work, the
threshold was empirically set to 15–25 (depending on the image sequence). Morphological operations are then applied
to eliminate small regions and join broken regions: the opening operation removes small objects from the
foreground, placing them in the background, while the closing operation fills small holes in the foreground. A
disk-shaped structuring element with a radius of 4 or more is used. Connected components whose area lies
between 20 pixels and 500 pixels are extracted and their bounding boxes become the final ROIs. An example of an
image containing the final detected ROIs is shown in Figure 7. Although simple in design, the ROI detector
performs well in detecting potential moving objects. We deliberately tuned it to be very sensitive. For example, in
the VIRAT aerial dataset (described in Section 4), only 49 out of 1607 actual moving objects were not detected,
and in the PETS 2001 dataset, only 88 out of 1929 actual moving objects were not detected. Of course, the ROI
detector also occasionally detects non-moving objects, due to image noise and misregistration. The classifier
will subsequently filter out the non-moving objects (as well as non-pedestrians).
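A minimal sketch of this ROI detector is shown below; the threshold of 20 and disk radius of 4 fall within the ranges stated above, but the exact values and function names are illustrative.

```python
import cv2
import numpy as np

def detect_rois(frames, thresh=20, min_area=20, max_area=500, disk_radius=4):
    """Bounding boxes of potential moving objects in one group of 32 frames."""
    stack = np.stack([f.astype(np.float32) for f in frames])
    background = stack.mean(axis=0)                      # mean of the group as background
    diff = np.abs(stack[len(frames) // 2] - background)  # middle frame vs. background
    fg = (diff > thresh).astype(np.uint8) * 255          # threshold the difference image

    # Opening removes small foreground specks; closing fills small holes.
    k = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                  (2 * disk_radius + 1, 2 * disk_radius + 1))
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, k)
    fg = cv2.morphologyEx(fg, cv2.MORPH_CLOSE, k)

    # Keep connected components whose area lies between 20 and 500 pixels.
    n, _, stats, _ = cv2.connectedComponentsWithStats(fg)
    return [(x, y, w, h) for x, y, w, h, area in stats[1:]   # row 0 is the background
            if min_area <= area <= max_area]
```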
Figure 6: (Left) Reference frame (1st of sequence of 32 frames). (Right) The 16th frame in the sequence,
registered to the first.
Figure 7: An example of ROIs shown on: (Left) The binary image. (Right) The registered image.
3.3 Formation of Spatiotemporal Volumes
A sliding window of size 32×32 pixels is scanned within each ROI detected above. At each position, a
spatiotemporal volume is created by extracting a sequence of sub images (slices), at a fixed position in the
registered images, for N frames (we used up to 32 frames). The slice window size was chosen to be 32×32 pixels.
This size is large enough that a pedestrian remains within the window throughout the sequence at normal walking
speeds, which usually corresponds to about ½ pixel per frame. Since our detector is trained to detect pedestrians
with a height of approximately 20 pixels, this allows a border of about 6 pixels above and below the person (Figure
8). To handle possible variations in scale, we extract volumes at multiple scales in the image sequence by creating
a pyramid of images of different sizes. A scale factor of 0.75 is used between levels of the pyramid (for a total of 6
pyramid levels). This allows us to detect people that are taller than 20 pixels: the detector will fire at the pyramid
level where the person's height is about 20 pixels.
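The sketch below illustrates the volume formation and the pyramid under the parameters just stated; the function names are illustrative.

```python
import cv2
import numpy as np

def image_pyramid(img, levels=6, scale=0.75):
    """Pyramid with a 0.75 scale factor between its 6 levels."""
    pyr = [img]
    for _ in range(levels - 1):
        h, w = pyr[-1].shape[:2]
        pyr.append(cv2.resize(pyr[-1], (int(w * scale), int(h * scale))))
    return pyr

def extract_volume(registered_frames, x, y, size=32):
    """Stack up to 32 co-registered 32x32 slices taken at the same fixed
    (x, y) position: one spatiotemporal volume."""
    return np.stack([f[y:y + size, x:x + size] for f in registered_frames])
```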
3.4 Feature Extraction, Normalization, and Dimensionality Reduction
HOG features are then extracted from each of the slices that make up the volume. We divide each 32×32 pixel slice
into square cells (typically 4×4 pixels each), and compute a histogram of gradient directions in each cell. We use 9
bins for the gradient directions, which represent unsigned directions from 0°-180°. Following the method of [3],
cells are grouped into (possibly overlapping) blocks, where each block consists of 2×2 cells. The features from
each slice are then concatenated into a single large vector. Variations in illumination affect the magnitudes of the
gradients. The influence of large gradient magnitudes can be reduced using normalization, which can be
performed in input space or in feature space. We found that normalization in input space has little or no effect
on performance, and sometimes decreases it. Normalization is therefore performed in feature space.
Following the method of [32], we normalize the volumetric blocks using the L2-norm followed by clipping to limit
the maximum values (Lowe-style clipped L2-norm).
Figure 8: Spatiotemporal Volume. (Left) Positive example. (Middle) Gradient. (Right) Computed HOG, with a
volumetric block (shown in red) and a cell (shown in yellow).
The difference is that in our algorithm the features were normalized within each volumetric block, meaning that
the sequence of blocks across the N slices (e.g. 16 slices) is at the same place in each slice, as shown in Figure 9.
Next, the features from the volumetric block in all slices were concatenated into a single feature vector. The result
of the volumetric normalization step is a set of feature vectors that are more invariant to changes in illumination
or shadowing. Using lower dimensional features produces models with fewer parameters, which speeds up the
training and detection algorithms. Following the work of [9], we apply Principal Components Analysis (PCA) to the
feature vectors to reduce the dimensionality of the features. In the learning stage, we collect a large number of 36-
dimensional HOG features corresponding to blocks and perform PCA on them. The eigenvalues indicate that the
linear subspace spanned by the top 50% of eigenvectors captures the essential information in the features.
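The sketch below makes this feature pipeline concrete: per-slice 9-bin cell histograms, clipped-L2 normalization over a volumetric block (the same 2×2-cell block position concatenated across all slices), and PCA. Non-overlapping blocks and a clipping constant of 0.2 (in the spirit of [32]) are simplifying assumptions, not confirmed settings.

```python
import numpy as np
from sklearn.decomposition import PCA

def cell_histograms(slice32, cell=4, bins=9):
    """9-bin unsigned-gradient histograms for each 4x4 cell of a 32x32 slice."""
    gy, gx = np.gradient(slice32.astype(np.float32))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned directions, 0-180
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    n = slice32.shape[0] // cell                      # 8 cells per side
    hist = np.zeros((n, n, bins))
    for i in range(n):
        for j in range(n):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            for k in range(bins):
                hist[i, j, k] = m[b == k].sum()       # magnitude-weighted votes
    return hist

def volume_features(volume, clip=0.2):
    """Clipped-L2-normalize each volumetric block, then concatenate."""
    hists = np.stack([cell_histograms(s) for s in volume])   # (N, 8, 8, 9)
    feats = []
    for i in range(0, hists.shape[1], 2):                    # 2x2-cell blocks
        for j in range(0, hists.shape[2], 2):
            block = hists[:, i:i+2, j:j+2, :].ravel()        # same place in all slices
            block /= np.linalg.norm(block) + 1e-6            # L2 normalize
            block = np.minimum(block, clip)                  # Lowe-style clipping
            block /= np.linalg.norm(block) + 1e-6            # renormalize
            feats.append(block)
    return np.concatenate(feats)

# PCA on pooled 36-dimensional block features (2x2 cells x 9 bins); keeping
# the "top 50% of eigenvectors" is read here as n_components=18 (an assumption).
def fit_block_pca(pooled_blocks):                            # pooled_blocks: (M, 36)
    return PCA(n_components=18).fit(pooled_blocks)
```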
Figure 9: (a) Volumetric Block. (b) Spatiotemporal volume of 16 slices.
3.5 Classification
To search for pedestrians, we apply a classifier to each spatiotemporal volume within the detected ROIs. We first
train a support vector machine (SVM) on examples of positive (pedestrian) and negative (non-pedestrian) feature
vectors. To extract positive examples from the training videos, the following procedure was followed. A
pedestrian was manually selected in one of the images (in one of the detected ROIs) and a square sub window was
extracted from the image surrounding the pedestrian. This sub window was scaled such that the person was 20
pixels tall, and the sub window size was 32 × 32 pixels. Next, a sequence of sub windows was extracted from the
registered images; half of them preceding and half of them following the central image, at the same fixed place in
all images, and the sub windows were similarly scaled. A total of 32 such slices were assembled into a
spatiotemporal volume, representing a single positive example (Figure 10). Negative examples were also
extracted from the training images. These were spatiotemporal volumes of the same size as the positive examples,
but sampled randomly from completely person-free areas of detected ROIs. We used a freely available SVM-based
classifier, the OSU-SVM MATLAB toolbox [33], which is based on LIBSVM [34].
Figure 10: Four different training example sequences. Top: two sequences from VIRAT [35]. Bottom: sequences
from UCF-2009 [36]. Each sequence contains 8 slices subsampled from a sequence of 32 slices.
K-fold cross-validation is used for parameter selection, by partitioning the training data into 5 equally sized
segments, and then performing iterations of training and validation to pick the best parameters for the SVM
kernels. We experimented with two kernels – a linear kernel and a radial basis function kernel. Although the non-
linear kernel gives slightly more accurate results, for simplicity and speed we use the linear kernel as the baseline
classifier throughout this study. Finally, we perform non-maximum suppression on the detections. If the bounding
boxes of two or more detections overlap by more than 50%, they are merged into one detection by averaging their
top left coordinates.
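A minimal sketch of the training and suppression steps follows, with scikit-learn's LinearSVC standing in for the OSU-SVM/LIBSVM toolbox used here; the C grid, the fixed 32-pixel window size, and the greedy merge logic are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

def train_classifier(X, y):
    """5-fold cross-validated linear SVM over volume feature vectors.
    X: (n_examples, n_features); y: +1 (pedestrian) / -1 (non-pedestrian)."""
    grid = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
    grid.fit(X, y)
    return grid.best_estimator_

def merge_detections(boxes, size=32, overlap=0.5):
    """Merge detections whose 32x32 windows overlap by more than 50%,
    averaging their top-left coordinates as described above."""
    boxes = [np.asarray(b, dtype=float) for b in boxes]      # each box is (x, y)
    merged = []
    while boxes:
        group, rest = [boxes.pop(0)], []
        for b in boxes:
            dx = max(size - abs(b[0] - group[0][0]), 0.0)
            dy = max(size - abs(b[1] - group[0][1]), 0.0)
            if dx * dy / float(size * size) > overlap:       # overlap fraction
                group.append(b)
            else:
                rest.append(b)
        boxes = rest
        merged.append(np.mean(group, axis=0))                # average top-left corners
    return merged
```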
IV.DATASETS, EXPERIMENTS, AND RESULTS
The results in this section use the following default parameters: blocks of size 2×2 cells, with no overlap, and
each cell consists of 4×4 pixels. We use 9 bins for gradient orientations, and normalize volumetric blocks using a
clipped L2-norm. The size of image slices is 32×32 pixels and a linear SVM classifier is used. In addition to
evaluating our algorithm, we compare our results to two other algorithms: the Dalal-Triggs algorithm [3], which is
among the most popular approaches for single frame pedestrian detection, and the Jones and Snow algorithm [16],
which was the best performing algorithm on low resolution pedestrians that we found. If we limit our algorithm
to use only a single image, it is essentially the same as the Dalal-Triggs algorithm. Therefore, we can directly
compare the performance of our algorithm to that of the Dalal-Triggs algorithm on each of the datasets. Although
we did not have an implementation of the Jones and Snow algorithm, they give performance results on one of the
datasets that we used, so we can compare our algorithm to theirs on that dataset. For evaluation and comparison
we use the following standard metrics, based on the possible outcomes of True Positive (TP), True Negative (TN),
False Positive (FP), and False Negative (FN). The “Detection Rate” (DR), or True Positive Rate (TPR) measures
how accurate the classifier is in sensing pedestrians. It is the proportion of positive examples (pedestrians) that
were correctly identified. It is calculated as:
DR = TP / (TP + FN) = Positives Correctly Classified / Total Positives
The “False Positive Rate” (FPR) is the proportion of negative examples (non-pedestrians) that were incorrectly
classified as positives (pedestrians). It is also known as the false positives per window (FPPW) rate. It is
calculated as:
FPR = FP / (FP + TN) = Negatives Incorrectly Classified / Total Negatives
A “Receiver Operating Characteristic” (ROC) curve depicts the tradeoff between hit rates (DR) and false alarm
rates of a classifier, as some parameters of the algorithm are varied.
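In code, the two metrics reduce to the formulas above; a trivial sketch:

```python
def detection_rate(tp, fn):
    """DR (TPR): proportion of pedestrian examples correctly identified."""
    return tp / float(tp + fn)

def false_positive_rate(fp, tn):
    """FPR (FPPW): proportion of non-pedestrian windows wrongly accepted."""
    return fp / float(fp + tn)
```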
4.1 Datasets
For evaluation of the proposed approach, we used six datasets that are representative of our application. These
are two stationary camera datasets (PETS2001 [17], VIRAT public 1.0 [35]) and three aerial datasets (VIRAT Fort
AP Hill [35], UCF-2009 [36], UCF-2007 [36]). In addition, we created our own set of aerial video sequences of
natural scenes, simulating realistic search-and-rescue and border-monitoring scenarios.
the stationary datasets, videos were collected from a stationary surveillance camera and no video stabilization is
needed. All the images were converted from color to grayscale since color information was not used during
feature extraction. In addition, grayscale images were used during image registration. All these datasets are low
resolution; however, the height of people in some images is greater than 20 pixels. Although our detector was
designed to detect people with heights of 20 pixels, it can still detect these larger pedestrians. Since an image
pyramid was used, the detector can detect people at the image level where the height was about 20 pixels. This
guarantees that at some level of the pyramid the people will be close to 20 pixels in height and can be detected by the
algorithm.
4.1.1 Stationary Datasets
The video sequences were taken by stationary cameras at the top of high buildings to record large numbers of
event instances across a very wide area while avoiding occlusion as much as possible. The cameras look down
upon a scene containing streets with buildings, trees, and parking lots. Cars and pedestrians periodically move
through the scene. The PETS 2001 dataset [17] is popular for automated surveillance research. It was also used by
Jones and Snow to evaluate their algorithm [16] (that we are comparing to). It contains 16 video sequences of
about two to four minutes in length, with a frame rate of 25 frames/second, and a frame size of 768 pixels in width and
576 pixels in height. Half the videos are designated as training, and half for testing. For this dataset, we extracted
2,560 training examples from the training videos. 960 of them were positive examples and 1600 were negative
examples. The second stationary camera dataset is the stationary VIRAT public 1.0 dataset [35], with a frame rate
of 30 frames/second, and frame size of 1280 pixels in width and 720 pixels in height. For this dataset, the training
set consisted of 1760 examples: 780 of them were positive examples and 980 were negative examples.
4.1.2 Aerial Datasets
The VIRAT Fort AP Hill aerial dataset [35] was recorded using an electro-optical sensor from a military aircraft
flying at a height up to 1000 meters. The resolution of these aerial videos is 640×480 with 30Hz frame rate, and
the typical pixel height of people in the collection is about 20 pixels. The videos include buildings and parking lots
where people and vehicles are engaged in different activities. The data is challenging in terms of low resolution,
uncontrolled background clutter, diversity in scenes, changing viewpoints, changing illumination, and low image
sharpness. For this dataset, the training set consisted of 1,280 positive examples and 1,280 negative examples.
The UCF-2009 (also known as UCF-LM) dataset [36] video sequences were obtained using an R/C-controlled
blimp with a gimbal-mounted camera, flying over a dirt parking lot near a football stadium in Florida. The
flying altitudes ranged from 400–450 feet. Actions were performed by different actors. The UCF-2009 dataset has
a resolution of 540×960 pixels with a 23Hz frame rate. For this dataset, the training set consisted of 1,000 positive
volumes and 1,000 negative volumes. The UCF-2007 dataset [36] is an earlier dataset from UCF, and is more
challenging due to large variations in camera motion, rapidly changing viewpoints, changes in object appearance,
pose, object scale, cluttered background, and illumination conditions. In addition, it suffers from interlacing,
motion blur, and poor focus. The resolution is 854×480 pixels with a 30 Hz frame rate. To remove artifacts
caused by the interlacing, we subsampled the images, so the final resolution was 427×240 pixels. For this dataset,
the training set consisted of 250 positive volumes and 250 negative volumes. The set of sequences that we
created was recorded using a DJI Phantom 3 quadcopter; from now on we will refer to this data as the DJI
dataset. This dataset consists of aerial imagery of people walking across a field, at a frame rate of 25 Hz with
image size of 720×1280 pixels. This dataset is challenging since the pedestrians are very small and there are
shadows cast by the pedestrians. In addition, there are variations in camera motion, changes in illumination
conditions, and some low image sharpness. For this dataset, the training set consisted of 780 positive volumes
and 1,260 negative volumes.
4.2 Experimental Results
Test sets were collected from portions of the videos that were different from the training sequences. ROIs were
detected in these images and the detector was scanned within each ROI. To show the main concept, Figure 11
presents an example of the results of applying the algorithm to one of the aerial datasets. After detecting ROIs, the
classifier was then applied to each volume within the ROIs. We shift volumes in the time direction every 4 frames.
This means that a pedestrian detected in frames 1:32 and again in frames 5:36 is counted as two detected instances.
This helps in recovering from missed detections and enlarges the testing set. Following [9], a detection is
considered to be correct if the area of overlap between the detection window and the ground truth window
exceeds 50%.
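A sketch of this matching criterion follows; the overlap ratio is interpreted here as intersection over union, the usual reading of the criterion in [9].

```python
def is_correct_detection(det, gt, thresh=0.5):
    """True when the overlap (intersection over union) between a detection
    window and a ground-truth window, both (x, y, w, h), exceeds 50%."""
    ax, ay, aw, ah = det
    bx, by, bw, bh = gt
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))   # intersection width
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))   # intersection height
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return union > 0 and inter / union > thresh
```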
For PETS 2001, the total number of tested examples within the ROIs was 1,235 positive examples (16 slices each)
and 8,730 negative examples (16 slices each). Using the detector with the default parameters, a detection rate of
94.7% was achieved with a false positive rate (FPR) of 10⁻⁶. At the same FPR, the Dalal algorithm [3], which uses
single images, achieved a detection rate of 73%. On the same dataset, the Jones and Snow algorithm [16] achieved
a detection rate of 93%, when 8 detectors were combined. For the stationary VIRAT dataset, the total number of
tested examples was 720 positive examples and 3860 negative examples. Using the detector with the default
parameters, we achieved a detection rate of 91% with a false positive rate of 10⁻⁶. At the same FPR, the single-
frame detector of Dalal achieved a DR of 70% on the same dataset. For the aerial VIRAT dataset, a total of 12,600
volumes were classified during the scanning over the detected ROIs, of which 5,016 were positive examples and
7,584 were negative. Using the detector with the default parameters, it achieved a DR of 78% at an FPR of 4×10⁻³.
This value of FPR means that only 4 in 1000 tested non-pedestrian volumes were falsely classified as pedestrians.
At the same FPR the single-frame Dalal detector achieved a detection rate of 39%. For UCF 2009, a total of 5,880
volumes were classified; 2,184 of them were positives, and 3,696 were negatives. Using the detector with
parameters tuned for the best performance on this dataset, DR is 92% at an FPR of 4×10⁻³. At the same FPR the Dalal
detector achieved a DR of 50%. For the UCF-2007 dataset, a total of 500 volumes were classified; half of them
positives and half of them were negatives. Using the detector with the default parameters, DR is 73% at FPR of
4×10⁻³. At the same FPR the Dalal detector achieved a DR of 41%. For the DJI dataset, a total of 1,820 volumes
were classified during the scanning over the detected ROIs, of which 740 were positive examples and 1,080 were
negative. The detector achieved a DR of 71% at an FPR of 4×10⁻³. At the same FPR the single-frame Dalal detector
achieved a detection rate of 54%. Figures 12-16 show detection examples on frames from the six datasets
discussed above. Detection results are shown as boxes, where TP is “true positive”, FP is “false positive”, TN is
“true negative”, and FN is “false negative” (to avoid cluttering the figures, we are not showing all the detected TNs).
Figure 12 (a,b) shows two example frames from the aerial VIRAT dataset with TP, FP, TN and FN results
labeled. Figure 12(c) shows the sequence of slices for one of the TP detections. Figure 12(d) shows an example of
a sequence of slices for one of TNs. TNs result from scanning the classifier around false ROIs that correspond to
motion regions resulting from non-pedestrian motion (e.g. vehicles) or from static objects due to imperfect
stabilization.
Here an ROI was detected on a static object (the corner of a building). Since the motion pattern for this object does
not match that of a pedestrian, the classifier labels this as a non-pedestrian. Corresponding results on the UCF
2009 dataset are shown in Figure 13. Examples of sequences of slices for TN detections are shown in Figure 13(c,
d). The TNs shown here resulted from scanning the classifier around false ROIs corresponding to non-pedestrian
motion, e.g. a car and a bicycle. The motion patterns here do not match those of pedestrians, so the classifier labels
them as non-pedestrians. Figure 14 shows frames from UCF-2007 dataset with detections, and Figure 15 shows
examples from the stationary VIRAT dataset and PETS 2001 dataset with detections. Corresponding results on the
DJI dataset are shown in Figure 16. Examples of sequences of slices for TN detections are shown in Figure 16(c,
d).
Figure 12: (a,b) Two frames from aerial VIRAT dataset with detections. (c) Slices for one TP in (a). (d) Slices for
the TN shown in (b).
Figure 13: Detections on two frames from UCF-2009 Dataset. (a) FP and TPs. (b) TNs. (c) and (d) Sequences for
TNs shown in (b).
The TNs shown here resulted from scanning the classifier around false ROIs corresponding to non-pedestrian
motion, e.g. part of a moving vehicle. The ROC curves are plotted in Figures 17-18. Figure 17(left) shows the ROC
curves of the three detectors: Dalal detector, Jones-Snow detector, and our multi-frame HOG detector on PETS
2001 dataset.
Figure 14: (a) and (b) Two frames from UCF-2007 dataset with example detections.
Figure 17(right) shows the ROC curves of the Dalal detector and the multi-frame HOG detector on the stationary
datasets. The multi-frame detector always outperforms the other detectors.
Figure 15: Example detections, (a) PETS2001 dataset. (b) Stationary VIRAT dataset.
Figure 16: Detections on two frames from DJI Dataset. (a) TP, TN and FP. (b) TP. (c) and (d) Sequences for TP and
FP shown in (a).
Figure 18 shows the ROC curves of the Dalal detector and the multi-frame HOG detector on the four aerial
datasets. The multi-frame detector always gives better detection rates than the Dalal detector.
4.3 Performance Study and Discussion
We hypothesized that using information from multiple images in the detector should be better than using
information from only a single image (or a few images), since the motion patterns should aid recognition. We trained and
tested the detector on volumes consisting of a different number of slices per volume, ranging from 1 to 32 slices.
Using parameters tuned for the best performance, different classifiers were trained and tested. In each
experiment, the same number of slices per volume is used in training and in testing. Figure 19(top) shows the DR
of each classifier on the stationary datasets at an FPR of 10⁻⁶, while Figure 19(bottom) shows the DR of each classifier
on the aerial datasets at an FPR of 4×10⁻³. The results confirm that the use of multiple images for detection
dramatically improves the results. The improvement in detection rate increases with the number of slices, until a
total of 16 slices is reached (corresponding to about a half of a second of walking).
Figure 17: (Left) ROC curves of the three detectors on PETS 2001 dataset. (Right) ROC curves of the two detectors
on Stationary VIRAT dataset.
Figure 18: ROC curve for the Multi-Frame HOG detector and the Dalal detector on (a) Aerial VIRAT dataset. (b)
Aerial UCF-2009 dataset. (c) Aerial UCF-2007 dataset. (d) DJI dataset.
Figures 20-21 show the ROC curves for each dataset, as the number of slices is varied. In the aerial VIRAT dataset
(Figure 20), using a single slice per volume gives a DR of 40% at an FPR of 4×10⁻³. At the same FPR, the use
of 16 slices per volume raises the DR to 78%. For the UCF-2009 dataset (Figure 21), using a single frame (i.e., one
slice per volume) gives a DR of 50% at an FPR of 4×10⁻³. At the same FPR, the use of 16 frames improves
DR to 92%. Note that the single slice case corresponds to the Dalal detector. We also studied the effect of cell size.
Figure 22 shows the ROC for UCF-2009 as the cell size is varied. Cells of size 4×4 pixels perform best. As shown
in Figure 23 (left), a person’s head, forearm, upper leg, and lower leg each appear to be approximately 4×4
pixels in size, which may allow cells of 4×4 pixels to capture the shape and motion of these parts. Figure 23(right) gives
some insight into what cues the detector uses to make its decision. It shows the weights corresponding to each
element of the feature vector, i.e., the values of the elements of w in the classifier decision function
f(x) = w^T x + b. The weights are shown for the central slice in the volume. The figure shows that the contours
of the pedestrian's head, shoulders, and lower legs have the highest weights, which represent the main cues for
detection.
Figure 19: The effect of the number of slices per volume on DR. (Top) Stationary datasets. (Bottom) Aerial datasets.
Figure 20: ROC of the classifiers with different number of slices per volume: aerial VIRAT.
Figure 21: ROC of the classifiers with different number of slices per volume: UCF-2009 Dataset.
Figure 22: ROC Curve: Cell size effect.
Figure 23: (Left) Cells of size 4×4 pixels appear to match key parts of the pedestrian’s body. (Right) SVM positive
weights.
4.4 Frame Randomization
To confirm our hypothesis that the classifier learns the characteristic motion of walking pedestrians, we
randomized the order of frames. The expectation is that giving the classifier temporally incoherent data should
reduce the performance. We followed the same procedure as before to extract ROIs and form spatiotemporal
volumes. However, this time we randomized the slices in both the training and testing volumes. Figure 24 shows
detection rates obtained from multiple tests on the two datasets.
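The randomization itself is a one-line operation; a sketch, assuming each volume is an (N, 32, 32) NumPy array:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the ablation is repeatable

def randomize_slices(volume):
    """Shuffle slice order along the time axis, destroying temporal
    coherence while leaving each slice's appearance unchanged."""
    return volume[rng.permutation(len(volume))]
```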
Figure 24: The effect of randomizing the order of frames on classification performance.
The results show that the use of randomized frame sequences degraded detection rates by an average of 8% in the
aerial VIRAT dataset experiments and by an average of 12% in the UCF dataset experiments at an FPR of 4×10⁻³. For
example, for the UCF dataset, when the sequences were used in their normal coherent order, we obtained a
detection rate of 92%. In one of the tests in which randomized sequences were used, at the same FPR, the
detection rate degraded to 80%. These experiments indicate that the classifier is indeed learning the
characteristic motion of walking pedestrians.
V. CONCLUSIONS AND FUTURE WORK
We presented a method for detecting pedestrians in low-resolution videos, using a novel multiple-frame HOG-
based feature approach. The method is designed to detect pedestrians with heights as small as 20 pixels; taller
pedestrians are handled via the image pyramid. On five public datasets, including three challenging aerial datasets,
plus our own dataset, the method achieves excellent results: detection rates of 78%, 92%, 73%, and 71% at a false
positive rate of 4×10⁻³ on the aerial VIRAT, aerial UCF-2009, aerial UCF-2007, and aerial DJI datasets respectively.
For the stationary datasets, the method achieves detection rates of 94.7% and 91% at a false positive rate of 10⁻⁶
on the PETS 2001 and stationary VIRAT datasets respectively.
Figure 25: Example from the results of detecting pedestrians in challenging UAV videos. Detections are shown at
different pyramid levels.
We have also obtained excellent preliminary results on UAV videos [1] posted on YouTube (Figure 25). The
detector needs only a short sequence of frames to perform detection. Thus, it is applicable to situations where the
camera is moving rapidly and does not dwell on the same portion of the scene for very long. We studied the benefit
of using multiple frames on the performance of the detector. We confirmed that using additional frames improves
the performance significantly, up to about 16 frames. We also found that the detector can learn the coherence of
pedestrian motion. Future work should evaluate other classifiers to see if they perform better than the simple SVM
we used. Another direction is to train the classifier to detect multiple classes, such as fast walking, stationary, and
running people. Finally, our detector could be integrated into a standard tracker such as a multiple hypothesis
tracker or particle filter-based tracker. Even in aerial videos from rapidly moving cameras, a person is often in the
field of view for multiple seconds. Therefore, multiple detections can be associated into a single track, to improve
accuracy.
Acknowledgement
This work was partially supported by a gift from Northrop-Grumman.
REFERENCES
1. “Airsoft UAV footage”, downloaded from https://www.youtube.com/watch?v=rppbsvUSpxY, August 2016.
2. Johansson, Gunnar. 1973. "Visual perception of biological motion and a model for its analysis." Attention,
Perception, & Psychophysics 14, no. 2: 201-211.
3. Dalal, Navneet, and Bill Triggs. 2005. "Histograms of oriented gradients for human detection." In Computer
Vision and Pattern Recognition, IEEE Computer Society Conference, vol. 1, pp. 886-893.
4. Dollár, Piotr, Christian Wojek, Bernt Schiele, and Pietro Perona. 2012. "Pedestrian detection: An evaluation of
the state of the art." Pattern Analysis and Machine Intelligence, IEEE Transactions on 34, no. 4: 743-761.
5. Zhang, Shanshan, Rodrigo Benenson, Mohamed Omran, Jan Hosang, and Bernt Schiele. 2016. "How Far are
We from Solving Pedestrian Detection?." arXiv:1602.01237.
6. Enzweiler, Markus, and Dariu M. Gavrila. 2009. "Monocular pedestrian detection: Survey and
experiments." Pattern Analysis and Machine Intelligence, IEEE Transactions on 31, no. 12: 2179-2195.
7. Benenson, Rodrigo, Mohamed Omran, Jan Hosang, and Bernt Schiele. 2014. "Ten years of pedestrian
detection, what have we learned?” ECCV, CVRSUAD workshop.
8. A. González, D. Vázquez, S. Ramos, A. M. López, J. Amores. 2015. "Spatiotemporal Stacked Sequential Learning
for Pedestrian Detection". In Proceedings of the Iberian Conference on Pattern Recognition and Image
Analysis, Spain.
9. Felzenszwalb, Pedro, David McAllester, and Deva Ramanan. 2008. "A discriminatively trained, multiscale,
deformable part model." In Computer Vision and Pattern Recognition. IEEE Conference, pp. 1-8.
10. Park, Dennis, Deva Ramanan, and Charless Fowlkes .2010. "Multiresolution Models for Object Detection."
Proc. European Conf. Computer Vision (ECCV), pp. 241-254.
11. Klaser, A., Marszałek, M., and Schmid, C. 2008. "A spatio-temporal descriptor based on 3d-gradients." In BMVC
19th British Machine Vision Conference (pp. 275-1). British Machine Vision Association.
12. Yan, Junjie, Xucong Zhang, Zhen Lei, Shengcai Liao, and Stan Z. Li. 2013. "Robust multi-resolution pedestrian
detection in traffic scenes." In Computer Vision and Pattern Recognition (CVPR),IEEE Conference on, pp.
3033-3040.
13. Hu, Hai-Miao, Xiaowei Zhang, Wan Zhang, and Bo Li. 2015. "Joint global–local information pedestrian
detection algorithm for outdoor video surveillance." Journal of Visual Communication and Image
Representation 26: 168-181.
14. Viola, Paul, and Michael J. Jones. 2004. "Robust real-time face detection." International journal of computer
vision 57, no. 2: 137-154.
15. Viola, Paul, Michael J. Jones, and Daniel Snow. 2005. "Detecting pedestrians using patterns of motion and
appearance." International Journal of Computer Vision 63, no. 2: 153-161.
16. Jones, Michael J., and Daniel Snow. 2008. "Pedestrian detection using boosted features over many frames."
In Pattern Recognition. 19th International Conference, pp. 1-4. IEEE.
17. PETS dataset. http://www.cvg.cs.rdg.ac.uk/pets2001/pets2001-dataset.html.
18. Dalal, Navneet, Bill Triggs, and Cordelia Schmid. 2006. "Human detection using oriented histograms of flow
and appearance." Computer Vision–ECCV: 428-441.
19. Wojek, Christian, Stefan Walk, and Bernt Schiele. 2009. "Multi-cue onboard pedestrian detection."
In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 794-801.
20. Mukherjee, Snehasis, and Dipti Prasad Mukherjee. 2015. "A motion-based approach to detect persons in low-
resolution video." Multimedia Tools and Applications 74, no. 21: 9475-9490.
21. Hua, C., Y. Makihara, Y. Yagi, S. Iwasaki, K. Miyagawa, and B. Li. 2015. "Onboard monocular pedestrian detection by
combining spatio-temporal HOG with structure from motion algorithm." Machine Vision and Applications
26, no. 2-3: 161-183.
22. Dollár, Piotr, Vincent Rabaud, Garrison Cottrell, and Serge Belongie. 2005. "Behavior recognition via sparse
spatio-temporal features." In Visual Surveillance and Performance Evaluation of Tracking and Surveillance.
2nd Joint IEEE International Workshop on, pp. 65-72.
23. Klaser, Alexander, Marcin Marszałek, and Cordelia Schmid. 2008. "A spatio-temporal descriptor based on
3D-gradients." In BMVC 19th British Machine Vision Conference, pp. 275-1.
24. Narayanaswami, Ranga, Anastasia Tyurina, David Diel, Raman K. Mehra, and Janice M. Chinn. 2011.
"Discrimination and tracking of dismounts using low-resolution aerial video sequences." In SPIE Optical
Engineering + Applications, pp. 81370H-81370H.
25. Hu, Hai-Miao, Xiaowei Zhang, Wan Zhang, and Bo Li. 2015. "Joint global–local information pedestrian detection
algorithm for outdoor video surveillance." Journal of Visual Communication and Image Representation
26: 168-181.
26. Oreifej, Omar, Ramin Mehran, and Mubarak Shah. 2010. "Human identity recognition in aerial images."
In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 709-716.
27. Gaszczak, Anna, Toby P. Breckon, and Jiwan Han. 2011. "Real-time people and vehicle detection from UAV
imagery." In Proc. SPIE Conference Intelligent Robots and Computer Vision XXVIII: Algorithms and
Techniques, volume 7878, doi: 10.1117/12.876663
28. Reilly, Vladimir, Berkan Solmaz, and Mubarak Shah. 2010. "Geometric constraints for human detection in
aerial imagery." Proc. European Conf. Computer Vision (ECCV), pp. 252-265.
29. Andriluka, Mykhaylo, Stefan Roth, and Bernt Schiele. 2008. "People-tracking-by-detection and people-
detection-by-tracking." In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 1-8.
30. Basharat, Arslan, Matt Turek, Yiliang Xu, Chuck Atkins, David Stoup, Keith Fieldhouse, Paul Tunison, and
Anthony Hoogs. 2014. "Real-time multi-target tracking in wide area motion imagery." In Applications of
Computer Vision (WACV), IEEE Winter Conference on, pp. 839-846.
31. Leibe, Bastian, Aleš Leonardis, and Bernt Schiele. 2008. "Robust object detection with interleaved
categorization and segmentation." International journal of computer vision 77, no. 1-3: 259-289.
32. Lowe, David G. 2004. "Distinctive image features from scale-invariant keypoints." International journal of
computer vision 60, no. 2: 91-110.
33. OSU-SVM Toolbox for MATLAB. Last update: 2009-07-17. http://sourceforge.net/projects/svm.
34. Chang, Chih-Chung, and Chih-Jen Lin. 2011. "LIBSVM: a library for support vector machines." ACM
Transactions on Intelligent Systems and Technology (TIST) 2, no. 3: 27.
35. Oh, Sangmin, Anthony Hoogs, Amitha Perera, Naresh Cuntoor, Chia-Chih Chen, Jong Taek Lee, Saurajit
Mukherjee et al. 2011. "A large-scale benchmark dataset for event recognition in surveillance video."
In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 3153-3160.
36. UCF Lockheed-Martin UAV Dataset. 2009. Retrieved from http://vision.eecs.ucf.edu/aerial/index.html.
37. Sager, Hisham, and William Hoff. 2014. "Pedestrian detection in low resolution videos." In IEEE Winter
Conference on Applications of Computer Vision, pp. 668-673. IEEE.
38. Wang, Shiguang, Jian Cheng, Haijun Liu, and Ming Tang. 2018. "PCN: Part and context information for
pedestrian detection with CNNs." arXiv preprint arXiv:1804.04483.
39. Chen, Zhichang, Li Zhang, Abdul Mateen Khattak, Wanlin Gao, and Minjuan Wang. 2019. "Deep Feature Fusion
by Competitive Attention for Pedestrian Detection." IEEE Access.
40. Li, Zhaoqing, Zhenxue Chen, QM Jonathan Wu, and Chengyun Liu. 2019. "Pedestrian detection via deep
segmentation and context network." Neural Computing and Applications: 1-13.
41. Zhang, Liliang, Liang Lin, Xiaodan Liang, and Kaiming He. 2016. "Is Faster R-CNN doing well for pedestrian
detection?" In European Conference on Computer Vision, pp. 443-457. Springer, Cham.
42. Li, Chengyang, Dan Song, Ruofeng Tong, and Min Tang. 2019. "Illumination-aware faster R-CNN for robust
multispectral pedestrian detection." Pattern Recognition 85: 161-171.

Figure 1 shows an example of a pedestrian at four different resolution levels. As described in Section 2, existing algorithms for pedestrian detection do fairly well for high resolution images, but performance degrades dramatically when the height of pedestrians is 30 pixels or less. In addition to the small size of the pedestrians in the images, the problem of detecting pedestrians in low resolution video can be challenging for other reasons.
There can be a wide range of poses and appearances, including a variety of clothing. The lighting can vary and shadows can be present. Background clutter can have a similar appearance to pedestrians. Pedestrians can be partially occluded by other objects, or by other pedestrians.

Figure 1: An example pedestrian at four different resolution levels. The height of the pedestrian is 140, 50, 20, and 10 pixels. (Left) Images at actual size. (Right) The same images, stretched for visualization.

The problem is even more challenging in aerial videos. The effective resolution of the video is often degraded due to motion blur and haze, further reducing the available visual information on shape and appearance. In these scenarios, video stabilization is often used to compensate for camera motion, in order to help find moving objects in the scene. However, stabilization is imperfect, especially in the case of rapidly moving cameras. As a result, many false regions can be identified as moving objects. Another significant challenge is that the camera moves around frequently and does not dwell for long on a particular portion of the scene. Thus, algorithms that rely on a long sequence of observations to build up a motion track model may not be applicable.

As an example, Figure 2 shows snapshots from a video taken from a small quadrotor UAV flying rapidly over a field [1]. The camera moves erratically, undergoing large amplitude rotations and translations. As a result, people are usually only within the field of view for a short time (as briefly as several seconds). Although the size of people varies due to the changing altitude of the camera, the height of people is often as small as 20 pixels. Detecting people in these scenarios is extremely difficult, due to the low resolution.

Figure 2: Example images from aerial video (4 seconds apart). The size of the images is 380×640 pixels.

Although it is very difficult to recognize a person in a single low resolution image, the task is much easier when a short sequence of images is used. For example, Figure 3 shows a single low resolution frame in which it is difficult to recognize the object. The right portion of the figure is a sequence of frames in which a subject is performing a recognizable movement; i.e., walking. Despite the deficiency of recognizable features in the static image, the movement can be easily recognized when the sequence is put in motion on a screen. This phenomenon is well known from the pioneering work of Johansson [2], whose moving light display experiments showed convincingly that the human visual system can easily recognize people in image sequences, even though the images contained only a few bright spots (attached to the subjects' joints). A static image of spots is meaningless to observers, while a sequence of images creates a vivid perception of a person walking.

The most common and successful existing approaches for pedestrian detection, such as the Dalal detector [3], use histogram of oriented gradient (HOG) features with a support vector machine (SVM) classifier.
However, these approaches perform poorly when the height of pedestrians is 30 pixels or less [4; 37].
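As a concrete point of reference, this baseline can be sketched in a few lines using the pretrained 64×128-pixel pedestrian model that ships with OpenCV. This is an illustration of the generic Dalal-style pipeline, not the implementation evaluated in this paper, and the input file name is hypothetical.

```python
# Sketch of the standard single-frame HOG + SVM pedestrian detector
# (Dalal-Triggs style), using OpenCV's bundled pretrained model.
import cv2

hog = cv2.HOGDescriptor()  # default 64x128-pixel detection window
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("frame_0001.png")     # hypothetical input frame
boxes, weights = hog.detectMultiScale(
    frame,
    winStride=(4, 4),                    # sliding-window step
    padding=(8, 8),
    scale=1.05)                          # image-pyramid scale factor
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 1)
```

Because the detection window is 64×128 pixels, a 20-pixel-tall person must be upsampled by roughly a factor of six before it even fills the window, which mainly magnifies blur and noise; this gives some intuition for the degradation reported at 30 pixels and below.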
Figure 3: (Left) A single frame. (Right) A sequence of frames from a video of a walking person.

To compensate for the lack of information in a single frame, we propose a novel detection method that recognizes pedestrians in a short sequence of frames. Namely, we take the single-frame HOG-based detector and extend it to multiple frames. Our approach (Section 3) uses HOG features extracted from a spatiotemporal volume of images. We use a volume composed of up to 32 "slices", where each slice is a small sub-image of 32×32 pixels. This volume represents a duration of about one second or less. The idea is that the motion of a walking person is distinctive, and we can train a classifier to recognize the temporal sequence of feature vectors within the volume (a code sketch of this descriptor is given below, after the discussion of the single-frame HOG + SVM baseline). As an example, consider the images of a walking person shown in the top row of Figure 4. The corresponding HOG features are shown in the second row. The third row shows images of a moving car. The sequence of corresponding HOG features (bottom row) of the negative example is visually quite different from that of the positive example.

The main contribution of this work is the development of a novel multi-frame HOG-based pedestrian detector that is able to detect pedestrians in video at lower resolutions than has been reported in previous work. Our detector achieves significantly better accuracy than existing detectors on challenging video datasets.

The rest of this paper is organized as follows. We discuss related work in Section 2. Our pedestrian detection method is presented in Section 3. A detailed description of experimental results is presented in Section 4. Section 5 summarizes conclusions and future work.

Figure 4: (a) Positive example (pedestrian). (b) Negative example (part of a car passing by a post).

II. RELATED WORK

There is an extensive body of literature on people detection, although there is less work on pedestrian detection in low-resolution videos, and relatively little work on pedestrian detection in aerial videos. Comprehensive reviews can be found in [4; 5; 6; 7]. Most work focuses on pedestrian detection in single high-resolution images. Instead of an explicit model, an implicit representation is learned from examples, using machine learning techniques. These approaches typically extract features from the image and then apply a classifier to decide if the image contains a person. Typically, the detection system is applied to sub-images over the entire image, using a sliding window approach. A multi-scale approach can be used to handle different sizes of the person in the window. Alternatively, the detection system can be preceded by a region-of-interest selector, which generates initial object hypotheses using some simple and fast tests. Then the full person detection system is applied to the candidate windows.

The most common and successful approaches for single-frame pedestrian detection use gradient-based features. The Dalal-Triggs detector [3] used histogram of oriented gradient (HOG) features with a support vector machine (SVM) classifier.
A model for the shape of a person is learned from many training examples. The HOG + SVM approach is still considered a competitive baseline in pedestrian detection.
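The multi-frame extension described in Section 1 can be sketched as follows: HOG features are computed for each 32×32 slice of the spatio-temporal volume, concatenated in temporal order, and the resulting sequence descriptor is classified with an SVM. The slice size and volume length follow the text; the HOG cell/block parameters, the linear SVM, and the variable names are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of a multi-frame HOG descriptor: per-slice HOG features over a
# spatio-temporal volume, concatenated in temporal order, then
# classified with a linear SVM.
import cv2
import numpy as np
from sklearn.svm import LinearSVC

# 32x32 window, 16x16 blocks, 8x8 stride and cells, 9 orientation bins
# -> 324 floats per slice (assumed parameters).
hog = cv2.HOGDescriptor((32, 32), (16, 16), (8, 8), (8, 8), 9)

def volume_descriptor(slices):
    """slices: up to 32 grayscale uint8 sub-images of 32x32 pixels,
    one per frame, centered on the candidate moving object."""
    feats = [hog.compute(s).ravel() for s in slices]  # HOG per slice
    return np.concatenate(feats)  # ordered temporal sequence of HOG vectors

# Hypothetical training on labeled volumes (pedestrian vs. non-pedestrian):
# X = np.stack([volume_descriptor(v) for v in train_volumes])
# clf = LinearSVC().fit(X, train_labels)
# score = clf.decision_function([volume_descriptor(test_volume)])
```

Concatenating in temporal order preserves the phase of the gait across slices, so the classifier can learn the progression of the walking motion rather than just its time-averaged appearance.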
Although the HOG + SVM approach has excellent performance in high resolution images, studies [4] have shown that performance degrades dramatically when the height of pedestrians is 30 pixels or less. A variation on this approach is to use deformable part models for detection [9]. The part models are related to a root model using a set of deformation models representing the expected locations of body parts. Although this approach can handle a wider variety of poses, Park et al. [10] found that the part-based model is not useful for pedestrian heights of less than 90 pixels. The same applies to some recent approaches such as deep convolutional neural networks (CNNs), which have been widely adopted for pedestrian detection [38] and achieve state-of-the-art performance, but not for low-resolution applications. Recently, a wave of deep CNN-based pedestrian detectors has achieved good performance on several high-quality pedestrian benchmarks [41; 42], but this does not carry over to the low resolution applications targeted by our work.

Contextual information can improve recognition, since in traffic scenes pedestrians are often around vehicles [12; 13; 39]. The approach of [40] proposes a segmentation and context network (SCN) structure that combines segmentation and context information to improve the accuracy of pedestrian detection. Our work does not use contextual information, since we wanted to make our approach more general and not limit our domain to traffic scenes.

Other approaches use features that are similar to Haar wavelets [14; 15; 16]. Viola and Jones [14] popularized this approach and showed its applicability to face detection. The features are differences of rectangular regions in the images. These are simple and very fast to compute. Although each feature is not very discriminatory, a large number of features can be chained together to achieve good performance. The method of AdaBoost is used to train the classifier and select features. In [15], Viola and Jones use Haar-like wavelets to compute features in pairs of successive images for pedestrian detection. Jones and Snow [16] extended this algorithm to make use of 10 images in a sequence. This algorithm is the closest one to our approach, since it uses a relatively long sequence. They used two types of Haar-like features: features applied within each frame, and differences of features between two different frames. On the PETS2001 dataset [17], their detector achieves a detection rate from 84% to 93% at a false positive rate of 10⁻⁶. They were able to detect pedestrians down to a size of 20 pixels tall, in videos taken from stationary cameras. To get better performance, one might try to extend the Jones and Snow method to work on longer sequences of images. However, in this case the number of potential Haar-like features grows to an unmanageable amount. Because of the large number of feature hypotheses that must be examined at each stage, training can be quite slow (on the order of weeks).
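The two feature types just described can be illustrated with integral images, which reduce any rectangle sum to four array lookups; the rectangle layout below is a hypothetical example, not Jones and Snow's actual feature pool.

```python
# Illustration of within-frame and between-frame Haar-like rectangle
# features, computed from integral images.
import numpy as np

def integral(img):
    # ii[y, x] = sum of img[0..y, 0..x]
    return np.cumsum(np.cumsum(img.astype(np.float64), axis=0), axis=1)

def rect_sum(ii, x, y, w, h):
    # Sum of the w x h rectangle with top-left corner (x, y).
    a = ii[y + h - 1, x + w - 1]
    b = ii[y - 1, x + w - 1] if y > 0 else 0.0
    c = ii[y + h - 1, x - 1] if x > 0 else 0.0
    d = ii[y - 1, x - 1] if x > 0 and y > 0 else 0.0
    return a - b - c + d

def within_frame_feature(ii, x, y, w, h):
    # Difference of two adjacent rectangles in one frame (edge-like cue).
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)

def between_frame_feature(ii_a, ii_b, x, y, w, h):
    # Same rectangle compared across two frames (motion cue).
    return rect_sum(ii_a, x, y, w, h) - rect_sum(ii_b, x, y, w, h)
```

With T frames, the pool of between-frame features grows with the number of frame pairs (on the order of T²) times the number of candidate rectangles, which is why exhaustive boosting over long sequences becomes impractical, as noted above.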
Other approaches also use the additional information provided by image sequences to improve detection. For example, [8] uses a two-stage classifier in which the detection scores from previous images improve the classification in the current image. Optical flow information can be incorporated into a feature vector along with image gradient information [18; 19]. In [20], gradient-weighted optical flow is computed from the first frame of the sequence to detect objects (a face or a person); it is then convolved with the gradient magnitude for further tracking. Other work [11; 22; 23] extracts spatiotemporal gradient features from the spatiotemporal volume of images. These methods were developed to recognize actions in videos. Conceivably these approaches could also be used to detect pedestrians; however, in low resolution image sequences it would be difficult to extract local features, since the volume is so small. The work of [21] proposed a 3DHOG descriptor that characterizes motion features using co-occurrence spatio-temporal vectors. To increase discrimination, the HOG-HOF (HOG plus Histogram of Optical Flow) and STHOG (spatio-temporal HOG) descriptors have been proposed, at the price of very high computational cost. Optical flow-based features appear to help at high resolution [4], but in low-resolution scenarios detection results are poor due to noise, camera jitter, and the limited number of pixels available.
There is relatively little work focused on pedestrian detection in low resolution aerial videos. Most previous work using aerial video performs motion compensation by registering each image to a reference image. In this way a short-term background image can be computed, which can be used to detect foreground objects using image subtraction (e.g., [24]). The work of [25] applied a joint global–local information algorithm to suppress background interference and enrich the description of pedestrians. It is based on extracting features from human body parts, which are not available in low resolution applications. Some approaches for pedestrian detection in aerial images use the same methods discussed above, namely HOG-like features with an SVM classifier (e.g., [26]) or Haar-like features with an AdaBoost classifier (e.g., [27]). Other approaches combine these features with additional information; for example, [28] uses shadows cast by people for classification. However, shadow information may not be reliable in low resolution videos. The only approach we found that uses image sequences for pedestrian detection in aerial videos is [24]. It computes frequency measures of sub-windows to detect the periodic motion of legs, arms, etc.; however, quantitative results were not presented. Some approaches integrate detection and tracking. For example, [29] extracts hypotheses of body articulations from images, then combines them into "tracklets" over a small number of frames, using a dynamic model of limbs and body parts.
This requires relatively high resolution images. Other approaches can detect very small pedestrians, but require a longer sequence of frames. For example, the approach of [30] can detect and track vehicles and people only a few pixels in size, but uses sequences ranging from tens of seconds to minutes long. The work of [31] checks 2D object detections for consistency with scene geometry and converts them to 3D tracks. In videos taken from a rapidly moving UAV, a pedestrian may be visible in the field of view for only a short time (a few seconds). Thus, methods that require a long sequence of images to build up a track file before performing detection are not applicable. In contrast, our system (described in the next section) requires only a short sequence of images and is applicable to situations where the pedestrian is visible only briefly.
III. PEDESTRIAN DETECTION METHOD
Figure 5 shows the architecture of the system, which contains two phases: a training phase and a detection phase. In the training phase, positive training examples (i.e., volumes containing pedestrians) and negative training examples (i.e., volumes not containing pedestrians) are created, and features are extracted from these volumes. In the detection phase, the binary classifier constructed during training is scanned over detected ROIs in sequences of unseen test images to search for pedestrians.
3.1 Video Stabilization
In the case of aerial video, where the images are taken from a moving camera, stabilization is applied to short overlapping sequences of 32 frames. We start with the first frame of each sequence and use it as the reference frame; the remaining frames are registered to it. The result is overlapping groups of 32 co-registered frames. This aids the next step, which is to detect ROIs containing potential moving objects. For videos taken from stationary cameras, the stabilization step can be skipped, since the images are already co-registered. We assume that the camera is looking down at the ground, which is approximately a planar surface; thus, a homography (projective) transform describes the relationship between any two images. To compute the transform, Harris corner interest points are matched between the reference image and each subsequent image. We fit a homography to the matched points, using RANSAC to eliminate outliers, and then apply the transform to align the current image to the reference image (Figure 6). The assumption of a planar surface is only an approximation, although it is usually good if the camera is high above the ground. However, any objects (such as buildings and trees) above the plane will be misregistered, and may produce ROIs that do not correspond to actual moving objects. Our classifier subsequently filters these out, since the motion patterns within these ROIs do not match those of a walking person.
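The following sketch illustrates this stabilization step in Python with OpenCV; the paper does not name its implementation, so the library choice and the use of Lucas-Kanade tracking to match the Harris corners between frames are our assumptions.

```python
import cv2
import numpy as np

def stabilize_sequence(frames):
    """Register a short sequence of grayscale frames to its first frame.

    Illustrative sketch of Section 3.1: Harris corners in the reference
    frame are tracked into each later frame (LK tracking is an assumption;
    the paper only states that corners are matched), then a homography
    fitted with RANSAC warps the frame onto the reference.
    """
    ref = frames[0]
    h, w = ref.shape
    # Harris-based corner detection in the reference frame.
    pts_ref = cv2.goodFeaturesToTrack(ref, maxCorners=500, qualityLevel=0.01,
                                      minDistance=7, useHarrisDetector=True)
    registered = [ref]
    for frame in frames[1:]:
        # Match corners by tracking them into the current frame.
        pts_cur, status, _ = cv2.calcOpticalFlowPyrLK(ref, frame, pts_ref, None)
        good = status.ravel() == 1
        # Fit a projective transform, discarding outlier matches with RANSAC.
        H, _ = cv2.findHomography(pts_cur[good], pts_ref[good], cv2.RANSAC, 3.0)
        registered.append(cv2.warpPerspective(frame, H, (w, h)))
    return registered
```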
3.2 ROI Detection
After stabilization, background subtraction is used to identify potential moving objects in the scene for subsequent analysis. A background model for each group of 32 frames is constructed by computing the mean of the frames. Then the difference between the middle frame of the sequence and the background model is computed.
Figure 5: The architecture of our overall pedestrian detection system.
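A minimal sketch of this ROI detector, assuming OpenCV; the threshold, structuring-element, and connected-component parameters are taken from the ranges reported in the continuation of this section below.

```python
import cv2
import numpy as np

def detect_rois(frames, thresh=20, disk_radius=4, min_area=20, max_area=500):
    """Sketch of Section 3.2: mean-background subtraction on a registered
    32-frame group, followed by thresholding, morphology, and
    connected-component filtering. Parameter defaults mirror the reported
    ranges (threshold 15-25, disk radius >= 4, area 20-500 px)."""
    stack = np.stack(frames).astype(np.float32)
    background = stack.mean(axis=0)
    middle = stack[len(frames) // 2]
    # Threshold the absolute difference between middle frame and background.
    fg = (np.abs(middle - background) > thresh).astype(np.uint8)
    # Disk-shaped structuring element; opening removes small specks,
    # closing fills small holes.
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                   (2 * disk_radius + 1, 2 * disk_radius + 1))
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, se)
    fg = cv2.morphologyEx(fg, cv2.MORPH_CLOSE, se)
    # Keep connected components whose area lies in [min_area, max_area].
    n, labels, stats, _ = cv2.connectedComponentsWithStats(fg)
    rois = [tuple(stats[i, :4]) for i in range(1, n)
            if min_area <= stats[i, cv2.CC_STAT_AREA] <= max_area]
    return rois  # list of (x, y, width, height) bounding boxes
```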
Initial foreground pixels are identified by thresholding the absolute value of the difference image. In our work, the threshold was set empirically to 15-25 (depending on the image sequence). Morphological operations are applied to eliminate small regions and join broken regions: the opening operation removes small objects from the foreground, placing them in the background, while the closing operation removes small holes in the foreground. A disk-shaped structuring element with a radius of 4 or more pixels is used. Connected components whose area lies between 20 and 500 pixels are extracted, and their bounding boxes become the final ROIs. An example of an image containing the final detected ROIs is shown in Figure 7. Although simple in design, the ROI detector performs well at detecting potential moving objects. We deliberately tuned it to be very sensitive: in the VIRAT aerial dataset (described in Section 4), only 49 out of 1,607 actual moving objects were not detected, and in the PETS 2001 dataset, only 88 out of 1,929. Of course, the ROI detector also occasionally detects non-moving objects, due to image noise and misregistration. The classifier subsequently filters out the non-moving objects (as well as non-pedestrians).
Figure 6: (Left) Reference frame (first of a sequence of 32 frames). (Right) The 16th frame in the sequence, registered to the first.
Figure 7: An example of ROIs shown on: (Left) the binary image; (Right) the registered image.
3.3 Formation of Spatiotemporal Volumes
A sliding window of size 32×32 pixels is scanned within each ROI detected above. At each position, a spatiotemporal volume is created by extracting a sequence of sub-images (slices) at a fixed position in the registered images, for N frames (we used up to 32 frames). The 32×32 pixel slice window is large enough that a pedestrian remains within the window throughout the sequence at normal walking speeds, which usually correspond to about ½ pixel per frame. Since our detector is trained to detect pedestrians approximately 20 pixels tall, this allows a border of about 6 pixels above and below the person (Figure 8). To handle variations in scale, we extract volumes at multiple scales by creating a pyramid of images of different sizes, with a scale factor of 0.75 between levels (for a total of 6 pyramid levels). This allows us to detect people taller than 20 pixels: the detector fires at the pyramid level where the person's height is about 20 pixels.
3.4 Feature Extraction, Normalization, and Dimensionality Reduction
HOG features are then extracted from each of the slices that make up the volume. We divide each 32×32 pixel slice into square cells (typically 4×4 pixels each) and compute a histogram of gradient directions in each cell. We use 9 bins for the gradient directions, representing unsigned directions from 0° to 180° (a sketch is given below).
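A sketch of the per-slice cell histogram computation just described, written directly in NumPy so the bin layout is explicit; block grouping and normalization are applied later, across the volume.

```python
import numpy as np

def cell_histograms(slice_img, cell=4, bins=9):
    """Per-slice HOG cell histograms (Section 3.4 sketch): 4x4-pixel
    cells, 9 unsigned orientation bins over 0-180 degrees."""
    img = slice_img.astype(np.float32)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180); bin width = 180 / bins degrees.
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    n = img.shape[0] // cell  # 8 cells per side for a 32x32 slice
    hist = np.zeros((n, n, bins), np.float32)
    for i in range(n):
        for j in range(n):
            sl = (slice(i * cell, (i + 1) * cell),
                  slice(j * cell, (j + 1) * cell))
            # Magnitude-weighted vote of each pixel into its orientation bin.
            np.add.at(hist[i, j], bin_idx[sl].ravel(), mag[sl].ravel())
    return hist  # shape (8, 8, 9) for a 32x32 slice
```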
Following the method of [3], cells are grouped into (possibly overlapping) blocks, where each block consists of 2×2 cells. The features from each slice are then concatenated into a single large vector. Variations in illumination affect the magnitudes of the gradients. The influence of large gradient magnitudes can be reduced by normalization, which can be performed in input space or in feature space. We found that normalization in input space has little or no effect on performance, and sometimes decreases it, so normalization is performed in feature space. Following the method of [32], we normalize the volumetric blocks using the L2-norm followed by clipping to limit the maximum values (Lowe-style clipped L2-norm).
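A sketch of this clipped L2 normalization; the clipping threshold of 0.2 follows Lowe [32], since the paper does not state the exact value used.

```python
import numpy as np

def clipped_l2_normalize(block_features, clip=0.2, eps=1e-8):
    """Lowe-style clipped L2 normalization (Section 3.4 sketch).
    `block_features` concatenates one 2x2-cell block across all N slices
    of the volume. The clip value 0.2 is an assumption taken from Lowe's
    SIFT paper [32]."""
    v = block_features / (np.linalg.norm(block_features) + eps)
    v = np.minimum(v, clip)               # limit influence of large gradients
    return v / (np.linalg.norm(v) + eps)  # renormalize after clipping
```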
Figure 8: Spatiotemporal volume. (Left) Positive example. (Middle) Gradient. (Right) Computed HOG, with a volumetric block (shown in red) and a cell (shown in yellow).
The difference is that in our algorithm the features are normalized within each volumetric block, i.e., over the sequence of blocks at the same place in each of the N slices (e.g., 16 slices), as shown in Figure 9. Next, the features from the volumetric block in all slices are concatenated into a single feature vector. The result of the volumetric normalization step is a set of feature vectors that are more invariant to changes in illumination or shadowing. Using lower-dimensional features produces models with fewer parameters, which speeds up the training and detection algorithms. Following the work of [9], we apply Principal Components Analysis (PCA) to the feature vectors to reduce their dimensionality. In the learning stage, we collect a large number of 36-dimensional HOG features corresponding to blocks and perform PCA on them. The eigenvalues indicate that the linear subspace spanned by the top 50% of the eigenvectors captures the essential information in the features.
Figure 9: (a) Volumetric block. (b) Spatiotemporal volume of 16 slices.
3.5 Classification
To search for pedestrians, we apply a classifier to each spatiotemporal volume within the detected ROIs. We first train a support vector machine (SVM) on examples of positive (pedestrian) and negative (non-pedestrian) feature vectors. Positive examples were extracted from the training videos as follows. A pedestrian was manually selected in one of the images (in one of the detected ROIs), and a square sub-window surrounding the pedestrian was extracted from the image. This sub-window was scaled such that the person was 20 pixels tall and the sub-window was 32×32 pixels. Next, a sequence of sub-windows was extracted from the registered images, half of them preceding and half of them following the central image, at the same fixed place in all images, and scaled in the same way. A total of 32 such slices were assembled into a spatiotemporal volume, representing a single positive example (Figure 10). Negative examples were extracted from the training images in the same way: spatiotemporal volumes of the same size as the positive examples, but sampled randomly from completely person-free areas of the detected ROIs. We used a freely available SVM-based classifier, the OSU-SVM MATLAB toolbox [33], which is based on LIBSVM [34]; a sketch of the training pipeline is given below.
Figure 10: Four different training example sequences. Top: two sequences from VIRAT [35]. Bottom: sequences from UCF-2009 [36]. Each sequence contains 8 slices subsampled from a sequence of 32 slices.
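A sketch of the training stage, with scikit-learn standing in for the OSU-SVM/LIBSVM toolbox actually used; the feature layout and the SVM regularization constant are our assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

def train_detector(X, y, keep_fraction=0.5):
    """Sketch of Sections 3.4-3.5. X has one row per training volume,
    formed by concatenating clipped-L2-normalized volumetric block
    features (so each row length is a multiple of 36); y is 1 for
    pedestrian volumes and 0 otherwise."""
    n_block = 36  # 2x2 cells x 9 bins per block, as in the text
    # PCA on the pool of 36-dimensional block features; keep the top
    # 50% of eigenvectors (the fraction reported in the paper).
    blocks = X.reshape(-1, n_block)
    pca = PCA(n_components=int(n_block * keep_fraction)).fit(blocks)
    X_reduced = pca.transform(blocks).reshape(X.shape[0], -1)
    # Linear kernel, as in the paper's baseline; C=1.0 is an assumed default.
    svm = LinearSVC(C=1.0).fit(X_reduced, y)
    return pca, svm
```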
K-fold cross-validation is used for parameter selection: the training data is partitioned into 5 equally sized segments, and iterations of training and validation are performed to pick the best parameters for the SVM kernels. We experimented with two kernels, a linear kernel and a radial basis function kernel. Although the non-linear kernel gives slightly more accurate results, for simplicity and speed we use the linear kernel as the baseline classifier throughout this study. Finally, we perform non-maximum suppression on the detections: if the bounding boxes of two or more detections overlap by more than 50%, they are merged into one detection by averaging their top-left coordinates (a sketch is given below).
IV. DATASETS, EXPERIMENTS, AND RESULTS
The results in this section use the following default parameters: blocks of 2×2 cells with no overlap, each cell consisting of 4×4 pixels; 9 bins for gradient orientations; volumetric blocks normalized using a clipped L2-norm; image slices of 32×32 pixels; and a linear SVM classifier. In addition to evaluating our algorithm, we compare our results to two other algorithms: the Dalal-Triggs algorithm [3], which is among the most popular approaches for single-frame pedestrian detection, and the Jones and Snow algorithm [16], which was the best performing algorithm on low resolution pedestrians that we found. If we limit our algorithm to a single image, it is essentially the same as the Dalal-Triggs algorithm; therefore, we can directly compare the performance of our algorithm to that of the Dalal-Triggs algorithm on each of the datasets. Although we did not have an implementation of the Jones and Snow algorithm, they report performance results on one of the datasets that we used, so we can compare our algorithm to theirs on that dataset. For evaluation and comparison we use the following standard metrics, based on the possible outcomes of True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The "Detection Rate" (DR), or True Positive Rate (TPR), measures how accurately the classifier senses pedestrians. It is the proportion of positive examples (pedestrians) that were correctly identified:
DR = TP / (TP + FN) = (positives correctly classified) / (total positives).
The "False Positive Rate" (FPR) is the proportion of negative examples (non-pedestrians) that were incorrectly classified as positives (pedestrians). It is also known as the false positives per window (FPPW) rate:
FPR = FP / (FP + TN) = (negatives incorrectly classified) / (total negatives).
A "Receiver Operating Characteristic" (ROC) curve depicts the tradeoff between the hit rate (DR) and the false alarm rate of a classifier as parameters of the algorithm are varied.
4.1 Datasets
For evaluation of the proposed approach, we used six datasets that are representative of our application. Five are publicly available: two stationary camera datasets (PETS 2001 [17], VIRAT Public 1.0 [35]) and three aerial datasets (VIRAT Fort AP Hill [35], UCF-2009 [36], UCF-2007 [36]).
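A sketch of the merging rule described above for non-maximum suppression; measuring overlap relative to the fixed window area is our assumption, since the paper does not define the overlap measure precisely.

```python
def merge_overlapping(dets, thresh=0.5):
    """Sketch of the Section 3.5 merging rule: detections whose boxes
    overlap by more than 50% are fused by averaging their top-left
    corners. `dets` is a list of (x, y, w, h) boxes with a fixed
    window size."""
    merged, used = [], [False] * len(dets)
    for i, (x, y, w, h) in enumerate(dets):
        if used[i]:
            continue
        group = [(x, y)]
        for j in range(i + 1, len(dets)):
            xj, yj, wj, hj = dets[j]
            # Intersection of the two boxes.
            ix = max(0, min(x + w, xj + wj) - max(x, xj))
            iy = max(0, min(y + h, yj + hj) - max(y, yj))
            if not used[j] and ix * iy > thresh * w * h:
                group.append((xj, yj))
                used[j] = True
        # Average the top-left coordinates of the grouped detections.
        mx = sum(g[0] for g in group) / len(group)
        my = sum(g[1] for g in group) / len(group)
        merged.append((mx, my, w, h))
    return merged
```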
In addition, we created our own set of aerial video sequences of natural scenes, designed to be realistic: they simulate search and rescue and border monitoring scenarios. In the stationary datasets, videos were collected from a stationary surveillance camera, and no video stabilization is needed. All images were converted from color to grayscale, since color information was not used during feature extraction; grayscale images were also used during image registration. All of these datasets are low resolution; however, the height of people in some images is greater than 20 pixels. Although our detector was designed to detect people 20 pixels tall, it can still detect these larger pedestrians: since an image pyramid is used, at some level of the pyramid the people will be close to 20 pixels in height and can be detected by the algorithm.
4.1.1 Stationary Datasets
The video sequences were taken by stationary cameras at the top of tall buildings, recording large numbers of event instances across a very wide area while avoiding occlusion as much as possible. The cameras look down upon a scene containing streets with buildings, trees, and parking lots; cars and pedestrians periodically move through the scene. The PETS 2001 dataset [17] is popular in automated surveillance research. It was also used by Jones and Snow to evaluate their algorithm [16] (to which we compare). It contains 16 video sequences, each about two to four minutes long, with a frame rate of 25 frames/second and a frame size of 768×576 pixels. Half the videos are designated for training and half for testing. For this dataset, we extracted 2,560 training examples from the training videos: 960 positive and 1,600 negative. The second stationary camera dataset is the stationary VIRAT Public 1.0 dataset [35], with a frame rate of 30 frames/second and a frame size of 1280×720 pixels. For this dataset, the training set consisted of 1,760 examples: 780 positive and 980 negative.
4.1.2 Aerial Datasets
The VIRAT Fort AP Hill aerial dataset [35] was recorded with an electro-optical sensor from a military aircraft flying at altitudes up to 1,000 meters. The resolution of these aerial videos is 640×480 at a 30 Hz frame rate, and the typical pixel height of people in the collection is about 20 pixels. The videos include buildings and parking lots where people and vehicles are engaged in different activities. The data is challenging in terms of low resolution, uncontrolled background clutter, diversity of scenes, changing viewpoints, changing illumination, and low image sharpness. For this dataset, the training set consisted of 1,280 positive and 1,280 negative examples. The UCF-2009 (also known as UCF-LM) dataset [36] was obtained using an R/C-controlled blimp with a gimbal-mounted camera, flown at altitudes of 400-450 feet over a dirt parking lot near the football stadium in Florida. Actions were performed by different actors. The UCF-2009 dataset has a resolution of 540×960 pixels at a 23 Hz frame rate. For this dataset, the training set consisted of 1,000 positive and 1,000 negative volumes. The UCF-2007 dataset [36] is an earlier dataset from UCF, and is more challenging due to large variations in camera motion, rapidly changing viewpoints, changes in object appearance, pose, and scale, cluttered backgrounds, and varying illumination. In addition, it suffers from interlacing, motion blur, and poor focus. The resolution is 854×480 pixels at a 30 Hz frame rate. To remove interlacing artifacts, we subsampled the images, so the final resolution was 427×240 pixels. For this dataset, the training set consisted of 250 positive and 250 negative volumes. The set of sequences that we created ourselves was recorded using a DJI Phantom 3 quadcopter; from now on we refer to it as the DJI dataset. It consists of aerial imagery of people walking across a field, at a frame rate of 25 Hz with an image size of 720×1280 pixels. This dataset is challenging because the pedestrians are very small and cast shadows; in addition, there are variations in camera motion, changes in illumination, and some loss of image sharpness. For this dataset, the training set consisted of 780 positive and 1,260 negative volumes.
4.2 Experimental Results
Test sets were collected from portions of the videos different from the training sequences. ROIs were detected in these images, and the detector was scanned within each ROI. To illustrate the main concept, Figure 11 presents an example of applying the algorithm to one of the aerial datasets. After detecting ROIs, the classifier was applied to each volume within the ROIs. We shift volumes along the time direction every 4 frames (a sketch is given below), so a pedestrian detected in frames 1:32 and in frames 5:36 is counted as two detected instances. This helps in recovering from missed detections and increases the size of the testing set.
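A small sketch of this temporal scan; `volume_start_frames` is a hypothetical helper name, not from the paper.

```python
def volume_start_frames(n_frames, length=32, stride=4):
    """Sketch of the Section 4.2 temporal scan: volumes of `length`
    slices are shifted along the time axis every `stride` frames, so a
    pedestrian present throughout the video is counted once per window."""
    return list(range(0, n_frames - length + 1, stride))

# e.g. volume_start_frames(40) -> [0, 4, 8], i.e. frames 1:32 and 5:36
# (1-based) are two separate tested volumes.
```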
Following [9], a detection is considered correct if the area of overlap between the detection window and the ground truth window exceeds 50%. For PETS 2001, the total number of tested examples within the ROIs was 1,235 positive examples (16 slices each) and 8,730 negative examples (16 slices each). Using the detector with the default parameters, a detection rate of 94.7% was achieved at a false positive rate (FPR) of 10⁻⁶. At the same FPR, the Dalal algorithm [3], which uses single images, achieved a detection rate of 73%. On the same dataset, the Jones and Snow algorithm [16] achieved a detection rate of 93% when 8 detectors were combined. For the stationary VIRAT dataset, the total number of tested examples was 720 positive and 3,860 negative. Using the detector with the default parameters, we achieved a detection rate of 91% at an FPR of 10⁻⁶; at the same FPR, the single-frame Dalal detector achieved a DR of 70% on the same dataset. For the aerial VIRAT dataset, a total of 12,600 volumes were classified while scanning over the detected ROIs, of which 5,016 were positive and 7,584 negative. Using the detector with the default parameters, it achieved a DR of 78% at an FPR of 4×10⁻³. This FPR means that only 4 in 1,000 tested non-pedestrian volumes were falsely classified as pedestrians. At the same FPR, the single-frame Dalal detector achieved a detection rate of 39%. For UCF-2009, a total of 5,880 volumes were classified, 2,184 positive and 3,696 negative. Using the detector with parameters tuned for the best performance on this dataset, the DR is 92% at an FPR of 4×10⁻³; at the same FPR, the Dalal detector achieved a DR of 50%. For the UCF-2007 dataset, a total of 500 volumes were classified, half positive and half negative. Using the detector with the default parameters, the DR is 73% at an FPR of 4×10⁻³; at the same FPR, the Dalal detector achieved a DR of 41%. For the DJI dataset, a total of 1,820 volumes were classified while scanning over the detected ROIs, of which 740 were positive and 1,080 negative. The detector achieved a DR of 71% at an FPR of 4×10⁻³; at the same FPR, the single-frame Dalal detector achieved a detection rate of 54%. Figures 12-16 show detection examples on frames from the six datasets discussed above. Detection results are shown as boxes, where TP is "true positive", FP is "false positive", TN is "true negative", and FN is "false negative" (to avoid cluttering the figures, we do not show all the detected TNs). Figure 12(a, b) shows two frames from the aerial VIRAT dataset with TP, FP, TN, and FN results labeled. Figure 12(c) shows the sequence of slices for one of the TP detections, and Figure 12(d) shows an example of a sequence of slices for one of the TNs. TNs result from scanning the classifier around false ROIs that correspond to motion regions arising from non-pedestrian motion (e.g., vehicles) or from static objects due to imperfect stabilization.
Here an ROI was detected on a static object (the corner of a building). Since the motion pattern of this object does not match that of a pedestrian, the classifier labels it as a non-pedestrian. Corresponding results on the UCF-2009 dataset are shown in Figure 13, and examples of sequences of slices for TN detections are shown in Figure 13(c, d). The TNs shown there resulted from scanning the classifier around false ROIs corresponding to non-pedestrian motion, e.g., a car and a bicycle; the motion patterns do not match those of pedestrians, so the classifier labels them as non-pedestrians. Figure 14 shows frames from the UCF-2007 dataset with detections, and Figure 15 shows examples from the stationary VIRAT dataset and the PETS 2001 dataset. Corresponding results on the DJI dataset are shown in Figure 16, with examples of sequences of slices shown in Figure 16(c, d).
Figure 12: (a, b) Two frames from the aerial VIRAT dataset with detections. (c) Slices for one TP in (a). (d) Slices for the TN shown in (b).
Figure 13: Detections on two frames from the UCF-2009 dataset. (a) FP and TPs. (b) TNs. (c) and (d) Sequences for the TNs shown in (b). These TNs resulted from scanning the classifier around false ROIs corresponding to non-pedestrian motion, e.g., part of a moving vehicle.
The ROC curves are plotted in Figures 17-18. Figure 17 (left) shows the ROC curves of the three detectors (the Dalal detector, the Jones-Snow detector, and our multi-frame HOG detector) on the PETS 2001 dataset.
Figure 14: (a) and (b) Two frames from the UCF-2007 dataset with example detections.
Figure 17 (right) shows the ROC curves of the Dalal detector and the multi-frame HOG detector on the stationary VIRAT dataset. The multi-frame detector always outperforms the other detectors.
Figure 15: Example detections: (a) PETS 2001 dataset; (b) stationary VIRAT dataset.
Figure 16: Detections on two frames from the DJI dataset. (a) TP, TN, and FP. (b) TP. (c) and (d) Sequences for the TP and FP shown in (a).
Figure 18 shows the ROC curves of the Dalal detector and the multi-frame HOG detector on the four aerial datasets. The multi-frame detector always gives better detection rates than the Dalal detector.
4.3 Performance Study and Discussion
We hypothesized that using information from multiple images in the detector should be better than using information from only a single image (or a few images), since the motion patterns should aid recognition. We trained and tested the detector on volumes consisting of different numbers of slices per volume, ranging from 1 to 32. Using parameters tuned for the best performance, different classifiers were trained and tested; in each experiment, the same number of slices per volume was used in training and in testing. Figure 19 (top) shows the DR of each classifier on the stationary datasets at an FPR of 10⁻⁶, while Figure 19 (bottom) shows the DR of each classifier on the aerial datasets at an FPR of 4×10⁻³. The results confirm that the use of multiple images for detection dramatically improves the results. The improvement in detection rate increases with the number of slices, until a total of 16 slices is reached (corresponding to about half a second of walking).
Figure 17: (Left) ROC curves of the three detectors on the PETS 2001 dataset. (Right) ROC curves of the two detectors on the stationary VIRAT dataset.
Figure 18: ROC curves for the multi-frame HOG detector and the Dalal detector on (a) the aerial VIRAT dataset, (b) the aerial UCF-2009 dataset, (c) the aerial UCF-2007 dataset, and (d) the DJI dataset.
Figures 20-21 show the ROC curves for each dataset as the number of slices is varied. In the aerial VIRAT dataset (Figure 20), a single-slice-per-volume detector gives a DR of 40% at an FPR of 4×10⁻³; at the same FPR, the use of 16 slices per volume raises the DR to 78%. For the UCF-2009 dataset (Figure 21), using a single frame (i.e., one slice per volume) gives a DR of 50% at an FPR of 4×10⁻³; at the same FPR, the use of 16 frames improves the DR to 92%. Note that the single-slice case corresponds to the Dalal detector. We also studied the effect of cell size. Figure 22 shows the ROC for UCF-2009 as the cell size is varied; cells of 4×4 pixels perform best. As shown in Figure 23 (left), the size of a person's head, forearm, upper leg, and lower leg appears to be approximately 4×4 pixels, which may allow 4×4-pixel cells to capture the shape and motion of these parts. Figure 23 (right) gives some insight into the cues the detector uses to make its decision. It shows the weight corresponding to each element of the feature vector, i.e., the elements of w in the classifier decision function f(x) = wᵀx + b (sketched below). The weights are shown for the central slice in the volume. The figure shows that the contours of the pedestrian's head, shoulders, and lower legs have the highest weights, and thus represent the main cues for detection.
Figure 19: The effect of the number of slices per volume on DR. (Top) Stationary datasets. (Bottom) Aerial datasets.
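A sketch of how a per-cell weight map like Figure 23 (right) can be produced from a linear SVM; the feature layout is our assumption, and PCA is omitted for clarity.

```python
import numpy as np

def weight_map_for_slice(w, slice_index, n_slices, cells=(8, 8), bins=9):
    """Sketch of the Figure 23 visualization: sum the positive weights
    of a linear SVM decision function f(x) = w^T x + b over the 9
    orientation bins of each cell of one slice. Assumes the feature
    vector is laid out slice-by-slice, cell-by-cell (the paper does not
    specify the layout)."""
    per_slice = w.reshape(n_slices, cells[0], cells[1], bins)
    pos = np.maximum(per_slice[slice_index], 0.0)  # keep positive weights only
    return pos.sum(axis=-1)  # one value per cell, for display as an image
```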
Figure 20: ROC of the classifiers with different numbers of slices per volume: aerial VIRAT dataset.
Figure 21: ROC of the classifiers with different numbers of slices per volume: UCF-2009 dataset.
Figure 22: ROC curve: effect of cell size.
Figure 23: (Left) Cells of size 4×4 pixels appear to match key parts of the pedestrian's body. (Right) SVM positive weights.
4.4 Frame Randomization
To confirm our hypothesis that the classifier learns the characteristic motion of walking pedestrians, we randomized the order of frames. The expectation is that giving the classifier temporally incoherent data should reduce performance. We followed the same procedure as before to extract ROIs and form spatiotemporal volumes; however, this time we randomized the order of the slices in both the training and testing volumes. Figure 24 shows the detection rates obtained from multiple tests on the two datasets.
Figure 24: The effect of randomizing the order of frames on classification performance.
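A sketch of the randomization used in this control experiment.

```python
import numpy as np

def randomize_slices(volume, rng=None):
    """Sketch of the Section 4.4 control: permute the slices of a
    spatiotemporal volume (shape: n_slices x 32 x 32) along the time
    axis, destroying temporal coherence while keeping each slice
    intact. Applied to both training and testing volumes."""
    rng = rng or np.random.default_rng()
    return volume[rng.permutation(volume.shape[0])]
```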
The results show that the use of randomized frame sequences degraded detection rates by an average of 8% in the aerial VIRAT experiments and by an average of 12% in the UCF experiments, at an FPR of 4×10⁻³. For example, for the UCF dataset, when the sequences were used in their normal, coherent order, we obtained a detection rate of 92%; in one of the tests with randomized sequences, at the same FPR, the detection rate degraded to 80%. These experiments indicate that the classifier is indeed learning the characteristic motion of walking pedestrians.
V. CONCLUSIONS AND FUTURE WORK
We presented a method for detecting pedestrians in low-resolution videos, using a novel multiple-frame HOG-based feature approach. The method is designed to detect pedestrians with heights as small as 20 pixels. On five public datasets, including three challenging aerial datasets, plus our own dataset, the method achieves excellent results: detection rates of 78%, 92%, 73%, and 71% at a false positive rate of 4×10⁻³ on the aerial VIRAT, aerial UCF-2009, aerial UCF-2007, and aerial DJI datasets, respectively. On the stationary datasets, the method achieves detection rates of 94.7% and 91% at a false positive rate of 10⁻⁶ on the PETS 2001 and stationary VIRAT datasets, respectively.
Figure 25: Example results of detecting pedestrians in challenging UAV videos. Detections are shown at different pyramid levels.
We have also obtained excellent preliminary results on UAV videos [1] posted on YouTube (Figure 25). The detector needs only a short sequence of frames to perform detection; thus, it is applicable to situations where the camera is moving rapidly and does not dwell on the same portion of the scene for long. We studied the benefit of using multiple frames on the performance of the detector and confirmed that using additional frames improves performance significantly, up to about 16 frames. We also found that the detector learns the coherence of pedestrian motion. Future work should evaluate other classifiers to see if they perform better than the simple SVM we used. Another direction is to train the classifier to detect multiple classes, such as fast-walking, stationary, and running people. Finally, our detector could be integrated into a standard tracker, such as a multiple hypothesis tracker or a particle filter-based tracker. Even in aerial videos from rapidly moving cameras, a person is often in the field of view for multiple seconds; therefore, multiple detections can be associated into a single track to improve accuracy.
Acknowledgement
This work was partially supported by a gift from Northrop-Grumman.
REFERENCES
1. "Airsoft UAV footage", downloaded from https://www.youtube.com/watch?v=rppbsvUSpxY, August 2016.
2. Johansson, Gunnar. 1973. "Visual perception of biological motion and a model for its analysis." Attention, Perception, & Psychophysics 14, no. 2: 201-211.
3. Dalal, Navneet, and Bill Triggs. 2005. "Histograms of oriented gradients for human detection."
In Computer Vision and Pattern Recognition, IEEE Computer Society Conference, vol. 1, pp. 886-893.
4. Dollár, Piotr, Christian Wojek, Bernt Schiele, and Pietro Perona. 2012. "Pedestrian detection: An evaluation of the state of the art." Pattern Analysis and Machine Intelligence, IEEE Transactions on 34, no. 4: 743-761.
5. Zhang, Shanshan, Rodrigo Benenson, Mohamed Omran, Jan Hosang, and Bernt Schiele. 2016. "How Far are We from Solving Pedestrian Detection?" arXiv:1602.01237.
6. Enzweiler, Markus, and Dariu M. Gavrila. 2009. "Monocular pedestrian detection: Survey and experiments." Pattern Analysis and Machine Intelligence, IEEE Transactions on 31, no. 12: 2179-2195.
7. Benenson, Rodrigo, Mohamed Omran, Jan Hosang, and Bernt Schiele. 2014. "Ten years of pedestrian detection, what have we learned?" ECCV, CVRSUAD workshop.
8. González, A., D. Vázquez, S. Ramos, A. M. López, and J. Amores. 2015. "Spatiotemporal Stacked Sequential Learning for Pedestrian Detection." In Proceedings of the Iberian Conference on Pattern Recognition and Image Analysis, Spain.
9. Felzenszwalb, Pedro, David McAllester, and Deva Ramanan. 2008. "A discriminatively trained, multiscale, deformable part model." In Computer Vision and Pattern Recognition, IEEE Conference, pp. 1-8.
10. Park, Dennis, Deva Ramanan, and Charless Fowlkes. 2010. "Multiresolution Models for Object Detection." Proc. European Conf. Computer Vision (ECCV), pp. 241-254.
11. Klaser, Alexander, Marcin Marszałek, and Cordelia Schmid. 2008. "A spatio-temporal descriptor based on 3D-gradients." In BMVC 19th British Machine Vision Conference, pp. 275-1. British Machine Vision Association.
12. Yan, Junjie, Xucong Zhang, Zhen Lei, Shengcai Liao, and Stan Z. Li. 2013. "Robust multi-resolution pedestrian detection in traffic scenes." In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 3033-3040.
13. Hu, Hai-Miao, Xiaowei Zhang, Wan Zhang, and Bo Li. 2015. "Joint global–local information pedestrian detection algorithm for outdoor video surveillance." Journal of Visual Communication and Image Representation 26: 168-181.
14. Viola, Paul, and Michael J. Jones. 2004. "Robust real-time face detection." International Journal of Computer Vision 57, no. 2: 137-154.
15. Viola, Paul, Michael J. Jones, and Daniel Snow. 2005. "Detecting pedestrians using patterns of motion and appearance." International Journal of Computer Vision 63, no. 2: 153-161.
16. Jones, Michael J., and Daniel Snow. 2008. "Pedestrian detection using boosted features over many frames." In Pattern Recognition, 19th International Conference, pp. 1-4. IEEE.
17. PETS dataset. http://www.cvg.cs.rdg.ac.uk/pets2001/pets2001-dataset.html.
18. Dalal, Navneet, Bill Triggs, and Cordelia Schmid. 2006. "Human detection using oriented histograms of flow and appearance." Computer Vision–ECCV: 428-441.
19. Wojek, Christian, Stefan Walk, and Bernt Schiele. 2009. "Multi-cue onboard pedestrian detection." In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 794-801.
20. Mukherjee, Snehasis, and Dipti Prasad Mukherjee. 2015. "A motion-based approach to detect persons in low-resolution video." Multimedia Tools and Applications 74, no. 21: 9475-9490.
21. Hua, C., Y. Makihara, Y. Yagi, S. Iwasaki, K. Miyagawa, and B. Li. 2015. "Onboard monocular pedestrian detection by combining spatio-temporal HOG with structure from motion algorithm." Machine Vision and Applications 26(2-3): 161-183.
22. Dollár, Piotr, Vincent Rabaud, Garrison Cottrell, and Serge Belongie. 2005. "Behavior recognition via sparse spatio-temporal features." In Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2nd Joint IEEE International Workshop on, pp. 65-72.
23. Klaser, Alexander, Marcin Marszałek, and Cordelia Schmid. 2008. "A spatio-temporal descriptor based on 3D-gradients." In BMVC 19th British Machine Vision Conference, pp. 275-1.
24. Narayanaswami, Ranga, Anastasia Tyurina, David Diel, Raman K. Mehra, and Janice M. Chinn. 2011. "Discrimination and tracking of dismounts using low-resolution aerial video sequences." In SPIE Optical Engineering + Applications, pp. 81370H-81370H.
25.
Hu, Hai-Miao, Xiaowei Zhang, Wan Zhang, and Bo Li. 2015. "Joint global–local information pedestrian detection algorithm for outdoor video surveillance." Journal of Visual Communication and Image Representation 26: 168-181.
26. Oreifej, Omar, Ramin Mehran, and Mubarak Shah. 2010. "Human identity recognition in aerial images." In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 709-716.
27. Gaszczak, Anna, Toby P. Breckon, and Jiwan Han. 2011. "Real-time people and vehicle detection from UAV imagery." In Proc. SPIE Conference Intelligent Robots and Computer Vision XXVIII: Algorithms and Techniques, volume 7878. doi: 10.1117/12.876663.
28. Reilly, Vladimir, Berkan Solmaz, and Mubarak Shah. 2010. "Geometric constraints for human detection in aerial imagery." Proc. European Conf. Computer Vision (ECCV), pp. 252-265.
29. Andriluka, Mykhaylo, Stefan Roth, and Bernt Schiele. 2008. "People-tracking-by-detection and people-detection-by-tracking." In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 1-8.
30. Basharat, Arslan, Matt Turek, Yiliang Xu, Chuck Atkins, David Stoup, Keith Fieldhouse, Paul Tunison, and Anthony Hoogs. 2014. "Real-time multi-target tracking in wide area motion imagery." In Applications of Computer Vision (WACV), IEEE Winter Conference on, pp. 839-846.
31. Leibe, Bastian, Aleš Leonardis, and Bernt Schiele. 2008. "Robust object detection with interleaved categorization and segmentation." International Journal of Computer Vision 77, no. 1-3: 259-289.
32. Lowe, David G. 2004. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60, no. 2: 91-110.
33. OSU-SVM Toolbox for MATLAB. Last update: 2009-07-17. http://sourceforge.net/projects/svm.
34. Chang, Chih-Chung, and Chih-Jen Lin. 2013. "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology (TIST) 2, no. 3: 27.
35. Oh, Sangmin, Anthony Hoogs, Amitha Perera, Naresh Cuntoor, Chia-Chih Chen, Jong Taek Lee, Saurajit Mukherjee et al. 2011. "A large-scale benchmark dataset for event recognition in surveillance video." In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 3153-3160.
36. UCF Lockheed-Martin UAV Dataset. 2009. Retrieved from http://vision.eecs.ucf.edu/aerial/index.html.
37. Sager, Hisham, and William Hoff. 2014. "Pedestrian detection in low resolution videos." In IEEE Winter Conference on Applications of Computer Vision, pp. 668-673. IEEE.
38. Wang, Shiguang, Jian Cheng, Haijun Liu, and Ming Tang. 2018. "PCN: Part and context information for pedestrian detection with CNNs." arXiv:1804.04483.
39. Chen, Zhichang, Li Zhang, Abdul Mateen Khattak, Wanlin Gao, and Minjuan Wang. 2019. "Deep Feature Fusion by Competitive Attention for Pedestrian Detection." IEEE Access.
40. Li, Zhaoqing, Zhenxue Chen, Q. M. Jonathan Wu, and Chengyun Liu. 2019. "Pedestrian detection via deep segmentation and context network." Neural Computing and Applications: 1-13.
41. Zhang, Liliang, Liang Lin, Xiaodan Liang, and Kaiming He. 2016. "Is Faster R-CNN doing well for pedestrian detection?" In European Conference on Computer Vision, pp. 443-457. Springer, Cham.
42. Li, Chengyang, Dan Song, Ruofeng Tong, and Min Tang. 2019. "Illumination-aware faster R-CNN for robust multispectral pedestrian detection." Pattern Recognition 85: 161-171.