Using HOG Descriptors on Superpixels for Human Detection of UAV Imagery

Aerodynamic Analysis and Design Lab.
WAI NWE TUN
August 16, 2017
AADL AI Seminar

Contents
 Introduction
 System architecture
• Superpixel extraction
• HOG descriptor calculation
• AdaBoost classifier
 Performance evaluations
• Dataset
• Evaluation results
 Conclusion
 References

Introduction
 Human detection is a challenging task with many applications, such as
pedestrian detection, search and rescue operations, and surveillance.
 Object detection is generally performed in two main steps: feature
extraction and classification.
 Feature extraction maps image windows to a fixed-size feature space
that robustly encodes visual form.
 The feature vectors are used to train a classifier.
 The trained classifier then determines whether the desired object is
present in an image.
 This system detects humans by computing Histogram of Oriented
Gradients (HOG) descriptors for each superpixel produced by Simple
Linear Iterative Clustering (SLIC) and classifying the HOG feature
descriptors with AdaBoost.

Introduction
 Superpixel algorithms group pixels into perceptually meaningful atomic
regions which can be used to replace the rigid structure of the pixel grid.
 They capture image redundancy, provide a convenient primitive from
which to compute image features, and greatly reduce the complexity of
subsequent image processing tasks. SLIC adapts k-means clustering to
generate superpixels.
 A feature descriptor is a representation of an image or an image patch
that simplifies the image by extracting useful information and discarding
extraneous information.
 In HOG descriptors, the distribution of gradient directions is used as
the feature.
 AdaBoost is an ensemble classifier that combines a series of k weak
classifiers.

System Architecture

Feature Extraction
 Image gradients in x and y are computed with Sobel filters.
 Gradient magnitude and direction are obtained by converting the x and y
gradient components to polar coordinates.
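
As a minimal sketch of the two steps above, the following Python computes Sobel gradients for a single-channel image stored as a nested list, then converts each (gx, gy) pair to magnitude and direction. The function name `sobel_polar`, the zero-valued border, and the unsigned 0 to 180 degree direction range are assumptions made for illustration; the slides do not specify them.

```python
import math

# 3x3 Sobel kernels for the horizontal (x) and vertical (y) derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_polar(img):
    """Return (magnitude, direction_deg) grids for a 2-D list of intensities.

    Border pixels are left at zero (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = img[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            # Cartesian (gx, gy) -> polar (magnitude, angle in degrees).
            mag[r][c] = math.hypot(gx, gy)
            ang[r][c] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

# Vertical edge: intensity jumps from 0 to 10 between columns 1 and 2.
image = [[0, 0, 10, 10] for _ in range(4)]
magnitude, direction = sobel_polar(image)
print(magnitude[1][1], direction[1][1])  # prints 40.0 0.0
```

The sketch handles one channel; applying it to each color channel separately would be one way to extend it.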

Superpixel Extraction by SLIC
 Instead of the fixed-size blocks of the original HOG, superpixels are
used in the HOG calculation.
 A superpixel is a group of connected pixels that share common
characteristics, such as intensity, and represent a meaningful region.
 Superpixel algorithms fall into two families:
• graph-based: each pixel is a node, node similarity is the edge weight, and a
cost function is minimized over the graph.
• gradient ascent: pixels are clustered iteratively.
 SLIC is a gradient ascent method.
 The distance measure combines two normalized measures:
color proximity (LAB) and
spatial proximity (XY).
 K-means clustering is performed with this distance measure within a
limited region around each cluster center.
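
The distance measure and the limited search region can be sketched as follows. The formula D = sqrt(d_c^2 + (d_s / S)^2 m^2) and the 2S x 2S window follow the published SLIC algorithm rather than anything stated on the slides. The names `slic_distance` and `nearest_center`, the grid interval `S`, and the compactness default `m=10` are assumptions for illustration.

```python
import math

def slic_distance(pixel, center, S, m=10.0):
    """Combined distance between a pixel and a cluster center.

    pixel, center: (L, a, b, x, y) tuples; S: grid interval; m: compactness.
    """
    d_color = math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel[:3], center[:3])))
    d_space = math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel[3:], center[3:])))
    # Spatial distance is normalized by S and weighted by m before combining.
    return math.sqrt(d_color ** 2 + (d_space / S) ** 2 * m ** 2)

def nearest_center(pixel, centers, S, m=10.0):
    """Index of the closest center whose 2S x 2S window holds the pixel, or None."""
    best_idx, best_dist = None, math.inf
    for idx, center in enumerate(centers):
        if abs(pixel[3] - center[3]) > S or abs(pixel[4] - center[4]) > S:
            continue  # outside this center's limited search region
        dist = slic_distance(pixel, center, S, m)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

# Two hypothetical cluster centers; only the first is near the pixel.
centers = [(50.0, 0.0, 0.0, 12.0, 10.0), (50.0, 0.0, 0.0, 100.0, 100.0)]
pixel = (50.0, 0.0, 0.0, 10.0, 10.0)
print(nearest_center(pixel, centers, S=20))  # prints 0
```

Restricting each comparison to nearby centers is what keeps the clustering cheap compared with plain k-means, where every pixel is compared with every center.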

HOG Descriptors Calculation
 A histogram of gradients is created for each superpixel, and all
histograms are concatenated. L2 normalization is then performed.
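
A minimal sketch of this step, assuming the gradient magnitude and direction grids and a grid of superpixel labels are already available. The function name `superpixel_hog`, the 9 bins over 0 to 180 degrees, and the hard (non-interpolated) magnitude-weighted binning are assumptions; the slides do not give these details.

```python
import math

def superpixel_hog(mag, ang, labels, n_segments, n_bins=9):
    """Concatenate one gradient histogram per superpixel, then L2-normalize.

    mag, ang, labels: 2-D lists of equal shape; ang in degrees within [0, 180);
    labels holds the superpixel index (0 .. n_segments - 1) of each pixel.
    """
    bin_width = 180.0 / n_bins
    hist = [[0.0] * n_bins for _ in range(n_segments)]
    for mag_row, ang_row, lab_row in zip(mag, ang, labels):
        for m, a, s in zip(mag_row, ang_row, lab_row):
            b = int(a // bin_width) % n_bins
            hist[s][b] += m  # each pixel votes with its gradient magnitude
    flat = [v for h in hist for v in h]
    norm = math.sqrt(sum(v * v for v in flat))
    if norm == 0.0:
        return flat  # all-zero gradients: nothing to normalize
    return [v / norm for v in flat]

# Tiny 2x2 example with two superpixels (left column = 0, right column = 1).
descriptor = superpixel_hog(
    mag=[[3.0, 4.0], [0.0, 0.0]],
    ang=[[0.0, 90.0], [0.0, 0.0]],
    labels=[[0, 1], [0, 1]],
    n_segments=2,
)
print(len(descriptor))  # prints 18
```

Because every superpixel contributes the same number of bins, the descriptor length depends only on the number of segments, which gives the classifier a fixed-size input.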

AdaBoost Classifier: Learning
 Given D, a data set of d class-labeled tuples $(X_1, y_1), \ldots, (X_d, y_d)$,
where $X_j$ = HOG descriptor, $y_j$ = class label (human or not), d = number
of images, and n = 2 (number of class labels).
 AdaBoost assigns each training tuple an equal initial weight of $1/d$.
 Generating k classifiers requires k rounds (iterations).
 In round i, tuples are sampled from D with replacement to form a
training set $D_i$ of size d.
 Each tuple’s chance of being selected is based on its weight.
 A classifier model $M_i$ is derived from the training tuples in $D_i$, and its
error is calculated.
 If a tuple was incorrectly classified, its weight is increased. Otherwise, it
is decreased. This focuses later rounds on the misclassified tuples.

$$\mathrm{error}(M_i) = \sum_{j=1}^{d} w_j \times \mathrm{err}(X_j)$$

where $w_j$ is the weight of tuple $X_j$, and $\mathrm{err}(X_j)$ is 1 if $X_j$ is
misclassified and 0 otherwise.
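
The learning rounds above can be sketched as follows. Several details are assumptions, because the slides do not state them: the weak learner is a hypothetical one-feature threshold rule (`fit_stump`), the error is computed over all of D, a round with error of 0.5 or more is discarded and the weights reset, and the weights of correctly classified tuples are multiplied by error / (1 - error) before renormalizing, which is one common textbook update.

```python
import random

def stump_predict(stump, row):
    """Classify one feature vector with a (feature, threshold, sign) rule."""
    f, t, sign = stump
    return 1 if sign * (row[f] - t) >= 0 else 0

def fit_stump(X, y):
    """Hypothetical weak learner: the threshold rule with the fewest mistakes."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                stump = (f, t, sign)
                miss = sum(stump_predict(stump, r) != lab for r, lab in zip(X, y))
                if best is None or miss < best[0]:
                    best = (miss, stump)
    return best[1]

def adaboost_fit(X, y, k, seed=0):
    """Run k boosting rounds; return (models, errors, final tuple weights)."""
    rng = random.Random(seed)
    d = len(X)
    w = [1.0 / d] * d                      # equal initial weight 1/d
    models, errors = [], []
    for _ in range(k):
        # Sample D_i of size d with replacement, by tuple weight.
        idx = rng.choices(range(d), weights=w, k=d)
        stump = fit_stump([X[j] for j in idx], [y[j] for j in idx])
        err = [int(stump_predict(stump, X[j]) != y[j]) for j in range(d)]
        error = sum(w[j] * err[j] for j in range(d))   # error(M_i)
        if error >= 0.5:                   # too weak: discard and reset
            w = [1.0 / d] * d
            continue
        error = max(error, 1e-10)          # avoid a zero scaling factor
        # Shrink weights of correct tuples; renormalizing then raises
        # the relative weight of the misclassified ones.
        w = [w[j] * error / (1.0 - error) if err[j] == 0 else w[j] for j in range(d)]
        total = sum(w)
        w = [wj / total for wj in w]
        models.append(stump)
        errors.append(error)
    return models, errors, w

# Toy one-feature data set (made up for illustration).
models, errors, weights = adaboost_fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1], k=5)
print(len(models), round(sum(weights), 6))
```

Fewer than k models may be returned when a round is discarded; retrying the round instead is an equally reasonable design.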

AdaBoost Classifier: Prediction
 Instead of giving every classifier an equal vote, each classifier’s vote
is weighted by how well the classifier performed.
 The lower a classifier’s error rate, the more accurate it is and the higher
its weight for voting should be.
 For each class c (e.g., human present or not), sum the weights of each
classifier that assigned class c to the tuple being classified.
 The class with the highest sum is the ‘winner’ and is returned as the
class prediction for that tuple.

$$\mathrm{vote}(M_i) = \frac{1 - \mathrm{error}(M_i)}{\mathrm{error}(M_i)}$$
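
A minimal sketch of the weighted vote for one tuple, using the ratio (1 - error) / error from the slide as each classifier's vote weight. The function name `weighted_vote`, the integer class labels, and the small floor on the error are assumptions for illustration. Some textbook presentations take the logarithm of this ratio; the sketch keeps the slide's form.

```python
def weighted_vote(predictions, errors, n_classes=2):
    """Return the class with the highest summed vote weight.

    predictions[i] is the class that classifier M_i assigns to the tuple;
    errors[i] is error(M_i) from training.
    """
    totals = [0.0] * n_classes
    for pred, error in zip(predictions, errors):
        error = max(error, 1e-10)              # guard against division by zero
        totals[pred] += (1.0 - error) / error  # vote(M_i)
    return max(range(n_classes), key=lambda c: totals[c])

# One accurate classifier (error 0.1) outvotes two weaker ones (error 0.4).
print(weighted_vote([1, 0, 0], [0.1, 0.4, 0.4]))  # prints 1
```

With equal votes the two weaker classifiers would win here, which is the behavior the weighting is meant to avoid.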

Performance Evaluations
 Performance is evaluated using four measures: accuracy, precision,
recall, and computational time.
 Accuracy
• Fraction of all test images that are classified correctly
 Precision (Correctness)
• Expresses how many selected items are relevant
• High precision means that the system returned more
relevant results than irrelevant ones
 Recall (Completeness)
• Expresses how many relevant items are selected
• High recall means that the system returned most of the
relevant results
 Computational time
• Time taken to build a classifier and classify an image
* Wikipedia: Precision and Recall

Dataset
 INRIA person dataset and ImageNet person dataset
 Training stage: 1000 positive (human present) and 1000 negative (no
human) images.
 Testing stage: 300 positive and 200 negative images.
 All images are downscaled to a resolution of 128x96, chosen by trial
and error.

Evaluation Methods
 A confusion matrix is created for each case.

                        Predicted class
Actual class    Human detected         Human not detected
Human           True positives (TP)    False negatives (FN)
No human        False positives (FP)   True negatives (TN)

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
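
The three formulas can be checked with a short sketch. The function names and the five example labels are made up for illustration (1 = human, 0 = no human) and are not results from the experiments.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for binary labels (1 = human, 0 = no human)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Illustrative labels only: 3 images with a human, 2 without.
actual = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)            # prints 2 1 1 1
print(metrics(tp, fp, fn, tn))
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch; other conventions are possible.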

Evaluation Results
 Computational time:
• Training
– 10-seg : 23 minutes
– Original HOG : 8 minutes
• Testing
– 10-seg : 0.64 seconds
– Original HOG : 0.04 seconds
[Bar chart: accuracy, precision, and recall (%) for 10-seg, 100-seg,
200-seg, 300-seg, and original HOG. Data labels in extracted order:
71.2, 80.3, 80.6, 79.8, 83.7, 82.8, 76.6, 80.7, 80.4, 78.8, 81, 83.2,
73.2, 84.7, 74.3]

Results

Conclusion
 This work presents an approach that detects humans in images solely
from superpixel-wise HOG calculations on each channel of the LUV color
space.
 HOG descriptors are fed into AdaBoost to classify images into two
categories: human-detected and human-not-detected.
 Performance is measured by accuracy, precision, recall, and
computational time.
 Four experiments, with 10, 100, 200, and 300 segments, are performed
and compared against the original HOG.
 Except for the 10-segment case, the superpixel-wise approaches
outperform the original HOG in accuracy by 3% or more.

References
 All references are listed in my paper
 http://www.learnopencv.com/histogram-of-oriented-gradients/
 https://www.mathsisfun.com/polar-cartesian-coordinates.html
