Using HOG Descriptors on Superpixels for Human Detection of UAV Imagery
1. Aerodynamic Analysis and Design Lab.
WAI NWE TUN
August 16, 2017
AADL AI Seminar
Introduction
Human detection is a challenging task with many applications, such as
pedestrian detection, search and rescue operations, and surveillance.
The object detection process is generally performed in two main steps:
feature extraction and classification.
Feature extraction maps image windows to a fixed size feature space
that robustly encodes visual form.
Feature vectors are fed into a classifier for training.
The trained classifier then determines whether the desired object is
present in an image.
This system performs human detection using HOG descriptors computed
for each superpixel generated by Simple Linear Iterative Clustering
(SLIC); the HOG descriptors are then classified by AdaBoost.
Introduction
Superpixel algorithms group pixels into perceptually meaningful atomic
regions which can be used to replace the rigid structure of the pixel grid.
They capture image redundancy, provide a convenient primitive from
which to compute image features, and greatly reduce the complexity of
subsequent image processing tasks. SLIC adapts k-means clustering to
generate superpixels.
A feature descriptor is a representation of an image or an image patch
that simplifies the image by extracting useful information and throwing
away extraneous information.
In HOG descriptors, the distribution of gradient directions is used as
the feature.
The AdaBoost classifier is an ensemble method that combines a series
of k weak classifiers.
Feature Extraction
Sobel filters compute the horizontal (x) and vertical (y) gradient at
each pixel.
Gradient magnitude and direction are obtained by converting the x and y
gradient components from Cartesian to polar coordinates.
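As a rough illustration of this step, the sketch below applies 3x3 Sobel kernels and converts the two gradient components to magnitude and direction. The function names, the edge-replicating border handling, and the use of unsigned angles in [0, 180) are assumptions made for the example; the slides do not specify them.

```python
import numpy as np

# 3x3 Sobel kernels for the horizontal (x) and vertical (y) derivatives.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def filter3x3(image, kernel):
    """Cross-correlate a 2-D image with a 3x3 kernel, replicating edge pixels."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def gradient_polar(image):
    """Return per-pixel (magnitude, direction in degrees within [0, 180))."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    magnitude = np.hypot(gx, gy)                         # r = sqrt(gx^2 + gy^2)
    direction = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned angle (assumed)
    return magnitude, direction
```

For a grayscale array `img`, `gradient_polar(img)` returns two arrays of the same shape as `img`.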
Superpixel Extraction by SLIC
Instead of the fixed-size blocks of the original HOG, superpixels are
used in the HOG calculation.
Superpixels are groups of connected pixels that share common
characteristics such as intensity and represent a meaningful region.
Superpixel algorithms
• graph-based: each pixel is a node, node similarity is the edge weight,
and a cost function is minimized over the graph.
• gradient ascent methods: start from a rough initial clustering of
pixels and refine it iteratively.
SLIC: a gradient ascent method
The distance measure is calculated by normalizing and combining
two measures:
color proximity (LAB) and
spatial proximity (XY).
K-means clustering is performed with this distance measure within a
limited region around each cluster center.
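The combined distance can be sketched as follows. This assumes the formulation commonly given for SLIC, in which the spatial distance is divided by the grid interval S and scaled by a compactness weight m before being combined with the LAB color distance, and in which each cluster center only searches a 2S x 2S window. The function names and the default value of m are illustrative only.

```python
import math


def slic_distance(pixel, center, grid_step, compactness=10.0):
    """Combined SLIC distance between a pixel and a cluster center.

    pixel, center: tuples (L, a, b, x, y).
    grid_step:     S, the expected spacing between cluster centers.
    compactness:   m, the weight of spatial versus color proximity (assumed).
    """
    l1, a1, b1, x1, y1 = pixel
    l2, a2, b2, x2, y2 = center
    d_color = math.sqrt((l1 - l2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2)
    d_space = math.hypot(x1 - x2, y1 - y2)
    # Normalize the spatial term by S and scale by m so both terms are comparable.
    return math.sqrt(d_color ** 2 + (d_space / grid_step) ** 2 * compactness ** 2)


def in_search_window(pixel_xy, center_xy, grid_step):
    """True if the pixel lies inside the 2S x 2S window around the center."""
    return (abs(pixel_xy[0] - center_xy[0]) <= grid_step
            and abs(pixel_xy[1] - center_xy[1]) <= grid_step)
```

Restricting each center to its local window is what keeps the clustering cost roughly linear in the number of pixels, rather than comparing every pixel with every center as plain k-means would.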
HOG Descriptors Calculation
A histogram of gradients is created for each superpixel, and all
histograms are concatenated. L2 normalization is then performed.
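A minimal sketch of this calculation is given below. It assumes 9 orientation bins over [0, 180) degrees, magnitude-weighted votes, and hard bin assignment without interpolation; none of these details are stated on the slide, and the function name is hypothetical.

```python
import numpy as np


def superpixel_hog(magnitude, direction, labels, n_segments, n_bins=9):
    """Concatenate one orientation histogram per superpixel, then L2-normalize.

    magnitude, direction: per-pixel gradient magnitude and angle in [0, 180).
    labels: integer superpixel id per pixel, in 0..n_segments-1.
    """
    bin_width = 180.0 / n_bins
    bins = np.minimum((direction // bin_width).astype(int), n_bins - 1)
    hist = np.zeros((n_segments, n_bins), dtype=float)
    # Each pixel votes into its own superpixel's histogram, weighted by magnitude.
    np.add.at(hist, (labels.ravel(), bins.ravel()), magnitude.ravel())
    descriptor = hist.ravel()                 # concatenate all histograms
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor
```

With a fixed number of superpixels per image, the descriptor length is `n_segments * n_bins`, so every image yields a feature vector of the same size for the classifier.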
AdaBoost Classifier: Learning
Given D, a data set of d class-labeled tuples $(X_1, y_1), \ldots, (X_d, y_d)$,
where $X_j$ = HOG descriptors, $y_j$ = human or not, d = number of images,
and n = 2 (number of class labels).
AdaBoost assigns each training tuple an equal initial weight of $1/d$.
Generating k classifiers requires k rounds (iterations).
In round i, tuples from D are sampled with replacement to form a
training set, $D_i$, of size d.
Each tuple’s chance of being selected is based on its weight.
A classifier model, $M_i$, is derived from the training tuples in $D_i$,
and its error is calculated.
If a tuple was incorrectly classified, its weight is increased. Otherwise, it
is decreased. This focuses later rounds on the misclassified tuples.
$$\mathit{error}(M_i) = \sum_{j=1}^{d} w_j \times \mathit{err}(X_j)$$
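The training rounds can be sketched roughly as below, taking err(X_j) as 1 when tuple X_j is misclassified and 0 otherwise. Several details are assumptions rather than the presented method: the weak learner is a hypothetical one-feature threshold classifier (`train_stump`), models with error above 0.5 are discarded, training stops early on a perfect model, and correctly classified tuples are rescaled by error / (1 - error) before normalization, as in a common textbook formulation.

```python
import random


def train_stump(samples):
    """Fit a one-feature threshold classifier (hypothetical weak learner)."""
    best = None
    for f in range(len(samples[0][0])):
        for threshold in sorted({x[f] for x, _ in samples}):
            for sign in (1, -1):
                errors = sum(1 for x, y in samples
                             if (sign if x[f] >= threshold else -sign) != y)
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, sign)
    _, f, threshold, sign = best
    return lambda x: sign if x[f] >= threshold else -sign


def reweight(weights, missed, error):
    """Shrink weights of correctly classified tuples, then normalize.

    After normalization the misclassified tuples carry relatively more weight.
    """
    factor = error / (1.0 - error)
    scaled = [w if m else w * factor for w, m in zip(weights, missed)]
    total = sum(scaled)
    return [w / total for w in scaled]


def adaboost_train(data, k, seed=0):
    """Return a list of (model, error) pairs after up to k boosting rounds.

    data: list of (feature_vector, label) tuples with labels +1 / -1.
    """
    rng = random.Random(seed)
    d = len(data)
    weights = [1.0 / d] * d                       # equal initial weights 1/d
    ensemble = []
    for _ in range(k):
        # Sample d tuples with replacement, in proportion to their weights.
        sample = rng.choices(data, weights=weights, k=d)
        model = train_stump(sample)
        missed = [model(x) != y for x, y in data]
        # error(M_i): total weight of the misclassified tuples.
        error = sum(w for w, m in zip(weights, missed) if m)
        if error > 0.5:
            continue                              # assumed: discard weak model
        ensemble.append((model, error))
        if error == 0:
            break                                 # assumed: stop on perfect model
        weights = reweight(weights, missed, error)
    return ensemble
```

Here each feature vector would be the HOG descriptor of one image, with label +1 for "human" and -1 for "no human".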
AdaBoost Classifier: Prediction
Instead of giving each classifier an equal vote, a weight is assigned to
each classifier’s vote based on how well the classifier performed.
The lower a classifier’s error rate, the more accurate it is and the higher
its voting weight should be.
For each class c (e.g., human present or not), the weights of the
classifiers that assigned class c to the tuple are summed.
The class with the highest sum is the winner and is returned as the
class prediction for the tuple.
$$\mathit{vote}(M_i) = \frac{1 - \mathit{error}(M_i)}{\mathit{error}(M_i)}$$
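The weighted vote can be illustrated with the short sketch below. By default it uses the ratio (1 - error) / error as the vote weight; many textbook descriptions of AdaBoost take the logarithm of this ratio instead, so a `use_log` switch is included as an assumption. The function names and the small `eps` clamp that avoids division by zero are likewise illustrative.

```python
import math
from collections import defaultdict


def vote_weight(error, use_log=False, eps=1e-10):
    """Vote weight from a classifier's error: (1 - error) / error, or its log."""
    error = min(max(error, eps), 1.0 - eps)       # avoid division by zero
    ratio = (1.0 - error) / error
    return math.log(ratio) if use_log else ratio


def weighted_vote(predictions_and_errors, use_log=False):
    """Return the class whose supporting classifiers have the largest total weight.

    predictions_and_errors: list of (predicted_class, classifier_error) pairs,
    one pair per weak classifier, all for the same input tuple.
    """
    totals = defaultdict(float)
    for predicted_class, error in predictions_and_errors:
        totals[predicted_class] += vote_weight(error, use_log)
    return max(totals, key=totals.get)
```

For instance, one accurate classifier (error 0.1) voting "human" outweighs two weaker ones (error 0.4 each) voting "no human", since 9 > 1.5 + 1.5.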
Performance Evaluations
Performance is evaluated using four measures: accuracy, precision,
recall and computational time.
Accuracy
• Fraction of all images that the system classifies correctly
Precision (Correctness)
• Expresses how many selected items are relevant
• High precision means that a system returned more
relevant results than irrelevant ones
Recall (Completeness)
• Expresses how many relevant items are selected
• High recall means that a system returned most of the
relevant results
Computational time
• Time taken to build a classifier and classify an image
* Wikipedia: Precision and Recall
Dataset
Images are taken from the INRIA person dataset and the ImageNet person
dataset.
Training stage: 1000 positive (human present) and 1000 negative (no
human) images.
Testing stage: 300 positive and 200 negative images.
All images are downscaled to a resolution of 128x96, chosen by trial
and error.
Evaluation Methods
A confusion matrix is created for each case.
                     Predicted: human detected    Predicted: human not detected
Actual: human        True positives (TP)          False negatives (FN)
Actual: no human     False positives (FP)         True negatives (TN)
$$\mathit{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
$$\mathit{Precision} = \frac{TP}{TP + FP}$$
$$\mathit{Recall} = \frac{TP}{TP + FN}$$
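These three measures follow directly from the confusion matrix counts, as the small sketch below shows. The function name is hypothetical, and the counts in the usage example are made up for illustration (chosen only to match a test set of 300 positive and 200 negative images); they are not results from the experiments.

```python
def evaluate(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when nothing is predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Illustrative counts only: 300 positives (240 found), 200 negatives (40 false alarms).
accuracy, precision, recall = evaluate(tp=240, fp=40, tn=160, fn=60)
print(f"{accuracy:.3f} {precision:.3f} {recall:.3f}")  # 0.800 0.857 0.800
```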
Evaluation Results
Computational time:
• Training
– 10-seg : 23 minutes
– Original HOG : 8 minutes
• Testing
– 10-seg : 0.64 seconds
– Original HOG : 0.04 seconds
Accuracy, precision, and recall (%) by method:
Method          Accuracy    Precision    Recall
10-seg          71.2        80.3         80.6
100-seg         79.8        83.7         82.8
200-seg         76.6        80.7         80.4
300-seg         78.8        81.0         83.2
Original HOG    73.2        84.7         74.3
Conclusion
This work presents an approach to detecting humans in images based on
superpixel-wise HOG calculations on each channel of the LUV color space.
HOG descriptors are fed into AdaBoost to classify images into two
categories: human-detected and human-not-detected.
Performance is measured by accuracy, precision, recall, and
computational time.
Four experiments with 10, 100, 200, and 300 segments are performed to
compare results against the original HOG.
Except for the 10-segment case, the superpixel-wise approaches
outperform the original HOG in accuracy by 3 percentage points or more.
References
All references are listed in my paper.
http://www.learnopencv.com/histogram-of-oriented-gradients/
https://www.mathsisfun.com/polar-cartesian-coordinates.html