Perspective Multiscale Detection and Tracking of Persons
1. Perspective Multiscale Detection and Tracking of Persons
Marcos Nieto, Juan Diego Ortega, Andoni Cortés, and Seán Gaines
MMM 2014 – The 20th Anniversary International Conference on Multimedia Modeling, Dublin (Ireland), 6-10 January 2014
4. Motivation
• Object detection in images
Multiscale detection
Sliding window
Spans position & size
Bounding boxes
Detection-by-classification
Supervised learning
Feature extraction
Binary or multiclass
[Figure labels: Close, Open]
5. Motivation
• Real-time applications
Multiscale detection
Kind of brute-force
Too many evaluations
Some are absurd given the context
[Figure: number of evaluations (thousands) vs. number of levels (1 to 61), for scale factors 1.02, 1.05 and 1.1]
Parameters
Initial (smallest) size
Number of scales
Factor between scales
Offset (stride)
…
Therefore, some knowledge about the scene must be provided
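The parameters listed above determine how many windows a brute-force multiscale detector has to evaluate. The following is a minimal sketch, assuming a fixed-size model window applied to an image that shrinks by the scale factor at every level. The function name and the example sizes (a 1920x1080 image, a 64x128 window, a stride of 8 pixels) are illustrative assumptions, not values from the slides.

```python
def count_evaluations(img_w, img_h, win_w, win_h, num_levels, factor, stride):
    """Count sliding-window evaluations over a multiscale pyramid.

    At level k the image is shrunk by factor**k and the fixed-size window
    (win_w x win_h) is moved over it in steps of `stride` pixels.
    """
    total = 0
    for level in range(num_levels):
        scale = factor ** level
        w = int(img_w / scale)
        h = int(img_h / scale)
        if w < win_w or h < win_h:
            break  # the window no longer fits in the shrunken image
        nx = (w - win_w) // stride + 1  # window positions along x
        ny = (h - win_h) // stride + 1  # window positions along y
        total += nx * ny
    return total


# Hypothetical comparison of the three scale factors shown in the plot.
EVALUATIONS_BY_FACTOR = {
    factor: count_evaluations(1920, 1080, 64, 128, 20, factor, 8)
    for factor in (1.02, 1.05, 1.1)
}
```

With the number of levels held fixed, a factor closer to 1 shrinks the image more slowly, so every level still holds many window positions and the total grows accordingly.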
6. Motivation
• Contextual information
Color, motion, depth
Perspective of the scene
Low generality
High generality
Particular to each application
Allows the multiscale technique to be kept
Applicable in real-time
Two assumptions
There is a dominant ground plane
Objects lie on the plane, and their 3D size is approximately known
Surveillance, ADAS
Vehicles, persons
8. Perspective calibration
• Plane view calibration
Extrinsics from Homography
Rotation and translation of the camera
Homography calculation
4-points
2 metric references
1 DoF Camera model
Focal length from homography
Refinement using Levenberg-Marquardt
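The calibration chain on this slide (4-point homography, focal length from the homography, extrinsics from the homography) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the 1-DoF camera is taken to have square pixels, zero skew and a known principal point (for instance the image centre), the plane origin is assumed to lie in front of the camera, and the Levenberg-Marquardt refinement is omitted. All function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            k = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= k * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4_points(plane_pts, image_pts):
    """Homography H (3x3, H[2][2] = 1) mapping plane (x, y) to image (u, v)."""
    a, b = [], []
    for (x, y), (u, v) in zip(plane_pts, image_pts):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def focal_from_homography(h, cx, cy):
    """Focal length from the orthogonality of the first two rotation columns."""
    # Shift the principal point to the origin for the first two columns of H.
    a = [h[0][0] - cx * h[2][0], h[1][0] - cy * h[2][0], h[2][0]]
    b = [h[0][1] - cx * h[2][1], h[1][1] - cy * h[2][1], h[2][1]]
    # r1 . r2 = 0  =>  (a0*b0 + a1*b1) / f^2 + a2*b2 = 0
    # Degenerate when a2*b2 is zero (e.g. a fronto-parallel plane).
    return math.sqrt(-(a[0] * b[0] + a[1] * b[1]) / (a[2] * b[2]))


def pose_from_homography(h, f, cx, cy):
    """Rotation columns r1, r2, r3 and translation t from H = s * K [r1 r2 t]."""
    cols = [[(h[0][j] - cx * h[2][j]) / f,
             (h[1][j] - cy * h[2][j]) / f,
             h[2][j]] for j in range(3)]  # columns of K^-1 H
    scale = 1.0 / math.sqrt(sum(v * v for v in cols[0]))  # makes |r1| = 1
    r1 = [scale * v for v in cols[0]]
    r2 = [scale * v for v in cols[1]]
    t = [scale * v for v in cols[2]]
    r3 = [r1[1] * r2[2] - r1[2] * r2[1],
          r1[2] * r2[0] - r1[0] * r2[2],
          r1[0] * r2[1] - r1[1] * r2[0]]  # r3 = r1 x r2
    return r1, r2, r3, t
```

The plane coordinates of the four points would come from the metric references, so the recovered translation is expressed in the same units. With noisy clicks the recovered r1 and r2 are only approximately orthonormal, which is where a non-linear refinement step would help.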
12. Approach
• Define the perspective of the scene
Camera calibration
Intrinsic parameters
Homography
Camera pose
Extrinsic parameters calibration
• Define the 3D size of the object to search
Persons: 1700 x 500 x 500 mm
Car: 1500 x 1700 x 3500 mm
• A) Calculate the best parameters for multiscale
• B) Define a fixed grid of positions in the plane
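Once the camera pose and the 3D object size are known, the expected image size of an object at any ground position follows by projection. The sketch below illustrates this under assumptions that are not in the slides: the ground plane is Z = 0 with Z pointing up, the person size is read as (height, width, depth) in millimetres, and the example camera is hypothetical (3 m above the ground, looking horizontally along the Y axis, focal length 1000 px, principal point at the centre of a 1920x1080 image).

```python
def project(P, X):
    """Project the 3D point X = (x, y, z) with a 3x4 matrix P to pixel (u, v)."""
    xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(P[r][c] * xh[c] for c in range(4)) for r in range(3))
    return u / w, v / w


def projected_bounding_box(P, ground_xy, size_hwd):
    """Image bounding box (u_min, v_min, u_max, v_max) of an upright 3D box.

    ground_xy: centre of the box footprint on the plane Z = 0.
    size_hwd:  (height, width, depth) in the same units as P.
    """
    height, width, depth = size_hwd
    x0, y0 = ground_xy
    corners = [(x0 + sx * width / 2.0, y0 + sy * depth / 2.0, z)
               for sx in (-1, 1) for sy in (-1, 1) for z in (0.0, height)]
    pts = [project(P, c) for c in corners]
    us = [p[0] for p in pts]
    vs = [p[1] for p in pts]
    return min(us), min(vs), max(us), max(vs)


# Hypothetical camera P = K [R | t] (units: mm and pixels).
P_EXAMPLE = [[1000.0, 960.0, 0.0, 0.0],
             [0.0, 540.0, -1000.0, 3000000.0],
             [0.0, 1.0, 0.0, 0.0]]
PERSON_SIZE_MM = (1700.0, 500.0, 500.0)

BOX_AT_10_M = projected_bounding_box(P_EXAMPLE, (0.0, 10000.0), PERSON_SIZE_MM)
BOX_AT_20_M = projected_bounding_box(P_EXAMPLE, (0.0, 20000.0), PERSON_SIZE_MM)
```

The box at 20 m is smaller than the one at 10 m. That position-dependent size is the knowledge that both approach A and approach B exploit.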
13. Approach
• A) Perspective multiscale
• Rescale the original image so that the model size fits the farthest object
• Compute the scale factor so that the model size coincides with the closest object at the smallest image
• Filter out invalid positions
Focused effort: fewer levels are required
[Figure panels: Multiscale, Perspective Multiscale]
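The two rules for approach A translate into a small calculation. This is a minimal sketch, assuming the projected heights in pixels of the farthest and closest objects are already known from the calibrated perspective, and that the detector uses a fixed-height model window. The function and parameter names are hypothetical.

```python
def perspective_multiscale_params(model_h, far_h, near_h, num_levels):
    """Initial rescale and per-level factor for a perspective-aware pyramid.

    model_h: height of the detector model window (pixels).
    far_h:   projected height of the farthest (smallest) object (pixels).
    near_h:  projected height of the closest (largest) object (pixels).
    """
    if num_levels < 2:
        raise ValueError("at least two levels are needed to span a range")
    # Level 0: rescale the original image so the farthest object fits the model.
    initial_scale = model_h / far_h
    # Choose the factor so that, at the last (smallest) level, the closest
    # object fits the model: initial_scale / factor**(L - 1) == model_h / near_h.
    factor = (near_h / far_h) ** (1.0 / (num_levels - 1))
    scales = [initial_scale / factor ** k for k in range(num_levels)]
    return initial_scale, factor, scales


# Hypothetical scene: objects appear between 64 and 256 px tall, model is 128 px.
INITIAL_SCALE, FACTOR, SCALES = perspective_multiscale_params(128, 64, 256, 3)
```

Because the pyramid spans only the sizes that can occur in the scene, a handful of levels is enough. The resulting parameters can then be passed to an ordinary multiscale implementation.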
14. Approach
• It is still necessary to filter out invalid position-size pairs
• The advantage of this approach is that traditional multiscale implementations can still be used with far fewer levels
Focused effort: fewer levels are required (typically 3 to 5)
[Figure: number of evaluations (thousands) vs. number of levels (1 to 61), for scale factors 1.02, 1.05 and 1.1]
15. Approach
• B) Grid of fixed positions
• Predefine feasible locations of objects
• No need to filter
• Cannot be used in multiscale implementations. One evaluation per candidate
Much more focused effort
[Figure panels: Projected boxes, Bounding boxes]
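Approach B can be sketched as a regular grid of ground-plane positions mapped into the image through the plane-to-image homography, keeping only those that fall inside the image. This is a minimal sketch under assumptions: the grid is regular in metric ground coordinates, the homography is the one of a hypothetical camera (3 m high, looking horizontally, focal length 1000 px, 1920x1080 image), and the function names are illustrative.

```python
import math


def apply_homography(H, x, y):
    """Map a ground-plane point (x, y) to the image; None if behind the camera."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w <= 0:
        return None
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v


def grid_candidates(H, x_range, y_range, step, img_w, img_h):
    """Ground positions on a regular grid whose projection lies inside the image."""
    nx = int(math.floor((x_range[1] - x_range[0]) / step + 1e-9)) + 1
    ny = int(math.floor((y_range[1] - y_range[0]) / step + 1e-9)) + 1
    candidates = []
    for j in range(ny):
        for i in range(nx):
            x = x_range[0] + i * step
            y = y_range[0] + j * step
            uv = apply_homography(H, x, y)
            if uv is None:
                continue
            if 0 <= uv[0] < img_w and 0 <= uv[1] < img_h:
                candidates.append(((x, y), uv))
    return candidates


# Hypothetical plane-to-image homography (ground units: mm).
H_EXAMPLE = [[1000.0, 960.0, 0.0],
             [0.0, 540.0, 3000000.0],
             [0.0, 1.0, 0.0]]
CANDIDATES = grid_candidates(H_EXAMPLE, (-5000, 5000), (5000, 30000),
                             5000, 1920, 1080)
```

Each surviving position would be paired with the projected box of the object at that position and classified once, so the number of evaluations equals the number of candidates.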
17. Results
• Case study: Person detection
– Full-body and Head & Shoulder SVM-HOG detector
– Perspective Multiscale
– Linear multiobject tracking
– Active Vision Group dataset (1920x1080, 4500 frames, 71460 persons labeled)
18. Results
• Performance
– Reduction from 144880 to 46226 evaluations (68%) for similar performance
– Using 3 levels is enough because the perspective effect is soft (L = 3, 5, 7)
[Figure panels: Multiscale, Perspective Multiscale]
[Figure: precision (0.978 to 1) vs. recall for FB, FBUB, FBUB* and DAF. Plot annotations: "Filtering", "Tracking", "Fewer FP but also some missed detections", "Fewer FN"]
19. Results
• Case study: Vehicle detection
– Vehicle detection application for an embedded vision system
– The road can be assumed to be planar over short distances
– Ground truth sequence of 2 minutes
– Grid of fixed positions
20. Results
• Case study: Vehicle detection
– Detections are sparse and noisy
– Tracking is still necessary
22. Results
Type | Processor | RAM | CPU | OS | Language
PC | Intel Core i5 | 8 GB | 3.0 GHz | Windows 7, Ubuntu 12.04 | C++
Embedded HW 1 | ARM Cortex | 512 MB | 800 MHz | Xilinx Zynq Linux | C++
[Figure: speed scale from slow to fast comparing brute-force multiscale and perspective multiscale. Timing labels: 2-10 ms on PC; 30-40 ms on ARM Cortex; 11-40 ms on PC; 25 fps real-time]
23. Conclusions
• Perspective is contextual information available in many situations
• Assumptions: dominant ground plane and known object size
• Its computation is easy (K, R, t) using homographies
• It can be used for object detection to focus computational effort
Two ways of applying it
• A) Perspective Multiscale: wrapping the multiscale function (~60% reduction in a typical surveillance scene)
• B) Grid of fixed positions: for an even greater reduction in complexity (x7 speed-up in low-perspective scenes such as onboard vehicle detection)