1. Semantically Integrating Laser and Vision
in Pedestrian Detection
Luciano Oliveira
Advisors:
Prof. Urbano Nunes
Prof. Paulo Peixoto
2. Motivation
Where is the pedestrian in the scene?
[Diagram: the four modules of an object detection system, with example techniques]
Segmentation: clustering methods
Recognition
Tracking: Kalman filter
Searching: efficient sub-window searching (image)
3. Goals
Object detection using laser/vision
Proof-of-concept: pedestrian detection, but can be applied to several other objects
Recover object localization
DO NOT rely entirely on laser, as previous methods do
Perform the fusion in a context-aware mode
4. Overview of the proposed method
[Block diagram of the proposed method; recoverable labels below]
Inputs: laser points; images
Laser segmentation and labeling: coarse segmentation gives {c_n}, n = 1..N; fine segmentation gives {f_m}, m = 1..M, for each c_n; Procrustes analysis with reference shapes gives (label, confidence) for each f_m
Laser-image registration: 3D sliding window searching; for each 3D window, sensor registration; for each 2D window, the parts-based ensemble detector (HLSM-FINT) gives (object, confidence)
Semantic/contextual interpretation: MLN; MRF; inference and decision outputs
Other blocks: template matching; ground
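The diagram names Procrustes analysis against reference shapes as the step that labels each fine laser segment. A minimal sketch of a 2D Procrustes comparison follows. It assumes the segment and the reference shape have already been resampled to the same number of corresponding points; the function name, the sample shapes, and that resampling assumption are illustrative and are not taken from the slides.

```python
import math


def procrustes_disparity(shape_a, shape_b):
    """Full Procrustes disparity between two 2D shapes with matched points.

    Each shape is a list of (x, y) tuples of equal length. The result lies in
    [0, 1]; 0 means the shapes agree up to translation, scale and rotation.
    """
    if len(shape_a) != len(shape_b) or len(shape_a) < 2:
        raise ValueError("shapes need the same number (>= 2) of points")

    def normalise(points):
        # Represent points as complex numbers, remove translation and scale.
        z = [complex(x, y) for x, y in points]
        centroid = sum(z) / len(z)
        z = [p - centroid for p in z]
        norm = math.sqrt(sum(abs(p) ** 2 for p in z))
        if norm == 0:
            raise ValueError("degenerate shape: all points coincide")
        return [p / norm for p in z]

    a = normalise(shape_a)
    b = normalise(shape_b)
    # The modulus of the complex inner product is invariant to rotation.
    inner = sum(p * q.conjugate() for p, q in zip(a, b))
    return max(0.0, 1.0 - abs(inner) ** 2)


# Hypothetical reference shape and a rotated, scaled, shifted copy of it.
reference = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (0.0, 2.0)]
angle = math.radians(30.0)
observed = [
    (2.5 * (x * math.cos(angle) - y * math.sin(angle)) + 3.0,
     2.5 * (x * math.sin(angle) + y * math.cos(angle)) - 1.0)
    for x, y in reference
]
same_shape = procrustes_disparity(reference, observed)
other_shape = procrustes_disparity(reference, [(0, 0), (1, 0), (2, 0), (3, 0)])
```

A low disparity could be mapped to a high confidence for the label of the closest reference shape; how the thesis derives its confidence value is not stated on the slide.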
6. Sensor-driven detectors
7. Ensemble of classifiers HFI
HFI: Hierarchical Fuzzy Integration
[Diagram: cascade of three fuzzy systems; recoverable labels below]
Fuzzy inputs: perimeter rate; intersection rate; distance / max(w, w')
Other labels: C1; C2; confidence; joint; C1 scaled score; C2 scaled score
Output: final confidence
8. Drawbacks
Initially evaluated on Haar-like features / AdaBoost and HOG / SVM classification systems
It suffers from exponential growth in the number of rules and low overall performance in challenging situations
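The rule growth can be made concrete with a small calculation. The sketch below assumes a complete rule base, that is, one rule for every combination of linguistic terms across the inputs. The function name and the term counts are illustrative; the slides do not state how many terms or rules HFI actually uses.

```python
def full_rule_base_size(num_inputs, terms_per_input):
    """Number of rules in a complete fuzzy rule base.

    Assumes every input has the same number of linguistic terms and that
    every combination of terms gets its own rule.
    """
    return terms_per_input ** num_inputs


# With three terms per input (e.g. low / medium / high, a hypothetical choice):
two_inputs = full_rule_base_size(2, 3)    # 9 rules
seven_inputs = full_rule_base_size(7, 3)  # 2187 rules
```

Splitting the inputs across several small fuzzy systems keeps each rule base manageable, but the count still grows quickly as inputs or terms are added.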
10. HLSM-FINT – Rationale
• CNN – expert in background (BG) (60% hit rate in the NiSIS competition)
• HOG/SVM – expert in objects (OB) (70% hit rate in the NiSIS competition)
• Fuzzy integral (Sugeno) – provides a comprehensive framework and strong synergism
• 95.67% hit rate in the NiSIS competition over 6125 cropped images (ped + non-ped), using the Heuristic Majority Vote method
• 96.4% hit rate over the full DaimlerChrysler datasets: ~15,000 images
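The Sugeno fuzzy integral mentioned above can be sketched for two classifiers as follows. This is a generic textbook form, not necessarily the exact way HLSM-FINT applies it. The source names, the fuzzy measure values, and the classifier scores are hypothetical and are not taken from the thesis.

```python
def sugeno_integral(scores, measure):
    """Sugeno fuzzy integral of per-source scores.

    scores:  dict mapping source name -> support in [0, 1]
    measure: dict mapping frozenset of source names -> fuzzy measure in [0, 1]
             (must be monotone, with the full set mapped to 1.0)
    """
    # Visit sources from the most to the least supportive.
    ranked = sorted(scores, key=scores.get, reverse=True)
    coalition = set()
    best = 0.0
    for source in ranked:
        coalition.add(source)
        # Support of this source, capped by the worth of the coalition so far.
        best = max(best, min(scores[source], measure[frozenset(coalition)]))
    return best


# Hypothetical fuzzy measure: how much each subset of classifiers is trusted.
measure = {
    frozenset({"cnn"}): 0.5,
    frozenset({"hog_svm"}): 0.7,
    frozenset({"cnn", "hog_svm"}): 1.0,
}

# Hypothetical scores for one candidate window.
fused = sugeno_integral({"cnn": 0.4, "hog_svm": 0.9}, measure)
agreed = sugeno_integral({"cnn": 0.8, "hog_svm": 0.8}, measure)
```

When the two classifiers disagree, the fused value is limited by how much the more confident classifier is trusted on its own; when they agree, the fused value equals their common score.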
16. Laser-image registration
18. Semantic Fusion
20. Semantic fusion
[Figure: MRF with weights w_i]
• MRFs given by FOL (first-order logic) formulas
• Weights given by the MRF training (gradient ascent over the conditional log-likelihood)
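For reference, the standard Markov logic network (MLN) formulation is written out below. It is assumed here that the slide refers to this form. In it, n_i counts the true groundings of the i-th FOL formula, w_i is that formula's weight, y denotes the query variables, x the evidence, and \eta is a learning rate introduced only for illustration.

```latex
% Joint distribution of the ground MRF defined by weighted FOL formulas
P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)

% Gradient of the conditional log-likelihood with respect to one weight
\frac{\partial}{\partial w_i} \log P_w(y \mid x)
  = n_i(x, y) - \mathbb{E}_{y' \sim P_w(\cdot \mid x)}\big[ n_i(x, y') \big]

% Gradient ascent update
w_i \leftarrow w_i + \eta \, \frac{\partial}{\partial w_i} \log P_w(y \mid x)
```

The gradient is the difference between the observed count of true groundings and the count expected under the current weights, so training stops changing a weight once the model reproduces the observed counts.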
22. Conclusions
HFI achieved better performance than its individual components, but failed to capture the gist of the fusion
HLSM-FINT succeeded in capturing the intended synergism of the fusion, but had difficulties in hard situations (e.g. occlusion). Parts-based detection has mitigated this issue.
The introduction of the laser sensor brought significant improvement
The proposed fusion method offers two main advantages:
It exploits contextual and spatial relationships among the parts of the object, which lowers the false alarm rate
It is able to detect the object even when the laser fails
The whole system cannot yet run on the fly, although the code has not been optimized. Parallel hardware could provide an interesting platform for making the system faster; this will be the subject of future research.
23. Publications and awards
Journals
OLIVEIRA, L.; NUNES, U.; PEIXOTO, P.; SILVA, M. and MOITA, F. Semantic
Fusion of Laser and Vision in Pedestrian Detection, Journal of Pattern
Recognition, Elsevier, accepted for publication (ISI impact factor: 3.279).
OLIVEIRA, L.; NUNES, U. and PEIXOTO, P. On Exploration of Classifier
Ensemble Synergism in Pedestrian Detection, IEEE Transactions on
Intelligent Transportation Systems, pp. 16-21, 2010 (ISI impact factor:
2.844).
Awards
3rd place in Intel/GV Entrepreneurship and Venture Capital Competition
(2008)
1st place in NiSIS Competition - Best accuracy model over Daimler Chrysler
image dataset. Scheme of Primate's Visual Cortex Cells for Pedestrian
Recognition (2007)
5 international conferences
Editor's Notes
In the beginning of my PhD, the aim was to conceive a pedestrian detection system using monocular vision. As the work progressed, we realized that relying on a single sensor with a single method would be cumbersome, if not impossible, particularly for ITS applications. Our goal then changed to proposing innovative synergistic methods using multiple sensors. Our work is entitled (read...), and was supported by the following organizations...
An object detection system usually comprises four modules. Each of them is a field of research in its own right and could be the subject of deep investigation within a thesis. Therefore, we focused our attention on object detection itself.
Our first proposed ensemble of classifiers.
Our second proposed ensemble of classifiers.
The rationale of the method, at a glance, is to explore the synergism among high-performance detection systems. What we want, therefore, is to find synergism between the representations of background and object.
This is our parts-based HLSM-FINT. We use the most distinctive parts while avoiding representing the limbs, which are hard to detect at certain distances.