The document discusses using surface normal vectors for real-time object detection in autonomous driving applications. The goals are to:
1. Develop a stixel-based stereo vision module running at 15-30 fps for detecting objects and estimating their 3D positions.
2. Validate hypothesis regions of interest (ROIs) using surface normal vectors to improve precision by 10%.
3. Analyze object geometry features and classify objects using surface normal vectors.
Three key points about structure from motion:
1. Given multiple images of 3D points, structure from motion aims to estimate the 3D structure and camera motion from 2D point correspondences across images.
2. For affine cameras, factorization methods can be used to decompose the measurement matrix and obtain the motion and structure matrices up to an affine ambiguity.
3. For projective cameras, an iterative procedure alternates between factorization to estimate motion/structure and re-solving for depths to handle the projective ambiguity. At least 7 point correspondences are needed for a two-camera case.
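The affine factorization in point 2 can be sketched numerically. The sketch below uses synthetic data (all names and dimensions are illustrative, not from the original slides): a noise-free measurement matrix of centered affine image coordinates has rank 3, so a rank-3 SVD recovers motion and structure up to an unknown invertible 3×3 affine transform.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: P random 3D points observed by F affine cameras.
F, P = 3, 10
X = rng.standard_normal((3, P))            # true structure, 3 x P
M_true = rng.standard_normal((2 * F, 3))   # stacked 2x3 affine motion blocks
W = M_true @ X                             # centered measurement matrix, 2F x P

# Tomasi-Kanade-style factorization: rank-3 SVD splits W into motion and
# structure factors.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
M_hat = U[:, :3] * np.sqrt(s[:3])          # estimated motion, 2F x 3
X_hat = np.sqrt(s[:3])[:, None] * Vt[:3]   # estimated structure, 3 x P

# The product is recovered exactly, even though M_hat and X_hat each differ
# from the originals by an invertible 3x3 matrix A: W = (M A)(A^-1 X).
print(np.allclose(M_hat @ X_hat, W))       # True
```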
This document provides an agenda for a presentation on geostatistics for mineral deposits. The presentation will cover topics such as sampling, geostatistics part 1 and 2, and estimations. It will include breaks between sessions and conclude with a discussion period. Sampling topics include an overview of sampling theory and nomographs, while geostatistics sessions will cover variograms, kriging, and simulations. Estimation methods like inverse distance, kriging, and recoverable resources will also be discussed.
This document discusses methods for identifying and removing hidden surfaces when rendering 3D scenes to create a realistic 2D image. It describes two approaches: object-space methods that compare whole objects, and image-space methods that decide visibility point-by-point. It focuses on the depth buffer/z-buffer method, which processes surfaces one point at a time, comparing depth values to determine visibility and storing the color of visible points. It also discusses using scan line coherence to solve hidden surfaces one scan line at a time from top to bottom.
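The z-buffer idea summarized above fits in a few lines. This is a minimal sketch (not the document's code): each pixel keeps the nearest depth seen so far, and a new point is drawn only if it is closer.

```python
import numpy as np

H, W = 4, 4
depth = np.full((H, W), np.inf)      # z-buffer, initialized to "infinitely far"
color = np.zeros((H, W), dtype=int)  # frame buffer

def draw_point(y, x, z, c):
    """Write color c at pixel (y, x) only if z is closer than the stored depth."""
    if z < depth[y, x]:
        depth[y, x] = z
        color[y, x] = c

# Two overlapping "surfaces" rasterized in arbitrary order:
for y in range(4):
    for x in range(4):
        draw_point(y, x, z=2.0, c=1)   # far surface covers everything
for y in range(2):
    for x in range(2):
        draw_point(y, x, z=1.0, c=2)   # near surface covers top-left quadrant

print(color[0, 0], color[3, 3])  # 2 1  -> near surface wins where it overlaps
```

Submission order does not matter: visibility is resolved per pixel by the depth comparison alone.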
This document discusses two-view geometry and epipolar constraints. It defines key concepts like epipolar planes, epipoles, epipolar lines, and the baseline. It explains that corresponding points in two images must lie on corresponding epipolar lines. It describes how the essential and fundamental matrices encode the epipolar geometry and constrain correspondences. It introduces algorithms like the eight-point algorithm and normalized eight-point algorithm to estimate the fundamental matrix from point correspondences. It concludes by explaining how camera calibration allows estimating the essential matrix and extrinsic camera parameters from the fundamental matrix.
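The normalized eight-point algorithm mentioned above can be sketched as follows. The synthetic two-view setup (identity intrinsics, small rotation, 20 points) is an assumption for illustration only; the algorithm itself is the standard one: normalize points, solve the homogeneous system for F, enforce rank 2, denormalize.

```python
import numpy as np

def normalize(pts):
    """Hartley normalization: centroid to origin, mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1.0]])
    ph = np.hstack([pts, np.ones((len(pts), 1))])
    return ph @ T.T, T

def eight_point(x1, x2):
    """Normalized eight-point estimate of the fundamental matrix F."""
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # Each correspondence contributes one row of the homogeneous system A f = 0.
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(p1))])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)   # smallest right singular vector
    U, s, Vt = np.linalg.svd(F)
    F = U @ np.diag([s[0], s[1], 0.0]) @ Vt     # enforce rank 2
    return T2.T @ F @ T1                        # undo normalization

# Synthetic two-view check: 20 points in front of both cameras.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (20, 3)) + np.array([0.0, 0.0, 5.0])
c, s_ = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s_], [0, 1, 0], [-s_, 0, c]])   # small y-axis rotation
t = np.array([1.0, 0.2, 0.1])
x1 = X[:, :2] / X[:, 2:3]
X2 = X @ R.T + t
x2 = X2[:, :2] / X2[:, 2:3]
F = eight_point(x1, x2)
h1 = np.hstack([x1, np.ones((20, 1))])
h2 = np.hstack([x2, np.ones((20, 1))])
resid = np.abs(np.einsum('ni,ij,nj->n', h2, F / np.linalg.norm(F), h1))
print(resid.max() < 1e-8)  # epipolar constraint x2^T F x1 = 0 on clean data
```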
This document provides an overview of geostatistics and variogram analysis. It discusses how the variogram describes the spatial correlation of a phenomenon through parameters like the nugget effect and range. Experimental variograms are calculated from data and theoretical models like spherical, exponential, and power models are fitted. The variogram can identify different correlation scales through nested models. Components at different scales can be extracted through kriging. As an example, fertility data from France is analyzed to filter its large-scale spatial structure.
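The experimental variogram described above uses the classical estimator: half the mean squared difference between values at point pairs separated by (approximately) a given lag. A minimal sketch on hypothetical 1D data:

```python
import numpy as np

def experimental_variogram(coords, values, lags, tol):
    """Classical estimator: gamma(h) = 0.5 * mean[(z_i - z_j)^2] over all
    point pairs whose separation lies within tol of lag h."""
    i, j = np.triu_indices(len(values), k=1)
    d = np.linalg.norm(coords[i] - coords[j], axis=1)  # pair separations
    sq = (values[i] - values[j]) ** 2
    gamma = []
    for h in lags:
        m = np.abs(d - h) < tol
        gamma.append(0.5 * sq[m].mean() if m.any() else np.nan)
    return np.array(gamma)

# Hypothetical smooth 1D signal: spatially correlated, so the variogram
# grows with lag until the correlation range is reached.
x = np.linspace(0, 10, 200)
z = np.sin(x)
g = experimental_variogram(x[:, None], z, lags=[0.5, 2.0], tol=0.1)
print(g[0] < g[1])  # short-lag semivariance is smaller -> True
```

A theoretical model (spherical, exponential, power) would then be fitted to these experimental values before kriging.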
The document describes how snakes, or active contours, can be used to model shapes in images. It discusses how snakes work by defining an energy function along a curve and minimizing that energy to find the optimal curve. The energy includes an internal term based on curvature and an external term from image features. Level sets are used to propagate the curves towards the minimum energy configuration using gradient descent. Key steps include modeling the shape as a curve, defining the energy function, deriving the curve to minimize energy via calculus of variations, and propagating the curves using level sets.
The document summarizes key concepts in image formation, including how light interacts with objects and lenses to form images, and how different imaging systems like the human eye and digital cameras work. It discusses factors that affect image quality such as point spread functions and noise. Methods for analyzing the effects of noise propagation and algorithms on image quality are presented, such as error propagation techniques and Monte Carlo simulations.
Advanced Approach for Slopes Measurement by Non-Contact Optical Technique (IJERA Editor)
This document describes an advanced non-contact optical technique for measuring slopes. It introduces a numerical computation method to acquire surface shapes using optical moiré reflection. The method uses coherent illumination and fine pitch gratings to project a reference grating onto a surface and observe interference fringes. The sensitivity and accuracy of this slope measurement method are high. Equations are derived that relate the measured slopes to the optical and geometric parameters of the system.
The Harris corner detector provides rotation-invariant feature detection by analyzing the eigenvalues of the second-moment (autocorrelation) matrix computed at each point. Scale-invariant detectors like SIFT find maxima of scale-space functions like the Laplacian of Gaussian or Difference of Gaussians to identify keypoints independently across scale. Affine-invariant detectors search for intensity extrema along radial lines from seed points and approximate corresponding image regions with ellipses related by the geometric moment invariants. Descriptors aim to provide distinctive yet invariant representations of local image patches centered on detected keypoints to enable reliable matching across variations.
This document discusses feature extraction and edge detection techniques in computer vision. It provides details on:
1) Edge detection methods including first and second derivative operators, Sobel edge detector, Laplacian of Gaussian (LoG), and Canny edge detector.
2) Edge descriptors such as edge normal, direction, position, and strength.
3) Types of edges like step, ramp, line, and roof edges.
4) Corner detection using an eigenvalue analysis of the gradient matrix within a neighborhood.
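The eigenvalue-based corner detection in point 4 can be sketched with the Harris response, which avoids computing eigenvalues explicitly by using their product (determinant) and sum (trace). The image and window size below are illustrative assumptions:

```python
import numpy as np

def harris_response(img, k=0.04):
    """Corner response R = det(M) - k*trace(M)^2, where M is the 2x2
    gradient (second-moment) matrix averaged over a 3x3 neighborhood."""
    Iy, Ix = np.gradient(img.astype(float))

    def box(a):
        """3x3 box average with edge padding."""
        p = np.pad(a, 1, mode='edge')
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0

    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2    # product of the two eigenvalues
    tr = Sxx + Syy                # sum of the two eigenvalues
    return det - k * tr ** 2

# Bright square on dark background: response is high near corners (two large
# eigenvalues), low or negative on straight edges (one large eigenvalue).
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
print(R[5, 5] > R[5, 10])  # corner beats edge midpoint -> True
```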
Use of Specularities and Motion in the Extraction of Surface Shape (Damian T. Gordon)
This document discusses using specular reflections or "highlights" and motion to determine surface shape. It describes structured highlight inspection which uses a spherical array of point light sources and images of highlights to calculate surface orientation at each point. A structured highlight inspection system extracts highlights from images and uses lookup tables from calibration to reconstruct the 3D surface shape. Stereo highlight techniques can improve on approximations by using two camera views to uniquely determine illumination angles.
This document discusses techniques for fitting parametric models to sets of image features. It begins by introducing the concept of fitting, where a simple parametric model (like a line or circle) is used to represent multiple detected features. It describes how fitting involves choosing the best model, assigning features to model instances, and determining the number of instances. The document then discusses specific issues that arise, such as noise, outliers, missing data, and model selection. It presents techniques for line fitting, including least squares, total least squares, and robust methods like RANSAC. It also discusses fitting general curves and conics using least squares approaches.
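The RANSAC line-fitting approach mentioned above can be sketched as follows. The data, thresholds, and iteration count are illustrative assumptions; the structure is the standard one: sample a minimal set (two points), count inliers, keep the best consensus set, then refit.

```python
import numpy as np

def ransac_line(pts, n_iter=200, thresh=0.05, seed=0):
    """RANSAC sketch: fit a line through two random points per iteration,
    keep the candidate with the most inliers, then refit on the inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(pts), dtype=bool)
    for _ in range(n_iter):
        p, q = pts[rng.choice(len(pts), 2, replace=False)]
        d = q - p
        if np.allclose(d, 0):
            continue
        n = np.array([-d[1], d[0]])
        n = n / np.linalg.norm(n)                 # unit normal of the line
        inliers = np.abs((pts - p) @ n) < thresh  # perpendicular distances
        if inliers.sum() > best.sum():
            best = inliers
    # Final least-squares refit on the consensus set only.
    a, b = np.polyfit(pts[best, 0], pts[best, 1], 1)
    return a, b, best

# Hypothetical data: 50 points on y = 2x + 1 plus 15 gross outliers.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
line_pts = np.column_stack([x, 2 * x + 1])
outliers = rng.uniform(0.0, 5.0, (15, 2))
a, b, mask = ransac_line(np.vstack([line_pts, outliers]))
print(abs(a - 2) < 0.1, abs(b - 1) < 0.1)   # True True
```

An ordinary least-squares fit on the same data would be dragged toward the outliers; the consensus step is what makes the estimate robust.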
Setting the lower order bit plane to zero would have the effect of reducing the number of distinct gray levels by half. This would cause the histogram to become more peaked, with more pixels concentrated in fewer bins.
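The effect can be verified directly: clearing the lowest-order bit maps each gray-level pair {2k, 2k+1} to 2k, so 256 levels collapse to 128. A minimal sketch:

```python
import numpy as np

img = np.arange(256, dtype=np.uint8)   # one pixel at every gray level
out = img & 0b11111110                 # zero bit plane 0 (the LSB plane)
print(len(np.unique(img)), len(np.unique(out)))  # 256 128
```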
This document discusses moments of inertia, which are a measure of an object's resistance to rotational acceleration about an axis. It defines the moment of inertia of an area and introduces key concepts like the parallel axis theorem, radius of gyration, and calculating moments of inertia through integration or for composite areas. Examples are provided to demonstrate calculating moments of inertia for various shapes, including rectangles, triangles, L-shapes, and composite profiles, about different axes. The document also covers determining moments of inertia at the centroidal axes versus other axes using the parallel axis theorem.
The document presents a new approach called Multi-Scale Oriented Patches (MOPS) for multi-image matching. MOPS uses Harris corners detected at multiple image scales and orientations, and descriptor vectors are generated from image patches around each interest point. The approach was tested successfully for panoramic image stitching.
6161103 10.4 Moments of Inertia for an Area by Integration (etcenterrbru)
This document discusses calculating moments of inertia for planar areas using integration. It describes:
1) Choosing a differential element for integration that has size in only one direction to simplify the calculation.
2) The procedure involves specifying a rectangular differential element and orienting it parallel or perpendicular to the axis of rotation.
3) Moments of inertia are calculated through single or double integration, depending on whether the element has thickness in one or two directions.
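The single-integration procedure in the points above can be checked numerically. This sketch uses a horizontal strip element dA = b·dy (size in one direction only) for a b × h rectangle about its base, where the closed form is I_x = b·h³/3; the dimensions are illustrative.

```python
import numpy as np

b, h = 2.0, 3.0
y = np.linspace(0.0, h, 10001)
yc = 0.5 * (y[:-1] + y[1:])          # strip centroid heights (midpoint rule)
dA = b * np.diff(y)                  # strip areas
I_x = np.sum(yc ** 2 * dA)           # numerical single integration of y^2 dA
print(round(I_x, 4), b * h ** 3 / 3)  # 18.0 18.0  (matches the closed form)
```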
The document analyzes geometric distortions in imagery from the HJ-1A/B satellites and compares methods for geometric correction. It finds:
1) HJ-1A/B CCD imagery has both global and local geometric distortions even after initial correction.
2) Polynomial, thin plate splines, and finite element models were tested for correction using control points. Polynomial modeling performed worst while finite element modeling produced the best results with enough evenly distributed points.
3) Finite element modeling is recommended for precise geometric correction of HJ-1A/B imagery as it is a local method that provides stability and accuracy, especially with over 1,000 control points.
This document discusses methods for dynamically calculating daylight glare over the course of a year. It presents three methods: 1) A timestep-by-timestep RADIANCE simulation that serves as a reference method but is very computationally intensive. 2) A simplified daylight glare probability (DGPs) method based only on vertical eye illuminance, similar to average luminance methods. 3) An enhanced simplified DGP method that also considers simplified images to account for peak glare sources in addition to vertical eye illuminance. The enhanced method is validated against full-year RADIANCE simulations using different shading systems. A histogram analysis and glare rating classification is proposed to evaluate dynamic glare results over a year.
6161103 10.5 Moments of Inertia for Composite Areas (etcenterrbru)
1) Moments of inertia for composite areas can be determined by dividing the area into its composite parts, finding the moment of inertia of each part about its centroidal axis and the reference axis using the parallel axis theorem, and taking the algebraic sum.
2) The procedure was demonstrated by calculating the moment of inertia of a composite area made of a rectangle and circle, and another made of three rectangles.
3) For the second example, the cross-sectional area was divided into three rectangles, the moment of inertia of each was found about the x and y axes using the parallel axis theorem, and summed to find the total moment of inertia.
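The composite-area procedure above reduces to a sum of parallel-axis terms. The T-section below is a hypothetical example (not the document's): for each rectangular part, add its centroidal moment of inertia b·h³/12 and the transfer term A·d², then sum.

```python
def rect_Ix_centroid(b, h):
    """Moment of inertia of a b x h rectangle about its own centroidal x axis."""
    return b * h ** 3 / 12.0

# Hypothetical T-section, base taken as the reference x axis:
# flange 6 x 1 with centroid at height d = 4.5, web 1 x 4 at d = 2.0.
parts = [(6.0, 1.0, 4.5), (1.0, 4.0, 2.0)]   # (b, h, d)
I_x = sum(rect_Ix_centroid(b, h) + b * h * d ** 2 for b, h, d in parts)
print(round(I_x, 2))  # 143.33
```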
ANISOTROPIC SURFACES DETECTION USING INTENSITY MAPS ACQUIRED BY AN AIRBORNE L... (grssieee)
The document discusses methods for estimating the spatial anisotropy of surfaces using near-infrared LiDAR intensity maps over coastal environments. It presents two estimators - one based on 1D correlations of columns and lines in sliding windows, and another based on 2D correlations of windows and their transposes. The estimators are evaluated on synthetic data with varying anisotropy, relative anisotropy, and signal-to-noise ratio. The estimators are then applied to LiDAR intensity maps from coastal areas to characterize anisotropic surfaces independently of intensity variations. Future work involves combining these methods with multi-resolution wavelet approaches and comparing LiDAR intensity to DEM and dual-polarization SAR data.
This document summarizes research analyzing the accuracy of 3D models reconstructed from spherical video images. The study acquired 134 spherical images of an indoor environment using a Garmin VIRB 360 camera. Images were extracted from video and reference points were collected. Aerial triangulation, dense image matching, and registration with TLS point clouds were performed. Results showed reconstruction accuracy was higher when reference points were distributed across images rather than clustered in the middle. However, the model had significant noise due to glass surfaces and stitching challenges. While geometric detail and accuracy were low, the 3D model could still enable some applications. Factors like calibration, stitching, resolution, and illumination variability affected the results.
This document describes work on developing spectrum-based regularization approaches for linear inverse problems. The author proposes using a learned distribution of singular values to build regularization models that are better suited for recovering signals correlated with medium frequencies, not just low frequencies as in traditional models. Algorithms are presented for learning the singular value profile from training data and for solving the resulting regularization models. Experimental results demonstrate that the proposed spectrum-learning regularization and SLR-TV hybrid models can provide improved reconstruction accuracy over total variation and Tikhonov regularization.
Determination of System Geometrical Parameters and Consistency between Scans ... (David Scaduto)
Digital breast tomosynthesis (DBT) requires precise knowledge of acquisition geometry for accurate image reconstruction. Further, image subtraction techniques employed in dual-energy contrast-enhanced tomosynthesis require that scans be performed under nearly identical geometrical conditions. A geometrical calibration algorithm is developed to investigate system geometry and geometrical consistency of image acquisition between consecutive digital breast tomosynthesis scans, according to requirements for dual-energy contrast-enhanced tomosynthesis. Investigation of geometrical accuracy and consistency on a prototype DBT unit reveals accurate angular measurement, but potentially clinically significant differences in acquisition angles between scans. Further, a slight gantry wobble is observed, suggesting the need for incorporation of gantry wobble into image reconstruction, or improvements to system hardware.
Photogrammetry - Space Resection by Collinearity Equations (Ahmed Nassar)
Space resection is commonly used to determine the exterior orientation parameters (position and orientation relative to an exterior coordinate system) associated with one or more photos, based on measurements of ground control points (GCPs). Because space resection is a nonlinear problem, existing methods linearize the collinearity condition and use an iterative least-squares process to determine the final solution. The process also requires initial approximate values of the unknown parameters, some of which must be estimated by a separate least-squares solution.
Detailed Description on Cross Entropy Loss Function (범준 김)
The document discusses cross entropy loss function which is commonly used in classification problems. It derives the theoretical basis for cross entropy by formulating it as minimizing the cross entropy between the predicted probabilities and true labels. For binary classification problems, cross entropy is shown to be equivalent to maximizing the likelihood of the training data which can be written as minimizing the binary cross entropy. This concept is extended to multiclass classification problems by defining the prediction as a probability distribution over classes and label as a one-hot encoding.
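The binary and multiclass forms described above can be written in a few lines, and the one-hot multiclass form reduces to the binary form when K = 2. A minimal sketch with illustrative predictions:

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Mean negative log-likelihood of labels y under predicted probs p."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_cross_entropy(Y, P, eps=1e-12):
    """Multiclass form: Y is one-hot (N x K), rows of P are distributions."""
    return -np.mean(np.sum(Y * np.log(np.clip(P, eps, 1.0)), axis=1))

y = np.array([1, 0, 1])
p = np.array([0.9, 0.1, 0.8])
print(round(binary_cross_entropy(y, p), 4))       # 0.1446

# Recast as two-class one-hot labels and distributions: same loss.
Y = np.column_stack([1 - y, y])
P = np.column_stack([1 - p, p])
print(round(categorical_cross_entropy(Y, P), 4))  # 0.1446
```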
This document discusses digital image processing of satellite images. It describes how satellite images are represented digitally as pixels with brightness values. It outlines three main categories of image processing: image rectification and restoration to correct distortions; enhancement to improve visual interpretation; and information extraction to automate feature identification. Specific techniques discussed include image rectification, contrast enhancement, spatial filtering, edge enhancement, and band ratioing. The overall aim is to analyze satellite images both visually and quantitatively.
The term "full frame" is used by users of digital single-lens reflex cameras (DSLRs) as shorthand for an image sensor format the same size as 35 mm format (36 mm × 24 mm) film.
Panoramic imagery is created either by digitally stitching together multiple images taken from the same position (left/right, up/down) or by rotating a camera with conventional optics and an area or line sensor.
- The document describes using the MicMac photogrammetry software to create orthophotographs, point clouds, and digital surface models (DSMs) from Pleiades satellite images.
- It reviews two papers on using MicMac and discusses their methods, which include image orientation, tie point calculation, sparse point cloud generation, georeferencing, and dense image correlation to produce outputs.
- The results section shows sample outputs including tie points, sparse point clouds, DEMs, shaded relief images, and orthophotographs produced for case studies in the two papers.
Optic Flow Estimation by Deep Learning outlines several key concepts in optical flow estimation including:
- Optical flow is the apparent motion of brightness patterns in images. Estimating optical flow involves making assumptions like brightness constancy and spatial coherence.
- Classical algorithms like Lucas-Kanade and Horn-Schunck use techniques like regularization, coarse-to-fine processing, and descriptor matching to address challenges like the aperture problem, large displacements, and occlusions.
- Recent deep learning approaches like FlowNet, DeepFlow, and EpicFlow use convolutional neural networks to directly learn optical flow, achieving state-of-the-art performance on benchmarks. These approaches combine descriptor matching and variational optimization with learned convolutional features.
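The brightness-constancy reasoning behind the classical algorithms above can be sketched with a single-window Lucas-Kanade solve: every pixel supplies one linearized constraint Ix·u + Iy·v + It = 0, and the flow (u, v) is the least-squares solution. The synthetic shifted pattern is an illustrative assumption.

```python
import numpy as np

def lucas_kanade(I1, I2):
    """Single-window Lucas-Kanade sketch: least-squares solve of
    [Ix Iy][u v]^T = -It, from the brightness constancy assumption."""
    Iy, Ix = np.gradient(I1.astype(float))
    It = I2.astype(float) - I1.astype(float)
    A = np.column_stack([Ix.ravel(), Iy.ravel()])
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Smooth synthetic pattern shifted by one pixel in x:
x, y = np.meshgrid(np.arange(32), np.arange(32))
I1 = np.sin(0.3 * x) + np.cos(0.2 * y)
I2 = np.sin(0.3 * (x - 1)) + np.cos(0.2 * y)   # pattern moved +1 px in x
u, v = lucas_kanade(I1, I2)
print(round(u, 1), round(v, 1))  # expect approximately 1.0 and 0.0
```

Aggregating all pixels into one system sidesteps the aperture problem only because this synthetic pattern has gradients in both directions; real implementations solve per-window and use coarse-to-fine pyramids for large displacements.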
This presentation is an analysis of the paper "SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing".
This document discusses digital image processing of satellite images. It describes how satellite images are represented digitally as pixels with brightness values. It outlines three main categories of image processing: image rectification and restoration to correct distortions; enhancement to improve visual interpretation; and information extraction to automate feature identification. Specific techniques discussed include image rectification, contrast enhancement, spatial filtering, edge enhancement, and band ratioing. The overall aim is to analyze satellite images both visually and quantitatively.
The frame camera is used by users of digital single-lens reflex cameras (DSLRs) as a shorthand for an image sensor format which is the same size as 35mm format (36 mm × 24 mm) film.
Panoramic imagery is created either by digitally stitching together multiple images from the same position (left/right, up/down) or by rotating a camera with conventional optics, and an area or line sensor.
- The document describes using the MicMac photogrammetry software to create orthophotographs, point clouds, and digital surface models (DSMs) from Pleiades satellite images.
- It reviews two papers on using MicMac and discusses their methods, which include image orientation, tie point calculation, sparse point cloud generation, georeferencing, and dense image correlation to produce outputs.
- The results section shows sample outputs including tie points, sparse point clouds, DEMs, shaded relief images, and orthophotographs produced for case studies in the two papers.
Optic Flow Estimation by Deep Learning outlines several key concepts in optical flow estimation including:
- Optical flow is the apparent motion of brightness patterns in images. Estimating optical flow involves making assumptions like brightness constancy and spatial coherence.
- Classical algorithms like Lucas-Kanade and Horn-Schunck use techniques like regularization, coarse-to-fine processing, and descriptor matching to address challenges like the aperture problem, large displacements, and occlusions.
- Recent deep learning approaches like FlowNet, DeepFlow, and EpicFlow use convolutional neural networks to directly learn optical flow, achieving state-of-the-art performance on benchmarks. These approaches combine descriptor matching, variational optimization,
This presentation is an analysis of the paper,"SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing"
A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUEScscpconf
In the first study [1], a combination of K-means, watershed segmentation method, and Difference In Strength (DIS) map were used to perform image segmentation and edge detection
tasks. We obtained an initial segmentation based on K-means clustering technique. Starting from this, we used two techniques; the first is watershed technique with new merging
procedures based on mean intensity value to segment the image regions and to detect their boundaries. The second is edge strength technique to obtain accurate edge maps of our images without using watershed method. In this technique: We solved the problem of undesirable over segmentation results produced by the watershed algorithm, when used directly with raw data images. Also, the edge maps we obtained have no broken lines on entire image. In the 2nd study level set methods are used for the implementation of curve/interface evolution under various forces. In the third study the main idea is to detect regions (objects) boundaries, to isolate and extract individual components from a medical image. This is done using an active contours to detect regions in a given image, based on techniques of curve evolution, Mumford–Shah functional for segmentation and level sets. Once we classified our images into different intensity regions based on Markov Random Field. Then we detect regions whose boundaries are not necessarily defined by gradient by minimize an energy of Mumford–Shah functional forsegmentation, where in the level set formulation, the problem becomes a mean-curvature which will stop on the desired boundary. The stopping term does not depend on the gradient of the image as in the classical active contour. The initial curve of level set can be anywhere in the image, and interior contours are automatically detected. The final image segmentation is one
closed boundary per actual region in the image.
The document discusses various techniques for image segmentation including discontinuity-based approaches, similarity-based approaches, thresholding methods, region-based segmentation using region growing and region splitting/merging. Key techniques covered include edge detection using gradient operators, the Hough transform for edge linking, optimal thresholding, and split-and-merge segmentation using quadtrees.
This document discusses algorithms for visible surface determination (VSD) to determine which surfaces are visible during 3D rendering. It describes two main approaches: image precision, which operates at the display resolution, and object precision, which operates at the object level. It also discusses techniques like the depth buffer and depth sorting algorithms. The depth buffer method uses two buffers - a depth buffer and frame buffer - to track pixel depth and color values. It processes objects and surfaces, testing pixels and updating the buffers. Depth sorting paints surfaces in order of decreasing depth to resolve visibility.
The document discusses image segmentation techniques including thresholding. Thresholding divides an image into foreground and background regions based on pixel intensity values. Global thresholding uses a single threshold value for the entire image, while adaptive or local thresholding uses variable thresholds that change across the image. Multilevel thresholding can extract objects within a specific intensity range using multiple threshold values. The Hough transform is also presented as a way to connect disjointed edge points and detect shapes like lines in an image.
Real-time large scale dense RGB-D SLAM with volumetric fusion extends KinectFusion to larger scales. It represents the volumetric reconstruction as a rolling buffer that translates as the camera moves. It estimates camera pose through combined geometric and photometric constraints. It closes loops by non-rigidly deforming the map with constraints from loop closures and jointly optimizes the camera poses and map. Evaluation shows it produces large, globally consistent, real-time dense reconstructions.
Fisheye Omnidirectional View in Autonomous DrivingYu Huang
This document discusses several papers related to using omnidirectional/fisheye camera views for autonomous driving applications. The papers propose methods for tasks like image classification, object detection, scene understanding from 360 degree camera data. Specific approaches discussed include graph-based classification of omnidirectional images, learning spherical convolutions for 360 degree imagery, spherical CNNs, and networks for scene understanding and 3D object detection using around view monitoring camera systems.
안녕하세요 딥러닝 논문읽기 모임 입니다! 오늘 소개할 논문은 3D관련 업무를 진행 하시는/ 희망하시는 분들의 필수 논문인 VoxelNET 입니다.
발표자료:https://www.slideshare.net/taeseonryu/mcsemultimodal-contrastive-learning-of-sentence-embeddings
안녕하세요! 딥러닝 논문읽기 모임입니다.
오늘은 자율 주행, 가정용 로봇, 증강/가상 현실과 같은 다양한 응용 분야에서 중요한 문제인 3D 포인트 클라우드에서의 객체 탐지에 대한 획기적인 진전을 소개하고자 합니다. 이를 위해 'VoxelNet'이라는 새로운 3D 탐지 네트워크에 대해 알아보겠습니다.
1. 기존 방법의 한계
기존의 많은 노력은 수동으로 만들어진 특징 표현, 예를 들어 새의 눈 시점 투영 등에 집중해 왔습니다. 하지만 이러한 방법들은 LiDAR 포인트 클라우드와 영역 제안 네트워크(RPN) 사이의 연결을 효과적으로 수행하기 어렵습니다.
2. VoxelNet의 혁신적 접근법
VoxelNet은 3D 포인트 클라우드를 위한 수동 특징 공학의 필요성을 없애고, 특징 추출과 바운딩 박스 예측을 단일 단계, end-to-end 학습 가능한 깊은 네트워크로 통합합니다. VoxelNet은 포인트 클라우드를 균일하게 배치된 3D 복셀로 나누고, 새롭게 도입된 복셀 특징 인코딩(VFE) 레이어를 통해 각 복셀 내의 포인트 그룹을 통합된 특징 표현으로 변환합니다.
3. 효과적인 기하학적 표현 학습
이 방식을 통해 포인트 클라우드는 서술적인 체적 표현으로 인코딩되며, 이는 RPN에 연결되어 탐지를 생성합니다. VoxelNet은 다양한 기하학적 구조를 가진 객체의 효과적인 구별 가능한 표현을 학습합니다.
4. 성능 평가
KITTI 자동차 탐지 벤치마크에서의 실험 결과, VoxelNet은 기존의 LiDAR 기반 3D 탐지 방법들을 큰 차이로 능가했습니다. 또한, LiDAR만을 기반으로 한 보행자와 자전거 탐지에서도 희망적인 결과를 보였습니다.
VoxelNet의 도입은 3D 포인트 클라우드에서의 객체 탐지를 혁신적으로 개선하고 있으며, 이 분야에서의 미래 발전에 중요한 영향을 미칠 것으로 기대됩니다.
오늘 논문 리뷰를 위해 이미지처리 허정원님이 자세한 리뷰를 도와주셨습니다 많은 관심 미리 감사드립니다!
https://youtu.be/yCgsCyoJoMg
본 논문은 single depth map으로부터의 정확한 3D hand pose estimation을 목표로 한다. 3D hand pose estimation은 HCI, AR등의 기술을 구현함에 있어서 매우 중요한 기술이다. 이를 위해 많은 연구자들이 정확도를 높이기 위해 여러 방법을 제시하였지만, 여전히 손가락들의 비슷한 생김새, 가려짐, 다양한 손가락의 움직임으로 인한 복잡성 때문에 정확도를 올리는데 한계가 있었다. 본 논문은 기존 방법들의 한계를 극복하기 위해 기존 방법들이 사용하는 입력 형태와 출력 형태를 바꾸었다. 2d depth image를 입력으로 받아 hand joint의 3D coordinate를 직접 regress하는 대부분의 기존 방법들과는 달리, 제안하는 모델은 3D voxelized depth map을 입력으로 받아 3D heatmap을 출력한다. 이를 위해 encoder-decoder 형식의 3D CNN을 사용하였고, 달라진 입력과 출력 형태로 인해 제안하는 모델은 널리 사용되는 3개의 3d hand pose estimation dataset, 1개의 3d human pose estimation dataset에서 가장 높은 성능을 내었다. 또한 ICCV 2017에서 주최된 HANDS 2017 challenge에서 우승 하였다.
This paper presents a technique to create panoramic video in real-time by stitching together video frames from multiple webcams. The system has two stages: an initialization stage and a real-time stage. In the initialization stage, features are detected and matched between webcam frames to compute a perspective matrix describing their geometric relationship. In the real-time stage, frames are registered and blended using the perspective matrix to display the combined wide field of view in real-time. The technique was demonstrated using two ordinary webcams and allows for inexpensive panoramic video without specialized hardware.
Data Processing Using THEOS Satellite Imagery for Disaster Monitoring (Case S...NopphawanTamkuan
This content shows the specification of THEOS/Thaichote (Thai satellite), information of flood in Vietnam, comparison of pre-disaster image (Landsat-8) and post-disaster image (THEOS) by different methods such as color composite, thresholding, and segmentation for flooded areas classification.
An Efficient Algorithm for the Segmentation of Astronomical ImagesIOSR Journals
This document proposes an efficient algorithm for segmenting celestial objects from astronomical images. The algorithm uses multiple preprocessing steps including removing bright point sources, stationary wavelet transform, total variation denoising, and adaptive histogram equalization. Level set segmentation is then used as the key technique for segmentation. Preprocessing helps overcome issues like noise, weak object edges, and low contrast. Level set segmentation can segment objects while retaining their texture and shape information for subsequent classification. The algorithm is tested on various celestial objects and shown to effectively segment them.
Digital images can be manipulated mathematically by treating pixel brightness values as numbers. This document discusses various digital image processing techniques including:
1. Image rectification to correct geometric distortions and calibrate radiometric data. This involves techniques like geometric corrections to adjust for sensor distortions and radiometric corrections to standardize brightness values.
2. Image enhancement techniques like contrast stretching to increase image contrast and improve feature detection. Methods include linear stretches, histogram equalization, and logarithmic transforms.
3. Local operations called spatial filtering that modify pixel values based on neighboring pixels. This can emphasize or de-emphasize certain image features or spatial frequencies to enhance details or reduce noise.
The document discusses refraction modeling and experiments to better understand refraction effects. Key points:
- Experiments using reciprocal simultaneous observations found refraction coefficients (k values) close to zero rather than the typical 0.13 value, indicating refraction is more complex.
- Adjusting a monitoring network found the optimal k value was -2.12 rather than the assumed 0.13, and a stochastic model accounting for zenith angle noise improved results.
- An experiment measuring a triangle closer to a wall found increasing angular errors correlated with decreasing distance from the wall, supporting an asymmetrical refraction model.
- A proposed generalized 3D refraction model accounts for gradient direction, station orientation, and as
Scan conversion algorithms convert graphical primitives defined in terms of coordinates into pixels on a raster display. The midpoint line algorithm uses integer calculations to scan convert lines of varying slopes. Area primitives like rectangles are filled by iterating through pixels within the boundary. Anti-aliasing aims to reduce jagged edges by weighting pixel intensities based on overlap with graphical elements.
The document summarizes object and face detection techniques including:
- Lowe's SIFT descriptor for specific object recognition using histograms of edge orientations.
- Viola and Jones' face detector which uses boosted classifiers with Haar-like features and an attentional cascade for fast rejection of non-faces.
- Earlier face detection work including eigenfaces, neural networks, and distribution-based methods.
Data-Driven Motion Estimation With Spatial AdaptationCSCJournals
The pel-recursive computation of 2-D optical flow raises a wealth of issues, such as the treatment of outliers, motion discontinuities and occlusion. Our proposed approach deals with these issues within a common framework. It relies on the use of a data-driven technique called Generalised Cross Validation to estimate the best regularisation scheme for a given pixel. In our model, the regularisation parameter is a general matrix whose entries can account for different sources of error. The motion vector estimation takes into consideration local image properties following a spatially adaptive approach where each moving pixel is supposed to have its own regularisation matrix. Preliminary experiments indicate that this approach provides robust estimates of the optical flow.
This document presents a summary of a research paper on shape from focus. Shape from focus is a technique that uses differences in focus levels across a series of images to obtain depth information and reconstruct the 3D shape of an object. The paper develops a sum-modified Laplacian (SML) operator to provide local measures of image focus quality. The SML operator is applied to images captured at different focus levels to determine focus measures. A depth estimation algorithm then interpolates the focus measures to obtain accurate depth estimates for each point. Results show the SML operator provides robust focus measures and the overall shape from focus approach can effectively reconstruct shapes, making it suitable for challenging visual inspection problems.
Computer Graphics - Lecture 03 - Virtual Cameras and the Transformation Pipeline💻 Anton Gerdelan
Slides from when I was teaching CS4052 Computer Graphics at Trinity College Dublin in Ireland.
These slides aren't used any more so they may as well be available to the public!
There are some mistakes in the slides, I'll try to comment below these.
Stixel based real time object detection for ADAS using surface normal
1. Stixel-based Real Time Object detection for
ADAS using Surface Normal Vectors
CVLab. at Inha Univ.
Tae-Kang Woo
2016.12.
Keywords: Surface vector, Detection validation, Disparity confidence (mid-level representation confidence), Stixel, Real-time ADAS, 3D reconstruction, Extrinsic parameter estimation, Object detection
2. Contents
I. Problem Definition
1. Problem
2. Goal
3. Related work
II. System Design
III. Surface Normal Vector
1. SNV map using integral image
2. Local SNV computation
IV. Super-SNV
1. S-SNV computation
2. Parametric issue
3. Adaptive mean shift
V. Evaluation
1. Test scenario & Database
2. Evaluation method
3. Experimental result & discussion
VI. Conclusion
3. Introduction
❖ Flow chart of ADAS Stereo vision
[Pipeline diagram] Left image → disparity map → V-disparity → ground-line fit → ground removal in the disparity map (remove pixels below the ground plane, i.e., the V-disparity line) → height constraint (remove sky regions more than 2.5 m above the ground) → stixel segmentation
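As a rough sketch of the ground-removal stage above, the V-disparity step can be written in a few lines. This is a minimal illustration under assumed conventions, not the authors' implementation: the disparity map is a dense NumPy array, a least-squares fit stands in for whatever line-fitting the pipeline actually uses, and `v_disparity`, `fit_ground_line`, and `ground_mask` are hypothetical helper names.

```python
import numpy as np

def v_disparity(disp, max_d):
    """V-disparity histogram: row v, column d counts pixels in image row v
    whose (integer-truncated) disparity equals d."""
    h = disp.shape[0]
    vd = np.zeros((h, max_d + 1), dtype=np.int32)
    for v in range(h):
        d = disp[v]
        d = d[(d >= 0) & (d <= max_d)].astype(int)
        np.add.at(vd[v], d, 1)
    return vd

def fit_ground_line(vd, min_votes=5):
    """Least-squares fit d = a*v + b through each row's dominant disparity."""
    rows, peaks = [], []
    for v in range(vd.shape[0]):
        d = int(np.argmax(vd[v]))
        if vd[v, d] >= min_votes:
            rows.append(v)
            peaks.append(d)
    a, b = np.polyfit(rows, peaks, 1)
    return a, b

def ground_mask(disp, a, b, tol=1.0):
    """Pixels whose disparity matches the fitted ground line within tol."""
    v = np.arange(disp.shape[0])[:, None]
    return np.abs(disp - (a * v + b)) <= tol
```

Pixels matching the fitted line are treated as ground; everything above it survives into stixel segmentation.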
5. Purpose of Stereo vision
❖ Goal
▪ Stixel-based stereo vision module for real-time ADAS
• 15 fps on TX1 or 30 fps on PC
▪ Stable hypothesis ROIs for the recognition module
• 10% improvement in precision rate by removing erroneous ROIs
▪ Object geometry feature analysis & classification using SNVs
• Propose 3 classes of forward situation based on surface normal vectors
• Hypothesis ROI validation using surface-vector object classification
▪ Extrinsic parameter output between camera and ROI (ground, object)
• The representative vector of the surface vectors in the ROI is selected
6. Previous approach – object detection
❖ A disparity map refinement to enhance weakly-textured urban environment data (2013)
▪ Research on overcoming disparity errors is the most active direction.
▪ Defines a refinement term using edge-based segmentation
▪ Performance improves, but processing takes more than 1,700 ms
7. Previous approach – object detection
❖ Disparity confidence map (2010)
▪ Confidence map based on matching cost to enable disparity validation
▪ Drawback: reliability is not provided at the object level
▪ Reliability is not provided for interpolated disparity estimates
[Figure] Image / Disparity / Confidence
8. Previous approach – object detection
❖ U-V disparity map analysis (2010, 2015)
▪ Super-pixel method based on 2D projection
▪ Assumes that disparity exists within the object; the object is detected by fitting a line on each axis after projection
▪ Multiple errors occur when the disparity is interpolated to improve performance
[Figure] Test image / Result / disparity / V-disparity / U-disparity
9. Problem – object detection
❖ Hypothesis Error
Stereo matching error rates of the Deep Embedding algorithm:

Error      Out-Noc    Out-All
2 pixels   24.83 %    28.39 %
3 pixels   17.14 %    20.78 %
4 pixels   13.18 %    16.70 %
5 pixels   10.79 %    14.14 %

Inevitable errors remain in reflective regions, despite the state-of-the-art method.
Z. Chen, X. Sun, Y. Yu, L. Wang and C. Huang: A Deep Visual Correspondence Embedding Model for Stereo Matching Costs. ICCV 2015.
10. System design
❖ INNOVATION: Develop a validation method with physical meaning

[System diagram] Left/right images → stereo matching → disparity map → stixel estimation → stixel segmentation → (objectness) stixels. In parallel, the disparity map feeds surface normal computation → surface normal map → NERV (Normal-based Efficient Re-Validation) hypothesis validation → bounding boxes with 3D position, distance, and multiple ROIs. By-products: depth features for RGB-D processing and extrinsic parameters (camera pose).
11. Hypothesis ROI validation
❖ Surface normal vector
[Figure] A triangle of 3D points A, B, C and its surface normal, shown both in the image and in real (metric) coordinates.

N = AB × AC

Equivalently, with tangent vectors u and v:

N = u × v = ( u_y·v_z − u_z·v_y, u_z·v_x − u_x·v_z, u_x·v_y − u_y·v_x )
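The cross-product computation above fits in a few lines of code. A minimal sketch with NumPy (the helper name `surface_normal` is mine, not from the slides):

```python
import numpy as np

def surface_normal(a, b, c):
    """Unit normal of the plane through 3D points a, b, c: N = AB x AC."""
    u = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
    v = np.asarray(c, dtype=float) - np.asarray(a, dtype=float)
    # explicit cross product, matching the component formula above
    n = np.array([u[1] * v[2] - u[2] * v[1],
                  u[2] * v[0] - u[0] * v[2],
                  u[0] * v[1] - u[1] * v[0]])
    return n / np.linalg.norm(n)
```

For three points spanning a flat patch in the x-z plane (y pointing down, a common camera-coordinate convention), the normal comes out parallel to the y axis.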
12. Hypothesis ROI validation
❖ Surface normal at Hypothesis Error
[Figure] In a region with disparity errors, the triangle A, B, C spans erroneous 3D points, so the resulting surface normal vector of the error area is distorted.
13. Hypothesis ROI validation
❖ Assumption
▪ Normal vectors can represent differences in object attributes.
▪ Each normal vector carries information: {position, direction, scale}
▪ Surface normals can be divided into 3 classes (i.e., object, ground, error)
❖ Goal
▪ Find the differences among the three classes of surface normals

Classes of surface normal vectors — Object, Ground, and Error: each class is characterized by its surface normals' position {x, y, z}, direction {i, j, k}, and scale s.
14. Hypothesis ROI validation
❖ Feature of normal in error region
▪ Higher density than at other positions
▪ Their normals have no horizontal component
15. SNV Map Computation
❖ How to compute normal vector? – Naïve SNV
[Figure] 3D point cloud from disparity → surface normals. Processing time: 29 ms.
• It has generally been considered difficult to compute surface vectors over a whole image in real time.
16. SNV Map Computation
❖ How to compute normal vectors efficiently? – Integral image
S(I_O, m, n, r) = 1/(4r²) · ( I_O(m+r, n+r) − I_O(m−r, n+r) − I_O(m+r, n−r) + I_O(m−r, n−r) )

Horizontal tangent u:
u_x = ( P_x(m+r, n) − P_x(m−r, n) ) / 2
u_y = ( P_y(m+r, n) − P_y(m−r, n) ) / 2
u_z = ( S(I_Pz, m+1, n, r−1) − S(I_Pz, m−1, n, r−1) ) / 2

Vertical tangent v:
v_x = ( P_x(m, n+r) − P_x(m, n−r) ) / 2
v_y = ( P_y(m, n+r) − P_y(m, n−r) ) / 2
v_z = ( S(I_Pz, m, n+1, r−1) − S(I_Pz, m, n−1, r−1) ) / 2

N = u × v

• where P_x, P_y, and P_z are two-dimensional maps storing the x-, y-, and z-coordinates of the organized point cloud, and I_Pz is the integral image of the z-components of the point cloud.
• The radius r is adaptive: R(m, n) = min( B(m, n), T(m, n)/2 ), a smoothing radius depending on depth and depth change.

Holzer, S., Rusu, R. B., Dixon, M., Gedikli, S., & Navab, N. (2012, October). Adaptive neighborhood selection for real-time surface normal estimation from organized point cloud data using integral images. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on (pp. 2684-2689). IEEE.

Processing time: 28 ms on a 2.26 GHz Intel Core 2 Quad CPU with 4 GB RAM (VGA image, 307,200 pixels)
Processing time: 12 ms on a 2.7 GHz Intel Core i7 CPU with 16 GB RAM (VGA image, 307,200 pixels)
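A compact sketch of the integral-image normal estimation described above. It is a simplification under assumptions of my own: a fixed interior radius r ≥ 2 replaces the adaptive R(m, n), and `integral_image`, `box_term`, and `normal_at` are my names, not from the slides or the cited paper.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[m, n] = sum of img[:m+1, :n+1]."""
    return np.cumsum(np.cumsum(np.asarray(img, dtype=float), axis=0), axis=1)

def box_term(ii, m, n, r):
    """S(I, m, n, r): smoothed value from four integral-image lookups."""
    return (ii[m + r, n + r] - ii[m - r, n + r]
            - ii[m + r, n - r] + ii[m - r, n - r]) / (4.0 * r * r)

def normal_at(Px, Py, Pz_ii, m, n, r):
    """Unit surface normal at interior pixel (m, n) of an organized point
    cloud; u and v are smoothed central differences along the two axes."""
    u = np.array([(Px[m + r, n] - Px[m - r, n]) / 2.0,
                  (Py[m + r, n] - Py[m - r, n]) / 2.0,
                  (box_term(Pz_ii, m + 1, n, r - 1)
                   - box_term(Pz_ii, m - 1, n, r - 1)) / 2.0])
    v = np.array([(Px[m, n + r] - Px[m, n - r]) / 2.0,
                  (Py[m, n + r] - Py[m, n - r]) / 2.0,
                  (box_term(Pz_ii, m, n + 1, r - 1)
                   - box_term(Pz_ii, m, n - 1, r - 1)) / 2.0])
    nvec = np.cross(u, v)
    return nvec / np.linalg.norm(nvec)
```

Each smoothed term costs only four lookups regardless of r, which is what makes per-frame timings in the tens of milliseconds plausible.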
17. SNV Map Computation
❖ How to compute normal vectors in real time? – Local SNV
N = u × v

PROCESS
1. Compute SNVs in the stixel ROI
2. Convert coordinates to angles
3. Find the mode angle using a histogram
4. Remove outliers with adaptive mean shift
5. Select the converged value as the main extrinsic parameter
18. Ground information in surface vector
❖ How to compute normal vector in real time? – Local SNV
[Figure] Ground-region SNVs: original image and full search (28 ms) vs. sampling the ground at 2-pixel and 5-pixel intervals (5 ms).
19. Ground information in surface vector
❖ How to compute normal vector efficiently? – Local SNV + Super-SNV
[Figure] Ground-normal component ranges: X ≈ −0.0–0.0, Y ≈ −1.0–−0.9, Z ≈ −0.1–−0.0.
• The surface angle of the object can be calculated, so automatic computation of extrinsic parameters is possible.
• Estimated pitch angle: −1.89°
20. Direction of ground and object
❖ Stixel-area-based SNV
[Figure] Direction of the ground vector vs. direction of the object vector, plotted on x-y-z axes. Pitch angle: −1.89°
21. Super Surface Normal vector
❖ SSNV Selection method
• Although the resolution on the Cartesian component is a constant 0.1, the corresponding angular resolution varies as cos⁻¹(x). Computing the histogram directly over the component would therefore significantly affect reliability.
• The histogram is therefore computed over the angle θ, with a uniform interval of 0.1°.
[Figure] Histogram over θ from −5° to 5°, interval 0.1°.

Conversion from the (x, y, z) Cartesian coordinates to pitch, yaw, and roll angles:
pitch(°) = 90° − cos⁻¹( z / √(y² + z²) )
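The pitch conversion and the angle-domain histogram can be sketched as below. This is an illustrative rendering, not the authors' code: the helper names and the −5°…5° bin range (taken from the figure) are assumptions.

```python
import numpy as np

def pitch_deg(n):
    """pitch(deg) = 90 - arccos(z / sqrt(y^2 + z^2)) for a normal (x, y, z)."""
    x, y, z = n
    return 90.0 - np.degrees(np.arccos(z / np.hypot(y, z)))

def pitch_histogram(normals, lo=-5.0, hi=5.0, step=0.1):
    """Histogram over theta with a uniform 0.1-degree bin width, so the
    resolution is constant in angle rather than in the vector component."""
    angles = np.array([pitch_deg(n) for n in normals])
    edges = np.arange(lo, hi + step, step)
    hist, _ = np.histogram(angles, bins=edges)
    mode = edges[np.argmax(hist)] + step / 2.0  # centre of the peak bin
    return hist, mode
```

A ground normal pointing along −y tilted by θ toward −z maps to pitch −θ, so a flat road slightly pitched down reports a small negative angle, as in the −1.89° example.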
22. Issue of vector interval
❖ How to set the interval of the vectors?
▪ The optimal interval is a trade-off between execution time and accuracy
▪ Super surface normal vector confidence
[Figure] SNVs sampled at a 5-pixel interval — what should the standard for choosing the interval be?
23. Issue of vector interval
❖ How to set the interval of the vectors? – Processing time
23
10.71
6.53
5.36
4.48
4.12
4.01
6.83
1.7
0.97
0.71
0.52
0.41
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
1
2
3
4
5
6
Time(ms)
Interval(pixel)
1 2 3 4 5 6
SNV cal time 10.71 6.53 5.36 4.48 4.12 4.01
Super-SNV time 6.83 1.7 0.97 0.71 0.52 0.41
Suface normal processing time
SNV cal time Super-SNV time
• The surface vector must be computed over the entire image while maintaining real-time performance.
• Since the surface-vector calculation is a sub-module of the whole pipeline, it cannot be allowed to be costly.
• Intervals below 3 pixels are difficult to use, because only intervals of 3 pixels or more fit within the 6.6 ms budget, i.e. 10% of the real-time frame budget.
• Can reliability still be guaranteed at intervals of 3 pixels or more?
Real-time boundary: 6.6 ms
24. Issue of vector interval
❖ How to set the interval of the vectors? – Accuracy
▪ Histogram mode
• Inliers are defined as the ±α range around the histogram mode (the bin with the maximum count), with α chosen so that the range covers 95% of the total number of vectors. The mean and standard deviation of the inlier vectors are then computed.
• Here the inlier mean is -1.94° and the variance is (64.06?).
[Figure: histogram with mode -1.3°; 95% inlier range defined around the mode; inlier mean -1.94°]
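The mode-based inlier statistic described above can be sketched like this (an illustrative reconstruction; the function name and the way α is grown bin-by-bin are my assumptions):

```python
import numpy as np

def mode_inlier_stats(pitches, bin_width=0.1, coverage=0.95):
    """Mode-based inlier statistics for pitch angles in degrees.

    Builds a histogram with the given bin width, takes the bin with the
    maximum count as the mode, then grows a symmetric +/- alpha window
    around it until it covers `coverage` of all samples. Returns the
    mode and the mean/std of the inlier vectors.
    """
    pitches = np.asarray(pitches, dtype=float)
    lo, hi = pitches.min(), pitches.max()
    bins = max(1, int(np.ceil((hi - lo) / bin_width)))
    counts, edges = np.histogram(pitches, bins=bins)
    k = int(np.argmax(counts))
    mode = 0.5 * (edges[k] + edges[k + 1])
    # Grow alpha until the window holds the requested fraction of samples.
    alpha = bin_width
    while np.mean(np.abs(pitches - mode) <= alpha) < coverage:
        alpha += bin_width
    inliers = pitches[np.abs(pitches - mode) <= alpha]
    return mode, inliers.mean(), inliers.std()
```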
25. Issue of vector interval
❖ How to set the interval of the vectors? – Accuracy
▪ Histogram Distribution
• Because of the physical observation geometry of the camera, disparity and surface vectors cannot be extracted on the far side of curved surfaces; the distribution therefore becomes skewed and biased in one direction.
• The vectors of the ideal ground, rather than the camera's observation of it, would be expected to follow a normal distribution, but the sample is distorted by the observation.
• For such a skewed distribution the representative values order as mode < median < mean (or the reverse), and the median is known to lie near the mean, roughly at the point dividing the mean-to-mode interval into three equal parts.
[Figure: camera at height h_c with optical axis, vertical FOV, angle 𝜃, and the ground plane]
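The mode/median/mean ordering claimed above can be illustrated on synthetic data (an illustration only, not from the slides; the lognormal sample stands in for a skewed pitch distribution):

```python
import numpy as np

# A right-skewed sample: lognormal, for which mode < median < mean.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=0.6, size=200_000)

mean = sample.mean()
median = np.median(sample)
# Estimate the mode as the center of the fullest histogram bin.
counts, edges = np.histogram(sample, bins=200)
k = int(np.argmax(counts))
mode = 0.5 * (edges[k] + edges[k + 1])
# Ordering: mode < median < mean, with the median nearer the mean
# (Pearson's empirical rule: mean - mode ≈ 3 * (mean - median)).
```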
26. Issue of vector interval
❖ How to set the interval of the vectors? – Accuracy
▪ Advanced mean-shift
1. Using the mode as the initial value, find the mean of the inliers within a surrounding radius 𝑟, where 𝑟 is determined as the range containing 50% of the total number of vectors (in this example, about ±3.5°).
2. Perform step 1 again, centered on the new mean value.
3. Repeat steps 1 and 2 until the mean changes by 0.01° or less.
[Figure: initial value -1.3°; 50% inlier range of radius r around the center; after the first step: -1.13°]
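The three steps above can be sketched as follows. This is a reconstruction under assumptions: the function name is mine, and I recompute the 50%-coverage radius at each step, which the slide leaves ambiguous.

```python
import numpy as np

def advanced_mean_shift(pitches, init_mode, coverage=0.5, tol=0.01, max_iter=100):
    """Mean-shift refinement of a pitch estimate (degrees).

    Starting from the histogram mode, repeatedly take the mean of the
    samples within a radius r around the current center, where r is
    chosen to cover `coverage` of all samples, until the center moves
    by `tol` degrees or less.
    """
    pitches = np.asarray(pitches, dtype=float)
    n = len(pitches)
    center = float(init_mode)
    for _ in range(max_iter):
        # Radius covering the requested fraction of all samples.
        d = np.sort(np.abs(pitches - center))
        r = d[int(coverage * (n - 1))]
        new_center = pitches[np.abs(pitches - center) <= r].mean()
        if abs(new_center - center) <= tol:
            return new_center
        center = new_center
    return center
```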
27. Issue of vector interval
❖ How to set the interval of the vectors? – Accuracy
▪ Advanced mean-shift
• Confidence measure: entropy
𝐻(𝑋) = 𝐸[𝐼(𝑋)] = Σ_{k=1}^{K} 𝑃(𝑋 = 𝑘) ln( 1 / 𝑃(𝑋 = 𝑘) ) = −Σ_{k=1}^{K} 𝑃(𝑋 = 𝑘) ln 𝑃(𝑋 = 𝑘)
𝐻(mean) ≥ 𝐻(mode) > 𝐻(mean-shift)
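The entropy confidence measure above is the Shannon entropy of the vector histogram; a sketch (function name mine — a tighter, more concentrated distribution yields lower entropy and hence higher confidence):

```python
import numpy as np

def histogram_entropy(values, bins):
    """Shannon entropy H(X) = -sum_k P(X=k) ln P(X=k) of a value
    histogram. Pass shared bin edges when comparing distributions,
    so that both are discretized the same way."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())
```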
28. Experiment Result
❖ Surface Normal Result
[Figure: left image, disparity, normal vector map, point cloud, pitch histogram, and ground direction]
32. Experiment result
❖ Result on KITTI Dataset
                    Stixel only   Stixel with SNV
Number of objects   9873          9873
True positive       9579          9562
False positive      1805          396
False negative      294           311
Precision           0.841         0.960
Recall              0.970         0.968
F1 measure          0.901         0.964
[Figure: STIXEL detections (true positives and false positives) vs. SNV-validated detections (false positives removed, one false negative)]
Average time: 24 ms
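The precision, recall, and F1 figures in the table can be reproduced from the raw counts (a sketch; the function name is mine):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from true positive, false positive,
    and false negative counts, as in the KITTI comparison table."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Stixel only vs. stixel with SNV, using the table's counts:
print(detection_metrics(9579, 1805, 294))  # ≈ (0.841, 0.970, 0.901)
print(detection_metrics(9562, 396, 311))   # ≈ (0.960, 0.968, 0.964)
```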
33. Discussion about Experiment result
❖ Discussion on KITTI Dataset
                    Stixel only   Stixel with SNV
Number of objects   9873          9873
True positive       9579          9562
False positive      1805          396
False negative      294           311
Precision           0.841         0.960
Recall              0.970         0.968
F1 measure          0.901         0.964
[Figure: STIXEL detections vs. SNV-validated detections, annotated as follows]
• A detection that should not have been removed was removed, because its region contains many ground-direction vectors (a new false negative).
• A detection that should be removed is removed correctly, because it contains many ground-direction vectors.
• A detection that should be removed was not removed, because it contains many object-direction vectors (a remaining false positive).
36. Conclusion
❖Hypothesis ROI validation
• The method uses surface normals to find the difference in direction between an object and other surfaces.
• Surface normals can be computed by either a global or a local method.
• The method depends on only two inputs: a disparity map and a bounding box.
• It can therefore be applied to any 3D recognition system to validate its results.
• The method appears to resolve disparity errors in reflective regions.
• In the global method, the surface normal map can also be used by a recognition module.
❖Future work
• Develop a 3D ROI for ADAS based on collision-risk analysis.