Disentangling and Vectorization: A 3D Visual Perception Approach for Autonomous Driving Based on Surround-View Fisheye Cameras
SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround View Fisheye Cameras
FisheyeDistanceNet++: Self-Supervised Fisheye Distance Estimation with Self-Attention, Robust Loss Function and Camera View Generalization
An Online Learning System for Wireless Charging Alignment using Surround-view Fisheye Cameras
RoadEdgeNet: Road Edge Detection System Using Surround View Camera Images
2. Outline
• Disentangling and Vectorization: A 3D Visual Perception Approach for
Autonomous Driving Based on Surround-View Fisheye Cameras
• SVDistNet: Self-Supervised Near-Field Distance Estimation on
Surround View Fisheye Cameras
• FisheyeDistanceNet++: Self-Supervised Fisheye Distance Estimation
with Self-Attention, Robust Loss Function and Camera View
Generalization
• An Online Learning System for Wireless Charging Alignment using
Surround-view Fisheye Cameras
• RoadEdgeNet: Road Edge Detection System Using Surround View
Camera Images
3. Disentangling and Vectorization: A 3D Visual Perception Approach for
Autonomous Driving Based on Surround-View Fisheye Cameras
• 3D visual perception for vehicles with a surround-view fisheye camera system is a
critical and challenging task for low-cost urban autonomous driving.
• Existing monocular 3D object detection methods do not perform well enough on
fisheye images for mass production, partly due to the lack of 3D datasets of such
images.
• The paper overcomes the difficulty of acquiring large-scale, accurately 3D-labeled
ground-truth data by breaking the 3D object detection task down into sub-tasks
such as the vehicle's contact point detection, type classification, re-identification,
and unit assembling.
• In particular, it proposes the concept of a Multidimensional Vector to encode the
usable information generated in different dimensions and stages, instead of a
descriptive representation such as a BEV box or a cube of eight points.
• Experiments on real fisheye images demonstrate that the solution achieves
state-of-the-art accuracy while running in real time.
5. Disentangling and Vectorization: A 3D Visual Perception Approach for
Autonomous Driving Based on Surround-View Fisheye Cameras
The inputs are four channels of fisheye images, which construct a surround-view environment for the ego-vehicle. The final
output is a vector map containing the shape of each object under the bird's-eye view.
6. Disentangling and Vectorization: A 3D Visual Perception Approach for
Autonomous Driving Based on Surround-View Fisheye Cameras
Diagram of translating the pixel
coordinates of the contact points in
the fisheye image to physical
coordinates.
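As a rough illustration of this pixel-to-ground translation, the sketch below unprojects a contact-point pixel with the OpenCV fisheye model and intersects the resulting viewing ray with the road plane. The function name, the extrinsic conventions, and the use of cv2.fisheye are assumptions, not the paper's actual implementation:

```python
import cv2
import numpy as np

def contact_point_to_ground(uv, K, D, R_cam_to_veh, t_cam_in_veh):
    """Unproject a wheel contact pixel onto the road plane (z = 0 in the vehicle frame).

    uv: (u, v) pixel in the fisheye image; K, D: OpenCV fisheye intrinsics/distortion;
    R_cam_to_veh (3x3), t_cam_in_veh (3,): assumed extrinsic conventions."""
    pts = np.array([[[uv[0], uv[1]]]], dtype=np.float64)
    x, y = cv2.fisheye.undistortPoints(pts, K, D)[0, 0]  # normalized image coordinates
    ray_veh = R_cam_to_veh @ np.array([x, y, 1.0])       # viewing ray in the vehicle frame
    s = -t_cam_in_veh[2] / ray_veh[2]                    # intersect with the ground plane z = 0
    return t_cam_in_veh + s * ray_veh                    # physical (x, y, 0) contact point
```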
7. Disentangling and Vectorization: A 3D Visual Perception Approach for
Autonomous Driving Based on Surround-View Fisheye Cameras
The specific composition of the Multidimensional Vector.
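The paper's exact vector layout is not reproduced here; the following hypothetical container only illustrates the kind of per-object information (contact points, category, ID, heading) that such a Multidimensional Vector could carry:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class MultidimensionalVector:
    # All field names are hypothetical, chosen to mirror the sub-tasks described above.
    object_id: int                                          # assigned by the ReID module
    category: str                                           # e.g. vehicle type classification
    contact_points: List[Tuple[float, float]] = field(default_factory=list)  # ground points, vehicle frame
    heading: Optional[float] = None                         # radians, from wheel/bumper geometry
    source_channels: List[int] = field(default_factory=list)  # cameras that observed the object
```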
8. Disentangling and Vectorization: A 3D Visual Perception Approach for
Autonomous Driving Based on Surround-View Fisheye Cameras
The ReID module is divided into three stages.
The first stage fuses the vectors of the three
branches, the second stage generates an ID
for each object in each channel, and the
third stage merges the vectors describing
the same object into one vector.
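A minimal sketch of the third stage, assuming stages one and two have already produced per-channel detections tagged with a global ID (the dict layout and helper name are illustrative):

```python
from collections import defaultdict

def merge_by_id(detections):
    """Stage-three sketch: merge per-channel vectors that share a global ID.

    `detections` is a list of dicts like {"id": 7, "points": [(x, y), ...]} as produced
    by stages one and two (branch fusion and ID assignment, not shown here)."""
    groups = defaultdict(list)
    for det in detections:
        groups[det["id"]].append(det)
    # Pool the contact points of each identity across channels into one vector.
    return [{"id": oid, "points": [p for d in dets for p in d["points"]]}
            for oid, dets in groups.items()]
```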
9. Disentangling and Vectorization: A 3D Visual Perception Approach for
Autonomous Driving Based on Surround-View Fisheye Cameras
Channel fusion and category fusion. (1) Wheels from different channels
are fused into one vehicle; α and β are both 0.5, the weights
assigned to the front wheels of the two vehicles. (2) Wheels and a bumper
from different categories are fused into one vehicle.
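The weighted channel fusion in (1) amounts to averaging the two cameras' estimates of the same wheel; a minimal sketch with the α = β = 0.5 weights from the figure:

```python
def fuse_wheel_estimates(p_a, p_b, alpha=0.5, beta=0.5):
    """Fuse two channels' estimates (x, y) of the same front wheel by weighted average."""
    assert abs(alpha + beta - 1.0) < 1e-9, "weights must sum to one"
    return (alpha * p_a[0] + beta * p_b[0],
            alpha * p_a[1] + beta * p_b[1])
```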
10. Disentangling and Vectorization: A 3D Visual Perception Approach for
Autonomous Driving Based on Surround-View Fisheye Cameras
Two cases of calculating the heading angles of the target vehicles: (a) two
wheels on one side are visible; (b) only one wheel and one bumper are visible.
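A sketch of both cases, assuming contact points are already in a common ground frame; case (b) additionally assumes the bumper segment is perpendicular to the vehicle axis, which the paper may handle differently:

```python
import math

def heading_from_two_wheels(rear, front):
    """Case (a): heading from two same-side wheel contact points (x, y)."""
    return math.atan2(front[1] - rear[1], front[0] - rear[0])

def heading_from_bumper(bumper_p1, bumper_p2):
    """Case (b): the bumper segment is taken as perpendicular to the vehicle axis,
    so the heading is the normal of the bumper line; the 180-degree ambiguity
    would be resolved with the visible wheel (omitted here)."""
    dx = bumper_p2[0] - bumper_p1[0]
    dy = bumper_p2[1] - bumper_p1[1]
    return math.atan2(-dx, dy)  # one of the two normals of the bumper segment
```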
11. Disentangling and Vectorization: A 3D Visual Perception Approach for
Autonomous Driving Based on Surround-View Fisheye Cameras
12. Disentangling and Vectorization: A 3D Visual Perception Approach for
Autonomous Driving Based on Surround-View Fisheye Cameras
Time consumption tests on the hardware platform.
13. SVDistNet: Self-Supervised Near-Field Distance
Estimation on Surround View Fisheye Cameras
• The depth estimation model must work on a variety of cameras fitted to millions of cars
with varying camera geometries. Even within a single car, intrinsics vary due to
manufacturing tolerances.
• Deep learning models are sensitive to these changes, and it is practically infeasible to
train and test on each camera variant.
• The paper presents camera-geometry adaptive multi-scale convolutions, which utilize the
camera parameters as a conditional input, enabling the model to generalize to previously
unseen fisheye cameras (see the sketch after this list).
• The architecture is further improved with pairwise and patchwise vector-based
self-attention encoder networks.
• Generalization across different camera viewing angles is demonstrated through
extensive experiments.
• To enable comparison with other approaches, the front camera data is evaluated on the
KITTI dataset (pinhole camera images), achieving state-of-the-art performance among
self-supervised monocular methods.
• Baseline code and dataset will be made public: https://github.com/valeoai/WoodScape
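A simplified sketch of the camera-tensor idea (CAM-Convs-style): per-pixel maps derived from the intrinsics are concatenated to the features, so one network can serve many cameras. SVDistNet's actual camera tensor also encodes fisheye distortion coefficients and is applied at multiple scales; the names and exact channel set here are assumptions:

```python
import torch
import torch.nn as nn

def camera_tensor(B, H, W, fx, fy, cx, cy):
    """Per-pixel maps derived from the intrinsics (scaled to the feature resolution)."""
    u = torch.arange(W, dtype=torch.float32).view(1, 1, 1, W).expand(B, 1, H, W)
    v = torch.arange(H, dtype=torch.float32).view(1, 1, H, 1).expand(B, 1, H, W)
    return torch.cat([(u - cx) / fx,          # normalized offset from the principal point
                      (v - cy) / fy,
                      u / W, v / H], dim=1)   # absolute coordinate channels

class CameraAwareConv(nn.Module):
    """Convolution conditioned on the camera tensor by simple channel concatenation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 4, out_ch, kernel_size=3, padding=1)

    def forward(self, x, fx, fy, cx, cy):
        B, _, H, W = x.shape
        ct = camera_tensor(B, H, W, fx, fy, cx, cy).to(x.device)
        return self.conv(torch.cat([x, ct], dim=1))
```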
14. SVDistNet: Self-Supervised Near-Field Distance
Estimation on Surround View Fisheye Cameras
The surround-view distance estimation framework
employs a single network on images from multiple
cameras. Surround-view coverage of geometric
information is obtained for an autonomous vehicle
by utilizing and post-processing the distance maps
from all cameras.
16. SVDistNet: Self-Supervised Near-Field Distance
Estimation on Surround View Fisheye Cameras
• High-level overview of the surround-view self-supervised distance estimation
framework, which employs semantic guidance as well as camera-geometry
adaptive convolutions (orange blocks).
• The framework comprises training units for self-supervised distance estimation
(blue blocks) and semantic segmentation (green blocks).
• The camera tensor Ct assists SVDistNet in producing distance maps across
multiple camera viewpoints and makes the network camera-independent.
• Ct can also be applied to standard camera models.
• The multi-task loss from [9] weights and optimizes both modalities at the
same time (a weighting sketch follows this list).
• By post-processing the predicted distance maps in 3D space, the proposed
framework obtains surround-view geometric information.
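The referenced loss formulation is not reproduced here; one common way to weight two task losses jointly is learned homoscedastic-uncertainty weighting, sketched below as an assumption rather than the paper's exact method:

```python
import torch
import torch.nn as nn

class MultiTaskLoss(nn.Module):
    """Learned log-variance (homoscedastic uncertainty) weighting of two task losses."""
    def __init__(self):
        super().__init__()
        self.log_var_dist = nn.Parameter(torch.zeros(()))  # distance-estimation weight
        self.log_var_seg = nn.Parameter(torch.zeros(()))   # segmentation weight

    def forward(self, loss_dist, loss_seg):
        return (torch.exp(-self.log_var_dist) * loss_dist + self.log_var_dist
                + torch.exp(-self.log_var_seg) * loss_seg + self.log_var_seg)
```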
17. SVDistNet: Self-Supervised Near-Field Distance
Estimation on Surround View Fisheye Cameras
Overview of the proposed network architecture for semantically guided self-supervised distance estimation.
It consists of a shared vector-based self-attention encoder and task-specific decoders. The encoder is a
self-attention network with pairwise and patchwise variants, while the decoder uses pixel-adaptive
convolutions; both are complemented by camera-geometry convolutions.
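A toy version of pairwise vector self-attention: the attention weight is a per-channel vector computed from q_i - k_j by an MLP rather than a scalar dot product. The real encoder operates on local footprints with positional encodings and also has a patchwise variant; this dense form is only for illustration:

```python
import torch
import torch.nn as nn

class PairwiseVectorAttention(nn.Module):
    """Attention weights are per-channel vectors computed from q_i - k_j by an MLP,
    instead of scalar dot products; dense over N tokens, so O(N^2 C) memory."""
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.rel_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):                       # x: (B, N, C)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        rel = q.unsqueeze(2) - k.unsqueeze(1)   # (B, N, N, C) pairwise relations
        w = torch.softmax(self.rel_mlp(rel), dim=2)
        return (w * v.unsqueeze(1)).sum(dim=2)  # per-channel weighted aggregation
```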
19. FisheyeDistanceNet++: Self-Supervised Fisheye Distance Estimation with
Self-Attention, Robust Loss Function and Camera View Generalization
Distance estimation results of the same network evaluated on four different fisheye cameras of
a surround-view camera system. One can see that the SVDistNet model generalizes well across
different viewing angles and consistently produces high-quality distance outputs.
21. An Online Learning System for Wireless Charging
Alignment using Surround-view Fisheye Cameras
• In parallel to the electrification of the vehicular fleet, automated parking systems that
make use of surround-view camera systems are becoming increasingly popular.
• This work proposes a system based on the surround-view camera architecture to
detect, localize, and automatically align the vehicle with the inductive charge pad.
• The visual design of the charge pads is not standardized and not necessarily known
beforehand; therefore, a system that relies on offline training will fail in some situations.
• Thus, the paper proposes a self-supervised online learning method that leverages the
driver's actions when manually aligning the vehicle with the charge pad and combines it
with weak supervision from semantic segmentation and depth to learn a classifier that
auto-annotates the charge pad in the video for further training. In this way, when faced
with a previously unseen charge pad, the driver needs to align the vehicle manually only
a single time.
• As the charge pad is flat on the ground, it is not easy to detect from a distance. Thus,
the paper proposes using a Visual SLAM pipeline to learn landmarks relative to the
charge pad to enable alignment from a greater range.
22. An Online Learning System for Wireless Charging
Alignment using Surround-view Fisheye Cameras
23. An Online Learning System for Wireless Charging
Alignment using Surround-view Fisheye Cameras
Various commercial charge pads.
ArUco patterns on a charging station.
24. An Online Learning System for Wireless Charging
Alignment using Surround-view Fisheye Cameras
Three-task perception stack.
25. An Online Learning System for Wireless Charging
Alignment using Surround-view Fisheye Cameras
Synthetic analysis of the pixel size of charge pads
at various distances from the vehicle.
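A back-of-envelope version of this analysis under a pinhole approximation (width_px ≈ f · W / Z); the pad size and focal length below are illustrative, not the paper's numbers, and real fisheye projection shrinks the footprint further off-axis:

```python
def pad_pixel_width(pad_width_m=0.75, distance_m=5.0, f_px=320.0):
    """Pinhole footprint of a pad of width W at distance Z: width_px ~= f * W / Z."""
    return f_px * pad_width_m / distance_m

for z in (1.0, 2.0, 5.0, 10.0):
    print(f"{z:4.1f} m -> {pad_pixel_width(distance_m=z):5.1f} px")
```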
26. An Online Learning System for Wireless Charging
Alignment using Surround-view Fisheye Cameras
Employing Visual SLAM to predict the position of the charge pad in
an image corresponding to a previous vehicle position.
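A minimal sketch of that prediction step, assuming the SLAM map stores the charge-pad landmark in the world frame and using a pinhole projection for simplicity (the deployed system would use the fisheye model):

```python
import numpy as np

def project_pad_into_view(T_world_cam, p_pad_world, K):
    """Project the stored charge-pad landmark into the current camera view.

    T_world_cam: 4x4 camera pose from Visual SLAM; p_pad_world: pad position (3,);
    K: 3x3 intrinsics."""
    p_h = np.append(p_pad_world, 1.0)
    p_cam = (np.linalg.inv(T_world_cam) @ p_h)[:3]  # world -> camera frame
    u, v, _ = K @ (p_cam / p_cam[2])                # perspective projection
    return u, v
```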
27. An Online Learning System for Wireless Charging
Alignment using Surround-view Fisheye Cameras
Overall system architecture for online charge pad learning and vehicle-charge pad alignment.
28. An Online Learning System for Wireless Charging
Alignment using Surround-view Fisheye Cameras
29. An Online Learning System for Wireless Charging
Alignment using Surround-view Fisheye Cameras
30. Qualitative results of charge pad
detection and tracking in different
scenarios, namely outdoor (top),
indoor (second row), synthetic (third row),
and augmented Valeo logo (bottom).
31. An Online Learning System for Wireless Charging
Alignment using Surround-view Fisheye Cameras
Examples of visual features in an image
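For illustration, sparse features of the kind a Visual SLAM pipeline tracks as landmarks can be extracted with ORB in OpenCV; the image paths are placeholders, and the paper's pipeline may use a different feature type:

```python
import cv2

# "frame.png" is a placeholder path for one surround-view frame.
img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(img, None)
vis = cv2.drawKeypoints(img, keypoints, None, color=(0, 255, 0))
cv2.imwrite("features.png", vis)
```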
32. RoadEdgeNet: Road Edge Detection System
Using Surround View Camera Images
• A road edge is defined as the borderline where there is a change from the road
surface to a non-road surface.
• Most currently existing solutions for road edge detection use only a single front
camera to capture the input image; hence, the system's performance and
robustness suffer.
• An efficient CNN trained on a very diverse dataset yields more than 98% semantic
segmentation accuracy for the road surface, which is then used to obtain road edge
segments for the individual camera images.
• Afterward, the raw road edges from the multiple cameras are transformed into world
coordinates, and RANSAC curve fitting is used to obtain the final road edges on both
sides of the vehicle for driving assistance (see the sketch after this list).
• Road edge extraction is also very computationally efficient, since it reuses the same
generic road segmentation output that is computed along with the other semantic
segmentation tasks for driving assistance and autonomous driving.
• The RoadEdgeNet algorithm is designed for automated driving in series production,
and the paper discusses the various challenges and limitations of the current algorithm.
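A minimal sketch of the RANSAC curve-fitting step using scikit-learn; the production system presumably uses its own implementation, and the polynomial degree and parameters are illustrative:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import RANSACRegressor

def fit_road_edge(points_xy, degree=2):
    """Robust polynomial fit y = f(x) to one side's edge points in vehicle coordinates;
    RANSAC rejects outliers caused by segmentation noise."""
    x = points_xy[:, 0].reshape(-1, 1)  # longitudinal position
    y = points_xy[:, 1]                 # lateral offset of the edge
    model = make_pipeline(PolynomialFeatures(degree), RANSACRegressor())
    model.fit(x, y)
    return model                        # model.predict(x_query) yields the fitted edge
```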
33. RoadEdgeNet: Road Edge Detection System
Using Surround View Camera Images
Overall Road Edge Detection System Architecture
34. RoadEdgeNet: Road Edge Detection System
Using Surround View Camera Images
RoadEdgeNet Architecture
35. RoadEdgeNet: Road Edge Detection System
Using Surround View Camera Images
Road edge candidate points (left), edge points from curve fitting (middle), and road edges overlaid on the front camera image (right).
36. RoadEdgeNet: Road Edge Detection System
Using Surround View Camera Images
Obtaining candidate left and right road edge points.
Note that this figure illustrates the front-view camera
image; the same logic is applied to the rear-view image.
For the mirror cameras, only Steps 1-3 are performed,
as a mirror camera has edge points on either the left
or the right side only.
Step 1: Get the far-left road pixel (x1, y1) and the
far-right road pixel (x2, y2) from the segmented
binary image.
Step 2: Scan each row from left to right. When a road
pixel is reached, store the point and go to the next
row, skipping the other pixels in that row.
Step 3: Repeat Step 2 for each row until y1 is reached,
and skip everything below y1.
Step 4: Scan each row from right to left. When a road
pixel is reached, store the point and go to the next
row, skipping all other pixels in that row.
Step 5: Repeat Step 4 for each row until y2 is reached,
and skip everything below y2.
Step 6: Transform the points into vehicle coordinates
with the image-to-world transformation for further
processing.
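A direct transcription of Steps 1-5 on a binary road mask, assuming "below y1/y2" means larger row indices; Step 6, the image-to-world transform, is left to downstream processing:

```python
import numpy as np

def candidate_edge_points(road_mask):
    """Steps 1-5 on a binary road mask (H x W, 1 = road): scan each row inward from
    both image borders and keep the first road pixel as a candidate edge point."""
    ys, xs = np.nonzero(road_mask)
    if len(xs) == 0:
        return [], []
    y1 = ys[np.argmin(xs)]               # row of the far-left road pixel (Step 1)
    y2 = ys[np.argmax(xs)]               # row of the far-right road pixel
    left, right = [], []
    for r in range(road_mask.shape[0]):
        cols = np.nonzero(road_mask[r])[0]
        if len(cols) == 0:
            continue
        if r <= y1:
            left.append((int(cols[0]), r))    # first road pixel from the left (Steps 2-3)
        if r <= y2:
            right.append((int(cols[-1]), r))  # first road pixel from the right (Steps 4-5)
    return left, right                   # Step 6 (image-to-world) handled downstream
```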