2. Outline
• Road-line detection and 3D reconstruction using fisheye cameras
• Vehicle Re-ID for Surround-view Camera System
• SynDistNet: Self-Supervised Monocular Fisheye Camera Distance
Estimation Synergized with Semantic Segmentation for Autonomous
Driving
• Universal Semantic Segmentation for Fisheye Urban Driving Images
• UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a
Generic Framework for Handling Common Camera Distortion Models
• OmniDet: Surround View Cameras based Multi-task Visual Perception
Network for Autonomous Driving
• Adversarial Attacks on Multi-task Visual Perception for Autonomous Driving
3. Road-line detection and 3D reconstruction
using fisheye cameras
• In future ADAS, smart monitoring of the vehicle environment is a key issue.
• Fisheye cameras have become popular as they provide a panoramic view with a few low-cost sensors.
• However, current ADAS systems make limited use of them, as most of the underlying image processing has been designed for perspective views only.
• This article illustrates how the theoretical work done in omnidirectional vision over the past ten years can help tackle this issue.
• To do so, the authors evaluate a simple algorithm for road-line detection based on the unified sphere model in real conditions.
• They first highlight the benefits of using fisheye cameras in a vehicle, then present experimental results on line detection over a set of 180 images,
• and finally show how the 3D positions of the lines can be recovered by triangulation (a projection sketch follows below).
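To make the unified sphere model concrete, below is a minimal sketch of its projection step; the mirror parameter XI and the intrinsics K are illustrative values, not taken from the paper:

```python
import numpy as np

# Minimal sketch of the unified sphere model (illustrative parameters).
XI = 0.9                          # mirror/lens parameter (assumed value)
K = np.array([[300.0, 0.0, 320.0],
              [0.0, 300.0, 240.0],
              [0.0, 0.0, 1.0]])   # generalized camera matrix (assumed values)

def project_unified(X):
    """Project a 3D point X (3,) to pixel coordinates via the unit sphere."""
    Xs = X / np.linalg.norm(X)                        # 1) lift onto the unit sphere
    x, y, z = Xs
    m = np.array([x / (z + XI), y / (z + XI), 1.0])   # 2) perspective step
    return (K @ m)[:2]                                # 3) apply intrinsics

# Samples along a straight 3D road line trace a conic arc in the fisheye image:
line_pts = [np.array([0.5, 1.5, d]) for d in np.linspace(2.0, 20.0, 5)]
print(np.round([project_unified(P) for P in line_pts], 1))
```

A straight 3D line maps to a great circle on the unit sphere, so its fisheye image is a conic arc; once lines are detected and matched across views, their 3D positions can be triangulated.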
8. Vehicle Re-ID for Surround-view Camera System
• Vehicle re-identification (Re-ID) plays a critical role in the perception system of autonomous driving and has attracted increasing attention in recent years.
• However, there is no existing complete solution for the surround-view camera system mounted on the vehicle.
• This scenario poses two main challenges: i) in a single-camera view, it is difficult to recognize the same vehicle across past image frames due to fisheye distortion, occlusion, truncation, etc.; ii) in a multi-camera view, the appearance of the same vehicle varies greatly across camera viewpoints.
• The authors therefore present an integral vehicle Re-ID solution to address these problems.
• Specifically, they propose a quality evaluation mechanism to balance the effects of tracking-box drift and target consistency (see the sketch below).
• They also employ an attention-based Re-ID network, combined with a spatial constraint strategy, to further boost performance across different cameras.
• Experiments demonstrate that the solution achieves state-of-the-art accuracy while running in real time.
• The code and an annotated fisheye dataset will be released for the benefit of the community.
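As a rough illustration of such a quality evaluation mechanism, the sketch below gates Re-ID feature updates by a box-quality score; the score terms, weights, and threshold are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

# Hedged sketch of a tracking-box quality gate: a track's Re-ID feature is
# updated only when the box looks reliable. All weights are assumed values.
def box_quality(iou_with_prev, det_confidence, truncation_ratio):
    # Penalize drift (low IoU), weak detections, and heavily truncated boxes.
    return 0.5 * iou_with_prev + 0.3 * det_confidence + 0.2 * (1.0 - truncation_ratio)

def update_track_feature(track_feat, new_feat, quality, thresh=0.6, momentum=0.9):
    """Exponential-moving-average feature update, gated by box quality."""
    if quality < thresh:
        return track_feat                    # distorted/occluded frame: skip update
    new_feat = new_feat / np.linalg.norm(new_feat)
    fused = momentum * track_feat + (1.0 - momentum) * new_feat
    return fused / np.linalg.norm(fused)
```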
10. Vehicle Re-ID for Surround-view Camera System
Vehicles in a single fisheye camera view. (a) The same vehicle's appearance changes dramatically across consecutive frames, and vehicles tend to occlude each other. (b) Matching errors caused by tracking results. (c) The vehicle center (orange box) stays stable, while the IoU between consecutive frames (yellow box) decreases with movement.
11. Vehicle Re-ID for Surround-view Camera System
The overall framework of vehicle Re-ID in a single camera. Each object is assigned its own tracker to realize Re-ID within a single channel. Tracking templates are initialized from object detection results, and all tracking outputs are post-processed by the quality evaluation module to handle distorted or occluded objects.
12. Vehicle Re-ID for Surround-view Camera System
Samples captured by different cameras. (a) The appearance of the same vehicle varies greatly across cameras; the same color represents the same object. (b) Objects with similar appearance may appear in the same camera view, as shown by the two black vehicles in green boxes.
13. Vehicle Re-ID for Surround-view Camera System
Illustration of the multi-camera Re-ID network. The network has a two-branch parallel structure: the top branch makes the network pay more attention to object regions, while the other branch extracts global features (a minimal sketch follows).
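A minimal sketch of such a two-branch structure, with an illustrative stand-in backbone and layer sizes rather than the paper's exact design:

```python
import torch
import torch.nn as nn

# Hedged sketch of a two-branch Re-ID head: one branch applies spatial
# attention over object regions, the other keeps global features.
class TwoBranchReID(nn.Module):
    def __init__(self, feat_dim=512, embed_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(               # stand-in feature extractor
            nn.Conv2d(3, feat_dim, 7, stride=4, padding=3), nn.ReLU())
        self.attn = nn.Sequential(                   # spatial attention map
            nn.Conv2d(feat_dim, 1, 1), nn.Sigmoid())
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(2 * feat_dim, embed_dim)

    def forward(self, x):
        f = self.backbone(x)
        attended = self.pool(f * self.attn(f)).flatten(1)  # attention branch
        glob = self.pool(f).flatten(1)                     # global branch
        emb = self.fc(torch.cat([attended, glob], dim=1))
        return nn.functional.normalize(emb, dim=1)

emb = TwoBranchReID()(torch.randn(2, 3, 128, 128))   # (2, 256) unit embeddings
```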
14. Vehicle Re-ID for Surround-view Camera System
Projection uncertainty of key points. Ellipses 1 and 2 are the uncertainty ranges of the front and left (right) cameras, respectively.
15. Vehicle Re-ID for Surround-view Camera System
The overall framework of vehicle Re-ID across multiple cameras. For a new target, the Re-ID model first extracts features; a distance metric is then computed between this feature and the features in the gallery. In addition, the spatial constraint strategy is adopted to improve the association, as sketched below.
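A minimal sketch of this matching step, combining appearance distance with a spatial gate; the gating radius and the use of plain cosine distance are assumptions:

```python
import numpy as np

# Hedged sketch of cross-camera association: cosine distance to gallery
# features, gated by a spatial constraint on estimated ground positions.
def match(query_feat, query_pos, gallery_feats, gallery_pos, max_dist_m=3.0):
    q = query_feat / np.linalg.norm(query_feat)
    G = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    appearance = 1.0 - G @ q                      # cosine distance per gallery entry
    spatial = np.linalg.norm(gallery_pos - query_pos, axis=1)
    appearance[spatial > max_dist_m] = np.inf     # reject spatially implausible matches
    best = int(np.argmin(appearance))
    return best if np.isfinite(appearance[best]) else None
```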
17. SynDistNet: Self-Supervised Monocular Fisheye
Camera Distance Estimation Synergized with
Semantic Segmentation for Autonomous Driving
• Self-supervised learning approaches for monocular depth estimation usually suffer from scale ambiguity.
• They also do not generalize well when applied to distance estimation for complex projection models such as fisheye and omnidirectional cameras.
• This work introduces a multi-task learning strategy to improve self-supervised monocular distance estimation on fisheye and pinhole camera images.
• The contribution is threefold:
• Firstly, a distance estimation network architecture using a self-attention-based encoder, coupled with robust semantic feature guidance to the decoder, that can be trained in a one-stage fashion.
• Secondly, a generalized robust loss function is integrated, which improves performance significantly while removing the need for hyperparameter tuning of the reprojection loss (a sketch of this loss follows below).
• Finally, artifacts caused by dynamic objects violating the static-world assumption are reduced using a semantic masking strategy.
• The method significantly improves upon previous work on fisheye images, with a 25% reduction in RMSE.
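The generalized robust loss referenced above follows the form of Barron's general robust loss; the sketch below uses fixed illustrative values for alpha and c, whereas in practice alpha can be learned during training:

```python
import numpy as np

# Hedged sketch following the form of Barron's general robust loss.
def general_robust_loss(x, alpha=1.0, c=0.1):
    """Special cases: alpha=2 approaches L2, alpha=1 is the Charbonnier loss."""
    b = abs(alpha - 2.0) + 1e-8                       # epsilon keeps alpha=2 finite
    d = alpha + 1e-8 if alpha >= 0 else alpha - 1e-8  # epsilon keeps alpha=0 finite
    return (b / d) * ((np.square(x / c) / b + 1.0) ** (alpha / 2.0) - 1.0)

residuals = np.array([-0.3, 0.0, 0.05, 0.4])      # photometric residuals
print(general_robust_loss(residuals, alpha=1.0))  # per-pixel robust penalty
```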
18. SynDistNet: Self-Supervised Monocular Fisheye
Camera Distance Estimation Synergized with
Semantic Segmentation for Autonomous Driving
Overview of the joint prediction of distance D and semantic segmentation M from a single input image I. Compared to previous approaches, semantically guided distance estimation produces sharper depth edges and reasonable distance estimates for dynamic objects.
19. SynDistNet: Self-Supervised Monocular Fisheye
Camera Distance Estimation Synergized with
Semantic Segmentation for Autonomous Driving
Overview of the proposed framework for the joint prediction of distance and semantic segmentation. The upper part (blue blocks) describes the individual steps of the distance estimation, while the green blocks describe the individual steps needed for the prediction of the semantic segmentation. Both tasks are optimized inside a multi-task network using the weighted total loss.
20. SynDistNet: Self-Supervised Monocular Fisheye
Camera Distance Estimation Synergized with
Semantic Segmentation for Autonomous Driving
Application of semantic masking to handle potentially dynamic objects. The dynamic objects inside the segmentation masks from consecutive frames in (b) and (d) are accumulated into a dynamic-object mask, which is used to mask the photometric error (e), as shown in (h). A minimal sketch of this masking follows.
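The sketch below excludes pixels belonging to potentially dynamic classes in either frame from the photometric loss; the class ids are illustrative assumptions:

```python
import numpy as np

# Hedged sketch of semantic masking of the photometric error.
DYNAMIC_CLASSES = [11, 12, 13]                  # e.g. car/pedestrian/cyclist (assumed ids)

def masked_photometric_loss(photo_error, seg_t, seg_t_prev):
    # A pixel is dynamic if either frame labels it with a dynamic class.
    dynamic = np.isin(seg_t, DYNAMIC_CLASSES) | np.isin(seg_t_prev, DYNAMIC_CLASSES)
    valid = ~dynamic                            # keep only static-world pixels
    return photo_error[valid].mean() if valid.any() else 0.0
```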
21. SynDistNet: Self-Supervised Monocular Fisheye
Camera Distance Estimation Synergized with
Semantic Segmentation for Autonomous Driving
Visualization of the proposed network architecture for semantically guided depth estimation, utilizing a self-attention-based encoder and a semantically guided decoder with pixel-adaptive convolutions.
25. Universal Semantic Segmentation for Fisheye
Urban Driving Images
• Semantic segmentation is a critical method in the field of autonomous driving. When performing semantic image segmentation, a wider field of view (FoV), such as that offered by fisheye cameras, helps to obtain more information about the surrounding environment, making automated driving safer and more reliable.
• In this paper, a seven-DoF augmentation method is proposed to transform rectilinear images into fisheye images in a comprehensive way.
• During training, rectilinear images are transformed into fisheye images in seven DoF, which simulates fisheye images taken by cameras at different positions and orientations and with different focal lengths. The results show that training with the seven-DoF augmentation improves the model's accuracy and robustness against differently distorted fisheye data.
• This seven-DoF augmentation provides a universal semantic segmentation solution for fisheye cameras in different autonomous driving applications.
• Specific parameter settings of the augmentation for autonomous driving are also provided.
• Finally, the universal semantic segmentation model is tested on real fisheye images with satisfactory results.
• The code and configurations are released at https://github.com/Yaozhuwa/FisheyeSeg.
26. Universal Semantic Segmentation for Fisheye
Urban Driving Images
Projection model of the fisheye camera. P_W is a point on a rectilinear image that is placed on the x-y plane of the world coordinate system. θ is the angle of incidence of the point relative to the fisheye camera, and P is the imaging point of P_W on the fisheye image, with |OP| = fθ. The relative rotation and translation between the world coordinate system and the camera coordinate system contribute six degrees of freedom; the focal length f is the seventh (see the sketch below).
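A minimal sketch of this projection under the equidistant model |OP| = fθ, with an illustrative pose and principal point:

```python
import numpy as np

# Minimal sketch of the equidistant fisheye model |OP| = f*theta: a world
# point is expressed in the camera frame via a 6-DoF pose (R, t) and mapped
# by its angle of incidence. f is the seventh DoF; all values are illustrative.
def fisheye_project(P_w, R, t, f=300.0, cx=320.0, cy=320.0):
    P_c = R @ P_w + t                                    # world -> camera frame
    theta = np.arctan2(np.linalg.norm(P_c[:2]), P_c[2])  # angle of incidence
    r = f * theta                                        # equidistant mapping
    phi = np.arctan2(P_c[1], P_c[0])                     # azimuth on image plane
    return np.array([cx + r * np.cos(phi), cy + r * np.sin(phi)])

u, v = fisheye_project(np.array([1.0, 0.5, 4.0]), np.eye(3), np.zeros(3))
```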
27. Universal Semantic Segmentation for Fisheye
Urban Driving Images
The six-DoF augmentation. Except for the first row, every image is transformed using a virtual fisheye camera with a focal length of 300 pixels. The letter in brackets indicates the axis along which the camera is translated or around which it is rotated.
28. Universal Semantic Segmentation for Fisheye
Urban Driving Images
Synthetic fisheye images with different focal lengths f.
31. UnRectDepthNet: Self-Supervised Monocular
Depth Estimation using a Generic Framework for
Handling Common Camera Distortion Models
• Rectifying raw camera images to a pinhole model simplifies depth estimation significantly, and thus rectification has been adopted in CNN approaches.
• However, rectification has several side effects, including a reduced field of view (FOV), resampling distortion, and sensitivity to calibration errors.
• This paper proposes a generic scale-aware self-supervised pipeline for estimating depth, Euclidean distance, and visual odometry from unrectified monocular videos.
• It demonstrates a level of precision on the unrectified, barrel-distorted KITTI dataset comparable to that on the rectified KITTI dataset.
• The intuition is that the rectification step can be implicitly absorbed within the CNN model, which learns the distortion model without increased complexity.
• The approach does not suffer from a reduced field of view and avoids the computational cost of rectification at inference time.
• To further illustrate the general applicability of the proposed framework, it is applied to wide-angle fisheye cameras with a 190° horizontal field of view.
• The training framework, UnRectDepthNet, takes the camera distortion model as an argument and adapts the projection and unprojection functions accordingly (see the interface sketch below).
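A minimal sketch of such a camera-model-agnostic interface; the class and function names are illustrative, not the paper's actual code:

```python
import numpy as np

# Hedged sketch: the trainer receives a camera model exposing project and
# unproject, so distortion handling stays inside view synthesis.
class PinholeModel:
    def __init__(self, fx, fy, cx, cy):
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy

    def project(self, P):                  # 3D camera point -> pixel (u, v)
        return np.array([self.fx * P[0] / P[2] + self.cx,
                         self.fy * P[1] / P[2] + self.cy])

    def unproject(self, uv, depth):        # pixel + depth -> 3D camera point
        x = (uv[0] - self.cx) / self.fx
        y = (uv[1] - self.cy) / self.fy
        return depth * np.array([x, y, 1.0])

def synthesize_point(cam_t, cam_tm1, uv, depth, R, t):
    """Warp one pixel from frame t into frame t-1 given relative pose (R, t)."""
    P_t = cam_t.unproject(uv, depth)
    return cam_tm1.project(R @ P_t + t)
```

Swapping in a fisheye model with the same project/unproject interface is what lets the pipeline train on unrectified images.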
32. UnRectDepthNet: Self-Supervised Monocular
Depth Estimation using a Generic Framework for
Handling Common Camera Distortion Models
Depth obtained from a single unrectified (left) and rectified (right) KITTI image. The scale-aware model, UnRectDepthNet, yields precise boundaries and fine-grained depth maps.
33. UnRectDepthNet: Self-Supervised Monocular
Depth Estimation using a Generic Framework for
Handling Common Camera Distortion Models
Illustration of distortion correction in the KITTI and WoodScape datasets. The first row shows a raw KITTI image with barrel distortion and the corresponding rectified image; the red box is used to crop out black pixels in the periphery, causing a loss of FOV. The second row shows a raw WoodScape image with strong fisheye lens distortion and the corresponding rectified image, exhibiting a drastic loss of FOV.
34. UnRectDepthNet: Self-Supervised Monocular
Depth Estimation using a Generic Framework for
Handling Common Camera Distortion Models
For fisheye lenses, the projection is a complex multi-stage process compared to regular lenses; the detailed steps are sketched below:
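A hedged sketch of those stages for a polynomial fisheye model of the kind used for WoodScape-style lenses; the coefficients are placeholders, not calibrated values:

```python
import numpy as np

# Hedged sketch of the multi-stage fisheye projection with a polynomial
# distortion model. k1..k4 are illustrative placeholders.
K_POLY = [1.0, 0.02, -0.005, 0.001]          # r(theta) = k1*t + k2*t^2 + ...

def fisheye_project(P_c, cx=640.0, cy=480.0):
    x, y, z = P_c
    rho = max(np.hypot(x, y), 1e-9)          # 1) radial distance in the x-y plane
    theta = np.arctan2(rho, z)               # 2) angle of incidence to the optic axis
    r = sum(k * theta ** (i + 1) for i, k in enumerate(K_POLY))  # 3) distorted radius
    return np.array([cx + r * x / rho,       # 4) place on the image via azimuth
                     cy + r * y / rho])

print(fisheye_project(np.array([1.0, -0.5, 3.0])))
```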
38. UnRectDepthNet: Self-Supervised Monocular
Depth Estimation using a Generic Framework for
Handling Common Camera Distortion Models
• The UnRectDepthNet training block on the right enables the generic use of the various camera models listed in the black box.
• The distortion is then handled internally in the unprojection and projection steps of the transformation from I_t to I_t−1.
• The paper tests this with KITTI barrel-distorted and WoodScape fisheye-distorted video sequences.
• The block on the left indicates the entire workflow of the training pipeline: the top row depicts the ego masks, M_t→t−1 and M_t→t+1, representing the valid pixel coordinates when synthesizing Î_t−1→t from I_t−1 and Î_t+1→t from I_t+1, respectively.
• The following row showcases the masks used to filter static pixels, obtained after training for two epochs; the black pixels are removed from the reconstruction loss.
• Dynamic objects moving at a speed similar to the ego car's, as well as homogeneous areas, are filtered out to prevent contamination of the reconstruction loss (a loss sketch follows below).
• The third row shows the depth predictions, where the scale ambiguity is resolved using the ego vehicle's odometry data.
• Finally, the top block illustrates the inference output.
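A minimal sketch of how such masks could enter the reconstruction loss, using a per-pixel minimum over the two synthesized views; the plain L1 photometric term (real pipelines typically mix L1 with SSIM) and the mask shapes are simplifying assumptions:

```python
import torch

# Hedged sketch of a masked reconstruction loss: ego masks flag valid warped
# pixels and a static-pixel mask drops dynamic/homogeneous regions.
def reconstruction_loss(I_t, I_hat_prev, I_hat_next, ego_mask, static_mask):
    err_prev = (I_t - I_hat_prev).abs().mean(1, keepdim=True)  # vs Î_t−1→t
    err_next = (I_t - I_hat_next).abs().mean(1, keepdim=True)  # vs Î_t+1→t
    err = torch.min(err_prev, err_next)      # per-pixel minimum reprojection
    valid = ego_mask * static_mask           # keep valid, static pixels only
    return (err * valid).sum() / valid.sum().clamp(min=1.0)

B, C, H, W = 2, 3, 64, 64
loss = reconstruction_loss(torch.rand(B, C, H, W), torch.rand(B, C, H, W),
                           torch.rand(B, C, H, W), torch.ones(B, 1, H, W),
                           torch.ones(B, 1, H, W))
```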
41. OmniDet: Surround View Cameras based Multi-task
Visual Perception Network for Autonomous Driving
• Surround-view fisheye cameras are commonly deployed in automated driving for 360° near-field sensing around the vehicle.
• This work presents a multi-task visual perception network operating on unrectified fisheye images to enable the vehicle to sense its surrounding environment.
• It consists of six primary tasks necessary for an autonomous driving system: depth estimation, visual odometry, semantic segmentation, motion segmentation, object detection, and lens soiling detection.
• The authors demonstrate that the jointly trained model performs better than the respective single-task versions.
• The multi-task model has a shared encoder, providing a significant computational advantage, and synergized decoders where tasks support each other.
• A novel camera-geometry-based adaptation mechanism is proposed to encode the fisheye distortion model at both training and inference time (see the sketch after this list).
• This was crucial to enable training on the WoodScape dataset, which comprises data from different parts of the world collected by 12 different cameras, mounted on three different cars, with different intrinsics and viewpoints.
• Since bounding boxes are not a good representation for distorted fisheye images, object detection is extended to use a polygon with non-uniformly sampled vertices.
• The model is additionally evaluated on standard automotive datasets, namely KITTI and Cityscapes.
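A rough sketch of the idea behind such a camera-geometry input: per-pixel normalized ray coordinates and angle of incidence stacked as extra channels, so one network can adapt to different intrinsics. The channel layout is an assumption, not OmniDet's exact camera-tensor definition:

```python
import torch

# Hedged sketch of per-pixel camera-geometry channels for intrinsics-aware input.
def camera_geometry_tensor(H, W, fx, fy, cx, cy):
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    x = (u - cx) / fx                        # normalized ray direction per pixel
    y = (v - cy) / fy
    theta = torch.atan2(torch.sqrt(x * x + y * y), torch.ones_like(x))
    return torch.stack([x, y, theta])        # (3, H, W)

geom = camera_geometry_tensor(128, 256, fx=300.0, fy=300.0, cx=128.0, cy=64.0)
net_input = torch.cat([torch.rand(3, 128, 256), geom])  # image + geometry channels
```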
43. OmniDet: Surround View Cameras based Multi-task
Visual Perception Network for Autonomous Driving
44. Adversarial Attacks on Multi-task Visual
Perception for Autonomous Driving
• Deep neural networks (DNNs) have accomplished impressive success in various applications in recent years, including autonomous driving perception tasks.
• At the same time, DNNs can be fooled by adversarial attacks.
• This vulnerability raises significant concerns, particularly in safety-critical applications.
• As a result, research into attacking and defending DNNs has gained much attention.
• In this work, detailed adversarial attacks are applied to a diverse multi-task visual perception deep network covering distance estimation, semantic segmentation, motion detection, and object detection.
• The experiments consider both white-box and black-box attacks in targeted and untargeted settings, attacking one task while inspecting the effect on all the others, and additionally inspect the effect of applying a simple defense method (a minimal attack sketch follows).
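As a minimal illustration, the sketch below shows an untargeted white-box FGSM step against one task's loss; the model interface and loss choice are assumptions, not the paper's actual code:

```python
import torch

# Hedged sketch of an untargeted white-box FGSM step on a multi-task model:
# perturb the input along the sign of the gradient of one task's loss.
def fgsm_attack(model, image, task_loss_fn, target, epsilon=0.01):
    image = image.clone().requires_grad_(True)
    loss = task_loss_fn(model(image), target)    # loss of the attacked task
    loss.backward()
    adv = image + epsilon * image.grad.sign()    # untargeted: increase the loss
    return adv.clamp(0.0, 1.0).detach()
```

A targeted variant would instead step against the gradient of a loss toward a chosen target, and black-box variants transfer perturbations from a surrogate model.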
45. Adversarial Attacks on Multi-task Visual
Perception for Autonomous Driving
Adversarial attacks on the OmniDet MTL model. The distance, segmentation, motion, and detection perception tasks are attacked by white-box and black-box methods with targeted and untargeted objectives, resulting in incorrect model predictions.
46. Adversarial Attacks on Multi-task Visual
Perception for Autonomous Driving
Illustration of the baseline multi-task architecture comprising four tasks.
48. Adversarial Attacks on Multi-task Visual
Perception for Autonomous Driving
White-box untargeted, white-box targeted, black-box untargeted, and black-box targeted attacks. Within each group, from top to bottom and left to right: original results, adversarial perturbations, and the impacted results.