2. Outline
■ Micro-Doppler Based Human-Robot Classification Using Ensemble and Deep
Learning Approaches (2018.2)
■ Deep Learning for End-to-End Automatic Target Recognition from Synthetic
Aperture Radar Imagery (2018)
■ Practical classification of different moving targets using automotive radar and deep
neural networks (2018)
■ RRPN: Radar Region Proposal Network For Object Detection In Autonomous
Vehicles (2019.5)
■ Vehicle Detection With Automotive Radar Using Deep Learning on Range-Azimuth-
Doppler Tensors (2019.10)
■ 2D Car Detection in Radar Data with PointNets (2019.12)
■ Radar and Camera Early Fusion for Vehicle Detection in Advanced Driver
Assistance Systems (2019.12)
3. Micro-Doppler Based Human-Robot Classification
Using Ensemble and Deep Learning Approaches
■ Radar sensors can be used to analyze the frequency shifts induced by micro-motions in
the velocity and range dimensions, known as micro-Doppler (µ-D) and micro-Range (µ-R),
respectively.
■ Different moving targets will have unique µ-D and µ-R signatures that can be used for
target classification.
■ This paper uses a 25 GHz FMCW single-input single-output (SISO) radar in an industrial
safety setting for real-time human-robot identification.
■ Due to the real-time constraint, joint Range-Doppler (R-D) maps are analyzed directly for
the classification problem.
■ For the ensemble classifiers, restructured range and velocity profiles are passed directly,
without feature extraction, to tree ensembles such as gradient boosting and random forest.
■ Finally, a Deep Convolutional Neural Network (DCNN) is used and raw R-D images are
directly fed into the constructed network.
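R-D maps such as these are commonly obtained from FMCW beat-signal samples with a 2D FFT: one FFT over the samples of each chirp (fast time) gives range, and one over the chirps (slow time) gives Doppler. A minimal NumPy sketch, assuming complex baseband samples arranged as a (chirps × samples) matrix and a Hann window; the function name `range_doppler_map` and the synthetic target are illustrative, not the paper's processing chain.

```python
import numpy as np


def range_doppler_map(beat, eps=1e-12):
    """Return an R-D map in dB from complex FMCW beat samples.

    beat: array of shape (num_chirps, samples_per_chirp); rows are chirps
    (slow time), columns are samples within a chirp (fast time).
    Output shape is (num_chirps, samples_per_chirp): Doppler x range.
    """
    num_chirps, num_samples = beat.shape
    # Window both dimensions to reduce FFT sidelobes.
    win = np.outer(np.hanning(num_chirps), np.hanning(num_samples))
    range_fft = np.fft.fft(beat * win, axis=1)   # range FFT (fast time)
    rd = np.fft.fft(range_fft, axis=0)           # Doppler FFT (slow time)
    rd = np.fft.fftshift(rd, axes=0)             # zero velocity in the middle row
    return 20 * np.log10(np.abs(rd) + eps)


# Synthetic single target in range bin 20 and Doppler bin +5.
M, N = 64, 128
m = np.arange(M)[:, None]
n = np.arange(N)[None, :]
beat = np.exp(2j * np.pi * (20 * n / N + 5 * m / M))
rd_db = range_doppler_map(beat)
peak = np.unravel_index(np.argmax(rd_db), rd_db.shape)
print(int(peak[0]), int(peak[1]))  # 37 20: Doppler row 5 + M//2, range column 20
```

A grayscale image of such a map (possibly cropped or resized) is the kind of input fed to the CNN on the following slides.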
4. Micro-Doppler Based Human-Robot Classification
Using Ensemble and Deep Learning Approaches
Comparison between R-D
maps of human and robot.
5. Micro-Doppler Based Human-Robot Classification
Using Ensemble and Deep Learning Approaches
Grayscale R-D maps of a human and a robot fed to the CNN
6. Micro-Doppler Based Human-Robot Classification
Using Ensemble and Deep Learning Approaches
The proposed CNN architecture on R-D maps
7. Deep Learning for End-to-End Automatic Target
Recognition from Synthetic Aperture Radar Imagery
■ The standard architecture of synthetic aperture radar (SAR) automatic target
recognition (ATR) consists of three stages: detection, discrimination, and classification.
■ Most existing CNNs for SAR ATR classify target chips extracted from SAR imagery, i.e.,
they address only the third (classification) stage of SAR ATR.
■ This report proposes a CNN for end-to-end ATR from SAR imagery.
■ The CNN, named verification support network (VersNet), performs all three stages of
SAR ATR end-to-end.
■ VersNet takes as input a SAR image of arbitrary size containing multiple classes and
multiple targets, and outputs a SAR ATR image representing the position, class, and pose
of each detected target.
■ This report evaluates VersNet trained on the public moving and stationary target
acquisition and recognition (MSTAR) dataset to output, for each pixel, scores for all 12
classes: 10 target classes, a target-front class, and a background class.
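Accepting images of arbitrary size and emitting a score for every pixel is what a fully convolutional design allows, because no layer is tied to a fixed spatial size. The sketch below assumes such a design and shows only the final step: a 1×1 convolution mapping C feature channels to the 12 class scores, followed by a per-pixel argmax that yields a label image. The feature maps, weights, and class index assignment (0-9 targets, 10 target front, 11 background) are hypothetical; this is not VersNet's actual architecture.

```python
import numpy as np

NUM_CLASSES = 12  # hypothetical indexing: 0-9 targets, 10 target front, 11 background


def conv1x1(features, weights, bias):
    """1x1 convolution: features (C, H, W), weights (K, C), bias (K,) -> (K, H, W)."""
    return np.einsum("kc,chw->khw", weights, features) + bias[:, None, None]


def atr_label_image(features, weights, bias):
    """Per-pixel class scores followed by argmax; works for any H and W."""
    scores = conv1x1(features, weights, bias)
    return scores.argmax(axis=0)


rng = np.random.default_rng(0)
C = 16
weights = rng.standard_normal((NUM_CLASSES, C))
bias = np.zeros(NUM_CLASSES)

# The same weights handle feature maps of different spatial sizes.
small = atr_label_image(rng.standard_normal((C, 33, 41)), weights, bias)
large = atr_label_image(rng.standard_normal((C, 128, 96)), weights, bias)
print(small.shape, large.shape)  # (33, 41) (128, 96)
```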
8. Deep Learning for End-to-End Automatic Target
Recognition from Synthetic Aperture Radar Imagery
VersNet performs multi-class, multi-target automatic target recognition on SAR images of
variable size. In this example, the input is a single image with three classes and four
targets (the upper-left and lower-right targets belong to the same class). VersNet outputs
the position, class, and pose (front side) of each detected target.
9. Deep Learning for End-to-End Automatic Target
Recognition from Synthetic Aperture Radar Imagery
Illustration of training for the proposed CNN model
10. Deep Learning for End-to-End Automatic Target
Recognition from Synthetic Aperture Radar Imagery
Input (SAR image of multiple classes and multiple targets), output (SAR ATR image), and ground truth (GT).
11. Practical classification of different moving targets
using automotive radar and deep neural networks
■ This work presents results for the classification of different classes of targets (car,
single and multiple people, bicycle) using automotive radar data and different neural
networks (NNs).
■ A fast implementation of radar algorithms for detection, tracking, and micro-Doppler
extraction is proposed in conjunction with the TEF810X automotive radar transceiver and
the S32R274 microcontroller unit, both manufactured by NXP Semiconductors.
■ Three different types of neural networks are considered, namely a classic
convolutional network, a residual network, and a combination of convolutional and
recurrent network, for different classification problems across the four classes of
targets recorded.
■ High accuracy (close to 100% in some cases) and low latency of the radar pre-processing
prior to classification (∼0.55 s to produce a 0.5 s long spectrogram) are demonstrated,
and possible shortcomings and outstanding issues are discussed.
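The spectrograms these networks classify are time-frequency representations of the slow-time radar return, typically computed with a short-time Fourier transform (STFT). A minimal NumPy sketch, assuming a complex slow-time signal sampled at a hypothetical rate `fs` of 2 kHz; the window length and hop size are illustrative choices, not the values used in the paper.

```python
import numpy as np


def spectrogram(x, fs, win_len=128, hop=32, eps=1e-12):
    """STFT magnitude (dB) of a complex slow-time signal.

    Returns (times, freqs, spec_db) with spec_db of shape (num_frames, win_len);
    frequencies are fftshift-ed so negative Doppler appears below zero.
    """
    window = np.hanning(win_len)
    num_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window for i in range(num_frames)])
    spec = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    freqs = np.fft.fftshift(np.fft.fftfreq(win_len, d=1 / fs))
    times = (np.arange(num_frames) * hop + win_len / 2) / fs
    return times, freqs, 20 * np.log10(np.abs(spec) + eps)


fs = 2000.0                          # hypothetical slow-time sampling rate (Hz)
t = np.arange(int(0.5 * fs)) / fs    # 0.5 s of data, as on the slide
x = np.exp(2j * np.pi * 250.0 * t)   # constant 250 Hz Doppler (bulk motion only)
times, freqs, spec_db = spectrogram(x, fs)
print(spec_db.shape)  # (28, 128)
```

A walking person would add time-varying micro-Doppler components around the bulk Doppler line; the STFT itself is unchanged.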
12. Practical classification of different moving targets
using automotive radar and deep neural networks
Block diagram of the multi-target classification system
13. Practical classification of different moving targets
using automotive radar and deep neural networks
Examples of spectrograms for
different targets:
(a) Single person walking,
(b) Two people walking together,
(c) Bicycle,
(d) Car
14. Practical classification of different moving targets
using automotive radar and deep neural networks
Representation of the different network architectures: (a) VGG-style CNN, (b) convolutional residual
network, (c) combination of a convolutional and a recurrent LSTM network.
16. RRPN: Radar Region Proposal Network For
Object Detection In Autonomous Vehicles
■ Region proposal algorithms play an important role in most state-of-the-art two-stage
object detection networks by hypothesizing object locations in the image.
■ Region proposal algorithms are known to be the bottleneck in most two-stage object
detection networks: they increase the processing time for each image, resulting in slow
networks that are not suitable for real-time applications such as autonomous driving.
■ This paper introduces RRPN, a Radar-based real-time region proposal algorithm for
object detection in autonomous vehicles.
■ RRPN generates object proposals by mapping Radar detections to the image coordinate
system and generating pre-defined anchor boxes for each mapped Radar detection point.
■ These anchor boxes are then transformed and scaled based on the object’s distance
from the vehicle, to provide more accurate proposals for the detected objects.
■ Code has been made publicly available at https://github.com/mrnabati/RRPN.
17. RRPN: Radar Region Proposal Network For
Object Detection In Autonomous Vehicles
Generating anchors of different shapes and sizes for each Radar detection (blue circle).
18. RRPN: Radar Region Proposal Network For
Object Detection In Autonomous Vehicles
■ The first step in generating ROIs is mapping the radar detections from the vehicle
coordinates to the camera-view coordinates.
■ Radar detections are reported in a bird’s eye view perspective with the object’s
range and azimuth measured in the vehicle’s coordinate system.
■ Mapping these detections to camera-view coordinates makes it possible to associate the
objects detected by the Radars with those seen in the camera images.
■ Once the Radar detections are mapped to image coordinates, they give the approximate
location of every detected object in the image.
■ These mapped Radar detections, called Points of Interest (POIs), provide valuable
information about the objects in each image without any processing of the image itself.
Given this information, a simple approach for proposing ROIs is to place a bounding box
centered at every POI.
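The mapping from a radar detection (range, azimuth) to a POI in the image can be sketched with a standard pinhole camera model. This is a minimal sketch under stated assumptions: detections are given in a vehicle frame with x forward, y left, z up; the camera frame uses x right, y down, z forward; and the rotation `R`, translation `t`, intrinsic matrix `K`, and reflection height `z` are hypothetical calibration values, not those of the dataset used in the paper.

```python
import numpy as np


def radar_to_pixel(range_m, azimuth_rad, K, R, t, z=0.0):
    """Project one radar detection to pixel coordinates (u, v).

    range_m, azimuth_rad: polar detection in the vehicle frame (x fwd, y left, z up).
    K: 3x3 camera intrinsics; R, t: vehicle-to-camera rotation and translation.
    z: assumed height of the reflection, since no elevation is measured here.
    """
    p_vehicle = np.array([range_m * np.cos(azimuth_rad), range_m * np.sin(azimuth_rad), z])
    p_cam = R @ p_vehicle + t
    if p_cam[2] <= 0:
        raise ValueError("detection is behind the camera")
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]


# Hypothetical calibration: camera co-located with the radar, looking forward.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0],   # camera x (right) = -vehicle y (left)
              [0.0, 0.0, -1.0],   # camera y (down)  = -vehicle z (up)
              [1.0, 0.0, 0.0]])   # camera z (fwd)   =  vehicle x (fwd)
t = np.zeros(3)

print(radar_to_pixel(10.0, 0.0, K, R, t))  # straight ahead lands on the principal point
```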
19. RRPN: Radar Region Proposal Network For
Object Detection In Autonomous Vehicles
■ The anchor-box concept from Faster R-CNN is adopted in this method.
■ For every POI, several bounding boxes with different sizes and aspect ratios are
generated, centered at the POI.
■ Four different sizes and three different aspect ratios are used to generate these anchors.
■ To account for the fact that the POI is not always mapped to the center of the object
in the image, translated versions of the anchors are also generated.
■ These translated anchors provide more accurate bounding boxes when the POI is
mapped towards the right, left, or bottom of the object.
■ Radar detections provide the range of every detected object, which is used in this step
to scale all generated anchors.
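The anchor generation described above can be sketched as follows. The four sizes, three aspect ratios, and four placements (centered, POI at the right edge, POI at the left edge, POI at the bottom edge) follow the slide; the concrete size values and the scaling rule `scale = alpha / distance + beta`, with hypothetical constants `alpha` and `beta`, are assumptions chosen only so that anchors shrink as the object gets farther away.

```python
import numpy as np


def rrpn_style_anchors(u, v, distance, sizes=(32, 64, 128, 256),
                       ratios=(0.5, 1.0, 2.0), alpha=20.0, beta=0.5):
    """Return anchors [x1, y1, x2, y2] (pixels) for one POI at (u, v).

    Aspect ratio is height / width. The image y axis points down, so a box
    whose bottom edge sits on the POI is centered at v - h / 2.
    """
    scale = alpha / distance + beta   # hypothetical range-based scaling
    boxes = []
    for size in sizes:
        for ratio in ratios:
            w = size * scale / np.sqrt(ratio)
            h = size * scale * np.sqrt(ratio)
            centers = [
                (u, v),            # POI at the object center
                (u - w / 2, v),    # POI at the right edge of the object
                (u + w / 2, v),    # POI at the left edge of the object
                (u, v - h / 2),    # POI at the bottom edge of the object
            ]
            for cx, cy in centers:
                boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)


near = rrpn_style_anchors(640, 400, distance=10.0)
far = rrpn_style_anchors(640, 400, distance=50.0)
print(near.shape)  # (48, 4): 4 sizes x 3 ratios x 4 placements
```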
20. RRPN: Radar Region Proposal Network For
Object Detection In Autonomous Vehicles
Detection results. Top row: ground truth, middle row: Selective Search, bottom row: RRPN
RRPN is evaluated on the nuScenes dataset using the Fast R-CNN object detection network.
21. Vehicle Detection With Automotive Radar Using Deep
Learning on Range-Azimuth-Doppler Tensors
■ Radar has been a key enabler of advanced driver assistance systems in the automotive
industry for over two decades.
■ As an inexpensive, all-weather, long-range sensor that simultaneously provides velocity
measurements, radar is expected to be indispensable to the future of autonomous vehicles
(AVs).
■ Traditional radar signal processing cannot distinguish reflections from objects of
interest from clutter and is generally limited to detecting peaks in the received signal.
■ These peak detection methods effectively collapse the image-like radar signal into a
sparse point cloud.
■ The paper demonstrates a deep-learning-based vehicle detection solution that operates
on the image-like tensor instead of on the point cloud produced by peak detection.
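To make the information loss concrete, the sketch below collapses a dense range-azimuth power map into a sparse list of points by keeping local maxima above a fixed threshold. This simple 8-neighbour peak detector and its threshold are illustrative assumptions, not the detector of any particular radar; production systems typically use CFAR-style adaptive thresholds instead.

```python
import numpy as np


def peaks_to_points(power, threshold):
    """Keep cells that exceed `threshold` and are >= all 8 neighbours.

    power: (num_range_bins, num_azimuth_bins) map.
    Returns an (n, 3) array of [range_bin, azimuth_bin, power].
    """
    padded = np.pad(power, 1, mode="constant", constant_values=-np.inf)
    rows, cols = power.shape
    is_peak = power > threshold
    for dr in (-1, 0, 1):
        for da in (-1, 0, 1):
            if dr == 0 and da == 0:
                continue
            neighbour = padded[1 + dr:1 + dr + rows, 1 + da:1 + da + cols]
            is_peak &= power >= neighbour
    r_idx, a_idx = np.nonzero(is_peak)
    return np.column_stack([r_idx, a_idx, power[r_idx, a_idx]])


ra_map = np.zeros((64, 32))
ra_map[10, 5] = 5.0    # a point-like reflection
ra_map[40, 20] = 8.0   # a second reflection...
ra_map[40, 21] = 6.0   # ...spread over two azimuth bins
points = peaks_to_points(ra_map, threshold=1.0)
print(points)  # only 2 of the 64 * 32 = 2048 cells survive
```

Everything else in the map, including the weaker neighbouring cell and any Doppler structure, is discarded, which is exactly what the tensor-based approach tries to avoid.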
22. Vehicle Detection With Automotive Radar Using Deep
Learning on Range-Azimuth-Doppler Tensors
An example of the radar signal with the
corresponding camera and LiDAR images.
The radar signal resides in polar coordinate
space: the vertical axis is range, and the
horizontal axis is azimuth (angle). The
Doppler (velocity) channel values for the
marked points are plotted. Each of these
points is also marked in the camera frame.
Example Doppler plots for the highlighted points
23. Vehicle Detection With Automotive Radar Using Deep
Learning on Range-Azimuth-Doppler Tensors
■ The radar tensor is 3-D: it has two spatial dimensions, range and azimuth,
accompanied by a third, Doppler dimension, which represents the velocity of objects
relative to the radar, up to a certain aliasing velocity.
■ The first approach removes the Doppler dimension by summing the signal power over that
dimension. Since the input of the model is a range-azimuth tensor, this solution is
called the Range-Azimuth (RA) model.
■ The second approach also provides range-Doppler and azimuth-Doppler tensors as input.
The range-Doppler input has the azimuth dimension collapsed; similarly, the
azimuth-Doppler input has the range dimension collapsed. The model therefore has three
inputs that are fused after initial processing, and is called the Range-Azimuth-Doppler
(RAD) model.
■ Due to the properties of the radar signal (e.g., the physical spacing between azimuth
bins grows with range), translation equivariance cannot be expected, so CoordConv is
used in the first layer.
■ In practice, this means stacking two additional channels onto the input that contain
the pixel coordinates, enabling the convolutions to be conditioned on location.
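The two input variants and the CoordConv step can be sketched in NumPy. The sketch assumes a real-valued power tensor laid out as (range, azimuth, Doppler) and coordinate channels normalized to [-1, 1], a common CoordConv convention; the tensor sizes and function names are illustrative.

```python
import numpy as np


def rad_projections(rad):
    """Project a (R, A, D) power tensor onto its three 2D planes by summing."""
    ra = rad.sum(axis=2)  # Doppler collapsed -> (R, A), input of the RA model
    rd = rad.sum(axis=1)  # azimuth collapsed -> (R, D)
    ad = rad.sum(axis=0)  # range collapsed   -> (A, D)
    return ra, rd, ad


def add_coord_channels(x):
    """CoordConv-style input: append row and column coordinate channels.

    x: (C, H, W) -> (C + 2, H, W), coordinates normalized to [-1, 1].
    """
    _, h, w = x.shape
    rows = np.broadcast_to(np.linspace(-1.0, 1.0, h)[:, None], (h, w))
    cols = np.broadcast_to(np.linspace(-1.0, 1.0, w)[None, :], (h, w))
    return np.concatenate([x, rows[None], cols[None]], axis=0)


rad = np.random.default_rng(0).random((256, 64, 32))  # hypothetical bin counts
ra, rd, ad = rad_projections(rad)
ra_in = add_coord_channels(ra[None])  # one power channel + 2 coordinate channels
print(ra.shape, rd.shape, ad.shape, ra_in.shape)
```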
24. Vehicle Detection With Automotive Radar Using Deep
Learning on Range-Azimuth-Doppler Tensors
Conceptual diagram of the DL model architecture. Feature channels are not visualized in the picture.
Notation: (R)ange; (A)zimuth; (D)oppler. The different 2D tensors are calculated from the original
RAD tensor by summing over each of the dimensions.
25. Vehicle Detection With Automotive Radar Using Deep
Learning on Range-Azimuth-Doppler Tensors
■ The feature extractor used for the Range-Azimuth (RA) model is motivated by the
Feature Pyramid Network (FPN) architecture.
■ It consists of multiple consecutive convolutional layers, with multiple down-sampling
(i.e., strided convolutional) layers.
■ The next stage up-samples multiple times using transposed convolutions.
■ Skip connections are used between feature maps of matching shapes from the
up-sampling and the down-sampling paths.
■ Before the feature maps are added together, an additional convolutional layer is applied
to each skip connection.
■ The layer configuration is constructed such that a feature in the final layer has a
receptive field spanning the complete input.
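Whether a feature in the final layer sees the complete input can be checked with the standard receptive-field recursion: for a layer with kernel size k and stride s, r_out = r_in + (k - 1) · j_in and j_out = j_in · s, where j is the spacing (in input pixels) between adjacent features. The sketch applies it to a hypothetical down-sampling stack; the paper's actual layer configuration is not reproduced here, and the transposed-convolution path is ignored.

```python
def receptive_field(layers):
    """Receptive field (input pixels) of one output feature for a conv stack.

    layers: sequence of (kernel_size, stride) tuples, applied in order.
    """
    r, j = 1, 1  # receptive field and feature spacing at the input
    for kernel, stride in layers:
        r += (kernel - 1) * j
        j *= stride
    return r


# Hypothetical down-sampling path: pairs of 3x3 convs, the second one strided.
stack = [(3, 1), (3, 2)] * 6
print(receptive_field(stack))
```

Comparing the result with the input's spatial extent indicates whether more layers, larger kernels, or more down-sampling are needed.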
26. Vehicle Detection With Automotive Radar Using Deep
Learning on Range-Azimuth-Doppler Tensors
■ The Range-Azimuth-Doppler (RAD) model operates on the three projections of the 3D
radar tensor to reduce computational complexity.
■ The projections are made by summing the power over the omitted dimension.
■ The network has three 2D inputs: range-azimuth, azimuth-Doppler, and range-Doppler.
■ The range-azimuth branch is exactly the same as the down-sampling part of the Range-
Azimuth (RA) model.
■ Additionally, there are two branches taking range-Doppler and the azimuth-Doppler
tensors as input, respectively. These branches only down-sample.
■ The resulting feature maps are then fused as follows.
– First, each feature map is repeated along its missing dimension so that the tensors have
compatible shapes.
– This yields three 4D feature tensors: one dimension is the feature channel, and the
remaining three correspond to range, azimuth, and Doppler.
– These are then concatenated along the channel dimension, and 3D convolutional layers
are applied.
– After these convolutions, max-pooling is performed over the Doppler dimension, followed
by the up-sampling layers of the range-azimuth model.
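The shape bookkeeping of this fusion can be sketched with NumPy broadcasting. The sketch assumes channel-first feature maps of shapes (C, R, A), (C, R, D), and (C, A, D) from the three branches; the 3D convolutions that the model applies after concatenation are omitted, so only the tiling, concatenation, and Doppler max-pooling are shown.

```python
import numpy as np


def fuse_rad_features(f_ra, f_rd, f_ad):
    """Tile three 2D feature maps to (C, R, A, D), concatenate, max-pool Doppler.

    f_ra: (C1, R, A), f_rd: (C2, R, D), f_ad: (C3, A, D).
    Returns a (C1 + C2 + C3, R, A) tensor.
    """
    c1, r, a = f_ra.shape
    c2, r2, d = f_rd.shape
    c3, a2, d2 = f_ad.shape
    if (r, a, d) != (r2, a2, d2):
        raise ValueError("branch feature maps have incompatible spatial sizes")
    ra4 = np.broadcast_to(f_ra[:, :, :, None], (c1, r, a, d))  # repeat along Doppler
    rd4 = np.broadcast_to(f_rd[:, :, None, :], (c2, r, a, d))  # repeat along azimuth
    ad4 = np.broadcast_to(f_ad[:, None, :, :], (c3, r, a, d))  # repeat along range
    fused = np.concatenate([ra4, rd4, ad4], axis=0)
    # (3D convolutions would be applied to `fused` here; omitted in this sketch.)
    return fused.max(axis=3)  # max-pool over Doppler -> back to range-azimuth


rng = np.random.default_rng(0)
f_ra = rng.random((8, 32, 16))
f_rd = rng.random((4, 32, 10))
f_ad = rng.random((4, 16, 10))
out = fuse_rad_features(f_ra, f_rd, f_ad)
print(out.shape)  # (16, 32, 16)
```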
27. Vehicle Detection With Automotive Radar Using Deep
Learning on Range-Azimuth-Doppler Tensors
■ After the FFT, the radar tensor is in polar space (range-azimuth).
■ As the range increases, the distance between adjacent bins becomes larger:
– the angle between the center of the forward bin and the next bin is 3.7°, which
corresponds to a lateral distance of ~3 meters at a range of 47 meters, while the angle
between bins increases to 11° (or ~9 meters) for the most extreme bins.
■ SSD methods place a grid of prior boxes over the input tensor; a uniform grid in polar
space therefore corresponds to a non-uniform grid in Cartesian space.
The physical center direction of the
azimuth bins on a Cartesian grid
A typical large vehicle with dimensions
of 2 by 5 meters is shown for reference.
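The numbers above follow from simple geometry: two points at the same range r whose directions differ by Δθ are 2·r·sin(Δθ/2) apart. A quick check in Python (the function name is illustrative):

```python
import math


def bin_spacing(range_m, delta_deg):
    """Distance between two points at `range_m` separated by `delta_deg` in azimuth."""
    return 2.0 * range_m * math.sin(math.radians(delta_deg) / 2.0)


forward = bin_spacing(47.0, 3.7)   # ~3 m near boresight
extreme = bin_spacing(47.0, 11.0)  # ~9 m for the outermost bins
print(f"{forward:.2f} m, {extreme:.2f} m")
```

At 47 m, the 2 m wide reference vehicle is therefore narrower than even one forward azimuth bin.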
28. Vehicle Detection With Automotive Radar Using Deep
Learning on Range-Azimuth-Doppler Tensors
■ A baseline: Polar input, polar output.
– The baseline solution takes the range-azimuth radar tensor as input.
– The prior boxes are distributed on a uniform grid in polar space.
■ Three alternative approaches:
– Cartesian input, Cartesian output. The input tensor is transformed from polar space
to Cartesian space using bilinear interpolation. The Cartesian input tensor is
clipped, resulting in a square feature map.
– Polar input, Cartesian output with learned transformation. The input tensor of the
NN is in polar space, but the output boxes are on a uniform grid in Cartesian space.
Thus, the NN has to explicitly learn the polar-to-Cartesian transformation.
– Polar-to-Cartesian transformation on latent features. Same as the polar input,
Cartesian output solution, but after feature extraction, an explicit transformation
layer transforms the latent features from polar to Cartesian space (using bilinear
interpolation).
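The polar-to-Cartesian resampling used by the first and third alternatives can be sketched as follows: each Cartesian cell is converted to (range, azimuth), located as a fractional index in the polar grid, and bilinearly interpolated from its four neighbours. The sketch assumes monotonically increasing bin-center arrays `ranges` and `azimuths` (the azimuth spacing may be non-uniform, as on the previous slide) and sets cells outside the field of view to zero; grid sizes and extents are hypothetical.

```python
import numpy as np


def polar_to_cartesian(polar, ranges, azimuths, xs, ys):
    """Bilinearly resample a (R, A) polar map onto a Cartesian (len(xs), len(ys)) grid.

    ranges, azimuths: increasing bin centers (m, rad); x is forward, y is lateral.
    """
    num_r, num_a = polar.shape
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    rho = np.hypot(X, Y)
    theta = np.arctan2(Y, X)
    valid = ((rho >= ranges[0]) & (rho <= ranges[-1])
             & (theta >= azimuths[0]) & (theta <= azimuths[-1]))
    # Fractional bin indices; np.interp also handles non-uniform bin spacing.
    ri = np.interp(rho, ranges, np.arange(num_r))
    ai = np.interp(theta, azimuths, np.arange(num_a))
    r0 = np.floor(ri).astype(int)
    a0 = np.floor(ai).astype(int)
    r1 = np.minimum(r0 + 1, num_r - 1)
    a1 = np.minimum(a0 + 1, num_a - 1)
    wr, wa = ri - r0, ai - a0
    out = ((1 - wr) * (1 - wa) * polar[r0, a0] + wr * (1 - wa) * polar[r1, a0]
           + (1 - wr) * wa * polar[r0, a1] + wr * wa * polar[r1, a1])
    return np.where(valid, out, 0.0)


ranges = np.linspace(1.0, 50.0, 50)
azimuths = np.deg2rad(np.linspace(-45.0, 45.0, 30))
polar = np.tile(ranges[:, None], (1, 30))  # cell value = its range, to check geometry
xs = np.linspace(0.0, 50.0, 101)
ys = np.linspace(-25.0, 25.0, 101)
cart = polar_to_cartesian(polar, ranges, azimuths, xs, ys)
```

The same function could serve as the fixed transformation layer of the third alternative by applying it to each latent feature channel.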
29. Vehicle Detection With Automotive Radar Using Deep
Learning on Range-Azimuth-Doppler Tensors
■ The network is converted into a recurrent neural network (RNN) using LSTM modules.
■ To keep the network fully convolutional, a Convolutional LSTM cell is employed.
■ Compared to a traditional LSTM cell, the input-to-state and state-to-state matrix
multiplications are replaced with convolutions, and the cell operates on a 3D tensor.
■ A one-shot object detection model, namely the Single Shot Detector (SSD), is employed.
■ In essence, SSD operates on one or more feature maps extracted from a backbone
network.
■ SSD uses regression to adapt the size and position of each pre-defined (prior) box to
better match the bounding box of the actual object.
■ During inference, non-maximum suppression (NMS) removes overlapping detections that
likely belong to the same object.
■ Focal loss is used instead of hard negative mining, as it provides superior results.
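Two of the detection-head ingredients above are compact enough to sketch directly: greedy NMS over axis-aligned boxes and the binary focal loss FL(p_t) = -α_t (1 - p_t)^γ log(p_t). Boxes are [x1, y1, x2, y2]; the IoU threshold of 0.5 and the focal-loss defaults α = 0.25, γ = 2 are common choices assumed here, not values reported for this model.

```python
import numpy as np


def _area(b):
    """Area of [x1, y1, x2, y2] box(es); works for shape (4,) or (n, 4)."""
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def iou(box, boxes):
    """IoU between one box and an (n, 4) array of boxes."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    return inter / (_area(box) + _area(boxes) - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss; p = predicted probability, y in {0, 1}."""
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU ~0.68
```

The (1 - p_t)^γ factor down-weights easy, well-classified examples (mostly background priors), which is why focal loss can replace explicit hard negative mining.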
30. Vehicle Detection With Automotive Radar Using Deep
Learning on Range-Azimuth-Doppler Tensors
The radar signal has been visualized in Cartesian coordinates. Targets are indicated by black
outlines and predictions by white outlines. Velocity estimation targets and predictions are also visible.
31. 2D Car Detection in Radar Data with PointNets
■ For many automated driving functions, a highly accurate perception of the vehicle
environment is a crucial prerequisite.
■ Modern high-resolution radar sensors generate multiple radar targets per object, which
makes these sensors particularly suitable for the 2D object detection task.
■ This work presents an approach to detect 2D objects relying solely on sparse radar
data using PointNets.
■ This method enables classification together with bounding box estimation of objects
using a single radar sensor.
■ To this end, PointNets are adapted to radar data, performing 2D object classification
with segmentation and 2D bounding box regression in order to estimate an amodal 2D
bounding box.
■ The algorithm is evaluated using an automatically created dataset that consists of
various realistic driving maneuvers.
■ The results show the great potential of object detection in high-resolution radar data
using PointNets.
32. 2D Car Detection in Radar Data with PointNets
2D object detection in radar data. Radar point cloud with reflections belonging to a car (red) or
clutter (blue). The length of arrows displays the Doppler velocity, the size of points represents the
radar cross section (RCS) value. The red box is a predicted amodal 2D bounding box.
33. 2D Car Detection in Radar Data with PointNets
2D object detection in radar data with PointNets. First, a patch proposal step determines
multiple RoIs, called patches, using the entire radar target list. Second, a classification and
segmentation network classifies these patches. Subsequently, each of the n radar targets
is classified to obtain an instance segmentation. Finally, a regression network estimates an
amodal 2D bounding box for objects using the m segmented car radar targets.
34. 2D Car Detection in Radar Data with PointNets
■ The patch proposal divides the radar point cloud into regions of interest.
■ A patch with specific length and width is determined around each radar target.
■ The length and width of the patch must be selected in such a way that it comprises the
entire object of interest, here a car.
■ It is important that each patch contains enough radar targets to distinguish between car
and clutter patches in the classification step, and between car and clutter targets in the
segmentation step.
■ The patch proposal generates multiple patches containing the same object.
■ As a result, the final 2D object detector provides multiple hypotheses for a single object.
■ This behavior is desirable because the object tracking system later in the environmental
perception processing chain deals with multiple hypotheses per object.
■ The patches are normalized to a center view which ensures rotation-invariance.
■ All radar targets within a patch are forwarded to the classification/segmentation network.
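A minimal sketch of the patch proposal, assuming radar targets given as rows [x, y, v_doppler, rcs] in the sensor frame (x forward, y left). For each seed target, all targets are rotated by the negative seed azimuth so that the seed lies on the x axis (a center view, as in Frustum PointNets), and those inside a length × width rectangle centered on the seed are kept. Orienting the rectangle along the line of sight and the 6 m × 3 m default size are hypothetical choices.

```python
import numpy as np


def propose_patches(targets, length=6.0, width=3.0):
    """One center-view patch per radar target.

    targets: (n, 4) rows [x, y, v_doppler, rcs].
    Returns a list of (seed_index, patch) with patch rows [x', y', v_doppler, rcs]
    expressed in the rotated (center-view) frame.
    """
    patches = []
    for i, (sx, sy) in enumerate(targets[:, :2]):
        phi = np.arctan2(sy, sx)
        c, s = np.cos(-phi), np.sin(-phi)
        rot = np.array([[c, -s], [s, c]])
        xy = targets[:, :2] @ rot.T          # rotate every target by -phi
        seed = xy[i]                         # lies on the x axis after rotation
        inside = ((np.abs(xy[:, 0] - seed[0]) <= length / 2)
                  & (np.abs(xy[:, 1] - seed[1]) <= width / 2))
        # Doppler and RCS are scalars and are not affected by the rotation.
        patch = np.concatenate([xy[inside], targets[inside, 2:]], axis=1)
        patches.append((i, patch))
    return patches


targets = np.array([[20.0, 5.0, 1.0, 10.0],
                    [21.0, 5.5, 1.0, 8.0],
                    [20.5, 4.5, 1.2, 9.0],
                    [50.0, -10.0, 0.0, 2.0],
                    [5.0, 0.0, 0.0, 1.0]])
patches = propose_patches(targets)
print([p.shape[0] for _, p in patches])  # number of targets in each patch
```

Because every target seeds its own patch, the three clustered reflections produce three overlapping patches of the same object, i.e., the multiple hypotheses mentioned above.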
35. 2D Car Detection in Radar Data with PointNets
■ The classification and object segmentation module consists of a network which
classifies each patch and segments all radar targets inside the patch.
■ For this purpose, the classification network considers each entire patch to distinguish
between car and clutter patches.
■ For car patches, the segmentation network predicts, for each radar target, the
probability that the target belongs to a car.
■ In the masking step, radar targets which are classified as car targets are extracted.
■ Coordinates of the segmented radar targets are normalized to ensure translational
invariance of the algorithm.
■ Note that the classification and segmentation module can easily be extended to
multiple classes.
■ For this purpose, the patch is assigned to a certain class, and the predicted class
information is then used for the segmentation step.
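The masking and coordinate normalization steps can be sketched as follows, assuming per-target car probabilities from the segmentation network and a hypothetical decision threshold of 0.5. Subtracting the centroid of the segmented targets is one common way to obtain the translational invariance described above, as done in Frustum PointNets after masking.

```python
import numpy as np


def mask_and_normalize(patch, car_prob, threshold=0.5):
    """Extract targets classified as car and center their x, y coordinates.

    patch: (n, 4) rows [x, y, v_doppler, rcs]; car_prob: (n,) probabilities.
    Returns the normalized (m, 4) targets and the removed centroid (2,).
    """
    segmented = patch[car_prob >= threshold]
    if segmented.shape[0] == 0:
        return segmented, np.zeros(2)
    centroid = segmented[:, :2].mean(axis=0)
    normalized = segmented.copy()
    normalized[:, :2] -= centroid  # translation invariance; Doppler and RCS untouched
    return normalized, centroid


patch = np.array([[20.0, 0.0, 1.0, 10.0],
                  [21.0, 0.4, 1.1, 8.0],
                  [25.0, 1.2, 0.0, 1.0],
                  [20.5, -0.5, 1.0, 9.0]])
car_prob = np.array([0.9, 0.8, 0.1, 0.7])
seg, centroid = mask_and_normalize(patch, car_prob)
print(seg.shape, centroid)
```

The returned centroid is kept so that the box estimated in normalized coordinates can be shifted back to the sensor frame.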
36. 2D Car Detection in Radar Data with PointNets
■ This module estimates an associated amodal 2D bounding box (Bbox).
■ First, a lightweight regression PointNet, called Transformer PointNet (T-Net), estimates
the center of the amodal bounding box and transforms the radar targets into a local
coordinate system relative to the predicted center.
■ The regression network predicts the parameters of a 2D bounding box, i.e., its center
(x_c, y_c), its heading angle θ, and its size (l, w).
■ For the box center estimation, residual-based 2D localization is performed.
■ The heading angle and size of the bounding box are predicted using a combination of a
classification and a regression approach.
■ For size estimation, a classification over predefined size templates is performed, and
residual values relative to those templates are predicted.
■ In the case of multiple classes, the box estimation network also uses the classification
information for the bounding box regression.
■ Therefore, the size templates have to be extended with additional classes, e.g.,
pedestrians or cyclists.
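The classification-plus-residual parameterization can be sketched as encode/decode functions: the heading angle is assigned to one of N equal angle bins plus a residual from that bin's center, and the size is expressed as a residual from a class template. The number of bins (12) and the template dimensions are hypothetical values for illustration; in the network, the bin and template classes are predicted by classification and the residuals by regression.

```python
import numpy as np

NUM_HEADING_BINS = 12                 # hypothetical
SIZE_TEMPLATES = {                    # hypothetical (length, width) in meters
    "car": np.array([4.5, 1.8]),
    "pedestrian": np.array([0.8, 0.6]),
}


def encode_heading(theta, num_bins=NUM_HEADING_BINS):
    """Map an angle (rad) to (bin index, residual from the bin center)."""
    bin_width = 2 * np.pi / num_bins
    shifted = (theta + bin_width / 2) % (2 * np.pi)  # bin 0 is centered on 0 rad
    k = min(int(shifted // bin_width), num_bins - 1)  # guard against rounding
    residual = shifted - (k * bin_width + bin_width / 2)
    return k, residual


def decode_heading(k, residual, num_bins=NUM_HEADING_BINS):
    """Inverse of encode_heading, returning an angle in [0, 2*pi)."""
    bin_width = 2 * np.pi / num_bins
    return (k * bin_width + residual) % (2 * np.pi)


def encode_size(size, cls):
    """Size residual relative to the class template."""
    return size - SIZE_TEMPLATES[cls]


def decode_size(residual, cls):
    """Template plus predicted residual."""
    return SIZE_TEMPLATES[cls] + residual


k, res = encode_heading(6.2)  # just below 2*pi falls into bin 0
print(k, decode_heading(k, res))
```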
37. 2D Car Detection in Radar Data with PointNets
Network architectures for 2D object detection in radar data with PointNets.
38. 2D Car Detection in Radar Data with PointNets
■ For the object detection task in radar data, the network architecture is based on the
concepts of PointNet and Frustum PointNets.
■ The network architecture consists of classification, segmentation, and 2D bounding box
regression networks.
■ For the classification and segmentation network, the architecture is conceptually similar
to PointNet.
■ The network for amodal 2D bounding box estimation is the same as in Frustum PointNets.
■ In this work, the input of the classification and bounding box regression networks is radar
data.
■ For this reason, the input of the original PointNet is extended to radar target lists.
■ For the classification and segmentation network as well as the bounding box regression
network, the radar targets are represented as a set of 4D points containing 2D spatial data,
ego-motion-compensated Doppler velocity, and RCS information.
■ For the classification and segmentation network, the input is a radar target list with n
points of a patch.
■ Then, the segmented radar target list with m points belonging to an object is fed into the
2D bounding box estimation network.
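The core PointNet idea that makes a 4D radar target list usable as network input is a shared per-point MLP followed by a symmetric max-pooling, so the patch-level output does not depend on the order of the targets; the segmentation head concatenates each point's local feature with the global feature. A minimal NumPy forward pass with random, untrained weights and hypothetical layer widths, not the paper's architecture:

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class TinyPointNet:
    """Shared per-point MLP + max-pool (classification) and a segmentation head."""

    def __init__(self, in_dim=4, hidden=32, global_dim=64, num_classes=2, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights: this sketch only demonstrates shapes and invariances.
        self.w1 = rng.standard_normal((in_dim, hidden)) * 0.1
        self.w2 = rng.standard_normal((hidden, global_dim)) * 0.1
        self.w_cls = rng.standard_normal((global_dim, num_classes)) * 0.1
        self.w_seg = rng.standard_normal((hidden + global_dim, num_classes)) * 0.1

    def forward(self, points):
        """points: (n, 4) rows [x, y, v_doppler, rcs] -> (class logits, per-point logits)."""
        local = relu(points @ self.w1)          # same MLP applied to every point
        feat = relu(local @ self.w2)
        global_feat = feat.max(axis=0)          # symmetric function over the set
        cls_logits = global_feat @ self.w_cls   # patch: car vs. clutter
        tiled = np.broadcast_to(global_feat, (points.shape[0], global_feat.size))
        seg_logits = np.concatenate([local, tiled], axis=1) @ self.w_seg
        return cls_logits, seg_logits           # seg: per-target car vs. clutter


patch = np.random.default_rng(1).standard_normal((20, 4))
net = TinyPointNet()
cls_logits, seg_logits = net.forward(patch)
print(cls_logits.shape, seg_logits.shape)  # (2,) (20, 2)
```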
39. 2D Car Detection in Radar Data with PointNets
Results for 2D object detection in radar data. The object detector is evaluated on the test set.
Accuracy and F1 score are evaluated for classification and segmentation. The 2D bounding box
estimation is evaluated by the mean IoU (mIoU) and by the ratio of IoUs above a threshold of 0.7.
Results are given for the entire test set as well as for individual driving maneuvers.
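Because the predicted and ground-truth boxes are oriented (center, heading θ, size l × w), their IoU requires intersecting two rotated rectangles. A minimal sketch using Sutherland-Hodgman polygon clipping and the shoelace formula, plus the two summary metrics from the table; whether the 0.7 threshold is applied as ≥ or > is an assumption here (≥ is used).

```python
import numpy as np


def box_corners(cx, cy, length, width, theta):
    """Counter-clockwise corners of a rotated rectangle."""
    c, s = np.cos(theta), np.sin(theta)
    local = np.array([[length / 2, width / 2], [-length / 2, width / 2],
                      [-length / 2, -width / 2], [length / 2, -width / 2]])
    rot = np.array([[c, -s], [s, c]])
    return local @ rot.T + np.array([cx, cy])


def _cross(o, a, b):
    """z component of (a - o) x (b - o); >= 0 means b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _line_intersection(p, q, a, b):
    """Intersection of segment p-q with the infinite line through a and b."""
    d1, d2 = q - p, b - a
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        return q
    t = ((a[0] - p[0]) * d2[1] - (a[1] - p[1]) * d2[0]) / denom
    return p + t * d1


def clip_polygon(subject, clipper, eps=1e-9):
    """Sutherland-Hodgman: clip `subject` by the convex CCW polygon `clipper`."""
    output = [np.asarray(v, dtype=float) for v in subject]
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inputs, output = output, []
        for j in range(len(inputs)):
            prev, cur = inputs[j - 1], inputs[j]
            cur_in = _cross(a, b, cur) >= -eps
            prev_in = _cross(a, b, prev) >= -eps
            if cur_in:
                if not prev_in:
                    output.append(_line_intersection(prev, cur, a, b))
                output.append(cur)
            elif prev_in:
                output.append(_line_intersection(prev, cur, a, b))
        if not output:
            break
    return output


def polygon_area(poly):
    """Shoelace formula; polygons with fewer than 3 vertices have zero area."""
    if len(poly) < 3:
        return 0.0
    xs = np.array([p[0] for p in poly])
    ys = np.array([p[1] for p in poly])
    return 0.5 * abs(np.dot(xs, np.roll(ys, -1)) - np.dot(ys, np.roll(xs, -1)))


def rotated_iou(box_a, box_b):
    """IoU of two boxes given as (cx, cy, length, width, theta)."""
    inter = polygon_area(clip_polygon(box_corners(*box_a), box_corners(*box_b)))
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union


def summarize(ious, threshold=0.7):
    """Mean IoU and the fraction of boxes with IoU >= threshold."""
    ious = np.asarray(ious, dtype=float)
    return float(ious.mean()), float((ious >= threshold).mean())
```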
40. Radar and Camera Early Fusion for Vehicle Detection
in Advanced Driver Assistance Systems
■ The perception module is at the heart of Advanced Driver Assistance Systems (ADAS).
■ To improve the quality and robustness of this module, especially in the presence of
environmental noise such as varying lighting and weather conditions, the fusion of
sensors (mainly camera and LiDAR) has been the center of attention in recent
studies.
■ This paper focuses on a relatively unexplored area: the early fusion of camera and
radar sensors.
■ It feeds a minimally processed radar signal, together with its corresponding camera
frame, to a deep learning architecture to enhance the accuracy and robustness of the
perception module.
■ The evaluation, performed on real world data, suggests that the complementary
nature of radar and camera signals can be leveraged to reduce the lateral error.
41. Radar and Camera Early Fusion for Vehicle Detection
in Advanced Driver Assistance Systems
■ Radar presents a low-cost alternative to LiDAR as a range determining sensor.
■ A typical automotive radar is currently considerably cheaper than a LiDAR due to the
nature of its fundamental design.
■ Besides costs, radar is robust to different lighting and weather conditions (e.g., rain
and fog) and capable of providing instantaneous measurement of velocity, providing
the opportunity for improved system reaction times.
■ With multiple sensors on a vehicle, sensor fusion is a natural next step for ADAS, as it
can improve the accuracy and especially the robustness of object detection in a
relatively noisy environment.
■ Fusion of data across different sensors can occur at a late stage, which has lower
complexity than early fusion, in which the sensor measurements from multiple
modalities are jointly processed to generate object properties.
42. Radar and Camera Early Fusion for Vehicle Detection
in Advanced Driver Assistance Systems
■ Traditionally, early fusion allows low-level fusion of the features, which results in
better detection accuracy.
■ Radar data, in the context of autonomous driving and ADAS, has been used to
improve the accuracy of sensor fusion and/or the perception module.
■ However, radar data is typically processed using a CFAR (constant false alarm rate)
algorithm to convert the raw data into a point-cloud which separates the targets of
interest from the surrounding clutter.
■ Converting the raw 4D radar tensor (comprising a dense 2D Euclidean space, Doppler,
and time) into a sparse 2D point cloud removes a significant amount of information
from the signal.
■ In contrast, this method relies on the raw radar data to minimize both the artefacts
introduced by post-processing of the signal and the abstraction of the radar output.
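For reference, the CFAR step that this method avoids can be sketched as a one-dimensional cell-averaging CFAR (CA-CFAR): each cell's noise level is estimated from training cells on both sides (skipping guard cells), and the cell is declared a detection if it exceeds that estimate times a scale factor. The factor α = N (P_fa^(-1/N) - 1) is the textbook value for square-law-detected power in exponentially distributed noise; the window sizes and P_fa are hypothetical, and real radar chains usually run CFAR in 2D.

```python
import numpy as np


def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    """1D cell-averaging CFAR; returns indices of detected cells.

    power: 1D array of square-law detected power.
    num_train / num_guard: training and guard cells on EACH side of the cell.
    """
    n_train_total = 2 * num_train
    alpha = n_train_total * (pfa ** (-1.0 / n_train_total) - 1.0)
    half = num_train + num_guard
    detections = []
    for i in range(half, len(power) - half):
        leading = power[i - half:i - num_guard]
        lagging = power[i + num_guard + 1:i + half + 1]
        noise = (leading.sum() + lagging.sum()) / n_train_total
        if power[i] > alpha * noise:
            detections.append(i)
    return detections


rng = np.random.default_rng(0)
power = rng.exponential(1.0, 200)  # noise-only range profile
power[100] = 100.0                 # one strong target
print(ca_cfar(power))              # index 100 plus, rarely, a false alarm
```

Only the indices survive; the cell values around each detection, and the whole Doppler and angle context, are discarded, which is the information loss the early-fusion approach tries to avoid.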
43. Radar and Camera Early Fusion for Vehicle Detection
in Advanced Driver Assistance Systems
Inspired by SSD, FusionNet extracts and combines features from different sensors that observe the
same space from different perspectives, with known relative positions. Each feature extraction branch
incorporates a spatial transformation so that its feature maps are spatially aligned with those of the other branches.
44. Radar and Camera Early Fusion for Vehicle Detection
in Advanced Driver Assistance Systems
■ Two branches are implemented in FusionNet: the Radar branch, which processes the
range-azimuth image from the radar, and the Camera branch, which processes the images
captured by a forward-facing camera.
■ The features from the independent feature extractor branches are then passed through
the fusion layer(s).
■ To ensure that the network learns meaningful representations from the different signal
sources, a unique training strategy of partially freezing the network and fine-tuning is
employed.
Visualization of the Radar and Camera’s spatial transforms.
45. Radar and Camera Early Fusion for Vehicle Detection
in Advanced Driver Assistance Systems
Examples of scenes where the network performs well.
A sampling of missed detections.