Fusion of Camera and LiDAR for
Autonomous Vehicles I
(via Deep Learning)
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• A General Pipeline for 3D Detection of Vehicles
• Combining LiDAR Space Clustering and Convolutional Neural Networks for Pedestrian
Detection
• Fusing Bird’s Eye View LIDAR Point Cloud and Front View Camera Image for Deep Object
Detection
• PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation
• RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement
• Joint 3D Proposal Generation and Object Detection from View Aggregation
• Frustum PointNets for 3D Object Detection from RGB-D Data
• Deep Continuous Fusion for Multi-Sensor 3D Object Detection
• Multi-View 3D Object Detection Network for Autonomous Driving
• End-to-end Learning of Multi-sensor 3D Tracking by Detection
A General Pipeline for 3D Detection of
Vehicles
• Autonomous driving requires 3D perception of vehicles and other objects in the
environment.
• Most current methods support only 2D vehicle detection.
• Here is a pipeline that adopts any 2D detection network and fuses it with a 3D point cloud to
generate 3D information with minimal changes to the 2D detection network.
• To identify the 3D box, a model fitting algorithm is developed based on generalized car
models and score maps.
• A two-stage convolutional neural network (CNN) is proposed to refine the detected 3D box.
• It requires minimal effort to modify an existing 2D network to fit into the pipeline,
adding just one additional regression term at the output layer to estimate the vehicle
dimensions.
A General Pipeline for 3D Detection of
Vehicles
The raw image is passed to a 2D detection network which provides 2D boxes around the vehicles in
the image plane. Subsequently, a set of 3D points which fall into the 2D bounding box after projection
is selected. With this set, a model fitting algorithm detects the 3D location and 3D bounding box of
the vehicle. Then another CNN, which takes the points that fall into the 3D bounding box as input,
carries out the final 3D box regression and classification.
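As a minimal sketch of the point-selection step, the snippet below projects LiDAR points into the image with an assumed 3×4 LiDAR-to-image projection matrix `P` and keeps the points that land inside a 2D detection box; the function and argument names are illustrative, not taken from the paper.

```python
import numpy as np

def points_in_2d_box(points_lidar, P, box_2d):
    """Select LiDAR points whose image projection falls inside a 2D box.

    points_lidar: (N, 3) XYZ in the LiDAR frame.
    P:            (3, 4) LiDAR-to-image projection matrix (calibration assumed known).
    box_2d:       (x1, y1, x2, y2) detection box in pixels.
    """
    # Homogeneous coordinates and projection onto the image plane.
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])  # (N, 4)
    proj = pts_h @ P.T                                                      # (N, 3)
    # Keep only points in front of the camera to avoid mirrored projections.
    in_front = proj[:, 2] > 0
    u = proj[:, 0] / proj[:, 2]
    v = proj[:, 1] / proj[:, 2]

    x1, y1, x2, y2 = box_2d
    inside = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2) & in_front
    return points_lidar[inside]
```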
Combining LiDAR Space Clustering and Convolutional
Neural Networks for Pedestrian Detection
• This is a pedestrian detector that exploits LiDAR data, in addition to visual information.
• The hypothesis is that using depth data and prior information about object sizes can
reduce the search space by providing candidates, thus speeding up the detection algorithm.
• A further hypothesis is that this prior definition of the location and size of the candidate
bounding boxes will also decrease the number of false detections.
• In the approach, LiDAR data is used to generate region proposals by processing the three-
dimensional point cloud that it provides.
• These candidate regions are then further processed by a state-of-the-art CNN classifier that
is fine-tuned for pedestrian detection.
Combining LiDAR Space Clustering and Convolutional
Neural Networks for Pedestrian Detection
The algorithm is built upon the idea of clustering the 3-D point cloud of the LiDAR. It starts
with down-sampling of the raw measurements, followed by removal of the floor plane. Then, a
density-based clustering algorithm generates the candidates, which are projected onto the image
space to provide regions of interest.
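The sketch below mirrors this proposal stage under simplifying assumptions: a voxel hash stands in for proper down-sampling, a height threshold stands in for ground-plane fitting, and scikit-learn's DBSCAN plays the role of the density-based clustering step; all names and thresholds are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def lidar_region_proposals(points, voxel=0.1, ground_z=-1.5, eps=0.5, min_samples=10):
    """Generate 3D cluster candidates from a raw LiDAR sweep.

    points: (N, 3) XYZ array. Returns a list of (M_i, 3) clusters.
    """
    # Crude down-sampling: keep one point per voxel.
    keys = np.floor(points / voxel).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    pts = points[idx]

    # Crude floor removal: drop points near or below an assumed ground height.
    pts = pts[pts[:, 2] > ground_z]

    # Density-based clustering; label -1 marks noise and is ignored.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
    return [pts[labels == k] for k in range(labels.max() + 1)]
```

Each resulting cluster would then be projected into the image, as in the earlier projection sketch, to provide a region of interest for the CNN classifier.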
Fusing Bird’s Eye View LIDAR Point Cloud and Front
View Camera Image for Deep Object Detection
• This is a method for fusing LIDAR point cloud and camera-captured images in deep
convolutional neural networks (CNN).
• The method constructs a sparse non-homogeneous pooling layer to transform
features between bird’s eye view and front view.
• The sparse point cloud is used to construct the mapping between the two views.
• The pooling layer allows efficient fusion of the multi-view features at any stage of the
network.
• This is favorable for 3D object detection using camera-LIDAR fusion for autonomous driving.
• A corresponding one-stage detector, which produces 3D bounding boxes from the bird’s eye
view map, is designed and tested on the KITTI bird’s eye view object detection benchmark.
• The fusion method shows significant improvement in both speed and accuracy of pedestrian
detection over other fusion-based object detection networks.
Fusing Bird’s Eye View LIDAR Point Cloud and Front
View Camera Image for Deep Object Detection
The sparse non-homogeneous pooling layer that fuses front view image and bird’s eye view LIDAR feature.
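As a rough illustration of the idea (not the paper's layer), the snippet below uses the LiDAR points as correspondences to build a sparse matrix that pools front-view image features into a BEV grid; all names and the averaging choice are assumptions.

```python
import numpy as np
from scipy.sparse import coo_matrix

def bev_pool_from_image(img_feat, points_uv, points_bev, bev_shape):
    """Pool front-view features into a BEV grid via LiDAR correspondences.

    img_feat:   (H, W, C) front-view feature map.
    points_uv:  (N, 2) integer (u, v) pixel coordinates of the LiDAR points in the image.
    points_bev: (N, 2) integer (row, col) BEV cells of the same points.
    bev_shape:  (Hb, Wb) size of the BEV grid.
    """
    H, W, C = img_feat.shape
    Hb, Wb = bev_shape
    src = points_uv[:, 1] * W + points_uv[:, 0]      # flat image index per point
    dst = points_bev[:, 0] * Wb + points_bev[:, 1]   # flat BEV index per point

    # Sparse pooling matrix: averages all image features mapped to a BEV cell.
    M = coo_matrix((np.ones(len(src)), (dst, src)), shape=(Hb * Wb, H * W)).tocsr()
    counts = np.asarray(M.sum(axis=1)).clip(min=1)
    pooled = (M @ img_feat.reshape(H * W, C)) / counts
    return pooled.reshape(Hb, Wb, C)
```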
Fusing Bird’s Eye View LIDAR Point Cloud and Front
View Camera Image for Deep Object Detection
The fusion-based one-stage object detection network with MS-CNN networks.
PointFusion: Deep Sensor Fusion for 3D
Bounding Box Estimation
• PointFusion is a generic 3D object detection method that leverages both image and 3D point
cloud information.
• Unlike existing methods that either use multi-stage pipelines or hold sensor- and dataset-
specific assumptions, PointFusion is conceptually simple and application-agnostic.
• It consists of: an off-the-shelf CNN that extracts appearance and geometry features from
input RGB image crops, a variant of PointNet that processes the raw 3D point cloud, and a
fusion sub-network that combines the two outputs to predict 3D bounding boxes.
• The image data and the raw point cloud data are independently processed by a CNN and a
PointNet architecture, respectively.
• The resulting outputs are then combined by a fusion network, which predicts multiple 3D
box hypotheses and their confidences, using the input 3D points as spatial anchors.
PointFusion: Deep Sensor Fusion for 3D
Bounding Box Estimation
Two feature extractors: a PointNet variant that processes raw point cloud data (A), and a CNN that extracts visual
features from an input image (B). Two fusion network formulations: a vanilla global architecture that directly regresses
the box corner locations (D), and a dense architecture that predicts the spatial offset of each of the 8 corners relative to
an input point, (C): for each input point, the network predicts the spatial offset (white arrows) from a corner (red dot) to
the input point (blue), and selects the prediction with the highest score as the final prediction (E).
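A toy sketch of the dense formulation's selection step, assuming the fusion network has already produced per-point corner offsets and scores; the names and the offset sign convention follow the caption above rather than the released code.

```python
import numpy as np

def select_dense_box(points, corner_offsets, scores):
    """Pick the final box from per-point hypotheses (PointFusion-style dense head).

    points:         (N, 3) input 3D points used as spatial anchors.
    corner_offsets: (N, 8, 3) predicted offsets from each of the 8 corners to the point.
    scores:         (N,) confidence of each per-point hypothesis.
    """
    best = np.argmax(scores)                       # highest-scoring anchor point
    # Offsets point from corner to anchor, so corners are recovered by subtraction.
    corners = points[best] - corner_offsets[best]  # (8, 3) absolute corner locations
    return corners, scores[best]
```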
RoarNet: A Robust 3D Object Detection based
on RegiOn Approximation Refinement
• RoarNet is an approach for 3D object detection from 2D images and 3D LiDAR point clouds.
• It is based on a two-stage object detection framework with PointNet as the backbone network.
• The first part, RoarNet 2D, estimates the 3D poses of objects from a monocular image, which
approximates where to examine further, and derives multiple candidates that are
geometrically feasible.
• This step significantly narrows down the feasible 3D regions, which otherwise would require
demanding processing of 3D point clouds over a huge search space.
• The second part, RoarNet 3D, takes the candidate regions and conducts in-depth inferences
to conclude final poses in a recursive manner.
• RoarNet 3D processes 3D point clouds without any loss of data, leading to precise detection.
RoarNet: A Robust 3D Object Detection based
on RegiOn Approximation Refinement
The model first predicts 2D bounding boxes and 3D poses of objects from a 2D image. For each 2D object
detection, a geometric agreement search is applied to predict the location of the object in 3D space. Centered
on each location prediction, a region proposal is set in the shape of a standing cylinder. Taking the prediction
error in the bounding box and pose into account, there can be multiple region proposals for a single object.
RoarNet: A Robust 3D Object Detection based
on RegiOn Approximation Refinement
• Each region proposal is responsible for detecting a single object.
• Taking the point cloud sampled from each region proposal as input, the model predicts the
location of an object relative to the center of the region proposal, which recursively serves to
set new region proposals for the next step.
• The model also predicts an objectness score, which reflects the probability of an object being
inside the region proposal.
• Only proposals with high objectness scores are considered at the next step.
• At the final step, the model sets new region proposals at the previously predicted locations.
• The model predicts all coordinates required for 3D bounding box regression, including the
location, rotation, and size of the objects.
• In practice, repeating this step more than once gives better detection performance (a sketch
of the refinement loop follows this list).
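The refinement loop sketched below is schematic: `predict_fn` is a hypothetical stand-in for the PointNet-based RoarNet 3D head, and the cylinder radius and score threshold are illustrative values.

```python
import numpy as np

def sample_points_in_cylinder(points, center, radius):
    """Points whose horizontal distance to the cylinder axis is below `radius`."""
    d = np.linalg.norm(points[:, :2] - center[:2], axis=1)
    return points[d < radius]

def recursive_refinement(points, centers, predict_fn, steps=2, radius=2.0, thresh=0.5):
    """RoarNet-3D-style recursive region refinement (schematic).

    `predict_fn(pts)` stands in for the PointNet head; it must return a dict with
    'objectness' (float) and 'delta_xyz' ((3,) offset to the object center).
    """
    proposals = list(centers)                       # standing-cylinder centers from RoarNet 2D
    for _ in range(steps):
        refined = []
        for c in proposals:
            pts = sample_points_in_cylinder(points, c, radius)
            if len(pts) == 0:
                continue
            pred = predict_fn(pts)
            if pred["objectness"] < thresh:
                continue                            # keep only likely-object proposals
            refined.append(c + pred["delta_xyz"])   # recenter the proposal for the next step
        proposals = refined
    return proposals                                # a final step also regresses size and rotation
```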
RoarNet: A Robust 3D Object Detection based
on RegiOn Approximation Refinement
Architecture of RoarNet 2D
RoarNet: A Robust 3D Object Detection based
on RegiOn Approximation Refinement
The backbone network is a simplified version of PointNet without T-Net.
RoarNet: A Robust 3D Object Detection based
on RegiOn Approximation Refinement
Joint 3D Proposal Generation and Object
Detection from View Aggregation
• AVOD is an Aggregate View Object Detection network for autonomous driving scenarios.
• The neural network architecture uses LIDAR point clouds and RGB images to generate
features that are shared by two subnetworks: a region proposal network (RPN) and a second
stage detector network.
• The RPN uses an architecture capable of performing multimodal feature fusion on high
resolution feature maps to generate reliable 3D object proposals for multiple object classes
in road scenes.
• Using these proposals, the second stage detection network performs accurate oriented 3D
bounding box regression and category classification to predict the extents, orientation, and
classification of objects in 3D space.
• The proposed architecture produces state-of-the-art results on the KITTI 3D object detection
benchmark while running in real time with a low memory footprint.
• Code: https://github.com/kujason/avod
Joint 3D Proposal Generation and Object
Detection from View Aggregation
The method’s architectural diagram. The feature extractors are shown in blue, the region proposal network in
pink, and the second stage detection network in green.
Joint 3D Proposal Generation and Object
Detection from View Aggregation
The architecture of the proposed high-resolution feature extractor,
shown here for the image branch. Feature maps are
propagated from the encoder to the decoder section via red
arrows. Fusion is then performed at every stage of the decoder
by a learned upsampling layer, followed by concatenation, and
then mixing via a convolutional layer, resulting in a full
resolution feature map at the last layer of the decoder.
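A small PyTorch sketch of one such decoder stage, following the upsample-concatenate-mix pattern described in the caption; channel counts and layer choices are assumptions, not the exact AVOD layers.

```python
import torch
import torch.nn as nn

class DecoderFusionStage(nn.Module):
    """One decoder stage: learned upsampling, skip concatenation, 3x3 mixing conv."""

    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        # Learned upsampling of the coarser decoder feature map.
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        # Mixing convolution applied after concatenation with the encoder skip.
        self.mix = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, decoder_feat, encoder_skip):
        x = self.up(decoder_feat)                  # upsample to the skip's resolution
        x = torch.cat([x, encoder_skip], dim=1)    # fuse encoder and decoder features
        return self.mix(x)                         # fused feature map at this stage
```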
Joint 3D Proposal Generation and Object
Detection from View Aggregation
A visual comparison between the 8 corner box encoding, the
axis aligned box encoding, and the 4 corner encoding.
Joint 3D Proposal Generation and Object
Detection from View Aggregation
Left: 3D region proposal network output, Middle: 3D detection output, and Right: the projection of the
detection output onto image space for all three classes. The 3D LIDAR point cloud has been colorized and
interpolated for better visualization.
Frustum PointNets for 3D Object Detection
from RGB-D Data
• 3D object detection from RGB-D data in both indoor and outdoor scenes.
• While previous methods focus on images or 3D voxels, often obscuring natural 3D patterns
and invariances of 3D data, this method operates directly on raw point clouds by popping up
RGB-D scans.
• However, a key challenge of this approach is how to efficiently localize objects in point
clouds of large-scale scenes (region proposal).
• Instead of solely relying on 3D proposals, this method leverages both mature 2D object
detectors and advanced 3D deep learning for object localization, achieving efficiency as well
as high recall for even small objects.
• Benefiting from learning directly on raw point clouds, this method is also able to precisely
estimate 3D bounding boxes even under strong occlusion or with very sparse points.
• It is evaluated on the KITTI and SUN RGB-D 3D detection benchmarks.
Frustum PointNets for 3D Object Detection
from RGB-D Data
3D object detection pipeline. Given RGB-D data, 2D object region proposals are first generated in the RGB
image using a CNN. Each 2D region is then extruded to a 3D viewing frustum, in which a point cloud is
obtained from the depth data. Finally, the frustum PointNet predicts an (oriented and amodal) 3D bounding
box for the object from the points in the frustum.
Frustum PointNets for 3D Object Detection
from RGB-D Data
Frustum PointNets for 3D object detection. A 2D CNN object detector is first leveraged to propose 2D regions and
classify their content. The 2D regions are then lifted to 3D and thus become frustum proposals. Given a point cloud
in a frustum (n × c with n points and c channels of XYZ, intensity etc. for each point), the object instance is
segmented by binary classification of each point. Based on the segmented object point cloud (m × c), a light-
weight regression PointNet (T-Net) aligns the points by translation such that their centroid is close to the amodal
box center. Finally, the box estimation net estimates the amodal 3D bounding box for the object.
Frustum PointNets for 3D Object Detection
from RGB-D Data
Coordinate systems for point cloud. Artificial points (black dots) are shown to
illustrate (a) default camera coordinate; (b) frustum coordinate after rotating
frustums to center view; (c) mask coordinate with object points’ centroid at
origin; (d) object coordinate predicted by T-Net.
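A minimal numpy sketch of the first two normalizations, assuming points already expressed in the camera frame and a simple pinhole model (`fx`, `cx` intrinsics); it is an approximation of the frustum rotation, not the released code.

```python
import numpy as np

def rotate_to_frustum_center(points_cam, box_center_u, fx, cx):
    """Rotate camera-frame points so the frustum's center ray aligns with +Z.

    box_center_u: horizontal image coordinate of the 2D box center;
    fx, cx:       camera focal length and principal point (pixels).
    """
    # Heading angle of the center ray in the camera X-Z plane.
    angle = np.arctan2(box_center_u - cx, fx)
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])  # rotation about the Y axis
    return points_cam @ R.T

def to_mask_coordinate(object_points):
    """Shift segmented object points so their centroid sits at the origin."""
    return object_points - object_points.mean(axis=0)
```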
Frustum PointNets for 3D Object Detection
from RGB-D Data
Basic architectures and IO for PointNets. Architecture is illustrated for PointNet++ (v2)
models with set abstraction layers and feature propagation layers (for segmentation).
Frustum PointNets for 3D Object Detection
from RGB-D Data
True positive detection boxes are shown in green, false positive boxes in red, and ground truth boxes in blue
for the false positive and false negative cases. The digit and letter beside each box denote the instance id and
semantic class, with “v” for car, “p” for pedestrian and “c” for cyclist.
Frustum PointNets for 3D Object Detection
from RGB-D Data
Network architectures for Frustum PointNets. v1 models are based on PointNet. v2 models are based on PointNet++ set
abstraction (SA) and feature propagation (FP) layers. The architecture for residual center estimation T-Net is shared for v1
and v2. The colors (blue for segmentation nets, red for T-Net and green for box estimation nets) of the network background
indicate the coordinate system of the input point cloud. Segmentation nets operate in frustum coordinate, T-Net processes
points in mask coordinate while box estimation nets take points in object coordinate. The small yellow square (or bar)
concatenated with global features is a class one-hot vector that indicates the predicted category of the underlying object.
Deep Continuous Fusion for Multi-Sensor
3D Object Detection
• It remains an open problem to design 3D detectors that can better exploit multiple
modalities.
• A 3D object detector can exploit both LIDAR as well as cameras to perform very accurate
localization.
• It reasons in bird’s eye view (BEV) and fuses image features by learning to project them
into BEV space.
• Towards this goal, an end-to-end learnable architecture exploits continuous convolutions to
fuse image and LIDAR feature maps at different levels of resolution.
• The proposed continuous fusion layer encodes both discrete-state image features and
continuous geometric information.
• This enables designing a reliable and efficient end-to-end learnable 3D object detector
based on multiple sensors.
Deep Continuous Fusion for Multi-Sensor
3D Object Detection
Architecture of the model. There are two streams, namely the camera image stream and the BEV LIDAR stream.
Continuous fusion layers are used to fuse the image features onto the BEV feature maps.
Deep Continuous Fusion for Multi-Sensor
3D Object Detection
Continuous fusion layer: given a target pixel on the BEV image, first extract the K nearest LIDAR points; then project
these 3D points onto the camera image plane to retrieve the corresponding image features; finally, feed the image
features plus the continuous geometric offsets into an MLP to generate the feature for the target pixel.
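The snippet below sketches this recipe for a single target BEV pixel, assuming the LiDAR-to-image projections and a small MLP callable are supplied; it follows the KNN-project-gather-MLP steps of the caption rather than the exact implementation.

```python
import numpy as np

def continuous_fusion_pixel(bev_xy, lidar_points, points_uv, img_feat, mlp, k=4):
    """Build the fused feature for one target BEV pixel.

    bev_xy:       (2,) metric ground-plane location of the target BEV pixel.
    lidar_points: (N, 3) LiDAR points; points_uv: (N, 2) their image projections.
    img_feat:     (H, W, C) camera feature map; mlp: callable on a 1-D feature vector.
    """
    # 1) K nearest LiDAR points to the target BEV location (in the ground plane).
    d = np.linalg.norm(lidar_points[:, :2] - bev_xy, axis=1)
    nearest = np.argsort(d)[:k]

    feats = []
    for i in nearest:
        u, v = points_uv[i].astype(int)          # 2) project the 3D point to the image
        image_feature = img_feat[v, u]           # 3) retrieve the corresponding image feature
        offset = lidar_points[i, :2] - bev_xy    # continuous geometric offset (ground plane)
        feats.append(np.concatenate([image_feature, offset]))

    # 4) MLP over the concatenated neighbors generates the BEV feature for this pixel.
    return mlp(np.concatenate(feats))
```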
Deep Continuous Fusion for Multi-Sensor
3D Object Detection
The 2D bounding boxes are obtained by projecting the 3D detections onto the image.
The bounding boxes of an object in the BEV and in the image are shown in the same color.
Multi-View 3D Object Detection Network
for Autonomous Driving
• Multi-View 3D networks (MV3D): a sensory-fusion framework that takes both the LIDAR point
cloud and RGB images as input and predicts oriented 3D bounding boxes.
• It encodes the sparse 3D point cloud with a compact multi-view representation.
• The network is composed of two subnetworks: one for 3D object proposal generation and
another for multi-view feature fusion.
• The proposal network efficiently generates 3D candidate boxes from the bird’s eye view
representation of the 3D point cloud.
• A deep fusion scheme combines region-wise features from multiple views and enables
interactions between intermediate layers of the different paths (sketched below).
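A PyTorch sketch of one deep-fusion stage under stated assumptions: each view's region-wise feature is transformed and the results are joined by an element-wise mean, which is the general pattern described above, not the exact MV3D layers.

```python
import torch
import torch.nn as nn

class DeepFusionBlock(nn.Module):
    """One deep-fusion stage: per-view transform, then element-wise mean across views."""

    def __init__(self, channels, num_views=3):
        super().__init__()
        self.transforms = nn.ModuleList(
            [nn.Sequential(nn.Linear(channels, channels), nn.ReLU(inplace=True))
             for _ in range(num_views)]
        )

    def forward(self, view_feats):
        # view_feats: list of (B, C) region-wise features, one per view (BEV, FV, image).
        transformed = [t(f) for t, f in zip(self.transforms, view_feats)]
        fused = torch.stack(transformed, dim=0).mean(dim=0)   # element-wise mean join
        # Feeding the fused feature back to every view lets the next stage mix
        # information across the three paths.
        return [fused for _ in view_feats]
```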
Multi-View 3D Object Detection Network
for Autonomous Driving
Input features of the MV3D network.
Multi-View 3D Object Detection Network
for Autonomous Driving
The network takes the bird’s eye view and front view of the LIDAR point cloud as well as an image as input. It first
generates 3D object proposals from the bird’s eye view map and projects them to three views. A deep fusion
network is used to combine region-wise features obtained via ROI pooling for each view. The fused features
are used to jointly predict object class and do oriented 3D box regression.
End-to-end Learning of Multi-sensor 3D
Tracking by Detection
• This task, commonly referred to as multi-target tracking, consists of identifying how many
objects there are in each frame, as well as linking their trajectories over time.
• This is an approach to tracking by detection that exploits both camera and LIDAR
data to produce very accurate 3D trajectories.
• Towards this goal, it formulates the problem as inference in a deep structured model, where
the potentials are computed using convolutional neural nets.
• The matching cost of associating two detections exploits both appearance and motion via a
Siamese network that processes images and motion representations via convolutional layers.
• Inference in the model can be done exactly and efficiently by a set of feedforward passes
followed by solving a linear program (a simplified association sketch follows this list).
• Importantly, the model is formulated such that it can be trained end-to-end to solve both
the detection and tracking problems.
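A toy sketch of the association step only, with SciPy's Hungarian solver standing in for the paper's linear program; the learned match scores are assumed to be given as a matrix.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_detections(match_scores, min_score=0.0):
    """Link detections across two frames from a learned pairwise score matrix.

    match_scores: (N_prev, N_curr) scores from the Siamese matching network.
    Returns a list of (prev_idx, curr_idx) links.
    """
    # Maximize the total matching score (the solver minimizes cost, hence the negation).
    rows, cols = linear_sum_assignment(-match_scores)
    return [(r, c) for r, c in zip(rows, cols) if match_scores[r, c] > min_score]
```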
End-to-end Learning of Multi-sensor 3D
Tracking by Detection
This work formulates tracking as a system of multiple neural networks that are interwoven together in a single
architecture. Note that the system takes as external input a time series of RGB frames (camera images) and
LIDAR point clouds. From these inputs, the system produces discrete trajectories of the targets. In particular,
the architecture is end-to-end trainable while still maintaining explainability, which is achieved by formulating
the system in a structured manner.
End-to-end Learning of Multi-sensor 3D
Tracking by Detection
Neural networks designed for both
scoring and matching: the forward passes
over a set of detections from two frames.
End-to-end Learning of Multi-sensor 3D
Tracking by Detection
• To extract appearance features, a Siamese network based on VGG16 is employed.
• Note that in a Siamese setup, the two branches (each processing a detection) share the
same set of weights.
• This makes the architecture more efficient in terms of memory and allows learning with
fewer examples.
• In particular, each detection is resized to 224 × 224.
• To produce a concise representation of the activations without using fully connected layers,
each of the max-pool outputs is passed through a product layer followed by a weighted sum,
which produces a single scalar per max-pool layer, yielding an activation vector of size 5 (a
sketch of this skip-pooled matcher follows this list).
• Skip-pooling is used because matching should exploit both low-level features (e.g., color) and
semantically richer features from higher layers.
• To incorporate spatial information into the model, fully connected architectures that model
both 2D and 3D motion are employed.
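The sketch below approximates that skip-pooled matcher in PyTorch: the two 224 × 224 crops share a VGG16 backbone, each max-pool output of the two branches is multiplied element-wise and reduced by a learned weighted sum to one scalar, giving a 5-vector; the flattening and weight shapes are assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class SkipPoolMatcher(nn.Module):
    """Siamese appearance matcher: per-layer product + learned weighted sum -> 5-vector."""

    def __init__(self):
        super().__init__()
        # Shared VGG16 backbone (torchvision >= 0.13 API; pretrained weights omitted here).
        self.backbone = models.vgg16(weights=None).features
        # One learned weight vector per max-pool output (flattened); the sizes below
        # assume 224 x 224 input crops, as stated in the slide.
        dims = [64 * 112 * 112, 128 * 56 * 56, 256 * 28 * 28, 512 * 14 * 14, 512 * 7 * 7]
        self.weights = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(d)) for d in dims]
        )

    def _pools(self, x):
        outs = []
        for layer in self.backbone:
            x = layer(x)
            if isinstance(layer, nn.MaxPool2d):
                outs.append(x.flatten(1))       # (B, C*H*W) per max-pool stage
        return outs

    def forward(self, det_a, det_b):
        # Both detection crops pass through the same (weight-shared) backbone.
        scalars = [
            (pa * pb) @ w                       # product layer, then weighted sum -> one scalar
            for pa, pb, w in zip(self._pools(det_a), self._pools(det_b), self.weights)
        ]
        return torch.stack(scalars, dim=1)      # (B, 5) appearance activation vector
```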
End-to-end Learning of Multi-sensor 3D
Tracking by Detection
• In particular, 3D information is exploited in the form of a 180 × 200 occupancy grid in bird’s
eye view, along with 2D information from the occupancy region in the frontal camera view, scaled
down from the original resolution of 1242 × 375 to 124 × 37.
• In the bird’s eye perspective, each 3D detection is projected onto the ground plane, leaving
only a rotated rectangle that reflects its occupancy in the world.
• Since the observer is a mobile platform (an autonomous vehicle, in this case), the coordinate
system between two subsequent frames is shifted because the observer has moved in the
elapsed time.
• Since its speed along each axis is known from the IMU data, the displacement of the observer
between observations can be calculated and the coordinates translated accordingly; this way,
both grids lie in exactly the same coordinate system (a sketch of this compensation follows this list).
• The frontal view perspective encodes the rectangular area in the camera occupied by the
target, equivalent to projecting the 3D bounding box onto camera coordinates.
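A minimal numpy sketch of that ego-motion compensation, assuming per-axis ego speeds from the IMU and a known frame interval; names are illustrative.

```python
import numpy as np

def align_previous_frame(points_prev, ego_velocity, dt):
    """Shift last-frame coordinates into the current frame's coordinate system.

    points_prev:  (N, 2) or (N, 3) coordinates observed at the previous frame.
    ego_velocity: per-axis speed of the ego vehicle from the IMU (same dims as points).
    dt:           time elapsed between the two observations, in seconds.
    """
    displacement = np.asarray(ego_velocity) * dt      # how far the observer moved
    # The world appears shifted by -displacement in the moving observer's frame.
    return points_prev - displacement
```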
End-to-end Learning of Multi-sensor 3D
Tracking by Detection
Detector: MV3D
End-to-end Learning of Multi-sensor 3D
Tracking by Detection
Trajectories are color-coded: the same color indicates the same object.
fusion of Camera and lidar for autonomous driving I

More Related Content

What's hot

2018.02 intro to visual odometry
2018.02 intro to visual odometry2018.02 intro to visual odometry
2018.02 intro to visual odometryBrianHoltPhD
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015Jia-Bin Huang
 
An Introduction to Image Processing and Artificial Intelligence
An Introduction to Image Processing and Artificial IntelligenceAn Introduction to Image Processing and Artificial Intelligence
An Introduction to Image Processing and Artificial IntelligenceWasif Altaf
 
Deep VO and SLAM
Deep VO and SLAMDeep VO and SLAM
Deep VO and SLAMYu Huang
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learningYu Huang
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for BeginnersSanghamitra Deb
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detectionBrodmann17
 
입문 Visual SLAM 14강 - 2장 Introduction to slam
입문 Visual SLAM 14강  - 2장 Introduction to slam입문 Visual SLAM 14강  - 2장 Introduction to slam
입문 Visual SLAM 14강 - 2장 Introduction to slamjdo
 
3-d interpretation from single 2-d image for autonomous driving
3-d interpretation from single 2-d image for autonomous driving3-d interpretation from single 2-d image for autonomous driving
3-d interpretation from single 2-d image for autonomous drivingYu Huang
 
PR-302: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
PR-302: NeRF: Representing Scenes as Neural Radiance Fields for View SynthesisPR-302: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
PR-302: NeRF: Representing Scenes as Neural Radiance Fields for View SynthesisHyeongmin Lee
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImageryRAHUL BHOJWANI
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaPreferred Networks
 
You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)Universitat Politècnica de Catalunya
 
Deep learning based object detection basics
Deep learning based object detection basicsDeep learning based object detection basics
Deep learning based object detection basicsBrodmann17
 
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAIYurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAILviv Startup Club
 
Image ORB feature
Image ORB featureImage ORB feature
Image ORB featureGavin Gao
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsShunta Saito
 
Deep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIDeep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIYu Huang
 

What's hot (20)

2018.02 intro to visual odometry
2018.02 intro to visual odometry2018.02 intro to visual odometry
2018.02 intro to visual odometry
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
 
An Introduction to Image Processing and Artificial Intelligence
An Introduction to Image Processing and Artificial IntelligenceAn Introduction to Image Processing and Artificial Intelligence
An Introduction to Image Processing and Artificial Intelligence
 
Deep VO and SLAM
Deep VO and SLAMDeep VO and SLAM
Deep VO and SLAM
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learning
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
입문 Visual SLAM 14강 - 2장 Introduction to slam
입문 Visual SLAM 14강  - 2장 Introduction to slam입문 Visual SLAM 14강  - 2장 Introduction to slam
입문 Visual SLAM 14강 - 2장 Introduction to slam
 
3-d interpretation from single 2-d image for autonomous driving
3-d interpretation from single 2-d image for autonomous driving3-d interpretation from single 2-d image for autonomous driving
3-d interpretation from single 2-d image for autonomous driving
 
PR-302: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
PR-302: NeRF: Representing Scenes as Neural Radiance Fields for View SynthesisPR-302: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
PR-302: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi Kerola
 
Depth estimation using deep learning
Depth estimation using deep learningDepth estimation using deep learning
Depth estimation using deep learning
 
You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)
 
Deep learning based object detection basics
Deep learning based object detection basicsDeep learning based object detection basics
Deep learning based object detection basics
 
SLAM
SLAMSLAM
SLAM
 
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAIYurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
 
Image ORB feature
Image ORB featureImage ORB feature
Image ORB feature
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methods
 
Deep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIDeep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data II
 

Similar to fusion of Camera and lidar for autonomous driving I

3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving IIYu Huang
 
3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous drivingYu Huang
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IVYu Huang
 
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4IRJET Journal
 
3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image III3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image IIIYu Huang
 
Deep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal DataDeep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal DataYu Huang
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdfmokamojah
 
Deep learning for 3 d point clouds presentation
Deep learning for 3 d point clouds presentationDeep learning for 3 d point clouds presentation
Deep learning for 3 d point clouds presentationVijaylaxmiNagurkar
 
Udacity-Didi Challenge Finalists
Udacity-Didi Challenge FinalistsUdacity-Didi Challenge Finalists
Udacity-Didi Challenge FinalistsDavid Silver
 
Goal location prediction based on deep learning using RGB-D camera
Goal location prediction based on deep learning using RGB-D cameraGoal location prediction based on deep learning using RGB-D camera
Goal location prediction based on deep learning using RGB-D camerajournalBEEI
 
Understanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdfUnderstanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdfQualcomm Research
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and PredictionYu Huang
 
Presentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking ProjectPresentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking ProjectPrathamesh Joshi
 
Remote Sensing Field Camp 2016
Remote Sensing Field Camp 2016 Remote Sensing Field Camp 2016
Remote Sensing Field Camp 2016 COGS Presentations
 
Rapid Laser Scanning the process
Rapid Laser Scanning the processRapid Laser Scanning the process
Rapid Laser Scanning the processSeeview Solutions
 
IRJET- Automatic Traffic Sign Detection and Recognition using CNN
IRJET- Automatic Traffic Sign Detection and Recognition using CNNIRJET- Automatic Traffic Sign Detection and Recognition using CNN
IRJET- Automatic Traffic Sign Detection and Recognition using CNNIRJET Journal
 

Similar to fusion of Camera and lidar for autonomous driving I (20)

3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II
 
3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV
 
Mmpaper draft10
Mmpaper draft10Mmpaper draft10
Mmpaper draft10
 
Mmpaper draft10
Mmpaper draft10Mmpaper draft10
Mmpaper draft10
 
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
 
3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image III3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image III
 
Deep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal DataDeep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal Data
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
 
Major PRC-1 ppt.pptx
Major PRC-1 ppt.pptxMajor PRC-1 ppt.pptx
Major PRC-1 ppt.pptx
 
[DL輪読会]ClearGrasp
[DL輪読会]ClearGrasp[DL輪読会]ClearGrasp
[DL輪読会]ClearGrasp
 
Deep learning for 3 d point clouds presentation
Deep learning for 3 d point clouds presentationDeep learning for 3 d point clouds presentation
Deep learning for 3 d point clouds presentation
 
Udacity-Didi Challenge Finalists
Udacity-Didi Challenge FinalistsUdacity-Didi Challenge Finalists
Udacity-Didi Challenge Finalists
 
Goal location prediction based on deep learning using RGB-D camera
Goal location prediction based on deep learning using RGB-D cameraGoal location prediction based on deep learning using RGB-D camera
Goal location prediction based on deep learning using RGB-D camera
 
Understanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdfUnderstanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdf
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and Prediction
 
Presentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking ProjectPresentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking Project
 
Remote Sensing Field Camp 2016
Remote Sensing Field Camp 2016 Remote Sensing Field Camp 2016
Remote Sensing Field Camp 2016
 
Rapid Laser Scanning the process
Rapid Laser Scanning the processRapid Laser Scanning the process
Rapid Laser Scanning the process
 
IRJET- Automatic Traffic Sign Detection and Recognition using CNN
IRJET- Automatic Traffic Sign Detection and Recognition using CNNIRJET- Automatic Traffic Sign Detection and Recognition using CNN
IRJET- Automatic Traffic Sign Detection and Recognition using CNN
 

More from Yu Huang

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingYu Huang
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...Yu Huang
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingYu Huang
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingYu Huang
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationYu Huang
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVYu Huang
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduYu Huang
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the HoodYu Huang
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)Yu Huang
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingYu Huang
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?Yu Huang
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingYu Huang
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgYu Huang
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymoYu Huang
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningYu Huang
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingYu Huang
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningYu Huang
 
Lidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainLidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainYu Huang
 

More from Yu Huang (20)

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous Driving
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous Driving
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous Driving
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and Segmentation
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VI
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving V
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IV
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at Baidu
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the Hood
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous Driving
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atg
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymo
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planning
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planning
 
Lidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainLidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rain
 

Recently uploaded

Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 

Recently uploaded (20)

Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 

fusion of Camera and lidar for autonomous driving I

  • 1. Fusion of Camera and LiDAR for Autonomous Vehicles I (via Deep Learning) Yu Huang Yu.huang07@gmail.com Sunnyvale, California
  • 2. Outline • A General Pipeline for 3D Detection of Vehicles • Combining LiDAR Space Clustering and Convolutional Neural Networks for Pedestrian Detection • Fusing Bird’s Eye View LIDAR Point Cloud and Front View Camera Image for Deep Object Detection • PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation • RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement • Joint 3D Proposal Generation and Object Detection from View Aggregation • Frustum PointNets for 3D Object Detection from RGB-D Data • Deep Continuous Fusion for Multi-Sensor 3D Object Detection • Multi-View 3D Object Detection Network for Autonomous Driving • End-to-end Learning of Multi-sensor 3D Tracking by Detection
  • 3. A General Pipeline for 3D Detection of Vehicles • Autonomous driving requires 3D perception of vehicles and other objects in the in environment. • Much of the current methods support 2D vehicle detection. • Here is a pipeline to adopt any 2D detection network and fuse it with a 3D point cloud to generate 3D information with minimum changes of the 2D detection networks. • To identify the 3D box, a model fitting algorithm is developed based on generalized car models and score maps. • A two-stage convolutional neural network (CNN) is proposed to refine the detected 3D box. • It requires minimum efforts to modify the existing 2D networks to fit into the pipeline, adding just one additional regression term at the output layer to estimate the vehicle dimensions.
  • 4. A General Pipeline for 3D Detection of Vehicles The raw image is passed to a 2D detection network which provides 2D boxes around the vehicles in the image plane. Subsequently, a set of 3D points which fall into the 2D bounding box after projection is selected. With this set, a model fitting algorithm detects the 3D location and 3D bounding box of the vehicle. And then another CNN network, which takes the points that fit into the 3D bounding box as input, carries out the final 3D box regression and classification.
  • 5. Combining LiDAR Space Clustering and Convolutional Neural Networks for Pedestrian Detection • This is a pedestrian detector that exploits LiDAR data, in addition to visual information. • The hypothesis is that using depth data and prior info about the size of the objects, it can reduce the search space by providing candidates and, speeding up detection algorithms. • A hypothesis is that this prior definition of the location and size of the candidate bounding box will also decrease the number of false detections. • In the approach, LiDAR data is utilized to generate region proposals by processing the three dimensional point cloud that it provides. • These candidate regions are then further processed by a state-of-the-art CNN classifier that is fine-tuned for pedestrian detection.
  • 6. Combining LiDAR Space Clustering and Convolutional Neural Networks for Pedestrian Detection The algorithm is built upon the idea of clustering the 3-D point cloud of the LiDAR. It starts with raw measurements down-sampling, followed by removal of the floor plane. Then, a density-based clustering algorithm generates the candidates that are projected on the image space to provide a region of interest.
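A minimal sketch of that proposal stage, assuming a simple height threshold stands in for the fitted floor plane and using scikit-learn's DBSCAN as the density-based clustering step; the voxel size, height cutoff, and clustering parameters are illustrative, not the paper's values.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def lidar_proposals(points_xyz, voxel=0.1, ground_z=-1.6, eps=0.5, min_pts=10):
    # 1) Down-sample by snapping points to a voxel grid and keeping one per cell.
    keys = np.floor(points_xyz / voxel).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    pts = points_xyz[idx]
    # 2) Remove the (approximate) floor plane with a height threshold.
    pts = pts[pts[:, 2] > ground_z]
    # 3) Density-based clustering on the remaining points.
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(pts)
    # 4) Return one candidate (cluster of points) per label, ignoring noise (-1).
    return [pts[labels == k] for k in set(labels) if k != -1]
```

Each returned cluster would then be projected onto the image to define the region of interest handed to the CNN classifier.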
  • 7. Fusing Bird’s Eye View LIDAR Point Cloud and Front View Camera Image for Deep Object Detection • This is a method for fusing LIDAR point cloud and camera-captured images in deep convolutional neural networks (CNNs). • The method constructs a layer called the sparse non-homogeneous pooling layer to transform features between bird’s eye view and front view. • The sparse point cloud is used to construct the mapping between the two views. • The pooling layer allows efficient fusion of the multi-view features at any stage of the network. • This is favorable for 3D object detection using camera-LIDAR fusion for autonomous driving. • A corresponding one-stage detector is designed and tested on the KITTI bird’s eye view object detection dataset, producing 3D bounding boxes from the bird’s eye view map. • The fusion method shows significant improvement in both speed and accuracy of pedestrian detection over other fusion-based object detection networks.
  • 8. Fusing Bird’s Eye View LIDAR Point Cloud and Front View Camera Image for Deep Object Detection The sparse non-homogeneous pooling layer that fuses front view image and bird’s eye view LIDAR feature.
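The layer itself is more involved, but its core idea of using the sparse point correspondences to move features between views can be sketched in PyTorch as a gather from the front-view map followed by an average-pooled scatter into the BEV grid. The shapes, names, and the averaging choice are assumptions, not the paper's implementation.

```python
import torch

def image_to_bev(img_feat, uv, bev_rc, bev_hw):
    """img_feat: (C, H, W) front-view features; uv: (N, 2) long tensor of pixel
    (u, v) coordinates of the LiDAR points; bev_rc: (N, 2) long tensor of BEV
    (row, col) cells of the same points; bev_hw: (Hb, Wb) BEV grid size."""
    C = img_feat.shape[0]
    Hb, Wb = bev_hw
    feats = img_feat[:, uv[:, 1], uv[:, 0]].t()             # (N, C) one feature per point
    flat = bev_rc[:, 0] * Wb + bev_rc[:, 1]                 # flattened BEV cell index
    out = torch.zeros(Hb * Wb, C)
    cnt = torch.zeros(Hb * Wb, 1)
    out.index_add_(0, flat, feats)                          # scatter-sum into BEV cells
    cnt.index_add_(0, flat, torch.ones(len(flat), 1))
    return (out / cnt.clamp(min=1)).t().reshape(C, Hb, Wb)  # average where points exist
```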
  • 9. Fusing Bird’s Eye View LIDAR Point Cloud and Front View Camera Image for Deep Object Detection The fusion-based one-stage object detection network with MS-CNN networks.
  • 10. PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation • PointFusion is a generic 3D object detection method that leverages both image and 3D point cloud information. • Unlike existing methods that either use multi-stage pipelines or hold sensor- and dataset-specific assumptions, PointFusion is conceptually simple and application-agnostic. • It consists of an off-the-shelf CNN that extracts appearance and geometry features from input RGB image crops, a variant of PointNet that processes the raw 3D point cloud, and a fusion sub-network that combines the two outputs to predict 3D bounding boxes. • The image data and the raw point cloud data are independently processed by a CNN and a PointNet architecture, respectively. • The resulting outputs are then combined by a fusion network, which predicts multiple 3D box hypotheses and their confidences, using the input 3D points as spatial anchors.
  • 11. PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation Two feature extractors: a PointNet variant that processes raw point cloud data (A), and a CNN that extracts visual features from an input image (B). Two fusion network formulations: a vanilla global architecture that directly regresses the box corner locations (D), and a dense architecture that predicts the spatial offset of each of the 8 corners relative to an input point, (C): for each input point, the network predicts the spatial offset (white arrows) from a corner (red dot) to the input point (blue), and selects the prediction with the highest score as the final prediction (E).
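A small sketch of the dense formulation's selection step, under assumed shapes and following the offset convention in the figure (offsets point from a corner to the input point); this is an illustration, not PointFusion's code.

```python
import numpy as np

def select_dense_hypothesis(points, corner_offsets, scores):
    """points: (N, 3) input 3D points used as spatial anchors;
    corner_offsets: (N, 8, 3) predicted offsets from each corner to its anchor point;
    scores: (N,) confidence of each per-point box hypothesis."""
    best = int(np.argmax(scores))                            # highest-scoring anchor point
    corners = points[best][None, :] - corner_offsets[best]   # corner = point - offset
    return corners, float(scores[best])                      # (8, 3) box corners and its score
```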
  • 12. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement • RoarNet is an approach for 3D object detection from a 2D image and 3D LiDAR point clouds. • It is based on a two-stage object detection framework with PointNet as the backbone network. • The first part, RoarNet 2D, estimates the 3D poses of objects from a monocular image, which approximates where to examine further, and derives multiple candidates that are geometrically feasible. • This step significantly narrows down the feasible 3D regions, which would otherwise require demanding processing of 3D point clouds in a huge search space. • The second part, RoarNet 3D, takes the candidate regions and conducts in-depth inferences to conclude final poses in a recursive manner. • RoarNet 3D processes 3D point clouds without any loss of data, leading to precise detection.
  • 13. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement The model first predicts the 2D bounding boxes and 3D poses of objects from a 2D image. For each 2D detection, geometric agreement search is applied to predict the location of the object in 3D space. Centered on each location prediction, a region proposal is set in the shape of a standing cylinder. Taking the prediction error in bounding box and pose into account, there can be multiple region proposals for a single object.
  • 14. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement • Each region proposal is responsible for detecting a single object. • Taking the point clouds sampled from each region proposal as input, the model predicts the location of an object relative to the center of the region proposal, which recursively serves to set new region proposals for the next step. • The model also predicts an objectness score which reflects the probability of an object being inside the region proposal. • Only proposals with high objectness scores are considered at the next step. • At the final step, the model sets new region proposals at the previously predicted locations. • The model predicts all coordinates required for 3D bounding box regression, including the location, rotation, and size of the objects. • In practice, repeating this step more than once gives better detection performance.
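The recursion can be summarized with the short sketch below; `predictor` is a hypothetical stand-in for the RoarNet 3D network, and the cylinder radius, step count, and objectness threshold are illustrative.

```python
import numpy as np

def crop_cylinder(points, center, radius=2.0):
    # Keep points whose horizontal distance to the proposal center is within the radius.
    d = np.linalg.norm(points[:, :2] - center[:2], axis=1)
    return points[d < radius]

def recursive_refinement(proposals, points, predictor, steps=2, thresh=0.5):
    """predictor(pts) -> (delta_xyz, objectness); a placeholder for RoarNet 3D."""
    for _ in range(steps):
        kept = []
        for center in proposals:
            pts = crop_cylinder(points, center)
            if len(pts) == 0:
                continue
            delta, objectness = predictor(pts)
            if objectness > thresh:              # drop low-objectness proposals
                kept.append(center + delta)      # re-center the proposal for the next step
        proposals = kept
    return proposals                             # a final pass would regress the full box
```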
  • 15. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement Architecture of RoarNet 2D
  • 16. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement The backbone network is a simplified version of PointNet without T-Net.
  • 17. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement
  • 18. Joint 3D Proposal Generation and Object Detection from View Aggregation • AVOD, an Aggregate View Object Detection network for autonomous driving scenarios. • The neural network architecture uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network. • The RPN uses an architecture capable of performing multimodal feature fusion on high resolution feature maps to generate reliable 3D object proposals for multiple object classes in road scenes. • Using these proposals, the second stage detection network performs accurate oriented 3D bounding box regression and category classification to predict the extents, orientation, and classification of objects in 3D space. • The proposed architecture produces state-of-the-art (SoA) results on the KITTI 3D object detection benchmark while running in real time with a low memory footprint. • Code is available at https://github.com/kujason/avod
  • 19. Joint 3D Proposal Generation and Object Detection from View Aggregation The method’s architectural diagram. The feature extractors are shown in blue, the region proposal network in pink, and the second stage detection network in green.
  • 20. Joint 3D Proposal Generation and Object Detection from View Aggregation The architecture of proposed high resolution feature extractor shown here for the image branch. Feature maps are propagated from the encoder to the decoder section via red arrows. Fusion is then performed at every stage of the decoder by a learned upsampling layer, followed by concatenation, and then mixing via a convolutional layer, resulting in a full resolution feature map at the last layer of the decoder.
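A minimal PyTorch sketch of one such decoder stage: a learned upsampling (here a transposed convolution), concatenation with the corresponding encoder map, and a convolution that mixes the channels. The channel and spatial sizes are assumptions, not AVOD's actual configuration.

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)   # learned upsampling
        self.mix = nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x, skip):
        x = self.up(x)                           # upsample the coarser decoder feature
        x = torch.cat([x, skip], dim=1)          # concatenate with the encoder feature map
        return self.mix(x)                       # mix channels via a convolutional layer

stage = DecoderStage(in_ch=256, skip_ch=128, out_ch=128)
full_res = stage(torch.randn(1, 256, 45, 40), torch.randn(1, 128, 90, 80))  # -> (1, 128, 90, 80)
```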
  • 21. Joint 3D Proposal Generation and Object Detection from View Aggregation A visual comparison between the 8 corner box encoding, the axis aligned box encoding, and the proposed 4 corner encoding.
  • 22. Joint 3D Proposal Generation and Object Detection from View Aggregation Left: 3D region proposal network output, Middle: 3D detection output, and Right: the projection of the detection output onto image space for all three classes. The 3D LIDAR point cloud has been colorized and interpolated for better visualization.
  • 23. Frustum PointNets for 3D Object Detection from RGB-D Data • 3D object detection from RGB-D data in both indoor and outdoor scenes. • While previous methods focus on images or 3D voxels, often obscuring natural 3D patterns and invariances of 3D data, this method operates directly on raw point clouds by popping up RGB-D scans. • However, a key challenge of this approach is how to efficiently localize objects in point clouds of large-scale scenes (region proposal). • Instead of solely relying on 3D proposals, this method leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall even for small objects. • Benefiting from learning directly on raw point clouds, this method is also able to precisely estimate 3D bounding boxes even under strong occlusion or with very sparse points. • It is evaluated on the KITTI and SUN RGB-D 3D detection benchmarks.
  • 24. Frustum PointNets for 3D Object Detection from RGB-D Data 3D object detection pipeline. Given RGB-D data, 2D object region proposals are first generated in the RGB image using a CNN. Each 2D region is then extruded to a 3D viewing frustum, from which a point cloud is obtained from the depth data. Finally, the frustum PointNet predicts an (oriented and amodal) 3D bounding box for the object from the points in the frustum.
  • 25. Frustum PointNets for 3D Object Detection from RGB-D Data Frustum PointNets for 3D object detection. First, a 2D CNN object detector proposes 2D regions and classifies their content. The 2D regions are then lifted to 3D and thus become frustum proposals. Given a point cloud in a frustum (n × c with n points and c channels of XYZ, intensity etc. for each point), the object instance is segmented by binary classification of each point. Based on the segmented object point cloud (m × c), a lightweight regression PointNet (T-Net) tries to align the points by translation such that their centroid is close to the amodal box center. Finally, the box estimation net estimates the amodal 3D bounding box for the object.
  • 26. Frustum PointNets for 3D Object Detection from RGB-D Data Coordinate systems for point cloud. Artificial points (black dots) are shown to illustrate (a) default camera coordinate; (b) frustum coordinate after rotating frustums to center view; (c) mask coordinate with object points’ centroid at origin; (d) object coordinate predicted by T-Net.
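For illustration, the rotation into the frustum coordinate of (b) can be sketched as follows, assuming camera-frame points (x right, y down, z forward) and a known center-ray direction, e.g. the ray through the 2D box center; this is a sketch, not the authors' code.

```python
import numpy as np

def rotate_to_center_view(points, center_ray):
    """points: (N, 3) in the camera frame; center_ray: (3,) direction of the
    frustum center ray. Rotates about the vertical (y) axis so the center ray
    aligns with the camera z-axis."""
    angle = np.arctan2(center_ray[0], center_ray[2])   # heading of the center ray
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])                        # rotation about the y axis
    return points @ R.T
```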
  • 27. Frustum PointNets for 3D Object Detection from RGB-D Data Basic architectures and IO for PointNets. Architecture is illustrated for PointNet++ (v2) models with set abstraction layers and feature propagation layers (for segmentation).
  • 28. Frustum PointNets for 3D Object Detection from RGB-D Data True positive detection boxes are in green, while false positive boxes are in red and ground truth boxes in blue are shown for false positive and false negative cases. Digit and letter beside each box denote instance id and semantic class, with “v” for cars, “p” for pedestrian and “c” for cyclist.
  • 29. Frustum PointNets for 3D Object Detection from RGB-D Data Network architectures for Frustum PointNets. v1 models are based on PointNet. v2 models are based on PointNet++ set abstraction (SA) and feature propagation (FP) layers. The architecture for residual center estimation T-Net is shared for v1 and v2. The colors (blue for segmentation nets, red for T-Net and green for box estimation nets) of the network background indicate the coordinate system of the input point cloud. Segmentation nets operate in frustum coordinate, T-Net processes points in mask coordinate while box estimation nets take points in object coordinate. The small yellow square (or bar) concatenated with global features is class one-hot vector that tells the predicted category of the underlying object.
  • 30. Deep Continuous Fusion for Multi-Sensor 3D Object Detection • It remains an open problem to design 3D detectors that can better exploit multiple modalities. • A 3D object detector can exploit both LIDAR as well as cameras to perform very accurate localization. • It reasons in bird’s eye view (BEV) and fuses image features by learning to project them into BEV space. • Towards this goal, an end-to-end learnable architecture exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution. • The proposed continuous fusion layer encodes both discrete-state image features and continuous geometric information. • This enables designing a reliable and efficient end-to-end learnable 3D object detector based on multiple sensors.
  • 31. Deep Continuous Fusion for Multi-Sensor 3D Object Detection Architecture of the model. There are two streams, namely the camera image stream and the BEV LIDAR stream. Continuous fusion layers are used to fuse the image features onto the BEV feature maps.
  • 32. Deep Continuous Fusion for Multi-Sensor 3D Object Detection Continuous fusion layer: given a target pixel in the BEV image, first extract the K nearest LIDAR points; then project these 3D points onto the camera image plane to retrieve the corresponding image features; finally, feed the image features together with the continuous geometric offsets into an MLP to generate the feature for the target pixel.
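A hedged PyTorch sketch of that per-pixel computation: look up the image feature at each of the K projected points, append each point's geometric offset from the target BEV pixel, run an MLP, and aggregate over the K neighbours (max aggregation here is a simplification of the paper's layer). All shapes and the MLP layout are assumptions.

```python
import torch
import torch.nn as nn

def continuous_fusion_pixel(img_feat, uv, offsets, mlp):
    """img_feat: (C, H, W) image features; uv: (K, 2) long tensor of pixel
    coordinates of the K nearest LiDAR points; offsets: (K, 3) geometric
    offsets of those points from the target BEV pixel; mlp: (K, C+3) -> (K, C_out)."""
    feats = img_feat[:, uv[:, 1], uv[:, 0]].t()          # (K, C) image features per point
    fused = mlp(torch.cat([feats, offsets], dim=1))      # append continuous geometry
    return fused.max(dim=0).values                       # aggregate over the K neighbours

C, C_out, K = 64, 64, 8
mlp = nn.Sequential(nn.Linear(C + 3, 64), nn.ReLU(), nn.Linear(64, C_out))
bev_pixel_feat = continuous_fusion_pixel(torch.randn(C, 96, 312),
                                         torch.randint(0, 96, (K, 2)),
                                         torch.randn(K, 3), mlp)
```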
  • 33. Deep Continuous Fusion for Multi-Sensor 3D Object Detection The 2D bounding boxes are obtained by projecting the 3D detections onto the image. The bounding box of an object on BEV and images are shown in the same color.
  • 34. Multi-View 3D Object Detection Network for Autonomous Driving • Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes. • It encodes the sparse 3D point cloud with a compact multi-view representation. • The network is composed of two subnetworks: one for 3D object proposal generation and another for multi-view feature fusion. • The proposal network generates 3D candidate boxes efficiently from the bird’s eye view representation of 3D point cloud. • A deep fusion scheme combines region-wise features from multiple views and enables interactions between intermediate layers of different paths.
  • 35. Multi-View 3D Object Detection Network for Autonomous Driving Input features of the MV3D network.
  • 36. Multi-View 3D Object Detection Network for Autonomous Driving The network takes the bird’s eye view and front view of the LIDAR point cloud as well as an image as input. It first generates 3D object proposals from the bird’s eye view map and projects them to the three views. A deep fusion network is used to combine region-wise features obtained via ROI pooling for each view. The fused features are used to jointly predict the object class and perform oriented 3D box regression.
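A minimal PyTorch sketch of a deep-fusion block in this spirit: the region-wise features from the three views are joined element-wise at an intermediate layer (element-wise mean is one choice) and each path then applies its own layer, so the views interact repeatedly rather than only at the input or output. The layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class DeepFusionBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fc_bev = nn.Linear(dim, dim)   # bird's eye view path
        self.fc_fv = nn.Linear(dim, dim)    # front view path
        self.fc_img = nn.Linear(dim, dim)   # image path

    def forward(self, bev, fv, img):
        joined = (bev + fv + img) / 3.0                  # element-wise mean join
        # Each view transforms the joined feature with its own layer.
        return self.fc_bev(joined), self.fc_fv(joined), self.fc_img(joined)

block = DeepFusionBlock(dim=512)
b, f, i = block(torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 512))
```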
  • 37. End-to-end Learning of Multi-sensor 3D Tracking by Detection • This task, commonly referred to as multi-target tracking, consists of identifying how many objects there are in each frame, as well as linking their trajectories over time. • This is an approach to tracking by detection that can exploit both camera and LIDAR data to produce very accurate 3D trajectories. • Towards this goal, it formulates the problem as inference in a deep structured model, where the potentials are computed using convolutional neural nets. • The matching cost of associating two detections exploits both appearance and motion via a Siamese network that processes images and motion representations via convolutional layers. • Inference in the model can be done exactly and efficiently by a set of feedforward passes followed by solving a linear program. • Importantly, the model is formulated such that it can be trained end-to-end to solve both the detection and tracking problems.
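The exact structured inference is beyond a short example, but the association step it solves can be illustrated with a simplified stand-in: given network-predicted matching costs between detections in two frames, a minimum-cost one-to-one assignment links them. SciPy's Hungarian solver here replaces the paper's linear program, and the cost values are made up for demonstration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

costs = np.array([[0.2, 0.9, 0.8],
                  [0.7, 0.1, 0.6]])           # rows: frame t detections, cols: frame t+1
rows, cols = linear_sum_assignment(costs)     # minimum-cost one-to-one matching
matches = list(zip(rows.tolist(), cols.tolist()))   # e.g. [(0, 0), (1, 1)]
```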
  • 38. End-to-end Learning of Multi-sensor 3D Tracking by Detection This work formulates tracking as a system of multiple neural networks that are interwoven in a single architecture. Note that the system takes as external input a time series of RGB frames (camera images) and LIDAR point clouds. From these inputs, the system produces discrete trajectories of the targets. In particular, the architecture is end-to-end trainable while still maintaining explainability, which is achieved by formulating the system in a structured manner.
  • 39. End-to-end Learning of Multi-sensor 3D Tracking by Detection Neural networks designed for both scoring and matching: the forward passes over a set of detections from two frames.
  • 40. End-to-end Learning of Multi-sensor 3D Tracking by Detection • To extract appearance features, employ a Siamese network based on VGG16. • Note that in a Siamese setup, the two branches (each processing a detection) share the same set of weights. • This makes the architecture more efficient in terms of memory and allows learning with fewer examples. • In particular, resize each detection to a dimension of 224 × 224. • To produce a concise representation of activations without using fully connected layers, each of the max-pool outputs is passed through a product layer followed by a weighted sum, which produces a single scalar for each max-pool layer, yielding an activation vector of size 5. • Skip-pooling is used since matching should exploit both low-level features (e.g., color) and semantically richer features from higher layers. • To incorporate spatial information into the model, employ fully connected architectures that model both 2D and 3D motion.
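A sketch of that matching branch under assumed shapes: the activations at the five max-pool outputs of the shared VGG16 branches are combined by an element-wise product followed by a learned weighted sum, giving one scalar per pooling stage and hence a length-5 vector. Reducing each map to a channel descriptor first is a simplification for brevity, not the paper's exact product layer.

```python
import torch
import torch.nn as nn

class SkipPoolMatch(nn.Module):
    def __init__(self, channel_sizes=(64, 128, 256, 512, 512)):
        super().__init__()
        # One learned weight vector per max-pool stage of VGG16.
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.ones(c) / c) for c in channel_sizes])

    def forward(self, feats_a, feats_b):
        """feats_a / feats_b: lists of five feature maps (C_i, H_i, W_i), one per
        max-pool output of the two Siamese branches (shared weights)."""
        scalars = []
        for fa, fb, w in zip(feats_a, feats_b, self.weights):
            pa = fa.mean(dim=(1, 2))                 # channel descriptor of detection A
            pb = fb.mean(dim=(1, 2))                 # channel descriptor of detection B
            scalars.append((w * pa * pb).sum())      # product layer + weighted sum -> scalar
        return torch.stack(scalars)                  # activation vector of size 5
```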
  • 41. End-to-end Learning of Multi-sensor 3D Tracking by Detection • In particular, exploit 3D information in the form of a 180 × 200 occupancy grid in bird’s eye view and 2D information from the occupancy region in the frontal view camera, scaled down from the original resolution of 1242 × 375 to 124 × 37. • In the bird’s eye perspective, each 3D detection is projected onto the ground plane, leaving only a rotated rectangle that reflects its occupancy in the world. • Since the observer is a mobile platform (an autonomous vehicle, in this case), the coordinate system between two subsequent frames is shifted because the observer moved in the time elapsed. • Since its speed in each axis is known from the IMU data, one can calculate the displacement of the observer between the two observations and translate the coordinates accordingly; this way, both grids are in the exact same coordinate system. • The frontal view perspective encodes the rectangular area in the camera occupied by the target, equivalent to projecting the 3D bounding box onto camera coordinates.
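The ego-motion compensation amounts to a simple translation (rotation of the platform is ignored in this sketch); the variable names and the constant-velocity-per-interval assumption are illustrative.

```python
import numpy as np

def compensate_ego_motion(prev_points_xy, speed_xy, dt):
    """prev_points_xy: (N, 2) bird's-eye-view coordinates from the previous frame;
    speed_xy: (2,) ego speed along x and y from the IMU; dt: elapsed time."""
    displacement = np.asarray(speed_xy) * dt
    # Shifting the old points by the ego displacement expresses them in the
    # current frame's coordinate system, so both occupancy grids align.
    return prev_points_xy - displacement
```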
  • 42. End-to-end Learning of Multi-sensor 3D Tracking by Detection Detector: MV3D
  • 43. End-to-end Learning of Multi-sensor 3D Tracking by Detection Trajectories are color-coded, so that the same color indicates the same object.