Mobility Technologies Co., Ltd.
3D Perception for Autonomous Driving
- Datasets and Algorithms -
Kazuyuki Miyazawa
AI R&D Group 2, AI System Dept.
Mobility Technologies Co., Ltd.
Mobility Technologies Co., Ltd.
Who am I?
2
@kzykmyzw
Kazuyuki Miyazawa
Group Leader
AI R&D Group 2
AI System Dept.
Mobility Technologies Co., Ltd.
Past Work Experience
April 2019 - March 2020
AI Research Engineer@DeNA Co., Ltd.
April 2010 - March 2019
Research Scientist@Mitsubishi Electric Corp.
Education
PhD in Information Science@Tohoku University
Mobility Technologies Co., Ltd.3
1 Autonomous Driving Datasets
Agenda
2 3D Object Detection Algorithms
Mobility Technologies Co., Ltd.
3D Object Detection: Motivation
■ 2D bounding boxes are not sufficient
■ Lack of 3D pose, occlusion information, and 3D location
Preliminary (Today’s Main Topic)
4
2D Object Detection 3D Object Detection
http://www.cs.toronto.edu/~byang/
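As a rough illustration of the extra degrees of freedom, here is a minimal sketch (the field names are mine, not any dataset's API) contrasting a 2D box with the 7-DOF 3D box used throughout this talk:

```python
from dataclasses import dataclass

@dataclass
class Box2D:
    """Axis-aligned box in image coordinates: no depth, no metric size, no orientation."""
    x1: float
    y1: float
    x2: float
    y2: float

@dataclass
class Box3D:
    """7-DOF box: 3D center, 3D size, and heading (yaw) around the up axis."""
    x: float
    y: float
    z: float
    length: float
    width: float
    height: float
    yaw: float  # typically in [-pi, pi]
```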
Mobility Technologies Co., Ltd.
Autonomous Driving
Datasets
5
01
Mobility Technologies Co., Ltd.
KITTI [2012]
6
Sensor Setup
● GPS/IMU x 1
● LiDAR (64ch) x 1
● Grayscale Camera (1.4M) x 2
● Color Camera (1.4M) x 2
http://www.cvlibs.net/datasets/kitti/
Mobility Technologies Co., Ltd.
KITTI [2012]
7
Mobility Technologies Co., Ltd.
3D Object Detection
8
● 7,481 training images / point clouds
● 7,518 test images / point clouds
● 80,256 labeled objects
type        Car, Van, Truck, Pedestrian, Person_sitting, Cyclist, Tram, Misc, or DontCare
truncated   0 to 1, where truncated refers to the object leaving image boundaries
occluded    0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown
alpha       Observation angle of object, ranging [-pi..pi]
bbox        2D bounding box of object in the image
dimensions  3D object dimensions: height, width, length
location    3D object location x, y, z in camera coordinates
rotation_y  Rotation ry around Y-axis in camera coordinates [-pi..pi]
Annotations
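A minimal sketch of how these fields map onto one line of a KITTI label file; the 15-value-per-line layout follows the official devkit, the helper name is mine:

```python
def parse_kitti_label_line(line: str) -> dict:
    """Parse one object from a KITTI label_2 .txt file (15 values per line)."""
    v = line.split()
    return {
        "type": v[0],
        "truncated": float(v[1]),
        "occluded": int(v[2]),
        "alpha": float(v[3]),
        "bbox": [float(x) for x in v[4:8]],         # 2D box: left, top, right, bottom (pixels)
        "dimensions": [float(x) for x in v[8:11]],  # height, width, length (meters)
        "location": [float(x) for x in v[11:14]],   # x, y, z in camera coordinates (meters)
        "rotation_y": float(v[14]),                 # yaw around the camera Y-axis, [-pi, pi]
    }

# Example: objects = [parse_kitti_label_line(l) for l in open("000000.txt")]
```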
Mobility Technologies Co., Ltd.
License
9
Mobility Technologies Co., Ltd.
Variants of KITTI
10
SemanticKITTI Dataset provides
annotations that associate each LiDAR
point with one of 28 semantic classes in all
22 sequences of the KITTI Dataset
http://semantic-kitti.org/
Virtual KITTI contains 50 high-resolution
monocular videos (21,260 frames)
generated from five different virtual worlds
in urban settings under different imaging
and weather conditions
https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds/
Mobility Technologies Co., Ltd.
ApolloScape [2017]
11
Sensor Setup
● GPS/IMU x 1
● LiDAR x 2
● Color Camera (9.2M) x 2
http://apolloscape.auto/
Mobility Technologies Co., Ltd.
ApolloScape [2017]
12
Scene Parsing
3D Car Instance
Lane Segmentation
Mobility Technologies Co., Ltd.
ApolloScape [2017]
13
Self Localization Stereo
Mobility Technologies Co., Ltd.
3D Object Detection
14
● 53 min training sequences
● 50 min testing sequences
● 70K 3D fitted cars
type Small vehicle, Big vehicle, Pedestrian, Motorcyclist and
Bicyclist, Traffic cones, Others
dimensions 3D object dimensions: height, width, length
location 3D object location x,y,z in relative coordinate
heading Steering radian with respect to the direction of the object
Annotations
Mobility Technologies Co., Ltd.
■ To the extent that we authorize the Developer to use Datasets and subject to the terms of this
Agreement, the Developer is entitled to use the Datasets only (i) for Developer’s internal
purposes of non-commercial research or teaching and (ii) in accordance with the terms of this
Agreement.
License
15
http://apolloscape.auto/license.html
Mobility Technologies Co., Ltd.
nuScenes [2019]
16
Sensor Setup
● GPS/IMU x 1
● LiDAR (32ch) x 1
● RADAR x 5
● Color Camera (1.4M) x 6
https://www.nuscenes.org/
Mobility Technologies Co., Ltd.
Semantic Map
17
● Provide highly accurate human-annotated
semantic maps of the relevant areas
● 11 semantic classes
● Encourage the use of localization and
semantic maps as strong priors for all tasks
Mobility Technologies Co., Ltd.
3D Object Detection
18
● category
● attribute
● visibility
● instance
● sensor
● calibrated_sensor
● ego_pose
● log
● scene
● sample
● sample_data
● sample_annotation
● map
Number of annotations per category
Attributes distribution for selected categories
1.4M boxes in total
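A minimal sketch of how these relational tables are typically traversed with the official nuscenes-devkit (the dataroot path is a placeholder; the call pattern follows the devkit tutorial):

```python
from nuscenes.nuscenes import NuScenes

# Load the mini split; dataroot is a placeholder path.
nusc = NuScenes(version="v1.0-mini", dataroot="/data/sets/nuscenes", verbose=True)

scene = nusc.scene[0]                                       # one entry of the `scene` table
sample = nusc.get("sample", scene["first_sample_token"])    # keyframe with synced sensor data
ann = nusc.get("sample_annotation", sample["anns"][0])      # one 3D box annotation
print(ann["category_name"], ann["translation"], ann["size"], ann["rotation"])
```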
Mobility Technologies Co., Ltd.
License
19
Mobility Technologies Co., Ltd.
Argoverse [2019]
20
Sensor Setup
● GPS x 1
● LiDAR (32ch) x 2
● Color Camera (4.8M) x 2
● Color Camera (2M) x 7
https://www.argoverse.org/
Mobility Technologies Co., Ltd.
Argoverse Maps
21
Vector Map:
Lane-Level Geometry
Rasterized Map:
Ground Height
Rasterized Map:
Drivable Area
Mobility Technologies Co., Ltd.
3D Object Detection (3D Tracking)
22
● Collection of 113 log segments with
3D object tracking annotations
● These log segments vary in length
from 15 to 30 seconds and contain
a total of 11,052 tracks
● Each sequence includes
annotations for all objects within 5
meters of “drivable area” — the
area in which it is possible for a
vehicle to drive
Mobility Technologies Co., Ltd.
License
23
Mobility Technologies Co., Ltd.
Lyft Level 5 [2019]
24
Sensor Setup (BETA_V0)
● LiDAR (40ch) x 3
● WFOV Camera (1.2M) x 6
● Long-focal-length Camera (1.7M) x 1
Sensor Setup (BETA_++)
● LiDAR (64ch) x 1
● LiDAR (40ch) x 2
● WFOV Camera (2M) x 6
● Long-focal-length Camera (2M) x 1
https://level5.lyft.com/dataset/
Mobility Technologies Co., Ltd.
Semantic Map
25
Mobility Technologies Co., Ltd.
3D Object Detection (Same format as nuScenes)
26
● category
● attribute
● visibility
● instance
● sensor
● calibrated_sensor
● ego_pose
● log
● scene
● sample
● sample_data
● sample_annotation
● map
animal
bicycle
bus
car
emergency_vehicle
motorcycle
other_vehicle
pedestrian
truck
638K boxes in total
Mobility Technologies Co., Ltd.
License
27
Mobility Technologies Co., Ltd.
Audi Autonomous Driving Dataset (A2D2) [2020]
28
Sensor Setup
● GPS/IMU x 1
● LiDAR (16ch) x 5
● Color Camera (2.3M) x 6
https://www.a2d2.audi/a2d2/en.html
Mobility Technologies Co., Ltd.
Audi Autonomous Driving Dataset (A2D2) [2020]
29
Mobility Technologies Co., Ltd.
3D Object Detection
30
● All images have corresponding
LiDAR point clouds, of which
12,497 are annotated with 3D
bounding boxes within the field
of view of the front-center
camera
Mobility Technologies Co., Ltd.
License
31
Mobility Technologies Co., Ltd.
Comparison
32
? ? ?
? ? ?
These figures are based on Table 1 in https://arxiv.org/abs/1912.04838
Mobility Technologies Co., Ltd.
Comparison
33
Waymo Waymo Waymo
Waymo Waymo Waymo
These figures are based on Table 1 in https://arxiv.org/abs/1912.04838
Mobility Technologies Co., Ltd.
Waymo Open Dataset [2019]
34
Sensor Setup
● Mid-Range (~75m) LiDAR x 1
● Short-Range (~20m) LiDAR x 4
● Color Camera (2M) x 3
● Color Camera (1.6M) x 2
https://waymo.com/open/
Mobility Technologies Co., Ltd.
Data Volume
35
Train
798 segments
w/ labels
(757 GB)
Test
150 seg.
w/o labels
(192 GB)
Validation
202 seg.
w/ labels
(144 GB)
● Contains 1,150 segments, each spanning 20 seconds
● Additionally, segments from a new location (only a subset of which have labels) are provided for domain adaptation
Mobility Technologies Co., Ltd.
Data Format
36
Segment
  Frame
    context                 Shared information among all frames in the scene (e.g., calibration parameters, stats)
    timestamp_micros        Frame timestamp
    pose                    Vehicle pose
    images                  Camera images and metadata (e.g., pose, velocity, timestamp)
    lasers                  Range images
    laser_labels            3D box annotations
    projected_lidar_labels  Lidar labels (laser_labels) projected to camera images
    camera_labels           2D box annotations
    no_label_zones          Polygons that represent areas without labels (e.g., opposite side of a highway)
  Frame ...
● Each segment (20 sec) consists of ~200 frames (10 Hz)
● All the data related to a segment is stored in a single tfrecord and represented as protocol buffers
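A minimal sketch of reading one segment with the official waymo-open-dataset package, following the Colab tutorial linked two slides later (the file name is a placeholder):

```python
import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset

FILENAME = "segment-xxxx_with_camera_labels.tfrecord"  # placeholder path

dataset = tf.data.TFRecordDataset(FILENAME, compression_type="")
for data in dataset:
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    # Each frame carries the fields in the table above, e.g. the 3D box labels:
    for label in frame.laser_labels:
        box = label.box  # center_x/y/z, length/width/height, heading (vehicle frame)
        print(label.type, box.center_x, box.center_y, box.center_z, box.heading)
    break  # first frame only
```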
Mobility Technologies Co., Ltd.
Range Image
37
The point cloud of each LiDAR is encoded as a range image
1st return: range, intensity, elongation
2nd return: range, intensity, elongation
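To make the encoding concrete, here is a simplified sketch of mapping a range-image pixel back to a 3D point, assuming a uniform 360° azimuth sweep across columns and known per-row beam inclinations (the dataset stores the exact inclinations and a per-pixel pose; see the official tutorial for the exact conversion):

```python
import numpy as np

def range_image_to_points(range_image: np.ndarray, inclinations: np.ndarray) -> np.ndarray:
    """Convert an (H, W) range channel to (N, 3) points in the sensor frame.

    Simplified assumptions: azimuth spans [pi, -pi) uniformly across columns,
    inclinations[i] is the vertical angle of row i, and invalid pixels are <= 0.
    """
    h, w = range_image.shape
    azimuth = np.linspace(np.pi, -np.pi, w, endpoint=False)  # (W,)
    incl = inclinations.reshape(h, 1)                        # (H, 1)
    r = range_image
    x = r * np.cos(incl) * np.cos(azimuth)
    y = r * np.cos(incl) * np.sin(azimuth)
    z = r * np.sin(incl)
    valid = r > 0
    return np.stack([x[valid], y[valid], z[valid]], axis=-1)
```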
Mobility Technologies Co., Ltd.
API & Tutorial in colab
38
https://github.com/waymo-research/waymo-open-dataset
https://colab.research.google.com/github/waymo-research/waymo-open-dataset/blob/master/tutorial/tutorial.ipynb
Mobility Technologies Co., Ltd.
Data Visualization (LiDAR Point Cloud)
39
Mid-range LiDAR
Mobility Technologies Co., Ltd.
Data Visualization (LiDAR Point Cloud)
40
Mid-range LiDAR
Mobility Technologies Co., Ltd.
Data Visualization (LiDAR Point Cloud)
41
Mid-range LiDAR
Short-range LiDAR (front)
Mobility Technologies Co., Ltd.
Data Visualization (LiDAR Point Cloud)
42
Mid-range LiDAR
Short-range LiDAR (right)
Mobility Technologies Co., Ltd.
Data Visualization (LiDAR Point Cloud)
43
Mid-range LiDAR
Short-range LiDAR (rear)
Mobility Technologies Co., Ltd.
Data Visualization (LiDAR Point Cloud)
44
Mid-range LiDAR
Short-range LiDAR (left)
Mobility Technologies Co., Ltd.
Data Visualization (LiDAR Point Cloud)
45
Mid-range LiDAR
Short-range LiDARs (all)
Mobility Technologies Co., Ltd.
Data Visualization (Camera Images)
46
Front Left
1920x1080
Front
1920x1080
Front Right
1920x1080
Side Left
1920x886
Side Right
1920x886
Mobility Technologies Co., Ltd.
3D Object Detection
47
■ 3D LiDAR Labels
■ 3D 7-DOF bounding boxes in the vehicle frame with globally unique tracking IDs
■ vehicles, pedestrians, cyclists, signs
■ 2D Camera Labels
■ Not projections of the 3D labels
■ vehicles, pedestrians, cyclists
■ Tight-fitting, axis-aligned 2D bounding boxes with globally unique tracking IDs
              Vehicle  Pedestrian  Cyclist  Sign
3D objects       6.1M        2.8M      67K  3.2M
3D track IDs      60K         23K      620   23K
2D objects       7.7M        2.1M      63K     -
2D track IDs     164K         45K     1.3K     -
Labeled object and tracking ID counts
Mobility Technologies Co., Ltd.
2D Label Samples
48
Mobility Technologies Co., Ltd.
3D Label Samples
49
Mobility Technologies Co., Ltd.
LiDAR to Camera Projection
50
■ Camera and LiDAR data are well synchronized
■ LiDAR points can be projected onto the camera images with rolling-shutter compensation
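A generic sketch of the projection itself, without the rolling-shutter compensation: transform points into the camera frame with a 4x4 extrinsic matrix, then apply the 3x3 intrinsics (standard pinhole conventions, not the dataset's exact proto fields):

```python
import numpy as np

def project_lidar_to_image(points_xyz: np.ndarray, T_cam_from_lidar: np.ndarray,
                           K: np.ndarray) -> np.ndarray:
    """points_xyz: (N, 3) in the LiDAR frame.
    T_cam_from_lidar: (4, 4) extrinsic transform, K: (3, 3) camera intrinsics.
    Returns (N, 2) pixel coordinates for points in front of the camera (others -> NaN)."""
    pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])  # homogeneous coordinates
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]                     # points in the camera frame
    uv = np.full((len(points_xyz), 2), np.nan)
    front = cam[:, 2] > 0                                           # keep points with positive depth
    proj = (K @ cam[front].T).T
    uv[front] = proj[:, :2] / proj[:, 2:3]
    return uv
```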
Mobility Technologies Co., Ltd.
Challenges
51
Mobility Technologies Co., Ltd.
Evaluation Metrics for 3D Object Detection
52
https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html
P/R Curve
Average Precision with Heading (APH): each true positive is weighted by its heading accuracy
(Figure: ground-truth vs. predicted boxes)
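The weighting formula is shown as an image on the original slide; as a hedged reconstruction of the APH weighting (worth verifying against the Waymo paper), each true positive is weighted by a heading accuracy that is 1 for a perfect heading and decays linearly to 0 at a 180° error:

```python
import math

def heading_accuracy(theta_pred: float, theta_gt: float) -> float:
    """Assumed APH weight for a true positive: 1 at zero heading error, 0 at pi error.
    Angles in radians; the difference is wrapped to [0, pi]."""
    diff = abs(theta_pred - theta_gt) % (2 * math.pi)
    diff = min(diff, 2 * math.pi - diff)
    return 1.0 - diff / math.pi
```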
Mobility Technologies Co., Ltd.
To ensure the Dataset is only used for Non-Commercial Purposes, You agree
■ Not to distribute or publish any models trained on or refined using the Dataset,
or the weights or biases from such trained models
■ Not to use or deploy the Dataset, any models trained on or refined using the
Dataset, or the weights or biases from such trained models (i) in operation of a
vehicle or to assist in the operation of a vehicle, (ii) in any Production Systems,
or (iii) for any other primarily commercial purposes
License
53
https://waymo.com/open/terms/
Mobility Technologies Co., Ltd.
3D Object Detection
Algorithms
54
02
Mobility Technologies Co., Ltd.
■ Design a novel type of neural network that directly consumes point clouds and respects
the permutation invariance of points in the input
■ Provide a unified architecture for applications ranging from object classification and part
segmentation to scene semantic parsing
PointNet [C. Qi+, CVPR2017]
55
https://arxiv.org/abs/1612.00593
Mobility Technologies Co., Ltd.
PointNet Architecture
56
Mobility Technologies Co., Ltd.
PointNet Architecture
57
Predict an affine transformation
matrix with a mini-network and align
the input point set to achieve invariance
against geometric transformations
Mobility Technologies Co., Ltd.
PointNet Architecture
58
The same alignment approach
is also applied in feature space
Mobility Technologies Co., Ltd.
PointNet Architecture
59
Using max pooling as a
symmetric function, aggregate
unordered point features
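A minimal PyTorch sketch of this core idea (a shared per-point MLP implemented with 1x1 convolutions, followed by max pooling); the input/feature alignment T-Nets are omitted:

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Per-point shared MLP (1x1 convolutions) + max pooling -> global feature."""
    def __init__(self, num_classes: int = 40):
        super().__init__()
        self.mlp = nn.Sequential(            # shared across points, applied point-wise
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.head = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, num_classes))

    def forward(self, points):                # points: (B, 3, N); the order of the N points does not matter
        feat = self.mlp(points)               # (B, 1024, N) point-wise features
        global_feat = feat.max(dim=2).values  # symmetric max over points -> (B, 1024)
        return self.head(global_feat)

# x = torch.randn(2, 3, 1024); logits = TinyPointNet()(x)  # permutation-invariant in N
```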
Mobility Technologies Co., Ltd.
■ Divide a point cloud into 3D voxels and transform them into a unified feature representation
■ The descriptive volumetric representation is then fed to an RPN to generate detections
VoxelNet [Y. Zhou+, CVPR2018]
60
A voxel represents a value
on a regular grid in three-
dimensional space
https://en.wikipedia.org/wiki/Voxel
LiDAR ONLY
https://arxiv.org/abs/1711.06396
Mobility Technologies Co., Ltd.
Voxel Feature Encoding (VFE) Layer
61
● VFE enables inter-point interaction within
a voxel, by combining point-wise features
with a locally aggregated feature.
● Stacking multiple VFE layers allows
learning complex features for
characterizing local 3D shape information
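A hedged PyTorch sketch of a single VFE layer as described above: a point-wise fully connected layer, a per-voxel max-pooled (locally aggregated) feature, and concatenation of the two; voxel grouping and padding masks are omitted for brevity:

```python
import torch
import torch.nn as nn

class VFELayer(nn.Module):
    """Input:  (V, T, C_in)  -- V voxels, up to T points each, C_in features per point.
    Output: (V, T, C_out) -- point-wise features concatenated with the per-voxel max."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        assert c_out % 2 == 0
        self.fc = nn.Sequential(nn.Linear(c_in, c_out // 2), nn.ReLU())

    def forward(self, voxel_points):
        pointwise = self.fc(voxel_points)                        # (V, T, c_out/2)
        aggregated = pointwise.max(dim=1, keepdim=True).values   # locally aggregated feature (V, 1, c_out/2)
        aggregated = aggregated.expand_as(pointwise)
        return torch.cat([pointwise, aggregated], dim=-1)        # (V, T, c_out)

# Stacking VFE layers, then taking a final max over T, yields one feature vector per voxel.
```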
Mobility Technologies Co., Ltd.
Convolutional Middle Layers
62
● Each convolutional middle layer applies 3D
convolution, BN layer, and ReLU layer
sequentially
● Convolutional middle layers aggregate
voxel-wise features within a progressively
expanding receptive field, adding more
context to the shape description
Mobility Technologies Co., Ltd.
Region Proposal Network
63
● The first layer of each block downsamples the input feature map
● Then the output of every block is upsampled to a fixed size and
concatenated to construct the high resolution feature map
● Finally, this feature map is mapped to the desired learning targets
Mobility Technologies Co., Ltd.
Evaluation on KITTI
64
Performance comparison on KITTI validation set
Performance comparison on KITTI test set
Mobility Technologies Co., Ltd.
■ Apply sparse convolution to greatly increase the speed of training and inference
■ Introduce a novel angle loss regression approach to solve the problem of the large loss
generated when the angle prediction error is equal to π
SECOND (Sparsely Embedded CONvolutional Detection) [Y. Yan+, Sensors 2018]
65
LiDAR ONLY
https://pdfs.semanticscholar.org/5125/a16039cabc6320c908a4764f32596e018ad3.pdf
Mobility Technologies Co., Ltd.
Sparse Convolution Algorithm
66
■ Gather the necessary inputs to construct a dense matrix, perform GEMM, then scatter the data back
■ A GPU-based rule generation algorithm is proposed to construct the input–output index rule matrix
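A toy NumPy sketch of the gather → GEMM → scatter idea for a submanifold-style sparse convolution; the rule construction below is plain Python purely for illustration, whereas SECOND builds these rules on the GPU:

```python
import numpy as np

def naive_sparse_conv(coords, feats, weights, offsets):
    """coords: (N, 3) integer voxel coordinates of active sites; feats: (N, C_in);
    weights: (K, C_in, C_out), one matrix per kernel offset; offsets: (K, 3).
    Output features are computed only at the input's active sites."""
    index = {tuple(c): i for i, c in enumerate(coords)}
    out = np.zeros((len(coords), weights.shape[-1]))
    for k, off in enumerate(offsets):
        # "Rule" for this offset: (input index, output index) pairs between active sites.
        rule = [(index[tuple(np.asarray(c) + off)], i)
                for i, c in enumerate(coords) if tuple(np.asarray(c) + off) in index]
        if not rule:
            continue
        src, dst = map(list, zip(*rule))
        out[dst] += feats[src] @ weights[k]   # gather -> GEMM -> scatter-add
    return out
```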
Mobility Technologies Co., Ltd.
■ Directly predicting the radian offset suffers from an adversarial example problem between the
cases of 0 and π radians because they correspond to the same box but generate a large loss
when one is misidentified as the other
■ Solve this problem by introducing a new angle loss regression:
■ To address the issue that this loss treats boxes with opposite directions as being the same, a
simple direction classifier is added to the output of the RPN
Sine-Error Loss for Angle Regression
67
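A small PyTorch sketch of that sine-error loss (the regression target is sin(θp − θt), so errors of 0 and π both give zero loss, which is exactly why the direction classifier is needed); the direction-target definition below is a common choice, not necessarily the paper's exact one:

```python
import math
import torch
import torch.nn.functional as F

def angle_loss(theta_pred: torch.Tensor, theta_gt: torch.Tensor) -> torch.Tensor:
    """Sine-error regression loss: smooth L1 on sin(theta_pred - theta_gt).
    The loss vanishes both at 0 and at pi error, hence the extra direction classifier."""
    return F.smooth_l1_loss(torch.sin(theta_pred - theta_gt), torch.zeros_like(theta_gt))

def direction_target(theta_gt: torch.Tensor) -> torch.Tensor:
    """A common binary front/back label for the direction classifier head of the RPN."""
    return (torch.remainder(theta_gt, 2 * math.pi) > math.pi).long()
```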
Mobility Technologies Co., Ltd.
Evaluation on KITTI
68
Performance comparison on KITTI validation set
Performance comparison on KITTI test set
Mobility Technologies Co., Ltd.
PointPillars [A. Lang+, CVPR2019]
69
■ Propose an encoder that learns a representation of point clouds organized in vertical columns
(pillars) and generates a pseudo 2D image
■ The encoded features can be used with any standard 2D convolutional detection architecture
without computationally expensive 3D ConvNets
LiDAR ONLY
https://arxiv.org/abs/1812.05784
Mobility Technologies Co., Ltd.
Pointcloud to Pseudo-Image
70
The point cloud is discretized into an
evenly spaced grid in the x-y plane,
creating a set of pillars
Mobility Technologies Co., Ltd.
Pointcloud to Pseudo-Image
71
Create a dense tensor of size (D, P, N)
D: Dimension of augmented lidar point (=9)
P: Number of non-empty pillars per sample
N: Number of points per pillar
Mobility Technologies Co., Ltd.
Pointcloud to Pseudo-Image
72
Apply PointNet to generate a (C, P,
N) sized feature tensor, followed by a
max operation over the points to
create an output tensor of size (C, P)
Mobility Technologies Co., Ltd.
Pointcloud to Pseudo-Image
73
Features are scattered back to the original
pillar locations to create a pseudo-image of
size (C, H, W) where H and W indicate the
height and width of the canvas
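A short sketch of this final scatter step, assuming we already have the (C, P) pillar features and each pillar's integer (row, col) position on the canvas (tensor names are mine):

```python
import torch

def scatter_to_pseudo_image(pillar_features: torch.Tensor, pillar_coords: torch.Tensor,
                            H: int, W: int) -> torch.Tensor:
    """pillar_features: (C, P); pillar_coords: (P, 2) integer (row, col) of each pillar.
    Returns a (C, H, W) pseudo-image with zeros at empty pillar locations."""
    C, P = pillar_features.shape
    canvas = torch.zeros(C, H * W, dtype=pillar_features.dtype)
    flat_idx = pillar_coords[:, 0] * W + pillar_coords[:, 1]  # (P,) flattened grid indices
    canvas[:, flat_idx] = pillar_features
    return canvas.view(C, H, W)
```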
Mobility Technologies Co., Ltd.
Backbone
74
Top-down network produces
features at increasingly
small spatial resolution
Second network performs
upsampling and concatenation
of the top-down features
Mobility Technologies Co., Ltd.
Detection Head
75
Single Shot Detector (SSD) is
used with additional regression
targets (height and elevation)
Mobility Technologies Co., Ltd.
Evaluation on KITTI
76
Performance comparison on KITTI test set
Mobility Technologies Co., Ltd.
■ Implementation
■ The official PointPillars implementation is forked from SECOND's
implementation and is no longer maintained
■ Instead, SECOND's implementation now supports PointPillars
■ Format Conversion
■ SECOND's implementation only supports KITTI and nuScenes, so format
conversion is the fastest way to use the Waymo Open Dataset
■ Several converters can be found on GitHub
■ Waymo_Kitti_Adapter
■ waymo_kitti_converter
Let’s Try PointPillars on Waymo Open Dataset
77
Mobility Technologies Co., Ltd.
These results are for reference only, because only part of the training set was used and the hyperparameters were not
tuned for the Waymo Open Dataset at all
Vehicle Detection Results
78
Mobility Technologies Co., Ltd.
These results are for reference only, because only part of the training set was used and the hyperparameters were not
tuned for the Waymo Open Dataset at all
Vehicle Detection Results
79
Mobility Technologies Co., Ltd.
Results from Leaderboard on Waymo Open Dataset
80
https://waymo.com/open/challenges/3d-detection/#
Mobility Technologies Co., Ltd.
■ First generate 2D object region proposals in the RGB image using a CNN, then extrude each 2D region
to a 3D viewing frustum to get a point cloud
■ PointNet then predicts a 3D bounding box for the object from the points in the frustum
Frustum PointNets [C. Qi+, CVPR2018]
81
LiDAR + Camera
https://arxiv.org/abs/1711.08488
Mobility Technologies Co., Ltd.
Frustum Proposal
82
● Use an object detector on the RGB image to predict a 2D bounding box
and lift it to a frustum using the known camera matrix
● Collect all points within the frustum to form a frustum point cloud
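A hedged sketch of forming the frustum point cloud: project all LiDAR points into the image with the calibration (e.g., with a pinhole projection like the one sketched earlier) and keep the points whose projections fall inside the detected 2D box; function and variable names are mine:

```python
import numpy as np

def frustum_points(points_xyz: np.ndarray, uv: np.ndarray, box2d) -> np.ndarray:
    """points_xyz: (N, 3) LiDAR points; uv: (N, 2) their image projections (NaN if behind the camera);
    box2d: (x1, y1, x2, y2). Returns the subset of 3D points inside the 2D box, i.e. the frustum."""
    x1, y1, x2, y2 = box2d
    inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    inside &= ~np.isnan(uv[:, 0])
    return points_xyz[inside]
```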
Mobility Technologies Co., Ltd.
3D Instance Segmentation
83
Object instance is segmented by
binary classification of each point
using PointNet
Mobility Technologies Co., Ltd.
Amodal 3D Box Estimation
84
Estimate the object’s amodal
oriented 3D bounding box by
using a box regression PointNet
Estimate the true center of the
complete object and then
transform the coordinate such
that the predicted center
becomes the origin
Mobility Technologies Co., Ltd.
Evaluation on KITTI
85
Performance comparison on KITTI validation set
Performance comparison on KITTI test set
Mobility Technologies Co., Ltd.
PV-RCNN [S. Shi+, CVPR2020]
86
https://arxiv.org/abs/1912.13192
■ Voxel-based operation efficiently encodes multi-scale feature representations and can
generate high-quality 3D proposals, while the PointNet-based set abstraction operation
preserves accurate location information with flexible receptive fields
■ Integrate the two operations via the voxel-to-keypoint 3D scene encoding and the keypoint-to-
grid RoI feature abstraction
LiDAR ONLY
Mobility Technologies Co., Ltd.
3D Voxel CNN for Feature Encoding and Proposal Generation
87
Input points are first divided into
voxels and gradually converted into
feature volumes by a 3D sparse CNN
By converting the 3D feature volumes
into 2D bird's-eye-view feature maps,
high-quality 3D proposals are
generated following anchor-based
approaches
Mobility Technologies Co., Ltd.
Voxel-to-keypoint Scene Encoding via Voxel Set Abstraction
88
A small number of
keypoints is sampled
from the point cloud
A PointNet-based set abstraction module encodes
the multi-scale semantic features from the 3D
CNN feature volumes to the keypoints
Check whether each keypoint is inside or
outside a ground-truth 3D box,
and re-weight the keypoint features accordingly
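The keypoints are selected with farthest point sampling (FPS) so that they cover the whole scene; a small NumPy sketch of that sampling step (the simple quadratic-time version):

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Greedy FPS: repeatedly pick the point farthest from the already selected set.
    points: (N, 3); returns the indices of the k sampled keypoints."""
    n = len(points)
    selected = np.zeros(k, dtype=int)
    dist = np.full(n, np.inf)
    selected[0] = 0                                  # start from an arbitrary point
    for i in range(1, k):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, d)                   # distance to the nearest selected point
        selected[i] = int(np.argmax(dist))
    return selected
```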
Mobility Technologies Co., Ltd.
Keypoint-to-grid RoI Feature Abstraction for Proposal Refinement
89
RoI-grid pooling module
aggregates the keypoint
features to the RoI-grid
points with multiple
receptive fields using
PointNet
Mobility Technologies Co., Ltd.
Evaluation on KITTI / Waymo Open Dataset
90
Performance comparison on KITTI test set
Performance comparison on Waymo OD validation set
Mobility Technologies Co., Ltd.
Do We Even Need Cameras?
91
3D vehicle detection performance on KITTI test set (moderate)
LiDAR only
LiDAR + Camera
Mobility Technologies Co., Ltd.
■ Autonomous Driving Datasets
■ KITTI is the most famous and most frequently used dataset for vehicle-related research; however, it is
limited in size, and performance on it is saturating (> 80% AP)
■ More recent datasets provide much larger multi-modal sensor data and annotations, and some of
them also provide semantic maps
■ The Waymo Open Dataset is one of the largest and most diverse datasets ever released, and provides
high-quality (meta)data and annotations (but unfortunately, it is NOT commercial-friendly at all)
■ 3D Object Detection Algorithms
■ Recent 3D object detection algorithms re-purpose camera-based detection architectures, which have
been greatly advanced by CNNs and mature techniques such as region proposals
■ The two main streams are grid-based methods and point-based methods; the key component is a
2D/3D CNN in the former and PointNet in the latter
■ The current state of the art is dominated by LiDAR-only methods, while LiDAR-camera fusion methods lag behind
Summary
92
·
Mobility Technologies Co., Ltd.

More Related Content

What's hot

Depth Fusion from RGB and Depth Sensors by Deep Learning
Depth Fusion from RGB and Depth Sensors by Deep LearningDepth Fusion from RGB and Depth Sensors by Deep Learning
Depth Fusion from RGB and Depth Sensors by Deep LearningYu Huang
 
Deep VO and SLAM
Deep VO and SLAMDeep VO and SLAM
Deep VO and SLAMYu Huang
 
NDGeospatialSummit2019 - Drone Based Lidar and the Future of Survey/GIS
NDGeospatialSummit2019 - Drone Based Lidar and the Future of Survey/GISNDGeospatialSummit2019 - Drone Based Lidar and the Future of Survey/GIS
NDGeospatialSummit2019 - Drone Based Lidar and the Future of Survey/GISNorth Dakota GIS Hub
 
Pose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningPose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningYu Huang
 
Neural Scene Representation & Rendering: Introduction to Novel View Synthesis
Neural Scene Representation & Rendering: Introduction to Novel View SynthesisNeural Scene Representation & Rendering: Introduction to Novel View Synthesis
Neural Scene Representation & Rendering: Introduction to Novel View SynthesisVincent Sitzmann
 
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...Edge AI and Vision Alliance
 
SfM Learner系単眼深度推定手法について
SfM Learner系単眼深度推定手法についてSfM Learner系単眼深度推定手法について
SfM Learner系単眼深度推定手法についてRyutaro Yamauchi
 
Trend of 3D object detections
Trend of 3D object detectionsTrend of 3D object detections
Trend of 3D object detectionsEiji Sekiya
 
Point Cloud and its applications
Point Cloud and its applicationsPoint Cloud and its applications
Point Cloud and its applicationsLeonis Wong
 
SuperGlue; Learning Feature Matching with Graph Neural Networks (CVPR'20)
SuperGlue;Learning Feature Matching with Graph Neural Networks (CVPR'20)SuperGlue;Learning Feature Matching with Graph Neural Networks (CVPR'20)
SuperGlue; Learning Feature Matching with Graph Neural Networks (CVPR'20)Yusuke Uchida
 
SSII2019TS: 実践カメラキャリブレーション ~カメラを用いた実世界計測の基礎と応用~
SSII2019TS: 実践カメラキャリブレーション ~カメラを用いた実世界計測の基礎と応用~SSII2019TS: 実践カメラキャリブレーション ~カメラを用いた実世界計測の基礎と応用~
SSII2019TS: 実践カメラキャリブレーション ~カメラを用いた実世界計測の基礎と応用~SSII
 
20160724_cv_sfm_revisited
20160724_cv_sfm_revisited20160724_cv_sfm_revisited
20160724_cv_sfm_revisitedKyohei Unno
 
[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals
[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals
[DL輪読会]ViNG: Learning Open-World Navigation with Visual GoalsDeep Learning JP
 
Computer Vision – From traditional approaches to deep neural networks
Computer Vision – From traditional approaches to deep neural networksComputer Vision – From traditional approaches to deep neural networks
Computer Vision – From traditional approaches to deep neural networksinovex GmbH
 
Deformable Part Modelとその発展
Deformable Part Modelとその発展Deformable Part Modelとその発展
Deformable Part Modelとその発展Takao Yamanaka
 
Visual Object Tracking: review
Visual Object Tracking: reviewVisual Object Tracking: review
Visual Object Tracking: reviewDmytro Mishkin
 
SLAM入門 第2章 SLAMの基礎
SLAM入門 第2章 SLAMの基礎SLAM入門 第2章 SLAMの基礎
SLAM入門 第2章 SLAMの基礎yohei okawa
 

What's hot (20)

Depth Fusion from RGB and Depth Sensors by Deep Learning
Depth Fusion from RGB and Depth Sensors by Deep LearningDepth Fusion from RGB and Depth Sensors by Deep Learning
Depth Fusion from RGB and Depth Sensors by Deep Learning
 
Deep VO and SLAM
Deep VO and SLAMDeep VO and SLAM
Deep VO and SLAM
 
NDGeospatialSummit2019 - Drone Based Lidar and the Future of Survey/GIS
NDGeospatialSummit2019 - Drone Based Lidar and the Future of Survey/GISNDGeospatialSummit2019 - Drone Based Lidar and the Future of Survey/GIS
NDGeospatialSummit2019 - Drone Based Lidar and the Future of Survey/GIS
 
Pose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningPose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learning
 
Neural Scene Representation & Rendering: Introduction to Novel View Synthesis
Neural Scene Representation & Rendering: Introduction to Novel View SynthesisNeural Scene Representation & Rendering: Introduction to Novel View Synthesis
Neural Scene Representation & Rendering: Introduction to Novel View Synthesis
 
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
 
SfM Learner系単眼深度推定手法について
SfM Learner系単眼深度推定手法についてSfM Learner系単眼深度推定手法について
SfM Learner系単眼深度推定手法について
 
Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)
 
Trend of 3D object detections
Trend of 3D object detectionsTrend of 3D object detections
Trend of 3D object detections
 
Point Cloud and its applications
Point Cloud and its applicationsPoint Cloud and its applications
Point Cloud and its applications
 
SuperGlue; Learning Feature Matching with Graph Neural Networks (CVPR'20)
SuperGlue;Learning Feature Matching with Graph Neural Networks (CVPR'20)SuperGlue;Learning Feature Matching with Graph Neural Networks (CVPR'20)
SuperGlue; Learning Feature Matching with Graph Neural Networks (CVPR'20)
 
SSII2019TS: 実践カメラキャリブレーション ~カメラを用いた実世界計測の基礎と応用~
SSII2019TS: 実践カメラキャリブレーション ~カメラを用いた実世界計測の基礎と応用~SSII2019TS: 実践カメラキャリブレーション ~カメラを用いた実世界計測の基礎と応用~
SSII2019TS: 実践カメラキャリブレーション ~カメラを用いた実世界計測の基礎と応用~
 
20160724_cv_sfm_revisited
20160724_cv_sfm_revisited20160724_cv_sfm_revisited
20160724_cv_sfm_revisited
 
[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals
[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals
[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals
 
Computer Vision – From traditional approaches to deep neural networks
Computer Vision – From traditional approaches to deep neural networksComputer Vision – From traditional approaches to deep neural networks
Computer Vision – From traditional approaches to deep neural networks
 
Deformable Part Modelとその発展
Deformable Part Modelとその発展Deformable Part Modelとその発展
Deformable Part Modelとその発展
 
Visual Object Tracking: review
Visual Object Tracking: reviewVisual Object Tracking: review
Visual Object Tracking: review
 
=SLAM ppt.pdf
=SLAM ppt.pdf=SLAM ppt.pdf
=SLAM ppt.pdf
 
Visual slam
Visual slamVisual slam
Visual slam
 
SLAM入門 第2章 SLAMの基礎
SLAM入門 第2章 SLAMの基礎SLAM入門 第2章 SLAMの基礎
SLAM入門 第2章 SLAMの基礎
 

Similar to 3D Perception for Autonomous Driving - Datasets and Algorithms -

fyp presentation of group 43011 final.pptx
fyp presentation of group 43011 final.pptxfyp presentation of group 43011 final.pptx
fyp presentation of group 43011 final.pptxIIEE - NEDUET
 
License Plate Recognition
License Plate RecognitionLicense Plate Recognition
License Plate RecognitionAmr Rashed
 
Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...
Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...
Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...IJEEE
 
IRJET - Floor Cleaning Robot with Vision
IRJET - Floor Cleaning Robot with VisionIRJET - Floor Cleaning Robot with Vision
IRJET - Floor Cleaning Robot with VisionIRJET Journal
 
“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group
“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group
“3D Sensing: Market and Industry Update,” a Presentation from the Yole GroupEdge AI and Vision Alliance
 
IRJET - Vehicle Signal Breaking Alert System
IRJET - Vehicle Signal Breaking Alert SystemIRJET - Vehicle Signal Breaking Alert System
IRJET - Vehicle Signal Breaking Alert SystemIRJET Journal
 
IRJET- Proposed Design for 3D Map Generation using UAV
IRJET- Proposed Design for 3D Map Generation using UAVIRJET- Proposed Design for 3D Map Generation using UAV
IRJET- Proposed Design for 3D Map Generation using UAVIRJET Journal
 
What we've done so far with mago3D, an open source based 'Digital Twin' platf...
What we've done so far with mago3D, an open source based 'Digital Twin' platf...What we've done so far with mago3D, an open source based 'Digital Twin' platf...
What we've done so far with mago3D, an open source based 'Digital Twin' platf...SANGHEE SHIN
 
DESIGN & DEVELOPMENT OF UNMANNED GROUND VEHICLE
DESIGN & DEVELOPMENT OF UNMANNED GROUND VEHICLEDESIGN & DEVELOPMENT OF UNMANNED GROUND VEHICLE
DESIGN & DEVELOPMENT OF UNMANNED GROUND VEHICLEIRJET Journal
 
mago3D, A Brand-New Web Based Open Source GeoBIM Platform
mago3D, A Brand-New Web Based Open Source GeoBIM Platformmago3D, A Brand-New Web Based Open Source GeoBIM Platform
mago3D, A Brand-New Web Based Open Source GeoBIM PlatformSANGHEE SHIN
 
라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션
라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션
라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션Impyeong Lee
 
Worknet smart pole overview
Worknet smart pole overviewWorknet smart pole overview
Worknet smart pole overviewMike Maziarka
 
Presentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking ProjectPresentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking ProjectPrathamesh Joshi
 
COMPARATIVE STUDY ON AUTOMATED NUMBER PLATE EXTRACTION USING OPEN CV AND MATLAB
COMPARATIVE STUDY ON AUTOMATED NUMBER PLATE EXTRACTION USING OPEN CV AND MATLABCOMPARATIVE STUDY ON AUTOMATED NUMBER PLATE EXTRACTION USING OPEN CV AND MATLAB
COMPARATIVE STUDY ON AUTOMATED NUMBER PLATE EXTRACTION USING OPEN CV AND MATLABIRJET Journal
 
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...AugmentedWorldExpo
 
IRJET - Detection of Landmine using Robotic Vehicle
IRJET -  	  Detection of Landmine using Robotic VehicleIRJET -  	  Detection of Landmine using Robotic Vehicle
IRJET - Detection of Landmine using Robotic VehicleIRJET Journal
 
Introduction to mago3D: A Web Based Open Source GeoBIM Platform
Introduction to mago3D: A Web Based Open Source GeoBIM PlatformIntroduction to mago3D: A Web Based Open Source GeoBIM Platform
Introduction to mago3D: A Web Based Open Source GeoBIM PlatformSANGHEE SHIN
 
IRJET - Automated Gate for Vehicular Entry using Image Processing
IRJET - Automated Gate for Vehicular Entry using Image ProcessingIRJET - Automated Gate for Vehicular Entry using Image Processing
IRJET - Automated Gate for Vehicular Entry using Image ProcessingIRJET Journal
 
Introduction to mago3D, an Open Source Based Digital Twin Platform
Introduction to mago3D, an Open Source Based Digital Twin PlatformIntroduction to mago3D, an Open Source Based Digital Twin Platform
Introduction to mago3D, an Open Source Based Digital Twin PlatformSANGHEE SHIN
 
Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료
Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료
Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료BJ Jang
 

Similar to 3D Perception for Autonomous Driving - Datasets and Algorithms - (20)

fyp presentation of group 43011 final.pptx
fyp presentation of group 43011 final.pptxfyp presentation of group 43011 final.pptx
fyp presentation of group 43011 final.pptx
 
License Plate Recognition
License Plate RecognitionLicense Plate Recognition
License Plate Recognition
 
Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...
Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...
Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...
 
IRJET - Floor Cleaning Robot with Vision
IRJET - Floor Cleaning Robot with VisionIRJET - Floor Cleaning Robot with Vision
IRJET - Floor Cleaning Robot with Vision
 
“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group
“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group
“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group
 
IRJET - Vehicle Signal Breaking Alert System
IRJET - Vehicle Signal Breaking Alert SystemIRJET - Vehicle Signal Breaking Alert System
IRJET - Vehicle Signal Breaking Alert System
 
IRJET- Proposed Design for 3D Map Generation using UAV
IRJET- Proposed Design for 3D Map Generation using UAVIRJET- Proposed Design for 3D Map Generation using UAV
IRJET- Proposed Design for 3D Map Generation using UAV
 
What we've done so far with mago3D, an open source based 'Digital Twin' platf...
What we've done so far with mago3D, an open source based 'Digital Twin' platf...What we've done so far with mago3D, an open source based 'Digital Twin' platf...
What we've done so far with mago3D, an open source based 'Digital Twin' platf...
 
DESIGN & DEVELOPMENT OF UNMANNED GROUND VEHICLE
DESIGN & DEVELOPMENT OF UNMANNED GROUND VEHICLEDESIGN & DEVELOPMENT OF UNMANNED GROUND VEHICLE
DESIGN & DEVELOPMENT OF UNMANNED GROUND VEHICLE
 
mago3D, A Brand-New Web Based Open Source GeoBIM Platform
mago3D, A Brand-New Web Based Open Source GeoBIM Platformmago3D, A Brand-New Web Based Open Source GeoBIM Platform
mago3D, A Brand-New Web Based Open Source GeoBIM Platform
 
라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션
라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션
라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션
 
Worknet smart pole overview
Worknet smart pole overviewWorknet smart pole overview
Worknet smart pole overview
 
Presentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking ProjectPresentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking Project
 
COMPARATIVE STUDY ON AUTOMATED NUMBER PLATE EXTRACTION USING OPEN CV AND MATLAB
COMPARATIVE STUDY ON AUTOMATED NUMBER PLATE EXTRACTION USING OPEN CV AND MATLABCOMPARATIVE STUDY ON AUTOMATED NUMBER PLATE EXTRACTION USING OPEN CV AND MATLAB
COMPARATIVE STUDY ON AUTOMATED NUMBER PLATE EXTRACTION USING OPEN CV AND MATLAB
 
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...
 
IRJET - Detection of Landmine using Robotic Vehicle
IRJET -  	  Detection of Landmine using Robotic VehicleIRJET -  	  Detection of Landmine using Robotic Vehicle
IRJET - Detection of Landmine using Robotic Vehicle
 
Introduction to mago3D: A Web Based Open Source GeoBIM Platform
Introduction to mago3D: A Web Based Open Source GeoBIM PlatformIntroduction to mago3D: A Web Based Open Source GeoBIM Platform
Introduction to mago3D: A Web Based Open Source GeoBIM Platform
 
IRJET - Automated Gate for Vehicular Entry using Image Processing
IRJET - Automated Gate for Vehicular Entry using Image ProcessingIRJET - Automated Gate for Vehicular Entry using Image Processing
IRJET - Automated Gate for Vehicular Entry using Image Processing
 
Introduction to mago3D, an Open Source Based Digital Twin Platform
Introduction to mago3D, an Open Source Based Digital Twin PlatformIntroduction to mago3D, an Open Source Based Digital Twin Platform
Introduction to mago3D, an Open Source Based Digital Twin Platform
 
Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료
Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료
Mago3D Barcelona ICGC(카탈루니아 지형 및 지질연구소) 발표자료
 

More from Kazuyuki Miyazawa

VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Comple...
VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Comple...VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Comple...
VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Comple...Kazuyuki Miyazawa
 
Teslaにおけるコンピュータビジョン技術の調査 (2)
Teslaにおけるコンピュータビジョン技術の調査 (2)Teslaにおけるコンピュータビジョン技術の調査 (2)
Teslaにおけるコンピュータビジョン技術の調査 (2)Kazuyuki Miyazawa
 
EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monoc...
EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monoc...EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monoc...
EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monoc...Kazuyuki Miyazawa
 
Teslaにおけるコンピュータビジョン技術の調査
Teslaにおけるコンピュータビジョン技術の調査Teslaにおけるコンピュータビジョン技術の調査
Teslaにおけるコンピュータビジョン技術の調査Kazuyuki Miyazawa
 
ドラレコ + CV = 地図@Mobility Technologies
ドラレコ + CV = 地図@Mobility Technologiesドラレコ + CV = 地図@Mobility Technologies
ドラレコ + CV = 地図@Mobility TechnologiesKazuyuki Miyazawa
 
MLP-Mixer: An all-MLP Architecture for Vision
MLP-Mixer: An all-MLP Architecture for VisionMLP-Mixer: An all-MLP Architecture for Vision
MLP-Mixer: An all-MLP Architecture for VisionKazuyuki Miyazawa
 
CV分野での最近の脱○○系3選
CV分野での最近の脱○○系3選CV分野での最近の脱○○系3選
CV分野での最近の脱○○系3選Kazuyuki Miyazawa
 
kaggle NFL 1st and Future - Impact Detection
kaggle NFL 1st and Future - Impact Detectionkaggle NFL 1st and Future - Impact Detection
kaggle NFL 1st and Future - Impact DetectionKazuyuki Miyazawa
 
[CVPR2020読み会@CV勉強会] 3D Packing for Self-Supervised Monocular Depth Estimation
[CVPR2020読み会@CV勉強会] 3D Packing for Self-Supervised Monocular Depth Estimation[CVPR2020読み会@CV勉強会] 3D Packing for Self-Supervised Monocular Depth Estimation
[CVPR2020読み会@CV勉強会] 3D Packing for Self-Supervised Monocular Depth EstimationKazuyuki Miyazawa
 
How Much Position Information Do Convolutional Neural Networks Encode?
How Much Position Information Do Convolutional Neural Networks Encode?How Much Position Information Do Convolutional Neural Networks Encode?
How Much Position Information Do Convolutional Neural Networks Encode?Kazuyuki Miyazawa
 
Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unk...
Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unk...Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unk...
Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unk...Kazuyuki Miyazawa
 
Devil is in the Edges: Learning Semantic Boundaries from Noisy Annotations
Devil is in the Edges: Learning Semantic Boundaries from Noisy AnnotationsDevil is in the Edges: Learning Semantic Boundaries from Noisy Annotations
Devil is in the Edges: Learning Semantic Boundaries from Noisy AnnotationsKazuyuki Miyazawa
 

More from Kazuyuki Miyazawa (14)

VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Comple...
VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Comple...VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Comple...
VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Comple...
 
Teslaにおけるコンピュータビジョン技術の調査 (2)
Teslaにおけるコンピュータビジョン技術の調査 (2)Teslaにおけるコンピュータビジョン技術の調査 (2)
Teslaにおけるコンピュータビジョン技術の調査 (2)
 
EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monoc...
EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monoc...EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monoc...
EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monoc...
 
Data-Centric AIの紹介
Data-Centric AIの紹介Data-Centric AIの紹介
Data-Centric AIの紹介
 
Teslaにおけるコンピュータビジョン技術の調査
Teslaにおけるコンピュータビジョン技術の調査Teslaにおけるコンピュータビジョン技術の調査
Teslaにおけるコンピュータビジョン技術の調査
 
ドラレコ + CV = 地図@Mobility Technologies
ドラレコ + CV = 地図@Mobility Technologiesドラレコ + CV = 地図@Mobility Technologies
ドラレコ + CV = 地図@Mobility Technologies
 
MLP-Mixer: An all-MLP Architecture for Vision
MLP-Mixer: An all-MLP Architecture for VisionMLP-Mixer: An all-MLP Architecture for Vision
MLP-Mixer: An all-MLP Architecture for Vision
 
CV分野での最近の脱○○系3選
CV分野での最近の脱○○系3選CV分野での最近の脱○○系3選
CV分野での最近の脱○○系3選
 
kaggle NFL 1st and Future - Impact Detection
kaggle NFL 1st and Future - Impact Detectionkaggle NFL 1st and Future - Impact Detection
kaggle NFL 1st and Future - Impact Detection
 
[CVPR2020読み会@CV勉強会] 3D Packing for Self-Supervised Monocular Depth Estimation
[CVPR2020読み会@CV勉強会] 3D Packing for Self-Supervised Monocular Depth Estimation[CVPR2020読み会@CV勉強会] 3D Packing for Self-Supervised Monocular Depth Estimation
[CVPR2020読み会@CV勉強会] 3D Packing for Self-Supervised Monocular Depth Estimation
 
How Much Position Information Do Convolutional Neural Networks Encode?
How Much Position Information Do Convolutional Neural Networks Encode?How Much Position Information Do Convolutional Neural Networks Encode?
How Much Position Information Do Convolutional Neural Networks Encode?
 
Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unk...
Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unk...Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unk...
Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unk...
 
SIGGRAPH 2019 Report
SIGGRAPH 2019 ReportSIGGRAPH 2019 Report
SIGGRAPH 2019 Report
 
Devil is in the Edges: Learning Semantic Boundaries from Noisy Annotations
Devil is in the Edges: Learning Semantic Boundaries from Noisy AnnotationsDevil is in the Edges: Learning Semantic Boundaries from Noisy Annotations
Devil is in the Edges: Learning Semantic Boundaries from Noisy Annotations
 

Recently uploaded

Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...ronahami
 
Post office management system project ..pdf
Post office management system project ..pdfPost office management system project ..pdf
Post office management system project ..pdfKamal Acharya
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Ramkumar k
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...ssuserdfc773
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessorAshwiniTodkar4
 
Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)ChandrakantDivate1
 
Memory Interfacing of 8086 with DMA 8257
Memory Interfacing of 8086 with DMA 8257Memory Interfacing of 8086 with DMA 8257
Memory Interfacing of 8086 with DMA 8257subhasishdas79
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfsumitt6_25730773
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
Path loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelPath loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelDrAjayKumarYadav4
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdfKamal Acharya
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"mphochane1998
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...ppkakm
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...Amil baba
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesChandrakantDivate1
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network DevicesChandrakantDivate1
 

Recently uploaded (20)

Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
 
Post office management system project ..pdf
Post office management system project ..pdfPost office management system project ..pdf
Post office management system project ..pdf
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor
 
Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)
 
Memory Interfacing of 8086 with DMA 8257
Memory Interfacing of 8086 with DMA 8257Memory Interfacing of 8086 with DMA 8257
Memory Interfacing of 8086 with DMA 8257
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Path loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelPath loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata Model
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To Curves
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 

3D Perception for Autonomous Driving - Datasets and Algorithms -

  • 1. Mobility Technologies Co., Ltd. 3D Perception for Autonomous Driving - Datasets and Algorithms - Kazuyuki MIyazawa AI R D Group 2, AI System Dept. Mobility Technologies Co., Ltd.
  • 2. Mobility Technologies Co., Ltd. Who am I? 2 @kzykmyzw Kazuyuki Miyazawa Group Leader AI R D Group 2 AI System Dept. Mobility Technologies Co., Ltd. Past Work Experience April 2019 - March 2020 AI Research Engineer@DeNA Co., Ltd. April 2010 - March 2019 Research Scientist@Mitsubishi Electric Corp. Education PhD in Information Science@Tohoku University
  • 3. Mobility Technologies Co., Ltd.3 1 Autonomous Driving Datasets Agenda 2 3D Object Detection Algorithms
  • 4. Mobility Technologies Co., Ltd. 3D Object Detection: Motivation ■ 2D bounding boxes are not sufficient ■ Lack of 3D pose, Occlusion information, and 3D location Preliminary (Today’s Main Topic) 4 2D Object Detection 3D Object Detection http://www.cs.toronto.edu/~byang/
  • 5. Mobility Technologies Co., Ltd. Autonomous Driving Datasets 5 01
  • 6. Mobility Technologies Co., Ltd. KITTI [2012] 6 Sensor Setup ● GPS/IMU x 1 ● LiDAR (64ch) x 1 ● Grayscale Camera (1.4M) x 2 ● Color Camera (1.4M) x 2 http://www.cvlibs.net/datasets/kitti/
  • 7. Mobility Technologies Co., Ltd. KITTI [2012] 7
  • 8. Mobility Technologies Co., Ltd. 3D Object Detection 8 ● 7,481 training images / point clouds ● 7,518 test images / point clouds ● 80,256 labeled objects type Car, Van, Truck, Pedestrian, Person_sitting, Cyclist, Tram, Misc or DontCare truncated 0 to 1, where truncated refers to the object leaving image boundaries occuluded 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown alpha Observation angle of object, ranging [-pi..pi] bbox 2D bounding box of object in the image dimensions 3D object dimensions: height, width, length location 3D object location x,y,z in camera coordinate rotation_y Rotation ry around Y-axis in camera coordinates [-pi..pi] Annotations
  • 9. Mobility Technologies Co., Ltd. License 9
  • 10. Mobility Technologies Co., Ltd. Variants of KITTI 10 SemanticKITTI Dataset provides annotations that associate each LiDAR point with one of 28 semantic classes in all 22 sequences of the KITTI Dataset http://semantic-kitti.org/ Virtual KITTI contains 50 high-resolution monocular videos (21,260 frames) generated from five different virtual worlds in urban settings under different imaging and weather conditions https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds/
  • 11. Mobility Technologies Co., Ltd. ApolloScape [2017] 11 Sensor Setup ● GPS/IMU x 1 ● LiDAR x 2 ● Color Camera (9.2M) x 2 http://apolloscape.auto/
  • 12. Mobility Technologies Co., Ltd. ApolloScape [2017] 12 Scene Parsing 3D Car Instance Lane Segmentation
  • 13. Mobility Technologies Co., Ltd. ApolloScape [2017] 13 Self Localization Stereo
  • 14. Mobility Technologies Co., Ltd. 3D Object Detection 14 ● 53 min training sequences ● 50 min testing sequences ● 70K 3D fitted cars type Small vehicle, Big vehicle, Pedestrian, Motorcyclist and Bicyclist, Traffic cones, Others dimensions 3D object dimensions: height, width, length location 3D object location x,y,z in relative coordinate heading Steering radian with respect to the direction of the object Annotations
  • 15. Mobility Technologies Co., Ltd. ■ To the extent that we authorize the Developer to use Datasets and subject to the terms of this Agreement, the Developer is entitled to use the Datasets only (i) for Developer’s internal purposes of non-commercial research or teaching and (ii) in accordance with the terms of this Agreement. License 15 http://apolloscape.auto/license.html
  • 16. Mobility Technologies Co., Ltd. nuScenes [2019] 16 Sensor Setup ● GPS/IMU x 1 ● LiDAR (32ch) x 1 ● RADAR x 5 ● Color Camera (1.4M) x 3 https://www.nuscenes.org/
  • 17. Mobility Technologies Co., Ltd. Semantic Map 17 ● Provide highly accurate human-annotated semantic maps of the relevant areas ● 11 semantic classes ● Encourage the use of localization and semantic maps as strong priors for all tasks
  • 18. Mobility Technologies Co., Ltd. 3D Object Detection 18 ● category ● attribute ● visibility ● instance ● sensor ● calibrated_sensor ● ego_pose ● log ● scene ● sample ● sample_data ● sample_annotation ● map Number of annotations per category Attributes distribution for selected categories 1.4M boxes in total
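The record types listed above are most easily explored through the official nuscenes-devkit. Below is a minimal sketch of how the relational schema is typically traversed; the dataroot path and the v1.0-mini split are assumptions for illustration, not part of the slide.

```python
# Minimal sketch of browsing nuScenes annotations with the official nuscenes-devkit
# (pip install nuscenes-devkit). The dataroot below is a placeholder; point it at a
# local copy of the v1.0-mini or v1.0-trainval split.
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=True)

# A scene is a ~20 s log; each sample is a keyframe (2 Hz) with synchronized sensor data.
scene = nusc.scene[0]
sample = nusc.get('sample', scene['first_sample_token'])

# Each sample_annotation is one 3D box (translation, size, rotation, category, attributes).
for ann_token in sample['anns'][:5]:
    ann = nusc.get('sample_annotation', ann_token)
    print(ann['category_name'], ann['translation'], ann['size'], ann['rotation'])
```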
  • 19. Mobility Technologies Co., Ltd. License 19
  • 20. Mobility Technologies Co., Ltd. Argoverse [2019] 20 Sensor Setup ● GPS x 1 ● LiDAR (32ch) x 2 ● Color Camera (4.8M) x 2 ● Color Camera (2M) x 7 https://www.argoverse.org/
  • 21. Mobility Technologies Co., Ltd. Argoverse Maps 21 Vector Map: Lane-Level Geometry Rasterized Map: Ground Height Rasterized Map: Drivable Area
  • 22. Mobility Technologies Co., Ltd. 3D Object Detection (3D Tracking) 22 ● Collection of 113 log segments with 3D object tracking annotations ● These log segments vary in length from 15 to 30 seconds and contain a total of 11,052 tracks ● Each sequence includes annotations for all objects within 5 meters of “drivable area” — the area in which it is possible for a vehicle to drive
  • 23. Mobility Technologies Co., Ltd. License 23
  • 24. Mobility Technologies Co., Ltd. Lyft Level 5 [2019] 24 Sensor Setup (BETA_V0) ● LiDAR (40ch) x 3 ● WFOV Camera (1.2M) x 6 ● Long-focal-length Camera (1.7M) x 1 Sensor Setup (BETA_++) ● LiDAR (64ch) x 1 ● LiDAR (40ch) x 2 ● WFOV Camera (2M) x 6 ● Long-focal-length Camera (2M) x 1 https://level5.lyft.com/dataset/
  • 25. Mobility Technologies Co., Ltd. Semantic Map 25
  • 26. Mobility Technologies Co., Ltd. 3D Object Detection (Same format as nuScenes) 26 ● category ● attribute ● visibility ● instance ● sensor ● calibrated_sensor ● ego_pose ● log ● scene ● sample ● sample_data ● sample_annotation ● map animal bicycle bus car emergency_vehicle motorcycle other_vehicle pedestrian truck 638K boxes in total
  • 27. Mobility Technologies Co., Ltd. License 27
  • 28. Mobility Technologies Co., Ltd. Audi Autonomous Driving Dataset (A2D2) [2020] 28 Sensor Setup ● GPS/IMU x 1 ● LiDAR (16ch) x 5 ● Color Camera (2.3M) x 6 https://www.a2d2.audi/a2d2/en.html
  • 29. Mobility Technologies Co., Ltd. Audi Autonomous Driving Dataset (A2D2) [2020] 29
  • 30. Mobility Technologies Co., Ltd. 3D Object Detection 30 ● All images have corresponding LiDAR point clouds, of which 12,497 are annotated with 3D bounding boxes within the field of view of the front-center camera
  • 31. Mobility Technologies Co., Ltd. License 31
  • 32. Mobility Technologies Co., Ltd. Comparison 32 [Dataset comparison table; one dataset's column is hidden with "?" placeholders and revealed on the next slide] These figures are based on Table 1 in https://arxiv.org/abs/1912.04838
  • 33. Mobility Technologies Co., Ltd. Comparison 33 [The same comparison table with the hidden column revealed as the Waymo Open Dataset] These figures are based on Table 1 in https://arxiv.org/abs/1912.04838
  • 34. Mobility Technologies Co., Ltd. Waymo Open Dataset [2019] 34 Sensor Setup ● Mid-Range (~75m) LiDAR x 1 ● Short-Range (~20m) LiDAR x 4 ● Color Camera (2M) x 3 ● Color Camera (1.6M) x 2 https://waymo.com/open/
  • 35. Mobility Technologies Co., Ltd. Data Volume 35 Train 798 segments w/ labels (757 GB) Test 150 seg. w/o labels (192 GB) Validation 202 seg. w/ labels (144 GB) ● Contains 1,150 segments that each span 20 seconds ● Additionally, segments from a new location, only a subset of which are labeled, are provided for domain adaptation
  • 36. Mobility Technologies Co., Ltd. Data Format 36 ● Each segment (20 sec) consists of ~200 frames (10 Hz) ● All the data related to a segment is stored in a single tfrecord and represented as protocol buffers ● Each Frame contains: context (shared information among all frames in the scene, e.g., calibration parameters and stats), timestamp_micros (frame timestamp), pose (vehicle pose), images (camera images and metadata such as pose, velocity, and timestamp), lasers (range images), laser_labels (3D box annotations), projected_lidar_labels (laser_labels projected to the camera images), camera_labels (2D box annotations), and no_label_zones (polygons that represent areas without labels, e.g., the opposite side of a highway)
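As a concrete illustration of this format, the sketch below follows the pattern of the official tutorial for decoding one segment into Frame protos; the tfrecord filename is a placeholder, and the printed fields are the ones listed above.

```python
# Reading one Waymo Open Dataset segment (a single .tfrecord file) and decoding its
# frames, following the structure of the official tutorial. The filename is a placeholder.
import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset

FILENAME = 'segment-XXXXXXXX_with_camera_labels.tfrecord'  # placeholder path

dataset = tf.data.TFRecordDataset(FILENAME, compression_type='')
for data in dataset:
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    # The fields below mirror the Frame description above.
    print(frame.context.name, frame.timestamp_micros)
    print('cameras:', len(frame.images), 'lidars:', len(frame.lasers),
          '3D labels:', len(frame.laser_labels))
    break
```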
  • 37. Mobility Technologies Co., Ltd. Range Image 37 The point cloud of each LiDAR is encoded as a range image; two returns (1st and 2nd) are provided, each with range, intensity, and elongation channels
  • 38. Mobility Technologies Co., Ltd. API & Tutorial in colab 38 https://github.com/waymo-research/waymo-open-dataset https://colab.research.google.com/github/waymo-research/waymo-open-dataset/blob/master/tutorial/tutorial.ipynb
  • 39. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 39 Mid-range LiDAR
  • 40. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 40 Mid-range LiDAR
  • 41. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 41 Mid-range LiDAR Short-range LiDAR (front)
  • 42. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 42 Mid-range LiDAR Short-range LiDAR (right)
  • 43. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 43 Mid-range LiDAR Short-range LiDAR (rear)
  • 44. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 44 Mid-range LiDAR Short-range LiDAR (left)
  • 45. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 45 Mid-range LiDAR Short-range LiDARs (all)
  • 46. Mobility Technologies Co., Ltd. Data Visualization (Camera Images) 46 Front Left 1920x1080 Front 1920x1080 Front Right 1920x1080 Side Left 1920x886 Side Right 1920x886
  • 47. Mobility Technologies Co., Ltd. 3D Object Detection 47 ■ 3D LiDAR Labels ■ 3D 7-DOF bounding boxes in the vehicle frame with globally unique tracking IDs ■ vehicles, pedestrians, cyclists, signs ■ 2D Camera Labels ■ Not projections of the 3D labels ■ vehicles, pedestrians, cyclists ■ Tight-fitting, axis-aligned 2D bounding boxes with globally unique tracking IDs Labeled object and tracking ID counts (Vehicle / Pedestrian / Cyclist / Sign): 3D objects 6.1M / 2.8M / 67K / 3.2M; 3D track IDs 60K / 23K / 620 / 23K; 2D objects 7.7M / 2.1M / 63K / -; 2D track IDs 164K / 45K / 1.3K / -
  • 48. Mobility Technologies Co., Ltd. 2D Label Samples 48
  • 49. Mobility Technologies Co., Ltd. 3D Label Samples 49
  • 50. Mobility Technologies Co., Ltd. LiDAR to Camera Projection 50 ■ Camera and LiDAR data are well synchronized ■ LiDAR points can be projected to the camera images with rolling-shutter-effect compensation
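For intuition, here is a generic pinhole-projection sketch of the LiDAR-to-camera step. It ignores lens distortion and the rolling-shutter compensation that the official toolkit performs, and it assumes a standard z-forward camera convention with a 4x4 extrinsic T_cam_from_lidar and a 3x3 intrinsic K taken from the calibration records; these names are illustrative.

```python
# Generic sketch of projecting LiDAR points into a camera image with a pinhole model.
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_from_lidar, K):
    """points_lidar: (N, 3) points in the LiDAR frame -> (M, 2) pixel coordinates."""
    # Homogeneous transform into the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera (z > 0 in this convention).
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    # Perspective projection and dehomogenization.
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3]
```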
  • 51. Mobility Technologies Co., Ltd. Challenges 51
  • 52. Mobility Technologies Co., Ltd. Evaluation Metrics for 3D Object Detection 52 https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html P/R Curve Average Precision with Heading (APH): each true positive is weighted by a heading accuracy defined as 1 - min(|θp - θg|, 2π - |θp - θg|)/π, where θp and θg are the predicted and ground-truth headings in radians
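A minimal sketch of that heading-accuracy weighting as I read the Waymo paper: a true positive with a perfect heading keeps full weight, while one that is off by 180 degrees contributes nothing.

```python
# Heading-accuracy weight used by APH-style metrics (illustrative implementation).
import math

def heading_accuracy(theta_pred, theta_gt):
    """Both angles in radians; returns a weight in [0, 1]."""
    diff = abs(theta_pred - theta_gt) % (2 * math.pi)
    diff = min(diff, 2 * math.pi - diff)          # wrap the error into [0, pi]
    return 1.0 - diff / math.pi

print(heading_accuracy(0.0, 0.0))           # 1.0 (perfect heading)
print(heading_accuracy(math.pi, 0.0))       # 0.0 (opposite heading)
print(heading_accuracy(math.pi / 2, 0.0))   # 0.5
```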
  • 53. Mobility Technologies Co., Ltd. To ensure the Dataset is only used for Non-Commercial Purposes, You agree ■ Not to distribute or publish any models trained on or refined using the Dataset, or the weights or biases from such trained models ■ Not to use or deploy the Dataset, any models trained on or refined using the Dataset, or the weights or biases from such trained models (i) in operation of a vehicle or to assist in the operation of a vehicle, (ii) in any Production Systems, or (iii) for any other primarily commercial purposes License 53 https://waymo.com/open/terms/
  • 54. Mobility Technologies Co., Ltd. 3D Object Detection Algorithms 54 02
  • 55. Mobility Technologies Co., Ltd. ■ Design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input ■ Provide a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing PointNet [C. Qi+, CVPR2017] 55 https://arxiv.org/abs/1612.00593
  • 56. Mobility Technologies Co., Ltd. PointNet Architecture 56
  • 57. Mobility Technologies Co., Ltd. PointNet Architecture 57 Predict an affine transformation matrix with a mini-network and align the input set to achieve invariance against geometric transformations
  • 58. Mobility Technologies Co., Ltd. PointNet Architecture 58 The same alignment approach is also applied in feature space
  • 59. Mobility Technologies Co., Ltd. PointNet Architecture 59 Using max pooling as a symmetric function, aggregate the unordered point features into a single global feature
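To make the architecture concrete, here is a minimal PointNet-style classifier in PyTorch. It is a sketch rather than the authors' implementation: the input and feature alignment T-Nets are omitted, and the layer widths follow the 64-128-1024 pattern of the paper.

```python
# Minimal PointNet-style classifier: shared per-point MLPs (1x1 Conv1d) followed by
# a max-pool over points as the symmetric aggregation function.
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, num_classes=40):
        super().__init__()
        self.point_mlp = nn.Sequential(          # shared per-point MLP: 3 -> 64 -> 128 -> 1024
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.head = nn.Sequential(                # global feature -> class scores
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, xyz):                         # xyz: (B, N, 3) unordered points
        feat = self.point_mlp(xyz.transpose(1, 2))  # (B, 1024, N)
        global_feat = feat.max(dim=2).values        # symmetric max-pool over points
        return self.head(global_feat)

logits = TinyPointNet()(torch.randn(2, 1024, 3))    # e.g., 2 clouds of 1024 points each
```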
  • 60. Mobility Technologies Co., Ltd. ■ Divide a point cloud into 3D voxels and transform them into a unified feature representation ■ This descriptive volumetric representation is then connected to an RPN to generate detections VoxelNet [Y. Zhou+, CVPR2018] 60 A voxel represents a value on a regular grid in three-dimensional space https://en.wikipedia.org/wiki/Voxel LiDAR ONLY https://arxiv.org/abs/1711.06396
  • 61. Mobility Technologies Co., Ltd. Voxel Feature Encoding (VFE) Layer 61 ● VFE enables inter-point interaction within a voxel, by combining point-wise features with a locally aggregated feature. ● Stacking multiple VFE layers allows learning complex features for characterizing local 3D shape information
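A sketch of a single VFE layer under these assumptions (a shared per-point linear layer, voxel-wise max pooling, and concatenation of the aggregated feature back onto each point); the tensor shapes and mask handling are illustrative rather than a faithful reproduction of any released code.

```python
# One Voxel Feature Encoding (VFE) layer: point-wise FC + BN + ReLU, voxel-wise max
# pooling, and concatenation of the pooled feature onto every point-wise feature.
import torch
import torch.nn as nn

class VFELayer(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.fc = nn.Linear(in_channels, out_channels // 2)
        self.bn = nn.BatchNorm1d(out_channels // 2)

    def forward(self, points, mask):
        # points: (V, T, C) = voxels x max points per voxel x channels
        # mask:   (V, T, 1) marks which slots actually contain a point
        V, T, _ = points.shape
        pw = torch.relu(self.bn(self.fc(points).view(V * T, -1))).view(V, T, -1)
        pw = pw * mask                                    # zero out empty slots
        agg = pw.max(dim=1, keepdim=True).values          # locally aggregated feature (V, 1, C')
        return torch.cat([pw, agg.expand(-1, T, -1)], dim=2) * mask

layer = VFELayer(7, 32)   # VoxelNet augments each point to 7 input features
out = layer(torch.randn(10, 35, 7), torch.ones(10, 35, 1))
```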
  • 62. Mobility Technologies Co., Ltd. Convolutional Middle Layers 62 ● Each convolutional middle layer applies 3D convolution, BN layer, and ReLU layer sequentially ● Convolutional middle layers aggregate voxel-wise features within a progressively expanding receptive field, adding more context to the shape description
  • 63. Mobility Technologies Co., Ltd. Region Proposal Network 63 ● The first layer of each block downsamples the input feature map ● Then the output of every block is upsampled to a fixed size and concatenated to construct the high resolution feature map ● Finally, this feature map is mapped to the desired learning targets
  • 64. Mobility Technologies Co., Ltd. Evaluation on KITTI 64 Performance comparison on KITTI validation set Performance comparison on KITTI test set
  • 65. Mobility Technologies Co., Ltd. ■ Apply sparse convolution to greatly increase the speed of training and inference ■ Introduce a novel angle-loss regression approach to solve the problem of the large loss generated when the angle prediction error is equal to π SECOND (Sparsely Embedded CONvolutional Detection) [Y. Yan+, Sensors2018] 65 LiDAR ONLY https://pdfs.semanticscholar.org/5125/a16039cabc6320c908a4764f32596e018ad3.pdf
  • 66. Mobility Technologies Co., Ltd. Sparse Convolution Algorithm 66 ■ Gather the necessary inputs to construct the matrix, perform GEMM, then scatter the data back ■ A GPU-based rule generation algorithm is proposed to construct the input–output index rule matrix
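The gather-GEMM-scatter idea can be illustrated with a toy NumPy sketch: for a single kernel offset, a precomputed rule lists which active input voxels contribute to which active output voxels. This is only a conceptual illustration; real sparse-convolution libraries (e.g., spconv) implement it with fused GPU kernels.

```python
# Toy illustration of gather -> GEMM -> scatter for one kernel offset of a sparse conv.
import numpy as np

def apply_rule(in_feats, out_feats, weight, rule):
    # in_feats:  (N_in, C_in)  features of active input voxels
    # out_feats: (N_out, C_out) accumulator for active output voxels
    # weight:    (C_in, C_out) kernel weights for this single spatial offset
    # rule:      (K, 2) pairs of (input index, output index) for this offset
    gathered = in_feats[rule[:, 0]]            # gather
    contrib = gathered @ weight                # GEMM
    np.add.at(out_feats, rule[:, 1], contrib)  # scatter-add
    return out_feats

in_feats = np.random.randn(100, 16)
out_feats = np.zeros((80, 32))
rule = np.stack([np.random.randint(0, 100, 50), np.random.randint(0, 80, 50)], axis=1)
out_feats = apply_rule(in_feats, out_feats, np.random.randn(16, 32), rule)
```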
  • 67. Mobility Technologies Co., Ltd. ■ Directly predicting the radian offset suffers from an adversarial example problem between the cases of 0 and π radians, because they correspond to the same box but generate a large loss when one is misidentified as the other ■ Solve this problem by introducing a new angle regression loss, Lθ = SmoothL1(sin(θp - θt)), where θp and θt are the predicted and target yaw angles ■ To address the issue that this loss treats boxes with opposite directions as being the same, a simple direction classifier is added to the output of the RPN Sine-Error Loss for Angle Regression 67
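A small PyTorch sketch of this loss together with the auxiliary direction classifier; the direction-binning convention and the 0.2 weight are taken from common SECOND/PointPillars configurations and may differ from a given implementation.

```python
# Sine-error angle regression loss plus a binary direction classification term.
import torch
import torch.nn.functional as F

def angle_loss(theta_pred, theta_gt, dir_logits):
    # Regression on sin(theta_pred - theta_gt): an error of exactly pi gives zero
    # regression loss, so a separate classifier disambiguates the two opposite headings.
    reg_loss = F.smooth_l1_loss(torch.sin(theta_pred - theta_gt),
                                torch.zeros_like(theta_pred))
    # Direction target: 1 if the ground-truth heading is "positive", else 0
    # (the exact binning convention varies between implementations).
    dir_target = (theta_gt > 0).long()
    dir_loss = F.cross_entropy(dir_logits, dir_target)
    return reg_loss + 0.2 * dir_loss   # 0.2 is a commonly used direction-loss weight

theta_pred, theta_gt = torch.randn(8), torch.randn(8)
loss = angle_loss(theta_pred, theta_gt, torch.randn(8, 2))
```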
  • 68. Mobility Technologies Co., Ltd. Evaluation on KITTI 68 Performance comparison on KITTI validation set Performance comparison on KITTI test set
  • 69. Mobility Technologies Co., Ltd. PointPillars [A. Lang+, CVPR2019] 69 ■ Propose an encoder that learns a representation of point clouds organized in vertical columns (pillars) and generates a pseudo 2D image ■ The encoded features can be used with any standard 2D convolutional detection architecture, without computationally expensive 3D ConvNets LiDAR ONLY https://arxiv.org/abs/1812.05784
  • 70. Mobility Technologies Co., Ltd. Pointcloud to Pseudo-Image 70 The point cloud is discretized into an evenly spaced grid in the x-y plane, creating a set of pillars
  • 71. Mobility Technologies Co., Ltd. Pointcloud to Pseudo-Image 71 Create a dense tensor of size (D, P, N) D: Dimension of augmented lidar point (=9) P: Number of non-empty pillars per sample N: Number of points per pillar
  • 72. Mobility Technologies Co., Ltd. Pointcloud to Pseudo-Image 72 Apply PointNet to generate a (C, P, N) sized feature tensor, followed by a max operation over the channels to create an output tensor of size (C, P)
  • 73. Mobility Technologies Co., Ltd. Pointcloud to Pseudo-Image 73 Features are scattered back to the original pillar locations to create a pseudo-image of size (C, H, W) where H and W indicate the height and width of the canvas
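The scatter step is simple enough to show directly; the sketch below writes the (C, P) pillar features back onto an H x W canvas. The grid size and tensor names are assumptions for illustration.

```python
# Final "scatter" step of the PointPillars encoder: place each pillar's feature vector
# at its x-y grid location to form the (C, H, W) pseudo-image for the 2D backbone.
import torch

def scatter_to_pseudo_image(pillar_features, pillar_coords, H, W):
    # pillar_features: (C, P) one feature vector per non-empty pillar
    # pillar_coords:   (P, 2) integer (row, col) grid indices of each pillar
    C, P = pillar_features.shape
    canvas = torch.zeros(C, H * W, dtype=pillar_features.dtype)
    flat_idx = pillar_coords[:, 0] * W + pillar_coords[:, 1]
    canvas[:, flat_idx] = pillar_features
    return canvas.view(C, H, W)

feat = torch.randn(64, 1200)                 # e.g., C=64 features, P=1200 non-empty pillars
coords = torch.randint(0, 400, (1200, 2))    # assuming a 400x400 pillar grid
pseudo_image = scatter_to_pseudo_image(feat, coords, 400, 400)
```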
  • 74. Mobility Technologies Co., Ltd. Backbone 74 Top-down network produces features at increasingly small spatial resolution Second network performs upsampling and concatenation of the top-down features
  • 75. Mobility Technologies Co., Ltd. Detection Head 75 Single Shot Detector (SSD) is used with additional regression targets (height and elevation)
  • 76. Mobility Technologies Co., Ltd. Evaluation on KITTI 76 Performance comparison on KITTI test set
  • 77. Mobility Technologies Co., Ltd. ■ Implementation ■ The official PointPillars implementation is a fork of SECOND's implementation and is no longer maintained ■ Instead, SECOND's implementation now supports PointPillars ■ Format Conversion ■ SECOND's implementation only supports KITTI and nuScenes, so format conversion is the fastest way to use the Waymo Open Dataset ■ Several converters can be found on GitHub ■ Waymo_Kitti_Adapter ■ waymo_kitti_converter Let's Try PointPillars on Waymo Open Dataset 77
  • 78. Mobility Technologies Co., Ltd. These results are just for reference, because only part of the training set is used and the hyperparameters are not tuned to the Waymo Open Dataset at all Vehicle Detection Results 78
  • 79. Mobility Technologies Co., Ltd. These results are just for reference, because only part of the training set is used and the hyperparameters are not tuned to the Waymo Open Dataset at all Vehicle Detection Results 79
  • 80. Mobility Technologies Co., Ltd. Results from Leaderboard on Waymo Open Dataset 80 https://waymo.com/open/challenges/3d-detection/#
  • 81. Mobility Technologies Co., Ltd. ■ First generate 2D object region proposals in the RGB image using a CNN, then extrude each 2D region to a 3D viewing frustum to get a point cloud ■ A PointNet predicts a 3D bounding box for the object from the points in the frustum Frustum PointNets [C. Qi+, CVPR2018] 81 LiDAR + Camera https://arxiv.org/abs/1711.08488
  • 82. Mobility Technologies Co., Ltd. Frustum Proposal 82 ● Use an object detector on the RGB image to predict a 2D bounding box and lift it to a frustum with the known camera matrix ● Collect all points within the frustum to form a frustum point cloud
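A sketch of that frustum-cropping step: project every LiDAR point with the (assumed known) 3x4 camera projection matrix P and keep the points whose image coordinates fall inside the detected 2D box. P and the coordinate convention are assumptions taken from typical KITTI-style calibration.

```python
# Collect the LiDAR points that fall inside a detected 2D bounding box (the frustum).
import numpy as np

def points_in_frustum(points_lidar, P, box2d):
    """points_lidar: (N, 3); box2d: (xmin, ymin, xmax, ymax) -> frustum point cloud."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    proj = (P @ pts_h.T).T                       # (N, 3) homogeneous image coordinates
    in_front = proj[:, 2] > 0                    # drop points behind the camera
    pts, proj = points_lidar[in_front], proj[in_front]
    uv = proj[:, :2] / proj[:, 2:3]              # pixel coordinates
    xmin, ymin, xmax, ymax = box2d
    inside = (uv[:, 0] >= xmin) & (uv[:, 0] <= xmax) & \
             (uv[:, 1] >= ymin) & (uv[:, 1] <= ymax)
    return pts[inside]
```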
  • 83. Mobility Technologies Co., Ltd. 3D Instance Segmentation 83 Object instance is segmented by binary classification of each point using PointNet
  • 84. Mobility Technologies Co., Ltd. Amodal 3D Box Estimation 84 Estimate the object's amodal oriented 3D bounding box using a box-regression PointNet The true center of the complete object is estimated first, and the coordinates are then transformed so that the predicted center becomes the origin
  • 85. Mobility Technologies Co., Ltd. Evaluation on KITTI 85 Performance comparison on KITTI validation set Performance comparison on KITTI test set
  • 86. Mobility Technologies Co., Ltd. PV-RCNN [S. Shi+, CVPR2020] 86 https://arxiv.org/abs/1912.13192 ■ Voxel-based operation efficiently encodes multi-scale feature representations and can generate high-quality 3D proposals, while the PointNet-based set abstraction operation preserves accurate location information with flexible receptive fields ■ Integrate the two operations via the voxel-to-keypoint 3D scene encoding and the keypoint-to-grid RoI feature abstraction LiDAR ONLY
  • 87. Mobility Technologies Co., Ltd. 3D Voxel CNN for Feature Encoding and Proposal Generation 87 Input points are first divided into voxels and gradually converted into feature volumes by a 3D sparse CNN By converting the 3D feature volumes into 2D bird's-eye-view feature maps, high-quality 3D proposals are generated following anchor-based approaches
  • 88. Mobility Technologies Co., Ltd. Voxel-to-keypoint Scene Encoding via Voxel Set Abstraction 88 A small number of keypoints is sampled from the point cloud A PointNet-based set abstraction module encodes the multi-scale semantic features from the 3D CNN feature volumes into the keypoints Each keypoint is checked for being inside or outside a ground-truth 3D box, and the keypoint features are re-weighted accordingly
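Per the paper, PV-RCNN selects these keypoints with farthest point sampling (FPS); a simple O(N*K) NumPy version of that selection is sketched below for illustration (production code uses a CUDA kernel).

```python
# Farthest point sampling: pick k points that spread out over the cloud.
import numpy as np

def farthest_point_sampling(points, k):
    """points: (N, 3) -> indices of k well-spread points."""
    n = len(points)
    chosen = np.zeros(k, dtype=np.int64)
    dist = np.full(n, np.inf)
    chosen[0] = np.random.randint(n)            # start from a random point
    for i in range(1, k):
        # Squared distance from every point to the nearest already-chosen point.
        diff = points - points[chosen[i - 1]]
        dist = np.minimum(dist, np.einsum('ij,ij->i', diff, diff))
        chosen[i] = int(np.argmax(dist))        # pick the farthest remaining point
    return chosen

points = np.random.randn(20000, 3)
keypoints = points[farthest_point_sampling(points, 2048)]
```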
  • 89. Mobility Technologies Co., Ltd. Keypoint-to-grid RoI Feature Abstraction for Proposal Refinement 89 RoI-grid pooling module aggregates the keypoint features to the RoI-grid points with multiple receptive fields using PointNet
  • 90. Mobility Technologies Co., Ltd. Evaluation on KITTI / Waymo Open Dataset 90 Performance comparison on KITTI test set Performance comparison on Waymo OD validation set
  • 91. Mobility Technologies Co., Ltd. We Don't Need Cameras? 91 3D vehicle detection performance on KITTI test set (moderate) LiDAR only LiDAR + Camera
  • 92. Mobility Technologies Co., Ltd. ■ Autonomous Driving Datasets ■ KITTI is the most famous and most frequently used dataset for vehicle-related research; however, it is limited in size and performance on it is saturating (> 80% AP) ■ More recent datasets provide much larger amounts of multi-modal sensor data and annotations, and some of them also provide semantic maps ■ The Waymo Open Dataset is one of the largest and most diverse datasets ever released and provides high-quality (meta)data and annotations (but unfortunately, it is NOT commercial-friendly at all) ■ 3D Object Detection Algorithms ■ Recent 3D object detection algorithms re-purpose camera-based detection architectures, which have been greatly advanced by CNNs and many mature techniques such as region proposals ■ The two main streams are grid-based methods and point-based methods; a key component of the former is the 2D/3D CNN, and of the latter, PointNet ■ Current state-of-the-art results are dominated by LiDAR-only methods, while LiDAR-camera fusion methods lag behind Summary 92