허정원, 김병현, 최승준
VoxelNet
End-to-End Learning for Point Cloud Based 3D Object Detection
Zhou, Yin, and Oncel Tuzel. Proceedings of the IEEE conference on computer vision and pattern recognition. (2018)
Contents
• Introduction
• Architecture
• Experiments
• Conclusion
1. Introduction
What is 3D Object Detection?
Problem definition
$\mathcal{B} = f_{det}(\mathcal{I}_{sensor})$, where
$\mathcal{B} = \{B_1, \cdots, B_N\}$ is a set of N 3D objects in a scene,
$f_{det}$ is a 3D object detection model,
$\mathcal{I}_{sensor}$ is one or more sensory inputs.
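3D Cuboid
$B = [x_c, y_c, z_c, l, w, h, \theta, class]$
• $(x_c, y_c, z_c)$: box center; $l, w, h$: length, width, height; $\theta$: yaw angle
• $v_x, v_y$: speed
[Figure: 3D cuboid with x, y, z axes, roll, pitch, and yaw(θ) rotations, and dimensions l, w, h]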
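The box parameterization above can be sketched as a small data structure. This is only an illustration: the field names, the `bev_corners` helper, and the example values are assumptions made here, not something given in the slides.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    """One 3D box B = [x_c, y_c, z_c, l, w, h, theta, class]."""
    x_c: float    # box center along X
    y_c: float    # box center along Y
    z_c: float    # box center along Z
    l: float      # length
    w: float      # width
    h: float      # height
    theta: float  # yaw angle in radians
    cls: str      # object class label

    def volume(self) -> float:
        return self.l * self.w * self.h

    def bev_corners(self):
        """Four (x, y) corners of the footprint in the bird's eye view."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        half = [(self.l / 2, self.w / 2), (self.l / 2, -self.w / 2),
                (-self.l / 2, -self.w / 2), (-self.l / 2, self.w / 2)]
        # Rotate each half-extent by yaw, then translate to the box center.
        return [(self.x_c + dx * c - dy * s, self.y_c + dx * s + dy * c)
                for dx, dy in half]


# Hypothetical example values, chosen only for illustration.
car = Box3D(x_c=10.0, y_c=2.0, z_c=-1.0, l=4.0, w=2.0, h=1.5, theta=0.0, cls="Car")
corners = car.bev_corners()
```

With zero yaw the footprint is axis-aligned, so the first corner is simply the center shifted by half the length and half the width.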
Sensory Inputs
Radars, cameras, and LiDAR (Light Detection And Ranging) sensors are the three
most widely adopted sensor types.
• Radar: long detection range and robust to weather conditions; measures velocity via the Doppler effect.
• Camera: cheap, easily accessible, and crucial for understanding semantics.
• LiDAR: directly acquires accurate 3D information.
Comparisons with 2D Object Detection
• Heterogeneous data representations.
• 2D methods detect from the perspective view; 3D methods must consider different
views.
• 3D methods have a high demand for accurate localization in 3D space.
[Figure: bird's eye view (LiDAR), point view, and cylindrical view]
Datasets - KITTI
• KITTI: pioneering work in collecting data and annotating 3D objects in the
collected data.
• Evaluation metric: 3D IoU (intersection over union of the predicted and ground truth 3D boxes)
Geiger, A., Lenz, P., Stiller, C., & Urtasun, R. (2013).
Vision meets robotics: The KITTI dataset.
The International Journal of Robotics Research, 32(11), 1231-1237.
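As a rough illustration of the 3D IoU metric, the sketch below computes it for axis-aligned boxes only. This is a simplifying assumption: real KITTI boxes carry a yaw angle, which needs a rotated-polygon intersection that is not shown here. The function name and box layout are hypothetical.

```python
def iou_3d_axis_aligned(a, b):
    """3D IoU of two axis-aligned boxes.

    Each box is (x_min, y_min, z_min, x_max, y_max, z_max).
    Yaw is ignored, which is a simplification of the KITTI setting.
    """
    inter = 1.0
    for axis in range(3):
        overlap = min(a[axis + 3], b[axis + 3]) - max(a[axis], b[axis])
        if overlap <= 0:
            return 0.0  # boxes do not intersect along this axis
        inter *= overlap
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


# Two 2 x 2 x 2 boxes shifted by 1 along X: intersection 4, union 12.
iou = iou_3d_axis_aligned((0, 0, 0, 2, 2, 2), (1, 0, 0, 3, 2, 2))
```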
VoxelNet
• The voxel feature encoding (VFE) layer enables inter-point interaction within a voxel.
• Stacking multiple VFE layers allows learning complex features.
• VoxelNet divides the point cloud into equally spaced 3D voxels and encodes each
voxel via stacked VFE layers. 3D convolution then further aggregates local
voxel features, transforming the point cloud into a high-dimensional volumetric
representation, from which the region proposal network yields the detection result.
→ Benefits both from the sparse point structure and from parallel processing on the
voxel grid.
2. Architecture
Feature learning network
Voxel Partition
• Subdivide the 3D space into equally spaced voxels.
• Suppose the point cloud encompasses 3D space with range D, H, W along the Z, Y, X axes
respectively.
• Each voxel has size (vD, vH, vW) = (0.4, 0.2, 0.2) m.
• D, H, W are multiples of vD, vH, vW, so the voxel grid has size D' = D/vD, H' = H/vH, W' = W/vW.
• Axis correspondence: D, H, W = Z, Y, X; H, W, L = Z, Y, X.
• Z × Y × X = [−3, 1] × [−40, 40] × [0, 70.4] m
• D', H', W' = 10, 400, 352
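The grid size follows from dividing each range by the voxel size along that axis. A minimal sketch of that arithmetic, plus a hypothetical helper mapping a point to its voxel index (the function names and the floor-based indexing are assumptions made for illustration):

```python
import math

# Ranges and voxel sizes from the slide, in (Z, Y, X) order, in meters.
RANGES = ((-3.0, 1.0), (-40.0, 40.0), (0.0, 70.4))
VOXEL_SIZE = (0.4, 0.2, 0.2)


def grid_size(ranges=RANGES, voxel_size=VOXEL_SIZE):
    """Number of voxels along Z, Y, X: (D', H', W')."""
    # round() guards against floating-point noise such as 352.00000000000006.
    return tuple(round((hi - lo) / v) for (lo, hi), v in zip(ranges, voxel_size))


def voxel_index(x, y, z, ranges=RANGES, voxel_size=VOXEL_SIZE):
    """Index (d, h, w) of the voxel containing the point (x, y, z)."""
    coords = (z, y, x)  # reorder to match the (Z, Y, X) convention
    return tuple(math.floor((c - lo) / v)
                 for c, (lo, _), v in zip(coords, ranges, voxel_size))


print(grid_size())  # (10, 400, 352)
```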
Grouping
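• The LiDAR point cloud is sparse and has highly variable point density.
• Therefore, after grouping, a voxel will contain a variable number of points.
Random Sampling
• Randomly sample a fixed number, T, of points from voxels containing more than T points.
1. Computational savings
2. Decreases the imbalance of points between voxels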
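Grouping and random sampling can be sketched with NumPy as follows. The function name, the dictionary output, the (x, y, z) key order, and the value T = 35 used in the demo are assumptions for illustration; an efficient implementation would likely avoid the Python loop.

```python
import numpy as np


def group_and_sample(points, voxel_size, lower, T, rng):
    """Group points by voxel and keep at most T points per voxel.

    points: (N, 4) array of x, y, z, reflectance.
    voxel_size, lower: (3,) arrays in x, y, z order.
    Returns a dict mapping voxel index (ix, iy, iz) to a (t, 4) array, t <= T.
    """
    coords = np.floor((points[:, :3] - lower) / voxel_size).astype(np.int64)
    members = {}
    for idx, key in enumerate(map(tuple, coords.tolist())):
        members.setdefault(key, []).append(idx)
    voxels = {}
    for key, ids in members.items():
        ids = np.asarray(ids)
        if len(ids) > T:
            # Random sampling without replacement caps the voxel at T points.
            ids = rng.choice(ids, size=T, replace=False)
        voxels[key] = points[ids]
    return voxels


# Demo: one dense voxel with 50 points and one sparse voxel with 3 points.
rng = np.random.default_rng(0)
lower = np.array([0.0, -40.0, -3.0])
voxel_size = np.array([0.2, 0.2, 0.4])
dense = lower + rng.uniform(0.01, 0.19, size=(50, 3))
sparse = lower + rng.uniform(0.01, 0.19, size=(3, 3)) + np.array([1.0, 0.0, 0.0])
xyz = np.vstack([dense, sparse])
points = np.hstack([xyz, rng.uniform(size=(53, 1))])
voxels = group_and_sample(points, voxel_size, lower, T=35, rng=rng)
```

After sampling, the dense voxel holds 35 points instead of 50, while the sparse voxel keeps its 3 points, which is the reduced imbalance the slide refers to.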
Stacked Voxel Feature Encoding
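• Denote $V = \{p_i = [x_i, y_i, z_i, r_i]^T \in \mathbb{R}^4\}_{i=1 \ldots t}$ as a non-empty voxel containing t ≤ T
LiDAR points, where $p_i$ contains the XYZ coordinates of the i-th point and $r_i$ is the
received reflectance.
• Compute the local mean $(v_x, v_y, v_z)$ as the centroid of all the points in V.
• Augment each point $p_i$ with its offset from the centroid to obtain
$V_{in} = \{\hat{p}_i = [x_i, y_i, z_i, r_i, x_i - v_x, y_i - v_y, z_i - v_z]^T \in \mathbb{R}^7\}_{i=1 \ldots t}$.
Each $\hat{p}_i$ is then transformed through the fully connected network (FCN) into a feature space.
Sparse Tensor Representation
• The voxel-wise features form a sparse 4D tensor of size C × D' × H' × W' = 128 × 10 × 400 × 352.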
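The augmentation and stacked VFE steps can be sketched in NumPy. Several details here are assumptions recalled from the VoxelNet paper rather than stated in the slides: each VFE layer applies a point-wise FCN, takes an element-wise max over the points of the voxel, and concatenates that aggregate back onto every point feature, with layer widths 7 → 32 → 128. Batch normalization and the final FCN are omitted, and the weights are random placeholders.

```python
import numpy as np


def augment(voxel_points):
    """(t, 4) points -> (t, 7): append offsets from the voxel centroid."""
    centroid = voxel_points[:, :3].mean(axis=0)  # (v_x, v_y, v_z)
    return np.hstack([voxel_points, voxel_points[:, :3] - centroid])


def vfe_layer(features, weight, bias):
    """One VFE layer: (t, c_in) -> (t, c_out), with weight of shape (c_in, c_out // 2)."""
    pointwise = np.maximum(features @ weight + bias, 0.0)  # linear + ReLU (BN omitted)
    aggregated = pointwise.max(axis=0, keepdims=True)      # element-wise max over points
    tiled = np.repeat(aggregated, len(pointwise), axis=0)
    return np.hstack([pointwise, tiled])                   # point-wise + voxel-wise parts


rng = np.random.default_rng(0)
voxel = rng.normal(size=(5, 4))                 # a voxel with t = 5 points
x = augment(voxel)                              # (5, 7)
x = vfe_layer(x, rng.normal(size=(7, 16)), np.zeros(16))    # (5, 32)
x = vfe_layer(x, rng.normal(size=(32, 64)), np.zeros(64))   # (5, 128)
voxel_feature = x.max(axis=0)                   # (128,), one feature per voxel
```

The concatenation is what lets each point feature see information from the other points in the same voxel, which is the inter-point interaction mentioned in the introduction.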
Convolutional Middle Layers
• ConvMD(c_in, c_out, k, s, p) represents an M-dimensional convolution operator,
where c_in and c_out are the numbers of input and output channels, and k, s, and p are the kernel size, stride size, and padding size.
• Output 4D tensor: C × D' × H' × W' = 64 × 2 × 400 × 352
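The output size can be checked with the standard convolution size formula, floor((n + 2p − k) / s) + 1 per axis. The three layer configurations below are an assumption recalled from the VoxelNet paper, not listed in the slides; they reduce the depth from 10 to 2 while leaving H' and W' unchanged.

```python
def conv_out(size, k, s, p):
    """Output length of a convolution along one axis."""
    return (size + 2 * p - k) // s + 1


def conv3d_shape(shape, k, stride, padding):
    """Apply conv_out to each of the (D, H, W) axes."""
    return tuple(conv_out(n, k, s, p) for n, s, p in zip(shape, stride, padding))


# (c_out, k, stride (D, H, W), padding (D, H, W)); assumed layer settings.
layers = [
    (64, 3, (2, 1, 1), (1, 1, 1)),
    (64, 3, (1, 1, 1), (0, 1, 1)),
    (64, 3, (2, 1, 1), (1, 1, 1)),
]

channels, shape = 128, (10, 400, 352)  # input: 128 x 10 x 400 x 352
for c_out, k, stride, padding in layers:
    shape = conv3d_shape(shape, k, stride, padding)
    channels = c_out

print(channels, shape)  # 64 (2, 400, 352)
```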
Region Proposal Network
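• Input: the output of the convolutional middle layers reshaped into a 3D tensor, C × H' × W' = 128 × 400 × 352.
• The network has three blocks of fully convolutional layers. The first layer of each
block downsamples the feature map by half via a convolution with a stride size of 2,
followed by a sequence of convolutions of stride 1. BN and ReLU are applied after each convolution layer.
• The output of every block is upsampled to a fixed size and concatenated to construct
the high resolution feature map.
• This feature map is mapped to two outputs: 1. a score map, 2. a regression map.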
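A small shape walk-through of the RPN input and its three downsampling stages. The kernel size 3 and padding 1 for the stride-2 convolutions are assumptions (they are not given in the slides); the reshape merges the channel and depth axes of the 64 × 2 × 400 × 352 middle-layer output.

```python
def conv_out(size, k=3, s=2, p=1):
    """Output length of a stride-2 convolution along one axis (assumed k=3, p=1)."""
    return (size + 2 * p - k) // s + 1


# Middle-layer output: C x D' x H' x W'.
c, d, h, w = 64, 2, 400, 352

# Merge channel and depth axes to get the 3D RPN input: (C * D') x H' x W'.
rpn_in = (c * d, h, w)

# The first layer of each of the three blocks halves the spatial resolution.
block_sizes = []
for _ in range(3):
    h, w = conv_out(h), conv_out(w)
    block_sizes.append((h, w))

print(rpn_in)       # (128, 400, 352)
print(block_sizes)  # [(200, 176), (100, 88), (50, 44)]
```

Because the three blocks end at different resolutions, their outputs have to be upsampled to a common size before they can be concatenated.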
Loss Function
Let $\{a_i^{pos}\}_{i=1 \ldots N_{pos}}$ be the set of $N_{pos}$ positive anchors and
$\{a_j^{neg}\}_{j=1 \ldots N_{neg}}$ be the set of $N_{neg}$ negative anchors.
A 3D ground truth box is parameterized as $(x_c^g, y_c^g, z_c^g, l^g, w^g, h^g, \theta^g)$, where $x_c^g, y_c^g, z_c^g$ represent
the center location, $l^g, w^g, h^g$ are the length, width, and height of the box, and $\theta^g$ is the
yaw rotation around the Z-axis.
To retrieve the ground truth box from a matching positive anchor parameterized as
$(x_c^a, y_c^a, z_c^a, l^a, w^a, h^a, \theta^a)$, define the residual vector
$u^* = (\Delta x, \Delta y, \Delta z, \Delta l, \Delta w, \Delta h, \Delta \theta) \in \mathbb{R}^7$
$L = \alpha \frac{1}{N_{pos}} \sum_i L_{cls}(p_i^{pos}, 1) + \beta \frac{1}{N_{neg}} \sum_j L_{cls}(p_j^{neg}, 0) + \frac{1}{N_{pos}} \sum_i L_{reg}(u_i, u_i^*)$
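A hedged NumPy sketch of the residual encoding and the loss. The slides do not spell out the residual terms or the loss components, so the following are assumptions recalled from the VoxelNet paper: center offsets are normalized by the anchor's base diagonal (x, y) or height (z), sizes use log ratios, $L_{cls}$ is binary cross entropy, $L_{reg}$ is Smooth L1, and α = 1.5, β = 1. All function names are hypothetical.

```python
import numpy as np


def encode_residual(gt, anchor):
    """Residual u* in R^7 between a ground truth box and its matching anchor.

    Both boxes are (x_c, y_c, z_c, l, w, h, theta).
    """
    xg, yg, zg, lg, wg, hg, tg = gt
    xa, ya, za, la, wa, ha, ta = anchor
    d = np.hypot(la, wa)  # diagonal of the anchor base
    return np.array([(xg - xa) / d, (yg - ya) / d, (zg - za) / ha,
                     np.log(lg / la), np.log(wg / wa), np.log(hg / ha),
                     tg - ta])


def bce(p, target, eps=1e-7):
    """Binary cross entropy, used here as L_cls."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))


def smooth_l1(x):
    """Smooth L1, used here as L_reg (applied element-wise)."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)


def detection_loss(p_pos, p_neg, u, u_star, alpha=1.5, beta=1.0):
    """L = alpha * mean L_cls(pos, 1) + beta * mean L_cls(neg, 0) + mean L_reg.

    p_pos: (N_pos,), p_neg: (N_neg,), u and u_star: (N_pos, 7).
    Assumes N_pos > 0 and N_neg > 0.
    """
    cls_pos = bce(p_pos, 1.0).mean()
    cls_neg = bce(p_neg, 0.0).mean()
    reg = smooth_l1(u - u_star).sum(axis=1).mean()
    return alpha * cls_pos + beta * cls_neg + reg


anchor = (10.0, 2.0, -1.0, 3.9, 1.6, 1.56, 0.0)  # hypothetical anchor
gt = (10.5, 2.2, -0.9, 4.1, 1.7, 1.5, 0.1)       # hypothetical ground truth
u_star = encode_residual(gt, anchor)[None, :]    # shape (1, 7)
loss = detection_loss(np.array([0.9]), np.array([0.2, 0.1]),
                      np.zeros((1, 7)), u_star)
```

Only positive anchors contribute to the regression term, which matches the $1/N_{pos}$ normalization in the formula above.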
3. Experiments
Evaluation in 3D
3D detection performance: average precision (in %) on the KITTI validation set.

Method          Modality    Car                       Pedestrian                Cyclist
                            Easy   Moderate  Hard     Easy   Moderate  Hard     Easy   Moderate  Hard
Mono3D          Mono        2.53   2.31      2.31     N/A    N/A       N/A      N/A    N/A       N/A
3DOP            Stereo      6.55   5.07      4.10     N/A    N/A       N/A      N/A    N/A       N/A
VeloFCN         LiDAR       15.20  13.66     15.98    N/A    N/A       N/A      N/A    N/A       N/A
MV (BV+FV)      LiDAR       71.19  56.60     55.30    N/A    N/A       N/A      N/A    N/A       N/A
MV (BV+FV+RGB)  LiDAR+Mono  71.29  62.68     56.56    N/A    N/A       N/A      N/A    N/A       N/A
HC-baseline     LiDAR       71.73  59.75     55.69    43.95  40.18     37.48    55.35  36.07     34.15
VoxelNet        LiDAR       81.97  65.46     62.85    57.86  53.42     48.87    67.17  47.65     45.11
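The size of the improvement can be read off the table with simple arithmetic. The snippet below, offered only as a convenience, computes the margin of VoxelNet over HC-baseline (the only other method with results for all three classes) using the numbers from the table; the variable names are illustrative.

```python
# Average precision (%) copied from the table: Easy, Moderate, Hard per class.
hc_baseline = {
    "Car": (71.73, 59.75, 55.69),
    "Pedestrian": (43.95, 40.18, 37.48),
    "Cyclist": (55.35, 36.07, 34.15),
}
voxelnet = {
    "Car": (81.97, 65.46, 62.85),
    "Pedestrian": (57.86, 53.42, 48.87),
    "Cyclist": (67.17, 47.65, 45.11),
}

# Margin in AP percentage points, rounded to two decimals.
margins = {
    cls: tuple(round(v - b, 2) for v, b in zip(voxelnet[cls], hc_baseline[cls]))
    for cls in voxelnet
}

for cls, (easy, moderate, hard) in margins.items():
    print(f"{cls}: +{easy} (easy), +{moderate} (moderate), +{hard} (hard)")
```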
4. Conclusion
• Removes the bottleneck of manual feature engineering by proposing VoxelNet, an end-to-end trainable architecture.
• Operates directly on sparse 3D points and captures 3D shape information effectively.
• Presents an efficient implementation of VoxelNet that benefits from point cloud sparsity and
parallel processing on a voxel grid.
• Shows that VoxelNet outperforms state-of-the-art LiDAR-based 3D detection
methods by a large margin.
• Provides a better 3D representation.
Future work: extending VoxelNet to joint LiDAR- and image-based end-to-end 3D
detection to further improve detection and localization accuracy.