
VoxelNet

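Hello, this is the Deep Learning Paper Reading Group! Today's paper is VoxelNet, an essential read for anyone working on, or hoping to work on, 3D-related tasks. Slides: https://www.slideshare.net/taeseonryu/mcsemultimodal-contrastive-learning-of-sentence-embeddings
Today we introduce a major advance in object detection in 3D point clouds, an important problem for applications such as autonomous driving, household robots, and augmented/virtual reality, by looking at a new 3D detection network called VoxelNet.
1. Limitations of existing methods: To interface a LiDAR point cloud with a region proposal network (RPN), most existing efforts have focused on hand-crafted feature representations, such as a bird's eye view projection. Such methods, however, struggle to make this connection effectively.
2. VoxelNet's approach: VoxelNet removes the need for manual feature engineering for 3D point clouds and unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep network. It divides a point cloud into equally spaced 3D voxels and transforms the group of points within each voxel into a unified feature representation through the newly introduced voxel feature encoding (VFE) layer.
3. Learning effective geometric representations: In this way, the point cloud is encoded as a descriptive volumetric representation, which is then connected to an RPN to generate detections. VoxelNet learns an effective discriminative representation of objects with various geometries.
4. Performance evaluation: On the KITTI car detection benchmark, VoxelNet outperforms existing LiDAR-based 3D detection methods by a large margin. It also shows encouraging results for pedestrian and cyclist detection based on LiDAR alone.
VoxelNet significantly improves object detection in 3D point clouds and is expected to have an important influence on future developments in the field. 허정원 from the image processing team prepared a detailed review of today's paper. Thank you in advance for your interest! https://youtu.be/yCgsCyoJoMg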

허정원, 김병현, 최승준
VoxelNet
End-to-End Learning for Point Cloud Based 3D Object Detection
Zhou, Yin, and Oncel Tuzel. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
Contents
• Introduction
• Architecture
• Experiments
• Conclusion
1. Introduction
What is 3D Object Detection?
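Problem definition
$\mathcal{B} = f_{\text{det}}(\mathcal{I}_{\text{sensor}})$,
where $\mathcal{B} = \{B_1, \cdots, B_N\}$ is a set of N 3D objects in a scene,
$f_{\text{det}}$ is a 3D object detection model,
and $\mathcal{I}_{\text{sensor}}$ is one or more sensory inputs.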
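3D Cuboid
[Figure: a 3D cuboid in the x, y, z frame with dimensions l, w, h and rotation angles roll, pitch, and yaw (θ)]
$B = [x_c, y_c, z_c, l, w, h, \theta, class]$, where $(x_c, y_c, z_c)$ is the box center, $l, w, h$ are its length, width, and height, $\theta$ is the yaw angle, and class is the object category.
$v_x, v_y$: speed (velocity components along x and y)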
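To make the box parameterization concrete, here is a minimal NumPy sketch that converts $(x_c, y_c, z_c, l, w, h, \theta)$ into the eight corners of the cuboid. It assumes yaw-only boxes (roll = pitch = 0), that $(x_c, y_c, z_c)$ is the geometric center, and that l lies along x and w along y when θ = 0. `box_corners` is a hypothetical helper, and real datasets may use other conventions (KITTI labels, for instance, are given in camera coordinates).

```python
import numpy as np


def box_corners(x_c, y_c, z_c, l, w, h, theta):
    """Return the 8 corners (shape 8 x 3) of a yaw-rotated 3D cuboid.

    Assumed convention (illustrative only): (x_c, y_c, z_c) is the
    geometric center, l lies along x and w along y when theta = 0,
    h lies along z, and theta rotates the box counterclockwise about z.
    Roll and pitch are assumed to be zero.
    """
    # Corner offsets in the box's local frame: top face first, then bottom face.
    dx = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * (l / 2)
    dy = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * (w / 2)
    dz = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * (h / 2)
    c, s = np.cos(theta), np.sin(theta)
    # Rotate the offsets about z by theta, then translate to the center.
    x = c * dx - s * dy + x_c
    y = s * dx + c * dy + y_c
    z = dz + z_c
    return np.stack([x, y, z], axis=1)


# A car-sized box rotated by 90 degrees: its footprint now spans w along x and l along y.
corners = box_corners(10.0, 2.0, -1.0, 3.9, 1.6, 1.56, np.pi / 2)
print(corners.round(2))
```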
Sensory Inputs
Radars, cameras, and LiDAR (Light Detection And Ranging) sensors are the three most widely adopted sensor types.
• Radar: long detection range and robust to weather conditions; measures velocity via the Doppler effect.
• Camera: cheap, easily accessible, and crucial for understanding semantics.
• LiDAR: provides accurate 3D information acquired directly by the sensor.

Comparisons with 2D Object Detection
• Heterogeneous data representations.
• 2D methods detect from the perspective view, whereas 3D methods must consider different views.
• 3D methods have a high demand for accurate localization in 3D space.
[Figure: Bird's Eye View (LiDAR), Point View, Cylindrical View]
Datasets: KITTI
• KITTI: pioneering work in data collection and in annotating 3D objects from the collected data.
• 3D IoU (intersection over union between predicted and ground-truth 3D boxes)
Geiger, A., Lenz, P., Stiller, C., & Urtasun, R. (2013). Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research, 32(11), 1231-1237.
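A detection is usually matched to a ground-truth box by thresholding their 3D IoU. The sketch below computes it for yaw-only boxes given as $(x, y, z, l, w, h, \theta)$ with z at the box center: the bird's-eye-view footprints are intersected with Sutherland-Hodgman polygon clipping, and the overlap area is multiplied by the vertical overlap. This is one standard way to compute rotated-box IoU, not the official KITTI evaluation code, and all function names are hypothetical.

```python
import numpy as np


def _cross(u, v):
    # z-component of the 2D cross product u x v
    return u[0] * v[1] - u[1] * v[0]


def _bev_polygon(box):
    """Counterclockwise bird's-eye-view footprint (4 x 2) of (x, y, z, l, w, h, theta)."""
    x, y, _, l, w, _, theta = box
    dx = np.array([1, -1, -1, 1]) * (l / 2)
    dy = np.array([1, 1, -1, -1]) * (w / 2)
    c, s = np.cos(theta), np.sin(theta)
    return np.stack([c * dx - s * dy + x, s * dx + c * dy + y], axis=1)


def _clip(subject, clipper):
    """Sutherland-Hodgman: intersect a polygon with a convex counterclockwise polygon."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inputs, output = output, []
        if not inputs:
            break
        for j in range(len(inputs)):
            p, q = inputs[j - 1], inputs[j]  # polygon edge p -> q
            p_in = _cross(b - a, p - a) >= 0  # left of a -> b means inside
            q_in = _cross(b - a, q - a) >= 0
            if p_in != q_in:
                # Edge crosses the clip line: add the intersection point.
                t = _cross(b - a, a - p) / _cross(b - a, q - p)
                output.append(p + t * (q - p))
            if q_in:
                output.append(q)
    return output


def _area(poly):
    """Shoelace formula; returns 0 for an empty or degenerate polygon."""
    if len(poly) < 3:
        return 0.0
    pts = np.asarray(poly)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def iou_3d(box_a, box_b):
    """3D IoU of two yaw-rotated boxes (x, y, z, l, w, h, theta), z = box center."""
    inter_bev = _area(_clip(_bev_polygon(box_a), _bev_polygon(box_b)))
    za, ha, zb, hb = box_a[2], box_a[5], box_b[2], box_b[5]
    overlap_z = max(0.0, min(za + ha / 2, zb + hb / 2) - max(za - ha / 2, zb - hb / 2))
    inter = inter_bev * overlap_z
    vol_a = box_a[3] * box_a[4] * ha
    vol_b = box_b[3] * box_b[4] * hb
    return inter / (vol_a + vol_b - inter)
```

For example, two 2 × 2 × 2 boxes offset by 1 m along x share a 1 × 2 × 2 volume, so their IoU is 4 / 12 = 1/3.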
VoxelNet
• Voxel feature encoding (VFE) layer, which enables inter-point interaction within a voxel.
• Stacking multiple VFE layers allows learning complex features.
• VoxelNet divides the point cloud into equally spaced 3D voxels, encodes each voxel via stacked VFE layers, and then uses 3D convolution to further aggregate local voxel features, transforming the point cloud into a high-dimensional volumetric representation from which the detection result is produced.
→ Benefits both from the sparse point structure and from parallel processing on the voxel grid.
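2. Architecture
Feature learning network: Voxel Partition
• Subdivide the 3D space into equally spaced voxels.
• Suppose the point cloud spans a 3D space with range D, H, W along the Z, Y, X axes, respectively (box dimensions h, w, l likewise lie along Z, Y, X).
• Each voxel has size $v_D, v_H, v_W$ = 0.4, 0.2, 0.2 m, and D, H, W are chosen to be multiples of $v_D, v_H, v_W$.
• Example (car detection): $Z \times Y \times X = [-3, 1] \times [-40, 40] \times [0, 70.4]$ m, which gives a voxel grid of size $D' \times H' \times W' = D/v_D \times H/v_H \times W/v_W = 10 \times 400 \times 352$.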
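The voxel partition arithmetic can be checked with a short NumPy sketch. It uses the slide's range and voxel size, stores the bounds in (Z, Y, X) order so that indices come out as (d, h, w), and drops points outside the range. The constant and function names are hypothetical, and the half-open bounds [min, max) are an assumption.

```python
import numpy as np

# Range and voxel size from the slide, stored in (Z, Y, X) order to match D, H, W.
RANGE_MIN = np.array([-3.0, -40.0, 0.0])
RANGE_MAX = np.array([1.0, 40.0, 70.4])
VOXEL_SIZE = np.array([0.4, 0.2, 0.2])  # (v_D, v_H, v_W)


def grid_shape():
    """(D', H', W') = (D / v_D, H / v_H, W / v_W), rounded to absorb float error."""
    return tuple(int(round(n)) for n in (RANGE_MAX - RANGE_MIN) / VOXEL_SIZE)


def voxel_indices(points_xyz):
    """Map (N, 3) points given as (x, y, z) to integer (d, h, w) voxel indices.

    Returns the indices of the in-range points and a boolean mask telling
    which input points were kept; points outside the range are dropped.
    """
    zyx = points_xyz[:, ::-1]  # reorder (x, y, z) -> (z, y, x)
    keep = np.all((zyx >= RANGE_MIN) & (zyx < RANGE_MAX), axis=1)
    idx = np.floor((zyx[keep] - RANGE_MIN) / VOXEL_SIZE).astype(np.int64)
    # Guard against float round-off pushing a point just below the upper bound into an extra cell.
    idx = np.minimum(idx, np.array(grid_shape()) - 1)
    return idx, keep


print(grid_shape())  # (10, 400, 352)
```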
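Feature learning network: Grouping
• The LiDAR point cloud is sparse, and its point density varies highly throughout the space.
• Therefore, after grouping points by the voxel they fall in, a voxel contains a variable number of points.
• Random sampling: a fixed number T of points is randomly sampled from each voxel containing more than T points. This (1) saves computation and (2) decreases the imbalance of points between voxels.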
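A minimal sketch of grouping plus random sampling: points are grouped by their voxel index with `np.unique`, and any voxel holding more than T points keeps a random subset of T. The default T = 35 is an assumption (it is the value we recall from the paper's car setting), and the zero-padded (K, T, C) buffer layout is an illustrative choice, not the authors' implementation.

```python
import numpy as np


def group_and_sample(points, voxel_idx, max_points=35, rng=None):
    """Group points by voxel and keep at most max_points (T) points per voxel.

    points:    (N, C) array, e.g. C = 4 for (x, y, z, r)
    voxel_idx: (N, 3) integer voxel coordinates, one row per point
    Returns (coords, buffer, counts): the K unique voxel coordinates (K, 3),
    a zero-padded (K, T, C) point buffer, and the kept point count per voxel.
    """
    rng = np.random.default_rng() if rng is None else rng
    coords, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep it 1-D across NumPy versions
    buffer = np.zeros((len(coords), max_points, points.shape[1]), dtype=points.dtype)
    counts = np.zeros(len(coords), dtype=np.int64)
    for k in range(len(coords)):
        members = np.flatnonzero(inverse == k)
        if len(members) > max_points:
            # Random sampling: an over-full voxel keeps only T of its points.
            members = rng.choice(members, size=max_points, replace=False)
        counts[k] = len(members)
        buffer[k, : len(members)] = points[members]
    return coords, buffer, counts
```

The per-voxel loop favors readability; a faster version would sort points by voxel index and slice contiguous runs instead.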
Feature learning network: Stacked Voxel Feature Encoding
• Let $V = \{p_i = [x_i, y_i, z_i, r_i]^T \in \mathbb{R}^4\}_{i=1 \ldots t}$ be a non-empty voxel containing $t \le T$ LiDAR points, where $p_i$ contains the XYZ coordinates of the i-th point and $r_i$ is the received reflectance.
• Compute the local mean $(v_x, v_y, v_z)$ as the centroid of all the points in V.
• Augment each point $p_i$ with its offset from the centroid: $V_{in} = \{\hat{p}_i = [x_i, y_i, z_i, r_i, x_i - v_x, y_i - v_y, z_i - v_z]^T \in \mathbb{R}^7\}_{i=1 \ldots t}$.
• Each $\hat{p}_i$ is transformed through a fully connected network (FCN) into a feature space.
Sparse Tensor Representation: the voxel features form a 4D tensor of size $C \times D' \times H' \times W' = 128 \times 10 \times 400 \times 352$.
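The slide lists the point augmentation and the FCN; in the VoxelNet paper each VFE layer also max-pools the point-wise features within the voxel and concatenates that aggregated feature back onto every point, which is what enables inter-point interaction. The NumPy sketch below follows that structure with random, untrained weights: two VFE layers (7 → 32 and 32 → 128 output channels, each FCN producing half of its output width before concatenation), a final FCN plus max-pool giving C = 128 features per voxel, and a scatter into the dense $C \times D' \times H' \times W'$ tensor. Batch normalization is omitted, and all names and weight shapes are assumptions for illustration.

```python
import numpy as np


def augment(buffer, counts):
    """(K, T, 4) padded voxel buffers -> (K, T, 7) inputs plus a (K, T) validity mask.

    Each valid point [x, y, z, r] is extended with its offset from the voxel's
    local mean (v_x, v_y, v_z); padded slots are zeroed out.
    """
    mask = np.arange(buffer.shape[1])[None, :] < counts[:, None]
    pts = buffer * mask[..., None]
    centroid = pts[..., :3].sum(axis=1) / np.maximum(counts, 1)[:, None]
    offsets = (pts[..., :3] - centroid[:, None, :]) * mask[..., None]
    return np.concatenate([pts, offsets], axis=-1), mask


def vfe_layer(x, mask, weight, bias):
    """One VFE layer: point-wise FCN (linear + ReLU, BN omitted), element-wise
    max-pool over the voxel's points, then point-wise concatenation."""
    f = np.maximum(x @ weight + bias, 0.0) * mask[..., None]  # (K, T, m)
    # Features are >= 0 after ReLU, so zeroed padded slots never win the max.
    agg = f.max(axis=1, keepdims=True)  # (K, 1, m) locally aggregated feature
    out = np.concatenate([f, np.broadcast_to(agg, f.shape)], axis=-1)
    return out * mask[..., None]  # (K, T, 2m)


def voxel_features(buffer, counts, params):
    """Stacked VFE layers followed by a final FCN and max-pool -> (K, C)."""
    x, mask = augment(buffer, counts)
    for weight, bias in params[:-1]:
        x = vfe_layer(x, mask, weight, bias)
    weight, bias = params[-1]
    f = np.maximum(x @ weight + bias, 0.0) * mask[..., None]
    return f.max(axis=1)


def to_dense(features, coords, grid):
    """Scatter (K, C) voxel features into a dense C x D' x H' x W' tensor."""
    dense = np.zeros((features.shape[1],) + tuple(grid), dtype=features.dtype)
    dense[:, coords[:, 0], coords[:, 1], coords[:, 2]] = features.T
    return dense


# Hypothetical random weights: VFE-1 (7 -> 16, concat to 32), VFE-2 (32 -> 64, concat to 128),
# then a final FCN to C = 128.
rng = np.random.default_rng(0)
shapes = [(7, 16), (32, 64), (128, 128)]
params = [(rng.normal(0.0, 0.1, s), np.zeros(s[1])) for s in shapes]
```

Only non-empty voxels are ever processed, and the dense scatter is kept separate, which mirrors the slide's point that the representation is sparse.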
Convolutional Middle Layers
• ConvMD($c_{in}, c_{out}, k, s, p$) denotes an M-dimensional convolution operator, where $c_{in}$ and $c_{out}$ are the numbers of input and output channels, and k, s, and p are the kernel size, stride size, and padding size.
• Output: a 4D tensor of size $C \times D' \times H' \times W' = 64 \times 2 \times 400 \times 352$.
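The tensor sizes on this slide follow from the standard convolution output-size formula. The sketch below assumes the three Conv3D layers we read from the paper's car configuration, Conv3D(128, 64, 3, (2,1,1), (1,1,1)), Conv3D(64, 64, 3, (1,1,1), (0,1,1)), and Conv3D(64, 64, 3, (2,1,1), (1,1,1)), and reproduces 64 × 2 × 400 × 352; treat the layer list as an assumption.

```python
def conv_out(n, k, s, p):
    """Output length of a convolution along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


# Conv3D(c_in, c_out, k, stride, padding), strides/paddings in (D, H, W) order.
# Assumed configuration for car detection, as we read it from the paper.
MIDDLE_LAYERS = [
    (128, 64, 3, (2, 1, 1), (1, 1, 1)),
    (64, 64, 3, (1, 1, 1), (0, 1, 1)),
    (64, 64, 3, (2, 1, 1), (1, 1, 1)),
]


def middle_output_shape(c, d, h, w, layers=MIDDLE_LAYERS):
    """Track the C x D' x H' x W' tensor shape through a stack of Conv3D layers."""
    dims = (d, h, w)
    for c_in, c_out, k, stride, pad in layers:
        if c != c_in:
            raise ValueError(f"expected {c_in} input channels, got {c}")
        dims = tuple(conv_out(n, k, s, p) for n, s, p in zip(dims, stride, pad))
        c = c_out
    return (c,) + dims


out = middle_output_shape(128, 10, 400, 352)
print(out)  # (64, 2, 400, 352)
# Merging C and D' gives the RPN input C x H' x W'.
print((out[0] * out[1], out[2], out[3]))  # (128, 400, 352)
```

Only the depth axis is strided, so D' shrinks from 10 to 2 while H' and W' are preserved for the bird's-eye-view RPN.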
Region Proposal Network
• Input: a 3D tensor of size $C \times H' \times W' = 128 \times 400 \times 352$.
• The network has three blocks of fully convolutional layers. The first layer of each block downsamples the feature map by half via a convolution with a stride size of 2, followed by a sequence of convolutions with stride 1; each convolution is followed by BN and ReLU.
• The output of every block is upsampled to a fixed size and concatenated to construct a high-resolution feature map.
• The feature map is mapped to (1) a probability score map and (2) a regression map.
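A sketch of the RPN's shape bookkeeping, starting from the 128 × 400 × 352 input. It assumes each block's first convolution has kernel 3, stride 2, and padding 1, that every block output is upsampled to the first block's resolution, and that there are two anchors per location (the paper uses two anchor orientations for cars, as we recall), each with one score and seven regression targets matching $u^* \in \mathbb{R}^7$. The function name is hypothetical.

```python
def rpn_shapes(h, w, num_blocks=3, anchors_per_location=2, box_params=7):
    """Spatial sizes through the RPN blocks and the channel counts of its two heads.

    Assumptions (illustrative): each block's first convolution uses kernel 3,
    stride 2, padding 1; every block output is upsampled to the first block's
    resolution; each anchor has one score and box_params regression targets.
    """
    sizes = []
    for _ in range(num_blocks):
        # Stride-2 convolution: floor((n + 2*1 - 3) / 2) + 1 halves an even axis.
        h, w = (h + 2 - 3) // 2 + 1, (w + 2 - 3) // 2 + 1
        sizes.append((h, w))
    upsampled = sizes[0]
    score_channels = anchors_per_location  # probability score map
    regression_channels = anchors_per_location * box_params  # regression map
    return sizes, upsampled, score_channels, regression_channels


print(rpn_shapes(400, 352))
# ([(200, 176), (100, 88), (50, 44)], (200, 176), 2, 14)
```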
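Loss Function
• Let $\{a_i^{pos}\}_{i=1 \ldots N_{pos}}$ be the set of $N_{pos}$ positive anchors and $\{a_j^{neg}\}_{j=1 \ldots N_{neg}}$ be the set of $N_{neg}$ negative anchors.
• A 3D ground-truth box is parameterized as $(x_c^g, y_c^g, z_c^g, l^g, w^g, h^g, \theta^g)$, where $x_c^g, y_c^g, z_c^g$ represent the center location, $l^g, w^g, h^g$ are the length, width, and height of the box, and $\theta^g$ is the yaw rotation around the Z-axis.
• To retrieve the ground-truth box from a matching positive anchor parameterized as $(x_c^a, y_c^a, z_c^a, l^a, w^a, h^a, \theta^a)$, the network regresses the residual vector $u^* \in \mathbb{R}^7 = (\Delta x, \Delta y, \Delta z, \Delta l, \Delta w, \Delta h, \Delta\theta)$.
• $L = \alpha \frac{1}{N_{pos}} \sum_i L_{cls}(p_i^{pos}, 1) + \beta \frac{1}{N_{neg}} \sum_j L_{cls}(p_j^{neg}, 0) + \frac{1}{N_{pos}} \sum_i L_{reg}(u_i, u_i^*)$, where $p_i^{pos}$ and $p_j^{neg}$ are the predicted probabilities for positive and negative anchors and $u_i$ is the regression output for positive anchor $a_i^{pos}$.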
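The slide defines $u^*$ only by its components. A minimal NumPy sketch of the target encoding and the loss follows, under assumptions taken from our reading of the VoxelNet paper: $\Delta x = (x_c^g - x_c^a)/d^a$ and $\Delta y = (y_c^g - y_c^a)/d^a$ with $d^a = \sqrt{(l^a)^2 + (w^a)^2}$, $\Delta z = (z_c^g - z_c^a)/h^a$, $\Delta l = \log(l^g/l^a)$ (likewise for w and h), and $\Delta\theta = \theta^g - \theta^a$; $L_{cls}$ is binary cross-entropy, $L_{reg}$ is SmoothL1, and α = 1.5, β = 1. Function names are hypothetical.

```python
import numpy as np


def encode(gt, anchor):
    """Residual u* (shape 7) from an anchor to its matched ground-truth box.

    Boxes are (x_c, y_c, z_c, l, w, h, theta). Center offsets in x, y are
    normalized by the anchor's BEV diagonal d_a, z by the anchor height,
    sizes use log ratios, and yaw is a plain difference.
    """
    xg, yg, zg, lg, wg, hg, tg = gt
    xa, ya, za, la, wa, ha, ta = anchor
    da = np.hypot(la, wa)  # sqrt(l_a^2 + w_a^2)
    return np.array([
        (xg - xa) / da, (yg - ya) / da, (zg - za) / ha,
        np.log(lg / la), np.log(wg / wa), np.log(hg / ha),
        tg - ta,
    ])


def decode(u, anchor):
    """Inverse of encode: recover a box from a residual and its anchor."""
    xa, ya, za, la, wa, ha, ta = anchor
    da = np.hypot(la, wa)
    return np.array([
        u[0] * da + xa, u[1] * da + ya, u[2] * ha + za,
        la * np.exp(u[3]), wa * np.exp(u[4]), ha * np.exp(u[5]),
        u[6] + ta,
    ])


def bce(p, target, eps=1e-7):
    """Binary cross-entropy for predicted probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))


def smooth_l1(x):
    """Element-wise SmoothL1: 0.5 x^2 if |x| < 1, else |x| - 0.5."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)


def detection_loss(p_pos, p_neg, u, u_star, alpha=1.5, beta=1.0):
    """L = alpha/N_pos * sum L_cls(p_pos, 1) + beta/N_neg * sum L_cls(p_neg, 0)
         + 1/N_pos * sum L_reg(u, u*)."""
    n_pos, n_neg = len(p_pos), len(p_neg)
    cls_pos = alpha * bce(np.asarray(p_pos), 1.0).sum() / n_pos
    cls_neg = beta * bce(np.asarray(p_neg), 0.0).sum() / n_neg
    reg = smooth_l1(np.asarray(u) - np.asarray(u_star)).sum() / n_pos
    return cls_pos + cls_neg + reg
```

Normalizing each classification term by its own anchor count keeps the many easy negatives from dominating, while α and β rebalance the two terms.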
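3. Experiments
Evaluation in 3D (3D detection average precision in %, KITTI validation set)
| Method | Modality | Car Easy | Car Moderate | Car Hard | Pedestrian Easy | Pedestrian Moderate | Pedestrian Hard | Cyclist Easy | Cyclist Moderate | Cyclist Hard |
|---|---|---|---|---|---|---|---|---|---|---|
| Mono3D | Mono | 2.53 | 2.31 | 2.31 | N/A | N/A | N/A | N/A | N/A | N/A |
| 3DOP | Stereo | 6.55 | 5.07 | 4.10 | N/A | N/A | N/A | N/A | N/A | N/A |
| VeloFCN | LiDAR | 15.20 | 13.66 | 15.98 | N/A | N/A | N/A | N/A | N/A | N/A |
| MV (BV+FV) | LiDAR | 71.19 | 56.60 | 55.30 | N/A | N/A | N/A | N/A | N/A | N/A |
| MV (BV+FV+RGB) | LiDAR+Mono | 71.29 | 62.68 | 56.56 | N/A | N/A | N/A | N/A | N/A | N/A |
| HC-baseline | LiDAR | 71.73 | 59.75 | 55.69 | 43.95 | 40.18 | 37.48 | 55.35 | 36.07 | 34.15 |
| VoxelNet | LiDAR | 81.97 | 65.46 | 62.85 | 57.86 | 53.42 | 48.87 | 67.17 | 47.65 | 45.11 |
[Figure: Evaluation in 3D]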
Conclusion
• Removes the bottleneck of manual feature engineering by proposing VoxelNet.
• Operates directly on sparse 3D points and captures 3D shape information effectively.
• Provides an efficient implementation of VoxelNet that benefits from point cloud sparsity and parallel processing on a voxel grid.
• Shows that VoxelNet outperforms state-of-the-art LiDAR-based 3D detection methods by a large margin.
• Provides a better 3D representation.
Future work: extending VoxelNet to joint LiDAR and image-based end-to-end 3D detection to further improve detection and localization accuracy.