The document outlines research on using LiDAR data for autonomous vehicle object detection. It begins with an introduction to sensor fusion techniques using LiDAR and camera data. Several deep learning approaches for 3D object detection from LiDAR point clouds are then summarized, including methods that project the point cloud into 2D feature maps or 3D voxel grids as input to convolutional networks. Finally, techniques for exploiting HD maps and performing real-time on-device detection are discussed. The document provides an overview of the state-of-the-art in LiDAR-based object detection for autonomous driving applications.
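As a concrete illustration of the voxel-grid input mentioned above, here is a minimal sketch (my own, not taken from any of the summarized papers) of quantizing a LiDAR point cloud into a binary occupancy grid; the grid bounds and voxel size are arbitrary example values:

```python
import numpy as np

def voxelize(points, voxel_size=0.2, grid_min=(-40.0, -40.0, -3.0), grid_shape=(400, 400, 20)):
    """Quantize an (N, 3) point cloud into a binary occupancy voxel grid."""
    grid = np.zeros(grid_shape, dtype=np.uint8)
    idx = np.floor((points - np.asarray(grid_min)) / voxel_size).astype(int)
    # Discard points that fall outside the grid bounds.
    valid = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    idx = idx[valid]
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

cloud = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [10.0, 5.0, 0.5]])
grid = voxelize(cloud)
print(grid.sum())  # the first two points share a voxel, so 2 voxels are occupied
```

Real detectors additionally store per-voxel features (point count, mean intensity, learned encodings) rather than a plain 0/1 occupancy flag.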
Scenario-Based Development & Testing for Autonomous Driving, by Yu Huang
Formal Scenario-Based Testing of Autonomous Vehicles: From Simulation to the Real World, 2020
A Scenario-Based Development Framework for Autonomous Driving, 2020
A Customizable Dynamic Scenario Modeling and Data Generation Platform for Autonomous Driving, 2020
Large Scale Autonomous Driving Scenarios Clustering with Self-supervised Feature Extraction, 2021
Generating and Characterizing Scenarios for Safety Testing of Autonomous Vehicles, 2021
Systems Approach to Creating Test Scenarios for Automated Driving Systems, Reliability Engineering and System Safety (215), 2021
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
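To make the convolutional building block concrete, here is a minimal sketch (not course material) of the 2D cross-correlation a CNN layer computes, applied with a hand-crafted vertical-edge kernel:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A [-1, 1] kernel responds where pixel intensity changes from left to right.
img = np.zeros((4, 4)); img[:, 2:] = 1.0
edge = conv2d(img, np.array([[-1.0, 1.0]]))
print(edge)
```

In a trained network the kernel values are learned from data instead of hand-crafted, and many such filters are stacked per layer.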
Camera-Based Lane Detection by Deep Learning, by Yu Huang
lane detection, deep learning, autonomous driving, CNN, RNN, LSTM, GRU, lane localization, lane fitting, ego lane, end-to-end, vanishing point, segmentation, FCN, regression, classification
Camera-Based Road Lane Detection by Deep Learning II, by Yu Huang
lane detection, deep learning, autonomous driving, CNN, RNN, LSTM, GRU, lane localization, lane fitting, ego lane, end-to-end, vanishing point, segmentation, FCN, regression, classification
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/10/introduction-to-simultaneous-localization-and-mapping-slam-a-presentation-from-gareth-cross/
Independent game developer (and former technical lead of state estimation at Skydio) Gareth Cross presents the “Introduction to Simultaneous Localization and Mapping (SLAM)” tutorial at the May 2021 Embedded Vision Summit.
This talk provides an introduction to the fundamentals of simultaneous localization and mapping (SLAM). Cross aims to provide foundational knowledge, and viewers are not expected to have any prerequisite experience in the field.
The talk consists of an introduction to the concept of SLAM, as well as practical design considerations in formulating SLAM problems. Visual inertial odometry is introduced as a motivating example of SLAM, and Cross explains how this problem is structured and solved.
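As a toy illustration of how such problems are structured (my own sketch, not code from the talk), a 1D pose estimation with odometry and a loop-closure-like measurement can be posed as linear least squares:

```python
import numpy as np

# Toy 1D "SLAM" as linear least squares: solve for poses x0..x2 given a prior
# on x0, odometry between consecutive poses, and a direct measurement of x2.
# All measurement values below are made up for illustration.
# Rows of A encode:  x0 = 0;  x1 - x0 = 1.1;  x2 - x1 = 0.9;  x2 = 2.0
A = np.array([[ 1.0,  0.0, 0.0],
              [-1.0,  1.0, 0.0],
              [ 0.0, -1.0, 1.0],
              [ 0.0,  0.0, 1.0]])
b = np.array([0.0, 1.1, 0.9, 2.0])
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(x, 3))  # slightly inconsistent measurements reconciled by least squares
```

Real SLAM solvers handle nonlinear measurement models (rotations, camera projections) by repeatedly linearizing, but each iteration has exactly this sparse least-squares shape.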
Google Self Driving Cars
The Google Self-Driving Car is a project by Google that involves developing technology for autonomous cars. The software powering Google's cars is called Google Chauffeur. Lettering on the side of each car identifies it as a "self-driving car". The project is currently being led by Google engineer Sebastian Thrun, former director of the Stanford Artificial Intelligence Laboratory and co-inventor of Google Street View. Thrun's team at Stanford created the robotic vehicle Stanley which won the 2005 DARPA Grand Challenge and its US$2 million prize from the United States Department of Defense. The team developing the system consisted of 15 engineers working for Google, including Chris Urmson, Mike Montemerlo, and Anthony Levandowski who had worked on the DARPA Grand and Urban Challenges.
Legislation has been passed in four states and the District of Columbia allowing driverless cars. The U.S. state of Nevada passed a law on June 29, 2011, permitting the operation of autonomous cars in Nevada, after Google had been lobbying in that state for robotic car laws. The Nevada law went into effect on March 1, 2012, and the Nevada Department of Motor Vehicles issued the first license for an autonomous car in May 2012, to a Toyota Prius modified with Google's experimental driverless technology. In April 2012, Florida became the second state to allow the testing of autonomous cars on public roads, and California became the third when Governor Jerry Brown signed the bill into law at Google HQ in Mountain View. In July 2014, the city of Coeur d'Alene, Idaho adopted a robotics ordinance that includes provisions to allow for self-driving cars.
Videos
https://www.youtube.com/channel/UCCLyNDhxwpqNe3UeEmGHl8g
Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos.
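Detections are usually scored against ground truth with intersection over union (IoU); a minimal sketch for axis-aligned boxes:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping in a 1x2 strip: intersection 2, union 6.
print(iou((0, 0, 2, 2), (1, 0, 3, 2)))
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.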
A small helping hand from me to my engineering colleagues and other friends in need of object detection.
Self-Driving Cars With Convolutional Neural Networks (CNN), by ssuserf79e761
Self-driving cars aim to revolutionize car travel by making it safe and efficient. In this article, we outlined some of the key components such as LiDAR, RADAR, cameras, and most importantly – the algorithms that make self-driving cars possible.
A few things still need to be taken care of:
Current algorithms are not yet robust enough to perceive roads and lanes, because some roads lack markings and other signs.
Sensing modalities for localization, mapping, and perception still lack accuracy and efficiency.
Vehicle-to-vehicle communication is still a dream, but work is being done in this area as well.
The field of human-machine interaction is not explored enough, with many open, unsolved problems.
Q-learning is one of the most commonly used DRL algorithms for self-driving cars. It falls under the category of model-free learning: the agent approximates the optimal action-value function (the Q-value) directly, without building a model of the environment. The policy still determines which state-action pairs are visited and updated. The goal is to find the optimal policy by interacting with the environment, correcting the Q-value estimates whenever the agent makes an error.
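The update rule can be sketched on a toy problem (a made-up 1D corridor, not a driving environment):

```python
import random

# Tabular Q-learning on a made-up 1D corridor: states 0..3, actions -1/+1,
# reward 1 for reaching state 3 -- a toy stand-in for a driving MDP.
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2
Q = {(s, a): 0.0 for s in range(4) for a in (-1, +1)}

def step(s, a):
    s2 = min(3, max(0, s + a))
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

random.seed(0)
for _ in range(200):                                   # episodes
    s = 0
    while True:
        # epsilon-greedy: mostly exploit current estimates, sometimes explore
        if random.random() < EPS:
            a = random.choice((-1, +1))
        else:
            a = max((-1, +1), key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[(s2, -1)], Q[(s2, +1)])
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])      # the Q-learning update
        s = s2
        if done:
            break

policy = [max((-1, +1), key=lambda act: Q[(s, act)]) for s in range(3)]
print(policy)  # greedy policy; should be +1 (move right) in every state
```

In deep Q-learning the table is replaced by a neural network, but the target term `r + GAMMA * max(...)` and the error-driven update are the same.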
Computer vision has received great attention over the last two decades.
This research field is important not only for security-related software, but also for advanced interfaces between people and computers, advanced control methods, and many other areas.
RegNet: Multimodal Sensor Registration Using Deep Neural Networks
CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks
RGGNet: Tolerance Aware LiDAR-Camera Online Calibration with Geometric Deep Learning and Generative Model
CalibRCNN: Calibrating Camera and LiDAR by Recurrent Convolutional Neural Network and Geometric Constraints
LCCNet: LiDAR and Camera Self-Calibration using Cost Volume Network
CFNet: LiDAR-Camera Registration Using Calibration Flow Network
3D perception is crucial for understanding the real world. It offers many benefits and new capabilities over 2D across diverse applications, from XR and autonomous driving to IoT, camera, and mobile. 3D perception with machine learning is setting the new state of the art (SOTA) in areas such as depth estimation, object detection, and neural scene representation. Making these SOTA neural networks feasible for real-world deployment on mobile devices constrained by power, thermal, and performance budgets has been a challenge. Qualcomm AI Research has developed not only novel AI techniques for 3D perception but also full-stack AI optimizations to enable real-world deployments and energy-efficient solutions. This presentation explores the latest research that is enabling efficient 3D perception while maintaining neural network model accuracy. You’ll learn about:
- The advantages of 3D perception over 2D and the need for 3D perception across applications
- Advancements in 3D perception research by Qualcomm AI Research
- Our future 3D perception research directions
The 2016 Remote Sensing Field camp will take the form of two projects.
A low-tech, low-cost aerial photography project using visible-spectrum cameras mounted on UAVs or ultralight aircraft as the sensor, to demonstrate that relatively low-tech, low-cost solutions can achieve surprisingly good results when compared to more commercial systems.
A higher-tech, higher-cost terrestrial LiDAR collect of a building or structure of historical or architectural significance.
The scope of a project will influence all other aspects of the project, including its cost, timing, quality and risk.
Traffic Light Detection and Recognition for Self Driving Cars using Deep Learning, by ijtsrd
Self-driving cars have the potential to revolutionize urban mobility by providing sustainable, safe, convenient, and congestion-free transport. Autonomous driving vehicles have become a trend in the vehicle industry. Many driver assistance systems (DAS) have been presented to support these automatic cars. This vehicle autonomy, as an application of AI, has several challenges, such as infallibly recognizing traffic lights, signs, unclear lane markings, pedestrians, etc. These problems can be overcome by using technological developments in the fields of deep learning and computer vision, thanks to the availability of graphical processing units (GPUs) and cloud platforms. Using deep learning, a deep neural network based model is proposed for reliable detection and recognition of traffic lights (TL). Aswathy Madhu | Sruthy S, "Traffic Light Detection and Recognition for Self Driving Cars using Deep Learning: Survey", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4, Issue-2, February 2020.
URL: https://www.ijtsrd.com/papers/ijtsrd30030.pdf
Paper Url : https://www.ijtsrd.com/engineering/computer-engineering/30030/traffic-light-detection-and-recognition-for-self-driving-cars-using-deep-learning-survey/aswathy-madhu
Lane and Object Detection for Autonomous Vehicle using Advanced Computer Vision, by YogeshIJTSRD
The vision of this project is to develop lane and object detection for an autonomous vehicle system that runs efficiently under normal road conditions, eliminating the use of a high-cost LiDAR system by pairing high-resolution cameras with advanced computer vision and deep learning to provide an Advanced Driver Assistance System (ADAS). Detecting lane lines is a crucial task for any self-driving autonomous vehicle, so this project focuses on identifying lane lines on the road using OpenCV. OpenCV tools such as color selection, region-of-interest selection, grayscaling, Canny edge detection, and perspective transformation are employed. The project is modelled as an integration of two systems to solve the real-time implementation problem in autonomous vehicles. The first part is lane detection by advanced computer vision techniques, detecting the lane lines so the vehicle can be commanded to stay inside the lane markings. The second part is object detection and tracking: detecting and tracking vehicles and pedestrians on the road to build a clear understanding of the environment, so that a trajectory can be planned and generated to navigate the autonomous vehicle safely to its destination without any crashes. This is done by a deep learning method, transfer learning with the Single Shot multibox Detection (SSD) algorithm and the MobileNet architecture. G. Monika | S. Bhavani | L. Azim Jahan Siana | N. Meenakshi, "Lane and Object Detection for Autonomous Vehicle using Advanced Computer Vision", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5, Issue-3, April 2021. URL: https://www.ijtsrd.com/papers/ijtsrd39952.pdf Paper URL: https://www.ijtsrd.com/engineering/electronics-and-communication-engineering/39952/lane-and-object-detection-for-autonomous-vehicle-using-advanced-computer-vision/g-monika
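The grayscale, gradient, and region-of-interest steps described above can be sketched without OpenCV (a simplified stand-in for cv2.Canny and the ROI masking step; the threshold and ROI fraction are made-up example values):

```python
import numpy as np

def lane_edge_mask(gray, roi_top=0.5, thresh=0.5):
    """Toy lane-detection front end: horizontal intensity gradient (bright lane
    paint vs. dark asphalt), thresholded into a binary edge map, then masked so
    only the lower part of the frame (where the road lies) is kept."""
    grad = np.abs(np.diff(gray, axis=1))           # crude vertical-edge response
    edges = (grad > thresh).astype(np.uint8)
    edges[: int(edges.shape[0] * roi_top), :] = 0  # ignore sky / horizon region
    return edges

frame = np.zeros((8, 8)); frame[:, 4] = 1.0        # one bright "lane line" column
mask = lane_edge_mask(frame)
print(mask.sum())
```

A real pipeline would follow this with a perspective transform to a bird's-eye view and a curve fit through the surviving edge pixels.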
Object gripping algorithm for robotic assistance by means of deep learning, by IJECEIAES
This paper presents the use of recent state-of-the-art deep learning techniques, little addressed in robotic applications, in a new algorithm based on Faster R-CNN and CNN regression. Machine vision systems tend to require multiple stages to locate an object and allow a robot to take it, increasing the noise in the system and the processing times. Region-based convolutional networks solve this problem; two convolutional architectures are used, one for classification and localization of three types of objects, and one to determine the grip angle for a robotic gripper. In the established virtual environment, the grip algorithm runs at up to 5 frames per second with 100% object classification, and with the implementation of the Faster R-CNN it obtains 100% accuracy on the classifications of the test database and over 97% average precision locating the generated boxes for each element, successfully gripping the objects.
Goal location prediction based on deep learning using RGB-D camera, by journalBEEI
In a navigation system, the desired destination position plays an essential role, since path planning algorithms take the current location, the goal location, and a map of the surrounding environment as inputs. The path generated by the path planning algorithm is used to guide a user to his final destination. This paper presents a proposed algorithm based on an RGB-D camera to predict the goal coordinates in a 2D occupancy grid map for a navigation system for visually impaired people. In recent years, deep learning methods have been used in many object detection tasks, so an object detection method based on a convolutional neural network is adopted in the proposed algorithm. The distance between the current position of the sensor and the detected object is measured from the depth data acquired by the RGB-D camera. The detected object coordinates and the depth data are integrated to get an accurate goal location in the 2D map. The proposed algorithm has been tested in various real-time scenarios, and the experimental results indicate its effectiveness.
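The pixel-plus-depth to map-coordinate integration described above can be sketched with a pinhole camera model (my own simplification, not the paper's code; fx, cx, and the pose values are made-up example numbers):

```python
import math

def pixel_to_map(u, depth, fx, cx, robot_xy, robot_yaw):
    """Project a detected object's pixel column u and its depth reading into 2D
    map coordinates, using a pinhole model and the robot's current pose."""
    lateral = (u - cx) * depth / fx   # offset left/right of the optical axis
    # Object in the robot frame: forward = depth, sideways = lateral; then
    # rotate by the robot's yaw and translate by its map position.
    x = robot_xy[0] + depth * math.cos(robot_yaw) - lateral * math.sin(robot_yaw)
    y = robot_xy[1] + depth * math.sin(robot_yaw) + lateral * math.cos(robot_yaw)
    return x, y

# Object seen dead-center (u == cx) 2 m ahead of a robot at the origin facing +x.
print(pixel_to_map(u=320, depth=2.0, fx=500.0, cx=320.0, robot_xy=(0.0, 0.0), robot_yaw=0.0))
```

The resulting (x, y) would then be snapped to the nearest cell of the 2D occupancy grid to serve as the goal for the path planner.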
Similar to Lidar for Autonomous Driving II (via Deep Learning)
Application of Foundation Model for Autonomous Driving, by Yu Huang
Since DARPA’s Grand Challenges (rural) in 2004/05 and Urban Challenge in 2007, autonomous driving has been the most active field of AI applications. Recently, powered by large language models (LLMs), chat systems such as ChatGPT and PaLM have emerged and rapidly become a promising direction toward artificial general intelligence (AGI) in natural language processing (NLP). It is natural to ask whether these abilities could be employed to reformulate autonomous driving. By combining LLMs with foundation models, it is possible to utilize human knowledge, commonsense, and reasoning to rebuild autonomous driving systems out of the current long-tailed AI dilemma. In this paper, we investigate the techniques of foundation models and LLMs applied to autonomous driving, categorized as simulation, world models, data annotation, and planning or E2E solutions, etc.
Fisheye based Perception for Autonomous Driving VI, by Yu Huang
Disentangling and Vectorization: A 3D Visual Perception Approach for Autonomous Driving Based on Surround-View Fisheye Cameras
SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround View Fisheye Cameras
FisheyeDistanceNet++: Self-Supervised Fisheye Distance Estimation with Self-Attention, Robust Loss Function and Camera View Generalization
An Online Learning System for Wireless Charging Alignment using Surround-view Fisheye Cameras
RoadEdgeNet: Road Edge Detection System Using Surround View Camera Images
Fisheye/Omnidirectional View in Autonomous Driving V, by Yu Huang
• Road-line detection and 3D reconstruction using fisheye cameras
• Vehicle Re-ID for Surround-view Camera System
• SynDistNet: Self-Supervised Monocular Fisheye Camera Distance Estimation Synergized with Semantic Segmentation for Autonomous Driving
• Universal Semantic Segmentation for Fisheye Urban Driving Images
• UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models
• OmniDet: Surround View Cameras based Multi-task Visual Perception Network for Autonomous Driving
• Adversarial Attacks on Multi-task Visual Perception for Autonomous Driving
Fisheye/Omnidirectional View in Autonomous Driving IV, by Yu Huang
• FisheyeMultiNet: Real-time Multi-task Learning Architecture for Surround-view Automated Parking System
• Generalized Object Detection on Fisheye Cameras for Autonomous Driving: Dataset, Representations and Baseline
• SynWoodScape: Synthetic Surround-view Fisheye Camera Dataset for Autonomous Driving
• Feasible Self-Calibration of Larger Field-of-View (FOV) Camera Sensors for the ADAS
Autonomous driving for robotaxis, covering perception, prediction, planning, decision making, and control, as well as simulation, visualization, and the data closed loop.
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2), by Yu Huang
Canadian Adverse Driving Conditions Dataset, 2020
Deep multimodal sensor fusion in unseen adverse weather, 2020
RADIATE: A Radar Dataset for Automotive Perception in Bad Weather, 2021
Lidar Light Scattering Augmentation (LISA): Physics-based Simulation of Adverse Weather Conditions for 3D Object Detection, 2021
Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather, 2021
DSOR: A Scalable Statistical Filter for Removing Falling Snow from LiDAR Point Clouds in Severe Winter Weather, 2021
How to Build a Data Closed-loop Platform for Autonomous Driving? by Yu Huang
Introduction;
Data-driven models for autonomous driving;
Cloud computing infrastructure and big data processing;
Annotation tools for training data;
Large-scale model training platform;
Model testing and verification;
Related machine learning techniques;
Conclusion.
Simulation for Autonomous Driving at Uber ATG, by Yu Huang
Testing Safety of SDVs by Simulating Perception and Prediction
LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World
Recovering and Simulating Pedestrians in the Wild
S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling
SceneGen: Learning to Generate Realistic Traffic Scenes
TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors
GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving
AdvSim: Generating Safety-Critical Scenarios for Self-Driving Vehicles
Appendix: (Waymo)
SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving
Prediction and Planning for Self-Driving at Waymo, by Yu Huang
ChauffeurNet: Learning To Drive By Imitating The Best And Synthesizing The Worst
Multipath: Multiple Probabilistic Anchor Trajectory Hypotheses For Behavior Prediction
VectorNet: Encoding HD Maps And Agent Dynamics From Vectorized Representation
TNT: Target-driven Trajectory Prediction
Large Scale Interactive Motion Forecasting For Autonomous Driving : The Waymo Open Motion Dataset
Identifying Driver Interactions Via Conditional Behavior Prediction
Peeking Into The Future: Predicting Future Person Activities And Locations In Videos
STINet: Spatio-temporal-interactive Network For Pedestrian Detection And Trajectory Prediction
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Lidar for Autonomous Driving II (via Deep Learning)
1. LiDAR for Autonomous Vehicles II
(via Deep Learning)
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
2. Outline
Online Camera LiDAR Fusion and Object Detection on
Hybrid Data for Autonomous Driving
RegNet: Multimodal Sensor Registration Using Deep
Neural Networks
Vehicle Detection from 3D Lidar Using FCN
VoxelNet: End-to-End Learning for Point Cloud Based
3D Object Detection
Object Detection and Classification in Occupancy Grid
Maps using Deep Convolutional Networks
RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point
Cloud for Autonomous Driving
BirdNet: a 3D Object Detection Framework from LiDAR
information
LMNet: Real-time Multiclass Object Detection on CPU
using 3D LiDAR
HDNET: Exploit HD Maps for 3D Object Detection
IPOD: Intensive Point-based Object Detector for Point
Cloud
PIXOR: Real-time 3D Object Detection from Point
Clouds
DepthCN: Vehicle Detection Using 3D-LIDAR and
ConvNet
SECOND: Sparsely Embedded Convolutional Detection
YOLO3D: E2E RT 3D Oriented Object Bounding Box
Detection from LiDAR Point Cloud
YOLO4D: A ST Approach for RT Multi-object Detection
and Classification from LiDAR Point Clouds
Deconvolutional Networks for Point-Cloud Vehicle
Detection and Tracking in Driving Scenarios
Fast and Furious: Real Time E2E 3D Detection,
Tracking and Motion Forecasting with a Single
Convolutional Net
…To be continued
3. Outline
SqueezeSeg: Convolutional Neural Nets with
Recurrent CRF for Real-Time Road-Object
Segmentation from 3D LiDAR Point Cloud
SEGCloud: Semantic Segmentation of 3D Point
Clouds
Multi-View 3D Object Detection Network for
Autonomous Driving
A General Pipeline for 3D Detection of Vehicles
Combining LiDAR Space Clustering and Convolutional
Neural Networks for Pedestrian Detection
Pseudo-LiDAR from Visual Depth Estimation: Bridging
the Gap in 3D Object Detection for Autonomous
Driving
PointNet: Deep Learning on Point Sets for 3D
Classification and Segmentation
PointNet++: Deep Hierarchical Feature Learning on
Point Sets in a Metric Space
PointFusion: Deep Sensor Fusion for 3D Bounding
Box Estimation
Frustum PointNets for 3D Object Detection from RGB-
D Data
RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
Joint 3D Proposal Generation and Object Detection
from View Aggregation
SPLATNet: Sparse Lattice Networks for Point Cloud
Processing
PointRCNN: 3D Object Proposal Generation and
Detection from Point Cloud
Deep Continuous Fusion for Multi-Sensor 3D Object
Detection
End-to-end Learning of Multi-sensor 3D Tracking by
Detection
4. Online Camera LiDAR Fusion and Object Detection on
Hybrid Data for Autonomous Driving
Non-calibrated sensors result in artifacts and aberration in the environment model, which
makes tasks like free-space detection more challenging.
The goal is to improve the LiDAR and camera fusion approach of Levinson and Thrun.
Rely on intensity discontinuities and erosion and dilation of the edge image for increased
robustness against shadows and visual patterns, which is a recurring problem in point cloud
related work.
Use a gradient free optimizer instead of an exhaustive grid search to find the extrinsic
calibration.
The fusion pipeline is lightweight and able to run in real-time on a computer in the car.
For the detection task, modify the Faster R-CNN architecture to accommodate hybrid LiDAR-
camera data for improved object detection and classification.
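A minimal sketch of the gradient-free search over the 6-DOF extrinsic parameters. The objective here (`alignment_score`, `TRUE_PARAMS`) is a toy stand-in for the edge-alignment score, and the accept-if-better random search is only one possible gradient-free optimizer, not the authors' implementation:

```python
import numpy as np

# Toy stand-in for the edge-alignment objective: higher is better,
# peaks at TRUE_PARAMS (3 rotations, 3 translations). Illustrative only.
TRUE_PARAMS = np.array([0.02, -0.01, 0.005, 0.1, -0.05, 0.2])

def alignment_score(params):
    return -np.sum((params - TRUE_PARAMS) ** 2)

def random_search_calibration(score_fn, x0, iters=2000, sigma=0.05, seed=0):
    """Gradient-free optimizer: keep a random perturbation whenever it
    improves the score, shrinking the step size over time."""
    rng = np.random.default_rng(seed)
    best, best_score = x0.copy(), score_fn(x0)
    for i in range(iters):
        step = sigma * (0.999 ** i)
        cand = best + rng.normal(0.0, step, size=best.shape)
        s = score_fn(cand)
        if s > best_score:
            best, best_score = cand, s
    return best

estimate = random_search_calibration(alignment_score, np.zeros(6))
```

A grid search over 6 parameters is exponential in resolution; a stochastic local search like this evaluates the score only where it is improving, which is what makes real-time recalibration feasible.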
5. Online Camera LiDAR Fusion and Object Detection on
Hybrid Data for Autonomous Driving
The sensor fusion and object detection pipeline, estimating the rotation and translation btw their coordinate systems; a non-optimal calibration is shown.
6. RegNet: Multimodal Sensor Registration Using Deep
Neural Networks
RegNet, the deep CNN to infer a 6 DOF extrinsic calibration between multimodal sensors,
exemplified using a scanning LiDAR and a monocular camera.
Compared to existing approaches, RegNet casts all 3 conventional calibration steps (feature
extraction, feature matching and global regression) into a single real-time capable CNN.
It does not require any human interaction and bridges the gap between classical offline and
target-less online calibration approaches as it provides both a stable initial estimation as well
as a continuous online correction of the extrinsic parameters.
During training, randomly decalibrate our system in order to train RegNet to infer the
correspondence between projected depth measurements and RGB image and finally regress
the extrinsic calibration.
Additionally, with an iterative execution of multiple CNNs, that are trained on different
magnitudes of decalibration, it compares favorably to state-of-the-art methods in terms of a
mean calibration error of 0.28◦ for the rotational and 6 cm for the translation components
even for large decalibrations up to 1.5 m and 20◦ .
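The random decalibration used to generate training samples can be sketched as follows. The ±20° / ±1.5 m ranges come from the slide; `H_gt`, the Euler parameterization, and the uniform sampling are illustrative assumptions:

```python
import numpy as np

def rot_xyz(rx, ry, rz):
    """Rotation matrix from Euler angles (radians)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def decalibrate(H_gt, max_angle_deg=20.0, max_trans=1.5, rng=None):
    """Sample phi = (rx, ry, rz, tx, ty, tz) up to the largest decalibration
    range on the slide and apply it to the ground-truth extrinsic H_gt.
    The network would then regress phi back from the resulting projection."""
    rng = rng or np.random.default_rng(0)
    ang = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg, 3))
    t = rng.uniform(-max_trans, max_trans, 3)
    H_phi = np.eye(4)
    H_phi[:3, :3] = rot_xyz(*ang)
    H_phi[:3, 3] = t
    return H_phi @ H_gt, np.concatenate([ang, t])

H_init, phi = decalibrate(np.eye(4))
```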
7. RegNet: Multimodal Sensor Registration Using Deep
Neural Networks
It estimates the calibration btw a depth and an RGB sensor. The depth points are projected on the RGB
image using an initial calibration Hinit. In the 1st and 2nd part of the network, use NiN blocks to extract rich
features for matching. The final part regresses decalibration by gathering global info. using two FCLs.
During training, φ_decalib is randomly perturbed, resulting in different projections of the depth points.
9. Vehicle Detection from 3D Lidar Using FCN
Point clouds from a Velodyne scan can be
roughly projected and discretized into a 2D
point map;
The projected point map is analogous to a
cylindrical image;
Encode the bounding box corner of the
vehicle (8 corners as 24-d);
It consists of one objectness classification
branch and one bounding box regression
branch.
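A minimal numpy sketch of the cylindrical projection, assuming illustrative angular resolutions (the channel layout and resolutions are not the paper's exact values):

```python
import numpy as np

def cylindrical_project(points, d_theta=np.deg2rad(0.4), d_phi=np.deg2rad(0.08)):
    """Map each 3D point (x, y, z) to a (row, col) cell of a 2D point map
    via its elevation and azimuth angles, keeping range as a channel."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    elev = np.arcsin(z / np.linalg.norm(points, axis=1))  # elevation angle
    azim = np.arctan2(y, x)                               # azimuth angle
    rows = np.floor(elev / d_theta).astype(int)
    cols = np.floor(azim / d_phi).astype(int)
    depth = np.sqrt(x ** 2 + y ** 2)                      # range channel
    return np.stack([rows, cols], axis=1), depth

# one point 10 m ahead, 1 m below the sensor
cells, depth_ch = cylindrical_project(np.array([[10.0, 0.0, -1.0]]))
```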
10. (a) The input point map, with
the d channel visualized. (b)
The output confidence map of
the objectness branch. (c)
Bounding box candidates
corresponding to all points
predicted as positive, i.e. high
confidence points in (b). (d)
Remaining bounding boxes
after non-max suppression.
Vehicle Detection from 3D Lidar Using FCN
11. VoxelNet: End-to-End Learning for Point Cloud Based
3D Object Detection
VoxelNet removes the need for manual feature
engineering on 3D point clouds: it is a generic 3D
detection network that unifies feature extraction
and bounding box prediction in a single-stage,
end-to-end trainable deep network.
Specifically, VoxelNet divides a point cloud into
equally spaced 3D voxels and transforms a group
of points within each voxel into a unified feature
representation through the voxel feature encoding
(VFE) layer.
In this way, the point cloud is encoded as a
descriptive volumetric representation, which is
then connected to a RPN to generate detections.
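The voxel partition step can be sketched as follows; voxel size and origin are illustrative, and the VFE layers that would encode each group are not shown:

```python
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.4), origin=(0.0, -40.0, -3.0)):
    """Partition a point cloud into equally spaced 3D voxels:
    returns a dict mapping voxel index (i, j, k) -> array of its points."""
    idx = np.floor((points - np.array(origin)) / np.array(voxel_size)).astype(int)
    voxels = {}
    for p, (i, j, k) in zip(points, idx):
        voxels.setdefault((i, j, k), []).append(p)
    return {key: np.array(v) for key, v in voxels.items()}

pts = np.array([[0.05, 0.05, 0.10],   # these two fall in the same voxel
                [0.15, 0.07, 0.15],
                [5.05, 2.05, 1.05]])  # this one lands elsewhere
vox = voxelize(pts)
```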
15. Object Detection and Classification in Occupancy
Grid Maps using Deep Convolutional Networks
Based on a grid map environment
representation, well-suited for sensor fusion,
free-space estimation and machine learning,
detect and classify objects using deep CNNs.
As input, use a multi-layer grid map efficiently
encoding 3D range sensor info.
The inference output consists of a list of rotated
Bboxes with associated semantic classes.
Transform range sensor measurements to a multi-
layer grid map which serves as input for object
detection and classification network. From these top
view grid maps the network infers rotated 3D
bounding boxes together with semantic classes.
These boxes can be projected into the camera image
for visual validation. Cars are depicted green, cyclists
aquamarine and pedestrians cyan.
16. Object Detection and Classification in Occupancy
Grid Maps using Deep Convolutional Networks
Minimal preprocessing is applied to obtain the occupancy grid maps.
As there are labeled objects only in the camera image, remove all points that are not in the
camera’s field of view.
Apply ground surface segmentation and estimate different grid cell features; the resulting
multi-layer grid maps cover 60 m × 60 m with a cell size of either 10 cm or 15 cm.
As observed, the ground is flat in most of the scenarios, so fit a ground plane to the
representing point set.
Then, use the full point set or a non-ground subset to construct a multi-layer grid map
containing different features.
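A minimal sketch of building such a multi-layer grid map, assuming an illustrative choice of per-cell features (point count, max height, mean intensity); the paper's exact feature set may differ:

```python
import numpy as np

def multilayer_grid(points, intensity, extent=60.0, cell=0.1):
    """Scatter points into a 60 m x 60 m grid (10 cm cells) and compute
    three feature layers per cell: count, max height, mean intensity."""
    n = int(extent / cell)
    count = np.zeros((n, n))
    max_h = np.full((n, n), -np.inf)
    inten = np.zeros((n, n))
    ix = np.floor((points[:, 0] + extent / 2) / cell).astype(int)
    iy = np.floor((points[:, 1] + extent / 2) / cell).astype(int)
    keep = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)  # clip to grid
    for x, y, z, it in zip(ix[keep], iy[keep], points[keep, 2], intensity[keep]):
        count[x, y] += 1
        max_h[x, y] = max(max_h[x, y], z)
        inten[x, y] += it
    mean_inten = np.divide(inten, count, out=np.zeros_like(inten), where=count > 0)
    return np.stack([count, np.where(np.isinf(max_h), 0.0, max_h), mean_inten])

pts_demo = np.array([[1.05, 1.05, 0.5], [1.02, 1.03, 1.5]])  # same cell
grid = multilayer_grid(pts_demo, np.array([0.2, 0.4]))
```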
17. Object Detection and Classification in Occupancy
Grid Maps using Deep Convolutional Networks
KITTI Bird’s Eye View Evaluation 2017 consists of 7481 images for training and 7518
images for testing as well as corresponding range sensor data represented as point sets.
Training and test data contain 80,256 labeled objects in total which are represented as
oriented 3D Bboxes (7 parameters).
As summarized in Table, there are 8 semantic classes labeled in the training set although
not all classes are used to determine the benchmark result.
18. RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point
Cloud for Autonomous Driving
Real-time 3-dimensional (RT3D) vehicle detection
method that utilizes pure LiDAR point cloud to
predict the location, orientation, and size of vehicles.
Apply pre-RoI-pooling convolution that moves the
majority of the convolution operations ahead of the
RoI pooling, leaving only a small part behind, which
significantly boosts computational efficiency.
A pose-sensitive feature map design is strongly
activated by the relative poses of vehicles, leading to
a high regression accuracy on the location,
orientation, and size of vehicles.
RT3D is the 1st LiDAR 3-D vehicle detection work
that completes detection within 0.09s.
19. RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point
Cloud for Autonomous Driving
The network architecture of RT3D
20. BirdNet: a 3D Object Detection Framework from LiDAR
information
LiDAR- based 3D object detection pipeline entailing three stages:
First, laser info. is projected into a novel cell encoding for bird’s eye view projection.
Later, both object location on the plane and its heading are estimated through a
convolutional neural network originally designed for image processing.
Finally, 3D oriented detections are computed in a post-processing phase.
21. BirdNet: a 3D Object Detection Framework from LiDAR
information
Results on KITTI Benchmark test set: detections in image, BEV projection, and 3D point cloud.
22. LMNet: Real-time Multiclass Object Detection on CPU
using 3D LiDAR
An optimized single-stage deep CNN to detect objects in urban environments, using
nothing more than point cloud data.
The network structure employs dilated convolutions to gradually increase the receptive
field as depth increases; this helps reduce the computation time by about 30%.
The input consists of 5 perspective representations of the unorganized point cloud data.
The network outputs an objectness map and the bounding box offset values for each point.
Using reflection, range, and the position on each of the 3 axes helped to improve the
location and orientation of the output bounding box.
Execution time is 50 FPS using desktop GPUs, and up to 10 FPS on an Intel Core i5 CPU.
23. LMNet: Real-time Multiclass Object Detection on CPU
using 3D LiDAR
Used dilated layers
The LMNet architecture
Encoded input point cloud
24. HDNET: Exploit HD Maps for 3D Object
Detection
High-Definition (HD) maps provide strong priors that can boost the performance and
robustness of modern 3D object detectors.
A single-stage detector extracts geometric and semantic features from the HD maps.
As maps might not be available everywhere, a map prediction module estimates the map
on the fly from raw LiDAR data.
The whole framework runs at 20 frames per second.
25. HDNET: Exploit HD Maps for 3D Object
Detection
BEV LiDAR representation that exploits geometric and semantic HD map information.
(a) The raw LiDAR point cloud. (b) Incorporating geometric ground prior.
(c) Discretization of the LiDAR point cloud. (d) Incorporating semantic road prior.
26. HDNET: Exploit HD Maps for 3D Object
Detection
Network structures for object detection (left) and online map estimation (right).
27. IPOD: Intensive Point-based Object Detector for Point
Cloud
A 3D object detection framework, IPOD, based on raw point cloud.
It seeds object proposal for each point, which is the basic element.
An E2E trainable architecture, where features of all points within a proposal are extracted
from the backbone network and aggregated into a proposal feature for final bounding-box inference.
These features with both context info. and precise point cloud coord.s improve the performance.
28. IPOD: Intensive Point-based Object Detector for Point
Cloud
Illustration of point-based proposal
generation. (a) Semantic segmentation result
on the image. (b) Projected segmentation
result on point cloud. (c) Point-based
proposals on positive points after NMS.
29. IPOD: Intensive Point-based Object Detector for Point
Cloud
Illustration of proposal feature generation module. It combines location info. and
context feature to generate offsets from the centroid of interior points to the center of
target instance object. The predicted residuals are added back to the location info. in order
to make feature more robust to geometric transformation.
30. IPOD: Intensive Point-based Object Detector for Point
Cloud
Backbone architecture. Bounding-box prediction network.
31. PIXOR: Real-time 3D Object Detection from Point
Clouds
This method utilizes the 3D data more efficiently by representing the scene from the
Bird’s Eye View (BEV), and propose PIXOR (ORiented 3D object detection from
PIXel-wise NN predictions), a proposal-free, single-stage detector that outputs
oriented 3D object estimates decoded from pixel-wise neural network predictions.
The input representation, network architecture, and model optimization are specially
designed to balance high accuracy and real-time efficiency.
3D object detector from Bird’s Eye View (BEV) of LIDAR point cloud.
32. PIXOR: Real-time 3D Object Detection from Point
Clouds
The network architecture of PIXOR
Use cross-entropy loss on the classification output
and a smooth L1 loss on the regression output.
Sum the classification loss over all locations on the
output map, while the regression loss is computed
over positive locations only.
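The two loss terms above in a few lines of numpy: per-pixel cross-entropy summed over all locations, plus smooth L1 on a 6-dim regression target at positive locations only. The tensors and the target layout are illustrative, not the exact training setup:

```python
import numpy as np

def smooth_l1(x):
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def pixor_loss(cls_prob, cls_gt, reg_pred, reg_gt):
    """cls_prob/cls_gt: (H, W) objectness; reg_pred/reg_gt: (H, W, 6)."""
    eps = 1e-7
    ce = -(cls_gt * np.log(cls_prob + eps)
           + (1 - cls_gt) * np.log(1 - cls_prob + eps))
    cls_loss = ce.sum()                # summed over all output locations
    pos = cls_gt > 0.5                 # regression only on positive cells
    reg_loss = smooth_l1(reg_pred[pos] - reg_gt[pos]).sum()
    return cls_loss + reg_loss

# toy 1x2 output map: one positive (p=0.9), one negative (p=0.1)
toy_loss = pixor_loss(np.array([[0.9, 0.1]]), np.array([[1.0, 0.0]]),
                      np.zeros((1, 2, 6)), np.zeros((1, 2, 6)))
```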
33. DepthCN: Vehicle Detection Using 3D-LIDAR and
ConvNet
Vehicle detection based on the Hypothesis Generation (HG) and Verification (HV)
paradigms.
The data inputted to the system is a point cloud obtained from a 3D-LIDAR
mounted on board an instrumented vehicle, which is transformed to a Dense-
depth Map (DM).
The solution starts by removing ground points followed by point cloud
segmentation.
Then, segmented obstacles (object hypotheses) are projected onto the DM.
Bboxes are fitted to the segmented objects as vehicle hypotheses (HG step).
Bboxes are used as inputs to a ConvNet to classify/verify the hypotheses of
belonging to the category ‘vehicle’ (HV step).
34. DepthCN: Vehicle Detection Using 3D-LIDAR and
ConvNet
3D-LIDAR-based vehicle detection algorithm (DepthCN).
35. DepthCN: Vehicle Detection Using 3D-LIDAR and
ConvNet
Top: the point cloud where the detected ground points are denoted with green and LIDAR points that are
out of the field of view of the camera are shown in red. Bottom: the projected clusters and HG results in the
form of 2D BB. Right: the zoomed view, and the vertical orange arrows indicate corresponding obstacles.
36. DepthCN: Vehicle Detection Using 3D-LIDAR and
ConvNet
The generated Dense-depth Map (DM) with the
projected hypotheses (red).
The ConvNet architecture. The generated hypotheses and the detection results are
shown as red and dashed-green BBs, respectively, in both
DM and images. The bottom figures show the result in PCD.
37. SECOND: Sparsely Embedded Convolutional
Detection
An improved sparse convolution method for such networks, which significantly increases
the speed of both training and inference.
Introduce a new form of angle loss regression to improve the orientation estimation
performance and a new data augmentation approach that can enhance the convergence
speed and performance.
The proposed network produces SoA results on the KITTI 3D object detection
benchmarks while maintaining a fast inference speed.
The detector takes a raw point cloud as input, converts it to voxel features and coordinates, and applies two VFE
(voxel feature encoding) layers and a linear layer. A sparse CNN is applied and an RPN generates the detection.
38. SECOND: Sparsely Embedded Convolutional
Detection
The sparse convolution
algorithm is shown above, and
the GPU rule generation
algorithm is shown below. Nin
denotes the number of input
features, and Nout denotes the
number of output features. N is
the number of gathered features.
Rule is the rule matrix, where
Rule[i, :, :] is the ith rule
corresponding to the ith kernel
matrix in the convolution kernel.
The boxes with colors except
white indicate points with sparse
data and the white boxes
indicate empty points.
39. SECOND: Sparsely Embedded Convolutional
Detection
A GPU-based rule generation algorithm
(Algorithm 1) that runs faster on a GPU.
First, collect the input indexes and
associated spatial indexes instead of the
output indexes (1st loop). Duplicate
output locations are obtained in this
stage. Then execute a unique parallel
algorithm on the spatial index data to
obtain the output indexes and their
associated spatial indexes. A buffer with
the same spatial dimensions as those of
the sparse data is generated from the
previous results for table lookup in the
next step (2nd loop). Finally, we iterate
on the rules and use the stored spatial
indexes to obtain the output index for
each input index (3rd loop).
40. SECOND: Sparsely Embedded Convolutional
Detection
The structure of sparse middle feature extractor. The
yellow boxes represent sparse convolution, the
white boxes represent submanifold convolution, and
the red box represents the sparse-to-dense layer.
The upper part of the figure shows the spatial
dimensions of the sparse data.
Lθ = SmoothL1(sin(θp − θt)),
Introducing a new angle loss regression
This approach to angle loss has two advantages:
(1) it resolves the ambiguity btw orientations of 0 and π;
(2) it naturally models the IoU against the angle offset
function.
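The sine-encoded angle loss can be written in a few lines of numpy: predictions that differ by π (a box flipped front-to-back) incur zero loss, which is exactly the 0-vs-π property above. SECOND recovers the true heading with a separate direction classifier, not shown here:

```python
import numpy as np

def smooth_l1(x):
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def angle_loss(theta_p, theta_t):
    """L_theta = SmoothL1(sin(theta_p - theta_t))."""
    return smooth_l1(np.sin(theta_p - theta_t))

same = angle_loss(np.pi, 0.0)      # flipped by pi: loss ~ 0
off = angle_loss(np.pi / 2, 0.0)   # off by pi/2: maximal loss
```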
Structure of the RPN: downsampling convolutional layers, transpose convolutional layers, and concatenation.
41. SECOND: Sparsely Embedded Convolutional
Detection
Results of 3D detection on the KITTI test set. For better visualization, the 3D boxes
detected using LiDAR are projected onto images from the left camera.
42. YOLO3D: E2E RT 3D Oriented Object Bounding Box
Detection from LiDAR Point Cloud
Based on the success of the one-shot regression meta-architecture in the 2D perspective
image space, extend it to generate oriented 3D object Bboxes from LiDAR point cloud.
The idea is extending the loss function of YOLO v2 to include the yaw angle, the 3D box
center in Cartesian coordinates and the height of the box as a direct regression problem.
This formulation enables real-time performance, which is essential for automated driving.
In KITTI, it achieves real-time performance (40 fps) on Titan X GPU.
43. YOLO3D: E2E RT 3D Oriented Object Bounding Box
Detection from LiDAR Point Cloud
The total loss
Project the point cloud to get a bird’s eye view grid map;
two grid maps are created from the projection of the point cloud.
The first feature map contains the maximum height,
where each grid cell (pixel) value represents the height
of the highest point associated with that cell.
The second grid map represents the density of points.
In YOLO-v2, anchors are calculated using k-means
clustering over width and length of ground truth boxes.
The point behind using anchors, is to find priors for the
boxes, onto which the model can predict modifications.
The anchors must be able to cover the whole range of
boxes that can appear in the data.
Choose not to use clustering to calculate the anchors,
and instead, calculate the mean 3D box dimensions for
each object class, and use these average box
dimensions as anchors.
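The per-class mean-anchor computation described above can be sketched directly; the toy boxes and class ids are illustrative:

```python
import numpy as np

def mean_anchors(boxes, labels):
    """boxes: (N, 3) ground-truth (l, w, h); labels: (N,) class ids.
    Returns the mean box dimensions per class, used as anchors."""
    return {c: boxes[labels == c].mean(axis=0) for c in np.unique(labels)}

boxes = np.array([[4.0, 1.8, 1.5],
                  [4.4, 1.8, 1.7],
                  [0.8, 0.6, 1.7]])
labels = np.array([0, 0, 1])   # 0 = car, 1 = pedestrian (illustrative)
anchors = mean_anchors(boxes, labels)
```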
44. YOLO4D: A ST Approach for RT Multi-object
Detection and Classification from LiDAR Point Clouds
YOLO4D: the 3D LiDAR point clouds are aggregated over time as a 4D tensor
(3D space dimensions plus the time dimension), which is fed to a one-shot
fully convolutional detector based on the YOLO v2 architecture.
YOLO3D is extended with Convol. LSTM for temporal features aggregation.
The outputs are the oriented 3D Object BBox info., in addition to its length (L),
width (W), height (H) and orientation (yaw), together with the objects classes and
confidence scores.
Two different techniques are evaluated to incorporate the temporal dimension:
recurrence and frame stacking.
45. YOLO4D: A ST Approach for RT Multi-object
Detection and Classification from LiDAR Point Clouds
Left: Frame stacking architecture; Right: Convolutional LSTM architecture.
The prediction model
The total loss
46. Deconvolutional Networks for Point-Cloud Vehicle
Detection and Tracking in Driving Scenarios
A full vehicle detection and tracking system that
works with 3D lidar information only.
The detection step uses a CNN that receives as
input a featured representation of the 3D
information provided by a Velodyne HDL-64
sensor and returns a per-point classification of
whether it belongs to a vehicle or not.
The classified point cloud is then geometrically
processed to generate observations for a multi-
object tracking system implemented via a
number of Multi-Hypothesis Extended Kalman
Filters (MH-EKF) that estimate the position and
velocity of the surrounding vehicles.
The model is fed with an encoded representation
of the point cloud and computes for each 3D point
its probability of belonging to a vehicle. The
classified points are then clustered, generating
trustworthy observations that are fed to the
MH-EKF based tracker.
47. Deconvolutional Networks for Point-Cloud Vehicle
Detection and Tracking in Driving Scenarios
To obtain a useful input for the detector,
project the 3D point cloud raw data to a
featured image-like representation
containing ranges and reflectivity info. by
means of transformation G(·).
Ground truth for learning the classification
task is obtained by first projecting the
image-based Kitti tracklets over the 3D
Velodyne info., and then applying again
transformation G(·) over the selected points.
48. Deconvolutional Networks for Point-Cloud Vehicle
Detection and Tracking in Driving Scenarios
The network encompasses only conv. and deconv. blocks followed by BN and ReLU nonlinearities. The first
3 blocks conduct the feature extraction step, controlling, according to the vehicle detection objective, the size of
the receptive fields and the feature maps generated. The next 3 deconvolutional blocks expand the info.
enabling the point-wise classification. After each deconvolution, feature maps from the lower part of the
network are concatenated (CAT) before applying the normalization and non-linearities, providing richer info.
and better performance. During training, 3 losses are calculated at different network points.
49. Deconvolutional Networks for Point-Cloud Vehicle
Detection and Tracking in Driving Scenarios
They show the raw input point cloud, the
Deep detector output, the final tracked
vehicles and the RGB projected bounding
boxes submitted for evaluation.
50. Fast and Furious: Real Time E2E 3D Detection,
Tracking and Motion Forecasting with a Single
Convolutional Net
A deep neural network to jointly reason about 3D detection, tracking and motion forecasting
given data captured by a 3D sensor.
By jointly reasoning about these tasks, the holistic approach is more robust to occlusion as
well as sparse data at range.
It performs 3D convolutions across space and time over a bird’s eye view representation of
the 3D world, which is very efficient in terms of both memory and computation.
It can perform all tasks in as little as 30 ms.
Overlay temporal & motion forecasting data.
Green: bbox w/ 3D point. Grey: bbox w/o 3D point.
51. Fast and Furious: Real Time E2E 3D Detection,
Tracking and Motion Forecasting with a Single
Convolutional Net
The FaF work takes multiple frames as input and performs detection, tracking and motion forecasting.
52. Fast and Furious: Real Time E2E 3D Detection,
Tracking and Motion Forecasting with a Single
Convolutional Net
Modeling temporal information
53. Fast and Furious: Real Time E2E 3D Detection,
Tracking and Motion Forecasting with a Single
Convolutional Net
Motion forecasting
The loss function
classification loss
The regression targets
smooth L1
54. SqueezeSeg: Conv. Neural Nets with Recurrent CRF for RT
Road-Object Segmentation from 3D LiDAR Point Cloud
Semantic segmentation of road-objects from 3D LiDAR point clouds.
Detect and categorize instances of interest, such as cars, pedestrians and cyclists.
Formulate it as a pointwise classification problem, and propose an E2E pipeline called
SqueezeSeg based on CNN: the CNN takes a transformed LiDAR point cloud as input and
directly outputs a point-wise label map, which is then refined by a CRF as a recurrent layer.
Instance-level labels are then obtained by conventional clustering algorithms.
The CNN model is trained on LiDAR point clouds from the KITTI dataset, and point-wise
segmentation labels are derived from 3D bounding boxes from KITTI.
To obtain extra training data, built a LiDAR simulator into Grand Theft Auto V (GTA-V), a
popular video game, to synthesize large amounts of realistic training data.
GT segmentation and predicted segmentation.
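The final step, turning point-wise labels into instances with a conventional clustering algorithm, can be sketched with a simple single-linkage grouping; the radius and the method are illustrative, not the authors' exact choice:

```python
import numpy as np

def cluster_instances(points, labels, radius=0.5):
    """Group same-class points closer than `radius` into one instance id."""
    n = len(points)
    inst = -np.ones(n, dtype=int)
    next_id = 0
    for i in range(n):
        if inst[i] >= 0:
            continue
        stack, inst[i] = [i], next_id
        while stack:                      # flood-fill over near neighbours
            j = stack.pop()
            d = np.linalg.norm(points - points[j], axis=1)
            nbr = np.where((d < radius) & (labels == labels[j]) & (inst < 0))[0]
            inst[nbr] = next_id
            stack.extend(nbr.tolist())
        next_id += 1
    return inst

pts = np.array([[0.0, 0.0], [0.3, 0.0], [5.0, 0.0], [5.2, 0.0]])
ids = cluster_instances(pts, np.array([1, 1, 1, 1]))
```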
55. SqueezeSeg: Conv. Neural Nets with Recurrent CRF for RT
Road-Object Segmentation from 3D LiDAR Point Cloud
LiDAR Projections.
Network structure of SqueezeSeg
56. SqueezeSeg: Conv. Neural Nets with Recurrent CRF for RT
Road-Object Segmentation from 3D LiDAR Point Cloud
Structure of FireModule and FireDeconv
Conditional Random Field (CRF) as an RNN layer
https://github.com/BichenWuUCB/SqueezeSeg.
57. SEGCloud: Semantic Segmentation of 3D Point Clouds
SEGCloud, an E2E framework to obtain 3D point-level segmentation that combines the
advantages of NNs, trilinear interpolation (TI) and fully connected CRF (FC-CRF).
Coarse voxel predictions from a 3D Fully Convolutional NN are transferred back to the raw
3D points via trilinear interpolation.
FC-CRF enforces global consistency and provides fine-grained semantics on the points.
Implement the FC-CRF as a differentiable Recurrent NN to allow joint optimization.
58. SEGCloud: Semantic Segmentation of 3D Point Clouds
The 3D-FCNN is made of 3 residual layers sandwiched between 2 convolutional layers.
Max Pooling in the early stages of the network yields a 4X downsampling.
59. SEGCloud: Semantic Segmentation of 3D Point Clouds
Trilinear interpolation of class scores from voxels to points: Each point’s score is
computed as the weighted sum of the scores from its 8 spatially closest voxel centers.
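The voxel-to-point transfer can be sketched as follows, assuming unit voxels with centers at integer coordinates (the real grid layout differs only by scaling and offset):

```python
import numpy as np

def trilinear_scores(point, voxel_scores):
    """Interpolate class scores at `point` from its 8 surrounding voxel
    centers. voxel_scores: (X, Y, Z, C) scores at integer voxel centers."""
    base = np.floor(point).astype(int)
    frac = point - base
    out = np.zeros(voxel_scores.shape[-1])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # each neighbour's weight is the product of per-axis factors
                w = ((1 - frac[0]) if dx == 0 else frac[0]) * \
                    ((1 - frac[1]) if dy == 0 else frac[1]) * \
                    ((1 - frac[2]) if dz == 0 else frac[2])
                out += w * voxel_scores[base[0] + dx, base[1] + dy, base[2] + dz]
    return out

grid = np.zeros((2, 2, 2, 1))
grid[1, 0, 0, 0] = 1.0
point_scores = trilinear_scores(np.array([0.5, 0.0, 0.0]), grid)
```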
60. SEGCloud: Semantic Segmentation of 3D Point Clouds
A 2-stage training by first optimizing over the point-level unary potentials (no
CRF) and then over the joint framework for point-level fine-grained labeling.
61. Multi-View 3D networks (MV3D), a sensory-fusion
framework that takes both LIDAR point cloud and RGB
images as input and predicts oriented 3D bboxes.
Composed of 2 subnetworks: one for 3D object
proposal generation, one for multi-view feature fusion.
The proposal network generates 3D candidate boxes
from bird’s eye view representation of 3D point cloud.
A deep fusion scheme to combine region-wise
features from multiple views and enable interactions
btw intermediate layers of different paths.
Multi-View 3D Object Detection
Network for Autonomous Driving
63. Input features of the MV3D network.
Multi-View 3D Object Detection
Network for Autonomous Driving
64. Training strategy for the Region-
based Fusion Network: During
training, the bottom 3 paths and losses
are added to regularize the network.
The auxiliary layers share weights with
the corresponding layers in the main
network.
Multi-View 3D Object Detection
Network for Autonomous Driving
65. A General Pipeline for 3D Detection of
Vehicles
A pipeline to adopt 2D detection net and fuse it with a 3D point cloud to generate 3D info.
To identify the 3D box, model fitting based on generalised car models and score maps.
A two-stage CNN is proposed to refine the detected 3D box.
General fusion pipeline. All of the point clouds viewed from the top (bird’s eye view). The height is encoded by color, with
red being the ground. A subset of points is selected based on the 2D detection. A model fitting algorithm based on the
generalised car models and score maps is applied to find the car points in the subset and a two-stage refinement CNN is
designed to fine tune the detected 3D box and re-assign an objectiveness score to it.
66. A General Pipeline for 3D Detection of
Vehicles
Generalised car models. Score map (scores are indicated at bottom).
Qualitative result illustration on KITTI data (top) and Boston data (bottom). Blue boxes are the 3D detection results
67. Combining LiDAR Space Clustering and Convolutional
Neural Networks for Pedestrian Detection
In purely image- based pedestrian detection approaches, the SoA results
have been achieved with CNN and surprisingly few detection frameworks
have been built upon multi-cue approaches.
This is a pedestrian detector for autonomous vehicles that exploits LiDAR
data, in addition to visual info.
LiDAR data is utilized to generate region proposals by processing the 3-d
point cloud that it provides.
These candidate regions are then further processed by a SoA CNN
classifier that was fine-tuned for pedestrian detection.
68. Combining LiDAR Space Clustering and Convolutional
Neural Networks for Pedestrian Detection
(a) Cluster proposal (b) Size and ratio corrections
69. Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
Taking the inner workings of CNNs into consideration, image-based depth maps are converted to
pseudo-LiDAR representations.
With this representation, different existing LiDAR-based detection algorithms can be applied.
On the popular KITTI benchmark, it raises the detection accuracy of objects within the 30m
range from the previous SoA of 22% to an unprecedented 74%.
Pseudo-LiDAR signal from visual depth estimation.
70. Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
The two-step pipeline for image-based 3D object detection. Given stereo or monocular images,
first predict the depth map, then transform it into a 3D point cloud in the LiDAR
coordinate system. This representation is called pseudo-LiDAR and is processed exactly like
LiDAR, so any LiDAR-based 3D object detection algorithm can be applied.
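The depth-to-point-cloud conversion at the heart of pseudo-LiDAR is a pinhole back-projection of each pixel. A minimal numpy sketch, assuming the camera intrinsics fx, fy, cx, cy are given; a real pipeline would additionally transform the points from the camera frame into the LiDAR frame:

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, metres) into an N x 3 point cloud.

    Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels with non-positive depth are dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]
```

The resulting N x 3 array can then be fed to any LiDAR-based detector unchanged, which is the whole point of the pseudo-LiDAR representation.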
71. Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
Apply a single 2D convolution with a
uniform kernel to the frontal view depth
map (top-left). The resulting depth map
(top-right), after being projected into the bird's-eye view (bottom-right), reveals a large
depth distortion in comparison to the
original pseudo-LiDAR view (bottom-left),
especially for far-away objects. The boxes
are super-imposed and contain all points of
the green and yellow cars respectively.
72. Fusing Bird’s Eye View LIDAR Point Cloud and Front View
Camera Image for Deep Object Detection
A method for fusing LIDAR point clouds and camera-captured images in a deep
CNN.
The method constructs a layer called sparse non-homogeneous pooling layer to
transform features between bird’s eye view and front view.
The sparse point cloud is used to construct the mapping between the two views.
The pooling layer allows fusion of multi-view features at any stage of the network.
This is favorable for 3D object detection using camera-LIDAR fusion for
autonomous driving.
A corresponding one-stage detector is designed and tested, which produces 3D
Bboxes from the bird’s eye view map.
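The sparse mapping the pooling layer builds can be sketched as a gather/scatter driven by the point cloud: each LiDAR point links one front-view pixel to one BEV cell, so its image feature can be pooled into the BEV map. A simplified illustration with mean pooling; the function name and shapes here are assumptions, not the paper's implementation:

```python
import numpy as np

def view_transform(front_feats, uv, bev_idx, bev_shape):
    """Sparse non-homogeneous pooling, simplified.

    front_feats: H x W x C front-view feature map.
    uv: list of (u, v) front-view pixels, one per LiDAR point.
    bev_idx: list of (i, j) BEV cells, one per LiDAR point.
    Returns a BEV feature map with mean-pooled image features.
    """
    out = np.zeros(bev_shape + (front_feats.shape[-1],))
    cnt = np.zeros(bev_shape)
    for (u, v), (i, j) in zip(uv, bev_idx):
        out[i, j] += front_feats[v, u]  # gather from front view
        cnt[i, j] += 1                  # count points per BEV cell
    nz = cnt > 0
    out[nz] /= cnt[nz][:, None]         # mean over points in each cell
    return out
```

Because the mapping is defined by the points rather than a fixed grid warp, the same layer can fuse the two views at any stage of the network.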
73. Fusing Bird’s Eye View LIDAR Point Cloud and Front View
Camera Image for Deep Object Detection
The vanilla fusion-based one-stage object detection network
The sparse non-homogeneous pooling layer that fuses
front view image and bird’s eye view LIDAR feature.
74. Fusing Bird’s Eye View LIDAR Point Cloud and Front View
Camera Image for Deep Object Detection
(a)From camera to bird’s eye. (b)From bird’s eye to camera. (c)From front view conv4
layer to bird’s eye conv4 layer. (d)From bird’s eye conv4 to bird’s eye conv4.
75. Fusing Bird’s Eye View LIDAR Point Cloud and Front View
Camera Image for Deep Object Detection
The fusion-based one-stage object detection network compared with SoA single-sensor networks.
76. PointNet: Deep Learning on Point Sets for 3D
Classification and Segmentation
Applications of PointNet. PointNet is a deep net architecture that consumes raw point cloud (set of
points) without voxelization or rendering. It is a unified architecture that learns both global and local
point features, providing a simple, efficient and effective approach for a number of 3D recognition tasks.
77. PointNet: Deep Learning on Point Sets for 3D
Classification and Segmentation
PointNet Architecture. The classification network takes n points as input, applies input and feature transformations, and
then aggregates point features by max pooling. The output is classification scores for k classes. The segmentation network
is an extension to the classification net. It concatenates global and local features and outputs per point scores.
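The permutation invariance that lets PointNet consume raw point sets comes from applying a shared per-point MLP and aggregating with max pooling, a symmetric function. A toy numpy sketch of that idea with a single shared layer; the real network stacks several shared layers plus input/feature transform nets:

```python
import numpy as np

def shared_mlp(points, w, b):
    # The same linear layer + ReLU is applied to every point (n x c -> n x k).
    return np.maximum(points @ w + b, 0.0)

def pointnet_global_feature(points, w, b):
    # Max pooling over the point axis is the symmetric function that makes
    # the result invariant to the ordering of the input points.
    return shared_mlp(points, w, b).max(axis=0)
```

Feeding the same cloud in any order yields the identical global feature vector, which is why no voxelization or ordering is needed.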
78. PointNet++: Deep Hierarchical Feature Learning on
Point Sets in a Metric Space
PointNet does not capture local structures induced by the metric space the points live in,
limiting its ability to recognize fine-grained patterns and its generalizability to complex
scenes.
The network called PointNet++ is able to learn deep point set features efficiently and
robustly.
This is a hierarchical NN that applies PointNet recursively on a nested partitioning of the
input point set.
By exploiting metric space distances, the network is able to learn local features with
increasing contextual scales.
Observing further that point sets are usually sampled with varying densities, which greatly
decreases the performance of networks trained on uniform densities, set learning layers are
proposed that adaptively combine features from multiple scales.
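The nested partitioning in PointNet++ typically starts by picking well-spread centroids with farthest point sampling, then applies PointNet to each centroid's neighbourhood. A sketch of the greedy sampling step:

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedy FPS: pick k well-spread centroid indices from an n x 3 cloud.

    Maintains, for every point, its distance to the nearest chosen centroid,
    and repeatedly picks the point farthest from all centroids so far.
    """
    chosen = [0]  # start from an arbitrary point
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        idx = int(dist.argmax())
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return np.array(chosen)
```

On a degenerate cloud with one distant outlier, FPS picks the outlier as the second centroid, which is exactly the spread-out behaviour the hierarchy relies on.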
80. PointFusion: Deep Sensor Fusion for 3D Bounding Box
Estimation
PointFusion, a generic 3D object detection method that leverages both image and 3D point
cloud information.
The image data and the raw point cloud data are independently processed by a CNN and a
PointNet architecture, respectively.
The resulting outputs are then combined by a novel fusion network, which predicts multiple
3D box hypotheses and their confidences, using the input 3D points as spatial anchors.
Sample 3D object detection results of
PointFusion model on the KITTI dataset
(left) and the SUN-RGBD dataset (right).
81. PointFusion: Deep Sensor Fusion for 3D Bounding Box
Estimation
A PointNet variant that processes raw point cloud data (A), and a CNN that extracts visual features from an input
image (B). A vanilla global architecture that directly regresses the box corner locations (D), and a dense
architecture that predicts the spatial offset of each of the 8 corners relative to an input point (C): for each input
point, the network predicts the spatial offset (white arrows) from a corner (red dot) to the input point (blue), and
selects the prediction with the highest score as the final prediction (E).
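The dense architecture in (C)-(E) can be summarised in a few lines: every input point regresses the 8 box corners as offsets from itself, and the highest-scoring point's prediction is kept. A shape-only sketch; in the real model the offsets and scores are network outputs, not inputs:

```python
import numpy as np

def dense_box_prediction(points, corner_offsets, scores):
    """points: n x 3 input points (spatial anchors);
    corner_offsets: n x 8 x 3 per-point predicted corner offsets;
    scores: n per-point confidences.
    Returns the 8 x 3 corners predicted by the highest-scoring point.
    """
    best = int(np.argmax(scores))
    return points[best][None, :] + corner_offsets[best]
```

Using the input points as spatial anchors keeps each regression target small and local, which is the stated motivation for the dense variant over the vanilla global one.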
82. Frustum PointNets for 3D Object Detection
from RGB-D Data
A 3D object detection solution from RGB-D data in both indoor and outdoor
scenes.
Previous methods focus on images or 3D voxels, often obscuring natural 3D
patterns and invariances of the 3D data; this method instead operates on raw point
clouds by popping up RGB-D scans.
A challenge is how to efficiently localize objects in point clouds of large-scale
scenes (region proposal).
Instead of solely relying on 3D proposals, it leverages both mature 2D object
detectors and advanced 3D deep learning for object localization, achieving
efficiency as well as high recall.
Benefiting from learning directly on raw point clouds, it is also able to precisely
estimate 3D Bboxes even under strong occlusion or with very sparse points.
83. Frustum PointNets for 3D Object Detection
from RGB-D Data
3D object detection pipeline. Given RGB-D data, first generate 2D object region proposals in
the RGB image using a CNN. Each 2D region is then extruded to a 3D viewing frustum, from which
a point cloud is obtained from the depth data. Finally, the frustum PointNet predicts an (oriented
and amodal) 3D bounding box for the object from the points in the frustum.
84. Frustum PointNets for 3D Object Detection
from RGB-D Data
Frustum PointNets for 3D object detection. First leverage a 2D CNN object detector to propose 2D regions and
classify their content. 2D regions are then lifted to 3D and thus become frustum proposals. Given a point cloud in a
frustum (n × c with n points and c channels of XYZ, intensity etc. for each point), the object instance is segmented
by binary classification of each point. Based on the segmented object point cloud (m × c), a light-weight regression
PointNet (T-Net) tries to align the points by translation such that their centroid is close to the
amodal box center. Finally, the box estimation net estimates the amodal 3D bounding box for the object.
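Lifting a 2D region to a frustum proposal amounts to keeping the LiDAR points whose camera projection lands inside the 2D box. A minimal sketch in camera coordinates, assuming a 3 x 4 projection matrix P and points already in the camera frame:

```python
import numpy as np

def frustum_points(points, P, box2d):
    """Keep points whose projection falls inside a 2D detection box.

    points: n x 3 in camera coordinates; P: 3 x 4 projection matrix;
    box2d: (u_min, v_min, u_max, v_max) in pixels.
    """
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    proj = homo @ P.T
    u = proj[:, 0] / proj[:, 2]  # perspective divide
    v = proj[:, 1] / proj[:, 2]
    u0, v0, u1, v1 = box2d
    mask = (u >= u0) & (u <= u1) & (v >= v0) & (v <= v1) & (points[:, 2] > 0)
    return points[mask]
```

The frustum PointNet then segments this subset into object and clutter before regressing the amodal box.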
85. Frustum PointNets for 3D Object Detection
from RGB-D Data
Coordinate systems for point cloud. (a) default camera
coordinate; (b) frustum coordinate after rotating frustums to
center view; (c) mask coordinate with object points’ centroid
at origin; (d) object coordinate predicted by T-Net.
Basic architectures and IO for PointNets. Architecture is
illustrated for PointNet++ (v2) models with set abstraction
layers and feature propagation layers (for segmentation).
86. Frustum PointNets for 3D Object Detection
from RGB-D Data
Visualizations of Frustum PointNet results on KITTI val set.
87. RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
RoarNet performs 3D object detection from 2D images and 3D LiDAR point clouds.
Based on a two-stage object detection framework with PointNet as the backbone network, it
introduces several ideas to improve 3D object detection performance.
The first part estimates the 3D poses of objects from a monocular image, which approximates
where to examine further, and derives multiple candidates that are geometrically feasible.
This step significantly narrows down feasible 3D regions, which otherwise requires demanding
processing of 3D point clouds in a huge search space.
The second part takes the candidate regions and conducts in-depth inferences to conclude the final
poses in a recursive manner.
Inspired by PointNet, RoarNet processes 3D point clouds directly, leading to precise detection.
RoarNet is implemented in Tensorflow and publicly available with pretrained models.
88. RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
Detection pipeline of RoarNet. The model (a) predicts region proposals in 3D space using geometric
agreement search, (b) predicts objectness in each region proposal, (c) predicts 3D bounding boxes, (d)
calculates IoU (Intersection over Union) between 2D detection and 3D detection.
89. RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
Architecture of RoarNet
90. RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
(a) Previous
Architecture
(b) RoarNet 2D
Architecture
91. RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
RoarNet 2D. A unified architecture detects 2D
bounding boxes and 3D poses, illustrated in (a)
and (b), respectively. For each object, two
extreme cases are shown as non-filled boxes,
and final equally-spaced candidate locations as
colored dots in (b). All calculations are derived
in 3D space despite bird’s eye view (i.e., X-Z
plane) visualization.
92. RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
A detection pipeline of several network architectures
93. Joint 3D Proposal Generation and Object Detection
from View Aggregation
AVOD, an Aggregate View Object Detection network for autonomous driving scenarios.
The network uses LIDAR point clouds and RGB images to generate features shared by two
subnetworks: a region proposal network (RPN) and a second stage detector network.
The RPN is capable of performing multimodal feature fusion on high resolution feature maps to
generate reliable 3D object proposals for multiple object classes in road scenes.
Using these proposals, the second stage detection network performs accurate oriented 3D bounding
box regression and category classification to predict the extents, orientation, and classification of
objects in 3D space.
Source code is at: https://github.com/kujason/avod.
A visual representation of the 3D detection problem
from Bird’s Eye View (BEV). The Bbox in green is used to
determine the IoU overlap in the computation of the average
precision. The importance of explicit orientation estimation
can be seen from the fact that an object's Bbox does not change
when the orientation (purple) is shifted by ±π radians.
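The ±π ambiguity is easy to verify numerically: a BEV box and the same box rotated by π occupy an identical corner set, so IoU-based targets alone cannot supervise heading. A small check with an illustrative corner parameterisation (not AVOD's exact encoding):

```python
import numpy as np

def bev_corners(cx, cz, l, w, yaw):
    """Four BEV (X-Z plane) corners of a box with centre (cx, cz),
    length l, width w, and heading yaw."""
    c, s = np.cos(yaw), np.sin(yaw)
    dx = np.array([l, l, -l, -l]) / 2
    dz = np.array([w, -w, -w, w]) / 2
    return np.stack([cx + c * dx - s * dz,   # rotated x offsets
                     cz + s * dx + c * dz],  # rotated z offsets
                    axis=1)
```

Rotating by π negates every corner offset, which maps the symmetric corner set onto itself; hence AVOD regresses orientation explicitly instead of relying on box overlap.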
94. Joint 3D Proposal Generation and Object Detection
from View Aggregation
The method’s architectural diagram. The feature extractors are shown in blue, the region proposal
network in pink, and the second stage detection network in green.
95. Joint 3D Proposal Generation and Object Detection
from View Aggregation
The architecture of high resolution
feature extractor for the image branch.
Feature maps are propagated from the
encoder to the decoder section via red
arrows. Fusion is then performed at every
stage of the decoder by a learned
upsampling layer, followed by concatenation,
and then mixing via a convolutional layer,
resulting in a full resolution feature map at
the last layer of the decoder.
96. Joint 3D Proposal Generation and Object Detection
from View Aggregation
Qualitative results of AVOD for cars (top) and pedestrians/cyclists (bottom). Left: 3D RPN output, Middle: 3D
detection output, and Right: the projection of the detection output onto image space for all three classes.
97. SPLATNet: Sparse Lattice Networks for Point Cloud
Processing
A network architecture for processing point clouds that directly operates on a collection of
points represented as a sparse set of samples in a high-dimensional lattice.
The network uses sparse bilateral convolutional layers as building blocks. These layers
maintain efficiency by using indexing structures to apply convolutions only on occupied parts
of the lattice, and allow flexible specification of the lattice structure, enabling hierarchical
and spatially-aware feature learning as well as joint 2D-3D reasoning.
Both point-based and image-based representations can be easily incorporated in a network
with such layers and the resulting model can be trained in an E2E manner.
From point clouds and images to semantics. SPLATNet3D
directly takes point cloud as input and predicts labels for
each point. SPLATNet2D-3D, on the other hand, jointly
processes both the point cloud and the corresponding
multi-view images for better 2D and 3D predictions.
98. SPLATNet: Sparse Lattice Networks for Point Cloud
Processing
Bilateral Convolution Layer (BCL). Splat: BCL
first interpolates input features F onto a
d_l-dimensional permutohedral lattice defined by the
lattice features L at input points. Convolve: BCL
then does d_l-dimensional convolution over this
sparsely populated lattice. Slice: The filtered signal
is then interpolated back onto the input signal.
• The input points to BCL need not be ordered or
lie on a grid, as they are projected onto a
d_l-dimensional grid defined by lattice features Lin.
• The input and output points can be different for
BCL with the specification of different input and
output lattice features Lin and Lout.
• Since BCL allows separate specifications of input
and lattice features, input signals can be projected
into a different dimensional space for filtering.
• Just like in standard spatial convolutions, BCL
allows an easy specification of filter neighborhood.
• Since a signal is usually sparse in high
dimensions, BCL uses hash tables to index the
populated vertices and does convolutions only at
those locations.
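The splat and slice steps can be illustrated on a regular 1-D lattice; the real BCL uses a d_l-dimensional permutohedral lattice, and this simplification keeps only the scatter-with-barycentric-weights idea:

```python
import numpy as np

def splat(points_x, feats, cell):
    """Scatter point features onto a regular 1-D lattice with linear weights."""
    idx = np.floor(points_x / cell).astype(int)
    frac = points_x / cell - idx
    n_cells = idx.max() + 2
    lat = np.zeros(n_cells)
    w = np.zeros(n_cells)
    np.add.at(lat, idx, (1 - frac) * feats)      # weight to left vertex
    np.add.at(lat, idx + 1, frac * feats)        # weight to right vertex
    np.add.at(w, idx, 1 - frac)
    np.add.at(w, idx + 1, frac)
    return lat, w

def slice_back(lat, w, points_x, cell):
    """Interpolate (normalised) lattice values back onto arbitrary points."""
    dens = np.where(w > 0, lat / np.maximum(w, 1e-9), 0.0)
    idx = np.floor(points_x / cell).astype(int)
    frac = points_x / cell - idx
    return (1 - frac) * dens[idx] + frac * dens[idx + 1]
```

A convolution over the sparse lattice would sit between these two steps; splatting a constant signal and slicing it back recovers the constant, which is a useful sanity check on the weights.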
99. SPLATNet: Sparse Lattice Networks for Point Cloud
Processing
SPLATNet. Illustration of inputs, outputs and network architectures for SPLATNet3D and SPLATNet2D-3D.
100. SPLATNet: Sparse Lattice Networks for Point Cloud
Processing
2D to 3D projection using splat and slice
operations. Given input features of 2D
images, pixels are projected onto a 3D
permutohedral lattice defined by 3D positional lattice
features. The splatted signal is then sliced onto the
points of interest in a 3D point cloud.
Facade point cloud labeling. Sample visual
results of SPLATNet3D and SPLATNet2D-3D.
101. PointRCNN: 3D Object Proposal Generation and
Detection from Point Cloud
PointRCNN is a deep NN method for 3D object detection from raw point cloud.
The whole framework is composed of two stages:
stage-1 for the bottom-up 3D proposal generation;
stage-2 for refining proposals in canonical coordinates to obtain the detection results.
Instead of generating proposals from an RGB image or projecting the point cloud to bird's view
or voxels, the stage-1 sub-network directly generates a small number of high-quality 3D
proposals from the point cloud in a bottom-up manner, via segmenting the point cloud of the
whole scene into foreground and background points.
The stage-2 sub-network transforms the pooled points of each proposal to canonical
coordinates to learn local spatial features, which are combined with the global semantic features
of each point learned in stage-1 for accurate box refinement and confidence prediction.
102. PointRCNN: 3D Object Proposal Generation and
Detection from Point Cloud
Instead of generating proposals from fused feature
maps of bird’s view and front view, or RGB images,
this method directly generates 3D proposals from raw
point cloud in a bottom-up manner.
C: PointRCNN
103. PointRCNN: 3D Object Proposal Generation and
Detection from Point Cloud
The PointRCNN architecture. The whole network consists of two parts: (a) for generating 3D proposals
from raw point cloud in a bottom-up manner. (b) for refining the 3D proposals in canonical coordinates.
104. PointRCNN: 3D Object Proposal Generation and
Detection from Point Cloud
Bin-based localization. The surrounding area along X
and Z axes of each foreground point is split into a series
of bins to locate the object center.
Canonical transformation. The pooled points belonging to
each proposal are transformed to the corresponding canonical
coordinate system for better local spatial feature learning,
where CCS denotes Canonical Coordinate System.
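Both ideas are compact enough to sketch: the canonical transform moves each proposal's points into a box-centred, heading-aligned frame, and bin-based localisation turns centre regression into bin classification plus a small residual. Assumptions here: KITTI camera convention (yaw about the Y axis) and a search range centred on the foreground point:

```python
import numpy as np

def canonical_transform(points, center, heading):
    """Move proposal points into the Canonical Coordinate System (CCS):
    origin at the box centre, X axis aligned with the predicted heading."""
    c, s = np.cos(-heading), np.sin(-heading)
    rot = np.array([[c, 0.0, -s],
                    [0.0, 1.0, 0.0],
                    [s, 0.0, c]])  # yaw rotation about the Y (up) axis
    return (points - center) @ rot.T

def bin_target(offset, bin_size, num_bins):
    """Bin-based localisation target for one axis: the bin the centre offset
    falls into, plus the residual relative to that bin's centre."""
    shifted = offset + num_bins * bin_size / 2.0  # shift search range to [0, ...)
    b = int(np.clip(shifted // bin_size, 0, num_bins - 1))
    residual = shifted - (b + 0.5) * bin_size
    return b, residual
```

Classifying the bin and regressing only the residual keeps each regression target bounded, which is the stated reason the bin-based loss localises more robustly than direct regression.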
105. PointRCNN: 3D Object Proposal Generation and
Detection from Point Cloud
The upper is the image and the lower is a representative view of the corresponding point cloud.
106. Deep Continuous Fusion for Multi-Sensor 3D Object
Detection
A 3D object detector that exploits both LIDAR and cameras to perform very accurate localization.
Design an E2E learnable architecture that exploits continuous convolutions to fuse image and
LIDAR feature maps at different levels of resolution.
The continuous fusion layer encodes both discrete-state image features and continuous
geometric info.
Deep parametric continuous convolution is a learnable operator that operates over non-grid-
structured data.
The motivation is to extend the standard grid-structured convolution to non-grid-structured
data, while retaining high capacity and low complexity.
The key idea is to exploit multi-layer perceptrons as parameterized kernel functions for continuous
convolution.
This parametric kernel function spans the full continuous domain.
The weighted summation over a finite number of neighboring points is used to approximate the
otherwise computationally prohibitive continuous convolution.
Each neighbor is weighted differently according to its relative geometric offset wrt the target point.
This enables a reliable and efficient E2E learnable 3D object detector based on multiple
sensors.
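The parametric continuous convolution reduces to: for each target BEV pixel, gather its K nearest LiDAR neighbours, run each geometric offset through a small MLP to obtain a kernel weight, and take the weighted sum of the neighbours' features. A toy sketch with random weights; the MLP shapes are hypothetical:

```python
import numpy as np

def mlp_kernel(offsets, w1, w2):
    # A tiny MLP maps each 3-D geometric offset to a scalar kernel weight,
    # playing the role of the parameterized continuous kernel function.
    return np.maximum(offsets @ w1, 0.0) @ w2

def continuous_conv(target, neighbors, feats, w1, w2):
    """Approximate the continuous convolution at `target` by a weighted sum
    over its K nearest neighbours, each weighted by an MLP of its offset."""
    offsets = neighbors - target        # relative geometric offsets
    weights = mlp_kernel(offsets, w1, w2)
    return (weights[:, None] * feats).sum(axis=0)
```

Because the kernel is a function of the continuous offset rather than a grid index, the same operator works for scattered LiDAR points and is trainable end to end.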
107. Deep Continuous Fusion for Multi-Sensor 3D Object
Detection
Continuous fusion layer: given a target pixel on BEV image, extract K nearest LIDAR points (S1); project the 3D
points onto the camera image plane (S2-3); this helps retrieve corresponding image features (S4); feed the image
features and continuous geometric offsets into an MLP to generate the feature for the target pixel (S5).
108. Deep Continuous Fusion for Multi-Sensor 3D Object
Detection
Qualitative
results on KITTI
Dataset.
109. End-to-end Learning of Multi-sensor 3D Tracking by
Detection
An approach to tracking by detection that can exploit both camera and LIDAR data to
produce very accurate 3D trajectories.
Towards this goal, formulate it as a linear program that can be solved exactly, and learn
convolutional networks for detection as well as matching in an end-to-end manner.
The system takes as input a time series of RGB frames and LIDAR point clouds. From these
inputs, the system produces discrete trajectories of the targets. In particular, an architecture that is e2e
trainable while still maintaining explainability is achieved by formulating the system in a structured manner.
110. End-to-end Learning of Multi-sensor 3D Tracking by
Detection
Forward passes over a set of detections from
two frames for both scoring and matching.
For each detection x_j, a forward pass of a Detection
Network is computed to produce θ^det_W(x_j), the cost of
using or discarding x_j according to the assignment to y^det_j.
For each pair of detections x_j and x_i from subsequent
frames, a forward pass of the Match Network is computed
to produce θ^link_W(x_i, x_j), the cost of linking or not linking
these two detections according to the assignment to y^link_{i,j}.
Finally, each detection might start a new trajectory or end an
existing one; the costs for this are computed via θ^new_W(x)
and θ^end_W(x), respectively, and are associated with the
assignments to y^new and y^end.
Formulate the problem as inference in a deep structured model (DSM), where the factors are computed
using a set of feed forward neural nets that exploit both camera and LIDAR data to compute both detection
and matching scores. Inference in the model can be done exactly by a set of feed forward processes
followed by solving a linear program. Learning is done e2e via minimization of a structured hinge loss,
optimizing simultaneously the detector and tracker.
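For a handful of detections, the association problem the paper writes as a linear program can be checked by exhaustive enumeration. This sketch minimises the same kind of cost (link, new, and end costs corresponding to the θ terms, here plain numbers); a real system solves the LP directly rather than enumerating:

```python
import itertools

def best_matching(link_costs, new_cost, end_cost):
    """Exhaustively solve a tiny two-frame association problem.

    link_costs[r][c]: cost of linking detection r (frame t) to c (frame t+1).
    Unmatched frame-t detections end a trajectory (end_cost each);
    unmatched frame-(t+1) detections start one (new_cost each).
    """
    n, m = len(link_costs), len(link_costs[0])
    best, best_assign = float("inf"), None
    # Enumerate all partial matchings between the two frames.
    for k in range(min(n, m) + 1):
        for rows in itertools.combinations(range(n), k):
            for cols in itertools.permutations(range(m), k):
                cost = sum(link_costs[r][c] for r, c in zip(rows, cols))
                cost += end_cost * (n - k) + new_cost * (m - k)
                if cost < best:
                    best, best_assign = cost, list(zip(rows, cols))
    return best, best_assign
```

Because the LP's constraint matrix for this association problem is totally unimodular, solving the relaxation yields the same integral optimum this brute-force search finds.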