Deep VO and SLAM IV
1. Deep VO and SLAM IV
Yu Huang
Yu.haung07@gmail.com
Sunnyvale, California
2. Outline
• Flowdometry: An Optical Flow and Deep Learning Based Approach to Visual Odometry
• Supervising the new with the old: learning SFM from SFM
• Self-Supervised Learning of Depth and Motion Under Photometric Inconsistency
• Digging Into Self-Supervised Monocular Depth Estimation
• Learning monocular visual odometry with dense 3D mapping from dense 3D flow
• Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation
• Estimating Metric Scale Visual Odometry from Videos using 3D Convolutional Networks
• GANVO: Unsupervised Deep Monocular Visual Odometry and Depth Estimation with Generative Adversarial Networks
• DeepPCO: End-to-End Point Cloud Odometry through Deep Parallel Neural Network
• DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency
• Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding
3. Flowdometry: An Optical Flow and Deep Learning Based Approach to Visual Odometry
• https://github.com/petermuller/flowdometry
• Visual odometry is a challenging task related to simultaneous localization and mapping that aims to reconstruct the path traveled from a visual data stream.
• Based on one or two cameras, motion is estimated from features and pixel differences between frames.
• Because of the frame rate of the cameras, there are generally small, incremental changes between subsequent frames, where optical flow can be assumed to be proportional to the physical distance moved by an egocentric reference, such as a camera on a vehicle.
• This paper proposes a visual odometry system called Flowdometry based on optical flow and deep learning.
• Optical flow images are used as input to a convolutional neural network, which calculates a rotation and displacement for each image pixel.
• The displacements and rotations are applied incrementally to construct a map of where the camera has traveled, as sketched below.
• The proposed system is trained and tested on the KITTI visual odometry dataset, and accuracy is measured by the difference in distances between ground-truth and predicted driving trajectories.
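Below is a minimal sketch, not from the paper, of the incremental integration step: assuming the network emits one forward displacement and one yaw change per frame pair (a common planar simplification for KITTI), dead reckoning accumulates them into a trajectory.

import numpy as np

def integrate_trajectory(displacements, yaw_deltas):
    """Dead-reckon a planar trajectory from per-frame motion estimates.
    displacements: forward distance (m) per frame pair.
    yaw_deltas: heading change (rad) per frame pair.
    Returns an (N+1, 2) array of x/z positions."""
    positions = [np.zeros(2)]
    heading = 0.0
    for d, dyaw in zip(displacements, yaw_deltas):
        heading += dyaw                                   # update heading first
        step = d * np.array([np.sin(heading), np.cos(heading)])
        positions.append(positions[-1] + step)
    return np.stack(positions)

# Example: 1 m per frame while turning 1 degree per frame traces a 90-degree arc.
trajectory = integrate_trajectory(np.ones(90), np.full(90, np.radians(1.0)))
print(trajectory[-1])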
4. Flowdometry: An Optical Flow and Deep Learning Based Approach to Visual Odometry
The Flowdometry convolutional neural network architecture, based on the contractive part of FlowNetS
FlowNetS architecture with the contractive side of the network
6. Supervising the new with the old: learning SFM from SFM
• Recent work has demonstrated that it is possible to learn deep neural networks for monocular depth and ego-motion estimation from unlabelled video sequences, an interesting theoretical development with numerous advantages in applications.
• This paper proposes a number of improvements to these approaches.
• First, since such self-supervised approaches are based on the brightness constancy assumption, which is valid only for a subset of pixels, it applies a probabilistic learning formulation where the network predicts distributions over variables rather than specific values. As these distributions are conditioned on the observed image, the network can learn which scene and object types are likely to violate the model assumptions, resulting in more robust learning.
• Second, it builds on decades of experience in developing handcrafted structure-from-motion (SFM) algorithms by using an off-the-shelf SFM system to generate a supervisory signal for the deep neural network.
• While this signal is also noisy, the probabilistic formulation can learn and account for the defects of SFM, helping to integrate different sources of information and boosting the overall performance of the network; a sketch of such an uncertainty-weighted loss follows.
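As a minimal sketch of the probabilistic formulation (my illustration, with invented shapes, not the paper's code): the network predicts a per-pixel mean and log-scale, and a Laplacian negative log-likelihood automatically downweights pixels where the noisy SFM supervision or the model assumptions are unreliable.

import torch

def laplacian_nll(pred_mean, pred_log_b, target):
    """Per-pixel Laplacian NLL: |target - mean| / b + log b (constant dropped)."""
    b = torch.exp(pred_log_b)
    return ((target - pred_mean).abs() / b + pred_log_b).mean()

# Random tensors stand in for network outputs and noisy SFM-generated depth.
mean = torch.rand(4, 1, 64, 208, requires_grad=True)
log_b = torch.zeros(4, 1, 64, 208, requires_grad=True)   # learned uncertainty
sfm_depth = torch.rand(4, 1, 64, 208)
loss = laplacian_nll(mean, log_b, sfm_depth)
loss.backward()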
10. Self-Supervised Learning of Depth and Motion Under Photometric Inconsistency
• The self-supervised learning of depth and pose from monocular sequences provides an attractive solution by using the photometric consistency of nearby frames, as it depends much less on ground-truth data.
• This paper addresses the issue of the previous assumptions of self-supervised approaches being violated due to the dynamic nature of real-world scenes.
• Different from handling the noise as uncertainty, the key idea is to incorporate more robust geometric quantities and enforce internal consistency in the temporal image sequence.
• Enforcing depth consistency across adjacent frames significantly improves the depth estimation, with far fewer noisy pixels; a sketch of such a consistency term follows.
• The geometric information is implicitly embedded into the neural networks and brings no overhead at inference time.
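A minimal sketch of one such depth-consistency term, under assumed shapes and my own variable names (the paper's exact formulation may differ): the depth predicted for the next frame is sampled at the projected pixel locations and compared against the depth implied by the current frame's prediction and the relative pose.

import torch
import torch.nn.functional as F

def depth_consistency_loss(depth_t, depth_t1, K, K_inv, T_t_to_t1):
    """depth_*: (B,1,H,W); K, K_inv: (B,3,3); T_t_to_t1: (B,4,4) relative pose."""
    B, _, H, W = depth_t.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).reshape(1, 3, -1).expand(B, -1, -1)
    pts = K_inv @ pix * depth_t.reshape(B, 1, -1)          # back-project frame t
    pts = T_t_to_t1[:, :3, :3] @ pts + T_t_to_t1[:, :3, 3:4]
    z_computed = pts[:, 2:3, :].reshape(B, 1, H, W)        # depth frame t+1 should see
    uv = K @ pts
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)            # project into frame t+1
    grid = uv.permute(0, 2, 1).reshape(B, H, W, 2)
    grid = 2 * grid / torch.tensor([W - 1., H - 1.]) - 1.0 # normalize for grid_sample
    z_sampled = F.grid_sample(depth_t1, grid, align_corners=True)
    valid = (grid.abs() <= 1).all(dim=-1).unsqueeze(1)     # in-view pixels only
    diff = (z_computed - z_sampled).abs() / (z_computed + z_sampled).clamp(min=1e-6)
    return (diff * valid).sum() / valid.sum().clamp(min=1)

# Sanity check: identical depths and identity motion give zero inconsistency.
K = torch.eye(3).unsqueeze(0)
print(depth_consistency_loss(torch.ones(1, 1, 8, 16), torch.ones(1, 1, 8, 16),
                             K, K.inverse(), torch.eye(4).unsqueeze(0)))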
14. Digging Into Self-Supervised Monocular Depth Estimation
• Per-pixel ground-truth depth data is challenging to acquire at scale.
• To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation.
• This paper proposes a set of improvements which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods.
• Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods.
• It shows that a surprisingly simple model, and associated design choices, lead to superior predictions:
• (i) a minimum reprojection loss, designed to robustly handle occlusions;
• (ii) a full-resolution multi-scale sampling method that reduces visual artifacts;
• (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions (a sketch of (i) and (iii) follows below).
• https://github.com/nianticlabs/monodepth2
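A minimal sketch of ideas (i) and (iii) under assumed shapes (using plain L1 error rather than the paper's SSIM+L1 mix): take the per-pixel minimum photometric error over source frames, and auto-mask pixels whose unwarped error is already lower than the warped one.

import torch

def photo_error(a, b):
    return (a - b).abs().mean(dim=1, keepdim=True)  # L1 stand-in for SSIM + L1

def min_reprojection_loss(target, warped_sources, raw_sources):
    """target: (B,3,H,W); warped_sources/raw_sources: lists of (B,3,H,W)."""
    reproj = torch.stack([photo_error(target, w) for w in warped_sources])
    identity = torch.stack([photo_error(target, s) for s in raw_sources])
    min_reproj = reproj.min(dim=0).values           # (i) min over source frames
    min_identity = identity.min(dim=0).values
    automask = (min_reproj < min_identity).float()  # (iii) ignore static pixels
    return (automask * min_reproj).sum() / automask.sum().clamp(min=1)

target = torch.rand(2, 3, 32, 32)
warped = [torch.rand(2, 3, 32, 32) for _ in range(2)]  # view-synthesis outputs
raw = [torch.rand(2, 3, 32, 32) for _ in range(2)]     # unwarped source frames
print(min_reprojection_loss(target, warped, raw))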
18. Learning monocular visual odometry with dense 3D mapping from dense 3D flow
• This paper introduces a fully deep learning approach to monocular SLAM, which performs simultaneous localization, using a neural network for learned visual odometry (L-VO), and dense 3D mapping.
• Dense 2D flow and a depth image are generated from monocular images by sub-networks, which are then used by a 3D flow association layer in the L-VO network to generate dense 3D flow.
• Given this 3D flow, the dual-stream L-VO network can then predict the 6DOF relative pose and furthermore reconstruct the vehicle trajectory.
• In order to learn the correlation between motion directions, bivariate Gaussian modeling is employed in the loss function; a sketch of such a loss follows.
• Moreover, the learned depth is leveraged to generate a dense 3D map.
• As a result, an entire visual SLAM system, that is, learned monocular odometry combined with dense 3D mapping, is achieved.
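A minimal sketch (my notation, not the paper's code) of a bivariate Gaussian negative log-likelihood over two correlated translation components, which is what lets the loss capture correlation between motion directions:

import torch

def bivariate_gaussian_nll(mu, log_sigma, rho, target):
    """mu, log_sigma, target: (B,2); rho: (B,), correlation in (-1,1)."""
    z = (target - mu) / torch.exp(log_sigma)       # standardized residuals
    one_m_rho2 = (1 - rho ** 2).clamp(min=1e-6)
    quad = (z[:, 0] ** 2 - 2 * rho * z[:, 0] * z[:, 1] + z[:, 1] ** 2) / one_m_rho2
    log_norm = log_sigma.sum(dim=1) + 0.5 * torch.log(one_m_rho2)
    return (0.5 * quad + log_norm).mean()          # constant term dropped

# Example: correlated x/z translation components from a pose regressor.
mu = torch.zeros(8, 2, requires_grad=True)
log_sigma = torch.zeros(8, 2, requires_grad=True)
rho = torch.tanh(torch.zeros(8, requires_grad=True))   # keeps rho in (-1, 1)
loss = bivariate_gaussian_nll(mu, log_sigma, rho, torch.randn(8, 2))
loss.backward()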
22. Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation
• This work addresses the unsupervised learning of several interconnected problems in low-level vision: single-view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions.
• The key insight is that these four fundamental vision problems are coupled through geometric constraints.
• Consequently, learning to solve them together simplifies the problem, because the solutions can reinforce each other.
• The work goes beyond previous efforts by exploiting geometry more explicitly and segmenting the scene into static and moving regions.
• To that end, it introduces Competitive Collaboration, a framework that facilitates the coordinated training of multiple specialized neural networks to solve complex problems.
• Competitive Collaboration works much like expectation-maximization, but with neural networks that act both as competitors, explaining pixels that correspond to static or moving regions, and as collaborators, through a moderator that assigns pixels to be either static or independently moving; a sketch of this mask-weighted competition follows.
• The method integrates all these problems in a common framework and simultaneously reasons about the segmentation of the scene into moving objects and the static background, the camera motion, the depth of the static scene structure, and the optical flow of moving objects.
• All models and code are available at https://github.com/anuragranj/cc.
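A minimal sketch of the competition step (assumed tensors, not the repository's code): a moderator mask routes each pixel either to the static-scene model (depth plus camera motion) or to the flow model, and the moderator's target is the model that currently explains the pixel better, much like an E-step assignment.

import torch

def competitive_loss(err_static, err_flow, mask):
    """err_*: (B,1,H,W) photometric errors; mask: (B,1,H,W), 1 = static pixel."""
    return (mask * err_static + (1 - mask) * err_flow).mean()

def moderator_target(err_static, err_flow):
    """Route each pixel to whichever competitor currently explains it better."""
    return (err_static < err_flow).float()

err_static = torch.rand(2, 1, 32, 32)   # reprojection error of the rigid model
err_flow = torch.rand(2, 1, 32, 32)     # reprojection error of the flow model
mask = torch.sigmoid(torch.zeros(2, 1, 32, 32, requires_grad=True))
loss = competitive_loss(err_static, err_flow, mask)
loss.backward()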
27. Estimating Metric Scale Visual Odometry from Videos using 3D Convolutional Networks
• Monocular visual odometry (VO) is a heavily studied topic in robotics, as it enables robust 3D localization with a ubiquitous, lightweight sensor: a single camera.
• With geometric methods, scale accuracy can only be achieved in one of two ways: 1) by fusing information from a sensor that measures physical units, such as an IMU or GPS receiver, or 2) by exploiting prior knowledge about objects in a scene, such as their typical size.
• The paper presents an end-to-end deep learning approach for metric scale-sensitive regression tasks, such as visual odometry, with a single camera and no additional sensors.
• It proposes a 3D convolutional architecture, 3DC-VO, that can leverage temporal relationships over a short moving window of images to estimate linear and angular velocities; a sketch of such a 3D-convolutional regressor follows.
• The network makes local predictions on stacks of images that can be integrated to form a full trajectory.
• https://www.github.com/alexanderkoumis/3dc_vo
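A minimal sketch (invented layer sizes, not the 3DC-VO architecture) of a 3D-convolutional regressor mapping a short frame stack to linear and angular velocity; the point is that time becomes a convolution axis.

import torch
import torch.nn as nn

class TinyVO3D(nn.Module):
    """Expects clips of shape (B, 3, T, H, W), e.g. T = 5 stacked frames."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(32, 2)   # [linear velocity, angular velocity]

    def forward(self, clip):
        return self.head(self.features(clip).flatten(1))

model = TinyVO3D()
velocities = model(torch.rand(2, 3, 5, 96, 320))
print(velocities.shape)   # torch.Size([2, 2])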
28. Estimating Metric Scale Visual Odometry from Videos using 3D Convolutional Networks
A 3D convolution
Generic subnetwork structure
30. GANVO: Unsupervised Deep Monocular Visual Odometry and Depth Estimation with Generative Adversarial Networks
• In the last decade, supervised deep learning approaches have been extensively employed in visual odometry (VO) applications; these are not feasible in environments where labelled data is not abundant.
• On the other hand, unsupervised deep learning approaches for localization and mapping in unknown environments from unlabelled data have received comparatively less attention in VO research.
• This study proposes a generative unsupervised learning framework that predicts the 6-DoF camera pose and a monocular depth map of the scene from unlabelled RGB image sequences, using deep convolutional Generative Adversarial Networks (GANs).
• A supervisory signal is created by warping view sequences and adopting the re-projection error as the objective loss function for the multi-view pose estimation and single-view depth generation networks; a sketch of the adversarial supervision follows.
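A minimal sketch of the adversarial part (invented module sizes, not the paper's code): the target view reconstructed by warping, here faked with a random tensor, is judged by a discriminator, so the depth/pose generator receives an adversarial loss on top of the re-projection term.

import torch
import torch.nn as nn

disc = nn.Sequential(                              # tiny stand-in discriminator
    nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
)
bce = nn.BCEWithLogitsLoss()

real_tgt = torch.rand(2, 3, 64, 208)                        # observed target frame
recon_tgt = torch.rand(2, 3, 64, 208, requires_grad=True)   # warped reconstruction

# Discriminator step: real target vs. reconstructed target.
d_loss = bce(disc(real_tgt), torch.ones(2, 1)) + \
         bce(disc(recon_tgt.detach()), torch.zeros(2, 1))

# Generator step: fool the discriminator, plus the photometric re-projection term.
g_loss = bce(disc(recon_tgt), torch.ones(2, 1)) + (recon_tgt - real_tgt).abs().mean()
g_loss.backward()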
31.–34. GANVO: Unsupervised Deep Monocular Visual Odometry and Depth Estimation with Generative Adversarial Networks (figure-only slides)
35. DeepPCO: End-to-End Point Cloud Odometry through Deep Parallel Neural Network
• Odometry is of key importance for localization in the absence of a map.
• There is considerable work in the area of visual odometry (VO), and recent advances in deep learning have brought novel approaches to VO which directly learn salient features from raw images.
• These learning-based approaches have led to more accurate and robust VO systems.
• However, they have not yet been well applied to point cloud data.
• This work investigates how to exploit deep learning to estimate point cloud odometry (PCO), which may serve as a critical component in point cloud-based downstream tasks or learning-based systems.
• Specifically, it proposes an end-to-end deep parallel neural network called DeepPCO, which can estimate 6-DOF poses from consecutive point clouds.
• Rather than a single neural network, it consists of two parallel sub-networks that estimate the 3-D translation and the orientation respectively; a sketch of this two-branch design follows.
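A minimal sketch of the two-branch idea (invented layer sizes and input encoding, not DeepPCO itself), assuming consecutive point clouds have been projected to image-like tensors: separate sub-networks regress translation and orientation, trained jointly.

import torch
import torch.nn as nn

class TwoBranchOdometry(nn.Module):
    """Input: (B, C, H, W) projections of two consecutive point clouds."""
    def __init__(self, in_ch=2):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 3),
            )
        self.translation = branch()    # predicts (x, y, z)
        self.orientation = branch()    # predicts 3 orientation parameters

    def forward(self, x):
        return self.translation(x), self.orientation(x)

model = TwoBranchOdometry()
t, r = model(torch.rand(4, 2, 64, 512))       # two stacked range projections
loss = t.pow(2).mean() + r.pow(2).mean()      # stand-in for the weighted pose loss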
39. DeepPCO: End-to-End Point Cloud Odometry through Deep Parallel Neural Network
Ablation experiment with single-branch fully connected layers. All parameter configurations of the convolutional and fully connected layers are the same as in DeepPCO. Unlike DeepPCO, in which the transformation vector is trained using two branches, the 3-D translation (x, y, z) and orientation (i, j, k) are jointly trained and inferred by a single branch here.
40. DeepPCO: End-to-End Point Cloud Odometry through Deep Parallel Neural Network
"Deep Learning for Laser Based Odometry Estimation"
41. DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency
• https://github.com/vt-vl-lab/DF-Net
• It presents an unsupervised learning framework for simultaneously training single-view depth prediction and optical flow estimation models using unlabeled video sequences.
• Existing unsupervised methods often exploit brightness constancy and spatial smoothness priors to train depth or flow models.
• This paper proposes to leverage geometric consistency as an additional supervisory signal.
• The core idea is that, for rigid regions, the predicted scene depth and camera motion can be used to synthesize 2D optical flow by back-projecting the induced 3D scene flow.
• The discrepancy between the rigid flow (from depth prediction and camera motion) and the estimated flow (from the optical flow model) allows imposing a cross-task consistency loss; a sketch of the rigid-flow computation follows.
• While all the networks are jointly optimized during training, they can be applied independently at test time.
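A minimal sketch (assumed shapes and variable names, not the repository's code) of the rigid flow used for the cross-task consistency loss: depth and camera motion induce a pixel displacement field that, on rigid regions, should match the flow network's output.

import torch

def rigid_flow(depth, K, K_inv, T):
    """depth: (B,1,H,W); K, K_inv: (B,3,3); T: (B,4,4). Returns (B,2,H,W) flow."""
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).reshape(1, 3, -1).expand(B, -1, -1)
    pts = K_inv @ pix * depth.reshape(B, 1, -1)     # back-project to 3D
    pts = T[:, :3, :3] @ pts + T[:, :3, 3:4]        # apply camera motion
    uv = K @ pts
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)     # re-project to pixels
    return (uv - pix[:, :2]).reshape(B, 2, H, W)    # induced pixel displacement

def cross_task_consistency(flow_pred, depth, K, K_inv, T, rigid_mask):
    """Penalize flow-network output that disagrees with rigid flow on rigid pixels."""
    return ((flow_pred - rigid_flow(depth, K, K_inv, T)).abs() * rigid_mask).mean()

# Sanity check: identity camera motion induces zero rigid flow.
K = torch.eye(3).unsqueeze(0)
print(rigid_flow(torch.ones(1, 1, 8, 16), K, K.inverse(), torch.eye(4).unsqueeze(0)).abs().max())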
45. Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding
• https://github.com/chenxuluo/EPC
• Learning to estimate 3D geometry in a single frame, and optical flow from consecutive frames, by watching unlabeled videos with deep CNNs has made significant progress recently.
• Current state-of-the-art methods treat the two tasks independently. One typical assumption of existing depth estimation methods is that the scenes contain no independently moving objects, whereas object motion can be easily modeled using optical flow.
• This paper proposes to address the two tasks as a whole, i.e. to jointly understand per-pixel 3D geometry and motion.
• This eliminates the need for a static-scene assumption and enforces the inherent geometric consistency during the learning process, yielding significantly improved results for both tasks.
• The method is called "Every Pixel Counts++", or "EPC++".
• Specifically, during training, given two consecutive frames from a video, three parallel networks predict the camera motion (MotionNet), the dense depth map (DepthNet), and the per-pixel optical flow between the two frames (OptFlowNet), respectively.
• The three types of information are fed into a holistic 3D motion parser (HMP), and the per-pixel 3D motion of both the rigid background and the moving objects is disentangled and recovered; a sketch of this decomposition follows.
• Various loss terms are formulated to jointly supervise the three networks.
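A minimal sketch of such a parser, following the description above rather than the released code (the shapes and the use of a flow-warped next-frame depth are my assumptions): total per-pixel 3D motion comes from flow plus depth, the rigid part from camera motion, and the residual is attributed to moving objects.

import torch

def backproject(depth, K_inv, flow=None):
    """Lift pixels (optionally displaced by optical flow) to 3D points."""
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).reshape(1, 3, -1).expand(B, -1, -1).clone()
    if flow is not None:
        pix[:, :2] += flow.reshape(B, 2, -1)
    return K_inv @ pix * depth.reshape(B, 1, -1)    # (B, 3, H*W)

def parse_motion(depth_t, depth_t1_warped, flow, K_inv, T):
    """Split per-pixel 3D motion into camera-induced and object-induced parts."""
    pts_t = backproject(depth_t, K_inv)                        # scene at time t
    pts_t1 = backproject(depth_t1_warped, K_inv, flow=flow)    # where pixels land
    total_motion = pts_t1 - pts_t
    rigid_motion = (T[:, :3, :3] @ pts_t + T[:, :3, 3:4]) - pts_t
    object_motion = total_motion - rigid_motion                # residual = movers
    return rigid_motion, object_motion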
46.–50. Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding (figure-only slides; slide 47 shows DepthNet)