This document describes a new method for 2D and 3D object detection and classification for autonomous vehicles using LiDAR and camera (CCD) sensors. It proposes generating object proposals from LiDAR point-cloud data by filtering points, projecting them to 2D, and segmenting edges. The proposals are then classified with an R-FCN neural network on the CCD image. Class labels from the R-FCN are mapped back to the edge points to determine the 3D orientation and to expand the bounding box over occluded regions. Evaluation on the KITTI dataset shows accurate and fast object detection compared to previous methods.
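The filtering-and-projection step described above can be sketched with a simple pinhole model. This is a minimal illustration, not the paper's pipeline: the function name, the intrinsics (fx, fy, cx, cy), and the sample points are all made up for the example.

```python
# Minimal sketch of projecting LiDAR points into the camera image plane:
# keep points in front of the camera and map them to pixel coordinates
# with a pinhole model. Intrinsics and points are illustrative only.

def project_points(points, fx, fy, cx, cy):
    """Project 3D points (camera frame, z forward) to (u, v) pixels,
    filtering out points behind the camera."""
    pixels = []
    for x, y, z in points:
        if z <= 0:              # behind the camera: discard
            continue
        u = fx * x / z + cx     # perspective division plus principal point
        v = fy * y / z + cy
        pixels.append((u, v))
    return pixels

# One point 10 m ahead and slightly offset; one behind the camera.
print(project_points([(1.0, 0.5, 10.0), (0.0, 0.0, -5.0)], 700, 700, 640, 360))
```

The surviving 2D points would then be grouped and edge-segmented to form the object proposals.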
Annotation tools for ADAS & Autonomous Driving - Yu Huang
The document lists over 30 tools for annotating images, videos, and point cloud data. Many of the tools are open source and used for tasks like object detection, segmentation, and labeling. The tools cover a wide range of domains from natural images to LiDAR point clouds and include both online and desktop-based annotation solutions.
3-d interpretation from single 2-d image V - Yu Huang
The document outlines several approaches for monocular 3D object detection from a single 2D image for autonomous driving applications. It summarizes MonoRUn, which uses self-supervised dense correspondences and geometry along with uncertainty propagation. It also summarizes M3DSSD, which uses feature alignment and asymmetric non-local attention in a single-stage detector. Additionally, it discusses analyzing and addressing localization errors, integrating differentiable NMS into training, and a flexible framework that decouples and adapts approaches for truncated vs normal objects.
Lidar for Autonomous Driving II (via Deep Learning) - Yu Huang
The document outlines research on using LiDAR data for autonomous vehicle object detection. It begins with an introduction to sensor fusion techniques using LiDAR and camera data. Several deep learning approaches for 3D object detection from LiDAR point clouds are then summarized, including methods that project the point cloud into 2D feature maps or 3D voxel grids as input to convolutional networks. Finally, techniques for exploiting HD maps and performing real-time on-device detection are discussed. The document provides an overview of the state-of-the-art in LiDAR-based object detection for autonomous driving applications.
Fisheye Omnidirectional View in Autonomous Driving - Yu Huang
This document discusses several papers on using omnidirectional/fisheye camera views for autonomous driving applications. The papers propose methods for tasks such as image classification, object detection, and scene understanding from 360-degree camera data. Specific approaches discussed include graph-based classification of omnidirectional images, learned spherical convolutions for 360-degree imagery, spherical CNNs, and networks for scene understanding and 3D object detection using around-view monitoring camera systems.
RegNet: Multimodal Sensor Registration Using Deep Neural Networks
CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks
RGGNet: Tolerance Aware LiDAR-Camera Online Calibration with Geometric Deep Learning and Generative Model
CalibRCNN: Calibrating Camera and LiDAR by Recurrent Convolutional Neural Network and Geometric Constraints
LCCNet: LiDAR and Camera Self-Calibration using Cost Volume Network
CFNet: LiDAR-Camera Registration Using Calibration Flow Network
NDS (Navigation Data Standard) is a standard for navigation map data that defines the structure and format of map databases. It specifies that data should be organized into product databases, update regions, building blocks, and levels. Building blocks separate data by functional aspects like names, traffic, or map display. Levels partition spatial data by size, with higher levels having larger partitions. This hierarchical structure allows flexible versioning and integration of map data from different suppliers.
Depth Fusion from RGB and Depth Sensors III - Yu Huang
The document outlines several methods for fusing RGB and depth sensor data using convolutional neural networks. Key methods discussed include:
- Propagating confidence maps through CNNs to produce dense depth completions from sparse LiDAR data with uncertainty estimates.
- Using CNNs to handle both sparse depth data and dense RGB data for tasks like depth completion and semantic segmentation, by changing only the last layer of the network.
- Fusing sparse 3D LiDAR and dense stereo depth with a CNN to produce high-precision depth estimations, encoding the complementary characteristics of each sensor type.
- Training a morphological neural network on a large RGB-D dataset to learn optimal filter shapes for depth completion from sparse inputs.
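The first bullet's idea of carrying a confidence map alongside the depth can be illustrated with one confidence-weighted (normalized) averaging step on a toy grid. This is a hand-rolled sketch under simplifying assumptions, not the paper's CNN: the function name, kernel shape, and grids are invented for the example.

```python
# One step of confidence-weighted (normalized) averaging on a 2D grid:
# each output value is the confidence-weighted mean of its neighborhood,
# so sparse measurements propagate into empty (zero-confidence) cells.

def normalized_conv(depth, conf, k=1):
    """Return (propagated depth, propagated confidence) for a toy
    (2k+1)x(2k+1) box kernel over nested-list grids."""
    h, w = len(depth), len(depth[0])
    out_d = [[0.0] * w for _ in range(h)]
    out_c = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            num = den = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        num += conf[ii][jj] * depth[ii][jj]
                        den += conf[ii][jj]
            if den > 0:
                out_d[i][j] = num / den              # weighted depth mean
                out_c[i][j] = den / (2 * k + 1) ** 2  # diluted confidence
    return out_d, out_c
```

A single valid measurement in the middle of an empty 3x3 grid spreads to all neighbours in one step, while the output confidence records how thinly it was stretched.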
Classified 3d Model Retrieval Based on Cascaded Fusion of Local Descriptors - ijcga
A core task for fast and accurate retrieval in a content-based 3D search-and-retrieval system is an efficient and effective method for matching similarities between 3D models. This paper proposes a “cascaded fusion of local descriptors” for efficient retrieval of classified 3D models, based on a 2D coloured-logo retrieval methodology suitably modified for the 3D search-and-retrieval tasks widely used in the augmented reality (AR) and virtual reality (VR) fields. Initially, features are extracted from key points using different state-of-the-art local descriptor algorithms and joined to form the feature tuple for each key point. Additionally, a feature vocabulary is created for each descriptor, mapping the tuples to their vocabularies using distance functions applied among the newly created tuples of each point cloud. Subsequently, an inverted index table is formed that maps the 3D models to each tuple. For every query 3D model, only the corresponding 3D models are retrieved, as previously mapped in the inverted index table. Finally, by comparing the frequency of appearance of local features against the first vocabulary, the retrieved list is re-ranked to produce the final list of the most similar 3D models.
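The inverted-index retrieval described in that abstract can be sketched in a few lines. This is a generic illustration of the data structure, not the paper's implementation; the function names and toy tuple IDs are hypothetical.

```python
# Sketch of inverted-index retrieval: map each quantized feature tuple to
# the models containing it, then rank candidates by shared-tuple votes.

def build_inverted_index(model_tuples):
    """model_tuples: {model_id: [tuple_id, ...]} -> {tuple_id: {model_id}}."""
    index = {}
    for model, tuples in model_tuples.items():
        for t in tuples:
            index.setdefault(t, set()).add(model)
    return index

def retrieve(index, query_tuples):
    """Return candidate models ranked by how many query tuples they share."""
    votes = {}
    for t in query_tuples:
        for m in index.get(t, ()):
            votes[m] = votes.get(m, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)

index = build_inverted_index({"a": ["t1", "t2"], "b": ["t2", "t3"]})
print(retrieve(index, ["t1", "t2"]))  # model "a" shares more tuples
```

The paper's final re-ranking by descriptor frequency would replace the simple vote count here.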
The document provides an overview of a vision-based place recognition system for autonomous robots. It discusses the framework of such a system, including sensing, pre-processing, feature extraction, training, classification, and post-processing. Local feature extraction is a key component, involving local feature detection to identify interest points and local feature descriptors to build representations around those points. The system aims to recognize places using visual cues in order to enable robot localization.
Camera-based road Lane detection by deep learning III - Yu Huang
lane detection, deep learning, autonomous driving, CNN, RNN, LSTM, GRU, lane localization, lane fitting, ego lane, end-to-end, vanishing point, segmentation, FCN, regression, classification
Simulation for autonomous driving at Uber ATG - Yu Huang
Testing Safety of SDVs by Simulating Perception and Prediction
LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World
Recovering and Simulating Pedestrians in the Wild
S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling
SceneGen: Learning to Generate Realistic Traffic Scenes
TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors
GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving
AdvSim: Generating Safety-Critical Scenarios for Self-Driving Vehicles
Appendix: (Waymo)
SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving
3-d interpretation from single 2-d image IV - Yu Huang
This document summarizes several methods for monocular 3D object detection from a single 2D image for autonomous driving applications. It outlines methods that use pseudo-LiDAR representations, monocular camera space cubification with an auto-encoder, utilizing ground plane priors, predicting categorical depth distributions, dynamic message propagation conditioned on depth, and utilizing geometric constraints. The methods aim to overcome challenges of monocular 3D detection by leveraging techniques such as depth estimation, 3D feature representation learning, and integrating contextual and depth cues.
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2) - Yu Huang
Canadian Adverse Driving Conditions Dataset, 2020, 2
Deep multimodal sensor fusion in unseen adverse weather, 2020, 8
RADIATE: A Radar Dataset for Automotive Perception in Bad Weather, 2021, 4
Lidar Light Scattering Augmentation (LISA): Physics-based Simulation of Adverse Weather Conditions for 3D Object Detection, 2021, 7
Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather, 2021, 8
DSOR: A Scalable Statistical Filter for Removing Falling Snow from LiDAR Point Clouds in Severe Winter Weather, 2021, 9
The document describes several Google services: Google Reader, which lets users add RSS feeds; Blogger, for creating blogs; Froogle, for product search; GMail, with more than 2 GB of storage; Google AdSense and AdWords, for advertising; Google Alerts, for receiving search alerts; Google Pack, a bundle of Windows software; Google Talk, for instant messaging; and Google Earth, for viewing satellite imagery.
Short Term Rental Statistics Report Altadena (10-2014 to 10-2016) - Robert StGenis
This document contains statistics on short term rental listings and bookings in Altadena, CA from October 2014 to October 2016. It shows the number of available listings, booked listings, average daily rates, occupancy rates, and revenue for entire home rentals, private room rentals, and hotel-comparable listings over this time period. The data was provided by Airdna and demonstrates trends in short term rental activity in Altadena over a two year period.
Sherif Ahmed Mostafa Mohamed has over 15 years of experience in various IT roles including IT support engineer, IT team leader, and current role as an IT project engineer. He has a bachelor's degree in information systems and seeks to apply his skills and acquire new knowledge in a professional organization. His experience includes supporting users, managing IT assets, implementing systems and policies, and providing training.
Helicopter Systems - Part 4 - helicopter systems. Prepared with reference to the FAA Helicopter Flying Handbook. Istanbul Arel University - Emre Akar
Computer security: identity theft - Sofia Rivera
This document describes identity theft and ways to combat it. It explains that identity theft involves someone stealing personal information and then using it to commit fraud by impersonating the victim. It details common methods such as phishing and dumpster diving, and urges readers to protect their information with passwords, to spot fraudulent emails, and to shred documents containing personal data. It also emphasizes the importance of monitoring accounts and reporting any suspicious activity.
The document contains a tweet by Steven Wright about abstract painting without tools, followed by repeated sections of text about dynamic engagement, shared understanding, opportunities, bespoke work, stand-ups, and "done", with increasing numbers of checkmark emojis.
Marcel Hild - Spryker (e)commerce framework as an alternative to traditional... - AboutYouGmbH
Spryker is a commerce technology company that provides a framework for building complex e-commerce sites. The document discusses Spryker's modular architecture, which separates features into independent bundles that can be updated separately, allowing projects to be customized while keeping dependencies minimal. The technology follows SOLID principles and best practices, including frontend/backend separation and a modular-monolith approach.
These slides come from a roundtable discussion on mobile access to information and to corporate knowledge: are mobile devices needed or not? Must business-software vendors also make their products available on mobile? Which scenarios for mobile enterprise content management (ECM) make sense?
Real-time 3D Object Detection on LIDAR Point Cloud using Complex-YOLO V4 - IRJET Journal
This document discusses improving real-time 3D object detection on LiDAR point clouds using an optimized version of Complex-YOLO V4. The original Complex-YOLO model achieves real-time performance; the authors re-implement it on YOLO V4 and compare different rotated-box IoU losses to obtain faster and more accurate detection on the KITTI benchmark. The improved model shows higher accuracy while maintaining real-time performance.
1) The document proposes a vehicle and pedestrian detection method based on spatial pyramid pooling and attention mechanisms to improve YOLOv3.
2) It first uses spatial pyramid pooling to fuse local and global features to better detect targets of different sizes. It then adds an attention mechanism to enhance key features and remove redundant features.
3) Experimental results showed the proposed method achieved 91.4% mAP and 83.2% F1 score on the KITTI dataset, performing better than YOLOv3 in accuracy and speed.
This document proposes and evaluates several deep learning models for unsupervised monocular depth estimation. It begins with background on depth estimation methods and a literature review of recent work. Four depth estimation architectures are then described: EfficientNet-B7, EfficientNet-B3, DenseNet121, and DenseNet161. These models use an encoder-decoder structure with skip connections. An unsupervised loss function is adopted that combines appearance matching, disparity smoothness, and left-right consistency losses. The models are trained on the KITTI dataset and evaluated using standard KITTI metrics, showing improved performance over baseline methods using less training data and lower input resolution.
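The combined unsupervised loss described above can be written out as a weighted sum of its three terms. This is a deliberately simplified sketch on 1D lists: the L1 photometric term stands in for the full appearance-matching loss (which typically also uses SSIM), and the weights are illustrative assumptions.

```python
# Toy sketch of the unsupervised depth loss: appearance matching +
# disparity smoothness + left-right consistency, on 1D lists.

def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def smoothness(disp):
    """Mean absolute disparity gradient (penalizes jagged disparity)."""
    n = max(len(disp) - 1, 1)
    return sum(abs(disp[i + 1] - disp[i]) for i in range(len(disp) - 1)) / n

def total_loss(left, right_warped, disp_l, disp_r,
               w_ap=1.0, w_ds=0.1, w_lr=1.0):
    appearance = l1(left, right_warped)  # reconstruction error
    smooth = smoothness(disp_l)          # disparity smoothness
    lr = l1(disp_l, disp_r)              # left-right consistency
    return w_ap * appearance + w_ds * smooth + w_lr * lr
```

A perfect reconstruction with flat, consistent disparities gives zero loss; any photometric mismatch, disparity jaggedness, or left-right disagreement raises it.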
IRJET - Automatic Traffic Sign Detection and Recognition using CNN - IRJET Journal
This document presents a method for automatic traffic sign detection and recognition using convolutional neural networks (CNNs). The proposed system first enhances input images and performs thresholding and region extraction. Features are then extracted and the images are classified using a CNN whose architecture includes convolutional, ReLU, pooling, and fully connected layers. The system achieves over 88% mean average precision in detection and boundary-estimation errors under 3 pixels. It runs in real time at over 7 frames per second on mobile platforms, providing accurate traffic sign detection, recognition, and boundary estimation, and is robust to occlusion, blurring, and small targets compared with other methods.
Goal location prediction based on deep learning using RGB-D camera - journalBEEI
In a navigation system, the desired destination position plays an essential role, since path-planning algorithms take the current location, the goal location, and a map of the surrounding environment as inputs. The generated path is then used to guide a user to the final destination. This paper presents an algorithm based on an RGB-D camera to predict goal coordinates in a 2D occupancy grid map for a navigation system for visually impaired people. In recent years, deep learning methods have been used in many object detection tasks, so a convolutional-neural-network object detection method is adopted in the proposed algorithm. The distance between the sensor's current position and the detected object is measured from the depth data acquired by the RGB-D camera. The detected object coordinates and the depth data are integrated to obtain an accurate goal location in the 2D map. The algorithm has been tested in various real-time scenarios, and the experimental results indicate its effectiveness.
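Turning a detection plus its depth into a grid-map goal cell amounts to back-projecting the pixel through a pinhole model and quantizing to the map resolution. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function name, intrinsics, map origin, and resolution are all invented for the example.

```python
# Sketch: convert a detected object's pixel column and measured depth
# into a 2D occupancy-grid cell via pinhole back-projection.

def goal_cell(u, depth, fx, cx, resolution, origin_x, origin_z):
    """u: pixel column of the detection; depth: metres from the sensor;
    fx, cx: camera intrinsics; resolution: metres per grid cell;
    origin_x, origin_z: world coordinates of grid cell (0, 0)."""
    x = (u - cx) * depth / fx            # lateral offset in metres
    z = depth                            # forward distance in metres
    col = int((x - origin_x) / resolution)
    row = int((z - origin_z) / resolution)
    return row, col

# Object detected 100 px right of the principal point, 10 m away.
print(goal_cell(740, 10.0, 700, 640, 0.5, -5.0, 0.0))
```

The resulting (row, col) cell is what a grid-based path planner would consume as the goal input.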
Object gripping algorithm for robotic assistance by means of deep learning - IJECEIAES
This document presents a new algorithm using deep learning techniques for object gripping by a robot. It uses a Faster R-CNN to classify and locate three types of objects (cylinder, parallelepiped, toroid) with 100% accuracy. It also uses a CNN for regression to estimate the rotation angle of the parallelepiped with a mean error of 0.769 degrees. Testing in a virtual environment showed the algorithm could classify objects and grip them successfully at 5 frames per second using a three-finger gripper for increased stability over a two-finger gripper.
An Analysis of Various Deep Learning Algorithms for Image Processing - vivatechijri
The many applications of image processing have given it a wide scope in data analysis. Machine learning algorithms provide a powerful environment for training models to identify the various entities in images and segment them accordingly. While image classifiers such as Support Vector Machines (SVM) and Random Forest algorithms do justice to the task, deep learning algorithms such as Artificial Neural Networks (ANN) and, above all, the well-known and extremely powerful Convolutional Neural Network (CNN) bring a new dimension to the image-processing domain, with far higher accuracy and computational power for classifying images and segregating their entities as individual components of the working region. The major focus is on the Region-based Convolutional Neural Network (R-CNN) algorithm and how well it provides pixel-level segmentation through its successors, the Fast, Faster, and Mask R-CNN versions.
A REVIEW ON IMPROVING TRAFFIC-SIGN DETECTION USING YOLO ALGORITHM FOR OBJECT ... - IRJET Journal
This document reviews improving traffic sign detection using the YOLO algorithm for object detection. It begins by discussing previous work on traffic sign detection and recognition that used techniques like mobile LiDAR, sparse R-CNN neural networks, and improvements to YOLOv4-Tiny. It then examines the YOLO algorithm and how it uses convolutional neural networks for real-time object detection with a single propagation through the network. The document proposes using an improved YOLO algorithm for traffic sign detection to address limitations in existing techniques. It discusses the methodology of object detection, recognition and localization using neural networks and how YOLO has been applied for applications like traffic sign detection.
The document describes team ICTANS's pipeline for position and orientation estimation of cars and pedestrians using sensors such as LiDAR, radar, and cameras. Their framework applies R-FCN detectors to front and bird's-eye views of the LiDAR data, with sensor fusion to estimate obstacle positions. It also covers second-round improvements, such as using only Velodyne data and Kalman filtering for detection at various ranges while satisfying real-time constraints. The team achieved a score of 0.332 and a rank of 5 with this approach.
Traffic Light Detection and Recognition for Self Driving Cars using Deep Lear... - ijtsrd
This document summarizes a research paper that proposes a deep learning model for detecting and recognizing traffic lights using transfer learning. It begins with an introduction describing the challenges of autonomous vehicle perception and how deep learning can help overcome these challenges. It then reviews 9 other related works on traffic light detection using techniques like RFID, background subtraction, object tracking networks and HSV color modeling. Finally, it proposes a model using Faster R-CNN and Inception V2 for transfer learning to detect traffic lights in images and determine their state (red, yellow, or green). The model is trained on a dataset of Indian traffic signals.
Licence Plate Recognition Using Supervised Learning and Deep Learning - IRJET Journal
1. The document discusses using supervised learning and deep learning techniques for license plate recognition (LPR). It analyzes direct and indirect recognition algorithms and compares features of existing LPR systems.
2. A proposed LPR system is described that uses image preprocessing, license plate detection, character segmentation, and character recognition. Preprocessing improves image quality before detecting the license plate region.
3. The proposed system applies contour tracing and Canny edge detection algorithms to the license plate region to sharpen character edges for recognition.
This document describes a proposed method for real-time object detection using Single Shot Multi-Box Detection (SSD) with the MobileNet model. SSD is a single, unified network for object detection that eliminates feature resampling and combines predictions. MobileNet is used to create a lightweight network by employing depthwise separable convolutions, which significantly reduces model size compared to regular convolutions. The proposed SSD with MobileNet model achieved improved accuracy in identifying real-time household objects while maintaining the detection speed of SSD.
Traffic Sign Detection and Recognition for Automated Driverless Cars Based on... - ijtsrd
1) The document discusses a proposed method for detecting and recognizing traffic lights and signs for autonomous vehicles using Faster Region-Based Convolutional Neural Network (F-RCNN).
2) The method uses transfer learning with the F-RCNN Inception V2 model in TensorFlow to identify different traffic light and sign classes from images.
3) Prior related work on traffic light and sign detection is also discussed, including methods using image processing techniques as well as deep learning methods like convolutional neural networks.
Lane and Object Detection for Autonomous Vehicle using Advanced Computer Vision - YogeshIJTSRD
The vision of this project is to develop lane and object detection for an autonomous vehicle system that runs efficiently in normal road conditions, replacing the high-cost LiDAR system with high-resolution cameras, advanced computer vision, and deep learning to provide an Advanced Driver Assistance System (ADAS). Detecting lane lines is a crucial task for any self-driving autonomous vehicle, so this project identifies lane lines on the road using OpenCV tools such as colour selection, region-of-interest selection, grey-scaling, Canny edge detection, and perspective transformation. The project integrates two systems to solve the real-time implementation problem in autonomous vehicles. The first is lane detection by advanced computer vision techniques, detecting the lane lines so the vehicle can be commanded to stay inside the lane markings. The second is object detection and tracking, which detects and tracks vehicles and pedestrians on the road to build a clear understanding of the environment and to plan a trajectory that navigates the autonomous vehicle safely to its destination without crashes; this uses transfer learning with the Single Shot multibox Detection (SSD) algorithm and the MobileNet architecture. G. Monika | S. Bhavani | L. Azim Jahan Siana | N. Meenakshi, "Lane and Object Detection for Autonomous Vehicle using Advanced Computer Vision", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5, Issue-3, April 2021. URL: https://www.ijtsrd.com/papers/ijtsrd39952.pdf Paper URL: https://www.ijtsrd.com/engineering/electronics-and-communication-engineering/39952/lane-and-object-detection-for-autonomous-vehicle-using-advanced-computer-vision/g-monika
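After colour selection, Canny edges, and the perspective transform, the lane-detection step typically reduces to fitting a line through the surviving edge pixels. A minimal least-squares sketch in pure Python follows; the x = m*y + b parameterization is chosen because lane lines are near-vertical in the image, and the function name and sample points are illustrative only.

```python
# Sketch of the lane-fitting step: least-squares fit of x = m*y + b
# through (x, y) edge pixels from one lane line.

def fit_line(points):
    """Return slope m and intercept b minimizing sum of (x - m*y - b)^2."""
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    syy = sum(p[1] ** 2 for p in points)
    m = (n * sxy - sx * sy) / (n * syy - sy * sy)
    b = (sx - m * sy) / n
    return m, b

# Three collinear edge pixels along x = 0.5 * y.
print(fit_line([(0, 0), (1, 2), (2, 4)]))
```

In practice the edge pixels are first split into left and right lane candidates (for example by slope sign or image half) and each side is fitted separately.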
Automatic Detection of Unexpected Accidents Monitoring Conditions in TunnelsIRJET Journal
The document describes a proposed system to automatically detect accidents and unexpected events in road tunnels using video footage from CCTV cameras. The system would use object detection and tracking technology, along with a Faster R-CNN deep learning model, to identify objects like vehicles, fires, and people in tunnel videos. It would monitor the movement and position of detected objects over time to identify accidents or other irregular events. If an accident is detected, a signal would be sent to alert authorities so they can respond quickly. The system aims to address the challenges of limited visibility and low-quality images from tunnel CCTV cameras.
IRJET - A Survey Paper on Efficient Object Detection and Matching using F...IRJET Journal
This document summarizes an approach for efficient object detection and matching in images and videos. It proposes a classification scheme that classifies extracted features as either object or non-object features. This binary classification approach can be used for object detection and matching in a way that is more robust and faster compared to traditional methods. The classification stage also enables faster object registration. The approach is evaluated to show advantages for object matching and registration compared to other methods. It has potential applications for real-time object tracking and detection.
2019年6月13日、SSII2019 Organized Session: Multimodal 4D sensing。エンドユーザー向け SLAM 技術の現在。登壇者:武笠 知幸(Research Scientist, Rakuten Institute of Technology)
https://confit.atlas.jp/guide/event/ssii2019/static/organized#OS2
This document summarizes an approach for object recognition and 6-DoF pose estimation from RGB-D images. It first segments the scene to isolate objects from background surfaces like tables. It then clusters the remaining points into individual objects. For each object model, it generates synthetic views from different angles and extracts a global feature descriptor combining geometry and color information. During recognition, it extracts descriptors from segmented objects, finds nearest matches in the training database, and estimates the object and its pose. Experimental results demonstrate high recognition rates and pose accuracy, even under occlusion.
Object Detection for Autonomous Cars using AI/MLIRJET Journal
The document discusses using machine learning and computer vision techniques for object detection in autonomous vehicles. Specifically, it proposes using the Single Shot Detector (SSD) algorithm to identify and classify objects around a self-driving car from camera images. The SSD model was trained on a dataset to detect common objects like cars, people, buses etc. and estimate bounding boxes around detected objects. The methodology uses OpenCV and TensorFlow to implement SSD on images from a webcam in real-time. While bounding boxes were sometimes inconsistent in dense traffic, detection was more accurate for objects closer to the camera or in less crowded scenarios. The goal is to demonstrate how computer vision allows autonomous vehicles to perceive their surroundings.
How to Setup Warehouse & Location in Odoo 17 InventoryCeline George
In this slide, we'll explore how to set up warehouses and locations in Odoo 17 Inventory. This will help us manage our stock effectively, track inventory levels, and streamline warehouse operations.
How to Manage Your Lost Opportunities in Odoo 17 CRMCeline George
Odoo 17 CRM allows us to track why we lose sales opportunities with "Lost Reasons." This helps analyze our sales process and identify areas for improvement. Here's how to configure lost reasons in Odoo 17 CRM
The simplified electron and muon model, Oscillating Spacetime: The Foundation...RitikBhardwaj56
Discover the Simplified Electron and Muon Model: A New Wave-Based Approach to Understanding Particles delves into a groundbreaking theory that presents electrons and muons as rotating soliton waves within oscillating spacetime. Geared towards students, researchers, and science buffs, this book breaks down complex ideas into simple explanations. It covers topics such as electron waves, temporal dynamics, and the implications of this model on particle physics. With clear illustrations and easy-to-follow explanations, readers will gain a new outlook on the universe's fundamental nature.
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organisation by the Excellence Foundation for South Sudan on 08th and 09th June 2024 from 1 PM to 3 PM on each day.
This presentation includes basic of PCOS their pathology and treatment and also Ayurveda correlation of PCOS and Ayurvedic line of treatment mentioned in classics.
हिंदी वर्णमाला पीपीटी, hindi alphabet PPT presentation, hindi varnamala PPT, Hindi Varnamala pdf, हिंदी स्वर, हिंदी व्यंजन, sikhiye hindi varnmala, dr. mulla adam ali, hindi language and literature, hindi alphabet with drawing, hindi alphabet pdf, hindi varnamala for childrens, hindi language, hindi varnamala practice for kids, https://www.drmullaadamali.com
This presentation was provided by Steph Pollock of The American Psychological Association’s Journals Program, and Damita Snow, of The American Society of Civil Engineers (ASCE), for the initial session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session One: 'Setting Expectations: a DEIA Primer,' was held June 6, 2024.
বাংলাদেশের অর্থনৈতিক সমীক্ষা ২০২৪ [Bangladesh Economic Review 2024 Bangla.pdf] কম্পিউটার , ট্যাব ও স্মার্ট ফোন ভার্সন সহ সম্পূর্ণ বাংলা ই-বুক বা pdf বই " সুচিপত্র ...বুকমার্ক মেনু 🔖 ও হাইপার লিংক মেনু 📝👆 যুক্ত ..
আমাদের সবার জন্য খুব খুব গুরুত্বপূর্ণ একটি বই ..বিসিএস, ব্যাংক, ইউনিভার্সিটি ভর্তি ও যে কোন প্রতিযোগিতা মূলক পরীক্ষার জন্য এর খুব ইম্পরট্যান্ট একটি বিষয় ...তাছাড়া বাংলাদেশের সাম্প্রতিক যে কোন ডাটা বা তথ্য এই বইতে পাবেন ...
তাই একজন নাগরিক হিসাবে এই তথ্য গুলো আপনার জানা প্রয়োজন ...।
বিসিএস ও ব্যাংক এর লিখিত পরীক্ষা ...+এছাড়া মাধ্যমিক ও উচ্চমাধ্যমিকের স্টুডেন্টদের জন্য অনেক কাজে আসবে ...
How to Make a Field Mandatory in Odoo 17Celine George
In Odoo, making a field required can be done through both Python code and XML views. When you set the required attribute to True in Python code, it makes the field required across all views where it's used. Conversely, when you set the required attribute in XML views, it makes the field required only in the context of that particular view.
How to Build a Module in Odoo 17 Using the Scaffold MethodCeline George
Odoo provides an option for creating a module by using a single line command. By using this command the user can make a whole structure of a module. It is very easy for a beginner to make a module. There is no need to make each file manually. This slide will show how to create a module using the scaffold method.
How to Build a Module in Odoo 17 Using the Scaffold Method
Bidirectional Information Exchange
Abstract— It is essential to collect and analyze environmental information surrounding the vehicle in an autonomous driving environment. To do this, it is necessary to analyze real-time information such as the location, orientation, and size of objects. In particular, handling occlusion and truncation helps ensure safe driving by providing exact information about the vehicle's surroundings. In this paper, we propose a new method that generates object proposals around a vehicle using projected LiDAR information and performs object detection more accurately by exploiting a 2D image-based classifier. The method generates proposals by filtering the LiDAR information into 2D edges with a simple but strong affinity measure. The objects in each proposal are then classified through R-FCN, and the resulting class labels are mapped back to the edge points used to make the proposal. Each class label is combined with the orientation information of the edge in 3D space to complete the object's 3D box. This compensates for the disadvantage of CCD-based CNN classifiers, which have difficulty obtaining spatial information, and addresses problems such as occlusion. We compared our method to state-of-the-art results on the KITTI dataset and obtained good results in terms of speed and accuracy (especially IoU with ground truth).
Keywords—LiDAR; CCD; proposal generation; object detection; R-FCN
I. INTRODUCTION
In recent years, CNNs have made many contributions to fields such as object classification and object detection. In particular, as autonomous driving has become a major topic, deep-network research on object recognition and behavior analysis in the driving environment is being actively carried out. Recent deep-network-based object detection methods are divided into two categories: two-stage and one-stage. In the two-stage type, the proposal generator and the classifier are implemented as independent components, while in the one-stage type they are performed together.
In the two-stage type, R-CNN [1] shows good performance in object classification by scoring the features that exist in each proposed region and classifying the proposed regions on feature maps. This significantly reduces computational cost compared to the sliding-window approach and thus has an advantage in speed. Faster R-CNN [2] devised the Region Proposal Network (RPN), which generates proposals through feature sharing, exploiting the fact that the convolution layers themselves contain enough information to represent objects. HyperNet [3] replaces the features used in Faster R-CNN by pooling several times to compensate for the fact that the FC5-layer information is insufficient for small objects; it creates a hyper feature set and uses it to match objects of various sizes. There is, however, a dilemma between translation invariance and translation variance when object detection and classification are linked. To resolve this dilemma, R-FCN [4] divides the ROI into k × k lattices and finds objects at the center of the ROI through local voting.
In the one-stage type, bounding-box prediction and class-probability calculation are performed together, as in YOLO [5]. YOLO generates the class-probability map and the bounding-box confidence map directly on a grid without generating separate proposals, and classifies each bounding box immediately, greatly simplifying the overall pipeline. Fast YOLO [5] simplifies the network slightly and reaches a computation speed of 155 fps. In a similar approach, the Single Shot Detector (SSD) [6] evaluates boxes and confidences through multi-scale features in the convolution layers, like HyperNet.
For autonomous navigation, however, additional information is needed beyond detecting and classifying objects well. The most important such information concerns occlusion and truncation among objects; both are essential for understanding the actual position and arrangement of nearby objects in the driving environment. Existing methods can classify occluded and truncated objects, but they handle the obscured part of an object poorly.
In both types, a bounding box is created based on the characteristics of the unobscured part of an occluded object, and the object is classified from the part of the object represented in the box. So, although the class of the object can be found, the size of the actual object in the hidden area cannot be estimated.
Another problem is obtaining spatial information such as the 3D location and orientation of a detected object and its actual size. In order to grasp the traffic flow around the vehicle, it is very difficult to estimate accurate 3D spatial information from a single RGB image alone. Using 3D depth information is therefore the most intuitive approach. 3DOP [7] generates a depth map from a stereo image and projects each pixel of the RGB image into 3D space; it defines the relationship between pixels as an MRF energy function over several properties and classifies objects with a linear SVM. However, the sophisticated stereo-based depth-map generation used in 3DOP carries a large computational cost. Depth can also be obtained without stereo, for example from laser sensors such as LiDAR, which collect 3D spatial information very quickly. Among these studies, Vote3D [8] learns directly in 3D space from LiDAR through 3D voxels. 3DVP classifies the LiDAR point cloud into 3D voxels and applies 2D alignment with a 3D CAD model. In addition, there are studies in which LiDAR points are projected onto 2D to generate 2D depth maps. But LiDAR information is inherently sparse, and it is difficult to learn from it reliably because it gives non-uniform returns depending on the surface of the environment or object. Based on the RPN + classifier structure of Faster R-CNN,
SubCNN [9] adds subcategory information to both stages and corrects the area according to the size and direction information of the subclass.

Figure 1. System overview. The upper network is characterized by scoring through voting over a k × k area with R-FCN. In this study, the ROI of this network is generated by filtering the LiDAR point cloud. Each ROI is classified by voting, and the classified class label is mapped back to the points to expand the box in 3D space, producing the final bounding box for each object.

Particularly in the 'Car' category,
3DVP [10] was applied to identify the 3D location and orientation of the vehicle and to perform 2D and 3D segmentation. However, this method can only be applied to rigid-body models, and many additional samples are required for each subcategory class. In addition, manual alignment is required for all of them, and the sparse LiDAR point cloud makes it difficult to construct sufficient voxels for distant objects. The computation speed is also not fast enough for autonomous driving environments, where real-time processing is essential. Therefore, this study proposes a fast and accurate 2D and 3D object detection and classification method suitable for the autonomous driving environment by combining CCD image and LiDAR sensor information. The proposed method uses the LiDAR points as 2D data in which all points along the z-axis are collapsed into one plane instead of a 3D voxel grid, which guarantees a much higher density than the conventional representation. We filter these points and group edges with high affinity to generate 2D proposals. Each generated proposal is used as an ROI of the R-FCN network, and the bounding box of the object in 3D space is created by mapping the classification result back to the constituent edge points. Finally, projecting this box onto 2D again allows the box to be expanded into the hidden region that does not appear in the 2D image.
Our contributions through this study are:
- The method replaces the existing proposal generator and improves speed by combining simplified LiDAR sensor information with a powerful CCD-based CNN architecture.
- 3D spatial information about the objects around the vehicle is acquired at the same time, using only the information generated while making the proposals, without any additional process.
- The method expands the bounding box of an occluded or truncated region, extending the box closer to the actual size of the object.
To this end, Section II-A shows how to perform a mutual projection between 2D and 3D points using a simple method. Section II-B uses this to detect objects with the CCD-based state-of-the-art R-FCN classifier. Section II-C determines the optimal bounding box for the actual class by expanding the box based on the orientation information obtained in II-A in 3D space together with the detected class label. Section III quantitatively verifies the average precision (AP) on the object and tracking sequences of the KITTI dataset, compares it with previous studies, and shows through figures the expansion results for obscured areas.
II. BIDIRECTIONAL INFORMATION EXCHANGE
A. Proposal Generation with LiDAR
We propose 'bidirectional projection filtering' (BPF), which exchanges 2D and 3D information through the calibration matrix between the CCD image and the 3D point cloud, as shown in Figure 2. Through this method, it is possible to generate object proposals, estimate obscured areas, and obtain the actual size and 3D spatial information of detected objects based on the CCD image. In this section, we describe how to remove point information that is not needed for proposal generation and how to exchange information between 2D and 3D space through projection. As shown in Figure 2-(b), we remove the ground to separate the object points and eliminate points outside the CCD's angle of view. Next, we project along the z-axis onto one plane to obtain a dense 2D object shape. Then the error points are filtered, the remaining edges are grouped based on the affinity between neighboring edges, and a proposal is generated for each edge group.
Figure 2. Point filtering through 2D projection of LiDAR. (a) original image, (b) LiDAR point cloud with ground removed, (c) removal of points outside the angle of view and dimension reduction to increase density, (d) 2D projection and filtering of points, (e) 3D point cloud before filtering, and (f) 2D point cloud after filtering.
Ground plane removal – In the LiDAR environment, the simplest way to remove the ground is to treat the base of the vehicle carrying the sensor as the ground plane and remove all LiDAR points below that height. In this paper, we instead propose a surface-detection method that accounts for the curvature of the road surface, since the road is not infinitely flat and free of obstacles. Among many possible methods, this study exploits the fact that objects in the driving environment have a volume with some height. Using a voxel grid of 20 × 20 × 10 cm, the height span of the points constituting each cell is obtained for every XY cell in which points exist.
HPS_ij = max_{p ∈ PS_ij} p_z − min_{p ∈ PS_ij} p_z    (1)

G(i, j) = { C_ij | HPS_ij = 0 }    (2)

When the coordinates of the XY plane are i and j, each grid cell is denoted C_ij, and the point set belonging to C_ij is PS_ij. In equation (1), HPS_ij is the height span of each cell. The ground G(i, j) in equation (2) is the set of grid cells with zero height. Figure 2-(b) shows the point cloud with the ground area removed in this way.
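As an illustration, the cell-height test above can be sketched as follows. This is a minimal pure-Python sketch, not the paper's implementation; the point data, function name, and the small non-zero height tolerance are our own assumptions (the paper's equation (2) uses exactly zero height).

```python
from collections import defaultdict

def remove_ground(points, cell=0.2, height_thresh=0.05):
    """Drop points in XY cells whose height span is ~zero (flat ground).

    points: list of (x, y, z) LiDAR returns, z up, in meters.
    cell:   XY grid resolution (0.2 m = the 20 x 20 cm cells above).
    height_thresh: HPS_ij at or below this is treated as zero (ground).
    """
    cells = defaultdict(list)
    for p in points:
        i, j = int(p[0] // cell), int(p[1] // cell)
        cells[(i, j)].append(p)

    kept = []
    for ps in cells.values():
        zs = [p[2] for p in ps]
        hps = max(zs) - min(zs)      # HPS_ij: height span of the cell
        if hps > height_thresh:      # objects have volume; ground does not
            kept.extend(ps)
    return kept
```

Cells whose points all lie at (nearly) the same height are treated as ground and discarded; cells containing a vertical span survive, which is exactly the volume argument made above.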
Outside point elimination – The point set with the ground removed must be simplified further. First, the LiDAR points are multiplied by the calibration matrix, all points outside the x-axis range of the CCD image are removed, and only the remaining points are processed. The first reason for performing a CCD projection is that LiDAR points outside the angle of view can be removed easily. Second, since the horizontal-axis search reference changes from radians to pixels through projection, filtering along the horizontal axis can be performed in a very simple manner. Figure 2-(c) shows that the range of points to be processed is greatly reduced.
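The projection-and-clip step can be sketched as below. We assume a 3 × 4 projection matrix P of the kind distributed with KITTI calibration files; the function name and return shape are hypothetical, and a real pipeline would use the dataset's actual calibration.

```python
def project_and_clip(points, P, img_w, img_h):
    """Project 3D LiDAR points with a 3x4 calibration matrix P and keep
    only points that land inside the CCD image (and in front of the
    camera). Returns (kept_points, pixels) with matching indices, so a
    pixel can be traced back to its 3D point later.
    """
    kept, pixels = [], []
    for p in points:
        x, y, z = p
        u = P[0][0]*x + P[0][1]*y + P[0][2]*z + P[0][3]
        v = P[1][0]*x + P[1][1]*y + P[1][2]*z + P[1][3]
        w = P[2][0]*x + P[2][1]*y + P[2][2]*z + P[2][3]
        if w <= 0:                   # behind the image plane
            continue
        px, py = u / w, v / w        # homogeneous -> pixel coordinates
        if 0 <= px < img_w and 0 <= py < img_h:
            kept.append(p)
            pixels.append((px, py))
    return kept, pixels
```

Keeping the index correspondence between `kept` and `pixels` is what later allows the bidirectional exchange: labels assigned in the image can be mapped back to the originating 3D points.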
Z-axis elimination – The sparse LiDAR points remaining after ground removal are collapsed into a two-dimensional plane by projecting all points along the z-axis, both to increase density and to speed up computation. Since the reachable height is 50 cm or more, PS^z_ij = −0.9 is set for every point set PS_ij in order to locate the plane 50 cm above the ground, relative to where the vehicle's LiDAR sensor is installed. However, as shown in Figure 2-(b), because 3D points from the original 3D space are pressed into one plane, the point density increases, but some portions look like noise due to depth differences.
Edge filtering in CCD – Using the 2D LiDAR point cloud obtained by CCD projection has several advantages. In 3D space, LiDAR spreads radially from its origin and obtains distance information, so extracting the closest contour of an object from the shooting point requires computing distances over all angles, which are continuous values. If we project onto the CCD, however, the discrete pixel grid becomes the reference domain, as shown in Figure 2-(d). As shown in Figure 3, duplicate points can then easily be removed by height or depth within each pixel column. In addition, since error pixels can easily be discriminated through a gradient operation proportional to the distance between neighboring pixels, the edge closest to the sensing vehicle can be generated in each column along the x-axis.
hPs_i = min_{p ∈ Cl_i} h(p)    (3)

In equation (3), Cl_i denotes each x-axis column containing at least one projected LiDAR point, and hPs_i denotes the minimum height within Cl_i. We compute hPs_i for all i and remove noise with a one-dimensional median filter [11]. At this point, not only the height of the point selected by the median filter but also the index of the corresponding point is copied, so that the noisy point can be removed in 3D space as well. Figure 2-(f) shows the three-dimensional point cloud cleaned in this way.
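The per-column minimum plus index-carrying median filter can be sketched as follows. All names here are hypothetical helpers, not the paper's code; the key idea it demonstrates is carrying the point index through the median filter so the 3D cloud can be cleaned as well.

```python
def column_edges(pixels, heights, k=3):
    """For each image column, keep the lowest projected point, then smooth
    the resulting 1D height profile with a window-k median filter while
    remembering which original point each filtered value came from.

    pixels:  integer column index per projected point.
    heights: height per projected point.
    Returns {column: (median height, index of the chosen point)}.
    """
    # hPs_i: minimum height per column, plus the index of that point
    per_col = {}
    for idx, (c, h) in enumerate(zip(pixels, heights)):
        if c not in per_col or h < per_col[c][0]:
            per_col[c] = (h, idx)

    cols = sorted(per_col)
    out = {}
    for n, c in enumerate(cols):
        lo, hi = max(0, n - k // 2), min(len(cols), n + k // 2 + 1)
        window = sorted(per_col[cols[m]] for m in range(lo, hi))
        out[c] = window[len(window) // 2]  # (median height, its point index)
    return out
```

Because each filtered value keeps the index of the point it came from, an isolated noisy column is replaced by a neighboring column's point, and that substitution can be mirrored in 3D.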
Segmentation by edge affinity – The point map determined through the above process is segmented under two conditions. First, consecutive edges are grouped with the edge-grouping method introduced in EdgeBoxes [12]. The edge magnitude m_p is taken as the inverse Euclidean distance between the adjacent points P_i and P_j of two neighboring edge groups, as in equation (4), and the affinity between the groups follows it as in equation (5):

m_p = 1 / ‖P_i − P_j‖_2    (4)

a(s_i, s_j) = m_p, for P_i ∈ s_i and P_j ∈ s_j at the group boundary    (5)

The affinity score a(s_i, s_j) is obtained for all edge groups s_i, and the edges are segmented at the boundaries of groups whose score changes rapidly. For the points p_i and p_j located at the boundary of each edge set, the ground heights PG^z_i and PG^z_j, and the heights PG^{z+1.5}_i and PG^{z+1.5}_j located 1.5 m above the ground, are projected onto the CCD image, and a bounding box is created by connecting them horizontally to the adjacent boundary.
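The affinity-based split can be sketched as follows. The numeric threshold is a hypothetical stand-in for the paper's "score changes rapidly" criterion, and the function name is ours.

```python
import math

def segment_by_affinity(edge_points, min_affinity=0.5):
    """Split an ordered list of 2D edge points into groups wherever the
    affinity (inverse Euclidean distance between neighbors, eq. (4))
    drops below a threshold, i.e. the gap between consecutive points
    is large."""
    groups, current = [], [edge_points[0]]
    for a, b in zip(edge_points, edge_points[1:]):
        m = 1.0 / math.dist(a, b)    # edge magnitude / affinity
        if m < min_affinity:         # neighbors too far apart: new group
            groups.append(current)
            current = []
        current.append(b)
    groups.append(current)
    return groups
```

Each resulting group of edge points then becomes one candidate object for proposal generation.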
Proposal generation based on actual size – We create a bounding box between segmented boundaries. The bottom value of each boundary point is determined from the ground position obtained during the ground-removal process. The height of the bounding box is set to 1.5 m, which covers all three classes mainly detected in the driving environment, and the box is projected onto the CCD image to generate a proposal. The yellow line in Figure 4 represents the projected LiDAR edge, and the orange lines represent the boundaries divided by the affinity of the LiDAR edges. The green boxes at the bottom of Figure 4 are the 1.5 m-high boxes created between these boundaries.
Figure 4. A box proposal created based on edge group
boundaries divided by edge affinity.
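The per-boundary box construction can be sketched in image coordinates as below. The paper projects exact 3D corners through the calibration matrix; here `px_per_meter` is a hypothetical scale factor standing in for that projection, and all names are ours.

```python
def make_proposals(boundary_cols, ground_row, px_per_meter, box_h=1.5):
    """Build one box proposal per pair of neighboring segment boundaries,
    in image coordinates. Each box spans the two boundary columns and
    extends box_h meters (converted to pixels) above the local ground
    row, matching the 1.5 m height used above."""
    top = ground_row - box_h * px_per_meter
    return [(l, top, r, ground_row)          # (x0, y0, x1, y1)
            for l, r in zip(boundary_cols, boundary_cols[1:])]
```

With boundary columns at 10, 50, and 90 pixels, two adjacent proposals are produced, one per gap between boundaries.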
B. Classification with R-FCN
The generated proposals are classified by combining them with R-FCN, which shows state-of-the-art performance. Figure 5-(b) shows the R-FCN implemented through ResNet-101 [13]. R-FCN resolves the translation invariance/variance dilemma between detection and classification by dividing the ROI into an n × n grid and training a feature set for each grid cell.

Figure 3. The process by which objects are detected by bidirectional projection filtering. (a) original image, (b) 2D LiDAR projection, (c) 2D gradient-based edge cleanup, (d) proposal creation using edge affinity and classification result by R-FCN, (e) 3D bounding box expansion by LiDAR projection (red boxes), (f) final result.

When a box proposal is evaluated under this training scheme, a high voting score is returned if the object is sufficiently enclosed. Using this property, redundant boxes caused by LiDAR noise can be suppressed based on the voting scores of the overlapping boxes.
Figure 5. Detailed structure of the proposed method. (a) 3D-to-2D projection filtering, (b) R-FCN, (c) 2D-to-3D projection filtering, (d) result.
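The suppression of redundant, noise-induced boxes by voting score can be sketched as a standard greedy non-maximum suppression pass, using the R-FCN voting score as the ranking; this is our interpretation of the step described above, not the paper's exact procedure.

```python
def suppress_by_vote(boxes, scores, iou_thresh=0.5):
    """Keep the highest-voting box among heavily overlapping proposals
    (greedy NMS over (x0, y0, x1, y1) boxes). Returns kept indices."""
    def iou(a, b):
        ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
        ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / float(area(a) + area(b) - inter)

    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

A duplicate proposal over the same object gets a high IoU with the better-voted box and is discarded, while well-separated objects survive.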
C. 3D projection and box extension
We now assign the class label to the edge group inside the edge boundary of each box classified by R-FCN. Each labeled edge is projected back into 3D space through the indices kept in Figure 5-(c), and the orientation of the object is estimated from the X-Y extremes of the points constituting the edge. Once the orientation is determined, the 3D box is expanded to match the class-specific size and aspect ratio. The edge to be used as the base of the expansion is determined by the positional relationship between the edge group and the height difference of hP_i and hP_j at the segmentation boundary surrounding the edge in 2D space. Using this, we extend the box around the edge that borders the object. Figure 6 shows the process of expanding boxes through the labeled edges and merging overlapping boxes and edges.
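A hedged sketch of the orientation-then-expand idea follows. Estimating yaw from the two X-Y extreme points and pushing the box center away from the visible edge by half the class width is our simplification of the fitting procedure described above; the class sizes and names are assumptions.

```python
import math

def expand_box(edge_points, class_size):
    """Estimate the object's yaw from the X-Y extremes of its labeled edge
    points, then return a bird's-eye-view box of the known class size
    anchored on that edge. class_size = (length, width) in meters.
    Returns (cx, cy, yaw, length, width)."""
    # orientation from the two extreme points of the visible edge
    p0 = min(edge_points, key=lambda p: (p[0], p[1]))
    p1 = max(edge_points, key=lambda p: (p[0], p[1]))
    yaw = math.atan2(p1[1] - p0[1], p1[0] - p0[0])

    # grow from the visible edge to the full class width: the far side
    # is pushed away from the sensor along the box normal
    length, width = class_size
    cx = (p0[0] + p1[0]) / 2 + math.cos(yaw + math.pi / 2) * width / 2
    cy = (p0[1] + p1[1]) / 2 + math.sin(yaw + math.pi / 2) * width / 2
    return (cx, cy, yaw, length, width)
```

For an edge lying along the x-axis, the yaw is zero and the box center is offset by half the class width perpendicular to the edge, which is the hidden-side expansion the method relies on.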
III. EXPERIMENT RESULTS
This study was conducted with the Caffe framework on an NVIDIA TITAN X GPU. The two networks we used for evaluation, Faster R-CNN and R-FCN, were originally trained and evaluated on the PASCAL VOC dataset, but this study uses the KITTI dataset because LiDAR information is required as well. For an objective comparison, we also compared the original R-FCN using Selective Search [14] against our results on the KITTI object dataset, using the R-FCN model trained on PASCAL VOC. The metrics used in the evaluation are the standard mean average precision (mAP) and the mean intersection-over-union (IoU).
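For concreteness, the precision-based metric can be sketched as below. This is a simplified average precision over a ranked detection list, not the exact KITTI evaluation protocol; `is_match` stands in for an IoU-against-ground-truth test at the required threshold.

```python
def average_precision(scores, is_match):
    """Average precision over a ranked detection list.

    scores:   detection confidences.
    is_match: whether detection i overlaps ground truth at the required
              IoU (e.g. >= 0.5).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, ap, n_pos = 0, 0.0, sum(is_match)
    for rank, i in enumerate(order, start=1):
        if is_match[i]:
            tp += 1
            ap += tp / rank        # precision at each recall step
    return ap / n_pos if n_pos else 0.0
```

Averaging such per-class AP values over the classes gives the mAP figures reported in the tables below.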
A. Experiments on KITTI
We first train on the KITTI object dataset and tracking sequences, the representative public dataset providing both CCD and LiDAR information usable with R-FCN. In the training phase, four classes were used: 'Car', 'Pedestrian', 'Cyclist', and 'background'. Since the proposed method requires no additional training for LiDAR, there is no need to change the R-FCN architecture to learn the CCD images; only the volume of the feature set changes with the class set. However, the valid range of the LiDAR sensor data given in the KITTI dataset is about 50 m. It was therefore not possible to evaluate hard-difficulty objects, which are not wide enough to be covered within the CCD image or lie at distances of 50 m or more.
Table 1 compares the object detection rates of existing state-of-the-art methods and ours, measuring the mAP by difficulty for the three classes. We measured the precision of boxes with an IoU of 50% or more against the ground-truth area and obtained good results for 'Car' and 'Pedestrian'. In the case of 'Cyclist', however, the LiDAR points were not uniformly distributed because of the spoke shape of the bicycle wheels, so box proposals of sufficient size could not be generated, resulting in a relatively low value.
Method          Car (E)  Car (M)  Ped. (E)  Ped. (M)  Cyc. (E)  Cyc. (M)
Regionlet [15]  84.75    76.45    73.14     61.15     70.41     58.72
3DVP [10]       87.46    75.77    -         -         -         -
SubCat [16]     84.14    75.46    -         -         -         -
SDP [17]        90.33    83.53    77.74     64.19     74.08     61.31
Ours            95.41    88.54    81.78     65.71     72.11     60.85

Table 1. Results on the KITTI dataset compared with state-of-the-art methods (E = Easy, M = Moderate). Hard difficulty was excluded from the comparison because of the LiDAR sensor's measurement-distance limit of about 50 m.
B. Experiments on PASCAL VOC & KITTI
Since this study is essentially a proposal-generation method, the R-FCN classifier itself is used without modification. Measuring mAP on KITTI alone is therefore not enough to analyze our contribution, because the result is dominated by the performance of R-FCN. We therefore compared against R-FCN with RPN (i.e., Faster R-CNN) and with Selective Search, to measure performance objectively as a proposal-generation and result-correction tool. In this comparison, we used the Caffe models trained on PASCAL VOC 07+12 as published in the original papers, to avoid problems that could occur when retraining the two networks on the KITTI dataset. Table 2 compares the accuracy and computation time of the two other approaches with this study. Table 3 shows the AP variation with the IoU threshold against ground truth. As the tables show, our overlap ratio is higher than that of the conventional RPN or Selective Search. This is because the box extension through the class label reduces error by re-expanding the box closer to the actual object size. We confirmed that, as long as an object lies within the LiDAR sensor range, our method expresses it as a bounding box closer to the ground truth than existing research does. The box extension is currently performed only for the vehicle category, because of the difficulty of orientation acquisition noted in the experiments of [14]; if the aspect ratio and size of other objects such as pedestrians can be generalized, additional categories can be supported.

Figure 6. Expanded box determined by edge orientation and corner detection. (a) the object box classified by R-FCN; (b) the box extended according to the actual size of the class through 2D-to-3D projection, then projected back to 2D.
                    Training Data  Test Data  mAP (%)  Test time (s/img)
RPN + Faster R-CNN  07+12          KITTI      75.7     0.37
RPN + R-FCN         07+12          KITTI      77.4     0.20
SS + R-FCN          07+12          KITTI      80.4     2.21
Ours + R-FCN        07+12          KITTI      82.4     0.17

Table 2. Comparison of detection rates on the KITTI object and tracking dataset. To reduce errors, we used the original pre-trained models.
             Training Data  Test Data    AP@0.5  AP@0.7  AP@0.9
RPN + R-FCN  07+12          KITTI (Car)  84.8    77.4    55.2
SS + R-FCN   07+12          KITTI (Car)  86.3    80.4    58.4
Ours         07+12          KITTI (Car)  89.7    82.4    80.1

Table 3. AP change according to the IoU threshold in the Car category. The proposed method shows a higher overlap ratio with the ground truth than RGB-feature-based proposal generators such as Selective Search.
IV. CONCLUSION
We have proposed a simple but strong method for autonomous driving. By effectively combining a CCD-based classifier with LiDAR information through the proposed BPF, we showed better results than state-of-the-art methods. In particular, the 3D spatial information of partially obscured objects is grasped in real time, and the detection area is extended based on it, so the results show a high IoU with the ground truth. However, problems remain to be solved, such as handling objects that appear on the CCD but lie outside the LiDAR range, and extending boxes on both sides of an object. In addition, since the LiDAR sensing range is shorter than the visible range of the CCD, compensation for CCD-only objects is necessary. We will address these issues in future research.
REFERENCES
[1] Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.
[2] Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." NIPS 2015.
[3] Kong, Tao, et al. "HyperNet: Towards accurate region proposal generation and joint object detection." arXiv:1604.00600 (2016).
[4] Dai, Jifeng, et al. "R-FCN: Object detection via region-based fully convolutional networks." arXiv:1605.06409 (2016).
[5] Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." arXiv:1506.02640 (2015).
[6] Liu, Wei, et al. "SSD: Single shot multibox detector." arXiv:1512.02325 (2015).
[7] Chen, Xiaozhi, et al. "3D object proposals for accurate object class detection." NIPS 2015.
[8] Wang, Dominic Zeng, and Ingmar Posner. "Voting for voting in online point cloud object detection." Robotics: Science and Systems, Rome, Italy (2015).
[9] Xiang, Yu, et al. "Subcategory-aware convolutional neural networks for object proposals and detection." arXiv:1604.04693 (2016).
[10] Xiang, Yu, et al. "Data-driven 3D voxel patterns for object category recognition." CVPR 2015.
[11] Huang, T., G. Yang, and G. Tang. "A fast two-dimensional median filtering algorithm." IEEE Trans. Acoust., Speech, Signal Processing, vol. 27, no. 1, pp. 13–18, 1979.
[12] Zitnick, C. Lawrence, and Piotr Dollár. "Edge boxes: Locating object proposals from edges." ECCV 2014.
[13] He, Kaiming, et al. "Deep residual learning for image recognition." arXiv:1512.03385 (2015).
[14] Uijlings, Jasper R. R., et al. "Selective search for object recognition." International Journal of Computer Vision 104.2 (2013): 154–171.
[15] Wang, Xiaoyu, et al. "Regionlets for generic object detection." IEEE Transactions on Pattern Analysis and Machine Intelligence 37.10 (2015): 2071–2084.
[16] Ohn-Bar, Eshed, and Mohan Manubhai Trivedi. "Learning to detect vehicles by clustering appearance patterns." IEEE Transactions on Intelligent Transportation Systems 16.5 (2015): 2511–2521.
[17] Yang, Fan, Wongun Choi, and Yuanqing Lin. "Exploit all the layers: Fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers." CVPR 2016.
Fast and accurate cnn object detector with scale dependent pooling and
cascaded rejection classifiers." Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition. 2016.
[18] Zhang, Liliang, et al. "Is Faster R-CNN Doing Well for Pedestrian
Detection?." European Conference on Computer Vision. Springer
International Publishing, 2016.