Conventional vision is more appropriate for control because it also provides error analysis. Much of the information in the images is lost when converting to 3D.
Robust and Efficient Coupling of Perception to Actuation with Metric and Non-... - Darius Burschka
The talk motivates a rethinking of how perception passes information to the control modules. Metric space is not the native space of the camera, and apparently it is not used in biology for navigation either. Early abstraction of information from images loses much that can be used directly for following (visual servoing), motion estimation (motion blur), and collision relations (optical flow clustering). In this talk I present ways to use image information in the "classical" way, which requires no learning and runs on low-power CPUs.
The Machine Vision and Perception Group at the Technical University of Munich conducts research on visual perception and its application to medical, mobile, and human-computer interaction systems. This includes areas like registration of rigid and deformable objects, monocular reconstruction, visual localization and mapping, and biologically inspired navigation and action analysis. The group develops algorithms for tasks like 3D object recognition in point clouds that are robust to incomplete and noisy data.
Deep Learning - a Path from Big Data Indexing to Robotic Applications - Darius Burschka
These are the slides for my ShanghAI lecture of Dec 10, 2020. The talk proposes the extensions necessary to make deep networks appropriate tools for robotic systems.
The talk can be found on https://fb.watch/2hXDC6K4Pq/
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V... - c.choi
1) The document describes a real-time method for estimating and tracking the 3D pose of a rigid object using either a monocular or stereo camera.
2) The method combines scale-invariant feature matching (SIFT) for initial pose estimation with optical-flow-based tracking (KLT) for efficient local pose estimation.
3) Outliers in the tracking are removed using RANSAC to improve accuracy, and tracking restarts from initial pose estimation if the number of inliers falls below a threshold (a minimal code sketch of this loop follows below).
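As an illustration of that initialize/track/recover loop, here is a minimal OpenCV sketch. It is not the paper's implementation: the object model (`model_kp3d`, `model_desc`), the intrinsics `K`, the ratio-test constant, and the `MIN_INLIERS` threshold are all assumptions.

```python
# Minimal sketch of a SIFT-init + KLT-track + RANSAC-PnP loop (OpenCV).
# model_kp3d / model_desc (3-D keypoints and SIFT descriptors of the object)
# and the camera matrix K are assumed to come from an offline modeling step.
import cv2
import numpy as np

MIN_INLIERS = 15  # assumed restart threshold

def initial_pose(frame_gray, model_kp3d, model_desc, K):
    """Global SIFT matching against the object model, then PnP + RANSAC."""
    sift = cv2.SIFT_create()
    kp, desc = sift.detectAndCompute(frame_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(desc, model_desc, k=2)
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]
    pts2d = np.float32([kp[m.queryIdx].pt for m in good])
    pts3d = np.float32([model_kp3d[m.trainIdx] for m in good])
    ok, rvec, tvec, inl = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    return rvec, tvec, pts2d[inl.ravel()], pts3d[inl.ravel()]

def track_pose(prev_gray, cur_gray, pts2d, pts3d, K):
    """Local KLT tracking of the inlier features, then pose re-estimation."""
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts2d, None)
    keep = status.ravel() == 1
    pts2d, pts3d = nxt[keep], pts3d[keep]
    ok, rvec, tvec, inl = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    if inl is None or len(inl) < MIN_INLIERS:
        return None  # caller falls back to initial_pose()
    return rvec, tvec, pts2d[inl.ravel()], pts3d[inl.ravel()]
```

The caller runs `track_pose` every frame and falls back to `initial_pose` whenever it returns `None`, mirroring the restart rule in point 3.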
Vision-Based Obstacle Detection Module for a Wheeled Mobile Robot - Sudhakar Spartan
This document describes a vision-based obstacle detection module for a wheeled mobile robot. It uses a stereoscopic vision system to detect obstacles and build and update a map of the environment. The system extracts the ground surface from images using hue and detects obstacle edges using luminance. It then performs stereo matching on corresponding obstacle edge points to calculate depth and detect obstacles. The module is implemented using an FPGA to achieve high-performance real-time obstacle detection for safe robot navigation and dynamic map updating.
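To make the hue/luminance split concrete, here is a hedged OpenCV sketch; the paper's pipeline runs on an FPGA, and the floor hue band and Canny thresholds below are assumed tuning values, not the paper's.

```python
# Illustrative sketch (not the FPGA pipeline): segment the ground by hue and
# find candidate obstacle edges in the luminance channel.
import cv2

def ground_and_edges(bgr):
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    hue = hsv[:, :, 0]
    ground_mask = cv2.inRange(hue, 20, 40)        # assumed floor hue band
    luminance = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(luminance, 50, 150)         # assumed thresholds
    obstacle_edges = cv2.bitwise_and(edges, cv2.bitwise_not(ground_mask))
    return ground_mask, obstacle_edges
```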
Motion capture is the process of recording movements of humans or objects and translating that data into digital form that can be used in films, games, and other media. It works by tracking markers placed on actors' bodies and using multiple synchronized cameras to triangulate the 3D positions over time. Early motion capture used mechanical exoskeletons connected to joints, but modern optical systems track passive reflective markers with cameras in the infrared spectrum. Optical motion capture is now commonly used in film production due to its accuracy and ability to capture complex performances without wires or sensors restricting movement.
Intelligent indoor mobile robot navigation using stereo vision - sipij
The majority of existing robot navigation systems, which rely on laser range finders, sonar sensors, or artificial landmarks, have the ability to locate themselves in an unknown environment and then build a map of it. Stereo vision, while still a rapidly developing technique in the field of autonomous mobile robots, is currently less preferred due to its high implementation cost. This paper describes an experimental approach to building a stereo vision system that helps robots avoid obstacles and navigate through indoor environments while remaining very cost-effective. The paper discusses fusion techniques for stereo vision and ultrasound sensors that enable successful navigation through different types of complex environments. The sensor data enable the robot to create a two-dimensional topological map of unknown environments, while the stereo vision system builds a three-dimensional model of the same environment.
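As a rough illustration of such stereo/ultrasound fusion, a toy occupancy-grid sketch follows; the grid resolution, robot-centred frame, and sensor interfaces are assumptions rather than the paper's design.

```python
# Toy fusion of stereo obstacle points and ultrasound range hits into one
# 2-D occupancy grid centred on the robot.
import numpy as np

CELL = 0.05   # assumed 5 cm cells
SIZE = 200    # assumed 10 m x 10 m grid
grid = np.zeros((SIZE, SIZE), dtype=np.uint8)

def mark_occupied(x, y):
    """Mark a robot-frame point (metres) as occupied; robot at grid centre."""
    i, j = int(y / CELL) + SIZE // 2, int(x / CELL) + SIZE // 2
    if 0 <= i < SIZE and 0 <= j < SIZE:
        grid[i, j] = 1

def fuse(stereo_points_xy, sonar_ranges, sonar_bearings):
    for x, y in stereo_points_xy:                    # points from stereo
        mark_occupied(x, y)
    for r, b in zip(sonar_ranges, sonar_bearings):   # one hit per sonar beam
        mark_occupied(r * np.cos(b), r * np.sin(b))
```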
Interactive full body motion capture using infrared sensor network - ijcga
Traditional motion capture (mocap) has been well studied in visual science for the last decades. However, the field is mostly about capturing precise animation to be used in specific applications after intensive post-processing, such as studying biomechanics or rigging models in movies. These data sets are normally captured in complex laboratory environments with sophisticated equipment, making motion capture a field that is mostly exclusive to professional animators. In addition, obtrusive sensors must be attached to actors and calibrated within the capturing system, resulting in limited and unnatural motion. In recent years, the rise of computer vision and interactive entertainment opened the gate for a different type of motion capture, which focuses on producing optical markerless or mechanical sensorless motion capture. Furthermore, a wide array of low-cost devices have been released that are easy to use for less mission-critical applications. This paper describes a new technique of using multiple infrared devices to process data from multiple infrared sensors to enhance the flexibility and accuracy of markerless mocap using commodity devices such as the Kinect. The method involves analyzing each individual sensor's data and decomposing and rebuilding them into a uniform skeleton across all sensors. We then assign criteria to define the confidence level of the captured signal from each sensor. Each sensor operates in its own process and communicates through MPI. Our method emphasizes the need for minimum calculation overhead for better real-time performance while maintaining good scalability.
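A minimal numpy sketch of the fusion idea described above, assuming each sensor already delivers a rebuilt skeleton plus per-joint confidences (the MPI transport and the confidence criteria themselves are omitted):

```python
# Confidence-weighted fusion of per-sensor skeletons into one skeleton.
import numpy as np

def fuse_skeletons(skeletons, confidences):
    """skeletons: (n_sensors, n_joints, 3); confidences: (n_sensors, n_joints)."""
    w = confidences[:, :, None]                      # broadcast over x, y, z
    return (skeletons * w).sum(axis=0) / np.clip(w.sum(axis=0), 1e-9, None)

# Example: two sensors, one joint; the second sensor is trusted twice as much.
s = np.array([[[0.0, 1.0, 2.0]], [[0.3, 1.0, 2.0]]])
c = np.array([[1.0], [2.0]])
print(fuse_skeletons(s, c))   # -> [[0.2 1.  2. ]]
```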
This document provides an overview of Simultaneous Localization and Mapping (SLAM). It discusses the basic theories behind SLAM, including what SLAM is, where it is used, and how it works. It also covers visual-inertial SLAM (VI-SLAM) in more detail, discussing the sensors, models, calibration, and the frontend and backend processing.
Interactive Full-Body Motion Capture Using Infrared Sensor Network - ijcga
The document describes a new technique for interactive full-body motion capture using multiple infrared sensors. It processes data from each sensor independently and then combines the results to enhance flexibility and accuracy. The method aims to maintain real-time performance while improving on issues like limited actor orientation, inaccurate joint tracking, and conflicting data from individual sensors.
Human Motion Detection in Video Surveillance using Computer Vision Technique - IRJET Journal
The document discusses a technique for detecting human motion in video surveillance using computer vision. It proposes a method called DECOLOR (Detecting Contiguous Outliers in the LOw-rank Representation) that formulates object detection as outlier detection in a low-rank representation of video frames. This allows it to detect moving objects flexibly without assumptions about foreground or background behavior. DECOLOR simultaneously performs object detection and background estimation using only the test video sequence, without requiring training data. The method models the outlier support explicitly and favors spatially contiguous outliers, making it suitable for detecting clustered foreground objects like people. It achieves more accurate detection and background estimation than state-of-the-art robust principal component analysis methods.
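DECOLOR itself solves a more elaborate objective with an explicit contiguity prior on the outlier support; as a rough stand-in, the sketch below shows only the underlying low-rank-plus-sparse decomposition via alternating soft-thresholding, with `lam_l` and `lam_s` as assumed parameters.

```python
# Not DECOLOR itself: a minimal low-rank background + sparse outlier
# (foreground) split of a matrix D whose columns are vectorized frames.
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lowrank_sparse(D, lam_l=5.0, lam_s=0.1, iters=30):
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(iters):
        U, sig, Vt = np.linalg.svd(D - S, full_matrices=False)
        L = (U * soft(sig, lam_l)) @ Vt      # shrink singular values -> low rank
        S = soft(D - L, lam_s)               # shrink entries -> sparse outliers
    return L, S  # background estimate, foreground outliers
```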
This document summarizes a research paper that proposed a non-intrusive system to detect driver drowsiness using computer vision techniques. The system uses the Viola-Jones algorithm to detect the driver's face and eyes in images. It then applies fuzzy logic and genetic algorithms to analyze features like eye state and mouth movements to determine the level of drowsiness. The study found genetic algorithms produced more accurate results compared to fuzzy logic alone at estimating drowsiness levels from the facial analysis. The system represents a non-invasive alternative to existing drowsiness detection methods.
Obstacle detection for autonomous systems using stereoscopic images and bacte... - IJECEIAES
This paper presents a low cost strategy for real-time estimation of the position of obstacles in an unknown environment for autonomous robots. The strategy was intended for use in autonomous service robots, which navigate in unknown and dynamic indoor environments. In addition to human interaction, these environments are characterized by a design created for the human being, which is why our developments seek morphological and functional similarity equivalent to the human model. We use a pair of cameras on our robot to achieve a stereoscopic vision of the environment, and we analyze this information to determine the distance to obstacles using an algorithm that mimics bacterial behavior. The algorithm was evaluated on our robotic platform demonstrating high performance in the location of obstacles and real-time operation.
A Robot Collision Avoidance Method Using Kinect and Global Vision - TELKOMNIKA JOURNAL
This document describes a robot collision avoidance method using Kinect and global vision. Global vision is installed above the robot to detect potential collision objects in real-time using background subtraction and thresholding algorithms. Kinect detects human skeletons and uses a Kalman filter to predict joint positions over time. The predicted joint positions are used to calculate a human joint danger index to evaluate collision risk levels. Different motion control strategies are then applied depending on the obstacle category and danger index value.
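The joint-prediction step can be pictured as a constant-velocity Kalman filter per joint coordinate; the sketch below is a generic formulation with assumed noise levels, not the paper's tuned filter.

```python
# One constant-velocity Kalman filter per joint coordinate.
import numpy as np

dt = 1 / 30.0                                  # Kinect frame period
F = np.array([[1.0, dt], [0.0, 1.0]])          # state: [position, velocity]
H = np.array([[1.0, 0.0]])                     # only position is measured
Q = np.eye(2) * 1e-3                           # assumed process noise
R = np.array([[1e-2]])                         # assumed measurement noise

def kf_step(x, P, z):
    """x: (2,1) state, P: (2,2) covariance, z: (1,1) measured position."""
    # predict one frame ahead
    x = F @ x
    P = F @ P @ F.T + Q
    # correct with the observed joint position
    y = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ y, (np.eye(2) - K @ H) @ P

# Start from x = np.zeros((2, 1)), P = np.eye(2); the predicted position
# F @ x feeds the danger-index computation before the next measurement.
```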
Deconstructing SfM-Net architecture and beyond
"SfM-Net, a geometry-aware neural network for motion estimation in videos that decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion and 3D object rotations and translations. Given a sequence of frames, SfM-Net predicts depth, segmentation, camera and rigid object motions, converts those into a dense frame-to-frame motion field (optical flow), differentiably warps frames in time to match pixels and back-propagates."
Alternative download:
https://www.dropbox.com/s/aezl7ro8sy2xq7j/sfm_net_v2.pdf?dl=0
GPS Tracking System Coupled With Image Processing In Traffic Signals to Enhan... - ijcsit
In modern, crowded traffic it is vital to save human lives by reaching the hospital as soon as possible. This paper describes an advance in tracking the presence of an ambulance amidst traffic using a GPS tracking system that calculates the ambulance's position relative to the traffic signal, replacing the RF transmitter-receiver circuitry. The paper also retains the image-capturing technology from the base paper (Impact of Image Processing in Saving Human Life by Automating Traffic Signals) to identify the presence of the ambulance and automatically switch the signal to green in the ambulance's favour.
IRJET- Behavior Analysis from Videos using Motion based Feature Extraction - IRJET Journal
This document proposes a technique for analyzing human behavior in videos using motion-based feature extraction. It discusses how previous approaches have used spatial and temporal features to detect abnormal behaviors. The proposed approach extracts motion features from videos to represent each video with a single feature vector, rather than extracting features from each individual frame. This reduces the feature space and unnecessary information. The technique involves preprocessing videos into frames, extracting motion features, using KNN classification on the features to classify behaviors as normal or abnormal, and evaluating the method's performance on various metrics like accuracy, recall, and precision. Testing on fight and riot datasets showed the motion-based approach achieved higher accuracy, recall, precision and F-measure than a non-motion based approach.
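As an illustration of representing a whole clip by one motion-based vector and classifying it with KNN, here is a hedged sketch; the specific statistics used as features are assumptions, not the paper's descriptors.

```python
# Summarize a clip by statistics of its frame-difference energy, then
# classify the resulting single vector with k-nearest neighbours.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def motion_feature(frames):
    """frames: sequence of grayscale frames -> one fixed-length vector."""
    diffs = [np.abs(b.astype(float) - a.astype(float)).mean()
             for a, b in zip(frames[:-1], frames[1:])]
    d = np.array(diffs)
    return np.array([d.mean(), d.std(), d.max(), np.percentile(d, 90)])

# X: one feature vector per training video, y: 0 = normal, 1 = abnormal
# clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
# label = clf.predict([motion_feature(test_frames)])
```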
A STUDY OF VARIATION OF NORMAL OF POLYGONS CREATED BY POINT CLOUD DATA FOR A... - Tomohiro Fukuda
These slides were presented at CAADRIA2011 (The 16th International Conference on Computer Aided Architectural Design Research in Asia).
Abstract: Acquiring current 3D spatial data of cities, buildings, and rooms rapidly and in detail has become indispensable. When the point cloud data of an object or space scanned by a 3D laser scanner is converted into polygons, the result is an accumulation of small polygons. When the object or space is a closed flat plane, it is necessary to merge the small polygons into one polygon to reduce the volume of data. In theory, each normal vector of the small polygons on a closed flat plane has the same angle; in practice, however, these angles are not the same. The purpose of this study is therefore to clarify, from actual data, the variation of the normal angles within a group of small polygons that should become one polygon. Experiments showed that the normals of the small polygons converted from the point cloud data scanned with the 3D laser scanner do not coincide even when the group of small polygons lies in the same closed flat plane. When the standard deviation of the extracted number of polygons is assumed to be less than 100, the variation of the angle of the normal vector is roughly 7 degrees.
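The quantity being studied, the spread of normal angles within a polygon group that should form one plane, can be computed directly; a small numpy sketch follows, assuming the unit normals of the small polygons are given.

```python
# Spread of angles between each small polygon's normal and the group mean.
import numpy as np

def normal_angle_spread(normals):
    """normals: (n, 3) normals of the small polygons (assumed input)."""
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    mean = n.mean(axis=0)
    mean /= np.linalg.norm(mean)
    angles = np.degrees(np.arccos(np.clip(n @ mean, -1.0, 1.0)))
    return angles.mean(), angles.std()   # e.g. a spread of roughly 7 degrees
```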
Tracking Chessboard Corners Using Projective Transformation for Augmented Rea... - CSCJournals
Augmented reality has been a topic of intense research for several years for many applications. It consists of inserting a virtual object into a real scene. The virtual object must be accurately positioned in a desired place. Some measurements (calibration) are thus required, and a set of correspondences between points on the calibration target and the camera images must be found. In this paper, we present a tracking technique based on both detection of chessboard corners and a least squares method; the objective is to estimate the perspective transformation matrix for the current view of the camera. This technique does not require any information or computation of the camera parameters; it can be used in real time without any initialization, and the user can change the camera focal length without any fear of losing alignment between the real and virtual objects.
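A minimal OpenCV sketch of the corner-detection-plus-projective-fit idea follows; the board geometry is an assumed example, and RANSAC stands in for the paper's least-squares details.

```python
# Detect chessboard corners and fit the board-plane-to-image homography.
import cv2
import numpy as np

COLS, ROWS, SQUARE = 9, 6, 0.025          # assumed board geometry (metres)
board = np.array([[c * SQUARE, r * SQUARE]
                  for r in range(ROWS) for c in range(COLS)], np.float32)

def board_homography(gray):
    found, corners = cv2.findChessboardCorners(gray, (COLS, ROWS))
    if not found:
        return None
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    H, _ = cv2.findHomography(board, corners.reshape(-1, 2), cv2.RANSAC)
    return H  # maps board-plane points into the current camera view
```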
Conventional non-vision-based navigation systems relying purely on the Global Positioning System (GPS) or inertial sensors can provide the 3D position or orientation of the user. However, GPS is often not available in forested regions and cannot be used indoors. Visual odometry provides an independent method to accurately estimate the position and orientation of the user/system based on the images captured by the moving user. Vision-based systems also provide information (e.g. images, 3D location of landmarks, detection of scene objects) about the scene that the user is looking at. In this project, a set of techniques is used for the accurate pose and position estimation of a moving vehicle for autonomous navigation, using the images obtained from two cameras placed at two different locations viewing the same area on top of the vehicle; this setup is referred to as stereo vision. Stereo vision provides a method for the 3D reconstruction of the environment, which is required for pose and position estimation. First, a set of images is captured. The Harris corner detector is used to automatically extract a set of feature points from the images, and feature matching is done using correlation-based matching. Triangulation is applied to the feature points to find their 3D coordinates. A new set of images is then captured and the same technique is repeated. Finally, using the 3D feature points obtained from the first and the new set of images, the pose and position of the moving vehicle are estimated with the QUEST algorithm.
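Two steps of that pipeline, Harris corner extraction and the triangulation of matched point pairs, can be sketched compactly with OpenCV; correlation matching and the QUEST attitude solver are omitted, and `P1`/`P2` are the calibrated 3x4 projection matrices of the stereo pair (assumed known).

```python
# Harris corner extraction and triangulation of matched stereo points.
import cv2
import numpy as np

def harris_corners(gray, max_pts=500):
    pts = cv2.goodFeaturesToTrack(gray, max_pts, qualityLevel=0.01,
                                  minDistance=7, useHarrisDetector=True)
    return pts.reshape(-1, 2)

def triangulate(P1, P2, pts_left, pts_right):
    """pts_*: (n, 2) matched pixel coordinates -> (n, 3) 3-D points."""
    X = cv2.triangulatePoints(P1, P2, pts_left.T.astype(float),
                              pts_right.T.astype(float))
    return (X[:3] / X[3]).T
```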
Availability of Mobile Augmented Reality System for Urban Landscape Simulation - Tomohiro Fukuda
These slides were presented at CDVE2012 (The 9th International Conference on Cooperative Design, Visualization, and Engineering).
Abstract. This research presents the availability of a landscape simulation method for a mobile AR (Augmented Reality), comparing it with photo montage and VR (Virtual Reality) which are the main existing methods. After a pilot experiment with 28 subjects in Kobe city, a questionnaire about three landscape simulation methods was implemented. In the results of the questionnaire, the mobile AR method was well evaluated for reproducibility of a landscape, operability, and cost. An evaluation rated as better than equivalent was obtained in comparison with the existing methods. The suitability of mobile augmented reality for landscape simulation was found to be high.
Integration of a Structure from Motion into Virtual and Augmented Reality for...Tomohiro Fukuda
Proceedings (Full paper reviewed)
Tomohiro Fukuda, Hideki Nada, Haruo Adachi, Shunta Shimizu, Chikako Takei, Yusuke Sato, Nobuyoshi Yabuki, and Ali Motamedi: 2017, Integration of a Structure from Motion into Virtual and Augmented Reality for Architectural and Urban Simulation: Demonstrated in Real Architectural and Urban Projects, Future Trajectories of Computation in Design: 17th International Conference CAAD Futures 2017, p.596, 2017.7
Book (Book contribution)
Tomohiro Fukuda, Hideki Nada, Haruo Adachi, Shunta Shimizu, Chikako Takei, Yusuke Sato, Nobuyoshi Yabuki, and Ali Motamedi: 2017, Integration of a Structure from Motion into Virtual and Augmented Reality for Architectural and Urban Simulation: Demonstrated in Real Architectural and Urban Projects, Computer-Aided Architectural Design - Future Trajectories, pp. 60-77, Springer (Communications in Computer and Information Science 724), ISSN 1865-0929, ISBN 978-981-10-5196-8, 2017.7
Computational visual simulations are extremely useful and powerful tools for decision-making. The use of virtual and augmented reality (VR/AR) has become a common phenomenon due to real-time and interactive visual simulation tools in architectural and urban design studies and presentations. In this study, a demonstration is performed to integrate structure from motion (SfM) into VR and AR. A 3D modeling method is explored by SfM under real-time rendering as a solution for the modeling cost in large-scale VR. The study examines the application of camera parameters of SfM to realize an appropriate registration and tracking accuracy in marker-less AR to visualize full-scale design projects on a planned construction site. The proposed approach is applied to multiple real architectural and urban design projects, and results indicate the feasibility and effectiveness of the proposed approach.
Motion planning and controlling algorithm for grasping and manipulating movin... - ijscai
Much robotic grasping research has focused on stationary objects, and for dynamic moving objects researchers have used images captured in real time to locate the objects. However, this way of controlling the grasping process is quite costly, implying substantial resources and image processing. It is therefore worthwhile to seek simpler handling methods… In this paper, we detail the requirements for manipulating a humanoid robot arm with 7 degrees of freedom to grasp and handle moving objects in a 3-D environment, in the presence or absence of obstacles and without using cameras. We use the OpenRAVE simulation environment as well as a robot arm instrumented with the Barrett hand. We also describe a randomized planning algorithm, an extension of RRT-JT, that combines exploration, using a Rapidly-exploring Random Tree, with exploitation, using Jacobian-based gradient descent, to instruct a 7-DoF WAM robotic arm to grasp a moving target while avoiding obstacles encountered along the way. We present a simulation of a scenario that starts with tracking a moving mug, then grasps it, and finally places it in a given position, ensuring a maximum success rate in a reasonable time.
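The exploitation half of RRT-JT, Jacobian-based gradient descent toward the target, can be illustrated on a toy planar 2-link arm; this is a sketch under assumed link lengths, not the 7-DoF WAM controller, and the tree-based exploration half is not shown.

```python
# Jacobian-transpose gradient descent pulling a planar 2-link arm's end
# effector toward a target position.
import numpy as np

L1, L2 = 0.5, 0.4        # assumed link lengths (metres)

def fk(q):
    """Forward kinematics: joint angles -> end-effector position."""
    return np.array([L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
                     L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])])

def jacobian(q):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def descend(q, target, step=0.5, iters=200, tol=1e-3):
    for _ in range(iters):
        err = target - fk(q)
        if np.linalg.norm(err) < tol:
            break
        q = q + step * jacobian(q).T @ err   # Jacobian-transpose step
    return q

print(fk(descend(np.array([0.3, 0.3]), np.array([0.4, 0.5]))))
```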
This document appears to be a research paper submitted by Anutam Majumder for their M.Sc. degree. The paper discusses estimation of camera trajectory using visual odometry. It provides an introduction to visual odometry and discusses challenges in the field. The paper then reviews relevant literature on visual SLAM and deep learning based methods. It proposes a methodology using a CNN-RNN network for visual odometry estimation and discusses the network architecture, feature extraction, sequential modeling, and optimization. Results are presented on a dataset and the paper concludes with an evaluation comparing the method to other approaches.
This doctoral dissertation examines facial skin motion properties from video for modeling and applications. It presents two methods for computing strain patterns from video: a finite difference method and a finite element method. The finite element method incorporates material properties of facial tissues by modeling their Young's modulus values. Experiments show strain patterns are discriminative and stable features for facial expression recognition, age estimation, and person identification. The dissertation also develops a method for expression invariant face matching by modeling Young's modulus from multiple expressions.
An Application of Stereo Image Reprojection from Multi-Angle Images fo... - Tatsuro Matsubara
This document presents a stereo image reprojection system for immersive virtual reality using images from multiple cameras. The system separates the VR process into rendering images from cameras placed spherically and calculating a view image for display based on head tracking. It develops a method to adjust the interpupillary distance for different head tilts by varying the camera positions. The system is shown to work in real-time on laptops and smartphones using VR headsets to provide an inexpensive way to experience immersive VR. Future work includes optimizing the image size and number of cameras needed.
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ... - Kitsukawa Yuki
These are the slides from my presentation of this paper in the Advanced Pattern and Video Information Processing course.
Xiangyun Meng, Wei Wang, and Ben Leong. 2015. SkyStitch: A Cooperative Multi-UAV-based Real-time Video Surveillance System with Stitching. In Proceedings of the 23rd ACM international conference on Multimedia (MM '15). ACM, New York, NY, USA, 261-270. DOI=http://dx.doi.org/10.1145/2733373.2806225
The flow of baseline estimation using a single omnidirectional camera - TELKOMNIKA JOURNAL
1. The document describes a method for estimating the baseline of a single omnidirectional camera by tracking points on an object with optical flow.
2. As the camera is moved horizontally, the tracked points in the panoramic images produce coordinate shifts that are recorded and represented as graphs.
3. Analyzing the graphs allows the baseline-estimation equation and its coefficients to be determined (see the sketch below).
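Step 2, recording the coordinate shifts of tracked points, might look like the following OpenCV sketch; the feature count and the use of the mean horizontal shift are assumptions.

```python
# Track points with pyramidal Lucas-Kanade optical flow across panoramic
# frames and accumulate their horizontal shifts for later graphing/fitting.
import cv2
import numpy as np

def track_shifts(frames):
    """frames: list of grayscale panoramic images from the moving camera."""
    p0 = cv2.goodFeaturesToTrack(frames[0], 100, 0.01, 10)
    shifts = []
    for prev, cur in zip(frames[:-1], frames[1:]):
        p1, st, _ = cv2.calcOpticalFlowPyrLK(prev, cur, p0, None)
        ok = st.ravel() == 1
        shifts.append((p1[ok, 0, 0] - p0[ok, 0, 0]).mean())  # mean x-shift
        p0 = p1[ok].reshape(-1, 1, 2)
    return np.cumsum(shifts)  # cumulative coordinate shift per frame
```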
Leader Follower Formation Control of Ground Vehicles Using Dynamic Pixel Coun... - ijma
This paper deals with leader-follower formations of non-holonomic mobile robots, introducing a formation control strategy based on pixel counts from a commercial-grade electro-optical camera. Localization of the leader for motions along the line of sight, as well as in obliquely inclined directions, is considered based on pixel variation of the images with reference to two arbitrarily designated positions in the image frames. From an established relationship between the displacement of the camera along the viewing direction and the difference in pixel counts between reference points in the images, the range and angle between the follower camera and the leader are estimated. The Inverse Perspective Transform is used to account for the nonlinear relationship between the height of a vehicle in a forward-facing image and its distance from the camera. The formulation is validated with experiments.
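For intuition, a standard inverse-perspective relation for a flat ground plane is sketched below; the focal length and camera height are assumed example values, not the paper's calibration.

```python
# Pinhole ground-plane geometry: a ground point imaged v pixels below the
# horizon row lies at distance d = f * H / v from the camera.
F_PX = 800.0       # assumed focal length in pixels
CAM_H = 0.30       # assumed camera height in metres

def range_from_pixel(v_below_horizon):
    if v_below_horizon <= 0:
        raise ValueError("point must be below the horizon row")
    return F_PX * CAM_H / v_below_horizon

print(range_from_pixel(40))   # -> 6.0 metres for this geometry
```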
Fuzzy-proportional-integral-derivative-based controller for object tracking i... - IJECEIAES
This paper aims at designing and implementing an intelligent controller for the orientation control of a two-wheeled mobile robot. The controller is designed in LabVIEW and based on image parameters analyzed from cameras. The image program calculates the distance and angle from the camera to the object. The fuzzy controller takes these parameters as crisp input data and sends the calculated velocities as crisp output data to the right and left wheel motors so that the robot tracks the target object. The results show that the controller gives a fast response and high reliability and quickly recovers from system faults. The system also works well under uncertainties in process variables and without mathematical modeling.
This document discusses the development of an enhanced fusion vision system (EFFV) to improve pilot situational awareness in low visibility conditions. It describes fusing visible and infrared camera images using principal component analysis to generate a single superior image. Simulation results show the EFFV provides better detection capability by combining sensor data. The system could integrate with synthetic vision and advanced navigation to further improve landing accuracy in poor weather.
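A compact sketch of the PCA-weighted fusion idea follows; it uses the common baseline of weighting the two registered bands by the leading eigenvector of their joint covariance, which is an assumption about the paper's exact variant.

```python
# Fuse registered visible and IR frames with weights from the first
# principal component of the two bands' joint covariance.
import numpy as np

def pca_fuse(vis, ir):
    """vis, ir: registered grayscale images as float arrays, same shape."""
    data = np.stack([vis.ravel(), ir.ravel()])      # 2 x n_pixels
    cov = np.cov(data)
    eigvals, eigvecs = np.linalg.eigh(cov)
    w = np.abs(eigvecs[:, np.argmax(eigvals)])
    w = w / w.sum()
    return w[0] * vis + w[1] * ir
```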
IRJET- Moving Object Detection with Shadow Compression using Foreground Segme... - IRJET Journal
The document describes a study that investigates using foreground segmentation to extract moving objects from videos by detecting differences between frames, with the goal of tracking silhouettes of moving objects to create an interactive video display. The researchers propose using a statistical approach to segment foreground objects and apply filtering techniques to reduce noise from the extracted shadows. The results indicate the extraction process accurately tracks motion and outlines of foreground objects between frames.
The document summarizes a study that compares the performance of camera-only, IMU-only, and combined camera-IMU systems for visual inertial navigation (VisNav) and simultaneous localization and mapping (SLAM). Data was collected from a camera and IMU mounted on a rolling platform traversing a test course. The camera performed poorly around corners while the IMU had offset errors, but combining the sensors was superior to either individually. Statistical analysis showed the mean error was significantly lower for the camera-IMU system compared to the camera-only and IMU-only systems. The study was limited but integrating the sensors helped compensate their individual weaknesses.
IRJET- Robust and Fast Detection of Moving Vehicles in Aerial Videos usin... - IRJET Journal
This document discusses a method for detecting moving vehicles in aerial videos using sliding windows in MATLAB. It first performs foreground motion segmentation and trajectory accumulation to build a dynamic scene motion heat map. It then introduces a novel saliency-based background model that enhances moving objects to segment vehicles in high-heat areas. Key steps include compensating for camera motion, detecting independent motion via tracking image corners, and applying a sliding window classifier with optimized parameters like window size and orientation. Evaluation on urban videos shows the approach achieves 88% detection rates with only 2% false positives, outperforming alternative segmentation-based methods in processing time and accuracy. The goal is robust, real-time vehicle detection from aerial videos for applications like traffic monitoring.
SIMULTANEOUS MAPPING AND NAVIGATION FOR RENDEZVOUS IN SPACE APPLICATIONS - Nandakishor Jahagirdar
The project is to develop an autonomous navigation system along with mapping of the path.
The robot senses the edges of objects in its path and moves without colliding with them. The application is equipped with a camera as the main component, which captures images that are transmitted to a workstation through a wireless antenna.
The images are processed on a workstation or computer using MATLAB-2013a. An IR ranging device senses any objects ahead, and the robot changes its direction accordingly to avoid collisions.
Thus, even when circumstances lead to errors in the output of the image processing algorithm, a decision can still be made using the input from the IR sensors.
AUTO LANDING PROCESS FOR AUTONOMOUS FLYING ROBOT BY USING IMAGE PROCESSING BA... - csandit
In today's technological life, everyone is quite familiar with the importance of security measures, and many attempts have been made by researchers in this regard; flying-robot technology is one of them. One well-known use of flying robots is in security and surveillance, which makes the device extremely practical, not only for its unmanned movement but also for its unique manoeuvrability over arbitrary areas. This research discusses the automatic landing of a flying robot. The system is based on frequent interrupts sent from the main microcontroller to the camera module to take images; these images are analysed by an image processing system based on edge detection, after which the system decides whether or not to land on the ground. Experimentally, this method shows good performance in terms of precision.
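A toy version of the edge-based landing decision: compute Canny edges over the candidate landing patch and land only if the edge density is low (i.e. the patch looks flat and untextured); the threshold is an assumption.

```python
# Edge-density landing check over a grayscale image patch.
import cv2

def clear_to_land(gray_patch, max_edge_ratio=0.02):
    edges = cv2.Canny(gray_patch, 50, 150)
    ratio = (edges > 0).mean()   # fraction of edge pixels in the patch
    return ratio < max_edge_ratio
```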
This document summarizes a research paper on a webcam-based intelligent surveillance system. The system uses a webcam as a sensor to capture live images of the monitored area. If any motion is detected in the images, the software stores the captured images in a folder. It also sends a wireless signal to a receiver. The system has three security levels - high, middle, and low - to control the monitoring sensitivity. The key modules are login, camera interfacing, image capturing and storage, motion detection, and hardware interfacing. Motion detection works by comparing images over time and detecting changes. The system is presented as an affordable alternative to other security systems like CCTV cameras that have drawbacks around costs and vulnerability to hacking.
This document describes a project to develop a quadrotor drone capable of autonomous navigation in GPS-denied environments using visual sensors. The drone is equipped with an optical flow sensor for velocity and altitude control, an IMU, and cameras for visual SLAM. Software runs ROS and uses a monocular visual odometry algorithm. Initial flight tests show the drone can hold position and velocity with wind. Future work will integrate the visual SLAM for full position control and add safety features.
This paper proposes a way of recognizing the foreground of a moving object with a Foreground Extraction algorithm for a Pan-Tilt-Zoom camera. It presents the combined process of foreground extraction and local histogram processing. Background images are modeled as multiple frames whose corresponding camera pan and tilt angles are recorded. Initially, the best-matched background must be determined from the sequence of input frames based on camera pose information. The method then compensates the matched background image against the current image, and background subtraction is performed between the modeled background and the current image. Finally, before local histogram processing, noise can be removed with morphological operators. As a result, the foreground moving objects are successfully extracted by these four steps.
DETECTION OF MOVING OBJECT USING FOREGROUND EXTRACTION ALGORITHM BY PTZ CAMERA - ijistjournal
The document summarizes a proposed method for detecting moving objects in video sequences captured by a Pan-Tilt-Zoom (PTZ) camera. The method uses foreground extraction and local histogram processing. It first finds the most matched background frame using Euclidean distance. SURF keypoints are extracted and used to compute a homography matrix for background compensation. Foreground is extracted by subtracting the compensated background from the current frame. Morphological operations and local histogram processing are applied to remove noise and obtain the final extracted foreground containing the moving objects. The method is evaluated on a public dataset and is shown to achieve accurate segmentation when combining foreground extraction with local histogram processing.
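A hedged OpenCV sketch of the compensation-and-subtraction steps follows. The paper matches SURF keypoints; ORB is substituted here because SURF is non-free in stock OpenCV builds, and the grayscale inputs and difference threshold are assumptions.

```python
# Match keypoints between the stored background and the current PTZ frame,
# estimate a homography, warp the background, subtract, and clean up.
import cv2
import numpy as np

def compensated_foreground(background, frame, thresh=30):
    """background, frame: grayscale uint8 images."""
    orb = cv2.ORB_create(1000)
    kb, db = orb.detectAndCompute(background, None)
    kf, df = orb.detectAndCompute(frame, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(db, df)
    src = np.float32([kb[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kf[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    warped = cv2.warpPerspective(background, H, frame.shape[1::-1])
    diff = cv2.absdiff(frame, warped)
    _, fg = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    return cv2.morphologyEx(fg, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
```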
IRJET- A Review Analysis to Detect an Object in Video Surveillance System - IRJET Journal
This document reviews techniques for detecting objects in video surveillance systems. It discusses common object detection methods like frame differencing, optical flow, and background subtraction. Frame differencing detects motion by calculating pixel differences between frames but cannot detect still objects. Optical flow estimates pixel motion between frames to detect objects. Background subtraction models the static background and detects objects by subtracting current frames from the background model. The document analyzes these techniques and their use in video surveillance applications like traffic monitoring and security. It concludes more research is needed to improve object classification accuracy and handle challenges like camera motion.
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHMcsandit
Computer vision approaches are increasingly used in mobile robotic systems, since they make it possible to obtain a very good representation of the environment using low-power and cheap sensors. In particular, it has been shown that they can compete with standard solutions based on laser range scanners on the problem of simultaneous localization and mapping (SLAM), where the robot has to explore an unknown environment while building a map of it and localizing itself in that map. We present a package for simultaneous localization and mapping in ROS (Robot Operating System) using only a monocular camera sensor. Experimental results in real scenarios as well as on standard datasets show that the algorithm is able to track the trajectory of the robot and build a consistent map of small environments, while running in near real-time on a standard PC.
Thanks to newly developed technology, people lead more comfortable lives and increasingly want work to be done automatically. This paper therefore introduces a system called an Unmanned Ground Vehicle (UGV). As the name indicates, a UGV operates in contact with the ground and without any human operator. The vehicle has a set of sensors to observe the environment. In this paper the UGV is controlled by an FPGA combined with image processing, with the FPGA as the main processing platform. The proposed system is evaluated by capturing images of the UGV with a camera and demonstrates accurate localization of the UGV. Such vehicles are used in the military, in malls, and in the automobile industry. For the system to work properly, proper interfacing and synchronization between the hardware and software modules must be provided.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
This document summarizes and compares various algorithms used to implement video surveillance systems, including pixel matching, image matching, and clustering algorithms. It first provides background on video surveillance systems and their need for automatic abnormal motion detection. It then reviews several specific algorithms: pixel matching, agglomerative clustering, reciprocal nearest neighbor pairing, sub-pixel mapping, patch matching, tone mapping, and k-means clustering. For each algorithm, it provides a brief overview of the approach and complexity. The document also discusses image matching algorithms like classic image checking, pixel-based identity checking, and pixel-based similarity checking. Overall, the document analyzes algorithms that can be used to detect and classify motion in video surveillance systems.
Visual Mapping and Collision Avoidance in Dynamic Environments
1. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Visual Modelling and Collision Avoidance in
Dynamic Environments
from Monocular Video Sequences
Darius Burschka
Machine Vision and Perception Group (MVP)
Department of Computer Science
Technische Universität München
2. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Research of the MVP Group
The Machine Vision and Perception Group @TUM works on the aspects of visual perception and control in medical, mobile, and HCI applications:
• Visual navigation
• Biologically motivated perception
• Perception for manipulation
• Visual Action Analysis
• Photogrammetric monocular reconstruction
• Rigid and Deformable Registration
3. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Research of the MVP Group
The Machine Vision and Perception Group @TUM works on the aspects of visual perception and control in medical, mobile, and HCI applications:
• Exploration of physical object properties
• Sensor substitution
• Multimodal Sensor Fusion
• Development of new Optical Sensors
5. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Coupling Alternatives for Perception Modules
Sensors (Camera, IMU, Laser) → Structure-from-X → Actuators
• Map-based action planning (not real-time): metric representation
• Reactive behavior (instincts), e.g., obstacle avoidance (real-time for control)
Do we really need a metric representation here?
8. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Biology helps to increase robustness
Mair, Burschka
Mobile Robots Navigation, book chapter, In-Tech, 2010
9. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020 9
Can we navigate directly from monocular video?
(Zinf system, Burschka et al. 2008)
10. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Visual Static Modelling with a Drone (2007)
Mount for an 8MP Digital Camera
13. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
We used to reconstruct static scenes from monocular video in 2007… (with DLR)
Accuracy: 1.5 cm
14. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
High Accuracy at Example of Light Section 3D Reconstruction
Light Section Principle
15. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Accuracy of the system - Construction of 3D models (2008)
Camera localization accuracy allows direct stitching of the line responses from the light-section system.
16. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
120fps Monocular Navigation from Sparse Optical Flow
GPU implementation of a sparse optical flow (feature-based, OpenCV) system using only 10% of the resources
17. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
What can we do with the 3D PointClouds?
labeling motion parameters
19. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
ObjectRANSAC system fitting 3D models into
cluttered scenes (Papazov et al. 2010)
20. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Deformable Registration from Generic Models
(special issue SGP'11 Papazov et al.)
Matching of a detailed shape to a primitive prior.
The manipulation "heat map" from the generic model gets propagated.
Deformation of the original model generates a deformation heat-map showing the similarities of object regions to the model.
21. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Navigation for Control
VINS filter design [Schmid et al. IROS 2012]
• Synchronization of real-time and non-real-time modules by a sensor hardware trigger
• Direct system state: high-rate calculation by the Strap-Down Algorithm (SDA)
• Indirect system state: estimation by an indirect Extended Kalman Filter (EKF); a generic skeleton is sketched below
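The direct/indirect split above can be illustrated with a generic skeleton (a simplified 1D stand-in under stated assumptions, not the Schmid et al. implementation): a fast strap-down loop consumes every IMU sample, while a low-rate indirect EKF estimates the error state from vision fixes and folds it back into the direct state.

import numpy as np

class IndirectEKF:
    """Toy direct/indirect filter split: fast SDA propagation + slow EKF correction."""
    def __init__(self):
        self.x = np.zeros(2)            # direct state: [position, velocity]
        self.P = np.eye(2)              # error-state covariance

    def strap_down(self, accel, dt):    # high-rate "SDA" propagation
        self.x += np.array([self.x[1] * dt, accel * dt])
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.P = F @ self.P @ F.T + 1e-3 * np.eye(2)

    def vision_update(self, pos_meas, r=1e-2):   # low-rate indirect EKF step
        H = np.array([[1.0, 0.0]])
        y = pos_meas - self.x[0]                  # residual w.r.t. direct state
        S = H @ self.P @ H.T + r
        K = self.P @ H.T / S                      # Kalman gain
        dx = (K * y).ravel()                      # estimated error state
        self.P = (np.eye(2) - K @ H) @ self.P
        self.x += dx                              # fold the correction back in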
22. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
VINS systems: fusion of heterogeneous data with varying latencies (with DLR)
• 70 m trajectory; ground truth by tachymeter
• 5 s forced vision drop-out with translational motion
• 1 s forced vision drop-out with rotational motion
• Estimation error < 1.2 m; odometry error < 25.9 m
• Results comparable to runs without vision drop-outs
23. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020 23
Navigation under strong illumination changes
• Autonomous indoor/outdoor flight of 60 m
• Mapping resolution: 0.1 m
• Leaving through a window
• Returning through a door
24. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Collaborative Reconstruction with Self-Localization (CVP 2008)
(Vision in Action: Efficient strategies for cognitive agents in complex environments)
Fig. 2. Collaborative 3D reconstruction from 2 independently moving cameras.
Fig. 3. Single-viewpoint property.
We decided to use omnidirectional systems instead of fish-lens cameras, because their single-viewpoint property [2] is essential for our combined localization and reconstruction approach (Fig. 3). This property allows an easy recovery of the viewing angle of the virtual camera with the focal point F (Fig. 3) directly from the image coordinates (ui, νi). A standard perspective camera can be mapped on our generic model of an omnidirectional sensor; its only limitation is the limited viewing angle. In case of a standard perspective camera with a focal point at F, we can estimate the direction vector ni of the incoming rays from the uni-focal image coordinates (focal length of the camera f = 1) (ui, νi) to

ni = (ui, νi, 1)ᵀ / ‖(ui, νi, 1)ᵀ‖.    (2)

We rely on the fact that each camera can see the partner and the area that it wants to reconstruct at the same time. In our system, Camera 1 observes the position of the focal point F of Camera 2 along the vector T, and the point P to be reconstructed along the vector V1 simultaneously (Fig. 2). The second camera (Camera 2) uses its own coordinate frame to reconstruct the same point P along the vector V2. The point P observed by this camera has modified coordinates [10]:

V2 = R·(V1 + T)    (3)

Since we cannot rely on any extrinsic calibration, we perform the calibration of the extrinsic parameters directly from the current observation. We need to find the transformation parameters (R, T) in (3) defining the transformation between the coordinate frames of the two cameras. Each camera defines its own coordinate frame.

2.1 3D Reconstruction from Motion Stereo
In our system, the cameras undergo an arbitrary motion (R, T) which results in two independent observations (n1, n2) of a point P. Equation (3) can be written using (2) as

λ2·n2 = R·(λ1·n1 + T).    (4)

We need to find the radial distances (λ1, λ2) along the incoming rays to estimate the 3D coordinates of the point. We can find them by re-writing (4) to

(−R·n1, n2)·(λ1, λ2)ᵀ = R·T   ⇒   (λ1, λ2)ᵀ = (−R·n1, n2)⁻* · R·T = D⁻* · R·T    (5)

We use in (5) the pseudo-inverse matrix D⁻* to solve for the two unknown radial distances (λ1, λ2). A pseudo-inverse matrix to D can be calculated according to

D⁻* = (Dᵀ·D)⁻¹·Dᵀ.    (6)

The pseudo-inverse operation finds a least-squares approximation satisfying the overdetermined set of three equations with two unknowns (λ1, λ2) in (5). Due to calibration and detection errors, the two lines V1 and V2 in Fig. 2 do not necessarily intersect. Equation (5) calculates the position of the point along each line closest to the other line. We notice the similarity between equations (1) and (5): equation (1) can be written to solve for the unknown distance Z from the image plane of the coplanar binocular stereo system.

Our approach offers a robust initialization method for the system presented in [3]. The original approach relied on an essential-matrix method to initialize the 3D structure in the world, which can be replaced by our robust 3D estimate described in section 2.1, minimizing the image error directly. This mode is necessary in cases where the required configuration of the cameras causes occlusions between the agents although the target is still in view of both cameras. The limited space of this paper does not allow a detailed description of this part of the system. The recursive approach from [3] is used to maintain the radial distance λx.

3 Results
Our flying systems use omnidirectional mirrors like the one depicted in Fig. 6.
Fig. 6. Flying agent equipped with an omnidirectional sensor pointing upwards.
We tested the system on several indoor and outdoor sequences with two cameras observing the world through different-sized planar mirrors (Fig. 4) using a Linux laptop computer with a 1.2 GHz Pentium Centrino processor. The system was equipped with 1 GB RAM and was operating two Firewire cameras with a standard PAL resolution of 768x576.

3.1 Accuracy of the Estimation of Extrinsic Parameters
We used the system to estimate the extrinsic motion parameters and achieved results comparable with the extrinsic camera calibration results. We verified the parameters by applying them to the 3D reconstruction process in (5) and achieved measurement accuracy below the resolution of our test system. This reconstruction was in the close range of the system, which explains the high accuracy.
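Equations (4)-(6) translate directly into a few lines of linear algebra. The sketch below is an illustration with synthetic values, assuming the relative motion (R, T) is already known:

import numpy as np

def motion_stereo_depths(n1, n2, R, T):
    """Solve lambda2*n2 = R(lambda1*n1 + T) for the radial distances, eqs. (4)-(6)."""
    D = np.column_stack((-R @ n1, n2))           # 3x2 system matrix from (5)
    rhs = R @ T
    lam = np.linalg.inv(D.T @ D) @ D.T @ rhs     # pseudo-inverse solution (6)
    return lam                                   # (lambda1, lambda2)

# Synthetic check: point P in the camera-1 frame, second camera displaced by T.
P = np.array([1.0, 0.5, 4.0])
R, T = np.eye(3), np.array([0.3, 0.0, 0.0])
n1 = P / np.linalg.norm(P)
P2 = R @ (P + T)                                 # point in the camera-2 frame, eq. (3)
n2 = P2 / np.linalg.norm(P2)
print(motion_stereo_depths(n1, n2, R, T), np.linalg.norm(P), np.linalg.norm(P2))
# The recovered (lambda1, lambda2) match the two ray lengths.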
25. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Asynchronous Stereo for Dynamic Scenes
NTP
Figure 2: C0 and C1 are the camera centers of the stereo pair; P0, P1, P2 are the 3D poses of the point at times t0, t1, t2, which correspond to the frame acquisition timestamps of camera C0. P* is the 3D pose of the point at time t*, which corresponds to the frame acquisition timestamp of camera C1. Vectors v0, v1, v2 are unit vectors pointing from camera center C0 to the corresponding 3D points. Vector v* is the unit vector pointing from camera center C1 to pose P*, and the unit vector v3 is the velocity direction vector.

Here a = acos(v0·v1) and b = acos(v2·v1). First we write the law of sines for the triangles spanned by P0, P1, P2 and the camera center C0. Since the angle between the velocity direction vector v3 and v0 is ψ, between v3 and v2 is η, and between v2 and n̂ is π/2, we can compute v3 by solving the following equation:

[ v0x v0y v0z ;  v2x v2y v2z ;  nx ny nz ] · (v3x, v3y, v3z)ᵀ = (cos ψ, cos η, 0)ᵀ    (7)

3.2 Path Reconstruction
In the second stage of the proposed method we compute the 3D pose P0. For reasons of simplicity we represent the poses (Pi (i = 0..2), P*) depicted in Fig. 2 as:

Pi = (ai·zi, bi·zi, zi)ᵀ,   P* = (m·z0* + tx, s·z0* + ty, q·z0* + tz)ᵀ    (8)

where tx, ty, tz are the x, y and z components of the translation vector between the cameras, and ai = vix/viz, bi = viy/viz are the x and y components of the direction vectors vi (i = 0..2) in Figure 2.
Mkhitaryan, Burschka VISAPP 2014
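Equation (7) is a plain 3x3 linear system; a minimal sketch of solving it for the velocity direction v3 follows (names and values are illustrative, not from the paper):

import numpy as np

def velocity_direction(v0, v2, n, psi, eta):
    """Solve equation (7) for the unit velocity direction v3."""
    A = np.vstack((v0, v2, n))                   # rows: v0, v2, n as in (7)
    b = np.array([np.cos(psi), np.cos(eta), 0.0])
    v3 = np.linalg.solve(A, b)
    return v3 / np.linalg.norm(v3)               # normalize to a unit vector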
26. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Back to Autonomous Vehicle Applications - Processing Units:
Local Feature Tracking Algorithms (AGAST, the fastest keypoint detector in OpenCV, developed by us)
• Image-gradient based → Extended KLT (ExtKLT)
  • patch-based implementation
  • feature propagation
  • corner-binding
  + sub-pixel accuracy
  − algorithm scales badly with the number of features
• Tracking-by-Matching → AGAST tracker (see the sketch after this list)
  • AGAST corner detector
  • efficient descriptor
  • high frame rates (hundreds of features in a few milliseconds)
  + algorithm scales well with the number of features
  − pixel accuracy
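A minimal tracking-by-matching sketch in this spirit, using OpenCV's AGAST detector and, as a stand-in for the group's own efficient descriptor, ORB's binary descriptor (this pairing is an assumption for illustration, not the MVP tracker):

import cv2

detector = cv2.AgastFeatureDetector_create(threshold=20)
descriptor = cv2.ORB_create()          # stand-in descriptor extractor
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def track(prev_gray, cur_gray):
    """Return matched keypoint pairs (pixel accuracy, no sub-pixel refinement)."""
    kp0 = detector.detect(prev_gray)
    kp1 = detector.detect(cur_gray)
    kp0, des0 = descriptor.compute(prev_gray, kp0)
    kp1, des1 = descriptor.compute(cur_gray, kp1)
    matches = matcher.match(des0, des1)
    return [(kp0[m.queryIdx].pt, kp1[m.trainIdx].pt) for m in matches]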
28. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Previous approach: navigation from optical flow between images.
Can motion be calculated directly from a single image?
29. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
What is the underlying principle?
Point Spread Function (PSF)
Horizontal motion
30. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Motion Blur
A Gaussian window is applied to avoid artifacts in the cepstrum.
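As a rough illustration of the cepstral principle (a sketch under stated assumptions, not the talk's implementation): uniform linear motion blur multiplies the image spectrum by a sinc-like PSF spectrum, and its periodic zeros produce a detectable peak in the power cepstrum; the Gaussian window suppresses boundary artifacts.

import numpy as np

def power_cepstrum(img):
    """Power cepstrum of a grayscale image with a Gaussian window."""
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w]
    # 2D Gaussian window to suppress spectral leakage at the image borders
    win = np.exp(-(((x - w / 2) / (0.4 * w)) ** 2 + ((y - h / 2) / (0.4 * h)) ** 2))
    spec = np.fft.fft2(img * win)
    log_mag = np.log1p(np.abs(spec) ** 2)
    return np.real(np.fft.ifft2(log_mag))

# The blur length and direction can then be read off as the offset of the
# strongest cepstral peak from the origin (ignoring the DC neighbourhood).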
31. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Coupling Alternatives for Perception Modules
Sensors (Camera, IMU, Laser) → Structure-from-X → Actuators
• Map-based action planning
• Reactive behavior (instincts), e.g., obstacle avoidance, …
32. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Navigation Strategies (metric vs. non-metric)
Vision-Based Control: the control signals are generated directly from the sensor perception.
Map-based Navigation: the reconstructed data is stored in 3D maps to be used for obstacle avoidance and mission planning.
33. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Optimal Feature Selection
III. SYSTEM ANALYSIS
The relative error in the solution caused by perturbations of parameters can be estimated from the condition number of the Image Jacobian matrix J. The condition number is the ratio between the largest and the smallest singular value of the matrix J. It estimates the sensitivity of the solution of a linear algebraic system to variations of the parameters in the matrix J and in the measurement vector b. Consider the equation system with perturbations in matrix J and vector b:

(J + δJ)·x̂ = b + δb    (7)

The relative error in the solution can be estimated by the following inequality using the condition number κ calculated for J (see [5]):

‖x − x̂‖/‖x‖ ≤ κ·(‖δJ‖/‖J‖ + ‖δb‖/‖b‖) + O(ε²)    (8)

Therefore, the relative error in the solution x can be as large as the condition number times the relative error in J and b. The condition number together with the singular values of the matrix J describes the sensitivity of the system to changes in the input parameters. In the following subsections we investigate the observability and accuracy of the output parameters (x, y, z) from the input stream of the camera system (sec. III-A) and the influence of the real sensor on the achievable accuracy of the system (sec. III-B).

A. Optimal Landmark Configuration for the Image Jacobian Matrix
The singular values can be obtained as the positive square roots of the eigenvalues of the matrix Jᵀ·J, with yi (i ∈ {1,…,N}) as the heights of the tracked objects, αi as the azimuth angles to them, and βi as their elevation angles. The resulting matrix for N landmarks has the form shown in (9). The system estimates three parameters (dx, dy, dΘ) from the image positions (ui, vi) of all tracked primitives (features) i ∈ {1,…,N} (4); therefore, at least two features are necessary to estimate all 3 position parameters. Each feature contributes a measurement of a distance Δri from the robot to the feature in the ground plane and an orientation ΔΘ relative to it.

The Image Jacobian block for a single landmark is

Ji ᵗ = [ ∂αi/∂x  ∂αi/∂z  ∂αi/∂Θ ;  ∂βi/∂x  ∂βi/∂z  ∂βi/∂Θ ]
     = [ zi/(xi²+zi²),  −xi/(xi²+zi²),  −1 ;
         −xi·yi/((xi²+yi²+zi²)·√(xi²+zi²)),  −yi·zi/((xi²+yi²+zi²)·√(xi²+zi²)),  0 ]    (2)

The dependency on the unknown position (xi, zi) of the robot relative to the tracked landmark can be avoided considering the geometry of the system:

√(xi²+yi²+zi²) = yi/sin βi,   √(xi²+zi²) = yi/tan βi,   xi = yi·sin αi/tan βi,   zi = yi·cos αi/tan βi

Ji ᵗ = [ tan βi·cos αi/yi,  −tan βi·sin αi/yi,  −1 ;
         −sin²βi·sin αi/yi,  −sin²βi·cos αi/yi,  0 ]    (3)

Note in particular that the Image Jacobian is a function of only one unobserved parameter, yi, the height of the observed point. Furthermore, this value is constant for motion in the plane. Thus, instead of estimating a time-changing quantity, as is the case in most vision-based control, we only need to solve a simpler static estimation problem. We refer to [3] for a detailed description of the yi-estimation. In the following text we assume that the system already learned the positions of the tracked objects as columns in a matrix Mp in the teaching phase [3]. In the replay phase, the position error Δwᵗ is recovered from the image error Δeᵗ as

Δwᵗ = (Jᵗ)⁻¹·Δeᵗ,   with  (Jᵗ)⁻¹ = (JᵗᵀJᵗ)⁻¹·Jᵗᵀ    (6)

The value Δwᵗ describes the error in the 3D position that we use to generate the control signals for the robot.

Expressed in the coordinates (Δri, ΔΘ) of a feature, equation (3) can be written as:

Ji ᵗ = [ 0, −1 ;  −sin²βi/yi, 0 ] = [ 0, −1 ;  −1/(yi·(1+(ri/yi)²)), 0 ],   ri = √(xi²+zi²)    (10)

From equation (10) we learn that an error ΔΘ is directly forwarded to the output value αi, while Δr, the error in the distance to the feature, is scaled with the value

κr = ( yi·(1 + (ri/yi)²) )⁻¹    (11)

Since in our case the measurement error is in the image space, the resulting errors in the world depend on the reciprocal values.

[Fig. 3 (plot): dependency of κr on yi and ri.]

We deduce from the above equation that the optimum placement of the feature should maximize the above expression to allow good observability of the position error Δr. The optimal value can be estimated from

dκr/dyi = (ri² − yi²)/(yi² + ri²)² = 0   ⇒   yi = ±ri   ⇒   βi = arctan(yi/ri)    (12)

which corresponds to an angle |βi| = 45°.

It is shown that any linear system has at least one solution component whose sensitivity to perturbations is proportional to the condition number of the matrix, but there may exist many components that are much better conditioned [4]. The sensitivity for different components of the solution changes depending on the configuration of the landmarks and the relative position of the robot to them.
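A small numerical cross-check of this analysis (a sketch assuming the planar Jacobian of equation (3); landmark values are illustrative): build the stacked Image Jacobian for N landmarks and inspect its condition number while sweeping the elevation angle β.

import numpy as np

def image_jacobian(y, alpha, beta):
    """Stacked 2N x 3 Image Jacobian from equation (3), one 2x3 block per landmark."""
    rows = []
    for yi, ai, bi in zip(y, alpha, beta):
        rows.append([np.tan(bi) * np.cos(ai) / yi, -np.tan(bi) * np.sin(ai) / yi, -1.0])
        rows.append([-np.sin(bi) ** 2 * np.sin(ai) / yi, -np.sin(bi) ** 2 * np.cos(ai) / yi, 0.0])
    return np.array(rows)

y = [2.0, 2.0]                                  # landmark heights, constant for planar motion
alpha = [np.radians(-30.0), np.radians(40.0)]   # azimuth angles
for beta_deg in (15, 45, 75):
    J = image_jacobian(y, alpha, [np.radians(beta_deg)] * 2)
    print(beta_deg, np.linalg.cond(J))          # conditioning best near |beta| = 45 deg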
34. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Capturing Motion Properties of Large Dynamic Scenes
Cars at distances over 50 m are only a few pixels large.
35. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Are lab approaches transferable to automobile and avionic applications?
Sensitivity increase:
• larger baseline (B)
• longer focal length (f), at the cost of a smaller field of view
• smaller pixel size (px), at the cost of a "pixel explosion"

d_p = (B·f/px)·(1/z)  [pixel]   (a numeric example follows below)

[figure label: 75 cm]
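To see why this matters numerically, here is an illustrative evaluation of the disparity relation d_p = (B·f/px)·(1/z) with assumed rig values (not figures from the talk):

B, f, px = 0.75, 0.008, 4.65e-6   # baseline [m], focal length [m], pixel size [m]
for z in (5.0, 50.0, 100.0):      # object distance [m]
    d = (B * f / px) / z          # disparity in pixels
    print(f"z = {z:5.1f} m -> disparity = {d:8.1f} px")

# At 100 m the disparity changes by only a fraction of a pixel per meter of
# depth, so small matching errors produce large metric depth errors.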
36. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Direct Mapping of Point Motion onto an Image Observation
A three-dimensional point Pi travels with an arbitrary constant velocity vector vgi in Fig. 5. This motion results in a trace of tracked points pi in consecutive image frames for time steps t = {0, 1, 2}. The vector vgi defines a plane containing Pi. We call this a collision plane; it sweeps through the focal point of the camera after some time. The collision time can be estimated from the trace of the point [9].
Fig. 5. A 3D point Pi travels over three frames t = 0, 1, 2 with a velocity vgi. The vector vgi together with the point Pi defines a collision plane with vgi as its normal vector.
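The talk estimates the collision time from the trace of the point via the collision-plane construction of [9]; as a hedged stand-in, the classical expansion-rate definition of time-to-collision relative to the focus of expansion can be sketched as follows (function name and values are illustrative):

import numpy as np

def ttc_from_trace(pts, epipole, dt):
    """pts: image positions of one tracked point at successive frames (pixels);
    epipole: focus of expansion in the image; dt: frame interval [s].
    Returns TTC in seconds as distance-to-FOE over its growth rate."""
    d = [np.linalg.norm(np.subtract(p, epipole)) for p in pts]
    rate = (d[-1] - d[0]) / ((len(pts) - 1) * dt)   # pixels per second
    return np.inf if rate <= 0 else d[-1] / rate

print(ttc_from_trace([(10, 5), (12, 6), (15, 7.5)], epipole=(0, 0), dt=0.04))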
37. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Detection of Independent Motion Groups from Optical Flow
• Our goal is a robust detection of motion direction and collision times from a monocular uncalibrated camera sequence.
• Representing the dynamic scene ordered by collision times instead of Cartesian coordinates enables monocular processing (no scale necessary) and better prioritization of collision candidates than in conventional methods.
• Independent estimation of motion direction and collision time allows collision categorization at large distances from the camera.
Schaub et al., Journal ITSC 2017; Burschka, BMVC 2017
39. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020 39
Novel Control Design for Non-metric Control from Monocular Input (Schaub & Burschka)
• Planning space represented as collision times for different velocities
• A new controller is necessary for the non-metric input
• Matlab simulation with 3 objects (C is on a collision course)
• Non-linear gradient descent with an Adaptive Lagrange Interpolation Search (ALIS)
Optimization results:
• Good performance: 2 steps to reach the optimum
• Real-time implementation at 25 Hz
(Schaub, Reactive Avoidance of Dynamic Obstacles through Optimization of their Epipoles, ICSTCC 2015)
40. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Navigation based on Pixel Information from a Monocular View
Concept: shifting the optical-flow epipole out of the object's boundaries → no collision.
Planar motion of the objects is assumed; the effect of the relative velocity components on the epipole's 2D image position is derived, so that the required epipole shift can be expressed as a function of the velocity changes.
Schaub, Burschka ITSCC 2015
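A minimal sketch of the underlying quantity (assuming ideal translational flow; this is not the Schaub & Burschka controller): the epipole, i.e. the focus of expansion, recovered as the least-squares intersection of the flow lines, which the controller then tries to shift outside the object's image boundary.

import numpy as np

def epipole_from_flow(points, flows):
    """points: Nx2 image positions; flows: Nx2 flow vectors.
    Each flow line satisfies n . (e - p) = 0 with n perpendicular to the flow."""
    A, b = [], []
    for p, f in zip(points, flows):
        n = np.array([-f[1], f[0]])     # normal to the flow direction
        A.append(n)
        b.append(n @ p)
    e, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return e                            # 2D epipole estimate

# Synthetic diverging flow around (320, 240):
pts = np.array([[100.0, 100.0], [500.0, 120.0], [300.0, 400.0]])
foe = np.array([320.0, 240.0])
print(epipole_from_flow(pts, pts - foe))   # recovers approximately [320, 240]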
42. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Novel Non-Cartesian Map Structure (IROS 2019)
The map is indexed by the azimuth angle of the relative velocity vector and by the time until the plane with the velocity vector as its normal passes the focal point (TTC).
This makes the map content static in dynamic environments: merely a TTC counter scrolls the grid during operation, and no changes to the grid information are necessary.
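A toy data structure illustrating this idea under stated assumptions (the cell layout and API are hypothetical): entries store the absolute crossing time of the collision plane, so the stored content never changes as the world moves; only a clock advances past the entries.

class TTCMap:
    """Map over (azimuth of relative velocity, time-to-contact). Cells hold
    absolute crossing times, so dynamic obstacles need no map updates."""
    def __init__(self, n_azimuth=72):
        self.crossing = [[] for _ in range(n_azimuth)]
        self.clock = 0.0

    def insert(self, azimuth_deg, ttc_s):
        a = int(azimuth_deg % 360 / 360 * len(self.crossing))
        self.crossing[a].append(self.clock + ttc_s)   # absolute crossing time

    def advance(self, dt):
        self.clock += dt                              # the only per-step update

    def imminent(self, horizon_s=2.0):
        """Any stored collision plane crossing the focal point within the horizon?"""
        return any(self.clock <= t <= self.clock + horizon_s
                   for cell in self.crossing for t in cell)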
43. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Identification of Interaction Regions in the Scene
Kitani et al. [26] infer future long-term walking paths from the physical world with obstacles and preferred areas such as sidewalks, but do not analyze the temporal evolution after the initial estimation as this approach does.

Fig. 2. Map data is enriched by interaction regions (orange) caused by static POI (entrances to shops, narrows) (green) and by dynamic intersection regions (blue) based on intersecting trajectories. Pictured area of the second example.

III. APPROACH
Our approach is divided into three parts: First, a static map of Points and Areas of Interest (POI/AOI) is computed offline beforehand (Figure 2), which is used for a local search of possible long-term goals in a given scenario. It is extended on-line if temporary changes occur in a scenario, e.g., construction works on or next to the street. Second, possible dynamic areas of interaction are calculated on-line with possible paths and occupation times.

Extraction of map-based POI: static

Pedestrian Intention Detection as a Resource Competition Challenge
Peter Gawronski, Darius Burschka
Abstract: We propose an approach to classify possible interaction types between autonomous vehicles and pedestrians based on the idea of resource competition in shared spaces. Autonomous vehicles are more challenged in urban traffic scenarios, as many uncertainties influence the current world model. Urban environments impose very few constraints on the motion of pedestrians. This creates the demand for an approach to determine intentions for each pedestrian as far ahead as possible and to react to changes early. A motion model based on goal-driven pedestrian movement yields a set of most likely planned trajectories. These are analyzed for overlapping occupation times in road segments, and thus for interactions with the vehicle. The output is an early estimation which suggests the most probable interaction types and places. From this estimation, the current trajectory of the pedestrian is used to refine the prediction of the most probable intention of interaction place and type. In the end the algorithm combines topological and behavioral input to infer and validate the long-term intention of the interaction type before the interaction can actually be inferred from the current dynamics. As a proof of concept, the applicability of the approach is validated on real-world scenarios from the Cityscapes dataset.

Fig. 1. Person starting to run towards an intersection: changes in behavior indicate a change of mind and thus a new interaction type. The highlighted interaction area is based on the temporal resource-competition analysis. (Scenario 3, cutout of frame 19997)

3. Very similar behavior can have different causes and outcomes: since pedestrian movement is not very limited, changes in trajectory and intention happen very quickly.
4. Traffic participants in a scene influence each other by changes in behavior, so that most often a seamless interaction is achieved, sometimes aided by visual clues.
There is a lot of research for solutions to estimate pedestri…

Identification of overlapping "resource" allocation (competition for POI): dynamic
44. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Estimation of Intention of the Traffic Agents
The database includes a road network to identify intersections of roads at which interactions between motorists will most likely occur, but it is also feasible for pedestrian movement, to determine movement across the road. Furthermore, static infrastructure can be plotted into the database, such as crosswalks, signalized crossings, or narrow passages in low-density urban areas. It is merged with destinations of travel in the scene, such as crossroads to another street, entries to parking lots and shops, or areas of shared space, e.g., a pedestrian headed towards a crosswalk.

a) Influences of topology: Figure 3 shows some typical topologies and how they influence behavior. In all scenarios, the hypothetical arrival and occupation time is calculated for the pedestrian as depicted on the left. The vehicle is limited to the road, but may change lanes or turn. A long-term goal may influence the behavior and trajectory of a pedestrian, as different paths lead to the same goal and non-interference with cars may be intended. The third example includes a crosswalk and thus creates a strong goal, since crossing is safer there, which makes jaywalking less probable. The last example shows a possible situation at a corner: the arrival time of the pedestrian is identical for both crossings, and an interaction with the vehicle depends on the unknown goal. Temporal analysis is needed to evaluate further.

Fig. 3. Schematic overview of influences of POI on pedestrian movement: possible walking paths for a pedestrian (orange) and the ego vehicle (purple), and found POI in range (green). Left to right: no POI in the area; possible paths to cross the road; introducing crosswalks/right of way; different options create different situations.

A shared resource is, e.g., an area at which trajectories meet, as simultaneous usage is not possible (shown in Figure 1 in red). This limitation is used for behavior prediction, as some form of reaction is needed to avoid a collision, either by a change of velocity or of direction by one or more of the participants involved. A planned solution may exist, as with right-of-way, but changes in behavior may resolve these situations differently than anticipated, e.g., the yielding object accelerating to avoid the yield situation, as in scenario 3.

b) Occupation times: To compensate for input noise and the fact that the whole area must be cleared of obstructions before safe passage is possible, some buffer around arrival times must be considered. A direct collision is considered at TTA differences below 0.5 s and a close call below 2 s, both of which are to be avoided.

Fig. 4. Schematic behavior patterns in an interaction scenario: 1/2: vehicle passing behind/in front of the pedestrian from a collision situation; 3: passive pedestrian; 4: assertive crossing; 5/6/7: changes in velocity which may influence the outcome; 8/9: pedestrian turning away from/towards the road as an indicator against/for crossing intention. Details in chapter III-C.

C. Temporal Analysis of Behavior
A temporal analysis is performed to analyze the behavior and to predict which of the previously estimated probable actions is performed. Both objects involved use nonverbal communication in the form of clear changes in TTA, through changes in velocity and/or direction, to clear an interfering situation. Pedestrians may give visual clues to a driver [11], but as these are not always clear, especially in camera images due to low resolution, and can also go unnoticed or be misinterpreted by an algorithmic detector, they are not taken into account in this work. In situations of clear legislation, the expected outcome is anticipated. Unexpected behavior can have multiple causes, such as aggressiveness or situation awareness, but can also be a sign of distraction from the situation. Open situations can have several outcomes, so the analysis of movement behavior can pinpoint the moment a decision is made, especially for the yielding party. The main information gain is obtained from changes in velocity, as both responses of acceleration and deceleration…

The temporal evolution of the TTC at the resource allows assessing the passivity or aggressiveness of the traffic partner.
Static and dynamic POI allow a better prediction of intentions.
Changes in the temporal interaction with agents can be used for behavior analysis (IV 2019)
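The occupation-time test described above can be illustrated with a toy interval comparison; the thresholds 0.5 s and 2 s are taken from the text, while the function and its labels are hypothetical:

def classify_interaction(ped_interval, veh_interval):
    """Intervals are (arrival_time, clearing_time) in seconds at the same shared POI."""
    gap = max(ped_interval[0], veh_interval[0]) - min(ped_interval[1], veh_interval[1])
    if gap < 0:            # occupation times overlap: resource conflict
        return "collision course"
    if gap < 0.5:
        return "direct collision risk"
    if gap < 2.0:
        return "close call"
    return "no interaction"

print(classify_interaction((3.0, 5.0), (4.5, 6.0)))   # -> collision course
print(classify_interaction((3.0, 5.0), (6.5, 8.0)))   # -> close call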
45. https://mvp.in.tum.de Machine Vision and Perception Group IPAM Workshop I, Oct 7, 2020
Conclusions
• Non-metric navigation allows operations directly in camera images without the necessity of metric scale
• Temporal representation helps to assess and predict behaviors
• Learning approaches are based on similarity search and are therefore built for segmentation and labeling, not for metric measurements
• Scene understanding from single images reconstructs only spatial but not temporal relations
• Early data abstraction often loses important information from the sensor
• Driving can be modelled as resource competition on the road