Report

Scene Interpretation for SLAM using Stereo-Vision
In robotics, SLAM (Simultaneous Localisation and Mapping) is the problem of concurrently estimating the robot's position and a map of its surrounding environment. The task is challenging because of two main factors: noisy robot dynamics and the limitations of the sensors. Although work in SLAM has preferentially used laser rangefinders, here stereo-vision is used. Vision sensors enable us to use more distant objects as landmarks for navigation, and to learn and use color and texture models of the environment, looking further ahead than is possible with range sensors alone.

The problem can be sub-divided into the following areas:

1. Scene Interpretation (Local Map-building)
2. Global Map-building
3. Planning and Control

This report focuses only on Scene Interpretation.

1. Scene Interpretation

The objective is to construct a local map from the visual information, distinguishing obstacles from free regions. The basic procedure is to compute a ground plane from the disparity image, analyse the height differences from the ground to find obstacles, and finally to couple this information with color analysis to obtain the traversable regions of the terrain.
1.1 Stereo-Analysis

The first task is to compute the disparity image from the left and right images. The stereo camera used for the purpose is the Videre MDCS2. Calibrating the camera with a set of 20 images of a chequered pattern yielded the fundamental matrix. Two methods to compute the disparity image were tried out:

• SRI Stereo Engine, bundled as SVS software

The Stereo Engine algorithm computes the disparity between a pair of stereo images using an extremely efficient implementation of area correlation. It comprises the following four stages:

1. Rectification - remove distortion from the images
2. Feature Extraction - compute the Laplacian of Gaussian for each image
3. Correlation - compute disparities by matching
4. Filtering - remove uncertain matches
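As the SVS software is proprietary, the following is a minimal sketch of the same area-correlation idea using OpenCV's block matcher as a stand-in; the matcher parameters and file names are illustrative assumptions, not the SVS settings.

    import cv2

    # Rectified left/right pair (stage 1, rectification, is assumed done;
    # OpenCV's StereoBM requires rectified grayscale input).
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Area-correlation matcher; its prefilter and postfilters play the
    # role of stages 2-4 (feature extraction, correlation, filtering).
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    stereo.setTextureThreshold(10)  # drop low-texture, uncertain regions
    stereo.setUniquenessRatio(15)   # drop ambiguous correlation peaks

    # StereoBM returns fixed-point disparities scaled by 16.
    disparity = stereo.compute(left, right).astype("float32") / 16.0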
• Using Graph-Cuts on CUDA

The goal of the algorithm is to compute one disparity estimate for each pixel in a reference image, which is generally the left input image. For SLAM (specifically ground-plane extraction), a dense disparity map of the reference image is an important requirement. Note that the algorithm takes rectified images as input. In other words, the goal is to find a labeling f that assigns each pixel p ∈ P a label f_p ∈ ℓ (a finite set), where f is both piecewise smooth and consistent with the observed data. This is one of the methods for global optimization, whereby the following energy function is minimized (the choice of the energy function depends upon the specific application):

    E(f) = E_smooth(f) + E_data(f)

For more details on the energy function, please refer to [ ]. This energy-minimization problem is solved using graph-cuts, implemented as a multi-threaded algorithm on CUDA, which speeds up the execution considerably.
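To make the two terms concrete, here is a minimal sketch of how such an energy could be evaluated for a candidate labeling, assuming an absolute-difference data term and a truncated-linear smoothness term; both are common choices, not necessarily the ones used here.

    import numpy as np

    def energy(f, left, right, lam=10.0, trunc=2.0):
        # f: integer disparity label per pixel of the left (reference) image.
        h, w = f.shape
        u = np.arange(w)[None, :]
        u_right = np.clip(u - f, 0, w - 1)   # matching column in the right image
        rows = np.arange(h)[:, None]
        # E_data: photo-consistency between each pixel and its match.
        e_data = np.abs(left.astype(float)
                        - right[rows, u_right].astype(float)).sum()
        # E_smooth: truncated-linear penalty on neighboring labels, which
        # favors piecewise-smooth f without over-penalizing true edges.
        dx = np.minimum(np.abs(np.diff(f, axis=1)), trunc).sum()
        dy = np.minimum(np.abs(np.diff(f, axis=0)), trunc).sum()
        return e_data + lam * (dx + dy)

Graph-cuts search for the labeling f that minimizes this quantity; the sketch only scores a given f.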
In the disparity image thus obtained, each disparity image point [u, v, d] corresponds to a 3D point in the robot's frame.

1.2 Ground-Plane Extraction

The next step is to obtain the ground plane. Although it is possible to detect obstacles using local variation in height, using a ground plane simplifies processing and yields more stable results. To extract the ground plane, the RANSAC method is generally used (refer [4]), using sets of 3 non-collinear points. Hypothesized planes are ranked by the number of points that lie close to the plane. Points that lie too high above the ground plane, but lower than the robot's height, are labeled as obstacles. A sketch of this procedure is given after the note below.

Note: Extraction of the ground plane using RANSAC is useful mainly in relatively open spaces. For environments that are cluttered with obstacles, it is better to resort directly to detection of obstacles using local variation in height.
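A minimal sketch of the RANSAC plane fit, assuming the disparity points [u, v, d] have already been converted to 3D points in the robot's frame; the iteration count, inlier tolerance, and robot height are illustrative values.

    import numpy as np

    def ransac_ground_plane(pts, iters=200, tol=0.02):
        # pts: (N, 3) array of 3D points. Returns (normal, point) of the
        # plane with the largest inlier support.
        rng = np.random.default_rng(0)
        best_support, best_plane = 0, None
        for _ in range(iters):
            p0, p1, p2 = pts[rng.choice(len(pts), 3, replace=False)]
            n = np.cross(p1 - p0, p2 - p0)       # normal from 3 sampled points
            norm = np.linalg.norm(n)
            if norm < 1e-9:                      # degenerate: collinear sample
                continue
            n /= norm
            support = (np.abs((pts - p0) @ n) < tol).sum()
            if support > best_support:           # rank planes by nearby points
                best_support, best_plane = support, (n, p0)
        return best_plane

    def label_obstacles(pts, plane, ground_tol=0.05, robot_height=0.5):
        # Obstacles: too high above the plane, yet lower than the robot.
        n, p0 = plane
        height = np.abs((pts - p0) @ n)
        return (height > ground_tol) & (height < robot_height)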
1.3 Ground-Plane Segmentation using Color-Model

The input image is first over-segmented into a number of super-pixels, i.e., contiguous regions with fairly uniform color and texture [5]. Each super-pixel is classified as either ground or non-ground, based on whether its membership score X is above or below a threshold, and each individual image pixel is classified as belonging either to an obstacle or to the ground based on its color appearance. The algorithm, sketched in code after this list, consists of the following four steps:

1. The color image is filtered with a 5×5 Gaussian filter to reduce the level of noise.
2. The filtered RGB values are transformed into the HSV color space. Valid hue and saturation values are assigned only if the intensity is above a threshold, because color information is very noisy at low intensity. Similarly, hue is assigned a valid value only if the corresponding saturation is above a threshold. This makes the segmentation less sensitive to illumination changes.
3. A trapezoidal area in front of the mobile robot is used as a reference. The valid hue and intensity values of the pixels inside the trapezoidal reference area are histogrammed into two one-dimensional histograms, one for hue and the other for intensity.
4. All pixels of the filtered input image are compared to the hue and intensity histograms. A pixel is classified as ground if both of the following conditions are satisfied:
   • The hue histogram bin value at the pixel's hue value is above the hue threshold.
   • The intensity histogram bin value at the pixel's intensity value is above the intensity threshold.

The membership score is calculated as follows:

    X(S_i) = (1 / |S_i|) × (number of ground pixels in S_i)
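A minimal sketch of the per-pixel classification (steps 1-4), in the spirit of the appearance-based method of [3]; the reference-region mask, histogram sizes, and all thresholds are illustrative assumptions.

    import cv2
    import numpy as np

    def ground_mask(bgr, ref_mask, min_intensity=40, min_saturation=30,
                    hue_thresh=50, int_thresh=50):
        # Step 1: 5x5 Gaussian filter to reduce noise.
        smooth = cv2.GaussianBlur(bgr, (5, 5), 0)
        # Step 2: RGB -> HSV (OpenCV: H in [0,180), S and V in [0,256)).
        h, s, v = cv2.split(cv2.cvtColor(smooth, cv2.COLOR_BGR2HSV))
        valid_int = v > min_intensity            # color too noisy below this
        valid_hue = valid_int & (s > min_saturation)
        # Step 3: histogram hue and intensity inside the reference trapezoid.
        hue_hist = np.bincount(h[ref_mask & valid_hue], minlength=180)
        int_hist = np.bincount(v[ref_mask & valid_int], minlength=256)
        # Step 4: ground pixels need well-supported hue and intensity bins.
        return valid_hue & (hue_hist[h] > hue_thresh) & (int_hist[v] > int_thresh)

The membership score X(S_i) of a super-pixel given as a boolean mask is then simply the mean of the returned ground mask over that region, i.e. the fraction of its pixels classified as ground.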
Fig. 1.3: Ground-Plane Extraction using Color-Model
1.4 Local Map Construction

Occupancy maps are very suitable representations for SLAM purposes. Although occupancy grids may be implemented in any number of dimensions, 2D grids are mostly used. The reduction of the 3D data of the world to a 2D representation is justified, since a robot fundamentally moves on the ground plane, and hence inhabits a 2D world. Each grid cell represents a space in the physical world, and the value in the cell signifies the probability of it being occupied.

From the disparity image obtained in 1.1, the ground portion is identified using 1.2 and 1.3 and removed from the disparity image, leaving only the obstacles. In this disparity image, darker areas indicate lower disparities and are farther away from the camera. From the resulting disparity image, the maximum disparity is taken from each column, to create a single row of maximum disparities. Fig. 1.4 (b) shows these column values converted into distance, and (c) shows these distance values converted into an occupancy grid representation, with black indicating the uncertain region around the obstacles, and white indicating regions that are clear. Spikes such as those in fig. 1.4 (b) may be present if there are mismatches in the stereo algorithm. A sketch of this column-wise reduction follows the figures.

Fig. 1.4 (a) Row of Maximum Disparity
Fig. 1.4 (b) Column vs Distance Graph (c) Occupancy Grid Representation
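A minimal sketch of the reduction from a ground-removed disparity image to one distance per column; the focal length (in pixels) and baseline are illustrative calibration values.

    import numpy as np

    def column_distances(disparity, f_px=500.0, baseline=0.09):
        # One maximum disparity per column: the nearest obstacle in that
        # viewing direction (larger disparity = closer to the camera).
        max_disp = disparity.max(axis=0)
        # Stereo range equation Z = f * B / d; columns with no obstacle
        # (zero disparity) are reported as infinitely far away.
        with np.errstate(divide="ignore"):
            return np.where(max_disp > 0, f_px * baseline / max_disp, np.inf)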
The spikes can be overcome using surface segmentation. These spikes are caused by errors which are locally stable but not large, and which have no support from the surrounding surfaces. True surfaces in the stereo image should not only be locally consistent, but should also be part of a larger 3D surface. To segment the image into surfaces of continuous disparity, the following rule is applied:

    label(i) = L   if there exists j ∈ N(i) with label(j) = L and |d_j - d_i| ≤ 1

where i is any given pixel, L is a surface label, N(i) is a neighborhood of pixels around i, and d_i is the disparity value at location i.

Now, using fig. 1.4 (c), the map has to be updated. Though efficient probabilistic frameworks exist for this purpose, this report considers a much simpler method. Grid values range from 0 to 255, with 0 indicating free space, and all cells start with a value of 128. With fig. 1.4 (c), the update rule is (see the sketch below):

    If i ∈ OCC(r):   G(i) = G(i) + K
    If i ∈ CLEAR(r): G(i) = G(i) - K

where i is an occupancy grid location, r is the sensor reading, OCC(r) represents the region of uncertainty around the sensed obstacle, CLEAR(r) represents the clear region between the robot and the sensed obstacle, G(i) represents the current grid value at location i, and K is a constant. Typically, one may take K = 20, and treat G(i) < 50 as free of obstacles and G(i) > 150 as definitely obstructed.
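A minimal sketch of this update rule, with OCC(r) and CLEAR(r) supplied as boolean masks over the grid (how fig. 1.4 (c) is rasterized into those masks is left out); the grid size is an arbitrary example.

    import numpy as np

    K = 20                                            # update constant from the text
    grid = np.full((200, 200), 128, dtype=np.int16)   # 128 = unknown everywhere

    def update_grid(grid, occ_mask, clear_mask, k=K):
        # Raise cells in the uncertainty region around sensed obstacles,
        # lower cells in the clear region between robot and obstacle, and
        # clamp to the 0-255 range (0 = certainly free).
        grid[occ_mask] = np.minimum(grid[occ_mask] + k, 255)
        grid[clear_mask] = np.maximum(grid[clear_mask] - k, 0)
        return grid

After repeated updates, cells with grid < 50 are taken as free and cells with grid > 150 as definitely obstructed.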
2. Global Map-Building

The aim is to have accurate registration of the local surroundings, and to maintain global consistency at each step. The tools at our disposal are visual odometry (VO) and shaft-encoder data. Visual odometry ensures local consistency in map registration; over larger regions, VO provides the necessary corrections to the shaft-encoder data to keep the errors from growing without bound.

The key aspect of VO is to track features from one frame to another, and hence determine the relative incremental motion between two frames, that is, the position and orientation displacement between them. In our case the features are Harris corners. After they are extracted, the optical flow field is constructed. From this point, several parameters are calculated using camera and epipolar geometry and a Kalman filter; however, this report does not go into a detailed description of these.
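A minimal sketch of the tracking step, using OpenCV's Harris-based corner detector and pyramidal Lucas-Kanade flow as one common way to build the flow field; the parameter values are illustrative.

    import cv2

    def track_features(prev_gray, curr_gray, max_corners=300):
        # Harris corners in the previous frame (useHarrisDetector=True
        # switches goodFeaturesToTrack to the Harris response).
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=7,
                                      useHarrisDetector=True, k=0.04)
        # Pyramidal Lucas-Kanade flow: where did each corner move to?
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                  pts, None)
        ok = status.ravel() == 1
        # The matched pairs feed the epipolar-geometry and Kalman-filter
        # stages that estimate the incremental motion.
        return pts[ok].reshape(-1, 2), nxt[ok].reshape(-1, 2)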
References

[1] Don Murray and Jim Little: Using Real-Time Stereo-Vision for Mobile Robot Navigation
[2] D. Santosh, Supreeth Achar, C. V. Jawahar: Autonomous Image-based Exploration for Mobile Robot Navigation
[3] Iwan Ulrich and Illah Nourbakhsh: Appearance-Based Obstacle Detection with Monocular Color-Vision
[4] Konolige, Agarwal, Bolles, Cowan, Fischler, Gerkey: Outdoor Mapping and Navigation using Stereo-Vision
[5] Felzenszwalb and Huttenlocher: Efficient Graph-Based Image Segmentation
