Kinect is a motion sensing input device by Microsoft (2010) for Xbox 360 and
Xbox One video game consoles and Windows PCs.
It is based around a webcam-style add-on peripheral that enables users to
control and interact with their console or computer without the need for a
game controller, through a natural user interface using gestures and spoken
commands.
It has an infrared receiver, an infrared transmitter, an RGB camera and a
microphone array for taking real-time input from the environment.
The Kinect’s depth sensor produces a depth image at 640x480 pixels. The
sensor has a field of view of 58° horizontally and 45° vertically. The optimal
operating range of the sensor is said to be 0.8 to 3.5 m, which can be
extended to 0.6 to 6 m.
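As a rough illustration of what these figures imply, the sketch below computes the area covered by the depth image at a given distance and the approximate size of one pixel there. It assumes an ideal pinhole camera whose field of view is exactly 58° by 45°; the function names are illustrative and not part of any Kinect SDK.

```python
import math

H_FOV_DEG = 58.0                 # horizontal field of view from the text
V_FOV_DEG = 45.0                 # vertical field of view from the text
WIDTH_PX, HEIGHT_PX = 640, 480   # depth image resolution from the text


def footprint(distance_m):
    """Return (width_m, height_m) of the area seen at the given distance."""
    width = 2.0 * distance_m * math.tan(math.radians(H_FOV_DEG) / 2.0)
    height = 2.0 * distance_m * math.tan(math.radians(V_FOV_DEG) / 2.0)
    return width, height


def metres_per_pixel(distance_m):
    """Approximate horizontal size of one depth pixel at the given distance."""
    width, _ = footprint(distance_m)
    return width / WIDTH_PX


# Evaluate at both ends of the optimal operating range.
for d in (0.8, 3.5):
    w, h = footprint(d)
    print(f"{d} m: {w:.2f} m x {h:.2f} m, {metres_per_pixel(d) * 1000:.1f} mm/pixel")
```

The footprint grows linearly with distance, so the same 640x480 image describes the scene more coarsely the farther away the surface is.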
This paper describes the use of Microsoft Kinect
for building maps of rooms. These maps can be
used for navigation and robot path finding. The
Kinect system is arguably the most popular 3-D
camera technology currently on the market.
Mapping rooms has become a trend since the
release of Microsoft Kinect.
The wide availability of affordable RGB-D cameras
is causing a revolution in perception and changing
the landscape of robotics and related fields.
The Kinect’s depth sensor has become popular,
replacing expensive systems such as tilting laser
range finders and stereoscopic systems. In this
paper, we explore some of the recent trends in
mapping using the Kinect.
• Teleoperation means controlling robotic vehicles from a remote or
• It implies a client-server architecture in which one machine controls
the vehicle and collects the telemetry, while another machine serves the
operator: it visualizes the telemetry and generates commands
for the vehicle.
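The exchange between the two machines can be sketched as two message types passed over a byte channel. This is only an illustration: the field names, the JSON encoding and the `encode`/`decode` helpers are assumptions, not the format used by the system described here, and the network transport itself is left out.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Telemetry:
    """Sent from the vehicle-side machine to the operator (hypothetical fields)."""
    timestamp_s: float
    x_m: float
    y_m: float
    heading_rad: float


@dataclass
class Command:
    """Sent from the operator to the vehicle-side machine (hypothetical fields)."""
    linear_mps: float
    angular_radps: float


def encode(message):
    """Serialize a message to bytes, tagging it with its type name."""
    payload = {"type": type(message).__name__, "data": asdict(message)}
    return json.dumps(payload).encode("utf-8")


def decode(raw):
    """Rebuild a Telemetry or Command object from bytes produced by encode()."""
    payload = json.loads(raw.decode("utf-8"))
    kinds = {"Telemetry": Telemetry, "Command": Command}
    return kinds[payload["type"]](**payload["data"])


# Vehicle side packs telemetry; operator side unpacks it and replies with a command.
wire = encode(Telemetry(timestamp_s=12.5, x_m=1.0, y_m=2.0, heading_rad=0.1))
received = decode(wire)
reply = decode(encode(Command(linear_mps=0.3, angular_radps=0.0)))
```

Tagging each message with its type lets a single channel carry both directions of traffic without ambiguity.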
Obstacle Detection & Avoidance System
Region of Interest (ROI): The ROI is a region of the image
that is specially chosen for successive image analysis operations for the
robotic entity. All pixels outside the ROI are discarded. For a robot
navigating around spaces, the region of interest is the "available space"
for movement.
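A minimal sketch of the ROI idea on a depth image follows. It assumes a rectangular ROI, depth values in millimetres and 0 as the "no reading" marker; the helper names and the sample values are made up for illustration.

```python
def crop_roi(image, top, left, height, width):
    """Keep only the pixels inside the rectangular ROI; the rest is discarded."""
    return [row[left:left + width] for row in image[top:top + height]]


def free_space_fraction(depth_roi, min_clear_mm):
    """Fraction of valid ROI pixels whose depth is at least min_clear_mm."""
    valid = [d for row in depth_roi for d in row if d > 0]  # 0 marks "no reading"
    if not valid:
        return 0.0
    return sum(1 for d in valid if d >= min_clear_mm) / len(valid)


# Tiny 4x4 depth image in millimetres (made-up values).
depth = [
    [0,   900, 2500, 2600],
    [850, 900, 2400, 2500],
    [800, 950, 2300, 2400],
    [800, 900, 2200, 0],
]
roi = crop_roi(depth, top=1, left=1, height=2, width=3)
clear = free_space_fraction(roi, min_clear_mm=1000)
```

Restricting later processing to the ROI keeps the per-frame cost proportional to the area the robot can actually move into.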
The Gaussian Image in Computer Vision lets machines differentiate between
primitive geometric objects. It maps geometric objects based on area,
position and normal direction as a complex value.
Kalman Filter Framework: This algorithm estimates the state of a dynamic
system from noisy measurements. The basic filter assumes a linear system;
variants such as the Extended Kalman Filter handle non-linear ones. It is
widely used in robotics and computer vision. The Kalman
filter also provides a statistically robust framework for fusing different
measurement modalities. The filter maintains an estimate of the uncertainty
in the tracked parameters, which can be useful for evaluating tracking
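The predict and update cycle can be sketched for the simplest case, a single scalar state that is assumed to stay constant between measurements. The noise variances `q` and `r` and the sample measurements are illustrative assumptions.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle for a scalar state with a constant-value model.

    x, p: previous estimate and its variance
    z:    new noisy measurement
    q, r: process and measurement noise variances
    """
    # Predict: the state is assumed constant, so only the uncertainty grows.
    x_pred = x
    p_pred = p + q
    # Update: blend prediction and measurement according to the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new


# Start from a vague guess and fold in five noisy readings of a value near 2.0.
x, p = 0.0, 1.0
for z in [2.1, 1.9, 2.05, 2.0, 1.95]:
    x, p = kalman_step(x, p, z, q=1e-4, r=0.04)
```

The variance `p` returned by each step is the uncertainty estimate mentioned above: it shrinks as measurements accumulate.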
In Simultaneous Localization and
Mapping (SLAM), the robot is left in
an unknown location. The robot
moves and builds a consistent map of
its surroundings. Apart from the
above-mentioned algorithms, the Extended
Kalman Filter, FastSLAM and
Occupancy Grid Maps are used. Most of
the algorithms use odometry and
proximity sensors to implement
localization and mapping.
RGB-D SLAM Algorithm
The SLAM is divided into
Combining the RGB and
depth images: The RGB and
depth images of the current
frame must be combined
in order to create a 3D point
cloud relative to the Kinect
camera. The depth data is
obtained directly from the IR
camera. The IR and RGB
images can be mapped to
one another in the same
manner as a traditional
stereo camera setup, but the
cameras must first be calibrated.
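Combining the two images can be sketched as back-projecting every valid depth pixel through a pinhole model and attaching the color found at the same pixel. The sketch assumes the RGB image has already been registered to the depth image, that the principal point lies at the image center, and that the focal lengths follow from the 58° by 45° field of view quoted earlier; a real setup would use calibrated intrinsics instead.

```python
import math

WIDTH, HEIGHT = 640, 480
# Focal lengths in pixels derived from the quoted field of view (an assumption).
FX = (WIDTH / 2.0) / math.tan(math.radians(58.0) / 2.0)
FY = (HEIGHT / 2.0) / math.tan(math.radians(45.0) / 2.0)
CX, CY = WIDTH / 2.0, HEIGHT / 2.0  # assumed principal point


def back_project(u, v, depth_m):
    """Map pixel (u, v) with depth in metres to a 3D point in the camera frame."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return (x, y, depth_m)


def colored_cloud(depth, rgb):
    """Build [(x, y, z, r, g, b), ...] from registered depth and RGB images.

    depth[v][u] is in metres (0 means no reading); rgb[v][u] is an (r, g, b) tuple.
    """
    cloud = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d > 0:
                cloud.append(back_project(u, v, d) + tuple(rgb[v][u]))
    return cloud
```

Pixels without a depth reading are skipped, so the cloud contains only points that have both a position and a color.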
A top down, orthographic view of a three
dimensional map generated from Kinect data with a
map created by the SLAM library
Feature Extraction and Matching: This step is a major part of the
SLAM front-end, which is responsible for establishing spatial
relations from the sensor data. There are two phases in this step:
the extraction of interest points or features from the RGB image of
the current frame (converted to gray scale), and then matching or
tracking those points back to the RGB image in the previous
frame. An additional constraint is added to the feature
matching algorithm: each matching pair of features must have a
corresponding 3D point in their respective frames. This step is the
most computationally expensive in the algorithm when
implemented on mobile hardware. Therefore, the speed of the
system, as a whole, is dependent on the methods used for feature
detection and matching.
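The additional constraint can be sketched as a filter applied after matching: a pair is kept only if both keypoints land on a pixel with a depth reading. Feature detection and matching themselves are not shown; the match format, the 0 "no reading" marker and the sample data are assumptions for illustration.

```python
def filter_matches_with_depth(matches, depth_prev, depth_curr):
    """Keep only matches whose keypoints have a depth reading in both frames.

    matches: list of ((u_prev, v_prev), (u_curr, v_curr)) integer pixel pairs
    depth_*: row-major depth images, where 0 means "no reading"
    """
    kept = []
    for (u0, v0), (u1, v1) in matches:
        if depth_prev[v0][u0] > 0 and depth_curr[v1][u1] > 0:
            kept.append(((u0, v0), (u1, v1)))
    return kept


# Made-up 2x2 depth images (millimetres) and candidate matches.
depth_prev = [[0, 1200], [900, 0]]
depth_curr = [[1100, 0], [950, 1000]]
matches = [((1, 0), (0, 0)), ((0, 0), (0, 0)), ((0, 1), (1, 0)), ((0, 1), (0, 1))]
usable = filter_matches_with_depth(matches, depth_prev, depth_curr)
```

Only the surviving pairs can contribute 3D-to-3D correspondences for estimating the motion between frames.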
Graph Optimization and Map Building: The pairwise
transformations between sensor poses, as computed by the
front-end, form the edges of a pose graph. Due to estimation
errors, the edges form no globally consistent trajectory. To
create a globally consistent trajectory we optimize the pose
graph using the g2o framework. The g2o framework is an
easily extensible graph optimizer that can be applied to a
wide range of problems including several variants of SLAM
and bundle adjustment.
It computes a globally consistent trajectory.
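The idea behind pose graph optimization can be shown on a toy problem with one-dimensional poses. This is not the g2o API and not the optimizer used here: it is a plain gradient-descent least-squares fit with equal edge weights, and the edge values are made up. Real RGB-D SLAM optimizes full 3D poses with rotation.

```python
def optimize_pose_graph(num_poses, edges, iterations=2000, step=0.1):
    """Least-squares fit of 1D poses to relative measurements by gradient descent.

    edges: list of (i, j, z) meaning "pose j minus pose i was measured as z".
    Pose 0 is held fixed at 0 to anchor the trajectory.
    """
    x = [0.0] * num_poses
    for _ in range(iterations):
        grad = [0.0] * num_poses
        for i, j, z in edges:
            residual = (x[j] - x[i]) - z
            grad[j] += 2.0 * residual
            grad[i] -= 2.0 * residual
        for k in range(1, num_poses):  # skip the anchored pose 0
            x[k] -= step * grad[k]
    return x


# Three odometry edges of 1.0 m each, plus one longer-range constraint that disagrees.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 2.7)]
poses = optimize_pose_graph(4, edges)
```

Chaining the three odometry edges alone would place the last pose at 3.0 m, while the fourth edge says 2.7 m. The optimizer spreads that 0.3 m disagreement evenly over the four equally weighted edges, which is the sense in which the resulting trajectory is globally consistent.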
Vehicle Pose Error
Sensor Random Error
The errors are minimized using OctoMap. The
OctoMap library implements a 3D occupancy
grid mapping approach, providing data
structures and mapping algorithms in C++
particularly suited for robotics.
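The occupancy update behind this kind of mapping can be sketched with a log-odds value per voxel. The sketch below is written in Python with a flat dictionary of voxels rather than an octree, and it does not use the OctoMap C++ API; the voxel size and the hit/miss probabilities are illustrative assumptions.

```python
import math

VOXEL_SIZE_M = 0.1             # hypothetical grid resolution
L_HIT = math.log(0.7 / 0.3)    # log-odds added when a voxel is observed occupied
L_MISS = math.log(0.4 / 0.6)   # log-odds added when a voxel is observed free


def voxel_key(point):
    """Index of the voxel containing a 3D point (x, y, z) in metres."""
    return tuple(math.floor(c / VOXEL_SIZE_M) for c in point)


def update(grid, point, occupied):
    """Fold one observation of `point` into the log-odds grid (a dict)."""
    key = voxel_key(point)
    grid[key] = grid.get(key, 0.0) + (L_HIT if occupied else L_MISS)


def probability(grid, point):
    """Occupancy probability of the voxel containing `point` (0.5 if unseen)."""
    log_odds = grid.get(voxel_key(point), 0.0)
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))


grid = {}
for _ in range(3):
    update(grid, (1.23, 0.04, 0.51), occupied=True)   # same voxel seen three times
update(grid, (2.0, 0.0, 0.5), occupied=False)         # another voxel seen free once
```

Because repeated observations add up in log-odds space, a single noisy reading cannot flip a voxel that has been seen consistently, which is how random sensor error is averaged out over time.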
3-D Mapping: The three dimensional mapping used
by the researchers can be improved upon in several
Latency Reduction: In teleoperation there is some
latency between commanding the vehicle and
receiving updated telemetry.
Color Image Integration: Adding the data from the
color image on the Kinect would add to the photo
realism of the teleoperation telemetry.
In this presentation, SLAM for three-dimensional
mapping has been explored in detail.
SLAM extracts visual keypoints from the color images and
uses the depth images to localize the keypoints in 3D.
In the future, researchers will attempt shadow
tracking, in which the Kinect is placed on CSUF's
Unmanned Utility Robotic Ground Vehicle (UURGV) to follow
a person around until that person stops.
A big advantage is that the Kinect sensor is very cheap,
costing about $150 at this time, in comparison with other
depth cameras and laser sensors which can cost up to several
I would like to thank Dr. Sudeep Pasricha for teaching us
Embedded Design of Hardware/Software systems and
making it a wonderful experience.