This document summarizes an approach to object recognition and 6-DoF pose estimation from RGB-D images. The method first segments the scene to separate objects from background surfaces such as tables, then clusters the remaining points into individual objects. For each object model, it generates synthetic views from different angles and extracts a global feature descriptor combining geometry and colour information. During recognition, it extracts descriptors from the segmented objects, finds the nearest matches in the training database, and recovers the object identity and pose. Experimental results demonstrate high recognition rates and pose accuracy, even under occlusion.
Object recognition and pose estimation from an RGB-D image
Research Internship Report (Bericht zur Forschungspraxis)
by
Chiraz Nafouki
born 15.09.1992
residing at:
Willi-Graf-Strasse 17 -0211
80805 München
Tel.: 015175658353

Lehrstuhl für Steuerungs- und Regelungstechnik
Technische Universität München
Univ.-Prof. Dr.-Ing./Univ. Tokio Martin Buss

Fachgebiet für Dynamic Human-Robot-Interaction for Automation Systems
Technische Universität München
Prof. Dongheui Lee, Ph.D.

Advisor: M.Sc. Shile Li
Start: 07.09.2015
Interim report: 15.10.2015
Submission: 18.01.2016
Abstract
Object recognition and 6-DoF pose estimation play an important role in many robotic applications. They make it possible to identify objects in the environment and to accurately estimate their pose in space.
In this report, a new feature descriptor that combines geometry and colour information for object recognition and 6-DoF pose estimation is presented. The training database is obtained by generating synthetic views of the 3D object models. At the recognition stage, the scene is segmented, and the objects and their poses are recognized by searching for the k-nearest neighbours in the database.
Chapter 1
Introduction
Object recognition and 6-DoF pose estimation have gained increasing interest in recent years. In many robotic applications, such as object grasping, tracking and occlusion handling, the robot's perception system must correctly identify objects and accurately estimate their 6-DoF pose in real time. Methods for object recognition can be classified into two main categories: 2D-based methods and 3D-based methods. The 2D-based methods first extract keypoints from an image; a descriptor is then evaluated for each keypoint, usually based on its surrounding pixels, and the descriptors are saved in a database. The 3D-based methods were developed to overcome the limits of the 2D-based methods, which cannot fully represent spatial information. With the advent of low-cost RGB-D devices such as the Kinect introduced by Microsoft, 3D-based techniques have become an increasingly interesting research area.
To achieve 3D object recognition, many feature descriptors have been developed. There are two main kinds: local descriptors and global descriptors. Like 2D-based methods, local descriptors are based on extracting keypoints from the 3D object model. Each local descriptor corresponds to one keypoint, so there are as many local descriptors as extracted keypoints. With a global descriptor, on the other hand, a whole partial view of an object corresponds to one single descriptor. Unlike local descriptors, global descriptors do not require keypoint detection, which reduces the computational cost, especially in the matching stage. We therefore chose to develop a global descriptor for object recognition and pose estimation, because global descriptors are faster, more robust against noise and more suitable for real-time applications than local descriptors.
The pipeline of object recognition and pose estimation with global descriptors is presented in Fig. 1.1. The first step is object modelling: the 3D object models are built, and a large number of synthetic views is generated for each object model. Since we want to recognize each object under different poses, and each viewpoint corresponds to a unique object pose with respect to the camera, we generate as many viewpoints as possible in order to obtain a more accurate pose estimate. For our experiments, we used 1260 generated synthetic views per object. A global descriptor is then estimated for each synthetic view in the feature extraction step and stored, together with the object pose label, in a database.

Figure 1.1: Pipeline of object recognition and pose estimation with global descriptors

For the testing stage, we consider a real colour point cloud scene, and the aim is to identify the different objects in the scene and estimate their poses. To this end, a segmentation of the scene is first performed in order to filter out the horizontal surface on which the objects are assumed to be lying. The remaining points in the scene should correspond to the different objects. These points are then clustered based on Euclidean distance, so that each cluster corresponds to one object to be recognized. After that, a global feature descriptor is estimated for each cluster and compared to the descriptors of the training data. Object recognition and pose estimation result from finding the best match, i.e. the nearest neighbour in the database with respect to a chosen metric. Finally, the pose estimate can be refined with an optimization technique such as Iterative Closest Point (ICP).
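As a small illustration of this refinement step, PCL's ICP implementation can be applied to the matched model view and the scene cluster. A minimal sketch, assuming pcl::PointXYZ clouds; the iteration budget and correspondence distance are assumed values, not taken from the report:

#include <Eigen/Core>
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/registration/icp.h>

// Refine the initial pose estimate by aligning the matched model view
// (source) to the segmented scene cluster (target).
Eigen::Matrix4f refinePose(pcl::PointCloud<pcl::PointXYZ>::Ptr source,
                           pcl::PointCloud<pcl::PointXYZ>::Ptr target)
{
    pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
    icp.setInputSource(source);
    icp.setInputTarget(target);
    icp.setMaximumIterations(50);            // iteration budget (assumed)
    icp.setMaxCorrespondenceDistance(0.05);  // metres; ignore far matches (assumed)
    pcl::PointCloud<pcl::PointXYZ> aligned;
    icp.align(aligned);                      // runs the optimization
    return icp.getFinalTransformation();     // refined 4x4 rigid transform
}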
Our work focused mainly on the feature extraction part. We developed a global descriptor that combines geometry and colour information. Our motivation was that in many 2D-based and 3D-based recognition approaches, colour information has been shown to improve the recognition rate and to solve the problem of distinguishing objects that have similar shapes but different colours. Logically, this should also hold for global descriptors. However, most global descriptors from the state of the art use only geometric information and ignore colour. We therefore try to include colour information efficiently in our global descriptor in order to improve the recognition rate and the pose estimation accuracy. For this purpose, the Point Cloud Library (PCL), an open-source library for 3D point cloud processing [RC11], was used.
This report is organized as follows. In Chapter 2, two global descriptors from the state of the art are presented. In Chapter 3, we describe the object modelling of the training data. Our global descriptor and the feature extraction process are presented in Chapter 4. The object recognition and pose estimation steps are described in Chapter 5. In Chapter 6, we present our experimental set-up and results. Finally, we conclude our work in Chapter 7.
Chapter 2
Related Work
In this section, we present two global descriptors from the state of the art that enable real-time object recognition and pose estimation, and we explain the underlying principles of each. Our descriptor was inspired in some respects by these two descriptors, and we aimed to overcome some of their drawbacks, which are explained below.
2.1 The Viewpoint Feature Histogram
The Viewpoint Feature Histogram (VFH) is a global descriptor based on the surflet-pair relation. The surflet-pair relation [WHH03] encodes the geometric properties of an object's surface using surface normals. Let p_i and p_j be two arbitrary points of an object and n_i and n_j their corresponding surface normals. The surflet-pair relation describes the Euclidean distance between p_i and p_j, as well as the angles between each surface normal and the line connecting the two points, as shown in Fig. 2.1.

Figure 2.1: Surflet-pair relation between two points p_i and p_j

Many local and global descriptors use the surflet-pair relation, such as the Point Feature Histogram (PFH) [Rus09], the Fast Point Feature Histogram (FPFH) [RBB09] and the Ensemble of Shape Functions (ESF) [WV11]. The Viewpoint Feature Histogram calculates the surflet-pair relation between each point and the centroid of the object, and accumulates the estimated point-pair features into a single histogram. The size of the VFH descriptor is 308, which enables real-time performance in object recognition applications. However, the VFH descriptor includes only geometric information and does not take colour into account. It therefore cannot distinguish between two objects that have the same shape but different colours, such as the two cans in Fig. 2.2. Another issue with the VFH descriptor is that its accuracy depends strongly on the estimate of the centroid's position: if the object is partly occluded, the centroid is estimated incorrectly, which may affect the recognition results.
Figure 2.2: An example of two objects with the same shape but different colours: a Sprite can and a Mezzo Mix can
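For reference, VFH is implemented in PCL. A minimal sketch of computing it for one segmented cluster; the normal estimation radius is an assumed value:

#include <pcl/point_types.h>
#include <pcl/search/kdtree.h>
#include <pcl/features/normal_3d.h>
#include <pcl/features/vfh.h>

// Compute the 308-bin VFH descriptor for one segmented object cluster.
pcl::PointCloud<pcl::VFHSignature308>::Ptr
computeVFH(pcl::PointCloud<pcl::PointXYZ>::Ptr cluster)
{
    // Surface normals are required by the surflet-pair features.
    pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
    pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
    ne.setInputCloud(cluster);
    ne.setSearchMethod(tree);
    ne.setRadiusSearch(0.01);  // normal estimation radius in metres (assumed)
    pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
    ne.compute(*normals);

    pcl::VFHEstimation<pcl::PointXYZ, pcl::Normal, pcl::VFHSignature308> vfh;
    vfh.setInputCloud(cluster);
    vfh.setInputNormals(normals);
    vfh.setSearchMethod(tree);
    pcl::PointCloud<pcl::VFHSignature308>::Ptr descriptor(
        new pcl::PointCloud<pcl::VFHSignature308>);
    vfh.compute(*descriptor);  // one VFHSignature308 for the whole cluster
    return descriptor;
}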
2.2 The Viewpoint oriented Colour-Shape Histogram
The Viewpoint oriented Colour-Shape Histogram (VCSH) [Li12] is a global descriptor that includes both geometric and colour information. It also uses the surflet-pair relation to estimate the geometric features of the object, and adds colour information using a smoothed colour ranging method. The geometric part is obtained by calculating four features for each point p_i of the object: the distance between p_i and the centroid, the angle between the surface normal at p_i and the viewpoint direction, the angle between the viewpoint direction and the line connecting p_i to the centroid, and the distance between the centroid and its projection onto the surface tangent at p_i. For the colour information, VCSH uses the HSV colour space. Unlike RGB, HSV is invariant to illumination changes because it separates colour from intensity information. The descriptor defines 8 colour ranges, and within each colour range each of the four geometric features is encoded in 30 bins, so the total size of VCSH is 960.

Although the correlation between colour and geometry in this descriptor allows more accurate object recognition and pose estimation, the large size of the descriptor increases the computational cost, especially in the matching stage. Moreover, the colour histogram used in VCSH considers only the total distribution of colours in the object and does not take into account their spatial location. Two objects can therefore have the same colour distribution but be perceptually different, as in Fig. 2.3.

Figure 2.3: Two perceptually different objects with the same colour distributions.

Our descriptor aims at overcoming this problem by dividing the object into subregions based on the distance to the centroid and estimating a colour histogram for each subregion. For the colour histogram, the CIELab colour space is used. Like HSV, CIELab separates colour components from intensity components, which makes it invariant to illumination changes. Moreover, CIELab is numerically more stable than HSV at low illumination intensity and was therefore chosen for our work.
Chapter 3
Scene Segmentation and Object Modelling
In this section, we describe the preprocessing of the testing data by scene segmentation and object clustering, as well as the object modelling of the training data. In the training dataset, the object models are already clustered, so that each point cloud corresponds to a single object captured from a known viewpoint. A descriptor is calculated for each object model and stored in the database with the object identity and pose labels. In the testing dataset, however, the objects to be recognized are part of a complex real-world scene: they are not separated beforehand from the background or from other objects, as they are in the training data. Therefore, we first perform a segmentation of the scene in order to extract the horizontal plane (table, floor, ...) on which the objects are assumed to be lying. Then we cluster the remaining point clouds in the scene, so that each cluster corresponds to a single object to be recognized. After that, we estimate a feature descriptor for each object and compare it to the descriptors in the training dataset in order to recognize the object and extract its initial pose estimate.
3.1 Plane Segmentation and Object Clustering
Plane segmentation consists in identifying the horizontal plane on which the objects lie and extracting it from the point cloud in order to facilitate the clustering of the remaining objects. Fig. 3.1 shows an example of a real-world scene captured with a Kinect. The scene contains not only the objects that need to be recognized, but also a horizontal white plane corresponding to the table on which the objects lie.

Figure 3.1: An example of a real-world scene before segmentation

To extract the horizontal plane from the scene, the Random Sample Consensus (RANSAC) method is used to estimate the plane parameters. If we denote the scene point cloud by P, a plane H can be described by four parameters {a, b, c, d} and is defined as

H = {(x, y, z)^T ∈ P | ax + by + cz + d = 0}   (3.1)

To estimate the parameters {a, b, c, d}, the RANSAC algorithm first selects three random points from the point cloud P and calculates the four parameters of the plane P_1 defined by these three points. The algorithm then calculates the Euclidean distance of every point p ∈ P to the plane P_1. A point p of P is considered a point of P_1 if its distance d to P_1 satisfies d < d_lim, where d_lim was set to 0.01 in our experiments. The algorithm then counts the points that belong to P_1. This process is repeated 100 times, and the plane model P_i, i ∈ [1, 100], that contains the maximum number of points is taken as the plane to be extracted and is filtered from the scene.
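In PCL, this plane removal can be written with the built-in RANSAC segmentation. A minimal sketch using the distance threshold and iteration count given above:

#include <pcl/point_types.h>
#include <pcl/ModelCoefficients.h>
#include <pcl/segmentation/sac_segmentation.h>
#include <pcl/filters/extract_indices.h>

// Remove the dominant plane (e.g. the table) from the scene cloud.
pcl::PointCloud<pcl::PointXYZRGB>::Ptr
removePlane(pcl::PointCloud<pcl::PointXYZRGB>::Ptr scene)
{
    pcl::ModelCoefficients::Ptr coeffs(new pcl::ModelCoefficients);
    pcl::PointIndices::Ptr inliers(new pcl::PointIndices);

    pcl::SACSegmentation<pcl::PointXYZRGB> seg;
    seg.setModelType(pcl::SACMODEL_PLANE);
    seg.setMethodType(pcl::SAC_RANSAC);
    seg.setDistanceThreshold(0.01);  // d_lim from the text
    seg.setMaxIterations(100);       // 100 RANSAC hypotheses, as in the text
    seg.setInputCloud(scene);
    seg.segment(*inliers, *coeffs);  // plane inliers + parameters {a, b, c, d}

    // Keep everything except the plane inliers.
    pcl::ExtractIndices<pcl::PointXYZRGB> extract;
    extract.setInputCloud(scene);
    extract.setIndices(inliers);
    extract.setNegative(true);
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr objects(new pcl::PointCloud<pcl::PointXYZRGB>);
    extract.filter(*objects);
    return objects;
}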
Once the horizontal plane has been removed from the scene, the remaining points are clustered. The first step of the clustering algorithm is based on a k-d tree search. A k-d tree [Ben75] of a point cloud P is a binary tree in which each node is a k-dimensional point. Every non-leaf node defines a splitting hyperplane that divides the space into two parts along one of the k dimensions. The steps of the clustering algorithm [Rus09] are as follows:

1) Generate a k-d tree representation of the input point cloud P.
2) Set up an empty list of clusters C and a queue Q of points that need to be checked.
3) For every point p_i ∈ P, perform the following steps:
   1. Add p_i to the current queue Q.
   2. For every point p_i ∈ Q:
      - Search for the set P_i of neighbours of p_i within a sphere of fixed radius r.
      - For every point in P_i, check whether it has already been processed; if not, add it to Q.
   3. When all points in Q have been processed, add Q to the list of clusters C and reset Q to an empty list.
4) The algorithm terminates when all points in P have been processed and are part of the list of point clusters C.
The result of the algorithm is a set of clusters, each corresponding to an individual point cloud object to be recognized; a PCL-based sketch of this clustering step is given below. Fig. 3.2 shows an example of two objects obtained after plane segmentation and object clustering of the scene shown in Fig. 3.1.

Figure 3.2: An example of two teabags obtained after plane segmentation and object clustering of the scene shown in Fig. 3.1
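PCL's Euclidean cluster extraction implements the k-d-tree-based algorithm above. A minimal sketch; the radius r and the minimum cluster size are assumed values:

#include <vector>
#include <pcl/point_types.h>
#include <pcl/search/kdtree.h>
#include <pcl/segmentation/extract_clusters.h>

// Cluster the plane-free scene into individual objects by Euclidean distance.
std::vector<pcl::PointIndices>
clusterObjects(pcl::PointCloud<pcl::PointXYZRGB>::Ptr objects)
{
    pcl::search::KdTree<pcl::PointXYZRGB>::Ptr tree(
        new pcl::search::KdTree<pcl::PointXYZRGB>);
    tree->setInputCloud(objects);

    std::vector<pcl::PointIndices> clusters;
    pcl::EuclideanClusterExtraction<pcl::PointXYZRGB> ec;
    ec.setClusterTolerance(0.02);  // neighbourhood radius r in metres (assumed)
    ec.setMinClusterSize(100);     // discard tiny noise clusters (assumed)
    ec.setSearchMethod(tree);
    ec.setInputCloud(objects);
    ec.extract(clusters);          // one PointIndices entry per object
    return clusters;
}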
3.2 Object Modelling
3.2.1 Naïve Sampling
For our experiments, we used the training dataset provided by [Li12]. The object models that form the training dataset are obtained by generating synthetic views from different viewpoints for each 3D object point cloud. To generate the 3D object models, the objects were placed on a rotating plane and point clouds corresponding to different viewpoints were captured with a Kinect sensor. Once the 3D object models are formed, the synthetic view generation phase begins. A synthetic view is the projection of the 3D object model from a certain viewpoint onto a 2D plane. Since each viewpoint corresponds to a certain object pose, the more viewpoints we have, the more accurate the pose estimation will be. For each object model, 1260 synthetic views were generated and stored in a database with the object identity and pose labels. The sampling method used for synthetic view generation in [Li12] is called naïve sampling [J.K04]: each Euler angle is sampled uniformly and independently to obtain the different rotations of the 3D object. Once the object is rotated, a projection onto a 2D plane yields the generated view. The synthetic views were generated every 10° in elevation and every 2° in azimuth. Fig. 3.3 shows some of the object models used for the training dataset. As the figure shows, some objects have the same shape but different colours, and others have similar colours but different shapes. These objects are therefore suitable for evaluating how well our descriptor combines colour and geometry information for object recognition and pose estimation.
Figure 3.3: An example of object models used for the training dataset

Figure 3.4: Generation of 1260 samples using (a) naïve sampling and (b) uniform sampling. Both figures are shown from a top view.
3.2.2 Uniform Sampling
The previous approach samples the elevation and azimuth angles uniformly and independently. However, this does not yield a uniform distribution of the generated viewpoints: the naïve sampling method produces a distribution that is heavily concentrated in the polar regions [J.K04]. This sampling is therefore inefficient, since it places more samples towards the poles than are needed for recognition, and it results in a sparser distribution near the equator, which hurts the recognition of viewpoints taken at a low elevation angle. Fig. 3.4a shows a top view of a naïve sampling of a sphere using elevation and azimuth angles to generate 1260 samples, as explained in the previous paragraph.

A better method is uniform sampling [J.K04], which generates a uniformly distributed sampling of rotations. It draws the pitch angle uniformly in the range [−π, π] and uses the inverse cosine to generate the yaw angle in the range [0, π/2]; the inverse cosine avoids oversampling the polar regions. The steps of the uniform sampling algorithm for a single viewpoint are:

1) Choose the pitch angle θ as θ = 2π · rand() − π.
2) Choose the yaw angle φ as φ = arccos(rand()).

Fig. 3.4b shows a top view of the result of a uniform sampling of a sphere using pitch and yaw angles to generate 1260 synthetic views. The oversampling near the polar regions is clearly eliminated, resulting in a uniform distribution of sampled rotations.
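The two sampling rules can be written compactly. A minimal sketch, where the struct name and RNG setup are illustrative and rand() is replaced by a uniform draw in [0, 1]:

#include <cmath>
#include <random>

const double PI = 3.141592653589793;

struct Viewpoint { double pitch, yaw; };

// Naive rule: pitch and yaw both sampled uniformly and independently,
// which over-represents viewpoints near the poles.
Viewpoint sampleNaive(std::mt19937 &rng)
{
    std::uniform_real_distribution<double> u(0.0, 1.0);
    return { 2.0 * PI * u(rng) - PI,  // pitch uniform in [-pi, pi]
             (PI / 2.0) * u(rng) };   // yaw uniform in [0, pi/2]
}

// Uniform rule: the inverse cosine on the yaw angle compensates for the
// shrinking circle circumference towards the pole (Section 3.2.2).
Viewpoint sampleUniform(std::mt19937 &rng)
{
    std::uniform_real_distribution<double> u(0.0, 1.0);
    return { 2.0 * PI * u(rng) - PI,  // pitch = 2*pi*rand() - pi
             std::acos(u(rng)) };     // yaw = arccos(rand()), in [0, pi/2]
}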
Chapter 4
Feature Extraction
In this section, we describe our global feature descriptor. It is composed of two parts. The first part contains purely geometric information and is inspired by the surflet-pair relation. This part is more robust against occlusions than the VFH or VCSH descriptors, because it does not calculate the surflet-pair relation between the centroid and the other points of the object; instead, it calculates the surflet-pair relation for randomly chosen point pairs. In the second part, colour and geometry information are correlated by dividing the object into subregions and estimating a colour histogram for each subregion. In this way, the colour locations are taken into account better than in VCSH, which only calculates the total colour distribution within the object.
Let us consider a point cloud P, which corresponds to an object and contains n
points pi, with i ∈ [1, n]. To calculate the first part of the descriptor, we repeatedly
choose random point pairs (pi, pj) from the object point cloud and calculate their
normals (ni, nj). For each randomly chosen point pair, three geometric features
inspired by the surflet-pair relation are evaluated. These features were proven to
effectively encode the geometric information of point cloud surfaces, especially when
these point clouds contain many surface variations [WHH03]. The first feature is
the Euclidean distance d(pi, pj) between the two random points:
d(pi, pj) = ||pi − pj|| (4.1)
The distance feature is encoded in a 30-bin histogram. To define the limits of each
bin, we first calculate the maximum distance dmax between two points within the
object:
dmax = max{d(pi, pj); pi ∈ P, pj ∈ P} (4.2)
Then, we normalize the bins with respect to the maximum distance dmax. An
example of the calculation of the distance histogram is shown in Fig. 4.1. For each
pair of random points chosen in the object, the distance between them is calculated
and the corresponding bin is incremented by 1. The second feature consists of the
angle between each of the two normals and the line connecting the two points.
Therefore, for each point pair (pi, pj), we evaluate two angles a1 and a2 as follows:
a1 = ∠(ni, pipj), a2 = ∠(nj, pipj) (4.3)
The angles are also encoded in a 30-bin histogram.
Figure 4.1: Estimation of the distance histogram for a point cloud object. The
distance between each randomly chosen point pair is classified into one of 30 bins.
The third geometric feature corresponds to the angle a3 between the viewpoint
direction v and the line connecting the two points:
a3 = ∠(v, pipj) (4.4)
The viewpoint direction v is defined as the line connecting the origin of the camera
to the centroid of the object. This feature is also encoded in a 30-bin histogram.
Therefore, the size of the first part of the descriptor is 30 · 3 = 90.
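As an illustration, the following sketch (hypothetical NumPy code, not the thesis implementation) builds this 90-dimensional geometric part from random point pairs; the point cloud, its normals and the viewpoint direction are assumed to be given as arrays, and we assume that a1 and a2 share a single histogram so that the total length is 90:

    import numpy as np

    def geometric_part(points, normals, viewpoint, n_pairs=2000, bins=30):
        # First part of the descriptor: three 30-bin histograms built
        # from randomly chosen point pairs (a sketch under assumed
        # conventions).
        rng = np.random.default_rng()
        i = rng.integers(0, len(points), n_pairs)
        j = rng.integers(0, len(points), n_pairs)
        seg = points[j] - points[i]              # segment connecting p_i and p_j
        dist = np.linalg.norm(seg, axis=1)
        keep = dist > 1e-9                       # drop degenerate pairs
        i, j, seg, dist = i[keep], j[keep], seg[keep], dist[keep]
        u = seg / dist[:, None]                  # unit direction of each segment

        def angle_with(vecs):
            # angle between each row of vecs and the pair direction u
            return np.arccos(np.clip(np.sum(vecs * u, axis=1), -1.0, 1.0))

        a1 = angle_with(normals[i])              # normal of p_i vs segment
        a2 = angle_with(normals[j])              # normal of p_j vs segment
        v = viewpoint / np.linalg.norm(viewpoint)
        a3 = np.arccos(np.clip(u @ v, -1.0, 1.0))  # viewpoint vs segment

        # distance bins are normalized by the maximum sampled distance
        h_d, _ = np.histogram(dist, bins=bins, range=(0.0, dist.max()))
        h_n, _ = np.histogram(np.concatenate([a1, a2]),
                              bins=bins, range=(0.0, np.pi))
        h_v, _ = np.histogram(a3, bins=bins, range=(0.0, np.pi))
        return np.concatenate([h_d, h_n, h_v])   # length 3 * 30 = 90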
The second part of our descriptor includes both colour and geometric information.
For this part, we need to divide the object into 10 regions, based on the distance to
the centroid. In order to achieve this, we start by calculating the maximum distance
d′max to the centroid c:
d′max = max{d(c, pi); pi ∈ P} (4.5)
Then, for each random point pi, we calculate its distance to the centroid d(c, pi).
The region number Rn ∈ [1, 10] to which the point pi belongs is determined as
follows:
Rn = ceil(d(c, pi) · 10 / d′max) (4.6)
Figure 4.2: Definition of 6 regions for an object characterized by its centroid c and
the maximum distance to the centroid d′max
Fig. 4.2 shows an example of the definition of regions for an object characterized
by its centroid c and the maximum distance to the centroid d′max. For the purpose
of simplification, we divided the object into 6 regions only. Once we classified the
point pi into one region between 1 and 10, we convert its RGB colour to the CIELab
colour space. CIELab is a three-dimensional colour space, introduced by the
Commission Internationale de l'Eclairage (CIE) and adopted in 1976. Unlike
RGB, CIELab is device independent: it is not tied to a specific device such as a
camera or scanner. Another advantage of CIELab compared with
other color spaces such as RGB or HSV is that it is perceptually uniform. This
means that in this color space, the distance between points is directly proportional
to colour difference perceived by the human eye [CF97]. The CIELab colour space
can be represented in three-dimensional space by a sphere with three coordinates
denoted L∗, a∗ and b∗, as shown in Fig. 4.3. The vertical L∗ axis represents
lightness (grey scale) and ranges from 0 to 100. The horizontal a∗ and b∗ axes
represent the chroma and cross each other at the centre of the sphere. The a∗ axis
is green at one extremity and red at the other. The b∗ axis is blue at one extremity
and yellow at the other. The choice of the axes a∗ and b∗ is based on the colour
opponent process, which suggests that there are three opponent channels in the
human visual system: red versus green, blue versus yellow, and black versus white
[CIE04]. If we move horizontally away from the L∗ axis, the colour saturation
increases gradually. In theory there are no maximum values of a∗ and b∗, but in
practice a∗ values range in [−500, 500] and b∗ values in [−200, 200].
Figure 4.3: CIELab colour space.
The transformation from RGB colour space to CIELab colour space is a non-linear
transformation. The RGB values are first normalized to the range [0, 1] and then
transformed to an intermediate space called the CIEXYZ colour space [CF97] as
follows:
X = R ∗ 0.4124 + G ∗ 0.3576 + B ∗ 0.1805
Y = R ∗ 0.2126 + G ∗ 0.7152 + B ∗ 0.0722
Z = R ∗ 0.0193 + G ∗ 0.1192 + B ∗ 0.9505
(4.7)
After that, the XYZ values are transformed to CIELab colour space using the fol-
lowing equations:
L∗ = 116 f(Y/Y0) − 16
a∗ = 500 [f(X/X0) − f(Y/Y0)]
b∗ = 200 [f(Y/Y0) − f(Z/Z0)]
(4.8)
where X0 = 95.047, Y0 = 100.000 and Z0 = 108.883. The function f is defined as
follows:
f(x) = x^(1/3) if x > 0.008856
f(x) = 7.787x + 16/116 otherwise.
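For illustration, a NumPy sketch applying Eqs. (4.7) and (4.8) directly (following the text, no sRGB gamma linearization is applied; the factor 100 scales XYZ to the 0–100 range of the reference white given above):

    import numpy as np

    X0, Y0, Z0 = 95.047, 100.000, 108.883   # reference white, as above

    def f(x):
        # piecewise function of the XYZ -> CIELab transformation
        x = np.asarray(x, dtype=float)
        return np.where(x > 0.008856, np.cbrt(x), 7.787 * x + 16.0 / 116.0)

    def rgb_to_lab(rgb):
        # rgb: (n, 3) array with values already normalized to [0, 1]
        M = np.array([[0.4124, 0.3576, 0.1805],
                      [0.2126, 0.7152, 0.0722],
                      [0.0193, 0.1192, 0.9505]])
        xyz = rgb @ M.T * 100.0              # Eq. (4.7), scaled to 0-100
        fx, fy, fz = f(xyz[:, 0] / X0), f(xyz[:, 1] / Y0), f(xyz[:, 2] / Z0)
        L = 116.0 * fy - 16.0                # Eq. (4.8)
        a = 500.0 * (fx - fy)
        b = 200.0 * (fy - fz)
        return np.stack([L, a, b], axis=1)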
For each of the ten regions, we define a 30-bin colour histogram, in which each of
the 3 CIELab values is encoded into 10 bins. Therefore, the size of the second part of
our descriptor is 30 · 10 = 300, and the total size of the descriptor is 90 + 300 = 390.
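A corresponding sketch of this colour part, reusing the rgb_to_lab helper from the previous sketch and using the practical channel ranges mentioned above as assumed histogram limits:

    import numpy as np

    def colour_part(points, colours, n_regions=10, bins_per_channel=10):
        # Second part of the descriptor: one 30-bin CIELab histogram per
        # distance-based region (a sketch, not the thesis implementation).
        # points: (n, 3) coordinates; colours: (n, 3) RGB values in [0, 1].
        c = points.mean(axis=0)                  # centroid of the object
        d = np.linalg.norm(points - c, axis=1)
        # Eq. (4.6), written with a 0-based region index
        region = np.minimum((d * n_regions / d.max()).astype(int),
                            n_regions - 1)

        lab = rgb_to_lab(colours)                # helper from the sketch above
        # assumed channel limits: L in [0, 100]; a and b use the
        # practical ranges quoted earlier
        limits = [(0.0, 100.0), (-500.0, 500.0), (-200.0, 200.0)]

        part = np.zeros(n_regions * 3 * bins_per_channel)
        for r in range(n_regions):
            mask = region == r
            for ch, (lo, hi) in enumerate(limits):
                h, _ = np.histogram(lab[mask, ch], bins=bins_per_channel,
                                    range=(lo, hi))
                start = (r * 3 + ch) * bins_per_channel
                part[start:start + bins_per_channel] = h
        return part                              # length 10 * 30 = 300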
Fig. 4.4 shows how our descriptor is divided. The first part contains three geometric
features: the distance d between two random points, the angle between the segment
connecting two random points and each of their normals, and the angle between
the viewpoint direction and the segment connecting two random points. The second
part of the descriptor provides a correlation between colour and geometry information,
since each random point is first classified into a region based on its distance to the
centroid, and then the colour histogram corresponding to its region is incremented
based on its CIELab values.
Figure 4.4: Structure of the feature descriptor
Chapter 5
Recognition and Pose Estimation
In this chapter, the object recognition and pose estimation phases are described.
The goal of object recognition is to provide a set of candidates from the training
dataset that could correspond to an object taken from a real-world scene. The 6-DoF
pose associated with each candidate is also retrieved from the training data and
given as the initial pose estimate of the object. This initial pose estimate can be
improved using an optimization technique such as the Iterative Closest Point (ICP)
algorithm.
5.1 Object Recognition
The recognition problem can be seen as a nearest neighbour search problem.
To perform the nearest neighbour search, a k-d tree built over all the descriptors
of the training dataset was used. Then, given a feature descriptor f of an object
to be recognized, the k nearest descriptors are retrieved from the k-d tree and
sorted according to their distances to f. The Chi-Squared distance was used for the
nearest neighbour search. If f = (f1, f2, ..., f390)T and g = (g1, g2, ..., g390)T are two
descriptors, the Chi-Squared distance d(f, g) between f and g is defined by:
d(f, g) = Σ_{k=1}^{390} (fk − gk)^2 / (fk + gk). (5.1)
Once the k nearest neighbours, i.e. the descriptors having the k minimal distances
to f, are found, their corresponding pose estimations are retrieved from the database.
In our case, we take k = 1 and therefore consider only the nearest neighbour for
object recognition and pose estimation. The initial pose estimation gives us an
initial guess of the transformation matrix Tinit from the object-coordinate-system to
the camera-coordinate-system.
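For illustration, a brute-force version of this nearest neighbour search (the thesis uses a k-d tree; the exhaustive scan below is a simplification that keeps the Chi-Squared metric of Eq. (5.1)):

    import numpy as np

    def chi2_nearest(f, database):
        # f: (390,) query descriptor; database: (m, 390) training descriptors.
        # Returns the index of the nearest descriptor under Eq. (5.1).
        num = (database - f) ** 2
        den = database + f
        # bins where f_k + g_k = 0 contribute nothing to the distance
        terms = np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)
        return int(np.argmin(terms.sum(axis=1)))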
5.2 Pose Estimation
The initial pose estimation of a recognized object can be optimized using the
Iterative Closest Point (ICP) algorithm [Zha94]. ICP is used to minimize the difference
between two point clouds. One point cloud, known as the reference or target, is
kept fixed, while the other point cloud, known as the source, is transformed to best
match the reference. The algorithm iteratively changes the geometric transforma-
tion, which is a combination of translation and rotation applied on the source, in
order to minimize the distance from the source to the reference point cloud. In our
case, the reference point cloud is the object in the testing data and the source is the
recognized object in the training data, taken from a certain viewpoint. After a
certain number of iterations, ICP yields the refinement transformation Ticp. The
final transformation matrix from the object-coordinate-system to the
camera-coordinate-system is therefore obtained as follows:
Tfinal = Ticp · Tinit. (5.2)
Once we estimate Tfinal, we can apply the final geometric transformation on the
3D model of the recognized object, visualize the object in its estimated pose and
compare the estimated pose with the real pose of the object.
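The thesis uses PCL's ICP; as a sketch of the same refinement step, the snippet below uses Open3D's registration module instead (an assumed stand-in) and recovers Ticp so that Eq. (5.2) holds:

    import numpy as np
    import open3d as o3d  # assumed stand-in for PCL's ICP implementation

    def refine_pose(source_pts, target_pts, T_init, max_dist=0.02):
        # source_pts: recognized training view; target_pts: observed object.
        src = o3d.geometry.PointCloud()
        src.points = o3d.utility.Vector3dVector(source_pts)
        tgt = o3d.geometry.PointCloud()
        tgt.points = o3d.utility.Vector3dVector(target_pts)
        result = o3d.pipelines.registration.registration_icp(
            src, tgt, max_dist, T_init,
            o3d.pipelines.registration.TransformationEstimationPointToPoint())
        T_final = result.transformation              # includes the initial guess
        T_icp = T_final @ np.linalg.inv(T_init)      # so T_final = T_icp @ T_init
        return T_final, T_icp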
Chapter 6
Experimental Results
In this chapter, we present the results of some experiments that we have conducted
to evaluate the performance of our descriptor. Our experimental setting consists
of a Kinect sensor and a horizontal plane (floor or table) on which the objects to
be recognized lie, as shown in Fig. 6.1. Our code was implemented using the open-source
Point Cloud Library (PCL) and was integrated into a ROS node. This allowed
us to receive online frames from the Kinect and to publish in real time the identities
and pose estimations of the recognized objects.
Figure 6.1: Experimental setting
Our experimental results take into consideration three main aspects: recognition
rate, pose accuracy and occlusion handling. We show that our descriptor is able in
most cases to correctly identify objects and estimate their poses, even if they are
partially occluded by other objects.
6.1 Recognition rate
In order to estimate the ability of our descriptor to correctly identify an object,
we consider for each testing object its nearest neighbour in the training data. We
compare the label of the nearest neighbour to the real label of the object; if the
labels differ, we consider the recognition to be wrong, otherwise we consider it to
be correct. Another possible option would be to take several nearest neighbours
and to perform a voting process to decide the identity of the object. We took 20
different random partial views of each of the objects that we want to recognize in
the real scene. We obtain a recognition rate of 94%, which confirms the reliability
of our descriptor in correctly recognizing objects.
Figure 6.2: Original object (a) and superposition of the recognized object in its
estimated pose with the real scene (b).
We also evaluated the recognition rate using 100 random synthetic viewpoints taken
from the training data and obtained a recognition rate of 100%. We recall that our
descriptor is not deterministic, since it is based on the choice of N random points
from the point cloud object, with N being the total number of points in the object.
Therefore, the feature vector obtained by applying our descriptor to the same object
taken from a fixed viewpoint is not constant.
6.2 Pose accuracy
One of the main aims of our descriptor is to give a good estimate of the pose of the
object to be recognized. In order to evaluate the accuracy of the pose estimation, we
superpose the reconstructed 3D object model in its estimated pose onto the object
in its real pose as captured in the scene. We then inspect the superposition visually
and judge whether the pose is estimated accurately. Fig. 6.2 shows an example of
the original object taken in a real scene (Fig. 6.2a) and the superposition between
the original object and the recognized object in its estimated pose (Fig. 6.2b). As
we can notice from the superposition of the two objects, the estimated pose is very
close to the real pose. In most cases, our descriptor was able to give a very accurate
estimate of the object's pose.
Fig. 6.3 shows an example of object recognition and pose estimation of different
objects. Here, the estimated poses are used for the reconstruction of the 3D models.
Figure 6.3: Original camera view (a) and reconstructed 3D models based on the
estimated poses of the recognized objects (b).
Figure 6.4: Original camera view (a) and reconstructed 3D models based on the
estimated poses of the recognized objects in case of partial occlusion (b).
6.3 Occlusion handling
Another feature of our descriptor is its ability to handle partial occlusions. In fact,
the choice of random points to evaluate the feature vector makes our descriptor
more robust against missing points, which may result from occlusions or from sensor
inaccuracy. In order to estimate the performance of our descriptor in handling
partial occlusions, we performed many experiments in which some objects were
partially occluded by other objects. Our descriptor was able in most cases to
correctly recognize the objects. The estimated poses are less accurate than without
occlusions but are still acceptable. Fig. 6.4 shows an example of object recognition
and pose estimation in the case of a partial occlusion.
Chapter 7
Conclusion and future work
In this report, we presented our global feature descriptor, which combines colour
and geometry information for object recognition and pose estimation. Our descriptor
is composed of two main parts: the first part is based on the extraction of geometric
properties using random point pairs; the second part is based on a correlation
between colour and geometry information, also using random points from the object.
Experiments show the ability of our descriptor to recognize objects having different
shapes and colours and to accurately estimate their poses. The integration of colour
information makes our descriptor capable of distinguishing objects having similar
shapes but different colours. Moreover, the choice of random points makes our
descriptor more robust against occlusions, as our experiments also showed.
In future work, occlusion handling could be further improved, mainly by making
the second part of our descriptor independent of the centroid estimation. Another
possible solution for better occlusion handling is to apply the first part of our
descriptor independently to estimate the position of the centroid, and to use the
estimated position in the second part to include the colour information. Another
possible improvement is to use the Locality-Sensitive Hashing (LSH) algorithm
instead of the k-d tree structure for the nearest neighbour search. In fact, LSH
was proven to be more efficient than k-d trees for high-dimensional vectors, which
is the case for our descriptor.
Bibliography
[Ben75] Jon Louis Bentley. Multidimensional binary search trees used for associative
searching. Communications of the ACM, 18(9):509–517, September 1975.
[CF97] Christine Connolly and Thomas Fliess. A study of efficiency and accuracy
in the transformation from RGB to CIELab color space. IEEE Transactions
on Image Processing, 6(7), July 1997.
[CIE04] CIE. Colorimetry, 3rd Ed., Publication No. CIE 15:2004, Vienna, Austria.
Technical report, 2004.
[J.K04] James J. Kuffner. Effective sampling and distance metrics for 3D rigid body
path planning. In IEEE International Conference on Robotics and Automation
(ICRA), 2004.
[Li12] Shile Li. Fast rigid object recognition and 6-DoF pose estimation using color
and depth information. B.S. thesis, Technische Universität München, 2012.
[RBB09] Radu Bogdan Rusu, Nico Blodow, and Michael Beetz. Fast Point Feature
Histograms (FPFH) for 3D registration. In Proceedings of the IEEE
International Conference on Robotics and Automation (ICRA), Kobe, Japan,
May 12-17, 2009.
[RC11] R. B. Rusu and S. Cousins. 3D is here: Point Cloud Library (PCL). In IEEE
International Conference on Robotics and Automation (ICRA), Shanghai,
China, May 9-13, 2011.
[Rus09] Radu Bogdan Rusu. Semantic 3D Object Maps for Everyday Manipulation
in Human Living Environments. PhD thesis, Technische Universität
München, 2009.
[WHH03] Eric Wahl, Ulrich Hillenbrand, and Gerd Hirzinger. Surflet-pair-relation
histograms: A statistical 3D-shape representation for rapid classification.
In 3-D Digital Imaging and Modeling, pages 474–482, 2003.
[WV11] Walter Wohlkinger and Markus Vincze. Ensemble of shape functions for
3D object classification. In ROBIO, pages 2987–2992, 2011.
[Zha94] Zhengyou Zhang. Iterative point matching for registration of free-form
curves and surfaces. International Journal of Computer Vision, 13:119–152,
1994.