This document is a report on real-time 3D segmentation authored by three students: Gunjan Kumar Singh, Saurabh Bhardwaj, and Divya Sanghi. It was prepared for Practice School-I at CEERI under the guidance of Dr. Jagdish Raheja. The report describes an algorithm for segmenting cluttered 3D scenes in real time by first segmenting depth images into surface patches and then combining surface patches into object hypotheses using adjacency, co-planarity, and curvature matching while handling occlusion. Code implementation details and results are also provided.
Perceptual Weights Based On Local Energy For Image Quality Assessment (CSCJournals)
This paper proposes an image quality metric that correlates well with human judgment of an image's appearance. The present work adds a new dimension to structural-approach-based full-reference image quality assessment for gray scale images. The proposed method assigns more weight to distortions present in the visual regions of interest of the reference (original) image than to distortions present in the other regions of the image; these weights are referred to as perceptual weights. The perceptual features and their weights are computed based on local energy modeling of the original image. The proposed model is validated on the image database provided by LIVE (Laboratory for Image & Video Engineering, The University of Texas at Austin), using the evaluation metrics suggested in the Video Quality Experts Group (VQEG) Phase I FR-TV test.
Optimization of Macro Block Size for Adaptive Rood Pattern Search Block Match... (IJERA Editor)
In the area of video compression, motion estimation is one of the most important modules and plays a key role in the design and implementation of any video encoder. It consumes more than 85% of video encoding time, because a candidate block must be searched for in the search window of the reference frame. Various block matching methods have been developed to minimize this search time. In this context, Adaptive Rood Pattern Search is one of the less expensive block matching methods and is widely accepted for motion estimation in video data processing. In this paper we propose to optimize the macro block size used in the adaptive rood pattern search method to improve motion estimation.
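To make the block-matching idea concrete, the sketch below runs a minimal exhaustive SAD search for one macroblock in NumPy. This is a simplified stand-in, not the paper's ARPS (which evaluates only a rood-shaped subset of candidates to cut cost); the frame data, block size, and search range are invented for illustration.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return np.abs(a.astype(int) - b.astype(int)).sum()

def best_motion_vector(ref, cur, top, left, size, search=4):
    """Exhaustively search a +/- `search` pixel window in the reference
    frame for the best match to one macroblock of the current frame."""
    block = cur[top:top + size, left:left + size]
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + size <= ref.shape[0] and x + size <= ref.shape[1]:
                cost = sad(ref[y:y + size, x:x + size], block)
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
    return best_mv

# A frame that is a shifted copy of the reference should recover the shift.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (32, 32), dtype=np.uint8)
cur = np.roll(ref, shift=(1, 2), axis=(0, 1))
print(best_motion_vector(ref, cur, top=8, left=8, size=8))  # (-1, -2)
```

The inner double loop is what makes exhaustive search expensive, which is why fast patterns such as ARPS, and the choice of macroblock `size` itself, matter for encoding time.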
OBJECT SEGMENTATION USING MULTISCALE MORPHOLOGICAL OPERATIONS (ijcseit)
Object segmentation plays an important role in human visual perception, medical image processing and content-based image retrieval, and provides information for recognition and interpretation. This paper uses mathematical morphology for image segmentation. Object segmentation is difficult because one usually does not know a priori what type of object exists in an image, how many different shapes there are, and what regions the image has. To carry out discrimination and segmentation, several innovative morphology-based segmentation methods are proposed. The present study proposes a segmentation method based on multiscale morphological reconstructions. Various sizes of structuring elements are used to segment simple and complex shapes; this enhances local boundaries and may improve segmentation accuracy. The method is tested on various datasets, and the results show that it can be used for both interactive and automatic segmentation.
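The effect of varying the structuring-element size can be sketched with a plain grayscale opening (erosion followed by dilation with a flat square element); each scale suppresses objects smaller than the element. This is a minimal numpy-only illustration of the multiscale idea, not the paper's reconstruction-based method, and the synthetic image is invented.

```python
import numpy as np

def erode(img, k):
    """Flat grayscale erosion: minimum over a k x k neighborhood."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.full(img.shape, np.inf)
    for dy in range(k):
        for dx in range(k):
            out = np.minimum(out, padded[dy:dy + img.shape[0], dx:dx + img.shape[1]])
    return out

def dilate(img, k):
    """Flat grayscale dilation: maximum over a k x k neighborhood."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.full(img.shape, -np.inf)
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[dy:dy + img.shape[0], dx:dx + img.shape[1]])
    return out

def opening(img, k):
    """Opening removes bright structures smaller than the k x k element."""
    return dilate(erode(img, k), k)

# Synthetic scene: a small 3x3 bright object and a large 12x12 one.
img = np.zeros((40, 40))
img[5:8, 5:8] = 1.0
img[20:32, 20:32] = 1.0

# Scale 5 removes only the small object; scale 15 removes both.
print(int(opening(img, 5).sum()), int(opening(img, 15).sum()))  # 144 0
```

Comparing the image against its openings at successive scales is one simple way to separate shapes by size, which is the intuition behind multiscale morphological segmentation.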
COMPUTER VISION PERFORMANCE AND IMAGE QUALITY METRICS: A RECIPROCAL RELATION (csandit)
Computer vision algorithms are essential components of many systems in operation today. Predicting the robustness of such algorithms for different visual distortions is a task which can
be approached with known image quality measures. We evaluate the impact of several image distortions on object segmentation, tracking and detection, and analyze the predictability of this impact given by image statistics, error parameters and image quality metrics. We observe that
existing image quality metrics have shortcomings when predicting the visual quality of virtual or augmented reality scenarios. These shortcomings can be overcome by integrating computer vision approaches into image quality metrics. We thus show that image quality metrics can be
used to predict the success of computer vision approaches, and computer vision can be employed to enhance the prediction capability of image quality metrics – a reciprocal relation.
Iris recognition systems have attracted much attention for their uniqueness, stability and reliability. However, the performance of such a system depends on the quality of the iris image, so there is a need to select good quality images before features can be extracted. In this paper, iris quality assessment is done by evaluating the effect of standard deviation, contrast, area ratio, occlusion, blur, dilation and sharpness on iris images. A fusion method based on principal component analysis (PCA) is proposed to determine the quality score. The CASIA, IID and UBIRIS databases are used to test the proposed algorithm, and an SVM was used to evaluate its performance. The experimental results demonstrate that the proposed algorithm yields a Correct Rate of over 84% and an Area under the Curve of over 90%. The use of the character component to assess quality has been found to be sufficient for quality detection.
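A PCA-based fusion of several per-image quality measures into one score can be sketched as follows. The feature matrix here is synthetic and the column meanings are placeholders; the abstract does not specify the exact fusion rule, so this is only one plausible reading (projection onto the first principal component).

```python
import numpy as np

# Hypothetical per-image quality features (rows: images; columns might
# stand for contrast, area ratio, occlusion, blur; values are synthetic).
rng = np.random.default_rng(1)
features = rng.normal(size=(50, 4))

# PCA via SVD of the mean-centered feature matrix; projecting onto the
# first principal component collapses the measures into a single score
# that captures the direction of greatest variation across images.
centered = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt[0]

print(scores.shape)  # (50,)
```

Images could then be ranked or thresholded on `scores` before feature extraction; the sign and scale of a principal component are arbitrary, so in practice the score would be oriented against a few known good images.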
Comparative Study and Analysis of Image Inpainting Techniques (IOSR Journals)
Abstract: Image inpainting is a technique to fill in a missing region or reconstruct a damaged area of an image. It removes an undesirable object from an image in a visually plausible way, using information from the neighboring area to fill the affected part. In this dissertation work, we present an exemplar-based method for filling in the missing information in an image, which combines structure synthesis and texture synthesis. The exemplar-based approach uses local information from the image for patch propagation. We have also implemented a non-local means approach for exemplar-based image inpainting, which finds multiple samples of the best exemplar patches for patch propagation and weights their contribution according to their similarity to the neighborhood under evaluation. We have further extended this algorithm with a collaborative filtering method to synthesize and propagate multiple samples of the best exemplar patches. We performed experiments on many images and found that our algorithm successfully inpaints the target region. We tested the accuracy of our algorithm by computing PSNR and compared the PSNR values for all three approaches.
Keywords: Texture Synthesis, Structure Synthesis, Patch Propagation, Image Inpainting, Non-local Approach, Collaborative Filtering.
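The PSNR figure used to compare the three inpainting approaches has a standard definition, sketched below for 8-bit images; the toy arrays are invented for illustration.

```python
import numpy as np

def psnr(original, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE)."""
    mse = np.mean((original.astype(float) - restored.astype(float)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.full((8, 8), 100, dtype=np.uint8)
b = a.copy()
b[0, 0] = 110                        # a single corrupted pixel
print(round(psnr(a, b), 2))          # 46.19
```

Higher PSNR between the ground-truth image and the inpainted result indicates a closer reconstruction, which is how the three approaches above would be ranked.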
AUTOMATED IMAGE MOSAICING SYSTEM WITH ANALYSIS OVER VARIOUS IMAGE NOISE (ijcsa)
Mosaicing is the blending together of several arbitrarily shaped images to form one large balanced image such that the boundaries between the original images are not seen. Image mosaicing creates a large field of view of a scene, and the resulting image can also be used for texture mapping of a 3D environment. Blended images have become widely necessary for images captured from real-time sensor devices, bio-medical equipment, satellite imagery, aerospace, security systems, brain mapping, genetics, etc. The idea behind this work is to automate the image mosaicing system so that blending is fast, easy and efficient even when a large number of images is considered. This work also provides an analysis of blending over images containing different kinds of distortion and noise, which further enhances the quality of the system and makes it more reliable and robust.
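The core blending step, hiding the seam between overlapping images, can be sketched as a linear cross-fade (feathering) over the overlap region. This is a minimal illustration with invented data and a fixed horizontal overlap; a real mosaicing system would first register the images and may use more sophisticated multi-band blending.

```python
import numpy as np

def feather_blend(left, right, overlap):
    """Join two equal-height images whose last/first `overlap` columns
    depict the same region, cross-fading linearly so no seam is visible."""
    alpha = np.linspace(1.0, 0.0, overlap)            # weight for `left`
    blended = left[:, -overlap:] * alpha + right[:, :overlap] * (1 - alpha)
    return np.hstack([left[:, :-overlap], blended, right[:, overlap:]])

left = np.full((4, 6), 100.0)
right = np.full((4, 6), 200.0)
mosaic = feather_blend(left, right, overlap=2)
print(mosaic.shape)  # (4, 10)
```

The weights sum to one at every column, so constant-brightness regions stay constant and the transition between the two exposures is gradual rather than abrupt.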
Soft computing is likely to play a progressively important role in many applications, including image enhancement. The paradigm for soft computing is the human mind. The soft computing critique has been particularly strong with fuzzy logic. Fuzzy logic is a knowledge-representation rule for the management of uncertainty. In this paper the multi-dimensional optimization problem is addressed by discussing optimal thresholding using fuzzy entropy for image enhancement. This technique is compared with bi-level and multi-level thresholding, and optimal thresholding values are obtained for different levels of speckle-noisy and low-contrast images. The fuzzy entropy method produced better results than the bi-level and multi-level thresholding techniques.
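For intuition, the bi-level baseline can be sketched with an entropy criterion: choose the threshold that maximizes the summed Shannon entropy of the background and foreground histograms (Kapur's criterion). This is only a crisp stand-in, not the paper's fuzzy-entropy formulation, and the test image is synthetic.

```python
import numpy as np

def entropy_threshold(img, levels=256):
    """Return the gray level maximizing the sum of class entropies."""
    hist = np.bincount(img.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()
    best_t, best_h = 1, -np.inf
    for t in range(1, levels):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue                       # both classes must be non-empty
        q0 = p[:t][p[:t] > 0] / w0         # normalized class histograms
        q1 = p[t:][p[t:] > 0] / w1
        h = -(q0 * np.log(q0)).sum() - (q1 * np.log(q1)).sum()
        if h > best_h:
            best_h, best_t = h, t
    return best_t

# A clean bimodal image: the chosen threshold separates the two modes.
img = np.concatenate([np.full(100, 50), np.full(100, 200)]).astype(np.uint8)
print(entropy_threshold(img))  # 51
```

The fuzzy-entropy version replaces the crisp split at `t` with membership functions, which makes the criterion better behaved on speckle-noisy, low-contrast histograms.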
Developing 3D Viewing Model from 2D Stereo Pair with its Occlusion Ratio (CSCJournals)
We intend to make a 3D model from a stereo pair of images using a novel method of local matching in the pixel domain to calculate horizontal disparities. We also find the occlusion ratio from the stereo pair, followed by use of the Edge Detection and Image SegmentatiON (EDISON) system on one of the images, which provides a complete toolbox for discontinuity-preserving filtering, segmentation and edge detection. Instead of assigning a disparity value to each pixel, a disparity plane is assigned to each segment. We then warp the segment disparities to the original image to get our final 3D viewing model.
A Fuzzy Set Approach for Edge Detection (CSCJournals)
Image segmentation is one of the most studied problems in image analysis, computer vision, pattern recognition, etc. Edge detection is a discontinuity-based approach used for image segmentation. In this paper, edge detection using fuzzy sets is proposed, where an image is considered as a fuzzy set and pixels are taken as elements of the fuzzy set. The fuzzy approach converts the color image to a partially segmented image; finally an edge detector is convolved over the partially segmented image to obtain an edged image. The approach is implemented using MATLAB 7.11 (R2010b). For qualitative and quantitative comparison, BSD (Berkeley Segmentation Database) images are used for experimentation. The performance parameters used are PSNR (dB) and the performance ratio (PR) of true to false edges. It is shown that the proposed approach performs better than Canny's edge detection algorithm under almost all scenarios, and reduces false edge detection and double edges.
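The pipeline "map pixels to fuzzy memberships, then run an edge operator over the membership image" can be sketched as follows. The membership function here is a plain min-max normalization and the edge operator a gradient-magnitude threshold; both are simplifying assumptions, since the paper does not pin down its exact membership function in this abstract.

```python
import numpy as np

def fuzzy_membership(img):
    """Map gray levels to membership values in [0, 1]
    (min-max normalization stands in for the paper's membership function)."""
    img = img.astype(float)
    return (img - img.min()) / (img.max() - img.min())

def gradient_edges(mu, thresh=0.2):
    """Gradient magnitude on the membership image, thresholded
    to keep only strong discontinuities."""
    gy, gx = np.gradient(mu)
    return np.hypot(gx, gy) > thresh

# A vertical step edge is detected only along the step.
img = np.zeros((8, 8), dtype=np.uint8)
img[:, 4:] = 255
edges = gradient_edges(fuzzy_membership(img))
print(edges.sum())  # 16
```

Working on memberships rather than raw intensities makes the threshold scale-free: the same `thresh` applies regardless of the image's original dynamic range.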
ADOPTING AND IMPLEMENTATION OF SELF ORGANIZING FEATURE MAP FOR IMAGE FUSION (ijistjournal)
A different image fusion algorithm based on self-organizing feature maps is proposed in this paper, aiming to produce quality images. Image fusion integrates complementary and redundant information from multiple images of the same scene to create a single composite image that contains all the important features of the originals. The resulting fused image is thus more suitable for human and machine perception and for further image processing tasks. Existing fusion techniques based on direct operation on either pixels or segments fail to produce fused images of the required quality and are mostly application specific, and existing segmentation algorithms become complicated and time consuming when multiple images are to be fused. A new method of segmenting and fusing gray scale images using Self-Organizing Feature Maps (SOM) is proposed in this paper. The SOM is used to produce multiple slices of the source and reference images based on various combinations of gray scale, which can be fused dynamically depending on the application. The proposed technique is applied and analyzed for the fusion of multiple images. It is robust in the sense that no information is lost, owing to the properties of Self-Organizing Feature Maps; noise removal in the source images is done during the processing stage, and the fusion of multiple images is performed dynamically to get the desired results. Experimental results demonstrate that, for quality multifocus image fusion, the proposed method performs better than some popular image fusion methods in both subjective and objective quality.
An Analysis and Comparison of Quality Index Using Clustering Techniques for S... (CSCJournals)
In this paper, the proposed approach consists of three main steps: preprocessing, gridding and segmentation of microarray images. Initially, the microarray image is preprocessed using filtering and morphological operators, and is then gridded to fit a grid on the image using a hill-climbing algorithm. Subsequently, segmentation is carried out using fuzzy c-means clustering. The enhanced fuzzy c-means clustering algorithm (EFCMC) is implemented to cluster the image effectively whether or not it is affected by noise. The EFCM method was then applied to real and noisy microarray images in order to investigate the efficiency of the segmentation. Finally, the segmentation efficiency of the proposed approach was compared with various algorithms in terms of quality index, and the obtained results show that the proposed algorithm performs better in terms of quality index than the other algorithms.
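The fuzzy c-means step can be sketched on 1-D intensities: alternate between computing weighted cluster centers and updating memberships until convergence. This is the classic FCM, not the enhanced EFCM variant the paper proposes, and the data is synthetic.

```python
import numpy as np

def fuzzy_cmeans(x, c=2, m=2.0, iters=50, seed=0):
    """Minimal fuzzy c-means on a 1-D intensity array.
    m > 1 is the fuzzifier; returns (centers, membership matrix)."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)          # memberships sum to 1 per point
    for _ in range(iters):
        um = u ** m
        centers = (um.T @ x) / um.sum(axis=0)  # membership-weighted means
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        inv = d ** (-2.0 / (m - 1))            # standard FCM update
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u

# Two well-separated intensity groups yield centers near 0 and 10.
x = np.concatenate([np.zeros(50), np.full(50, 10.0)])
centers, u = fuzzy_cmeans(x)
print(np.round(np.sort(centers), 2))
```

For segmentation, each pixel would be assigned to the cluster where its membership is highest; the soft memberships are what lets FCM tolerate the noise discussed above better than hard k-means.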
Invariant Recognition of Rectangular Biscuits with Fuzzy Moment Descriptors, ... (CSCJournals)
In this paper a new approach for invariant recognition of broken rectangular biscuits is proposed using fuzzy membership-distance products, called fuzzy moment descriptors. The existing methods for recognition of flawed rectangular biscuits are mostly based on the Hough transform; however, these methods are prone to error due to noise and/or variation in illumination. Fuzzy moment descriptors are less sensitive to noise, making the approach invariant to these stray external disturbances. Further, the normalization and sorting of the moment vectors make it a size- and rotation-invariant recognition process. In earlier studies fuzzy moment descriptors have been successfully applied to image matching problems. In this paper the algorithm is applied to the recognition of flawed and non-flawed rectangular biscuits. The proposed algorithm has potential applications in industrial quality control.
Image enhancement is one of the challenging issues in image processing. Its objective is to process an image so that the result is more suitable than the original for a specific application. Digital image enhancement techniques provide many choices for improving the visual quality of images, and the appropriate choice of technique is very important. Image enhancement plays a fundamental role in vision applications, and many techniques have been proposed for enhancing digital images. This paper provides a survey and analysis of the techniques commonly used for image enhancement.
Statistical Feature based Blind Classifier for JPEG Image Splice Detection (rahulmonikasharma)
Digital imaging, image forgery and its forensics have become an established field of research nowadays. Digital imaging is used to enhance and restore images to make them more meaningful, while image forgery is done to produce fake facts by tampering with images. Digital forensics is then required to examine the questioned images and classify them as authentic or tampered. This paper aims to design and implement a blind classifier to classify original and spliced Joint Photographic Experts Group (JPEG) images. The classifier is based on statistical features obtained by exploiting image compression artifacts, which are extracted as a Blocking Artifact Characteristics Matrix. The experimental results show that the proposed classifier outperforms the existing one, with improved accuracy and area under the curve when classifying images. It supports .bmp and .tiff file formats and is fairly robust to noise.
Analysis of Multi-focus Image Fusion Method Based on Laplacian Pyramid (Rajyalakshmi Reddy)
This paper presents a simple and efficient algorithm for multi-focus image fusion, which uses a multiresolution signal decomposition scheme called the Laplacian pyramid method. The principle of the Laplacian pyramid transform is introduced, and the fusion strategy based on it is described in detail. Analysis of the experimental results shows that this method performs well, and that the quality of the fused image is better than the results of other methods.
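A minimal Laplacian-pyramid fusion can be sketched in NumPy: decompose each image into band-pass detail levels plus a low-pass residual, keep the detail with the larger magnitude at each level (the in-focus image contributes more detail), average the residuals, and reconstruct. The down/up operators here are simple block averaging and nearest-neighbor replication rather than the Gaussian filtering the paper would use, and the fusion rule is an assumption.

```python
import numpy as np

def down(img):
    """Downsample by 2 with 2x2 block averaging (stand-in for Gaussian blur)."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def up(img):
    """Upsample by 2 with nearest-neighbor replication."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """List of band-pass detail images plus a final low-pass residual."""
    pyr = []
    for _ in range(levels):
        small = down(img)
        pyr.append(img - up(small))   # detail lost by downsampling
        img = small
    pyr.append(img)                   # low-pass residual
    return pyr

def fuse(a, b, levels=2):
    pa, pb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    # Per level, keep the stronger detail; average the low-pass residuals.
    fused = [np.where(np.abs(la) >= np.abs(lb), la, lb)
             for la, lb in zip(pa[:-1], pb[:-1])]
    fused.append((pa[-1] + pb[-1]) / 2)
    # Reconstruct coarse to fine by upsampling and adding details back.
    img = fused[-1]
    for detail in reversed(fused[:-1]):
        img = up(img) + detail
    return img

a = np.zeros((8, 8)); a[:, :4] = 1.0   # detail concentrated on the left
b = np.zeros((8, 8)); b[:, 4:] = 1.0   # detail concentrated on the right
out = fuse(a, b)
print(out.shape)  # (8, 8)
```

The decomposition is exactly invertible (each level stores what the downsample discarded), so fusing an image with itself reconstructs it perfectly; the max-magnitude rule is what transfers the in-focus regions of each source into the result.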
Laplacian pyramid method. The principle of Laplacian
pyramid transform is introduced, and based on it the
fusion strategy is described in detail. By analyzing the
experimental results, it showed that this method has
good performance, and the quality of the fused image is
better than the results of other methods
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
An interactive image segmentation using multiple user inputªseSAT Journals
Abstract In this paper, we consider the Interactive image segmentation with multiple user inputs. The proposed system is the use of multiple intuitive user inputs to better reflect the user’s intention. The use of multiple types of intuitive inputs provides the user’s intention under different scenario. The proposed method is developed as a combined segmentation and editing tool. It incorporates a simple user interface and a fast and reliable segmentation based on 1D segment matching. The user is required to click just a few "control points" on the desired object border, and let the algorithm complete the rest. The user can then edit the result by adding, removing and moving control points, where each interaction follows by an automatic, real-time segmentation by the algorithm. Interactive image segmentation involves a proposed algorithm, Constrained Random walks algorithm. The Constrained Random Walks algorithm facilitates the use of three types of user inputs. 1. Foreground and Background seed input 2. Soft Constraint input 3. Hard Constraint input. The effectiveness of the proposed method is validated by experimental results. The proposed algorithm is algorithmically simple, efficient and less time consuming. Keywords: Interactive image segmentation, Interactive image segmentation, digital image editing, multiple user inputs, random walks algorithm.
An algorithm to quantify the swelling by reconstructing 3D model of the face with stereo images is presented. We
analyzed the primary problems in computational stereo, which include correspondence and depth calculation. Work has been carried out to determine suitable methods for depth estimation and standardizing volume estimations. Finally we designed software for reconstructing 3D images from 2D stereo images, which was built on Matlab and Visual C++. Utilizing
techniques from multi-view geometry, a 3D model of the face was constructed and refined. An explicit analysis of the stereo
disparity calculation methods and filter elimination disparity estimation for increasing reliability of the disparity map was
used. Minimizing variability in position by using more precise positioning techniques and resources will increase the accuracy of this technique and is a focus for future work
Implementation of Object Tracking for Real Time VideoIDES Editor
Real-time tracking of object boundaries is an
important task in many vision applications. Here we propose
an approach to implement the level set method. This approach
does not need to solve any partial differential equations (PDFs),
thus reducing the computation dramatically compared with
optimized narrow band techniques proposed before. With our
approach, real-time level-set based video tracking can be
achieved.
The efficiency and quality of a feature descriptor are critical to the user experience of many computer vision applications. However, the existing descriptors are either too computationally expensive to achieve real-time performance, or not sufficiently distinctive to identify correct matches from a large database with various transformations. In this paper, we propose a highly efficient and distinctive binary descriptor, called local difference binary (LDB). LDB directly computes a binary string for an image patch using simple intensity and gradient difference tests on pair wise grid cells within the patch. A multiple-gridding strategy and a salient bit-selection method are applied to capture the distinct patterns of the patch at different spatial granularities. Experimental results demonstrate that compared to the existing state-of-the-art binary descriptors, primarily designed for speed, LDB has similar construction efficiency, while achieving a greater accuracy and faster speed for mobile object recognition and tracking tasks.
Super-resolution (SR) is the process of obtaining a high resolution (HR) image or
a sequence of HR images from a set of low resolution (LR) observations. The block
matching algorithms used for motion estimation to obtain motion vectors between the
frames in Super-resolution. The implementation and comparison of two different types of
block matching algorithms viz. Exhaustive Search (ES) and Spiral Search (SS) are
discussed. Advantages of each algorithm are given in terms of motion estimation
computational complexity and Peak Signal to Noise Ratio (PSNR). The Spiral Search
algorithm achieves PSNR close to that of Exhaustive Search at less computation time than
that of Exhaustive Search. The algorithms that are evaluated in this paper are widely used
in video super-resolution and also have been used in implementing various video standards
like H.263, MPEG4, H.264.
Semi-Supervised Method of Multiple Object Segmentation with a Region Labeling...sipij
Efficient and efficient multiple object segmentation is an important task in computer vision and object recognition. In this work; we address a method to effectively discover a user’s concept when multiple objects of interest are involved in content based image retrieval. The proposed method incorporate a framework for multiple object retrieval using semi-supervised method of similar region merging and flood fill which models the spatial and appearance relations among image pixels. To improve the effectiveness of similarity based region merging we propose a new similarity based object retrieval. The users only need to roughly indicate the after which steps desired objects contour is obtained during the automatic merging of similar regions. A novel similarity based region merging mechanism is proposed to guide the merging process with the help of mean shift technique and objects detection using region labeling and flood fill. A region R is merged with its adjacent regions Q if Q has highest similarity with Q (using Bhattacharyya descriptor) among all Q’s adjacent regions. The proposed method automatically merges the regions that are initially segmented through mean shift technique, and then effectively extracts the object contour by merging all similar regions. Extensive experiments are performed on 12 object classes (224 images total) show promising results.
Integration of poses to enhance the shape of the object tracking from a singl...eSAT Journals
Abstract In computer vision, tracking human pose has received a growing attention in recent years. The existing methods used multi-view videos and camera calibrations to enhance the shape of the object in 3D view. In this paper, tracking and partial reconstruction of the shape of the object from a single view video is identified. The goal of the proposed integrated method is to detect the movement of a person more accurately in 2D view. The integrated method is a combination of Silhouette based pose estimation and Scene flow based pose estimation. The silhouette based pose estimation is used to enhance the shape of the object for 3D reconstruction and scene flow based pose estimation is used to capture the size as well as the stability of the object. By integrating these two poses, the accurate shape of the object has been calculated from a single view video. Keywords: Pose Estimation, optical Flow, Silhouette, Object Reconstruction, 3D Objects
A REPORT
ON
Realtime 3D Segmentation
By
Gunjan Kumar Singh 2012B5A7521P M.Sc. (Hons.) in Physics and B.E. (Hons.) in Computer Science
Saurabh Bhardwaj 2012B5A7848P M.Sc. (Hons.) in Physics and B.E. (Hons.) in Computer Science
Divya Sanghi 2012B4A7958H M.Sc. (Hons.) in Mathematics and B.E. (Hons.) in Computer Science
Prepared in partial fulfilment of
Practice School-I
(BITS F221)
Under the guidance of
DR. JAGDISH RAHEJA
PRINCIPAL SCIENTIST, DIGITAL SYSTEMS GROUP
AT
CSIR-Central Electronics Engineering Research Institute (CEERI)
Pilani-333031
A Practice School-I station of
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
23rd May - 17th July, 2014
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE
PILANI (RAJASTHAN)
Practice School Division
Station: CENTRAL ELECTRONICS ENGINEERING RESEARCH INSTITUTE (CEERI)  Centre: Pilani
Duration: From 23rd May, 2014 to 17th July, 2014
Date of Submission: 15th July, 2014
Title of the Project: REALTIME 3D SEGMENTATION
Name of the Student / ID No. / Discipline:
Gunjan Kumar Singh / 2012B5A7521P / M.Sc. (Hons.) in Physics and B.E. (Hons.) in Computer Science
Saurabh Bhardwaj / 2012B5A7848P / M.Sc. (Hons.) in Physics and B.E. (Hons.) in Computer Science
Divya Sanghi / 2012B4A7958H / M.Sc. (Hons.) in Mathematics and B.E. (Hons.) in Computer Science
Name of the Expert: Dr. Jagdish Raheja  Designation: Principal Scientist
Name of the PS Faculty: Mr. Parikshit Kishor Singh
Key words: Object segmentation, occlusion check, graph, adjacency matrix
Project Area: 3D Image Processing
Abstract: A real-time algorithm that segments unstructured and highly cluttered scenes is discussed in this paper. The algorithm robustly separates objects of unknown shape in congested scenes of stacked and occluded objects. The model-free approach finds smooth surface patches, using a depth image from a Kinect camera, which are subsequently combined to form highly probable object hypotheses. Co-planarity and curvature matching are used to recombine surfaces separated by occlusion. The real-time capabilities are proven and the quality of the algorithm is evaluated on a benchmark database. Advantages compared to existing approaches as well as weaknesses are discussed.
Date: 15th July, 2014
Acknowledgement
Firstly, we are very grateful to the Practice School Division (PSD), BITS Pilani, for providing us an opportunity to pursue our Practice School-1 (PS-1) under the guidance of eminent scientists at the Central Electronics Engineering Research Institute (CEERI), Pilani.
We would like to express our sincere thanks to Dr. Chandrashekhar, Director, CEERI, Pilani, for giving us the opportunity to carry out a project in this esteemed organization. We would also like to thank Dr. J.L. Raheja, our Project Guide, for suggesting the project and providing us valuable guidance and support throughout our work. We would like to extend our gratitude to Ms. Zeba and all others who were directly or indirectly related to this project.
We are grateful to Mr. Vinod Verma for helping us with our daily attendance and supporting us throughout our tenure at CEERI. We would also like to thank our PS-1 instructor, Mr. Parikshit Kishor Singh, for being a constant source of guidance and motivation for us.
Introduction
In computer vision, image segmentation is the process of partitioning a digital image into
multiple segments (sets of pixels, also known as super-pixels). The goal
of segmentation is to simplify and/or change the representation of an image into
something that is more meaningful and easier to analyze.
In the present work the model-free and real-time capable segmentation approach
presented in the previous work of the authors is extended to a general probabilistic
framework, which considers multimodal cues in a uniform manner. The algorithm
combines two segmentation methods: the identification of smooth object surfaces and the
composition of these surfaces into sensible object hypotheses.
In this work, region growing is replaced by connected component analysis and motion
sensitive temporal smoothing is implemented to avoid the motion blur effect.
While the high-level segmentation extracted support planes and decomposed the remaining blobs using binary space partitioning, the second contribution introduced the idea of composing cut-free neighboring surfaces. In the current work, a graph cut is applied to a probabilistically weighted similarity graph considering adjacency, curvature and co-planarity of the found surface patches, enabling the method to handle occluded and open curved objects.
Additionally, the algorithms are further optimized for real-time challenges. The main advantage of the method, in contrast to existing ones, is the capability to segment unknown, stacked, nearby, and occluded objects in a model-free manner. Naturally, this approach has its limitations compared to model-based approaches, especially if very complex object heaps are to be considered. However, it provides a meaningful initial object hypothesis in arbitrary situations, which can be refined by active exploration or used as input to model-based adaptive methods.
The probabilistic nature of the method allows these refinement methods to be focused on selectively disambiguating uncertain object hypotheses. The algorithm operates in real-time, facilitating interactive usage in human-robot cooperation tasks.
Pre-Segmentation
The objective of the first processing step is to segment the depth image into regions of
(smoothly curved) surfaces, continuously enclosed by sharp object edges. We deal with
depth images as they possess low noise levels. Additionally, the raw depth image is
transformed into a 3D point cloud, which is represented w.r.t. a robot-defined coordinate
frame.
a) Median Filter: The median filter is the first step in implementing this work; it is used to remove noise from the image. A mask of N × N dimension is constructed, where N is an odd number; generally a 3 × 3 mask is preferred. The mask consists of all 8 neighbors of the concerned pixel, including the pixel itself. All the pixel values in the mask are then sorted, and the concerned pixel is replaced with the median value of the mask. The median filter is preferred over the box filter because it replaces the pixel value with the value of one of its neighbors, while a box filter may produce a value that appears nowhere in the entire image.
b) Determination of Surface Normals and Temporal Smoothing: As a basis for computing "surface normal edges", surface normals for every image point are determined from the plane spanned by three points in the 3 × 3 neighborhood of the considered central image point, using the cross product.
The determination of surface normals is directly performed on the raw depth image,
instead of the 3D point cloud. That is, the 2D image coordinates are augmented by the
depth value to yield valid three-dimensional vectors. This procedure yields much more
distinct changes of the normal direction at the boundary of objects, because the
smoothing effect due to 3D projection is avoided.
In order to reduce sensor noise and to obtain smooth and stable surface normal
estimations, a three stage smoothing procedure is applied.
First, a 3 × 3 median filter is applied to the raw depth image, as described earlier. Secondly, motion-sensitive temporal smoothing is used, averaging the depth values of each individual image pixel over the last n = 6 frames if the difference of the depth values is smaller than d = 10. The normals are calculated by taking the coordinates of the concerned pixel and any two of its neighbor pixels, and computing the cross product of the two resulting vectors. Finally, the calculated normals are smoothed by applying a convolution with a 5 × 5 Gaussian kernel.
c) Detection of Surface Normal Edges: The next step is the fast detection of surface
normal edges, which is based on the computation of the scalar product of adjacent surface
normals. To obtain clear, uninterrupted edges, suitable for subsequent application of a
region growing algorithm, we look for edges in all eight directions defined by the
neighboring pixels of a point, i.e. north (N), east (E), south (S), west (W), as well as NE,
SE, SW, NW.
The final result of the edge filter is obtained from averaging the results of all eight scalar
products. While large values, close to one, correspond to flat surfaces, smaller values
indicate increasingly sharp object edges.
Finally, binarizing the obtained edge image by employing a threshold value θmax = 0.85 (31.8°), we can easily separate edges from smoothly curved surfaces.
Object edges are clearly visible as bold lines, as shown in the figure below: on the left the actual depth map is shown, while the right picture shows the detected edges. Smooth and large surfaces form homogeneous white regions. Some false edges may still be detected due to noise; however, those regions are small and disjointed and thus can be easily filtered out in subsequent processing steps.
d) Segmentation into Surface Patches: Finally, a fast connected component analysis algorithm is applied. The fast surface patch segmentation based on normal edges already provides a detailed segmentation of the scene into surface patches, which is then employed in the subsequent object segmentation step. The algorithm consists of two iterations of processing, which are explained with the help of the flow charts below.
1st Iteration
- Scan the image pixel by pixel; only pixels that are not edge or background points receive a label.
- For each such pixel, check its top and left neighbors:
  - If both neighbors are boundary or background points, assign a new label.
  - If one neighbor is a boundary or background point, assign the label of the other neighbor.
  - If neither neighbor is a boundary or background point, assign the label of the neighbor with the smaller value, and record the larger label as a child of the smaller label.
2nd Iteration
Both iterations are also explained below with the help of a sample image. Suppose we have an image as shown to the right, in which the black pixels represent the boundary. First of all we visit the first pixel; after making sure that it is not a boundary pixel, we assign the first label to it. Moving to the next pixel, we check its left neighbor (as it doesn't have any top neighbor). Since the neighbor is not a boundary pixel, we give it the same label as its neighbor. After that we encounter a boundary, and after a boundary we need to generate a new label. So a new label is generated and assigned to the pixel, as shown in the image below (left). Proceeding similarly, we assign labels to all the non-boundary pixels in the 1st row.
- Scan the image pixel by pixel.
- If the pixel is labelled, look up the label's parent.
- Assign the parent's label in place of the child's.
Similarly we mark pixels in the 2nd row as well, comparing the current pixel with its top and left neighbors. In the third row we encounter a pixel (just below label 2) whose top neighbor has label 2 while the left neighbor has label 1, so the lower value is assigned to it (upper right figure). We proceed similarly, and at the end of the 1st iteration we get an image like the one shown below.
The 2nd iteration is all about merging the different labels that make up the same surface: in the image shown above, labels 1 and 2 represent the same plane, so they must be represented by a single label. In this iteration we merge such labels into the minimum of the two values. We first find a pixel where the top or right pixel has a different label than the current pixel; then we replace all the pixels of that region with the minimum of the two labels. The lower left image shows the image after merging the pixels with labels 1 and 2, while the lower right image shows the resulting image after completion of merging.
High-Level Object Segmentation
In the second processing block, the aim is segmentation on an object level, which means
that the previously found surface patches need to be combined to form proper object
regions. A weighted graph is created, modeling the probability of two surface patches
belonging to the same object region.
Subsequently this graph structure is analyzed to find the most probable segmentation into object regions using a graph cut algorithm. Co-planarity and curvature cues are also employed to successfully combine object patches which are separated due to occlusion.
a) Adjacency Matrix and Assignment of Edge Points:
An initial adjacency matrix representing the basic connectivity of surface patches is determined as follows: for every edge point pr, all neighboring surface points pi within a radius r in image space are considered which have a Euclidean distance ||pr − pi|| smaller than a threshold dmax. All possible surface pairs obtained from this list are marked as adjacent.
For example, in the given figure, surfaces 2 and 10 are neighbors in image space but not in 3D space, and are therefore not considered adjacent. Faces 7, 9 and 11 fulfill the conditions and become connected in the graph.
b) Cutfree Neighbors: To further improve the adjacency matrix, a plausible heuristic
check is applied. The central idea of this heuristic approach is that visible surface patches
are part of an object’s outer hull, such that points belonging to this object should either lie
on the one or the other side of the associated plane. If we conversely find enough points
on both sides of the plane, we assume two (or more) separated objects and split the blob
into two blobs for further processing.
One surface is said to cut the other if a considerable number of points lie on both sides of the plane fitted to the former surface. For illustration, consider surfaces 4 and 12 in the figure above: while all points of face 4 lie on top of the supporting face 12, the plane fitted into surface 4 cuts surface 12. Hence this surface combination is disregarded. On the other hand, surfaces 7, 9 and 11 are all pairwise cut-free.
c) Improving the Adjacency Matrix: In case of occlusion, a single face of an object is separated into two parts, which will not have a link in the adjacency matrix. To overcome this limitation, further links are added to the matrix based on additional cues, namely co-planarity of flat faces and similar curvature of curved surfaces.
d) Co-Planarity: To check for co-planarity of two flat surfaces we proceed in two steps:
- Do both surfaces have similar mean normals (up to a small noise margin)? Because the normal of the spanned plane may crucially depend on the actual selection of points, this criterion is checked for a set of 50 randomly selected triples of points. If any of the calculated normals deviates too much, the two surfaces are not considered coplanar.
- Are the faces aligned, i.e. do they indeed span a common plane? In this case, any plane spanned by three points from both surfaces should have a normal similar to the two original mean normals.
Then, the occlusion check described above is carried out along several lines connecting two randomly selected points from both surfaces. If this check is passed as well, a corresponding link in the connectivity matrix is added for the given pair of surfaces.
This figure shows the resulting graphs before and after the co-planarity extension. While the first graph results in four final objects, the second graph correctly results in three objects.
e) Curvature Matching: In order to handle curved surface patches in a similar fashion, their curvatures are compared. To this end, a curvature histogram is computed for every curved surface, representing the distribution of surface normals within the surface. The 2D histogram of 11 × 11 bins describes the relative frequency of observing surface normals with given x and y components. The associated frequency of z components is determined by the fact that normals are normalized to magnitude one.
The distance of two histograms is estimated by the mutual overlap of their distributions:
D(A,B) = Σij |aij − bij|
Exploiting the normalisation of the histograms and the identity
min(a,b) = ½((a + b) − |a − b|)
we compute the similarity index:
S(A,B) = 1 − ½·D(A,B) = Σij min(aij, bij)
This is a score between 0 and 1. Surface pairs with a score larger than h = 0.5 are considered for recombination.
We differentiate between open curved objects and occluded curved objects. To recombine the inner and outer surfaces of an open object (like a cup or bowl), two conditions must be fulfilled: (1) both surfaces are neighboring in image space, and (2) the surfaces are concave and convex, respectively. The first condition reflects the fact that the calculation of the initial adjacency matrix is restricted to neighboring surfaces in Euclidean space.
To assess the convexity/concavity of a surface, we again consider the curvature histogram, namely the two extremal bins hmin and hmax along the major axis of the histogram blob. Back-projecting these bins into image space, we obtain the point sets Pmin and Pmax whose normals are mapped onto the corresponding bins, and compute the mean image coordinates pmin and pmax of these point sets. Accordingly, convexity is assessed by considering the scalar product
(pmax − pmin) · (hmax − hmin)
between the directional vectors formed by the extremal points in image vs. histogram coordinates. If this value is positive, i.e. both vectors point in a similar direction, the surface is convex; otherwise it is concave. This is also shown in the figure to the right.
The picture below shows the actual image, the depth map and the histograms of the objects in the image. The left two histograms belong to the lying cylinder (one for each occluded part), the third belongs to the vertical bottle, and the last one to the ball.
f) Probabilistic Object Composition (Graph Cut): The result of the previous steps is an
adjacency matrix representing a graph with edges for all possible surface combinations
arising from cut free neighborhood, co-planarity and curvature matching. This graph is
turned into a weighted graph, such that edge weights represent the strength of
connectivity between two connected nodes.
To determine the connectivity weights, initially a common weight wij = 1/n is assigned to all edges (i, j) originating from node i, where n denotes the number of nodes adjacent to node i. This results in a directed graph, where all outgoing edges of a node have the same weight and thus the same probability for composition with this node. To create an undirected graph, we average the weights of incoming and outgoing edges:
Wsym = ½ (Win + Wout)
The higher the connectivity of two nodes, the higher their connecting weight. Exploiting the weighted graph, we set a threshold θc = 0.5 and then apply a graph cut algorithm. Starting with individual nodes, the algorithm calculates all connected subgraphs in ascending size and their corresponding cuts. A cut is the set of all edges outgoing from the subgraph, and the associated cost is the sum of the corresponding edge weights. If the cost is smaller than θc, a cut is found and the subgraph is extracted as a single object. If the subgraph exceeds n/2 in size, the algorithm aborts, because all potential cuts have been considered. The figure on the previous page shows a cut of an edge with probability 0.29; this cut was made as its cost was less than 0.5 (consistent with our threshold value).
This threshold balances between under- and over-segmentation: a very small value, close to zero, generates a single segment for every initially connected subgraph, while a very high value generates an individual segment for every surface node. The chosen value creates subgraphs which represent different objects.
g) Remaining Edge Points: In the final processing step, all remaining edge points have
to be processed to obtain the final segmentation result.
Firstly, the remaining points are segmented using a region growing algorithm working in the image plane and using the Euclidean distance as the criterion of uniformity. These segments are then processed according to the following rules:
- If a segment has no neighboring faces (caused by missing depth information), it becomes a separate object.
- If a segment has one neighboring face and comprises only very few points, they are assigned to this neighbor.
- If a segment is completely enclosed by a single neighboring face, it becomes a new object. If it is not completely enclosed, all points are assigned to the single neighboring region.
- If a segment has more than one neighbor and all neighbors are part of a common object, it is assigned to this object.
- If a segment has more than one neighbor corresponding to different objects, all points are assigned to the best matching neighboring plane using RANSAC.
Code Explanation
Structures:
vctr: Holds the normalized surface normal calculated at each pixel. It contains the x, y and z components in float data type.
coordinate: A structure to hold the coordinates of different pixels.
list: A node with fields label, node_no and a pointer next. label holds the assigned label value, node_no counts the number of nodes (acting as a counter), and next points to the next node.
arr: Has two fields, value and no.; value contains an integer label, while no. acts as a counter.
Functions:
vec: Takes as input a pointer to an array of 3 coordinates, then calculates the differences between them (to obtain vectors along the surface). It stores the differences of the x, y and z components in an array and calls the CrossProduct function. It returns the surface normal of type vctr.
CrossProduct: Takes as input 2 arrays containing the x, y and z components of 2 vectors and calculates the cross product to return the surface normal.
edge: Takes as input pointers to 2 vctr and calculates the dot product between them. This is done to check the angle between two surface normals.
create: Creates a list of all the labels that are assigned. It has 2 pointers, head and temp: head points to the first element of the list, while temp points to the most recently added node. It takes as input double pointers to the head and temp pointers, a pointer to an integer label, and an integer n.
When the list is empty, a new node is created; the values of the label and n pointers are assigned to label and node_no respectively, and head and temp both point at this newly created node.
When the list is non-empty, the same thing happens, except that head keeps
pointing at the first node while temp moves to the new last node. temp exists so
that the list does not have to be traversed every time a new element is added. The
function returns a pointer to temp.
replace : It takes as input 2 integer pointers and the list created by the create
function. It compares the two values the pointers point at and replaces the bigger
value with the smaller one in the list's label field.
dist : It calculates the distance between 2 coordinate points.
del : This function takes both the head and temp pointers and deletes the list
created for storing labels, node by node. As the list was created dynamically, it
must be deleted by the programmer at the end of the program; the compiler is not
responsible for this task.
add : The add function takes a pointer to an array, an integer value (the size of
that array) and a pointer to an integer (the label). It adds the label to the array if it
is not already present, to keep a count of the number of different labels in the
image.
mod : This function takes two integer values and one integer pointer. It calculates
the absolute difference between the two integers and stores it at the location
pointed to by the pointer.
thinning : explained in a subsequent section.
thinningIteration : explained in a subsequent section.
Insertionsort : For sorting, which is used to find the median of the values in the
mask of the median filter.
Thinning Algorithm
The method for extracting the skeleton of a picture consists of removing all the contour
points of the picture except those points that belong to the skeleton. In order to preserve
the connectivity of the skeleton, each iteration is divided into two sub-iterations.
In the first sub-iteration, the contour point P1 is deleted from the digital pattern if it
satisfies the following conditions:
(a) 2 ≤ B(P1) ≤ 6
(b) A(P1)= 1
(c) P2*P4*P6 = 0
(d) P4*P6*P8 = 0
where A(P1) is the number of 01 patterns in the ordered set P2, P3, P4, ..., P8, P9 of
the eight neighbors of P1 (Figure 1), and B(P1) is the number of nonzero neighbors of
P1, that is, B(P1) = P2 + P3 + P4 + ... + P8 + P9. If any condition is not satisfied, e.g.
if A(P1) = 2, P1 is not deleted from the picture.
In the second sub-iteration, only conditions (c) and (d) are changed as follows:
(c') P2*P4*P8 = 0
(d') P2*P6*P8 = 0
and the rest remain the same.
By conditions (c) and (d) of the first sub-iteration, only the south-east boundary points
and the north-west corner points which do not belong to an ideal skeleton are removed.
By condition (a), the endpoints of a skeleton line are preserved. Also, condition (b)
prevents the deletion of those points that lie between the endpoints of a skeleton
line, as shown in Figure 5. The iterations continue until no more points can be removed.
0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0
This represents a one-pixel-thick line in a binary image. For an endpoint of the line,
B(P1) = 1, so condition (a) fails and it won't be deleted. For an interior point of the
line, A(P1) = 2, so condition (b) fails and it won't be deleted either. In this way the
single-pixel boundary is preserved.
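The two sub-iterations above can be sketched as follows; this is a straightforward reading of the stated conditions, not the report's exact code:

```cpp
#include <utility>
#include <vector>

using Img = std::vector<std::vector<int>>;

// One Zhang-Suen sub-iteration. iter == 0 applies conditions (c), (d);
// iter == 1 applies (c'), (d'). Returns true if any point was removed.
bool thinningIteration(Img &im, int iter) {
    int H = (int)im.size(), W = (int)im[0].size();
    std::vector<std::pair<int, int>> marked;
    for (int i = 1; i < H - 1; ++i)
        for (int j = 1; j < W - 1; ++j) {
            if (im[i][j] != 1) continue;
            int p2 = im[i-1][j],   p3 = im[i-1][j+1], p4 = im[i][j+1];
            int p5 = im[i+1][j+1], p6 = im[i+1][j],   p7 = im[i+1][j-1];
            int p8 = im[i][j-1],   p9 = im[i-1][j-1];
            // B(P1): number of nonzero neighbors.
            int B = p2 + p3 + p4 + p5 + p6 + p7 + p8 + p9;
            // A(P1): number of 01 patterns in the ordered neighbor set.
            int A = (p2 == 0 && p3 == 1) + (p3 == 0 && p4 == 1) +
                    (p4 == 0 && p5 == 1) + (p5 == 0 && p6 == 1) +
                    (p6 == 0 && p7 == 1) + (p7 == 0 && p8 == 1) +
                    (p8 == 0 && p9 == 1) + (p9 == 0 && p2 == 1);
            int c = (iter == 0) ? p2 * p4 * p6 : p2 * p4 * p8;
            int d = (iter == 0) ? p4 * p6 * p8 : p2 * p6 * p8;
            if (B >= 2 && B <= 6 && A == 1 && c == 0 && d == 0)
                marked.push_back({i, j});
        }
    for (auto &p : marked) im[p.first][p.second] = 0;  // delete in parallel
    return !marked.empty();
}

// Repeat both sub-iterations until no more points can be removed.
void thinning(Img &im) {
    bool changed = true;
    while (changed) {
        bool a = thinningIteration(im, 0);
        bool b = thinningIteration(im, 1);
        changed = a || b;
    }
}
```

Running this on the one-pixel line above leaves it unchanged, as the text argues.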
Main function
In the main function we first load the source image and apply a median filter to it.
An array of integers, window, is created and filled with the 9 values of the 3×3
kernel centered at each pixel. This array is sorted using insertion sort, and the value
at the center of the kernel is then replaced by the fifth value in sorted order, i.e. the
median.
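A minimal version of this filtering step might look like the following (std::sort stands in for the report's insertion sort, and leaving border pixels untouched is an assumption):

```cpp
#include <algorithm>
#include <vector>

// 3x3 median filter on a grayscale image stored row-major. For each
// interior pixel, the nine neighborhood values are sorted and the pixel
// is replaced by the fifth value in sorted order, i.e. the median.
std::vector<int> medianFilter3x3(const std::vector<int> &src, int W, int H) {
    std::vector<int> dst = src;
    for (int y = 1; y < H - 1; ++y)
        for (int x = 1; x < W - 1; ++x) {
            int window[9], k = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    window[k++] = src[(y + dy) * W + (x + dx)];
            std::sort(window, window + 9);
            dst[y * W + x] = window[4];  // median of the nine values
        }
    return dst;
}
```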
Now we loop through the image in a for loop. At each pixel, we store in an array of
coordinate the coordinates of the point and of its right and down neighbors. This
array is then passed to the function vec, which returns the normal. This normal is
stored in an array of vctr. So after looping through the entire image, we have the
unit normal at each point in a 2D array of vctr.
Now we take the dot product of a pixel's normal with those of all 8 neighbors, one
by one, using the edge function, and store the values in an array of floats. Then the
average of the 8 dot products is taken. If the average value is smaller than the
threshold (a smaller dot product means a larger angle), we mark the point as a
boundary point (black) and the remaining points as white, or the other way round
(just a matter of convention).
Hence after looping through the entire image we have the edges in the image. But as
the edges are quite thick, a thinning algorithm is also applied to reduce the thickness
of the edges to one pixel only; it is explained in the preceding section. After
completing all this we get a binary image.
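Put together, the boundary-marking loop could be sketched like this (the function name markEdges and the flat row-major layout are assumptions):

```cpp
#include <vector>

struct vctr { float x, y, z; };

// Mark a pixel as a boundary point (0) when the average dot product
// between its normal and those of its 8 neighbors falls below a
// threshold, i.e. when the surface orientation changes sharply there;
// all other pixels are marked white (255).
std::vector<unsigned char> markEdges(const std::vector<vctr> &n,
                                     int W, int H, float thresh) {
    std::vector<unsigned char> out(W * H, 255);
    for (int y = 1; y < H - 1; ++y)
        for (int x = 1; x < W - 1; ++x) {
            const vctr &c = n[y * W + x];
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0) continue;
                    const vctr &q = n[(y + dy) * W + (x + dx)];
                    sum += c.x * q.x + c.y * q.y + c.z * q.z;
                }
            if (sum / 8.0f < thresh)  // small dot product => large angle
                out[y * W + x] = 0;   // boundary point (black)
        }
    return out;
}
```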
Our next step is to identify the set of pixels in a connected region as a single patch.
For this we define an array of integer pointers having exactly the same size as the
original image. The algorithm for assigning labels to the non-boundary pixels has
already been discussed.
To label each pixel with a value, we follow the Connected Component labelling code
explained in the first part. It proceeds as follows:
If the pixel is a boundary or background point, continue the loop without doing
anything else, and store in the patch matrix a pointer to an integer whose value is
255.
Else, at x=y=0, assign a new label and create a node that stores this label value.
Also, a pointer to this value is stored in the patch matrix.
In the first row or column, only one of the left or top neighbors is available. In
that case, if the available neighbor is a boundary point, increment the value of
label, create a new node and assign it; also store the pointer in the patch matrix.
If either the top or left neighbor of the concerned pixel is a boundary point, assign
the label of the other neighbor and store a pointer to it in the patch matrix.
If both neighbors are boundary points, increment the value of label and assign it to
the pixel.
If both neighbors are non-boundary points, then assign the smaller label of the two
and record that the bigger value is a child of the smaller value.
So in the main function, as soon as we assign a new label, we call the function
create, which adds a node to the current linked list if a list already exists and
creates the first node if no list exists at the time the label is assigned. Then we make
the corresponding pointers point to the label stored in that list. As further labels are
assigned, the list grows in size, and when the last pixel is processed we get a
complete list holding all the assigned label values.
For merging the labels such that regions which are not completely disjoint have the same
label, we follow this approach (in the patch matrix):
If the pixel is in the first row or column, has both top and left neighbors as
boundary, or is itself a boundary, continue the loop. This is because in the first row
or column the pixel will have either a new value or the same value as its available
neighbor; in either case it won't be directly involved in a change of values or a call
to replace. It would only be involved as a neighbor.
Else, if the top or left neighbor is a boundary and the value at the pixel is not equal
to the value of the other neighbor, use the replace function to replace the pointer of
the larger value with the smaller value. In this way these components are merged.
Else, if neither neighbor is a boundary, then by the rule of label assignment the
label value has to be equal to the value of either the top or left neighbor. If the
value equals that of the left neighbor, call the replace function on the top neighbor
and the current pixel. In this way these regions are joined.
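The labelling-plus-merging scheme above is essentially two-pass connected component labelling; a compact sketch using an equivalence table in place of the linked list (the union-find helper is an assumption made for brevity) is:

```cpp
#include <algorithm>
#include <vector>

// Two-pass connected component labelling over a binary image, where 0
// marks boundary/background pixels. Provisional labels are taken from
// the top/left neighbors; when both carry different labels, the larger
// one is recorded as equivalent to the smaller (the role of replace()).
std::vector<int> labelComponents(const std::vector<int> &img, int W, int H) {
    std::vector<int> lab(W * H, 0);
    std::vector<int> parent(1, 0);  // parent[l] == l for a root label
    auto find = [&](int l) { while (parent[l] != l) l = parent[l]; return l; };
    int next = 0;
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            int i = y * W + x;
            if (img[i] == 0) continue;               // boundary/background
            int left = (x > 0) ? lab[i - 1] : 0;
            int top  = (y > 0) ? lab[i - W] : 0;
            if (!left && !top) {                     // no labelled neighbor
                parent.push_back(++next);            // assign a new label
                lab[i] = next;
            } else if (left && top) {                // merge the two labels
                int a = find(left), b = find(top);
                int s = std::min(a, b), g = std::max(a, b);
                parent[g] = s;                       // bigger -> smaller
                lab[i] = s;
            } else {
                lab[i] = left ? left : top;          // copy the one label
            }
        }
    // Second pass: resolve every provisional label to its root.
    for (int &l : lab)
        if (l) l = find(l);
    return lab;
}
```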
So by now we have identified different surfaces in the scene. Since objects are
made up of surfaces, we need to combine the surfaces that belong to the same
object. For this we visit each boundary pixel in the patch matrix, consider all
neighbors at a distance of one unit from the concerned pixel (a circle of unit
radius), and then follow this approach:
If the neighborhood contains more than two boundary pixels, we continue
through the loop and skip that pixel, as it lies at the boundary of more than two
surfaces and all these surfaces will be merged in subsequent steps.
If the neighborhood contains at most two boundary pixels, then for each
non-boundary neighbor we call the mod function, which returns the absolute
difference between the depth values of that neighbor and the concerned boundary
pixel. The difference values of all non-boundary neighbors are summed and
averaged.
If the average is less than a threshold (20 in our case), the minimum label value
among all the neighbors is assigned to all the non-boundary pixels.
If the average is more than the specified threshold, the program considers them
surfaces belonging to different objects and hence leaves them as they are.
After this step, the surfaces belonging to the same patch form a single object.
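The depth test in the rules above might be sketched as follows (sameObject and its signature are assumptions made for illustration):

```cpp
#include <cstdlib>

// The report's mod(): absolute difference of two integers, written
// through a pointer.
void mod(int a, int b, int *out) { *out = std::abs(a - b); }

// Decide whether the surfaces around a boundary pixel belong to the
// same object: average the depth differences between the boundary
// pixel and its n non-boundary neighbors and compare to a threshold.
bool sameObject(const int *neighborDepths, int n, int boundaryDepth,
                int thresh) {
    if (n == 0) return false;
    int sum = 0, d = 0;
    for (int i = 0; i < n; ++i) {
        mod(neighborDepths[i], boundaryDepth, &d);
        sum += d;
    }
    return (sum / n) < thresh;  // the report uses a threshold of 20
}
```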
Now we come to the part of 3D projection. Here a right and a left stereo image are
taken. Both images are split into their RGB channels using the split function. Then
another array is created in which the red component from the left image and the
blue and green components from the right image are merged. This creates an
anaglyph which, when viewed through red-cyan glasses, creates the perception of
depth.
Results
This section presents the final results that were obtained from the applied algorithm. First
the input image is shown with five objects in it and then the segmented images for each
object are shown.
But these results represent just the 2D output. So to convert it into 3D output we use
the technique of anaglyph.
Anaglyph
Since we did not have the RGB image of the scene shown above, we show below the
results of our function tested on a different set of images. The left image is shown on
the left, the right image on the right, and the anaglyph at the bottom.
Conclusion
• This report presented an extension of a model-free segmentation algorithm for
cluttered scenes which is not restricted by a given set of object models or world
knowledge.
• A fast algorithm to determine object edges using edge detection on surface normals
was combined with a novel graph-based method to combine surface patches to
form highly probable object hypotheses.
• Coplanarity checks and curvature matching were added to handle occluded and
open curved objects.
• The algorithm can deal with stacked, nearby, and occluded objects, which is
achieved by finding object edges in depth images and the novel idea to identify
adjacent and cut-free surface patches, as well as coplanar surfaces, separated by
occlusion, which can be combined to form object regions.
• The algorithm was evaluated w.r.t. real-time capabilities and segmentation quality.
References
• Real-time 3D Segmentation for Human-Robot Interaction, A. Ückermann, R.
Haschke and H. Ritter
• Real-Time 3D Segmentation of Cluttered Scenes for Robot Grasping, A.
Ückermann, R. Haschke and H. Ritter
• A Fast Parallel Algorithm for Thinning Digital Patterns, T. Y. Zhang and C. Y.
Suen
• http://www.aishack.in/2010/03/labelling-connected-components-example
• Digital Image Processing, 1st edition, S. Sridhar
• Digital Image Processing, 2nd edition, R. C. Gonzalez and R. E. Woods
• Object Oriented Programming with C++, 6th edition, E. Balagurusamy
• Data Structures with C, 2nd edition, Yashwant Kanetkar
• Learning OpenCV, G. Bradski and A. Kaehler