Seminar Report (Final)

SEMINAR REPORT
ON
Closest pair: Using Divide and Conquer
SESSION 2014-2015
DEPARTMENT OF
Computer Science and Engineering
SILIGURI INSTITUTE OF TECHNOLOGY
(AFFILIATED TO WBUT)
SUBMITTED BY:
ARUNEEL DAS
Roll No: 119002075
Year: 3rd (6th semester)
Under the guidance of:
Mr. KAUSHIK NATH (Assistant Professor)

Preface
This report describes a program I wrote in C. The "closest" program
takes a set of points in two dimensions and finds the distance between the closest
pair of points in the set. The algorithm used in this program is given
in Introduction to Algorithms, by Thomas H. Cormen, Charles E. Leiserson,
Ronald L. Rivest, and Clifford Stein. This report was prepared for a seminar under the guidance of
Mr. KAUSHIK NATH at SILIGURI INSTITUTE OF TECHNOLOGY.

Contents
Seminar report: Closest Pair Algorithm
• Preface
• Acknowledgement
• Description
• Introduction
• History
• Algorithm
• Brute Force Algorithm
• Divide & Conquer Algorithm
• Closest Pair Analysis
• Implementation
• Code: BRUTE FORCE
• Code: DIVIDE & CONQUER
• Results
• Output: BRUTE FORCE
• Output: DIVIDE & CONQUER
• Conclusion
• Bibliography

Description
This program solves the problem of finding the closest pair of points in a set of
points. The set consists of points in $\mathbb{R}^2$, each defined by its x and y coordinates. The
"closest pair" is the pair of points in the set with the smallest Euclidean
distance, where the Euclidean distance between points $p_1 = (x_1, y_1)$ and $p_2 = (x_2, y_2)$ is
$d(p_1, p_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$. If the set contains two identical points, the closest pair
distance is zero. As noted in Introduction to Algorithms,
"this problem has applications in traffic control systems. A system for controlling air
or sea traffic might need to know which are the two closest vehicles in order to
detect potential collisions."

Introduction
The Closest-Pair problem is considered an “easy” Closest-Point problem, in the sense
that a number of other geometric problems (e.g. nearest neighbors and
minimal spanning trees) find the closest pair as part of their solution. This
problem and its generalizations arise in areas such as statistics, pattern recognition
and molecular biology. At present, many algorithms are known for solving the
Closest-Pair problem in any dimension k ≥ 2 with optimal time complexity O(n log n). The
Closest-Pair problem was also one of the first non-trivial computational
problems to be solved efficiently using the divide-and-conquer strategy, and it has
since become a classic.

History
An algorithm with optimal time complexity O(n log n) for solving the Closest-Pair
problem in the planar case appeared for the first time in 1975, in a classic computational
geometry paper by Michael Ian Shamos. This algorithm was based on Voronoi
polygons.
The first optimal algorithm for solving the Closest-Pair problem in any dimension k ≥
2 is due to Jon Bentley and Michael Ian Shamos. Using a divide-and-conquer approach to
first solve the problem in the plane, these authors were able to generalize the
planar procedure to higher dimensions by exploiting a sparsity condition induced over
the set of points in k-dimensional space.
For the planar case, the original procedure and other versions of the divide-and-conquer
algorithm usually perform at least seven pairwise comparisons for each
point in the central slab within the combine step.
In 1998, Zhou, Xiong, and Zhu presented an improved version of the planar
procedure in which at most four pairwise comparisons need to be considered in the
combine step for each point lying on the left side (alternatively, on the right side) of
the central slab. In the same article, Zhou et al. introduced the “complexity of
computing distances”, which measures “the number of Euclidean distances to
compute by a closest-pair algorithm”. The core idea behind this definition is that,
since computing a Euclidean distance is usually more expensive than other basic operations, it
may be possible to achieve significant efficiency improvements by reducing this
complexity measure.
More recently, Ge, Wang, and Zhu used sophisticated
geometric arguments to show that one of the four pairwise comparisons in the
combine step can always be discarded, thus significantly reducing the complexity
of computing distances, and presented an enhanced version of the Closest-Pair
algorithm accordingly.
In 2007, Jiang and Gillespie presented another version of the Closest-Pair divide-and-conquer
algorithm, which reduced the complexity of computing distances by a
logarithmic factor. However, after performing some algorithmic experimentation,
the authors found that, despite this reduction, the new algorithm was “the slowest
among the four algorithms” [7] that were included in the comparative study. The
experimental results also showed that the fastest of the four algorithms was in
fact a procedure named Basic-2, in which two pairwise comparisons are required in the
combine step for each point that lies in the central slab, and which therefore has a relatively
high complexity of computing distances. The authors conclude that the simpler design of the
combine step, and a consequent correct balance in trading expensive operations for cheaper
ones, are the main factors explaining the success of the Basic-2 algorithm.

Algorithm
The most obvious way to compute the closest pair distance of a set of points is to
compute the distance for every pair and keep the smallest distance. This brute force
algorithm runs in $O(n^2)$ time for a set of n points. The divide and conquer
algorithm used here requires only O(n log n) time to compute the same closest pair
distance.
Brute Force Algorithm
A straightforward solution is to check the distances between all pairs and take the
minimum among them. This solution requires $n(n-1)/2$ distance computations and
$n(n-1)/2 - 1$ comparisons. The straightforward solution using induction would proceed by
removing a point, solving the problem for $n - 1$ points, and considering the extra
point. However, if the only information obtained from the solution of the $(n - 1)$-point case
is the minimum distance, then the distances from the additional point to all the other $n - 1$
points must be checked.
As a result, the total number of distance computations $T(n)$ satisfies the recurrence
relation $T(n) = T(n-1) + (n-1)$, where $T(2) = 1$, which solves to $T(n) = n(n-1)/2 = O(n^2)$.
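The recurrence can be checked numerically. The sketch below uses hypothetical function names chosen for illustration: `count_inductive` follows $T(n) = T(n-1) + (n-1)$ with $T(2) = 1$ literally, and `count_closed_form` evaluates $n(n-1)/2$, the number of unordered pairs.

```c
/* Distance computations made by the inductive brute-force scheme:
   solve the problem for n-1 points, then compare the extra point with
   the other n-1 points. Follows T(n) = T(n-1) + (n-1), T(2) = 1. */
unsigned long count_inductive(unsigned long n)
{
    if (n < 2)
        return 0;                 /* fewer than two points: no pairs at all */
    if (n == 2)
        return 1;                 /* T(2) = 1 */
    return count_inductive(n - 1) + (n - 1);
}

/* Closed form: the number of unordered pairs, n(n-1)/2. */
unsigned long count_closed_form(unsigned long n)
{
    return n < 2 ? 0 : n * (n - 1) / 2;
}
```

For n = 10 both functions give 45, which matches the number of times the inner loop of the report's brute-force program computes a distance for its NP = 10 points.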
Algorithm Description of Brute Force Strategy:
The closest pair of points can be computed in $O(n^2)$ time by performing a brute-force
search. To do that, one could compute the distances between all the $n(n-1)/2$ pairs
of points, then pick the pair with the smallest distance, as illustrated below.
minDist = infinity
for i = 1 to length(P) - 1
    for j = i + 1 to length(P)
        let p = P[i], q = P[j]
        if dist(p, q) < minDist:
            minDist = dist(p, q)
            closestPair = (p, q)
return closestPair

bruteForceClosestPair of P(1), P(2), ... P(N)
if N < 2 then
    return ∞
else
    minDistance ← |P(1) - P(2)|
    minPoints ← { P(1), P(2) }
    foreach i ∈ [1, N-1]
        foreach j ∈ [i+1, N]
            if |P(i) - P(j)| < minDistance then
                minDistance ← |P(i) - P(j)|
                minPoints ← { P(i), P(j) }
            endif
        endfor
    endfor
    return minDistance, minPoints
endif
Divide and Conquer Algorithm
This algorithm begins by taking the set of points P and sorting it in two ways. The set X
consists of the points of P sorted by x coordinate; the set Y consists of the points of P
sorted by y coordinate. We use presorting, as described later, to avoid re-sorting X
and Y in each recursive call. The idea of the algorithm is to recursively divide P into
smaller and smaller sets until some base case is reached, compute this base case,
and then combine the solutions. The base case used in my program computes the answer by the
"brute force" method (comparing all pairs) when the set has BASE_CASE_SIZE points or
fewer. When the base case does not apply, "the recursive invocation carries out the
divide-and-conquer paradigm as follows."
• Divide: Divide the set P of points into two smaller sets PL and PR such that all
points in PL are on or to the left of some vertical line l and all points in PR are
on or to the right of l. The array X is divided into the sorted arrays XL and XR,
and Y is divided into the sorted arrays YL and YR, containing the sorted points
of PL and PR respectively. An example divide is shown below:
• Conquer: After the set of points has been divided, the algorithm makes two
recursive calls to find the closest pair of points in PL and PR. The first recursive
call receives PL, XL and YL; the second recursive call receives PR, XR and YR.
The results of the recursive calls are then compared, and the smaller of the two
closest-pair distances is stored as delta. In the PVM (Parallel Virtual Machine) implementation, a
recursive procedure call may be replaced with a process spawn in some cases.
• Combine: The closest pair distance of a given set is often the delta found by
the two recursive calls; however, we must also check
the points that lie near the dividing line l. We leave it to the reader to
verify that we only need to consider points falling in the strip within delta
distance of the dividing line l, as illustrated by the shaded region. The points in
this 2*delta wide strip are stored in an array Y', sorted by y coordinate. For
every point in this array Y', we check the distance to the next seven points in
Y'. Seven points suffice because every point of Y' that is within delta of a point p and above it
lies in a delta × 2*delta rectangle; each delta × delta half of that rectangle holds at most four
points (points on the same side of l are at least delta apart), so the rectangle holds at most
eight points, p included. The smallest distance found in this manner is recorded as delta'. Finally,
if delta' is less than delta, then the strip did contain a pair of points closer than
delta, and the distance delta' is returned instead of delta.

Algorithm Description of Divide and Conquer Strategy:
Divide the set into two equal-sized parts by a vertical line l, and recursively compute the
minimal distance in each part.
a) Let d be the smaller of the two minimal distances. It takes O(1) time.
b) Eliminate points that lie farther than d from l. It takes O(n) time.
c) Sort the remaining points according to their y-coordinates. This step is a sort that
takes O(n log n) time.
d) Scan the remaining points in y order and compute the distances of each point
to its five neighbors. It takes O(n) time.
e) If any of these distances is less than d, then update d. It takes O(1) time.
These steps define the merging process. Because this is a divide and conquer algorithm,
the merging is repeated at each of the log n levels of the recursion.
A sketch of the algorithm based on the recursive divide & conquer approach is given
below.
closestPair of (xP, yP)
        where xP is P(1) .. P(N) sorted by x coordinate, and
              yP is P(1) .. P(N) sorted by y coordinate (ascending order)
if N ≤ 3 then
    return closest points of xP using brute-force algorithm
else
    xL ← points of xP from 1 to ⌈N/2⌉
    xR ← points of xP from ⌈N/2⌉+1 to N
    xm ← xP(⌈N/2⌉)x
    yL ← { p ∈ yP : px ≤ xm }
    yR ← { p ∈ yP : px > xm }
    (dL, pairL) ← closestPair of (xL, yL)
    (dR, pairR) ← closestPair of (xR, yR)
    (dmin, pairMin) ← (dR, pairR)
    if dL < dR then
        (dmin, pairMin) ← (dL, pairL)
    endif
    yS ← { p ∈ yP : |xm - px| < dmin }
    nS ← number of points in yS
    (closest, closestPair) ← (dmin, pairMin)
    for i from 1 to nS - 1
        k ← i + 1
        while k ≤ nS and yS(k)y - yS(i)y < dmin
            if |yS(k) - yS(i)| < closest then
                (closest, closestPair) ← (|yS(k) - yS(i)|, {yS(k), yS(i)})
            endif
            k ← k + 1
        endwhile
    endfor
    return closest, closestPair
endif

Closest Pair Analysis
It takes O(n log n) steps to sort according to the x coordinates, but this is
done only once. We then solve two subproblems of size n/2. Eliminating the points
outside of the strip can be done in O(n) steps. It then takes O(n log n) steps to sort
according to the y coordinates. Finally, it takes O(n) steps to scan the strip and to
compare each point to a constant number of its neighbors in the order.
Overall, to solve a problem of size n, we solve two subproblems of size n/2 and use
O(n log n) steps for combining the solutions (plus O(n log n) steps at the beginning for
sorting the x coordinates). We obtain the following recurrence relation:
$T(n) = 2T(n/2) + O(n \log n)$, $T(2) = 1$.
The solution of this recurrence relation is $T(n) = O(n \log^2 n)$. This is asymptotically
better than a quadratic algorithm, but we still want to do better. So
we now try to find an O(n log n) algorithm.
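Unrolling the recurrence shows where the extra logarithm comes from. This derivation assumes, for simplicity, that n is a power of two and that the combine step costs exactly $c\,n\log_2 n$ for some constant c; the recursion is unrolled down to subproblems of size 2:

$$
\begin{aligned}
T(n) &= 2T(n/2) + c\,n\log_2 n \\
&= \sum_{i=0}^{\log_2 n - 2} 2^i \cdot c\,\frac{n}{2^i}\log_2\frac{n}{2^i} \;+\; \frac{n}{2}\,T(2) \\
&= c\,n \sum_{m=2}^{\log_2 n} m \;+\; \frac{n}{2} \\
&= c\,n\left(\frac{\log_2 n\,(\log_2 n + 1)}{2} - 1\right) + \frac{n}{2} \\
&= O(n \log^2 n).
\end{aligned}
$$

Every level of the recursion pays for a full sort, and the per-level sort cost shrinks only slowly, which produces the squared logarithm.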
The key idea here is to strengthen the induction hypothesis. The reason we have to
spend O(n log n) time in the combining step is the sorting of the y coordinates.
Although we know how to solve the sorting problem directly, doing so takes too
long.
Can we somehow solve the sorting problem at the same time we are solving the
closest-pair problem? In other words, we would like to strengthen the induction
hypothesis for the closest-pair problem to include sorting.
Induction Hypothesis: Given a set of fewer than n points in the plane, we
know how to find the closest distance and how to output the
set sorted according to the points' y coordinates.
We have already seen how to find the minimal distance if the points are sorted in
each step according to their y coordinates. Hence, the only thing that we need to do
to extend this hypothesis is to sort the set of n points when the two subsets (of size
n/2) are already sorted. But this is exactly the merge step of merge sort. The main advantage
of this approach is that we do not have to sort every time we combine the solutions;
we only have to merge. Since merging can be done in O(n) steps, the recurrence
relation becomes $T(n) = 2T(n/2) + O(n)$, where $T(2) = 1$, which implies that
$T(n) = O(n \log n)$.
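The linear-time merge that replaces the per-level sort can be sketched in C as follows. The function name `merge_by_y` is hypothetical; the sketch assumes the two halves a[lo..mid] and a[mid+1..hi] are already sorted by y, which is exactly what the strengthened induction hypothesis provides.

```c
#include <stddef.h>

/* Same layout as the report's point structure. */
typedef struct { double x; double y; } point;

/* Merges the y-sorted runs a[lo..mid] and a[mid+1..hi] into one y-sorted
   run a[lo..hi] in O(hi - lo + 1) time. buf must have at least hi + 1
   entries; only buf[lo..hi] is used as scratch space. */
void merge_by_y(point a[], point buf[], size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)       /* take the smaller y; ties keep left-run order */
        buf[k++] = (a[i].y <= a[j].y) ? a[i++] : a[j++];
    while (i <= mid)                  /* copy whatever remains of the left run */
        buf[k++] = a[i++];
    while (j <= hi)                   /* ... or of the right run */
        buf[k++] = a[j++];
    for (k = lo; k <= hi; k++)        /* copy the merged run back */
        a[k] = buf[k];
}
```

Each point is copied a constant number of times, which is the O(n) combine cost assumed in the recurrence.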
Let T(n) be the time required to solve the problem for n points:
• Divide: O(1)
• Conquer: 2T(n/2)
• Combine: O(n)
The precise form of the recurrence is $T(n) = T(\lceil n/2 \rceil) + T(\lfloor n/2 \rfloor) + O(n)$.
The final recurrence is $T(n) = 2T(n/2) + O(n)$, which solves to $T(n) = O(n \log n)$.
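Assuming, for simplicity, that n is a power of two and that divide plus combine cost exactly cn for some constant c, unrolling the recurrence down to subproblems of size 2 gives:

$$
\begin{aligned}
T(n) &= 2T(n/2) + c\,n \\
&= \sum_{i=0}^{\log_2 n - 2} 2^i \cdot c\,\frac{n}{2^i} \;+\; \frac{n}{2}\,T(2) \\
&= c\,n\,(\log_2 n - 1) + \frac{n}{2} \\
&= O(n \log n).
\end{aligned}
$$

Each level of the recursion does cn work in total, and there are about $\log_2 n$ levels.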

Implementation
Both algorithms have been implemented in C, and the code for each is given below.
CODE: BRUTE FORCE
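//This is a brute force implementation of the closest pair problem.
//The time complexity is O(n^2).
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <assert.h>
#include <time.h>
#include <float.h>
#define MAX DBL_MAX //initial "infinite" distance, larger than any real distance
#define NP 10       //number of points read from InputData.txt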
//structure defined to represent a point with X and Y coordinate.
typedef struct pnt
{
double x;
double y;
} point;
//global declarations: the closest pair found and its distance.
point p1,p2;
double shortestDistance = MAX;
//function to find closest pair by brute force method.
void bruteforce(point Points[ ])
{
    int i,j;
    double d; //must be a double: an int would truncate the distance
    for(i=0; i<NP-1; i++)
    {
        for(j=i+1; j<NP; j++)
        {
            d=sqrt(pow((Points[i].x-Points[j].x),2) + pow((Points[i].y-Points[j].y),2)); //finding Euclidean distance.
            if(d<shortestDistance)
            {
                shortestDistance=d;
                p1=Points[i];
                p2=Points[j];
            }
        }
    }
    printf("\n\nShortest distance: %lf", shortestDistance);
    printf("\n\nShortest points: point1: (%lf , %lf) and point2: (%lf , %lf)", p1.x, p1.y, p2.x, p2.y);
}
//main function
int main()
{
    int i, c = 0;
    double *DATA;
    point pts[NP];
    FILE *fp;
    clock_t start,end;
    double TIME;
    fp = fopen("InputData.txt","r"); //expects 2*NP numbers: x1 y1 x2 y2 ...
    assert(fp);
    DATA = (double *)calloc(2*NP, sizeof(double));
    assert(DATA);
    for(i=0; i < 2*NP; i++)
    {
        if(fscanf(fp,"%lf",&DATA[i]) != 1) //stop if the file has too few numbers
        {
            fprintf(stderr, "InputData.txt must contain %d numbers\n", 2*NP);
            free(DATA);
            fclose(fp);
            return 1;
        }
    }
    for(i = 0; i < NP; i++)
    {
        pts[i].x = DATA[c++];
        pts[i].y = DATA[c++];
    }
    free(DATA);
    printf("\nThe points are: \n");
    for(i = 0; i < NP; i++)
    {
        printf("\n(%lf , %lf)",pts[i].x,pts[i].y); //printing the points on console.
    }
    start=clock();
    bruteforce(pts); //call to closest pair function.
    end=clock();
    TIME=(double)(end-start)/CLOCKS_PER_SEC;
    printf("\n\nTime taken is: %lf\n",TIME);
    fclose(fp);
    return 0;
}
CODE: DIVIDE & CONQUER
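//This is a divide and conquer implementation of the closest pair problem.
//The time complexity is O(n log n).
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <assert.h>
#include <time.h>
#include <float.h>
#define MAX DBL_MAX //initial "infinite" distance, larger than any real distance
#define NP 10       //number of random points
//structure defined to represent a point with X and Y coordinate.
typedef struct pnt
{
    double x;
    double y;
} point;
//global declarations: the closest pair found so far and its distance.
point p1,p2;
double shortestDistance = MAX;
//prototypes, so that the partition functions can be called before their definitions.
int partitionByX(point A[ ],int p,int r);
int partitionByY(point A[ ],int p,int r);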

//function defined to sort the array wrt X-coordinate (quicksort: expected O(n log n) time).
void quicksortByX(point A[ ],int p,int r)
{
    int q;
    if(p<r)
    {
        q = partitionByX(A,p,r);
        quicksortByX(A,p,q-1);
        quicksortByX(A,q+1,r);
    }
}
//partition A[p..r] around the pivot A[p].x; returns the pivot's final index.
int partitionByX(point A[ ],int p,int r)
{
    int s, q;
    double z;
    point temp;
    z = A[p].x;
    q = p;
    for(s=p+1 ; s<=r ; s++)
    {
        if (A[s].x < z)
        {
            q++;
            temp = A[q];
            A[q] = A[s];
            A[s] = temp;
        }
    }
    temp = A[p];
    A[p] = A[q];
    A[q] = temp;
    return q;
}
//function defined to sort the array wrt Y-coordinate (quicksort: expected O(n log n) time).
void quicksortByY(point A[ ],int p,int r)
{
    int q;
    if(p<r)
    {
        q = partitionByY(A,p,r);
        quicksortByY(A,p,q-1);
        quicksortByY(A,q+1,r);
    }
}
//partition A[p..r] around the pivot A[p].y; returns the pivot's final index.
int partitionByY(point A[ ],int p,int r)
{
    int s, q;
    double z;
    point temp;
    z = A[p].y;
    q = p;
    for(s=p+1 ; s<=r ; s++)
    {
        if (A[s].y < z)
        {
            q++;
            temp = A[q];
            A[q] = A[s];
            A[s] = temp;
        }
    }
    temp = A[p];
    A[p] = A[q];
    A[q] = temp;
    return q;
}
//function to calculate minimum.
double minimum(double d1, double d2)
{
if(d1<d2)
return d1;
else
return d2;
}
// merge the y-sorted halves PointsByY[lowBound..mid] and PointsByY[mid+1..highBound]
// into one y-sorted run PointsByY[lowBound..highBound] in linear time,
// using buffer[lowBound..highBound] as scratch space
void merge(point PointsByY[], point buffer[], int lowBound, int mid, int highBound)
{
    int i = lowBound, j = mid + 1, k = lowBound;
    while(i <= mid && j <= highBound)
    {
        if(PointsByY[i].y <= PointsByY[j].y)
            buffer[k++] = PointsByY[i++];
        else
            buffer[k++] = PointsByY[j++];
    }
    while(i <= mid)
        buffer[k++] = PointsByY[i++];
    while(j <= highBound)
        buffer[k++] = PointsByY[j++];
    for(k = lowBound; k <= highBound; k++)
        PointsByY[k] = buffer[k];
}
//closest function: returns the closest-pair distance among PointsByX[lowBound..highBound]
//and leaves PointsByY[lowBound..highBound] sorted by y-coordinate.
double closest(point PointsByX[], point PointsByY[], point temp[], int lowBound, int highBound)
{
    if (highBound<=lowBound) //terminating condition: a single point has no pair.
        return MAX;
    int mid = (lowBound + highBound)/2; //middle index
    point median = PointsByX[mid]; //middle point; the dividing line is x = median.x
    double d1 = closest(PointsByX,PointsByY,temp,lowBound,mid); //recursive call, left sub problem.
    double d2 = closest(PointsByX,PointsByY,temp,mid+1,highBound); //recursive call, right sub problem.
    double d = minimum(d1,d2);
    // merge the two halves so that PointsByY[lowBound..highBound] is sorted by y-coordinate
    merge(PointsByY, temp, lowBound, mid, highBound);
    // temp[0 to k-1] holds the points within d of the dividing line, sorted by y-coordinate
    int k = 0;
    int i, j;
    for(i = lowBound; i<=highBound; i++)
    {
        if(fabs(PointsByY[i].x - median.x) < d) //horizontal distance to the dividing line
        {
            temp[k] = PointsByY[i];
            k++;
        }
    }
    // compare each point to its neighbors with y-coordinate closer than d
    for(i = 0; i < k; i++)
    {
        for(j=i+1; (j<k) && (temp[j].y-temp[i].y < d); j++)
        {
            double distance = sqrt(pow((temp[i].x-temp[j].x),2) + pow((temp[i].y-temp[j].y),2));
            if(distance < d)
                d = distance;
            if(distance < shortestDistance)
            {
                shortestDistance = distance;
                p1 = temp[i];
                p2 = temp[j];
            }
        }
    }
    return d;
}
//function to find closest pair
void closestpair(point Points[])
{
    int i;
    int N = NP;
    if(N<=1)
        return;
    //sort by x-coordinate
    point PointsByX[NP];
    for(i = 0; i < N; i++)
    {
        PointsByX[i] = Points[i]; //copy the points array as it is into the PointsByX array
    }
    quicksortByX(PointsByX,0,N-1); //call to quick sort to sort it wrt X-coordinate.
    // quick check for identical points that end up adjacent after sorting by x
    for (i = 0; i < N-1; i++)
    {
        if ((PointsByX[i].x == PointsByX[i+1].x) && (PointsByX[i].y == PointsByX[i+1].y))
        {
            shortestDistance = 0.0;
            p1 = PointsByX[i];
            p2 = PointsByX[i+1];
            printf("\n\nShortest distance: %f", shortestDistance);
            printf("\n\nShortest points: point1: (%f , %f) and point2: (%f , %f)", p1.x, p1.y, p2.x, p2.y);
            return;
        }
    }
    //PointsByY starts as a copy of PointsByX; closest() rearranges it into y order
    point PointsByY[N];
    for(i=0; i<N; i++)
        PointsByY[i] = PointsByX[i];
    //temporary array for the strip of candidate points
    point temp[N];
    printf("\n\nShortest distance: %f", closest(PointsByX, PointsByY, temp, 0, N-1));
    printf("\n\nShortest points: point1: (%f , %f) and point2: (%f , %f)", p1.x, p1.y, p2.x, p2.y);
}
//main function
int main()
{
    int i;
    point pts[NP];
    FILE *fp;
    clock_t start,end;
    double TIME;
    fp = fopen("D:close.txt","w"); //Windows path from the original; receives a copy of the points
    assert(fp);
    for(i = 0; i < NP; i++)
    {
        //randomly generates X and Y coordinates in [0, 100].
        pts[i].x = 100 * (double) rand()/RAND_MAX;
        pts[i].y = 100 * (double) rand()/RAND_MAX;
    }
    printf("\nThe points are: \n");
    for(i = 0; i < NP; i++)
    {
        printf("\n(%f , %f)",pts[i].x,pts[i].y); //printing the points on console.
        fprintf(fp,"%f %f\n",pts[i].x,pts[i].y); //printing into file.
    }
    start=clock();
    closestpair(pts); //call to closest pair function.
    end=clock();
    TIME=(double)(end-start)/CLOCKS_PER_SEC;
    printf("\n\nTime taken is: %lf\n",TIME);
    fclose(fp);
    return 0;
}

Results
OUTPUT: BRUTE FORCE
OUTPUT: DIVIDE & CONQUER

Conclusion
A naive algorithm that finds the distances between all pairs of points and selects the
minimum requires $O(dn^2)$ time in d dimensions. It turns out that the problem can be solved
in O(n log n) time in a Euclidean space of fixed dimension d.

Bibliography
• Udi Manber, Introduction to Algorithms: A Creative Approach, p. 295.
• Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Introduction to Algorithms, 3rd ed., p. 1039.
• Steven S. Skiena, The Algorithm Design Manual, 2nd ed., p. 595.
• M. H. Alsuwaiyel, Algorithms: Design Techniques and Analysis, p. 209.
• Jon Kleinberg and Éva Tardos, Algorithm Design, p. 243.
• Robert Sedgewick, Algorithms, p. 369.
• www.saurabhschool.com
