International Journal on Computational Sciences & Applications (IJCSA) Vol.2, No.1, February 2012
DOI: 10.5121/ijcsa.2012.2102
PARTITION SORT REVISITED:
RECONFIRMING THE ROBUSTNESS IN
AVERAGE CASE AND MUCH MORE!
Niraj Kumar Singh¹, Mita Pal² and Soubhik Chakraborty³*
¹Department of Computer Science & Engineering, B.I.T. Mesra, Ranchi-835215, India
²,³Department of Applied Mathematics, B.I.T. Mesra, Ranchi-835215, India
*Email address of the corresponding author: soubhikc@yahoo.co.in (S. Chakraborty)
ABSTRACT
In our previous work there was some indication that Partition Sort could be having a more robust
average case O(nlogn) complexity than the popular Quick sort. In our first study in this paper, we
reconfirm this through computer experiments for inputs from Cauchy distribution for which
expectation theoretically does not exist. Additionally, the algorithm is found to be sensitive to
parameters of the input probability distribution demanding further investigation on parameterized
complexity. The results on this algorithm for Binomial inputs in our second study are very
encouraging in that direction.
KEY WORDS
Partition Sort; average case complexity; robustness; parameterized complexity; computer
experiments; factorial experiments
1. Introduction
Average complexity is an important field of study in algorithm analysis as it explains how certain
algorithms with bad worst case complexity, such as Quick sort, perform better on the average. The
danger in making such a claim often lies in not verifying the robustness of the average complexity
in question. Average complexity is theoretically obtained by applying mathematical expectation
to the dominant operation or the dominant region in the code. One problem is: for a complex code
it is not easy to identify the dominant operation. This problem can be resolved by replacing the
count based mathematical bound by a weight based statistical bound that also permits collective
consideration of all operations and then estimating it by working directly on time, regarding the
time consumed by an operation as its weight. A bigger problem is that the probability distribution
over which expectation is taken may not be realistic over the domain of the problem. Algorithm
books derive these expectations for uniform probability inputs. Nothing is stated explicitly that
the results will hold even for non-uniform inputs nor is there any indication as to how realistic the
uniform input is over the domain of the problem. The rejection of Knuth’s proof in [1] and
Hoare’s proof in [2] for non uniform inputs should be a curtain raiser in that direction. Similarly,
it appears from [3] that the average complexity in Schoor's matrix multiplication algorithm is not
the expected number of multiplications $O(d_1 d_2 n^3)$, $d_1$ and $d_2$ being the densities (fraction of
non-zero elements) of the pre and post factor matrices, but the exact number of comparisons, which is $n^2$,
provided there are sufficient zeroes; surprisingly, we don't need a sparse matrix to get an
empirical $O(n^2)$ complexity! This result is obtained using a statistical bound estimate and shows
that multiplication need not be the dominant operation in every matrix multiplication algorithm
under certain input conditions.
In our previous work we introduced Partition Sort and discovered it to be having a more robust
average case O(nlogn) complexity than the popular Quick sort. In our first study in this paper, we
reconfirm this through computer experiments for inputs from Cauchy distribution for which
expectation theoretically does not exist! Additionally, the algorithm is found to be sensitive to
parameters of the input probability distribution demanding further investigation on parameterized
complexity on this algorithm. This is confirmed for Binomial inputs in our second study.
The Algorithm Partition Sort
Partition-sort algorithm is based on divide and conquer paradigm. The function “partition” is the
key sub-routine of this algorithm. The nature of partition function is such that when applied on
input $A[1..n]$ it divides this list into two halves of sizes $\lfloor n/2 \rfloor$ and $\lceil n/2 \rceil$
respectively. The property of the elements in these halves is such that the value of each element in
first half is less than the value of every element in the second half. The Partition-sort routine is
called on each half recursively to finally obtain a sorted sequence of data as required. Partition
Sort was introduced by Singh and Chakraborty [4], who obtained an $O(n \log_2^2 n)$ worst case count,
a best case count of order $n \log_2 n$ and an empirical $O(n \log_2 n)$ statistical bound estimate in the
average case by working directly on time, for reasons stated earlier.
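The recursive structure described above can be sketched as follows. This is only an illustration: the partition routine of [4] is not reproduced in this paper, so the function `partition_halves` below is a hypothetical stand-in based on randomized quickselect, and with repeated keys the halves satisfy "no larger than" rather than strictly "less than".

```python
import random


def partition_halves(a, lo, hi):
    """Rearrange a[lo:hi] in place so that no element of a[lo:k] exceeds
    any element of a[k:hi], where k - lo == (hi - lo) // 2. Returns k.

    Assumption: randomized quickselect is used as a stand-in here; it is
    not the partition routine of [4].
    """
    k = lo + (hi - lo) // 2
    left, right = lo, hi                  # window that is still unresolved
    while right - left > 1:
        pivot = a[random.randrange(left, right)]
        lt, i, gt = left, left, right
        while i < gt:                     # three-way split around the pivot
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                gt -= 1
                a[gt], a[i] = a[i], a[gt]
            else:
                i += 1
        # Now a[left:lt] < pivot, a[lt:gt] == pivot, a[gt:right] > pivot.
        if k < lt:
            right = lt
        elif k > gt:
            left = gt
        else:
            break                         # boundary lies inside the pivot run
    return k


def partition_sort(a, lo=0, hi=None):
    """Sort a[lo:hi] in place by halving with partition_halves and recursing."""
    if hi is None:
        hi = len(a)
    if hi - lo < 2:
        return
    mid = partition_halves(a, lo, hi)
    partition_sort(a, lo, mid)            # floor(n/2) smallest elements
    partition_sort(a, mid, hi)            # ceiling(n/2) largest elements


if __name__ == "__main__":
    data = [7, 3, 9, 3, 1, 8, 2]
    partition_sort(data)
    print(data)
```

Because each call splits the list into halves of sizes floor(n/2) and ceiling(n/2), the recursion depth is about log2 n regardless of the input; the cost of the partition step is what separates the best, average and worst cases.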
2. Statistical Analysis
2.1 Reconfirming the robustness of average complexity of Partition Sort
Theorem 1: If $U_1$ and $U_2$ are two independent uniform $U[0, 1]$ variates, then $Z_1$ and $Z_2$ defined
below are two independent standard Normal variates:
$Z_1 = (-2 \ln U_1)^{1/2} \cos(2\pi U_2)$; $Z_2 = (-2 \ln U_1)^{1/2} \sin(2\pi U_2)$
This result is called the Box-Muller transformation.
Theorem 2: If $Z_1$ and $Z_2$ are two independent standard Normal variates, then $Z_1/Z_2$ is a standard
Cauchy variate. For more details, we refer to [5].
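Theorems 1 and 2 combine into a simple generator of standard Cauchy variates. The sketch below is one possible implementation, not the code used in our experiments; the function names and the choice of Python's `random.Random` as the uniform source are assumptions made for illustration.

```python
import math
import random


def box_muller(rng):
    """Return two independent standard Normal variates (Theorem 1)."""
    u1 = 1.0 - rng.random()               # lies in (0, 1], so log(u1) is finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def standard_cauchy(rng):
    """Return one standard Cauchy variate as Z1 / Z2 (Theorem 2)."""
    while True:
        z1, z2 = box_muller(rng)
        if z2 != 0.0:                     # redraw in the rare case Z2 == 0
            return z1 / z2


if __name__ == "__main__":
    rng = random.Random(2012)             # hypothetical seed
    sample = [standard_cauchy(rng) for _ in range(10)]
    print(sample)
```

Since the expectation does not exist, the sample mean of such inputs does not settle down as the sample grows; the sample median (near 0) and quartiles (near -1 and +1) are the meaningful summaries.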
Cauchy distribution is an unconventional distribution for which expectation does not exist
theoretically. Hence it is not possible to know the average case complexity theoretically for inputs
from this distribution. Working directly on time, using computer experiments, we have obtained
an empirical O(nlogn) complexity in average sorting time for Partition sort for Cauchy
distribution inputs which we simulated using theorems 1 and 2 given above. This result goes a
long way in reconfirming that Partition Sort’s average complexity is more robust compared to
that of Quick Sort. In [4] we have theoretically proved that its worst case complexity is also much
better than that of Quick Sort, as $O(n \log_2^2 n)$ grows more slowly than $O(n^2)$. Although Partition
Sort's worst case is inferior to Heap Sort's $O(n \log n)$ worst case complexity, Partition Sort is still
easier to program.
Table 1 and figure 1 based on table 1 summarize our results.
Table 1: Average time for Partition Sort for Cauchy distribution inputs
N                 10000    20000    30000   40000    50000    60000    70000   80000    90000    100000
Mean time (sec.)  0.05168  0.10816  0.1487  0.17218  0.20494  0.24078  0.2659  0.31322  0.35128  0.39496
Fig. 1: Regression model suggesting empirical O(nlogn) complexity
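The kind of regression behind Fig. 1 can be illustrated with the data of Table 1. The exact model fitted for the figure is not stated here, so the sketch below assumes the simplest candidate, y = a + b(n log2 n), fitted by ordinary least squares; a coefficient of determination close to 1 is what would support an empirical O(nlogn) bound.

```python
import math

# Table 1: array sizes and mean sorting times in seconds.
N = [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000]
T = [0.05168, 0.10816, 0.1487, 0.17218, 0.20494,
     0.24078, 0.2659, 0.31322, 0.35128, 0.39496]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Assumed regressor: n * log2(n).
x = [n * math.log2(n) for n in N]
a, b, r2 = fit_line(x, T)

if __name__ == "__main__":
    print(f"intercept = {a:.5f}, slope = {b:.3e}, R^2 = {r2:.4f}")
```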
2.2 Partition Sort subjected to parameterized complexity analysis
Parameterized complexity is a branch of computational complexity theory in computer science
that focuses on classifying computational problems according to their inherent difficulty with
respect to multiple parameters of the input. The complexity of a problem is then measured as a
function in those parameters. This allows NP-hard problems to be classified on a finer scale than in the
classical setting, where the complexity of a problem is only measured by the number of bits in the
input (see also http://en.wikipedia.org/wiki/Parameterized_complexity). The first systematic work
on parameterized complexity was done by Downey & Fellows [6]. The authors in [7] have
strongly argued both theoretically and experimentally why for certain algorithms like sorting, the
parameters of the input distribution should also be taken into account for explaining the
complexity, not just the parameter characterizing the size of the input. The second study is
accordingly devoted to parameterized complexity analysis whereby the sorting elements of
Partition Sort come independently from a Binomial (m, p) distribution. Use is made of
factorial experiments to investigate the individual effect of number of sorting elements
(n), binomial distribution parameters (m and p which give the number of independent trials and
the fixed probability of success in a single trial respectively) and also their interaction effects. A
3-cube factorial experiment is conducted with three levels of each of the three factors n, m and p.
All the three factors are found to be significant both individually and interactively.
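The layout of such a factorial experiment can be sketched as below. The three levels of each factor are those used in our study; everything else is an assumption made for illustration: the helper names are hypothetical, Binomial variates are generated as sums of Bernoulli trials, Python's built-in `sorted` stands in for Partition Sort, and the demonstration run uses scaled-down sizes so that it finishes quickly.

```python
import itertools
import random
import time

N_LEVELS = (50000, 100000, 150000)        # number of sorting elements
M_LEVELS = (100, 1000, 1500)              # Binomial number of trials
P_LEVELS = (0.2, 0.5, 0.8)                # Binomial success probability

# All 3 x 3 x 3 = 27 treatment combinations (n, m, p).
design = list(itertools.product(N_LEVELS, M_LEVELS, P_LEVELS))


def binomial_variate(m, p, rng):
    """One Binomial(m, p) draw as the number of successes in m trials."""
    return sum(1 for _ in range(m) if rng.random() < p)


def mean_sort_time(n, m, p, sort_fn, reps, rng):
    """Mean wall-clock time of sort_fn over reps fresh Binomial(m, p) inputs."""
    total = 0.0
    for _ in range(reps):
        data = [binomial_variate(m, p, rng) for _ in range(n)]
        start = time.perf_counter()
        sort_fn(data)
        total += time.perf_counter() - start
    return total / reps


if __name__ == "__main__":
    rng = random.Random(0)                # hypothetical seed
    for n, m, p in design:
        # Scaled-down run (n/100, m/10, 2 repetitions) purely for illustration.
        t = mean_sort_time(n // 100, m // 10, p, sorted, reps=2, rng=rng)
        print(f"n={n:6d} m={m:4d} p={p:.1f} mean time={t:.6f} sec")
```

Only the sorting call is timed; generating the input is kept outside the timed region so that the response reflects the algorithm and not the random number generator.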
In our second study, Table-2 gives the data for factorial experiments to accomplish our study on
parameterized complexity.
Table 2: Data for the $3^3$ factorial experiment for Partition Sort
Partition Sort times in seconds for Binomial(m, p) distribution inputs for various n (50000, 100000,
150000), m (100, 1000, 1500) and p (0.2, 0.5, 0.8).
Each reading is averaged over 50 readings.
n = 50000
m p=0.2 p=0.5 p=0.8
100 0.07248 0.07968 0.07314
1000 0.09662 0.10186 0.09884
1500 0.10032 0.10618 0.10212
n=100000
m p=0.2 p=0.5 p=0.8
100 0.16502 0.1734 0.16638
1000 0.21394 0.22318 0.21468
1500 0.22194 0.23084 0.22356
n = 150000
m p=0.2 p=0.5 p=0.8
100 0.26242 0.27632 0.26322
1000 0.33988 0.35744 0.34436
1500 0.35648 0.37 0.35572
Table-3 gives the results using MINITAB statistical package version 15.
Table-3: Results of the $3^3$ factorial experiment on Partition Sort
General Linear Model: y versus n, m, p
Factor Type Levels Values
n fixed 3 1, 2, 3
m fixed 3 1, 2, 3
p fixed 3 1, 2, 3
Analysis of Variance for y, using Adjusted SS for Tests
Source DF Seq SS Adj SS Adj MS F P
n 2 0.731167 0.731167 0.365584 17077435.80 0.000
m 2 0.056680 0.056680 0.028340 1323846.78 0.000
p 2 0.001440 0.001440 0.000720 33637.34 0.000
n*m 4 0.011331 0.011331 0.002833 132322.02 0.000
n*p 4 0.000283 0.000283 0.000071 3302.87 0.000
m*p 4 0.000034 0.000034 0.000009 397.33 0.000
n*m*p 8 0.000046 0.000046 0.000006 266.70 0.000
Error 54 0.000001 0.000001 0.000000
Total 80 0.800982
S = 0.000146313 R-Sq = 100.00% R-Sq(adj) = 100.00%
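As a rough cross-check of the main-effect rows above, the sums of squares can be recomputed from the cell means of Table 2. The sketch assumes three replicates per cell, which is what the 80 total degrees of freedom (81 observations over 27 cells) suggest; since the raw replicates are not listed, the values obtained this way should only come out close to the Seq SS column, not identical to it.

```python
# Cell means from Table 2, indexed as TIMES[n_level][m_level][p_level].
TIMES = [
    [[0.07248, 0.07968, 0.07314],
     [0.09662, 0.10186, 0.09884],
     [0.10032, 0.10618, 0.10212]],
    [[0.16502, 0.1734, 0.16638],
     [0.21394, 0.22318, 0.21468],
     [0.22194, 0.23084, 0.22356]],
    [[0.26242, 0.27632, 0.26322],
     [0.33988, 0.35744, 0.34436],
     [0.35648, 0.37, 0.35572]],
]
REPLICATES = 3                            # assumption inferred from the Total DF


def main_effect_ss(axis):
    """Sum of squares for one factor (axis 0 = n, 1 = m, 2 = p)."""
    cells = [(i, j, k) for i in range(3) for j in range(3) for k in range(3)]
    grand = sum(TIMES[i][j][k] for i, j, k in cells) / len(cells)
    ss = 0.0
    for level in range(3):
        values = [TIMES[i][j][k] for i, j, k in cells
                  if (i, j, k)[axis] == level]
        level_mean = sum(values) / len(values)
        # Each level mean summarizes 9 cells x REPLICATES observations.
        ss += len(values) * REPLICATES * (level_mean - grand) ** 2
    return ss


if __name__ == "__main__":
    for name, axis in (("n", 0), ("m", 1), ("p", 2)):
        print(f"SS({name}) = {main_effect_ss(axis):.6f}")
```

The ordering SS(n) > SS(m) > SS(p) obtained this way mirrors the ordering of the F values in Table-3: the input size dominates, but the Binomial parameters still contribute measurably.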
3. Discussion and more statistical analysis
Partition sort is highly affected by the main effects n, m and p. When we consider the interaction
effects, interestingly we find that all interactions are significant in Partition-Sort. Strikingly, even
the three factor interaction n*m*p cannot be neglected. This means Partition Sort is quite
sensitive to parameters of the input distribution and hence qualifies to be a potential candidate for
deep investigation in parameterized complexity both theoretically (through counts) and
experimentally (through weights) for inputs from other distributions. Further, we have obtained
some interesting patterns showing how the Binomial parameters influence the average sorting
time. Our investigations are ongoing for a theoretical justification for the same. The final results
are summarized in tables 4-5 and figures 2A, 2B and 3 based on these tables respectively.
Each entry in the following tables is averaged over 50 readings.
Table 4: Partition Sort, Binomial (m, p) distribution, array size N=50000, p=0.5 fixed
m                 100      300      500      700      900      1100     1300     1500
Mean time (sec.)  0.07968  0.09066  0.09586  0.09968  0.10154  0.10438  0.10282  0.10618
[Figure: Avg Time vs m of Binomial(m, p). x-axis: m; y-axis: average time of Partition Sort in sec. Fitted curve: $y = 10^{-11}x^3 - 5 \times 10^{-8}x^2 + 7 \times 10^{-5}x + 0.0739$, $R^2 = 0.9916$]
Fig 2A Third degree polynomial fit captures the trend
[Figure: Avg Time vs m of Binomial(m, p). x-axis: m; y-axis: average time of Partition Sort in sec. Fitted curve: $y = -5 \times 10^{-14}x^4 + 2 \times 10^{-10}x^3 - 2 \times 10^{-7}x^2 + 0.0001x + 0.0703$, $R^2 = 0.9987$]
Fig 2B Fourth degree polynomial appears to be a forced fit (over fit)
(don't get carried away by the higher value of $R^2$!)
Although the fourth degree polynomial fit gives a higher value of $R^2$, it does so by bending the curve
toward the individual data points. The essence of curve fitting lies in catching the trend (in the
population) exhibited by the observations rather than catching the observations themselves
(which reflect only a sample). Besides, a bound estimate must look like a bound estimate and it is
stronger to write $y_{avg}(n, m, p) = O_{emp}(m^3)$ than to write $y_{avg}(n, m, p) = O_{emp}(m^4)$ for fixed n and p.
So we agree to accept the first of the two propositions.
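The comparison between the two fits can be reproduced from Table 4 with a small least-squares routine. The sketch below is an illustration only and not the tool used to draw Figs. 2A and 2B; rescaling m to m/1000 and solving the normal equations by elimination are implementation choices made here to keep the example dependency-free. Adding a term can never lower $R^2$ on the fitted data, which is why a higher $R^2$ alone is not evidence of a better model.

```python
# Table 4: Binomial parameter m and mean sorting time (n = 50000, p = 0.5).
M = [100, 300, 500, 700, 900, 1100, 1300, 1500]
T = [0.07968, 0.09066, 0.09586, 0.09968, 0.10154, 0.10438, 0.10282, 0.10618]


def polyfit(x, y, degree):
    """Least-squares coefficients c[0..degree] via the normal equations."""
    size = degree + 1
    # Augmented matrix [X^T X | X^T y] for the monomial basis 1, x, x^2, ...
    aug = [[sum(xi ** (r + c) for xi in x) for c in range(size)]
           + [sum(yi * xi ** r for xi, yi in zip(x, y))]
           for r in range(size)]
    for col in range(size):               # Gauss-Jordan with partial pivoting
        piv = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[r][size] / aug[r][r] for r in range(size)]


def r_squared(x, y, coeffs):
    """Coefficient of determination of the polynomial with given coefficients."""
    mean_y = sum(y) / len(y)
    pred = [sum(c * xi ** k for k, c in enumerate(coeffs)) for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


x = [m / 1000 for m in M]                 # rescaled to keep the system well conditioned

if __name__ == "__main__":
    for degree in (3, 4):
        r2 = r_squared(x, T, polyfit(x, T, degree))
        print(f"degree {degree}: R^2 = {r2:.4f}")
```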
Table 5: Partition Sort, Binomial distribution (m, p), n=50000, m=1000 fixed
p                 0.1      0.2      0.3      0.4      0.5      0.6      0.7     0.8      0.9
Mean time (sec.)  0.09084  0.09662  0.09884  0.10198  0.10186  0.10034  0.0989  0.09884  0.09096
[Figure: Avg time vs p of Binomial(m, p). x-axis: p; y-axis: average time for Partition Sort in sec. Fitted curve: $y = -0.0649x^2 + 0.0659x + 0.0853$, $R^2 = 0.9293$]
Fig. 3 Second degree polynomial fit captures the trend
Fitting higher degree polynomials leads to over-fitting (details omitted) and from previous arguments we
put $y_{avg}(n, m, p) = O_{emp}(p^2)$ for fixed n and m.
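As a quick worked check on the second degree fit reported with Fig. 3 (treating its rounded coefficients as exact, which is an assumption), the fitted curve peaks close to p = 0.5:

```latex
y(p) = -0.0649\,p^{2} + 0.0659\,p + 0.0853,
\qquad
\frac{dy}{dp} = 0 \;\Rightarrow\; p^{*} = \frac{0.0659}{2 \times 0.0649} \approx 0.508,
\qquad
y(p^{*}) = 0.0853 + \frac{0.0659^{2}}{4 \times 0.0649} \approx 0.102 \text{ sec.}
```

This agrees with Table 5, where the largest mean times (about 0.102 sec.) occur at p = 0.4 and p = 0.5 and the times fall off roughly symmetrically towards p = 0.1 and p = 0.9.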
For definitions of statistical bound and empirical O, we refer to [4]. For a list of properties of a
statistical complexity bound as well as to understand what design and analysis of computer
experiments mean when the response is a complexity such as time, [8] may be consulted.
4. Conclusion and suggestions for future work
We conclude
(i) Partition Sort is more robust than Quick Sort in average case.
(ii) Partition Sort is sensitive to parameters of input distribution also, apart from the
parameter that characterizes the input size.
(iii) For n independent Binomial(m, p) inputs, all three factors are significant both
independently and interactively. All the two factor interactions n*m, n*p and m*p
and even the three factor interaction n*m*p are significant. This last finding is of paramount
importance to excite other researchers on parameterized complexity and is intriguing,
if not impossible, to establish theoretically. Theoretical analysis might confirm
the influence of the Binomial parameters, but how does one confirm the significance of
their interactions? Using computer experiments, where cheap and efficient prediction
is the motive [8][9][10], we have settled the imbroglio.
(iv) We also found $y_{avg}(n, m, p) = O_{emp}(m^3)$ for fixed n and p, while $y_{avg}(n, m, p) = O_{emp}(p^2)$
for fixed n and m. It should be kept in mind that these results are obtained by
working on weights and should not be confused with count based theoretical analysis
which need not always be identical.
In summary, this paper should convince the reader about the existence of weight based statistical
bounds that can be empirically estimated by merging the quantum of literature in computer
experiments (this literature includes factorial experiments, applied regression analysis and
exploratory data analysis, which we have used here, not to speak of other areas like spatial
statistics, bootstrapping, optimality design and even Bayesian analysis!) with that in algorithm
theory. Computer scientists will hopefully not throw away our statistical findings and should
seriously think about the prospects of building a weight based science theoretically to explain
algorithm analysis given that the current count based science is quite saturated. This was
essentially the central focus in our adventures. So the purpose achieved, we close the paper.
References
[1] S.Chakraborty and S.K. Sourabh, How Robust Are Average Complexity Measures? A Statistical
Case Study, Applied Math. and Compu., vol. 189(2), 2007, 1787-1797
[2] S.K. Sourabh and S.Chakraborty, How robust is quicksort average complexity? arXiv:0811.4376v1
[cs.DS], Advances in Mathematical Sciences Jour. (to appear)
[3] S.Chakraborty, S.K. Sourabh , On Why an Algorithmic Time Complexity Measure can be System
Invariant rather than System Independent, Applied Math. and Compu, Vol. 190(1), 2007, p. 195-204
[4] N.K. Singh and S.Chakraborty, Partition Sort and its Empirical Analysis, CIIT-2011, CCIS 250, pp.
340-346, 2011. © Springer-Verlag Heidelberg 2011.
[5] W.Kennedy and J.Gentle, Statistical Computing, Marcel Dekker Inc., 1980
[6] R.G. Downey and M.R.Fellows, Parameterized Complexity, Springer, (1999)
[7] S.Chakraborty, S.K.Sourabh, M.Bose and K. Sushant, Replacement sort revisited: The “gold
standard” unearthed! , Applied Mathematics and Computation ,vol. 189(2), 2007, p. 384-394
[8] S.Chakraborty and S.K.Sourabh, A Computer Experiment Oriented Approach to Algorithmic
Complexity, Lambert Academic Publishing, (2010)
[9] J. Sacks, W. Welch, T. Mitchell, H. Wynn, Design and Analysis of Computer Experiments, Statistical
Science, Vol. 4(4), (1989)
[10] K.T. Fang, R. Li, A.Sudjianto, Design and Modeling of Computer Experiments, Chapman and Hall
(2006)

Parametric Sensitivity Analysis of a Mathematical Model of Two Interacting Po...
 

Partition Sort Revisited: Reconfirming the Robustness in Average Case and Much More

International Journal on Computational Sciences & Applications (IJCSA) Vol. 2, No. 1, February 2012
DOI: 10.5121/ijcsa.2012.2102

PARTITION SORT REVISITED: RECONFIRMING THE ROBUSTNESS IN AVERAGE CASE AND MUCH MORE!

Niraj Kumar Singh (1), Mita Pal (2) and Soubhik Chakraborty (3*)
(1) Department of Computer Science & Engineering, B.I.T. Mesra, Ranchi-835215, India
(2,3) Department of Applied Mathematics, B.I.T. Mesra, Ranchi-835215, India
*Email address of the corresponding author: soubhikc@yahoo.co.in (S. Chakraborty)

ABSTRACT
In our previous work there was some indication that Partition Sort could have a more robust average case $O(n \log n)$ complexity than the popular Quick Sort. In the first study of this paper, we reconfirm this through computer experiments for inputs from the Cauchy distribution, for which the expectation theoretically does not exist. Additionally, the algorithm is found to be sensitive to the parameters of the input probability distribution, demanding further investigation of its parameterized complexity. The results on this algorithm for Binomial inputs in our second study are very encouraging in that direction.

KEY WORDS
Partition Sort; average case complexity; robustness; parameterized complexity; computer experiments; factorial experiments

1. Introduction
Average complexity is an important field of study in algorithm analysis, as it explains how certain algorithms with bad worst case complexity, such as Quick Sort, perform better on the average. The danger in making such a claim often lies in not verifying the robustness of the average complexity in question. Average complexity is theoretically obtained by applying mathematical expectation to the dominant operation or the dominant region in the code. One problem is that for a complex code it is not easy to identify the dominant operation.
This problem can be resolved by replacing the count based mathematical bound with a weight based statistical bound that also permits collective consideration of all operations, and then estimating it by working directly on time, regarding the time consumed by an operation as its weight. A bigger problem is that the probability distribution over which the expectation is taken may not be realistic over the domain of the problem. Algorithm books derive these expectations for uniform probability inputs. Nothing is stated explicitly about whether the results hold for non-uniform inputs, nor is there any indication of how realistic the uniform input is over the domain of the problem. The rejection of Knuth's proof in [1] and Hoare's proof in [2] for non-uniform inputs should be a curtain raiser in that direction.

Similarly, it appears from [3] that the average complexity in Schoor's matrix multiplication algorithm is not the expected number of multiplications $O(d_1 d_2 n^3)$, $d_1$ and $d_2$ being the densities (fractions of non-zero elements) of the pre- and post-factor matrices, but the exact number of comparisons, which is $n^2$ provided there are sufficient zeroes. Surprisingly, we do not need a sparse matrix to get an empirical $O(n^2)$ complexity! This result is obtained using a statistical bound estimate and shows that multiplication need not be the dominant operation in every matrix multiplication algorithm under certain input conditions.

In our previous work we introduced Partition Sort and found it to have a more robust average case $O(n \log n)$ complexity than the popular Quick Sort. In the first study of this paper, we reconfirm this through computer experiments for inputs from the Cauchy distribution, for which the expectation theoretically does not exist! Additionally, the algorithm is found to be sensitive to the parameters of the input probability distribution, demanding further investigation of the parameterized complexity of this algorithm. This is confirmed for Binomial inputs in our second study.

The Algorithm Partition Sort
The Partition Sort algorithm is based on the divide and conquer paradigm. The function "partition" is the key subroutine of this algorithm. When applied to an input A[1..n], the partition function divides the list into two halves of sizes floor(n/2) and ceiling(n/2) respectively. The halves have the property that the value of each element in the first half is less than the value of every element in the second half.
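The divide step described above can be illustrated with a short Python sketch. The partition subroutine of [4] is not reproduced in this paper, so the sketch substitutes a randomized quickselect-style rearrangement as a stand-in; the names `partition_halves` and `partition_sort` are hypothetical, and with repeated keys the first half is only guaranteed to be less than or equal to the second half. Its running time therefore need not match the bounds reported for the original algorithm.

```python
import random


def partition_halves(a, lo, hi):
    """Rearrange a[lo:hi] in place so that the first floor(n/2) elements
    are <= the remaining ceil(n/2) elements, and return the boundary index.

    Stand-in for the partition routine of [4] (an assumption): this is an
    iterative randomized quickselect, not the authors' code.
    """
    k = lo + (hi - lo) // 2          # a[lo:k] is the first half
    left, right = lo, hi - 1
    while left < right:
        pivot = a[random.randint(left, right)]
        i, j = left, right
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Here a[left..j] <= pivot <= a[i..right] and j < i.
        if k <= j:
            right = j
        elif k >= i:
            left = i
        else:
            break                    # a[k] equals the pivot and is in place
    return k


def partition_sort(a, lo=0, hi=None):
    """Sort a[lo:hi] in place by splitting into halves and recursing."""
    if hi is None:
        hi = len(a)
    if hi - lo < 2:
        return
    mid = partition_halves(a, lo, hi)
    partition_sort(a, lo, mid)
    partition_sort(a, mid, hi)


if __name__ == "__main__":
    data = [7, 3, 9, 1, 4, 8, 2]
    partition_sort(data)
    print(data)
```

Because every split is into halves of sizes floor(n/2) and ceiling(n/2), the recursion depth in this sketch is logarithmic in n regardless of the input order.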
The Partition Sort routine is called on each half recursively to finally obtain a sorted sequence of data as required. Partition Sort was introduced by Singh and Chakraborty [4], who obtained an $O(n \log_2^2 n)$ worst case count, an $\Omega(n \log_2 n)$ best case count and, in the average case, an empirical $O(n \log_2 n)$ statistical bound estimate obtained by working directly on time for the reasons stated earlier.

2. Statistical Analysis

2.1 Reconfirming the robustness of average complexity of Partition Sort

Theorem 1: If $U_1$ and $U_2$ are two independent uniform U[0, 1] variates, then $Z_1$ and $Z_2$ defined below are two independent standard Normal variates:

$$Z_1 = (-2 \ln U_1)^{1/2} \cos(2\pi U_2), \qquad Z_2 = (-2 \ln U_1)^{1/2} \sin(2\pi U_2)$$

This result is called the Box-Muller transformation.

Theorem 2: If $Z_1$ and $Z_2$ are two independent standard Normal variates, then $Z_1/Z_2$ is a standard Cauchy variate.

For more details, we refer to [5]. The Cauchy distribution is an unconventional distribution for which the expectation does not exist theoretically. Hence it is not possible to know the average case complexity theoretically for inputs from this distribution. Working directly on time, using computer experiments, we have obtained an empirical $O(n \log n)$ complexity in average sorting time for Partition Sort for Cauchy distribution inputs, which we simulated using Theorems 1 and 2 given above. This result goes a long way in reconfirming that Partition Sort's average complexity is more robust than that of Quick Sort. In [4] we have theoretically proved that its worst case complexity is also much better than that of Quick Sort, as $O(n \log_2^2 n) < O(n^2)$. Although Partition Sort is inferior to Heap Sort's $O(n \log n)$ complexity in the worst case, Partition Sort is still easier to program. Table 1 and Fig. 1, which is based on Table 1, summarize our results.

Table 1: Average time for Partition Sort for Cauchy distribution inputs

N         Mean time (sec.)
10000     0.05168
20000     0.10816
30000     0.1487
40000     0.17218
50000     0.20494
60000     0.24078
70000     0.2659
80000     0.31322
90000     0.35128
100000    0.39496

Fig. 1: Regression model suggesting empirical $O(n \log n)$ complexity

2.2 Partition Sort subjected to parameterized complexity analysis

Parameterized complexity is a branch of computational complexity theory in computer science that focuses on classifying computational problems according to their inherent difficulty with respect to multiple parameters of the input. The complexity of a problem is then measured as a function of those parameters. This allows NP-hard problems to be classified on a finer scale than in the classical setting, where the complexity of a problem is measured only by the number of bits in the input (see also http://en.wikipedia.org/wiki/Parameterized_complexity). The first systematic work on parameterized complexity was done by Downey and Fellows [6]. The authors in [7] have strongly argued, both theoretically and experimentally, why for certain algorithms such as sorting the parameters of the input distribution should also be taken into account in explaining the complexity, not just the parameter characterizing the size of the input.
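For the first study (Section 2.1), the simulation of Cauchy inputs through Theorems 1 and 2 can be sketched as follows. This is a minimal illustration, not the authors' program: the function names and the seed are hypothetical, and Python's standard `random` module is assumed as the source of uniform variates.

```python
import math
import random


def box_muller(rng):
    """Return two independent standard Normal variates (Theorem 1)."""
    u1 = 1.0 - rng.random()          # lies in (0, 1], so log(u1) is defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def standard_cauchy(rng):
    """Return one standard Cauchy variate as Z1 / Z2 (Theorem 2)."""
    while True:
        z1, z2 = box_muller(rng)
        if z2 != 0.0:                # skip the rare exact-zero denominator
            return z1 / z2


def cauchy_sample(n, seed=0):
    """Generate n Cauchy inputs for a sorting experiment (seed is arbitrary)."""
    rng = random.Random(seed)
    return [standard_cauchy(rng) for _ in range(n)]


if __name__ == "__main__":
    xs = sorted(cauchy_sample(10000))
    print("sample median:", xs[len(xs) // 2])
```

Since the mean of a Cauchy sample does not settle down as n grows, a quick sanity check is better based on the median (close to 0) and on the quartiles (close to -1 and +1) than on the sample average.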
The second study is accordingly devoted to parameterized complexity analysis, in which the elements sorted by Partition Sort come independently from a Binomial(m, p) distribution. Factorial experiments are used to investigate the individual effects of the number of sorting elements (n) and of the Binomial distribution parameters (m and p, which give the number of independent trials and the fixed probability of success in a single trial respectively), and also their interaction effects. A 3-cube factorial experiment is conducted with three levels of each of the three factors n, m and p. All three factors are found to be significant both individually and interactively. Table 2 gives the data for the factorial experiments in this second study.

Table 2: Data for the $3^3$ factorial experiment for Partition Sort. Partition Sort times in seconds for Binomial(m, p) distribution inputs for various n (50000, 100000, 150000), m (100, 1000, 1500) and p (0.2, 0.5, 0.8). Each reading is averaged over 50 readings.

n = 50000
m       p=0.2      p=0.5      p=0.8
100     0.07248    0.07968    0.07314
1000    0.09662    0.10186    0.09884
1500    0.10032    0.10618    0.10212

n = 100000
m       p=0.2      p=0.5      p=0.8
100     0.16502    0.1734     0.16638
1000    0.21394    0.22318    0.21468
1500    0.22194    0.23084    0.22356

n = 150000
m       p=0.2      p=0.5      p=0.8
100     0.26242    0.27632    0.26322
1000    0.33988    0.35744    0.34436
1500    0.35648    0.37       0.35572

Table 3 gives the results obtained using the MINITAB statistical package, version 15.

Table 3: Results of the $3^3$ factorial experiment on Partition Sort
General Linear Model: y versus n, m, p

Factor  Type   Levels  Values
n       fixed  3       1, 2, 3
m       fixed  3       1, 2, 3
p       fixed  3       1, 2, 3

Analysis of Variance for y, using Adjusted SS for Tests

Source  DF  Seq SS    Adj SS    Adj MS    F             P
n       2   0.731167  0.731167  0.365584  17077435.80   0.000
m       2   0.056680  0.056680  0.028340  1323846.78    0.000
p       2   0.001440  0.001440  0.000720  33637.34      0.000
n*m     4   0.011331  0.011331  0.002833  132322.02     0.000
n*p     4   0.000283  0.000283  0.000071  3302.87       0.000
m*p     4   0.000034  0.000034  0.000009  397.33        0.000
n*m*p   8   0.000046  0.000046  0.000006  266.70        0.000
Error   54  0.000001  0.000001  0.000000
Total   80  0.800982

S = 0.000146313   R-Sq = 100.00%   R-Sq(adj) = 100.00%

3. Discussion and more statistical analysis

Partition Sort is highly affected by the main effects n, m and p. When we consider the interaction effects, we find, interestingly, that all interactions are significant in Partition Sort. Strikingly, even the three factor interaction n*m*p cannot be neglected. This means Partition Sort is quite sensitive to the parameters of the input distribution and hence qualifies as a potential candidate for deep investigation in parameterized complexity, both theoretically (through counts) and experimentally (through weights), for inputs from other distributions.

Further, we have obtained some interesting patterns showing how the Binomial parameters influence the average sorting time. Our investigations into a theoretical justification for these patterns are ongoing. The final results are summarized in Tables 4 and 5 and in Figures 2A, 2B and 3, which are based on these tables. Each entry in the following tables is averaged over 50 readings.

Table 4: Partition Sort, Binomial(m, p) distribution, array size N = 50000, p = 0.5 fixed

m       Mean time (sec.)
100     0.07968
300     0.09066
500     0.09586
700     0.09968
900     0.10154
1100    0.10438
1300    0.10282
1500    0.10618
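The relative size of the main effects reported in Table 3 can be checked roughly from the cell means of Table 2. The following Python sketch is only an approximation under stated assumptions: the individual replicates behind Table 3 are not listed in the paper, so the sums of squares are computed on the 27 cell means alone, and the function name `main_effect_ss` is hypothetical.

```python
# Cell means from Table 2: TIMES[i][j][k] is the mean time (sec.) for the
# i-th level of n, the j-th level of m and the k-th level of p.
N_LEVELS = (50000, 100000, 150000)
M_LEVELS = (100, 1000, 1500)
P_LEVELS = (0.2, 0.5, 0.8)

TIMES = [
    [[0.07248, 0.07968, 0.07314],
     [0.09662, 0.10186, 0.09884],
     [0.10032, 0.10618, 0.10212]],
    [[0.16502, 0.1734, 0.16638],
     [0.21394, 0.22318, 0.21468],
     [0.22194, 0.23084, 0.22356]],
    [[0.26242, 0.27632, 0.26322],
     [0.33988, 0.35744, 0.34436],
     [0.35648, 0.37, 0.35572]],
]


def main_effect_ss(times):
    """Return the main-effect sum of squares of each factor over cell means."""
    cells = [(i, j, k, times[i][j][k])
             for i in range(3) for j in range(3) for k in range(3)]
    grand_mean = sum(c[3] for c in cells) / len(cells)
    result = {}
    for name, axis in (("n", 0), ("m", 1), ("p", 2)):
        total = 0.0
        for level in range(3):
            values = [c[3] for c in cells if c[axis] == level]
            level_mean = sum(values) / len(values)
            total += len(values) * (level_mean - grand_mean) ** 2
        result[name] = total
    return result


if __name__ == "__main__":
    for factor, ss in main_effect_ss(TIMES).items():
        print(factor, round(ss, 6))
```

If three replicates per cell are assumed (which is what a total of 80 degrees of freedom in Table 3 suggests), multiplying these values by 3 gives about 0.73 for n, in line with the Seq SS column. Interaction terms and F tests need the replicates themselves and are not attempted here.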
[Fig. 2A: Average time of Partition Sort (sec.) versus m of Binomial(m, p). Fitted curve: $y = 1\times10^{-11}x^3 - 5\times10^{-8}x^2 + 7\times10^{-5}x + 0.0739$, $R^2 = 0.9916$.]
Fig. 2A: Third degree polynomial fit captures the trend

[Fig. 2B: Average time of Partition Sort (sec.) versus m of Binomial(m, p). Fitted curve: $y = -5\times10^{-14}x^4 + 2\times10^{-10}x^3 - 2\times10^{-7}x^2 + 0.0001x + 0.0703$, $R^2 = 0.9987$.]
Fig. 2B: Fourth degree polynomial appears to be a forced fit (over-fit); do not get carried away by the higher value of $R^2$!
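The comparison between the two fits in Figs. 2A and 2B can be repeated approximately from the Table 4 data. The sketch below is an illustration in plain Python rather than the tool used for the figures: it solves the least squares normal equations directly, rescales m by 1000 for numerical stability (so the coefficients are in units of m/1000 and differ from those printed in the figures), and uses hypothetical function names.

```python
# Table 4 data: Binomial parameter m and mean sorting time in seconds.
M_VALUES = [100, 300, 500, 700, 900, 1100, 1300, 1500]
MEAN_TIMES = [0.07968, 0.09066, 0.09586, 0.09968,
              0.10154, 0.10438, 0.10282, 0.10618]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def polyfit_r2(xs, ys, degree, scale=1000.0):
    """Least squares polynomial fit; returns (coefficients, R squared).

    Coefficients are ordered from the constant term upwards and refer to
    the rescaled variable x / scale.
    """
    z = [x / scale for x in xs]
    a = [[sum(v ** (i + j) for v in z) for j in range(degree + 1)]
         for i in range(degree + 1)]
    b = [sum(y * v ** i for v, y in zip(z, ys)) for i in range(degree + 1)]
    coef = solve_linear(a, b)
    fitted = [sum(c * v ** i for i, c in enumerate(coef)) for v in z]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return coef, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    for degree in (3, 4):
        _, r2 = polyfit_r2(M_VALUES, MEAN_TIMES, degree)
        print("degree", degree, "R^2 =", round(r2, 4))
```

A higher degree can never lower $R^2$ on the same data, which is why $R^2$ alone cannot decide between the cubic and the quartic model.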
Although the fourth degree polynomial fit gives a higher value of $R^2$, it forces the curve towards the individual data points. The essence of curve fitting lies in catching the trend (in the population) exhibited by the observations rather than catching the observations themselves (which reflect only a sample). Besides, a bound estimate must look like a bound estimate, and it is stronger to write $y_{avg}(n, m, p) = O_{emp}(m^3)$ than to write $y_{avg}(n, m, p) = O_{emp}(m^4)$ for fixed n and p. So we accept the first of the two propositions.

Table 5: Partition Sort, Binomial(m, p) distribution, n = 50000, m = 1000 fixed

p      Mean time (sec.)
0.1    0.09084
0.2    0.09662
0.3    0.09884
0.4    0.10198
0.5    0.10186
0.6    0.10034
0.7    0.0989
0.8    0.09884
0.9    0.09096

[Fig. 3: Average time for Partition Sort (sec.) versus p of Binomial(m, p). Fitted curve: $y = -0.0649x^2 + 0.0659x + 0.0853$, $R^2 = 0.9293$.]
Fig. 3: Second degree polynomial fit captures the trend

Fitting higher degree polynomials leads to over-fitting (details omitted), and from the previous arguments we put $y_{avg}(n, m, p) = O_{emp}(p^2)$ for fixed n and m. For definitions of statistical bound and empirical O, we refer to [4]. For a list of properties of a statistical complexity bound, as well as to understand what design and analysis of computer experiments mean when the response is a complexity such as time, [8] may be consulted.

4. Conclusion and suggestions for future work
We conclude:

(i) Partition Sort is more robust than Quick Sort in the average case.

(ii) Partition Sort is sensitive to the parameters of the input distribution, apart from the parameter that characterizes the input size.

(iii) For n independent Binomial(m, p) inputs, all three factors are significant both independently and interactively. All the two factor interactions n*m, n*p and m*p, and even the three factor interaction n*m*p, are significant. This last finding is of paramount importance in exciting other researchers on parameterized complexity, and it is intriguing, if not impossible, to establish theoretically. Theoretical analysis might confirm the influence of the Binomial parameters, but how do you confirm the significance of their interactions? Using computer experiments, where cheap and efficient prediction is the motive [8][9][10], we have settled the imbroglio.

(iv) We also found $y_{avg}(n, m, p) = O_{emp}(m^3)$ for fixed n and p, while $y_{avg}(n, m, p) = O_{emp}(p^2)$ for fixed n and m. It should be kept in mind that these results are obtained by working on weights and should not be confused with count based theoretical analysis; the two need not always be identical.

In summary, this paper should convince the reader of the existence of weight based statistical bounds that can be empirically estimated by merging the literature on computer experiments (which includes factorial experiments, applied regression analysis and exploratory data analysis, all used here, not to speak of other areas such as spatial statistics, bootstrapping, optimal design and even Bayesian analysis!) with that on algorithm theory. Computer scientists will hopefully not throw away our statistical findings and should think seriously about the prospects of building a weight based science theoretically to explain algorithm analysis, given that the current count based science is quite saturated. This was essentially the central focus of our adventures. With that purpose achieved, we close the paper.

References

[1] S. Chakraborty and S.K. Sourabh, How Robust Are Average Complexity Measures? A Statistical Case Study, Applied Mathematics and Computation, vol. 189(2), 2007, pp. 1787-1797.
[2] S.K. Sourabh and S. Chakraborty, How robust is quicksort average complexity? arXiv:0811.4376v1 [cs.DS], Advances in Mathematical Sciences Journal (to appear).
[3] S. Chakraborty and S.K. Sourabh, On Why an Algorithmic Time Complexity Measure can be System Invariant rather than System Independent, Applied Mathematics and Computation, vol. 190(1), 2007, pp. 195-204.
[4] N.K. Singh and S. Chakraborty, Partition Sort and its Empirical Analysis, CIIT-2011, CCIS 250, pp. 340-346, 2011. © Springer-Verlag Heidelberg 2011.
[5] W. Kennedy and J. Gentle, Statistical Computing, Marcel Dekker Inc., 1980.
[6] R.G. Downey and M.R. Fellows, Parameterized Complexity, Springer, 1999.
[7] S. Chakraborty, S.K. Sourabh, M. Bose and K. Sushant, Replacement sort revisited: The "gold standard" unearthed!, Applied Mathematics and Computation, vol. 189(2), 2007, pp. 384-394.
[8] S. Chakraborty and S.K. Sourabh, A Computer Experiment Oriented Approach to Algorithmic Complexity, Lambert Academic Publishing, 2010.
[9] J. Sacks, W. Welch, T. Mitchell and H. Wynn, Design and Analysis of Computer Experiments, Statistical Science, vol. 4(4), 1989.
[10] K.T. Fang, R. Li and A. Sudjianto, Design and Modeling of Computer Experiments, Chapman and Hall, 2006.