This research paper is a statistical comparative study of a few average-case asymptotically optimal sorting algorithms, namely Quick sort, Heap sort and K-sort. The three sorting algorithms, all with the same average-case complexity, are compared by obtaining the corresponding statistical bounds while subjecting these procedures to randomly generated data from some standard discrete and continuous probability distributions, such as the Binomial distribution, the discrete and continuous Uniform distributions and the Poisson distribution. The statistical analysis is well supplemented by a parameterized complexity analysis.
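A minimal sketch of such an experiment is given below: it times sorting procedures on inputs drawn from the distributions named above. This is not the authors' exact setup; K-sort's code is not reproduced on this page, so a textbook quicksort and heapsort stand in, and a K-sort implementation could be timed the same way. The sizes and distribution parameters are illustrative choices.

```python
# Hedged sketch: timing sorts on random inputs from several distributions.
import heapq
import random
import time

def quicksort(a):
    # Simple out-of-place three-way quicksort; fine for average-case timing.
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    return (quicksort([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + quicksort([x for x in a if x > pivot]))

def heapsort(a):
    heapq.heapify(a)
    return [heapq.heappop(a) for _ in range(len(a))]

def binomial(m, p):          # one Binomial(m, p) variate as a Bernoulli sum
    return sum(random.random() < p for _ in range(m))

generators = {
    "Binomial(20, 0.5)": lambda: binomial(20, 0.5),
    "Uniform{0..999}":   lambda: random.randint(0, 999),
    "Uniform[0, 1)":     lambda: random.random(),
    # A Poisson generator (e.g. Knuth's method) would slot in here as well.
}

N = 50_000
for name, gen in generators.items():
    data = [gen() for _ in range(N)]
    for sort_fn in (quicksort, heapsort):
        t0 = time.perf_counter()
        sort_fn(list(data))
        print(f"{name:18s} {sort_fn.__name__:9s} {time.perf_counter() - t0:.3f}s")
```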
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n ≤ 70 Lakhs! (idescitation)
Sundararajan and Chakraborty [10] introduced a new version of Quick sort removing the interchanges. Khreisat [6] found this algorithm to compete well with some other versions of Quick sort. However, it uses an auxiliary array, thereby increasing the space complexity. Here, we provide a second version of our new sort in which we have removed the auxiliary array. This second, improved version of the algorithm, which we call K-sort, is found to sort elements faster than Heap sort for an appreciably large array size (n ≤ 70,00,000) for uniform U[0, 1] inputs.
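K-sort's pseudocode is not reproduced on this page, so the sketch below only shows the Heap-sort side of such an experiment (a textbook in-place heapsort) together with the U[0, 1] input generation; a K-sort implementation could be dropped in alongside it. The array size is the 70-lakh regime quoted in the abstract.

```python
# Hedged sketch: Heap-sort baseline for a K-sort vs. Heap sort experiment.
import random
import time

def heapsort(a):
    def sift_down(start, end):
        root = start
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and a[child] < a[child + 1]:
                child += 1
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):   # build a max-heap
        sift_down(start, n - 1)
    for end in range(n - 1, 0, -1):           # repeatedly extract the max
        a[0], a[end] = a[end], a[0]
        sift_down(0, end - 1)

n = 7_000_000                                  # n ≤ 70 lakhs; shrink for a quick run
data = [random.random() for _ in range(n)]     # uniform U[0, 1] inputs
t0 = time.perf_counter()
heapsort(data)
print(f"heapsort on {n} U[0,1] keys: {time.perf_counter() - t0:.1f}s")
```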
BINARY TREE SORT IS MORE ROBUST THAN QUICK SORT IN AVERAGE CASE (IJCSEA Journal)
Average-case complexity, in order to be a useful and reliable measure, has to be robust. The probability distribution, generally uniform, over which the expectation is taken should be realistic over the problem domain. But algorithm books do not certify that uniform inputs are always realistic. Do the results hold even for non-uniform inputs? In this context we observe that Binary Tree sort is more robust than the fast and popular Quick sort in the average case.
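For reference, a minimal binary tree sort is sketched below: each key is inserted into an unbalanced binary search tree and an in-order traversal emits the keys in sorted order. This is the textbook form of the algorithm, not code from the paper.

```python
# Minimal binary tree sort: BST insertion followed by in-order traversal.
class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def tree_sort(keys):
    root = None
    for k in keys:
        node, parent = root, None
        while node is not None:                 # find the insertion point
            parent = node
            node = node.left if k < node.key else node.right
        if parent is None:
            root = Node(k)
        elif k < parent.key:
            parent.left = Node(k)
        else:
            parent.right = Node(k)
    out, stack, node = [], [], root             # iterative in-order walk
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.key)
        node = node.right
    return out

print(tree_sort([5, 3, 8, 1, 3, 9]))            # [1, 3, 3, 5, 8, 9]
```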
Partition Sort Revisited: Reconfirming the Robustness in Average Case and Muc... (ijcsa)
In our previous work there was some indication that Partition Sort could have a more robust average-case O(n log n) complexity than the popular Quick sort. In our first study in this paper, we reconfirm this through computer experiments for inputs from the Cauchy distribution, for which the expectation theoretically does not exist. Additionally, the algorithm is found to be sensitive to the parameters of the input probability distribution, demanding further investigation on parameterized complexity. The results on this algorithm for Binomial inputs in our second study are very encouraging in that direction.
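The Cauchy experiment can be reproduced in spirit as below: Cauchy variates (whose mean does not exist) are generated by the inverse-CDF method and a sort is timed as n grows. Partition Sort's own code is not on this page, so Python's built-in sort is a placeholder for the algorithm under test.

```python
# Sketch of the Cauchy-input robustness check; sorted() is a placeholder
# for the sorting algorithm under test.
import math
import random
import time

def cauchy():
    # Standard Cauchy variate via the inverse CDF; E[X] does not exist.
    return math.tan(math.pi * (random.random() - 0.5))

for n in (100_000, 200_000, 400_000, 800_000):
    data = [cauchy() for _ in range(n)]
    t0 = time.perf_counter()
    sorted(data)
    print(f"n = {n:7d}: {time.perf_counter() - t0:.3f}s")
```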
MFBLP Method Forecast for Regional Load Demand System (CSCJournals)
Load forecasting plays an important role in the planning and operation of a power system. The accuracy of the forecast value is necessary for economically efficient operation and also for effective control. This paper describes a modified forward backward linear predictor (MFBLP) method for solving the regional load demand of New South Wales (NSW), Australia. The method is designed and simulated based on the actual load data of New South Wales, and its accuracy is evaluated and compared with previous methods.
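As a rough illustration of the idea behind forward-backward linear prediction (the paper's specific MFBLP modifications are not detailed in this abstract), the sketch below fits order-p predictor coefficients by least squares over both forward and backward prediction errors and uses them for a one-step-ahead load forecast on a toy series.

```python
# Toy forward-backward linear predictor (order p), fit by least squares over
# stacked forward and backward prediction equations. A generic FBLP sketch,
# not the paper's MFBLP method.
import numpy as np

def fblp_coefficients(x, p):
    rows, targets = [], []
    n = len(x)
    for t in range(p, n):                  # forward: x[t] from p past values
        rows.append(x[t - p:t][::-1])      # most recent value first
        targets.append(x[t])
    for t in range(n - p):                 # backward: x[t] from p future values
        rows.append(x[t + 1:t + p + 1])    # nearest value first
        targets.append(x[t])
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return a

rng = np.random.default_rng(0)
t = np.arange(400)
load = 100 + 20 * np.sin(2 * np.pi * t / 48) + rng.normal(0, 2, t.size)  # toy demand
p = 48
a = fblp_coefficients(load[:-1], p)
forecast = load[-1 - p:-1][::-1] @ a       # one-step-ahead prediction
print(f"forecast: {forecast:.1f}, actual: {load[-1]:.1f}")
```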
On Selection of Periodic Kernels Parameters in Time Series Prediction (cscpconf)
In this paper an analysis of the parameters of periodic kernels is described. Periodic kernels can be used for the prediction task, performed as a typical regression problem. On the basis of the Periodic Kernel Estimator (PerKE), prediction of real time series is performed. As periodic kernels require the setting of their parameters, it is necessary to analyse their influence on the prediction quality. This paper describes a simple methodology, based on grid search, for finding values of the parameters of periodic kernels. Two different error measures are taken into consideration as prediction quality criteria but lead to comparable results. The methodology was tested on benchmark and real datasets and proved to give satisfactory results.
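A sketch of the grid-search idea under simplifying assumptions: a standard periodic kernel with period p and length-scale ℓ is used inside Nadaraya-Watson kernel regression, and the (p, ℓ) pair minimizing the holdout RMSE is kept. PerKE's exact estimator and the paper's two error measures are not reproduced here.

```python
# Grid search over periodic-kernel parameters, scored by holdout RMSE.
# A generic periodic kernel + Nadaraya-Watson regression stand in for PerKE.
import numpy as np

def periodic_kernel(t1, t2, period, lengthscale):
    d = np.abs(t1[:, None] - t2[None, :])
    return np.exp(-2 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def predict(train_t, train_y, test_t, period, lengthscale):
    w = periodic_kernel(test_t, train_t, period, lengthscale)
    return (w @ train_y) / w.sum(axis=1)          # Nadaraya-Watson estimate

rng = np.random.default_rng(1)
t = np.arange(300.0)
y = np.sin(2 * np.pi * t / 24) + 0.2 * rng.normal(size=t.size)
tr_t, tr_y, te_t, te_y = t[:250], y[:250], t[250:], y[250:]

best = min(
    ((p, l) for p in (12.0, 24.0, 48.0) for l in (0.3, 0.5, 1.0, 2.0)),
    key=lambda pl: np.sqrt(np.mean((predict(tr_t, tr_y, te_t, *pl) - te_y) ** 2)),
)
print("best (period, lengthscale):", best)
```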
Performance Assessment of Polyphase Sequences Using Cyclic Algorithm (rahulmonikasharma)
Polyphase sequences (known as P1, P2, Px and Frank) exist for square-integer lengths and have good autocorrelation properties that are helpful in several applications, unlike Barker and binary sequences, which exist only for certain lengths and exhibit at most a two-digit merit factor. The Integrated Sidelobe Level (ISL) is often used to quantify the quality of the autocorrelation properties of a given polyphase sequence. In this paper, we present the application of a Cyclic Algorithm, named CA, which minimizes an ISL-related metric and in turn improves the merit factor to a great extent, a key quantity in applications like RADAR, SONAR and communications. To illustrate the performance of the P1, P2, Px and Frank sequences when the Cyclic Algorithm is applied, we present a number of examples for square-integer lengths. The CA(Px) sequence exhibits the best merit factor among all the polyphase sequences considered.
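To make the quantities concrete, the sketch below builds a Frank sequence of length N² and computes its aperiodic autocorrelation, ISL and merit factor using the usual definitions (ISL = Σ_{k≠0}|r(k)|², merit factor F = |r(0)|² / (2 Σ_{k>0}|r(k)|²)). The CA optimization loop itself is not reproduced.

```python
# ISL and merit factor of a Frank sequence of length N^2.
import numpy as np

def frank_sequence(N):
    n, m = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * n * m / N).ravel()   # length N^2, unit modulus

def autocorrelation(s):
    L = len(s)
    return np.array([np.sum(s[k:] * np.conj(s[:L - k])) for k in range(L)])

s = frank_sequence(8)                               # length-64 Frank sequence
r = autocorrelation(s)
sidelobes = np.abs(r[1:]) ** 2
isl = 2 * sidelobes.sum()                           # positive and negative lags
merit_factor = np.abs(r[0]) ** 2 / isl
print(f"ISL = {isl:.2f}, merit factor = {merit_factor:.2f}")
```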
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHMcsitconf
A quantum computation problem is discussed in this paper. Many new features that make
quantum computation superior to classical computation can be attributed to quantum coherence
effect, which depends on the phase of quantum coherent state. Quantum Fourier transform
algorithm, the most commonly used algorithm, is introduced. And one of its most important
applications, phase estimation of quantum state based on quantum Fourier transform, is
presented in details. The flow of phase estimation algorithm and the quantum circuit model are
shown. And the error of the output phase value, as well as the probability of measurement, is
analysed. The probability distribution of the measuring result of phase value is presented and
the computational efficiency is discussed.
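The measurement statistics described here have a standard closed form: with t counting qubits and true phase φ, the probability of reading the integer m is |(1/2^t) Σ_{k=0}^{2^t−1} e^{2πik(φ − m/2^t)}|². The sketch below evaluates this textbook distribution numerically for an example phase; it is an illustration, not the paper's analysis.

```python
# Probability distribution over QPE measurement outcomes m for a t-qubit
# phase register and true phase phi (standard textbook formula).
import numpy as np

def qpe_distribution(phi, t):
    M = 2 ** t
    k = np.arange(M)
    m = np.arange(M)
    # amplitude of outcome m: (1/M) * sum_k exp(2*pi*i*k*(phi - m/M))
    amps = np.exp(2j * np.pi * np.outer(phi - m / M, k)).sum(axis=1) / M
    return np.abs(amps) ** 2

phi, t = 0.30, 4                 # a phase not exactly representable in 4 bits
p = qpe_distribution(phi, t)
best = int(np.argmax(p))
print(f"most likely outcome m = {best} -> phase estimate {best / 2**t:.4f}")
print(f"P(best) = {p[best]:.3f}, total probability = {p.sum():.3f}")
```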
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHMcscpconf
A quantum computation problem is discussed in this paper. Many new features that make quantum computation superior to classical computation can be attributed to quantum coherence
effect, which depends on the phase of quantum coherent state. Quantum Fourier transform algorithm, the most commonly used algorithm, is introduced. And one of its most important
applications, phase estimation of quantum state based on quantum Fourier transform, is presented in details. The flow of phase estimation algorithm and the quantum circuit model are
shown. And the error of the output phase value, as well as the probability of measurement, is analysed. The probability distribution of the measuring result of phase value is presented and the computational efficiency is discussed.
SEQUENTIAL CLUSTERING-BASED EVENT DETECTION FOR NONINTRUSIVE LOAD MONITORING (cscpconf)
The problem of change-point detection has been well studied and adopted in many signal processing applications. In such applications, the informative segments of the signal are the stationary ones before and after the change-point. However, for some novel signal processing and machine learning applications such as Non-Intrusive Load Monitoring (NILM), the information contained in the non-stationary transient intervals is of equal or even more importance to the recognition process. In this paper, we introduce a novel clustering-based sequential detection of abrupt changes in an aggregate electricity consumption profile with the accurate decomposition of the input signal into stationary and non-stationary segments. We also introduce various event models in the context of clustering analysis. The proposed algorithm is applied to building-level energy profiles with promising results for the residential BLUED power dataset.
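As a toy stand-in for the clustering-based detector (whose event models are not detailed in this abstract), the sketch below flags abrupt changes in an aggregate power signal by comparing means of adjacent sliding windows, separating the signal into stationary segments and transient intervals. The signal, window size and threshold are all invented for illustration.

```python
# Toy change-point flagging on an aggregate power signal: compare the means
# of adjacent windows; large jumps mark candidate appliance on/off events.
# This illustrates the segmentation problem, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(2)
power = np.concatenate([                       # synthetic aggregate profile (W)
    rng.normal(200, 5, 300),                   # base load
    rng.normal(1700, 8, 200),                  # heater on
    rng.normal(250, 5, 300),                   # heater off, small load added
])

w, threshold = 20, 100.0                       # window size, jump threshold (W)
events = [
    t for t in range(w, len(power) - w)
    if abs(power[t:t + w].mean() - power[t - w:t].mean()) > threshold
]
# collapse runs of adjacent flags into single event locations
collapsed = [t for i, t in enumerate(events) if i == 0 or t - events[i - 1] > 1]
print("detected events near samples:", collapsed)
```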
SEQUENTIAL CLUSTERING-BASED EVENT DETECTION FOR NONINTRUSIVE LOAD MONITORINGcsandit
The problem of change-point detection has been well studied and adopted in many signal processing applications. In such applications, the informative segments of the signal are the
stationary ones before and after the change-point. However, for some novel signal processing and machine learning applications such as Non-Intrusive Load Monitoring (NILM), the information contained in the non-stationary transient intervals is of equal or even more importance to the recognition process. In this paper, we introduce a novel clustering-based sequential detection of abrupt changes in an aggregate electricity consumption profile with
accurate decomposition of the input signal into stationary and non-stationary segments. We also introduce various event models in the context of clustering analysis. The proposed algorithm is applied to building-level energy profiles with promising results for the residential BLUED power dataset.
Geoid height determination is one of the major problems of geodesy, because the use of satellite techniques in geodesy is increasing. Geoid heights can be determined using different methods depending on the available data. Soft computing methods such as fuzzy logic and neural networks have become so popular that they are used to solve many engineering problems. Fuzzy logic theory and later developments in uncertainty assessment have enabled us to develop more precise models for our requirements. In this study, how to construct the best fuzzy model is examined. For this purpose, three different data sets were taken and two different kinds of fuzzy models (two inputs-one output and three inputs-one output) were formed for the calculation of geoid heights in Istanbul (Turkey). The results of these fuzzy models were compared with geoid heights obtained by GPS/levelling methods, and the fuzzy approximation models were tested on the test points.
LINEAR SEARCH VERSUS BINARY SEARCH: A STATISTICAL COMPARISON FOR BINOMIAL INPUTS (IJCSEA Journal)
For certain algorithms such as sorting and searching, the parameters of the input probability distribution, in addition to the size of the input, have been found to influence the complexity of the underlying algorithm. The present paper makes a statistical comparative study on parameterized complexity between the linear and binary search algorithms for binomial inputs.
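A minimal sketch of such an experiment: keys are drawn from a Binomial(m, p) distribution, and the average number of comparisons made by linear search over an array and by binary search over its sorted copy is tallied as p varies. The sizes m, p and n here are illustrative, not the paper's.

```python
# Average comparison counts of linear vs. binary search for binomial keys.
import random

def linear_comparisons(arr, key):
    for i, v in enumerate(arr):
        if v == key:
            return i + 1
    return len(arr)                      # unsuccessful search scans everything

def binary_comparisons(sorted_arr, key):
    lo, hi, c = 0, len(sorted_arr) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        c += 1
        if sorted_arr[mid] == key:
            return c
        if sorted_arr[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return c

def binomial(m, p):                      # Binomial(m, p) as a Bernoulli sum
    return sum(random.random() < p for _ in range(m))

n, m = 5000, 40
for p in (0.2, 0.5, 0.8):
    arr = [binomial(m, p) for _ in range(n)]
    s = sorted(arr)
    keys = [binomial(m, p) for _ in range(200)]
    lin = sum(linear_comparisons(arr, k) for k in keys) / len(keys)
    bins = sum(binary_comparisons(s, k) for k in keys) / len(keys)
    print(f"p = {p}: linear ~ {lin:.0f}, binary ~ {bins:.1f} comparisons")
```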
Parallel kNN on GPU Architecture Using OpenCL (eSAT Journals)
Abstract: In data mining applications, one of the useful algorithms for classification is the kNN algorithm. The kNN search has wide usage in many research and industrial domains like 3-dimensional object rendering, content-based image retrieval, statistics, biology (gene classification), etc. In spite of some improvements in recent decades, the computation time required by the kNN search remains the bottleneck for kNN classification, especially in high-dimensional spaces. This bottleneck has created the need for a parallel kNN on commodity hardware. GPUs and the OpenCL architecture are low-cost, high-performance solutions for parallelising the kNN classifier. Accordingly, we have designed and implemented a parallel kNN model to address the performance bottleneck of the kNN algorithm. In this paper, we propose a parallel kNN algorithm on the GPU and OpenCL framework. In our approach, we distribute the distance computations of the data points among all GPU cores, with multiple threads invoked for each GPU core. We have implemented and tested our parallel kNN implementation on UCI datasets. The experimental results show that the speedup of the kNN algorithm is improved over the serial implementation.
Keywords: kNN, GPU, CPU, Parallel Computing, Data Mining, Clustering Algorithm.
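The core parallel step described, distributing the pairwise distance computations, can be sketched on the CPU with vectorized NumPy as below; on a GPU the same all-pairs distance matrix would be computed by an OpenCL kernel with one work-item per (query, point) pair. This is an illustrative brute-force kNN, not the authors' OpenCL code.

```python
# Brute-force kNN classification with the distance matrix computed in one
# vectorized step (the part the paper distributes across GPU cores).
import numpy as np

def knn_predict(train_X, train_y, test_X, k=5):
    # Squared Euclidean distances for all pairs at once:
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (
        (test_X ** 2).sum(1)[:, None]
        - 2 * test_X @ train_X.T
        + (train_X ** 2).sum(1)[None, :]
    )
    nearest = np.argpartition(d2, k, axis=1)[:, :k]    # k smallest per row
    votes = train_y[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])  # majority vote

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(3, 1, (100, 4))])
y = np.array([0] * 100 + [1] * 100)
test = np.array([[0.0, 0, 0, 0], [3.0, 3, 3, 3]])
print(knn_predict(X, y, test, k=5))                    # expected: [0 1]
```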
Financial Time Series Analysis Based On Normalized Mutual Information Functions (IJCI JOURNAL)
A method for analysing the predictability of future values of financial time series is described. The method is based on normalized mutual information functions. In the analysis, the use of these functions allowed us to avoid any restrictions on the distributions of the parameters and on the correlations between parameters. A comparative analysis of the predictability of the financial time series of the Tel Aviv 25 stock exchange has been carried out.
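A small sketch of the underlying computation: the mutual information between a return series and its lagged copy is estimated from a 2-D histogram and normalized (here by the geometric mean of the marginal entropies; the paper's exact normalization is not specified in this abstract). Values near 0 suggest little predictability at that lag.

```python
# Histogram-based normalized mutual information between x[t] and x[t-lag].
import numpy as np

def normalized_mutual_information(x, y, bins=16):
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(1), pxy.sum(0)
    nz = pxy > 0
    mi = (pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])).sum()
    hx = -(px[px > 0] * np.log(px[px > 0])).sum()
    hy = -(py[py > 0] * np.log(py[py > 0])).sum()
    return mi / np.sqrt(hx * hy)        # one common normalization choice

rng = np.random.default_rng(4)
returns = rng.standard_t(df=4, size=5000)       # heavy-tailed toy returns
for lag in (1, 5, 20):
    nmi = normalized_mutual_information(returns[lag:], returns[:-lag])
    print(f"lag {lag:2d}: NMI = {nmi:.4f}")
```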
This paper introduces a new comparison-based stable sorting algorithm, named RA sort. RA sort involves only the comparison of selected pairs of elements in an array, which ultimately sorts the array, and does not involve the comparison of each element with every other element. It tries to build upon the relationships established between the elements in each pass: instead of going for a blind comparison, we prefer a selective comparison to get an efficient method. Sorting is a fundamental operation in computer science. This algorithm is analysed both theoretically and empirically to get a robust average-case result. We have performed an empirical analysis and compared its performance with the well-known Quick sort for various input types. Although the theoretical worst-case complexity of RA sort is Y_worst(n) = O(n√n), the experimental results suggest an empirical O_emp((n lg n)^1.333) time complexity for typical input instances, where the parameter n characterizes the input size. The theoretical complexity is given for the comparison operation. We emphasize that the theoretical complexity is operation-specific, whereas the empirical one represents the overall algorithmic complexity.
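The empirical O_emp claim can be probed with the usual log-log regression: time the algorithm at several sizes and fit log T against log(n lg n); a slope near 1.333 would match the quoted (n lg n)^1.333 behaviour. RA sort's code is not on this page, so Python's built-in sort stands in as a placeholder.

```python
# Fit an empirical exponent: log T(n) ~ c + b * log(n * lg n).
# sorted() is a placeholder for the algorithm under test.
import math
import random
import time
import numpy as np

sizes, times = [], []
for n in (50_000, 100_000, 200_000, 400_000, 800_000):
    data = [random.random() for _ in range(n)]
    t0 = time.perf_counter()
    sorted(data)
    sizes.append(n)
    times.append(time.perf_counter() - t0)

x = np.log([n * math.log2(n) for n in sizes])
b, c = np.polyfit(x, np.log(times), 1)
print(f"empirical exponent b = {b:.3f}  (T ~ (n lg n)^b)")
```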
A Singular Spectrum Analysis Technique to Electricity Consumption Forecasting (IJERA Editor)
Singular Spectrum Analysis (SSA) is a relatively new and powerful nonparametric tool for analyzing and forecasting economic data. SSA is capable of decomposing a time series into independent components such as trends, oscillatory components and noise. This paper focuses on applying the SSA approach to the monthly electricity consumption of the Middle Province in the Gaza Strip, Palestine. The forecasting results are compared with the results of exponential smoothing state space (ETS) and ARIMA models. The three techniques do similarly well in the forecasting process; however, SSA outperforms the ETS and ARIMA techniques according to forecasting error accuracy measures.
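The decomposition step of basic SSA can be sketched compactly: embed the series in a Hankel trajectory matrix, take its SVD, keep the leading components, and recover a series by diagonal averaging. Forecasting and the ETS/ARIMA comparison are beyond this sketch; the window and rank below are illustrative.

```python
# Basic SSA: embedding -> SVD -> rank-r reconstruction by diagonal averaging.
import numpy as np

def ssa_reconstruct(x, window, rank):
    n = len(x)
    k = n - window + 1
    X = np.column_stack([x[i:i + window] for i in range(k)])  # trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :rank] * s[:rank]) @ Vt[:rank]                 # leading components
    out, counts = np.zeros(n), np.zeros(n)                    # diagonal averaging
    for j in range(k):
        out[j:j + window] += Xr[:, j]
        counts[j:j + window] += 1
    return out / counts

rng = np.random.default_rng(5)
t = np.arange(240)
consumption = 50 + 0.1 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, t.size)
trend_plus_cycle = ssa_reconstruct(consumption, window=24, rank=3)
print("residual std:", np.std(consumption - trend_plus_cycle).round(2))
```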
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI... (IJDKP)
Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, accomplished by substituting each point in a given dataset with a Gaussian. The width of the Gaussian is a σ value, a hyper-parameter which can be manually defined and manipulated to suit the application. Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an outstanding task because normally such expressions are impossible to solve analytically. However, we prove that if the points are all included in a square region of size σ, there is only one minimum. This bound is not only useful for limiting the number of solutions to look for by numerical means; it also allows us to propose a new numerical approach "per block". This technique decreases the number of particles by approximating some groups of particles with weighted particles. These findings are useful not only for the quantum clustering problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics and other applications.
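Following Horn and Gottlieb's formulation, which quantum clustering builds on, the Parzen wavefunction is ψ(x) = Σ_i exp(−‖x−x_i‖²/2σ²) and cluster centres sit at minima of the potential V(x) = (σ²/2)·∇²ψ/ψ (up to an additive constant). The sketch below evaluates V on a 2-D grid, using the analytic Laplacian of a Gaussian sum, so its minima can be located numerically.

```python
# Quantum potential V(x) = (sigma^2/2) * Laplacian(psi)/psi (up to a constant),
# with psi the Gaussian Parzen sum over the data; minima ~ cluster centres.
import numpy as np

def quantum_potential(grid, data, sigma):
    # grid: (G, 2) evaluation points, data: (N, 2) sample points
    d2 = ((grid[:, None, :] - data[None, :, :]) ** 2).sum(-1)   # squared dists
    w = np.exp(-d2 / (2 * sigma ** 2))
    psi = w.sum(1)
    # analytic Laplacian of a Gaussian sum in d = 2 dimensions
    lap = ((d2 / sigma ** 4 - 2 / sigma ** 2) * w).sum(1)
    return (sigma ** 2 / 2) * lap / psi

rng = np.random.default_rng(6)
data = np.vstack([rng.normal((0, 0), 0.3, (40, 2)),
                  rng.normal((2, 2), 0.3, (40, 2))])
xs = np.linspace(-1, 3, 80)
grid = np.array([(a, b) for a in xs for b in xs])
V = quantum_potential(grid, data, sigma=0.5)
print("potential minimum near:", grid[np.argmin(V)])   # close to a cluster centre
```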
ENHANCING ENGLISH WRITING SKILLS THROUGH INTERNET-PLUS TOOLS IN THE PERSPECTI... (ijfcstjournal)
This investigation delves into incorporating a hybridized memetic strategy within the framework of English composition pedagogy, leveraging Internet Plus resources. The study aims to provide an in-depth analysis of how this method influences students' writing competence, their perceptions of writing, and their enthusiasm for English acquisition. Employing an explanatory research design that combines qualitative and quantitative methods, the study collects data through surveys, interviews, and observations of students' writing performance before and after the intervention. Findings demonstrate a beneficial impact of integrating the memetic approach alongside Internet Plus tools on the writing aptitude of English as a Foreign Language (EFL) learners. Students reported increased engagement with writing, attributing it to the use of Internet Plus tools. They also expressed that the memetic approach facilitated a deeper understanding of cultural and social contexts in writing. Furthermore, the findings highlight a significant improvement in students' writing skills following the intervention. This study provides significant insights into the practical implementation of the memetic approach within English writing education, highlighting the beneficial contribution of Internet Plus tools in enriching students' learning journeys.
A SURVEY TO REAL-TIME MESSAGE-ROUTING NETWORK SYSTEM WITH KLA MODELLING (ijfcstjournal)
Message routing over a network is one of the most fundamental concepts in communication, requiring simultaneous transmission of messages from a source to a destination. Real-Time Routing refers to the addition of a timing constraint under which messages should be received within a specified time delay. This study involves Scheduling, Algorithm Design and Graph Theory, which are essential parts of the Computer Science (CS) discipline. Our goal is to investigate an innovative and efficient way to present these concepts in the context of CS Education. In this paper, we explore the fundamental modelling of routing real-time messages on networks. We study whether it is possible to have an optimal on-line algorithm for the Arbitrary Directed Graph network topology. In addition, we examine the message routing's algorithmic complexity by breaking down the complex mathematical proofs into concrete, visual examples. Next, we explore the Unidirectional Ring topology for finding the transmission's "makespan". Lastly, we propose the same network modelling through the technique of Kinesthetic Learning Activity (KLA). We analyse the data collected and present the results in a case study to evaluate the effectiveness of the KLA approach compared to the traditional teaching method.
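As a small companion to the Unidirectional Ring discussion, the toy model below computes, for a message set on an n-node clockwise ring, the per-link load whose maximum (together with the longest route) gives a standard lower bound on the transmission makespan, assuming each link carries one message per time step. This is a simplified classroom model, not the paper's full analysis, and the message set is invented.

```python
# Per-link load lower bound on makespan for messages on a unidirectional ring.
# Each message (src, dst) occupies every clockwise link on its route; with one
# message per link per step, max link load lower-bounds the makespan.
n = 8                                    # ring size; link i goes i -> (i+1) % n
messages = [(0, 3), (1, 4), (2, 7), (6, 2), (7, 1)]

load = [0] * n                           # load[i] = messages crossing link i
for src, dst in messages:
    hop = src
    while hop != dst:
        load[hop] += 1
        hop = (hop + 1) % n

longest = max((dst - src) % n for src, dst in messages)
print("link loads:", load)
print("makespan lower bound:", max(max(load), longest))
```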
A COMPARATIVE ANALYSIS ON SOFTWARE ARCHITECTURE STYLES (ijfcstjournal)
Software architecture is the structural solution that achieves the overall technical and operational requirements for software development. Software engineers apply software architectures in their software system developments; however, they lack clear benchmarks for selecting software architecture styles, possible components, integration methods (connectors) and the exact application of each style. The objective of this research work was a comparative analysis of software architecture styles by their weaknesses and benefits, to support selection by the programmer at design time. In this study, the researcher identified architectural styles, weaknesses, strengths and application areas, together with the components, connectors and interfaces for the selected architectural styles.
SYSTEM ANALYSIS AND DESIGN FOR A BUSINESS DEVELOPMENT MANAGEMENT SYSTEM BASED... (ijfcstjournal)
A design of a sales system for professional services requires a comprehensive understanding of the dynamics of sales cycles and of how key knowledge for completing sales is managed. This research describes a design model of a business development (sales) system for professional service firms based on the Saudi Arabian commercial market, which takes into account new advances in technology while preserving unique cultural practices that are an important part of the Saudi Arabian commercial market. The design model combines a number of key technologies, such as cloud computing and mobility, as an integral part of the proposed system. An adaptive development process has also been used in implementing the proposed design model.
AN ALGORITHM FOR SOLVING LINEAR OPTIMIZATION PROBLEMS SUBJECTED TO THE INTERS... (ijfcstjournal)
Frank t-norms are a parametric family of continuous Archimedean t-norms whose members are also strict functions. Very often, this family of t-norms is called the family of fundamental t-norms because of the role it plays in several applications. In this paper, optimization of a linear objective function with fuzzy relational inequality constraints is investigated, where the feasible region is formed as the intersection of two inequality fuzzy systems and the Frank family of t-norms is considered as the fuzzy composition. First, the resolution of the feasible solution set is studied, where the two fuzzy inequality systems are defined with max-Frank composition. Second, some related basic and theoretical properties are derived. Then, a necessary and sufficient condition and three other necessary conditions are presented to characterize the feasibility of the problem. Subsequently, it is shown that a lower bound is always attainable for the optimal objective value. Also, it is proved that the optimal solution of the problem always results from the unique maximum solution and a minimal solution of the feasible region. Finally, an algorithm is presented to solve the problem and an example is described to illustrate the algorithm. Additionally, a method is proposed to generate random feasible max-Frank fuzzy relational inequalities; by this method, we can easily generate a feasible test problem and apply our algorithm to it.
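For concreteness, the Frank t-norm with parameter s (s > 0, s ≠ 1) is T_s(a, b) = log_s(1 + (s^a − 1)(s^b − 1)/(s − 1)), and a candidate x satisfies the i-th max-Frank inequality A ∘ x ≥ b when max_j T_s(a_ij, x_j) ≥ b_i. The sketch below only checks feasibility of a given x; the paper's resolution algorithm is not reproduced, and A, b, x are made-up example values.

```python
# Frank t-norm and a max-Frank feasibility check for A o x >= b.
import math

def frank_tnorm(a, b, s):
    # T_s(a, b) = log_s(1 + (s^a - 1)(s^b - 1)/(s - 1)), s > 0, s != 1
    return math.log(1 + (s**a - 1) * (s**b - 1) / (s - 1), s)

def satisfies_geq(A, x, b, s):
    # i-th constraint: max_j T_s(A[i][j], x[j]) >= b[i]
    return all(
        max(frank_tnorm(A[i][j], x[j], s) for j in range(len(x))) >= b[i]
        for i in range(len(A))
    )

A = [[0.9, 0.4], [0.3, 0.8]]
b = [0.5, 0.6]
x = [0.7, 0.9]
print(satisfies_geq(A, x, b, s=2.0))    # True for this example
```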
LBRP: A RESILIENT ENERGY HARVESTING NOISE AWARE ROUTING PROTOCOL FOR UNDER WA...ijfcstjournal
Underwater detector network is one amongst the foremost difficult and fascinating analysis arenas that
open the door of pleasing plenty of researchers during this field of study. In several under water based
sensor applications, nodes are square measured and through this the energy is affected. Thus, the mobility
of each sensor nodes are measured through the water atmosphere from the water flow for sensor based
protocol formations. Researchers have developed many routing protocols. However, those lost their charm
with the time. This can be the demand of the age to supply associate degree upon energy-efficient and
ascendable strong routing protocol for under water actuator networks. During this work, the authors tend
to propose a customary routing protocol named level primarily based routing protocol (LBRP), reaching to
offer strong, ascendable and energy economical routing. LBRP conjointly guarantees the most effective use
of total energy consumption and ensures packet transmission which redirects as an additional reliability in
compare to different routing protocols. In this work, the authors have used the level of forwarding node,
residual energy and distance from the forwarding node to the causing node as a proof in multicasting
technique comparisons. Throughout this work, the authors have got a recognition result concerning about
86.35% on the average in node multicasting performances. Simulation has been experienced each in a
wheezy and quiet atmosphere which represents the endorsement of higher performance for the planned
protocol.
STRUCTURAL DYNAMICS AND EVOLUTION OF CAPSULE ENDOSCOPY (PILL CAMERA) TECHNOLO...ijfcstjournal
This research paper examined and re-evaluates the technological innovation, theory, structural dynamics
and evolution of Pill Camera(Capsule Endoscopy) technology in redirecting the response manner of small
bowel (intestine) examination in human. The Pill Camera (Endoscopy Capsule) is made up of sealed
biocompatible material to withstand acid, enzymes and other antibody chemicals in the stomach is a
technology that helps the medical practitioners especially the general physicians and the
gastroenterologists to examine and re-examine the intestine for possible bleeding or infection. Before the
advent of the Pill camera (Endoscopy Capsule) the colonoscopy was the local method used but research
showed that some parts (bowel) of the intestine can’t be reach by mere traditional method hence the need
for Pill Camera. Countless number of deaths from stomach disease such as polyps, inflammatory bowel
(Crohn”s diseases), Cancers, Ulcer, anaemia and tumours of small intestines which ordinary would have
been detected by sophisticated technology like Pill Camera has become norm in the developing nations.
Nevertheless, not only will this paper examine and re-evaluate the Pill Camera Innovation, theory,
Structural dynamics and evolution it unravelled and aimed to create awareness for both medical
practitioners and the public.
AN OPTIMIZED HYBRID APPROACH FOR PATH FINDING (ijfcstjournal)
Path finding algorithms address the problem of finding the shortest path from source to destination while avoiding obstacles. Various search algorithms exist, namely A*, Dijkstra's and ant colony optimization. Unlike most path finding algorithms, which require the destination's co-ordinates to compute a path, the proposed algorithm comprises a new method which finds a path using backtracking without requiring the destination's co-ordinates. Moreover, in existing path finding algorithms, the number of iterations required to find a path is large; to overcome this, an algorithm is proposed which reduces the number of iterations required to traverse the path. The proposed algorithm is a hybrid of backtracking and a new technique (a modified 8-neighbor approach). The proposed algorithm can become an essential part of location-based, network and gaming applications, as well as grid traversal, navigation, mobile robotics and Artificial Intelligence.
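A small illustration of the hybrid's backtracking half: a depth-first walk over a grid using all 8 neighbor directions that backtracks from dead ends, without being handed a heading toward the goal (it simply recognizes the goal cell on arrival). The paper's modified 8-neighbor ordering is not reproduced; the grid and goal below are invented.

```python
# Backtracking path finder on a grid with 8-neighbor moves; 1 = obstacle.
GRID = [
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
GOAL = (4, 0)
MOVES = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def search(cell, path, seen):
    if GRID[cell[0]][cell[1]] == 1 or cell in seen:
        return None
    path, seen = path + [cell], seen | {cell}
    if cell == GOAL:                         # goal recognized on arrival
        return path
    for dr, dc in MOVES:
        r, c = cell[0] + dr, cell[1] + dc
        if 0 <= r < len(GRID) and 0 <= c < len(GRID[0]):
            found = search((r, c), path, seen)
            if found:
                return found                 # otherwise: backtrack
    return None

print(search((0, 0), [], set()))
```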
EAGRO CROP MARKETING FOR FARMING COMMUNITY (ijfcstjournal)
The major occupation in India is agriculture, and the people involved in agriculture largely belong to the poorer classes. The people of the farming community are unaware of the new techniques and agro-machines which would take the field of agriculture to greater heights. Though the farmers work hard, they are cheated by agents in today's market. This serves as an opportunity to solve the problems that farmers face in the current world. The eAgro crop marketing site will serve as a better way for farmers to sell their products within the country with only mediocre knowledge of using a website. It would provide information to farmers about the current market rate of agro-products, their sale history and the profits earned in a sale. The site will also help farmers learn about market information and view the Government's agricultural schemes provided to farmers.
EDGE-TENACITY IN CYCLES AND COMPLETE GRAPHS (ijfcstjournal)
It is well known that tenacity is a proper measure for studying vulnerability and reliability in graphs. Here, a modified edge-tenacity of a graph is introduced based on the classical definition of tenacity. Properties and bounds for this measure are introduced; meanwhile, the edge-tenacity is calculated for cycle graphs and also for complete graphs.
COMPARATIVE STUDY OF DIFFERENT ALGORITHMS TO SOLVE N QUEENS PROBLEMijfcstjournal
This Paper provides a brief description of the Genetic Algorithm (GA), the Simulated Annealing (SA)
Algorithm, the Backtracking (BT) Algorithm and the Brute Force (BF) Search Algorithm and attempts to
explain the way as how the Proposed Genetic Algorithm (GA), the Proposed Simulated Annealing (SA)
Algorithm using GA, the Backtracking (BT) Algorithm and the Brute Force (BF) Search Algorithm can be
employed in finding the best solution of N Queens Problem and also, makes a comparison between these
four algorithms. It is entirely a review based work. The four algorithms were written as well as
implemented. From the Results, it was found that, the Proposed Genetic Algorithm (GA) performed better
than the Proposed Simulated Annealing (SA) Algorithm using GA, the Backtracking (BT) Algorithm and
the Brute Force (BF) Search Algorithm and it also provided better fitness value (solution) than the
Proposed Simulated Annealing Algorithm (SA) using GA, the Backtracking (BT) Algorithm and the Brute
Force (BF) Search Algorithm, for different N values. Also, it was noticed that, the Proposed GA took more
time to provide result than the Proposed SA using GA.
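Of the four methods compared, the backtracking solver is the most compact to show; the standard column/diagonal-conflict version is sketched below for reference (it is the textbook algorithm, not the paper's code).

```python
# Classic backtracking solver for the N Queens problem.
def solve_n_queens(n):
    cols, diag1, diag2, placement = set(), set(), set(), []

    def place(row):
        if row == n:
            return True
        for col in range(n):
            if col in cols or row + col in diag1 or row - col in diag2:
                continue
            cols.add(col); diag1.add(row + col); diag2.add(row - col)
            placement.append(col)
            if place(row + 1):
                return True
            cols.remove(col); diag1.remove(row + col); diag2.remove(row - col)
            placement.pop()                  # backtrack
        return False

    return placement if place(0) else None

print(solve_n_queens(8))    # one valid board: column of the queen in each row
```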
PSTECEQL: A NOVEL EVENT QUERY LANGUAGE FOR VANET'S UNCERTAIN EVENT STREAMS (ijfcstjournal)
In recent years, complex event processing technology has been used to process the VANET's temporal and spatial event streams. However, we usually cannot obtain accurate data because of the sensing accuracy limitations of the system's devices; we can only obtain uncertain data from the complex and constrained environment of the VANET. Because the VANET's event streams consist of uncertain data, they are also uncertain. How to effectively express and process these uncertain event streams has become the core issue for the VANET system. To solve this problem, we propose a novel complex event query language, PSTeCEQL (probabilistic spatio-temporal constraint event query language). Firstly, we give the definition of the possible world model of the VANET's uncertain event streams. Secondly, we propose the event query language PSTeCEQL and give the syntax and the operational semantics of the language. Finally, we illustrate the validity of PSTeCEQL with an example.
CLUSTBIGFIM-FREQUENT ITEMSET MINING OF BIG DATA USING PRE-PROCESSING BASED ON...ijfcstjournal
Now a day enormous amount of data is getting explored through Internet of Things (IoT) as technologies
are advancing and people uses these technologies in day to day activities, this data is termed as Big Data
having its characteristics and challenges. Frequent Itemset Mining algorithms are aimed to disclose
frequent itemsets from transactional database but as the dataset size increases, it cannot be handled by
traditional frequent itemset mining. MapReduce programming model solves the problem of large datasets
but it has large communication cost which reduces execution efficiency. This proposed new pre-processed
k-means technique applied on BigFIM algorithm. ClustBigFIM uses hybrid approach, clustering using kmeans algorithm to generate Clusters from huge datasets and Apriori and Eclat to mine frequent itemsets
from generated clusters using MapReduce programming model. Results shown that execution efficiency of
ClustBigFIM algorithm is increased by applying k-means clustering algorithm before BigFIM algorithm as
one of the pre-processing technique.
A MUTATION TESTING ANALYSIS AND REGRESSION TESTINGijfcstjournal
Software testing is a testing which conducted a test to provide information to client about the quality of the
product under test. Software testing can also provide an objective, independent view of the software to
allow the business to appreciate and understand the risks of software implementation. In this paper we
focused on two main software testing –mutation testing and mutation testing. Mutation testing is a
procedural testing method, i.e. we use the structure of the code to guide the test program, A mutation is a
little change in a program. Such changes are applied to model low level defects that obtain in the process
of coding systems. Ideally mutations should model low-level defect creation. Mutation testing is a process
of testing in which code is modified then mutated code is tested against test suites. The mutations used in
source code are planned to include in common programming errors. A good unit test typically detects the
program mutations and fails automatically. Mutation testing is used on many different platforms, including
Java, C++, C# and Ruby. Regression testing is a type of software testing that seeks to uncover
new software bugs, or regressions, in existing functional and non-functional areas of a system after
changes such as enhancements, patches or configuration changes, have been made to them. When defects
are found during testing, the defect got fixed and that part of the software started working as needed. But
there may be a case that the defects that fixed have introduced or uncovered a different defect in the
software. The way to detect these unexpected bugs and to fix them used regression testing. The main focus
of regression testing is to verify that changes in the software or program have not made any adverse side
effects and that the software still meets its need. Regression tests are done when there are any changes
made on software, because of modified functions.
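A toy illustration of the mutation idea: a small change in the program (here a hypothetical relational-operator mutant) is injected, and a test suite that covers the boundary case "kills" the mutant while the original passes. Real tools for Java, C++ and the other platforms mentioned generate and run such mutants automatically.

```python
# Minimal mutation-testing illustration: a good test suite "kills" the mutant.
def is_adult(age):             # original program
    return age >= 18

def is_adult_mutant(age):      # mutant: >= changed to > (a common operator mutation)
    return age > 18

def run_tests(fn):
    cases = [(17, False), (18, True), (30, True)]   # includes the boundary case
    return all(fn(age) == expected for age, expected in cases)

print("original passes:", run_tests(is_adult))             # True
print("mutant killed:  ", not run_tests(is_adult_mutant))  # True: tests detect it
```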
GREEN WSN- OPTIMIZATION OF ENERGY USE THROUGH REDUCTION IN COMMUNICATION WORK...ijfcstjournal
Advances in micro fabrication and communication techniques have led to unimaginable proliferation of
WSN applications. Research is focussed on reduction of setup operational energy costs. Bulk of operational
energy costs are linked to communication activities of WSN. Any progress towards energy efficiency has a
potential of huge savings globally. Therefore, every energy efficient step is an endeavour to cut costs and
‘Go Green’. In this paper, we have proposed a framework to reduce communication workload through: Innetwork compression and multiple query synthesis at the base-station and modification of query syntax
through introduction of Static Variables. These approaches are general approaches which can be used in
any WSN irrespective of application.
A NEW MODEL FOR SOFTWARE COSTESTIMATION USING HARMONY SEARCHijfcstjournal
Accurate and realistic estimation is always considered to be a great challenge in software industry.
Software Cost Estimation (SCE) is the standard application used to manage software projects. Determining
the amount of estimation in the initial stages of the project depends on planning other activities of the
project. In fact, the estimation is confronted with a number of uncertainties and barriers’, yet assessing the
previous projects is essential to solve this problem. Several models have been developed for the analysis of
software projects. But the classical reference method is the COCOMO model, there are other methods
which are also applied such as Function Point (FP), Line of Code(LOC); meanwhile, the expert`s opinions
matter in this regard. In recent years, the growth and the combination of meta-heuristic algorithms with
high accuracy have brought about a great achievement in software engineering. Meta-heuristic algorithms
which can analyze data from multiple dimensions and identify the optimum solution between them are
analytical tools for the analysis of data. In this paper, we have used the Harmony Search (HS)algorithm for
SCE. The proposed model which is a collection of 60 standard projects from Dataset NASA60 has been
assessed.The experimental results show that HS algorithm is a good way for determining the weight
similarity measures factors of software effort, and reducing the error of MRE.
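The mechanics of Harmony Search can be illustrated generically: a harmony memory of candidate solutions evolves by drawing each component either from memory (with probability HMCR, optionally pitch-adjusted with probability PAR) or at random, replacing the worst member whenever a better harmony appears. Below it minimizes a toy objective, not the NASA60 effort model; all parameter values are illustrative.

```python
# Generic Harmony Search minimizing a toy objective (sphere function).
import random

def harmony_search(objective, dim, bounds, hms=10, hmcr=0.9, par=0.3,
                   bandwidth=0.1, iterations=2000):
    lo, hi = bounds
    memory = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d in range(dim):
            if random.random() < hmcr:                 # draw from memory
                v = random.choice(memory)[d]
                if random.random() < par:              # pitch adjustment
                    v += random.uniform(-bandwidth, bandwidth)
            else:                                      # random consideration
                v = random.uniform(lo, hi)
            new.append(min(hi, max(lo, v)))
        s = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if s < scores[worst]:                          # replace worst harmony
            memory[worst], scores[worst] = new, s
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]

sphere = lambda x: sum(v * v for v in x)
best, err = harmony_search(sphere, dim=4, bounds=(-5.0, 5.0))
print(f"best error: {err:.6f}")
```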
AGENT ENABLED MINING OF DISTRIBUTED PROTEIN DATA BANKS (ijfcstjournal)
Mining biological data is an emergent area at the intersection of bioinformatics and data mining (DM). The intelligent agent based model is a popular approach to constructing Distributed Data Mining (DDM) systems to address scalable mining over large-scale distributed data. The nature of the associations between different amino acids in proteins has also been a subject of great interest. There is a strong need to develop new models to exploit and analyze the available distributed biological data sources. In this study, we have designed and implemented a multi-agent system (MAS) called Agent enriched Quantitative Association Rules Mining for Amino Acids in distributed Protein Data Banks (AeQARM-AAPDB). Such globally strong association rules enhance understanding of protein composition and are desirable for the synthesis of artificial proteins. A real protein data bank is used to validate the system.
International Journal on Foundations of Computer Science & Technology (IJFCST) (ijfcstjournal)
International Journal on Foundations of Computer Science & Technology (IJFCST) is a bi-monthly peer-reviewed and refereed open access journal that publishes articles contributing new results in all areas of the Foundations of Computer Science & Technology. Over the last decade, there has been an explosion in the field of computer science to solve various problems from mathematics to engineering. This journal aims to provide a platform for exchanging ideas in new emerging trends that need more focus and exposure, and will attempt to publish proposals that strengthen our goals.
Because the technology is used largely in the last decades; cybercrimes have become a significant
international issue as a result of the huge damage that it causes to the business and even to the ordinary
users of technology. The main aims of this paper is to shed light on digital crimes and gives overview about
what a person who is related to computer science has to know about this new type of crimes. The paper has
three sections: Introduction to Digital Crime which gives fundamental information about digital crimes,
Digital Crime Investigation which presents different investigation models and the third section is about
Cybercrime Law.
DISTRIBUTION OF MAXIMAL CLIQUE SIZE UNDER THE WATTS-STROGATZ MODEL OF EVOLUTI...ijfcstjournal
In this paper, we analyze the evolution of a small-world network and its subsequent transformation to a
random network using the idea of link rewiring under the well-known Watts-Strogatz model for complex
networks. Every link u-v in the regular network is considered for rewiring with a certain probability and if
chosen for rewiring, the link u-v is removed from the network and the node u is connected to a randomly
chosen node w (other than nodes u and v). Our objective in this paper is to analyze the distribution of the
maximal clique size per node by varying the probability of link rewiring and the degree per node (number
of links incident on a node) in the initial regular network. For a given probability of rewiring and initial
number of links per node, we observe the distribution of the maximal clique per node to follow a Poisson
distribution. We also observe the maximal clique size per node in the small-world network to be very close
to that of the average value and close to that of the maximal clique size in a regular network. There is no
appreciable decrease in the maximal clique size per node when the network transforms from a regular
network to a small-world network. On the other hand, when the network transforms from a small-world
network to a random network, the average maximal clique size value decreases significantly
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...2023240532
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
International Journal in Foundations of Computer Science & Technology (IJFCST), Vol.5, No.4, July 2015
DOI: 10.5121/ijfcst.2015.5403
A STATISTICAL COMPARATIVE STUDY OF SOME SORTING ALGORITHMS

Anchala Kumari¹, Niraj Kumar Singh²* and Soubhik Chakraborty³

¹ Department of Statistics, Patna University, Patna
² Department of Computer Science & Engineering, BIT Mesra, Ranchi
³ Department of Mathematics, BIT Mesra, Ranchi
ABSTRACT
This research paper is a statistical comparative study of a few average case asymptotically optimal sorting
algorithms, namely Quick sort, Heap sort and K-sort. The three sorting algorithms, all with the same
average case complexity, have been compared by obtaining the corresponding statistical bounds while
subjecting these procedures to randomly generated data from some standard discrete and continuous
probability distributions, such as the Binomial distribution, the Uniform (discrete and continuous)
distribution and the Poisson distribution. The statistical analysis is well supplemented by the
parameterized complexity analysis.
KEYWORDS
Parameterized complexity, Statistical bound, Empirical-O, Computer experiment.
1. INTRODUCTION
Quick sort and Heap sort are the two standard sorting techniques used in the literature for sorting
large data sets. Both exhibit the same average case complexity of O(Nlog2N), where N is the size of
the input to be sorted. Quick sort is possibly the best choice for sorting random data: the sequential
access nature of this algorithm keeps the associated constant terms small, making it an efficient
choice among algorithms of the same asymptotic class. The worst case complexity of Quick sort is
O(N²), while Heap sort even in the worst case exhibits the same ϴ(Nlog2N) complexity. The ϴ(Nlog2N)
tight complexity of Heap sort gives it an edge over the much used Quick sort in stringent real-time
systems. New sort, an improved version of Quick sort introduced by Sundararajan and Chakraborty [1],
conforms to the same average and worst case complexities as Quick sort, i.e., O(Nlog2N) and O(N²)
respectively. However, New sort uses an auxiliary array, thereby increasing the space complexity. A
further improvement over New sort was made by removing the auxiliary array, and the resulting
algorithm was named K-sort [2]. Interestingly, for typical inputs K-sort also exhibits O(Nlog2N)
average time complexity. In a recent research paper, Singh et al. [3] explore the Quick sort algorithm
in detail, discussing various interesting patterns obtained for runtime complexity data.
In this paper the three sorting algorithms, all with the same average case complexity, have been
compared by obtaining the corresponding statistical bounds while subjecting these procedures to
randomly generated data from some standard discrete and continuous probability distributions, namely
the Binomial, Uniform (discrete and continuous) and Poisson distributions.
ASYMPTOTIC ANALYSIS

Asymptotic analysis of Quick sort: The average and best case recurrence of Quick sort is T(N) =
2T(N/2) + ϴ(N), T(1) = 0, which upon solving yields a running time of O(Nlog2N). The worst case
recurrence is T(N) = T(N-1) + ϴ(N), T(1) = 0, which results in the O(N²) complexity of the Quick sort
algorithm.
Asymptotic analysis of Heap sort: The best and worst case complexities of Heap sort both belong to
ϴ(Nlog2N).

Asymptotic analysis of K-sort: Owing to its structural similarity with Quick sort, the asymptotic time
complexities of K-sort are the same as those of Quick sort.
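For reference, here is a minimal C++ sketch of the two classical algorithms in their textbook form (an
illustration only, not the authors' instrumented code; K-sort itself is specified in [2]):

#include <algorithm>
#include <vector>

// Textbook Quick sort (Lomuto partition, last element as pivot).
// Balanced splits give T(N) = 2T(N/2) + Theta(N); an already sorted
// input degenerates to T(N) = T(N-1) + Theta(N), i.e. O(N^2).
void quickSort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; ++j)
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[hi]);
    quickSort(a, lo, i - 1);
    quickSort(a, i + 1, hi);
}

// Heap sort via the standard library: build-heap is Theta(N) and each of
// the N extractions costs O(log N), giving the tight Theta(N log N) bound.
void heapSort(std::vector<int>& a) {
    std::make_heap(a.begin(), a.end());
    std::sort_heap(a.begin(), a.end());
}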
EMPIRICAL-O ANALYSIS

Empirical-O is an empirical estimate of the statistical bound over a finite range, obtained by
supplying numerical values to the weights which emerge from a computer experiment. A computer
experiment is defined as a series of runs of a code for various inputs; it is called deterministic if
it gives identical outputs when the code is re-run for the same input.
Statistical bound (non-probabilistic): If W_ij is the weight of a computing operation of type i in the
j-th repetition (generally time is taken as the weight) and y is a stochastic realization of the
deterministic T = Σ 1·W_ij, where we count one for each operation repetition irrespective of the type,
then the statistical bound of the algorithm is the asymptotic least upper bound of y expressed as a
function of N, N being the algorithm's input size. T is realised for a fixed input while y is
expressed for a fixed size of the input. It is important to note that y becomes stochastic for those
algorithms where fixing the size does not fix all the computing operations; sorting algorithms fall in
this category.
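Compactly, with time as the common weight (our restatement of the definition above):

T = \sum_{i}\sum_{j} 1 \cdot W_{ij},

where $W_{ij}$ is the weight of the $j$-th repetition of a type-$i$ operation; the statistical bound is
the asymptotic least upper bound of a stochastic realization $y$ of $T$, expressed as a function of the
input size $N$.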
Now we perform the empirical analysis of the results obtained by applying the specified algorithms to
input data generated from the probability distributions mentioned earlier. The codes were written in
Dev C++ 5.8.2 and the analysis was performed using the Minitab statistical package.
The response (CPU time to run the code), as mean time in seconds, is given in tables 1-4, and the
relative performance plots are presented in figures 1-4. Average case analysis is performed directly
on program run time to estimate the weight-based statistical bound over a finite range by running
computer experiments [4][5]. This estimate is called empirical-O [6][7]. The time of an operation is
taken as its weight. Weighing permits the collective consideration of all operations into a conceptual
bound, which we call a statistical bound in order to distinguish it from the count-based mathematical
bounds that are operation specific.

Sample size: The input sizes vary from 1 to 10 lac (i.e., 100,000 to 1,000,000), which may be
considered reasonably large for practical data sets.
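A minimal sketch of how such a deterministic computer experiment can be driven in C++ (illustrative
only: the repetition count, the seed and std::sort as a stand-in for the algorithm under test are our
assumptions, not details published by the authors):

#include <algorithm>
#include <cstdio>
#include <ctime>
#include <random>
#include <vector>

int main() {
    std::mt19937 gen(12345);                           // fixed seed keeps the experiment deterministic
    std::uniform_int_distribution<int> dist(1, 1000);  // discrete uniform input, k = 1000
    const int reps = 5;                                // repetitions per size (assumed)
    for (int N = 100000; N <= 1000000; N += 100000) {  // 1 to 10 lac, as in the paper
        double total = 0.0;
        for (int r = 0; r < reps; ++r) {
            std::vector<int> a(N);
            for (int& x : a) x = dist(gen);            // fresh random input per run
            std::clock_t t0 = std::clock();
            std::sort(a.begin(), a.end());             // stand-in for HS / KS / QS under test
            total += double(std::clock() - t0) / CLOCKS_PER_SEC;
        }
        std::printf("N = %7d  mean time = %.4f s\n", N, total / reps);
    }
    return 0;
}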
2. RELATIVE PERFORMANCE ANALYSIS OF DIFFERENT ALGORITHMS
2.1 Discrete Uniform Distribution
The Discrete Uniform distribution is characterised by a single parameter k. In this experiment k has
been fixed at 1000.
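All four input models used in sections 2.1 to 2.4 can be simulated with the C++11 <random> header. The
sketch below uses the parameter values fixed in the text; the seed is an arbitrary assumption:

#include <cstdio>
#include <random>

int main() {
    std::mt19937 gen(2015);                                     // arbitrary fixed seed
    std::uniform_int_distribution<int>     discUnif(1, 1000);   // section 2.1: discrete uniform, k = 1000
    std::binomial_distribution<int>        binom(1000, 0.5);    // section 2.2: Binomial, m = 1000, p = 0.5
    std::poisson_distribution<int>         pois(5.0);           // section 2.3: Poisson, lambda = 5.0
    std::uniform_real_distribution<double> contUnif(0.0, 1.0);  // section 2.4: continuous uniform U[0, 1]
    std::printf("%d %d %d %f\n", discUnif(gen), binom(gen), pois(gen), contUnif(gen));
    return 0;
}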
Table 1. Data for discrete uniform distribution (mean CPU time in seconds)

N         Heap sort (HS)   K-sort (KS)   Quick sort (QS)
100000    .0528            .0526         .0304
200000    .0998            .1842         .0966
300000    .1778            .4032         .1904
400000    .2436            .6340         .3096
500000    .2878            .9424         .4750
600000    .3472            1.3376        .6594
700000    .4028            1.7750        .8600
800000    .4622            2.8860        1.0920
900000    .5216            2.9168        1.3010
1000000   .5828            3.5912        1.7440
Some interesting results emerge from the above table. For N ≤ 200000, Quick sort performs better than
the other two algorithms, but from N ≥ 300000 onwards Heap sort outperforms Quick sort. The poor
performance of Quick sort may be due to the increase in the number of ties in the data set as N
increases. As far as K-sort is concerned, it gives the worst performance at this particular value of
k.
[Figure: relative performance plot for the discrete uniform distribution, k = 1000; mean time (Y)
versus N for the variables HS, KS and QS.]

Figure 1. Relative plots of Quick sort, Heap sort and K-sort (k = 1000)
For Discrete Uniform inputs in the average case, Quick sort and K-sort exhibit a time complexity of
Oemp(N²), whereas that of Heap sort is Oemp(Nlog2N) [2].
2.2 Binomial Distribution
The Binomial distribution has two parameters m and p, m being the number of independent Bernoulli
trials and p the probability of success in each trial. The mean times given in table 2 were obtained
by varying N between 100000 and 1000000, fixing m and p at 1000 and 0.5 respectively, and making
several runs for the same input.
Table 2. Data for Binomial distribution (mean CPU time in seconds)

N         Heap sort (HS)   K-sort (KS)   Quick sort (QS)
100000    .4998            .6096         .2872
200000    1.0450           4.0198        1.1166
300000    1.6344           5.3716        2.46360
400000    2.24000          9.9072        4.3850
500000    2.8534           15.02379      6.8992
600000    3.4602           22.8832       9.8526
700000    4.0792           30.51688      13.5338
800000    4.74800          41.0869       17.66879
900000    5.3766           49.628720     22.0784
1000000   5.9798           62.4471       22.2676
[Figure: relative performance plot for Binomial input; mean time (Y) versus N for the variables HS, KS
and QS.]

Figure 2. Relative plots of Quick sort, Heap sort and K-sort (m = 1000 and p = 0.5)
The input data (table 2) and the relative performance plot (figure 2) support the fact that on average
K-sort consumes more time than the other two algorithms for sorting an array of the same size. For
Binomial inputs, Quick sort and K-sort both conform to Oemp(N²), whereas Heap sort has Oemp(Nlog2N)
complexity.
2.3 Poisson Distribution
The Poisson distribution depends on the parameter λ. Lambda (which is both the mean and the variance)
should not be large, as this is the distribution of rare events. While performing the empirical
analysis with Poisson inputs, the parameter λ is fixed at 5.0.
Table 3. Data for Poisson input (mean CPU time in seconds)

N         Heap sort (HS)   K-sort (KS)   Quick sort (QS)
20000     .015             .1778         .0964
40000     .019             .7158         .3190
60000     .0342            1.7450        .7250
80000     .047             2.8082        1.2696
100000    .0636            4.5356        1.9696
120000    .0716            6.320         2.8502
140000    .0818            8.5972        3.8226
160000    .094             11.2604       4.979
180000    .110             14.736        6.3406
200000    .125             18.0549       7.8044
[Figure: relative performance plot for Poisson input; mean time (Y) versus N for the variables HS, KS
and QS.]

Figure 3. Relative plots of Quick sort, Heap sort and K-sort (λ = 5.0)
For Poisson inputs, Heap sort gives the best performance compared to the other two algorithms. Quick
sort and K-sort both have Oemp(N²) complexity, whereas a look at the three graphs (figure 3) makes it
obvious that Heap sort again conforms to Oemp(Nlog2N) complexity.
2.4 Uniform Continuous Distribution
Table 4 gives the mean execution time for data simulated from the uniform continuous distribution
[0, θ].
Table 4. Data for uniform continuous distribution (mean CPU time in seconds)

N         Heap sort (HS)   K-sort (KS)   Quick sort (QS)
100000    .055             .0310         .0340
200000    .11277           .0748         .062
300000    .1780            .1092         .0872
400000    .2400            .1468         .1064
500000    .3150            .2002         .1312
600000    .3842            .2624         .159
700000    .4533            .3092         .1908
800000    .5188            .3672         .219
900000    .902             .425          .2534
1000000   1.0154           .4750         .2842
[Figure: relative performance plot for the uniform continuous distribution; mean time (Y) versus N for
the variables HS, KS and QS.]

Figure 4. Relative plots of Quick sort, Heap sort and K-sort (U[0, 1])
Here the scenario changes. Quick sort outperforms the other two algorithms. K-sort, for all values of
N in the selected range, outperforms Heap sort. However, all three algorithms suggest Oemp(Nlog2N)
complexity.
3. STATISTICAL ANALYSIS OF DIFFERENT ALGORITHMS FOR DISCRETE DISTRIBUTIONS
In this section we test the hypothesis of whether the average performance of the algorithms for
different input distributions is the same or not. For this we apply two-way analysis of variance. The
p-value of 0.358 for the algorithm factor (SORTALG), being greater than 0.05 (table 5), indicates that
as far as average performance is concerned there is no reason to differentiate between the three
algorithms.
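In standard notation, the additive model underlying this two-way ANOVA is (our restatement; the factor
labels follow table 5):

y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \qquad
i = 1,\dots,4 \ (\text{PRODIS: input distribution}), \quad
j = 1,\dots,3 \ (\text{SORTALG: sorting algorithm}),

and the hypothesis of homogeneous algorithms, H_0: \beta_1 = \beta_2 = \beta_3, is tested by
F = \mathrm{MS}_{\mathrm{SORTALG}} / \mathrm{MS}_{\mathrm{Error}} = 26.4776 / 21.6003 \approx 1.23.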
Table 5. Results of two-way ANOVA

Source     DF   SS        MS        F      P
PRODIS     3    163.354   54.4514   2.52   0.155
SORTALG    2    52.955    26.4776   1.23   0.358
Error      6    129.602   21.6003
Total      11   345.911
3.1 Parameterized complexity

Parameterized complexity is one of the important criteria for selecting an algorithm from among
several algorithms since, besides the size of the input, the parameters of the input distribution have
a direct effect on the execution time of an algorithm. We have examined the parameterized complexity
of the three algorithms for Binomial inputs, the reason being that this distribution has two
parameters. A 3² factorial experiment with three replicated sets of data elements at each combination
of the levels of factors m and p has been employed. The results given in table 6 below reveal some
interesting findings:
Table 6. Parameterized complexity of Heap sort, K-sort and Quick sort (3² factorial ANOVA)

                 Heap sort (HS)    K-sort (KS)       Quick sort (QS)
Source    df     F       P         F       P         F       P
m         2      0.13    .876      3.85    .040      1.11    .351
p         2      0.15    .866      37.37   .000      26.18   .000
m × p     4      0.18    .948      1.62    .213      0.31    .867
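For intuition, a C++ sketch of how such a 3² factorial timing experiment with three replicates per
cell can be driven (the level values chosen for m and p, the array size and std::sort as a stand-in
for the algorithm under study are illustrative assumptions; the authors' exact levels are not listed
here):

#include <algorithm>
#include <cstdio>
#include <ctime>
#include <random>
#include <vector>

int main() {
    const int N = 50000;                             // array size per run (assumed, as in tables 7-8)
    const int mLevels[3] = {1000, 5000, 10000};      // three levels of factor m (assumed)
    const double pLevels[3] = {0.3, 0.5, 0.7};       // three levels of factor p (assumed)
    std::mt19937 gen(42);
    for (int mi = 0; mi < 3; ++mi)
        for (int pi = 0; pi < 3; ++pi)
            for (int rep = 1; rep <= 3; ++rep) {     // three replicates per (m, p) cell
                std::binomial_distribution<int> bin(mLevels[mi], pLevels[pi]);
                std::vector<int> a(N);
                for (int& x : a) x = bin(gen);
                std::clock_t t0 = std::clock();
                std::sort(a.begin(), a.end());       // stand-in for the sorting algorithm under study
                double t = double(std::clock() - t0) / CLOCKS_PER_SEC;
                std::printf("m=%5d  p=%.1f  rep=%d  time=%.4f s\n",
                            mLevels[mi], pLevels[pi], rep, t);
            }
    return 0;
}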
The F and P values reveal that in the case of Heap sort the two parameters, neither singly nor
jointly, have any effect on sorting time. In the case of K-sort both factors have independent effects
on the complexity, the probability of success being highly significant. For the Quick sort algorithm,
the number of trials (m) shows a highly non-significant effect while the probability of success p has
a highly significant effect on complexity. Thus the proper selection of input parameters can have a
rewarding effect on reducing the complexity of an algorithm. For different values of p, the average
execution time is given in table 7 below.
Table 7. Optimal value of p (m = 5000, N = 50000; mean time in seconds)

p      K-sort   Quick sort
0.1    .188     .056
0.2    .144     .047
0.3    .126     .040
0.4    .127     .044
0.5    .115     .0438
0.6    .117     .0566
0.7    .125     .063
0.8    .1406    .072
0.9    .183     .072
Here we find that in both cases the execution time initially decreases as the value of p is increased,
is minimum at p = .5, and then goes on increasing again. Thus the optimal value of p is .5. As far as
the optimal value of m is concerned in the case of K-sort, as shown in table 8 below, the execution
time decreases with increase in the value of m; thus in this case a high value of m is preferable.
Table 8. Optimal value of m for K-sort (p = .5, N = 50000; mean time in seconds)

m        K-sort
2000     .174
3000     .1456
4000     .1275
5000     .0782
6000     .0704
7000     .0666
8000     .0644
9000     .061
10000    .056
4. CONCLUSIONS
The statistical analysis in section 3 supports the theoretical result that the three sorting
algorithms, Heap sort, K-sort and Quick sort, share the same average case complexity O(Nlog2N): we
have no evidence to reject the hypothesis of homogeneity of the algorithms as far as their average
performance is concerned. In the worst case, however, Quick sort and K-sort have O(N²) complexity,
whereas Heap sort exhibits O(Nlog2N), and O(Nlog2N) < O(N²).
As far as Empirical-O estimates are concerned, for some discrete distributions such as the Binomial
and Uniform distributions Quick sort gives better performance for N less than about 200000, while for
N > 300000 Heap sort is best; K-sort does not work well for these distributions. But for sorting an
array generated randomly (not drawn from any standard probability distribution), K-sort works well: it
can sort an array of size up to 70,00,000 in less time than Heap sort [2].
For the continuous uniform distribution, Quick sort gives the best performance compared to the other
two. This result is quite expected, as in the case of a continuous distribution the probability of
getting similar-valued elements (ties) is theoretically zero. It is well known through various results
[8] that Quick sort behaves exceptionally well in such cases, whereas in the very same scenario Heap
sort is expected to perform relatively poorly (though the time complexity of Heap sort remains of the
order of Nlog2N) [9].
The different behaviour of the algorithms on the input data can be supplemented by the parameterised
complexity analysis, since the true potential of an algorithm is related to the parameters of the
input distribution. The parameterised complexity of Heap sort works in its favour: its execution time
does not depend on the Binomial parameters, which is consistent with its worst case complexity being
lower than that of the other two algorithms. In the case of Quick sort, the parameter p has a highly
significant effect on execution time. Though the two parameters m and p have independent effects on
complexity in the case of K-sort, we find that parameter m has very little effect whereas p is highly
significant. In both K-sort and Quick sort the complexity is minimum at p = .5, and a high value of m
is preferable to reduce the complexity while using the K-sort algorithm. Thus proper selection of
input parameters can have a rewarding effect on reducing the complexity of an algorithm.
REFERENCES
[1] Sundararajan, K.K. & Chakraborty, S. (2007) "A New Sorting Algorithm", Journal of Applied
Mathematics and Computation, 188, pp 1037-1041.
[2] Sundararajan, K.K., Pal, M., Chakraborty, S. & Mahanti, N.C. (2013) "K-Sort: A New Sorting
Algorithm that Beats Heap Sort for n ≤ 70 Lakhs".
[3] Singh, N.K., Chakraborty, S. & Mallick, D.K. (2014) "A Statistical Peek into Average Case
Complexity", Int. J. on Recent Trends in Engineering and Technology, 8(1), pp 64-67.
[4] Fang, K.T., Li, R. & Sudjianto, A. (2000) Design and Modelling of Computer Experiments, Taylor &
Francis Group, Boca Raton.
[5] Sacks, J., Welch, W., Mitchell, T. & Wynn, H. (1989) "Design and Analysis of Computer
Experiments", Statistical Science, 4.
[6] Chakraborty, S. & Sourabh, S.K. (2010) A Computer Oriented Approach to Algorithmic Complexity,
Lambert Academic Publishing.
[7] Chakraborty, S., Modi, N. & Panigrahi, S. (2009) "Will the Weight-based Statistical Bounds
Revolutionize the IT?", International Journal of Computational Cognition, 7(3).
[8] Singh, N.K. & Chakraborty, S. (2012) "Smart Sort: A Fast, Efficient and Robust Sorting Algorithm",
IJACMS, Vol. 3, Issue 4, pp 487-493.
[9] Sourabh, S.K. & Chakraborty, S. (2009) "Empirical Study on the Robustness of Average Complexity
and Parameterized Complexity Measure for Heap sort Algorithm", International Journal of Computational
Cognition, Vol. 7, No. 4.
Authors
Anchala Kumari is a Professor in the Department of Statistics, Patna University, India. Her research
areas include Operations Research, Design of Experiments and Computer Programming.

Niraj Kumar Singh is a Teaching cum Research Fellow in the Department of Computer Science &
Engineering, BIT Mesra, India. His research interests include Design and Analysis of Algorithms and
Algorithmic Analysis through Statistical Bounds.

Soubhik Chakraborty is a Professor in the Department of Mathematics, BIT Mesra, India. His research
interests include Music Analysis, Algorithmic Complexity and Statistical Computing.