PARTITION SORT REVISITED: RECONFIRMING THE ROBUSTNESS IN AVERAGE CASE AND MUCH MORE (IJCSEA Journal)
In our previous work there was some indication that Partition Sort might have a more robust average-case O(n log n) complexity than the popular Quick sort. In the first study in this paper, we reconfirm this through computer experiments for inputs from the Cauchy distribution, for which the expectation theoretically does not exist. Additionally, the algorithm is found to be sensitive to the parameters of the input probability distribution, calling for further investigation of parameterized complexity. The results for this algorithm on Binomial inputs in our second study are very encouraging in that direction.
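The Cauchy distribution is the stress test here precisely because it has no finite mean, so average-case claims that lean on expectations become suspect. Below is a minimal sketch of how such inputs can be generated for a sorting experiment, via the standard inverse-CDF trick; the function name and the use of Python's built-in sort as a stand-in for Partition Sort are illustrative, since the paper's code is not given here.

    import math
    import random

    def cauchy_sample(n, x0=0.0, gamma=1.0):
        """Draw n values from a Cauchy(x0, gamma) distribution via inverse CDF.

        The Cauchy distribution has no finite mean, which makes it a natural
        stress test for 'average case' running-time claims."""
        return [x0 + gamma * math.tan(math.pi * (random.random() - 0.5))
                for _ in range(n)]

    data = cauchy_sample(10_000)
    data.sort()                 # any O(n log n) sort; Partition Sort would go here
    print(data[0], data[-1])    # extreme order statistics are typically huge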
BINARY TREE SORT IS MORE ROBUST THAN QUICK SORT IN AVERAGE CASE (IJCSEA Journal)
This document discusses the average case complexity of binary tree sort compared to quicksort. It analyzes the robustness of binary tree sort's O(n log n) average case complexity for non-uniform input distributions like binomial, Poisson, discrete uniform, continuous uniform, and standard normal distributions. Through statistical modeling and simulation experiments on different sample sizes, it is shown that binary tree sort maintains O(n log n) average case complexity even for non-uniform inputs, demonstrating that it is more robust than quicksort in the average case. The document concludes that only algorithms with equal worst case and average case complexities can reliably depend on average complexity measures, otherwise robustness must be verified.
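For reference, binary tree sort is simply "insert every element into a binary search tree, then read it back in order"; its expected O(n log n) behavior comes from the expected depth of a random BST. A minimal runnable sketch (not the paper's implementation):

    class _Node:
        __slots__ = ("key", "left", "right")
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def tree_sort(items):
        """Binary tree sort: insert each item into a BST, then traverse in order.
        Expected O(n log n) on random inputs; degrades to O(n^2) on sorted
        input unless the tree is balanced."""
        root = None
        for x in items:
            node = _Node(x)
            if root is None:
                root = node
                continue
            cur = root
            while True:
                if x < cur.key:
                    if cur.left is None:
                        cur.left = node
                        break
                    cur = cur.left
                else:
                    if cur.right is None:
                        cur.right = node
                        break
                    cur = cur.right
        out, stack, cur = [], [], root
        while stack or cur:          # iterative in-order traversal
            while cur:
                stack.append(cur)
                cur = cur.left
            cur = stack.pop()
            out.append(cur.key)
            cur = cur.right
        return out

    print(tree_sort([5, 1, 4, 2, 3]))   # [1, 2, 3, 4, 5]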
This document proposes a new method to remove the dependence of fuzzy c-means clustering on random initialization. The conventional fuzzy c-means algorithm's performance is highly dependent on the randomly initialized membership values used to select initial centroids. The proposed method uses an algorithm by Yuan et al. to determine initial centroids without randomization. These centroids are then used as inputs to the conventional fuzzy c-means algorithm. The performance of the proposed method is compared to conventional fuzzy c-means using partition coefficient and clustering entropy validity indices. Results show the proposed method produces more consistent and better performance by removing the effect of random initialization.
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING: FINDING ALL THE POTENTIAL MINIMA (IJDKP)
Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, in which each point in a given dataset is replaced by a Gaussian. The width of the Gaussian is σ, a hyper-parameter that can be manually set and manipulated to suit the application. Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an outstanding task because such expressions are normally impossible to solve analytically. However, we prove that if the points are all included in a square region of size σ, there is only one minimum. This bound is useful not only for knowing how many solutions to look for by numerical means; it also allows us to propose a new numerical approach "per block". This technique decreases the number of particles by approximating some groups of particles with weighted particles. These findings are useful not only for the quantum clustering problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics and other applications.
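For concreteness, in the usual Horn-Gottlieb formulation of QC the wave function is ψ(x) = Σᵢ exp(−‖x−xᵢ‖²/2σ²) and, up to an additive constant, the potential whose minima mark cluster centers is V(x) = (1/2σ²ψ(x)) Σᵢ ‖x−xᵢ‖² exp(−‖x−xᵢ‖²/2σ²). A small sketch evaluating this potential (illustrative only; the paper's "per block" scheme is not reproduced here):

    import numpy as np

    def quantum_potential(x, data, sigma):
        """QC potential at point x, up to an additive constant: each data point
        contributes a Gaussian of width sigma to psi, and V is low near dense
        regions and high in between. Cluster centers are local minima of V."""
        sq = np.sum((data - x) ** 2, axis=1)
        g = np.exp(-sq / (2.0 * sigma ** 2))
        return sq.dot(g) / (2.0 * sigma ** 2 * g.sum())

    rng = np.random.default_rng(0)
    pts = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
    for probe in ([0.0, 0.0], [1.5, 1.5], [3.0, 3.0]):
        # low near the two dense blobs, high at the midpoint between them
        print(probe, round(quantum_potential(np.array(probe), pts, sigma=0.5), 3))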
This document summarizes a research paper that proposes a new method to accelerate the nearest neighbor search step of the k-means clustering algorithm. The k-means algorithm is computationally expensive due to calculating distances between data points and cluster centers. The proposed method uses geometric relationships between data points and centers to reject centers that are unlikely to be the nearest neighbor, without decreasing clustering accuracy. Experimental results showed the method significantly reduced the number of distance computations required.
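The paper's exact geometric criterion is not spelled out in this summary, but the flavor of such rejection tests can be seen with the classic triangle-inequality bound used in Elkan-style accelerated k-means: if d(c_best, c_j) ≥ 2·d(x, c_best), then c_j cannot be closer to x than c_best, so d(x, c_j) never needs to be computed. A sketch under that assumption:

    import numpy as np

    def nearest_center(x, centers, center_dists):
        """Nearest-center search that skips centers ruled out by the triangle
        inequality. center_dists[i, j] holds pairwise distances between
        centers, computed once per iteration and reused for every point."""
        best = 0
        best_d = np.linalg.norm(x - centers[0])
        skipped = 0
        for j in range(1, len(centers)):
            if center_dists[best, j] >= 2.0 * best_d:
                skipped += 1      # rejected without a distance computation
                continue
            d = np.linalg.norm(x - centers[j])
            if d < best_d:
                best, best_d = j, d
        return best, skipped

    rng = np.random.default_rng(0)
    centers = rng.normal(size=(20, 2))
    cd = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    idx, skipped = nearest_center(rng.normal(size=2), centers, cd)
    print(idx, skipped)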
This document is a seminar report on the K-Means clustering algorithm submitted by Gaurav Handa. It includes an introduction that discusses the importance of data mining and describes K-Means clustering. It also includes chapters that analyze and plan the implementation of K-Means, describe the algorithm and its flowchart, discuss limitations, and provide examples of implementing K-Means using graphs and Java code. The report was submitted in partial fulfillment of seminar requirements and includes acknowledgements and certificates.
Optimising Data Using K-Means Clustering Algorithm (IJERA Editor)
K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed carefully, because different locations lead to different results; the better choice is therefore to place them as far away from each other as possible.
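A minimal sketch of this idea: k centroids seeded far apart (farthest-point seeding is one common way to realize "as far away from each other as possible"), then refined by the usual assign/update loop. Function names and the synthetic data are illustrative.

    import numpy as np

    def farthest_point_init(X, k, rng):
        """Pick k initial centroids spread far apart: start from a random
        point, then repeatedly add the point farthest from those chosen."""
        centers = [X[rng.integers(len(X))]]
        for _ in range(k - 1):
            d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
            centers.append(X[np.argmax(d)])
        return np.array(centers)

    def kmeans(X, k, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = farthest_point_init(X, k, rng)
        for _ in range(iters):
            # assign each point to its nearest centroid
            labels = np.argmin(
                np.linalg.norm(X[:, None] - centers[None], axis=-1), axis=1)
            # recompute centroids as cluster means
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        return centers, labels

    X = np.random.default_rng(1).normal(size=(300, 2))
    X[:150] += 5.0                      # two well-separated blobs
    centers, labels = kmeans(X, 2)
    print(centers)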
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure (IJRES Journal)
This paper is devoted to processing data given in an ordinal scale. A new objective function of a special type is introduced, along with a group of robust fuzzy clustering algorithms based on the similarity measure.
Finding Relationships between the Our-NIR Cluster Results (CSCJournals)
The problem of evaluating node importance in clustering is an active research area, and many methods have been developed. Most clustering algorithms deal with general similarity measures, but in real situations data often change over time, and clustering such data naively not only decreases the quality of the clusters but also disregards the expectations of users, who usually require recent clustering results. In this regard we proposed the Our-NIR method, which improves on the method proposed by Ming-Syan Chen, as demonstrated by its node-importance results; node importance is very useful in clustering categorical data, but the method is still deficient in data labeling and outlier detection. In this paper we modify Our-NIR for evaluating node importance by introducing a probability distribution, which gives better results by comparison.
This document analyzes the steady-state probabilities of an M/M/1 queueing system with a single server subject to differentiated vacations, balking, and partial vacation interruptions. It presents the Kolmogorov differential difference equations that describe the system and derives closed-form expressions for the steady-state probabilities πj,n of being in each state (j = 0, 1, 2 server states and n = customers). The key results are expressions for the steady-state probabilities π0,n, π1,n, and π2,n in terms of the arrival rate λ, service rate μ, and vacation rates γ1 and γ2.
Penalty Function Method For Solving Fuzzy Nonlinear Programming Problem (paperpublications3)
Abstract: In this work, the fuzzy nonlinear programming problem (FNLPP) is developed and its results are discussed. The numerical solutions of the crisp problems are compared, and the fuzzy solution and its effectiveness are presented and discussed. A penalty function method is developed and combined with the Nelder-Mead direct-search optimization algorithm, and the two are used together to solve the FNLPP.
Keywords: fuzzy set theory, fuzzy numbers, decision making, nonlinear programming, Nelder-Mead algorithm, penalty function method.
This document summarizes a research paper that examines how to estimate inputs for a cooperative and supportive neural network to achieve a desired output. The paper studies how time-varying inputs influence the network's behavior and presents theorems showing that if the inputs satisfy certain conditions, the network's solutions will converge to a pre-specified output. Estimates on restricting the inputs are provided. The results demonstrate that a neural network can be made to approach different outputs without changing its architecture by properly selecting the inputs.
QUALITY OF CLUSTER INDEX BASED ON STUDY OF DECISION TREE (IJORCS)
Quality of clustering is an important issue in the application of clustering techniques. Most traditional cluster validity indices are geometry-based cluster quality measures. This work proposes a cluster validity index based on the decision-theoretic rough set model, considering various loss functions. Real retail data show the usefulness of the proposed validity index for the evaluation of rough and crisp clustering. The measure is shown to help determine the optimal number of clusters, as well as an important parameter called the threshold in rough clustering. The experiments with a promotional campaign for the retail data illustrate the ability of the proposed measure to incorporate financial considerations in evaluating the quality of a clustering scheme. This ability to deal with monetary values distinguishes the proposed decision-theoretic measure from other distance-based measures. The proposed validity index can also be effective for evaluating other clustering approaches such as fuzzy clustering.
Cluster analysis involves grouping data objects into clusters so that objects within the same cluster are more similar to each other than objects in other clusters. There are several major clustering approaches including partitioning methods that iteratively construct partitions, hierarchical methods that create hierarchical decompositions, density-based methods based on connectivity and density, grid-based methods using a multi-level granularity structure, and model-based methods that find the best fit of a model to the clusters. Partitioning methods like k-means and k-medoids aim to optimize a partitioning criterion by iteratively updating cluster centroids or medoids.
The document discusses different clustering algorithms, including k-means and EM clustering. K-means aims to partition items into k clusters such that each item belongs to the cluster with the nearest mean. It works iteratively to assign items to centroids and recompute centroids until the clusters no longer change. EM clustering generalizes k-means by computing probabilities of cluster membership based on probability distributions, with the goal of maximizing the overall probability of items given the clusters. Both algorithms are used to group similar items in applications like market segmentation.
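A minimal sketch of the contrast for a two-component one-dimensional Gaussian mixture: where k-means assigns each point to exactly one centroid, the E-step below computes soft membership probabilities and the M-step re-estimates means, variances and weights from them. This is a generic illustration, not tied to the summarized document's code.

    import numpy as np

    def em_gmm_1d(x, iters=50, seed=0):
        """Minimal EM loop for a two-component 1-D Gaussian mixture."""
        rng = np.random.default_rng(seed)
        mu = rng.choice(x, 2)                    # component means
        var = np.array([x.var(), x.var()])       # component variances
        pi = np.array([0.5, 0.5])                # mixing weights
        for _ in range(iters):
            # E-step: responsibility of each component for each point
            dens = (pi / np.sqrt(2 * np.pi * var) *
                    np.exp(-(x[:, None] - mu) ** 2 / (2 * var)))
            resp = dens / dens.sum(axis=1, keepdims=True)
            # M-step: membership-weighted parameter updates
            nk = resp.sum(axis=0)
            mu = (resp * x[:, None]).sum(axis=0) / nk
            var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
            pi = nk / len(x)
        return mu, var, pi

    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])
    print(em_gmm_1d(x))   # means near -2 and 3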
Application of thermal error in machine tools based on Dynamic Bayesian Network (IJRES Journal)
In recent years, growing interest in complex manufacturing on machine tools and in machining accuracy has prompted new efforts in modeling and analyzing the machining errors of machine tools. The mathematical modeling of the relationship between the temperature field and thermal error is therefore the core content: it can improve the precision of parts processing and thermal stability, and it can predict and compensate the machining errors of CNC machine tools. It is critical to obtain the thermal errors of a precision machine tool in real time. In this paper, a pioneering modeling method for thermal error research based on a Dynamic Bayesian Network (DBN) is presented. The dependence of thermal error on the temperature field is clearly described by graph theory, and a fuzzy classification method is proposed to reduce the computational complexity, forming a new method for thermal error modeling of machine tools.
Probabilistic Methods Of Signal And System Analysis, 3rd Edition (Preston King)
Probabilistic Methods of Signal and System Analysis, 3/e stresses the engineering applications of probability theory, presenting the material at a level and in a manner ideally suited to engineering students at the junior or senior level. It is also useful as a review for graduate students and practicing engineers.
Thoroughly revised and updated, this third edition incorporates increased use of the computer in both text examples and selected problems. It utilizes MATLAB as a computational tool and includes new sections relating to Bernoulli trials, correlation of data sets, smoothing of data, computer computation of correlation functions and spectral densities, and computer simulation of systems. All computer examples can be run using the Student Version of MATLAB. Almost all of the examples and many of the problems have been modified or changed entirely, and a number of new problems have been added. A separate appendix discusses and illustrates the application of computers to signal and system analysis.
This document discusses various clustering techniques for image segmentation. It begins by defining clustering and image segmentation. It then describes four main clustering techniques - exclusive clustering (e.g. k-means), overlapping clustering (e.g. fuzzy c-means), hierarchical clustering, and probabilistic-D clustering. For each technique, it provides details on the clustering algorithm and steps. It concludes that fuzzy c-means is superior to other approaches for image segmentation efficiency but has high computational time, while probabilistic-D clustering aims to reduce this time.
Estimating Reconstruction Error due to Jitter of Gaussian Markov Processes (Mudassir Javed)
This paper presents an estimation of the reconstruction error due to jitter of Gaussian Markov processes. Two samples are considered for the analysis in two different situations. In one situation, the first sample has no jitter while the other is affected by jitter; in the second situation, both samples are affected by jitter. The probability density functions of the jitter are given by the Uniform distribution and the Erlang distribution. Statistical averaging is applied to the conditional expectation of the jitter random variable. From that, the conditional variance is obtained, which is defined as the reconstruction error function; knowing it, the reconstruction error of a Gaussian Markov process is determined.
Two Types of Novel Discrete Time Chaotic Systems (ijtsrd)
In this paper, two types of one-dimensional discrete-time systems are first proposed and their chaotic behaviors are numerically discussed. Based on the time-domain approach, an invariant set and the equilibrium points of such discrete-time systems are presented, and the stability of the equilibrium points is analyzed in detail. Finally, Lyapunov exponent plots as well as state responses and Fourier amplitudes of the proposed discrete-time systems are given to verify and demonstrate the chaotic behaviors. Yeong-Jeu Sun, "Two Types of Novel Discrete-Time Chaotic Systems", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4, Issue-2, February 2020. URL: https://www.ijtsrd.com/papers/ijtsrd29853.pdf
Paper URL: https://www.ijtsrd.com/engineering/electrical-engineering/29853/two-types-of-novel-discrete-time-chaotic-systems/yeong-jeu-sun
This document discusses different types of clustering analysis techniques in data mining. It describes clustering as the task of grouping similar objects together. The document outlines several key clustering algorithms including k-means clustering and hierarchical clustering. It provides an example to illustrate how k-means clustering works by randomly selecting initial cluster centers and iteratively assigning data points to clusters and recomputing cluster centers until convergence. The document also discusses limitations of k-means and how hierarchical clustering builds nested clusters through sequential merging of clusters based on a similarity measure.
A derivative free high ordered hybrid equation solver (Zac Darcy)
Generally, equation solvers for estimating the solution of an equation involve derivatives of first or higher order. Such solvers are difficult to apply when the functional relationship is complicated. The equation solver proposed in this paper is meant to handle many such complicated problems, establishing a process that tends towards a higher order by alloying already proven conventional methods: the Newton-Raphson method (N-R), the Regula Falsi method (R-F) and the Bisection method (BIS). The present method is well suited to nonlinear and transcendental equations that cannot be solved by basic algebra. Comparative analyses with the other competing formulas of this group show that the present method is faster than all such methods of its class.
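The precise alloying rule is not given in this summary, so the sketch below shows only the general pattern such hybrids follow: take the fast chord-based (regula falsi) step when it behaves, and fall back to a guaranteed bisection step when it does not. Treat it as an assumption-laden illustration, not the paper's algorithm.

    import math

    def hybrid_solve(f, a, b, tol=1e-12, max_iter=200):
        """Safeguarded bracketing hybrid: regula-falsi step with a bisection
        fallback whenever the chord step leaves the current bracket."""
        fa, fb = f(a), f(b)
        if fa * fb > 0:
            raise ValueError("f(a) and f(b) must have opposite signs")
        for _ in range(max_iter):
            # regula falsi candidate: root of the chord through (a,fa), (b,fb)
            x = b - fb * (b - a) / (fb - fa)
            if not (min(a, b) < x < max(a, b)):
                x = 0.5 * (a + b)          # safeguard: bisection step
            fx = f(x)
            if abs(fx) < tol or abs(b - a) < tol:
                return x
            if fa * fx < 0:
                b, fb = x, fx
            else:
                a, fa = x, fx
        return x

    # cos(x) = x is a transcendental equation with no algebraic solution:
    print(hybrid_solve(lambda t: math.cos(t) - t, 0.0, 1.0))   # ~0.739085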
Ripple Algorithm to Evaluate the Importance of Network Nodes (rahulmonikasharma)
In this paper, a ripple algorithm to evaluate the importance of network nodes is proposed. Its principle is based on the direct influence of adjacent nodes, with farther nodes affected indirectly through closer ones, like ripples on water. We define two criteria, the discrimination of node importance and the accuracy of key-node selection, to verify its efficiency: greater discrimination and higher accuracy mean a more effective algorithm. Finally, we performed experiments on the ARPA network to compare the efficiency of different methods: closeness centrality, node deletion, node contraction, the algorithm of Zhou Xuan et al., and the ripple method. The results show that the ripple algorithm outperforms the other measures in both the discrimination of node importance and the accuracy of key-node selection.
The document proposes a novel Spatial Fuzzy C-Means (PET-SFCM) clustering algorithm to segment PET scan images of patients with neurodegenerative disorders like Alzheimer's disease. The algorithm incorporates spatial neighborhood information into the traditional Fuzzy C-Means algorithm. It was tested on real patient data sets and showed satisfactory results compared to conventional FCM and K-Means clustering algorithms. The PET-SFCM algorithm provides an effective way to segment PET images and analyze brain changes related to neurological conditions.
Bayesian system reliability and availability analysis under the vague environment (ijsc)
Reliability modeling is the most important discipline of reliability engineering. The main purpose of this paper is to provide a methodology for working in a vague environment. Specifically, we discuss Bayesian system reliability and availability analysis in the vague environment, based on the Exponential distribution under squared-error symmetric and precautionary asymmetric loss functions. In order to apply the Bayesian approach, model parameters are assumed to be vague random variables with vague prior distributions. This approach is used to create the vague Bayes estimate of system reliability and availability by introducing and applying a theorem called the "Resolution Identity" for vague sets. For this purpose, the original problem is transformed into a nonlinear programming problem, which is then divided into eight subproblems to simplify computation. The results obtained for the subproblems are used to determine the membership functions of the vague Bayes estimate of system reliability, and the subproblems can be solved with any commercial optimizer, e.g. GAMS or LINGO.
This document discusses feature selection techniques for classification problems. It begins by outlining class separability measures like divergence, Bhattacharyya distance, and scatter matrices. It then discusses feature subset selection approaches, including scalar feature selection which treats features individually, and feature vector selection which considers feature sets and correlations. Examples are provided to demonstrate calculating class separability measures for different feature combinations on sample datasets. Exhaustive search and suboptimal techniques like forward, backward, and floating selection are discussed for choosing optimal feature subsets. The goal of feature selection is to select a subset of features that maximizes class separation.
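A small sketch of the scatter-matrix criterion plus greedy forward selection described above, using the common J = trace(Sw⁻¹Sb) separability score; the synthetic data and function names are illustrative, not from the document.

    import numpy as np

    def separability(X, y, feats):
        """trace(Sw^-1 Sb) class-separability score for feature subset `feats`:
        Sw is the within-class scatter, Sb the between-class scatter."""
        Xs = X[:, feats]
        mean = Xs.mean(axis=0)
        d = Xs.shape[1]
        Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
        for c in np.unique(y):
            Xc = Xs[y == c]
            mc = Xc.mean(axis=0)
            Sw += (Xc - mc).T @ (Xc - mc)
            Sb += len(Xc) * np.outer(mc - mean, mc - mean)
        return np.trace(np.linalg.solve(Sw + 1e-9 * np.eye(d), Sb))

    def forward_select(X, y, k):
        """Greedy forward selection: repeatedly add the feature that most
        improves the separability score (cheap but suboptimal search)."""
        chosen = []
        while len(chosen) < k:
            rest = [f for f in range(X.shape[1]) if f not in chosen]
            best = max(rest, key=lambda f: separability(X, y, chosen + [f]))
            chosen.append(best)
        return chosen

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 400)
    X = rng.normal(size=(400, 5))
    X[:, 2] += 3 * y                  # only feature 2 carries class information
    print(forward_select(X, y, 2))    # feature 2 should be picked first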
A STATISTICAL COMPARATIVE STUDY OF SOME SORTING ALGORITHMS (ijfcstjournal)
This research paper is a statistical comparative study of a few average-case asymptotically optimal sorting algorithms, namely Quick sort, Heap sort and K-sort. The three sorting algorithms, all with the same average-case complexity, have been compared by obtaining the corresponding statistical bounds while subjecting these procedures to randomly generated data from some standard discrete and continuous probability distributions, such as the Binomial distribution, the discrete and continuous Uniform distributions, and the Poisson distribution. The statistical analysis is well supplemented by the parameterized complexity analysis.
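A sketch of the kind of experiment this abstract describes: generate inputs from Binomial, Poisson and Uniform distributions and time two of the named algorithms on each. K-sort's code is not given here, so a textbook three-way quicksort and a heapq-based heap sort stand in.

    import heapq
    import time
    import numpy as np

    def quick_sort(a):
        """Three-way quicksort; the middle element is the pivot, so runs of
        duplicate keys (common for Binomial/Poisson data) stay cheap."""
        if len(a) <= 1:
            return a
        pivot = a[len(a) // 2]
        return (quick_sort([x for x in a if x < pivot])
                + [x for x in a if x == pivot]
                + quick_sort([x for x in a if x > pivot]))

    def heap_sort(a):
        heapq.heapify(a)
        return [heapq.heappop(a) for _ in range(len(a))]

    def timed(fn, data):
        t0 = time.perf_counter()
        fn(list(data))
        return time.perf_counter() - t0

    rng = np.random.default_rng(0)
    n = 100_000
    inputs = {
        "Binomial(20, 0.5)": rng.binomial(20, 0.5, n).tolist(),
        "Poisson(5)":        rng.poisson(5, n).tolist(),
        "Uniform[0, 1)":     rng.uniform(size=n).tolist(),
    }
    for name, data in inputs.items():
        print(f"{name:18s} quick={timed(quick_sort, data):.3f}s "
              f"heap={timed(heap_sort, data):.3f}s")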
An efficient fuzzy classifier with feature selection based on fuzzy entropy (ssairayousaf)
This document presents an efficient fuzzy classifier with feature selection capabilities. A fuzzy entropy measure is used to partition the input feature space into non-overlapping decision regions and to select relevant features. Fuzzy entropy evaluates the information of pattern distribution in the pattern space. The decision regions do not overlap, reducing computational complexity and load. Classification speed is extremely fast while still achieving good performance by correctly determining decision region boundaries. Feature selection via fuzzy entropy reduces dimensionality by discarding noisy, redundant, and unimportant features. The proposed classifier is applied to two databases with good classification results, demonstrating its effectiveness.
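The exact fuzzy entropy used is not given in this summary; one common choice is the De Luca-Termini measure, which is zero for crisp memberships and maximal at μ = 0.5, so low-entropy features are the informative ones. A sketch under that assumption:

    import numpy as np

    def fuzzy_entropy(mu, eps=1e-12):
        """De Luca-Termini fuzzy entropy of a membership vector mu in [0, 1]:
        zero when every membership is crisp (0 or 1), maximal at mu = 0.5.
        One common definition; the document's exact measure may differ."""
        mu = np.clip(mu, eps, 1.0 - eps)
        return -np.sum(mu * np.log(mu) + (1 - mu) * np.log(1 - mu))

    # A feature with nearly crisp memberships carries more decision
    # information (low entropy) than one hovering around 0.5:
    print(fuzzy_entropy(np.array([0.01, 0.99, 0.02, 0.97])))   # small
    print(fuzzy_entropy(np.array([0.45, 0.55, 0.50, 0.52])))   # near maximal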
Local Model Checking Algorithm Based on Mu-calculus with Partial Orders (TELKOMNIKA JOURNAL)
Model checking for the propositional μ-calculus can be divided into two categories, global model checking and local model checking, both of which aim at reducing time and space complexity effectively. This paper analyzes the computation of nested alternating fixpoints in detail and designs an efficient local model checking algorithm for the propositional μ-calculus based on a group of partial-order relations. Its time complexity is O(d^2 (dn)^(d/2+2)), where d is the depth of fixpoint nesting and n is the maximum number of nodes, and its space complexity is O(d (dn)^(d/2)). As far as we know, the best local model checking algorithms to date have time complexity whose exponent is d; in this paper, the exponent is reduced from d to d/2, making the algorithm more efficient than those of previous research.
DCE: A NOVEL DELAY CORRELATION MEASUREMENT FOR TOMOGRAPHY WITH PASSIVE REALIZATION (ijdpsjournal)
Tomography is important for network design and routing optimization, but prior approaches require either precise time synchronization or complex cooperation; furthermore, active tomography consumes explicit probing, resulting in limited scalability. To address the first issue we propose a novel Delay Correlation Estimation methodology, named DCE, with no need for synchronization or special cooperation. For the second issue we develop a passive realization mechanism that merely uses regular data flow, without explicit bandwidth consumption. Extensive simulations in OMNeT++ were made to evaluate its accuracy, showing that the DCE measurement closely matches the true value. The test results also show that the passive realization mechanism achieves both regular data transmission and the purpose of tomography, with excellent robustness across different background traffic and packet sizes.
New Approach of Preprocessing For Numeral Recognition (IJERA Editor)
The present paper proposes a new preprocessing approach for handwritten, printed and isolated numeral characters. The new approach reduces the size of the input image of each numeral by discarding redundant information, and it also reduces the number of features in the attribute vector produced by the feature extraction method. Numeral recognition is carried out in this work with k-nearest-neighbors and multilayer perceptron techniques. The simulations achieved a good recognition rate in less running time.
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Classifiers (Waqas Tariq)
Selection of inputs is one of the most substantial components of classification algorithms for data mining and pattern recognition problems, since even the best classifier will perform badly if the inputs are not selected well. Big data and computational complexity are the main causes of poor performance and low accuracy in classical classifiers; in other words, the complexity of a classifier method is inversely proportional to its classification efficiency. For this purpose, two hybrid classifiers have been developed using both type-1 and type-2 fuzzy c-means clustering cascaded with a classifier. In the proposed classifiers, a large number of data points is reduced by fuzzy c-means clustering before being applied as inputs to a classifier algorithm. The aim of this study is to investigate the effect of fuzzy clustering on well-known and useful classifiers such as artificial neural networks (ANN) and support vector machines (SVM). The positive effects of the proposed algorithms are then investigated on different data sets.
The document discusses finding initial parameters for neural networks used in data clustering. It presents a modified K-means fast learning artificial neural network (K-FLANN) algorithm that uses differential evolution to optimize the vigilance and tolerance parameters of K-FLANN. Differential evolution is used to select parameter values that yield good clustering performance. The modified K-FLANN algorithm is evaluated on several data sets and shown to provide promising results in terms of convergence rate and accuracy compared to other algorithms. Comparisons are also made between the original and modified versions of K-FLANN.
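A compact sketch of classic DE/rand/1/bin, the kind of differential evolution loop that could tune two scalar parameters such as K-FLANN's vigilance and tolerance; the quadratic objective below stands in for the real clustering-error objective, which is not given here.

    import numpy as np

    def differential_evolution(obj, bounds, pop=20, gens=100, F=0.8, CR=0.9,
                               seed=0):
        """DE/rand/1/bin: mutate with x_r1 + F*(x_r2 - x_r3), binomial
        crossover, greedy selection."""
        rng = np.random.default_rng(seed)
        lo, hi = np.array(bounds).T
        X = rng.uniform(lo, hi, (pop, len(bounds)))
        fit = np.array([obj(x) for x in X])
        for _ in range(gens):
            for i in range(pop):
                r1, r2, r3 = rng.choice([j for j in range(pop) if j != i],
                                        3, replace=False)
                v = np.clip(X[r1] + F * (X[r2] - X[r3]), lo, hi)
                mask = rng.random(len(bounds)) < CR
                mask[rng.integers(len(bounds))] = True   # keep at least one gene
                trial = np.where(mask, v, X[i])
                f = obj(trial)
                if f < fit[i]:                           # greedy selection
                    X[i], fit[i] = trial, f
        return X[np.argmin(fit)], fit.min()

    # Toy stand-in for "clustering error as a function of (vigilance, tolerance)":
    best, val = differential_evolution(
        lambda p: (p[0] - 0.7) ** 2 + (p[1] - 0.3) ** 2,
        bounds=[(0, 1), (0, 1)])
    print(best, val)   # close to (0.7, 0.3)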
IRJET- Performance Analysis of Optimization Techniques by using Clustering (IRJET Journal)
This document discusses optimization techniques for clustering algorithms. It introduces fuzzy bee colony optimization (FBCO) and compares its performance to fuzzy c-means (FCM) and the swarm-based fuzzy particle swarm optimization (FPSO). FBCO is motivated by the natural behaviors of bee colonies and aims to avoid local-minima problems. The document provides background on clustering, describes the FCM and FPSO algorithms, and proposes an FBCO algorithm to improve clustering performance.
Parametric sensitivity analysis of a mathematical model of facultative mutualism (IOSR Journals)
The complex dynamics of facultative mutualism is best described by a system of continuous non-linear first order ordinary differential equations. The methods of 1-norm, 2-norm, and infinity-norm will be used to quantify and differentiate the different forms of the sensitivity of model parameters. These contributions will be presented and discussed.
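One standard way to make the norm-based sensitivity measure precise (the paper's exact definitions are not given in this summary, so take this as a plausible formalization): perturb one parameter θ_i of the ODE system, sample the solution difference at times t_1, …, t_m,

    Δ_i = ( x(t_1; θ+δe_i) − x(t_1; θ), …, x(t_m; θ+δe_i) − x(t_m; θ) ),

and score the parameter by S_i^(1) = ‖Δ_i‖_1, S_i^(2) = ‖Δ_i‖_2 and S_i^(∞) = ‖Δ_i‖_∞, so that a larger S_i flags θ_i as more sensitive under the respective norm.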
Investigation on Using Fractal Geometry for Classification of Partial Discharge Patterns (IOSR Journals)
1) The document investigates using fractal geometry to classify partial discharge (PD) patterns from different insulation defects.
2) Fractal features like fractal dimension and lacunarity are extracted from 3D PD patterns using box counting. Each PD pattern has a unique fractal dimension and multiple lacunarity values depending on box size.
3) Neural networks are used to classify PD patterns based on their fractal features. The goal is to minimize input features to improve classification performance. Different lacunarity values from varying box sizes are analyzed to find those most useful for classification.
LINEAR SEARCH VERSUS BINARY SEARCH: A STATISTICAL COMPARISON FOR BINOMIAL INPUTS (IJCSEA Journal)
For certain algorithms, such as sorting and searching, the parameters of the input probability distribution, in addition to the size of the input, have been found to influence the complexity of the underlying algorithm. The present paper makes a statistical comparative study of parameterized complexity between the linear and binary search algorithms for binomial inputs.
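A quick sketch of the comparison the abstract describes: count comparisons for linear and binary search over keys drawn from a Binomial population. The parameters (30 trials, p = 0.5, n = 10,000 elements) are arbitrary illustration choices, not the paper's settings.

    import random

    def linear_comparisons(a, key):
        """Number of comparisons linear search makes before finding `key`."""
        for i, x in enumerate(a, start=1):
            if x == key:
                return i
        return len(a)

    def binary_comparisons(a, key):
        """Comparisons used by a textbook binary search on sorted `a`."""
        lo, hi, c = 0, len(a) - 1, 0
        while lo <= hi:
            mid = (lo + hi) // 2
            c += 1
            if a[mid] == key:
                return c
            if a[mid] < key:
                lo = mid + 1
            else:
                hi = mid - 1
        return c

    n, trials = 10_000, 200
    data = sorted(sum(random.random() < 0.5 for _ in range(30))
                  for _ in range(n))
    keys = [random.choice(data) for _ in range(trials)]
    print("linear avg:",
          sum(linear_comparisons(data, k) for k in keys) / trials)
    print("binary avg:",
          sum(binary_comparisons(data, k) for k in keys) / trials)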
The document analyzes the use of the Tent map as a source of pseudorandom bits for generating binary codes. It evaluates the Tent map's period length, discrimination value, and merit factor. The Tent map is proposed as an alternative to traditional low-complexity pseudorandom bit generators. Different window functions are applied to the binary codes generated from the Tent map to reduce side lobes and improve performance. Results show discrimination increases with sequence length and some window functions perform better than others at different lengths.
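A minimal tent-map bit generator with two of the sanity checks such evaluations use, bit balance and transition rate; window functions and merit-factor scoring are omitted. Note μ is kept slightly below 2 because the exact μ = 2 map collapses to zero in double precision (every iteration shifts the 53-bit mantissa left by one).

    def tent_bits(n, x=0.37, mu=1.9999):
        """Generate n pseudorandom bits from the tent map
            x <- mu * x           if x < 0.5
            x <- mu * (1 - x)     otherwise,
        emitting 1 whenever the state is above 0.5."""
        bits = []
        for _ in range(n):
            x = mu * x if x < 0.5 else mu * (1.0 - x)
            bits.append(1 if x >= 0.5 else 0)
        return bits

    b = tent_bits(10_000)
    print(sum(b) / len(b))                        # balance: close to 0.5
    runs = sum(b[i] != b[i + 1] for i in range(len(b) - 1))
    print(runs / (len(b) - 1))                    # transition rate: close to 0.5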
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n ≤ 70 Lakhs! (idescitation)
Sundararajan and Chakraborty [10] introduced a new version of Quick sort that removes the interchanges. Khreisat [6] found this algorithm to compete well with some other versions of Quick sort; however, it uses an auxiliary array, thereby increasing the space complexity. Here, we provide a second version of our new sort in which we have removed the auxiliary array. This second, improved version of the algorithm, which we call K-sort, is found to sort elements faster than Heap sort for appreciably large array sizes (n ≤ 70,00,000) for uniform U[0, 1] inputs.
Mathematical Modelling and Computer Simulation Assist in Designing Non-tradit... (IJECEIAES)
This document describes a study that uses mathematical modeling and computer simulation to analyze the processes involved in electrostatic separation and precipitation. The study develops computer models of cylindrical electrostatic separators charged with both direct current and alternating current. The models are based on solving systems of differential equations that describe the motion, forces, and electric fields affecting macroscopic charged particles within the separator space. The computer simulations analyze the kinematics, dynamics and trajectories of particles moving through the separators. They provide insights into critical operating states for AC charged separators and the effects of various factors like gradient force on particle motion. The models and simulations improve understanding of separation processes compared to prior empirical approaches and enable more complex analysis of particle ballistics.
Random Walks in Statistical Theory of Communication (IRJET Journal)
1. The document discusses random walks, which are a type of random process that can model phenomena like diffusion and stock price variations. Random walks involve successive random steps and can take place in discrete or continuous time and space.
2. The document provides details on modeling random walks mathematically using probabilities and binomial distributions. It also discusses calculating the probability of a random walk returning to its origin.
3. The document shows examples of using random walks to model particle motion in 2D and 3D spaces. It also discusses how continuous random variables can produce Brownian motion, a type of random walk. Random walks have various applications in fields like computer science, image processing, genetics and neuroscience.
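A sketch of point 2's return-to-origin question by simulation: for the simple symmetric one-dimensional walk, the position after 2n steps is binomial, with P(S_2n = 0) = C(2n, n)/4^n, and the walk returns to the origin with probability 1, which the estimate below approaches as the step budget grows.

    import random

    def returned_to_origin(steps):
        """Simulate a simple symmetric 1-D random walk and report whether it
        revisits the origin within `steps` steps."""
        pos = 0
        for _ in range(steps):
            pos += random.choice((-1, 1))
            if pos == 0:
                return True
        return False

    trials, steps = 20_000, 1_000
    hits = sum(returned_to_origin(steps) for _ in range(trials))
    # In 1-D the walk returns with probability 1, so the estimate rises
    # toward 1 as `steps` grows (about 0.97 for 1000 steps).
    print(hits / trials)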
1. The document presents a new approach for steganography detection using a combination of Fisher's linear discriminant function (FLD) and radial basis function neural network (RBF).
2. In the training phase, FLD is used to project high-dimensional image data onto a lower dimensional space, then an RBF network is trained to classify images as containing hidden data or not.
3. Experiments show the combined FLD-RBF method provides promising results for steganography detection compared to existing supervised methods, though extracting the hidden information remains challenging.
This document analyzes the performance of QuickSort and MergeSort sorting algorithms through benchmarking. It benchmarks the two algorithms on five datasets - two presorted, three random. The results show QuickSort performs worse on presorted data as expected, while MergeSort performs consistently. Some anomalies are discussed in the results. The document also discusses the methodology, limitations, expectations and lessons learned from the analysis.
Parametric Sensitivity Analysis of a Mathematical Model of Two Interacting Po...IOSR Journals
Experts in the mathematical modeling for two interacting technologies have observed the different contributions between the intraspecific and the interspecific coefficients in conjunction with the starting population sizes and the trading period. In this complex multi-parameter system of competing technologies which evolve over time, we have used the numerical method of mathematical norms to measure the sensitivity values of the intraspecific coefficients b and e, the starting population sizes of the two interacting technologies and the duration of trading. We have observed that the two intraspecific coefficients can be considered as most sensitive parameter while the starting populations are called least sensitive. We will expect these contributions to provide useful insights in the determination of the important parameters which drive the dynamics of the technological substitution model in the context of one-at-a-timesensitivity analysis
Similar to Partition Sort Revisited: Reconfirming the Robustness in Average Case and Much More (20)
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
International Journal on Computational Sciences & Applications (IJCSA), Vol. 2, No. 1, February 2012
DOI: 10.5121/ijcsa.2012.2102
PARTITION SORT REVISITED: RECONFIRMING THE ROBUSTNESS IN AVERAGE CASE AND MUCH MORE!

Niraj Kumar Singh¹, Mita Pal² and Soubhik Chakraborty³*
¹Department of Computer Science & Engineering, B.I.T. Mesra, Ranchi-835215, India
²,³Department of Applied Mathematics, B.I.T. Mesra, Ranchi-835215, India
*Email address of the corresponding author: soubhikc@yahoo.co.in (S. Chakraborty)
ABSTRACT
In our previous work there was some indication that Partition Sort could be having a more robust
average case O(nlogn) complexity than the popular Quick sort. In our first study in this paper, we
reconfirm this through computer experiments for inputs from Cauchy distribution for which
expectation theoretically does not exist. Additionally, the algorithm is found to be sensitive to
parameters of the input probability distribution demanding further investigation on parameterized
complexity. The results on this algorithm for Binomial inputs in our second study are very
encouraging in that direction.
KEY WORDS
Partition Sort; average case complexity; robustness; parameterized complexity; computer experiments; factorial experiments
1. Introduction
Average complexity is an important field of study in algorithm analysis, as it explains how certain algorithms with bad worst case complexity, such as Quick Sort, perform better on the average. The danger in making such a claim often lies in not verifying the robustness of the average complexity in question. Average complexity is obtained theoretically by applying mathematical expectation to the dominant operation or region in the code. One problem is that, for a complex code, it is not easy to identify the dominant operation. This can be resolved by replacing the count-based mathematical bound with a weight-based statistical bound, which permits collective consideration of all operations, and then estimating it by working directly on time, regarding the time consumed by an operation as its weight. A bigger problem is that the probability distribution over which the expectation is taken may not be realistic over the domain of the problem. Algorithm books derive these expectations for uniform probability inputs. Nothing is stated explicitly about whether the results hold for non-uniform inputs, nor is there any indication as to how realistic the uniform input is over the domain of the problem.
The rejection of Knuth's proof in [1] and of Hoare's proof in [2] for non-uniform inputs should be a curtain raiser in that direction. Similarly, it appears from [3] that the average complexity of Schoor's matrix multiplication algorithm is not the expected number of multiplications, O(d₁d₂n³), where d₁ and d₂ are the densities (fractions of non-zero elements) of the pre- and post-factor matrices, but the exact number of comparisons, which is n², provided there are sufficient zeroes; surprisingly, we do not need a sparse matrix to get an empirical O(n²) complexity! This result was obtained using a statistical bound estimate and shows that multiplication need not be the dominant operation in every matrix multiplication algorithm under certain input conditions.
In our previous work we introduced Partition Sort and found indications that it has a more robust average case O(nlogn) complexity than the popular Quick Sort. In our first study in this paper, we reconfirm this through computer experiments for inputs from the Cauchy distribution, for which expectation theoretically does not exist! Additionally, the algorithm is found to be sensitive to the parameters of the input probability distribution, demanding further investigation into its parameterized complexity. This is confirmed for Binomial inputs in our second study.
The Algorithm Partition Sort
The Partition Sort algorithm is based on the divide and conquer paradigm. The function "partition" is the key sub-routine of this algorithm. Applied to an input A[1...n], it divides the list into two halves of sizes floor(n/2) and ceiling(n/2) respectively, such that the value of each element in the first half is less than the value of every element in the second half. The Partition Sort routine is then called on each half recursively to obtain the sorted sequence. Partition Sort was introduced by Singh and Chakraborty [4], who obtained an O(nlog₂²n) worst case count, an O(nlog₂n) best case count, and an empirical O(nlog₂n) statistical bound estimate in the average case by working directly on time, for the reasons stated earlier.
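No listing is given here, so the following is only a minimal sketch of the scheme just described, with hypothetical helper names (the actual routine of [4] may differ in detail): partition places the middle order statistic so that no element of the left part exceeds any element of the right part, and the two parts are then sorted recursively.

```python
def partition(a, lo, hi):
    """Rearrange a[lo..hi] around its middle position mid and return mid,
    so that a[lo..mid-1] <= a[mid] <= a[mid+1..hi]."""
    mid = (lo + hi) // 2
    left, right = lo, hi
    while True:
        pivot = a[(left + right) // 2]
        i, j = left, right
        while i <= j:                       # Hoare-style pass around pivot
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        if mid <= j:                        # target lies in the left piece
            right = j
        elif mid >= i:                      # target lies in the right piece
            left = i
        else:                               # a[mid] is in its final place
            return mid

def partition_sort(a, lo=0, hi=None):
    """Sort the list a in place."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    mid = partition(a, lo, hi)
    partition_sort(a, lo, mid - 1)          # recurse on the two near-halves
    partition_sort(a, mid + 1, hi)
```

Because every call splits at the middle position, the recursion depth is O(log n), unlike Quick Sort, whose depth degenerates on bad pivot sequences.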
2. Statistical Analysis
2.1 Reconfirming the robustness of average complexity of Partition Sort
Theorem 1: If U₁ and U₂ are two independent uniform U[0, 1] variates, then Z₁ and Z₂ defined below are two independent standard Normal variates:

Z₁ = (-2 ln U₁)^(1/2) cos(2πU₂);  Z₂ = (-2 ln U₁)^(1/2) sin(2πU₂)

This result is called the Box-Muller transformation.

Theorem 2: If Z₁ and Z₂ are two independent standard Normal variates, then Z₁/Z₂ is a standard Cauchy variate. For more details, we refer to [5].
The Cauchy distribution is an unconventional distribution for which expectation does not exist theoretically. Hence it is not possible to derive the average case complexity theoretically for inputs from this distribution. Working directly on time, using computer experiments, we have obtained an empirical O(nlogn) complexity in average sorting time for Partition Sort on Cauchy distribution inputs, which we simulated using Theorems 1 and 2 above.
This result goes a long way in reconfirming that Partition Sort's average complexity is more robust than that of Quick Sort. In [4] we proved theoretically that its worst case complexity is also much better than that of Quick Sort, since O(nlog₂²n) < O(n²). Although Partition Sort is inferior to Heap Sort's O(nlogn) worst case complexity, it is still easier to program than Heap Sort.
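As an illustration of the simulation scheme (a minimal sketch, not the authors' code; partition_sort is the sketch given earlier), Theorems 1 and 2 yield standard Cauchy variates from uniform ones:

```python
import math
import random
import time

def standard_cauchy():
    u1 = 1.0 - random.random()              # in (0, 1], so log(u1) is defined
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    z1 = r * math.cos(2.0 * math.pi * u2)   # Theorem 1: two independent N(0, 1)
    z2 = r * math.sin(2.0 * math.pi * u2)
    return z1 / z2                          # Theorem 2: standard Cauchy variate

for n in (10000, 50000, 100000):
    data = [standard_cauchy() for _ in range(n)]
    t0 = time.perf_counter()
    partition_sort(data)
    print(f"n = {n}: {time.perf_counter() - t0:.5f} s")
```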
Table 1 and figure 1 based on table 1 summarize our results.
Table 1: Average time for Partition Sort for Cauchy distribution inputs
N                10000    20000    30000    40000    50000    60000    70000    80000    90000    100000
Mean time (sec.) 0.05168  0.10816  0.1487   0.17218  0.20494  0.24078  0.2659   0.31322  0.35128  0.39496
Fig. 1: Regression model suggesting empirical O(nlogn) complexity
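The check behind Fig. 1 can be reproduced from Table 1 by ordinary least squares; a hedged sketch (numpy assumed; not the authors' tooling) regressing mean time on n log n:

```python
import numpy as np

n = np.array([10000, 20000, 30000, 40000, 50000,
              60000, 70000, 80000, 90000, 100000], dtype=float)
t = np.array([0.05168, 0.10816, 0.1487, 0.17218, 0.20494,
              0.24078, 0.2659, 0.31322, 0.35128, 0.39496])

x = n * np.log2(n)                          # candidate O(nlogn) regressor
X = np.column_stack([np.ones_like(x), x])   # intercept + slope
beta, *_ = np.linalg.lstsq(X, t, rcond=None)
pred = X @ beta
r2 = 1 - np.sum((t - pred) ** 2) / np.sum((t - t.mean()) ** 2)
print(f"t ~ {beta[0]:.5f} + {beta[1]:.3e} * n*log2(n), R^2 = {r2:.4f}")
```

An R² close to 1 for the n log n regressor is the kind of evidence summarized in Fig. 1.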
2.2 Partition Sort subjected to parameterized complexity analysis
Parameterized complexity is a branch of computational complexity theory that focuses on classifying computational problems according to their inherent difficulty with respect to multiple parameters of the input. The complexity of a problem is then measured as a function of those parameters. This allows NP-hard problems to be classified on a finer scale than in the classical setting, where the complexity of a problem is measured only by the number of bits in the input (see also http://en.wikipedia.org/wiki/Parameterized_complexity). The first systematic work on parameterized complexity was done by Downey and Fellows [6]. The authors in [7] have argued strongly, both theoretically and experimentally, that for certain algorithms like sorting the parameters of the input distribution should also be taken into account in explaining the complexity, not just the parameter characterizing the size of the input. The second study is accordingly devoted to parameterized complexity analysis, in which the sorting elements supplied to Partition Sort come independently from a Binomial(m, p) distribution. Use is made of factorial experiments to investigate the individual effects of the number of sorting elements (n) and the Binomial distribution parameters (m, the number of independent trials, and p, the fixed probability of success in a single trial), as well as their interaction effects.
A 3³ factorial experiment is conducted with three levels for each of the three factors n, m and p. All three factors are found to be significant, both individually and interactively.
Table 2 gives the data for the factorial experiment used in our second study on parameterized complexity.
Table 2: Data for the 3³ factorial experiment for Partition Sort
Partition Sort times in seconds for Binomial(m, p) distribution inputs, for n = 50000, 100000, 150000; m = 100, 1000, 1500; and p = 0.2, 0.5, 0.8. Each reading is averaged over 50 readings.
n = 50000
m p=0.2 p=0.5 p=0.8
100 0.07248 0.07968 0.07314
1000 0.09662 0.10186 0.09884
1500 0.10032 0.10618 0.10212
n=100000
m p=0.2 p=0.5 p=0.8
100 0.16502 0.1734 0.16638
1000 0.21394 0.22318 0.21468
1500 0.22194 0.23084 0.22356
n = 150000
m p=0.2 p=0.5 p=0.8
100 0.26242 0.27632 0.26322
1000 0.33988 0.35744 0.34436
1500 0.35648 0.37 0.35572
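Each cell of Table 2 is a mean over 50 timings on freshly generated Binomial(m, p) data; a hedged sketch of producing one cell (partition_sort from the earlier sketch; the sum-of-Bernoullis generator is illustrative, not the authors' generator):

```python
import random
import time

def mean_sort_time(n, m, p, reps=50):
    """Mean Partition Sort time over `reps` runs on n Binomial(m, p) variates."""
    total = 0.0
    for _ in range(reps):
        # A Binomial(m, p) variate as the sum of m Bernoulli(p) trials.
        data = [sum(random.random() < p for _ in range(m)) for _ in range(n)]
        t0 = time.perf_counter()
        partition_sort(data)
        total += time.perf_counter() - t0
    return total / reps

print(mean_sort_time(50000, 100, 0.2))      # cf. the 0.07248 cell of Table 2
```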
Table 3 gives the results, obtained using the MINITAB statistical package, version 15.
Table 3: Results of the 3³ factorial experiment on Partition Sort
General Linear Model: y versus n, m, p
Factor Type Levels Values
n fixed 3 1, 2, 3
m fixed 3 1, 2, 3
p fixed 3 1, 2, 3
Analysis of Variance for y, using Adjusted SS for Tests
Source DF Seq SS Adj SS Adj MS F P
n 2 0.731167 0.731167 0.365584 17077435.80 0.000
m 2 0.056680 0.056680 0.028340 1323846.78 0.000
p 2 0.001440 0.001440 0.000720 33637.34 0.000
n*m 4 0.011331 0.011331 0.002833 132322.02 0.000
n*p 4 0.000283 0.000283 0.000071 3302.87 0.000
m*p 4 0.000034 0.000034 0.000009 397.33 0.000
n*m*p 8 0.000046 0.000046 0.000006 266.70 0.000
Error 54 0.000001 0.000001 0.000000
Total 80 0.800982
S = 0.000146313 R-Sq = 100.00% R-Sq(adj) = 100.00%
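The analysis of Table 3 can be approximated outside MINITAB. Below is a hedged sketch using pandas/statsmodels (assumed available) on the 27 cell means of Table 2; since the means carry no replicates, the saturated model with the n*m*p term would leave zero error degrees of freedom, so the sketch drops the three-way term, whereas the paper's Table 3 rests on replicated timings (Total df = 80).

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

ns, ms, ps = [50000, 100000, 150000], [100, 1000, 1500], [0.2, 0.5, 0.8]
# Table 2 cell means: one 3x3 block (rows m, columns p) per value of n.
blocks = [
    [[0.07248, 0.07968, 0.07314], [0.09662, 0.10186, 0.09884], [0.10032, 0.10618, 0.10212]],
    [[0.16502, 0.17340, 0.16638], [0.21394, 0.22318, 0.21468], [0.22194, 0.23084, 0.22356]],
    [[0.26242, 0.27632, 0.26322], [0.33988, 0.35744, 0.34436], [0.35648, 0.37000, 0.35572]],
]
df = pd.DataFrame(
    [(n, m, p, blocks[i][j][k])
     for i, n in enumerate(ns) for j, m in enumerate(ms) for k, p in enumerate(ps)],
    columns=["n", "m", "p", "y"],
)

# Fixed categorical factors, main effects plus all two-way interactions.
model = ols("y ~ (C(n) + C(m) + C(p)) ** 2", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```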
3. Discussion and more statistical analysis
Partition Sort is highly affected by the main effects n, m and p. When we consider the interaction effects, interestingly we find that all interactions are significant. Strikingly, even the three-factor interaction n*m*p cannot be neglected. This means Partition Sort is quite sensitive to the parameters of the input distribution and hence qualifies as a potential candidate for deep investigation in parameterized complexity, both theoretically (through counts) and experimentally (through weights), for inputs from other distributions. Further, we have obtained some interesting patterns showing how the Binomial parameters influence the average sorting time; our investigation into a theoretical justification of these patterns is ongoing. The final results are summarized in tables 4-5 and in figures 2A, 2B and 3 based on these tables.
Each entry in the following tables is averaged over 50 readings.
Table 4: Partition Sort, Binomial (m, p) distribution, array size N=50000, p=0.5 fixed
m                100      300      500      700      900      1100     1300     1500
Mean time (sec.) 0.07968  0.09066  0.09586  0.09968  0.10154  0.10438  0.10282  0.10618
Fig. 2A: Third degree polynomial fit captures the trend (average time of Partition Sort in sec. against m of Binomial(m, p); fitted curve y = 1E-11x³ - 5E-08x² + 7E-05x + 0.0739, R² = 0.9916)
Fig. 2B: Fourth degree polynomial appears to be a forced fit (over-fit); don't get carried away by the higher value of R²! (Fitted curve y = -5E-14x⁴ + 2E-10x³ - 2E-07x² + 0.0001x + 0.0703, R² = 0.9987.)
Although the fourth degree polynomial fit gives a higher value of R², it forces the fit to pass through all the data points. The essence of curve fitting lies in catching the trend (in the population) exhibited by the observations, rather than catching the observations themselves (which reflect only a sample). Besides, a bound estimate must look like a bound estimate, and it is stronger to write y_avg(n, m, p) = O_emp(m³) than y_avg(n, m, p) = O_emp(m⁴) for fixed n and p. So we accept the first of the two propositions.
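The point can be checked numerically; a hedged sketch (numpy assumed; not the authors' spreadsheet fits) refitting the Table 4 data at degrees 3 and 4 shows how small the R² gain from the extra degree is:

```python
import numpy as np

m = np.array([100, 300, 500, 700, 900, 1100, 1300, 1500], dtype=float)
t = np.array([0.07968, 0.09066, 0.09586, 0.09968,
              0.10154, 0.10438, 0.10282, 0.10618])

for deg in (3, 4):
    coeffs = np.polyfit(m, t, deg)          # least-squares polynomial fit
    pred = np.polyval(coeffs, m)
    r2 = 1 - np.sum((t - pred) ** 2) / np.sum((t - t.mean()) ** 2)
    print(f"degree {deg}: R^2 = {r2:.4f}")
```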
Table 5: Partition Sort, Binomial distribution (m, p), n=50000, m=1000 fixed
p                0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9
Mean time (sec.) 0.09084  0.09662  0.09884  0.10198  0.10186  0.10034  0.0989   0.09884  0.09096
Fig. 3: Second degree polynomial fit captures the trend (average time of Partition Sort in sec. against p of Binomial(m, p); fitted curve y = -0.0649x² + 0.0659x + 0.0853, R² = 0.9293)
Fitting higher-degree polynomials leads to over-fitting (details omitted), and by the previous arguments we put y_avg(n, m, p) = O_emp(p²) for fixed n and m.
For definitions of statistical bound and empirical O, we refer to [4]. For a list of properties of a
statistical complexity bound as well as to understand what design and analysis of computer
experiments mean when the response is a complexity such as time, [8] may be consulted.
4. Conclusion and suggestions for future work
REFERENCES
[7] S. Chakraborty, S. K. Sourabh, M. Bose and K. Sushant, Replacement sort revisited: The "gold standard" unearthed!, Applied Mathematics and Computation, vol. 189(2), 2007, pp. 384-394.
[8] S. Chakraborty and S. K. Sourabh, A Computer Experiment Oriented Approach to Algorithmic Complexity, Lambert Academic Publishing, 2010.
[9] J. Sacks, W. Welch, T. Mitchell and H. Wynn, Design and Analysis of Computer Experiments, Statistical Science, vol. 4(4), 1989.
[10] K. T. Fang, R. Li and A. Sudjianto, Design and Modeling of Computer Experiments, Chapman and Hall, 2006.