This document proposes an improved particle swarm optimization (PSO) algorithm for data clustering that incorporates a Gauss chaotic map. PSO is often prone to premature convergence, so the proposed method uses sequences generated by the Gauss chaotic map in place of PSO's random parameters, providing broader exploration of the search space. The algorithm is tested on six real-world datasets and shown to outperform K-means, standard PSO, and other hybrid clustering algorithms. The key aspects of the proposed GaussPSO method and the experimental results demonstrating its effectiveness are described.
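As a sketch of the idea (not necessarily the paper's exact formulation), the Gauss map G(x) = 1/x mod 1 can supply the stochastic coefficients r1, r2 in the PSO velocity update; the inertia and acceleration constants below are conventional defaults assumed for illustration:

```python
def gauss_map(x):
    """Gauss chaotic map: G(x) = (1/x) mod 1, with G(0) = 0."""
    if x == 0.0:
        return 0.0
    return (1.0 / x) % 1.0

def chaotic_sequence(seed, n):
    """Generate n values in [0, 1) by iterating the Gauss map."""
    seq, x = [], seed
    for _ in range(n):
        x = gauss_map(x)
        seq.append(x)
    return seq

# In standard PSO the velocity update draws r1, r2 uniformly at random;
# the chaotic variant substitutes successive Gauss-map values instead.
def pso_velocity(v, x, pbest, gbest, r1, r2, w=0.7, c1=1.5, c2=1.5):
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
```

Swapping the uniform draws for the chaotic sequence changes only where r1 and r2 come from; the rest of the PSO loop is untouched.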
A New Enhanced Method of Non-Parametric Power Spectrum Estimation (CSCJournals)
Spectral analysis of nonuniformly sampled data sequences is classically performed with the Fourier periodogram. This paper explains why, from both data-fitting and computational standpoints, the least-squares periodogram (LSP) is preferable to the classical Fourier periodogram, and also to the frequently used form of the LSP due to Lomb and Scargle. It then presents a new method of spectral analysis for nonuniform data sequences that can be interpreted as an iteratively weighted LSP employing a data-dependent weighting matrix built from the most recent spectral estimate. Because the method is iterative and uses adaptive (i.e., data-dependent) weighting, it is referred to as the iterative adaptive approach (IAA). LSP and IAA are nonparametric methods that can be applied to the spectral analysis of general data sequences with both continuous and discrete spectra; however, they are most suitable for data sequences with discrete spectra (i.e., sinusoidal data), which is the case emphasized in this paper. Of the existing methods for nonuniform sinusoidal data, the Welch, MUSIC, and ESPRIT methods appear closest in spirit to the IAA proposed here. Indeed, all of these methods make use of the estimated covariance matrix that IAA computes in its first iteration from the LSP. MUSIC and ESPRIT, however, are parametric methods that require a guess of the number of sinusoidal components present in the data, without which they cannot be used.
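For a single trial frequency, the least-squares periodogram amounts to fitting one sinusoid to the nonuniform samples by ordinary least squares; this sketch (with made-up sample times) shows the idea, not the paper's full IAA iteration:

```python
import math

def lsp_power(t, y, omega):
    """Least-squares periodogram value at angular frequency omega:
    fit y(t) ~ a*cos(omega*t) + b*sin(omega*t) by ordinary least squares
    over the (possibly nonuniform) sample times t, and return the
    explained (model) sum of squares of the fit."""
    cc = sum(math.cos(omega * ti) ** 2 for ti in t)
    ss = sum(math.sin(omega * ti) ** 2 for ti in t)
    cs = sum(math.cos(omega * ti) * math.sin(omega * ti) for ti in t)
    yc = sum(yi * math.cos(omega * ti) for ti, yi in zip(t, y))
    ys = sum(yi * math.sin(omega * ti) for ti, yi in zip(t, y))
    det = cc * ss - cs * cs          # 2x2 normal-equation determinant
    a = (ss * yc - cs * ys) / det
    b = (cc * ys - cs * yc) / det
    return a * yc + b * ys

# Nonuniform sampling of a single sinusoid at omega = 2.0
times = [0.0, 0.13, 0.41, 0.97, 1.3, 2.02, 2.6, 3.1, 3.9, 4.4]
data = [math.cos(2.0 * ti + 0.5) for ti in times]
```

Evaluating `lsp_power` on a grid of frequencies yields the periodogram; the peak falls at the true frequency, where the fit explains (nearly) all of the signal energy.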
The document presents a methodology for decomposing dependence for high-dimensional extremes using a tail pairwise dependence matrix (TPDM). The authors apply the method to analyze spatial precipitation extremes across 409 stations in the United States. They estimate the TPDM from the station data and perform an eigendecomposition to identify principal components that explain extremal dependence structures. The first few eigenvectors and eigenvalues account for a large proportion of the total dependence. The extremal principal components may provide insight into climate drivers of extreme precipitation patterns.
A brief talk on reservoir computing from the perspective of dynamical system. Mostly based on these 2 papers:
1. Pathak, J., Hunt, B., Girvan, M., Lu, Z., & Ott, E. (2018). Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach. Physical review letters, 120(2), 024102.
2. A Parsimonious Dynamical Model for Structural Learning in the Human Brain. arXiv preprint arXiv:1807.05214.
In this paper, we propose an improved quantum-behaved particle swarm optimization (QPSO) that introduces chaos theory into QPSO by applying a logistic map to each particle position X(t) with a certain probability. In this improved QPSO, the logistic map generates a set of chaotic offsets that produce multiple candidate positions around X(t), and the particle's position is updated according to their fitness. To further enhance the diversity of particles, a mutation operation is introduced that acts on one dimension of the particle's position. Moreover, the chaos and mutation probabilities are carefully selected. Experiments on several typical benchmark functions show that the convergence accuracy of the improved QPSO is better than that of standard QPSO, so introducing chaos theory and mutation operations into QPSO is both feasible and effective.
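The chaotic local search described above can be sketched as follows; the offset radius, candidate count, and test function are illustrative assumptions, not the paper's exact settings:

```python
def logistic_map(z, mu=4.0):
    """Logistic map z -> mu*z*(1-z); mu = 4 gives fully chaotic behavior."""
    return mu * z * (1.0 - z)

def chaotic_local_search(x, fitness, n_candidates=5, radius=0.1, z0=0.37):
    """Generate candidate positions around x using logistic-map offsets
    and keep the best according to the fitness (lower is better)."""
    best, best_fit = x, fitness(x)
    z = z0
    for _ in range(n_candidates):
        z = logistic_map(z)
        candidate = x + radius * (2.0 * z - 1.0)  # offset in [-radius, radius]
        f = fitness(candidate)
        if f < best_fit:
            best, best_fit = candidate, f
    return best

# Minimize f(x) = (x - 1)^2 starting from x = 0.9
sphere = lambda x: (x - 1.0) ** 2
better = chaotic_local_search(0.9, sphere)
```

Because the incumbent position is kept when no candidate improves on it, the search never makes the particle worse.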
A COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTION (ijsc)
Two modern optimization methods, Particle Swarm Optimization and Differential Evolution, are compared on twelve constrained nonlinear test functions. Overall, the results show that Differential Evolution outperforms Particle Swarm Optimization in terms of solution quality, running time, and robustness.
ON ESTIMATION OF TIME SCALES OF MASS TRANSPORT IN INHOMOGENEOUS MATERIAL (Zac Darcy)
In this paper we generalize a recently introduced approach for estimating time scales of mass transport in inhomogeneous materials under the influence of an inhomogeneous potential field. Several examples of applying the approach are considered.
This document introduces an R package called PSF that implements a Pattern Sequence based Forecasting (PSF) algorithm for univariate time series forecasting. The PSF algorithm clusters time series data and then predicts future values based on identifying repeating patterns of clusters. The PSF package contains functions that perform the main steps of the PSF algorithm, including selecting the optimal number of clusters, selecting the optimal window size, and making predictions for a given window size and number of clusters. The package aims to promote and simplify the use of the PSF algorithm for time series forecasting.
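A minimal sketch of the PSF prediction step, assuming clustering has already assigned a label to each time step (the fallback rule and the toy data are illustrative, not the package's exact behavior):

```python
def psf_predict(labels, values, window):
    """Pattern Sequence-based Forecasting, minimal sketch:
    given a cluster label per time step and the original values,
    find past occurrences of the most recent `window` labels and
    average the values that followed them."""
    pattern = labels[-window:]
    successors = []
    for i in range(len(labels) - window):        # scan the history
        if labels[i:i + window] == pattern:
            successors.append(values[i + window])
    if not successors:                           # fallback: shrink the window
        return psf_predict(labels, values, window - 1) if window > 1 else values[-1]
    return sum(successors) / len(successors)

# Labels from a (hypothetical) clustering step; the repeating
# pattern [0, 1] has always been followed by a value near 10.
labels = [0, 1, 2, 0, 1, 2, 0, 1]
values = [1.0, 5.0, 10.0, 1.2, 5.1, 9.8, 1.1, 5.2]
pred = psf_predict(labels, values, window=2)
```

The package additionally searches over the number of clusters and the window size; the sketch fixes both to keep the pattern-matching core visible.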
This document discusses various approaches for data fusion, which refers to statistically combining data from different sources. The main approaches covered are data assimilation, optimal interpolation, variational methods, and the Kalman filter. Data assimilation aims to combine model output with observations to estimate the true state. Optimal interpolation finds the best linear combination of a background field and observations to minimize error. Variational methods determine the state by minimizing a cost function, while the Kalman filter sequentially assimilates observations using forecast and analysis steps. The goal of all these approaches is to integrate multiple data sources to obtain a better estimate of the true state than using any one source alone.
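As a concrete illustration of the forecast/analysis cycle described above, here is a minimal scalar Kalman filter; the noise variances `q` and `r` and the constant-state example are assumptions for the sketch, not values from the document:

```python
def kalman_step(x, p, z, q=0.01, r=0.25, f=1.0, h=1.0):
    """One forecast/analysis cycle of a scalar Kalman filter.
    x, p -- prior state estimate and its variance
    z    -- new observation
    q, r -- process and observation noise variances
    f, h -- state transition and observation coefficients."""
    # Forecast step: propagate the state and its uncertainty
    x_f = f * x
    p_f = f * p * f + q
    # Analysis step: blend the forecast with the observation
    k = p_f * h / (h * p_f * h + r)      # Kalman gain in [0, 1]
    x_a = x_f + k * (z - h * x_f)
    p_a = (1.0 - k * h) * p_f
    return x_a, p_a

# Repeatedly observing a constant true state of 5.0 pulls the
# estimate toward 5.0 while shrinking its variance.
x, p = 0.0, 1.0
for _ in range(20):
    x, p = kalman_step(x, p, 5.0)
```

Optimal interpolation corresponds to a single analysis step with a fixed background variance; the Kalman filter repeats the cycle so the background is itself the previous analysis.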
Availability of a Redundant System with Two Parallel Active Components (theijes)
This paper considers a redundant system which consists of two parallel active components. The time-to-failure and the time-to-repair of the components follow an exponential and a general distribution, respectively. The repairs of failed components are randomly interrupted. The time-to-interrupt is taken from an exponentially distributed random variable and the interrupt times are generally distributed. We obtain the availability for the system.
This document discusses unsupervised learning and clustering algorithms. It begins with an introduction to unsupervised learning, including motivations and differences from supervised learning. It then covers mixture density models, maximum likelihood estimation, and the k-means clustering algorithm. It discusses evaluating clustering using criterion functions and similarity measures. Specific topics covered include normal mixture models, EM algorithm, Euclidean distance, and hierarchical clustering.
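The k-means step covered above can be sketched in a few lines; the deterministic seeding (evenly spaced points, assuming k >= 2) and the toy blobs are illustrative choices, not from the document:

```python
def kmeans(points, k, iters=10):
    """Plain k-means on 2-D points: alternately assign each point to its
    nearest centroid, then move each centroid to the mean of its points.
    Deterministic seeding keeps this sketch reproducible; real
    implementations usually seed randomly. Assumes k >= 2."""
    centroids = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[j].append(p)
        for j, cl in enumerate(clusters):
            if cl:  # keep the old centroid if a cluster empties
                centroids[j] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids

# Two well-separated blobs around (0, 0) and (10, 10)
pts = [(0.1, 0.2), (0.0, -0.1), (0.2, 0.0),
       (10.1, 9.9), (9.8, 10.2), (10.0, 10.1)]
cents = kmeans(pts, 2)
```

Each iteration is exactly the two-phase criterion-minimization the text describes: assignment minimizes within-cluster squared Euclidean distance given the centroids, and the mean update minimizes it given the assignment.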
Asynchronous parallel algorithms are developed to solve massive optimization problems in distributed data systems; they can run in parallel on multiple nodes with little or no synchronization. Recently they have been successfully applied to a range of difficult practical problems. However, existing theories mostly rest on fairly restrictive assumptions on the delays and cannot explain the convergence and speedup properties of such algorithms. In this talk we give an overview of distributed optimization and discuss new theoretical results on the convergence of the asynchronous parallel stochastic gradient algorithm with unbounded delays. Simulated and real data are used to demonstrate the practical implications of these theoretical results.
Control of Discrete-Time Piecewise Affine Probabilistic Systems using Reachab... (Leo Asselborn)
This presentation proposes an algorithmic approach to
synthesize stabilizing control laws for discrete-time piecewise
affine probabilistic (PWAP) systems based on computations of
probabilistic reachable sets. The considered class of systems
contains probabilistic components (with Gaussian distribution)
modeling additive disturbances and state initialization. The
probabilistic reachable state sets contain all states that are
reachable with a given confidence level under the effect of
time-variant control laws. The control synthesis uses principles
of the ellipsoidal calculus, and it considers that the system
parametrization depends on the partition of the state space. The
proposed algorithm uses LMI-constrained semi-definite programming
(SDP) problems to compute stabilizing controllers,
while polytopic input constraints and transitions between regions
of the state space are considered. The formulation of
the SDP is adopted from a previous work in [1] for switched
systems, in which the switching of the continuous dynamics
is triggered by a discrete input variable. Here, as opposed
to [1], the switching occurs autonomously and an algorithmic
procedure is suggested to synthesize a stabilizing controller. An
example for illustration is included.
Training and Inference for Deep Gaussian Processes (Keyon Vafa)
The document discusses training and inference for deep Gaussian processes (DGPs). It introduces the Deep Gaussian Process Sampling (DGPS) algorithm for learning DGPs. The DGPS algorithm relies on Monte Carlo sampling to circumvent the intractability of exact inference in DGPs. It is described as being more straightforward than existing DGP methods and able to more easily adapt to using arbitrary kernels. The document provides background on Gaussian processes and motivation for using deep Gaussian processes before describing the DGPS algorithm in more detail.
Development and quantification of interatomic potentials. Presented at HTCMC 9 in Toronto, Canada June 30th 2016. For further information on DFTFIT see https://github.com/costrouc/dftfit
Data fusion is the process of combining data from different sources to enhance the utility of the combined product. In remote sensing, input data sources are typically massive, noisy, and have different spatial supports and sampling characteristics. We take an inferential approach to this data fusion problem: we seek to infer a true but not directly observed spatial (or spatio-temporal) field from heterogeneous inputs. We use a statistical model to make these inferences, but like all models it is at least somewhat uncertain. In this talk, we will discuss our experiences with the impacts of these uncertainties and some potential ways of addressing them.
The document discusses regression models for modeling relationships between input and output variables. It covers linear regression, using linear functions to model the relationship, and nonlinear regression, using nonlinear functions. Maximum a posteriori (MAP) estimation and least squares estimation are described as approaches for estimating the parameters of regression models from data. MAP estimation maximizes the posterior probability of the parameters given the data and assumes prior probabilities on the parameters, while least squares minimizes error. Regularized least squares is also covered, which adds a regularization term to improve stability. Computer experiments are demonstrated applying linear regression to classification problems.
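The regularized least-squares idea can be made concrete with the simplest possible case, a one-parameter linear model; the closed form below is standard ridge regression, with the data and the penalty `lam` chosen purely for illustration:

```python
def ridge_fit(xs, ys, lam):
    """Regularized least squares for the one-parameter model y ~ w*x:
    minimizing sum((y - w*x)^2) + lam*w^2 has the closed form
    w = sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]          # roughly y = 2x
w_ols = ridge_fit(xs, ys, 0.0)     # ordinary least squares
w_reg = ridge_fit(xs, ys, 5.0)     # regularization shrinks w toward 0
```

Setting `lam = 0` recovers plain least squares; increasing it trades a little bias for stability, which is exactly the role of the regularization term mentioned above.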
The document describes a new method called SITAd for scalable similarity search of molecular descriptors in large databases. SITAd uses two techniques: 1) database partitioning to limit the search space, and 2) converting similarity search to inner product search. It builds a wavelet tree to efficiently solve the inner product search problem. Experiments on a database of 42 million compounds showed that SITAd was over 100 times faster than alternative inverted index methods while using less memory.
This document discusses nonparametric pattern recognition techniques, including density estimation methods like Parzen windows and the k-nearest neighbors algorithm. It covers density estimation, using Parzen windows to estimate densities without assuming a known form, and provides examples of applying Parzen windows to both classification and estimating mixtures of unknown densities from sample data. Probabilistic neural networks are also introduced as a parallel implementation of Parzen window density estimation.
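A minimal sketch of Parzen-window density estimation with a Gaussian kernel, as covered above (the samples and the bandwidth `h` are illustrative assumptions):

```python
import math

def parzen_density(x, samples, h):
    """Parzen-window density estimate with a Gaussian kernel of
    bandwidth h: the average of normalized kernels centered on
    each sample -- no parametric form for the density is assumed."""
    n = len(samples)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) / (h * math.sqrt(2 * math.pi))
               for s in samples) / n

# Samples clustered around 0; the estimate should peak near 0
samples = [-0.4, -0.1, 0.0, 0.2, 0.3]
p_center = parzen_density(0.0, samples, h=0.5)
p_far = parzen_density(3.0, samples, h=0.5)
```

For classification, one such estimate is built per class and a query point is assigned to the class with the highest (prior-weighted) density, which is also what a probabilistic neural network computes in parallel.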
(PhD Dissertation Defense) Theoretical and Numerical Investigations on Crysta... (James D.B. Wang, PhD)
This document summarizes Di-Bao Wang's Ph.D. dissertation on theoretical and numerical investigations of crystalline solid materials and their application in simulating laser-assisted nano-imprinting. The dissertation improves the time-history kernel method for modeling thermal motion in crystalline solids by developing a more efficient inverse Laplace transform approach. It also develops an absorbing boundary layer method to accelerate neighbor list updating for molecular dynamics simulations. These enhanced computational methods are applied to study laser-assisted nanoimprinting processes at the nanoscale.
PSOk-NN: A Particle Swarm Optimization Approach to Optimize k-Nearest Neighbo... (Aboul Ella Hassanien)
This talk presented at Bio-inspiring and evolutionary computation: Trends, applications and open issues workshop, 7 Nov. 2015 Faculty of Computers and Information, Cairo University
Binary Vector Reconstruction via Discreteness-Aware Approximate Message Passing (Ryo Hayakawa)
The document proposes a Discreteness-Aware Approximate Message Passing (DAMP) algorithm for reconstructing discrete-valued vectors from underdetermined linear measurements. DAMP extends existing AMP algorithms to handle discrete variables by incorporating probability distributions of the elements. The algorithm is analyzed using state evolution to derive conditions for perfect reconstruction. A Bayes optimal version of DAMP is also developed by minimizing mean squared error. Simulation results demonstrate improved reconstruction performance compared to conventional methods.
This document discusses support vector machines (SVMs) for pattern classification. It begins with an introduction to SVMs, noting that they construct a hyperplane to maximize the margin of separation between positive and negative examples. It then covers finding the optimal hyperplane for linearly separable and nonseparable patterns, including allowing some errors in classification. The document discusses solving the optimization problem using quadratic programming and Lagrange multipliers. It also introduces the kernel trick for applying SVMs to non-linear decision boundaries using a kernel function to map data to a higher-dimensional feature space. Examples are provided of applying SVMs to the XOR problem and computer experiments classifying a double moon dataset.
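The kernel trick described above can be illustrated with a dual-form (kernel) perceptron on the XOR problem, which is not linearly separable in the input space but becomes separable under a degree-2 polynomial kernel; this is a generic sketch of the trick, not the document's SVM formulation:

```python
def poly_kernel(a, b, d=2, c=1.0):
    """Polynomial kernel k(a, b) = (a.b + c)^d -- an inner product in a
    higher-dimensional feature space, computed without the explicit map."""
    return (sum(ai * bi for ai, bi in zip(a, b)) + c) ** d

def kernel_perceptron(X, y, kernel, epochs=20):
    """Dual-form perceptron: one multiplier per training point, so the
    data enter only through kernel evaluations (the kernel trick)."""
    alpha = [0.0] * len(X)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(X, y)):
            s = sum(a * yj * kernel(xj, xi)
                    for a, xj, yj in zip(alpha, X, y))
            if yi * s <= 0:          # misclassified: strengthen this point
                alpha[i] += 1.0
    def predict(x):
        s = sum(a * yj * kernel(xj, x)
                for a, xj, yj in zip(alpha, X, y))
        return 1 if s > 0 else -1
    return predict

# XOR: not linearly separable, but separable under the degree-2 kernel
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, 1, 1, -1]
f = kernel_perceptron(X, y, poly_kernel)
```

An SVM replaces the perceptron updates with the quadratic program that maximizes the margin, but it consumes the kernel in exactly the same dual form.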
ESL 4.4.3-4.5: Logistic Regression (contd.) and Separating Hyperplane (Shinichi Tamura)
Presentation material for the reading club on The Elements of Statistical Learning by Hastie et al.
The contents of the sections cover
- Properties of logistic regression compared to least squares fitting
- Difference between logistic regression vs. linear discriminant analysis
- Rosenblatt's perceptron algorithm
- Derivation of optimal hyperplane, which offers the basis for SVM
-------------------------------------------------------------------------
This is the presentation material (entirely in English) for our lab's reading group on The Elements of Statistical Learning (Hastie et al.). The sections I covered are:
- Properties of logistic regression viewed through its analogy with least squares fitting
- Comparison of logistic regression and linear discriminant analysis
- Rosenblatt's perceptron algorithm
- Derivation of the optimal separating hyperplane, which forms the basis of SVMs
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n ≤ 70 Lakhs! (idescitation)
Sundararajan and Chakraborty [10] introduced a new version of Quick sort that removes the interchanges. Khreisat [6] found this algorithm to compete well with some other versions of Quick sort. However, it uses an auxiliary array, thereby increasing the space complexity. Here, we provide a second version of our new sort in which we have removed the auxiliary array. This second improved version of the algorithm, which we call K-sort, is found to sort elements faster than Heap sort for appreciably large array sizes (n ≤ 70,00,000) for uniform U[0, 1] inputs.
Clustering: Large Databases in data mining (ZHAO Sam)
The document discusses different approaches for clustering large databases, including divide-and-conquer, incremental, and parallel clustering. It describes three major scalable clustering algorithms: BIRCH, which incrementally clusters incoming records and organizes clusters in a tree structure; CURE, which uses a divide-and-conquer approach to partition data and cluster subsets independently; and DBSCAN, a density-based algorithm that groups together densely populated areas of points.
1) The document discusses detection and attribution in climate science, which refers to statistical techniques used to identify the contributions of different forcing factors (like greenhouse gases or solar activity) to changes in climate signals over time.
2) It provides context on the history and development of detection and attribution methods, beginning with early work in the 1970s-1990s and more recent Bayesian approaches.
3) A key paper discussed is one by Katzfuss, Hammerling and Smith (2017) that introduced a Bayesian hierarchical model for climate change detection and attribution to help address uncertainties.
The document presents a study on implementing temporal aggregation functions through user-defined aggregate (UDA) functions. It discusses aggregating tuples whose timestamps are adjacent, overlapping, or near each other. Algorithms are presented for aggregating adjacent tuples, overlapping tuples, and nearby tuples within a threshold; each performs its aggregation in a single scan of the relation. Literature on parsimonious temporal aggregation and on supporting temporal queries efficiently in databases is also summarized.
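The single-scan aggregation of adjacent tuples can be sketched as follows; the `(start, end, value)` tuple layout and the sum aggregate are illustrative assumptions, not the paper's exact UDA interface:

```python
def aggregate_adjacent(tuples):
    """Merge value-bearing tuples whose time intervals are adjacent
    (next start == previous end), summing their values in a single
    scan; input tuples are (start, end, value), sorted by start."""
    result = []
    for start, end, value in tuples:
        if result and result[-1][1] == start:    # adjacent to the last group
            s, _, v = result[-1]
            result[-1] = (s, end, v + value)     # extend the group
        else:
            result.append((start, end, value))   # start a new group
    return result

rows = [(0, 5, 2.0), (5, 9, 3.0), (12, 15, 1.0), (15, 20, 4.0)]
merged = aggregate_adjacent(rows)
```

Only the most recent group is kept in memory, which is what makes a one-pass UDA formulation possible; the overlapping and nearby-within-threshold variants change only the adjacency test.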
The document analyzes the trailer for the movie The Ring. It discusses how the trailer uses dark lighting effects, smoky visuals, and dark clothing to create a grim atmosphere that ties into the horror theme. Mirrors and pixelated screens obscure what will happen and keep the audience in anticipation. The contrast of soft music with pixelated TV sounds adds a chilling vibe. Dark lighting on faces and sets heightens the fear and sense of unease. Shots from behind the character make them seem weak and vulnerable.
Sports Analytics: Market Shares, Strategy, and Forecasts, Worldwide, 2015 to ... (Shrikant Mandlik)
The 2015 study has 472 pages and 177 tables and figures. Worldwide markets are poised to achieve significant growth as cloud computing for utility infrastructure and tablet and smartphone communications systems make training information more cogent and more available, reshaping sports everywhere.
Information services will leverage automated processes built on cloud computing. The value of sports analytics lies in the predictive capabilities it provides. The best sports teams are the ones using the power of real-time information to their advantage. What to measure? What real-time information is the best? Can the players game the analytics systems?
Let's start with the story of Babe Ruth. The “Babe” came to every at-bat with the desire to win the game. Early in the game, aware that winning it would later fall on him, the “Babe” would deliberately strike out on pitches that he really could hit. Later in the game, the pitcher would remember the pitches that had gotten the “Babe” out, and Babe Ruth could hit them with ease, winning the game and defying the statisticians.
So Babe Ruth used sports analytics in the 1930s in reverse, hoping to entice the pitcher to throw the very pitch he could hit in a tight situation later in the game. His success illustrates that sports analytics requires sophistication: to track Babe Ruth, it would have been necessary to look at the pitches he could hit at the end of the game, not just everything thrown at him. You have to know your players to do good sports analytics.
Babe Ruth is at the center of one of the sad stories of sporting in Boston. The Boston Red Sox baseball team, in 2003, had not won a world series since Babe Ruth was sold to New York, the so called “Curse of the Bambino.” John Henry, a financial analytics wizard came along and purchased the Boston Red Sox along with other partners and he took the team to three world series using sports analytics as the dominant force for running the team and building fan enthusiasm. Sports become the model for predictive business decision making. Business has been reorganized among teams, inspired by sports. Analytics, developed by businesses are finding innovative use in sports, leading to models for
business to organize and manage teams.
Sports analytics market driving forces relate to the ability to improve winning percentages and decrease the cost of paying players. By implementing metrics functions that describe how to put together a winning team without a very high payroll, sports analytics provide a winning edge to team management. Analytics are used to figure out how a team can improve fan appeal.
Sports analytics are used for creating fantasy leagues, giving sports fantasy players access to statistics that enhances their play of the game. It is used to improve scouting, to detect new player unusual talent and evaluate
This document discusses various approaches for data fusion, which refers to statistically combining data from different sources. The main approaches covered are data assimilation, optimal interpolation, variational methods, and the Kalman filter. Data assimilation aims to combine model output with observations to estimate the true state. Optimal interpolation finds the best linear combination of a background field and observations to minimize error. Variational methods determine the state by minimizing a cost function, while the Kalman filter sequentially assimilates observations using forecast and analysis steps. The goal of all these approaches is to integrate multiple data sources to obtain a better estimate of the true state than using any one source alone.
Availability of a Redundant System with Two Parallel Active Componentstheijes
This paper considers a redundant system which consists of two parallel active components. The time-to-failure
and the time-to-repair of the components follow an exponential and a general distribution, respectively. The
repairs of failed components are randomly interrupted. The time-to-interrupt is taken from an exponentially
distributed random variable and the interrupt times are generally distributed. We obtain the availability for the
system
This document discusses unsupervised learning and clustering algorithms. It begins with an introduction to unsupervised learning, including motivations and differences from supervised learning. It then covers mixture density models, maximum likelihood estimation, and the k-means clustering algorithm. It discusses evaluating clustering using criterion functions and similarity measures. Specific topics covered include normal mixture models, EM algorithm, Euclidean distance, and hierarchical clustering.
The asynchronous parallel algorithms are developed to solve massive optimization problems in a distributed data system, which can be run in parallel on multiple nodes with little or no synchronization. Recently they have been successfully implemented to solve a range of difficult problems in practice. However, the existing theories are mostly based on fairly restrictive assumptions on the delays, and cannot explain the convergence and speedup properties of such algorithms. In this talk we will give an overview on distributed optimization, and discuss some new theoretical results on the convergence of asynchronous parallel stochastic gradient algorithm with unbounded delays. Simulated and real data will be used to demonstrate the practical implication of these theoretical results.
Control of Discrete-Time Piecewise Affine Probabilistic Systems using Reachab...Leo Asselborn
This presentation proposes an algorithmic approach to
synthesize stabilizing control laws for discrete-time piecewise
affine probabilistic (PWAP) systems based on computations of
probabilistic reachable sets. The considered class of systems
contains probabilistic components (with Gaussian distribution)
modeling additive disturbances and state initialization. The
probabilistic reachable state sets contain all states that are
reachable with a given confidence level under the effect of
time-variant control laws. The control synthesis uses principles
of the ellipsoidal calculus, and it considers that the system
parametrization depends on the partition of the state space. The
proposed algorithm uses LMI-constrained semi-definite programming
(SDP) problems to compute stabilizing controllers,
while polytopic input constraints and transitions between regions
of the state space are considered. The formulation of
the SDP is adopted from a previous work in [1] for switched
systems, in which the switching of the continuous dynamics
is triggered by a discrete input variable. Here, as opposed
to [1], the switching occurs autonomously and an algorithmic
procedure is suggested to synthesis a stabilizing controller. An
example for illustration is included.
Training and Inference for Deep Gaussian ProcessesKeyon Vafa
The document discusses training and inference for deep Gaussian processes (DGPs). It introduces the Deep Gaussian Process Sampling (DGPS) algorithm for learning DGPs. The DGPS algorithm relies on Monte Carlo sampling to circumvent the intractability of exact inference in DGPs. It is described as being more straightforward than existing DGP methods and able to more easily adapt to using arbitrary kernels. The document provides background on Gaussian processes and motivation for using deep Gaussian processes before describing the DGPS algorithm in more detail.
Development and quantification of interatomic potentials. Presented at HTCMC 9 in Toronto, Canada June 30th 2016. For further information on DFTFIT see https://github.com/costrouc/dftfit
Data fusion is the process of combining data from different sources to enhance the utility of the combined product. In remote sensing, input data sources are typically massive, noisy, and have different spatial supports and sampling characteristics. We take an inferential approach to this data fusion problem: we seek to infer a true but not directly observed spatial (or spatio-temporal) field from heterogeneous inputs. We use a statistical model to make these inferences, but like all models it is at least somewhat uncertain. In this talk, we will discuss our experiences with the impacts of these uncertainties and some potential ways addressing them.
The document discusses regression models for modeling relationships between input and output variables. It covers linear regression, using linear functions to model the relationship, and nonlinear regression, using nonlinear functions. Maximum a posteriori (MAP) estimation and least squares estimation are described as approaches for estimating the parameters of regression models from data. MAP estimation maximizes the posterior probability of the parameters given the data and assumes prior probabilities on the parameters, while least squares minimizes error. Regularized least squares is also covered, which adds a regularization term to improve stability. Computer experiments are demonstrated applying linear regression to classification problems.
The document describes a new method called SITAd for scalable similarity search of molecular descriptors in large databases. SITAd uses two techniques: 1) database partitioning to limit the search space, and 2) converting similarity search to inner product search. It builds a wavelet tree to efficiently solve the inner product search problem. Experiments on a database of 42 million compounds showed that SITAd was over 100 times faster than alternative inverted index methods while using less memory.
This document discusses nonparametric pattern recognition techniques, including density estimation methods like Parzen windows and the k-nearest neighbors algorithm. It covers density estimation, using Parzen windows to estimate densities without assuming a known form, and provides examples of applying Parzen windows to both classification and estimating mixtures of unknown densities from sample data. Probabilistic neural networks are also introduced as a parallel implementation of Parzen window density estimation.
(PhD Dissertation Defense) Theoretical and Numerical Investigations on Crysta... — James D.B. Wang, PhD
This document summarizes Di-Bao Wang's Ph.D. dissertation on theoretical and numerical investigations of crystalline solid materials and their application in simulating laser-assisted nano-imprinting. The dissertation improves the time-history kernel method for modeling thermal motion in crystalline solids by developing a more efficient inverse Laplace transform approach. It also develops an absorbing boundary layer method to accelerate neighbor list updating for molecular dynamics simulations. These enhanced computational methods are applied to study laser-assisted nanoimprinting processes at the nanoscale.
PSOk-NN: A Particle Swarm Optimization Approach to Optimize k-Nearest Neighbo... — Aboul Ella Hassanien
This talk was presented at the Bio-inspiring and Evolutionary Computation: Trends, Applications and Open Issues workshop, 7 Nov. 2015, Faculty of Computers and Information, Cairo University.
Binary Vector Reconstruction via Discreteness-Aware Approximate Message Passing — Ryo Hayakawa
The document proposes a Discreteness-Aware Approximate Message Passing (DAMP) algorithm for reconstructing discrete-valued vectors from underdetermined linear measurements. DAMP extends existing AMP algorithms to handle discrete variables by incorporating probability distributions of the elements. The algorithm is analyzed using state evolution to derive conditions for perfect reconstruction. A Bayes optimal version of DAMP is also developed by minimizing mean squared error. Simulation results demonstrate improved reconstruction performance compared to conventional methods.
This document discusses support vector machines (SVMs) for pattern classification. It begins with an introduction to SVMs, noting that they construct a hyperplane to maximize the margin of separation between positive and negative examples. It then covers finding the optimal hyperplane for linearly separable and nonseparable patterns, including allowing some errors in classification. The document discusses solving the optimization problem using quadratic programming and Lagrange multipliers. It also introduces the kernel trick for applying SVMs to non-linear decision boundaries using a kernel function to map data to a higher-dimensional feature space. Examples are provided of applying SVMs to the XOR problem and computer experiments classifying a double moon dataset.
ESL 4.4.3-4.5: Logistic Regression (contd.) and Separating Hyperplane — Shinichi Tamura
Presentation material for the reading club on The Elements of Statistical Learning by Hastie et al.
The contents of the sections cover:
- Properties of logistic regression compared to least squares fitting
- Differences between logistic regression and linear discriminant analysis
- Rosenblatt's perceptron algorithm
- Derivation of optimal hyperplane, which offers the basis for SVM
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n ≤ 70 Lakhs! — idescitation
Sundararajan and Chakraborty [10] introduced a new version of Quick sort that removes the interchanges. Khreisat [6] found this algorithm to compete well with some other versions of Quick sort. However, it uses an auxiliary array, thereby increasing the space complexity. Here, we provide a second version of our new sort in which we have removed the auxiliary array. This second improved version of the algorithm, which we call K-sort, is found to sort elements faster than Heap sort for an appreciably large array size (n ≤ 70,00,000) for uniform U[0, 1] inputs.
Clustering: Large Databases in Data Mining — ZHAO Sam
The document discusses different approaches for clustering large databases, including divide-and-conquer, incremental, and parallel clustering. It describes three major scalable clustering algorithms: BIRCH, which incrementally clusters incoming records and organizes clusters in a tree structure; CURE, which uses a divide-and-conquer approach to partition data and cluster subsets independently; and DBSCAN, a density-based algorithm that groups together densely populated areas of points.
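As a concrete illustration of the density-based idea behind DBSCAN summarized above, here is a minimal, unoptimized sketch; the parameter names `eps` and `min_pts` follow common DBSCAN convention, and this deliberately omits the tree structures used by BIRCH and the partitioning of CURE:

```python
def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (-1 = noise) by density reachability."""
    def neighbors(i):
        # all points within distance eps of point i (including i itself)
        return [j for j in range(len(points))
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps * eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1            # noise (may later be absorbed as a border point)
            continue
        labels[i] = cluster           # i is a core point: start a new cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster   # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:   # j is also a core point: expand further
                seeds.extend(j_nbrs)
        cluster += 1
    return labels
```

Two dense groups separated by empty space come out as two clusters, and isolated points are labeled -1.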
1) The document discusses detection and attribution in climate science, which refers to statistical techniques used to identify the contributions of different forcing factors (like greenhouse gases or solar activity) to changes in climate signals over time.
2) It provides context on the history and development of detection and attribution methods, beginning with early work in the 1970s-1990s and more recent Bayesian approaches.
3) A key paper discussed is one by Katzfuss, Hammerling and Smith (2017) that introduced a Bayesian hierarchical model for climate change detection and attribution to help address uncertainties.
The document presents a study on implementing temporal aggregation functions through user defined aggregate (UDA) functions. It discusses aggregating tuples whose timestamps are adjacent, overlapping, or near each other. Algorithms are presented for aggregating adjacent tuples, overlapping tuples, and nearby tuples within a threshold. The algorithms perform aggregation by scanning the relation once. Literature on parsimonious temporal aggregation and on supporting temporal queries efficiently in databases is also summarized.
The document analyzes the trailer for the movie The Ring. It discusses how the trailer uses dark lighting effects, smoky appearances, and dark clothing to create a grim atmosphere that links to the horror theme. Mirrors and pixilated screens are used to obscure what will happen and keep the audience in anticipation. The contrast of soft music with pixilated TV sounds adds a chilling vibe. Dark lighting on faces and sets adds to the fear and negative feeling. Shots from behind the character make them seem weak and vulnerable.
Sports Analytics: Market Shares, Strategy, and Forecasts, Worldwide, 2015 to ... — Shrikant Mandlik
The 2015 study has 472 pages, 177 tables and figures. Worldwide markets are poised to achieve significant growth as cloud computing for utility infrastructure and tablet and smartphone communications systems make training information more cogent and more available, remaking sporting everywhere.
Information services will leverage automated processes built on cloud computing. The value of sports analytics lies in the predictive capabilities provided. The best sports teams are the ones using the power of real-time information to their advantage. What to measure? What real-time information is the best? Can the players game the analytics systems?
Lets start with the story of Babe Ruth. The “Babe” used to come to every at bat with the desire to win the game. So early in the game, aware that at the end of the game it would fall on him to win the game, the “Babe” would deliberately strike out on pitches that he really could hit. Later in the game, the pitcher would remember the pitches that had gotten the “Babe” out and “Babe Ruth” could hit with ease, winning the game defying the statisticians.
So, Babe Ruth used sports analytics in the 1930’s in reverse, hoping to entice the pitcher to throw that very pitch he could hit in a tight situation later in the game. His very success illustrates that in sports analytics sophistication is needed. For sports analytics to track Babe Ruth, it would have been necessary to look at the pitches he could hit at the end of the game, not just everything that came at him. How sophisticated is that? You have to know your players to do good sports analytics.
Babe Ruth is at the center of one of the sad stories of sporting in Boston. The Boston Red Sox baseball team, in 2003, had not won a World Series since Babe Ruth was sold to New York, the so-called "Curse of the Bambino." John Henry, a financial analytics wizard, came along and purchased the Boston Red Sox along with other partners, and he took the team to three World Series using sports analytics as the dominant force for running the team and building fan enthusiasm. Sports became the model for predictive business decision making. Business has been reorganized around teams, inspired by sports. Analytics developed by businesses are finding innovative use in sports, leading to models for business to organize and manage teams.
Sports analytics market driving forces relate to the ability to improve winning percentages and decrease the cost of paying players. By implementing metrics functions that describe how to put together a winning team without a very high payroll, sports analytics provide a winning edge to team management. Analytics are used to figure out how a team can improve fan appeal.
Sports analytics are used for creating fantasy leagues, giving sports fantasy players access to statistics that enhances their play of the game. It is used to improve scouting, to detect new player unusual talent and evaluate
A Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-Spaces — Zubin Bhuyan
Particle Swarm Optimization is a class of stochastic, population-based optimization techniques that are mostly suitable for static problems. However, real-world optimization problems are time-variant, i.e., the problem space changes over time. Several studies have addressed this dynamic optimization problem using particle swarms. In this paper we probe the issues of tracking and optimizing particle swarms in a dynamic system where the problem-space drifts in a particular direction. Our assumption is that the approximate amount of drift is known, but the direction of the drift is unknown. We propose a Drift Predictive PSO (DriP-PSO) model which does not incur high computation cost, and is very fast and accurate. The main idea behind this technique is to use a small number of stagnant particles to determine the approximate direction in which the problem-space is drifting, so that the particle velocities may be adjusted accordingly in the subsequent iteration of the algorithm.
The International Journal of Engineering and Science (The IJES) — theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
In this work the Predestination of Particles Wavering Search (PPS) algorithm has been applied to solve the optimal reactive power problem. The PPS algorithm is modeled on the motion of particles in the exploration space. Normally the movement of a particle is based on gradient and swarming motion. Particles are permitted to progress at a steady velocity in gradient-based movement, but when the outcome is poor compared to the previous result, the particle velocity is immediately reversed with half of the magnitude; this helps to reach a local optimal solution and is expressed as wavering movement. The proposed PPS algorithm is evaluated on the standard IEEE 14, 30, 57, 118, and 300 bus systems, and simulation results show that PPS reduces the power loss efficiently.
Point Placement Algorithms: An Experimental Study — CSCJournals
The point location problem is to determine the position of n distinct points on a line, up to translation and reflection, by the fewest possible pairwise (adversarial) distance queries. In this paper we report on an experimental study of a number of deterministic point placement algorithms and an incremental randomized algorithm, with the goal of obtaining greater insight into the practical utility of these algorithms, particularly of the randomized one.
A Comparison of Particle Swarm Optimization and the Genetic Algorithm by Rani... — Pim Piepers
Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA) are population-based heuristic search methods inspired by biological phenomena. This paper aims to compare the computational effectiveness and efficiency of PSO and GA through hypothesis testing on benchmark problems and engineering design optimization problems. PSO is described as being similar to GA but with lower computational cost. The hypothesis tested is whether PSO has equal effectiveness to GA in finding optimal solutions but with better computational efficiency through fewer function evaluations. Three benchmark problems and two engineering design problems are used to test the performance of PSO and GA.
Efficient steganography techniques are needed for the security of digital information over the Internet and for secret data communication. Therefore, many techniques have been proposed for steganography. One of these intelligent techniques is the Particle Swarm Optimization (PSO) algorithm. Recently, many modifications have been made to Standard PSO (SPSO), such as Human-Based Particle Swarm Optimization (HPSO). Therefore, this paper presents image steganography using HPSO in order to find the best locations in a cover image to hide a secret text message. Then, a comparison is made between image steganography using PSO and using HPSO. Experimental results on six (256×256) cover images and different sizes of secret messages prove that the performance of the proposed image steganography using HPSO is improved in comparison with SPSO.
This paper proposes a new methodology to optimize the trajectory of the path for multi-robots using an Improved Particle Swarm Optimization algorithm (IPSO) in a cluttered environment. The IPSO technique is incorporated into the multi-robot system in a dynamic framework, which will provide robust performance, self-deterministic cooperation, and coping with an inhospitable environment.
An Improved Particle Optimization Business Plan — Carlos Iza
This document proposes an improved particle swarm optimization (IPSO) algorithm to optimize trajectory paths for multiple robots in a cluttered environment. IPSO incorporates independent decision making, coordination, and cooperation between robots to accomplish a shared goal. A path planning scheme using IPSO was developed to determine optimal successive robot positions from initial to target locations. Simulation and experiment results show IPSO outperforms particle swarm optimization and differential evolution algorithms with respect to average total trajectory deviation and average uncovered target distance.
A Comparison of Particle Swarm Optimization and Differential Evolution — ijsc
Two modern optimization methods including Particle Swarm Optimization and Differential Evolution are compared on twelve constrained nonlinear test functions. Generally, the results show that Differential Evolution is better than Particle Swarm Optimization in terms of high-quality solutions, running time and robustness.
MARKOV CHAIN AND ADAPTIVE PARAMETER SELECTION ON PARTICLE SWARM OPTIMIZER — ijsc
Particle Swarm Optimizer (PSO) is such a complex stochastic process that analysis of its stochastic behavior is not easy. The choice of parameters plays an important role, since it is critical to the performance of PSO. As far as our investigation is concerned, most of the relevant research is based on computer simulations, and little of it is based on a theoretical approach. In this paper, a theoretical approach is used to investigate the behavior of PSO. Firstly, a state of PSO is defined, which contains all the information needed for the future evolution. Then the memory-less property of this state is investigated and proved. Secondly, by using the concept of the state and suitably dividing the whole process of PSO into a countable number of stages (levels), a stationary Markov chain is established. Finally, according to the properties of a stationary Markov chain, an adaptive method for parameter selection is proposed.
Optimal Rule Set Generation Using PSO Algorithm — csandit
Classification and prediction is an important research area of data mining. Construction of a classifier model for any decision system is an important job for many data mining applications. The objective of developing such a classifier is to classify unlabeled datasets into classes. Here we have applied a discrete Particle Swarm Optimization (PSO) algorithm for selecting optimal classification rule sets from the huge number of rules that possibly exist in a dataset. In the proposed DPSO algorithm, a decision matrix approach was used for generation of the initial possible classification rules from a dataset. Then the proposed algorithm discovers important or significant rules from all possible classification rules without sacrificing predictive accuracy. The proposed algorithm deals with discrete-valued data, and its initial population of candidate solutions contains particles of different sizes. The experiment has been done on the task of optimal rule selection in data sets collected from the UCI repository. Experimental results show that the proposed algorithm can automatically evolve on average a small number of conditions per rule and a few rules per rule set, and achieved better classification performance of predictive accuracy for few classes.
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV... — IJCSEA Journal
A hybrid learning automata–genetic algorithm (HLGA) is proposed to solve the QoS routing optimization problem of next generation networks. The algorithm complements the advantages of the Learning Automata algorithm (LA) and the Genetic Algorithm (GA). It first uses the good global search capability of LA to generate the initial population needed by GA, then uses GA to improve the Quality of Service (QoS) and acquire the optimization tree through new algorithms for crossover and mutation operators, which is an NP-complete problem. In the proposed algorithm, the connectivity matrix of edges is used for genotype representation. Some novel heuristics are also proposed for mutation, crossover, and creation of random individuals. We evaluate the performance and efficiency of the proposed HLGA-based algorithm in comparison with other existing heuristic and GA-based algorithms by simulation. Simulation results demonstrate that the proposed algorithm not only has fast calculating speed and high accuracy but can also improve the efficiency of QoS routing in Next Generation Networks.
An Improved SPFA Algorithm for the Single Source Shortest Path Problem Using Forw... — IJMIT JOURNAL
We present an improved SPFA algorithm for the single source shortest path problem. For a random graph, the empirical average time complexity is O(|E|), where |E| is the number of edges of the input network. SPFA maintains a queue of candidate vertices and adds a vertex to the queue only if that vertex is relaxed. In the improved SPFA, the MinPoP principle is employed to improve the quality of the queue. We theoretically analyse the advantage of this new algorithm and experimentally demonstrate that the algorithm is efficient.
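The queue discipline the abstract describes, where a vertex is enqueued only when its distance estimate is relaxed, is the core of plain SPFA (Bellman–Ford with a queue). A minimal sketch of that baseline, without the paper's MinPoP refinement:

```python
from collections import deque

def spfa(n, edges, src):
    """Basic SPFA: Bellman-Ford driven by a queue of candidate vertices.

    n: vertex count; edges: adjacency dict {u: [(v, weight), ...]}; src: source.
    Returns the list of shortest distances from src.
    """
    INF = float("inf")
    dist = [INF] * n
    in_queue = [False] * n
    dist[src] = 0
    q = deque([src])
    in_queue[src] = True
    while q:
        u = q.popleft()
        in_queue[u] = False
        for v, w in edges.get(u, []):
            if dist[u] + w < dist[v]:      # relax edge (u, v)
                dist[v] = dist[u] + w
                if not in_queue[v]:        # enqueue only vertices that were relaxed
                    q.append(v)
                    in_queue[v] = True
    return dist
```

The `in_queue` flags prevent duplicate entries; MinPoP-style improvements reorder the queue so that more promising vertices are popped first.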
BPSO&1-NN Algorithm-Based Variable Selection for Power System Stability Ident... — IJAEMSJORNAL
Due to the very high nonlinearity of the power system, traditional analytical methods take a lot of time to solve, causing delays in decision-making. Therefore, quickly detecting power system instability, so that the control system can make timely decisions, becomes the key factor in ensuring stable operation of the power system. Power system stability identification encounters a large data set size problem, so representative variables must be selected as input variables for the identifier. This paper proposes to apply the wrapper method to select variables, in which the Binary Particle Swarm Optimization (BPSO) algorithm is combined with a K-NN (K=1) identifier to search for a good set of variables. It is named BPSO&1-NN. Test results on the IEEE 39-bus diagram show that the proposed method achieves the goal of reducing variables with high accuracy.
A PARTICLE SWARM OPTIMIZATION ALGORITHM BASED ON UNIFORM DESIGN — IJDKP
The Particle Swarm Optimization (PSO) algorithm is one of the swarm intelligence optimization algorithms. Usually the population values of the PSO algorithm are random, which leads to a random distribution of search quality and search velocity. This paper presents PSO based on uniform design (UD). UD is widely used in various applications and is introduced here to generate an initial population in which the population members are scattered uniformly over the search space. During evolution, UD is also introduced to replace some worse individuals. Based on these techniques, a Particle Swarm Optimization Algorithm based on Uniform Design (PSO-UD) is proposed. Finally, the performance of the PSO-UD algorithm is tested and compared. Tests show that the PSO-UD algorithm is faster than the standard PSO algorithm with random populations.
A Connectionist Approach To The Quadratic Assignment Problem — Sheila Sinclair
This document describes two connectionist models for solving the quadratic assignment problem (QAP). Model 1 is based on a Boltzmann machine, which uses simulated annealing to escape from local optima. Model 1 had inefficiencies, so the authors developed Model 2, which uses a different connectionist architecture. Computational results show Model 2 performs better than Model 1 on QAP instances of dimension up to 90.
IMECS 2012, pp. 440-445
Abstract—In recent years, clustering remains a popular analysis tool for data statistics. Identifying the data structure within large-scale data has become a very important issue in data mining. In this paper, an improved particle swarm optimization based on the Gauss chaotic map for clustering is proposed. The Gauss chaotic map adopts a random sequence with a random starting point as a parameter, and relies on this parameter to update the positions and velocities of the particles. It provides a significant chaotic distribution to balance the exploration and exploitation capability of the search process. This easy and fast function generates random seed processes, and further improves the performance of PSO due to its unpredictability. In the experiments, eight different clustering algorithms were extensively compared on six test datasets. The results indicate that the performance of our proposed method is significantly better than the performance of other algorithms for the data clustering problem.

Index Terms—Data Clustering, Particle Swarm Optimization.
I. INTRODUCTION
Clustering is the process of grouping a set of objects. The objects within a cluster are similar to each other, but they are dissimilar to objects in other clusters. This property of clustering helps to identify some inherent structures present in the objects. Clustering reflects the statistical structure of the overall collection of input patterns in the data, because the subset of patterns and its particular problem have certain meanings [1]. A pattern can be represented mathematically as a vector in multi-dimensional space.

The K-means algorithm is a popular clustering technique and has been successfully applied to many practical clustering problems [2]. However, K-means is not convex and may contain many local minima, since it suffers from several drawbacks due to its choice of initializations. Recent advancements in clustering algorithms introduce evolutionary computing such as genetic algorithms [3] and particle swarm optimization [4, 5]. Genetic algorithms typically start with some candidate solutions to the
L.Y. Chuang is with the Department of Chemical Engineering, I-Shou University, 84001, Kaohsiung, Taiwan (e-mail: chuang@isu.edu.tw).
Y.D. Lin is with the Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, 80778, Kaohsiung, Taiwan (e-mail: e0955767257@yahoo.com.tw).
C.H. Yang is with the Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, 80778, Kaohsiung, Taiwan (phone: 886-7-3814526#5639; e-mail: chyang@cc.kuas.edu.tw). He is also with the Network Systems Department, Toko University, 61363, Chiayi, Taiwan.
optimization problem, and these candidates evolve towards a better solution through selection, crossover, and mutation. The concept of PSO was designed to simulate social behavior, whose major property is information exchange in practical applications. Many studies have used PSO to cluster data within multi-dimensional space and obtained outstanding results. However, the rate of convergence is insufficient when searching for global optima. Fan et al. [6] proposed combining the Nelder–Mead simplex search method with PSO, the rationale being that such a hybrid approach will enjoy the merits of both PSO and the Nelder–Mead simplex search method. Kao et al. explored the applicability of a hybrid of the K-means algorithm, the Nelder–Mead simplex search method, and particle swarm optimization (K–NM–PSO) to clustering data vectors [7].

PSO adopts a random sequence with a random starting point as a parameter, and relies on this parameter to update the positions and velocities of the particles. However, PSO often leads to premature convergence, especially in complex multi-peak search problems such as the clustering of high-dimensional data. We combined the Gauss chaotic map and particle swarm optimization, named GaussPSO. Results of experimental trials conducted on a variety of data sets taken from several real-life situations demonstrate that the proposed GaussPSO is superior to the K-means, PSO, NM-PSO, K-PSO, and K-NM-PSO algorithms [7].
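The Gauss (or "mouse") chaotic map is commonly defined as G(x) = (1/x) mod 1, with G(0) = 0; assuming that form, a minimal sketch of the chaotic sequence generator that would replace PSO's random parameters looks like this:

```python
def gauss_map(x):
    """One iteration of the Gauss ('mouse') chaotic map: G(x) = frac(1/x), G(0) = 0."""
    if x == 0.0:
        return 0.0
    return (1.0 / x) % 1.0

def gauss_sequence(seed, n):
    """Generate n chaotic values in [0, 1) from a given starting point."""
    seq = []
    x = seed
    for _ in range(n):
        x = gauss_map(x)
        seq.append(x)
    return seq
```

The sequence is fully determined by the starting point, yet behaves irregularly over [0, 1), which is what lets it stand in for the uniform random numbers r1 and r2 in the PSO update.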
II. METHOD
A. Particle Swarm Optimization (PSO)
The original PSO method [8] is a population-based optimization technique, where the population is called a swarm. Every particle in the swarm is analogous to an individual "fish" in a school, and the swarm can be seen as N particles moving around a D-dimensional search space. Every particle makes use of its own memory and the knowledge gained by the swarm as a whole to find the best solution. The pbesti is introduced as the best previously visited position of the ith particle; it is denoted as pi = (pi1, pi2, …, piD). The gbest is the global best position among all individual pbesti values; it is denoted as g = (g1, g2, …, gD). The position of the ith particle is represented by xi = (xi1, xi2, …, xiD), x ∈ (Xmin, Xmax)^D, and its velocity is represented as vi = (vi1, vi2, …, viD), v ∈ [Vmin, Vmax]^D. The position and velocity of the ith particle are updated by pbesti and gbest in each generation. The update equations can be formulated as:
v_id^new = w × v_id^old + c1 × r1 × (pbest_id − x_id^old) + c2 × r2 × (gbest_d − x_id^old)    (1)
An Improved Particle Swarm Optimization
for Data Clustering
Li-Yeh Chuang, Yu-Da Lin, and Cheng-Hong Yang, Member, IAENG
Proceedings of the International MultiConference of Engineers and Computer Scientists 2012 Vol I,
IMECS 2012, March 14 - 16, 2012, Hong Kong
ISBN: 978-988-19251-1-4
ISSN: 2078-0958 (Print); ISSN: 2078-0966 (Online)
IMECS 2012
x_id^new = x_id^old + v_id^new    (2)
where r1 and r2 are random numbers in (0, 1); c1 and c2 control how
far a particle will move in one generation; v_id^new and v_id^old
denote the velocities of the new and old particle, respectively;
x_id^old is the current particle position; and x_id^new is the updated
particle position. The inertia weight w controls the impact of the
previous velocity of a particle on its current one; w is designed to
replace Vmax and adjust the influence of previous particle velocities
on the optimization process.
For a high-performance search, a suitable tradeoff between exploration
and exploitation is essential. One of the most important
considerations in PSO is how to effectively balance the global and
local search abilities of the swarm, because the proper balance of
global and local search over the entire run is critical to the success
of PSO [9]. In general, the inertia weight decreases linearly from 0.9
to 0.4 throughout the search process [10]. The respective equation can
be written as:
w_LDW = (w_max − w_min) × (Iteration_max − Iteration_i) / Iteration_max + w_min    (3)
where wmax is 0.9, wmin is 0.4 and Iterationmax is the maximum
number of allowed iterations.
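As a minimal sketch of equations (1)-(3) (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def inertia_weight(iteration, max_iteration, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight, Eq. (3)."""
    return (w_max - w_min) * (max_iteration - iteration) / max_iteration + w_min

def pso_step(x, v, pbest, gbest, w, c1=2.0, c2=2.0, rng=None):
    """One velocity and position update for a single particle, Eqs. (1)-(2)."""
    if rng is None:
        rng = np.random.default_rng()
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new
```

At the first iteration `inertia_weight` returns w_max = 0.9 and at the last it returns w_min = 0.4, matching the linear schedule described above.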
B. Gauss chaotic Map Particle Swarm Optimization
(GaussPSO)
The Gauss chaotic map is similar to the quadratic transformation in
the sense that it allows a complete analysis of its qualitative and
quantitative chaotic properties. It provides the continued fraction
expansion of numbers, which is analogous to the shift transformation
corresponding to the quadratic iterator. This shift transformation
satisfies the properties of chaos: dense periodic points, mixing and
sensitivity [11]. We use these characteristics of the Gauss chaotic
map to avoid entrapment of the PSO in a local optimum.
In PSO, the parameters w, r1 and r2 are the key factors affecting the
convergence behavior of the algorithm. The parameters r1 and r2
control the balance between the global exploration and the local
search ability. An inertia weight w that linearly decreases from 0.9
to 0.4 throughout the search process is usually adopted [10].
Additionally, the Gauss chaotic map is a frequently used chaotic map;
its sequences can be generated quickly and stored easily, so there is
no need to store long sequences. In Gauss chaotic map PSO (GaussPSO),
sequences generated by the Gauss chaotic map substitute for the random
parameters r1 and r2 of PSO. The parameters r1 and r2 are modified
based on the following equation:
Gr(x) = 0,                                if x = 0
Gr(x) = Frac(1/x) = (1/x) mod 1,          otherwise, with Gr(x) ∈ (0, 1)    (4)
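A minimal sketch of the Gauss map iteration in Eq. (4) (function names are illustrative):

```python
def gauss_map(x):
    """Gauss chaotic map of Eq. (4): Gr(0) = 0, otherwise Gr(x) = (1/x) mod 1."""
    if x == 0.0:
        return 0.0
    return (1.0 / x) % 1.0

def gauss_sequence(seed, n):
    """Iterate the map n times from an initial value in (0, 1)."""
    values, x = [], seed
    for _ in range(n):
        x = gauss_map(x)
        values.append(x)
    return values
```

Each iterate stays in [0, 1), so the sequence can directly replace the uniform random numbers r1 and r2.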
The velocity update equation for GaussPSO can thus be
formulated as:
v_id^new = w × v_id^old + c1 × Gr × (pbest_id − x_id^old) + c2 × Gr × (gbest_d − x_id^old)    (5)
where Gr is a function based on the results of the Gauss
chaotic map with values between 0.0 and 1.0. The
pseudo-code of the GaussPSO is shown below.
GaussPSO Pseudo-Code
01: Begin
02: Initial particle swarm
03: While (number of iterations, or the stopping criterion is not met)
04: Evaluate fitness of particle swarm
05: For n = 1 to number of particles
06: Find pbest
07: Find gbest
08: For d = 1 to number of dimension of particle
09: Update the position of particles by equations 5 and 2
10: Next d
11: Next n
12: Update the inertia weight value by equation 3
13: Update the value of Gr by equation 4
14: Next generation until stopping criterion
15: End
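The pseudo-code above can be translated into a compact Python sketch. The sphere function stands in for the clustering fitness, and the bounds, swarm size, and chaotic seed are illustrative assumptions, not values from the paper:

```python
import numpy as np

def gauss_map(x):
    # Gauss chaotic map, Eq. (4)
    return 0.0 if x == 0.0 else (1.0 / x) % 1.0

def gauss_pso(fitness, dim, n_particles=20, iters=200,
              c1=2.0, c2=2.0, w_max=0.9, w_min=0.4,
              x_min=-5.0, x_max=5.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(x_min, x_max, (n_particles, dim))
    v = np.zeros((n_particles, dim))
    v_max = x_max - x_min                       # Vmax = Xmax - Xmin
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    g = int(pbest_f.argmin())
    gbest, gbest_f = pbest[g].copy(), float(pbest_f[g])
    gr = 0.7                                    # seed of the chaotic sequence
    for it in range(iters):
        w = (w_max - w_min) * (iters - it) / iters + w_min   # Eq. (3)
        for i in range(n_particles):
            gr = gauss_map(gr); g1 = gr
            gr = gauss_map(gr); g2 = gr
            # Eq. (5): chaotic values g1, g2 replace r1, r2
            v[i] = w * v[i] + c1 * g1 * (pbest[i] - x[i]) + c2 * g2 * (gbest - x[i])
            v[i] = np.clip(v[i], -v_max, v_max)
            x[i] = x[i] + v[i]                               # Eq. (2)
            f = fitness(x[i])
            if f < pbest_f[i]:                               # update pbest / gbest
                pbest_f[i], pbest[i] = f, x[i].copy()
                if f < gbest_f:
                    gbest, gbest_f = x[i].copy(), f
    return gbest, gbest_f

sphere = lambda z: float(np.sum(z * z))
```

Since gbest only ever improves, the returned fitness is monotonically no worse than the best initial particle.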
C. The application of the PSO algorithm
a) Initial particle swarm
The 3×N particles are randomly generated with individual positions and
velocities in the solution space. The generated position of the ith
particle is defined as xi (xi ∈ {xi1, xi2, …, xiN}) and its velocity
as vi (vi ∈ {vi1, vi2, …, viN}). Every particle is composed of K
center positions, one for each cluster, where K is the anticipated
number of clusters. The particle dimension N is computed as follows:
N =K×d (6)
where d is the data set dimension. For example, a possible encoding of
a particle for a two-dimensional problem with three clusters is
illustrated in Fig. 1. The three cluster centers in this particle are
randomly generated as Xi = (2.5, 2.7, 4.5, 5, 1.2, 2.2); the particle
dimension is N = 6 (i.e., K = 3, d = 2) and the population size is
3 × N = 18.
Fig. 1. Encoding of particles in PSO
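For instance, the encoding of Fig. 1 (K = 3 clusters, d = 2 features) can be initialized as a sketch; the bounds and function name are illustrative assumptions:

```python
import numpy as np

def init_swarm(n_particles, K, d, x_min, x_max, seed=0):
    """Each particle concatenates K cluster centers of d features,
    so its dimension is N = K * d (Eq. 6)."""
    rng = np.random.default_rng(seed)
    N = K * d
    positions = rng.uniform(x_min, x_max, (n_particles, N))
    velocities = np.zeros((n_particles, N))
    return positions, velocities

# Fig. 1 setting: K = 3 clusters, d = 2 features, population size 3 * N = 18
pos, vel = init_swarm(18, K=3, d=2, x_min=0.0, x_max=6.0)
```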
b) Grouping the data vectors for every particle
All data vectors are grouped into K clusters on the basis of the
Euclidean distance as the similarity measurement. Each particle is a
matrix xi = (C1, C2, …, Cj, …, CK), where Cj represents the jth
cluster centroid vector and K is the number of clusters. For every
particle, the distance between each data vector and the centroid
vector of the respective cluster is calculated as described in
equation 7, and each data vector is assigned to the cluster with the
shortest distance.
D(x_p, z_j) = sqrt( Σ_{i=1..d} (x_pi − z_ji)^2 )    (7)

z_j = (1/n_j) Σ_{x_p ∈ C_j} x_p    (8)

where z_j is the centroid of cluster C_j and n_j is the number of data
vectors in C_j.
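The assignment step of Eq. (7) and the centroid update of Eq. (8) can be sketched together (the function name and the empty-cluster handling are assumptions):

```python
import numpy as np

def assign_and_update(data, centers):
    """Assign every data vector to its nearest centroid (Eq. 7) and
    recompute each centroid as the mean of its members (Eq. 8)."""
    # Euclidean distance from each point to each centroid, shape (n, K)
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    new_centers = centers.copy()
    for j in range(len(centers)):
        members = data[labels == j]
        if len(members) > 0:            # keep the old centroid for empty clusters
            new_centers[j] = members.mean(axis=0)
    return labels, new_centers
```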
c) Fitness evaluation of each particle
The fitness value of each particle is computed by the following
fitness function. The fitness value is the sum of the intra-cluster
distances over all clusters; this sum has a profound impact on the
error rate.
fitness = Σ ||X_j − Z_i||,  i = 1, …, K,  j = 1, …, n    (9)
where K and n are the numbers of clusters and data points,
respectively; Z_i is cluster center i and X_j is data point j.
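Under the convention that each point contributes the distance to the center of the cluster it is assigned to (its nearest center), Eq. (9) can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def clustering_fitness(data, centers):
    """Eq. (9): sum of intra-cluster distances, i.e. the distance from
    every data point to the center of the cluster it belongs to."""
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return float(dists.min(axis=1).sum())
```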
d) Update pbest and gbest
In each iteration, each particle compares its current fitness value
with the fitness value of its own pbest solution and with that of the
population's gbest solution. The pbest and gbest values are updated if
the new values are better than the old ones. If the fitness value of
particle Xi in the current generation is better than the previous
pbest fitness value, then both the position and the fitness value of
pbest are updated to those of Xi. Similarly, if the fitness value of
pbest in the current generation is better than the previous gbest
fitness value, then both the position and the fitness value of gbest
are updated to those of Xi.
III. RESULTS AND DISCUSSION
A. Parameter settings
In the experiments, the number of iterations was set to 1000 and the
population size to 50. The acceleration parameters for PSO were set to
c1 = c2 = 2. Vmax was equal to (Xmax − Xmin) and Vmin was equal to
−(Xmax − Xmin) [8]. The results are the averages of 50 simulation
runs. For each run, 10 × N iterations were carried out on each of the
six data sets by every algorithm when solving an N-dimensional
problem. The criterion 10 × N was adopted in many previous experiments
with great success in terms of effectiveness [7].
B. Data sets
Six experimental data sets, i.e., Vowel, Iris, Crude oil,
CMC, Cancer, and Wine are used to test the qualities of the
respective clustering methods. These data sets represent
examples of data with low, medium and high dimensions. All
data sets are available at
ftp://ftp.ics.uci.edu/pub/machine-learning-databases/.
Table I summarizes the characteristics of these data sets.
As an example, given a data set with three features grouped into two
clusters, the number of parameters to be optimized in order to find
the two optimal cluster center vectors equals the product of the
number of clusters and the number of features: N = k × d = 2 × 3 = 6.
The six real-life data sets
are described below:
(1) The Vowel data set (n = 871, d = 3, k = 6) consists of 871 Indian
Telugu vowel sounds. It includes three features corresponding to
the first, second and third vowel frequencies, and six overlapping
classes {d (72 objects), a (89 objects), i (172 objects), u (151
objects), e (207 objects), o (180 objects)}.
(2) Fisher's iris data set (n = 150, d = 4, k = 3) consists of three
different species of iris flowers: iris setosa, iris virginica and
iris versicolour. For each species, 50 samples were collected with
four features: sepal length, sepal width, petal length and petal
width.
(3) The Crude oil data set (n = 56, d = 5, k = 3) consists of 56
objects characterized by five features: vanadium, iron, beryllium,
saturated hydrocarbons, and aromatic hydrocarbons. The crude-oil
samples were collected from three zones of sandstone (Wilhelm has
7 objects, Sub-Mulnia has 11 objects, and Upper has 38 objects).
(4) The Contraceptive Method Choice data set (denoted CMC, with n =
1473, d = 9, k = 3) is a subset of the 1987 National Indonesia
Contraceptive Prevalence Survey. The samples are married women who
were either not pregnant or not sure of their pregnancy at the
time the interviews were conducted. The task is to predict the
current contraceptive method choice of a woman (no contraception
has 629 objects, long-term methods have 334 objects, and
short-term methods have 510 objects) based on her demographic and
socioeconomic characteristics.
(5) The Wisconsin breast cancer data set (n = 683, d = 9, k = 2)
consists of 683 objects characterized by nine features:
clump thickness, cell size uniformity, cell shape
uniformity, marginal adhesion, single epithelial cell size,
bare nuclei, bland chromatin, normal nucleoli and mitoses.
There are two categories in the data; malignant tumors
(444 objects) and benign tumors (239 objects).
(6) The Wine data set (n = 178, d = 13, k = 3) consists of 178 objects
characterized by 13 features: alcohol content, malic acid content,
ash content, alkalinity of ash, concentration of magnesium, total
phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color
intensity, hue, OD280/OD315 of diluted wines, and proline. These
features were obtained by chemical analysis of wines produced in
the same region in Italy but derived from three different
cultivars. The quantities of objects in the three categories of
the data set are: class 1 (59 objects), class 2 (71 objects), and
class 3 (48 objects).
Table I
SUMMARY OF THE CHARACTERISTICS OF THE CONSIDERED DATA SETS

Name of data set   Number of classes   Number of features   Size of data set (size of classes in parentheses)
Vowel              6                   3                    871 (72, 89, 172, 151, 207, 180)
Iris               3                   4                    150 (50, 50, 50)
Crude Oil          3                   5                    56 (7, 11, 38)
CMC                3                   9                    1473 (629, 334, 510)
Cancer             2                   9                    683 (444, 239)
Wine               3                   13                   178 (59, 71, 48)
C. Test for statistical significance
The results of GaussPSO were compared with those of the other methods,
i.e., K-means, GA, KGA, PSO, NM-PSO, K-PSO, and K-NM-PSO, to
demonstrate its data clustering capability. The quality of the
respective clustering results was measured by the following criteria:
(1) The sum of the intra-cluster distances: the distances between the
data vectors within a cluster and the centroid of that cluster, as
defined in equation 7; a relatively small sum indicates a higher
quality of clustering.
(2) Error rate: the number of misplaced points divided by the total
number of points, as shown in equation 10:

error = ( (1/n) Σ_{i=1..n} δ_i ) × 100,  δ_i = 1 if A_i ≠ B_i, δ_i = 0 if A_i = B_i    (10)
where n denotes the total number of points, and Ai and Bi denote the
clusters of which the ith point is a member before and after
clustering, respectively. In the example shown in Table II, the two
data points (2.5, 4.5) and (7.5, 5.5), i.e., points 2 and 4, are
misplaced, so the error rate is 2/5, i.e., 40%.
Table II
ERROR RATE CALCULATIONS

i   Data point   Ai   Bi   Non-misplaced (0) / misplaced (1)
1   (4.0, 5.0)   2    2    0
2   (2.5, 4.5)   1    2    1
3   (4.5, 3.5)   2    2    0
4   (7.5, 5.5)   1    2    1
5   (5.0, 6.0)   2    2    0
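As a sketch, Eq. (10) applied to the Table II example (the names A and B are illustrative) is:

```python
def error_rate(before, after):
    """Eq. (10): percentage of points whose cluster membership differs
    before (Ai) and after (Bi) clustering."""
    misplaced = sum(1 for a, b in zip(before, after) if a != b)
    return 100.0 * misplaced / len(before)

# Table II example: points 2 and 4 are misplaced
A = [2, 1, 2, 1, 2]   # membership before clustering
B = [2, 2, 2, 2, 2]   # membership after clustering
```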
D. Experimental Results and Discussion
In this section, the performance of GaussPSO and of the other methods
over 20 simulation runs is compared by means of the best fitness
values and the standard deviations on the six data sets. Table III
summarizes the intra-cluster distances and error rates obtained by the
eight clustering algorithms on the six data sets.
The test results clearly show that PSO outperforms the GA method,
independently of whether the average intra-cluster distance or the
best intra-cluster distance is measured. Comparing K-PSO with KGA,
K-PSO still leads; in general, PSO offers better optimized solutions
than GA, with or without integration of the K-means method. For all
data sets, the averages and standard deviations of GaussPSO are better
than those of K-PSO and K-NM-PSO, where K-PSO is a hybrid of the
K-means and PSO algorithms, and K-NM-PSO is a hybrid of K-means,
Nelder–Mead simplex search [12] and PSO. Note that, in terms of the
best distance, PSO, NM-PSO, K-PSO and K-NM-PSO all have a larger
standard deviation than GaussPSO, even though they may achieve a
global optimum. This means that PSO, NM-PSO, K-PSO and K-NM-PSO are
weaker search tools for global optima than GaussPSO if all algorithms
are executed just once. It follows that GaussPSO is more efficient in
finding the global optimum solution than the other four PSO methods.
Table III also reports the error rates, the standard deviations of the
error rates, and the best error rates from the 20 simulation runs.
Table IV lists the number of objective function evaluations required
by the methods after 10 × N iterations. The K-means algorithm has the
fewest function evaluations on all data sets, but its results are less
than satisfactory, as seen in Table III. GaussPSO requires fewer
function evaluations than PSO, NM-PSO, K-PSO and K-NM-PSO on average.
E. Advantage of the Gauss chaotic map algorithm
The Gauss chaotic map is a very powerful tool for avoiding entrapment
in local optima, and it does not increase the complexity. The
computational complexity of both GaussPSO and PSO is O(PG), where P is
the population size and G is the number of generations. In equation 5,
we can observe that the chaotic map is only used to amend the PSO
updating equation.
In standard PSO, each individual and the whole population evolve
toward best fitness, where fitness is evaluated with the objective
function. Although this scheme increases the convergence capability,
i.e., evolves the population toward better fitness, when the
convergence speed is too fast the population may get stuck in a local
optimum, since the swarm's diversity decreases rapidly. On the other
hand, the search cannot be made arbitrarily slow if PSO is to remain
effective.
The Gauss chaotic map is a non-linear system with ergodicity,
stochasticity and regularity properties, and it is very sensitive to
its initial conditions and parameters. Consequently, the efficiency of
GaussPSO is better than that of standard PSO because of the chaotic
property, i.e., a small variation in an initial variable results in a
huge difference in the solutions after some iterations. Since chaotic
maps are frequently used chaotic behavior maps and their sequences can
be quickly generated and easily stored, there is no need to store long
sequences [11, 13].
In summary, all the evidence gathered in the simulations illustrates
that GaussPSO converges to global optima with fewer function
evaluations and a smaller error rate than the other algorithms, which
naturally leads to the conclusion that GaussPSO is a viable and robust
technique for data clustering.
IV. CONCLUSION
The novel method GaussPSO is introduced to solve data clustering
problems. This study used six publicly available UCI data sets to
investigate its performance. We use the minimum intra-cluster distance
as a metric to robustly search for data cluster centers in
N-dimensional Euclidean space. The experimental results demonstrate
that the proposed clustering algorithm reaches a minimal error rate
and possesses the fastest convergence and the highest stability of
results.
Acknowledgment
This work is partly supported by the National Science Council in
Taiwan under grant NSC97-2622-E-151-008-CC2.
REFERENCES
[1] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data
clustering: A review," Acm Computing Surveys, vol.
31, pp. 264-323, 1999.
[2] S. Z. Selim and M. A. Ismail, "K-Means-Type
Algorithms: A Generalized Convergence Theorem
and Characterization of Local Optimality," IEEE
Transactions on Pattern Analysis and Machine
Intelligence, vol. 6, pp. 81-87, 1984.
[3] K. J. Kim and H. Ahn, "A recommender system
using GA K-means clustering in an online shopping
market," Expert Systems with Applications, vol. 34,
pp. 1200-1209, 2008.
[4] S. Paterlini and T. Krink, "Differential evolution
and particle swarm optimisation in partitional
clustering," Computational Statistics & Data
Analysis, vol. 50, pp. 1220-1247, 2006.
[5] S. Rana, S. Jasola, and R. Kumar, "A review on
particle swarm optimization algorithms and their
applications to data clustering," Artificial
Intelligence Review, vol. 35, pp. 211-222, 2011.
[6] S. K. S. Fan, Y. C. Liang, and E. Zahara, "Hybrid
simplex search and particle swarm optimization for
the global optimization of multimodal functions,"
Engineering Optimization, vol. 36, pp. 401-418,
2004.
[7] Y. T. Kao, E. Zahara, and I. W. Kao, "A hybridized
approach to data clustering," Expert Systems with
Applications, vol. 34, pp. 1754-1762, 2008.
[8] J. Kennedy and R. Eberhart, "Particle swarm
optimization," in Proceedings of the IEEE International
Conference on Neural Networks, vol. 4, pp.
1942-1948, 1995.
[9] Y. Shi and R. C. Eberhart, "Fuzzy adaptive particle
swarm optimization," in Proceedings of the 2001
Congress on Evolutionary Computation, pp. 101-106,
2001.
[10] Y. Shi and R. C. Eberhart, "Empirical study of
particle swarm optimization," in Proceedings of the
Congress on Evolutionary Computation,
Washington, D.C., pp. 1945-1949, 1999.
[11] Heinz-Otto Peitgen, Hartmut Jürgens, and D. Saupe,
Chaos and Fractals: New Frontiers of Science. New
York: Springer, 2004.
[12] J. A. Nelder and R. Mead, "A simplex method for
function minimization," Computer Journal, vol. 7,
pp. 308-313, 1965.
[13] H. J. Gao, Y. S. Zhang, S. Y. Liang, and D. Q. Li,
"A new chaotic algorithm for image encryption,"
Chaos Solitons & Fractals, vol. 29, pp. 393-399,
2006.
TABLE III
COMPARISON OF INTRA-CLUSTER DISTANCES AND ERROR RATES FOR GAUSSPSO, K-MEANS, GA, KGA, PSO, NM-PSO, K-PSO, AND K-NM-PSO

Data set   Criteria           K-means    GA         KGA        PSO        NM-PSO     K-PSO      K-NM-PSO   GaussPSO
Vowel      Distance Average   159242.87  390088.24  149368.45  168477.00  151983.91  149375.70  149141.40  149015.50
           Distance Std       916        N/A        N/A        3715.73    4386.43    155.56     120.38     120.67
           Distance Best      149422.26  383484.15  149356.01  163882.00  149240.02  149206.10  149005.00  148967.20
           Error (%) Average  44.26      N/A        N/A        44.65      41.96      42.24      41.94      42.10
           Error (%) Std      2.15       N/A        N/A        2.55       0.98       0.95       0.95       1.59
           Error (%) Best     42.02      N/A        N/A        41.45      40.07      40.64      40.64      39.84
Iris       Distance Average   106.05     135.40     97.10      103.51     100.72     96.76      96.67      96.66
           Distance Std       14.11      N/A        N/A        9.69       5.82       0.07       0.008      6.551E-4
           Distance Best      97.33      124.13     97.10      96.66      96.66      96.66      96.66      96.66
           Error (%) Average  17.80      N/A        N/A        12.53      11.13      10.20      10.07      10.00
           Error (%) Std      10.72      N/A        N/A        5.38       3.02       0.32       0.21       0.00
           Error (%) Best     10.67      N/A        N/A        10.00      8.00       10.00      10.00      10.00
Crude Oil  Distance Average   287.36     308.16     278.97     285.51     277.59     277.77     277.29     277.23
           Distance Std       25.41      N/A        N/A        10.31      0.37       0.33       0.095      3.465E-2
           Distance Best      279.20     297.05     278.97     279.07     277.19     277.45     277.15     277.21
           Error (%) Average  24.46      N/A        N/A        24.64      24.29      24.29      23.93      26.43
           Error (%) Std      1.21       N/A        N/A        1.73       0.75       0.92       0.72       0.71
           Error (%) Best     23.21      N/A        N/A        23.21      23.21      23.21      23.21      25
CMC        Distance Average   5693.60    N/A        N/A        5734.20    5563.40    5532.90    5532.70    5532.18
           Distance Std       473.14     N/A        N/A        289.00     30.27      0.09       0.23       4.055E-5
           Distance Best      5542.20    N/A        N/A        5538.50    5537.30    5532.88    5532.40    5532.18
           Error (%) Average  54.49      N/A        N/A        54.41      54.47      54.38      54.38      54.38
           Error (%) Std      0.04       N/A        N/A        0.13       0.06       0.00       0.054      0.00
           Error (%) Best     54.45      N/A        N/A        54.24      54.38      54.38      54.31      54.38
Cancer     Distance Average   2988.30    N/A        N/A        3334.60    2977.70    2965.80    2964.70    2964.39
           Distance Std       0.46       N/A        N/A        357.66     13.73      1.63       0.15       8.21E-6
           Distance Best      2987       N/A        N/A        2976.30    2965.59    2964.50    2964.50    2964.39
           Error (%) Average  4.08       N/A        N/A        5.11       4.28       3.66       3.66       3.51
           Error (%) Std      0.46       N/A        N/A        1.32       1.10       0.00       0.00       0.00
           Error (%) Best     3.95       N/A        N/A        3.66       3.66       3.66       3.66       3.51
Wine       Distance Average   18061.00   N/A        N/A        16311.00   16303.00   16294.00   16293.00   16292.68
           Distance Std       793.21     N/A        N/A        22.98      4.28       1.70       0.46       0.66
           Distance Best      16555.68   N/A        N/A        16294.00   16292.00   16292.00   16292.00   16292.18
           Error (%) Average  31.12      N/A        N/A        28.71      28.48      28.48      28.37      28.31
           Error (%) Std      0.71       N/A        N/A        0.27       0.27       0.40       0.27       0.28
           Error (%) Best     29.78      N/A        N/A        28.09      28.09      28.09      28.09      28.09

The results of K-means, GA, KGA, PSO, NM-PSO, K-PSO and K-NM-PSO can be found in [7].
TABLE IV
NUMBER OF FUNCTION EVALUATIONS OF EACH CLUSTERING ALGORITHM

Data set    K-means   PSO      NM-PSO   K-PSO    K-NM-PSO   GaussPSO
Vowel       180       16,290   10,501   15,133   9,291      9,774
Iris        120       7,260    4,836    6,906    4,556      4,356
Crude Oil   150       11,325   7,394    10,807   7,057      6,795
CMC         270       36,585   23,027   34,843   21,597     21,951
Cancer      180       16,290   10,485   15,756   10,149     9,774
Wine        390       73,245   47,309   74,305   46,459     45,747
Average     215       26,833   17,259   26,292   16,519     16,400

The results of K-means, PSO, NM-PSO, K-PSO and K-NM-PSO can be found in [7].