International Journal of Managing Information Technology (IJMIT) Vol.6, No.3, August 2014 
EXTENDED PSO ALGORITHM FOR 
IMPROVEMENT PROBLEMS K-MEANS 
CLUSTERING ALGORITHM 
Amin Rostami1 and Maryam Lashkari2
1Department of Computer Engineering, Ferdows Branch, Islamic Azad University, 
Ferdows, Iran. 
2Department of Computer Engineering, Ferdows Branch, Islamic Azad University, 
Ferdows, Iran. 
ABSTRACT 
Clustering is an unsupervised process and one of the most common data mining techniques. The
purpose of clustering is to group similar data together, so that the instances in a cluster are as similar
as possible to each other and as different as possible from the instances in other clusters. In this paper
we focus on k-means partitional clustering which, owing to its ease of implementation and high speed on
large data sets, is still very popular among clustering algorithms after 30 years. To address the problem
of k-means becoming trapped in local optima, we propose an extended PSO algorithm named ECPSO.
The new algorithm is able to escape from local optima and, with high probability, produces the
problem's optimal answer. The results show that the proposed algorithm performs better than other
clustering algorithms, especially on two indices: the accuracy of clustering and the quality of
clustering.
KEYWORDS 
Clustering, Data Mining, Extended chaotic particle swarm optimization, K-means algorithm. 
1. INTRODUCTION 
Nowadays the use of data mining is visible in most sciences. Clearly, if a suitable foundation for
using this science is not prepared, we will fall behind the progress already achieved.
Clustering is one of the most common data mining tools and is used in many fields, such as
engineering, data mining, medical science and social science. Given the many applications of
clustering, clustering and data mining are necessary in most fields for further progress.
The idea of clustering was first presented in 1935, and today, because of the progress and rapid
development in the field, most researchers pay attention to it. Clustering is the process of grouping
a collection of unlabeled data so that the members of a cluster are most similar to each other and
least similar to the members of other clusters. Clustering is therefore closest to ideal when the
similarity within clusters is maximal and the similarity between clusters is minimal. There are several
criteria, such as Euclidean distance and Hamming distance, for determining how similar samples are
to each other, and each criterion is more useful in particular fields.
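The two similarity criteria named above can be sketched as follows. This is a minimal illustration only; the function names `euclidean` and `hamming` and the sample values are assumptions made for the example and do not come from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    """Hamming distance: number of positions at which two records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    return sum(x != y for x, y in zip(a, b))

print(euclidean((0.0, 0.0), (3.0, 4.0)))                              # 5.0
print(hamming(("red", "small", "round"), ("red", "large", "round")))  # 1
```

Euclidean distance suits numerical attributes, while Hamming distance only asks whether two values match, which is why it is the usual choice for nominal attributes.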
DOI: 10.5121/ijmit.2014.6302
The objective function in most clustering problems is non-convex and non-linear [1], so an
algorithm may become trapped in a local optimum and fail to produce the problem's optimal answer.
There are several kinds of clustering algorithms: hierarchical, partitional, density-based, model-based
and graph-based. Each of them is more effective than the others in particular data environments. In all
of these algorithms, researchers try to balance, control or improve the following properties to make the
algorithm more effective:
- scalability;
- the ability to work with high-dimensional data;
- the ability to cluster dynamic data;
- the ability to work with a large problem space;
- minimal need for additional knowledge about the problem;
- suitable handling of noise;
- interpretable clusters.
Partitional clustering is one of the most common and most widely applied families of clustering
algorithms. It divides a data collection into a specified number of partitions so that the samples in
each partition are most similar to each other and most different from the samples in other
clusters. K-means is the best-known clustering algorithm in this family [6] and one of the most
popular center-based clustering algorithms. K-means starts by initializing the cluster centers; every
other item is then allocated, according to the Euclidean distance criterion, to the cluster whose center
is closest. Each iteration of the algorithm performs two main phases. First, every item in the data
collection is allocated to the cluster whose center is nearest. Next, once the points have been grouped
into K clusters, the new cluster centers are calculated as the average of the samples in each cluster.
The algorithm then repeats. It terminates when the cluster centers no longer change or when a
specified number of iterations has been reached [9]. The objective function of this algorithm is the sum
of squared errors, which is to be minimized, as shown in equation (1), where X denotes the set of
samples and C denotes the set of cluster centers.

$$\phi = \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert^{2} \qquad (1)$$
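The two phases and the objective in equation (1) can be sketched as below. This is a minimal pure-Python illustration, not the authors' implementation: the names `sq_dist`, `sse` and `kmeans`, the choice of k random samples as initial centers, and the toy data are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centers):
    """Equation (1): each sample contributes its squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initial centers are k random samples
    for _ in range(max_iter):
        # Phase 1: allocate every sample to the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Phase 2: recompute each center as the mean of its cluster.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster: keep the old center
        if new_centers == centers:  # stop when the centers no longer change
            break
        centers = new_centers
    return centers, sse(points, centers)

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, cost = kmeans(points, k=2)
print(centers, cost)
```

On this small, well-separated data set every choice of initial centers leads to the same two clusters; on harder data the result depends on the initial centers, which is the local-optimum problem discussed in this paper.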
Advantages of the k-means algorithm:
Ease of implementation, high speed, and scalability and efficiency on large data
collections.
Disadvantages and problems of the k-means algorithm:
1- The initial cluster centers and the number of clusters are selected by the user. The
clustering results therefore depend on this initial selection, and if the initial conditions are
unsuitable the algorithm may become trapped in a local optimum.
2- Selecting the optimal number of clusters for a problem is difficult.
3- Because the cluster center is determined by averaging the samples of the cluster, the
algorithm handles noise and outlier data poorly.
4- The algorithm cannot be used on data collections for which an average is not
defined.
5- It is not suitable for clustering data whose clusters have different shapes and densities.
In what follows we describe techniques for mitigating these problems. These techniques usually
focus on three issues:
1- Determining how the initial parameters are selected.
2- Modifying the basic algorithm.
3- Combining the clustering algorithm with other heuristic algorithms.
We then propose a new solution to the problem of the k-means result being trapped in a local
optimum, evaluate its validity using three real data collections and several indices, and finally
give a brief comparison of the techniques presented. The remainder of the article is organized as
follows:
1. Related works 
2. Analysis and comparison of the presented algorithms.
3. The proposed method. 
4. Simulation. 
5. Conclusion 
2. RELATED WORKS 
K-medoids clustering algorithm:
This algorithm [1,3] was proposed to resolve the weak noise handling of the k-means algorithm and
to work in cases where an average of the data collection is not defined. Its idea is to take the most
central sample of each cluster as the cluster center, rather than the average of the data in the
cluster.
Disadvantages: the time complexity of the algorithm is high, so it is neither suitable nor efficient for
large data collections. The clustering result is sensitive to the initial conditions of the algorithm, and
determining the optimal K for a problem is difficult.
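The difference between an average-based center and a "most central sample" can be illustrated with a small sketch. The helper names `mean_center` and `medoid` and the one-dimensional data with a single outlier are assumptions chosen for the example.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_center(members):
    """Center as used by k-means: the coordinate-wise average."""
    return tuple(sum(col) / len(members) for col in zip(*members))

def medoid(members):
    """Center as used by k-medoids: the member with the smallest total distance to the others."""
    return min(members, key=lambda m: sum(dist(m, other) for other in members))

cluster = [(1.0,), (2.0,), (3.0,), (4.0,), (100.0,)]  # 100.0 plays the role of noise
print(mean_center(cluster))  # (22.0,)
print(medoid(cluster))       # (3.0,)
```

The single outlier drags the average far from the bulk of the cluster, while the medoid stays inside it. The price is the pairwise distance computation inside `medoid`, which is one source of the higher time complexity noted above.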
CLARA clustering algorithm:
The CLARA algorithm [2] was proposed to solve the problem of the k-medoids algorithm on large
data sets, namely its high time complexity. It solves that problem, but it has a drawback of its own.
Suppose n is the total number of samples and m is the largest number of samples that this
clustering method can process in reasonable time. If n ≫ m, clustering several small samples of the
data often causes some of the data that belong to the same groups to be left out.
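The sampling idea can be sketched as follows. This is only an illustration under stated assumptions: the exhaustive search in `best_medoids` is a stand-in for the k-medoids step and is practical only for very small samples, and the defaults of 5 samples of size 40 + 2k follow the way CLARA is commonly described rather than anything stated in this paper. All function names are hypothetical.

```python
import itertools
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def best_medoids(sample, k):
    """Stand-in for k-medoids: try every k-subset of a small sample."""
    return min(itertools.combinations(sample, k),
               key=lambda ms: total_cost(sample, ms))

def clara(points, k, n_samples=5, sample_size=None, seed=0):
    rng = random.Random(seed)
    if sample_size is None:
        sample_size = min(len(points), 40 + 2 * k)  # assumed default sample size
    best, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, sample_size)
        medoids = best_medoids(sample, k)    # cluster only the sample
        cost = total_cost(points, medoids)   # but judge it on the full data set
        if cost < best_cost:
            best, best_cost = medoids, cost
    return list(best), best_cost

points = [(float(i), 0.0) for i in range(5)] + [(float(i), 0.0) for i in range(100, 105)]
print(clara(points, k=2, sample_size=6))
```

Because medoids can only be chosen from the sampled points, a sample that misses part of a group cannot represent it well, which is the weakness described above for n much larger than m.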
K-modes clustering algorithm:
The k-means algorithm is not suitable for clustering nominal data; for this reason a generalization
of k-means, called k-modes, was proposed. In this algorithm [3], the mode of each cluster is used as
the cluster centroid instead of the average. As in k-means, the initial parameters of the algorithm are
selected randomly, so alongside its advantage of being suitable for nominal data, its result may be
trapped in a local optimum. It is also suitable only for nominal data and is not efficient for
numerical data.
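A minimal sketch of the two ingredients of k-modes, the attribute-wise mode as centroid and a matching-based distance for allocation, is given below. The function names and the toy records are assumptions made for illustration; Hamming distance is used here as the dissimilarity measure.

```python
from collections import Counter

def hamming(a, b):
    """Number of attributes on which two nominal records differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(members):
    """Attribute-wise mode of a cluster of nominal records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))

def assign(records, modes):
    """Index of the nearest mode for every record."""
    return [min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records]

cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
print(cluster_mode(cluster))  # ('red', 'small')

modes = [("red", "small"), ("blue", "large")]
print(assign([("red", "small"), ("blue", "large"), ("red", "medium")], modes))  # [0, 1, 0]
```

No arithmetic is performed on the attribute values, so the procedure is defined for nominal data where an average is not.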
Particle Swarm Optimization clustering algorithm:
As mentioned earlier, one of the problems of the k-means algorithm is that its result can be trapped
in a local optimum, because the algorithm searches the problem space locally. A clustering algorithm
based on PSO was proposed to eliminate this problem [4]. On data collections with few dimensions,
PSO-based clustering performs better than k-means, and because it searches the problem space
globally it is more likely than k-means to reach the global optimum. However, PSO requires many
iterations and converges slowly on high-volume data. For this reason the two algorithms are often
combined so that they complement each other and cover each other's weaknesses.
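One common way to apply PSO to clustering, assumed here for illustration and not spelled out in this section, is to let each particle hold the coordinates of all k cluster centers and to score it with the k-means objective. The names `decode` and `fitness` and the sample data are hypothetical.

```python
def decode(particle, k, dim):
    """Split a flat position vector into k centers of dimension dim."""
    return [tuple(particle[j * dim:(j + 1) * dim]) for j in range(k)]

def fitness(particle, points, k):
    """Sum of squared distances from each sample to its nearest decoded center (lower is better)."""
    dim = len(points[0])
    centers = decode(particle, k, dim)
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, center)) for center in centers)
        for p in points
    )

points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
particle = [1.0, 0.0, 11.0, 0.0]       # encodes the centers (1, 0) and (11, 0)
print(decode(particle, k=2, dim=2))    # [(1.0, 0.0), (11.0, 0.0)]
print(fitness(particle, points, k=2))  # 4.0
```

Under this encoding, combining the two algorithms can mean, for example, seeding a particle with k-means centers or refining the best particle with a few k-means iterations.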
Chaotic particle swarm optimization clustering algorithm:
The two main problems of clustering with PSO are convergence to local optima and slow
convergence speed. This algorithm tries to solve them using two ideas: chaos theory and an
acceleration strategy. Equation (2) gives the velocity update for the cluster centers. It is applied to
each particle to move the particle to its new position, using the best answer found by that particle
(pbest) and the best global solution found so far (gbest):

$$v_{id}^{new} = w\, v_{id}^{old} + c_1 r_1 \left(pbest_{id} - x_{id}^{old}\right) + c_2 r_2 \left(gbest_{d} - x_{id}^{old}\right) \qquad (2)$$

Here w is the inertia coefficient, which weights the tendency of the particle to keep its previous
velocity, c1 weights the tendency toward the particle's own best position, and c2 weights the tendency
toward the global best position [5].
In (3), replacing r1 and r2 with Cr improves the PSO algorithm:

$$v_{id}^{new} = w\, v_{id}^{old} + c_1\, Cr \left(pbest_{id} - x_{id}^{old}\right) + c_2 \left(1 - Cr\right) \left(gbest_{d} - x_{id}^{old}\right) \qquad (3)$$

$$Cr(t+1) = k \times Cr(t) \times \left(1 - Cr(t)\right) \qquad (4)$$

In (4), a random value Cr between 0 and 1 is generated independently for each round; it
substitutes for both r1 and r2, and the parameter k is the number of predicted clusters. Using chaos
theory in PSO population generation makes the algorithm more diverse.
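Equations (3) and (4) can be sketched in a few lines. This is an illustration only: the function names, the default values of w, c1 and c2, and the starting value of Cr are assumptions. The example passes k = 4.0 to the map purely as an assumption of this sketch, because the logistic map in (4) stays within [0, 1] only for k up to 4.

```python
def logistic_step(cr, k):
    """Equation (4): Cr(t+1) = k * Cr(t) * (1 - Cr(t))."""
    return k * cr * (1.0 - cr)

def chaotic_velocity(v, x, pbest, gbest, cr, w=0.7, c1=1.5, c2=1.5):
    """Equation (3), applied to every dimension of one particle."""
    return [
        w * vd + c1 * cr * (pb - xd) + c2 * (1.0 - cr) * (gb - xd)
        for vd, xd, pb, gb in zip(v, x, pbest, gbest)
    ]

cr = 0.3  # assumed starting value of the chaotic sequence
for _ in range(5):
    cr = logistic_step(cr, k=4.0)
    print(round(cr, 4))

print(chaotic_velocity([0.0], [0.0], [1.0], [2.0], cr=0.5, w=1.0, c1=2.0, c2=2.0))  # [3.0]
```

Because Cr and 1 - Cr always sum to one, a large pull toward pbest in a given round is paired with a small pull toward gbest, and the reverse.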
Figure 1. Chaos map [5]
As can be seen in Figure 1, chaos theory is applied to obtain a better particle swarm optimization
algorithm. In a second change, an acceleration strategy is used to increase the rate of convergence:
in this mode only a number of the best members of the population move toward the target, rather
than the whole population, which increases the rate of convergence [5].
Genetic clustering algorithm:
In this algorithm [7], a genetic optimization algorithm is used to escape the local optima of the
k-means algorithm and to cluster the data better. Because evolutionary algorithms such as genetic
algorithms are able to search the solution space globally, using them for clustering decreases the
probability of the result being trapped in a local optimum and ultimately produces a better answer for
clustering.
Ant colony clustering algorithm:
The ant colony clustering algorithm [9] is a population-based heuristic algorithm used for solving
optimization problems such as clustering. Compared with other heuristic algorithms, it is able to
produce the optimal answer quickly, including for clusters with complex shapes. This algorithm
1- clusters the data better and reaches the global optimum with a higher probability than the
k-means algorithm, and 2- uses the ant colony algorithm for the data clustering process.
K-MICA compound clustering algorithm:
This algorithm [11] combines the imperialist competitive algorithm with the k-means clustering
algorithm. After the initial population is generated randomly, the k-means algorithm is run on the
available data a specified number of times. The final cluster centers obtained are then taken as the
initial population of the imperialist competitive algorithm, that is, as the imperialists. Clustering is
then performed on the data based on the extended imperialist competitive algorithm, and the colonies
are allocated to suitable imperialists.
Four hybrid strategies for combining continuous ant colony optimization with the PSO
algorithm for use in clustering:
This work [12] proposed four hybrid strategies for combining the PSO algorithm with continuous ant
colony optimization (ACOR). The experiments show that using the hybrid strategies for clustering is
considerably better than using the k-means, PSO or ACOR algorithm alone.
The four hybrid strategies are:
1: Series combination of the two algorithms, PSO and ACOR
2: Parallel combination of the two algorithms, PSO and ACOR
3: Series combination of the two algorithms with an enlarged pheromone-particle table
4: Exchange of the global best between the two algorithms.
3. ANALYSIS AND COMPARISON OF ALGORITHMS
In the previous section we presented and examined several strategies and algorithms for addressing
the problems and challenges of the k-means clustering algorithm. Each of the algorithms discussed has
advantages and disadvantages. Some were developed to remove the limitations of an earlier
algorithm, while others are new strategies for solving the problems of k-means. The challenges of the
k-means algorithm are:
1: Sensitivity to noisy data.
2: It is limited to numerical data.
3: The result depends on the initial conditions, and the algorithm may be trapped in a local optimum.
4: It does not cluster well when clusters have different shapes and densities.
Table 1 compares the algorithms described above in terms of several important parameters. An
empty cell (marked "-") indicates that the parameter is not relevant to that particular
algorithm.
Table 1. Comparison of the described algorithms

| Algorithm | Advantages | Disadvantages | Time complexity | Suitable data sets | Result of algorithm | Sensitivity to noise | Kind of search |
|---|---|---|---|---|---|---|---|
| K-medoids | Better handling of noise and outlier data. Suitable for data sets for which an average is not defined. | High time complexity on large data sets. Not suitable for clusters with different shapes and densities. The result depends on the initial conditions, and there is a high probability of the result being trapped in a local optimum. It is hard to determine the optimal k for a problem. Its performance is lower and its implementation is more complex than k-means. | O(k(n-k)^2) | Numerical | Most central member of each cluster | No | Local |
| CLARA | Solves the problem of k-medoids, namely high time complexity on large data sets. Suitable for massive data sets. | Weak clustering performance. Not suitable for clusters with different shapes and densities. The result depends on the initial conditions, and there is a high probability of the result being trapped in a local optimum. It is hard to determine the optimal k for a problem. | O(k(40+k)^2 + k(n-k)) | Numerical | Most central member of each cluster | No | Local |
| K-modes | Suitable for clustering nominal data sets. | Weak at clustering numerical data. Not suitable for clusters with different shapes and densities. The result depends on the initial conditions, and there is a high probability of the result being trapped in a local optimum. It is hard to determine the optimal k for a problem. | O(n) | Nominal | Mode of each cluster | No | Local |
| Clustering based on PSO | More likely than k-means to reach the global optimum and escape local optima, because it searches the whole problem space. | PSO requires many iterations and converges slowly on high-volume data, so it is suitable for low-volume data sets. The basic version of PSO depends strongly on the problem parameters, and for this reason the algorithm can be trapped in a local optimum. | - | Numerical | Initial cluster centers for k-means | - | Global |
| Chaotic particle swarm optimization clustering | Increases population diversity and convergence speed compared with PSO clustering. More likely than PSO clustering to reach the global optimum. | Higher computational complexity than PSO. | - | Numerical | Initial cluster centers for k-means, or correction of the clusters formed by k-means | - | Global |
| Clustering based on GA | Escapes local optima with high probability and can reach the global optimum for clustering. | Low convergence speed and increased computational complexity. | - | Nominal | Cluster centers | - | Global |
| Clustering based on ant colony | Produces the optimal answer faster than other algorithms, including for clusters with complex shapes. Clusters the data better and reaches the global optimum with a higher probability than k-means. | The algorithm may be trapped in a local optimum and return a sub-optimal answer, because of the random selection of items by the ants and the number of iterations. | - | Numerical | Optimal cluster centers | - | Global |
| Compound clustering (PSO + ACO + K-means) | Improves the selection of initial conditions for k-means. Increases the speed of convergence to the global optimum and is more likely to get close to the global optimum than other evolutionary algorithms. | High computational complexity. | - | Numerical | Optimal cluster centers | - | Global |
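To give a feel for the two time-complexity expressions listed for K-medoids and CLARA, the sketch below simply evaluates them for a few data set sizes. The function names are hypothetical, and the numbers are values of the expressions themselves, not measured running times.

```python
def kmedoids_ops(n, k):
    """Value of k(n-k)^2, the expression listed for K-medoids."""
    return k * (n - k) ** 2

def clara_ops(n, k):
    """Value of k(40+k)^2 + k(n-k), the expression listed for CLARA."""
    return k * (40 + k) ** 2 + k * (n - k)

for n in (1_000, 100_000):
    print(n, kmedoids_ops(n, 5), clara_ops(n, 5))
# 1000 4950125 15100
# 100000 49995000125 510100
```

The K-medoids expression grows quadratically in n, while the CLARA expression grows only linearly, which matches the claim that CLARA is the more practical choice for massive data sets.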
4. PROPOSED METHOD 
4.1. Introduction of the standard PSO algorithm and its problem
Particle swarm optimization (PSO) is a population-based stochastic search process, modeled after
the social behavior of a bird flock. The algorithm maintains a population of particles, where each
particle represents a potential solution to an optimization problem. In the context of PSO, a swarm
refers to a number of potential solutions to the optimization problem, where each potential solution
is referred to as a particle. The aim of PSO is to find the particle position that results in the
best evaluation of a given fitness (objective) function. Each particle represents a position in
Nd-dimensional space and is "flown" through this multi-dimensional search space, adjusting its
position toward both:
- the particle's best position found thus far;
- the best position in the neighborhood of that particle.
Each particle i maintains the following information:
xi: the current position of the particle;
vi: the current velocity of the particle;
yi: the personal best position of the particle.
Using the above notation, a particle's position is adjusted according to

$$v_{i,k}(t+1) = w\, v_{i,k}(t) + c_1 r_{1,k}(t) \left(y_{i,k}(t) - x_{i,k}(t)\right) + c_2 r_{2,k}(t) \left(\hat{y}_{k}(t) - x_{i,k}(t)\right) \qquad (5)$$

$$x_{i}(t+1) = x_{i}(t) + v_{i}(t+1) \qquad (6)$$

where w is the inertia weight, c1 and c2 are the acceleration constants, r1,k(t), r2,k(t) ~ U(0,1),
k = 1, ..., Nd, and ŷ is the best position found by any particle thus far. The velocity is thus
calculated based on three contributions:
- a fraction of the previous velocity;
- the cognitive component, which is a function of the distance of the particle from its personal
best position;
- the social component, which is a function of the distance of the particle from the best particle
found thus far (i.e. the best of the personal bests).
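The three contributions to the velocity can be made explicit in a short sketch of one update step for a single particle. The function name `pso_step` and the default values of w, c1 and c2 are assumptions chosen for illustration, not values given in this paper.

```python
import random

def pso_step(x, v, y, y_hat, w=0.72, c1=1.49, c2=1.49, rng=random):
    """One velocity and position update for a single particle.

    x, v  : current position and velocity
    y     : personal best position of this particle
    y_hat : best position found by any particle so far
    """
    new_x, new_v = [], []
    for k in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # r1, r2 ~ U(0, 1), drawn per dimension
        inertia = w * v[k]                        # fraction of the previous velocity
        cognitive = c1 * r1 * (y[k] - x[k])       # pull toward the personal best
        social = c2 * r2 * (y_hat[k] - x[k])      # pull toward the global best
        vk = inertia + cognitive + social
        new_v.append(vk)
        new_x.append(x[k] + vk)                   # position update
    return new_x, new_v

rng = random.Random(0)
x, v = pso_step([1.0, 2.0], [0.5, -0.5], y=[0.0, 0.0], y_hat=[3.0, 3.0], rng=rng)
print(x, v)
```

When a particle already sits on both its personal best and the global best, the cognitive and social terms vanish and only the inertia term moves it.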
An important issue in the standard PSO algorithm is its fast convergence, which can lead to the
result of the algorithm being trapped in a local optimum. Clearly, using more sources of information
widens the search space and the diversity of the algorithm and alleviates this problem of PSO.
Therefore, in the proposed algorithm, which we call ECPSO (extended chaotic particle swarm
optimization) for short, we modify the particle movement function to improve the performance of
the PSO algorithm. In this equation, the two random functions (rand1, rand2) are determined
according to recently proposed strategies based on a chaos map. Unlike random numbers, which are
disordered, these functions have a hidden order, and this change improves the performance of the PSO
algorithm in clustering.
As the velocity update equation shows, in the original PSO the new particle velocity is calculated
from the local best position of the particle and the global best position of all particles. Including the
global best position in the velocity of the particle causes particles to move sharply toward their new
positions; it also makes the PSO algorithm converge quickly and increases the probability of the
algorithm being trapped in a local optimum.
If the global best position is incorrect or misleading, the particle deviates strongly in its
movement. That is, one misled leader misleads the whole population, and the algorithm becomes
trapped in a local optimum and cannot reach the global optimum. Therefore, to solve this problem,
our proposed approach considers k global bests for the whole population; the number of these
global bests is determined according to the population and during the execution of

Extended pso algorithm for improvement problems k means clustering algorithm

  • 1.
    International Journal ofManaging Information Technology (IJMIT) Vol.6, No.3, August 2014 EXTENDED PSO ALGORITHM FOR IMPROVEMENT PROBLEMS K-MEANS CLUSTERING ALGORITHM Amin Rostami1and Maryam Lashkari2 1Department of Computer Engineering, Ferdows Branch, Islamic Azad University, Ferdows, Iran. 2Department of Computer Engineering, Ferdows Branch, Islamic Azad University, Ferdows, Iran. ABSTRACT The clustering is a without monitoring process and one of the most common data mining techniques. The purpose of clustering is grouping similar data together in a group, so were most similar to each other in a cluster and the difference with most other instances in the cluster are. In this paper we focus on clustering partition k-means, due to ease of implementation and high-speed performance of large data sets, After 30 year it is still very popular among the developed clustering algorithm and then for improvement problem of placing of k-means algorithm in local optimal, we pose extended PSO algorithm, that its name is ECPSO. Our new algorithm is able to be cause of exit from local optimal and with high percent produce the problem’s optimal answer. The probe of results show that mooted algorithm have better performance regards as other clustering algorithms specially in two index, the carefulness of clustering and the quality of clustering. KEYWORDS Clustering, Data Mining, Extended chaotic particle swarm optimization, K-means algorithm. 1. INTRODUCTION Nowadays, usage of data mining observe in most of science, visibly. It’s obvious that if don’t prepared suitable bedfast for use of this science, we will be away from achieved progress. Clustering is one of the most common data mining tools. That use in most case such as: engineering, data mining, medical science, social science and other items. As for clustering very applications, need to clustering and data mining is necessary in most field for further progress. 
First time idea of clustering represent in 1935 year and nowadays because of progresses and huge mutation most of researchers pay attention to clustering. The clustering is process of collection grouping form without label’s data. That inner members have most similar to each other in a cluster and least similar to regard as other cluster’s members. So, clustering is more ideal when two inner cluster likeness factor is maximum and outside cluster likeness is least. There is other criterions, Such as: Euclidean distance, hamming, for determination level of sample’s likeness to each other. That every criterion have further usage in special field. DOI : 10.5121/ijmit.2014.6302 17
  • 2.
    International Journal ofManaging Information Technology (IJMIT) Vol.6, No.3, August 2014 Purpose’s function is convex and non-linear in most clustering problems [1]. It’s possible that algorithm place in trap of local optimal and produce the problem’s optimal answer. There are several clustering algorithm that grouping to following kinds. Hierarchical clustering algorithm, partition, density based, model and graph based that each of them are more effective regard as other algorithm in special data environment. In all of this algorithms, researchers try to balanced, control or improve parameters to be more effective algorithm that consist of: -high measurement,- having ability to work with high dimension – having ability to dynamic data clustering, - having ability to work with high distance of problem,- having least need to additional knowledge about problem,- suitable management from noises, and interpretable clusters. Partition clustering algorithm is one of the most common and most applied from clustering algorithm. That specific data collection to specified partition’s number. So that samples in every partition have most similar to each other in a cluster and most difference with samples in other clusters .K-means algorithm of is the famous clustering algorithm in this field [6]. And it’s one of the favorite center-pivot clustering algorithms in clustering technique. K-means start with Initialization to cluster’s centers and other things with regard to Euclidean distance criterion allocate to one cluster that have least distance to cluster’s centers. In every algorithm repetition, perform two chief phase. First, every item in data collection allocate to a cluster that have least distance from cluster’s centers. In continue, after that spots grouping to K cluster the new cluster’s centers calculate by estimate average from samples of every cluster. And algorithm repeat. 
The temporal algorithm finish to there is any change in calculation of cluster’s centers and or finish the repetition special number [9]. In this algorithm purpose function error square series that goal is to reach a minimum it, that show in equation (1). X indicate to cluster’s samples and C indicate to cluster’s center. 18 ∅ = ∈min|| − ||
  • 3.
    (1) Advantages ofK-means algorithm: Ease of implementation and high-speed performance, measurable and efficient in large data collection. Disadvantages and problems of k-means algorithm: 1- Selection of the first cluster’s centers and number of cluster do by user. For this reason clustering results is dependent to first algorithm’s selection and if first algorithm’s condition don’t be suitable, it’s possible algorithm place in trap of local optimal. 2- Selection number of optimal cluster for problem is difficult. 3- This algorithm, because of calculation average from cluster samples for determination cluster’s center, have weak management regard as noises and data. 4- This algorithm can’t be usable in data collection that calculation average is not describable. 5- Data clustering is not usable with different forms and density. In continue we will express techniques for improvement problems that usually this techniques focus on 3 issue that are: 1- Determination way of selection first parameters. 2- Alternation in basic algorithm. 3- Combination clustering algorithm with other initiative algorithms.
  • 4.
    International Journal ofManaging Information Technology (IJMIT) Vol.6, No.3, August 2014 And then new solution posed for improvement problems of placing k-means algorithm’s result in local optimal and validation evaluate by using 3 real data collection and some indexes and finally, we will have brief comparison from posed techniques. Continue of article organized in following from: 19 1. Related works 2. Analysis and comparison posed algorithms. 3. The proposed method. 4. Simulation. 5. Conclusion 2. RELATED WORKS K-medidos clustering algorithm: This algorithm[1,3] for resolving problem of noises weak management in k-means algorithm and also in perform in case that evaluation average for data collection is non describable. The idea that posed in this algorithm is contemplate most central sample as cluster’s center in every cluster rather that selection data’s average of one cluster. Disadvantage: algorithm’s temporal complexity is high and it isn’t suitable and efficient for large data collection. Result of clustering is sensitive to first condition of algorithm and determination optimal K is difficult for problem. CLARA clustering algorithm: For solving the problem of k-medidos algorithm in large data sets, that is, high temporal complexity posed CLARA algorithm [2]. In this algorithm solve the problem of k-medidos algorithm, temporal complexity in large data collection, but there is a problem. Suppose , n is total number of samples and m is most number of samples that this way of clustering can process in objective time . If nm, often clustering from several small sample of data cause that eliminate some of the data in same groups. K-modes clustering algorithm: For clustering nominal data, k-means algorithm isn’t suitable for this reason, posed generalized way of k-means algorithm, k-means. 
In this algorithm [3] rather that evaluation average, we use of mode every cluster as cluster’s centroid and also algorithm first parameters such as k-means, selected randomly for this reason alongside advantage, be suitable for nominal data, it’s possible that algorithm’s result place in local optimal and also be suitable only for nominal data and it’s not efficient for numerical data. Particles Swarm Optimization Clustering algorithm: As we said before one of the problem of k-means algorithm is placing algorithm’s result in trap of local optimal, because of algorithm local search in region of problem. Clustering algorithm based on PSO algorithm problem posed [4]. For elimination of this clustering algorithm based on pso have better operation rather than k-means algorithm with few dimension for data collection and there is more probability for get all over optimal answer rather than k-means algorithm because of all over research in region of problem but use of pso algorithm lead to much repetitions and slow convergence for data with high volume. For this reason, we often combine this 2 algorithm with each other to be complement and they cover weakness each other.
  • 5.
Chaotic particle swarm optimization clustering algorithm: The two main problems of clustering with the PSO method are convergence to a local optimum and slow convergence velocity. This algorithm tries to solve them with two ideas: chaos theory and an acceleration strategy. The velocity update of the cluster centers in (2) is applied to each particle to relocate it to a new position, using the best answer found by that particle (pbest) and the best global solution found so far (gbest):

$$ v_{id}^{new} = w \, v_{id}^{old} + c_1 r_1 \left( pbest_{id} - x_{id}^{old} \right) + c_2 r_2 \left( gbest_{d} - x_{id}^{old} \right) \tag{2} $$

Here $v_{id}$ and $x_{id}$ are the velocity and position of particle i in dimension d. The inertia coefficient w weights the tendency toward the previous velocity of the particle, c1 weights the tendency toward the local best position of the particle, and c2 weights the tendency toward the best global position [5]. In (3), replacing the random numbers r with the chaotic value Cr improves the PSO algorithm as follows:

$$ v_{id}^{new} = w \, v_{id}^{old} + c_1 \, Cr \left( pbest_{id} - x_{id}^{old} \right) + c_2 \left( 1 - Cr \right) \left( gbest_{d} - x_{id}^{old} \right) \tag{3} $$

$$ Cr(t+1) = k \, Cr(t) \left( 1 - Cr(t) \right) \tag{4} $$

In (4), a random value Cr between 0 and 1 is created independently for each round; it substitutes for both r1 and r2, and the parameter k is the number of predicted clusters. Using chaos theory in PSO population generation makes the algorithm more diverse.

Figure 1. Chaos map [5]

As can be seen in Figure 1, chaos theory is applied to make the particle swarm optimization algorithm more optimal. As a second change, an acceleration strategy is used to increase the rate of convergence: only a number of the best members of the population move toward the target, rather than the whole population, which increases the rate of convergence [5].

Genetic clustering algorithm: In this algorithm [7], a genetic optimization algorithm is used to escape the local-optimum trap of k-means and to cluster the data better. Because evolutionary algorithms such as genetic algorithms can search the solution space globally, using them for clustering decreases the probability that the result is trapped in a local optimum and ultimately produces a more optimal clustering.

Ant colony clustering algorithm: The ant colony clustering algorithm [9] is a population-based heuristic algorithm used for solving optimization problems such as clustering. Compared with other heuristic algorithms, it can produce an optimal answer at high speed, even for clusters with complex shapes. This
algorithm uses (1) the k-means algorithm, to cluster the data better and to reach the global optimum with higher probability, and (2) the ant colony algorithm for the data clustering process.

K-MICA compound clustering algorithm: This algorithm [11] combines the imperialist competitive (colonial competition) algorithm with the k-means clustering algorithm. After the initial population is generated randomly, the k-means algorithm is run on the available data a specified number of times. The resulting final cluster centers are taken as the initial population of the imperialist competitive algorithm, that is, the imperialists. Clustering is then performed on the data with the extended imperialist competitive algorithm, and the colonies are allocated to suitable imperialists.

Four hybrid strategies for combining continuous ant colony optimization with the PSO algorithm for use in the clustering process: This work [12] proposed four hybrid strategies for combining the PSO algorithm with ACOR. Its experiments show that using the hybrid strategies for clustering is much better than using the k-means, PSO, or ACOR algorithm independently. The four hybrid strategies are:
1: Series combination of the two algorithms, PSO and ACOR.
2: Parallel combination of the two algorithms, PSO and ACOR.
3: Series combination of the two algorithms with an extended pheromone-particle table.
4: Exchange of the global best between the two algorithms.

3. ANALYSIS AND COMPARISON OF ALGORITHMS

As shown above, several strategies and algorithms have been proposed for eliminating the problems and challenges of the k-means clustering algorithm, and each of them has advantages and disadvantages. Some were developed to remove the limitations of an earlier algorithm, and others are new strategies for solving the problems of k-means.
The challenges of the k-means algorithm are:
1: Sensitivity to noisy data.
2: It is limited to numerical data.
3: The result depends on the initial conditions, and the algorithm can become trapped in a local optimum.
4: It does not cluster well when clusters have different shapes and densities.

Table 1 compares the described algorithms with respect to several important parameters. An empty cell (shown as "-") indicates that the parameter is not significant for the corresponding algorithm.

K-medoids
  Advantages: Better handling of noise and outliers. Suitable for data sets in which a mean cannot be defined.
  Disadvantages: High time complexity on large data sets. Not suitable for clusters with different shapes and densities. The result depends on the initial conditions, and there is a high probability of being trapped in a local optimum. It is hard to determine the optimal k for the problem. Its efficiency is lower and its implementation is more complex than that of k-means.
  Time complexity: O(k(n-k)^2)
  Suitable for data sets: Numerical
  Result of algorithm: Most central member of each cluster
  Sensitivity to noise: No
  Kind of search: Local

CLARA
  Advantages: Solves the problem of K-medoids, namely high time complexity on large data sets. Suitable for massive data sets.
  Disadvantages: Weaker clustering performance. Not suitable for clusters with different shapes and densities. The result depends on the initial conditions, and there is a high probability of being trapped in a local optimum. It is hard to determine the optimal k for the problem.
  Time complexity: O(k(40+k)^2 + k(n-k))
  Suitable for data sets: Numerical
  Result of algorithm: Most central member of each cluster
  Sensitivity to noise: No
  Kind of search: Local

K-modes
  Advantages: Suitable for clustering nominal data sets.
  Disadvantages: Weak at clustering numerical data. Not suitable for clusters with different shapes and densities. The result depends on the initial conditions, and there is a high probability of being trapped in a local optimum. It is hard to determine the optimal k for the problem.
  Time complexity: O(n)
  Suitable for data sets: Nominal
  Result of algorithm: Mode of each cluster
  Sensitivity to noise: No
  Kind of search: Local

Clustering based on the PSO algorithm
  Advantages: More likely than k-means to reach the global optimum and to escape local optima, because it searches the problem space globally.
  Disadvantages: PSO needs many iterations and converges slowly on large data sets, so it is suitable for small data sets. The basic version of PSO depends strongly on the problem parameters, and for this reason the algorithm can become trapped in a local optimum.
  Time complexity: -
  Suitable for data sets: Numerical
  Result of algorithm: Initial cluster centers for k-means
  Sensitivity to noise: -
  Kind of search: Global

Chaotic particle swarm optimization clustering algorithm
  Advantages: Increases the population diversity and the convergence speed of PSO clustering. More likely than PSO clustering to reach the global optimum.
  Disadvantages: Higher computational complexity than PSO.
  Time complexity: -
  Suitable for data sets: Numerical
  Result of algorithm: Initial cluster centers for the k-means algorithm, or correction of the clusters formed by k-means
  Sensitivity to noise: -
  Kind of search: Global

Clustering algorithm based on the GA algorithm
  Advantages: Escapes the trap of local optima with high probability and can reach the global optimum for clustering.
  Disadvantages: Low convergence speed and increased computational complexity.
  Time complexity: -
  Suitable for data sets: Nominal
  Result of algorithm: Centers of clusters
  Sensitivity to noise: -
  Kind of search: Global

Clustering algorithm based on the ant colony algorithm
  Advantages: Produces an optimal answer faster than other algorithms, including for clusters with complex shapes. Clusters data better and is more likely than k-means to reach the global optimum.
  Disadvantages: The algorithm may be trapped in a local optimum rather than produce the optimal answer, because of the random selection of objects by the ants and the number of iterations.
  Time complexity: -
  Suitable for data sets: Numerical
  Result of algorithm: Optimal centers of clusters
  Sensitivity to noise: -
  Kind of search: Global

Compound clustering algorithm (PSO+ACO+K-means)
  Advantages: Improves the selection of initial conditions for k-means. Converges faster to the global optimum and is more likely than other evolutionary algorithms to get close to it.
  Disadvantages: High computational complexity.
  Time complexity: -
  Suitable for data sets: Numerical
  Result of algorithm: Optimal centers of clusters
  Sensitivity to noise: -
  Kind of search: Global
4. PROPOSED METHOD

4.1. Introduction to the standard PSO algorithm and its problem

Particle swarm optimization (PSO) is a population-based stochastic search process, modeled after the social behavior of a bird flock. The algorithm maintains a population of particles, where each particle represents a potential solution to an optimization problem. In the context of PSO, a swarm refers to a number of potential solutions to the optimization problem, where each potential solution is referred to as a particle. The aim of PSO is to find the particle position that results in the best evaluation of a given fitness (objective) function. Each particle represents a position in $N_d$-dimensional space and is "flown" through this multi-dimensional search space, adjusting its position toward both the particle's best position found thus far and the best position in the neighborhood of that particle. Each particle i maintains the following information:

$x_i$: the current position of the particle;
$v_i$: the current velocity of the particle;
$y_i$: the personal best position of the particle.

Using the above notation, a particle's position is adjusted according to

$$ v_{i,k}(t+1) = w \, v_{i,k}(t) + c_1 r_{1,k}(t) \left( y_{i,k}(t) - x_{i,k}(t) \right) + c_2 r_{2,k}(t) \left( \hat{y}_{k}(t) - x_{i,k}(t) \right) \tag{5} $$

$$ x_i(t+1) = x_i(t) + v_i(t+1) \tag{6} $$

where w is the inertia weight, $c_1$ and $c_2$ are the acceleration constants, $r_{1,k}(t), r_{2,k}(t) \sim U(0,1)$, $k = 1, \ldots, N_d$, and $\hat{y}(t)$ is the best position found thus far by any particle. The velocity is thus calculated based on three contributions: a fraction of the previous velocity; the cognitive component, which is a function of the distance of the particle from its personal best position; and the social component, which is a function of the distance of the particle from the best particle found thus far (i.e. the best of the personal bests).

An important issue in the standard PSO algorithm is its fast rate of convergence, which may cause the result of the algorithm to become trapped in a local optimum. Clearly, using more sources of information enlarges the search space and the spread of the algorithm and mitigates this problem of PSO. Therefore, in the proposed algorithm, which we call ECPSO¹ for short, we change the particle movement function to improve the performance of the PSO algorithm. In the modified equation, the two random functions (rand1, rand2) are determined by a chaos map, following recently proposed strategies. Unlike ordinary random numbers, which are disordered, these functions have a hidden order, and this change improves the performance of PSO in clustering.

As the velocity update equation above shows, in the original PSO the new particle velocity is calculated from the local best position of the particle and the global best position of all particles. Including the global best position in the particle velocity causes intense movement of the particles toward their new positions. It also causes fast convergence of the PSO algorithm and increases the probability that the algorithm is trapped in a local optimum, because if the global best is an incorrect, misleading position, the particle deviates strongly in its movement. That is, one misled leader misleads the whole population, the algorithm falls into the trap of a local optimum, and it cannot reach the global optimum.
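The velocity and position updates of the standard PSO described above can be sketched as follows. This is a minimal illustration and not the ECPSO movement function itself: the helper names (logistic_map, update_particle) are hypothetical, and the logistic map with parameter 4.0 is assumed here as one common choice of chaos map for supplying rand1 and rand2 in place of uniform random numbers.

```python
import random

def logistic_map(seed=0.7, mu=4.0):
    """Yield a chaotic sequence in [0, 1] using cr <- mu * cr * (1 - cr).

    Assumption: mu = 4.0 is the usual fully chaotic setting. The seed should
    avoid 0, 0.25, 0.5, 0.75 and 1, which collapse to a fixed point.
    """
    cr = seed
    while True:
        cr = mu * cr * (1.0 - cr)
        yield cr

def update_particle(x, v, pbest, gbest, w, c1, c2, rand=random.random):
    """Apply the standard PSO velocity and position updates to one particle.

    x, v, pbest, gbest are equal-length lists of floats; `rand` is any
    zero-argument callable that returns a number in [0, 1].
    """
    new_v = []
    for k in range(len(x)):
        r1, r2 = rand(), rand()
        inertia = w * v[k]                          # fraction of previous velocity
        cognitive = c1 * r1 * (pbest[k] - x[k])     # pull toward personal best
        social = c2 * r2 * (gbest[k] - x[k])        # pull toward global best
        new_v.append(inertia + cognitive + social)
    new_x = [xk + vk for xk, vk in zip(x, new_v)]
    return new_x, new_v
```

Passing `rand=random.random` gives the ordinary update, while creating `chaos = logistic_map()` and passing `rand=lambda: next(chaos)` swaps in the chaotic sequence without changing the rest of the update.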
¹ Extended chaotic particle swarm optimization

Therefore, to solve this problem, our proposed approach considers k global bests for the whole population; the number of these global bests is determined according to the population and during the performance of