Extended pso algorithm for improvement problems k means clustering algorithm

International Journal of Managing Information Technology (IJMIT) Vol.6, No.3, August 2014
EXTENDED PSO ALGORITHM FOR
IMPROVEMENT PROBLEMS K-MEANS
CLUSTERING ALGORITHM
Amin Rostami1and Maryam Lashkari2
1Department of Computer Engineering, Ferdows Branch, Islamic Azad University,
Ferdows, Iran.
2Department of Computer Engineering, Ferdows Branch, Islamic Azad University,
Ferdows, Iran.
ABSTRACT
The clustering is a without monitoring process and one of the most common data mining techniques. The
purpose of clustering is grouping similar data together in a group, so were most similar to each other in a
cluster and the difference with most other instances in the cluster are. In this paper we focus on clustering
partition k-means, due to ease of implementation and high-speed performance of large data sets, After 30
year it is still very popular among the developed clustering algorithm and then for improvement problem of
placing of k-means algorithm in local optimal, we pose extended PSO algorithm, that its name is ECPSO.
Our new algorithm is able to be cause of exit from local optimal and with high percent produce the
problem’s optimal answer. The probe of results show that mooted algorithm have better performance
regards as other clustering algorithms specially in two index, the carefulness of clustering and the quality
of clustering.
KEYWORDS
Clustering, Data Mining, Extended chaotic particle swarm optimization, K-means algorithm.
1. INTRODUCTION
Nowadays, usage of data mining observe in most of science, visibly. It’s obvious that if don’t
prepared suitable bedfast for use of this science, we will be away from achieved progress.
Clustering is one of the most common data mining tools. That use in most case such as:
engineering, data mining, medical science, social science and other items. As for clustering very
applications, need to clustering and data mining is necessary in most field for further progress.
First time idea of clustering represent in 1935 year and nowadays because of progresses and huge
mutation most of researchers pay attention to clustering. The clustering is process of collection
grouping form without label’s data.
That inner members have most similar to each other in a cluster and least similar to regard as
other cluster’s members. So, clustering is more ideal when two inner cluster likeness factor is
maximum and outside cluster likeness is least. There is other criterions, Such as: Euclidean
distance, hamming, for determination level of sample’s likeness to each other. That every
criterion have further usage in special field.
DOI : 10.5121/ijmit.2014.6302 17

Purpose’s function is convex and non-linear in most clustering problems [1]. It’s possible that
algorithm place in trap of local optimal and produce the problem’s optimal answer. There are
several clustering algorithm that grouping to following kinds. Hierarchical clustering algorithm,
partition, density based, model and graph based that each of them are more effective regard as
other algorithm in special data environment. In all of this algorithms, researchers try to balanced,
control or improve parameters to be more effective algorithm that consist of:
-high measurement,- having ability to work with high dimension – having ability to dynamic data
clustering, - having ability to work with high distance of problem,- having least need to additional
knowledge about problem,- suitable management from noises, and interpretable clusters.
Partition clustering algorithm is one of the most common and most applied from clustering
algorithm. That specific data collection to specified partition’s number. So that samples in every
partition have most similar to each other in a cluster and most difference with samples in other
clusters .K-means algorithm of is the famous clustering algorithm in this field [6]. And it’s one of
the favorite center-pivot clustering algorithms in clustering technique. K-means start with
Initialization to cluster’s centers and other things with regard to Euclidean distance criterion
allocate to one cluster that have least distance to cluster’s centers. In every algorithm repetition,
perform two chief phase. First, every item in data collection allocate to a cluster that have least
distance from cluster’s centers. In continue, after that spots grouping to K cluster the new
cluster’s centers calculate by estimate average from samples of every cluster. And algorithm
repeat. The temporal algorithm finish to there is any change in calculation of cluster’s centers and
or finish the repetition special number [9]. In this algorithm purpose function error square series
that goal is to reach a minimum it, that show in equation (1). X indicate to cluster’s samples and
C indicate to cluster’s center.
18
∅ =
∈min|| − ||

(1)
Advantages of K-means algorithm:
Ease of implementation and high-speed performance, measurable and efficient in large data
collection.
Disadvantages and problems of k-means algorithm:
1- Selection of the first cluster’s centers and number of cluster do by user. For this reason
clustering results is dependent to first algorithm’s selection and if first algorithm’s
condition don’t be suitable, it’s possible algorithm place in trap of local optimal.
2- Selection number of optimal cluster for problem is difficult.
3- This algorithm, because of calculation average from cluster samples for determination
cluster’s center, have weak management regard as noises and data.
4- This algorithm can’t be usable in data collection that calculation average is not
describable.
5- Data clustering is not usable with different forms and density.
In continue we will express techniques for improvement problems that usually this techniques
focus on 3 issue that are:
1- Determination way of selection first parameters.
2- Alternation in basic algorithm.
3- Combination clustering algorithm with other initiative algorithms.

And then new solution posed for improvement problems of placing k-means algorithm’s result in
local optimal and validation evaluate by using 3 real data collection and some indexes and finally,
we will have brief comparison from posed techniques. Continue of article organized in following
from:
19
1. Related works
2. Analysis and comparison posed algorithms.
3. The proposed method.
4. Simulation.
5. Conclusion
2. RELATED WORKS
K-medidos clustering algorithm:
This algorithm[1,3] for resolving problem of noises weak management in k-means algorithm and
also in perform in case that evaluation average for data collection is non describable. The idea
that posed in this algorithm is contemplate most central sample as cluster’s center in every cluster
rather that selection data’s average of one cluster.
Disadvantage: algorithm’s temporal complexity is high and it isn’t suitable and efficient for large
data collection. Result of clustering is sensitive to first condition of algorithm and determination
optimal K is difficult for problem.
CLARA clustering algorithm:
For solving the problem of k-medidos algorithm in large data sets, that is, high temporal
complexity posed CLARA algorithm [2]. In this algorithm solve the problem of k-medidos
algorithm, temporal complexity in large data collection, but there is a problem. Suppose , n is
total number of samples and m is most number of samples that this way of clustering can process
in objective time . If nm, often clustering from several small sample of data cause that
eliminate some of the data in same groups.
K-modes clustering algorithm:
For clustering nominal data, k-means algorithm isn’t suitable for this reason, posed generalized
way of k-means algorithm, k-means. In this algorithm [3] rather that evaluation average, we use
of mode every cluster as cluster’s centroid and also algorithm first parameters such as k-means,
selected randomly for this reason alongside advantage, be suitable for nominal data, it’s possible
that algorithm’s result place in local optimal and also be suitable only for nominal data and it’s
not efficient for numerical data.
Particles Swarm Optimization Clustering algorithm:
As we said before one of the problem of k-means algorithm is placing algorithm’s result in trap of
local optimal, because of algorithm local search in region of problem. Clustering algorithm based
on PSO algorithm problem posed [4]. For elimination of this clustering algorithm based on pso
have better operation rather than k-means algorithm with few dimension for data collection and
there is more probability for get all over optimal answer rather than k-means algorithm because of
all over research in region of problem but use of pso algorithm lead to much repetitions and slow
convergence for data with high volume. For this reason, we often combine this 2 algorithm with
each other to be complement and they cover weakness each other.

20
(2)
(3)
(4)
Chaotic particle swarm optimization clustering algorithm:
Two main problem of clustering using PSO method is the convergence to local optimal and slow
convergence velocity, which is tried to be solved by using two ideas of chaos theory and
acceleration strategy . In the formula of updating velocity of the cluster centers that is mentioned
in the (2) updating is done for each particle for relocating the particle to the new position, from
the best answer for each particle (Pbest) and the best global solution so far (gbest) . In which W
Inertia coefficient rate tends to previous velocity of the particle, c1 rates tends to the local best
position of the particle, and c2 trends to the best global position of the particle [5].
In (3) replacing cr instead of rr improves PSO algorithm as given:
= !# − $

% !# − $
= !# − $

− $ % !# − $
'() = * '($ − '($$
In (4), Cr random value is created for each round independently between 0 and 1.which
substitutes both r1 and r2, and parameter k is the number of predicted clusters. Using the chaos
theory in PSO population generation will result in more diverse of the algorithm.
Figure1. Chaos map [5]
As can be see in Figure 1. To achieve more optimal particle swarm optimization algorithm, chaos
theory is applied And in other change to increase the rate of convergence used acceleration
strategy therefore in this mode a number of the population which are the best toward the target -
move not all population that it increases the rate of convergence [5].
Genetic clustering algorithm:
In this algorithm [7] for exit from trap of local optimal in k-means algorithm we use of genetics
optimization algorithm for better data clustering. Because of evolutionary algorithm, such as
genetics, have ability for global search in answer, use of them for clustering, decrease probability
from placing answer of algorithm in local optimal. And finally produce more optimal answer for
clustering.
Ant colony clustering algorithm:
Ant colony clustering algorithm [9] is pivot population innovative algorithm that used for solving
problem of optimization, such as: clustering. This algorithm is capable to produce optimal answer
with high speed in clusters and with complex forms rather than other innovative algorithm. This

algorithm1- for better data clustering and reach to all over optimal answer with more probability
of k-means algorithm use.2- Of ant colony algorithm for data clustering process.
21
K-mica compound clustering algorithm:
This algorithm [11] is combination of colonial competition algorithm with k-means clustering
algorithm. In this algorithm after production of primary population, randomly, k-means algorithm
perform on available data with distinct numbers. Then obtained final cluster’s centers consider as
primary population of colonial competition algorithm that is imperialists and perform clustering
on them based on extended colonial competition algorithm and allocate colony to suitable
imperialists and clustering perform over data based on extended competition algorithm and
allocate colonies to suitable colonialisms.
Four hybrid strategies for combination continuous ant colony optimization with PSO
algorithm for utilizing in clustering process:
This algorithm [12] posed 4 hybrid strategies for combination PSO algorithm. Their examinations
show that utilizing hybrid strategies for clustering is so better than independent utilizing of k-means,
PSO, ACOR algorithm for clustering process.
Four Hybrid strategies that used by them are:
1: Series combination of 2 algorithm PSO, ACOR
2: Parallel combination of 2 algorithm PSO, ACOR
3: Series combination of 2 algorithm with one extended chart from pheromone-Particles
4: Substitution global best between 2 algorithm.
3. ANALYSIS AND COMPARISON OF ALGORITHM
As you see, we posed and checked several strategies and algorithms for elimination problems and
challenges of k-means clustering algorithm each of discussed algorithms have advantages and
disadvantages. Some of them expanded for elimination of previous algorithm limitation or they
are new strategies for solve the problems of k-means algorithm. Challenges of k-means algorithm
are:
1: Sensitivity to noise data
2: it’s limited to numerical data
3: Result of algorithm is dependent to primary condition and placing algorithm in local optimal.
4: Lack of suitable clustering for clusters with different forms and density.
In continuance, table 1 show the comparison of described algorithm from the point of view of
several important parameters. Empty cells of chart show that have any importance about specific
algorithm from relevant parameter.
Algorithm Advantages Disadvantages Temporal
complexity
Suitable
for data
sets
Result of
algorithm
Sensitivity
to noise
Kind of
algorithm
search
K-medidos
Better
manageme
nt of noises
and pert
data it’s
suitable for
data sets
which
evaluation
High temporal
complexity in
large data sets. It
is not suitable
for clusters with
different forms
and density.
Result of
algorithm is
O(k(n-k)2) Numeric
al
Most
central
members
in each
cluster
Have
not
Local

22
of average
isn’t
describable
in it.
related to first
algorithm
condition. And
there is high
probability for
placing result of
algorithm in
local optimal.
It’s hard to
determine
optimal (k) for
problem. Utility
of this algorithm
is lesser and it is
implementation
is more complex
rather than k-means
algorithm.
CLARA This
algorithm
can solve
the
problem of
k-medidos
algorithm,
that is high
temporal
complexity
in large
data sets
and also is
suitable for
massive
data sets
Having
weakness in
operation of
clustering. It is
not suitable for
clusters with
different forms
and density. The
result of
algorithm is
related to first
algorithm
condition and
there is high
probability for
placing result of
algorithm in
local optimal.
It’s hard to
determine
optimal(k) for
problem
O(k(40+k)2
+k(n-k))
Numeric
al
Most
central
members
in each
cluster
Have
not
Local
K-modes It’s suitable
for
clustering
of nominal
data sets.
Having
weakness in
clustering of
numerical data.
It is not suitable
for clusters with
different forms
and density. The
result of
algorithm is
related to first
algorithm
condition and
there is high
probability for
O(n) Nominal Mood of
each
cluster
Have
not
Local

23
placing result of
algorithm in
local optimal.
It’s hard to
determine
optimal (k) for
problem
Clusterin
g based
on PSO
algorith
m
Probability
for reach to
all over
optimal
answer and
exit of
local
optimal is
more than
k-means
algorithm
because of
all over
research in
area of
problem
Use of pso
algorithm lead to
much repetitions
and slow
convergence for
data with high
volume and it’s
suitable for data
sets with low
volume. First
copy from pso
algorithm is very
related to
problem
parameters. And
for this reason,
algorithm place
in local optimal.
- Numeric
al
Centers
of first
clusters
for k-means.
- Global
chaotic
particle
swarm
optimiza
tion
clusterin
g
algorith
m
There is
increase of
population
variation
and
increase of
convergenc
e speed in
pso
clustering
algorithm.
And there
is more
probability
for reach to
all over
optimal
answer
rather than
pso
clustering
algorithm.
There is higher
fiscal
complexity than
pso.
- Numeric
al
Centers
of first
clusters
for k-means
algorith
m or
correctio
n of
formed
clusters
by k-means
- Global
Clusterin
g
algorith
m based
on GA
algorith
m
Exit from
trap of
local
optimal
with high
percent and
there is
probability
for reach to
Low
convergence
speed and
increase of fiscal
complexity
- Nominal Centers
of
clusters
- Global

24
all over
optimal
answer for
clustering.
Clusterin
g
algorith
m based
on ant
colony
algorith
m
It produce
optimal
answer
with higher
speed and
complex
forms in
clusters
rather than
other
algorithms.
Better data
clustering
and reach
to all over
optimal
answer
with more
probability
rather than
k-means
algorithm
It’s possible that
algorithm place
in trap of local
optimal and
produce optimal
answer because
of randomly
things selection
by ants and
numbers of
repetition
- Numeric
al
Optimal
centers
of
clusters
- Global
Compou
nd
clusterin
g
algorith
m(PSO+
ACO+K
-means)
There is
improveme
nt in
problem of
first
condition
selection
for k-means
algorithm.
Increase of
convergenc
e speed to
all over
optimal
answer and
there is
more
probability
for close to
all over
optimal
answer
rather than
other
evolutionar
y algorithm
High fiscal
complexity
- Numeric
al
Optimal
centers
of
clusters
- Global

25
4. PROPOSED METHOD
4,1. Introduction of standard PSO algorithm and it is problem
Particle swarm optimization (PSO) is a population-based stochastic search process, modeled after
the social behavior of a bird flock. The algorithm maintains a population of particles, where each
particle represents a potential solution to an optimization problem.In the context of PSO, a swarm
refers to a number ofpotential solutions to the optimization problem, where eachpotential solution
is referred to as a particle. The aim of thePSO is to find the particle position that results in the
bestevaluation of a given fitness (objective) function.Each particle represents a position in Nd
dimensionalspace, and is :'flown'' through this multi-dimensional search space, adjusting its
position toward bothThe particle's best position found thus farThe best position in the
neighborhood of that panicle.Each particle i maintains the following information:
xi : The current position of the particle;
vi: The current velocity of the particle;
yi : The personal best position of the panicle.
Using the above notation. A particle's position is adjustedaccording to
.,# $ = .,#$ .,#$-..,#$ − .,#$/

.,#$.,#$ − .,#$$
0# $ = 0 # $
Where w is the inertia weight, c1 and c2 are the acceleration constants, r 1.j (t). r2.j(t) ~ U(0.1),
and k = 1. . . Nd. The velocity is thus calculated based on three contributions:
a fraction of the previous velocity.The cognitive component which is a function of the distance of
the particle from its personal best position. The social component which is a function of the
distance of the particle from the best particle found thus far (i.e. the best of the personal bests).
Important issue in standard PSO algorithm is rate of it is fast convergence that possible lead to
placing result of algorithm in local optimal. It’s clear that use of more informational sources
increase the space of search and distribution of algorithm and improve problem of PSO. There for
in suggestive algorithm that we called it, ECPSO1, briefly, try for increase utility of pso
algorithm implement changes in movement Particle function for Improvement from utility of
algorithm. In this equivalence, two randomly function(rand1,rand2) determined according to
recent posed strategies based on chaos map. That this function have hidden order rather than
randomly numbers that are disordered and this change cause improvement of utility from PSO
algorithm in clustering.
As you see in 20 equivalence, the new Particle speed in primary pso text calculate according to
local best situation of Particle and global best situation from all of the Particle. Entrance of global
best situation in speed of Particle, cased intense movement in displacement of Particles for going
to new situation and also cased fast convergence of pso algorithm and increase probability
placing algorithm in local optimal.
Because if this situation be a incorrect and deviant situation caused that particle have intense
deviation in it is movement. That is, one misled leader deviate all the population and algorithm
place in trap of local optimal and can’t reach to all over optimal answer. Therefor, in our
suggestive approach for solving this problem, consider (k) global best for all the population that
number of this global best determine according to population and during performance of
1
Extended chaotic particle swarm optimization
(5)
(6)

Extended pso algorithm for improvement problems k means clustering algorithm

More Related Content

What's hot

Similar to Extended pso algorithm for improvement problems k means clustering algorithm

More from IJMIT JOURNAL

Recently uploaded

Extended pso algorithm for improvement problems k means clustering algorithm