University of Science and Technology Houari Boumediene
ACO-MEDOIDS
Using Ant Colony Optimization for partitioning data into clusters
MOUDJARI Leila
l.moudj11@gmail.com
April 15, 2017
Presentation plan
Introduction
Clustering and related work
What Is Cluster Analysis?
Requirements of clustering
Categorization of Clustering Methods
The importance of swarm intelligence and the ACO approach
Ant Colony Optimization
Adaptation of ACO to the medoids problem
ACO-MEDOIDS algorithm
An ant
The search space
Solution construction
Selecting rule
Fitness function
Pheromone update
The empirical parameters
Conclusion
MOUDJARI Leila | ACO-MEDOIDS
Introduction
Data mining is used in several disciplines: database systems, statistics, machine learning, visualization, information science...
A data mining system can perform several tasks, such as characterization, discrimination, association or correlation analysis, classification, prediction, clustering, outlier analysis, or evolution analysis. These tasks can be classified as supervised or unsupervised. Data clustering is an unsupervised learning task and one of the most challenging problems in data mining; it is also classified as an NP-hard problem.
One of the strongest disciplines to face this class of problems, and one that remains promising, is swarm intelligence. Therefore, like other researchers, we leaned towards this discipline.
Introduction
Over the last years, many works have been presented in this area; we mention BAT-CLARA [1], Association Rule Mining Based on the Bat Algorithm [2], MACOC: a medoid-based ACO clustering algorithm [3], SACOC: a spectral-based ACO clustering algorithm [4]...
Introduction
Clustering is a large field, and much work might still be needed in its different areas. However, we are concentrating on partitioning algorithms, precisely partitioning the dataset into k clusters, which is also an NP-hard task.
The most well-known and commonly used partitioning methods are k-means, k-medoids (PAM), and their variations [5], such as CLARA, CLARANS, and CLAM (a recent one from 2011, using a hybrid metaheuristic between VNS and Tabu Search to solve the k-medoid clustering problem) [6], etc.
We hereby present an algorithm for k-medoid clustering based on an ACO solution search: the ACO-medoids. As its name indicates, the algorithm uses Ant Colony Optimization to explore the search space looking for an optimal set of medoids, relying on k-medoids for the necessary clustering concepts.
Clustering and related work
What Is Cluster Analysis?
Clustering is an unsupervised learning process: it does not rely on predefined classes or class-labeled training examples; therefore, it is considered a form of learning by observation rather than learning by examples.
It aims to reduce the data size by grouping similar objects in one cluster. Given a set of data objects, a clustering algorithm must be capable of grouping the different objects into classes, so that high intra-group similarity and low inter-group similarity are ensured.
The similarity or dissimilarity is assessed via a distance measure (the Euclidean or Manhattan distance, or other distance measures, may be used).
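As a small illustration (a sketch of ours, not part of the original slides), the two distance measures named above can be computed like this:

```python
import math

def euclidean(a, b):
    # straight-line distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # sum of absolute coordinate differences ("city block" distance)
    return sum(abs(x - y) for x, y in zip(a, b))

print(euclidean([0, 0], [3, 4]))  # 5.0
print(manhattan([0, 0], [3, 4]))  # 7
```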
Clustering and related work
Requirements of clustering
Scalability,
Ability to deal with different types of attributes,
Ability to deal with noisy data,
High dimensionality (number of attributes)...
Categorization of Clustering Methods
In general, clustering algorithms can be classified into the following categories:
Partitioning methods
This class is characterized by a predefined number k of partitions, each partition representing a cluster, such that each cluster must contain at least one object and an object must belong to at most one group. The best-known methods are k-means and k-medoids.
Hierarchical methods
These create a hierarchical decomposition of the dataset and can be classified as either agglomerative (bottom-up) or divisive (top-down).
Categorization of Clustering Methods
Density-based methods
Unlike partitioning methods, these are based on the notion of density (number of objects or data points) instead of distance. They continue growing a given cluster as long as the density in its "neighborhood" exceeds some threshold. DBSCAN and its extension OPTICS are typical density-based methods.
There are also grid-based methods, model-based methods, constraint-based clustering...
Clustering and related work
k-means algorithm
Input: k (the number of clusters),
       D (a data set containing n objects).
Output: A set of k clusters.
Begin
1. arbitrarily choose k objects from D as the initial cluster centers;
2. repeat
3.   (re)assign each object to the cluster to which the object is the most similar;
4.   update the cluster means, i.e., calculate the mean value of the objects for each cluster;
5. until no change;
End.
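The k-means pseudocode can be sketched in plain Python as follows (a minimal illustration of ours, using squared Euclidean distance; real code would use numpy or scikit-learn):

```python
import random

def kmeans(D, k, max_iter=100):
    """Minimal k-means sketch: D is a list of numeric tuples."""
    centers = random.sample(D, k)  # step 1: arbitrary initial centers
    clusters = []
    for _ in range(max_iter):
        # step 3: (re)assign each object to the closest center
        clusters = [[] for _ in range(k)]
        for o in D:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(o, centers[c])))
            clusters[i].append(o)
        # step 4: recompute each center as the mean of its cluster
        new_centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # step 5: stop when nothing changes
            break
        centers = new_centers
    return centers, clusters
```

For example, on two well-separated clumps of points, the loop converges to one center per clump after a few iterations.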
Clustering and related work
k-medoids algorithm
Input: k (the number of clusters),
       D (a data set containing n objects).
Output: A set of k clusters.
Begin
1. arbitrarily choose k objects from D as the initial representative objects (medoids);
2. repeat
3.   assign each remaining object to the cluster of the nearest medoid;
4.   randomly select a non-representative object, o_rand;
5.   compute the total cost, S, of swapping a representative object, o_j, with o_rand;
6.   if S < 0 then swap o_j with o_rand to form the new set of k medoids;
7. until no change;
End.
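The swap test of steps 4-6 can be sketched as follows (our illustration, assuming a user-supplied distance function `dist`; the helper names are ours):

```python
import random

def total_cost(medoids, D, dist):
    # cost of a medoid set: each object contributes its distance to the nearest medoid
    return sum(min(dist(o, m) for m in medoids) for o in D)

def pam_swap_step(medoids, D, dist):
    """One PAM-style iteration sketch: try swapping each medoid with a random
    non-representative object and keep the swap only if the cost decreases."""
    o_rand = random.choice([o for o in D if o not in medoids])
    best = list(medoids)
    for j in range(len(medoids)):
        candidate = medoids[:j] + [o_rand] + medoids[j + 1:]
        S = total_cost(candidate, D, dist) - total_cost(best, D, dist)
        if S < 0:  # negative S: the swap improves the clustering
            best = candidate
    return best
```

Because a swap is accepted only when S < 0, the cost of the returned medoid set never exceeds that of the input set.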
Clustering and related work
k-medoids was presented as a solution to some of k-means' flaws, such as its sensitivity to outliers and the fact that the centroids are abstract objects. PAM proved that using real objects diminishes the error value. However, it has some shortcomings: on large datasets it falls short, due to the significant amount of time needed to construct the set of medoids. Researchers therefore tried to improve it, which is why the clustering field witnessed the birth of its variations: CLARA, followed by CLARANS, and others as already mentioned. However, the problem persists: how can we gain in scalability without losing in quality?
In recent years, clustering drew the attention of the metaheuristic community, and several works have been presented. One of the promising optimization methods is ACO.
The importance of swarm intelligence and the ACO approach
Swarm intelligence
It is well known that we are more effective when we work with others rather than in isolation, and this is the core of swarm intelligence.
Swarm intelligence is based on the collective behavior of species. Each method is the result of observing nature and analyzing intelligent forms of group behavior; it results in the simulation of the studied collective behaviors of insects, animals, and humans. It gained popularity with the burst of artificial intelligence in the 80s, especially for combinatorial problems. Such problems are divided into classes: P (polynomial), NP, NP-complete, and NP-hard. The latter two generally have an exponential complexity.
The importance of swarm intelligence and the ACO approach
Problem ∈ {NP-hard, NP-complete} ==> call 911.
Clustering ∈ NP-hard ==> swarm intelligence.
The importance of swarm intelligence and the ACO approach
Ant Colony Optimization
ACO showed its strength when dealing with problems related to graphs.
It was driven by the fascination for ants: how they work in harmony to nourish themselves and build a habitat.
They cooperate and help each other by sharing useful information, such as the path to take or to avoid.
They communicate through stigmergy, using a substance they release called "pheromone".
The use of ACO-based algorithms is very broad and domain-based; therefore it has been adapted to several types of problems.
The importance of swarm intelligence and the ACO approach
Ant Colony Optimization: the algorithm
ACO algorithm
Begin
1. while (not stop conditions) do
2.   for k=1 to Nb-ants do
3.   begin
4.     build a solution (S_k);
5.     evaluate (S_k);
6.     apply online pheromone update;
7.   end-for;
8.   determine the best solution of the current iteration;
9.   apply offline pheromone update;
10. end-while;
End.
The importance of swarm intelligence and the ACO approach
Ant Colony Optimization
One of the advantages of applying ACO algorithms to clustering problems is that ACO performs a global search in the solution space, which is less likely to get trapped in local minima and, thus, has the potential to find more accurate solutions [7].
The algorithm uses an iterative search strategy to find an approximate optimal solution, using the pheromone trail and a heuristic.
ACO has been successfully adopted for multiple problems. Works on unsupervised learning have focused on clustering, showing the potential of ACO-based techniques.
Clustering and ACO
Nevertheless, more work needs to be done, especially for medoid-based clustering, which is more efficient than classical centroid-based techniques. In this area, diverse algorithms have been proposed, such as:
"An adaptive multi-agent ACO clustering algorithm", in 2005, by Weijiao Zhang and Chunhuang Liu.
"Classification with cluster-based Bayesian multi-nets using Ant Colony Optimization", in 2014, by Khalid M. Salama and Alex A. Freitas.
Also MACOC: a medoid-based ACO clustering algorithm, in 2014.
Recently, "Medoid-based clustering algorithms using ant colony optimization" (METACOC and METACOC-K) were proposed in 2016 by Héctor D. Menéndez, Fernando E. B. Otero, and David Camacho [7].
...etc.
Adaptation of ACO to the medoids problem
ACO-MEDOIDS algorithm
"ACO-medoids" finds the best set of k medoids, based on ant colony optimization and k-medoids. We will start with the general form of the algorithm.
ACO-medoids algorithm
Input: k (the number of clusters),
       D (a data set containing n objects),
       M (the similarity (distance) matrix).
Output: A set of k clusters.
Begin
// start by creating the initial population
1. foreach ant do
2.   arbitrarily choose k objects from D as the initial solution of the ant;
3. end-foreach;
4. while (change or i < Max-Iter) do
5.   foreach ant do
6.   begin
7.     build a solution (S_k);
8.     evaluate (S_k);
9.     update Abest and Vbest;
10.    apply online pheromone update;
11.  end-foreach;
12.  determine the best solution of the current iteration;
13.  apply offline pheromone update;
14. end-while;
End.
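A compact Python skeleton of this loop is sketched below. It is our illustration, not the authors' implementation: the solution-construction step is simplified to pheromone-weighted sampling (the full selecting rule and local search are described on later slides), and the parameter names (n_ants, max_iter) are ours.

```python
import random

def aco_medoids(D, k, dist, n_ants=10, max_iter=20, rho=0.1, tau0=1.0):
    """Skeleton of the ACO-medoids loop (a sketch under simplifying assumptions)."""
    n = len(D)
    T = [tau0] * n  # one pheromone value per object

    def cost(sol):
        # sum of distances from every object to its nearest medoid
        return sum(min(dist(D[i], D[m]) for m in sol) for i in range(n))

    best_sol, best_cost = None, float("inf")
    for _ in range(max_iter):
        it_best_sol, it_best_cost = None, float("inf")
        for _ant in range(n_ants):
            # build a solution: draw k distinct medoid indices, biased by pheromone
            sol = []
            while len(sol) < k:
                i = random.choices(range(n), weights=T)[0]
                if i not in sol:
                    sol.append(i)
            c = cost(sol)
            if c < it_best_cost:                 # best of the current iteration
                it_best_sol, it_best_cost = list(sol), c
            if c < best_cost:                    # update Abest / Vbest
                best_sol, best_cost = list(sol), c
            for i in sol:                        # online pheromone update
                T[i] = (1 - rho) * T[i] + rho * tau0
        if it_best_cost > 0:                     # offline update rewards the iteration best
            for i in it_best_sol:
                T[i] = (1 - rho) * T[i] + rho * (1.0 / it_best_cost)
    return [D[i] for i in best_sol], best_cost
```

On a toy one-dimensional dataset with two clumps, the loop reliably places one medoid in each clump.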
Adaptation of ACO to the medoids problem
An ant
A virtual agent in the multi-dimensional search space. It has the following properties:
sol: the current solution of the ant,
Abest: the best solution found so far by the ant,
Vbest: the valuation of Abest.
Adaptation of ACO to the medoids problem
The search space
It includes all potential combinations of objects that can build a set of medoids (solutions), verifying the similarity/dissimilarity constraint of clustering. The number of possible solutions depends on k (the number of clusters): if we have n objects that need to be placed in k clusters, then the number is determined as follows:
For the first object we have n possibilities,
for the next one we have n − 1,
and for the k-th we have n − k + 1 possibilities.
The total number of solutions is then equal to n · (n − 1) · ... · (n − k + 1), meaning it grows exponentially.
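This count is the falling factorial n!/(n − k)!, i.e., the number of ordered selections of k medoids out of n objects (if the order of the medoids is ignored, one would divide by k!). A quick check in Python:

```python
import math

# number of ordered ways to pick k medoids out of n objects:
# n * (n - 1) * ... * (n - k + 1) = n! / (n - k)!
def n_solutions(n, k):
    return math.perm(n, k)

print(n_solutions(10, 3))  # 10 * 9 * 8 = 720
```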
Adaptation of ACO to the medoids problem
Solution construction
In order to build a solution, an ant has two possible strategies: exploit or explore. The first is a local-search-based method that helps an ant improve its solution; the second helps to explore new promising regions, as shown in the following pseudocode.
Procedure: constructSolution
Input: an ant
Output: an ant
Begin
// choose a strategy randomly
1. S0: a random variable uniformly distributed in [0,1];
2. if S0 <= Sp then
3.   sol = explore;
4.   applyLocalSearch(sol);
5. else
6.   applyLocalSearch(Abest);
7. endif;
End.
Adaptation of ACO to the medoids problem
Solution construction
Procedure: explore
Input: D the dataset
Output: s
Begin
1. s = empty;
2. while (i < k and D not empty) do
3.   select o_i from D using the selecting rule;
4.   append o_i to s;
5.   eliminate o_i from D;
6. endwhile;
End.
Adaptation of ACO to the medoids problem
Solution construction
Procedure: localSearch
Input: the solution to be improved
Output: a solution
Begin
1. for j = 1 to lmax do
2.   for m = 1 to mds do
3.     C: the corresponding cluster;
4.     choose an object o_rand from C;
5.     compute the total cost S of swapping the representative object Sol[m] with o_rand;
6.     if S < 0 then swap o_rand with Sol[m] to form the new set of k representative objects;
7.     update clusters;
8.   endfor;
9. endfor;
End.
Adaptation of ACO to the medoids problem
Selecting rule
The selecting process tries to find the furthest object in the selection D from the set of objects already chosen as medoids, using the following formula:

j = argmax_{u ∈ Y} {T(u)}   if q ≤ q0
j = argmax_{u ∈ Y} {P(u)}   otherwise        (1)

P_u(t) = T_u(t) / Σ_{v ∈ Y} T_v(t)

where:
T_j is the pheromone amount of the j-th object ∈ D,
Y is the set of possible medoids,
P is the probability that data instance j could be selected as a medoid,
q is a random number distributed uniformly in [0, 1],
q0 is an empirical parameter.
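This pseudo-random proportional rule can be sketched as follows (our illustration; the function name is ours, and exploration is implemented with pheromone-proportional sampling):

```python
import random

def select_medoid(Y, T, q0):
    """Selecting rule sketch of formula (1):
    Y is the list of candidate object indices, T maps an index to its pheromone."""
    q = random.random()
    if q <= q0:
        # exploitation: take the candidate with the highest pheromone amount
        return max(Y, key=lambda u: T[u])
    # exploration: sample u with probability P(u) = T[u] / sum of T over Y
    total = sum(T[u] for u in Y)
    return random.choices(Y, weights=[T[u] / total for u in Y])[0]
```

With q0 close to 1 the rule mostly exploits the strongest pheromone trail; with q0 close to 0 it mostly explores.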
Adaptation of ACO to the medoids problem
Fitness function
It is used to evaluate a solution; it represents the cost (Ecost) of a solution. However, in order to compare two solutions we calculate S = S_new − S_old: if S is negative, then the new solution is better than the old one.

Ecost = Σ_{i=1}^{k} Σ_{j=1}^{C_i} M[m_i, j]

where:
M is the distance matrix,
C_i is the number of objects in cluster i,
m_i is the medoid of cluster i.

Another possible objective function is the sum of the probabilities P calculated with the following formula; the aim is to maximize it:

if q ≤ q0:
P = 1 if j = argmax_{u ∈ Y} {T(u)}, 0 otherwise        (2)
else P is calculated as in formula (1).
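The Ecost double sum can be sketched directly from the distance matrix (our illustration; clusters are given as lists of object indices and medoids as one index per cluster):

```python
def e_cost(clusters, medoids, M):
    """Ecost sketch: sum, over all clusters, of the distances between each
    cluster member and that cluster's medoid (M is the distance matrix)."""
    return sum(M[medoids[i]][j]
               for i, cluster in enumerate(clusters)
               for j in cluster)
```

A smaller Ecost means a tighter clustering, which is why a negative S = S_new − S_old indicates an improvement.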
Adaptation of ACO to the medoids problem
Pheromone update
Regarding the pheromone updates, we used the online and offline updates, calculated as follows.

Online update
T_i(t+1) = (1 − ρ) T_i(t) + ρ τ0
where:
ρ is the evaporation rate (also an empirical parameter),
τ0 is the initial value of the pheromone.

Offline update
At the end of each iteration, the offline update is performed: the ant with the best current solution deposits an amount of pheromone equal to ∆T_i(t). The update is performed using this formula:
T_i(t+1) = (1 − ρ) T_i(t) + ρ ∆T_i(t)
where:
∆T_i(t) = 1/C if the ant's solution uses object i, 0 otherwise        (3)
C: the cost of the ant's solution (Ecost).
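Both update rules translate directly into code (our sketch; T is a list of pheromone values indexed by object):

```python
def online_update(T, sol, rho, tau0):
    # each object used in the ant's solution moves toward the initial level tau0
    for i in sol:
        T[i] = (1 - rho) * T[i] + rho * tau0

def offline_update(T, best_sol, best_cost, rho):
    # the iteration-best ant deposits delta = 1 / C on the objects it used (formula (3));
    # every other object only evaporates (delta = 0)
    delta = 1.0 / best_cost
    for i in range(len(T)):
        d = delta if i in best_sol else 0.0
        T[i] = (1 - rho) * T[i] + rho * d
```

Note that the better (cheaper) the best solution, the larger the deposit 1/C, so good medoid sets accumulate pheromone faster.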
Adaptation of ACO to the medoids problem
The empirical parameters
This section presents the different empirical parameters that need to be defined in order to improve the solution quality.

parameter    role
A            the number of ants
Max-Iter     the number of iterations of the algorithm
lmax         the number of iterations of the local search
Sp ∈ [0,1]   the intensification/diversification (strategy) rate
q0           the selection rate
mds          the number of clusters to be updated (can be equal to k or randomly chosen each time in [1, k])
ρ            the evaporation rate
Conclusion
We presented some ideas for the use of ACO to solve the medoids problem, through a proposed medoid- and ACO-based clustering algorithm we called "ACO-medoids". It is based on the ants' collective behavior and on k-medoids for building the clusters. Implementation and tests still need to be done before we can be conclusive regarding the algorithm's behavior. However, swarm-based algorithms, including ACO, have proved that they can improve the time/space complexity of NP-hard problems. Therefore, we believe that the algorithm can provide a near-optimal solution in a finite amount of time.
Bibliography
[1] Yasmine Aboubi, Habiba Drias, Nadjet Kamel. BAT-CLARA: BAT-inspired algorithm for clustering large applications. IFAC-PapersOnLine 49-12, 243–248, 2016.
[2] Kamel Eddine Heraguemi, Nadjet Kamel, Habiba Drias. Association rule mining based on bat algorithm. Journal of Computational and Theoretical Nanoscience 12(7):1195–1200, 2015.
[3] Héctor D. Menéndez, Fernando E. B. Otero, David Camacho. MACOC: a medoid-based ACO clustering algorithm. DOI: 10.1007/978-3-319-09952-1_11, 2014.
[4] Héctor D. Menéndez, Fernando E. B. Otero, David Camacho. SACOC: a spectral-based ACO clustering algorithm. DOI: 10.1007/978-3-319-10422-5_20, 2014.
[5] Data mining: concepts and techniques (second edition). Elsevier, 2011.
[6] Q. Nguyen & V.J. Rayward-Smith.