Data Clustering Using
Swarm Intelligence Algorithms
An Overview
Faculty of Computers and Information,
Cairo University and SRGE member
Mona M. Soliman
http://www.egyptscience.net
Bio-inspiring and evolutionary computation: Trends, applications and open issues
workshop, 7 Nov. 2015 Faculty of Computers and Information, Cairo University
Agenda
• Introduction
• Types of data clustering
• Classical clustering algorithms
• Swarm Intelligence algorithms
• Clustering with SI algorithms
Introduction
Problem Definition
• Clustering is the act of partitioning an unlabeled dataset into groups of similar objects.
• Each group, called a "cluster", consists of objects that are similar to one another and dissimilar to objects in other groups.
• From a machine learning perspective, clusters correspond to hidden patterns in the data, the search for clusters is a kind of unsupervised learning, and the resulting system represents a data concept.
Introduction
Motivation
• In the past few decades, cluster analysis has played a central role in a variety of fields, including:
  • Engineering (machine learning, artificial intelligence, pattern recognition, mechanical engineering, electrical engineering)
  • Computer sciences (web mining, spatial database analysis, textual document collection, image segmentation)
  • Life and medical sciences (genetics, biology, microbiology, paleontology, psychiatry, pathology)
  • Earth sciences (geography, geology, remote sensing)
  • Social sciences (sociology, psychology, archeology, education)
  • Economics (marketing, business)
What is a good clustering?
Inter-cluster distances are maximized.
Intra-cluster distances are minimized.
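The two criteria above can be checked numerically. The following is a minimal sketch, not taken from the slides: the function names and the toy data are illustrative assumptions, and mean pairwise Euclidean distance is only one of several possible ways to measure intra- and inter-cluster distance.

```python
import math
from itertools import combinations

def mean_intra_cluster_distance(clusters):
    """Average Euclidean distance between pairs of points in the same cluster."""
    dists = [math.dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(dists) / len(dists)

def mean_inter_cluster_distance(clusters):
    """Average Euclidean distance between pairs of points in different clusters."""
    dists = [math.dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a for q in b]
    return sum(dists) / len(dists)

# Hypothetical toy data: two compact, well-separated groups.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]
intra = mean_intra_cluster_distance(clusters)
inter = mean_inter_cluster_distance(clusters)
print(f"intra = {intra:.3f}, inter = {inter:.3f}")
```

For a good clustering, the intra-cluster value should be small relative to the inter-cluster value, as it is for this toy data.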
Types of Data Clustering
Data Clustering
• Hierarchical
  • Agglomerative (bottom-up)
  • Divisive (top-down)
• Partitional
  • Error minimization (K-means)
  • Graph theoretic (minimal spanning tree)
  • Density based (expectation maximization)
  • Model based (decision tree, neural network)
Hierarchical clustering
Partitional clustering
[Figure panels: Original Points; A Partitional Clustering]
The Classical Clustering Algorithms
k-means Algorithm
• The k-means algorithm groups D-dimensional data vectors into a predefined number of clusters, using Euclidean distance as the similarity criterion.
• Euclidean distances between data vectors within a cluster are smaller than the distances to data vectors in other clusters.
• Vectors of the same cluster are associated with one centroid vector, which represents the center of that cluster and is the mean of the data vectors that belong to it.
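The description above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenter's implementation: the function name `kmeans`, the random choice of initial centroids from the data, the fixed seed, and the toy data are all hypothetical choices made for the example.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Basic k-means: assign each point to its nearest centroid (Euclidean
    distance), then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two square groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids), labels)
```

The number of clusters `k` must be supplied in advance, which matches the "predefined number of clusters" in the description.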
Swarm Intelligence Algorithms
Biological Foundation
• The collective and social behavior of living creatures motivated researchers to undertake the study of what is today known as Swarm Intelligence (SI).
• Efforts to mimic such behaviors through computer simulation resulted in the field of SI.
• SI systems are typically made up of a population of simple agents interacting locally with one another and with their environment.
Swarm Intelligence Algorithms
An Overview
• The behavior of a single ant, bee, termite, or wasp is often very simple, but their collective and social behavior is of paramount significance.
• Ant Colony Optimization (1992)
• Particle Swarm Optimization (1995)
• Fish Swarm Optimization (2002)
• Bee Swarm Optimization (2005)
• Cat Swarm Optimization (2006)
• Firefly Optimization (2008)
• Cuckoo Search Optimization (2009)
Clustering with the SI Algorithms
Relevance of SI Algorithms in Clustering
- SI algorithms are mainly stochastic search and optimization techniques.
- All swarm intelligence algorithms are population based: an iterative procedure improves the positions of the individuals in the population, moving them toward better positions.
- They are efficient, adaptive, and robust search methods that produce near-optimal solutions and have a large amount of implicit parallelism.
- Data clustering can be formulated as a difficult global optimization problem, which makes the application of SI tools natural and appropriate.
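To make the optimization view concrete, the sketch below treats each particle of a particle swarm as a candidate set of k centroids and minimizes the sum of squared errors (SSE). It is an illustrative assumption, not a method given in the slides: the global-best PSO variant, the parameter values `w`, `c1`, `c2`, the swarm size, the fixed seed, and the names `sse` and `pso_cluster` are all choices made for this example.

```python
import random

def sse(position, points, k):
    """Objective: sum of squared distances from each point to its nearest
    centroid. `position` is a flat list holding k centroids of dimension d."""
    d = len(points[0])
    centroids = [position[j * d:(j + 1) * d] for j in range(k)]
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids)
        for p in points
    )

def pso_cluster(points, k, n_particles=20, iterations=100,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO over flattened centroid vectors (illustrative sketch)."""
    rng = random.Random(seed)
    dim = k * len(points[0])
    # Each particle starts as k randomly chosen data points, flattened.
    swarm = [[float(x) for p in rng.sample(points, k) for x in p]
             for _ in range(n_particles)]
    velocity = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [s[:] for s in swarm]                  # personal best positions
    best_fit = [sse(s, points, k) for s in swarm]     # personal best fitness
    g = min(range(n_particles), key=best_fit.__getitem__)
    gbest_pos, gbest_fit = best_pos[g][:], best_fit[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                velocity[i][j] = (w * velocity[i][j]
                                  + c1 * r1 * (best_pos[i][j] - swarm[i][j])
                                  + c2 * r2 * (gbest_pos[j] - swarm[i][j]))
                swarm[i][j] += velocity[i][j]
            fit = sse(swarm[i], points, k)
            if fit < best_fit[i]:
                best_fit[i], best_pos[i] = fit, swarm[i][:]
                if fit < gbest_fit:
                    gbest_fit, gbest_pos = fit, swarm[i][:]
    d = len(points[0])
    centroids = [tuple(gbest_pos[j * d:(j + 1) * d]) for j in range(k)]
    return centroids, gbest_fit

# Hypothetical toy data: two square groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, best_sse = pso_cluster(points, k=2)
print(centroids, best_sse)
```

Because the search is stochastic and population based, several candidate solutions are explored at once, and the best SSE found can only improve over the iterations.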
Improving the performance of existing classical clustering methods (e.g. k-means, k-medoid, fuzzy clustering)
• K-means clustering has several drawbacks, such as getting trapped in local minima and being sensitive to the initial cluster centers
• Improve the cluster quality with a refinement algorithm (ACO, PSO, Bee Swarm, Firefly Swarm)
• Determine the optimal number of clusters
• Determine the initial cluster centers
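The sensitivity of k-means to its initial centers can be seen on a very small example. The sketch below is an illustration under stated assumptions, not material from the slides: the function name `lloyd`, the four-point dataset, and both sets of starting centers are hypothetical. One start reaches the best partition, while the other stays in a poor local minimum.

```python
import math

def lloyd(points, centroids, iterations=100):
    """Run k-means iterations from the given initial centroids.
    Returns the final centroids and the sum of squared errors (SSE)."""
    k = len(centroids)
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Recompute each centroid as the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated
    sse = sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)
    return centroids, sse

# Hypothetical data: corners of a wide rectangle (width 4, height 1).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A: one center per short side, which gives the left/right split.
_, sse_good = lloyd(points, [(0, 0.5), (4, 0.5)])
# Start B: centers on the long sides; k-means stays with the top/bottom split.
_, sse_bad = lloyd(points, [(2, 0), (2, 1)])
print(sse_good, sse_bad)
```

Start B is already a fixed point of the algorithm, so no further iteration can escape it. This is the kind of situation in which an SI method might be used to choose or refine the initial centers.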
Creation of clustering algorithms based on SI algorithms
• Fish Swarm Clustering
• Cat Swarm Clustering
Thank You
