DATA CLUSTERING AND
OPTIMIZATION TECHNIQUES
Information Science & Informatics
Informatics and Neuroinformatics
Spyros Ktenas
Spyros Ktenas - http://open-works.org/profiles/spyros-ktenas
WHAT IS CLUSTERING
• Clustering is the process of partitioning a set of objects
into logical groups. Objects placed in the same group are
similar to one another, while objects in different groups
are dissimilar.
What counts as similar depends on the specific problem
and on the form of the "objects". In the literature,
clustering is also referred to as grouping and is treated as
a form of unsupervised learning. The objects themselves are
also called patterns or vectors.
CLUSTERING APPLICATIONS
• Fields: health, biology, construction, insurance, marketing, technology,
academia, networks
• Classification of species
• Housing planning
• Classification of customers
• Search engines grouping items into clusters
• Drug activity prediction
CLUSTERING ALGORITHMS
• Connectivity-based clustering (hierarchical clustering)
Items are more closely related to nearby items than to items farther away.
• Centroid-based clustering (k-means)
The algorithm starts by partitioning the items into k initial sets, either
randomly or using some heuristic. It then computes the centroid of each set
and builds a new partition by assigning each point to its nearest
centroid. The centroids are recomputed for the new groups, and the algorithm
repeats these two steps until no point changes group (that is, until the
centroids stop moving).
• Distribution-based clustering
• Density-based clustering (DBSCAN)
Images from Wikipedia
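The two alternating k-means steps described on this slide (assign each point to its nearest centroid, then recompute the centroids) can be sketched in Python as follows. This is a minimal illustration, not the implementation discussed later in the deck. It assumes squared Euclidean distance and initializes by picking k input points at random as centroids, which is one common variant of the random start; the function names and the `seed` and `max_iter` parameters are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest(point, centroids):
    """Index of the centroid closest to the point (lowest index wins ties)."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Assumption: initial centroids are k randomly chosen input points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # no point changed group, so the centroids are stable
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = centroid(members)
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 depends on the random start, so only the grouping itself is meaningful, not the label numbers.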
CLUSTERING ALGORITHMS OPTIMIZATION PAPERS
• Swarm Intelligence Algorithms for Data Clustering
Ajith Abraham, Swagatam Das, and Sandip Roy
Bio-inspired swarm intelligence (SI) algorithms have been applied successfully
to a number of real-world clustering problems. This chapter explores the role
of SI in clustering different kinds of datasets.
The proposed algorithm automatically computes the optimal number of
clusters in a dataset and thus requires minimal user intervention.
The authors report that it outperforms a state-of-the-art clustering strategy
based on genetic algorithms (GA) in both accuracy and speed.
• Initializing Partition-Optimization Algorithms
Ranjan Maitra
Partition-optimization approaches, such as the k-means and
expectation-maximization (EM) algorithms, are sub-optimal and find solutions in the
vicinity of their initialization. This paper proposes a staged approach to
specifying initial values: find a large number of local modes, then
choose representatives from the most widely separated ones.
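The statement that k-means finds solutions in the vicinity of its initialization can be seen on a toy example. The sketch below is not Maitra's staged method; it only illustrates the problem that method addresses. It runs the same k-means loop from two hand-picked starting centroids on four points at the corners of a 4 x 1 rectangle. The data, the function names, and the use of the within-cluster sum of squared errors (SSE) as the quality measure are assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centroids, max_iter=100):
    """Run k-means from the given starting centroids; return (centroids, SSE)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: group the points by nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda i: sq_dist(p, centroids[i]))
            clusters[j].append(p)
        # Update step: an empty cluster keeps its previous centroid.
        new = [tuple(sum(c) / len(m) for c in zip(*m)) if m else old
               for m, old in zip(clusters, centroids)]
        if new == centroids:
            break  # converged: the centroids did not move
        centroids = new
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start with one centroid on each short side: converges to left/right groups.
_, sse_good = lloyd(points, [(0, 0), (4, 0)])
# Start with both centroids on the same short side: converges to
# bottom/top groups, a stable but much worse local optimum.
_, sse_poor = lloyd(points, [(0, 0), (0, 1)])

print(sse_good)  # 1.0
print(sse_poor)  # 16.0
```

Both runs stop at a partition that no single reassignment step can improve, yet the second is sixteen times worse by SSE, which is why the choice of initial values matters.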
CLUSTERING ALGORITHMS OPTIMIZATION PAPERS
• Clusterpath: An Algorithm for Clustering using Convex Fusion Penalties
Toby Dylan Hocking, Armand Joulin, Francis Bach, Jean-Philippe Vert
The paper presents a convex relaxation of hierarchical clustering, which
results in a family of objective functions with a natural geometric
interpretation. Experimentally, the method gives state-of-the-art results
similar to spectral clustering for non-convex clusters, and has the added
benefit of learning a tree structure from the data.
CLUSTERING ALGORITHMS OPTIMIZATION PAPERS
• An Optimized Version of the K-Means Clustering Algorithm
Marian Poteras, Marian Cristian Mihaescu, Mihai Mocanu
The paper describes a version of the k-means algorithm optimized for
running time. The proposed implementation distinguishes data elements that
will not change cluster during the next iteration from those that might,
which significantly reduces the workload, especially for large datasets.
The prototype showed a reduction in running time of up to 70%.
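The idea of separating points that cannot change cluster from points that might can be illustrated with a simple distance bound. The sketch below is not the method of the paper above; it is a generic variant based on the triangle inequality, with hypothetical names throughout. For each point it stores the gap between the distance to its second-nearest centroid and the distance to its own centroid. If no centroid moves by more than `shift` in an update, that gap shrinks by at most `2 * shift`, so a point whose remaining gap is still positive is skipped. It assumes at least two centroids.

```python
import math


def assign_with_margin(point, centroids):
    """Return (nearest centroid index, second-nearest minus nearest distance)."""
    d = sorted((math.dist(point, c), j) for j, c in enumerate(centroids))
    return d[0][1], d[1][0] - d[0][0]


def kmeans_skipping(points, centroids, max_iter=100):
    """k-means that re-examines only points whose assignment could change.

    Returns (labels, centroids, checked), where checked counts how many
    times a point was compared against all centroids.
    """
    centroids = list(centroids)
    results = [assign_with_margin(p, centroids) for p in points]
    labels = [r[0] for r in results]
    margins = [r[1] for r in results]
    checked = len(points)
    for _ in range(max_iter):
        # Update step: an empty cluster keeps its previous centroid.
        new = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(c) / len(members) for c in zip(*members))
                       if members else old)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift == 0:
            break  # converged: no centroid moved
        for i, p in enumerate(points):
            margins[i] -= 2 * shift  # worst-case loss of the gap
            if margins[i] > 0:
                continue  # provably still closest to its own centroid
            labels[i], margins[i] = assign_with_margin(p, centroids)
            checked += 1
    return labels, centroids, checked


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids, checked = kmeans_skipping(points, [(0, 0), (10, 10)])
print(labels, checked)
```

On this well-separated data every point keeps a large gap, so after the initial assignment no point is compared against the centroids again.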
K-MEANS PHP IMPLEMENTATION
• Completed: first steps of a PHP k-means implementation
- UI
- Input validation
- Initialization
- Clustering for the first iteration
• Future work
- Implementation of further iterations
- Presentation of results
CONCLUSIONS
• Although data clustering is an old problem, it remains an active
field of scientific research. No algorithm has been found that can
group all real-world data efficiently and without error. Judging
the quality of a clustering requires a purpose-built statistical or
mathematical function, called a clustering validity index. The
literature shows, however, that most validity indices are designed
empirically and that no single index works well in every case.
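As a concrete example of a validity index, the sketch below computes the mean silhouette coefficient, one widely used empirical index; it is offered as an illustration, not as the universally good index that the slide says does not exist. For each point, a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster; the point's score is (b - a) / max(a, b). The function name and the convention of scoring singleton clusters as 0 are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient of a clustering, in the range [-1, 1]."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # Mean distance to the other members of the point's own cluster
        # (the point's distance to itself contributes 0 to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (4, 0), (4, 1)]
print(silhouette(points, [0, 0, 1, 1]))  # left/right grouping
print(silhouette(points, [0, 1, 0, 1]))  # bottom/top grouping
```

The left/right grouping scores above zero and the bottom/top grouping below zero, which matches intuition for this data, but like other empirical indices the silhouette can mislead on clusters that are not compact and well separated.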
THANK YOU
