cluster.pptx

Cluster analysis

lustering is a statistical classification approach for the
supervised learning. Cluster analysis or clustering is
the task of grouping a set of objects in such a way that
objects in the same group (called a cluster) are more
similar to each other than to those in other groups
(clusters).
 It is a main task of exploratory data mining, and a
common technique for statistical data analysis, used in
many fields including machine learning, pattern
recognition, image analysis and data compression.

Contin..
 Clustering can be achieved by various algorithms that
differ significantly in their understanding of what
constitutes a cluster and how to efficiently find them.
Popular notions of clusters include groups with small
distances between cluster members, dense areas of the
data space, intervals or particular statistical
distributions.

Applications of unsupervised
machine learning
 Some applications of unsupervised machine learning
techniques are:
 Clustering automatically split the dataset into groups base
on their similarities
 Anomaly detection can discover unusual data points in
your dataset. It is useful for finding fraudulent transactions
 Association mining identifies sets of items which often
occur together in your dataset
 Latent variable models are widely used for data
preprocessing. Like reducing the number of features in a
dataset or decomposing the dataset into multiple
components

Disadvantages of unsupervised
learning
 You cannot get precise information regarding data
sorting, and the output as data used in unsupervised
learning is labeled and not known
 Less accuracy of the results is because the input data is
not known and not labeled by people in advance. This
means that the machine requires to do this itself.
 The spectral classes do not always correspond to
informational classes.
 The user needs to spend time interpreting and label
the classes which follow that classification.

Cluster Models
 Connectivity models( Hierarchy clustering):
clustering builds models based on distance
connectivity.
 Centroid models: k-means algorithm represents each
cluster by a single mean vector.
 Distribution models: clusters are modeled using
statistical distributions, such as multivariate normal
distributions.
 Density models: DBSCAN and OPTICS defines clusters
as connected dense regions in the data space.

Conti..
 Graph-based models: a clique, that is, a subset of nodes
in a graph such that every two nodes in the subset are
connected by an edge can be considered as a
prototypical form of cluster. Relaxations of the
complete connectivity requirement (a fraction of the
edges can be missing) are known as quasi-cliques.
 Signed graph models: Every path in a signed graph has
a sign from the product of the signs on the edges. The
weaker “clusterability axiom” yields results with more
than two clusters, or subgraphs with only positive
edges.

Conti..
 Neural models: the most well known unsupervised neural
network is the self organizing map and these models can
usually be characterized as similar to one or more of the
above models, and including subspace models when neural
networks implement a form of Principal component
analysis or Independent component analysis.
 Subspace models: clusters are modeled with both cluster
members and relevant attributes.
 Group models: some algorithms do not provide a refined
model for their results and just provide the grouping
information.

cluster.pptx

More Related Content

Similar to cluster.pptx

More from KamranAli649587

Recently uploaded

cluster.pptx