Cluster Analysis
7/2/2019 Compiled by : Kamal Acharya 1
Cluster Analysis (clustering / automatic classification / data segmentation)
• Clustering is the process of grouping a set of data objects into multiple
groups or clusters so that objects within a cluster have high similarity,
but are very dissimilar to objects in other clusters.
• Dissimilarities and similarities are assessed based on the attribute
values describing the objects and often involve distance measures.
Contd..
• Clustering is known as unsupervised learning because the class
label information is not present. For this reason, clustering is a
form of learning by observation, rather than learning by
examples.
• Different clustering methods may generate different clusterings
on the same data set.
• The partitioning is not performed by humans, but by the
clustering algorithm.
Contd..
• Hence, Clustering is used:
– As a stand-alone tool to get insight into data distribution
• Visualization of clusters may unveil important information
– As a preprocessing step for other algorithms
• Efficient indexing or compression often relies on clustering
Some Applications of Clustering
• Cluster analysis has been widely used in numerous applications
such as:
– In business intelligence
– In image recognition
– In web search
– In outlier detection
– In biology
Contd..
• In Business intelligence:
– clustering can help marketers discover distinct groups in their
customer bases and characterize customer groups based on
purchasing patterns so that, for example, advertising can be
appropriately targeted.
Contd..
• In image recognition:
– In image recognition, clustering can be used to discover clusters or
“subclasses” in handwritten character recognition systems.
– For example: We can use clustering to determine subclasses for “1,” each
of which represents a variation on the way in which 1 can be written.
Contd..
• In web search
– document grouping: Clustering can be used to organize the
search results into groups and present the results in a concise
and easily accessible way.
– cluster Weblog data to discover groups of similar access
patterns.
Contd..
• In Outlier detection
– Clustering can also be used for outlier detection, where
outliers (values that are “far away” from any cluster) may be
more interesting than common cases.
– Applications of outlier detection include the detection of
credit card fraud and the monitoring of criminal activities in
electronic commerce.
Contd..
• In biology:
– In biology, it can be used to derive plant and animal
taxonomies, categorize genes with similar functionality, and
gain insight into structures inherent in populations.
What Is Good Clustering?
• A good clustering method will produce high quality clusters with
– high intra-class similarity
– low inter-class similarity
• The quality of a clustering result depends on both the similarity
measure used by the method and its implementation.
• The quality of a clustering method is also measured by its ability
to discover some or all of the hidden patterns.
Requirements for clustering as a data mining tool
• The following are typical requirements of clustering in data
mining.
– Scalability
– Ability to deal with different types of attributes
– Discovery of clusters with arbitrary shape
– Minimal requirements for domain knowledge to determine input parameters
– Ability to deal with noisy data
– Incremental clustering and insensitivity to input order
– Capability of clustering high-dimensional data
– Constraint-based clustering
– Interpretability and usability
Contd..
• Scalability:
– Many clustering algorithms work well on small data sets
containing fewer than several hundred data objects; however,
a large database may contain millions of objects.
– Clustering on a sample of a given large data set may lead to
biased results.
– Highly scalable clustering algorithms are needed.
Contd..
• Ability to deal with different types of attributes:
– Many algorithms are designed to cluster interval-based
(numerical) data.
– However, applications may require clustering other types of
data, such as binary, categorical (nominal), and ordinal data,
or mixtures of these data types.
Contd..
• Discovery of clusters with arbitrary shape:
– Many clustering algorithms determine clusters based on
Euclidean distance measures.
– Algorithms based on such distance measures tend to find
spherical clusters with similar size and density.
– However, a cluster could be of any shape.
– It is important to develop algorithms that can detect clusters
of arbitrary shape.
Contd..
• Minimal requirements for domain knowledge to
determine input parameters:
– Many clustering algorithms require users to input certain
parameters in cluster analysis (such as the number of desired
clusters).
– The clustering results can be quite sensitive to input parameters.
– Parameters are often difficult to determine, especially for data sets
containing high-dimensional objects.
– This not only burdens users, but it also makes the quality of
clustering difficult to control.
Contd..
• Ability to deal with noisy data:
– Most real-world databases contain outliers or missing,
unknown, or erroneous data.
– Some clustering algorithms are sensitive to such data and may
lead to clusters of poor quality.
Contd..
• Incremental clustering and insensitivity to the order of input
records:
– Some clustering algorithms cannot incorporate newly inserted
data (i.e., database updates) into existing clustering structures
and, instead, must determine a new clustering from scratch.
– Some clustering algorithms are sensitive to the order of input
data. That is, given a set of data objects, such an algorithm
may return dramatically different clusterings depending on
the order of presentation of the input objects.
– It is important to develop incremental clustering algorithms
and algorithms that are insensitive to the order of input.
Contd..
• High dimensionality:
– A database or a data warehouse can contain several
dimensions or attributes.
– Many clustering algorithms are good at handling low-
dimensional data, involving only two to three dimensions.
– Human eyes are good at judging the quality of clustering for
up to three dimensions.
– Finding clusters of data objects in high dimensional space is
challenging, especially considering that such data can be
sparse and highly skewed.
Contd..
• Constraint-based clustering:
– Real-world applications may need to perform clustering under
various kinds of constraints.
– Suppose that your job is to choose the locations for a given
number of new automatic banking machines (ATMs) in a city.
– To decide upon this, you may cluster households while
considering constraints such as the city’s rivers and highway
networks, and the type and number of customers per cluster.
– A challenging task is to find groups of data with good
clustering behavior that satisfy specified constraints.
Contd..
• Interpretability and usability:
– Users expect clustering results to be interpretable,
comprehensible, and usable.
– That is, clustering may need to be tied to specific semantic
interpretations and applications.
– It is important to study how an application goal may
influence the selection of clustering features and methods.
Aspects of clustering
• A clustering algorithm/method
– Partitional clustering
– Hierarchical clustering
– …
• A distance (similarity, or dissimilarity) function
• Clustering quality
– Inter-cluster distance → maximized
– Intra-cluster distance → minimized
• The quality of a clustering result depends on the
algorithm, the distance function, and the application.
Major Clustering Methods:
• In general, the major fundamental clustering methods can be
classified into the following categories:
– Partitioning Methods
– Hierarchical Methods
– Density-Based Methods
– Grid-Based Methods
Contd..
• Partitioning Methods:
– A partitioning method constructs k partitions of the data, where each
partition represents a cluster and k <= n. That is, it classifies the data
into k groups, which together satisfy the following requirements:
• Each group must contain at least one object, and
• Each object must belong to exactly one group.
– A partitioning method creates an initial partitioning. It then uses an
iterative relocation technique that attempts to improve the
partitioning by moving objects from one group to another.
– The general criterion of a good partitioning is that objects in the
same cluster are close or related to each other, whereas objects of
different clusters are far apart or very different.
Contd..
• E.g., K-means, and K-medoids
Contd..
• Hierarchical Methods:
– A hierarchical method creates a hierarchical decomposition of
the given set of data objects.
– A hierarchical method can be classified as being either
agglomerative or divisive, based on how the hierarchical
decomposition is formed.
Contd..
• The agglomerative approach, also called the bottom-up approach, starts
with each object forming a separate group. It successively merges the
objects or groups that are close to one another, until all of the groups
are merged into one or until a termination condition holds.
• The divisive approach, also called the top-down approach, starts with
all of the objects in the same cluster. In each successive iteration, a
cluster is split up into smaller clusters, until eventually each object is in
one cluster, or until a termination condition holds.
Contd..
• Density-based methods:
– The general idea is to continue growing a given cluster as long
as the density in the neighborhood exceeds some threshold;
that is, for each data point within a given cluster, the
neighborhood of a given radius has to contain at least a
minimum number of points.
– Such a method can be used to filter out noise (outliers) and
discover clusters of arbitrary shape.
– E.g., DBSCAN
Contd..
• Grid-based methods:
– Grid-based methods quantize the object space into a finite
number of cells that form a grid structure.
– All the clustering operations are performed on the grid
structure.
– E.g., STING
Partitioning Methods
• Given a data set, D, of n objects, and k, the number of clusters to
form, a partitioning algorithm organizes the objects into k
partitions (k<=n), where each partition represents a cluster.
k-Means: A Centroid-Based Technique
• A centroid-based partitioning technique uses the centroid of a cluster,
Ci , to represent that cluster.
• The centroid of a cluster is its center point such as the mean or medoid
of the objects (or points) assigned to the cluster.
• The difference between an object p and ci, the representative of
the cluster, is measured by dist(p, ci),
• where dist(x, y) is the Euclidean distance between two points x and y.
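A common way to score such a partitioning (consistent with the centroid-based formulation above) is the within-cluster variation, the sum of squared errors between every object and the centroid of its cluster:

```latex
E = \sum_{i=1}^{k} \; \sum_{p \in C_i} \mathrm{dist}(p, c_i)^2
```

A smaller E means the objects sit closer to their cluster centers; k-means can be viewed as a heuristic that iteratively reduces E.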
Contd..
• The k-means algorithm defines the centroid of a cluster as the
mean value of the points within the cluster. It proceeds as
follows:
– First, it randomly selects k of the objects in D, each of which initially
represents a cluster mean or center.
– For each of the remaining objects, an object is assigned to the cluster to
which it is the most similar, based on the Euclidean distance between the
object and the cluster mean.
– The k-means algorithm then iteratively improves the within-cluster
variation. For each cluster, it computes the new mean using the objects
assigned to the cluster in the previous iteration. All the objects are then
reassigned using the updated means as the new cluster centers.
– The iterations continue until the assignment is stable, that is, the clusters
formed in the current round are the same as those formed in the previous
round.
Contd..
• Algorithm:
– The k-means algorithm for partitioning, where each cluster’s center is
represented by the mean value of the objects in the cluster.
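The steps above can be sketched in a short Python program (a minimal illustration, not the textbook pseudocode; taking the first two objects as the arbitrary initial centers is an assumption):

```python
import math

def kmeans(points, centers, max_iter=100):
    """Plain k-means following the steps above: assign every object to
    the nearest center, recompute each center as the mean of its
    cluster, and repeat until the assignment is stable."""
    assignment = None
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)),
                    key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        if clusters == assignment:  # same clusters as the previous round
            break
        assignment = clusters
        centers = [tuple(sum(xs) / len(xs) for xs in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters, centers

# The data of Example 1 below, with K = 2 and the first two objects
# chosen (arbitrarily) as the initial centers:
pts = [(1, 1.5), (1, 4.5), (2, 1.5), (2, 3.5), (3, 2.5), (3, 4)]
clusters, centers = kmeans(pts, centers=[pts[0], pts[1]])
```

With this initialization the algorithm stabilizes after two rounds; a different initial choice of centers can give a different final clustering, which is why k-means is usually run several times.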
The K-Means Clustering Method
• Example
(Figure: K = 2. Arbitrarily choose K objects as initial cluster
centers; assign each object to the most similar center; update the
cluster means; reassign; update the cluster means again; repeat until
the assignment is stable.)
Contd..
• Example 1: Cluster the following instances of the given data (2-
dimensional form) with the help of the k-means algorithm (take K
= 2)
Instance X Y
1 1 1.5
2 1 4.5
3 2 1.5
4 2 3.5
5 3 2.5
6 3 4
Contd..
• Example 2: Cluster the following instances of the given data (2-
dimensional form) with the help of the k-means algorithm (take K
= 2)
Instance X Y
1 1 2.5
2 1 4.5
3 2.5 3
4 2 1.5
5 4.5 1.5
6 4 5
Hierarchical clustering
• A hierarchical clustering method works by grouping data objects
into a hierarchy or “tree” of clusters.
• Representing data objects in the form of a hierarchy is useful for
data summarization and visualization.
Contd..
• Depending on whether the hierarchical decomposition is formed
in a bottom-up (merging) or top-down (splitting) fashion a
hierarchical clustering method can be classified into two
categories:
– Agglomerative Hierarchical Clustering and
– Divisive Hierarchical Clustering
Contd..
• Agglomerative Hierarchical Clustering:
– uses a bottom-up strategy.
– starts by letting each object form its own cluster and
iteratively merges clusters into larger and larger clusters, until
all the objects are in a single cluster or certain termination
conditions (e.g., a desired number of clusters) are satisfied.
– For the merging step, it finds the two clusters that are closest
to each other (according to some similarity measure), and
combines the two to form one cluster.
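The merging loop can be sketched as follows (a minimal illustration; single link, i.e. the distance between the closest pair of members, is just one common choice of similarity measure and is an assumption here):

```python
import math

def agglomerative(points, k):
    """Bottom-up (AGNES-style) clustering: start with one cluster per
    object and repeatedly merge the two closest clusters (single-link
    distance) until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(p, q)
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return clusters

# Two tight pairs and one isolated point; with k = 3 the pairs merge
# and the isolated point stays alone.
groups = agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)], k=3)
```

Stopping at k clusters is the "desired number of clusters" termination condition mentioned above; letting the loop run to a single cluster would produce the full hierarchy.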
Contd..
• Example: a data set of five objects, {a, b, c, d, e}. Initially, AGNES
(AGglomerative NESting), the agglomerative method, places each object into
a cluster of its own. The clusters are then merged step-by-step according to
some criterion (e.g., minimum Euclidean distance).
Contd..
• Divisive hierarchical clustering :
– A divisive hierarchical clustering method employs a top-down
strategy.
– It starts by placing all objects in one cluster, which is the
hierarchy’s root.
– It then divides the root cluster into several smaller sub-clusters, and
recursively partitions those clusters into smaller ones.
– The partitioning process continues until each cluster at the lowest
level either contains only one object, or the objects within it are
sufficiently similar to each other.
Contd..
• Example: DIANA (DIvisive ANAlysis), a divisive hierarchical clustering
method:
– a data set of five objects, {a, b, c, d, e}. All the objects are used to form
one initial cluster. The cluster is split according to some principle such as
the maximum Euclidean distance between the closest neighboring objects
in the cluster. The cluster-splitting process repeats until, eventually, each
new cluster contains only a single object.
Contd..
• agglomerative versus divisive hierarchical clustering:
– Organize objects into a hierarchy using a bottom-up or top-
down strategy, respectively.
– Agglomerative methods start with individual objects as
clusters, which are iteratively merged to form larger clusters.
– Conversely, divisive methods initially let all the given objects
form one cluster, which they iteratively split into smaller
clusters.
Contd..
• Hierarchical clustering methods can encounter difficulties regarding
the selection of merge or split points.
– Such a decision is critical, because once a group of objects is
merged or split, the process at the next step will operate on the
newly generated clusters. It will neither undo what was done
previously, nor perform object swapping between clusters.
– Thus, merge or split decisions, if not well chosen, may lead to low-
quality clusters.
• Moreover, the methods do not scale well because each decision of
merge or split needs to examine and evaluate many objects or clusters.
Density Based Methods
• Partitioning methods and hierarchical clustering are suitable for finding
spherical-shaped clusters.
• Moreover, they are also severely affected by the presence of noise and
outliers in the data.
• Unfortunately, real life data contain:
– Clusters of arbitrary shape such as oval, linear, s-shaped, etc.
– A lot of noise
• Solution : Density based methods
Contd..
• Basic Idea behind Density based methods:
– Model clusters as dense regions in the data space, separated by sparse
regions.
• Major features:
– Discover clusters of arbitrary shape(e.g., oval, s-shaped, etc)
– Handle noise
– Need density parameters as termination condition
• E.g., : DBSCAN(Density Based Spatial Clustering of Applications with Noise)
Density-Based Clustering: Background
• Neighborhood of point p=all points within distance e from p:
– NEps(p)={q | dist(p,q) <= e }
• Two parameters:
– e : Maximum radius of the neighbourhood
– MinPts: Minimum number of points in an e -neighbourhood of that point
• If the number of points in the e -neighborhood of p is at least
MinPts, then p is called a core object.
Contd..
• Directly density-reachable: A point p is directly density-
reachable from a point q wrt. e, MinPts if
– 1) p belongs to NEps(q)
– 2) core point condition: |NEps (q)| >= MinPts
Contd..
• Density-reachable:
– A point p is density-reachable from a point q wrt. Eps, MinPts if there is a
chain of points p1, …, pn, with p1 = q and pn = p, such that pi+1 is directly
density-reachable from pi.
Contd..
• Density-connected:
– A point p is density-connected to a point q wrt. Eps, MinPts if there is a
point o such that both, p and q are density-reachable from o wrt. Eps and
MinPts.
Contd..
• Density = number of points within a specified radius (Eps).
• A point is a core point if it has at least a specified number of
points (MinPts) within Eps.
– These are points in the interior of a cluster.
– The count includes the point itself.
• A border point is not a core point, but is in the neighborhood of a
core point.
• A noise point is any point that is neither a core point nor a border
point.
• E.g.: MinPts = 7
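The core/border/noise distinction above can be sketched as a short classifier (a minimal illustration; following the slide's convention, the Eps-neighborhood count includes the point itself, and the sample points and parameters below are made up for demonstration):

```python
import math

def classify(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' per the
    definitions above. The eps-neighborhood includes the point itself."""
    neigh = {p: [q for q in points if math.dist(p, q) <= eps]
             for p in points}
    core = {p for p in points if len(neigh[p]) >= min_pts}
    labels = {}
    for p in points:
        if p in core:
            labels[p] = 'core'                       # dense interior point
        elif any(q in core for q in neigh[p]):
            labels[p] = 'border'                     # near a core point
        else:
            labels[p] = 'noise'                      # neither
    return labels

# A dense square of five points, one point hanging off its edge, and
# one far-away point:
labels = classify([(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
                   (2, 1), (5, 5)], eps=1.5, min_pts=5)
```

Here the square's points are core, (2, 1) is a border point (too sparse itself but within Eps of a core point), and (5, 5) is noise.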
DBSCAN(Density Based Spatial Clustering of Applications with Noise)
• DBSCAN grows each cluster from an arbitrary unvisited core object
by collecting all objects that are density-reachable from it.
• To find the next cluster, DBSCAN randomly selects an unvisited object
from the remaining ones. The clustering process continues until all
objects are visited.
Contd..
• Example:
– If Epsilon is 2 and MinPts is 2, what are the clusters that DBSCAN would
discover from the following 8 examples: A1=(2,10), A2=(2,5), A3=(8,4),
A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9)?
• Solution :
– d(a,b) denotes the Euclidean distance between a and b. It is obtained
directly from the distance matrix, calculated as follows:
– d(a,b) = sqrt((xb − xa)² + (yb − ya)²)
Contd..
      A1   A2   A3   A4   A5   A6   A7   A8
A1    0   √25  √72  √13  √50  √52  √65  √5
A2         0   √37  √18  √25  √17  √10  √20
A3              0   √25  √2   √4   √53  √41
A4                   0   √13  √17  √52  √2
A5                        0   √2   √45  √25
A6                             0   √29  √29
A7                                  0   √58
A8                                       0
Contd..
• N2(A1)={};
• N2(A2)={};
• N2(A3)={A5, A6};
• N2(A4)={A8};
• N2(A5)={A3, A6};
• N2(A6)={A3, A5};
• N2(A7)={};
• N2(A8)={A4};
• So A1, A2, and A7 are outliers, while we have two clusters C1={A4,
A8} and C2={A3, A5, A6}
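The worked example can be checked with a short script (a sketch, not a full DBSCAN implementation: with MinPts = 2, every non-noise point here is a core point, so simply merging overlapping neighborhoods suffices):

```python
import math

# The eight points of the example, with Eps = 2. Following the slide,
# N2(p) contains the *other* points within distance 2 of p.
pts = {'A1': (2, 10), 'A2': (2, 5), 'A3': (8, 4), 'A4': (5, 8),
       'A5': (7, 5), 'A6': (6, 4), 'A7': (1, 2), 'A8': (4, 9)}
eps = 2

neighborhoods = {p: {q for q in pts if q != p
                     and math.dist(pts[p], pts[q]) <= eps}
                 for p in pts}
# With MinPts = 2 (counting the point itself), points whose N2 is
# empty are noise (outliers).
outliers = {p for p in pts if not neighborhoods[p]}

# Grow clusters by merging overlapping neighborhoods of the remaining
# (all core) points.
clusters = []
for p in sorted(set(pts) - outliers):
    group = {p} | neighborhoods[p]
    for c in clusters:
        if c & group:
            c |= group
            break
    else:
        clusters.append(group)
```

Running this reproduces the solution above: A1, A2, and A7 come out as outliers, and the two clusters are {A4, A8} and {A3, A5, A6}.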
Contd..
Advantages and Disadvantages of DBSCAN algorithm:
• Advantages:
– DBSCAN does not require one to specify the number of clusters in the
data a priori, as opposed to k-means.
– DBSCAN can find arbitrarily shaped clusters
– DBSCAN is robust to outliers.
– DBSCAN is mostly insensitive to the ordering of the points in the
database.
– The parameters minPts and ε can be set by a domain expert, if the data is
well understood.
Contd..
• Disadvantages:
– DBSCAN is not entirely deterministic: border points that are reachable
from more than one cluster can end up in either cluster, depending on the
order in which the data are processed. Fortunately, this situation does not
arise often and has little impact on the clustering result: on core points
and noise points, DBSCAN is deterministic.
– DBSCAN cannot cluster data sets well with large differences in densities,
since the minPts-ε combination cannot then be chosen appropriately for all
clusters.
– If the data and scale are not well understood, choosing a meaningful
distance threshold ε can be difficult.
Homework
• Explain the aims of cluster analysis.
• What is clustering? How is it different from supervised classification?
In what situations can clustering be useful?
• List and explain desired features of cluster analysis.
• Explain the different types of cluster analysis methods and discuss their
features.
• Describe the k-means algorithm and write its strengths and
weaknesses.
• Describe the features of Hierarchical clustering methods? In what
situations are these methods useful?
Thank You !
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)
 
IRJET- Optimal Number of Cluster Identification using Robust K-Means for ...
IRJET-  	  Optimal Number of Cluster Identification using Robust K-Means for ...IRJET-  	  Optimal Number of Cluster Identification using Robust K-Means for ...
IRJET- Optimal Number of Cluster Identification using Robust K-Means for ...
 
Introduction to Data Mining and Data Warehousing
Introduction to Data Mining and Data WarehousingIntroduction to Data Mining and Data Warehousing
Introduction to Data Mining and Data Warehousing
 
Av24317320
Av24317320Av24317320
Av24317320
 
From data mining to knowledge discovery in
From data mining to knowledge discovery inFrom data mining to knowledge discovery in
From data mining to knowledge discovery in
 

More from Kamal Acharya

Programming the basic computer
Programming the basic computerProgramming the basic computer
Programming the basic computer
Kamal Acharya
 
Computer Arithmetic
Computer ArithmeticComputer Arithmetic
Computer Arithmetic
Kamal Acharya
 
Introduction to Computer Security
Introduction to Computer SecurityIntroduction to Computer Security
Introduction to Computer Security
Kamal Acharya
 
Session and Cookies
Session and CookiesSession and Cookies
Session and Cookies
Kamal Acharya
 
Functions in php
Functions in phpFunctions in php
Functions in php
Kamal Acharya
 
Web forms in php
Web forms in phpWeb forms in php
Web forms in php
Kamal Acharya
 
Making decision and repeating in PHP
Making decision and repeating  in PHPMaking decision and repeating  in PHP
Making decision and repeating in PHP
Kamal Acharya
 
Working with arrays in php
Working with arrays in phpWorking with arrays in php
Working with arrays in php
Kamal Acharya
 
Text and Numbers (Data Types)in PHP
Text and Numbers (Data Types)in PHPText and Numbers (Data Types)in PHP
Text and Numbers (Data Types)in PHP
Kamal Acharya
 
Introduction to PHP
Introduction to PHPIntroduction to PHP
Introduction to PHP
Kamal Acharya
 
Capacity Planning of Data Warehousing
Capacity Planning of Data WarehousingCapacity Planning of Data Warehousing
Capacity Planning of Data Warehousing
Kamal Acharya
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
Kamal Acharya
 
Web Mining
Web MiningWeb Mining
Web Mining
Kamal Acharya
 
Information Privacy and Data Mining
Information Privacy and Data MiningInformation Privacy and Data Mining
Information Privacy and Data Mining
Kamal Acharya
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
Kamal Acharya
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
Kamal Acharya
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
Kamal Acharya
 
Functions in Python
Functions in PythonFunctions in Python
Functions in Python
Kamal Acharya
 
Python Flow Control
Python Flow ControlPython Flow Control
Python Flow Control
Kamal Acharya
 
Fundamentals of Python Programming
Fundamentals of Python ProgrammingFundamentals of Python Programming
Fundamentals of Python Programming
Kamal Acharya
 

More from Kamal Acharya (20)

Programming the basic computer
Programming the basic computerProgramming the basic computer
Programming the basic computer
 
Computer Arithmetic
Computer ArithmeticComputer Arithmetic
Computer Arithmetic
 
Introduction to Computer Security
Introduction to Computer SecurityIntroduction to Computer Security
Introduction to Computer Security
 
Session and Cookies
Session and CookiesSession and Cookies
Session and Cookies
 
Functions in php
Functions in phpFunctions in php
Functions in php
 
Web forms in php
Web forms in phpWeb forms in php
Web forms in php
 
Making decision and repeating in PHP
Making decision and repeating  in PHPMaking decision and repeating  in PHP
Making decision and repeating in PHP
 
Working with arrays in php
Working with arrays in phpWorking with arrays in php
Working with arrays in php
 
Text and Numbers (Data Types)in PHP
Text and Numbers (Data Types)in PHPText and Numbers (Data Types)in PHP
Text and Numbers (Data Types)in PHP
 
Introduction to PHP
Introduction to PHPIntroduction to PHP
Introduction to PHP
 
Capacity Planning of Data Warehousing
Capacity Planning of Data WarehousingCapacity Planning of Data Warehousing
Capacity Planning of Data Warehousing
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Web Mining
Web MiningWeb Mining
Web Mining
 
Information Privacy and Data Mining
Information Privacy and Data MiningInformation Privacy and Data Mining
Information Privacy and Data Mining
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Functions in Python
Functions in PythonFunctions in Python
Functions in Python
 
Python Flow Control
Python Flow ControlPython Flow Control
Python Flow Control
 
Fundamentals of Python Programming
Fundamentals of Python ProgrammingFundamentals of Python Programming
Fundamentals of Python Programming
 

Recently uploaded

STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
kimdan468
 
The Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptxThe Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptx
DhatriParmar
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
Krisztián Száraz
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Dr. Vinod Kumar Kanvaria
 
JEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questionsJEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questions
ShivajiThube2
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 

Recently uploaded (20)

STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
 
The Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptxThe Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptx
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 
JEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questionsJEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questions
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 

Cluster Analysis

  • 1. Cluster Analysis 7/2/2019 Compiled by : Kamal Acharya 1
  • 2. Cluster Analysis (clustering / automatic classification / data segmentation) • Clustering is the process of grouping a set of data objects into multiple groups or clusters so that objects within a cluster have high similarity, but are very dissimilar to objects in other clusters. • Dissimilarities and similarities are assessed based on the attribute values describing the objects and often involve distance measures.
  • 3. Contd.. • Clustering is known as unsupervised learning because the class label information is not present. For this reason, clustering is a form of learning by observation, rather than learning by examples. • Different clustering methods may generate different clusterings on the same data set. • The partitioning is not performed by humans, but by the clustering algorithm.
  • 4. Contd.. • Hence, clustering is used: – As a stand-alone tool to get insight into data distribution • Visualization of clusters may unveil important information – As a preprocessing step for other algorithms • Efficient indexing or compression often relies on clustering
  • 5. Some Applications of Clustering • Cluster analysis has been widely used in numerous applications such as: – In business intelligence – In image recognition – In web search – In outlier detection – In biology
  • 6. Contd.. • In business intelligence: – Clustering can help marketers discover distinct groups in their customer bases and characterize customer groups based on purchasing patterns so that, for example, advertising can be appropriately targeted.
  • 7. Contd.. • In image recognition: – Clustering can be used to discover clusters or “subclasses” in handwritten character recognition systems. – For example: we can use clustering to determine subclasses for “1,” each of which represents a variation on the way in which 1 can be written.
  • 8. Contd.. • In web search: – Document grouping: clustering can be used to organize the search results into groups and present the results in a concise and easily accessible way. – Clustering Weblog data to discover groups of similar access patterns.
  • 9. Contd.. • In outlier detection: – Clustering can also be used for outlier detection, where outliers (values that are “far away” from any cluster) may be more interesting than common cases. – Applications of outlier detection include the detection of credit card fraud and the monitoring of criminal activities in electronic commerce.
  • 10. Contd.. • In biology: – Clustering can be used to derive plant and animal taxonomies, categorize genes with similar functionality, and gain insight into structures inherent in populations.
  • 11. What Is Good Clustering? • A good clustering method will produce high-quality clusters with – high intra-class similarity – low inter-class similarity • The quality of a clustering result depends on both the similarity measure used by the method and its implementation. • The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.
  • 12. Requirements for clustering as a data mining tool • The following are typical requirements of clustering in data mining: – Scalability – Ability to deal with different types of attributes – Discovery of clusters with arbitrary shape – Minimal requirements for domain knowledge to determine input parameters – Ability to deal with noisy data – Incremental clustering and insensitivity to input order – Capability of clustering high-dimensional data – Constraint-based clustering – Interpretability and usability
  • 13. Contd.. • Scalability: – Many clustering algorithms work well on small data sets containing fewer than several hundred data objects; however, a large database may contain millions of objects. – Clustering on a sample of a given large data set may lead to biased results. – Highly scalable clustering algorithms are needed.
  • 14. Contd.. • Ability to deal with different types of attributes: – Many algorithms are designed to cluster interval-based (numerical) data. – However, applications may require clustering other types of data, such as binary, categorical (nominal), and ordinal data, or mixtures of these data types.
  • 15. Contd.. • Discovery of clusters with arbitrary shape: – Many clustering algorithms determine clusters based on Euclidean distance measures. – Algorithms based on such distance measures tend to find spherical clusters with similar size and density. – However, a cluster could be of any shape. – It is important to develop algorithms that can detect clusters of arbitrary shape.
  • 16. Contd.. • Minimal requirements for domain knowledge to determine input parameters: – Many clustering algorithms require users to input certain parameters in cluster analysis (such as the number of desired clusters). – The clustering results can be quite sensitive to input parameters. – Parameters are often difficult to determine, especially for data sets containing high-dimensional objects. – This not only burdens users, but it also makes the quality of clustering difficult to control.
  • 17. Contd.. • Ability to deal with noisy data: – Most real-world databases contain outliers or missing, unknown, or erroneous data. – Some clustering algorithms are sensitive to such data and may lead to clusters of poor quality.
  • 18. Contd.. • Incremental clustering and insensitivity to the order of input records: – Some clustering algorithms cannot incorporate newly inserted data (i.e., database updates) into existing clustering structures and, instead, must determine a new clustering from scratch. – Some clustering algorithms are sensitive to the order of input data. That is, given a set of data objects, such an algorithm may return dramatically different clusterings depending on the order of presentation of the input objects. – It is important to develop incremental clustering algorithms and algorithms that are insensitive to the order of input.
  • 19. Contd.. • High dimensionality: – A database or a data warehouse can contain several dimensions or attributes. – Many clustering algorithms are good at handling low-dimensional data, involving only two to three dimensions. – Human eyes are good at judging the quality of clustering for up to three dimensions. – Finding clusters of data objects in high-dimensional space is challenging, especially considering that such data can be sparse and highly skewed.
  • 20. Contd.. • Constraint-based clustering: – Real-world applications may need to perform clustering under various kinds of constraints. – Suppose that your job is to choose the locations for a given number of new automatic banking machines (ATMs) in a city. – To decide upon this, you may cluster households while considering constraints such as the city’s rivers and highway networks, and the type and number of customers per cluster. – A challenging task is to find groups of data with good clustering behavior that satisfy specified constraints.
  • 21. Contd.. • Interpretability and usability: – Users expect clustering results to be interpretable, comprehensible, and usable. – That is, clustering may need to be tied to specific semantic interpretations and applications. – It is important to study how an application goal may influence the selection of clustering features and methods.
  • 22. Aspects of clustering • A clustering algorithm/method – Partitional clustering – Hierarchical clustering – … • A distance (similarity, or dissimilarity) function • Clustering quality – Inter-cluster distance → maximized – Intra-cluster distance → minimized • The quality of a clustering result depends on the algorithm, the distance function, and the application.
  • 23. Major Clustering Methods: • In general, the major fundamental clustering methods can be classified into the following categories: – Partitioning Methods – Hierarchical Methods – Density-Based Methods – Grid-Based Methods
  • 24. Contd.. • Partitioning Methods: – A partitioning method constructs k partitions of the data, where each partition represents a cluster and k <= n. That is, it classifies the data into k groups, which together satisfy the following requirements: • Each group must contain at least one object, and • Each object must belong to exactly one group. – A partitioning method creates an initial partitioning. It then uses an iterative relocation technique that attempts to improve the partitioning by moving objects from one group to another. – The general criterion of a good partitioning is that objects in the same cluster are close or related to each other, whereas objects of different clusters are far apart or very different.
  • 25. Contd.. • E.g., k-means and k-medoids
  • 26. Contd.. • Hierarchical Methods: – A hierarchical method creates a hierarchical decomposition of the given set of data objects. – A hierarchical method can be classified as being either agglomerative or divisive, based on how the hierarchical decomposition is formed.
  • 27. Contd.. • E.g., (figure not captured in the transcript)
  • 28. Contd.. • The agglomerative approach, also called the bottom-up approach, starts with each object forming a separate group. It successively merges the objects or groups that are close to one another, until all of the groups are merged into one or until a termination condition holds. • The divisive approach, also called the top-down approach, starts with all of the objects in the same cluster. In each successive iteration, a cluster is split up into smaller clusters, until eventually each object is in one cluster, or until a termination condition holds.
  • 29. Contd.. • Density-based methods: – The general idea is to continue growing the given cluster as long as the density in the neighborhood exceeds some threshold; that is, for each data point within a given cluster, the neighborhood of a given radius has to contain at least a minimum number of points. – Such a method can be used to filter out noise (outliers) and discover clusters of arbitrary shape. – E.g., DBSCAN
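The density-based idea on this slide can be sketched as a short, self-contained Python routine. This is a minimal DBSCAN-style implementation for illustration only (the point coordinates, eps, and min_pts values are made up, and real libraries use spatial indexes rather than this brute-force neighborhood search):

```python
import math

def region_query(points, i, eps):
    # Indices of all points within distance eps of points[i] (including itself).
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:      # not dense enough: mark as noise
            labels[i] = NOISE
            continue
        labels[i] = cluster               # i is a core point: grow a new cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:        # border point reached from a core point
                labels[j] = cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:   # j is also a core point: keep growing
                seeds.extend(j_neighbors)
        cluster += 1
    return labels

pts = [(1.0, 1.0), (1.2, 1.0), (1.0, 1.2), (1.1, 1.1),
       (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.1, 5.1),
       (10.0, 10.0)]
print(dbscan(pts, eps=0.5, min_pts=3))  # two dense clusters; (10.0, 10.0) is noise (-1)
```

Note how the isolated point is filtered out as noise rather than forced into a cluster, which is the property the slide highlights.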
  • 30. Contd.. • E.g., (figure not captured in the transcript)
  • 31. Contd.. • Grid-based methods: – Grid-based methods quantize the object space into a finite number of cells that form a grid structure. – All the clustering operations are performed on the grid structure. – E.g., STING
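The quantization step that grid-based methods start from can be illustrated in a few lines of Python. This shows only the cell structure, not the full STING algorithm (the cell size and points are invented for the example):

```python
from collections import defaultdict

def grid_cells(points, cell_size):
    # Quantize the 2-D object space into square cells; a grid-based method
    # would then run its clustering operations on these cells, not on the
    # individual points.
    cells = defaultdict(list)
    for (x, y) in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return dict(cells)

pts = [(0.2, 0.3), (0.8, 0.1), (5.1, 5.7)]
print(grid_cells(pts, 1.0))  # two nearby points share a cell; the far one gets its own
```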
  • 32. Contd.. (figure not captured in the transcript)
  • 33. Partitioning Methods • Given a data set, D, of n objects, and k, the number of clusters to form, a partitioning algorithm organizes the objects into k partitions (k <= n), where each partition represents a cluster.
  • 34. k-Means: A Centroid-Based Technique • A centroid-based partitioning technique uses the centroid of a cluster, Ci, to represent that cluster. • The centroid of a cluster is its center point, such as the mean or medoid of the objects (or points) assigned to the cluster. • The difference between an object p and ci, the representative of the cluster, is measured by dist(p, ci), where dist(x, y) is the Euclidean distance between two points x and y.
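The distance measure referred to here is the ordinary Euclidean distance. As a quick illustration (the object and centroid coordinates are made up):

```python
import math

def dist(p, q):
    # Euclidean distance between two points given as coordinate tuples.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Distance between an object p = (2, 3) and a cluster representative ci = (5, 7):
print(dist((2, 3), (5, 7)))  # 5.0
```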
  • 35. Contd.. • The k-means algorithm defines the centroid of a cluster as the mean value of the points within the cluster. It proceeds as follows: – First, it randomly selects k of the objects in D, each of which initially represents a cluster mean or center. – For each of the remaining objects, an object is assigned to the cluster to which it is the most similar, based on the Euclidean distance between the object and the cluster mean. – The k-means algorithm then iteratively improves the within-cluster variation. For each cluster, it computes the new mean using the objects assigned to the cluster in the previous iteration. All the objects are then reassigned using the updated means as the new cluster centers. – The iterations continue until the assignment is stable, that is, the clusters formed in the current round are the same as those formed in the previous round.
  • 36. Contd.. • Algorithm: – The k-means algorithm for partitioning, where each cluster’s center is represented by the mean value of the objects in the cluster.
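The four steps described on the previous slide (random initialization, nearest-mean assignment, mean update, stop when the means are stable) can be sketched in Python. This is an illustrative implementation, not the exact algorithm from any particular library; the random seed and iteration cap are assumptions added for reproducibility:

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: randomly select k of the objects as the initial cluster means.
    means = rng.sample(points, k)
    for _ in range(max_iters):
        # Step 2: assign each object to the cluster with the nearest mean
        # (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, means[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each mean from the objects assigned to it.
        new_means = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else means[i]
            for i, cl in enumerate(clusters)
        ]
        # Step 4: stop once the means (and hence the assignment) are stable.
        if new_means == means:
            break
        means = new_means
    return means, clusters
```

Because the initialization is random, different seeds can converge to different local optima, which is one reason the slides list "minimal requirements for domain knowledge" (choosing k) among the challenges.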
  • 37. The K-Means Clustering Method • Example (figure): with K = 2, arbitrarily choose K objects as initial cluster centers; assign each object to the most similar center; update the cluster means; reassign and repeat until the assignment is stable.
  • 38. Contd.. • Example 1: Cluster the following instances of 2-dimensional data with the help of the k-means algorithm (take K = 2):
      Instance   X     Y
      1          1     1.5
      2          1     4.5
      3          2     1.5
      4          2     3.5
      5          3     2.5
      6          3     4
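Example 1 can be worked through in a short script. The slide does not fix the initialization, so this sketch assumes (as classroom solutions often do) that the first two instances are taken as the initial cluster centers:

```python
import math

# Example 1 data: (x, y) for instances 1..6
points = [(1, 1.5), (1, 4.5), (2, 1.5), (2, 3.5), (3, 2.5), (3, 4)]
means = [points[0], points[1]]  # assumed initialization: the first two instances

while True:
    # Assign each instance to the nearest current mean.
    clusters = [[], []]
    for p in points:
        i = min((0, 1), key=lambda i: math.dist(p, means[i]))
        clusters[i].append(p)
    # Recompute the means; stop when they no longer change.
    new_means = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) for cl in clusters]
    if new_means == means:
        break
    means = new_means

print(clusters)  # [[(1, 1.5), (2, 1.5), (3, 2.5)], [(1, 4.5), (2, 3.5), (3, 4)]]
print(means)     # means converge to (2, 5.5/3) ≈ (2, 1.83) and (2, 4.0)
```

With this initialization the assignment stabilizes after two passes: instances 1, 3, and 5 form the lower cluster and instances 2, 4, and 6 the upper one.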
  • 39. Contd.. • Example 2: Cluster the following instances of 2-dimensional data with the help of the k-means algorithm (take K = 2):
      Instance   X     Y
      1          1     2.5
      2          1     4.5
      3          2.5   3
      4          2     1.5
      5          4.5   1.5
      6          4     5
  • 40. Hierarchical clustering • A hierarchical clustering method works by grouping data objects into a hierarchy or “tree” of clusters. • Representing data objects in the form of a hierarchy is useful for data summarization and visualization.
  • 41. Contd.. • Depending on whether the hierarchical decomposition is formed in a bottom-up (merging) or top-down (splitting) fashion, a hierarchical clustering method can be classified into two categories: – Agglomerative Hierarchical Clustering and – Divisive Hierarchical Clustering
• 42. Contd..
• Agglomerative Hierarchical Clustering:
– uses a bottom-up strategy.
– starts by letting each object form its own cluster and iteratively merges clusters into larger and larger clusters, until all the objects are in a single cluster or certain termination conditions (e.g., a desired number of clusters) are satisfied.
– For the merging step, it finds the two clusters that are closest to each other (according to some similarity measure), and combines the two to form one cluster.
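The merge loop can be sketched as follows. This is a minimal illustration using single linkage (cluster distance = distance between the two closest members); the slide's "some similarity measure" leaves that choice open, and real implementations (e.g., scipy.cluster.hierarchy) are far more efficient:

```python
import math

def agglomerative(points, num_clusters):
    """Bottom-up clustering sketch: start with singleton clusters and
    repeatedly merge the two closest ones (single linkage)."""
    clusters = [[p] for p in points]        # each object is its own cluster
    while len(clusters) > num_clusters:     # termination: desired cluster count
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-linkage distance between clusters i and j
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)      # merge the two closest clusters
    return clusters
```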
• 43. Contd..
• Example: a data set of five objects, {a, b, c, d, e}. Initially, AGNES (AGglomerative NESting), the agglomerative method, places each object into a cluster of its own. The clusters are then merged step-by-step according to some criterion (e.g., minimum Euclidean distance).
• 44. Contd..
• Divisive hierarchical clustering:
– A divisive hierarchical clustering method employs a top-down strategy.
– It starts by placing all objects in one cluster, which is the hierarchy’s root.
– It then divides the root cluster into several smaller sub-clusters, and recursively partitions those clusters into smaller ones.
– The partitioning process continues until each cluster at the lowest level either contains only one object or holds objects that are sufficiently similar to each other.
• 45. Contd..
• Example: DIANA (DIvisive ANAlysis), a divisive hierarchical clustering method:
– a data set of five objects, {a, b, c, d, e}. All the objects are used to form one initial cluster. The cluster is split according to some principle such as the maximum Euclidean distance between the closest neighboring objects in the cluster. The cluster-splitting process repeats until, eventually, each new cluster contains only a single object.
• 46. Contd..
• Agglomerative versus divisive hierarchical clustering:
– Organize objects into a hierarchy using a bottom-up or top-down strategy, respectively.
– Agglomerative methods start with individual objects as clusters, which are iteratively merged to form larger clusters.
– Conversely, divisive methods initially let all the given objects form one cluster, which they iteratively split into smaller clusters.
• 47. Contd..
• Hierarchical clustering methods can encounter difficulties regarding the selection of merge or split points.
– Such a decision is critical, because once a group of objects is merged or split, the process at the next step will operate on the newly generated clusters. It will neither undo what was done previously, nor perform object swapping between clusters.
– Thus, merge or split decisions, if not well chosen, may lead to low-quality clusters.
• Moreover, the methods do not scale well because each decision of merge or split needs to examine and evaluate many objects or clusters.
• 48. Density Based Methods
• Partitioning methods and hierarchical clustering are suitable for finding spherical-shaped clusters.
• They are also severely affected by the presence of noise and outliers in the data.
• Unfortunately, real-life data contain:
– Clusters of arbitrary shape such as oval, linear, s-shaped, etc.
– Much noise
• Solution: density-based methods
• 49. Contd..
• Basic idea behind density-based methods:
– Model clusters as dense regions in the data space, separated by sparse regions.
• Major features:
– Discover clusters of arbitrary shape (e.g., oval, s-shaped, etc.)
– Handle noise
– Need density parameters as termination condition
• E.g., DBSCAN (Density Based Spatial Clustering of Applications with Noise)
• 50. Density-Based Clustering: Background
• Eps-neighborhood of a point p = all points within distance Eps from p:
– N_Eps(p) = { q | dist(p, q) <= Eps }
• Two parameters:
– Eps: maximum radius of the neighbourhood
– MinPts: minimum number of points in an Eps-neighbourhood of that point
• If the number of points in the Eps-neighborhood of p is at least MinPts, then p is called a core object.
(Figure: a core point p and a neighbor q, with MinPts = 5 and Eps = 1 cm.)
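These definitions translate directly into code. A minimal sketch (the function names `eps_neighborhood` and `is_core` are chosen for this illustration):

```python
import math

def eps_neighborhood(points, p, eps):
    """N_Eps(p): all points within distance eps of p (including p itself)."""
    return [q for q in points if math.dist(p, q) <= eps]

def is_core(points, p, eps, min_pts):
    """p is a core object if its eps-neighborhood holds at least min_pts points."""
    return len(eps_neighborhood(points, p, eps)) >= min_pts
```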
• 51. Contd..
• Directly density-reachable: a point p is directly density-reachable from a point q wrt. Eps, MinPts if
– 1) p belongs to N_Eps(q)
– 2) core point condition: |N_Eps(q)| >= MinPts
(Figure: q a core point with p in its neighborhood, MinPts = 5, Eps = 1 cm.)
• 52. Contd..
• Density-reachable:
– A point p is density-reachable from a point q wrt. Eps, MinPts if there is a chain of points p1, …, pn with p1 = q and pn = p such that each p(i+1) is directly density-reachable from p(i).
(Figure: q reaching p through an intermediate point p1.)
• 53. Contd..
• Density-connected:
– A point p is density-connected to a point q wrt. Eps, MinPts if there is a point o such that both p and q are density-reachable from o wrt. Eps and MinPts.
(Figure: p and q both density-reachable from o.)
• 54. Contd..
• Density = number of points within a specified radius (Eps).
• A point is a core point if it has at least a specified number of points (MinPts) within Eps.
– These are points that are at the interior of a cluster.
– The count includes the point itself.
• A border point is not a core point, but is in the neighborhood of a core point.
• A noise point is any point that is not a core point or a border point.
(Figure example: MinPts = 7.)
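One way to label every point as core, border, or noise (an illustrative sketch; as on the slide, the neighborhood count includes the point itself):

```python
import math

def classify(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' per the definitions above."""
    neighborhoods = {p: [q for q in points if math.dist(p, q) <= eps]
                     for p in points}
    cores = {p for p, n in neighborhoods.items() if len(n) >= min_pts}
    labels = {}
    for p in points:
        if p in cores:
            labels[p] = "core"
        elif any(q in cores for q in neighborhoods[p]):
            labels[p] = "border"   # not core, but in a core point's neighborhood
        else:
            labels[p] = "noise"    # neither core nor border
    return labels
```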
• 55. DBSCAN (Density Based Spatial Clustering of Applications with Noise)
• DBSCAN starts from an arbitrary unvisited object p and retrieves its Eps-neighborhood. If p is a core object, a new cluster is formed from p together with all objects density-reachable from it; otherwise, p is provisionally labeled as noise.
• To find the next cluster, DBSCAN randomly selects an unvisited object from the remaining ones. The clustering process continues until all objects are visited.
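The procedure can be sketched compactly (an illustration only, computing all neighborhoods up front rather than on demand; as on the earlier slide, the neighborhood count includes the point itself):

```python
import math

def dbscan(points, eps, min_pts):
    """DBSCAN sketch: returns a list of clusters (sets of points);
    points that end up in no cluster are noise."""
    neighbors = {p: [q for q in points if math.dist(p, q) <= eps]
                 for p in points}
    claimed, clusters = set(), []
    for p in points:
        if p in claimed or len(neighbors[p]) < min_pts:
            continue                    # already clustered, or not a core object
        cluster, frontier = set(), [p]  # grow a new cluster from core object p
        while frontier:
            q = frontier.pop()
            if q in claimed:
                continue
            claimed.add(q)
            cluster.add(q)
            if len(neighbors[q]) >= min_pts:   # only core objects expand further
                frontier.extend(neighbors[q])
        clusters.append(cluster)
    return clusters
```

Note that a border point reachable from two clusters is claimed by whichever cluster reaches it first, which is exactly the order-dependence listed later among DBSCAN's disadvantages.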
• 57. Contd..
• Example:
– If Eps is 2 and MinPts is 2, what are the clusters that DBSCAN would discover with the following 8 examples: A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9)?
• Solution:
– d(a,b) denotes the Euclidean distance between a and b. It is obtained directly from the distance matrix, calculated as follows:
– d(a,b) = sqrt((xb − xa)^2 + (yb − ya)^2)
  • 58. 7/2/2019 Compiled by : Kamal Acharya 58 Contd.. A1 A2 A3 A4 A5 A6 A7 A8 A1 0 √25 √36 √13 √50 √52 √65 √5 A2 0 √37 √18 √25 √17 √10 √20 A3 0 √25 √2 √2 √53 √41 A4 0 √13 √17 √52 √2 A5 0 √2 √45 √25 A6 0 √29 √29 A7 0 √58 A8 0
• 59. Contd..
• N2(A1) = {}
• N2(A2) = {}
• N2(A3) = {A5, A6}
• N2(A4) = {A8}
• N2(A5) = {A3, A6}
• N2(A6) = {A3, A5}
• N2(A7) = {}
• N2(A8) = {A4}
• So A1, A2, and A7 are outliers, while we have two clusters: C1 = {A4, A8} and C2 = {A3, A5, A6}.
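This result can be reproduced with a short self-contained script. Following the slide's convention, N2(a) here excludes the point a itself, so with MinPts = 2 a point needs at least one other point within Eps = 2 to belong to a cluster (note that d(A3, A6) = 2 is exactly Eps, so A6 is included in N2(A3)):

```python
import math

pts = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
       "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
eps = 2

# N2(a): eps-neighborhood of a, excluding a itself (as on the slide)
N = {a: {b for b in pts if b != a and math.dist(pts[a], pts[b]) <= eps}
     for a in pts}

outliers = {a for a in pts if not N[a]}   # empty neighborhood -> noise

# grow clusters by following neighborhoods of the remaining points
clusters = []
for start in sorted(pts):
    if start in outliers or any(start in c for c in clusters):
        continue
    cluster, frontier = set(), [start]
    while frontier:
        q = frontier.pop()
        if q not in cluster:
            cluster.add(q)
            frontier.extend(N[q])
    clusters.append(cluster)

print(sorted(outliers))               # ['A1', 'A2', 'A7']
print([sorted(c) for c in clusters])  # [['A3', 'A5', 'A6'], ['A4', 'A8']]
```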
• 61. Advantages and Disadvantages of the DBSCAN algorithm
• Advantages:
– DBSCAN does not require one to specify the number of clusters in the data a priori, as opposed to k-means.
– DBSCAN can find arbitrarily shaped clusters.
– DBSCAN is robust to outliers.
– DBSCAN is mostly insensitive to the ordering of the points in the database.
– The parameters MinPts and ε can be set by a domain expert, if the data is well understood.
• 62. Contd..
• Disadvantages:
– DBSCAN is not entirely deterministic: border points that are reachable from more than one cluster can be part of either cluster, depending on the order in which the data is processed. Fortunately, this situation does not arise often and has little impact on the clustering result: on core points and noise points, DBSCAN is deterministic.
– DBSCAN cannot cluster data sets with large differences in density well, since the MinPts–ε combination cannot then be chosen appropriately for all clusters.
– If the data and scale are not well understood, choosing a meaningful distance threshold ε can be difficult.
• 63. Homework
• Explain the aims of cluster analysis.
• What is clustering? How is it different from supervised classification? In what situations can clustering be useful?
• List and explain the desired features of cluster analysis.
• Explain the different types of cluster analysis methods and discuss their features.
• Describe the k-means algorithm and write its strengths and weaknesses.
• Describe the features of hierarchical clustering methods. In what situations are these methods useful?
• 64. Thank You !