Cluster Analysis.pptx

Cluster Analysis
Dr. Revathy V R
Assistant Professor
Dept. of Computer Science (UG)
Kristu Jayanti College, Bangalore

OVERVIEW
● Cluster Analysis: Basic Concepts
● Applications
● What is a good clustering?
● Types of Data in Cluster Analysis

Cluster Analysis: Basic Concepts
What is Cluster Analysis?
• Cluster: A collection of data objects that are
– similar (or related) to one another within the same group.
– dissimilar (or unrelated) to the objects in other groups.
• Cluster analysis (or clustering, data segmentation, …)
– Finding similarities between data according to the characteristics found in
the data and grouping similar data objects into clusters.
• Clustering is an unsupervised learning concept: there are no predefined classes. It is typically used:
– As a stand-alone tool to gain insight into the data distribution.
– As a preprocessing step for other algorithms.
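
The idea of grouping similar objects without predefined classes can be sketched with a very small example. The slides do not name a specific algorithm; the sketch below uses a simplified one-dimensional k-means purely as an illustration. The function name `kmeans_1d`, the made-up `ages` data, and the starting centers are all assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then re-average."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Index of the center closest to this value.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters, centers


ages = [21, 23, 22, 45, 47, 44]  # made-up data, no class labels given
clusters, centers = kmeans_1d(ages, centers=[20, 50])
print(clusters)
print(centers)
```

No labels are supplied; the two groups emerge only from how close the values are to one another.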

Applications of Clustering
• Biology: Taxonomy of living things into kingdom, phylum, class,
order, family, genus, and species.
• Information retrieval: Document clustering.
• Land use: Identification of areas of similar land use in an earth
observation database.
• Marketing: Help marketers discover distinct groups in their
customer bases, and then use this knowledge to develop targeted
marketing programs.
• Climate: Understanding Earth's climate by finding patterns in
atmospheric and ocean data.
• Economic science: Market research.

What Is Good Clustering?
• A good clustering method will produce high-quality clusters.
– High intra-class similarity: cohesive within clusters.
– Low inter-class similarity: distinctive between clusters.
• The quality of a clustering method depends on:
– The similarity measure used by the method.
– Its implementation, and
– Its ability to discover some or all of the hidden patterns.
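
The two quality criteria above can be checked numerically. The following is a minimal sketch, assuming Euclidean distance as the (dis)similarity measure and two made-up 2-D clusters; the helper names `mean_intra_distance` and `mean_inter_distance` are hypothetical.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, Python 3.8+


def mean_intra_distance(cluster):
    """Average distance between all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_inter_distance(a, b):
    """Average distance between points of cluster a and points of cluster b."""
    pairs = list(product(a, b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Made-up example data.
cluster_a = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
cluster_b = [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

intra_a = mean_intra_distance(cluster_a)
inter_ab = mean_inter_distance(cluster_a, cluster_b)
print(intra_a, inter_ab)
```

A small intra-cluster distance (high similarity within a cluster) together with a large inter-cluster distance (low similarity between clusters) is what a good clustering should show.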

Types of Data in Cluster Analysis
1. Interval-Scaled variables
2. Binary variables
3. Nominal, Ordinal, and Ratio variables
4. Variables of mixed types

1. Interval-Scaled variables:
Interval-scaled variables are continuous
measurements on a roughly linear scale.
Example:
weight and height, latitude and longitude
coordinates (e.g., when clustering houses),
and weather temperature.
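
Because interval-scaled variables can be measured in different units, they are often standardized before distances are computed. The sketch below uses the z-score as one common choice (other spread measures, such as the mean absolute deviation, are also used). The function name `standardize` and the data are assumptions, and the sketch assumes the values are not all identical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert measurements to z-scores: (value - mean) / standard deviation."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation; assumed non-zero
    return [(v - m) / s for v in values]


# Made-up data in different units.
heights_cm = [150.0, 160.0, 170.0, 180.0]
weights_kg = [50.0, 60.0, 70.0, 80.0]

z_heights = standardize(heights_cm)
z_weights = standardize(weights_kg)
print(z_heights)
print(z_weights)
```

After standardization both variables are unitless, so neither dominates a distance calculation simply because of its unit of measurement.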

2. Binary variables:
A binary variable is a variable that can take
only 2 values.
Example: A gender variable recorded with 2
values, male and female.
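
Dissimilarity between objects described by binary variables is often computed by counting mismatches. The sketch below shows two widely used measures: simple matching (suited to symmetric binary variables) and the Jaccard distance (suited to asymmetric ones, where 0/0 matches are ignored). The function names and the data are assumptions.

```python
def simple_matching_distance(x, y):
    """Fraction of positions where the two binary vectors disagree."""
    mismatches = sum(a != b for a, b in zip(x, y))
    return mismatches / len(x)


def jaccard_distance(x, y):
    """Like simple matching, but positions where both are 0 are ignored."""
    both_one = sum(a == 1 and b == 1 for a, b in zip(x, y))
    mismatches = sum(a != b for a, b in zip(x, y))
    considered = both_one + mismatches
    # If both vectors are all zeros there is nothing to compare.
    return mismatches / considered if considered else 0.0


# Made-up data: 1 = attribute present, 0 = absent.
obj_1 = [1, 0, 1, 0, 0, 0]
obj_2 = [1, 0, 0, 0, 0, 1]

print(simple_matching_distance(obj_1, obj_2))
print(jaccard_distance(obj_1, obj_2))
```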

3. Nominal, Ordinal, and Ratio variables:
Nominal (or categorical) variables:
A generalization of the binary variable in that it can take more than
2 states.
Example: red, yellow, blue, green.
Ordinal variables:
An ordinal variable can be discrete or continuous. The order of its
values is meaningful, but the size of the gaps between them is not.
Example: Rank.
Ratio variables:
A ratio variable is a positive measurement on a nonlinear scale,
approximately an exponential scale.
Example: Ae^(Bt) or Ae^(-Bt).
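
One common way to handle these three types before clustering is sketched below: count mismatches for nominal variables, map ordinal ranks onto [0, 1], and apply a logarithm to ratio variables so they become approximately linear. The function names, the constants A = 2 and B = 0.5, and the data are assumptions for illustration only.

```python
from math import exp, log


def nominal_distance(x, y):
    """Fraction of nominal attributes on which two objects differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def ordinal_to_unit_interval(rank, num_states):
    """Map a rank in 1..num_states onto the interval [0, 1]."""
    return (rank - 1) / (num_states - 1)


def ratio_to_interval(value):
    """Log-transform a positive ratio-scaled value: log(A*e^(Bt)) = log(A) + B*t."""
    return log(value)


# Nominal: the objects differ on one of two attributes.
d_nominal = nominal_distance(["red", "small"], ["red", "large"])

# Ordinal: rank 2 out of 3 states lands in the middle of [0, 1].
z_ordinal = ordinal_to_unit_interval(2, 3)

# Ratio: values following A*e^(Bt) with A = 2 and B = 0.5 (made up).
growth = [2 * exp(0.5 * t) for t in range(4)]
logged = [ratio_to_interval(v) for v in growth]
steps = [b - a for a, b in zip(logged, logged[1:])]

print(d_nominal, z_ordinal, steps)
```

After the log transform the ratio values increase in equal steps of B, so they can be treated like interval-scaled values.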

4. Variables of mixed types:
A database may contain all six types of
variables: symmetric binary, asymmetric
binary, nominal, ordinal, interval, and
ratio. Together these are called
mixed-type variables.
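
A single dissimilarity over mixed-type variables can be formed by scoring each variable on a 0-to-1 scale and averaging. The sketch below is a simplified illustration of that idea, not a complete method: it handles only interval, nominal, and symmetric binary variables, ignores missing values, and assumes the range of each interval variable is known. The function name `mixed_distance` and the data are assumptions.

```python
def mixed_distance(x, y, kinds, ranges):
    """Average per-variable dissimilarity, each scaled to lie in [0, 1]."""
    total = 0.0
    for a, b, kind, value_range in zip(x, y, kinds, ranges):
        if kind == "interval":
            # Absolute difference scaled by the variable's known range.
            total += abs(a - b) / value_range
        else:
            # Nominal or symmetric binary: 0 if equal, 1 otherwise.
            total += 0.0 if a == b else 1.0
    return total / len(kinds)


# Made-up objects: (height in cm, colour, smoker flag).
kinds = ("interval", "nominal", "binary")
ranges = (40.0, None, None)  # assumed height range of 40 cm
person_1 = (170.0, "red", 1)
person_2 = (180.0, "blue", 1)

d_mixed = mixed_distance(person_1, person_2, kinds, ranges)
print(d_mixed)
```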
