Basics of Clustering

Cluster Analysis
For segmentation

Clustering
what is it?
why do we use it?
how do we do it?

What is it?
• Cluster analysis is the process of grouping a set of data into clusters.
• A cluster is a collection of data points where each observation is 1) similar to other observations in the same cluster, and 2) dissimilar to observations in other clusters.

What is it?
• Cluster analysis is a statistical tool for discovering hidden patterns in groups of observations, e.g., what criteria define these “clusters”?
• Cluster analysis is still quite subjective in nature. Always ask: does the result make sense?

In Marketing…
• Clustering is used to discover distinct groups in customer bases (e.g., segments) and to use this knowledge to develop targeted marketing programs.
• Another example: insurance companies use clustering to determine which types of drivers are risky and which are safe, and charge premiums accordingly!

Good Clusters have:
• High intra-class similarity (observations in the cluster share qualities)
• Low inter-class similarity (distinct clusters are different from one another)
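
One way to make these two criteria concrete is to compare average distances: points inside a good cluster sit close together, while points in different clusters sit far apart. The sketch below is only an illustration. The two small groups of points are hypothetical, and Euclidean distance is an assumed measure of (dis)similarity, not something specified in these slides.

```python
from itertools import combinations, product
from math import dist
from statistics import mean

def mean_within(cluster):
    """Average distance between every pair of points in one cluster."""
    return mean(dist(p, q) for p, q in combinations(cluster, 2))

def mean_between(cluster_a, cluster_b):
    """Average distance between points taken from two different clusters."""
    return mean(dist(p, q) for p, q in product(cluster_a, cluster_b))

# Hypothetical points, made up purely for illustration
tight = [(1, 1), (1, 2), (2, 1)]
far = [(8, 8), (9, 8), (8, 9)]

print("within tight:", mean_within(tight))
print("within far:  ", mean_within(far))
print("between:     ", mean_between(tight, far))
```

A small within-cluster distance corresponds to high intra-class similarity, and a large between-cluster distance corresponds to low inter-class similarity.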

Consider two important characteristics:

Student   Grades   Work hours
a         3.5      0
b         3.7      5
c         2.9      10
d         2.0      12
e         3.0      15
f         2.8      14
[Scatter plot: work hours and grades for students a-f, with the points grouped into cluster 1 and cluster 2]
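
A two-cluster grouping of these six students can be produced in a few lines. The sketch below runs k-means, which is a swapped-in technique chosen for illustration and not the SPSS Two-Step procedure used later in these slides. Standardizing both variables, and starting from student a plus the student farthest from a, are assumptions made here rather than choices taken from the slides.

```python
from statistics import mean, pstdev

# Student data from the slide: name -> (grades, work hours)
students = {
    "a": (3.5, 0), "b": (3.7, 5), "c": (2.9, 10),
    "d": (2.0, 12), "e": (3.0, 15), "f": (2.8, 14),
}

def standardize(points):
    """Convert each column to z-scores so both variables weigh equally."""
    cols = list(zip(*points))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(p, mus, sds)) for p in points]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid,
    recompute the centroids, and repeat until assignments stop changing."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(mean(c) for c in zip(*members))
    return labels

names = list(students)
z = standardize(list(students.values()))
# Assumed starting centroids: student a and the student farthest from a
start = [z[0], max(z, key=lambda p: sq_dist(p, z[0]))]
labels = kmeans(z, start)

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

Under these assumptions the sketch places a and b in one cluster and c, d, e, and f in the other. A different starting point, scaling, or algorithm could give a different grouping, which is part of why the result still has to be judged for whether it makes sense.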

How do we use this information?
We have 2 distinct segments.
Other data we have: age, gender, hometown, grade level, major, hair color.
What is the segment profile of each?

How do we use this information?
We have 2 distinct segments.
Are these both viable targets? That depends on ….?
Are all of these characteristics useful?
Descriptive Statistics

Segment                     Age         Gender     Hometown     Major          Hair color
Segment 1 (works a lot)     Mean = 20   57% male   90% NKY      65% Business   50% blonde
Segment 2 (good students)   Mean = 20   75% male   66% OH, IN   50% Arts       75% brown
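
A profile table like this one is just a per-segment summary: a mean for each numeric variable and the most common value (with its share) for each categorical one. The sketch below shows the mechanics only. The records, the `profile` helper, and its field names are hypothetical and are not the class data behind the table above.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records (NOT the class data): one dict per student
records = [
    {"segment": 1, "age": 21, "major": "Business"},
    {"segment": 1, "age": 19, "major": "Business"},
    {"segment": 1, "age": 20, "major": "Arts"},
    {"segment": 2, "age": 20, "major": "Arts"},
    {"segment": 2, "age": 22, "major": "Arts"},
    {"segment": 2, "age": 21, "major": "Business"},
]

def profile(records, numeric, categorical):
    """Summarize each segment: mean of numeric fields, plus the most
    common value and its percentage share for categorical fields."""
    by_segment = defaultdict(list)
    for r in records:
        by_segment[r["segment"]].append(r)

    out = {}
    for seg, rows in by_segment.items():
        summary = {f"mean_{v}": mean(r[v] for r in rows) for v in numeric}
        for v in categorical:
            value, count = Counter(r[v] for r in rows).most_common(1)[0]
            summary[f"top_{v}"] = (value, round(100 * count / len(rows)))
        out[seg] = summary
    return out

profiles = profile(records, numeric=["age"], categorical=["major"])
for seg, summary in sorted(profiles.items()):
    print(seg, summary)
```

Variables whose summaries look the same in every segment (such as the identical mean age in the table above) do little to distinguish the segments from one another.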

How to do it!
• You need access to SPSS. You can either 1) log in to NKU’s virtual network (VPN) using the virtual desktop, or 2) use a University computer. (I suggest VPN.)
• Use this link to learn how to use the virtual desktop. You first have to install the VPN software if you want to do it off-campus: click here.

Steps to follow…
• Open your data set and save it to a portable drive or your NKU “j” drive.
• We will be using “Two-Step” cluster analysis.
• From the SPSS file: Analyze -> Classify -> TwoStep Cluster
The YouTube tutorial is linked here if you need to review it. Then follow the instructions in the tutorial.

Warnings
Don’t use binary variables in the clustering process (e.g., gender, team (yes/no)). These are “swamping variables” and will hijack your clusters.
Solutions with 3-4 clusters are ideal, even if you have to force that number and the criteria are not very good. You only have what you have…
Your data set might never give you “perfect” results based on the criteria discussed in the video tutorial. That’s OK. Do the best you can.
