Optimizing Market Segmentation

Optimizing Segmentation
Insight Research Group

2

Why is Segmentation Difficult?
 Infinite number of possible solutions

  Hundreds of possible variables to use

  Clearly defined clusters are rarely present in real-life datasets

3

Technical Challenges

 Challenge: Incorporating fundamentally
categorical variables
 Ethnicity, Religion, Political Party, Etc.

  Standard methods assume continuous data (ideal case) and,
at worst, require interval-level data (e.g. rating scales)
 Correlation, linear regression, k-means clustering
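
A minimal sketch of why distance-based methods struggle with nominal variables. The party labels and the integer codes are hypothetical. Arbitrary integer codes imply that some categories are "closer" than others, while indicator (one-hot) coding keeps every pair of categories equally far apart.

```python
import math

# Hypothetical nominal variable with three levels.
parties = ["Democrat", "Republican", "Independent"]

# Arbitrary integer coding: the numbers carry no real meaning.
int_code = {"Democrat": 1, "Republican": 2, "Independent": 3}

# Indicator (one-hot) coding: one 0/1 column per category.
one_hot = {p: [1.0 if p == q else 0.0 for q in parties] for p in parties}


def int_distance(a, b):
    """Distance implied by the arbitrary integer codes."""
    return abs(int_code[a] - int_code[b])


def one_hot_distance(a, b):
    """Euclidean distance between indicator vectors."""
    return math.dist(one_hot[a], one_hot[b])


for a, b in [("Democrat", "Republican"), ("Democrat", "Independent")]:
    print(a, b, int_distance(a, b), round(one_hot_distance(a, b), 3))
```

With integer codes, Democrat appears twice as far from Independent as from Republican, purely because of the coding order. A method such as k-means would treat that ordering as real.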

4

Technical Solutions
 Challenge: Incorporating fundamentally categorical
variables

 Multiple Correspondence Analysis (factor analysis for categorical
data)

 Pro: handles both demographic (categorical) and ratings variables
 Would allow treating sets of variables separately (i.e. demographic,
behavioral, psychological) – these sets could be used as inputs to
clustering method

 Con: segmentation would be based on extracted components
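
As a rough illustration of the input MCA works from, the sketch below codes demographic and ratings variables into a single 0/1 indicator matrix. The respondents, variable names, and rating cut points are all assumptions made for the example. The MCA decomposition itself is not shown.

```python
def bin_rating(rating):
    """Bin a 1-5 rating into three categories (cut points are assumptions)."""
    if rating <= 2:
        return "low"
    if rating == 3:
        return "mid"
    return "high"


def indicator_matrix(rows, variables):
    """Return (column labels, 0/1 matrix) with one column per variable level."""
    labels = [(var, level)
              for var in variables
              for level in sorted({row[var] for row in rows})]
    matrix = [[1 if row[var] == level else 0 for var, level in labels]
              for row in rows]
    return labels, matrix


# Hypothetical respondents: two demographic items and one rating item.
respondents = [
    {"gender": "F", "party": "Independent", "rating": 5},
    {"gender": "M", "party": "Democrat", "rating": 2},
    {"gender": "F", "party": "Republican", "rating": 3},
]

# Ratings are binned so that every variable is categorical.
coded = [{**r, "rating": bin_rating(r["rating"])} for r in respondents]
labels, Z = indicator_matrix(coded, ["gender", "party", "rating"])

print(labels)
for row in Z:
    print(row)
```

Each row of the matrix contains exactly one 1 per variable, so demographic and ratings items end up on the same footing.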

5


 Determining the number of clusters/segments in the
data

  Standard methods require the user to specify the number of clusters
to extract

  Our standard practice results in fewer clusters than input variables
 e.g. AMC segmentation solutions required ~12 variables to find ~5
segments
  This ratio of features-to-segments will ‘water down’ the effect of the
individual variables (segments do not differ significantly on most items)

6

Technical Solution 1
 Challenge: Determining the number of clusters/segments
in the data

 Solution: fit a probabilistic mixture model and compute a
complexity penalized likelihood (AIC / BIC scores)
 The model with the best AIC / BIC score is our best guess for the
number of natural clusters in the data

 Gaussian mixture models for continuous data
 Latent Class Models for categorical data
 Latent class models can handle both categorical and continuous data if the
continuous data is binned.

 Both of the above return BIC scores to determine the number of
clusters
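
The procedure above can be sketched as follows. This is a minimal, standard-library-only illustration, not the implementation behind the slides: it fits one-dimensional Gaussian mixtures by EM on simulated data from four hypothetical sources and scores each candidate number of clusters with BIC, where lower is better. The function names, the slice-based starting values, and the variance floor are assumptions made for this example.

```python
import math
import random


def mixture_loglik(xs, weights, means, variances):
    """Log-likelihood of the data under a 1-D Gaussian mixture."""
    total = 0.0
    for x in xs:
        density = sum(
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(density + 1e-300)  # guard against log(0)
    return total


def fit_gmm_1d(data, k, iters=100, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM; return its log-likelihood."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: one component per equal-sized slice of the sorted data.
    slices = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(s) / len(s) for s in slices]
    variances = [max(sum((x - m) ** 2 for x in s) / len(s), var_floor)
                 for s, m in zip(slices, means)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            norm = sum(dens) + 1e-300
            resp.append([d / norm for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)
    return mixture_loglik(xs, weights, means, variances)


def bic(loglik, k, n):
    """BIC = -2 logL + p ln(n), with p = 3k - 1 free parameters in 1-D."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


# Simulated data: four hypothetical sources, 50 points each.
rng = random.Random(0)
data = [rng.gauss(center, 1.0) for center in (0.0, 10.0, 20.0, 30.0) for _ in range(50)]

scores = {k: bic(fit_gmm_1d(data, k), k, len(data)) for k in range(1, 7)}
best_k = min(scores, key=scores.get)
print(best_k, {k: round(s, 1) for k, s in scores.items()})
```

In practice a library implementation would normally replace the hand-written EM loop. The point here is only the selection rule: fit each candidate model, then keep the one with the best penalized score.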

7

 How many clusters do you see? (4 sources generated the data –duh)

8

 The BIC infers 4 clusters (4 clusters solution had the best BIC score)

9

 How many clusters do you see? (4 sources generated the data –not so obvious)

10

 The BIC says 4! (4 clusters had the best BIC score, thanks BIC!)

11

 Challenge: Determining the number of clusters/segments
in the data
  Solution: ensure there are fewer input variables than extracted
clusters
2(+) segments can be obtained from
a single variable.

That is a 2-1 ratio of segments-to-
variables

For AMC & MTV we got 5 segments
from ~12 variables. A ratio of about 0.4 to 1.
- That is less than 1 segment for
every two variables…

 Also See: Van Buuren & Heiser (1989); Vichi & Kiers (2001); Hwang, Dillon, &
Takane (2006).
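
The ratios quoted above can be checked with a few lines. The example scores and the median split are hypothetical. They only illustrate how a single variable can already yield two segments.

```python
import statistics


def segments_per_variable(n_segments, n_variables):
    """Ratio of extracted segments to input variables."""
    return n_segments / n_variables


# One variable split at its median gives two segments: a 2-to-1 ratio.
scores = [1, 2, 2, 3, 7, 8, 9, 9]          # hypothetical single input variable
cut = statistics.median(scores)
segments = ["high" if s > cut else "low" for s in scores]

single_variable_ratio = segments_per_variable(len(set(segments)), 1)
amc_style_ratio = segments_per_variable(5, 12)   # 5 segments from ~12 variables

print(single_variable_ratio, round(amc_style_ratio, 2))
```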

12

 Respondents vary in their use of
ratings scales

 Some respondents only use part
of the scale,
 Either top or bottom of range

 Segmentation method will find the
high/low scale-use respondents
and define segments for them
  See AMC segments

13

Psychographic
banner for AMC
segments.

These items were not
used to define the cluster
solution.

14

 Challenge: Respondents vary in their use of ratings
scales

 Calibrate respondents to equate ratings scale across sample
 Overcoming Scale Use Heterogeneity (2003) Peter E. Rossi

  Pro: Improves the accuracy and validity of standard methods
 E.g. correlation, regression, clustering

  Con: requires complex and computationally expensive models
  i.e. hierarchical Bayesian models – available as an R package
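
For intuition only, the sketch below uses a much simpler stand-in for the hierarchical Bayesian calibration: it standardizes each respondent's ratings by that respondent's own mean and spread. This is not the model from the cited work, and it also discards any genuine differences in overall level between respondents. The two respondents are hypothetical.

```python
import statistics


def standardize_within_respondent(ratings):
    """Center and scale one respondent's ratings by their own mean and spread."""
    mean = statistics.mean(ratings)
    spread = statistics.pstdev(ratings)
    if spread == 0:
        # A respondent who gives the same answer everywhere carries no pattern.
        return [0.0 for _ in ratings]
    return [(r - mean) / spread for r in ratings]


top_user = [8, 9, 10, 9]      # uses only the top of a 1-10 scale
bottom_user = [1, 2, 3, 2]    # same pattern, bottom of the scale

print(standardize_within_respondent(top_user))
print(standardize_within_respondent(bottom_user))
```

After standardization the two respondents have identical profiles, so a clustering method no longer separates them on scale use alone.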

15

 Challenge: Respondents vary in their use of ratings
scales

 Abandon rating scales – use simple Agree/Disagree variables
 Focus on methods for categorical variables

 Multiple Correspondence Analysis (factor analysis for categorical data)

 Pro: handles both demographic (categorical) and ratings variables
 Would allow treating sets of variables separately (i.e. demographic, behavioral,
psychological) – these sets could be used as inputs to clustering methods

16

What Slows Us Down?
 Each segmentation iteration consumes resources

 Producing new segmentation variable for each respondent
 .5 man hour

 Producing new banners
 Generating tables - .25 hours
 Formatting and printing – 1+ man hours

 Analyzing full banner for new segmentation
 Requires entire research team, 6+ man hours

17

How to Speed it up
 Producing new segmentation variable for each respondent
 .5 man hour – Not the bottleneck

 Producing new banners
 Generating tables - .25 hours – Not the bottleneck
 Formatting and printing – 1+ man hours – Potential for Automation

 Analyzing full banner for the new segmentation
 Requires entire research team, 6+ man hours – workflow bottleneck
 Ideas / brainstorm
  Success criteria are often vague
  When the goal is well defined, quant methods can increase efficiency
  If you can formalize it, you can solve it
 Time invested in the planning phase will reap productivity gains during analysis
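
A quick tally of the per-iteration figures above shows why the analysis step is labeled the bottleneck. The "1+" and "6+" entries are treated as lower bounds, which is an assumption of this sketch, so the total is a minimum.

```python
# Lower-bound hours per segmentation iteration, taken from the figures above.
hours = {
    "new segmentation variable": 0.5,
    "generating tables": 0.25,
    "formatting and printing": 1.0,   # quoted as 1+
    "analyzing full banner": 6.0,     # quoted as 6+
}

total = sum(hours.values())
bottleneck = max(hours, key=hours.get)

# Print the steps from most to least expensive, with their share of the total.
for step, h in sorted(hours.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{step:28s} {h:5.2f} h  {h / total:5.1%}")
print("total:", total, "h; bottleneck:", bottleneck)
```

On these lower bounds, analysis takes roughly three quarters of each iteration, so automating formatting alone would save at most about an hour per pass.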

18

Hypothetical Case Study
 Goals Brainstorm:
 Client and previous research says:
 “segmentation should differentiate enthusiasts (early adopters) and utility
consumers (late adopters)”
  “also, segmentation should include demographics that are known to influence
technology adoption.”
 Age, Gender, Income, Education

 Quant answers:
  “OK, let's write a battery of questions addressing consumers' perceptions of and
relation to technology products – this will be distilled into a single ‘tech
enthusiasm’ measure.”
  “Also, all relevant demographic information can be reduced into one (or more)
demographic factors.”
  “Segments will be defined from a ‘reduced dimensionality’ representation of the
data (MCA).”
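
To make the idea of distilling a battery into one measure concrete, here is a deliberately crude stand-in for the MCA-based reduction: the share of items a respondent agrees with. The respondents, the four-item battery, and the 0.5 cut-off are all hypothetical.

```python
def enthusiasm_score(answers):
    """Share of battery items a respondent agreed with, from 0.0 to 1.0."""
    return sum(answer == "agree" for answer in answers) / len(answers)


# Hypothetical answers to a four-item technology battery.
battery = {
    "resp_1": ["agree", "agree", "agree", "disagree"],
    "resp_2": ["disagree", "disagree", "agree", "disagree"],
}

scores = {rid: enthusiasm_score(answers) for rid, answers in battery.items()}

# Assumed cut-off: at least half the items agreed marks an enthusiast.
groups = {rid: "enthusiast" if s >= 0.5 else "utility" for rid, s in scores.items()}

print(scores)
print(groups)
```

An MCA-derived dimension would weight items by how well they discriminate between respondents, whereas this count treats every item equally.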

20
Categories graph

21
Combined graph

23

MCA for Segmentation
 (2006). An extension of multiple correspondence analysis for identifying
heterogeneous subgroups of respondents

 (2010). Traveler segmentation strategy with nominal variables through
correspondence analysis

 (2010). Fuzzy cluster multiple correspondence analysis

 (2010). Simultaneous two-way clustering of multiple correspondence
analysis

 (2005). A simultaneous approach to constrained multiple correspondence
analysis and cluster analysis for market segmentation

 (2002). Analysis of categorical marketing data by generalized constrained
multiple correspondence analysis

24

Further Directions
 Extension to Multiple Correspondence Analysis
 Methods that let us combine nominal, numeric, and ordinal
variables
 Methods that let us group variables into sets.
  E.g. could ensure that psychographic, behavioral and demographic
variables have an equal influence on the final solution.

  Methods that simultaneously perform dimensionality
reduction and cluster discovery
 Optimizes the entire analysis to discover the most distinctive
clusters
 Very promising approach
 Con: I have not found an implementation of these methods.
