SlideShare a Scribd company logo
Using R for Customer Segmentation
useR! 2008
Dortmund, Germany
August, 2008
Jim Porzak,
Senior Director of Analytics
Responsys, Inc.
San Francisco, California
11Aug08 userR! 08 - Porzak, Customer Segmentation 2
Outline
● Two main case study examples
– Customer purchase behavior data
● Goal: actionable segments to improve LTV of customer
base
– Prospect intent & interest survey data
● Goal: actionable segments to better target messaging
content and tactics
● Real data from real clients (sanitized)
● Workshop format
– Hands on
– Discussion heavy
11Aug08 userR! 08 - Porzak, Customer Segmentation 3
Introduction
11Aug08 userR! 08 - Porzak, Customer Segmentation 4
Why Segment?
● Better communication with customers and
prospects
– Recipient should feel that we understand him or her
as an individual
– “Send the right message to the right person at the
right time”
● Challenges:
– Widely applicable
● General rules based on readily available data
● A new contact can be placed in their segment easily
– Usable
● Marketing can relate
● Technology can deliver
11Aug08 userR! 08 - Porzak, Customer Segmentation 5
Segmentation in Practice
11Aug08 userR! 08 - Porzak, Customer Segmentation 6
Behavioral Segmentation
11Aug08 userR! 08 - Porzak, Customer Segmentation 7
What's Behavioral Segmentation?
● Based on what people actually do
– Not on what that say they do
● Purchase behavior
– Discuss examples...
● Usage behavior
– Discuss examples...
11Aug08 userR! 08 - Porzak, Customer Segmentation 8
Why do Behavioral Segmentation?
● All comes down to interacting with your
customer or prospect in the appropriate way
– From customers perspective, not yours!
● Ideally a “one-to-one” interaction
– Not practical in today's world
– Goal: perceived by customer as “one-to-one”
11Aug08 userR! 08 - Porzak, Customer Segmentation 9
Today's Purchase Behavior Data Set
● Actual web & phone sales records (sanitized)
– 541k order detail lines
– 135k Customers
– Over 2 ½ years
– Of ~900 different products
– In 5 product categories
● Conventional wisdom
– Strong seasonality
– Have a loyal customer base
– But, have retention problem
11Aug08 userR! 08 - Porzak, Customer Segmentation 10
What we know
Date: 10/10/07 Order #: 12345
Customer: 3894832
Sue Smith
1 Short Street
Qty SKU Description Unit Price Ext Price
1 123 1.50 1.50
3 345 White Widget 2.00 6.00
Total 7.50
Tax 0.60
Shipping 2.00
Grand Total 10.10
Smallville, ND, 39248
Green Gizzmo
Imagine a customer order form:
We get the highlighted data.
Plus: order channel and product (SKU) category
11Aug08 userR! 08 - Porzak, Customer Segmentation 11
Preloaded as “orders” data frame
> load("BehavioralDataSet.Rda")
> str(orders)
'data.frame': 541101 obs. of 9 variables:
$ SKU_ID : int 459 459 459 459 459 459 459 459 459 459 ...
$ ChannelID: int 3 4 3 3 3 3 4 3 3 3 ...
$ CustID : int 134945 212174 39861 11227 137271 60982 ...
$ OrderID : int 326324 109305 172669 132642 20449 40826 ...
$ OrderDate:Class 'Date' num [1:541101] 13211 13649 13670 ...
$ Quantity : int 1 2 1 3 1 1 1 1 1 1 ...
$ Amount : num 18 36 18 54 18 18 18 18 18 18 ...
$ Channel : Factor w/ 4 levels "phone1","phone2",..: 3 4 3
3 ...
$ Category : Factor w/ 7 levels "*","C","G","I",..: 3 3 3 3
3 ...
11Aug08 userR! 08 - Porzak, Customer Segmentation 12
orders summary
> summary(orders[-(1:2)])
CustID OrderID OrderDate Quantity
Min. : 2 Min. : 2 Min. :2005-09-01 Min. : 0.000
1st Qu.: 62221 1st Qu.:105292 1st Qu.:2006-07-18 1st Qu.: 1.000
Median :124343 Median :210908 Median :2007-02-14 Median : 1.000
Mean :152974 Mean :207535 Mean :2007-03-11 Mean : 1.113
3rd Qu.:185119 3rd Qu.:315711 3rd Qu.:2007-12-04 3rd Qu.: 1.000
Max. :506929 Max. :388319 Max. :2008-07-14 Max. :275.000
NA's : 4
Amount Channel Category
Min. : 0.01 phone1: 14303 *: 0
1st Qu.: 20.00 phone2: 90 C:142147
Median : 30.00 web1 :451354 G:114300
Mean : 31.81 web2 : 75354 I: 14961
3rd Qu.: 35.00 N: 50385
Max. :4577.00 T:199354
X: 19954
11Aug08 userR! 08 - Porzak, Customer Segmentation 13
Goal of this exercise?
● Marketers need to come up with a
communication strategy & associated tactics
which will entice customers to exhibit higher
LTV – Long Term Value.
● Segment by past purchase behavior to provide
actionable subsets of customers
– When marketers use our subsets, they get
measurably better results than previous “one size
fits all” method.
11Aug08 userR! 08 - Porzak, Customer Segmentation 14
How are we going to do this?
(Discussion)
11Aug08 userR! 08 - Porzak, Customer Segmentation 15
Hints
● Live Stage
● Value
● Engagement
● Favorite Products
● Timing
11Aug08 userR! 08 - Porzak, Customer Segmentation 16
Recency, Frequency, & Monetary Metrics
● Recency
– How long ago was last purchase? (days)
– Measured for “As Of Date” of data set
● Frequency
– How many orders in analysis period (2 ½ years)
– Attempting to measure engagement
● Monetary
– What is total $ value of all orders in analysis period
Question: Do you expect these three to be uncorrelated?
11Aug08 userR! 08 - Porzak, Customer Segmentation 17
An Aside: Classical RFM
● Invented by direct marketers in 1950's as a way to
model response rates (before good stat software was
readily available)
● One typical method
– R, F, & M each scored in quantile (typically 5)
– Combined score for each recipient was
concatenation of the three digits, eg “351”
– Scores ranked by empirical response rate
– Mailing then done to top xx% of list
● Today we use, lm, glm, randomForest, ...
● But, concepts still valid as conceptional model
● And, R & F measures typically very important in any
predictive model
11Aug08 userR! 08 - Porzak, Customer Segmentation 18
I also typically include...
● Breadth
– How many different SKUs purchased?
● Tenure
– How long as customer been with us?
11Aug08 userR! 08 - Porzak, Customer Segmentation 19
Next Step – Aggregate by Customer
● We need some “raw” RFM values
● Make the data frame “RFM_raw”
– CustomerID: the business key back to database
– FirstPurchaseDate: interesting for tenure metric
– LastPurchaseDate: basis of Recency
– NumberOrders: basis of Frequency
– NumberSKUs: basis of Breadth (engagement metric)
– TotalAmount: basis of Monetary
● Also calculate
– AsOfDate <- max(LastPurchaseDate)
11Aug08 userR! 08 - Porzak, Customer Segmentation 20
Building the RFM_raw data frame
## for performance, make OrderDate an integer during aggregation
orders_n <- orders
orders_n$OrderDate <- as.integer(orders_n$OrderDate)
## build up one column at a time
RFM_raw <- with(orders_n, data.frame(CustomerID = sort(unique(CustID))))
RFM_raw <- cbind(RFM_raw, FirstPurchaseDate = with(orders_n,
as.Date(as.integer(by(OrderDate, CustID, min)), "1970-01-01")))
RFM_raw <- cbind(RFM_raw, LastPurchaseDate = with(orders_n,
as.Date(as.integer(by(OrderDate, CustID, max)), "1970-01-01")))
RFM_raw <- cbind(RFM_raw, NumberOrders = with(orders_n,
as.numeric(by(OrderID, CustID, function(x) length(unique(x))))))
RFM_raw <- cbind(RFM_raw, NumberSKUs = with(orders_n,
as.numeric(by(SKU_ID, CustID, function(x) length(unique(x))))))
RFM_raw <- cbind(RFM_raw, TotalAmount = with(orders_n,
as.numeric(by(Amount, CustID, sum))))
AsOfDate <- max(RFM_raw$LastPurchaseDate)
save(RFM_raw, AsOfDate, file = "RFM_raw.Rda")
This take a while (1 ½ minutes on my laptop). You may want to download RFM_raw.Rda
11Aug08 userR! 08 - Porzak, Customer Segmentation 21
Do some RMF EDA
## Jim's miscellaneous DMA functions
source("dma_misc.R")
## for interactive games:
attach(RFM_raw)
## EDA plots using base graphics
rfm.plot(as.numeric(AsOfDate - LastPurchaseDate) %/% 7, "rec")
rfm.plot(NumberOrders, "freq")
rfm.plot(TotalAmount, "mon")
rfm.plot(NumberSKUs, "breadth")
## EDA plots using iPlots
ihist(as.numeric(AsOfDate - LastPurchaseDate) %/% 7, title = "Recency")
ihist(NumberOrders, title = "Frequency")
ihist(TotalAmount, title = "Monetary")
ihist(NumberSKUs, title = "Breadth")
11Aug08 userR! 08 - Porzak, Customer Segmentation 22
RFM EDA Plots
In all cases, “best is left.”
11Aug08 userR! 08 - Porzak, Customer Segmentation 23
Assign reasonable RFM breaks
● Recency:
– Breaks (weeks <=): 25, 51, 77, 103, <else>
– levels = c("0-5", "6-11", "12-17", "18-23", "24-29"))
● Note levels labeled in months, not weeks
● Frequency:
– Breaks (count <=): 1, 3, 7, <else>
– levels = c("8+", "7-4", "3-2", "1"))
● Note ordering for best is left.
● Monetary:
– Breaks (value <=): 50, 100, 200, 400, <else>
– levels = c("401+", "400-201", "200-101", "100-51", "50-0"))
● Again ordering is best is left.
11Aug08 userR! 08 - Porzak, Customer Segmentation 24
Build RFM_segs data frame
RFM_segs <- data.frame(Recency_weeks = as.numeric(AsOfDate - RFM_raw$LastPurchaseDate) %/% 7)
row.names(RFM_segs) <- row.names(RFM_raw)
## now label levels with months rather than weeks
RFM_segs$Recency <- ordered(ifelse(RFM_segs$Recency_weeks <= 25, "0-5",
ifelse(RFM_segs$Recency_weeks <= 51, "6-11",
ifelse(RFM_segs$Recency_weeks <= 77, "12-17",
ifelse(RFM_segs$Recency_weeks <= 103, "18-23", "24-29")))),
levels = c("0-5", "6-11", "12-17", "18-23", "24-29"))
RFM_segs$Frequency_count <- RFM_raw$NumberOrders
RFM_segs$Frequency <- ordered(ifelse(RFM_segs$Frequency_count == 1, "1",
ifelse(RFM_segs$Frequency_count <= 3, "3-2",
ifelse(RFM_segs$Frequency_count <= 7, "7-4", "8+"))),
levels = c("8+", "7-4", "3-2", "1"))
RFM_segs$Monetary_value <- RFM_raw$TotalAmount
RFM_segs$Monetary <- ordered(ifelse(RFM_segs$Monetary_value <= 50, "50-0",
ifelse(RFM_segs$Monetary_value <= 100, "100-51",
ifelse(RFM_segs$Monetary_value <= 200, "200-101",
ifelse(RFM_segs$Monetary_value <= 400, "400-201", "401+")))),
levels = c("401+", "400-201", "200-101", "100-51", "50-0"))
11Aug08 userR! 08 - Porzak, Customer Segmentation 25
We typically also add Breadth & Tenure:
RFM_segs$Breadth_count <- RFM_raw$NumberSKUs
RFM_segs$Breadth <- ordered(ifelse(RFM_segs$Breadth_count == 1, "1",
ifelse(RFM_segs$Breadth_count == 2, "2",
ifelse(RFM_segs$Breadth_count <= 4, "4-3",
ifelse(RFM_segs$Breadth_count <= 9, "9-5", "10+")))),
levels = c("10+", "9-5", "4-3", "2", "1"))
RFM_segs$Tenure_weeks <- as.numeric(AsOfDate - FirstPurchaseDate) %/% 7
RFM_segs$Tenure <- ordered(ifelse(RFM_segs$Tenure_weeks <= 12, "0-12",
ifelse(RFM_segs$Tenure_weeks <= 25, "13-25",
ifelse(RFM_segs$Tenure_weeks <= 38, "26-38",
ifelse(RFM_segs$Tenure_weeks <= 51, "39-51",
ifelse(RFM_segs$Tenure_weeks <= 64, "52-64",
ifelse(RFM_segs$Tenure_weeks <= 77, "65-77",
ifelse(RFM_segs$Tenure_weeks <= 90, "78-90",
ifelse(RFM_segs$Tenure_weeks <= 103, "91-103",
"104+")))))))),
levels = c("104+", "91-103", "78-90", "65-77", "52-64", "39-51",
"26-38", "13-25", "0-12"))
save(RFM_segs, file = "RFM_segs.Rda")
11Aug08 userR! 08 - Porzak, Customer Segmentation 26
How do customers look in RFM space?
● I like mosaic plots (& especially vcd* package!)
● Set up a “structure table” with assignments:
● And a convenience function for mosaic:
require(vcd)
RFM_st <- structable(~ Recency + Frequency + Monetary + Breadth,
data = RFM_segs)
mm <- function(f) {
mosaic(f, data = RFM_st,
shade = TRUE,
labeling_args = list(rot_labels = c(left = 90, top = 45),
just_labels = c(left = "left",
top = "center")),
spacing = spacing_dimequal(unit(c(0.5, 0.8), "lines")),
keep_aspect_ratio = FALSE
)
}
* To learn more, attend: The strucplot framework for Visualizing Categorical Data. Wed, 11:30. E29
11Aug08 userR! 08 - Porzak, Customer Segmentation 27
mm(~ Recency + Frequency)
11Aug08 userR! 08 - Porzak, Customer Segmentation 28
mm(~ Frequency + Monetary)
11Aug08 userR! 08 - Porzak, Customer Segmentation 29
mm(~ Recency + Monetary)
11Aug08 userR! 08 - Porzak, Customer Segmentation 30
mm(~ Breadth + Monetary)
11Aug08 userR! 08 - Porzak, Customer Segmentation 31
To really show off vcd!
pairs(RFM_st, lower_panel = pairs_assoc, shade = TRUE)
11Aug08 userR! 08 - Porzak, Customer Segmentation 32
Time to get real – remember goal?
11Aug08 userR! 08 - Porzak, Customer Segmentation 33
Actionable for Marketers
The big two concepts:
1. Lifestage
2. Value
Turns out we can do both with Recency &
Frequency!
11Aug08 userR! 08 - Porzak, Customer Segmentation 34
Use Balloon Plots to Communicate
require(gplots)
# Recency by Frequence - Counts
RxF <- as.data.frame(table(RFM_segs$Recency, RFM_segs$Frequency,
dnn = c("Recency", "Frequency")),
responseName = "Number_Customers")
with(RxF, balloonplot(Recency, Frequency, Number_Customers, zlab = "#
Customers"))
# Recency by Frequency - Annual Value (total annual sales to segment)
VbyRxF <- (aggregate(RFM_segs$Monetary_value,
by = list(Recency = factor(RFM_segs$Recency),
Frequency = RFM_segs$Frequency),
sum))
names(VbyRxF)[3] <- "Annual_Sales"
VbyRxF$Annual_Sales <- VbyRxF$Annual_Sales / (28/12) ## normalize to
annual revnue
with(VbyRxF, balloonplot(Recency, Frequency, Annual_Sales / 1000, zlab =
"Annual Sales (000)"))
11Aug08 userR! 08 - Porzak, Customer Segmentation 35
Recency by Frequency - Counts
11Aug08 userR! 08 - Porzak, Customer Segmentation 36
Recency by Frequency - Value
11Aug08 userR! 08 - Porzak, Customer Segmentation 37
Exercise – Assign Segments
● Lifestage “dimension”
– New
– Active
– Lapsed
– Lost
● Value “dimension”
– Gold
– Silver
– Bronze
● Combined as
– High Value, Repeat, New, One-time, Lapsed, & Lost
11Aug08 userR! 08 - Porzak, Customer Segmentation 38
Color & Label Segment Cells
# a matrix of segment codes
RF_segs0 <- matrix("", nrow = 4, ncol = 5)
# manually make assignments
object.browser() ## Fill in H, R, N, L, or O. Save as RF_segs.txt
# get back into R
RF_segs <- as.matrix(read.delim("RF_segs.txt", sep = "t",
na.strings = ""))
RF_segs[is.na(RF_segs)] <- "X" ## N/A's become “Lost”
# add colors and labels to balloon plot
# Magic values for balloon cell centers
RF_x <- matrix(2:6 + 0.25, nrow = 4, ncol = 5, byrow = TRUE)
RF_y <- matrix(4:1, nrow = 4, ncol = 5, byrow = FALSE)
RF_cols <- sapply(RF_segs, function(x) switch(x, H="gold",
R="slategray2", N="green",
L="yellow", O="darkgreen", "red"))
points(RF_x, RF_y, col = RF_cols, pch = 16, cex = 12)
text(RF_x, RF_y, RF_segs, cex = 2)
11Aug08 userR! 08 - Porzak, Customer Segmentation 39
Final Segments for Marketers
11Aug08 userR! 08 - Porzak, Customer Segmentation 40
Conceptual RF Segments
11Aug08 userR! 08 - Porzak, Customer Segmentation 41
Break Time!
11Aug08 userR! 08 - Porzak, Customer Segmentation 42
Attitudinal Segmentation
11Aug08 userR! 08 - Porzak, Customer Segmentation 43
Marketing Challenge
● Our client offers free download of software with
high perceived value, but
● First asks user to fill out a simple survey
● Challenge is to come up with a “few” segments
that will be used by segment to:
– Prioritize contact strategy
– Craft marketing messages based on profile
11Aug08 userR! 08 - Porzak, Customer Segmentation 44
Sample Data
● Surveys from 20k respondents
● All within same time frame (a number of weeks)
● All requested the software download
11Aug08 userR! 08 - Porzak, Customer Segmentation 45
Survey Description
● 35 check boxes or radio buttons
– None required. Coded as binary responses
● Arranged in 5 sections
– License: W and/or X
– Role: one of D, SA, ITM, ITA, Str, Oth (radio
buttons)
– System: any of S, T, A, B, C, D, O (check boxes)
– Interest: any of M, O Pl, Pr, Sup, 64, Con, Per, DT,
Z, Oth. (check boxes)
– Application: any of Web, Inf, Col, Db, J2, Top, Dev,
Per, Other (check boxes)
11Aug08 userR! 08 - Porzak, Customer Segmentation 46
Data Set
Provided as data frame csb, in
InterestPreferenceSurvey.Rda
# Getting started
setwd("C:/Data/useR08/R")
require(lattice)
require(grDevices)
require(vcd)
require(flexclust)
load(file = "InterestPreferenceSurvey.Rda")
str(csb)
'data.frame': 20000 obs. of 35 variables:
$ Lic_W : int 0 0 0 0 0 0 0 0 0 0 ...
$ Lic_X : int 1 1 1 0 1 1 1 1 1 1 ...
$ Role_D : int 0 0 0 0 0 0 0 0 1 0 ...
$ Role_SA : int 0 0 1 0 1 0 0 1 0 0 ...
$ Role_ITM: int 0 0 0 1 0 0 0 0 0 0 ...
$ Role_ITA: int 0 0 0 0 0 0 0 0 0 0 ...
11Aug08 userR! 08 - Porzak, Customer Segmentation 47
Proportion Responders by Question
> mean(csb)
Lic_W Lic_X Role_D Role_SA Role_ITM Role_ITA Role_Stu Role_Oth
0.16040 0.90980 0.19905 0.32910 0.06905 0.08465 0.21080 0.05090
Sys_S Sys_T Sys_A Sys_B Sys_C Sys_D Sys_O Int_M
0.17780 0.39720 0.17020 0.13975 0.09325 0.03510 0.19260 0.36960
Int_O Int_Pl Int_Pr Int_Sup Int_64 Int_Con Int_Per Int_DT
0.46810 0.09395 0.10055 0.08985 0.23445 0.21235 0.31420 0.11790
Int_Z Int_Oth Ap_Web Ap_Inf Ap_Col Ap_Db Ap_J2 Ap_Top
0.23450 0.05995 0.39640 0.19125 0.18365 0.30125 0.19455 0.30145
Ap_Dev Ap_Per Ap_Other
0.18960 0.20050 0.03735
11Aug08 userR! 08 - Porzak, Customer Segmentation 48
Clustering Strategy
● flexclust package by Fritz Leisch
● See his 2006 paper (on his personal page):
A Toolbox for K-Centroids Cluster Analysis
● This is (mostly) an optional response type
survey
– 1 = “yes” is significant
– 0 is just absence not really a “no”
– Respondents checking Role_SA have much more
in common than those not checking Role_SA
● Following Fritz's argument we use the
expectation based Jaccard distance measure.
11Aug08 userR! 08 - Porzak, Customer Segmentation 49
A First Cluster Run
require(flexclust)
## set up flexclust control object
fc_cont <- new("flexclustControl")
fc_cont@tolerance <- 0.1 ## this doesn't seem to work as expected
fc_cont@iter.max <- 30 ## seems to be effective convergence
##fc_cont@verbose <- 1 ## set TRUE if to see each step
my_seed <- 0
my_family <- "ejaccard"
num_clust <- 4
my_seed <- my_seed + 1
set.seed(my_seed)
cl <- kcca(csb, k = num_clust, save.data = TRUE, control = fc_cont,
family = kccaFamily(my_family))
## This takes ~ 1.5 min. on my laptop
11Aug08 userR! 08 - Porzak, Customer Segmentation 50
Cluster Summary
> summary(cl)
kcca object of family 'ejaccard'
call:
kcca(x = csb, k = num_clust, family = kccaFamily(my_family),
control = fc_cont, save.data = TRUE)
cluster info:
size av_dist max_dist separation
1 5551 0.7159832 1 0.6766653
2 4577 0.7707523 1 0.7437616
3 2535 0.7482347 1 0.7038259
4 7337 0.7215583 1 0.6732479
no convergence after 200 iterations
sum of within cluster distances: 14693.00
11Aug08 userR! 08 - Porzak, Customer Segmentation 51
Run Plots
pop_av_dist <- with(cl@clusinfo, sum(size*av_dist)/sum(size))
main_txt <- paste("kcca ", c1@family@name, " - ",
num_clust, " clusters (",
nsamp, "k sample, seed = ", my_seed,
")", sep = "")
# Neighborhood Graph on 1st principle components
csb.pca <- prcomp(csb)
plot(cl, data = as.matrix(csb), project = csb.pca,
main = main_txt,
sub = paste("nAv Dist = ", format(pop_av_dist, digits = 5),
", k = ", c1@k, sep = "")
)
# Activity Profiles for each segment
print(barchart(cl, main = main_txt, strip.prefix = "#",
scales = list(cex = 0.6)))
11Aug08 userR! 08 - Porzak, Customer Segmentation 52
Plots (k=4, seed = 1)
11Aug08 userR! 08 - Porzak, Customer Segmentation 53
Plots (k=4, seed = 2)
11Aug08 userR! 08 - Porzak, Customer Segmentation 54
Plots (k=4, seed = 3)
11Aug08 userR! 08 - Porzak, Customer Segmentation 55
Are any of these any good?
● If so, which?
● How to decide?
● Quoting Fritz (pg 15):
The actual choice of expectation-based Jaccard
with K = 6 clusters ... has been made manually by
comparing various solutions and selecting the one
which made most sense from the practitioners
point of view. This may seem unsatisfying because
the decision is subjective, but cluster analysis here
is used as a tool for exploratory data analysis and
offers simplified views of a complex data set.
11Aug08 userR! 08 - Porzak, Customer Segmentation 56
Our Selection Criteria
1. Choice of k, must have mostly ~ stable
solutions, and
2. Cluster profiles must be interpretable. IOW,
what is the story you can tell about each
cluster? Will the marketers relate to it?
11Aug08 userR! 08 - Porzak, Customer Segmentation 57
Your Challenge...
Do what Fritz said:
The actual choice ... has been made manually by
comparing various solutions and selecting the one
which made most sense.
Here are 4 runs for each k = 3 to 8; 24 in all.
Pick the “best” one, make up stories for each cluster,
and explain your choice to group.
11Aug08 userR! 08 - Porzak, Customer Segmentation 58
For the Record. Jim's Pick:
11Aug08 userR! 08 - Porzak, Customer Segmentation 59
Jim's Stories
Based on knowing a bit more about the client
than I can share with you.
#1: An “S” loyalist, high % SA's
#2: Favors name brands, high responders
#3: A “T” loyalist, broad but reduced responses
#4: Favors name brands, but otherwise low resp.
#5: Student, gray box, open source, desktop.
11Aug08 userR! 08 - Porzak, Customer Segmentation 60
Finally, using predict in flexclust
Once we (analysts & marketers) have decided on
a clustering model, we want to use it to assign
new respondents to likely segment.
flexclust includes predict:
persona <- predict(cl, csb)
head(persona)
str(persona)
PersonaPredict <- as.data.frame(persona)
names(PersonaPredict) <- "cluster"
> table(PersonaPredict)
PersonaPredict
1 2 3 4 5
2313 6479 4654 2702 3852
11Aug08 userR! 08 - Porzak, Customer Segmentation 61
Closing the Loop –
Tying Back to Purchase Model
Where ppBand is probability of purchase band ( 0 = 0.0 – 0.999,
1 = 0.10 – 0.199, … 9 = 0.90 – 0.999). IOW, 0 is really low & 9 is
really high probability of purchase according to the model
11Aug08 userR! 08 - Porzak, Customer Segmentation 62
Conclusion
11Aug08 userR! 08 - Porzak, Customer Segmentation 63
Follow up
● Slides and code will be up next week on
http://www.porzak.com/JimArchive/useR2008/
● Ping me with questions or comments:
jporzak@gmail.com
● Check out the San Francisco useR Group:
ia.meetup.com/67/
Thanks!
11Aug08 userR! 08 - Porzak, Customer Segmentation 64
Appendix
11Aug08 userR! 08 - Porzak, Customer Segmentation 65
section
11Aug08 userR! 08 - Porzak, Customer Segmentation 66
Code slide
##

More Related Content

What's hot

What Is Customer Engagement?
What Is Customer Engagement?What Is Customer Engagement?
What Is Customer Engagement?
Braze (formerly Appboy)
 
Marketing Research Paper
Marketing Research PaperMarketing Research Paper
Marketing Research Paper
Daniel Forster
 
Beer game esristhi
Beer game esristhiBeer game esristhi
Beer game esristhivijay kumar
 
A STUDY ON CONSUMER SATISFACTION AT BIG BAZAAR
A STUDY ON CONSUMER SATISFACTION AT BIG BAZAARA STUDY ON CONSUMER SATISFACTION AT BIG BAZAAR
A STUDY ON CONSUMER SATISFACTION AT BIG BAZAAR
Shashi Kishore
 
DRIVING THE LOYALTY SALES PERCENTAGE OF RELIANCE FRESH STORE, VAISHALI NAGAR,...
DRIVING THE LOYALTY SALES PERCENTAGE OF RELIANCE FRESH STORE, VAISHALI NAGAR,...DRIVING THE LOYALTY SALES PERCENTAGE OF RELIANCE FRESH STORE, VAISHALI NAGAR,...
DRIVING THE LOYALTY SALES PERCENTAGE OF RELIANCE FRESH STORE, VAISHALI NAGAR,...BHOMA RAM
 
Financial service marketing by sahil
Financial service marketing by sahilFinancial service marketing by sahil
Financial service marketing by sahil
Sahil Khosla
 
What is Key Account Management?
What is Key Account Management?What is Key Account Management?
What is Key Account Management?
Pipeliner CRM
 
Tomo Wavelabs I-Corps@NIH 121014
Tomo Wavelabs I-Corps@NIH 121014Tomo Wavelabs I-Corps@NIH 121014
Tomo Wavelabs I-Corps@NIH 121014
Stanford University
 
Consumer perception of big bazaar
Consumer perception of big bazaarConsumer perception of big bazaar
Consumer perception of big bazaar
Shweta Germanwala
 
CRMSession 2 - Customer Focused Organisations
CRMSession 2 - Customer Focused OrganisationsCRMSession 2 - Customer Focused Organisations
CRMSession 2 - Customer Focused Organisations
Welingkar Institute of Management Development & Research (WeSchool)
 
Customer Segmentation
Customer SegmentationCustomer Segmentation
Customer SegmentationCarlos Soares
 
Selecting competitor to attack and avoid
Selecting competitor to attack and avoidSelecting competitor to attack and avoid
Selecting competitor to attack and avoid
Tribhuvan University
 
robinson faris wind b2b by ansh
robinson faris wind b2b  by anshrobinson faris wind b2b  by ansh
robinson faris wind b2b by ansh
Anshul Sharma
 
Customer segmentation
Customer segmentation Customer segmentation
Customer segmentation
CleverTap
 
4.6 customer lifetime value
4.6 customer lifetime value4.6 customer lifetime value
4.6 customer lifetime value
Rick Rasmussen
 
Business to Business Branding - Introduction to Branding
Business to Business Branding - Introduction to BrandingBusiness to Business Branding - Introduction to Branding
Business to Business Branding - Introduction to Branding
Rai University Ahmedabad
 
Chapter02
Chapter02Chapter02
Chapter02dvrs
 
04 marketing research report
04 marketing research report04 marketing research report
04 marketing research reportSelvakani Nadar
 
Competing with information technology
Competing with information technologyCompeting with information technology
Competing with information technologyAmrit Banstola
 
Distribution Channel Management
Distribution Channel ManagementDistribution Channel Management
Distribution Channel Management
Dr. Parveen Kaur Nagpal
 

What's hot (20)

What Is Customer Engagement?
What Is Customer Engagement?What Is Customer Engagement?
What Is Customer Engagement?
 
Marketing Research Paper
Marketing Research PaperMarketing Research Paper
Marketing Research Paper
 
Beer game esristhi
Beer game esristhiBeer game esristhi
Beer game esristhi
 
A STUDY ON CONSUMER SATISFACTION AT BIG BAZAAR
A STUDY ON CONSUMER SATISFACTION AT BIG BAZAARA STUDY ON CONSUMER SATISFACTION AT BIG BAZAAR
A STUDY ON CONSUMER SATISFACTION AT BIG BAZAAR
 
DRIVING THE LOYALTY SALES PERCENTAGE OF RELIANCE FRESH STORE, VAISHALI NAGAR,...
DRIVING THE LOYALTY SALES PERCENTAGE OF RELIANCE FRESH STORE, VAISHALI NAGAR,...DRIVING THE LOYALTY SALES PERCENTAGE OF RELIANCE FRESH STORE, VAISHALI NAGAR,...
DRIVING THE LOYALTY SALES PERCENTAGE OF RELIANCE FRESH STORE, VAISHALI NAGAR,...
 
Financial service marketing by sahil
Financial service marketing by sahilFinancial service marketing by sahil
Financial service marketing by sahil
 
What is Key Account Management?
What is Key Account Management?What is Key Account Management?
What is Key Account Management?
 
Tomo Wavelabs I-Corps@NIH 121014
Tomo Wavelabs I-Corps@NIH 121014Tomo Wavelabs I-Corps@NIH 121014
Tomo Wavelabs I-Corps@NIH 121014
 
Consumer perception of big bazaar
Consumer perception of big bazaarConsumer perception of big bazaar
Consumer perception of big bazaar
 
CRMSession 2 - Customer Focused Organisations
CRMSession 2 - Customer Focused OrganisationsCRMSession 2 - Customer Focused Organisations
CRMSession 2 - Customer Focused Organisations
 
Customer Segmentation
Customer SegmentationCustomer Segmentation
Customer Segmentation
 
Selecting competitor to attack and avoid
Selecting competitor to attack and avoidSelecting competitor to attack and avoid
Selecting competitor to attack and avoid
 
robinson faris wind b2b by ansh
robinson faris wind b2b  by anshrobinson faris wind b2b  by ansh
robinson faris wind b2b by ansh
 
Customer segmentation
Customer segmentation Customer segmentation
Customer segmentation
 
4.6 customer lifetime value
4.6 customer lifetime value4.6 customer lifetime value
4.6 customer lifetime value
 
Business to Business Branding - Introduction to Branding
Business to Business Branding - Introduction to BrandingBusiness to Business Branding - Introduction to Branding
Business to Business Branding - Introduction to Branding
 
Chapter02
Chapter02Chapter02
Chapter02
 
04 marketing research report
04 marketing research report04 marketing research report
04 marketing research report
 
Competing with information technology
Competing with information technologyCompeting with information technology
Competing with information technology
 
Distribution Channel Management
Distribution Channel ManagementDistribution Channel Management
Distribution Channel Management
 

Viewers also liked

Segmentation of Targeting
Segmentation of TargetingSegmentation of Targeting
Segmentation of Targeting
Kumar P
 
RFM Segmentation
RFM SegmentationRFM Segmentation
RFM Segmentation
Kamil Bartocha
 
RFM: A Cool Tool for Simple Analytics
RFM: A Cool Tool for Simple AnalyticsRFM: A Cool Tool for Simple Analytics
RFM: A Cool Tool for Simple Analytics
C.TRAC Inc.
 
Customer Segmentation Masterclass - IIR 2010
Customer Segmentation Masterclass - IIR 2010Customer Segmentation Masterclass - IIR 2010
Customer Segmentation Masterclass - IIR 2010
Vladimir Dimitroff
 
An Introduction to RFM in Analytics
An Introduction to RFM in AnalyticsAn Introduction to RFM in Analytics
An Introduction to RFM in Analytics
SAS Canada
 
How to Create a Customer Segmentation Model
How to Create a Customer Segmentation ModelHow to Create a Customer Segmentation Model
How to Create a Customer Segmentation Model
Mark Haubert
 
Customer segmentation
Customer segmentationCustomer segmentation
Customer segmentationweave Belgium
 
Customer Clustering For Retail Marketing
Customer Clustering For Retail MarketingCustomer Clustering For Retail Marketing
Customer Clustering For Retail Marketing
Jonathan Sedar
 
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin RSelf-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
shanelynn
 
Cluster analysis for market segmentation
Cluster analysis for market segmentationCluster analysis for market segmentation
Cluster analysis for market segmentation
Vishal Tandel
 
Rfm clustering analysis
Rfm clustering analysisRfm clustering analysis
Rfm clustering analysis
air india
 
Behavioral Segmentation in AppMetrica
Behavioral Segmentation in AppMetricaBehavioral Segmentation in AppMetrica
Behavioral Segmentation in AppMetrica
Alexander Lukin
 
RFM Model Conversion Week
RFM Model Conversion WeekRFM Model Conversion Week
RFM Model Conversion Week
Andra Baragan
 
R workshop xiv--Survival Analysis with R
R workshop xiv--Survival Analysis with RR workshop xiv--Survival Analysis with R
R workshop xiv--Survival Analysis with R
Vivian S. Zhang
 
Behavioral Segmentation Analytics
Behavioral Segmentation AnalyticsBehavioral Segmentation Analytics
Behavioral Segmentation Analytics
Lance Wills
 
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakLearn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
PyData
 
Important Information About Behavioral Segmentation Models
Important Information About Behavioral Segmentation ModelsImportant Information About Behavioral Segmentation Models
Important Information About Behavioral Segmentation Models
esavageus
 
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in R
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in RFinding Meaning in Points, Areas and Surfaces: Spatial Analysis in R
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in R
Revolution Analytics
 
Behavioral Segmentation Case Study
Behavioral Segmentation Case StudyBehavioral Segmentation Case Study
Behavioral Segmentation Case StudyJonathan Drews
 

Viewers also liked (20)

Segmentation of Targeting
Segmentation of TargetingSegmentation of Targeting
Segmentation of Targeting
 
RFM Segmentation
RFM SegmentationRFM Segmentation
RFM Segmentation
 
RFM: A Cool Tool for Simple Analytics
RFM: A Cool Tool for Simple AnalyticsRFM: A Cool Tool for Simple Analytics
RFM: A Cool Tool for Simple Analytics
 
Customer Segmentation Masterclass - IIR 2010
Customer Segmentation Masterclass - IIR 2010Customer Segmentation Masterclass - IIR 2010
Customer Segmentation Masterclass - IIR 2010
 
An Introduction to RFM in Analytics
An Introduction to RFM in AnalyticsAn Introduction to RFM in Analytics
An Introduction to RFM in Analytics
 
How to Create a Customer Segmentation Model
How to Create a Customer Segmentation ModelHow to Create a Customer Segmentation Model
How to Create a Customer Segmentation Model
 
Customer segmentation
Customer segmentationCustomer segmentation
Customer segmentation
 
Customer Clustering For Retail Marketing
Customer Clustering For Retail MarketingCustomer Clustering For Retail Marketing
Customer Clustering For Retail Marketing
 
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin RSelf-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
 
Cluster analysis for market segmentation
Cluster analysis for market segmentationCluster analysis for market segmentation
Cluster analysis for market segmentation
 
RFM technical_brochure
RFM technical_brochureRFM technical_brochure
RFM technical_brochure
 
Rfm clustering analysis
Rfm clustering analysisRfm clustering analysis
Rfm clustering analysis
 
Behavioral Segmentation in AppMetrica
Behavioral Segmentation in AppMetricaBehavioral Segmentation in AppMetrica
Behavioral Segmentation in AppMetrica
 
RFM Model Conversion Week
RFM Model Conversion WeekRFM Model Conversion Week
RFM Model Conversion Week
 
R workshop xiv--Survival Analysis with R
R workshop xiv--Survival Analysis with RR workshop xiv--Survival Analysis with R
R workshop xiv--Survival Analysis with R
 
Behavioral Segmentation Analytics
Behavioral Segmentation AnalyticsBehavioral Segmentation Analytics
Behavioral Segmentation Analytics
 
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakLearn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
 
Important Information About Behavioral Segmentation Models
Important Information About Behavioral Segmentation ModelsImportant Information About Behavioral Segmentation Models
Important Information About Behavioral Segmentation Models
 
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in R
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in RFinding Meaning in Points, Areas and Surfaces: Spatial Analysis in R
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in R
 
Behavioral Segmentation Case Study
Behavioral Segmentation Case StudyBehavioral Segmentation Case Study
Behavioral Segmentation Case Study
 

Similar to Using R for customer segmentation

Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Pentaho
 
Analytics Proposal
Analytics ProposalAnalytics Proposal
Analytics Proposal
Parag Tikekar
 
[Big] Data For Marketers: Targeting the Right Market
[Big] Data For Marketers: Targeting the Right Market[Big] Data For Marketers: Targeting the Right Market
[Big] Data For Marketers: Targeting the Right Market
Panji Winata
 
Day 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business AnalyticsDay 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business Analytics
Aseda Owusua Addai-Deseh
 
SC for Startups
SC for StartupsSC for Startups
SC for Startups
Yahia Arafa
 
Webinar: Making A Single View of the Customer Real with MongoDB
Webinar: Making A Single View of the Customer Real with MongoDBWebinar: Making A Single View of the Customer Real with MongoDB
Webinar: Making A Single View of the Customer Real with MongoDB
MongoDB
 
8 Quick Market Sizing Approaches
8 Quick Market Sizing Approaches8 Quick Market Sizing Approaches
8 Quick Market Sizing ApproachesOpenView
 
How to Size a Market Opportunity — Fast
How to Size a Market Opportunity — FastHow to Size a Market Opportunity — Fast
How to Size a Market Opportunity — Fast
OpenView
 
Nastel AutoPilot Proactive Application Analytics
Nastel AutoPilot Proactive Application AnalyticsNastel AutoPilot Proactive Application Analytics
Nastel AutoPilot Proactive Application Analytics
jKool
 
Dataware case study
Dataware case study Dataware case study
Dataware case study
Zeeshan Ahmed
 
Mathematical Model For Customer Life Time Based Offer Management
Mathematical Model For Customer Life Time Based Offer ManagementMathematical Model For Customer Life Time Based Offer Management
Mathematical Model For Customer Life Time Based Offer Management
Zehra Kendir
 
Machine learning for customer classification
Machine learning for customer classificationMachine learning for customer classification
Machine learning for customer classification
Andrew Barnes
 
Customer Life Time Value
Customer Life Time ValueCustomer Life Time Value
Customer Life Time Value
Global Analytics Partners (GAP)
 
EXTENT-2015: Prognoz Market Surveillance
EXTENT-2015: Prognoz  Market SurveillanceEXTENT-2015: Prognoz  Market Surveillance
EXTENT-2015: Prognoz Market Surveillance
Iosif Itkin
 
Leveragin research, behavioural and demeographic data
Leveragin research, behavioural and demeographic dataLeveragin research, behavioural and demeographic data
Leveragin research, behavioural and demeographic data
MRS
 
1030 track2 komp
1030 track2 komp1030 track2 komp
1030 track2 komp
Rising Media, Inc.
 
Data science role in business
Data science role in businessData science role in business
Data science role in business
Sergey Sviridenko
 
Seagate
SeagateSeagate
I Know First Presentation (May 2016)
I Know First Presentation (May 2016)I Know First Presentation (May 2016)
I Know First Presentation (May 2016)
I Know First: Daily Market Forecast
 

Similar to Using R for customer segmentation (20)

Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
 
Analytics Proposal
Analytics ProposalAnalytics Proposal
Analytics Proposal
 
[Big] Data For Marketers: Targeting the Right Market
[Big] Data For Marketers: Targeting the Right Market[Big] Data For Marketers: Targeting the Right Market
[Big] Data For Marketers: Targeting the Right Market
 
Day 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business AnalyticsDay 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business Analytics
 
Data Warehouse-Final
Data Warehouse-FinalData Warehouse-Final
Data Warehouse-Final
 
SC for Startups
SC for StartupsSC for Startups
SC for Startups
 
Webinar: Making A Single View of the Customer Real with MongoDB
Webinar: Making A Single View of the Customer Real with MongoDBWebinar: Making A Single View of the Customer Real with MongoDB
Webinar: Making A Single View of the Customer Real with MongoDB
 
8 Quick Market Sizing Approaches
8 Quick Market Sizing Approaches8 Quick Market Sizing Approaches
8 Quick Market Sizing Approaches
 
How to Size a Market Opportunity — Fast
How to Size a Market Opportunity — FastHow to Size a Market Opportunity — Fast
How to Size a Market Opportunity — Fast
 
Nastel AutoPilot Proactive Application Analytics
Nastel AutoPilot Proactive Application AnalyticsNastel AutoPilot Proactive Application Analytics
Nastel AutoPilot Proactive Application Analytics
 
Dataware case study
Dataware case study Dataware case study
Dataware case study
 
Mathematical Model For Customer Life Time Based Offer Management
Mathematical Model For Customer Life Time Based Offer ManagementMathematical Model For Customer Life Time Based Offer Management
Mathematical Model For Customer Life Time Based Offer Management
 
Machine learning for customer classification
Machine learning for customer classificationMachine learning for customer classification
Machine learning for customer classification
 
Customer Life Time Value
Customer Life Time ValueCustomer Life Time Value
Customer Life Time Value
 
EXTENT-2015: Prognoz Market Surveillance
EXTENT-2015: Prognoz  Market SurveillanceEXTENT-2015: Prognoz  Market Surveillance
EXTENT-2015: Prognoz Market Surveillance
 
Leveragin research, behavioural and demeographic data
Leveragin research, behavioural and demeographic dataLeveragin research, behavioural and demeographic data
Leveragin research, behavioural and demeographic data
 
1030 track2 komp
1030 track2 komp1030 track2 komp
1030 track2 komp
 
Data science role in business
Data science role in businessData science role in business
Data science role in business
 
Seagate
SeagateSeagate
Seagate
 
I Know First Presentation (May 2016)
I Know First Presentation (May 2016)I Know First Presentation (May 2016)
I Know First Presentation (May 2016)
 

Recently uploaded

SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 

Recently uploaded (20)

SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 

Using R for customer segmentation

  • 1. Using R for Customer Segmentation useR! 2008 Dortmund, Germany August, 2008 Jim Porzak, Senior Director of Analytics Responsys, Inc. San Francisco, California
  • 2. 11Aug08 userR! 08 - Porzak, Customer Segmentation 2 Outline ● Two main case study examples – Customer purchase behavior data ● Goal: actionable segments to improve LTV of customer base – Prospect intent & interest survey data ● Goal: actionable segments to better target messaging content and tactics ● Real data from real clients (sanitized) ● Workshop format – Hands on – Discussion heavy
  • 3. 11Aug08 userR! 08 - Porzak, Customer Segmentation 3 Introduction
  • 4. 11Aug08 userR! 08 - Porzak, Customer Segmentation 4 Why Segment? ● Better communication with customers and prospects – Recipient should feel that we understand him or her as an individual – “Send the right message to the right person at the right time” ● Challenges: – Widely applicable ● General rules based on readily available data ● A new contact can be placed in their segment easily – Usable ● Marketing can relate ● Technology can deliver
  • 5. 11Aug08 userR! 08 - Porzak, Customer Segmentation 5 Segmentation in Practice
  • 6. 11Aug08 userR! 08 - Porzak, Customer Segmentation 6 Behavioral Segmentation
  • 7. 11Aug08 userR! 08 - Porzak, Customer Segmentation 7 What's Behavioral Segmentation? ● Based on what people actually do – Not on what that say they do ● Purchase behavior – Discuss examples... ● Usage behavior – Discuss examples...
  • 8. 11Aug08 userR! 08 - Porzak, Customer Segmentation 8 Why do Behavioral Segmentation? ● All comes down to interacting with your customer or prospect in the appropriate way – From customers perspective, not yours! ● Ideally a “one-to-one” interaction – Not practical in today's world – Goal: perceived by customer as “one-to-one”
  • 9. 11Aug08 userR! 08 - Porzak, Customer Segmentation 9 Today's Purchase Behavior Data Set ● Actual web & phone sales records (sanitized) – 541k order detail lines – 135k Customers – Over 2 ½ years – Of ~900 different products – In 5 product categories ● Conventional wisdom – Strong seasonality – Have a loyal customer base – But, have retention problem
  • 10. 11Aug08 userR! 08 - Porzak, Customer Segmentation 10 What we know Date: 10/10/07 Order #: 12345 Customer: 3894832 Sue Smith 1 Short Street Qty SKU Description Unit Price Ext Price 1 123 1.50 1.50 3 345 White Widget 2.00 6.00 Total 7.50 Tax 0.60 Shipping 2.00 Grand Total 10.10 Smallville, ND, 39248 Green Gizzmo Imagine a customer order form: We get the highlighted data. Plus: order channel and product (SKU) category
  • 11. 11Aug08 userR! 08 - Porzak, Customer Segmentation 11 Preloaded as “orders” data frame > load("BehavioralDataSet.Rda") > str(orders) 'data.frame': 541101 obs. of 9 variables: $ SKU_ID : int 459 459 459 459 459 459 459 459 459 459 ... $ ChannelID: int 3 4 3 3 3 3 4 3 3 3 ... $ CustID : int 134945 212174 39861 11227 137271 60982 ... $ OrderID : int 326324 109305 172669 132642 20449 40826 ... $ OrderDate:Class 'Date' num [1:541101] 13211 13649 13670 ... $ Quantity : int 1 2 1 3 1 1 1 1 1 1 ... $ Amount : num 18 36 18 54 18 18 18 18 18 18 ... $ Channel : Factor w/ 4 levels "phone1","phone2",..: 3 4 3 3 ... $ Category : Factor w/ 7 levels "*","C","G","I",..: 3 3 3 3 3 ...
  • 12. 11Aug08 userR! 08 - Porzak, Customer Segmentation 12 orders summary > summary(orders[-(1:2)]) CustID OrderID OrderDate Quantity Min. : 2 Min. : 2 Min. :2005-09-01 Min. : 0.000 1st Qu.: 62221 1st Qu.:105292 1st Qu.:2006-07-18 1st Qu.: 1.000 Median :124343 Median :210908 Median :2007-02-14 Median : 1.000 Mean :152974 Mean :207535 Mean :2007-03-11 Mean : 1.113 3rd Qu.:185119 3rd Qu.:315711 3rd Qu.:2007-12-04 3rd Qu.: 1.000 Max. :506929 Max. :388319 Max. :2008-07-14 Max. :275.000 NA's : 4 Amount Channel Category Min. : 0.01 phone1: 14303 *: 0 1st Qu.: 20.00 phone2: 90 C:142147 Median : 30.00 web1 :451354 G:114300 Mean : 31.81 web2 : 75354 I: 14961 3rd Qu.: 35.00 N: 50385 Max. :4577.00 T:199354 X: 19954
  • 13. 11Aug08 userR! 08 - Porzak, Customer Segmentation 13 Goal of this exercise? ● Marketers need to come up with a communication strategy & associated tactics which will entice customers to exhibit higher LTV – Long Term Value. ● Segment by past purchase behavior to provide actionable subsets of customers – When marketers use our subsets, they get measurably better results than previous “one size fits all” method.
  • 14. 11Aug08 userR! 08 - Porzak, Customer Segmentation 14 How are we going to do this? (Discussion)
  • 15. 11Aug08 userR! 08 - Porzak, Customer Segmentation 15 Hints ● Live Stage ● Value ● Engagement ● Favorite Products ● Timing
  • 16. 11Aug08 userR! 08 - Porzak, Customer Segmentation 16 Recency, Frequency, & Monetary Metrics ● Recency – How long ago was last purchase? (days) – Measured for “As Of Date” of data set ● Frequency – How many orders in analysis period (2 ½ years) – Attempting to measure engagement ● Monetary – What is total $ value of all orders in analysis period Question: Do you expect these three to be uncorrelated?
  • 17. 11Aug08 userR! 08 - Porzak, Customer Segmentation 17 An Aside: Classical RFM ● Invented by direct marketers in 1950's as a way to model response rates (before good stat software was readily available) ● One typical method – R, F, & M each scored in quantile (typically 5) – Combined score for each recipient was concatenation of the three digits, eg “351” – Scores ranked by empirical response rate – Mailing then done to top xx% of list ● Today we use, lm, glm, randomForest, ... ● But, concepts still valid as conceptional model ● And, R & F measures typically very important in any predictive model
  • 18. 11Aug08 userR! 08 - Porzak, Customer Segmentation 18 I also typically include... ● Breadth – How many different SKUs purchased? ● Tenure – How long as customer been with us?
  • 19. 11Aug08 userR! 08 - Porzak, Customer Segmentation 19 Next Step – Aggregate by Customer ● We need some “raw” RFM values ● Make the data frame “RFM_raw” – CustomerID: the business key back to database – FirstPurchaseDate: interesting for tenure metric – LastPurchaseDate: basis of Recency – NumberOrders: basis of Frequency – NumberSKUs: basis of Breadth (engagement metric) – TotalAmount: basis of Monetary ● Also calculate – AsOfDate <- max(LastPurchaseDate)
  • 20. 11Aug08 userR! 08 - Porzak, Customer Segmentation 20 Building the RFM_raw data frame ## for performance, make OrderDate an integer during aggregation orders_n <- orders orders_n$OrderDate <- as.integer(orders_n$OrderDate) ## build up one column at a time RFM_raw <- with(orders_n, data.frame(CustomerID = sort(unique(CustID)))) RFM_raw <- cbind(RFM_raw, FirstPurchaseDate = with(orders_n, as.Date(as.integer(by(OrderDate, CustID, min)), "1970-01-01"))) RFM_raw <- cbind(RFM_raw, LastPurchaseDate = with(orders_n, as.Date(as.integer(by(OrderDate, CustID, max)), "1970-01-01"))) RFM_raw <- cbind(RFM_raw, NumberOrders = with(orders_n, as.numeric(by(OrderID, CustID, function(x) length(unique(x)))))) RFM_raw <- cbind(RFM_raw, NumberSKUs = with(orders_n, as.numeric(by(SKU_ID, CustID, function(x) length(unique(x)))))) RFM_raw <- cbind(RFM_raw, TotalAmount = with(orders_n, as.numeric(by(Amount, CustID, sum)))) AsOfDate <- max(RFM_raw$LastPurchaseDate) save(RFM_raw, AsOfDate, file = "RFM_raw.Rda") This take a while (1 ½ minutes on my laptop). You may want to download RFM_raw.Rda
  • 21. 11Aug08 userR! 08 - Porzak, Customer Segmentation 21 Do some RMF EDA ## Jim's miscellaneous DMA functions source("dma_misc.R") ## for interactive games: attach(RFM_raw) ## EDA plots using base graphics rfm.plot(as.numeric(AsOfDate - LastPurchaseDate) %/% 7, "rec") rfm.plot(NumberOrders, "freq") rfm.plot(TotalAmount, "mon") rfm.plot(NumberSKUs, "breadth") ## EDA plots using iPlots ihist(as.numeric(AsOfDate - LastPurchaseDate) %/% 7, title = "Recency") ihist(NumberOrders, title = "Frequency") ihist(TotalAmount, title = "Monetary") ihist(NumberSKUs, title = "Breadth")
  • 22. 11Aug08 userR! 08 - Porzak, Customer Segmentation 22 RFM EDA Plots In all cases, “best is left.”
  • 23. 11Aug08 userR! 08 - Porzak, Customer Segmentation 23 Assign reasonable RFM breaks ● Recency: – Breaks (weeks <=): 25, 51, 77, 103, <else> – levels = c("0-5", "6-11", "12-17", "18-23", "24-29")) ● Note levels labeled in months, not weeks ● Frequency: – Breaks (count <=): 1, 3, 7, <else> – levels = c("8+", "7-4", "3-2", "1")) ● Note ordering for best is left. ● Monetary: – Breaks (value <=): 50, 100, 200, 400, <else> – levels = c("401+", "400-201", "200-101", "100-51", "50-0")) ● Again ordering is best is left.
  • 24. 11Aug08 userR! 08 - Porzak, Customer Segmentation 24 Build RFM_segs data frame RFM_segs <- data.frame(Recency_weeks = as.numeric(AsOfDate - RFM_raw$LastPurchaseDate) %/% 7) row.names(RFM_segs) <- row.names(RFM_raw) ## now label levels with months rather than weeks RFM_segs$Recency <- ordered(ifelse(RFM_segs$Recency_weeks <= 25, "0-5", ifelse(RFM_segs$Recency_weeks <= 51, "6-11", ifelse(RFM_segs$Recency_weeks <= 77, "12-17", ifelse(RFM_segs$Recency_weeks <= 103, "18-23", "24-29")))), levels = c("0-5", "6-11", "12-17", "18-23", "24-29")) RFM_segs$Frequency_count <- RFM_raw$NumberOrders RFM_segs$Frequency <- ordered(ifelse(RFM_segs$Frequency_count == 1, "1", ifelse(RFM_segs$Frequency_count <= 3, "3-2", ifelse(RFM_segs$Frequency_count <= 7, "7-4", "8+"))), levels = c("8+", "7-4", "3-2", "1")) RFM_segs$Monetary_value <- RFM_raw$TotalAmount RFM_segs$Monetary <- ordered(ifelse(RFM_segs$Monetary_value <= 50, "50-0", ifelse(RFM_segs$Monetary_value <= 100, "100-51", ifelse(RFM_segs$Monetary_value <= 200, "200-101", ifelse(RFM_segs$Monetary_value <= 400, "400-201", "401+")))), levels = c("401+", "400-201", "200-101", "100-51", "50-0"))
  • 25. 11Aug08 userR! 08 - Porzak, Customer Segmentation 25 We typically also add Breadth & Tenure: RFM_segs$Breadth_count <- RFM_raw$NumberSKUs RFM_segs$Breadth <- ordered(ifelse(RFM_segs$Breadth_count == 1, "1", ifelse(RFM_segs$Breadth_count == 2, "2", ifelse(RFM_segs$Breadth_count <= 4, "4-3", ifelse(RFM_segs$Breadth_count <= 9, "9-5", "10+")))), levels = c("10+", "9-5", "4-3", "2", "1")) RFM_segs$Tenure_weeks <- as.numeric(AsOfDate - FirstPurchaseDate) %/% 7 RFM_segs$Tenure <- ordered(ifelse(RFM_segs$Tenure_weeks <= 12, "0-12", ifelse(RFM_segs$Tenure_weeks <= 25, "13-25", ifelse(RFM_segs$Tenure_weeks <= 38, "26-38", ifelse(RFM_segs$Tenure_weeks <= 51, "39-51", ifelse(RFM_segs$Tenure_weeks <= 64, "52-64", ifelse(RFM_segs$Tenure_weeks <= 77, "65-77", ifelse(RFM_segs$Tenure_weeks <= 90, "78-90", ifelse(RFM_segs$Tenure_weeks <= 103, "91-103", "104+")))))))), levels = c("104+", "91-103", "78-90", "65-77", "52-64", "39-51", "26-38", "13-25", "0-12")) save(RFM_segs, file = "RFM_segs.Rda")
  • 26. 11Aug08 userR! 08 - Porzak, Customer Segmentation 26 How do customers look in RFM space? ● I like mosaic plots (& especially vcd* package!) ● Set up a “structure table” with assignments: ● And a convenience function for mosaic: require(vcd) RFM_st <- structable(~ Recency + Frequency + Monetary + Breadth, data = RFM_segs) mm <- function(f) { mosaic(f, data = RFM_st, shade = TRUE, labeling_args = list(rot_labels = c(left = 90, top = 45), just_labels = c(left = "left", top = "center")), spacing = spacing_dimequal(unit(c(0.5, 0.8), "lines")), keep_aspect_ratio = FALSE ) } * To learn more, attend: The strucplot framework for Visualizing Categorical Data. Wed, 11:30. E29
  • 27. 11Aug08 userR! 08 - Porzak, Customer Segmentation 27 mm(~ Recency + Frequency)
  • 28. 11Aug08 userR! 08 - Porzak, Customer Segmentation 28 mm(~ Frequency + Monetary)
  • 29. 11Aug08 userR! 08 - Porzak, Customer Segmentation 29 mm(~ Recency + Monetary)
  • 30. 11Aug08 userR! 08 - Porzak, Customer Segmentation 30 mm(~ Breadth + Monetary)
  • 31. 11Aug08 userR! 08 - Porzak, Customer Segmentation 31 To really show off vcd! pairs(RFM_st, lower_panel = pairs_assoc, shade = TRUE)
  • 32. 11Aug08 userR! 08 - Porzak, Customer Segmentation 32 Time to get real – remember goal?
  • 33. 11Aug08 userR! 08 - Porzak, Customer Segmentation 33 Actionable for Marketers The big two concepts: 1. Lifestage 2. Value Turns out we can do both with Recency & Frequency!
  • 34. 11Aug08 userR! 08 - Porzak, Customer Segmentation 34 Use Balloon Plots to Communicate require(gplots) # Recency by Frequence - Counts RxF <- as.data.frame(table(RFM_segs$Recency, RFM_segs$Frequency, dnn = c("Recency", "Frequency")), responseName = "Number_Customers") with(RxF, balloonplot(Recency, Frequency, Number_Customers, zlab = "# Customers")) # Recency by Frequency - Annual Value (total annual sales to segment) VbyRxF <- (aggregate(RFM_segs$Monetary_value, by = list(Recency = factor(RFM_segs$Recency), Frequency = RFM_segs$Frequency), sum)) names(VbyRxF)[3] <- "Annual_Sales" VbyRxF$Annual_Sales <- VbyRxF$Annual_Sales / (28/12) ## normalize to annual revnue with(VbyRxF, balloonplot(Recency, Frequency, Annual_Sales / 1000, zlab = "Annual Sales (000)"))
  • 35. 11Aug08 userR! 08 - Porzak, Customer Segmentation 35 Recency by Frequency - Counts
  • 36. 11Aug08 userR! 08 - Porzak, Customer Segmentation 36 Recency by Frequency - Value
  • 37. 11Aug08 userR! 08 - Porzak, Customer Segmentation 37 Exercise – Assign Segments ● Lifestage “dimension” – New – Active – Lapsed – Lost ● Value “dimension” – Gold – Silver – Bronze ● Combined as – High Value, Repeat, New, One-time, Lapsed, & Lost
  • 38. 11Aug08 userR! 08 - Porzak, Customer Segmentation 38 Color & Label Segment Cells # a matrix of segment codes RF_segs0 <- matrix("", nrow = 4, ncol = 5) # manually make assignments object.browser() ## Fill in H, R, N, L, or O. Save as RF_segs.txt # get back into R RF_segs <- as.matrix(read.delim("RF_segs.txt", sep = "t", na.strings = "")) RF_segs[is.na(RF_segs)] <- "X" ## N/A's become “Lost” # add colors and labels to balloon plot # Magic values for balloon cell centers RF_x <- matrix(2:6 + 0.25, nrow = 4, ncol = 5, byrow = TRUE) RF_y <- matrix(4:1, nrow = 4, ncol = 5, byrow = FALSE) RF_cols <- sapply(RF_segs, function(x) switch(x, H="gold", R="slategray2", N="green", L="yellow", O="darkgreen", "red")) points(RF_x, RF_y, col = RF_cols, pch = 16, cex = 12) text(RF_x, RF_y, RF_segs, cex = 2)
  • 39. 11Aug08 userR! 08 - Porzak, Customer Segmentation 39 Final Segments for Marketers
  • 40. 11Aug08 userR! 08 - Porzak, Customer Segmentation 40 Conceptual RF Segments
  • 41. 11Aug08 userR! 08 - Porzak, Customer Segmentation 41 Break Time!
  • 42. 11Aug08 userR! 08 - Porzak, Customer Segmentation 42 Attitudinal Segmentation
  • 43. 11Aug08 userR! 08 - Porzak, Customer Segmentation 43 Marketing Challenge ● Our client offers free download of software with high perceived value, but ● First asks user to fill out a simple survey ● Challenge is to come up with a “few” segments that will be used by segment to: – Prioritize contact strategy – Craft marketing messages based on profile
  • 44. 11Aug08 userR! 08 - Porzak, Customer Segmentation 44 Sample Data ● Surveys from 20k respondents ● All within same time frame (a number of weeks) ● All requested the software download
  • 45. 11Aug08 userR! 08 - Porzak, Customer Segmentation 45 Survey Description ● 35 check boxes or radio buttons – None required. Coded as binary responses ● Arranged in 5 sections – License: W and/or X – Role: one of D, SA, ITM, ITA, Str, Oth (radio buttons) – System: any of S, T, A, B, C, D, O (check boxes) – Interest: any of M, O Pl, Pr, Sup, 64, Con, Per, DT, Z, Oth. (check boxes) – Application: any of Web, Inf, Col, Db, J2, Top, Dev, Per, Other (check boxes)
  • 46. 11Aug08 userR! 08 - Porzak, Customer Segmentation 46 Data Set Provided as data frame csb, in InterestPreferenceSurvey.Rda # Getting started setwd("C:/Data/useR08/R") require(lattice) require(grDevices) require(vcd) require(flexclust) load(file = "InterestPreferenceSurvey.Rda") str(csb) 'data.frame': 20000 obs. of 35 variables: $ Lic_W : int 0 0 0 0 0 0 0 0 0 0 ... $ Lic_X : int 1 1 1 0 1 1 1 1 1 1 ... $ Role_D : int 0 0 0 0 0 0 0 0 1 0 ... $ Role_SA : int 0 0 1 0 1 0 0 1 0 0 ... $ Role_ITM: int 0 0 0 1 0 0 0 0 0 0 ... $ Role_ITA: int 0 0 0 0 0 0 0 0 0 0 ...
  • 47. 11Aug08 userR! 08 - Porzak, Customer Segmentation 47 Proportion Responders by Question > mean(csb) Lic_W Lic_X Role_D Role_SA Role_ITM Role_ITA Role_Stu Role_Oth 0.16040 0.90980 0.19905 0.32910 0.06905 0.08465 0.21080 0.05090 Sys_S Sys_T Sys_A Sys_B Sys_C Sys_D Sys_O Int_M 0.17780 0.39720 0.17020 0.13975 0.09325 0.03510 0.19260 0.36960 Int_O Int_Pl Int_Pr Int_Sup Int_64 Int_Con Int_Per Int_DT 0.46810 0.09395 0.10055 0.08985 0.23445 0.21235 0.31420 0.11790 Int_Z Int_Oth Ap_Web Ap_Inf Ap_Col Ap_Db Ap_J2 Ap_Top 0.23450 0.05995 0.39640 0.19125 0.18365 0.30125 0.19455 0.30145 Ap_Dev Ap_Per Ap_Other 0.18960 0.20050 0.03735
  • 48. 11Aug08 userR! 08 - Porzak, Customer Segmentation 48 Clustering Strategy ● flexclust package by Fritz Leisch ● See his 2006 paper (on his personal page): A Toolbox for K-Centroids Cluster Analysis ● This is (mostly) an optional response type survey – 1 = “yes” is significant – 0 is just absence not really a “no” – Respondents checking Role_SA have much more in common than those not checking Role_SA ● Following Fritz's argument we use the expectation based Jaccard distance measure.
  • 49. 11Aug08 userR! 08 - Porzak, Customer Segmentation 49 A First Cluster Run require(flexclust) ## set up flexclust control object fc_cont <- new("flexclustControl") fc_cont@tolerance <- 0.1 ## this doesn't seem to work as expected fc_cont@iter.max <- 30 ## seems to be effective convergence ##fc_cont@verbose <- 1 ## set TRUE if to see each step my_seed <- 0 my_family <- "ejaccard" num_clust <- 4 my_seed <- my_seed + 1 set.seed(my_seed) cl <- kcca(csb, k = num_clust, save.data = TRUE, control = fc_cont, family = kccaFamily(my_family)) ## This takes ~ 1.5 min. on my laptop
  • 50. 11Aug08 userR! 08 - Porzak, Customer Segmentation 50 Cluster Summary > summary(cl) kcca object of family 'ejaccard' call: kcca(x = csb, k = num_clust, family = kccaFamily(my_family), control = fc_cont, save.data = TRUE) cluster info: size av_dist max_dist separation 1 5551 0.7159832 1 0.6766653 2 4577 0.7707523 1 0.7437616 3 2535 0.7482347 1 0.7038259 4 7337 0.7215583 1 0.6732479 no convergence after 200 iterations sum of within cluster distances: 14693.00
  • 51. 11Aug08 userR! 08 - Porzak, Customer Segmentation 51 Run Plots pop_av_dist <- with(cl@clusinfo, sum(size*av_dist)/sum(size)) main_txt <- paste("kcca ", c1@family@name, " - ", num_clust, " clusters (", nsamp, "k sample, seed = ", my_seed, ")", sep = "") # Neighborhood Graph on 1st principle components csb.pca <- prcomp(csb) plot(cl, data = as.matrix(csb), project = csb.pca, main = main_txt, sub = paste("nAv Dist = ", format(pop_av_dist, digits = 5), ", k = ", c1@k, sep = "") ) # Activity Profiles for each segment print(barchart(cl, main = main_txt, strip.prefix = "#", scales = list(cex = 0.6)))
  • 52. 11Aug08 userR! 08 - Porzak, Customer Segmentation 52 Plots (k=4, seed = 1)
  • 53. 11Aug08 userR! 08 - Porzak, Customer Segmentation 53 Plots (k=4, seed = 2)
  • 54. 11Aug08 userR! 08 - Porzak, Customer Segmentation 54 Plots (k=4, seed = 3)
  • 55. 11Aug08 userR! 08 - Porzak, Customer Segmentation 55 Are any of these any good? ● If so, which? ● How to decide? ● Quoting Fritz (pg 15): The actual choice of expectation-based Jaccard with K = 6 clusters ... has been made manually by comparing various solutions and selecting the one which made most sense from the practitioners point of view. This may seem unsatisfying because the decision is subjective, but cluster analysis here is used as a tool for exploratory data analysis and offers simplified views of a complex data set.
  • 56. 11Aug08 userR! 08 - Porzak, Customer Segmentation 56 Our Selection Criteria 1. Choice of k, must have mostly ~ stable solutions, and 2. Cluster profiles must be interpretable. IOW, what is the story you can tell about each cluster? Will the marketers relate to it?
  • 57. 11Aug08 userR! 08 - Porzak, Customer Segmentation 57 Your Challenge... Do what Fritz said: The actual choice ... has been made manually by comparing various solutions and selecting the one which made most sense. Here are 4 runs for each k = 3 to 8; 24 in all. Pick the “best” one, make up stories for each cluster, and explain your choice to group.
  • 58. 11Aug08 userR! 08 - Porzak, Customer Segmentation 58 For the Record. Jim's Pick:
  • 59. 11Aug08 userR! 08 - Porzak, Customer Segmentation 59 Jim's Stories Based on knowing a bit more about the client than I can share with you. #1: An “S” loyalist, high % SA's #2: Favors name brands, high responders #3: A “T” loyalist, broad but reduced responses #4: Favors name brands, but otherwise low resp. #5: Student, gray box, open source, desktop.
  • 60. 11Aug08 userR! 08 - Porzak, Customer Segmentation 60 Finally, using predict in flexclust Once we (analysts & marketers) have decided on a clustering model, we want to use it to assign new respondents to likely segment. flexclust includes predict: persona <- predict(cl, csb) head(persona) str(persona) PersonaPredict <- as.data.frame(persona) names(PersonaPredict) <- "cluster" > table(PersonaPredict) PersonaPredict 1 2 3 4 5 2313 6479 4654 2702 3852
  • 61. 11Aug08 userR! 08 - Porzak, Customer Segmentation 61 Closing the Loop – Tying Back to Purchase Model Where ppBand is probability of purchase band ( 0 = 0.0 – 0.999, 1 = 0.10 – 0.199, … 9 = 0.90 – 0.999). IOW, 0 is really low & 9 is really high probability of purchase according to the model
  • 62. 11Aug08 userR! 08 - Porzak, Customer Segmentation 62 Conclusion
  • 63. 11Aug08 userR! 08 - Porzak, Customer Segmentation 63 Follow up ● Slides and code will be up next week on http://www.porzak.com/JimArchive/useR2008/ ● Ping me with questions or comments: jporzak@gmail.com ● Check out the San Francisco useR Group: ia.meetup.com/67/ Thanks!
  • 64. 11Aug08 userR! 08 - Porzak, Customer Segmentation 64 Appendix
  • 65. 11Aug08 userR! 08 - Porzak, Customer Segmentation 65 section
  • 66. 11Aug08 userR! 08 - Porzak, Customer Segmentation 66 Code slide ##