This document discusses using R for customer segmentation. It outlines using purchase behavior data and survey response data to create actionable customer segments. The goal is to improve customer lifetime value by sending targeted messages. The document demonstrates building RFM (recency, frequency, monetary) metrics and segments from purchase data, including data aggregation, metric calculation, and segment assignment. Visualizations of customers in the RFM space are shown to understand segment distributions.
Customer Segmentation with R - Deep Dive into flexclustJim Porzak
Jim Porzak's presentation at useR! 2015 in Aalborg, Denmark. Learn on how to segment customers based on stated interest surveys using the flexclust package in R. Covers basic customer segmentation concepts, introduction to flexclust, and solutions to three practical issues: the numbering problem, the stability problem, and the best choice for k.
Accountable Care Organizations (ACOs) are organizations of health care providers who provide care to a group of patients. Created in an attempt to decrease the cost of service delivery and increase efficiency, value and profit, these organizations are new territory for the CPA professional. This presentation was given to the Michigan Association of Certified Public Accountants at their Healthcare Conference on April 23, 2013.
Join the author of “The Wallet Allocation Rule” as he dives into the research and analytics behind winning more share of wallet. In this session, Luke Williams explores the wallet allocation rule, and how changing methods or measurement and analysis, to more actively predict change, can be a sustainable competitive advantage.
Go-to-market strategy, positioning and viral growthLisa Enckell
Presentation during entrepreneurship week at Luleå University, Sweden on go to market strategies, positioning and viral growth. Examples and learnings from Wrapp, Clue and other startups.
GREEN MARKETING - AN ANALYSIS OF CONSUMER BEHAVIOUR TOWARDS GREEN PRODUCTSTanushree Bhowmick
This paper was presented in the Ist International Conference on Business & Information Management (ICBIM), 2012, organized by NIT, Durgapur. This is basically a research paper aiming to contribute towards the growing ecological concern that most marketers want to address these days. It also tries to unveil consumer behaviour towards the purchase and consumption of eco-friendly products.
Customer Segmentation with R - Deep Dive into flexclustJim Porzak
Jim Porzak's presentation at useR! 2015 in Aalborg, Denmark. Learn on how to segment customers based on stated interest surveys using the flexclust package in R. Covers basic customer segmentation concepts, introduction to flexclust, and solutions to three practical issues: the numbering problem, the stability problem, and the best choice for k.
Accountable Care Organizations (ACOs) are organizations of health care providers who provide care to a group of patients. Created in an attempt to decrease the cost of service delivery and increase efficiency, value and profit, these organizations are new territory for the CPA professional. This presentation was given to the Michigan Association of Certified Public Accountants at their Healthcare Conference on April 23, 2013.
Join the author of “The Wallet Allocation Rule” as he dives into the research and analytics behind winning more share of wallet. In this session, Luke Williams explores the wallet allocation rule, and how changing methods or measurement and analysis, to more actively predict change, can be a sustainable competitive advantage.
Go-to-market strategy, positioning and viral growthLisa Enckell
Presentation during entrepreneurship week at Luleå University, Sweden on go to market strategies, positioning and viral growth. Examples and learnings from Wrapp, Clue and other startups.
GREEN MARKETING - AN ANALYSIS OF CONSUMER BEHAVIOUR TOWARDS GREEN PRODUCTSTanushree Bhowmick
This paper was presented in the Ist International Conference on Business & Information Management (ICBIM), 2012, organized by NIT, Durgapur. This is basically a research paper aiming to contribute towards the growing ecological concern that most marketers want to address these days. It also tries to unveil consumer behaviour towards the purchase and consumption of eco-friendly products.
Customer engagement is the process of actively building, nurturing, and managing relationships with customers. Customer engagement can unlock exponential growth for your company. Learn more about how it's done.
A group of 4 Marketing/MBA students and I worked on a live project for a local business looking to determine a way to increase sales to students at UTD. This marketing research project utilized data gathering techniques, analytics tools, and critical thinking in order to make conclusions based on our primary data.
What is Key Account Management? Key account management (KAM) defines full relationship between your business and the customers you are selling to. It describes the individual approach of sales people to their customers in order to create long everlasting business relationship.
What is customer segmentation and what are some best practices around segmenting your mobile app users? This Slideshare presents 4 best practices for effectively grouping your customers into actionable segments.
To know more, visit: https://clevertap.com/blog/customer-segmentation-examples-for-better-mobile-marketing/
Branding begins with the consistency of presentation that becomes the identity of a company. This session comprises deep study of business to business branding.
Distribution Channel
Management of Distribution Channel
Need of Distribution Channel
Need for Channel Management
Channel Partners and their Functions
Difference between Distributor and Wholesaler
Choice of Distribution System
Distribution Strategy
Factors Affecting Effective Management of Distribution Channels
Channel Conflict
Conflict Resolution
Motivating Channel Members
Selecting Channel Partners
Evaluating Channels
Channel Control
RFM Segmentation is the easiest and most frequently used form of database segmentation. It is based on three key metrics: Recency, Frequency and Monetary Value of customer activity. RFM is often used with transactional history in e-commerce, but can also work for Social Media interactions, online gaming or discussion boards. Based on calculated segments a marketer can prepare cross-sell, up-sell, retention and reactivation capampaigns. This deck provides a simple introduction to the RFM Segmentation methodology.
Customer engagement is the process of actively building, nurturing, and managing relationships with customers. Customer engagement can unlock exponential growth for your company. Learn more about how it's done.
A group of 4 Marketing/MBA students and I worked on a live project for a local business looking to determine a way to increase sales to students at UTD. This marketing research project utilized data gathering techniques, analytics tools, and critical thinking in order to make conclusions based on our primary data.
What is Key Account Management? Key account management (KAM) defines full relationship between your business and the customers you are selling to. It describes the individual approach of sales people to their customers in order to create long everlasting business relationship.
What is customer segmentation and what are some best practices around segmenting your mobile app users? This Slideshare presents 4 best practices for effectively grouping your customers into actionable segments.
To know more, visit: https://clevertap.com/blog/customer-segmentation-examples-for-better-mobile-marketing/
Branding begins with the consistency of presentation that becomes the identity of a company. This session comprises deep study of business to business branding.
Distribution Channel
Management of Distribution Channel
Need of Distribution Channel
Need for Channel Management
Channel Partners and their Functions
Difference between Distributor and Wholesaler
Choice of Distribution System
Distribution Strategy
Factors Affecting Effective Management of Distribution Channels
Channel Conflict
Conflict Resolution
Motivating Channel Members
Selecting Channel Partners
Evaluating Channels
Channel Control
RFM Segmentation is the easiest and most frequently used form of database segmentation. It is based on three key metrics: Recency, Frequency and Monetary Value of customer activity. RFM is often used with transactional history in e-commerce, but can also work for Social Media interactions, online gaming or discussion boards. Based on calculated segments a marketer can prepare cross-sell, up-sell, retention and reactivation capampaigns. This deck provides a simple introduction to the RFM Segmentation methodology.
A presentation from the 2014 National Postal Forum by Gary Seitz, Executive VP of C.TRAC, on Recency, Frequency, and Monetary (RFM) analysis as a simple tool to help mailers.
Fundamentals and advanced concepts in customer segmentation. CLV (customer lifetime value) and specific implications in Telecoms. Approaches in operational deployment of customer segmentation.
This quick start guide provides an Enterprise Guide project that categorizes customers into a predefined number of ‘segments’ based on the score from the RFM analysis as well as:
Introduction to the RFM model
Data Requirements
SAS project configuration considerations
Model Description
Workflow Overview and Build
How to Create a Customer Segmentation ModelMark Haubert
Are your sales and marketing teams focused on the right customers? Learn how to define your Ideal Customer Criteria, create a Customer Segmentation Model, identify your Key Accounts and focus your teams on customers with the greatest potential for growth.
45min talk given at LondonR March 2014 Meetup.
The presentation describes how one might go about an insights-driven data science project using the R language and packages, using an open source dataset.
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin Rshanelynn
Self-Organising maps for Customer Segmentation using R.
These slides are from a talk given to the Dublin R Users group on 20th January 2014. The slides describe the uses of customer segmentation, the algorithm behind Self-Organising Maps (SOMs) and go through two use cases, with example code in R.
Accompanying code and datasets now available at http://shanelynn.ie/index.php/self-organising-maps-for-customer-segmentation-using-r/.
User-centric segmentation for all AppMetrica reports. Get new insights from your usual audience reports grouping users by different behavior models.
Which on-boarding scenario brings more conversions?
Select a cohort of previously loyal users who haven’t launched the app last month.
Identify user groups relevant for the special offer you are going to launch.
Think up complicated questions about your users. AppMetrica brings you the right answers.
To increase conversions on your site, you must first understand your visitors and their experiences. Once you can tie these experiences to outcomes (conversions), only then can you optimize and improve the user experience.
This presentation demonstrates how conducting behavioral segmentation on your site's visitors can help you build better A/B testing scenarios, optimize the user experience, and increase conversions.
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in RRevolution Analytics
Everything happens somewhere and spatial analysis attempts to use location as an explanatory variable. Such analysis is made complex by the very many ways we habitually record spatial location, the complexity of spatial data structures, and the wide variety of possible domain-driven questions we might ask. One option is to develop and use software for specific types of spatial data, another is to use a purpose-built geographical information system (GIS), but determined work by R enthusiasts has resulted in a multiplicity of packages in the R environment that can also be used.
With the combination of Pentaho and MongoDB, it’s drastically simpler and faster to build single analytical views of clients by aggregating and blending data from a variety of internal sources (customer, transaction, position data) and external sources (social networking, central bank, news, pricing) with fast response times.
Webinar covers:
An insider’s view of new ways financial services companies are using MongoDB to rapidly store and consume unlimited shapes and sizes of data
How Pentaho makes it easy to enrich data in MongoDB with predictive scoring, visual data integration tools, reports, interactive dashboards, and data visualizations
A live demo of blending Twitter, equity pricing, and news data into a single analytical view that unlocks market intelligence to create investment opportunities
[Big] Data For Marketers: Targeting the Right MarketPanji Winata
big data analytics for marketers: customer 360 degree profile, customer segmentation using RFM, Smartphone Adoption Prediction for Telecommunication Service, Event Trigger for Preventing Customer Churn in Telecommunication Service
Presentation on "A Complete Overview of Data Driven Decision Making in a Quickly Changing Business Environment" given by Isaac Aidoo, Head of Data Analytics, Zoona.
Webinar: Making A Single View of the Customer Real with MongoDBMongoDB
Tier 1 banks, top insurance providers and other global financial services institutions have discovered that with the use of MongoDB, they are able to achieve a single view of the customer. This allows them not only to comply with KYC and other regulations, but also to engage customers efficiently, which helps reduce churn and increase wallet share while reducing costs. We will focus on how MongoDB's dynamic schema, real-time replication and auto-scaling make it possible to create a global, unified data hub aggregating disparate data sources, which can be made available to customers, customer service representatives (CSRs), and relationship managers (RMs).
Reliable market sizing doesn't have to be as complicated or painstakingly slow as you think. This presentation offers a quick overview of the art and science of market sizing, and offers a step-by-step guide on how to conduct seven fast market sizing approaches.
A presentation on Nastel AutoPilot's capabilities for advanced application analytics. Based on Complex Event Processing (CEP) it provides early warning about potential or actual problems across multiple data sources - and it does it in real-time.
As Europe's leading economic powerhouse and the fourth-largest hashtag#economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like hashtag#Russia and hashtag#China, hashtag#Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in hashtag#cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to hashtag#AdvancedPersistentThreats (hashtag#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
StarCompliance is a leading firm specializing in the recovery of stolen cryptocurrency. Our comprehensive services are designed to assist individuals and organizations in navigating the complex process of fraud reporting, investigation, and fund recovery. We combine cutting-edge technology with expert legal support to provide a robust solution for victims of crypto theft.
Our Services Include:
Reporting to Tracking Authorities:
We immediately notify all relevant centralized exchanges (CEX), decentralized exchanges (DEX), and wallet providers about the stolen cryptocurrency. This ensures that the stolen assets are flagged as scam transactions, making it impossible for the thief to use them.
Assistance with Filing Police Reports:
We guide you through the process of filing a valid police report. Our support team provides detailed instructions on which police department to contact and helps you complete the necessary paperwork within the critical 72-hour window.
Launching the Refund Process:
Our team of experienced lawyers can initiate lawsuits on your behalf and represent you in various jurisdictions around the world. They work diligently to recover your stolen funds and ensure that justice is served.
At StarCompliance, we understand the urgency and stress involved in dealing with cryptocurrency theft. Our dedicated team works quickly and efficiently to provide you with the support and expertise needed to recover your assets. Trust us to be your partner in navigating the complexities of the crypto world and safeguarding your investments.
1. Using R for Customer Segmentation
useR! 2008
Dortmund, Germany
August, 2008
Jim Porzak,
Senior Director of Analytics
Responsys, Inc.
San Francisco, California
2. 11Aug08 userR! 08 - Porzak, Customer Segmentation 2
Outline
● Two main case study examples
– Customer purchase behavior data
● Goal: actionable segments to improve LTV of customer
base
– Prospect intent & interest survey data
● Goal: actionable segments to better target messaging
content and tactics
● Real data from real clients (sanitized)
● Workshop format
– Hands on
– Discussion heavy
4. 11Aug08 userR! 08 - Porzak, Customer Segmentation 4
Why Segment?
● Better communication with customers and
prospects
– Recipient should feel that we understand him or her
as an individual
– “Send the right message to the right person at the
right time”
● Challenges:
– Widely applicable
● General rules based on readily available data
● A new contact can be placed in their segment easily
– Usable
● Marketing can relate
● Technology can deliver
5. 11Aug08 userR! 08 - Porzak, Customer Segmentation 5
Segmentation in Practice
7. 11Aug08 userR! 08 - Porzak, Customer Segmentation 7
What's Behavioral Segmentation?
● Based on what people actually do
– Not on what that say they do
● Purchase behavior
– Discuss examples...
● Usage behavior
– Discuss examples...
8. 11Aug08 userR! 08 - Porzak, Customer Segmentation 8
Why do Behavioral Segmentation?
● All comes down to interacting with your
customer or prospect in the appropriate way
– From customers perspective, not yours!
● Ideally a “one-to-one” interaction
– Not practical in today's world
– Goal: perceived by customer as “one-to-one”
9. 11Aug08 userR! 08 - Porzak, Customer Segmentation 9
Today's Purchase Behavior Data Set
● Actual web & phone sales records (sanitized)
– 541k order detail lines
– 135k Customers
– Over 2 ½ years
– Of ~900 different products
– In 5 product categories
● Conventional wisdom
– Strong seasonality
– Have a loyal customer base
– But, have retention problem
10. 11Aug08 userR! 08 - Porzak, Customer Segmentation 10
What we know
Date: 10/10/07 Order #: 12345
Customer: 3894832
Sue Smith
1 Short Street
Qty SKU Description Unit Price Ext Price
1 123 1.50 1.50
3 345 White Widget 2.00 6.00
Total 7.50
Tax 0.60
Shipping 2.00
Grand Total 10.10
Smallville, ND, 39248
Green Gizzmo
Imagine a customer order form:
We get the highlighted data.
Plus: order channel and product (SKU) category
12. 11Aug08 userR! 08 - Porzak, Customer Segmentation 12
orders summary
> summary(orders[-(1:2)])
CustID OrderID OrderDate Quantity
Min. : 2 Min. : 2 Min. :2005-09-01 Min. : 0.000
1st Qu.: 62221 1st Qu.:105292 1st Qu.:2006-07-18 1st Qu.: 1.000
Median :124343 Median :210908 Median :2007-02-14 Median : 1.000
Mean :152974 Mean :207535 Mean :2007-03-11 Mean : 1.113
3rd Qu.:185119 3rd Qu.:315711 3rd Qu.:2007-12-04 3rd Qu.: 1.000
Max. :506929 Max. :388319 Max. :2008-07-14 Max. :275.000
NA's : 4
Amount Channel Category
Min. : 0.01 phone1: 14303 *: 0
1st Qu.: 20.00 phone2: 90 C:142147
Median : 30.00 web1 :451354 G:114300
Mean : 31.81 web2 : 75354 I: 14961
3rd Qu.: 35.00 N: 50385
Max. :4577.00 T:199354
X: 19954
13. 11Aug08 userR! 08 - Porzak, Customer Segmentation 13
Goal of this exercise?
● Marketers need to come up with a
communication strategy & associated tactics
which will entice customers to exhibit higher
LTV – Long Term Value.
● Segment by past purchase behavior to provide
actionable subsets of customers
– When marketers use our subsets, they get
measurably better results than previous “one size
fits all” method.
14. 11Aug08 userR! 08 - Porzak, Customer Segmentation 14
How are we going to do this?
(Discussion)
16. 11Aug08 userR! 08 - Porzak, Customer Segmentation 16
Recency, Frequency, & Monetary Metrics
● Recency
– How long ago was last purchase? (days)
– Measured for “As Of Date” of data set
● Frequency
– How many orders in analysis period (2 ½ years)
– Attempting to measure engagement
● Monetary
– What is total $ value of all orders in analysis period
Question: Do you expect these three to be uncorrelated?
17. 11Aug08 userR! 08 - Porzak, Customer Segmentation 17
An Aside: Classical RFM
● Invented by direct marketers in 1950's as a way to
model response rates (before good stat software was
readily available)
● One typical method
– R, F, & M each scored in quantile (typically 5)
– Combined score for each recipient was
concatenation of the three digits, eg “351”
– Scores ranked by empirical response rate
– Mailing then done to top xx% of list
● Today we use, lm, glm, randomForest, ...
● But, concepts still valid as conceptional model
● And, R & F measures typically very important in any
predictive model
18. 11Aug08 userR! 08 - Porzak, Customer Segmentation 18
I also typically include...
● Breadth
– How many different SKUs purchased?
● Tenure
– How long as customer been with us?
19. 11Aug08 userR! 08 - Porzak, Customer Segmentation 19
Next Step – Aggregate by Customer
● We need some “raw” RFM values
● Make the data frame “RFM_raw”
– CustomerID: the business key back to database
– FirstPurchaseDate: interesting for tenure metric
– LastPurchaseDate: basis of Recency
– NumberOrders: basis of Frequency
– NumberSKUs: basis of Breadth (engagement metric)
– TotalAmount: basis of Monetary
● Also calculate
– AsOfDate <- max(LastPurchaseDate)
20. 11Aug08 userR! 08 - Porzak, Customer Segmentation 20
Building the RFM_raw data frame
## for performance, make OrderDate an integer during aggregation
orders_n <- orders
orders_n$OrderDate <- as.integer(orders_n$OrderDate)
## build up one column at a time
RFM_raw <- with(orders_n, data.frame(CustomerID = sort(unique(CustID))))
RFM_raw <- cbind(RFM_raw, FirstPurchaseDate = with(orders_n,
as.Date(as.integer(by(OrderDate, CustID, min)), "1970-01-01")))
RFM_raw <- cbind(RFM_raw, LastPurchaseDate = with(orders_n,
as.Date(as.integer(by(OrderDate, CustID, max)), "1970-01-01")))
RFM_raw <- cbind(RFM_raw, NumberOrders = with(orders_n,
as.numeric(by(OrderID, CustID, function(x) length(unique(x))))))
RFM_raw <- cbind(RFM_raw, NumberSKUs = with(orders_n,
as.numeric(by(SKU_ID, CustID, function(x) length(unique(x))))))
RFM_raw <- cbind(RFM_raw, TotalAmount = with(orders_n,
as.numeric(by(Amount, CustID, sum))))
AsOfDate <- max(RFM_raw$LastPurchaseDate)
save(RFM_raw, AsOfDate, file = "RFM_raw.Rda")
This take a while (1 ½ minutes on my laptop). You may want to download RFM_raw.Rda
21. 11Aug08 userR! 08 - Porzak, Customer Segmentation 21
Do some RMF EDA
## Jim's miscellaneous DMA functions
source("dma_misc.R")
## for interactive games:
attach(RFM_raw)
## EDA plots using base graphics
rfm.plot(as.numeric(AsOfDate - LastPurchaseDate) %/% 7, "rec")
rfm.plot(NumberOrders, "freq")
rfm.plot(TotalAmount, "mon")
rfm.plot(NumberSKUs, "breadth")
## EDA plots using iPlots
ihist(as.numeric(AsOfDate - LastPurchaseDate) %/% 7, title = "Recency")
ihist(NumberOrders, title = "Frequency")
ihist(TotalAmount, title = "Monetary")
ihist(NumberSKUs, title = "Breadth")
22. 11Aug08 userR! 08 - Porzak, Customer Segmentation 22
RFM EDA Plots
In all cases, “best is left.”
23. 11Aug08 userR! 08 - Porzak, Customer Segmentation 23
Assign reasonable RFM breaks
● Recency:
– Breaks (weeks <=): 25, 51, 77, 103, <else>
– levels = c("0-5", "6-11", "12-17", "18-23", "24-29"))
● Note levels labeled in months, not weeks
● Frequency:
– Breaks (count <=): 1, 3, 7, <else>
– levels = c("8+", "7-4", "3-2", "1"))
● Note ordering for best is left.
● Monetary:
– Breaks (value <=): 50, 100, 200, 400, <else>
– levels = c("401+", "400-201", "200-101", "100-51", "50-0"))
● Again ordering is best is left.
26. 11Aug08 userR! 08 - Porzak, Customer Segmentation 26
How do customers look in RFM space?
● I like mosaic plots (& especially vcd* package!)
● Set up a “structure table” with assignments:
● And a convenience function for mosaic:
require(vcd)
RFM_st <- structable(~ Recency + Frequency + Monetary + Breadth,
data = RFM_segs)
mm <- function(f) {
mosaic(f, data = RFM_st,
shade = TRUE,
labeling_args = list(rot_labels = c(left = 90, top = 45),
just_labels = c(left = "left",
top = "center")),
spacing = spacing_dimequal(unit(c(0.5, 0.8), "lines")),
keep_aspect_ratio = FALSE
)
}
* To learn more, attend: The strucplot framework for Visualizing Categorical Data. Wed, 11:30. E29
31. 11Aug08 userR! 08 - Porzak, Customer Segmentation 31
To really show off vcd!
pairs(RFM_st, lower_panel = pairs_assoc, shade = TRUE)
32. 11Aug08 userR! 08 - Porzak, Customer Segmentation 32
Time to get real – remember goal?
33. 11Aug08 userR! 08 - Porzak, Customer Segmentation 33
Actionable for Marketers
The big two concepts:
1. Lifestage
2. Value
Turns out we can do both with Recency &
Frequency!
34. 11Aug08 userR! 08 - Porzak, Customer Segmentation 34
Use Balloon Plots to Communicate
require(gplots)
# Recency by Frequence - Counts
RxF <- as.data.frame(table(RFM_segs$Recency, RFM_segs$Frequency,
dnn = c("Recency", "Frequency")),
responseName = "Number_Customers")
with(RxF, balloonplot(Recency, Frequency, Number_Customers, zlab = "#
Customers"))
# Recency by Frequency - Annual Value (total annual sales to segment)
VbyRxF <- (aggregate(RFM_segs$Monetary_value,
by = list(Recency = factor(RFM_segs$Recency),
Frequency = RFM_segs$Frequency),
sum))
names(VbyRxF)[3] <- "Annual_Sales"
VbyRxF$Annual_Sales <- VbyRxF$Annual_Sales / (28/12) ## normalize to
annual revnue
with(VbyRxF, balloonplot(Recency, Frequency, Annual_Sales / 1000, zlab =
"Annual Sales (000)"))
35. 11Aug08 userR! 08 - Porzak, Customer Segmentation 35
Recency by Frequency - Counts
36. 11Aug08 userR! 08 - Porzak, Customer Segmentation 36
Recency by Frequency - Value
37. 11Aug08 userR! 08 - Porzak, Customer Segmentation 37
Exercise – Assign Segments
● Lifestage “dimension”
– New
– Active
– Lapsed
– Lost
● Value “dimension”
– Gold
– Silver
– Bronze
● Combined as
– High Value, Repeat, New, One-time, Lapsed, & Lost
38. 11Aug08 userR! 08 - Porzak, Customer Segmentation 38
Color & Label Segment Cells
# a matrix of segment codes
RF_segs0 <- matrix("", nrow = 4, ncol = 5)
# manually make assignments
object.browser() ## Fill in H, R, N, L, or O. Save as RF_segs.txt
# get back into R
RF_segs <- as.matrix(read.delim("RF_segs.txt", sep = "t",
na.strings = ""))
RF_segs[is.na(RF_segs)] <- "X" ## N/A's become “Lost”
# add colors and labels to balloon plot
# Magic values for balloon cell centers
RF_x <- matrix(2:6 + 0.25, nrow = 4, ncol = 5, byrow = TRUE)
RF_y <- matrix(4:1, nrow = 4, ncol = 5, byrow = FALSE)
RF_cols <- sapply(RF_segs, function(x) switch(x, H="gold",
R="slategray2", N="green",
L="yellow", O="darkgreen", "red"))
points(RF_x, RF_y, col = RF_cols, pch = 16, cex = 12)
text(RF_x, RF_y, RF_segs, cex = 2)
39. 11Aug08 userR! 08 - Porzak, Customer Segmentation 39
Final Segments for Marketers
43. 11Aug08 userR! 08 - Porzak, Customer Segmentation 43
Marketing Challenge
● Our client offers free download of software with
high perceived value, but
● First asks user to fill out a simple survey
● Challenge is to come up with a “few” segments
that will be used by segment to:
– Prioritize contact strategy
– Craft marketing messages based on profile
44. 11Aug08 userR! 08 - Porzak, Customer Segmentation 44
Sample Data
● Surveys from 20k respondents
● All within same time frame (a number of weeks)
● All requested the software download
45. 11Aug08 userR! 08 - Porzak, Customer Segmentation 45
Survey Description
● 35 check boxes or radio buttons
– None required. Coded as binary responses
● Arranged in 5 sections
– License: W and/or X
– Role: one of D, SA, ITM, ITA, Str, Oth (radio
buttons)
– System: any of S, T, A, B, C, D, O (check boxes)
– Interest: any of M, O Pl, Pr, Sup, 64, Con, Per, DT,
Z, Oth. (check boxes)
– Application: any of Web, Inf, Col, Db, J2, Top, Dev,
Per, Other (check boxes)
46. 11Aug08 userR! 08 - Porzak, Customer Segmentation 46
Data Set
Provided as data frame csb, in
InterestPreferenceSurvey.Rda
# Getting started
setwd("C:/Data/useR08/R")
require(lattice)
require(grDevices)
require(vcd)
require(flexclust)
load(file = "InterestPreferenceSurvey.Rda")
str(csb)
'data.frame': 20000 obs. of 35 variables:
$ Lic_W : int 0 0 0 0 0 0 0 0 0 0 ...
$ Lic_X : int 1 1 1 0 1 1 1 1 1 1 ...
$ Role_D : int 0 0 0 0 0 0 0 0 1 0 ...
$ Role_SA : int 0 0 1 0 1 0 0 1 0 0 ...
$ Role_ITM: int 0 0 0 1 0 0 0 0 0 0 ...
$ Role_ITA: int 0 0 0 0 0 0 0 0 0 0 ...
48. 11Aug08 userR! 08 - Porzak, Customer Segmentation 48
Clustering Strategy
● flexclust package by Fritz Leisch
● See his 2006 paper (on his personal page):
A Toolbox for K-Centroids Cluster Analysis
● This is (mostly) an optional response type
survey
– 1 = “yes” is significant
– 0 is just absence not really a “no”
– Respondents checking Role_SA have much more
in common than those not checking Role_SA
● Following Fritz's argument we use the
expectation based Jaccard distance measure.
49. 11Aug08 userR! 08 - Porzak, Customer Segmentation 49
A First Cluster Run
require(flexclust)
## set up flexclust control object
fc_cont <- new("flexclustControl")
fc_cont@tolerance <- 0.1 ## this doesn't seem to work as expected
fc_cont@iter.max <- 30 ## seems to be effective convergence
##fc_cont@verbose <- 1 ## set TRUE if to see each step
my_seed <- 0
my_family <- "ejaccard"
num_clust <- 4
my_seed <- my_seed + 1
set.seed(my_seed)
cl <- kcca(csb, k = num_clust, save.data = TRUE, control = fc_cont,
family = kccaFamily(my_family))
## This takes ~ 1.5 min. on my laptop
50. 11Aug08 userR! 08 - Porzak, Customer Segmentation 50
Cluster Summary
> summary(cl)
kcca object of family 'ejaccard'
call:
kcca(x = csb, k = num_clust, family = kccaFamily(my_family),
control = fc_cont, save.data = TRUE)
cluster info:
size av_dist max_dist separation
1 5551 0.7159832 1 0.6766653
2 4577 0.7707523 1 0.7437616
3 2535 0.7482347 1 0.7038259
4 7337 0.7215583 1 0.6732479
no convergence after 200 iterations
sum of within cluster distances: 14693.00
55. 11Aug08 userR! 08 - Porzak, Customer Segmentation 55
Are any of these any good?
● If so, which?
● How to decide?
● Quoting Fritz (pg 15):
The actual choice of expectation-based Jaccard
with K = 6 clusters ... has been made manually by
comparing various solutions and selecting the one
which made most sense from the practitioners
point of view. This may seem unsatisfying because
the decision is subjective, but cluster analysis here
is used as a tool for exploratory data analysis and
offers simplified views of a complex data set.
56. 11Aug08 userR! 08 - Porzak, Customer Segmentation 56
Our Selection Criteria
1. Choice of k, must have mostly ~ stable
solutions, and
2. Cluster profiles must be interpretable. IOW,
what is the story you can tell about each
cluster? Will the marketers relate to it?
57. 11Aug08 userR! 08 - Porzak, Customer Segmentation 57
Your Challenge...
Do what Fritz said:
The actual choice ... has been made manually by
comparing various solutions and selecting the one
which made most sense.
Here are 4 runs for each k = 3 to 8; 24 in all.
Pick the “best” one, make up stories for each cluster,
and explain your choice to group.
58. 11Aug08 userR! 08 - Porzak, Customer Segmentation 58
For the Record. Jim's Pick:
59. 11Aug08 userR! 08 - Porzak, Customer Segmentation 59
Jim's Stories
Based on knowing a bit more about the client
than I can share with you.
#1: An “S” loyalist, high % SA's
#2: Favors name brands, high responders
#3: A “T” loyalist, broad but reduced responses
#4: Favors name brands, but otherwise low resp.
#5: Student, gray box, open source, desktop.
60. 11Aug08 userR! 08 - Porzak, Customer Segmentation 60
Finally, using predict in flexclust
Once we (analysts & marketers) have decided on
a clustering model, we want to use it to assign
new respondents to likely segment.
flexclust includes predict:
persona <- predict(cl, csb)
head(persona)
str(persona)
PersonaPredict <- as.data.frame(persona)
names(PersonaPredict) <- "cluster"
> table(PersonaPredict)
PersonaPredict
1 2 3 4 5
2313 6479 4654 2702 3852
61. 11Aug08 userR! 08 - Porzak, Customer Segmentation 61
Closing the Loop –
Tying Back to Purchase Model
Where ppBand is probability of purchase band ( 0 = 0.0 – 0.999,
1 = 0.10 – 0.199, … 9 = 0.90 – 0.999). IOW, 0 is really low & 9 is
really high probability of purchase according to the model
63. 11Aug08 userR! 08 - Porzak, Customer Segmentation 63
Follow up
● Slides and code will be up next week on
http://www.porzak.com/JimArchive/useR2008/
● Ping me with questions or comments:
jporzak@gmail.com
● Check out the San Francisco useR Group:
ia.meetup.com/67/
Thanks!