SFbayACM.org Data Science Camp 
Saturday, October 25, 2014 
Greg Makowski 
Twitter Tag #DMCAMP
Three case studies deploying cluster analysis

Three case studies are discussed, each of which includes cluster analysis as a component.
1) Customer description for a credit card attrition model, used to decide how to talk to customers.
2) Hotel price optimization. Use clusters to find subsets of similar behavior, and optimize prices within each cluster. Use a neural net as the objective function.
3) Retail supply chain: planning replenishment from 52-week demand curves, using thousands of seasonal "profiles" or clusters.

Published in: Data & Analytics

  1. SFbayACM.org Data Science Camp, Saturday, October 25, 2014. Greg Makowski. Twitter tag: #DMCAMP
  2. • Customer Description – CC Company – “Who” vs. “How” to Talk to Customers
     • Hotel Price Optimization – Using Clusters as Non-Linear Constraints
     • Retail Supply Chain – Planning Replenishment for 52 Week Demand Curves
  3. • Context:
       ◦ Major credit card company
       ◦ South American market
       ◦ Repeat for Argentina, Brazil… and “dollar countries”
     • Objectives or Problem:
       ◦ How to best manage the customer population
       ◦ Develop a software system, to repeat over geography and time
       ◦ How to AUTOMATE understanding?
         ▪ How to automate naming the clusters?
  4. • Solution, 3 projects for each customer base
       ◦ “WHO” to talk to…
         ▪ Customer Attrition Model – Neural Network (5 algorithms tested)
         ▪ Decrease in spending over time
         ▪ Basic vs. Supplemental Cards
         ▪ By 7 categories
         ▪ Challenge: Double-digit inflation in some countries (1990s)
         ▪ Standardize by monthly spending
         ▪ Mining Factoid: Credit Card Digit 11 was predictive
         ▪ Billing cycles? Monthly salaries + high inflation
         ▪ Customer Profitability – Net Present Value
       ◦ “HOW” to talk to them…
         ▪ Cluster Analysis
  5. • Consider Scalability
       ◦ 100k – 500k customers
       ◦ Some cluster methods are O(n) or O(n²)
       ◦ Use k-means to create 100 clusters, O(n)
       ◦ Then use O(n²) methods to reduce from 100 clusters down to 8-12 clusters
  6. Select the 5-15% of customers “highest” in the spike.
     [Chart: Cumulative Profit and Total Profit / cell by 5% Customer Groups; series: Tree-Net, Random; axes labels: Attrition, Profitability]
     83% of Attrition Profit was lost in the top 15%.
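     One way to read the “WHO” selection is as a ranking by expected profit at risk. The sketch below is a minimal illustration under that assumption: the score (attrition probability × profitability), the function name `select_who`, and the ten customers are all hypothetical and are not taken from the slides.

```python
def select_who(customers, fraction=0.15):
    """Rank customers by attrition probability x profitability, keep the top fraction.

    customers: list of (customer_id, attrition_probability, profitability).
    Returns the selected ids and their share of the total profit at risk.
    """
    ranked = sorted(customers, key=lambda c: c[1] * c[2], reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    selected = ranked[:n_keep]
    total_at_risk = sum(p * value for _, p, value in ranked)
    share = sum(p * value for _, p, value in selected) / total_at_risk
    return [cid for cid, _, _ in selected], share


# Hypothetical data: (id, attrition probability, profitability)
customers = [
    ("c01", 0.90, 1200.0), ("c02", 0.10, 5000.0), ("c03", 0.80, 100.0),
    ("c04", 0.05, 300.0), ("c05", 0.60, 2500.0), ("c06", 0.30, 400.0),
    ("c07", 0.20, 150.0), ("c08", 0.70, 50.0), ("c09", 0.15, 800.0),
    ("c10", 0.02, 900.0),
]
who, share = select_who(customers, fraction=0.2)
print(who, f"{share:.0%} of profit at risk")
```

     With real data, `fraction` would be chosen where the cumulative profit curve flattens, which the slide places somewhere in the 5-15% range.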
  7. • How to design the cluster analysis?
       ◦ Select top fields from neural network
         ▪ Sensitivity Analysis on the NN
         ▪ % spending by category
         ▪ Restaurant, Retail, Grocery, Hotel, Air, Auto, …
         ▪ Trend over time (slope, expected future value)
     • Decide to create 8 – 12 clusters or customer segments to communicate to marketers
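     The two kinds of cluster input named on this slide, percent of spending by category and trend over time, can be sketched as follows. This is an illustration only: the function names, the category amounts, and the use of an ordinary least-squares slope are assumptions, not details from the talk. Expressing spending as shares of the monthly total also fits the earlier point about standardizing by monthly spending under high inflation, since shares do not change when all amounts inflate together.

```python
def category_shares(spend_by_category):
    """Convert raw amounts per category into shares of the month's total."""
    total = sum(spend_by_category.values())
    return {cat: amount / total for cat, amount in spend_by_category.items()}


def trend_slope(series):
    """Ordinary least-squares slope of a series against its index (0, 1, 2, ...)."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


# Hypothetical customer: one month of spending, and four monthly totals
month = {"Restaurant": 120.0, "Retail": 300.0, "Grocery": 80.0, "Air": 500.0}
shares = category_shares(month)
slope = trend_slope([100.0, 90.0, 80.0, 70.0])
print(shares)
print(slope)
```

     A negative slope corresponds to the “decrease in spending over time” signal used for attrition.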
  9. • Consider Scalability
       ◦ 100k – 500k customers
       ◦ Some cluster methods are O(n) or O(n²)
       ◦ Use k-means to create 100 clusters, O(n)
       ◦ Then use O(n²) methods to reduce from 100 clusters down to 8-12 clusters
       ◦ This uses all the data scalably, and still allows a more sophisticated hierarchical cluster search
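     The two-stage idea on this slide can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the speaker's implementation: the data is random 2-D points, k-means is basic Lloyd iteration, and the second stage is size-weighted centroid-linkage agglomeration, chosen here only because it is short to write. The function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=10, seed=0):
    """Basic Lloyd's k-means. Each pass is O(n * k), linear in the number of points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group (keep it if the group is empty)
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids, [len(g) for g in groups]


def merge_centroids(centroids, sizes, target):
    """Agglomerative merging of the micro-clusters; each merge scans O(m^2) pairs."""
    clusters = [(c, s) for c, s in zip(centroids, sizes) if s > 0]
    while len(clusters) > target:
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: math.dist(clusters[ab[0]][0], clusters[ab[1]][0]),
        )
        (ci, si), (cj, sj) = clusters[i], clusters[j]
        # Size-weighted mean keeps the merged centroid at the mean of its members
        merged = tuple((x * si + y * sj) / (si + sj) for x, y in zip(ci, cj))
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append((merged, si + sj))
    return clusters


# Hypothetical data: 1,500 customers described by two standardized fields
rng = random.Random(42)
customers = [(rng.random(), rng.random()) for _ in range(1500)]
micro_centroids, micro_sizes = kmeans(customers, k=100)
segments = merge_centroids(micro_centroids, micro_sizes, target=8)
for centroid, size in sorted(segments, key=lambda s: -s[1]):
    print(size, [round(v, 2) for v in centroid])
```

     The quadratic step only ever sees about 100 micro-clusters, so its cost does not grow with the customer count.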
  10. [Table: Fields (rows, ordered by importance) × Clusters (columns, ordered from most customers to least)]
      Cluster columns: ALL, 1, 2, …, N
      Share of customers: 100%, 36%, 22%, …, 5%
  11. [Table: Fields (rows, ordered by importance) × Clusters (columns, ordered from most customers to least)]
      Cluster columns: ALL, 1, 2, …, N
      Share of customers: 100%, 36%, 22%, …, 5%
      Legend: min, MAX
  12. Most: Var X, Y, Z
      Least: Var A, B, C
      May have 12 clusters, 36 variables.
      Then each cluster may have 6 attributes to use in naming.
      Legend: min, MAX
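      One possible reading of the min/MAX table is that each field contributes two labels: “most” for the cluster with the highest mean and “least” for the cluster with the lowest. Under that reading, 36 variables give 72 labels, or an average of 6 per cluster across 12 clusters, which matches the slide's arithmetic. The sketch below follows that interpretation; the function name and the three-cluster data are hypothetical.

```python
def name_clusters(cluster_means):
    """Attach 'most <field>' / 'least <field>' labels to the extreme cluster per field.

    cluster_means: {cluster_id: {field: mean value of that field in the cluster}}
    """
    labels = {cid: [] for cid in cluster_means}
    fields = list(next(iter(cluster_means.values())))
    for field in fields:
        highest = max(cluster_means, key=lambda c: cluster_means[c][field])
        lowest = min(cluster_means, key=lambda c: cluster_means[c][field])
        labels[highest].append(f"most {field}")
        labels[lowest].append(f"least {field}")
    return labels


# Hypothetical cluster means for three fields
means = {
    1: {"restaurant": 0.30, "air": 0.05, "trend": -0.02},
    2: {"restaurant": 0.10, "air": 0.25, "trend": 0.01},
    3: {"restaurant": 0.15, "air": 0.10, "trend": -0.08},
}
names = name_clusters(means)
for cid, attrs in names.items():
    print(cid, attrs)

# The slide's arithmetic: 36 variables x 2 extremes spread over 12 clusters
print(36 * 2 / 12)
```

      In practice the labels are not spread evenly, so some clusters collect more naming attributes than others.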
  13. • Select “WHO” with (Attrition) × (Profitability)
      • Select “HOW” with Cluster Segments
        ◦ Given the variable selection, only a few clusters matched most of the 15% subset of the customers to manage
      • Marketers could understand well the different audiences and reasons for attrition – and could better write copy for communication
      • About 50 executives walked around with the one-page cluster summary in their pocket, frequently used to plan customer strategies
  14. [Diagram: Analysis Type within CRM. Labels: Behavior, Media, Message, $$$, Best Customers, Upgrade, Downgrade, Loyal, Loyalty, Cross-Sell, Prospect, Segment, Reactivation, Attrition, Retention, Fraud]
  15. • Customer Description – CC Company – “Who” vs. “How” to Talk to Customers
      • Hotel Price Optimization – Using Clusters as Non-Linear Constraints
      • Retail Supply Chain – Planning Replenishment for 52 Week Demand Curves
  16. • Objective:
        ◦ Optimize pricing for hotel rooms
        ◦ Take into account geography & use
          ▪ weekend, vacation, business, conference, …
          ▪ Seasons of the year as they relate to demand
      • The hotel owns many brands (chains) focused on different audiences
        ◦ Different price tiers, target audiences, …
        ◦ Hotel, motel, extended stay, …
        ◦ What “lessons learned” cross brands?
  17. • Revenue Management is a general process used to
        ◦ optimize profit
        ◦ given the remaining (plane seat or hotel room) inventory
        ◦ and the remaining time until the inventory is gone
      • Operations Research
        ◦ Linear or Non-Linear Programming
          ▪ Linear or non-linear in either the constraints or the objective function
        ◦ Need an objective function to optimize
          ▪ Train predictive models to forecast price, given conditions
  18. • Data Mining and Operations Research Design
        ◦ When training predictive models, it helps to learn behavior “in the same ball park” with the same model.
        ◦ If the underlying thought process is fairly different, subdivide the data into different subsets and train different models. For example:
          ▪ Attrition: checking, credit card, line of credit, mortgage
          ▪ In Mortgage Bond Pricing: monthly prepayment of none vs. 100’s vs. 1,000’s vs. a full refinance
  19. • How do we group or divide individual hotels, given all the attributes?
        ◦ Brand, location, % utilization weekday or weekend
      • Find bottom-up clusters, rather than top-down assertions on the data
      • For cluster variables – use best variables in pricing predictive models (sound familiar?)
  20. • Solution:
        ◦ 1) Build an initial predictive model predicting pricing. Find the most important variables.
        ◦ 2) Create 8-16 clusters, using those variables
        ◦ 3) Within each cluster
          ▪ A) Train a predictive model for use as the OR objective function
          ▪ B) Run a LINEAR OR price optimization, on the data subset
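      Step 3A, one predictive model per cluster, can be illustrated with a deliberately simple stand-in. The sketch below fits a separate straight line per cluster, predicting price from a single input. The slides describe a neural net as the objective function, so the least-squares line here is a swapped-in simplification, and the cluster names, the utilization input, and the numbers are all hypothetical. The optimization in step 3B is not shown.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def fit_per_cluster(records):
    """records: iterable of (cluster_id, utilization, price). One model per cluster."""
    by_cluster = defaultdict(list)
    for cluster_id, utilization, price in records:
        by_cluster[cluster_id].append((utilization, price))
    return {cid: fit_line(*zip(*rows)) for cid, rows in by_cluster.items()}


# Hypothetical hotels from two clusters whose pricing behaves very differently
records = [
    ("business", 0.5, 200.0), ("business", 0.7, 240.0), ("business", 0.9, 280.0),
    ("resort", 0.5, 275.0), ("resort", 0.7, 265.0), ("resort", 0.9, 255.0),
]
models = fit_per_cluster(records)
for cid, (slope, intercept) in models.items():
    print(cid, round(slope, 1), round(intercept, 1))
```

      A single line fitted to all six hotels would average the two opposite slopes away, which is the motivation for splitting the data before modeling.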
  21. • Customer Description – CC Company – “Who” vs. “How” to Talk to Customers
      • Hotel Price Optimization – Using Clusters as Non-Linear Constraints
      • Retail Supply Chain – Planning Replenishment for 52 Week Demand Curves
  22. • The “Retail Supply Chain” is from
        ◦ the manufacturer to
        ◦ distribution center to
        ◦ Warehouse to
        ◦ Store to Consumer
      • Replenishment is to re-supply products on the shelves
        ◦ Minimize overstock and understock
        ◦ Heavy understock causes LOSS OF SALES
        ◦ Heavy overstock causes 30% end of season liquidation
  23. • 4,000 stores
      • 100,000 products/SKUs (stock keeping units)
        ◦ 400 million store-product combinations
      • 52 weeks per year
        ◦ 20.8 billion store-product-week combinations
      • Not the smallest problem in the mid-90’s
      • Holidays shift in week number, from year to year – need to adjust
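      The problem size on this slide follows directly from the three counts. The short check below reproduces the arithmetic; the storage estimate at the end assumes 4 bytes per weekly value, which is an illustrative assumption and not a figure from the talk.

```python
stores = 4_000
skus = 100_000
weeks = 52

store_sku = stores * skus            # store-product combinations
store_sku_week = store_sku * weeks   # store-product-week combinations

# Assumption for illustration only: one 4-byte number per weekly value
bytes_per_value = 4
raw_gb = store_sku_week * bytes_per_value / 1e9

print(f"{store_sku:,} store-SKU pairs")
print(f"{store_sku_week:,} store-SKU-week values")
print(f"about {raw_gb:.1f} GB at {bytes_per_value} bytes per value")
```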
  24. • End up creating 2,000+ “profiles” or centroids
      • Assign new store-SKUs to an existing profile
      • If it doesn’t match (within a radius)…
        ◦ Re-run cluster analysis
        ◦ Lock existing centroids
        ◦ Create new centroids for data points outside
        ◦ Add to the “profile library”
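      The assign-or-extend logic can be sketched as below. This is a simplified illustration, not the original system: existing centroids are never moved (they are “locked”), and each curve that falls outside the radius of every profile simply becomes a new profile, which stands in for re-running a full cluster analysis on the outside points. The function names, the Euclidean distance on share-of-year curves, and the radius value are assumptions.

```python
import math


def normalize(curve):
    """Scale a 52-week demand curve so its values sum to 1 (share of annual sales)."""
    total = sum(curve)
    return tuple(v / total for v in curve)


def assign_or_extend(library, curves, radius):
    """Assign each curve to its nearest profile, or add it as a new profile.

    library: list of existing (locked) profile curves; it is copied, not modified.
    Returns (assignments, extended_library).
    """
    library = list(library)
    assignments = []
    for curve in curves:
        best = min(
            range(len(library)),
            key=lambda j: math.dist(curve, library[j]),
            default=None,
        )
        if best is None or math.dist(curve, library[best]) > radius:
            library.append(tuple(curve))  # new centroid for a point outside all radii
            best = len(library) - 1
        assignments.append(best)
    return assignments, library


# Hypothetical profiles: flat demand, and a late-year holiday peak
flat = normalize([1] * 52)
winter = normalize([1] * 47 + [5] * 5)
profile_library = [flat, winter]

# Hypothetical new store-SKU curves: nearly flat, and a mid-year peak
near_flat = normalize([1] * 51 + [1.2])
summer = normalize([1] * 20 + [4] * 12 + [1] * 20)

assignments, extended = assign_or_extend(profile_library, [near_flat, summer], radius=0.05)
print(assignments, len(extended))
```

      Because the input library is copied, the caller decides when the extended list replaces the stored “profile library”.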
  25. • Bottom-up findings (after the fact)
        ◦ Buying hunting related items as the ducks migrate north
