SFbayACM.org Data Science Camp 
Saturday, October 25, 2014 
Greg Makowski 
Twitter Tag #DMCAMP
• Customer Description (Credit Card Company): “Who” vs. “How” to Talk to Customers
• Hotel Price Optimization: Using Clusters as Non-Linear Constraints
• Retail Supply Chain: Planning Replenishment for 52-Week Demand Curves
Case Study 1. Customer Description (Credit Card Company): “Who” vs. “How” to Talk to Customers
• Context:
  ◦ A major credit card company
  ◦ South American market
  ◦ Repeated for Argentina, Brazil… and the “dollar countries”
• Objectives or problem:
  ◦ How best to manage the customer population
  ◦ Develop a software system that can be rerun across geographies and over time
  ◦ How can understanding be AUTOMATED?
    ▪ How can naming the clusters be automated?
• Solution: 3 projects for each customer base
  ◦ “WHO” to talk to…
    ▪ Customer attrition model: neural network (5 algorithms tested)
      ▫ Decrease in spending over time
      ▫ Basic vs. supplemental cards
      ▫ By 7 categories
    ▪ Challenge: double-digit inflation in some countries (1990s)
      ▫ Standardize by monthly spending
    ▪ Mining factoid: the 11th digit of the credit card number was predictive
      ▫ Billing cycles? Monthly salaries + high inflation
    ▪ Customer profitability: net present value
  ◦ “HOW” to talk to them…
    ▪ Cluster analysis
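The deck summarizes customer profitability as a net present value. The sketch below shows that calculation. It assumes each customer has a projected stream of monthly profits and that a single annual discount rate applies. `customer_npv` and the example figures are hypothetical and do not come from the original project.

```python
def customer_npv(monthly_profits, annual_rate):
    """Discount a stream of projected monthly profits back to today.

    monthly_profits[0] is the profit expected at the end of month 1.
    annual_rate is a yearly discount rate (e.g. 0.12 for 12%).
    """
    # Convert the annual rate into the equivalent compounded monthly rate.
    monthly_rate = (1.0 + annual_rate) ** (1.0 / 12.0) - 1.0
    return sum(
        profit / (1.0 + monthly_rate) ** month
        for month, profit in enumerate(monthly_profits, start=1)
    )


if __name__ == "__main__":
    # Hypothetical customer: $20/month of profit for two years, 12% annual rate.
    print(round(customer_npv([20.0] * 24, 0.12), 2))
```

The result is the number that gets multiplied by attrition risk when ranking customers.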
[Figure: cumulative profit across 5% customer groups, TreeNet vs. random ordering; total profit per cell by Attrition × Profitability]
Select the 5-15% of customers “highest” in the spike.
83% of the profit lost to attrition was concentrated in the top 15%.
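The figure ranks customers by the combination of attrition risk and profitability. Below is a minimal sketch of that ranking. It assumes each customer already has a model-scored attrition probability and an NPV, and it uses their product as "expected profit at risk." `Customer`, `select_who`, `share_of_risk_captured`, and the field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    p_attrition: float  # model-scored probability of attrition (0..1)
    npv: float          # projected profitability (net present value)


def select_who(customers, top_fraction=0.15):
    """Return the top fraction of customers by expected profit at risk."""
    # Expected profit lost if nothing is done = P(attrition) * NPV.
    ranked = sorted(customers, key=lambda c: c.p_attrition * c.npv, reverse=True)
    n_keep = math.ceil(len(ranked) * top_fraction)
    return ranked[:n_keep]


def share_of_risk_captured(customers, selected):
    """Fraction of the total expected profit at risk held by `selected`."""
    total = sum(c.p_attrition * c.npv for c in customers)
    kept = sum(c.p_attrition * c.npv for c in selected)
    return kept / total if total else 0.0
```

`share_of_risk_captured` computes the same kind of statistic as the "83% in the top 15%" figure. Its value depends entirely on the data it is given.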
• How to design the cluster analysis?
  ◦ Select the top fields from the neural network
    ▪ Sensitivity analysis on the NN
    ▪ % spending by category
      ▫ Restaurant, Retail, Grocery, Hotel, Air, Auto, …
    ▪ Trend over time (slope, expected future value)
  ◦ Decide to create 8-12 clusters, or customer segments, to communicate to marketers
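The "% spending by category" and "trend over time" fields can be built in one function. So can the earlier step of standardizing by monthly spending. This is a minimal sketch, assuming one customer's history is stored as a months × categories array. It treats "standardize" as each category's share of that month's total, and it computes the trend with a least-squares line on those shares. Both are assumptions, and `spend_features` is a hypothetical name.

```python
import numpy as np


def spend_features(monthly_spend):
    """Summarize one customer's spending history.

    monthly_spend: array of shape (n_months, n_categories), n_months >= 2.
    Returns (mean share per category, slope of each share per month,
             projected share per category for the next month).
    """
    spend = np.asarray(monthly_spend, dtype=float)
    totals = spend.sum(axis=1, keepdims=True)
    # Each category as a fraction of that month's total. Inflation that
    # raises all categories by the same factor cancels out of these ratios.
    shares = np.divide(spend, totals, out=np.zeros_like(spend), where=totals > 0)

    # Fit a straight line to each category's share over time.
    months = np.arange(spend.shape[0])
    slope, intercept = np.polyfit(months, shares, deg=1)
    next_share = slope * spend.shape[0] + intercept  # "expected future value"
    return shares.mean(axis=0), slope, next_share
```

Spending that doubles every month in every category yields flat shares and a zero slope. A real shift in the mix shows up as a nonzero slope.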
• Consider scalability
  ◦ 100k-500k customers
  ◦ Some clustering methods are O(n), others O(n²)
  ◦ Use k-means to create 100 clusters in O(n)
  ◦ Then use O(n²) methods to reduce those 100 clusters to 8-12 clusters
  ◦ This uses all the data scalably, while still allowing a more sophisticated hierarchical cluster search
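The two-stage approach can be sketched with NumPy.

- **Stage 1:** Lloyd's k-means, which is linear in the number of customers per iteration, compresses the data to about 100 centroids.
- **Stage 2:** an agglomerative merge runs on those centroids only. It uses the Ward cost, weighted by how many customers each centroid holds.

The Ward criterion is an assumption, since the deck does not name the hierarchical method. The function names are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; each iteration costs O(n * k * d)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)


def ward_merge(centroids, sizes, n_final):
    """Merge weighted centroids until n_final groups remain.

    Merge cost is Ward's n_a*n_b/(n_a+n_b) * ||c_a - c_b||^2.
    Returns the final group index of each input centroid.
    """
    cents = np.asarray(centroids, dtype=float).copy()
    ns = np.asarray(sizes, dtype=float).copy()
    groups = [[i] for i in range(len(cents))]
    while len(cents) > n_final:
        d2 = ((cents[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        cost = (ns[:, None] * ns[None, :]) / (ns[:, None] + ns[None, :]) * d2
        np.fill_diagonal(cost, np.inf)
        a, b = np.unravel_index(np.argmin(cost), cost.shape)
        a, b = min(a, b), max(a, b)
        # Fold group b into group a: size-weighted mean, combined members.
        total = ns[a] + ns[b]
        cents[a] = (ns[a] * cents[a] + ns[b] * cents[b]) / total
        ns[a] = total
        groups[a].extend(groups[b])
        cents = np.delete(cents, b, axis=0)
        ns = np.delete(ns, b)
        del groups[b]
    labels = np.empty(len(centroids), dtype=int)
    for g, members in enumerate(groups):
        labels[members] = g
    return labels


def two_stage_cluster(X, n_micro=100, n_final=10, seed=0):
    """k-means down to n_micro centroids, then Ward merging to n_final."""
    X = np.asarray(X, dtype=float)
    centroids, micro = kmeans(X, n_micro, seed=seed)
    sizes = np.bincount(micro, minlength=n_micro)
    used = np.flatnonzero(sizes)  # drop micro-clusters that ended up empty
    remap = np.full(n_micro, -1)
    remap[used] = np.arange(len(used))
    merged = ward_merge(centroids[used], sizes[used], n_final)
    return merged[remap[micro]]
```

The expensive pairwise step only ever sees the ~100 centroids, never the 100k-500k customers.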
[Figure: one-page cluster summary table]
  Columns: clusters ordered from most customers to least: ALL (100%), 1 (36%), 2 (22%), …, N (5%)
  Rows: fields ordered by importance
  Cells: shaded from min to MAX
  Most: Var X, Y, Z
  Least: Var A, B, C
There may be 12 clusters and 36 variables; each cluster may then have 6 attributes to use in naming.
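The min/MAX shading suggests one way to automate naming. Compare each cluster's mean on every field with the overall mean, in standard-deviation units. Then keep the few fields where the cluster sits furthest above average ("Most") and furthest below ("Least"). This is a minimal sketch, assuming numeric fields and already-assigned cluster labels. `naming_attributes` and the z-score rule are hypothetical, not necessarily the method used in the talk.

```python
import numpy as np


def naming_attributes(X, labels, field_names, n_top=3):
    """For each cluster, return the fields it is highest and lowest on.

    Each field is scored as (cluster mean - overall mean) / overall std.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    overall_mean = X.mean(axis=0)
    overall_std = X.std(axis=0)
    overall_std[overall_std == 0] = 1.0  # constant fields carry no signal
    summary = {}
    for cluster in np.unique(labels):
        z = (X[labels == cluster].mean(axis=0) - overall_mean) / overall_std
        order = np.argsort(z)  # ascending: lowest z first
        summary[cluster.item()] = {
            "most": [field_names[i] for i in order[::-1][:n_top]],
            "least": [field_names[i] for i in order[:n_top]],
        }
    return summary
```

With `n_top=3`, the default gives each cluster about 6 naming attributes (3 high, 3 low), matching the count in the summary.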
• Select “WHO” with (Attrition) × (Profitability)
• Select “HOW” with cluster segments
  ◦ Given the variable selection, only a few clusters matched most of the 15% subset of customers to manage
• Marketers could clearly understand the different audiences and their reasons for attrition, and could write better communication copy
• About 50 executives walked around with the one-page cluster summary in their pockets and frequently used it to plan customer strategies
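The claim that only a few clusters covered most of the selected 15% can be checked with a simple count. Take the cluster IDs of the selected customers, then accumulate clusters from largest to smallest until a target share is covered. This is a minimal sketch; `clusters_covering` and the 80% default are hypothetical.

```python
from collections import Counter


def clusters_covering(selected_cluster_ids, coverage=0.8):
    """Smallest set of clusters (largest first) holding `coverage` of the selection."""
    counts = Counter(selected_cluster_ids)
    total = sum(counts.values())
    chosen, covered = [], 0
    for cluster, n in counts.most_common():
        if covered >= coverage * total:
            break
        chosen.append(cluster)
        covered += n
    return chosen, (covered / total if total else 0.0)
```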
[Figure: CRM analysis types. Labels: Analysis Type, CRM Behavior, Media, Message, $$$; Best Customers, Upgrade, Downgrade, Loyal, Loyalty, Cross-Sell, Prospect, Segment, Reactivation, Attrition, Retention, Fraud]
Case Study 2. Hotel Price Optimization: Using Clusters as Non-Linear Constraints
• Objective:
  ◦ Optimize pricing for hotel rooms
  ◦ Take into account geography and type of use
    ▪ Weekend, vacation, business, conference, …
    ▪ Seasons of the year, as they relate to demand
• The hotel company owns many brands (chains) focused on different audiences
  ◦ Different price tiers, target audiences, …
  ◦ Hotel, motel, extended stay, …
  ◦ What “lessons learned” carry across brands?
• Revenue management is a general process used to
  ◦ optimize profit
  ◦ given the remaining inventory (plane seats or hotel rooms)
  ◦ and the remaining time until that inventory is gone
• Operations research
  ◦ Linear or non-linear programming
    ▪ Linear or non-linear in either the constraints or the objective function
  ◦ Needs an objective function to optimize
• Train predictive models to forecast price, given conditions
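Revenue management in this sense can be illustrated with the standard single-resource dynamic-pricing recursion. With x rooms left and t booking periods to go, choose the price that maximizes immediate expected revenue plus the value of what remains:

V(t, x) = max over p of [ q(p) · (p + V(t−1, x−1)) + (1 − q(p)) · V(t−1, x) ]

Here q(p) is the probability that the period's booking request accepts price p. This is a minimal sketch, assuming at most one request per period and known, fixed probabilities. The prices, probabilities, and function names are hypothetical. It is a textbook formulation, not the hotel's actual model.

```python
from functools import lru_cache


def make_value_function(prices, buy_prob):
    """Build V(t, x): expected revenue with t periods and x rooms left.

    prices[i] is a candidate nightly rate; buy_prob[i] is the assumed
    probability that the single request arriving in a period books at it.
    """
    options = list(zip(prices, buy_prob))

    @lru_cache(maxsize=None)
    def value(t, x):
        if t <= 0 or x <= 0:
            return 0.0  # no time or no rooms left: nothing more to earn
        keep = value(t - 1, x)      # future value if this period does not book
        sell = value(t - 1, x - 1)  # future value if it books one room
        return max(q * (p + sell) + (1 - q) * keep for p, q in options)

    return value


def best_price(value, prices, buy_prob, t, x):
    """Rate to post now, with t >= 1 periods and x >= 1 rooms remaining."""
    keep, sell = value(t - 1, x), value(t - 1, x - 1)
    return max(
        zip(prices, buy_prob),
        key=lambda option: option[1] * (option[0] + sell) + (1 - option[1]) * keep,
    )[0]
```

With one room and two periods left, the recursion picks a higher rate than it would with one period left. Time remaining makes it worthwhile to hold out for more. The recursion depth equals t, so very long horizons would need an iterative table instead.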
• Data mining and operations research design
  ◦ When training predictive models, it helps for each model to learn behavior that is “in the same ballpark.”
  ◦ If the underlying thought process is fairly different, subdivide the data into subsets and train a different model on each. For example:
    ▪ Attrition: checking, credit card, line of credit, mortgage
    ▪ Mortgage bond pricing: monthly prepayment of none vs. 100s vs. 1,000s vs. a full refinance
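Whether separate models pay off can be checked directly. Fit one model to all the data and one model per segment, then compare errors. This is a minimal sketch using straight-line fits. It assumes the segment labels are known in advance and each segment has at least two points. `segmented_vs_global` is a hypothetical name.

```python
import numpy as np


def segmented_vs_global(x, y, segments):
    """Mean squared error of one global line vs. one line per segment."""
    x, y, segments = map(np.asarray, (x, y, segments))
    # One straight line for everything.
    coef = np.polyfit(x, y, deg=1)
    global_mse = np.mean((np.polyval(coef, x) - y) ** 2)

    # A separate straight line inside each segment.
    residuals = np.empty_like(y, dtype=float)
    for seg in np.unique(segments):
        mask = segments == seg
        seg_coef = np.polyfit(x[mask], y[mask], deg=1)
        residuals[mask] = np.polyval(seg_coef, x[mask]) - y[mask]
    segmented_mse = np.mean(residuals ** 2)
    return global_mse, segmented_mse
```

When the segments follow opposite trends, the single global line fits neither. The per-segment lines fit both.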
• How do we group or divide individual hotels, given all their attributes?
  ◦ Brand, location, % utilization on weekdays or weekends, …
• Find bottom-up clusters, rather than making top-down assertions about the data
• For the cluster variables, use the best variables from the pricing predictive models (sound familiar?)
• Solution:
  ◦ 1) Build an initial predictive model of pricing and find the most important variables
  ◦ 2) Create 8-16 clusters using those variables
  ◦ 3) Within each cluster:
    ▪ A) Train a predictive model to use as the OR objective function
    ▪ B) Run a LINEAR OR price optimization on that data subset
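Steps 2 and 3 can be sketched as follows. Within each cluster, fit a linear demand curve (rooms sold versus price). Then choose the price that maximizes revenue without selling more rooms than the cluster has. This is a minimal sketch, assuming cluster labels already exist and demand is roughly linear inside a cluster. It replaces the linear program with a plain grid search over candidate prices. `optimize_cluster_prices` and its parameters are hypothetical.

```python
import numpy as np


def optimize_cluster_prices(prices, rooms_sold, cluster_ids, capacity, candidates):
    """Return {cluster: (best_price, expected_revenue)}.

    prices, rooms_sold, cluster_ids: historical observations (1-D arrays).
    capacity: {cluster: rooms available}; candidates: prices to evaluate.
    """
    prices, rooms_sold, cluster_ids = map(np.asarray, (prices, rooms_sold, cluster_ids))
    candidates = np.asarray(candidates, dtype=float)
    result = {}
    for c in np.unique(cluster_ids):
        mask = cluster_ids == c
        # Per-cluster linear demand model: rooms_sold ~ a + b * price.
        b, a = np.polyfit(prices[mask], rooms_sold[mask], deg=1)
        # Demand cannot be negative or exceed the cluster's room capacity.
        demand = np.clip(a + b * candidates, 0.0, capacity[c.item()])
        revenue = candidates * demand
        i = int(np.argmax(revenue))
        result[c.item()] = (float(candidates[i]), float(revenue[i]))
    return result
```

Each cluster gets its own linear model, so the overall price response can be non-linear across clusters while staying linear within each one. When capacity binds, the best price moves above the unconstrained optimum.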
Case Study 3. Retail Supply Chain: Planning Replenishment for 52-Week Demand Curves
• The “retail supply chain” runs from
  ◦ the manufacturer, to the
  ◦ distribution center, to the
  ◦ warehouse, to the
  ◦ store, to the consumer
• Replenishment means re-supplying products on the shelves
  ◦ Minimize overstock and understock
  ◦ Heavy understock causes LOSS OF SALES
  ◦ Heavy overstock causes 30% end-of-season liquidation
• 4,000 stores
• 100,000 products/SKUs (stock keeping units)
  ◦ 400 million store-product combinations
• 52 weeks per year
  ◦ 20.8 billion store-product-week combinations
• Not the smallest problem in the mid-1990s
• Holidays shift in week number from year to year, so the curves need to be adjusted
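A holiday can fall in a different ISO week from one year to the next. U.S. Thanksgiving, the fourth Thursday of November, is one example. Each year's weekly series can be shifted so the holiday lands in the same reference week before curves are compared or averaged. This is a minimal sketch using only the standard library. The choice of holiday, the wrap-around at the year boundary, and the function names are hypothetical.

```python
from datetime import date


def thanksgiving(year):
    """U.S. Thanksgiving: the fourth Thursday of November."""
    nov1 = date(year, 11, 1)
    first_thursday = 1 + (3 - nov1.weekday()) % 7  # Monday=0 ... Thursday=3
    return date(year, 11, first_thursday + 21)


def holiday_week(year):
    """ISO week number in which the holiday falls."""
    return thanksgiving(year).isocalendar()[1]


def align_to_reference(weekly_sales, holiday_wk, reference_wk):
    """Rotate a weekly series so the holiday week moves to reference_wk.

    Weeks are 1-based. Values pushed past either end wrap around, a
    simplification that only matters for weeks near the year boundary.
    """
    n = len(weekly_sales)
    shift = (reference_wk - holiday_wk) % n
    if shift == 0:
        return list(weekly_sales)
    return list(weekly_sales[-shift:]) + list(weekly_sales[:-shift])
```

For example, Thanksgiving fell in ISO week 47 in 2012 and week 48 in 2014. The 2012 curve would therefore shift by one week before being compared with 2014.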
• End up creating 2,000+ “profiles,” or centroids
• Assign new store-SKUs to an existing profile
• If one doesn't match (within a radius)…
  ◦ Re-run the cluster analysis
  ◦ Lock the existing centroids
  ◦ Create new centroids for the data points outside
  ◦ Add them to the “profile library”
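The profile-library update has two parts:

1. Assign each new curve to its nearest existing profile if it lies within a distance cutoff.
2. Run k-means on the leftover curves only, keeping the existing centroids fixed.

This is a minimal sketch, assuming Euclidean distance on the weekly demand curves and a fixed number of new profiles per update. `update_profile_library`, `radius`, and `n_new` are hypothetical names, not the original system's.

```python
import numpy as np


def nearest(points, centroids):
    """Index of, and Euclidean distance to, the closest centroid per point."""
    d = np.sqrt(((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2))
    return d.argmin(axis=1), d.min(axis=1)


def update_profile_library(library, new_curves, radius, n_new, n_iter=20, seed=0):
    """Assign curves to locked profiles; cluster the misfits into new profiles.

    library: (k, n_weeks) array of existing centroids, never moved here.
    Returns (updated library, profile index for each new curve).
    """
    library = np.asarray(library, dtype=float)
    curves = np.asarray(new_curves, dtype=float)
    labels, dist = nearest(curves, library)
    outside = dist > radius
    if not outside.any():
        return library, labels

    # k-means on the unmatched curves only; existing centroids stay locked.
    misfits = curves[outside]
    rng = np.random.default_rng(seed)
    k = min(n_new, len(misfits))
    new_centroids = misfits[rng.choice(len(misfits), size=k, replace=False)]
    for _ in range(n_iter):
        lab, _ = nearest(misfits, new_centroids)
        for j in range(k):
            if (lab == j).any():
                new_centroids[j] = misfits[lab == j].mean(axis=0)
    lab, _ = nearest(misfits, new_centroids)

    labels[outside] = len(library) + lab  # new profiles get the next indices
    return np.vstack([library, new_centroids]), labels
```

Curves that already fit keep their profile. Only the misfits create new entries, so the library grows without disturbing existing assignments.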
• Bottom-up findings (after the fact)
  ◦ Purchases of hunting-related items followed the ducks as they migrated north
Three case studies deploying cluster analysis

 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 

Three case studies deploying cluster analysis

  • 1. SFbayACM.org Data Science Camp, Saturday, October 25, 2014. Greg Makowski. Twitter tag: #DMCAMP
  • 2.
      ▪ Customer Description (Credit Card Company): “Who” vs. “How” to Talk to Customers
      ▪ Hotel Price Optimization: Using Clusters as Non-Linear Constraints
      ▪ Retail Supply Chain: Planning Replenishment for 52-Week Demand Curves
  • 3.
      ▪ Context:
          ◦ Major credit card company
          ◦ South American market
          ◦ Repeat for Argentina, Brazil… and “dollar countries”
      ▪ Objectives or problem:
          ◦ How to best manage the customer population
          ◦ Develop a software system that can be repeated across geographies and over time
          ◦ How to AUTOMATE understanding?
              ▪ How to automate naming the clusters?
  • 4.
      ▪ Solution: 3 projects for each customer base
          ◦ “WHO” to talk to…
              ▪ Customer attrition model: neural network (5 algorithms tested)
                  ◦ Decrease in spending over time
                  ◦ Basic vs. supplemental cards
                  ◦ By 7 categories
                  ◦ Challenge: double-digit inflation in some countries (1990s)
                      ▪ Standardize by monthly spending
                  ◦ Mining factoid: credit card digit 11 was predictive
                      ▪ Billing cycles? Monthly salaries + high inflation
              ▪ Customer profitability: net present value
          ◦ “HOW” to talk to them…
              ▪ Cluster analysis
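Slide 4 says spending was standardized by monthly spending to cope with double-digit inflation, without saying exactly how. The sketch below is one reading, offered as an assumption: divide each customer's nominal spend in a month by the average spend of all customers in that month, then flag customers whose standardized spend has dropped. The function names, the 3-month window, and the 15% drop threshold are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def standardize_by_month(spend: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Divide each customer's spend in month t by the average spend of all customers in month t.

    Under high inflation nominal spend drifts upward for everyone, so a customer whose
    real spending is falling can still look flat. Dividing by the portfolio-wide monthly
    average removes that shared price-level trend.
    """
    customers = list(spend)
    n_months = len(spend[customers[0]])
    month_avg = [mean(spend[c][t] for c in customers) for t in range(n_months)]
    return {
        c: [s / avg if avg else 0.0 for s, avg in zip(spend[c], month_avg)]
        for c in customers
    }


def is_declining(series: List[float], window: int = 3, drop: float = 0.15) -> bool:
    """True if the mean of the last `window` months is at least `drop` (a fraction)
    below the mean of the first `window` months."""
    early, late = mean(series[:window]), mean(series[-window:])
    return early > 0 and (early - late) / early >= drop


# Hypothetical portfolio: A and C keep constant real spend under 10% monthly inflation;
# B's nominal spend is flat, so its real spend is shrinking.
spend = {
    "A": [100 * 1.1 ** t for t in range(6)],
    "B": [100.0] * 6,
    "C": [100 * 1.1 ** t for t in range(6)],
}
std = standardize_by_month(spend)
print(is_declining(spend["B"]), is_declining(std["B"]))  # False True
```

Because the numerator and denominator carry the same price level, the ratio shows customer B shrinking relative to the portfolio even though B's nominal spend never changes.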
  • 5.
      ▪ Consider scalability
          ◦ 100k-500k customers
          ◦ Some clustering methods are O(n), others O(n²)
          ◦ Use k-means to create 100 clusters: O(n)
          ◦ Then use O(n²) methods to reduce from 100 clusters down to 8-12 clusters
  • 6. Select the 5-15% of customers “highest” in the spike
      [Charts: cumulative profit by 5% customer groups, Tree-Net model vs. random; total profit per cell by attrition and profitability]
      83% of the profit lost to attrition was in the top 15%
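Slide 6's chart compares how much of the profit lost to attrition is captured by each 5% customer group when customers are ranked by a model rather than at random. A minimal sketch of that cumulative-capture calculation, assuming each customer has a model score (for example attrition probability × profitability, the ranking slide 13 describes) and a realized lost-profit value; the function name is hypothetical.

```python
from typing import List, Sequence


def cumulative_capture(
    scores: Sequence[float], lost_profit: Sequence[float], group_pct: float = 5.0
) -> List[float]:
    """Sort customers by score (highest first), cut them into group_pct-sized groups,
    and return the cumulative share of total lost profit captured after each group."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    total = sum(lost_profit)
    n_groups = round(100 / group_pct)
    shares, captured, start = [], 0.0, 0
    for g in range(1, n_groups + 1):
        end = round(g * n / n_groups)  # customers in groups 1..g
        captured += sum(lost_profit[i] for i in order[start:end])
        shares.append(captured / total if total else 0.0)
        start = end
    return shares
```

A random ordering captures roughly g × 5% after g groups; in these terms, the slide's result corresponds to the third entry (the top 15%) being about 0.83.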
  • 7.
      ▪ How to design the cluster analysis?
          ◦ Select the top fields from the neural network
              ▪ Sensitivity analysis on the NN
              ▪ % spending by category: restaurant, retail, grocery, hotel, air, auto, …
              ▪ Trend over time (slope, expected future value)
          ◦ Decide to create 8-12 clusters or customer segments to communicate to marketers
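Slide 7 selects clustering fields by running a sensitivity analysis on the attrition network. A minimal one-at-a-time sketch, assuming the trained model is available as a function from a feature matrix to scores: shift one field by one standard deviation, measure how far the predictions move, and rank the fields. `sensitivity_ranking`, the toy logistic model standing in for the network, and the field names are hypothetical.

```python
from typing import Callable, List, Tuple

import numpy as np


def sensitivity_ranking(
    predict: Callable[[np.ndarray], np.ndarray],
    X: np.ndarray,
    names: List[str],
    delta: float = 1.0,
) -> List[Tuple[float, str]]:
    """Rank fields by the mean absolute change in model output when that field alone
    is shifted up by `delta` standard deviations."""
    X = np.asarray(X, dtype=float)
    base = predict(X)
    std = X.std(axis=0)
    ranking = []
    for j, name in enumerate(names):
        shifted = X.copy()
        shifted[:, j] += delta * std[j]
        ranking.append((float(np.mean(np.abs(predict(shifted) - base))), name))
    return sorted(ranking, reverse=True)


def toy_model(A: np.ndarray) -> np.ndarray:
    # Stand-in for a trained network: logistic output of a fixed linear score.
    return 1 / (1 + np.exp(-(A @ np.array([3.0, 0.0, 1.0]))))


X = np.random.default_rng(0).normal(size=(500, 3))
ranked = sensitivity_ranking(toy_model, X, ["restaurant_pct", "grocery_pct", "spend_slope"])
print([name for _, name in ranked])  # ['restaurant_pct', 'spend_slope', 'grocery_pct']
```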
  • 9.
      ▪ Consider scalability
          ◦ 100k-500k customers
          ◦ Some clustering methods are O(n), others O(n²)
          ◦ Use k-means to create 100 clusters: O(n)
          ◦ Then use O(n²) methods to reduce from 100 clusters down to 8-12 clusters
          ◦ This uses all of the data scalably, and runs the more sophisticated hierarchical cluster search only on the 100 centroids
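The two-stage approach on slides 5 and 9 can be sketched as follows, assuming NumPy and numeric customer features. Stage 1 runs plain k-means (Lloyd's algorithm) over all customers to get 100 fine clusters; stage 2 runs Ward-style agglomerative merging on just those size-weighted centroids, so the quadratic step never touches the raw customers. The function names and defaults are illustrative, not the original system's.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, n_iter: int = 20, seed: int = 0):
    """Plain Lloyd's k-means; each iteration costs O(n * k * d)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    def assign(C: np.ndarray) -> np.ndarray:
        # Squared distances via ||x||^2 - 2 x.c + ||c||^2, avoiding an (n, k, d) temporary.
        d2 = (X ** 2).sum(axis=1)[:, None] - 2 * X @ C.T + (C ** 2).sum(axis=1)[None, :]
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    labels = assign(centroids)
    return centroids, labels, np.bincount(labels, minlength=k)


def ward_merge(centroids: np.ndarray, sizes: np.ndarray, n_final: int):
    """Greedy Ward agglomeration of size-weighted centroids.

    Merging a and b raises the within-cluster sum of squares by
    w_a * w_b / (w_a + w_b) * ||c_a - c_b||^2, so always merge the cheapest pair.
    This naive loop is O(m^3) in the number of centroids m, fine for m = 100.
    """
    keep = [i for i in range(len(centroids)) if sizes[i] > 0]
    cents = [centroids[i].copy() for i in keep]
    weights = [float(sizes[i]) for i in keep]
    groups = [[i] for i in keep]
    while len(cents) > n_final:
        best = None
        for a in range(len(cents)):
            for b in range(a + 1, len(cents)):
                gap = cents[a] - cents[b]
                cost = weights[a] * weights[b] / (weights[a] + weights[b]) * float(gap @ gap)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        total = weights[a] + weights[b]
        cents[a] = (weights[a] * cents[a] + weights[b] * cents[b]) / total
        weights[a] = total
        groups[a].extend(groups[b])
        del cents[b], weights[b], groups[b]
    return groups


def two_stage_segments(X, n_fine: int = 100, n_final: int = 10, seed: int = 0) -> np.ndarray:
    """Stage 1: k-means over all customers. Stage 2: Ward merge of the fine centroids."""
    X = np.asarray(X, dtype=float)
    centroids, labels, sizes = kmeans(X, n_fine, seed=seed)
    groups = ward_merge(centroids, sizes, n_final)
    fine_to_final = np.full(n_fine, -1)
    for segment, members in enumerate(groups):
        fine_to_final[members] = segment
    return fine_to_final[labels]
```

For example, `two_stage_segments(features, n_fine=100, n_final=10)` returns one segment id per customer. For 500k customers a production version would batch the distance computation or use a library implementation of k-means.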
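  • 10. [Chart: cluster profile grid. Columns are ALL customers (100%) followed by clusters 1, 2, …, N, ordered from most to fewest customers (36%, 22%, …, 5%); rows are fields ordered by importance.]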
  • 11. [The same cluster profile grid, with a min-to-MAX scale added.]
  • 12.
      ▪ Most: Var X, Y, Z
      ▪ Least: Var A, B, C
      ▪ There may be 12 clusters and 36 variables; each cluster may then have 6 attributes to use in naming
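Slides 10-12 describe naming each cluster from the fields where it differs most from the ALL column. A minimal sketch of that idea, assuming numeric fields and using the z-score of each cluster's mean against the all-customer mean as the min-to-MAX measure (the z-score choice and the function name are assumptions). With `n_top=3` it returns 3 “most” and 3 “least” fields, the 6 naming attributes per cluster mentioned on slide 12.

```python
from typing import Dict, List, Tuple

import numpy as np


def describe_clusters(
    X, labels, names: List[str], n_top: int = 3
) -> Dict[int, Tuple[List[str], List[str]]]:
    """For each cluster, list the fields whose cluster mean sits furthest above ("most")
    and furthest below ("least") the all-customer mean, in standard-deviation units."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    overall_mean = X.mean(axis=0)
    overall_std = X.std(axis=0)
    overall_std[overall_std == 0] = 1.0  # constant fields carry no signal; avoid dividing by zero
    profiles = {}
    for c in np.unique(labels):
        z = (X[labels == c].mean(axis=0) - overall_mean) / overall_std
        order = np.argsort(z)  # ascending
        most = [names[j] for j in order[::-1][:n_top]]
        least = [names[j] for j in order[:n_top]]
        profiles[int(c)] = (most, least)
    return profiles
```

A label can then be assembled mechanically, for example `f"High {', '.join(most)}; low {', '.join(least)}"`, and refined by a marketer.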
  • 13.
      ▪ Select “WHO” with (attrition) × (profitability)
      ▪ Select “HOW” with cluster segments
          ◦ Given the variable selection, only a few clusters matched most of the 15% subset of customers to manage
      ▪ Marketers could clearly understand the different audiences and the reasons for attrition, and could write better copy for communication
      ▪ About 50 executives walked around with the one-page cluster summary in their pocket, and frequently used it to plan customer strategies
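Slide 13 notes that only a few clusters covered most of the selected 15% of customers. A small sketch of that check, assuming one cluster label per customer and a boolean flag for membership in the selected subset; the function name and the 80% coverage target are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def clusters_covering(
    labels: Sequence[int], selected: Sequence[bool], target: float = 0.8
) -> List[Tuple[int, float]]:
    """Return the fewest clusters (largest share first) that together hold at least
    `target` of the selected customers, with the cumulative share after each cluster."""
    counts = Counter(label for label, keep in zip(labels, selected) if keep)
    total = sum(counts.values())
    covered, running = [], 0
    for label, n in counts.most_common():
        running += n
        covered.append((label, running / total))
        if running / total >= target:
            break
    return covered
```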
  • 14. [Diagram relating analysis types to CRM behavior, media, message, and $$$. Items: best customers; upgrade, downgrade; loyal; loyalty; cross-sell; prospect; segment; reactivation; attrition; retention; fraud.]
  • 15. Case study 2 of 3: Hotel Price Optimization, Using Clusters as Non-Linear Constraints
  • 16.
      ▪ Objective:
          ◦ Optimize pricing for hotel rooms
          ◦ Take into account geography and use
              ▪ Weekend, vacation, business, conference, …
              ▪ Seasons of the year as they relate to demand
      ▪ The hotel company owns many brands (chains) focused on different audiences
          ◦ Different price tiers, target audiences, …
          ◦ Hotel, motel, extended stay, …
          ◦ What “lessons learned” carry across brands?
  • 17.
      ▪ Revenue management is a general process used to
          ◦ optimize profit
          ◦ given the remaining (plane seat or hotel room) inventory
          ◦ and the remaining time until the inventory is gone
      ▪ Operations research
          ◦ Linear or non-linear programming
              ▪ The linearity or non-linearity can be in either the constraints or the objective function
          ◦ Need an objective function to optimize
              ▪ Train predictive models to forecast price, given conditions
  • 18.
      ▪ Data mining and operations research design
          ◦ When training predictive models, it helps for one model to learn behavior that is “in the same ballpark”.
          ◦ If the underlying thought process is fairly different, subdivide the data into different subsets and train different models. For example:
              ▪ Attrition: checking, credit card, line of credit, mortgage
              ▪ Mortgage bond pricing: monthly prepayment of none vs. hundreds vs. thousands vs. a full refinance
  • 19.
      ▪ How do we group or divide individual hotels, given all of their attributes?
          ◦ Brand, location, % utilization on weekdays or weekends
      ▪ Find bottom-up clusters, rather than making top-down assertions about the data
      ▪ For the cluster variables, use the best variables from the pricing predictive models (sound familiar?)
  • 20.
      ▪ Solution:
          ◦ 1) Build an initial predictive model of pricing and find the most important variables
          ◦ 2) Create 8-16 clusters using those variables
          ◦ 3) Within each cluster:
              ▪ A) Train a predictive model for use as the OR objective function
              ▪ B) Run a LINEAR OR price optimization on that data subset
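Slide 20 builds one pricing model and one optimization per cluster. The sketch below shows that per-cluster structure under simplifying assumptions that are not from the slides: each cluster's demand is modeled as a straight line in price, fitted with `np.polyfit`, and in place of the slide's linear OR program it picks the single revenue-maximizing price in closed form, subject to a room capacity per cluster. Because every cluster gets its own line, the combined price response is piecewise, which is one reading of how clusters act as non-linear constraints. `price_per_cluster` and its arguments are hypothetical.

```python
from typing import Dict, Sequence

import numpy as np


def price_per_cluster(
    price: Sequence[float],
    demand: Sequence[float],
    cluster: Sequence[int],
    capacity: Dict[int, float],
) -> Dict[int, float]:
    """Per cluster: fit demand ~ a + b * price, then choose the price within the observed
    range that maximizes revenue p * min(a + b * p, capacity)."""
    price = np.asarray(price, dtype=float)
    demand = np.asarray(demand, dtype=float)
    cluster = np.asarray(cluster)
    best = {}
    for c in np.unique(cluster):
        mask = cluster == c
        b, a = np.polyfit(price[mask], demand[mask], deg=1)  # slope, intercept
        lo, hi = price[mask].min(), price[mask].max()
        cap = capacity[int(c)]
        if b < 0:
            unconstrained = -a / (2 * b)  # peak of p * (a + b * p)
            sell_out = (cap - a) / b  # below this price, forecast demand exceeds capacity
            p = max(unconstrained, sell_out)
        else:
            p = hi  # demand does not fall with price in this data; take the top of the range
        best[int(c)] = float(np.clip(p, lo, hi))
    return best
```

Below the sell-out price, forecast demand exceeds the rooms available, so revenue is capped at price × capacity and still rises with price; the optimum is therefore the larger of the unconstrained peak and the sell-out price, clipped to the prices actually observed in that cluster.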
  • 21. Case study 3 of 3: Retail Supply Chain, Planning Replenishment for 52-Week Demand Curves
  • 22.
      ▪ The “retail supply chain” runs from
          ◦ the manufacturer, to
          ◦ the distribution center, to
          ◦ the warehouse, to
          ◦ the store, to the consumer
      ▪ Replenishment is re-supplying the products on the shelves
          ◦ Minimize overstock and understock
          ◦ Heavy understock causes LOSS OF SALES
          ◦ Heavy overstock causes 30% end-of-season liquidation
  • 23.
      ▪ 4,000 stores
      ▪ 100,000 products/SKUs (stock keeping units)
          ◦ 4,000 × 100,000 = 400 million store-product combinations
      ▪ 52 weeks per year
          ◦ 400 million × 52 = 20.8 billion store-product-week combinations
      ▪ Not the smallest problem in the mid-1990s
      ▪ Holidays shift in week number from year to year, so the curves need to be adjusted
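Slide 23 notes that holidays move to a different week number each year, so 52-week curves have to be realigned. A minimal sketch using Easter as the example holiday (the slide does not name one): compute its date with the anonymous Gregorian (Meeus/Jones/Butcher) algorithm, take its ISO week via `date.isocalendar()`, and rotate a weekly curve so the holiday lands on a fixed reference week. Rotation wraps the year end around and ignores 53-week ISO years; both are simplifying assumptions, as are the function names.

```python
from datetime import date
from typing import List


def easter(year: int) -> date:
    """Gregorian Easter Sunday via the anonymous Gregorian algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    el = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * el) // 451
    month, day = divmod(h + el - 7 * m + 114, 31)
    return date(year, month, day + 1)


def align_to_reference(weekly: List[float], holiday_week: int, reference_week: int) -> List[float]:
    """Rotate a weekly curve so the value at holiday_week moves to reference_week (1-based)."""
    shift = (reference_week - holiday_week) % len(weekly)
    return list(weekly[-shift:]) + list(weekly[:-shift]) if shift else list(weekly)


print(easter(2013).isocalendar()[1], easter(2014).isocalendar()[1])  # 13 16
```

So a 2013 curve with its Easter spike in week 13 would be rotated by three weeks to compare it with a 2014 curve whose spike falls in week 16.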
  • 24.
  • 25.
      ▪ End up creating 2,000+ “profiles” or centroids
      ▪ Assign new store-SKUs to an existing profile
      ▪ If one doesn’t match (within a radius)…
          ◦ Re-run the cluster analysis
          ◦ Lock the existing centroids
          ◦ Create new centroids for the data points outside
          ◦ Add them to the “profile library”
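Slide 25's profile library can be sketched as below, assuming each store-SKU is represented as a numeric 52-week demand vector (normalizing it to a shape, such as weekly share of annual units, would be an extra assumption not shown here). Rows within `radius` of an existing profile are assigned to it; otherwise a k-means pass runs with the existing centroids locked, and only the new centroids, seeded from the unmatched rows, are allowed to move. The function names, `radius`, and `n_new` are hypothetical.

```python
import numpy as np


def nearest(X: np.ndarray, C: np.ndarray):
    """Index of, and Euclidean distance to, the closest row of C for every row of X."""
    d = np.sqrt(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2))
    return d.argmin(axis=1), d.min(axis=1)


def update_profile_library(library, X, radius: float, n_new: int, n_iter: int = 20, seed: int = 0):
    """Assign rows of X to existing profiles when within `radius`; otherwise grow the library.

    New profiles come from a k-means run in which the existing (locked) centroids never move.
    """
    library = np.asarray(library, dtype=float)
    X = np.asarray(X, dtype=float)
    labels, dist = nearest(X, library)
    outside = dist > radius
    if not outside.any():
        return library, labels
    rng = np.random.default_rng(seed)
    unmatched = X[outside]
    seeds = unmatched[rng.choice(len(unmatched), size=min(n_new, len(unmatched)), replace=False)]
    n_locked = len(library)
    centroids = np.vstack([library, seeds])  # copy: the caller's library is untouched
    for _ in range(n_iter):
        labels, _ = nearest(X, centroids)
        for j in range(n_locked, len(centroids)):  # rows 0..n_locked-1 stay locked
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    labels, _ = nearest(X, centroids)
    return centroids, labels
```

The pairwise distance array is n × m × d, so with 2,000+ profiles the store-SKU rows would need to be processed in batches.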
  • 26.
      ▪ Bottom-up findings (after the fact)
          ◦ Buying hunting-related items as the ducks migrate north