UNSUPERVISED AND SUPERVISED MACHINE LEARNING IN MARKETING WITH R
Application to Typing Tools
By Bryan Butler
GOAL: DEVELOP A DYNAMIC SEGMENTATION TOOL
• Develop a segmentation engine that takes in customer survey responses and segments respondents according to their needs
• Segmentation is a very common market research revenue driver
• A critical aspect of segmentation analysis is validation and reproducibility of the model
  • Do the segments hold up over time?
• Behavioral/psychographic segmentation can be blended with traditional demographic or other segmentations for a finer approach
  • Provides a multi-dimensional approach to explain WHY a segment acts in certain ways
• In this data set, the survey is designed to reveal a series of attributes that help explain why a customer chooses or does not choose to use a company’s products and services
• The tool rates the company on customer connection attributes
PARAMETERS AND CONSTRAINTS
• Client specification: final tool must be built into Excel
  • Common client request, but with significant impact on the choice of model, process, tools, etc.
• Sample size
  • ~900 respondents may not be enough for a larger number of segments
  • Fit 3-4 segments
• Requires unsupervised learning as a first step
  • There is no dependent or outcome variable already in the dataset
• Supervised learning to predict clusters
  • Dimensionality reduction is an important part of the process
• Questions to consider:
  • How much error is acceptable to the end user?
  • Are there penalties for false positives and false negatives?
PROJECT ROADMAP
• Design survey, collect a sufficiently large dataset
• Hierarchical Clustering: find clusters using unsupervised learning
• Create dummy variables for each segment
• Multinomial modeling assumptions not likely to hold
• Supervised learning with GLMNET
• Reduce dimensionality
• Fit reduced logistic regressions to each segment
• Employs a “voting” method to choose segment
• Easily embedded in Excel
• Can see the exact drivers of each segment
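The dummy-variable step in the roadmap can be sketched as follows. This is a minimal Python illustration (the deck's own work was done in R); the function name `one_vs_rest_dummies` and the toy cluster labels are hypothetical and not taken from the project data. Each segment gets its own 0/1 target so that a separate binary model can be fit per segment.

```python
def one_vs_rest_dummies(cluster_labels):
    """Turn cluster labels into one 0/1 target column per segment."""
    segments = sorted(set(cluster_labels))
    return {
        segment: [1 if label == segment else 0 for label in cluster_labels]
        for segment in segments
    }


# Toy cluster labels for six respondents (illustrative only).
labels = [1, 3, 2, 1, 3, 3]
dummies = one_vs_rest_dummies(labels)
for segment, column in dummies.items():
    print(f"Segment{segment}:", column)
```

Each column then serves as the outcome for one binary model, which sidesteps the multinomial assumptions the roadmap flags as unlikely to hold.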
BEST CLUSTERS OF 3 OR 4
The bend (elbow) in the plot indicates the number of segments
DENDROGRAM OF 3 CLUSTERS
Small Segment – Difficult to Predict
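Hierarchical clustering builds a dendrogram by repeatedly merging the two closest clusters; cutting the tree at a chosen number of clusters yields the segments. The Python sketch below is a toy illustration only (the deck used R). It assumes Euclidean distance and complete linkage, neither of which the deck specifies, and the eight points are made up rather than survey data.

```python
from itertools import combinations
from math import dist


def agglomerative(points, k):
    """Complete-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    merge_heights = []
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the farthest pair of members.
            d = max(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merge_heights.append(d)
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters, merge_heights


# Made-up 2-D points forming two groups of three and one group of two.
toy_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 9), (2, 9)]
clusters, heights = agglomerative(toy_points, k=3)
print(sorted(sorted(c) for c in clusters))
print([round(h, 3) for h in heights])
```

The merge heights are what a dendrogram draws on its vertical axis. Even in this toy data one cluster ends up much smaller than the others, which is the situation the slide flags as difficult to predict.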
OVERLAY CLUSTER ANALYSIS TO EXISTING SEGMENTS
Psychographic segmentation consists of three groups vs 8 stated segments
Reinforces the selection of 3 segments over 4
Responses to the questions when compared to the segments are shown below:

Stated segment                   1   2   3   4   5   6   7   8
Care Organization               25  48   2   2  29   2   0   6
Convenience Store/Reseller       4  15   0   0  13   0   0   1
Foodservice/Restaurant          13  24   1   5  23   2   0   7
Large Family                    35  56   1   2  48   1   1   1
Neighborhood Family             20  51   1   2  20   2   0   6
New Mom                         32  57   2   7  25   2   1   1
Professional Services Business  30  69   0   1  38   2   0   6
Social Couple                   39  76   2   2  31   4   0   6
SUPERVISED LEARNING - GLMNET
• Choose GLMNET model for high performance
  • Expect to find the upper bound of accuracy
  • Easier to interpret than RF, GBM
• Create dummy variables for each segment
  • End result will be three binary/logistic regressions; one for each segment
• Use the probability output rather than classification to allow for “voting”
  • Ex. Prob(Segment1) = .21, Prob(Segment2) = .55, Prob(Segment3) = .90
  • Respondent assigned to Segment3
• Split data into training and testing sets
  • Use a 70/30 split
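The “voting” rule amounts to picking the segment whose model reports the highest probability. A minimal Python sketch (the deck itself used R), using the probabilities from the example on this slide; the function name `assign_by_vote` is hypothetical.

```python
def assign_by_vote(segment_probabilities):
    """Return the segment whose one-vs-rest model gives the highest probability."""
    return max(segment_probabilities, key=segment_probabilities.get)


# Probabilities from the slide's example.
example = {"Segment1": 0.21, "Segment2": 0.55, "Segment3": 0.90}
winner = assign_by_vote(example)
print(winner)
```

Because the three probabilities come from separate binary models, they need not sum to 1 (here they sum to 1.66). In this sketch a tie goes to whichever segment is listed first, which is an arbitrary choice.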
GLMNET PERFORMANCE ON SEGMENT 1 – VERY HIGH
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 164   1
         1   1 104

Accuracy : 0.9926
95% CI : (0.9735, 0.9991)
No Information Rate : 0.6111
P-Value [Acc > NIR] : <2e-16

With GLMNET, Segments 2 and 3 had accuracies of 94% and 80%
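The accuracy and No Information Rate follow directly from the four cell counts. The Python sketch below (illustrative; the deck used R) recomputes both for the Segment 1 GLMNET matrix. The function name is hypothetical; the counts are the ones on the slide.

```python
def accuracy_and_nir(matrix):
    """matrix[i][j] = count predicted as class i whose reference class is j."""
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n_classes))
    # No Information Rate: share of the largest reference class (column totals).
    reference_totals = [sum(row[j] for row in matrix) for j in range(n_classes)]
    return correct / total, max(reference_totals) / total


glmnet_segment1 = [[164, 1], [1, 104]]
accuracy, nir = accuracy_and_nir(glmnet_segment1)
print(f"Accuracy: {accuracy:.4f}")             # 268 / 270
print(f"No Information Rate: {nir:.4f}")       # 165 / 270
```

The 270 test cases are consistent with a 30% hold-out from roughly 900 respondents.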
VARIABLE IMPORTANCE SEGMENT 1 - GLMNET
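For context on why GLMNET can double as a dimensionality-reduction step: as documented for the glmnet package, a binomial fit minimizes a penalized negative log-likelihood of the elastic-net form below. Here y_i is the 0/1 segment dummy and x_i the survey answers of respondent i. The lasso part of the penalty (weight α) can shrink coefficients exactly to zero, which drops the corresponding questions. The deck does not report the α or λ that were used, so the symbols are generic.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Bigl[\,y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr)\Bigr]
+ \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Bigr]
```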
MODEL REDUCTION – LOGISTIC REGRESSION FOR SEGMENT 1
• Dimensionality reduction distilled the model for Segment 1 to four questions
  • Q16: Appreciates my loyalty
  • Q22: I feel proud
  • Q25: Sense of belonging
  • Q13: Use own products/services
• Segment is focused on the values provided by emotional validation and its associated benefits
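A four-question logistic model is small enough to score with a single formula, which is what makes it practical to embed in a spreadsheet. The Python sketch below (illustrative; the deck used R) shows the calculation. The coefficients are HYPOTHETICAL because the deck does not publish the fitted values, and the 1-5 response scale is also an assumption.

```python
from math import exp

# HYPOTHETICAL coefficients for illustration only; the deck does not publish
# the fitted values. Question IDs are the four retained for Segment 1.
SEGMENT1_MODEL = {"intercept": -6.0, "Q16": 0.5, "Q22": 0.4, "Q25": 0.4, "Q13": 0.3}


def segment_probability(model, answers):
    """Logistic score: 1 / (1 + exp(-(b0 + sum of coefficient * answer)))."""
    linear = model["intercept"] + sum(
        coef * answers[question]
        for question, coef in model.items()
        if question != "intercept"
    )
    return 1.0 / (1.0 + exp(-linear))


# Assumed 1-5 agreement scale; the deck does not state the response scale.
high = {"Q16": 5, "Q22": 5, "Q25": 5, "Q13": 5}
low = {"Q16": 1, "Q22": 1, "Q25": 1, "Q13": 1}
p_high = segment_probability(SEGMENT1_MODEL, high)
p_low = segment_probability(SEGMENT1_MODEL, low)
print(round(p_high, 3), round(p_low, 3))
```

In Excel the same score fits in one cell, along the lines of =1/(1+EXP(-(b0 + b1*A2 + b2*B2 + b3*C2 + b4*D2))), where the cell references and coefficient names are placeholders.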
LOGISTIC REGRESSION PERFORMANCE ON SEGMENT 1

          Reference
Prediction Other Seg1
     Other   159   11
     Seg1      6   94

Accuracy : 0.937
95% CI : (0.9011, 0.9629)
No Information Rate : 0.6111
P-Value [Acc > NIR] : <2e-16
SEGMENT 2 VARIABLE IMPORTANCE
[Variable importance plot; annotations: "Best Predictors of the Segment" and "Characteristics of Other Segments"]
SEGMENT 2 – LOGISTIC REGRESSION MODEL PERFORMANCE
Model uses reduced set of questions: Q10, Q3, Q23, Q9
Focus of questions is customer service

          Reference
Prediction Other Seg2
     Other   238    8
     Seg2      7   17

Accuracy : 0.9444
95% CI : (0.91, 0.9686)
No Information Rate : 0.9074
P-Value [Acc > NIR] : 0.01785
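The 94.4% accuracy is only modestly above the 90.7% No Information Rate because just 25 of the 270 test respondents belong to Segment 2. Per-class rates make that visible. They are not reported in the deck but follow directly from the matrix above; the Python sketch below is illustrative (the deck used R) and the function name is hypothetical.

```python
def binary_rates(true_pos, false_pos, false_neg, true_neg):
    """Per-class rates for the positive class of a 2x2 confusion matrix."""
    sensitivity = true_pos / (true_pos + false_neg)   # share of actual Seg2 found
    specificity = true_neg / (true_neg + false_pos)   # share of Other kept as Other
    precision = true_pos / (true_pos + false_pos)     # share of Seg2 calls that are right
    return sensitivity, specificity, precision


# Counts from the Segment 2 matrix: rows are predictions, columns are reference.
sensitivity, specificity, precision = binary_rates(
    true_pos=17, false_pos=7, false_neg=8, true_neg=238
)
print(f"Sensitivity: {sensitivity:.3f}")   # 17 / 25
print(f"Specificity: {specificity:.3f}")   # 238 / 245
print(f"Precision:   {precision:.3f}")     # 17 / 24
```

About a third of true Segment 2 respondents are missed, which matches the earlier note that the small segment is difficult to predict.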
SEGMENT 3 – VARIABLE IMPORTANCE
SEGMENT 3 – LOGISTIC REGRESSION MODEL PERFORMANCE
Model uses reduced set of questions: Q16, Q23
Focus of questions is value

          Reference
Prediction Other Seg3
     Other   103   31
     Seg3     27  109

Accuracy : 0.7852
95% CI : (0.7313, 0.8327)
No Information Rate : 0.5185
P-Value [Acc > NIR] : <2e-16
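The "P-Value [Acc > NIR]" line asks whether the observed accuracy could plausibly arise from always guessing the largest class. The output layout resembles R's caret `confusionMatrix`, which is understood to use a one-sided exact binomial test for this; that attribution is an assumption, since the deck does not name the function. A minimal Python sketch of such a test, using counts read off the matrices in the deck:

```python
from math import comb


def p_value_acc_gt_nir(correct, total, nir):
    """One-sided exact binomial test: P(X >= correct) for X ~ Binomial(total, nir)."""
    return sum(
        comb(total, k) * nir**k * (1.0 - nir) ** (total - k)
        for k in range(correct, total + 1)
    )


# Correct counts are the diagonal sums; NIR is the largest reference class share.
p_segment3 = p_value_acc_gt_nir(correct=103 + 109, total=270, nir=140 / 270)
p_segment2 = p_value_acc_gt_nir(correct=238 + 17, total=270, nir=245 / 270)
print(f"Segment 3: {p_segment3:.3g}")
print(f"Segment 2: {p_segment2:.3g}")
```

Segment 3 has the lowest accuracy of the three models yet a far smaller p-value than Segment 2 (reported as 0.01785), because its No Information Rate is close to 50% while Segment 2's is above 90%.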
DEVELOPING THE FINAL ENGINE
• GLMNET was used to find the highest-performing model and to reduce the dimensionality of the survey to a focused set of questions
  • Reduced survey for tool from 17 to 9 questions
• Logistic regressions were fit from the GLMNET output based on variable importance
  • Generally performed with good accuracy, but lower than GLMNET
  • Performance evaluated with cross-validation (CV), ROC and confusion matrix
• One model was developed for each segment
  • Final assignment made based on a voting approach
• Final test made across all survey respondents
  • Smallest segment had most error, as expected
  • Overall model accuracy was 85%; acceptable to the client
  • No penalties for misclassification
