© 2013, PAYCHEX, Inc. All rights reserved.
Retention Modeling in Uncertain Economic Times
Chip Galusha
Data Scientist
Paychex, Inc.
History of Paychex
1983
1971
Market Cap
$18 Billion
Total Revenue
$2.7 Billion
A leading provider of integrated
human capital management,
providing businesses the
freedom to succeed.
Today
Data Science @ Paychex
Data Science Team
Marketing
Sales
Information Technology
Finance
Risk
Human Resources
Service
Strategic Client Retention
• 2009: developed and deployed first predictive retention model
2009 2010 2011 2012 2013 2014 2015 2016
V1 V2 V3 V4
• 2012: Clients with a retention strategy applied were 80% less likely to leave
Controllable Losses
• Price
• Service Issue
• Dissatisfaction w/ Product
Uncontrollable Losses
• Out of Business
• Business Acquired
• Business Change
$$P(\text{Client Loss} \mid \mathbf{X}_i) = \frac{1}{1 + e^{-(\beta \mathbf{X}_i)}}$$
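As a minimal sketch of the logistic form above, the probability for one client can be computed from a coefficient vector and an input vector. The coefficient values and inputs here are hypothetical, and the leading 1.0 in the input is an assumed intercept term; none of these numbers come from the Paychex model.

```python
import math

def loss_probability(beta, x):
    """P(client loss | x) = 1 / (1 + exp(-(beta . x)))."""
    z = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and one client's inputs (first entry is an intercept term).
beta = [-2.0, 0.8, -0.5]
x_i = [1.0, 1.5, 0.4]
p = loss_probability(beta, x_i)
print(round(p, 4))
```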
Retention Tracking System
Inputs
- Client payroll liability
- Employee counts
- ...
- Business Type
- Geographic
- Credit Info
- …
Model Scores Clients Monthly
Output
- Probability of Client leaving Paychex
F
• Business 1: Annual Est. Revenue
• Business 2: Annual Est. Revenue
• …
D
• Business 3: Annual Est. Revenue
• Business 4: Annual Est. Revenue
• …
National Retention Team
- Retention specialists proactively call clients based on retention score to mitigate potential issues
- 600K clients; 8 full-time equivalent retention specialists
Problem: Concept Drift
The underlying assumptions and distributions change over time with respect to predicting losses
• Population Stability Index (PSI) scores help us identify shifts in the distributions of the variables in our models
• What happens when the relationship with the target changes?
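A PSI check of the kind mentioned above can be sketched as follows. This uses the common textbook definition, the sum over bins of (actual share - expected share) * ln(actual share / expected share). The bin edges, the small floor `eps` for empty bins, and the sample values are assumptions for illustration, not details from the talk.

```python
import math

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index of `actual` against `expected` over fixed bin edges."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(1 for e in edges if v >= e)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical values of one model variable at training time and at scoring time.
baseline = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
current = [6, 7, 8, 9, 10, 6, 7, 8, 9, 10]
edges = [4, 7]
print(psi(baseline, baseline, edges), psi(baseline, current, edges))
```

PSI only compares the distribution of an input between two periods, so it says nothing about whether the input's relationship with the target has changed, which is the question the slide raises.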
Example of Concept Drift
[Figure: California Accommodation & Food Services: Year over Year Change in Losses. Horizontal axis: monthly dates from 11/1/2012 to 3/1/2017. Vertical axis: -0.8 to 1.4. Series: yoy; 6 per. Mov. Avg. (yoy).]
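The two series named in the chart legend can be sketched as below. The definition of year-over-year change as losses(t) / losses(t-12) - 1 and the trailing six-period window are assumptions about how the chart was built, and the monthly loss counts are made up.

```python
def yoy_change(series, lag=12):
    """Relative change against the same month one year earlier."""
    return [series[t] / series[t - lag] - 1.0 for t in range(lag, len(series))]

def moving_average(values, window=6):
    """Trailing moving average over `window` periods."""
    return [sum(values[t - window + 1 : t + 1]) / window
            for t in range(window - 1, len(values))]

# Hypothetical monthly loss counts: a flat year followed by a year that is 10% higher.
losses = [100] * 12 + [110] * 12
yoy = yoy_change(losses)
smoothed = moving_average(yoy)
```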
Solution: Dynamic Model Averaging*
Dynamic
• Adapts to account for concept drift
• Allows for the inclusion of new variables as they become available
Model Averaging
• Uses an ensemble of historical models to estimate probability of client loss
• Establishes a weighting heuristic to apply each model
* Not Associated w/ Raftery et al. (2010)
Model Averaging
• Tested four alternatives:
Dynamic Model Averaging (t-12, t-3)*
Average Model Scores From Each Model
Regression Combiner (t-12, t-11, …, t-4, t-3)
Use a weighted average based on ridge regression
Baseline Dynamic (t-3)*
Use Most Recent Model
One Year Prior Dynamic (t-12)*
Use Model from One Year Prior
• The best performance was achieved by the averaged scores of a trend model (t-3) and seasonal model (t-12)
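The averaging of a trend model (t-3) and a seasonal model (t-12) can be sketched as follows, assuming each historical model is a logistic regression stored as an intercept plus named coefficients. The model contents, variable names, and the equal default weights are hypothetical; the talk does not give the actual weighting heuristic.

```python
import math

def score(model, client):
    """Logistic score of one client under one stored model."""
    z = model["intercept"] + sum(
        coef * client.get(name, 0.0) for name, coef in model["coefs"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))

def averaged_score(models, client, weights=None):
    """Weighted average of the scores from several historical models."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    return sum(w * score(m, client) for w, m in zip(weights, models))

# Hypothetical models fitted 3 and 12 months ago, and one client to score.
model_t3 = {"intercept": -1.0, "coefs": {"employee_count_trend": -0.6}}
model_t12 = {"intercept": -1.5,
             "coefs": {"employee_count_trend": -0.2, "has_case_6m": 0.9}}
client = {"employee_count_trend": 0.5, "has_case_6m": 1.0}
p = averaged_score([model_t3, model_t12], client)
```

Passing explicit `weights` would cover a weighted variant such as the regression combiner, with the weights estimated elsewhere.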
Dynamic Variables Inclusion
• Access to data is continually improving. How can we adapt to this?
• Just join new variables to production data set!
• What if variables are discontinued?
• We can identify which variables were used in our model 12 months ago but aren’t in our current scoring data set.
• Add zero-filled column
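The zero-fill step for discontinued variables can be sketched like this, assuming the scoring data is a list of per-client dictionaries and the older model's variable list is known. The function and variable names are hypothetical.

```python
def align_to_model(rows, model_variables):
    """Return rows restricted to the model's variables, zero-filling any that are missing."""
    present = set().union(*(row.keys() for row in rows))
    missing = [v for v in model_variables if v not in present]
    aligned = [{v: row.get(v, 0.0) for v in model_variables} for row in rows]
    return aligned, missing

# Hypothetical case: the 12-month-old model used credit_score, which is no longer delivered.
old_model_variables = ["payroll_liability", "credit_score"]
scoring_rows = [{"payroll_liability": 1.2, "new_variable": 3.0}]
aligned, missing = align_to_model(scoring_rows, old_model_variables)
```

If the variables were standardized when the model was fitted, a zero corresponds to the training mean, so the zero-filled column contributes nothing to the older model's score.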
Model Fitting: Logistic Regression via
LASSO (a special case of the Elastic Net)
• Least Absolute Shrinkage and Selection Operator (LASSO)
• Regularize logistic regression
• Standardize variables (scale them to mean = 0 and sd = 1)
• In traditional modeling, we choose to either include or exclude a variable in a model (domain knowledge, AIC/BIC)
• LASSO automatically includes all variables in the model, but shrinks the impact of variables that are not important
• By shrinking the coefficients in the model, we can drive some coefficients to zero.
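To illustrate how an L1 penalty drives some coefficients exactly to zero, here is a small sketch of L1-penalized logistic regression fitted by proximal gradient descent (soft-thresholding after each gradient step). This is a teaching example under assumed settings (step size, iteration count, penalty values, toy data), not the fitting routine used in the talk; a production fit would typically rely on an established library.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(rows):
    """Scale every column to mean 0 and sd 1 (constant columns are not handled)."""
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scaled_cols.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled_cols)]

def soft_threshold(value, amount):
    """Shrink value toward zero; anything within `amount` of zero becomes exactly 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def fit_lasso_logistic(X, y, lam, step=0.5, iters=2000):
    """Minimize mean log-loss + lam * sum(|beta_j|); the intercept is not penalized."""
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(iters):
        resid = [sigmoid(intercept + sum(b * v for b, v in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= step * sum(resid) / n
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta

# Hypothetical data: column 0 is related to the outcome, column 1 is not.
raw = [[1, 1], [2, 0], [3, 0], [4, 1], [5, 1], [6, 0], [7, 0], [8, 1]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
X = standardize(raw)
_, beta_light = fit_lasso_logistic(X, y, lam=0.05)
_, beta_heavy = fit_lasso_logistic(X, y, lam=1.0)
```

With the light penalty the informative column keeps a positive coefficient while the uninformative one is set to zero; with the heavy penalty every coefficient is zero.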
Example LASSO Plot
• A vertical line represents a cross-sectional view of the coefficients for a model.
• The red line indicates a chosen parameter/model. It is a regression with 3 variables (3 coefficients)
  • Lcavol: 0.55
  • Weight: 0.17
  • Svi: 0.09
• With a very large shrinkage factor, the sum of the absolute values of the usual regression coefficients is less than the constraint, so this is exactly the same as using a regular regression
• With a very small shrinkage factor, the absolute values of the coefficients must sum to zero, so there are no nonzero coefficients
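The two limiting cases above follow from the constrained form of the LASSO. The notation below is an assumption for illustration: $L(\beta)$ is the unpenalized loss, $t$ the bound on the coefficients, $\hat{\beta}^{\text{full}}$ the unpenalized estimate, and $s$ a standardized bound, assuming the plot's horizontal axis is scaled that way.

```latex
\hat{\beta}(t) = \arg\min_{\beta} \; L(\beta)
\quad \text{subject to} \quad \sum_{j} |\beta_j| \le t,
\qquad
s = \frac{t}{\sum_{j} |\hat{\beta}^{\text{full}}_j|}
```

When $s \ge 1$ the constraint is inactive and the fit equals the regular regression; when $s = 0$ every coefficient is forced to zero.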
Variables*
Continuous Variables
• Trend: Mann-Kendall Trend Test Statistic
• Standard Deviation
• Average
• Demand Index
• Length of Service (LOS) transformed using a Spline
Categorical Variables
• # of Discount Products > # of Billed Products – {0,1}
• Has Penalty in last 6 months & LOS < 2 years – {0,1}
• Has Case in last 6 months & LOS < 2 years – {0,1}
• Has Return in last 6 months & LOS < 2 years – {0,1}
• Start Month – {1,…,12}
• 401k, Taxpay, Time & Attendance Indicators – {0,1}
* Final variables (w/ nonzero coefficients) will change month by month
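The Mann-Kendall trend variable listed above can be sketched as follows. The slide does not say which form of the statistic is used or which client series it is computed on, so this shows the standard S statistic and its normal-approximation Z score (without tie correction) on a hypothetical series of monthly employee counts.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def mann_kendall_s(series):
    """S = sum over all pairs i < j of sign(x_j - x_i)."""
    n = len(series)
    return sum(sign(series[j] - series[i])
               for i in range(n - 1) for j in range(i + 1, n))

def mann_kendall_z(series):
    """Normal approximation to S with continuity correction; ties are not corrected for."""
    n = len(series)
    s = mann_kendall_s(series)
    sd = math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    if s > 0:
        return (s - 1) / sd
    if s < 0:
        return (s + 1) / sd
    return 0.0

# Hypothetical monthly employee counts for one client.
employee_counts = [12, 12, 11, 10, 10, 9]
s_stat = mann_kendall_s(employee_counts)
z_stat = mann_kendall_z(employee_counts)
```

A strongly negative value indicates a steadily declining series, a strongly positive value a rising one.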
Performance Tracking
Model Performance
Technical
Discriminative Ability of Model
• ROC Curve
• Brier score
Diagnostic
• Variable Importance
• Client Score Shifts
Business
Strategic Diagnostics
• Calls / Save
Model Valuation
• $$$
• Proper resource allocation
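The two technical measures named above can be sketched in a few lines. The Brier score is the mean squared difference between predicted probability and observed outcome, and the area under the ROC curve equals the share of (lost, retained) client pairs in which the lost client received the higher score, with ties counted as one half. The scores and outcomes are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def roc_auc(probs, outcomes):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical monthly scores and whether each client actually left (1) or stayed (0).
probs = [0.9, 0.8, 0.3, 0.2]
outcomes = [1, 0, 1, 0]
brier = brier_score(probs, outcomes)
auc = roc_auc(probs, outcomes)
```

The pairwise AUC is quadratic in the number of clients, so for a book of 600K clients a rank-based computation would be the practical choice.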
Technical Performance Tracking
Next Steps
• Experiment with scoring
• Update weighting heuristic
• Apply strategy to other model developments
Thank You
Chip Galusha
Data Scientist
fgalusha@paychex.com
