
Predictive Model for Customer Segmentation using Database Marketing Techniques


Develop a predictive model, using the historical data set DONOR_RAW, that predicts whether a prospect will donate or not donate.

Data set: DONOR_RAW data set
• 50 Variables
• 19,372 observations

Tools Used:
• SAS Enterprise Miner 4.3
• SAS 9.3_M1

Techniques Used:
• Logistic Regression
• Decision Trees - CHAID

Interaction terms were also introduced to gain a better understanding of the data.

Final model selection was based on:
• LIFT Chart



  1. Database Marketing and CRM – Analyzing DONOR data set
     Akanksha Jain
  2. Project Goals
     • Goal: Using the historical data set DONOR_RAW, develop a model that predicts whether a prospect will donate or not donate
     • Scope: DONOR_RAW data set
        • 50 variables
        • 19,372 observations
     • Dependent variable: TARGET_B (binary)
        • Responder: 1
        • Non-responder: 0
  3. Tools
     • SAS Enterprise Miner 4.3
     • SAS 9.3_M1
  4. Diagram
  5. Diagram (con’t)
  6. Data Source
     • Rejected variables:
        • TARGET_D (TARGET_B is used as the target)
        • ID (an ID number)
        • WEALTH_RATING (large number of missing values)
     • Variable TARGET_B:
        • Change role to TARGET
        • Change order to DESCENDING
     • Select the complete data set as the sample
     • Set prior probabilities:
        • Responder: 0.05
        • Non-responder: 0.95
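Setting prior probabilities tells the model that responders make up about 5% of the target population. One common way to apply such priors is to use Bayes' rule: rescale each predicted probability by the ratio of prior to sample class proportions. The sketch below shows that adjustment. The 25% sample responder rate in the example is a hypothetical value, not a figure reported for DONOR_RAW. SAS Enterprise Miner's internal handling may also differ in detail.

```python
def adjust_to_priors(p_model, sample_rate, prior_rate):
    """Rescale P(responder) from a model fitted on data with responder rate
    `sample_rate` so that it reflects the population prior `prior_rate`."""
    w1 = prior_rate / sample_rate              # weight for the responder class
    w0 = (1 - prior_rate) / (1 - sample_rate)  # weight for the non-responder class
    num = p_model * w1
    return num / (num + (1 - p_model) * w0)


# Hypothetical sample responder rate of 25%; priors from the slide: 0.05 / 0.95.
for p in (0.10, 0.25, 0.50, 0.80):
    print(p, round(adjust_to_priors(p, sample_rate=0.25, prior_rate=0.05), 4))
```

A prediction equal to the sample rate maps exactly onto the prior rate. When the sample rate already equals the prior, probabilities are left unchanged.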
  7. Data Partition
     • Train – 60%
     • Validate – 25%
     • Test – 15%
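A 60/25/15 partition like the one above can be sketched as a seeded random shuffle. This sketch assumes a simple random split; Enterprise Miner can also stratify on the target. The seed value is an arbitrary choice.

```python
import random


def partition(rows, fractions=(0.60, 0.25, 0.15), seed=12345):
    """Simple random split into train / validate / test sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = round(len(shuffled) * fractions[0])
    n_valid = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder absorbs rounding
    return train, valid, test


train, valid, test = partition(range(19372))  # 19,372 observations, as in the project scope
print(len(train), len(valid), len(test))  # 11623 4843 2906
```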
  8. Variable Transformation
     Log transformation to reduce skewness:
     • LIFETIME_GIFT_RANGE
     • LIFETIME_MAX_GIFT_AMT
     • LIFETIME_MIN_GIFT_AMT
     • MOR_HIT_RATE
     • FILE_AVG_GIFT
     • LIFETIME_AVG_GIFT_AMT
     • PCT_ATTRIBUTE1
     • LAST_GIFT_AMT
     • RECENT_AVG_GIFT_AMT
     Keep all variables, both the originals and their log transformations.
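The effect of the log transformation can be checked by comparing sample skewness before and after. The sketch uses log(x + 1) so that zero values, such as a zero LIFETIME_GIFT_RANGE, stay defined. That choice is an assumption; the slide does not show the exact transformation Enterprise Miner applied. The gift amounts are made-up values.

```python
import math


def skewness(xs):
    """Sample skewness (biased Fisher-Pearson estimator g1)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def log_transform(xs):
    # log(x + 1): assumed offset so that zero-valued inputs remain defined
    return [math.log1p(x) for x in xs]


gifts = [5, 7, 10, 10, 12, 15, 20, 25, 50, 200]  # hypothetical gift amounts
print(round(skewness(gifts), 2), round(skewness(log_transform(gifts)), 2))
```

Keeping both versions, as the slide does, lets the later variable selection decide which form of each input is more useful.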
  9. Model: CHAID
     • Nominal criterion: Chi Square
     • Significance level: 0.1
     • Minimum number of observations in a leaf = 25
     • Observations required for a split search = 55
     • Model assessment measure: Total Leaf Impurity (Gini Index)
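The CHAID settings above combine two things: a chi-square test for each candidate split, and size limits on leaves and on nodes eligible for splitting. The sketch below illustrates that check for a single binary split against TARGET_B. It also computes a size-weighted Gini measure of total leaf impurity. It simplifies in two ways. Real CHAID also merges categories and applies a multiple-comparison (Bonferroni/Kass) adjustment, both omitted here. The weighting in `total_leaf_impurity` is an illustrative definition, not Enterprise Miner's documented formula.

```python
import math


def chi2_2x2(left, right):
    """Pearson chi-square for a binary split; each branch is (non_responders, responders)."""
    total = sum(left) + sum(right)
    col_totals = (left[0] + right[0], left[1] + right[1])
    stat = 0.0
    for branch in (left, right):
        row_total = sum(branch)
        for j in range(2):
            expected = row_total * col_totals[j] / total
            if expected > 0:  # an empty target column contributes nothing
                stat += (branch[j] - expected) ** 2 / expected
    return stat


def chi2_pvalue_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def accept_split(left, right, alpha=0.10, min_leaf=25, min_split_obs=55):
    """Slide settings: significance 0.1, leaves >= 25 obs, node >= 55 obs to search."""
    if sum(left) + sum(right) < min_split_obs:
        return False
    if min(sum(left), sum(right)) < min_leaf:
        return False
    return chi2_pvalue_df1(chi2_2x2(left, right)) < alpha


def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def total_leaf_impurity(leaves):
    """Leaf-size-weighted Gini impurity over all leaves (illustrative definition)."""
    n = sum(sum(leaf) for leaf in leaves)
    return sum(sum(leaf) / n * gini(leaf) for leaf in leaves)


print(accept_split((80, 20), (40, 60)))  # True: strong association, large leaves
print(accept_split((10, 5), (40, 60)))   # False: left leaf has only 15 observations
```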
  10. Model: CHAID (con’t)
  11. Model: CHAID (con’t)
     Inference:
     • FREQUENCY_STATUS_97NK = 3 or 4; MONTHS_SINCE_LAST_GIFT < 8.5 (1% = 56%)
       Less marketing effort is needed, as these prospects are most likely to donate anyway.
     • FREQUENCY_STATUS_97NK = 3 or 4; MONTHS_SINCE_LAST_GIFT >= 8.5; NUMBER_PROM_12 < 11.5 (1% = 43%)
       They will also donate, but the company should be careful not to send them too many promotions.
     • FREQUENCY_STATUS_97NK = 3 or 4; MONTHS_SINCE_LAST_GIFT >= 8.5; NUMBER_PROM_12 >= 11.5 (1% = 30%)
       They are getting too many promotions, so the company should cut back on the promotions it sends them.
     • FREQUENCY_STATUS_97NK = 1, 2 or missing (1% = 21%)
       Study them more closely: why they are not donating and which other factors are responsible. Then decide how to design a marketing campaign for them.
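The four leaves in the inference above can be expressed as simple scoring rules. The split thresholds and the "1%" figures come from the slide. The leaf labels A to D and the function name are hypothetical. A missing FREQUENCY_STATUS_97NK is assumed to arrive as `None`.

```python
# The slide's "1%" figure for each leaf; labels A-D are hypothetical.
LEAF_RATES = {
    "A": 0.56,  # FREQUENCY_STATUS_97NK 3-4, last gift < 8.5 months ago
    "B": 0.43,  # FREQUENCY_STATUS_97NK 3-4, >= 8.5 months, NUMBER_PROM_12 < 11.5
    "C": 0.30,  # FREQUENCY_STATUS_97NK 3-4, >= 8.5 months, NUMBER_PROM_12 >= 11.5
    "D": 0.21,  # FREQUENCY_STATUS_97NK 1, 2 or missing
}

ACTIONS = {
    "A": "less marketing effort; likely to donate anyway",
    "B": "will donate; avoid sending too many promotions",
    "C": "over-promoted; cut back on promotions",
    "D": "study further before designing a campaign",
}


def chaid_leaf(frequency_status, months_since_last_gift, number_prom_12):
    """Route one donor through the CHAID rules summarized on the slide."""
    if frequency_status in (3, 4):
        if months_since_last_gift < 8.5:
            return "A"
        if number_prom_12 < 11.5:
            return "B"
        return "C"
    return "D"  # FREQUENCY_STATUS_97NK = 1, 2, or missing (None)


leaf = chaid_leaf(3, 10, 12)
print(leaf, LEAF_RATES[leaf], ACTIONS[leaf])
```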
  12. Variable Selection
     • Target associations: Select Chi Square
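Chi-square variable selection ranks each input by the strength of its association with TARGET_B. This sketch handles categorical inputs only; Enterprise Miner also bins interval inputs first. The 3.84 cutoff (roughly the 5% critical value for one degree of freedom) is an assumed threshold, not a setting taken from the project. The toy columns are invented.

```python
from collections import Counter


def chi_square_vs_target(values, target):
    """Pearson chi-square statistic for a categorical input against a binary target."""
    n = len(values)
    cells = Counter(zip(values, target))
    level_counts = Counter(values)
    target_counts = Counter(target)
    stat = 0.0
    for level, n_level in level_counts.items():
        for t, n_t in target_counts.items():
            expected = n_level * n_t / n
            stat += (cells[(level, t)] - expected) ** 2 / expected
    return stat


def select_by_chi_square(columns, target, cutoff=3.84):
    """Rank inputs by chi-square and keep those at or above an assumed cutoff."""
    scores = {name: chi_square_vs_target(vals, target) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] >= cutoff], scores


target = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {"strong": [1, 1, 1, 1, 0, 0, 0, 0], "noise": [1, 0, 1, 0, 1, 0, 1, 0]}
kept, scores = select_by_chi_square(columns, target)
print(kept, scores)
```

A fixed cutoff ignores that inputs with more levels have more degrees of freedom. Comparing p-values instead would correct for that.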
  13. Model: Forward Regression
     MODEL OPTIONS -> INPUT CODING -> DEVIATION
     SELECTION METHOD -> FORWARD
     CRITERIA -> CROSS VALIDATION MISCLASSIFICATION
     ADVANCED -> OPTIMIZATION METHOD -> NEWTON-RAPHSON w/ LINE SEARCH
     • SL Entry: 0.05
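Deviation (effect) input coding turns a categorical input with k levels into k - 1 columns. Each non-reference level gets 1 in its own column, and the reference level gets -1 in every column, so each column sums to zero across levels. This sketch assumes the last sorted level is the reference. That is a common convention, but the slide does not state which level Enterprise Miner used.

```python
def deviation_code(values, levels=None):
    """Deviation (effect) coding: k levels -> k - 1 columns.
    The reference level (assumed: last sorted level) is coded -1 everywhere."""
    if levels is None:
        levels = sorted(set(values))
    reference = levels[-1]
    coded = []
    for v in values:
        if v not in levels:
            raise ValueError(f"unknown level: {v!r}")
        if v == reference:
            coded.append([-1] * (len(levels) - 1))
        else:
            coded.append([1 if v == lev else 0 for lev in levels[:-1]])
    return levels, coded


# FREQUENCY_STATUS_97NK takes values 1-4 in this data set.
levels, rows = deviation_code([1, 2, 3, 4, 2])
print(levels, rows)
```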
  14. Model: Forward Regression (con’t)
  15. Model: Forward Regression (con’t)
  16. Model: Backward Regression
     MODEL OPTIONS -> INPUT CODING -> DEVIATION
     SELECTION METHOD -> BACKWARD
     CRITERIA -> CROSS VALIDATION MISCLASSIFICATION
     ADVANCED -> OPTIMIZATION METHOD -> NEWTON-RAPHSON w/ LINE SEARCH
     • SL Stay: 0.05
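Backward elimination with SL Stay 0.05 starts from all inputs. It repeatedly removes the least significant input until every remaining p-value is at or below 0.05. In the sketch, `fit_pvalues` is a hypothetical callable that stands in for refitting the logistic regression. The toy p-values are invented for illustration.

```python
def backward_eliminate(candidates, fit_pvalues, sl_stay=0.05):
    """Drop the least significant input while its p-value exceeds sl_stay.
    `fit_pvalues(inputs)` is a hypothetical callable returning {input: p-value}."""
    selected = list(candidates)
    while selected:
        pvalues = fit_pvalues(selected)
        worst = max(selected, key=lambda name: pvalues[name])
        if pvalues[worst] <= sl_stay:
            break  # every remaining input meets the stay criterion
        selected.remove(worst)
    return selected


# Toy stand-in for a refit: fixed p-values that ignore the other inputs.
TOY_P = {"a": 0.01, "b": 0.20, "c": 0.04, "d": 0.50}
result = backward_eliminate(TOY_P, lambda inputs: {k: TOY_P[k] for k in inputs})
print(result)  # ['a', 'c']
```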
  17. Model: Backward Regression (con’t)
  18. Model: Backward Regression (con’t)
  19. Model: Stepwise Regression
     MODEL OPTIONS -> INPUT CODING -> DEVIATION
     SELECTION METHOD -> STEPWISE
     CRITERIA -> CROSS VALIDATION MISCLASSIFICATION
     ADVANCED -> OPTIMIZATION METHOD -> NEWTON-RAPHSON w/ LINE SEARCH
     • SL Entry: 0.15
     • SL Stay: 0.05
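Stepwise selection alternates a forward entry step (SL Entry 0.15) with a removal check (SL Stay 0.05). The sketch below uses two hypothetical callables in place of the regression fit. It also applies a simple anti-cycling rule: stop if the best entry candidate is the input just removed. Enterprise Miner's exact stopping rules may differ.

```python
def stepwise_select(candidates, entry_pvalue, fit_pvalues,
                    sl_entry=0.15, sl_stay=0.05, max_steps=100):
    """entry_pvalue(selected, name) -> p-value for adding `name` (hypothetical);
    fit_pvalues(selected) -> {name: p-value} after refitting (hypothetical)."""
    selected = []
    just_removed = None
    for _ in range(max_steps):
        remaining = [name for name in candidates if name not in selected]
        if not remaining:
            break
        entry = {name: entry_pvalue(selected, name) for name in remaining}
        best = min(remaining, key=lambda name: entry[name])
        if entry[best] >= sl_entry or best == just_removed:
            break  # nothing qualifies, or we would cycle on the same input
        selected.append(best)
        stay = fit_pvalues(selected)
        worst = max(selected, key=lambda name: stay[name])
        if stay[worst] > sl_stay:
            selected.remove(worst)
            just_removed = worst
        else:
            just_removed = None
    return selected


TOY = {"a": 0.01, "b": 0.10, "c": 0.30}  # invented p-values
picked = stepwise_select(TOY, lambda sel, n: TOY[n], lambda sel: {n: TOY[n] for n in sel})
print(picked)  # ['a']: b passes entry (0.10 < 0.15) but fails stay (0.10 > 0.05)
```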
  20. Model: Stepwise Regression (con’t)
  21. Model: Stepwise Regression (con’t)
  22. Variable Comparison (Forward, Backward, Stepwise)
     • Selected by all three methods:
        • FILE_CARD_GIFT
        • FREQUENCY_STATUS_97NK
        • INCOME_GROUP
        • LIFE_AV9*
        • MONTHS_SINCE_LAST_GIFT
        • PEP_STAR
     • Variables not selected by all three methods:
        • LIFETIME_GIFT_AMOUNT
        • MEDIAN_HOUSEHOLD_INCOME
        • RECENT_RESPONSE_PROP
     *LIFE_AV9 is log(LIFETIME_AVG_GIFT_AMT)
  23. Model Comparison (Validation): Cumulative LIFT
  24. Model Comparison (Validation): Cumulative LIFT
     Inference:
     • Capture top 20% of the market -> FORWARD
     • Capture top 30% of the market -> BACKWARD
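Cumulative lift at a given depth is the response rate among the top-scored fraction of prospects divided by the overall response rate. Comparing models at 20% and 30% depth is what the inference above does. The sketch below uses made-up scores and outcomes and ignores ties.

```python
def cumulative_lift(scores, actual, depth):
    """Response rate in the top `depth` fraction (e.g. 0.20) / overall response rate."""
    n = len(scores)
    k = max(1, round(n * depth))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top_rate = sum(actual[i] for i in order[:k]) / k
    base_rate = sum(actual) / n
    return top_rate / base_rate


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]  # hypothetical model scores
actual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]                      # hypothetical TARGET_B values
for depth in (0.20, 0.30):
    print(depth, round(cumulative_lift(scores, actual, depth), 2))
```

The model with the higher lift at the depth a campaign will actually mail is the better choice for that campaign. This is why the preferred model changes between 20% and 30%.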
  25. Model Comparison (TEST): Cumulative LIFT
  26. Model Comparison (TEST): Cumulative LIFT
     Inference:
     • Capture top 20% of the market -> FORWARD
     • Capture top 30% of the market -> ANY
  27. Forward versus Backward
     • Variables:
        • LIFETIME_GIFT_AMOUNT
        • MEDIAN_HOUSEHOLD_INCOME
        • RECENT_RESPONSE_PROP
     • Correlations:
        • MEDIAN_HOUSEHOLD_INCOME and INCOME_GROUP = 43%
        • LIFE_AV9 and LIFETIME_AVG_GIFT_AMT = 83%
        • FILE_CARD_GIFT and RECENT_RESPONSE_PROP = 30%
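Assuming the correlations above are Pearson coefficients, they can be computed with the standard library as below. The gift amounts are hypothetical. They show why a variable and its log (like LIFE_AV9 and LIFETIME_AVG_GIFT_AMT) are strongly but not perfectly correlated.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


avg_gift = [5, 8, 10, 15, 20, 40, 100]  # hypothetical average gift amounts
log_avg_gift = [math.log(v) for v in avg_gift]
print(round(pearson(avg_gift, log_avg_gift), 3))
```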
  28. Model: Forward + RECENT_RESPONSE_PROP
     • Variable selection (call it Variable_1Extra):
        • FILE_CARD_GIFT
        • FREQUENCY_STATUS_97NK
        • INCOME_GROUP
        • LIFE_AV9
        • MONTHS_SINCE_LAST_GIFT
        • PEP_STAR
        • RECENT_RESPONSE_PROP
     • Reject other variables manually
     • Call this model For_1Extra
  29. Model Comparison (Validation): Cumulative LIFT
  30. Model Comparison (Validation): Cumulative LIFT
     Inference:
     • Capture top 20% of the market -> For_1Extra
     • Capture top 30% of the market -> BACKWARD
  31. Model Comparison (TEST): Cumulative LIFT
  32. Model Comparison (TEST): Cumulative LIFT
     Inference:
     • Capture top 20% of the market -> For_1Extra
     • Capture top 30% of the market -> ANY
  33. Model: Decision
     Final Model: FOR_1EXTRA
     Variables:
        • FILE_CARD_GIFT
        • FREQUENCY_STATUS_97NK
        • INCOME_GROUP
        • LIFE_AV9
        • MONTHS_SINCE_LAST_GIFT
        • PEP_STAR
        • RECENT_RESPONSE_PROP
  34. Interaction Terms
     • FREQ_PEP = FREQUENCY_STATUS_97NK * PEP_STAR
     • FREQ_MONTH = FREQUENCY_STATUS_97NK * MONTHS_SINCE_LAST_GIFT
     • FREQ_INCOME = FREQUENCY_STATUS_97NK * INCOME_GROUP
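Each interaction term above is the product of its two inputs. The sketch treats the inputs as plain numbers and propagates missing values as `None`. In Enterprise Miner, interactions involving class inputs may be handled differently. The donor record shown is hypothetical.

```python
def add_interactions(record):
    """Return a copy of a donor record with the three interaction terms added.
    Assumes numeric inputs; a missing (None) input yields a missing term."""
    def product(a, b):
        return None if a is None or b is None else a * b

    out = dict(record)
    freq = record["FREQUENCY_STATUS_97NK"]
    out["FREQ_PEP"] = product(freq, record["PEP_STAR"])
    out["FREQ_MONTH"] = product(freq, record["MONTHS_SINCE_LAST_GIFT"])
    out["FREQ_INCOME"] = product(freq, record["INCOME_GROUP"])
    return out


donor = {"FREQUENCY_STATUS_97NK": 3, "PEP_STAR": 1,
         "MONTHS_SINCE_LAST_GIFT": 10, "INCOME_GROUP": 5}  # hypothetical values
print(add_interactions(donor))
```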
  35. Model: Forward Regression with Interaction Terms
     Rename model as FOR_1E_INT
     MODEL OPTIONS -> INPUT CODING -> DEVIATION
     SELECTION METHOD -> FORWARD
     CRITERIA -> CROSS VALIDATION MISCLASSIFICATION
     ADVANCED -> OPTIMIZATION METHOD -> NEWTON-RAPHSON w/ LINE SEARCH
     • SL Entry: 0.05
  36. Model FOR_1E_INT: Cumulative LIFT
  37. Model FOR_1E_INT: Variable List
  38. Model Comparison (Validation): Cumulative LIFT
  39. Model Comparison (Validation): Cumulative LIFT
     Inference:
     • Capture top 20% of the market -> FOR_1E_INT
     • Capture top 30% of the market -> FOR_1EXTRA
  40. Model Comparison (TEST): Cumulative LIFT
  41. Model Comparison (TEST): Cumulative LIFT
     Inference:
     • Capture top 20% of the market -> ANY
     • Capture top 30% of the market -> FOR_1EXTRA
  42. Model: For_1EXTRA + Interaction terms
     • Variable selection (call it Variable_UNION):
        • FILE_CARD_GIFT
        • FREQUENCY_STATUS_97NK
        • INCOME_GROUP
        • LIFE_AV9
        • MONTHS_SINCE_LAST_GIFT
        • PEP_STAR
        • RECENT_RESPONSE_PROP
        • FREQ_PEP
        • FREQ_MONTH
        • FREQ_INCOME
     • Reject other variables manually
     • Call this model For_Union
  43. Model Comparison (Validation): Cumulative LIFT
  44. Model Comparison (Validation): Cumulative LIFT
     Inference:
     • Capture top 20% of the market -> FOR_1E_INT
     • Capture top 30% of the market -> FOR_UNION
  45. Model Comparison (Test): Cumulative LIFT
  46. Model Comparison (Test): Cumulative LIFT
     Inference:
     • Capture top 20% of the market -> ANY
     • Capture top 30% of the market -> FOR_UNION
  47. Model: Decision
     Final Model: FOR_1EXTRA, because:
     • There is no significant improvement with the other models
     • Interaction terms add complexity
     Variables:
        • FILE_CARD_GIFT
        • FREQUENCY_STATUS_97NK
        • INCOME_GROUP
        • LIFE_AV9
        • MONTHS_SINCE_LAST_GIFT
        • PEP_STAR
        • RECENT_RESPONSE_PROP
  48. Score: On Donor_Raw_Data
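Scoring applies the final logistic model to each record to obtain P(TARGET_B = 1), then targets the highest-scored prospects. The sketch assumes numeric inputs. The input names `x1`/`x2` and their coefficients are placeholders, not the fitted FOR_1EXTRA estimates. Categorical inputs would first need the same deviation coding used in training.

```python
import math


def score_record(record, intercept, coefficients):
    """Logistic score: P(TARGET_B = 1) = 1 / (1 + exp(-eta))."""
    eta = intercept + sum(b * record[name] for name, b in coefficients.items())
    return 1.0 / (1.0 + math.exp(-eta))


def top_prospects(records, intercept, coefficients, depth=0.20):
    """Rank records by score and keep the top `depth` fraction (e.g. top 20%)."""
    ranked = sorted(records, key=lambda r: score_record(r, intercept, coefficients),
                    reverse=True)
    k = max(1, round(len(ranked) * depth))
    return ranked[:k]


coefs = {"x1": 0.8, "x2": -0.5}  # placeholder coefficients, not fitted values
donors = [{"id": i, "x1": i % 5, "x2": i % 3} for i in range(10)]  # hypothetical records
print([d["id"] for d in top_prospects(donors, -1.0, coefs)])  # [9, 4]
```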
  49. THANK YOU
