Modeling and Analysis for the Non-Statistician. Presented by: Andrew Curtis, Vice President; Richard Pless, Consultant
1 Models are developed using a six-step process.
   Step                                      % Effort
   1. Research Design                          10%
   2. Data Checking and Variable Creation      30%
   3. Create Analysis Files                    30%
   4. Calibrate Scoring Model                  10%
   5. Model Evaluation                         10%
   6. Model Implementation                     10%
1. Research Design
2 Research design requires the input of both marketers and analysts. Is the problem solvable through modeling? Do we have representative promotions from which to develop a model? Do we need to be concerned about selection bias? Will we be able to pull all the information we need to score the model off of our database in a timely manner? 1. Research Design
3 Research Design--Unsolvable Problems. Prospecting models for a niche marketer: some lists work really well; all others are unprofitable, even in the first decile. Finding all prospective buyers: it is impossible to accurately predict all behavior; all models leave some revenue on the table. 1. Research Design
4 Research Design--Unrepresentative Promotions. Album promotion during a major tour. Retail sale announcement during major clearance.  Veterans magazine solicitation during the Gulf War. 1. Research Design
5 Research Design--Selection Bias. The model is built off a series of mailings for business-appropriate suits, dresses, and accessories. The mailings were mailed to women only. If the resulting model is put into production without the gender pre-screen, then males will end up getting contacted, probably quite unprofitably. 1. Research Design
6 Research Design--Timely Scoring Data. The model looks for number of Web applicants from a given ZIP code in the prior week but the data can only be pulled monthly. At best, the model can only be scored accurately once a month. The predictor which uses the information is ineffective. 1. Research Design
7 Rule #1: Garbage In, Garbage Out! Bad Data In, Bad Models Out! Analysis is only as good as the data being analyzed. All input data must be checked for reasonableness, timeliness, and completeness. Information extracted from multiple sources must be verified to confirm that all data are appended to the “master file” appropriately. You must engage in ongoing quality control! 2. Data Checking
8 Study and scrutinize the data dictionary! Understand every field in the database. Eliminate fields that are too new, poorly filled, or unreliable. Look at distributions of values for each field. Know what every field means. Understand every value in the field. If there is a “Z”, find out what “Z” means. Work with finance to define the business rules for properly counting orders, revenue, and other business drivers. 2. Data Checking
9 Clean the data when appropriate. Models are driven by underlying data patterns. Bad patterns lead to bad models. Correct data/variables with: anomalies, missing values, outliers, errors. 2. Data Checking
10 Data Checking--Example of an anomaly. Dollars per Contact Over Five Mailings 2. Data Checking
11 Data Checking--Missing Data Example. Response Rates by Age 2. Data Checking
12 Data Checking--Outliers Example. The “Michael Jordan” example. Individual credit card holders with $200,000 lines of credit. The department store employee with 100 shopping trips a year. 2. Data Checking
13 Data Checking--Errors pose a tremendous risk for the modeler. Commonly occurring errors: Response data from a prior mailing incorrectly matched back to the customer file. Changes in meaning or usage of a particular variable. Alpha characters in supposedly numeric variable fields. 2. Data Checking
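One of the errors listed above, alpha characters in a supposedly numeric field, is easy to screen for. The following is a minimal sketch, not the presenters' procedure; the field name `order_count` and the sample rows are hypothetical.

```python
def find_non_numeric(records, field):
    """Return (row_index, raw_value) pairs whose field cannot be parsed as a number."""
    bad = []
    for i, row in enumerate(records):
        raw = row.get(field)
        try:
            float(raw)
        except (TypeError, ValueError):
            # TypeError covers missing values (None); ValueError covers text like "Z".
            bad.append((i, raw))
    return bad


# Hypothetical customer rows; "order_count" is supposed to be numeric.
customers = [
    {"id": 1, "order_count": "6"},
    {"id": 2, "order_count": "Z"},
    {"id": 3, "order_count": None},
    {"id": 4, "order_count": "12"},
]
bad_rows = find_non_numeric(customers, "order_count")
print(bad_rows)
```

Rows flagged this way still need a human decision: find out what “Z” means before correcting or dropping it.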
14 Variable creation captures the dynamics of the business. Use creativity to create predictor variables. Predictor variables typically come in three classes: Recency: the time elapsed since an action. Frequency: the number of times an event has happened, e.g., orders placed or clicks on a web page. Monetary: the amount of money spent purchasing goods and services. Use ratios and cross variables to identify meaningful interactions between variables. 2b. Variable Creation
15 Predictor Variable Creation--Example. Monetary: Sum of Revenue = $500. Frequency: Count of Order Dates = 6 Orders. Recency: (11/14/01 – 8/17/01) = 89 Days or 3 Months! 2b. Variable Creation
16 Predictor Variable Creation--Example. Average Order Size = $500 / 6 Orders. Total Books = 4. Total DVDs = 1. Total Electronics = 1. Percent Gift Purchases = 2 / 6 = 33%. Recency in Books: (11/14/01 – 6/1/01) = 166 Days or 5.5 Months! 2b. Variable Creation
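The two examples above can be reproduced with a few lines of code. This is a sketch only: the slides give totals, not individual orders, so the order dates and amounts below are hypothetical values chosen to match the stated figures ($500, 6 orders, last order 8/17/01, last book order 6/1/01, 2 gifts, selection date 11/14/01).

```python
from datetime import date

selection_date = date(2001, 11, 14)

# Hypothetical order history consistent with the totals on the slides.
orders = [
    {"date": date(2000, 11, 20), "category": "Books", "amount": 60, "gift": True},
    {"date": date(2001, 1, 15), "category": "Books", "amount": 40, "gift": False},
    {"date": date(2001, 3, 2), "category": "Books", "amount": 75, "gift": False},
    {"date": date(2001, 6, 1), "category": "Books", "amount": 50, "gift": True},
    {"date": date(2001, 7, 10), "category": "DVDs", "amount": 25, "gift": False},
    {"date": date(2001, 8, 17), "category": "Electronics", "amount": 250, "gift": False},
]

# Monetary, frequency, and recency.
monetary = sum(o["amount"] for o in orders)
frequency = len(orders)
recency_days = (selection_date - max(o["date"] for o in orders)).days

# Ratio and category-level variables.
avg_order_size = monetary / frequency
pct_gift = sum(o["gift"] for o in orders) / frequency
book_dates = [o["date"] for o in orders if o["category"] == "Books"]
total_books = len(book_dates)
book_recency_days = (selection_date - max(book_dates)).days

print(monetary, frequency, recency_days, book_recency_days)
```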
17 Selecting a Target Variable. Make sure your target variable will give you the type of results you want. Measuring response: you may get a lot of hand-raisers who are not profitable. Measuring profit: by focusing only on the dollars, you may miss a viable low-profit group. Keep all information gathered during the target period out of the predictor variables. 2b. Variable Creation
18 Analysis files have three time frames: Predictor Period: the time before individuals are selected for a marketing contact. All predictor variables must contain only data from this period. Gap Period: the time between the selection date and when the first response is recorded. Target Period: the time between the first and last response date. All target variables must contain only information from this period. [Timeline: Predictor Period | Selection Date | Gap Period | First Response Date | Target Period | Last Response Date] 3. Create Analysis Files
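The three time frames amount to a rule for assigning each dated record to a period. A minimal sketch follows; the function name, the example dates, and the treatment of boundary dates (selection date counted as gap, response dates counted as target) are assumptions, not something the slides specify.

```python
from datetime import date


def period_for(event_date, selection, first_response, last_response):
    """Classify a dated record into predictor, gap, or target period."""
    if event_date < selection:
        return "predictor"      # usable for predictor variables
    if event_date < first_response:
        return "gap"            # used for neither predictors nor targets
    if event_date <= last_response:
        return "target"         # usable for target variables only
    return "after_target"       # outside the analysis window


# Hypothetical campaign dates.
selection = date(2001, 11, 14)
first_response = date(2001, 12, 1)
last_response = date(2002, 2, 28)

print(period_for(date(2001, 8, 17), selection, first_response, last_response))
```

Building predictor variables only from records classified as "predictor" is one way to keep target-period information from leaking into the model.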
19 Good models are developed with modeling and validation samples. Before modeling begins, split the analysis file into two random subsets: modeling and validation. Develop the model using only the modeling subset. Test the robustness and accuracy of the model using the validation subset. Techniques exist for handling validation when the analysis sample is too small to split. 3. Create Analysis Files
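A random split of the analysis file might look like the sketch below. The 50/50 proportion and the fixed seed are assumptions for illustration; the slides do not prescribe either.

```python
import random


def split_analysis_file(records, validation_fraction=0.5, seed=42):
    """Randomly split records into (modeling, validation) subsets."""
    rng = random.Random(seed)           # fixed seed makes the split repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical analysis file of 1,000 customer IDs.
modeling, validation = split_analysis_file(range(1000))
print(len(modeling), len(validation))
```

The model is then fit on `modeling` only, and `validation` is held back until the model is ready to be tested.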
20 The appropriate modeling technique is driven by several factors.   The nature of the target variable.  The software that is supported in the production environment.   The skills of the analytical team. 4. Model Calibration
21 No modeling technique should operate on autopilot.   The analyst developing the model must: Know how to use the modeling technique. Know how to interpret the results. Know a “cringe variable” when they see one.  Know how the model will be used by the marketers. Without a pilot, even the most sophisticated plane will crash. 4. Model Calibration
22 Scoring models can be built using many different techniques. Linear regression  Logistic regression Discriminant analysis Neural networks  Many, many more... All can be used as predictors of future behavior. 4. Model Calibration
23 Model Calibration Rule #1 If you want to get famous, talk about technique. If you want a great model, concentrate on  “the other 90 percent.” 4. Model Calibration
24 Corollary to Rule #1 Regardless of your technique of choice,  if you short-change “the other 90 percent,” you will probably end up with a lousy model. 4. Model Calibration
25 Construction analogy: Throw several power tools onto a pile of lumber, come back in a month, and, presto, you will NOT have a house. 4. Model Calibration
26 Linear Regression is best suited for continuous outcomes, such as sales. Output can be understood by non-statisticians.  Each name is assigned an estimated value.  Scored population is easily ranked with respect to the target variable (sales, profits, etc.).  Does not automatically identify interactions between predictor variables. 4. Model Calibration
27 Linear Regression Example: Scoring Model for Predicting Monthly Revenue
Score = 0.08
      + 0.06 * House Value (Estimated in $Thousands)
      - 0.20 * Number of Children
      + 0.10 * Average Credit Card Limit (in $Thousands)
      - 0.30 * Number of Autos
                 John        Jennifer    YOU
House Value?     $150,000    $125,000
No. of Kids?     2           0
Ave Limit?       $15,000     $8,000
No. of Cars?     2           1
Score            $9.58       $8.08
4. Model Calibration
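The scoring equation above can be checked directly. In this sketch the coefficients and inputs come from the slide; the function and argument names are illustrative.

```python
def revenue_score(house_value_k, children, avg_card_limit_k, autos):
    """Predicted monthly revenue; dollar inputs are expressed in $ thousands."""
    return (0.08
            + 0.06 * house_value_k
            - 0.20 * children
            + 0.10 * avg_card_limit_k
            - 0.30 * autos)


john = revenue_score(house_value_k=150, children=2, avg_card_limit_k=15, autos=2)
jennifer = revenue_score(house_value_k=125, children=0, avg_card_limit_k=8, autos=1)
print(f"John: ${john:.2f}  Jennifer: ${jennifer:.2f}")
```

Both results match the scores shown on the slide ($9.58 and $8.08).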
28 Logistic regression is best suited for binary outcomes, such as buy/no buy. Output can be understood by non-statisticians.  Each name is assigned a probability of performing the expected outcome that is NOT a prediction of future performance.  Scored population is easily ranked with respect to likelihood of displaying the targeted behavior.  Does not automatically identify interactions between predictor variables. 4. Model Calibration
29 Logistic Regression Example: Scoring Model for Predicting Likelihood to Purchase (Yes/No)
Score = 0.01
      + 0.04 * Person Owns Home (1=yes, 0=no)
      - 0.05 * Number of Credit Cards
      + 0.01 * Income (Estimated in $Thousands)
      - 0.02 * Age
Probability Fix = 1 / [1 + Exponent(-Score)]
                 John                Jennifer            YOU
Owns Home?       No                  Yes
No. of Cards?    6                   3
Income?          $40,000             $25,000
Age?             45                  35
Score            -0.79 (prob=31%)    -0.55 (prob=37%)
4. Model Calibration
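The same check works for the logistic example. The coefficients, inputs, and the probability transformation come from the slide; the function and argument names are illustrative.

```python
import math


def purchase_score(owns_home, cards, income_k, age):
    """Raw logistic score; owns_home is 1 or 0, income is in $ thousands."""
    return 0.01 + 0.04 * owns_home - 0.05 * cards + 0.01 * income_k - 0.02 * age


def probability(score):
    """The slide's 'Probability Fix': 1 / (1 + exp(-score))."""
    return 1.0 / (1.0 + math.exp(-score))


john_score = purchase_score(owns_home=0, cards=6, income_k=40, age=45)
jennifer_score = purchase_score(owns_home=1, cards=3, income_k=25, age=35)
john_prob = probability(john_score)
jennifer_prob = probability(jennifer_score)
print(f"John: {john_score:.2f} ({john_prob:.0%})  "
      f"Jennifer: {jennifer_score:.2f} ({jennifer_prob:.0%})")
```

The transformation maps any score onto a value between 0 and 1, which is what lets the scored names be ranked by likelihood.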
30 Neural networks can be used with either binary or continuous targets. No restrictions on the type or structure of either the target variable or the historical variables. Can more easily capture interactions between predictor variables. Output is very difficult to explain.  Implementation can be difficult. Models don’t always outperform traditional regression. 4. Model Calibration
31 When done well, scoring models are smooth with few, if any, clumps. Target behaviors of the scored names distribute on a “Gains Table” smoothly from highest to lowest. This makes it easier to target a precise number of names, or to select down to a precise threshold of response or profit. 5. Model Evaluation
32 Understanding the Lift Table Start by ranking all customers by their descending scores and observing the number of responders in each “decile.” 5. Model Evaluation
33 Next, calculate response rates for each decile. 5. Model Evaluation
34 Then, calculate the percent of all respondents that are in each decile. 5. Model Evaluation
35 Sum down the columns to calculate cumulative totals. 5. Model Evaluation
36 Calculate the cumulative response rate and the cumulative percentage of responses. 5. Model Evaluation
37 Lift is the ratio of the cumulative response rate to the overall response rate = 0.310 5. Model Evaluation
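The lift-table steps above (rank by score, count responders per decile, compute rates, accumulate, divide by the overall rate) can be sketched as follows. The original tables were images and are not reproduced here, so the data below is hypothetical: 100 names with 20 responders. The sketch assumes at least one responder and at least as many names as groups.

```python
def lift_table(scored, n_groups=10):
    """scored: list of (score, responded) pairs, with responded equal to 0 or 1."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_n = len(ranked)
    total_resp = sum(r for _, r in ranked)
    overall_rate = total_resp / total_n
    rows, cum_n, cum_resp = [], 0, 0
    for g in range(n_groups):
        group = ranked[g * total_n // n_groups:(g + 1) * total_n // n_groups]
        resp = sum(r for _, r in group)
        cum_n += len(group)
        cum_resp += resp
        rows.append({
            "decile": g + 1,
            "responders": resp,
            "response_rate": resp / len(group),
            "pct_of_responders": resp / total_resp,
            "cum_response_rate": cum_resp / cum_n,
            "cum_pct_of_responders": cum_resp / total_resp,
            "lift": (cum_resp / cum_n) / overall_rate,
        })
    return rows


# Hypothetical scored file: scores 100 down to 1, responders concentrated at the top.
responders_per_decile = [5, 4, 3, 2, 2, 1, 1, 1, 1, 0]
scored, score = [], 100
for count in responders_per_decile:
    for position in range(10):
        scored.append((score, 1 if position < count else 0))
        score -= 1

table = lift_table(scored)
for row in table:
    print(row["decile"], row["responders"], round(row["lift"], 2))
```

Lift is expressed here as a plain ratio, so a value of 1.0 means no better than the overall response rate; some shops multiply by 100 and report it as an index.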
38 Gains tables can show performance for both response and revenue. 5. Model Evaluation
39 Graphical displays of the lift table are easy to follow. 5. Model Evaluation
40 With cost figures, the gains table can be expanded to show profit. 5. Model Evaluation
41 In this example, profit peaks around a mail quantity of 60,000. 5. Model Evaluation
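Extending a gains table with cost figures is simple arithmetic: subtract the contact cost from each decile's revenue and accumulate. The slide's actual figures were in an image and are not available, so every number below is hypothetical, chosen only so that cumulative profit peaks at a mail quantity of 60,000 as in the slide's example. Revenue is treated as contribution before contact cost, which is also an assumption.

```python
def cumulative_profit(revenue_by_decile, names_per_decile, cost_per_contact):
    """Return per-decile marginal profit and cumulative profit by mail quantity."""
    rows, cum_names, cum_profit = [], 0, 0.0
    for decile, revenue in enumerate(revenue_by_decile, start=1):
        marginal = revenue - cost_per_contact * names_per_decile
        cum_names += names_per_decile
        cum_profit += marginal
        rows.append({
            "decile": decile,
            "mail_quantity": cum_names,
            "marginal_profit": marginal,
            "cum_profit": cum_profit,
        })
    return rows


# Hypothetical revenue per decile, 10,000 names per decile, $0.60 per contact.
revenue_by_decile = [30000, 20000, 15000, 11000, 8000, 6500, 5000, 4000, 3000, 2000]
profit_rows = cumulative_profit(revenue_by_decile, 10000, 0.60)
best = max(profit_rows, key=lambda r: r["cum_profit"])
print(best["mail_quantity"], round(best["cum_profit"]))
```

Cumulative profit stops growing at the first decile whose marginal profit turns negative, which is why the peak marks the natural mailing depth.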
42 The production algorithm translates the model into the production environment.  The model is worthless without proper implementation.  Goal: create identical production and model algorithms.  Involve the production people.  Involve the marketers. 6. Model Implementation
43 Quality control procedures ensure the model is applied correctly every time.  Develop audit trail reports that highlight potential problems.  Look for model degradation over time.  Develop mini-profiles of each scoring decile and compare over time.  6. Model Implementation
44 Testing should always be done to continually validate assumptions. The success of a direct marketing model is determined by tracking the results of its use in-market. Each cell must be measured, as well as the overall result. For scoring models, this means that ‘cells’ must be created, usually deciles or percentiles. Each group is marked and tracked. The performance of each group can be compared with the others and with expectations. 6. Model Implementation
45 Focus not only on overall performance, but also on performance at the margin. If you are losing money at the margin, too many unprofitable names are being contacted. If you are making money at the margin, you may be leaving profits on the table. Common sense and company policy will guide you to a target marginal ROI. 6. Model Implementation

Modeling for the Non-Statistician

  • 1. Modeling and Analysis for the Non-Statistician Presented by: Andrew Curtis Vice President Richard Pless Consultant
  • 2. 1 Models are developed using a six-step process. % Effort 1. Research Design 10% 2. Data Checking and Variable Creation 30 3. Create Analysis Files 30 4. Calibrate Scoring Model 10 5. Model Evaluation 10 6. Model Implementation 10 1. Research Design
  • 3. 2 Research design requires the input of both marketers and analysts. Is the problem solvable through modeling? Do we have representative promotions from which to develop a model? Do we need to be concerned about selection bias? Will we be able to pull all the information we need to score the model off of our database in a timely manner? 1. Research Design
  • 4. 3 Research Design--Unsolvable Problems Prospecting models for niche marketer. Some lists work really well. All others are unprofitable, even in the first decile. Finding all prospective buyers. Impossible to accurately predict all behavior. All models leave some revenue on the table. 1. Research Design
  • 5. 4 Research Design--Unrepresentative Promotions. Album promotion during a major tour. Retail sale announcement during major clearance. Veterans magazine solicitation during the Gulf War. 1. Research Design
  • 6. 5 Research Design--Selection Bias. The model is built off a series of mailings for business-appropriate suits, dresses, and accessories. The mailings were mailed to women only. If the resulting model is put into production without the gender pre-screen, then males will end up getting contacted, probably quite unprofitably. 1. Research Design
  • 7. 6 Research Design--Timely Scoring Data. The model looks for number of Web applicants from a given ZIP code in the prior week but the data can only be pulled monthly. At best, the model can only be scored accurately once a month. The predictor which uses the information is ineffective. 1. Research Design
  • 8. 7 Rule #1 Garbage In  Garbage Out!Bad Data In  Bad Models Out! Analysis is only good as the data being analyzed. All input data must be checked for reasonableness, timeliness, and completeness. Information extracted from multiple sources must be verified that all data are appended to the “master file” appropriately. You must engage in on-going quality control! 2. Data Checking
  • 9. 8 Study and scrutinize the data dictionary! Understand every field in the database. Eliminate fields that are too new, poorly filled, or unrealiable. Look at distributions of values for each field. Know what every field means. Understand every value in the field. If there a “Z”, find out what “Z” means. Work with the finance to define the business rules for properly counting orders, revenue, and other business drivers. 2. Data Checking
  • 10. 9 Clean the data when appropriate. Models are driven by underlying data patterns. Bad patterns lead to bad models. Correct data/variables with: Anomalies Missing values Outliers Errors. 2. Data Checking
  • 11. 10 Data Checking--Example of an anomaly. Dollars per Contact Over Five Mailings 2. Data Checking
  • 12. 11 Data Checking--Missing Data Example. Response Rates by Age 2. Data Checking
  • 13. 12 Data Checking--Outliers Example The “Michael Jordan” example. Individual credit card holders with $200,000 lines of credit. The department store employee with 100 shopping trips a year. 2. Data Checking
  • 14. 13 Data Checking--Errors pose a tremendous risk for the modeler. Commonly Occurring Errors: Response data from a prior mailing incorrectly matched back to the customer file. Changes in meaning or usage of a particular variable. Alpha characters in supposedly numeric variable fields. 2. Data Checking
  • 15. 14 Variable creation captures the dynamics of the business. Use creativity to create predictor variables. Predictor variables typically come in three classes: Recency—the time elapsed since an action. Frequency—the number of times an event has happen, e.g. orders, clicked on a web page etc. Monetary—the amount of money spent purchasing goods and services. Use ratios and cross variables to identify meaningful interactions between variables. 2b. Variable Creation
  • 16. 15 Predictor Variable Creation--Example Monetary Sum of Revenue = $500 Frequency Count Order Dates= 6 Orders Recency (11/14/01 – 8/17/01) = 89 Days or 3 Months! 2b. Variable Creation
  • 17. 16 Predictor Variable Creation--Example Average Order Size = $500 / 6 Orders Total Books = 4 Total DVDS = 1 Total Electronics = 1 Percent Gift Purchases= 2 / 6 = 33% Recency in Books (11/14/01 – 6/1/01) = 166 Days or 5.5 Months! 2b. Variable Creation
  • 18. 17 Selecting a Target Variable Make sure your target variable will give you the type of results you want. Measuring response: may get a lot of hand- raisers that are not profitable. Measuring profit: by focusing only on the dollars, you may miss a viable low-profit group. Isolate all information gathered during the target period from being included as a predictor variable. 2b. Variable Creation
  • 19. 18 Analysis files have three time frames: Predictor Period—The time before individuals are selected for a marketing contact. All predictor variables must contain only data from this period. Gap Period—The time between the selection date and when the first response is recorded. Target Period—The time between the first and last response date. All target variables must only contain information from this period. Predictor Period Gap Period Target Period Selection Date First Response Date Last Response Date 3. Create Analysis Files
  • 20. 19 Good models are developed with modeling and validation samples. Before modeling begins, split the analysis file into two random subsets: modeling and validation. Develop the model using only the modeling subset. Test the robustness and accuracy of the model using the validation subset. Techniques exist for handling validation when analysis sample is too small to split. 3. Create Analysis Files
  • 21. 20 The appropriate modeling technique is driven by several factors. The nature of the target variable. The software that is supported in the production environment. The skills of the analytical team. 4. Model Calibration
  • 22. 21 No modeling technique should operate on autopilot. The analyst developing the model must: Know how to use the modeling technique. Know how to interpret the results. Know a “cringe variable” when they see one. Know how the model will be used by the marketers. Without a pilot, even the most sophisticated plane will crash. 4. Model Calibration
  • 23. 22 Scoring models can be built using many different techniques. Linear regression Logistic regression Discriminant analysis Neural networks Many, many more... All can be used as predictors of future behavior. 4. Model Calibration
  • 24. 23 Model Calibration Rule #1 If you want to get famous, talk about technique. If you want a great model, concentrate on “the other 90 percent.” 4. Model Calibration
  • 25. 24 Corollary to Rule #1 Regardless of your technique of choice, if you short-change “the other 90 percent,” you will probably end up with a lousy model. 4. Model Calibration
  • 26. 25 Construction analogy Throw several power tools onto a pile of lumber, come back in a month, and -- presto – you will NOT have a house. 4. Model Calibration
  • 27. 26 Linear Regression is best suited for continuous outcomes, such as sales. Output can be understood by non-statisticians. Each name is assigned an estimated value. Scored population is easily ranked with respect to the target variable (sales, profits, etc.). Does not automatically identify interactions between predictor variables. 4. Model Calibration
  • 28. 27 Linear Regression Example Scoring Model for Predicting Monthly Revenue Score = 0.08 + 0.06 * House Value (Estimated in $Thousands) - 0.20 * Number of Children + 0.10 * Average Credit Card Limit (in $Thousands) - 0.30 * Number of Autos JohnJenniferYOU House Value? $150,000 $125,000 No. of Kids? 2 0 Ave Limit? $15,000 $8,000 No. of Cars? 2 1 Score $9.58 $8.08 4. Model Calibration
  • 29. 28 Logistic regression is best suited for binary outcomes, such as buy/no buy. Output can be understood by non-statisticians. Each name is assigned a probability of displaying the targeted behavior; this probability is NOT a prediction of future performance. Scored population is easily ranked with respect to likelihood of displaying the targeted behavior. Does not automatically identify interactions between predictor variables. 4. Model Calibration
  • 30. 29 Logistic Regression Example: Scoring Model for Predicting Likelihood to Purchase (Yes/No). Score = 0.01 + 0.04 * Person Owns Home (1=yes, 0=no) - 0.05 * Number of Credit Cards + 0.01 * Income (Estimated in $Thousands) - 0.02 * Age. Probability Fix: Probability = 1 / [1 + exp(-Score)]. John: Owns Home No; No. of Cards 6; Income $40,000; Age 45; Score -0.79 (prob=31%). Jennifer: Owns Home Yes; No. of Cards 3; Income $25,000; Age 35; Score -0.55 (prob=37%). YOU: (blank, for your own values). 4. Model Calibration
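The score and the "probability fix" on this slide can be reproduced with a short sketch. The coefficients and inputs are taken from the slide; the names `logit_score` and `probability` and the dictionary keys are assumptions made for illustration.

```python
import math

# Coefficients copied from the slide's logistic scoring model.
INTERCEPT = 0.01
COEFFICIENTS = {
    "owns_home": 0.04,   # 1 = yes, 0 = no
    "num_cards": -0.05,
    "income_k": 0.01,    # income, in $ thousands
    "age": -0.02,
}


def logit_score(person):
    """Return the raw (log-odds) score for one name."""
    return INTERCEPT + sum(COEFFICIENTS[key] * person[key] for key in COEFFICIENTS)


def probability(score):
    """Apply the slide's 'probability fix': 1 / (1 + exp(-score))."""
    return 1.0 / (1.0 + math.exp(-score))


john = {"owns_home": 0, "num_cards": 6, "income_k": 40, "age": 45}
jennifer = {"owns_home": 1, "num_cards": 3, "income_k": 25, "age": 35}

for name, person in [("John", john), ("Jennifer", jennifer)]:
    score = logit_score(person)
    print(f"{name}: score={score:.2f}, prob={probability(score):.0%}")
```

The raw score can be negative or greater than 1; only after the transformation does it fall between 0 and 1 and rank names by likelihood.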
  • 31. 30 Neural networks can be used with either binary or continuous targets. No restrictions on the type or structure of either the target variable or the historical variables. Can more easily capture interactions between predictor variables. Output is very difficult to explain. Implementation can be difficult. Models don’t always outperform traditional regression. 4. Model Calibration
  • 32. 31 When done well, scoring models are smooth, with few, if any, clumps. Target behaviors of the scored names distribute on a “Gains Table” smoothly from highest to lowest. This makes it easier to target a precise number of names, or to select down to a precise threshold of response or profit. 5. Model Evaluation
  • 33. 32 Understanding the Lift Table Start by ranking all customers by their descending scores and observing the number of responders in each “decile.” 5. Model Evaluation
  • 34. 33 Next, calculate response rates for each decile. 5. Model Evaluation
  • 35. 34 Then, calculate the percent of all respondents that are in each decile. 5. Model Evaluation
  • 36. 35 Sum down the columns to calculate cumulative totals. 5. Model Evaluation
  • 37. 36 Calculate the cumulative response rate and the cumulative percentage of respondents. 5. Model Evaluation
  • 38. 37 Lift is the ratio of the cumulative response rate to the overall response rate = 0.310 5. Model Evaluation
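The lift-table steps walked through on the preceding slides (rank by score, count responders per decile, compute rates, accumulate, then divide by the overall rate) can be sketched as follows. The decile counts are hypothetical, since the slide tables did not survive in this transcript, and `gains_table` is an illustrative name rather than anything from the deck.

```python
def gains_table(mailed, responders):
    """Build lift-table rows from per-decile counts, best-scoring decile first."""
    total_responders = sum(responders)
    overall_rate = total_responders / sum(mailed)
    rows = []
    cum_mailed = cum_responders = 0
    for decile, (m, r) in enumerate(zip(mailed, responders), start=1):
        cum_mailed += m
        cum_responders += r
        cum_rate = cum_responders / cum_mailed
        rows.append({
            "decile": decile,
            "response_rate": r / m,
            "pct_of_responders": r / total_responders,
            "cum_response_rate": cum_rate,
            "cum_pct_of_responders": cum_responders / total_responders,
            # Lift: cumulative response rate relative to the overall rate.
            "lift": cum_rate / overall_rate,
        })
    return rows


# Hypothetical counts: 10,000 names per decile, already ranked by descending score.
mailed = [10_000] * 10
responders = [500, 350, 250, 200, 170, 150, 130, 110, 90, 50]

for row in gains_table(mailed, responders):
    print(f"{row['decile']:>2}  rate={row['response_rate']:.2%}  "
          f"cum_rate={row['cum_response_rate']:.2%}  lift={row['lift']:.2f}")
```

With this definition the lift in the last row is always 1.0, because the cumulative rate through the final decile equals the overall rate. Some practitioners multiply the ratio by 100 and report it as an index.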
  • 39. 38 Gains tables can show performance for both response and revenue. 5. Model Evaluation
  • 40. 39 Graphical displays of the lift table are easy to follow. 5. Model Evaluation
  • 41. 40 With cost figures, the gains table can be expanded to show profit. 5. Model Evaluation
  • 42. 41 In this example, profit peaks around a mail quantity of 60,000. 5. Model Evaluation
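How a profit peak like this arises can be illustrated with a small sketch that adds cost figures to a gains table. All numbers here (decile counts, a $30 margin per response, a $0.50 cost per piece) are hypothetical and are not the figures behind the slide, so the peak lands at a different quantity; `cumulative_profit` is an illustrative name.

```python
def cumulative_profit(mailed, responders, margin_per_response, cost_per_piece):
    """Return (cumulative mail quantity, cumulative profit) after each decile."""
    rows = []
    cum_mailed = 0
    cum_profit = 0.0
    for m, r in zip(mailed, responders):
        cum_mailed += m
        cum_profit += r * margin_per_response - m * cost_per_piece
        rows.append((cum_mailed, cum_profit))
    return rows


# Hypothetical inputs, ranked from best-scoring decile to worst.
mailed = [10_000] * 10
responders = [500, 350, 250, 200, 170, 150, 130, 110, 90, 50]

rows = cumulative_profit(mailed, responders, margin_per_response=30, cost_per_piece=0.50)
best_quantity, best_profit = max(rows, key=lambda row: row[1])
print(f"Profit peaks at a mail quantity of {best_quantity:,} (${best_profit:,.0f})")
```

Cumulative profit rises as long as each added decile earns more than it costs to mail, then falls once the added deciles lose money.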
  • 43. 42 The production algorithm translates the model into the production environment. The model is worthless without proper implementation. Goal: create identical production and model algorithms. Involve the production people. Involve the marketers. 6. Model Implementation
  • 44. 43 Quality control procedures ensure the model is applied correctly every time. Develop audit trail reports that highlight potential problems. Look for model degradation over time. Develop mini-profiles of each scoring decile and compare over time. 6. Model Implementation
  • 45. 44 Testing should always be done to continually validate assumptions. The key to determining the success of a direct-marketing model is tracking the results of its use in-market. Each cell must be measured, as well as the overall result. For scoring models, this means that ‘cells’ must be created, usually deciles or percentiles. Each group is marked and tracked. The performance of each group can be compared with the others and with expectations. 6. Model Implementation
  • 46. 45 Focus not only on overall performance, but also on performance at the margin. If you are losing money at the margin, too many unprofitable names are being contacted. If you are making money at the margin, you may be leaving profits on the table. Common sense and company policy will guide you to a target marginal ROI. 6. Model Implementation
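One way to act on a target marginal ROI is to mail down the ranked deciles until the next decile falls below the target. The sketch below assumes marginal ROI is defined as (margin earned minus mailing cost) divided by mailing cost for each decile on its own; that definition, the input figures, and the function name `deciles_to_mail` are all assumptions for illustration.

```python
def deciles_to_mail(mailed, responders, margin_per_response, cost_per_piece, target_roi):
    """Return (decile, marginal ROI) pairs for deciles that meet the target.

    Assumes deciles are ranked best first, so selection stops at the first
    decile whose marginal ROI falls below the target.
    """
    selected = []
    for decile, (m, r) in enumerate(zip(mailed, responders), start=1):
        cost = m * cost_per_piece
        marginal_roi = (r * margin_per_response - cost) / cost
        if marginal_roi < target_roi:
            break
        selected.append((decile, marginal_roi))
    return selected


# Hypothetical inputs; the 15% target stands in for company policy.
mailed = [10_000] * 10
responders = [500, 350, 250, 200, 170, 150, 130, 110, 90, 50]

selected = deciles_to_mail(mailed, responders, margin_per_response=30,
                           cost_per_piece=0.50, target_roi=0.15)
for decile, roi in selected:
    print(f"decile {decile}: marginal ROI = {roi:.0%}")
```

Setting the target to zero mails every decile that at least breaks even; a higher target stops earlier and trades some profit for a better return on the last dollars spent.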