Step-by-step guide to prepare customer data for modeling.


- 1. Modeling and Analysis for the Non-Statistician
  Presented by: Andrew Curtis, Vice President; Richard Pless, Consultant
- 2. Models are developed using a six-step process (% of effort):
  - 1. Research Design: 10%
  - 2. Data Checking and Variable Creation: 30%
  - 3. Create Analysis Files: 30%
  - 4. Calibrate Scoring Model: 10%
  - 5. Model Evaluation: 10%
  - 6. Model Implementation: 10%
- 3. Research design requires the input of both marketers and analysts.
  - Is the problem solvable through modeling?
  - Do we have representative promotions from which to develop a model?
  - Do we need to be concerned about selection bias?
  - Will we be able to pull all the information we need to score the model off our database in a timely manner?
- 4. Research Design--Unsolvable Problems
  - Prospecting models for a niche marketer: some lists work really well; all others are unprofitable, even in the first decile.
  - Finding all prospective buyers: it is impossible to accurately predict all behavior, and all models leave some revenue on the table.
- 5. Research Design--Unrepresentative Promotions
  - Album promotion during a major tour.
  - Retail sale announcement during a major clearance.
  - Veterans magazine solicitation during the Gulf War.
- 6. Research Design--Selection Bias
  - The model is built off a series of mailings for business-appropriate suits, dresses, and accessories, mailed to women only.
  - If the resulting model is put into production without the gender pre-screen, men will end up being contacted, probably quite unprofitably.
- 7. Research Design--Timely Scoring Data
  - The model looks at the number of Web applicants from a given ZIP code in the prior week, but the data can only be pulled monthly.
  - At best, the model can be scored accurately only once a month, so the predictor that uses this information is ineffective.
- 8. Rule #1: Garbage In, Garbage Out! Bad Data In, Bad Models Out!
  - Analysis is only as good as the data being analyzed.
  - All input data must be checked for reasonableness, timeliness, and completeness.
  - When information is extracted from multiple sources, verify that all data are appended to the "master file" appropriately.
  - You must engage in ongoing quality control!
- 9. Study and scrutinize the data dictionary!
  - Understand every field in the database; eliminate fields that are too new, poorly filled, or unreliable.
  - Look at distributions of values for each field: know what every field means and understand every value in it. If there is a "Z", find out what "Z" means.
  - Work with finance to define the business rules for properly counting orders, revenue, and other business drivers.
- 10. Clean the data when appropriate.
  - Models are driven by underlying data patterns; bad patterns lead to bad models.
  - Correct data/variables with anomalies, missing values, outliers, and errors.
- 11. Data Checking--Example of an anomaly.
  [Chart: Dollars per Contact Over Five Mailings]
- 12. Data Checking--Missing Data Example.
  [Chart: Response Rates by Age]
- 13. Data Checking--Outliers Example
  - The "Michael Jordan" example.
  - Individual credit card holders with $200,000 lines of credit.
  - The department store employee with 100 shopping trips a year.
- 14. Data Checking--Errors pose a tremendous risk for the modeler. Commonly occurring errors:
  - Response data from a prior mailing incorrectly matched back to the customer file.
  - Changes in the meaning or usage of a particular variable.
  - Alpha characters in supposedly numeric variable fields.
- 15. Variable creation captures the dynamics of the business.
  - Use creativity to create predictor variables.
  - Predictor variables typically come in three classes:
    - Recency: the time elapsed since an action.
    - Frequency: the number of times an event has happened, e.g. orders placed or web pages clicked.
    - Monetary: the amount of money spent purchasing goods and services.
  - Use ratios and cross variables to identify meaningful interactions between variables.
- 16. Predictor Variable Creation--Example
  - Monetary: sum of revenue = $500
  - Frequency: count of order dates = 6 orders
  - Recency: 11/14/01 - 8/17/01 = 89 days, or about 3 months
- 17. Predictor Variable Creation--Example (continued)
  - Average order size = $500 / 6 orders
  - Total books = 4; total DVDs = 1; total electronics = 1
  - Percent gift purchases = 2 / 6 = 33%
  - Recency in books: 11/14/01 - 6/1/01 = 166 days, or about 5.5 months
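The RFM calculations on the two example slides can be reproduced with a short script. The order history below is a hypothetical reconstruction consistent with the slide's totals (6 orders summing to $500, last order 8/17/01, last book order 6/1/01); the individual dates, amounts, and gift flags are invented for illustration.

```python
from datetime import date

# Hypothetical order history: (category, order date, revenue, gift flag).
orders = [
    ("book",        date(2001, 1, 10),  30, False),
    ("book",        date(2001, 2, 20),  45, True),
    ("dvd",         date(2001, 4, 5),   25, False),
    ("book",        date(2001, 5, 15),  60, False),
    ("book",        date(2001, 6, 1),   40, True),
    ("electronics", date(2001, 8, 17), 300, False),
]
score_date = date(2001, 11, 14)

# The three classic RFM predictors:
monetary  = sum(rev for _, _, rev, _ in orders)                  # $500
frequency = len(orders)                                          # 6 orders
recency   = (score_date - max(d for _, d, _, _ in orders)).days  # 89 days

# Ratio and category variables from the second slide:
avg_order  = monetary / frequency                                # $83.33
pct_gift   = sum(g for *_, g in orders) / frequency              # 2/6 = 33%
book_dates = [d for cat, d, _, _ in orders if cat == "book"]
recency_bk = (score_date - max(book_dates)).days                 # 166 days

print(monetary, frequency, recency, round(pct_gift, 2), recency_bk)
```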
- 18. Selecting a Target Variable
  - Make sure your target variable will give you the type of results you want.
    - Measuring response: you may get a lot of hand-raisers who are not profitable.
    - Measuring profit: by focusing only on the dollars, you may miss a viable low-profit group.
  - Isolate all information gathered during the target period from being included as a predictor variable.
- 19. Analysis files have three time frames:
  - Predictor Period: the time before individuals are selected for a marketing contact. All predictor variables must contain only data from this period.
  - Gap Period: the time between the selection date and when the first response is recorded.
  - Target Period: the time between the first and last response dates. All target variables must contain only information from this period.
  Timeline: Predictor Period > Selection Date > Gap Period > First Response Date > Target Period > Last Response Date
- 20. Good models are developed with modeling and validation samples.
  - Before modeling begins, split the analysis file into two random subsets: modeling and validation.
  - Develop the model using only the modeling subset.
  - Test the robustness and accuracy of the model using the validation subset.
  - Techniques exist for handling validation when the analysis sample is too small to split.
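A random modeling/validation split of the kind the slide describes can be sketched in a few lines. The 70/30 share and fixed seed are illustrative choices; the deck only prescribes that the split be random and made before modeling begins.

```python
import random

def split_analysis_file(records, model_share=0.7, seed=42):
    """Randomly split an analysis file into modeling and validation subsets.

    A fixed seed makes the split reproducible, so the same validation
    names are held out every time the model is refit.
    """
    rng = random.Random(seed)
    shuffled = records[:]          # copy: leave the original file untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * model_share)
    return shuffled[:cut], shuffled[cut:]

customers = list(range(1000))      # stand-in for 1,000 customer records
modeling, validation = split_analysis_file(customers)
print(len(modeling), len(validation))  # 700 300
```

Every record lands in exactly one subset, which is what makes the validation set an honest check on robustness.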
- 21. The appropriate modeling technique is driven by several factors:
  - The nature of the target variable.
  - The software that is supported in the production environment.
  - The skills of the analytical team.
- 22. No modeling technique should operate on autopilot. The analyst developing the model must:
  - Know how to use the modeling technique.
  - Know how to interpret the results.
  - Know a "cringe variable" when they see one.
  - Know how the model will be used by the marketers.
  Without a pilot, even the most sophisticated plane will crash.
- 23. Scoring models can be built using many different techniques:
  - Linear regression
  - Logistic regression
  - Discriminant analysis
  - Neural networks
  - Many, many more...
  All can be used as predictors of future behavior.
- 24. Model Calibration Rule #1: If you want to get famous, talk about technique. If you want a great model, concentrate on "the other 90 percent."
- 25. Corollary to Rule #1: Regardless of your technique of choice, if you short-change "the other 90 percent," you will probably end up with a lousy model.
- 26. Construction analogy: throw several power tools onto a pile of lumber, come back in a month, and -- presto -- you will NOT have a house.
- 27. Linear regression is best suited for continuous outcomes, such as sales.
  - Output can be understood by non-statisticians.
  - Each name is assigned an estimated value.
  - The scored population is easily ranked with respect to the target variable (sales, profits, etc.).
  - Does not automatically identify interactions between predictor variables.
- 28. Linear Regression Example: scoring model for predicting monthly revenue.

  Score = 0.08
        + 0.06 * House Value (estimated, in $thousands)
        - 0.20 * Number of Children
        + 0.10 * Average Credit Card Limit (in $thousands)
        - 0.30 * Number of Autos

                     John       Jennifer   YOU
  House Value?       $150,000   $125,000
  No. of Kids?       2          0
  Ave. Limit?        $15,000    $8,000
  No. of Cars?       2          1
  Score              $9.58      $8.08
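The slide's scoring equation is simple enough to verify directly. The function below applies the coefficients exactly as given and reproduces the scores for John and Jennifer.

```python
def revenue_score(house_value_k, num_children, avg_limit_k, num_autos):
    """Apply the slide's linear-regression scoring equation.

    Monetary inputs are in $thousands, matching the slide.
    """
    return (0.08
            + 0.06 * house_value_k
            - 0.20 * num_children
            + 0.10 * avg_limit_k
            - 0.30 * num_autos)

john     = revenue_score(150, 2, 15, 2)  # $150K house, 2 kids, $15K limit, 2 cars
jennifer = revenue_score(125, 0, 8, 1)   # $125K house, 0 kids, $8K limit, 1 car
print(round(john, 2), round(jennifer, 2))  # 9.58 8.08
```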
- 29. Logistic regression is best suited for binary outcomes, such as buy/no-buy.
  - Output can be understood by non-statisticians.
  - Each name is assigned a probability of performing the expected outcome; this is a likelihood, NOT a point prediction of future performance.
  - The scored population is easily ranked with respect to the likelihood of displaying the targeted behavior.
  - Does not automatically identify interactions between predictor variables.
- 30. Logistic Regression Example: scoring model for predicting likelihood to purchase (yes/no).

  Score = 0.01
        + 0.04 * Person Owns Home (1=yes, 0=no)
        - 0.05 * Number of Credit Cards
        + 0.01 * Income (estimated, in $thousands)
        - 0.02 * Age

  Probability = 1 / [1 + exp(-Score)]

                   John               Jennifer           YOU
  Owns Home?       No                 Yes
  No. of Cards?    6                  3
  Income?          $40,000            $25,000
  Age?             45                 35
  Score            -0.79 (prob=31%)   -0.55 (prob=37%)
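Again, the slide's equation can be checked directly: compute the raw score, then push it through the logistic (sigmoid) function to get a probability.

```python
import math

def purchase_probability(owns_home, num_cards, income_k, age):
    """Apply the slide's logistic scoring equation and convert the raw
    score to a probability with the logistic (sigmoid) function."""
    score = (0.01
             + 0.04 * owns_home   # 1 = yes, 0 = no
             - 0.05 * num_cards
             + 0.01 * income_k    # income in $thousands
             - 0.02 * age)
    prob = 1 / (1 + math.exp(-score))
    return score, prob

john     = purchase_probability(0, 6, 40, 45)  # score -0.79, prob ~31%
jennifer = purchase_probability(1, 3, 25, 35)  # score -0.55, prob ~37%
print(john, jennifer)
```

Note that although John and Jennifer differ by only 0.24 in raw score, the sigmoid turns that into a 6-point gap in purchase probability.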
- 31. Neural networks can be used with either binary or continuous targets.
  - No restrictions on the type or structure of either the target variable or the historical variables.
  - Can more easily capture interactions between predictor variables.
  - Output is very difficult to explain.
  - Implementation can be difficult.
  - Models don't always outperform traditional regression.
- 32. When done well, scoring models are smooth, with few if any clumps.
  - Target behaviors of the scored names distribute smoothly on a "gains table" from highest to lowest.
  - This makes it easier to target a precise number of names, or to select down to a precise threshold of response or profit.
- 33. Understanding the Lift Table: start by ranking all customers by descending score and observing the number of responders in each "decile."
- 34. Next, calculate the response rate for each decile.
- 35. Then, calculate the percent of all respondents that fall in each decile.
- 36. Sum down the columns to calculate cumulative totals.
- 37. Calculate the cumulative response rate and the cumulative percentage of respondents.
- 38. Lift is the ratio of the cumulative response rate to the overall response rate (0.310 in the slide's example).
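The lift-table steps on slides 33-38 can be sketched end to end. The deck's own table is not reproduced in the text, so the scored file below is simulated (10,000 names whose response chance rises with score); only the construction, not the numbers, mirrors the slides.

```python
import random

# Simulated scored mail file; response rates are illustrative only.
random.seed(0)
scores = [random.random() for _ in range(10000)]
# Higher score -> higher simulated chance of responding.
responses = [1 if random.random() < 0.05 * (0.5 + s) else 0 for s in scores]

scored = sorted(zip(scores, responses), reverse=True)  # rank by descending score
decile_size = len(scored) // 10
overall_rate = sum(responses) / len(responses)

cum_resp = cum_mailed = 0
print("decile  resp_rate  cum_rate   lift")
for d in range(10):
    chunk = scored[d * decile_size:(d + 1) * decile_size]
    responders = sum(r for _, r in chunk)        # responders in this decile
    cum_resp += responders                       # cumulative totals
    cum_mailed += len(chunk)
    rate = responders / len(chunk)               # decile response rate
    cum_rate = cum_resp / cum_mailed             # cumulative response rate
    lift = cum_rate / overall_rate               # >1.00 beats the average
    print(f"{d + 1:>6}  {rate:9.3%}  {cum_rate:8.3%}  {lift:5.2f}")
```

By construction, cumulative lift falls toward exactly 1.00 at the tenth decile, since mailing everyone yields the overall response rate.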
- 39. Gains tables can show performance for both response and revenue.
- 40. Graphical displays of the lift table are easy to follow.
- 41. With cost figures, the gains table can be expanded to show profit.
- 42. In this example, profit peaks around a mail quantity of 60,000.
- 43. The production algorithm translates the model into the production environment.
  - The model is worthless without proper implementation.
  - Goal: create identical production and model algorithms.
  - Involve the production people, and involve the marketers.
- 44. Quality control procedures ensure the model is applied correctly every time.
  - Develop audit-trail reports that highlight potential problems.
  - Look for model degradation over time.
  - Develop mini-profiles of each scoring decile and compare them over time.
- 45. Testing should always be done to continually validate assumptions.
  - The key to judging the success of a direct marketing model is tracking the results of its use in-market.
  - Each cell must be measured as well as the overall result; for scoring models, this means "cells" must be created, usually deciles or percentiles.
  - Each group is marked and tracked, and its performance can be compared with the other groups and with expectations.
- 46. Focus not only on overall performance, but also on performance at the margin.
  - If you are losing money at the margin, too many unprofitable names are being contacted.
  - If you are making money at the margin, you may be leaving profits on the table.
  - Common sense and company policy will guide you to a target marginal ROI.
