
3 Tests Experts Use to Validate Predictive Model Accuracy


There are many different tests you can use to determine if the predictive models you create will prove valuable to your organization. We spoke to three top data mining experts to learn the tests they use to measure the accuracy of their own results, and what makes each test so effective.

Published in: Business

  1. 3 Tests Experts Use to Validate Predictive Model Accuracy
  2. Lift Charts and Decile Tables
     Lift charts and decile tables compare a model’s results against the results you would get if no model were used. Karl Rexer, founder of Rexer Analytics, uses them to test models that predict binary behaviors, e.g. whether a lead on a website will convert to a sale. Here’s how it works:
     1. Randomly split the lead data into two samples: 60% = modeling sample, 40% = hold-out sample.
     2. Use data mining algorithms to find the set of predictors that works best in the modeling sample and identifies highly responsive leads.
     3. Score leads on a scale of 1-100, 100 being the most likely to convert.
     4. Rank-order leads by score.
     5. Split the ranked leads into 10 equal sections (deciles).
     6. Evaluate the results in a decile table.
  3. Lift Charts and Decile Tables
     If the model is working well, the leads in the top deciles will have a much higher conversion rate than leads in the lower deciles. The “Lift” column shows how much more successful the model is than using no model, i.e. each decile’s conversion rate relative to the overall conversion rate.
     [Table: Hold-out sample decile table]
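The scoring, ranking, and decile steps can be sketched in a few lines of Python. This is only an illustration: the helper name `decile_table`, the synthetic scores, and the synthetic conversion outcomes are all assumptions, not data or code from the slides, and lift is computed here as a decile's conversion rate divided by the overall conversion rate.

```python
import random

def decile_table(scores, outcomes):
    """Rank leads by score (highest first), cut them into 10 deciles,
    and report each decile's conversion rate and lift."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(outcomes) / n  # conversion rate with no model
    rows = []
    for d in range(10):
        chunk = ranked[d * n // 10:(d + 1) * n // 10]
        conversions = sum(outcome for _, outcome in chunk)
        rate = conversions / len(chunk)
        rows.append({
            "decile": d + 1,
            "leads": len(chunk),
            "conversions": conversions,
            "rate": rate,
            "lift": rate / overall_rate,
        })
    return rows

# Synthetic hold-out sample (an assumption for illustration only):
# higher-scored leads are made more likely to convert.
rng = random.Random(0)
scores = [rng.randint(1, 100) for _ in range(2000)]
outcomes = [1 if rng.random() < s / 250 else 0 for s in scores]

table = decile_table(scores, outcomes)
for row in table:
    print(f"{row['decile']:>2}  leads={row['leads']}  "
          f"rate={row['rate']:.3f}  lift={row['lift']:.2f}")
```

A lift above 1 in the top deciles and below 1 in the bottom deciles is the pattern the slide describes for a model that is working well.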
  4. Lift Charts and Decile Tables
     This data is then plotted on a lift chart to illustrate the model’s performance. If no model were used, the results would fall on a straight diagonal line (shown in red on the chart): contacting the first decile (the first 10% of leads) would yield 10% of sales, contacting 20% of leads would yield 20% of sales, and so on. The blue line represents the predictive model. The red X marks the lift of the first decile above the random model. The green line represents the “perfect” model, or the fewest leads you would have to contact to yield 100% of sales.
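The three lines on the chart can be computed directly from ranked leads. The sketch below is a hypothetical helper (`cumulative_gains` is not a name from the slides) run on a made-up ten-lead example; it returns, for each cutoff, the share of leads contacted, the share of sales expected at random, the share captured by the model, and the share a perfect ranking would capture.

```python
def cumulative_gains(scores, outcomes, steps=10):
    """Return (contacted, random, model, perfect) shares of sales at each cutoff."""
    # Outcomes ordered by score, highest-scored lead first.
    ranked = [o for _, o in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    total_sales = sum(ranked)
    points = []
    for k in range(1, steps + 1):
        cutoff = k * n // steps
        contacted = cutoff / n
        random_share = contacted                          # no model: diagonal line
        model_share = sum(ranked[:cutoff]) / total_sales  # predictive model
        perfect_share = min(cutoff, total_sales) / total_sales  # every sale ranked first
        points.append((contacted, random_share, model_share, perfect_share))
    return points

# Made-up example: ten leads, four of which convert.
scores = [95, 90, 80, 70, 60, 50, 40, 30, 20, 10]
outcomes = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

points = cumulative_gains(scores, outcomes, steps=5)
for contacted, random_share, model_share, perfect_share in points:
    print(f"contacted={contacted:.0%}  random={random_share:.0%}  "
          f"model={model_share:.0%}  perfect={perfect_share:.0%}")
```

In this toy example, contacting the top 20% of leads captures 50% of sales, against 20% for the random baseline.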
  5. Lift Charts and Decile Tables
     “[With data mining algorithms, lift charts and decile tables], you’re doing something called supervised learning. You’re using historical data where you know the outcome of the scenario to supervise the creation of your model and evaluate how well it will work to predict a certain behavior. It’s a different methodology.” - Karl Rexer, founder of Rexer Analytics
  6. Target Shuffling
     Target shuffling is a process that reveals how likely it is for results to have occurred by chance. John Elder, founder of Elder Research, uses target shuffling to test the statistical accuracy of his data mining results. Here’s how target shuffling works:
     1. Randomly shuffle the output (target variable) on the training data to “break the relationship” between it and the input variables.
     2. Search for combinations of variables having a high concentration of interesting outputs.
     3. Save the “most interesting” result and repeat the process many times.
     4. Look at the distribution of the collected bogus “most interesting” results to see how strong an apparent result can be extracted from random data.
     5. Evaluate where on (or beyond) this distribution your actual results stand.
     6. Use this as your “significance” measure.
  7. Target Shuffling
     According to Elder, target shuffling is useful for preventing what he calls the “vast search effect.” “The more variables you have, the easier it becomes to ‘oversearch’ and identify false patterns between them,” he says. Elder plots the “bogus” results in a histogram (a graphical representation of how data is distributed) and evaluates where on this distribution his model’s initial result stands. If the initial result is stronger than the best result from the shuffled data, it is very unlikely to have arisen by chance.
  8. Target Shuffling
     [Figure: Histogram comparing model success to shuffled models]
  9. Target Shuffling
     “Target shuffling is a very good way to test non-traditional statistical problems. But more importantly, it’s a process that makes sense to a decision maker. Statistics is not persuasive to most people: it’s just too complex. If you’re a business person, you want to make decisions based upon things that are real and will hold up. So when you simulate a scenario like this, it quantifies how likely it is that the results you observed could have arisen by chance in a way that people can understand.” - John Elder, founder of Elder Research
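The target shuffling steps can be sketched as follows. Everything here is an assumption made for illustration: the "search" is a deliberately simple one (the largest gap in target rate between rows where a binary input is 1 and rows where it is 0), the data is synthetic, and the function names `best_rate_gap` and `target_shuffle` are hypothetical. A real search over combinations of variables would replace `best_rate_gap`.

```python
import random

def best_rate_gap(features, target):
    """Search step: the 'most interesting' result across all input columns,
    measured as the largest gap in target rate between x == 1 and x == 0."""
    best = 0.0
    for column in features:
        on = [t for x, t in zip(column, target) if x == 1]
        off = [t for x, t in zip(column, target) if x == 0]
        if on and off:
            best = max(best, abs(sum(on) / len(on) - sum(off) / len(off)))
    return best

def target_shuffle(features, target, n_shuffles=200, seed=0):
    """Compare the actual best result with the best results found after
    shuffling the target, which breaks its link to the inputs."""
    rng = random.Random(seed)
    actual = best_rate_gap(features, target)
    shuffled = list(target)
    bogus = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        bogus.append(best_rate_gap(features, shuffled))
    # Share of shuffled runs that match or beat the actual result.
    significance = sum(b >= actual for b in bogus) / n_shuffles
    return actual, bogus, significance

# Synthetic data (assumption): 10 binary inputs, only the first one matters.
rng = random.Random(42)
n_rows = 300
features = [[rng.randint(0, 1) for _ in range(n_rows)] for _ in range(10)]
target = [1 if rng.random() < (0.7 if features[0][i] else 0.2) else 0
          for i in range(n_rows)]

actual, bogus, significance = target_shuffle(features, target)
print(f"actual best gap: {actual:.3f}")
print(f"largest gap found in shuffled data: {max(bogus):.3f}")
print(f"share of shuffles matching or beating the actual result: {significance:.3f}")
```

Plotting `bogus` as a histogram and marking `actual` on it would give a chart like the one Elder describes.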
  10. Bootstrap Sampling
      Bootstrap sampling tests a model’s performance on subsets of the data over and over again to provide an estimate of accuracy. Dean Abbott, president of Abbott Analytics, Inc., uses this method to test the consistency of his predictive models and to determine whether they are not just statistically significant but also operationally significant. “You can have a model that is statistically significant, but it doesn’t mean that it’s generating enough revenue to be interesting,” he explains. “You might come up with a model for a marketing campaign that winds up generating $40,000 in additional revenue, but that’s not enough to even cover the cost of the modeler who built it.”
  11. Bootstrap Sampling
      Here’s how bootstrap sampling works:
      1. Take a random sample of data and split it into three subsets: training, testing and validation.
      2. Build the model on the training subset.
      3. Evaluate the model on the testing subset.
      4. Repeat this training and testing process several times.
      5. Once you’re convinced your model is consistent and accurate, deploy it against the final validation subset.
      The validation subset gives a better sense of how much of an improvement the model is likely to deliver on real data. This method is good for two things: picking which of your models “wins,” and showing the range of lifts you get when you run the models multiple times.
  12. Bootstrap Sampling
  13. Bootstrap Sampling
      “Bootstrap sampling tells you how the model accuracy is bounded, and thus what to expect when you run it live. When you only run a model through test data, it’s hard to know if the lift you’re getting is real.” - Dean Abbott, president of Abbott Analytics, Inc.
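A minimal sketch of the repeated train-and-test loop follows, under several assumptions the slides do not spell out: each round draws the training rows with replacement (the classical bootstrap) and tests on the rows that were not drawn, the "model" is a toy one-feature threshold rule, accuracy stands in for lift, and the data is synthetic. The names `fit_threshold`, `accuracy`, and `bootstrap_accuracy` are hypothetical.

```python
import random

def fit_threshold(rows):
    """Toy model: midpoint between the mean feature value of each class.
    Assumes both classes are present in the training rows."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, rows):
    """Share of rows where 'x above threshold' agrees with the outcome."""
    return sum((x > threshold) == (y == 1) for x, y in rows) / len(rows)

def bootstrap_accuracy(dev, n_rounds=200, seed=0):
    """Repeat: train on a bootstrap sample, test on the rows left out."""
    rng = random.Random(seed)
    n = len(dev)
    results = []
    for _ in range(n_rounds):
        picked = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(picked)
        train = [dev[i] for i in picked]
        test = [dev[i] for i in range(n) if i not in chosen]
        results.append(accuracy(fit_threshold(train), test))
    return sorted(results)

# Synthetic data (assumption): one numeric feature that is higher for converters.
rng = random.Random(1)
data = []
for _ in range(500):
    y = rng.randint(0, 1)
    data.append((rng.gauss(2.0 * y, 1.0), y))
rng.shuffle(data)
dev, validation = data[:400], data[400:]  # validation is held out until the end

results = bootstrap_accuracy(dev)
low = results[int(0.025 * len(results))]
high = results[min(len(results) - 1, int(0.975 * len(results)))]
print(f"test accuracy ranged from {low:.3f} to {high:.3f} in 95% of rounds")

# Final step: fit on all development data, then check the validation subset once.
validation_accuracy = accuracy(fit_threshold(dev), validation)
print(f"validation accuracy: {validation_accuracy:.3f}")
```

The spread between `low` and `high` is the kind of bound Abbott refers to: it shows how much the measured performance moves from one resample to the next, which a single test run cannot reveal.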
  14. Read Report
      Read the full article to learn more. Software Advice™ is a trusted resource for software buyers. The company’s website provides detailed reviews, comparisons and research to help organizations choose the right software. Meanwhile, the company’s team of software analysts provides free telephone consultations to help each software buyer identify systems that best fit their needs. In the process, Software Advice connects software buyers and sellers, generating high-quality opportunities for software vendors.