3 Tests Experts Use to Validate Predictive Model Accuracy

Lift Charts and Decile Tables

Lift charts and decile tables compare a model's results against the results
you would get if no model were used.
Karl Rexer, founder of Rexer Analytics, uses lift charts and decile tables to test models that
predict binary behaviors, such as whether a lead on a website will convert to a sale.
Here’s how it works:
1. Randomly split lead data into two samples: 60% = modeling sample, 40%
= hold-out sample.
2. Use data mining algorithms to find the best set of predictors that work in
the modeling sample and identify highly responsive leads.
3. Score leads on a scale of 1-100, 100 being the most likely to convert.
4. Rank order leads by score.
5. Split leads into 10 sections (deciles).
6. Evaluate the results in a decile table.


If the model is working well, the leads in the top deciles will have a much higher conversion
rate than leads in the lower deciles. The “Lift” column shows how much more successful the
model is than using no model at all.
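As a rough illustration of steps 4 to 6, the sketch below builds a decile table from a hold-out sample that has already been scored. The data is synthetic, the function name `decile_table` is hypothetical, and lift is assumed here to mean a decile's conversion rate divided by the overall conversion rate (a common definition, though the slides do not spell one out).

```python
import random

def decile_table(scores, converted):
    """Rank leads by score (highest first), cut them into 10 groups, and
    report each group's conversion rate and its lift over the overall rate.
    Assumes at least 10 leads and at least one conversion."""
    ranked = sorted(zip(scores, converted), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(converted) / n
    rows = []
    for d in range(10):
        chunk = ranked[d * n // 10:(d + 1) * n // 10]
        conversions = sum(outcome for _, outcome in chunk)
        rate = conversions / len(chunk)
        rows.append({"decile": d + 1, "leads": len(chunk),
                     "conversions": conversions, "rate": rate,
                     "lift": rate / overall_rate})
    return rows

# Synthetic hold-out sample (an assumption for illustration only):
# scores run from 1 to 100 and higher scores convert more often.
rng = random.Random(0)
scores = [rng.randint(1, 100) for _ in range(2000)]
converted = [1 if rng.random() < s / 400 else 0 for s in scores]

table = decile_table(scores, converted)
for row in table:
    print(f'{row["decile"]:>2}  leads={row["leads"]}  '
          f'rate={row["rate"]:.3f}  lift={row["lift"]:.2f}')
```

A lift above 1 in the top deciles and below 1 in the bottom deciles is the pattern the table is meant to reveal.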

Hold-out sample decile table


This data is then plotted on a lift chart to illustrate the performance of the model.
If no model were used, the results would appear as a straight diagonal line (in red below):
contacting the first decile (the first 10% of leads) would yield 10% of sales, contacting 20% of
leads would yield 20% of sales, and so on.

The blue line represents the predictive model. The red X represents the lift of the first
decile above the random model.
The green line represents the “perfect” model, or the fewest leads you would have to
contact to yield 100% of sales.
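The three lines on the chart can be computed as cumulative shares of sales. The sketch below does so for a small hand-made set of 20 scored leads; the data and the names `cumulative_gains`, `model_curve` and `perfect_curve` are assumptions for illustration, not taken from Rexer's work.

```python
def cumulative_gains(outcomes):
    """Share of all sales captured after contacting 10%, 20%, ... 100% of
    leads, taking the leads in the order given."""
    n, total = len(outcomes), sum(outcomes)
    return [sum(outcomes[:d * n // 10]) / total for d in range(1, 11)]

# Hypothetical (score, converted) pairs for 20 hold-out leads.
leads = [(100, 1), (95, 1), (90, 0), (85, 1), (80, 0), (75, 1), (70, 0),
         (65, 0), (60, 1), (55, 0), (50, 0), (45, 0), (40, 0), (35, 0),
         (30, 1), (25, 0), (20, 0), (15, 0), (10, 0), (5, 0)]

# Model line: contact leads from highest score to lowest.
by_model = [c for _, c in sorted(leads, key=lambda lead: lead[0], reverse=True)]
# "Perfect" line: every converting lead is contacted before any non-converter.
by_perfect = sorted(by_model, reverse=True)

random_curve = [d / 10 for d in range(1, 11)]   # no model: 10% of leads, 10% of sales
model_curve = cumulative_gains(by_model)
perfect_curve = cumulative_gains(by_perfect)

# Lift of the first decile over the no-model baseline.
first_decile_lift = model_curve[0] / random_curve[0]
print(model_curve)
print(perfect_curve)
print(round(first_decile_lift, 2))
```

The closer the model line sits to the perfect line, and the further above the diagonal, the more sales are reached by contacting only the top-ranked leads.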


“[With data mining algorithms, lift charts and decile tables],
you’re doing something called supervised learning. You’re using
historical data where you know the outcome of the scenario to
supervise the creation of your model and evaluate how well it will
work to predict a certain behavior. It’s a different methodology.”
- Karl Rexer, founder of Rexer Analytics

Target Shuffling

Target shuffling is a process that reveals how likely it is for results to have
occurred by chance.
John Elder, founder of Elder Research, uses target shuffling to test the statistical accuracy
of his data mining results.
Here’s how target shuffling works:
1. Randomly shuffle the output (target variable) on the training data to
“break the relationship” between it and the input variables.
2. Search for combinations of variables having a high concentration of
interesting outputs.
3. Save the “most interesting” result and repeat the process many times.
4. Look at the distribution of the collected bogus “most interesting” results
to see how much apparent structure can be extracted from random data.
5. Evaluate where on (or beyond) this distribution your actual results stand.
6. Use this as your “significance” measure.
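The six steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Elder's implementation: the “search” is reduced to finding the single binary input whose flagged rows have the highest target rate, the data is synthetic, and the names `best_concentration` and `target_shuffle` are hypothetical.

```python
import random

def best_concentration(columns, target):
    """Search step: the highest target rate found among the rows flagged
    (value 1) by any single input variable."""
    best = 0.0
    for col in columns:
        flagged = [t for x, t in zip(col, target) if x == 1]
        if flagged:
            best = max(best, sum(flagged) / len(flagged))
    return best

def target_shuffle(columns, target, n_shuffles=200, seed=0):
    """Return the actual best result, the bogus best results obtained on
    shuffled targets, and the share of bogus results at least as strong."""
    rng = random.Random(seed)
    actual = best_concentration(columns, target)
    shuffled = list(target)
    bogus = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)            # break the input/output relationship
        bogus.append(best_concentration(columns, shuffled))
    share = sum(b >= actual for b in bogus) / n_shuffles
    return actual, bogus, share

# Synthetic training data (assumption): 15 binary inputs, of which only the
# first one truly affects the target.
rng = random.Random(42)
n_rows = 600
columns = [[1 if rng.random() < 0.3 else 0 for _ in range(n_rows)] for _ in range(15)]
target = [1 if rng.random() < (0.6 if columns[0][i] else 0.1) else 0
          for i in range(n_rows)]

actual, bogus, share = target_shuffle(columns, target)
print(f"actual best rate: {actual:.3f}")
print(f"strongest bogus rate: {max(bogus):.3f}")
print(f"share of shuffles at least as strong: {share:.3f}")
```

The final share plays the role of the “significance” measure in step 6: the smaller it is, the less likely the actual result is a product of the search alone.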

According to Elder, target shuffling is useful for preventing what he calls the “vast search
effect.”
“The more variables you have, the easier it becomes to ‘oversearch’ and identify false
patterns between them,” he says.
Elder compares the “bogus” results using a histogram, or a graphical representation of
how data is distributed, and evaluates where on this distribution his model’s initial results
stand.
If the initial result is stronger than the best result obtained from the shuffled data, it is
very unlikely to have arisen by chance, which supports treating the finding as real.

Histogram comparing model success to shuffled models

“Target shuffling is a very good way to test non-traditional
statistical problems. But more importantly, it’s a process that
makes sense to a decision maker. Statistics is not persuasive to
most people—it’s just too complex.
“If you’re a business person, you want to make decisions based
upon things that are real and will hold up. So when you simulate
a scenario like this, it quantifies how likely it is that the results
you observed could have arisen by chance in a way that people
can understand.”
- John Elder, founder of Elder Research

Bootstrap Sampling

Bootstrap sampling tests a model’s performance on certain subsets of data over
and over again to provide an estimate of accuracy.
Dean Abbott, president of Abbott Analytics, Inc., uses this method to test the consistency
of his predictive models and to determine whether they are not just statistically significant but
also operationally significant.
“You can have a model that is statistically significant, but it doesn’t mean that it’s
generating enough revenue to be interesting,” he explains.
“You might come up with a model for a marketing campaign that winds up generating
$40,000 in additional revenue, but that’s not enough to even cover the cost of the modeler
who built it.”

Here’s how bootstrap sampling works:
1. Take a random sample of data and split it into three subsets: training,
testing and validation.
2. Build the model on the training subset.
3. Evaluate the model on the testing subset.
4. Repeat this training and testing process several times.
5. Once you’re convinced your model is consistent and accurate, deploy it
against the final validation subset.
The validation subset provides a better understanding of how well the model is
likely to perform when you use it on real data.
This method is good for two things: picking which of your models “wins,” and
showing the range of lifts you get when you run models through multiple times.
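A minimal sketch of this loop follows. Several details are assumptions rather than Abbott's procedure: each round draws the training subset with replacement (the usual meaning of “bootstrap”) and tests on the rows left out of that draw; the “model” is just a per-segment conversion rate; lift is measured on the top-scored 20% of rows; and the data and the names `fit` and `top_lift` are invented for illustration.

```python
import random

def fit(rows):
    """Placeholder model: conversion rate per segment seen in the training rows."""
    totals, wins = {}, {}
    for segment, converted in rows:
        totals[segment] = totals.get(segment, 0) + 1
        wins[segment] = wins.get(segment, 0) + converted
    return {s: wins[s] / totals[s] for s in totals}

def top_lift(model, rows, fraction=0.2):
    """Conversion rate among the top-scored fraction of rows, divided by the
    overall conversion rate of those rows."""
    ranked = sorted(rows, key=lambda r: model.get(r[0], 0.0), reverse=True)
    top = ranked[:max(1, int(len(ranked) * fraction))]
    overall = sum(c for _, c in rows) / len(rows)
    return (sum(c for _, c in top) / len(top)) / overall

# Synthetic leads (assumption): five segments with different true conversion rates.
rng = random.Random(7)
true_rates = {"A": 0.40, "B": 0.20, "C": 0.10, "D": 0.05, "E": 0.02}
segments = list(true_rates)
data = []
for _ in range(3000):
    seg = rng.choice(segments)
    data.append((seg, 1 if rng.random() < true_rates[seg] else 0))

# Set the validation subset aside; it is untouched until the very end.
validation, working = data[:600], data[600:]

lifts = []
for _ in range(50):
    idx = [rng.randrange(len(working)) for _ in working]  # sample with replacement
    chosen = set(idx)
    train = [working[i] for i in idx]
    test = [row for i, row in enumerate(working) if i not in chosen]
    lifts.append(top_lift(fit(train), test))

print(f"lift range over 50 rounds: {min(lifts):.2f} to {max(lifts):.2f}")

# Final check against the validation subset.
validation_lift = top_lift(fit(working), validation)
print(f"validation lift: {validation_lift:.2f}")
```

The spread between the lowest and highest lift gives a sense of how tightly the model's performance is bounded, and the same loop can be run for each candidate model to compare them.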

“Bootstrap sampling tells you how the model accuracy is
bounded, and thus what to expect when you run it live.
When you only run a model through test data, it’s hard to know
if the lift you’re getting is real.”
- Dean Abbott, president of Abbott Analytics, Inc.

Software Advice™ is a trusted resource for software buyers. The company's
website, www.softwareadvice.com, provides detailed reviews, comparisons and
research to help organizations choose the right software. Meanwhile, the company’s
team of software analysts provides free telephone consultations to help each
software buyer identify systems that best fit their needs. In the process, Software
Advice connects software buyers and sellers, generating high-quality opportunities
for software vendors.