
# 3 Tests Experts Use to Validate Predictive Model Accuracy

There are many different tests you can use to determine if the predictive models you create will prove valuable to your organization. We spoke to three top data mining experts to learn the tests they use to measure the accuracy of their own results, and what makes each test so effective.


### Lift Charts and Decile Tables

Lift charts and decile tables compare a model's results with what the results would be if no model were used. Karl Rexer, founder of Rexer Analytics, uses lift charts and decile tables to test models that predict binary behaviors, e.g., whether a website lead will convert to a sale. Here’s how it works:

1. Randomly split the lead data into two samples: 60% as the modeling sample and 40% as the hold-out sample.
2. Use data mining algorithms to find the best set of predictors in the modeling sample for identifying highly responsive leads.
3. Score the leads in the hold-out sample on a scale of 1 to 100, with 100 being the most likely to convert.
4. Rank-order the leads by score.
5. Split the ranked leads into 10 equal sections (deciles).
6. Evaluate the results in a decile table.

If the model is working well, the leads in the top deciles will have a much higher conversion rate than the leads in the lower deciles. The “Lift” column shows how much more successful the model is than using no model at all; it is typically a decile's conversion rate divided by the overall conversion rate of the sample.

[Figure: Hold-out sample decile table]
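
A minimal Python sketch of steps 1 and 4 through 6 follows. It assumes each hold-out lead already carries a model score and a yes/no conversion flag; fitting the model itself (steps 2 and 3) is left out, and the helper names `split_sample` and `decile_table` are hypothetical. Lift is computed here as the per-decile conversion rate divided by the overall rate; some tools report it multiplied by 100 or cumulatively instead.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_sample(rows: Sequence[T], modeling_share: float = 0.6,
                 seed: int = 0) -> Tuple[List[T], List[T]]:
    """Step 1 (hypothetical helper): random modeling / hold-out split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * modeling_share)
    return shuffled[:cut], shuffled[cut:]

def decile_table(scores: Sequence[float],
                 converted: Sequence[bool]) -> List[Dict[str, float]]:
    """Steps 4-6 (hypothetical helper): rank leads, cut into tenths, summarize."""
    n = len(scores)
    if n < 10 or not any(converted):
        raise ValueError("need at least 10 leads and at least one conversion")
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)  # best first
    overall_rate = sum(converted) / n          # conversion rate with no model
    table = []
    for d in range(10):
        # Ranks with rank * 10 // n == d make up decile d + 1.
        members = [i for rank, i in enumerate(order) if rank * 10 // n == d]
        hits = sum(converted[i] for i in members)
        rate = hits / len(members)
        table.append({"decile": d + 1, "leads": len(members),
                      "conversions": hits, "conversion_rate": rate,
                      "lift": rate / overall_rate})  # times better than random
    return table

# Usage with synthetic leads: (score 1-100, converted?) pairs where higher
# scores convert more often. Real scores would come from a model fit on
# `modeling` (steps 2-3), which this sketch leaves out.
rng = random.Random(42)
leads = [(s, rng.random() < s / 150)
         for s in (rng.randint(1, 100) for _ in range(2000))]
modeling, holdout = split_sample(leads)
table = decile_table([s for s, _ in holdout], [c for _, c in holdout])
for row in table:
    print(row)
```

Ranking once and slicing into tenths keeps every decile the same size (to within one lead), which is what makes the per-decile conversion rates directly comparable.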

This data is then plotted on a lift chart to illustrate the model's performance. If no model were used, the results would fall on a straight diagonal line (red in the chart): contacting the first decile (the first 10% of leads) would yield 10% of sales, contacting 20% of leads would yield 20% of sales, and so on. The blue line represents the predictive model. The red X marks the lift of the first decile above the random model. The green line represents the “perfect” model, i.e., the fewest leads you would have to contact to capture 100% of sales.
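
The points behind a chart like this can be computed as sketched below, assuming the same score and conversion inputs as a decile table; `gains_points` is a hypothetical helper. A chart of cumulative share of sales against share of leads contacted is often called a cumulative gains chart. Because the no-model line is simply the diagonal, only the model and perfect curves need computing.

```python
from typing import List, Sequence, Tuple

def gains_points(scores: Sequence[float],
                 converted: Sequence[bool]) -> List[Tuple[float, float, float]]:
    """Hypothetical helper: for each decile cut, return
    (share of leads contacted, share of sales found by the model's ranking,
     share of sales found by a perfect ranking).
    The no-model line needs no computation: it equals the share of leads."""
    n = len(scores)
    total = sum(converted)
    if total == 0:
        raise ValueError("need at least one conversion")
    # Outcomes listed from the highest-scored lead to the lowest.
    ranked = [converted[i]
              for i in sorted(range(n), key=lambda i: scores[i], reverse=True)]
    points = []
    for d in range(1, 11):
        k = n * d // 10                        # contact the top k leads
        model_share = sum(ranked[:k]) / total
        perfect_share = min(k, total) / total  # every contact is a sale
        points.append((k / n, model_share, perfect_share))
    return points

# Usage: 100 leads scored 0-99 where only every tenth lead converts, so the
# "model" ranking is no better than random.
for point in gains_points([float(s) for s in range(100)],
                          [s % 10 == 0 for s in range(100)]):
    print(point)
```

Dividing the model's share at the first cut by 0.1 (the random share at that cut) gives the first-decile lift, which is the gap the red X highlights.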

“[With data mining algorithms, lift charts and decile tables], you’re doing something called supervised learning. You’re using historical data where you know the outcome of the scenario to supervise the creation of your model and evaluate how well it will work to predict a certain behavior. It’s a different methodology.” - Karl Rexer, founder of Rexer Analytics

### Target Shuffling

Target shuffling is a process that reveals how likely it is that results occurred by chance. John Elder, founder of Elder Research, uses target shuffling to test the statistical significance of his data mining results. Here’s how target shuffling works:

1. Randomly shuffle the output (target variable) in the training data to “break the relationship” between it and the input variables.
2. Search for combinations of variables that have a high concentration of interesting outputs.
3. Save the “most interesting” result, and repeat the process many times.
4. Look at the distribution of the collected bogus “most interesting” results to see how strong an apparent result can be extracted from purely random data.
5. Evaluate where on (or beyond) this distribution your actual results stand.
6. Use this as your “significance” measure.

According to Elder, target shuffling helps prevent what he calls the “vast search effect.” “The more variables you have, the easier it becomes to ‘oversearch’ and identify false patterns between them,” he says. Elder plots the bogus results as a histogram (a graphical representation of how data is distributed) and evaluates where his model's actual result falls on that distribution. If the actual result is stronger than the best result from the shuffled data, it is very unlikely to have arisen by chance, and the finding can be treated as real.
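
A small simulation makes the procedure concrete. In this sketch the “search” is a hypothetical stand-in (`best_subgroup_rate`: the best target rate among rows where any single binary input equals 1); a real project would rerun its full modeling search on each shuffled copy of the target. The share of shuffled runs that match or beat the real result serves as the significance measure, and plotting `bogus` as a histogram gives the kind of chart Elder describes.

```python
import random
from typing import List, Sequence, Tuple

def best_subgroup_rate(X: Sequence[Sequence[int]], y: Sequence[int],
                       min_size: int = 5) -> float:
    """Toy 'search': the highest target rate among rows where one binary input is 1."""
    best = 0.0
    for j in range(len(X[0])):
        group = [y[i] for i in range(len(X)) if X[i][j] == 1]
        if len(group) >= min_size:          # ignore tiny, noisy subgroups
            best = max(best, sum(group) / len(group))
    return best

def target_shuffle(X: Sequence[Sequence[int]], y: Sequence[int],
                   n_shuffles: int = 1000,
                   seed: int = 0) -> Tuple[float, List[float], float]:
    """Return the real result, the bogus results, and the share of bogus >= real."""
    rng = random.Random(seed)
    actual = best_subgroup_rate(X, y)
    shuffled = list(y)
    bogus = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)               # break the input-target relationship
        bogus.append(best_subgroup_rate(X, shuffled))
    # Empirical significance: how often random data matched or beat the real finding.
    share = sum(b >= actual for b in bogus) / n_shuffles
    return actual, bogus, share

# Usage: five binary inputs; only the first one actually drives the target.
X = [[(i >> k) & 1 for k in range(5)] for i in range(100)]
y = [row[0] for row in X]
actual, bogus, share = target_shuffle(X, y, n_shuffles=200)
print(actual, max(bogus), share)
```

Some practitioners report `(count + 1) / (n_shuffles + 1)` instead of the raw share so that the estimate is never exactly zero; either way, a small value means random data rarely produces a result as strong as the real one.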

[Figure: Histogram comparing model success to shuffled models]

“Target shuffling is a very good way to test non-traditional statistical problems. But more importantly, it’s a process that makes sense to a decision maker. Statistics is not persuasive to most people; it’s just too complex.

“If you’re a business person, you want to make decisions based upon things that are real and will hold up. So when you simulate a scenario like this, it quantifies how likely it is that the results you observed could have arisen by chance in a way that people can understand.” - John Elder, founder of Elder Research

### Bootstrap Sampling

Bootstrap sampling tests a model’s performance on different random subsets of the data over and over again to provide an estimate of its accuracy. Dean Abbott, president of Abbott Analytics, Inc., uses this method to test the consistency of his predictive models and to determine whether they are not just statistically significant but also operationally significant. “You can have a model that is statistically significant, but it doesn’t mean that it’s generating enough revenue to be interesting,” he explains. “You might come up with a model for a marketing campaign that winds up generating \$40,000 in additional revenue, but that’s not enough to even cover the cost of the modeler who built it.”

Here’s how bootstrap sampling works:

1. Take a random sample of the data and split it into three subsets: training, testing, and validation.
2. Build the model on the training subset.
3. Evaluate the model on the testing subset.
4. Repeat this training and testing process several times.
5. Once you’re convinced your model is consistent and accurate, deploy it against the final validation subset.

The validation subset gives a more realistic picture of how much better than random the model is likely to perform on real data. This method is good for two things: picking which of your models “wins,” and showing the range of lifts you get when you run models through the process multiple times.
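
The loop below sketches the repeat-and-evaluate idea under explicit assumptions. It uses the classic bootstrap variant: each round trains on rows drawn with replacement from a development set and tests on the rows that were never drawn (the “out-of-bag” rows), while a separate validation set is used only once at the end. The threshold model, the accuracy metric, and the synthetic data are hypothetical placeholders; in practice you would plug in your own model and a metric such as lift.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple, TypeVar

Row = Tuple[float, int]   # (feature value, outcome 0/1): a toy data shape
M = TypeVar("M")

def bootstrap_scores(dev: Sequence[Row], fit: Callable[[List[Row]], M],
                     evaluate: Callable[[M, List[Row]], float],
                     n_rounds: int = 50, seed: int = 0) -> List[float]:
    """Refit on bootstrap resamples of `dev`; score each fit on its out-of-bag rows."""
    rng = random.Random(seed)
    n = len(dev)
    scores = []
    for _ in range(n_rounds):
        picks = [rng.randrange(n) for _ in range(n)]          # with replacement
        in_bag = set(picks)
        train = [dev[i] for i in picks]
        test = [dev[i] for i in range(n) if i not in in_bag]  # never drawn
        if test:
            scores.append(evaluate(fit(train), test))
    return scores

def accuracy(threshold: float, rows: Sequence[Row]) -> float:
    """Share of rows where 'predict 1 if x >= threshold' matches the outcome."""
    return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)

def fit_threshold(rows: List[Row]) -> float:
    """Hypothetical toy model: the training x value with the best training accuracy."""
    return max(sorted({x for x, _ in rows}), key=lambda t: accuracy(t, rows))

# Usage with synthetic data: x in [0, 1), outcome 1 when x >= 0.5.
data = [(i / 200, int(i >= 100)) for i in range(200)]
random.Random(1).shuffle(data)
dev, validation = data[:140], data[140:]      # validation is held out entirely
oob = bootstrap_scores(dev, fit_threshold, accuracy)
print(f"out-of-bag accuracy: {min(oob):.3f} to {max(oob):.3f}, "
      f"mean {statistics.mean(oob):.3f}")
final = accuracy(fit_threshold(list(dev)), validation)  # one final check
print(f"validation accuracy: {final:.3f}")
```

The spread between `min(oob)` and `max(oob)` is the “range” Abbott refers to; comparing those ranges across candidate models is one way to pick which model wins before spending the validation set.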

“Bootstrap sampling tells you how the model accuracy is bounded, and thus what to expect when you run it live,” he says. “When you only run a model through test data, it’s hard to know if the lift you’re getting is real.” - Dean Abbott, president of Abbott Analytics, Inc.