Empirical Software Engineering relies on reusable datasets to make it easier to replicate empirical studies and therefore build theories on top of those empirical results. An area where these reusable datasets are particularly useful is defect prediction. In this area, the goal is to predict which entities will be more error-prone, so managers can take preventive actions to improve the quality of the delivered system. These reusable datasets contain information about source code files and their history, bug reports, and the bugs fixed in each file. However, some of the most widely used datasets in the Empirical Software Engineering community have been shown to be biased: many links between files and fixed bugs are missing. Previous research has already shown that this bias may affect the performance of defect prediction models. In this talk we will show how to use statistical techniques to evaluate the bias in datasets, and to estimate its impact on defect prediction.
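As a rough illustration of the kind of check the abstract refers to, the sketch below (not from the talk; it uses hypothetical in-memory records and SciPy) estimates link bias as the share of fixed bugs that cannot be traced to a fixing commit, and then tests whether linked and unlinked bugs have the same severity distribution. A significant difference would suggest the missing links are not random, i.e. the dataset is biased.

```python
# Minimal sketch, assuming hypothetical data. Real studies would load a bug
# tracker export and a commit log instead of the hard-coded records below.
from collections import Counter
from scipy.stats import chi2_contingency

# Hypothetical records: (bug_id, severity, linked_to_fix_commit)
fixed_bugs = [
    (1, "major", True), (2, "minor", False), (3, "major", True),
    (4, "minor", False), (5, "minor", False), (6, "major", True),
    (7, "major", True), (8, "minor", True), (9, "minor", False),
    (10, "major", True),
]

linked = [b for b in fixed_bugs if b[2]]
unlinked = [b for b in fixed_bugs if not b[2]]

# Share of fixed bugs we can actually trace to a file-changing commit.
print(f"link rate: {len(linked) / len(fixed_bugs):.0%}")

# Contingency table (rows: linked/unlinked, columns: severity levels).
severities = sorted({b[1] for b in fixed_bugs})
table = [
    [Counter(b[1] for b in group)[s] for s in severities]
    for group in (linked, unlinked)
]

# Chi-squared test: a small p-value indicates the severity distributions of
# linked and unlinked bugs differ, i.e. links are not missing at random.
chi2, p, _, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.3f}")
```

This is only one possible check; comparable tests can be run on other bug attributes (component, reporter, fix time) to characterize how the missing links are distributed.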