Think Locally, Act Globally - Improving Defect and Effort Prediction Models

Talk given at the 2012 Working Conference on Mining Software Repositories (MSR'12) in Zürich, Switzerland.



  1. Think Locally, Act Globally: Improving Defect and Effort Prediction Models. Nicolas Bettenburg • Meiyappan Nagappan • Ahmed E. Hassan. Queen's University, Kingston, ON, Canada. Software Analysis & Intelligence Lab. (Saturday, 2 June 2012)
  2–4. Data Modelling in Empirical SE: Observations are measured from project data; a Model describes the observations mathematically; the model then serves Prediction (to guide decision making) and Understanding (to guide process optimizations and future research).
  5–9. Model Building Today: take the whole dataset → split it into training data and testing data → learn a model M from the training data → use M to predict outcomes Y for the testing data → compare the predictions against the actual outcomes.
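The global pipeline on these slides can be sketched in a few lines. This is an illustrative stand-in with synthetic data and ordinary least squares, not the metrics or models from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "project data": 200 observations of 3 metrics, noisy linear response.
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Split the whole dataset into training and testing data.
train, test = np.arange(150), np.arange(150, 200)

# Learn one global model M from the training data (least squares with intercept).
A = np.column_stack([np.ones(len(train)), X[train]])
beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)

# Use M to predict outcomes Y for the testing data, then compare to the truth.
y_pred = np.column_stack([np.ones(len(test)), X[test]]) @ beta
mae = np.mean(np.abs(y_pred - y[test]))
```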
  10. Much research effort on new metrics and new models!
  11. Maybe we need to look more at the data part.
  12–17. In the Field. Tom Zimmermann: "We ran 622 cross-project predictions and found that only 3.4% actually worked." Tim Menzies: "Rather than focus on generalities, empirical SE should focus more on context-specific principles." Takeaway: taking local properties of data into consideration leads to better models!
  18–21. Using Locality in Statistical Models: (1) Does this principle work for statistical models? (2) Does it work for prediction? (3) Can we do better?
  22–26. Building Local Models: as before, split the whole dataset into training and testing data → cluster the training data → learn multiple models (M1, M2, M3, ...), one per cluster → predict each testing point individually, using the model of its cluster → compare the predictions against the actual outcomes.
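A minimal sketch of this local pipeline. The study clusters with MCLUST (model-based clustering with BIC selection); as a stand-in, this sketch uses a hand-rolled k-means on synthetic data with two regimes whose metric-to-response relationship differs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two regions with different trends: y = 2x on the left, y = -x on the right.
X = np.concatenate([rng.normal(-3, 1, 150), rng.normal(3, 1, 150)])[:, None]
y = np.where(X[:, 0] < 0, 2.0 * X[:, 0], -1.0 * X[:, 0]) + rng.normal(scale=0.1, size=300)
idx = rng.permutation(300)
train, test = idx[:200], idx[200:]

def kmeans(P, k, iters=25, seed=0):
    """Tiny k-means, standing in for MCLUST (the study's clustering step)."""
    r = np.random.default_rng(seed)
    centers = P[r.choice(len(P), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((P[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = P[labels == j].mean(0)
    return centers, labels

def ols(P, t):
    A = np.column_stack([np.ones(len(P)), P])
    beta, *_ = np.linalg.lstsq(A, t, rcond=None)
    return beta

centers, labels = kmeans(X[train], k=2)

# One local model per cluster, plus a single global model for comparison.
local = {j: ols(X[train][labels == j], y[train][labels == j]) for j in range(2)}
global_beta = ols(X[train], y[train])

# Predict each test point with the model of its nearest cluster.
test_labels = np.argmin(((X[test][:, None, :] - centers) ** 2).sum(-1), axis=1)
pred_local = np.array(
    [np.array([1.0, *X[t]]) @ local[j] for t, j in zip(test, test_labels)]
)
pred_global = np.column_stack([np.ones(len(test)), X[test]]) @ global_beta

mae_local = np.mean(np.abs(pred_local - y[test]))
mae_global = np.mean(np.abs(pred_global - y[test]))
```

Because the two regions follow different trends, the single global line must compromise between them, while each local model fits its own region well.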
  27–30. Global Statistical Model: one regression fitted to all of the data. [Figure: base plot is Fig. 2.1 from "General Aspects of Fitting Regression Models", a linear spline function with knots at a = 1, b = 3, c = 5; the slide overlays a single global fit.] Model fit leaves much room for improvement!
  31–34. Local Statistical Model: split the data into two regions and fit Model 1 and Model 2 separately. [Same base figure.] Improved fit!
  35. How can we use this approach to get an even better fit?
  36–40. Be Even More Local! Fit a separate model to ever smaller regions of the data. Great fit! BUT: risk of overfitting the data!! [Same base figure.]
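The trade-off can be made concrete: as a fit becomes more local (here, a linear spline with more and more knots), training error keeps shrinking, while held-out error stops improving. An illustrative sketch on synthetic data, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 6, 120)
y = np.sin(x) + rng.normal(scale=0.3, size=120)
x_tr, y_tr, x_te, y_te = x[:80], y[:80], x[80:], y[80:]

def spline_design(xs, knots):
    """Linear spline basis: intercept, x, and one hinge (x - a)+ per knot a."""
    cols = [np.ones_like(xs), xs] + [np.clip(xs - a, 0, None) for a in knots]
    return np.column_stack(cols)

train_mse, test_mse = [], []
for n_knots in [0, 2, 5, 10, 20, 40]:
    knots = np.linspace(0, 6, n_knots + 2)[1:-1]  # interior knots only
    A = spline_design(x_tr, knots)
    beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    train_mse.append(np.mean((A @ beta - y_tr) ** 2))
    test_mse.append(np.mean((spline_design(x_te, knots) @ beta - y_te) ** 2))
```

With no knots the model underfits (large errors on both sets); a handful of knots improves the test error; pushed far enough, only the training error keeps falling, which is exactly the overfitting risk the slide warns about.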
  42. Clustering Independent of Fit.
  43. [Fig. 2.1 base figure, shown side by side.] The spline model: C(Y|X) = f(X) = Xβ, where Xβ = β0 + β1X1 + β2X2 + β3X3 + β4X4, and X1 = X, X2 = (X − a)+, X3 = (X − b)+, X4 = (X − c)+, with knots a = 1, b = 3, c = 5.
  44–48. Optimize Local Fit w.r.t. Minimizing Global Overfit: Multivariate Adaptive Regression Splines (MARS) create local knowledge that optimizes the process globally. [Base figure and spline equation as on slide 43.]
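MARS builds such models in two phases: a forward pass greedily adds mirrored hinge pairs (x − a)+ and (a − x)+, then a backward pass prunes terms using a generalized cross-validation (GCV) score. A toy sketch of that idea (synthetic data; not the implementation used in the study):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 6, 100))
y = np.where(x < 3, x, 6 - x) + rng.normal(scale=0.2, size=100)  # piecewise trend

def fit(cols, t):
    """Least-squares fit on the given basis columns; returns (coefficients, SSE)."""
    A = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(A, t, rcond=None)
    return beta, float(np.sum((t - A @ beta) ** 2))

def gcv(sse, n, n_terms, penalty=3.0):
    """MARS-style generalized cross-validation: SSE inflated by an
    effective-parameter count, so extra terms must earn their keep."""
    c = n_terms + penalty * (n_terms - 1)
    return sse / n / (1 - c / n) ** 2

# Forward pass: greedily add the mirrored hinge pair (x-a)+, (a-x)+
# that most reduces the training SSE.
cols = [np.ones_like(x)]
candidates = list(np.linspace(0.5, 5.5, 11))
for _ in range(4):
    best = min(candidates, key=lambda a: fit(
        cols + [np.clip(x - a, 0, None), np.clip(a - x, 0, None)], y)[1])
    cols += [np.clip(x - best, 0, None), np.clip(best - x, 0, None)]
    candidates.remove(best)

# Backward pass: drop any term whose removal improves the GCV score.
improved = True
while improved and len(cols) > 1:
    improved = False
    current = gcv(fit(cols, y)[1], len(x), len(cols))
    for i in range(1, len(cols)):
        trial = cols[:i] + cols[i + 1:]
        if gcv(fit(trial, y)[1], len(x), len(trial)) < current:
            cols, improved = trial, True
            break

beta, sse = fit(cols, y)
mse = sse / len(x)
```

The forward pass alone tends to overfit; the GCV-driven backward pass is what keeps the locally flexible fit honest globally, which is the point of slides 44–48.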
  49–52. Case Study: Xalan 2.6 and Lucene 2.4 (post-release defects per class, 20 CK metrics); CHINA (total development effort in hours, 14 FP metrics); NasaCoc (development length in months, 24 COCOMO-II metrics).
  53–59. Results: Goodness of Fit. Rank correlation (0 = worst fit, 1 = optimal fit):

  Dataset    | Global | Local (Clustered) | MARS
  Xalan 2.6  | 0.33   | 0.52              | 0.69
  Lucene 2.4 | 0.32   | 0.60              | 0.83
  CHINA      | 0.83   | 0.89              | 0.89
  NasaCOC    | 0.93   | 0.97              | 0.99

  [Figure 3: number of clusters generated by MCLUST in each run of the 10-fold cross-validation (Fold01–Fold10), for CHINA, Lucene 2.4, NasaCoc, and Xalan 2.6.] For variable selection, the study uses BIC-based model selection via the R package BMA, supplying all independent variables that remain after VIF analysis; MARS models use the same set of variables. MARS's first (forward) phase often builds a model that overfits, so a second, backward phase prunes the model to increase its generalizability. Up to 2.5x better fit when using data locality!
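The goodness-of-fit numbers above are rank correlations between predicted and actual values. As a reminder of what that measures, a minimal Spearman sketch (plain integer ranks; a real implementation would average tied ranks):

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks.
    (Ties get an arbitrary but deterministic order here.)"""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

# Hypothetical defect counts vs. model predictions, for illustration only.
actual = np.array([0, 1, 1, 3, 7, 2])
predicted = np.array([0.2, 0.9, 1.4, 2.0, 5.5, 2.2])
rho = spearman(actual, predicted)  # ≈ 0.943: orderings mostly agree
```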
  60–61. Results: Prediction Error. [Bar charts of prediction error for Global, Local, and MARS models on each dataset; the clearly legible panels: CHINA: 765 (Global), 552.85 (Local), 234.43 (MARS); NasaCoC: 3.26 (Global), 2.14 (Local), 1.63 (MARS).] Up to 4x lower prediction error with local models!
  62. Model Interpretation?
  63–66. Model Interpretation. [Figure 6: partial plots of a global model learned on the Xalan 2.6 dataset, one panel per metric (avg_cc, ca, cam, cbm, ce, dam, dit, ic, lcom, lcom3, loc, max_cc, mfa, moa, noc, npm, ...); each curve describes the response (here, bugs) while keeping all other prediction variables at their median value. Caption: global models report general trends, while global models with local considerations give insight into particular parts of the data.] A traditional global model yields general trends: one curve per metric. [Figure 7: example of contradicting trends in local models (Xalan 2.6, Cluster 1 vs. Cluster 6 in Fold 9): local models partition the data into regions with individual properties; for example, an increase of ic (measuring inheritance coupling through parent classes) is predicted to have a negative effect on bug-proneness in only one of the clusters.] Accompanying text (partially recovered): in the Xalan 2.6 defect data, different sets of classes are influenced by different properties such as inheritance and cohesion; this reinforces the recommendation against a "one-size-fits-all" model, and when the goal is understanding, local models are more insightful than general trends.
