
Multi-Objective Cross-Project Defect Prediction


Cross-project defect prediction is very appealing because (i) it allows predicting defects in projects for which the availability of data is limited, and (ii) it allows producing generalizable prediction models. However, existing research suggests that cross-project prediction is particularly challenging and, due to heterogeneity of projects, prediction accuracy is not always very good. This paper proposes a novel, multi-objective approach for cross-project defect prediction, based on a multi-objective logistic regression model built using a genetic algorithm. Instead of providing the software engineer with a single predictive model, the multi-objective approach allows software engineers to choose predictors achieving a compromise between number of likely defect-prone artifacts (effectiveness) and LOC to be analyzed/tested (which can be considered as a proxy of the cost of code inspection). Results of an empirical evaluation on 10 datasets from the Promise repository indicate the superiority and the usefulness of the multi-objective approach with respect to single-objective predictors. Also, the proposed approach outperforms an alternative approach for cross-project prediction, based on local prediction upon clusters of similar classes.



  1. Gerardo Canfora, Andrea De Lucia, Massimiliano Di Penta, Rocco Oliveto, Annibale Panichella, Sebastiano Panichella. Multi-Objective Cross-Project Defect Prediction
  2. Bugs are everywhere…
  3. Software Testing
  4. Practical Constraints: Software Quality, Money, Time
  5. Defect Prediction: spend more resources on the components most likely to fail
  6. Indicators of defects: cached history information (Kim et al., ICSE 2007); change metrics (Moser et al., ICSE 2008); a metrics suite for object-oriented design (Chidamber et al., TSE 1994)
  7. Defect Prediction Methodology: a project is split into a training set and a test set; a predicting model learned on the training set flags each test-set class as defect-prone or not (Class1: YES, Class2: YES, Class3: NO, …)
  8. Within-project prediction: training set and test set come from the same project
  9. Within-project prediction. Issue: the size of the training set
  10. Cross-project prediction: the model is trained on past projects and applied to a new project
  11. Cross-project prediction: for example, train on Project B and test on Project A
  12. Cross-project prediction. Issue: the prediction accuracy can be lower
  13. Cost Effectiveness: (1) cross-project prediction does not necessarily work worse than within-project prediction; (2) better precision (accuracy) does not mirror lower inspection cost; (3) the traditional predicting model is logistic regression. See "Recalling the 'Imprecision' of Cross-Project Defect Prediction", Rahman et al., FSE 2012
  14. Cost Effectiveness: an example. Consider a system with four classes: Class A, Class B, Class C, Class D
  15. Predicting model 1 flags Class A (100 LOC) and Class B (10,000 LOC) as defect-prone; predicting model 2 flags Class A, Class C, and Class D (100 LOC each)
  16. Class A actually contains a bug
  17. Predicting model 1: precision = 50%, cost = 10,100 LOC
  18. Predicting model 2: precision = 33%, cost = 300 LOC
  19. Precision does not mirror the inspection cost: model 2 finds the same bug at a fraction of the inspection cost. All the existing predicting models work on precision and not on cost; we need COST-oriented models
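
To make the trade-off concrete, the following Python sketch recomputes the example; the LOC figures come from the slides, while the bug location (Class A) is an assumption inferred from the precision values shown above:

```python
# Recomputation of the slide example. The bug location (Class A) is an
# assumption inferred from the precision figures on the slides.
loc   = {"A": 100, "B": 10_000, "C": 100, "D": 100}   # lines of code
buggy = {"A": True, "B": False, "C": False, "D": False}

def precision_and_cost(flagged):
    """Precision: share of flagged classes that are actually buggy.
    Cost: total LOC a developer must inspect."""
    hits = sum(buggy[c] for c in flagged)
    return hits / len(flagged), sum(loc[c] for c in flagged)

print(precision_and_cost({"A", "B"}))       # model 1: (0.5, 10100)
print(precision_and_cost({"A", "C", "D"}))  # model 2: (0.333..., 300)
```

Model 1 looks more precise, yet requires inspecting over 30 times more code to find the same single defect.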
  20. Multi-Objective Logistic Regression
  21. Building the predicting model on the training set: each class is described by a vector of metrics (Class1: m11, m12, …; Class2: m21, m22, …); logistic regression produces a prediction for each class (C1: 1, C2: 1, C3: 0, C4: 1, …)
  22. The predictions are put side by side with the actual values (C1: 1, C2: 0, C3: 1, C4: 1, …)
  23. Predictions and actual values are compared
  24. GOAL: minimizing the prediction error (PRECISION)
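
As a reference point, here is a minimal sketch of this single-objective baseline, assuming scikit-learn and a Promise-style table with one row per class; the file names and metric columns are hypothetical:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# File names and metric columns are hypothetical stand-ins for a
# Promise-style export (one row per class, CK metrics plus a label).
train = pd.read_csv("training_projects.csv")
test  = pd.read_csv("new_project.csv")

metrics = ["wmc", "cbo", "rfc", "loc"]          # assumed metric columns
model = LogisticRegression(max_iter=1000)
model.fit(train[metrics], train["defective"])   # minimize prediction error
test["pred"] = model.predict(test[metrics])     # 1 = likely defect-prone
```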
  26. Multi-objective Logistic Regression: multiplying the prediction vector (Pred = [1, 0, …, 1, 0]) by the size of each class (LOC = [100, 95, …, 110, 10]) yields the cost vector (Cost = [100, 0, …, 110, 0]); Inspection Cost = 210 LOC
  27. Likewise, multiplying the predictions by the actual values (Actual = [1, 1, …, 1, 0]) yields the defects found (#Bug = [1, 0, …, 1, 0]); Effectiveness = 2 defects
  28. Multi-objective Logistic Regression: the model is trained against two objectives, $\min\; \mathrm{InspectionCost} = \sum_i \mathrm{Pred}_i \cdot \mathrm{Cost}_i$ and $\max\; \mathrm{Effectiveness} = \sum_i \mathrm{Pred}_i \cdot \mathrm{Actual}_i$
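
In code, the two objectives are plain dot products; the vectors below reuse the visible entries of the example above (the elided "…" entries are left out, since the shown entries already reproduce the totals):

```python
import numpy as np

# Objective computation for the slide example.
pred   = np.array([1, 0, 1, 0])         # defect-prone flags from the model
loc    = np.array([100, 95, 110, 10])   # LOC of each class
actual = np.array([1, 1, 1, 0])         # ground-truth defectiveness

inspection_cost = int(pred @ loc)       # objective to minimize -> 210
effectiveness   = int(pred @ actual)    # objective to maximize -> 2
```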
  30. Multi-objective Genetic Algorithm: each chromosome encodes the coefficients (a, b, c, …) of a logistic model, $\mathrm{Pred}_i = \frac{e^{a + b\,m_{i1} + c\,m_{i2} + \cdots}}{1 + e^{a + b\,m_{i1} + c\,m_{i2} + \cdots}}$; the fitness function is the pair of objectives above, and the multiple objectives are optimized using Pareto-efficient approaches
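
A sketch of how one GA individual could be evaluated under this formulation; the chromosome layout and the logistic function follow the slide, while the 0.5 classification threshold and the NumPy representation are assumptions:

```python
import numpy as np

def evaluate(chromosome, metrics, loc, actual, threshold=0.5):
    """Decode a chromosome (a, b, c, ...) into a logistic model and
    return its two fitness values (inspection cost, effectiveness)."""
    a, weights = chromosome[0], np.asarray(chromosome[1:])
    z = a + metrics @ weights              # a + b*m_i1 + c*m_i2 + ...
    prob = 1.0 / (1.0 + np.exp(-z))        # logistic function from the slide
    pred = (prob > threshold).astype(int)  # flag classes as defect-prone
    return float(pred @ loc), float(pred @ actual)
```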
  31. Pareto Optimality: all solutions that are not dominated by any other solution form the Pareto-optimal set, so multiple optimal solutions (models) can be found. Plotting cost against effectiveness, the frontier lets the engineer make a well-informed decision that balances the trade-offs between the two objectives
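
A minimal sketch of the non-dominance filter that produces such a frontier, assuming (cost, effectiveness) pairs:

```python
def dominates(p, q):
    """p dominates q if it costs no more, finds at least as many
    defects, and is strictly better on at least one objective."""
    (cp, ep), (cq, eq) = p, q
    return cp <= cq and ep >= eq and (cp < cq or ep > eq)

def pareto_front(solutions):
    """Keep only the non-dominated (cost, effectiveness) pairs:
    the frontier presented to the software engineer."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

# pareto_front([(210, 1), (300, 2), (500, 1), (10100, 3)])
# -> [(210, 1), (300, 2), (10100, 3)]
```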
  32. Empirical Evaluation
  33. Research Questions. RQ1: How does multi-objective (MO) prediction perform, compared to single-objective (SO) prediction?
  34. RQ1 comparisons: cross-project MO vs. cross-project SO vs. within-project SO
  35. RQ2: How does the proposed approach perform, compared to the local prediction approach by Menzies et al.?
  36. RQ2 comparison: cross-project MO vs. local prediction
  37. Experiment outline: 10 Java projects from the PROMISE dataset, with different sizes and different application contexts
  38. Cross-project defect prediction (RQ1): train the model on nine projects and test on the remaining one, repeated 10 times
  39. Within-project defect prediction (RQ1): 10-fold cross-validation
  40. Local prediction (RQ2): K-means clustering algorithm, with the number of clusters selected via the Silhouette Coefficient, as sketched below
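
For the RQ2 baseline, classes are first grouped into clusters of similar classes and a local model is trained per cluster. A sketch of the clustering step using scikit-learn, where the range of k values to try is an assumption:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_clustering(X, k_range=range(2, 11)):
    """Cluster classes by their metric vectors; pick the number of
    clusters that maximizes the mean Silhouette Coefficient."""
    best_score, best_labels = -1.0, None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    return best_labels  # a local model is then trained per cluster
```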
  41. Results
  42. Results: plots for Log4j and jEdit
  43. Cross-project MO vs. cross-project SO [bar chart of inspection cost in KLOC, 0 to 300, comparing cross-project SO and cross-project MO]
  44. The proposed multi-objective model outperforms the single-objective one
  45. Cross-project MO vs. within-project SO [bar chart of inspection cost in KLOC, 0 to 350, comparing within-project SO and cross-project MO]
  46. Cross-project MO vs. within-project SO [bar chart of precision, 0 to 100, comparing within-project SO and cross-project MO]
  47. Cross-project prediction is worse than within-project prediction in terms of PRECISION
  48. But it is better than within-project predictors in terms of COST-EFFECTIVENESS
  49. Cross-project MO vs. local prediction [bar chart of inspection cost in KLOC, 0 to 300, comparing local prediction and cross-project MO]
  50. The multi-objective predictor outperforms the local predictor
  51. Conclusions
