Micro Interaction Metrics for Defect Prediction (ESEC/FSE 2011)

  1. Micro Interaction Metrics for Defect Prediction. Taek Lee, Jaechang Nam, Dongyun Han, Sunghun Kim, Hoh Peter In. FSE 2011, Hungary, Sep. 5-9
  2. Outline
     • Research motivation
     • The existing metrics
     • The proposed metrics
     • Experiment results
     • Threats to validity
     • Conclusion
  3. Defect Prediction? Why is it necessary?
  4. Software quality assurance is inherently a resource-constrained activity!
  5. Predicting defect-prone software entities* is to put the best labor effort on those entities. (* functions or code files)
  6. Indicators of defects
     • Complexity of source code (Chidamber and Kemerer 1994)
     • Frequent code changes (Moser et al. 2008)
     • Previous defect information (Kim et al. 2007)
     • Code dependencies (Zimmermann 2007)
  7. Indeed, where do defects come from?
  8. Human Error! Programmers make mistakes; consequently, defects are injected and software fails. (Diagram: Human Errors → Bugs Injected → Software Fails)
  9. Programmer Interaction and Software Quality
  10. Programmer Interaction and Software Quality. “Errors are from cognitive breakdown while understanding and implementing requirements” (Ko et al. 2005)
  11. Programmer Interaction and Software Quality. “Errors are from cognitive breakdown while understanding and implementing requirements” (Ko et al. 2005). “Work interruptions or task switching may affect programmer productivity” (DeLine et al. 2006)
  12. Don’t we need to also consider developers’ interactions as defect indicators?
  13. …, but the existing indicators can NOT directly capture developers’ interactions
  14. Using Mylyn data, we propose novel “Micro Interaction Metrics (MIMs)” capturing developers’ interactions
  15. The Mylyn* data is stored as an attachment to the corresponding bug reports in XML format. (* Eclipse plug-in storing and recovering task contexts)
  16-18. <InteractionEvent … Kind=“ ” … StartDate=“ ” EndDate=“ ” … StructureHandle=“ ” … Interest=“ ” … >
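
  For illustration only, the sketch below shows how such an attached Mylyn trace could be read. The attribute names follow the slide (Kind, StartDate, EndDate, StructureHandle, Interest), while the timestamp format and file layout are assumptions, not taken from the paper.

      # A minimal sketch (not from the paper): reading Mylyn InteractionEvent
      # records from a trace attached to a bug report. Attribute names follow
      # the slide; the date format below is an assumption for illustration.
      import xml.etree.ElementTree as ET
      from datetime import datetime

      DATE_FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

      def load_interaction_events(path):
          """Yield (kind, start, end, handle, interest) tuples from a Mylyn trace file."""
          root = ET.parse(path).getroot()
          for ev in root.iter("InteractionEvent"):
              yield (
                  ev.get("Kind"),                                   # e.g. "selection" or "edit"
                  datetime.strptime(ev.get("StartDate"), DATE_FMT),
                  datetime.strptime(ev.get("EndDate"), DATE_FMT),
                  ev.get("StructureHandle"),                        # the program element touched
                  float(ev.get("Interest", "0")),                   # degree of interest (DOI)
              )
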
  19. Two levels of MIMs Design
  20. Two levels of MIMs Design. File-level MIMs: specific interactions for a file in a task (e.g., AvgTimeIntervalEditEdit)
  21. Two levels of MIMs Design. File-level MIMs: specific interactions for a file in a task (e.g., AvgTimeIntervalEditEdit). Task-level MIMs: property values shared over the whole task (e.g., TimeSpent)
  22. Two levels of MIMs Design. File-level MIMs: specific interactions for a file in a task (e.g., AvgTimeIntervalEditEdit). Example Mylyn task log: 10:30 Selection file A; 11:00 Edit file B; 12:30 Edit file B
  23. Two levels of MIMs Design. Task-level MIMs: property values shared over the whole task (e.g., TimeSpent). Example Mylyn task log: 10:30 Selection file A; 11:00 Edit file B; 12:30 Edit file B
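
  As a minimal sketch of the two metric levels named on these slides (my own illustration, assuming each task log is an ordered list of (timestamp, kind, file) events; the exact definitions in the paper may differ):

      # File-level vs. task-level MIMs, sketched from the slide examples.
      def avg_time_interval_edit_edit(events, target_file):
          """File-level MIM sketch: mean gap in seconds between consecutive Edit events on one file."""
          edits = [t for t, kind, f in events if kind == "edit" and f == target_file]
          gaps = [(b - a).total_seconds() for a, b in zip(edits, edits[1:])]
          return sum(gaps) / len(gaps) if gaps else 0.0

      def time_spent(events):
          """Task-level MIM sketch: elapsed time of the whole task, shared by all files in it."""
          times = [t for t, _, _ in events]
          return (max(times) - min(times)).total_seconds() if times else 0.0
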
  24-26. The Proposed Micro Interaction Metrics
  27. For example, NumPatternSXEY captures this interaction:
  28. For example, NumPatternSXEY captures this interaction: “How many times did a programmer Select a file of group X and then Edit a file of group Y in a task activity?”
  29. Group X or Y: X if a file shows defect locality* properties, Y otherwise. Group H or L: H if a file has a high** DOI value, L otherwise. (* hinted by the paper [Kim et al. 2007]; ** threshold: median of degree of interest (DOI) values in a task)
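
  Purely as an illustration of the NumPatternSXEY idea, the sketch below counts “Select a group-X file, then Edit a group-Y file” occurrences in one task; treating only immediately consecutive Select/Edit pairs as a match is my assumption, and the grouping function is left to the caller.

      # Sketch: count Select(X) followed by Edit(Y) within a task's ordered events.
      def num_pattern_select_x_edit_y(events, group_of):
          """events: ordered (kind, file) pairs; group_of: file -> 'X' or 'Y'."""
          count, prev = 0, None
          for kind, f in events:
              if (prev is not None and kind == "edit" and group_of(f) == "Y"
                      and prev[0] == "selection" and group_of(prev[1]) == "X"):
                  count += 1
              prev = (kind, f)
          return count
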
  30. Bug Prediction Process
  31-38. STEP1: Counting & Labeling Instances. (Timeline diagram: all the Mylyn task data collectable from Eclipse subprojects, Dec 2005 to Sep 2010; each task edits files such as f1.java, f2.java, f3.java; a time point P splits the timeline, and the period after P is the post-defect counting period.)
  39. STEP1: Counting & Labeling Instances. The number of counted post-defects (edited files only within bug-fixing tasks): f1.java = 1, f2.java = 1, f3.java = 2. Labeling rule for a file instance: “buggy” if # of post-defects > 0, “clean” if # of post-defects = 0.
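
  A minimal sketch of this labeling rule, assuming a mapping from file name to its post-defect count after time point P has already been built; the extra clean file in the example is hypothetical.

      # Label each file instance from its post-defect count (slide 39's rule).
      def label_instances(post_defects):
          """post_defects: dict mapping file name -> # of post-defects after time P."""
          return {f: ("buggy" if n > 0 else "clean") for f, n in post_defects.items()}

      # Example with the slide's counts plus a hypothetical clean file:
      # label_instances({"f1.java": 1, "f2.java": 1, "f3.java": 2, "other.java": 0})
      # -> {"f1.java": "buggy", "f2.java": "buggy", "f3.java": "buggy", "other.java": "clean"}
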
  40-49. STEP2: Extraction of MIMs. (Timeline diagram: the period before time point P is the metrics extraction period. MIMs are computed per edited file within each task, then averaged per file over all tasks that touched it, e.g. MIMf3.java = valueTask1, MIMf1.java = (valueTask2 + valueTask4) / 2, MIMf2.java = (valueTask3 + valueTask4) / 2.)
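
  The per-file aggregation shown on these slides could look like the sketch below (my illustration, assuming per-task metric values have already been computed).

      # Average a file's per-task MIM values over every task that touched it,
      # e.g. MIM_f1.java = (value_Task2 + value_Task4) / 2.
      from collections import defaultdict

      def aggregate_file_mims(per_task_values):
          """per_task_values: iterable of (task_id, file, value) triples."""
          sums, counts = defaultdict(float), defaultdict(int)
          for _task, f, value in per_task_values:
              sums[f] += value
              counts[f] += 1
          return {f: sums[f] / counts[f] for f in sums}

      # aggregate_file_mims([("Task2", "f1.java", 3.0), ("Task4", "f1.java", 5.0)])
      # -> {"f1.java": 4.0}
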
  50. The Understand JAVA tool was used for extracting 32 source Code Metrics (CMs)*. (* Chidamber and Kemerer, and OO metrics)
  51. The Understand JAVA tool was used for extracting 32 source Code Metrics (CMs)*. (Diagram: list of selected source code metrics, extracted from the last CVS revision before time point P.) (* Chidamber and Kemerer, and OO metrics)
  52. Fifteen History Metrics (HMs)* were collected from the corresponding CVS repository. (* Moser et al.)
  53. Fifteen History Metrics (HMs)* were collected from the corresponding CVS repository. (Diagram: list of history metrics (HMs), collected from the CVS revisions before time point P.) (* Moser et al.)
  54. STEP3: Creating a training corpus. For classification, each instance has: Instance Name, Extracted MIMs, Label; training produces a Classifier. For regression, each instance has: Instance Name, Extracted MIMs, # of post-defects; training produces a Regression model.
  55. STEP4: Building prediction models. Classification and regression modeling with different machine learning algorithms, using the WEKA* tool. (* an open-source data mining tool)
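
  The paper builds its models in WEKA; purely as an illustrative stand-in (not the authors' setup), a comparable classification run with scikit-learn might look like the sketch below, where the algorithm choice and data layout are my assumptions.

      # Illustrative stand-in for the WEKA modeling step (not the paper's code).
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      def train_defect_classifier(X, y):
          """X: rows of metric values (MIM/CM/HM); y: 'buggy'/'clean' labels."""
          clf = RandomForestClassifier(random_state=0)
          f1_scores = cross_val_score(clf, X, y, cv=10, scoring="f1_macro")  # 10-fold CV
          clf.fit(X, y)
          return clf, f1_scores.mean()
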
  56. STEP5: Prediction Evaluation. Classification measures.
  57. STEP5: Prediction Evaluation. Classification measures: Precision, i.e., how many instances are really buggy among the buggy-predicted outcomes?
  58. STEP5: Prediction Evaluation. Classification measures: Precision, i.e., how many instances are really buggy among the buggy-predicted outcomes? Recall, i.e., how many instances are correctly predicted as ‘buggy’ among the real buggy ones?
  59. STEP5: Prediction Evaluation. Regression measures: correlation coefficient (-1~1), mean absolute error (0~1), root square error (0~1).
  60. STEP5: Prediction Evaluation. Regression measures: correlation coefficient (-1~1) between # of real buggy instances and # of instances predicted as buggy, mean absolute error (0~1), root square error (0~1).
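
  Written out from their standard definitions (a sketch of the measures named above, not the slides' exact formulas):

      # Standard definitions of the measures named on slides 56-60 (sketch only).
      def precision_recall_f(tp, fp, fn):
          """tp/fp/fn counted for the 'buggy' class."""
          precision = tp / (tp + fp) if tp + fp else 0.0  # really buggy among buggy-predicted
          recall = tp / (tp + fn) if tp + fn else 0.0     # caught among the really buggy
          f_measure = (2 * precision * recall / (precision + recall)
                       if precision + recall else 0.0)
          return precision, recall, f_measure

      def mean_absolute_error(actual, predicted):
          """Regression measure: average absolute gap between real and predicted defect counts."""
          return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
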
  61. T-test with 100 runs of 10-fold cross validation. Reject H0* and accept H1* if p-value < 0.05 (at the 95% confidence level). (* H0: no difference in average performance; H1: different (better!))
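
  As an illustration of this significance test (the slide does not name the exact test variant, so the paired form here is my assumption), one could compare the per-run scores of two metric sets like this:

      # Sketch: is the MIM model significantly better at the 95% confidence level?
      from scipy import stats

      def mim_significantly_better(scores_mim, scores_baseline):
          """Paired t-test over matched cross-validation runs (assumed variant)."""
          t_stat, p_value = stats.ttest_rel(scores_mim, scores_baseline)
          return p_value < 0.05 and t_stat > 0  # reject H0 and MIM mean is higher
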
  62. Result Summary. MIMs improve prediction accuracy for (1) different Eclipse project subjects, (2) different machine learning algorithms, and (3) different model training periods.
  63. Prediction for different project subjects. (Table: file instances and % of defects.)
  64. Prediction for different project subjects. MIM: the proposed metrics; CM: source code metrics; HM: history metrics.
  65. Prediction for different project subjects. BASELINE: a dummy classifier that predicts in a purely random manner; e.g., for 12.5% buggy instances, Precision(B) = 12.5%, Recall(B) = 50%, F-measure(B) = 20%. MIM: the proposed metrics; CM: source code metrics; HM: history metrics.
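
  The baseline numbers on slide 65 follow directly from the standard F-measure formula:

      F = \frac{2PR}{P + R} = \frac{2 \times 0.125 \times 0.50}{0.125 + 0.50} = \frac{0.125}{0.625} = 0.20 = 20\%
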
  66. Prediction for different project subjects. MIM: the proposed metrics; CM: source code metrics; HM: history metrics.
  67. Prediction for different project subjects. T-test results (significant results are in bold, p-value < 0.05).
  68. Prediction with different algorithms.
  69. Prediction with different algorithms. T-test results (significant results are in bold, p-value < 0.05).
  70. Prediction in different training periods. (Timeline: model training period before time point P, model testing period after it, Dec 2005 to Sep 2010.)
  71. Prediction in different training periods. Training : testing splits of 50% : 50%, 70% : 30%, and 80% : 20% over the Dec 2005 to Sep 2010 timeline.
  72. Prediction in different training periods.
  73. Prediction in different training periods. T-test results (significant results are in bold, p-value < 0.05).
  74. The top 42 metrics (37%) are from MIMs, among 113 metrics in total (MIMs + CMs + HMs).
  75. Possible Insight. TOP 1: NumLowDOIEdit; TOP 2: NumPatternEXSX; TOP 3: TimeSpentOnEdit.
  76. Possible Insight. TOP 1: NumLowDOIEdit; TOP 2: NumPatternEXSX; TOP 3: TimeSpentOnEdit. Chances are that more defects are generated if a programmer (TOP 2) repeatedly edits and browses a file related to previous defects, (TOP 3) puts more weight on editing time, and especially (TOP 1) edits files that were accessed less frequently or less recently.
  77. Performance comparison with regression modeling for predicting # of post-defects.
  78. Predicting Post-Defect Numbers.
  79. Predicting Post-Defect Numbers. T-test results (significant results are in bold, p-value < 0.05).
  80. Threats to Validity
     • Systems examined might not be representative
     • Systems are all open source projects
     • Defect information might be biased
  81. Conclusion. Our findings exemplify that developers' interactions can affect software quality. Our proposed micro interaction metrics improve defect prediction accuracy significantly.
  82. We believe future defect prediction models will use more of developers' direct, micro-level interaction information. MIMs are a first step towards it.
  83. Thank you! Any questions?
     • Problem: Can developers' interaction information affect software quality (defects)?
     • Approach: We proposed novel micro interaction metrics (MIMs), overcoming the limits of the popular static metrics.
     • Result: MIMs significantly improve prediction accuracy compared to source code metrics (CMs) and history metrics (HMs).
  84. Backup Slides
  85. One possible ARGUMENT: some developers may not have used Mylyn to fix bugs.
  86. This creates a chance of error in counting post-defects and, as a result, biased labels (i.e., an incorrect % of buggy instances).
  87. We repeated the experiment using the same instances but with a different defect-counting heuristic, the CVS-log-based approach*. (* with the keywords “fix”, “bug”, or a bug report ID in change logs)
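
  A minimal sketch of that keyword heuristic (my illustration; the bug-ID pattern is an assumption, since the slide does not define it):

      # Count a CVS commit as a bug fix if its log message looks like one.
      import re

      BUG_ID = re.compile(r"\b\d{4,6}\b")  # assumed: bug report IDs are 4-6 digit numbers

      def is_bug_fix(log_message):
          msg = log_message.lower()
          return "fix" in msg or "bug" in msg or bool(BUG_ID.search(msg))
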
  88. Prediction with the CVS-log-based approach.
  89. Prediction with the CVS-log-based approach. T-test results (significant results are in bold, p-value < 0.05).
  90. The CVS-log-based approach reported more additional post-defects (a higher % of buggy-labeled instances).
  91. The CVS-log-based approach reported more additional post-defects (a higher % of buggy-labeled instances). MIMs failed to feature them due to the lack of the corresponding Mylyn data.
  92. Note that it is difficult to fully guarantee the quality of CVS change logs (e.g., no explicit bug ID, missing logs).
