ESEC/FSE presentation


- 1. Micro Interaction Metrics for Defect Prediction. Taek Lee, Jaechang Nam, Dongyun Han, Sunghun Kim, Hoh Peter In. FSE 2011, Hungary, Sep. 5-9
- 2. Outline • Research motivation • The existing metrics • The proposed metrics • Experiment results • Threats to validity • Conclusion
- 3. Defect Prediction? Why is it necessary?
- 4. Software quality assurance is inherently a resource constrained activity!
- 5. Predicting defect-prone software entities* lets us put the best labor effort on those entities (* functions or code files)
- 6. Indicators of defects • Complexity of source code (Chidamber and Kemerer 1994) • Frequent code changes (Moser et al. 2008) • Previous defect information (Kim et al. 2007) • Code dependencies (Zimmermann 2007)
- 7. Indeed, where do defects come from?
- 8. Human Error! Programmers make mistakes; consequently, defects are injected and software fails [diagram: Human Errors → Bugs Injected → Software Fails]
- 9. Programmer Interaction and Software Quality
- 10. Programmer Interaction and Software Quality “Errors are from cognitive breakdown while understanding and implementing requirements” - Ko et al. 2005
- 11. Programmer Interaction and Software Quality “Errors are from cognitive breakdown while understanding and implementing requirements” - Ko et al. 2005 “Work interruptions or task switching may affect programmer productivity” - DeLine et al. 2006
- 12. Don’t we need to also consider developers’ interactions as defect indicators?
- 13. …, but the existing indicators can NOT directly capture developers’ interactions
- 14. Using Mylyn data, we propose novel “Micro Interaction Metrics (MIMs)” capturing developers’ interactions
- 15. The Mylyn* data is stored in XML format as an attachment to the corresponding bug report (* an Eclipse plug-in that stores and recovers task contexts)
- 17. <InteractionEvent … Kind=“ ” … StartDate=“ ” EndDate=“ ” … StructureHandle=“ ” … Interest=“ ” … >
- 18. <InteractionEvent … Kind=“ ” … StartDate=“ ” EndDate=“ ” … StructureHandle=“ ” … Interest=“ ” … >
- 19. <InteractionEvent … Kind=“ ” … StartDate=“ ” EndDate=“ ” … StructureHandle=“ ” … Interest=“ ” … >
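To make the fragment above concrete, a filled-in event could look roughly like this; only the attribute names come from the slides, and every value below is invented for illustration:

```xml
<!-- Illustrative only: attribute values are invented -->
<InteractionEvent Kind="edit"
                  StartDate="2010-03-02 10:30:00.000 +0900"
                  EndDate="2010-03-02 10:31:15.000 +0900"
                  StructureHandle="=org.example.project/src&lt;org.example{Foo.java"
                  Interest="0.7"/>
```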
- 20. Two levels of MIMs Design
- 21. Two levels of MIMs Design File-level MIMs specific interactions for a file in a task (e.g., AvgTimeIntervalEditEdit)
- 22. Two levels of MIMs Design File-level MIMs specific interactions for a file in a task (e.g., AvgTimeIntervalEditEdit) Task-level MIMs property values shared over the whole task (e.g., TimeSpent)
- 23. Two levels of MIMs Design. File-level MIMs: specific interactions for a file in a task (e.g., AvgTimeIntervalEditEdit) [Mylyn Task Log example: 10:30 Selection file A, 11:00 Edit file B, 12:30 Edit file B]
- 24. Two levels of MIMs Design. Task-level MIMs: property values shared over the whole task (e.g., TimeSpent) [Mylyn Task Log example: 10:30 Selection file A, 11:00 Edit file B, 12:30 Edit file B]
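As an illustration of how a file-level MIM such as AvgTimeIntervalEditEdit could be computed from a Mylyn task log, here is a minimal sketch (not the authors' implementation; the Event shape and the "edit" kind string are assumptions):

```java
import java.util.List;

// A minimal sketch of a file-level MIM. The Event shape is assumed;
// real Mylyn events carry Kind, StartDate, EndDate, StructureHandle, etc.
record Event(String kind, String file, long timestampMillis) {}

class FileLevelMims {
    // Average time between consecutive Edit events on one file in one task.
    static double avgTimeIntervalEditEdit(List<Event> taskLog, String file) {
        long previousEdit = -1, intervalSum = 0;
        int intervals = 0;
        for (Event e : taskLog) {
            if (!"edit".equals(e.kind()) || !file.equals(e.file())) continue;
            if (previousEdit >= 0) {
                intervalSum += e.timestampMillis() - previousEdit;
                intervals++;
            }
            previousEdit = e.timestampMillis();
        }
        return intervals == 0 ? 0.0 : (double) intervalSum / intervals;
    }
}
```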
- 25. The Proposed Micro Interaction Metrics
- 26. The Proposed Micro Interaction Metrics
- 27. The Proposed Micro Interaction Metrics
- 28. For example, NumPatternSXEY is to capture this interaction:
- 29. For example, NumPatternSXEY is to capture this interaction: "How many times did a programmer Select a file of group X and then Edit a file of group Y in a task activity?"
- 30. Group X or Y: X if a file shows defect locality* properties, Y otherwise. Group H or L: H if a file has a high** DOI value, L otherwise (* hinted by the paper [Kim et al. 2007]; ** threshold: median of degree of interest (DOI) values in a task)
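A sketch of counting NumPatternSXEY within one task, reusing the Event record from the sketch above. Here the pattern is treated as an Edit of a group-Y file immediately following a Select of a group-X file, which is an assumption about the matching rule; the group memberships are passed in as predicates:

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch: count "Select a group-X file, then Edit a group-Y file"
// occurrences in a task's event stream (Event record as defined above).
class PatternCounter {
    static int numPatternSXEY(List<Event> taskLog,
                              Predicate<String> isGroupX,
                              Predicate<String> isGroupY) {
        int count = 0;
        boolean lastWasSelectOfX = false;
        for (Event e : taskLog) {
            if ("selection".equals(e.kind())) {
                lastWasSelectOfX = isGroupX.test(e.file());
            } else if ("edit".equals(e.kind())) {
                if (lastWasSelectOfX && isGroupY.test(e.file())) count++;
                lastWasSelectOfX = false;
            }
        }
        return count;
    }
}
```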
- 32.–39. STEP1: Counting & Labeling Instances [timeline figure, built up over slides 32–39: Mylyn tasks Task 1, Task 2, Task 3, …, Task i, Task i+1, Task i+2, Task i+3 touching f1.java, f2.java, and f3.java along a timeline from Dec 2005 to Sep 2010; all the Mylyn task data collectable from Eclipse subprojects (Dec 05–Sep 10); tasks after a split point P fall into the post-defect counting period]
- 40. STEP1: Counting & Labeling Instances. The number of counted post-defects (edited files only within bug-fixing tasks): f1.java = 1, f2.java = 1, f3.java = 2. Labeling rule for a file instance: "buggy" if # of post-defects > 0, "clean" if # of post-defects = 0 [timeline: post-defect counting period from time P to Sep 2010]
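The labeling rule above reduces to a small function; a sketch under the assumption that post-defect counts have already been accumulated per file:

```java
import java.util.Map;

// Sketch of the labeling rule: "buggy" if a file accumulated at least
// one post-defect (an edit inside a bug-fixing task after time P).
class InstanceLabeler {
    static String label(Map<String, Integer> postDefectCounts, String file) {
        return postDefectCounts.getOrDefault(file, 0) > 0 ? "buggy" : "clean";
    }
}
```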
- 41.–50. STEP2: Extraction of MIMs [timeline figure, built up over slides 41–50: metrics are extracted from tasks in the metrics extraction period (Dec 2005 up to time P); for each task, MIM values are computed per edited file, e.g., MIM_f3.java = valueTask1; when a file is edited in multiple tasks, the values are averaged, e.g., MIM_f1.java = (valueTask2 + valueTask4)/2 and MIM_f2.java = (valueTask3 + valueTask4)/2]
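The averaging step shown in the figure, e.g., MIM_f1.java = (valueTask2 + valueTask4)/2, can be sketched as follows (the per-task values are assumed to be collected already):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: a file's final MIM value is the mean of its per-task values
// over all tasks in the metrics extraction period that touched it.
class MimAggregator {
    static Map<String, Double> averagePerFile(Map<String, List<Double>> perTaskValues) {
        Map<String, Double> averaged = new HashMap<>();
        perTaskValues.forEach((file, values) -> averaged.put(file,
                values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0)));
        return averaged;
    }
}
```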
- 51. The Understand JAVA tool was used to extract 32 source Code Metrics (CMs)* (* Chidamber and Kemerer, and OO metrics)
- 52. The Understand JAVA tool was used to extract 32 source Code Metrics (CMs)* [table: list of selected source code metrics; extracted from the CVS last revision before time P] (* Chidamber and Kemerer, and OO metrics)
- 53. Fifteen History Metrics (HMs)* were collected from the corresponding CVS repository * Moser et al.
- 54. Fifteen History Metrics (HMs)* were collected from the corresponding CVS repository [table: list of history metrics (HMs); computed from CVS revisions up to time P] (* Moser et al.)
- 55. STEP3: Creating a training corpus [tables: for classification, each instance has a name, extracted MIMs, and a label, used to train a classifier; for regression, each instance has a name, extracted MIMs, and # of post-defects, used to train a regression model]
- 56. STEP4: Building prediction models Classification and Regression modeling with different machine learning algorithms using the WEKA* tool * an open source data mining tool
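As a sketch of what STEP4 could look like with WEKA's Java API (the corpus file name is hypothetical, and the slides do not name the specific algorithms, so Naive Bayes here is only a placeholder):

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: train and cross-validate a defect classifier on a metrics
// corpus. "mim_corpus.arff" is a hypothetical file whose instances hold
// MIM attribute values plus a buggy/clean class label.
public class DefectPrediction {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mim_corpus.arff");
        data.setClassIndex(data.numAttributes() - 1); // last attribute = label

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new NaiveBayes(), data, 10, new Random(1)); // 10-fold CV
        System.out.println(eval.toSummaryString());
        System.out.println("F-measure(buggy) = " + eval.fMeasure(0)); // assumes class 0 = buggy
    }
}
```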
- 57. STEP5: Prediction Evaluation Classification Measures
- 58. STEP5: Prediction Evaluation How many instances are really buggy among the buggy-predicted outcomes? Classification Measures
- 59. STEP5: Prediction Evaluation. Classification Measures: How many instances are really buggy among the buggy-predicted outcomes? How many instances are correctly predicted as 'buggy' among the real buggy ones?
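These are the standard precision and recall questions for the buggy class; with true positives (TP), false positives (FP), and false negatives (FN):

```latex
\mathrm{Precision}(B) = \frac{TP}{TP + FP}, \quad
\mathrm{Recall}(B) = \frac{TP}{TP + FN}, \quad
\mathrm{F\text{-}measure}(B) = \frac{2 \cdot \mathrm{Precision}(B) \cdot \mathrm{Recall}(B)}{\mathrm{Precision}(B) + \mathrm{Recall}(B)}
```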
- 60. STEP5: Prediction Evaluation. Regression Measures: correlation coefficient (-1~1), mean absolute error (0~1), root square error (0~1)
- 61. STEP5: Prediction Evaluation. Regression Measures (between # of real buggy instances and # of instances predicted as buggy): correlation coefficient (-1~1), mean absolute error (0~1), root square error (0~1)
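For actual post-defect counts $y_i$, predictions $\hat{y}_i$, and $n$ instances, the first two measures have the standard definitions below; the exact normalization behind "root square error" is not spelled out on the slide, so it is omitted here:

```latex
r = \frac{\sum_{i}(y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}
         {\sqrt{\sum_{i}(y_i - \bar{y})^2}\,\sqrt{\sum_{i}(\hat{y}_i - \bar{\hat{y}})^2}},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert
```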
- 62. T-test with 100 runs of 10-fold cross validation. Reject H0* and accept H1* if p-value < 0.05 (at the 95% confidence level) (* H0: no difference in average performance; H1: different (better!))
- 63. Result Summary: MIMs improve prediction accuracy for (1) different Eclipse project subjects, (2) different machine learning algorithms, (3) different model training periods
- 64. Prediction for different project subjects [table: file instances and % of defects]
- 65. Prediction for different project subjects MIM: the proposed metrics CM: source code metrics HM: history metrics
- 66. Prediction for different project subjects. BASELINE: a dummy classifier predicts in a purely random manner; e.g., for 12.5% buggy instances, Precision(B)=12.5%, Recall(B)=50%, F-measure(B)=20%. MIM: the proposed metrics, CM: source code metrics, HM: history metrics
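The baseline numbers follow directly: a coin-flip classifier catches half of the real buggy instances (recall 50%), its precision equals the 12.5% base rate, and therefore:

```latex
F(B) = \frac{2 \cdot 0.125 \cdot 0.5}{0.125 + 0.5} = \frac{0.125}{0.625} = 0.20 = 20\%
```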
- 67. Prediction for different project subjects MIM: the proposed metrics CM: source code metrics HM: history metrics
- 68. Prediction for different project subjects. T-test results (statistically significant results are in bold, p-value < 0.05)
- 69. Prediction with different algorithms
- 70. Prediction with different algorithms. T-test results (statistically significant results are in bold, p-value < 0.05)
- 71. Prediction in different training periods [timeline: model training period up to time P, model testing period afterwards, between Dec 2005 and Sep 2010]
- 72. Prediction in different training periods [training : testing period splits of 50% : 50%, 70% : 30%, and 80% : 20% over Dec 2005–Sep 2010]
- 73. Prediction in different training periods
- 74. Prediction in different training periods. T-test results (statistically significant results are in bold, p-value < 0.05)
- 76. Of the 113 total metrics (MIMs+CMs+HMs), the top 42 (37%) come from MIMs
- 77. Possible Insight TOP 1: NumLowDOIEdit TOP 2: NumPatternEXSX TOP 3: TimeSpentOnEdit
- 78. Possible Insight. TOP 1: NumLowDOIEdit, TOP 2: NumPatternEXSX, TOP 3: TimeSpentOnEdit. Chances are that more defects are generated if a programmer (TOP 2) repeatedly edits and browses a file, especially one related to previous defects, (TOP 3) spends a greater share of time on editing, and especially (TOP 1) edits files that have been accessed less frequently or less recently …
- 79. Performance comparison with regression modeling for predicting # of post-defects
- 81. Predicting Post-Defect Numbers. T-test results (statistically significant results are in bold, p-value < 0.05)
- 82. Threats to Validity • Systems examined might not be representative • Systems are all open source projects • Defect information might be biased
- 83. Conclusion: Our findings exemplify that developers' interactions can affect software quality. Our proposed micro interaction metrics significantly improve defect prediction accuracy …
- 84. We believe future defect prediction models will use more of developers' direct, micro-level interaction information. MIMs are a first step towards it.
- 85. Thank you! Any Questions? • Problem – Does developers' interaction information affect software quality (defects)? • Approach – We proposed novel micro interaction metrics (MIMs), overcoming the limitations of the popular static metrics • Result – MIMs significantly improve prediction accuracy compared to source code metrics (CMs) and history metrics (HMs)
- 86. Backup Slides
- 88. One possible ARGUMENT: Some developers may not have used Mylyn to fix bugs
- 89. This introduces a chance of error in counting post-defects and, as a result, biased labels (i.e., an incorrect % of buggy instances)
- 90. We repeated the experiment using the same instances but with a different defect-counting heuristic, a CVS-log-based approach* (* with the keywords "fix", "bug", and bug report IDs in change logs)
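A minimal sketch of such a keyword heuristic (the exact matching rules and the bug-ID format are assumptions; the slide only names the keywords):

```java
import java.util.regex.Pattern;

// Sketch: flag a CVS change-log message as a bug fix if it mentions
// "fix"/"fixes"/"fixed", "bug"/"bugs", or a bug report ID, here assumed
// to look like "#12345".
class FixLogMatcher {
    private static final Pattern FIX_KEYWORDS =
            Pattern.compile("\\b(fix(e[sd])?|bugs?)\\b|#\\d+", Pattern.CASE_INSENSITIVE);

    static boolean looksLikeBugFix(String logMessage) {
        return FIX_KEYWORDS.matcher(logMessage).find();
    }
}
```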
- 91. Prediction with CVS-log-based approach [table: CVS-log-based results]
- 92. Prediction with CVS-log-based approach. T-test results (statistically significant results are in bold, p-value < 0.05)
- 93. The CVS-log-based approach reported additional post-defects (a higher % of buggy-labeled instances)
- 94. The CVS-log-based approach reported additional post-defects (a higher % of buggy-labeled instances). MIMs failed to capture them due to the lack of corresponding Mylyn data
- 95. Note that it is difficult to fully guarantee the quality of CVS change logs (e.g., no explicit bug ID, missing logs)