ReLink: Recovering Links between Bugs and Changes (ESEC/FSE 2011)

ESEC/FSE 2011 presentation

  1. Rongxin Wu, Hongyu Zhang, Sunghun Kim, Shing-Chi Cheung. Tsinghua University, China; The Hong Kong University of Science and Technology, Hong Kong
  2. • The links between fixed bugs and committed changes are important: – for measuring software quality – for constructing defect prediction models. [Figure: fixed bugs (Bugzilla) linked to committed changes (CVS/SVN)]
  3. • To discover the links: mine software repositories! • Heuristics traditionally used to collect links between bugs and changes: search change logs for keywords (such as “Fixed” or “Bug”) and bug IDs. [Figure: software repositories such as Bugzilla, mailing lists, source code, CVS/SVN, execution traces, and requirements, …]
  4. Defective
  5. Missing links! (Bird et al., “Fair and Balanced? Bias in Bug-Fix Datasets”, FSE 2009)
  6. • Missing bug reference in the change log • Irregular bug reference formats: “issue 681”, “bug 232”, “Fixed for #239”, “see #149”, “solve problem 681”; typos: “Fic 239”
  7. • To recover the missing links, we studied many bug reports (including comments) and change logs. • We identified the following features of links: – Time interval: the bug-fix time and the change commit time are close.
  8. • Time interval between the bug-fix time and the change commit time
  9. • Through empirical studies, we identified the following features of links: – Bug owner and change committer: they are often the same person or have a mapping relationship.
  10. • Mapping relationship between bug owner and change committer:

      | Bug owner | Change committer | Project |
      |---|---|---|
      | dswitkin@gmail.com | dswitkin | ZXing |
      | dswitkin@google.com | dswitkin@google.com | ZXing |
      | srowen@gmail.com | srowen | ZXing |
      | pelili0101@googlemail.com | peli0101 | OpenIntents |
      | Will Rowe | Wrowe | Apache |
      | Erik Abele | Erikabele | Apache |

  11. Bug owner and change committer
  12. • Through empirical studies, we identified the following features of links: – Text similarity: the textual descriptions in bug reports are often similar to those in the change logs.
  13. • Text similarity: the texts are similar! Information retrieval (IR) techniques are used to measure the similarity.
  15. • To determine the criteria for each feature, we learn from the explicit links that traditional heuristics can identify: – For the time interval and text similarity features, we exhaustively search for the combination of the two threshold values that achieves the maximum F-measure. – The mappings between bug owners and change committers are likewise learned from the explicit links.
  16. • Determining the time interval and similarity thresholds: search step by step for the optimal similarity threshold and time interval values.
  17. • Determining the mapping relationship between bug owners and change committers: find the possible mappings from the explicit links.
  18. • To obtain the ground truth (the “golden set” of links): – for ZXing and OpenIntents, we manually identified the links; – for Apache, we used the data provided by Bird et al. (annotated by an Apache core developer).
  19. • Four possible outcomes: – a link we identify is a true link → TP; – a link we identify is not a true link → FP; – a link we miss is a true link → FN; – a link we miss is not a true link → TN. • Evaluation metrics:

      $$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

  20. [Figure: Performance of ReLink in the Apache project; precision, recall, and F-measure of ReLink vs. the traditional heuristics, y-axis from 0.65 to 0.9]
  22. • What can we do with the recovered links? – Improve maintainability measurement: the percentage of bug-fixing changes, the percentage of buggy files, and the mean time to fix. – Construct better software defect prediction models.
  23. • Maintainability measurement
  25. • Defect prediction: ReLink can improve the performance of defect prediction!
  26. • The quality of the golden set of links cannot be completely assured. • All the datasets are collected from open source projects. • The approach needs to be verified on more projects.
  27. • We propose ReLink to recover the missing links. • The recovered links have a positive impact on follow-up software maintenance studies, including defect prediction and maintainability measurement. • Future work: – further improving the performance of ReLink; – applying it to more projects, including industrial projects.
  28. Thank you! Dr Hongyu Zhang, School of Software, Tsinghua University, Beijing 100084, China. Email: hongyu@tsinghua.edu.cn; Web: http://sites.google.com/site/hongyujohn/
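
Slide 3 describes the traditional heuristic: scan change logs for keywords such as “Fixed” or “Bug” together with bug IDs. The Python sketch below is a hypothetical illustration of that idea, not the exact rule used by ReLink or by Bird et al.; the regular expressions and the function name `heuristic_link` are assumptions. Running it on the reference formats from slide 6 shows how links go missing.

```python
import re

# Assumed patterns: a "fix"/"bug" keyword plus a numeric bug ID (optionally "#"-prefixed).
KEYWORD = re.compile(r"\b(fix(e[sd])?|bugs?)\b", re.IGNORECASE)
BUG_ID = re.compile(r"#?(\d+)")

def heuristic_link(change_log: str, known_bug_ids: set[int]) -> list[int]:
    """Return known bug IDs referenced by a change log, using keyword + ID matching."""
    if not KEYWORD.search(change_log):
        return []  # no keyword, so the heuristic ignores the log entirely
    ids = {int(m) for m in BUG_ID.findall(change_log)}
    return sorted(ids & known_bug_ids)

for log in ["issue 681", "bug 232", "Fixed for #239", "see #149",
            "solve problem 681", "Fic 239"]:
    print(f"{log!r}: {heuristic_link(log, {149, 232, 239, 681})}")
```

Only “bug 232” and “Fixed for #239” produce a link; the other four messages yield an empty list, which is the kind of missing link ReLink aims to recover.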
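
A minimal sketch of the time interval feature from slides 7 and 8, assuming the bug-fix time and the commit time are available as `datetime` values and that the check is symmetric (the change may be committed shortly before or after the bug is marked fixed). `within_interval` and `max_interval` are hypothetical names; in ReLink the threshold itself is learned as described on slides 15 and 16.

```python
from datetime import datetime, timedelta

def within_interval(bug_fixed_at: datetime, committed_at: datetime,
                    max_interval: timedelta) -> bool:
    """Return True if the commit lies within max_interval of the bug-fix time.

    The absolute difference is used, so the commit may come before or after
    the bug is marked as fixed (an assumption of this sketch)."""
    return abs(bug_fixed_at - committed_at) <= max_interval

fixed = datetime(2010, 5, 1, 12, 0)
commit = datetime(2010, 5, 1, 10, 30)  # committed 90 minutes before the fix time
print(within_interval(fixed, commit, timedelta(hours=2)))  # True
print(within_interval(fixed, commit, timedelta(hours=1)))  # False
```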
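
Slides 9, 10, and 17 say the bug owner and change committer are often the same person or are related by a mapping learned from explicit links. The sketch below assumes each explicit link contributes one (owner, committer) pair and that identities are compared case-insensitively; it builds a mapping from the example rows on slide 10 and checks candidate pairs. `learn_mappings` and `owner_matches` are hypothetical names.

```python
from collections import defaultdict

def learn_mappings(explicit_links):
    """Map each bug owner to the set of committer IDs seen with it in explicit links."""
    mapping = defaultdict(set)
    for owner, committer in explicit_links:
        mapping[owner.lower()].add(committer.lower())
    return mapping

def owner_matches(owner, committer, mapping):
    """True if owner and committer are the same ID or a learned mapping exists."""
    owner, committer = owner.lower(), committer.lower()
    # dict.get does not insert a default entry, even on a defaultdict.
    return owner == committer or committer in mapping.get(owner, set())

# Owner/committer pairs taken from the table on slide 10.
links = [
    ("dswitkin@gmail.com", "dswitkin"),
    ("srowen@gmail.com", "srowen"),
    ("pelili0101@googlemail.com", "peli0101"),
    ("Will Rowe", "Wrowe"),
    ("Erik Abele", "Erikabele"),
]
mapping = learn_mappings(links)
print(owner_matches("Will Rowe", "wrowe", mapping))            # True
print(owner_matches("srowen@gmail.com", "dswitkin", mapping))  # False
```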
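
Slides 12 and 13 state only that IR techniques measure the similarity between a bug report and a change log. One common IR measure is cosine similarity over term-frequency vectors; the sketch below uses it as an assumed stand-in, with naive lowercase alphanumeric tokenization and no stemming, stop-word removal, or TF-IDF weighting, so it is not necessarily the model ReLink uses. The example texts are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity of the term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0  # empty text: define similarity as 0

# Hypothetical bug report summary and change log message.
report = "NPE when decoding QR code"
log = "Fix NPE in QR code decoder"
print(round(cosine_similarity(report, log), 3))  # 0.548
```

The two texts share three terms (“npe”, “qr”, “code”) out of five and six, giving 3/√30. Note that “decoding” and “decoder” do not match without stemming, which is one reason a real implementation would normalize terms more carefully.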
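
Slides 15 and 16 describe learning the time interval and similarity thresholds by exhaustively searching for the combination that maximizes F-measure against the explicit links. A minimal sketch, assuming each candidate bug-change pair has already been reduced to a (time gap in hours, similarity) tuple, that a pair is predicted as a link when both thresholds are met (the owner/committer feature is left out for brevity), and that the caller supplies the candidate grids. All names and data are hypothetical.

```python
from itertools import product

def f_measure(predicted, actual):
    """F-measure of a predicted link set against a reference link set."""
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)

def search_thresholds(candidates, explicit, intervals, similarities):
    """Try every (max_gap, min_sim) pair; return (best F, max_gap, min_sim).

    candidates: dict pair_id -> (gap_hours, similarity)
    explicit:   set of pair_ids found by the traditional heuristic."""
    best = (-1.0, None, None)
    for max_gap, min_sim in product(intervals, similarities):
        predicted = {pid for pid, (gap, sim) in candidates.items()
                     if gap <= max_gap and sim >= min_sim}
        score = f_measure(predicted, explicit)
        if score > best[0]:  # strict ">" keeps the first grid point on ties
            best = (score, max_gap, min_sim)
    return best

candidates = {"a": (1, 0.6), "b": (5, 0.4), "c": (30, 0.5), "d": (2, 0.1)}
explicit = {"a", "b"}
print(search_thresholds(candidates, explicit,
                        intervals=[1, 6, 24, 48],
                        similarities=[0.0, 0.2, 0.4, 0.6]))  # (1.0, 6, 0.2)
```

The grid search is exhaustive, so its cost is the product of the two grid sizes times the number of candidates; that is cheap for coarse step-by-step grids like the one described on slide 16.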
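
The metrics on slide 19 can be computed directly from sets of links. In this sketch, `identified` and `golden` are assumed to be sets of (bug ID, change ID) pairs, and the function name and data are hypothetical; TN is not needed for precision, recall, or F-measure.

```python
def evaluate(identified, golden):
    """Precision, recall, and F-measure of identified links against a golden set."""
    tp = len(identified & golden)  # identified and true
    fp = len(identified - golden)  # identified but not true
    fn = len(golden - identified)  # true but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

identified = {(101, "r1"), (102, "r2"), (103, "r3"), (104, "r9")}
golden = {(101, "r1"), (102, "r2"), (103, "r3"), (105, "r5"), (106, "r6")}
p, r, f = evaluate(identified, golden)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")  # precision=0.75 recall=0.60 F=0.67
```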
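
Slide 22 lists three maintainability measures that depend on which changes are linked to fixed bugs. The definitions below are assumptions for illustration: a change counts as bug-fixing if it is linked to a fixed bug, a file is buggy if any bug-fixing change touched it, and time to fix runs from bug report to fix. The `Change` class, the function names, and all data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Change:
    files: frozenset  # files touched by the change
    fixes_bug: bool   # True if the change is linked to a fixed bug

def pct_bug_fixing_changes(changes):
    """Percentage of changes that are linked to a fixed bug."""
    return 100.0 * sum(c.fixes_bug for c in changes) / len(changes)

def pct_buggy_files(changes, all_files):
    """Percentage of files touched by at least one bug-fixing change."""
    buggy = set().union(*(c.files for c in changes if c.fixes_bug))
    return 100.0 * len(buggy & all_files) / len(all_files)

def mean_time_to_fix(bugs):
    """Mean of (fixed_at - reported_at) over (reported_at, fixed_at) pairs."""
    total = sum((fixed - reported for reported, fixed in bugs), timedelta())
    return total / len(bugs)

changes = [
    Change(frozenset({"A.java"}), True),
    Change(frozenset({"A.java", "B.java"}), False),
    Change(frozenset({"C.java"}), True),
    Change(frozenset({"B.java"}), False),
]
all_files = {"A.java", "B.java", "C.java", "D.java"}
bugs = [(datetime(2011, 1, 1), datetime(2011, 1, 3)),
        (datetime(2011, 1, 1), datetime(2011, 1, 5))]
print(pct_bug_fixing_changes(changes))      # 50.0
print(pct_buggy_files(changes, all_files))  # 50.0
print(mean_time_to_fix(bugs))               # 3 days, 0:00:00
```

Because every measure is computed from the link set, a change whose link to its bug is missing is counted as a non-fixing change, which is how missing links can bias these measures.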
