ReLink: Recovering Links between Bugs and Changes (ESEC/FSE 2011)
1. Rongxin Wu, Hongyu Zhang, Sunghun Kim, Shing-Chi Cheung
Tsinghua University, China
The Hong Kong University of Science and Technology, Hong Kong
2. • The links between fixed bugs and committed
changes are important:
– for measuring software quality
– for constructing defect prediction models
[Figure: fixed bugs (Bugzilla) linked to committed changes (CVS/SVN)]
3. • To discover the links:
Mining software repositories!
• Heuristics traditionally used to collect links
between bugs and changes:
Searching for keywords (such as “Fixed” or
“Bug”) and Bug IDs
[Figure: software repositories that can be mined: Bugzilla, mailings, source code, CVS/SVN, execution traces, requirements, crash/developer logs, …]
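The traditional heuristic above can be sketched in a few lines of Python. This is only an illustration under assumptions: the two regular expressions and the `explicit_links` helper are hypothetical, and the slides do not specify the exact patterns used.

```python
import re

# Hypothetical patterns: a fix-related keyword, and any number that could be a bug ID.
KEYWORD_RE = re.compile(r"\b(fix(e[sd])?|bugs?)\b", re.IGNORECASE)
BUG_ID_RE = re.compile(r"#?(\d+)")


def explicit_links(change_log, known_bug_ids):
    """Return the known bug IDs that a change log explicitly mentions.

    A log only counts if it also contains a keyword such as "Fixed" or "Bug".
    """
    if not KEYWORD_RE.search(change_log):
        return set()
    candidates = {int(m.group(1)) for m in BUG_ID_RE.finditer(change_log)}
    return candidates & set(known_bug_ids)
```

A log such as "Fixed bug 123" yields a link to bug 123, while a log that mentions neither a keyword nor a known ID yields nothing; those silent cases are the missing links.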
7. • To recover the missing links, we studied many
bug reports (including comments) and change
logs
• We have identified the following features of links:
– Time interval: the bug-fix time and change committed
time are close
8. • Time interval between bug-fix time and
change committed time
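A minimal sketch of the time interval feature, assuming a symmetric window around the bug-fix time. The function name `within_time_interval` and the `max_gap` parameter are hypothetical; the slides only state that the two times are close.

```python
from datetime import datetime, timedelta


def within_time_interval(bug_fixed_at, change_committed_at, max_gap):
    """True when the commit time lies within max_gap of the bug-fix time."""
    return abs(bug_fixed_at - change_committed_at) <= max_gap
```

For example, with `max_gap=timedelta(days=1)`, a commit made three hours after the bug was marked fixed passes the check, while one made five days later does not.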
9. • Through empirical studies, we have identified
the following features of links:
– Bug owner and change committer: they are often
the same person, or have mapping relationships
10. • Mapping relationship between bug owner and change committer
Bug Owner                    Change Committer       Project
dswitkin@gmail.com           dswitkin               ZXing
dswitkin@google.com          dswitkin@google.com    ZXing
srowen@gmail.com             srowen                 ZXing
pelili0101@googlemail.com    peli0101               OpenIntents
Will Rowe                    Wrowe                  Apache
Erik Abele                   Erikabele              Apache
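The owner/committer feature could be checked roughly as follows. This is a sketch under assumptions: the `same_developer` helper and the dictionary-of-sets layout are hypothetical, and the sample entries are taken from the ZXing rows of the table.

```python
# Sample mappings copied from the table; keys and values are lower-cased.
KNOWN_MAPPINGS = {
    "dswitkin@gmail.com": {"dswitkin"},
    "dswitkin@google.com": {"dswitkin@google.com"},
    "srowen@gmail.com": {"srowen"},
}


def same_developer(bug_owner, committer, mappings):
    """True when owner and committer are identical or related by a known mapping."""
    owner = bug_owner.strip().lower()
    who = committer.strip().lower()
    return owner == who or who in mappings.get(owner, set())
```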
12. • Through empirical studies, we have identified
the following features of links:
– Text similarity: the textual descriptions in the bug
report are often similar to those in the change
logs.
13. • Text similarity: the texts are similar!
Using IR technology to measure similarity
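One common IR measure is cosine similarity over term vectors. The sketch below uses raw term counts as a simplification; the slides do not say which weighting or preprocessing is used, so `tokenize` and `cosine_similarity` are illustrative names only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(bug_text, change_log):
    """Cosine similarity between the term-count vectors of two texts."""
    va, vb = Counter(tokenize(bug_text)), Counter(tokenize(change_log))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A bug report and a change log are then treated as similar when the score reaches a chosen threshold.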
15. • To determine the criteria of features, we learn
from the explicit links that can be identified
through traditional heuristics:
– For the time interval feature and the text similarity
feature, we exhaustively search for the optimal
combination of these two values so that the
maximum F-measure can be achieved.
– For the mappings between bug owners and
change committers, we also learn them from the
explicit links.
16. • Determine the time interval and similarity thresholds:
search step by step for the optimal similarity
threshold and time interval values
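The step-by-step search can be pictured as a grid search that keeps the threshold pair with the highest F-measure. This is a sketch under assumptions: the data layout (`pairs` of similarity, time gap, and a label taken from the explicit links) and the grids are hypothetical, not the authors' implementation.

```python
from itertools import product


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def search_thresholds(pairs, sim_grid, gap_grid):
    """Try every (similarity, time gap) threshold pair and return the best one.

    pairs: iterable of (similarity, gap, is_link) tuples, where is_link says
    whether the bug/change pair is a known explicit link.
    Returns (best_f_measure, similarity_threshold, gap_threshold).
    """
    pairs = list(pairs)
    best = (0.0, None, None)
    for sim_t, gap_t in product(sim_grid, gap_grid):
        tp = fp = fn = 0
        for sim, gap, is_link in pairs:
            predicted = sim >= sim_t and gap <= gap_t
            if predicted and is_link:
                tp += 1
            elif predicted:
                fp += 1
            elif is_link:
                fn += 1
        score = f_measure(tp, fp, fn)
        if score > best[0]:
            best = (score, sim_t, gap_t)
    return best
```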
17. • Determine the mapping relationships between bug
owners and change committers:
find the possible mappings
from the explicit links
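Learning the mappings from explicit links might look like the sketch below. It assumes each explicit link has already been reduced to an (owner, committer) pair; the `learn_mappings` name and the lower-casing step are hypothetical choices.

```python
from collections import defaultdict


def learn_mappings(explicit_links):
    """Collect, for each bug owner, the committers seen on explicitly linked changes.

    explicit_links: iterable of (bug_owner, change_committer) string pairs.
    """
    mappings = defaultdict(set)
    for owner, committer in explicit_links:
        mappings[owner.strip().lower()].add(committer.strip().lower())
    return dict(mappings)
```

The resulting dictionary can then be consulted when a candidate link has an owner and a committer whose names differ.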
18. • To obtain the ground truth (“golden set” of links)
• For ZXing and OpenIntents, we manually identify the links
• For Apache, we use the data provided by Bird et al. (annotated
by an Apache core developer)
19. • Four possible outcomes
– A link we identify that is a true link → TP
– A link we identify that is not a true link → FP
– A true link that we miss → FN
– A non-link that we do not identify → TN
• Evaluation Metrics
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-measure = (2 * Precision * Recall) / (Precision + Recall)
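As a worked illustration of these metrics, the sketch below compares a set of identified links with a golden set. The `evaluate` function and the representation of a link as a (bug ID, change ID) tuple are assumptions made for the example.

```python
def evaluate(identified, golden):
    """Return (precision, recall, f_measure) for identified links against a golden set."""
    tp = len(identified & golden)   # identified and true
    fp = len(identified - golden)   # identified but not true
    fn = len(golden - identified)   # true but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

With three identified links of which two are in a golden set of three, TP = 2, FP = 1, and FN = 1, so precision, recall, and F-measure all equal 2/3.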
20. [Figure: Performance of ReLink in Apache Project; chart comparing ReLink and Traditional heuristics on precision, recall, and F-measure, axis from 0.65 to 0.9]
22. • What can we do with the recovered links?
– Improving Maintainability Measurement
The percentage of bug-fixing changes
The percentage of buggy files
Mean time to fix
– Constructing better software defect
prediction models
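Two of the maintainability measures can be sketched from a set of links. The definitions here are assumptions, since the slides give only the names: a change counts as bug-fixing if it appears in any link, and time to fix is taken as commit time minus bug-open time. All function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def pct_bug_fixing_changes(all_change_ids, links):
    """Percentage of changes that appear in at least one (bug_id, change_id) link."""
    changes = set(all_change_ids)
    if not changes:
        return 0.0
    fixing = {change_id for _, change_id in links}
    return 100.0 * len(fixing & changes) / len(changes)


def mean_time_to_fix(bug_opened_at, change_committed_at, links):
    """Average of (commit time - bug open time) over all links, or None if no links."""
    durations = [change_committed_at[c] - bug_opened_at[b] for b, c in links]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

Under these definitions, missing links lower the first measure and remove data points from the second, which is why recovering links changes the measurements.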
25. • Defect Prediction
ReLink can improve the performance of defect prediction!
26. • The quality of golden set of links can’t be
completely assured
• All the datasets are collected from open source
projects
• The approach needs to be verified in more
projects
27. • We propose ReLink to recover the missing
links
• The recovered links have a positive impact on
follow-up software maintenance studies,
including defect prediction and maintainability
measurement.
• Future work:
Further improving the performance of ReLink
Applying to more projects including industrial
projects
28. Thank you!
Dr Hongyu Zhang
School of Software, Tsinghua University
Beijing 100084, China
Email: hongyu@tsinghua.edu.cn
Web: http://sites.google.com/site/hongyujohn/