ReLink: Recovering Links between Bugs and Changes (ESEC/FSE 2011)
 


ESEC/FSE 2011 presentation


    Presentation Transcript

    • Rongxin Wu, Hongyu Zhang, Sunghun Kim, Shing-Chi Cheung
      Tsinghua University, China; The Hong Kong University of Science and Technology, Hong Kong
    • The links between fixed bugs and committed changes are important:
        – for measuring software quality
        – for constructing defect prediction models
      [Figure: fixed bugs (Bugzilla) linked to committed changes (CVS/SVN)]
    • To discover the links: mining software repositories!
      • Heuristics traditionally used to collect links between bugs and changes: searching change logs for keywords (such as "Fixed" or "Bug") and bug IDs
      [Figure: software repositories, e.g. Bugzilla, mailing lists, source code, CVS/SVN, execution traces, crash logs, requirements, …]
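The keyword heuristic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the exact rule used in the study: the keyword list, the regular expressions, and the names `heuristic_links`, `KEYWORDS` and `NUMBER` are hypothetical.

```python
import re

# Hypothetical keyword rule: "fix", "fixes", "fixed", "bug", "bugs" as whole words.
KEYWORDS = re.compile(r"\b(fix(e[sd])?|bugs?)\b", re.IGNORECASE)
NUMBER = re.compile(r"\d+")


def heuristic_links(change_logs, bug_ids):
    """Return (change_id, bug_id) pairs found by keyword + bug-ID matching.

    change_logs: dict mapping change_id -> commit message
    bug_ids: set of integer bug IDs known to the bug tracker
    """
    links = set()
    for change_id, message in change_logs.items():
        if not KEYWORDS.search(message):
            continue  # no fix keyword: the heuristic ignores this change
        for token in NUMBER.findall(message):
            if int(token) in bug_ids:  # the number must be a real bug ID
                links.add((change_id, int(token)))
    return links
```

For example, with logs `{"r1": "Fixed bug 232 in decoder", "r2": "solve problem 681", "r3": "Fic 239"}` and bug IDs `{232, 681, 239}`, only `("r1", 232)` is found; the other two changes become missing links.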
    • Defective
    • Missing links! (Bird et al., "Fair and Balanced? Bias in Bug-Fix Datasets", FSE 2009)
    • Missing bug references in change logs
      • Irregular bug reference formats: "issue 681", "bug 232", "Fixed for #239", "see #149", "solve problem 681"
        – Typos: "Fic 239"
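To see why these formats defeat a keyword-based rule, the sketch below applies a deliberately strict, hypothetical pattern (`STRICT_REF`, accepting only "bug <id>" or "fix/fixed <id>") to the change-log fragments quoted on the slide. The pattern and the function name `referenced_ids` are assumptions for illustration; real link-mining scripts vary.

```python
import re

# Hypothetical strict reference pattern: "bug <id>" or "fix/fixed <id>", optional "#".
STRICT_REF = re.compile(r"\b(?:bug|fix(?:ed)?)\s+#?(\d+)", re.IGNORECASE)

# Change-log fragments quoted on the slide.
EXAMPLES = [
    "issue 681",
    "bug 232",
    "Fixed for #239",
    "see #149",
    "solve problem 681",
    "Fic 239",
]


def referenced_ids(message):
    """Bug IDs that the strict pattern extracts from one change log."""
    return [int(x) for x in STRICT_REF.findall(message)]
```

Running `referenced_ids` over `EXAMPLES` recognizes only "bug 232"; the other five references would be silently dropped, which is exactly how such links go missing.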
    • To recover the missing links, we studied many bug reports (including comments) and change logs.
      • We have identified the following features of links:
        – Time interval: the bug-fix time and the change commit time are close.
    • [Figure: time interval between bug-fix time and change commit time]
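A minimal sketch of the time-interval criterion, assuming the interval is measured in either direction (a change may be committed shortly before or after the bug is marked fixed). The function name `within_interval` and the threshold values in the usage note are hypothetical; in ReLink the threshold is learned from explicit links.

```python
from datetime import datetime, timedelta


def within_interval(bug_fixed_at, change_committed_at, threshold):
    """Time-interval criterion: True if the change was committed within
    `threshold` (a timedelta) of the bug-fix time, in either direction."""
    return abs(change_committed_at - bug_fixed_at) <= threshold
```

For a bug fixed at `datetime(2010, 3, 1, 12, 0)` and a 6-hour threshold, a commit at 09:30 the same day passes (2.5 hours apart), while a commit three days later does not.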
    • Through empirical studies, we have identified the following features of links:
        – Bug owner and change committer: they are often the same person, or have a mapping relationship.
    • Mapping relationship between bug owner and change committer:
        Bug owner | Change committer | Project
        dswitkin@gmail.com | dswitkin | ZXing
        dswitkin@google.com | dswitkin@google.com | ZXing
        srowen@gmail.com | srowen | ZXing
        pelili0101@googlemail.com | peli0101 | OpenIntents
        Will Rowe | Wrowe | Apache
        Erik Abele | Erikabele | Apache
    • [Figure: bug owner and change committer]
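The owner/committer criterion can be sketched as a lookup: a pair passes if both identities are the same string or appear in a table of known mappings. Here the table is hard-coded from the example rows above purely for illustration (ReLink learns such mappings from explicit links); the names `KNOWN_MAPPINGS` and `owner_matches_committer` are hypothetical.

```python
# Owner -> committer pairs copied from the example table; illustrative only.
KNOWN_MAPPINGS = {
    ("dswitkin@gmail.com", "dswitkin"),
    ("dswitkin@google.com", "dswitkin@google.com"),
    ("srowen@gmail.com", "srowen"),
    ("pelili0101@googlemail.com", "peli0101"),
    ("Will Rowe", "Wrowe"),
    ("Erik Abele", "Erikabele"),
}


def owner_matches_committer(bug_owner, committer, mappings=KNOWN_MAPPINGS):
    """True if the bug owner and change committer are the same identity
    or have a known mapping relationship."""
    return bug_owner == committer or (bug_owner, committer) in mappings
```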
    • Through empirical studies, we have identified the following features of links:
        – Text similarity: the textual descriptions in a bug report are often similar to those in the change logs.
    • Text similarity: the texts are similar! We use information retrieval (IR) techniques to measure the similarity.
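As one concrete IR measure, the sketch below computes a plain bag-of-words cosine similarity over raw term frequencies. The slides only say that IR techniques are used, so ReLink's actual tokenization, term weighting (for example TF-IDF) and stemming may differ; the names `tokens` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased alphanumeric words; no stemming or stop-word removal."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between the term-frequency vectors of a and b."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0
```

For a bug summary "decoder crashes on empty image" and a change log "fix decoder crash on empty image", four of the terms are shared, giving a similarity of 4 / sqrt(30), about 0.73. Note that "crashes" and "crash" do not match without stemming.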
    • To determine the criteria for each feature, we learn from the explicit links that traditional heuristics can identify:
        – For the time-interval and text-similarity features, we exhaustively search for the combination of the two threshold values that achieves the maximum F-measure.
        – The mappings between bug owners and change committers are also learned from the explicit links.
    • Determining the time-interval and similarity thresholds: search step by step for the optimal similarity threshold and time-interval values.
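A sketch of the exhaustive threshold search, under two assumptions not spelled out on the slides: a candidate (bug, change) pair is predicted as a link when its time gap is at most `t` and its text similarity is at least `s`, and the explicit links serve as the reference set for the F-measure. The owner/committer criterion is left out for brevity; the names `f_measure` and `search_thresholds` and the candidate grids are hypothetical.

```python
from itertools import product


def f_measure(predicted, truth):
    """Harmonic mean of precision and recall of `predicted` against `truth`."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def search_thresholds(candidates, explicit_links, intervals, similarities):
    """Try every (interval, similarity) threshold pair; keep the best F-measure.

    candidates: dict (bug_id, change_id) -> (gap_in_hours, text_similarity)
    explicit_links: set of (bug_id, change_id) found by traditional heuristics
    Returns (best_f_measure, best_interval, best_similarity).
    """
    best = (-1.0, None, None)
    for t, s in product(intervals, similarities):
        predicted = {
            pair for pair, (gap, sim) in candidates.items() if gap <= t and sim >= s
        }
        score = f_measure(predicted, explicit_links)
        if score > best[0]:
            best = (score, t, s)
    return best
```

Because the grid is small (a handful of interval and similarity steps), brute force is simpler and more transparent than any smarter optimizer, and it guarantees the best pair on that grid is found.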
    • Determining the mapping relationship between bug owners and change committers: find the possible mappings from the explicit links.
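One simple way to find candidate mappings from explicit links is to count how often each (bug owner, change committer) pair co-occurs in those links and keep the frequent ones. This is a hedged sketch: the support threshold `min_support` and the function name `learn_mappings` are hypothetical, and the slides do not say whether ReLink applies any such filter.

```python
from collections import Counter


def learn_mappings(owner_committer_pairs, min_support=1):
    """Return owner -> committer pairs observed at least `min_support` times.

    owner_committer_pairs: iterable of (bug_owner, change_committer), one per
    explicit link already found by the traditional heuristics.
    """
    counts = Counter(owner_committer_pairs)
    return {pair for pair, n in counts.items() if n >= min_support}
```

With `min_support=2`, a pair such as `("srowen@gmail.com", "srowen")` seen in two explicit links is kept, while a pair seen only once is discarded as possible noise.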
    • Obtaining the ground truth (the "golden set" of links):
        – For ZXing and OpenIntents, we manually identify the links.
        – For Apache, we use the data provided by Bird et al. (annotated by an Apache core developer).
    • Four possible outcomes:
        – A link we identify is a true link → TP
        – A link we identify is not a true link → FP
        – A link we miss is a true link → FN
        – A link we miss is not a true link → TN
      • Evaluation metrics:
        $$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
    • [Figure: Performance of ReLink in the Apache project, comparing the precision, recall, and F-measure of ReLink with those of the traditional heuristics]
    • What can we do with the recovered links?
        – Improve maintainability measurement: the percentage of bug-fixing changes, the percentage of buggy files, and the mean time to fix
        – Construct better software defect prediction models
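The three maintainability measures can be computed directly from a set of links, which is why missing links distort them. The definitions below are assumptions, since the slides name the measures without defining them: a bug-fixing change is one linked to at least one bug, a buggy file is one touched by a bug-fixing change, and time to fix runs from the bug report to the earliest linked commit. The function name `maintainability_metrics` is hypothetical.

```python
from datetime import datetime, timedelta


def maintainability_metrics(changes, bug_reported, links, all_files):
    """Return (% bug-fixing changes, % buggy files, mean time to fix).

    changes: dict change_id -> (commit_time, set of files touched)
    bug_reported: dict bug_id -> report time
    links: set of (bug_id, change_id) pairs
    all_files: set of all files in the project
    """
    fixing = {c for _, c in links}
    pct_fixing = 100.0 * len(fixing) / len(changes)

    buggy = set()
    for c in fixing:
        buggy |= changes[c][1]  # files touched by a bug-fixing change
    pct_buggy = 100.0 * len(buggy) / len(all_files)

    # Assumed definition: report time -> earliest linked commit for each bug.
    first_fix = {}
    for b, c in links:
        t = changes[c][0]
        if b not in first_fix or t < first_fix[b]:
            first_fix[b] = t
    durations = [first_fix[b] - bug_reported[b] for b in first_fix]
    mttf = sum(durations, timedelta()) / len(durations) if durations else None
    return pct_fixing, pct_buggy, mttf
```

Dropping a true link from `links` lowers the first two percentages and can shift the mean time to fix, so the measures computed from heuristic links alone may misrepresent a project.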
    • Maintainability measurement
    • Defect prediction: ReLink can improve the performance of defect prediction!
    • The quality of the golden set of links cannot be completely assured.
      • All the datasets are collected from open source projects.
      • The approach needs to be verified on more projects.
    • We propose ReLink to recover the missing links.
      • The recovered links have a positive impact on follow-up software maintenance studies, including defect prediction and maintainability measurement.
      • Future work:
        – Further improving the performance of ReLink
        – Applying ReLink to more projects, including industrial projects
    • Thank you!
      Dr Hongyu Zhang, School of Software, Tsinghua University, Beijing 100084, China
      Email: hongyu@tsinghua.edu.cn
      Web: http://sites.google.com/site/hongyujohn/