ReLink: Recovering Links between Bugs and Changes (ESEC/FSE 2011)

Rongxin Wu, Hongyu Zhang, Sunghun Kim, Shing-Chi Cheung
Tsinghua University, China
The Hong Kong University of Science and Technology, Hong Kong
• The links between fixed bugs and committed
  changes are important:
  – for measuring software quality
  – for constructing defect prediction models

[Figure: fixed bugs (Bugzilla) linked to committed changes (CVS/SVN)]
• To discover the links:
        Mining software repositories!
• Heuristics traditionally used to collect links
  between bugs and changes:
  – Searching for keywords (such as “Fixed” or
    “Bug”) and bug IDs

[Figure: repositories that can be mined; labels: Bugzilla, Mailings, Source Code, CVS/SVN, Execution traces, Crash, Requirements, Developer, Logs, …]
Defective
Missing Links!

Bird et al., “Fair and Balanced? Bias in Bug-Fix Datasets”, FSE 2009
• Missing bug reference in change log
• Irregular bug reference formats
  – “issue 681”, “bug 232”, “Fixed for #239”, “see
    #149”, “solve problem 681”
  – Typos: “Fic 239”
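A minimal sketch of the traditional keyword-and-ID heuristic, run against the reference formats listed above. The regular expression and the name `find_bug_ids` are illustrative assumptions, not taken from ReLink or from any specific mining tool.

```python
import re
from typing import List

# Hypothetical pattern: a keyword ("fix", "fixed", "fixes", "bug", "issue")
# followed, optionally after "for" and "#", by a numeric bug ID.
BUG_REF = re.compile(
    r"\b(?:fix(?:ed|es)?|bugs?|issue)\b\s*(?:for\s*)?#?\s*(\d+)",
    re.IGNORECASE,
)

def find_bug_ids(change_log: str) -> List[int]:
    """Return the bug IDs that the keyword heuristic finds in a change log."""
    return [int(m) for m in BUG_REF.findall(change_log)]

logs = ["bug 232", "Fixed for #239", "issue 681",
        "see #149", "solve problem 681", "Fic 239"]
matched = {log: find_bug_ids(log) for log in logs}

for log, ids in matched.items():
    print(f"{log!r}: {ids}")
```

With this particular pattern, “see #149”, “solve problem 681” and the typo “Fic 239” yield no bug ID, which is how a link goes missing.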
• To recover the missing links, we studied many
  bug reports (including comments) and change
  logs
• We have identified the following features of links:
  – Time interval: the bug-fix time and the change commit
    time are close
• Time interval between bug-fix time and
  change commit time
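One way to read the time-interval feature, as a sketch: a change is a candidate for a bug when it was committed within some window around the bug-fix time. The window size, the symmetric comparison and the dates are assumptions for illustration; the slides only state that the two times are close.

```python
from datetime import datetime, timedelta

def within_interval(bug_fixed_at: datetime, committed_at: datetime,
                    max_gap: timedelta) -> bool:
    """True when the commit time lies within max_gap of the bug-fix time."""
    return abs(bug_fixed_at - committed_at) <= max_gap

# Made-up timestamps and a hypothetical one-day threshold.
fixed = datetime(2010, 5, 3, 14, 0)
close_commit = datetime(2010, 5, 3, 13, 40)
late_commit = datetime(2010, 5, 20, 9, 0)
window = timedelta(days=1)

print(within_interval(fixed, close_commit, window))
print(within_interval(fixed, late_commit, window))
```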
• Through empirical studies, we have identified
  the following features of links:
  – Bug owner and change committer: they are often
    the same person, or their identities can be mapped
    to each other
• Bug owner and change committer: mapping relationship

  Bug Owner                    Change Committer       Project
  dswitkin@gmail.com           dswitkin               ZXing
  dswitkin@google.com          dswitkin@google.com    ZXing
  srowen@gmail.com             srowen                 ZXing
  pelili0101@googlemail.com    peli0101               OpenIntents
  Will Rowe                    Wrowe                  Apache
  Erik Abele                   Erikabele              Apache
Bug owner and change committer
• Through empirical studies, we have identified
  the following features of links:
  – Text similarity: the textual descriptions in the bug
    report are often similar to those in the change
    logs.
• Text similarity: the texts are similar!
  – Using information retrieval (IR) techniques to
    measure similarity
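The slides do not say which IR measure is used. As an assumed stand-in, the sketch below uses cosine similarity over plain term-frequency vectors, a common choice for comparing short texts. The example texts are made up.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Made-up bug report summary and change logs.
bug_report = "Decoder crashes on rotated QR code image"
change_log = "Handle rotated QR code image in decoder"
unrelated = "Update copyright year in license header"

print(cosine_similarity(bug_report, change_log))
print(cosine_similarity(bug_report, unrelated))
```

A candidate link would then be kept when the score reaches a similarity threshold.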
• To determine the criteria for each feature, we learn
  from the explicit links that can be identified
  through traditional heuristics:
  – For the time interval feature and the text similarity
    feature, we exhaustively search for the combination
    of the two threshold values that achieves the
    maximum F-measure.
  – For the mappings between bug owners and
    change committers, we also learn them from the
    explicit links.
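The exhaustive search can be sketched as a grid search over the two thresholds. The grids, the tuple layout of `pairs` and the rule "link if the gap is small enough and the similarity is high enough" are assumptions made for illustration, and the training pairs are made up.

```python
from itertools import product

def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-measure from outcome counts; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def search_thresholds(pairs, gap_grid, sim_grid):
    """Try every (max_gap, min_sim) combination and keep the best F-measure.

    pairs: list of (time_gap_hours, similarity, is_true_link) tuples.
    """
    best = (0.0, None, None)
    for max_gap, min_sim in product(gap_grid, sim_grid):
        tp = fp = fn = 0
        for gap, sim, is_link in pairs:
            predicted = gap <= max_gap and sim >= min_sim
            if predicted and is_link:
                tp += 1
            elif predicted:
                fp += 1
            elif is_link:
                fn += 1
        score = f_measure(tp, fp, fn)
        if score > best[0]:  # strict ">" keeps the first best combination
            best = (score, max_gap, min_sim)
    return best

# Made-up training pairs standing in for explicit links and non-links.
pairs = [(1, 0.8, True), (3, 0.6, True), (20, 0.7, True),
         (2, 0.1, False), (50, 0.9, False)]
best_f, best_gap, best_sim = search_thresholds(
    pairs, gap_grid=[4, 24, 72], sim_grid=[0.2, 0.5, 0.75])
print(best_f, best_gap, best_sim)
```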
• Determine time interval and similarity threshold
  – Search step by step for the optimal similarity
    threshold and time interval values
• Determine mapping relationship between bug
  owners and change committers
  – Find the possible mappings from the explicit links
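A sketch of how owner-to-committer mappings might be collected from explicit links: every owner/committer pair seen together on an explicit link becomes a possible mapping. The `min_support` cut-off and both function names are hypothetical; the example pairs reuse identities from the mapping table.

```python
from collections import Counter, defaultdict

def learn_mappings(explicit_links, min_support=1):
    """Map each bug owner to the committers seen with them in explicit links.

    explicit_links: iterable of (bug_owner, change_committer) pairs.
    min_support: hypothetical cut-off on how often a pair must occur.
    """
    counts = Counter(explicit_links)
    mapping = defaultdict(set)
    for (owner, committer), n in counts.items():
        if n >= min_support:
            mapping[owner].add(committer)
    return dict(mapping)

def owner_matches(owner, committer, mapping):
    """True if the identities are identical or related by a learned mapping."""
    return owner == committer or committer in mapping.get(owner, set())

links = [
    ("dswitkin@gmail.com", "dswitkin"),
    ("dswitkin@gmail.com", "dswitkin"),
    ("srowen@gmail.com", "srowen"),
    ("Will Rowe", "Wrowe"),
]
mapping = learn_mappings(links)
print(owner_matches("Will Rowe", "Wrowe", mapping))
print(owner_matches("Will Rowe", "srowen", mapping))
```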
• To obtain the ground truth (“golden set” of links)
  – For ZXing and OpenIntents, we manually identified the links
  – For Apache, we use the data provided by Bird et al. (annotated
    by an Apache core developer)
• Four possible outcomes
  –   A link we identify is a true link → TP
  –   A link we identify is not a true link → FP
  –   A link we miss is a true link → FN
  –   A link we miss is not a true link → TN
• Evaluation Metrics

  $\text{Precision} = \frac{TP}{TP + FP}$,   $\text{Recall} = \frac{TP}{TP + FN}$

  $\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
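The metrics can be computed directly from the set of identified links and the golden set. This is a small worked sketch; the `(bug_id, revision)` pairs are made up.

```python
def evaluate(identified: set, golden: set):
    """Precision, recall and F-measure of identified links against a golden set."""
    tp = len(identified & golden)   # identified and true
    fp = len(identified - golden)   # identified but not true
    fn = len(golden - identified)   # true but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f

# Made-up links as (bug_id, revision) pairs.
golden = {(232, "r10"), (239, "r11"), (681, "r15"), (149, "r20")}
identified = {(232, "r10"), (239, "r11"), (681, "r99")}
precision, recall, f = evaluate(identified, golden)
print(precision, recall, f)
```

Here TP = 2, FP = 1 and FN = 2, so precision is 2/3, recall is 1/2 and the F-measure is 4/7.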
[Figure: Performance of ReLink in Apache Project. Bar chart comparing ReLink and Traditional on Precision, Recall and F-measure; axis from 0.65 to 0.9]
• What can we do with the recovered links?
  – Improving maintainability measurement:
      • The percentage of bug-fixing changes
      • The percentage of buggy files
      • Mean time to fix
  – Constructing better software defect
    prediction models
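To illustrate why missing links skew such measurements, the sketch below computes the first two measures from a set of links. The definitions are assumptions: a change counts as bug-fixing when some link points to it, and a file counts as buggy when a bug-fixing change touches it. All change IDs, file names and links are made up.

```python
def maintainability_measures(changes, links):
    """Percentage of bug-fixing changes and percentage of buggy files.

    changes: dict mapping change ID -> set of files the change touches.
    links: set of (bug_id, change_id) pairs.
    """
    fixing = {change_id for _, change_id in links if change_id in changes}
    all_files = set().union(*changes.values()) if changes else set()
    buggy_files = set().union(*(changes[c] for c in fixing)) if fixing else set()
    pct_fixing = 100.0 * len(fixing) / len(changes) if changes else 0.0
    pct_buggy = 100.0 * len(buggy_files) / len(all_files) if all_files else 0.0
    return pct_fixing, pct_buggy

changes = {
    "r1": {"A.java"},
    "r2": {"B.java", "C.java"},
    "r3": {"A.java", "D.java"},
    "r4": {"E.java"},
}
traditional = {(232, "r1")}               # links found by keyword search
recovered = {(232, "r1"), (239, "r3")}    # plus one recovered link

print(maintainability_measures(changes, traditional))
print(maintainability_measures(changes, recovered))
```

With the extra link, both percentages double in this toy data set, so measurements based only on explicit links would understate them.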
• Maintainability Measurement
• Defect Prediction




  ReLink can improve the performance of defect prediction!
• The quality of the golden set of links cannot be
  completely assured

• All the datasets are collected from open source
  projects

• The approach needs to be verified on more
  projects
• We propose ReLink to recover the missing
  links
• The recovered links have a positive impact on
  follow-up software maintenance studies,
  including defect prediction and maintainability
  measurement.
• Future work:
  – Further improving the performance of ReLink
  – Applying it to more projects, including industrial
    projects
Thank you!

Dr Hongyu Zhang
School of Software, Tsinghua University
Beijing 100084, China
Email: hongyu@tsinghua.edu.cn
Web: http://sites.google.com/site/hongyujohn/

