
iFixR: Bug Report Driven Program Repair

Issue tracking systems are commonly used in modern software development for collecting feedback from users and developers. An ultimate automation target of software maintenance is then the systematization of patch generation for user-reported bugs. Although this ambition is aligned with the momentum of automated program repair, the literature has, so far, mostly focused on generate-and-validate setups where fault localization and patch generation are driven by a well-defined test suite. On the one hand, however, the common (yet strong) assumption on the existence of relevant test cases does not hold in practice for most development settings: many bugs are reported without the available test suite being able to reveal them. On the other hand, for many projects, the number of bug reports generally outstrips the resources available to triage them. Towards increasing the adoption of patch generation tools by practitioners, we investigate a new repair pipeline, iFixR, driven by bug reports: (1) bug reports are fed to an IR-based fault localizer; (2) patches are generated from fix patterns and validated via regression testing; (3) a prioritized list of generated patches is proposed to developers. We evaluate iFixR on the Defects4J dataset, which we enriched (i.e., faults are linked to bug reports) and carefully reorganized (i.e., the timeline of test cases is naturally split). iFixR generates genuine/plausible patches for 21/44 Defects4J faults with its IR-based fault localizer. iFixR accurately places a genuine/plausible patch among its top-5 recommendations for 8/13 of these faults (without using future test cases in the generate-and-validate process).



  1. iFixR: Bug Report driven Program Repair. Anil Koyuncu, Kui Liu, Tegawendé F. Bissyandé, Dongsun Kim, Martin Monperrus, Jacques Klein, Yves Le Traon. SnT, University of Luxembourg, Luxembourg. 2019-08-28, Wednesday @FSE 2019.
  2. Typical generate-and-validate pipeline. [Diagram: Buggy Program + Test Suite -> Fault Localization -> Suspicious Code Locations -> Patch Generation -> Patch Candidates -> Patch Validation (testing: pass/fail).]
  3. Assumption of a test suite. [Same pipeline diagram, with the test cases highlighted.] In Defects4J, 92% of bugs do not have a failing test case when the bug report is submitted.
  4. What practical repair pipeline can account for a limited test suite? [Pipeline diagram with the test-suite-driven fault localization and patch validation steps replaced by question marks.]
  5. What practical repair pipeline can account for a limited test suite? [Pipeline diagram: fault localization now driven by the Bug Report; patch validation still marked with a question mark.]
  6. What practical repair pipeline can account for a limited test suite? [Pipeline diagram: fault localization driven by the Bug Report; patch validation driven by Regression Tests.]
  7. Approach
  8. iFixR: Bug Report driven Program Repair. [Workflow diagram: Bug Report -> IR-based fault localization -> Suspicious Code Locations (AST code elements) -> fix pattern matching (select fix pattern, mutate suspicious code) -> Patch Candidates -> Regression Testing -> Manual Validation by the Developer.] A relevant test case reproducing the bug may not be readily available when a bug report is submitted to the issue tracking system.
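As a reading aid, here is a minimal compilable sketch of this four-step flow; the interface and method names are our own placeholders, not the actual iFixR API.

```java
import java.util.List;

// Illustrative placeholder interfaces for the four iFixR steps (our naming).
interface FaultLocalizer { List<String> localize(String bugReport); }               // (1) IR-based FL
interface PatchGenerator { List<String> generate(List<String> statements); }        // (2) fix patterns
interface PatchValidator { List<String> keepRegressionSafe(List<String> patches); } // (3) regression tests
interface PatchRanker { List<String> rank(List<String> patches); }                  // (4) prioritization

final class IFixRWorkflowSketch {
    // No failing test case is required: localization starts from the bug report,
    // and validation relies only on the existing regression test suite.
    static List<String> repair(String bugReport, FaultLocalizer localizer,
                               PatchGenerator generator, PatchValidator validator,
                               PatchRanker ranker) {
        List<String> suspiciousStatements = localizer.localize(bugReport);
        List<String> candidates = generator.generate(suspiciousStatements);
        List<String> regressionSafe = validator.keepRegressionSafe(candidates);
        return ranker.rank(regressionSafe); // prioritized list shown to the developer
    }
}
```

The point of the sketch is the contrast with the classical loop on slide 2: the test suite appears only in step (3), and only as a regression filter.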
  9. Where is the bug? [iFixR workflow diagram, with the fault localization step highlighted.] Information Retrieval based Fault Localization (IRFL).
  10. Information Retrieval based Fault Localization
  11. Statement-level IR-based fault localization. [Diagram: Top-K suspicious code files -> statement-level fault localizer -> statement suspiciousness scores.]
  12. How should we change the code? [iFixR workflow diagram, with the fix pattern matching step highlighted.] Fix pattern-based selection.
  13. Fix pattern taxonomy: FP1. Insert Cast Checker. FP2. Insert Null Pointer Checker. FP3. Insert Range Checker. FP4. Insert Missed Statement. FP5. Mutate Class Instance Creation. FP6. Mutate Conditional Expression. FP7. Mutate Data Type. FP8. Mutate Integer Division Operation. FP9. Mutate Literal Expression. FP10. Mutate Method Invocation Expression. FP11. Mutate Operator. FP12. Mutate Return Statement. FP13. Mutate Variable. FP14. Move Statement. FP15. Remove Statement.
  14. Fix pattern selection (see TBar@ISSTA 2019). Inputs: a ranked list of suspicious statements and a fix pattern database. Example, buggy code of Closure-13 in the Defects4J dataset: if(options.dependencyOptions.needsManagement() && !options.skipAllPasses && options.closurePass) {. Its AST (IfStatement -> InfixExpression -> MethodInvocation ... Operator ... PrefixExpression ... Operator ... QualifiedName ...) is traversed with a breadth-first search and matched against pattern contexts; here it selects FP6 (Mutate Conditional Expression), with variants FP6.1 (- ...condExp1... / + ...condExp2...), FP6.2 (- ...condExp1 Op condExp2... / + ...condExp1...), and FP6.3 (- ...condExp1... / + ...condExp1 Op condExp2...).
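To illustrate the breadth-first matching described on this slide, here is a simplified, self-contained sketch; the AST node and pattern representations are our own stand-ins for iFixR's real data structures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class AstNode {
    final String kind;                        // e.g. "IfStatement", "InfixExpression"
    final List<AstNode> children = new ArrayList<>();
    AstNode(String kind) { this.kind = kind; }
}

class FixPattern {
    final String name;        // e.g. "FP6. Mutate Conditional Expression"
    final String contextKind; // AST node kind the pattern applies to
    FixPattern(String name, String contextKind) { this.name = name; this.contextKind = contextKind; }
}

class PatternSelector {
    // Traverse the suspicious statement's AST breadth-first and collect every
    // pattern whose context matches a visited node.
    static List<FixPattern> select(AstNode statement, List<FixPattern> database) {
        List<FixPattern> matched = new ArrayList<>();
        Queue<AstNode> queue = new ArrayDeque<>();
        queue.add(statement);
        while (!queue.isEmpty()) {
            AstNode node = queue.poll();
            for (FixPattern p : database) {
                if (p.contextKind.equals(node.kind)) matched.add(p);
            }
            queue.addAll(node.children);
        }
        return matched;
    }
}
```

For the Closure-13 statement above, whose root node is an IfStatement, a pattern registered with context kind "IfStatement" (FP6) would match at the root.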
  15. How should we generate the patch? [iFixR workflow diagram, with the patch generation step highlighted.] Fix pattern-based patch generation.
  16. Patch generation. Buggy code of Closure-13 in the Defects4J dataset: if(options.dependencyOptions.needsManagement() && !options.skipAllPasses && options.closurePass) {. Applying the selected fix pattern FP6.2 (- ...condExp1 Op condExp2... / + ...condExp1...) mutates the suspicious code into three patch candidates: (I) if(!options.skipAllPasses && options.closurePass) {; (II) if (options.dependencyOptions.needsManagement() && options.closurePass) {; (III) if (options.dependencyOptions.needsManagement() && !options.skipAllPasses) {.
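All three candidates come from dropping one conjunct of the buggy condition, which is what FP6.2 prescribes. Below is a small self-contained sketch of such a generator; it is a simplification that mutates a flat list of conjuncts rather than the AST, and the naming is ours.

```java
import java.util.ArrayList;
import java.util.List;

class Fp62Generator {
    // Each candidate removes one conjunct from "c1 && c2 && ... && cn".
    static List<String> candidates(List<String> conjuncts) {
        List<String> patches = new ArrayList<>();
        for (int drop = 0; drop < conjuncts.size(); drop++) {
            List<String> kept = new ArrayList<>(conjuncts);
            kept.remove(drop); // removal by index: drop the d-th conjunct
            patches.add("if (" + String.join(" && ", kept) + ") {");
        }
        return patches;
    }

    public static void main(String[] args) {
        List<String> buggy = List.of(
            "options.dependencyOptions.needsManagement()",
            "!options.skipAllPasses",
            "options.closurePass");
        // Prints three candidates, mirroring patch candidates I-III on the slide.
        candidates(buggy).forEach(System.out::println);
    }
}
```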
  17. Which patches are valid? [iFixR workflow diagram, with the patch validation step highlighted.] Available test cases are used to validate patch candidates; but note that the available test cases may not even trigger the bug.
  18. iFixR: patch validation. Regression tests validate the patch candidates, and a patch ordering strategy recommends patches with priority. Heuristics to re-prioritize the patch candidates: 1. minimal changes; 2. higher fault localization score; 3. affected code elements.
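One plausible way to encode these heuristics is a lexicographic comparator, sketched below with illustrative field names; the exact combination and tie-breaking order used by iFixR may differ.

```java
import java.util.Comparator;

class PatchCandidate {
    final int astEditSize;              // # of deleted + inserted AST nodes (heuristic 1)
    final double suspiciousness;        // IRFL score of the mutated statement (heuristic 2)
    final boolean relevantChangeAction; // change kind considered relevant to bug fixing (heuristic 3)

    PatchCandidate(int astEditSize, double suspiciousness, boolean relevantChangeAction) {
        this.astEditSize = astEditSize;
        this.suspiciousness = suspiciousness;
        this.relevantChangeAction = relevantChangeAction;
    }

    static final Comparator<PatchCandidate> RECOMMENDATION_ORDER =
        Comparator.comparingInt((PatchCandidate p) -> p.astEditSize)   // smaller edits first
                  .thenComparing(Comparator.comparingDouble(
                      (PatchCandidate p) -> p.suspiciousness).reversed()) // higher FL score breaks ties
                  .thenComparing(p -> !p.relevantChangeAction);        // under-prioritize irrelevant change kinds
}
```

Usage: given a list of regression-safe candidates, candidates.sort(PatchCandidate.RECOMMENDATION_ORDER) yields the list presented to the developer, best candidate first.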
  19. Research questions. RQ1 [Fault localization]: To what extent does IR-based fault localization provide reliable results for an APR scenario? RQ2 [Overfitting]: To what extent does IR-based fault localization point to locations that are less subject to overfitting? RQ3 [Patch ordering]: What is the effectiveness of iFixR's patch ordering strategy?
  20. RQ1: fault localization performance. Spectrum-based fault localization (SBFL) vs Information Retrieval based fault localization (IRFL): the IR-based fault localization in iFixR is as accurate as spectrum-based fault localization.
  21. RQ2: repairability. Impact of IRFL vs SBFL on the number of generated correct/plausible patches: IRFL and SBFL localization information lead to similar repair performance in terms of the number of bugs fixed plausibly/correctly.
  22. RQ2: overfitting. Dissection of why patches are plausible: IR-based fault localization leads to overfitted patches less often than the code locations suggested by spectrum-based fault localization.
  23. RQ3: patch ordering. Patch re-prioritization recommends more plausible, and even correct, patches.
  24. RQ3: patch recommendation. Scenarios: [iFixRtop5] only the top-5 recommended patches, validated only with regression tests; [iFixRall] all generated patches, validated only with regression tests; [iFixRopt] all generated patches, validated with augmented test suites.
  25. Take-aways: test cases can be incomplete/buggy; bug reports deserve more interest; better approaches to selecting fix ingredients are necessary; prioritization techniques must be investigated; more comprehensive benchmarks are needed.
  26. Conclusion: feasibility of automating patch generation from bug reports; no assumptions on the availability of future test cases; a patch ordering strategy to recommend patches for developer validation.
  27. Recap slide combining the iFixR workflow overview (slide 8), the RQ1 result (iFixR's IR-based fault localization is as accurate as spectrum-based fault localization), and the RQ2 result (IR-based fault localization leads to overfitted patches less often than spectrum-based fault localization).
  28. (blank slide)
  29. The strong assumption on test suites limits the practicality of APR. Test cases are: incomplete (few tests are written); future (tests are written after the source code); regression-oriented (tests validate that bugs are fixed and will not reoccur).
  30. Typical Information Retrieval based fault localization. [Diagram: bug report query matched against indexed source files -> suspicious code files.]
  31. Tests in suites: the Defects4J case. [Chart: test suite evolution, 96% future test cases vs 4% available test cases.] Test case changes in fix commits of Defects4J bugs.
  32. Test case availability when the bug is reported. [Chart: after removing future test cases, 92% of bugs have no failing test case vs 8% with failing test cases.] Fault localization and patch validation may be challenged in the case of user-reported bugs, given the lack of relevant test cases.
  33. How to repair without a complete test suite? Fault localization without the future test cases? Bug reports. Patch validation without the future test cases? Regression tests, plus patch suggestions for developers. A relevant test case reproducing the bug may not be readily available when a bug report is submitted to the issue tracking system.
  34. Statement-level IR-based fault localization. [Pipeline diagram: Top-K suspicious code files -> extract statements; bug report -> extract text; preprocess & tokenize; build feature vectors; compute similarity -> statement suspiciousness scores; weight with file suspiciousness scores -> weighted statement suspiciousness scores.]
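A compact sketch of this ranking step, under simplifying assumptions (naive whitespace tokenization instead of the paper's stemming and stop-word removal, raw term frequencies, multiplicative weighting by the file score; all names are ours):

```java
import java.util.HashMap;
import java.util.Map;

class StatementRanker {
    // Bag-of-tokens representation of a piece of text (bug report or statement).
    static Map<String, Integer> bagOfTokens(String text) {
        Map<String, Integer> bag = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) bag.merge(token, 1, Integer::sum);
        }
        return bag;
    }

    // Cosine similarity between two token-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Statement suspiciousness = textual similarity to the bug report,
    // weighted by the IRFL suspiciousness score of the enclosing file.
    static double suspiciousness(String bugReport, String statement, double fileScore) {
        return cosine(bagOfTokens(bugReport), bagOfTokens(statement)) * fileScore;
    }
}
```

Ranking the statements of the top-K files by this score yields the weighted statement suspiciousness list the slide depicts.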
  35. What practical repair pipeline can account for a limited test suite? [Pipeline diagram: Bug Report -> fault localization -> patch generation -> patch validation with Regression Tests -> recommended patches ranked 1., 2., ..., 5. for the developer.]

Editor's Notes

  • Hello, my name is Anil, a PhD student from the University of Luxembourg.
    I am going to present our work, iFixR, in which we investigated the
    feasibility of automating patch generation from bug reports. To that end,
    we implemented an APR pipeline variant adapted to the constraints of
    maintainers dealing with bug reports.
  • We start by revisiting the fundamental steps of APR approaches to better
    understand the constraints when dealing with bug reports.
    Generate-and-validate is a common APR technique that works as follows:
    given a buggy program and a test suite, a fault localization step
    identifies suspicious code locations; a patch generation step produces
    patch candidates for the buggy code locations; and finally a patch
    validation step executes the test cases to check whether the patched
    program meets the behavior specified by the test suite.
  • In the absence of complete specifications of the desired program behavior,
    generate-and-validate approaches use test suites as an affordable
    approximation to program specifications, which mainly drive the fault
    localization and patch validation steps. Even though such approaches have
    been shown effective on well-established benchmarks, the common (yet
    strong) assumption on the existence of relevant test cases does not always
    hold in practice: based on our empirical assessment on Defects4J, we found
    that overall 92% of bugs do not have a failing test case when the bug
    report is submitted.
  • As fault localization and patch validation may be challenged by the lack
    of relevant test cases, how can we develop an APR pipeline that is
    feasible under the practical constraint of limited test suites?
  • We assume we may leverage bug reports to drive fault localization.
  • For patch validation, we may rely only on the test cases available in the
    code base to perform regression testing [95], ensuring that, at least, the
    generated patches do not obstruct the behavior of the existing, unchanged
    parts of the software. Given that we assume only the presence of
    regression test cases to validate patch candidates, many of these patches
    may fail on the future test cases that are relevant to the reported bugs.
    We therefore also need a strategy to prioritize patches for recommendation
    to developers: a recommendation system should ensure that the patches
    presented first are the most likely to be plausible and even correct.
  • Following the constraint on the availability of test cases, and our
    intuition about bug reports and the test cases available in the code base,
    we propose iFixR, a new program repair workflow that considers a practical
    repair setup. It consists of four principal steps: (1) bug reports are fed
    to an IR-based fault localizer; (2) patches are generated from fix
    patterns; (3) patch candidates are validated via regression testing; and
    (4) a prioritized list of generated patches is proposed to developers for
    validation.


  • Now let's see each step in detail. We take as input the bug report in
    natural language and the buggy program at the time of bug report
    submission. iFixR replaces classical spectrum-based fault localization
    with Information Retrieval (IR)-based fault localization.
  • Given a bug report and a defective program, the general objective of
    Information Retrieval (IR)-based fault localization (IRFL) is to leverage
    the similarity between the terms of the bug report and of the source code
    to identify relevant buggy code locations. Despite recurring interest in
    the literature and a large body of work on IRFL, we did not find any
    adoption in program repair research or practice. We assume that one of
    the reasons is that IRFL techniques have so far focused on file-level or
    method-level localization, which is too coarse-grained in comparison to
    the spectrum-based fault localization output commonly used in APR.

  • We propose a fine-grained IR-based fault localization. We build on an
    existing state-of-the-art IRFL tool to localize at the file level, then we
    develop a heuristic-based algorithm to identify suspicious statements
    within each file. These heuristics consider token similarity within each
    statement and the likelihood of statement types to be buggy; more details
    are in the paper. Concretely, we take the bug report along with the
    ranked list of suspicious files returned by the IRFL tool. We extract
    text from the bug report and statements from each file to collect textual
    tokens. We then apply preprocessing steps such as stop-word removal,
    stemming, and tokenization to form bags of tokens. We build a feature
    vector from each bag of tokens (for the bug report and for each
    statement), use cosine similarity among the vectors, and rank the
    statements of each file by relevance to the bug report. Finally, to
    reflect the scores of highly ranked files on their statements, we weight
    the statement scores with the suspiciousness scores of the associated
    files.
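    A minimal formalization of this weighting step (our own notation; the
    multiplicative combination is an assumption consistent with the
    description above): susp(s) = cos(v_b, v_s) * susp(f_s), where v_b and
    v_s are the token feature vectors of the bug report and of statement s,
    and f_s is the file containing s.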

  • Now that we have the suspicious code locations, we resort to fix patterns.
    Fix patterns (a.k.a. fix templates) are indeed a common and reliable
    strategy in automatic program repair.
  • We consider fix patterns from the literature. Overall, iFixR includes 35
    fix patterns under 15 categories. A fix pattern encodes the recipe of
    change actions that should be applied to mutate a code element. For
    example, one fix pattern inserts a cast checker for the variable appearing
    at the fix location, whereas a pattern such as the null-pointer check
    considers different variations of null checks.
  • Our fix pattern selection follows the common implementation of
    template-based APR: we match the AST context information of each
    suspicious statement against the context AST of the fix patterns.
    Consider the buggy statement of Closure-13 in Defects4J: its context is
    an if statement, which matches the fix pattern that mutates conditional
    expressions in the fix pattern database. Specifically, once fault
    localization yields a list of suspicious code locations, iFixR traverses
    each node of a suspicious statement's AST in a breadth-first manner and
    tries to match each node against the context AST of a fix pattern. If a
    node matches any bug context, the related fix pattern is selected to
    generate patch candidates with the corresponding code change pattern.

  • Once a matching fix pattern is found (i.e., a pattern is selected for a
    suspicious statement), a patch is generated by mutating the statement.



  • Specifically, given a suspicious statement and a fix pattern, we generate
    patch candidates by mutating the suspicious statement's code following the
    recipe in the fix pattern.



  • iFixR considers bug reports. When a bug is reported, there are already
    test cases within the code base; we refer to them as available test
    cases. For every reported bug, fault localization followed by pattern
    matching and code mutation yields a set of patch candidates. In a typical
    test-based APR system, these patch candidates must let the program pass
    all test cases, and the candidate set is actively pruned to remove all
    patches that do not meet this requirement. In accordance with our finding
    that such test cases may not be available at the time the bug is
    reported, we assume that iFixR cannot reason about future test cases to
    select patch candidates. Instead, we rely only on past test cases, which
    were available in the code base when the bug was reported. Such test
    cases are leveraged to perform regression testing, which ensures that, at
    least, the generated patches do not obstruct the behavior that is already
    explicitly encoded by developers in the current test suite.






  • Available test cases may not even trigger the bug, so with regression
    tests alone we cannot guarantee that a patch candidate actually fixes the
    bug. We therefore need a patch ordering strategy to increase the chance
    of having the correct patch on top: iFixR produces a ranked
    recommendation list of patch suggestions for developers. We rely on the
    following heuristics to re-prioritize the validated patch candidates.
    First, we favor patches that minimize the differences between the patched
    program and the buggy program (counting the number of deleted and
    inserted AST nodes). Second, when two patch candidates have equal
    edit-script sizes, the tie is broken using the suspiciousness score of
    the associated statement. Third, iFixR systematically under-prioritizes
    patches generated by change actions that are irrelevant to bug fixing
    (e.g., mutating a literal expression or a variable into a method
    invocation or into final static, or inserting a method invocation without
    parameters).


  • To assess the feasibility of automating the generation of patches for
    user-reported bugs, we formulated the following research questions,
    associated with the different steps in the iFixR workflow. We first
    assess our IR-based fault localization.



  • We compare our fine-grained IRFL implementation against classical
    spectrum-based fault localization (SBFL). Our IRFL locates 25 bugs at the
    first suspicious statement, compared to 26 and 23 bugs for the SBFL
    variants. SBFL is performed with two different versions of the GZoltar
    testing framework, but always with the Ochiai ranking metric. Because
    fault localization tools output a ranked list of suspicious statements,
    results are reported in terms of whether the correct location is placed
    among the top-k suspected statements. Note that the comparison is
    conducted on 171 bugs of Math and Lang, given that these are the projects
    for which the bug linking can be reliably performed for applying the
    IRFL. Overall, the results show that our IRFL implementation is strictly
    comparable to the common implementation of spectrum-based fault
    localization on the Defects4J bug dataset. Although performance results
    are similar, we remind the reader that SBFL is applied by considering
    future test cases.


  • We compare our IRFL implementation against commonly-used SBFL
    implementations from the literature of test-based APR to investigate the
    impact of localization bias. Overall, we find that IRFL and SBFL
    localization information lead to similar repair performance in terms of
    the number of bugs fixed plausibly/correctly.
  • We further investigate the cases of plausible patches in both
    localization scenarios to characterize the reasons why these patches
    appear to only be overfitting the test suites. We found three main
    reasons: fault localization errors (mutating irrelevantly suspicious
    statements), pattern prioritization, and lack of fix ingredients.


  • In the last research question, we investigate the overall workflow of
    iFixR by re-simulating the real-world software maintenance cycle when a
    bug is reported: future test cases are not available for patch
    validation. We thus discard the future test cases of the Defects4J
    dataset and generate patches that must be recommended to developers. The
    evaluation protocol then consists in assessing to what extent
    correct/plausible patches are placed at the top of the recommendation
    list. In the absence of a complete test suite, we cannot guarantee that
    all patches that pass regression tests will indeed fix the bug, so we use
    heuristics to re-prioritize the patch candidates, towards ensuring that
    the patches recommended first will eventually be correct (or at least
    plausible once relevant test cases are implemented). We present results
    both for the case where we do not re-prioritize and the case where we do.
    Overall, iFixR's performance is promising: for 13 bugs, it presents a
    plausible patch among its top-5 recommended patches, and among those
    plausible patches, 8 are eventually found to be correct.




  • With iFixR, we have shown that bug reports can be handled automatically
    for a variety of bugs. In the absence of complete test suites for
    validating patch candidates, a recommendation system must ensure that the
    patches presented first to developers are the most likely to be plausible
    and even correct.
  • iFixR does not require future test cases to localize bugs, generate
    patches, and present a sorted recommendation list of patches.
  • Many bugs are reported without the available test suite being able to
    reveal them. In most development settings, test cases are incomplete,
    written after the fix, and focused on validating that bugs are indeed
    fixed and will not reoccur. We argue that the assumption on the
    availability of relevant test cases limits the adoption of APR by
    practitioners.
  • IRFL approaches systematically formulate a query from a given bug report
    by extracting tokens, which are matched in a search space of documents
    indexed with tokens extracted from the source code. IRFL approaches then
    rank the documents based on a probability of relevance (often measured as
    a suspiciousness score); highly ranked files are predicted to be the ones
    likely to contain the buggy code. Despite recurring interest in the
    literature and a large body of work on IRFL, we did not find any adoption
    in program repair research or practice. We assume that one of the reasons
    is that IRFL techniques have so far focused on file-level or method-level
    localization, which is too coarse-grained in comparison to the
    spectrum-based fault localization output commonly used in APR.


  • We revisited the fundamental steps of debugging to understand the
    constraints of maintainers in practice. Bug fixing is a process following
    a set of activities split along a timeline: it starts with the submission
    of a bug report, which is then assigned to a developer who locates the
    defect, fixes the source files, and tests the modified software. Finally,
    a tester checks the scenario of the reported bug and verifies whether the
    bug has been resolved.
  • APR studies are generally performed on well-defined benchmarks such as
    Defects4J. To highlight the practical challenges posed by user-reported
    bugs, we study how the test suite evolves, assuming that a similar
    timeline split of the benchmark dataset is necessary. To that end, we
    carefully examine the actual bug-fix commits associated with Defects4J
    bugs and their impact on test cases. Overall, for 96% (381/395) of bugs,
    the bug-fix commit changes the relevant test cases, which are therefore
    future data with respect to the bug discovery process.
  • As future test cases are not available at the time the bug is reported,
    we reorganize the dataset with a timeline split similar to the manual
    debugging process, to accurately simulate the context of user-reported
    bugs. Concretely, we remove the future test cases of the Defects4J
    dataset and execute the remaining test cases. We found that, overall, 92%
    (365/395) of bugs do not have any failing test case. This finding
    suggests that, in practice, the fault localization and patch validation
    steps in APR may be challenged in the case of user-reported bugs.
  • As the test cases drive fault localization and the validation of
    generated patches in test-suite-based APR pipelines, it is necessary to
    investigate alternative approaches for fault localization and patch
    validation. For fault localization, we assume we may leverage bug
    reports; for patch validation, we may rely only on past test cases, which
    were available in the code base, combined with patch suggestions for
    developers.
  • The workflow of the iFixR approach considers the following questions:
    Where is the bug? How should we change the code? Which patches are valid?
    Which patches do we recommend first?
  • To obtain a finer-grained (statement-level) IR-based fault localizer that
    can be integrated into a patch generation pipeline, we develop an
    algorithm that ranks suspicious statements based on the files output by a
    state-of-the-art IRFL tool, following the steps detailed earlier: token
    extraction from the bug report and the statements, preprocessing, feature
    vectors, cosine similarity, and weighting by the file suspiciousness
    scores.
  • For patch validation, we rely only on past test cases to perform
    regression testing [95], ensuring that, at least, the generated patches
    do not obstruct the behavior of the existing, unchanged parts of the
    software. Since many patches that pass regression tests may fail on the
    future test cases relevant to the reported bug, we also use a strategy to
    prioritize patches for recommendation to developers, so that the patches
    presented first are the most likely to be plausible and even correct.
