Mining Fix Patterns for FindBugs Violations
Several static analysis tools, such as Splint or FindBugs, have been proposed to the software development community to help detect security vulnerabilities or bad programming practices. However, the adoption of these tools is hindered by their high false positive rates. If the false positive rate is too high, developers may become desensitized to violation reports from these tools, causing concrete and severe bugs to be overlooked. Fortunately, some violations are actually addressed and resolved by developers. We claim that violations which are recurrently fixed are likely to be true positives, and that an automated approach can learn to repair similar unseen violations. However, there is a lack of a systematic way to investigate the distributions of detected and fixed violations in the wild, which could provide insights into prioritizing violations for developers, and of an effective way to mine code and fix patterns, which could help developers easily understand why common violations are reported and how to fix them.
In this paper, we first collect and track a large number of fixed and unfixed violations across revisions of software. The empirical analyses reveal that there are discrepancies in the distributions of violations that are detected and those that are fixed, in terms of occurrences, spread, and categories, which can provide insights into prioritizing violations. To automatically identify patterns in violations and their fixes, we propose an approach that utilizes convolutional neural networks to learn features and clustering to regroup similar instances. We then evaluate the usefulness of the identified fix patterns by applying them to unfixed violations. The results show that developers accept and merge a majority (69/116) of fixes generated from the inferred fix patterns. It is also noteworthy that the yielded patterns are applicable to four real bugs in Defects4J, a major benchmark for software testing and automated repair.
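To make the tracking idea concrete: a violation is a candidate "fixed" violation when it is reported in one revision but no longer in the next, while its file was modified in between. The following is a minimal Java sketch of that idea only; it is our illustration, not the paper's tool, and it ignores the harder problem of matching violations across shifted line numbers.

import java.util.HashSet;
import java.util.Set;

public class ViolationTracker {
    // A violation is identified (simplistically) by file, type, and line.
    record Violation(String file, String type, int line) {}

    // Violations reported in the old revision but absent from the new one,
    // in files that developers actually modified, are candidate fixes;
    // the rest are either still open or disappeared for other reasons.
    static Set<Violation> candidateFixed(Set<Violation> oldRev,
                                         Set<Violation> newRev,
                                         Set<String> modifiedFiles) {
        Set<Violation> fixed = new HashSet<>();
        for (Violation v : oldRev) {
            if (!newRev.contains(v) && modifiedFiles.contains(v.file())) {
                fixed.add(v);
            }
        }
        return fixed;
    }
}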
TBar: Revisiting Template-based Automated Program Repair (Dongsun Kim)
We revisit the performance of template-based APR to build comprehensive knowledge about the effectiveness of fix patterns, and to highlight the importance of complementary steps such as fault localization or donor code retrieval. To that end, we first investigate the literature to collect, summarize and label recurrently-used fix patterns. Based on the investigation, we build TBar, a straightforward APR tool that systematically attempts to apply these fix patterns to program bugs. We thoroughly evaluate TBar on the Defects4J benchmark. In particular, we assess the actual qualitative and quantitative diversity of fix patterns, as well as their effectiveness in yielding plausible or correct patches. Eventually, we find that, assuming a perfect fault localization, TBar correctly/plausibly fixes 74/101 bugs. Replicating a standard and practical pipeline of APR assessment, we demonstrate that TBar correctly fixes 43 bugs from Defects4J, an unprecedented performance in the literature (including all approaches, i.e., template-based, stochastic mutation-based or synthesis-based APR).
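For a flavor of what systematically applying a fix pattern means, here is a minimal sketch of one commonly cited template, inserting a null check. A real template-based tool such as TBar operates on ASTs and uses many more patterns; this illustration reduces the idea to strings, and the method and example statement are hypothetical.

public class NullCheckTemplate {
    // Fix-pattern sketch: guard a suspicious statement with a null check
    // on a given variable, yielding one candidate patch.
    static String applyNullCheck(String buggyStatement, String var) {
        return "if (" + var + " != null) { " + buggyStatement + " }";
    }

    public static void main(String[] args) {
        System.out.println(applyNullCheck("name = user.getName();", "user"));
        // -> if (user != null) { name = user.getName(); }
    }
}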
Learning to Spot and Refactor Inconsistent Method Names (Dongsun Kim)
To ensure code readability and facilitate software maintenance, program methods must be named properly. In particular, method names must be consistent with the corresponding method implementations. Debugging method names remains an important topic in the literature, where various approaches analyze commonalities among method names in a large dataset to detect inconsistent method names and suggest better ones. We note that the state-of-the-art does not analyze the implemented code itself to assess consistency. We thus propose a novel automated approach to debugging method names based on the analysis of consistency between method names and method code. The approach leverages deep feature representation techniques adapted to the nature of each artifact. Experimental results on over 2.1 million Java methods show that we can achieve up to 15 percentage points improvement over the state-of-the-art, establishing a record performance of 67.9% F1-measure in identifying inconsistent method names. We further demonstrate that our approach yields up to 25% accuracy in suggesting full names, while the state-of-the-art lags far behind at 1.1% accuracy. Finally, we report on our success in fixing 66 inconsistent method names in a live study on projects in the wild.
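To illustrate what "inconsistent" means here, consider a toy check: if the tokens of a method name share nothing with the identifiers in its body, the name is suspicious. This is only our illustration of the problem; the paper's approach instead learns deep feature representations of names and code.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class NameConsistency {
    // Split a camelCase identifier into lower-case tokens.
    static Set<String> tokens(String identifier) {
        return Arrays.stream(identifier.split("(?<=[a-z])(?=[A-Z])"))
                     .map(String::toLowerCase)
                     .collect(Collectors.toCollection(HashSet::new));
    }

    // Flag a method name whose tokens never occur among the body's identifiers.
    static boolean looksInconsistent(String methodName, String bodyIdentifier) {
        Set<String> shared = tokens(methodName);
        shared.retainAll(tokens(bodyIdentifier));
        return shared.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(looksInconsistent("getName", "ageField")); // true: suspicious
        System.out.println(looksInconsistent("getAge", "ageField"));  // false
    }
}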
Bug fixing is a time-consuming and tedious task. To reduce the manual effort in bug fixing, researchers have presented automated approaches to software repair. Unfortunately, recent studies have shown that state-of-the-art techniques in automated repair tend to generate patches only for a small number of bugs, and even those with quality issues (e.g., incorrect behavior and nonsensical changes). To improve automated program repair (APR) techniques, the community should deepen its knowledge of repair actions from real-world patches, since most of the techniques rely on patches written by human developers. Previous investigations of real-world patches are limited to the statement level, which is not sufficiently fine-grained to build this knowledge. In this work, we contribute to building this knowledge via a systematic and fine-grained study of 16,450 bug fix commits from seven Java open-source projects. We find that there are opportunities for APR techniques to improve their effectiveness by looking at code elements that have not yet been investigated. We also discuss nine insights into tuning automated repair tools. For example, a small number of statement and expression types are recurrently impacted by real-world patches, and expression-level granularity could reduce the search space for finding fix ingredients, a direction previous studies never explored.
iFixR: Bug Report Driven Program Repair (Dongsun Kim)
Issue tracking systems are commonly used in modern software development for collecting feedback from users and developers. An ultimate automation target of software maintenance is then the systematization of patch generation for user-reported bugs. Although this ambition is aligned with the momentum of automated program repair, the literature has, so far, mostly focused on generate-and-validate setups where fault localization and patch generation are driven by a well-defined test suite. On the one hand, however, the common (yet strong) assumption on the existence of relevant test cases does not hold in practice for most development settings: many bugs are reported without the available test suite being able to reveal them. On the other hand, for many projects, the number of bug reports generally outstrips the resources available to triage them. Towards increasing the adoption of patch generation tools by practitioners, we investigate a new repair pipeline, iFixR, driven by bug reports: (1) bug reports are fed to an IR-based fault localizer; (2) patches are generated from fix patterns and validated via regression testing; (3) a prioritized list of generated patches is proposed to developers. We evaluate iFixR on the Defects4J dataset, which we enriched (i.e., faults are linked to bug reports) and carefully reorganized (i.e., the timeline of test cases is naturally split). iFixR generates genuine/plausible patches for 21/44 Defects4J faults with its IR-based fault localizer. iFixR accurately places a genuine/plausible patch among its top-5 recommendations for 8/13 of these faults (without using future test cases in generation-and-validation).
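For intuition about the "IR-based fault localizer" step, here is a minimal bag-of-words sketch that ranks source files by lexical similarity to the bug report text. It is our simplification only: real IR-based fault localization of the kind iFixR builds on uses TF-IDF weighting, code structure, and more, and the file names below are hypothetical.

import java.util.*;

public class IrFaultLocalizer {
    // Bag-of-words term frequencies for a text.
    static Map<String, Integer> termFreq(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    // Cosine similarity between two term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            na += e.getValue() * e.getValue();
        }
        for (int v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Rank file names by similarity of their contents to the bug report.
    static List<String> rank(String bugReport, Map<String, String> fileContents) {
        Map<String, Integer> report = termFreq(bugReport);
        List<String> names = new ArrayList<>(fileContents.keySet());
        names.sort(Comparator.comparingDouble(
                (String f) -> -cosine(report, termFreq(fileContents.get(f)))));
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> files = Map.of(
                "DateParser.java", "parse date format month day year",
                "HttpClient.java", "socket connect timeout retry");
        System.out.println(rank("wrong year when parsing date", files));
        // -> [DateParser.java, HttpClient.java]
    }
}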
Bench4BL: Reproducibility Study on the Performance of IR-Based Bug Localization (Dongsun Kim)
Jaekwon Lee, Dongsun Kim, Tegawendé F. Bissyandé, Woosung Jung and Yves Le Traon, “Bench4BL: Reproducibility Study on the Performance of IR-Based Bug Localization”, in Proceedings of the 27th International Symposium on Software Testing and Analysis (ISSTA 2018), Amsterdam, Netherlands, July 16-21, 2018.
LSRepair: Live Search of Fix Ingredients for Automated Program Repair (Dongsun Kim)
Automated program repair (APR) has extensively been developed by leveraging search-based techniques, in which fix ingredients are explored and identified at different granularities from a specific search space. State-of-the-art approaches often find fix ingredients by using mutation operators or leveraging manually-crafted templates. We argue that the fix ingredients can be searched for in an online mode, leveraging code search techniques to find potentially-fixed versions of buggy code fragments from which repair actions can be extracted. In this study, we present an APR tool, LSRepair, that automatically explores code repositories to search for fix ingredients at the method-level granularity with three strategies of similar code search. Our preliminary evaluation shows that code search can drive a faster fix process (some bugs are fixed in a few seconds). LSRepair helps repair 19 bugs from the Defects4J benchmark successfully. We expect our approach to open new directions for fixing multi-line bugs.
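To illustrate what searching for fix ingredients by code similarity can look like, here is a minimal sketch ranking corpus methods by token-set Jaccard similarity to a buggy method. The measure and the example method names are our stand-ins; LSRepair's three actual search strategies are more sophisticated.

import java.util.*;

public class SimilarMethodSearch {
    // Jaccard similarity between two token sets.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0 : (double) inter.size() / union.size();
    }

    // Rank corpus methods by similarity to the buggy method's tokens.
    static List<String> search(Set<String> buggyTokens, Map<String, Set<String>> corpus) {
        List<String> ranked = new ArrayList<>(corpus.keySet());
        ranked.sort(Comparator.comparingDouble(
                (String m) -> -jaccard(buggyTokens, corpus.get(m))));
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> corpus = Map.of(
                "sortList", Set.of("list", "sort", "swap", "index"),
                "openFile", Set.of("file", "stream", "open", "close"));
        System.out.println(search(Set.of("list", "sort", "compare"), corpus));
        // -> [sortList, openFile]
    }
}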
You Cannot Fix What You Cannot Find! --- An Investigation of Fault Localizati... (Dongsun Kim)
Properly benchmarking Automated Program Repair (APR) systems should contribute to the development and adoption of the research outputs by practitioners. To that end, the research community must ensure that it reaches significant milestones by reliably comparing state-of-the-art tools for a better understanding of their strengths and weaknesses. In this work, we identify and investigate a practical bias caused by the fault localization (FL) step in a repair pipeline. We propose to highlight the different fault localization configurations used in the literature, and their impact on APR systems when applied to the Defects4J benchmark. Then, we explore the performance variations that can be achieved by “tweaking” the FL step. Eventually, we expect to create a new momentum for (1) full disclosure of APR experimental procedures with respect to FL, (2) realistic expectations of repairing bugs in Defects4J, as well as (3) reliable performance comparison among the state-of-the-art APR systems, and against the baseline performance results of our thoroughly assessed kPAR repair tool. Our main findings include: (a) only a subset of Defects4J bugs can be currently localized by commonly-used FL techniques; (b) the current practice of comparing state-of-the-art APR systems (i.e., counting the number of fixed bugs) is potentially misleading due to the bias of FL configurations; and (c) APR authors do not properly qualify their performance achievements with respect to the different tuning parameters implemented in APR systems.
Impact of Tool Support in Patch Construction (Dongsun Kim)
Anil Koyuncu, Tegawendé F. Bissyandé, Dongsun Kim, Jacques Klein, Martin Monperrus, and Yves Le Traon, “Impact of Tool Support in Patch Construction,” in Proceedings of the 26th International Symposium on Software Testing and Analysis (ISSTA 2017), Santa Barbara, California, United States, July 10-14, 2017.
AVATAR: Fixing Semantic Bugs with Fix Patterns of Static Analysis Violations (Dongsun Kim)
Fix pattern-based patch generation is a promising direction in Automated Program Repair (APR). Notably, it has been demonstrated to produce more acceptable and correct patches than the patches obtained with mutation operators through genetic programming. The performance of pattern-based APR systems, however, depends on the fix ingredients mined from fix changes in development histories. Unfortunately, collecting a reliable set of bug fixes in repositories can be challenging. In this paper, we propose to investigate the possibility in an APR scenario of leveraging code changes that address violations by static bug detection tools. To that end, we build the AVATAR APR system, which exploits fix patterns of static analysis violations as ingredients for patch generation. Evaluated on the Defects4J benchmark, we show that, assuming a perfect localization of faults, AVATAR can generate correct patches to fix 34/39 bugs. We further find that AVATAR yields performance metrics that are comparable to that of the closely-related approaches in the literature. While AVATAR outperforms many of the state-of-the-art pattern-based APR systems, it is mostly complementary to current approaches. Overall, our study highlights the relevance of static bug finding tools as indirect contributors of fix ingredients for addressing code defects identified with functional test cases.
The PVS-Studio developers' team has carried out a comparison of its own static code analyzer, PVS-Studio, with the open-source static code analyzer Cppcheck. As material for the comparison, the source code of three open-source projects by id Software was chosen: Doom 3, Quake 3: Arena, and Wolfenstein: Enemy Territory. The article describes the comparison methodology and lists the detected errors. The conclusions section at the end of the article actually contains "non-conclusions", as we consciously avoid drawing any conclusions: you can reproduce our comparison and draw your own.
Tesseract. Recognizing Errors in Recognition Software (Andrey Karpov)
Tesseract is a free software program for text recognition developed by Google. According to the project description, "Tesseract is probably the most accurate open source OCR engine available". What if we try to catch some bugs in it with the help of the CppCat analyzer?
Static analysis works for mission-critical systems, why not yours? (Rogue Wave Software)
Take a deep dive into the world of static code analysis (SCA) by immersing yourself in different analysis techniques and examples of the problems they find, and by learning how SCA fits into various types of environments, from the developer desktop to the QA team. The goal is to provide a solid foundation for you to make the best decision for testing technology and process selection, including: types of defects found by SCA; typical myths and barriers to adoption; and how SCA aligns to different testing maturity levels.
Headache from using mathematical software (PVS-Studio)
It so happened that for some time I was discussing seemingly unrelated topics on the Internet: free alternatives to Matlab for universities and students, and finding errors in algorithms with the help of static code analysis. All these discussions were tied together by the terrible quality of the code of modern programs, in particular the quality of software for mathematicians and scientists. This immediately raises the question of how much we can trust calculations and studies conducted with the help of such programs. We will try to reflect on this topic and look for the errors.
Partitioning Composite Code Changes to Facilitate Code Review (MSR 2015) (Sung Kim)
Yida's presentation at MSR 2015!
Abstract—Developers expend significant effort on reviewing source code changes, hence the comprehensibility of code changes directly affects development productivity. Our prior study has suggested that composite code changes, which mix multiple development issues together, are typically difficult to review. Unfortunately, our manual inspection of 453 open source code changes reveals a non-trivial occurrence (up to 29%) of such composite changes.
In this paper, we propose a heuristic-based approach to automatically partition composite changes, such that each sub-change in the partition is more cohesive and self-contained. Our quantitative and qualitative evaluation results are promising in demonstrating the potential benefits of our approach for facilitating code review of composite code changes.
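One way to picture such a heuristic (ours, for illustration only; the paper's actual partitioning uses richer program relations): treat each changed hunk as a node and group hunks that mention a common identifier into the same sub-change via union-find.

import java.util.*;

public class ChangePartitioner {
    private final int[] parent;

    ChangePartitioner(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }
    void union(int a, int b) { parent[find(a)] = find(b); }

    // Hunks sharing at least one identifier fall into the same partition.
    static Map<Integer, List<Integer>> partition(List<Set<String>> hunkIdentifiers) {
        int n = hunkIdentifiers.size();
        ChangePartitioner uf = new ChangePartitioner(n);
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (!Collections.disjoint(hunkIdentifiers.get(i), hunkIdentifiers.get(j)))
                    uf.union(i, j);
        Map<Integer, List<Integer>> groups = new HashMap<>();
        for (int i = 0; i < n; i++)
            groups.computeIfAbsent(uf.find(i), k -> new ArrayList<>()).add(i);
        return groups;
    }

    public static void main(String[] args) {
        // Three hunks; hunks 0 and 2 both touch "total", hunk 1 is unrelated.
        System.out.println(partition(List.of(
                Set.of("total", "price"), Set.of("logger"), Set.of("total", "tax"))));
        // e.g. {0=[0, 2], 1=[1]} (keys are representative hunk indices)
    }
}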
Product Derivation is a key activity in Software Product Line Engineering. During this process, derivation operators modify or create core assets (e.g., model elements, source code instructions, components) by adding, removing or substituting them according to a given configuration. The result is a derived product that generally needs to conform to a programming or modeling language. Some operators lead to invalid products when applied to certain assets, while others do not; knowing this in advance can help to better use them. However, this is challenging, especially if we consider assets expressed in extensive and complex languages such as Java. In this paper, we empirically answer the following question: which product line operators, applied to which program elements, can synthesize variants of programs that are incorrect, correct, or perhaps even conforming to test suites? We implement source code transformations based on the derivation operators of the Common Variability Language. We automatically synthesize more than 370,000 program variants from a set of 8 real, large Java projects (up to 85,000 lines of code), obtaining an extensive panorama of the sanity of the operations.
The paper was presented at SPLC'15.
Why it is profitable to use static analysis, and how it can solve problems for developers, testers, security researchers, and quality managers.
This session gives an overview of static analysis: what it is for, what problems it solves, an overview of commercial and free tools available as Eclipse plugins (for JDT and CDT), and how to adapt it within an organization to help developers.
Testing parallel software is a more complicated task than testing a standard program. The programmer should be aware both of the traps he can face while testing parallel code and of the existing methodologies and toolkits.
PVS-Studio advertisement - static analysis of C/C++ code (Andrey Karpov)
This document advertises the PVS-Studio static analyzer. It describes how using PVS-Studio reduces the number of errors in the code of C/C++/C++11 projects and the costs of code testing, debugging, and maintenance. Many examples of errors found by the analyzer in various open-source projects are cited. The document describes PVS-Studio as of version 4.38, released on October 12, 2011, and therefore does not cover the capabilities of later versions. To learn about new capabilities, visit the product's site http://www.viva64.com or search for an updated version of this article.
An important event has taken place in the PVS-Studio analyzer's life: support for C# code analysis was added in the latest version. As one of its developers, I couldn't help but try it on some project. Reading about scanning small and little-known projects is not very interesting, of course, so it had to be something popular; I picked MonoDevelop.
Analysis of PascalABC.NET using SonarQube plugins: SonarC# and PVS-Studio (PVS-Studio)
In November 2016, we posted an article about the development and use of the PVS-Studio plugin for SonarQube. We received great feedback from our customers and interested users, who requested that we test the plugin on a real project. As interest in this subject is not decreasing, we decided to test the plugin on a C# project, PascalABC.NET. It should also be borne in mind that SonarQube has its own static analyzer of C# code, SonarC#. To make the report more complete, we decided to test SonarC# as well. The objective of this work was not a comparison of the analyzers, but a demonstration of the main peculiarities of their interaction with the SonarQube service. A plain comparison of the analyzers would not be fair, because PVS-Studio is a specialized tool for detecting bugs and potential vulnerabilities, while SonarQube is a service for assessing code quality by a large number of parameters: code duplication, compliance with coding standards, unit test coverage, potential bugs in the code, density of comments in the code, technical debt, and so on.
We continue checking Microsoft projects: analysis of PowerShell (PVS-Studio)
It has become a "good tradition" for Microsoft to make their products open-source: CoreFX, .Net Compiler Platform (Roslyn), Code Contracts, MSBuild, and other projects. For us, the developers of the PVS-Studio analyzer, it's an opportunity to check well-known projects, tell people (including the project authors themselves) about the bugs we find, and additionally test our analyzer. Today we are going to talk about the errors found in another project by Microsoft, PowerShell.
Machine Learning in Static Analysis of Program Source Code (Andrey Karpov)
Machine learning has become firmly entrenched in a variety of fields, from speech recognition to medical diagnosis. The popularity of this approach is so great that people try to use it wherever they can. Some attempts to replace classical approaches with neural networks turn out to be unsuccessful. This time we'll consider machine learning in terms of creating effective static code analyzers for finding bugs and potential vulnerabilities.
Chapter 10: Testing and Quality Assurance (keturahhazelhurst)

Objectives
Understand quality and basic techniques for software verification and validation.
Analyze the basics of software testing and testing techniques.
Discuss the concept of the "inspection" process.

Testing Introduction
Quality assurance (QA): activities designed to measure and improve quality in a product and in the process.
Quality control (QC): activities designed to validate and verify the quality of the product through detecting faults and "fixing" the defects.
The two are similar; both need good techniques, process, tools, and team.

What Is "Quality"?
Two traditional definitions: conforms to requirements; fit to use.
Verification: checking that software conforms to its requirements (did the software evolve from the requirements properly; does the software "work"?). Is the system correct?
Validation: checking that software meets user requirements (fit to use). Are we building the correct system?

Some "Error-Detection" Techniques (finding errors)
Testing: executing the program in a controlled environment and "verifying/validating" the output.
Inspections and reviews.
Formal methods (proving software correct).
Static analysis: detects "error-prone conditions".
Faults and Failures
Error: a mistake made by a programmer or software engineer that causes a fault, which in turn may cause a failure.
Fault (defect, bug): a condition that may cause a failure in the system.
Failure (problem): the inability of the system to perform a function according to its specification, due to some fault.
Fault or failure severity is based on consequences; fault or failure priority is based on the importance of developing a fix, which is in turn based on severity.
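A tiny made-up Java example (ours, not from the chapter) makes the distinction concrete: the programmer's slip is the error, the wrong operator left in the code is the fault, and the wrong output observed for some inputs is the failure.

public class FaultVsFailure {
    // Error: the programmer meant a + b. Fault: the '-' that ended up in the code.
    static int add(int a, int b) {
        return a - b; // BUG
    }

    public static void main(String[] args) {
        System.out.println(add(5, 0)); // prints 5: the fault stays hidden, no failure
        System.out.println(add(5, 3)); // prints 2 instead of 8: an observable failure
    }
}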
Testing
An activity performed for evaluating product quality, and for improving products by identifying defects and having them fixed prior to software release.
Dynamic (running-program) verification of a program's behavior on a finite set of test cases selected from the execution domain.
Testing can NOT prove the product works 100%, even though we use testing to demonstrate that parts of the software work. (And it is not always done!)

Testing (cont.)
Who tests? Programmers, testers/requirements analysts, users.
What is tested? Unit code testing, functional code testing, integration/system testing, user interface testing.
Why test? Acceptance (customer), conformance (standards, laws, etc.), configuration (user vs. developer), performance, stress, security, etc.
How are test cases designed? Intuition, specification-based (black box), code-based (white box), existing cases (regression).
Progression of Testing

Equivalence Class Partitioning
Divide the input into several groups deemed "equivalent" for purposes of finding errors, and pick one "representative" of each class for testing.
Equivalence classes are determined by requirements/design specifications and some intuition. Example: pick the "larger" of two integers and . . .
This lessens duplication while keeping coverage complete.

Suppose we have n distinct functional requirements.
Su ...
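As a concrete sketch of the technique (the max function and the three classes are our own reading of the slide's "larger of two integers" example, not taken from the chapter): split the input space into classes whose members should exercise the program identically, then test one representative per class.

public class EquivalencePartitioningDemo {
    // Function under test: returns the larger of two integers.
    static int max(int a, int b) {
        return a >= b ? a : b;
    }

    public static void main(String[] args) {
        // One representative per equivalence class, instead of many redundant inputs:
        assert max(7, 3) == 7 : "class a > b";
        assert max(3, 7) == 7 : "class a < b";
        assert max(5, 5) == 5 : "class a == b (boundary between the two)";
        System.out.println("All class representatives pass (run with -ea).");
    }
}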
Traps detection during migration of C and C++ code to 64-bit Windows (PVS-Studio)
The appearance of 64-bit processors on the PC market confronted developers with the task of converting old 32-bit applications to the new platforms. After migration of the application code, it is highly probable that the code will work incorrectly. This article reviews questions related to software verification and testing. It also covers the difficulties a developer of 64-bit Windows applications may face, and ways of solving them.
Static analysis as part of the development process in Unreal Engine (PVS-Studio)
Unreal Engine continues to develop as new code is added and previously written code is changed. What is the inevitable consequence of ongoing development in a project? The emergence of new bugs in the code that a programmer wants to identify as early as possible. One of the ways to reduce the number of errors is the use of a static analyzer like PVS-Studio. Moreover, the analyzer is not only evolving, but also constantly learning to look for new error patterns, some of which we will discuss in this article. If you care about code quality, this article is for you.
New Year PVS-Studio 6.00 Release: Scanning Roslyn (PVS-Studio)
The long wait is finally over. We have released the static code analyzer PVS-Studio 6.00, which supports the analysis of C# projects. It can now analyze projects written in C, C++, C++/CLI, C++/CX, and C#. For this release, we have prepared a report based on the analysis of the open-source project Roslyn. It is thanks to Roslyn that we were able to add the C# support to PVS-Studio, and we are very grateful to Microsoft for this project.
Regular use of static code analysis in team development (PVS-Studio)
Static code analysis technologies are used in companies with mature software development processes. However, there might be different levels of using and introducing code analysis tools into a development process: from manual launch of an analyzer "from time to time" or when searching for hard-to-find errors to everyday automatic launch or launch of a tool when adding new source code into the version control system.
The article examines some questions related to testing 64-bit software. Difficulties that a developer of resource-intensive 64-bit applications may face, and ways to overcome them, are described.
Using Categorical Features in Mining Bug Tracking Systems to Assign Bug Reports (ijseajournal)
Most bug assignment approaches utilize text classification and information retrieval techniques. These approaches use the textual contents of bug reports to build recommendation models. The textual contents of bug reports are usually a high-dimensional and noisy source of information, so these approaches suffer from low accuracy and high computational needs. In this paper, we investigate whether categorical fields of bug reports, such as the component to which the bug belongs, are appropriate to represent bug reports instead of the textual description. We build a classification model by utilizing the categorical features as a representation of the bug report. The experimental evaluation is conducted using three projects, namely NetBeans, Freedesktop, and Firefox. We compared this approach with two machine-learning-based bug assignment approaches. The evaluation shows that using the textual contents of bug reports is important. In addition, it shows that the categorical features can improve the classification accuracy.
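To see why a categorical field alone can carry signal, consider a deliberately simple baseline (ours for illustration; the paper builds a proper classifier over several categorical features): for each value of the component field, predict the developer who fixed the most past bugs in that component.

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ComponentAssigner {
    // component -> (developer -> number of past fixes)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    void train(String component, String developer) {
        counts.computeIfAbsent(component, c -> new HashMap<>())
              .merge(developer, 1, Integer::sum);
    }

    // Predict the most frequent past fixer for this component, if any.
    Optional<String> predict(String component) {
        Map<String, Integer> devs = counts.get(component);
        if (devs == null) return Optional.empty();
        return devs.entrySet().stream()
                   .max(Map.Entry.comparingByValue())
                   .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        ComponentAssigner assigner = new ComponentAssigner();
        assigner.train("ui", "alice");
        assigner.train("ui", "alice");
        assigner.train("kernel", "bob");
        System.out.println(assigner.predict("ui"));     // Optional[alice]
        System.out.println(assigner.predict("editor")); // Optional.empty
    }
}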
1. Mining Fix Patterns for FindBugs Violations
Kui Liu1, Dongsun Kim1, Tegawendé F. Bissyandé1, Shin Yoo2, Yves Le Traon1
1 Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg
2 School of Computing, KAIST, Daejeon, Republic of Korea
29th May 2019
5. Violations from Static Analysis Tools
Static analysis tools such as FindBugs detect violations.
Developers may (or may not) change source code to fix the violations.
6. Violations from Static Analysis Tools
Static analysis tools such as FindBugs detect violations.
Developers may (or may not) change source code to fix the violations.

[The slide embeds a screenshot of the paper's first page; the recoverable text reads:] Index Terms—Fix pattern, pattern mining, program repair, findbugs violation, unsupervised learning. Software projects widely use static code analysis tools to assess software quality and identify potential defects. Several commercial [1], [2], [3] and open-source [4], [5], [6], [7] tools are integrated into many software projects, including operating system development projects [8]. For example, Java-based projects often adopt FindBugs [4] or PMD [5] while C projects use Splint [6], cppcheck [7], or Clang Static Analyzer [9], while Linux driver code is systematically assessed with a battery of static analyzers including Sparse and the LDV toolkit. Developers may benefit from the tools before running a program in real environments, even though those tools do not guarantee that all reported defects are real bugs [10]. Static analysis can detect several types of defects such as security vulnerabilities, performance issues, and bad programming practices (so-called code smells) [11]. Recent studies denote those defects as static analysis violations [12] or alerts [13]; in the remainder of this paper, we simply refer to them as violations. Fig. 1 shows a viola...

Example (Fig. 1 of the paper): a detected violation, taken from the PopulateRepositoryMojo.java file at revision bdf3fe in project nbm-maven-plugin:

public boolean equals(Object obj) {
    // Violation Type:
    // BC_EQUALS_METHOD_SHOULD_WORK_FOR_ALL_OBJECTS
    return getModule().equals(
            ((ModuleWrapper) obj).getModule());
}

As later addressed by developers via a patch (Fig. 2 of the paper), the method should return false if obj is not of the same type as the object being compared; as written, when the type of the obj argument is not ModuleWrapper, a java.lang.ClassCastException is thrown. Only the beginning of the patch is visible on the slide:

public boolean equals(Object obj) {
-   return getModule().equals(
...
7. Violations from Static Analysis Tools
Static analysis tools such as FindBugs detect violations. Developers may (or may not) change source code to fix the violations.
As later addressed by developers via the patch shown in Fig. 2, the method should return false if obj is not of the same type as the object being compared; without the fix, a java.lang.ClassCastException is thrown whenever the type of the obj argument is not ModuleWrapper.

public boolean equals(Object obj) {
-   return getModule().equals(
-       ((ModuleWrapper) obj).getModule());
+   return obj instanceof ModuleWrapper &&
+       getModule().equals(
+           ((ModuleWrapper) obj).getModule());
}

Fig. 2: Example of fixing a violation, taken from commit 0fd11c of project nbm-maven-plugin.
Example
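For context, a minimal self-contained version of the corrected class might look as follows. This is only a sketch: the constructor, the Object-typed field, and the hashCode override are assumptions added for completeness, not part of the original nbm-maven-plugin code.

public class ModuleWrapper {
    private final Object module; // assumed non-null for this sketch

    public ModuleWrapper(Object module) {
        this.module = module;
    }

    public Object getModule() {
        return module;
    }

    @Override
    public boolean equals(Object obj) {
        // Guard against foreign types instead of blindly casting:
        // without this check, a non-ModuleWrapper argument raises
        // java.lang.ClassCastException rather than returning false.
        return obj instanceof ModuleWrapper
                && getModule().equals(((ModuleWrapper) obj).getModule());
    }

    @Override
    public int hashCode() {
        // Kept consistent with equals(), as the Object contract requires.
        return getModule().hashCode();
    }
}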
13. Fixing based on bug description?
BC: Equals method should not assume anything about the type of its argument (BC_EQUALS_METHOD_SHOULD_WORK_FOR_ALL_OBJECTS)
The equals(Object o) method shouldn't make any assumptions about the type of o. It should simply return false if o is not the same type as this.
BIT: Check for sign of bitwise operation (BIT_SIGNED_CHECK)
This method compares an expression such as ((event.detail & SWT.SELECTED) > 0). Using bit arithmetic and then comparing with the greater-than operator can lead to unexpected results (of course depending on the value of SWT.SELECTED). If SWT.SELECTED is a negative number, this is a candidate for a bug. Even when SWT.SELECTED is not negative, it seems good practice to use '!= 0' instead of '> 0' (see the sketch after these descriptions).
CN: Class implements Cloneable but does not define or use clone method (CN_IDIOM)
Class implements Cloneable but does not define or use the clone method.
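As a concrete illustration of the BIT_SIGNED_CHECK description above, the following minimal sketch shows why '> 0' misbehaves when the mask has its sign bit set; the SELECTED constant here is an assumed stand-in for SWT.SELECTED.

public class BitSignedCheckDemo {
    // Stand-in for SWT.SELECTED; a mask with the sign bit set is a
    // negative int, which makes the '> 0' comparison fail even when
    // the bit is actually present.
    static final int SELECTED = 0x80000000; // == Integer.MIN_VALUE

    public static void main(String[] args) {
        int detail = 0xFFFFFFFF; // all bits set, so SELECTED is present

        // Buggy check flagged by FindBugs: (detail & SELECTED) is
        // 0x80000000, a negative int, so '> 0' evaluates to false.
        System.out.println((detail & SELECTED) > 0);   // prints false

        // Recommended check: '!= 0' is sign-agnostic.
        System.out.println((detail & SELECTED) != 0);  // prints true
    }
}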
Requires strong background knowledge.
Does not provide enough details.
40. Tracking violations
Identify identical violations between revisions*. Detect whether a violation is fixed, or just removed.
[*] P. Avgustinov, A. I. Baars, A. S. Henriksen, G. Lavender, G. Menzel, O. de Moor, M. Schäfer, and J. Tibble, "Tracking Static Analysis Violations over Time to Capture Developer Characteristics," in Proceedings of the 37th International Conference on Software Engineering, 2015, pp. 437–447.
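A minimal sketch of the tracking idea, not the authors' actual implementation: two reports from consecutive revisions are considered the same violation when their type and a location-insensitive fingerprint of the flagged code match. The Violation record and the fingerprint scheme are illustrative assumptions.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative record of one static-analysis report.
record Violation(String type, String file, String snippet) {
    // Location-insensitive fingerprint: violation type plus the
    // whitespace-normalized code, so the violation survives moves
    // within a file across revisions.
    String fingerprint() {
        return type + "|" + file + "|" + snippet.replaceAll("\\s+", " ").trim();
    }
}

public class ViolationTracker {
    // Returns violations present in the old revision but absent from the
    // new one. Whether each one was *fixed* or merely *removed* (e.g.,
    // the enclosing method was deleted) still needs a diff-based check.
    static List<Violation> disappeared(List<Violation> oldRev, List<Violation> newRev) {
        Set<String> stillThere = new HashSet<>();
        for (Violation v : newRev) stillThere.add(v.fingerprint());
        List<Violation> gone = new ArrayList<>();
        for (Violation v : oldRev)
            if (!stillThere.contains(v.fingerprint())) gone.add(v);
        return gone;
    }
}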
41. Parsing changes (i.e., patches)
We used GumTree* to identify AST-level changes.
UPD ReturnStatement@@"return getModule().equals(((ModuleWrapper) obj).getModule());"
---INS InfixExpression@@"obj instanceof ModuleWrapper…" to ReturnStatement
------INS InstanceofExpression@@"obj instanceof ModuleWrapper" to InfixExpression
---------INS Variable@@"obj" to InstanceofExpression
---------INS Operator@@"instanceof" to InstanceofExpression
---------INS SimpleType@@"ModuleWrapper" to InstanceofExpression
------MOV MethodInvocation@@"getModule().equals(…)" to InfixExpression
[*] J.-R. Falleri, F. Morandat, X. Blanc, M. Martinez, and M. Monperrus, "Fine-grained and accurate source code differencing," in ACM/IEEE International Conference on Automated Software Engineering, Västerås, Sweden, September 15–19, 2014, pp. 313–324.
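For readers who want to reproduce such an edit script, the following sketch follows GumTree's documented Java API (GumTree 3.x); exact class names may differ across versions, so treat this as an assumption-laden outline rather than the authors' pipeline, and "Buggy.java"/"Fixed.java" are placeholder file names.

import com.github.gumtreediff.actions.ChawatheScriptGenerator;
import com.github.gumtreediff.actions.EditScript;
import com.github.gumtreediff.actions.model.Action;
import com.github.gumtreediff.client.Run;
import com.github.gumtreediff.gen.TreeGenerators;
import com.github.gumtreediff.matchers.MappingStore;
import com.github.gumtreediff.matchers.Matcher;
import com.github.gumtreediff.matchers.Matchers;
import com.github.gumtreediff.tree.Tree;

public class GumTreeDiffDemo {
    public static void main(String[] args) throws Exception {
        Run.initGenerators(); // register the bundled language parsers
        Tree src = TreeGenerators.getInstance().getTree("Buggy.java").getRoot();
        Tree dst = TreeGenerators.getInstance().getTree("Fixed.java").getRoot();

        // Match nodes between the two ASTs, then derive edit actions.
        Matcher matcher = Matchers.getInstance().getMatcher();
        MappingStore mappings = matcher.match(src, dst);

        // Chawathe's algorithm yields UPD/INS/MOV/DEL actions like those above.
        EditScript actions = new ChawatheScriptGenerator().computeActions(mappings);
        for (Action a : actions)
            System.out.println(a);
    }
}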
43. Tokenizing change information
UPD ReturnStatement@@"return getModule().equals(((ModuleWrapper) obj).getModule());"
---INS InfixExpression@@"obj instanceof ModuleWrapper…" to ReturnStatement
------INS InstanceofExpression@@"obj instanceof ModuleWrapper" to InfixExpression
---------INS Variable@@"obj" to InstanceofExpression
---------INS Operator@@"instanceof" to InstanceofExpression
---------INS SimpleType@@"ModuleWrapper" to InstanceofExpression
------MOV MethodInvocation@@"getModule().equals(…)" to InfixExpression
Extracted tokens: [UPD ReturnStatement, INS InfixExpression, INS InstanceofExpression, INS Variable, INS Operator, INS SimpleType, MOV MethodInvocation]
Token Embedding (Word2Vec):
<2, 6, 9, …> <8, 4, 1, …> <9, 0, 7, …> <2, 3, 0, …> … <7, 1, 2, …> …
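A toy tokenizer for the step above, under the assumption that each edit-script line starts with an action keyword and a node type separated from the raw code by "@@": it keeps only the "action + node type" pair, which is the token stream an off-the-shelf Word2Vec implementation would then map to k-dimensional vectors.

import java.util.ArrayList;
import java.util.List;

public class ChangeTokenizer {
    // Extracts "UPD ReturnStatement"-style tokens from GumTree-like
    // edit-script lines, dropping the indentation dashes, the raw code
    // after "@@", and the "to <parent>" suffix.
    static List<String> tokenize(List<String> editScript) {
        List<String> tokens = new ArrayList<>();
        for (String line : editScript) {
            String head = line.replaceFirst("^-*", "") // strip tree dashes
                              .split("@@", 2)[0]        // drop the code part
                              .trim();
            tokens.add(head);                           // e.g. "INS Variable"
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> script = List.of(
            "UPD ReturnStatement@@\"return getModule().equals(...);\"",
            "---INS InfixExpression@@\"obj instanceof ModuleWrapper...\" to ReturnStatement",
            "------INS InstanceofExpression@@\"obj instanceof ModuleWrapper\" to InfixExpression");
        System.out.println(tokenize(script));
        // [UPD ReturnStatement, INS InfixExpression, INS InstanceofExpression]
    }
}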
45. Embedding change information
[Figure: CNN architecture. The token sequence (UPD ReturnStatement, INS InfixExpression, INS InstanceofExpression, INS Variable, INS Operator, INS SimpleType, MOV MethodInvocation, INS Method, …), zero-padded to a fixed length, is embedded as an n × k two-dimensional numeric matrix and fed to the input layer; two convolutional layers (C1: 4 feature maps; C2: 6 feature maps), each followed by a subsampling layer (S1, S2), lead into fully connected layers, and the output of the dense layer is the extracted feature vector.]
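The CNN itself is only sketched in the figure. As a rough, library-free illustration of what one convolution-plus-pooling stage does to the n × k token-embedding matrix, consider the following; the filter size and all values are arbitrary assumptions, not the trained model.

public class ConvPoolDemo {
    // One feature map: slide a window of h consecutive token embeddings
    // (each of width k) over the n x k input, take a dot product with the
    // filter, apply ReLU, then max-pool the feature vector to one scalar.
    static double convolveAndPool(double[][] input, double[][] filter) {
        int n = input.length, h = filter.length, k = input[0].length;
        double max = Double.NEGATIVE_INFINITY;
        for (int start = 0; start + h <= n; start++) {
            double sum = 0;
            for (int i = 0; i < h; i++)
                for (int j = 0; j < k; j++)
                    sum += input[start + i][j] * filter[i][j];
            max = Math.max(max, Math.max(0, sum)); // ReLU then max-pool
        }
        return max;
    }

    public static void main(String[] args) {
        // Four embedded tokens (n = 4, k = 3), echoing the vectors above.
        double[][] embedded = { {2, 6, 9}, {8, 4, 1}, {9, 0, 7}, {2, 3, 0} };
        double[][] filter = { {0.1, -0.2, 0.3}, {0.0, 0.2, -0.1} }; // h = 2
        System.out.println(convolveAndPool(embedded, filter)); // 3.6
    }
}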
47. Clustering Patches and Identifying Fix Patterns
Patches are clustered with X-means* over the extracted features, and each cluster yields a fix pattern.
Violation Type: BC_EQUALS_METHOD_SHOULD_WORK_FOR_ALL_OBJECTS
Patch Example:
- return exp1().equals(((T)obj).exp2());
+ return obj instanceof T && exp1().equals(((T)obj).exp2());
Fix Pattern:
UPD ReturnStatement
---INS InfixExpression
------MOV MethodInvocation
------INS InstanceofExpression
---------INS Variable
---------INS Instanceof
---------INS SimpleType
------INS Operator
[*] D. Pelleg, A. W. Moore et al., "X-means: Extending k-means with efficient estimation of the number of clusters," in ICML, vol. 1, 2000, pp. 727–734.
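The cited X-means extends k-means by also estimating the number of clusters. As a compact stand-in, here is a plain k-means sketch over extracted feature vectors (fixed k, toy data, no library); it is not the authors' implementation.

import java.util.Arrays;
import java.util.Random;

public class KMeansSketch {
    // Plain k-means; X-means would additionally search for the best k
    // via a BIC-style score instead of taking k as a parameter.
    static int[] cluster(double[][] points, int k, int iters, long seed) {
        Random rnd = new Random(seed);
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++)
            centers[c] = points[rnd.nextInt(points.length)].clone();
        int[] assign = new int[points.length];
        for (int it = 0; it < iters; it++) {
            for (int p = 0; p < points.length; p++) {   // assignment step
                double best = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = dist2(points[p], centers[c]);
                    if (d < best) { best = d; assign[p] = c; }
                }
            }
            for (int c = 0; c < k; c++) {               // update step
                double[] mean = new double[points[0].length];
                int count = 0;
                for (int p = 0; p < points.length; p++)
                    if (assign[p] == c) {
                        for (int j = 0; j < mean.length; j++) mean[j] += points[p][j];
                        count++;
                    }
                if (count > 0) {
                    for (int j = 0; j < mean.length; j++) mean[j] /= count;
                    centers[c] = mean;
                }
            }
        }
        return assign;
    }

    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    public static void main(String[] args) {
        double[][] features = { {0, 0}, {0.1, 0.2}, {5, 5}, {5.2, 4.9} };
        System.out.println(Arrays.toString(cluster(features, 2, 10, 42)));
        // e.g. [0, 0, 1, 1]: each cluster of similar patches yields one fix pattern
    }
}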
49. Subjects
Collected from GitHub.com, with at least one violation-fixing commit per project. After tracking these violations, 16,918,530 distinct violations were identified.
TABLE 1: Subjects used in this study.
# Projects                      730
# Commits                   291,615
# Violations (detected) 250,387,734
# Distinct violations    16,918,530
# Violation types               400
50. Fix Patterns Identified
We have identified 174 fix patterns for 111 violation types.
Example
Violation Type: BC_EQUALS_METHOD_SHOULD_WORK_FOR_ALL_OBJECTS
Patch Example:
- return exp1().equals(((T)obj).exp2());
+ return obj instanceof T && exp1().equals(((T)obj).exp2());
Fix Pattern:
UPD ReturnStatement
---INS InfixExpression
------MOV MethodInvocation
------INS InstanceofExpression
---------INS Variable
---------INS Instanceof
---------INS SimpleType
------INS Operator
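To make the pattern above concrete, here is a deliberately naive, regex-based sketch of applying it to an unfixed violation. A real tool would operate on the AST as the preceding slides describe; the regex, class name, and example line are all illustrative assumptions.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixPatternDemo {
    // Matches "return EXPR1.equals(((T) obj).EXPR2);" and inserts the
    // "obj instanceof T &&" guard, mirroring the mined fix pattern.
    static final Pattern VIOLATION = Pattern.compile(
        "return\\s+(.+)\\.equals\\(\\(\\((\\w+)\\)\\s*(\\w+)\\)(.+)\\);");

    static String apply(String line) {
        Matcher m = VIOLATION.matcher(line.trim());
        if (!m.matches()) return line; // pattern not applicable
        String expr1 = m.group(1), type = m.group(2), arg = m.group(3), rest = m.group(4);
        return "return " + arg + " instanceof " + type + " && "
             + expr1 + ".equals(((" + type + ") " + arg + ")" + rest + ");";
    }

    public static void main(String[] args) {
        System.out.println(apply(
            "return getModule().equals(((ModuleWrapper) obj).getModule());"));
        // return obj instanceof ModuleWrapper && getModule().equals(((ModuleWrapper) obj).getModule());
    }
}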
52. Comparison (Defects4J)
           Chart  Closure  Lang  Math  Mockito  Time  Total
AVATAR*        5        8     5     6        2     1     27
CapGen         4        0     5    12        0     0     21
Nopol          1        0     3     1        0     0      5
ACS            2        0     3    12        0     1     18
SimFix         4        6     9    14        0     1     34
Note that our fix patterns are extracted only from violation-fixing patches.
[*] K. Liu, A. Koyuncu, D. Kim, and T. F. Bissyandé, "AVATAR: Fixing Semantic Bugs with Fix Patterns of Static Analysis Violations," in 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019, pp. 1–12.
57. Summary
Live Study (# Pull Requests per subject)
Subject        Submitted  Merged  Improved  Rejected  Ignored
json-simple            2       0         0         2        0
commons-io             2       0         0         2        0
commons-lang           7       0         1         1        5
commons-math           6       0         0         6        0
ant                   16       9         1         4        2
cassandra              9       0         0         0        9
mahout                 3       0         0         0        3
aries                  5       0         0         0        5
poi                   44      44         0         0        0
camel                 22      14         0         0        8
Total                116      67         2        15       32