The document summarizes a multidimensional empirical study of refactoring activity in three open source projects. It aims to understand developers' refactoring practices by analyzing version control histories. The study seeks to answer five research questions concerning the types of refactorings applied to test versus production code, individual contributions to refactoring, alignment with release dates, the relationship between test and production refactoring, and the motivations for applied refactorings. The methodology examines commit histories, detects refactorings, identifies test code, analyzes activity trends, and classifies sampled refactorings. Key findings include different refactoring focuses for test versus production code, alignment of refactoring with testing activity, and increased refactoring before releases.
Integration Group - Lithium test strategy (OpenDaylight)
This document outlines an integration and test strategy for OpenDaylight projects. It proposes testing individual features in isolation and together to check for interference. Tests would run automatically through continuous integration on code changes or merges between projects. Features would be tested by installing them individually and with a set of compatible features to test integration. Metrics like code coverage, bugs, and performance would help evaluate quality.
Who Should Review My Code? A file-location based code-reviewer recommendation approach for modern code review.
This research study was presented at the 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2015).
Find more information and preprint at patanamon.com
Code review is one of the crucial software activities in which developers and stakeholders collaborate to assess software changes. Since code review processes act as a final gate for new software changes to be integrated into the software product, intense collaboration is necessary in order to prevent defects and produce high-quality software products. Recently, code review analytics has been implemented in projects (for example, StackAnalytics of the OpenStack project) to monitor the collaboration activities between developers and stakeholders in the code review process. Yet, due to the large volume of software data, code review analytics can only report a static summary (e.g., counting), while neither insights nor instant suggestions are provided. Hence, to gain valuable insights from software data and help software projects make better decisions, we conduct an empirical investigation using statistical approaches. In particular, we use the large-scale data of 196,712 reviews spread across the Android, Qt, and OpenStack open source projects to train a prediction model that uncovers the relationship between the characteristics of software changes and the likelihood of poor code review collaboration. We extract 20 patch characteristics grouped along five dimensions: software change properties, review participation history, past involvement of a code author, past involvement of reviewers, and review environment. To validate our findings, we use the bootstrap technique, which repeats the experiment 1,000 times. Due to the large volume of studied data and the intensive computation of characteristic extraction and finding validation, the use of High-Performance Computing (HPC) resources is mandatory to expedite the analysis and generate insights in a timely manner. Through our case study, we find that the amount of review participation in the past and the description length of software changes are significant indicators that new software changes will suffer from poor code review collaboration [2017]. Moreover, we find that the purpose of introducing new features can increase the likelihood that new software changes will receive late collaboration from reviewers. Our findings highlight the need for software change submission policies that monitor these characteristics in order to help software projects improve the quality of their code review processes. Moreover, based on our findings, future work should develop real-time code review analytics implemented on HPC resources in order to instantly provide insights and suggestions to software projects.
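The bootstrap validation mentioned above can be illustrated with a minimal sketch; the toy data and the mean statistic below are assumptions for illustration, not the authors' actual pipeline.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch: bootstrap validation repeats an experiment on resampled data
 *  to gauge the stability of a statistic. */
public class BootstrapSketch {
    public static void main(String[] args) {
        double[] reviewDelays = {1.0, 3.5, 2.0, 8.0, 0.5, 4.2}; // hypothetical data
        int repetitions = 1000;            // 1,000 repetitions, as in the study
        double[] means = new double[repetitions];
        Random rnd = new Random(42);
        for (int b = 0; b < repetitions; b++) {
            double sum = 0;
            for (int i = 0; i < reviewDelays.length; i++) {
                // resample with replacement, same size as the original sample
                sum += reviewDelays[rnd.nextInt(reviewDelays.length)];
            }
            means[b] = sum / reviewDelays.length;
        }
        Arrays.sort(means);
        // approximate 95% percentile interval of the bootstrapped statistic
        System.out.printf("mean in [%.2f, %.2f]%n", means[25], means[974]);
    }
}
```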
A Survey on Automatic Software Evolution Techniques (Sung Kim)
The document discusses automatic software evolution techniques. It describes approaches like refactoring, automatic patch generation, runtime recovery, and performance improvement. The most popular current approach is "generate and validate," which evolves a program and validates the resulting variants. Challenges include search space explosion, the limited range of variants that current techniques can generate, and developing effective search methods to find valid patches. The document proposes mining existing software changes to learn common templates that guide the search for valid patches.
Open Problems in Automatically Refactoring Legacy Java Software to use New Fe... (Raffi Khatchadourian)
Java 8 is one of the largest upgrades to the popular language and framework in over a decade. In this talk, I will first overview several new, key features of Java 8 that can help make programs easier to read, write, and maintain, especially with regard to collections. These features include Lambda Expressions, the Stream API, and enhanced interfaces, many of which help bridge the gap between the functional and imperative programming paradigms and allow for succinct concurrency implementations. Next, I will discuss several open issues related to automatically migrating (refactoring) legacy Java software to use such features correctly, efficiently, and as completely as possible. Solving these problems will help developers maximally understand and adopt these new features, thus improving their software.
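As a brief illustration of the features the talk covers, a minimal Java 8 sketch (the example data is made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Java8Sketch {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("ada", "grace", "alan");

        // Lambda expressions and the Stream API replace an imperative loop
        // with a declarative pipeline over the collection.
        List<String> upper = names.stream()
                .filter(n -> n.startsWith("a"))    // lambda as behavior parameter
                .map(String::toUpperCase)          // method reference
                .collect(Collectors.toList());
        System.out.println(upper);                 // [ADA, ALAN]

        // parallelStream() gives a succinct concurrency implementation.
        long longNames = names.parallelStream().filter(n -> n.length() > 3).count();
        System.out.println(longNames);             // 2
    }
}
```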
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys... (Raffi Khatchadourian)
Machine Learning (ML) systems, including Deep Learning (DL) systems, i.e., those with ML capabilities, are pervasive in today’s data-driven society. Such systems are complex; they comprise ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when they are long-lived, but they also exhibit debt specific to these systems. Unfortunately, there is a gap in knowledge of how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source software, and the technical debt issues they alleviate. We analyzed 26 projects, consisting of 4.2 MLOC, along with 327 manually examined code patches. The results indicate that developers refactor these systems for a variety of reasons, both specific and tangential to ML; that some refactorings correspond to established technical debt categories while others do not; and that code duplication is a major cross-cutting theme that particularly involved ML configuration and model code, which was also the most refactored. We also introduce 14 new ML-specific refactorings and 7 new technical debt categories, and put forth several recommendations, best practices, and anti-patterns. The results can potentially assist practitioners, tool developers, and educators in facilitating long-term ML system usefulness.
Methodology and Campaign Design for the Evaluation of Semantic Search Tools (Stuart Wrigley)
The main problem with the state of the art in the semantic search domain is the lack of comprehensive evaluations. There exist only a few efforts to evaluate semantic search tools and to compare the results with other evaluations of their kind.
In this paper, we present a systematic approach for testing and benchmarking semantic search tools that was developed within the SEALS project. Unlike other semantic web evaluations, our methodology tests search tools both automatically and interactively with a human user in the loop. This allows us to test not only functional performance measures, such as precision and recall, but also usability issues, such as ease of use and comprehensibility of the query language.
The paper describes the evaluation goals and assumptions; the criteria and metrics; the types of experiments we will conduct; and the datasets required to conduct the evaluation in the context of the SEALS initiative. To our knowledge, it is the first effort to present a comprehensive evaluation methodology for Semantic Web search tools.
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning (Sung Kim)
This document describes DeepAM, an approach for migrating APIs between programming languages using multi-modal sequence-to-sequence learning. DeepAM collects a parallel corpus of API sequences and natural language descriptions from large codebases. It learns semantic representations of API sequences using a deep neural network and aligns equivalent sequences between languages. DeepAM is evaluated on migrating Java APIs to C# and achieves higher accuracy than existing techniques in mining common API mappings from aligned sequences.
ESSIR LivingKnowledge DiversityEngine tutorial (Jonathon Hare)
The document summarizes a symposium on bias and diversity in information retrieval testbeds. It introduces the Diversity Engine, which provides collections, annotation tools, and an evaluation framework to allow for collaborative and comparable research on indexing and searching documents annotated with various metadata like entities, bias, trust, and multimedia features. It describes the architecture, design decisions, supported document collections and formats, analysis modules for text and images, indexing and search functionality using Solr, application development, and evaluation framework. An example application is demonstrated by indexing a sample collection and making it searchable.
Review Participation in Modern Code Review: An Empirical Study of the Android... (The University of Adelaide)
This work empirically investigates the factors that influence review participation in the MCR process. Through a case study of the Android, Qt, and OpenStack open source projects, we find that the amount of review participation in the past is a significant indicator of patches that will suffer from poor review participation. Moreover, the description length of a patch and the purpose of introducing new features also share a relationship with the likelihood of receiving poor review participation.
The full article of this work is published in the Empirical Software Engineering journal and is available online at http://dx.doi.org/10.1007/s10664-016-9452-6
Automated Evolution of Feature Logging Statement Levels Using Git Histories a... (Raffi Khatchadourian)
This document describes an approach to automatically evolve the logging statement levels of features in software based on degree of interest over time. It extracts feature logging statements and their levels from code, manipulates a degree of interest model based on code changes from Git histories, and identifies mismatches between current levels and predicted levels to suggest level changes. The goal is to reduce information overload by bringing more relevant feature logs to the foreground and less relevant ones to the background based on recent development activity. Security implications regarding side-channel attacks are also mentioned.
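A minimal sketch of the degree-of-interest idea described above; the bump size, decay factor, and level thresholds are assumptions for illustration, not the tool's actual values:

```java
import java.util.logging.Level;

/** Sketch: interest in a feature rises when its code changes and decays as
 *  history moves on; the score is then mapped to a suggested logging level. */
class DegreeOfInterest {
    private double doi = 0.0;

    void onFeatureCodeChanged() { doi += 1.0; }  // bump on a relevant commit (assumed weight)
    void onOtherCommit()        { doi *= 0.9; }  // decay per unrelated commit (assumed factor)

    Level suggestedLevel() {
        if (doi > 2.0) return Level.INFO;   // hot feature: bring its logs to the foreground
        if (doi > 0.5) return Level.FINE;
        return Level.FINER;                 // cold feature: push its logs to the background
    }
}
```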
Software Defect Prediction on Unlabeled Datasets (Sung Kim)
The document describes techniques for software defect prediction when labeled training data is unavailable. It proposes Transfer Defect Learning (TCA+) to improve cross-project defect prediction by normalizing data distributions between source and target projects. For projects with heterogeneous metrics, it introduces Heterogeneous Defect Prediction (HDP) which matches similar metrics between source and target to build cross-project prediction models. It also discusses CLAMI for defect prediction using only unlabeled data without human effort. The techniques are evaluated on open source projects to demonstrate their effectiveness compared to traditional cross-project and within-project prediction.
Known XML Vulnerabilities Are Still a Threat to Popular Parsers! & Open Sour... (Lionel Briand)
The document summarizes research into vulnerabilities in XML parsers related to billion laughs (BIL) denial of service attacks and XML external entity (XXE) attacks. The research assessed 13 XML parsers and 8 open source systems that use XML parsers. It found that over half of the parsers tested were vulnerable to BIL and XXE attacks. It also found that all of the open source systems tested were vulnerable because they used vulnerable parsers without applying any mitigation techniques. The research concludes that BIL and XXE attacks remain successful against many modern XML parsers and systems that use them. It recommends that software developers configure parsers securely and limit vulnerable features, and that parser developers improve security in their default configurations.
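For the recommendation to configure parsers securely, a commonly cited hardening sketch for Java's DOM parser, following OWASP guidance (feature support varies by parser implementation):

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class SecureXmlParser {
    public static DocumentBuilder create() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Disallowing DTDs entirely blocks both billion-laughs (BIL) expansion
        // and XML external entity (XXE) resolution.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // If DTDs must stay enabled, at least disable external entities.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf.newDocumentBuilder();
    }
}
```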
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U... (Raffi Khatchadourian)
Actor concurrency is becoming increasingly important in the development of real-world software systems. Although actor concurrency may be less susceptible to some multithreaded concurrency bugs, such as low-level data races and deadlocks, it comes with its own bugs that may be different. However, the fundamental characteristics of actor concurrency bugs, including their symptoms, root causes, API usages, examples, and differences when they come from different sources are still largely unknown. Actor software development can significantly benefit from a comprehensive qualitative and quantitative understanding of these characteristics, which is the focus of this work, to foster better API documentation, development practices, testing, debugging, repairing, and verification frameworks. To conduct this study, we take the following major steps. First, we construct a set of 186 real-world Akka actor bugs from Stack Overflow and GitHub via manual analysis of 3,924 Stack Overflow questions, answers, and comments and 3,315 GitHub commits, messages, original and modified code snippets, issues, and pull requests. Second, we manually study these actor bugs and their fixes to understand and classify their symptoms, root causes, and API usages. Third, we study the differences between the commonalities and distributions of symptoms, root causes, and API usages of our Stack Overflow and GitHub actor bugs. Fourth, we discuss real-world examples of our actor bugs with these symptoms and root causes. Finally, we investigate the relation of our findings with those of previous work and discuss their implications. A few findings of our study are: (1) symptoms of our actor bugs can be classified into five categories, with Error as the most common symptom and Incorrect Exceptions as the least common, (2) root causes of our actor bugs can be classified into ten categories, with Logic as the most common root cause and Untyped Communication as the least common, (3) a small number of Akka API packages are responsible for most of API usages by our actor bugs, and (4) our Stack Overflow and GitHub actor bugs can differ significantly in commonalities and distributions of their symptoms, root causes, and API usages. While some of our findings agree with those of previous work, others sharply contrast.
This document provides a summary of automation testing topics covered on day 7, including parameters, environment variables, data tables, and examples of how to use them in tests. Parameters discussed include test, action, random number, and data table parameters. Environment variables can be built-in, user-defined, or stored in an external XML file. Data tables allow running tests with different data multiple times and have global and local sheets. Methods are provided to work with data tables, parameters, and sheets.
This document describes the Hermes tool for creating representative benchmark corpora for evaluating code analyses. Hermes uses feature queries to identify the relevant technical features present in projects, and then selects a minimal subset of projects that maximally covers these features. This optimized corpus is more manageable for development and testing while still being comprehensive. The document provides examples of feature queries and how Hermes was used to reduce the corpus size for one evaluation from 100 to 5 projects, speeding up the evaluation significantly with only a small loss of coverage.
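The project-selection step can be read as a set-cover problem; a greedy sketch of that idea (not the Hermes implementation itself):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CorpusSelector {
    /** Greedily pick projects until every feature seen in the corpus is covered. */
    static List<String> select(Map<String, Set<String>> featuresByProject) {
        Set<String> uncovered = new HashSet<>();
        featuresByProject.values().forEach(uncovered::addAll);
        List<String> chosen = new ArrayList<>();
        while (!uncovered.isEmpty()) {
            String best = null;
            int bestGain = 0;
            for (Map.Entry<String, Set<String>> e : featuresByProject.entrySet()) {
                Set<String> gain = new HashSet<>(e.getValue());
                gain.retainAll(uncovered);           // features this project would add
                if (gain.size() > bestGain) { bestGain = gain.size(); best = e.getKey(); }
            }
            if (best == null) break;                 // no project adds anything new
            chosen.add(best);
            uncovered.removeAll(featuresByProject.get(best));
        }
        return chosen;
    }
}
```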
The document summarizes a dissertation defense about adaptive bug prediction by analyzing project history. It discusses the motivation for leveraging project history and software configuration management data for bug prediction. It also describes creating a corpus by identifying bug-fix changes and bug-introducing changes from commits. The dissertation proposes using a "bug cache" to predict likely locations of future bugs based on past bug occurrences.
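A minimal sketch of the bug-cache idea, using an LRU map so that entities recently involved in bug fixes are predicted bug-prone (capacity and keys are illustrative assumptions):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Files touched by recent bug fixes stay in the cache; membership is the prediction. */
class BugCache extends LinkedHashMap<String, Long> {
    private final int capacity;

    BugCache(int capacity) {
        super(16, 0.75f, true);   // access-order: recently hit entries survive longest
        this.capacity = capacity;
    }

    void onBugFix(String file, long commitTime) { put(file, commitTime); }

    boolean predictedBugProne(String file) { return containsKey(file); }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
        return size() > capacity; // evict the least recently faulty entity
    }
}
```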
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class... (Feng Zhang)
Defect prediction on projects with limited historical data has attracted great interest from both researchers and practitioners. Cross-project defect prediction has been the main area of progress by reusing classifiers from other projects. However, existing approaches require some degree of homogeneity (e.g., a similar distribution of metric values) between the training projects and the target project. Satisfying the homogeneity requirement often requires significant effort (currently a very active area of research).
An unsupervised classifier does not require any training data, therefore the heterogeneity challenge is no longer an issue. In this paper, we examine two types of unsupervised classifiers: a) distance-based classifiers (e.g., k-means); and b) connectivity-based classifiers. While distance-based unsupervised classifiers have been previously used in the defect prediction literature with disappointing performance, connectivity-based classifiers have never been explored before in our community.
We compare the performance of unsupervised classifiers versus supervised classifiers using data from 26 projects from three publicly available datasets (i.e., AEEEM, NASA, and PROMISE). In the cross-project setting, our proposed connectivity-based classifier (via spectral clustering) ranks as one of the top classifiers among five widely-used supervised classifiers (i.e., random forest, naive Bayes, logistic regression, decision tree, and logistic model tree) and five unsupervised classifiers (i.e., k-means, partition around medoids, fuzzy C-means, neural-gas, and spectral clustering). In the within-project setting (i.e., models are built and applied on the same project), our spectral classifier ranks in the second tier, while only random forest ranks in the first tier. Hence, connectivity-based unsupervised classifiers offer a viable solution for cross-project and within-project defect prediction.
This document describes DeepAPI, a deep learning model that uses an RNN encoder-decoder architecture to generate API usage sequences from natural language queries. It trains on a large corpus of code snippets paired with their documentation comments. DeepAPI is shown to outperform traditional IR approaches by better understanding query semantics and word order. Automatic and human evaluations demonstrate its ability to generate accurate and relevant API sequences for a variety of queries. Parameters like hidden units and word dimensions are analyzed. Enhancements like weighting APIs by importance further improve performance. DeepAPI has applications beyond code search like synthesis of sample code from query understanding.
Jpylyzer, a validation and feature extraction tool developed in the SCAPE project (SCAPE Project)
Jpylyzer is a tool for validation and feature extraction for the JP2 (JPEG 2000 Part 1) still image format. The tool is being developed in the SCAPE Project and was presented by Johan van der Knijff at Archiving 2012 in Copenhagen.
Runtime Behavior of JavaScript Programs (IRJET Journal)
The document analyzes the dynamic behavior of JavaScript programs by collecting execution traces from 103 websites and benchmark suites. It finds that JavaScript programs exhibit more dynamism than commonly assumed, with only 81% of call sites being monomorphic and many functions being variadic. Constructor functions also frequently return objects with different property sets. The study aims to provide a more accurate characterization of JavaScript behavior to inform future research.
This is a presentation given at https://toracle2021.github.io
Security testing verifies that the data and the resources of software systems are protected from attackers. Unfortunately, it suffers from the oracle problem, which refers to the challenge, given an input for a system, of distinguishing correct from incorrect behavior. In many situations where potential vulnerabilities are tested, a test oracle may not exist, or it might be impractical due to the many inputs for which specific oracles have to be defined.
In this paper, we propose a metamorphic testing approach that alleviates the oracle problem in security testing. It enables engineers to specify metamorphic relations (MRs) that capture security properties of the system. Such MRs are then used to automate testing and detect vulnerabilities.
We provide a catalog of 22 system-agnostic MRs to automate security testing in Web systems. Our approach targets 39% of the OWASP security testing activities not automated by state-of-the-art techniques. It automatically detected 10 out of 12 vulnerabilities affecting two widely used systems, one commercial and the other open source (Jenkins).
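A toy sketch of one metamorphic relation in the spirit of the paper (not taken from its catalog; the URL and session cookie are hypothetical): a protected page fetched without credentials must not equal the same page fetched with a valid session, otherwise an access-control check is likely missing.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MetamorphicAccessControl {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        URI page = URI.create("https://example.test/admin/config"); // hypothetical target

        HttpResponse<String> anonymous = client.send(
                HttpRequest.newBuilder(page).build(),
                HttpResponse.BodyHandlers.ofString());
        HttpResponse<String> authenticated = client.send(
                HttpRequest.newBuilder(page)
                        .header("Cookie", "JSESSIONID=hypothetical-session")
                        .build(),
                HttpResponse.BodyHandlers.ofString());

        // MR: the two outputs must differ for a protected resource.
        if (anonymous.body().equals(authenticated.body())) {
            System.out.println("Potential access-control vulnerability detected");
        }
    }
}
```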
Data collection for software defect prediction (AmmAr mobark)
Data collection is one of the important stages that software companies need. It takes place after the program has been produced and published, in order to learn the users' reactions and impressions of the program and to work on developing and improving it.
The Contents
* BACKGROUND AND RELATED WORK
* EXPERIMENTAL PLANNING
- Research Goal
- Research Questions
- Experimental Subjects
- Experimental Material
- Tasks and Methods
- Experimental Design
AmmAr Abdualkareem sahib mobark
This document discusses automation testing using QuickTest Professional (QTP). It covers topics like actions, function libraries, and debugging tests. Actions help divide tests into logical units and can be reusable or non-reusable. Function libraries allow storing reusable functions in different file formats. Debugging tools in QTP include the debug viewer, breakpoints, and step commands. The document provides examples of calling existing actions, using action parameters, and differences between actions and functions. Overall it serves as training material for using actions, functions and debugging in QTP test automation.
The Use of Development History in Software Refactoring Using a Multi-Objectiv... (Ali Ouni)
The document presents a multi-objective approach to automate software refactoring using evolutionary algorithms. It formulates refactoring as a multi-objective optimization problem to improve code quality, preserve semantics, and maximize reuse of past development history. An evaluation on two open source projects shows the approach corrects most defects while maintaining high refactoring precision compared to existing techniques. Future work includes leveraging refactoring histories from multiple systems and improving context-based similarity measures.
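A minimal sketch of the multi-objective formulation (Pareto dominance over three objectives); the objective names follow the summary above, while the scoring and numbers are assumptions:

```java
public class ParetoSketch {
    /** One candidate refactoring sequence scored on the three objectives. */
    record Solution(double quality, double semantics, double historyReuse) {
        /** this dominates other iff it is no worse everywhere and better somewhere. */
        boolean dominates(Solution o) {
            boolean noWorse = quality >= o.quality && semantics >= o.semantics
                    && historyReuse >= o.historyReuse;
            boolean better = quality > o.quality || semantics > o.semantics
                    || historyReuse > o.historyReuse;
            return noWorse && better;
        }
    }

    public static void main(String[] args) {
        Solution a = new Solution(0.8, 0.9, 0.4);
        Solution b = new Solution(0.7, 0.9, 0.4);
        System.out.println(a.dominates(b)); // true: a stays on the Pareto front
        System.out.println(b.dominates(a)); // false
    }
}
```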
ICSME 2016: Search-Based Peer Reviewers Recommendation in Modern Code Review (Ali Ouni)
This document presents RevRec, a search-based approach for recommending peer reviewers for code changes. RevRec uses a genetic algorithm to combine models of reviewer expertise and collaboration. It was evaluated on three open source projects and achieved an average of 55% precision and 70% recall in recommending reviewers. The results indicate that considering both expertise and social aspects improves accuracy over expertise alone. A population-based search algorithm such as a genetic algorithm was found to be better suited than local search methods for this recommendation problem. Future work could integrate additional data sources to improve the expertise model and the accuracy of recommendations.
This document discusses using R for cheminformatics and quantitative structure-activity relationship (QSAR) modeling. It provides an overview of R and describes tools like the Chemistry Development Kit (CDK) and rcdk package that allow working with chemical data in R. The document discusses reading and working with molecules and chemical data, calculating molecular descriptors, generating fingerprints, accessing public databases, and using the results for QSAR modeling and machine learning in R.
This document summarizes a study on refactoring activity that analyzed refactoring operations, developers responsible, timing relative to releases, relationships between production and test code refactoring, and purposes of refactorings. The study introduced the first refactoring detection tool to operate on Git commits and catalogued 44 motivations for refactoring types. It found that refactoring detection tools need to handle partial code and inspired future work improving precision, recall, oracles, and supported refactorings. The tools have been widely influential in empirical studies of refactoring.
Java 9 is expected to include several new features and changes (a brief sketch follows this list), including:
- New collection factory methods like Set.of() and Map.of() that provide immutable collections.
- Enhancements to the Stream API such as takeWhile() and dropWhile().
- Syntax changes like allowing effectively final variables in try-with-resources and @SafeVarargs for private methods.
- The addition of JShell to provide a Java REPL.
- Garbage First (G1) garbage collector becoming the default collector.
- Various performance and logging improvements.
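A minimal Java 9 sketch of the collection-factory, stream, and try-with-resources items listed above (the example values are made up):

```java
import java.io.StringReader;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Java9Sketch {
    public static void main(String[] args) throws Exception {
        // Immutable collection factory methods.
        Set<String> langs = Set.of("Java", "Kotlin", "Scala");
        Map<String, Integer> releaseYears = Map.of("Java 8", 2014, "Java 9", 2017);

        // Stream API additions: takeWhile keeps the matching prefix,
        // dropWhile discards it.
        List<Integer> prefix = Stream.of(1, 2, 3, 10, 4)
                .takeWhile(n -> n < 5)               // 1, 2, 3
                .collect(Collectors.toList());
        List<Integer> rest = Stream.of(1, 2, 3, 10, 4)
                .dropWhile(n -> n < 5)               // 10, 4
                .collect(Collectors.toList());

        // try-with-resources now accepts an effectively final variable.
        StringReader reader = new StringReader("hello");
        try (reader) {
            System.out.println((char) reader.read());
        }

        System.out.println(langs + " " + releaseYears + " " + prefix + " " + rest);
    }
}
```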
Inria Tech Talk: How to improve the quality of your software with STAMP (Stéphanie Roger)
Whether you are a developer or an entrepreneur, discover the STAMP project led by Inria, the French national research institute dedicated to digital science and technology.
Microservices Part 4: Functional Reactive Programming (Araf Karsh Hamid)
ReactiveX combines the best ideas from the Observer pattern, the Iterator pattern, and functional programming. It enables asynchronous and event-based programming by using the Observer pattern to push data to observers, rather than using a synchronous pull-based approach.
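A minimal sketch using RxJava 3, one ReactiveX implementation for Java (assumes the io.reactivex.rxjava3:rxjava artifact is on the classpath):

```java
import io.reactivex.rxjava3.core.Observable;

public class RxSketch {
    public static void main(String[] args) {
        // The Observable pushes items to its observers (Observer pattern),
        // composed with functional operators instead of a pull loop.
        Observable.just("rx", "java", "streams")
                .map(String::toUpperCase)
                .filter(s -> s.length() > 2)
                .subscribe(
                        item -> System.out.println("next: " + item),  // onNext
                        Throwable::printStackTrace,                   // onError
                        () -> System.out.println("done"));            // onComplete
    }
}
```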
Acutesoft is one of the best online training centers in Hyderabad, providing high-quality online training on Informatica.
Get $100 off INFORMATICA Online Training at AcuteSoft Solutions. For more information, please contact us.
This document outlines the course content for an Informatica training course. The course covers data warehousing concepts, installing and using Informatica PowerCenter, transformations, mappings, workflows, monitoring, debugging, error handling, performance tuning, and best practices. The course includes over 30 hands-on labs covering topics like basic and advanced transformations, mappings, workflows, debugging, heterogeneous targets, reusable objects, workflow tasks, and a case study project.
When we are working on tests exercising large parts of our software system (e.g. in an acceptance test suite), we often have to set up a considerable amount of data to set the stage for the scenario under test. This might include several calls to cumbersome APIs. At first, such code can be hard to get right. Even when it is working properly, the intention of the setup is often hidden in a convoluted mess of code. Therefore, such code can pose a major hurdle for the evolution of the project. Although intensive refactoring can provide benefits, some situations demand even better readability. At this point, techniques, patterns, and tools like Specflow can provide advantages.
In this talk we discuss typical problems faced with the setup of test data and means to address those. We illustrate three cases where non-trivial setup was needed. After understanding the challenges faced we will present and discuss the final solutions. All topics are supported by code examples from a 10+ year project that has faced all of those issues.
This document describes ExSchema, a tool that discovers and maintains schemas from polyglot persistence applications. ExSchema analyzes the source code of applications using multiple data stores like document databases, graph databases, and relational databases. It identifies entities, attributes, relationships, and updates by examining declarations, annotations, project structure, and update operations. ExSchema represents the extracted schemas in a uniform metamodel and can output documentation and code artifacts to help developers understand and evolve the application's data design.
Resilience Engineering: A field of study, a community, and some perspective s...John Allspaw
These are slides from my talk on March 28, 2018 at the LA SCALE tech Meetup, graciously hosted at TicketMaster's office. (https://www.meetup.com/scalela/events/248904126/)
Ensuring Software Quality Through Test Automation - Naperville Software Develo... (LinkCompanyAdmin)
Presenter Adrian Theodorescu will focus on what it means to write a functional test automation framework, including but not limited to design goals and important features, as well as how to integrate it with source control and a continuous integration tool. The presenter would like to clarify that the views expressed in the talk are his own and not those of his employer.
Graph-Based Source Code Analysis of JavaScript Repositories (Dániel Stein)
A graph-based approach to analyzing JavaScript source code, using Neo4j as the graph database backend and ShapeSecurity Shift as the parser.
Hungarian version (presented at a Neo4j meetup): http://www.slideshare.net/steindani/forrskdtrak-grfalap-statikus-analzise
The document discusses ideas for new integrated development environments (IDEs) that actively include software change history and evolution analysis techniques. It describes harvesting data from change histories, release histories, dependencies and other sources to provide visualizations of developer networks, code evolution metrics, and other insights. The document argues that modern IDEs could integrate these kinds of historical analyses and visualizations to help developers, project managers, and newcomers better understand software structure and changes over time.
Geant4 is the main simulation framework used for high energy physics. It is used by the majority of the LHC experiments at CERN for detector simulation. Physics validation is a form of regression testing that ensures that the produced simulation data agree with previous versions of the framework and follow patterns similar to real data acquired from the experiments.
I developed a new system that orchestrates the process of running Geant4 applications/tests on the LHC world-wide computing grid (WLCG), gathers the results, stores them in a relational database for indexing and in object storage, and provides them to the user through an intuitive web user interface that can run small analysis tasks on demand.
Similar to A Multidimensional Empirical Study on Refactoring Activity (20)
Refactoring Mining - The key to unlock software evolution - Nikolaos Tsantalis
Keynote at the International Workshop on Refactoring
co-located with the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE '21)
November 14, 2021
This document describes research on identifying extract method refactoring opportunities through program analysis and tool support. It discusses limitations of early work, such as not supporting extraction of objects or composite variables. It presents techniques to address these limitations, including handling behavior preservation issues and ensuring the complete computation of extracted variables. Evaluation results of tools are provided, showing improvements in precision and recall over time. The document also discusses how the work influenced the researcher's career and other researchers.
Accurate and Efficient Refactoring Detection in Commit History - Nikolaos Tsantalis
Refactoring detection algorithms have been crucial to a variety of applications: (i) empirical studies about the evolution of code, tests, and faults, (ii) tools for library API migration, (iii) improving the comprehension of changes and code reviews, etc. However, recent research has questioned the accuracy of the state-of-the-art refactoring detection tools, which poses threats to the reliability of their application. Moreover, previous refactoring detection tools are very sensitive to user-provided similarity thresholds, which further reduces their practical accuracy. In addition, their requirement to build the project versions/revisions under analysis makes them inapplicable in many real-world scenarios. To reinvigorate a previously fruitful line of research that had stalled, we designed, implemented, and evaluated RefactoringMiner, a technique that overcomes the above limitations. At the heart of RefactoringMiner is an AST-based statement matching algorithm that determines refactoring candidates without requiring user-defined thresholds. To empirically evaluate RefactoringMiner, we created the most comprehensive oracle to date that uses triangulation to create a dataset with considerably reduced bias, representing 3,188 refactorings from 185 open-source projects. Using this oracle, we found that RefactoringMiner has a precision of 98% and recall of 87%, which is a significant improvement over the previous state-of-the-art.
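To make the workflow concrete, the sketch below shows how RefactoringMiner can be driven from Java, based on the usage example in the tool's public documentation (the GitServiceImpl, GitHistoryRefactoringMinerImpl and RefactoringHandler types and the toy-example repository come from there; exact signatures may vary between releases):

import java.util.List;
import org.eclipse.jgit.lib.Repository;
import org.refactoringminer.api.GitHistoryRefactoringMiner;
import org.refactoringminer.api.GitService;
import org.refactoringminer.api.Refactoring;
import org.refactoringminer.api.RefactoringHandler;
import org.refactoringminer.rm1.GitHistoryRefactoringMinerImpl;
import org.refactoringminer.util.GitServiceImpl;

public class DetectRefactorings {
    public static void main(String[] args) throws Exception {
        GitService gitService = new GitServiceImpl();
        // Clone the repository if a local copy does not exist yet
        Repository repo = gitService.cloneIfNotExists(
                "tmp/refactoring-toy-example",
                "https://github.com/danilofes/refactoring-toy-example.git");
        GitHistoryRefactoringMiner miner = new GitHistoryRefactoringMinerImpl();
        // Walk all commits of the branch and report detected refactorings;
        // no user-defined similarity thresholds are required
        miner.detectAll(repo, "master", new RefactoringHandler() {
            @Override
            public void handle(String commitId, List<Refactoring> refactorings) {
                for (Refactoring ref : refactorings) {
                    System.out.println(commitId + ": " + ref.toString());
                }
            }
        });
    }
}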
Lambda expressions have been introduced in Java 8 to support functional programming and enable behavior parameterization by passing functions as parameters to methods. The majority of software clones (duplicated code) are known to have behavioral differences (i.e., Type-2 and Type-3 clones). However, to the best of our knowledge, there is no previous work to investigate the utility of Lambda expressions for parameterizing such behavioral differences in clones. In this paper, we propose a technique that examines the applicability of Lambda expressions for the refactoring of clones with behavioral differences. Moreover, we empirically investigate the applicability and characteristics of the Lambda expressions introduced to refactor a large dataset of clones. Our findings show that Lambda expressions enable the refactoring of a significant portion of clones that could not be refactored by any other means.
Refactoring is a widespread practice that helps developers to improve the maintainability and readability of their code. However, there is a limited number of studies empirically investigating the actual motivations behind specific refactoring operations applied by developers. To fill this gap, we monitored Java projects hosted on GitHub to detect recently applied refactorings, and asked the developers to explain the reasons behind their decision to refactor the code. By applying thematic analysis on the collected responses, we compiled a catalogue of 44 distinct motivations for 12 well-known refactoring types. We found that refactoring activity is mainly driven by changes in the requirements and much less by code smells. Extract Method is the most versatile refactoring operation, serving 11 different purposes. Finally, we found evidence that the IDE used by the developers affects the adoption of automated refactoring tools.
Cascading Style Sheets (CSS) is the standard language for styling web documents and is extensively used in the industry. However, CSS lacks constructs that would allow code reuse (e.g., functions). Consequently, maintaining CSS code is often a cumbersome and error-prone task. Preprocessors (e.g., Less and Sass) have been introduced to fill this gap by extending CSS with the missing constructs. Despite the clear maintainability benefits coming from the use of preprocessors, there is currently no support for migrating legacy CSS code to preprocessors. In this paper, we propose a technique for automatically detecting duplicated style declarations in CSS code that can be migrated to preprocessor functions (i.e., mixins). Our technique can parameterize differences in the style values of duplicated declarations, and ensure that the migration will not change the presentation semantics of the web documents. The evaluation has shown that our technique is able to detect 98% of the mixins that professional developers introduced in websites and style sheet libraries, and can safely migrate real CSS code.
The document discusses a tool for refactoring clones in software. It aims to help developers explore clone groups, inspect differences between clones, and refactor clones safely. The tool detects clones using control structure matching and program dependence graphs. It visualizes clones and differences and automatically determines the best refactoring strategy, such as extract method or introduce utility method. Future work includes supporting more clones, lambda expressions, and testing refactorings. The tool is intended to improve on poor clone refactoring support in current tools.
This document summarizes an empirical study on the use of CSS preprocessor features like variables, nesting, mixins, and extending. The study analyzed over 1,200 preprocessor files from 50 websites using Sass, SCSS, and Less. The key findings are:
1) Developers primarily use global variables with color values, shallow nesting hierarchies, mixins that are reused and have few parameters/declarations, and no-parameter mixins instead of extending.
2) Migration tools should support these common patterns and also warn about extending which may break presentation semantics.
3) Future work includes developer surveys and studying how preprocessor code evolves over time.
Developers widely use CSS preprocessors and nesting is the most commonly used feature. Mixins are typically short with few parameters and often contain cross-browser declarations. While extends aim to reduce duplication, developers prefer mixins due to extends sometimes breaking presentation by changing selector order.
Improving the Unification of Software Clones Using Tree and Graph Matching Algorithms - Nikolaos Tsantalis
This document presents an approach to improve the refactoring of software clones using tree and graph matching algorithms. The motivation is the limitations of current refactoring tools in optimally mapping clones with minimum differences and scaling to large code bases. The approach uses control and program dependence graphs to divide the matching problem into subproblems and find mappings that maximize matched statements while minimizing differences. An evaluation on 2342 clone groups from 7 projects found the approach could refactor 82% more groups than the state-of-the-art CeDAR tool, and 28% of groups involving clones in different files. Future work involves further empirical studies and handling more complex clone types.
This document discusses the history and future directions of code smell research. It covers the origins of refactoring and code smells, definitions of various code smells, methods for identifying code smells, prioritizing refactoring opportunities based on risk analysis, and the need to consider refactoring dependencies and conflicts. The goal is to improve software maintainability and quality through addressing code smells.
Preventive Software Maintenance: The Past, the Present, the Future - Nikolaos Tsantalis
The document discusses preventive software maintenance and focuses on automatically detecting refactoring opportunities for common object-oriented design problems like duplicated code and inappropriate interfaces. It proposes treating design improvement as a search problem and applying search algorithms to find an optimal sequence of refactorings. Advantages of this approach include feasible behavior-preserving solutions to design issues and a more holistic way to prioritize preventive maintenance tasks.
An Empirical Study of Duplication in Cascading Style Sheets - Nikolaos Tsantalis
This document summarizes an empirical study on duplication in Cascading Style Sheets (CSS). It finds that about 60% of declarations are duplicated on average across CSS files from various websites. The study analyzes three types of duplication and detects duplication by parsing CSS into a model. It examines CSS files from open source projects and top websites, finding the largest duplications are typically between two selectors and involve five or more common declarations. While CSS size correlates with duplication, grouping adoption does not strongly correlate with reducing duplication. Future work is needed to refactor duplications and help migrate CSS to preprocessing languages.
Ranking Refactoring Suggestions based on Historical Volatility - Nikolaos Tsantalis
This document discusses ranking refactoring suggestions based on historical code volatility. It proposes using the historical volatility of code elements, measured by changes across prior versions, to prioritize refactoring efforts. It evaluates four forecasting models - random walk, historical average, exponential smoothing, and exponentially weighted moving average - for predicting future volatility to rank suggestions. The historical average model achieved the lowest error and produced rankings most similar to actual volatility, especially for elements with frequent changes, making it a suitable strategy. The goal is to focus refactoring on code most likely to undergo future maintenance changes.
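As a rough illustration of how such forecasts can be computed (not the authors' implementation; the change counts and smoothing factor below are made up), here are three of the four models applied to a series of per-version change counts:

import java.util.Arrays;

public class VolatilityForecast {

    // Random walk: the forecast equals the last observed value.
    static double randomWalk(double[] changes) {
        return changes[changes.length - 1];
    }

    // Historical average: the forecast is the mean of all past values
    // (the model reported to achieve the lowest error).
    static double historicalAverage(double[] changes) {
        return Arrays.stream(changes).average().orElse(0.0);
    }

    // Exponentially weighted moving average with smoothing factor alpha:
    // s_t = alpha * x_t + (1 - alpha) * s_{t-1}
    static double ewma(double[] changes, double alpha) {
        double s = changes[0];
        for (int t = 1; t < changes.length; t++) {
            s = alpha * changes[t] + (1 - alpha) * s;
        }
        return s;
    }

    public static void main(String[] args) {
        double[] changesPerVersion = {3, 5, 2, 8, 4}; // hypothetical data
        System.out.printf("random walk: %.2f%n", randomWalk(changesPerVersion));
        System.out.printf("historical average: %.2f%n", historicalAverage(changesPerVersion));
        System.out.printf("EWMA (alpha=0.5): %.2f%n", ewma(changesPerVersion, 0.5));
    }
}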
The document summarizes a research paper on automatically detecting features in Ajax-enabled web applications. It proposes clustering DOM states based on structural similarity and using different distance metrics, including a composite-change-aware tree-edit distance. An evaluation on three Ajax apps found this tree-edit distance performed best. Future work includes automatically labeling detected features using text analysis and emphasis tags.
The document discusses challenges and limitations with existing tools for refactoring code clones. It proposes a new approach to improve clone refactoring that involves three main phases:
1) Control structure matching to identify refactorable clones that have identical control flow.
2) PDG mapping to find the optimal mapping between clone statements that maximizes matches and minimizes differences.
3) Precondition examination to ensure refactoring preserves program behavior.
The goal is to parameterize more types of differences, find optimal matches, and add behavior preservation checks. This would allow safer and more effective refactoring of clones.
A Multidimensional Empirical Study on Refactoring Activity
1. A Multidimensional Empirical Study on Refactoring Activity
Nikolaos Tsantalis, Victor Guana, Eleni Stroulia, and Abram Hindle
Department of Computer Science and Software Engineering, Concordia University
Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
2. What do we want to understand?
Code Refactoring:
“Disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior” ... “to improve some of the non-functional attributes of the software” ... “code readability and reduced complexity to improve the maintainability of the source code, as well as a more expressive internal architecture or object model to improve extensibility”
Martin Fowler, http://refactoring.com
3. What do we want to understand?
Five research questions that explore the different types of refactorings applied to different types of sources.
Dimensions:
• The individual contribution of team members on refactoring activities.
• The alignment of refactoring activity with release dates and testing periods.
• The motivation behind the applied refactorings.
4. Why do we want to understand it?
• Help the community better reflect on the actual refactoring practice.
• Increase the awareness of the software maintenance community about the drivers behind the application of refactorings.
• Help motivate the creation of refactoring tools (e.g., code smell detectors, IDEs) addressing the actual needs of developers in a principled and empirical manner.
5. How do we want to understand it?
• Three case studies (Git repositories):

  Project               Dev. time               Analyzed revisions
  JUnit                 12 years (4 analyzed)   1018
  Jakarta HTTP Core     7 years (7 analyzed)    1588
  Jakarta HTTP Client   6 years (6 analyzed)    1838

• Detecting refactorings: a lightweight version of the UMLDiff algorithm for differencing object-oriented models (see the sketch below).
• Detecting testing activity: semi-automatic detection of testing activity in software evolution through package organization analysis, keyword and topic analysis, and manual inspection.
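To illustrate the flavor of such model differencing (a hypothetical sketch, not the actual UMLDiff algorithm or its detection rules), the following compares the classes removed and added between two revisions and flags pairs with highly similar member sets as Rename Class candidates:

import java.util.*;

public class RenameClassRule {

    // A minimal class model: a name plus the set of its member signatures
    record ClassModel(String name, Set<String> members) {}

    // Jaccard similarity between the member sets of two classes
    static double memberSimilarity(ClassModel a, ClassModel b) {
        Set<String> inter = new HashSet<>(a.members());
        inter.retainAll(b.members());
        Set<String> union = new HashSet<>(a.members());
        union.addAll(b.members());
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    // A class removed in one revision and added in the next with nearly the
    // same members is reported as a candidate Rename Class refactoring
    static List<String> detectRenames(List<ClassModel> removed, List<ClassModel> added) {
        List<String> candidates = new ArrayList<>();
        for (ClassModel r : removed) {
            for (ClassModel a : added) {
                if (memberSimilarity(r, a) >= 0.8) { // illustrative threshold
                    candidates.add("Rename Class: " + r.name() + " -> " + a.name());
                }
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Hypothetical example classes, not taken from the studied projects
        var before = List.of(new ClassModel("HttpParams", Set.of("getParameter", "setParameter")));
        var after = List.of(new ClassModel("HttpParameters", Set.of("getParameter", "setParameter")));
        detectRenames(before, after).forEach(System.out::println);
    }
}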
6. RQ1: Do software developers perform different types of refactoring operations on test code and production code?
• We needed to determine successive revisions in the repositories, given the non-linear nature of Git commit histories (see the sketch below).
But... how?
• Implicit branching.
• A lightweight version of the UMLDiff algorithm for differencing object-oriented models + refactoring detection rules defined by Biegel et al. [1]
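A minimal JGit sketch of this pairing step (an approximation: it follows only first parents, whereas the study handled implicit branching in the non-linear history):

import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.revwalk.RevCommit;
import java.io.File;

public class SuccessiveRevisions {
    public static void main(String[] args) throws Exception {
        // Path to a local clone of the project under analysis (placeholder)
        try (Git git = Git.open(new File("/path/to/repo"))) {
            for (RevCommit commit : git.log().call()) {
                if (commit.getParentCount() == 0) continue; // root commit
                RevCommit parent = commit.getParent(0);     // first parent only
                // Each (parent, commit) pair is a candidate revision pair for
                // model differencing and refactoring detection
                System.out.println(parent.getName() + " -> " + commit.getName());
            }
        }
    }
}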
7. RQ1: Do software developers perform different types of refactoring operations on test code and production code?
8. RQ2: Which developers are responsible for refactorings?
Identify test-specific and production-specific code entities. But... how?
JUnit: test code is well organized; classes, methods and fields have been located in a junit.tests package since the beginning of the project.
HTTP Core & Client: classes had a prefix of Test or a suffix of Mock. A hybrid approach: automatically collecting test classes following the pattern-matching approach, and manually inspecting the results (see the sketch below).
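A sketch of the automatic pattern-matching step (the HTTP class names in the example are hypothetical, and the manual-inspection step of the hybrid approach is not shown):

import java.util.List;

public class TestCodeClassifier {

    // Classify a class as test-specific by its package (JUnit) or by a
    // Test prefix / Mock suffix in its simple name (HTTP Core & Client)
    static boolean isTestClass(String qualifiedName) {
        String simpleName = qualifiedName.substring(qualifiedName.lastIndexOf('.') + 1);
        return qualifiedName.startsWith("junit.tests.")
                || simpleName.startsWith("Test")
                || simpleName.endsWith("Mock");
    }

    public static void main(String[] args) {
        List<String> classes = List.of(
                "junit.tests.framework.AssertTest",
                "org.apache.http.impl.TestBasicRequest",   // hypothetical name
                "org.apache.http.mockup.ResponseMock",     // hypothetical name
                "org.apache.http.impl.DefaultConnection"); // hypothetical name
        for (String c : classes) {
            System.out.println(c + " -> " + (isTestClass(c) ? "test" : "production"));
        }
    }
}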
9. RQ2: Which developers are responsible for refactorings?
10. RQ3: Is there more refactoring activity before major project releases than after?
We tracked each project's major release dates from publicly available release information, and manually examined their relevance in order to filter out minor releases such as:
• Isolated application of patches
• Major documentation migration
• Change of license types
But... how?
HTTPCore: the project had major releases every 2 months.
JUnit: commit activity decreased substantially 30 days after the release dates.
Observation window: 40 days before and 40 days after each release (sketched below).
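The windowing can be sketched as follows (a minimal illustration with made-up dates, assuming one timestamp per refactoring commit):

import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.List;

public class ReleaseWindow {

    // Count refactoring commits falling in the window of `days` days
    // strictly before (or strictly after) the given release date
    static long countInWindow(List<LocalDate> refactoringCommits,
                              LocalDate release, boolean before, int days) {
        return refactoringCommits.stream()
                .filter(d -> {
                    long delta = ChronoUnit.DAYS.between(release, d);
                    return before ? (delta < 0 && delta >= -days)
                                  : (delta > 0 && delta <= days);
                })
                .count();
    }

    public static void main(String[] args) {
        LocalDate release = LocalDate.of(2010, 6, 1); // hypothetical release date
        List<LocalDate> commits = List.of(           // hypothetical commit dates
                LocalDate.of(2010, 4, 30), LocalDate.of(2010, 5, 20),
                LocalDate.of(2010, 5, 28), LocalDate.of(2010, 6, 15));
        System.out.println("before: " + countInWindow(commits, release, true, 40));
        System.out.println("after:  " + countInWindow(commits, release, false, 40));
    }
}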
11. RQ3: Is there more refactoring activity before major project releases than after?
HTTP-Client
12. RQ3: Is there more refactoring activity before major project releases than after?
HTTP-Core
13. RQ3: Is there more refactoring activity before major project releases than after?
JUnit
14. RQ4: Is refactoring activity on production code preceded by the addition/modification of test code?
Extraction of test code update periods:
1. Detected additions/modifications of test classes in the commit history
2. Created plots of activity
3. Focused on periods with contiguous and intense activity
4. Manually inspected the periods to discard false positives
5. The last day of each period was set as the pivot point in our window analysis (see the sketch below)
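Steps 3 and 5 can be sketched as follows (illustrative only: the intensity threshold and daily counts are made up, and the manual inspection of step 4 is not modeled):

import java.time.LocalDate;
import java.util.*;

public class TestUpdatePeriods {

    // Find contiguous runs of days with intense test-change activity and
    // return the last day of each run, which serves as the pivot point
    static List<LocalDate> pivots(SortedMap<LocalDate, Integer> dailyTestChanges,
                                  int minChangesPerDay) {
        List<LocalDate> pivots = new ArrayList<>();
        LocalDate runEnd = null; // last day of the current intense run
        for (Map.Entry<LocalDate, Integer> e : dailyTestChanges.entrySet()) {
            if (e.getValue() >= minChangesPerDay) {
                if (runEnd != null && !e.getKey().equals(runEnd.plusDays(1))) {
                    pivots.add(runEnd); // previous run ended; record its pivot
                }
                runEnd = e.getKey();
            }
        }
        if (runEnd != null) pivots.add(runEnd);
        return pivots;
    }

    public static void main(String[] args) {
        SortedMap<LocalDate, Integer> daily = new TreeMap<>();
        daily.put(LocalDate.of(2009, 3, 1), 6);  // hypothetical daily counts
        daily.put(LocalDate.of(2009, 3, 2), 4);
        daily.put(LocalDate.of(2009, 3, 3), 1);  // activity drops: run ends Mar 2
        daily.put(LocalDate.of(2009, 4, 10), 7);
        System.out.println(pivots(daily, 3));    // [2009-03-02, 2009-04-10]
    }
}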
15. RQ4: Is refactoring activity on production code preceded by the addition/modification of test code?
HTTP-Client
16. RQ4: Is refactoring activity on production code preceded by the addition/modification of test code?
HTTP-Core
17. RQ4: Is refactoring activity on production code preceded by the addition/modification of test code?
JUnit
18. RQ5: What is the purpose of the applied refactorings?
Goal: find the reason behind the application of the detected refactorings.
1. Inspected a randomly selected subset of refactorings to come up with some basic categories of motivations.
2. Defined rules to classify a refactoring instance into each of the basic categories.
3. Labeled all refactoring instances by applying the classification rules.
4. When new categories emerged, the classification rules were updated and all instances were re-labelled.
19. We classified 210 Extract Method instances in total.
Extract Method is known as the “Swiss Army knife” of refactorings.
20. Facilitate Extension
BEFORE
private boolean isShadowed(Method method, List<Method> results) {
    for (Method each : results) {
        if (each.getName().equals(method.getName()))
            return true;
    }
    return false;
}

AFTER
private boolean isShadowed(Method method, List<Method> results) {
    for (Method each : results) {
        if (isShadowed(method, each))
            return true;
    }
    return false;
}

private boolean isShadowed(Method current, Method previous) {
    if (!previous.getName().equals(current.getName()))
        return false;
    // additional conditions examined
    if (...) ...
}
21. Backward Compatibility
BEFORE
public void writeTo(final OutputStream out, int mode) throws IOException {
    InputStream in = new FileInputStream(this.file);
    byte[] tmp = new byte[4096];
    int l;
    while ((l = in.read(tmp)) != -1) {
        out.write(tmp, 0, l);
    }
    out.flush();
}

AFTER
@Deprecated
public void writeTo(final OutputStream out, int mode) throws IOException { // mode is unused
    writeTo(out);
}

public void writeTo(final OutputStream out) throws IOException {
    InputStream in = new FileInputStream(this.file);
    byte[] tmp = new byte[4096];
    int l;
    while ((l = in.read(tmp)) != -1) {
        out.write(tmp, 0, l);
    }
    out.flush();
}
22. Encapsulate Field
BEFORE
public static Properties fPreferences;

private static void readPreferences() {
    fPreferences = new Properties(fPreferences);
    fPreferences.load(is);
}

public static String getPreference(String key) {
    return fPreferences.getProperty(key);
}

AFTER
private static Properties fPreferences;

private static void readPreferences() {
    setPreferences(new Properties(getPreferences()));
    getPreferences().load(is);
}

public static String getPreference(String key) {
    return getPreferences().getProperty(key);
}

protected static void setPreferences(Properties preferences) {
    fPreferences = preferences;
}

protected static Properties getPreferences() {
    return fPreferences;
}
23. Hide Message Chain
BEFORE
public boolean isShadowedBy(FrameworkField otherMember) {
    return otherMember.getField().getName().equals(this.getField().getName());
}

AFTER
public boolean isShadowedBy(FrameworkField otherMember) {
    return otherMember.getName().equals(getName());
}

private String getName() {
    return this.getField().getName();
}
24. Introduce Factory Method
BEFORE
public Runner runnerForClass(Class<?> testClass) throws Throwable {
    List<RunnerBuilder> builders = Arrays.asList(
        new IgnoredBuilder(),
        new AnnotatedBuilder(this),
        suiteMethodBuilder(),
        new JUnit3Builder(),
        new JUnit4Builder());
}

AFTER
public Runner runnerForClass(Class<?> testClass) throws Throwable {
    List<RunnerBuilder> builders = Arrays.asList(
        ignoredBuilder(),
        annotatedBuilder(),
        suiteMethodBuilder(),
        junit3Builder(),
        junit4Builder());
}

protected AnnotatedBuilder annotatedBuilder() {
    return new AnnotatedBuilder(this);
}
25. RQ5: What is the purpose of the applied refactorings?
We found three main motivations for the application of the Extract Method refactoring: code smell resolution, extension and backward compatibility.
(Pie chart of the distribution: 44%, 44%, 12%)
27. RQ5: What is the purpose of the applied refactorings?
We found three main motivations for the application of inheritance-related refactorings: code smell resolution, extension and refinement of abstraction level.
(Pie chart of the distribution: 49%, 31%, 20%)
28. Bird's-eye-view Conclusions
RQ1: Do software developers perform different types of refactoring operations on test code and production code?
In production code: design improvements, such as resolution of code smells and modularization.
In test code: internal reorganization of the classes of each project into packages and renaming of classes for conceptual reasons.
RQ2: Which developers are responsible for refactorings?
Refactorings were mainly fulfilled by a single developer acting as a refactoring manager.
Managers share refactoring responsibilities in both production and test code.
29. Bird's-eye-view Conclusions
RQ3: Is there more refactoring activity before major project releases than after?
Refactoring activity is significantly more frequent before a release than after. We believe this activity is motivated by the desire to improve the design and prepare the code for future extensions, right before releasing a stable API.
RQ4: Is refactoring activity on production code preceded by the addition or modification of test code?
We found a tight alignment between refactoring activities and test code updates. Testing and refactoring are dependent, implying the adoption of test-driven refactoring practices in the examined projects.
30. Bird's-eye-view Conclusions
RQ5: What is the purpose of the applied refactorings?
Extract Method refactoring:
• Code smells: the decomposition of methods was the most dominant motivation.
• Facilitating extension: introduction of Factory Methods.
Inheritance-related refactorings:
• Removal of duplicate code and generalization of abstraction.
• Decomposition of interfaces and specialization of abstraction were rarely applied practices.
31. Bird's-eye-view Conclusions
RQ1: Do software developers perform different types of refactoring operations on test code and production code?
RQ2: Which developers are responsible for refactorings?
RQ3: Is there more refactoring activity before major project releases than after?
RQ4: Is refactoring activity on production code preceded by the addition or modification of test code?
RQ5: What is the purpose of the applied refactorings?
tsantalis@cse.concordia.ca
guana@ualberta.ca
Editor's Notes
JUnit: 383 detected refactorings; 219 (57%) were production code refactorings, while 164 (43%) were test code refactorings. HTTPCore: 421 (80%) were applied to production code, while 106 (20%) were applied to test code. HTTPClient: 161 (81%) were applied to production code and 37 (19%) to test code. The goal of the applied refactorings on test code was often to organize the tests into new packages, reorganize inner classes, rename some test classes by giving them a more meaningful name, and move methods between test classes for conceptual reasons. Conversely, the refactorings that were applied to production code of the examined systems were intended to improve the design of the system by modularizing the code and removing design problems. The three most dominant refactorings on production code were Move Class, Extract Method and Rename Class, followed by refactorings related to the reallocation of the system's behavior (Move and Pull Up Method). The refactorings related to the reallocation of the system's state (Move and Pull Up Field), as well as the introduction of additional inheritance levels (Extract Superclass/Interface), are less frequent. Finally, the refactorings that move state or behavior to subclasses (Push Down Method/Field) are rarely applied.
It is worth noticing that although HTTPCore (with 8 committers) and HTTPClient (with 9 committers) share almost the same team of developers, only Oleg Kalnichevski contributes refactorings in both projects. This means that the rest of the developers, although they commit code in both projects, contribute refactorings in only one of them (probably the project they are more familiar with). We can conclude that most of the applied refactorings are performed by specific developers that usually have a key role in the management of the project. Moreover, we did not notice any substantial change in the distribution of the refactoring commits by the developers to the production code and test code of the three projects under study.
For the 4th RQ we investigated whether the refactoring activity on production code is accompanied by the addition/modification of test code. To answer this question, we had to extract periods of test code updates.
For all examined projects, we found a significantly higher refactoring activity during the test update periods compared to the activity after the end of test periods.
In particular, for project HTTP-Core we found that most refactorings actually coincide with the end of test update periods.
For the 5th RQ, our goal was to find the reasons behind the application of the detected refactorings. We applied a two-phase process. In the first phase…
Our first focus was the Extract Method refactoring, which is considered the Swiss Army knife of refactorings, since it can be used to solve many design problems, such as removing code duplication and decomposing long/complex methods; it is also used in combination with other refactorings to solve more complex design problems (Feature Envy, God Class). We classified 210 Extract Method instances in total. I will now show some indicative cases from the examined projects motivating the application of the Extract Method refactoring.
Facilitate functionality extension: the extracted/original method contains newly added code apart from the extracted/removed one. Introduce polymorphism: the extracted method is abstract and the extracted code is moved to a subclass providing a concrete implementation for the newly introduced abstract method.
The encapsulation restricts access to the field from other classes.
This was done for extension purposes, because factory methods can also return subclass instances of the returned type.