MIP Award presentation at the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2024

Presentation for the Most Influential Paper (MIP) award at the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2024.

Abstract: Existing defect prediction models use product or process metrics and machine learning methods to identify defect-prone source code entities. Different classifiers (e.g., linear regression, logistic regression, or classification trees) have been investigated in the last decade. The results achieved so far are sometimes contrasting and do not show a clear winner. In this paper we present an empirical study aiming at statistically analyzing the equivalence of different defect predictors. We also propose a combined approach, coined as CODEP (COmbined DEfect Predictor), that employs the classification provided by different machine learning techniques to improve the detection of defect-prone entities. The study was conducted on 10 open source software systems and in the context of cross-project defect prediction, which represents one of the main challenges in the defect prediction field. The statistical analysis of the results indicates that the investigated classifiers are not equivalent and that they can complement each other. This is also confirmed by the superior prediction accuracy achieved by CODEP when compared to stand-alone defect predictors.

MIP - Cross-Project Defect Prediction Models: l'Union Fait la Force
Annibale Panichella, Rocco Oliveto, Andrea De Lucia

Cross-Project Defect Prediction

Project A (Source), used for training:
File  LOC  LCOM  DIT  …  Bug
F1    76   0     2    …
F2    101  1     12   …
…     …    …     …    …  …

Project B (Target), used for testing:
File  LOC  LCOM  DIT  …  Bug
F1    176  22    1    …  ?
F2    433  144   4    …  ?
…     …    …     …    …  ?

An AI model is trained on the software metrics of Project A and then predicts the unknown Bug labels (?) of Project B, which is described by the same metrics.
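The cross-project setup above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Bug labels of Project A, every row not shown on the slide, the per-project z-score standardization, and the nearest-centroid classifier standing in for the "AI Model" are all hypothetical choices made for brevity, not the models used in the study.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each metric column using the project's own mean and std."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def fit_centroids(X, y):
    """Train: compute one mean metric vector per class label."""
    centroids = {}
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(c) for c in zip(*members)]
    return centroids

def predict(centroids, X):
    """Assign each file the label of the closest class centroid."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda lab: sq_dist(x, centroids[lab])) for x in X]

# Project A (source): columns are LOC, LCOM, DIT. Labels are hypothetical.
source_metrics = [[76, 0, 2], [101, 1, 12], [900, 40, 3], [1200, 55, 4]]
source_bug = [0, 0, 1, 1]

# Project B (target): same metrics, Bug labels unknown.
# The last two rows are hypothetical additions.
target_metrics = [[176, 22, 1], [433, 144, 4], [50, 3, 2], [390, 120, 5]]

model = fit_centroids(standardize(source_metrics), source_bug)
predicted_bug = predict(model, standardize(target_metrics))
print(predicted_bug)
```

The key point is that no label from Project B is used at any stage: the model only ever sees Project B's metrics at prediction time.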

The Master Predictor?
“One Ring to rule them all”
Some models seem to perform better than others
Differences among libraries/implementations
Friedman test + Nemenyi (Liem and Panichella, arXiv 2020)
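As a rough illustration of the Friedman test with a Nemenyi post-hoc analysis mentioned on this slide, the sketch below ranks three classifiers across several projects. The AUC values are hypothetical (only the first project reuses the numbers shown later in the deck), the tie correction of the Friedman statistic is omitted, and the critical value 2.343 is the commonly tabulated Nemenyi value for three classifiers at alpha = 0.05, which should be checked against a statistics reference before real use.

```python
import math

def ranks_desc(values):
    """Rank values so the largest gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks

def friedman(scores):
    """scores: classifier name -> per-project scores (higher is better)."""
    names = list(scores)
    k = len(names)
    n = len(scores[names[0]])
    rank_sums = [0.0] * k
    for d in range(n):
        r = ranks_desc([scores[name][d] for name in names])
        for j in range(k):
            rank_sums[j] += r[j]
    avg = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * (sum(a * a for a in avg) - k * (k + 1) ** 2 / 4)
    return dict(zip(names, avg)), chi2

def nemenyi_cd(k, n, q_alpha):
    """Critical difference between average ranks for the Nemenyi test."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

# Hypothetical AUC values for five target projects.
auc_scores = {
    "logistic":      [0.49, 0.62, 0.58, 0.66, 0.70],
    "bayes_net":     [0.56, 0.60, 0.61, 0.64, 0.73],
    "decision_tree": [0.57, 0.65, 0.63, 0.69, 0.71],
}

avg_ranks, chi2 = friedman(auc_scores)
p_value = math.exp(-chi2 / 2)  # chi-square tail, exact only for 2 degrees of freedom (k = 3)
cd = nemenyi_cd(k=3, n=5, q_alpha=2.343)
print(avg_ranks, chi2, p_value, cd)
```

Two classifiers are considered significantly different by the Nemenyi test when their average ranks differ by more than the critical difference `cd`.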

A Deeper Analysis
Bayes Net: AUC = 0.56
Logistic: AUC = 0.49
Decision Tree (e.g., If (dit<10), If (loc > 5), …): AUC = 0.57
[Venn diagram: number of true positives (identified bugs) for Ant v1.7 found by BN, DT, and Log; region counts as extracted: 4, 8, 9, 1, 38, 10, 0]
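The complementarity shown by the Venn diagram can be computed with plain set operations. The sketch below uses hypothetical file names and hypothetical sets of true positives; it does not reproduce the Ant v1.7 counts.

```python
# Hypothetical true positives (correctly identified buggy files) per model.
true_positives = {
    "BN":  {"A.java", "B.java", "C.java", "D.java"},
    "DT":  {"B.java", "C.java", "E.java"},
    "Log": {"C.java", "F.java"},
}

def found_by_all(tp):
    """Bugs that every model identifies."""
    return set.intersection(*tp.values())

def found_only_by(tp, name):
    """Bugs that only the given model identifies."""
    others = set().union(*(s for n, s in tp.items() if n != name))
    return tp[name] - others

def found_by_any(tp):
    """Bugs that at least one model identifies."""
    return set().union(*tp.values())

common = found_by_all(true_positives)
exclusive = {name: found_only_by(true_positives, name) for name in true_positives}
union = found_by_any(true_positives)
print(sorted(common), {n: sorted(s) for n, s in exclusive.items()}, len(union))
```

When the exclusive sets are non-empty, as here, no single model dominates the others, even if their AUC values are similar.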

L’Union Fait La Force (Unity Is Strength)

Inspired by Peer-Reviewing
Peer review: Manuscript → Reviewer1 (Strong Reject), Reviewer2 (Strong Accept), Reviewer3 (Strong Accept) → Discussion → ACCEPT
CODEP: File → Model 1, Model 2, Model 3 (e.g., a decision tree: If (dit<10), If (loc > 5), …) → Combined Model (CODEP) → Final Decision

Inspired by Peer-Reviewing
File → Model 1, Model 2, Model 3 (e.g., a decision tree: If (dit<10), If (loc > 5), …) → Combined Model (CODEP) → Final Decision
• Base models are diverse (diversity is the key, as in peer-reviewing)
• Combination can be done with any base/meta model (e.g., Naive Bayes Net)
• CODEP takes as inputs the predicted scores (confidence values), not the predicted labels (yes/no), of the base models
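The idea of feeding base-model scores into a combined model can be sketched as a small stacking example. Everything concrete here is an assumption for illustration: the three base models are hand-written scoring rules rather than trained classifiers, the meta-model is a logistic regression fitted by plain gradient descent, and the training files and labels are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical base models: each returns a confidence score in (0, 1).
def size_model(f):
    return sigmoid((f["loc"] - 300) / 100)

def cohesion_model(f):
    return sigmoid((f["lcom"] - 20) / 10)

def depth_model(f):
    return sigmoid((f["dit"] - 4) / 2)

BASE_MODELS = [size_model, cohesion_model, depth_model]

def base_scores(f):
    """Scores (not yes/no labels) of all base models for one file."""
    return [model(f) for model in BASE_MODELS]

def train_meta(score_rows, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression on base-model scores (bias is w[0])."""
    w = [0.0] * (len(score_rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for s, y in zip(score_rows, labels):
            err = sigmoid(w[0] + sum(wi * si for wi, si in zip(w[1:], s))) - y
            grad[0] += err
            for j, si in enumerate(s):
                grad[j + 1] += err * si
        w = [wi - lr * g / len(labels) for wi, g in zip(w, grad)]
    return w

def codep_score(w, f):
    """Combined defect-proneness score for one file."""
    s = base_scores(f)
    return sigmoid(w[0] + sum(wi * si for wi, si in zip(w[1:], s)))

# Hypothetical labeled files from the source project.
source = [
    {"loc": 76, "lcom": 0, "dit": 2, "bug": 0},
    {"loc": 101, "lcom": 1, "dit": 12, "bug": 0},
    {"loc": 150, "lcom": 30, "dit": 1, "bug": 0},
    {"loc": 640, "lcom": 45, "dit": 3, "bug": 1},
    {"loc": 520, "lcom": 8, "dit": 6, "bug": 1},
    {"loc": 880, "lcom": 60, "dit": 5, "bug": 1},
]
weights = train_meta([base_scores(f) for f in source], [f["bug"] for f in source])

# Unlabeled files from the target project.
target_f1 = {"loc": 176, "lcom": 22, "dit": 1}
target_f2 = {"loc": 433, "lcom": 144, "dit": 4}
print(codep_score(weights, target_f1), codep_score(weights, target_f2))
```

Because the meta-model sees continuous scores, it can learn how much to trust each base model, which would not be possible if it only received their yes/no votes.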

Replicating the Study 10 Years Later
A quick demo…

Replicating the Study 10 Years Later
Over multiple runs (random K-folds), CODEP shows statistically better performance than the base models
Some models show a larger variance in their results (are less stable) than others
[Plot: AUC]
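A repeated evaluation over random K-folds, reporting the mean and spread of AUC, could look like the sketch below. The data set, the stratified fold assignment, and the deliberately simple `fit_best_metric` learner are hypothetical stand-ins; the replication presented in the talk may have been set up differently.

```python
import random
from statistics import mean, pstdev

def auc(scores, labels):
    """Probability that a random buggy file outscores a random clean one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, rng):
    """Shuffle each class separately and deal indices round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds

def fit_best_metric(X, y):
    """Hypothetical learner: score a file by the metric whose class means differ most."""
    def gap(j):
        buggy = [x[j] for x, t in zip(X, y) if t == 1]
        clean = [x[j] for x, t in zip(X, y) if t == 0]
        spread = pstdev([x[j] for x in X]) or 1.0
        return (mean(buggy) - mean(clean)) / spread
    best = max(range(len(X[0])), key=lambda j: abs(gap(j)))
    sign = 1.0 if gap(best) >= 0 else -1.0
    return lambda x: sign * x[best]

def repeated_kfold_auc(X, y, fit, k=3, repeats=10, seed=0):
    """Return one mean AUC per repetition of a random stratified k-fold split."""
    rng = random.Random(seed)
    run_aucs = []
    for _ in range(repeats):
        fold_aucs = []
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train = [i for i in range(len(y)) if i not in held_out]
            scorer = fit([X[i] for i in train], [y[i] for i in train])
            fold_aucs.append(auc([scorer(X[i]) for i in test_idx],
                                 [y[i] for i in test_idx]))
        run_aucs.append(mean(fold_aucs))
    return run_aucs

# Hypothetical files: columns are LOC, LCOM, DIT; first six clean, last six buggy.
X = [[76, 0, 2], [101, 1, 12], [150, 30, 1], [210, 5, 3], [320, 12, 2], [95, 2, 4],
     [640, 45, 3], [520, 8, 6], [880, 60, 5], [433, 144, 4], [176, 22, 1], [700, 33, 2]]
y = [0] * 6 + [1] * 6

runs = repeated_kfold_auc(X, y, fit_best_metric)
print(mean(runs), pstdev(runs))  # average AUC and its variability across runs
```

Comparing the spread of `runs` between models is one simple way to see which models are less stable, and the per-run values are what a statistical test would compare.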

The Impact in the Community
“Some classifiers are more consistent in predicting defects than others.”
Bowes et al., PROMISE 2015
“Despite similar predictive performance values for these four classifiers, each detects different sets of defects.”
Similar results were also shown by:
• Bowes et al., SQJ 2017
• Chen et al., JSEP 2019

The Impact in the Community
Other combination strategies:
• Ensemble (Majority voting)
• Ensemble Stacking (Different ML models)
• Classifier selection (Adaptive per target project)
[Bar chart: number of follow-up papers on ensemble methods directly citing CODEP, per year, 2016 to 2022; y-axis from 0 to 12]
150+ papers on Google Scholar on ensemble and combination methods for defect prediction
“Majority voting is the best…”
“Majority voting is weak…”
“Bootstrap aggregation is the best ever…”
“Voting average is the top-one”
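The difference between two of the combination strategies named on this slide, majority voting on labels and averaging of scores, can be seen in a minimal sketch. The scores and the 0.5 threshold are hypothetical.

```python
def majority_vote(scores, threshold=0.5):
    """Each model votes yes/no; the file is flagged if more than half say yes."""
    votes = [s >= threshold for s in scores]
    return int(sum(votes) > len(votes) / 2)

def average_vote(scores, threshold=0.5):
    """The file is flagged if the mean confidence reaches the threshold."""
    return int(sum(scores) / len(scores) >= threshold)

# Hypothetical confidence scores of three base models for one file:
# one model is very sure the file is buggy, two are mildly unsure.
file_scores = [0.95, 0.45, 0.40]

by_majority = majority_vote(file_scores)
by_average = average_vote(file_scores)
print(by_majority, by_average)
```

The two strategies disagree on this file because majority voting discards how confident each model is, which is one reason their relative ranking can change from study to study.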

The Impact in the Community
Further studies on the role of model diversity on combination performance:
• Petric et al., ESEM 2016: “Diversity […] is essential”
• Chen et al., IEEE Transactions on Reliability, 2023
• Sharma et al., JSS 2023
• …
Combination of different ML models for federated learning (DSA 2023)
Beyond defect prediction:
• Malware detection
• Effort estimation
• GitHub PR prediction
• …

The Hot-Topic Waves
[Diagram: research interests over time, with waves labeled Standard ML, Deep Learning, LLMs, and Random Forest]
Every 2-4 years (a wave) we jump on new techniques, hoping to find the “master ring”
We often forget the no-free-lunch theorem in optimization and machine learning
Even a stopped clock is right twice a day (simple baselines are stronger than we think)
We should focus on understanding when to use X or Y rather than on X vs. Y (l’union fait la force)

This document summarizes a study that used machine learning algorithms to predict fault-prone files in nine open source Java projects containing over 18,000 files and 3 million lines of code. Six machine learning algorithms (Naive Bayes, Bayesian network, decision tree, radial basis function, simple logistic, and zeroR) were evaluated based on their ability to rank files by probability of being buggy, as determined by the FindBugs tool. The results showed that decision trees and Bayesian networks performed best at ranking the files, with decision trees outperforming other methods in previous studies. Lift curves were used to evaluate the performance of the models by plotting the number of files examined against the number of buggy files found.

[ADBIS 2021] - Optimizing Execution Plans in a Multistore

Chiara Forresi

Multistores are data management systems that enable query processing across different database management systems (DBMSs); besides the distribution of data, complexity factors like schema heterogeneity and data replication must be resolved through integration and data fusion activities. In a recent work [2], we have proposed a multistore solution that relies on a dataspace to provide the user with an integrated view of the available data and enables the formulation and execution of GPSJ (generalized projection, selection and join) queries. In this paper, we propose a technique to optimize the execution of GPSJ queries by finding the most efficient execution plan on the multistore. In particular, we devise three different strategies to carry out joins and data fusion, and we build a cost model to enable the evaluation of different execution plans. Through the experimental evaluation, we are able to profile the suitability of each strategy to different multistore configurations, thus validating our multi-strategy approach and motivating further research on this topic.

A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...

ijseajournal

This document presents an algorithm to extract data from open source software project repositories on GitHub for building duration estimation models. The algorithm extracts data on contributors, commits, lines of code added and removed, and active days for each contributor within a release period. This data is then used to build linear regression models to estimate project duration based on the number of commits by contributor type (full-time, part-time, occasional). The algorithm is tested on data extracted from 21 releases of the WordPress project hosted on GitHub.

A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...

ijseajournal

Software project estimation is important for allocating resources and planning a reasonable work schedule. Estimation models are typically built using data from completed projects. While organizations have their historical data repositories, it is difficult to obtain their collaboration due to privacy and competitive concerns. To overcome the issue of public access to private data repositories, this study proposes an algorithm to extract sufficient data from the GitHub repository for building duration estimation models. More specifically, this study extracts and analyses historical data on WordPress projects to estimate OSS project duration using commits as an independent variable as well as an improved classification of contributors based on the number of active days for each contributor within a release period. The results indicate that duration estimation models using data from OSS repositories perform well and partially solve the problem of lack of data encountered in empirical research in software engineering.

Defect Prediction: Accomplishments and Future Challenges

Yasutaka Kamei

The document discusses the accomplishments and future challenges of defect prediction in software engineering. It provides an overview of defect prediction, including leveraging data from repositories to measure source code metrics and build prediction models. Major accomplishments include increased data availability and openness, the ability to extract various metric types, and improved modeling performance. However, challenges remain such as keeping up with fast development paces and making models more accessible. The document argues that future areas of focus include defect prediction for mobile apps and integrating just-in-time models into continuous integration processes.

Software engineering practices and software quality empirical research results

Nikolai Avteniev

2cee Master Cocomo20071

CS, NcState

Deep Software Variability and Frictionless Reproducibility

University of Rennes, INSA Rennes, Inria/IRISA, CNRS

The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions. Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability. Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields. I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating). 
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems. Invited talk at the Journées Nationales du GDR GPL 2024.

Most evaluations of novel algorithmic contributions assess their accuracy in predicting what was withheld in an offline evaluation scenario. However, several doubts have been raised that standard offline evaluation practices are not appropriate to select the best algorithm for field deployment. The goal of this work is therefore to compare the offline and the online evaluation methodology with the same study participants, i.e. a within users experimental design. This paper presents empirical evidence that the ranking of algorithms based on offline accuracy measurements clearly contradicts the results from the online study with the same set of users. Thus the external validity of the most commonly applied evaluation methodology is not guaranteed.

From Bugs to Decision Support - Selected Research Highlights

Markus Borg

Benchmarking machine learning techniques

ijseajournal

Machine learning approaches are good at solving problems for which little information is available. In most cases, problems in the software domain can be characterized as a learning process that depends on various circumstances and changes accordingly. A predictive model is constructed using machine learning approaches and used to classify modules as defective or non-defective. Machine learning techniques help developers retrieve useful information after the classification and enable them to analyse data from different perspectives. They have proven useful for software bug prediction. This study uses publicly available data sets of software modules and provides a comparative performance analysis of different machine learning techniques for software bug prediction. The results show that most of the machine learning methods performed well on software bug datasets.

Master’s Thesis.pptx

usmimukherjee18

Complementing Deficient Bug Reports With Missing Information Leveraging Neural Text Generation. Abstract: Software bug reports often lack crucial information (e.g., steps to reproduce, expected behaviour), which makes bug resolution challenging. A recent study found that 78% of bug reports from open-source projects (e.g., Eclipse) contain less than 100 words each and thus require the developers to spend more time on bug resolution. According to an existing survey, 77% of 327 professional developers from major technology companies (e.g., Google, Meta) consider missing information a major problem and emphasize complementing them with useful information (e.g., environment configuration). In this thesis, we propose and evaluate two novel approaches that complement deficient bug reports with relevant information using Generative AI. In our first study, we propose BugMentor, a novel approach that combines structured information retrieval and neural text generation (e.g., CodeT5) to generate appropriate answers to the follow-up questions from bug reports. Our approach identifies past bug reports that are relevant to a given bug report, constructs the context and then leverages it to generate the answers. According to our evaluation, BugMentor generates good answers and outperforms three existing baselines significantly in terms of four appropriate metrics (e.g., BLEU, Semantic Similarity). We also conduct a developer study involving 10 participants where BugMentor’s answers were found to be more accurate, precise, concise and useful. In our second study, we propose BugEnricher, a novel approach that enriches bug reports with meaningful explanations using neural text generation. We fine-tuned the T5 model on software-specific vocabulary (e.g., Stack Overflow tags) to generate explanations against software-specific terms and jargon, which has the potential to enrich a bug report. 
Our evaluation using three performance metrics shows that BugEnricher generates understandable to good explanations according to Google’s standards and outperforms two baselines from the literature. We also conduct a case study to demonstrate the benefit of our bug report enhancement and found that it was able to improve an existing technique in detecting textually dissimilar duplicate bug reports, which has been reported as a major challenge. Given the empirical evidence above, our approaches have strong potential to support bug resolution and bug report management. Thesis URL : http://hdl.handle.net/10222/83339

Cukic Promise08 V3

gregoryg

The study compared software quality prediction models using design metrics, code metrics, and a combination of both, on multiple projects from NASA. The results showed that models using a combination of design and code metrics performed better than using either metric set alone, and code metrics generally performed better than design metrics alone. Combining metrics from different development phases provided better fault prediction than only using code-level metrics.

A Hybrid Approach to Expert and Model Based Effort Estimation

CS, NcState

Daniel Baker defended his thesis on a hybrid approach to expert and model-based software effort estimation. He automated several expert judgment best practices in his 2CEE tool and found that feature selection sometimes improved model performance while bagging and boosting did not. Evaluation of methods at NASA JPL found median error was greatly reduced. The thesis achieved a more robust uncertainty representation and revealed unstable COCOMO calibrations versus previous reports.

Predicting More from Less: Synergies of Learning

CS, NcState

Software Analytics - Achievements and Challenges

Tao Xie

Software Engineering Research: Leading a Double-Agent Life.

Lionel Briand

The document discusses testing of closed-loop controllers in automotive systems. It notes the increasing complexity of automotive software and challenges in model-in-the-loop (MIL), software-in-the-loop (SIL), and hardware-in-the-loop (HIL) testing. It presents an approach to generate test cases for continuous controllers at the MIL level by representing requirements as objective functions and using a search-based approach to find worst-case scenarios. Experimental results found significantly worse scenarios than their industry partner, allowing them to generate better stress tests for HIL. The approach addresses a problem largely ignored in MIL testing of continuous controllers.

Software Design Patterns Lecture (Intro)

VikramJothyPrakash1

This document provides an overview of design patterns including: an introduction to object-oriented concepts and design patterns; definitions of what design patterns are and why they are used; the common elements of a design pattern; classifications of design patterns; potential pros and cons of using design patterns; descriptions and examples of popular design patterns like Strategy, Observer, Singleton, Decorator; and a conclusion with references. Popular design patterns like Strategy, Observer, Singleton, and Decorator are explained in more detail through definitions, class diagrams, examples of problems they address, and their solutions.

[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models

DataScienceConferenc1

As many organizations are bundling large language models (LLMs) in their products, they face the problem of rigorous model selection. This talk gives a data-centric understanding of how LLMs are built and evaluated. We will discuss the limitations of current models and pay special attention to the available evaluation protocols. How do we distinguish good models from the others? What tasks and datasets should we try or avoid? How do we incorporate feedback from our users? We will present the guidelines the attendees can use in their future experiments.

Technical report jada

University of Bari (Italy)

This document summarizes a technical report on an empirical study of design pattern evolution in 39 open source Java projects. The study analyzed a total of 428 software releases from the projects to identify 10 common design patterns. A total of 27,855 instances of the patterns were found. The data collected includes the number of each pattern identified in each project release. This dataset will be further analyzed to understand how design patterns evolve over the lifetime of a software system.

Promise 2011: "A Principled Evaluation of Ensembles of Learning Machines for ...

CS, NcState

The document discusses evaluating ensembles of learning machines for software effort estimation. It aims to determine if ensemble methods improve upon single learners, which ensembles perform best, and how to select models for different datasets. The study uses several public software effort estimation datasets and evaluates multiple ensemble techniques, including bagging and negative correlation learning, against single learners like decision trees. Statistical tests are used to rigorously compare the performance of different models.

Towards a Better Understanding of the Impact of Experimental Components on De...

Chakkrit (Kla) Tantithamthavorn

Software Quality Assurance (SQA) teams play a critical role in the software development process to ensure the absence of software defects. It is not feasible to perform exhaustive SQA tasks (i.e., software testing and code review) on a large software product given the limited SQA resources that are available. Thus, the prioritization of SQA efforts is an essential step in all SQA efforts. Defect prediction models are used to prioritize risky software modules and understand the impact of software metrics on the defect-proneness of software modules. The predictions and insights that are derived from defect prediction models can help software teams allocate their limited SQA resources to the modules that are most likely to be defective and avoid common past pitfalls that are associated with the defective modules of the past. However, the predictions and insights that are derived from defect prediction models may be inaccurate and unreliable if practitioners do not control for the impact of experimental components (e.g., datasets, metrics, and classifiers) on defect prediction models, which could lead to erroneous decision-making in practice. In this thesis, we investigate the impact of experimental components on the performance and interpretation of defect prediction models. More specifically, we investigate the impact that three often overlooked experimental components (i.e., issue report mislabelling, parameter optimization of classification techniques, and model validation techniques) have on defect prediction models. 
Through case studies of systems that span both proprietary and open-source domains, we demonstrate that (1) issue report mislabelling does not impact the precision of defect prediction models, suggesting that researchers can rely on the predictions of defect prediction models that were trained using noisy defect datasets; (2) automated parameter optimization for classification techniques substantially improves the performance and stability of defect prediction models and also changes their interpretation, suggesting that researchers should no longer shy away from applying parameter optimization to their models; and (3) the out-of-sample bootstrap validation technique produces a good balance between bias and variance of performance estimates, suggesting that the single holdout and cross-validation families that are commonly used nowadays should be avoided.

Abstract.doc

butest

This document describes a machine learning model for software defect prediction. It uses NASA software metrics data to train artificial neural networks and decision tree models to predict defect density values. The model performs regression to predict defect values for test data. Experimental results show that while both ANN and decision tree methods did not initially provide acceptable predictions compared to the data variance, further experiments could enhance defect prediction performance through a two-step modeling approach.

Final Survey On Rule Base Development Slideshare

Valentin Zacharias

ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...

ACM Chicago

Intelligent Software Engineering: Synergy between AI and Software Engineering

Tao Xie

This document discusses the synergy between artificial intelligence and software engineering. It begins with an overview of intelligent software engineering and how AI techniques can be applied to software engineering problems. Specific examples discussed include using dynamic symbolic execution for automated test generation for binary code, .NET code, and mobile app code. The document also discusses using machine learning for software analytics, testing, and natural language interfacing for IDEs. Open challenges in the field of intelligent software engineering are mentioned at the end.

Bits of Evidence

Greg Wilson

The document summarizes research in software engineering and development practices. It discusses several studies that have provided evidence for practices like rigorous inspections reducing errors, Conway's Law relating organizational structure to system structure, and physical distance not affecting post-release fault rates as much as distance in the organizational chart. The document advocates building development practices around these empirical facts and calls for continued work to systematically synthesize research evidence and practices.

Comparative Performance Analysis of Machine Learning Techniques for Software ...

csandit

Machine learning techniques can be used to analyse data from different perspectives and enable developers to retrieve useful information. Machine learning techniques are proven to be useful in terms of software bug prediction. In this paper, a comparative performance analysis of different machine learning techniques is explored for software bug prediction on publicly available data sets. Results showed that most of the machine learning methods performed well on software bug datasets.

Breaking the Silence: the Threats of Using LLMs in Software Engineering

Annibale Panichella

Presentation of our work presented at ICSE 2024 (NIER track) in Lisbon Abstract: Large Language Models (LLMs) have gained considerable traction within the Software Engineering (SE) community, impacting various SE tasks from code completion to test generation, from program repair to code summarization. Despite their promise, researchers must still be careful as numerous intricate factors can influence the outcomes of experiments involving LLMs. This paper initiates an open discussion on potential threats to the validity of LLM-based research including issues such as closed-source models, possible data leakage between LLM training data and research evaluation, and the reproducibility of LLM-based findings. In response, this paper proposes a set of guidelines tailored for SE researchers and Language Model (LM) providers to mitigate these concerns. The implications of the guidelines are illustrated using existing good practices followed by LLM providers and a practical example for SE researchers in the context of test case generation.

Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...

Annibale Panichella

More and more machine learning (ML) models are being introduced to the field of Software Engineering (SE) and have reached a stage of maturity at which they can be considered for real-world use. But the real world is complex, and testing these models often falls short in explainability, feasibility, and computational capacity. Existing research introduced metamorphic testing to gain additional insights and certainty about the model by applying semantic-preserving changes to input data while observing the model output. As this is currently done at random places, it can lead to potentially unrealistic datapoints and high computational costs. With this work, we introduce genetic search as an aid for metamorphic testing in SE ML. Exploiting the delta in output as a fitness function, the evolutionary intelligence optimizes the transformations to produce higher deltas with fewer changes. We perform a case study minimizing F1 and MRR for Code2Vec on a representative sample from java-small with both genetic and random search. Our results show that within the same amount of time, genetic search was able to achieve a decrease of 10% in F1, while random search produced a 3% drop.

Evolutionary Testing for Crash Reproduction

Annibale Panichella

Manual crash reproduction is a labor-intensive and time-consuming task. Therefore, several solutions have been proposed in the literature for automatic crash reproduction, including generating unit tests via symbolic execution and mutation analysis. However, various limitations adversely affect the capabilities of the existing solutions in covering a wider range of crashes because generating helpful tests that trigger specific execution paths is particularly challenging. In this paper, we propose a new solution for automatic crash reproduction based on evolutionary unit test generation techniques. The proposed solution exploits crash data from collected stack traces to guide search-based algorithms toward the generation of unit test cases that can reproduce the original crashes. Results from our preliminary study on real crashes from Apache Commons libraries show that our solution can successfully reproduce crashes which are not reproducible by two other state-of-the-art techniques.

Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic A...

Annibale Panichella

This document proposes using genetic algorithms to parameterize and assemble information retrieval (IR)-based solutions for software engineering tasks. The goal is to find an optimal configuration of IR parameters like term extraction, stopword removal, weighting, and distance functions. The approach evaluates different IR configurations based on how well they cluster related documents, as measured by silhouette coefficient. Empirical tests on traceability recovery and duplicate bug detection tasks show this "GA-IR" approach finds configurations comparable to ideal ones and outperforming baselines. The clustering hypothesis - that better document clusters correlate with better IR performance - is also supported.

Security Threat Identification and Testing

Annibale Panichella

Business applications are more and more collaborative (cross-domains, cross-devices, service composition). Security shall focus on the overall application scenario including the interplay between its entities/devices/services, not only on the isolated systems within it. In this paper we propose the Security Threat Identification And TEsting (STIATE) toolkit to support development teams toward security assessment of their under-development applications focusing on subtle security logic flaws that may go undetected by using current industrial technology. At design-time, STIATE supports the development teams toward threat modeling and analysis by identifying automatically potential threats (via model checking and mutation techniques) on top of sequence diagrams enriched with security annotations (including {whatif} conditions). At run-time, STIATE supports the development teams toward testing by exploiting the identified threats to automatically generate and execute test-cases on the up and running application. We demonstrate the usage of the STIATE toolkit on an application scenario employing the SAML Single Sign-On multi-party protocol, a well-known industrial security standard largely studied in previous literature.

Reformulating Branch Coverage as a Many-Objective Optimization Problem

Annibale Panichella

Test data generation has been extensively investigated as a search problem, where the search goal is to maximize the number of covered program elements (e.g., branches). Recently, the whole suite approach, which combines the fitness functions of single branches into an aggregate, test suite-level fitness, has been demonstrated to be superior to the traditional single-branch at a time approach. In this paper, we propose to consider branch coverage directly as a many-objective optimization problem, instead of aggregating multiple objectives into a single value, as in the whole suite approach. Since programs may have hundreds of branches (objectives), traditional many-objective algorithms that are designed for numerical optimization problems with less than 15 objectives are not applicable. Hence, we introduce a novel highly scalable many-objective genetic algorithm, called MOSA (Many-Objective Sorting Algorithm), suitably defined for the many-objective branch coverage problem. Results achieved on 64 Java classes indicate that the proposed many-objective algorithm is significantly more effective and more efficient than the whole suite approach. In particular, effectiveness (coverage) was significantly improved in 66% of the subjects and efficiency (search budget consumed) was improved in 62% of the subjects on which effectiveness remains the same.

Results for EvoSuite-MOSA at the Third Unit Testing Tool Competition

Annibale Panichella

The document describes MOSA, a new many-objective sorting algorithm for test case prioritization and generation. MOSA reformulates branch coverage as a many-objective optimization problem by defining an objective function for each branch. It then uses a genetic algorithm and a preference criterion to iteratively generate test cases that optimize coverage of multiple branches at once. MOSA was implemented in the EvoSuite-MOSA tool and achieved competitive results in the Third Unit Testing Tool Competition, requiring less time than other high-scoring tools to achieve similar coverage levels. However, EvoSuite-MOSA also encountered some crashes and limitations when handling certain classes in the competition benchmark.

Adaptive User Feedback for IR-based Traceability Recovery

Annibale Panichella

Traceability recovery allows software engineers to understand the interconnections among software artefacts and thus provides important support to software maintenance activities. In the last decade, Information Retrieval (IR) has been widely adopted as the core technology of semi-automatic tools that extract traceability links between artefacts according to their textual information. However, a widely known problem of IR-based methods is that some artefacts may share more words with non-related artefacts than with related ones. To overcome this problem, enhancing strategies have been proposed in the literature. One of these strategies is relevance feedback, which modifies the textual similarity according to information about links classified by the users. Even though this technique is widely used for natural language documents, previous work has demonstrated that relevance feedback is not always useful for software artefacts. In this paper, we propose an adaptive version of relevance feedback that, unlike the standard version, considers the characteristics of both (i) the software artefacts and (ii) the previously classified links for deciding whether and how to apply the feedback. An empirical evaluation conducted on three systems suggests that the adaptive relevance feedback outperforms both a pure IR-based method and the standard feedback.

Diversity mechanisms for evolutionary populations in Search-Based Software En...

Annibale Panichella

This document discusses mechanisms for maintaining diversity in evolutionary algorithms. It begins by explaining the importance of balancing exploration and exploitation. Several techniques for preserving diversity are then presented, including modifying genetic operators, changing the objective function, and applying statistical methods. Empirical evaluations demonstrate how diversity mechanisms can improve performance in search-based software engineering problems like test data generation and test suite optimization, which often suffer from premature convergence and getting stuck in local optima due to loss of diversity. Parameter tuning techniques like adjusting the mutation rate and niching methods like fitness sharing are also described as ways to explicitly promote diversity.

Estimating the Evolution Direction of Populations to Improve Genetic Algorithms

Annibale Panichella

Meta-heuristics have been successfully used to solve a wide variety of problems. However, one issue many techniques have is their risk of being trapped in local optima, or of creating a limited variety of solutions (a problem known as “population drift”). In recent years, different kinds of techniques have been proposed to deal with population drift, for example hybridizing genetic algorithms with local search techniques or using niche techniques. This paper proposes a technique, based on Singular Value Decomposition (SVD), to enhance the population diversity of Genetic Algorithms (GAs). SVD helps to estimate the evolution direction and drive next generations towards orthogonal dimensions. The proposed SVD-based GA has been evaluated on 11 benchmark problems and compared with a simple GA and a GA with a distance-crowding schema. Results indicate that the SVD-based GA achieves significantly better solutions and exhibits quicker convergence than the alternative techniques.

MIP Award presentation at the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2024

1. MIP - Cross-Project Defect Prediction Models: l'Union Fait la Force. Annibale Panichella, Rocco Oliveto, Andrea De Lucia

2. SANER 2014 (previously WCRE)

3. SANER 2014 (previously WCRE)

4. Cross-Project Defect Prediction. An AI model is trained on the software metrics of Project A (source) and tested on Project B (target), which is described by the same metrics.

Training: Project A (Source)
File  LOC  LCOM  DIT  …  Bug
F1    76   0     2    …
F2    101  1     12   …
…     …    …     …    …  …

Test: Project B (Target)
File  LOC  LCOM  DIT  …  Bug
F1    176  22    1    …  ?
F2    433  144   4    …  ?
…     …    …     …    …  ?

5. The Master Predictor? “One Ring to rule them all.” Friedman test + Nemenyi. Liem and Panichella, arXiv 2020. Some models seem to perform better than others. Differences among libraries/implementations.
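As a rough illustration of the statistics named on this slide, the sketch below computes the Friedman chi-square statistic and the Nemenyi critical difference in plain Python. The AUC values are made up for the example and are not results from the study. The statistic is the basic form without a tie correction, and the critical value `q_alpha = 2.343` is the commonly tabulated one for three classifiers at α = 0.05.

```python
from math import sqrt

def average_ranks(scores):
    """scores[i][j] is the AUC of classifier j on dataset i; rank 1 is the best."""
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # tied classifiers share the mean rank
            for j in order[pos:end + 1]:
                totals[j] += shared
            pos = end + 1
    return [t / n for t in totals]

def friedman_statistic(scores):
    """Friedman chi-square over n datasets and k classifiers (no tie correction)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)

def nemenyi_critical_difference(n, k, q_alpha):
    """Two classifiers differ if their average ranks are at least this far apart."""
    return q_alpha * sqrt(k * (k + 1) / (6 * n))

# Hypothetical AUCs: one row per project, one column per classifier.
HYPOTHETICAL_AUC = [
    [0.70, 0.65, 0.60],
    [0.72, 0.66, 0.61],
    [0.68, 0.64, 0.59],
    [0.71, 0.69, 0.62],
]

if __name__ == "__main__":
    print("average ranks:", average_ranks(HYPOTHETICAL_AUC))
    print("Friedman chi-square:", friedman_statistic(HYPOTHETICAL_AUC))
    print("Nemenyi CD:", nemenyi_critical_difference(4, 3, q_alpha=2.343))
```

The Friedman test asks whether the classifiers' average ranks differ at all. The Nemenyi critical difference then indicates which pairs differ.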

6. A Deeper Analysis. Bayes Net: AUC = 0.56. Logistic: AUC = 0.49. Decision Tree (illustrated with splits on dit < 10 and loc > 5): AUC = 0.57. [Venn diagram: number of true positives (identified bugs) for Ant v1.7 found by BN, DT, and Log; region counts as extracted: 4, 8, 9, 1, 38, 10, 0.]
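The Venn diagram on this slide counts bugs found by each combination of classifiers. A minimal sketch of how such regions could be derived from each classifier's set of true positives follows. The file names and sets are invented for illustration and do not reproduce the Ant v1.7 counts.

```python
def venn_regions(detected):
    """Group items by the exact set of classifiers that detected them.

    detected maps a classifier name to the set of buggy files it flagged.
    The result maps a tuple of classifier names to the files found by
    exactly those classifiers and no others.
    """
    names = sorted(detected)
    regions = {}
    for item in set().union(*detected.values()):
        key = tuple(name for name in names if item in detected[name])
        regions.setdefault(key, set()).add(item)
    return regions

# Hypothetical true positives per classifier.
HYPOTHETICAL_HITS = {
    "BN": {"A.java", "B.java", "C.java"},
    "DT": {"B.java", "C.java", "D.java"},
    "Log": {"C.java", "E.java"},
}

if __name__ == "__main__":
    for classifiers, files in sorted(venn_regions(HYPOTHETICAL_HITS).items()):
        print(" & ".join(classifiers), "->", sorted(files))
```

Regions keyed by a single classifier are the bugs only that classifier catches, which is the complementarity a combined predictor tries to exploit.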

7. L’Union Fait La Force (Unity Is Strength)

8. Inspired by Peer-Reviewing. Peer review: Manuscript → Reviewer 1 (Strong Reject), Reviewer 2 (Strong Accept), Reviewer 3 (Strong Accept) → Discussion → ACCEPT. Defect prediction: File → Model 1, Model 2, Model 3 (e.g., a decision tree with splits on dit < 10 and loc > 5) → Combined Model (CODEP) → Final Decision.

9. Inspired by Peer-Reviewing. File → Model 1, Model 2, Model 3 → Combined Model (CODEP) → Final Decision. Base models are diverse (diversity is the key, as in peer reviewing). Combination can be done with any base/meta model (e.g., Naive Bayes Net). CODEP takes as inputs the predicted scores (confidence values), not the predicted labels (yes/no), of the base models.
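The combination idea on this slide can be sketched as a small stacking example: base models emit confidence scores, and a meta-model learns from those scores rather than from yes/no labels. Everything concrete below is an assumption for illustration: the three base scorers are hand-written stand-ins rather than trained classifiers, the meta-model is a logistic regression chosen only because it is short to write, and the source-project files and labels are made up.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

# Hypothetical stand-ins for three already-trained base models. Each maps a
# file's metrics to a confidence score in [0, 1], not to a yes/no label.
BASE_MODELS = [
    lambda m: sigmoid((m["loc"] - 150) / 50),                    # size-driven
    lambda m: sigmoid((m["lcom"] - 20) / 10),                    # cohesion-driven
    lambda m: 0.8 if m["dit"] < 10 and m["loc"] > 200 else 0.2,  # rule-based
]

def base_scores(metrics):
    return [model(metrics) for model in BASE_MODELS]

def fit_meta_model(score_rows, labels, epochs=2000, lr=0.5):
    """Fit a logistic regression on base-model scores with batch gradient descent."""
    weights, bias, n = [0.0] * len(score_rows[0]), 0.0, len(score_rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, label in zip(score_rows, labels):
            error = sigmoid(bias + sum(w * s for w, s in zip(weights, row))) - label
            grad_b += error
            grad_w = [g + error * s for g, s in zip(grad_w, row)]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

def codep_score(meta_model, metrics):
    """Combined defect-proneness score computed from the base models' scores."""
    weights, bias = meta_model
    return sigmoid(bias + sum(w * s for w, s in zip(weights, base_scores(metrics))))

# Made-up source-project files: (metrics, bug label).
SOURCE = [
    ({"loc": 80, "lcom": 2, "dit": 2}, 0),
    ({"loc": 110, "lcom": 3, "dit": 12}, 0),
    ({"loc": 300, "lcom": 40, "dit": 3}, 1),
    ({"loc": 420, "lcom": 90, "dit": 5}, 1),
    ({"loc": 60, "lcom": 35, "dit": 1}, 0),
    ({"loc": 250, "lcom": 5, "dit": 4}, 1),
]

META = fit_meta_model([base_scores(m) for m, _ in SOURCE], [y for _, y in SOURCE])

if __name__ == "__main__":
    # Score two target-project files using the same metrics as the source.
    for metrics in ({"loc": 176, "lcom": 22, "dit": 1}, {"loc": 433, "lcom": 144, "dit": 4}):
        print(metrics, round(codep_score(META, metrics), 3))
```

Feeding scores rather than labels lets the meta-model weigh a confident base prediction more heavily than a borderline one.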

10. Replicating the Study 10 Years Later. A quick demo…

11. Replicating the Study 10 Years Later. Over multiple runs (random K-folds), CODEP shows statistically better performance than the base models. Some models have a larger variance in their results (are less stable) than others. [Plot: AUC per model.]

12. The Impact in the Community. “Some classifiers are more consistent in predicting defects than others.” Bowes et al., PROMISE 2015. “Despite similar predictive performance values for these four classifiers, each detects different sets of defects.” Similar results were also shown by: • Bowes et al., SQJ 2017 • Chen et al., JSEP 2019

13. The Impact in the Community. Other combination strategies: • Ensemble (majority voting) • Ensemble stacking (different ML models) • Classifier selection (adaptive per target project). [Bar chart: number of follow-up papers on ensemble methods directly citing CODEP, per year, 2016 to 2022; y-axis from 0 to 12.] 150+ papers on Google Scholar on ensemble and combination methods for defect prediction. “Majority voting is the best…” “Majority voting is weak…” “Bootstrap aggregation is the best ever…” “Voting average is the top-one.”
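To make two of the strategies named on this slide concrete, here is a minimal sketch contrasting majority voting over labels with averaging of confidence scores. The function names, the 0.5 threshold, and the score values are illustrative assumptions and are not taken from any of the cited papers.

```python
def majority_vote(scores, threshold=0.5):
    """Turn each base score into a yes/no label, then take the majority label."""
    votes = sum(1 for s in scores if s >= threshold)
    return votes * 2 > len(scores)

def average_vote(scores, threshold=0.5):
    """Average the raw confidence scores, then apply the threshold once."""
    return sum(scores) / len(scores) >= threshold

if __name__ == "__main__":
    # Two lukewarm "buggy" votes against one confident "clean" vote.
    lukewarm = [0.55, 0.55, 0.05]
    # Two borderline "clean" votes against one confident "buggy" vote.
    confident_minority = [0.45, 0.45, 0.99]
    for scores in (lukewarm, confident_minority):
        print(scores, "majority:", majority_vote(scores), "average:", average_vote(scores))
```

The two rules can disagree on the same inputs, which is one plausible reason the literature quoted on the slide reaches contrasting conclusions about which combination strategy is best.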

14. The Impact in the Community. Further studies on the role of model diversity in combination performance: • Petric et al., ESEM 2016: “Diversity […] is essential” • Chen et al., IEEE Transactions on Reliability, 2023 • Sharma et al., JSS 2023 • … Combination of different ML models for federated learning (DSA 2023). Beyond defect prediction: • Malware detection • Effort estimation • GitHub PR prediction • …

15. What About Current Trends in SE?

16. The Hot-Topic Waves. [Chart: research interests over time, with waves labelled Standard ML, Deep Learning, LLMs, and Random Forest.] Every 2-4 years (wave) we jump on new techniques hoping to find the “master ring”. We often forget the no-free-lunch theorem in optimization and machine learning. Even a stopped clock is right twice a day (simple baselines are stronger than we think). We should focus on understanding when to use X or Y rather than focusing on X vs. Y (l’union fait la force).

17. MIP - Cross-Project Defect Prediction Models: l'Union Fait la Force. Annibale Panichella, Rocco Oliveto, Andrea De Lucia
