Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Prediction"

Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Prediction" Huihua Lu, Bojan Cukic and Mark Culp.

An Iterative Semi-supervised Approach to Software Fault Prediction
Huihua Lu, Bojan Cukic, Mark Culp
Lane Department of Computer Science and Electrical Engineering, Department of Statistics
West Virginia University, Morgantown, WV
September 2011

Presentation Outline

Introduction

Goal of the Study

Semi-Supervised Learning-1

Semi-Supervised Learning-2

Related Work

Methodology-1: Initialize the labels for U. Reset the labels for L. Fit the labels for U+L.
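The three steps that survive from the Methodology-1 slide can be sketched as a small iterative loop. This is only an illustration under stated assumptions: U is taken to mean the unlabeled modules and L the labeled ones (the surviving slide text does not spell this out), the nearest-centroid learner is a stand-in because the deck's base learner is not recoverable from this page, and all function names are hypothetical.

```python
import math

def fit_centroids(X, y):
    """Stand-in base learner (an assumption): one centroid per class, 0 = clean, 1 = faulty."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        if rows:
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda c: math.dist(centroids[c], x))

def iterative_semi_supervised(X_l, y_l, X_u, max_iter=20):
    X_all = list(X_l) + list(X_u)
    n_l = len(X_l)
    # Initialize the labels for U using a model fit on L alone.
    model = fit_centroids(X_l, y_l)
    labels = list(y_l) + [predict(model, x) for x in X_u]
    for _ in range(max_iter):
        # Fit the labels for U+L: train on everything, then re-predict everything.
        model = fit_centroids(X_all, labels)
        fitted = [predict(model, x) for x in X_all]
        # Reset the labels for L to their known values.
        fitted[:n_l] = y_l
        if fitted == labels:  # labels stopped changing, so stop iterating
            break
        labels = fitted
    return model, labels[n_l:]

# Toy data: two labeled modules and four unlabeled ones (made-up metric values).
X_l = [[1.0, 1.0], [9.0, 9.0]]
y_l = [0, 1]
X_u = [[2.0, 1.5], [8.0, 8.5], [1.5, 2.0], [9.5, 8.0]]
model, y_u = iterative_semi_supervised(X_l, y_l, X_u)
print(y_u)  # [0, 1, 0, 1]
```

Hard 0/1 labels keep the sketch short; a probabilistic learner could carry soft labels through the same loop.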

Methodology-2

Performance Measures

Experiments

Summary

Future Work

Questions

With the rise of software systems ranging from personal assistance to the nation's facilities, software defects have become a more critical concern, as they can cost millions of dollars and affect human lives. Yet, at the breakneck pace of rapid software development settings (such as the DevOps paradigm), today's Quality Assurance (QA) practices are still time-consuming. Continuous Analytics for Software Quality (i.e., defect prediction models) can help development teams prioritize their QA resources and chart better quality improvement plans to avoid the past pitfalls that lead to future software defects. Because specialists are needed to design and configure a large number of settings (e.g., data quality, data preprocessing, classification techniques, interpretation techniques), a set of practical guidelines for developing accurate and interpretable defect models has not been well developed. The ultimate goal of my research is to (1) provide practical guidelines on how to develop accurate and interpretable defect models for non-specialists; (2) develop an intelligible defect model that offers suggestions on how to improve both software quality and processes; and (3) integrate defect models into the real-world practice of rapid development cycles such as CI/CD settings. My research project is expected to provide significant benefits, including a reduction in software defects and operating costs and faster development of software systems in many of Australia's critical domains, such as Smart Cities and e-Health.

Testing survey by_directions

Tao He

Towards a Better Understanding of the Impact of Experimental Components on De...

Chakkrit (Kla) Tantithamthavorn

Software Quality Assurance (SQA) teams play a critical role in the software development process to ensure the absence of software defects. It is not feasible to perform exhaustive SQA tasks (i.e., software testing and code review) on a large software product given the limited SQA resources that are available. Thus, the prioritization of SQA efforts is an essential step in all SQA efforts. Defect prediction models are used to prioritize risky software modules and understand the impact of software metrics on the defect-proneness of software modules. The predictions and insights that are derived from defect prediction models can help software teams allocate their limited SQA resources to the modules that are most likely to be defective and avoid common pitfalls associated with defective modules of the past. However, the predictions and insights that are derived from defect prediction models may be inaccurate and unreliable if practitioners do not control for the impact of experimental components (e.g., datasets, metrics, and classifiers) on defect prediction models, which could lead to erroneous decision-making in practice. In this thesis, we investigate the impact of experimental components on the performance and interpretation of defect prediction models. More specifically, we investigate the impact that three often overlooked experimental components (i.e., issue report mislabelling, parameter optimization of classification techniques, and model validation techniques) have on defect prediction models.
Through case studies of systems that span both proprietary and open-source domains, we demonstrate that (1) issue report mislabelling does not impact the precision of defect prediction models, suggesting that researchers can rely on the predictions of defect prediction models that were trained using noisy defect datasets; (2) automated parameter optimization for classification techniques substantially improves the performance and stability of defect prediction models and also changes their interpretation, suggesting that researchers should no longer shy away from applying parameter optimization to their models; and (3) the out-of-sample bootstrap validation technique produces a good balance between bias and variance of performance estimates, suggesting that the single holdout and cross-validation families that are commonly used nowadays should be avoided.

An Empirical Comparison of Model Validation Techniques for Defect Prediction ...

Chakkrit (Kla) Tantithamthavorn

Defect prediction models help software quality assurance teams to effectively allocate their limited resources to the most defect-prone software modules. Model validation techniques, such as k-fold cross-validation, use historical data to estimate how well a model will perform in the future. However, little is known about how accurate the performance estimates of these model validation techniques tend to be. In this paper, we set out to investigate the bias and variance of model validation techniques in the domain of defect prediction. A preliminary analysis of 101 publicly available defect prediction datasets suggests that 77% of them are highly susceptible to producing unstable results. Hence, selecting an appropriate model validation technique is a critical experimental design choice. Based on an analysis of 256 studies in the defect prediction literature, we select the 12 most commonly adopted model validation techniques for evaluation. Through a case study of data from 18 systems that span both open-source and proprietary domains, we derive the following practical guidelines for future defect prediction studies: (1) the single holdout validation techniques should be avoided; and (2) researchers should use the out-of-sample bootstrap validation technique instead of holdout or the commonly used cross-validation techniques.
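For readers unfamiliar with the out-of-sample bootstrap recommended in the abstract above, here is a minimal sketch of its usual form: train on a sample drawn with replacement that is the same size as the dataset, then test on the rows that were never drawn. The function name, parameters, and defaults are illustrative assumptions, not taken from the paper.

```python
import random

def out_of_sample_bootstrap(n_rows, n_repeats=100, seed=0):
    """Yield (train_idx, test_idx) index pairs for an out-of-sample bootstrap."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        # Training sample: n_rows draws with replacement, so some rows repeat.
        train = [rng.randrange(n_rows) for _ in range(n_rows)]
        drawn = set(train)
        # Test sample: the rows that did not appear in the training sample.
        test = [i for i in range(n_rows) if i not in drawn]
        yield train, test

# Usage: average a performance measure over the repeats.
for train_idx, test_idx in out_of_sample_bootstrap(n_rows=10, n_repeats=3):
    print(len(train_idx), len(test_idx))
```

In contrast to a single holdout split, every repeat produces a different train/test partition, and the performance estimate is the average over all repeats.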

A Review on Parameter Estimation Techniques of Software Reliability Growth Mo...

Editor IJCATR

Software reliability is a quantifiable metric, defined as the probability that software operates without failure for a specified period of time in a specific environment. Various software reliability growth models have been proposed to predict the reliability of software. These models help vendors predict the behaviour of the software before shipment. Reliability is predicted by estimating the parameters of the software reliability growth models. However, the model parameters are generally in nonlinear relationships, which makes it difficult to find the optimal parameters using traditional techniques such as Maximum Likelihood Estimation (MLE) and Least Squares Estimation. Various stochastic search algorithms have been introduced that make parameter estimation more reliable and computationally easier. This paper explores parameter estimation of NHPP-based reliability models using MLE and an evolutionary search algorithm called Particle Swarm Optimization.

In today's increasingly digitalised world, software defects are enormously expensive. In 2018, the Consortium for IT Software Quality reported that software defects cost the global economy $2.84 trillion and affected more than 4 billion people. The average annual cost of software defects to Australian businesses is A$29 billion. Moreover, failure to eliminate defects in safety-critical systems could result in serious injury, threats to life, death, and disasters. Traditionally, software quality assurance activities like testing and code review are widely adopted to discover software defects in a software product. However, ultra-large-scale systems, such as Google's, can consist of more than two billion lines of code, so exhaustively reviewing and testing every single line of code isn't feasible with limited time and resources. This project aims to create technologies that enable software engineers to produce the highest quality software systems with the lowest operational costs. To achieve this, this project will invent an end-to-end explainable AI platform to (1) understand the nature of critical defects; (2) predict and locate defects; (3) explain and visualise the characteristics of defects; (4) suggest potential patches to automatically fix defects; and (5) integrate the platform as a GitHub bot plugin.

Decision Support Analyss for Software Effort Estimation by Analogy

Tim Menzies

Ssbse12b.ppt

Ptidej Team

Automated parameter optimization should be included in future defect predict...

Chakkrit (Kla) Tantithamthavorn

Abstract.doc

butest

Test case prioritization using firefly algorithm for software testing

Journal Papers

Experiments on Design Pattern Discovery

Tim Menzies

Software Analytics In Action: A Hands-on Tutorial on Mining, Analyzing, Model...

Chakkrit (Kla) Tantithamthavorn

Software analytics focuses on analyzing and modeling a rich source of software data using well-established data analytics techniques in order to glean actionable insights for improving development practices, productivity, and software quality. However, if care is not taken when analyzing and modeling software data, the predictions and insights that are derived from analytical models may be inaccurate and unreliable. The goal of this hands-on tutorial is to guide participants on how to (1) analyze software data using statistical techniques like correlation analysis, hypothesis testing, effect size analysis, and multiple comparisons, (2) develop accurate, reliable, and reproducible analytical models, (3) interpret the models to uncover relationships and insights, and (4) discuss pitfalls associated with analytical techniques including hands-on examples with real software data. R will be the primary programming language. Code samples will be available in a public GitHub repository. Participants will do exercises via either RStudio or Jupyter Notebook through Binder.

The Impact of Class Rebalancing Techniques on the Performance and Interpretat...

Chakkrit (Kla) Tantithamthavorn

Defect models that are trained on class-imbalanced datasets (i.e., datasets in which defective and clean modules are not equally represented) are highly susceptible to producing inaccurate predictions. Prior research compares the impact of class rebalancing techniques on the performance of defect models but arrives at contradictory conclusions due to the use of different choices of datasets, classification techniques, and performance measures. Such contradictory conclusions make it hard to derive practical guidelines for whether class rebalancing techniques should be applied in the context of defect models. In this paper, we investigate the impact of class rebalancing techniques on performance measures and the interpretation of defect models. We also investigate the experimental settings in which class rebalancing techniques are beneficial for defect models. Through a case study of 101 datasets that span proprietary and open-source systems, we conclude that the impact of class rebalancing techniques on the performance of defect prediction models depends on the performance measure and the classification techniques used. We observe that the optimized SMOTE technique and the under-sampling technique are beneficial when quality assurance teams wish to increase AUC and Recall, respectively, but they should be avoided when deriving knowledge and understanding from defect models.

Ijetcas14 468

Iasir Journals

Genetic algorithm based approach for

IJCSES Journal

... an error in that computer program. In order to improve software quality, prediction of faulty modules is necessary. Various metric suites and techniques are available to predict the modules that are critical and likely to be fault-prone. A genetic algorithm is a problem-solving algorithm that uses genetics as its model of problem solving. It is a search technique for finding approximate solutions to optimization and search problems. Here, a genetic algorithm is applied to the problem of faulty module prediction, as well as to finding the most important attribute for fault occurrence. To perform the analysis, the genetic algorithm is validated on the open-source software jEdit. The results are measured in terms of accuracy and error in prediction by calculating the probability of detection and the probability of false alarm.
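The abstract above reports its results as probability of detection and probability of false alarm. A minimal sketch of how these two measures are commonly computed from actual and predicted labels follows. The function and variable names are illustrative and are not taken from the paper, and the labels are assumed to be encoded as 1 = faulty and 0 = not faulty.

```python
def pd_pf(actual, predicted):
    """Return (probability of detection, probability of false alarm)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # faulty, flagged
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # faulty, missed
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # clean, flagged
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # clean, not flagged
    # pd = share of faulty modules that were flagged (recall).
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    # pf = share of clean modules that were wrongly flagged.
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return pd, pf

# Made-up labels for eight modules: three faulty, five clean.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
pd, pf = pd_pf(actual, predicted)
print(round(pd, 3), round(pf, 3))  # 0.667 0.2
```

A good predictor pushes pd toward 1 while keeping pf near 0; reporting both avoids the distortion that plain accuracy shows on imbalanced fault data.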

Model based test case prioritization using neural network classification

cseij

Model-based testing for real-life software systems often requires a large number of tests, not all of which can be run exhaustively due to time and cost constraints. Thus, it is necessary to prioritize the test cases according to the importance the tester perceives. In this paper, we address this problem by extending our previous study: we apply a classification approach to its results and establish a functional relationship between test case prioritization group membership and two attributes, importance index and frequency, for all events belonging to a given group. For classification, a neural network (NN), the most advanced option, is preferred, and a data set obtained from our study for all test cases is classified using a multilayer perceptron (MLP) NN. The classification results for a commercial test prioritization application show high classification accuracy of about 96% and acceptable test prioritization performance.

How to fine-tune and develop your own large language model.pptx

Knoldus Inc.

STAT7440StudentIMLPresentationJishan.pptx

JishanAhmed24

MIT521 software testing (2012) v2

Yudep Apoi

Benchmarking transfer learning approaches for NLP

Yury Kashnitsky

Training language models to follow instructions with human feedback (Instruct...

Rama Irsheidat

Training language models to follow instructions with human feedback (InstructGPT).pptx
Long Ouyang, Jeff Wu, Xu Jiang et al. (OpenAI)
Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.

Using the Machine to predict Testability

Miguel Lopez

03 Machine Learning Overview.pptx

SagarBurnah

What's hot

Automated exam question set generator using utility based agent and learning ...

Journal Papers

A software fault localization technique based on program mutations

Tao He

Explainable Artificial Intelligence (XAI)  to Predict and Explain Future Soft...

Chakkrit (Kla) Tantithamthavorn

Similar to Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Prediction"

Endsem AI merged.pdf

ShivamMishra603376

Check upload1

Referral Bhai

1 Saint Leo University GBA 334 Applied Decision.docx

aryan532920

Saint Leo University
GBA 334 Applied Decision Methods for Business

Course Description: This course explores the use of applied quantitative techniques to aid in business-oriented decision making. Emphasis is on problem identification and formulation with application of solution techniques and the interpretation of results. Included are probability theory; decision making under certainty, risk and uncertainty; utility theory; forecasting; inventory control; PERT/CPM; queuing theory; and linear programming.

Prerequisite: MAT 201

Textbook: Saint Leo University. (2013), Quantitative analysis (custom). Boston, MA: Pearson Learning Solutions. eBook with print upgrade option – ISBN: 978-1-269-86314-8. You will access the eBook via a link in the Course Home menu, where you can purchase the print upgrade option.

Software: The use of statistical software is a required component in this course. It is expected that you already have a basic understanding of computers and Microsoft Excel. In-depth training is provided during the course on the appropriate use of the following packages:
• TreePlan-Student-179 Excel Add In
• Excel QM, version 4
• POM QM, version 4
• Analysis Tool Pack for Microsoft Excel must be activated
To access the information needed to install the software, click the Software Installation Information link located under Resources in the course menu.

Learning Outcomes: At the completion of the course you should be familiar with several decision methods of decision-making in a business environment. You will find that almost every type of problem to which you will be exposed in the business world has been explored and methods of solving them have been devised. You should be able to apply these methods to the real-world situations in which you will one day find yourself. The skills developed during this class include:
1. Explain the key attributes and differences between the normal, standard normal, and binomial distribution of variables.
2. Identify and explain the underlying assumptions, key variables, theoretical basis, and solution techniques for the following decision-making problems:
a. Decision Analysis
b. Probability Theory and Analysis
c. Regression Analysis
d. Forecasting Methods
e. Inventory Control Methods
f. Project Management (including PERT/CPM)
g. Network Models
h. Queuing Theory
i. Linear Programming Approaches and the Transportation and Assignment Special Cases
j. Statistical Process Control
3. Formulate and execute a solution to a variety of decision-making problems using computer software.
4. Identify, explain, and interpret the key areas of computer output for the various decision-making problems.
5. Apply one of the approaches covered in class to a real-world issue and present the findings.
6. VALUES OUTCOME: Demonstrate the core value of excellence by adequately preparing for each class session, actively participating in cl ...

Recommendation System for Design Patterns in Software Development

Francis Palma

[2017/2018] RESEARCH in software engineering

Ivano Malavolta

Rsse12.ppt

Ptidej Team

Software testing

thaneofife

Check upload1

Nitish Bhardwaj

Prvt file test

dummyuser1analytics

A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...

Shakas Technologies

ch_02 Machine Learning Overview.pdf

AhmedSalah48055

10.1.1.124.4940

Swaraj Kumar

Rsse12.ppt

Yann-Gaël Guéhéneuc

More from CS, NcState

Talks2015 novdec
