Continuous Test Suite Failure Prediction
Cong Pan1*, Michael Pradel2
1School of Reliability and Systems Engineering, Beihang University, China
2Department of Computer Science, University of Stuttgart, Germany
*Parts of this work were done while visiting University of Stuttgart
07/17/2021 ISSTA 2021 1
cong_pan@buaa.edu.cn michael@binaervarianz.de
Do We Really Need to Execute Test Suites for Every
Code Change?
[1] Jeff Anderson, Saeed Salem, and Hyunsook Do. Striving for failure: an industrial case study about test failure prediction.
[2] Mateusz Machalica, Alex Samylkin, Meredith Porth, and Satish Chandra. Predictive test selection.
[3] https://github.com/search, January 2021
* We collect a dataset from Travis CI and GitHub, which includes 15,000 test suite runs from 242 open-source projects.
The Dynamics AX project has nearly 65,000
regression test cases, which take 3 days to execute [1]
The Facebook mobile code base receives over 10,000 code changes
per week, each potentially triggering over 10,000 test cases [2]
Over 205 million projects are hosted on GitHub, many of which
use GitHub Actions for continuous integration [3]
Around 4.21% of test suite invocations turn a previously
passing test suite into a failing test suite*
Terminology and Problem Statement
[Diagram: Continuous Test Suite Failure Prediction] Input: a code change and a test suite. A classification model predicts test suite failure or test suite pass.
[Diagram: Test Case Failure Prediction] Input: a code change and a test case. A classification model predicts test case failure or test case pass.
Continuous test suite failure prediction: Given a code change 𝑐 and a test suite 𝑠, the problem
is to predict whether running 𝑠 on 𝑐 as part of continuous integration will result in 𝑠 passing or failing.
[Diagram: Test Suite Selection] Input: a code change and test suites 1 to N. A classification model predicts failure or pass for each test suite i.
Workflow of Our Approach
• 44 features from 9 categories
• Features are adapted from just-in-time defect prediction and test case
failure prediction, plus 19 newly added features
• Code change features
• Development history features
• Test features
Feature Extraction
28 features in 6 categories
Code Change Features
*DP=Defect Prediction, TC = Test Case Failure Prediction
7 features in 2 categories
Development History Features
9 features in Test category
Test Features
• Two principles:
• Predict test suite results instead of build results
• Count only first test failures to identify the exact bug-inducing code change
Label Extraction
Aim: label test suites as pass or failure based on historical test results
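A minimal sketch of the "count only first test failures" principle, assuming each test suite's history is available as a chronological list of pass/fail outcomes. The function name `label_first_failures`, the 0/1 label encoding, and the choice to exclude repeated failures (marked `None`) rather than relabel them are illustrative assumptions; the slides do not specify these details.

```python
from typing import List, Optional

PASS, FAIL = 0, 1  # assumed label encoding


def label_first_failures(outcomes: List[bool]) -> List[Optional[int]]:
    """Label a chronological list of test suite outcomes (True = passed).

    A run is labelled FAIL only when it fails directly after a passing run,
    so the label points at the code change that introduced the failure.
    Runs that fail while the suite is already broken are excluded (None),
    as is a failing first run whose predecessor is unknown.
    """
    labels: List[Optional[int]] = []
    previous_passed: Optional[bool] = None  # unknown before the first run
    for passed in outcomes:
        if passed:
            labels.append(PASS)
        elif previous_passed:
            labels.append(FAIL)  # pass -> fail transition: a first failure
        else:
            labels.append(None)  # repeated failure or unknown predecessor
        previous_passed = passed
    return labels


history = [True, True, False, False, True, False]
print(label_first_failures(history))  # [0, 0, 1, None, 0, 1]
```

Excluding repeated failures keeps a long broken streak from being attributed to every later code change, which is one plausible reading of the principle above.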
Statistics of the Dataset
• Machine learning models
• Decision Tree, Naive Bayes, Support Vector Machine, Logistic Regression, Random
Forest, Multi-layer Perceptron, LightGBM
• 10 × 5 cross-validation
• Evaluation metrics
• AUC, F-measure, G-measure, MCC
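The four evaluation metrics can be sketched in plain Python as follows. This is an illustration rather than the evaluation code behind the slides: the helper names are hypothetical, and G-measure is assumed here to be the harmonic mean of recall and specificity (one common definition in defect prediction), since the slides do not define it.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 = failure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return num / den if den else 0.0


def f_measure(tp: int, fp: int, tn: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return _ratio(2 * precision * recall, precision + recall)


def g_measure(tp: int, fp: int, tn: int, fn: int) -> float:
    """Assumed definition: harmonic mean of recall and specificity."""
    recall = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return _ratio(2 * recall * specificity, recall + specificity)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient."""
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return _ratio(tp * tn - fp * fn, den)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random failing run is scored above a random
    passing run, with ties counting half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores for eight runs; a run is predicted to fail if score >= 0.5.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.3, 0.4, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
counts = confusion(y_true, y_pred)
print(f_measure(*counts), g_measure(*counts), mcc(*counts), auc(y_true, scores))
```

AUC is computed from the raw scores, while the other three metrics depend on the chosen classification threshold.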
Experimental Setup
The test suite failure prediction model is effective. The best studied model, LightGBM, achieves an
AUC of 0.836.
The effect of different classification thresholds
on prediction performance
Effectiveness of different classification models
RQ1: How Effective are the Prediction Models?
RQ1: How Effective are the Prediction Models?
With time-based data splitting, the model is still effective, but provides slightly worse predictions
due to the smaller size of the training data set.
• Use adjacent time steps for training and testing
  • Use samples in one time step for training and samples in the new step for testing
• Use data from the same time step for training and testing
  • Use the first 80% of samples within a time step for training and the last 20% for testing
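The two time-based splitting schemes can be sketched as below, assuming samples are already sorted chronologically and grouped into time steps. The function names and the representation of a time step as a plain list are illustrative assumptions.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_within_step(
    samples: Sequence[T], train_fraction: float = 0.8
) -> Tuple[List[T], List[T]]:
    """Same time step: the earliest samples train, the latest samples test."""
    cut = int(len(samples) * train_fraction)
    return list(samples[:cut]), list(samples[cut:])


def split_adjacent_steps(
    steps: Sequence[Sequence[T]],
) -> List[Tuple[List[T], List[T]]]:
    """Adjacent time steps: train on step i, test on step i + 1."""
    return [(list(steps[i]), list(steps[i + 1])) for i in range(len(steps) - 1)]


train, test = split_within_step(list(range(10)))
print(train, test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
print(split_adjacent_steps([[1, 2], [3], [4, 5]]))
```

In both schemes the training data precedes the test data in time, so the model never sees results from the future.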
Simply reusing features from related domains yields a less effective model than our full feature set.
Features known from prior work vs. full feature set
RQ2: How Effective are the Features?
RQ2: How Effective are the Features?
Information about the developer experience, previous test results, and abundance of test
cases is most important for an effective prediction model.
Most important features: test features (TF10, TC, TP10, TF) and experience features (REXP,
SEXP, EXP)
The cost model shows when continuous test suite failure prediction is effective in practice.
The input parameters include:
• Cost 𝑟 of running a test suite
• Computational cost, maintenance cost, developer & development process cost
• Cost 𝑑 of delayed detection of a failure-inducing code change
• Failure localization, developer memories
• Test suite failure rate 𝑓 at which code changes cause test suite failures
• 𝑓=4.21% in our study
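To make the role of 𝑟, 𝑑, and 𝑓 concrete, here is a deliberately simplified expected-cost sketch. It is not the cost model from the talk: the per-change cost formulas, the `recall` and `fpr` parameters, and all function names are assumptions made for illustration, and the PERIOD and RANDOM strategies are left out. The interval it produces will therefore not match the figures reported for the real model.

```python
from typing import Tuple


def cost_all(r: float, d: float, f: float) -> float:
    """ALL: run the suite on every change, so no failure is ever delayed."""
    return r


def cost_never(r: float, d: float, f: float) -> float:
    """NEVER: skip every run and pay the delay cost for each failing change."""
    return f * d


def run_rate(f: float, recall: float, fpr: float) -> float:
    """Fraction of changes for which the model asks to run the suite."""
    return f * recall + (1.0 - f) * fpr


def cost_model(r: float, d: float, f: float, recall: float, fpr: float) -> float:
    """MODEL: run only when a failure is predicted; missed failures are delayed."""
    return r * run_rate(f, recall, fpr) + d * f * (1.0 - recall)


def cost_saving_interval(f: float, recall: float, fpr: float) -> Tuple[float, float]:
    """Range of d/r in which MODEL beats both NEVER and ALL in this sketch.

    Requires f > 0 and 0 < recall < 1.
    """
    p = run_rate(f, recall, fpr)
    lower = p / (f * recall)                  # from cost_model < cost_never
    upper = (1.0 - p) / (f * (1.0 - recall))  # from cost_model < cost_all
    return lower, upper


# f comes from the slides; recall and fpr are hypothetical model properties.
r, d, f = 2.0, 20.0, 0.0421
print(cost_all(r, d, f), cost_never(r, d, f), cost_model(r, d, f, recall=0.7, fpr=0.1))
print(cost_saving_interval(f, recall=0.7, fpr=0.1))
```

Even in this reduced form, the prediction pays off only inside an interval of 𝑑/𝑟: if delayed detection is cheap relative to a run, never running wins, and if it is very expensive, always running wins.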
Cost Model
Cost Model Strategies
Strategy Comparison
MODEL strategy vs PERIOD strategy
MODEL strategy vs other strategies (ALL, PERIOD, NEVER, RANDOM)
[Plot annotations: boundary conditions and the interval between them]
Instantiating the theoretical cost model with real-world data shows that the predictive model is
cost-effective if 2.47 < 𝑑/𝑟 < 37.58.
RQ3: (When) Is the Model Cost-Saving?
Example: suppose 𝑟 = 2 person-hours and 𝑑 = 20 person-hours;
then 𝑑/𝑟 = 10, and MODEL is the best strategy
• Flaky tests
• Marked in test reports: 0.067%
• Execute a code change multiple times and get different test results: <1%
• Dataset size and programming language
• Focus on open-source projects
• Theoretical abstraction of real costs
Threats to Validity
• We define the problem of continuous test suite failure prediction
• We share a large-scale dataset gathered from 242 real-world projects
• Based on the proposed features, our approach improves over baselines
that use features for just-in-time defect prediction and test case failure
prediction by 13.9% and 2.9%, respectively
• We present a cost model showing that our results could be useful in real
scenarios
Conclusion
Acknowledgements
Thank you!

Continuous test suite failure prediction

  • 1. Continuous Test Suite Failure Prediction Cong Pan1*, Michael Pradel2 1School of Reliability and Systems Engineering, Beihang University, China 2Department of Computer Science, University of Stuttgart, Germany *Parts of this work were done while visiting University of Stuttgart 07/17/2021 ISSTA 2021 1 cong_pan@buaa.edu.cn michael@binaervarianz.de
  • 2. Do We Really Need to Execute Test Suites for Every Code Change? ISSTA 2021 2 [1] Jeff Anderson, Saeed Salem, and Hyunsook Do. Striving for failure: an industrial case study about test failure prediction. [2] Mateusz Machalica, Alex Samylkin, Meredith Porth, and Satish Chandra. Predictive test selection. [3] https://github.com/search, January 2021 * We collect a dataset from Travis CI and GitHub, which includes 15,000 test suite runs from 242 open-source projects. The Dynamics AX project has nearly 65,000 regression test cases, takes 3 days to execute[1] The Facebook mobile code base has over 10,000 code changes per week, each potentially triggers over 10,000 test cases[2] Over 205 million projects on GitHub, many of them use GitHub Actions for continuous integration[3] Around 4.21% test suite invocations turn a previous passing test suite into a failing test suite* 07/17/2021
  • 3. Terminology and Problem Statement
      • Continuous test suite failure prediction: given a code change 𝑐 and a test suite 𝑠, the problem is to predict whether triggering 𝑠 as part of continuous integration upon 𝑐 will pass or fail 𝑠.
      • [Diagram] Continuous Test Suite Failure Prediction: a code change and a test suite are the input to a classification model, which predicts test suite failure or test suite pass.
      • [Diagram] Test Case Failure Prediction: a code change and a test case are the input to a classification model, which predicts test case failure or test case pass.
      • [Diagram] Test Suite Selection: a code change and test suites 1, 2, ..., N are the input to a classification model, which predicts failure or pass for each test suite i.
  • 4. Workflow of Our Approach
  • 5. Feature Extraction
      • 44 features from 9 categories
      • Features are adapted from just-in-time defect prediction and test case failure prediction, plus 19 newly added features
      • Three groups: code change features, development history features, test features
  • 6. Code Change Features: 28 features in 6 categories (DP = Defect Prediction, TC = Test Case Failure Prediction)
  • 7. Development History Features: 7 features in 2 categories (DP = Defect Prediction, TC = Test Case Failure Prediction)
  • 8. Test Features: 9 features in the Test category (DP = Defect Prediction, TC = Test Case Failure Prediction)
  • 9. Label Extraction
      • Aim: label test suites as pass or failure based on historical test results
      • Two principles:
          • Predict test suite results instead of build results
          • Count only first test failures to identify the exact bug-inducing code change
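
The "count only first test failures" principle can be illustrated with a minimal sketch. It assumes the history of one test suite is a chronological list of `"passed"`/`"failed"` strings; the function name `label_runs` and the choice to exclude repeated failures (and a failing first run) from the labeled data are assumptions made for illustration, not details confirmed by the slides.

```python
from typing import List, Optional


def label_runs(results: List[str]) -> List[Optional[int]]:
    """Label each run: 0 = pass, 1 = first failure, None = excluded.

    Assumption: a failure counts as a "first failure" only when the
    immediately preceding run of the same suite passed. Failures that
    follow another failure, or that have no predecessor, are excluded
    because the code change that induced them cannot be pinpointed.
    """
    labels: List[Optional[int]] = []
    previous: Optional[str] = None  # result of the preceding run, if any
    for result in results:
        if result == "passed":
            labels.append(0)
        elif previous == "passed":
            labels.append(1)  # pass -> fail transition
        else:
            labels.append(None)  # repeated failure or unknown predecessor
        previous = result
    return labels


history = ["passed", "failed", "failed", "passed", "failed"]
labels = label_runs(history)
```

Under these assumptions, only the second and fifth runs in `history` are labeled as failures; the third run is dropped because the suite was already failing.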
  • 10. Statistics of the Dataset
  • 11. Experimental Setup
      • Machine learning models: Decision Tree, Naive Bayes, Support Vector Machine, Logistic Regression, Random Forest, Multi-layer Perceptron, LightGBM
      • 10*5 cross validation
      • Evaluation metrics: AUC, F-measure, G-measure, MCC
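
Three of the listed metrics can be computed directly from a confusion matrix, as the sketch below shows. The slides do not define the metrics, so the definitions here are assumptions: F-measure as the harmonic mean of precision and recall, G-measure as the harmonic mean of recall and specificity (a definition common in defect prediction work), and MCC in its standard form. AUC is omitted because it needs ranked prediction scores rather than a single confusion matrix. The function name `confusion_metrics` is hypothetical.

```python
import math
from typing import Dict


def _harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative values, 0.0 if both are zero."""
    return 2 * a * b / (a + b) if (a + b) > 0 else 0.0


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute F-measure, G-measure and MCC; 'positive' means suite failure."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # Assumed definition: harmonic mean of recall and specificity.
    g_measure = _harmonic_mean(recall, specificity)
    f_measure = _harmonic_mean(precision, recall)

    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0

    return {"f_measure": f_measure, "g_measure": g_measure, "mcc": mcc}


scores = confusion_metrics(tp=30, fp=10, tn=50, fn=10)
```

With a failure rate as low as the one reported in this study, plain accuracy would be dominated by the passing class, which is one reason to prefer metrics such as these.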
  • 12. RQ1: How Effective are the Prediction Models?
      • The test suite failure prediction model is effective. The best studied model, LightGBM, achieves an AUC of 0.836.
      • [Figure] Effectiveness of different classification models
      • [Figure] The effect of different classification thresholds on prediction performance
  • 13. RQ1: How Effective are the Prediction Models? (continued)
      • With time-based data splitting, the model is still effective, but provides slightly worse predictions due to the smaller size of the training data set.
      • Adjacent time steps for training and testing: use the samples in one time step for training and the samples in the following step for testing.
      • Same time step for training and testing: use the first 80% of the samples within a time step for training and the last 20% for testing.
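
The two time-based splitting schemes can be sketched as follows. The sketch assumes the samples are already sorted chronologically and that a "time step" is a contiguous chunk of that sequence; how the study actually delimits time steps is not stated on the slide, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Split = Tuple[List[T], List[T]]  # (training samples, test samples)


def time_steps(samples: Sequence[T], n_steps: int) -> List[List[T]]:
    """Cut chronologically ordered samples into n_steps contiguous chunks."""
    size, extra = divmod(len(samples), n_steps)
    steps: List[List[T]] = []
    start = 0
    for i in range(n_steps):
        end = start + size + (1 if i < extra else 0)  # spread the remainder
        steps.append(list(samples[start:end]))
        start = end
    return steps


def adjacent_step_splits(steps: List[List[T]]) -> List[Split]:
    """Train on one time step, test on the step that follows it."""
    return [(steps[i], steps[i + 1]) for i in range(len(steps) - 1)]


def within_step_splits(steps: List[List[T]], train_fraction: float = 0.8) -> List[Split]:
    """Train on the first part of a time step, test on the rest of it."""
    splits: List[Split] = []
    for step in steps:
        cut = int(len(step) * train_fraction)
        splits.append((step[:cut], step[cut:]))
    return splits


steps = time_steps(list(range(10)), n_steps=2)
adjacent = adjacent_step_splits(steps)
within = within_step_splits(steps)
```

In both schemes the test samples are strictly later than the training samples, so the model is never evaluated on data that precedes what it was trained on.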
  • 14. RQ2: How Effective are the Features?
      • Simply reusing features from related domains yields a less effective model than our full feature set.
      • [Figure] Features known from prior work vs. full feature set
  • 15. RQ2: How Effective are the Features? (continued)
      • Information about the developer experience, previous test results, and abundance of test cases is most important for an effective prediction model.
      • Most important features: test features (TF10, TC, TP10, TF) and experience features (REXP, SEXP, EXP)
  • 16. Cost Model
      • The cost model shows when continuous test suite failure prediction is effective in practice. The input parameters include:
          • Cost 𝑟 of running a test suite: computational cost, maintenance cost, developer and development process cost
          • Cost 𝑑 of delayed detection of a failure-inducing code change: failure localization, developer memories
          • Test suite failure rate 𝑓 at which code changes cause test suite failures: 𝑓 = 4.21% in our study
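
To make the roles of 𝑟, 𝑑, and 𝑓 concrete, here is a deliberately simplified sketch of an expected cost per code change. It is not the cost model from the talk, whose formulas do not appear in this transcript. The assumptions: running the suite always costs `r`, every failing change whose suite run is skipped costs `d`, and a predictive model is summarized by two hypothetical parameters, `flag_rate` (fraction of changes for which the suite is run) and `recall` (fraction of failing changes that are flagged).

```python
def cost_all(r: float) -> float:
    """Run the suite on every change: always pay r, never pay d."""
    return r


def cost_never(d: float, f: float) -> float:
    """Never run the suite: every failing change (rate f) is detected late."""
    return f * d


def cost_model(r: float, d: float, f: float, flag_rate: float, recall: float) -> float:
    """Run the suite only on flagged changes (simplified assumption).

    Pays r for each flagged change and d for each failing change
    that the model misses.
    """
    return flag_rate * r + f * (1.0 - recall) * d


# Example values: r and d in person hours, f as reported in the study.
# flag_rate and recall are made-up illustration values, not study results.
r, d, f = 2.0, 20.0, 0.0421
costs = {
    "ALL": cost_all(r),
    "NEVER": cost_never(d, f),
    "MODEL": cost_model(r, d, f, flag_rate=0.2, recall=0.8),
}
```

Even in this reduced form, the trade-off is visible: a larger `d` relative to `r` penalizes skipped runs, while a smaller one penalizes running the suite unnecessarily.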
  • 18. Strategy Comparison
      • MODEL strategy vs. PERIOD strategy
      • MODEL strategy vs. other strategies (ALL, PERIOD, NEVER, RANDOM)
      • [Figure labels: boundary condition; boundary condition interval]
  • 19. RQ3: (When) Is the Model Cost-Saving?
      • Instantiating the theoretical cost model with real-world data shows that the predictive model is cost-effective if 2.47 < 𝑑/𝑟 < 37.58.
      • Example: suppose 𝑟 = 2 person hours and 𝑑 = 20 person hours. Then 𝑑/𝑟 = 10, and the model is the best strategy.
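
The boundary check from this slide can be written as a small helper. It assumes that the quantity bounded by 2.47 and 37.58 is the ratio 𝑑/𝑟, as the slide's own example (𝑑/𝑟 = 10) suggests; the function name is hypothetical.

```python
LOWER_BOUND = 2.47   # reported lower boundary
UPPER_BOUND = 37.58  # reported upper boundary


def model_is_cost_effective(r: float, d: float) -> bool:
    """Return True if d/r lies strictly inside the reported interval.

    r: cost of running the test suite (e.g. person hours), must be > 0
    d: cost of detecting a failure-inducing change late (same unit)
    """
    if r <= 0:
        raise ValueError("r must be positive")
    return LOWER_BOUND < d / r < UPPER_BOUND


# The example from the slide: r = 2 person hours, d = 20 person hours.
slide_example = model_is_cost_effective(r=2.0, d=20.0)
```

Outside the interval, one of the simpler strategies is preferable under the reported cost model: a small 𝑑/𝑟 means late detection is cheap relative to running the suite, and a large 𝑑/𝑟 means missed failures are too expensive to risk.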
  • 20. Threats to Validity
      • Flaky tests
          • Marked as flaky in test reports: 0.067%
          • Re-executing a code change multiple times yields different test results: <1%
      • Dataset size and programming language
      • Focus on open-source projects
      • Theoretical abstraction of real costs
  • 21. Conclusion
      • We define the problem of continuous test suite failure prediction
      • We share a large-scale dataset gathered from 242 real-world projects
      • Based on the proposed features, our approach improves over baselines that use features for just-in-time defect prediction and test case failure prediction by 13.9% and 2.9%, respectively
      • We present a cost model showing that our results could be useful in real scenarios