Search-based SE: without search, you won’t find a thing.
“Engineering is optimization and optimization is search.”
ai4se.net
On Strategies To Improve
Software Defect Prediction
Rahul Krishna
PhD Scholar
Dept. Computer Science
Overview
• Motivation
• Research Questions
• Background
• Data Sets
• Experimental Setup
• Experimental Results
MOTIVATION
Why Defect Prediction?
• Boehm and Papaccio[1] note that detecting defects early reduces the cost
of fixing them at a later stage “by a factor of up to 200”
• IEEE Metrics 2002 concluded that “Finding and fixing bugs after
delivery is usually 100 times more expensive than doing so at the
requirements and design phase”[2]
• Shull et al.[2] claim that “About 40-50% of the user programs enter
use with nontrivial defects”
• In the agile world, code is written faster than it is tested
• The takeaway: find bugs early!
[1] B. W. Boehm and P. N. Papaccio, “Understanding and controlling software costs,” IEEE Trans. Softw. Eng., vol. 14, no. 10, pp. 1462–1477, Oct. 1988.
[2] F. Shull, V. Basili, B. Boehm, A. W. Brown, P. Costa, M. Lindvall, D. Port, I. Rus, R. Tesoriero, and M. Zelkowitz, “What we have learned about fighting defects,” in Proc. Eighth IEEE Symp. on Software Metrics, 2002, pp. 249–258.
Easier said than done...
• No oracles or closed-form mathematical models.
• Expert opinion would take too long to obtain.
• There is far too much data
– GitHub has over 9 million users and 21.1 million repositories.
• Develop efficient code analysis measures
• Use Machine Learning tools
– Algorithms are too generic and need optimization
• But real-world data is skewed
– “80% of the defects lie in only 20% of the modules”
– Not enough defective samples in a project to learn meaningful
patterns
Research Questions
• RQ1: Can techniques such as SMOTE be used to
preprocess data to improve prediction accuracy?
• RQ2: Does tuning a data miner improve its
prediction accuracy?
• RQ3: Can tuning be performed in conjunction with
SMOTE to further improve the prediction accuracy?
• RQ4: Is SMOTE limited only to defect prediction?
BACKGROUND
Defect Prediction
• Models are hard to obtain, too complex, and unreliable.
• Different regions of the same data have different properties[1]
• A plausible solution:
– Use case-based reasoning (CBR)
– Learn from past data and apply it to new data
• They’re pretty neat
– Can work with partial data (useful at early stages)[2]
– Can work with sparse samples[3]
– Rather robust
[1] T. Menzies, A. Butcher, D. Cok, A. Marcus, L. Layman, F. Shull, B. Turhan, and T. Zimmermann, “Local versus global lessons for defect
prediction and effort estimation,” Software Engineering, IEEE Transactions on, vol. 39, no. 6, pp. 822 – 834, June 2013.
[2] F. Walkerden and R. Jeffery, “An empirical study of analogy based software effort estimation,” Empirical Software Engineering, vol. 4, no. 2, pp. 135–158, 1999.
[3] I. Myrtveit, E. Stensrud, and M. Shepperd, “Reliability and validity in comparative studies of software prediction models,” Software
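Case-based reasoning is commonly implemented as a nearest-neighbor lookup over past cases. The following is a minimal sketch under that assumption: the `cbr_predict` helper, Euclidean distance, and the k = 3 majority vote are illustrative choices, not taken from the slides or the cited papers.

```python
import math
from collections import Counter


def cbr_predict(case_base, query, k=3):
    """Predict a label for `query` from its k most similar past cases.

    case_base: list of (features, label) pairs from past releases.
    Returns the majority label among the k nearest cases (Euclidean distance).
    Features are assumed to be numeric; in practice they should be
    normalized first so that large-valued metrics do not dominate.
    """
    if not case_base:
        raise ValueError("case base is empty")
    nearest = sorted(case_base, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical past modules: (lines of code, complexity) -> defective? (0/1)
    past = [((10, 1), 0), ((12, 2), 0), ((11, 1), 0),
            ((200, 30), 1), ((180, 25), 1), ((210, 28), 1)]
    print(cbr_predict(past, (15, 2)))
    print(cbr_predict(past, (190, 27)))
```

Because the prediction only needs the stored cases nearest to the query, this style of learner can still produce an answer when only part of the data is available, which is the property the slide highlights.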
Defect Prediction
• Lessmann et al.[1] compared 21 different learners for software
defect prediction.
• They found Random Forest (RF) to be the best and CART to be the worst
• That’s strange!
– They’re both tree-based learners
– One is deterministic, the other is randomized
– But surely they can’t be at opposite ends of the spectrum. Can they?
• It’s probably the data
– It’s always the data
• Maybe the predictors need to be calibrated
[1] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, “Benchmarking classification models for software defect prediction: A proposed framework
and novel findings,” Software Engineering, IEEE Transactions on, vol. 34, no. 4, pp. 485–496, July 2008
Class Imbalance in Data
• Too many samples of non-defective modules
• Trees constructed by CART and RF would be
severely biased toward the majority (non-defective) class
• Use SMOTE[1] to preprocess training data
– Upsample minority class by creating “synthetic”
samples
– Downsample majority class by randomly discarding
samples
• My criterion (my infallible engineering judgment):
– At least 50 samples from minority class
– At most 100 samples from majority class
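The rebalancing step can be sketched in plain Python. This is a simplified, hypothetical `smote_rebalance` helper, assuming numeric feature tuples and the thresholds stated above (at least 50 minority samples, at most 100 majority samples); it is not the implementation used in this study, and libraries such as imbalanced-learn ship a full SMOTE.

```python
import math
import random


def _nearest(i, pool, k):
    """Indices of the k samples in `pool` closest to pool[i], excluding i."""
    others = [j for j in range(len(pool)) if j != i]
    others.sort(key=lambda j: math.dist(pool[i], pool[j]))
    return others[:k]


def smote_rebalance(minority, majority, min_count=50, max_count=100,
                    k=5, seed=0):
    """Rebalance a two-class TRAINING set in the spirit of SMOTE.

    Upsample: add synthetic minority samples until there are at least
    `min_count`; each lies on the line segment between a real minority
    sample and one of its k nearest minority neighbors.
    Downsample: randomly keep at most `max_count` majority samples.
    """
    if len(minority) < 2:
        raise ValueError("need at least 2 minority samples to interpolate")
    rng = random.Random(seed)
    real = [tuple(s) for s in minority]
    upsampled = list(real)
    while len(upsampled) < min_count:
        i = rng.randrange(len(real))
        j = rng.choice(_nearest(i, real, k))
        gap = rng.random()  # random position along the segment, in [0, 1)
        upsampled.append(tuple(x + gap * (y - x)
                               for x, y in zip(real[i], real[j])))
    kept = list(majority)
    if len(kept) > max_count:
        kept = rng.sample(kept, max_count)  # random undersampling
    return upsampled, kept
```

Neighbors are recomputed for every synthetic sample, which is quadratic but keeps the sketch short. Only the training split should be passed in; the test data stays untouched.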
Parameter Tuning
• SMOTE preprocesses the training data
• Tuning calibrates the predictor
• Automate calibration using metaheuristics
– Differential Evolution (DE) is a popular and simple optimizer
• Use the training data to learn the best parameters for the
predictor
• Test data must never be revealed during tuning
– Only datasets with 3 or more historic versions are used
– The latest version is used for testing; all earlier versions are used for
training
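The version-based split can be sketched as follows, assuming each release is given as a `(version_name, rows)` pair ordered oldest first; `split_by_version` is a hypothetical helper, not part of the original study's code.

```python
def split_by_version(versions):
    """Split an ordered list of releases into training and test rows.

    versions: list of (version_name, rows) pairs, oldest first.
    Every release except the latest is used for training; the latest one
    is held out for testing and must not be seen while SMOTEing or tuning.
    """
    if len(versions) < 3:
        raise ValueError("need at least 3 historic versions")
    train_rows = [row for _, rows in versions[:-1] for row in rows]
    test_rows = list(versions[-1][1])
    return train_rows, test_rows


if __name__ == "__main__":
    releases = [("1.0", ["a", "b"]), ("1.1", ["c"]), ("1.2", ["d", "e"])]
    print(split_by_version(releases))
```

Any tuning (for example, scoring candidate parameters with DE) would then have to carve its own validation data out of `train_rows`, so the held-out release is used exactly once, for the final evaluation.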
Differential Evolution
(in a nutshell)
1. Randomly generate a population of candidate parameter settings
2. For each candidate, pick three other candidates a, b, c and create a new
candidate by extrapolation (a + F·(b - c)), mixed with the old one
3. If the new candidate performs better than
the old one, discard the old one
4. If not, discard the new one
5. Repeat 2–4
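The steps above can be sketched as the classic DE/rand/1/bin variant. This is a minimal illustration, not the study's tuner: the function name and the defaults (population 10, F = 0.75, CR = 0.3, 50 generations) are assumptions chosen for the example.

```python
import random


def differential_evolution(score, bounds, pop_size=10, f=0.75, cr=0.3,
                           generations=50, seed=0):
    """Maximize `score` within `bounds` using classic DE (rand/1/bin).

    score:  maps a candidate (list of floats, one per tunable parameter)
            to a number; higher is better.
    bounds: list of (low, high) pairs, one per parameter.
    """
    if pop_size < 4:
        raise ValueError("DE needs at least 4 candidates")
    rng = random.Random(seed)
    dims = len(bounds)

    def clip(value, d):
        low, high = bounds[d]
        return min(max(value, low), high)

    # Step 1: random initial population of candidate settings.
    pop = [[rng.uniform(low, high) for low, high in bounds]
           for _ in range(pop_size)]
    scores = [score(cand) for cand in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Step 2: pick three other candidates and extrapolate a + f*(b - c).
            others = [pop[j] for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(dims)  # at least one parameter always changes
            trial = [
                clip(a[d] + f * (b[d] - c[d]), d)
                if d == forced or rng.random() < cr
                else pop[i][d]
                for d in range(dims)
            ]
            # Steps 3-4: keep whichever of the old and new candidate scores better.
            trial_score = score(trial)
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


if __name__ == "__main__":
    # Toy objective with its maximum at (3, -1).
    best, best_score = differential_evolution(
        lambda x: -((x[0] - 3) ** 2 + (x[1] + 1) ** 2),
        [(-10, 10), (-10, 10)], generations=100)
    print(best, best_score)
```

When tuning a learner, `score` would (hypothetically) train CART or RF with the candidate parameters on part of the training data and return a measure such as G on the remainder, so the test release never influences the search.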
DATASETS
Datasets
• 8 Defect Prediction Datasets:
1. Ant
2. Ivy
3. Jedit
4. Lucene
5. Poi
6. Synapse
7. Velocity
8. Xalan
• 1 Bugzilla dataset (Thanks Chris!)
The Metrics
EXPERIMENTAL SETUP
Statistical Measures
• Let A, B, C, D denote true negatives, false negatives, false positives, and true positives, respectively
• The standard measures:
• F and G account for errors on both defective and non-defective modules at once. Recall and specificity each
measure only one.
• G is especially useful: it is the harmonic mean of recall and specificity.
• G is pulled toward the lower of recall and specificity (it never exceeds their arithmetic mean).
– So a high G implies that both recall and specificity are high. Which is good!
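The formulas on this slide did not survive extraction. The sketch below assumes the standard definitions used in defect-prediction work: recall (pd) = D/(B+D), fallout (pf) = C/(A+C), specificity = A/(A+C) = 1 - pf, precision = D/(C+D), F as the harmonic mean of precision and recall, and G as the harmonic mean of recall and specificity. `defect_measures` is a hypothetical helper, and treating undefined ratios as 0 is an assumption.

```python
def defect_measures(a, b, c, d):
    """Standard defect-prediction measures from a confusion matrix.

    a = true negatives, b = false negatives,
    c = false positives, d = true positives.
    Ratios with a zero denominator are treated as 0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    def harmonic(x, y):
        return ratio(2 * x * y, x + y)

    recall = ratio(d, b + d)        # pd, a.k.a. sensitivity
    fallout = ratio(c, a + c)       # pf, false alarm rate
    specificity = ratio(a, a + c)   # 1 - pf when a + c > 0
    precision = ratio(d, c + d)
    return {
        "recall": recall,
        "fallout": fallout,
        "specificity": specificity,
        "precision": precision,
        "f": harmonic(precision, recall),
        "g": harmonic(recall, specificity),
    }


if __name__ == "__main__":
    print(defect_measures(80, 5, 10, 5))
```

For A = 80, B = 5, C = 10, D = 5, recall is 0.5, specificity is 80/90 (about 0.889), and G is 0.64: between the two, but closer to the lower one.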
EXPERIMENTAL RESULTS
Defect Datasets
• RQ1: Can techniques such as SMOTE be used to preprocess data to
improve prediction accuracy?
– RF was better than CART in 6 out of the 8 datasets.
– SMOTE improved performance in 4 of those 6 datasets.
• RQ2: Does tuning a data miner improve its prediction accuracy?
– Not always; tuning alone didn’t help
• RQ3: Can tuning be performed in conjunction with SMOTE to further
improve the prediction accuracy?
– Yes. In 6 out of the 8 datasets, SMOTE + tuning helped
Security Flaws Dataset
Conclusion
• Defect Datasets
– SMOTEing is beneficial
– Tuning alone is not very useful
– The combination of both works even better
• Security Flaws Dataset
– Improves sensitivity tenfold
• In summary:
– Always reflect on the data
– Calibrate your predictor before use


  • 1. Search-based SE: without search, you won’t find a thing. “Engineering is optimization and optimization is search.” ai4se.net On Strategies To Improve Software Defect Prediction Rahul Krishna PhD Scholar Dept. Computer Science
  • 2. Overview • Motivation • Research Questions • Background • Data Sets • Experimental Setup • Experimental Results
  • 3. MOTIVATION
  • 4. Why Defect Prediction? • Boehm and Papaccio [1] observe that early detection can reduce the cost of fixing a defect at a later stage by a factor of up to 200. • IEEE Metrics 2002 concluded that finding and fixing bugs after delivery is usually 100 times more expensive than doing so during the requirements and design phase [2]. • Shull et al. [2] report that about 40–50% of user programs enter use with nontrivial defects. • In the agile world, code bases are developed more than they are tested. • The takeaway: find bugs early! [1] B. W. Boehm and P. N. Papaccio, “Understanding and controlling software costs,” IEEE Trans. Softw. Eng., vol. 14, no. 10, pp. 1462–1477, Oct. 1988. [2] F. Shull, V. Basili, B. Boehm, A. W. Brown, P. Costa, M. Lindvall, D. Port, I. Rus, R. Tesoriero, and M. Zelkowitz, “What we have learned about fighting defects,” in Proc. 8th IEEE Symp. Software Metrics, 2002, pp. 249–258.
  • 5. Easier said than done. • There are no oracles or closed-form mathematical models. • Gathering expert opinion would take too long. • There is far too much data – GitHub has over 9 million users and 21.1 million repositories. • Develop efficient code analysis measures. • Use machine learning tools – Off-the-shelf algorithms are too generic and need optimization. • But real-world data is skewed – “80% of the defects lie in only 20% of the modules” – A single project has too few defective samples to learn meaningful patterns.
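The 80/20 skew quoted above can be checked directly on a project's defect counts. The following is a minimal sketch that assumes a hypothetical list of per-module defect counts, not data from the study. It sorts modules from most to least defective and reports the smallest fraction of modules that together hold a target share of all defects.

```python
def fraction_of_modules_for_defects(defect_counts, share=0.8):
    """Return the smallest fraction of modules holding `share` of all defects."""
    total = sum(defect_counts)
    if total == 0:
        return 0.0
    running = 0
    # Walk modules from most to least defective, accumulating their defects.
    for i, count in enumerate(sorted(defect_counts, reverse=True), start=1):
        running += count
        if running >= share * total:
            return i / len(defect_counts)
    return 1.0


# Hypothetical per-module defect counts for a 10-module project.
counts = [50, 35, 5, 3, 2, 2, 1, 1, 1, 0]
print(fraction_of_modules_for_defects(counts))  # 0.2
```

A result well below 0.8 indicates that most defects are concentrated in a few modules. It also means that a learner sees relatively few defective examples to learn from.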
  • 6. Research Questions • RQ1: Can techniques such as SMOTE be used to preprocess data to improve prediction accuracy? • RQ2: Does tuning a data miner improve its prediction accuracy? • RQ3: Can tuning be combined with SMOTE to further improve prediction accuracy? • RQ4: Is SMOTE useful only for defect prediction?
  • 7. BACKGROUND
  • 8. Defect Prediction • Models are hard to obtain, too complex, and unreliable. • Different regions of the same data have different properties [1]. • A plausible solution: – Use case-based reasoning (CBR) – Learn from past data and apply it to new data • CBR has useful properties – It can work with partial data (useful at early stages) [2] – It can work with sparse samples [3] – It is fairly robust [1] T. Menzies, A. Butcher, D. Cok, A. Marcus, L. Layman, F. Shull, B. Turhan, and T. Zimmermann, “Local versus global lessons for defect prediction and effort estimation,” IEEE Trans. Softw. Eng., vol. 39, no. 6, pp. 822–834, Jun. 2013. [2] F. Walkerden and R. Jeffery, “An empirical study of analogy based software effort estimation,” Empirical Softw. Eng., vol. 4, no. 2, pp. 135–158, 1999. [3] I. Myrtveit, E. Stensrud, and M. Shepperd, “Reliability and validity in comparative studies of software prediction models,” Software
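Case-based reasoning ("learn from past data and apply it to new data") can be sketched as a nearest-neighbour vote. This is a minimal illustration, not the talk's method. The function names, the choice of Euclidean distance, k = 3, and the two hypothetical metrics (lines of code in hundreds, number of methods) are all assumptions.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cbr_predict(past_cases, new_case, k=3):
    """Label new_case by majority vote of its k most similar past cases.

    past_cases is a list of (metrics, label) pairs; label 1 means defective.
    """
    nearest = sorted(past_cases, key=lambda case: distance(case[0], new_case))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical past modules: (lines of code in hundreds, number of methods).
history = [
    ((1.0, 2), 0), ((1.2, 3), 0), ((1.1, 2), 0),
    ((8.0, 20), 1), ((9.0, 25), 1), ((8.5, 22), 1),
]
print(cbr_predict(history, (8.8, 21)))  # 1
print(cbr_predict(history, (1.1, 3)))   # 0
```

Real code metrics live on very different scales, so they would normally be normalized before computing distances. This sketch skips that step.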
  • 9. Defect Prediction • Lessmann et al. [1] compared 21 different learners for software defect prediction. • They found Random Forest (RF) to perform best and CART to perform worst. • That’s strange! – Both are tree-based learners – One is deterministic, the other is randomized – But surely they can’t be at opposite ends of the spectrum. Can they? • It’s probably the data – It’s always the data • Maybe the predictors need to be calibrated [1] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, “Benchmarking classification models for software defect prediction: A proposed framework and novel findings,” IEEE Trans. Softw. Eng., vol. 34, no. 4, pp. 485–496, Jul. 2008.
  • 10. Class Imbalance in Data
  • 11. Class Imbalance in Data • There are far more non-defective modules than defective ones. • Trees built by CART and RF are therefore biased toward the majority (non-defective) class. • Use SMOTE [1] to preprocess the training data – Upsample the minority class by creating “synthetic” samples – Downsample the majority class by randomly discarding samples • My criterion (my infallible engineering judgment) – At least 50 samples from the minority class – At most 100 samples from the majority class
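The rebalancing rule above (at least 50 minority samples, at most 100 majority samples) can be sketched in plain Python. This is a simplified SMOTE-style routine written for illustration, not the implementation used in the talk. Each synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbours, and the majority class is randomly downsampled. The choice k = 5, the fixed seed, and the function names are assumptions.

```python
import random


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, majority, k=5, min_minority=50, max_majority=100, seed=0):
    """Rebalance two lists of numeric feature vectors (a SMOTE-style sketch)."""
    rng = random.Random(seed)
    synthetic = []
    # Upsample: interpolate between a minority sample and one of its
    # k nearest minority neighbours until the class is large enough.
    if len(minority) >= 2:
        while len(minority) + len(synthetic) < min_minority:
            i = rng.randrange(len(minority))
            base = minority[i]
            others = minority[:i] + minority[i + 1:]
            neighbours = sorted(others, key=lambda m: _sq_dist(m, base))[:k]
            partner = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> partner
            synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    new_minority = [list(m) for m in minority] + synthetic
    # Downsample: randomly keep at most max_majority majority samples.
    kept = rng.sample(majority, max_majority) if len(majority) > max_majority else majority
    new_majority = [list(m) for m in kept]
    return new_minority, new_majority


defective = [[x, 2 * x] for x in range(10)]  # 10 hypothetical minority rows
clean = [[100 + x, 0] for x in range(300)]   # 300 hypothetical majority rows
up, down = smote(defective, clean)
print(len(up), len(down))  # 50 100
```

Only the training data should be passed through this step. The test data keeps its natural class distribution.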
  • 12. Parameter Tuning • SMOTE preprocesses the training data. • Tuning calibrates the predictor. • Automate calibration using metaheuristics – Differential Evolution (DE) is a popular and simple optimizer • Use the training data to learn the best parameters for the predictor. • Test data must never be revealed during tuning – Only datasets with 3 or more historic versions are used – The latest version is used for testing; all earlier versions are used for training
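The version-based split keeps the test release hidden from both training and tuning. A minimal sketch, assuming each project is given as a chronologically ordered list of (release name, rows) pairs. The release names below are hypothetical.

```python
def temporal_split(versions, min_versions=3):
    """Split chronologically ordered (name, rows) releases into train/test.

    Returns None for projects with too few releases.
    """
    if len(versions) < min_versions:
        return None
    *history, (_, test_rows) = versions
    train_rows = [row for _, rows in history for row in rows]
    return train_rows, test_rows


releases = [("v1", ["a", "b"]), ("v2", ["c"]), ("v3", ["d", "e"])]
print(temporal_split(releases))      # (['a', 'b', 'c'], ['d', 'e'])
print(temporal_split(releases[:2]))  # None
```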
  • 13. Differential Evolution (in a nutshell) 1. Randomly generate candidate settings (e.g., predictor parameters) 2. Pick any two candidates and create a new candidate by interpolation 3. If the new candidate performs better than the old one, discard the old one 4. If not, discard the new one 5. Repeat steps 2–4
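The loop above can be sketched as standard DE/rand/1/bin. That variant builds each new candidate as a + f·(b − c) from three other population members, which is slightly more than the "two candidates" in the nutshell description. The values f = 0.75 and cr = 0.3, the population size, and clipping to the bounds are assumptions for illustration. For tuning, `score` would train a predictor with the candidate parameters and return something like G on training-only data. Here a toy function stands in for that.

```python
import random


def differential_evolution(score, bounds, pop_size=20, generations=100,
                           f=0.75, cr=0.3, seed=1):
    """Maximise score(candidate) over the box bounds = [(lo, hi), ...]."""
    rng = random.Random(seed)
    dims = len(bounds)
    # Step 1: random initial population of candidate settings.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [score(c) for c in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Step 2: new candidate from three other members, a + f * (b - c).
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            forced = rng.randrange(dims)  # always change at least one value
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    trial.append(min(hi, max(lo, a[d] + f * (b[d] - c[d]))))
                else:
                    trial.append(pop[i][d])
            # Steps 3-4: keep the new candidate only if it scores better.
            s = score(trial)
            if s > scores[i]:
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Toy score with its peak at (3, -1); the result should land close to it.
best, best_score = differential_evolution(
    lambda v: -((v[0] - 3) ** 2 + (v[1] + 1) ** 2),
    bounds=[(-10, 10), (-10, 10)],
)
print([round(x, 2) for x in best])
```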
  • 14. DATASETS
  • 15. Datasets • 8 Defect Prediction Datasets: 1. Ant 2. Ivy 3. Jedit 4. Lucene 5. Poi 6. Synapse 7. Velocity 8. Xalan • 1 Bugzilla dataset (Thanks Chris!)
  • 16. The Metrics
  • 17. EXPERIMENTAL SETUP
  • 18. Statistical Measures • Let A, B, C, D denote true negatives, false negatives, false positives, and true positives. • The standard measures: recall (pd) = D/(B+D); false alarm (pf, fall-out) = C/(A+C); specificity = A/(A+C) = 1 - pf; precision = D/(C+D); F = 2·precision·recall/(precision + recall); G = 2·recall·specificity/(recall + specificity) • F and G measure both defects and non-defects at once. Recall and specificity each measure only one class. • G is especially useful: it is the harmonic mean of recall and specificity. • G never exceeds the arithmetic mean of recall and specificity and is pulled toward the lower of the two – A high G implies that both recall and specificity are high, which is what we want!
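The measures above can be computed directly from the four confusion-matrix counts. This is a minimal sketch in which the counts are hypothetical. Returning 0.0 when a denominator is zero is an assumed convention, not something the slides specify.

```python
def defect_metrics(a, b, c, d):
    """Measures from the confusion matrix: a=TN, b=FN, c=FP, d=TP."""
    def ratio(num, den):
        return num / den if den else 0.0

    recall = ratio(d, b + d)          # pd: defective modules that were caught
    false_alarm = ratio(c, a + c)     # pf / fall-out: clean modules flagged
    specificity = 1 - false_alarm     # clean modules correctly passed
    precision = ratio(d, c + d)
    f = ratio(2 * precision * recall, precision + recall)
    g = ratio(2 * recall * specificity, recall + specificity)
    return {"recall": recall, "pf": false_alarm, "specificity": specificity,
            "precision": precision, "f": f, "g": g}


m = defect_metrics(a=50, b=10, c=20, d=20)
print(round(m["recall"], 3), round(m["specificity"], 3), round(m["g"], 3))
# 0.667 0.714 0.69
```

In this example G (0.69) falls between recall (0.667) and specificity (0.714), closer to the lower value. A learner that flags everything as defective would get recall 1 but specificity 0, and therefore G = 0.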
  • 19. EXPERIMENTAL RESULTS
  • 20. Defect Dataset • RQ1: Can techniques such as SMOTE be used to preprocess data to improve prediction accuracy? – RF was better than CART in 6 out of the 8 datasets. – SMOTE improved performance in 4 of those 6 datasets. • RQ2: Does tuning a data miner improve its prediction accuracy? – Not always; tuning alone did not help. • RQ3: Can tuning be combined with SMOTE to further improve prediction accuracy? – Yes. In 6 out of the 8 datasets, SMOTE+tuning helps.
  • 23. Security Flaws Dataset
  • 24. Conclusion • Defect Data Set – Applying SMOTE is beneficial – Tuning alone is of limited use – The combination of both works even better • Security Flaw Dataset – Sensitivity improves by a factor of 10 • In summary: – Always reflect on the data – Calibrate your predictor before use