Feature Selection Techniques For
Software Fault Prediction
(Summary)
Sungdo Gu
2015.03.27
MOTIVATION & PAPERS
 What is the minimum number of software metrics (features) that should be
considered for building an effective defect prediction model?
• A typical software defect prediction model is trained using software metrics
and fault data that have been collected from previously-developed software
releases or similar projects
• Quality of the software is an important aspect and software fault prediction
helps to better concentrate on faulty modules.
• With the increasing complexity of software, feature selection is
important for removing redundant, irrelevant and erroneous data from
the dataset.
“How Many Software Metrics Should be Selected for Defect Prediction?”
“Measuring Stability of Threshold-based Feature Selection Techniques”
“A Hybrid Feature Selection Model For Software Fault Prediction”
FEATURE SELECTION TECHNIQUE
 Feature Selection Technique
 feature ranking
 feature subset selection
 Feature Selection Technique
 filter : a feature subset is selected without involving any
learning algorithm.
 wrapper : use feedback from a learning algorithm to determine which
features to include in building a classification model.
 Feature Selection
: the process of choosing a subset of features.
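The filter/wrapper distinction above can be sketched as follows. This is a minimal illustration, not the method of any of the cited papers: the filter score (absolute difference between class means) and the greedy forward search are assumptions chosen for brevity, and `evaluate` is a hypothetical callback standing in for "train a learner on this subset and return its performance".

```python
from statistics import mean
from typing import Callable, Sequence


def filter_rank(X: Sequence[Sequence[float]], y: Sequence[int]) -> list[int]:
    """Filter: rank features from the data alone, with no learner involved.

    The score used here (absolute difference between the class means) is
    an illustrative assumption, not a ranker taken from the slides.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    # Highest score first; ties keep their original feature order.
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def wrapper_forward_select(
    n_features: int,
    evaluate: Callable[[list[int]], float],
    max_features: int,
) -> list[int]:
    """Wrapper: use feedback from a learner (via `evaluate`) to grow a subset.

    Greedy forward selection: add the feature that improves the score most,
    and stop as soon as no candidate improves it.
    """
    selected: list[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        if not candidates:
            break
        score, j = max((evaluate(selected + [j]), j) for j in candidates)
        if score <= best_score:
            break
        selected.append(j)
        best_score = score
    return selected
```

The filter never calls a learner, so it is cheap but blind to how features interact inside a model; the wrapper pays for one model evaluation per candidate subset.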
SOFTWARE METRICS
 A software metric is a quantitative measure of the degree to which a
software system or process possesses some property.
 CK metrics were designed:
 to measure unique aspects of the Object Oriented approach.
 to measure complexity of the design.
 McCabe & Halstead metrics were designed:
 to measure complexity of module-based programs.
SOFTWARE METRICS: Examples
<McCabe & Halstead Metrics> <CK Metrics>
CK Metrics: Examples
 WMC (Weighted Methods per Class)
 Definition
• WMC is the sum of the complexities of the methods of a class.
• WMC = Number of Methods (NOM) when every method's complexity is
taken to be unity (1).
 DIT (Depth of Inheritance Tree)
 Definition
• The maximum length from the node to the root of the tree
 CBO (Coupling Between Objects)
 Definition
• It is a count of the number of other classes to which it is coupled.
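As a rough illustration of the three definitions above, the sketch below computes WMC, DIT and CBO over a toy class model. The `ClassInfo` structure and its field names are hypothetical. It assumes single inheritance (so the maximum path to the root is the only path) and counts coupling in both directions (classes used plus classes that use it), which is one common reading of CBO.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClassInfo:
    """Hypothetical, minimal description of one class in a design."""
    name: str
    parent: Optional[str] = None                           # single inheritance assumed
    method_complexity: dict = field(default_factory=dict)  # method name -> complexity
    uses: set = field(default_factory=set)                 # names of classes it references


def wmc(cls: ClassInfo) -> int:
    """Weighted Methods per Class: sum of the method complexities."""
    return sum(cls.method_complexity.values())


def dit(cls: ClassInfo, classes: dict) -> int:
    """Depth of Inheritance Tree: number of edges from the class to the root."""
    depth = 0
    current = cls
    while current.parent is not None:
        current = classes[current.parent]
        depth += 1
    return depth


def cbo(cls: ClassInfo, classes: dict) -> int:
    """Coupling Between Objects: number of other classes coupled to this one.

    Assumption: coupling is counted in both directions and excludes the
    class itself.
    """
    outgoing = set(cls.uses)
    incoming = {c.name for c in classes.values() if cls.name in c.uses}
    return len((outgoing | incoming) - {cls.name})
```

With every method complexity set to 1, `wmc` reduces to the number of methods (NOM), matching the definition above.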
THRESHOLD-BASED FEATURE RANKING
 Five versions of TBFS feature rankers based on five different performance
metrics are considered.
• Mutual Information (MI)
• Kolmogorov-Smirnov (KS)
• Deviance (DV)
• Area Under the ROC (Receiver Operating Characteristic) Curve (AUC)
• Area Under the Precision-Recall Curve (PRC)
 Threshold-Based Feature Selection technique (TBFS)
: belongs to the category of filter-based feature ranking techniques.
 TBFS can be extended to additional performance metrics such as
F-measure, Odds Ratio, etc.
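The slides do not spell out the TBFS procedure, so the following is only a simplified sketch of the general idea under stated assumptions: each feature is normalized to [0, 1], its normalized value is treated as if it were a classifier score, a performance metric is computed across thresholds, and the feature is scored by that metric. The choice of the KS statistic, the evaluation of both directions (value and 1 - value), and all function names are assumptions for illustration.

```python
def normalize(values):
    """Min-max scale a feature column to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def ks_score(scores, labels):
    """Largest gap TPR(t) - FPR(t) over all thresholds t.

    Assumes labels are 0/1 and that both classes are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        best = max(best, tp / n_pos - fp / n_neg)
    return best


def tbfs_rank(X, y, metric=ks_score):
    """Rank features by a threshold-based metric; returns (index, score) pairs."""
    ranking = []
    for j in range(len(X[0])):
        col = normalize([row[j] for row in X])
        flipped = [1.0 - v for v in col]
        # Evaluate both directions, since a low value may indicate the positive class.
        ranking.append((j, max(metric(col, y), metric(flipped, y))))
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)
```

Swapping `metric` for another threshold-based measure (for example an AUC function) gives a different ranker over the same normalized features, which is the sense in which the approach extends to further metrics.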
CLASSIFIER
 Three classifiers
 Multilayer Perceptron
 k-Nearest Neighbors
 Logistic Regression
 Classifier Performance Metric
→ AUC (Area Under the ROC (Receiver Operating Characteristic) Curve)
: A performance metric that considers the ability of a classifier to differentiate
between the two classes.
- The AUC is a single-value measurement whose value ranges from 0 to 1.
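One way to see why the AUC measures separation between the two classes: it equals the probability that a randomly chosen positive (fault-prone) instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name is an assumption, and the quadratic loop is for clarity rather than speed.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` are 1 for the positive class and 0 for the negative class;
    both classes must be present.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 1 means every positive is scored above every negative, 0.5 corresponds to random ordering, and 0 means the ordering is perfectly inverted.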
SOFTWARE MEASUREMENT DATA
 The software metrics & fault data were collected from a real-world software project
: Eclipse, from the PROMISE data repository.
 Transform the original data by
(1) removing all non-numeric attributes
(2) converting the post-release defects attribute to a binary class attribute
: fault-prone (fp) / not-fault-prone (nfp)
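The two transformation steps can be sketched as below. The column names in the usage example are hypothetical, and the rule "more than `threshold` post-release defects means fault-prone", with a default of 0, is an assumption: the slides do not state the cut-off that was used.

```python
from numbers import Real


def transform(rows, defect_column, threshold=0):
    """Keep numeric attributes only and binarize the defect count.

    rows: list of dicts, one per module.
    defect_column: name of the post-release defects attribute.
    threshold: modules with more defects than this are labeled "fp"
               (assumed cut-off, not taken from the slides).
    """
    # (1) keep attributes whose values are numeric in every row
    numeric_cols = [
        col for col in rows[0]
        if col != defect_column
        and all(isinstance(r[col], Real) and not isinstance(r[col], bool) for r in rows)
    ]
    transformed = []
    for r in rows:
        new_row = {col: r[col] for col in numeric_cols}
        # (2) replace the defect count with a binary class attribute
        new_row["class"] = "fp" if r[defect_column] > threshold else "nfp"
        transformed.append(new_row)
    return transformed
```

For example, `transform([{"file": "A.java", "loc": 120, "post": 2}], "post")` drops the non-numeric `file` attribute and labels the module `fp`.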
EMPIRICAL DESIGN
 Rank the metrics and choose the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 and 20 metrics
according to their respective scores.
 The defect prediction models are evaluated in terms of the AUC performance
metric.
 To understand the impact on the models' predictive power of
 different sizes of feature subset
 the five filter-based rankers
 the three different learners
 Five-fold cross-validation is used.
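The design above combines two mechanical pieces: cutting a ranking down to the top-k metrics for each subset size, and splitting the data into five folds. A minimal sketch of both follows; the function names are assumptions, and the fold assignment here is a plain interleaved split with no shuffling or stratification, which the original study may well have done differently.

```python
# Subset sizes listed in the design above.
SUBSET_SIZES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20]


def top_k_subsets(ranking, sizes=SUBSET_SIZES):
    """Map each subset size k to the first k features of a ranking."""
    return {k: list(ranking[:k]) for k in sizes if k <= len(ranking)}


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Sample i goes to fold i % k; each fold is the test set exactly once.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A full experiment would loop over rankers, subset sizes and learners, train on each `train` split restricted to the selected features, and average the AUC over the five `test` splits.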
EMPIRICAL RESULT
STABILITY (ROBUSTNESS)
 The STABILITY of a feature selection method is normally defined as the
degree of agreement between its outputs when applied to randomly-selected
subsets of the same input data.
• To assess the robustness (stability) of feature selection techniques, the
consistency index was used.
Let $T_i$ and $T_j$ be subsets of features, where $|T_i| = |T_j| = k$. The consistency index is
$I_C(T_i, T_j) = \frac{dn - k^2}{k(n - k)}$
where $n$ is the total number of features in the dataset, $d$ is the cardinality of
the intersection between subsets $T_i$ and $T_j$, and $-1 < I_C(T_i, T_j) \le 1$.
=> The greater the consistency index, the more similar the subsets are.
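A small sketch of the consistency index computation, assuming the standard form I_C = (d·n − k²) / (k·(n − k)) with n, d and k as defined above (n total features, d the size of the intersection, k the common subset size). The function name and the error handling are illustrative choices.

```python
def consistency_index(t_i, t_j, n):
    """Consistency index between two equally sized feature subsets.

    t_i, t_j: iterables of feature identifiers, with |t_i| == |t_j| == k
    n: total number of features in the dataset (requires 0 < k < n)
    """
    t_i, t_j = set(t_i), set(t_j)
    k = len(t_i)
    if len(t_j) != k:
        raise ValueError("both subsets must have the same size k")
    if not 0 < k < n:
        raise ValueError("subset size k must satisfy 0 < k < n")
    d = len(t_i & t_j)  # cardinality of the intersection
    return (d * n - k * k) / (k * (n - k))
```

Identical subsets give 1; for example, with n = 10 and k = 2, sharing one of the two features gives (10 − 4) / 16 = 0.375. The k² term corrects for the overlap expected by chance, so two unrelated subsets score near 0 rather than at a positive value.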
OTHER RESULTS
A HYBRID FEATURE SELECTION MODEL
 Filter-method
• Correlation-based Feature Selection
• Chi-Squared
• OneR
• Gain Ratio
 Wrapper-method
• Naïve Bayes
• RBF Network (Radial Basis Function Network)
• J48 (Decision Tree)
A HYBRID FEATURE SELECTION: RESULT
Thank you
Q & A

 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 

Feature Selection Techniques for Software Fault Prediction (Summary)

  • 1. Feature Selection Techniques For Software Fault Prediction (Summary) Sungdo Gu 2015.03.27
  • 2. MOTIVATION & PAPERS: What is the minimum number of software metrics (features) that should be considered for building an effective defect prediction model? • A typical software defect prediction model is trained using software metrics and fault data collected from previously developed software releases or similar projects. • Software quality is an important concern, and software fault prediction helps concentrate effort on faulty modules. • With the increasing complexity of software, feature selection is important for removing redundant, irrelevant, and erroneous data from the dataset. Papers: “How Many Software Metrics Should be Selected for Defect Prediction?”; “Measuring Stability of Threshold-based Feature Selection Techniques”; “A Hybrid Feature Selection Model For Software Fault Prediction”
  • 3. FEATURE SELECTION TECHNIQUE: Feature selection is the process of choosing a subset of features. Feature selection techniques fall into two groups: feature ranking and feature subset selection. They can also be divided into filter techniques, in which a feature subset is selected without involving any learning algorithm, and wrapper techniques, which use feedback from a learning algorithm to determine which features to include in building a classification model.
  • 4. SOFTWARE METRICS: A software metric is a quantitative measure of the degree to which a software system or process possesses some property. CK metrics were designed to measure unique aspects of the object-oriented approach and the complexity of the design. McCabe & Halstead metrics were designed to measure the complexity of module-based programs.
  • 5. SOFTWARE METRICS: Examples <McCabe & Halstead Metrics> <CK Metrics>
  • 6. CK Metrics: Examples. WMC (Weighted Methods per Class): the sum of the complexities of the methods of a class; WMC equals the Number of Methods (NOM) when every method's complexity is taken as unity. DIT (Depth of Inheritance Tree): the maximum length from the node to the root of the tree. CBO (Coupling Between Objects): a count of the number of other classes to which a class is coupled.
  • 7. THRESHOLD-BASED FEATURE RANKING: The Threshold-Based Feature Selection (TBFS) technique belongs to the category of filter-based feature ranking techniques. Five versions of TBFS feature rankers, based on five different performance metrics, are considered: • Mutual Information (MI) • Kolmogorov-Smirnov (KS) • Deviance (DV) • Area Under the ROC (Receiver Operating Characteristic) Curve (AUC) • Area Under the Precision-Recall Curve (PRC). TBFS can be extended to additional performance metrics such as F-measure and Odds Ratio.
  • 9. CLASSIFIER: Three classifiers are used: Multilayer Perceptron, k-Nearest Neighbors, and Logistic Regression. Classifier performance metric: AUC (Area Under the ROC (Receiver Operating Characteristic) Curve), a performance metric that considers the ability of a classifier to differentiate between the two classes. The AUC is a single-value measurement whose value ranges from 0 to 1.
  • 10. SOFTWARE MEASUREMENT DATA: The software metrics and fault data were collected from a real-world software project: Eclipse, from the PROMISE data repository. The original data were transformed by (1) removing all non-numeric attributes and (2) converting the post-release defects attribute to a binary class attribute: fault-prone (fp) / not-fault-prone (nfp).
  • 11. EMPIRICAL DESIGN: Rank the metrics and choose the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 and 20 metrics according to their respective scores. The defect prediction models are evaluated in terms of the AUC performance metric, using five-fold cross-validation. The goal is to understand the impact of (1) different feature subset sizes, (2) the five filter-based rankers, and (3) the three different learners on the models' predictive power.
  • 14. STABILITY (ROBUSTNESS): The stability of a feature selection method is normally defined as the degree of agreement between its outputs when applied to randomly selected subsets of the same input data. To assess the robustness (stability) of feature selection techniques, the consistency index was used. Let T_i and T_j be subsets of features, where |T_i| = |T_j| = k; n is the total number of features in the dataset, and d is the cardinality of the intersection between T_i and T_j. The greater the consistency index, the more similar the subsets are.
  • 16. A HYBRID FEATURE SELECTION MODEL
  • 17. A HYBRID FEATURE SELECTION MODEL: Filter methods: • Correlation-based Feature Selection • Chi-Squared • OneR • Gain Ratio. Wrapper methods: • Naïve Bayes • RBF Network (Radial Basis Function Network) • J48 (Decision Tree)
  • 18. A HYBRID FEATURE SELECTION: RESULT
  • 19. A HYBRID FEATURE SELECTION: RESULT
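
The stability slide defines n, d and k but the formula itself did not survive in the transcript. A minimal sketch follows, assuming the index meant is Kuncheva's consistency index, I_C(T_i, T_j) = (d·n − k²) / (k·(n − k)), which uses exactly these symbols. The function name and the sample feature names are hypothetical.

```python
def consistency_index(t_i, t_j, n):
    """Consistency index between two feature subsets of equal size k.

    Assumption: this is Kuncheva's index, (d*n - k^2) / (k*(n - k)),
    where d = |T_i ∩ T_j| and n is the total number of features.
    """
    a, b = set(t_i), set(t_j)
    k = len(a)
    if len(b) != k:
        raise ValueError("both subsets must have the same size k")
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    d = len(a & b)  # cardinality of the intersection
    return (d * n - k * k) / (k * (n - k))


if __name__ == "__main__":
    # Hypothetical metric names; n = 10 features in total, k = 3 selected.
    run_a = ["wmc", "cbo", "dit"]
    run_b = ["wmc", "loc", "rfc"]
    print(consistency_index(run_a, run_a, n=10))  # identical subsets
    print(consistency_index(run_a, run_b, n=10))  # one shared feature
```

Identical subsets give the maximum value of 1, and subsets that overlap less than chance would suggest give a negative value, which matches the slide's reading that a greater index means more similar subsets.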

Editor's Notes

  1. Today, I'd like to give a presentation about software quality. It covers the feature selection issue in software quality, and it is a summary of a couple of papers that I have read, which I titled "Feature Selection Techniques for Software Fault Prediction".
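  2. Quality matters, and defect prediction helps focus attention on defective modules. As software complexity increases, feature selection is important for removing redundant and unnecessary data. A typical software defect prediction model is trained using metrics and defect data, and those data are collected from previously developed or similar projects.
  3. Feature selection techniques are divided into feature ranking and feature subset selection. Feature ranking ranks the attributes according to their individual predictive power. Feature subset selection finds a subset of attributes that collectively has good predictive power. Feature selection techniques can also be divided into filter, wrapper, and embedded. Filter: selects a feature subset without using any learning algorithm. Wrapper: uses feedback from a learning algorithm to decide which features to include when building the classification model.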
  4. There are quite a few types of software metrics, but I am going to introduce the two kinds that are mainly used.
  5. First, each attribute's values are normalized to between 0 and 1, and the performance metric is calculated using the normalized attribute. Then the feature ranking is created. -> Each independent attribute is paired with the class attribute (this seems to mean the Y value). The reduced two-attribute dataset is then evaluated with 11 different performance metrics, based on posterior probability.
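  6. A performance metric that considers the ability of a classifier to distinguish between the two classes.
  7. They wanted to figure out the impact of the size of the feature subset. So they ranked the metrics and selected the top 1, 2, 3, ... up to 20 -> in order to understand the following impacts.
  8. Besides, one of the papers that I read focuses on the robustness (or stability) of feature selection techniques. The degree of agreement between the outputs when the method is applied to randomly selected subsets of the same input data -> stability / robustness. # cardinality: the number of elements in a set (is d the number of elements in the intersection?)
  9. To check stability, this paper ran its experiments while repeatedly changing the dataset (perturbation).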
  10. Furthermore, there is a model that mixes the filter and wrapper approaches: a hybrid feature selection model for software fault prediction.
  11. Hybrid feature selection model for Software fault prediction
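
Note 5 describes the threshold-based ranking step: normalize each attribute to [0, 1], pair it with the class attribute, and score it with a performance metric. The sketch below illustrates that idea for the AUC ranker only. It is not the papers' implementation: the function names and sample data are hypothetical, and taking the larger of AUC and 1 − AUC (so that a feature counts whichever direction it points) is an assumption.

```python
def min_max_normalize(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5.

    labels: 1 = fault-prone (fp), 0 = not-fault-prone (nfp).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features_by_auc(columns, labels):
    """Return (feature, score) pairs sorted from best to worst.

    Each normalized attribute is treated as if it were a classifier score.
    Assumption: the score is max(AUC, 1 - AUC), so direction is ignored.
    """
    scored = {}
    for name, values in columns.items():
        a = auc(min_max_normalize(values), labels)
        scored[name] = max(a, 1.0 - a)
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical module-level metrics and fault labels.
    columns = {
        "loc": [10, 200, 35, 400, 20, 310],
        "comments": [5, 3, 8, 2, 7, 7],
        "noise": [1, 2, 3, 4, 5, 6],
    }
    labels = [0, 1, 0, 1, 0, 1]
    for name, score in rank_features_by_auc(columns, labels):
        print(f"{name}: {score:.3f}")
```

Because AUC depends only on the ordering of the values, the normalization does not change the AUC score here; it is kept to mirror the step in the note and because the other threshold-based metrics do depend on the scale. Choosing the top 1, 2, ..., 20 metrics then amounts to slicing the returned list.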