The document discusses the impact of noise on the performance of bug prediction models. It introduces three research questions: how resistant prediction models are to noise, how much noise can be detected and removed, and whether removing noise can improve performance. The study trains prediction models on software data with artificially inserted label noise and evaluates the models on clean test data. The results show that the models are reasonably resistant to noise for some projects, while performance decreases for others, indicating that noise can degrade prediction models if it is not addressed.
Defect prediction models help software quality assurance teams to effectively allocate their limited resources to the most defect-prone software modules. Model validation techniques, such as k-fold cross-validation, use historical data to estimate how well a model will perform in the future. However, little is known about how accurate the performance estimates of these model validation techniques tend to be. In this paper, we set out to investigate the bias and variance of model validation techniques in the domain of defect prediction. A preliminary analysis of 101 publicly available defect prediction datasets suggests that 77% of them are highly susceptible to producing unstable results. Hence, selecting an appropriate model validation technique is a critical experimental design choice. Based on an analysis of 256 studies in the defect prediction literature, we select the 12 most commonly adopted model validation techniques for evaluation. Through a case study of data from 18 systems that span both open-source and proprietary domains, we derive the following practical guidelines for future defect prediction studies: (1) the single holdout validation techniques should be avoided; and (2) researchers should use the out-of-sample bootstrap validation technique instead of holdout or the commonly-used cross-validation techniques.
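The abstract above recommends the out-of-sample bootstrap for estimating model performance. A minimal sketch of that estimator follows, assuming a feature matrix X and binary defect labels y; the choice of logistic regression and AUC is purely illustrative, not part of the original study:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def out_of_sample_bootstrap(X, y, n_rounds=100, seed=0):
    """Train on a bootstrap sample, evaluate on the rows that were not drawn."""
    rng = np.random.default_rng(seed)
    n, scores = len(y), []
    for _ in range(n_rounds):
        boot = rng.integers(0, n, size=n)            # sample n rows with replacement
        oob = np.setdiff1d(np.arange(n), boot)       # out-of-sample rows (~36.8% of the data)
        if len(np.unique(y[boot])) < 2 or len(np.unique(y[oob])) < 2:
            continue                                 # skip degenerate resamples
        model = LogisticRegression(max_iter=1000).fit(X[boot], y[boot])
        scores.append(roc_auc_score(y[oob], model.predict_proba(X[oob])[:, 1]))
    return float(np.mean(scores)), float(np.std(scores))

# Toy data: two noisy features loosely related to the defect label.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + rng.normal(scale=1.0, size=200) > 0).astype(int)
print(out_of_sample_bootstrap(X, y))  # (mean AUC, std of AUC across bootstrap rounds)
```

The mean of the per-round scores estimates performance; their spread indicates the variance of the estimate, which is what the paper compares across validation techniques.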
Defect, defect, defect: PROMISE 2012 Keynote - Sung Kim
Software prediction leveraging repositories has received a tremendous amount of attention within the software engineering community, including PROMISE. In this talk, I will first present great achievements in defect prediction research, including new defect prediction features, promising algorithms, and interesting analysis results. However, there are still many challenges in defect prediction. I will talk about them and discuss potential solutions that leverage Prediction 2.0.
Software analytics (for software quality purposes) is a statistical or machine-learning classifier that is trained to identify defect-prone software modules. The goal of software analytics is to help software engineers prioritize their software testing effort on the most risky modules and understand past pitfalls that lead to defective code. While the adoption of software analytics enables software organizations to distil actionable insights, there are still many barriers to the broad and successful adoption of such analytics systems. Indeed, even if software organizations can access such invaluable software artifacts and toolkits for data analytics, researchers and practitioners often have little knowledge of how to properly develop analytics systems. Thus, the accuracy of the predictions and insights that are derived from analytics systems is one of the most important challenges of data science in software engineering.
In this work, we conduct a series of empirical investigations to better understand the impact of experimental components (i.e., class mislabelling, parameter optimization of classification techniques, and model validation techniques) on the performance and interpretation of software analytics. To accelerate the large number of compute-intensive experiments, we leverage the High-Performance Computing (HPC) resources of the Centre for Advanced Computing (CAC) at Queen's University, Canada. Through case studies of systems that span both proprietary and open-source domains, we demonstrate that (1) realistic noise does not impact the precision of software analytics; (2) automated parameter optimization of classification techniques substantially improves the performance and stability of software analytics; and (3) the out-of-sample bootstrap validation technique produces a good balance between the bias and variance of performance estimates. Our results lead us to conclude that the experimental components of analytics modelling impact the predictions and associated insights that are derived from software analytics. Empirical investigations of the impact of overlooked experimental components are needed to derive practical guidelines for analytics modelling.
Software Quality Assurance (SQA) teams play a critical role in the software development process by ensuring the absence of software defects. It is not feasible to perform exhaustive SQA tasks (i.e., software testing and code review) on a large software product given the limited SQA resources that are available. Thus, prioritization is an essential step in all SQA efforts. Defect prediction models are used to prioritize risky software modules and to understand the impact of software metrics on the defect-proneness of software modules. The predictions and insights that are derived from defect prediction models can help software teams allocate their limited SQA resources to the modules that are most likely to be defective and avoid the common pitfalls associated with the defective modules of the past. However, these predictions and insights may be inaccurate and unreliable if practitioners do not control for the impact of experimental components (e.g., datasets, metrics, and classifiers) on defect prediction models, which could lead to erroneous decision-making in practice. In this thesis, we investigate the impact of experimental components on the performance and interpretation of defect prediction models. More specifically, we investigate the impact that three often-overlooked experimental components (i.e., issue report mislabelling, parameter optimization of classification techniques, and model validation techniques) have on defect prediction models. Through case studies of systems that span both proprietary and open-source domains, we demonstrate that (1) issue report mislabelling does not impact the precision of defect prediction models, suggesting that researchers can rely on the predictions of defect prediction models that were trained using noisy defect datasets; (2) automated parameter optimization of classification techniques substantially improves the performance and stability of defect prediction models, and also changes their interpretation, suggesting that researchers should no longer shy away from applying parameter optimization to their models; and (3) the out-of-sample bootstrap validation technique produces a good balance between the bias and variance of performance estimates, suggesting that the single-holdout and cross-validation families that are commonly used nowadays should be avoided.
Developers often wonder how to implement a certain functionality (e.g., how to parse XML files) using APIs. Obtaining an API usage sequence based on an API-related natural language query is very helpful in this regard. Given a query, existing approaches utilize information retrieval models to search for matching API sequences. These approaches treat queries and APIs as bags-of-words and lack a deep understanding of the semantics of the query.
We propose DeepAPI, a deep-learning-based approach to generate API usage sequences for a given natural language query. Instead of a bag-of-words assumption, it learns the sequence of words in a query and the sequence of associated APIs. DeepAPI adapts a neural language model named RNN Encoder-Decoder. It encodes a word sequence (user query) into a fixed-length context vector, and generates an API sequence based on the context vector. We also augment the RNN Encoder-Decoder by considering the importance of individual APIs. We empirically evaluate our approach with more than 7 million annotated code snippets collected from GitHub. The results show that our approach generates largely accurate API sequences and outperforms the related approaches.
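The abstract describes an RNN Encoder-Decoder that compresses the query into a fixed-length context vector and decodes an API sequence from it. Below is a minimal sketch of that encoder-decoder idea; the GRU cells, vocabulary sizes, and teacher-forcing setup are my own assumptions for illustration, not DeepAPI's actual architecture:

```python
import torch
import torch.nn as nn

class QueryToApiSeq(nn.Module):
    """Toy RNN Encoder-Decoder: encode a query into one context vector,
    then decode an API token sequence conditioned on that vector."""
    def __init__(self, query_vocab=1000, api_vocab=500, emb=64, hid=128):
        super().__init__()
        self.q_emb = nn.Embedding(query_vocab, emb)
        self.a_emb = nn.Embedding(api_vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.proj = nn.Linear(hid, api_vocab)

    def forward(self, query_ids, api_ids):
        _, context = self.encoder(self.q_emb(query_ids))   # fixed-length context vector
        dec_out, _ = self.decoder(self.a_emb(api_ids), context)  # teacher forcing
        return self.proj(dec_out)                           # logits over the API vocabulary

# Shape check with random token ids (batch of 2, query length 7, API length 5).
model = QueryToApiSeq()
logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 500, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 500])
```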
Partitioning Composite Code Changes to Facilitate Code Review (MSR 2015) - Sung Kim
Yida's presentation at MSR 2015!
Abstract: Developers expend significant effort on reviewing source code changes, hence the comprehensibility of code changes directly affects development productivity. Our prior study has suggested that composite code changes, which mix multiple development issues together, are typically difficult to review. Unfortunately, our manual inspection of 453 open source code changes reveals a non-trivial occurrence (up to 29%) of such composite changes.
In this paper, we propose a heuristic-based approach to automatically partition composite changes, such that each sub-change in the partition is more cohesive and self-contained. Our quantitative and qualitative evaluation results are promising in demonstrating the potential benefits of our approach for facilitating code review of composite code changes.
Dealing with Noise in Defect Prediction
1. Dealing with Noise in bug prediction
Sunghun Kim, Hongyu Zhang, Rongxin Wu and Liang Gong
The Hong Kong University of Science & Technology / Tsinghua University
3.-6. Where are the bugs?
• Complex files! [Menzies et al.]
• Modified files! [Nagappan et al.]
• Nearby other bugs! [Zimmermann et al.]
• Previously fixed files [Hassan et al.]
9.-12. Prediction model
Training instances (features + labels) are fed to a Learner, which then produces a Prediction for a new, unlabelled instance (?).
13. Training on software evolution is key
• Software features can be used to predict bugs
• Defect labels are obtained from software evolution
• Supervised learning algorithms
(Data sources: Version Archive and Bug Database)
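As a concrete illustration of the pipeline on these slides (features plus labels fed to a supervised learner), here is a minimal sketch; the synthetic per-file metrics and the labelling rule stand in for data mined from a version archive and a bug database and are invented for the example:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(42)
n = 500
# Hypothetical per-file metrics (stand-ins for features mined from the version archive).
loc_changed = rng.poisson(20, n)
past_fixes = rng.poisson(3, n)
complexity = rng.uniform(1, 30, n)
X = np.column_stack([loc_changed, past_fixes, complexity])
# Synthetic defect labels (stand-ins for labels derived from the bug database; 1 = buggy).
y = (past_fixes + rng.normal(0, 1, n) > 4).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("F-measure on the held-out set:", round(f1_score(y_test, clf.predict(X_test)), 2))
```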
14.-18. Change classification
Past changes are labelled as bug-introducing ("bad", marked X) or clean; a learner is built on them (BUILD A LEARNER) and then used to predict the quality of a new change (PREDICT QUALITY).
Kim, Whitehead Jr., Zhang: Classifying Software Changes: Clean or Buggy? (TSE 2008)
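A hedged sketch of change classification in the spirit of these slides: hypothetical change descriptions are vectorized as word/bigram counts and a classifier learns to label a new change as bug-introducing or clean. The example changes and labels are invented, and the TSE 2008 paper uses a much richer feature set than plain text:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training changes, labelled 1 = bug-introducing ("bad"), 0 = clean.
changes = [
    "add null check before dereferencing parser result",
    "copy paste loop body without updating index bound",
    "rename variable and update comments",
    "introduce cache without invalidation on write",
    "fix typo in documentation",
    "change array length off by one in iteration",
]
labels = [0, 1, 0, 1, 0, 1]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(changes, labels)

new_change = "extend loop bound handling for edge case"
print("predicted bug-introducing?", bool(clf.predict([new_change])[0]))
```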
20.-26. Source Repository and Bug Database
• Source Repository: all commits C, of which bug fixes Cf, of which linked fixes Cfl.
• Bug Database: all bugs B, of which fixed bugs Bf, of which linked fixed bugs Bfl.
• Fixes and bugs are linked via log messages; bug fixes that are related to a fixed bug but not linked to it are Noise!
Bird et al., "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
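The slides above hinge on linking fix commits to bug reports via log messages; fixes that never mention a bug ID stay unlinked and bias the resulting dataset. A small illustrative sketch follows; the commit messages, bug IDs, and regular expression are hypothetical, not taken from the study:

```python
import re

# Hypothetical commit log and fixed-bug IDs from a bug database, for illustration only.
commits = [
    {"id": "a1b2c3", "msg": "Fix NPE in parser; closes bug #1234"},
    {"id": "d4e5f6", "msg": "Refactor build scripts"},
    {"id": "0a9b8c", "msg": "patch for crash reported last week"},  # a fix with no bug ID
]
fixed_bug_ids = {"1234", "5678"}

BUG_REF = re.compile(r"(?:bug|issue|fix(?:es|ed)?)\s*#?(\d+)", re.IGNORECASE)

linked, unlinked_fix_like = [], []
for c in commits:
    ids = {m for m in BUG_REF.findall(c["msg"]) if m in fixed_bug_ids}
    if ids:
        linked.append((c["id"], ids))          # linked fixes (the Cfl of the slides)
    elif re.search(r"fix|patch|crash", c["msg"], re.IGNORECASE):
        unlinked_fix_like.append(c["id"])      # likely fixes with no link: a source of bias

print("linked fixes:", linked)
print("fix-like but unlinked:", unlinked_fix_like)
```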
27.-30. Effect of training on superbiased data (Severity)
[Bar chart: Bug Recall (0%-100%) for models trained on all bugs vs. trained on biased data1 vs. trained on biased data2.]
Bias in bug severity affects BugCache.
Bird et al., "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
32. Study questions
• Q1: How resistant is a defect prediction model to noise?
• Q2: How much noise can be detected and removed?
• Q3: Can we remove noise to improve defect prediction performance?
53.-55. Q1: How resistant is a defect prediction model to noise?
[Line chart: buggy F-measure vs. false negative (FN) and false positive (FP) rate in the training set, for several projects; the F-measure stays largely stable until the noise rate reaches roughly 20~30%.]
56. Study questions
• Q1: How resistant is a defect prediction model to noise?
• Q2: How much noise can be detected and removed?
• Q3: Can we remove noise to improve defect prediction performance?
57.-58. Detecting noise
Starting from the original training set:
1. Removing buggy labels creates false negative noise.
2. Adding buggy labels creates false positive noise.
59. Figure 4: Creating biased training sets
It is very hard to obtain a golden (noise-free) set. In our approach, we carefully select high-quality datasets and assume they are the golden sets. We then add FPs and FNs intentionally to create a noisy set: we randomly select instances in a golden set and artificially change their labels from buggy to clean or from clean to buggy, inspired by the experiments in [4]. To make FN datasets (for RQ1), we randomly select n% of the buggy instances and relabel them as clean.
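A minimal sketch of the noise-injection step described above: flip the labels of a randomly chosen fraction of instances in a golden set to create false negative or false positive noise. The function name and the encoding 1 = buggy / 0 = clean are my own choices:

```python
import numpy as np

def inject_label_noise(y, fn_rate=0.0, fp_rate=0.0, seed=0):
    """Flip labels in a 'golden' label vector y (1 = buggy, 0 = clean).
    fn_rate: fraction of buggy instances relabelled clean (false negatives).
    fp_rate: fraction of clean instances relabelled buggy (false positives)."""
    rng = np.random.default_rng(seed)
    noisy = y.copy()
    buggy = np.flatnonzero(y == 1)
    clean = np.flatnonzero(y == 0)
    flip_fn = rng.choice(buggy, size=int(fn_rate * len(buggy)), replace=False)
    flip_fp = rng.choice(clean, size=int(fp_rate * len(clean)), replace=False)
    noisy[flip_fn] = 0
    noisy[flip_fp] = 1
    return noisy

# Example: create a training set with 20% FN noise from a golden set.
y_gold = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_noisy = inject_label_noise(y_gold, fn_rate=0.2, seed=1)
print(y_gold, y_noisy)
```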
60. [Figure 9: pseudo-code of the CLNI (closest-list noise identification) algorithm.]
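The CLNI pseudo-code itself is not recoverable from this transcript, so here is an illustrative nearest-neighbour reconstruction of the general idea only: flag an instance as likely mislabelled when most of its closest neighbours carry the opposite label, and repeat until the flagged set stabilizes. The parameters and exact rules are assumptions, not the authors' algorithm:

```python
import numpy as np

def clni_like_noise_detection(X, y, k=5, threshold=0.6, max_iter=10):
    """Flag probably-mislabelled instances: an instance is suspicious when at least
    `threshold` of its k nearest neighbours (Euclidean distance) disagree with its label."""
    n = len(y)
    flagged = np.zeros(n, dtype=bool)
    for _ in range(max_iter):
        new_flagged = flagged.copy()
        active = np.flatnonzero(~flagged)              # ignore already-flagged instances
        for i in active:
            others = active[active != i]
            if len(others) == 0:
                continue
            d = np.linalg.norm(X[others] - X[i], axis=1)
            nearest = others[np.argsort(d)[:k]]
            disagree = np.mean(y[nearest] != y[i])     # fraction of neighbours disagreeing
            if disagree >= threshold:
                new_flagged[i] = True
        if np.array_equal(new_flagged, flagged):       # stop once the flagged set is stable
            break
        flagged = new_flagged
    return flagged  # True = likely noisy label

# Toy demo: ten clean-looking instances plus one with a suspicious "buggy" label.
X = np.array([[i, 0.0] for i in range(10)] + [[4.5, 0.1]])
y = np.array([0] * 10 + [1])
print(clni_like_noise_detection(X, y))  # the last instance should be flagged
```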
63. Study questions
• Q1: How resistant is a defect prediction model to noise?
• Q2: How much noise can be detected and removed?
• Q3: Can we remove noise to improve defect prediction performance?
66. Bug prediction using cleaned data
[Bar chart: SWT F-measure (0-100) vs. noise level (0%, 15%, 30%, 45%), comparing noisy and cleaned training data; with cleaning, the F-measure is 76% even with 45% noise.]
67. Study limitations
• All datasets are collected from open source projects
• The golden set used in this paper may not be perfect
• The noisy data simulations may not reflect the actual noise patterns in practice
68. Summary
• Prediction models (used in our experiments) are resistant to up to 20~30% noise
• Noise detection is promising
• Future work:
  - Building oracle defect sets
  - Improving noise detection algorithms
  - Applying to more defect prediction models (regression, BugCache)