The Impact of Mislabelling on the
Performance and Interpretation
of Defect Prediction Models
Chakkrit (Kla) Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, Akinori Ihara, Kenichi Matsumoto
@klainfo kla@chakkrit.com
Software defects are costly
Monetary: NIST estimates that software defects cost the US economy $59.5 billion per year!
Reputation: The Obama administration will always be connected to healthcare.gov
SQA teams try to find defects before they escape to the field
SQA teams have limited resources
Software continues to grow in size and complexity
Defect prediction models help SQA teams to
Predict what are risky modules
Understand what makes software fail
Modules that are fixed during post-release development are set as defective
[Figure: timeline of changes and issues around the release date. The defect dataset is a snapshot of Modules 1-4 at the release date. Post-release Bug Report #1 is fixed in Module 1 and Module 2, so these modules are labelled as defective; the remaining modules are labelled as clean.]
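The labelling rule on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the authors' tooling: the function name `label_modules`, the data layout, and the dates are hypothetical, chosen only to mirror the figure (Bug Report #1 fixed in Module 1 and Module 2 after the release).

```python
from datetime import date

def label_modules(modules, bug_fixes, release_date):
    """Label a module 'defective' if a bug report was fixed in it after the release date."""
    defective = set()
    for fix_date, fixed_modules in bug_fixes:
        if fix_date > release_date:  # only post-release fixes count
            defective.update(fixed_modules)
    return {m: ("defective" if m in defective else "clean") for m in modules}

# Hypothetical data mirroring the figure; the dates are made up for illustration.
modules = ["Module 1", "Module 2", "Module 3", "Module 4"]
bug_fixes = [(date(2015, 6, 1), ["Module 1", "Module 2"])]  # Bug Report #1
labels = label_modules(modules, bug_fixes, release_date=date(2015, 1, 1))
print(labels)
```

Under these assumptions, Module 1 and Module 2 come out as defective and Module 3 and Module 4 as clean.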
Defect models are trained using Machine Learning
[Figure: the defect dataset (Modules 1-4) is fed to machine learning or statistical learning to produce a defect model.]
Defect data are noisy
The reliability of the models depends
on the quality of the training data
[Figure: a noisy defect dataset (Modules 1-4), fed to machine learning or statistical learning, yields an unreliable defect model.]
Issue reports are mislabelled
Fields in issue tracking systems are often missing or incorrect. [Aranda et al., ICSE 2009]
43% of issue reports are mislabelled. [Herzig et al., ICSE 2013] [Antoniol et al., CASCON 2008]
Defect mislabelling: a new feature could be incorrectly labelled as a bug.
Non-defect mislabelling: a bug could be mislabelled as a new feature.
Then, modules are mislabelled
Defect mislabelling: a new feature could be incorrectly labelled as a bug.
[Figure: issue #1 is linked to modules M1 and M2, and issue #2 is linked to module M3. Issue #2 is mislabelled, so M3 is recorded as defective in the noisy data although M3 should be a clean module.]
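The propagation from a mislabelled issue report to a mislabelled module can be sketched as follows. This is an illustrative sketch only: `module_labels`, `recorded_bugs`, and `actual_bugs` are hypothetical names, and the data simply restates the figure (issue #1 touches M1 and M2, issue #2 touches M3, and #2 is not really a bug).

```python
def module_labels(issue_links, bug_issue_ids, modules):
    """Mark a module defective (True) if any issue treated as a bug is linked to it."""
    defective = set()
    for issue_id, linked_modules in issue_links.items():
        if issue_id in bug_issue_ids:
            defective.update(linked_modules)
    return {m: m in defective for m in modules}

modules = ["M1", "M2", "M3", "M4"]
issue_links = {"#1": ["M1", "M2"], "#2": ["M3"]}
recorded_bugs = {"#1", "#2"}  # as recorded in the issue tracker (noisy)
actual_bugs = {"#1"}          # after manual inspection: #2 is mislabelled

noisy = module_labels(issue_links, recorded_bugs, modules)
clean = module_labels(issue_links, actual_bugs, modules)
mislabelled_modules = [m for m in modules if noisy[m] != clean[m]]
print(mislabelled_modules)
```

With this data, the only module whose label differs between the noisy and the clean view is M3.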
Mislabelling may impact the performance
Prior works assumed that mislabelling is random [Kim et al., ICSE 2011] and [Seiffert et al., Information Sciences 2014]
Random mislabelling has a negative impact on the performance.
Mislabelling is likely non-random
We suspect that novice developers are likely to mislabel more than experienced developers.
Novice developers are known to overlook bookkeeping issues [Bachmann et al., FSE 2010]
The impact of realistic mislabelling on the performance and interpretation of defect models
(RQ1) The Nature of Mislabelling
(RQ2) Its Impact on the Performance
(RQ3) Its Impact on the Interpretation
Using prediction models to classify whether issue reports are mislabelled
If the prediction model performs well, mislabelling is predictable.
If the prediction model performs poorly, mislabelling is random.
Selecting our studied systems
Manually-curated issue reports [Herzig et al., ICSE 2013]
Mislabelling is non-random
[Figure: bar charts of the performance value (precision, recall, F-measure) of our model versus random guessing for Jackrabbit and Lucene.]
System      Measure    Our Model  Random Guessing
Jackrabbit  Precision  0.78       0.12
Jackrabbit  Recall     0.64       0.50
Jackrabbit  F-Measure  0.70       0.19
Lucene      Precision  0.75       0.12
Lucene      Recall     0.71       0.50
Lucene      F-Measure  0.73       0.19
Our models achieve a mean F-measure of up to 0.73, which is 4-34 times better than random guessing.
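As a quick sanity check on the chart values, the F-measure is the harmonic mean of precision and recall. The sketch below assumes the standard (balanced) F-measure; the function name `f_measure` is ours, and the inputs are the precision and recall values shown in the chart.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall values as shown in the chart.
jackrabbit_model = round(f_measure(0.78, 0.64), 2)
lucene_model = round(f_measure(0.75, 0.71), 2)
random_guessing = round(f_measure(0.12, 0.50), 2)
print(jackrabbit_model, lucene_model, random_guessing)
```

Rounded to two decimals, these reproduce the F-measure values in the chart (0.70 for Jackrabbit, 0.73 for Lucene, and 0.19 for random guessing).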
The impact of realistic mislabelling on the performance and interpretation of defect models
(RQ1) The Nature of Mislabelling: Mislabelling is non-random
(RQ2) The Impact on the Performance
Compare the performance between clean models and noisy models
Clean Performance vs. Realistic Noisy Performance vs. Random Noisy Performance
Generating three samples
Clean sample: built using the oracle, which identifies the mislabelled issue reports (e.g., #2 is mislabelled).
Realistic noisy sample: add noise by realistically flipping the labels of the modules that are addressed by the mislabelled issue reports.
Random noisy sample: add noise by randomly flipping the modules' labels.
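The two noise-injection strategies can be contrasted with a small sketch. The function names and data are hypothetical, and one detail is an assumption that the slide does not state: the random noisy sample here flips the same number of labels as the realistic noisy sample, so that the two are comparable.

```python
import random

def realistic_noisy(clean_labels, affected_modules):
    """Flip the labels of the modules addressed by mislabelled issue reports."""
    return {m: (not y if m in affected_modules else y) for m, y in clean_labels.items()}

def random_noisy(clean_labels, n_flips, seed=0):
    """Flip the labels of n_flips modules chosen uniformly at random (assumed design)."""
    rng = random.Random(seed)
    flipped = set(rng.sample(sorted(clean_labels), n_flips))
    return {m: (not y if m in flipped else y) for m, y in clean_labels.items()}

# True = defective, False = clean (hypothetical clean sample from the oracle).
clean = {"M1": True, "M2": True, "M3": False, "M4": False}
realistic = realistic_noisy(clean, affected_modules={"M3"})  # #2 is mislabelled and touches M3
rand = random_noisy(clean, n_flips=1)
print(realistic)
print(rand)
```

The realistic sample always flips M3, while the random sample flips whichever module the seeded generator happens to pick.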
Generate the performance of clean models and noisy models
A defect model is trained on each sample (clean, realistic noisy, and random noisy), and the performance of the three models is compared.
Performance Ratio = (Performance of Realistic Noisy Model) / (Performance of Clean Model)
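The performance ratio can be illustrated with a toy computation. This is a sketch under stated assumptions: the predictions below are made up, and `precision_recall` and `performance_ratio` are hypothetical helper names rather than the authors' code.

```python
def precision_recall(actual, predicted):
    """Compute precision and recall for boolean labels (True = defective)."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def performance_ratio(noisy_value, clean_value):
    """Ratio = noisy performance / clean performance; 1 means no impact."""
    return noisy_value / clean_value

# Hypothetical test-set labels and predictions of the two models.
actual = [True, True, True, True, False, False, False, False]
clean_pred = [True, True, True, True, False, False, False, False]
noisy_pred = [True, True, False, False, False, False, False, False]

clean_p, clean_r = precision_recall(actual, clean_pred)
noisy_p, noisy_r = precision_recall(actual, noisy_pred)
precision_ratio = performance_ratio(noisy_p, clean_p)
recall_ratio = performance_ratio(noisy_r, clean_r)
print(precision_ratio, recall_ratio)
```

In this toy case the noisy model misses half of the defective modules but raises no false alarms, so the recall ratio drops below 1 while the precision ratio stays at 1.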
While the recall is often impacted, the precision is rarely impacted.
Ratio = Realistic Noisy / Clean
Interpretation: Ratio = 1 means there is no impact.
Models trained on noisy data achieve 56% of the recall of models trained on clean data.
Precision is rarely impacted by realistic mislabelling.
[Figure: distributions of the precision and recall ratios, on a ratio axis from 0.0 to 2.0.]
The impact of realistic mislabelling on the performance and interpretation of defect models
(RQ1) The Nature of Mislabelling: Mislabelling is non-random
(RQ2) The Impact on the Performance: While the recall is often impacted, the precision is rarely impacted
(RQ3) The Impact on the Interpretation
Generate the rank of metrics of clean models and noisy models
For each model (clean, realistic noisy, and random noisy), variable importance scores are computed and then ranked to produce the rank of metrics.
Does a metric of the clean model appear at the same rank in the noisy models?
[Figure: the rank of metrics of the clean model (ranks 1, 2, 3) compared with the rank of metrics of the noisy model.]
Only the metrics in the 1st rank are robust to the mislabelling
85% of the metrics in the 1st rank of the clean model also appear in the 1st rank of the noisy model.
Conversely, the metrics in the 2nd and 3rd ranks are less stable
As few as 18% of the metrics in the 2nd and 3rd ranks of the clean models appear in the same rank in the noisy models.
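The rank comparison can be sketched as the share of metrics that keep their rank between the clean and the noisy model. This is a minimal sketch: `same_rank_share` is a hypothetical helper, and the metric names and ranks below are invented for illustration, not taken from the study.

```python
def same_rank_share(clean_ranks, noisy_ranks, rank):
    """Share of metrics at `rank` in the clean model that hold the same rank in the noisy model."""
    at_rank = [m for m, r in clean_ranks.items() if r == rank]
    if not at_rank:
        return 0.0
    kept = sum(1 for m in at_rank if noisy_ranks.get(m) == rank)
    return kept / len(at_rank)

# Hypothetical ranks of metrics derived from variable importance scores.
clean_ranks = {"size": 1, "churn": 1, "complexity": 2, "ownership": 3}
noisy_ranks = {"size": 1, "churn": 1, "complexity": 3, "ownership": 2}

first_rank_share = same_rank_share(clean_ranks, noisy_ranks, rank=1)
second_rank_share = same_rank_share(clean_ranks, noisy_ranks, rank=2)
print(first_rank_share, second_rank_share)
```

In this made-up example, both 1st-rank metrics keep their rank, while the 2nd-rank metric does not.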
The impact of realistic mislabelling on the performance and interpretation of defect models
(RQ1) The Nature of Mislabelling: Mislabelling is non-random
(RQ2) The Impact on the Performance: While the recall is often impacted, the precision is rarely impacted
(RQ3) The Impact on the Interpretation: Only top-rank metrics are robust to the mislabelling
Suggestions
(RQ1) The Nature of Mislabelling: Researchers can use our noise models to clean mislabelled issue reports
(RQ2) The Impact on the Performance: Cleaning data will improve the ability to identify defective modules
(RQ3) The Impact on the Interpretation: Quality improvement plans should be based on the top-rank metrics
More Related Content

What's hot

Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...Feng Zhang
 
Defect effort prediction models in software
Defect effort prediction models in softwareDefect effort prediction models in software
Defect effort prediction models in softwareIAEME Publication
 
Software reliability growth model
Software reliability growth modelSoftware reliability growth model
Software reliability growth modelHimanshu
 
The adoption of machine learning techniques for software defect prediction: A...
The adoption of machine learning techniques for software defect prediction: A...The adoption of machine learning techniques for software defect prediction: A...
The adoption of machine learning techniques for software defect prediction: A...RAKESH RANA
 
Defect Prediction: Accomplishments and Future Challenges
Defect Prediction: Accomplishments and Future ChallengesDefect Prediction: Accomplishments and Future Challenges
Defect Prediction: Accomplishments and Future ChallengesYasutaka Kamei
 
Chapter 7 software reliability
Chapter 7 software reliabilityChapter 7 software reliability
Chapter 7 software reliabilitydespicable me
 
Formal Verification of Developer Tests: a Research Agenda Inspired by Mutatio...
Formal Verification of Developer Tests: a Research Agenda Inspired by Mutatio...Formal Verification of Developer Tests: a Research Agenda Inspired by Mutatio...
Formal Verification of Developer Tests: a Research Agenda Inspired by Mutatio...University of Antwerp
 
Finding Bugs, Fixing Bugs, Preventing Bugs — Exploiting Automated Tests to In...
Finding Bugs, Fixing Bugs, Preventing Bugs — Exploiting Automated Tests to In...Finding Bugs, Fixing Bugs, Preventing Bugs — Exploiting Automated Tests to In...
Finding Bugs, Fixing Bugs, Preventing Bugs — Exploiting Automated Tests to In...University of Antwerp
 
Software Reliability
Software ReliabilitySoftware Reliability
Software Reliabilityranapoonam1
 
Software testing strategy
Software testing strategySoftware testing strategy
Software testing strategyijseajournal
 
Data collection for software defect prediction
Data collection for software defect predictionData collection for software defect prediction
Data collection for software defect predictionAmmAr mobark
 
Software Quality Analysis Using Mutation Testing Scheme
Software Quality Analysis Using Mutation Testing SchemeSoftware Quality Analysis Using Mutation Testing Scheme
Software Quality Analysis Using Mutation Testing SchemeEditor IJMTER
 
Test Automation Maturity: A Self-Assessment Tool
Test Automation Maturity: A Self-Assessment ToolTest Automation Maturity: A Self-Assessment Tool
Test Automation Maturity: A Self-Assessment ToolUniversity of Antwerp
 
Using Developer Information as a Prediction Factor
Using Developer Information as a Prediction FactorUsing Developer Information as a Prediction Factor
Using Developer Information as a Prediction FactorTim Menzies
 
Reproducible Crashes: Fuzzing Pharo by Mutating the Test Methods
Reproducible Crashes: Fuzzing Pharo by Mutating the Test MethodsReproducible Crashes: Fuzzing Pharo by Mutating the Test Methods
Reproducible Crashes: Fuzzing Pharo by Mutating the Test MethodsUniversity of Antwerp
 
Testing survey by_directions
Testing survey by_directionsTesting survey by_directions
Testing survey by_directionsTao He
 
Parameter Estimation of GOEL-OKUMOTO Model by Comparing ACO with MLE Method
Parameter Estimation of GOEL-OKUMOTO Model by Comparing ACO with MLE MethodParameter Estimation of GOEL-OKUMOTO Model by Comparing ACO with MLE Method
Parameter Estimation of GOEL-OKUMOTO Model by Comparing ACO with MLE MethodIRJET Journal
 
Software testing defect prediction model a practical approach
Software testing defect prediction model   a practical approachSoftware testing defect prediction model   a practical approach
Software testing defect prediction model a practical approacheSAT Journals
 

What's hot (20)

Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
 
Defect effort prediction models in software
Defect effort prediction models in softwareDefect effort prediction models in software
Defect effort prediction models in software
 
O0181397100
O0181397100O0181397100
O0181397100
 
Software reliability growth model
Software reliability growth modelSoftware reliability growth model
Software reliability growth model
 
J034057065
J034057065J034057065
J034057065
 
The adoption of machine learning techniques for software defect prediction: A...
The adoption of machine learning techniques for software defect prediction: A...The adoption of machine learning techniques for software defect prediction: A...
The adoption of machine learning techniques for software defect prediction: A...
 
Defect Prediction: Accomplishments and Future Challenges
Defect Prediction: Accomplishments and Future ChallengesDefect Prediction: Accomplishments and Future Challenges
Defect Prediction: Accomplishments and Future Challenges
 
Chapter 7 software reliability
Chapter 7 software reliabilityChapter 7 software reliability
Chapter 7 software reliability
 
Formal Verification of Developer Tests: a Research Agenda Inspired by Mutatio...
Formal Verification of Developer Tests: a Research Agenda Inspired by Mutatio...Formal Verification of Developer Tests: a Research Agenda Inspired by Mutatio...
Formal Verification of Developer Tests: a Research Agenda Inspired by Mutatio...
 
Finding Bugs, Fixing Bugs, Preventing Bugs — Exploiting Automated Tests to In...
Finding Bugs, Fixing Bugs, Preventing Bugs — Exploiting Automated Tests to In...Finding Bugs, Fixing Bugs, Preventing Bugs — Exploiting Automated Tests to In...
Finding Bugs, Fixing Bugs, Preventing Bugs — Exploiting Automated Tests to In...
 
Software Reliability
Software ReliabilitySoftware Reliability
Software Reliability
 
Software testing strategy
Software testing strategySoftware testing strategy
Software testing strategy
 
Data collection for software defect prediction
Data collection for software defect predictionData collection for software defect prediction
Data collection for software defect prediction
 
Software Quality Analysis Using Mutation Testing Scheme
Software Quality Analysis Using Mutation Testing SchemeSoftware Quality Analysis Using Mutation Testing Scheme
Software Quality Analysis Using Mutation Testing Scheme
 
Test Automation Maturity: A Self-Assessment Tool
Test Automation Maturity: A Self-Assessment ToolTest Automation Maturity: A Self-Assessment Tool
Test Automation Maturity: A Self-Assessment Tool
 
Using Developer Information as a Prediction Factor
Using Developer Information as a Prediction FactorUsing Developer Information as a Prediction Factor
Using Developer Information as a Prediction Factor
 
Reproducible Crashes: Fuzzing Pharo by Mutating the Test Methods
Reproducible Crashes: Fuzzing Pharo by Mutating the Test MethodsReproducible Crashes: Fuzzing Pharo by Mutating the Test Methods
Reproducible Crashes: Fuzzing Pharo by Mutating the Test Methods
 
Testing survey by_directions
Testing survey by_directionsTesting survey by_directions
Testing survey by_directions
 
Parameter Estimation of GOEL-OKUMOTO Model by Comparing ACO with MLE Method
Parameter Estimation of GOEL-OKUMOTO Model by Comparing ACO with MLE MethodParameter Estimation of GOEL-OKUMOTO Model by Comparing ACO with MLE Method
Parameter Estimation of GOEL-OKUMOTO Model by Comparing ACO with MLE Method
 
Software testing defect prediction model a practical approach
Software testing defect prediction model   a practical approachSoftware testing defect prediction model   a practical approach
Software testing defect prediction model a practical approach
 

Viewers also liked

Automated parameter optimization should be included in future 
defect predict...
Automated parameter optimization should be included in future 
defect predict...Automated parameter optimization should be included in future 
defect predict...
Automated parameter optimization should be included in future 
defect predict...Chakkrit (Kla) Tantithamthavorn
 
An exploratory study of the state of practice of performance testing in Java-...
An exploratory study of the state of practice of performance testing in Java-...An exploratory study of the state of practice of performance testing in Java-...
An exploratory study of the state of practice of performance testing in Java-...corpaulbezemer
 
Performance Regression Analysis: Accomplishments and Challenges
Performance Regression Analysis: Accomplishments and ChallengesPerformance Regression Analysis: Accomplishments and Challenges
Performance Regression Analysis: Accomplishments and Challengescorpaulbezemer
 
SANER 2015 ERA track: Differential Flame Graphs
SANER 2015 ERA track: Differential Flame GraphsSANER 2015 ERA track: Differential Flame Graphs
SANER 2015 ERA track: Differential Flame Graphscorpaulbezemer
 
Logging library migrations - A case study for the Apache Software Foundation ...
Logging library migrations - A case study for the Apache Software Foundation ...Logging library migrations - A case study for the Apache Software Foundation ...
Logging library migrations - A case study for the Apache Software Foundation ...corpaulbezemer
 
Optimizing the Performance-Related Configurations of Object-Relational Mappin...
Optimizing the Performance-Related Configurations of Object-Relational Mappin...Optimizing the Performance-Related Configurations of Object-Relational Mappin...
Optimizing the Performance-Related Configurations of Object-Relational Mappin...corpaulbezemer
 
Ghotra icse
Ghotra icseGhotra icse
Ghotra icseSAIL_QU
 
An Automated Approach for Recommending When to Stop Performance Tests
An Automated Approach for Recommending When to Stop Performance TestsAn Automated Approach for Recommending When to Stop Performance Tests
An Automated Approach for Recommending When to Stop Performance TestsSAIL_QU
 
深層リカレントニューラルネットワークを用いた日本語述語項構造解析
深層リカレントニューラルネットワークを用いた日本語述語項構造解析深層リカレントニューラルネットワークを用いた日本語述語項構造解析
深層リカレントニューラルネットワークを用いた日本語述語項構造解析Hiroki Ouchi
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerLuminary Labs
 
TEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkTEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkVolker Hirsch
 

Viewers also liked (11)

Automated parameter optimization should be included in future 
defect predict...
Automated parameter optimization should be included in future 
defect predict...Automated parameter optimization should be included in future 
defect predict...
Automated parameter optimization should be included in future 
defect predict...
 
An exploratory study of the state of practice of performance testing in Java-...
An exploratory study of the state of practice of performance testing in Java-...An exploratory study of the state of practice of performance testing in Java-...
An exploratory study of the state of practice of performance testing in Java-...
 
Performance Regression Analysis: Accomplishments and Challenges
Performance Regression Analysis: Accomplishments and ChallengesPerformance Regression Analysis: Accomplishments and Challenges
Performance Regression Analysis: Accomplishments and Challenges
 
SANER 2015 ERA track: Differential Flame Graphs
SANER 2015 ERA track: Differential Flame GraphsSANER 2015 ERA track: Differential Flame Graphs
SANER 2015 ERA track: Differential Flame Graphs
 
Logging library migrations - A case study for the Apache Software Foundation ...
Logging library migrations - A case study for the Apache Software Foundation ...Logging library migrations - A case study for the Apache Software Foundation ...
Logging library migrations - A case study for the Apache Software Foundation ...
 
Optimizing the Performance-Related Configurations of Object-Relational Mappin...
Optimizing the Performance-Related Configurations of Object-Relational Mappin...Optimizing the Performance-Related Configurations of Object-Relational Mappin...
Optimizing the Performance-Related Configurations of Object-Relational Mappin...
 
Ghotra icse
Ghotra icseGhotra icse
Ghotra icse
 
An Automated Approach for Recommending When to Stop Performance Tests
An Automated Approach for Recommending When to Stop Performance TestsAn Automated Approach for Recommending When to Stop Performance Tests
An Automated Approach for Recommending When to Stop Performance Tests
 
深層リカレントニューラルネットワークを用いた日本語述語項構造解析
深層リカレントニューラルネットワークを用いた日本語述語項構造解析深層リカレントニューラルネットワークを用いた日本語述語項構造解析
深層リカレントニューラルネットワークを用いた日本語述語項構造解析
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI Explainer
 
TEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkTEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of Work
 

The Impact of Mislabelling on the Performance and Interpretation of Defect Prediction Models

  • 1. The Impact of Mislabelling on the Performance and Interpretation of Defect Prediction Models. Chakkrit (Kla) Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, Akinori Ihara, Kenichi Matsumoto. @klainfo, kla@chakkrit.com
  • 4. Software defects are costly. Monetary: NIST estimates that software defects cost the US economy $59.5 billion per year! Reputation: The Obama administration will always be connected to healthcare.gov.
  • 5. SQA teams try to find defects before they escape to the field.
  • 6. SQA teams have limited resources. QA resources are limited, while software continues to grow in size and complexity.
  • 9. Defect prediction models help SQA teams to: (1) predict which modules are risky; (2) understand what makes software fail.
  • 16. Modules that are fixed during post-release development are set as defective. Diagram: the defect dataset is a snapshot of Module 1 to Module 4 at the release date. Bug Report #1, raised post-release, is fixed by changes to Module 1 and Module 2, so those modules are labelled as defective; the other modules are labelled as clean.
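The labelling rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`issue_type`, `date`, `modules`), the function name `label_modules`, and the dates are hypothetical and do not come from the deck.

```python
from datetime import date


def label_modules(modules, fixes, release_date):
    """Label a module as defective (True) if a post-release bug fix touched it.

    `fixes` is a list of dicts with hypothetical keys:
    issue_type (str), date (datetime.date), modules (list of str).
    """
    defective = set()
    for fix in fixes:
        # Only fixes made after the release and linked to a bug report count.
        if fix["date"] > release_date and fix["issue_type"] == "bug":
            defective.update(fix["modules"])
    return {module: module in defective for module in modules}


# Hypothetical data mirroring the slide: Bug Report #1 fixes Module1 and Module2.
modules = ["Module1", "Module2", "Module3", "Module4"]
fixes = [
    {"issue": 1, "issue_type": "bug", "date": date(2015, 6, 1),
     "modules": ["Module1", "Module2"]},
]
labels = label_modules(modules, fixes, release_date=date(2015, 1, 1))
print(labels)
```

Note that the rule trusts the `issue_type` field, which is exactly where the mislabelling discussed in the following slides enters the dataset.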
  • 18. Defect models are trained using machine learning. The defect dataset (Module 1 to Module 4) is given to a machine learning or statistical learning technique, which produces a defect model.
  • 19. Defect data are noisy. The reliability of the models depends on the quality of the training data: a noisy defect dataset, passed through machine learning or statistical learning, yields an unreliable defect model.
  • 22. Issue reports are mislabelled. Fields in issue tracking systems are often missing or incorrect [Aranda et al., ICSE 2009]. 43% of issue reports are mislabelled [Herzig et al., ICSE 2013] [Antoniol et al., CASCON 2008]. Two kinds of mislabelling: defect mislabelling, where a new feature could be incorrectly labelled as a bug; and non-defect mislabelling, where a bug could be mislabelled as a new feature.
  • 26. Then, modules are mislabelled. Defect mislabelling: a new feature could be incorrectly labelled as a bug. Diagram (noisy data): issue report #1 is linked to M1 and M2, and issue report #2 is linked to M3. Report #2 is mislabelled, so M3 should be a clean module.
  • 27. Mislabelling may impact the performance. Prior works assumed that mislabelling is random [Kim et al., ICSE 2011] and [Seiffert et al., Information Science 2014]. Random mislabelling has a negative impact on the performance.
  • 28. Mislabelling is likely non-random. We suspect that novice developers are likely to mislabel more than experienced developers. Novice developers are known to overlook bookkeeping issues [Bachmann et al., FSE 2010].
  • 30. The impact of realistic mislabelling on the performance and interpretation of defect models. (RQ1) The nature of mislabelling. (RQ2) Its impact on the performance. (RQ3) Its impact on the interpretation.
  • 33. Using prediction models to classify whether issue reports are mislabelled. If the prediction model performs well, mislabelling is predictable. If it performs poorly, mislabelling is random.
  • 34. Selecting our studied systems: manually-curated issue reports [Herzig et al., ICSE 2013].
  • 39. Mislabelling is non-random. Figure: precision, recall, and F-measure of our model versus random guessing. Jackrabbit: precision 0.78 vs 0.12, recall 0.64 vs 0.50, F-measure 0.70 vs 0.19. Lucene: precision 0.75 vs 0.12, recall 0.71 vs 0.50, F-measure 0.73 vs 0.19. Our models achieve a mean F-measure of up to 0.73, which is 4-34 times better than random guessing.
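As a reading aid, the F-measure is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall values read off the figure; the function name `f_measure` is an assumption, and because the slide reports means, this is a consistency check rather than a reproduction of the study's computation.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as read from the slide's figure.
jackrabbit_model = f_measure(0.78, 0.64)
lucene_model = f_measure(0.75, 0.71)
random_guessing = f_measure(0.12, 0.50)

print(round(jackrabbit_model, 2), round(lucene_model, 2), round(random_guessing, 2))
```

The recomputed values agree with the F-measures shown on the slide to two decimal places, which supports reading the figure's numbers as (our model, random guessing) pairs per metric.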
  • 42. The impact of realistic mislabelling on the performance and interpretation of defect models. (RQ1) The nature of mislabelling: mislabelling is non-random. (RQ2) The impact on the performance.
  • 43. Compare the performance between clean models and noisy models: clean performance vs. realistic noisy performance vs. random noisy performance.
  • 48. Generating three samples. Clean sample: M1 to M4 labelled using the oracle, which knows that issue report #2 is mislabelled. Realistic noisy sample: add noise by flipping the labels of the modules that are addressed by the mislabelled issue reports. Random noisy sample: add noise by randomly flipping modules' labels.
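A minimal sketch of how the two noisy samples could be derived from the clean one. The function names, the report-to-module mapping, and the choice to flip the same number of labels in the random sample are assumptions made for illustration; the slide only states that labels are flipped realistically or randomly.

```python
import random


def realistic_noisy(clean_labels, mislabelled_reports, report_to_modules):
    """Flip the label of every module addressed by a mislabelled issue report."""
    affected = set()
    for report in mislabelled_reports:
        affected.update(report_to_modules[report])
    # Each affected module is flipped once, even if several reports touch it.
    return {m: (not label) if m in affected else label
            for m, label in clean_labels.items()}


def random_noisy(clean_labels, n_flips, seed=0):
    """Flip the labels of n_flips modules chosen uniformly at random."""
    rng = random.Random(seed)
    chosen = set(rng.sample(sorted(clean_labels), n_flips))
    return {m: (not label) if m in chosen else label
            for m, label in clean_labels.items()}


# Hypothetical data mirroring the slide: report #2 (linked to M3) is mislabelled.
clean = {"M1": True, "M2": True, "M3": False, "M4": False}
report_to_modules = {1: ["M1", "M2"], 2: ["M3"]}

realistic = realistic_noisy(clean, [2], report_to_modules)
randomised = random_noisy(clean, n_flips=1)
print(realistic)
print(randomised)
```

Flipping the same number of labels in both noisy samples keeps the amount of noise equal, so any difference in model behaviour can be attributed to where the noise falls rather than how much of it there is.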
  • 50. Generate the performance of clean models and noisy models. A defect model is trained on each of the clean sample, the realistic noisy sample, and the random noisy sample, and the resulting clean, realistic noisy, and random noisy performance are compared. Performance ratio = (performance of realistic noisy model) / (performance of clean model).
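The performance ratio can be illustrated with a small sketch. The helper names and the prediction vectors below are hypothetical and chosen only to show the arithmetic; they are not results from the study.

```python
def precision_recall(actual, predicted):
    """Compute precision and recall for boolean labels (True = defective)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def performance_ratio(noisy_score, clean_score):
    """Ratio of noisy-model to clean-model performance; 1 means no impact."""
    if clean_score == 0:
        raise ValueError("clean-model performance must be non-zero")
    return noisy_score / clean_score


# Hypothetical test-set labels and predictions from the two models.
actual = [True, True, True, True, False, False, False, False]
clean_pred = [True, True, True, True, True, False, False, False]
noisy_pred = [True, True, False, False, False, False, False, False]

clean_p, clean_r = precision_recall(actual, clean_pred)
noisy_p, noisy_r = precision_recall(actual, noisy_pred)
precision_ratio = performance_ratio(noisy_p, clean_p)
recall_ratio = performance_ratio(noisy_r, clean_r)
print(precision_ratio, recall_ratio)
```

A ratio below 1 means the model trained on noisy data performs worse than the model trained on clean data for that measure.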
  • 53. While the recall is often impacted, the precision is rarely impacted. Ratio = realistic noisy / clean. Interpretation: ratio = 1 means there is no impact. Precision is rarely impacted by realistic mislabelling. Models trained on noisy data achieve 56% of the recall of models trained on clean data. Figure: ratios for precision and recall, on an axis from 0.0 to 2.0.
  • 56. The impact of realistic mislabelling on the performance and interpretation of defect models. (RQ1) The nature of mislabelling: mislabelling is non-random. (RQ2) The impact on the performance: while the recall is often impacted, the precision is rarely impacted. (RQ3) The impact on the interpretation.
 • 58. Generate the rank of metrics of clean models and noisy models. For each of the clean model, the realistic noisy model, and the random noisy model, the variable importance scores are ranked to give the rank of metrics.
 • 60. Does a metric of the clean model appear at the same rank in the noisy models? [Diagram: rank of metrics of the clean model (ranks 1, 2, 3) compared with rank of metrics of the noisy model (ranks 1, 2, 3)]
 • 61. Only the metrics in the 1st rank are robust to the mislabelling. 85% of the metrics in the 1st rank of the clean model also appear in the 1st rank of the noisy model.
 • 62. Conversely, the metrics in the 2nd and 3rd ranks are less stable. As few as 18% of the metrics in the 2nd and 3rd ranks of the clean models appear in the same rank in the noisy models.
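The rank comparison described on these slides can be sketched as follows. This is only an illustrative sketch: the metric names and rank assignments are hypothetical, the function `rank_agreement` is a name introduced here, and the ranks are taken as given inputs rather than derived from variable importance scores.

```python
def rank_agreement(clean_ranks, noisy_ranks, rank):
    """Fraction of metrics at `rank` in the clean model that hold the
    same rank in the noisy model. Returns None if the clean model has
    no metric at that rank."""
    clean_at_rank = [m for m, r in clean_ranks.items() if r == rank]
    if not clean_at_rank:
        return None
    same = sum(1 for m in clean_at_rank if noisy_ranks.get(m) == rank)
    return same / len(clean_at_rank)


# Hypothetical rank of each metric in the clean and noisy models.
clean_ranks = {"loc": 1, "churn": 2, "complexity": 2, "authors": 3}
noisy_ranks = {"loc": 1, "churn": 3, "complexity": 2, "authors": 2}

agreement = {k: rank_agreement(clean_ranks, noisy_ranks, k) for k in (1, 2, 3)}
for k, value in agreement.items():
    print(f"rank {k}: {value:.0%} of clean-model metrics keep their rank")
```

With these made-up inputs the single 1st-rank metric keeps its position, while the 2nd and 3rd ranks agree only partially or not at all. In practice the agreement would be averaged over many noisy models rather than computed from one pair.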
 • 64. The impact of realistic mislabelling on the performance and interpretation of defect models. (RQ1) The Nature of Mislabelling: Mislabelling is non-random. (RQ2) The Impact on the Performance: While the recall is often impacted, the precision is rarely impacted. (RQ3) The Impact on the Interpretation: Only top-rank metrics are robust to the mislabelling.
 • 68. Suggestions. (RQ1) The Nature of Mislabelling: Researchers can use our noise models to clean mislabelled issue reports. (RQ2) The Impact on the Performance: Cleaning data will improve the ability to identify defective modules. (RQ3) The Impact on the Interpretation: Quality improvement plans should be based on the top-rank metrics.
 • 70. Issue reports are mislabelled. 43% of issue reports are mislabelled [Herzig et al., ICSE 2013] [Antoniol et al., CASCON 2008]. Fields in issue tracking systems are often missing or incorrect [Aranda et al., ICSE 2009].
 Defect mislabelling: a new feature could be incorrectly labelled as a bug.
 Non-defect mislabelling: a bug could be mislabelled as a new feature.
 • 71. Mislabelling may impact the performance. Prior works assumed that mislabelling is random [Kim et al., ICSE 2011] and [Seiffert et al., Information Science 2014]. Random mislabelling has a negative impact on the performance.
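For illustration, random mislabelling can be simulated by flipping a fixed fraction of labels chosen uniformly at random. The sketch below is a generic label-flipping routine and not the procedure of the cited studies; the function name `inject_random_noise`, the `rate` parameter, and the sample labels are assumptions made for this example.

```python
import random


def inject_random_noise(labels, rate, seed=0):
    """Flip a fraction `rate` of binary labels (1 = defective, 0 = clean),
    choosing the positions uniformly at random."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n_flip = round(len(labels) * rate)
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


# Hypothetical clean labels for ten modules.
clean_labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
noisy_labels = inject_random_noise(clean_labels, rate=0.2)

n_changed = sum(1 for c, n in zip(clean_labels, noisy_labels) if c != n)
print(f"{n_changed} of {len(clean_labels)} labels were flipped")
```

Because every label is equally likely to be flipped, this kind of noise is independent of any property of the module or issue report, which is the assumption the talk contrasts with realistic mislabelling.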