With the rise of the Mining Software Repositories (MSR) field, defect datasets extracted from software repositories play a foundational role in many empirical studies related to software quality. At the core of defect data preparation is the identification of post-release defects. Prior studies leverage many heuristics (e.g., keywords and issue IDs) to identify post-release defects. However, such a heuristic approach rests on several assumptions, which pose common threats to the validity of many studies. In this paper, we set out to investigate the differences between defect datasets generated by the heuristic approach and by the realistic approach, which leverages the earliest affected release that is realistically estimated by a software development team for a given defect. In addition, we investigate the impact of defect identification approaches on the predictive accuracy and the ranking of defective modules produced by defect models. Through a case study of defect datasets of 32 releases, we find that the heuristic approach has a large impact on both defect count datasets and binary defect datasets. Surprisingly, we find that the heuristic approach has a minimal impact on defect count models, suggesting that future work need not be too concerned about defect count models that are constructed using heuristic defect datasets. On the other hand, using defect datasets generated by the realistic approach leads to an improvement in the predictive accuracy of defect classification models.
Mining Software Defects: Should We Consider Affected Releases?
1. Mining Software Defects: Should We Consider Affected Releases?
Dr. Chakkrit (Kla) Tantithamthavorn
Monash University, Australia.
chakkrit.tantithamthavorn@monash.edu
@klainfo | http://chakkrit.com
Co-authors: Suraj Y., Jirayus J., Patanamon T.
2-4. ANALYTICAL MODELS FOR SOFTWARE DEFECTS
Focus on predicting and explaining future software defects, and building empirical theories:
- Predicting future software defects so practitioners can effectively optimize limited resources
- Explaining what makes software fail so managers can develop the most effective improvement plans
- Building empirically-grounded theories of software quality
7-10. ANALYTICAL MODELLING WORKFLOW
MAME: Mining, Analyzing, Modelling, Explaining
[Figure: Raw Data -(MINING)-> Clean Data -(ANALYZING)-> Correlation -(MODELLING)-> Analytical Models -(EXPLAINING)-> Knowledge]
This presentation focuses on the Mining step.
11-12. MINING SOFTWARE DEFECTS
STEP 1: EXTRACT DATA
Raw Data:
- Issue Tracking System (ITS): Issue Reports
- Version Control System (VCS): Code Changes, Code Snapshot, Commit Log
20. MINING SOFTWARE DEFECTS
Reference: https://issues.apache.org/jira/browse/LUCENE-4128
An issue report tells us: the issue reference ID; whether it is a bug or a new feature; which releases are affected; which commits belong to this issue report; and whether the report was created after the release of interest.
Raw Data: the Issue Tracking System (ITS) provides Issue Reports; the Version Control System (VCS) provides Code Changes, Code Snapshot, and Commit Log.
STEP 1: EXTRACT DATA → STEP 2: COLLECT METRICS → STEP 3: IDENTIFY DEFECTS
23-25. CHALLENGES OF MINING SOFTWARE DEFECTS
THE HEURISTIC APPROACH
Post-release defects are defined as modules that are fixed for a defect report within a post-release window period (e.g., 6 months) [Fischer et al., ICSM'03]
(ID indicates a defect report ID, C indicates a commit hash, v indicates affected release(s))
Timeline after Release 1.0, with the 6-months window period:
- C1: Fixed ID-1 (ID=1, v=1.0), changing A.java
- C2: Fixed ID-2 (ID=2, v=0.9), changing B.java
- C3: Fixed ID-3 (ID=3, v=1.0), changing C.java
- C4: Fixed ID-4 (ID=4, v=1.0), changing D.java (fixed after the window period)

FILE    HEU-LABEL
A.java  DEFECTIVE
B.java  DEFECTIVE
C.java  DEFECTIVE
D.java  CLEAN

Fischer et al., "Populating a Release History Database from Version Control and Bug Tracking Systems", ICSM'03
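The heuristic labelling rule above can be sketched as follows (a minimal illustration; the data layout and function name are assumptions, not the paper's implementation):

```python
from datetime import datetime, timedelta

def heuristic_labels(release_date, fix_commits, window_days=182):
    """Label a file DEFECTIVE if any defect-fixing commit touches it
    within the post-release window period (~6 months)."""
    window_end = release_date + timedelta(days=window_days)
    labels = {}
    for commit in fix_commits:  # each commit fixes one defect report
        for path in commit["files"]:
            labels.setdefault(path, "CLEAN")
            if release_date <= commit["date"] <= window_end:
                labels[path] = "DEFECTIVE"
    return labels

# The Release 1.0 example from the slide: ID=2 affected v=0.9 but is
# fixed inside the window, so B.java is labelled DEFECTIVE anyway;
# ID=4 affected v=1.0 but is fixed after the window, so D.java is CLEAN.
release = datetime(2012, 1, 1)
commits = [
    {"date": release + timedelta(days=30),  "files": ["A.java"]},  # C1, ID=1
    {"date": release + timedelta(days=60),  "files": ["B.java"]},  # C2, ID=2
    {"date": release + timedelta(days=90),  "files": ["C.java"]},  # C3, ID=3
    {"date": release + timedelta(days=200), "files": ["D.java"]},  # C4, ID=4
]
print(heuristic_labels(release, commits))
# {'A.java': 'DEFECTIVE', 'B.java': 'DEFECTIVE', 'C.java': 'DEFECTIVE', 'D.java': 'CLEAN'}
```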
26-28. CHALLENGES OF MINING SOFTWARE DEFECTS
The 6-months window period (ID indicates a defect report ID, C indicates a commit hash, v indicates affected release(s))
Timeline after Release 1.0: C1 fixes ID-1 (v=1.0, A.java); C2 fixes ID-2 (v=0.9, B.java); C3 fixes ID-3 (v=1.0, C.java); C4 fixes ID-4 (v=1.0, D.java, after the window period)
Challenge 1: Some defect reports that are addressed within the specific post-release window period did not actually affect the release of interest (e.g., ID=2 affected v=0.9, yet B.java is labelled defective for release 1.0).
Challenge 2: Some defect reports that actually affect the release of interest may be addressed after the specific window period (e.g., ID=4 affected v=1.0, yet D.java is labelled clean).
29-30. USING THE EARLIEST AFFECTED RELEASES
That is realistically estimated by a software development team
THE REALISTIC APPROACH
Post-release defects are defined as modules that are fixed for a defect report that affected a release of interest
(ID indicates a defect report ID, C indicates a commit hash, v indicates affected release(s))
Timeline after Release 1.0: C1 fixes ID-1 (v=1.0, A.java); C2 fixes ID-2 (v=0.9, B.java); C3 fixes ID-3 (v=1.0, C.java); C4 fixes ID-4 (v=1.0, D.java)
da Costa et al. suggest that the affected release field in an Issue Tracking System (ITS) should be considered when identifying defect-introducing commits.
da Costa et al., "A Framework for Evaluating the Results of the SZZ Approach for Identifying Bug-Introducing Changes", TSE'17

FILE    HEU-LABEL  REAL-LABEL
A.java  DEFECTIVE  DEFECTIVE
B.java  DEFECTIVE  CLEAN
C.java  DEFECTIVE  DEFECTIVE
D.java  CLEAN      DEFECTIVE
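As a minimal sketch of the realistic labelling rule, using the ITS affected-release field (the data layout and function name here are illustrative, not the paper's implementation):

```python
def realistic_labels(release, issues):
    """Label a file DEFECTIVE if it is fixed for a defect report whose
    affected-release field (from the ITS) includes the release of interest."""
    labels = {}
    for issue in issues:
        for path in issue["fixed_files"]:
            labels.setdefault(path, "CLEAN")
            if release in issue["affected_releases"]:
                labels[path] = "DEFECTIVE"
    return labels

# The Release 1.0 example from the slide: ID=2 affected only v=0.9,
# so B.java is CLEAN; ID=4 affected v=1.0, so D.java is DEFECTIVE
# even though its fix landed after the heuristic window.
issues = [
    {"affected_releases": {"1.0"}, "fixed_files": ["A.java"]},  # ID=1
    {"affected_releases": {"0.9"}, "fixed_files": ["B.java"]},  # ID=2
    {"affected_releases": {"1.0"}, "fixed_files": ["C.java"]},  # ID=3
    {"affected_releases": {"1.0"}, "fixed_files": ["D.java"]},  # ID=4
]
print(realistic_labels("1.0", issues))
# {'A.java': 'DEFECTIVE', 'B.java': 'CLEAN', 'C.java': 'DEFECTIVE', 'D.java': 'DEFECTIVE'}
```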
31. (RQ1) How do heuristic defect datasets and realistic defect datasets differ?
(RQ2) How do defect labelling approaches impact the predictive accuracy of defect models?
(RQ3) How do defect labelling approaches impact the ranking of defective modules?
32. STUDIED SOFTWARE SYSTEMS
32 releases that span across 9 open-source software systems

Name      %Defective Ratio  KLOC
ActiveMQ  6%-15%            142-299
Camel     2%-18%            75-383
Derby     14%-33%           412-533
Groovy    3%-8%             74-90
HBase     20%-26%           246-534
Hive      8%-19%            287-563
JRuby     5%-18%            105-238
Lucene    3%-24%            101-342
Wicket    4%-7%             109-165

Each dataset has 65 software metrics:
• 54 code metrics
• 5 process metrics
• 6 ownership metrics
https://awsm-research.github.io/Rnalytica/
33-35. (RQ1) How do heuristic defect datasets and realistic defect datasets differ?
A defect dataset with 2 types of labels: defect counts generated by the heuristic approach (HeuBugCount) and by the realistic approach (RealBugCount), which are dichotomized into binary labels (HeuBug, RealBug).

FILE    METRICS  HeuBugCount  RealBugCount  HeuBug     RealBug
A.java  ……..     1            1             DEFECTIVE  DEFECTIVE
B.java  ……..     1            0             DEFECTIVE  CLEAN
C.java  ……..     2            2             DEFECTIVE  DEFECTIVE
D.java  ……..     0            0             CLEAN      CLEAN

We measure how many files have HeuBugCount != RealBugCount, and how many files have HeuBug != RealBug.
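Dichotomizing the count labels into binary labels and measuring the two disagreement questions above can be sketched as (illustrative names, not the paper's code):

```python
def dichotomize(bug_counts):
    """Convert per-file defect counts into binary DEFECTIVE/CLEAN labels."""
    return {f: ("DEFECTIVE" if n > 0 else "CLEAN") for f, n in bug_counts.items()}

# The four-file example from the slide.
heu_count  = {"A.java": 1, "B.java": 1, "C.java": 2, "D.java": 0}
real_count = {"A.java": 1, "B.java": 0, "C.java": 2, "D.java": 0}

heu_bug, real_bug = dichotomize(heu_count), dichotomize(real_count)
# Files where the two approaches disagree on counts vs. on binary labels.
diff_count = [f for f in heu_count if heu_count[f] != real_count[f]]
diff_label = [f for f in heu_bug if heu_bug[f] != real_bug[f]]
print(diff_count, diff_label)  # ['B.java'] ['B.java']
```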
38. [Bar charts: percentage of defective files with different defect counts (Defect Count Datasets), and percentage of mislabelled defective/clean files (Binary Defect Datasets)]
89% of defective files in heuristic defect datasets have different defect counts.
55% of defective files in heuristic datasets are mislabelled.
Both defect count and binary defect datasets are impacted!
39. (RQ1) How do heuristic defect datasets and realistic defect datasets differ?
→ Both defect count and binary defect datasets are impacted (i.e., different defect counts and mislabelling)
(RQ2) How do defect labelling approaches impact the predictive accuracy of defect models?
(RQ3) How do defect labelling approaches impact the ranking of defective modules?
40. (RQ2-3) How do defect labelling approaches impact defect models?
A defect dataset with 2 types of labels → Generate Samples (100-repeated bootstrap iterations) → Training corpus w/ 2 labels, plus a testing corpus with realistic-generated labels → Construct Models (regression + random forest) → Heuristic Models and Realistic Models → Evaluate Models (Predictive Accuracy, Ranking of Defects)
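The sample-generation step can be sketched as an out-of-sample bootstrap (an assumption about the exact resampling scheme; names are illustrative):

```python
import random

def bootstrap_splits(n_modules, iterations=100, seed=1):
    """Out-of-sample bootstrap: each iteration trains on a sample drawn
    with replacement and tests on the modules that were not sampled
    (which would carry the realistic-generated labels)."""
    rng = random.Random(seed)
    for _ in range(iterations):
        train = [rng.randrange(n_modules) for _ in range(n_modules)]
        test = sorted(set(range(n_modules)) - set(train))
        yield train, test

# Example: 100 train/test splits over a 500-module release.
splits = list(bootstrap_splits(500, iterations=100))
```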
41. [Boxplots: MAE and SA of defect count models LM_real, LM_heu, RFR_real, RFR_heu]
MAE (Mean Absolute Error): the lower the MAE, the more accurate the model.
SA (Standardized Accuracy): the higher the SA, the more accurate the model.
Surprisingly, there is no significant difference in the predictive accuracy of defect count models.
"Future studies should not be too concerned about the quality of heuristic-generated defect datasets when constructing defect count models"
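The two accuracy measures can be sketched as follows. Note this is a simplified illustration: the guessing baseline in SA is approximated here by always predicting the mean defect count, rather than the random-guessing procedure of the original SA definition.

```python
def mae(actual, predicted):
    """Mean Absolute Error: average |actual - predicted| per module."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def standardized_accuracy(actual, predicted):
    """SA = (1 - MAE_model / MAE_guess) * 100, where the guessing
    baseline predicts the mean defect count for every module."""
    guess = sum(actual) / len(actual)
    mae_guess = mae(actual, [guess] * len(actual))
    return (1 - mae(actual, predicted) / mae_guess) * 100

# A perfect model has MAE 0 and SA 100; a model no better than
# guessing has SA near 0.
```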
42. [Boxplots: AUC, Precision, Recall, F-measure of defect classification models GLM_real, GLM_heu, RFC_real, RFC_heu]
When using realistic defect datasets, defect classification models perform better: AUC improves by ~5% and F-measure by ~10% (Cliff's delta: medium to large).
"Realistic-generated defect datasets are recommended when building defect classification models"
43. (RQ1) How do heuristic defect datasets and realistic defect datasets differ?
→ Both defect count and binary defect datasets are impacted (i.e., different defect counts and mislabelling)
(RQ2) How do defect labelling approaches impact the predictive accuracy of defect models?
→ When using realistic defect datasets, there is no difference for defect count models, but defect classification models perform better
(RQ3) How do defect labelling approaches impact the ranking of defective modules?
44. [Boxplots of Realistic minus Heuristic differences: P@20%LOC, R@20%LOC, and Spearman for defect count models (LM, RF); P@20%LOC and R@20%LOC for defect classification models (GLM, RF)]
The realistic approach has a negligible to small impact on the ranking of defective files.
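The effort-aware ranking measures above can be sketched as follows (a simplified illustration of P@20%LOC and R@20%LOC; the function name and tuple layout are assumptions):

```python
def effort_aware_precision_recall(modules, effort=0.2):
    """Rank files by predicted defect-proneness, inspect files from the
    top until 20% of the total lines of code (LOC) is spent, and compute
    precision and recall within that inspected set.
    `modules` is a list of (predicted_score, loc, is_defective) tuples."""
    total_loc = sum(loc for _, loc, _ in modules)
    budget = effort * total_loc
    inspected, used = [], 0
    for m in sorted(modules, key=lambda m: m[0], reverse=True):
        if used + m[1] > budget:
            break  # next file would exceed the 20%-LOC inspection budget
        inspected.append(m)
        used += m[1]
    found = sum(1 for _, _, d in inspected if d)
    total_defective = sum(1 for _, _, d in modules if d)
    precision = found / len(inspected) if inspected else 0.0
    recall = found / total_defective if total_defective else 0.0
    return precision, recall
```

A high R@20%LOC means most defective files are caught early in the ranking, which is what practitioners with a limited inspection budget care about.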
45. (RQ1) How do heuristic defect datasets and realistic defect datasets differ?
→ Both defect count and binary defect datasets are impacted (i.e., different defect counts and mislabelling)
(RQ2) How do defect labelling approaches impact the predictive accuracy of defect models?
→ When using realistic defect datasets, there is no difference for defect count models, but defect classification models perform better
(RQ3) How do defect labelling approaches impact the ranking of defective modules?
→ The heuristic approach has a negligible to small impact on the ranking of defective modules
46. TAKEAWAY
For defect count models, future work should not be too concerned about heuristic-generated defect datasets.
For defect classification models, using the realistic approach is recommended.