With the rise of the Mining Software Repositories (MSR) field, defect datasets extracted from software repositories play a foundational role in many empirical studies related to software quality. At the core of defect data preparation is the identification of post-release defects. Prior studies leverage many heuristics (e.g., keywords and issue IDs) to identify post-release defects. However, such a heuristic approach rests on several assumptions, which pose common threats to the validity of many studies. In this paper, we set out to investigate the differences between defect datasets generated by the heuristic approach and by the realistic approach, which leverages the earliest affected release that is realistically estimated by a software development team for a given defect. In addition, we investigate the impact of defect identification approaches on the predictive accuracy and the ranking of defective modules produced by defect models. Through a case study of defect datasets of 32 releases, we find that the heuristic approach has a large impact on both defect count datasets and binary defect datasets. Surprisingly, we find that the heuristic approach has a minimal impact on defect count models, suggesting that future work need not be too concerned about defect count models that are constructed using heuristic defect datasets. On the other hand, using defect datasets generated by the realistic approach leads to an improvement in the predictive accuracy of defect classification models.
Mining Software Defects: Should We Consider Affected Releases?
1. Mining Software Defects: Should We Consider Affected Releases?
Dr. Chakkrit (Kla) Tantithamthavorn
Monash University, Australia.
chakkrit.tantithamthavorn@monash.edu
@klainfo | http://chakkrit.com
Co-authors: Suraj Y., Jirayus J., Patanamon T.
2-4. ANALYTICAL MODELS FOR SOFTWARE DEFECTS
Focus on predicting and explaining future software defects, and building empirical theories:
- Predicting future software defects so practitioners can effectively optimize limited resources
- Explaining what makes software fail so managers can develop the most effective improvement plans
- Building empirically-grounded theories of software quality
7-10. ANALYTICAL MODELLING WORKFLOW
MAME: Mining, Analyzing, Modelling, Explaining
[Figure: Raw Data -(MINING)-> Clean Data -(ANALYZING)-> Correlation -(MODELLING)-> Analytical Models -(EXPLAINING)-> Knowledge]
This presentation focuses on the Mining step.
11-12. MINING SOFTWARE DEFECTS
STEP 1: EXTRACT DATA
Raw Data:
- Issue Tracking System (ITS): Issue Reports
- Version Control System (VCS): Code Changes, Code Snapshot, Commit Log
20. MINING SOFTWARE DEFECTS
Reference: https://issues.apache.org/jira/browse/LUCENE-4128
An issue report tells us: the issue reference ID; whether it is a bug or a new feature; which releases are affected; which commits belong to this issue report; and whether the report was created after the release of interest.
Raw Data: the Issue Tracking System (ITS) provides Issue Reports; the Version Control System (VCS) provides Code Changes, Code Snapshot, and Commit Log.
STEP 1: EXTRACT DATA → STEP 2: COLLECT METRICS → STEP 3: IDENTIFY DEFECTS
23-25. CHALLENGES OF MINING SOFTWARE DEFECTS
THE HEURISTIC APPROACH
Post-release defects are defined as modules that are fixed for a defect report within a post-release window period (e.g., 6 months) [Fischer et al., ICSM'03]
(ID indicates a defect report ID, C indicates a commit hash, v indicates affected release(s))
Timeline after Release 1.0, with the 6-months window period:
- C1: Fixed ID-1 (ID=1, v=1.0), changing A.java
- C2: Fixed ID-2 (ID=2, v=0.9), changing B.java
- C3: Fixed ID-3 (ID=3, v=1.0), changing C.java
- C4: Fixed ID-4 (ID=4, v=1.0), changing D.java (fixed after the window period)

FILE    HEU-LABEL
A.java  DEFECTIVE
B.java  DEFECTIVE
C.java  DEFECTIVE
D.java  CLEAN

Fischer et al., "Populating a Release History Database from Version Control and Bug Tracking Systems", ICSM'03
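The heuristic labelling rule above can be sketched as follows (a minimal illustration; the data layout and function name are assumptions, not the paper's implementation):

```python
from datetime import datetime, timedelta

def heuristic_labels(release_date, fix_commits, window_days=182):
    """Label a file DEFECTIVE if any defect-fixing commit touches it
    within the post-release window period (~6 months)."""
    window_end = release_date + timedelta(days=window_days)
    labels = {}
    for commit in fix_commits:  # each commit fixes one defect report
        for path in commit["files"]:
            labels.setdefault(path, "CLEAN")
            if release_date <= commit["date"] <= window_end:
                labels[path] = "DEFECTIVE"
    return labels

# The Release 1.0 example from the slide: ID=2 affected v=0.9 but is
# fixed inside the window, so B.java is labelled DEFECTIVE anyway;
# ID=4 affected v=1.0 but is fixed after the window, so D.java is CLEAN.
release = datetime(2012, 1, 1)
commits = [
    {"date": release + timedelta(days=30),  "files": ["A.java"]},  # C1, ID=1
    {"date": release + timedelta(days=60),  "files": ["B.java"]},  # C2, ID=2
    {"date": release + timedelta(days=90),  "files": ["C.java"]},  # C3, ID=3
    {"date": release + timedelta(days=200), "files": ["D.java"]},  # C4, ID=4
]
print(heuristic_labels(release, commits))
# {'A.java': 'DEFECTIVE', 'B.java': 'DEFECTIVE', 'C.java': 'DEFECTIVE', 'D.java': 'CLEAN'}
```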
26-28. CHALLENGES OF MINING SOFTWARE DEFECTS
The 6-months window period (ID indicates a defect report ID, C indicates a commit hash, v indicates affected release(s))
Timeline after Release 1.0: C1 fixes ID-1 (v=1.0, A.java); C2 fixes ID-2 (v=0.9, B.java); C3 fixes ID-3 (v=1.0, C.java); C4 fixes ID-4 (v=1.0, D.java, after the window period)
Challenge 1: Some defect reports that are addressed within the specific post-release window period did not actually affect the release of interest (e.g., ID=2 affected v=0.9, yet B.java is labelled defective for release 1.0).
Challenge 2: Some defect reports that actually affect the release of interest may be addressed after the specific window period (e.g., ID=4 affected v=1.0, yet D.java is labelled clean).
29-30. USING THE EARLIEST AFFECTED RELEASES
That is realistically estimated by a software development team
THE REALISTIC APPROACH
Post-release defects are defined as modules that are fixed for a defect report that affected a release of interest
(ID indicates a defect report ID, C indicates a commit hash, v indicates affected release(s))
Timeline after Release 1.0: C1 fixes ID-1 (v=1.0, A.java); C2 fixes ID-2 (v=0.9, B.java); C3 fixes ID-3 (v=1.0, C.java); C4 fixes ID-4 (v=1.0, D.java)
da Costa et al. suggest that the affected release field in an Issue Tracking System (ITS) should be considered when identifying defect-introducing commits.
da Costa et al., "A Framework for Evaluating the Results of the SZZ Approach for Identifying Bug-Introducing Changes", TSE'17

FILE    HEU-LABEL  REAL-LABEL
A.java  DEFECTIVE  DEFECTIVE
B.java  DEFECTIVE  CLEAN
C.java  DEFECTIVE  DEFECTIVE
D.java  CLEAN      DEFECTIVE
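As a minimal sketch of the realistic labelling rule, using the ITS affected-release field (the data layout and function name here are illustrative, not the paper's implementation):

```python
def realistic_labels(release, issues):
    """Label a file DEFECTIVE if it is fixed for a defect report whose
    affected-release field (from the ITS) includes the release of interest."""
    labels = {}
    for issue in issues:
        for path in issue["fixed_files"]:
            labels.setdefault(path, "CLEAN")
            if release in issue["affected_releases"]:
                labels[path] = "DEFECTIVE"
    return labels

# The Release 1.0 example from the slide: ID=2 affected only v=0.9,
# so B.java is CLEAN; ID=4 affected v=1.0, so D.java is DEFECTIVE
# even though its fix landed after the heuristic window.
issues = [
    {"affected_releases": {"1.0"}, "fixed_files": ["A.java"]},  # ID=1
    {"affected_releases": {"0.9"}, "fixed_files": ["B.java"]},  # ID=2
    {"affected_releases": {"1.0"}, "fixed_files": ["C.java"]},  # ID=3
    {"affected_releases": {"1.0"}, "fixed_files": ["D.java"]},  # ID=4
]
print(realistic_labels("1.0", issues))
# {'A.java': 'DEFECTIVE', 'B.java': 'CLEAN', 'C.java': 'DEFECTIVE', 'D.java': 'DEFECTIVE'}
```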
31. (RQ1) How do heuristic defect datasets and realistic defect datasets differ?
(RQ2) How do defect labelling approaches impact the predictive accuracy of defect models?
(RQ3) How do defect labelling approaches impact the ranking of defective modules?
32. STUDIED SOFTWARE SYSTEMS
32 releases that span across 9 open-source software systems

Name      %Defective Ratio  KLOC
ActiveMQ  6%-15%            142-299
Camel     2%-18%            75-383
Derby     14%-33%           412-533
Groovy    3%-8%             74-90
HBase     20%-26%           246-534
Hive      8%-19%            287-563
JRuby     5%-18%            105-238
Lucene    3%-24%            101-342
Wicket    4%-7%             109-165

Each dataset has 65 software metrics:
• 54 code metrics
• 5 process metrics
• 6 ownership metrics
https://awsm-research.github.io/Rnalytica/
33-35. (RQ1) How do heuristic defect datasets and realistic defect datasets differ?
A defect dataset with 2 types of labels: defect counts generated by the heuristic approach (HeuBugCount) and by the realistic approach (RealBugCount), which are dichotomized into binary labels (HeuBug, RealBug).

FILE    METRICS  HeuBugCount  RealBugCount  HeuBug     RealBug
A.java  ……..     1            1             DEFECTIVE  DEFECTIVE
B.java  ……..     1            0             DEFECTIVE  CLEAN
C.java  ……..     2            2             DEFECTIVE  DEFECTIVE
D.java  ……..     0            0             CLEAN      CLEAN

We measure how many files have HeuBugCount != RealBugCount, and how many files have HeuBug != RealBug.
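Dichotomizing the count labels into binary labels and measuring the two disagreement questions above can be sketched as (illustrative names, not the paper's code):

```python
def dichotomize(bug_counts):
    """Convert per-file defect counts into binary DEFECTIVE/CLEAN labels."""
    return {f: ("DEFECTIVE" if n > 0 else "CLEAN") for f, n in bug_counts.items()}

# The four-file example from the slide.
heu_count  = {"A.java": 1, "B.java": 1, "C.java": 2, "D.java": 0}
real_count = {"A.java": 1, "B.java": 0, "C.java": 2, "D.java": 0}

heu_bug, real_bug = dichotomize(heu_count), dichotomize(real_count)
# Files where the two approaches disagree on counts vs. on binary labels.
diff_count = [f for f in heu_count if heu_count[f] != real_count[f]]
diff_label = [f for f in heu_bug if heu_bug[f] != real_bug[f]]
print(diff_count, diff_label)  # ['B.java'] ['B.java']
```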
38. [Bar charts: percentage of defective files with different defect counts (Defect Count Datasets), and percentage of mislabelled defective/clean files (Binary Defect Datasets)]
89% of defective files in heuristic defect datasets have different defect counts.
55% of defective files in heuristic datasets are mislabelled.
Both defect count and binary defect datasets are impacted!
39. (RQ1) How do heuristic defect datasets and realistic defect datasets differ?
→ Both defect count and binary defect datasets are impacted (i.e., different defect counts and mislabelling)
(RQ2) How do defect labelling approaches impact the predictive accuracy of defect models?
(RQ3) How do defect labelling approaches impact the ranking of defective modules?
40. (RQ2-3) How do defect labelling approaches impact defect models?
A defect dataset with 2 types of labels → Generate Samples (100-repeated bootstrap iterations) → Training corpus w/ 2 labels, plus a testing corpus with realistic-generated labels → Construct Models (regression + random forest) → Heuristic Models and Realistic Models → Evaluate Models (Predictive Accuracy, Ranking of Defects)
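The sample-generation step can be sketched as an out-of-sample bootstrap (an assumption about the exact resampling scheme; names are illustrative):

```python
import random

def bootstrap_splits(n_modules, iterations=100, seed=1):
    """Out-of-sample bootstrap: each iteration trains on a sample drawn
    with replacement and tests on the modules that were not sampled
    (which would carry the realistic-generated labels)."""
    rng = random.Random(seed)
    for _ in range(iterations):
        train = [rng.randrange(n_modules) for _ in range(n_modules)]
        test = sorted(set(range(n_modules)) - set(train))
        yield train, test

# Example: 100 train/test splits over a 500-module release.
splits = list(bootstrap_splits(500, iterations=100))
```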
41. [Boxplots: MAE and SA of defect count models LM_real, LM_heu, RFR_real, RFR_heu]
MAE (Mean Absolute Error): the lower the MAE, the more accurate the model.
SA (Standardized Accuracy): the higher the SA, the more accurate the model.
Surprisingly, there is no significant difference in the predictive accuracy of defect count models.
"Future studies should not be too concerned about the quality of heuristic-generated defect datasets when constructing defect count models"
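The two accuracy measures can be sketched as follows. Note this is a simplified illustration: the guessing baseline in SA is approximated here by always predicting the mean defect count, rather than the random-guessing procedure of the original SA definition.

```python
def mae(actual, predicted):
    """Mean Absolute Error: average |actual - predicted| per module."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def standardized_accuracy(actual, predicted):
    """SA = (1 - MAE_model / MAE_guess) * 100, where the guessing
    baseline predicts the mean defect count for every module."""
    guess = sum(actual) / len(actual)
    mae_guess = mae(actual, [guess] * len(actual))
    return (1 - mae(actual, predicted) / mae_guess) * 100

# A perfect model has MAE 0 and SA 100; a model no better than
# guessing has SA near 0.
```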
42. [Boxplots: AUC, Precision, Recall, F-measure of defect classification models GLM_real, GLM_heu, RFC_real, RFC_heu]
When using realistic defect datasets, defect classification models perform better: AUC improves by ~5% and F-measure by ~10% (Cliff's delta: medium to large).
"Realistic-generated defect datasets are recommended when building defect classification models"
43. (RQ1) How do heuristic defect datasets and realistic defect datasets differ?
→ Both defect count and binary defect datasets are impacted (i.e., different defect counts and mislabelling)
(RQ2) How do defect labelling approaches impact the predictive accuracy of defect models?
→ When using realistic defect datasets, there is no difference for defect count models, but defect classification models perform better
(RQ3) How do defect labelling approaches impact the ranking of defective modules?
44. [Boxplots of Realistic minus Heuristic differences: P@20%LOC, R@20%LOC, and Spearman for defect count models (LM, RF); P@20%LOC and R@20%LOC for defect classification models (GLM, RF)]
The realistic approach has a negligible to small impact on the ranking of defective files.
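The effort-aware ranking measures above can be sketched as follows (a simplified illustration of P@20%LOC and R@20%LOC; the function name and tuple layout are assumptions):

```python
def effort_aware_precision_recall(modules, effort=0.2):
    """Rank files by predicted defect-proneness, inspect files from the
    top until 20% of the total lines of code (LOC) is spent, and compute
    precision and recall within that inspected set.
    `modules` is a list of (predicted_score, loc, is_defective) tuples."""
    total_loc = sum(loc for _, loc, _ in modules)
    budget = effort * total_loc
    inspected, used = [], 0
    for m in sorted(modules, key=lambda m: m[0], reverse=True):
        if used + m[1] > budget:
            break  # next file would exceed the 20%-LOC inspection budget
        inspected.append(m)
        used += m[1]
    found = sum(1 for _, _, d in inspected if d)
    total_defective = sum(1 for _, _, d in modules if d)
    precision = found / len(inspected) if inspected else 0.0
    recall = found / total_defective if total_defective else 0.0
    return precision, recall
```

A high R@20%LOC means most defective files are caught early in the ranking, which is what practitioners with a limited inspection budget care about.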
45. (RQ1) How do heuristic defect datasets and realistic defect datasets differ?
→ Both defect count and binary defect datasets are impacted (i.e., different defect counts and mislabelling)
(RQ2) How do defect labelling approaches impact the predictive accuracy of defect models?
→ When using realistic defect datasets, there is no difference for defect count models, but defect classification models perform better
(RQ3) How do defect labelling approaches impact the ranking of defective modules?
→ The heuristic approach has a negligible to small impact on the ranking of defective modules
46. TAKEAWAY
For defect count models, future work should not be too concerned about heuristic-generated defect datasets.
For defect classification models, using the realistic approach is recommended.