Mining Software Defects: Should We Consider Affected Releases?
Dr. Chakkrit (Kla) Tantithamthavorn
Monash University, Australia
chakkrit.tantithamthavorn@monash.edu
@klainfo · http://chakkrit.com
With Suraj Y., Jirayus J., and Patanamon T.
ANALYTICAL MODELS FOR SOFTWARE DEFECTS
Focus on predicting and explaining future software defects, and on building empirical theories:
- Predicting future software defects so practitioners can effectively optimize limited resources
- Explaining what makes software fail so managers can develop the most effective improvement plans
- Building empirically grounded theories of software quality
ANALYTICAL MODELLING WORKFLOW
MAME: Mining, Analyzing, Modelling, Explaining
Raw Data -> (MINING) -> Clean Data -> (ANALYZING) -> Correlation -> (MODELLING) -> Analytical Models -> (EXPLAINING) -> Knowledge
This presentation focuses on the Mining step.
MINING SOFTWARE DEFECTS
STEP 1: EXTRACT DATA
Raw data is extracted from two sources:
- Issue Tracking System (ITS): issue reports
- Version Control System (VCS): code changes, code snapshots, and the commit log
Reference: https://github.com/apache/lucene-solr/tree/662f8dd3423b3d56e9e1a197fe816393a33155e2
The code snapshot answers: what are the source files in this release?

STEP 2: COLLECT METRICS
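To make the snapshot question concrete, here is a minimal sketch (not the authors' tooling) that lists the source files of the release snapshot referenced above, assuming a local clone of lucene-solr:

```python
# Minimal sketch: list the Java files in one release snapshot.
# Assumes a local clone of lucene-solr; the commit hash is the one referenced above.
import subprocess

REPO = "lucene-solr"                                  # assumed local clone path
SNAPSHOT = "662f8dd3423b3d56e9e1a197fe816393a33155e2"

# `git ls-tree -r --name-only <commit>` prints every file path in that snapshot.
out = subprocess.run(
    ["git", "-C", REPO, "ls-tree", "-r", "--name-only", SNAPSHOT],
    capture_output=True, text=True, check=True,
).stdout

source_files = [p for p in out.splitlines() if p.endswith(".java")]
print(f"{len(source_files)} Java source files in this release")
```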
Reference: https://github.com/apache/lucene-solr/commit/662f8dd3423b3d56e9e1a197fe816393a33155e2
The code changes and commit log answer: how many lines were added or deleted? Who edited this file?
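A sketch of extracting those change metrics from the commit log, again assuming a local clone (`--numstat` is standard git output, not the paper's exact tooling):

```python
# Minimal sketch: lines added/deleted per file for one commit, via `git show --numstat`.
import subprocess

REPO = "lucene-solr"                                  # assumed local clone path
COMMIT = "662f8dd3423b3d56e9e1a197fe816393a33155e2"

out = subprocess.run(
    ["git", "-C", REPO, "show", "--numstat", "--format=", COMMIT],
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    if line.strip():
        added, deleted, path = line.split("\t")
        if added != "-":                              # binary files report "-"
            print(f"{path}: +{added} -{deleted}")
```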
Three families of metrics are collected:
CODE METRICS: size, code complexity, cognitive complexity, OO design (e.g., coupling, cohesion)
PROCESS METRICS: development practices (e.g., #commits, #dev, churn, #pre-release defects, change complexity)
HUMAN FACTORS: code ownership, #MajorDevelopers, #MinorDevelopers, author ownership, developer experience
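As one example from the human-factors family, a sketch of author ownership for a single file; the file path is hypothetical, and ownership here is simply the share of commits made by the most active author:

```python
# Minimal sketch: author ownership = share of a file's commits made by its top author.
import subprocess
from collections import Counter

REPO = "lucene-solr"                                                     # assumed local clone
FILE = "lucene/core/src/java/org/apache/lucene/index/IndexWriter.java"  # hypothetical path

authors = subprocess.run(
    ["git", "-C", REPO, "log", "--format=%an", "--", FILE],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

counts = Counter(authors)
ownership = counts.most_common(1)[0][1] / len(authors)
print(f"#developers={len(counts)}, author ownership={ownership:.2f}")
```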
STEP 3: IDENTIFY DEFECTS
Reference: https://issues.apache.org/jira/browse/LUCENE-4128
An issue report provides:
- the issue reference ID
- the report type (Bug / New Feature)
- which releases are affected
- which commits belong to this issue report
- whether the report was created after the release of interest
Which files were changed to fix the defect? Linking each issue report to its fix commits, and those commits to the files they changed, yields the defect dataset.
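A sketch of that linking step: match JIRA-style issue keys (e.g., LUCENE-4128) in commit subjects to group fix commits by issue report. A real pipeline would also cross-check the ITS; this regex-only matching is a simplification:

```python
# Minimal sketch: group commits by the JIRA issue key mentioned in their subject line.
import re
import subprocess
from collections import defaultdict

REPO = "lucene-solr"                      # assumed local clone path
ISSUE_KEY = re.compile(r"LUCENE-\d+")

log = subprocess.run(
    ["git", "-C", REPO, "log", "--format=%H %s"],
    capture_output=True, text=True, check=True,
).stdout

commits_by_issue = defaultdict(list)
for record in log.splitlines():
    sha, _, subject = record.partition(" ")
    for key in ISSUE_KEY.findall(subject):
        commits_by_issue[key].append(sha)

print(commits_by_issue.get("LUCENE-4128", []))
```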
CHALLENGES OF MINING SOFTWARE DEFECTS
THE HEURISTIC APPROACH
Post-release defects are defined as modules that are fixed for a defect report within a post-release window period (e.g., 6 months) [Fischer et al., ICSM'03].
Consider the 6-month window period after Release 1.0, where ID indicates a defect report ID, C indicates a commit hash, and v indicates the affected release(s):
- ID=1 (v=1.0) is fixed by C1, which changes A.java
- ID=2 (v=0.9) is fixed by C2, which changes B.java
- ID=3 (v=1.0) is fixed by C3, which changes C.java
- ID=4 (v=1.0) is fixed by C4, which changes D.java (after the window)
Fischer et al., "Populating a Release History Database from Version Control and Bug Tracking Systems", ICSM'03
Labels produced by the heuristic approach:
FILE    HEU-LABEL
A.java  DEFECTIVE
B.java  DEFECTIVE
C.java  DEFECTIVE
D.java  CLEAN
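A minimal sketch of the heuristic labelling rule on the example above; the dates are hypothetical, only the window logic matters:

```python
# Heuristic approach: a file is DEFECTIVE if a fix commit touches it within the
# 6-month post-release window. Dates below are hypothetical illustrations.
from datetime import datetime, timedelta

RELEASE_DATE = datetime(2020, 1, 1)        # assumed date of Release 1.0
WINDOW = timedelta(days=182)               # ~6 months

# (commit, hypothetical fix date, files changed)
fix_commits = [
    ("C1", datetime(2020, 2, 1), ["A.java"]),   # fixes ID=1 (v=1.0)
    ("C2", datetime(2020, 3, 1), ["B.java"]),   # fixes ID=2 (v=0.9)
    ("C3", datetime(2020, 5, 1), ["C.java"]),   # fixes ID=3 (v=1.0)
    ("C4", datetime(2020, 9, 1), ["D.java"]),   # fixes ID=4 (v=1.0), after the window
]

defective = {
    f
    for _, when, files in fix_commits
    if RELEASE_DATE <= when <= RELEASE_DATE + WINDOW
    for f in files
}
for f in ["A.java", "B.java", "C.java", "D.java"]:
    print(f, "DEFECTIVE" if f in defective else "CLEAN")
# A.java, B.java, C.java -> DEFECTIVE; D.java -> CLEAN, matching HEU-LABEL above
```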
CHALLENGES OF MINING SOFTWARE DEFECTS
The heuristic approach suffers from two problems:
1. Some defect reports that are addressed within the post-release window period did not actually affect the release of interest (e.g., ID=2 affected v=0.9, yet B.java is labelled DEFECTIVE for Release 1.0).
2. Some defect reports that actually affect the release of interest may be addressed after the window period (e.g., ID=4 affected v=1.0, yet D.java is labelled CLEAN).
USING THE EARLIEST AFFECTED RELEASE
That is, the release realistically estimated by the software development team.
THE REALISTIC APPROACH
Post-release defects are defined as modules that are fixed for a defect report that affected the release of interest.
da Costa et al. suggest that the affected release field in an Issue Tracking System (ITS) should be considered when identifying defect-introducing commits.
da Costa et al., "A Framework for Evaluating the Results of the SZZ Approach for Identifying Bug-Introducing Changes", TSE'17
Labels produced by the two approaches:
FILE    HEU-LABEL  REAL-LABEL
A.java  DEFECTIVE  DEFECTIVE
B.java  DEFECTIVE  CLEAN
C.java  DEFECTIVE  DEFECTIVE
D.java  CLEAN      DEFECTIVE
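The same example under the realistic rule, as a minimal sketch; the affected-release sets come straight from the issue reports above:

```python
# Realistic approach: a file is DEFECTIVE if a fixed defect report lists the
# release of interest in its affected-release field.
RELEASE = "1.0"

# (issue id, affected releases, files changed by the fix)
fixed_issues = [
    ("ID-1", {"1.0"}, ["A.java"]),
    ("ID-2", {"0.9"}, ["B.java"]),   # affects an earlier release only
    ("ID-3", {"1.0"}, ["C.java"]),
    ("ID-4", {"1.0"}, ["D.java"]),   # fixed after the window, but still counts
]

defective = {
    f for _, affected, files in fixed_issues if RELEASE in affected for f in files
}
for f in ["A.java", "B.java", "C.java", "D.java"]:
    print(f, "DEFECTIVE" if f in defective else "CLEAN")
# A.java, C.java, D.java -> DEFECTIVE; B.java -> CLEAN, matching REAL-LABEL above
```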
(RQ1) How do heuristic defect datasets and realistic defect datasets differ?
(RQ2) How do defect labelling approaches impact the predictive accuracy of defect models?
(RQ3) How do defect labelling approaches impact the ranking of defective modules?
STUDIED SOFTWARE SYSTEMS
32 releases that span across 9 open-source software systems:
Name      %DefectiveRatio  KLOC
ActiveMQ  6%-15%           142-299
Camel     2%-18%           75-383
Derby     14%-33%          412-533
Groovy    3%-8%            74-90
HBase     20%-26%          246-534
Hive      8%-19%           287-563
JRuby     5%-18%           105-238
Lucene    3%-24%           101-342
Wicket    4%-7%            109-165
Each dataset has 65 software metrics:
- 54 code metrics
- 5 process metrics
- 6 ownership metrics
https://awsm-research.github.io/Rnalytica/
(RQ1) How do heuristic defect datasets and realistic defect datasets differ?
A defect dataset with 2 types of labels, where HeuBug and RealBug are the dichotomized (binary) versions of HeuBugCount and RealBugCount:
FILE    METRICS  HeuBugCount  RealBugCount  HeuBug     RealBug
A.java  ……..     1            1             DEFECTIVE  DEFECTIVE
B.java  ……..     1            0             DEFECTIVE  CLEAN
C.java  ……..     2            2             DEFECTIVE  DEFECTIVE
D.java  ……..     0            0             CLEAN      CLEAN
We ask: how many files have HeuBugCount != RealBugCount? How many files have HeuBug != RealBug? A sketch of this comparison follows.
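A minimal sketch of that comparison on the toy table (pandas assumed; the actual study computes these percentages across the 32 studied releases):

```python
# Dichotomize defect counts into binary labels and measure label disagreement.
import pandas as pd

df = pd.DataFrame(
    {"HeuBugCount": [1, 1, 2, 0], "RealBugCount": [1, 0, 2, 0]},
    index=["A.java", "B.java", "C.java", "D.java"],
)
df["HeuBug"] = df["HeuBugCount"] > 0       # dichotomize to binary labels
df["RealBug"] = df["RealBugCount"] > 0

count_diff = (df["HeuBugCount"] != df["RealBugCount"]).mean()
mislabelled = (df["HeuBug"] != df["RealBug"]).mean()
print(f"files with different counts: {count_diff:.0%}")
print(f"mislabelled files: {mislabelled:.0%}")
```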
Both defect count and binary defect datasets are impacted!
- 89% of defective files in heuristic defect datasets have different defect counts.
- 55% of defective files in heuristic datasets are mislabelled.
[Figure: boxplots of the percentage of impacted files (0-100), for defect count datasets (Defective, Clean) and binary defect datasets (Mislabelled Defective, Mislabelled Clean).]
(RQ1) How do heuristic defect datasets and realistic defect datasets differ? Both defect count and binary defect datasets are impacted (i.e., different defect counts and mislabelling).
(RQ2) How do defect labelling approaches impact the predictive accuracy of defect models?
(RQ3) How do defect labelling approaches impact the ranking of defective modules?
(RQ2-3) How do defect labelling approaches impact defect models?
Experimental design (100-repeated bootstrap iterations):
1. Start from a defect dataset with 2 types of labels.
2. Generate samples: a training corpus with both labels, and a testing corpus with realistic-generated labels.
3. Construct models (regression + random forest): heuristic models and realistic models.
4. Evaluate models: predictive accuracy and ranking of defects.
A sketch of this design follows.
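A compact sketch of that loop with synthetic stand-in data; scikit-learn's random forest stands in for the classifiers, and the study also evaluates regression models and more measures:

```python
# Sketch of the evaluation design: 100 bootstrap iterations; train one model on
# heuristic labels and one on realistic labels, always testing against the
# realistic labels of the out-of-sample rows. X, heu, real are stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))                    # stand-in software metrics
real = X[:, 0] + rng.normal(size=n) > 0        # stand-in realistic labels
heu = real ^ (rng.random(n) < 0.2)             # stand-in noisy heuristic labels

aucs = {"heu": [], "real": []}
for _ in range(100):
    boot = rng.choice(n, size=n, replace=True)        # bootstrap training sample
    test = np.setdiff1d(np.arange(n), boot)           # out-of-sample rows
    for name, y in [("heu", heu), ("real", real)]:
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(X[boot], y[boot])
        prob = model.predict_proba(X[test])[:, 1]
        aucs[name].append(roc_auc_score(real[test], prob))  # realistic ground truth

print({k: round(float(np.mean(v)), 3) for k, v in aucs.items()})
```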
[Figure: boxplots of MAE (0.0-1.0) and SA (75-100) for LM_real, LM_heu, RFR_real, and RFR_heu.]
Surprisingly, there is no significant difference in the predictive accuracy of defect count models.
MAE (Mean Absolute Error): the lower the MAE, the more accurate the model.
SA (Standardized Accuracy): the higher the SA, the more accurate the model.
"Future studies should not be too concerned about the quality of heuristic-generated defect datasets when constructing defect count models."
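For reference, a sketch of both measures; SA follows Shepperd and MacDonell's standardized accuracy with random guessing as the baseline, and the toy counts below are hypothetical:

```python
# MAE = mean absolute error; SA = (1 - MAE / MAE_guess) * 100, where MAE_guess
# is the mean MAE of predicting each module with a randomly drawn observed count.
import numpy as np

def mae(actual, predicted):
    return np.mean(np.abs(np.asarray(actual) - np.asarray(predicted)))

def standardized_accuracy(actual, predicted, n_runs=1000, seed=0):
    rng = np.random.default_rng(seed)
    actual = np.asarray(actual)
    guesses = [mae(actual, rng.choice(actual, size=len(actual))) for _ in range(n_runs)]
    return (1 - mae(actual, predicted) / np.mean(guesses)) * 100

actual = [1, 0, 2, 0, 3, 1]       # hypothetical defect counts
predicted = [1, 0, 1, 0, 2, 1]    # hypothetical model output
print(f"MAE={mae(actual, predicted):.2f}, SA={standardized_accuracy(actual, predicted):.1f}")
```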
[Figure: boxplots of AUC, Precision, Recall, and F-measure (0.0-1.0) for GLM_real, GLM_heu, RFC_real, and RFC_heu.]
When using realistic defect datasets, defect classification models perform better:
- ~5% improvement in AUC (Cliff's delta: medium to large)
- ~10% improvement in F-measure (Cliff's delta: medium to large)
"Realistic-generated defect datasets are recommended when building defect classification models."
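Cliff's delta, the effect size behind "medium to large", as a sketch; the per-iteration AUC values are hypothetical, and the thresholds follow Romano et al.:

```python
# Cliff's delta: d = (#(x > y) - #(x < y)) / (len(x) * len(y)).
# Common thresholds: |d| >= 0.33 is medium, |d| >= 0.474 is large.
import numpy as np

def cliffs_delta(x, y):
    x, y = np.asarray(x), np.asarray(y)
    greater = (x[:, None] > y[None, :]).sum()
    less = (x[:, None] < y[None, :]).sum()
    return (greater - less) / (len(x) * len(y))

auc_real = [0.78, 0.80, 0.79, 0.82, 0.81]   # hypothetical per-iteration AUCs
auc_heu = [0.74, 0.75, 0.77, 0.76, 0.78]
print(f"Cliff's delta = {cliffs_delta(auc_real, auc_heu):.2f}")
```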
(RQ1) How do heuristic defect datasets and realistic defect datasets differ? Both defect count and binary defect datasets are impacted (i.e., different defect counts and mislabelling).
(RQ2) How do defect labelling approaches impact the predictive accuracy of defect models? When using realistic defect datasets, there is no difference for defect count models, but defect classification models perform better.
(RQ3) How do defect labelling approaches impact the ranking of defective modules?
[Figure: boxplots of the Realistic-Heuristic differences (-0.2 to 0.4) in P@20%LOC, R@20%LOC, and Spearman correlation for defect count models (LM, RF), and in P@20%LOC and R@20%LOC for defect classification models (GLM, RF).]
The realistic approach has a negligible to small impact on the ranking of defective files.
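A sketch of the effort-aware ranking measures, assuming a 20% LOC inspection budget; all inputs below are hypothetical:

```python
# Rank files by predicted risk, inspect from the top until 20% of total LOC is
# covered, then compute precision/recall over the inspected set.
import numpy as np

def pr_at_20pct_loc(risk, loc, defective):
    order = np.argsort(-np.asarray(risk))                 # most risky first
    loc, defective = np.asarray(loc)[order], np.asarray(defective)[order]
    budget = 0.20 * loc.sum()
    inspected = np.cumsum(loc) <= budget                  # files within the budget
    tp = defective[inspected].sum()
    precision = tp / max(inspected.sum(), 1)
    recall = tp / max(defective.sum(), 1)
    return precision, recall

risk = [0.9, 0.2, 0.7, 0.1]       # hypothetical predicted risk per file
loc = [100, 400, 150, 350]        # lines of code per file
defective = [1, 0, 1, 0]          # realistic labels
p, r = pr_at_20pct_loc(risk, loc, defective)
print(f"P@20%LOC={p:.2f}, R@20%LOC={r:.2f}")
```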
(RQ1) How do heuristic defect datasets and realistic defect datasets differ? Both defect count and binary defect datasets are impacted (i.e., different defect counts and mislabelling).
(RQ2) How do defect labelling approaches impact the predictive accuracy of defect models? When using realistic defect datasets, there is no difference for defect count models, but defect classification models perform better.
(RQ3) How do defect labelling approaches impact the ranking of defective modules? The choice of labelling approach has a negligible to small impact on the ranking of defective modules.
TAKE AWAY
For defect count models, future work should not be too concerned about heuristic labels.
For defect classification models, using the realistic approach is recommended.
https://awsm-research.github.io/Rnalytica/
Mining Software Defects: Should We Consider Affected Releases?
Dr. Chakkrit (Kla) Tantithamthavorn
chakkrit.tantithamthavorn@monash.edu
@klainfo · http://chakkrit.com
