10. RBRFC vs. DRFC
RBRFC – Regression-Based Random Forest Classifier
DRFC – Discretized Random Forest Classifier
If defective ratio <= 15: use RBRFC
If defective ratio >= 35: use DRFC
Defective ratio – #defective modules / #total modules
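The rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming the thresholds 15 and 35 are percentages; the function names and the "undecided" label for the middle band are hypothetical, not taken from the slides.

```python
def defective_ratio(n_defective: int, n_total: int) -> float:
    """Defective ratio as a percentage: #defective modules / #total modules."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_defective <= n_total:
        raise ValueError("n_defective must lie between 0 and n_total")
    return 100.0 * n_defective / n_total


def suggest_classifier(ratio_percent: float) -> str:
    """Apply the slide's rule; thresholds are assumed to be percentages."""
    if ratio_percent <= 15:
        return "RBRFC"
    if ratio_percent >= 35:
        return "DRFC"
    # The slide gives no rule between 15 and 35 (hypothetical label).
    return "undecided"


ratio = defective_ratio(n_defective=12, n_total=100)
print(ratio, suggest_classifier(ratio))
```

The middle band is deliberately left open here, since the slide only states the two outer cases.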
Figure axis: DRFC(AUC)/RBRFC(AUC)
11. Between 15 and 35 – A simulation study
Figure: RBRFC vs. DRFC; axis: DRFC(AUC)/RBRFC(AUC)
13. Feature ranks of discretized and regression-based classifiers
Figure: shifts in feature ranks between RBRFC and DRFC
If rank 1: yes, RBRFC and DRFC ranks are comparable.
Else: they are not comparable.
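A rank shift can be computed as the difference between a feature's rank under one classifier and its rank under the other. The sketch below is only an illustration: the feature names and rank values are made up, and the helper names are hypothetical.

```python
from typing import Dict


def rank_shifts(ranks_a: Dict[str, int], ranks_b: Dict[str, int]) -> Dict[str, int]:
    """Per-feature shift in rank going from classifier A to classifier B."""
    if ranks_a.keys() != ranks_b.keys():
        raise ValueError("both rankings must cover the same features")
    return {feature: ranks_b[feature] - ranks_a[feature] for feature in ranks_a}


def top_rank_agrees(ranks_a: Dict[str, int], ranks_b: Dict[str, int]) -> bool:
    """True if the same feature(s) hold rank 1 under both classifiers."""
    top_a = {f for f, r in ranks_a.items() if r == 1}
    top_b = {f for f, r in ranks_b.items() if r == 1}
    return top_a == top_b


# Made-up ranks for three hypothetical features.
rbrfc_ranks = {"loc": 1, "complexity": 2, "churn": 3}
drfc_ranks = {"loc": 1, "complexity": 3, "churn": 2}

print(rank_shifts(rbrfc_ranks, drfc_ranks))
print(top_rank_agrees(rbrfc_ranks, drfc_ranks))
```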
17. Need for defect prediction
• To identify the potentially defective modules to test in a timely fashion (Performance).
• To identify the factors that cause the defects, so that recurrence of defects can be avoided (Feature Importance).
18. Data for experiments
Source: Tera-promise Repo
Filter: EPV > 10 and defective ratio < 80
Yes: Eclipse 2.0, Eclipse 2.1, Eclipse 3.0, Camel-1.2, Mylyn, PDE, Prop-1, Prop-2, Prop-3, Prop-4, Prop-5, Xalan-2.5, Xalan-2.6, Lucene-2.4, Poi-2.5, Poi-3.0, Xerces-1.4
No: Discard
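The selection filter can be sketched as follows. This assumes EPV means events per variable, computed as the number of defective modules divided by the number of features, and that the defective ratio is a percentage; the slide does not spell out either. The `Dataset` class and the toy entries are hypothetical and do not describe the real datasets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    name: str
    n_modules: int
    n_defective: int
    n_features: int


def epv(ds: Dataset) -> float:
    """Events per variable (assumed definition): defective modules per feature."""
    return ds.n_defective / ds.n_features


def defective_ratio(ds: Dataset) -> float:
    """Defective ratio as a percentage of all modules."""
    return 100.0 * ds.n_defective / ds.n_modules


def select(datasets: List[Dataset], min_epv: float = 10, max_ratio: float = 80) -> List[Dataset]:
    """Keep datasets with EPV > min_epv and defective ratio < max_ratio."""
    return [d for d in datasets if epv(d) > min_epv and defective_ratio(d) < max_ratio]


# Made-up datasets, for illustration only.
candidates = [
    Dataset("toy-a", n_modules=1000, n_defective=300, n_features=20),  # EPV 15, ratio 30
    Dataset("toy-b", n_modules=200, n_defective=50, n_features=20),    # EPV 2.5
    Dataset("toy-c", n_modules=500, n_defective=450, n_features=20),   # ratio 90
]
kept = [d.name for d in select(candidates)]
print(kept)
```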
20. Which classifier performs best?
Regression-based classifiers vs. discretized classifiers
Figure labels: Random Forest, Random Forest, Linear Regression, KNN
21. How well do regression-based classifiers perform?
Figure panels: defective ratio <= 15 (RBRFC, DRFC); defective ratio >= 30 (DRFC, RBRFC)
RBRFC – Regression-Based Random Forest Classifier
DRFC – Discretized Random Forest Classifier
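One way to read the comparison: a regression-based classifier predicts a continuous value (for example a defect count), and that value is used directly as a risk score when computing AUC against the discretized labels. The sketch below assumes this setup, which the slides do not spell out. The pairwise `auc` helper, the `discretize` threshold, and all numbers are illustrative only.

```python
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (defective, clean) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean module")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def discretize(defect_counts: Sequence[int], threshold: int = 0) -> List[int]:
    """Label a module defective (1) if its defect count exceeds the threshold."""
    return [1 if c > threshold else 0 for c in defect_counts]


# Toy values, not taken from the study.
defect_counts = [0, 0, 3, 0, 1, 7]
labels = discretize(defect_counts)

# Hypothetical output of a regression model trained on the raw counts.
predicted_counts = [0.2, 1.5, 2.8, 0.1, 0.9, 5.0]
rbrfc_auc = auc(labels, predicted_counts)
print(rbrfc_auc)
```

A discretized classifier's predicted probabilities would go through the same `auc` function, which is what makes a ratio such as DRFC(AUC)/RBRFC(AUC) well defined.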
22. Points for discussion
• How does the performance of discretized and regression-based classifiers vary for different defect ratios?
• Does the R² regression fit score impact the performance of regression-based classifiers?
• Permutation feature importance vs default feature importance?
23. Impact of R² score on AUC
No correlation between R² and AUC of the regression-based classifiers!
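One possible intuition, offered here as an illustration rather than as the study's explanation: AUC depends only on how modules are ordered, while R² depends on how close the predicted values are to the true ones. The toy example below scales the true counts so that the fit is poor but the ordering is unchanged; the data and helper names are made up.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (defective, clean) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy defect counts and their discretized labels.
defect_counts = [0, 0, 1, 2, 4]
labels = [1 if c > 0 else 0 for c in defect_counts]

# Predictions with the right ordering but badly wrong magnitudes.
inflated = [10 * c + 5 for c in defect_counts]

print(r2_score(defect_counts, inflated))  # negative: the fit is poor
print(auc(labels, inflated))              # ordering is intact
```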
25. What do we suggest?
• Use RBRFC if defective ratio < 35
• Don’t worry about the R² score of regression-based classifiers
• If using the default feature importance calculation method, employ caution when comparing between the classifiers
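For contrast with a default importance method (for random forests this is typically a built-in, impurity-based score, which is an assumption about what "default" means here), permutation importance measures how much a performance metric drops when one feature's values are shuffled. The sketch below is a minimal, library-free illustration; the toy model, data, and function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (defective, clean) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(
    predict: Callable[[List[float]], float],
    X: List[List[float]],
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in AUC when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = auc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - auc(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: feature 0 separates the classes, feature 1 is ignored by the model.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.7, 2.0], [0.8, 6.0], [0.9, 3.0]]
y = [0, 0, 0, 1, 1, 1]


def toy_model(row: List[float]) -> float:
    """Hypothetical scorer that uses only the first feature."""
    return row[0]


importances = permutation_importance(toy_model, X, y)
print(importances)
```

Because the score is computed the same way for any model, it can be applied to both RBRFC and DRFC, which is one reason it may be easier to compare across classifiers than a model-specific default.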