each bug-fix revision for our two projects, as shown in
Figure 12. Most bug-fix revisions contain changes to just
one or two files. The middle 50% of file-change counts per
revision (between the 25% and 75% quartiles) range from about 1 to 3.
A typical approach for removing outliers from data treats a
data item as an outlier if it is 1.5 times greater than the 50%
quartile (the median). In our experiment, we adopt a very
conservative approach and define as outliers those file
change counts that are greater than 5 times the 50% quartile.
This ensures that any changes we note as outliers truly have
a large number of file changes.

Idea: not all file changes in the version that fixes a bug
are bug-fixing changes. Ignore these revisions! Hunk V[…]

Figure 14. Bug-introducing […] ignoring outlier revisions.
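The 5×-median outlier rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' tooling: the function names `find_outlier_revisions` and `drop_outliers`, the revision ids, and the counts are all hypothetical, and the 50% quartile is computed with the standard-library `statistics.median`.

```python
from __future__ import annotations

from statistics import median


def find_outlier_revisions(file_changes: dict[str, int], factor: float = 5.0) -> set[str]:
    """Return revisions whose file-change count is greater than factor x median.

    file_changes maps a (hypothetical) revision id to the number of files
    that revision changed; the median is the 50% quartile from the text.
    """
    if not file_changes:
        return set()
    threshold = factor * median(file_changes.values())
    return {rev for rev, count in file_changes.items() if count > threshold}


def drop_outliers(file_changes: dict[str, int], factor: float = 5.0) -> dict[str, int]:
    """Keep only the non-outlier revisions for later analysis."""
    outliers = find_outlier_revisions(file_changes, factor)
    return {rev: n for rev, n in file_changes.items() if rev not in outliers}


if __name__ == "__main__":
    # Hypothetical bug-fix revisions and their file-change counts.
    counts = {"r1": 1, "r2": 2, "r3": 1, "r4": 3, "r5": 2, "r6": 40}
    print(find_outlier_revisions(counts))  # median 2.0, threshold 10.0 -> {'r6'}
    print(sorted(drop_outliers(counts)))   # ['r1', 'r2', 'r3', 'r4', 'r5']
```

The `factor` parameter controls how aggressively large revisions are discarded; `factor=5.0` mirrors the conservative threshold described in the text, while smaller values would flag more revisions.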
Changes identified as outliers for our two projects are
shown as ‘+’ in Figure 12.

Figure 12. Box plots for the number of file changes per revision.

4.5. Manual Fix

We identify bug-fix revisions […] and bug-fix revision data […]
introducing changes. If a ch[…] is a bug-fix, we assume th[…]
hunks in the revision are b[…] them are true bug-fixes? It […]
change log and understanding […] One developer may think […]
others think it is only a s[…] feature addition. To check […]
true bug-fixes, we manually […] marked them as bug-fix […]
judges, graduate students w[…] verification. A judge mark[…]
projects (see Table 1) an[…] marks. Judges use a GUI-based […]
tool. The tool shows ind[…] Judges read the […]
carefully and decide if the […]
Identify all bug-introducing changes at method-level granularity.
[…] of SZZ against the new algorithm.
False positive: the algorithm finds a bug-introducing change, but in reality the change is not a bug-introducing change.
False negative: the algorithm cannot find a bug-introducing change that in reality is a bug-introducing change.
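The false-positive and false-negative definitions above reduce to set differences once both an algorithm's output and the manually verified bug-introducing changes are expressed at the same method-level granularity. The sketch below is a minimal illustration assuming each change is keyed by a hypothetical `(file, method, revision)` triple; the `Change` type, the `compare` helper, and the sample data are illustrative only and are not part of SZZ or the authors' implementation.

```python
from __future__ import annotations

from typing import NamedTuple


class Change(NamedTuple):
    """Hypothetical key for one method-level change."""
    file: str
    method: str
    revision: str


def compare(found: set[Change], actual: set[Change]) -> dict[str, set[Change]]:
    """Classify an algorithm's output against manually verified changes."""
    return {
        "true_positives": found & actual,
        # Reported as bug-introducing, but not bug-introducing in reality.
        "false_positives": found - actual,
        # Bug-introducing in reality, but not reported by the algorithm.
        "false_negatives": actual - found,
    }


if __name__ == "__main__":
    # Hypothetical changes; not data from the study.
    a = Change("Foo.java", "bar()", "r10")
    b = Change("Foo.java", "baz()", "r12")
    c = Change("Util.java", "run()", "r11")
    verified = {a, b}
    for name, output in [("SZZ", {a, c}), ("new", {a, b})]:
        result = compare(output, verified)
        print(name, len(result["false_positives"]), len(result["false_negatives"]))
```

Running `compare` once with SZZ's output and once with the new algorithm's output against the same verified set yields directly comparable false-positive and false-negative counts.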