Build systems orchestrate the translation of human-readable source code into executable programs. In a software project, source code changes can induce changes in the build system (known as build co-changes). Due to the complexity of build systems, it is difficult for developers to identify when build co-changes are necessary. Build co-change prediction works well when there is a sufficient amount of training data to fit a model. In practice, however, new projects have only a limited number of changes. Using training data from other projects to predict the build co-changes in a new project can improve prediction performance. We refer to this problem as cross-project build co-change prediction.
In this paper, we propose CroBuild, a novel cross-project build co-change prediction approach that iteratively learns new classifiers. CroBuild constructs an ensemble of classifiers by iteratively building classifiers and assigning each a weight according to its prediction error rate. Given that only a small proportion of code changes are build co-changing, we also propose an imbalance-aware approach that, in each iteration, learns a threshold boundary between code changes that are build co-changing and those that are not. To examine the benefits of CroBuild, we perform experiments on four large datasets (Mozilla, Eclipse-core, Lucene, and Jazz) comprising a total of 50,884 changes. Across the four datasets, CroBuild achieves an average F1-score of 0.408. We also compare CroBuild against other approaches: a basic model, AdaBoost proposed by Freund et al., and TrAdaBoost proposed by Dai et al. On average, across the four datasets, CroBuild yields an improvement in F1-scores of 41.54%, 36.63%, and 36.97% over the basic model, AdaBoost, and TrAdaBoost, respectively.
5. The build system is at the heart of techniques like Continuous Integration (CI).
[Diagram: commit 9719cf0, touching .c and .mk files, flows through Build, Test, and Report stages; the report confirms that commit 9719cf0 was successfully integrated.]
11. “...nothing can be said to be certain, except death and taxes” - Benjamin Franklin
The Build “Tax”: up to 27% of source changes require build changes, too!
(An Empirical Study of Build Maintenance Effort. S. McIntosh, B. Adams, T. H. D. Nguyen, Y. Kamei, A. E. Hassan [ICSE 2011])
21. Neglected build maintenance can even impact end users.
[Screenshot: an application not working due to linking of an incorrect SQLite library version.]
When are build changes necessary?
27. Grouping related changes according to the work items that they address.
[Diagram: individual .c and .mk changes are grouped into transactions, and transactions are grouped into work items, e.g. “Add feature #2121”, “Missed code in #2121”, and “Fix for bug #1234”.]
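The two-level grouping above can be sketched as a simple aggregation. The change records and helper names below are hypothetical illustrations, not data or code from the study:

```python
from collections import defaultdict

# Hypothetical change records: (commit, changed files, work-item id).
changes = [
    ("c1", [("a.c", "code")], 2121),                     # "Add feature #2121"
    ("c2", [("b.c", "code"), ("m.mk", "build")], 2121),  # "Missed code in #2121"
    ("c3", [("d.c", "code")], 1234),                     # "Fix for bug #1234"
]

# Level 1: each commit is a transaction of file changes.
# Level 2: transactions addressing the same work item are grouped together,
# so a .mk change in any transaction marks the whole item as build co-changing.
work_items = defaultdict(list)
for commit, files, item in changes:
    work_items[item].append((commit, files))

def is_build_cochanging(item):
    """A work item is build co-changing if any grouped change touches a build file."""
    return any(kind == "build"
               for _, files in work_items[item]
               for _, kind in files)

print(is_build_cochanging(2121))  # True: m.mk changed within #2121
print(is_build_cochanging(1234))  # False: only source files changed
```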
29. We train classifiers to identify code changes that require build co-changes.
[Diagram: work items containing .c and .mk changes feed a classification model, which labels each change as “build change necessary” or “no build change necessary”.]
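A minimal sketch of this classification setup. The features and the decision rule below are hypothetical stand-ins chosen for illustration; the slide does not list the concrete features or model used:

```python
# Hypothetical features per work item: counts of changed file kinds.
def featurize(files):
    """Map a work item's changed files to a small feature vector."""
    return [
        sum(f.endswith(".c") for f in files),  # source files touched
        sum(f.endswith(".h") for f in files),  # headers touched
        len(files),                            # total change size
    ]

# A trivial stand-in for the trained classifier (assumption, not the
# study's model): flag header changes or larger changes as likely to
# require a build co-change.
def needs_build_change(features, size_threshold=2):
    return features[1] > 0 or features[2] >= size_threshold

print(needs_build_change(featurize(["main.c"])))           # False
print(needs_build_change(featurize(["api.h", "main.c"])))  # True
```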
32. Prior work shows that within-project build co-change prediction can be accurate: build co-change classifiers can achieve an AUC of 0.60-0.88.
(Mining Co-Change Information to Understand when Build Changes are Necessary. S. McIntosh, B. Adams, M. Nagappan, A. E. Hassan [ICSME 2014])
33. However, a large amount of historical data was used to train the classifiers. What about new projects? ...or projects with poorly-recorded historical data? Can we leverage these large corpora for the small ones?
39. How well do build co-change prediction models perform on sparse data?
[Bar chart: Precision, Recall, F1-score, and AUC (0-1 scale) when training on 5%, 50%, and 90% of the available history.]
Challenge 1: Very small datasets tend to yield models that under-perform.
41. How well do build co-change prediction models perform on other datasets?
[Bar chart: Precision, Recall, F1-score, and AUC (0-1 scale) for Eclipse => Mozilla, Jazz => Mozilla, and Lucene => Mozilla.]
Challenge 2: Cross-project build co-change models tend to under-perform.
46. Using transfer learning to provide some domain knowledge to the training corpus: move some training data from the target system to the training corpus.
[Diagram: a portion of the target system's data is moved from the testing corpus into the training corpus.]
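The data movement above can be sketched as a simple split. The function name and the 10% fraction are hypothetical choices for illustration, not values from the talk:

```python
import random

def transfer_split(source_changes, target_changes, frac=0.1, seed=0):
    """Move a small fraction of labeled target-system changes into the
    training corpus so the classifier sees some target-domain data;
    the remaining target data stays in the testing corpus."""
    rng = random.Random(seed)
    target = list(target_changes)
    rng.shuffle(target)
    k = int(len(target) * frac)
    training_corpus = list(source_changes) + target[:k]
    testing_corpus = target[k:]
    return training_corpus, testing_corpus

train, test = transfer_split(["s%d" % i for i in range(100)],
                             ["t%d" % i for i in range(50)])
print(len(train), len(test))  # 105 45
```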
58. Use the training corpus to find an appropriate threshold: classification model 1, model 2, ..., model N are built in successive iterations over the training corpus.
63. The ensemble of models is then used on the testing corpus.
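The iterative construction above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the actual CroBuild implementation: a least-squares scorer over bootstrap samples stands in for the real classifiers, an F1-driven threshold scan stands in for the imbalance-aware threshold learning, and the familiar log((1 - err) / err) weighting stands in for the error-rate-based weights:

```python
import numpy as np

def learn_threshold(y, scores):
    """Imbalance-aware step: instead of cutting scores at 0.5, scan
    candidate thresholds and keep the one maximizing F1 on the
    training corpus (build co-changes, the positives, are rare)."""
    best_t, best_f1 = 0.5, -1.0
    for t in np.quantile(scores, np.linspace(0.05, 0.95, 19)):
        pred = scores >= t
        tp = np.sum(pred & (y == 1))
        prec = tp / max(pred.sum(), 1)
        rec = tp / max((y == 1).sum(), 1)
        f1 = 2 * prec * rec / max(prec + rec, 1e-9)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

def build_ensemble(X, y, n_rounds=5, seed=0):
    """Iteratively fit simple scorers on bootstrap samples of the
    training corpus; weight each model by log((1 - err) / err), so
    lower-error rounds get a larger vote."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # add a bias column
    ensemble = []
    for _ in range(n_rounds):
        idx = rng.integers(0, len(X), len(X))  # bootstrap resample
        w, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        scores = Xb @ w
        t = learn_threshold(y, scores)
        err = np.clip(np.mean((scores >= t) != y), 1e-6, 1 - 1e-6)
        ensemble.append((w, t, np.log((1 - err) / err)))
    return ensemble

def predict(ensemble, X):
    """Weighted vote of the N models over the testing corpus."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    votes = sum(a * np.where(Xb @ w >= t, 1, -1) for w, t, a in ensemble)
    return (votes > 0).astype(int)
```

On a synthetic, imbalanced training set the learned per-model thresholds typically sit well below the naive 0.5 cut, which is the point of the imbalance-aware step.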
67. Our approach outperforms baseline cross-project approaches: a 37%-42% improvement.
[Bar chart: worst measured F-score on Eclipse, Jazz, Lucene, Mozilla, and on average, comparing our approach against ordinary cross-project prediction, AdaBoost, and TrAdaBoost.]
69. Our approach achieves similar results to within-project models: only a 7% drop in performance.
[Bar chart: worst measured F-score on Eclipse, Jazz, Lucene, Mozilla, and on average, comparing our approach against within-project models.]