Cross-Project Build Co-change Prediction

Shane McIntosh, Ahmed E. Hassan, Emad Shihab, David Lo, Xin Xia
shanemcintosh@acm.org, @shane_mcintosh, shanemcintosh.org

What is a build system?
Source code -> Deliverable

[Figure: example file types .tex, .c, .cc, .o, .dvi, .a, .exe, .pdf, .deb]
Build systems describe how sources are translated into deliverables

The build system is at the heart of techniques like Continuous Integration (CI)
Commit 9719cf0 (.c, .mk) -> Build -> Test -> Report
Commit 9719cf0 was successfully integrated

“...nothing can be said to be certain, except death and taxes” - Benjamin Franklin
The Build “Tax”
An Empirical Study of Build Maintenance Effort
S. McIntosh, B. Adams, T. H. D. Nguyen, Y. Kamei, A. E. Hassan [ICSE 2011]
Up to 27% of source changes require build changes, too!

Neglected build maintenance is a frequent cause of build breakage
Commit aedd38 (.c, .mk) -> Build -> Test -> Report
Commit aedd38 broke the build!

Neglected build maintenance can even impact end users
Not working due to linking of incorrect SQLite library version
When are build changes necessary?

Overview of the studied systems
29 years of historical data
Proprietary and open source systems

Grouping related changes according to the work items that they address
Changes: .c, .c, .c, .mk
Transactions: "Fix for bug #1234", "Add feature #2121", "Missed code in #2121"
Work items: 1234, 2121
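The grouping of transactions into work items can be sketched as follows. This is a minimal illustration, not the tooling used in the study: it assumes that work item IDs appear in transaction messages as `#<number>`, and the file names attached to each transaction are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical transaction records: (message, changed files).
# The messages come from the slide; the file names are made up.
TRANSACTIONS = [
    ("Fix for bug #1234", ["parser.c"]),
    ("Add feature #2121", ["ui.c"]),
    ("Missed code in #2121", ["ui_helpers.c", "rules.mk"]),
]

# Assumption: a work item is referenced as "#" followed by digits.
WORK_ITEM_ID = re.compile(r"#(\d+)")


def group_by_work_item(transactions):
    """Map each work item ID to the sorted list of files changed for it."""
    files_by_item = defaultdict(set)
    for message, files in transactions:
        for item_id in WORK_ITEM_ID.findall(message):
            files_by_item[item_id].update(files)
    return {item: sorted(files) for item, files in files_by_item.items()}


work_items = group_by_work_item(TRANSACTIONS)
```

Here the two transactions that mention #2121 collapse into one work item, so the `.mk` change in the follow-up transaction is associated with the source changes of the original one.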

We train classifiers to identify code changes that require build co-changes
Work items (.c, .c, .c, .mk) -> Classification model -> Build change necessary / No build change necessary
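Before such a classifier can be trained, each historical work item needs a label. One plausible way to derive it, sketched below, is to mark a work item as build co-changing when it touches both source and build files. The file-name rules (`.mk` suffix, `Makefile`) are assumptions for illustration; the slides do not list how build files were identified.

```python
# Assumption: build files are recognised by name; extend per build technology.
BUILD_SUFFIXES = (".mk",)
BUILD_NAMES = ("Makefile",)


def is_build_file(path):
    """Return True when the path looks like a build specification."""
    name = path.rsplit("/", 1)[-1]
    return name in BUILD_NAMES or name.endswith(BUILD_SUFFIXES)


def label_work_item(files):
    """Return True when source changes were accompanied by a build change."""
    touches_build = any(is_build_file(f) for f in files)
    touches_source = any(not is_build_file(f) for f in files)
    return touches_source and touches_build
```

A work item that changes only build files is labelled False here, since there is no source change for the build change to accompany; that choice is also an assumption.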

Prior work shows that within-project build co-change prediction can be accurate
Mining Co-Change Information to Understand when Build Changes are Necessary
S. McIntosh, B. Adams, M. Nagappan, A. E. Hassan [ICSME 2014]
Build co-change classifiers can achieve an AUC of 0.60-0.88
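For readers unfamiliar with AUC: it can be read as the probability that a randomly chosen build co-changing work item receives a higher score than a randomly chosen non-co-changing one. The sketch below computes it by direct pairwise comparison; the function name and toy inputs are illustrative only.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is the scale on which the reported 0.60-0.88 range sits.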

However, a large amount of historical data was used to train the classifiers
What about new projects?
…or projects with poorly-recorded historical data?
Can we leverage these large corpora for the small ones?

How well do build co-change prediction models perform on sparse data?
[Figure: Precision, Recall, F1-score, and AUC (axis 0 to 1); series: 5%, 50%, 90%]
Challenge 1: Very small datasets tend to yield models that under-perform
How well do build co-change prediction models perform on other datasets?
[Figure: Precision, Recall, F1-score, and AUC (axis 0 to 1); series: Eclipse => Mozilla, Jazz => Mozilla, Lucene => Mozilla]
Challenge 2: Cross-project build co-change models tend to under-perform
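The precision, recall, and F1-score reported in these comparisons follow the standard definitions. A minimal sketch, with build co-changing work items as the positive class and hypothetical function and variable names:

```python
def precision_recall_f1(actual, predicted):
    """Compute precision, recall, and F1 for the positive (truthy) class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # correctly flagged
    fp = sum(1 for a, p in pairs if not a and p)    # flagged unnecessarily
    fn = sum(1 for a, p in pairs if a and not p)    # missed build co-changes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because F1 is the harmonic mean of precision and recall, a model that never predicts a build change scores 0 even when most work items need none.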

Domain-specific project characteristics may limit the applicability of cross-project models
Training corpus -> Classification model -> Testing corpus?

Using transfer learning to provide some domain knowledge to the training corpus
Move some training data from target system to the training corpus
Training corpus -> Classification model -> Testing corpus?
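The data-movement step can be sketched as below. This only illustrates moving a slice of target-system data into the training corpus; it is not the full transfer learning procedure, and the function name, the `target_fraction` parameter, and its default value are assumptions.

```python
import random


def build_training_corpus(source_items, target_items, target_fraction=0.1, seed=0):
    """Move a random slice of target data into the training corpus.

    Returns (training, testing): training holds all source items plus the
    moved target items; testing holds the remaining target items.
    """
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    rng = random.Random(seed)           # fixed seed keeps the split repeatable
    shuffled = list(target_items)
    rng.shuffle(shuffled)
    n_moved = int(len(shuffled) * target_fraction)
    moved, testing = shuffled[:n_moved], shuffled[n_moved:]
    training = list(source_items) + moved
    return training, testing
```

The moved items never appear in the testing corpus, so the evaluation on the target system stays free of training data.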

Challenge 3: Build co-changes are the minority
Only 8%-17% of changes are build co-changing

Use training corpus to find an appropriate threshold
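One way to pick such a threshold is to scan the scores that a model assigns to the training corpus and keep the cut-off with the best F1. The slides do not say which criterion was optimised, so F1 is an assumption here, as are the function names and the toy data.

```python
def f1_at_threshold(labels, scores, threshold):
    """F1 of the positive class when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if not y and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y and s < threshold)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(labels, scores):
    """Try every observed score as a cut-off; return (threshold, F1) of the best."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(labels, scores, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy training corpus: positives are the minority and receive modest scores.
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.1, 0.1, 0.15, 0.2, 0.2, 0.25, 0.3, 0.35, 0.4]
threshold, f1 = best_threshold(labels, scores)
```

On this toy data the conventional 0.5 cut-off flags nothing, while the tuned cut-off recovers both minority-class items at the cost of one false alarm.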

Set aside the testing corpus
Training corpus -> Classification model -> Training corpus
Incorrectly classified!
Classification model 1, Classification model 2, …, Classification model N
Ensemble of models used on the testing corpus
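The final combination step can be sketched as a weighted vote over the N models. The slides do not state how the models are weighted or combined, so the voting rule below is an assumption, and the three toy models and their feature names are hypothetical.

```python
def ensemble_predict(models, weights, item):
    """Weighted vote: True when models voting 'build change necessary'
    hold more than half of the total weight."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    in_favor = sum(w for model, w in zip(models, weights) if model(item))
    return in_favor > total / 2


# Hypothetical stand-ins for trained models; each maps a work item to a bool.
def adds_files(item):
    return item["files_added"] > 0


def touches_many_files(item):
    return item["files_changed"] >= 5


def renames_files(item):
    return item["files_renamed"] > 0


MODELS = [adds_files, touches_many_files, renames_files]
example = {"files_added": 1, "files_changed": 2, "files_renamed": 0}
```

With equal weights only one of three models votes for a build change on `example`, so the ensemble says no; giving the first model more weight than the other two combined flips the outcome.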

Evaluating our approach
Relative performance
Training configuration sensitivity (Source, Target)

Our approach outperforms baseline cross-project approaches
[Figure: F-score (axis 0 to 1) for Eclipse, Jazz, Lucene, Mozilla, and Average; series: Our approach, Ordinary cross-project, AdaBoost, TrAdaBoost; annotation: Worst measured]
37%-42% improvement

Our approach achieves similar results to within-project models
[Figure: F-score (axis 0 to 1) for Eclipse, Jazz, Lucene, Mozilla, and Average; series: Our approach, Within-project; annotation: Worst measured]
Only a 7% drop in performance

Relative performance
37%-42% improvement over baseline
Only 7% drop of within-project F-measure
Training configuration sensitivity (Source, Target)

Additional data from the target system slowly improves classifier performance
[Figure: F-score; labels: Source, Target]

Relative performance
37%-42% improvement over baseline
Only 7% drop of within-project F-measure
Training configuration sensitivity (Source, Target)
F-score tends to improve as more target system data becomes available
