Tracing Software Build Processes to Uncover License Compliance Inconsistencies
Shane McIntosh, Sander van der Burg, Eelco Dolstra, Julius Davies, Daniel M. Germán, Armijn Hemel
Tjaldur Software Governance Solutions
@shane_mcintosh
What is a build system?
Source code -> Deliverable
Sources (.tex, .c, .cc) -> intermediate files (.o, .o, .dvi, .a) -> deliverables (.exe, .pdf, .deb)
Build systems describe how sources are
translated into deliverables
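As a rough illustration of "describing how sources are translated into deliverables", the sketch below models a build description as a mapping from each target to its inputs. The file names and the `RULES` table are hypothetical, chosen only to mirror the file types on the slide; real build systems attach commands to such rules as well.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical rules: each target maps to the inputs it is built from.
RULES = {
    "main.o": ["main.c"],
    "util.o": ["util.c"],
    "libutil.a": ["util.o"],
    "app.exe": ["main.o", "libutil.a"],
}

def build_order(rules):
    """Return every file in an order where inputs precede their targets."""
    return list(TopologicalSorter(rules).static_order())

def sources_of(target, rules):
    """Collect the leaf source files that a target is ultimately built from."""
    inputs = rules.get(target)
    if inputs is None:  # no rule produces this file, so it is a source
        return {target}
    found = set()
    for dep in inputs:
        found |= sources_of(dep, rules)
    return found
```

Calling `sources_of("app.exe", RULES)` walks the rules back to the `.c` files, which is the kind of question the rest of the talk asks of real builds.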
Continuous Integration:
Enabled by the
build system
Commit 9719cf0 (.c, .mk) -> Build -> Test -> Report
Report: "Commit 9719cf0 was successfully integrated"
“There [is] no such thing as a free lunch”
An Empirical Study of Build Maintenance Effort
S. McIntosh, B. Adams, T. H. D. Nguyen, Y. Kamei, A. E. Hassan [ICSE 2011]
Up to 27% of source changes are accompanied by build changes
Maintenance overhead: source-build co-change (.c -> .mk?), build technology and maintenance, build logic cloning
Execution overhead: build hotspot detection, powerful hotspot indicators
Build systems also contain
useful information!
Reusable components are released
under different license terms
Apache Public
License
Failure to comply with license terms
can lead to costly legal issues
Ensuring license compliance with
reused components
Which source files are enabled?
Which components are used?
How are they combined? (static link or dynamic link)
The build system can
answer these questions!
We use system tracing to
discover build dependencies
Build process -> OS kernel: open(), read(), write(), close()
OS kernel -> Trace log
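The tracing step can be sketched as follows. This is a minimal illustration, not the authors' tooling: the log format is an assumption loosely modeled on strace-style output (`<pid> open("<path>", <flags>) = <fd>`), and the function names are hypothetical. It records which files each process opened for reading and for writing, then treats every written file as depending on every file the same process read.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) log line format, loosely modeled on strace output:
#   <pid> open("<path>", <flags>[, <mode>]) = <fd>
OPEN_RE = re.compile(r'^(\d+)\s+open\("([^"]+)",\s*([A-Z_|]+)[^)]*\)\s*=\s*(-?\d+)')

def file_accesses(trace_lines):
    """Map each process id to the sets of files it read and wrote."""
    accesses = defaultdict(lambda: {"read": set(), "write": set()})
    for line in trace_lines:
        match = OPEN_RE.match(line)
        if not match:
            continue  # skip read()/write()/close() and anything unparsed
        pid, path, flags, fd = match.groups()
        if int(fd) < 0:
            continue  # a failed open() creates no dependency
        flag_set = set(flags.split("|"))
        if flag_set & {"O_WRONLY", "O_RDWR"}:
            accesses[pid]["write"].add(path)
        if flag_set & {"O_RDONLY", "O_RDWR"}:
            accesses[pid]["read"].add(path)
    return dict(accesses)

def dependency_edges(accesses):
    """Each file a process wrote depends on every file that process read."""
    return {(src, out)
            for io in accesses.values()
            for out in io["write"]
            for src in io["read"]
            if src != out}
```

Grouping by process id alone is a simplification; a fuller treatment would also follow process creation so that a compiler driver and its child processes are attributed to the same build step.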
We mine build traces to construct a
concrete build dependency graph
Trace log -> build graph:
patchelf.cc, elf.h -> (g++) -> patchelf.o
patchelf.o, libstdc++ -> (g++) -> patchelf
patchelf -> (install) -> /usr/bin/patchelf
Annotate build graph nodes with
license information using Ninka
Inconsistency introduced!
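A minimal sketch of the annotation idea, under loud assumptions: the edges below are one reading of the patchelf figure, the license labels are placeholders and NOT the real licenses of these files, and the `ALLOWED_INPUTS` policy table is hypothetical. In the approach described on the slide, the per-file labels would come from a license identification tool such as Ninka rather than a hand-written table.

```python
# Build graph as read from the slide: each output maps to its inputs.
DERIVED_FROM = {
    "patchelf.o": ["patchelf.cc", "elf.h"],
    "patchelf": ["patchelf.o", "libstdc++"],
    "/usr/bin/patchelf": ["patchelf"],
}

# Placeholder labels for illustration only, not the files' actual licenses.
SOURCE_LICENSE = {
    "patchelf.cc": "License-A",
    "elf.h": "License-B",
    "libstdc++": "License-A",
}

# Hypothetical policy: which input licenses a declared license permits.
ALLOWED_INPUTS = {"License-A": {"License-A"}}

def licenses_reaching(node, graph, source_license):
    """Collect the licenses of all source files that a node is derived from."""
    if node not in graph:  # a leaf: look up its own license
        return {source_license.get(node, "UNKNOWN")}
    found = set()
    for dep in graph[node]:
        found |= licenses_reaching(dep, graph, source_license)
    return found

def inconsistencies(node, declared, graph, source_license, allowed):
    """Return input licenses that the declared license does not permit."""
    reaching = licenses_reaching(node, graph, source_license)
    return reaching - allowed.get(declared, set())
```

With these placeholder labels, checking `/usr/bin/patchelf` against a declared `License-A` reports `License-B` as the input that introduces the inconsistency.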
Empirical study
(RQ1) Accuracy
(RQ2) Practicality
Measuring the accuracy
of our CBDG (concrete build dependency graph) approach
Included .c files: delete one, then execute the build. Broken means true positive; clean means false positive.
Excluded .c files: delete one, then execute the build. Broken means false negative; clean means true negative.
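The four outcomes above map directly onto precision and recall. The sketch below assumes the delete-and-rebuild experiment has already been run and only classifies its results; the function names and the example counts in the usage note are hypothetical, not data from the study.

```python
def classify(in_graph, build_broke):
    """Label one deleted file using the slide's four outcomes."""
    if in_graph:  # the graph says the file is included in the build
        return "TP" if build_broke else "FP"
    return "FN" if build_broke else "TN"  # the graph says it is excluded

def precision_recall(outcomes):
    """Compute (precision, recall) from a list of TP/FP/FN/TN labels."""
    tp = outcomes.count("TP")
    fp = outcomes.count("FP")
    fn = outcomes.count("FN")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For instance, a made-up run with 9 true positives, 1 false positive, and 1 false negative gives a precision of 9/10 and a recall of 9/10; true negatives do not affect either measure.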
Empirical study
(RQ1) Accuracy: precision 88%-100%, recall 98%-100%
(RQ2) Practicality
Bugs filed using our approach
on multi-licensed packages
FFmpeg: license was updated within 3 days
CUPS: offending files were removed within 2 days
Empirical study
(RQ1) Accuracy: precision 88%-100%, recall 98%-100%
(RQ2) Practicality: prompted quick code changes in two systems
Tracing Software Build Processes to Uncover License Compliance Inconsistencies: A Retrospective