Projects commonly measure what percentage of their statements are executed by tests, but this single number is woefully inadequate for understanding how thoroughly a project’s code is tested, whether there are gaps in the tests, and whether the tests are useful in any meaningful way. For instance, seemingly simple changes in one part of the codebase may reduce the efficacy of existing tests that seem otherwise unrelated.
Code coverage can be useful for tracking long-term trends in how well tested a project is, but on a day-to-day basis it cannot serve as an indicator of changes in test suite quality. In particular, moving the coverage needle by even 0.01% can be extremely difficult in a project with millions of lines of code. At such a large scale, focus often drifts from which lines of code are covered to simply how many lines are covered. However, a change in the coverage of several hundred critical lines of code might be important for developers to notice. Over time, these small changes in which lines are covered add up to form a coverage debt and can lead to a dangerous reduction in test suite effectiveness.
We are building tools and techniques to help every developer track and manage their coverage debt, easily answering questions like: Which lines are no longer covered, even though I didn’t change them? Which lines are non-deterministically covered, perhaps indicative of flaky tests? By answering these questions with hard data, we can provide developers with a rich understanding of the impact of their actions on test suite effectiveness.
A Large-Scale Study of Test Coverage Evolution
1. ASE Sept 5, 2018
Michael Hilton, Jonathan Bell, and Darko Marinov
Carnegie Mellon University, George Mason University, and University of Illinois at Urbana-Champaign
http://www.code-coverage.org/
2. Coverage is the start to a conversation about test quality
Statement Coverage = (total # of statements executed by tests) / (total # of statements)
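As a minimal sketch of the statement coverage formula, assuming a per-line hit-count mapping (the function name and the data layout are hypothetical, not taken from the study's tooling):

```python
from typing import Mapping, Optional


def statement_coverage(hits: Mapping[int, Optional[int]]) -> float:
    """Percent of executable statements that ran at least once.

    `hits` maps a line number to its execution count. None marks a
    non-executable line (blank, comment), which is excluded.
    """
    executable = [count for count in hits.values() if count is not None]
    if not executable:
        raise ValueError("no executable statements")
    executed = sum(1 for count in executable if count > 0)
    return 100.0 * executed / len(executable)


# Hypothetical file: 5 executable statements, 3 of them executed.
example = {1: 4, 2: 4, 3: 0, 4: None, 5: 1, 6: 0}
print(statement_coverage(example))  # 60.0
```

The result says nothing about which statements were executed, which is the limitation the rest of the talk explores.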
3. Coverage Uses
• So how do the projects in the middle use coverage on a day-to-day basis, in a regression environment?
• Try to increase coverage
• Try to ensure each change is covered
• Use coverage for regression test selection/flaky test detection
• Our goal: Revisit what coverage can be used for, develop new metrics through an empirical study
4. Methodology
Identifying Projects
1a. Literature search
1b. GitHub + Coveralls mining
Collecting Coverage
2a. Clone and build projects, collect coverage: 29 projects, 5,360 builds
2b. Extract data from Coveralls API: 18 projects, 2,456 builds
Combined: 47 projects, 7,816 builds
Synthesizing Results
3. Analyze diffs and coverage per-build
Generating Visualizations
4. Visualize aggregate data
Coveralls: A (previously) untapped new resource for researchers
Our dataset of code coverage information from 7,816 revisions of 47 projects is publicly available: http://www.code-coverage.org/
5. Tooling
We used diff to compare file versions and track lines that are unchanged but have moved
$ diff --unchanged-line-format="%dn,%c'\12'" --new-line-format="n%c'\12'" --old-line-format="" $FILE1 $FILE2 | awk '/,/{n++;print $0n} /n/{n++}'
Source file, version 1
1. Foo;
2. Bar;
Source file, version 2
1. New;
2. Foo;
3. Bar;
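The same old-to-new line mapping can be sketched in Python. This is an illustrative substitute that uses the standard library's difflib rather than the diff and awk pipeline from the slide, and its matching algorithm may pair lines differently from GNU diff on larger files. The function name is hypothetical.

```python
import difflib
from typing import Dict, List


def map_unchanged_lines(old: List[str], new: List[str]) -> Dict[int, int]:
    """Map 1-based line numbers of unchanged lines in `old` to `new`."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    mapping: Dict[int, int] = {}
    for block in matcher.get_matching_blocks():
        # Each block is a run of `size` identical lines starting at
        # index `a` in the old file and index `b` in the new file.
        for offset in range(block.size):
            mapping[block.a + offset + 1] = block.b + offset + 1
    return mapping


version1 = ["Foo;", "Bar;"]
version2 = ["New;", "Foo;", "Bar;"]
print(map_unchanged_lines(version1, version2))  # {1: 2, 2: 3}
```

With such a mapping, coverage recorded for line 1 of version 1 can be compared against line 2 of version 2 even though the line itself was not edited.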
6. New Questions
• Which statements were covered but are no longer covered?
• Which statements are newly covered?
• Which statements flap between covered and uncovered?
• Broadly: How does a change to one statement impact the coverage of other statements?
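A minimal sketch of how the first two questions might be answered for a pair of builds, assuming the covered line sets of both builds and an old-to-new line mapping for unchanged lines are already available (all names and data here are hypothetical):

```python
from typing import Dict, Set, Tuple


def coverage_changes(
    old_covered: Set[int],
    new_covered: Set[int],
    line_map: Dict[int, int],
) -> Tuple[Set[int], Set[int]]:
    """Return (lost, gained) as sets of OLD line numbers.

    Only statements present in both versions are considered, i.e. the
    keys of `line_map` (old line number -> new line number).
    """
    lost: Set[int] = set()
    gained: Set[int] = set()
    for old_line, new_line in line_map.items():
        was_covered = old_line in old_covered
        is_covered = new_line in new_covered
        if was_covered and not is_covered:
            lost.add(old_line)
        elif is_covered and not was_covered:
            gained.add(old_line)
    return lost, gained


# Hypothetical patch that inserts one line at the top of the file.
line_map = {1: 2, 2: 3, 3: 4, 4: 5}
lost, gained = coverage_changes({1, 2, 3}, {2, 3, 5}, line_map)
print(lost, gained)  # {3} {4}
```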
7. RQ: Instead of trying to examine the total coverage, is it enough to monitor just the patch coverage?
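To make the two quantities concrete, here is a small sketch that assumes patch coverage means the share of executable lines touched by a patch that are covered. That definition, the function names, and the data are assumptions for illustration.

```python
from typing import Optional, Set


def total_coverage(executable: Set[int], covered: Set[int]) -> float:
    """Percent of all executable lines that are covered."""
    return 100.0 * len(executable & covered) / len(executable)


def patch_coverage(
    changed: Set[int], executable: Set[int], covered: Set[int]
) -> Optional[float]:
    """Percent of executable lines touched by the patch that are covered.

    Returns None when the patch touches no executable lines.
    """
    relevant = changed & executable
    if not relevant:
        return None
    return 100.0 * len(relevant & covered) / len(relevant)


# Hypothetical build: 10 executable lines, the patch touches lines 1-2.
executable = set(range(1, 11))
covered = {1, 2}
print(patch_coverage({1, 2}, executable, covered))  # 100.0
print(total_coverage(executable, covered))          # 20.0
```

A fully covered patch can sit inside a poorly covered project, so one number does not predict the other.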
8. Patch coverage does not correlate with overall coverage
[Figure: two panels of stacked bars, one bar per project (P1 to P47), x-axis 0% to 100%. One panel: percent of builds satisfying patch coverage % at level indicated by color. Other panel: percent of builds satisfying coverage % at level indicated by color. Color bins: 0, (0−25), (25−50], (50−75], (75−100), 100.]
Implication for developers: Do not rely on patch coverage as an indicator of overall quality
9. RQ: What impact do patches have on the coverage of non-patch code?
Original version: all lines covered, doStuff() is called
int x = 12;
if(x > 10)
{
    doStuff();
}
New version: first line changed, reduces coverage since doStuff() not called
int x = 10;
if(x > 10)
{
    doStuff();
}
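The effect can be reproduced in a few lines of Python. This is an illustrative sketch only: it ports the slide's example to Python and records executed lines with sys.settrace, which is not the coverage tooling used in the study. The helper names are hypothetical.

```python
import sys


def do_stuff():
    pass


def original():
    x = 12
    if x > 10:
        do_stuff()


def patched():
    x = 10
    if x > 10:
        do_stuff()


def executed_offsets(func):
    """Line offsets (relative to the `def` line) executed inside func."""
    code = func.__code__
    hits = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            hits.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)
    return hits


CALL_OFFSET = 3  # the do_stuff() call is three lines below each `def`
print(CALL_OFFSET in executed_offsets(original))  # True
print(CALL_OFFSET in executed_offsets(patched))   # False
```

Only the assignment changed, yet the unchanged call line lost its coverage.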
10. Patches often impact coverage of existing (non-patch) code
[Figure: stacked bars, one per project (P01 to P47), x-axis 0 to 100. Two groups: patches with changes to code files, and patches with no changes to code files. Legend for each group: Increase, No impact, Decrease.]
Implication for developers: Do not assume patches leave existing coverage unchanged
Implication for researchers: Do not assume changes to non-code files do not change behavior
11. Tracking Coverage Over Time
[Figure: two progressions across four versions, labeled "Ideal progression of coverage" and "Also possible progression of coverage". Both read Version 1: 40% Covered, Version 2: 50% Covered, Version 3: 60% Covered, Version 4: 80% Covered. Legend: Covered, Not Covered.]
New idea: what about tracking the coverage of each statement?
12. RQ: Is it ever the case that coverage changes in ways we cannot observe using traditional code coverage metrics?
60% Coverage ≠ 60% Coverage
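A tiny illustration of how two builds can report the same percentage while covering different statements. The data is made up, and it assumes line numbers are directly comparable between the two builds.

```python
TOTAL_STATEMENTS = 10

# Hypothetical sets of covered line numbers in two consecutive builds.
build_a = {1, 2, 3, 4, 5, 6}
build_b = {1, 2, 3, 7, 8, 9}


def percent(covered):
    return 100.0 * len(covered) / TOTAL_STATEMENTS


# Lines whose covered/uncovered status differs between the builds.
changed = sorted(build_a ^ build_b)

print(percent(build_a), percent(build_b))  # 60.0 60.0
print(changed)                             # [4, 5, 6, 7, 8, 9]
```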
13. Set of lines covered each build can vary
[Figure: stacked bars, one per project (P01 to P47), x-axis 0 to 100. Percentage of commits (size of bar) with statements changing coverage (color of bar) even when the total project coverage did not appear to change. Color bins: 0, 1−10, 11−100, 101−1,000, 1,001+.]
Implication for developers: Do not rely on “steady” coverage indicating no change to coverage
14. RQ: How does the set of covered lines change? Are there hot-spots of coverage change, i.e., lines that flip between being covered and uncovered throughout evolution?
[Figure: the statement int x = 12; shown four times side by side]
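One way to look for such hot-spots is to count, per statement, how often its status flips between consecutive builds. The sketch below assumes each statement already has a stable identifier across builds (for example after mapping line numbers between versions). The function name and data are hypothetical.

```python
from typing import Dict, List, Set


def flip_counts(history: List[Set[int]]) -> Dict[int, int]:
    """Count covered/uncovered flips per statement across builds.

    `history` holds the set of covered statement ids for each build,
    oldest first.
    """
    counts: Dict[int, int] = {}
    for previous, current in zip(history, history[1:]):
        # The symmetric difference is exactly the set of statements
        # whose status changed between the two builds.
        for statement in previous ^ current:
            counts[statement] = counts.get(statement, 0) + 1
    return counts


history = [{1, 2}, {1}, {1, 2}, {1, 3}]
print(flip_counts(history))  # {2: 3, 3: 1}
```

Statement 2 flips at every build, the kind of pattern that might point to a flaky test, while statement 1 never changes and does not appear in the result.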
16. RQ: What are the reasons for coverage changes? Does code coverage change because old code becomes tested, or because new, tested code is added? Or, does deleting lines drive changes to code coverage?
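A sketch of how a coverage change between two builds might be broken down by cause. The categories and names are assumptions for illustration, and the line mapping is taken as given.

```python
from typing import Dict, Set


def attribute_change(
    old_exec: Set[int], old_cov: Set[int],
    new_exec: Set[int], new_cov: Set[int],
    line_map: Dict[int, int],
) -> Dict[str, int]:
    """Count statements per cause of coverage change.

    `line_map` maps old line numbers to new ones for surviving lines.
    Old lines outside it were deleted; new lines outside it were added.
    """
    kept_old = set(line_map)
    kept_new = set(line_map.values())
    pairs = line_map.items()
    return {
        "old_code_newly_covered": sum(
            1 for o, n in pairs if o not in old_cov and n in new_cov),
        "old_code_no_longer_covered": sum(
            1 for o, n in pairs if o in old_cov and n not in new_cov),
        "added_covered": len(new_cov - kept_new),
        "added_uncovered": len((new_exec - new_cov) - kept_new),
        "deleted_covered": len(old_cov - kept_old),
        "deleted_uncovered": len((old_exec - old_cov) - kept_old),
    }


# Hypothetical patch: old line 4 deleted, new lines 4 and 5 added.
report = attribute_change(
    old_exec={1, 2, 3, 4}, old_cov={1, 2},
    new_exec={1, 2, 3, 4, 5}, new_cov={1, 3, 4},
    line_map={1: 1, 2: 2, 3: 3},
)
print(report)
```

In this example coverage moves from 2 of 4 statements to 3 of 5, and the report shows that the shift combines old code gaining and losing coverage, added lines, and a deleted uncovered line.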