Beyond Coverage: 
What Lurks in Test Suites? 
Patrick Lam, @uWaterlooSE 
(and Felix Fang) 
University of Waterloo
Test Suites: Myths vs Realities.
Subjects: Open-Source Test Suites
Basic Test Suite Properties 
Benchmark sizes: 
30 kLOC (google-visualization) to 
495 kLOC (weka) 
% of system represented by tests: 
5.3% (weka) to 50.4% (joda-time)
Static Test Suite Properties
Test suite versus benchmark size 
(scatter plot; regression slopes m = 0.3002 and m = 0.03514)
# test cases versus # test methods
apache-commons-collections tests 
Consider map.TestFlat3Map: 
contains 14 test methods 
yet, 156 test cases 
superclass tests: 42 tests 
+ 4 Apache Commons Collections “bulk tests”
Run-time Test Suite Properties
Test suites run quickly 
joda-time 4.9s 
jdom 5.0s 
google-vis 5.1s 
jgrapht 16.9s 
weka 28.9s 
apache-cc 34.0s 
poi 36.5s 
jmeter 53.0s 
jfreechart 241.0s
Failing tests 
(one value per suite, in the order of the timing slide above) 
76/384, 0, n/a 0, 1, 0, 3/1109, 0, 0, 0
Continuous Integration: Daily Builds
Continuous Integration: Daily Tests 
(via SonarQube, 
Travis CI, Surefire)
Myth #1: 
Coverage is a key property 
of test suites.
Coverage is central in textbooks 
Ammann and Offutt, Introduction to Software Testing
Coverage metrics from EclEmma
Coverage metrics
Reality #1 
Coverage sometimes important, 
but tools only give limited data.
Guideline #1 
Consider metrics beyond 
reported coverage results: 
- weka uses peer review for QA 
- not measured by tools: 
input space coverage
Myth #2 
Tests are simple. 
- test complexity 
- test dependencies
Static Code Complexity
Test methods with at least 5 asserts 
e.g. from Joda-Time: 
public void testEquality() { 
    assertSame(getInstance(TOKYO), getInstance(TOKYO)); 
    assertSame(getInstance(LONDON), getInstance(LONDON)); 
    assertSame(getInstance(PARIS), getInstance(PARIS)); 
    assertSame(getInstanceUTC(), getInstanceUTC()); 
    assertSame(getInstance(), getInstance(LONDON)); 
}
% Test methods with ≥ 5 asserts
Test Methods with Branches 
if (isAllowNullKey() == false) { 
    try { 
        assertEquals(null, o.nextKey(null)); 
    } catch (NullPointerException ex) {} 
} else { 
    assertEquals(null, o.nextKey(null)); 
} 
// from apache-cc
Test Methods with Loops 
counter = 0; 
while (this.complexPerm.hasNext()) { 
    this.complexPerm.getNext(); 
    counter++; 
} 
assertEquals(maxPermNum, counter); 
// from jgrapht
% Test Methods with Control-Flow
Tests Which Use the Filesystem
Filesystem Usage Details 
new File(tempDir, "tzdata"); 
verifies vs canonical forms 
of serialized collections on disk
More Filesystem Usage Details 
resources, serialization 
creates charts, tests their existence 
some comparisons vs test data
Tests Which Use the Network 
Network Usage Details 
connects to 
http://sc.openoffice.org 
tests HTTP mirror server 
at localhost
flip side: Mocks and Stubs 
True mocks only in Google Visualization. 
Found stubs/fakes in 
4 other suites.
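To make the stub/mock distinction concrete, here is an illustrative hand-written stub in the style found in the suites. This is a sketch, not code from any studied project; all names (TimeSource, StubTimeSource, ExpiryChecker) are hypothetical. A stub supplies canned answers; a true mock would additionally verify how it was called.

```java
// Collaborator interface the code under test depends on (hypothetical).
interface TimeSource {
    long nowMillis();
}

// Stub: returns a canned, fixed time instead of the real clock.
class StubTimeSource implements TimeSource {
    private final long fixed;
    StubTimeSource(long fixed) { this.fixed = fixed; }
    public long nowMillis() { return fixed; }
}

// Code under test, written against the interface so a stub can be injected.
class ExpiryChecker {
    private final TimeSource time;
    ExpiryChecker(TimeSource time) { this.time = time; }
    boolean isExpired(long deadlineMillis) {
        return time.nowMillis() > deadlineMillis;
    }
}
```

Injecting StubTimeSource makes the expiry test deterministic without mocking-framework machinery, which matches what the stubs/fakes found in the suites do.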
Reality #2 
Test cases are mostly simple. 
few asserts, little branching 
some filesystem/net usage
Consequence #2 
Many tests don’t need 
high expertise to write, 
but some do!
Myth #3 
Test cases are written by hand.
Types of reuse (standard Java) 
1. test class setUp()/tearDown() 
2. inheritance: e.g. in apache-cc, 
TestFastHashMap extends AbstractTestMap 
3. composition: e.g. in jfreechart, 
helper class RendererChangeDetector
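The inheritance style of reuse can be sketched as follows. This is an illustrative skeleton in the spirit of apache-cc's AbstractTestMap, not code from that project; the class and method names here are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Abstract superclass holding shared test logic (hypothetical names).
abstract class AbstractMapContract {
    // Subclasses supply the concrete Map implementation under test.
    abstract Map<String, Integer> makeEmptyMap();

    // Shared check, inherited unchanged by every subclass.
    boolean putThenGetWorks() {
        Map<String, Integer> m = makeEmptyMap();
        m.put("k", 42);
        return Integer.valueOf(42).equals(m.get("k"));
    }
}

// Each concrete test class only provides the factory method.
class HashMapContract extends AbstractMapContract {
    Map<String, Integer> makeEmptyMap() { return new HashMap<>(); }
}

class TreeMapContract extends AbstractMapContract {
    Map<String, Integer> makeEmptyMap() { return new TreeMap<>(); }
}
```

One superclass test then runs against every implementation, which is how a class like TestFlat3Map ends up with far more test cases than test methods.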
JUnit setup/tearDown usage
Inheritance is heavily used 
(> 50% test classes inherit functionality)
Test Classes with Custom Superclasses
Helper Classes Example 
from poi: 
/** Test utility class to get Records 
 *  out of HSSF objects. */ 
public final class RecordInspector { 
    public static Record[] getRecords(...) {} 
}
Helper Class Count 
weka 1 
google-vis 3 
jdom 6 
joda-time 7 
jfreechart 7 
jmeter 12 
jgrapht 15 
apache-cc 22 
hsqldb 31 
poi 54
Test Clone Example 
public void testNominalFiltering() { 
    m_Filter = getFilter(Attribute.NOMINAL); 
    Instances r = useFilter(); 
    for (int i = 0; i < r.numAttributes(); i++) 
        assertTrue(r.attribute(i).type() != Attribute.NOMINAL); 
} 
public void testStringFiltering() { 
    m_Filter = getFilter(Attribute.STRING); 
    Instances r = useFilter(); 
    for (int i = 0; i < r.numAttributes(); i++) 
        assertTrue(r.attribute(i).type() != Attribute.STRING); 
}
Assertion Fingerprints 
detect clones 
by identifying 
similar tests
Incidence of cloning
How to Refactor? 
● setUp/tearDown/subclassing 
● JUnit 4: 
Parameterized Unit Tests 
● Test Theories
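The clone pair from weka above could collapse into one data-driven test body. A minimal JUnit-free sketch of the parameterized idea follows; the filter and its attribute-type tags are hypothetical stand-ins, not weka's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TypeFilterCheck {
    // Hypothetical filter: drops every attribute tagged with the given type.
    static List<String> filterOut(List<String> attrs, String type) {
        List<String> out = new ArrayList<>();
        for (String a : attrs)
            if (!a.equals(type))
                out.add(a);
        return out;
    }

    // One parameterized body replaces the per-type test clones:
    // pass the type as data instead of copy-pasting the method.
    static boolean noneOfTypeRemains(String type) {
        List<String> attrs = Arrays.asList("NOMINAL", "STRING", "NUMERIC");
        return !filterOut(attrs, type).contains(type);
    }
}
```

With JUnit 4's Parameterized runner, the type values would become the parameter set and the body a single @Test method.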
apache-cc: Bulk tests 
public BulkTest bulkTestKeySet() { 
    return new TestSet(makeFullMap().keySet()); 
} 
● runs all tests in the TestSet class 
  with the object returned from makeFullMap().keySet()
jdom: Generated Test Case Stubs 
class ClassGenerator makes e.g.: 
class TestDocument { 
    void test_TCC__List(); 
    void test_TCM__int_hashCode(); 
}
Developer still needs to populate tests.
Automated Testing Technology 
In our test suites, 
the principal automation technology 
was cut-and-paste.
Reality #3 
Automated test generation 
is uncommon in our test suites.
Guideline 
Maximize reuse: 
setUp/tearDown, 
inheritance, 
parameterized tests, 
whatever works for you!
Suggestion 
Use automated test generation tools! 
Some examples: 
● Korat (structurally complex tests) 
● Randoop (random testing) 
● CERT Basic Fuzzing Framework 
http://mit.bme.hu/~micskeiz/pages/code_based_test_generation.html
Summary 
Myths: 
1. Coverage is a key property 
of test suites. ≈ 
2. Tests are simple. ✓ 
3. Tests are written by hand. ✓
Data 
https://docs.google.com/spreadsheets/d/1xAsdk35tJAOM4WGbGloliS4ovDJ8_MDn6_Gzk0DXEZQ

GTAC 2014: What lurks in test suites?