A dive into code coverage tools, explaining what they are trying to measure, what they actually measure instead, and why enforcing code coverage percentages is usually futile.
7. The Starting Point
• “How well is our code tested?”
• This is a qualitative measure
• Computers don’t do qualitative
• Can we make it quantitative?
8. A Quantitative Measure
“How many lines* of code can I delete without causing any tests** to fail?”
*statements, methods, branches, etc.
**or compilation
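A minimal Java sketch of this measure (the class and method names are invented for illustration): the bookkeeping line below can be deleted without any test failing, so by this measure it counts as untested even though a coverage tool would report it as executed.

```java
// Hypothetical example of the deletion measure.
public class Greeter {
    static int greetCount = 0; // bookkeeping no test looks at

    static String greet(String name) {
        greetCount++; // deleting this line fails no test: "untested" by this measure
        return "Hello, " + name;
    }

    public static void main(String[] args) {
        // The only "test" checks the return value, never greetCount
        if (!greet("world").equals("Hello, world")) {
            throw new AssertionError("greet is broken");
        }
        System.out.println("test passed");
    }
}
```

The same test passes whether or not the `greetCount++` statement exists, which is exactly the signal the measure is after.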
9. Why is this a good measure?
• Direct translation of the qualitative question
• Makes sense
• Minimises code written for a set of tests
10. This is expensive
• Really, really expensive
• n statements/branches/methods = n(n-1) compile and test cycles
• We need something cheaper
22. What to do instead?
• Spot check
• Manually use the more stringent measure
• Compare to last week, not last commit
• If the number goes down, know why
• Separate covered and uncovered code
27. What is mutation testing?
• Mutate statements instead of deleting them
• Every mutation should make a test fail
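Roughly what a mutation tester does, sketched as hand-written Java (a real tool like PIT mutates bytecode automatically; this class is invented for illustration). The mutant swaps the original operator; a test with a real assertion then fails, "killing" the mutant.

```java
// Illustrative sketch of a single mutation, applied by hand.
public class MutationDemo {
    static int add(int a, int b) {
        return a + b;       // original statement
        // return a - b;    // mutant: a good test suite should make this fail
    }

    public static void main(String[] args) {
        // This assertion kills the '-' mutant: 2 - 3 == -1, not 5.
        if (add(2, 3) != 5) {
            throw new AssertionError("mutant survived");
        }
        System.out.println("original passes; the '-' mutant would fail this check");
    }
}
```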
28. Thoughts on mutation testing
• Seems decent for loop logic or math logic
• Doesn’t know how to mutate a lot of statements
• Doesn’t mutate source code, just object code
• Based on a traditional coverage run
Thanks!
Editor's Notes
And why?
Sometimes, this is “because my boss makes me”, but setting that aside…
This is a thing I hope most of us can get behind. The problems are not that this is done, but how it’s done. Sometimes these are people problems (we’ll get into that later), but sometimes they aren’t.
This isn’t the only measure…
“Can I change a statement without making a test fail?” is also a good one.
The sample Java later in this presentation has 23 statements that could potentially be removed (excluding trivial things like throws clauses and import statements). That’s 506 build and test cycles (23 × 22).
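The arithmetic behind that figure, as a quick sketch (class and method names invented): slide 10’s n(n-1) formula applied to the 23 deletable statements.

```java
// n(n-1) compile-and-test cycles for n deletable statements.
public class DeletionCost {
    static int cycles(int n) {
        return n * (n - 1);
    }

    public static void main(String[] args) {
        // 23 statements -> 23 * 22 = 506 build-and-test cycles
        System.out.println(cycles(23));
    }
}
```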
1 instrumented test run, which is more expensive than normal, but cheaper than hundreds or thousands of test runs
Synthetic methods: Default constructors, methods on enums
Java 7: ARM blocks. The compiler purposefully puts in more blocks than will be executed (null checks in the finally, etc) knowing that the JIT will optimise the extra ones away.
Coverage tools don’t even attempt to detect useless code.
Profoundly flawed = Java 7 support, etc.
Delete an untested method that does nothing but was executed during a test… coverage goes down slightly.
In our example, with the spurious method, instruction coverage is 82%. Without it, coverage is 70%.
Note that branch coverage went from 100% to undefined!
Profoundly flawed = Java 7 support, etc.
If it’s 100%, and you delete a chunk of untested code, it should still be 100%… because all of the code that’s less should still be covered.
This also holds for 0% coverage. I assume we’re all happy to ignore that case.
Separate code: consider a module with 100% (or high) coverage, and another module without enforced coverage. Move things into the one module over time.
Trivial getters and setters don’t need to be tested directly.
Tests are executable documentation, and documentation isn’t needed for that.
should you enforce 100% code coverage?
twice in my career I’ve been on teams where we were close to 100% code coverage. In 2013 we were three or four statements/branches away for a while. Spot checking every week or so. So finally, we put in explicit tests to cover those three or four spots. A week later, we were still at 100%.
I talked to some of the guys on the team… should we fail the build if coverage isn’t 100%? Let’s try it… see what happens. We turned it on, and forgot about it for a couple of weeks. Then the first few times we tripped it, it was definitely areas where we had forgotten to write a test… so we decided to keep it.
Also, when somebody asked “what’s your code coverage?” and you can say 100% without checking anything you feel like an absolute boss. Good for political reasons sometimes. :-)
The last two times I’ve done this talk, people have mentioned mutation testing — specifically PIT. (Which seems to be the viable option in the Java world)
Example mutations: return null instead of a value, subtract instead of add, that sort of thing
The class of problem PIT is really good at catching is tests that don’t assert anything.
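What that class of problem looks like, as an invented Java sketch: a “test” that executes code but asserts nothing. A coverage tool reports the method as covered, yet every mutant of it survives, which is exactly what mutation testing surfaces.

```java
// Illustrative sketch: execution without verification.
public class NoAssertDemo {
    static int price(int quantity) {
        return quantity * 10; // a mutant swapping '*' for '/' would go unnoticed below
    }

    static void testPriceWithoutAssertion() {
        price(3); // executed, result ignored: full coverage, zero verification
    }

    public static void main(String[] args) {
        testPriceWithoutAssertion();
        System.out.println("coverage says price() is tested; mutation testing disagrees");
    }
}
```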
To improve performance, PIT does a single traditional coverage run… which it then uses to learn which tests to run against which mutations. Which means it’s got a gap for statements that aren’t executed by any tests… same old problem.
Mutating object code rather than source code means we can’t tell when a mutation would produce source that wouldn’t even compile.
False positives mean that improving code can still make the coverage percentage go down. An example of this would be removing one of two duplicate methods.