My slides from DevOpsDays Jakarta 2020. In this talk, I showed examples of how to cheat your code coverage and why you should not trust the false sense of security that says 'it works because that line is covered by tests'.
https://www.devopsdaysjkt.org/
Keep C.A.L.M.S. and Do DevOps!
#DevOpsDaysJakarta20
5. Code Coverage
Is a measure used in software testing. It describes the degree to
which the source code of a program has been tested.
https://en.wikipedia.org/wiki/Code_coverage
11. Why?
Why do we need Code Coverage
____________________________________________
To Ensure Quality
Minimize Bugs/Defects
Early detection of flaws
Avoid Cost due to Rework and Delay
Higher Confidence
Everyone Happy
12. Software Negligence & Testing Coverage, Cem Kaner, 1995
The question is, what's wrong with this argument?
14. How to reach 100% Code Coverage
____________________________________________
Every line of source code covered
Every underlying dependency is covered
Every possible representative input is covered
Every branch, condition and statement is covered
Every unexpected and error case is covered
16. Do we really need 100% Code Coverage?
____________________________________________
Every high risk area is covered
Every Sprint we increase our test suite and coverage
Every test is valuable
17. Mutation Testing
[Diagram: (1) Test: run the test suite against the original program and record the test outputs. (2) Mutation Operation: derive Mutant 1, Mutant 2 and Mutant 3 from the program. (3) Test: rerun the test suite against each mutant and record its outputs. (4) Compare: Mutant 1 and Mutant 2 produce the same outputs (they survive), while Mutant 3 produces different outputs ("killed"). Adequacy = #Different / total = 1/3 = 33%.]
Mutation testing reruns your unit tests against slightly modified versions of your code.
It is one way to judge whether a metric is actually "good" in this context.
Like all metrics, though, it too can be fooled.
Why we actually care about code coverage…
Finds Software Bugs Early
Whenever you use unit testing or integration testing, you typically look at code coverage, which is a …
The degree here is the percentage of source code that is covered by automated tests, mostly unit and integration tests.
It counts the lines of code that are executed when the automated tests run, expressed as a percentage of the entire codebase. For example, 65% code coverage means that the tests execute 65% of the code.
Function coverage: how many of the defined functions have been called.
Statement coverage: how many of the statements in the program have been executed.
Branch coverage: how many branches of the control structures (if statements, for instance) have been executed.
Condition coverage: how many of the boolean sub-expressions have been evaluated to both a true and a false value.
Line coverage: how many lines of source code have been executed.
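To make these coverage types concrete, here is a hypothetical Python sketch (the function and values are invented, not from the talk) where a single test reaches 100% line and statement coverage while branch and condition coverage remain incomplete:

```python
def discount(price, is_member, has_coupon):
    # One boolean expression with two sub-conditions.
    if is_member or has_coupon:
        price = price * 0.9
    return price

# A single test executes every line (100% line/statement coverage) ...
assert discount(100, True, False) == 90.0

# ... but the False branch of the `if` was never taken (branch coverage),
# and `has_coupon` was never the deciding sub-expression, because
# `is_member or ...` short-circuits (condition coverage).
# These extra tests close those gaps:
assert discount(100, False, False) == 100    # False branch taken
assert discount(100, False, True) == 90.0    # has_coupon decides
```

Running only the first assertion would already report the function as "fully covered" by a line-coverage tool, which is exactly the gap the finer-grained metrics expose.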
Let's look at an example.
So, I believe we all agree that we should aim for high code coverage, but what exactly does it tell us?
So what does a high code coverage percentage tell us?
So what does 100% code coverage look like here?
A developer could get 100% code coverage for that method with a single short test. Unfortunately, that one-line method hides an insane amount of complexity and should actually have perhaps hundreds of tests, not one of which would increase the code coverage metric beyond the first. So what do we know?
When we have 100% test coverage, we know that a test has run all of the lines in this code.
Cool, so it's fully tested!
Its measurements are reliable when you're tracking how much of your code is run by your tests, but it tells you absolutely nothing about the value of those tests.
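As a hypothetical sketch of that trap (the function is invented for illustration), here is a one-line method that a single test "fully covers" while most of its interesting behaviour goes untested:

```python
def monthly_rate(total, months):
    # One line, 100% coverable by any single call.
    return round(total / months, 2)

# One test: the coverage tool now reports 100% for this function.
assert monthly_rate(1200, 12) == 100.0

# Coverage is already maxed out, so none of the cases that actually
# matter move the metric: months == 0 (ZeroDivisionError), negative
# amounts, float rounding edge cases, and so on.
```

Every additional test for the edge cases above leaves the coverage number exactly where it is, which is why "redundant" tests look worthless to the metric even though they are the ones that find bugs.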
If you're concerned that sneaky developers will find some way to cheat the number, make themselves look good, and not increase quality at all, you're not entirely wrong. As with any metric, coverage can be cheated, abused, and broken.
Managers often misunderstand the meaning of Code Coverage
I expect a high level of coverage
Suppose a manager requires some level of coverage, perhaps 85%, as a "shipping gate": the product is not done, and you can't ship, until you have 85% coverage. The problem with this approach is that people optimize their performance according to how they're measured. You can get 85% coverage by looking at the coverage conditions, picking the ones that seem easiest to satisfy, writing quick tests for them, and iterating until done. That's faster than treating coverage conditions as clues pointing to weaknesses in the test design. It's especially faster because thinking about test design might lead to "redundant" tests that don't increase coverage at all. They only find bugs.
Let's fool around
Case:
I had to review the source code of an existing project that had been built by a service vendor. I found a "funny" and really dirty trick used in a class just to reach the minimum required amount of code coverage.
The project folder for one piece of functionality contained 12 classes, and I found that every one of them had a static method called fakeMethod. I then found a single test class containing only one method, which called the fakeMethod of each class, and that pushed the code coverage exactly past the shipping gate we used.
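A minimal Python sketch of that cheat (all names and class bodies are invented; the original vendor code was in another language and reviewed privately, and how far this trick moves the number depends on the tool and threshold):

```python
class InvoiceService:
    @staticmethod
    def fake_method():
        # Touches nothing, proves nothing: exists only for coverage.
        pass

    def total(self, items):
        return sum(items)  # real logic, still untested


class PaymentGateway:
    @staticmethod
    def fake_method():
        pass

    def charge(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        return {"status": "charged", "amount": amount}


def test_fake_coverage():
    # The single "test" from the story: call every fakeMethod,
    # assert nothing, and watch the class count in the report go green.
    for cls in (InvoiceService, PaymentGateway):
        cls.fake_method()
```

The test passes trivially, yet `total` and `charge` (the code that can actually break) never execute under test; the gate is satisfied while the real behaviour stays unverified.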
Code coverage provides useful and actionable insights into your code.
With modern tools, we can see which areas of the program are weakly tested. However,
bad code coverage is usually a symptom of badly tested code, but good code coverage certainly does not guarantee good source code or well-tested source code.
Visualizing it on its own is useless because it has no reliable, predictive relationship with the quality of the code or the tests.
The goal of test coverage targets is a noble one. By striving to ensure that every line of code is tested, you theoretically reduce the likelihood of a defect going into production unnoticed until an unfortunate customer stumbles across it. However, in reality you run the risk of becoming a slave to this number and writing a whole host of pointless tests that exist for the sole purpose of meeting the minimum coverage requirement. At this point, take a step back and think about why we even write tests.
Coverage numbers (like many numbers) are dangerous because they're objective but incomplete. They too often distort sensible action. Using them in isolation is as foolish as hiring based only on GPA.
When teams write more valuable tests, their code becomes more testable.
Their testable code becomes more loosely coupled and better architected.
Their bugs regress less often, they end up with verifiable documentation,
and small refactorings become more common because they are safer and easier. In short, the team will increase in maturity and their product will increase in quality.
Goal: kill all mutants!
Faults are introduced deliberately
A way of testing the quality of your tests
Modifies a program in small ways;
each mutated version is called a mutant, and tests detect and reject ("kill") mutants by making their behaviour differ from that of the original version
Test suites are measured by the percentage of mutants that they kill
Mutants are based on well-defined mutation operators
Mimic typical programming errors:
Wrong variable name
Wrong operator
Dividing by zero
Statement deletion
Replacing Branch (Boolean condition)
Replacing boundary conditions (>, <=)
Replacing arithmetic operators {+, -, *, /, %}
Replacing bitwise operators {&, |, ^}
Replacing reads from parameters
Replacing writes to local variables of the same type with each other.
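The operators above can be illustrated with a hand-rolled sketch (this is not a real mutation tool; the function, mutants and inputs are invented for illustration):

```python
def is_adult(age):           # original program
    return age >= 18

def mutant_operator(age):    # ">=" replaced with ">" (boundary/operator mutation)
    return age > 18

def mutant_constant(age):    # 18 replaced with 0 (wrong constant)
    return age >= 0

def kills(test_inputs, mutant):
    # A mutant is "killed" when some test input makes its output
    # differ from the original program's output.
    return any(is_adult(x) != mutant(x) for x in test_inputs)

weak_suite = [30]               # never probes the boundary
strong_suite = [17, 18, 30]     # exercises both sides of the boundary

assert not kills(weak_suite, mutant_operator)   # survives: weak tests
assert kills(strong_suite, mutant_operator)     # killed at age == 18
assert kills(strong_suite, mutant_constant)     # killed at age == 17
```

Both suites give 100% line coverage of `is_adult`, but only the stronger suite kills the boundary mutant, which is exactly the extra signal mutation testing adds on top of coverage.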
Used to help testers develop effective tests or locate weaknesses in the test data
A kind of white-box testing
It demonstrates the effectiveness of test cases
Run after the unit test suites; it takes longer
Examples: NinjaTurtles (.NET), CREAM