Must.Kill.Mutants.
Gerald Mücke
@gmuecke
«Quality
Assurance»
“IS A WAY OF
PREVENTING MISTAKES
OR DEFECTS IN
MANUFACTURED
PRODUCTS AND
AVOIDING PROBLEMS
WHEN DELIVERING
SOLUTIONS OR
SERVICES TO
CUSTOMERS”
(WIKIPEDIA)
2
«manufactured»
products
 «The process of converting raw
materials, components, or parts into
finished goods that meet a customer's
expectations or specifications.»
 Most of the critical code is written
manually
 Raw Materials
 existing software or parts of it
 brain, ideas, knowledge, experience,
requirements,
 Every product is unique
3
«Preventing» defects
 Defects are «created» in development
 Cannot be prevented:
it’s human to make mistakes
 Could be detected:
the earlier, the better
 Defects manifest in production
 Or during test
 Can be prevented:
the earlier, the better
4
Sources of a Product
 Internal Development
 QA embeddable
 QA along the pipeline
 Quality is a shared effort
 Easier to change or influence
 External Development
 Software Vendors
 more effort required for dedicated QA
 Less easy to change
 handoff «Waterfall» style
5
«We have
tested it»
ANONYMOUS
DEVELOPER
6
Real-Life Bugs
if( isThreadSafe() ) {
computeSingleThreaded();
} else {
computeMultiThreaded();
}
Made it to production.
Performance impact:
500% duration of
day-end processing
7
Real-Life Bugs
if( ! isDevelopmentMode() ){
collectProfileDataAndSendDeveloperReport();
}
In Production,
Impact:
20% Performance
loss
Compliance
Violation
8
Real-Life Bugs
void function(LocalDate begin, LocalDate end, LocalDate minFrom, ...) {
//...
outerLoop:
while( it.hasNext() ) {
Object current = it.next();
Local from = funcA(current);
Local upto = funcB(current);
while(true){
if( ! isBeforeOrEqual( from , upto ) ) {
continue outerLoop;
}
if( condY(from, minFrom) ) {
from = DateUtil.addDaysToDate(upto, 1);
upto = DateUtil.getLastOfMonth(from);
from = DateUtil.min(new LocalDate[]{ end, from});
upto = DateUtil.min(new LocalDate[]{ end, upto});
9
Real-Life Bugs (Testcode)
//mock away all relevant services
//use a real and fresh H2 DB
EntityManager em = null;
long someBusinessId = 1L; //not the PK
//…run some code that does an insert
MyEntity e = em.find(MyEntity.class, someBusinessId);
assertNotNull(e);
10
«This will never
happen in
production»
ANONYMOUS
DEVELOPER
11
well… it did.
Product Delivery Pipeline
Development
Continuous
Integration
Quality
Assurance
Release Operations
Decision Point
13
How to
make
informed
decisions?
… WITHOUT HAVING A
CLUE
14
Good Decisions are based on:
Information
Simple
 Metrics
Number of Unit
Tests
Line Coverage
Branch Coverage
 Compilation success
Complex
Test Results
Code Review
Static Code
Analysis
Experience/Past
Releases
15
Code Coverage
 Information about what
elements of a product have
been touched by a test.
 Common Coverage Metrics
 Line Coverage
 Condition Coverage
 Branch Coverage
 Semantics ?
Code
Test
Test Oracle
16
Would you
release a product
based on
 100% Line Coverage
 100% Branch Coverage
 And all Tests are green
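A minimal sketch of why the answer might be "no". The class `CoverageDemo`, its pricing rule and both checks are hypothetical and only serve as an illustration: a test can execute every line and every branch without checking any result, so it stays green even for a broken implementation.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical example: 100% line and branch coverage without a test oracle.
class CoverageDemo {
    // Assumed rule for illustration: orders of 100 or more get 10 off.
    static int price(int amount) {
        if (amount >= 100) {
            return amount - 10;
        }
        return amount;
    }

    // Broken variant: adds 10 instead of subtracting it.
    static int brokenPrice(int amount) {
        if (amount >= 100) {
            return amount + 10;
        }
        return amount;
    }

    // Touches both branches but asserts nothing, so it is always green.
    static boolean coverageOnlyTest(IntUnaryOperator impl) {
        impl.applyAsInt(150);
        impl.applyAsInt(50);
        return true;
    }

    // Same calls, but the results are checked against expected values.
    static boolean testWithOracle(IntUnaryOperator impl) {
        return impl.applyAsInt(150) == 140 && impl.applyAsInt(50) == 50;
    }
}
```

Both checks reach the same coverage; only the second one can turn red when `brokenPrice` is passed in.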
17
«Line or
Branch coverage
provides no value»
… FOR TAKING INFORMED DECISIONS
Cat
Vision
19
Let’s look with the eyes of a …
Delivery Pipeline with the view of …
Development
Continuous
Integration
Quality
Assurance
Release Operations
I don’t care
Developers
Development
Continuous
Integration
Quality
Assurance
Release Operations
I don’t understand I don’t understand
Product Owners
Development
Continuous
Integration
Quality
Assurance
Release Operations
I don’t care
Operators
Decision Point
21
Delivery Pipeline: Tester Vision
Development
Continuous
Integration
Quality
Assurance
Release Operations
Decision Point
22
Testing is about
gaining new
information
The …* Pyramid
UI
Tests
Integration
Tests
Unit Tests
Degree of Automation
24
* No kitten or puppy got harmed for this slide
Information Gain
(without Testing)
Information
Development
Continuous
Integration
Quality
Assurance
Release Operations
25
The … Pyramid
UI
Tests
Integration
Tests
Unit Tests
Degree of Automation
Degree of Testing
26
Information Gain
(with Testing)
Information
Development
Continuous
Integration
Quality
Assurance
Release Operations
Value of Testing
27
« 77% of the
failures can be
reproduced by
a unit test »
YUAN, ET AL.
28
https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-yuan.pdf
The … Pyramid
UI
Tests
Integration
Tests
Unit Tests
Degree of Automation
Degree of Testing
Information
Gap
29
Where to improve?
Development
Continuous
Integration
Quality
Assurance
Release Operations
Cost / Defect
Coverable
Semantics
Information
Information Gap
30
How to prove,
to test the right
thing right?
Mutation Testing – History
Mutation testing injects faults,
based on rules, into a product
to verify whether the test suite
is capable of finding them.
 Fault injection technique
 Concept is known since ~1970
 First implementation of a mutation testing tool in 1980
 Most of the time it was subject to academic research only
 Recently, with increasing processing power, there is a growing interest
 More academic research ongoing
 Practical tooling available
32
Mutation Testing – Some Theory
 Mutation testing is a special form of Fault Injection
 Based on two hypotheses
 1: Most of the software faults are due to small syntactic errors (competent programmer hypothesis)
 2: Simple faults can cascade into more complex, emergent faults (coupling effect)
 Assumption:
 “if a mutant was introduced without the behavior of the test suite being
affected, this indicated either that the code that had been mutated was
never executed (dead code) or that the test suite was unable to locate the
faults represented by the mutant” (Wikipedia)
33
Mutation Testing - Definitions
 Mutant
 a variation P’ of the product P
created by applying a mutation
operator m
P’ = m(P)
 Killed Mutant
 a variation P’ in which a test has
found at least ONE error
 Live Mutant
 a variation P’ in which a test has
found NO errors
 Mutation Operators
 A function m() that creates a
variation of the Product P by
applying a set of modification rules
 Inject Faults into the Product
 Based on Bug Taxonomies
 Mutation Score
 Number of Killed Mutants / Total
number of Mutants
 Also Called Mutation Coverage
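The mutation score can be computed directly from the definition above. This is a minimal sketch; the class `MutationScore` and the `Status` enum are made-up names for illustration, not part of any tool.

```java
import java.util.List;

// Hypothetical helper: mutation score = killed mutants / total mutants.
class MutationScore {
    enum Status { KILLED, LIVE }

    static double score(List<Status> results) {
        if (results.isEmpty()) {
            // Without mutants the ratio is undefined.
            throw new IllegalArgumentException("no mutants generated");
        }
        long killed = results.stream().filter(s -> s == Status.KILLED).count();
        return (double) killed / results.size();
    }

    public static void main(String[] args) {
        // 3 of 4 mutants killed
        List<Status> results =
                List.of(Status.KILLED, Status.KILLED, Status.KILLED, Status.LIVE);
        System.out.println(score(results));
    }
}
```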
34
Some more definitions
 Equivalent Mutation
 a variation P’ that is semantically
identical to P
 Duplicate Mutation
 a variation P’ that is equivalent to
another variation P’’
 Weak Mutation
 Fault does not lead to incorrect output
 Strong Mutation
 Fault propagates to incorrect output
 Unstable Mutation
 Any test can find the mutations
generated by it
 Higher-Order Mutants
 Mutants built by combining several
first-order (low-level) mutants
 Subsumed Mutants
 One mutant subsumes another if at
least one test kills the first and every test
that kills the first also kills the second.
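The subsumption rule can be checked mechanically once it is known which tests kill which mutant. A minimal sketch, assuming each mutant is represented by the set of names of the tests that kill it (the class `Subsumption` and this representation are illustrative assumptions):

```java
import java.util.Set;

// Hypothetical helper implementing the subsumption definition above.
class Subsumption {
    // killsA / killsB: names of the tests that kill mutant A / mutant B.
    // A subsumes B if at least one test kills A
    // and every test that kills A also kills B.
    static boolean subsumes(Set<String> killsA, Set<String> killsB) {
        return !killsA.isEmpty() && killsB.containsAll(killsA);
    }
}
```

For example, a mutant killed only by `t1` subsumes a mutant killed by `t1` and `t2`, but not the other way round.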
35
Mutation Operators
 Boundaries
 Conditional Boundary
 Negate Conditionals
 Remove Conditional
 Return values
 Return Values
 Argument Propagation
 Method Calls
 Non Void Method Calls
 Void Method Calls
 Constructor Calls
 Calculations
 Invert Negatives
 Increments / Remove Increments
 Math
 Members and Constants
 Inline Constants
 Member Variable (experimental)
 Java Language
 Switch (experimental)
 Modifiers
 ...
 ...
36
« 73% of real faults are
coupled to the mutants
generated by
commonly used
mutation operators»
RENÉ JUST, ET AL.
http://people.cs.umass.edu/~rjust/publ/mutants_real_faults_fse_2014.pdf
37
« either you die
young or live long
enough to
become a villain»
HARVEY DENT
38
Approaches to
Mutation Testing
 Byte Code Mutation
 Can be done on-the-fly
 Faster to apply and execute
 Might be affected by compiler
optimizations
 Source Code Mutation
 Requires recompilation after every
change
 Takes very long
 Is not affected by compiler
optimizations
 Higher Level Mutations
 Configuration, Architecture,
Specification, Use/Business Case, ...
 No Tooling Support (yet?)
39
Mutation Testing 101
 Modify your code
(Mutant generation)
 Re-Run the Test
(Test selection + Loading)
 Check if test is failing
(Detection)
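class Builder {
    private String value;
    Builder withValue(String in) {
        this.value = in; // e.g. a mutant removes this assignment
        return this;
    }
}
@Test
public void testLeft() {
    Builder b = new Builder().withValue("one");
    assertNotNull(b); // the value itself is never checked
}
If the test stays green for the mutant, it’s a fail!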
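The three steps can be sketched as a tiny loop. Everything in this sketch is a made-up stand-in (the class `MiniMutationRun`, the addition under test, the hand-written mutants); a real tool generates the mutants by rule and selects the tests to run.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntBinaryOperator;

// Hypothetical miniature of a mutation testing run.
class MiniMutationRun {
    // Product under test: integer addition.
    static final IntBinaryOperator ORIGINAL = (a, b) -> a + b;

    // Step 1, mutant generation: hand-written variants of the product.
    static Map<String, IntBinaryOperator> mutants() {
        Map<String, IntBinaryOperator> m = new LinkedHashMap<>();
        m.put("a - b", (a, b) -> a - b);
        m.put("a * b", (a, b) -> a * b);
        m.put("return a", (a, b) -> a);
        return m;
    }

    // The test suite: returns true when all checks are green.
    static boolean suite(IntBinaryOperator add) {
        return add.applyAsInt(2, 2) == 4 && add.applyAsInt(0, 0) == 0;
    }

    // Steps 2 and 3: re-run the suite against each mutant.
    // A green suite means the mutant was not detected and is still alive.
    static List<String> liveMutants() {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, IntBinaryOperator> e : mutants().entrySet()) {
            if (suite(e.getValue())) {
                live.add(e.getKey());
            }
        }
        return live;
    }
}
```

With these inputs the multiplication mutant survives, because 2 * 2 and 0 * 0 give the same results as the additions; a check such as `add(3, 0) == 3` would kill it.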
40
Tooling
Tool          Languages
PIT           Java, Scala, Kotlin
muJava        Java
Jester        Java
Judy          Java
Mutator       Java, JavaScript, Ruby, PHP
Javalanche    Java
Jumble        Java
Major         Java
Stryker       JavaScript
mutate.py     Python
Mutant        Ruby
Heckle        Ruby
NinjaTurtles  .Net, Mono
Nester        C#
Humbug        PHP
MuCheck       Haskell
…
41
Tool: PIT
 Mutation Testing for Java / JVM
 Operates by modifying
bytecode
 ~ 20 mutation operators for
altering your code
 Parallel execution
 Active Community
 Mature Tooling
 HTML & XML Reports
 Written by Henry Coles, @0hjc
42
Example Output
43
Interpreting Results
 Live Mutants
 Reflects unspecified behavior
 superfluous code / unrequired semantics
 Could be an actual bug that is not covered by the test suite
 Could be equivalent mutation
 Killed by TimeOut or MemError
 Could be a “real kill” (e.g. endless loop)
 Could be still alive
 Mutation Score
 Gives an indication of the overall quality of your test suite
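An equivalent mutation, as mentioned for live mutants above, is one that no test can ever kill. A minimal sketch with a hypothetical `EquivalentMutantDemo` class and a hand-written mutant (the replacement of "<" by "!=" is chosen for illustration and is not claimed to be an operator of any specific tool):

```java
// Hypothetical example of an equivalent mutant.
class EquivalentMutantDemo {
    static int sumOriginal(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    // Mutant: "<" replaced by "!=".
    // i starts at 0 and grows by 1, so the loop ends at the same index
    // and the result is identical for every input array.
    static int sumMutant(int[] values) {
        int sum = 0;
        for (int i = 0; i != values.length; i++) {
            sum += values[i];
        }
        return sum;
    }
}
```

Such a mutant lowers the mutation score without pointing to a gap in the tests, which is why surviving mutants need triage.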
44
Deviation in Mutation
Score
Value has its cost
Size of Codebase
Computational
Effort
Mutations
Found
More Operators
Fewer Operators
45
Limitations
 Fault Coverage
 ~¼ of real faults are not coverable by mutation testing
 Real Faults are made of a combination of non-complex faults
 Mutation Score
 PIT does not recognize subsumed or equivalent mutations
 mutation score may not be “academically” precise – context matters!
 Mutation Operators
 PIT has no Java concurrency mutation operators
 PIT has no high-order mutation operators
 Techniques
 PIT does not support sampling
46
Conclusion
47
Some Advice
 Unit Tests are usually owned by development
 challenge them with Mutation Testing!
 It’s NOT unit tested until mutation tested.
 Don’t go on a killing spree
 Set achievable goals for mutation score
 Triage surviving mutants
 A mutation score > 0.8 is considered good (it depends…)
 Determine mutation score regularly at a sensible interval
 Every build vs. Every release
 Use historical data & SCM support
 Find concrete mutants as needed
 Adjust mutators & scope
48
«there is no statistically
significant difference
between Test-First and
Test-Last practices»
LECH MADEYSKI
49
http://madeyski.e-informatyka.pl/download/Madeyski10c.pdf
Takeaways
 Don’t trust your Unit Tests unless you have mutation-tested them.
 Mutation Testing is the practice to find bugs in your test suite
 Forget about other coverage metrics
 Cheap to get, but next to no value
 Include Mutation Testing in your project.
 Always.
 Use it with common sense
 don’t go on a killing spree.
 For Java
 PIT is the tool to use.
50
Must.Kill.Mutants. Agile Testing Days 2017
