2. Testing is a good thing
But how do we know our tests are good?
3. Code coverage is a start
But it can give a “good” score with really dreadful tests
4. Really dreadful tests
public int addTwoNumbers(int a, int b) {
    return a - b;
}
...
@Test
public void shouldAddTwoNumbers() {
    int result = addTwoNumbers(1, 1);
    assertTrue(true);
}
Coverage: 100%
Usefulness: 0
8. If you can change the code and no test fails, then either the code is never run or the tests are wrong.
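The rule above can be tried out by hand. The sketch below is a toy, not how any real tool works: each version of the code is a lambda, a "test" is a hypothetical `Predicate` that returns `true` when it passes, and a mutant counts as killed when at least one test fails against it. With only the dreadful test in the suite, every mutant survives, whatever it does.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

public class MutantCheck {

    // A mutant is "killed" if at least one test fails when run against it.
    // If every test still passes, the mutant "survives".
    static boolean isKilled(List<Predicate<IntBinaryOperator>> tests, IntBinaryOperator mutant) {
        for (Predicate<IntBinaryOperator> test : tests) {
            if (!test.test(mutant)) {
                return true;
            }
        }
        return false;
    }

    // Stand-in for the slide's dreadful test: it calls the code but never checks the result.
    static final Predicate<IntBinaryOperator> DREADFUL = impl -> {
        impl.applyAsInt(1, 1);
        return true;
    };

    public static void main(String[] args) {
        // Hypothetical mutants of addTwoNumbers, keyed by the expression they return.
        Map<String, IntBinaryOperator> mutants = new LinkedHashMap<>();
        mutants.put("a + b", (a, b) -> a + b);
        mutants.put("a * b", (a, b) -> a * b);
        mutants.put("0", (a, b) -> 0);

        for (Map.Entry<String, IntBinaryOperator> m : mutants.entrySet()) {
            boolean killed = isKilled(List.of(DREADFUL), m.getValue());
            System.out.println(m.getKey() + ": " + (killed ? "killed" : "survived"));
        }
    }
}
```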
9. Going with our previous example
public int addTwoNumbers(int a, int b) {
    return a - b;
}
Let’s change something
...
@Test
public void shouldAddTwoNumbers() {
    int result = addTwoNumbers(1, 1);
    assertTrue(true);
}
10. Going with our previous example
public int addTwoNumbers(int a, int b) {
    return a + b;
}
...
This still passes
@Test
public void shouldAddTwoNumbers() {
    int result = addTwoNumbers(1, 1);
    assertTrue(true);
}
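A test that actually checks the answer notices the change. This is a plain-Java sketch that assumes no JUnit: `AdderTest`, `originalAdd` and `mutatedAdd` are hypothetical names for the slide's method and its mutant, and `assertEquals` is replaced by a boolean check.

```java
import java.util.function.IntBinaryOperator;

public class AdderTest {

    // The slide's code under test: it subtracts instead of adding.
    static int originalAdd(int a, int b) {
        return a - b;
    }

    // The mutant from the slide: "-" changed to "+".
    static int mutatedAdd(int a, int b) {
        return a + b;
    }

    // A meaningful shouldAddTwoNumbers: it asserts on the result.
    // Returns true if the test passes against the given implementation.
    static boolean shouldAddTwoNumbers(IntBinaryOperator add) {
        int result = add.applyAsInt(1, 1);
        return result == 2;
    }

    public static void main(String[] args) {
        System.out.println("original passes: " + shouldAddTwoNumbers(AdderTest::originalAdd));
        System.out.println("mutant passes:   " + shouldAddTwoNumbers(AdderTest::mutatedAdd));
    }
}
```

Here the outcome flips between the two versions, so changing the code can no longer go unnoticed. It also fails on the original, which subtracts: the bug the dreadful test was hiding all along.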
11. So it caught a really rubbish test
How about something slightly less obvious?
12. Slightly less obvious (and I mean slightly)
public int checkConditions(boolean a, boolean b) {
    if (a && b) {
        return 42;
    } else {
        return 0;
    }
}
@Test
public void testBothFalse() {
    int result = checkConditions(false, false);
    assertEquals(0, result);
}
@Test
public void testBothTrue() {
    int result = checkConditions(true, true);
    assertEquals(42, result);
}
Coverage: 100%
Usefulness: >0
But still wrong
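What "still wrong" means here: the two tests give full coverage but only try the inputs where both sides of `&&` agree. A sketch of a fuller test, assuming no JUnit (the `CheckConditionsTest` and `Case` names are hypothetical), checks all four combinations of the two booleans against the expected results for `a && b`.

```java
public class CheckConditionsTest {

    // The method under test, as on the slide.
    static int checkConditions(boolean a, boolean b) {
        if (a && b) {
            return 42;
        } else {
            return 0;
        }
    }

    // One row of the test table: two inputs and the expected result.
    static final class Case {
        final boolean a;
        final boolean b;
        final int expected;

        Case(boolean a, boolean b, int expected) {
            this.a = a;
            this.b = b;
            this.expected = expected;
        }
    }

    // All four input combinations; the two mixed rows are the ones the slide's tests skip.
    static final Case[] CASES = {
        new Case(false, false, 0),
        new Case(false, true, 0),
        new Case(true, false, 0),
        new Case(true, true, 42),
    };

    // Runs every case against checkConditions and returns the number of failures.
    static int runAll() {
        int failures = 0;
        for (Case c : CASES) {
            int actual = checkConditions(c.a, c.b);
            if (actual != c.expected) {
                System.out.println("FAIL: checkConditions(" + c.a + ", " + c.b + ") returned " + actual);
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(runAll() + " failures");
    }
}
```

All four cases pass against the method as written; run against a version using `||` instead, the two mixed rows would fail.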
13. Slightly less obvious (and I mean slightly)
public int checkConditions(boolean a, boolean b) {
    if (a && b) {
        return 42;
    } else {
        return 0;
    }
}
Mutate
@Test
public void testBothFalse() {
    int result = checkConditions(false, false);
    assertEquals(0, result);
}
@Test
public void testBothTrue() {
    int result = checkConditions(true, true);
    assertEquals(42, result);
}
14. Slightly less obvious (and I mean slightly)
public int checkConditions(boolean a, boolean b) {
    if (a || b) {
        return 42;
    } else {
        return 0;
    }
}
Passing tests
@Test
public void testBothFalse() {
    int result = checkConditions(false, false);
    assertEquals(0, result);
}
@Test
public void testBothTrue() {
    int result = checkConditions(true, true);
    assertEquals(42, result);
}
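Why the mutant survives can be checked by brute force. This sketch (the `MutantDiff` class is hypothetical) runs the original and mutated conditions over every input pair and collects the ones where the results differ. Only the mixed inputs show up, and neither `testBothFalse` nor `testBothTrue` uses them.

```java
import java.util.ArrayList;
import java.util.List;

public class MutantDiff {

    // Original checkConditions from the slide.
    static int withAnd(boolean a, boolean b) {
        return (a && b) ? 42 : 0;
    }

    // Mutated version: && swapped for ||.
    static int withOr(boolean a, boolean b) {
        return (a || b) ? 42 : 0;
    }

    // Returns the inputs (formatted as "a,b") where the two versions disagree.
    static List<String> differingInputs() {
        List<String> diffs = new ArrayList<>();
        boolean[] values = {false, true};
        for (boolean a : values) {
            for (boolean b : values) {
                if (withAnd(a, b) != withOr(a, b)) {
                    diffs.add(a + "," + b);
                }
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        System.out.println(differingInputs()); // prints [false,true, true,false]
    }
}
```

A test such as `checkConditions(true, false)` expecting `0` would therefore kill this mutant.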
17. The downfall of mutation (Equivalent Mutants)
int index = 0;
while (someCondition) {
    doStuff();
    index++;
    if (index == 100) {
        break;
    }
}
Mutates to
int index = 0;
while (someCondition) {
    doStuff();
    index++;
    if (index >= 100) {
        break;
    }
}
But the programs are equivalent (index goes up one at a time, so it reaches exactly 100 before it could ever exceed it), so no test will fail
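Running both versions side by side at least shows why no test can tell them apart. In this sketch `someCondition` is replaced by a hypothetical iteration `limit` and `doStuff()` by a call counter; the original and the mutant make the same number of calls for every limit tried. That is evidence rather than proof: deciding whether two programs are equivalent is undecidable in general, which is part of why equivalent mutants are hard to weed out automatically.

```java
public class EquivalentMutants {

    // Original loop: stops when index is exactly 100.
    // 'limit' is a hypothetical stand-in for someCondition turning false.
    static int runOriginal(int limit) {
        int calls = 0;
        int index = 0;
        while (calls < limit) {   // stand-in for someCondition
            calls++;              // stand-in for doStuff()
            index++;
            if (index == 100) {
                break;
            }
        }
        return calls;
    }

    // Mutant: "==" replaced by ">=".
    static int runMutant(int limit) {
        int calls = 0;
        int index = 0;
        while (calls < limit) {
            calls++;
            index++;
            if (index >= 100) {
                break;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        for (int limit = 0; limit <= 1000; limit++) {
            if (runOriginal(limit) != runMutant(limit)) {
                System.out.println("Differs at limit " + limit);
                return;
            }
        }
        System.out.println("No difference found for limits 0..1000");
    }
}
```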
19. Java
• Loads of tools to choose from
• Bytecode vs source mutation
• Will look at PIT (seems like one of the better ones)
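To make "source mutation" concrete, here is a toy source-level mutator (the `SourceMutator` class and its operator table are hypothetical, loosely modelled on the kinds of changes in the earlier slides). It produces one mutant per operator occurrence by plain text substitution; real tools parse the code instead, and PIT skips source entirely and mutates the compiled bytecode.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SourceMutator {

    // Hypothetical operator swaps, similar to the changes shown on the earlier slides.
    // Naive text matching: "+" would also match inside "++", which a real tool avoids by parsing.
    static final Map<String, String> OPERATORS = new LinkedHashMap<>();
    static {
        OPERATORS.put("&&", "||");
        OPERATORS.put("||", "&&");
        OPERATORS.put("==", ">=");
        OPERATORS.put("+", "-");
        OPERATORS.put("-", "+");
    }

    // Returns one mutant per occurrence of each operator in the line.
    static List<String> mutants(String line) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> op : OPERATORS.entrySet()) {
            String from = op.getKey();
            int start = 0;
            int at;
            while ((at = line.indexOf(from, start)) >= 0) {
                result.add(line.substring(0, at) + op.getValue() + line.substring(at + from.length()));
                start = at + from.length();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String mutant : mutants("if (a && b) {")) {
            System.out.println(mutant);
        }
    }
}
```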
20. PIT - pitest.org
• Works with “everything”
– Command line
– Ant
– Maven
• Bytecode-level mutations (faster)
• Very customisable
– Exclude classes/packages from mutation
– Choose which mutations you want
– Timeouts
• Makes pretty HTML reports (line/mutation coverage)
22. Ruby
• Mutant seems to be the new favourite
• Runs in Rubinius (1.8 or 1.9 mode)
• Only supports RSpec
• Easy to set up
rvm install rbx-head
rvm use rbx-head
gem install mutant
• And easy to use
mutate "ClassName#method_to_test" spec
23. Summary
• Seems like it could identify areas of weakness in our tests
• At the same time, could be very noisy
• Might be worth just trying it against an existing project and seeing what happens
It is difficult to identify equivalent mutants. Some papers suggest methods for doing so (but I didn’t have time to read them).
Since most of the team use Ruby, I’ve had a look into that too.
Bytecode is faster to mutate as it avoids recompilation. Jumble and Jester also seem quite popular.
Exclude 3rd party frameworks
I looked into Heckle, since that was the original topic of this talk. It turns out it’s been dead for a long time.
Mutant is largely based on Heckle, rewritten on top of Rubinius. It only supports RSpec, but is that what’s used in the team? The author is looking to extend it to other frameworks. Not sure if you still need rubinius-head, but you did as of February 2012 (perhaps there’s a more stable version with support now).
If I’ve not hit the time limit, are there any questions?