Guillermo Polito - ESUG’24
Testing the tests with
Mutalk in 2024
guillermo.polito@inria.fr
Evref
fervE
Quick About Me
2
Evref
fervE
• Now: Researcher at Inria - Lille
• Pharo Contributor since ~2010
• Keywords: compilers, testing, test generation
• Interests: tooling, benchmarking, 日本語, board games, batman, concurrency
If any of that interests you, come talk to me!
guillermo.polito@inria.fr
@guillep
Automated Tests
in this context
when this happens
then this should happen
SetTest >> testAdd
| aSet |
"Context"
aSet := Set new.
“Stimuli"
aSet add: 5.
aSet add: 5.
"Check"
self assert: aSet size equals: 1.
3
We love tests
4
App
Run Tests
App is Healthy!
=D
App is broken
:’(
What is a good test?
“A good test is a test that catches bugs”
- me
5
Who watches the Watchmen tests?
6
TESTS
7
Mutalk: Mutation testing for Pharo
• Originally developed in Pharo 1.1 in Argentina (Chillo, Brunstein, Wilkinson)
• Presented at ESUG’09
• Pharo 9 to 12 !
8
Coverage vs Mutations
9
30%
Code
Coverage
4%
Mutation
Coverage
Mutation testing in a nutshell
10
App
Tests
High Score => Good tests!
=D
Low Score => Bad tests…
:’(
+
Create Mutants
SCORE
11
12
These kind of mutant
13
• Introduce arti
fi
cial bugs
• See if the test suite detects them
• What kind of bugs?
Competent developer hypothesis
• Developers are capable people
• Mistakes/bugs are small details easily overseen. E.g.,
• Missing +/- 1 in a loop
• An inverted conditional
• Signed/unsigned
14
Thousands of simple mutations
• Control
fl
ow bugs
• Arithmetic bugs
• Logic bugs
• Over
fl
ow bugs
• Typoes
15
Mutation Analysis
16
abc
Apply Mutation
original
program
mutant
Run Tests
mutation
axc
Survived
Killed
E.g., change one + by a -
The insight
• Survived mutants were either
1. not covered
2. not asserted
3. or semantically equivalent.
E.g. A+B=A-B if B=0 always
17
improve your tests!!
bias our results
Mutation Score
• Run each mutation independently
• Score:
18
#Killed
#Mutants
THE Problem of Mutation Analysis
19
Runtime = Time(tests) * #Mutants
Can we run less mutants?
Can we run less tests?
Optimizing of Mutation Analysis
20
Runtime = Time(tests) * #Mutants
Ideally… no impact on score!
• Run only tests covering mutants
• Only mutate covered code
• What if lots of mutants remain?
Selecting Mutants and Tests from Coverage
21
Test1
Test2
Test3
?
• Lots on research on it
• Aggressively reduces run time
Random Sampling of Mutants
22
Test
Test
Test
Running Mutalk on Pharo’s UUID
Stabilizes between 60-70%
• Strati
fi
ed sampling with di
ff
erent strategies
• per class, method, …
1. Taking a sample:
1. Take a group at random
2. Take a member from the group
Guiding Random Sampling of Mutants
23
Cat’A Cat’B Cat’Z
…
Cat’B
Strati
fi
cation
Sample
Group
Sample
Member
When to Stop Sampling: Budgets!
• Fixed budgets: # or % of mutants
• Too many: No win in runtime!
• Too few: Unrealistic mutation score…
• We need a practical solution:
• Time budget!
24
How many?
• Github actions
• + Microdown
• Con
fi
gure budgets, di
ff
erent run modes…
25
Github actions integration
• Fine-tune for your project
• Advanced mutation analysis
• Identify
• Trivial mutations: easy to kill
• Redundant tests and mutants
• Randomization strategy
Mutation Matrix and Heatmap
26
Talk to me
• If you want to use it!
• If you want to contribute!
• You need speci
fi
c operators
27
Mutalk 2024
• Working from Pharo9 to Pharo12 (13?)
• Practical: Budgets, random selection and CI
• New features: test
fi
lters, logging, new operators
• Fixes, improvements, cleanups
• Thanks to all people contributing over the years!
28
Coverage vs Mutation
• Average has 100% coverage
• What if we mistake the / by a *?
29
CollectionTest >> testAverage
self assert: #(1 2 3) average
Collection >> average
^ self sum / self size

ESUG 2024: Testing the tests with Mutalk in 2024

  • 1.
    Guillermo Polito -ESUG’24 Testing the tests with Mutalk in 2024 guillermo.polito@inria.fr Evref fervE
  • 2.
    Quick About Me 2 Evref fervE •Now: Researcher at Inria - Lille • Pharo Contributor since ~2010 • Keywords: compilers, testing, test generation • Interests: tooling, benchmarking, 日本語, board games, batman, concurrency If any of that interests you, come talk to me! guillermo.polito@inria.fr @guillep
  • 3.
    Automated Tests in thiscontext when this happens then this should happen SetTest >> testAdd | aSet | "Context" aSet := Set new. “Stimuli" aSet add: 5. aSet add: 5. "Check" self assert: aSet size equals: 1. 3
  • 4.
    We love tests 4 App RunTests App is Healthy! =D App is broken :’(
  • 5.
    What is agood test? “A good test is a test that catches bugs” - me 5
  • 6.
    Who watches theWatchmen tests? 6 TESTS
  • 7.
  • 8.
    Mutalk: Mutation testingfor Pharo • Originally developed in Pharo 1.1 in Argentina (Chillo, Brunstein, Wilkinson) • Presented at ESUG’09 • Pharo 9 to 12 ! 8
  • 9.
  • 10.
    Mutation testing ina nutshell 10 App Tests High Score => Good tests! =D Low Score => Bad tests… :’( + Create Mutants SCORE
  • 11.
  • 12.
  • 13.
    These kind ofmutant 13 • Introduce arti fi cial bugs • See if the test suite detects them • What kind of bugs?
  • 14.
    Competent developer hypothesis •Developers are capable people • Mistakes/bugs are small details easily overseen. E.g., • Missing +/- 1 in a loop • An inverted conditional • Signed/unsigned 14
  • 15.
    Thousands of simplemutations • Control fl ow bugs • Arithmetic bugs • Logic bugs • Over fl ow bugs • Typoes 15
  • 16.
    Mutation Analysis 16 abc Apply Mutation original program mutant RunTests mutation axc Survived Killed E.g., change one + by a -
  • 17.
    The insight • Survivedmutants were either 1. not covered 2. not asserted 3. or semantically equivalent. E.g. A+B=A-B if B=0 always 17 improve your tests!! bias our results
  • 18.
    Mutation Score • Runeach mutation independently • Score: 18 #Killed #Mutants
  • 19.
    THE Problem ofMutation Analysis 19 Runtime = Time(tests) * #Mutants
  • 20.
    Can we runless mutants? Can we run less tests? Optimizing of Mutation Analysis 20 Runtime = Time(tests) * #Mutants Ideally… no impact on score!
  • 21.
    • Run onlytests covering mutants • Only mutate covered code • What if lots of mutants remain? Selecting Mutants and Tests from Coverage 21 Test1 Test2 Test3 ?
  • 22.
    • Lots onresearch on it • Aggressively reduces run time Random Sampling of Mutants 22 Test Test Test Running Mutalk on Pharo’s UUID Stabilizes between 60-70%
  • 23.
    • Strati fi ed samplingwith di ff erent strategies • per class, method, … 1. Taking a sample: 1. Take a group at random 2. Take a member from the group Guiding Random Sampling of Mutants 23 Cat’A Cat’B Cat’Z … Cat’B Strati fi cation Sample Group Sample Member
  • 24.
    When to StopSampling: Budgets! • Fixed budgets: # or % of mutants • Too many: No win in runtime! • Too few: Unrealistic mutation score… • We need a practical solution: • Time budget! 24 How many?
  • 25.
    • Github actions •+ Microdown • Con fi gure budgets, di ff erent run modes… 25 Github actions integration
  • 26.
    • Fine-tune foryour project • Advanced mutation analysis • Identify • Trivial mutations: easy to kill • Redundant tests and mutants • Randomization strategy Mutation Matrix and Heatmap 26
  • 27.
    Talk to me •If you want to use it! • If you want to contribute! • You need speci fi c operators 27
  • 28.
    Mutalk 2024 • Workingfrom Pharo9 to Pharo12 (13?) • Practical: Budgets, random selection and CI • New features: test fi lters, logging, new operators • Fixes, improvements, cleanups • Thanks to all people contributing over the years! 28
  • 29.
    Coverage vs Mutation •Average has 100% coverage • What if we mistake the / by a *? 29 CollectionTest >> testAverage self assert: #(1 2 3) average Collection >> average ^ self sum / self size