Search-Based and Fuzz Testing
Tool Competition 2024
Nicolas Erni, Zurich University of Applied Sciences (ZHAW)
Christian Birchler, Zurich University of Applied Sciences (ZHAW)
Pouria Derakhshanfar, JetBrains
Stephan Lukasczyk, University of Passau
Mohammed Al-Ameen, Zurich University of Applied Sciences (ZHAW)
Sebastiano Panichella, Zurich University of Applied Sciences (ZHAW)
Co-located with the 46th International Conference on Software Engineering (ICSE 2024)
History of the SBFT Python Tool Competition

Round     Year   Venue   Coverage tool   Mutation tool        #CUTs   #Projects   #Participants (+ baseline)
Round 1   2024   SBFT    PyTest          MutPy / Cosmic Ray   35      7           4
SBFT Tool Competition 2024
What is New?
Python tool competition: for the first time ever, we are inviting researchers to participate in our competition with their test-generation tools for Python. Tools are assessed on a benchmark that evaluates code coverage and mutation score.
Figure 1: Example of test generation for simple Python functions (software under test → generated test code).
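Figure 1 itself is not reproduced here; as a minimal sketch of the idea it illustrates, the snippet below pairs a simple function under test with the kind of pytest-style tests a generator might emit. The function and tests are illustrative assumptions, not taken from the competition benchmark.

```python
# --- software under test (illustrative; not from the competition benchmark) ---
def clamp(value: int, low: int, high: int) -> int:
    """Restrict `value` to the closed interval [low, high]."""
    return max(low, min(value, high))


# --- generated test code (what a generator might emit for the function above) ---
def test_clamp_within_bounds():
    assert clamp(5, 0, 10) == 5


def test_clamp_below_lower_bound():
    assert clamp(-3, 0, 10) == 0


def test_clamp_above_upper_bound():
    assert clamp(42, 0, 10) == 10
```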
Python Tool Competition Infrastructure
[Diagram: the python-tool-competition-2024 infrastructure runs each tool (Klara, …, Tool_n) on every CUT within the given time budget and collects the generated tests.]
Python Tool Competition Infrastructure
[Diagram: the generated tests are then executed to compute line and branch coverage metrics as well as mutation metrics (MutPy / Cosmic Ray).]
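As a rough sketch of this pipeline (not the actual competition runner), the loop below generates tests for one CUT within the time budget and then measures line and branch coverage with pytest-cov. The generator command, its --output flag, and the example paths are illustrative assumptions; the mutation step (MutPy / Cosmic Ray) is omitted.

```python
# Rough sketch of the per-CUT evaluation loop (NOT the real infrastructure):
# run a test-generation tool within the time budget, then execute the
# generated tests under pytest-cov for line and branch coverage.
import subprocess
from pathlib import Path

TIME_BUDGET_SECONDS = 400  # per CUT, as in the contest methodology


def generate_tests(tool_cmd: list[str], cut: Path, out_dir: Path) -> None:
    """Run a test generator on one file under test, capped at the time budget."""
    try:
        subprocess.run(
            [*tool_cmd, str(cut), "--output", str(out_dir)],  # hypothetical CLI
            timeout=TIME_BUDGET_SECONDS,
            check=False,
        )
    except subprocess.TimeoutExpired:
        pass  # budget exhausted: keep whatever tests were written so far


def measure_coverage(cut_module: str, test_dir: Path) -> None:
    """Execute the generated tests under pytest-cov (line + branch coverage)."""
    subprocess.run(
        ["pytest", str(test_dir),
         f"--cov={cut_module}", "--cov-branch", "--cov-report=term-missing"],
        check=False,
    )


if __name__ == "__main__":
    # Illustrative values; the real benchmark has 35 CUTs from 7 projects.
    generate_tests(["my-generator"], Path("project/module.py"), Path("generated_tests"))
    measure_coverage("project.module", Path("generated_tests"))
```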
Scoring Formula
T = generated test suite, B = search budget, C = class under test, R = independent run
Cov_i = statement coverage, Cov_b = branch coverage, Cov_m = strong mutation score, genTime = generation time

covScore(T, B, C, R) = 1 × Cov_i + 2 × Cov_b + 4 × Cov_m
tScore(T, B, C, R) = covScore(T, B, C, R) × min(1, (2 × B) / genTime)
Score(T, B, C, R) = tScore(T, B, C, R) + penalty(T, B, C, R)
Xavier Devroey, Alessio Gambi, Juan Pablo Galeotti, René Just, Fitsum Meshesha Kifetew, Annibale Panichella, Sebastiano Panichella: JUGE: An infrastructure for benchmarking Java unit test generators. Softw. Test. Verification Reliab. 33(3), 2023.
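A minimal Python sketch of this formula (a hypothetical helper, not the competition's implementation); coverage and mutation values are assumed to be in [0, 1], and the penalty term is passed in directly:

```python
# Minimal sketch of the scoring formula above (hypothetical helper).
def competition_score(cov_line: float, cov_branch: float, cov_mutation: float,
                      budget_s: float, gen_time_s: float,
                      penalty: float = 0.0) -> float:
    """Score for one (tool, CUT, run) triple, following the slide's formula."""
    cov_score = 1 * cov_line + 2 * cov_branch + 4 * cov_mutation
    # Discount test suites whose generation took much longer than the budget.
    time_factor = 1.0 if gen_time_s <= 0 else min(1.0, (2 * budget_s) / gen_time_s)
    return cov_score * time_factor + penalty


# Example: full line coverage, 80% branch coverage, 50% mutation score,
# generated well within the 400 s budget.
print(competition_score(1.0, 0.8, 0.5, budget_s=400, gen_time_s=120))
```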
https://github.com/ThunderKey/python-tool-competition-2024
Benchmark Projects
• Selection criteria
• GitHub repositories
• Open Source
• Simple files
• No system access (OS, process, network, disk)
Benchmark Projects
• Selection criteria
• GitHub repositories
• Open Source
• 3 projects selected:
  • Klara: https://github.com/usagitoneko97/klara
  • Pynguin: https://github.com/se2p/pynguin
  • Ghostwriter with Hypothesis: https://github.com/HypothesisWorks/hypothesis
Contest Methodology
• Search budget: 400 seconds
• Files under test: 35
• Repetitions: 4
• Execution environment: Linux VM
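A quick back-of-the-envelope sketch of the per-tool generation time these settings imply (generation only; coverage and mutation analysis add further time):

```python
# Rough per-tool compute budget implied by the methodology above.
SEARCH_BUDGET_S = 400
FILES_UNDER_TEST = 35
REPETITIONS = 4

total_generation_s = SEARCH_BUDGET_S * FILES_UNDER_TEST * REPETITIONS
print(f"{total_generation_s} s ≈ {total_generation_s / 3600:.1f} h per tool")  # 56000 s ≈ 15.6 h
```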
The Tools
[Diagram: the competitor UtBot versus the benchmark (baseline) tools Klara, Pynguin, and Ghostwriter.]
Results (1): Average line coverage for each project per tool
Results (2): Average branch coverage for each project per tool
Results (3): Average mutation score for each project per tool
Results (4): [chart only; no caption extracted]
Results (5): [chart only; no caption extracted]
Final Ranking
[Figure: final ranking of the competitor UtBot against the benchmark tools Klara, Pynguin, and Ghostwriter.]
Lessons Learned
• Identified aspects to improve and bugs to fix in the infrastructure
• Docker will simplify the evaluation procedure
• More participants for the competition!
  • From academia and industry
What’s Next?
• Contest Infrastructure
  • https://github.com/ThunderKey/python-tool-competition-2024
  • Improve usability
    • Facilitate setup of an evaluation
    • Facilitate evaluation in other contexts
  • Update the user documentation
• For the next edition
  • More tools
  • More CUTs
  • Time budgets
  • Time penalty
