1. Java Unit Testing Tool Competition - 8th Round
Xavier Devroey <x.d.m.devroey@tudelft.nl>
Sebastiano Panichella <sebastiano.panichella@zhaw.ch>
Alessio Gambi <alessio.gambi@uni-passau.de>
2. History
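Round   | Year | Venue   | Coverage tool           | Mutation tool           | #CUTs | #Projects | #Participants (+ baseline) | Statistical tests
--------|------|---------|-------------------------|-------------------------|-------|-----------|----------------------------|----------------------
Round 1 | 2013 | ICST    | Cobertura               | Javalanche              | 77    | 5         | 2                          | ✗
Round 2 | 2014 | FITTEST | JaCoCo                  | PITest                  | 63    | 9         | 4                          | ✗
Round 3 | 2015 | SBST    | JaCoCo                  | PITest                  | 63    | 9         | 8                          | ✗
Round 4 | 2016 | SBST    | Defects4J (real faults) | Defects4J (real faults) | 68    | 5         | 4                          | ✗
Round 5 | 2017 | SBST    | JaCoCo                  | PITest + our env.       | 69    | 8         | 2 (+ 2)                    | ✓
Round 6 | 2018 | SBST    | JaCoCo                  | PITest + our env.       | 59    | 7         | 2 (+ 2)                    | ✓ + combined analysis
Round 7 | 2019 | SBST    | JaCoCo                  | PITest + our env.       | 69    | 8         | 2 (+ 2)                    | ✓ + combined analysis
Round 8 | 2020 | SBST    | JaCoCo                  | PITest + our env.       | 69    | 8         | 1 (+ 1)                    | ✓ + combined analysis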
7. Scoring Formula
T = generated test
B = search budget
C = class under test
R = independent run
Cov_i = statement coverage
Cov_b = branch coverage
Cov_m = strong mutation
genTime = generation time
penalty = percentage of flaky and non-compiling tests
covScore(T, B, C, R) = 1 × Cov_i + 2 × Cov_b + 4 × Cov_m
tScore(T, B, C, R) = covScore(T, B, C, R) × min(1, (2 × B) / genTime)
Score(T, B, C, R) = tScore(T, B, C, R) + penalty(T, B, C, R)
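The three formulas can be read as a small computation. The following is a minimal Java sketch, not the contest infrastructure's code: the class and method names are hypothetical, coverage values are assumed to be fractions in [0, 1], times are assumed to be in seconds, and the penalty is passed in as-is because the slide does not state its sign or scale.

```java
class ScoreSketch {
    /** covScore = 1 x Cov_i + 2 x Cov_b + 4 x Cov_m, with the weights from the slide. */
    static double covScore(double covI, double covB, double covM) {
        return 1.0 * covI + 2.0 * covB + 4.0 * covM;
    }

    /** tScore = covScore x min(1, (2 x B) / genTime). */
    static double tScore(double covScore, double budgetSeconds, double genTimeSeconds) {
        if (genTimeSeconds <= 0) {
            throw new IllegalArgumentException("genTime must be positive");
        }
        return covScore * Math.min(1.0, (2.0 * budgetSeconds) / genTimeSeconds);
    }

    /** Score = tScore + penalty; the penalty's sign convention is an assumption left to the caller. */
    static double score(double tScore, double penalty) {
        return tScore + penalty;
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        double cov = covScore(0.8, 0.5, 0.25);
        double timed = tScore(cov, 60.0, 150.0); // generation took 2.5x the budget
        System.out.println("covScore = " + cov);
        System.out.println("tScore   = " + timed);
        System.out.println("Score    = " + score(timed, 0.0));
    }
}
```

Under this reading, a tool is not scaled down as long as its generation time stays within twice the search budget; beyond that, the coverage score shrinks in proportion to the overrun.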
10. Benchmark Projects
• Selection criteria
  • GitHub repositories
  • Project builds using Maven or Gradle
  • Contains a JUnit 4 test suite
• 4 projects selected
  • Guava: https://github.com/google/guava
  • Fescar/Seata: https://github.com/seata/seata
  • PDFBox: https://github.com/apache/pdfbox
  • Spoon: https://github.com/INRIA/spoon/
12. Classes Under Test Selection
Classes → (CC >= 5) → 1,094 classes → (Randoop, 10 seconds) → 382 classes
[Figure: scatter plots in two panels (Candidate, Sampled); scales from 0 to 100 for Line coverage, Mutation score, and Branch]
13. Classes Under Test Selection
Random sampling: 70 CUTs

Project | CUTs
--------|-----
Fescar  | 20
Guava   | 20
PdfBox  | 20
Spoon   | 10
Total   | 70
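The selection pipeline (complexity filter, a short Randoop run, then random sampling) can be sketched as follows. This is an illustrative Java sketch, not the organizers' tooling. Several points are assumptions: CC is read as cyclomatic complexity and applied to the most complex method of a class, the 10-second Randoop run is reduced to a boolean outcome, sampling is done per project with the quotas from the table, and all class, field, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class CutSelectionSketch {
    /** One candidate class; every field is a hypothetical stand-in for data the real tooling would collect. */
    static final class Candidate {
        final String project;
        final String className;
        final int maxMethodComplexity;   // assumed: highest cyclomatic complexity among the methods
        final boolean randoopProducedTests; // assumed: outcome of the short Randoop run

        Candidate(String project, String className, int maxMethodComplexity, boolean randoopProducedTests) {
            this.project = project;
            this.className = className;
            this.maxMethodComplexity = maxMethodComplexity;
            this.randoopProducedTests = randoopProducedTests;
        }
    }

    /** Applies both filters, then draws up to quota classes per project at random. */
    static List<Candidate> select(List<Candidate> all, Map<String, Integer> quotas, long seed) {
        Map<String, List<Candidate>> eligible = new LinkedHashMap<>();
        for (Candidate c : all) {
            if (c.maxMethodComplexity >= 5 && c.randoopProducedTests) {
                eligible.computeIfAbsent(c.project, k -> new ArrayList<>()).add(c);
            }
        }
        Random rng = new Random(seed); // fixed seed keeps the sample reproducible
        List<Candidate> sampled = new ArrayList<>();
        for (Map.Entry<String, Integer> quota : quotas.entrySet()) {
            List<Candidate> pool = new ArrayList<>(
                    eligible.getOrDefault(quota.getKey(), Collections.<Candidate>emptyList()));
            Collections.shuffle(pool, rng);
            sampled.addAll(pool.subList(0, Math.min(quota.getValue(), pool.size())));
        }
        return sampled;
    }

    public static void main(String[] args) {
        List<Candidate> all = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            all.add(new Candidate("Guava", "com.example.C" + i, i, i % 2 == 0)); // made-up data
        }
        Map<String, Integer> quotas = new LinkedHashMap<>();
        quotas.put("Guava", 2);
        for (Candidate c : select(all, quotas, 42L)) {
            System.out.println(c.project + " " + c.className);
        }
    }
}
```

Running the cheap filters before sampling means that configuration problems surface on the candidate set rather than during the full evaluation.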
14. Contest Methodology
• Search budgets: 1 min. and 3 min.
• Classes under test: 70 classes
• Repetitions: 10 repetitions
• Execution environment
• Statistical analysis: Friedman’s test, post-hoc Conover
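To make the statistical step concrete, here is a minimal Java sketch of the Friedman test statistic. It is not the contest's analysis script. It assumes each row holds the scores of one class under test and each column one tool and budget configuration, it averages ranks for ties but applies no tie correction, and it stops at the statistic (the p-value and the Conover post-hoc comparison are omitted). The class name and the data in `main` are hypothetical.

```java
class FriedmanSketch {
    /** Average ranks (1 = smallest) of the values within one block (row). */
    static double[] ranks(double[] block) {
        int k = block.length;
        double[] r = new double[k];
        for (int j = 0; j < k; j++) {
            int smaller = 0;
            int equal = 0; // counts the value itself as well
            for (int m = 0; m < k; m++) {
                if (block[m] < block[j]) {
                    smaller++;
                } else if (block[m] == block[j]) {
                    equal++;
                }
            }
            r[j] = smaller + (equal + 1) / 2.0;
        }
        return r;
    }

    /** Q = 12 / (n k (k + 1)) * sum_j R_j^2 - 3 n (k + 1), for n blocks and k treatments. */
    static double statistic(double[][] scores) {
        int n = scores.length;
        int k = scores[0].length;
        double[] rankSums = new double[k];
        for (double[] block : scores) {
            double[] r = ranks(block);
            for (int j = 0; j < k; j++) {
                rankSums[j] += r[j];
            }
        }
        double sumOfSquares = 0.0;
        for (double rankSum : rankSums) {
            sumOfSquares += rankSum * rankSum;
        }
        return 12.0 * sumOfSquares / (n * k * (k + 1)) - 3.0 * n * (k + 1);
    }

    public static void main(String[] args) {
        // Made-up scores: 3 classes (rows) x 4 configurations (columns).
        double[][] scores = {
            {10.0, 40.0, 15.0, 55.0},
            {20.0, 35.0, 20.0, 60.0},
            {5.0, 50.0, 10.0, 45.0},
        };
        System.out.println("Friedman Q = " + statistic(scores));
    }
}
```

A large Q relative to a chi-square distribution with k - 1 degrees of freedom suggests that at least one configuration ranks differently from the others, which is when a post-hoc procedure such as Conover's is applied to find the differing pairs.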
16. Results
[Figure: four panels (Line coverage, Branch coverage, Covered mutants, Mutation score), each on a 0 to 100 scale, comparing R.1, E.1, R.3, and E.3]
R.1: Randoop, 1 min. - E.1: EvoSuite, 1 min. - R.3: Randoop, 3 min. - E.3: EvoSuite, 3 min.
18. Lessons Learnt
• Results are consistent with other independent evaluations
• Shows that the bugs discovered in the infrastructure could be fixed
• The two-step selection procedure for the classes under test is useful
• Discover configuration issues early
• Still largely a manual process
• Docker simplifies the evaluation procedure
19. What’s Next?
• Contest Infrastructure
• https://github.com/JUnitContest/junitcontest
• Improve usability
• Facilitate setup of an evaluation
• Facilitate evaluation in other contexts
• Update the user documentation
• Storage and versioning of the results (and participating tools?)
• For the next edition
• More tools
• More CUTs
20. Java Unit Testing Tool Competition - 9th Round
Chair: Sebastiano Panichella Co-chair(s): Alessio Gambi, You?