ATM: Black-box Test Case Minimization using Test Code Similarity
Slide 1
uOttawa.ca | nanda-lab.ca
School of Electrical Engineering & Computer Science | Nanda Lab
ATM: Black-box Test Case Minimization based on Test Code
Similarity and Evolutionary Search
05/18/2023
Rongqi Pan, Taher A. Ghaleb, Lionel Briand
Slide 2
Test Suite Minimization (TSM)
● Problem:
○ Redundant test cases, i.e., test cases that are unlikely to
detect different faults
○ Waste of time and resources
● Solution: test suite minimization, which permanently removes
redundant test cases from a test suite that are unlikely to detect
new faults
Slide 3
Test Suite Minimization (TSM): Existing TSM Techniques
● White-box techniques:
○ Rely on production code, which is not always accessible to test engineers
○ Coverage analysis is computationally expensive
● Black-box techniques (e.g., FAST-R):
○ Rely on test code only
○ More scalable than white-box techniques
Slide 4
Proposed Approach: ATM
(Abstract Syntax Tree (AST)-based Test case Minimizer)
Motivation: Achieve a better trade-off between effectiveness and
efficiency than FAST-R
Pipeline: Test Suite → Pre-process test code → Transform test code into
ASTs → Measure test case similarity (4 tree-based similarity measures) →
Run search algorithms (GA & NSGA-II) → Minimized test suite
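The pipeline can be sketched end-to-end in Python. This is a minimal illustration, not ATM's implementation: ATM parses Java test code into ASTs and runs GA/NSGA-II over tree-based similarity, whereas here Python's own `ast` module, a string-based similarity ratio, and a greedy selection step stand in for those components.

```python
import ast
import difflib

def preprocess(src: str) -> str:
    # Stand-in pre-processing: drop blank lines and comments
    # (the actual steps are listed on the next slide).
    return "\n".join(line for line in src.splitlines()
                     if line.strip() and not line.strip().startswith("#"))

def parse_to_ast(code: str) -> str:
    # Stand-in parser: Python's own `ast` module; ATM parses Java test code.
    return ast.dump(ast.parse(code))

def similarity(a: str, b: str) -> float:
    # Stand-in similarity over serialized ASTs; ATM uses tree-based measures.
    return difflib.SequenceMatcher(None, a, b).ratio()

def minimize_test_suite(test_sources, budget):
    asts = [parse_to_ast(preprocess(src)) for src in test_sources]
    sim = [[similarity(a, b) for b in asts] for a in asts]
    # Stand-in for the search step: greedily keep the `budget` test cases
    # whose total similarity to the suite is lowest (ATM uses GA / NSGA-II).
    ranked = sorted(range(len(asts)), key=lambda i: sum(sim[i]))
    return sorted(ranked[:budget])
```

With a budget of 2 over two near-identical tests and one distinct one, the distinct test survives minimization.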
Slide 5
Proposed Approach: ATM
• Test Code Pre-processing
[Figure: example test code shown before and after pre-processing]
Pre-processing steps applied to the test code:
○ Remove logging or printing statements
○ Remove comments
○ Remove test case names
○ Remove assertions
○ Normalize variable identifiers
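As an illustration, these pre-processing steps can be approximated with regular expressions over a Java test method. This is a simplified sketch: ATM operates on parsed Java code, while the regexes here only handle common statement shapes.

```python
import re

def preprocess_test_code(java_test: str) -> str:
    """Simplified, regex-based sketch of the pre-processing steps."""
    code = java_test
    # Remove block and line comments.
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)
    code = re.sub(r"//[^\n]*", "", code)
    # Remove logging or printing statements.
    code = re.sub(r"^\s*(?:System\.out\.print\w*|log\w*\.\w+)\(.*\);\s*$",
                  "", code, flags=re.MULTILINE)
    # Remove assertions.
    code = re.sub(r"^\s*assert\w*\(.*\);\s*$", "", code, flags=re.MULTILINE)
    # Replace the test case name with a fixed placeholder.
    code = re.sub(r"void\s+\w+\s*\(", "void testCase(", code)
    # Normalize variable identifiers: rename each declared local to v0, v1, ...
    declared = re.findall(r"\b(?:int|long|double|boolean|String|var)\s+(\w+)", code)
    for i, name in enumerate(dict.fromkeys(declared)):
        code = re.sub(rf"\b{re.escape(name)}\b", f"v{i}", code)
    # Drop the blank lines left behind.
    return "\n".join(line for line in code.splitlines() if line.strip())
```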
Slide 9
Proposed Approach: ATM
• Measuring tree-based similarity between ASTs
Four tree-based similarity measures:
○ Top-down similarity: structure-oriented
○ Bottom-up similarity: detail-oriented
○ Combined similarity: the union of top-down and bottom-up
○ Tree Edit Distance: captures scattered code differences
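One of these measures, top-down similarity, can be sketched over toy AST nodes. The `Node` class and the Dice-style normalization below are illustrative assumptions, not ATM's exact formulation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy AST node: a label plus ordered children."""
    label: str
    children: list = field(default_factory=list)

def size(tree: Node) -> int:
    return 1 + sum(size(child) for child in tree.children)

def top_down_matches(a: Node, b: Node) -> int:
    # Compare the two trees from the roots down; matching stops along a
    # branch as soon as the labels differ (structure-oriented).
    if a.label != b.label:
        return 0
    return 1 + sum(top_down_matches(x, y) for x, y in zip(a.children, b.children))

def top_down_similarity(a: Node, b: Node) -> float:
    # Dice-style normalization: matched nodes relative to both tree sizes.
    return 2 * top_down_matches(a, b) / (size(a) + size(b))
```

Two trees that share their root and first child but diverge in the second child match on 2 of 3 nodes each, giving a similarity of 2/3.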
Slide 12
Proposed Approach: ATM
• Genetic Algorithms (GA)
Search loop: Initialize Population → Evaluation → Selection → Crossover →
Mutation → (repeat until the termination criterion is reached) → Final
Minimized Test Suite
○ Initialization randomly selects subsets of test cases
○ Evaluation scores each subset using similarity values as fitness
○ The final minimized test suite contains diverse test cases
○ We used a single-objective GA and a multi-objective algorithm (NSGA-II)
with the four similarity measures: Top-down, Bottom-up, Combined, and
Tree Edit Distance
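The GA loop can be sketched for subset selection. The operators and parameters below (population size, truncation selection, a sum-of-pairwise-similarities fitness) are illustrative assumptions rather than ATM's exact configuration; the key idea is that lower total similarity within a subset means more diverse test cases.

```python
import random

def ga_minimize(sim, budget, pop_size=20, generations=50, seed=0):
    """Sketch: pick `budget` test cases whose pairwise similarity is lowest."""
    rng = random.Random(seed)
    n = len(sim)

    def fitness(subset):
        # Sum of pairwise similarities among selected tests; lower = more diverse.
        return sum(sim[i][j] for i in subset for j in subset if i < j)

    def crossover(a, b):
        # Child draws `budget` members from the union of two parents.
        return frozenset(rng.sample(list(a | b), budget))

    def mutate(subset):
        # Swap one selected test case for a currently unselected one.
        chosen = set(subset)
        chosen.remove(rng.choice(sorted(chosen)))
        chosen.add(rng.choice([i for i in range(n) if i not in chosen]))
        return frozenset(chosen)

    population = [frozenset(rng.sample(range(n), budget)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(rng.choice(survivors), rng.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=fitness)
```

Given a suite where two test cases are near-duplicates, the search avoids keeping both.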
Slide 13
Evaluation of ATM
Dataset: DEFECTS4J
○ 16 Java projects with 661 versions
○ Each version contains a single real fault in the production code
Baselines:
○ Random minimization
○ FAST-R: black-box and much more efficient than white-box techniques,
but achieves a low fault detection capability on Java projects
Evaluation metrics:
○ Effectiveness: Fault Detection Rate (FDR)
○ Efficiency: Execution time
Minimization budgets (the pre-defined test suite reduction rates):
25%, 50%, 75%
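The FDR metric can be sketched as follows; the `versions` mapping is a hypothetical data shape (per faulty version: the minimized suite and the set of fault-triggering test cases), not Defects4J's actual format.

```python
def fault_detection_rate(versions):
    """FDR sketch: fraction of faulty versions whose minimized test suite
    still contains at least one fault-triggering test case."""
    detected = sum(
        1 for minimized, triggering in versions.values()
        if set(minimized) & set(triggering)  # fault still detected
    )
    return detected / len(versions)
```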
Slide 14
Results for ATM (50% budget, time in minutes)
GA / Top-Down: FDR 0.78, Time 70.87
GA / Bottom-Up: FDR 0.74, Time 67.05
GA / Combined: FDR 0.80, Time 72.75
GA / Tree Edit Distance: FDR 0.81, Time 82.23
NSGA-II / Top-Down & Bottom-Up: FDR 0.78, Time 235.41
NSGA-II / Combined & Tree Edit Distance: FDR 0.82, Time 258.44
○ FDR: ATM achieved high FDR results (up to 0.82 on average)
○ Execution time: ATM ran for 1.1-4.3 hours on average
○ Best alternative: GA with combined similarity is considered the best
configuration when considering both effectiveness (0.80 FDR) and
efficiency (1.2 hours on average)
Slide 15
Comparison with FAST-R and Random Minimization (50% budget, time in seconds)
ATM (GA / Combined): FDR 0.80, Time 4364.76
FAST++: FDR 0.61, Time 0.44
FAST-CS: FDR 0.60, Time 0.20
FAST-pw: FDR 0.47, Time 4.14
FAST-all: FDR 0.59, Time 2.78
Random minimization: FDR 0.52, Time 0.0021
○ FDR: ATM achieved significantly higher FDR results than FAST-R (+0.19
on average) and random minimization (+0.28 on average)
○ Execution time: ATM ran within practically acceptable time, given that
minimization is only occasionally applied, when many new test cases are
created (e.g., for major releases)
○ Results achieved for the other budgets were consistent
Slide 16
Discussion
Effective test case minimization with easily accessible
information
• ATM performs significantly better than baseline techniques
without requiring production code analysis
Scalability
• On the largest project in our dataset, Time, which has nearly 4k
test cases, ATM took more than 10 hours, on average, per
version.
Effectiveness versus efficiency in test case minimization
• ATM achieved significantly higher FDR results than baseline
approaches within practically acceptable time