uOttawa.ca | nanda-lab.ca
School of Electrical Engineering & Computer Science | Nanda Lab

ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolutionary Search
05/18/2023
Rongqi Pan, Taher A. Ghaleb, Lionel Briand
Test Suite Minimization (TSM)
● Problem:
○ Redundant test cases, i.e., test cases that are unlikely to detect different faults
○ Waste of time and resources
● Solution: test suite minimization, which permanently removes from a test suite the redundant test cases that are unlikely to detect new faults
Existing TSM Techniques

White-box techniques:
• Rely on production code, which is not always accessible to test engineers
• Coverage analysis is computationally expensive

Black-box techniques (e.g., FAST-R):
• Rely on test code only
• More scalable than white-box techniques
Proposed Approach: ATM
(Abstract Syntax Tree (AST)-based Test case Minimizer)

Motivation: achieve a better trade-off between effectiveness and efficiency than FAST-R

Pipeline: Test suite → Pre-process test code → Transform test code to ASTs → Measure test case similarity (4 tree-based similarity measures) → Run search algorithms (GA & NSGA-II) → Minimized test suite
Proposed Approach: ATM
• Test Code Pre-processing:
○ Remove logging or printing statements
○ Remove comments
○ Remove test case names
○ Remove assertions
○ Normalize variable identifiers
(Figure: test code before and after pre-processing)
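The pre-processing steps above can be sketched as follows. This is an illustrative Python sketch over Java-like test code; the function name and the regexes are hypothetical and much cruder than ATM's actual AST-based implementation:

```python
import re

def preprocess_test_code(code: str) -> str:
    """Illustrative sketch (hypothetical) of ATM-style test code pre-processing."""
    # Remove single-line and block comments
    code = re.sub(r"//.*", "", code)
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)
    # Remove logging/printing statements
    code = re.sub(r".*System\.out\.println.*\n?", "", code)
    # Remove assertion statements (JUnit-style)
    code = re.sub(r".*\bassert\w*\(.*\n?", "", code)
    # Remove (rename) the test case name to a generic placeholder
    code = re.sub(r"\bvoid\s+\w+\(", "void testCase(", code)
    # Normalize variable identifiers: rename declared locals to v1, v2, ...
    names = re.findall(r"\b(?:int|long|double|boolean|String)\s+(\w+)\b", code)
    for i, name in enumerate(dict.fromkeys(names), start=1):
        code = re.sub(rf"\b{name}\b", f"v{i}", code)
    return code
```

With this normalization, two test cases that differ only in names, logging, or assertion messages become textually (and structurally) much closer, which is what the later similarity measurement relies on.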
Proposed Approach: ATM
• Transforming test code to Abstract Syntax Trees (ASTs)
(Figure: a Java test method and its corresponding AST)
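ATM parses Java test methods into ASTs. As a language-agnostic illustration of the same idea (not the paper's tooling), Python's built-in ast module can expose the tree behind a small test method; the snippet below is a hypothetical example:

```python
import ast

# Parse a small (Python) test method and walk its AST, collecting node types.
# ATM does the equivalent for Java test code; this only illustrates the idea.
code = """
def test_addition():
    result = 1 + 2
    assert result == 3
"""

tree = ast.parse(code)
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```

The resulting node types (function definition, assignment, binary operation, assertion) are the labels that the tree-based similarity measures compare.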
Proposed Approach: ATM
• Measuring tree-based similarity between ASTs, using four measures:
○ Top-down similarity (structure-oriented)
○ Bottom-up similarity (detail-oriented)
○ Combined similarity: the union of top-down and bottom-up
○ Tree Edit Distance: captures scattered code differences
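As a toy sketch of the structure-oriented idea (not the paper's exact formulas), top-down similarity can be pictured as descending two trees from their roots and counting matching labels, normalized by tree size. Trees here are hypothetical (label, children) tuples:

```python
def top_down_matches(a, b):
    """Count label matches along a greedy root-down descent (toy version)."""
    if a[0] != b[0]:
        return 0
    matches = 1
    # Compare children pairwise in order; stop descending on a mismatch.
    for ca, cb in zip(a[1], b[1]):
        matches += top_down_matches(ca, cb)
    return matches

def top_down_similarity(a, b):
    """Normalize matches by the sizes of both trees, yielding a value in [0, 1]."""
    def size(t):
        return 1 + sum(size(c) for c in t[1])
    return 2 * top_down_matches(a, b) / (size(a) + size(b))
```

Identical trees score 1.0; trees sharing only a common prefix from the root score proportionally less, which is why this measure is structure-oriented rather than detail-oriented.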
Proposed Approach: ATM
• Genetic Algorithms (GA)
○ Initialize Population: randomly select subsets of test cases
○ Evaluation: evaluate each subset using similarity values as fitness
○ Selection, Crossover, and Mutation repeat until the termination criterion is reached
○ Output: the final minimized test suite, which contains diverse test cases
• We used a single-objective GA (with top-down, bottom-up, combined, or Tree Edit Distance similarity) and a multi-objective algorithm (NSGA-II)
Evaluation of ATM

Dataset: DEFECTS4J
• 16 Java projects with 661 versions
• Each version has a single real fault in the production code

Baselines
• Random minimization
• FAST-R: black-box and much more efficient than white-box techniques, but achieves low fault detection capability on Java projects

Evaluation Metrics
• Effectiveness: Fault Detection Rate (FDR)
• Efficiency: Execution Time

Minimization Budgets (the pre-defined test suite reduction rates): 25%, 50%, 75%
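Under Defects4J's one-fault-per-version setup, FDR can be sketched as the fraction of faulty versions whose fault is still triggered by at least one retained test case. The helper below is a hypothetical illustration, not the paper's evaluation code:

```python
def fault_detection_rate(versions):
    """versions: list of (retained_tests, fault_detecting_tests) set pairs,
    one pair per faulty project version (one real fault each, as in Defects4J).
    A fault counts as detected if any fault-detecting test was retained."""
    detected = sum(1 for retained, detecting in versions if retained & detecting)
    return detected / len(versions)
```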
Results for ATM (50% budget, Time in minutes)

| Algorithm  | GA       | GA        | GA       | GA                 | NSGA-II              | NSGA-II                       |
| Similarity | Top-Down | Bottom-Up | Combined | Tree Edit Distance | Top-Down & Bottom-Up | Combined & Tree Edit Distance |
| FDR        | 0.78     | 0.74      | 0.80     | 0.81               | 0.78                 | 0.82                          |
| Time (min) | 70.87    | 67.05     | 72.75    | 82.23              | 235.41               | 258.44                        |

FDR: ATM achieved high FDR results (up to 0.82 on average)
Execution Time: ATM ran for 1.1-4.3 hours on average
Best Alternative: GA with combined similarity is considered the best configuration when considering both effectiveness (0.80 FDR) and efficiency (1.2 hours on average)
Comparison with FAST-R and Random Minimization (50% budget, Time in seconds)

| Technique | GA/Combined | FAST++ | FAST-CS | FAST-pw | FAST-all | Random Minimization |
| FDR       | 0.80        | 0.61   | 0.60    | 0.47    | 0.59     | 0.52                |
| Time (s)  | 4364.76     | 0.44   | 0.20    | 4.14    | 2.78     | 0.0021              |

FDR: ATM achieved significantly higher FDR results than FAST-R (+0.19 on average) and random minimization (+0.28 on average)
Execution Time: ATM ran within practically acceptable time, given that minimization is only occasionally applied when many new test cases are created (e.g., for major releases)
Results achieved for other budgets were consistent
Discussion

Effective test case minimization with easily accessible information
• ATM performs significantly better than baseline techniques without requiring production code analysis

Scalability
• On the largest project in our dataset, Time, which has nearly 4k test cases, ATM took more than 10 hours, on average, per version

Effective versus efficient test case minimization
• ATM achieved significantly higher FDR results than baseline approaches within practically acceptable time