uOttawa.ca | nanda-lab.ca
School of Electrical Engineering & Computer Science | Nanda Lab

ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolutionary Search
05/18/2023
Rongqi Pan, Taher A. Ghaleb, Lionel Briand
Test Suite Minimization (TSM)
● Problem:
○ Redundant test cases, i.e., test cases that are unlikely to detect different faults
○ Waste of time and resources
● Solution: test suite minimization, which permanently removes from a test suite the redundant test cases that are unlikely to detect new faults
Existing TSM Techniques

White-box techniques:
• Rely on production code, which is not always accessible to test engineers
• Coverage analysis is computationally expensive

Black-box techniques (e.g., FAST-R):
• Rely on test code only
• More scalable than white-box techniques
Proposed Approach: ATM
(Abstract Syntax Tree (AST)-based Test case Minimizer)

Motivation: achieve a better trade-off between effectiveness and efficiency than FAST-R

Pipeline: Test suite → Pre-process test code → Transform test code to ASTs → Measure test case similarity (4 tree-based similarity measures) → Run search algorithms (GA & NSGA-II) → Minimized test suite
Proposed Approach: ATM
• Test Code Pre-processing:
○ Remove logging or printing statements
○ Remove comments
○ Remove test case names
○ Remove assertions
○ Normalize variable identifiers
(Figure: test code before and after pre-processing)
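The pre-processing steps above can be sketched as follows. This is an illustrative Python sketch over Java-like test code; the function name and the regexes are hypothetical and much cruder than ATM's actual AST-based implementation:

```python
import re

def preprocess_test_code(code: str) -> str:
    """Illustrative sketch (hypothetical) of ATM-style test code pre-processing."""
    # Remove single-line and block comments
    code = re.sub(r"//.*", "", code)
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)
    # Remove logging/printing statements
    code = re.sub(r".*System\.out\.println.*\n?", "", code)
    # Remove assertion statements (JUnit-style)
    code = re.sub(r".*\bassert\w*\(.*\n?", "", code)
    # Remove (rename) the test case name to a generic placeholder
    code = re.sub(r"\bvoid\s+\w+\(", "void testCase(", code)
    # Normalize variable identifiers: rename declared locals to v1, v2, ...
    names = re.findall(r"\b(?:int|long|double|boolean|String)\s+(\w+)\b", code)
    for i, name in enumerate(dict.fromkeys(names), start=1):
        code = re.sub(rf"\b{name}\b", f"v{i}", code)
    return code
```

With this normalization, two test cases that differ only in names, logging, or assertion messages become textually (and structurally) much closer, which is what the later similarity measurement relies on.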
Proposed Approach: ATM
• Transforming test code to Abstract Syntax Trees (ASTs)
(Figure: a Java test method and its corresponding AST)
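ATM parses Java test methods into ASTs. As a language-agnostic illustration of the same idea (not the paper's tooling), Python's built-in ast module can expose the tree behind a small test method; the snippet below is a hypothetical example:

```python
import ast

# Parse a small (Python) test method and walk its AST, collecting node types.
# ATM does the equivalent for Java test code; this only illustrates the idea.
code = """
def test_addition():
    result = 1 + 2
    assert result == 3
"""

tree = ast.parse(code)
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```

The resulting node types (function definition, assignment, binary operation, assertion) are the labels that the tree-based similarity measures compare.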
Proposed Approach: ATM
• Measuring tree-based similarity between ASTs, using four measures:
○ Top-down similarity (structure-oriented)
○ Bottom-up similarity (detail-oriented)
○ Combined similarity: the union of top-down and bottom-up
○ Tree Edit Distance: captures scattered code differences
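As a toy sketch of the structure-oriented idea (not the paper's exact formulas), top-down similarity can be pictured as descending two trees from their roots and counting matching labels, normalized by tree size. Trees here are hypothetical (label, children) tuples:

```python
def top_down_matches(a, b):
    """Count label matches along a greedy root-down descent (toy version)."""
    if a[0] != b[0]:
        return 0
    matches = 1
    # Compare children pairwise in order; stop descending on a mismatch.
    for ca, cb in zip(a[1], b[1]):
        matches += top_down_matches(ca, cb)
    return matches

def top_down_similarity(a, b):
    """Normalize matches by the sizes of both trees, yielding a value in [0, 1]."""
    def size(t):
        return 1 + sum(size(c) for c in t[1])
    return 2 * top_down_matches(a, b) / (size(a) + size(b))
```

Identical trees score 1.0; trees sharing only a common prefix from the root score proportionally less, which is why this measure is structure-oriented rather than detail-oriented.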
Proposed Approach: ATM
• Genetic Algorithms (GA)
○ Initialize Population: randomly select subsets of test cases
○ Evaluation: evaluate each subset using similarity values as fitness
○ Selection, Crossover, and Mutation repeat until the termination criterion is reached
○ Output: the final minimized test suite, which contains diverse test cases
• We used a single-objective GA (with top-down, bottom-up, combined, or Tree Edit Distance similarity) and a multi-objective algorithm (NSGA-II)
Evaluation of ATM

Dataset: DEFECTS4J
• 16 Java projects with 661 versions
• Each version has a single real fault in the production code

Baselines
• Random minimization
• FAST-R: black-box and much more efficient than white-box techniques, but achieves low fault detection capability on Java projects

Evaluation Metrics
• Effectiveness: Fault Detection Rate (FDR)
• Efficiency: Execution Time

Minimization Budgets (the pre-defined test suite reduction rates): 25%, 50%, 75%
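Under Defects4J's one-fault-per-version setup, FDR can be sketched as the fraction of faulty versions whose fault is still triggered by at least one retained test case. The helper below is a hypothetical illustration, not the paper's evaluation code:

```python
def fault_detection_rate(versions):
    """versions: list of (retained_tests, fault_detecting_tests) set pairs,
    one pair per faulty project version (one real fault each, as in Defects4J).
    A fault counts as detected if any fault-detecting test was retained."""
    detected = sum(1 for retained, detecting in versions if retained & detecting)
    return detected / len(versions)
```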
Results for ATM (50% budget, Time in minutes)

| Algorithm  | GA       | GA        | GA       | GA                 | NSGA-II              | NSGA-II                       |
| Similarity | Top-Down | Bottom-Up | Combined | Tree Edit Distance | Top-Down & Bottom-Up | Combined & Tree Edit Distance |
| FDR        | 0.78     | 0.74      | 0.80     | 0.81               | 0.78                 | 0.82                          |
| Time (min) | 70.87    | 67.05     | 72.75    | 82.23              | 235.41               | 258.44                        |

FDR: ATM achieved high FDR results (up to 0.82 on average)
Execution Time: ATM ran for 1.1-4.3 hours on average
Best Alternative: GA with combined similarity is considered the best configuration when considering both effectiveness (0.80 FDR) and efficiency (1.2 hours on average)
Comparison with FAST-R and Random Minimization (50% budget, Time in seconds)

| Technique | GA/Combined | FAST++ | FAST-CS | FAST-pw | FAST-all | Random Minimization |
| FDR       | 0.80        | 0.61   | 0.60    | 0.47    | 0.59     | 0.52                |
| Time (s)  | 4364.76     | 0.44   | 0.20    | 4.14    | 2.78     | 0.0021              |

FDR: ATM achieved significantly higher FDR results than FAST-R (+0.19 on average) and random minimization (+0.28 on average)
Execution Time: ATM ran within practically acceptable time, given that minimization is only occasionally applied when many new test cases are created (e.g., for major releases)
Results achieved for other budgets were consistent
Discussion

Effective test case minimization with easily accessible information
• ATM performs significantly better than baseline techniques without requiring production code analysis

Scalability
• On the largest project in our dataset, Time, which has nearly 4k test cases, ATM took more than 10 hours, on average, per version

Effective versus efficient test case minimization
• ATM achieved significantly higher FDR results than baseline approaches within practically acceptable time