Keynote
On the Effectiveness of SBSE Techniques through Instance Space Analysis
Aldeida Aleti
Monash University, Australia
@AldeidaAleti aldeida.aleti@monash.edu
Effectiveness of SBSE - Status Quo
A large focus of SBSE research is on introducing new SBSE approaches
As part of the evaluation process, a set of experiments is usually conducted:
- A benchmark is selected, e.g., Defects4J
- The new approach is compared against the state of the art
- Averages/medians are reported
- Some statistical tests are conducted
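To make this status quo concrete, here is a minimal, hypothetical sketch of such an evaluation in Python: two sets of synthetic coverage results are compared via medians, a Mann–Whitney U test, and the Vargha–Delaney Â12 effect size. The numbers are placeholders, not results from any study cited in this talk.

```python
# Minimal sketch of the status-quo evaluation style described above.
# The coverage values are synthetic placeholders, not results from the talk.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
new_approach = rng.uniform(0.60, 0.90, size=30)  # hypothetical branch coverage, 30 runs
baseline = rng.uniform(0.50, 0.85, size=30)

def a12(x, y):
    """Vargha-Delaney A12: probability that a run of x outperforms a run of y."""
    greater = sum(xi > yi for xi in x for yi in y)
    equal = sum(xi == yi for xi in x for yi in y)
    return (greater + 0.5 * equal) / (len(x) * len(y))

stat, p = mannwhitneyu(new_approach, baseline, alternative="two-sided")
print(f"median(new)={np.median(new_approach):.3f}  median(baseline)={np.median(baseline):.3f}")
print(f"Mann-Whitney p={p:.3f}  A12={a12(new_approach, baseline):.2f}")
```

The point of the slides that follow is that this kind of aggregated report says little about which instances each approach wins on, and why.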
Instance Space Analysis
1. To understand and visualise the strengths and weaknesses of different approaches
2. To help with the objective assessment of different approaches
a. Scrutinising how approaches perform under different conditions, and stress testing them
Motivation 1: Are the problem instances adequate?
Problem 1: How were the problem instances selected?
Common benchmark problems are important for fair comparison, but are they
- demonstrably diverse
- unbiased
- representative of a range of real-world contexts
- challenging
- discriminating?
ICSE 2022 review criteria
Motivation 2: Reporting averages/medians obscures important information
A. Perera, A. Aleti, M. Böhme and B. Turhan, "Defect Prediction Guided Search-Based
Software Testing," 2020 35th IEEE/ACM International Conference on Automated Software
Engineering (ASE), 2020, pp. 448-460.
Problem 2: Performance is often problem dependent (NFT)
- What are the strengths and weaknesses of the approaches?
- Which are the problem instances where an approach performs really well, and why?
- Which are the problem instances where an approach struggles, and why?
- How do features of the problem instances affect the performance of the approaches?
- Which features give an algorithm a competitive advantage?
- Given a problem instance with particular features, which approach should I use?
- Which algorithm is suitable for future problems?
Example
Which approach is better? SF110
C. Oliveira, A. Aleti, L. Grunske and K. Smith-Miles, "Mapping the Effectiveness of Automated Test Suite Generation
Techniques," in IEEE Transactions on Reliability, vol. 67, no. 3, pp. 771-785, Sept. 2018, doi: 10.1109/TR.2018.2832072.
Open Questions
● What impacts the effectiveness of SBSE techniques?
○ How can features of problem instances help us infer the strengths and weaknesses of different SBSE approaches?
○ How can we objectively assess different SBSE techniques?
● How easy or hard are existing benchmarks? How diverse are they? Are they biased towards a particular technique?
● Can we select the most suitable SBSE technique given a problem with particular features?
Empirical Review of Program Repair Tools: A Large-Scale Experiment on 2,141 Bugs and 23,551 Repair Attempts. T. Durieux, F. Madeiral, M. Martinez, R. Abreu. ESEC/FSE 2019. doi: 10.1145/3338906.3338911.
ISA
K. Smith-Miles et al., Computers & Operations Research 45 (2014) 12–24
Steps of ISA
1. Create the metadata
a. Features
b. SBSE performances
2. Create instance space
3. Visualise footprints
4. Explain strengths/weaknesses
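A minimal, hypothetical Python sketch of these four steps follows. It uses made-up metadata and PCA as a stand-in for the tailored 2D projection that the ISA toolkit learns; footprints are approximated crudely as the set of instances where an approach reaches a "good" performance threshold.

```python
# Hedged sketch of the ISA steps on hypothetical metadata.
# PCA stands in for the tailored 2D projection used by the real ISA toolkit.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Step 1: metadata = instance features + per-approach performance (all made up).
features = rng.normal(size=(200, 10))              # e.g., 10 code metrics per problem instance
performance = {"WSA": rng.uniform(0.0, 1.0, 200),  # e.g., branch coverage per instance
               "MOSA": rng.uniform(0.0, 1.0, 200)}

# Step 2: create the 2D instance space from the features.
instance_space = PCA(n_components=2).fit_transform(features)

# Steps 3-4: approximate each approach's footprint as the instances where it
# performs well, then inspect (or plot) those regions against the features.
GOOD = 0.8
for name, cov in performance.items():
    footprint = instance_space[cov >= GOOD]
    print(f"{name}: good on {len(footprint)}/{len(instance_space)} instances")
```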
Features (56)
What makes the problem easy or hard?
Problem instances: SF110
Performance measure
● Branch coverage.
● An approach is considered superior if its branch coverage is at least 1% higher than that of the other techniques; otherwise, we use the label “Equal.”
Approaches
● Whole Test Suite with Archive (WSA)
● Many Objective Sorting Algorithm (MOSA)
● Random Testing (RT)
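The labelling rule from the performance measure above can be made precise with a small sketch. This is my interpretation only: "1% higher" is read as one percentage point of branch coverage, and the study's exact tie-breaking may differ.

```python
# Hedged sketch of the performance labelling: an approach is "superior" on an
# instance only if its branch coverage beats every other approach by >= 0.01.
def label_instance(coverage, margin=0.01):
    """coverage: dict mapping approach name -> branch coverage in [0, 1]."""
    best = max(coverage, key=coverage.get)
    rest = [v for k, v in coverage.items() if k != best]
    return best if all(coverage[best] - v >= margin for v in rest) else "Equal"

print(label_instance({"WSA": 0.82, "MOSA": 0.80, "RT": 0.55}))   # -> WSA
print(label_instance({"WSA": 0.82, "MOSA": 0.815, "RT": 0.55}))  # -> Equal
```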
Significant features
● coupling between object classes
○ the number of classes coupled to a given class (via method calls, field accesses, inheritance, arguments, return types, and exceptions)
● response for a class
○ the number of different methods that can be executed when a method is invoked on an object of that class
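As an illustration only, these two metrics can be computed from a toy, hand-made summary of a class; the class contents below are invented, and a real study would extract them with a parser.

```python
# Toy illustration of the two significant features (CK metrics CBO and RFC)
# on an invented per-class summary; no real parsing is involved.
clazz = {
    "declared_methods": {"save", "load", "validate"},
    "coupled_classes": {"Logger", "Database", "IOException"},    # calls, fields, args, returns, exceptions
    "invoked_remote_methods": {"Logger.info", "Database.query"},
}

cbo = len(clazz["coupled_classes"])                                           # coupling between object classes
rfc = len(clazz["declared_methods"]) + len(clazz["invoked_remote_methods"])   # response for a class
print(f"CBO={cbo}, RFC={rfc}")  # CBO=3, RFC=5
```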
SBST Footprints
SBST selection
E-APR
Metadata
Features (146)
Observation-based features (Yu et al. 2019)
Significant Features (9)
(F1) MOA: Measure of Aggregation
(F2) CAM: Cohesion Among Methods
(F3) AMC: Average Method Complexity
(F4) PMC: Private Method Count
(F5) AECSL: Atomic Expression Comparison Same Left indicates the number of statements with a binary expression that have more than an atomic expression (e.g., variable access).
(F6) SPTWNG: Similar Primitive Type With Normal Guard indicates the number of statements that contain a variable (local or global) that is also used in another statement contained inside a guard (i.e., an If condition).
(F7) CVNI: Compatible Variable Not Included is the number of local primitive-type variables within the scope of a statement that involves primitive variables that are not part of that statement.
(F8) VCTC: Variable Compatible Type in Condition measures the number of variables within an If condition that are compatible with another variable in the scope.
(F9) PUIA: Primitive Used In Assignment is the number of primitive variables in assignments.
● Little overlap between IntroClassJava/Defects4J and the other datasets
● Bugs.jar has the most diverse bugs
APR selection
For ISA to reveal useful insights, we need:
● Diverse features
● Diverse instances
● Diverse approaches
● A good performance measure
So what
We have a responsibility to find the weaknesses of the approaches we develop
We need to make sure that the chosen problem instances are demonstrably diverse, unbiased, representative of a range of real-world contexts, challenging, and discriminating of approach performance.
To understand which approach is suitable for future problems, we must understand which features impact its performance.
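One way to operationalise that last point, sketched below under heavy assumptions: train a simple classifier that maps instance features to the best-performing approach, then use it to recommend an approach for a new instance and to inspect which features drive the choice. The features and labels here are entirely synthetic, and this is not the selection model from the talk.

```python
# Hedged sketch of feature-based approach selection on synthetic data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))                # e.g., CBO, RFC, ... per problem instance
y = np.where(X[:, 0] > 0.5, "MOSA", "WSA")   # invented "best approach" labels

selector = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
new_instance = rng.normal(size=(1, 5))
print("recommended approach:", selector.predict(new_instance)[0])
print("feature importances:", np.round(selector.feature_importances_, 2))
```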
