Searching for Quality: Genetic Algorithms and Metamorphic Testing for Software Engineering ML

Ruben Marang
rmar@live.nl
Annibale Panichella
a.panichella@tudelft.nl
TU Delft - CISELab
Leonhard Applis
l.h.applis@tudelft.nl
TU Delft - CISELab

Large Language Models (LLMs)

LLMs for Source Code
[Figure: examples from ICSE 2022 and ICSE 2023]
The goal is to semi-automate traditional software engineering (SE) tasks, for example:
• Code summarization
• Code retrieval
• Name predictions
• Test case generation
• …

Are LLMs Robust?
[Figure: Code2Vec demo (www.code2vec.org): an input program and the method names predicted for it]

Are LLMs Robust?
Let's change the variable names:
• swapped → memory
• array → a
[Figure: input program and the method names predicted by Code2Vec, before and after the renaming]
The model provides different results although the program semantics (behavior) is unchanged.
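To make "semantics unchanged" concrete, the Java sketch below uses a hypothetical bubble sort (the figure's actual program is not reproduced in this text) in which the flag `swapped` and the parameter `array` are renamed to `memory` and `a`. The class name `RenameDemo` is illustrative. Both versions produce identical results on every input, so a different predicted name reflects the model's sensitivity to identifiers rather than any change in behavior.

```java
import java.util.Arrays;

class RenameDemo {
    // Original version with descriptive identifiers (hypothetical example).
    static void bubbleSort(int[] array) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = 1; i < array.length; i++) {
                if (array[i - 1] > array[i]) {
                    int tmp = array[i - 1];
                    array[i - 1] = array[i];
                    array[i] = tmp;
                    swapped = true;
                }
            }
        }
    }

    // Transformed version: only identifiers renamed (swapped -> memory, array -> a).
    static void bubbleSortRenamed(int[] a) {
        boolean memory = true;
        while (memory) {
            memory = false;
            for (int i = 1; i < a.length; i++) {
                if (a[i - 1] > a[i]) {
                    int tmp = a[i - 1];
                    a[i - 1] = a[i];
                    a[i] = tmp;
                    memory = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] x = {5, 3, 9, 1};
        int[] y = x.clone();
        bubbleSort(x);
        bubbleSortRenamed(y);
        System.out.println(Arrays.equals(x, y)); // prints "true"
    }
}
```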

LLMs and Metamorphic Testing
• Prior studies showed that LLMs are not robust to metamorphic transformations
• Metamorphic transformations are applied via random sampling
• This can lead to code snippets that humans would never write
• Is random sampling the best strategy?
ASE (NIER) 2021
void f(int[] array) {
    if (true) {
        if (true) {
            if (true) {
                // original code
            }
        }
    }
}
Example of a higher-order MT (if (true) wrapping applied three times)
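A minimal sketch of how random sampling can stack transformations into code like the example above. It assumes two toy, string-level MTs (`ifTrue` and `unusedVariable`, hypothetical helpers; tools in this area rewrite the AST instead) and picks one uniformly at random at each of `k` steps. Nothing steers the choice, so repeated nesting such as a triple `if (true)` arises purely by chance.

```java
import java.util.Random;

class RandomSampling {
    // Toy IfTrue MT: wrap the code in an always-true if statement.
    static String ifTrue(String code) {
        return "if (true) {\n" + code + "\n}";
    }

    // Toy UnusedVariable MT: prepend a declaration that is never read.
    static String unusedVariable(String code, int id) {
        return "int unused" + id + " = 0;\n" + code;
    }

    // Random sampling: each of the k steps picks one of the two MTs uniformly,
    // so the same MT is easily stacked several times (a higher-order MT).
    static String applyRandomly(String code, int k, Random rng) {
        String result = code;
        for (int i = 0; i < k; i++) {
            result = rng.nextBoolean() ? ifTrue(result) : unusedVariable(result, i);
        }
        return result;
    }

    // Count occurrences of a token in the generated code.
    static int occurrences(String code, String token) {
        int count = 0;
        for (int idx = code.indexOf(token); idx != -1; idx = code.indexOf(token, idx + 1)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String out = applyRandomly("count++;", 5, new Random(42));
        System.out.println(out);
        System.out.println("nested if (true): " + occurrences(out, "if (true)"));
    }
}
```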

Search-based Robustness Testing for LLMs

Encoding
boolean f(Object target) {
    for (Object elem : this.elements) {
        if (elem.equals(target)) {
            return true;
        }
    }
    return false;
}
[Figure: the program is parsed into its Abstract Syntax Tree (AST); the encoding is a list in which each element pairs an AST node with a metamorphic transformation]
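One possible in-memory form of this encoding, as a sketch: a chromosome is an ordered, variable-length list of (AST node, metamorphic transformation) pairs that are applied to the seed program in sequence. The names `Gene`, `Chromosome`, and node labels like `IfStmt#1` are hypothetical, not taken from the authors' tool.

```java
import java.util.ArrayList;
import java.util.List;

// One gene: which AST node to transform, and which metamorphic transformation to apply.
record Gene(String astNode, String transformation) {}

// One individual: an ordered list of genes, applied to the seed program in order.
record Chromosome(List<Gene> genes) {
    Chromosome {
        genes = List.copyOf(genes); // defensive, immutable copy
    }

    // Return a new chromosome with one more transformation at the end.
    Chromosome append(Gene g) {
        List<Gene> next = new ArrayList<>(genes);
        next.add(g);
        return new Chromosome(next);
    }

    int length() {
        return genes.size();
    }
}

class EncodingDemo {
    public static void main(String[] args) {
        Chromosome c = new Chromosome(List.of())
                .append(new Gene("IfStmt#1", "ReverseCondition"))
                .append(new Gene("BlockStmt#2", "UnusedVariable"));
        System.out.println(c);
    }
}
```

Keeping the chromosome immutable means that crossover and mutation always build new individuals, so parents in the population are never altered by accident.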

Encoding
boolean f(Object target) {
    for (Object elem : this.elements) {
        if (!elem.equals(target)) {
            // nothing
        } else {
            return true;
        }
    }
    return false;
}
Encoding: (AST node: IfStmt, metamorphic transformation: Reverse Condition)
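The Reverse Condition transformation can be sketched on a tiny, hypothetical AST (`Cond`, `IfStmt`; a real implementation would work on a full Java parser's AST): negate the condition and swap the two branches. The `execute` method makes the semantic equivalence checkable, since the original and the transformed statement pick the same branch for either value of the condition.

```java
// Condition of an if statement, possibly negated (hypothetical mini-AST).
record Cond(String text, boolean negated) {
    Cond negate() {
        return new Cond(text, !negated);
    }

    // Truth value of this condition given the runtime value of `text`.
    boolean eval(boolean textValue) {
        return negated != textValue;
    }

    // Parenthesize when negated so compound conditions such as "x > 0" stay correct.
    String render() {
        return negated ? "!(" + text + ")" : text;
    }
}

// if (cond) { thenStmt } else { elseStmt }
record IfStmt(Cond cond, String thenStmt, String elseStmt) {
    String render() {
        return "if (" + cond.render() + ") { " + thenStmt + " } else { " + elseStmt + " }";
    }

    // Which branch runs for a given runtime value of the condition text.
    String execute(boolean textValue) {
        return cond.eval(textValue) ? thenStmt : elseStmt;
    }
}

class ReverseCondition {
    // Reverse Condition MT: negate the condition and swap the two branches.
    static IfStmt apply(IfStmt s) {
        return new IfStmt(s.cond().negate(), s.elseStmt(), s.thenStmt());
    }

    public static void main(String[] args) {
        IfStmt original = new IfStmt(new Cond("elem.equals(target)", false), "return true;", "");
        IfStmt reversed = apply(original);
        // prints: if (!(elem.equals(target))) {  } else { return true; }
        System.out.println(reversed.render());
    }
}
```

Applying the transformation twice restores the original statement, which is a cheap sanity check for an implementation of this MT.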

Encoding
boolean f(Object target) {
    for (Object elem : this.elements) {
        if (!elem.equals(target)) {
            int i = 0;
        } else {
            return true;
        }
    }
    return false;
}
Encoding: (IfStmt, Reverse Condition), (BlockStmt, Unused Variable), …
(each element pairs an AST node with a metamorphic transformation)
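A sketch of the Unused Variable transformation on a block represented as a list of statement strings (a simplification; `UnusedVariableMT` and `freshName` are hypothetical names). The subtle point is choosing a name that is not already in scope; otherwise the inserted declaration could fail to compile or shadow an existing variable. The caller is assumed to pass every name visible in, or declared later in, the block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class UnusedVariableMT {
    // Pick a variable name not already used in the enclosing scope, so the
    // inserted declaration cannot clash with or shadow existing variables.
    static String freshName(Set<String> namesInScope) {
        String base = "i";
        if (!namesInScope.contains(base)) {
            return base;
        }
        int n = 0;
        while (namesInScope.contains(base + n)) {
            n++;
        }
        return base + n;
    }

    // Insert "int <fresh> = 0;" at the top of the block. The variable is never
    // read, so the behavior of the block does not change.
    static List<String> apply(List<String> blockStmts, Set<String> namesInScope) {
        List<String> out = new ArrayList<>();
        out.add("int " + freshName(namesInScope) + " = 0;");
        out.addAll(blockStmts);
        return out;
    }

    public static void main(String[] args) {
        // The empty then-branch left by Reverse Condition, with "elem" and "target" in scope.
        List<String> thenBlock = List.of();
        System.out.println(apply(thenBlock, Set.of("elem", "target"))); // prints [int i = 0;]
    }
}
```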

Search Process (1)
[Figure: components Genetic Search, Transformer, Files, Code2Vec, and Metrics, connected by "create" and "evaluate" steps]
1. It starts with a seed (initial) program.
2. Run Code2Vec to get the prediction output (e.g., the method name).
3. Measure the quality of the prediction (e.g., method name accuracy).
4. Create new programs by randomly adding metamorphic transformations (MTs).
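Steps 1 and 4 can be sketched as building an initial population of N individuals, each being the seed program plus a few randomly sampled (AST node, MT) pairs. Steps 2 and 3 require running Code2Vec and are left out. The list of applicable pairs, the bound `maxGenes`, and the names used here are assumptions for illustration; only N = 10 comes from the study setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class InitialPopulation {
    // A gene pairs an AST node with a metamorphic transformation (hypothetical encoding).
    record Gene(String astNode, String transformation) {}

    // Start from the seed (an empty list means the unmodified program) and build
    // populationSize individuals, each with 1..maxGenes randomly sampled MTs.
    static List<List<Gene>> create(List<Gene> applicable, int populationSize,
                                   int maxGenes, Random rng) {
        List<List<Gene>> population = new ArrayList<>();
        for (int p = 0; p < populationSize; p++) {
            int length = 1 + rng.nextInt(maxGenes);
            List<Gene> individual = new ArrayList<>();
            for (int g = 0; g < length; g++) {
                individual.add(applicable.get(rng.nextInt(applicable.size())));
            }
            population.add(individual);
        }
        return population;
    }

    public static void main(String[] args) {
        List<Gene> applicable = List.of(
                new Gene("IfStmt#1", "ReverseCondition"),
                new Gene("BlockStmt#2", "UnusedVariable"),
                new Gene("MethodBody", "IfTrue"));
        // Population size N = 10, as in the study setup.
        List<List<Gene>> population = create(applicable, 10, 3, new Random(1));
        population.forEach(System.out::println);
    }
}
```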

Search Process (2)
[Figure: the same components (Genetic Search, Transformer, Files, Code2Vec, Metrics) with "create" and "evaluate" steps]
5. Run Code2Vec against the new program files.
6. Measure the quality of the predictions (e.g., method name accuracy).
7. Fitness = drop in the prediction quality.
8. Create new programs with crossover and mutation:
   • Crossover = recombine the applied MTs
   • Mutation = add (80%) or remove (20%) metamorphic transformations
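A sketch of the two variation operators on lists of (AST node, MT) pairs. The mutation follows the slide's 80% add / 20% remove split; the crossover is assumed to be single-point (cut each parent once and swap the tails), since the slide only says that it recombines the applied MTs. Whether an offspring is mutated at all would be governed by the mutation rate from the study setup. Falling back to adding when there is nothing to remove is a design choice of this sketch, and `Variation` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class Variation {
    record Gene(String astNode, String transformation) {}

    // Assumed single-point crossover: child 1 = head of a + tail of b,
    // child 2 = head of b + tail of a. The total number of MTs is preserved.
    static List<List<Gene>> crossover(List<Gene> a, List<Gene> b, Random rng) {
        int cutA = rng.nextInt(a.size() + 1);
        int cutB = rng.nextInt(b.size() + 1);
        List<Gene> c1 = new ArrayList<>(a.subList(0, cutA));
        c1.addAll(b.subList(cutB, b.size()));
        List<Gene> c2 = new ArrayList<>(b.subList(0, cutB));
        c2.addAll(a.subList(cutA, a.size()));
        return List.of(c1, c2);
    }

    // Mutation: with probability 0.8 add a random applicable MT, otherwise (0.2)
    // remove a random one. An empty individual always gets an MT added.
    static List<Gene> mutate(List<Gene> genes, List<Gene> applicable, Random rng) {
        List<Gene> out = new ArrayList<>(genes);
        if (rng.nextDouble() < 0.8 || out.isEmpty()) {
            out.add(applicable.get(rng.nextInt(applicable.size())));
        } else {
            out.remove(rng.nextInt(out.size()));
        }
        return out;
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        List<Gene> applicable = List.of(
                new Gene("IfStmt#1", "ReverseCondition"),
                new Gene("BlockStmt#2", "UnusedVariable"));
        List<Gene> p1 = List.of(applicable.get(0), applicable.get(1));
        List<Gene> p2 = List.of(applicable.get(1));
        for (List<Gene> child : crossover(p1, p2, rng)) {
            System.out.println(mutate(child, applicable, rng));
        }
    }
}
```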

Fitness Functions
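We investigate three different fitness functions:
• Drop in the F1 metric: $\Delta\mathrm{F1} = \mathrm{F1}_{\text{original}} - \mathrm{F1}_{\text{after MTs}}$
• Increase in MRR: $\Delta\mathrm{MRR} = \mathrm{MRR}_{\text{after MTs}} - \mathrm{MRR}_{\text{original}}$
• Combination of the two functions above: $0.5 \times \Delta\mathrm{F1} + 0.5 \times \Delta\mathrm{MRR}$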
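A sketch of the three fitness functions, written exactly as the formulas on the slide. The F1 helper assumes that method names are compared as sets of lowercase subtokens (for example `getMaxValue` → `get`, `max`, `value`), in the style of the code2vec evaluation; the paper's exact metric computation may differ, and MRR values are taken here as given inputs. `FitnessFunctions` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FitnessFunctions {
    // Split a camelCase or snake_case method name into lowercase subtokens.
    static List<String> subtokens(String name) {
        return Arrays.stream(name.split("(?<=[a-z0-9])(?=[A-Z])|_"))
                .filter(s -> !s.isEmpty())
                .map(String::toLowerCase)
                .toList();
    }

    // Assumed subtoken-level F1 between a predicted and the expected method name.
    static double f1(String predicted, String expected) {
        Set<String> p = new HashSet<>(subtokens(predicted));
        Set<String> e = new HashSet<>(subtokens(expected));
        Set<String> tp = new HashSet<>(p);
        tp.retainAll(e);
        if (tp.isEmpty()) {
            return 0.0;
        }
        double precision = (double) tp.size() / p.size();
        double recall = (double) tp.size() / e.size();
        return 2 * precision * recall / (precision + recall);
    }

    // Fitness 1: drop in F1 (original minus after MTs).
    static double deltaF1(double f1Original, double f1AfterMTs) {
        return f1Original - f1AfterMTs;
    }

    // Fitness 2: increase in MRR (after MTs minus original), as defined on the slide.
    static double deltaMRR(double mrrOriginal, double mrrAfterMTs) {
        return mrrAfterMTs - mrrOriginal;
    }

    // Fitness 3: equal-weight combination of the two.
    static double combined(double deltaF1, double deltaMRR) {
        return 0.5 * deltaF1 + 0.5 * deltaMRR;
    }

    public static void main(String[] args) {
        double before = f1("getMaxValue", "getMaxValue");
        double after = f1("getValue", "getMaxValue");
        System.out.println("F1 before = " + before + ", after = " + after
                + ", deltaF1 = " + deltaF1(before, after));
    }
}
```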

Study Setup
Target model = Code2Vec, trained on more than 14 million code examples
Seed programs = 350 methods from the test set

Parameter                     Value
Population size (N)           10
Mutation rate                 0.50
Crossover probability (p_c)   1
Termination criterion         360 min
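The parameter table can be captured in a small configuration object. This is a sketch: the names `StudySetup` and `GaConfig` are hypothetical, and reading the mutation rate as the probability of mutating each offspring is an assumption. The 360-minute termination criterion becomes a wall-clock check, and with p_c = 1 every selected pair of parents undergoes crossover.

```java
import java.time.Duration;
import java.util.Random;

class StudySetup {
    // GA parameters as reported in the table above.
    record GaConfig(int populationSize, double mutationRate, double crossoverProbability,
                    Duration timeBudget) {
        static GaConfig fromSlide() {
            return new GaConfig(10, 0.50, 1.0, Duration.ofMinutes(360));
        }

        // Termination criterion: stop once the wall-clock budget is used up.
        boolean shouldStop(long startNanos, long nowNanos) {
            return nowNanos - startNanos >= timeBudget.toNanos();
        }

        // With crossoverProbability = 1, this is always true.
        boolean doCrossover(Random rng) {
            return rng.nextDouble() < crossoverProbability;
        }

        // Assumed: each offspring is mutated with probability mutationRate.
        boolean doMutation(Random rng) {
            return rng.nextDouble() < mutationRate;
        }
    }

    public static void main(String[] args) {
        GaConfig cfg = GaConfig.fromSlide();
        long start = System.nanoTime();
        System.out.println(cfg + ", stop now? " + cfg.shouldStop(start, System.nanoTime()));
    }
}
```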

RQ1: Effectiveness of the Search
Algorithms compared:
• Random search (used in the literature)
• Genetic algorithm (proposed by us)
Results are averaged over 10 runs.
Main results:
• The GA leads to a 10% drop in F1-score within 15 generations
• Random search performs about half as well as the genetic search
[Figure 2 of the paper (15-19 July, 2023, Lisbon, Portugal): comparison of F1 for random and genetic search; x-axis: # evaluated solutions, y-axis: F1]
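For contrast with the GA, a random-search baseline can be sketched as below, assuming the budget is counted in evaluated solutions as on the figure's x-axis. Each sample is an independent random list of MT identifiers and only the best fitness is kept; unlike crossover and mutation, no information flows from one evaluation to the next. The fitness function is a caller-supplied stand-in (hypothetical), since the real one requires running Code2Vec.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

class RandomSearchBaseline {
    // Evaluate `evaluations` independent random samples and return the best fitness.
    static double bestFitness(int evaluations, int numMTs, int maxGenes,
                              ToDoubleFunction<List<Integer>> fitness, Random rng) {
        double best = Double.NEGATIVE_INFINITY;
        for (int e = 0; e < evaluations; e++) {
            List<Integer> sample = new ArrayList<>();
            int length = 1 + rng.nextInt(maxGenes);
            for (int g = 0; g < length; g++) {
                sample.add(rng.nextInt(numMTs)); // MT identifier chosen uniformly
            }
            best = Math.max(best, fitness.applyAsDouble(sample));
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy stand-in fitness: fraction of samples' MTs equal to MT 0, scaled by 1/10.
        ToDoubleFunction<List<Integer>> toy = s -> s.stream().filter(mt -> mt == 0).count() / 10.0;
        System.out.println(bestFitness(150, 5, 4, toy, new Random(0)));
    }
}
```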

Further Analysis
• Correlation (RQ2): programs with larger AST changes survive better in the search
• Code2Vec preprocessing relies heavily on AST attributes
• IfTrue and LambdaIdentity are the most frequent and most successful metamorphic transformations (RQ3)

More Analyses Available in our Paper…

Full Replication Package
Replication package:
https://zenodo.org/record/7306931
Source Code:
https://zenodo.org/record/7307012
