Searching for Quality: Genetic Algorithms and
Metamorphic Testing for Software Engineering ML
Ruben Marang
rmar@live.nl
Annibale Panichella
a.panichella@tudelft.nl
TU Delft - CISELab
Leonhard Applis
l.h.applis@tudelft.nl
TU Delft - CISELab
Background
Large Language Models (LLMs)
LLMs for Source Code
ICSE 2022 ICSE 2023
The goal is to semi-automate traditional SE tasks.
Some examples:
• Code summarization
• Code retrieval
• Name predictions
• Test case generation
• …
Are LLMs Robust?
[Diagram: input program, Code2Vec, predicted method names (www.code2vec.org)]
Are LLMs Robust?
Let’s change the variable names:
• swapped → memory
• array → a
[Diagram: input program, Code2Vec, predicted method names]
The model gives different results although the program semantics (behavior) are unchanged.
LLMs and Metamorphic Testing
• Prior studies showed that LLMs are not robust to metamorphic transformations
• Metamorphic transformations are applied via random sampling
• This can lead to code snippets humans would never write
• Is random sampling the best strategy?
ASE (NIER) 2021
void f(int[] array) {
    if (true) {
        if (true) {
            if (true) {
                // original code
            }
        }
    }
}
Example of a higher-order MT
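The nesting in the higher-order example above can be produced mechanically by applying the same transformation repeatedly. The following is a minimal sketch that works on source text only; the class and method names (`IfTrueWrapper`, `wrap`) are hypothetical and are not taken from the replication package, which may well operate on the AST instead.

```java
// Hypothetical helper for illustration; not the authors' implementation.
final class IfTrueWrapper {
    /**
     * Wraps a statement in `depth` nested `if (true) { ... }` guards.
     * The wrapped code behaves exactly like the original statement.
     */
    static String wrap(String body, int depth) {
        String result = body;
        for (int i = 0; i < depth; i++) {
            result = "if (true) { " + result + " }";
        }
        return result;
    }
}
```

Calling `IfTrueWrapper.wrap("return x;", 3)` yields three nested guards around `return x;`, mirroring the slide's example.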
Search-based Robustness Testing for LLMs
Encoding
boolean f(Object target) {
    for (Object elem: this.elements) {
        if (elem.equals(target)) {
            return true;
        }
    }
    return false;
}
[Diagram: Program, Abstract Syntax Tree (AST), Encoding]
Encoding rows: AST Node | Metamorphic Transformation
Encoding
boolean f(Object target) {
    for (Object elem: this.elements) {
        if (!elem.equals(target)) {
            // nothing
        } else
            return true;
    }
    return false;
}
[Diagram: Program, Abstract Syntax Tree (AST), Encoding]
AST Node:                   IfStmt
Metamorphic Transformation: Reverse Condition
Encoding
boolean f(Object target) {
    for (Object elem: this.elements) {
        if (!elem.equals(target)) {
            int i = 0;
        } else
            return true;
    }
    return false;
}
[Diagram: Program, Abstract Syntax Tree (AST), Encoding]
AST Node:                   IfStmt            | BlockStmt       | … …
Metamorphic Transformation: Reverse Condition | Unused Variable | … …
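The encoding shown on the Encoding slides pairs AST nodes with the metamorphic transformations applied to them. A minimal sketch of such a representation follows; the types `Gene` and `Individual` and the string-based node and transformation names are assumptions made for illustration, not the data structures of the actual tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One gene: a metamorphic transformation bound to the AST node it targets (hypothetical type). */
final class Gene {
    final String astNode;         // e.g. "IfStmt"
    final String transformation;  // e.g. "ReverseCondition"

    Gene(String astNode, String transformation) {
        this.astNode = astNode;
        this.transformation = transformation;
    }

    @Override
    public String toString() {
        return astNode + ":" + transformation;
    }
}

/** An individual: the ordered list of genes applied to the seed program (hypothetical type). */
final class Individual {
    private final List<Gene> genes;

    Individual(List<Gene> genes) {
        // Defensive copy so that individuals are immutable.
        this.genes = Collections.unmodifiableList(new ArrayList<>(genes));
    }

    /** Returns a new individual with one more transformation appended. */
    Individual plus(Gene gene) {
        List<Gene> extended = new ArrayList<>(genes);
        extended.add(gene);
        return new Individual(extended);
    }

    List<Gene> genes() {
        return genes;
    }
}
```

Starting from an empty individual (the untransformed seed program), appending `IfStmt:ReverseCondition` and then `BlockStmt:UnusedVariable` reproduces the encoding on the last Encoding slide.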
Search Process (1)
[Diagram components: Genetic Search, Transformer, Files, Code2Vec, Metrics; arrows labeled "create" and "evaluate"]
1. Start with a seed (initial) program
2. Run Code2Vec to get the prediction output (e.g., method name)
3. Measure the quality of the prediction (e.g., method name accuracy)
4. Create new programs by randomly adding metamorphic transformations (MTs)
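The last step of this first phase, creating new programs by randomly adding MTs, can be sketched as follows. This is an illustration under assumptions: individuals are represented as plain lists of transformation names, each initial individual receives exactly one random transformation, and `InitialPopulation` is a hypothetical name. The slides do not specify how many transformations are added initially.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; the number of MTs per initial individual is an assumption.
final class InitialPopulation {
    /**
     * Builds `size` individuals, each holding one transformation
     * drawn uniformly at random from `catalog`.
     */
    static List<List<String>> create(int size, List<String> catalog, Random rng) {
        List<List<String>> population = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            List<String> individual = new ArrayList<>();
            individual.add(catalog.get(rng.nextInt(catalog.size())));
            population.add(individual);
        }
        return population;
    }
}
```

Passing a seeded `Random` makes a run reproducible, which helps when results are averaged over several runs.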
Search Process (2)
[Diagram components: Genetic Search, Transformer, Files, Code2Vec, Metrics; arrows labeled "create" and "evaluate"]
5. Run Code2Vec against the new program files
6. Measure the quality of the predictions (e.g., method name accuracy)
7. Fitness = drop in the prediction quality
8. Create new programs with crossover and mutation:
   Crossover = recombine the applied MTs
   Mutation = add (80%) or remove (20%) metamorphic transformations
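The two variation operators described on this slide can be sketched as below. Only the 80%/20% add/remove split comes from the slide. The single-point recombination, the list-of-names representation, the fallback to "add" when an individual is empty, and the name `MtOperators` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the operators; not the authors' implementation.
final class MtOperators {
    /** Recombines two parents: a prefix of `a` followed by a suffix of `b` (assumed scheme). */
    static List<String> crossover(List<String> a, List<String> b, Random rng) {
        int cutA = rng.nextInt(a.size() + 1);
        int cutB = rng.nextInt(b.size() + 1);
        List<String> child = new ArrayList<>(a.subList(0, cutA));
        child.addAll(b.subList(cutB, b.size()));
        return child;
    }

    /** Adds a random transformation with probability 0.8, otherwise removes one. */
    static List<String> mutate(List<String> parent, List<String> catalog, Random rng) {
        List<String> child = new ArrayList<>(parent);
        // An empty individual has nothing to remove, so it always gets an addition.
        boolean add = child.isEmpty() || rng.nextDouble() < 0.8;
        if (add) {
            child.add(catalog.get(rng.nextInt(catalog.size())));
        } else {
            child.remove(rng.nextInt(child.size()));
        }
        return child;
    }
}
```

Favoring additions over removals means that individuals tend to accumulate transformations over the generations.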
Fitness Functions
We investigate three different fitness functions:
• Drop in F1: ΔF1 = F1(original) − F1(after MTs)
• Increase in MRR: ΔMRR = MRR(after MTs) − MRR(original)
• Combination of the two functions above: 0.5 × ΔF1 + 0.5 × ΔMRR
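The three fitness functions amount to simple arithmetic on the metric values measured before and after applying the MTs. The sketch below reads the sign conventions off the slide's wording ("drop" in F1 as original minus transformed, "increase" in MRR as transformed minus original); the class and method names are hypothetical.

```java
// Hypothetical sketch; sign conventions follow the wording "drop" and "increase".
final class Fitness {
    /** Drop in F1: original score minus score after the MTs. */
    static double deltaF1(double f1Original, double f1AfterMts) {
        return f1Original - f1AfterMts;
    }

    /** Increase in MRR: value after the MTs minus original value. */
    static double deltaMrr(double mrrOriginal, double mrrAfterMts) {
        return mrrAfterMts - mrrOriginal;
    }

    /** Equal-weight combination: 0.5 x deltaF1 + 0.5 x deltaMRR. */
    static double combined(double deltaF1, double deltaMrr) {
        return 0.5 * deltaF1 + 0.5 * deltaMrr;
    }
}
```

For example, an F1 that falls from 0.5 to 0.25 gives ΔF1 = 0.25. Combined with ΔMRR = 0.25, the weighted fitness is also 0.25.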
Empirical Evaluation
Study Setup
Target model = Code2Vec, trained on more than 14 million code examples
Seed programs = 350 methods from the test set

Parameter               Value
Population size (N)     10
Mutation rate           0.50
Crossover probability   pc = 1
Termination criterion   360 min
RQ1: Effectiveness of the Search
Algorithms compared:
• Random search (used in the literature)
• Genetic algorithm (proposed by us)
Results are averaged over 10 runs.
Main results:
• The GA leads to a 10% drop in F1-score within 15 generations
• Random search performs about half as well as the genetic search
[Figure 2: Comparison of F1 for random and genetic search]
[Figure 4: Metric-movement for rando… (caption truncated)]
[Plot axes: x = # evaluated solutions (20 to 150), y = F1]
Figures are excerpts from the paper; page header as extracted: "…co 2023, 15-19 July, 2023, Lisbon, Portugal, Leonhard Applis, Ruben M…"
Further Analysis
• Correlation (RQ2): high AST changes = better survival
• Code2Vec preprocessing heavily utilizes AST attributes
• IfTrue and LambdaIdentity are the most frequent and successful metamorphic transformations (RQ3)
More Analyses Available in our Paper…
Full Replication Package
Replication package:
https://zenodo.org/record/7306931
Source Code:
https://zenodo.org/record/7307012
In Summary
Thank you!
