Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results in the Space Domain

Oscar Cornejo, Fabrizio Pastore, Lionel Briand
University of Luxembourg
University of Ottawa
ICSE 2022
Journal First paper appearing in IEEE Transactions on Software Engineering,
doi:10.1109/TSE.2021.3107680

Fault-based, Automated Quality Assurance Assessment and Augmentation for Space Software (FAQAS)
https://faqas.uni.lu/
GomSpace Luxembourg (GSL)
• Develop Nanosatellites
• Case study provider
• Technology validator
LuxSpace
• Develop Microsatellites
SnT/University of Luxembourg
• Technology provider
• Tenderer

Software has a prominent role in Space Systems

Software failures have high impact

How is Space Software Verified and Validated?

Software testing is prevalent in V&V
[Figure: word cloud for the ECSS-Q-ST-80C standard]

How to ensure thorough testing?

Mutation Analysis
[Figure: the test suite passes on the original SUT (PASS); it is then executed against each mutated SUT, yielding FAIL, PASS, FAIL, FAIL, PASS, ...]
Mutation score = # FAIL / Total
Challenges: EQUIVALENT mutants, REDUNDANT mutants, SCALABILITY
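The mutation score formula can be illustrated with a minimal sketch. The function name `mutation_score` and the string labels are illustrative assumptions; the five outcomes are the ones shown in the diagram.

```python
def mutation_score(outcomes):
    """Fraction of mutants on which the test suite fails (i.e., killed mutants).

    `outcomes` holds one entry per mutant: "FAIL" if at least one test case
    fails on the mutant, "PASS" if the whole test suite still passes.
    """
    if not outcomes:
        raise ValueError("at least one mutant is required")
    killed = sum(1 for outcome in outcomes if outcome == "FAIL")
    return killed / len(outcomes)


# Outcomes taken from the diagram: three of five mutants are killed.
score = mutation_score(["FAIL", "PASS", "FAIL", "FAIL", "PASS"])
print(f"Mutation score = {score:.0%}")  # prints "Mutation score = 60%"
```

Equivalent mutants can never be killed, so leaving them in `outcomes` lowers the score without reflecting a test suite weakness.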

Mutation Literature
• Lacks an integrated pipeline
• Schemata and split-stream: infeasible with hardware emulators
• Mutants sampling: no accuracy guarantees or too many mutants
• Prioritization solutions: assumptions on test length, or reliance on expensive data-flow analysis
• Redundant mutants: better to keep subsumed ones but distinguish them. We discard only duplicate mutants.
• Equivalent mutants’ detection:
  • Symbolic execution: infeasible with large systems (modelling of channels, no floating point, ...)
  • Dynamic analysis: collects too much data

Our pipeline: Mutation Analysis for Space Software (MASS)

MASS pipeline, steps 1 to 3
1. Collect test data (output: code coverage)
2. Create mutants
3. Compile mutants (output: mutants successfully compiled)

MASS pipeline, step 4
4. Remove equivalent/duplicate mutants based on compiler optimizations (output: unique mutants)
• Literature: equivalent programs may lead to the same executables after compiler optimizations
• We compile the original software and every mutant multiple times
  • once for each optimization option (i.e., -O0, -O1, -O2, -O3, -Os, -Ofast in GCC)
  • we compute the SHA-512 hash summary of the generated executable
  • we compare hash summaries
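The hash comparison can be sketched as follows. This is a minimal illustration, not the MASS implementation: the executables are passed in as byte strings (in practice they would be read from the files GCC produces), and the decision rule is an assumption, namely that a mutant is treated as equivalent when its hash matches the original's for at least one optimization option, and as duplicate when it matches an earlier mutant's. The names `classify_mutants` and `sha512_per_level` are hypothetical.

```python
import hashlib

OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Os", "-Ofast"]


def sha512_per_level(binaries):
    """Map each optimization option to the SHA-512 digest of its executable."""
    return {lvl: hashlib.sha512(binaries[lvl]).hexdigest() for lvl in OPT_LEVELS}


def classify_mutants(original_bins, mutant_bins):
    """Split mutants into unique, equivalent, and duplicate ones.

    Assumed rule: one matching optimization option is enough.
    """
    original = sha512_per_level(original_bins)
    seen = {lvl: set() for lvl in OPT_LEVELS}  # hashes of mutants kept so far
    unique, equivalent, duplicate = [], [], []
    for name, bins in mutant_bins.items():
        hashes = sha512_per_level(bins)
        if any(hashes[lvl] == original[lvl] for lvl in OPT_LEVELS):
            equivalent.append(name)
        elif any(hashes[lvl] in seen[lvl] for lvl in OPT_LEVELS):
            duplicate.append(name)
        else:
            unique.append(name)
            for lvl in OPT_LEVELS:
                seen[lvl].add(hashes[lvl])
    return unique, equivalent, duplicate


def fake_build(tag, overrides=None):
    """Stand-in for compiled executables (hypothetical data for the example)."""
    bins = {lvl: f"{tag}{lvl}".encode() for lvl in OPT_LEVELS}
    bins.update(overrides or {})
    return bins


original = fake_build("orig")
mutants = {
    "m1": fake_build("m1", {"-O2": original["-O2"]}),  # same as original at -O2
    "m2": fake_build("m2"),
    "m3": fake_build("m3", {"-O3": b"m2-O3"}),         # same as m2 at -O3
}
print(classify_mutants(original, mutants))
```

Hashing whole executables keeps the check cheap, since no test case has to be executed to discard these mutants.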

MASS pipeline, steps 5 and 6
5. Sample mutants (output: sampled mutants)
6. Execute test cases (outputs: killed mutants, live mutants)

Mutants sampling
[Figure: mutation score (0% to 100%) plotted against the number of mutants sampled (0 to 1000); the value 65.36% is marked on the plot.]
• A confidence interval captures a range that has a probability (e.g., 95%) of including the value being estimated (the mutation score)
  • MS=[L;U]
• Assuming that MS follows a binomial distribution, we rely on the Clopper-Pearson method for CI computation
• Fixed-width sequential confidence interval (FSCI) method:
  • stop sampling when (U - L) < 10 percentage points
  • the difference between the estimated MS and the actual one is at most 5 percentage points
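The FSCI stopping rule can be sketched with the standard library only. This is a hedged illustration under stated assumptions: the Clopper-Pearson bounds are found by bisection on the binomial tail probabilities, the interval is re-evaluated every `batch` mutants to keep the sketch cheap (the slides do not say how often it is evaluated), and `is_killed` is a hypothetical callback that stands in for executing the test suite on a mutant.

```python
import math
import random


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = sum(
        math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * log_p + (n - i) * log_q)
        for i in range(k + 1)
    )
    return min(total, 1.0)


def clopper_pearson(killed, n, confidence=0.95):
    """Exact (Clopper-Pearson) interval [L; U] for the proportion killed / n."""
    alpha = 1.0 - confidence

    def solve(tail, target):
        # tail(p) increases with p; bisect for tail(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(40):
            mid = (lo + hi) / 2.0
            if tail(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower bound L: P(X >= killed | L) = alpha / 2
    lower = 0.0 if killed == 0 else solve(
        lambda p: 1.0 - binom_cdf(killed - 1, n, p), alpha / 2.0)
    # Upper bound U: P(X <= killed | U) = alpha / 2
    upper = 1.0 if killed == n else solve(
        lambda p: 1.0 - binom_cdf(killed, n, p), 1.0 - alpha / 2.0)
    return lower, upper


def fsci_sample(mutants, is_killed, width=0.10, confidence=0.95, batch=10, seed=0):
    """Sample mutants at random until the interval is narrower than `width`.

    Returns (estimated mutation score, (L, U), number of mutants sampled).
    """
    order = list(mutants)
    if not order:
        raise ValueError("at least one mutant is required")
    random.Random(seed).shuffle(order)
    killed = 0
    for n, mutant in enumerate(order, start=1):
        killed += 1 if is_killed(mutant) else 0
        if n % batch == 0:
            lower, upper = clopper_pearson(killed, n, confidence)
            if upper - lower < width:
                return killed / n, (lower, upper), n
    # All mutants were executed before the interval became narrow enough.
    return killed / len(order), clopper_pearson(killed, len(order), confidence), len(order)
```

With `width=0.10` the loop stops as soon as U - L drops below 10 percentage points, so the estimate lies within 5 percentage points of either bound.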

MASS pipeline, step 7
7. Evaluate mutation score’s confidence
MASS pipeline, step 6 (refined)
6. Execute prioritized subset of test cases
Confidence interval correction
• MS=[L+Perr;U+Perr]
• Proportion of mutants live by mistake: Perr=[Lerr;Uerr]
• MS=[L+Lerr;U+Uerr]
Execute prioritized subset of test cases
• Based on statement coverage
• Test suite prioritization:
  • greedy algorithm
  • select first the test cases with the largest distance from the closest, already selected, test case
• Test suite reduction: exclude test cases with a distance of zero
• Cosine distance: best accuracy
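The greedy prioritization and reduction described above can be sketched as a farthest-first selection over statement coverage vectors. This is a minimal illustration, not the MASS implementation: the coverage vectors are hypothetical per-statement execution counts, the first test case is simply the first one in input order (the slide does not say how the seed is chosen), and a small tolerance `eps` stands in for "distance of zero" to absorb floating-point noise.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two coverage vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0 if norm_u == norm_v else 1.0
    return max(0.0, 1.0 - dot / (norm_u * norm_v))


def prioritize_and_reduce(coverage, eps=1e-12):
    """Order test cases farthest-first and drop those at distance zero.

    `coverage` maps a test case name to its statement coverage vector.
    """
    names = list(coverage)
    if not names:
        return []
    selected = [names[0]]  # assumed seed: first test case in input order
    remaining = names[1:]
    while remaining:
        best, best_distance = None, -1.0
        for test in remaining:
            # Distance from the closest, already selected, test case.
            d = min(cosine_distance(coverage[test], coverage[s]) for s in selected)
            if d > best_distance:
                best, best_distance = test, d
        if best_distance <= eps:
            break  # every remaining test case has distance zero: exclude them
        selected.append(best)
        remaining.remove(best)
    return selected


coverage = {
    "T1": [20, 0, 6],
    "T2": [20, 0, 6],   # identical to T1
    "T3": [0, 5, 0],
    "T4": [10, 0, 3],   # proportional to T1, so cosine distance is zero
    "T5": [20, 1, 6],
}
print(prioritize_and_reduce(coverage))  # ['T1', 'T3', 'T5']
```

Note that cosine distance is also zero for proportional vectors, so T4 is excluded even though its execution counts differ from T1's.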

MASS pipeline, step 8
8. Discard likely equivalent mutants based on coverage (output: live, non-equivalent mutants)

Discard likely equivalent mutants
Inspired by Schuler et al.

[Figure: for each test case (Test1, Test5, ..., Test257), the number of times each statement (Stmt1, Stmt2, ..., Stmt357) is executed by the original program is compared with the count for the mutant; the example distances are 0, 0.2, and 0.1.]
• A mutant is not equivalent if distance > threshold T
• With T = 0: not equivalent if distance > 0
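The coverage-based check can be sketched as follows. It is a hedged illustration: the figure does not name the distance used at this step, so cosine distance is assumed here, the coverage data is hypothetical, and distances are rounded to 12 decimals so that floating-point noise is not mistaken for a difference when the threshold is 0.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity, rounded to suppress floating-point noise."""
    if list(u) == list(v):
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return round(max(0.0, 1.0 - dot / (norm_u * norm_v)), 12)


def likely_equivalent(original_cov, mutant_cov, threshold=0.0):
    """A live mutant is kept as non-equivalent if, for at least one test
    case, its coverage differs from the original's by more than `threshold`.
    """
    for test, original_counts in original_cov.items():
        if cosine_distance(original_counts, mutant_cov[test]) > threshold:
            return False
    return True


# Hypothetical per-statement execution counts for two test cases.
original_cov = {"Test1": [20, 0, 6], "Test5": [20, 0, 6]}
same_cov = {"Test1": [20, 0, 6], "Test5": [20, 0, 6]}
changed_cov = {"Test1": [20, 0, 6], "Test5": [2, 0, 6]}

print(likely_equivalent(original_cov, same_cov))     # True: discard the mutant
print(likely_equivalent(original_cov, changed_cov))  # False: keep as live, non-equivalent
```

With a threshold of 0, any observable change in statement coverage is enough to keep a live mutant for the final score.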

MASS pipeline, steps 9 and 10
9. Compute final mutation score
10. Mutation testing

Case study subjects
• Control software of LuxSpace’s ESAIL satellite
• Network, Configuration, and Utility libraries from GomSpace’s nanosatellites
• MLFS - Mathematical Library for Flight Software by ESA

RQ2. How does FSCI-based sampling compare to other mutants sampling approaches?
• compute the actual mutation score using all the mutants
• compute the mutation score obtained with different sampling methods, in 100 executions
  • uniform sampling (i.e., selecting 1%, 2%, ..., 20%, ..., 100% of the mutants)
  • uniform fixed-size sampling (i.e., selecting 100, 200, ..., 1000 of the mutants)
  • FSCI-based sampling (with CI=0.10, 0.09, ..., 0.01)
• determine which approach configuration provides an accurate mutation score with the lowest number of samples
  • the mutation score is accurate if, for 95% of the executions, the difference from the actual mutation score is below 5 percentage points
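The accuracy criterion can be written as a small check. This is a sketch with hypothetical numbers: `estimates` holds the mutation scores (in percent) obtained over the repeated executions of one sampling configuration, and `is_accurate` is an illustrative name.

```python
def is_accurate(estimates, actual, max_delta=5.0, required_fraction=0.95):
    """True if enough executions estimate the mutation score closely.

    A configuration is accurate when, for at least `required_fraction` of
    the executions, |estimate - actual| is below `max_delta` percentage
    points.
    """
    if not estimates:
        raise ValueError("at least one execution is required")
    within = sum(1 for e in estimates if abs(e - actual) < max_delta)
    return within / len(estimates) >= required_fraction


# Hypothetical data: 100 executions, actual mutation score of 65.36%.
good_runs = [66.0] * 96 + [72.0] * 4   # 96 of 100 within 5 points
poor_runs = [66.0] * 94 + [72.0] * 6   # only 94 of 100 within 5 points
print(is_accurate(good_runs, 65.36))   # True
print(is_accurate(poor_runs, 65.36))   # False
```

Among the accurate configurations, the one that samples the fewest mutants is preferred.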

RQ2. Results
LibNet                      | LibConf                     | LibUt                       | MLFS                        | ESAIL
Approach    # Mutants Delta | Approach    # Mutants Delta | Approach    # Mutants Delta | Approach    # Mutants Delta | Approach    # Mutants Delta
SAMPLE 0.01        50 13.64 | SAMPLE 0.01        40 12.19 | FIXED             100  7.32 | FIXED             100   7.8 | SAMPLE 0.01        36 14.04
SAMPLE 0.02       100 10.64 | SAMPLE 0.02        79 10.03 | SAMPLE 0.01       146  7.54 | FIXED             200   5.2 | SAMPLE 0.02        71 11.84
FIXED             100  9.88 | FIXED             100 10.17 | FIXED             200  5.73 | SAMPLE 0.01       214   4.9 | FIXED             100  8.89
SAMPLE 0.03       150 10.36 | SAMPLE 0.03       118   7.7 | SAMPLE 0.02       292  6.15 | FSCI 0.1          248  4.56 | SAMPLE 0.03       107  8.84
SAMPLE 0.04       200   6.4 | SAMPLE 0.04       158  6.46 | FIXED             300  5.37 | FIXED             300  4.04 | SAMPLE 0.04       142  7.92
FIXED             200  5.86 | SAMPLE 0.05       197  6.98 | FSCI 0.1          333  4.73 | FSCI 0.09         302  4.64 | SAMPLE 0.05       177  6.39
SAMPLE 0.05       250  7.07 | FIXED             200  6.12 | FIXED             400  4.45 | FSCI 0.08         379  4.01 | FIXED             200  6.14
SAMPLE 0.06       299  5.95 | SAMPLE 0.06       236  5.78 | FSCI 0.09         409  4.08 | FIXED             400   3.8 | SAMPLE 0.06       213  6.25
FIXED             300  4.48 | SAMPLE 0.07       276  5.73 | SAMPLE 0.03       438   4.2 | SAMPLE 0.02       428  3.19 | SAMPLE 0.07       248   5.2
SAMPLE 0.07       349  5.01 | FIXED             300  5.88 | FIXED             500   3.8 | SAMPLE 0.03       642  3.15 | SAMPLE 0.08       283  5.68
FSCI 0.1          364  4.42 | SAMPLE 0.08       315  5.01 | FSCI 0.08         514  3.94 | SAMPLE 0.04       855  2.53 | FIXED             300  5.53
SAMPLE 0.08       399  5.12 | FSCI 0.1          346  4.26 | SAMPLE 0.04       583  3.28 | SAMPLE 0.05      1069  2.58 | SAMPLE 0.09       319  4.55
FIXED             400  5.49 | SAMPLE 0.09       354  3.48 | FIXED             600  3.29 | SAMPLE 0.06      1283  2.24 | FSCI 0.1          366  3.92
FSCI 0.09         447   3.7 | SAMPLE 0.1        394  4.36 | FSCI 0.07         668  3.99 | SAMPLE 0.07      1497  2.24 | SAMPLE 0.1        354  5.26
SAMPLE 0.09       449  4.53 | FIXED             400  4.27 | FIXED             700   3.3 | SAMPLE 0.08      1710  1.71 | FIXED             400  4.52
SAMPLE 0.1        499  4.61 | FSCI 0.09         425  3.79 | SAMPLE 0.05       729  3.11 | SAMPLE 0.09      1924  1.73 | FSCI 0.09         449  3.66
FIXED             500  3.85 | FIXED             500  3.63 | SAMPLE 0.06       875  2.92 | SAMPLE 0.1       2138  1.55 | FIXED             500  4.08

RQ4. How do test suite optimization strategies speed up the mutation analysis process?
• Non-optimized mutation analysis
  • for every mutant, execute all the test cases that cover the mutant
• MASS includes two optimizations
  • FSCI-based sampling
  • Prioritize and Reduce
• Three possible MASS configurations
  • execute all mutants and apply Prioritize and Reduce
  • rely only on FSCI-based sampling
  • FSCI-based sampling and Prioritize and Reduce
• Metric: the time savings with respect to non-optimized mutation analysis
  • consider 100 executions
[Figure: test cases executed for each mutant under each configuration]
Non-optimized:
  M1: T1 T2 T3 T4 T9
  M2: T7 T9
  M3: T4 T5 T8
  M4: T10 T12 T13 T15 T8
FSCI-based sampling only:
  M1: T1 T2 T3 T4 T9
  M3: T4 T5 T8
Prioritize and Reduce on all mutants:
  M1: T9 T3
  M2: T9 T7
  M3: T5 T8 T4
  M4: T15 T13 T8 T15
FSCI-based sampling and Prioritize and Reduce:
  M1: T9 T3
  M3: T5 T8 T4

RQ4. Results
• MASS makes mutation analysis feasible for large software
• ESAIL from 11,000 hours to 1,531 (12 hours with 100 HPC nodes)
Non-optimized execution
Subject   Execution time (hours)
ESAILS    11,000
LIBNet    70
LIBPar    13
LIBUtil   59
MLFS      47
[Figure: results for the three MASS configurations: execute all mutants and apply Prioritize and Reduce; rely on FSCI-based sampling; FSCI-based sampling and Prioritize and Reduce]

RQ6. How do the MASS and traditional mutation scores compare?
• Difference depends on the equivalent mutants being discarded
• Except for MLFS, scores reflect imperfections in SiL test suites
  • several features covered with HiL testing
  • lack of coverage for exceptional/unusual cases
• MLFS achieves MC/DC
  • however, deletion operators show limitations in the test input partitions covered
Mutation score
Subject   Traditional   MASS
ESAILS    65.36         65.95
LIBNet    65.64         70.92
LIBPar    69.12         85.95
LIBUtil   71.20         84.41
MLFS      81.80         93.49
Average   70.62         81.14

How to ensure thorough testing in Space CPS?
Results confirm the scalability and effectiveness of MASS, in particular
(1) accuracy and time savings introduced by FSCI-based sampling
(2) usefulness of mutation analysis to discover test suite pitfalls
MASS: Mutation Analysis for Space Software
https://faqas.uni.lu/
