Search-Based Testing for Formal
Software Verification and Vice Versa
Shiva Nejati
snejati@uottawa.ca
@ShivaNejati
School of Electrical Engineering and Computer Science, University of Ottawa and
SnT Centre, University of Luxembourg
Search-Based Software Testing
W. Miller and D. L. Spooner, "Automatic Generation of Floating-Point Test Data," IEEE TSE, SE-2(3): 223-226, 1976.
Bogdan Korel, "Automated Software Test Data Generation," IEEE TSE, 16(8): 870-879, 1990.
SBSE and SBST
Problem Domains: http://crestweb.cs.ucl.ac.uk/resources/sbse_repository/
SBST Applications
• Applied to various categories of software testing:
• Unit testing
• System testing
• Regression testing
• Model-based testing
• …
SBST Strengths
• Scalable
• Can be parallelized easily
• Versatile
• Make few assumptions about the structure of their inputs
• Flexible and adaptable
• Can be combined with other methods: Constraint Solving, Machine Learning, etc.
• Simple!
But When Can We Not Use Search?
Verification
• Establishing properties of programs by mathematical proofs (static verification)
• Demonstrating correctness of all system usages
vs Testing
• Checking the system for a set of normal and boundary usages
Classical versus Stochastic
Classical Optimisation: building solutions incrementally (or recursively) following a (semi-)deterministic algorithm.
vs
Stochastic Optimisation: sampling solutions in a randomised way and checking if a desired solution is found.
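To make the contrast concrete, here is a toy sketch (invented for this write-up, not from the talk) that solves the same problem both ways: a classical, deterministic bisection and a stochastic random search for a root of the same function.

```python
import random

def classical_bisection(f, lo, hi, tol=1e-6):
    """Classical optimisation style: build the solution incrementally
    with a deterministic algorithm (here, bisection for a root of f)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:   # root lies in the left half
            hi = mid
        else:                     # root lies in the right half
            lo = mid
    return (lo + hi) / 2.0

def stochastic_search(f, lo, hi, tol=1e-3, budget=100000, seed=0):
    """Stochastic optimisation style: sample candidate solutions at
    random and keep the best one found within the budget."""
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        x = rng.uniform(lo, hi)
        if best is None or abs(f(x)) < abs(f(best)):
            best = x
        if abs(f(best)) < tol:    # desired solution found
            break
    return best

f = lambda x: x * x - 2.0         # root at sqrt(2)
print(classical_bisection(f, 0, 2))
print(stochastic_search(f, 0, 2))
```

The classical variant is guaranteed to converge and gives a precision bound; the stochastic one makes no structural assumptions about f but offers only a probabilistic notion of success.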
"Program testing can be used to show the presence of bugs, but never their absence." (Edsger W. Dijkstra)
Dichotomy Between Testing and Verification
• "As a programmer, even if you would like to have correctness, you might find yourself spending most of your time reasoning about incorrectness." (Peter W. O'Hearn)
• Testing and verification almost mean the same for practitioners
PhD in Formal Methods → PostDoc in Empirical SE
"GA! No way! How can a randomized search algorithm solve this problem?"
But is there a way to combine or compare both types of optimisations?
In This Talk, …
Testing and/or Verification of Models of Cyber-Physical Systems
Claudio Menghi, Khouloud Gaaloul, Lionel Briand
The SatEx case study
• R1: The angular velocity of the satellite shall always be lower than 1.5 m/s
• R4: The satellite attitude shall reach close to its target value within 2000 s
The CPS development workflow
PHASE 1: Modeling. The system is captured as a (Simulink) Model.
PHASE 2: Verification. The Model is checked against the Requirements (inputs are fed to the Model and its outputs are checked).
PHASE 3: Coding. The Model is translated into Source code.
The CPS development workflow: Modeling (Simulink) → Verification/Testing → Coding
[Diagram: a Model and its Requirements can be analysed by Model Checking or by Model Testing]
Comparing Model Checking and Model Testing
Nejati, S., Gaaloul, K., Menghi, C., Briand, L.C., Foster, S., Wolfe, D. "Evaluating model testing and model checking for finding requirements violations in Simulink models." In: Proceedings of ESEC/SIGSOFT FSE 2019, Estonia, August 26-30, 2019, pp. 1015-1025. ACM (2019)
Model Checking: takes Simulink Models and Natural Language Requirements (formalised as Logical Properties); its outcome is one of: model proven to be correct, failure found, or no result.
Model Testing: takes Simulink Models, Natural Language Requirements (encoded as Fitness Functions), and ranges of test input variables; its outcome is either failure found or no failure found.
Simulink Model Checker
• QVTrace from QRA Corp, Canada
• SMT-based model checker for Simulink, built on Z3 and Mathematica
https://qracorp.com/qvtrace/
QVTrace
From the QVtrace User Manual (v0.11.7, qracorp.com):
QVtrace has been designed to optimize the workflow for model-based design analysis. Analysis in QVtrace can be approached in two ways:
a) By formally translating sets of requirements specifications and verifying the model meets these, or
b) As an interactive querying process where the domain expert iteratively queries the model for expected behaviour as the system components are modelled.
Analysis will always be done on all constraints present in the Constraints Window and can be run from any subsystem in the model. It is important to note that the analysis will always check the entire model against all constraints present, and not just the subsystem being shown in the Design Navigation Window.
When running analysis, the constraints will first be verified to ensure these are consistent with the QCT language syntax (see Section 5 for a guide to the QCT language syntax). For example, writing “param_1 == 5” where param_1 is a boolean variable will return an error message stating that the constraint is inappropriately written, and no analysis will be run on the model.
Interpreting QVtrace analysis results: “No violations exist” implies that the model is consistent with the stated constraints for all possible input values, and at all times; the Results tab will turn green when no violations exist.
Model Testing
(Falsification-based Testing)
• Uses meta-heuristic search
• Search guidance: fitness functions estimating how far a
candidate test is from violating a requirement
• Search heuristics: random, hill climbing, simulated
annealing, genetic algorithm, etc.
Search-based automated testing of continuous controllers: Framework, tool support, and case studies
Reza Matinnejad, Shiva Nejati, Lionel C. Briand, Thomas Bruckmann, and Claude Poull
Information & Software Technology, 2015
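A minimal sketch of the falsification idea (everything here is invented for illustration: the "simulation" is a cheap formula, the requirement bound is made up, and the search is plain hill climbing over a single input):

```python
import random

def simulate(throttle):
    """Hypothetical model under test: peak altitude error for a constant
    throttle input (a stand-in for running a Simulink simulation)."""
    return 20.0 + 0.002 * (throttle - 80.0) ** 2

def fitness(throttle, limit=30.0):
    """Distance to violating the requirement 'peak error < limit';
    zero or negative means the test input reveals a violation."""
    return limit - simulate(throttle)

def hill_climb(lo, hi, iters=1000, seed=1):
    rng = random.Random(seed)
    x = rng.uniform(lo, hi)
    best = fitness(x)
    for _ in range(iters):
        cand = min(hi, max(lo, x + rng.gauss(0, (hi - lo) / 20)))
        f = fitness(cand)
        if f < best:               # lower fitness = closer to a violation
            x, best = cand, f
        if best <= 0:              # requirement falsified: failing test found
            break
    return x, best

x, f = hill_climb(0.0, 100.0)
print("violating input" if f <= 0 else "no violation found", x)
```

The fitness function turns the requirement into a search landscape: the search never proves the requirement, it only drives test inputs toward a violation.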
CPS Models
• 11 Models
• Open-loop vs Feedback-loop
• State-machines
• Continuous-Dynamics (Dynamical Systems)
• Non-linear dynamics
• Machine Learning components
Results — Fault Finding
Out of 92 requirements, Model Testing found 40 violations; Model Checking proved 41 and found 23 violations.
• MT and MC together could show that 41 requirements are correct and 40 requirements are violated
• Only 11 requirements remain inconclusive
Results — Fault Finding
Timeline of a simulation (0 to 500s): BMC can analyse up to 500 steps (50s); Testing found errors after 2000 steps.
Results — Time
Testing, violations: 5.8 min (MAX = 18.5 min, MIN = 3 min)
Model Checking, proven: 0.6 s (MAX = 1.9 s, MIN = 0.06 s)
Model Checking, violations: 2.2 s (MAX = 10.1 s, MIN = 0.12 s)
Model Checking, inconclusive: 15 min to several hours
Lessons Learned
• L1: Model Checking fails to analyse some CPS models (Autopilot)
• This is a major obstacle to the adoption of QVTrace by CPS suppliers, as confirmed by QRA
Lessons Learned
• L2: Model Checking is less effective than Model Testing in finding requirements failures
• Model Checking found 23 requirements violations
• Model Testing found 40 requirements violations
Lessons Learned
• L3: Model Checking executes considerably faster than Model Testing when it can prove or refute requirements
• Model Checking was able to prove 41 requirements and find violations in 23 requirements within a few seconds
[Diagram: Model + Requirements → Model Testing]
Using SBST to automatically generate test inputs that reveal requirement violations
Scaling Model Testing to Complex Compute-intensive Models
Menghi, C., Nejati, S., Briand, L.C., Parache, Y.I. "Approximation-refinement testing of compute-intensive cyber-physical models: An approach based on system identification." In: International Conference on Software Engineering (ICSE 2020)
Challenge
• Industrial models of CPS are often compute-intensive
• Compute-intensive models require hours to complete a single simulation of the model under test (MUT)
• A simulation of the satellite model requires ~1.5 hours
Satellite model provided by LuxSpace (https://luxspace.lu/)
Scaling Model Checking
E. Clarke, O. Grumberg, S. Jha, Y. Lu, H. Veith. "Counterexample-Guided Abstraction Refinement." CAV 2000: 154-169
CEGAR
1. Abstraction (via Abstract Interpretation): build an Abstract Model from the Model
2. Model Check the Abstract Model: either No Bug (done), or a Bug is reported
3. Simulate the reported bug on the Model: either it is a Real Bug (done), or a Spurious Bug
4. Refinement: use the spurious bug to build a Refined Abstract Model, and repeat from step 2
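The CEGAR loop can be sketched on a toy reachability problem. Everything below is an illustrative stand-in, not a real model checker: the "program" is a tiny transition system, and refinement is a crude uniform block split rather than true counterexample-guided splitting.

```python
# Toy concrete system: states 0..7, transition i -> i+2, start 0, bad state 5.
# Only even states are reachable, so there is no real bug.
STATES = range(8)
INIT, BAD = 0, 5

def step(s):
    return [s + 2] if s + 2 < 8 else []

def abstract_reach(blocks):
    """'Model check' the abstract model: return an abstract path from the
    initial block to a block containing BAD, or None if none exists."""
    def blk(s):
        return next(i for i, b in enumerate(blocks) if s in b)
    trans = {(blk(s), blk(t)) for s in STATES for t in step(s)}
    frontier, parent = [blk(INIT)], {blk(INIT): None}
    while frontier:
        a = frontier.pop(0)
        if BAD in blocks[a]:
            path = []
            while a is not None:
                path.append(a)
                a = parent[a]
            return path[::-1]
        for x, y in trans:
            if x == a and y not in parent:
                parent[y] = a
                frontier.append(y)
    return None

def simulate(blocks, apath):
    """Replay the abstract counterexample concretely; True iff it is real."""
    concrete = {INIT} & blocks[apath[0]]
    for a in apath[1:]:
        concrete = {t for s in concrete for t in step(s)} & blocks[a]
        if not concrete:
            return False                    # spurious counterexample
    return BAD in concrete

def cegar():
    blocks = [set(range(0, 4)), set(range(4, 8))]   # coarse initial abstraction
    while True:
        apath = abstract_reach(blocks)
        if apath is None:
            return "no bug", blocks
        if simulate(blocks, apath):
            return "real bug", blocks
        # Refinement: split every block in half (a crude stand-in for true
        # counterexample-guided splitting)
        blocks = [half for b in blocks
                  for half in (set(sorted(b)[:len(b) // 2]),
                               set(sorted(b)[len(b) // 2:])) if half]

print(cegar())
```

The loop alternates exactly as in the slide: abstract, check, simulate the counterexample, and refine until either a real bug is confirmed or the abstraction is precise enough to prove its absence.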
AppRoxImation-based TEst generatiOn (ARIsTEO)
ARIsTEO mirrors the CEGAR loop, replacing abstraction with approximation and model checking with SBST:
1. Approximation: build an Abstract Model of the Model using Machine Learning (system identification)
2. Run SBST on the Abstract Model until it reports a Bug
3. Simulate the reported bug on the Model: either it is a Real Bug (done), or a Spurious Bug
4. Refinement: use the spurious bug to build a Refined Abstract Model, and repeat from step 2
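A minimal sketch of the approximation-refinement idea. Everything here is invented for illustration: the "expensive" simulator is a cheap quadratic, system identification is a least-squares quadratic fit, and the SBST step is plain random search on the surrogate.

```python
import random

def real_sim(x):
    """Stand-in for a compute-intensive simulation of the model under
    test (imagine each call takes ~1.5 hours)."""
    return 1.0 + 0.5 * x - 0.1 * (x - 6) ** 2

def fit_quadratic(pts):
    """Least-squares fit of y = a + b*x + c*x^2 (a toy 'system
    identification' step) via the normal equations."""
    sx = [sum(x ** k for x, _ in pts) for k in range(5)]
    sy = [sum(y * x ** k for x, y in pts) for k in range(3)]
    A = [[sx[i + j] for j in range(3)] + [sy[i]] for i in range(3)]
    for i in range(3):                       # Gauss-Jordan elimination
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(3):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [u - f * v for u, v in zip(A[r], A[i])]
    a, b, c = (A[i][3] / A[i][i] for i in range(3))
    return lambda x: a + b * x + c * x * x

def aristeo_like(lo=0.0, hi=10.0, limit=4.5, rounds=10, seed=3):
    rng = random.Random(seed)
    # seed the surrogate with a few expensive simulations
    pts = [(x, real_sim(x)) for x in (lo, (lo + hi) / 2, hi)]
    for _ in range(rounds):
        approx = fit_quadratic(pts)          # approximation / refinement
        # SBST stand-in: cheap random search for a violation on the surrogate
        cand = max((rng.uniform(lo, hi) for _ in range(5000)), key=approx)
        y = real_sim(cand)                   # one expensive simulation
        if y > limit:
            return cand, y                   # real violation of 'output < limit'
        pts.append((cand, y))                # spurious: refine with new data
    return None

print(aristeo_like())
```

Almost all search effort is spent on the cheap surrogate; the expensive model is simulated only once per round, either confirming a real violation or supplying a fresh data point to refine the approximation.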
Evaluation: Effectiveness and Efficiency
• RQ1. How effective is ARIsTEO in generating tests that reveal requirements violations?
• RQ2. How efficient is ARIsTEO in generating tests revealing requirements violations?
RQ1 and RQ2 - Effectiveness and Efficiency
RQ1: On average, ARIsTEO detects 23.9% more requirements violations than S-Taliro (min = -8%, max = 95%).
RQ2: ARIsTEO is on average 31.3% (min = -1.6%, max = 85.2%) more efficient than S-Taliro.
RQ3 - Practical Usefulness
• RQ3. How applicable and useful is ARIsTEO in generating tests revealing requirements violations for industrial CI-CPS models?
RQ3 - Practical Usefulness
RQ3: On an industrial CI-CPS model, ARIsTEO detected, in practical time, requirement violations that S-Taliro could not find.
Missing Assumptions on Inputs
Req: When the autopilot is enabled, the aircraft altitude should reach the desired altitude within 500 seconds in calm air.
Assumption: The pilot should apply sufficient throttle force.
[Diagram: the autopilot model (inputs Yaw, Roll, Pitch); adding the assumption Throttle > c to Req makes the requirement hold]
Mining Assumptions using Search and Decision Trees
Gaaloul, K., Menghi, C., Nejati, S., Briand, L., Wolfe, D. "Mining assumptions for software components using machine learning." In: Proceedings of ESEC/SIGSOFT FSE 2020. ACM (2020)
The assumption mining pipeline, on the autopilot model (inputs Yaw, Roll, Pitch) and requirement Req:
1. SBST generates a Test Suite; an Oracle labels each test input pass/fail (e.g., test inputs Throttle = 20, 0.4, -3.6, 100, each labelled P or F)
2. Machine Learning infers a candidate assumption (e.g., Throttle > c) from the labelled test inputs
3. Model Checking validates the candidate assumption
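The pipeline can be sketched end to end on one feature. Everything below is invented for illustration: the oracle's ground truth, the input ranges, and the single-threshold learner (a one-feature stand-in for decision-tree learning).

```python
import random

def system_passes(throttle):
    """Hypothetical pass/fail oracle for the requirement: here the
    (invented) ground truth is that Req holds iff Throttle > 12.5."""
    return throttle > 12.5

def mine_threshold(tests, labels):
    """Learn an assumption of the form 'Throttle > c': pick the cut
    maximising agreement with the pass/fail labels."""
    xs = sorted(set(tests))
    cuts = [xs[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    def agreement(c):
        return sum((t > c) == p for t, p in zip(tests, labels))
    return max(cuts, key=agreement)

rng = random.Random(4)
tests = [rng.uniform(0.0, 100.0) for _ in range(500)]   # SBST stand-in
labels = [system_passes(t) for t in tests]              # oracle verdicts
c = mine_threshold(tests, labels)
print("mined assumption: Throttle >", round(c, 2))
```

In the real approach the mined candidate would then be handed to a model checker to confirm that the model satisfies Req under the assumption.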
Decision tree learning over the labelled test inputs (e.g., Throttle = 20, 0.4, -3.6, 100, each labelled P or F) yields an assumption of the form C1 ∨ C2 ∨ … ∨ Cn, where each Ci is a conjunction of simple predicates collected along a path to a pass leaf, e.g., Throttle > 0.5 ∧ pitchwheel > 10.
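Extracting the disjunction from a learned tree amounts to enumerating root-to-pass-leaf paths. A minimal sketch, with a hand-built tree standing in for a learned one (the feature names and thresholds are illustrative only):

```python
# A tiny hand-built decision tree: ('feature', threshold, left<=, right>);
# leaves are 'P' (pass) or 'F' (fail). A real tree would be learned from
# the labelled test inputs.
TREE = ('Throttle', 0.5,
        'F',
        ('pitchwheel', 10, 'F', 'P'))

def pass_paths(node, path=()):
    """Collect one conjunction of predicates per path ending in a P leaf;
    the disjunction C1 v ... v Cn of these is the mined assumption."""
    if node == 'P':
        return [' ∧ '.join(path) or 'true']
    if node == 'F':
        return []
    feat, thr, left, right = node
    return (pass_paths(left, path + (f'{feat} <= {thr}',)) +
            pass_paths(right, path + (f'{feat} > {thr}',)))

assumption = ' ∨ '.join(pass_paths(TREE))
print(assumption)
```

Because each tree split is a single comparison of one variable against a constant, each Ci is by construction a conjunction of simple predicates.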
Genetic Programming over the same labelled test inputs evolves expression trees (e.g., ∧ at the root over the comparisons x × y < 5 and (x − z) ≥ 2, built from ×, −, variables, and constants), yielding complex linear and nonlinear formulas as assumptions.
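A minimal GP sketch over such predicate trees. Everything here is invented for illustration: the oracle's ground truth is the slide's formula, the grammar is tiny, and the loop is mutation-only with elitism rather than a full GP with crossover.

```python
import random

VARS = ('x', 'y', 'z')

def evaluate(node, env):
    """Evaluate a predicate/arithmetic expression tree on one test input."""
    if isinstance(node, str):
        return env[node]                              # variable
    if isinstance(node, float):
        return node                                   # constant
    op, a, b = node
    if op == 'and':
        return evaluate(a, env) and evaluate(b, env)
    if op == 'lt':
        return evaluate(a, env) < evaluate(b, env)
    if op == 'ge':
        return evaluate(a, env) >= evaluate(b, env)
    if op == 'mul':
        return evaluate(a, env) * evaluate(b, env)
    return evaluate(a, env) - evaluate(b, env)        # 'sub'

def rand_arith(rng, depth):
    if depth == 0 or rng.random() < 0.4:
        return rng.choice(VARS) if rng.random() < 0.7 else float(rng.randint(-5, 5))
    return (rng.choice(('mul', 'sub')),
            rand_arith(rng, depth - 1), rand_arith(rng, depth - 1))

def rand_pred(rng, depth=1):
    if depth > 0 and rng.random() < 0.3:
        return ('and', rand_pred(rng, depth - 1), rand_pred(rng, depth - 1))
    return (rng.choice(('lt', 'ge')), rand_arith(rng, 2), rand_arith(rng, 2))

def mutate(rng, pred):
    """Subtree mutation: regrow the whole predicate, a conjunct, or one arm."""
    if rng.random() < 0.3:
        return rand_pred(rng)
    if pred[0] == 'and':
        a, b = pred[1], pred[2]
        return ('and', mutate(rng, a), b) if rng.random() < 0.5 else ('and', a, mutate(rng, b))
    return ((pred[0], rand_arith(rng, 2), pred[2]) if rng.random() < 0.5
            else (pred[0], pred[1], rand_arith(rng, 2)))

def accuracy(pred, data):
    return sum(evaluate(pred, env) == label for env, label in data) / len(data)

def evolve(data, gens=30, pop_size=40, seed=5):
    """Minimal elitist, mutation-only GP. The two trivial seeds (always
    false / always true) guarantee at least majority-class accuracy."""
    rng = random.Random(seed)
    pop = [('lt', 1.0, 0.0), ('ge', 1.0, 0.0)]
    pop += [rand_pred(rng) for _ in range(pop_size - 2)]
    for _ in range(gens):
        pop.sort(key=lambda p: -accuracy(p, data))
        elite = pop[:pop_size // 4]
        pop = elite + [mutate(rng, rng.choice(elite)) for _ in range(pop_size - len(elite))]
    return max(pop, key=lambda p: accuracy(p, data))

# Labelled test inputs; the (invented) oracle is the slide's formula.
rng = random.Random(6)
envs = [{v: rng.uniform(-3, 3) for v in VARS} for _ in range(200)]
data = [(e, e['x'] * e['y'] < 5 and e['x'] - e['z'] >= 2) for e in envs]
best = evolve(data)
print(best, accuracy(best, data))
```

Unlike the decision-tree learner, the grammar here lets the search build products and differences of variables, which is what allows assumptions beyond single-variable thresholds.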
Conclusions
• Assumption generation is important for model debugging and compositional verification (a.k.a. assume-guarantee reasoning)
• Current inference techniques rely on automata theory and can generate only boolean assumptions or assumptions over predicates
• Applying decision tree learning to test data, we can generate assumptions that include arithmetic constraints over numeric variables
• Using genetic programming, we can even go beyond linear arithmetic constraints
Summary and Reflections
• Formal verification and testing (including SBST) have a common goal
• For most applications, formal verification fails to prove correctness and (like testing) can only show the presence of bugs
• SBST and ML may improve formal verification in scalability and applicability
• Systematic frameworks developed in the formal verification community may help improve and enhance SBST

SSBSE 2020 keynote
