Software Verification & Validation Lab (SVV), uni.lu
Achieving Scalability in Software Testing
with Machine Learning and
Metaheuristic Search
Lionel Briand
RAISE @ ICSE 2018
Scope
• The main challenge in testing software systems is
scalability
• Addressing scalability entails effective automation
• Lessons learned from industrial research collaborations:
satellite, automotive, finance, energy …
• Experiences from combining metaheuristic search,
machine learning, and other AI techniques to address
testing scalability
2
Scalability
• The extent to which a technique can be applied to large or
complex artifacts (e.g., input spaces, code, models) and still
provide useful, automated support with acceptable effort, CPU
time, and memory
3
Collaborative Research @ SnT
4
• Research in context
• Addresses actual needs
• Well-defined problem
• Long-term collaborations
• Our lab is the industry
SVV Dept.
5
• Established in 2012, part of the SnT centre
• Requirements Engineering, Security Analysis, Design Verification,
Automated Testing, Runtime Monitoring
• ~ 25 lab members
• Partnerships
• ERC Advanced grant
Outline
• Overview, problem definition
• Example research projects with industry partners:
• Testing advanced driver assistance systems
• Testing controllers (automotive)
• Stress testing critical task deadlines (Energy)
• Vulnerability testing (Banking)
• Reflections and lessons learned
6
Introduction
7
Software Testing
8
[Flowchart: derive test cases from the SW Representation (e.g., specifications);
execute them against the SW Code to get test results; a Test Oracle compares the
results with the expected results or properties, branching on
[Test Result == Oracle] vs. [Test Result != Oracle].]
Automation!
Search-Based Software Testing
• Express test generation problem
as a search problem
• Search for test input data with
certain properties, i.e.,
constraints
• Non-linearity of software (if,
loops, …): complex,
discontinuous, non-linear
search spaces (Baresel)
• Many search algorithms
(metaheuristics), from local
search to global search, e.g.,
Hill Climbing, Simulated
Annealing and Genetic
Algorithms
[Excerpt from McMinn's paper, clipped in extraction: Hill Climbing evaluates the
neighbourhood of the current candidate solution and moves to better candidates until
the neighbourhood offers no better ones, i.e., a local optimum; the search may then
benefit from a restart at a new position (Figure 3: climbing to a local optimum; the
final position may not be the global optimum (a), and restarts may be required (b)).
Simulated Annealing may also move to points of lower fitness, with a probability
governed by a decreasing 'temperature', to escape local optima; at 'freezing point'
it behaves like Hill Climbing (Figure 4: Simulated Annealing may temporarily move to
points of poorer fitness in the search space). Genetic Algorithms are global
searches, sampling many points in the fitness landscape at once (Figure 5).]
“Search-Based Software Testing: Past, Present and Future”
Phil McMinn
Genetic Algorithm
9
[Further clipped excerpt: the simplest form of optimization algorithm is random
search, in which inputs are generated at random until the goal (e.g., coverage of a
particular program branch) is fulfilled. Random search performs very poorly when the
required solutions occupy a very small portion of the overall search space
(Figure 2: random search may fail to fulfil low-probability test goals).
Metaheuristic searches instead guide the search with a problem-specific fitness
function that scores points in the search space by their suitability for solving
the problem.]
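To make the search formulation concrete, here is a minimal sketch of hill climbing with random restarts over a one-dimensional input domain; the quadratic fitness function is a hypothetical stand-in for a real test objective such as branch distance:

```python
import random

def fitness(x):
    # Hypothetical objective with a peak at x = 3. In SBST this would score
    # how close an input comes to the test goal (e.g., branch distance).
    return -(x - 3.0) ** 2

def hill_climb(lo, hi, restarts=10, steps=1000, step_size=0.1):
    best_x, best_f = None, float("-inf")
    for _ in range(restarts):                    # restarts help escape local optima
        x = random.uniform(lo, hi)
        for _ in range(steps):
            neighbour = min(hi, max(lo, x + random.uniform(-step_size, step_size)))
            if fitness(neighbour) > fitness(x):  # greedy move to a better neighbour
                x = neighbour
        if fitness(x) > best_f:
            best_x, best_f = x, fitness(x)
    return best_x, best_f

print(hill_climb(-10.0, 10.0))
```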
Testing Advanced Driver
Assistance Systems
10
Cyber-Physical Systems
• A system of collaborating computational elements controlling
physical entities
11
Advanced Driver Assistance
Systems (ADAS)
12
Automated Emergency Braking (AEB)
Pedestrian Protection (PP)
Lane Departure Warning (LDW)
Traffic Sign Recognition (TSR)
Automotive Environment
• Highly varied environments, e.g., road topology, weather, building
and pedestrians …
• Huge number of possible scenarios, e.g., determined by
trajectories of pedestrians and cars
• ADAS play an increasingly critical role
• A challenge for testing
13
Advanced Driver Assistance
Systems (ADAS)
Decisions are made over time based on sensor data
14
[Diagram: the ADAS senses the Environment through Sensors/Camera, the Controller
makes a Decision, and Actuators act back on the environment, closing the loop.]
A General and Fundamental Shift
• Increasingly so, it is easier to learn behavior from data using
machine learning, rather than specify and code
• Deep learning, reinforcement learning …
• Example: Neural networks (deep learning)
• Millions of weights learned
• No explicit code, no specifications
• Verification, testing?
15
CPS Development Process
16
Model-in-the-Loop Stage:
• System engineering modeling (SysML): architecture modelling (structure,
behavior, traceability)
• Functional modeling (continuous and discrete Simulink models): controllers,
plant, decision; model simulation and testing
• Analysis: model execution and testing, model-based testing, traceability and
change impact analysis, ...
Software-in-the-Loop Stage:
• (Partial) code generation; deployed executables on target platform
Hardware-in-the-Loop Stage:
• Hardware (sensors ...), analog simulators; testing (expensive)
Our Goal
• Developing an automated testing technique
for ADAS
18
• To help engineers efficiently and
effectively explore the complex test input
space of ADAS
• To identify critical (failure-revealing) test
scenarios
• To characterize the input conditions that
lead to the most critical situations, e.g.,
safety violations
Automated Emergency Braking
System (AEB)
19
[Diagram: the Vision (Camera) and Sensor components provide objects'
position/speed to the Decision making component, which issues a "brake-request"
to the Brake Controller when braking is needed to avoid collisions.]
Example Critical Situation
• “AEB properly detects a pedestrian in front of the car with a
high degree of certainty and applies braking, but an accident
still happens where the car hits the pedestrian with a
relatively high speed”
20
Testing ADAS
21
[On-road testing vs. simulation-based (model) testing, using a simulator based on
physical/mathematical models]
Testing via Physics-based
Simulation
22
ADAS
(SUT)
Simulator (Matlab/Simulink)
Model
(Matlab/Simulink)
▪ Physical plant (vehicle / sensors / actuators)
▪ Other cars
▪ Pedestrians
▪ Environment (weather / roads / traffic signs)
Test input
Test output
time-stamped output
AEB Domain Model
[UML class diagram, flattened in extraction; reconstructed elements:
- Test Scenario: simulationTime: Real, timeStep: Real
- Static inputs: Weather (visibility: VisibilityRange, fog: Boolean, fogColor:
FogColor; subclasses Normal, Rain (rainType: RainType), Snow (snowType:
SnowType); OCL constraint: self.fog = false implies self.visibility = "300" and
self.fogColor = None) and Road (frictionCoeff: Real; subclasses Straight, Ramped
(height: RampHeight), Curved (radius: CurvedRadius))
- Dynamic inputs: Vehicle (initial speed v0c: Real) and Pedestrian (initial
position x0p, y0p, speed v0p, orientation θ0p, all Real), both mobile objects
with a Position (x: Real, y: Real)
- Output: AEB Output (TTC: Real, certaintyOfDetection: Real, braking: Boolean)
and output functions F1, F2
- Enumerations: RainType {ModerateRain, HeavyRain, VeryHeavyRain, ExtremeRain};
SnowType {ModerateSnow, HeavySnow, VeryHeavySnow, ExtremeSnow}; FogColor
{DimGray, Gray, DarkGray, Silver, LightGray, None}; CurvedRadius (CR)
{5, 10, 15, 20, 25, 30, 35, 40}; RampHeight (RH) {4, 6, 8, 10, 12};
VisibilityRange {10, 20, ..., 300}]
ADAS Testing Challenges
• Test input space is large, complex and multidimensional
• Explaining failures and fault localization are difficult
• Execution of physics-based simulation models is computationally
expensive
24
Black-Box Search-based Testing
25
[Pipeline: Test input generation (NSGA-II): select best tests, generate new tests
→ (candidate) test inputs → Evaluating test inputs: simulate every (candidate)
test, compute fitness functions → fitness values fed back to the generator.
Inputs: input data ranges/dependencies + simulator + fitness functions defined
based on oracles.
Output: test cases revealing worst-case system behaviors.]
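A minimal sketch of this loop (a simplified evolutionary search with Pareto selection standing in for NSGA-II; the scenario encoding and the `simulate` stub are invented for illustration, with the real fitness values coming from the physics-based simulator):

```python
import random

def simulate(scenario):
    # Hypothetical simulator stub: returns objectives to MINIMIZE,
    # e.g., (time-to-collision, distance between car and pedestrian).
    v_ped, theta_ped, v_car, fog = scenario
    ttc = abs(v_car - v_ped) * 0.1 + fog            # toy formulas, illustration only
    dist = abs(theta_ped - 180) * 0.01 + fog
    return (ttc, dist)

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def mutate(s):
    v_ped, theta_ped, v_car, fog = s
    return (min(max(v_ped + random.gauss(0, 1), 0), 18),
            (theta_ped + random.gauss(0, 10)) % 360,
            min(max(v_car + random.gauss(0, 2), 0), 90),
            min(max(fog + random.gauss(0, 0.1), 0), 1))

pop = [(random.uniform(0, 18), random.uniform(0, 360),
        random.uniform(0, 90), random.random()) for _ in range(20)]
for _ in range(50):
    fits = {s: simulate(s) for s in pop}
    # Pareto selection: keep the non-dominated scenarios (current worst cases)
    front = [s for s in pop if not any(dominates(fits[t], fits[s]) for t in pop)]
    pop = front + [mutate(random.choice(front)) for _ in range(20 - len(front))]
print(sorted(simulate(s) for s in pop)[:5])
```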
Search: Genetic Evolution
26
[Loop: Initial input → Fitness computation → Selection → Breeding → back to
Fitness computation]
Better Guidance
• Fitness computations rely on simulations and are very
expensive
• Search needs better guidance
27
Decision Trees
28
Partition the input space into homogeneous regions
[Decision-tree figure, flattened in extraction. Root: all 1200 points, 79%
"non-critical" / 21% "critical". Successive splits on pedestrian speed
(v0p >= 7.2 km/h vs. v0p < 7.2 km/h), pedestrian orientation (θ0p < 218.6 vs.
θ0p >= 218.6), and road topology (RoadTopology(CR = 5, Straight, RH = [4 12](m))
vs. RoadTopology(CR = [10 40](m))) partition the points (counts 564/636, then
412/152, then 230/182) into increasingly homogeneous regions, from a leaf with
98% "non-critical" / 2% "critical" to one with 31% "non-critical" / 69%
"critical".]
Genetic Evolution Guided by
Classification
29
[Loop: Initial input → Fitness computation → Classification → Selection →
Breeding → back to Fitness computation]
Search Guided by Classification
30
[Pipeline: Test input generation (NSGA-II): build a classification tree,
select/generate tests in the fittest regions, apply genetic operators →
(candidate) test inputs → Evaluating test inputs: simulate every (candidate)
test, compute fitness functions → fitness values fed back to the generator.
Inputs: input data ranges/dependencies + simulator + fitness functions defined
based on oracles.
Output: test cases revealing worst-case system behaviors + a characterization of
critical input regions.]
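The resulting NSGAII-DT loop can be sketched as follows; here a nearest-to-best test stands in for the decision tree and a one-dimensional `simulate` stub replaces the simulator, so this only illustrates how classification focuses the genetic operators:

```python
import random

def simulate(x):                 # hypothetical simulator stub: smaller = more critical
    return abs(x - 7.0) + random.random() * 0.1

def in_fit_region(x, pop_with_fit):
    # Stand-in for the decision tree: treat the neighbourhood of the current
    # best individuals as the "critical" region to focus on.
    best = sorted(pop_with_fit, key=lambda p: p[1])[:5]
    return any(abs(x - b[0]) < 1.0 for b in best)

pop = [random.uniform(0, 20) for _ in range(20)]
for generation in range(10):
    pop_with_fit = [(x, simulate(x)) for x in pop]        # the expensive step
    offspring = []
    while len(offspring) < 20:
        parent = min(random.sample(pop_with_fit, 3), key=lambda p: p[1])[0]
        child = parent + random.gauss(0, 1.0)
        # Keep offspring mostly inside the fit regions, so fewer simulations
        # are wasted on uninteresting parts of the space (small escape chance).
        if in_fit_region(child, pop_with_fit) or random.random() < 0.1:
            offspring.append(child)
    pop = offspring
print(min((simulate(x), x) for x in pop))
```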
NSGAII-DT vs. NSGAII
31
NSGAII-DT outperforms NSGAII
[Plots: the HV, GD, and SP quality indicators over search time (6-24 h),
comparing NSGAII-DT and NSGAII.]
Testing Controllers
32
Dynamic Continuous Controllers
33
MiL Test Cases
34
Model Simulation: Input Signals → Output Signal(s)
[Figure: two test cases (Test Case 1, Test Case 2), each defined by three input
signals S1, S2, S3 over time t, simulated on the model to produce the output
signal(s).]
• Supercharger bypass flap controller
✓ Flap position is bounded within [0..1]
✓ Implemented in MATLAB/Simulink
✓ 34 (sub-)blocks decomposed into 6 abstraction levels
[Photos: the supercharger bypass flap at Flap position = 0 (open) and
Flap position = 1 (closed)]
Simple Example
35
[Figure: the test input is a step in the Desired Value from an Initial Desired
Value to a Final Desired Value at time T/2; the test output plots the Desired
Value against the Actual Value over [0, T]. Block diagram: the desired value
minus the actual value forms the error fed to the Controller (SUT), which drives
the Plant Model; the plant's system output is fed back as the actual value.]
MiL Testing of Controllers
36
Configurable Controllers at MIL
[Block diagram: a configurable PID controller in closed loop with the Plant
Model. The error $e(t) = desired(t) - actual(t)$ feeds the P, I, and D terms,
whose sum is the control signal:
$$output(t) = K_P\,e(t) + K_I \int e(t)\,dt + K_D \frac{de(t)}{dt}$$
Time-dependent variables: $desired(t)$, $actual(t)$, $e(t)$, $output(t)$.
Configuration parameters: the gains $K_P$, $K_I$, $K_D$.]
37
Requirements and Test Objectives
[Figure: the Desired Value (input) steps from InitialDesired (ID) to
FinalDesired (FD) at T/2; the Actual Value (output) tracks it over [0, T]. The
three test objectives are annotated: Smoothness, Responsiveness, Stability.]
38
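To make these objectives concrete, here is a minimal sketch that simulates a step response on a toy first-order plant (standing in for the real Simulink model) and computes one plausible formalization of the three objectives; the metric definitions are illustrative assumptions, not necessarily the exact ones used in the work:

```python
import numpy as np

def simulate_pid(kp, ki, kd, ID=0.0, FD=1.0, T=2.0, dt=0.001):
    # Toy closed loop: PID controller driving a first-order plant (illustration).
    n = int(T / dt)
    desired = np.where(np.arange(n) * dt < T / 2, ID, FD)
    actual = np.zeros(n)
    integ, prev_e = 0.0, 0.0
    for i in range(1, n):
        e = desired[i] - actual[i - 1]
        integ += e * dt
        u = kp * e + ki * integ + kd * (e - prev_e) / dt
        prev_e = e
        actual[i] = actual[i - 1] + dt * (u - actual[i - 1])   # first-order plant
    return desired, actual, dt

desired, actual, dt = simulate_pid(2.0, 1.0, 0.05)
step = len(actual) // 2                       # index where FD is applied
FD = desired[-1]
smoothness = max(0.0, actual[step:].max() - FD)           # worst overshoot
resp_idx = np.argmax(np.abs(actual[step:] - FD) < 0.05)   # first time within 5%
responsiveness = resp_idx * dt
stability = actual[-int(0.2 * len(actual)):].std()        # late-phase oscillation
print(smoothness, responsiveness, stability)
```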
A Search-Based Test Approach
[Heatmap over the Initial Desired (ID) × Final Desired (FD) input space: where
are the worst case(s)?]
• Search directed by model
execution feedback
• Controller’s dynamic behavior
can be complex
• Meta-heuristic search in (large)
input space: Finding worst case
inputs
• Possible because of automated
oracle (feedback loop)
• Different worst cases for
different requirements
39
Initial Solution
[Workflow: (1) Exploration: the controller-plant model and Objective Functions
based on Requirements produce a HeatMap Diagram and a List of Critical Regions,
reviewed by a Domain Expert; (2) Single-State Search within those regions yields
Worst-Case Scenarios. Inset plot: Desired Value vs. Actual Value over time, with
the Initial Desired and Final Desired levels marked.]
40
Results
• We found much worse scenarios during MiL testing than our
partner had found so far
• These scenarios are also run at the HiL level, where testing is
much more expensive: MiL results => test selection for HiL
• But further research was needed:
• Simulations are expensive
• Configuration parameters
41
Final Solution
[Workflow: (1) Exploration with Dimensionality Reduction: Elementary Effect
Analysis identifies the significant variables, and the 8-dimensional space is
visualized using regression trees, yielding a List of Critical Partitions
reviewed by a Domain Expert; (2) Search with Surrogate Modeling on the
Controller Model (Simulink), where a neural network predicts the fitness
function to speed up the search, guided by the Objective Functions, yielding
Worst-Case Scenarios.]
42
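For the dimensionality-reduction step, a minimal sketch of Elementary Effect (Morris-style) screening in a simplified one-at-a-time variant; the 8-input objective function is invented for illustration:

```python
import numpy as np

def objective(x):
    # Hypothetical fitness over 8 normalized inputs: only x[0] and x[3] matter here.
    return 3.0 * x[0] ** 2 + 2.0 * x[3] + 0.01 * x[1:].sum()

rng = np.random.default_rng(1)
dim, trajectories, delta = 8, 50, 0.1
effects = np.zeros((trajectories, dim))
for t in range(trajectories):
    x = rng.uniform(0, 1 - delta, size=dim)
    base = objective(x)
    for i in range(dim):                        # one-at-a-time perturbation
        xp = x.copy()
        xp[i] += delta
        effects[t, i] = (objective(xp) - base) / delta
mu_star = np.abs(effects).mean(axis=0)          # mean |elementary effect| per input
print(np.argsort(mu_star)[::-1])                # inputs ranked by influence
```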
Regression Tree
[Regression-tree figure, flattened in extraction; structure reconstructed from
the listed counts, means, and standard deviations (the weighted means are
consistent with this grouping):
- All Points: count 1000, mean 0.007822, std dev 0.0049497
  - FD < 0.43306: count 574, mean 0.0059513, std dev 0.0040003
    - ID < 0.64679: count 373, mean 0.0047594, std dev 0.0034346
    - ID >= 0.64679: count 201, mean 0.0081631, std dev 0.0040422
      - Cal5 >= 0.014827: count 70, mean 0.0106795, std dev 0.0052045
      - Cal5 < 0.014827: count 131, mean 0.0068185, std dev 0.0023515
  - FD >= 0.43306: count 426, mean 0.0103425, std dev 0.0049919
    - Cal5 >= 0.020847: count 182, mean 0.0134555, std dev 0.0052883
    - Cal5 < 0.020847: count 244, mean 0.0080206, std dev 0.0031751]
43
Surrogate Modeling
Any supervised learning or statistical
technique providing fitness predictions
with confidence intervals
1. Predict higher fitness with high
confidence: Move to new position,
no simulation
2. Predict lower fitness with high
confidence: Do not move to new
position, no simulation
3. Low confidence in prediction:
Simulation
[Plot: the surrogate model approximating the real fitness function over input x.]
44
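A minimal sketch of the three cases above, using a Gaussian process surrogate (scikit-learn) inside a simple hill climber; the `true_fitness` stub stands in for an expensive simulation, and the 2-sigma confidence rule is an illustrative choice:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def true_fitness(x):               # expensive "simulation" stand-in
    return float(np.sin(3 * x) + 0.5 * x)

rng = np.random.default_rng(2)
X = rng.uniform(0, 3, size=(8, 1))             # initial simulated points
y = np.array([true_fitness(v) for v in X[:, 0]])
gp = GaussianProcessRegressor().fit(X, y)

x, fx = 1.0, true_fitness(1.0)
for _ in range(30):                             # hill climbing with surrogate filter
    cand = x + rng.normal(0, 0.3)
    mean, std = gp.predict(np.array([[cand]]), return_std=True)
    if mean[0] - 2 * std[0] > fx:               # 1. confidently better: move, no simulation
        x, fx = cand, mean[0]
    elif mean[0] + 2 * std[0] < fx:             # 2. confidently worse: skip, no simulation
        continue
    else:                                       # 3. low confidence: pay for a simulation
        f_real = true_fitness(cand)
        X = np.vstack([X, [[cand]]]); y = np.append(y, f_real)
        gp.fit(X, y)                            # refine the surrogate as we go
        if f_real > fx:
            x, fx = cand, f_real
print(x, fx)
```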
✓ Our approach identified more critical violations of the controller
requirements, which had been found neither with default/fixed configurations nor
by manual testing.

Results (worst cases found):
                  MiL testing,              MiL testing,          Manual
                  different configurations  fixed configurations  MiL testing
Stability:        2.2% deviation            -                     -
Smoothness:       24% over/undershoot       20% over/undershoot   5% over/undershoot
Responsiveness:   170 ms response time      80 ms response time   50 ms response time
Results
45
Schedulability Analysis and
Testing
46
Problem and Context
• Schedulability analysis encompasses techniques that try to
predict whether (critical) tasks are schedulable, i.e., meet
their deadlines
• Stress testing runs carefully selected test cases that have
a high probability of leading to deadline misses
• Stress testing is complementary to schedulability analysis
• Testing is typically expensive, e.g., hardware in the loop
• Finding stress test cases is difficult
47
Finding Stress Test Cases is Hard
48
[Figure: two example timelines (t = 0 to 9) for the same jobs j0, j1, j2. The
jobs arrive at times at0, at1, at2 and must finish before deadlines dl0, dl1,
dl2; j1 can miss its deadline dl1 depending on when at2 occurs!]
Challenges and Solutions
• Ranges for arrival times form a very large input space
• Task interdependencies and properties constrain what
parts of the space are feasible
• Solution: We re-expressed the problem as a constraint
optimization problem and used a combination of constraint
programming (IBM CPLEX) and meta-heuristic search (GA)
49
Constraint Optimization
50
Constraint Optimization Problem:
• Static properties of tasks (constants)
• Dynamic properties of tasks (variables)
• Performance requirement (objective function)
• OS scheduler behaviour (constraints)
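To illustrate the formulation, a brute-force sketch of a tiny instance: arrival times are the variables, a fixed-priority preemptive scheduler provides the constraints, and maximal lateness is the objective. All constants are invented, and a real solution would use a CP solver (e.g., IBM CPLEX) plus GA rather than enumeration:

```python
from itertools import product

# Static properties (constants): duration, relative deadline, priority
# (a lower priority number means higher priority).
tasks = {"j0": (2, 4, 0), "j1": (3, 6, 1), "j2": (2, 5, 2)}

def preemptive_schedule(arrivals):
    # Constraints: a fixed-priority preemptive scheduler on one core decides
    # which ready job runs at each time unit.
    remaining = {t: tasks[t][0] for t in tasks}
    finish = {}
    for time in range(30):
        ready = [t for t in tasks if arrivals[t] <= time and remaining[t] > 0]
        if ready:
            run = min(ready, key=lambda t: tasks[t][2])   # highest priority runs
            remaining[run] -= 1
            if remaining[run] == 0:
                finish[run] = time + 1
    return finish

worst, worst_arr = float("-inf"), None
for a0, a1, a2 in product(range(6), repeat=3):            # variables: arrival times
    arr = {"j0": a0, "j1": a1, "j2": a2}
    fin = preemptive_schedule(arr)
    # Objective: maximize lateness (a positive value means a deadline miss)
    lateness = max(fin[t] - (arr[t] + tasks[t][1]) for t in tasks)
    if lateness > worst:
        worst, worst_arr = lateness, arr
print(worst_arr, worst)
```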
Combining CP and GA
51
[From S. Di Alesio et al., Fig. 3: "Overview of GA+CP: the solutions x, y and z
in the initial population of GA evolve into ..." (caption truncated in
extraction)]
Case Study
52
[Architecture: Control Modules and Drivers (software-hardware interface) running
on a Real-Time Operating System over a Multicore Architecture, connected to
Alarm Devices (hardware).]
The system monitors gas leaks and fire in oil extraction platforms.
Summary
• We provided a solution for generating stress test cases by combining
meta-heuristic search and constraint programming
• Meta-heuristic search (GA) identifies high risk regions in the
input space
• Constraint programming (CP) finds provably worst-case
schedules within these (limited) regions
• Achieve (nearly) GA efficiency and CP effectiveness
• Our approach can be used both for stress testing and
schedulability analysis (assumption free)
53
Vulnerability Testing
54
[Chart: attack categories by share of attacks:
- Code injection: 42%
- Manipulated data structures: 32%
- Collect and analyze information: 9%
- Indicator: 4%
- Employ probabilistic techniques: 3%
- Manipulate system resources: 3%
- Subvert access control: 3%
- Abuse existing functionality: 2%
- Engage in deceptive…: 2%]
X-Force Threat Intelligence Index 2017
55
https://www.ibm.com/security/xforce/
More than 40% of all
attacks were injection
attacks (e.g., SQLi)
Web Applications
56
Client → Server → SQL Database
Web Applications
57
Web form
str1
str2
Username
Password
OK
SQL query
SELECT *
FROM Users WHERE
(usr = ‘str1’ AND psw = ‘str2’)
Name Surname …
John Smith …
Result
Client → Server → SQL Database
Injection Attacks
58
[Example: the attacker types ') OR 1=1 -- into the web form. The resulting SQL
query is

SELECT * FROM Users WHERE (usr = '' AND psw = '') OR 1=1 --

whose WHERE clause is always true, so the query result is the entire Users table
(Aria Stark, John Snow, …). Client → Server → SQL Database]
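The root cause is building queries by string concatenation; a minimal sketch contrasting the vulnerable construction with a parameterized query, using Python's sqlite3 as a stand-in database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (usr TEXT, psw TEXT, name TEXT)")
conn.execute("INSERT INTO Users VALUES ('john', 'secret', 'John Smith')")

str1, str2 = "", "') OR 1=1 --"     # attacker-controlled form fields

# Vulnerable: attacker input is pasted into the query text, turning it into
#   SELECT * FROM Users WHERE (usr = '' AND psw = '') OR 1=1 --')
query = f"SELECT * FROM Users WHERE (usr = '{str1}' AND psw = '{str2}')"
print(conn.execute(query).fetchall())               # returns every row

# Safe: parameterized query; the input is treated as data, never as SQL
safe = "SELECT * FROM Users WHERE (usr = ? AND psw = ?)"
print(conn.execute(safe, (str1, str2)).fetchall())  # returns nothing
```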
Protection Layers
[Diagram: protection layers between Client and SQL Database: data input
Validation and Sanitization in the application, a Web Application Firewall in
front of the Server, and a Database Firewall in front of the SQL Database.]
59
Web Application Firewalls (WAFs)
60
[Figure: the WAF blocks malicious requests; legitimate requests reach the
Server.]
WAF Rule Set
61
Rule set of Apache ModSecurity
https://github.com/SpiderLabs/ModSecurity
Misconfigured WAFs
62
[A legitimate request BLOCKED: false positive. A malicious request ALLOWED:
false negative.]
Grammar-based Attack Generation
• BNF grammar for SQLi attacks
• Random strategy: randomly selected production rules are
applied recursively until only terminals are left
• The random strategy is not efficient at finding bypassing attacks,
which are rare and hard to hit by chance
• Machine learning? Search?
• How to guide the search? How can ML help?
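A minimal sketch of the random derivation strategy over a toy SQLi grammar (a few invented productions, not the full BNF used in the work):

```python
import random

# Toy grammar: each nonterminal maps to a list of alternative productions.
grammar = {
    "<attack>": [["<sq>", "<wsp>", "<boolAttack>", "<cmt>"]],
    "<boolAttack>": [["OR", "<boolTrueExpr>"]],
    "<boolTrueExpr>": [["1=1"], ['"a"="a"'], ["2>1"]],
    "<sq>": [["'"]],
    "<wsp>": [[" "]],
    "<cmt>": [["--"], ["#"]],
}

def derive(symbol):
    # Recursively expand: apply randomly selected production rules
    # until only terminals are left.
    if symbol not in grammar:
        return symbol
    production = random.choice(grammar[symbol])
    return "".join(derive(s) for s in production)

for _ in range(3):
    print(derive("<attack>"))      # e.g.  ' OR"a"="a"#
```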
Anatomy of SQLi attacks
64
Bypassing attack: ‘ OR“a”=“a”#

[Derivation tree: <START> → <sQuoteContext> → <sq> <wsp> <sqliAttack> <cmt> →
<boolAttack> → <opOR> <boolTrueExpr> → OR <bynaryTrue> →
<dq> <ch> <dq> <opEq> <dq> <ch> <dq> → “ a ” = “ a ”]

Attack slices: S = { ‘ , OR”a”=“a” , # }
Learning Attack Patterns
65
Training Set:
       S1  S2  S3  S4  …  Sn   Outcome
  A1:   1   1   0   0  …   0   Passed
  A2:   0   1   0   0  …   0   Blocked
  …
  Am:   1   1   1   1  …   1   Blocked

[Decision Tree: Yes/No tests on the slices S1, S2, S3, S4, …, Sn, with Blocked
and Passed leaves]
• Random trees
• Random forest
Learning Attack Patterns
66
Training Set (as above) → Decision Tree → Attack Pattern

Attack Pattern: S2 ∧ ¬Sn ∧ S1 (the conjunction of slice conditions along a path
to a Passed leaf)
Machine Learning
[Decision tree from the previous step, classifying attacks as Passed or Blocked]
Generating Attacks via ML and EAs
67
Evolutionary Algorithm (EA)
Iteratively refine successful attack
conditions
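A minimal sketch of the ML-plus-EA combination: a random forest learns which slice combinations tend to bypass the WAF, and an evolutionary loop mutates attacks toward high predicted bypass probability. The slice encoding, the WAF stub, and the operators are all illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n_slices = 10

def waf_blocks(attack):
    # Hypothetical WAF stub: only attacks with slices 0 and 1 but not 9 bypass it.
    return not (attack[0] and attack[1] and not attack[9])

# Bootstrap training set from random attacks (1 = slice present in the attack)
attacks = rng.integers(0, 2, size=(300, n_slices))
passed = np.array([not waf_blocks(a) for a in attacks])   # True = bypassed the WAF
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(attacks, passed)

pop = rng.integers(0, 2, size=(30, n_slices))
for _ in range(20):
    fitness = clf.predict_proba(pop)[:, 1]                # predicted P(bypass)
    parents = pop[np.argsort(fitness)[-10:]]              # keep the fittest attacks
    children = parents[rng.integers(0, 10, size=30)].copy()
    flips = rng.random(children.shape) < 0.1              # mutate: flip some slices
    children[flips] = 1 - children[flips]
    pop = children

# Run the evolved attacks against the (stub) WAF; count distinct bypasses
bypassing = {tuple(a) for a in pop if not waf_blocks(a)}
print(len(bypassing), "distinct bypassing attacks found")
```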
Some Results
Apache ModSecurity
68
[Plot: distinct attacks found over time]
Industrial WAFs
[Plot: distinct attacks found over time]
Machine Learning-driven attack generation led to more
distinct, successful attacks being discovered faster
Related Work
• Automated repair of WAFs
• Automated testing targeting XML and SQL injections in web
applications
69
Reflecting
70
Search-Based Solutions
• Versatile
• Helps relax assumptions compared to exact approaches
• Helps decrease modeling requirements
• Scalability, e.g., easy to parallelize
• Requires massive empirical studies
• Search is rarely sufficient by itself
71
Multidisciplinary Approach
• Single-technology approaches rarely work in practice
• Combined search with:
• Machine learning
• Solvers, e.g., CP, SMT
• Statistical approaches, e.g., sensitivity analysis
• System and environment modeling and simulation
72
Objectives
• Reduce search space
• Better guide and focus search
• Compute fitness and provide guidance
• Avoid expensive and useless fitness computations
• Explain failures (e.g., decision trees)
• Get more guarantees (e.g., constraint programming)
73
Acknowledgements
• Shiva Nejati
• Reza Matinnejad
• Raja Ben Abdessalem
• Stefano Di Alesio
• Dennis Appelt
• Annibale Panichella
74
Selected References
• L. Briand et al., “Testing the untestable: Model testing of complex
software-intensive systems”, IEEE/ACM ICSE 2016 (Visions of 2025 and Beyond
track)
• R. Matinnejad et al., “MiL Testing of Highly Configurable Continuous
Controllers: Scalable Search Using Surrogate Models”, IEEE/ACM ASE 2014
(Distinguished Paper Award)
• S. Di Alesio et al., “Combining genetic algorithms and constraint programming
to support stress testing of task deadlines”, ACM Transactions on Software
Engineering and Methodology (TOSEM), 25(1):4, 2015
• R. Ben Abdessalem et al., “Testing Vision-Based Control Systems Using
Learnable Evolutionary Algorithms”, IEEE/ACM ICSE 2018
• D. Appelt et al., “A Machine Learning-Driven Evolutionary Approach for Testing
Web Application Firewalls”, to appear in IEEE Transactions on Reliability
• More on: https://wwwen.uni.lu/snt/people/lionel_briand?page=Publications
75
Software Verification & Validation Lab (SVV), uni.lu
Achieving Scalability in Software Testing
with Machine Learning and
Metaheuristic Search
Lionel Briand
RAISE @ ICSE 2018