“Testing with Fewer Resources:
Toward Adaptive Approaches for Cost-effective
Test Generation and Selection”
June 22-24, 2022 - Córdoba, Spain
Sebastiano Panichella
Zurich University of Applied Sciences
https://spanichella.github.io/
Christian Birchler
Zurich University of Applied Sciences
https://christianbirchler.github.io/
International Summer School
on Search- and Machine Learning-based
Software Engineering
2
About the Speakers (Christian)
https://christianbirchler.github.io/
2015 - Physics
2017 - Computer Science
- Software Systems
- Data Science
2021 - Research Assistant
- Testing Self-driving Cars
3
About the Speakers (Sebastiano)
University of Sannio
PhD Student
June 2014
University of Salerno
Master Student
December 2010
University of Zurich
Research Associate
October 2014 - August 2018
Zurich University of
Applied Sciences
Senior Computer Science Researcher
Since August 2018
4
2010
2014
2018
Today
About the Speakers (Sebastiano)
5
Program Comprehension & Maintenance (PC&M)
Generation of source code documentation
Profiling developers
Dependencies analysis in software ecosystems
Mobile computing (MC):
- Machine Learning & Genetic Algorithms
Summarization Techniques
for Code,
Change,
Testing and User Feedback
PhD thesis:
“Supporting Newcomers in Software Development Projects”
Approved Projects
Development & Testing challenges:
- Test case generation and assessment
- User Feedback Analysis
- Continuous Delivery
CD anti-patterns
Branch Coverage Prediction
Documentation defects detection
“Complex Systems”
2010
2014
2018
Today
About the Speakers (Sebastiano)
Outline
• Context & Motivation:
• Cyber-physical Systems
• DevOps and Artificial Intelligence
• Search-based Software Testing (SBST) Barriers:
• A successful SE story
• Overcoming Barriers of SBST Adoption in Industry
• Cost-effective Testing for Self-Driving Cars (SDCs):
• Regression Testing
• Test Selection & Test Prioritization for SDCs
[Diagram: Initial Tests → Search → Variants Generation → Test Execution → Test Case Selection]
6
Context
“Our main research goal is to conduct industrial research, involving both industrial and
academic collaborations, to sustain the Internet of Things (IoT) vision of future "smart cities”,
with millions of smart systems connected over the internet, and/or controlled by complex
embedded software implemented for the cloud."
8
1) Cyber-physical Systems
2) Artificial Intelligence (AI)
3) DevOps, IoT, Automated Testing (AT)
Next
10-15 Years (and beyond)
Sebastiano Panichella Sajad Khatiri
Christian Birchler
COSMOS:
DevOps for Complex Cyber-physical Systems
https://www.cosmos-devops.org/ https://twitter.com/COSMOS_DEVOPS
“Emerging Cyber-physical Systems (CPS) will play a crucial role in the quality of
life of European citizens and the future of the European economy”
COSMOS Context
• CPS relevant sectors:
• Healthcare
• Automotive
• Water Monitoring
• Railway
• Manufacturing
• Avionics
• etc.
MEDICAL DELIVERY
FOOD DELIVERY
12
Industrial Use Cases
AVIATION
E-HEALTH
WATER MONITORING
SATELLITES
AUTOMOTIVE
RAILWAYS
DRONES SELF-DRIVING CARS
Reference Use Cases
13
COSMOS Use Cases
FOOD DELIVERY
Background
First aerodynamic flight on another planet. Landed with Perseverance rover on 18 February 2021
SPACE EXPLORATION
• Our (Software Engineering) view of DevOps and AI for IoT systems:
• DevOps and Continuous Delivery (CD): What is it?
• Present, Challenges, and Opportunities
• Relevant Research Questions
• Artificial Intelligence (AI) and Testing Automation:
• Present, Challenges, and Opportunities
• User-oriented Testing Automation
• Relevant Research Questions
“We all recognize the relevance and capacity of contemporary cyber-physical
systems for building the future of our society, but ongoing research in the
field is also clearly failing in making the right countermeasures to avoid
that CPS usage affects human being safety”.
“Self-driving Uber kills Arizona
woman in first fatal crash involving
pedestrian”
Problem Statement
“A simple software update was
the direct cause of the fatal
crashes of the Boeing 737”
16
Question:
What are the main Challenges of Testing Cyber-physical Systems?
17
Answers from the Audience (1)
Question:
What are the main Challenges of Testing Cyber-physical Systems?
18
Answers from the Audience (2)
• Our (Software Engineering) view of DevOps and AI for IoT systems:
• DevOps and Continuous Delivery (CD): What is it?
• Present, Challenges, and Opportunities
• Relevant Research Questions
• Artificial Intelligence (AI) and Testing Automation:
• Present, Challenges, and Opportunities
• User-oriented Testing Automation
• Relevant Research Questions
“Self-driving Uber kills Arizona
woman in first fatal crash involving
pedestrian”
“Swiss Post drone
crashes in Zurich”
Challenges
“A simple software update was
the direct cause of the fatal
crashes of the Boeing 737”
Challenge 1: Observability, testability, and predictability of the behavior
of emerging CPS are highly limited and, unfortunately, their usage in the real
world can lead to fatal crashes, sometimes tragically involving humans
19
Research Challenges and Opportunities
As reported by National Academies:
[“A 21st Century Cyber-Physical Systems Education”]
“today's practice of IoT system design and
implementation are often unable to support
the level of ``complexity, scalability, security,
safety, […] required to meet future needs”
20
Research Challenges and Opportunities
As reported by National Academies:
[“A 21st Century Cyber-Physical Systems Education”]
“The main problem is that contemporary
development methodologies for CPS need to
incorporate core aspects of both systems and
software engineering communities, with the
goal to explicitly embrace and consider the
several direct and indirect physical effects of
software”
[“Complexity challenges in development of cyber-physical systems”]
(Martin Törngren, Ulf Sellgren Pages 478-503)
21
Crash of
Boeing 737
Research Challenges and Opportunities
As reported by National Academies:
[“A 21st Century Cyber-Physical Systems Education”]
[“Complexity challenges in development of cyber-physical systems”]
(Martin Törngren, Ulf Sellgren Pages 478-503)
“As identified by agile methodologies, the
development of modern/emerging systems
(e.g., e-health, automotive, satellite, and IoT
manufacturing systems) should evolve with
the systems, ``as development never ends''”
22
Research Challenges and Opportunities
As reported by National Academies:
[“A 21st Century Cyber-Physical Systems Education”]
[“Complexity challenges in development of cyber-physical systems”]
(Martin Törngren, Ulf Sellgren Pages 478-503)
These concepts are closely related to DevOps and
Artificial Intelligence technologies, and several
researchers and practitioners advocate them as
promising solutions for the development,
maintenance, testing, and evolution of these
complex systems
23
Research Challenges and Opportunities
Challenge 1: Observability, testability, and
predictability of the behavior of emerging
CPS are highly limited and, unfortunately,
their usage in the real world can lead to fatal
crashes, sometimes tragically involving
humans
Challenge 2: Contemporary DevOps and
AI practices and tools are potentially the
right solution to this problem, but they are
not developed to be applied in CPS
domains
24
• Context & Motivation:
• Cyber-physical Systems
• DevOps and Artificial Intelligence
• Search-based Software Testing (SBST) Barriers:
• A successful SE story
• Overcoming Barriers of SBST Adoption in Industry
• Cost-effective Testing for Self-Driving Cars (SDCs):
• Regression Testing
• Test Selection & Test Prioritization for SDCs
Outline
[Diagram: Initial Tests → Search → Variants Generation → Test Execution → Test Case Selection]
25
Traditional DevOps Pipeline
ADSs
• Generate Diversified Test Inputs (or Scenarios)
• Evaluation-based Failure Detection
Manual vs. Automated Testing (SBST)
26
Search-Based Software Testing (SBST)
“The initial population” is a set of randomly generated test cases.
[Diagram: Initial Population → Selection → Crossover → Mutation → End? (YES/NO loop)]
27
(Fitness Function) We need to select the “fittest” test cases for reproduction
Single-Point Crossover
Mutation: randomly changes some genes (elements within
each chromosome).
Mutation probability: each statement is mutated with
prob=1/n where n=#statements
Mutation
27
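The loop above (initial population, fitness-based selection, single-point crossover, and per-gene mutation with probability 1/n) can be written down as a toy genetic algorithm. This is an illustrative sketch only: the chromosome encoding, the fitness function, and all class and method names are invented here, not EvoSuite's actual implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Toy GA sketch: a "test case" is a fixed-length int chromosome; the
// invented fitness counts genes hitting a coverage goal, standing in for
// the branch-distance-style fitness functions used by real SBST tools.
public class SbstSketch {
    static final Random RND = new Random(42);
    static final int GENES = 10, POP = 20, GENERATIONS = 50;

    static int fitness(int[] c) {              // higher is better
        int f = 0;
        for (int g : c) if (g == 7) f++;       // toy coverage goal
        return f;
    }

    static int[] randomChromosome() {
        int[] c = new int[GENES];
        for (int i = 0; i < GENES; i++) c[i] = RND.nextInt(10);
        return c;
    }

    static int[] crossover(int[] a, int[] b) { // single-point crossover
        int point = 1 + RND.nextInt(GENES - 1);
        int[] child = new int[GENES];
        for (int i = 0; i < GENES; i++) child[i] = i < point ? a[i] : b[i];
        return child;
    }

    static void mutate(int[] c) {              // each gene mutated with prob = 1/n
        for (int i = 0; i < GENES; i++)
            if (RND.nextInt(GENES) == 0) c[i] = RND.nextInt(10);
    }

    public static void main(String[] args) {
        List<int[]> pop = new ArrayList<>();
        for (int i = 0; i < POP; i++) pop.add(randomChromosome()); // initial population
        for (int gen = 0; gen < GENERATIONS; gen++) {
            pop.sort(Comparator.comparingInt(SbstSketch::fitness).reversed());
            if (fitness(pop.get(0)) == GENES) break;               // End? goal covered
            List<int[]> next = new ArrayList<>(pop.subList(0, POP / 2)); // selection
            while (next.size() < POP) {
                int[] child = crossover(next.get(RND.nextInt(POP / 2)),
                                        next.get(RND.nextInt(POP / 2)));
                mutate(child);                 // mutation
                next.add(child);
            }
            pop = next;
        }
        pop.sort(Comparator.comparingInt(SbstSketch::fitness).reversed());
        System.out.println("best fitness = " + fitness(pop.get(0)) + "/" + GENES);
    }
}
```

In a real tool the chromosome would encode a sequence of method calls on the class under test, and the fitness would measure how close an execution gets to uncovered branches.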
class Triangle {
 void computeTriangleType() {
  if (isTriangle()){
1.   if (side1 == side2) {
2.    if (side2 == side3)
3.     type = "EQUILATERAL";
      else
4.     type = "ISOSCELES";
     } else {
5.    if (side1 == side3) {
6.     type = "ISOSCELES";
      } else {
7.     if (side2 == side3)
8.      type = "ISOSCELES";
       else
9.      checkRightAngle();
      }
     }
   }// if isTriangle()
10. }
}
Goal
Class Under Test
Iterate until line 9 is covered
or the search budget (running
time or #iterations) is
consumed
28
Search-Based Software Testing (SBST)
28
Mutation
class Triangle {
 void computeTriangleType() {
  if (isTriangle()){
1.   if (side1 == side2) {
2.    if (side2 == side3)
3.     type = "EQUILATERAL";
      else
4.     type = "ISOSCELES";
     } else {
5.    if (side1 == side3) {
6.     type = "ISOSCELES";
      } else {
7.     if (side2 == side3)
8.      type = "ISOSCELES";
       else
9.      checkRightAngle();
      }
     }
   }// if isTriangle()
10. }
}
Class Under Test
29
Control Flow Graph (nodes 1-10)
Goal: Covering as many
code elements as possible
Branch coverage
Targets = {<1,5>, <1,2>,<5,6>, <5,7>,
<2,3>, <2,4>, <6,10>, <7,8>, <7,9>,
<3,10>, <4,10>, <8,10>, <9,10>}
Statement coverage
Targets = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Path coverage
Targets = {<1,5,6,10>, <1,5,7,8,10>,
<1,5,7,9,10>, <1,2,3,10>, <1,2,4,10>}
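Attainment of any of these criteria is simply the fraction of targets covered by the executed tests. A minimal sketch, using the statement targets listed above (the class and method names here are our own, not from a specific tool):

```java
import java.util.Set;

// Sketch: coverage = |executed ∩ targets| / |targets|, with the
// statement targets {1..10} from the slide.
public class Coverage {
    static final Set<Integer> STATEMENT_TARGETS =
            Set.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

    static double statementCoverage(Set<Integer> executed) {
        long covered = STATEMENT_TARGETS.stream()
                .filter(executed::contains).count();
        return (double) covered / STATEMENT_TARGETS.size();
    }

    public static void main(String[] args) {
        // A test driving the path 1 -> 2 -> 4 -> 10 covers 4 of 10 statements.
        System.out.println(statementCoverage(Set.of(1, 2, 4, 10)));  // prints 0.4
    }
}
```

Branch and path coverage work the same way, with edges or whole paths as the target set.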
Search-Based Software Testing (SBST)
29
http://www.evosuite.org
• Command Line
• Eclipse Plugin
• Maven Plugin
• Measure Code Coverage
•…
30
Successful SBST Stories
Generated Tests
Production Code
31
https://github.com/EvoSuite/evosuite
https://arstechnica.com/information-technology/2017/08/facebook-dynamic-analysis-software-sapienz/
“The Sapienz Project at Facebook”
Successful SBST Stories
Sapienz in action:
https://youtu.be/j3eV8NiWLg4
32
What are the main Barriers to SBST Tool Adoption in Practice
(e.g., in industrial settings)?
Question
“Manual Testing is still Dominant in Industry…”
33
Answers from the Audience
SBST Barrier to Practical Adoption:
Test Code Comprehension
Are Generated Tests Helpful?
[Paper screenshot] Ermira Daka, José Campos, Gordon Fraser, Jonathan Dorn, and Westley Weimer: “Modeling Readability to Improve Unit Tests” — proposes a domain-specific model of unit test readability based on human judgements and uses it to augment automated unit test generation; in human studies, users preferred the improved tests and answered maintenance questions about them 14% more quickly at the same level of accuracy.
[Paper screenshot] Gordon Fraser, Matt Staats, Phil McMinn, Andrea Arcuri, and Frank Padberg: “Does Automated White-Box Test Generation Really Help Software Testers?” — a controlled experiment with 49 subjects comparing manual testing against EVOSUITE-assisted testing: tool support clearly improved code coverage (up to a 300% increase), but gave no measurable improvement in the number of bugs actually found by developers.
Developers spend up to 50% of their time
understanding and analyzing the output of
automatic tools.
Fraser et al.
“Professional developers perceive
generated test cases as hard to
understand.”
Daka et al.
34
SBST Barrier to Practical Adoption:
Test Code Comprehension
Why?
Class Name: Option.java
Library: Apache Commons-Cli
Q1: What are the main
differences?
Generated Tests
Q2: Do they cover different
parts of the code?
35
SBST Barrier to Practical Adoption:
Test Code Comprehension
Why?
Class Name: Option.java
Library: Apache Commons-Cli
Q1: What are the main
differences?
Generated Tests
Q2: Do they cover different
parts of the code?
Candidate
Assertions
Q3: Are these
assertions correct?
Earl T. Barr, et al., “The Oracle Problem in Software Testing: A Survey”. IEEE Transactions on Software Engineering, 2015.
36
SBST Barrier to Practical Adoption:
Test Code Comprehension
Are Generated Tests Helpful?
G. Fraser et al., Does Automated Unit Test Generation
Really Help Software Testers? A Controlled Empirical
Study, TOSEM 2015.
38
Automatically generated tests do not
improve the ability of developers to detect
faults when compared to manual testing.
39
?
SBST Barrier to Practical Adoption:
Addressing Test Code Comprehension
Test Case
How to Generate Test Case Summary?
Panichella et al. “The Impact of Test Case Summaries
on Bug Fixing Performance: An Empirical Investigation”.
ICSE 2016
How to Generate Test Case Summary?
SBST Barrier to Practical Adoption:
Addressing Test Code Comprehension
Panichella et al. “The Impact of Test Case Summaries
on Bug Fixing Performance: An Empirical Investigation”.
ICSE 2016
40
?
Generated Unit Test
… with Descriptions
40
How to Generate Test Case Summary?
SBST Barrier to Practical Adoption:
Addressing Test Code Comprehension ?
Panichella et al. “The Impact of Test Case Summaries
on Bug Fixing Performance: An Empirical Investigation”.
ICSE 2016
41
http://textcompactor.com/
Intuition
41
Summary Generator
Software Words Usage Model: deriving <actions>, <themes>, and
<secondary arguments> from class, method, attribute, and
variable identifiers
E. Hill et al. Automatically capturing
source code context of NL-queries for
software maintenance and reuse.
ICSE 2009
42
Summary Generator
SWUM in TestDescriber:
Covered Code
43
Summary Generator
SWUM in TestDescriber:
1) Select the covered statements
Covered Code
44
SWUM in TestDescriber:
1) Select the covered statements
2) Filter out Java keywords, etc.
Summary Generator
Covered Code
45
SWUM in TestDescriber:
1) Select the covered statements
2) Filter out Java keywords, etc.
3) Identifier Splitting (Camel case)
Summary Generator
Covered Code
46
46
SWUM in TestDescriber:
1) Select the covered statements
2) Filter out Java keywords, etc.
3) Identifier Splitting (Camel case)
4) Abbreviation Expansion (using
external vocabularies)
Summary Generator
Covered Code
47
SWUM in TestDescriber:
1) Select the covered statements
2) Filter out Java keywords, etc.
3) Identifier Splitting (Camel case)
4) Abbreviation Expansion (using
external vocabularies)
5) Part-of-Speech tagger
Summary Generator
<actions> = Verbs
<themes> = Nouns/Subjects
<secondary arguments> = Nouns/objects, adjectives, etc.
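Steps 2 and 3 above (keyword filtering and camel-case splitting) can be sketched in a few lines. The identifier and class names here are illustrative, not TestDescriber's actual code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of SWUM preprocessing: tokenize a covered statement, drop Java
// keywords (step 2), split camel-case identifiers (step 3), lowercase.
public class IdentifierSplitter {
    static final Set<String> JAVA_KEYWORDS =
            Set.of("public", "private", "void", "new", "if", "else", "return", "int");

    static List<String> words(String statement) {
        return Arrays.stream(statement.split("[^A-Za-z]+"))              // tokenize
                .filter(t -> !t.isEmpty() && !JAVA_KEYWORDS.contains(t)) // step 2
                .flatMap(t -> Arrays.stream(t.split("(?<=[a-z])(?=[A-Z])"))) // step 3
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words("Option opt = new Option(\"longOption\", hasArg);"));
        // prints [option, opt, option, long, option, has, arg]
    }
}
```

Abbreviation expansion (step 4) and part-of-speech tagging (step 5) additionally need external vocabularies and an NLP library, so they are omitted from this sketch.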
[Covered code annotated with part-of-speech tags]
Covered Code
48
Summary Generator
The test case instantiates an "Option"
with:
- option equal to “...”
- long option equal to “...”
- it has no argument
- description equal to “…”
An option validator validates it
The test exercises the following
condition:
- "Option" has no argument
[Parsed code annotated with part-of-speech tags]
Natural Language Sentences / Parsed Code
49
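Sentences like the ones above can be assembled by filling a fixed template with the SWUM parts. A hypothetical sketch (the template wording and all names are ours, not TestDescriber's actual templates):

```java
// Hypothetical sketch: turning SWUM parts (theme + argument descriptions)
// into a TestDescriber-style sentence.
public class SentenceTemplates {
    static String instantiation(String theme, String... argDescriptions) {
        StringBuilder sb = new StringBuilder(
                "The test case instantiates an \"" + theme + "\" with:\n");
        for (String arg : argDescriptions) sb.append("- ").append(arg).append("\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(instantiation("Option",
                "option equal to \"...\"",
                "it has no argument"));
    }
}
```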
Summarisation Levels: Class, Method, Statement, Branch
50
51
Summary Generator
Summary Generator
52
Q1: Do Test Summaries Help Developers find more bugs?
Q2: Do Test Summaries Improve Test Readability?
Case Study
Bug Fixing Tasks
Involving 30 Developers
53
Subjects: 30 Developers (23 Researchers and 7 Developers)
Context
Object: two Java classes from Apache Commons Primitives and
Math4J that have been used in previous studies on search-based
software testing [by Fraser et al. TOSEM 2015]
ArrayIntList.java
Rational.java
Bug Fixing Tasks
Group 1: ArrayIntList.java, Rational.java
Group 2: ArrayIntList.java, Rational.java
55
Bug Fixing Tasks
Group 1: ArrayIntList.java, Rational.java
Group 2: ArrayIntList.java, Rational.java
Comments generated by TestDescriber
58
Bug Fixing Tasks
Experiment conducted Offline via a Survey platform
Each participant received the experiment package consisting of:
1. A pretest questionnaire
2. Instructions and materials to perform the experiment
3. A post-test questionnaire
We did not reveal the goal of the study
45 minutes for each task
59
Q1: How do test case summaries impact the
number of bugs fixed by developers?
60
Comments
Q1: How do test case summaries impact the
number of bugs fixed by developers?
61
Comments
Summary: Using automatically generated test case summaries
significantly helps developers to identify and fix more bugs.
WITH Summaries:
(i) 46% of participants
consider the test cases as
“easy to understand”.
(ii) Only 18% of participants
considered the test cases
as incomprehensible.
[Bar chart: rating distribution (Very Low to Very High), Without vs. With summaries]
Perceived test comprehensibility WITH and
WITHOUT TestDescriber summaries
62
Q2: Do Test Summaries Improve Test Readability?
Comments
WITHOUT Summaries:
(i) Only 15% of participants
consider the test cases as
“easy to understand”.
(ii) 40% of participants
considered the test cases
as incomprehensible.
63
Q2: Do Test Summaries Improve Test Readability?
Comments
WITHOUT Summaries:
(i) Only 15% of participants
consider the test cases as
“easy to understand”.
(iii) 40% of participants
considered the test cases
as incomprehensible.
WITH Summaries:
(i) 46% of participants
consider the test cases as
“easy to understand”.
(ii) Only 18% of participants
considered the test cases
as incomprehensible.
[Bar chart: distribution of comprehensibility ratings, from Very Low to Very High, with vs. without summaries]
Perceived test comprehensibility WITH and
WITHOUT TestDescriber summaries
64
Q2: Do Test Summaries Improve Test Readability?
Comments
Summary: Test summaries statistically significantly improve the
comprehensibility of automatically generated test cases
according to human judgments.
1) Using automatically generated test
case summaries significantly helps
developers to identify and fix more bugs.
2) Test summaries statistically significantly
improve the comprehensibility of automatically
generated test cases according to
human judgments.
Panichella et al. “The Impact of Test Case
Summaries on Bug Fixing Performance: An
Empirical Investigation”. ICSE 2016
65
SBST Barrier to Practical Adoption:
Addressing Test Code Comprehension
66
SBST Barrier to Practical Adoption:
Addressing Test Code Comprehension
Other Studies Addressing this Open Problem…
Daka et al. Generating unit tests with descriptive names or:
would you name your children thing1 and thing2? ISSTA 2017
Generating unit tests with descriptive names
Panichella et al. Revisiting Test Smells in Automatically Generated
Tests: Limitations, Pitfalls, and Opportunities. ICSME 2020
Test Smells in Automatically Generated Tests
SBST Barrier to Practical Adoption:
Cost-effectiveness of Generated Tests
Why?
67
class Triangle {
int a, b, c; //sides
String type = "NOT_TRIANGLE";
Triangle (int a, int b, int c){…}
void computeTriangleType() {
1. if (a == b) {
2. if (b == c)
3. type = "EQUILATERAL";
else
4. type = "ISOSCELES";
} else {
5. if (a == c) {
6. type = "ISOSCELES";
} else {
7. if (b == c)
8. type = "ISOSCELES";
else
9. type = "SCALENE";
}
}
}
Java Class Under Test (CUT)
@Test
public void test(){
Triangle t = new Triangle (1,2,3);
t.computeTriangleType();
String type = t.getType();
assertTrue(type.equals("SCALENE"));
}
Test Case
SBST Barrier to Practical Adoption:
Cost-effectiveness of Generated Tests
Why?
68
class Triangle {
int a, b, c; //sides
String type = "NOT_TRIANGLE";
Triangle (int a, int b, int c){…}
void computeTriangleType() {
1. if (a == b) {
2. if (b == c)
3. type = "EQUILATERAL";
else
4. type = "ISOSCELES";
} else {
5. if (a == c) {
6. type = "ISOSCELES";
} else {
7. if (b == c)
8. type = "ISOSCELES";
else
9. type = "SCALENE";
}
}
}
Java Class Under Test (CUT)
@Test
public void test(){
Triangle t = new Triangle (1,2,3);
t.computeTriangleType();
String type = t.getType();
assertTrue(type.equals("SCALENE"));
}
Test Case
Code Coverage:
The main
Quality Assessment
Criteria
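To make the coverage criterion concrete, here is a minimal Python re-implementation of the slide's Java CUT (hypothetical helper, mirroring the numbered targets 1-9) that records which targets the single SCALENE test actually exercises:

```python
def triangle_type(a, b, c, covered):
    """Classify a triangle, recording which numbered targets (1-9) run."""
    covered.add(1)
    if a == b:
        covered.add(2)
        if b == c:
            covered.add(3)
            return "EQUILATERAL"
        covered.add(4)
        return "ISOSCELES"
    covered.add(5)
    if a == c:
        covered.add(6)
        return "ISOSCELES"
    covered.add(7)
    if b == c:
        covered.add(8)
        return "ISOSCELES"
    covered.add(9)
    return "SCALENE"

covered = set()
assert triangle_type(1, 2, 3, covered) == "SCALENE"
# The single test from the slide reaches only targets 1, 5, 7, 9:
print(sorted(covered))                      # [1, 5, 7, 9]
print(f"{len(covered)}/9 targets covered")  # 4/9 targets covered
```

One test covers less than half of the targets, which is why generators keep adding tests until coverage stops improving.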
SBST Barrier to Practical Adoption:
Cost-effectiveness of Generated Tests
Why?
69
class Triangle {
int a, b, c; //sides
String type = "NOT_TRIANGLE";
Triangle (int a, int b, int c){…}
void computeTriangleType() {
1. if (a == b) {
2. if (b == c)
3. type = "EQUILATERAL";
else
4. type = "ISOSCELES";
} else {
5. if (a == c) {
6. type = "ISOSCELES";
} else {
7. if (b == c)
8. type = "ISOSCELES";
else
9. type = "SCALENE";
}
}
}
Java Class Under Test (CUT)
Software
Quality
Money
Time
Practical Constraints?
Cost Effectiveness: example
Class A Class B Class C Class D
70
Test Generation Tool 1
TestClass A TestClass B
Class A Class B Class C Class D
Cost Effectiveness: example
TestClass C TestClass D
Test Generation Tool 2
TestClass A TestClass B TestClass C TestClass D
BUG BUG
71
Test Generation Tool 1
TestClass A TestClass B
Cost Effectiveness: example
TestClass C TestClass D
Test Generation Tool 2
TestClass A TestClass B TestClass C TestClass D
BUG BUG
Coverage 67%
Coverage 66.5%
We need COST-oriented models
Manual vs. Automated Testing
72
Test Generation Tool 1
TestClass A TestClass B
Cost Effectiveness: example
TestClass C TestClass D
Test Generation Tool 2
TestClass A TestClass B TestClass C TestClass D
BUG BUG
Coverage 67%
Coverage 66.5%
We need COST-oriented models
+20%
Manual vs. Automated Testing
73
74
SBST Barrier to Practical Adoption:
Cost-effective Generation of Tests
Automatically generating tests with
appropriate performance (CPU, memory, etc.)
when deployed in different environments
Grano et al. “Testing with Fewer Resources: An
Adaptive Approach to Performance-Aware Test
Case Generation”. TSE 2019
Further Performance
Indicators…
+
It uses indicators of
Test Coverage…
?
75
Grano et al. “Testing with Fewer Resources: An
Adaptive Approach to Performance-Aware Test
Case Generation”. TSE 2019
Cached history information
Kim et al., ICSE 2007
Change Metrics
Moser et al., ICSE 2008
A metrics suite for object-oriented design
Chidamber et al., TSE 1994
Indicators of Complexity
Cost-effective Generation of Tests
76
“We needed indicators (or proxies) of test performance (CPU, memory, etc.)…”
Grano et al. “Testing with Fewer Resources: An
Adaptive Approach to Performance-Aware Test
Case Generation”. TSE 2019
Cached history information
Kim et al., ICSE 2007
Change Metrics
Moser et al., ICSE 2008
A metrics suite for object-oriented design
Chidamber et al., TSE 1994
Indicators of Complexity
Cost-effective Generation of Tests
pDynaMOSA (Adaptive Performance-Aware DynaMOSA),
pDynaMOSA Pipeline
2)
1)
First Criteria
Second Criteria
Yes
No
Yes
No
Coverage
Performance
SBST Barrier to Practical Adoption:
Cost-effective Generation of Tests
77
pDynaMOSA (Adaptive Performance-Aware DynaMOSA),
pDynaMOSA Pipeline
2)
1)
First Criteria
Second Criteria
Yes
No
Yes
No
Coverage
Performance
78
SBST Barrier to Practical Adoption:
Cost-effective Generation of Tests
pDynaMOSA (Adaptive Performance-Aware DynaMOSA),
pDynaMOSA Pipeline
2)
1)
First Criteria
Second Criteria
Yes
No
Yes
No
Coverage
Performance
79
SBST Barrier to Practical Adoption:
Cost-effective Generation of Tests
pDynaMOSA (Adaptive Performance-Aware DynaMOSA),
pDynaMOSA Pipeline
2)
1)
First Criteria
Second Criteria
Yes
No
Yes
No
Coverage
Performance
80
SBST Barrier to Practical Adoption:
Cost-effective Generation of Tests
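The pipeline above applies coverage as the first criterion and performance only as a tie-breaker. A minimal sketch of that adaptive decision, assuming each candidate test carries its covered targets and an aggregated proxy score (this is an illustration of the idea, not pDynaMOSA's actual code):

```python
# Hypothetical sketch of an adaptive two-criteria selection step:
# coverage is always the first criterion; a performance-proxy score
# breaks ties only between tests with the same covered targets.
def pick_better(candidate, incumbent):
    """Return the preferred test of two covering candidates."""
    if candidate["covered"] != incumbent["covered"]:
        # First criterion: more covered targets wins.
        return max(candidate, incumbent, key=lambda t: len(t["covered"]))
    # Second criterion: lower aggregated performance-proxy score wins
    # (fewer loops, calls, instantiations, statements -> cheaper test).
    return min(candidate, incumbent, key=lambda t: t["proxy_score"])

t1 = {"covered": {1, 5, 7, 9}, "proxy_score": 42.0}
t2 = {"covered": {1, 5, 7, 9}, "proxy_score": 17.5}
assert pick_better(t1, t2) is t2   # same coverage, t2 is cheaper
```

Keeping coverage primary is what prevents the performance objective from degrading the suite's effectiveness.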
De Oliveira “Perphecy: Performance regression test
selection made simple but effective,” (ICST 2017)
I1. Number of executed loops (branches)
(Higher loop cycle counts influence the runtime of the test case).
2)
1)
First Criteria
Second Criteria
Yes
No
Yes
No
Coverage
Performance
I2. Number of (test and code) method calls
(Fewer method calls result in shorter runtimes and
lower heap memory usage due to potentially fewer object
instantiations).
I3. Number of object instantiations (not size)
(reducing the number of instantiated objects may lead to
decreased usage of heap memory - e.g., arrays dimension).
I4. Number of executed (test and code) Statements
(Statement execution frequency is a well-known proxy for
runtime).
I5. Test Length (LOC of test case)
(Superset of I2 and I4)
A set of static (test) and dynamic (prod. code) performance proxies that provide
an approximation of the test execution costs (i.e., runtime and memory usage).
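The proxies I1-I5 can be folded into a single comparable cost estimate per test. A small sketch, assuming the per-test counts are already available from instrumentation (the field names and uniform weights are illustrative, not the paper's exact aggregation):

```python
# Hypothetical aggregation of the performance proxies I1-I5 for one
# generated test, from a simple per-test count summary.
def proxy_vector(test):
    return {
        "I1_loops": test["loops_executed"],
        "I2_method_calls": test["method_calls"],
        "I3_instantiations": test["objects_created"],
        "I4_statements": test["statements_executed"],
        "I5_length_loc": test["loc"],
    }

def proxy_score(test, weights=None):
    """Aggregate the proxies into one comparable cost estimate."""
    v = proxy_vector(test)
    weights = weights or {k: 1.0 for k in v}
    return sum(weights[k] * v[k] for k in v)

t = {"loops_executed": 12, "method_calls": 5, "objects_created": 2,
     "statements_executed": 40, "loc": 9}
print(proxy_score(t))   # 68.0
```

Because the proxies are cheap to compute, they can be evaluated inside the search loop without profiling every candidate test.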
SBST Barrier to Practical Adoption:
Cost-effective Generation of Tests
Second Criteria
De Oliveira “Perphecy: Performance regression test
selection made simple but effective,” (ICST 2017)
I1. Number of executed loops (branches)
(Higher loop cycle counts influence the runtime of the test case).
I2. Number of (test and code) method calls
(Fewer method calls result in shorter runtimes and
lower heap memory usage due to potentially fewer object
instantiations).
I3. Number of object instantiations (not size)
(reducing the number of instantiated objects may lead to
decreased usage of heap memory - e.g., arrays dimension).
I4. Number of executed (test and code) Statements
(Statement execution frequency is a well-known proxy for
runtime).
I5. Test Length (LOC of test case)
(Superset of I2 and I4)
SBST Barrier to Practical Adoption:
Cost-effective Generation of Tests
Dataset from SBST Community
G. Fraser et al. “A large-scale evaluation of automated unit test generation using evosuite,”
ACM Transactions on Software Engineering and Methodology (TOSEM),
Evaluation
Q1. (Effectiveness) What is the target coverage achieved by
pDynaMOSA compared to DynaMOSA?
Small gain in terms of coverage… 83
SBST Barrier to Practical Adoption:
Cost-effective Generation of Tests
We may lose in terms of fault detection…
Q2. (Fault Detection) What is the mutation score achieved
by pDynaMOSA compared to DynaMOSA?
Small gain in terms of coverage… 84
SBST Barrier to Practical Adoption:
Cost-effective Generation of Tests
Small gain in terms of coverage…
Q3. (Performance) Does the adoption of performance proxies
lead to shorter runtime and lower heap memory consumption?
Huge benefits in terms of runtime and memory consumption…
Georges et al., “Statistically rigorous Java performance evaluation,” OOPSLA ’07.
85
SBST Barrier to Practical Adoption:
Cost-effective Generation of Tests
We may lose in terms of fault detection…
• Context & Motivation:
• Cyber-physical Systems
• DevOps and Artificial Intelligence
• Search-based Software Testing (SBST) Barriers:
• A success SE story
• Overcoming Barriers of SBST Adoption in Industry
• Cost-effective Testing for Self-Driving Cars (SDCs):
• Regression Testing
• Test Selection & Test Prioritization for SDCs
Outline
Test
Case
Selection
Initial
Tests
Search
Test
Execution
Variants
Generation
86
Tesla Car
Autonomous Driving Systems (ADSs)
Multi-sensing Systems:
• Autonomous systems capture surrounding
environmental data at run-time via
multiple sensors (e.g. camera, radar, lidar)
as inputs
• Processes these data with Deep Neural
Networks (DNNs) and outputs control
decisions (e.g. steering).
• Requires robust testing that
• creates realistic, diverse test cases
87
Traffic Sign Recognition (TSR)
Pedestrian Protection (PP) Lane Departure Warning (LDW)
Automated Emergency Braking (AEB)
Environmental Data Collection With ADSs Sensors
88
.
.
.
Driving
Actions
Sensors /
Camera
Autonomous
Feature
Actuator
89
Environmental Data Collection With ADSs Sensors
1. Pedestrians
2. Lane Position
4. Other Cars
3. Traffic Signs
DNNs • steering
• stop
• acceleration/
deceleration
• …
ADSs
90
Traditional DevOps Pipeline ADSs
“Manual Testing is still
Dominant…”
Testing Steps in ADSs
91
Requirements of Testing ADSs
• Generate Diversified Test Inputs (or Scenarios)
• Evaluation based on Failure Detection
“Manual Testing is still
Dominant…”
92
class Triangle {
int a, b, c; //sides
String type = "NOT_TRIANGLE";
Triangle (int a, int b, int c){…}
void computeTriangleType() {
1. if (a == b) {
2. if (b == c)
3. type = "EQUILATERAL";
else
4. type = "ISOSCELES";
} else {
5. if (a == c) {
6. type = "ISOSCELES";
} else {
7. if (b == c)
8. type = "ISOSCELES";
else
9. type = "SCALENE";
}
}
}
Java Class Under Test (CUT)
@Test
public void test(){
Triangle t = new Triangle (1,2,3);
t.computeTriangleType();
String type = t.getType();
assertTrue(type.equals("SCALENE"));
}
Test Case
Traditional Development Pipeline:
Coding v.s. Testing
93
class Triangle {
int a, b, c; //sides
String type = "NOT_TRIANGLE";
Triangle (int a, int b, int c){…}
void computeTriangleType() {
1. if (a == b) {
2. if (b == c)
3. type = "EQUILATERAL";
else
4. type = "ISOSCELES";
} else {
5. if (a == c) {
6. type = "ISOSCELES";
} else {
7. if (b == c)
8. type = "ISOSCELES";
else
9. type = "SCALENE";
}
}
}
Java Class Under Test (CUT)
@Test
public void test(){
Triangle t = new Triangle (1,2,3);
t.computeTriangleType();
String type = t.getType();
assertTrue(type.equals("SCALENE"));
}
Test Case
Code Coverage:
The main
Quality Assessment
Criteria
Traditional Development Pipeline:
Coding v.s. Testing
94
class Triangle {
int a, b, c; //sides
String type = "NOT_TRIANGLE";
Triangle (int a, int b, int c){…}
void computeTriangleType() {
1. if (a == b) {
2. if (b == c)
3. type = "EQUILATERAL";
else
4. type = "ISOSCELES";
} else {
5. if (a == c) {
6. type = "ISOSCELES";
} else {
7. if (b == c)
8. type = "ISOSCELES";
else
9. type = "SCALENE";
}
}
}
Java Class Under Test (CUT)
@Test
public void test(){
Triangle t = new Triangle (1,2,3);
t.computeTriangleType();
String type = t.getType();
assertTrue(type.equals("SCALENE"));
}
Test Case
Traditional Development Pipeline:
Coding v.s. Testing
Code Coverage:
Not Sufficient as
Quality Assessment
Criteria
Challenges of Testing ADSs
95
Challenge 1:
Code coverage
vs.
Scenario Coverage
Challenge 2:
Code coverage
&
CPU & Memory
consumption
Challenge 3:
Unit-Test
vs.
System-level Testing
96
Stop
Testing Target: Feature Interactions Failures
Infinite Test Space
97
Testing Autonomous Driving Systems
98
World of Agile, 2018
99
Testing Autonomous Driving Systems
100
Testing Autonomous Driving Systems
101
Testing Autonomous Driving Systems
102
SRF
Swiss Post drone crashed into a lake
Testing Autonomous Driving Systems
103
SRF
NZZ
Swiss Post drone crashed in Zurich
Swiss Post drone crashed into a lake
Testing Autonomous Driving Systems
104
Testing Autonomous Driving Systems
105
npr, January 2022
Testing Autonomous Driving Systems
106
npr, January 2022 Reuters, September 2021
Testing Autonomous Driving Systems
107
The New York Times, April 2021
npr, January 2022 Reuters, September 2021
Testing Autonomous Driving Systems
108
Testing Autonomous Driving Systems
109
Testing Autonomous Driving Systems
110
Testing Autonomous Driving Systems
111
Testing Autonomous Driving Systems
112
Testing Autonomous Driving Systems
113
Testing Autonomous Driving Systems
114
Testing Autonomous Driving Systems
115
Testing Autonomous Driving Systems
116
Testing Autonomous Driving Systems
Real-world testing:
➡Realistic
➡Trustworthy
➡Costly
➡Nondeterministic
Simulation is:
➡Cheaper
➡Faster
➡Less reliable
➡Complex CI/CD
integration
117
Testing Autonomous Driving Systems
Real-world testing:
➡Realistic
➡Trustworthy
➡Costly
➡Nondeterministic
Regression Testing
118
“Regression testing is re-running functional and non-functional tests to ensure that previously developed and tested software still performs after a change.”
Anirban Basu 2015
Regression Testing
119
Yoo et al. 2013
Regression Testing
120
Yoo et al. 2013
Selection
Regression Testing
121
Yoo et al. 2013
Selection
Prioritization
Regression Testing
122
Yoo et al. 2013
Minimization
Selection
Prioritization
Regression Testing
123
Minimization
Selection
Prioritization
Test Selection
124
Test Selection
125
road_points=[(x,y,z),…]
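A system-level SDC test is essentially a road specification: the simulator interpolates the `road_points` into lanes and drives the car along them. A minimal sketch of such a test plus one static road feature derived from it (the feature name and sample coordinates are illustrative):

```python
# A simulation-based SDC test as a list of (x, y, z) road points,
# and a simple "curviness" feature computed from it.
import math

road_points = [(0, 0, 0), (30, 10, 0), (60, 40, 0), (70, 80, 0)]

def headings(points):
    """Heading (radians) of each road segment in the x-y plane."""
    return [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1, _z1), (x2, y2, _z2) in zip(points, points[1:])]

def total_turn_angle(points):
    """Sum of absolute heading changes along the road (radians)."""
    hs = headings(points)
    return sum(abs(b - a) for a, b in zip(hs, hs[1:]))

print(round(total_turn_angle(road_points), 2))  # 1.0
```

Static features like this can be computed without running any simulation, which is what makes them attractive for cheap test selection.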
Test Selection
126
Test Selection
127
Test Selection for Self-driving Cars
128
How does a test
look like?
Test Selection for Self-driving Cars
129
How does a test
look like?
Test Selection for Self-driving Cars
130
How does a test
look like?
Test Selection for Self-driving Cars
131
Test Selection for Self-driving Cars
132
I can keep the
lane!
Test Selection for Self-driving Cars
133
Oops… I am not that good!
Test Selection for Self-driving Cars
134
Testing Costs
135
[Chart: simulation time per test, split into passing and failing tests; a passing test takes about 200 s on average vs. about 137 s for a failing one]
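The durations above (passing ~200 s, failing ~137 s) explain why skipping likely-passing tests pays off. A back-of-the-envelope cost model, with a hypothetical suite composition and classifier accuracy (the 100-test suite and the 70 correctly skipped tests are illustrative numbers, not the study's data):

```python
# Back-of-the-envelope cost model using the slide's average durations.
n_pass, n_fail = 80, 20            # hypothetical suite composition
t_pass, t_fail = 200, 137          # seconds per simulation

full_suite = n_pass * t_pass + n_fail * t_fail
# Suppose the classifier correctly flags 70 of the 80 passing tests:
skipped = 70
selected = full_suite - skipped * t_pass
print(full_suite, selected)        # 18740 4740
saved = 1 - selected / full_suite
print(f"time saved: {saved:.0%}")  # time saved: 75%
```

The expensive tests (the passing ones) are exactly the ones selection tries to avoid running.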
Test Selection
136
Birchler et al. SANER 2022
Test Selection
137
Birchler et al. SANER 2022
Test Selection
138
https://github.com/ChristianBirchler/sdc-scissor
GitHub
coSt-effeCtIve teSt SelectOR
cost-effective test selector
Scissor
SDC-Scissor
➡Free academic license!
SDC-Scissor Use Case
139
Cost-effectiveness
140
[Bar chart: speed-up compared to a random-selection baseline, up to 170% (Dataset 1: Logistic, Naïve Bayes; Dataset 2: Logistic)]
Finding 1:
SDC-Scissor effectively speeds up the testing by 170%.
Finding 2:
Logistic and Naïve Bayes classifiers save the most time.
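The core idea behind SDC-Scissor is to learn, from static road features, whether a test is likely to fail in simulation, and then run only the likely failures. A tiny hand-rolled logistic regression sketch of that idea (the feature set, training data, and learning loop are illustrative, not the tool's actual implementation):

```python
# Sketch of feature-based test selection: predict failing tests from
# static road features with a tiny hand-rolled logistic regression.
import math

# (total_turn_angle, num_sharp_turns) -> 1 = test failed in simulation
train = [((0.2, 0), 0), ((0.4, 1), 0), ((2.5, 4), 1), ((3.1, 5), 1),
         ((0.9, 1), 0), ((2.8, 3), 1)]

w, b = [0.0, 0.0], 0.0
for _ in range(2000):                       # plain stochastic gradient descent
    for (x, y) in train:
        p = 1 / (1 + math.exp(-(w[0]*x[0] + w[1]*x[1] + b)))
        err = p - y
        w[0] -= 0.1 * err * x[0]
        w[1] -= 0.1 * err * x[1]
        b -= 0.1 * err

def likely_fails(features):
    z = w[0]*features[0] + w[1]*features[1] + b
    return 1 / (1 + math.exp(-z)) > 0.5

# Select only tests predicted to fail; skip the rest before simulating.
assert likely_fails((3.0, 4)) and not likely_fails((0.3, 0))
```

Since the features are computed from the road geometry alone, the prediction costs milliseconds while a skipped simulation would have cost minutes.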
Join the SDC-Scissor Community
141
sdc-scissor.readthedocs.io
Regression Testing
142
Minimization
Selection
Prioritization
Regression Testing
143
Minimization
Selection
Prioritization
Test Prioritization
144
Test Prioritization
145
Test Prioritization
146
Birchler et al. 2022 ACM TOSEM
Genetic Algorithm for Test Prioritization
147
Genetic Algorithm for Test Prioritization
148
Genetic Algorithm for Test Prioritization
149
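A permutation-based GA for test prioritization can be sketched in a few lines: an individual is an ordering of test IDs, and the fitness rewards orderings that reveal failures early per unit of simulation cost (an APFD/cost-style objective). The test data, fitness shape, and the tiny hill-climbing loop below are illustrative, not the paper's exact algorithm:

```python
# Sketch of permutation-based SDC test prioritization.
import random

tests = {            # id: (simulation_cost_seconds, fails?)
    "t1": (200, False), "t2": (137, True), "t3": (180, False),
    "t4": (140, True), "t5": (210, False),
}

def fitness(order):
    """Reward orderings that expose failing tests early and cheaply."""
    elapsed, found, score = 0.0, 0, 0.0
    fails_total = sum(1 for _cost, fails in tests.values() if fails)
    for tid in order:
        cost, fails = tests[tid]
        elapsed += cost
        found += fails
        score += (found / fails_total) / elapsed
    return score

random.seed(0)
population = [random.sample(list(tests), len(tests)) for _ in range(20)]
for _ in range(50):                       # tiny evolutionary loop
    parent = max(population, key=fitness)
    child = parent[:]
    i, j = random.sample(range(len(child)), 2)   # swap mutation
    child[i], child[j] = child[j], child[i]
    population.append(child)
    population.remove(min(population, key=fitness))

best = max(population, key=fitness)
print(best)  # failing tests (t2, t4) tend to move to the front
```

The multi-objective variant replaces the single score with separate objectives (e.g., fault detection vs. cost) and keeps a Pareto front instead of one best individual.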
Road Features
150
Single-objective Test Prioritization
151
Multi-objective Test Prioritization
152
Cost-effectiveness
153
Cost-effectiveness
154
SO-SDC-Prioritizer
MO-SDC-Prioritizer
Greedy
Regression Testing
155
Minimization
Selection
Prioritization
Regression Testing
156
Minimization
Selection
Prioritization
Test Minimization
157
Test Minimization
158
Scenario Minimization
159
Scenario Minimization
160
Scenario Minimization
161
Scenario Minimization
162
Scenario Minimization
163
Regression Testing
164
Minimization
Selection
Prioritization
Yoo et al. 2013
Regression Testing
165
Minimization
Selection
Prioritization
Future Work
Regression Testing
166
Yoo et al. 2013
Minimization
Selection
Prioritization
• Context & Motivation:
• Cyber-physical Systems
• DevOps and Artificial Intelligence
• Search-based Software Testing (SBST) Barriers:
• A success SE story
• Overcoming Barriers of SBST Adoption in Industry
• Cost-effective Testing for Self-Driving Cars (SDCs):
• Regression Testing
• Test Selection & Test Prioritization for SDCs
Summary
Test
Case
Selection
Initial
Tests
Search
Test
Execution
Variants
Generation
167
Thanks for the Attention!
• Any Questions?
“Testing with Fewer Resources:
Toward Adaptive Approaches for Cost-effective Test Generation and Selection”
June 22-24, 2022 - Córdoba, Spain
Christian Birchler
Zurich University of Applied Sciences
https://christianbirchler.github.io/
Sebastiano Panichella
Zurich University of Applied Sciences
https://spanichella.github.io/

More Related Content

Similar to Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective Test Generation and Selection

Similar to Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective Test Generation and Selection (20)

Cyber Physical Systems – Collaborating Systems of Systems
Cyber Physical Systems – Collaborating Systems of SystemsCyber Physical Systems – Collaborating Systems of Systems
Cyber Physical Systems – Collaborating Systems of Systems
 
Cyber Physical Systems – Collaborating Systems of Systems
Cyber Physical Systems – Collaborating Systems of SystemsCyber Physical Systems – Collaborating Systems of Systems
Cyber Physical Systems – Collaborating Systems of Systems
 
Cyber Physical System
Cyber Physical SystemCyber Physical System
Cyber Physical System
 
Cyber Physical System
Cyber Physical SystemCyber Physical System
Cyber Physical System
 
Opportunities and Challenges of Large-scale IoT Data Analytics
Opportunities and Challenges of Large-scale IoT Data AnalyticsOpportunities and Challenges of Large-scale IoT Data Analytics
Opportunities and Challenges of Large-scale IoT Data Analytics
 
Opportunities and Challenges of Large-scale IoT Data Analytics
Opportunities and Challenges of Large-scale IoT Data AnalyticsOpportunities and Challenges of Large-scale IoT Data Analytics
Opportunities and Challenges of Large-scale IoT Data Analytics
 
Engineering Large Scale Cyber-Physical Systems
Engineering Large Scale Cyber-Physical SystemsEngineering Large Scale Cyber-Physical Systems
Engineering Large Scale Cyber-Physical Systems
 
Se research update
Se research updateSe research update
Se research update
 
Engineering Large Scale Cyber-Physical Systems
Engineering Large Scale Cyber-Physical SystemsEngineering Large Scale Cyber-Physical Systems
Engineering Large Scale Cyber-Physical Systems
 
Se research update
Se research updateSe research update
Se research update
 
A review: Artificial intelligence and expert systems for cyber security
A review: Artificial intelligence and expert systems for cyber securityA review: Artificial intelligence and expert systems for cyber security
A review: Artificial intelligence and expert systems for cyber security
 
A review: Artificial intelligence and expert systems for cyber security
A review: Artificial intelligence and expert systems for cyber securityA review: Artificial intelligence and expert systems for cyber security
A review: Artificial intelligence and expert systems for cyber security
 
Different applications and security concerns in Iot by Jatin Akad
Different applications and security concerns in Iot by Jatin AkadDifferent applications and security concerns in Iot by Jatin Akad
Different applications and security concerns in Iot by Jatin Akad
 
Different applications and security concerns in Iot by Jatin Akad
Different applications and security concerns in Iot by Jatin AkadDifferent applications and security concerns in Iot by Jatin Akad
Different applications and security concerns in Iot by Jatin Akad
 
Trends and innovations in Embedded System Education
Trends and innovations in Embedded System EducationTrends and innovations in Embedded System Education
Trends and innovations in Embedded System Education
 
Trends and innovations in Embedded System Education
Trends and innovations in Embedded System EducationTrends and innovations in Embedded System Education
Trends and innovations in Embedded System Education
 
Internet of things_by_economides_keynote_speech_at_ccit2014_final
Internet of things_by_economides_keynote_speech_at_ccit2014_finalInternet of things_by_economides_keynote_speech_at_ccit2014_final
Internet of things_by_economides_keynote_speech_at_ccit2014_final
 
Internet of things_by_economides_keynote_speech_at_ccit2014_final
Internet of things_by_economides_keynote_speech_at_ccit2014_finalInternet of things_by_economides_keynote_speech_at_ccit2014_final
Internet of things_by_economides_keynote_speech_at_ccit2014_final
 
Digital Engineering A Short Overview
Digital Engineering A Short OverviewDigital Engineering A Short Overview
Digital Engineering A Short Overview
 
Digital Engineering A Short Overview
Digital Engineering A Short OverviewDigital Engineering A Short Overview
Digital Engineering A Short Overview
 

More from Sebastiano Panichella

Search-based Software Testing (SBST) '22
Search-based Software Testing (SBST) '22Search-based Software Testing (SBST) '22
Search-based Software Testing (SBST) '22
Sebastiano Panichella
 
NLBSE’22: Tool Competition
NLBSE’22: Tool CompetitionNLBSE’22: Tool Competition
NLBSE’22: Tool Competition
Sebastiano Panichella
 

More from Sebastiano Panichella (20)

The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software Engineering
 
Diversity-guided Search Exploration for Self-driving Cars Test Generation thr...
Diversity-guided Search Exploration for Self-driving Cars Test Generation thr...Diversity-guided Search Exploration for Self-driving Cars Test Generation thr...
Diversity-guided Search Exploration for Self-driving Cars Test Generation thr...
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation Track
 
SBFT Tool Competition 2024 - CPS-UAV Test Case Generation Track
SBFT Tool Competition 2024 - CPS-UAV Test Case Generation TrackSBFT Tool Competition 2024 - CPS-UAV Test Case Generation Track
SBFT Tool Competition 2024 - CPS-UAV Test Case Generation Track
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
 
COSMOS: DevOps for Complex Cyber-physical Systems
COSMOS: DevOps for Complex Cyber-physical SystemsCOSMOS: DevOps for Complex Cyber-physical Systems
COSMOS: DevOps for Complex Cyber-physical Systems
 
Automated Identification and Qualitative Characterization of Safety Concerns ...
Automated Identification and Qualitative Characterization of Safety Concerns ...Automated Identification and Qualitative Characterization of Safety Concerns ...
Automated Identification and Qualitative Characterization of Safety Concerns ...
 
The 2nd Intl. Workshop on NL-based Software Engineering
The 2nd Intl. Workshop on NL-based Software EngineeringThe 2nd Intl. Workshop on NL-based Software Engineering
The 2nd Intl. Workshop on NL-based Software Engineering
 
The 16th Intl. Workshop on Search-Based and Fuzz Testing
The 16th Intl. Workshop on Search-Based and Fuzz TestingThe 16th Intl. Workshop on Search-Based and Fuzz Testing
The 16th Intl. Workshop on Search-Based and Fuzz Testing
 
Simulation-based Test Case Generation for Unmanned Aerial Vehicles in the Nei...
Simulation-based Test Case Generation for Unmanned Aerial Vehicles in the Nei...Simulation-based Test Case Generation for Unmanned Aerial Vehicles in the Nei...
Simulation-based Test Case Generation for Unmanned Aerial Vehicles in the Nei...
 
Exposed! A case study on the vulnerability-proneness of Google Play Apps
Exposed! A case study on the vulnerability-proneness of Google Play AppsExposed! A case study on the vulnerability-proneness of Google Play Apps
Exposed! A case study on the vulnerability-proneness of Google Play Apps
 
Search-based Software Testing (SBST) '22
Search-based Software Testing (SBST) '22Search-based Software Testing (SBST) '22
Search-based Software Testing (SBST) '22
 
NL-based Software Engineering (NLBSE) '22
NL-based Software Engineering (NLBSE) '22NL-based Software Engineering (NLBSE) '22
NL-based Software Engineering (NLBSE) '22
 
NLBSE’22: Tool Competition
NLBSE’22: Tool CompetitionNLBSE’22: Tool Competition
NLBSE’22: Tool Competition
 
"An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021.
 "An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021.  "An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021.
"An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021.
 
An Empirical Investigation of Relevant Changes and Automation Needs in Modern...
An Empirical Investigation of Relevant Changes and Automation Needs in Modern...An Empirical Investigation of Relevant Changes and Automation Needs in Modern...
An Empirical Investigation of Relevant Changes and Automation Needs in Modern...
 
Search-Based Software Testing Tool Competition 2021 by Sebastiano Panichella,...
Search-Based Software Testing Tool Competition 2021 by Sebastiano Panichella,...Search-Based Software Testing Tool Competition 2021 by Sebastiano Panichella,...
Search-Based Software Testing Tool Competition 2021 by Sebastiano Panichella,...
 
A Framework for Multi-source Studies based on Unstructured Data.
A Framework for Multi-source Studies based on Unstructured Data.A Framework for Multi-source Studies based on Unstructured Data.
A Framework for Multi-source Studies based on Unstructured Data.
 
Revisiting Test Smells in Automatically Generated Tests: Limitations, Pitfall...
Revisiting Test Smells in Automatically Generated Tests: Limitations, Pitfall...Revisiting Test Smells in Automatically Generated Tests: Limitations, Pitfall...
Revisiting Test Smells in Automatically Generated Tests: Limitations, Pitfall...
 
Requirements-Collector: Automating Requirements Specification from Elicitatio...
Requirements-Collector: Automating Requirements Specification from Elicitatio...Requirements-Collector: Automating Requirements Specification from Elicitatio...
Requirements-Collector: Automating Requirements Specification from Elicitatio...
 

Recently uploaded

Recently uploaded (10)

Deciding The Topic of our Magazine.pptx.
Deciding The Topic of our Magazine.pptx.Deciding The Topic of our Magazine.pptx.
Deciding The Topic of our Magazine.pptx.
 
OC Streetcar Final Presentation-Downtown Santa Ana
OC Streetcar Final Presentation-Downtown Santa AnaOC Streetcar Final Presentation-Downtown Santa Ana
OC Streetcar Final Presentation-Downtown Santa Ana
 
The Influence and Evolution of Mogul Press in Contemporary Public Relations.docx
The Influence and Evolution of Mogul Press in Contemporary Public Relations.docxThe Influence and Evolution of Mogul Press in Contemporary Public Relations.docx
The Influence and Evolution of Mogul Press in Contemporary Public Relations.docx
 
Understanding Poverty: A Community Questionnaire
Understanding Poverty: A Community QuestionnaireUnderstanding Poverty: A Community Questionnaire
Understanding Poverty: A Community Questionnaire
 
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdf
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdfOracle Database Administration I (1Z0-082) Exam Dumps 2024.pdf
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdf
 
Breathing in New Life_ Part 3 05 22 2024.pptx
Breathing in New Life_ Part 3 05 22 2024.pptxBreathing in New Life_ Part 3 05 22 2024.pptx
Breathing in New Life_ Part 3 05 22 2024.pptx
 
Microsoft Fabric Analytics Engineer (DP-600) Exam Dumps 2024.pdf
Microsoft Fabric Analytics Engineer (DP-600) Exam Dumps 2024.pdfMicrosoft Fabric Analytics Engineer (DP-600) Exam Dumps 2024.pdf
Microsoft Fabric Analytics Engineer (DP-600) Exam Dumps 2024.pdf
 
DAY 0 8 A Revelation 05-19-2024 PPT.pptx
DAY 0 8 A Revelation 05-19-2024 PPT.pptxDAY 0 8 A Revelation 05-19-2024 PPT.pptx
DAY 0 8 A Revelation 05-19-2024 PPT.pptx
 
ServiceNow CIS-Discovery Exam Dumps 2024
ServiceNow CIS-Discovery Exam Dumps 2024ServiceNow CIS-Discovery Exam Dumps 2024
ServiceNow CIS-Discovery Exam Dumps 2024
 
ACM CHT Best Inspection Practices Kinben Innovation MIC Slideshare.pdf
ACM CHT Best Inspection Practices Kinben Innovation MIC Slideshare.pdfACM CHT Best Inspection Practices Kinben Innovation MIC Slideshare.pdf
ACM CHT Best Inspection Practices Kinben Innovation MIC Slideshare.pdf
 

Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective Test Generation and Selection

  • 1. “Testing with Fewer Resources: Toward Adaptive Approaches for Cost-e ff ective Test Generation and Selection” June 22-24, 2022 - Córdoba, Spain Sebastiano Panichella Zurich University of Applied Sciences https://spanichella.github.io/ Christian Birchler Zurich University of Applied Sciences https://christianbirchler.github.io/ International Summer School on Search- and Machine Learning-based Software Engineering
  • 2. 2 About the Speakers (Christian) https://christianbirchler.github.io/
  • 3. 2015 - Physics 2017 - Computer Science - Software Systems - Data Science 2021 - Research Assistant - Testing Self-driving Cars 3 About the Speakers (Christian)
  • 4. University of Sannio PhD Student June 2014 University of Salerno Master Student December 2010 University of Zurich Research Associate October 2014 - August 2018 Zurich University of Applied Science Senior Computer Science Researcher Since August 2018 4 2010 2014 2018 Today About the Speakers (Sebastiano)
  • 5. 5 Program Comprehension & Maintenance (PC&M) Generation of source code documentation Pro fi ling developers Dependencies analysis in software ecosystems Mobile computing (MC): - Machine Learning & Genetic Algorithms SummarizationTechniques for Code, Change, Testing and User Feedback PhD thesis: “Supporting Newcomers in Software Development Projects" Approved Project Approved Project Approved Project Development & Testing challenges: - Test case generation and assessment - User Feedback Analysis - Continuous Delivery CD anti-patterns Branch Coverage Prediction Documentation defects detection “Complex Systems” 2010 2014 2018 Today About the Speakers (Sebastiano)
  • 6. Outline • Context & Motivation: • Cyber-physical Systems • DevOps and Arti fi cial Intelligence • Search-based Software Testing (SBST) Barriers: • A success SE story • Overcoming Barriers of SBST Adoption in Industry • Cost-effective Testing for Self-Driving Cars (SDCs): • Regression Testing • Test Selection & Test Prioritization for SDCs Test Case Selection Initial Tests Search Test Execution Variants Generation 6
  • 7. Outline • Context & Motivation: • Cyber-physical Systems • DevOps and Arti fi cial Intelligence • Search-based Software Testing (SBST) Barriers: • A success SE story • Overcoming Barriers of SBST Adoption in Industry • Cost-effective Testing for Self-Driving Cars (SDCs): • Regression Testing • Test Selection & Test Prioritization for SDCs Test Case Selection Initial Tests Search Test Execution Variants Generation 7
• 8. Context “Our main research goal is to conduct industrial research, involving both industrial and academic collaborations, to sustain the Internet of Things (IoT) vision of future ‘smart cities’, with millions of smart systems connected over the internet, and/or controlled by complex embedded software implemented for the cloud.” 8
• 10. Context “Our main research goal is to conduct industrial research, involving both industrial and academic collaborations, to sustain the Internet of Things (IoT) vision of future ‘smart cities’, with millions of smart systems connected over the internet, and/or controlled by complex embedded software implemented for the cloud.” Next 10-15 Years (and beyond): 1) Cyber-physical Systems 2) Artificial Intelligence (AI) 3) DevOps, IoT, Automated Testing (AT) 10
  • 11. Sebastiano Panichella Sajad Khatiri Christian Birchler COSMOS: DevOps for Complex Cyber-physical Systems https://www.cosmos-devops.org/ https://twitter.com/COSMOS_DEVOPS
• 12. “Emerging Cyber-physical Systems (CPS) will play a crucial role in the quality of life of European citizens and the future of the European economy” COSMOS Context • CPS relevant sectors: • Healthcare • Automotive • Water Monitoring • Railway • Manufacturing • Avionics • etc. MEDICAL DELIVERY FOOD DELIVERY 12
  • 13. Industrial Use Cases AVIATION E-HEALTH WATER MONITORING SATELLITES AUTOMOTIVE RAILWAYS DRONES SELF-DRIVING CARS Reference Use Cases 13 COSMOS Use Cases
• 15. Background First aerodynamic flight on another planet. Landed with the Perseverance rover on 18 February 2021 SPACE EXPLORATION
• 16. Our (Software Engineering) view of DevOps and AI for IoT systems: • DevOps and Continuous Delivery (CD): What is it? • Present, Challenges, and Opportunities • Relevant Research Questions • Artificial Intelligence (AI) and Testing Automation: • Present, Challenges, and Opportunities • User-oriented Testing Automation • Relevant Research Questions “We all recognize the relevance and capacity of contemporary cyber-physical systems for building the future of our society, but ongoing research in the field is also clearly failing to take the right countermeasures to avoid that CPS usage affects human safety”. In “Self-driving Uber kills Arizona woman in first fatal crash involving pedestrian” Problem Statement “A simple software update was the direct cause of the fatal crashes of the Boeing 737” 16
  • 17. Question: What are the main Challenges of Testing Cyber-physical Systems? 17 Answers from the Audience (1)
  • 18. Question: What are the main Challenges of Testing Cyber-physical Systems? 18 Answers from the Audience (2)
• 19. Our (Software Engineering) view of DevOps and AI for IoT systems: • DevOps and Continuous Delivery (CD): What is it? • Present, Challenges, and Opportunities • Relevant Research Questions • Artificial Intelligence (AI) and Testing Automation: • Present, Challenges, and Opportunities • User-oriented Testing Automation • Relevant Research Questions “Self-driving Uber kills Arizona woman in first fatal crash involving pedestrian” “Swiss Post drone crashes in Zurich” Challenges “A simple software update was the direct cause of the fatal crashes of the Boeing 737” Challenge 1: Observability, testability, and predictability of the behavior of emerging CPS are highly limited and, unfortunately, their usage in the real world can lead to fatal crashes, sometimes tragically involving humans 19
• 20. Research Challenges and Opportunities As reported by the National Academies: [“A 21st Century Cyber-Physical Systems Education”] “today's practice of IoT system design and implementation are often unable to support the level of ‘complexity, scalability, security, safety, […]’ required to meet future needs” 20
• 21. Research Challenges and Opportunities As reported by the National Academies: [“A 21st Century Cyber-Physical Systems Education”] “The main problem is that contemporary development methodologies for CPS need to incorporate core aspects of both systems and software engineering communities, with the goal to explicitly embrace and consider the several direct and indirect physical effects of software” [“Complexity challenges in development of cyber-physical systems”] (Martin Törngren, Ulf Sellgren, pp. 478-503) “today's practice of IoT system design and implementation are often unable to support the level of ‘complexity, scalability, security, safety, […]’ required to meet future needs” 21 Crash of Boeing 737
• 22. Research Challenges and Opportunities As reported by the National Academies: [“A 21st Century Cyber-Physical Systems Education”] [“Complexity challenges in development of cyber-physical systems”] (Martin Törngren, Ulf Sellgren, pp. 478-503) “As identified by agile methodologies, the development of modern/emerging systems (e.g., e-health, automotive, satellite, and IoT manufacturing systems) should evolve with the systems, ‘as development never ends’” “today's practice of IoT system design and implementation are often unable to support the level of ‘complexity, scalability, security, safety, […]’ required to meet future needs” “The main problem is that contemporary development methodologies for CPS need to incorporate core aspects of both systems and software engineering communities, with the goal to explicitly embrace and consider the several direct and indirect physical effects of software” 22 Crash of Boeing 737 Tools
• 23. Research Challenges and Opportunities As reported by the National Academies: [“A 21st Century Cyber-Physical Systems Education”] [“Complexity challenges in development of cyber-physical systems”] (Martin Törngren, Ulf Sellgren, pp. 478-503) These concepts are closely related to DevOps and Artificial Intelligence technologies, and several researchers and practitioners advocate them as promising solutions for the development, maintenance, testing, and evolution of these complex systems “today's practice of IoT system design and implementation are often unable to support the level of ‘complexity, scalability, security, safety, […]’ required to meet future needs” “The main problem is that contemporary development methodologies for CPS need to incorporate core aspects of both systems and software engineering communities, with the goal to explicitly embrace and consider the several direct and indirect physical effects of software” “As identified by agile methodologies, the development of modern/emerging systems (e.g., e-health, automotive, satellite, and IoT manufacturing systems) should evolve with the systems, ‘as development never ends’” 23 Crash of Boeing 737 Tools
• 24. Research Challenges and Opportunities Challenge 1: Observability, testability, and predictability of the behavior of emerging CPS are highly limited and, unfortunately, their usage in the real world can lead to fatal crashes, sometimes tragically involving humans These concepts are closely related to DevOps and Artificial Intelligence technologies, and several researchers and practitioners advocate them as promising solutions for the development, maintenance, testing, and evolution of these complex systems Challenge 2: Contemporary DevOps and AI practices and tools are potentially the right solution to this problem, but they are not developed to be applied in CPS domains 24
• 25. • Context & Motivation: • Cyber-physical Systems • DevOps and Artificial Intelligence • Search-based Software Testing (SBST) Barriers: • An SE success story • Overcoming Barriers of SBST Adoption in Industry • Cost-effective Testing for Self-Driving Cars (SDCs): • Regression Testing • Test Selection & Test Prioritization for SDCs Outline Test Case Selection Initial Tests Search Test Execution Variants Generation 25
• 26. Traditional DevOps Pipeline ADSs • Generate Diversified Test Inputs (or Scenarios) • Evaluation based on Failure Detection Manual vs. Automated Testing (SBST) 26
• 27. Search-Based Software Testing (SBST) Loop: Initial Population, Selection, Crossover, Mutation, End? (YES/NO). The initial population is a set of randomly generated test cases. Selection (Fitness Function): we select the best, “fittest”, test cases for reproduction. Crossover: Single-Point Crossover. Mutation: randomly changes some genes (elements within each chromosome); mutation probability: each statement is mutated with prob = 1/n, where n = #statements 27
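The loop described on this slide can be sketched in a few lines: chromosomes are test inputs for the triangle example, fitness is a branch distance (0 means the target branch is reached), selection keeps the fitter parent, recombination is single-point crossover, and each gene mutates with probability 1/n. This is a minimal illustration; class and method names (`GaSketch`, `evolve`) are made up here and are not EvoSuite's API.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal sketch of the SBST loop: selection, single-point crossover,
// and per-gene mutation with probability 1/n (n = chromosome length).
public class GaSketch {
    static final Random RND = new Random(42); // fixed seed for repeatability

    // Branch distance for a target branch like (a == b && b == c):
    // 0 means covered, larger values mean "further away" from covering it.
    static int fitness(int[] sides) {
        return Math.abs(sides[0] - sides[1]) + Math.abs(sides[1] - sides[2]);
    }

    // Single-point crossover: head of one parent, tail of the other.
    static int[] crossover(int[] p1, int[] p2, int point) {
        int[] child = Arrays.copyOf(p1, p1.length);
        for (int i = point; i < p2.length; i++) child[i] = p2[i];
        return child;
    }

    // Each gene is mutated with probability 1/n.
    static int[] mutate(int[] c) {
        int[] m = Arrays.copyOf(c, c.length);
        for (int i = 0; i < m.length; i++)
            if (RND.nextDouble() < 1.0 / m.length) m[i] = RND.nextInt(100);
        return m;
    }

    // One generation step for a pair: select the fitter parent,
    // recombine it with the other, then mutate the offspring.
    static int[] evolve(int[] p1, int[] p2) {
        boolean firstFitter = fitness(p1) <= fitness(p2);
        int[] best = firstFitter ? p1 : p2;
        int[] other = firstFitter ? p2 : p1;
        return mutate(crossover(best, other, 1));
    }
}
```

Iterating `evolve` until `fitness` reaches 0 (or the search budget is consumed) mirrors the "End? YES/NO" loop on the slide.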
• 28. Search-Based Software Testing (SBST) Loop: Initial Population, Selection, Crossover, Mutation, End? (YES/NO). Class Under Test: class Triangle { void computeTriangleType() { if (isTriangle()){ if (side1 == side2) { if (side2 == side3) type = "EQUILATERAL"; else type = "ISOSCELES"; } else { if (side1 == side3) { type = "ISOSCELES"; } else { if (side2 == side3) type = "ISOSCELES"; else checkRightAngle(); } } }// if isTriangle() }} (statements numbered 1-10 on the slide) Goal: iterate until line 9 is covered or the search budget (running time or #iterations) is consumed 28
• 29. Search-Based Software Testing (SBST) Class Under Test: class Triangle { void computeTriangleType() { if (isTriangle()){ if (side1 == side2) { if (side2 == side3) type = "EQUILATERAL"; else type = "ISOSCELES"; } else { if (side1 == side3) { type = "ISOSCELES"; } else { if (side2 == side3) type = "ISOSCELES"; else checkRightAngle(); } } }// if isTriangle() }} (statements numbered 1-10 on the slide; figure: Control Flow Graph of the CUT) Goal: Covering as many code elements as possible. Branch coverage Targets = {<1,5>, <1,2>, <5,6>, <5,7>, <2,3>, <2,4>, <6,10>, <7,8>, <7,9>, <3,10>, <4,10>, <8,10>, <9,10>} Statement coverage Targets = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} Path coverage Targets = {<1,5,6,10>, <1,5,7,8,10>, <1,5,7,9,10>, <1,2,3,10>, <1,2,4,10>} 29
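These coverage targets can be checked mechanically. As a small sketch (helper names are illustrative, not from any tool): given the sequence of CFG nodes a test actually executed, the branch targets it covers are simply the consecutive edges of that path, and statement coverage is the fraction of distinct nodes visited.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: derive covered branch targets and statement coverage
// from one test's execution path through the control-flow graph.
public class CoverageSketch {
    // Each consecutive pair of executed nodes is a covered branch target.
    static Set<String> coveredBranches(int[] path) {
        Set<String> edges = new HashSet<>();
        for (int i = 0; i + 1 < path.length; i++)
            edges.add("<" + path[i] + "," + path[i + 1] + ">");
        return edges;
    }

    // Fraction of distinct statements (nodes) the path visits.
    static double statementCoverage(int[] path, int totalStatements) {
        Set<Integer> visited = new HashSet<>();
        for (int n : path) visited.add(n);
        return visited.size() / (double) totalStatements;
    }
}
```

For example, a test that executes path <1,5,7,9,10> covers the branch targets <1,5>, <5,7>, <7,9>, <9,10> and 5 of the 10 statements.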
  • 30. http://www.evosuite.org • Command Line • Eclipse Plugin • Maven Plugin • Measure Code Coverage •… 30 Successful SBST Stories
  • 31. Successful SBST Stories Generated Tests Production Code 31 https://github.com/EvoSuite/evosuite
  • 32. https://arstechnica.com/information-technology/2017/08/facebook-dynamic-analysis-software-sapienz/ “The Sapienz Project at Facebook” Successful SBST Stories Sapienz in action: https://youtu.be/j3eV8NiWLg4 32
• 33. Question: What are the main barriers to the adoption of SBST tools in practice (e.g., in industrial settings)? “Manual Testing is still Dominant in Industry…” 33 Answers from the Audience
• 34. SBST Barrier to Practical Adoption: Test Code Comprehension Are Generated Tests Helpful? (embedded paper first pages: “Modeling Readability to Improve Unit Tests”, Ermira Daka, José Campos, Gordon Fraser, Jonathan Dorn, Westley Weimer; “Does Automated White-Box Test Generation Really Help Software Testers?”, Gordon Fraser, Matt Staats, Phil McMinn, Andrea Arcuri, Frank Padberg) “Developers spend up to 50% of their time in understanding and analyzing the output of automatic tools.” Fraser et al. “Professional developers perceive generated test cases as hard to understand.” Daka et al. 34
  • 35. SBST Barrier to Practical Adoption: Test Code Comprehension Why? Class Name: Option.java Library: Apache Commons-Cli Q1: What are the main differences? Generated Tests Q2: Do they cover different parts of the code? 35
  • 36. SBST Barrier to Practical Adoption: Test Code Comprehension Why? Class Name: Option.java Library: Apache Commons-Cli Q1: What are the main differences? Generated Tests Q2: Do they cover different parts of the code? Candidate Assertions Q3: Are these assertions correct? Earl T. Barr, et al., “The Oracle Problem in Software Testing: A Survey”.IEEE Transactions on Software Engineering, 2015. 36
  • 37. SBST Barrier to Practical Adoption: Test Code Comprehension Why? Class Name: Option.java Library: Apache Commons-Cli Q1: What are the main differences? Generated Tests Q2: Do they cover different parts of the code? Q3: Are these assertions correct? Earl T. Barr, et al., “The Oracle Problem in Software Testing: A Survey”.IEEE Transactions on Software Engineering, 2015. 37
  • 38. SBST Barrier to Practical Adoption: Test Code Comprehension Are Generated Tests Helpful? G. Fraser et al., Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study, TOSEM 2015. 38 Automatically generated tests do not improve the ability of developers to detect faults when compared to manual testing.
  • 39. 39 ? SBST Barrier to Practical Adoption: Addressing Test Code Comprehension Test Case How to Generate Test Case Summary? Panichella et al. “The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation”. ICSE 2016
  • 40. How to Generate Test Case Summary? SBST Barrier to Practical Adoption: Addressing Test Code Comprehension Panichella et al. “The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation”. ICSE 2016 40 ? Generated Unit Test … with Descriptions 40
  • 41. How to Generate Test Case Summary? SBST Barrier to Practical Adoption: Addressing Test Code Comprehension ? Panichella et al. “The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation”. ICSE 2016 41 http://textcompactor.com/ 35 Intuition 41
• 42. Summary Generator Software Words Usage Model: deriving <actions>, <themes>, and <secondary arguments> from class, method, attribute, and variable identifiers E. Hill et al. Automatically capturing source code context of NL-queries for software maintenance and reuse. ICSE 2009 42
  • 43. Summary Generator SWUM in TestDescriber: Covered Code 43 43
  • 44. Summary Generator SWUM in TestDescriber: 1) Select the covered statements Covered Code 44 44
  • 45. SWUM in TestDescriber: 1) Select the covered statements 2) Filter out Java keywords, etc. Summary Generator Covered Code 45 45
  • 46. SWUM in TestDescriber: 1) Select the covered statements 2) Filter out Java keywords, etc. 3) Identifier Splitting (Camel case) Summary Generator Covered Code 46 46
  • 47. SWUM in TestDescriber: 1) Select the covered statements 2) Filter out Java keywords, etc. 3) Identifier Splitting (Camel case) 4) Abbreviation Expansion (using external vocabularies) Summary Generator Covered Code 47 47
• 48. SWUM in TestDescriber: 1) Select the covered statements 2) Filter out Java keywords, etc. 3) Identifier Splitting (Camel case) 4) Abbreviation Expansion (using external vocabularies) 5) Part-of-Speech tagger Summary Generator <actions> = Verbs <themes> = Nouns/Subjects <secondary arguments> = Nouns/objects, adjectives, etc. (figure: POS tags overlaid on the covered code) Covered Code 48
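Steps 2-4 of this pipeline are easy to sketch in code. The fragment below filters Java keywords, splits camel-case identifiers, and expands abbreviations; the keyword set and the two-entry abbreviation map are tiny illustrative stand-ins for the external vocabularies TestDescriber actually uses, and the class name is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of identifier preprocessing: keyword filtering, camel-case
// splitting, and abbreviation expansion (illustrative vocabularies).
public class IdentifierPrep {
    static final Set<String> KEYWORDS =
        Set.of("public", "void", "new", "int", "if", "else", "return");
    static final Map<String, String> ABBREV =
        Map.of("opt", "option", "desc", "description");

    // "computeTriangleType" -> [compute, triangle, type]
    static List<String> splitCamelCase(String id) {
        List<String> terms = new ArrayList<>();
        for (String t : id.split("(?<=[a-z0-9])(?=[A-Z])"))
            terms.add(t.toLowerCase());
        return terms;
    }

    static List<String> preprocess(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String tok : tokens) {
            if (KEYWORDS.contains(tok)) continue;          // step 2: drop keywords
            for (String term : splitCamelCase(tok))        // step 3: split identifiers
                out.add(ABBREV.getOrDefault(term, term));  // step 4: expand abbreviations
        }
        return out;
    }
}
```

The resulting term list is what the POS tagger (step 5) then labels as actions, themes, and secondary arguments.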
• 49. Summary Generator (figure: POS tags overlaid on the parsed code) The test case instantiates an "Option" with: - option equal to “...” - long option equal to “...” - it has no argument - description equal to “…” An option validator validates it The test exercises the following condition: - "Option" has no argument Natural Language Sentences Parsed Code 49
  • 50. The test case instantiates an "Option" with: - option equal to “...” - long option equal to “...” - it has no argument - description equal to “…” An option validator validates it The test exercises the following condition: - "Option" has no argument Natural Language Sentences Class Level Method Level Statement Level Branch Level Summarisation Levels 50 50
  • 51. The test case instantiates an "Option" with: - option equal to “...” - long option equal to “...” - it has no argument - description equal to “…” An option validator validates it The test exercises the following condition: - "Option" has no argument Natural Language Sentences 51 51 Summary Generator
  • 52. Summary Generator The test case instantiates an "Option" with: - option equal to “...” - long option equal to “...” - it has no argument - description equal to “…” An option validator validates it The test exercises the following condition: - "Option" has no argument Natural Language Sentences 52 52 Q2: Do Test Summaries Improve Test Readability? Q1: Do Test Summaries Help Developers find more bugs?
  • 53. Case Study Bug Fixing Tasks Involving 30 Developers 53 53
• 54. Context Subjects: 30 developers (23 researchers and 7 developers) Objects: two Java classes from Apache Commons Primitives and Math4J that have been used in previous studies on search-based software testing [by Fraser et al. TOSEM 2015]: ArrayIntList.java, Rational.java
  • 55. Bug Fixing Tasks Group 1 Group 2 ArrayIntList.java Rational.java ArrayIntList.java Rational.java 55
  • 58. Bug Fixing Tasks Group 1 Group 2 ArrayIntList.java Rational.java ArrayIntList.java Rational.java Comments Comments TestDescriber 58
  • 59. Bug Fixing Tasks Experiment conducted Offline via a Survey platform Each participant received the experiment package consisting of: 1. A pretest questionnaire 2. Instructions and materials to perform the experiment 3. A post-test questionnaire We did not reveal the goal of the study 45 minutes of time for each task 59
  • 60. Q1: How do test case summaries impact the number of bugs fixed by developers? 60 Comments
  • 61. Q1: How do test case summaries impact the number of bugs fixed by developers? 61 Comments Summary: Using automatically generated test case summaries significantly helps developers to identify and fix more bugs.
• 62. WITH Summaries: (i) 46% of participants consider the test cases as “easy to understand”. (ii) Only 18% of participants considered the test cases as incomprehensible. (chart: perceived test comprehensibility WITH and WITHOUT TestDescriber summaries) 62 Q2: Do Test Summaries Improve Test Readability? Comments
• 63. WITHOUT Summaries: (i) Only 15% of participants consider the test cases as “easy to understand”. (ii) 40% of participants considered the test cases as incomprehensible. WITH Summaries: (i) 46% of participants consider the test cases as “easy to understand”. (ii) Only 18% of participants considered the test cases as incomprehensible. (chart: perceived test comprehensibility WITH and WITHOUT TestDescriber summaries) 63 Q2: Do Test Summaries Improve Test Readability? Comments
• 64. WITHOUT Summaries: (i) Only 15% of participants consider the test cases as “easy to understand”. (ii) 40% of participants considered the test cases as incomprehensible. WITH Summaries: (i) 46% of participants consider the test cases as “easy to understand”. (ii) Only 18% of participants considered the test cases as incomprehensible. (chart: perceived test comprehensibility WITH and WITHOUT TestDescriber summaries) 64 Q2: Do Test Summaries Improve Test Readability? Comments Summary: Test summaries statistically improve the comprehensibility of automatically generated test cases according to human judgments.
• 65. 1) Using automatically generated test case summaries significantly helps developers to identify and fix more bugs. 2) Test summaries statistically improve the comprehensibility of automatically generated test cases according to human judgments. Panichella et al. “The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation”. ICSE 2016 65 SBST Barrier to Practical Adoption: Addressing Test Code Comprehension
  • 66. 66 SBST Barrier to Practical Adoption: Addressing Test Code Comprehension Other Studies Addressing this Open Problem… Daka et al. Generating unit tests with descriptive names or: would you name your children thing1 and thing2? ISSTA 2017 Generating unit tests with descriptive names Panichella et al. Revisiting Test Smells in Automatically Generated Tests: Limitations, Pitfalls, and Opportunities. ICSME 2020 Test Smells in Automatically Generated Tests
  • 67. SBST Barrier to Practical Adoption: Cost-effectiveness of Generated Tests Why? 67 class Triangle { int a, b, c; //sides String type = "NOT_TRIANGLE"; Triangle (int a, int b, int c){…} void computeTriangleType() { 1. if (a == b) { 2. if (b == c) 3. type = "EQUILATERAL"; else 4. type = "ISOSCELES"; } else { 5. if (a == c) { 6. type = "ISOSCELES"; } else { 7. if (b == c) 8. type = “ISOSCELES”; else 9. type = “SCALENE”; } } } Java Class Under Test (CUT) @Test public void test(){ Triangle t = new Triangle (1,2,3); t.computeTriangleType(); String type = t.getType(); assertTrue(type.equals(“SCALENE”)); } Test Case
  • 68. SBST Barrier to Practical Adoption: Cost-effectiveness of Generated Tests Why? 68 class Triangle { int a, b, c; //sides String type = "NOT_TRIANGLE"; Triangle (int a, int b, int c){…} void computeTriangleType() { 1. if (a == b) { 2. if (b == c) 3. type = "EQUILATERAL"; else 4. type = "ISOSCELES"; } else { 5. if (a == c) { 6. type = "ISOSCELES"; } else { 7. if (b == c) 8. type = “ISOSCELES”; else 9. type = “SCALENE”; } } } Java Class Under Test (CUT) @Test public void test(){ Triangle t = new Triangle (1,2,3); t.computeTriangleType(); String type = t.getType(); assertTrue(type.equals(“SCALENE”)); } Test Case Code Coverage: The main Quality Assessment Criteria
  • 69. SBST Barrier to Practical Adoption: Cost-effectiveness of Generated Tests Why? 69 class Triangle { int a, b, c; //sides String type = "NOT_TRIANGLE"; Triangle (int a, int b, int c){…} void computeTriangleType() { 1. if (a == b) { 2. if (b == c) 3. type = "EQUILATERAL"; else 4. type = "ISOSCELES"; } else { 5. if (a == c) { 6. type = "ISOSCELES"; } else { 7. if (b == c) 8. type = “ISOSCELES”; else 9. type = “SCALENE”; } } } Java Class Under Test (CUT) Software Quality Money Time Practical Constraints?
  • 70. Cost Effectiveness: example Class A Class B Class C Class D 70
  • 71. Test Generation Tool 1 TestClass A TestClass B Class A Class B Class C Class D Cost Effectiveness: example TestClass C TestClass D Test Generation Tool 2 TestClass A TestClass B TestClass C TestClass D BUG BUG 71
• 72. Test Generation Tool 1 TestClass A TestClass B Cost Effectiveness: example TestClass C TestClass D Test Generation Tool 2 TestClass A TestClass B TestClass C TestClass D BUG BUG Coverage 67% Coverage 66.5% We need COST-oriented models Manual v.s. Automated Testing 72
• 73. Test Generation Tool 1 TestClass A TestClass B Cost Effectiveness: example TestClass C TestClass D Test Generation Tool 2 TestClass A TestClass B TestClass C TestClass D BUG BUG Coverage 67% Coverage 66.5% We need COST-oriented models +20% Manual v.s. Automated Testing 73
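The comparison on these slides (two generated suites with near-identical coverage but different faults found) motivates a cost-oriented model. A minimal sketch of such a model, where the coverage values come from the slides but the cost figures and the coverage-per-cost metric are hypothetical, for illustration only:

```java
// Minimal sketch of a cost-oriented comparison of two generated test
// suites. Coverage values come from the slides; the cost figures and the
// coverage-per-cost metric are hypothetical.
public class CostEffectiveness {

    // Coverage gained per unit of cost (e.g., machine-hours spent
    // generating and executing the suite).
    public static double coveragePerCost(double coveragePercent, double costHours) {
        return coveragePercent / costHours;
    }

    public static void main(String[] args) {
        double tool1 = coveragePerCost(67.0, 4.0); // assume Tool 1 costs 4 h
        double tool2 = coveragePerCost(66.5, 2.0); // assume Tool 2 costs 2 h
        // Coverage alone favors Tool 1 (67% > 66.5%), but the
        // cost-oriented view favors Tool 2 (33.25 > 16.75 %/h).
        System.out.println("Tool 1: " + tool1 + " %/h, Tool 2: " + tool2 + " %/h");
    }
}
```

The point is not the specific metric but that a ranking by raw coverage and a ranking by coverage per cost can disagree.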
  • 74. 74 SBST Barrier to Practical Adoption: Cost-effective Generation of Tests Automatically generating tests with appropriate performance (CPU, memory, etc.) when deployed in different environments Grano et al. “Testing with Fewer Resources: An Adaptive Approach to Performance-Aware Test Case Generation”. TSE 2019 Further Performance Indicators… + It uses indicators of Test Coverage… ?
• 75. 75 Grano et al. “Testing with Fewer Resources: An Adaptive Approach to Performance-Aware Test Case Generation”. TSE 2019 Cached history information Kim et al. ICSE 2007 Change Metrics Moser et al. ICSE 2008. A metrics suite for object oriented design Chidamber et al. TSE 1994 Indicators of Complexity Cost-effective Generation of Tests
• 76. 76 “We needed indicators (or proxies) of test performance (CPU, memory, etc.)…” Grano et al. “Testing with Fewer Resources: An Adaptive Approach to Performance-Aware Test Case Generation”. TSE 2019 Cached history information Kim et al. ICSE 2007 Change Metrics Moser et al. ICSE 2008. A metrics suite for object oriented design Chidamber et al. TSE 1994 Indicators of Complexity Cost-effective Generation of Tests
• 77. pDynaMOSA (Adaptive Performance-Aware DynaMOSA), pDynaMOSA Pipeline 2) 1) First Criteria Second Criteria Yes No Yes No Coverage Performance SBST Barrier to Practical Adoption: Cost-effective Generation of Tests 77
  • 78. pDynaMOSA (Adaptive Performance-Aware DynaMOSA), pDynaMOSA Pipeline 2) 1) First Criteria Second Criteria Yes No Yes No Coverage Performance 78 SBST Barrier to Practical Adoption: Cost-effective Generation of Tests
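The pipeline above alternates between a first criterion (coverage) and a second one (performance). A simplified sketch of that two-criteria idea, reduced to a single selection step; the class names, fields, and tie-breaking rule are illustrative, not the actual EvoSuite/pDynaMOSA implementation:

```java
import java.util.Comparator;
import java.util.List;

// Simplified sketch of two-criteria selection: candidates are compared
// on coverage first (maximize) and, when coverage ties, on a performance
// proxy score (minimize). Illustrative only.
public class TwoCriteriaSelection {

    static final class Candidate {
        final String id;
        final double coverage;       // first criterion
        final double perfProxyScore; // second criterion (lower = cheaper)

        Candidate(String id, double coverage, double perfProxyScore) {
            this.id = id;
            this.coverage = coverage;
            this.perfProxyScore = perfProxyScore;
        }
    }

    static Candidate select(List<Candidate> pool) {
        return pool.stream()
                .max(Comparator.comparingDouble((Candidate c) -> c.coverage)
                        // cheaper test wins a coverage tie
                        .thenComparing(Comparator
                                .comparingDouble((Candidate c) -> c.perfProxyScore)
                                .reversed()))
                .orElseThrow();
    }

    // Small demo used below: t1 and t2 tie on coverage, t2 is cheaper.
    public static String demo() {
        List<Candidate> pool = List.of(
                new Candidate("t1", 0.67, 12.0),
                new Candidate("t2", 0.67, 5.0),
                new Candidate("t3", 0.60, 1.0));
        return select(pool).id;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "t2"
    }
}
```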
• 81. De Oliveira “Perphecy: Performance regression test selection made simple but effective,” (ICST 2017) I1. Number of executed loops (branches) (Higher loop cycle counts influence the runtime of the test case). 2) 1) First Criteria Second Criteria Yes No Yes No Coverage Performance I2. Number of (test and code) method calls (Fewer method calls result in shorter runtimes and lower heap memory usage due to potentially fewer object instantiations). I3. Number of object instantiations (not size) (reducing the number of instantiated objects may lead to decreased usage of heap memory - e.g., array dimensions). I4. Number of executed (test and code) statements (Statement execution frequency is a well-known proxy for runtime). I5. Test Length (LOC of test case) (Superset of I2 and I4) A set of static (test) and dynamic (prod. code) performance proxies that provide an approximation of the test execution costs (i.e., runtime and memory usage). SBST Barrier to Practical Adoption: Cost-effective Generation of Tests Second Criteria
• 82. De Oliveira “Perphecy: Performance regression test selection made simple but effective,” (ICST 2017) I1. Number of executed loops (branches) (Higher loop cycle counts influence the runtime of the test case). I2. Number of (test and code) method calls (Fewer method calls result in shorter runtimes and lower heap memory usage due to potentially fewer object instantiations). I3. Number of object instantiations (not size) (reducing the number of instantiated objects may lead to decreased usage of heap memory - e.g., array dimensions). I4. Number of executed (test and code) statements (Statement execution frequency is a well-known proxy for runtime). I5. Test Length (LOC of test case) (Superset of I2 and I4) SBST Barrier to Practical Adoption: Cost-effective Generation of Tests Dataset from SBST Community G. Fraser et al. “A large-scale evaluation of automated unit test generation using evosuite,” ACM Transactions on Software Engineering and Methodology (TOSEM). Evaluation
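The five proxies I1–I5 can be viewed as features aggregated into a single per-test cost estimate. A toy aggregation with equal weights; the weights are hypothetical, not values calibrated in the pDynaMOSA work:

```java
// Toy aggregation of the five performance proxies (I1–I5) into one score
// approximating a test's execution cost; lower means cheaper. Equal
// weights are a hypothetical simplification.
public class PerformanceProxies {

    public static double score(int executedLoops,        // I1
                               int methodCalls,          // I2
                               int objectInstantiations, // I3
                               int executedStatements,   // I4
                               int testLengthLoc) {      // I5
        return executedLoops + methodCalls + objectInstantiations
                + executedStatements + testLengthLoc;
    }

    public static void main(String[] args) {
        // A test with fewer loops, calls, and allocations scores lower,
        // i.e., it is estimated to be cheaper to run.
        System.out.println(score(10, 4, 2, 30, 8)); // 54.0
        System.out.println(score(2, 1, 1, 5, 3));   // 12.0
    }
}
```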
• 83. Q1. (Effectiveness) What is the target coverage achieved by pDynaMOSA compared to DynaMOSA? Small gain in terms of coverage… 83 SBST Barrier to Practical Adoption: Cost-effective Generation of Tests
• 84. We may lose in terms of fault detection… Q2. (Fault Detection) What is the mutation score achieved by pDynaMOSA compared to DynaMOSA? Small gain in terms of coverage… 84 SBST Barrier to Practical Adoption: Cost-effective Generation of Tests
• 85. Small gain in terms of coverage… Q3. (Performance) Does the adoption of performance proxies lead to shorter runtime and lower heap memory consumption? Huge benefits in terms of runtime and memory consumption… “Statistically rigorous java performance evaluation,” OOPSLA ’07. 85 SBST Barrier to Practical Adoption: Cost-effective Generation of Tests We may lose in terms of fault detection…
• 86. • Context & Motivation: • Cyber-physical Systems • DevOps and Artificial Intelligence • Search-based Software Testing (SBST) Barriers: • A success SE story • Overcoming Barriers of SBST Adoption in Industry • Cost-effective Testing for Self-Driving Cars (SDCs): • Regression Testing • Test Selection & Test Prioritization for SDCs Outline Test Case Selection Initial Tests Search Test Execution Variants Generation 86
• 87. Tesla Car Autonomous Driving Systems (ADSs) Multi-sensing Systems: • Autonomous systems capture surrounding environmental data at run-time via multiple sensors (e.g. camera, radar, lidar) as inputs • Process these data with Deep Neural Networks (DNNs) and output control decisions (e.g. steering) • Require robust testing that creates realistic, diverse test cases 87
• 88. Traffic Sign Recognition (TSR) Pedestrian Protection (PP) Lane Departure Warning (LDW) Automated Emergency Braking (AEB) Environmental Data Collection With ADSs Sensors 88
• 89. . . . Driving Actions Sensors / Camera Autonomous Feature Actuator 89 Environmental Data Collection With ADSs Sensors 1. Pedestrians 2. Lane Position 3. Traffic Signs 4. Other Cars DNNs • steering • stop • acceleration/deceleration • …
  • 90. ADSs 90 Traditional DevOps Pipeline ADSs “Manual Testing is still Dominant…”
• 91. Testing Steps in ADSs 91 Requirements of Testing ADSs • Generate Diversified Test Inputs (or Scenarios) • Evaluation-based Failure Detection “Manual Testing is still Dominant…”
  • 92. 92 class Triangle { int a, b, c; //sides String type = "NOT_TRIANGLE"; Triangle (int a, int b, int c){…} void computeTriangleType() { 1. if (a == b) { 2. if (b == c) 3. type = "EQUILATERAL"; else 4. type = "ISOSCELES"; } else { 5. if (a == c) { 6. type = "ISOSCELES"; } else { 7. if (b == c) 8. type = “ISOSCELES”; else 9. type = “SCALENE”; } } } Java Class Under Test (CUT) @Test public void test(){ Triangle t = new Triangle (1,2,3); t.computeTriangleType(); String type = t.getType(); assertTrue(type.equals(“SCALENE”)); } Test Case Traditional Development Pipeline: Coding v.s. Testing
  • 93. 93 class Triangle { int a, b, c; //sides String type = "NOT_TRIANGLE"; Triangle (int a, int b, int c){…} void computeTriangleType() { 1. if (a == b) { 2. if (b == c) 3. type = "EQUILATERAL"; else 4. type = "ISOSCELES"; } else { 5. if (a == c) { 6. type = "ISOSCELES"; } else { 7. if (b == c) 8. type = “ISOSCELES”; else 9. type = “SCALENE”; } } } Java Class Under Test (CUT) @Test public void test(){ Triangle t = new Triangle (1,2,3); t.computeTriangleType(); String type = t.getType(); assertTrue(type.equals(“SCALENE”)); } Test Case Code Coverage: The main Quality Assessment Criteria Traditional Development Pipeline: Coding v.s. Testing
• 94. 94 class Triangle { int a, b, c; //sides String type = "NOT_TRIANGLE"; Triangle (int a, int b, int c){…} void computeTriangleType() { 1. if (a == b) { 2. if (b == c) 3. type = "EQUILATERAL"; else 4. type = "ISOSCELES"; } else { 5. if (a == c) { 6. type = "ISOSCELES"; } else { 7. if (b == c) 8. type = “ISOSCELES”; else 9. type = “SCALENE”; } } } Java Class Under Test (CUT) @Test public void test(){ Triangle t = new Triangle (1,2,3); t.computeTriangleType(); String type = t.getType(); assertTrue(type.equals(“SCALENE”)); } Test Case Traditional Development Pipeline: Coding v.s. Testing Code Coverage: Not Sufficient as Quality Assessment Criteria
  • 95. Challenges of Testing ADSs 95 Challenge 1: Code coverage vs. Scenario Coverage Challenge 2: Code coverage & CPU & Memory consumption Challenge 3: Unit-Test v.s. System-level Testing
• 96. 96 Testing Target: Feature Interaction Failures
  • 98. Testing Autonomous Driving Systems 98 World of Agile, 2018
  • 102. 102 SRF Swiss Post drone crashed into a lake Testing Autonomous Driving Systems
  • 103. 103 SRF NZZ Swiss Post drone crashed in Zurich Swiss Post drone crashed into a lake Testing Autonomous Driving Systems
  • 105. 105 npr, January 2022 Testing Autonomous Driving Systems
  • 106. 106 World of Agile, 2018 npr, January 2022 Reuters, September 2021 Testing Autonomous Driving Systems
  • 107. 107 World of Agile, 2018 The New York Times, April 2021 npr, January 2022 Reuters, September 2021 Testing Autonomous Driving Systems
  • 116. 116 Testing Autonomous Driving Systems Real-world testing: ➡Realistic ➡Trustworthy ➡Costly ➡Nondeterministic
  • 117. Simulation is: ➡Cheaper ➡Faster ➡Less reliable ➡Complex CI/CD integration 117 Testing Autonomous Driving Systems Real-world testing: ➡Realistic ➡Trustworthy ➡Costly ➡Nondeterministic
• 118. Regression Testing 118 “Regression testing is re-running functional and non-functional tests to ensure that previously developed and tested software still performs after a change.” Anirban Basu 2015
  • 120. Regression Testing 120 Yoo et al. 2013 Selection
  • 121. Regression Testing 121 Yoo et al. 2013 Selection Prioritization
  • 122. Regression Testing 122 Yoo et al. 2013 Minimization Selection Prioritization
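The taxonomy above (Yoo et al. 2013) distinguishes minimization, selection, and prioritization. Two of the three can be illustrated on a toy suite where each test is mapped to the requirements (or components) it covers; the data and heuristics below are made up for illustration, and minimization (dropping tests whose coverage is subsumed by others) is omitted for brevity:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration of regression test selection and prioritization.
public class RegressionTechniques {

    // Selection: keep only tests touching the changed components.
    public static List<String> select(Map<String, Set<String>> suite, Set<String> changed) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : suite.entrySet())
            if (!Collections.disjoint(e.getValue(), changed))
                out.add(e.getKey());
        return out;
    }

    // Prioritization: order tests so broader-covering ones run first.
    public static List<String> prioritize(Map<String, Set<String>> suite) {
        List<String> out = new ArrayList<>(suite.keySet());
        out.sort(Comparator.comparingInt((String t) -> suite.get(t).size()).reversed());
        return out;
    }

    public static Map<String, Set<String>> sampleSuite() {
        Map<String, Set<String>> suite = new LinkedHashMap<>();
        suite.put("t1", Set.of("A", "B"));
        suite.put("t2", Set.of("C"));
        suite.put("t3", Set.of("A"));
        return suite;
    }

    public static void main(String[] args) {
        System.out.println(select(sampleSuite(), Set.of("A"))); // [t1, t3]
        System.out.println(prioritize(sampleSuite()));          // [t1, t2, t3]
    }
}
```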
  • 128. Test Selection for Self-driving Cars 128 How does a test look like?
  • 129. Test Selection for Self-driving Cars 129 How does a test look like?
  • 130. Test Selection for Self-driving Cars 130 How does a test look like?
  • 131. Test Selection for Self-driving Cars 131
  • 132. Test Selection for Self-driving Cars 132 I can keep the lane!
  • 133. Test Selection for Self-driving Cars 133 Uuups… I am not that good!
  • 134. Test Selection for Self-driving Cars 134
• 135. Testing Costs 135 [Chart: proportion of passing vs. failing tests and their simulation times, e.g., 200 s vs. 137 s]
• 138. Test Selection 138 SDC-Scissor: SDC coSt-effeCtIve teSt SelectOR https://github.com/ChristianBirchler/sdc-scissor (GitHub) ➡Free academic license!
• 140. Cost-effectiveness 140 [Bar chart: speed-up over a random-selection baseline for Dataset 2 (Logistic), Dataset 1 (Naïve Bayes), Dataset 1 (Logistic); scale 0%–170%] Finding 1: SDC-Scissor effectively speeds up the testing by 170%. Finding 2: Logistic and Naïve Bayes classifiers save the most time.
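The selection step behind these findings works roughly as follows: a classifier trained on static road features predicts, before any simulation runs, whether a test is likely to fail, and only likely-failing tests are executed. A hedged sketch of that decision step; the feature names and weights below are hypothetical, whereas SDC-Scissor itself trains its classifiers (e.g., Logistic Regression, Naïve Bayes) on labeled past executions:

```java
// Sketch of the selection idea: predict from static road features whether
// a simulation-based test is likely to fail, and execute only those.
// Feature names and weights are hypothetical, not the tool's real model.
public class FailurePredictor {

    // Hypothetical learned weights for [bias, numTurns, maxCurvature].
    static final double[] W = {-3.0, 0.4, 2.5};

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Logistic-regression-style decision: select the test (i.e., run the
    // simulation) only if the predicted failure probability exceeds 0.5.
    public static boolean likelyToFail(double numTurns, double maxCurvature) {
        double z = W[0] + W[1] * numTurns + W[2] * maxCurvature;
        return sigmoid(z) > 0.5;
    }

    public static void main(String[] args) {
        System.out.println(likelyToFail(2, 0.2)); // gentle road -> false
        System.out.println(likelyToFail(8, 0.9)); // sharp road  -> true
    }
}
```

Skipping the tests predicted to pass is exactly where the reported time savings come from, at the risk of missing some true failures.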
  • 141. Join the SDC-Scissor Community 141 sdc-scissor.readthedocs.io
  • 146. Test Prioritization 146 Birchler et al. 2022 ACM TOSEM
  • 147. Genetic Algorithm for Test Prioritization 147
  • 148. Genetic Algorithm for Test Prioritization 148
  • 149. Genetic Algorithm for Test Prioritization 149
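The genetic algorithm sketched on these slides can be outlined as follows: individuals encode an ordering (permutation) of test IDs, and the fitness rewards placing high-value tests, e.g. those more likely to expose a failure, early in the schedule. The (1+1)-style loop, swap mutation, and per-test values here are illustrative simplifications, not the exact setup of Birchler et al.:

```java
import java.util.Arrays;
import java.util.Random;

// Minimal genetic-algorithm sketch for test-case prioritization.
public class GaPrioritization {

    // Hypothetical per-test "failure likelihood" used as value.
    static final double[] VALUE = {0.9, 0.1, 0.5, 0.7};
    static final Random RND = new Random(42);

    // Higher fitness when high-value tests sit in earlier positions.
    public static double fitness(int[] order) {
        double f = 0.0;
        for (int pos = 0; pos < order.length; pos++)
            f += VALUE[order[pos]] / (pos + 1);
        return f;
    }

    static int[] swapMutate(int[] order) {
        int[] child = order.clone();
        int i = RND.nextInt(child.length);
        int j = RND.nextInt(child.length);
        int tmp = child[i]; child[i] = child[j]; child[j] = tmp;
        return child;
    }

    // (1+1)-style evolution: keep a mutated ordering only if it improves.
    public static int[] evolve(int generations) {
        int[] best = {0, 1, 2, 3};
        for (int g = 0; g < generations; g++) {
            int[] child = swapMutate(best);
            if (fitness(child) > fitness(best)) best = child;
        }
        return best;
    }

    public static void main(String[] args) {
        // Tends toward ordering tests by descending value, i.e. [0, 3, 2, 1].
        System.out.println(Arrays.toString(evolve(200)));
    }
}
```

A full GA would use a population with crossover-safe permutation operators; the hill-climbing loop above keeps the sketch short while preserving the encoding and fitness idea.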
  • 165. Yoo et al. 2013 Regression Testing 165 Minimization Selection Prioritization Future Work
  • 166. Regression Testing 166 Yoo et al. 2013 Minimization Selection Prioritization
• 167. • Context & Motivation: • Cyber-physical Systems • DevOps and Artificial Intelligence • Search-based Software Testing (SBST) Barriers: • A success SE story • Overcoming Barriers of SBST Adoption in Industry • Cost-effective Testing for Self-Driving Cars (SDCs): • Regression Testing • Test Selection & Test Prioritization for SDCs Summary Test Case Selection Initial Tests Search Test Execution Variants Generation 167
• 168. Thanks for the Attention! • Any Questions? “Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective Test Generation and Selection” June 22-24, 2022 - Córdoba, Spain Christian Birchler Zurich University of Applied Sciences https://christianbirchler.github.io/ Sebastiano Panichella Zurich University of Applied Sciences https://spanichella.github.io/