Andy Zaidman
Delft University of Technology
The Netherlands
International Conference on
Automation of Software Test
(AST 2023)
May 16th, 2023
Melbourne, Australia
A story on testing… and a bit of music
Human-in-the-Loop important in testing
- Sigrid Eldh, during AST 2023 on Monday May 15th
Automatic for the people?
Automatic by the people?
Automatic by the machine?
Automatic for the machine?
Automatic for the people
8. Ignoreland
What do we know about developers’ testing activities in the IDE?
2443 SW Engineers – 118 countries
Java/C# – Eclipse, IntelliJ, Android Studio, Visual Studio
Minimum 5-month observation period (max. 2.5 years)
161 person years of development work
Not all Java projects do unit testing (in the IDE)
1 498 of 3 508 (43%): write/look at test code or execute tests
Time spent on test code engineering
Estimated when installing tool: 47% test code engineering - 53% production code engineering
After measuring min. 5 months: 25% test code engineering - 75% production code engineering
Overestimating the testing effort stems from:
(1) the tedious nature of testing
(2) developers disliking it
10. Man on the Moon
Lyrics: Now Andy did you hear about this one?
…
If you believed they put a man on the moon
…
High Coverage & Enormous Potential
Difficult to Read Test Cases & Are They Asserting Correctness?
Tools like TestDescriber
Key issue: understanding the scenario under test
Which test makes more sense? Which scenario is easier to grasp?
Do tests need to be UNDERSTANDABLE?
A good test can catch a bug and return feedback that helps you identify the issue.
A good developer test forms executable documentation that tells you how to use the methods.
Understandability of a test (scenario) is important
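
To make the contrast concrete, here is a minimal sketch of the same scenario written twice: once in the style typical of tool-generated tests, once as a readable developer test. The `ShoppingCart` class, its methods, and all values are hypothetical and invented for this illustration; they do not come from the talk or from any specific tool's output.

```python
import unittest


class ShoppingCart:
    """Hypothetical class under test, invented for this illustration."""

    def __init__(self):
        self._items = {}

    def add(self, name, unit_price, quantity=1):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        _, count = self._items.get(name, (unit_price, 0))
        self._items[name] = (unit_price, count + quantity)

    def total(self):
        return sum(price * count for price, count in self._items.values())


class GeneratedStyleTest(unittest.TestCase):
    # Shape often seen in generated tests: opaque names, arbitrary values.
    def test0(self):
        shoppingCart0 = ShoppingCart()
        shoppingCart0.add("x!7", 3, 2)
        shoppingCart0.add("x!7", 3, 1)
        int0 = shoppingCart0.total()
        self.assertEqual(9, int0)


class ReadableTest(unittest.TestCase):
    # Same scenario; the name and values say what is being checked.
    def test_adding_same_item_twice_accumulates_quantity(self):
        cart = ShoppingCart()
        cart.add("apple", unit_price=3, quantity=2)
        cart.add("apple", unit_price=3, quantity=1)
        self.assertEqual(cart.total(), 9)
```

Both tests exercise identical behaviour and both pass (for example via `python -m unittest`), but only the second one documents the scenario and would point a developer toward the cause of a failure.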
What is the purpose of a generated test?
• Throw away; find faults in the current version of the software
• Inspiration to write a manually written test case
• To become part of a maintained test suite
• A more specific generated test, e.g., a crash-replicating test
6. Sweetness Follows
Some solution spaces…
• Test amplification
• Test generation with information carving
• Documenting generated tests
• Better support for manual writing of test cases
• …
Test amplification
• Starts from existing test cases
• Applies systematic “mutations” to test code to see whether more coverage is obtained and/or different scenarios are tested
+ Easier to understand test scenarios compared to freshly generated tests
- Difficult to cover entire search space, depends on existing tests
- Inspection cost
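
The amplification loop described above can be sketched in a few lines. This is a deliberately simplified illustration, not the approach of any particular tool: it mutates only integer inputs of an existing test instead of rewriting test code, and it measures line coverage with Python's `sys.settrace`. The function under test (`classify`) and the helpers `covered_lines`, `mutate_int`, and `amplify` are hypothetical names introduced for this sketch.

```python
import sys


def classify(n):
    """Hypothetical function under test."""
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def covered_lines(func, *args):
    """Run func(*args); return (result, line numbers executed inside func)."""
    lines = set()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is code:
            if event == "line":
                lines.add(frame.f_lineno)
            return tracer  # keep tracing lines of this frame
        return None  # ignore all other frames

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return result, lines


def mutate_int(value):
    """Systematic mutations of an integer input taken from an existing test."""
    return [value + 1, value - 1, -value, 0]


def amplify(func, seed_inputs):
    """Keep mutated inputs that execute lines the existing tests do not reach."""
    covered = set()
    for seed in seed_inputs:
        _, lines = covered_lines(func, seed)
        covered |= lines

    amplified = []
    for seed in seed_inputs:
        for candidate in mutate_int(seed):
            result, lines = covered_lines(func, candidate)
            if lines - covered:  # candidate adds coverage
                covered |= lines
                # The observed output becomes the proposed assertion; a developer
                # still has to inspect whether it is the intended behaviour.
                amplified.append((candidate, result))
    return amplified
```

Starting from a single existing test input, `amplify(classify, [5])` returns `[(-5, "negative"), (0, "zero")]`: two new scenarios derived from the original one. Deciding whether those observed outputs are correct is the inspection cost listed above.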
Test Impact Graph
Test generation with information carving
Carving to explore the search space vs. carving to improve the understandability of tests
MicroTestCarver approach
EvoSuite
MicroTestCarver
Documenting generated tests
Better support for manual writing of test cases
4. Everybody Hurts
When are developers discouraged from testing?
When do developers aspire to test or become better testers?
Emerging theory
1. We observe 13 developers thinking aloud while testing methods in open-source software
2. We challenge and augment our findings by surveying 72 software developers
Main test case engineering strategies of software engineers
1. intensively guided by documentation,
2. intensively guided by source code,
3. or ad-hoc.
Recommendations
1. Tool support
• Creation of test skeletons
• Quick code coverage indications
• Copy/paste support for test code
2. Developers
• Have a clear adequacy criterion to avoid uncertainty
3. Education
• Teach how to use code coverage tools to steer testing, not just as a metric
Come see our work on SW testing education
at SEENG in the next session!
12. Follow the river
Where does it leave us?
We are going to have to get our hands dirty…
Whether we like it or not…
Test cost Inspection cost
“You can’t test software with your hands in your pants”
We cannot neglect testing
Great tools
We should be obsessive in improving the user experience (UX)
We should be obsessive in improving the developer experience (DX)
We should be obsessive in improving the tester experience (TX)
Automatic for the people, by the machine and the people
All great work by… and many more
Team
Thank you!
azaidman
Some pointers
• Maurício Aniche, Christoph Treude, Andy Zaidman. How Developers Engineer Test Cases: An Observational Study; IEEE Trans. on Software Engineering, 2022.
• Moritz Beller, Georgios Gousios, Annibale Panichella, Sebastian Proksch, Sven Amann, Andy Zaidman. Developer Testing in The IDE: Patterns, Beliefs, And Behavior; IEEE Trans. on Software Engineering, 2019.
• Sebastiano Panichella, Annibale Panichella, Moritz Beller, Andy Zaidman, Harald Gall. The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation; ICSE 2016.
• Carolin Brandt, Andy Zaidman. Developer-Centric Test Amplification: The Interplay Between Automatic Generation and Human Exploration; Empirical Software Engineering, 2022.
• Mark Swillus, Andy Zaidman. Sentiment Overflow in the Testing Stack: Analysing Software Testing Posts on Stack Overflow; arXiv, 2022.
