3. Test Amplification
Test amplification consists of the automatic transformation of an existing manually written test suite, to enhance a specific, measurable property.
9. Program Synthesis
Program synthesis is the task of automatically finding programs from the underlying programming language that satisfy user intent expressed in some form of constraints.
10. Test Amplification is code generation
• The intention: increasing the coverage
• The input: Program under test + Test suite
• The output: A new test suite
12. They can infer types
V. J. Hellendoorn, C. Bird, E. T. Barr, and M. Allamanis, “Deep learning type inference,” in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering.
13. They can understand code
M. Allamanis, M. Brockschmidt, and M. Khademi, “Learning to represent programs with graphs,” in International Conference on Learning Representations, 2018.
18. References
• [Danglot 2018] Benjamin Danglot, Oscar Vera-Perez, Zhongxing Yu, Andy Zaidman, Martin Monperrus, and
Benoit Baudry. 2018. A Snowballing Literature Study on Test Amplification. arXiv paper 1705.10692v2 (2018).
• [Danglot 2019] Benjamin Danglot, Oscar Luis Vera-Pérez, Benoit Baudry, and Martin Monperrus. 2019.
Automatic Test Improvement with DSpot: a Study with Ten Mature Open-Source Projects. Empirical Software
Engineering, Springer Verlag (2019).
• [Abdi-IWST19] Mehrdad Abdi, Henrique Rocha, and Serge Demeyer. 2019. Test amplification in the Pharo
Smalltalk ecosystem. IWST (2019).
• [Gulwani 2017] S. Gulwani, O. Polozov and R. Singh. Program Synthesis. Foundations and Trends® in
Programming Languages, vol. 4, no. 1-2, pp. 1–119, 2017.
• [Xie2006] T. Xie. Augmenting Automatically Generated Unit-test Suites with Regression Oracle Checking. In
Proceedings of the 20th European Conference on Object-Oriented Programming, pages 380–403, 2006.
19. Adopting Program Synthesis for Test Amplification
Mehrdad Abdi (mehrdad.abdi@uantwerpen.be)
Henrique Rocha (henrique.rocha@uantwerpen.be)
Serge Demeyer (serge.demeyer@uantwerpen.be)
Editor's Notes
Here, I'm going to talk about my rough idea:
the opportunity of using program synthesis in test generation. It's in the early steps of research, and I would like to hear your feedback about it.
We began with test amplification.
-----
We have a program, or unit under test,
and a test suite that is written by a human, a developer.
The test suite covers part of the unit under test.
You can see the coverage in blue.
----
We are going to generate a new test suite by transforming the existing one, so that the test coverage increases.
---
You can see the increase in red.
Formally, we define amplification like this.
[read the def]
Here is an example of how we can amplify a test suite.
We have the existing test suite. You can see it in the top left.
First, we strip the assertion statements from it. (step 1)
Then, we apply small changes to it, to produce many versions of it. (step 2)
Then, we insert an observation mechanism into each test to capture the state of the object under test during execution. (step 3)
Then, we run the tests and collect the state of the object under test. (step 4)
Using this information, we generate assert statements. (step 5)
And finally, we choose the tests that increase the coverage. (step 6)
We repeat this loop N times.
----
The steps in green are called input amplification.
----
And the steps in red are called assertion amplification.
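The six steps can be sketched in a few lines of Python. This is only a minimal illustration, not how DSpot or SmallAmp are implemented: the `Account` class, the hand-made `hits` set that stands in for a coverage tool, and the names `run`, `mutate` and `amplify` are all assumptions made up for this sketch.

```python
import random


class Account:
    """Hypothetical unit under test; `hits` is a hand-made stand-in for a coverage tool."""

    def __init__(self, hits):
        self.balance = 0
        self.hits = hits

    def deposit(self, amount):
        self.hits.add("deposit")
        self.balance += amount

    def withdraw(self, amount):
        if amount <= self.balance:
            self.hits.add("withdraw-ok")
            self.balance -= amount
        else:
            self.hits.add("withdraw-refused")


def run(calls):
    """Steps 3-4: execute the calls, return (covered branches, observed state)."""
    hits = set()
    account = Account(hits)
    for method, argument in calls:
        getattr(account, method)(argument)
    return hits, {"balance": account.balance}


def mutate(calls, rng):
    """Step 2: one small random change, either a new literal or an extra call."""
    calls = list(calls)
    if rng.random() < 0.5:
        index = rng.randrange(len(calls))
        calls[index] = (calls[index][0], rng.randint(0, 100))
    else:
        calls.append((rng.choice(["deposit", "withdraw"]), rng.randint(0, 100)))
    return calls


def amplify(suite, iterations=3, variants=20, seed=0):
    """`suite` is a list of (calls, expected_state) pairs written by a developer."""
    rng = random.Random(seed)
    covered = set()
    for calls, _ in suite:
        covered |= run(calls)[0]
    pool = [calls for calls, _ in suite]  # step 1: keep the inputs, drop the assertions
    amplified = []
    for _ in range(iterations):  # "we repeat this loop N times"
        candidates = [mutate(rng.choice(pool), rng) for _ in range(variants)]
        for calls in candidates:
            hits, state = run(calls)
            if not hits <= covered:  # step 6: keep only tests that add coverage
                covered |= hits
                amplified.append((calls, state))  # step 5: observed state becomes the oracle
                pool.append(calls)
    return amplified, covered
```

Starting from a single test that deposits 10 and withdraws 5, the loop has to find a test that reaches the refused-withdrawal branch, and the balance it observes there becomes the new assertion.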
This idea is implemented in a tool called DSpot, and we replicated it in the Pharo ecosystem, in the Smalltalk language. We call it SmallAmp.
SmallAmp is open source, and it's under development.
After implementing SmallAmp, we saw that there are some areas for improvement.
I will talk about these areas in the next slides.
In a dynamic language like Pharo, we don't have access to type information until the code is executed.
For example, for the arguments of a method, in static analysis we see them like this. What's anArray? What's aNewColumnName?
When do we need type information?
1) We need it to generate assertions. Fortunately, assertion generation has a dynamic step in which we have access to the types, so there's no problem here.
2) We need it to calculate the mutation coverage. Fortunately again, we use an existing mutation framework in Pharo called MuTalk. They have solved this problem, and it is out of our scope.
3) We need it to generate random calls to the object under test in the test input, and some of the methods need arguments. This is a challenge.
OK, we can have a profiling step.
We install a proxy on these methods to log the types of the arguments, and run the test suite to capture the types dynamically.
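To make the profiling step concrete, here is a minimal Python sketch of such a proxy. It is not the Pharo implementation: the `Table` class, its `add_column` method, the test function, and the helper `profile_arguments` are hypothetical names chosen for illustration.

```python
import functools
from collections import defaultdict

# (class name, method name, argument position) -> set of observed type names
observed_types = defaultdict(set)


def profile_arguments(cls, method_name):
    """Replace a method with a proxy that logs argument types, then delegates."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def proxy(self, *args):
        for position, argument in enumerate(args):
            key = (cls.__name__, method_name, position)
            observed_types[key].add(type(argument).__name__)
        return original(self, *args)

    setattr(cls, method_name, proxy)


class Table:
    """Hypothetical class under test."""

    def __init__(self):
        self.columns = {}

    def add_column(self, values, name):
        self.columns[name] = values


def test_add_column():
    """Stand-in for an existing, manually written test."""
    table = Table()
    table.add_column([1, 2, 3], "price")
    assert "price" in table.columns


profile_arguments(Table, "add_column")
test_add_column()  # running the suite fills observed_types
```

After the run, `observed_types` maps the first argument of `add_column` to `list` and the second to `str`. A method that no test calls leaves no entry at all.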
What if a method is not called in the test suite?
The question: can we infer the type from
the names of the variables,
or the usage, meaning how they are used inside the method? For example, we have aNumber - 1 in our code, so it can be a number.
Or from how this method is used when it's called? For example, we can find code in other classes that calls this method and passes an array as the first argument, and we can infer that it can be an array.
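As a rough illustration of the first two ideas (names and usage), here is a small heuristic sketch in Python. The class list, the suffix hints such as "Name means String", and the function names are assumptions invented for this example. A learned model, as in the deep learning type inference work, would replace these hand-written rules.

```python
import re

# Assumed sample of class names known in the ecosystem.
KNOWN_CLASSES = {"Array", "Number", "String", "Collection"}
# Assumed hints for words that are not class names themselves.
SUFFIX_HINTS = {"Name": "String", "Count": "Number", "Index": "Number"}


def guess_from_name(identifier):
    """Guess a type from a Pharo-style argument name such as anArray or aNumber."""
    if not identifier:
        return None
    words = re.findall(r"[A-Z][a-z0-9]*", identifier[0].upper() + identifier[1:])
    if words and words[0] in ("A", "An"):
        words = words[1:]  # drop the article
    if not words:
        return None
    last = words[-1]
    if last in KNOWN_CLASSES:
        return last
    return SUFFIX_HINTS.get(last)


def guess_from_usage(identifier, method_source):
    """Guess Number when the variable is combined with a numeric literal."""
    pattern = rf"\b{re.escape(identifier)}\s*[-+*/]\s*\d"
    return "Number" if re.search(pattern, method_source) else None
```

With these rules, `guess_from_name("anArray")` gives `"Array"`, `guess_from_name("aNewColumnName")` gives `"String"` through the suffix hint, and `guess_from_usage("aValue", "^ aValue - 1")` gives `"Number"`.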
We make changes to the test body to generate new versions of a test,
but we do it randomly: we randomly change a literal, or randomly add a new method call.
We have a huge number of possible test bodies. -> a search problem
And most of these test bodies make no sense.
The question:
can we reduce the search domain?
Can we make the changes with an intention? For example, can we extract some patterns by analyzing the code related to this class in the ecosystem?
During the test execution, we observe the state of the object under test.
We record the return values of the observer methods, and then we generate assertions.
What's an observer method? A simple definition is a public method that returns a value.
But what will happen if it changes the state of the object? -> descent
In this list, which method is an observer method?
When should we assert the value of an observer?
One answer is always. Then we generate the same asserts every time, regardless of the previous update.
Or we can assert only changed values, by comparing to the previous state. But in the withdraw example, sometimes we need to assert that the value has not changed!
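The two policies can be compared with a small Python sketch. Everything here is assumed for illustration: the `Account` class, the hand-picked `OBSERVERS` list (choosing it automatically is exactly the open question), and the function `generate_assertions`.

```python
class Account:
    """Hypothetical object under test."""

    def __init__(self):
        self._balance = 0
        self._closed = False

    def deposit(self, amount):
        self._balance += amount

    def withdraw(self, amount):
        if amount <= self._balance:  # a refused withdrawal leaves the state untouched
            self._balance -= amount

    def balance(self):
        return self._balance

    def is_closed(self):
        return self._closed


OBSERVERS = ["balance", "is_closed"]  # picked by hand for this sketch


def observe(obj):
    """Record the return value of every observer method."""
    return {name: getattr(obj, name)() for name in OBSERVERS}


def generate_assertions(calls, only_changed):
    """Run the calls and emit the test body with assertions as source lines."""
    account = Account()
    previous = observe(account)
    lines = ["account = Account()"]
    for method, argument in calls:
        getattr(account, method)(argument)
        lines.append(f"account.{method}({argument!r})")
        current = observe(account)
        for name, value in current.items():
            if not only_changed or previous[name] != value:
                lines.append(f"assert account.{name}() == {value!r}")
        previous = current
    return lines
```

For the calls `deposit(10)` then `withdraw(50)`, the "always" policy emits four assertions, two of them about `is_closed`, which never changes. The "only changed" policy emits a single assertion after the deposit and nothing after the refused withdrawal, which is the case where we would want to assert that the balance did not change.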
There should be a relation between test inputs and test oracles, what’s that relation? Data flow? Let’s use it in amplification.
We generate test code, and we expect it to be useful: to be merged into the code base by the developer and to be maintained.
But we cannot expect humans to maintain what they don't understand.
But what's a readable test?
There are patterns for writing readable, good tests,
but how can we put all the patterns from an 800-page book into a tool?
Can the tool learn these patterns from existing written test cases?
Can we use machine learning in test generation?
Can machines write code?
Yes. There is a research area in computer science called program synthesis.
[read definition]
In simple terms, program synthesis is enabling programs to write programs! Like in science fiction movies!
inspiration from natural language processing
part-of-speech tagging
How do they write code?
Simply, by generating all possible programs in the language!
But that's impossible,
so they reduce the search domain using a neural network.
A NN calculates the probability of a token being present in the code.
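A toy version of this idea can be sketched in Python: enumerate arithmetic expressions over `x`, but visit the most probable ones first. The token probabilities below are invented numbers standing in for the output of a neural network, and the tiny expression language and the function `synthesize` are assumptions made for this sketch. The user intent is given as input/output examples.

```python
import heapq
import itertools
import math

# Invented numbers: a stand-in for probabilities predicted by a neural network.
TOKEN_PROBABILITY = {"x": 0.4, "1": 0.2, "2": 0.1, "+": 0.2, "*": 0.1}
OPERATORS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def cost(token):
    """Less probable tokens cost more; costs add up as probabilities multiply."""
    return -math.log(TOKEN_PROBABILITY[token])


def synthesize(examples, max_programs=500):
    """Best-first enumeration: the cheapest (most probable) program is tried first."""
    inputs = [x for x, _ in examples]
    target = tuple(y for _, y in examples)
    tie = itertools.count()  # tie-breaker so the heap never compares program texts
    leaves = {"x": tuple(inputs), "1": (1,) * len(inputs), "2": (2,) * len(inputs)}
    queue = [(cost(text), next(tie), text, outputs) for text, outputs in leaves.items()]
    heapq.heapify(queue)
    explored = []
    while queue and len(explored) < max_programs:
        program_cost, _, text, outputs = heapq.heappop(queue)
        if outputs == target:  # the program satisfies every example
            return text
        explored.append((program_cost, text, outputs))
        # Combine the new program with every program explored so far.
        for other_cost, other_text, other_outputs in explored:
            pairs = [((text, outputs), (other_text, other_outputs))]
            if other_text != text:
                pairs.append(((other_text, other_outputs), (text, outputs)))
            for (left, left_out), (right, right_out) in pairs:
                for symbol, apply in OPERATORS.items():
                    combined = tuple(apply(a, b) for a, b in zip(left_out, right_out))
                    new_cost = program_cost + other_cost + cost(symbol)
                    new_text = f"({left} {symbol} {right})"
                    heapq.heappush(queue, (new_cost, next(tie), new_text, combined))
    return None
```

Given the examples `[(1, 3), (2, 5), (3, 7)]`, the search returns an expression equivalent to `2 * x + 1` after exploring only a small number of candidates. With uniform probabilities the same search degrades toward plain enumeration.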
They claim: DeepCode does to software code what Grammarly does to written language