The intensive use of generative programming techniques provides an elegant engineering solution to deal with the heterogeneity of platforms and technological stacks. The use of domain-specific languages, for example, leads to the creation of numerous code generators that automatically translate high-level system specifications into multi-target executable code. Producing correct and efficient code generators is complex and error-prone. Although software designers generally provide high-level test suites to verify the functional outcome of generated code, it remains challenging and tedious to verify the behavior of the produced code in terms of non-functional properties. This paper describes a practical approach based on a runtime monitoring infrastructure to automatically detect potentially inefficient code generators. This infrastructure, based on system containers as execution platforms, allows code-generator developers to evaluate the performance of generated code. We evaluate our approach by analyzing the performance of Haxe, a popular high-level programming language that comes with a set of cross-platform code generators. Experimental results show that our approach is able to detect performance inconsistencies that reveal real issues in Haxe code generators.
3. Context
[Figure: software designers produce high-level artifacts (DSL models, GPL specs, GUIs) that code generators, built by their own creators/maintainers, automatically turn into generated code targeting diverse software platforms. All tests are successfully passed, but what about the non-functional properties (quality) of the generated code?]
• Code generators are used everywhere
• They automatically transform high-level system specifications (models, DSLs, GUIs, etc.) into general-purpose languages (Java, C++, C#, etc.)
• They target diverse and heterogeneous software platforms
4. Context
• Testing issues:
- Defective code generators may generate poor-quality code
- Testing the non-functional properties is time-consuming
- It requires examining different non-functional requirements
- Code generators are complex and difficult to understand (they involve complex and heterogeneous technologies)
5. Motivation
Non-functional testing of code generators: The traditional way
• Analyze the non-functional properties of generated code using platform-specific tools, profilers, etc.
Lack of tools for automatic non-functional testing of code generators
[Figure: the traditional pipeline, from software design to code generation, code execution, and non-functional testing. A DSL (model) is designed once; code generators A, B, and C generate Java, C++, and C# systems under test (SUTs); each SUT is executed on its own platform and analyzed with a platform-specific profiler, yielding separate footprints and reports used for bug finding.]
7. Contributions
We propose:
• A runtime monitoring infrastructure, based on system containers (Docker) as execution platforms, that allows code-generator developers to evaluate the non-functional properties of generated code
• A black-box testing approach to automatically detect potentially inefficient code generators
8. Microservice-based infrastructure
Execute and monitor the generated code using system containers (see the sketch below):
• Support for different configurations, instances, images, machines, etc.
• Resource isolation and management
• Low performance overhead
• Fine-grained understanding and analysis of compiler behavior
• Automatic extraction of non-functional properties related to resource usage
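As an illustration, here is a minimal Python sketch of this execution loop using the Docker SDK. The image name, run command, and the peak-memory metric are hypothetical placeholders, not the paper's exact setup:

```python
# Hedged sketch: run one generated program in a fresh container and
# track its peak memory usage. Image/command names are hypothetical.
import docker

client = docker.from_env()

def run_and_monitor(image, command):
    """Execute `command` inside `image` and return peak memory in bytes."""
    container = client.containers.run(image, command, detach=True)
    peak = 0
    for sample in container.stats(decode=True):  # streamed cgroup samples
        peak = max(peak, sample["memory_stats"].get("usage", 0))
        container.reload()
        if container.status == "exited":         # program under test is done
            break
    container.remove()
    return peak

# e.g. run_and_monitor("haxe-php-runtime", "php TestSuite.php")
```

Running each program under test in a fresh container instance is what provides the resource isolation and comparable measurements listed above.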
9. Approach Overview
[Figure: the revised pipeline, with code execution and a runtime monitoring engine replacing per-platform profiling. The DSL (model) is designed once; code generators A, B, and C generate Java, C++, and C# systems under test, which are compiled and executed inside dedicated container instances (A/A', B/B', C/C'). A monitoring container collects footprints A', B', and C' through REST calls, a back-end time-series database container stores them, and a front-end visualization container answers requests for bug finding.]
10. Approach Overview
1. Code Execution: compile and execute the generated code within a new container instance
2. Runtime Monitoring: gather at runtime the non-functional properties of the running programs under test
3. Time-Series Database: save information about resource consumption in a time-series database
4. Performance Analysis: analyze the performance and non-functional properties of the programs under test

A hedged sketch of steps 2 and 3 follows.
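Concretely, steps 2 and 3 can be sketched as a small Python loop that reads a container's latest resource sample from a cAdvisor endpoint and writes it to InfluxDB. The endpoint path, database name, and measurement schema below are assumptions for illustration, not taken from the talk:

```python
# Hedged sketch of steps 2-3: fetch the most recent resource sample that
# cAdvisor holds for a container and persist it in a time-series database.
# The database name and measurement schema are assumptions.
import requests
from influxdb import InfluxDBClient

CADVISOR = "http://localhost:8080/api/v1.3/docker/"   # assumed endpoint
influx = InfluxDBClient(host="localhost", port=8086, database="monitoring")

def record_sample(container_name, suite, target):
    info = requests.get(CADVISOR + container_name).json()
    latest = next(iter(info.values()))["stats"][-1]   # most recent sample
    influx.write_points([{
        "measurement": "resource_usage",
        "tags": {"suite": suite, "target": target},
        "fields": {"memory_bytes": latest["memory"]["usage"]},
    }])

# e.g. record_sample("container_a", "Color_TS4", "php")
```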
12. Testing Method
Definition (code generator family):
We define a code generator family as a set of code generators that take the same language/model as input and generate code for different target platforms (examples: Haxe, ThingML, etc.)
Differential testing:
Compare equivalent implementations of the same program written in different languages
Standard deviation (std_dev):
Quantifies the amount of variation among the execution traces in terms of memory usage and execution time (see the sketch below)
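To make the metric concrete, here is a minimal sketch of std_dev over one test suite's cross-target execution times; the numbers are illustrative, not measured results:

```python
# Minimal sketch of the std_dev metric over one test suite's execution
# times across target platforms. Values are illustrative, not measured.
from statistics import pstdev

def inconsistency_score(measurements):
    """Population standard deviation across target platforms."""
    return pstdev(measurements.values())

times = {"cs": 4.1, "cpp": 3.8, "java": 4.4, "js": 5.0, "php": 61.2}
print(round(inconsistency_score(times), 2))  # dominated by the PHP outlier
```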
13. Testing Method
Test suites with std_dev greater than a threshold value k are interpreted as code generator inconsistencies
[Figure: the memory-usage traces of the equivalent programs are compared; a test suite with std_dev > k is reported as a bug, one with std_dev < k as no bug.]
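The decision rule itself is a one-liner; here is a sketch reusing the inconsistency_score helper from the previous snippet. The suite names and scores are illustrative, while k = 60 matches the execution-time threshold used in the validation:

```python
# Sketch of the decision rule: a test suite is flagged as a potential
# code generator inconsistency when its std_dev exceeds the threshold k.
def flag_inconsistencies(suite_scores, k):
    """Return the suites whose cross-target std_dev exceeds k."""
    return [suite for suite, score in suite_scores.items() if score > k]

# Illustrative scores; k = 60 is the execution-time threshold used later.
print(flag_inconsistencies({"Color_TS4": 74.0, "Matrix_TS1": 2.3}, k=60))
```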
15. Experimental Setup
Programs under test: Haxe libraries + test suites
Code generators under test: Haxe compilers, 5 targets: C#, C++, Java, JS, PHP
Non-functional metrics: execution time (s), memory usage (MB)
For monitoring: Google cAdvisor
For storage: InfluxDB (see the query sketch below)
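For the analysis step, here is a hedged sketch of reading the stored samples back from InfluxDB, reusing the database and measurement names assumed in the earlier monitoring sketch:

```python
# Hedged sketch: query the stored per-target samples for one test suite.
# Database and measurement names follow the earlier (assumed) schema.
from influxdb import InfluxDBClient

influx = InfluxDBClient(host="localhost", port=8086, database="monitoring")
result = influx.query(
    "SELECT MAX(memory_bytes) FROM resource_usage "
    "WHERE suite = 'Color_TS4' GROUP BY target")
for (_, tags), points in result.items():
    print(tags["target"], next(points)["max"])  # peak memory per target
```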
16. Validation
• The comparison results of running each test suite across the five target languages; the metric used is the standard deviation between execution times
• Standard deviations mostly fall within the 0 to 8 interval
• 8 data points where the std_dev was extremely high
17. Validation
Test suites with the highest variation in terms of execution time (k=60)
We can identify a singular behavior of the PHP code regarding the execution time
18. Validation
• The comparison results of running each test suite across the five target languages; the metric used is the standard deviation between memory consumptions
• Standard deviations mostly fall within the 0 to 150 interval
• 6 data points where the std_dev was extremely high
19. Validation
Test suites with the highest variation in terms of memory usage (k=400)
We can identify a singular behavior of the PHP code regarding the memory usage
20. Validation
For Color_TS4 in PHP:
• We observe the intensive use of « arrays »
• We replace « arrays » with « SplFixedArray »
=> Speedup x5
=> Memory usage reduction x2
22. Conclusion
Summary:
• An approach for testing and monitoring code generator families using a container-based infrastructure
• Automatic extraction of information about resource usage
• The evaluation results show that we can find real issues in existing code generators (i.e., PHP)

Future directions:
• Detect more code generator issues (e.g., CPU consumption)
• Evaluate our approach on other code generator families
• Compare it to other state-of-the-art approaches