Automatic Non-functional Testing of
Code Generators Families
Mohamed BOUSSAA, Olivier BARAIS, Gerson SUNYE, Benoit BAUDRY
2016 IEEE International Conference on Software Quality, Reliability & Security (QRS 2016)
August 1-3, 2016 - Vienna, Austria
INRIA Rennes, France
15th International Conference on Generative Programming: Concepts & Experiences (GPCE 2016)
Amsterdam, Netherlands, October 31 – November 1, 2016
Outline
1. Context
2. Motivation
3. Automatic Non-functional Testing of Code Generators Families
4. Performance Evaluation
5. Conclusion
Context
[Figure labels: Software Designer; Software Design (DSL, Model, GPL, Specs, GUI); Automatic Code Generation; Code generators; Code Generator creators/maintainers; Generated code; Software Platform Diversity]
• Code generators are used everywhere
• They automatically transform high-level system specifications (models, DSLs, GUIs, etc.) into general-purpose languages (JAVA, C++, C#, etc.)
• They target diverse and heterogeneous software platforms
All tests pass, but what about the non-functional properties (quality) of the generated code?
• Testing issues:
- Defective code generators may generate poor-quality code
- Testing the non-functional properties is time-consuming
- It requires examining different non-functional requirements
- Code generators are complex and difficult to understand (they involve complex and heterogeneous technologies)
Motivation
Non-functional testing of code generators, the traditional way:
• Analyze the non-functional properties of generated code using platform-specific tools, profilers, etc.
• There is a lack of tools for automatic non-functional testing of code generators
[Figure: the traditional process, in four stages (Software Design, Code Generation, Code Execution, Non-functional Testing). Labels: Design of a DSL (Model); Code Generators A, B, C each Generate a SUT (C#, JAVA, C++); Execute on Platforms A, B, C; Profilers A, B, C produce Footprints A, B, C and a Report each; Bugs Finding.]
Automatic Non-functional Testing of
Code Generators Families
https://testingcodegenerators.wordpress.com
Contributions
We propose:
• A runtime monitoring infrastructure, based on system containers (Docker) as execution platforms, that allows code-generator developers to evaluate the non-functional properties of generated code
• A black-box testing approach to automatically detect potentially inefficient code generators
Microservice-based infrastructure
• Execute and monitor the generated code using system containers
• Different configurations, instances, images, machines, etc.
• Resource isolation and management
• Less performance overhead
• Provide a fine-grained understanding and analysis of compiler behavior
• Automatic extraction of non-functional properties related to resource usage
Approach Overview
[Figure: the traditional process (Platforms A, B, C with Profilers A, B, C) shown next to the proposed container-based process. Stages of the proposed process: Software Design, Code Generation, Code Execution, Runtime monitoring engine. Labels: Design of a DSL (Model); Code Generators A, B, C each Generate a SUT (C#, JAVA, C++); Containers A, B, C; Containers A’, B’, C’; Monitoring Container; Back-end Data Base Container; Front-end Visualization Container; REST Calls; Request; Footprints A’, B’, C’; Bugs Finding.]
1. Code execution: compile and execute the generated code within a new container instance
2. Runtime monitoring: gather the non-functional properties of the running programs under test at runtime
3. Time-series database: save information about resource consumption in a time-series database
4. Performance analysis: analyze the performance and non-functional properties of the programs under test
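Step 1 could be scripted roughly as follows. This is only a sketch under stated assumptions: the image names, the `build_run_command` helper, and the test command are hypothetical and do not come from the slides. Only standard `docker run` flags (`--rm`, `--name`, `--memory`) are used.

```python
import shlex

# Hypothetical image names, one per target language (not from the slides).
TARGET_IMAGES = {
    "cs": "example/mono-runtime",
    "cpp": "example/cpp-runtime",
    "java": "example/java-runtime",
    "js": "example/node-runtime",
    "php": "example/php-runtime",
}


def build_run_command(target, test_suite, command, memory_limit="1g"):
    """Build the argv for running one test suite in a fresh container."""
    container_name = f"{test_suite}-{target}".lower()
    return [
        "docker", "run",
        "--rm",                     # remove the container once it exits
        "--name", container_name,   # a stable name helps match monitoring records
        "--memory", memory_limit,   # cap the memory available to the run
        TARGET_IMAGES[target],
        *shlex.split(command),
    ]


if __name__ == "__main__":
    print(" ".join(build_run_command("php", "Color_TS4", "php run_tests.php")))
```

Passing the returned list to `subprocess.run` would start the container; the sketch stops at building the command so that it can be inspected without Docker installed.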
Testing Infrastructure
[Figure labels: Software Tester; Code Generation + Compilation; Component Under Test (Running…); Cgroup file systems; Monitoring Component; Monitoring records; HTTP Requests; 8086; Back-end Database Component (Time-series database); Front-end: Visualization Component]
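The monitoring records collected for one component under test can be reduced to the two metrics reported later (execution time in seconds, memory usage in MBytes). The sketch below assumes a hypothetical record shape (`Record`) and assumes that memory usage means the peak value observed; the slides do not specify either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One monitoring sample (hypothetical shape, not the actual schema)."""
    timestamp_s: float   # sample time, in seconds
    memory_bytes: int    # memory used by the container at that time


def summarize(records):
    """Return (execution_time_s, peak_memory_mb) for one monitored run."""
    if not records:
        raise ValueError("no monitoring records for this run")
    times = [r.timestamp_s for r in records]
    # Assumption: the run lasts from the first to the last sample.
    execution_time_s = max(times) - min(times)
    # Assumption: memory usage is taken as the peak, converted to MBytes.
    peak_memory_mb = max(r.memory_bytes for r in records) / (1024 * 1024)
    return execution_time_s, peak_memory_mb
```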
Testing Method
Definition (code generator family):
We define a code generator family as a set of code generators that take the same language/model as input and generate code for different target platforms (examples: Haxe, ThingML, etc.)
Differential testing:
Compare equivalent implementations of the same program written in different languages
Standard deviation (std_dev):
Quantify the amount of variation among the execution traces in terms of memory usage and execution time
Test suites with std_dev > threshold value k are interpreted as code generator inconsistencies
[Figure: the memory usage measured for each target is compared; std_dev > k: bug, std_dev < k: no bug.]
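The decision rule can be sketched in a few lines. This is an illustration, not the authors' tool: the function name and the input layout are hypothetical, and the population standard deviation (`statistics.pstdev`) is an assumption, since the slides do not say whether the population or sample formula is used.

```python
from statistics import pstdev


def find_inconsistencies(measurements, k):
    """Flag test suites whose std_dev across targets exceeds k.

    measurements maps a test suite name to {target: value}, where value is
    one metric (execution time or memory usage) measured for that target.
    Returns {test_suite: std_dev} for the flagged suites only.
    """
    flagged = {}
    for suite, per_target in measurements.items():
        deviation = pstdev(list(per_target.values()))
        if deviation > k:
            flagged[suite] = deviation
    return flagged


if __name__ == "__main__":
    # Made-up values for illustration only; not results from the evaluation.
    example = {
        "TS1": {"cs": 10, "cpp": 12, "java": 11, "js": 9, "php": 13},
        "TS2": {"cs": 10, "cpp": 12, "java": 11, "js": 9, "php": 300},
    }
    print(find_inconsistencies(example, k=60))
```

With these made-up values only TS2 is flagged, because a single target far from the others is enough to push the standard deviation above k.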
Evaluation
https://testingcodegenerators.wordpress.com/experimental-results/
Experimental Setup
• Code generators under test: Haxe compilers (5 targets: C#, C++, JAVA, JS, PHP)
• Programs under test: Haxe libraries + test suites
• Non-functional metrics: execution time (s), memory usage (MBytes)
• For monitoring: Google cAdvisor
• For storage: InfluxDB
Validation
• Each test suite is run across the five target languages and the results are compared; the metric used is the standard deviation of the execution times
• Most standard deviations fall within the 0 to 8 interval
• At 8 data points, the std_dev was extremely high
• Test suites with the highest variation in execution time (k=60)
We can identify a singular behavior of the PHP code with respect to execution time
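Once a test suite is flagged, one simple way to see which target is responsible is to look for the value furthest from the median. The sketch below is an assumption about how such an inspection could be automated; the slides do not describe this step, and the function name and values are hypothetical.

```python
from statistics import median


def most_deviating_target(per_target):
    """Return the target whose value lies furthest from the median."""
    mid = median(per_target.values())
    return max(per_target, key=lambda target: abs(per_target[target] - mid))


if __name__ == "__main__":
    # Made-up execution times for illustration only.
    times = {"cs": 10, "cpp": 12, "java": 11, "js": 9, "php": 300}
    print(most_deviating_target(times))
```

The median is used here instead of the mean because a single extreme value shifts the mean toward itself.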
• Each test suite is run across the five target languages and the results are compared; the metric used is the standard deviation of the memory consumption
• Most standard deviations fall within the 0 to 150 interval
• At 6 data points, the std_dev was extremely high
• Test suites with the highest variation in memory usage (k=400)
We can identify a singular behavior of the PHP code with respect to memory usage
For Color_TS4 in PHP:
• We observe intensive use of « arrays »
• We replace « arrays » with « SplFixedArray »
=> Speedup x5
=> Memory usage reduction x2
Conclusion
Summary
• An approach for testing and monitoring code generator families using a container-based infrastructure
• Information about resource usage is extracted automatically
• The evaluation results show that we can find real issues in existing code generators (i.e., the PHP code generator)
Future directions
• Detect more code generator issues (e.g., CPU consumption)
• Evaluate our approach:
- On other code generator families
- Against other state-of-the-art approaches
https://testingcodegenerators.wordpress.com
Questions?
Tool Support
Visualization
Code Generators Testing: ThingML
