From Concepts to Implementation: Why I Missed My First Paper Submission Deadline

1
From Concepts to
Implementation:
Why I Missed My First Paper
Submission Deadline
Miguel Velez
Software Engineering Ph.D. student

From Concepts to
Engineering (Problems)
Research Overview
2

Efﬁciently Measure Performance of Highly
Conﬁgurable Systems
4

Execution Time
5
Conﬁguration Performance
A=0 B=0 C=0 D=0 6s
A=0 B=0 C=1 D=1 6s
A=1 B=0 C=0 D=1 9s
… …
T = 6 + 3A + 1AB

Ideally
6
Identify what conﬁgurations to measure
Not measure all conﬁgurations

Our Proposal: Exploit Structure and
Behavior of the Program to Measure
Performance
7

Approach: Combine Static Taint Analysis
with Dynamic Analysis
9

31
Overview of the Approach
Program
Workload
Identify
Regions
Compress
Conﬁgurations Instrument
and
Execute
Build
Performance
Model
PM

From Concepts to Engineering (Problems)
32

Timeline
33
Fall
Survey
January
Idea
March
Sleep Implementation
FOSD Meeting

Timeline
34
April
Java Implementation
Paper Outline
Aim for GPCE 2017
July 2nd
GPCE Paper
Deadline

Timeline
35
April
Java Implementation
Paper Outline
Aim for GPCE 2017
July 2nd
GPCE Paper
Deadline
Late June

36
Program
Workload
Identify
Regions
Compress
and
Execute
Build
Performance
Model
PM

Tracking Load-time Configuration Options
Max Lillack
University of Leipzig, Germany
Christian Kästner
Carnegie Mellon University, USA
Eric Bodden
Fraunhofer SIT &
TU Darmstadt, Germany
ABSTRACT
Highly-configurable software systems are pervasive, although
configuration options and their interactions raise complexity
of the program and increase maintenance e↵ort. Especially
load-time configuration options, such as parameters from
command-line options or configuration files, are used with
standard programming constructs such as variables and if
statements intermixed with the program’s implementation;
manually tracking configuration options from the time they
are loaded to the point where they may influence control-
flow decisions is tedious and error prone. We design and
implement Lotrack, an extended static taint analysis to
automatically track configuration options. Lotrack derives
a configuration map that explains for each code fragment un-
der which configurations it may be executed. An evaluation
on Android applications shows that Lotrack yields high
accuracy with reasonable performance. We use Lotrack to
empirically characterize how much of the implementation of
Android apps depends on the platform’s configuration op-
tions or interactions of these options.
Categories and Subject Descriptors
K.6.3 [Software Management]: Software maintenance
Keywords
Variability mining; Configuration options; Static analysis
1. INTRODUCTION
Software has become increasingly configurable to support
di↵erent requirements for a wide range of customers and
market segments. Configuration options can be used to
support alternative hardware, cater for backward compat-
ibility, enable extra functionality, add debugging facilities,
and much more. While configuration mechanisms allow end
users to use the software in more contexts, they also raise the
software’s complexity for developers, adding more function-
ality that needs to be tested and maintained. Even worse,
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
ASE’14, September 15-19, 2014, Vasteras, Sweden.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3013-8/14/09 ...$15.00.
http://dx.doi.org/10.1145/2642937.2643001.
configuration options may interact in unanticipated ways
and subtle behavior may hide in specific combinations of
options that are di cult to discover and understand in the
exponentially growing configuration space. Configuration
options raise challenges since they vary and thus complicate
the software’s control and data flow. As a result, develop-
ers need to trace configuration options through the software
to identify which code fragments are a↵ected by an option
and where and how options may interact. Overall, making
changes becomes harder because developers need to under-
stand a larger context and may need to retest many config-
urations.
There are many strategies to implement configuration op-
tions, but particularly common and problematic is to use
load-time parameters (command-line options, configuration
files, registry entries, and so forth): Parameters are loaded
and used as ordinary values within the program at runtime,
and configuration decisions are made through ordinary con-
trol statements (such as if statements) within a common im-
plementation. The implementation of configuration options
with plugins or conditional compilation provide some static
traceability of an option’s implementation which is miss-
ing in load-time configuration options. Identifying the code
fragments implementing an option requires tedious manual
e↵ort and, as our evaluation confirms, is challenging to get
right even in medium-size software systems.
In this work, we propose Lotrack,1
a tool to statically
track configuration options from the moment they are loaded
in the program to the code that is directly or indirectly af-
fected. Specifically, Lotrack aims at identifying all code
that is included if and only if a specific configuration option
or combination of configuration options is selected. The re-
covered traceability can support developers in many main-
tenance tasks [11], but, in the long run, can also be used
as input for further automated tasks, such as removing a
configuration option and its use from the program, translat-
ing load-time into compile-time options, or guaranteeing the
absence of interactions among configuration options. In con-
trast to slicing [32], which determines whether a statement’s
execution depends on a given value, Lotrack determines
under which configurations, i.e., a set of selected configura-
tion options, a given statement is executed.
To track configuration options precisely, we exploit the na-
ture of how configuration options are typically implemented.
Although a na¨ıve forward-slicing algorithm can identify all
code potentially a↵ected by a configuration option, directly
or indirectly, in practice, it will frequently return slices that
1
https://github.com/MaxLillack/Lotrack
Lotrack
39
Identify
Regions

Tracking Load-time Configuration Options
Max Lillack
University of Leipzig, Germany
Christian Kästner
Carnegie Mellon University, USA
Eric Bodden
Fraunhofer SIT &
TU Darmstadt, Germany
ABSTRACT
Highly-configurable software systems are pervasive, although
configuration options and their interactions raise complexity
of the program and increase maintenance e↵ort. Especially
load-time configuration options, such as parameters from
command-line options or configuration files, are used with
standard programming constructs such as variables and if
statements intermixed with the program’s implementation;
manually tracking configuration options from the time they
are loaded to the point where they may influence control-
flow decisions is tedious and error prone. We design and
implement Lotrack, an extended static taint analysis to
automatically track configuration options. Lotrack derives
a configuration map that explains for each code fragment un-
der which configurations it may be executed. An evaluation
on Android applications shows that Lotrack yields high
accuracy with reasonable performance. We use Lotrack to
empirically characterize how much of the implementation of
Android apps depends on the platform’s configuration op-
tions or interactions of these options.
Categories and Subject Descriptors
K.6.3 [Software Management]: Software maintenance
Keywords
Variability mining; Configuration options; Static analysis
1. INTRODUCTION
Software has become increasingly configurable to support
di↵erent requirements for a wide range of customers and
market segments. Configuration options can be used to
support alternative hardware, cater for backward compat-
ibility, enable extra functionality, add debugging facilities,
and much more. While configuration mechanisms allow end
users to use the software in more contexts, they also raise the
software’s complexity for developers, adding more function-
ality that needs to be tested and maintained. Even worse,
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
ASE’14, September 15-19, 2014, Vasteras, Sweden.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3013-8/14/09 ...$15.00.
http://dx.doi.org/10.1145/2642937.2643001.
configuration options may interact in unanticipated ways
and subtle behavior may hide in specific combinations of
options that are di cult to discover and understand in the
exponentially growing configuration space. Configuration
options raise challenges since they vary and thus complicate
the software’s control and data flow. As a result, develop-
ers need to trace configuration options through the software
to identify which code fragments are a↵ected by an option
and where and how options may interact. Overall, making
changes becomes harder because developers need to under-
stand a larger context and may need to retest many config-
urations.
There are many strategies to implement configuration op-
tions, but particularly common and problematic is to use
load-time parameters (command-line options, configuration
files, registry entries, and so forth): Parameters are loaded
and used as ordinary values within the program at runtime,
and configuration decisions are made through ordinary con-
trol statements (such as if statements) within a common im-
plementation. The implementation of configuration options
with plugins or conditional compilation provide some static
traceability of an option’s implementation which is miss-
ing in load-time configuration options. Identifying the code
fragments implementing an option requires tedious manual
e↵ort and, as our evaluation confirms, is challenging to get
right even in medium-size software systems.
In this work, we propose Lotrack,1
a tool to statically
track configuration options from the moment they are loaded
in the program to the code that is directly or indirectly af-
fected. Specifically, Lotrack aims at identifying all code
that is included if and only if a specific configuration option
or combination of configuration options is selected. The re-
covered traceability can support developers in many main-
tenance tasks [11], but, in the long run, can also be used
as input for further automated tasks, such as removing a
configuration option and its use from the program, translat-
ing load-time into compile-time options, or guaranteeing the
absence of interactions among configuration options. In con-
trast to slicing [32], which determines whether a statement’s
execution depends on a given value, Lotrack determines
under which configurations, i.e., a set of selected configura-
tion options, a given statement is executed.
To track configuration options precisely, we exploit the na-
ture of how configuration options are typically implemented.
Although a na¨ıve forward-slicing algorithm can identify all
code potentially a↵ected by a configuration option, directly
or indirectly, in practice, it will frequently return slices that
1
https://github.com/MaxLillack/Lotrack
Lotrack
40
Identify
Regions
Implements a static taint analysis for Java!

Problems with Lotrack
41
Identify
Regions
Lack of documentation
Collaborator graduated
Lengthy and complicated installation process
Assumptions, bugs, and scalability issues
Output is not what I expected

42
Affected Components
Program
Workload
Identify
Regions
Compress
and
Execute
Build
Performance
Model
PM

Lessons Learned
43
Do not reinvent the wheel (if you can)
Research tools are difﬁcult to use and have
limitations
People are willing to help
TODO
NAME

44
Program
Workload
Identify
Regions
Compress
and
Execute
Build
Performance
Model
PM

ASM 4.0
A Java bytecode engineering library
Eric Bruneton
Bytecode Manipulation
46
Instrument
and
Execute

Lessons Learned
47
Attend SSSG talks
Ask help from research group and more senior
students
Instrument
and
Execute

48
Program
Workload
Identify
Regions
Compress
and
Execute
Build
Performance
Model
PM

Compression Algorithm
50
Researchers at FOSD Meeting developed an
algorithm
Limited to pairwise sampling
Compress
Conﬁgurations

Lessons Learned
51
Network with other researchers
Compress
Conﬁgurations

Class Report
53
Using Domain Knowledge and Data Analysis to Predict
Configuration Validity in Highly Configurable Systems
Miguel Velez
Institute for Software Research
mvelezce@cs.cmu.edu
Abstract
Highly configurable systems provide a large number of at-
tributes and features that we can tune to customize the sys-
tem to our specific needs. However, there are some config-
urations in the system that make it crash or not satisfy its
functional or non-functional requirements. It is important to
identify and specify the distinction between valid and invalid
configurations since it allows to check that the system com-
plies with its requirements and that we do not spend a long
time executing the system for it to eventually crash or not
function properly. In this paper, we propose to use domain
knowledge and data analysis techniques to predict the con-
figuration validity in highly configurable systems. The rule-
based model generated using the JRip classifier after our data
analysis yielded a performance of 0.828 and correctly classi-
fied 91.45% of instances on a new set. The model generated
in this project will allow us to reduce the cost of running
experiments on a robotics system.
1. Introduction
Configurable software systems, including web servers, robots,
and compilers, allow stakeholders to make customizations
and adjustments to satisfy various functional and non-
functional requirements depending on specific tasks and
needs. The hundreds or thousands of options in these highly
configurable systems create an exponential configuration
space that make it difficult to analyze and understand their
influence on the behavior and performance of the system.
While some configurations produce different functional-
ity and behavior, other configurations violate the state of the
system or make it not behave as expected. These valid and
invalid configurations, respectively, determine the outcome
of executing the system using those settings. Valid config-
urations make the system complete the task that the sys-
tem was assigned, such as compiling a program or making a
robot go to a specific point on a map. Invalid configurations,
on the other hand, prevent the system from completing the
assigned task or completing it without satisfying the com-
pletion requirements. This might be due to a violation of the
internal state of the system, such as selecting options that
should not be selected concurrently, or a configuration that
does not make the system behave as expected, such as reply-
ing to an HTTP request after several seconds.
Previous work on specifying and identifying configura-
tion validity in highly configurable systems is limited and
narrow. In Feature-Oriented Programming (FOP) [2] and
Software Product Lines (SPL) [3], feature models are usu-
ally included with the systems, which specify the valid con-
figurations of the system. However, they are used sparsely
and might only cover invalid configurations that produce in-
valid states of the system. Other systems use other methods
to report invalid configurations [11]. However, they are also
rarely used in practice. There are several static and dynamic
approaches to analyze configurations in highly configurable
systems [6, 7, 9]. However, their focus is not on identify-
ing configuration validity, but rather on understanding other
aspects of the system.
In this project, we propose a solution to predict the
validity of configurations in highly configurable systems.
With data collected for performance analysis of the local-
ization algorithm of the TurtleBot [10]; a highly config-
urable robotics system, we used domain knowledge of the
system and applied data analysis techniques to generate a
rule-based model using the JRip classifier implementation in
Weka1
. The model we learned using our modeling assump-
tions yielded a performance of 0.828 and correctly classified
91.45% of configurations on a new set. This model will re-
duce the cost of running experiments on this system since
invalid configurations will not be executed in the real robot.
2. Configuration Validity in Configurable
Systems
In this paper, we define the validity of a configuration as the
outcome of assigning a task to the system. A valid configu-
ration allows the system to complete its assigned task within
a specified time limit, while an invalid configuration hinders
the system from completing its task within the time limit or
makes the system crash.
Our goal is to determine the configuration validity in
highly configurable systems. That is, we want to know what
1 http://www.cs.waikato.ac.nz/ml/weka/

Class Report
54
“Failing a class is not
as bad as not being
productive on your
research” -Anonymous
Ph.D. student
Using Domain Knowledge and Data Analysis to Predict
Configuration Validity in Highly Configurable Systems
Miguel Velez
Institute for Software Research
mvelezce@cs.cmu.edu
Abstract
Highly configurable systems provide a large number of at-
tributes and features that we can tune to customize the sys-
tem to our specific needs. However, there are some config-
urations in the system that make it crash or not satisfy its
functional or non-functional requirements. It is important to
identify and specify the distinction between valid and invalid
configurations since it allows to check that the system com-
plies with its requirements and that we do not spend a long
time executing the system for it to eventually crash or not
function properly. In this paper, we propose to use domain
knowledge and data analysis techniques to predict the con-
figuration validity in highly configurable systems. The rule-
based model generated using the JRip classifier after our data
analysis yielded a performance of 0.828 and correctly classi-
fied 91.45% of instances on a new set. The model generated
in this project will allow us to reduce the cost of running
experiments on a robotics system.
1. Introduction
Configurable software systems, including web servers, robots,
and compilers, allow stakeholders to make customizations
and adjustments to satisfy various functional and non-
functional requirements depending on specific tasks and
needs. The hundreds or thousands of options in these highly
configurable systems create an exponential configuration
space that make it difficult to analyze and understand their
influence on the behavior and performance of the system.
While some configurations produce different functional-
ity and behavior, other configurations violate the state of the
system or make it not behave as expected. These valid and
invalid configurations, respectively, determine the outcome
of executing the system using those settings. Valid config-
urations make the system complete the task that the sys-
tem was assigned, such as compiling a program or making a
robot go to a specific point on a map. Invalid configurations,
on the other hand, prevent the system from completing the
assigned task or completing it without satisfying the com-
pletion requirements. This might be due to a violation of the
internal state of the system, such as selecting options that
should not be selected concurrently, or a configuration that
does not make the system behave as expected, such as reply-
ing to an HTTP request after several seconds.
Previous work on specifying and identifying configura-
tion validity in highly configurable systems is limited and
narrow. In Feature-Oriented Programming (FOP) [2] and
Software Product Lines (SPL) [3], feature models are usu-
ally included with the systems, which specify the valid con-
figurations of the system. However, they are used sparsely
and might only cover invalid configurations that produce in-
valid states of the system. Other systems use other methods
to report invalid configurations [11]. However, they are also
rarely used in practice. There are several static and dynamic
approaches to analyze configurations in highly configurable
systems [6, 7, 9]. However, their focus is not on identify-
ing configuration validity, but rather on understanding other
aspects of the system.
In this project, we propose a solution to predict the
validity of configurations in highly configurable systems.
With data collected for performance analysis of the local-
ization algorithm of the TurtleBot [10]; a highly config-
urable robotics system, we used domain knowledge of the
system and applied data analysis techniques to generate a
rule-based model using the JRip classifier implementation in
Weka1
. The model we learned using our modeling assump-
tions yielded a performance of 0.828 and correctly classified
91.45% of configurations on a new set. This model will re-
duce the cost of running experiments on this system since
invalid configurations will not be executed in the real robot.
2. Configuration Validity in Configurable
Systems
In this paper, we define the validity of a configuration as the
outcome of assigning a task to the system. A valid configu-
ration allows the system to complete its assigned task within
a specified time limit, while an invalid configuration hinders
the system from completing its task within the time limit or
makes the system crash.
Our goal is to determine the configuration validity in
highly configurable systems. That is, we want to know what
1 http://www.cs.waikato.ac.nz/ml/weka/

Stress
55
“Don’t feel bad”
“Your Ph.D. is not a sprint;
it is a marathon”
“You cannot work everyday
for that long”

Summary
56
Efﬁciently measure performance of highly
conﬁgurable systems
A lot work from concepts to implementation
Have fun!
WARNING:
Challenges
Ahead

57
Overview of the Approach
Program
Workload
Identify
Regions
Compress
and
Execute
Build
Performance
Model
PM

Wanted a Production Level Program
58
Attempted to develop the perfect
product
Had to be fully automated
Complicated
Modularized components
Saved intermediate results

Additional
59
Keenan Crane (kmcrane@cs.cmu.edu)

From Concepts to Implementation: Why I Missed My First Paper Submission Deadline

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to From Concepts to Implementation: Why I Missed My First Paper Submission Deadline

Similar to From Concepts to Implementation: Why I Missed My First Paper Submission Deadline (20)

Recently uploaded

Recently uploaded (20)

From Concepts to Implementation: Why I Missed My First Paper Submission Deadline