This document provides background on Pooyan Jamshidi's research related to learning software performance models for dynamic and uncertain environments. It summarizes his past work developing techniques for modeling and optimizing performance across different systems and environments, including using transfer learning to reuse performance data from related sources to build more accurate models with fewer measurements. It also outlines opportunities for using transfer learning to adapt performance models to new environments and systems.
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15 (MLconf)
GraphMat: Bridging the Productivity-Performance Gap in Graph Analytics: With increasing interest in large-scale distributed graph analytics for machine learning and data mining, more data scientists and developers are struggling to achieve high performance without sacrificing productivity on large graph problems. In this talk, I will discuss our solution to this problem: GraphMat. Using generalized sparse matrix-based primitives, we are able to achieve performance that is very close to hand-optimized native code, while allowing users to write programs using the familiar vertex-centric programming paradigm. I will show how we optimized GraphMat to achieve this performance on distributed platforms and provide programming examples. We have integrated GraphMat with Apache Spark in a manner that allows the combination to outperform all other distributed graph frameworks. I will explain the reasons for this performance and show that our approach achieves very high hardware efficiency in both single-node and distributed environments using primitives that are applicable to many machine learning and HPC problems. GraphMat is open source software and available for download.
A Review on Scheduling in Cloud Computing (ijujournal)
Cloud computing provides software, infrastructure, and platform as a service on a pay-per-use basis. The main goal of scheduling is to achieve accuracy and correctness in task completion; scheduling in the cloud environment enables the various cloud services to support framework implementation. This paper presents a far-reaching survey of the different types of scheduling algorithms in cloud computing environments, including workflow scheduling and grid scheduling. The survey gives an elaborate view of grid, cloud, and workflow scheduling aimed at minimizing the energy cost and improving the efficiency and throughput of the system.
Featuring a brief overview of fault-tolerant mechanisms across various Big Data systems such as Google File System (GFS), Amazon Dynamo, Bigtable, Hadoop MapReduce, and Facebook Cassandra, along with a description of an existing fault-tolerance model
Separating Hype from Reality in Deep Learning with Sameer Farooqui (Databricks)
Deep Learning is all the rage these days, but where does the reality of what Deep Learning can do end and the media hype begin? In this talk, I will dispel common myths about Deep Learning and help you decide whether you should practically use Deep Learning in your software stack.
I’ll begin with a technical overview of common neural network architectures like CNNs, RNNs, GANs and their common use cases like computer vision, language understanding or unsupervised machine learning. Then I’ll separate the hype from reality around questions like:
• When should you prefer traditional ML systems like scikit-learn or Spark.ML instead of Deep Learning?
• Do you no longer need to do careful feature extraction and standardization if using Deep Learning?
• Do you really need terabytes of data when training neural networks or can you ‘steal’ pre-trained lower layers from public models by using transfer learning?
• How do you decide which activation function (like ReLU, leaky ReLU, ELU, etc) or optimizer (like Momentum, AdaGrad, RMSProp, Adam, etc) to use in your neural network?
• Should you randomly initialize the weights in your network or use more advanced strategies like Xavier or He initialization?
• How easy is it to overfit/overtrain a neural network, and what are the common techniques to avoid overfitting (like L1/L2 regularization, dropout, and early stopping)?
UnaCloud is an opportunistic cloud infrastructure (IaaS) that provides access to on-demand computing capabilities using commodity desktops. Although UnaCloud tries to maximize the use of idle resources to deploy virtual machines on them, it does not use energy-efficient resource allocation algorithms. In this paper, we design and implement different energy-aware techniques to operate in an energy-efficient way while at the same time guaranteeing performance to the users. Performance tests with different algorithms and scenarios, using real trace workloads from UnaCloud, show how different policies can change the energy consumption patterns and reduce the energy consumption of opportunistic cloud infrastructures. The results show that some algorithms can reduce energy consumption by up to 30% compared to the plain opportunistic environment.
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization (Devansh16)
Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). Encoder-decoder architectures have been proposed to resolve this by applying a decoder network onto a backbone model designed for classification tasks. In this paper, we argue that the encoder-decoder architecture is ineffective in generating strong multi-scale features because of the scale-decreased backbone. We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search. Using similar building blocks, SpineNet models outperform ResNet-FPN models by ~3% AP at various scales while using 10-20% fewer FLOPs. In particular, SpineNet-190 achieves 52.5% AP with a Mask R-CNN detector and 52.1% AP with a RetinaNet detector on COCO for a single model without test-time augmentation, significantly outperforming prior state-of-the-art detectors. SpineNet can also transfer to classification tasks, achieving a 5% top-1 accuracy improvement on the challenging iNaturalist fine-grained dataset. Code is at: this https URL.
Simulation of Heterogeneous Cloud Infrastructures (CloudLightning)
In recent years, beyond traditional CPU-based hardware servers, hardware accelerators have become widely used in various HPC application areas. More specifically, Graphics Processing Units (GPUs), Many Integrated Cores (MICs), and Field-Programmable Gate Arrays (FPGAs) have shown great potential in HPC and have been widely adopted in supercomputing and in HPC clouds. This presentation focuses on the development of a cloud simulation framework that supports hardware accelerators. The design and implementation of the framework are also discussed.
This presentation was given by Dr. Konstantinos Giannoutakis (CERTH) at the CloudLightning Conference on 11th April 2017.
Slides for the hands-on PyData workshop.
Covers three main topics:
- Current state of NLP models at Walmart
- Steps we took to optimize serving BERT
- How we serve models with Facebook's TorchServe
Corresponding repo with notebooks for the hands-on session:
https://bit.ly/pytorch-workshop-2021
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl... (IJSRD)
Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. The rapid growth of demands on big data processing imposes a heavy burden on computation, communication, and storage in geographically distributed data centers. Hence it is necessary to minimize the cost of big data processing, which also includes the cost of fault tolerance. Big data processing involves two types of faults: node failure and data loss. Both faults can be recovered from using heartbeat messages, which act as acknowledgement messages between two servers. This paper presents a study of node failure and recovery, data replication, and heartbeat messages.
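The heartbeat mechanism the paper studies can be sketched in a few lines. Below is a minimal, hypothetical illustration of heartbeat-based failure detection between servers; the interval, timeout, and recovery hook are assumed values for illustration, not taken from the paper:

```python
import time

# Minimal sketch of heartbeat-based failure detection.
# HEARTBEAT_INTERVAL and FAILURE_TIMEOUT are illustrative values only.
HEARTBEAT_INTERVAL = 2.0                   # seconds between heartbeats
FAILURE_TIMEOUT = 3 * HEARTBEAT_INTERVAL   # misses tolerated before declaring failure

class HeartbeatMonitor:
    def __init__(self):
        self.last_seen = {}  # node id -> timestamp of last heartbeat

    def record_heartbeat(self, node_id: str) -> None:
        """Called whenever a heartbeat (acknowledgement) arrives from a node."""
        self.last_seen[node_id] = time.monotonic()

    def failed_nodes(self) -> list[str]:
        """Nodes whose heartbeats stopped; candidates for recovery via replicas."""
        now = time.monotonic()
        return [n for n, t in self.last_seen.items() if now - t > FAILURE_TIMEOUT]

# Usage: on failure, re-replicate the lost node's data from surviving replicas.
monitor = HeartbeatMonitor()
monitor.record_heartbeat("node-1")
for node in monitor.failed_nodes():
    print(f"{node} presumed failed; trigger data replication")
```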
IT 8003 Cloud Computing - For this activi.docx (vrickens)
IT 8003 Cloud Computing
For this activity you need to divide your class into groups.

Group Activity 1 "SuperTAX Software"

SuperTax Overview
Did you know President Abraham Lincoln, one of America's most beloved leaders, also instituted one of its least liked obligations - the income tax? In this brief history of taxes, see the historical events which shaped income taxes in the United States today.
SuperTax is an American tax preparation software package developed in the mid-1980s.
SuperTax Corporation is headquartered in Mountain View, California.

SuperTax Information
Desktop software.
Supports MS Windows and Mac OS.
Delivery method: CD/DVD media format.
Different versions: SuperTAX Basic, Deluxe, Premier, and Home & Business.
Used by millions of users and organizations.

SuperTAX Project
SuperTAX has hired your group as consultants to move their desktop software to traditional IT-hosted software, available online.

For Discussion:
Identify the challenges that your team will encounter in attempting to move the SuperTAX software to the new platform.
Prepare a presentation for the class.
Within your group you will need to define positions, for example: Project Manager, Senior Network Engineer, Senior Project Engineer, etc.

CHALLENGES
Infrastructure
Software Development
Software Testing
Marketing & Business Model
Project Management

CHALLENGES: Infrastructure
No more testing on a single machine (CD/DVD format model).
Testing in a production cluster (20, 30 users?).
A larger cluster can bring problems (1000s of users).
Testing must be done for different clients (mobile, desktop, OS).
A small performance bottleneck can lead to slow performance.

CHALLENGES: Marketing & Business Model
One-time fixed cost vs. subscription model: before, a CD was sold; now, a subscription.
Maintenance and replacement of cooling, power, and servers is required.

CHALLENGES: Project Management
A project can take many months to years for the software development cycle.
What model is appropriate for a hosted application: agile vs. waterfall?
Ability to try new features faster.
Shalini Kantamneni
Ottawa University
Intersession 5 Final Project Projection
The Design Process
This process involves the formulation of a model to be used in deriving a comprehensive cloud application. In this case, the model-view-controller design pattern will be used. This type of design pattern partitions the logic of the application into three distinct domains that are to be interconnected to provide a working cloud application (Jailia et al., 2016). ...
Model-driven Framework for Dynamic Deployment and Reconfiguration of Componen... (Madjid KETFI)
Model-driven Framework for Dynamic Deployment and Reconfiguration of Component-based Software Systems
M. Ketfi and N. Belkhatir
Published in the proceedings of the Metainformatics Symposium, MIS 2005, Esbjerg, Denmark, November 08-11, 2005 (20 pages).
DESIGN PATTERNS IN THE WORKFLOW IMPLEMENTATION OF MARINE RESEARCH GENERAL INF... (AM Publications)
This paper proposes the use of design patterns in a marine research general information platform. The development of the platform involves the design of a complicated system architecture. The creation and execution of research workflow nodes and the design of a visualization library suited to marine users play an important role in the overall software architecture. This paper studies the characteristic requirements of marine research fields and implements a series of frameworks to solve these problems based on object-oriented and design-pattern techniques. These frameworks clarify the relationships between the modules and layers of the software, which communicate through unified abstract interfaces, reducing the coupling between modules and layers. Building these frameworks is significant in advancing the reusability of the software and strengthening the extensibility and maintainability of the system.
Dynamic Component Deployment and (Re)Configuration Using a Unified Framework (Madjid KETFI)
Dynamic Component Deployment and (Re) Configuration Using a Unified Framework
M. Ketfi and N. Belkhatir
Proceedings of the ISCA 18th International Conference on Computer Applications in Industry and Engineering, CAINE 2005, November 9-11, 2005, Honolulu, Hawaii, USA. ISBN 1-880843-57-9 (pages 85-90).
The Cloud computing paradigm emerged by establishing new resource provisioning and consumption models. Together with the improvement of resource management techniques, these models have contributed to an increase in the number of application developers who strongly support partially or completely migrating their applications to a highly scalable, pay-per-use infrastructure. In this paper we derive a set of functional and non-functional requirements and propose a process-based approach to support the optimal distribution of an application in the Cloud in order to handle workloads that fluctuate over time. Using the TPC-H workload as the basis, and by means of empirical workload analysis and characterization, we evaluate the performance of the application's persistence layer under different deployment scenarios using generated workloads with particular behavior characteristics.
The Cloud computing paradigm emerged by establishing innovative resource provisioning and consumption models. Together with the improvement of resource management techniques, these models have contributed to an increase in the number of application developers who strongly support partially or completely migrating their applications to a highly scalable, pay-per-use infrastructure. However, due to the continuous growth of Cloud providers and Cloud offerings, Cloud application developers must now face additional application design challenges related to the efficient selection of such offerings to optimally distribute the application in a Cloud infrastructure. Focusing on the performance aspects of the application, additional challenges arise, as application workloads fluctuate over time and therefore produce a variation in infrastructure resource demands. In this research work we aim to define and realize the underpinning concepts to support the optimal (re-)distribution of an application in the Cloud in order to handle workloads that fluctuate over time.
Learning LWF Chain Graphs: A Markov Blanket Discovery Approach (Pooyan Jamshidi)
LWF Chain graphs were introduced by Lauritzen, Wermuth, and Frydenberg as a generalization of graphical models based on undirected graphs and DAGs. From the causality point of view, in an LWF CG: Directed edges represent direct causal effects. Undirected edges represent causal effects due to interference, which occurs when an individual’s outcome is influenced by their social interaction with other population members, e.g., in situations that involve contagious agents, educational programs, or social networks. The construction of chain graph models is a challenging task that would be greatly facilitated by automation.
Markov blanket discovery plays an important role in the structure learning of Bayesian networks. It is surprising, however, how little attention it has attracted in the context of learning LWF chain graphs. In this work, we provide a graphical characterization of Markov blankets in chain graphs. The characterization is different from the well-known one for Bayesian networks and generalizes it. We provide a novel scalable and sound algorithm for Markov blanket discovery in LWF chain graphs. We also provide a sound and scalable constraint-based framework for learning the structure of LWF CGs from faithful causally sufficient data. With the use of our algorithm, the problem of structure learning is reduced to finding an efficient algorithm for Markov blanket discovery in LWF chain graphs. This greatly simplifies the structure-learning task and makes a wide range of inference/learning problems computationally tractable because our approach exploits locality.
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn... (Pooyan Jamshidi)
We enable reliable and dependable self‐adaptations of component connectors in unreliable environments with imperfect monitoring facilities and conflicting user opinions about adaptation policies by developing a framework which comprises: (a) mechanisms for robust model evolution, (b) a method for adaptation reasoning, and (c) tool support that allows an end‐to‐end application of the developed techniques in real‐world domains.
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut... (Pooyan Jamshidi)
Modern cyber-physical systems (e.g., robotics systems) are typically composed of physical and software components, the characteristics of which are likely to change over time. Assumptions about parts of the system made at design time may not hold at run time, especially when a system is deployed for long periods (e.g., over decades). Self-adaptation is designed to find reconfigurations of systems to handle such run-time inconsistencies. Planners can be used to find and enact optimal reconfigurations in such an evolving context. However, for systems that are highly configurable, such planning becomes intractable due to the size of the adaptation space. To overcome this challenge, in this paper we explore an approach that (a) uses machine learning to find Pareto-optimal configurations without needing to explore every configuration and (b) restricts the search space to such configurations to make planning tractable. We explore this in the context of robot missions that need to consider task timeliness and energy consumption. An independent evaluation shows that our approach results in high-quality adaptation plans in uncertain and adversarial environments.
Paper: https://arxiv.org/abs/1903.03920
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ... (Pooyan Jamshidi)
Despite achieving state-of-the-art performance across many domains, machine learning systems are highly vulnerable to subtle adversarial perturbations. Although defense approaches have been proposed in recent years, many have been bypassed by even weak adversarial attacks. Previous studies showed that ensembles created by combining multiple weak defenses (i.e., input data transformations) are still weak. In this talk, I will show that it is indeed possible to construct effective ensembles using weak defenses to block adversarial attacks. However, to do so requires a diverse set of such weak defenses. Based on this motivation, I will present Athena, an extensible framework for building effective defenses to adversarial attacks against machine learning systems. I will talk about the effectiveness of ensemble strategies with a diverse set of many weak defenses that comprise transforming the inputs (e.g., rotation, shifting, noising, denoising, and many more) before feeding them to target deep neural network classifiers. I will also discuss the effectiveness of the ensembles with adversarial examples generated by various adversaries in different threat models. In the second half of the talk, I will explain why building defenses based on the idea of many diverse weak defenses works, when it is most effective, and what its inherent limitations and overhead are.
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ... (Pooyan Jamshidi)
Modern systems (e.g., deep neural networks, big data analytics, and compilers) are highly configurable, which means they expose different performance behavior under different configurations. The fundamental challenge is that one cannot simply measure all configurations due to the sheer size of the configuration space. Transfer learning has been used to reduce the measurement effort by transferring knowledge about the performance behavior of systems across environments. Previous research has shown that statistical models are indeed transferable across environments. In this work, we investigate the identifiability and transportability of causal effects and statistical relations in highly-configurable systems. Our causal analysis agrees with previous exploratory analysis [Jamshidi17] and confirms that the causal effects of configuration options can be carried over across environments with high confidence. We expect that the ability to carry over causal relations will enable effective performance analysis of highly-configurable systems.
Integrated Model Discovery and Self-Adaptation of Robots (Pooyan Jamshidi)
Learn models efficiently under budget constraints to adapt to perturbations such as environmental changes or changes in internal resources.
Modern software-intensive systems are composed of components that are likely to change their behaviour over time (e.g., adding/removing components).
For software to continue to operate under such changes, the assumptions about parts of the system made at design time may not hold at runtime due to uncertainty.
Mechanisms must be put in place that can dynamically learn new models of these assumptions and use them to make decisions about missions, configurations, etc.
Transfer Learning for Performance Analysis of Highly-Configurable Software (Pooyan Jamshidi)
A wide range of modern software-intensive systems (e.g., autonomous systems, big data analytics, robotics, deep neural architectures) are built to be configurable. These systems offer a rich space for adaptation to different domains and tasks. Developers and users often need to reason about the performance of such systems, making tradeoffs to change specific quality attributes or detecting performance anomalies. For instance, developers of image recognition mobile apps are not only interested in learning which deep neural architectures are accurate enough to classify their images correctly, but also which architectures consume the least power on the mobile devices on which they are deployed. Recent research has focused on models built from performance measurements obtained by instrumenting the system. However, the fundamental problem is that the learning techniques for building a reliable performance model do not scale well, simply because the configuration space is exponentially large and impossible to explore exhaustively. For example, it would take over 60 years to explore the whole configuration space of a system with 25 binary options.
In this talk, I will start by motivating the configuration space explosion problem based on my previous experience with large-scale big data systems in industry. I will then present my transfer learning solution to tackle the scalability challenge: instead of taking the measurements from the real system, we learn the performance model using samples from cheap sources, such as simulators that approximate the performance of the real system with fair fidelity and at low cost. Results show that despite the high cost of measurement on the real system, learning performance models can become surprisingly cheap as long as certain properties are reused across environments. In the second half of the talk, I will present empirical evidence, which lays a foundation for a theory explaining why and when transfer learning works, by showing the similarities of performance behavior across environments. I will present observations of the impacts of environmental changes (such as changes to hardware, workload, and software versions) for a selected set of configurable systems from different domains to identify the key elements that can be exploited for transfer learning. These observations demonstrate a promising path for building efficient, reliable, and dependable software systems. Finally, I will share my research vision for the next five years and outline my immediate plans to further explore the opportunities of transfer learning.
Related Papers:
https://arxiv.org/pdf/1709.02280
https://arxiv.org/pdf/1704.00234
https://arxiv.org/pdf/1606.06543
Architectural Tradeoff in Learning-Based Software (Pooyan Jamshidi)
In classical software development, developers write explicit instructions in a programming language to hardcode the explicit behavior of software systems. By writing each line of code, the programmer instructs the software to have the desirable behavior by exploring a specific point in program space.
Recently, however, software systems are adding learning components that, instead of hardcoding an explicit behavior, learn a behavior through data. The learning-intensive software systems are written in terms of models and their parameters that need to be adjusted based on data. In learning-enabled systems, we specify some constraints on the behavior of a desirable program (e.g., a data set of input–output pairs of examples) and use the computational resources to search through the program space to find a program that satisfies the constraints. In neural networks, we restrict the search to a continuous subset of the program space.
This talk provides experimental evidence of making tradeoffs for deep neural network models, using the Deep Neural Network Architecture system as a case study. Concrete experimental results are presented; also featured are additional case studies in big data (Storm, Cassandra), data analytics (configurable boosting algorithms), and robotics applications.
Sensitivity Analysis for Building Adaptive Robotic Software (Pooyan Jamshidi)
P. Jamshidi, M. Velez, C. Kästner, N. Siegmund, and P. Kawthekar. Transfer learning for improving model predictions in highly configurable software. Int’l Symp. Software Engineering for Adaptive and Self-Managing Systems (SEAMS), 2017.
Transfer Learning for Improving Model Predictions in Highly Configurable Soft... (Pooyan Jamshidi)
Modern software systems are now being built to be used in dynamic environments utilizing configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and, therefore, we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: Instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at low cost.
Transfer Learning for Improving Model Predictions in Robotic Systems (Pooyan Jamshidi)
Modern software systems are now being built to be used in dynamic environments utilizing configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and, therefore, we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: Instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at low cost.
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S... (Pooyan Jamshidi)
https://arxiv.org/abs/1606.06543
Finding optimal configurations for Stream Processing Systems (SPS) is a challenging problem due to the large number of parameters that can influence their performance and the lack of analytical models to anticipate the effect of a change. To tackle this issue, we consider tuning methods where an experimenter is given a limited budget of experiments and needs to carefully allocate this budget to find optimal configurations. We propose in this setting Bayesian Optimization for Configuration Optimization (BO4CO), an auto-tuning algorithm that leverages Gaussian Processes (GPs) to iteratively capture posterior distributions of the configuration spaces and sequentially drive the experimentation. Validation based on Apache Storm demonstrates that our approach locates optimal configurations within a limited experimental budget, with an improvement of SPS performance typically of at least an order of magnitude compared to existing configuration algorithms.
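To make the BO4CO idea concrete, here is a minimal sketch of a GP-driven configuration-tuning loop. It is not the paper's implementation: it assumes scikit-learn's GaussianProcessRegressor, a toy one-dimensional configuration space, a synthetic latency function, and a simple lower-confidence-bound acquisition.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Toy stand-in for measuring SPS latency under one numeric configuration option.
def measure_latency(x: float) -> float:
    return (x - 0.6) ** 2 + 0.01 * np.random.randn()

candidates = np.linspace(0, 1, 200).reshape(-1, 1)  # discretized config space
X = [[0.0], [0.5], [1.0]]                            # initial design
y = [measure_latency(x[0]) for x in X]

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):                                  # limited experimental budget
    gp.fit(np.array(X), np.array(y))
    mu, sigma = gp.predict(candidates, return_std=True)
    lcb = mu - 1.96 * sigma                          # lower confidence bound
    x_next = candidates[np.argmin(lcb)]              # most promising configuration
    X.append(list(x_next))
    y.append(measure_latency(x_next[0]))

best = X[int(np.argmin(y))]
print("best configuration found:", best, "latency:", min(y))
```

The acquisition step is where the GP's posterior variance pays off: configurations that are either predicted to be fast or are highly uncertain get tried first, which is how a limited budget of experiments is allocated carefully.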
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf (Peter Spielvogel)
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Enhancing Performance with Globus and the Science DMZ (Globus)
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
The new frontiers of AI in RPA with UiPath Autopilot™ (UiPathCommunity)
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of Automations.
📕 Together we will look at some examples of Autopilot in use across different tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
The Metaverse and AI: how can decision-makers harness the Metaverse for their... (Jen Stirrup)
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
2. My background

RobusT2Scale/FQL4KE [PhD]
✓ Engineering / Technical
✓ Auto-scaling in cloud (RobusT2Scale)
✓ Self-learning controller for cloud auto-scaling (FQL4KE)

BO4CO/TL4CO [Postdoc1@Imperial]
✓ Mathematical modeling
✓ Configuration optimization for big data (BO4CO)
✓ Performance-aware DevOps (TL4CO)

Transfer Learning [Postdoc2@CMU]
✓ Empirical
✓ Learns accurate and reliable models from "related" sources
✓ Reuse learning across environmental changes

Software industry [2003-2010]: Pre-PhD
Close collaborations with Intel and Microsoft [PhD]
3 EU projects: MODAClouds (cloud), DICE (big data), Human Brain (clustering) [Postdoc1@Imperial]
1 DARPA project: BRASS (Robotics) [Postdoc2@CMU]
4. Cloud applications

• 82% of end-users give up on a lost payment transaction*
• 25% of end-users leave if load time > 4s**
• 1% reduced sale per 100ms load time**
• 20% reduced income if 0.5s longer load time***
* JupiterResearch ** Amazon *** Google

[Diagram: flash-crowds, failures, capacity shortage → slow application]
[Credit to Cristian Klein, Brownout]
6. Common characteristics of the systems
• Modern systems are increasingly configurable
• Modern systems are deployed in dynamic and uncertain environments
• Modern systems can be adapted on the fly
Hey, You Have Given Me Too Many Knobs!
Understanding and Dealing with Over-Designed Configuration in System Software
Tianyin Xu*, Long Jin*, Xuepeng Fan*‡, Yuanyuan Zhou*,
Shankar Pasupathy† and Rukma Talwadker†
*University of California San Diego, ‡Huazhong Univ. of Science & Technology, †NetApp, Inc
{tixu, longjin, xuf001, yyzhou}@cs.ucsd.edu
{Shankar.Pasupathy, Rukma.Talwadker}@netapp.com
ABSTRACT
Configuration problems are not only prevalent, but also severely impair the reliability of today's system software. One fundamental reason is the ever-increasing complexity of configuration, reflected by the large number of configuration parameters ("knobs"). With hundreds of knobs, configuring system software to ensure high reliability and performance becomes a daunting, error-prone task.
This paper makes a first step in understanding a fundamental question of configuration design: "do users really need so many knobs?" To provide a quantitative answer, we study the configuration settings of real-world users, including thousands of customers of a commercial storage system (Storage-A), and hundreds of users of two widely-used open-source system software projects. Our study reveals a series of interesting findings to motivate software architects and developers to be more cautious and disciplined in configuration design. Motivated by these findings, we provide a few concrete, practical guidelines which can significantly reduce the configuration space. Taking Storage-A as an example, the guidelines can remove 51.9% of its parameters and simplify 19.7% of the remaining ones with little impact on existing users. Also, we study the existing configuration navigation methods in the context of "too many knobs" to understand their effectiveness in dealing with over-designed configuration, and to provide practices for building navigation support in system software.
Categories and Subject Descriptors: D.2.10 [Software Engineering]
[Figure 1: The increasing number of configuration parameters with software evolution; each panel plots the number of parameters against release time for Storage-A, MySQL, Apache, and Hadoop (MapReduce and HDFS). Storage-A is a commercial storage system from a major storage company in the U.S.]
[Credit to Tianyin Xu, Too Many Knobs]
11. An Example of an Auto-scaling Rule
These values are required to be determined by users:
⇒ requires deep knowledge of the application (CPU, memory)
⇒ requires performance modeling expertise (how to scale)
⇒ a unified opinion of the user(s) is required
Examples: Amazon Auto Scaling, Microsoft Azure Watch, Microsoft Azure Auto-scaling Application Block
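The rule itself did not survive extraction, but a hypothetical threshold rule of the kind these frameworks expect might look like the sketch below; every constant is a value the user would have to supply, which is exactly the burden the slide highlights:

```python
# Hypothetical threshold-based auto-scaling rule, in the style of
# Amazon Auto Scaling / Azure rules. Every constant below is a value
# the user must pick, which requires knowledge of the application's
# CPU and memory profile plus performance modeling expertise.
def autoscale(cpu_util: float, mem_util: float, n_vms: int) -> int:
    """Return the new VM pool size given current average utilization."""
    SCALE_OUT_CPU = 0.80   # requires knowing the app's CPU profile
    SCALE_IN_CPU = 0.30
    SCALE_OUT_MEM = 0.75   # requires knowing the app's memory profile
    STEP = 2               # how many VMs to add/remove: how to scale?
    MIN_VMS, MAX_VMS = 2, 20

    if cpu_util > SCALE_OUT_CPU or mem_util > SCALE_OUT_MEM:
        return min(n_vms + STEP, MAX_VMS)   # scale out under pressure
    if cpu_util < SCALE_IN_CPU:
        return max(n_vms - STEP, MIN_VMS)   # scale in when idle
    return n_vms                             # otherwise hold steady
```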
36. GP for modeling a black-box response function
[Figure: GP posterior over a black-box response function, with legend: true function, GP mean, GP variance, observation, selected point, true minimum.]

The motivation for using Bayesian Optimization here is that it offers a framework in which reasoning can be based not only on mean estimates but also on the variance, providing more informative decision making. The other reason is that all the computations in this framework are based on tractable linear algebra.

In our previous work [21], we proposed BO4CO, which exploits single-task GPs (no transfer learning) for prediction of the posterior distribution of response functions. A GP model is composed of its prior mean ($\mu(\cdot): \mathbb{X} \to \mathbb{R}$) and a covariance function ($k(\cdot, \cdot): \mathbb{X} \times \mathbb{X} \to \mathbb{R}$) [41]:

$$y = f(x) \sim \mathcal{GP}(\mu(x), k(x, x')), \quad (2)$$

where the covariance $k(x, x')$ defines the distance between $x$ and $x'$. Let us assume $S_{1:t} = \{(x_{1:t}, y_{1:t}) \mid y_i := f(x_i)\}$ is the collection of $t$ experimental observations. In this framework, we treat $f(x)$ as a random variable, conditioned on the observations $S_{1:t}$, which is normally distributed with the following posterior mean and variance functions [41]:

$$\mu_t(x) = \mu(x) + k(x)^\top (K + \sigma^2 I)^{-1} (y - \mu) \quad (3)$$

$$\sigma_t^2(x) = k(x, x) + \sigma^2 I - k(x)^\top (K + \sigma^2 I)^{-1} k(x) \quad (4)$$

where $y := y_{1:t}$, $k(x)^\top = [k(x, x_1)\ k(x, x_2)\ \dots\ k(x, x_t)]$, $\mu := \mu(x_{1:t})$, $I$ is the identity matrix, and $K$ is the covariance matrix

$$K := \begin{bmatrix} k(x_1, x_1) & \dots & k(x_1, x_t) \\ \vdots & \ddots & \vdots \\ k(x_t, x_1) & \dots & k(x_t, x_t) \end{bmatrix} \quad (7)$$

The shortcoming of BO4CO is that it cannot exploit observations regarding other versions of the system and therefore cannot be applied in DevOps. Moreover, while GP models have been shown to be effective for performance predictions in data-scarce domains [20], as demonstrated in Figure 2 they may become inaccurate when the samples do not cover the space uniformly; for highly configurable systems, we require a large number of observations to cover the space uniformly, making GP models ineffective in such situations.

3.2 TL4CO: an extension to multi-tasks
TL4CO uses MTGPs that exploit observations from previous versions of the system under test. Algorithm 1 defines the internal details of TL4CO. As Figure 4 shows, TL4CO is an iterative algorithm that uses the learning from other system versions. In a high-level overview, TL4CO: (i) selects the most informative past observations (details in Section 3.3); (ii) fits a model to the existing data based on kernel learning (details in Section 3.4); and (iii) selects the next configuration based on the model (details in Section 3.5). In the multi-task framework, we use historical data to fit a better GP providing more accurate predictions. Before that, we measure a few sample points based on Latin Hypercube Design.

Motivations:
1. mean estimates + variance
2. all computations are linear algebra
3. good estimates when data are scarce
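A direct NumPy transcription of the posterior formulas (3), (4), and (7) may help; this is a sketch that assumes a squared-exponential kernel and a zero prior mean, neither of which the excerpt specifies:

```python
import numpy as np

def sq_exp_kernel(A, B, length=0.2):
    """Squared-exponential covariance k(x, x'); an assumed choice of kernel."""
    d = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / length ** 2)

def gp_posterior(X_train, y_train, X_test, noise=1e-3):
    """Posterior mean and variance per Eqs. (3)-(4), with prior mean mu = 0."""
    K = sq_exp_kernel(X_train, X_train)                      # K as in Eq. (7)
    K_inv = np.linalg.inv(K + noise * np.eye(len(X_train)))  # (K + sigma^2 I)^-1
    k_star = sq_exp_kernel(X_test, X_train)                  # k(x)^T per test point
    mu = k_star @ K_inv @ y_train                            # Eq. (3)
    var = (sq_exp_kernel(X_test, X_test).diagonal()
           + noise
           - np.sum(k_star @ K_inv * k_star, axis=1))        # Eq. (4), diagonal
    return mu, var

# Example: condition on three observed configurations and predict two new ones.
X = np.array([[0.1], [0.5], [0.9]])
y = np.array([1.2, 0.4, 0.9])
mu, var = gp_posterior(X, y, np.array([[0.3], [0.7]]))
print(mu, var)
```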
Model prediction using transfer learning
In transfer learning, the key question is how to make accurate predictions for the target environment using observations from other sources, $\mathcal{D}_s$. We need a measure of relatedness not only between input configurations but between the sources as well. The relationships between input configurations were captured in the GP models using the covariance matrix defined based on the kernel function in Eq. (7). More specifically, a kernel is a function that computes a dot product (a measure of "similarity") between two input configurations, so the kernel helps to get accurate predictions for similar configurations. We now need to exploit the relationship between the source and target functions, $g, f$, using the current observations $\mathcal{D}_s, \mathcal{D}_t$ to build the predictive model $\hat{f}$. To capture this relationship, we define the following kernel function:

$$k(f, g, x, x') = k_t(f, g) \times k_{xx}(x, x'), \quad (8)$$

where the kernel $k_t$ represents the correlation between the source and target functions, while $k_{xx}$ is the covariance function for inputs. Typically, $k_{xx}$ is parameterized and its parameters are learned by maximizing the marginal likelihood of the model, a standard method [35].
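The multiplicative structure of Eq. (8) is simple to realize in code. A minimal sketch, reusing the squared-exponential $k_{xx}$ from the previous sketch and assuming a single scalar source-target correlation rho (in practice both would be learned by maximizing the marginal likelihood):

```python
import numpy as np

def k_xx(A, B, length=0.2):
    """Input-space covariance k_xx(x, x'); assumed squared-exponential."""
    d = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / length ** 2)

def transfer_kernel(X_a, task_a, X_b, task_b, rho=0.8):
    """Eq. (8): k(f, g, x, x') = k_t(f, g) * k_xx(x, x').

    rho is a hypothetical learned correlation between source g and target f;
    k_t is 1 within a task and rho across tasks.
    """
    k_t = 1.0 if task_a == task_b else rho
    return k_t * k_xx(X_a, X_b)

# Joint covariance over source observations Ds and target observations Dt:
X_s = np.array([[0.1], [0.4], [0.8]])   # cheap source samples (e.g., simulator)
X_t = np.array([[0.5]])                  # scarce target samples (real system)
K = np.block([
    [transfer_kernel(X_s, "src", X_s, "src"), transfer_kernel(X_s, "src", X_t, "tgt")],
    [transfer_kernel(X_t, "tgt", X_s, "src"), transfer_kernel(X_t, "tgt", X_t, "tgt")],
])
print(K.shape)  # (4, 4) joint covariance, used in place of K in Eqs. (5)-(6)
```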
An overview of a self-optimization solution is depicted in Figure 3, following the well-known MAPE-K framework [9], [23]. We consider the GP model as the K (knowledge) component of this framework, which acts as an interface that other components can query for the performance under specific configurations or update given a new observation. We use transfer learning to make the knowledge more accurate using observations taken from a simulator or any other cheap source. For deciding how many observations to transfer, and from what source, we use the cost model introduced earlier. At runtime, the managed system is Monitored by pulling the end-to-end performance metrics (e.g., latency, throughput) from the corresponding sensors. Then, the retrieved performance data is Analysed, and the mean performance associated with a specific setting of the system is stored in a data repository. Next, the GP model is updated taking into account the new performance observation. Having updated the GP model, a new configuration may be Planned to replace the current configuration. Finally, the new configuration is enacted by Executing appropriate platform-specific operations. This enables model-based knowledge evolution using machine learning [2], [21]. The underlying GP model can now be updated not only when a new observation is available but also by transferring the learning from other related sources. So at each adaptation cycle, we can update our belief about the correct response given data from the managed system and other related sources, accelerating the learning process.
IV. EXPERIMENTAL RESULTS

We evaluate the effectiveness and applicability of our transfer learning approach for learning models for highly configurable systems, in particular, compared to conventional non-transfer learning. Specifically, we aim to answer the following three research questions:

RQ1: How much does transfer learning improve the prediction accuracy?
A GP model does not require internal details about the system; the learning process can be applied in a black-box fashion using the sampled performance measurements. In the GP framework, it is also possible to incorporate domain knowledge as a prior, if available, which can enhance the model accuracy [20].

In order to describe the technical details of our transfer learning methodology, let us briefly give an overview of GP model regression; a more detailed description can be found elsewhere [35]. GP models assume that the function f̂(x) can be interpreted as a probability distribution over functions:

y = f̂(x) ∼ GP(µ(x), k(x, x′)),   (4)

where µ : X → R is the mean function and k : X × X → R is the covariance function (kernel function), which describes the relationship between response values, y, according to the distance of the input values x, x′. The mean and variance of the GP model predictions can be derived analytically [35]:

µt(x) = µ(x) + k(x)ᵀ(K + σ²I)⁻¹(y − µ),   (5)
σt²(x) = k(x, x) + σ²I − k(x)ᵀ(K + σ²I)⁻¹k(x),   (6)
where k(x)ᵀ = [k(x, x1) k(x, x2) . . . k(x, xt)], I is the identity matrix, and

K := ⎡ k(x1, x1) · · · k(x1, xt) ⎤
     ⎢     ⋮       ⋱       ⋮     ⎥
     ⎣ k(xt, x1) · · · k(xt, xt) ⎦   (7)
GP models have shown to be effective for performance predictions in data-scarce domains [20]. However, as we have demonstrated in Figure 2, they may become inaccurate when the samples do not cover the space uniformly. For highly configurable systems, we require a large number of observations to cover the space uniformly, making GP models ineffective in such situations.
Maximizing the marginal likelihood to learn the kernel parameters is a standard method [35]. After learning the parameters of kxx, we construct the covariance matrix exactly the same way as in Eq. (7) and derive the mean and variance of predictions using Eqs. (5), (6) with the new K. The essence of transfer learning is, therefore, the kernel that captures the source-target relationship and provides more accurate predictions using the additional knowledge we can gain via the relationship between source and target.
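A compact numpy sketch of how Eq. (8) changes the computation: build the joint covariance over source and target observations with the cross-blocks scaled by kt, then reuse Eqs. (5)-(6). For simplicity we treat kt as a single scalar correlation, which is our simplification rather than the paper's full multi-task parameterization:

```python
import numpy as np

def multitask_gram(Xs, Xt, kxx, kt):
    """Joint Gram matrix over source and target inputs under Eq. (8),
    with a single scalar kt scaling the source-target cross-blocks."""
    X = np.vstack([Xs, Xt])
    K = kxx(X, X)
    B = np.ones((len(X), len(X)))
    B[:len(Xs), len(Xs):] = kt
    B[len(Xs):, :len(Xs)] = kt
    return B * K

def transfer_predict(Xs, ys, Xt, yt, X_star, kxx, kt, noise=1e-2):
    """Posterior mean at target points X_star using source + target data,
    i.e., Eq. (5) with the transfer-aware K."""
    n = len(Xs) + len(Xt)
    K = multitask_gram(Xs, Xt, kxx, kt) + noise * np.eye(n)
    k_star = kxx(X_star, np.vstack([Xs, Xt]))
    k_star[:, :len(Xs)] *= kt        # cross-covariance to source observations
    y = np.concatenate([ys, yt])
    return k_star @ np.linalg.solve(K, y)
```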
D. Transfer learning in a self-adaptation loop
Now that we have described the idea of transfer learning for
providing more accurate predictions, the question is whether
such an idea can be applied at runtime and how self-adaptive
systems can benefit from it. More specifically, we
now describe the idea of model learning and transfer learning
in the context of self-optimization, where the system adapts
its configuration to meet performance requirements at runtime.
The difference to traditional configurable systems is that we
learn the performance model online in a feedback loop under
time and resource constraints. Such performance reasoning is
done more frequently for self-adaptation purposes.
An overview of a self-optimization solution is depicted in Figure 3, following the well-known MAPE-K framework [9], [23]. We consider the GP model as the K (knowledge) component of this framework, which acts as an interface that other components can query for the performance under specific configurations, or update given a new observation. We use transfer learning to make the knowledge more accurate using observations that are taken from a simulator or any other cheap source. For deciding how many observations to transfer, and from which source, we use the cost model that we introduced earlier. At runtime, the managed system is Monitored by pulling the end-to-end performance metrics (e.g., latency, throughput) from the corresponding sensors. Then, the retrieved performance data is Analyzed, and the mean performance associated with a specific setting of the system is stored in a data repository. Next, the GP model is updated taking into account the new performance observation. Having updated the GP model, a new configuration may be Planned to replace the current configuration. Finally, the new configuration is enacted by Executing appropriate platform-specific operations. This enables model-based knowledge evolution using machine learning [2], [21]. The underlying GP model can now be updated not only when a new observation is available but also by transferring the learning from other related sources. So at each adaptation cycle, we can update our belief about the correct response given data from the managed system and other related sources, accelerating the learning process.
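The loop reads naturally as code; here is a schematic of one adaptation cycle, where the GP model plays the K role and monitor/analyze/plan/execute stand in for platform-specific components (all names are placeholders, not an actual API):

```python
def adaptation_cycle(gp_model, monitor, analyze, plan, execute, sources=()):
    """One MAPE-K cycle with a GP model as the K (knowledge) component."""
    metrics = monitor()                    # Monitor: pull latency, throughput, ...
    config, mean_perf = analyze(metrics)   # Analyze: mean performance per setting
    gp_model.update(config, mean_perf)     # K: update with the new observation
    for source in sources:                 # K: transfer from cheap related sources
        gp_model.update(*source.sample())
    new_config = plan(gp_model)            # Plan: query the model for a better config
    execute(new_config)                    # Execute: platform-specific operations
```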
44. Transfer learning improves optimization

[Figure: (a) three sample response functions (1), (2), (3) over the configuration domain; (b) a GP fit for (1) that ignores the observations for (2) and (3), where the LCB acquisition is not informative; (c) a multi-task GP fit for (1) by transfer learning from (2) and (3), which is highly informative. The panels show the GP prediction mean, the GP prediction variance, and the probability distribution of the minimizers.]
65. Transferable knowledge

[Diagram: Source (Given), with its data and model, provides transferable knowledge to the Target (Learn).]
II. INTUITION

Understanding the performance behavior of configurable software systems can enable (i) performance debugging, (ii) performance tuning, (iii) design-time evolution, or (iv) runtime adaptation [11]. We lack an empirical understanding of how the performance behavior of a system will vary when the environment of the system changes. Such empirical understanding will provide important insights to develop faster and more accurate learning techniques that allow us to make predictions and optimizations of performance for highly configurable systems in changing environments [10]. For instance, we can learn the performance behavior of a system on cheap hardware in a controlled lab environment and use that to understand the performance behavior of the system on a production server before shipping it to the end user. More specifically, we would like to know what the relationship is between the performance of a system in a specific environment (characterized by software configuration, hardware, workload, and system version) and the one where we vary its environmental conditions.

In this research, we aim for an empirical understanding of performance behavior across environmental changes.
A. Preliminary concepts

In this section, we provide formal definitions of four concepts that we use throughout this study. The formal notations enable us to concisely convey concepts throughout the paper.

1) Configuration and environment space: Let Fi indicate the i-th feature of a configurable system A, which is either enabled or disabled, and one of them holds by default. The configuration space is mathematically a Cartesian product of all the features C = Dom(F1) × · · · × Dom(Fd), where Dom(Fi) = {0, 1}. A configuration of a system is then a member of the configuration space (feature space) where all the parameters are assigned a specific value in their range (i.e., complete instantiations of the system's parameters). We also describe an environment instance by 3 variables e = [w, h, v] drawn from a given environment space E = W × H × V, where the components respectively represent sets of possible values for workload, hardware, and system version.
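Under these definitions, a configuration is just a point in {0, 1}^d and an environment is a triple; a minimal encoding (the class and function names are ours, for illustration only):

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Environment:
    """An environment instance e = [w, h, v] from E = W x H x V."""
    workload: str
    hardware: str
    version: str

def configuration_space(d):
    """C = Dom(F1) x ... x Dom(Fd) with Dom(Fi) = {0, 1}:
    all 2^d binary configurations (enumerable only for small d)."""
    return list(product([0, 1], repeat=d))

env = Environment(workload="w1", hardware="h1", version="v1")
configs = configuration_space(4)      # 16 configurations for d = 4
```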
2) Performance model: Given a software system A with configuration space F and environmental instances E, a performance model is a black-box function f : F × E → R, given some observations of the system performance for each combination of the system's features x ∈ F in an environment e ∈ E. To construct a performance model for a system A with configuration space F, we run A in environment instance e ∈ E on various combinations of configurations xi ∈ F, and record the resulting performance values yi = f(xi) + εi, xi ∈ F, where εi ∼ N(0, σi). The training data for our regression models is then simply Dtr = {(xi, yi)}, i = 1..n. In other words, a response function is simply a mapping from the input space to a measurable performance metric that produces interval-scaled data (here we assume it produces real numbers).

3) Performance distribution: For the performance model, we measured and associated the performance response to each configuration; now let us introduce another concept where we vary the environment and measure the performance. An empirical performance distribution is a stochastic process, pd : E → Δ(R), that defines a probability distribution over performance measures for each environmental condition.
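Constructing Dtr then reduces to benchmarking sampled configurations; a short sketch, where `measure` is a placeholder for the actual benchmark harness:

```python
import random

def collect_training_data(configs, measure, n):
    """Sample n configurations and record y_i = f(x_i) + eps_i,
    yielding D_tr = {(x_i, y_i)}."""
    sampled = random.sample(configs, n)
    return [(x, measure(x)) for x in sampled]
```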
[Diagram: Extract → Reuse, from the source response function f(·, es) to the target response function f(·, et).]

We hypothesize that we can learn and transfer different forms of knowledge (both data and models) across environments, while so far only simple transfers have been attempted!
68. Subject systems

TABLE I: Overview of the real-world subject systems.

System  Domain         d   |C|     |H|  |W|  |V|
SPEAR   SAT solver     14  16 384  3    4    2
x264    Video encoder  16  4 000   2    3    3
SQLite  Database       14  1 000   2    14   2
SaC     Compiler       50  71 267  1    10   1

d: configuration options; C: configurations; H: hardware environments; W: analyzed workloads; V: analyzed versions.
IV. PERFORMANCE BEHAVIOR CONSISTENCY (RQ1)

Here, we investigate the relatedness of the source and target environments in the entire configuration space. We start by testing the strongest assumptions about the relatedness of environments.
69. Level of relatedness between source and target is important

Fig. 6: Prediction accuracy (absolute percentage error [%]) of the model learned with samples from different sources of different relatedness to the target. GP is the model without transfer learning.

Source        s      s1     s2     s3     s4     s5     s6
noise level   0      5      10     15     20     25     30
corr. coeff.  0.98   0.95   0.89   0.75   0.54   0.34   0.19
µ(pe)         15.34  14.14  17.09  18.71  33.06  40.93  46.75
• The model becomes more accurate when the source is more related to the target.
• Even learning from a source with a small correlation is better than no transfer.
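µ(pe) in Fig. 6 is the mean absolute percentage error of the predictions; for reference, the metric as commonly defined (whether the paper uses exactly this normalization is our assumption):

```python
import numpy as np

def mean_absolute_percentage_error(y_true, y_pred):
    """Mean absolute percentage error, in percent, as on the Fig. 6 y-axis."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)
```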
70. [Figure: six CPU usage [%] heatmaps (a)-(f) over the configuration space spanned by the number of particles and the number of refinements, for sources of varying relatedness. Less related → less accurate; more related → more accurate.]
79. Implications for transfer learning

• When and why transfer learning works for performance modeling:
• Small environmental changes -> performance behavior is consistent, and a linear transformation of performance models provides a good approximation (see the sketch below)
• Large environmental changes -> individual options and interactions may stay consistent, and a non-linear mapping relates performance behavior across environments
• Severe environmental changes -> we still found transferable knowledge, e.g., invalid configurations, providing opportunities for avoiding measurements
• Intuitive judgments about the transferability of knowledge are possible without deep knowledge of the configuration or implementation
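The linear-transformation observation can be checked directly by regressing target measurements on source measurements for the same configurations; a least-squares sketch (the function name is ours):

```python
import numpy as np

def fit_linear_transfer(y_source, y_target):
    """Fit y_target ~ a * y_source + b by least squares; a good fit
    supports reusing the source model under small environmental changes."""
    A = np.vstack([y_source, np.ones_like(y_source)]).T
    (a, b), residuals, *_ = np.linalg.lstsq(A, y_target, rcond=None)
    return a, b
```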
81. Future research opportunities

• Sampling strategies: more informative samples; exploiting the importance of specific regions or avoiding invalid regions
• Learning mechanisms: learning either linear or non-linear associations
• Performance testing and debugging: transferring interesting test cases that cover interactions between options
• Performance tuning and optimization: identifying the interacting options; importance sampling exploiting feature interactions
82. Selecting from multiple sources

[Diagram: Source Simulator -> Target Simulator; Source Robot -> Target Robot; transfer channels C1, C2, C3.]

• Different costs are associated with the different sources.
• The problem is to sample from the appropriate source so as to gain the most information within a limited budget (a greedy sketch follows below).
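A greedy sketch of this budgeted trade-off: at each step, pick the source whose next sample buys the most expected information per unit cost, until the budget runs out. `info_gain` is a placeholder for any acquisition estimate, and the source interface is assumed, not prescribed:

```python
def select_samples(sources, budget, info_gain):
    """Greedy cost-aware sampling across sources with different costs.
    Each source is assumed to expose .cost per sample and .sample()."""
    picks = []
    while budget >= min(s.cost for s in sources):
        best = max((s for s in sources if s.cost <= budget),
                   key=lambda s: info_gain(s) / s.cost)
        picks.append(best.sample())
        budget -= best.cost
    return picks
```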
86. Recap of my previous work

RobusT2Scale/FQL4KE [PhD]
✓ Engineering / technical
✓ Maintains application responsiveness
✓ Handles environmental uncertainties
✓ Enables knowledge evolution
✗ Learns slowly when the situation changes
[SEAMS14, QoSA16, CCGrid17, TAAS]

BO4CO/TL4CO [Postdoc1@Imperial]
✓ Mathematical modeling
✓ Finds the optimal configuration given a measurement budget
✓ Step-wise active learning
✗ Doesn't scale well to high dimensions
✗ Expensive to learn
[MASCOTS16, WICSA16]

Transfer Learning [Postdoc2@CMU]
✓ Empirical
✓ Learns accurate and reliable models from "related" sources
✓ Reuses learning across environmental changes
✗ For severe environmental changes, transfer is limited, but possible!
[SEAMS17]

Goal
✔ Industry-relevant research
✔ Learn accurate/reliable/cheap performance models
✔ The learned model is used for performance tuning/debugging/optimization/runtime adaptation