Architectural Tradeoff in
Learning-Based Software
Pooyan Jamshidi

Carnegie Mellon University
cs.cmu.edu/~pjamshid
SATURN 2018

Texas, May 8th, 2018
Who am I?
I hang around here!
Software
Arch.
Machine
Learning
Distributed
Systems
Goal: Enable developers/users
to find the right quality tradeoff
Today’s most popular (ML) systems are configurable
Empirical observations confirm that systems are
becoming increasingly configurable
[Figure: number of configuration parameters vs. release time, growing steadily across releases of Storage-A, MySQL, Apache (1.3.14 to 2.3.4), and Hadoop MapReduce/HDFS (0.1.0 to 2.0.0)]
[Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
“all constants should be configurable,
even if we can’t see any reason to
configure them.” — HDFS-4304
Configurations determine the performance behavior

void Parrot_setenv(... name, ... value){
#ifdef PARROT_HAS_SETENV
    my_setenv(name, value, 1);
#else
    int name_len = strlen(name);
    int val_len = strlen(value);
    char* envs = glob_env;
    if (envs == NULL) {
        return;
    }
    strcpy(envs, name);
    strcpy(envs + name_len, "=");
    strcpy(envs + name_len + 1, value);
    putenv(envs);
#endif
}

#ifdef LINUX
extern int Parrot_signbit(double x){
    ...
#endif

Compile-time options such as PARROT_HAS_SETENV and LINUX select different code paths, and thereby determine qualities such as Speed and Energy.
How do we understand the performance behavior of real-world, highly configurable systems that scale well…
… and enable developers/users to reason about qualities (performance, energy) and to make tradeoffs?
Outline
Case
Study
Transfer
Learning
Theory
Building
Guided
Sampling
Future
Directions
[SEAMS’17]
[ASE’17]
SocialSensor
• Identifying trending topics
• Identifying user-defined topics
• Social media search
SocialSensor
[Pipeline: Internet → Crawling (tweets: 5k-20k/min) → store Crawled items; the Orchestrator fetches every 10 min ([100k tweets]) and pushes to Content Analysis, which stores tweets ([10M]) for Search and Integration to fetch]
Challenges
[Same pipeline, annotated with the scaling targets: 100X, 10X, and real-time processing]
How can we achieve better performance without using more resources?
Let’s try out different system configurations!
Opportunity: Data processing engines in the pipeline were all configurable
> 100 options per engine; ~2,300 parameters in total
Default configuration was bad, and so was the expert’s
[Scatter: throughput (ops/sec, 0-1500) vs. average write latency (s, 0-5000), marking the Default, the configuration Recommended by an expert, and the Optimal Configuration; better = higher throughput, lower latency]
Default configuration was bad, and so was the expert’s
[Scatter: throughput (ops/sec, up to 2.5×10^4) vs. latency (ms, 0-300), again marking the Default, the expert’s recommendation, and the Optimal Configuration; better = higher throughput, lower latency]
Why is this an important problem?
Significant time savings
• 2X-10X faster than the worst configuration
• Noticeably faster than the median
• The default is bad
• The expert’s recommendation is not optimal
Large configuration space
• Exhaustive search is expensive
• Findings are specific to hardware/workload/version
What happened in the end?
• Achieved the objectives (100X users, same experience)
• Saved money by reducing cloud resources by up to 20%
• Our tool identified configurations that were consistently better than the expert’s recommendations
Outline
Case
Study
Transfer
Learning
Theory
Building
Guided
Sampling
Future
Directions
To enable performance tradeoffs, we need a model to reason about qualities

void Parrot_setenv(... name, ... value){
#ifdef PARROT_HAS_SETENV
    my_setenv(name, value, 1);
#else
    int name_len = strlen(name);
    int val_len = strlen(value);
    char* envs = glob_env;
    if (envs == NULL) {
        return;
    }
    strcpy(envs, name);
    strcpy(envs + name_len, "=");
    strcpy(envs + name_len + 1, value);
    putenv(envs);
#endif
}

#ifdef LINUX
...
#endif

Execution time (s): f(·) = 5 + 3 × o1
f(o1 := 0) = 5
f(o1 := 1) = 8
What is a performance model?
f : C → ℝ
c = ⟨o1, o2⟩
f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2
c = ⟨o1, o2, ..., o10⟩
c = ⟨o1, o2, ..., o100⟩
···
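The toy model above can be written out as executable code. A minimal sketch, using the slide’s toy coefficients (5, 3, 15, 7); `perf_model` is a hypothetical stand-in, not a real measurement:

```python
def perf_model(o1: int, o2: int) -> float:
    """Toy performance model over two binary options: f : C -> R."""
    return 5 + 3 * o1 + 15 * o2 - 7 * o1 * o2

# Enumerating the whole configuration space is only feasible for a few
# options; with 100 binary options there are 2^100 configurations.
space = {(o1, o2): perf_model(o1, o2) for o1 in (0, 1) for o2 in (0, 1)}
```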
How do we learn performance models?
Measure → Learn
TurtleBot
Configurations
f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2
The learned model supports optimization, reasoning, and debugging.
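The measure-then-learn loop can be sketched as fitting a regression with an interaction term to a handful of measured configurations. Here `measure` is a hypothetical stand-in for actually running the system in a given configuration:

```python
import numpy as np

def measure(o1, o2):
    # Stand-in for running the system in one configuration and timing it.
    return 5 + 3 * o1 + 15 * o2 - 7 * o1 * o2

# Measure a set of configurations...
configs = [(0, 0), (1, 0), (0, 1), (1, 1)]
X = np.array([[1, o1, o2, o1 * o2] for o1, o2 in configs], dtype=float)
y = np.array([measure(o1, o2) for o1, o2 in configs])

# ...then learn the coefficients by least squares.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef recovers [5, 3, 15, -7]; the learned model can now predict
# unmeasured configurations for optimization, reasoning, and debugging.
```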
Insight: Performance measurements of the real system are “similar” to the ones from the simulators
Measure
Simulator (Gazebo)
Data
Configurations
Performance
So why not reuse these data, instead of measuring on the real robot?
We developed methods to make learning cheaper
via transfer learning
Source (Given): learn a Model from Data, extract Transferable Knowledge
Target (Learn): reuse that knowledge when learning the target model

[Excerpt, SEAMS’17 background: Understanding the performance behavior of configurable software systems enables (i) performance debugging, (ii) performance tuning, (iii) design-time evolution, and (iv) runtime adaptation. A configuration is a member of the configuration space C = Dom(F1) × ··· × Dom(Fd), with Dom(Fi) = {0, 1}; an environment instance e = [w, h, v], drawn from W × H × V, captures workload, hardware, and system version. A performance model is a black-box function f : F × E → ℝ learned from observations yi = f(xi) + εi, εi ∼ N(0, σi), giving training data Dtr = {(xi, yi)}, i = 1..n. A performance distribution pd : E → Δ(ℝ) defines a probability distribution over performance measures for each environmental condition.]
Goal: Gain strength by
transferring information
across environments
TurtleBot
[P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17]
Our transfer learning solution
Measure on the TurtleBot and on the simulator (Gazebo) → Data
Reuse the simulator data → Learn
Configurations
f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2
[P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17]
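One simple way to realize the reuse step is model-shift transfer: measure many configurations on the cheap source, only a few on the target, and learn the source-to-target relationship. This is only a sketch (the SEAMS’17 solution uses Gaussian processes), and `simulator`/`robot` are hypothetical toy responses:

```python
import numpy as np

def simulator(c):                    # cheap source measurements (toy)
    return 10 + 4 * c

def robot(c):                        # expensive target measurements (toy)
    return 2 * simulator(c) + 3      # a linear shift of the source

configs = np.linspace(0, 1, 50)
src = simulator(configs)             # abundant source data

few = configs[[0, 25, 49]]           # only three robot measurements
A = np.stack([simulator(few), np.ones(len(few))], axis=1)
a, b = np.linalg.lstsq(A, robot(few), rcond=None)[0]

pred = a * src + b                   # transferred predictions for all configs
# With a strongly correlated source, three target samples suffice here.
```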
Gaussian processes for performance modeling
[Illustration: output f(x) vs. input x at t = n and t = n + 1; legend: Observation, Mean, Uncertainty, New observation]
Gaussian processes enable reasoning about performance

Step 1: Fit a GP to the data seen so far
Step 2: Explore the model for regions of most variance
Step 3: Sample that region
Step 4: Repeat

[Illustration: Configuration Space → Empirical Model → Experiment; a Selection Criterion drives the Sequential Design]
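The four steps above can be sketched with a small hand-rolled RBF-kernel GP; a hypothetical 1-D response surface stands in for a real configuration space:

```python
import numpy as np

def rbf(a, b, ell=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Standard GP regression: posterior mean and std at candidate points Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mean = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ij->j", Ks, np.linalg.solve(K, Ks))
    return mean, np.sqrt(np.clip(var, 0.0, None))

f = lambda x: np.sin(3 * x)              # hypothetical response surface
grid = np.linspace(0, 1, 101)            # candidate configurations
X = np.array([0.0, 1.0])                 # two initial experiments
y = f(X)

for _ in range(5):
    mean, std = gp_posterior(X, y, grid)  # Step 1: fit GP to data so far
    xn = grid[np.argmax(std)]             # Step 2: region of most variance
    X = np.append(X, xn)                  # Step 3: sample that region
    y = np.append(y, f(xn))               # Step 4: repeat

mean, std = gp_posterior(X, y, grid)      # uncertainty has shrunk everywhere
```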
CoBot experiment: DARPA BRASS

[Plot: localization error [m] (0-8) vs. CPU utilization [%] (10-40), showing the Pareto front, an energy constraint, a safety constraint, and the resulting sweet spot; better = lower error, lower CPU; configuration options: no_of_particles=x, no_of_refinement=y]
CoBot experiment

[Heat maps of CPU [%] over two configuration dimensions (5-25 × 5-25): Source (given), Target (ground truth, 6 months later), prediction with 4 samples, and prediction with transfer learning]
Results: Other configurable systems
CoBot WordCount SOL
RollingSort Cassandra (HW) Cassandra (DB)
[Paper: “Transfer Learning for Improving Model Predictions in Highly Configurable Software”, by Pooyan Jamshidi, Miguel Velez, Christian Kästner (Carnegie Mellon University), Norbert Siegmund (Bauhaus-University Weimar), and Prasad Kawthekar (Stanford University). Fig. 1: Transfer learning for performance model learning: measure on the simulator (source) and the robot (target), learn a model with transfer learning, and use the predictive model for adaptation.]
Details: [SEAMS ’17]
Summary (transfer learning)
• Model for making tradeoffs between qualities
• Scales to large configuration spaces and environmental changes
• Transfer learning can help
• Increase prediction accuracy
• Increase model reliability
• Decrease model-building cost
Outline
Case
Study
Transfer
Learning
Theory
Building
Guided
Sampling
Future
Directions
Looking further: When transfer learning goes wrong

[Box plots: Absolute Percentage Error [%] (10-60) for the non-transfer-learning baseline and for transfer from sources s, s1-s6]

Source        s      s1     s2     s3     s4     s5     s6
noise level   0      5      10     15     20     25     30
corr. coeff.  0.98   0.95   0.89   0.75   0.54   0.34   0.19
μ(pe)         15.34  14.14  17.09  18.71  33.06  40.93  46.75

It worked! … It didn’t!

Insight: Predictions become more accurate when the source is more related to the target.
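This insight suggests a cheap pre-check before transferring: measure a few configurations in both environments and compute the correlation coefficient (the “corr. coeff.” row of the table). A sketch on synthetic data; all measurements here are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
source = rng.uniform(10, 60, size=30)           # source measurements (toy)

related = 2.0 * source + rng.normal(0, 1, 30)   # strongly related target
unrelated = rng.uniform(10, 60, size=30)        # unrelated target

r_good = np.corrcoef(source, related)[0, 1]
r_bad = np.corrcoef(source, unrelated)[0, 1]
# Transfer when correlation is high; learn from scratch when it is low.
```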
Our empirical study: We looked at different highly-
configurable systems to gain insights
[P. Jamshidi, et al., “Transfer learning for performance modeling of configurable systems….”, ASE’17]
SPEAR (SAT solver): analysis time; 14 options, 16,384 configurations; SAT problems; 3 hardware platforms; 2 versions
x264 (video encoder): encoding time; 16 options, 4,000 configurations; video quality/size; 2 hardware platforms; 3 versions
SQLite (DB engine): query time; 14 options, 1,000 configurations; DB queries; 2 hardware platforms; 2 versions
SaC (compiler): execution time; 50 options, 71,267 configurations; 10 demo programs
Observation 1: Linear shift happens only in limited environmental changes

Software  Environmental change         Severity  Corr.
SPEAR     NUC/2 -> NUC/4               Small     1.00
SPEAR     Amazon_nano -> NUC           Large     0.59
SPEAR     Hardware/workload/version    V. Large  -0.10
x264      Version                      Large     0.06
x264      Workload                     Medium    0.65
SQLite    write-seq -> write-batch     Small     0.96
SQLite    read-rand -> read-seq        Medium    0.50

[Scatter: source vs. target throughput]

Implication: Simple transfer learning is limited to hardware changes in practice.
Observation 2: Influential options and interactions
are preserved across environments

Software   Environmental change        Severity   Options   Dim   t-test
x264       Version                     Large      16        12    10
x264       Hardware/workload/version   V Large    16         8     9
SQLite     write-seq -> write-batch    V Large    14         3     4
SQLite     read-rand -> read-seq       Medium     14         1     1
SaC        Workload                    V Large    50        16    10

Implication: Avoid wasting budget on non-informative parts of the configuration space and focus where it matters.
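One way to exploit this observation is to screen options in the cheap source environment and spend the target budget only on the influential ones. A hedged sketch with synthetic data (ranking by absolute correlation is an assumption here, not the study's exact statistical test):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical source-environment measurements: 200 random configurations
# of 16 binary options, where throughput is driven by only a few options.
X = rng.integers(0, 2, size=(200, 16))
y = 30.0 * X[:, 0] - 12.0 * X[:, 3] + rng.normal(0, 1, 200)

# Rank options by |correlation| with the metric in the SOURCE environment.
scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(16)]
influential = sorted(range(16), key=lambda j: -scores[j])[:2]
print(sorted(influential))  # the two truly influential options
```

If influential options are preserved across environments, sampling the target only along these dimensions shrinks the search space dramatically.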
2^16 / 2^50 ≈ 0.000000000058

We only need to explore part of the space.
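The ratio above is just 2^16 / 2^50 = 2^-34; a one-liner confirms the rounding:

```python
# Fraction of a 50-option binary space covered by exploring
# only the subspace spanned by 16 influential options.
ratio = 2**16 / 2**50
print(f"{ratio:.2e}")  # 5.82e-11, i.e. ~0.000000000058
```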
Transfer Learning for Performance Modeling of
Configurable Systems: An Exploratory Analysis
Pooyan Jamshidi
Carnegie Mellon University, USA
Norbert Siegmund
Bauhaus-University Weimar, Germany
Miguel Velez, Christian Kästner
Akshay Patel, Yuvraj Agarwal
Carnegie Mellon University, USA
Abstract—Modern software systems provide many configuration options which significantly influence their non-functional properties. To understand and predict the effect of configuration options, several sampling and learning strategies have been proposed, albeit often with significant cost to cover the highly dimensional configuration space. Recently, transfer learning has been applied to reduce the effort of constructing performance models by transferring knowledge about performance behavior across environments. While this line of research is promising to learn more accurate models at a lower cost, it is unclear why and when transfer learning works for performance modeling. To shed light on when it is beneficial to apply transfer learning, we conducted an empirical study on four popular software systems, varying software configurations and environmental conditions, such as hardware, workload, and software versions, to identify the key knowledge pieces that can be exploited for transfer learning. Our results show that in small environmental changes (e.g., homogeneous workload change), by applying a linear transformation to the performance model, we can understand the performance behavior of the target environment, while for severe environmental changes (e.g., drastic workload change) we can transfer only knowledge that makes sampling more efficient, e.g., by reducing the dimensionality of the configuration space.
Index Terms—Performance analysis, transfer learning.
Fig. 1: Transfer learning is a form of machine learning that takes advantage of transferable knowledge from source to learn an accurate, reliable, and less costly model for the target environment.
Details: [ASE ’17]
Outline
• Case Study
• Transfer Learning
• Empirical Study
• Guided Sampling
• Vision
What will the software systems
of the future look like?
Software 2.0
Increasingly customized and configurable
VISION
Increasingly competing objectives:
• Accuracy
• Training speed
• Inference speed
• Model size
• Energy

Deep neural network as a highly configurable system
[Figure residue: a deep network rendered as a highly configurable system; dozens of candidate cells, each described by width N, active ops, groups, kernel sizes (ks), and dilations (d), wired together by '+' junctions toward the output.]
Exploring the design space of deep networks

[Figure residue: image-classification models constructed from cells optimized with architecture search (small CIFAR-10 model, large CIFAR-10 model, ImageNet model; stacks of cells and sep.conv3x3 / conv3x3 layers ending in global pool and linear & softmax). The search is carried out entirely on the CIFAR-10 training set, split into 40K training and 10K validation images; candidate models are trained on the training subset and evaluated on the validation subset to obtain the fitness. The selected cell is then plugged into a large model trained on training plus validation, with accuracy reported on the CIFAR-10 test set, which is never used for model selection; the learned cells are also evaluated at scale on the ImageNet challenge dataset. A similar approach has recently been used in (Zoph et al., 2017; Zhong et al., 2017).]

Optimal Architecture (Yesterday)
New Fraud Pattern
Optimal Architecture (Today)
Deep architecture deployment
To achieve optimal results, it is important to explore:
• Network architectures
• Software libraries (TensorFlow, CNTK, Theano, etc.)
• Deployment infrastructure
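Even a toy enumeration shows how quickly this combined design space grows; the choices below are hypothetical placeholders, not the deck's actual experiment grid:

```python
import itertools

# Hypothetical design dimensions for deploying a deep network.
architectures = ["small-cnn", "large-cnn", "resnet-cell"]
libraries = ["TensorFlow", "CNTK", "Theano"]
infrastructure = ["cpu-server", "gpu-server", "edge-device"]

# Full cross product: every (architecture, library, infrastructure) combo.
design_space = list(itertools.product(architectures, libraries, infrastructure))
print(len(design_space))  # 3 * 3 * 3 = 27 combinations to evaluate
```

With realistic numbers of options per dimension, exhaustive evaluation becomes infeasible, which motivates the guided sampling and transfer learning above.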
Exploring the design space of deep networks

[Figure residue: scatter of candidate networks, validation error vs. inference time [h], with "better" toward the origin on both axes; the default configuration is marked, along with the Pareto-optimal configurations.]
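The Pareto-optimal front highlighted in that plot can be computed with a few lines. A sketch over hypothetical (inference time, validation error) pairs, where smaller is better on both axes:

```python
# Hypothetical (inference_time_h, validation_error) measurements.
points = [(0.3, 0.45), (0.5, 0.20), (0.4, 0.30), (0.9, 0.18),
          (0.6, 0.25), (0.2, 0.70), (1.1, 0.17)]

def pareto_front(pts):
    """Keep points not dominated by any other point (both objectives minimized)."""
    front = []
    for p in pts:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in pts)
        if not dominated:
            front.append(p)
    return sorted(front)

print(pareto_front(points))
```

Here (0.6, 0.25) is dominated by (0.5, 0.20) and drops out; the remaining points form the tradeoff curve an architect chooses from.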
ML pipeline
[Jeff Smith’s book on Reactive ML]
Architectural tradeoffs for production-ready ML
systems (Uber)
More info?
https://tinyurl.com/y97u68cr
Configuration errors are prevalent
Configuration complexity and dependencies between
options are a major source of configuration errors

[Diagram residue: default configuration and operational contexts I–III; configuration options associated to components of the Apache Hadoop architecture.]
Configurations are software too

Configuration errors are common
[Pie chart residue: 69% configuration errors vs. 31% other errors; source: cobalt.io]
We can find the repair patches faster with a lighter workload
• [localization]: Using transfer learning to derive the most likely configurations that manifest the bugs
• [repair]: Automatically prepare patches to fix the configuration bug
Thanks
Architectural Tradeoff in Learning-Based Software
Configuration Optimization for Big Data Software
Pooyan Jamshidi
 
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Pooyan Jamshidi
 

More from Pooyan Jamshidi (20)

Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery ApproachLearning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
 
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn... A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
 
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
 
Learning to Sample
Learning to SampleLearning to Sample
Learning to Sample
 
Integrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsIntegrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of Robots
 
Production-Ready Machine Learning for the Software Architect
Production-Ready Machine Learning for the Software ArchitectProduction-Ready Machine Learning for the Software Architect
Production-Ready Machine Learning for the Software Architect
 
Transfer Learning for Software Performance Analysis: An Exploratory Analysis
Transfer Learning for Software Performance Analysis: An Exploratory AnalysisTransfer Learning for Software Performance Analysis: An Exploratory Analysis
Transfer Learning for Software Performance Analysis: An Exploratory Analysis
 
Architecting for Scale
Architecting for ScaleArchitecting for Scale
Architecting for Scale
 
Learning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsLearning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain Environments
 
Sensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic SoftwareSensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic Software
 
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
 
Transfer Learning for Improving Model Predictions in Robotic Systems
Transfer Learning for Improving Model Predictions  in Robotic SystemsTransfer Learning for Improving Model Predictions  in Robotic Systems
Transfer Learning for Improving Model Predictions in Robotic Systems
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOps
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
 
Configuration Optimization Tool
Configuration Optimization ToolConfiguration Optimization Tool
Configuration Optimization Tool
 
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
 
Configuration Optimization for Big Data Software
Configuration Optimization for Big Data SoftwareConfiguration Optimization for Big Data Software
Configuration Optimization for Big Data Software
 
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
 

Recently uploaded

fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.
AnkitaPandya11
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
Green Software Development
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
XfilesPro
 
Liberarsi dai framework con i Web Component.pptx
Liberarsi dai framework con i Web Component.pptxLiberarsi dai framework con i Web Component.pptx
Liberarsi dai framework con i Web Component.pptx
Massimo Artizzu
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
mz5nrf0n
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
Quickdice ERP
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
ShulagnaSarkar2
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
safelyiotech
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
Preparing Non - Technical Founders for Engaging a Tech Agency
Preparing Non - Technical Founders for Engaging  a  Tech AgencyPreparing Non - Technical Founders for Engaging  a  Tech Agency
Preparing Non - Technical Founders for Engaging a Tech Agency
ISH Technologies
 
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfTop Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
VALiNTRY360
 
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
gapen1
 
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
dakas1
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
dakas1
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
Patrick Weigel
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 

Recently uploaded (20)

fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
Liberarsi dai framework con i Web Component.pptx
Liberarsi dai framework con i Web Component.pptxLiberarsi dai framework con i Web Component.pptx
Liberarsi dai framework con i Web Component.pptx
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
Preparing Non - Technical Founders for Engaging a Tech Agency
Preparing Non - Technical Founders for Engaging  a  Tech AgencyPreparing Non - Technical Founders for Engaging  a  Tech Agency
Preparing Non - Technical Founders for Engaging a Tech Agency
 
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfTop Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
 
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
 
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 

Architectural Tradeoff in Learning-Based Software

  • 1. Architectural Tradeoff in Learning-Based Software Pooyan Jamshidi Carnegie Mellon University cs.cmu.edu/~pjamshid SATURN 2018 Texas, May 8th, 2018
  • 5. I hang around here! Software Arch. Machine Learning Distributed Systems
  • 9. Goal: Enable developers/users to find the right quality tradeoff
  • 10. Today’s most popular (ML) systems are built configurable
  • 19. Empirical observations confirm that systems are becoming increasingly configurable [Plots: number of parameters vs. release time for Apache httpd (1.3.14–2.3.4) and Hadoop MapReduce/HDFS (0.1.0–2.0.0), all growing steadily] [Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
  • 20. Empirical observations confirm that systems are becoming increasingly configurable [Excerpt of the FSE’15 paper (UC San Diego, Huazhong Univ. of Science & Technology, NetApp): the study examines configuration settings from thousands of customers of a commercial storage system (Storage-A) and hundreds of open-source system software projects; its guidelines could remove parameters and simplify 19.7% of them without affecting existing users. Plots: number of parameters vs. release time for Storage-A (growing toward 700), MySQL (3.23.0–5.6.2), Apache (1.3.14–2.3.4), and Hadoop MapReduce/HDFS (0.1.0–2.0.0)] [Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
  • 21. “all constants should be configurable, even if we can’t see any reason to configure them.” — HDFS-4304
  • 22. Configurations determine the performance behavior:

    void Parrot_setenv(... name, ... value) {
    #ifdef PARROT_HAS_SETENV
        my_setenv(name, value, 1);
    #else
        int name_len = strlen(name);
        int val_len = strlen(value);
        char* envs = glob_env;
        if (envs == NULL) {
            return;
        }
        strcpy(envs, name);
        strcpy(envs + name_len, "=");
        strcpy(envs + name_len + 1, value);
        putenv(envs);
    #endif
    }

    Compile-time options such as PARROT_HAS_SETENV and LINUX (e.g., extern int Parrot_signbit(double x) under LINUX) select different code paths, which in turn determine qualities such as speed and energy.
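The point of this slide can be reproduced in miniature: two code paths that compute the same result but perform differently, selected by a configuration option. The function and option names below are hypothetical illustrations, not taken from the Parrot code on the slide:

```python
import timeit

def build_env_string(pairs, use_join):
    # Two code paths selected by a configuration option, analogous to the
    # #ifdef branches in Parrot_setenv: same functional result,
    # different performance behavior.
    if use_join:
        return ";".join(f"{k}={v}" for k, v in pairs)
    out = ""
    for k, v in pairs:
        out += k + "=" + v + ";"  # repeated string concatenation
    return out.rstrip(";")

pairs = [(f"k{i}", f"v{i}") for i in range(1000)]
t_join = timeit.timeit(lambda: build_env_string(pairs, True), number=100)
t_concat = timeit.timeit(lambda: build_env_string(pairs, False), number=100)
print(f"use_join=True: {t_join:.3f}s   use_join=False: {t_concat:.3f}s")
```

Both settings produce identical output; only the measured time differs, which is exactly why performance behavior must be measured per configuration.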
  • 30. How do we understand the performance behavior of real-world, highly configurable systems that scale well… and how do we enable developers/users to reason about qualities (performance, energy) and make tradeoffs?
  • 36. SocialSensor •Identifying trending topics •Identifying user defined topics •Social media search
  • 43. SocialSensor [Architecture: Crawling pulls tweets from the Internet (5k–20k/min) and stores crawled items; the Orchestrator fetches every 10 min (~100k tweets) and pushes them to Content Analysis, which stores its results (~10M tweets); Search and Integration fetches from that store]
  • 47. Challenges [Same architecture, annotated with scaling demands: 100X, 10X, and real-time requirements]
  • 48. How can we gain better performance without using more resources?
  • 49. Let’s try out different system configurations!
  • 55. Opportunity: the data processing engines in the pipeline were all configurable (> 100 parameters each, 2300 in total)
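To see why “let’s try out different configurations” cannot mean trying all of them, a back-of-the-envelope sketch helps. The per-engine counts below are illustrative (the slide only says each engine has more than 100 parameters), and binary options are assumed for simplicity:

```python
# Size of a joint configuration space: one candidate per combination of
# options. With three engines, each exposing ~100 binary options, the
# space is far beyond anything exhaustive search can cover.
options_per_engine = [100, 100, 100]  # illustrative counts

total_options = sum(options_per_engine)
search_space = 2 ** total_options  # binary options -> 2^n configurations

print(f"{total_options} options, {search_space:.3e} configurations")

# Even measuring 1,000 configurations per second, enumeration would take
# about 2^300 / 1000 seconds -- astronomically longer than the age of
# the universe.
seconds = search_space / 1000
print(f"~{seconds:.3e} seconds to enumerate")
```

Real parameters are often non-binary, which only makes the space larger; this is why the talk turns to learned performance models instead of brute force.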
  • 60. The default configuration was bad, and so was the expert’s [Scatter plot: average write latency (s), 0–5000, vs. throughput (ops/sec), 0–1500; the default and the expert-recommended configurations both sit far from the optimal configuration; higher throughput and lower latency are better]
  • 64. The default configuration was bad, and so was the expert’s [A second plot: latency (ms), 0–300, vs. throughput (ops/sec), up to 2.5 × 10^4; again the default and expert-recommended configurations are far from the optimal configuration]
  • 74. Why is this an important problem? Significant time savings: the optimal configuration is 2X–10X faster than the worst, noticeably faster than the median, the default is bad, and the expert’s recommendation is not optimal. Large configuration space: exhaustive search is expensive, and results are specific to hardware, workload, and version.
  • 75. What happened in the end? • Achieved the objectives (100X users, same experience) • Saved money by reducing cloud resources by up to 20% • Our tool identified configurations that were consistently better than the expert’s recommendation
  • 79. To enable performance tradeoffs, we need a model to reason about qualities [The Parrot_setenv code from earlier, annotated with a performance model for execution time (s): f(·) = 5 + 3 × o1, so f(o1 := 0) = 5 and f(o1 := 1) = 8]
  • 84. What is a performance model? A function f : C → R over the configuration space, e.g., f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2 for configurations c = <o1, o2>; in practice configurations are much larger: c = <o1, o2, …, o10>, c = <o1, o2, …, o100>, …
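The slide’s toy model can be written down directly and, with only two binary options, evaluated exhaustively; reading the slide’s interaction term as −7·o1·o2 (the minus sign is lost in the transcript, so this is an assumed reconstruction):

```python
from itertools import product

def f(o1, o2):
    # Performance model from the slide: f(o1, o2) = 5 + 3*o1 + 15*o2 - 7*o1*o2
    return 5 + 3 * o1 + 15 * o2 - 7 * o1 * o2

# With two binary options the space has only 2^2 = 4 configurations,
# so we can enumerate it and read off the optimum directly.
for c in product([0, 1], repeat=2):
    print(c, "->", f(*c))

best = min(product([0, 1], repeat=2), key=lambda c: f(*c))
print("best configuration:", best, "with predicted time", f(*best))
```

Note the interaction term: enabling both options is cheaper than their individual costs suggest (16 rather than 23), which is exactly why performance models cannot treat options independently.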
  • 88. How do we learn performance models? Measure configurations on the system (here, a TurtleBot), learn a model such as f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2, and then use it for optimization, reasoning, and debugging.
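The Measure → Learn step can be sketched as an ordinary least-squares fit over measured configurations, with the interaction o1·o2 added as a feature. This is a minimal illustration of the idea, not the learners used in the talk; the observations are generated from the slide’s model so the fit recovers its coefficients exactly:

```python
import numpy as np

# Measured configurations (o1, o2) and their observed performance,
# generated here from the slide's model f = 5 + 3*o1 + 15*o2 - 7*o1*o2.
X_raw = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([5.0, 20.0, 8.0, 16.0])

# Feature matrix: intercept, o1, o2, and the interaction o1*o2.
X = np.column_stack([np.ones(len(X_raw)), X_raw[:, 0], X_raw[:, 1],
                     X_raw[:, 0] * X_raw[:, 1]])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("learned coefficients:", coef)  # approximately [5, 3, 15, -7]

# The learned model can now be queried instead of the real system.
def predict(o1, o2):
    return coef @ np.array([1.0, o1, o2, o1 * o2])

print("predicted f(1, 1) =", predict(1, 1))
```

On real systems the measurements are noisy and the feature space is far larger, which is where sampling strategies and transfer learning come in.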
  • 91. Insight: performance measurements of the real system are “similar” to the ones from the simulators [Measure configurations → performance on the real robot and on the simulator (Gazebo)]. So why not reuse these data, instead of measuring on the real robot?
  • 92. We developed methods to make learning cheaper via transfer learning. Source (Given) -> Target (Learn): Data, Model, Transferable Knowledge. [The slide excerpts Section II (Intuition) of the paper: understanding the performance behavior of configurable software enables (i) performance debugging, (ii) performance tuning, (iii) design-time evolution, and (iv) runtime adaptation; for instance, we can learn performance behavior on cheap hardware in a controlled lab environment and use it to understand behavior on a production server before shipping to the end user. Formally, the configuration space is the Cartesian product of all features, C = Dom(F1) × · · · × Dom(Fd) with Dom(Fi) = {0, 1}; an environment instance e = [w, h, v] is drawn from an environment space E = W × H × V of workloads, hardware, and system versions; a performance model is a black-box function f : C × E -> R learned from observed performance values yi = f(xi) + εi with εi ∼ N(0, σi), giving training data Dtr = {(xi, yi)}, i = 1..n; and an empirical performance distribution is a stochastic process pd : E -> Δ(R) that assigns a probability distribution over performance measures to each environmental condition.] Extract -> Reuse -> Learn. Goal: Gain strength by transferring information across environments
  • 93. TurtleBot [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Our transfer learning solution
  • 94. TurtleBot Simulator (Gazebo) [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Our transfer learning solution
  • 95. Data Measure TurtleBot Simulator (Gazebo) [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Our transfer learning solution
  • 96. DataData Data Measure Measure Reuse TurtleBot Simulator (Gazebo) [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Configurations Our transfer learning solution
  • 97. Data, Data, Data; Measure, Measure, Reuse, Learn. TurtleBot, Simulator (Gazebo) [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Configurations. Our transfer learning solution: f(o1, o2) = 5 + 3o1 + 15o2 − 7o1 × o2
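A model of the form shown on the slide (reading the dropped sign as a minus: f(o1, o2) = 5 + 3o1 + 15o2 − 7o1 × o2) can be recovered from the four configuration measurements by least squares with an interaction column; the response values below are the noise-free outputs of that model:

```python
import numpy as np

# Measured performance of the four configurations, consistent with the model
# on the slide: f(o1, o2) = 5 + 3*o1 + 15*o2 - 7*o1*o2.
X = np.array([(0, 0), (1, 0), (0, 1), (1, 1)], dtype=float)
y = np.array([5.0, 8.0, 20.0, 16.0])

# Design matrix with an intercept and an o1*o2 interaction term.
A = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 2))  # the coefficients [5, 3, 15, -7] are recovered
```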
  • 98. Gaussian processes for performance modeling t = n output,f(x) input, x
  • 99. Gaussian processes for performance modeling t = n Observation output,f(x) input, x
  • 100. Gaussian processes for performance modeling t = n Observation Mean output,f(x) input, x
  • 101. Gaussian processes for performance modeling t = n Observation Mean Uncertainty output,f(x) input, x
  • 102. Gaussian processes for performance modeling t = n Observation Mean Uncertainty New observation output,f(x) input, x
  • 103. Gaussian processes for performance modeling: t = n -> t = n + 1. Observation, Mean, Uncertainty, New observation; output f(x), input x
  • 104. Gaussian Processes enable reasoning about performance. Step 1: Fit a GP to the data seen so far. Step 2: Explore the model for regions of most variance. Step 3: Sample that region. Step 4: Repeat. Configuration Space -> Empirical Model; Experiment <-> Selection Criteria (Sequential Design)
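The four-step loop above can be sketched with a minimal Gaussian process built directly from an RBF kernel (numpy only, no GP library). The response surface f and the kernel length-scale are made up for illustration:

```python
import numpy as np

def f(x):  # hypothetical response surface we can only sample point-wise
    return np.sin(3 * x) + 0.5 * x

def rbf(a, b, ell=0.3):
    # Squared-exponential kernel between two 1-D point sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # Standard GP regression equations: posterior mean and variance on Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    K_inv = np.linalg.inv(K)
    mu = Ks.T @ K_inv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ K_inv @ Ks)
    return mu, var

grid = np.linspace(0, 2, 101)
X = np.array([0.1, 1.9])            # initial observations
y = f(X)

for _ in range(5):
    mu, var = gp_posterior(X, y, grid)   # Step 1: fit GP to data seen so far
    x_next = grid[np.argmax(var)]        # Steps 2-3: sample where variance is largest
    X = np.append(X, x_next)             # Step 4: repeat
    y = np.append(y, f(x_next))

print(np.round(X, 2))
```

Each iteration shrinks the model's uncertainty where it was largest, which is the "sequential design" idea on the slide.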
  • 107. CoBot experiment: DARPA BRASS. Localization error [m] vs. CPU utilization [%] (lower is better on both axes); Pareto front; no_of_particles=x, no_of_refinement=y
  • 108. CoBot experiment: DARPA BRASS. Localization error [m] vs. CPU utilization [%]; Energy constraint; Pareto front; no_of_particles=x, no_of_refinement=y
  • 109. CoBot experiment: DARPA BRASS. Localization error [m] vs. CPU utilization [%]; Energy constraint; Safety constraint; Pareto front; no_of_particles=x, no_of_refinement=y
  • 110. CoBot experiment: DARPA BRASS. Localization error [m] vs. CPU utilization [%]; Energy constraint; Safety constraint; Pareto front; Sweet Spot; no_of_particles=x, no_of_refinement=y
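The Pareto front in these plots (minimize both localization error and CPU utilization) can be computed by filtering out dominated configurations. The measurement points below are hypothetical stand-ins for real CoBot data:

```python
import numpy as np

# Hypothetical (localization error [m], CPU utilization [%]) measurements,
# one point per configuration; lower is better for both objectives.
points = np.array([
    [0.5, 38.0], [1.0, 30.0], [2.0, 22.0], [4.0, 15.0], [7.0, 12.0],
    [3.0, 35.0], [5.0, 28.0], [6.0, 20.0],
])

def pareto_front(pts):
    """Indices of points not dominated by any other point (minimization)."""
    front = []
    for i, p in enumerate(pts):
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(pts) if j != i
        )
        if not dominated:
            front.append(i)
    return front

front = pareto_front(points)
print(points[front])
```

Constraints (energy, safety) would simply be applied as filters on `points` before computing the front; the "sweet spot" is then chosen among the remaining front points.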
  • 112. CoBot experiment. Source (given), CPU [%]
  • 113. CoBot experiment. Source (given); Target (ground truth after 6 months), CPU [%]
  • 114. CoBot experiment. Source (given); Target (ground truth after 6 months); Prediction with 4 samples, CPU [%]
  • 115. CoBot experiment. Source (given); Target (ground truth after 6 months); Prediction with 4 samples; Prediction with transfer learning, CPU [%]
  • 116. Results: Other configurable systems CoBot WordCount SOL RollingSort Cassandra (HW) Cassandra (DB)
  • 117. Transfer Learning for Improving Model Predictions in Highly Configurable Software. Pooyan Jamshidi, Miguel Velez, Christian Kästner (Carnegie Mellon University, USA); Norbert Siegmund (Bauhaus-University Weimar, Germany); Prasad Kawthekar (Stanford University, USA). Abstract—Modern software systems are built to be used in dynamic environments using configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at … Fig. 1: Transfer learning for performance model learning. Details: [SEAMS ’17]
  • 118. Summary (transfer learning)
  • 119. Summary (transfer learning) • Model for making tradeoffs between qualities
  • 120. Summary (transfer learning) • Model for making tradeoffs between qualities • Scale to large spaces and environmental changes
  • 121. Summary (transfer learning) • Model for making tradeoffs between qualities • Scale to large spaces and environmental changes • Transfer learning can help
  • 122. Summary (transfer learning) • Model for making tradeoffs between qualities • Scale to large spaces and environmental changes • Transfer learning can help • Increase prediction accuracy
  • 123. Summary (transfer learning) • Model for making tradeoffs between qualities • Scale to large spaces and environmental changes • Transfer learning can help • Increase prediction accuracy • Increase model reliability
  • 124. Summary (transfer learning) • Model for making tradeoffs between qualities • Scale to large spaces and environmental changes • Transfer learning can help • Increase prediction accuracy • Increase model reliability • Decrease model-building cost
  • 126. Looking further: When transfer learning goes wrong. Absolute Percentage Error [%] by source — s, s1, s2, s3, s4, s5, s6; noise-level: 0, 5, 10, 15, 20, 25, 30; corr. coeff.: 0.98, 0.95, 0.89, 0.75, 0.54, 0.34, 0.19; µ(pe): 15.34, 14.14, 17.09, 18.71, 33.06, 40.93, 46.75; non-transfer-learning baseline shown for comparison. Insight: Predictions become more accurate when the source is more related to the target.
  • 127. Looking further: When transfer learning goes wrong. Same table, annotated on the highly correlated sources: it worked! Insight: Predictions become more accurate when the source is more related to the target.
  • 128. Looking further: When transfer learning goes wrong. Same table, annotated on both ends: for highly correlated sources it worked; for weakly correlated sources it didn’t! Insight: Predictions become more accurate when the source is more related to the target.
  • 129. Our empirical study: We looked at different highly- configurable systems to gain insights [P. Jamshidi, et al., “Transfer learning for performance modeling of configurable systems….”, ASE’17] SPEAR (SAT Solver) Analysis time 14 options 16,384 configurations SAT problems 3 hardware 2 versions X264 (video encoder) Encoding time 16 options 4,000 configurations Video quality/size 2 hardware 3 versions SQLite (DB engine) Query time 14 options 1,000 configurations DB Queries 2 hardware 2 versions SaC (Compiler) Execution time 50 options 71,267 configurations 10 Demo programs
  • 130. Observation 1: Linear shift happens only in limited environmental changes. Target vs. Source throughput. Software / environmental change / severity / corr.: SPEAR — NUC/2 -> NUC/4, Small, 1.00; Amazon_nano -> NUC, Large, 0.59; hardware/workload/version, V Large, −0.10. x264 — Version, Large, 0.06; Workload, Medium, 0.65. SQLite — write-seq -> write-batch, Small, 0.96; read-rand -> read-seq, Medium, 0.50. Implication: Simple transfer learning is limited to hardware changes in practice.
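Where the correlation is near 1 (the “Small” changes above), the target behavior is approximately a linear shift of the source, so a handful of target measurements suffice to calibrate the transfer. A sketch with synthetic data standing in for real measurements:

```python
import numpy as np

rng = np.random.default_rng(2)

# Source measurements (e.g., throughput on cheap hardware) for 20 configurations.
source = np.linspace(10, 40, 20)
# Small environmental change: target responses are (roughly) a linear shift of
# the source, target = 1.5 * source + 4, plus measurement noise.
target = 1.5 * source + 4 + rng.normal(0, 0.5, size=20)

# High source/target correlation signals that a simple linear transfer is viable.
corr = np.corrcoef(source, target)[0, 1]

# Learn the shift from only four target measurements.
idx = [0, 6, 12, 19]
A = np.column_stack([source[idx], np.ones(len(idx))])
(alpha, beta), *_ = np.linalg.lstsq(A, target[idx], rcond=None)

pred = alpha * source + beta
ape = np.mean(np.abs(pred - target) / target) * 100  # mean abs. percentage error
print(round(corr, 3), round(alpha, 2), round(beta, 1), round(ape, 1))
```

For the “Large” changes in the table (correlation near 0), no such α and β exist, which is exactly why simple transfer fails there.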
  • 134. Observation 2: Influential options and interactions are preserved across environments. Software / environmental change / severity / dim. / t-test: x264 — Version, Large, 16, 12, 10; hardware/workload/version, V Large, 8, 9. SQLite — write-seq -> write-batch, V Large, 14, 3, 4; read-rand -> read-seq, Medium, 1, 1. SaC — Workload, V Large, 50, 16, 10. Implication: Avoid wasting budget on non-informative parts of the configuration space and focus where it matters.
  • 137. Observation 2: Influential options and interactions are preserved across environments. Software / environmental change / severity / dim. / t-test: x264 — Version, Large, 16, 12, 10; hardware/workload/version, V Large, 8, 9. SQLite — write-seq -> write-batch, V Large, 14, 3, 4; read-rand -> read-seq, Medium, 1, 1. SaC — Workload, V Large, 50, 16, 10. Implication: Avoid wasting budget on non-informative parts of the configuration space and focus where it matters. We only need to explore part of the space: 2^16 / 2^50 ≈ 0.000000000058
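The implication can be made concrete with synthetic data: if only 16 of 50 hypothetical binary options influence performance, an ordinary least-squares fit recovers them, and the reduced sub-space is a vanishing fraction of the full configuration space:

```python
import numpy as np

rng = np.random.default_rng(3)

# 50 hypothetical binary options; only options 0..15 actually affect performance.
n, d, k = 200, 50, 16
X = rng.integers(0, 2, size=(n, d)).astype(float)
true_w = np.zeros(d)
true_w[:k] = rng.uniform(1, 5, k)
y = X @ true_w + rng.normal(0, 0.1, n)

# Rank options by the magnitude of their fitted linear coefficients.
w, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(n)]), y, rcond=None)
top = np.argsort(-np.abs(w[:d]))[:k]
print(sorted(top.tolist()))  # the influential options

# Sampling only the influential sub-space covers a vanishing fraction of C:
print(2 ** k / 2 ** d)  # 2^16 / 2^50, on the order of 5.8e-11
```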
  • 138. Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis. Pooyan Jamshidi (Carnegie Mellon University, USA); Norbert Siegmund (Bauhaus-University Weimar, Germany); Miguel Velez, Christian Kästner, Akshay Patel, Yuvraj Agarwal (Carnegie Mellon University, USA). Abstract—Modern software systems provide many configuration options which significantly influence their non-functional properties. To understand and predict the effect of configuration options, several sampling and learning strategies have been proposed, albeit often with significant cost to cover the highly dimensional configuration space. Recently, transfer learning has been applied to reduce the effort of constructing performance models by transferring knowledge about performance behavior across environments. While this line of research is promising to learn more accurate models at a lower cost, it is unclear why and when transfer learning works for performance modeling. To shed light on when it is beneficial to apply transfer learning, we conducted an empirical study on four popular software systems, varying software configurations and environmental conditions, such as hardware, workload, and software versions, to identify the key knowledge pieces that can be exploited for transfer learning. Our results show that in small environmental changes (e.g., homogeneous workload change), by applying a linear transformation to the performance model, we can understand the performance behavior of the target environment, while for severe environmental changes (e.g., drastic workload change) we can transfer only knowledge that makes sampling more efficient, e.g., by reducing the dimensionality of the configuration space. Index Terms—Performance analysis, transfer learning. Fig. 1: Transfer learning is a form of machine learning that takes advantage of transferable knowledge from source to learn an accurate, reliable, and less costly model for the target environment. Details: [ASE ’17]
  • 140. What will the software systems of the future look like?
  • 141. Software 2.0 Increasingly customized and configurable VISION
  • 145. Software 2.0 Increasingly customized and configurable VISION Increasingly competing objectives Accuracy Training speed Inference speed Model size Energy
  • 146. Deep neural network as a highly configurable system. [The slide shows a cell-based network graph in which every node is parameterized by options such as the number of filters N, an operation mask op, the number of groups, kernel sizes ks, and dilations d — each cell is itself a configuration.]
  • 147. Exploring the design space of deep networks — Optimal Architecture (Yesterday). [The slide shows cell-based image-classification models from an architecture-search paper: a small CIFAR-10 model used during the search, a large CIFAR-10 model, and an ImageNet model built from the learned cells.]
  • 148. Exploring the design space of deep networks — Optimal Architecture (Yesterday); a New Fraud Pattern arrives.
  • 149. Exploring the design space of deep networks — the New Fraud Pattern forces a move from the Optimal Architecture (Yesterday) to the Optimal Architecture (Today).
  • 152. Deep architecture deployment [2.44, 0.88] [9.93, 0.77]
  • 153. To achieve the optimal results, it is important to explore: • Network architectures • Software libraries [TensorFlow, CNTK, Theano, etc] • Deployment infrastructure
  • 154. Exploring the design space of deep networks 0.2 0.4 0.6 0.8 1 1.2 Inference time [h] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Validationerror better better
  • 155. Exploring the design space of deep networks 0.2 0.4 0.6 0.8 1 1.2 Inference time [h] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Validationerror better better
  • 156. Exploring the design space of deep networks 0.2 0.4 0.6 0.8 1 1.2 Inference time [h] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Validationerror Default subset, and evaluated on the validation subset the selected cell is plugged into a large mode validation sub-sets, and the accuracy is report is never used for model selection, and it is only cells, learned on CIFAR-10, in a large-scale se image sep.conv3x3/2 sep.conv3x3/2 sep.conv3x3 conv3x3 globalpool linear&softmax small CIFAR-10 model cell cell cell sep.conv3x3 image conv3x3/2 conv3x3/2 sep.conv3x3/2 Imag cell cell cell Figure 2: Image classification models construc Top-left: small model used during architectu model used for learned cell evaluation. Bottom better better
  • 157. Exploring the design space of deep networks 0.2 0.4 0.6 0.8 1 1.2 Inference time [h] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Validationerror Default Pareto optimal subset, and evaluated on the validation subset the selected cell is plugged into a large mode validation sub-sets, and the accuracy is report is never used for model selection, and it is only cells, learned on CIFAR-10, in a large-scale se image sep.conv3x3/2 sep.conv3x3/2 sep.conv3x3 conv3x3 globalpool linear&softmax small CIFAR-10 model cell cell cell sep.conv3x3 image conv3x3/2 conv3x3/2 sep.conv3x3/2 Imag cell cell cell Figure 2: Image classification models construc Top-left: small model used during architectu model used for learned cell evaluation. Bottom better better
  • 158. Exploring the design space of deep networks 0.2 0.4 0.6 0.8 1 1.2 Inference time [h] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Validationerror Default Pareto optimal subset, and evaluated on the validation subset the selected cell is plugged into a large mode validation sub-sets, and the accuracy is report is never used for model selection, and it is only cells, learned on CIFAR-10, in a large-scale se image sep.conv3x3/2 sep.conv3x3/2 sep.conv3x3 conv3x3 globalpool linear&softmax small CIFAR-10 model cell cell cell sep.conv3x3 image conv3x3/2 conv3x3/2 sep.conv3x3/2 Imag cell cell cell Figure 2: Image classification models construc Top-left: small model used during architectu model used for learned cell evaluation. Bottom Architecture search is carried out entirely on the CIFAR-10 training set, which w sub-sets of 40K training and 10K validation images. Candidate models are trained subset, and evaluated on the validation subset to obtain the fitness. Once the search the selected cell is plugged into a large model which is trained on the combination validation sub-sets, and the accuracy is reported on the CIFAR-10 test set. We note is never used for model selection, and it is only used for final model evaluation. We cells, learned on CIFAR-10, in a large-scale setting on the ImageNet challenge data image sep.conv3x3/2 sep.conv3x3/2 sep.conv3x3 conv3x3 globalpool linear&softmax small CIFAR-10 model cell cell cell sep.conv3x3/2 sep.conv3x3 sep.conv3x3/2 sep.conv3x3 image conv3x3 large CIFAR-10 model cell cell cell cell cell sep.conv3x3/2 sep.conv3x3 sep.conv3x3/2 sep.conv3x3 sep.conv3x3 sep.conv3x3 image conv3x3/2 conv3x3/2 sep.conv3x3/2 globalpool linear&softmax ImageNet model cell cell cell cell cell cell cell Figure 2: Image classification models constructed using the cells optimized with arc Top-left: small model used during architecture search on CIFAR-10. Top-right: better better
  • 159. Exploring the design space of deep networks [same figure, repeated as a build]
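The Pareto-optimal subset highlighted on the slide can be computed with a simple dominance check over (inference time, validation error) pairs: a candidate survives only if no other candidate is at least as good on both objectives and strictly better on one. A minimal sketch, with illustrative numbers rather than measured values:

```python
# Sketch: selecting the Pareto-optimal subset of candidate networks when
# trading off inference time against validation error (lower is better
# on both axes). The (time, error) pairs are made up for illustration.

def pareto_front(points):
    """Return the points not dominated by any other point."""
    front = []
    for time, err in points:
        dominated = any(
            (t <= time and e <= err) and (t < time or e < err)
            for t, e in points
        )
        if not dominated:
            front.append((time, err))
    return sorted(front)

candidates = [(0.2, 0.55), (0.4, 0.30), (0.6, 0.28), (0.8, 0.22), (1.0, 0.25)]
print(pareto_front(candidates))  # (1.0, 0.25) is dominated by (0.8, 0.22)
```

This quadratic-time check is enough for the handful of candidates an architecture search typically surfaces; larger design spaces would call for a sort-based sweep instead.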
  • 160. ML pipeline [from Jeff Smith's book Machine Learning Systems, on reactive ML]
  • 161. Architectural tradeoffs for production-ready ML systems (Uber)
  • 174. Configuration complexity and dependencies between options are a major source of configuration errors [Figure: the Default configuration under Operational contexts I, II, and III]
  • 175. Configuration complexity and dependencies between options are a major source of configuration errors [Diagram: Configuration Options and the Apache Hadoop Architecture]
  • 176. Configuration complexity and dependencies between options are a major source of configuration errors [Diagram: Configuration Options associated to the Apache Hadoop Architecture]
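Dependencies between options mean that a configuration can be invalid even when every individual value is in range. One way to catch such errors before deployment is to encode the cross-option constraints explicitly and check them against a candidate configuration. A minimal sketch; the option names and constraints below are hypothetical, Hadoop-flavoured examples, not real Hadoop semantics:

```python
# Sketch: detecting inter-option dependency violations in a configuration.
# Both constraints are deliberately violated by this example config.

config = {
    "mapreduce.map.memory.mb": 2048,
    "mapreduce.map.java.opts.mb": 3072,   # JVM heap must fit in the container
    "dfs.replication": 3,
    "cluster.datanodes": 2,               # fewer nodes than replicas
}

# Each constraint pairs a predicate over the whole config with a message.
constraints = [
    (lambda c: c["mapreduce.map.java.opts.mb"] <= c["mapreduce.map.memory.mb"],
     "JVM heap exceeds container memory"),
    (lambda c: c["dfs.replication"] <= c["cluster.datanodes"],
     "replication factor exceeds number of datanodes"),
]

violations = [msg for check, msg in constraints if not check(config)]
print(violations)
```

Making the dependencies first-class like this is what allows a tool (rather than a production outage) to report the conflict.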
  • 180. Configuration errors are common [Pie chart: 69% configuration errors vs. 31% other errors; source: cobalt.io]
  • 182. We can find repair patches faster and with a lighter workload • [localization]: use transfer learning to derive the configurations most likely to manifest the bug • [repair]: automatically prepare patches to fix the configuration bug
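The localization step above can be sketched with a very simple form of transfer: reuse pass/fail observations gathered in a cheap source environment to rank which target configurations to try first. This is only an illustrative nearest-neighbour stand-in for the transfer-learning models the slide refers to, and the configurations and labels are made up:

```python
# Sketch: rank target configurations by how likely they are to manifest a
# bug, reusing observations from a cheaper source environment.

def rank_by_similarity(source_obs, target_configs):
    """Order target configs by distance to the nearest source config
    that manifested the bug (most suspicious first)."""
    failing = [cfg for cfg, buggy in source_obs if buggy]

    def distance(a, b):
        # Hamming distance over binary option settings
        return sum(x != y for x, y in zip(a, b))

    return sorted(target_configs,
                  key=lambda cfg: min(distance(cfg, f) for f in failing))

# Source-environment observations: (config, manifested_bug?)
source = [((1, 0, 1), True), ((0, 0, 0), False), ((1, 1, 1), True)]
targets = [(0, 1, 0), (1, 0, 0), (1, 1, 0)]
print(rank_by_similarity(source, targets))
```

Testing configurations in this order front-loads the ones most likely to reproduce the bug, which is what lets the repair step run under a lighter workload.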
  • 183. Thanks