Transfer Learning for
Performance Analysis of
Machine Learning Systems
Pooyan Jamshidi

University of South Carolina
pooyanjamshidi
BSc in CS, Iran, 1999-2003
Master of Eng., Iran, 2003-2006
Software Industry, 2003-2010
PhD, DCU, Ireland, 2010-2014
Postdoc 1, Imperial, UK, 2014-2016
Postdoc 2, CMU, PA, 2016-2018
Assistant Prof., UofSC, 2018-present
Goal: Enable developers/users
to find the right quality tradeoff
Today’s most popular systems are built configurable
Empirical observations confirm that systems are
becoming increasingly configurable
[Figure: number of parameters vs. release time for Storage-A, MySQL (3.23.0-5.6.2), Apache (1.3.14-2.3.4), and Hadoop MapReduce/HDFS (0.1.0-2.0.0); the number of configuration parameters grows steadily with each release.]
[Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
Configurations determine the performance
behavior

void Parrot_setenv(... name, ... value) {
#ifdef PARROT_HAS_SETENV
    my_setenv(name, value, 1);
#else
    int name_len = strlen(name);
    int val_len = strlen(value);
    char *envs = glob_env;
    if (envs == NULL) {
        return;
    }
    strcpy(envs, name);
    strcpy(envs + name_len, "=");
    strcpy(envs + name_len + 1, value);
    putenv(envs);
#endif
}

#ifdef LINUX
extern int Parrot_signbit(double x) {
...

Configuration options (e.g., PARROT_HAS_SETENV, LINUX) select which code is
compiled and executed, and thereby determine qualities such as Speed and Energy.
How do we understand performance behavior of
real-world highly-configurable systems that scale well…
… and enable developers/users to reason about
qualities (performance, energy) and to make tradeoffs?
Outline
Case
Study
Transfer
Learning
Theory
Building
Guided
Sampling
Current
Research
[SEAMS’17]
[ASE’17]
[FSE’18]
SocialSensor
• Identifying trending topics
• Identifying user-defined topics
• Social media search
SocialSensor
[Architecture diagram: Internet → Crawling (Tweets: 5k-20k/min) → Store →
Crawled items; the Orchestrator fetches every 10 min [100k tweets] and pushes
them to Content Analysis → Store (Tweets: 10M); Search and Integration fetches
from both stores.]
Challenges
[Same architecture diagram, annotated with the scaling requirements: 100X,
10X, and real time.]
How can we gain better performance without
using more resources?
Let’s try out different system configurations!
Opportunity: Data processing engines in the
pipeline were all configurable
> 100 > 100 > 100
2300
Default configuration was bad, so was the expert’s

[Scatter plot: throughput (ops/sec, 0-1500) vs. average write latency (s,
0-5000), with "better" arrows toward higher throughput and lower latency; the
Default configuration and the configuration Recommended by an expert both sit
far from the Optimal Configuration.]

[Second scatter plot: throughput (ops/sec, 0-2.5×10^4) vs. latency (ms, 0-300);
again, the Default and the expert-recommended configurations are far from the
Optimal Configuration.]
The default configuration is typically bad and the
optimal configuration is noticeably better than the median

[Scatter plot: throughput (ops/sec, 0-1500) vs. average write latency (s,
0-5000), marking the Default Configuration and the Optimal Configuration, with
"better" arrows.]

• Default is bad
• The optimal configuration is 2X-10X faster than the worst
• The optimal configuration is noticeably faster than the median
What happened in the end?
• Achieved the objectives (100X users, same experience)
• Saved money by reducing cloud resources by up to 20%
• Our tool was able to identify configurations that were
consistently better than the expert’s recommendations
Outline
Case
Study
Transfer
Learning
Theory
Building
Guided
Sampling
Current
Research
Setting the scene

Compiler (e.g., SaC, LLVM)

Configuration Space:
ℂ = O1 × O2 × ⋯ × O19 × O20
(options such as dead code removal, constant folding, loop unrolling, and
function inlining)

A configuration is a member of the configuration space:
c1 ∈ ℂ, c1 = 0 × 0 × ⋯ × 0 × 1

Configure → Compile → Deploy:
Program → Compiled Code → Instrumented Binary → Hardware

Each configuration yields non-functional, measurable/quantifiable aspects:
fc(c1) = 11.1 ms (compile time)
fe(c1) = 110.3 ms (execution time)
fen(c1) = 100 mWh (energy)
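As a minimal sketch of this setting in code (assuming binary options; the
option names follow the slide, and the measurement values are quoted from it
as comments rather than computed):

import itertools

# Binary configuration options of the compiler (four shown for brevity)
options = ["dead_code_removal", "constant_folding", "loop_unrolling", "function_inlining"]

# Configuration space: Cartesian product of the option domains {0, 1}
C = list(itertools.product([0, 1], repeat=len(options)))

c1 = C[1]  # (0, 0, 0, 1): only the last option enabled
# Measuring c1 on real hardware would yield values such as:
#   f_c(c1)  = 11.1 ms   (compile time)
#   f_e(c1)  = 110.3 ms  (execution time)
#   f_en(c1) = 100 mWh   (energy)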
A typical approach for understanding the
performance behavior is sensitivity analysis

O1 × O2 × ⋯ × O19 × O20
c1 = 0 × 0 × ⋯ × 0 × 1 → y1 = f(c1)
c2 = 0 × 0 × ⋯ × 1 × 0 → y2 = f(c2)
c3 = 0 × 0 × ⋯ × 1 × 1 → y3 = f(c3)
⋯
cn = 1 × 1 × ⋯ × 1 × 1 → yn = f(cn)

Training/Sample Set → Learn → f̂ ∼ f( ⋅ )
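A minimal sketch of this sampling-and-learning step. The measure() stand-in
uses the deck’s later example model as a toy response; the random forest is
just one possible black-box learner, not the talk’s specific choice:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def measure(c):
    """Stand-in for running the system under configuration c
    (toy response shaped like the deck's example model over o1, o3, o7)."""
    return 1.2 + 3*c[0] + 5*c[2] + 0.9*c[6] + 0.8*c[2]*c[6] + 4*c[0]*c[2]*c[6]

rng = np.random.default_rng(0)
n_options, n_samples = 20, 1000                              # O1 ... O20
configs = rng.integers(0, 2, size=(n_samples, n_options))    # sampled ci ∈ ℂ
y = np.array([measure(c) for c in configs])                  # yi = f(ci)

f_hat = RandomForestRegressor().fit(configs, y)              # learn f̂ ∼ f(·)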
The performance model f̂ could be in any appropriate
form of black-box model, learned from the same
training/sample set {(ci, yi)}.
Evaluating a performance model

Train on {(ci, yi = f(ci))}, learn f̂ ∼ f( ⋅ ), then evaluate accuracy on
unseen configurations:

APE(f̂, f) = |f̂(c) − f(c)| / f(c) × 100
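The accuracy metric as a plain function (a direct transcription of the slide’s
formula; the held-out evaluation comment is an illustrative assumption):

def ape(f_hat_c, f_c):
    """Absolute percentage error between predicted and measured performance."""
    return abs(f_hat_c - f_c) / f_c * 100

# e.g., averaged over a held-out set of configurations:
# errors = [ape(f_hat.predict([c])[0], measure(c)) for c in test_configs]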
A performance model contains useful information
about influential options and interactions

f : ℂ → ℝ
f( ⋅ ) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7
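The slide’s model written out as code: the linear terms are the influential
individual options, the product terms are their interactions.

def f(o1, o3, o7):
    # baseline + individual option effects + pairwise and three-way interactions
    return 1.2 + 3*o1 + 5*o3 + 0.9*o7 + 0.8*o3*o7 + 4*o1*o3*o7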
Performance models can then be used to reason
about qualities

[Same Parrot_setenv code as above]

Execution time (s): f(·) = 5 + 3 × o1
f(o1 := 0) = 5
f(o1 := 1) = 8
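As a one-line sketch, reasoning about a quality reduces to evaluating the
learned model at a configuration:

def exec_time(o1):
    return 5 + 3 * o1   # execution time (s)

assert exec_time(0) == 5   # option disabled
assert exec_time(1) == 8   # enabling o1 costs 3 s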
Insight: Performance measurements of the real
system are “similar” to the ones from the simulator

Simulator (Gazebo) → Measure → Data
(Configurations → Performance)

So why not reuse these data,
instead of measuring on the real robot?
We developed methods to make learning cheaper
via transfer learning

[Diagram: Source (Given) — Model/Data — Extract → Transferable Knowledge →
Reuse → Target (Learn); a model is learned on each side.]

Goal: Gain strength by
transferring information
across environments
What is transfer learning?

Transfer learning is a machine learning technique where
knowledge gained while training on one type of problem is
used to train on another, similar type of problem.
What is the advantage of transfer learning?

• During learning you may need thousands of rotten and fresh potatoes and
hours of training to learn.
• But now, using the same knowledge of “rotten” features, you can identify
rotten tomatoes with fewer samples and less training time.
• You may have learned during daytime with enough light and exposure,
but your present tomato-identification job is at night.
• You may have learned sitting very close, just beside the box of potatoes, but
now for tomato identification you are on the other side of the glass.
Our transfer learning solution

Simulator (Gazebo) → Measure → Data (source)
TurtleBot → Measure → Data (target)
Reuse source data + Learn →
f(o1, o2) = 5 + 3o1 + 15o2 − 7o1 × o2

[P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17]
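A minimal sketch of the reuse idea: pool many cheap simulator samples with a
few expensive robot samples, with an environment flag so one learner can pick
up a sim-to-real offset. The sampling helpers, their toy responses (shaped
like the slide’s f(o1, o2)), and the learner choice are illustrative
assumptions, not the paper’s implementation:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def sample_simulator(n, rng):
    """Placeholder: n cheap Gazebo measurements (toy synthetic response)."""
    X = rng.random((n, 2))
    return X, 5 + 3*X[:, 0] + 15*X[:, 1] - 7*X[:, 0]*X[:, 1]

def sample_robot(n, rng):
    """Placeholder: n expensive TurtleBot measurements (shifted response)."""
    X = rng.random((n, 2))
    return X, 6 + 3*X[:, 0] + 14*X[:, 1] - 7*X[:, 0]*X[:, 1]

rng = np.random.default_rng(0)
X_sim, y_sim = sample_simulator(1000, rng)   # cheap source data
X_bot, y_bot = sample_robot(20, rng)         # a few target samples

# Pool both datasets; the last column flags the environment (0 = sim, 1 = robot)
X = np.vstack([np.c_[X_sim, np.zeros(len(X_sim))],
               np.c_[X_bot, np.ones(len(X_bot))]])
y = np.concatenate([y_sim, y_bot])

model = GradientBoostingRegressor().fit(X, y)
# Predict target performance by querying with the robot flag set to 1.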
Gaussian processes for performance modeling

[Figure: GP posterior over input x vs. output f(x) at time t = n and, after a
new observation arrives, at t = n + 1; legend: observations, mean, uncertainty,
new observation.]
Gaussian Processes enable reasoning about
performance

Step 1: Fit a GP to the data seen so far
Step 2: Explore the model for regions of most variance
Step 3: Sample that region
Step 4: Repeat

[Figure: sequential design loop between the configuration space, the empirical
model, experiments, and the selection criteria.]
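A minimal sketch of Steps 1-4 as an active-sampling loop. The measure()
argument stands in for running the system; the Matérn kernel, initial design
size, and budget are illustrative choices, not the talk’s exact setup:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def sequential_design(candidates, measure, budget=20, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(candidates), size=3, replace=False)  # initial design
    X = candidates[idx]
    y = np.array([measure(c) for c in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    while len(y) < budget:
        gp.fit(X, y)                                 # Step 1: fit GP to data so far
        _, std = gp.predict(candidates, return_std=True)
        nxt = int(np.argmax(std))                    # Step 2: region of most variance
        X = np.vstack([X, candidates[nxt]])          # Step 3: sample that region
        y = np.append(y, measure(candidates[nxt]))   # Step 4: repeat
    return gp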
The intuition behind our transfer learning
approach

Intuition: Observations on
the source(s) can affect
predictions on the target

Example: Learning the
chess game makes learning
the Go game a lot easier!

We relate the source and target functions, g, f, using the
observations Ds, Dt to build the predictive model f̂. To capture
the relationship, we define the following kernel:

k(f, g, x, x′) = kt(f, g) × kxx(x, x′),

where the kernel kt represents the correlation between the source
and target function, while kxx is the covariance function over the
inputs. Typically, kxx is parameterized and its parameters are
learnt by maximizing the marginal likelihood of the model
given the observations from source and target, Ds, Dt.
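A minimal numpy sketch of this product kernel and the resulting GP posterior
mean. The RBF form for kxx, the 2×2 task matrix B standing in for kt, and all
names here are illustrative assumptions, not the paper’s code:

import numpy as np

def k_xx(X1, X2, lengthscale=1.0):
    """Covariance over configuration inputs (RBF)."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def k_joint(X1, t1, X2, t2, B, lengthscale=1.0):
    """k(f, g, x, x') = kt(f, g) * kxx(x, x'); t are task ids (0=source, 1=target)."""
    return B[np.ix_(t1, t2)] * k_xx(X1, X2, lengthscale)

def gp_posterior_mean(X, t, y, X_star, t_star, B, noise=1e-2):
    K = k_joint(X, t, X, t, B) + noise * np.eye(len(y))
    K_star = k_joint(X_star, t_star, X, t, B)
    return K_star @ np.linalg.solve(K, y)   # predictive mean on the target

# B's off-diagonal entry encodes how correlated source and target are; in
# practice it is learnt by maximizing the marginal likelihood, as in the text.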
CoBot experiment: DARPA BRASS

[Scatter plot: localization error [m] (0-8) vs. CPU utilization [%] (10-40),
varying no_of_particles=x and no_of_refinement=y; annotations: Pareto front,
energy constraint, safety constraint, and the sweet spot where both
constraints hold; arrows mark the "better" directions.]
CoBot experiment

[CPU [%] heatmaps over a 5-25 × 5-25 configuration grid: Source (given);
Target (ground truth, 6 months); Prediction with 4 samples; Prediction with
transfer learning.]
Transfer Learning for Improving Model Predictions
in Highly Configurable Software

Pooyan Jamshidi, Miguel Velez, Christian Kästner
Carnegie Mellon University, USA
{pjamshid,mvelezce,kaestner}@cs.cmu.edu
Norbert Siegmund
Bauhaus-University Weimar, Germany
norbert.siegmund@uni-weimar.de
Prasad Kawthekar
Stanford University, USA
pkawthek@stanford.edu

Abstract—Modern software systems are built to be used in
dynamic environments using configuration capabilities to adapt to
changes and external uncertainties. In a self-adaptation context,
we are often interested in reasoning about the performance of
the systems under different configurations. Usually, we learn
a black-box model based on real measurements to predict
the performance of the system given a specific configuration.
However, as modern systems become more complex, there are
many configuration parameters that may interact and we end up
learning an exponentially large configuration space. Naturally,
this does not scale when relying on real measurements in the
actual changing environment. We propose a different solution:
Instead of taking the measurements from the real system, we
learn the model using samples from other sources, such as
simulators that approximate performance of the real system at …

Fig. 1: Transfer learning for performance model learning.
[Simulator (Source) and Robot (Target) are each measured to produce Data;
a model is learned with transfer learning, yielding a Predictive Model used
for Adaptation.]

Details: [SEAMS ’17]
Outline
Case
Study
Transfer
Learning
Theory
Building
Guided
Sampling
Current
Research
Looking further: When transfer learning goes
wrong

[Box plots: absolute percentage error [%] (10-60) of the model learned from
each source, next to the non-transfer-learning baseline.]

Sources       s      s1     s2     s3     s4     s5     s6
noise-level   0      5      10     15     20     25     30
corr. coeff.  0.98   0.95   0.89   0.75   0.54   0.34   0.19
μ(pe)         15.34  14.14  17.09  18.71  33.06  40.93  46.75

It worked! (strongly correlated sources) … It didn’t! (weakly correlated sources)

Insight: Predictions become
more accurate when the source
is more related to the target.
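The table’s “corr. coeff.” row can be computed directly: measure the same
configurations in both environments and take the Pearson correlation (a
sketch; y_source and y_target are assumed measurement vectors):

import numpy as np

def source_target_correlation(y_source, y_target):
    """Pearson correlation between source and target responses
    measured on the same configurations."""
    return np.corrcoef(y_source, y_target)[0, 1]

# A low value (e.g., ~0.19 for source s6) signals that transfer from that
# source is likely to hurt rather than help.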
[Figure: CPU usage [%] heatmaps over number of particles (5-25) × number of
refinements (5-25) for six environments, panels (a)-(f): in three of them the
response surface carries over ("It worked!"), in the other three it does not
("It didn’t!").]
Key question: Can we develop a theory to explain
when transfer learning works?
Target (Learn)Source (Given)
DataModel
Transferable
Knowledge
II. INTUITION
rstanding the performance behavior of configurable
e systems can enable (i) performance debugging, (ii)
mance tuning, (iii) design-time evolution, or (iv) runtime
on [11]. We lack empirical understanding of how the
mance behavior of a system will vary when the environ-
the system changes. Such empirical understanding will
important insights to develop faster and more accurate
g techniques that allow us to make predictions and
ations of performance for highly configurable systems
ging environments [10]. For instance, we can learn
mance behavior of a system on a cheap hardware in a
ed lab environment and use that to understand the per-
ce behavior of the system on a production server before
g to the end user. More specifically, we would like to
what the relationship is between the performance of a
in a specific environment (characterized by software
ration, hardware, workload, and system version) to the
t we vary its environmental conditions.
is research, we aim for an empirical understanding of
mance behavior to improve learning via an informed
g process. In other words, we at learning a perfor-
model in a changed environment based on a well-suited
g set that has been determined by the knowledge we
in other environments. Therefore, the main research
A. Preliminary concepts
In this section, we provide formal definitions of four con-
cepts that we use throughout this study. The formal notations
enable us to concisely convey concept throughout the paper.
1) Configuration and environment space: Let Fi indicate
the i-th feature of a configurable system A which is either
enabled or disabled and one of them holds by default. The
configuration space is mathematically a Cartesian product of
all the features C = Dom(F1) ⇥ · · · ⇥ Dom(Fd), where
Dom(Fi) = {0, 1}. A configuration of a system is then
a member of the configuration space (feature space) where
all the parameters are assigned to a specific value in their
range (i.e., complete instantiations of the system’s parameters).
We also describe an environment instance by 3 variables
e = [w, h, v] drawn from a given environment space E =
W ⇥H ⇥V , where they respectively represent sets of possible
values for workload, hardware and system version.
2) Performance model: Given a software system A with
configuration space F and environmental instances E, a per-
formance model is a black-box function f : F ⇥ E ! R
given some observations of the system performance for each
combination of system’s features x 2 F in an environment
e 2 E. To construct a performance model for a system A
with configuration space F, we run A in environment instance
e 2 E on various combinations of configurations xi 2 F, and
record the resulting performance values yi = f(xi) + ✏i, xi 2
ON
behavior of configurable
erformance debugging, (ii)
e evolution, or (iv) runtime
understanding of how the
will vary when the environ-
mpirical understanding will
op faster and more accurate
to make predictions and
ighly configurable systems
or instance, we can learn
on a cheap hardware in a
that to understand the per-
a production server before
cifically, we would like to
ween the performance of a
(characterized by software
and system version) to the
conditions.
empirical understanding of
learning via an informed
we at learning a perfor-
ment based on a well-suited
ned by the knowledge we
erefore, the main research
A. Preliminary concepts
In this section, we provide formal definitions of four con-
cepts that we use throughout this study. The formal notations
enable us to concisely convey concept throughout the paper.
1) Configuration and environment space: Let Fi indicate
the i-th feature of a configurable system A which is either
enabled or disabled and one of them holds by default. The
configuration space is mathematically a Cartesian product of
all the features C = Dom(F1) ⇥ · · · ⇥ Dom(Fd), where
Dom(Fi) = {0, 1}. A configuration of a system is then
a member of the configuration space (feature space) where
all the parameters are assigned to a specific value in their
range (i.e., complete instantiations of the system’s parameters).
We also describe an environment instance by 3 variables
e = [w, h, v] drawn from a given environment space E =
W ⇥H ⇥V , where they respectively represent sets of possible
values for workload, hardware and system version.
2) Performance model: Given a software system A with
configuration space F and environmental instances E, a per-
formance model is a black-box function f : F ⇥ E ! R
given some observations of the system performance for each
combination of system’s features x 2 F in an environment
e 2 E. To construct a performance model for a system A
with configuration space F, we run A in environment instance
e 2 E on various combinations of configurations xi 2 F, and
record the resulting performance values yi = f(xi) + ✏i, xi 2
oad, hardware and system version.
e model: Given a software system A with
ce F and environmental instances E, a per-
is a black-box function f : F ⇥ E ! R
rvations of the system performance for each
ystem’s features x 2 F in an environment
ruct a performance model for a system A
n space F, we run A in environment instance
combinations of configurations xi 2 F, and
ng performance values yi = f(xi) + ✏i, xi 2
(0, i). The training data for our regression
mply Dtr = {(xi, yi)}n
i=1. In other words, a
is simply a mapping from the input space to
ormance metric that produces interval-scaled
ume it produces real numbers).
e distribution: For the performance model,
associated the performance response to each
w let introduce another concept where we
ment and we measure the performance. An
mance distribution is a stochastic process,
that defines a probability distribution over
sures for each environmental conditions. To
ormance distribution for a system A with
ce F, similarly to the process of deriving
models, we run A on various combinations
2 F, for a specific environment instance
values for workload, hardware and system version.
2) Performance model: Given a software system A with
configuration space F and environmental instances E, a per-
formance model is a black-box function f : F ⇥ E ! R
given some observations of the system performance for each
combination of system’s features x 2 F in an environment
e 2 E. To construct a performance model for a system A
with configuration space F, we run A in environment instance
e 2 E on various combinations of configurations xi 2 F, and
record the resulting performance values yi = f(xi) + ✏i, xi 2
F where ✏i ⇠ N (0, i). The training data for our regression
models is then simply Dtr = {(xi, yi)}n
i=1. In other words, a
response function is simply a mapping from the input space to
a measurable performance metric that produces interval-scaled
data (here we assume it produces real numbers).
3) Performance distribution: For the performance model,
we measured and associated the performance response to each
configuration, now let introduce another concept where we
vary the environment and we measure the performance. An
empirical performance distribution is a stochastic process,
pd : E ! (R), that defines a probability distribution over
performance measures for each environmental conditions. To
construct a performance distribution for a system A with
configuration space F, similarly to the process of deriving
the performance models, we run A on various combinations
configurations xi 2 F, for a specific environment instance
Extract Reuse
Learn Learn
Key question: Can we develop a theory to explain
when transfer learning works?
Target (Learn)Source (Given)
DataModel
Transferable
Knowledge
II. INTUITION
rstanding the performance behavior of configurable
e systems can enable (i) performance debugging, (ii)
mance tuning, (iii) design-time evolution, or (iv) runtime
on [11]. We lack empirical understanding of how the
mance behavior of a system will vary when the environ-
the system changes. Such empirical understanding will
important insights to develop faster and more accurate
g techniques that allow us to make predictions and
ations of performance for highly configurable systems
ging environments [10]. For instance, we can learn
mance behavior of a system on a cheap hardware in a
ed lab environment and use that to understand the per-
ce behavior of the system on a production server before
g to the end user. More specifically, we would like to
what the relationship is between the performance of a
in a specific environment (characterized by software
ration, hardware, workload, and system version) to the
t we vary its environmental conditions.
is research, we aim for an empirical understanding of
mance behavior to improve learning via an informed
g process. In other words, we at learning a perfor-
model in a changed environment based on a well-suited
g set that has been determined by the knowledge we
in other environments. Therefore, the main research
A. Preliminary concepts
In this section, we provide formal definitions of four con-
cepts that we use throughout this study. The formal notations
enable us to concisely convey concept throughout the paper.
1) Configuration and environment space: Let Fi indicate
the i-th feature of a configurable system A which is either
enabled or disabled and one of them holds by default. The
configuration space is mathematically a Cartesian product of
all the features C = Dom(F1) ⇥ · · · ⇥ Dom(Fd), where
Dom(Fi) = {0, 1}. A configuration of a system is then
a member of the configuration space (feature space) where
all the parameters are assigned to a specific value in their
range (i.e., complete instantiations of the system’s parameters).
We also describe an environment instance by 3 variables
e = [w, h, v] drawn from a given environment space E =
W ⇥H ⇥V , where they respectively represent sets of possible
values for workload, hardware and system version.
2) Performance model: Given a software system A with
configuration space F and environmental instances E, a per-
formance model is a black-box function f : F ⇥ E ! R
given some observations of the system performance for each
combination of system’s features x 2 F in an environment
e 2 E. To construct a performance model for a system A
with configuration space F, we run A in environment instance
e 2 E on various combinations of configurations xi 2 F, and
record the resulting performance values yi = f(xi) + ✏i, xi 2
ON
behavior of configurable
erformance debugging, (ii)
e evolution, or (iv) runtime
understanding of how the
will vary when the environ-
mpirical understanding will
op faster and more accurate
to make predictions and
ighly configurable systems
or instance, we can learn
on a cheap hardware in a
that to understand the per-
a production server before
cifically, we would like to
ween the performance of a
(characterized by software
and system version) to the
conditions.
empirical understanding of
learning via an informed
we at learning a perfor-
ment based on a well-suited
ned by the knowledge we
erefore, the main research
A. Preliminary concepts
In this section, we provide formal definitions of four con-
cepts that we use throughout this study. The formal notations
enable us to concisely convey concept throughout the paper.
1) Configuration and environment space: Let Fi indicate
the i-th feature of a configurable system A which is either
enabled or disabled and one of them holds by default. The
configuration space is mathematically a Cartesian product of
all the features C = Dom(F1) ⇥ · · · ⇥ Dom(Fd), where
Dom(Fi) = {0, 1}. A configuration of a system is then
a member of the configuration space (feature space) where
all the parameters are assigned to a specific value in their
range (i.e., complete instantiations of the system’s parameters).
We also describe an environment instance by 3 variables
e = [w, h, v] drawn from a given environment space E =
W ⇥H ⇥V , where they respectively represent sets of possible
values for workload, hardware and system version.
2) Performance model: Given a software system A with
configuration space F and environmental instances E, a per-
formance model is a black-box function f : F ⇥ E ! R
given some observations of the system performance for each
combination of system’s features x 2 F in an environment
e 2 E. To construct a performance model for a system A
with configuration space F, we run A in environment instance
e 2 E on various combinations of configurations xi 2 F, and
record the resulting performance values yi = f(xi) + ✏i, xi 2
oad, hardware and system version.
e model: Given a software system A with
ce F and environmental instances E, a per-
is a black-box function f : F ⇥ E ! R
rvations of the system performance for each
ystem’s features x 2 F in an environment
ruct a performance model for a system A
n space F, we run A in environment instance
combinations of configurations xi 2 F, and
ng performance values yi = f(xi) + ✏i, xi 2
(0, i). The training data for our regression
mply Dtr = {(xi, yi)}n
i=1. In other words, a
is simply a mapping from the input space to
ormance metric that produces interval-scaled
ume it produces real numbers).
e distribution: For the performance model,
associated the performance response to each
w let introduce another concept where we
ment and we measure the performance. An
mance distribution is a stochastic process,
that defines a probability distribution over
sures for each environmental conditions. To
ormance distribution for a system A with
ce F, similarly to the process of deriving
models, we run A on various combinations
2 F, for a specific environment instance
values for workload, hardware and system version.
2) Performance model: Given a software system A with
configuration space F and environmental instances E, a per-
formance model is a black-box function f : F ⇥ E ! R
given some observations of the system performance for each
combination of system’s features x 2 F in an environment
e 2 E. To construct a performance model for a system A
with configuration space F, we run A in environment instance
e 2 E on various combinations of configurations xi 2 F, and
record the resulting performance values yi = f(xi) + ✏i, xi 2
F where ✏i ⇠ N (0, i). The training data for our regression
models is then simply Dtr = {(xi, yi)}n
i=1. In other words, a
response function is simply a mapping from the input space to
a measurable performance metric that produces interval-scaled
data (here we assume it produces real numbers).
3) Performance distribution: For the performance model,
we measured and associated the performance response to each
configuration, now let introduce another concept where we
vary the environment and we measure the performance. An
empirical performance distribution is a stochastic process,
pd : E ! (R), that defines a probability distribution over
performance measures for each environmental conditions. To
construct a performance distribution for a system A with
configuration space F, similarly to the process of deriving
the performance models, we run A on various combinations
configurations xi 2 F, for a specific environment instance
Extract Reuse
Learn Learn
Q1: How source and target
are “related”?
Key question: Can we develop a theory to explain
when transfer learning works?
Target (Learn)Source (Given)
DataModel
Transferable
Knowledge
II. INTUITION
rstanding the performance behavior of configurable
e systems can enable (i) performance debugging, (ii)
mance tuning, (iii) design-time evolution, or (iv) runtime
on [11]. We lack empirical understanding of how the
mance behavior of a system will vary when the environ-
the system changes. Such empirical understanding will
important insights to develop faster and more accurate
g techniques that allow us to make predictions and
ations of performance for highly configurable systems
ging environments [10]. For instance, we can learn
mance behavior of a system on a cheap hardware in a
ed lab environment and use that to understand the per-
ce behavior of the system on a production server before
g to the end user. More specifically, we would like to
what the relationship is between the performance of a
in a specific environment (characterized by software
ration, hardware, workload, and system version) to the
t we vary its environmental conditions.
is research, we aim for an empirical understanding of
mance behavior to improve learning via an informed
g process. In other words, we at learning a perfor-
model in a changed environment based on a well-suited
g set that has been determined by the knowledge we
in other environments. Therefore, the main research
A. Preliminary concepts
In this section, we provide formal definitions of four con-
cepts that we use throughout this study. The formal notations
enable us to concisely convey concept throughout the paper.
1) Configuration and environment space: Let Fi indicate
the i-th feature of a configurable system A which is either
enabled or disabled and one of them holds by default. The
configuration space is mathematically a Cartesian product of
all the features C = Dom(F1) ⇥ · · · ⇥ Dom(Fd), where
Dom(Fi) = {0, 1}. A configuration of a system is then
a member of the configuration space (feature space) where
all the parameters are assigned to a specific value in their
range (i.e., complete instantiations of the system’s parameters).
We also describe an environment instance by 3 variables
e = [w, h, v] drawn from a given environment space E =
W ⇥H ⇥V , where they respectively represent sets of possible
values for workload, hardware and system version.
2) Performance model: Given a software system A with
configuration space F and environmental instances E, a per-
formance model is a black-box function f : F ⇥ E ! R
given some observations of the system performance for each
combination of system’s features x 2 F in an environment
e 2 E. To construct a performance model for a system A
with configuration space F, we run A in environment instance
e 2 E on various combinations of configurations xi 2 F, and
record the resulting performance values yi = f(xi) + ✏i, xi 2
ON
behavior of configurable
erformance debugging, (ii)
e evolution, or (iv) runtime
understanding of how the
will vary when the environ-
mpirical understanding will
op faster and more accurate
to make predictions and
ighly configurable systems
or instance, we can learn
on a cheap hardware in a
that to understand the per-
a production server before
cifically, we would like to
ween the performance of a
(characterized by software
and system version) to the
conditions.
empirical understanding of
learning via an informed
we at learning a perfor-
ment based on a well-suited
ned by the knowledge we
erefore, the main research
A. Preliminary concepts
A. Preliminary concepts

In this section, we provide formal definitions of four concepts that we use throughout this study. The formal notation enables us to convey these concepts concisely throughout the paper.

1) Configuration and environment space: Let Fi indicate the i-th feature of a configurable system A, which is either enabled or disabled, and one of the two holds by default. The configuration space is mathematically a Cartesian product of all the features, C = Dom(F1) × ⋯ × Dom(Fd), where Dom(Fi) = {0, 1}. A configuration of a system is then a member of the configuration space (feature space) in which all the parameters are assigned a specific value in their range (i.e., complete instantiations of the system's parameters). We also describe an environment instance by three variables e = [w, h, v] drawn from a given environment space E = W × H × V, whose components respectively represent sets of possible values for workload, hardware, and system version.

2) Performance model: Given a software system A with configuration space F and environment instances E, a performance model is a black-box function f : F × E → ℝ learned from observations of the system's performance for combinations of the system's features x ∈ F in an environment e ∈ E. To construct a performance model for a system A with configuration space F, we run A in an environment instance e ∈ E on various combinations of configurations xi ∈ F and record the resulting performance values yi = f(xi) + εi, xi ∈ F, where εi ∼ N(0, σi). The training data for our regression models is then simply Dtr = {(xi, yi)}, i = 1, …, n. In other words, a response function is simply a mapping from the input space to a measurable performance metric that produces interval-scaled data (here we assume it produces real numbers).

3) Performance distribution: For the performance model, we measured and associated a performance response with each configuration; now let us introduce another concept in which we vary the environment and measure the performance. An empirical performance distribution is a stochastic process, pd : E → Δ(ℝ), that defines a probability distribution over performance measures for each environmental condition. To construct a performance distribution for a system A with configuration space F, similarly to the process of deriving the performance models, we run A on various combinations of configurations xi ∈ F for a specific environment instance e ∈ E and record the resulting performance values.
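To make these definitions concrete, here is a minimal sketch (in Python) of constructing the training set Dtr and learning a response function for one environment. The measure() function is a stand-in for actually benchmarking system A; it reuses the example response function that appears later in this deck plus Gaussian noise, and the choice of a CART regressor is illustrative, not the paper's exact pipeline.

```python
import itertools
import random
from sklearn.tree import DecisionTreeRegressor  # CART-style regressor

def measure(o):
    """Stand-in for running system A under configuration o in a fixed
    environment e = [w, h, v]: the deck's example response plus noise
    eps_i ~ N(0, sigma_i)."""
    return (1.2 + 3*o[0] + 5*o[2] + 0.9*o[6]
            + 0.8*o[2]*o[6] + 4*o[0]*o[2]*o[6] + random.gauss(0, 0.1))

d = 10
space = list(itertools.product([0, 1], repeat=d))  # C = Dom(F1) x ... x Dom(Fd)

# Sample n configurations and record y_i = f(x_i) + eps_i
X = random.sample(space, k=100)
y = [measure(x) for x in X]            # D_tr = {(x_i, y_i)}

# The learned model approximates the black-box response f: F -> R
f_hat = DecisionTreeRegressor().fit(X, y)
print(f_hat.predict([space[0]]))       # predicted performance of one config
```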
Key question: Can we develop a theory to explain when transfer learning works?

Q1: How are the source and target "related"?
Q2: What characteristics are preserved?
Q3: What are the actionable insights?

[Diagram: a model is learned from data in the source environment (given); transferable knowledge is extracted from it and reused to learn in the target environment.]
II. INTUITION

Understanding the performance behavior of configurable software systems can enable (i) performance debugging, (ii) performance tuning, (iii) design-time evolution, or (iv) runtime adaptation [11]. We lack an empirical understanding of how the performance behavior of a system will vary when the environment of the system changes. Such empirical understanding will provide important insights to develop faster and more accurate learning techniques that allow us to make predictions and optimizations of performance for highly configurable systems in changing environments [10]. For instance, we can learn the performance behavior of a system on cheap hardware in a controlled lab environment and use that to understand the performance behavior of the system on a production server before shipping it to the end user. More specifically, we would like to know what the relationship is between the performance of a system in a specific environment (characterized by software configuration, hardware, workload, and system version) and its performance when we vary its environmental conditions.

In this research, we aim for an empirical understanding of performance behavior to improve learning via an informed sampling process. In other words, we aim at learning a performance model in a changed environment based on a well-suited sampling set that has been determined by the knowledge we gained in other environments. Therefore, the main research question is whether we can develop a theory to explain when transfer learning works (Q1–Q3 above).
We hypothesized that we can exploit similarities across environments to learn "cheaper" performance models.

[Diagram: the same configurations c1, …, cn from O1 × O2 × ⋯ × O19 × O20 are measured in a source environment (execution time of Program X), yielding ysi = fs(ci), and in a target environment (execution time of Program Y), yielding yti = ft(ci); the question is how similar fs and ft are.]
[P. Jamshidi, et al., “Transfer learning for performance modeling of configurable systems….”, ASE’17]
Our empirical study: We looked at different highly-configurable systems to gain insights
[P. Jamshidi, et al., “Transfer learning for performance modeling of configurable systems….”, ASE’17]

Subject systems:
• SPEAR (SAT solver): analysis time; 14 options; 16,384 configurations; SAT problems as workloads; 3 hardware platforms; 2 versions
• x264 (video encoder): encoding time; 16 options; 4,000 configurations; video quality/size as workloads; 2 hardware platforms; 3 versions
• SQLite (DB engine): query time; 14 options; 1,000 configurations; DB queries as workloads; 2 hardware platforms; 2 versions
• SaC (compiler): execution time; 50 options; 71,267 configurations; 10 demo programs as workloads
Linear shift happens only in limited environmental changes

Software | Environmental change       | Severity | Corr.
SPEAR    | NUC/2 -> NUC/4             | Small    | 1.00
SPEAR    | Amazon_nano -> NUC         | Large    | 0.59
SPEAR    | Hardware/workload/version  | V. Large | -0.10
x264     | Version                    | Large    | 0.06
x264     | Workload                   | Medium   | 0.65
SQLite   | write-seq -> write-batch   | Small    | 0.96
SQLite   | read-rand -> read-seq      | Medium   | 0.50

[Scatter plot: target vs. source throughput for one environmental change.]

Implication: Simple transfer learning is limited to hardware changes in practice.
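A minimal sketch of the analysis behind this table: measure the same configurations in source and target and compute the Pearson correlation between the two response vectors; a correlation near 1 means the target behaves like a linear shift of the source, so the source model can be reused after fitting a single linear transformation. The measurement values and the 0.9 threshold below are illustrative placeholders.

```python
import numpy as np

# Performance of the same configurations c1..cn in both environments
# (placeholder values; in the study these come from benchmarking,
# e.g., SPEAR on NUC/2 vs. NUC/4).
y_source = np.array([12.1, 8.4, 15.0, 9.7, 11.3, 14.2])
y_target = np.array([24.0, 16.9, 30.2, 19.5, 22.8, 28.1])

corr = np.corrcoef(y_source, y_target)[0, 1]
if corr > 0.9:  # illustrative threshold
    # Fit f_t ~= a*f_s + b and reuse the source model with this shift
    a, b = np.polyfit(y_source, y_target, deg=1)
    print(f"linear transfer: f_t = {a:.2f}*f_s + {b:.2f} (corr={corr:.2f})")
else:
    print(f"corr = {corr:.2f}: simple (linear) transfer is unreliable")
```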
Influential options and interactions are preserved across environments

Software | Environmental change       | Severity | Dim | #Influential (source/target, t-test)
x264     | Version                    | Large    | 16  | 12 / 10
x264     | Hardware/workload/version  | V. Large | 16  | 8 / 9
SQLite   | write-seq -> write-batch   | V. Large | 14  | 3 / 4
SQLite   | read-rand -> read-seq      | Medium   | 14  | 1 / 1
SaC      | Workload                   | V. Large | 50  | 16 / 10

Implication: Avoid wasting budget on non-informative parts of the configuration space and focus where it matters.
Because influential options and interactions are preserved across environments, we only need to explore part of the space. For SaC, that is 2^16 of the 2^50 configurations, a fraction of 2^16 / 2^50 ≈ 0.000000000058.
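A sketch of how the influential options and interactions can be identified in each environment so that the comparison above is possible. Fitting a sparse linear model (Lasso) over pairwise interaction terms is an assumption here, standing in for the stepwise regression and t-tests used in the study, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10))   # 200 sampled configs, 10 binary options
y = (1.2 + 3*X[:, 0] + 5*X[:, 2] + 0.9*X[:, 6]
     + 0.8*X[:, 2]*X[:, 6] + rng.normal(0, 0.1, 200))

# Expand to pairwise interaction terms o_i*o_j, then fit a sparse model
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
Xp = poly.fit_transform(X)
fit = Lasso(alpha=0.05).fit(Xp, y)

influential = [n for n, w in zip(poly.get_feature_names_out(), fit.coef_)
               if abs(w) > 0.1]
print(influential)  # e.g., ['x0', 'x2', 'x6', 'x2 x6']: a handful of 55 terms
```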
Transfer learning across environments

[Diagram: in the source environment (execution time of Program X), we measure ysi = fs(ci) for configurations c1, …, cn drawn from O1 × O2 × ⋯ × O19 × O20 and learn a performance model ̂fs ∼ fs( ⋅ ).]
Observation 1: Not all options and interactions are influential, and the degree of interaction between options is not high.

̂fs( ⋅ ) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7
ℂ = O1 × O2 × O3 × O4 × O5 × O6 × O7 × O8 × O9 × O10

Observation 2: Influential options and interactions are preserved across environments.

̂fs( ⋅ ) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7
̂ft( ⋅ ) = 10.4 − 2.1o1 + 1.2o3 + 2.2o7 + 0.1o1o3 − 2.1o3o7 + 14o1o3o7
Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis

Pooyan Jamshidi (Carnegie Mellon University, USA), Norbert Siegmund (Bauhaus-University Weimar, Germany), Miguel Velez, Christian Kästner, Akshay Patel, Yuvraj Agarwal (Carnegie Mellon University, USA)

Abstract—Modern software systems provide many configuration options which significantly influence their non-functional properties. To understand and predict the effect of configuration options, several sampling and learning strategies have been proposed, albeit often with significant cost to cover the highly dimensional configuration space. Recently, transfer learning has been applied to reduce the effort of constructing performance models by transferring knowledge about performance behavior across environments. While this line of research is promising to learn more accurate models at a lower cost, it is unclear why and when transfer learning works for performance modeling. To shed light on when it is beneficial to apply transfer learning, we conducted an empirical study on four popular software systems, varying software configurations and environmental conditions, such as hardware, workload, and software versions, to identify the key knowledge pieces that can be exploited for transfer learning. Our results show that in small environmental changes (e.g., homogeneous workload change), by applying a linear transformation to the performance model, we can understand the performance behavior of the target environment, while for severe environmental changes (e.g., drastic workload change) we can transfer only knowledge that makes sampling more efficient, e.g., by reducing the dimensionality of the configuration space.

Index Terms—Performance analysis, transfer learning.

Fig. 1: Transfer learning is a form of machine learning that takes advantage of transferable knowledge from source to learn an accurate, reliable, and less costly model for the target environment.
Details: [ASE ’17]
Details: [AAAI Spring Symposium ’19]
Outline: Case Study → Transfer Learning → Theory Building → Guided Sampling → Current Research
How should we sample the configuration space to learn a "better" performance behavior? How do we select the most informative configurations?

The similarity across environments is a rich source of knowledge for exploring the configuration space.
When we treat the system as a black box, we typically cannot distinguish between different configurations:

[Table: configurations c1, …, cn over O1 × O2 × ⋯ × O19 × O20 with no associated performance values.]

• We therefore end up blindly exploring the configuration space.
• That is essentially the key reason why "most" work in this area considers random sampling.
Without considering this knowledge, many samples may not provide new information.

̂fs( ⋅ ) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7

O1 × O2 × O3 × O4 × O5 × O6 × O7 × O8 × O9 × O10
c1   = 1 × 0 × 1 × 0 × 0 × 0 × 1 × 0 × 0 × 0  →  ̂fs(c1) = 14.9
c2   = 1 × 0 × 1 × 0 × 0 × 0 × 1 × 0 × 0 × 1  →  ̂fs(c2) = 14.9
c3   = 1 × 0 × 1 × 0 × 0 × 0 × 1 × 0 × 1 × 0  →  ̂fs(c3) = 14.9
⋯
c128 = 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1  →  ̂fs(c128) = 14.9

All 128 configurations that enable o1, o3, and o7 evaluate to the same response (14.9), so blind/random samples drawn among them provide no additional information about the performance of the system.
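A minimal sketch of how source knowledge can steer sampling instead: enumerate only the subspace spanned by the influential options identified in the source, fixing the rest to a default. This is a simplification of the L2S idea for illustration, not the full algorithm.

```python
import itertools

d = 10
influential = [0, 2, 6]  # indices of o1, o3, o7, learned in the source

samples = []
for bits in itertools.product([0, 1], repeat=len(influential)):
    cfg = [0] * d                      # fix non-influential options
    for idx, b in zip(influential, bits):
        cfg[idx] = b
    samples.append(tuple(cfg))

# 2^3 = 8 informative configurations instead of 2^10 = 1024; each one
# changes the response, unlike the 128 blind samples above that all
# evaluate to 14.9.
print(len(samples), samples[:3])
```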
Evaluation:
Learning performance behavior of
Machine Learning Systems
ML system: https://pooyanjamshidi.github.io/mls
Configurations of deep neural networks affect accuracy and energy consumption

[Scatter plot: validation (test) error vs. energy consumption [J] for a CNN on the CNAE-9 data set; annotations: 72%, 22X.]

[Background excerpt from an architecture-search paper: "the selected cell is plugged into a large model which is trained on the combination of training and validation sub-sets, and the accuracy is reported on the CIFAR-10 test set. We note that the test set is never used for model selection, and it is only used for final model evaluation. We also evaluate the cells, learned on CIFAR-10, in a large-scale setting on the ImageNet challenge dataset (Sect. 4.3)." Figure 2: Image classification models constructed using the cells optimized with architecture search. Top-left: small model used during architecture search on CIFAR-10. Top-right: large CIFAR-10 model used for learned cell evaluation. Bottom: ImageNet model used for learned cell evaluation.]
DNN measurements are costly: each sample costs ~1h, and 4,000 samples × 1h ≈ 6 months.

Yes, that's the cost we paid for conducting our measurements!
L2S enables learning a more accurate model with fewer samples by exploiting the knowledge from the source

[Plots: mean absolute percentage error vs. sample size (3–70) for L2S+GP, L2S+DataReuseTL, DataReuseTL, ModelShift, and Random+CART; panels (a) DNN (hard) and (b) XGBoost (hard). Shown here: Convolutional Neural Network.]
L2S may also help the data-reuse approach to learn faster

[Plots: mean absolute percentage error vs. sample size for the same methods; panels (a) DNN (hard), (b) XGBoost (hard), and (c) Storm (hard). Shown here: XGBoost.]
Evaluation:
Learning performance behavior of
Big Data Systems
In some environments the similarity across environments may be too low, and this results in "negative transfer"

[Plot: mean absolute percentage error vs. sample size for L2S+GP, L2S+DataReuseTL, DataReuseTL, ModelShift, and Random+CART; panel (c) Storm (hard). Shown here: Apache Storm.]
Why are performance models learned from L2S samples more accurate? The samples generated by L2S contain more information: "entropy <-> information gain"

[Plots: entropy [bits] vs. sample size (10–70) for max entropy, L2S, and random sampling; panels: DNN, XGBoost, Storm.]
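A sketch of the entropy comparison: bin the responses of a sample set and compute its Shannon entropy; a set whose responses spread over many distinct values carries more information about the response surface than one that keeps hitting the same value. The two synthetic sample sets and the bin count are illustrative.

```python
import numpy as np

def sample_entropy(y, bins=16):
    """Shannon entropy (bits) of the empirical distribution of responses."""
    counts, _ = np.histogram(y, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

y_random = np.full(70, 14.9)            # blind samples: identical responses
y_l2s = np.linspace(1.2, 14.9, 70)      # informative samples: spread out
print(sample_entropy(y_random))         # 0.0 bits
print(sample_entropy(y_l2s))            # ~4 bits (close to max for 16 bins)
```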
Limitations
• Limited number of systems and environmental changes
• Synthetic models (https://github.com/pooyanjamshidi/GenPerf)
• Binary options only (non-binary options were encoded as binary)
• Negative transfer
Details: [FSE ’18]
Outline: Case Study → Transfer Learning → Empirical Study → Guided Sampling → Current Research
What will the software systems
of the future look like?
VISION: Software 2.0, increasingly customized and configurable.

Increasingly competing objectives: accuracy, training speed, inference speed, model size, energy.

Deep neural network as a highly configurable system.
[Residue of a metrics legend: "... of top/bottom conf.; M6/M7: number of influential options; M8/M9: number of options agree; M10: correlation btw importance of options; M11/M12: number of interactions; M13: number of interaction effects; M14: correlation btw the coeffs."]

[Diagram: feed-forward neural network with input layer (inputs #1–#4), hidden layer, and output layer.]

4 Technical Aims and Research Plan

We will pursue the following technical aims: (1) investigate potential criteria for effective exploration of the design space of DNN architectures (Section 4.2), (2) build analytical models to accurately predict the performance of a given architecture configuration given other similar configurations, which either have been measured in the target environments or other similar environments, without measuring the network performance directly (Section 4.3), and (3) develop a tuning approach that exploits the performance model from the previous step to effectively search for optimal configurations (Section 4.4).

4.1 Project Timeline

We plan to complete the proposed project in two years. To mitigate project risks, we divide the project into three major phases.

[Diagram: DNN system development stack: topology/network design, hyper-parameter, model compiler, hybrid deployment, OS/hardware; the scope of this project spans neural search, hyper-parameter, and hardware optimization.]
We found many configurations with the same accuracy while having drastically different energy demands.

[Scatter plots: validation (test) error vs. energy consumption [J] for a CNN on the CNAE-9 data set; configurations with the same error differ by up to 22X in energy. A zoomed view (100–400 J, 8–18% error) highlights the Pareto frontier; annotations: 10%, 300 J.]
Details: [OpML ’19]
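A sketch of extracting the Pareto frontier from per-configuration measurements of (energy, error), both to be minimized; the data points are placeholders, not measurements from the study.

```python
def pareto_frontier(points):
    """Keep the points not dominated by any other point
    (both objectives are minimized)."""
    return sorted(p for p in points
                  if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                             for q in points))

# (energy [J], validation error [%]) per configuration -- placeholder data
configs = [(120, 16.0), (150, 12.0), (180, 14.0), (300, 9.5),
           (310, 9.5), (400, 9.4), (2500, 9.5)]
print(pareto_frontier(configs))
# -> [(120, 16.0), (150, 12.0), (300, 9.5), (400, 9.4)]
# e.g., (300, 9.5) dominates (2500, 9.5): same error at far less energy
```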
Thanks

Our group: Yan Ren (ML + Systems), Shahriar Iqbal (Systems + ML), Rui Xin (Neural Arch Search), Jianhai Su (AI + Systems + Robotics), Ying Meng (AI Testing), Yuxiang Sun (Deep RL), Thahimum Hassan (AI Fairness), Peter Mourfield (ML + Causal Inference), Mohammad Ali Javidian (ML + Causal Inference)

pooyanjamshidi
More Related Content

Similar to Transfer Learning for Performance Analysis of Machine Learning Systems

Embedding Metadata In Word Processing Documents
Embedding Metadata In Word Processing DocumentsEmbedding Metadata In Word Processing Documents
Embedding Metadata In Word Processing DocumentsJim Downing
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...IEEEFINALYEARSTUDENTPROJECT
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...IEEEMEMTECHSTUDENTSPROJECTS
 
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEEFINALYEARSTUDENTPROJECTS
 
UMLassure: An approach to model software security
UMLassure: An approach to model software securityUMLassure: An approach to model software security
UMLassure: An approach to model software securitymanishthaper
 
Learning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsLearning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsPooyan Jamshidi
 
Exploiting undefined behaviors for efficient symbolic execution
Exploiting undefined behaviors for efficient symbolic executionExploiting undefined behaviors for efficient symbolic execution
Exploiting undefined behaviors for efficient symbolic executionAsankhaya Sharma
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
TUW - Quality of data-aware data analytics workflows
TUW - Quality of data-aware data analytics workflowsTUW - Quality of data-aware data analytics workflows
TUW - Quality of data-aware data analytics workflowsHong-Linh Truong
 
Cse443 Project Report - LPU (Modern Big Data Analysis with SQL Specialization)
Cse443 Project Report - LPU (Modern Big Data Analysis with SQL Specialization)Cse443 Project Report - LPU (Modern Big Data Analysis with SQL Specialization)
Cse443 Project Report - LPU (Modern Big Data Analysis with SQL Specialization)Qazi Maaz Arshad
 
IRJET- Lost: The Horror Game
IRJET- Lost: The Horror GameIRJET- Lost: The Horror Game
IRJET- Lost: The Horror GameIRJET Journal
 
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...IEEEMEMTECHSTUDENTPROJECTS
 
IRJET - Student Future Prediction System under Filtering Mechanism
IRJET - Student Future Prediction System under Filtering MechanismIRJET - Student Future Prediction System under Filtering Mechanism
IRJET - Student Future Prediction System under Filtering MechanismIRJET Journal
 
Minor Project Synopsis on Data Structure Visualizer
Minor Project Synopsis on Data Structure VisualizerMinor Project Synopsis on Data Structure Visualizer
Minor Project Synopsis on Data Structure VisualizerRonitShrivastava057
 
Hardware Design Practices For Modern Hardware
Hardware Design Practices For Modern HardwareHardware Design Practices For Modern Hardware
Hardware Design Practices For Modern HardwareWinstina Kennedy
 
IRJET- Course outcome Attainment Estimation System
IRJET-  	  Course outcome Attainment Estimation SystemIRJET-  	  Course outcome Attainment Estimation System
IRJET- Course outcome Attainment Estimation SystemIRJET Journal
 
Resume_Anton_Boshoff
Resume_Anton_BoshoffResume_Anton_Boshoff
Resume_Anton_BoshoffAnton Boshoff
 
A Hierarchical Distributed Processing Framework for Huge Image by using Big Data
A Hierarchical Distributed Processing Framework for Huge Image by using Big DataA Hierarchical Distributed Processing Framework for Huge Image by using Big Data
A Hierarchical Distributed Processing Framework for Huge Image by using Big DataIRJET Journal
 

Similar to Transfer Learning for Performance Analysis of Machine Learning Systems (20)

Embedding Metadata In Word Processing Documents
Embedding Metadata In Word Processing DocumentsEmbedding Metadata In Word Processing Documents
Embedding Metadata In Word Processing Documents
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
 
UMLassure: An approach to model software security
UMLassure: An approach to model software securityUMLassure: An approach to model software security
UMLassure: An approach to model software security
 
Learning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsLearning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain Environments
 
Exploiting undefined behaviors for efficient symbolic execution
Exploiting undefined behaviors for efficient symbolic executionExploiting undefined behaviors for efficient symbolic execution
Exploiting undefined behaviors for efficient symbolic execution
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
TUW - Quality of data-aware data analytics workflows
TUW - Quality of data-aware data analytics workflowsTUW - Quality of data-aware data analytics workflows
TUW - Quality of data-aware data analytics workflows
 
Cse443 Project Report - LPU (Modern Big Data Analysis with SQL Specialization)
Cse443 Project Report - LPU (Modern Big Data Analysis with SQL Specialization)Cse443 Project Report - LPU (Modern Big Data Analysis with SQL Specialization)
Cse443 Project Report - LPU (Modern Big Data Analysis with SQL Specialization)
 
Online Job Portal
Online Job PortalOnline Job Portal
Online Job Portal
 
IRJET- Lost: The Horror Game
IRJET- Lost: The Horror GameIRJET- Lost: The Horror Game
IRJET- Lost: The Horror Game
 
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
 
Resume
ResumeResume
Resume
 
IRJET - Student Future Prediction System under Filtering Mechanism
IRJET - Student Future Prediction System under Filtering MechanismIRJET - Student Future Prediction System under Filtering Mechanism
IRJET - Student Future Prediction System under Filtering Mechanism
 
Minor Project Synopsis on Data Structure Visualizer
Minor Project Synopsis on Data Structure VisualizerMinor Project Synopsis on Data Structure Visualizer
Minor Project Synopsis on Data Structure Visualizer
 
Hardware Design Practices For Modern Hardware
Hardware Design Practices For Modern HardwareHardware Design Practices For Modern Hardware
Hardware Design Practices For Modern Hardware
 
IRJET- Course outcome Attainment Estimation System
IRJET-  	  Course outcome Attainment Estimation SystemIRJET-  	  Course outcome Attainment Estimation System
IRJET- Course outcome Attainment Estimation System
 
Resume_Anton_Boshoff
Resume_Anton_BoshoffResume_Anton_Boshoff
Resume_Anton_Boshoff
 
A Hierarchical Distributed Processing Framework for Huge Image by using Big Data
A Hierarchical Distributed Processing Framework for Huge Image by using Big DataA Hierarchical Distributed Processing Framework for Huge Image by using Big Data
A Hierarchical Distributed Processing Framework for Huge Image by using Big Data
 

More from Pooyan Jamshidi

Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery ApproachLearning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery ApproachPooyan Jamshidi
 
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn... A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...Pooyan Jamshidi
 
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...Pooyan Jamshidi
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Pooyan Jamshidi
 
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...Pooyan Jamshidi
 
Integrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsIntegrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsPooyan Jamshidi
 
Transfer Learning for Software Performance Analysis: An Exploratory Analysis
Transfer Learning for Software Performance Analysis: An Exploratory AnalysisTransfer Learning for Software Performance Analysis: An Exploratory Analysis
Transfer Learning for Software Performance Analysis: An Exploratory AnalysisPooyan Jamshidi
 
Sensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic SoftwareSensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic SoftwarePooyan Jamshidi
 
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Pooyan Jamshidi
 
Transfer Learning for Improving Model Predictions in Robotic Systems
Transfer Learning for Improving Model Predictions  in Robotic SystemsTransfer Learning for Improving Model Predictions  in Robotic Systems
Transfer Learning for Improving Model Predictions in Robotic SystemsPooyan Jamshidi
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOpsPooyan Jamshidi
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...Pooyan Jamshidi
 
Configuration Optimization Tool
Configuration Optimization ToolConfiguration Optimization Tool
Configuration Optimization ToolPooyan Jamshidi
 
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...Pooyan Jamshidi
 
Configuration Optimization for Big Data Software
Configuration Optimization for Big Data SoftwareConfiguration Optimization for Big Data Software
Configuration Optimization for Big Data SoftwarePooyan Jamshidi
 
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...Pooyan Jamshidi
 
Towards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICETowards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICEPooyan Jamshidi
 
Self learning cloud controllers
Self learning cloud controllersSelf learning cloud controllers
Self learning cloud controllersPooyan Jamshidi
 

More from Pooyan Jamshidi (20)

Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery ApproachLearning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
 
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn... A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
 
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
 
Learning to Sample
Learning to SampleLearning to Sample
Learning to Sample
 
Integrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsIntegrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of Robots
 
Transfer Learning for Software Performance Analysis: An Exploratory Analysis
Transfer Learning for Software Performance Analysis: An Exploratory AnalysisTransfer Learning for Software Performance Analysis: An Exploratory Analysis
Transfer Learning for Software Performance Analysis: An Exploratory Analysis
 
Architecting for Scale
Architecting for ScaleArchitecting for Scale
Architecting for Scale
 
Sensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic SoftwareSensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic Software
 
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
 
Transfer Learning for Improving Model Predictions in Robotic Systems
Transfer Learning for Improving Model Predictions  in Robotic SystemsTransfer Learning for Improving Model Predictions  in Robotic Systems
Transfer Learning for Improving Model Predictions in Robotic Systems
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOps
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
 
Configuration Optimization Tool
Configuration Optimization ToolConfiguration Optimization Tool
Configuration Optimization Tool
 
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
 
Configuration Optimization for Big Data Software
Configuration Optimization for Big Data SoftwareConfiguration Optimization for Big Data Software
Configuration Optimization for Big Data Software
 
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
 
Towards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICETowards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICE
 
Self learning cloud controllers
Self learning cloud controllersSelf learning cloud controllers
Self learning cloud controllers
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Transfer Learning for Performance Analysis of Machine Learning Systems

  • 1. Transfer Learning for Performance Analysis of Machine Learning Systems Pooyan Jamshidi University of South Carolina pooyanjamshidi
  • 2.
  • 3. BSc. In CS, Iran, 1999-2003
  • 4. BSc. In CS, Iran, 1999-2003 Masters of Eng., Iran, 2003-2006
  • 5. BSc. In CS, Iran, 1999-2003 Masters of Eng., Iran, 2003-2006 Software Industry, 2003-2010
  • 6. BSc. In CS, Iran, 1999-2003 Masters of Eng., Iran, 2003-2006 Software Industry, 2003-2010 PhD, DCU, Ireland 2010-2014
  • 7. BSc. In CS, Iran, 1999-2003 Masters of Eng., Iran, 2003-2006 Software Industry, 2003-2010 PhD, DCU, Ireland 2010-2014 Postdoc1, Imperial, UK, 2014-2016
  • 8. BSc. In CS, Iran, 1999-2003 Masters of Eng., Iran, 2003-2006 Software Industry, 2003-2010 PhD, DCU, Ireland 2010-2014 Postdoc1, Imperial, UK, 2014-2016 Postdoc2, CMU, PA, 2016-2018
  • 9. BSc. In CS, Iran, 1999-2003 Masters of Eng., Iran, 2003-2006 Software Industry, 2003-2010 PhD, DCU, Ireland 2010-2014 Postdoc1, Imperial, UK, 2014-2016 Postdoc2, CMU, PA, 2016-2018 Assistant Prof., UofSC, 2018-*
  • 10.
  • 11.
  • 12.
  • 13. Goal: Enable developers/users to find the right quality tradeoff
  • 14. Today’s most popular systems are configurable built
  • 15. Today’s most popular systems are configurable built
  • 16. Today’s most popular systems are configurable built
  • 17. Today’s most popular systems are configurable built
  • 18. Today’s most popular systems are configurable built
  • 19.
  • 20.
  • 21.
  • 22.
  • 23. Empirical observations confirm that systems are becoming increasingly configurable [plots: number of parameters vs. release time for Apache and Hadoop (MapReduce, HDFS)] [Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
  • 24. Empirical observations confirm that systems are becoming increasingly configurable [excerpt of the FSE’15 paper with plots: number of parameters vs. release time for Storage-A, MySQL, Apache, and Hadoop (MapReduce, HDFS)] [Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
  • 25. Configurations determine the performance behavior void Parrot_setenv(. . . name,. . . value){ #ifdef PARROT_HAS_SETENV my_setenv(name, value, 1); #else int name_len=strlen(name); int val_len=strlen(value); char* envs=glob_env; if(envs==NULL){ return; } strcpy(envs,name); strcpy(envs+name_len,"="); strcpy(envs+name_len + 1,value); putenv(envs); #endif } #ifdef LINUX extern int Parrot_signbit(double x){
  • 26. Configurations determine the performance behavior [same code as slide 25]
  • 27. Configurations determine the performance behavior [same code as slide 25]
  • 28. Configurations determine the performance behavior [same code as slide 25]
  • 29. Configurations determine the performance behavior [same code as slide 25, annotated: Speed, Energy]
  • 30. Configurations determine the performance behavior [same code as slide 25, annotated: Speed, Energy]
  • 31. Configurations determine the performance behavior [same code as slide 25, annotated: Speed, Energy]
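Slides 25–31 make the point that compile-time options (here PARROT_HAS_SETENV and LINUX) select which code is built, and that choice drives speed and energy. As a minimal, hypothetical sketch of how one could observe this, the Python below compiles one source file under every combination of those flags and times each resulting binary; the compiler invocation, the file name parrot_setenv_demo.c, and the output name demo are stand-ins, not artifacts from the talk.

import itertools
import subprocess
import time

FLAGS = ["-DPARROT_HAS_SETENV", "-DLINUX"]  # the options highlighted on the slides

def build(enabled):
    # Compile the same program under a specific flag combination (hypothetical file).
    subprocess.run(["cc", "-O2", *enabled, "parrot_setenv_demo.c", "-o", "demo"], check=True)

def mean_runtime(runs=5):
    # Average wall-clock time over a few runs of the compiled binary.
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(["./demo"], check=True)
    return (time.perf_counter() - start) / runs

for r in range(len(FLAGS) + 1):
    for combo in itertools.combinations(FLAGS, r):
        build(list(combo))
        print(combo or ("no flags",), f"{mean_runtime():.4f} s")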
  • 32. How do we understand performance behavior of real-world highly-configurable systems that scale well…
  • 33. How do we understand performance behavior of real-world highly-configurable systems that scale well… … and enable developers/users to reason about qualities (performance, energy) and to make tradeoffs?
  • 39. SocialSensor • Identifying trending topics • Identifying user-defined topics • Social media search
  • 43. SocialSensor Orchestrator Crawling Tweets: [5k-20k/min] Every 10 min: [100k tweets] Store Crawled items Fetch Internet
  • 44. SocialSensor Content Analysis Orchestrator Crawling Tweets: [5k-20k/min] Every 10 min: [100k tweets] Store Push Crawled items Fetch Internet
  • 45. SocialSensor Content Analysis Orchestrator Crawling Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Store Push Store Crawled items Fetch Internet
  • 46. SocialSensor Content Analysis Orchestrator Crawling Search and Integration Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Fetch Store Push Store Crawled items Fetch Internet
  • 47. Challenges Content Analysis Orchestrator Crawling Search and Integration Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Fetch Store Push Store Crawled items Fetch Internet
  • 48. Challenges Content Analysis Orchestrator Crawling Search and Integration Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Fetch Store Push Store Crawled items Fetch Internet 100X
  • 49. Challenges Content Analysis Orchestrator Crawling Search and Integration Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Fetch Store Push Store Crawled items Fetch Internet 100X 10X
  • 50. Challenges Content Analysis Orchestrator Crawling Search and Integration Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Fetch Store Push Store Crawled items Fetch Internet 100X 10X Real time
  • 51. How can we gain better performance without using more resources?
  • 52. Let’s try out different system configurations!
  • 53. Opportunity: Data processing engines in the pipeline were all configurable
  • 54. Opportunity: Data processing engines in the pipeline were all configurable
  • 55. Opportunity: Data processing engines in the pipeline were all configurable
  • 56. Opportunity: Data processing engines in the pipeline were all configurable
  • 57. Opportunity: Data processing engines in the pipeline were all configurable > 100 > 100 > 100
  • 58. Opportunity: Data processing engines in the pipeline were all configurable > 100 > 100 > 100 2^300
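A quick sanity check of the scale on slides 53–58: if each of the three engines exposes on the order of 100 binary options, the composed configuration space is the product of the per-engine spaces, so roughly 2^300 configurations. The per-engine counts below are placeholders for that back-of-the-envelope arithmetic, not the talk's exact numbers.

engines = {"crawler": 100, "content_analysis": 100, "search_integration": 100}
total_options = sum(engines.values())                # ~300 binary options overall
print("binary options across the pipeline:", total_options)
print("composed configuration space: 2**%d = %.3e" % (total_options, 2**total_options))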
  • 59.
  • 60. Default configuration was bad, so was the expert’s [scatter plot: average write latency (s) vs. throughput (ops/sec)]
  • 61. Default configuration was bad, so was the expert’s [same plot, marking: Default]
  • 62. Default configuration was bad, so was the expert’s [same plot, marking: Default; Recommended by an expert]
  • 63. Default configuration was bad, so was the expert’s [same plot, marking: Default; Recommended by an expert; Optimal Configuration]
  • 64. Default configuration was bad, so was the expert’s [scatter plot: latency (ms) vs. throughput (ops/sec)]
  • 65. Default configuration was bad, so was the expert’s [same plot, marking: Default]
  • 66. Default configuration was bad, so was the expert’s [same plot, marking: Default; Recommended by an expert]
  • 67. Default configuration was bad, so was the expert’s [same plot, marking: Default; Recommended by an expert; Optimal Configuration]
  • 68. The default configuration is typically bad and the optimal configuration is noticeably better than median [scatter plot: average write latency (s) vs. throughput (ops/sec)]
  • 69. The default configuration is typically bad and the optimal configuration is noticeably better than median [same plot, marking: Default Configuration] • Default is bad
  • 70. The default configuration is typically bad and the optimal configuration is noticeably better than median [same plot, marking: Default Configuration; Optimal Configuration] • Default is bad
  • 71. The default configuration is typically bad and the optimal configuration is noticeably better than median [same plot] • Default is bad • 2X-10X faster than worst
  • 72. The default configuration is typically bad and the optimal configuration is noticeably better than median [same plot] • Default is bad • 2X-10X faster than worst • Noticeably faster than median
  • 73. What happened in the end? • Achieved the objectives (100X users, same experience) • Saved money by reducing cloud resources by up to 20% • Our tool was able to identify configurations that were consistently better than the expert recommendation
  • 77. Setting the scene ℂ = O1 × O2 × ⋯ × O19 × O20 Compiler (e.g., SaC, LLVM)
  • 78. Setting the scene ℂ = O1 × O2 × ⋯ × O19 × O20 Dead code removal Compiler (e.g., SaC, LLVM)
  • 79. Setting the scene ℂ = O1 × O2 × ⋯ × O19 × O20 Dead code removal Constant folding Loop unrolling Function inlining Compiler (e.g., SaC, LLVM)
  • 80. Setting the scene ℂ = O1 × O2 × ⋯ × O19 × O20 Dead code removal Configuration Space Constant folding Loop unrolling Function inlining Compiler (e.g., SaC, LLVM)
  • 81. Setting the scene ℂ = O1 × O2 × ⋯ × O19 × O20 Dead code removal Configuration Space Constant folding Loop unrolling Function inlining Compiler (e.g., SaC, LLVM) Program
  • 82. Setting the scene ℂ = O1 × O2 × ⋯ × O19 × O20 Dead code removal Configuration Space Constant folding Loop unrolling Function inlining c1 ∈ ℂ, c1 = 0 × 0 × ⋯ × 0 × 1 Compiler (e.g., SaC, LLVM) Program
  • 83. Setting the scene ℂ = O1 × O2 × ⋯ × O19 × O20 Dead code removal Configuration Space Constant folding Loop unrolling Function inlining c1 ∈ ℂ, c1 = 0 × 0 × ⋯ × 0 × 1 Compiler (e.g., SaC, LLVM) Program Compiled Code Compile Configure
  • 84. Setting the scene ℂ = O1 × O2 × ⋯ × O19 × O20 Dead code removal Configuration Space Constant folding Loop unrolling Function inlining c1 ∈ ℂ, c1 = 0 × 0 × ⋯ × 0 × 1 Compiler (e.g., SaC, LLVM) Program Compiled Code Instrumented Binary Hardware Compile Deploy Configure
  • 85. Setting the scene ℂ = O1 × O2 × ⋯ × O19 × O20 Dead code removal Configuration Space Constant folding Loop unrolling Function inlining c1 ∈ ℂ, c1 = 0 × 0 × ⋯ × 0 × 1 Compiler (e.g., SaC, LLVM) Program Compiled Code Instrumented Binary Hardware Compile Deploy Configure fc(c1) = 11.1ms (compile time), fe(c1) = 110.3ms (execution time), fen(c1) = 100mwh (energy): non-functional measurable/quantifiable aspects
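To make the formalism of slides 77–85 concrete, here is a minimal sketch: a configuration is a binary vector from ℂ = O1 × ⋯ × O20, and each non-functional property is a black-box function over ℂ. The measurement bodies are synthetic stand-ins chosen only to reproduce the slide's example values; a real harness would compile, deploy, and measure the instrumented binary.

import random

NUM_OPTIONS = 20  # O1 ... O20, each enabled (1) or disabled (0)

def random_config():
    return tuple(random.randint(0, 1) for _ in range(NUM_OPTIONS))

def f_compile(c):  # fc : C -> compile time (ms); placeholder measurement
    return 10.0 + 1.1 * c[-1]

def f_exec(c):     # fe : C -> execution time (ms); placeholder measurement
    return 100.0 + 10.3 * c[-1]

c1 = (0,) * (NUM_OPTIONS - 1) + (1,)  # c1 = 0 x 0 x ... x 0 x 1
print(f_compile(c1), f_exec(c1))      # 11.1 110.3, the values on slide 85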
  • 86. A typical approach for understanding the performance behavior is sensitivity analysis O1 × O2 × ⋯ × O19 × O20 0 × 0 × ⋯ × 0 × 1 0 × 0 × ⋯ × 1 × 0 0 × 0 × ⋯ × 1 × 1 1 × 1 × ⋯ × 1 × 0 1 × 1 × ⋯ × 1 × 1 ⋯ c1 c2 c3 cn y1 = f(c1) y2 = f(c2) y3 = f(c3) yn = f(cn) ⋯
  • 87. A typical approach for understanding the performance behavior is sensitivity analysis O1 × O2 × ⋯ × O19 × O20 0 × 0 × ⋯ × 0 × 1 0 × 0 × ⋯ × 1 × 0 0 × 0 × ⋯ × 1 × 1 1 × 1 × ⋯ × 1 × 0 1 × 1 × ⋯ × 1 × 1 ⋯ c1 c2 c3 cn y1 = f(c1) y2 = f(c2) y3 = f(c3) yn = f(cn) ⋯ Training/Sample Set
  • 88. A typical approach for understanding the performance behavior is sensitivity analysis O1 × O2 × ⋯ × O19 × O20 0 × 0 × ⋯ × 0 × 1 0 × 0 × ⋯ × 1 × 0 0 × 0 × ⋯ × 1 × 1 1 × 1 × ⋯ × 1 × 0 1 × 1 × ⋯ × 1 × 1 ⋯ c1 c2 c3 cn y1 = f(c1) y2 = f(c2) y3 = f(c3) yn = f(cn) f̂ ∼ f( ⋅ ) ⋯ Learn Training/Sample Set
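A sketch of this sensitivity-analysis loop: sample configurations, "measure" the system (here a synthetic stand-in for f, chosen to match the model shown on slide 96), and learn a black-box model f̂ from the pairs (ci, yi). The random-forest learner is an illustrative choice; the slides deliberately leave the model family open.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def f(C):
    # Stand-in for measuring the real system on a batch of configurations.
    o1, o3, o7 = C[:, 0], C[:, 2], C[:, 6]
    return 1.2 + 3*o1 + 5*o3 + 0.9*o7 + 0.8*o3*o7 + 4*o1*o3*o7

C_train = rng.integers(0, 2, size=(100, 20))  # sampled configurations c1..cn
y_train = f(C_train)                          # measured responses yi = f(ci)

f_hat = RandomForestRegressor(n_estimators=100, random_state=0).fit(C_train, y_train)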
  • 89. Performance model could be in any appropriate form of black-box models O1 × O2 × ⋯ × O19 × O20 0 × 0 × ⋯ × 0 × 1 0 × 0 × ⋯ × 1 × 0 0 × 0 × ⋯ × 1 × 1 1 × 1 × ⋯ × 1 × 0 1 × 1 × ⋯ × 1 × 1 ⋯ c1 c2 c3 cn y1 = f(c1) y2 = f(c2) y3 = f(c3) yn = f(cn) ̂f ∼ f( ⋅ ) ⋯ Training/Sample Set
  • 90. Performance model could be in any appropriate form of black-box models O1 × O2 × ⋯ × O19 × O20 0 × 0 × ⋯ × 0 × 1 0 × 0 × ⋯ × 1 × 0 0 × 0 × ⋯ × 1 × 1 1 × 1 × ⋯ × 1 × 0 1 × 1 × ⋯ × 1 × 1 ⋯ c1 c2 c3 cn y1 = f(c1) y2 = f(c2) y3 = f(c3) yn = f(cn) ̂f ∼ f( ⋅ ) ⋯ Training/Sample Set
  • 91. Performance model could be in any appropriate form of black-box models O1 × O2 × ⋯ × O19 × O20 0 × 0 × ⋯ × 0 × 1 0 × 0 × ⋯ × 1 × 0 0 × 0 × ⋯ × 1 × 1 1 × 1 × ⋯ × 1 × 0 1 × 1 × ⋯ × 1 × 1 ⋯ c1 c2 c3 cn y1 = f(c1) y2 = f(c2) y3 = f(c3) yn = f(cn) ̂f ∼ f( ⋅ ) ⋯ Training/Sample Set
  • 92. Performance model could be in any appropriate form of black-box models O1 × O2 × ⋯ × O19 × O20 0 × 0 × ⋯ × 0 × 1 0 × 0 × ⋯ × 1 × 0 0 × 0 × ⋯ × 1 × 1 1 × 1 × ⋯ × 1 × 0 1 × 1 × ⋯ × 1 × 1 ⋯ c1 c2 c3 cn y1 = f(c1) y2 = f(c2) y3 = f(c3) yn = f(cn) ̂f ∼ f( ⋅ ) ⋯ Training/Sample Set
  • 93. Evaluating a performance model O1 × O2 × ⋯ × O19 × O20 0 × 0 × ⋯ × 0 × 1 0 × 0 × ⋯ × 1 × 0 0 × 0 × ⋯ × 1 × 1 1 × 1 × ⋯ × 1 × 0 1 × 1 × ⋯ × 1 × 1 ⋯ c1 c2 c3 cn y1 = f(c1) y2 = f(c2) y3 = f(c3) yn = f(cn) f̂ ∼ f( ⋅ ) ⋯ Learn Training/Sample Set
  • 94. Evaluating a performance model O1 × O2 × ⋯ × O19 × O20 0 × 0 × ⋯ × 0 × 1 0 × 0 × ⋯ × 1 × 0 0 × 0 × ⋯ × 1 × 1 1 × 1 × ⋯ × 1 × 0 1 × 1 × ⋯ × 1 × 1 ⋯ c1 c2 c3 cn y1 = f(c1) y2 = f(c2) y3 = f(c3) yn = f(cn) f̂ ∼ f( ⋅ ) ⋯ Learn Training/Sample Set
  • 95. Evaluating a performance model O1 × O2 × ⋯ × O19 × O20 0 × 0 × ⋯ × 0 × 1 0 × 0 × ⋯ × 1 × 0 0 × 0 × ⋯ × 1 × 1 1 × 1 × ⋯ × 1 × 0 1 × 1 × ⋯ × 1 × 1 ⋯ c1 c2 c3 cn y1 = f(c1) y2 = f(c2) y3 = f(c3) yn = f(cn) f̂ ∼ f( ⋅ ) ⋯ Learn Training/Sample Set Evaluate Accuracy APE(f̂, f) = |f̂(c) − f(c)| / f(c) × 100
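Continuing the sketch above, the slide's accuracy metric can be computed on held-out configurations exactly as written: APE(f̂, f) = |f̂(c) − f(c)| / f(c) × 100.

C_test = rng.integers(0, 2, size=(50, 20))  # held-out configurations
ape = np.abs(f_hat.predict(C_test) - f(C_test)) / f(C_test) * 100
print(f"mean APE: {ape.mean():.1f}%")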
  • 96. A performance model contains useful information about influential options and interactions f( ⋅ ) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7 f : ℂ → ℝ
  • 97. A performance model contains useful information about influential options and interactions f( ⋅ ) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7 f : ℂ → ℝ
  • 98. A performance model contains useful information about influential options and interactions f( ⋅ ) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7 f : ℂ → ℝ
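One way to surface influential options and interactions like those on slide 96 is to fit a sparse linear model over products of options, continuing the sketch above; the interaction degree, regularization strength, and the 0.1 reporting threshold are arbitrary choices, not the talk's method.

from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=3, interaction_only=True, include_bias=False)
X = poly.fit_transform(C_train)      # features: o_i, o_i*o_j, o_i*o_j*o_k
lin = Lasso(alpha=0.01).fit(X, y_train)
for name, w in zip(poly.get_feature_names_out(), lin.coef_):
    if abs(w) > 0.1:
        print(name, round(w, 2))     # the influential options/interactions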
  • 99. Performance model can then be used to reason about qualities [same Parrot_setenv code as slide 25] f(·) = 5 + 3 × o1 Execution time (s)
  • 100. Performance model can then be used to reason about qualities [same code as slide 25] f(·) = 5 + 3 × o1 Execution time (s) f(o1 := 1) = 8
  • 101. Performance model can then be used to reason about qualities [same code as slide 25] f(·) = 5 + 3 × o1 Execution time (s) f(o1 := 0) = 5 f(o1 := 1) = 8
  • 102. Insight: Performance measurements of the real system are “similar” to the ones from the simulators Measure Configurations Performance
  • 103. Insight: Performance measurements of the real system are “similar” to the ones from the simulators Measure Configurations Performance
  • 104. Insight: Performance measurements of the real system are “similar” to the ones from the simulators Measure Simulator (Gazebo) Data Configurations Performance So why not reuse these data, instead of measuring on the real robot?
  • 105. We developed methods to make learning cheaper via transfer learning Source (Given) Model/Data → Transferable Knowledge (Extract, Reuse) → Target (Learn) [slide shows excerpts from the SEAMS’17 paper’s “Intuition” and “Preliminary concepts” sections: definitions of the configuration and environment space, the performance model f : F × E → ℝ, and the performance distribution pd : E → Δ(ℝ)] Goal: Gain strength by transferring information across environments
  • 106. What is transfer learning?
  • 107. What is transfer learning? Transfer learning is a machine learning technique where knowledge gained while training on one type of problem is used to train on another, similar type of problem
  • 108. What is the advantage of transfer learning? • During learning you may need thousands of rotten and fresh potatoes and hours of training to learn. • But now, using the same knowledge of rotten features, you can identify rotten tomatoes with fewer samples and less training time. • You may have learned during daytime with enough light and exposure, but your present tomato-identification job is at night. • You may have learned sitting very close, just beside the box of potatoes, but now for tomato identification you are on the other side of the glass.
  • 109. TurtleBot [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Our transfer learning solution
  • 110. TurtleBot Simulator (Gazebo) [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Our transfer learning solution
  • 111. Data Measure TurtleBot Simulator (Gazebo) [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Our transfer learning solution
  • 112. Data Data Data Measure Measure Reuse TurtleBot Simulator (Gazebo) [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Configurations Our transfer learning solution
  • 113. Data Data Data Measure Measure Reuse Learn TurtleBot Simulator (Gazebo) [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Configurations Our transfer learning solution f(o1, o2) = 5 + 3o1 + 15o2 − 7o1 × o2
  • 114. Gaussian processes for performance modeling t = n [plot: output, f(x) vs. input, x]
  • 115. Gaussian processes for performance modeling t = n Observation [plot: output, f(x) vs. input, x]
  • 116. Gaussian processes for performance modeling t = n Observation Mean [plot: output, f(x) vs. input, x]
  • 117. Gaussian processes for performance modeling t = n Observation Mean Uncertainty [plot: output, f(x) vs. input, x]
  • 118. Gaussian processes for performance modeling t = n Observation Mean Uncertainty New observation [plot: output, f(x) vs. input, x]
  • 119. Gaussian processes for performance modeling t = n t = n + 1 Observation Mean Uncertainty New observation [plots: output, f(x) vs. input, x]
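A minimal sketch of the GP modeling pictured on slides 114–119, reusing the synthetic data from the earlier sketch: fit a Gaussian process to a handful of (configuration, performance) observations and query both a mean prediction and an uncertainty estimate for unseen configurations. The RBF-plus-noise kernel is one plausible choice, not necessarily the talk's.

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X_obs = C_train[:10].astype(float)  # a few measured configurations
y_obs = y_train[:10]
gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0) + WhiteKernel(1e-3),
                              normalize_y=True).fit(X_obs, y_obs)
mean, std = gp.predict(C_test.astype(float), return_std=True)  # prediction +/- uncertainty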
  • 120. Gaussian Processes enable reasoning about performance Step 1: Fit GP to the data seen so far Step 2: Explore the model for regions of most variance Step 3: Sample that region Step 4: Repeat [plots: Configuration Space, Empirical Model, Experiment, Selection Criteria, Sequential Design]
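The four steps of slide 120 as a loop, continuing the sketch above: fit the GP to everything seen so far, pick the candidate with the largest predictive variance, measure it, and repeat. The candidate pool and iteration budget are arbitrary here.

candidates = rng.integers(0, 2, size=(500, 20)).astype(float)
X_seq, y_seq = X_obs.copy(), y_obs.copy()
for _ in range(10):                              # Step 4: repeat
    gp.fit(X_seq, y_seq)                         # Step 1: fit GP to data seen so far
    _, std = gp.predict(candidates, return_std=True)
    nxt = candidates[std.argmax()]               # Step 2: region of most variance
    X_seq = np.vstack([X_seq, nxt])              # Step 3: sample (measure) that region
    y_seq = np.append(y_seq, f(nxt.reshape(1, -1)))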
  • 121. The intuition behind our transfer learning approach Intuition: Observations on the source(s) can affect predictions on the target Example: Learning the chess game makes learning the Go game a lot easier! Configurations Performance Models [paper excerpt:] To capture the relationship between the source and target functions, g, f, using the observations Ds, Dt to build the predictive model f̂, we define the following kernel: k(f, g, x, x′) = kt(f, g) × kxx(x, x′), where the kernel kt represents the correlation between the source and target function, while kxx is the covariance over inputs. Typically, kxx is parameterized and its parameters are learnt by maximizing the marginal likelihood of the model given the observations from source and target.
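A hand-rolled sketch of that multi-task kernel, k(f, g, x, x′) = kt(f, g) × kxx(x, x′): tag every sample with a task id (0 = source, 1 = target) and multiply a 2×2 task-correlation matrix with an RBF covariance over configurations. The off-diagonal value of Kt below (0.9) is an assumed degree of relatedness; in practice it is learnt by maximizing the marginal likelihood, as the excerpt says.

import numpy as np

def k_xx(X1, X2, length_scale=2.0):
    # Covariance over configuration inputs (squared-exponential).
    sqdist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-sqdist / (2 * length_scale**2))

def k_multitask(X1, t1, X2, t2, K_t):
    # k(f, g, x, x') = k_t(f, g) * k_xx(x, x'), with t1, t2 arrays of task ids.
    return K_t[np.ix_(t1, t2)] * k_xx(X1, X2)

K_t = np.array([[1.0, 0.9],   # task 0 = source, task 1 = target
                [0.9, 1.0]])  # 0.9 = assumed source-target correlation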
  • 124. CoBot experiment: DARPA BRASS [plot: CPU utilization [%] vs. localization error [m]] Pareto front no_of_particles=x no_of_refinement=y
  • 125. CoBot experiment: DARPA BRASS [same plot] Energy constraint Pareto front no_of_particles=x no_of_refinement=y
  • 126. CoBot experiment: DARPA BRASS [same plot] Energy constraint Safety constraint Pareto front no_of_particles=x no_of_refinement=y
  • 127. CoBot experiment: DARPA BRASS [same plot] Energy constraint Safety constraint Pareto front Sweet Spot no_of_particles=x no_of_refinement=y
  • 129. CoBot experiment [heatmap of CPU [%] over number of particles × number of refinements: Source (given)]
  • 130. CoBot experiment [heatmaps of CPU [%]: Source (given); Target (ground truth 6 months)]
  • 131. CoBot experiment [heatmaps of CPU [%]: Source (given); Target (ground truth 6 months); Prediction with 4 samples]
  • 132. CoBot experiment [heatmaps of CPU [%]: Source (given); Target (ground truth 6 months); Prediction with 4 samples; Prediction with Transfer learning]
  • 133. Transfer Learning for Improving Model Predictions in Highly Configurable Software Pooyan Jamshidi, Miguel Velez, Christian Kästner Carnegie Mellon University, USA {pjamshid,mvelezce,kaestner}@cs.cmu.edu Norbert Siegmund Bauhaus-University Weimar, Germany norbert.siegmund@uni-weimar.de Prasad Kawthekar Stanford University, USA pkawthek@stanford.edu Abstract—Modern software systems are built to be used in dynamic environments using configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: Instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at … order to identify the best performing configuration for a robot [Fig. 1: Transfer learning for performance model learning] Details: [SEAMS ’17]
  • 135. Looking further: When transfer learning goes wrong. Insight: Predictions become more accurate when the source is more related to the target. [plot: absolute percentage error [%] per source, against the non-transfer-learning baseline]
       Sources:      s      s1     s2     s3     s4     s5     s6
       noise-level:  0      5      10     15     20     25     30
       corr. coeff.: 0.98   0.95   0.89   0.75   0.54   0.34   0.19
       µ(pe):        15.34  14.14  17.09  18.71  33.06  40.93  46.75
  • 136. Looking further: When transfer learning goes wrong. [same plot and table, annotated: It worked!] Insight: Predictions become more accurate when the source is more related to the target. Non-transfer-learning
  • 137. Looking further: When transfer learning goes wrong. [same plot and table, annotated: It worked! It didn’t!] Insight: Predictions become more accurate when the source is more related to the target. Non-transfer-learning
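The table on slides 135–137 reads as an operational rule: before transferring, check how correlated the source's responses are with a small sample of the target's. A tiny sketch of that check; the 0.75 threshold is illustrative, taken loosely from where the table's error starts to climb.

import numpy as np

def relatedness(y_source, y_target):
    # Pearson correlation between source and target responses on the same configs.
    return np.corrcoef(y_source, y_target)[0, 1]

# e.g., only reuse a source whose responses look related enough:
# if relatedness(y_src, y_tgt_sample) > 0.75: transfer; else: learn from scratch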
  • 138. [Six heatmaps (a)–(f) of CPU usage [%] over number of particles × number of refinements]
  • 139. [Same six heatmaps, annotated: It worked! It worked! It worked! It didn’t! It didn’t! It didn’t!]
  • 140. Key question: Can we develop a theory to explain when transfer learning works? Target (Learn)Source (Given) DataModel Transferable Knowledge II. INTUITION rstanding the performance behavior of configurable e systems can enable (i) performance debugging, (ii) mance tuning, (iii) design-time evolution, or (iv) runtime on [11]. We lack empirical understanding of how the mance behavior of a system will vary when the environ- the system changes. Such empirical understanding will important insights to develop faster and more accurate g techniques that allow us to make predictions and ations of performance for highly configurable systems ging environments [10]. For instance, we can learn mance behavior of a system on a cheap hardware in a ed lab environment and use that to understand the per- ce behavior of the system on a production server before g to the end user. More specifically, we would like to what the relationship is between the performance of a in a specific environment (characterized by software ration, hardware, workload, and system version) to the t we vary its environmental conditions. is research, we aim for an empirical understanding of mance behavior to improve learning via an informed g process. In other words, we at learning a perfor- model in a changed environment based on a well-suited g set that has been determined by the knowledge we in other environments. Therefore, the main research A. Preliminary concepts In this section, we provide formal definitions of four con- cepts that we use throughout this study. The formal notations enable us to concisely convey concept throughout the paper. 1) Configuration and environment space: Let Fi indicate the i-th feature of a configurable system A which is either enabled or disabled and one of them holds by default. The configuration space is mathematically a Cartesian product of all the features C = Dom(F1) ⇥ · · · ⇥ Dom(Fd), where Dom(Fi) = {0, 1}. A configuration of a system is then a member of the configuration space (feature space) where all the parameters are assigned to a specific value in their range (i.e., complete instantiations of the system’s parameters). We also describe an environment instance by 3 variables e = [w, h, v] drawn from a given environment space E = W ⇥H ⇥V , where they respectively represent sets of possible values for workload, hardware and system version. 2) Performance model: Given a software system A with configuration space F and environmental instances E, a per- formance model is a black-box function f : F ⇥ E ! R given some observations of the system performance for each combination of system’s features x 2 F in an environment e 2 E. To construct a performance model for a system A with configuration space F, we run A in environment instance e 2 E on various combinations of configurations xi 2 F, and record the resulting performance values yi = f(xi) + ✏i, xi 2 ON behavior of configurable erformance debugging, (ii) e evolution, or (iv) runtime understanding of how the will vary when the environ- mpirical understanding will op faster and more accurate to make predictions and ighly configurable systems or instance, we can learn on a cheap hardware in a that to understand the per- a production server before cifically, we would like to ween the performance of a (characterized by software and system version) to the conditions. empirical understanding of learning via an informed we at learning a perfor- ment based on a well-suited ned by the knowledge we erefore, the main research A. 
Preliminary concepts In this section, we provide formal definitions of four con- cepts that we use throughout this study. The formal notations enable us to concisely convey concept throughout the paper. 1) Configuration and environment space: Let Fi indicate the i-th feature of a configurable system A which is either enabled or disabled and one of them holds by default. The configuration space is mathematically a Cartesian product of all the features C = Dom(F1) ⇥ · · · ⇥ Dom(Fd), where Dom(Fi) = {0, 1}. A configuration of a system is then a member of the configuration space (feature space) where all the parameters are assigned to a specific value in their range (i.e., complete instantiations of the system’s parameters). We also describe an environment instance by 3 variables e = [w, h, v] drawn from a given environment space E = W ⇥H ⇥V , where they respectively represent sets of possible values for workload, hardware and system version. 2) Performance model: Given a software system A with configuration space F and environmental instances E, a per- formance model is a black-box function f : F ⇥ E ! R given some observations of the system performance for each combination of system’s features x 2 F in an environment e 2 E. To construct a performance model for a system A with configuration space F, we run A in environment instance e 2 E on various combinations of configurations xi 2 F, and record the resulting performance values yi = f(xi) + ✏i, xi 2 oad, hardware and system version. e model: Given a software system A with ce F and environmental instances E, a per- is a black-box function f : F ⇥ E ! R rvations of the system performance for each ystem’s features x 2 F in an environment ruct a performance model for a system A n space F, we run A in environment instance combinations of configurations xi 2 F, and ng performance values yi = f(xi) + ✏i, xi 2 (0, i). The training data for our regression mply Dtr = {(xi, yi)}n i=1. In other words, a is simply a mapping from the input space to ormance metric that produces interval-scaled ume it produces real numbers). e distribution: For the performance model, associated the performance response to each w let introduce another concept where we ment and we measure the performance. An mance distribution is a stochastic process, that defines a probability distribution over sures for each environmental conditions. To ormance distribution for a system A with ce F, similarly to the process of deriving models, we run A on various combinations 2 F, for a specific environment instance values for workload, hardware and system version. 2) Performance model: Given a software system A with configuration space F and environmental instances E, a per- formance model is a black-box function f : F ⇥ E ! R given some observations of the system performance for each combination of system’s features x 2 F in an environment e 2 E. To construct a performance model for a system A with configuration space F, we run A in environment instance e 2 E on various combinations of configurations xi 2 F, and record the resulting performance values yi = f(xi) + ✏i, xi 2 F where ✏i ⇠ N (0, i). The training data for our regression models is then simply Dtr = {(xi, yi)}n i=1. In other words, a response function is simply a mapping from the input space to a measurable performance metric that produces interval-scaled data (here we assume it produces real numbers). 
3) Performance distribution: For the performance model, we measured and associated the performance response to each configuration, now let introduce another concept where we vary the environment and we measure the performance. An empirical performance distribution is a stochastic process, pd : E ! (R), that defines a probability distribution over performance measures for each environmental conditions. To construct a performance distribution for a system A with configuration space F, similarly to the process of deriving the performance models, we run A on various combinations configurations xi 2 F, for a specific environment instance Extract Reuse Learn Learn
  • 141. Key question: Can we develop a theory to explain when transfer learning works? Target (Learn)Source (Given) DataModel Transferable Knowledge II. INTUITION rstanding the performance behavior of configurable e systems can enable (i) performance debugging, (ii) mance tuning, (iii) design-time evolution, or (iv) runtime on [11]. We lack empirical understanding of how the mance behavior of a system will vary when the environ- the system changes. Such empirical understanding will important insights to develop faster and more accurate g techniques that allow us to make predictions and ations of performance for highly configurable systems ging environments [10]. For instance, we can learn mance behavior of a system on a cheap hardware in a ed lab environment and use that to understand the per- ce behavior of the system on a production server before g to the end user. More specifically, we would like to what the relationship is between the performance of a in a specific environment (characterized by software ration, hardware, workload, and system version) to the t we vary its environmental conditions. is research, we aim for an empirical understanding of mance behavior to improve learning via an informed g process. In other words, we at learning a perfor- model in a changed environment based on a well-suited g set that has been determined by the knowledge we in other environments. Therefore, the main research A. Preliminary concepts In this section, we provide formal definitions of four con- cepts that we use throughout this study. The formal notations enable us to concisely convey concept throughout the paper. 1) Configuration and environment space: Let Fi indicate the i-th feature of a configurable system A which is either enabled or disabled and one of them holds by default. The configuration space is mathematically a Cartesian product of all the features C = Dom(F1) ⇥ · · · ⇥ Dom(Fd), where Dom(Fi) = {0, 1}. A configuration of a system is then a member of the configuration space (feature space) where all the parameters are assigned to a specific value in their range (i.e., complete instantiations of the system’s parameters). We also describe an environment instance by 3 variables e = [w, h, v] drawn from a given environment space E = W ⇥H ⇥V , where they respectively represent sets of possible values for workload, hardware and system version. 2) Performance model: Given a software system A with configuration space F and environmental instances E, a per- formance model is a black-box function f : F ⇥ E ! R given some observations of the system performance for each combination of system’s features x 2 F in an environment e 2 E. To construct a performance model for a system A with configuration space F, we run A in environment instance e 2 E on various combinations of configurations xi 2 F, and record the resulting performance values yi = f(xi) + ✏i, xi 2 ON behavior of configurable erformance debugging, (ii) e evolution, or (iv) runtime understanding of how the will vary when the environ- mpirical understanding will op faster and more accurate to make predictions and ighly configurable systems or instance, we can learn on a cheap hardware in a that to understand the per- a production server before cifically, we would like to ween the performance of a (characterized by software and system version) to the conditions. empirical understanding of learning via an informed we at learning a perfor- ment based on a well-suited ned by the knowledge we erefore, the main research A. 
Preliminary concepts In this section, we provide formal definitions of four con- cepts that we use throughout this study. The formal notations enable us to concisely convey concept throughout the paper. 1) Configuration and environment space: Let Fi indicate the i-th feature of a configurable system A which is either enabled or disabled and one of them holds by default. The configuration space is mathematically a Cartesian product of all the features C = Dom(F1) ⇥ · · · ⇥ Dom(Fd), where Dom(Fi) = {0, 1}. A configuration of a system is then a member of the configuration space (feature space) where all the parameters are assigned to a specific value in their range (i.e., complete instantiations of the system’s parameters). We also describe an environment instance by 3 variables e = [w, h, v] drawn from a given environment space E = W ⇥H ⇥V , where they respectively represent sets of possible values for workload, hardware and system version. 2) Performance model: Given a software system A with configuration space F and environmental instances E, a per- formance model is a black-box function f : F ⇥ E ! R given some observations of the system performance for each combination of system’s features x 2 F in an environment e 2 E. To construct a performance model for a system A with configuration space F, we run A in environment instance e 2 E on various combinations of configurations xi 2 F, and record the resulting performance values yi = f(xi) + ✏i, xi 2 oad, hardware and system version. e model: Given a software system A with ce F and environmental instances E, a per- is a black-box function f : F ⇥ E ! R rvations of the system performance for each ystem’s features x 2 F in an environment ruct a performance model for a system A n space F, we run A in environment instance combinations of configurations xi 2 F, and ng performance values yi = f(xi) + ✏i, xi 2 (0, i). The training data for our regression mply Dtr = {(xi, yi)}n i=1. In other words, a is simply a mapping from the input space to ormance metric that produces interval-scaled ume it produces real numbers). e distribution: For the performance model, associated the performance response to each w let introduce another concept where we ment and we measure the performance. An mance distribution is a stochastic process, that defines a probability distribution over sures for each environmental conditions. To ormance distribution for a system A with ce F, similarly to the process of deriving models, we run A on various combinations 2 F, for a specific environment instance values for workload, hardware and system version. 2) Performance model: Given a software system A with configuration space F and environmental instances E, a per- formance model is a black-box function f : F ⇥ E ! R given some observations of the system performance for each combination of system’s features x 2 F in an environment e 2 E. To construct a performance model for a system A with configuration space F, we run A in environment instance e 2 E on various combinations of configurations xi 2 F, and record the resulting performance values yi = f(xi) + ✏i, xi 2 F where ✏i ⇠ N (0, i). The training data for our regression models is then simply Dtr = {(xi, yi)}n i=1. In other words, a response function is simply a mapping from the input space to a measurable performance metric that produces interval-scaled data (here we assume it produces real numbers). 
3) Performance distribution: For the performance model, we measured and associated the performance response to each configuration, now let introduce another concept where we vary the environment and we measure the performance. An empirical performance distribution is a stochastic process, pd : E ! (R), that defines a probability distribution over performance measures for each environmental conditions. To construct a performance distribution for a system A with configuration space F, similarly to the process of deriving the performance models, we run A on various combinations configurations xi 2 F, for a specific environment instance Extract Reuse Learn Learn Q1: How source and target are “related”?
  • 142. Key question: Can we develop a theory to explain when transfer learning works? Schematic: Source (Given) —Extract→ Transferable Knowledge —Reuse→ Target (Learn); in each environment, a Model is Learned from Data. Q1: How are source and target "related"? Q2: What characteristics are preserved?
(The slide embeds Sec. II "Intuition" and "A. Preliminary concepts" of the ASE'17 paper; the definitions, with notation restored:)
1) Configuration and environment space: Let Fi denote the i-th feature of a configurable system A, which is either enabled or disabled, with one of the two holding by default. The configuration space is the Cartesian product of all features, C = Dom(F1) × ⋯ × Dom(Fd), where Dom(Fi) = {0, 1}. A configuration is a member of the configuration space (feature space) in which all parameters are assigned a specific value in their range (i.e., a complete instantiation of the system's parameters). An environment instance is described by three variables e = [w, h, v] drawn from an environment space E = W × H × V, whose components are the sets of possible workloads, hardware platforms, and system versions.
2) Performance model: Given a software system A with configuration space F and environment instances E, a performance model is a black-box function f : F × E → ℝ, given some observations of the system's performance for each combination of features x ∈ F in an environment e ∈ E. To construct a performance model, we run A in an environment instance e ∈ E on various configurations xi ∈ F and record the resulting performance values yi = f(xi) + εi, where εi ∼ 𝒩(0, σi). The training data for the regression models is then simply Dtr = {(xi, yi)}, i = 1, …, n. In other words, a response function is simply a mapping from the input space to a measurable performance metric that produces interval-scaled data (here assumed to be real numbers).
3) Performance distribution: For the performance model we measured the performance response associated with each configuration; now we instead vary the environment and measure performance. An empirical performance distribution is a stochastic process pd : E → Δ(ℝ) that defines a probability distribution over performance measures for each environmental condition. To construct it for a system A with configuration space F, we run A on various configurations xi ∈ F for a specific environment instance, analogously to deriving a performance model.
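To make definition 2) concrete, the following is a minimal sketch of constructing the training set Dtr = {(xi, yi)} for one environment and fitting a black-box performance model. Everything here is illustrative: measure() is a hypothetical stand-in for an actual benchmark run, and CART (a regression tree) is just one plausible learner used in this line of work.

```python
# Minimal sketch: build D_tr = {(x_i, y_i)} for one environment and fit a
# black-box performance model f_hat. `measure()` is a hypothetical stand-in
# for running the real system in environment e.
import itertools
import random

from sklearn.tree import DecisionTreeRegressor  # CART-style learner

d = 10  # number of binary options O1..O10 (illustrative)

def measure(config):
    """Hypothetical benchmark run: performance of `config` in environment e."""
    # Noisy synthetic response; replace with a real system execution.
    return 1.2 + 3 * config[0] + 5 * config[2] + random.gauss(0, 0.1)

space = list(itertools.product([0, 1], repeat=d))  # the 2^d configuration space
sampled = random.sample(space, k=50)               # configurations x_i
X = [list(c) for c in sampled]
y = [measure(c) for c in sampled]                  # y_i = f(x_i) + eps_i

f_hat = DecisionTreeRegressor().fit(X, y)
print(f_hat.predict([[1, 0, 1, 0, 0, 0, 1, 0, 0, 0]]))
```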
  • 143. Key question: Can we develop a theory to explain when transfer learning works? (Same schematic and embedded definitions as the previous slide.) Q1: How are source and target "related"? Q2: What characteristics are preserved? Q3: What are the actionable insights?
  • 144. We hypothesized that we can exploit similarities across environments to learn "cheaper" performance models. Source environment (execution time of Program X): configurations c1, c2, c3, …, cn over O1 × O2 × ⋯ × O19 × O20 (c1 = 0 × 0 × ⋯ × 0 × 1, c2 = 0 × 0 × ⋯ × 1 × 0, c3 = 0 × 0 × ⋯ × 1 × 1, …, cn = 1 × 1 × ⋯ × 1 × 1) with measurements ys1 = fs(c1), ys2 = fs(c2), ys3 = fs(c3), …, ysn = fs(cn). Target environment (execution time of Program Y): the same configurations with measurements yt1 = ft(c1), yt2 = ft(c2), yt3 = ft(c3), …, ytn = ft(cn). Similarity links the two. [P. Jamshidi, et al., "Transfer learning for performance modeling of configurable systems….", ASE'17]
  • 145. Our empirical study: We looked at different highly-configurable systems to gain insights. SPEAR (SAT solver): analysis time; 14 options; 16,384 configurations; SAT problems; 3 hardware; 2 versions. x264 (video encoder): encoding time; 16 options; 4,000 configurations; video quality/size; 2 hardware; 3 versions. SQLite (DB engine): query time; 14 options; 1,000 configurations; DB queries; 2 hardware; 2 versions. SaC (compiler): execution time; 50 options; 71,267 configurations; 10 demo programs. [P. Jamshidi, et al., "Transfer learning for performance modeling of configurable systems….", ASE'17]
  • 146. Linear shift happens only in limited environmental changes. Software / environmental change / severity / correlation: SPEAR — NUC/2 → NUC/4, small, 1.00; Amazon_nano → NUC, large, 0.59; hardware+workload+version, very large, -0.10. x264 — version, large, 0.06; workload, medium, 0.65. SQLite — write-seq → write-batch, small, 0.96; read-rand → read-seq, medium, 0.50. (Scatter plot: target vs. source throughput.) Implication: Simple transfer learning is limited to hardware changes in practice.
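A minimal sketch of the check behind this table, under the assumption that "linear shift" means ft ≈ a·fs + b: measure the same configurations in source and target and compute the Pearson correlation of the two response vectors. The numbers below are placeholders, not measurements from the study.

```python
# Sketch: quantify linear shift between environments via Pearson correlation
# of responses measured on the *same* configurations c_1..c_n.
import numpy as np

ys = np.array([12.1, 8.4, 15.0, 9.7, 11.2])    # source responses f_s(c_i) (placeholders)
yt = np.array([24.0, 16.9, 30.2, 19.1, 22.6])  # target responses f_t(c_i) (placeholders)

corr = np.corrcoef(ys, yt)[0, 1]
print(f"correlation = {corr:.2f}")  # near 1.0 suggests f_t ~= a * f_s + b
```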
  • 150. Influential options and interactions are preserved across environments. Software / environmental change / severity / Dim / t-test: x264 (16 dims) — version, large, 12, 10; hardware+workload+version, very large, 8, 9. SQLite (14 dims) — write-seq → write-batch, very large, 3, 4; read-rand → read-seq, medium, 1, 1. SaC (50 dims) — workload, very large, 16, 10. Implication: Avoid wasting budget on non-informative parts of the configuration space, and focus where it matters.
  • 153. Influential options and interactions are preserved across environments (table as on the previous slide). We only need to explore part of the space: 2^16 / 2^50 ≈ 0.000000000058.
  • 154. Transfer learning across environments. Configurations c1, c2, c3, …, cn over O1 × O2 × ⋯ × O19 × O20 (c1 = 0 × 0 × ⋯ × 0 × 1, c2 = 0 × 0 × ⋯ × 1 × 0, c3 = 0 × 0 × ⋯ × 1 × 1, …, cn = 1 × 1 × ⋯ × 1 × 1) with measurements ys1 = fs(c1), ys2 = fs(c2), ys3 = fs(c3), …, ysn = fs(cn); from these we estimate f̂s ∼ fs(⋅).
  • 156. Transfer learning across environments: the configurations and measurements above are drawn from the source environment (execution time of Program X).
  • 157. Transfer learning across environments: from the source measurements we learn a performance model f̂s ∼ fs(⋅).
  • 158. Observation 1: Not all options and interactions are influential, and the degree of interaction between options is not high. f̂s(⋅) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7, over ℂ = O1 × O2 × O3 × O4 × O5 × O6 × O7 × O8 × O9 × O10.
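A minimal sketch of how such a sparse model could be recovered from measurements: fit a Lasso over all option terms and interactions up to degree 3 and keep the surviving coefficients. The data is synthesized from the slide's example response function, not taken from the study, and the regularization strength is an arbitrary choice.

```python
# Sketch: recover a sparse performance model with interaction terms via Lasso.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 10))  # 300 random configs over o1..o10
o1, o3, o7 = X[:, 0], X[:, 2], X[:, 6]
y = 1.2 + 3*o1 + 5*o3 + 0.9*o7 + 0.8*o3*o7 + 4*o1*o3*o7 + rng.normal(0, 0.05, 300)

poly = PolynomialFeatures(degree=3, interaction_only=True, include_bias=False)
Xp = poly.fit_transform(X)
fit = Lasso(alpha=0.01).fit(Xp, y)

# Feature x_i corresponds to option o_{i+1}; only a handful of terms survive.
for name, coef in zip(poly.get_feature_names_out(), fit.coef_):
    if abs(coef) > 0.1:
        print(name, round(coef, 2))
```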
  • 159. Observation 2: Influential options and interactions are preserved across environments. f̂s(⋅) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7; f̂t(⋅) = 10.4 − 2.1o1 + 1.2o3 + 2.2o7 + 0.1o1o3 − 2.1o3o7 + 14o1o3o7.
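A minimal sketch of what "preserved" could mean operationally: compare the sets of influential terms of f̂s and f̂t (coefficients taken from the slide's example models) and measure their overlap. The importance threshold is an arbitrary assumption.

```python
# Sketch: check whether the influential terms of f_hat_s and f_hat_t overlap.
fs = {"o1": 3.0, "o3": 5.0, "o7": 0.9, "o3*o7": 0.8, "o1*o3*o7": 4.0}
ft = {"o1": -2.1, "o3": 1.2, "o7": 2.2, "o1*o3": 0.1, "o3*o7": -2.1, "o1*o3*o7": 14.0}

threshold = 0.5  # ignore near-zero terms such as 0.1 * o1*o3
src = {term for term, coef in fs.items() if abs(coef) >= threshold}
tgt = {term for term, coef in ft.items() if abs(coef) >= threshold}

print("shared influential terms:", src & tgt)
print("Jaccard similarity:", len(src & tgt) / len(src | tgt))
```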
  • 164. Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis. Pooyan Jamshidi (Carnegie Mellon University, USA), Norbert Siegmund (Bauhaus-University Weimar, Germany), Miguel Velez, Christian Kästner, Akshay Patel, Yuvraj Agarwal (Carnegie Mellon University, USA). Abstract—Modern software systems provide many configuration options which significantly influence their non-functional properties. To understand and predict the effect of configuration options, several sampling and learning strategies have been proposed, albeit often with significant cost to cover the highly dimensional configuration space. Recently, transfer learning has been applied to reduce the effort of constructing performance models by transferring knowledge about performance behavior across environments. While this line of research is promising to learn more accurate models at a lower cost, it is unclear why and when transfer learning works for performance modeling. To shed light on when it is beneficial to apply transfer learning, we conducted an empirical study on four popular software systems, varying software configurations and environmental conditions, such as hardware, workload, and software versions, to identify the key knowledge pieces that can be exploited for transfer learning. Our results show that in small environmental changes (e.g., homogeneous workload change), by applying a linear transformation to the performance model, we can understand the performance behavior of the target environment, while for severe environmental changes (e.g., drastic workload change) we can transfer only knowledge that makes sampling more efficient, e.g., by reducing the dimensionality of the configuration space. Index Terms—Performance analysis, transfer learning. Fig. 1: Transfer learning is a form of machine learning that takes advantage of transferable knowledge from source to learn an accurate, reliable, and less costly model for the target environment. Details: [ASE '17]
  • 165. Details: [AAAI Spring Symposium ’19]
  • 167. How should we sample the configuration space to learn performance behavior "better"? How do we select the most informative configurations?
  • 168. The similarity across environments is a rich source of knowledge for exploring the configuration space.
  • 169. When we treat the system as a black box, we typically cannot distinguish between different configurations c1, c2, c3, …, cn over O1 × O2 × ⋯ × O19 × O20. • We therefore end up exploring the configuration space blindly. • That is essentially the key reason why "most" work in this area uses random sampling.
  • 170. Without considering this knowledge, many samples may not provide new information. f̂s(⋅) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7
  • 171. Without considering this knowledge, many samples may not provide new information. Over O1 × O2 × O3 × O4 × O5 × O6 × O7 × O8 × O9 × O10: c1 = 1 × 0 × 1 × 0 × 0 × 0 × 1 × 0 × 0 × 0, c2 = 1 × 0 × 1 × 0 × 0 × 0 × 1 × 0 × 0 × 1, c3 = 1 × 0 × 1 × 0 × 0 × 0 × 1 × 0 × 1 × 0, …, c128 = 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1. f̂s(⋅) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7
  • 172. Without considering this knowledge, many samples may not provide new information. Over O1 × O2 × O3 × O4 × O5 × O6 × O7 × O8 × O9 × O10, the configurations c1 = 1 × 0 × 1 × 0 × 0 × 0 × 1 × 0 × 0 × 0, c2 = 1 × 0 × 1 × 0 × 0 × 0 × 1 × 0 × 0 × 1, c3 = 1 × 0 × 1 × 0 × 0 × 0 × 1 × 0 × 1 × 0, …, c128 = 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 all evaluate to the same value: f̂s(c1) = f̂s(c2) = f̂s(c3) = ⋯ = f̂s(c128) = 14.9, where f̂s(⋅) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7.
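A minimal sketch of this redundancy, using the slide's example model: all 2^7 = 128 configurations that fix the influential options o1 = o3 = o7 = 1 and vary only the seven non-influential ones map to a single predicted value.

```python
# Sketch: blind samples that differ only in non-influential options add no
# information -- the source model assigns them all the same value.
import itertools

def f_s(c):  # the slide's example model over o1..o10
    o1, o3, o7 = c[0], c[2], c[6]
    return 1.2 + 3*o1 + 5*o3 + 0.9*o7 + 0.8*o3*o7 + 4*o1*o3*o7

fixed = {0: 1, 2: 1, 6: 1}                       # o1 = o3 = o7 = 1
free = [i for i in range(10) if i not in fixed]  # the non-influential options

values = set()
for bits in itertools.product([0, 1], repeat=len(free)):
    c = [0] * 10
    for i, v in fixed.items():
        c[i] = v
    for i, v in zip(free, bits):
        c[i] = v
    values.add(f_s(c))

print(values)  # a single value (~14.9): 128 samples, one piece of information
```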
  • 173. Without this knowledge, many blind/random samples may not provide any additional information about the performance of the system.
  • 174. Evaluation: Learning performance behavior of Machine Learning Systems ML system: https://pooyanjamshidi.github.io/mls
  • 175. Configurations of deep neural networks affect accuracy and energy consumption. (Scatter plot: validation/test error [%] vs. energy consumption [J] for a CNN on the CNAE-9 data set; annotations: 72%, 22X.)
  • 176. Configurations of deep neural networks affect accuracy and energy consumption. (Same scatter plot as above, alongside a cell-based architecture schematic from a neural-architecture-search paper.)
  • 177. Configurations of deep neural networks affect accuracy and energy consumption. (Same scatter plot; the slide also embeds Figure 2 of a neural-architecture-search paper: "Image classification models constructed using the cells optimized with architecture search. Top-left: small model used during architecture search on CIFAR-10. Top-right: large CIFAR-10 model used for learned cell evaluation. Bottom: ImageNet model used for learned cell evaluation.")
  • 178. DNN measurements are costly: each sample costs ~1 h, and 4,000 × 1 h ≈ 6 months.
  • 179. DNN measurements are costly: each sample costs ~1 h, and 4,000 × 1 h ≈ 6 months. Yes, that's the cost we paid for conducting our measurements!
  • 180. L2S enables learning a more accurate model with fewer samples by exploiting the knowledge from the source. (Learning curves: mean absolute percentage error vs. sample size, 3–500, for L2S+GP, L2S+DataReuseTL, DataReuseTL, ModelShift, and Random+CART; panel (a) DNN (hard) — a convolutional neural network.)
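For reference, a minimal sketch of the error metric on these plots — mean absolute percentage error (MAPE) of a learned model on held-out configurations. The numbers are illustrative only.

```python
# Sketch: MAPE, the y-axis of these learning curves.
import numpy as np

def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / np.abs(y_true))

print(mape([10.0, 20.0, 40.0], [12.0, 18.0, 44.0]))  # ~13.3
```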
  • 183. L2S may also help the data-reuse approach to learn faster. (Learning curves: MAPE vs. sample size for the same five approaches; panel (b) XGBoost (hard).)
  • 187. In some environments the similarity across environments may be too low, and this results in "negative transfer". (Learning curves: MAPE vs. sample size for the same five approaches; panel (c) Storm (hard) — Apache Storm.)
  • 190. Why are performance models using L2S samples more accurate?
  • 191. The samples generated by L2S contain more information… "entropy ↔ information gain". (Plot: entropy [bits] vs. sample size, comparing max entropy, L2S, and random sampling; DNN.)
  • 192. The samples generated by L2S contain more information… "entropy ↔ information gain". (Plots: entropy [bits] vs. sample size, comparing max entropy, L2S, and random sampling; panels: DNN, XGBoost, Storm.)
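A minimal sketch of how such an entropy curve could be computed: discretize the observed performance values of a sample set into bins and take the Shannon entropy of the bin frequencies. The bin count and the sample values are illustrative assumptions.

```python
# Sketch: empirical entropy (in bits) of a sample set's performance values.
import numpy as np

def sample_entropy(y, bins=10):
    counts, _ = np.histogram(y, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

y_random = [14.9] * 60 + [18.7] * 10   # mostly redundant observations
y_l2s = np.linspace(5, 30, 70)         # observations spread across the response range
print(sample_entropy(y_random), sample_entropy(y_l2s))  # low vs. near-max entropy
```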
  • 193. Limitations: • Limited number of systems and environmental changes • Synthetic models: https://github.com/pooyanjamshidi/GenPerf • Binary options (non-binary options were mapped to binary) • Negative transfer
  • 196. What will the software systems of the future look like?
  • 197. VISION — Software 2.0: increasingly customized and configurable.
  • 201. VISION — Software 2.0: increasingly customized and configurable, with increasingly competing objectives: accuracy, training speed, inference speed, model size, energy.
  • 202. Deep neural network as a highly configurable system. (The slide embeds a neural-network schematic — input layer, hidden layer, output layer — and the DNN system development stack: deployment, hybrid, OS/hardware, model compiler, and network design, with neural architecture search, hardware optimization, hyper-parameter tuning, and topology; the scope of this project is highlighted on the stack.)
  • 203. We found many configurations with the same accuracy while having drastically different energy demands. (Scatter plot: validation/test error [%] vs. energy consumption [J], CNN on the CNAE-9 data set; annotations: 72%, 22X.)
  • 204. We found many configurations with the same accuracy while having drastically different energy demands. (Same scatter plot, plus a zoomed view — error 8–18% vs. energy 100–400 J — with the Pareto frontier marked; annotations: 22X, 10%, 300 J.)
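A minimal sketch of how this frontier could be extracted from measured (energy, error) pairs: keep a configuration if no other configuration is at least as good on both objectives. The data points are placeholders, not measurements from the study.

```python
# Sketch: Pareto frontier over two objectives to minimize (energy, error).
def pareto_frontier(points):
    frontier = []
    for p in points:
        # p is dominated if some other point is no worse on both objectives
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            frontier.append(p)
    return sorted(frontier)

# (energy [J], validation error [%]) -- placeholder measurements
measurements = [(120, 16.0), (150, 11.0), (250, 12.0), (200, 10.5), (90, 18.0)]
print(pareto_frontier(measurements))  # (250, 12.0) is dominated by (200, 10.5)
```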
  • 206. Thanks
  • 207. Our Group Yan Ren, ML + Systems Shahriar Iqbal, Systems + ML Rui Xin, Neural Arch Search Jianhai Su, AI + Systems + Robotics Ying Meng, AI Testing Yuxiang Sun, Deep RL Thahimum Hassan, AI Fairness Peter Mourfield, ML + Causal Inference Mohammad Ali Javidian, ML + Causal Inference