Continuous Architecting of Stream-Based Systems

An Uncertainty-Aware Approach to Optimal
Configuration of Stream Processing Systems
Pooyan Jamshidi
(joint work with Giuliano Casale)
Imperial College London
p.jamshidi@imperial.ac.uk
University of Bern
1st Nov 2016

Motivation
1- Many different parameters =>
- large state space
- interactions
2- Defaults are typically used =>
- poor performance

Motivation
[Figure: histograms of average read latency (µs) against number of observations for (a) cass-20 (x-axis ×10^4) and (b) cass-10, with the best and worst configurations marked.]
Experiments on Apache Cassandra:
- 6 parameters, 1024 configurations
- Average read latency
- 10 million records (cass-10)
- 20 million records (cass-20)

Motivation (Apache Storm)
[Figure: latency (ms) as a function of the number of counters and the number of splitters; cubic interpolation over a finer grid.]
In our experiments we observed improvement up to 100%

Goal
[Background text from the paper, partly cropped:]

[...] ($X_i$). In general, $X_i$ may indicate either (i) an integer variable such as the level of parallelism, or (ii) a categorical variable such as the messaging framework, or a Boolean variable such as [...]ng timeout. We use the terms parameter and factor interchangeably; with the term option we refer to the possible values that can be assigned to a parameter.

We assume that each configuration $x \in \mathbb{X}$ in the configuration space $\mathbb{X} = Dom(X_1) \times \cdots \times Dom(X_d)$ is valid, i.e., the system accepts this configuration and the corresponding test results in a stable performance behavior. The response with configuration $x$ is denoted by $f(x)$. Throughout, we assume that $f(\cdot)$ is latency; however, other metrics for response may be used. We consider the problem of finding an optimal configuration $x^*$ that globally minimizes $f(\cdot)$ over $\mathbb{X}$:

$$x^* = \arg\min_{x \in \mathbb{X}} f(x) \qquad (1)$$

In fact, the response function $f(\cdot)$ is usually unknown or partially known, i.e., $y_i = f(x_i), x_i \subset \mathbb{X}$. In practice, such measurements may contain noise, i.e., $y_i = f(x_i) + \epsilon$. Determining the optimal configuration is thus a black-box optimization program subject to noise [27, 33], which is harder than deterministic optimization. One solution is based on sampling that starts with some sampled configurations. The performance measured for these initial samples can improve the understanding of $f(\cdot)$ and guide the generation of further samples. If properly guided, the process of generation-evaluation-feedback-regeneration will converge and the optimal configuration will be found.

[...] (i) in DevOps, different versions of a system are delivered continuously; (ii) Big Data systems are developed using similar frameworks (e.g., Apache Hadoop, Spark, Kafka) and run on similar platforms (e.g., cloud clusters); and (iii) different versions of a system often share a similar business logic.

To the best of our knowledge, only one study [9] explores the possibility of transfer learning in system configuration. The authors learn a Bayesian network in the tuning process of a system and reuse this model for tuning other similar systems. However, the learning is limited to the structure of the Bayesian network. In this paper, we introduce a method that reuses not only a model that has been learned previously but also the valuable raw data. Therefore, we are not limited by the accuracy of the learned model. Moreover, we do not consider Bayesian networks and instead focus on MTGPs.

2.4 Motivation
A motivating example. We now illustrate the previous points on an example. WordCount (cf. Figure 1) is a popular benchmark [12]. WordCount features a three-layer architecture that counts the number of words in the incoming stream. A Processing Element (PE) of type Spout reads the [...]

Partially known
Measurements subject to noise
Configuration space
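
The problem statement above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the parameter names, their domains, the latency function `f` and the noise level are invented and do not come from the experiments in these slides. The sketch builds the configuration space as a Cartesian product of parameter domains and picks the configuration with the lowest averaged noisy measurement by exhaustive search.

```python
import itertools
import random

# Hypothetical parameter domains (names and values are illustrative only).
domains = {
    "splitters": [1, 2, 3],
    "counters": [1, 2, 4, 8],
    "ack_enabled": [False, True],
}

# X = Dom(X1) x ... x Dom(Xd): every combination of options.
names = list(domains)
X = [dict(zip(names, values)) for values in itertools.product(*domains.values())]


def f(x):
    """Made-up latency (ms) standing in for the unknown response function."""
    penalty = 10 if x["ack_enabled"] else 0
    return 100 + 20 * abs(x["splitters"] - 2) + 5 * abs(x["counters"] - 4) + penalty


def measure(x, rng, noise_sd=1.0):
    """One noisy measurement y = f(x) + eps."""
    return f(x) + rng.gauss(0.0, noise_sd)


def mean_latency(x, rng, repeats=5):
    """Average several noisy measurements of the same configuration."""
    return sum(measure(x, rng) for _ in range(repeats)) / repeats


rng = random.Random(0)
# Exhaustive search is only feasible because this toy space has 24 points.
x_star = min(X, key=lambda x: mean_latency(x, rng))
print(len(X), x_star)
```

Exhaustive measurement stops being practical as the number of parameters grows, which is the motivation for the model-guided search described in the rest of the talk.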

Non-linear interactions
[Figure: WordCount latency (ms) against number of counters for splitters=2 and splitters=3, alongside the latency surface over number of counters and number of splitters (cubic interpolation over a finer grid).]
Response surface is:
- Non-linear
- Non-convex
- Multi-modal

The measurements are subject to variability
[Figure: latency (ms, log scale) measured for the deployments wc, wc+rs, wc+sol, 2wc and 2wc+rs+sol.]
The scale of measurement variability is different in different deployments (heteroscedastic noise)

[Background text from the paper, partly cropped:]

[...] We consider the problem of finding $x^*$ that minimizes $f$ over $\mathbb{X}$ with as few experiments as possible (Eq. 1). $f(\cdot)$ is usually unknown or partially known, i.e., $y_i = f(x_i), x_i \subset \mathbb{X}$. In practice, such measurements may contain noise, i.e., $y_i = f(x_i) + \epsilon_i$. Because the response is only partially known, finding the optimal configuration is a blackbox optimization problem subject to noise. In fact, the problem of optimizing a non-convex and multi-modal function is NP-hard [36]. Therefore, where a global optimum cannot be located, the aim is the best possible local optimum within the budget.

[Caption fragment for Figure 2:] It shows the non-convexity, multi-modality and the substantial performance difference between different configurations.

Fig. 3: WordCount latency, cut through Figure 2.

[...] demonstrates that if one tries to minimize latency by acting just on one of these parameters at a time, the resulting [...]

BO4CO architecture
[Architecture diagram. Components: Configuration Optimisation Tool (with GP model), performance repository, Monitoring, Deployment Service, Data Preparation, Experimental Suite, Testbed, Doc, Data Broker, Tester, Workload Generator, Kafka, System Under Test, and a Technology Interface for Storm, Cassandra and Spark. Labelled inputs: configuration parameters and values, experiment time, polling interval.]

GP for modeling blackbox response function
[Figure: 1D GP example showing the true function, GP mean, GP variance, observations, the selected point and the true minimum.]

[Background text from the paper, partly cropped:]

[...] are based on tractable linear algebra. In our previous work [21], we proposed BO4CO that exploits single-task GPs (no transfer learning) for prediction of posterior distribution of response functions. A GP model is composed by its prior mean ($\mu(\cdot) : \mathbb{X} \to \mathbb{R}$) and a covariance function ($k(\cdot, \cdot) : \mathbb{X} \times \mathbb{X} \to \mathbb{R}$) [41]:

$$y = f(x) \sim \mathcal{GP}(\mu(x), k(x, x')) \qquad (2)$$

where covariance $k(x, x')$ defines the distance between $x$ and $x'$. Let us assume $S_{1:t} = \{(x_{1:t}, y_{1:t}) \mid y_i := f(x_i)\}$ be the collection of $t$ experimental data (observations). In this framework, we treat $f(x)$ as a random variable, conditioned on observations $S_{1:t}$, which is normally distributed with the following posterior mean and variance functions [41]:

$$\mu_t(x) = \mu(x) + \mathbf{k}(x)^\top (K + \sigma^2 I)^{-1} (y - \mu) \qquad (3)$$

$$\sigma_t^2(x) = k(x, x) + \sigma^2 I - \mathbf{k}(x)^\top (K + \sigma^2 I)^{-1} \mathbf{k}(x) \qquad (4)$$

where $y := y_{1:t}$, $\mathbf{k}(x)^\top = [k(x, x_1) \; k(x, x_2) \; \ldots \; k(x, x_t)]$, $\mu := \mu(x_{1:t})$, $K := k(x_i, x_j)$ and $I$ is identity matrix. The shortcoming of BO4CO is that it cannot exploit the observations regarding other versions of the system and therefore cannot be applied in DevOps.

TL4CO: an extension to multi-tasks
TL4CO uses MTGPs that exploit observations from other previous versions of the system under test. Algorithm 1 defines the internal details of TL4CO. As Figure 4 shows, TL4CO is an iterative algorithm that uses the learning from other system versions. In a high-level overview, TL4CO: (i) selects the most informative past observations (details in Section 3.3); (ii) fits a model to existing data based on kernel learning (details in Section 3.4), and (iii) selects the next [...]

Motivations:
1- mean estimates + variance
2- all computations are linear algebra
3- good estimates even with little data
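
The posterior mean and variance in Eqs. (3) and (4) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the BO4CO implementation: the squared-exponential kernel, its length scale, the constant prior mean and the noise variance are all choices made here for the example. The Cholesky factorisation is used in place of an explicit matrix inverse.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale=0.5, signal_var=1.0):
    """Squared-exponential covariance k(x, x') for 1-D inputs (assumed kernel)."""
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise_var=1e-4, prior_mean=0.0):
    """Posterior mean (Eq. 3) and variance (Eq. 4) of f at the points x_new."""
    K = sq_exp_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))  # K + sigma^2 I
    k_star = sq_exp_kernel(x_obs, x_new)          # column j is k(x) for x_new[j]
    L = np.linalg.cholesky(K)                     # K + sigma^2 I = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs - prior_mean))
    mean = prior_mean + k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    prior_var = sq_exp_kernel(x_new, x_new).diagonal()
    var = prior_var + noise_var - np.sum(v * v, axis=0)
    return mean, var


# Three observations of a made-up response; predict at an observed point,
# a nearby point, and a point far from all observations.
x_obs = np.array([-1.0, 0.0, 1.0])
y_obs = np.sin(3 * x_obs)
x_new = np.array([0.0, 0.5, 3.0])
mean, var = gp_posterior(x_obs, y_obs, x_new)
print(mean, var)
```

The variance is small at the observed point and grows with distance from the observations, which is the "mean estimates + variance" property listed in the motivations.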

Sparsity of Effects
• Correlation-based feature selector
• Merit is used to select subsets that are highly correlated with the response variable
• At most 2-3 parameters were strongly interacting with each other

TABLE I: Sparsity of effects on 5 experiments where we have varied different subsets of parameters and used different testbeds. Note that these are the datasets we experimentally measured on the benchmark systems and we use them for the evaluation; more details, including the results for 6 more experiments, are in the appendix.

# | Topol. | Parameters | Main factors | Merit | Size | Testbed
1 | wc(6D) | 1-spouts, 2-max spout, 3-spout wait, 4-splitters, 5-counters, 6-netty min wait | {1, 2, 5} | 0.787 | 2880 | C1
2 | sol(6D) | 3-top level, 4-netty min wait, 5-message size, 6-bolts | {1, 2, 3} | 0.447 | 2866 | C2
3 | rs(6D) | 3-sorters, 4-emit freq, 5-chunk size, 6-message size | {3} | 0.385 | 3840 | C3
4 | wc(3D) | 1-max spout, 2-splitters, 3-counters | {1, 2} | 0.480 | 756 | C4
5 | wc(5D) | 1-spouts, 2-splitters, 3-counters, 4-buffer-size, 5-heap | {1} | 0.851 | 1080 | C5

Experiments on:
1. C1: OpenNebula (X)
2. C2: Amazon EC2 (Y)
3. C3: OpenNebula (3X)
4. C4: Amazon EC2 (2Y)
5. C5: Microsoft Azure (X)
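
The slide does not state how the merit score is computed. One common definition for a correlation-based feature selector is Hall's CFS heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf the mean absolute feature-response correlation and r_ff the mean absolute feature-feature correlation. The sketch below assumes that definition; whether it matches the exact computation behind Table I is not confirmed by the slides, and the correlation values are made up.

```python
import math


def cfs_merit(feature_response_corr, feature_feature_corr):
    """CFS merit of a feature subset (Hall's heuristic, assumed here).

    feature_response_corr: one correlation per feature in the subset.
    feature_feature_corr: one correlation per unordered pair of features.
    """
    k = len(feature_response_corr)
    r_cf = sum(abs(r) for r in feature_response_corr) / k
    if feature_feature_corr:
        r_ff = sum(abs(r) for r in feature_feature_corr) / len(feature_feature_corr)
    else:
        r_ff = 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Two parameters, each moderately correlated with latency.
independent = cfs_merit([0.6, 0.6], [0.0])  # the parameters carry separate information
redundant = cfs_merit([0.6, 0.6], [1.0])    # the second parameter adds nothing new
print(independent, redundant)
```

Under this definition a subset scores highly when its parameters correlate with the response but not with each other, which is why a small set of main factors can reach a high merit.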

Fig. 5: An example of 1D GP model: GPs provide mean estimates as well as the uncertainty in estimations, i.e., variance. [Plot legend: true function, GP surrogate, mean estimate, observation; sample points x1 to x4.]
Algorithm 1: BO4CO
Input: Configuration space $\mathbb{X}$, maximum budget $N_{max}$, response function $f$, kernel function $K_\theta$, hyper-parameters $\theta$, design sample size $n$, learning cycle $N_l$
Output: Optimal configurations $x^*$ and learned model $M$
1: choose an initial sparse design (lhd) to find an initial design samples $D = \{x_1, \ldots, x_n\}$
2: obtain performance measurements of the initial design, $y_i \leftarrow f(x_i) + \epsilon_i, \forall x_i \in D$
3: $S_{1:n} \leftarrow \{(x_i, y_i)\}_{i=1}^{n}$; $t \leftarrow n + 1$
4: $M(x|S_{1:n}, \theta) \leftarrow$ fit a GP model to the design (Eq. 3)
5: while $t \le N_{max}$ do
6:   if ($t \bmod N_l = 0$) $\theta \leftarrow$ learn the kernel hyper-parameters by maximizing the likelihood
7:   find next configuration $x_t$ by optimizing the selection criteria over the estimated response surface given the data, $x_t \leftarrow \arg\max_x u(x|M, S_{1:t-1})$ (Eq. 9)
8:   obtain performance for the new configuration $x_t$, $y_t \leftarrow f(x_t) + \epsilon_t$
9:   augment the configuration $S_{1:t} = \{S_{1:t-1}, (x_t, y_t)\}$
10:  $M(x|S_{1:t}, \theta) \leftarrow$ re-fit a new GP model (Eq. 7)
11:  $t \leftarrow t + 1$
12: end while
13: $(x^*, y^*) = \min S_{1:N_{max}}$
14: $M(x)$
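
The loop in Algorithm 1 can be sketched in a few lines of NumPy. This is a simplified illustration, not the BO4CO code: it assumes a one-dimensional grid of candidate configurations, a squared-exponential kernel with fixed hyper-parameters (so step 6 is omitted), a random initial design in place of the Latin hypercube design, and a fixed `kappa` in the lower confidence bound. All function names are invented for the example.

```python
import numpy as np


def kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel with unit signal variance (assumed)."""
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / length_scale) ** 2)


def fit_predict(x_obs, y_obs, grid, noise_var=1e-3):
    """GP posterior mean and standard deviation on the grid."""
    y_mean = y_obs.mean()                      # constant prior mean
    K = kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    k_star = kernel(x_obs, grid)
    mean = y_mean + k_star.T @ np.linalg.solve(K, y_obs - y_mean)
    var = 1.0 + noise_var - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def bo_minimize(f, grid, n_init=4, budget=20, kappa=2.0, seed=0):
    """Measure `budget` grid points, choosing each new one by LCB."""
    rng = np.random.default_rng(seed)
    # Steps 1-3: initial design and its measurements.
    idx = list(rng.choice(len(grid), size=n_init, replace=False))
    y = [f(grid[i]) for i in idx]
    while len(idx) < budget:                                  # step 5
        mean, sd = fit_predict(grid[idx], np.array(y), grid)  # steps 4 and 10
        lcb = mean - kappa * sd                               # step 7
        lcb[idx] = np.inf          # never re-measure a configuration
        nxt = int(np.argmin(lcb))
        idx.append(nxt)                                       # steps 8-9
        y.append(f(grid[nxt]))
    best = int(np.argmin(y))                                  # step 13
    return grid[idx[best]], y[best]


# Made-up response with its minimum at x = 0.3.
grid = np.linspace(-1.5, 1.5, 61)
x_best, y_best = bo_minimize(lambda x: (x - 0.3) ** 2, grid)
print(x_best, y_best)
```

Minimising the lower confidence bound here plays the role of maximising the selection criterion `u` in step 7. A real deployment would replace the lambda with a call that deploys the configuration and measures latency.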
[Figure: (a) Design of Experiment, (b) Sequential Design. Labels: Configuration Space, Empirical Model, Experiment (exhaustive), Experiment, Selection Criteria.]

[Figure: four panels over the configuration domain (x-axis) and response value (y-axis), showing the true response function and GP fit, the criteria evaluation, the new selected point, and the new GP fit.]
Acquisition function:

[Background text from the paper, partly cropped:]

[...] BO4CO then fits a GP model to [...] belief about the underlying [...] (Algorithm 1). The while loop in [...] belief until the budget runs [...]. Given observations $S_{1:t} = \{(x_i, y_i)\}_{i=1}^{t}$, a prior distribution $\Pr(f)$ and the likelihood $\Pr(S_{1:t}|f)$ form the posterior $\Pr(f|S_{1:t}) \propto \Pr(S_{1:t}|f) \Pr(f)$. [...] [37], specified by its mean and covariance (see Section III-E1):

$$y = f(x) \sim \mathcal{GP}(\mu(x), k(x, x')) \qquad (3)$$

where

$$\mu_t(x) = \mu(x) + \mathbf{k}(x)^\top (K + \sigma^2 I)^{-1} (y - \mu) \qquad (7)$$

$$\sigma_t^2(x) = k(x, x) + \sigma^2 I - \mathbf{k}(x)^\top (K + \sigma^2 I)^{-1} \mathbf{k}(x) \qquad (8)$$

These posterior functions are used to select the next point $x_{t+1}$ as detailed in Section III-C.

C. Configuration selection criteria
The selection criteria is defined as $u : \mathbb{X} \to \mathbb{R}$ that selects $x_{t+1} \in \mathbb{X}$, should $f(\cdot)$ be evaluated next (step 7):

$$x_{t+1} = \arg\max_{x \in \mathbb{X}} u(x|M, S_{1:t}) \qquad (9)$$

Storm Architecture
[Diagram. Logical View: Spout A, Bolt A and Bolt B connected by pipes. Physical View: Worker A, Worker B and Worker C connected by sockets, each link with an out queue and an in queue.]

Word Count Architecture
• CPU intensive
[Diagram: File, Stream to Kafka, Kafka Topic, then Kafka Spout (sentence), Splitter Bolt (word), Counter Bolt; example output: [paintings, 3], [poems, 60], [letter, 75].]

Rolling Sort Architecture
• Memory intensive
[Diagram: Twitter Stream (tweet), Twitter to Kafka, Kafka Topic, then Kafka Spout (hashtags), RollingCount Bolt (hashtag, count), Intermediate Ranking Bolt (ranking), Ranking Bolt (trending topics).]

Applications:
• Fraud detection
• Trending topics

Experimental results
[Figure: absolute error (log scale) against iteration (0 to 100) for BO4CO, SA, GA, HILL, PS and Drift on (a) WordCount(3D) and (b) WordCount(5D).]
- 30 runs, report average performance
- Yes, we did full factorial measurements and we know where global min is…

[Figure: absolute error (log scale) against iteration (0 to 200) for BO4CO, SA, GA, HILL, PS and Drift on (a) SOL(6D) and (b) RollingSort(6D).]

[Figure: absolute error (log scale) against iteration (0 to 100) for BO4CO, SA, GA, HILL, PS and Drift on (a) Branin(2D) and (b) Dixon(2D).]

Model accuracy (comparison with polynomial regression models)
[Figure: absolute percentage error [%] (log scale) for BO4CO, polyfit1, polyfit2, polyfit3, polyfit4 and polyfit5.]

Prediction accuracy over time
[Figure: prediction error (log scale) against iteration (0 to 80) for BO4CO, polyfit1, M5Tree, RegressionTree, M5Rules, LWP(GAU) and PRIM.]

Exploitation vs exploration
[Figure: absolute error (log scale) against iteration (0 to 100) for BO4CO(adaptive), BO4CO(µ:=0), BO4CO(κ:=0.1), BO4CO(κ:=1), BO4CO(κ:=6) and BO4CO(κ:=8); and kappa against iteration (0 to 10000) for ϵ=1, ϵ=0.1 and ϵ=0.01.]

[Background text from the paper, partly cropped:]

[...] the next configuration to measure. Intuitively, [...] select the minimum response. This is done using a [...] function $u : \mathbb{X} \to \mathbb{R}$ that determines $x_{t+1} \in \mathbb{X}$, should $f(\cdot)$ be evaluated next as:

$$x_{t+1} = \arg\max_{x \in \mathbb{X}} u(x|M, S^1_{1:t}) \qquad (11)$$

The selection criterion depends on the MTGP model $M$ [...] through its predictive mean $\mu_t(x_t)$ and variance $\sigma_t^2(x_t)$ conditioned on observations $S^1_{1:t}$. TL4CO uses the Lower Confidence Bound (LCB) [24]:

$$u_{LCB}(x|M, S^1_{1:t}) = \arg\min_{x \in \mathbb{X}} \mu_t(x) - \kappa \sigma_t(x) \qquad (12)$$

where $\kappa$ is an exploitation-exploration parameter. For instance, if we require to find a near optimal configuration we set a low value to $\kappa$ to take the most out of the predictive mean. However, if we are looking for a globally optimum one, we can set a high value in order to skip local minima. Furthermore, $\kappa$ can be adapted over time [22] to perform more explorations. Figure 6 shows that in TL4CO, $\kappa$ can start with a relatively [...]r value at the early iterations comparing to BO4CO since the former provides a better estimate of mean and therefore contains more information at the early stages.

TL4CO output. Once the $N_{max}$ different configurations of the system under test are measured, the TL4CO algorithm terminates. Finally, TL4CO produces the outputs including the optimal configuration (step 14 in Algorithm 1) as well [...]
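
The effect of κ in the LCB criterion of Eq. (12) can be seen in a small sketch. The candidate names, predicted means and standard deviations below are made-up numbers, and `select_next_lcb` is a name invented for this example; in the method described above the mean and standard deviation would come from the GP model.

```python
def select_next_lcb(candidates, mean, std, kappa):
    """Return the candidate that minimises mean(x) - kappa * std(x)."""
    scores = [m - kappa * s for m, s in zip(mean, std)]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


candidates = ["A", "B", "C"]
mean = [100.0, 105.0, 120.0]  # predicted latency in ms (made-up)
std = [1.0, 2.0, 15.0]        # predictive standard deviation (made-up)

exploit = select_next_lcb(candidates, mean, std, kappa=0.1)  # trusts the mean
explore = select_next_lcb(candidates, mean, std, kappa=6.0)  # rewards uncertainty
print(exploit, explore)
```

With a low κ the candidate with the best predicted mean is measured next; with a high κ the poorly understood candidate C is chosen instead, which is how a large κ helps to skip local minima.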

Runtime overhead
[Figure: elapsed time (s) per iteration (0 to 100) for WordCount (3D), WordCount (5D), WordCount (6D), SOL (6D) and RollingSort (6D).]
- The computation time is higher for larger datasets than for those with less data.
- The computation time increases over time, since the matrix for Cholesky inversion gets larger.

[Background text from the paper, partly cropped:]

[...] approach using a 1-dimensional response. The curve in blue is the unknown true response, whereas the mean is shown in yellow and the 95% confidence interval at each point in the shaded red area. The stars indicate experimental measurements (or observations, interchangeably). Some points $x \in \mathbb{X}$ have a large confidence interval due to lack of observations in their neighborhood, while others have a narrow confidence. The main motivation behind the choice of Bayesian Optimization here is that it offers a framework in which reasoning can be based not only on mean estimates but also on the variance, providing more informative decision making. The other reason is that all the computations in this framework are based on tractable linear algebra.

In our previous work [21], we proposed BO4CO that exploits single-task GPs (no transfer learning) for prediction of posterior distribution of response functions. A GP model is composed by its prior mean and a covariance function (Eq. 2), with the posterior mean and variance given by Eqs. (3) and (4).

Correlations: SPS experiments
[Figure: latency (ms) surfaces over number of counters and number of splitters for (a) WordCount v1, (b) WordCount v2, (c) WordCount v3 and (d) WordCount v4; (e) Pearson correlation coefficients; (f) Spearman correlation coefficients; (g) measurement noise across WordCount versions (latency in ms for v1 to v4, annotated with hardware change and software change).]

In Tables 1 and 2 the upper triangle holds the correlation coefficient and the lower triangle the p-value.

Table 1
   | v1       | v2    | v3       | v4
v1 | 1        | 0.41  | -0.46    | -0.50
v2 | 7.36E-06 | 1     | -0.20    | -0.18
v3 | 6.92E-07 | 0.04  | 1        | 0.94
v4 | 2.54E-08 | 0.07  | 1.16E-52 | 1

Table 2
   | v1       | v2    | v3       | v4
v1 | 1        | 0.49  | -0.51    | -0.51
v2 | 5.50E-08 | 1     | -0.2793  | -0.24
v3 | 1.30E-08 | 0.003 | 1        | 0.88
v4 | 1.40E-08 | 0.01  | 8.30E-36 | 1

Table 3
ver. | µ       | σ     | µ/σ
v1   | 516.59  | 7.96  | 64.88
v2   | 584.94  | 2.58  | 226.32
v3   | 654.89  | 13.56 | 48.30
v4   | 1125.81 | 16.92 | 66.56

- Different correlations
- Different optimum configurations
- Different noise level
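
For readers who want to reproduce this kind of comparison on their own measurements, the two correlation coefficients named above can be computed as in the following sketch. It uses only the standard library and omits the p-values; the latency vectors are made-up numbers, not the WordCount data from the tables.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


# Latencies of the same five configurations under two hypothetical versions.
version_a = [1.0, 2.0, 3.0, 4.0, 5.0]
version_b = [1.0, 4.0, 9.0, 16.0, 25.0]
print(pearson(version_a, version_b), spearman(version_a, version_b))
```

Spearman only looks at the ordering of configurations, so it is the more relevant of the two when the question is whether a configuration that is good in one version is also good in another.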

DevOps
- Different versions are continuously delivered (on a daily basis).
- Big Data systems are developed using similar frameworks (Apache Storm, Spark, Hadoop, Kafka, etc.).
- Different versions share similar business logic.

Solution: Transfer Learning for Configuration Optimization
Configuration Optimization
(version j=M)
performance
measurements
Initial Design
Model Fit
Next Experiment
Model Update
Budget Finished
performance
repository
Configuration Optimization
(version j=N)
Initial Design
Model Fit
Next Experiment
Model Update
Budget Finished
select data
for training
GP model hyper-parameters
store filter

The case where we learn from correlated responses
[Figure, over the configuration domain (x-axis) and response value (y-axis): (a) 3 sample response functions (1), (2), (3) with observations; (b) GP fit for (1) ignoring observations for (2),(3): LCB not informative; (c) multi-task GP fit for (1) by transfer learning from (2),(3): highly informative. Legend: GP prediction mean, GP prediction variance, probability distribution of the minimizers.]

Comparison with default and expert prescription
[Figure: average read latency (µs, ×10^4) and average write latency (µs) against throughput (ops/sec) for TL4CO and BO4CO, with annotations for the default configuration, the configuration recommended by an expert, BO4CO after 20 and 100 iterations, and TL4CO after 20 and 100 iterations.]

Prediction accuracy over time
[Figure: prediction error (RMSE, log scale) against iteration (0 to 100). (a) T=2,m=100; T=2,m=200; T=2,m=300; T=3,m=100. (b) TL4CO compared with polyfit1, polyfit2, polyfit4, polyfit5, M5Tree, M5Rules and PRIM.]

Entropy of the density function of the minimizers
[Figure: entropy against iteration (0 to 100) for T=1(BO4CO), T=2,m=100, T=2,m=200, T=2,m=300, T=2,m=400 and T=3,m=100; and entropy for BO4CO and TL4CO on the datasets Branin, Hartmann, WC(3D), SOL(6D), WC(5D), Dixon, WC(6D), RS(6D) and cass-20.]

[Background text from the paper, partly cropped:]

The knowledge about the location of optimum configuration is summarized by the approximation of the conditional probability density function of the response function minimizers, i.e., $X^* = \Pr(x^*|f(x))$, where $f(\cdot)$ is drawn from the MTGP model (cf. solid red line in Figure 5(b,c)). The entropy of the density functions in Figure 5(b,c) are 6.39, [...] so we know more information about the latter.

The results in Figure 19 confirm that the entropy measure of the minimizers with the models provided by TL4CO for all the datasets (synthetic and real) significantly contains more information. The results demonstrate that the main reason for finding quick convergence comparing with the baselines is that TL4CO employs a more effective model. The results in Figure 19(b) show the change of entropy of $X^*$ over time for WC(5D) dataset. First, it shows that in TL4CO, the entropy decreases sharply. However, the overall decrease of entropy for BO4CO is slow. The second observation is that [...]

Knowledge about the location of the minimizer
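
One way to approximate the entropy of the minimizer distribution is to draw sample response functions from the model, record where each sample attains its minimum, and compute the Shannon entropy of those locations. The sketch below is an illustration under simplifying assumptions: samples are drawn from independent Gaussians per candidate rather than from a full GP or MTGP posterior, entropy is reported in bits, and all numbers are made up, so the values are not comparable with those quoted from the paper.

```python
import math
import random
from collections import Counter


def minimizer_entropy(sample_functions):
    """Shannon entropy (bits) of the empirical distribution of argmin locations."""
    counts = Counter(min(range(len(s)), key=s.__getitem__) for s in sample_functions)
    n = len(sample_functions)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def draw(mean, std, rng):
    """One sampled response per candidate (independent Gaussians, a simplification)."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


rng = random.Random(1)
mean = [5.0, 4.0, 6.0, 4.5]  # predicted response of four candidate configurations

# A vague model (large predictive uncertainty) versus a sharp one.
vague = [draw(mean, [3.0] * 4, rng) for _ in range(2000)]
sharp = [draw(mean, [0.05] * 4, rng) for _ in range(2000)]
h_vague = minimizer_entropy(vague)
h_sharp = minimizer_entropy(sharp)
print(h_vague, h_sharp)
```

A lower entropy means the model has concentrated its belief about where the minimum lies on fewer candidates, which is the sense in which a model "contains more information" in the text above.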

Takeaways
• Be aware of uncertainty
- Quantify the uncertainty
- Make decisions taking into account the right level of uncertainty (homoscedastic vs heteroscedastic)
- Uncertainty sometimes helps (models that provide an estimate of the uncertainty are typically more informative)
- By exploiting this knowledge you can explore only the interesting zones rather than learning the whole performance function
• You can learn from operational data
- Not only from the current version, but from previous measurements as well
- Use the learning from past measurements as prior knowledge
- Too much data can also be harmful: it can slow down or blur the learning (negative transfer)

Acknowledgement:
-BO4CO as a part of DevOps pipeline in H2020 DICE
-BO4CO is being acquired by TATA (TCS)
[Figure: DICE project architecture. Labels: DICE IDE, Profile, Plugins (Sim, Ver, Opt), DPIM, DTSM, DDSM, TOSCA, Methodology, Deploy, Config, Test, Mon, Anomaly, Trace, Iter. Enh., Cont. Int., Fault Inj., Data Intensive Application (DIA), Big Data Technologies, Cloud (Priv/Pub), WP1 to WP5, WP6 - Demonstrators.]
Code and data: https://github.com/dice-project/DICE-Configuration-BO4CO

Submit to SEAMS 2017
- Any work on Self-*
- Abstract Submission: 6 Jan, 2017 (firm)
- Paper Submission: 13 Jan, 2017 (firm)
- Page limit:
- Long: 10+2,
- New ideas and tools: 6+1
- More info: https://wp.doc.ic.ac.uk/seams2017/
- Symposium: 22-23 May, 2017
- We accept artifacts submissions (tool, data, model)
12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems
Buenos Aires, Argentina, May 22-23, 2017, http://wp.doc.ic.ac.uk/seams2017

Call for Papers
Self-adaptation and self-management are key objectives in many modern and emerging software systems, including
the industrial internet of things, cyber-physical systems, cloud computing, and mobile computing. These systems must
be able to adapt themselves at run time to preserve and optimize their operation in the presence of uncertain changes
in their operating environment, resource variability, new user needs, attacks, intrusions, and faults.
Approaches to complement software-based systems with self-managing and self-adaptive capabilities are an important
area of research and development, offering solutions that leverage advances in fields such as software architecture,
fault-tolerant computing, programming languages, robotics, and run-time program analysis and verification.
Additionally, research in this field is informed by related areas like biologically-inspired computing, artificial
intelligence, machine learning, control systems, and agent-based systems. The SEAMS symposium focuses on applying
software engineering to these approaches, including methods, techniques, and tools that can be used to support self-*
properties like self-adaptation, self-management, self-healing, self-optimization, and self-configuration.
The objective of SEAMS is to bring together researchers and practitioners from diverse areas to investigate, discuss,
and examine the fundamental principles, state of the art, and critical challenges of engineering self-adaptive and self-
managing systems.
Topics of Interest: All topics related to engineering self-adaptive and self-managing systems, including:
Foundational Concepts
• self-* properties
• control theory
• algorithms
• decision-making and planning
• managing uncertainty
• mixed-initiative and human-in-the-loop systems
Languages
• formal notations for modeling and analyzing self-*
properties
• programming language support for self-adaptation
Constructive methods
• requirements elicitation techniques
• reuse support (e.g., patterns, designs, code)
• architectural techniques
• legacy systems

Analytical Methods for Self-Adaptation and -Management
• evaluation and assurance
• verification and validation
• analysis and testing frameworks
Application Areas
• Industrial internet of things
• Cyber-physical systems
• Cloud computing
• Mobile computing
• Robotics
• Smart user interfaces
• Security and privacy
• Wearables and ubiquitous/pervasive systems
Artifacts* and Evaluations
• model problems and exemplars
• resources, metrics, or software that can be used to
compare self-adaptive approaches
• experiences in applying tools to real problems
Paper Submission Details
SEAMS solicits three types of papers: long papers (10 pages for the main text, inclusive of figures, tables, appendices, etc.; references may be included in up to two additional pages), short papers for new ideas and early results (6 pages + 1 for references) and artifact papers (6 pages + 1 for references). Long papers should clearly describe innovative and original research or explain how existing techniques have been applied to real-world examples. Short papers should describe novel and promising ideas and/or techniques that are in an early stage of development. Artifact papers must describe why and how the accompanying artifact may be useful for the broader community. Papers must not have been previously published or concurrently submitted elsewhere. Papers must conform to IEEE formatting guidelines (see ICSE 2017 style guidelines), and be submitted via EasyChair. Accepted papers will appear in the symposium proceedings that will be published in the ACM and IEEE digital libraries. Accepted artifact papers will also be archived on the Dagstuhl Artifacts Series (DARTS).
*There will be a specific session dedicated to artifacts that may be useful for the community as a whole. Please see http://wp.doc.ic.ac.uk/seams2017/call-for-artifacts/ for more details.

Further Information
Symposia-related email should be addressed to:
seams17-org@lists.andrew.cmu.edu
Important Dates:
Abstract Submission: 6 Jan, 2017 (AoE, firm)
Paper Submission: 13 Jan, 2017 (AoE, firm)
Notification: 21 February, 2017
Camera ready: 6 Mar, 2017

Selected papers will be invited to submit to the ACM Transactions on Autonomous and Adaptive Systems (TAAS).
General Chair
David Garlan, USA
Program Chair
Bashar Nuseibeh, UK & Ireland
Artifacts Chair
Javier Cámara, USA
Publicity Chair
Pooyan Jamshidi, UK
Local Chair
Nicolás D’Ippolito, Argentina
Program Committee
Dalal Alrajeh, UK
Jesper Andersson, Sweden
Rami Bahsoon, UK
Arosha Bandara, UK
Luciano Baresi, Italy
Jacob Beal, USA
Nelly Bencomo, UK
Amel Bennaceur, UK
Victor Braberman, Argentina
Tomas Bures, Czech Republic
Radu Calinescu, UK
Javier Camara, USA
Betty Cheng, USA
Siobhán Clarke, Ireland
Rogério de Lemos, UK
Elisabetta di Nitto, Italy
Nicolás D’Ippolito, Argentina
Ada Diaconescu, France
Gregor Engels, Germany
Antonio Filieri, UK
Erik Fredericks, USA
Holger Giese, Germany
Hassan Gomaa, USA
Joel Greenyer, Germany
Mark Harman, UK
Valerie Issarny, France
Pooyan Jamshidi, UK
Jean-Marc Jézéquel, France
Samuel Kounev, Germany
Philippe Lalanda, France
Seok–Won Lee, South Korea
Marin Litoiu, Canada
Xiaoxing Ma, China
Martina Maggio, Sweden
Sam Malek, USA
Nenad Medvidovic, USA
Hausi Müller, Canada
Henry Muccini, Italy
John Mylopoulos, Canada
Ingrid Nunes, Brazil
Liliana Pasquale, Ireland
Patrizio Pelliccione, Sweden
Xin Peng, China
David Rosenblum, Singapore
Bradley Schmerl, USA
Hella Seebach, Germany
Amir Molzam Sharifloo, Germany
Vitor Silva Sousa, Brazil
Jan-Philipp Steghöfer, Sweden
Ladan Tahvildari, Canada
Kenji Tei, Japan
Axel van Lamsweerde, Belgium
Giuseppe Valetto, Italy
Mirko Viroli, Italy
Danny Weyns, Belgium
Yijun Yu, UK
Artifact Evaluation Committee
Konstantinos Angelopoulos, UK
Nuno Antunes, Portugal
Amel Bennaceur, UK
Javier Cámara, USA
Ilias Gerostathopoulos, Germany
Mahmoud Hammad, USA
Muhammad Usman Iftikhar, Sweden
Ashutosh Pandey, USA
Roykrong Sukkerd, USA
Christos Tsigkanos, Italy
Co-located with
