Machine Learning meets DevOps:

when uncertainty can be helpful

Published in:
Software


- 1. Machine Learning meets DevOps: when uncertainty can be helpful
  Pooyan Jamshidi, Imperial College London, p.jamshidi@imperial.ac.uk
  Software Performance Engineering in the DevOps World, Sept 2016
- 2. Motivation
  1. Many different parameters =>
     - large state space
     - interactions
  2. Defaults are typically used =>
     - poor performance
- 3. Motivation
  Experiments on Apache Cassandra:
  - 6 parameters, 1024 configurations
  - Average read latency
  - 10 million records (cass-10)
  - 20 million records (cass-20)
  [Figure: histograms of observations over average read latency (µs) for (a) cass-20 and (b) cass-10, with the best and worst configurations marked.]
  Look at these outliers! Large statistical dispersion.
- 4. Motivation
  [Figure: histogram of observations over average write latency (µs).]
  - Large statistical dispersion
  - Long-tailed distributions
- 5. Motivation (throughput)
  [Figure: histogram of observations over throughput (ops/sec), marking the configurations that generate low throughput and the configurations that generate high throughput.]
- 6. Motivation (Apache Storm)
  [Figure: latency (ms) as a surface over the number of counters and the number of splitters; cubic interpolation over a finer grid.]
  In our experiments we observed improvement up to 100%.
- 7. Goal
  Slide annotations: configuration space; partially known; measurements subject to noise.

  […] ($X_i$). In general, $X_i$ may either indicate (i) an integer variable such as level of parallelism or (ii) a categorical variable such as messaging frameworks or a Boolean variable such as enabling timeout. We use the terms parameter and factor interchangeably; also, with the term option we refer to possible values that can be assigned to a parameter.

  We assume that each configuration $x \in X$ in the configuration space $X = Dom(X_1) \times \cdots \times Dom(X_d)$ is valid, i.e., the system accepts this configuration and the corresponding test results in a stable performance behavior. The response with configuration $x$ is denoted by $f(x)$. Throughout, we assume $f(\cdot)$ is latency; however, other metrics for response may be used. We here consider the problem of finding an optimal configuration $x^*$ that globally minimizes $f(\cdot)$ over $X$:

  $$x^* = \arg\min_{x \in X} f(x) \quad (1)$$

  In fact, the response function $f(\cdot)$ is usually unknown or partially known, i.e., $y_i = f(x_i), x_i \subset X$. In practice, such measurements may contain noise, i.e., $y_i = f(x_i) + \epsilon$. The determination of the optimal configuration is thus a black-box optimization program subject to noise [27, 33], which is […] harder than deterministic optimization. A […] is based on sampling that starts with a […] sampled configurations. The performance of the […] associated to this initial samples can deliver […] understanding of $f(\cdot)$ and guide the generation of […] of samples. If properly guided, the process […] generation-evaluation-feedback-regeneration will […] converge and the optimal configuration will be […]

  […] it still requires hundr[…]. In this paper, we propose to […] the learned knowledge […] software to accelerate […] version. This idea is […] in real software engineering […]: (i) in DevOps different versions of a system are delivered continuously, (ii) Big Data systems are developed using similar frameworks (e.g., Apache Hadoop, Spark, Kafka) and run on similar platforms (e.g., cloud clusters), (iii) and different versions of a system often share a similar business logic.

  To the best of our knowledge, only one study [9] explores the possibility of transfer learning in system configuration. The authors learn a Bayesian network in the tuning process of a system and reuse this model for tuning other similar systems. However, the learning is limited to the structure of the Bayesian network. In this paper, we introduce a method that not only reuses a model that has been learned previously but also the valuable raw data. Therefore, we are not limited to the accuracy of the learned model. Moreover, we do not consider Bayesian networks and instead focus on MTGPs.

  2.4 Motivation. A motivating example. We now illustrate the previous points on an example. WordCount (cf. Figure 1) is a popular benchmark [12]. WordCount features a three-layer architecture that counts the number of words in the incoming stream. A Processing Element (PE) of type Spout reads the […]
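The problem statement on this slide can be made concrete with a small sketch. It builds a configuration space as a Cartesian product of parameter domains, takes noisy measurements $y = f(x) + \epsilon$, and picks the configuration with the lowest averaged measurement. The parameter names, domain values, response function, and noise level are all made up for illustration; exhaustive measurement is only feasible here because the toy space has 12 configurations.

```python
import itertools
import random

# Hypothetical parameter domains; names and values are illustrative only.
domains = {
    "splitters": [1, 2, 3],
    "counters": [1, 2, 4, 8],
}

# Configuration space X = Dom(X_1) x ... x Dom(X_d).
space = list(itertools.product(*domains.values()))

def f(x):
    """Stand-in for the unknown response (latency); a made-up function."""
    splitters, counters = x
    return (splitters - 2) ** 2 + (counters - 4) ** 2 + 10.0

def measure(x, rng, noise_sd=0.5):
    """One noisy measurement y = f(x) + eps."""
    return f(x) + rng.gauss(0.0, noise_sd)

rng = random.Random(0)

# Average 20 repeated measurements per configuration to dampen the noise.
estimates = {x: sum(measure(x, rng) for _ in range(20)) / 20 for x in space}

# x* = arg min over the (estimated) response.
x_best = min(estimates, key=estimates.get)
print(x_best)
```

With realistic spaces this brute-force approach does not scale, which is the motivation for the model-guided sampling discussed in the later slides.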
- 8. Non-linear interactions
  [Figure: latency (ms) over the number of counters for splitters=2 and splitters=3, next to the latency surface over the number of counters and the number of splitters (cubic interpolation over a finer grid).]
  Response surface is:
  - Non-linear
  - Non-convex
  - Multi-modal
- 9. The measurements are subject to variability
  [Figure: latency (ms, logarithmic axis) for the deployments wc, wc+rs, wc+sol, 2wc, and 2wc+rs+sol.]
  The scale of measurement variability is different in different deployments (heteroscedastic noise).

  […] We here consider the problem of finding $x^*$ that minimizes $f$ over $X$ with as few experiments as possible:

  $$x^* = \arg\min_{x \in X} f(x) \quad (1)$$

  […] $f(\cdot)$ is usually unknown or partially known, i.e., $y_i = f(x_i), x_i \subset X$. In practice, such measurements may contain noise, i.e., $y_i = f(x_i) + \epsilon_i$. Note that since […] is only partially known, finding […] is a black-box optimization problem […] noise. In fact, the problem […] non-convex and multi-modal […] is NP-hard [36]. Therefore, […] locate a global optimum, […] best possible local optimum […] budget.

  […] It shows the non-convexity, multi-modality and the substantial performance difference between different configurations.

  Fig. 3: WordCount latency, cut through Figure 2.

  […] demonstrates that if one tries to minimize latency by acting just on one of these parameters at the time, the resulting […]
- 10. Heavy-tailed performance distributions
  [Figure: histogram of the number of performance distributions over normalized distance (99th percentile minus mean).]
- 11. BO4CO architecture
  [Diagram components: Configuration Optimisation Tool (GP model, Experimental Suite, Data Preparation); Testbed (Tester, Workload Generator, System Under Test, Monitoring, Deployment Service); Data Broker; performance repository; Technology Interface (Kafka, Storm, Cassandra, Spark); Doc. Labelled flows: configuration parameters, configuration parameter values, experiment time, polling interval.]
- 12. GP for modeling black box response function
  [Figure: true function, GP mean, GP variance, observations, selected point, and true minimum.]

  […] are based on tractable linear algebra. In previous work [21], we proposed BO4CO that exploits single-task GPs (no transfer learning) for prediction of […] distribution of response functions. A GP model is composed by its prior mean ($\mu(\cdot) : X \to \mathbb{R}$) and a covariance function ($k(\cdot, \cdot) : X \times X \to \mathbb{R}$) [41]:

  $$y = f(x) \sim GP(\mu(x), k(x, x')) \quad (2)$$

  where covariance $k(x, x')$ defines the distance between $x$ and $x'$. Let us assume $S_{1:t} = \{(x_{1:t}, y_{1:t}) \mid y_i := f(x_i)\}$ be the collection of $t$ experimental data (observations). In this framework, we treat $f(x)$ as a random variable, conditioned on observations $S_{1:t}$, which is normally distributed with the following posterior mean and variance functions [41]:

  $$\mu_t(x) = \mu(x) + k(x)^\top (K + \sigma^2 I)^{-1} (y - \mu) \quad (3)$$

  $$\sigma_t^2(x) = k(x, x) + \sigma^2 I - k(x)^\top (K + \sigma^2 I)^{-1} k(x) \quad (4)$$

  where $y := y_{1:t}$, $k(x)^\top = [k(x, x_1) \; k(x, x_2) \; \ldots \; k(x, x_t)]$, $\mu := \mu(x_{1:t})$, $K := k(x_i, x_j)$ and $I$ is the identity matrix. The shortcoming of BO4CO is that it cannot exploit the observations regarding other versions of the system and therefore cannot be applied in DevOps.

  TL4CO: an extension to multi-tasks. TL4CO uses MTGPs that exploit observations from other previous versions of the system under test. Algorithm 1 defines the internal details of TL4CO. As Figure 4 shows, TL4CO is an iterative algorithm that uses the learning from other system versions. In a high-level overview, TL4CO: (i) selects the most informative past observations (details in Section 3.3); (ii) fits a model to existing data based on kernel learning (details in Section 3.4), and (iii) selects the next […]

  Motivations:
  1. Mean estimates + variance
  2. All computations are linear algebra
  3. Good estimations when few data
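The posterior mean and variance in Equations (3) and (4) reduce to a few linear-algebra calls. The following is a minimal NumPy sketch under stated assumptions: a squared-exponential kernel with fixed length scale (the slide does not name the kernel), a constant prior mean, and 1-D inputs. The function names are hypothetical and not taken from BO4CO.

```python
import numpy as np

def sq_exp_kernel(a, b, length=1.0):
    """Squared-exponential covariance between 1-D arrays (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise_var=1e-2, prior_mean=0.0):
    """Posterior mean (Eq. 3) and variance (Eq. 4) at the points x_new."""
    K = sq_exp_kernel(x_obs, x_obs)             # K := k(x_i, x_j)
    k_star = sq_exp_kernel(x_obs, x_new)        # column j holds k(x) for x_new[j]
    A = K + noise_var * np.eye(len(x_obs))      # K + sigma^2 I
    mean = prior_mean + k_star.T @ np.linalg.solve(A, y_obs - prior_mean)
    k_xx = np.ones(len(x_new))                  # k(x, x) = 1 for this kernel
    var = k_xx + noise_var - np.sum(k_star * np.linalg.solve(A, k_star), axis=0)
    return mean, var

# Three observations of a made-up response; predict near and far from the data.
x_obs = np.array([-1.0, 0.0, 1.0])
y_obs = np.sin(x_obs)
mean, var = gp_posterior(x_obs, y_obs, np.array([0.0, 3.0]))
print(mean, var)
```

The variance at `x = 3.0`, far from all observations, comes out much larger than at `x = 0.0`, which is the "mean estimates + variance" property listed under Motivations.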
- 13. Sparsity of Effects
  - Correlation-based feature selector
  - Merit is used to select subsets that are highly correlated with the response variable
  - At most 2-3 parameters were strongly interacting with each other

  TABLE I: Sparsity of effects on 5 experiments where we have varied different subsets of parameters and used different testbeds. Note that these are the datasets we experimentally measured on the benchmark systems and we use them for the evaluation; more details, including the results for 6 more experiments, are in the appendix.

  | # | Topol. | Parameters | Main factors | Merit | Size | Testbed |
  |---|--------|------------|--------------|-------|------|---------|
  | 1 | wc(6D) | 1-spouts, 2-max spout, 3-spout wait, 4-splitters, 5-counters, 6-netty min wait | {1, 2, 5} | 0.787 | 2880 | C1 |
  | 2 | sol(6D) | 1-spouts, 2-max spout, 3-top level, 4-netty min wait, 5-message size, 6-bolts | {1, 2, 3} | 0.447 | 2866 | C2 |
  | 3 | rs(6D) | 1-spouts, 2-max spout, 3-sorters, 4-emit freq, 5-chunk size, 6-message size | {3} | 0.385 | 3840 | C3 |
  | 4 | wc(3D) | 1-max spout, 2-splitters, 3-counters | {1, 2} | 0.480 | 756 | C4 |
  | 5 | wc(5D) | 1-spouts, 2-splitters, 3-counters, 4-buffer-size, 5-heap | {1} | 0.851 | 1080 | C5 |

  Experiments on:
  1. C1: OpenNebula (X)
  2. C2: Amazon EC2 (Y)
  3. C3: OpenNebula (3X)
  4. C4: Amazon EC2 (2Y)
  5. C5: Microsoft Azure (X)
- 14. BO4CO: GP model and algorithm
  [Figure: true function, GP surrogate, mean estimate, and observations at x1 to x4.]
  Fig. 5: An example of 1D GP model: GPs provide mean estimates as well as the uncertainty in estimations, i.e., variance.

  Algorithm 1: BO4CO
  Input: Configuration space $X$, maximum budget $N_{max}$, response function $f$, kernel function $K_\theta$, hyper-parameters $\theta$, design sample size $n$, learning cycle $N_l$
  Output: Optimal configurations $x^*$ and learned model $M$
  1: choose an initial sparse design (lhd) to find an initial design samples $D = \{x_1, \ldots, x_n\}$
  2: obtain performance measurements of the initial design, $y_i \leftarrow f(x_i) + \epsilon_i, \forall x_i \in D$
  3: $S_{1:n} \leftarrow \{(x_i, y_i)\}_{i=1}^{n}$; $t \leftarrow n + 1$
  4: $M(x \mid S_{1:n}, \theta) \leftarrow$ fit a GP model to the design (Eq. (3))
  5: while $t \le N_{max}$ do
  6: if ($t \bmod N_l = 0$) $\theta \leftarrow$ learn the kernel hyper-parameters by maximizing the likelihood
  7: find next configuration $x_t$ by optimizing the selection criteria over the estimated response surface given the data, $x_t \leftarrow \arg\max_x u(x \mid M, S_{1:t-1})$ (Eq. (9))
  8: obtain performance for the new configuration $x_t$, $y_t \leftarrow f(x_t) + \epsilon_t$
  9: augment the configuration $S_{1:t} = \{S_{1:t-1}, (x_t, y_t)\}$
  10: $M(x \mid S_{1:t}, \theta) \leftarrow$ re-fit a new GP model (Eq. (7))
  11: $t \leftarrow t + 1$
  12: end while
  13: $(x^*, y^*) = \min S_{1:N_{max}}$
  14: $M(x)$

  [Figure: (a) Design of Experiment and (b) Sequential Design, with panels for the configuration space, the empirical model, the selection criteria, and the experiments (exhaustive vs. sequential).]

  P. Jamshidi, G. Casale, “An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing Systems”, MASCOTS 2016.
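The loop in Algorithm 1 can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the BO4CO implementation: the initial design is a random sample instead of a Latin hypercube design, the kernel hyper-parameters are fixed (step 6 is skipped), the selection criterion is a lower confidence bound (LCB) with a hypothetical weight `kappa`, and the configuration space is a finite 1-D grid.

```python
import numpy as np

def kernel(a, b, length=0.3):
    """Squared-exponential kernel with a fixed length scale (assumption)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_obs, y_obs, x_new, noise_var=1e-3):
    """GP posterior mean and latent variance with a constant prior mean."""
    mu0 = y_obs.mean()
    A = kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    k_star = kernel(x_obs, x_new)
    mean = mu0 + k_star.T @ np.linalg.solve(A, y_obs - mu0)
    var = 1.0 - np.sum(k_star * np.linalg.solve(A, k_star), axis=0)
    return mean, np.maximum(var, 0.0)

def bo_minimize(f, candidates, n_init=4, budget=20, kappa=2.0,
                noise_sd=0.01, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-3: initial design and its noisy measurements.
    idx = [int(i) for i in rng.choice(len(candidates), size=n_init, replace=False)]
    ys = [f(candidates[i]) + rng.normal(0.0, noise_sd) for i in idx]
    while len(idx) < budget:                       # step 5
        # Steps 4 and 10: (re-)fit the GP on all data gathered so far.
        mean, var = gp_predict(candidates[idx], np.array(ys), candidates)
        lcb = mean - kappa * np.sqrt(var)
        lcb[idx] = np.inf                          # never re-measure a point
        nxt = int(np.argmin(lcb))                  # step 7 with u(x) = -LCB(x)
        idx.append(nxt)                            # step 9
        ys.append(f(candidates[nxt]) + rng.normal(0.0, noise_sd))  # step 8
    best = int(np.argmin(ys))                      # step 13
    return candidates[idx[best]], ys[best], idx

# Made-up 1-D response with its minimum at x = 0.5.
candidates = np.linspace(-1.5, 1.5, 61)
x_best, y_best, visited = bo_minimize(lambda x: (x - 0.5) ** 2, candidates)
print(x_best, y_best)
```

Minimizing the LCB is equivalent to maximizing `u(x) = -LCB(x)`, which matches the `arg max` form of step 7; a larger `kappa` favours exploring uncertain regions over exploiting the current best estimate.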
- 15. Acquisition function
  [Figure: four panels over the configuration domain (response value on the vertical axis): the true response function, the GP fit, the criteria evaluation with the new selected point, and the new GP fit.]

  […] obtaining the measurements […] then fits a GP model to […] belief about the underlying […] (Algorithm 1). The while loop in […] belief until the budget runs […] $S_{1:t} = \{(x_i, y_i)\}_{i=1}^{t}$, where […] a prior distribution $\Pr(f)$ […] $\Pr(S_{1:t} \mid f)$ form the posterior […] $\Pr(f)$. […] functions [37], specified by its […] covariance (see Section III-E1):

  $$y = f(x) \sim GP(\mu(x), k(x, x')) \quad (3)$$

  […] where

  $$\mu_t(x) = \mu(x) + k(x)^\top (K + \sigma^2 I)^{-1} (y - \mu) \quad (7)$$

  $$\sigma_t^2(x) = k(x, x) + \sigma^2 I - k(x)^\top (K + \sigma^2 I)^{-1} k(x) \quad (8)$$

  These posterior functions are used to select the next point $x_{t+1}$ as detailed in Section III-C.

  C. Configuration selection criteria. The selection criteria is defined as $u : X \to \mathbb{R}$ that selects $x_{t+1} \in X$, should $f(\cdot)$ be evaluated next (step 7):

  $$x_{t+1} = \arg\max_{x \in X} u(x \mid M, S_{1:t}) \quad (9)$$
- 16. Correlations: SPS experiments
  [Figure: latency (ms) surfaces over the number of counters and the number of splitters for (a) WordCount v1, (b) WordCount v2, (c) WordCount v3, (d) WordCount v4; (e) Pearson correlation coefficients; (f) Spearman correlation coefficients; (g) measurement noise across WordCount versions (latency in ms for v1 to v4, annotated with hardware change and software change).]

  In Tables 1 and 2, entries above the diagonal are correlation coefficients and entries below the diagonal are p-values.

  Table 1

  |    | v1 | v2 | v3 | v4 |
  |----|----|----|----|----|
  | v1 | 1 | 0.41 | -0.46 | -0.50 |
  | v2 | 7.36E-06 | 1 | -0.20 | -0.18 |
  | v3 | 6.92E-07 | 0.04 | 1 | 0.94 |
  | v4 | 2.54E-08 | 0.07 | 1.16E-52 | 1 |

  Table 2

  |    | v1 | v2 | v3 | v4 |
  |----|----|----|----|----|
  | v1 | 1 | 0.49 | -0.51 | -0.51 |
  | v2 | 5.50E-08 | 1 | -0.2793 | -0.24 |
  | v3 | 1.30E-08 | 0.003 | 1 | 0.88 |
  | v4 | 1.40E-08 | 0.01 | 8.30E-36 | 1 |

  Table 3

  | ver. | µ | σ | µ/σ |
  |------|---|---|-----|
  | v1 | 516.59 | 7.96 | 64.88 |
  | v2 | 584.94 | 2.58 | 226.32 |
  | v3 | 654.89 | 13.56 | 48.30 |
  | v4 | 1125.81 | 16.92 | 66.56 |

  - Different correlations
  - Different optimum configurations
  - Different noise levels
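Pairwise coefficients like the ones tabulated on this slide can be computed from latency vectors measured over the same configurations in two versions. Below is a minimal NumPy-only sketch; the latency values are made up, the helper names are hypothetical, the rank function assumes no ties, and p-values are left out.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient of two equally long vectors."""
    return float(np.corrcoef(a, b)[0, 1])

def ranks(a):
    """Ranks 0..n-1 of the entries of a; assumes there are no ties."""
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(len(a))
    return r

def spearman(a, b):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Made-up latencies (ms) of five configurations in three versions.
v1 = np.array([100.0, 120.0, 150.0, 200.0, 260.0])
v2 = v1 ** 2 / 100.0        # same ordering as v1, non-linear relation
v3 = 500.0 - v1             # reversed ordering

print(pearson(v1, v2), spearman(v1, v2))
print(pearson(v1, v3), spearman(v1, v3))
```

Spearman only looks at the ordering of configurations, so it stays at 1 for `v2` even though the relation to `v1` is non-linear. That ordering is what matters when asking whether a good configuration in one version is still good in the next.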
- 17. Correlations: Cassandra experiments
  [Figure: latency (µs) surfaces over concurrent_reads and concurrent_writes for (a) cass-20 v1, (b) cass-20 v2, (c) cass-10 v1, (d) cass-10 v2.]
  - Different correlations
  - Different optimum configurations
- 18. DevOps
  - Different versions are continuously delivered (on a daily basis).
  - Big Data systems are developed using similar frameworks (Apache Storm, Spark, Hadoop, Kafka, etc.).
  - Different versions share similar business logic.
- 19. Solution: Transfer Learning for Configuration Optimization
  [Diagram: two configuration optimization runs, one for version j=M and one for version j=N, each with the steps Initial Design, Model Fit, Next Experiment, Model Update, Budget Finished. Performance measurements are stored in a performance repository; the run for the other version filters the repository, selects data for training, and reuses the GP model hyper-parameters.]
- 20. The case where we learn from correlated responses
  [Figure over the configuration domain (response value on the vertical axis): (a) 3 sample response functions (1), (2), (3) with observations; (b) GP fit for (1) ignoring observations for (2), (3): LCB not informative; (c) multi-task GP fit for (1) by transfer learning from (2), (3): highly informative. Legend: GP prediction mean, GP prediction variance, probability distribution of the minimizers.]
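The effect shown in panels (b) and (c) can be reproduced with a small multi-task GP. The sketch below assumes a covariance of the form `B[t, t'] * k_x(x, x')`, i.e. a task-similarity matrix multiplied by an input kernel; the slides do not spell out the kernel, so this structure, the matrix `B`, and all data are illustrative assumptions. It compares the posterior variance for the current version with and without observations from a correlated previous version.

```python
import numpy as np

def k_x(a, b, length=0.5):
    """Squared-exponential kernel over configurations (assumed)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def mtgp_posterior(x_obs, t_obs, y_obs, x_new, t_new, B, noise_var=1e-2):
    """Zero-mean multi-task GP with covariance B[t, t'] * k_x(x, x')."""
    K = B[np.ix_(t_obs, t_obs)] * k_x(x_obs, x_obs)
    k_star = B[np.ix_(t_obs, t_new)] * k_x(x_obs, x_new)
    A = K + noise_var * np.eye(len(x_obs))
    mean = k_star.T @ np.linalg.solve(A, y_obs)
    var = B[t_new, t_new] - np.sum(k_star * np.linalg.solve(A, k_star), axis=0)
    return mean, var

# Task 0 = current version (2 observations), task 1 = previous version (7).
B = np.array([[1.0, 0.9],
              [0.9, 1.0]])                     # hypothetical task similarity
x_cur, t_cur = np.array([-1.0, 1.0]), np.array([0, 0])
x_old, t_old = np.linspace(-1.5, 1.5, 7), np.ones(7, dtype=int)
y_cur, y_old = np.sin(x_cur), np.sin(x_old) + 0.1

x_q, t_q = np.array([0.0]), np.array([0])      # query the current version

_, var_single = mtgp_posterior(x_cur, t_cur, y_cur, x_q, t_q, B)
_, var_transfer = mtgp_posterior(
    np.concatenate([x_cur, x_old]), np.concatenate([t_cur, t_old]),
    np.concatenate([y_cur, y_old]), x_q, t_q, B)
print(var_single, var_transfer)
```

With only the two current-version observations the prediction at `x = 0` is close to the prior; adding the previous version's measurements shrinks the variance substantially. The stronger the off-diagonal entry of `B`, the more a past measurement counts.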
- 21. Comparison with default and expert prescription
  [Figure: two scatter plots over throughput (ops/sec): average read latency (µs) and average write latency (µs). Series: TL4CO and BO4CO after 20 and after 100 iterations, the default configuration, and the configuration recommended by an expert.]
- 22. Prediction accuracy over time
  [Figure: prediction error (RMSE, logarithmic axis) over iterations. (a) TL4CO with T=2, m=100; T=2, m=200; T=2, m=300; T=3, m=100. (b) TL4CO compared with polyfit1, polyfit2, polyfit4, polyfit5, M5Tree, M5Rules, and PRIM.]
- 23. Entropy of the density function of the minimizers
  [Figure: (a) entropy of BO4CO and TL4CO on the datasets Branin, Dixon, Hartmann, WC(3D), WC(5D), WC(6D), SOL(6D), RS(6D), and cass-20; (b) entropy over iterations for T=1 (BO4CO), T=2 with m=100, 200, 300, 400, and T=3 with m=100.]
  Knowledge about the location of the minimizer.

  The knowledge about the location of optimum configuration is summarized by the approximation of the conditional probability density function of the response function minimizers, i.e., $X^* = \Pr(x^* \mid f(x))$, where $f(\cdot)$ is drawn from the MTGP model (cf. solid red line in Figure 5(b,c)). The entropy of the density functions in Figure 5(b,c) are 6.39, […] so we know more information about the latter.

  The results in Figure 19 confirm that the entropy measure of the minimizers with the models provided by TL4CO for all datasets (synthetic and real) significantly contains more information. The results demonstrate that the main reason for finding quick convergence comparing with the baselines is that TL4CO employs a more effective model. The results in Figure 19(b) show the change of entropy of $X^*$ over time for the WC(5D) dataset. First, it shows that in TL4CO, the entropy decreases sharply. However, the overall decrease of entropy for BO4CO is slow. The second observation is that […]
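One way to approximate the density of the minimizers and its entropy is by Monte Carlo: draw response functions from the model's posterior on a grid, record where each draw attains its minimum, and compute the Shannon entropy of the resulting histogram. The sketch below is an illustration only; the grid, the posterior means, and the covariances are made up, and the entropy is in nats, so the numbers are not comparable to those quoted on the slide.

```python
import numpy as np

def minimizer_entropy(mean, cov, n_samples=2000, seed=0):
    """Entropy (nats) of the empirical distribution of argmin locations."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov, size=n_samples)
    counts = np.bincount(draws.argmin(axis=1), minlength=len(mean))
    p = counts[counts > 0] / n_samples      # drop empty bins: 0 * log 0 = 0
    return float(-np.sum(p * np.log(p)))

# Hypothetical posterior over an 11-point grid with its minimum at 0.5.
grid = np.linspace(0.0, 1.0, 11)
mean = (grid - 0.5) ** 2

# A confident model (tiny variance) versus an uncertain one (large variance).
h_confident = minimizer_entropy(mean, 1e-6 * np.eye(len(grid)))
h_uncertain = minimizer_entropy(mean, 1.0 * np.eye(len(grid)))
print(h_confident, h_uncertain)
```

A low entropy means the model has pinned down where the optimum lies, which is the sense in which the slide compares TL4CO and BO4CO.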
- 24. Let's discuss
  - Be aware of uncertainty
    - By quantifying the uncertainty (look at Catia’s work)
    - Make decisions taking into account the right level of uncertainty (homoscedastic vs. heteroscedastic)
    - Uncertainty sometimes helps (models that provide an estimation of the uncertainty are typically more informative)
    - By exploiting this knowledge you can explore only the interesting zones rather than learning the whole performance function
  - You can learn from operational data
    - Not only from the current version, but from previous measurements as well
    - Use the learning from past measurements as prior knowledge
    - Too much data can also be harmful: it would slow down or blur the proper learning
- 25. Submit to SEAMS 2017
  - Any work on Self-*
  - For the Performance-Aware DevOps community:
    - DevOps for adaptive systems?
    - Self-adaptive DevOps pipeline?
  - Abstract submission: 6 Jan, 2017 (firm)
  - Paper submission: 13 Jan, 2017 (firm)
  - Page limit:
    - Long: 10+2
    - New ideas and tools: 6+1
  - More info: https://wp.doc.ic.ac.uk/seams2017/
  - Symposium: 22-23 May, 2017
  - We accept artifact submissions (tool, data, model)

  12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, http://wp.doc.ic.ac.uk/seams2017

  Call for Papers

  Self-adaptation and self-management are key objectives in many modern and emerging software systems, including the industrial internet of things, cyber-physical systems, cloud computing, and mobile computing. These systems must be able to adapt themselves at run time to preserve and optimize their operation in the presence of uncertain changes in their operating environment, resource variability, new user needs, attacks, intrusions, and faults. Approaches to complement software-based systems with self-managing and self-adaptive capabilities are an important area of research and development, offering solutions that leverage advances in fields such as software architecture, fault-tolerant computing, programming languages, robotics, and run-time program analysis and verification. Additionally, research in this field is informed by related areas like biologically-inspired computing, artificial intelligence, machine learning, control systems, and agent-based systems. The SEAMS symposium focuses on applying software engineering to these approaches, including methods, techniques, and tools that can be used to support self-* properties like self-adaptation, self-management, self-healing, self-optimization, and self-configuration.

  The objective of SEAMS is to bring together researchers and practitioners from diverse areas to investigate, discuss, and examine the fundamental principles, state of the art, and critical challenges of engineering self-adaptive and self-managing systems.

  Topics of Interest: all topics related to engineering self-adaptive and self-managing systems, including:
  - Foundational Concepts: self-* properties; control theory; algorithms; decision-making and planning; managing uncertainty; mixed-initiative and human-in-the-loop systems
  - Languages: formal notations for modeling and analyzing self-* properties; programming language support for self-adaptation
  - Constructive Methods: requirements elicitation techniques; reuse support (e.g., patterns, designs, code); architectural techniques; legacy systems
  - Analytical Methods for Self-Adaptation and -Management: evaluation and assurance; verification and validation; analysis and testing frameworks
  - Application Areas: industrial internet of things; cyber-physical systems; cloud computing; mobile computing; robotics; smart user interfaces; security and privacy; wearables and ubiquitous/pervasive systems
  - Artifacts* and Evaluations: model problems and exemplars; resources, metrics, or software that can be used to compare self-adaptive approaches; experiences in applying tools to real problems

  *There will be a specific session dedicated to artifacts that may be useful for the community as a whole. Please see http://wp.doc.ic.ac.uk/seams2017/call-for-artifacts/ for more details.

  Selected papers will be invited to submit to the ACM Transactions on Autonomous and Adaptive Systems (TAAS).

  Important Dates:
  - Abstract submission: 6 January, 2017 (firm)
  - Paper submission: 13 January, 2017 (firm)
  - Notification: 21 February, 2017
  - Camera ready: 6 March, 2017

  Paper Submission Details: SEAMS solicits three types of papers: long papers (10 pages for the main text, inclusive of figures, tables, appendices, etc.; references may be included on up to two additional pages), short papers for new ideas and early results (6 pages + 1 page of references), and artifact papers (6 pages + 1 page of references). Long papers should clearly describe innovative and original research or explain how existing techniques have been applied to real-world examples. Short papers should describe novel and promising ideas and/or techniques that are in an early stage of development. Artifact papers must describe why and how the accompanying artifact may be useful for the broader community. Papers must not have been previously published or concurrently submitted elsewhere. Papers must conform to IEEE formatting guidelines (see ICSE 2017 style guidelines) and be submitted via EasyChair. Accepted papers will appear in the symposium proceedings that will be published in the ACM and IEEE digital libraries.

  Further Information: symposium-related email should be addressed to seams17-org@lists.andrew.cmu.edu

  Organization:
  - General Chair: David Garlan, USA
  - Program Chair: Bashar Nuseibeh, UK
  - Artifact Chair: Javier Cámara, US
  - Publicity Chair: Pooyan Jamshidi, UK
  - Local Chair: Nicolás D’Ippolito, AR
  - Program Committee: TBD
  - Steering Committee: Luciano Baresi, Italy; Nelly Bencomo, UK; Gregor Engels, Germany; Rogério de Lemos, UK; David Garlan, USA; Paola Inverardi, Italy; Marin Litoiu (Chair), Canada; John Mylopoulos, Italy; Hausi A. Müller, Canada; Bashar Nuseibeh, UK; Bradley Schmerl, USA

  Co-located with […]
- 26. Acknowledgement / IC4 activities
  - My participation in the Dagstuhl seminar is fully supported by IC4.
  - We are working on machine learning for predicting the performance (job completion time, utilization, throughput, performance regressions) of big data systems (Apache Hadoop and Spark); the results will be published soon (PIs: Theo Lynn, Brian Lee; other colleagues: Saul Gill, Binesh Nair, David O’Shea, Yuansong Qiao).
  - We are also working on cloud/microservices migration for IC4 industry members (PIs: Theo Lynn, Claus Pahl).
  - And on a self-configuration tool for highly configurable systems (with Theo Lynn).
