
Learning Software Performance Models for Dynamic and Uncertain Environments

Pooyan's job talk



  1. 1. Learning Software Performance Models for Dynamic and Uncertain Environments Pooyan Jamshidi NC State University RAISE Lab June 2017
  2. 2. My background 5 RobusT2Scale/FQL4KE [PhD] ✓ Engineering / Technical ✓ Auto-scaling in cloud (RobusT2Scale) ✓ Self-learning controller for cloud auto-scaling (FQL4KE) BO4CO/TL4CO [Postdoc1@Imperial] ✓ Mathematical modeling ✓ Configuration optimization for big data (BO4CO) ✓ Performance-aware DevOps (TL4CO) Transfer Learning [Postdoc2@CMU] ✓ Empirical ✓ Learns accurate and reliable models from “related” sources ✓ Reuse learning across environmental changes Software industry [2003-2010]: Pre-PhD Close collaborations with Intel and Microsoft [PhD] 3 EU projects: MODAClouds (cloud), DICE (big data), Human Brain (clustering) [Postdoc1@Imperial] 1 DARPA project: BRASS (Robotics) [Postdoc2@CMU]
  3. 3. Robotics systems • Environment changes • Resources available (e.g. power) • New elements (e.g. obstacles) • Evolve functionality • Hardware and software changes • New tasks, goals, policies • Close proximity to human • Human in the loop 6 [Credit to CMU CoBot]
  4. 4. Cloud applications 7 * JupiterResearch ** Amazon ***Google • 82% of end-users give up on a lost payment transaction* • 25% of end-users leave if load time > 4s** • 1% reduced sale per 100ms load time** • 20% reduced income if 0.5s longer load time*** flash-crowds failures capacity shortage slow application [Credit to Cristian Klein, Brownout]
  5. 5. Big data analytics • Failures (of data flow services) • Bottleneck (because of slow nodes or failure) • End-to-end latency 8 • Click Stream Analytics Example Ingestion Layer Analytics Layer Storage Layer [Credit to A. Khoshbarforushha]
  6. 6. Common characteristics of the systems • Modern systems are increasingly configurable • Modern systems are deployed in dynamic and uncertain environments • Modern systems can be adapted on the fly 9 [Embedded paper: "Hey, You Have Given Me Too Many Knobs! Understanding and Dealing with Over-Designed Configuration in System Software" — Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, and Rukma Talwadker (UC San Diego, Huazhong Univ. of Science & Technology, NetApp). Its Figure 1 shows the number of configuration parameters growing with software evolution in Storage-A, MySQL, Apache, and Hadoop; with hundreds of knobs, configuring system software for high reliability and performance becomes a daunting, error-prone task.] [Credit to Tianyin Xu, Too Many Knobs]
  7. 7. Elasticity Management in Cloud
  8. 8. Motivation ~50% = wasted hardware Actual traffic Typical weekly traffic to Web-based applications (e.g., Amazon.com)
  9. 9. Motivation Problem 1: ~75% wasted capacity Actual demand Problem 2: customers lost Traffic with an unexpected burst in requests (e.g., end-of-year traffic to Amazon.com)
  10. 10. Motivation Auto-scaling enables you to realize this ideal on-demand provisioning
  11. 11. An Example of an Auto-scaling Rule These values are required to be determined by users ⇒ requires deep knowledge of the application (CPU, memory) ⇒ requires performance-modeling expertise (how to scale) ⇒ a unified opinion of user(s) is required Amazon Auto Scaling | Microsoft Azure Watch | Microsoft Azure Auto-scaling Application Block 14
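The kind of rule the slide criticizes can be sketched in a few lines (a minimal illustration, not Amazon's or Azure's actual API; the metric, thresholds, and step size are hypothetical — picking them is exactly the expertise demanded of users):

```python
def autoscale_action(avg_cpu, n_vms, upper=0.70, lower=0.30,
                     min_vms=1, max_vms=10):
    """Classic threshold-based auto-scaling rule.

    The user must choose the metric (here CPU utilization), the
    upper/lower thresholds, and the scaling step -- all of which
    require application knowledge and performance-modeling skill.
    """
    if avg_cpu > upper and n_vms < max_vms:
        return n_vms + 1   # scale out by one VM
    if avg_cpu < lower and n_vms > min_vms:
        return n_vms - 1   # scale in by one VM
    return n_vms           # keep current capacity
```

For example, a cluster of 3 VMs averaging 85% CPU would scale out to 4; the same rule with badly chosen thresholds oscillates, which is why encoding such policies by hand is brittle.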
  12. 12. RobusT2Scale: Fuzzy control to facilitate elasticity policy encoding RobusT2Scale Initial setting + elasticity rules + response-time SLA environment monitoring application monitoring scaling actions Fuzzy Reasoning Users Prediction/ Smoothing [SEAMS14,SEAMS15,TAAS16]
  13. 13. Cloud is a dynamic and uncertain environment 17 [Fig. 11: Synthetic workload patterns — Big spike, Dual phase, Large variations, Quickly varying, Slowly varying, Steep tri phase; user requests over time] [CLOUD16, TOIT17]
  14. 14. FQL4KE: Self-learning controller for dynamic cloud environments • Learn how to react to uncertainties instead of encoding reaction policies • Self-learning the adaptation rules [Architecture: an Autonomic Controller wraps a Fuzzy Logic Controller (Fuzzifier → Inference Engine → Defuzzifier, backed by a Rule base) whose Knowledge is updated by Fuzzy Q-learning; Monitoring feeds the system state (workload w, response time rt, thresholds th, VMs vm) from the Cloud Application, and an Actuator applies scaling actions sa to the Cloud Platform toward the system goal] [QoSA16, CCGrid17]
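The learning step behind a controller like FQL4KE can be hinted at with a plain tabular Q-learning update (a toy sketch of the temporal-difference step only, ignoring fuzzification; the states, actions, and reward are illustrative, not FQL4KE's actual design):

```python
def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference update: the controller learns the value
    of taking scaling action a (e.g. -1/0/+1 VMs) in state s (e.g. a
    discretized workload level) from an observed reward (e.g. SLA
    satisfaction minus resource cost)."""
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])
    return Q[s][a]

# toy state/action space: workload level x scaling action (-1, 0, +1 VMs)
Q = {s: {a: 0.0 for a in (-1, 0, +1)} for s in ('low', 'high')}
# scaling out under high load paid off, so its value estimate rises
q_update(Q, 'high', +1, reward=1.0, s_next='low')
```

Instead of a user encoding "if CPU > 70% then +1 VM", the controller converges to such rules from observed rewards.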
  15. 15. Learning Software Performance Models
  16. 16. Previous work on performance transfer learning Performance modeling Modeling [Woodside, Johnston, Yigitbasi] Sampling: Random, Partial, Design of Experiment [Guo, Sarkar, Siegmund] Optimization: Recursive Random Sampling [Ye], Smart Hill Climbing [Xi], Direct search [Zheng], Quick Optimization via Guessing [Osogami], Bayesian Optimization [Jamshidi], Multi-objective optimization [Filieri] Learning: SVM [Yigitbasi], Decision tree [Nair], Fourier sparse functions [Zhang], Active learning [Siegmund], Search-based optimization and evolutionary algorithms [Henard, Wu] Performance analysis across environmental change Hardware: MapReduce [Yigitbasi], Anomaly detection [Stewart], Micro-benchmark measurements [Hoste], Configurable systems [Thereska] Transfer learning in performance modeling and analysis Performance predictions [BO4CO]; Configuration dependency [Chen] Model transfer [Valov]; Co-design exploration [Bodin] Configuration optimization [TL4CO] Transfer learning in software engineering Defect prediction [Krishna, Nam] Effort estimation [Kocaguneli] Transfer learning in machine learning [Jialin Pan, Torrey] Inductive Transfer Learning Unsupervised Transfer Learning Transductive Transfer Learning 21
  17. 17. Research gap • Transfer learning can make performance modeling and analysis more efficient. • My research concerns: • Transferring software performance models across heterogeneous environments (A cost-aware TL solution, 1st part of this talk) • “Why” and “when” transfer learning works (An exploratory study, 2nd part of this talk) 22
  18. 18. Configuration options influence performance 1- Many different parameters ⇒ large state space, interactions 2- Defaults are typically used ⇒ poor performance
  19. 19. Configuration options influence performance [Figure: latency (ms) over number of counters × number of splitters, cubic interpolation over a finer grid] In our experiments we observed improvement up to 100%
  20. 20. Adapt to different environments 26 TurtleBot
  21. 21. Adapt to different environments 27 TurtleBot 50+3*C1+15*C2-7*C2*C3
  22. 22. Adapt to different environments 28 TurtleBot 50+3*C1+15*C2-7*C2*C3
  23. 23. Classic sensitivity analysis 29 Measure TurtleBot
  24. 24. Classic sensitivity analysis 30 Measure Learn TurtleBot 50+3*C1+15*C2-7*C2*C3
  25. 25. Classic sensitivity analysis 31 Measure Learn TurtleBot 50+3*C1+15*C2-7*C2*C3
  26. 26. Classic sensitivity analysis 32 Measure Learn 50+3*C1+15*C2-7*C2*C3 TurtleBot Optimization + Reasoning + Debugging + Tuning
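The Measure → Learn step above can be sketched as ordinary least squares over the options and their interactions (a sketch with numpy; the model 50+3*C1+15*C2-7*C2*C3 is the slide's illustrative example, not real TurtleBot measurements):

```python
import itertools
import numpy as np

def true_perf(c1, c2, c3):
    # the slide's illustrative performance model
    return 50 + 3 * c1 + 15 * c2 - 7 * c2 * c3

# "Measure": evaluate all 8 configurations of three binary options
configs = np.array(list(itertools.product([0, 1], repeat=3)))
y = np.array([true_perf(*c) for c in configs])

# "Learn": design matrix with intercept, C1, C2, C3, and the C2*C3
# interaction term; least squares recovers the influence of each option
X = np.column_stack([
    np.ones(len(configs)),
    configs[:, 0], configs[:, 1], configs[:, 2],
    configs[:, 1] * configs[:, 2],
])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The fitted coefficients expose option influences and interactions, which is what makes the learned model usable for optimization, reasoning, debugging, and tuning.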
  27. 27. Measuring performance is expensive 33 25 options × 10 values each = 10^25 configurations Measure
  28. 28. Transfer Learning for Performance Modeling and Analysis
  29. 29. Reuse data from cheaper sources Measure TurtleBot Data
  30. 30. Reuse data from cheaper sources Measure TurtleBot Measure Simulator (Gazebo) Data Data
  31. 31. Reuse data from cheaper sources 37 Measure TurtleBot Measure Simulator (Gazebo) Data Reuse DataData
  32. 32. Reuse data from cheaper sources 38 Measure Learn with TL TurtleBot 50+3*C1+15*C2-7*C2*C3 Measure Simulator (Gazebo) Data Reuse DataData
  33. 33. Reuse data from cheaper sources 39 Measure Learn with TL TurtleBot 50+3*C1+15*C2-7*C2*C3 Measure Simulator (Gazebo) Data Reuse DataData [SEAMS17]
  34. 34. Transfer between different systems/environments 40 Simulator Noisy simulator Cheaper robot Different mission
  35. 35. Technical Details
  36. 36. GP for modeling a black-box response function [Figure legend: true function, GP mean, GP variance, observation, selected point, true minimum] Motivations: 1- mean estimates + variance 2- all computations are linear algebra 3- good estimations when few data. A GP model is composed of a prior mean (µ(·): X → ℝ) and a covariance function (k(·,·): X × X → ℝ): y = f(x) ~ GP(µ(x), k(x, x′)). Let S1:t = {(x1:t, y1:t) | yi := f(xi)} be the collection of t observations; treating f(x) as a random variable conditioned on S1:t, the posterior mean and variance are µt(x) = µ(x) + k(x)ᵀ(K + σ²I)⁻¹(y − µ) and σt²(x) = k(x, x) + σ² − k(x)ᵀ(K + σ²I)⁻¹k(x), where y := y1:t, k(x)ᵀ = [k(x, x1) k(x, x2) … k(x, xt)], µ := µ(x1:t), K := [k(xi, xj)], and I is the identity matrix. BO4CO exploits single-task GPs (no transfer learning); its shortcoming is that it cannot exploit observations from other versions of the system. TL4CO extends this to multi-task GPs (MTGPs) that exploit observations from previous versions of the system under test: (i) select the most informative past observations, (ii) fit a model to existing data via kernel learning, and (iii) select the next configuration based on the model. GP models are effective in data-scarce domains but become inaccurate when samples do not cover the space uniformly. For transfer learning, the key question is how to make accurate predictions for the target using observations Ds from other sources; we need a measure of relatedness not only between input configurations but between the source and target functions g, f. This is captured by the kernel k(f, g, x, x′) = kt(f, g) × kxx(x, x′), where kt represents the correlation between source and target functions and kxx is the covariance over configurations, whose parameters are learned by maximizing the marginal likelihood. In a self-adaptation loop following the MAPE-K framework, the GP model is the K (knowledge) component: the managed system is Monitored (end-to-end latency, throughput), the data is Analysed and stored, the GP model is updated — not only with new observations but also by transferring learning from related, cheaper sources such as a simulator, guided by a cost model — a new configuration may be Planned, and finally Executed via platform-specific operations, accelerating learning at each adaptation cycle.
  37. 37. Exploiting Similarity
  38. 38. Function prediction 44 [Figure: target function with a few target samples]
  39. 39. Prediction without transfer learning 45
  40. 40. Prediction with more data without transfer learning 46
  41. 41. Prediction with transfer learning 47
  42. 42. Benefits of Transfer Learning
  43. 43. Transfer learning improves sampling 49 Sample where uncertainty is high to gain more information
  44. 44. Transfer learning improves optimization [Figure: (a) 3 sample response functions (1), (2), (3) over the configuration domain, with observations; (b) GP fit for (1) ignoring observations for (2),(3) — LCB not informative; (c) multi-task GP fit for (1) by transfer learning from (2),(3) — highly informative; legend: GP prediction mean, GP prediction variance, probability distribution of the minimizers]
  45. 45. When TL Can Go Wrong!
  46. 46. Transfer from a misleading source 52 When no transfer is better than a bad transfer!
  47. 47. Transfer from a misleading source 53 When bad transfer leads to high uncertainty for the predictive model
  48. 48. Evaluation
  49. 49. Case Study and Controlled Experiments •RQ1: Improve prediction accuracy? •RQ2: Tradeoffs among source and target samples? •RQ3: Fast enough for self-adaptive systems? 55
  50. 50. Subject systems 56 • Autonomous service robot • Environmental change • 3 stream processing apps • Workload change • NoSQL • Workload & hardware change
  51. 51. Prediction Accuracy
  52. 52. Performance prediction for CoBot 58 [Figure: CPU usage [%] response surface]
  53. 53. Performance prediction for CoBot 59 [Figures: Source vs. Target, CPU usage [%]]
  54. 54. Performance prediction for CoBot 60 [Figures: Source, Target, Prediction with 4 samples — CPU usage [%]]
  55. 55. Performance prediction for CoBot 61 [Figures: Source, Target, Prediction with 4 samples, Prediction with TL — CPU usage [%]]
  56. 56. Tradeoff between Source and Target Samples
  57. 57. Prediction error with different source and target samples 63 [heatmap over % source / % target samples]
  58. 58. Prediction error with different source and target samples 64
  59. 59. Prediction error with different source and target samples 65
  60. 60. Prediction error with different source and target samples 66
  61. 61. Prediction error of other systems 67 CoBot WordCount SOL RollingSort Cassandra (HW) Cassandra (DB)
  62. 62. Take away from our TL approach •Reuse data from similar system •Improves model accuracy/reliability •Decrease learning cost 68 Measure Learn with TL TurtleBot 50+3*C1+15*C2-7*C2*C3 Measure Simulator (Gazebo) Data Reuse DataData
  63. 63. “Why” and “When” Transfer Learning Works: An Exploratory Analysis
  64. 64. Small vs severe environmental changes • Workload change (New tasks or missions, new environmental conditions) • Infrastructure change (New Intel NUC, new Camera, new Sensors) • Code change (new versions of ROS, new localization algorithm) • Combination of these changes 70
  65. 65. Transferable knowledge 71 Source (Given) → Target (Learn): Model, Data → Transferable Knowledge. Understanding the performance behavior of configurable software systems enables (i) performance debugging, (ii) performance tuning, (iii) design-time evolution, and (iv) runtime adaptation; we lack empirical understanding of how a system's performance behavior varies when its environment changes. Preliminary concepts: let Fi be the i-th feature (option) of a configurable system A, either enabled or disabled, with one holding by default; the configuration space is the Cartesian product C = Dom(F1) × · · · × Dom(Fd) with Dom(Fi) = {0, 1}, and a configuration is a member of this space in which all parameters are assigned specific values. An environment instance e = [w, h, v] is drawn from the environment space E = W × H × V, whose components represent sets of possible values for workload, hardware, and system version. A performance model is a black-box function f: C × E → ℝ learned from observations of the system's performance: running A in environment e ∈ E on various configurations xi records yi = f(xi) + εi with εi ~ N(0, σi), and the training data is simply Dtr = {(xi, yi)}, i = 1..n. A performance distribution pd: E → Δ(ℝ) is a stochastic process defining a probability distribution over performance measures for each environmental condition. Extract f(·, es), reuse for f(·, et). We hypothesize that we can learn and transfer different forms of knowledge across environments, while so far only simple transfers have been attempted!
  66. 66. Exploratory study • Study performance models across environments • Measure the performance of each system using standard benchmarks • 36 comparisons of environmental changes: • Different hardware • Different workloads • Different versions • Severity of environmental changes [5 levels from small to severe] • Less severe changes -> more related models -> easier to transfer • More severe changes -> less related models -> difficult to transfer 72
  67. 67. Empirical analysis of relatedness • 4 general research questions covering: • RQ1: Entire configuration space • RQ2: Option (feature) specific • RQ3: Option (feature) interactions • RQ4: Invalid configurations • Different assumptions about the relatedness as hypotheses. • For each hypothesis: • Analyze environmental changes in four subject systems. • Discuss how commonly we identify this kind of relatedness. 73
  68. 68. Subject systems 74
  TABLE I: Overview of the real-world subject systems.
  System | Domain        | d  | |C|    | |H| | |W| | |V|
  SPEAR  | SAT solver    | 14 | 16,384 | 3   | 4   | 2
  x264   | Video encoder | 16 | 4,000  | 2   | 3   | 3
  SQLite | Database      | 14 | 1,000  | 2   | 14  | 2
  SaC    | Compiler      | 50 | 71,267 | 1   | 10  | 1
  d: configuration options; C: configurations; H: hardware environments; W: analyzed workloads; V: analyzed versions.
  69. 69. Level of relatedness between source and target is important 75
  [Fig. 6: Prediction accuracy of the model learned with samples from sources of different relatedness to the target; GP is the model without transfer learning.]
  source       | s     | s1    | s2    | s3    | s4    | s5    | s6
  noise level  | 0     | 5     | 10    | 15    | 20    | 25    | 30
  corr. coeff. | 0.98  | 0.95  | 0.89  | 0.75  | 0.54  | 0.34  | 0.19
  µ(pe)        | 15.34 | 14.14 | 17.09 | 18.71 | 33.06 | 40.93 | 46.75
  • Model becomes more accurate when the source is more related to the target
  • Even learning from a source with a small correlation is better than no transfer
  70. 70. 76
[Figure: six CPU-usage [%] response surfaces over the configuration space (number of particles × number of refinements), measured in environments of varying relatedness to the target]
Less related -> less accurate
More related -> more accurate
  71. 71. RQ1: Does performance behavior stay consistent across environments? For non-severe hardware changes, we can linearly transfer performance models across environments: f_t = α × f_s + β 77
M1: Pearson linear correlation
M2: Kullback-Leibler (KL) divergence
M3: Spearman correlation coefficient
M4/M5: Percentage of top/bottom configurations
[Table II: RQ1 metrics M1–M5 per environmental change (ec) and expected severity (ES) for SPEAR, x264, SQLite, and SaC]
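The linear transfer function f_t = α × f_s + β can be estimated from a handful of configurations measured in both environments. A minimal sketch; the response values and the true coefficients (1.8, 4.0) are hypothetical:

```python
import numpy as np

# Hypothetical paired measurements: the same 30 configurations benchmarked
# in the source environment (e.g., old hardware) and in the target.
rng = np.random.default_rng(0)
f_src = rng.uniform(10.0, 50.0, size=30)              # source response times
f_tgt = 1.8 * f_src + 4.0 + rng.normal(0.0, 0.5, 30)  # target ~ linear in source

# Fit the transfer function f_t = alpha * f_s + beta by least squares.
alpha, beta = np.polyfit(f_src, f_tgt, deg=1)

# Predict the target response of an unmeasured configuration from its
# cheap source measurement alone.
f_hat = alpha * 25.0 + beta
```

Once α and β are fit, every source measurement becomes a free prediction for the target, which is why non-severe changes are cheap to handle.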
  72. 72. RQ1: Does performance behavior stay consistent across environments? For severe environmental changes, the performance distributions are similar, showing the potential for learning a non-linear transfer function. 78
M1: Pearson linear correlation
M2: Kullback-Leibler (KL) divergence
M3: Spearman correlation coefficient
M4/M5: Percentage of top/bottom configurations
[Table II: RQ1 metrics M1–M5 per environmental change for the four subject systems]
  73. 73. RQ1: Does performance behavior stay consistent across environments? The configurations retain their relative performance profile across hardware platforms. 79
M1: Pearson linear correlation
M2: Kullback-Leibler (KL) divergence
M3: Spearman correlation coefficient
M4/M5: Percentage of top/bottom configurations
[Table II: RQ1 metrics M1–M5 per environmental change for the four subject systems]
  74. 74. RQ1: Does performance behavior stay consistent across environments? Only hardware changes preserve top configurations across environments. 80
M1: Pearson linear correlation
M2: Kullback-Leibler (KL) divergence
M3: Spearman correlation coefficient
M4/M5: Percentage of top/bottom configurations
[Table II: RQ1 metrics M1–M5 per environmental change, including SaC workload and hardware descriptions]
  75. 75. RQ2: Is the influence of options on performance consistent across environments? Only a subset of options are influential, and a large proportion of influential options are preserved. 81
M6/M7: Number of influential options in source and target
M8/M9: Number of options that agree/disagree
M10: Correlation between importance of options
[Table II: RQ2 metrics M6–M10 per environmental change for the four subject systems]
  76. 76. RQ2: Is the influence of options on performance consistent across environments? The strength of the influence of configuration options is typically preserved across environments. 82
M6/M7: Number of influential options in source and target
M8/M9: Number of options that agree/disagree
M10: Correlation between importance of options
[Table II: RQ2 metrics M6–M10 per environmental change for the four subject systems]
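An M10-style check can be sketched by estimating each option's influence per environment (e.g., as a linear-model coefficient) and correlating the two coefficient vectors. All data below are fabricated for illustration; the slide's real numbers come from the study's subject systems:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical binary configuration matrix: 100 samples x 5 options.
X = rng.integers(0, 2, size=(100, 5)).astype(float)

# Source and target share which options matter (options 0 and 2);
# only the magnitude of their influence differs across environments.
y_src = 10 + 8 * X[:, 0] + 3 * X[:, 2] + rng.normal(0, 0.1, 100)
y_tgt = 25 + 12 * X[:, 0] + 5 * X[:, 2] + rng.normal(0, 0.1, 100)

# Least-squares influence estimate (one coefficient per option) per environment.
A = np.column_stack([np.ones(len(X)), X])
coef_src = np.linalg.lstsq(A, y_src, rcond=None)[0][1:]
coef_tgt = np.linalg.lstsq(A, y_tgt, rcond=None)[0][1:]

# Do important options stay important across environments?
importance_corr = np.corrcoef(coef_src, coef_tgt)[0, 1]
```

A high correlation here is the kind of evidence RQ2 reports: the same few options dominate performance in both environments.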
  77. 77. RQ3: Are the interactions among configuration options preserved across environments? • A low percentage of potential interactions are relevant for performance. • The importance of interactions is typically preserved across environments. 83
M11: Number of interactions in the source
M12: Number of interactions in the target
M13: Number of interactions that agree
M14: Correlation between the coefficients
[Table II: RQ3 metrics M11–M14 per environmental change for the four subject systems]
  78. 78. RQ4: Are the invalid configurations consistent across environments? • A large percentage of configurations is typically invalid. • Information for distinguishing invalid regions can be transferred across environments. 84
M15/M16: Percentage of invalid configurations in the source and target
M17: Percentage of invalid configurations common between environments
M18: Correlation between the coefficients of the classification models
[Table II: RQ4 metrics M15–M18 per environmental change for the four subject systems]
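The M17-style overlap behind RQ4 can be sketched directly: enumerate a small configuration space, mark invalid configurations in each environment, and measure how much of the source's invalid region carries over. The 4-option space and the "options 0 and 1 conflict" constraint are invented for illustration:

```python
import numpy as np
from itertools import product

# Hypothetical 4-option binary configuration space (16 configurations).
configs = np.array(list(product([0, 1], repeat=4)))

# Assume a configuration is invalid when options 0 and 1 are both enabled;
# the target keeps that constraint and adds one environment-specific case.
invalid_src = (configs[:, 0] & configs[:, 1]).astype(bool)
invalid_tgt = invalid_src.copy()
invalid_tgt[0] = True  # one extra invalid configuration only in the target

# Share of source-invalid configurations that are also invalid in the target.
overlap = (invalid_src & invalid_tgt).sum() / invalid_src.sum()
```

When the overlap is high, a classifier of invalid regions learned in the source lets us skip doomed measurements in the target.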
  79. 79. Implications for transfer learning • When and why TL works for performance modeling • Small environmental changes -> Performance behavior is consistent • A linear transformation of performance models provides a good approximation • Large environmental changes -> Individual options and interactions may stay consistent • A non-linear mapping between performance behavior across environments • Severe environmental changes -> Transferable knowledge can still be found • Invalid configurations provide opportunities for avoiding measurements • Intuitive judgments about transferability of knowledge • Without deep knowledge about the configuration or implementation 85
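The linear-transformation finding for small environmental changes can be sketched as a two-parameter fit. This is a minimal sketch in plain Python under invented assumptions: the toy source model and the "target is roughly 2x the source plus an overhead of 10" relationship are illustrative, not measured data:

```python
# Sketch: transfer a source performance model to a target environment by
# fitting f_target(c) ~ alpha * f_source(c) + beta from a few target
# measurements. Source model and measurements are hypothetical.

def source_model(config):
    # Hypothetical learned source model: response time from two options.
    c1, c2 = config
    return 50 + 3 * c1 + 15 * c2

def fit_linear_transfer(configs, target_measurements):
    """Least-squares fit of alpha, beta mapping source-model
    predictions to target measurements."""
    xs = [source_model(c) for c in configs]
    ys = target_measurements
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    alpha = cov / var
    beta = mean_y - alpha * mean_x
    return alpha, beta

# A handful of target measurements (hypothetical): target behaves like
# 2x the source plus a constant overhead of 10.
configs = [(0, 0), (1, 0), (0, 1), (2, 3)]
targets = [2 * source_model(c) + 10 for c in configs]
alpha, beta = fit_linear_transfer(configs, targets)

def transferred_model(config):
    return alpha * source_model(config) + beta

print(round(alpha, 2), round(beta, 2))  # -> 2.0 10.0
```

The point of the sketch is the cost asymmetry: only two parameters (alpha, beta) need target measurements, while the expensive structure of the model is reused from the source.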
  80. 80. Future Work, Insights and Ideas
  81. 81. Future research opportunities • Sampling strategies • More informative samples • Exploiting the importance of specific regions or avoiding invalid regions • Learning mechanisms • Learning either linear or non-linear associations • Performance testing and debugging • Transferring interesting test cases that cover interactions between options • Performance tuning and optimization • Identifying the interacting options • Importance sampling exploiting feature interactions 87
  82. 82. Selecting from multiple sources 88 Source Simulator Target Simulator Source Robot Target Robot C1 C3 C2 - Different costs associated with the sources - The problem is to sample from the appropriate source to gain the most information given a limited budget
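The budget-constrained source-selection problem on this slide can be sketched with a greedy value-per-cost heuristic. The costs, information values, and the halving "diminishing returns" rule below are all illustrative assumptions, not part of the talk:

```python
# Sketch: given sources with different sampling costs (e.g. a cheap
# simulator vs. an expensive real robot), greedily take samples from
# whichever source offers the best information gain per unit of cost,
# until the measurement budget is exhausted. All numbers hypothetical.

def select_samples(sources, budget):
    """sources: dict name -> (cost_per_sample, value_per_sample).
    Returns a sampling plan and the leftover budget."""
    plan = {name: 0 for name in sources}
    remaining = budget
    while True:
        affordable = {n: (c, v) for n, (c, v) in sources.items()
                      if c <= remaining}
        if not affordable:
            break
        # Best information gain per unit of budget.
        best = max(affordable, key=lambda n: affordable[n][1] / affordable[n][0])
        plan[best] += 1
        remaining -= affordable[best][0]
        # Diminishing returns: each further sample from the same source
        # is assumed to be worth half as much.
        c, v = sources[best]
        sources[best] = (c, v * 0.5)
    return plan, remaining

sources = {
    "simulator": (1.0, 2.0),    # cheap, moderately informative
    "real_robot": (10.0, 15.0)  # expensive, highly informative
}
plan, left = select_samples(dict(sources), budget=25.0)
print(plan)  # -> {'simulator': 5, 'real_robot': 2}
```

Note how the heuristic interleaves the two sources rather than exhausting the cheap one first, which mirrors the slide's point that the right source depends on the remaining budget and on what has already been learned.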
  83. 83. Active learning with transfer learning 89 Measure Learn with TL TurtleBot 50+3*C1+15*C2-7*C2*C3 Measure Simulator (Gazebo) Data Reuse Data Data Iteratively find the best sample points that maximize knowledge gain
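The "iteratively find best sample points" loop can be sketched with query-by-committee disagreement standing in for the acquisition function. The two committee models are hypothetical (loosely inspired by the slide's 50+3*C1+15*C2-7*C2*C3 model); the real system would use, for example, a Gaussian process seeded with data reused from the simulator:

```python
# Sketch: active learning via committee disagreement. The next sample
# is the unmeasured configuration where models transferred from the
# simulator disagree most. Models and candidates are hypothetical.

def committee_disagreement(models, config):
    preds = [m(config) for m in models]
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)

def next_sample(models, candidates, measured):
    """Pick the unmeasured candidate with maximal disagreement."""
    pool = [c for c in candidates if c not in measured]
    return max(pool, key=lambda c: committee_disagreement(models, c))

# Two hypothetical models that agree on small configurations but
# diverge on large ones, due to a suspected interaction term.
models = [
    lambda c: 50 + 3 * c[0] + 15 * c[1],
    lambda c: 50 + 3 * c[0] + 15 * c[1] - 7 * c[0] * c[1],
]
candidates = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(next_sample(models, candidates, measured={(0, 0)}))  # -> (3, 3)
```

The loop then measures the chosen configuration on the TurtleBot, refits, and repeats, so each expensive measurement goes where the transferred knowledge is least certain.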
  84. 84. Integrating transfer learning in MAPE-K 90 • Contribute to Knowledge • Assist in self-optimization • Support online learning Knowledge Environment Monitoring (+power) Analysis (+power) Planning (+recharge) Execution Accuracy Architecture Task Power
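The MAPE-K integration on this slide can be sketched as a minimal loop skeleton whose shared Knowledge component holds a performance model seeded by transfer learning. The class and method names, the power threshold, and the toy model are illustrative assumptions, not an existing framework:

```python
# Sketch: a MAPE-K loop where Knowledge starts from a transferred
# performance model and is updated online. All names hypothetical.

class Knowledge:
    def __init__(self, transferred_model):
        # Seed with a model transferred from a related environment.
        self.model = transferred_model
        self.observations = []

    def update(self, config, measurement):
        self.observations.append((config, measurement))
        # An online-learning step would adjust self.model here.

class MapeK:
    def __init__(self, knowledge, power_threshold):
        self.k = knowledge
        self.power_threshold = power_threshold

    def monitor(self, config, measurement):
        self.k.update(config, measurement)   # contribute to Knowledge
        return measurement

    def analyze(self, measurement):
        # e.g. power draw above budget triggers adaptation.
        return measurement > self.power_threshold

    def plan(self, candidates):
        # Use the (transferred) model to pick the cheapest prediction.
        return min(candidates, key=self.k.model)

    def execute(self, config):
        return config  # apply the reconfiguration

k = Knowledge(transferred_model=lambda c: 50 + 3 * c[0] + 15 * c[1])
loop = MapeK(k, power_threshold=100)
m = loop.monitor((2, 3), measurement=120)
if loop.analyze(m):
    new_config = loop.execute(loop.plan([(2, 3), (1, 1), (0, 2)]))
print(new_config)  # -> (1, 1)
```

The design choice the slide highlights shows up in the sketch: the transferred model lives in Knowledge, so Planning can act on it immediately while Monitoring keeps feeding observations back for online refinement.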
  85. 85. Summary
  86. 86. Recap of my previous work 92 RobusT2Scale/FQL4KE [PhD] ✓ Engineering / technical ✓ Maintains application responsiveness ✓ Handles environmental uncertainties ✓ Enables knowledge evolution ✗ Learns slowly when the situation changes BO4CO/TL4CO [Postdoc1@Imperial] ✓ Mathematical modeling ✓ Finds optimal configuration given a measurement budget ✓ Step-wise active learning ✗ Doesn't scale well to high dimensions ✗ Expensive to learn Goal ✔ Industry-relevant research ✔ Learn accurate/reliable/cheap performance models ✔ Learned model is used for performance tuning/debugging/optimization/runtime adaptation Transfer Learning [Postdoc2@CMU] ✓ Empirical ✓ Learns accurate and reliable models from “related” sources ✓ Reuse learning across environmental changes ✗ For severe environmental changes, transfer is limited, but possible! [SEAMS14,QoSA16,CCGrid17,TAAS] [MASCOTS16,WICSA16] [SEAMS17]
  87. 87. Acknowledgements 93 Miguel Velez PhD student, CMU Christian Kaestner Professor, CMU Norbert Siegmund Professor, Bauhaus-Weimar
