
Architectural Tradeoff in Learning-Based Software

In classical software development, developers write explicit instructions in a programming language to hardcode the behavior of a software system. By writing each line of code, the programmer pins the software to the desired behavior at a single, explicit point in program space.

Recently, however, software systems have begun to incorporate learning components that, instead of hardcoding an explicit behavior, learn a behavior from data. Such learning-intensive software systems are written in terms of models whose parameters must be adjusted based on data. In learning-enabled systems, we specify constraints on the behavior of a desirable program (e.g., a data set of input–output example pairs) and spend computational resources searching the program space for a program that satisfies those constraints. In neural networks, we restrict the search to a continuous subset of the program space.
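
To make the contrast concrete, here is a minimal sketch (my illustration, not from the talk): a hardcoded rule pins one point in program space, while a learned rule is found by searching a parameterized program space against input–output examples.

    # Classical: the developer hardcodes one point in program space.
    def is_spam_hardcoded(subject: str) -> bool:
        return "free money" in subject.lower()

    # Learning-based: specify constraints (input-output examples) and search a
    # parameterized, continuous program space for a program that satisfies them.
    import numpy as np

    examples = [("win free money now", 1), ("meeting at noon", 0),
                ("free money inside", 1), ("lunch tomorrow?", 0)]
    words = ["free", "money", "meeting", "lunch"]

    def featurize(subject: str) -> np.ndarray:
        return np.array([float(w in subject.lower()) for w in words])

    X = np.stack([featurize(s) for s, _ in examples])
    y = np.array([label for _, label in examples], dtype=float)

    # Least-squares search over the continuous parameter space of linear models.
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)

    def is_spam_learned(subject: str) -> bool:
        return bool(featurize(subject) @ weights > 0.5)
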

This talk provides experimental evidence of making such tradeoffs for deep neural network models, using deep neural network architectures as a case study. Concrete experimental results are presented, along with additional case studies in big data (Storm, Cassandra), data analytics (configurable boosting algorithms), and robotics applications.


Architectural Tradeoff in Learning-Based Software

  1. 1. Architectural Tradeoff in Learning-Based Software Pooyan Jamshidi Carnegie Mellon University cs.cmu.edu/~pjamshid SATURN 2018 Texas, May 8th, 2018
  2. 2. Who am I?
  3. 3. Who am I?
  4. 4. Who am I?
  5. 5. I hang around here! Software Arch. Machine Learning Distributed Systems
  6. 6. Goal: Enable developers/users to find the right quality tradeoff
  7. 7. Today’s most popular (ML) systems are configurable built
  8. 8. Today’s most popular (ML) systems are configurable built
  9. 9. Today’s most popular (ML) systems are configurable built
  10. 10. Today’s most popular (ML) systems are configurable built
  11. 11. Today’s most popular (ML) systems are configurable built
  12. 12. Empirical observations confirm that systems are becoming increasingly configurable [plots: number of parameters over release time for Apache and Hadoop (MapReduce, HDFS)] [Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
  13. 13. Empirical observations confirm that systems are becoming increasingly configurable [plots: number of parameters over release time for Storage-A, MySQL, Apache, and Hadoop (MapReduce, HDFS); from a study of thousands of customers of a commercial storage system and several open-source projects] [Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
  14. 14. “all constants should be configurable, even if we can’t see any reason to configure them.” — HDFS-4304
  15. 15. Configurations determine the performance behavior [code figure] void Parrot_setenv(. . . name, . . . value) { #ifdef PARROT_HAS_SETENV my_setenv(name, value, 1); #else int name_len = strlen(name); int val_len = strlen(value); char *envs = glob_env; if (envs == NULL) { return; } strcpy(envs, name); strcpy(envs + name_len, "="); strcpy(envs + name_len + 1, value); putenv(envs); #endif }
  16. 16. Configurations determine the performance behavior [same code figure as slide 15]
  17. 17. Configurations determine the performance behavior [same code figure]
  18. 18. Configurations determine the performance behavior [same code figure]
  19. 19. Configurations determine the performance behavior [same code figure, annotated: Speed, Energy]
  20. 20. Configurations determine the performance behavior [same code figure, annotated: Speed, Energy]
  21. 21. Configurations determine the performance behavior [same code figure, annotated: Speed, Energy]
  22. 22. How do we understand performance behavior of real-world highly-configurable systems that scale well…
  23. 23. How do we understand performance behavior of real-world highly-configurable systems that scale well… … and enable developers/users to reason about qualities (performance, energy) and to make tradeoff?
  24. 24. Outline Case Study
  25. 25. Outline Case Study Transfer Learning [SEAMS’17]
  26. 26. Outline Case Study Transfer Learning Theory Building [SEAMS’17] [ASE’17]
  27. 27. Outline Case Study Transfer Learning Theory Building Guided Sampling [SEAMS’17] [ASE’17]
  28. 28. Outline Case Study Transfer Learning Theory Building Guided Sampling Future Directions [SEAMS’17] [ASE’17]
  29. 29. SocialSensor •Identifying trending topics •Identifying user defined topics •Social media search
  30. 30. SocialSensor Internet
  31. 31. SocialSensor Crawling Tweets: [5k-20k/min] Crawled items Internet
  32. 32. SocialSensor Crawling Tweets: [5k-20k/min] Store Crawled items Internet
  33. 33. SocialSensor Orchestrator Crawling Tweets: [5k-20k/min] Every 10 min: [100k tweets] Store Crawled items Fetch Internet
  34. 34. SocialSensor Content Analysis Orchestrator Crawling Tweets: [5k-20k/min] Every 10 min: [100k tweets] Store Push Crawled items Fetch Internet
  35. 35. SocialSensor Content Analysis Orchestrator Crawling Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Store Push Store Crawled items Fetch Internet
  36. 36. SocialSensor Content Analysis Orchestrator Crawling Search and Integration Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Fetch Store Push Store Crawled items Fetch Internet
  37. 37. Challenges Content Analysis Orchestrator Crawling Search and Integration Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Fetch Store Push Store Crawled items Fetch Internet
  38. 38. Challenges Content Analysis Orchestrator Crawling Search and Integration Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Fetch Store Push Store Crawled items Fetch Internet 100X
  39. 39. Challenges Content Analysis Orchestrator Crawling Search and Integration Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Fetch Store Push Store Crawled items Fetch Internet 100X 10X
  40. 40. Challenges Content Analysis Orchestrator Crawling Search and Integration Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Fetch Store Push Store Crawled items Fetch Internet 100X 10X Real time
  41. 41. How can we gain better performance without using more resources?
  42. 42. Let’s try out different system configurations!
  43. 43. Opportunity: Data processing engines in the pipeline were all configurable
  44. 44. Opportunity: Data processing engines in the pipeline were all configurable
  45. 45. Opportunity: Data processing engines in the pipeline were all configurable
  46. 46. Opportunity: Data processing engines in the pipeline were all configurable
  47. 47. Opportunity: Data processing engines in the pipeline were all configurable > 100 > 100 > 100
  48. 48. Opportunity: Data processing engines in the pipeline were all configurable > 100 > 100 > 100 options, i.e., a configuration space on the order of 2^300
  49. 49. Default configuration was bad, and so was the expert’s [scatter: throughput (ops/sec) vs. average write latency (s); higher throughput and lower latency are better]
  50. 50. Default configuration was bad, and so was the expert’s [same plot, Default marked]
  51. 51. Default configuration was bad, and so was the expert’s [same plot, Default and Recommended by an expert marked]
  52. 52. Default configuration was bad, and so was the expert’s [same plot, Default, Recommended by an expert, and Optimal Configuration marked]
  53. 53. Default configuration was bad, and so was the expert’s [scatter: throughput (ops/sec, ×10^4) vs. latency (ms); higher throughput and lower latency are better]
  54. 54. Default configuration was bad, and so was the expert’s [same plot, Default marked]
  55. 55. Default configuration was bad, and so was the expert’s [same plot, Default and Recommended by an expert marked]
  56. 56. Default configuration was bad, and so was the expert’s [same plot, Default, Recommended by an expert, and Optimal Configuration marked]
  57. 57. Why is this an important problem?
  58. 58. Why is this an important problem? Significant time saving
  59. 59. Why is this an important problem? Significant time saving • 2X-10X faster than worst
  60. 60. Why is this an important problem? Significant time saving • 2X-10X faster than worst • Noticeably faster than median
  61. 61. Why is this an important problem? Significant time saving • 2X-10X faster than worst • Noticeably faster than median • Default is bad
  62. 62. Why is this an important problem? Significant time saving • 2X-10X faster than worst • Noticeably faster than median • Default is bad • Expert’s is not optimal
  63. 63. Why is this an important problem? Significant time saving • 2X-10X faster than worst • Noticeably faster than median • Default is bad • Expert’s is not optimal
  64. 64. Why is this an important problem? Significant time saving • 2X-10X faster than worst • Noticeably faster than median • Default is bad • Expert’s is not optimal Large configuration space
  65. 65. Why is this an important problem? Significant time saving • 2X-10X faster than worst • Noticeably faster than median • Default is bad • Expert’s is not optimal Large configuration space • Exhaustive search is expensive
  66. 66. Why is this an important problem? Significant time saving • 2X-10X faster than worst • Noticeably faster than median • Default is bad • Expert’s is not optimal Large configuration space • Exhaustive search is expensive • Specific to hardware/workload/version
  67. 67. What happened in the end? • Achieved the objectives (100X users, same experience) • Saved money by reducing cloud resources by up to 20% • Our tool identified configurations that were consistently better than the expert’s recommendations
  68. 68. Outline Case Study Transfer Learning Theory Building Guided Sampling Future Directions
  69. 69. To enable performance tradeoffs, we need a model to reason about qualities [code figure: Parrot_setenv, as on slide 15] f(·) = 5 + 3 × o1; execution time (s)
  70. 70. To enable performance tradeoffs, we need a model to reason about qualities [same code figure] f(·) = 5 + 3 × o1; execution time (s); f(o1 := 1) = 8
  71. 71. To enable performance tradeoffs, we need a model to reason about qualities [same code figure] f(·) = 5 + 3 × o1; execution time (s); f(o1 := 0) = 5, f(o1 := 1) = 8
  72. 72. What is a performance model? f : C → R; f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2; c = <o1, o2>
  73. 73. What is a performance model? f : C → R; f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2; c = <o1, o2>
  74. 74. What is a performance model? f : C → R; f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2; c = <o1, o2>
  75. 75. What is a performance model? f : C → R; f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2; c = <o1, o2>; c = <o1, o2, ..., o10>
  76. 76. What is a performance model? f : C → R; f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2; c = <o1, o2>; c = <o1, o2, ..., o10>; c = <o1, o2, ..., o100>; ···
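
Read as code, a performance model is just a function from configurations to a real-valued metric. A minimal Python sketch of the slide's toy model (the minus sign on the interaction term is my reconstruction of the garbled formula):

    from itertools import product

    def perf_model(o1: int, o2: int) -> float:
        """Toy model f(o1, o2) = 5 + 3*o1 + 15*o2 - 7*o1*o2 (execution time in s)."""
        return 5 + 3 * o1 + 15 * o2 - 7 * o1 * o2

    # The configuration space C is the Cartesian product of the option domains.
    for c in product([0, 1], repeat=2):
        print(c, perf_model(*c))
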
  77. 77. How do we learn performance models? TurtleBot
  78. 78. How do we learn performance models? Measure TurtleBot Configurations
  79. 79. How do we learn performance models? Measure → Learn; TurtleBot; Configurations; f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2
  80. 80. How do we learn performance models? Measure → Learn → Optimization / Reasoning / Debugging; TurtleBot; Configurations; f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2
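
A minimal sketch of this measure-then-learn loop, assuming a hypothetical measure() hook that benchmarks one configuration (here a noisy synthetic response standing in for a real TurtleBot run):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    def measure(config):
        o1, o2 = config  # placeholder for an actual benchmark run
        return 5 + 3 * o1 + 15 * o2 - 7 * o1 * o2 + np.random.normal(0, 0.1)

    configs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([measure(c) for c in configs])

    # Linear model with pairwise interaction terms, mirroring f(o1, o2) above.
    X = PolynomialFeatures(degree=2, interaction_only=True,
                           include_bias=False).fit_transform(configs)
    model = LinearRegression().fit(X, y)
    print(model.intercept_, model.coef_)  # roughly 5 and [3, 15, -7]
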
  81. 81. Insight: Performance measurements of the real system are “similar” to the ones from the simulators — Measure; Configurations; Performance
  82. 82. Insight: Performance measurements of the real system are “similar” to the ones from the simulators — Measure; Configurations; Performance
  83. 83. Insight: Performance measurements of the real system are “similar” to the ones from the simulators — Measure; Simulator (Gazebo); Data; Configurations; Performance. So why not reuse these data, instead of measuring on the real robot?
  84. 84. We developed methods to make learning cheaper via transfer learning — Source (given): model and data; extract transferable knowledge; reuse it to learn the Target. [figure: intuition excerpt from the SEAMS’17 paper, defining the configuration space C, environments e = [w, h, v] over workload, hardware, and version, a performance model f : F × E → R learned from noisy observations yi = f(xi) + εi, and an empirical performance distribution over environments] Goal: Gain strength by transferring information across environments
  85. 85. TurtleBot [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Our transfer learning solution
  86. 86. TurtleBot Simulator (Gazebo) [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Our transfer learning solution
  87. 87. Data Measure TurtleBot Simulator (Gazebo) [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Our transfer learning solution
  88. 88. DataData Data Measure Measure Reuse TurtleBot Simulator (Gazebo) [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Configurations Our transfer learning solution
  89. 89. Data Data Data Measure Measure Reuse Learn TurtleBot Simulator (Gazebo) [P. Jamshidi, et al., “Transfer learning for improving model predictions ….”, SEAMS’17] Configurations Our transfer learning solution f(o1, o2) = 5 + 3·o1 + 15·o2 − 7·o1·o2
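
A minimal sketch of the reuse idea, deliberately simplified from the SEAMS'17 approach (which uses Gaussian processes): fit a model on plentiful, cheap simulator data, then learn a small correction from a handful of expensive robot measurements. All data below is synthetic:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X_sim = rng.uniform(0, 1, size=(200, 3))        # cheap: many simulator configs
    y_sim = X_sim @ np.array([3.0, 15.0, 1.0]) + 5  # cheap: simulator responses

    source = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_sim, y_sim)

    X_robot = rng.uniform(0, 1, size=(5, 3))        # expensive: only 5 robot runs
    y_robot = 1.3 * (X_robot @ np.array([3.0, 15.0, 1.0]) + 5) + 2  # shifted target

    # Learn a linear map from source predictions to target observations.
    shift = LinearRegression().fit(source.predict(X_robot).reshape(-1, 1), y_robot)

    def predict_target(X):
        """Predict robot performance by correcting the simulator model."""
        return shift.predict(source.predict(X).reshape(-1, 1))
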
  90. 90. Gaussian processes for performance modeling [plot at t = n: output f(x) vs. input x]
  91. 91. Gaussian processes for performance modeling [plot at t = n: Observation]
  92. 92. Gaussian processes for performance modeling [plot at t = n: Observation, Mean]
  93. 93. Gaussian processes for performance modeling [plot at t = n: Observation, Mean, Uncertainty]
  94. 94. Gaussian processes for performance modeling [plot at t = n: Observation, Mean, Uncertainty, New observation]
  95. 95. Gaussian processes for performance modeling [plots at t = n and t = n + 1: Observation, Mean, Uncertainty, New observation]
  96. 96. Gaussian processes enable reasoning about performance — Step 1: Fit a GP to the data seen so far. Step 2: Explore the model for the regions of most variance. Step 3: Sample that region. Step 4: Repeat. [figures: configuration space, empirical model, experiments, selection criteria; sequential design]
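
A minimal sketch of this four-step loop, using scikit-learn's GP as an illustrative stand-in (the talk does not name a specific library); run_benchmark() is a hypothetical measurement hook:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def run_benchmark(x):
        return np.sin(3 * x) + 0.1 * np.random.randn()  # stand-in measurement

    candidates = np.linspace(0, 2, 200).reshape(-1, 1)  # configuration space
    X, y = [[0.5], [1.5]], [run_benchmark(0.5), run_benchmark(1.5)]

    for _ in range(10):
        gp = GaussianProcessRegressor(kernel=RBF(0.3)).fit(X, y)  # Step 1: fit GP
        _, std = gp.predict(candidates, return_std=True)          # Step 2: find variance
        x_next = candidates[np.argmax(std)]                       # Step 3: sample there
        X.append([x_next[0]])                                     # Step 4: repeat
        y.append(run_benchmark(x_next[0]))
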
  97. 97. CoBot experiment: DARPA BRASS
  98. 98. CoBot experiment: DARPA BRASS
  99. 99. CoBot experiment: DARPA BRASS [scatter: localization error [m] vs. CPU utilization [%]; Pareto front; lower is better on both axes] no_of_particles=x, no_of_refinement=y
  100. 100. CoBot experiment: DARPA BRASS [same plot, plus Energy constraint] no_of_particles=x, no_of_refinement=y
  101. 101. CoBot experiment: DARPA BRASS [same plot, plus Energy constraint and Safety constraint] no_of_particles=x, no_of_refinement=y
  102. 102. CoBot experiment: DARPA BRASS [same plot, plus Energy constraint, Safety constraint, and the Sweet Spot] no_of_particles=x, no_of_refinement=y
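
A minimal sketch of finding the sweet spot: filter measured configurations by the energy and safety constraints, then keep the Pareto-optimal survivors. The thresholds and measurements below are made up for illustration:

    def pareto_front(points):
        """Keep points not dominated in (localization error, CPU utilization)."""
        return [p for p in points
                if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]

    measurements = [(0.5, 38.0), (1.0, 24.0), (2.0, 18.0), (4.0, 15.0), (6.0, 14.0)]
    MAX_CPU, MAX_ERROR = 30.0, 3.0  # energy and safety constraints (hypothetical)
    feasible = [p for p in measurements if p[1] <= MAX_CPU and p[0] <= MAX_ERROR]
    print(pareto_front(feasible))   # candidate sweet spots
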
  103. 103. CoBot experiment
  104. 104. CoBot experiment [heatmap: Source (given), CPU [%]]
  105. 105. CoBot experiment [heatmaps: Source (given); Target (ground truth after 6 months); CPU [%]]
  106. 106. CoBot experiment [heatmaps: Source (given); Target (ground truth after 6 months); Prediction with 4 samples]
  107. 107. CoBot experiment [heatmaps: Source (given); Target (ground truth after 6 months); Prediction with 4 samples; Prediction with transfer learning]
  108. 108. Results: Other configurable systems CoBot WordCount SOL RollingSort Cassandra (HW) Cassandra (DB)
  109. 109. Transfer Learning for Improving Model Predictions in Highly Configurable Software — Pooyan Jamshidi, Miguel Velez, Christian Kästner (Carnegie Mellon University), Norbert Siegmund (Bauhaus-University Weimar), Prasad Kawthekar (Stanford University) [figure: paper title page with abstract and Fig. 1, transfer learning for performance model learning] Details: [SEAMS ’17]
  110. 110. Summary (transfer learning) [recap heatmaps from the CoBot experiment]
  111. 111. Summary (transfer learning) • Model for making tradeoffs between qualities
  112. 112. Summary (transfer learning) • Model for making tradeoffs between qualities • Scales to large spaces and environmental changes
  113. 113. Summary (transfer learning) • Model for making tradeoffs between qualities • Scales to large spaces and environmental changes • Transfer learning can help
  114. 114. Summary (transfer learning) • Model for making tradeoffs between qualities • Scales to large spaces and environmental changes • Transfer learning can help • Increase prediction accuracy
  115. 115. Summary (transfer learning) • Model for making tradeoffs between qualities • Scales to large spaces and environmental changes • Transfer learning can help • Increase prediction accuracy • Increase model reliability
  116. 116. Summary (transfer learning) • Model for making tradeoffs between qualities • Scales to large spaces and environmental changes • Transfer learning can help • Increase prediction accuracy • Increase model reliability • Decrease model-building cost
  117. 117. Outline Case Study Transfer Learning Theory Building Guided Sampling Future Directions
  118. 118. Looking further: When transfer learning goes wrong [box plots: absolute percentage error [%] across sources s, s1–s6; baseline: non-transfer learning] noise level: 0, 5, 10, 15, 20, 25, 30; corr. coeff.: 0.98, 0.95, 0.89, 0.75, 0.54, 0.34, 0.19; µ(pe): 15.34, 14.14, 17.09, 18.71, 33.06, 40.93, 46.75. Insight: Predictions become more accurate when the source is more related to the target.
  119. 119. Looking further: When transfer learning goes wrong [same figure, annotated “It worked!”] Insight: Predictions become more accurate when the source is more related to the target.
  120. 120. Looking further: When transfer learning goes wrong [same figure, annotated “It worked!” and “It didn’t!”] Insight: Predictions become more accurate when the source is more related to the target.
  121. 121. Our empirical study: We looked at different highly-configurable systems to gain insights [P. Jamshidi, et al., “Transfer learning for performance modeling of configurable systems….”, ASE’17] SPEAR (SAT solver): analysis time; 14 options; 16,384 configurations; SAT problems; 3 hardware; 2 versions. x264 (video encoder): encoding time; 16 options; 4,000 configurations; video quality/size; 2 hardware; 3 versions. SQLite (DB engine): query time; 14 options; 1,000 configurations; DB queries; 2 hardware; 2 versions. SaC (compiler): execution time; 50 options; 71,267 configurations; 10 demo programs.
  122. 122. Observation 1: Linear shift happens only in limited environmental changes [table: software / environmental change / severity / correlation] SPEAR: NUC/2 -> NUC/4 (small, 1.00); Amazon_nano -> NUC (large, 0.59); hardware/workload/version (very large, -0.10). x264: version (large, 0.06); workload (medium, 0.65). SQLite: write-seq -> write-batch (small, 0.96); read-rand -> read-seq (medium, 0.50). Implication: Simple transfer learning is limited to hardware changes in practice.
  123. 123. Observation 1: Linear shift happens only in limited environmental changes [same table] Implication: Simple transfer learning is limited to hardware changes in practice.
  124. 124. Observation 1: Linear shift happens only in limited environmental changes [same table] Implication: Simple transfer learning is limited to hardware changes in practice.
  125. 125. Observation 1: Linear shift happens only in limited environmental changes [same table] Implication: Simple transfer learning is limited to hardware changes in practice.
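
A minimal sketch of the check behind this observation: measure a handful of configurations in both environments and inspect the linear correlation, as in the correlation column above. The sample values below are invented:

    import numpy as np

    def transfer_is_linear(y_source, y_target, threshold=0.9):
        """High correlation suggests a linear shift f_target ≈ a·f_source + b."""
        corr = np.corrcoef(y_source, y_target)[0, 1]
        return corr >= threshold, corr

    # e.g., SPEAR NUC/2 -> NUC/4 had corr. 1.00; across hardware+workload+version, -0.10.
    ok, corr = transfer_is_linear([1.0, 2.1, 3.9, 8.0], [2.1, 4.0, 8.2, 16.5])
    print(ok, round(corr, 3))
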
  126. 126. Observation 2: Influential options and interactions are preserved across environments [table: software / environmental change / severity / dimensions / t-test] x264: version (large): 16, 12, 10; hardware/workload/version (very large): 8, 9. SQLite: write-seq -> write-batch (very large): 14, 3, 4; read-rand -> read-seq (medium): 1, 1. SaC: workload (very large): 50, 16, 10. Implication: Avoid wasting budget on non-informative parts of the configuration space and focus where it matters.
  127. 127. Observation 2: Influential options and interactions are preserved across environments [same table] Implication: Avoid wasting budget on non-informative parts of the configuration space and focus where it matters.
  128. 128. Observation 2: Influential options and interactions are preserved across environments [same table] Implication: Avoid wasting budget on non-informative parts of the configuration space and focus where it matters.
  129. 129. Observation 2: Influential options and interactions are preserved across environments [same table] Implication: Avoid wasting budget on non-informative parts of the configuration space and focus where it matters. We only need to explore part of the space: 2^16 / 2^50 = 0.000000000058
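
A minimal sketch of exploiting this observation: screen for influential options in the cheap source environment (here with a Lasso; the paper's exact screening method may differ), then explore the target only along those few dimensions:

    import numpy as np
    from itertools import product
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(1)
    X_src = rng.integers(0, 2, size=(300, 16)).astype(float)  # 16 binary options
    y_src = 10 * X_src[:, 0] - 6 * X_src[:, 3] + rng.normal(0, 0.1, 300)

    lasso = Lasso(alpha=0.1).fit(X_src, y_src)
    influential = np.flatnonzero(np.abs(lasso.coef_) > 1e-3)
    print(influential)  # e.g. [0, 3]: only 2^2 target configs to try, not 2^16

    for bits in product([0.0, 1.0], repeat=len(influential)):
        config = np.zeros(16)
        config[influential] = bits
        # measure_target(config)  # hypothetical expensive target measurement
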
  130. 130. Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis — Pooyan Jamshidi (Carnegie Mellon University), Norbert Siegmund (Bauhaus-University Weimar), Miguel Velez, Christian Kästner, Akshay Patel, Yuvraj Agarwal (Carnegie Mellon University) [figure: paper title page with abstract and Fig. 1, transferring knowledge from source to target environment] Details: [ASE ’17]
  131. 131. Outline Case Study Transfer Learning Empirical Study Guided Sampling Vision
  132. 132. What will the software systems of the future look like?
  133. 133. Software 2.0 Increasingly customized and configurable VISION
  134. 134. Software 2.0 Increasingly customized and configurable VISION
  135. 135. Software 2.0 Increasingly customized and configurable VISION
  136. 136. Software 2.0 Increasingly customized and configurable VISION
  137. 137. Software 2.0 Increasingly customized and configurable VISION Increasingly competing objectives Accuracy Training speed Inference speed Model size Energy
  138. 138. Deep neural network as a highly configurable system [figure: a network of cells wired with skip connections; each node is configured by N (width), op, groups, kernel sizes (ks), and dilations (d)]
  139. 139. Exploring the design space of deep networks [figure: CIFAR-10 and ImageNet models built from cells optimized with architecture search] Optimal Architecture (Yesterday)
  140. 140. Exploring the design space of deep networks [same figure] Optimal Architecture (Yesterday); New Fraud Pattern
  141. 141. Exploring the design space of deep networks [same figures] Optimal Architecture (Yesterday) → New Fraud Pattern → Optimal Architecture (Today)
  142. 142. Deep architecture deployment
  143. 143. Deep architecture deployment
  144. 144. Deep architecture deployment [2.44, 0.88] [9.93, 0.77]
  145. 145. To achieve the optimal results, it is important to explore: • Network architectures • Software libraries [TensorFlow, CNTK, Theano, etc] • Deployment infrastructure
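
A minimal sketch of enumerating those three dimensions; every name and the evaluate() stub below are illustrative placeholders rather than real benchmarks:

    from itertools import product

    architectures = ["small_cnn", "large_cnn", "separable_cells"]
    libraries = ["tensorflow", "cntk", "theano"]
    infrastructures = ["cpu_server", "gpu_server", "embedded"]

    def evaluate(arch, lib, infra):
        # Stand-in for train/deploy/benchmark; returns (validation error, inference time [h]).
        return (hash((arch, lib)) % 50) / 100.0, (hash((lib, infra)) % 12) / 10.0

    results = {d: evaluate(*d) for d in product(architectures, libraries, infrastructures)}
    best = min(results, key=lambda d: results[d][0] + 0.1 * results[d][1])  # toy tradeoff
    print(best, results[best])
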
  146. 146. Exploring the design space of deep networks [scatter: inference time [h] vs. validation error; lower is better on both axes]
  147. 147. Exploring the design space of deep networks [same plot]
  148. 148. Exploring the design space of deep networks [same plot, Default marked; architecture figure]
  149. 149. Exploring the design space of deep networks [same plot, Default and Pareto-optimal points marked; architecture figure]
  150. 150. Exploring the design space of deep networks [same plot, Default and Pareto-optimal points marked; architecture figures]
  151. 151. Exploring the design space of deep networks [same plot, Default and Pareto-optimal points marked; architecture figures]
  152. 152. ML pipeline [Jeff Smith’s book on Reactive ML]
  153. 153. Architectural tradeoffs for production-ready ML systems (Uber)
  154. 154. Architectural tradeoffs for production-ready ML systems (Uber)
  155. 155. Architectural tradeoffs for production-ready ML systems (Uber)
  156. 156. More info? https://tinyurl.com/y97u68cr
  157. 157. Configuration errors are prevalent
  158. 158. Configuration errors are prevalent
  159. 159. Configuration errors are prevalent
  160. 160. Configuration errors are prevalent
  161. 161. Configuration errors are prevalent
  162. 162. Configuration errors are prevalent
  163. 163. Configuration errors are prevalent
  164. 164. Configuration errors are prevalent
  165. 165. Configuration errors are prevalent
  166. 166. Configuration complexity and dependencies between options are a major source of configuration errors — Default; Operational context I; Operational context II; Operational context III
  167. 167. Configuration complexity and dependencies between options are a major source of configuration errors — Configuration Options; Apache Hadoop Architecture
  168. 168. Configuration complexity and dependencies between options are a major source of configuration errors — Configuration Options; Apache Hadoop Architecture; Associated to
  169. 169. Configuration complexity and dependencies between options are a major source of configuration errors — Configuration Options; Apache Hadoop Architecture; Associated to
  170. 170. Configuration complexity and dependencies between options are a major source of configuration errors — Configuration Options; Apache Hadoop Architecture; Associated to
  171. 171. Configurations are software too
  172. 172. Configuration errors are common [pie chart: 69% configuration errors, 31% other errors; source: cobalt.io]
  173. 173. We can find the repair patches faster with a lighter workload • [localization]: Using transfer learning to derive the most likely configurations that manifest the bugs
  174. 174. We can find the repair patches faster with a lighter workload • [localization]: Using transfer learning to derive the most likely configurations that manifest the bugs • [repair]: Automatically prepare patches to fix the configuration bug
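
A minimal sketch of the localization idea as I read it (this is a future direction, so the interface is hypothetical): rank configurations by how far their observed behavior deviates from a performance model transferred from a healthy environment:

    import numpy as np

    def rank_suspicious_configs(configs, observed, transferred_model):
        """Configurations whose observed performance deviates most from the
        transferred model's prediction are most likely to manifest the bug."""
        predicted = np.array([transferred_model(c) for c in configs])
        deviation = np.abs(np.array(observed) - predicted)
        return [configs[i] for i in np.argsort(deviation)[::-1]]

    model = lambda c: 5 + 3 * c[0] + 15 * c[1]  # toy transferred model
    print(rank_suspicious_configs([(0, 0), (1, 0), (0, 1)], [5.1, 30.0, 20.2], model))
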
  175. 175. Thanks
