
Machine Learning meets DevOps


https://www.re-work.co/events/machine-learning-for-devops-summit-2018


  1. Machine Learning meets DevOps: Transfer Learning for Performance Optimization. Pooyan Jamshidi, University of South Carolina. @pooyanjamshidi
  2. ML systems sit at the intersection of machine learning, computer systems, and software engineering. My goal is to advance a scientific, principled understanding of “machine learning systems”. ML Systems: https://pooyanjamshidi.github.io/mls/
  3. Configuration optimization is a key activity in DevOps, sitting at the intersection of performance optimization, DevOps, and AI/ML.
  4. Today’s most popular systems are configurable.
  9. Systems are becoming increasingly more configurable and, therefore, their performance behavior is becoming more difficult to understand. [Charts: number of configuration parameters vs. release time, growing steadily across releases of Hadoop (MapReduce and HDFS) and other systems.] The configuration space explodes accordingly: growing from 2^18 to 2^180 configurations is a 2^180 / 2^18 = 2^162 increase in the size of the configuration space. [Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
  11. Empirical observations confirm that systems are becoming increasingly more configurable. [Charts: number of parameters vs. release time for Storage-A, MySQL, Apache, and Hadoop (MapReduce and HDFS), all growing across releases.] [Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
  12. “All constants should be configurable, even if we can’t see any reason to configure them.” – HDFS-4304
  14. Configurations determine the performance behavior of a system, e.g., its speed and energy consumption. Even this small snippet from Parrot behaves differently depending on compile-time configuration options such as PARROT_HAS_SETENV and LINUX:

    void Parrot_setenv(... name, ... value) {
    #ifdef PARROT_HAS_SETENV
        /* Configured with a native setenv: a single call suffices. */
        my_setenv(name, value, 1);
    #else
        /* Fallback path: build a "name=value" string and use putenv. */
        int name_len = strlen(name);
        int val_len = strlen(value);
        char* envs = glob_env;
        if (envs == NULL) {
            return;
        }
        strcpy(envs, name);
        strcpy(envs + name_len, "=");
        strcpy(envs + name_len + 1, value);
        putenv(envs);
    #endif
    }
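
Throughout the talk, a configuration c is a vector of option values, and measuring the system under c yields a performance value fs(c). A minimal sketch of such a measurement harness in Python; the ./run_benchmark binary and the option names are hypothetical:

    import itertools
    import subprocess
    import time

    # Hypothetical binary options of the system under test.
    OPTIONS = ["cache_enabled", "compression", "batch_mode"]

    def measure(config):
        """Run the benchmark under one configuration and return its
        wall-clock running time in seconds (a stand-in for f_s(c))."""
        flags = ["--%s=%d" % (opt, on) for opt, on in zip(OPTIONS, config)]
        start = time.perf_counter()
        subprocess.run(["./run_benchmark"] + flags, check=True)
        return time.perf_counter() - start

    # Exhaustive measurement is only feasible for tiny spaces:
    # 3 binary options already mean 2**3 = 8 runs.
    for config in itertools.product([0, 1], repeat=len(OPTIONS)):
        print(config, measure(config))
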
  21. Configuration options interact, creating a non-linear and complex performance behavior: • Non-linear • Non-convex • Multi-modal [3D response surface: latency (ms) as a function of the number of counters and the number of splitters, cubic interpolation over a finer grid.]
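
To see why such a response surface is hard to optimize, consider a toy model with an invented interaction term; every coefficient below is made up for illustration:

    import numpy as np

    def latency(counters, splitters):
        """Toy response surface; the trigonometric interaction term makes
        it non-linear, non-convex, and multi-modal (invented numbers)."""
        return (150 + 5 * counters - 10 * splitters
                + 40 * np.sin(counters / 2) * np.cos(splitters))

    # Brute force finds the global optimum on a small grid; a purely
    # local search could get trapped in one of the surface's valleys.
    grid = [(c, s) for c in range(1, 19) for s in range(1, 7)]
    best = min(grid, key=lambda cs: latency(*cs))
    print("best:", best, "latency:", round(float(latency(*best)), 1))
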
  22. How do we understand the performance behavior of real-world, highly-configurable systems that scale well… and enable developers/users to reason about qualities (performance, energy) and to make trade-offs?
  24. SocialSensor as a case study to motivate configuration optimization: • Identifying trending topics • Identifying user-defined topics • Social media search
  25. SocialSensor is a data processing pipeline: the Crawling component pulls tweets from the Internet (5k-20k tweets/min) and stores the crawled items; every 10 minutes the Orchestrator fetches around 100k tweets and pushes them to Content Analysis; the analyzed tweets (around 10M) are stored for the Search and Integration component to fetch.
  32. They expected their user base to increase significantly over a short period: the pipeline would have to absorb 100X and 10X more load at different stages, in real time.
  36. How can we gain better performance without using more resources?
  37. Let’s try out different system configurations!
  38. Opportunity: the data processing engines in the pipeline were all configurable (> 100 options each; 2,300 in total).
  44. The default configuration was bad, and so was the expert’s. [Scatter plot: average write latency (s) vs. throughput (ops/sec); the default configuration and the configuration recommended by an expert both sit far from the optimal configuration; arrows mark the “better” direction on each axis.]
  48. The default configuration was bad, and so was the expert’s, in a second setting as well. [Scatter plot: latency (ms) vs. throughput (ops/sec, ×10^4); again the default and the expert-recommended configurations sit far from the optimal one.]
  52. The default configuration is typically bad, and the optimal configuration is noticeably better than the median: • Default is bad • The optimum is 2X-10X faster than the worst configuration • Noticeably faster than the median [Scatter plot: average write latency (s) vs. throughput (ops/sec), highlighting the default and optimal configurations.]
  57. How do we sample the configuration space to learn a “better” performance behavior? How do we select the most informative configurations?
  58. We looked at different highly-configurable systems to gain insights about similarities across environments:
     • SPEAR (SAT solver): analysis time; 14 options; 16,384 configurations; workload: SAT problems; 3 hardware platforms; 2 versions
     • X264 (video encoder): encoding time; 16 options; 4,000 configurations; workload: video quality/size; 2 hardware platforms; 3 versions
     • SQLite (DB engine): query time; 14 options; 1,000 configurations; workload: DB queries; 2 hardware platforms; 2 versions
     • SaC (compiler): execution time; 50 options; 71,267 configurations; workload: 10 demo programs
  59. Observation 1: Not all options and interactions are influential, and the degree of interaction between options is not high. Over the configuration space ℂ = O1 × O2 × ⋯ × O10, the learned model f̂s(·) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7 involves only three of the ten options, with interactions of degree at most three.
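
If only three of ten binary options are influential, the effective search space shrinks from 2^10 = 1,024 to 2^3 = 8 configurations. A small sketch that enumerates them using the model from the slide:

    import itertools

    def f_s(o1, o3, o7):
        """Source performance model from the slide; only o1, o3, o7 appear."""
        return 1.2 + 3*o1 + 5*o3 + 0.9*o7 + 0.8*o3*o7 + 4*o1*o3*o7

    # Only the influential options need to be enumerated:
    # 2**3 = 8 configurations instead of the full 2**10 = 1024.
    for o1, o3, o7 in itertools.product([0, 1], repeat=3):
        print((o1, o3, o7), f_s(o1, o3, o7))
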
  60. Observation 2: Influential options and interactions are preserved across environments: f̂s(·) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7 (source); f̂t(·) = 10.4 − 2.1o1 + 1.2o3 + 2.2o7 + 0.1o1o3 − 2.1o3o7 + 14o1o3o7 (target). The coefficients change, but the influential options o1, o3, and o7 and their interactions show up in both models.
  65. The similarity across environments is a rich source of knowledge for exploring the configuration space.
  66. Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis. Pooyan Jamshidi (Carnegie Mellon University), Norbert Siegmund (Bauhaus-University Weimar), Miguel Velez, Christian Kästner, Akshay Patel, Yuvraj Agarwal (Carnegie Mellon University). An empirical study on four popular software systems, varying configurations and environmental conditions (hardware, workload, software version): for small environmental changes, a linear transformation of the performance model captures the target environment; for severe changes, only knowledge that makes sampling more efficient (e.g., a reduced dimensionality of the configuration space) transfers. Details: [ASE ’17]
  67. Learning to Sample (L2S)
  68. Extracting the knowledge about influential options and interactions: step-wise linear regression. Given source measurements ys1 = fs(c1), …, ysn = fs(cn) for configurations c1, …, cn over O1 × O2 × ⋯ × O20 (e.g., the execution time of program X), learn a performance model f̂s ∼ fs(·): 1. Fit an initial model. 2. Forward selection: add terms iteratively. 3. Backward elimination: remove terms iteratively. 4. Terminate when neither (2) nor (3) improves the model.
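
A minimal sketch of this step-wise procedure, using ordinary least squares from NumPy; the candidate terms (individual options plus pairwise products) and the BIC stopping criterion are illustrative choices, not necessarily those used in the original work:

    import itertools
    import numpy as np

    def bic(X, y):
        """Bayesian information criterion of an ordinary least-squares fit."""
        n = len(y)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        return n * np.log(rss / n + 1e-12) + X.shape[1] * np.log(n)

    def stepwise(columns, y):
        """columns: dict mapping term name -> feature vector.
        Returns the selected terms (the intercept is always included)."""
        n = len(y)

        def design(terms):
            return np.column_stack([np.ones(n)] + [columns[t] for t in terms])

        selected = []
        best = bic(design(selected), y)
        improved = True
        while improved:
            improved = False
            # Forward selection: try adding each candidate term.
            for t in sorted(set(columns) - set(selected)):
                score = bic(design(selected + [t]), y)
                if score < best - 1e-9:
                    best, selected, improved = score, selected + [t], True
            # Backward elimination: try dropping each selected term.
            for t in list(selected):
                rest = [s for s in selected if s != t]
                score = bic(design(rest), y)
                if score < best - 1e-9:
                    best, selected, improved = score, rest, True
        return selected

    # Toy data: 3 binary options; the true model uses o1, o3, and o1*o3.
    rng = np.random.default_rng(0)
    O = rng.integers(0, 2, size=(100, 3)).astype(float)
    y = 1.2 + 3*O[:, 0] + 5*O[:, 2] + 4*O[:, 0]*O[:, 2] + rng.normal(0, 0.1, 100)
    terms = {"o%d" % (i + 1): O[:, i] for i in range(3)}
    terms.update({"o%d*o%d" % (i + 1, j + 1): O[:, i] * O[:, j]
                  for i, j in itertools.combinations(range(3), 2)})
    print(stepwise(terms, y))
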
  74. L2S extracts the knowledge about influential options and interactions via performance models: f̂s(·) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7
  76. L2S exploits the knowledge it gained from the source to sample the target environment. Knowing from f̂s(·) = 1.2 + 3o1 + 5o3 + 0.9o7 + 0.8o3o7 + 4o1o3o7 that o1, o3, and o7 are the influential options in O1 × O2 × ⋯ × O10, it measures the target at configurations that exercise exactly those options and their interactions:
     • c1 = (0,0,0,0,0,0,0,0,0,0): f̂t(c1) = 10.4
     • c2 = (1,0,0,0,0,0,0,0,0,0): f̂t(c2) = 8.1
     • c3 = (0,0,1,0,0,0,0,0,0,0): f̂t(c3) = 11.6
     • c4 = (0,0,0,0,0,0,1,0,0,0): f̂t(c4) = 12.6
     • c5 = (0,0,1,0,0,0,1,0,0,0): f̂t(c5) = 11.7
     • c6 = (1,0,1,0,0,0,1,0,0,0): f̂t(c6) = 23.7
     plus further samples such as cx with f̂t(cx) = 9.6, yielding the target model f̂t(·) = 10.4 − 2.1o1 + 1.2o3 + 2.2o7 + 0.1o1o3 − 2.1o3o7 + 14o1o3o7.
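
A minimal sketch of this sampling step, assuming the influential options have already been identified from the source model; measure_target stands in for actually running the target system and here just evaluates the target model from the slide:

    import itertools

    N_OPTIONS = 10
    INFLUENTIAL = [0, 2, 6]   # o1, o3, o7 from the source model (0-based)

    def l2s_samples():
        """Yield configurations that vary only the influential options;
        every other option stays at its default value (0)."""
        for bits in itertools.product([0, 1], repeat=len(INFLUENTIAL)):
            config = [0] * N_OPTIONS
            for idx, bit in zip(INFLUENTIAL, bits):
                config[idx] = bit
            yield tuple(config)

    def measure_target(config):
        """Stand-in for measuring the real target environment; here we
        just evaluate the target model from the slide."""
        o1, o3, o7 = config[0], config[2], config[6]
        return (10.4 - 2.1*o1 + 1.2*o3 + 2.2*o7
                + 0.1*o1*o3 - 2.1*o3*o7 + 14*o1*o3*o7)

    for c in l2s_samples():
        print(c, measure_target(c))
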
  85. Exploration vs. exploitation: we also explore the configuration space using pseudo-random sampling to detect missing interactions.
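
A hedged sketch of how the two kinds of samples might be interleaved; the 10% exploration ratio and the helper names are invented for illustration:

    import random

    N_OPTIONS = 10

    def random_configuration():
        """Pseudo-random configuration: each binary option on/off with p=0.5."""
        return tuple(random.randint(0, 1) for _ in range(N_OPTIONS))

    def build_plan(exploit_samples, budget, p_explore=0.1):
        """Interleave exploitation samples (e.g., from L2S) with random
        exploration samples that can reveal interactions the source missed."""
        queue = list(exploit_samples)
        plan = []
        for _ in range(budget):
            if queue and random.random() >= p_explore:
                plan.append(queue.pop(0))            # exploitation
            else:
                plan.append(random_configuration())  # exploration
        return plan

    # Example: mix two stand-in L2S samples with random probes.
    l2s = [(0,) * 10, (1,) + (0,) * 9]
    print(build_plan(l2s, budget=20))
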
  86. Evaluation: learning the performance behavior of machine learning systems. ML systems: https://pooyanjamshidi.github.io/mls
  87. Configurations of deep neural networks affect accuracy and energy consumption: for a CNN on the CNAE-9 data set, configurations differ by up to 72% in validation (test) error and up to 22X in energy consumption. [Scatter plot: validation (test) error vs. energy consumption (J). Accompanying figure: image classification models constructed from cells optimized with architecture search; a small CIFAR-10 model used during the search, a large CIFAR-10 model used for cell evaluation, and an ImageNet model.]
  90. L2S enables learning a more accurate model with fewer samples by exploiting the knowledge from the source. [Plots: mean absolute percentage error vs. sample size (up to 500) for L2S+GP, L2S+DataReuseTL, DataReuseTL, ModelShift, and Random+CART, on (a) DNN (hard environment change; a convolutional neural network) and (b) XGBoost (hard).]
  93. Why are performance models learned from L2S samples more accurate?
  94. The samples generated by L2S contain more information (“entropy <-> information gain”). [Plots: entropy (bits) vs. sample size for maximum entropy, L2S, and random sampling, on DNN, XGBoost, and Storm.]
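
One way to quantify “more information” is to estimate the Shannon entropy of the performance values each sampling strategy observes; a minimal sketch, where the histogram binning and the stand-in data are choices made here:

    import numpy as np

    def sample_entropy(values, bins=10, value_range=(0.0, 100.0)):
        """Shannon entropy (bits) of measured performance values, estimated
        from a histogram; higher entropy means more informative samples."""
        counts, _ = np.histogram(values, bins=bins, range=value_range)
        p = counts[counts > 0] / counts.sum()
        return float(-(p * np.log2(p)).sum())

    rng = np.random.default_rng(1)
    spread = rng.uniform(0, 100, size=50)   # stands in for L2S samples
    cluster = rng.normal(50, 2, size=50)    # stands in for random samples
    print(sample_entropy(spread), ">", sample_entropy(cluster))
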
  96. Details: [FSE ’18]
  97. Thanks
  98. @pooyanjamshidi
