PyWren: Real-Time Elastic Execution

  1. PYWREN
 Real-Time Elastic Execution
 Eric Jonas, Postdoctoral Scientist, Berkeley Center for Computational Imaging
 jonas@eecs.berkeley.edu, @stochastician
 Shivaram Venkataraman, Ben Recht, Ion Stoica, Qifan Pu, Allan Peng, Vaishaal Shankar
 pywren.io
  7. TALK OUTLINE
 • What is PyWren
 • Why did we build it?
 • Capabilities and internals
 • Applications
 • Where we are going
  13. WHAT IS PYWREN
 A research tool.
 Exploiting real-time elastic execution: how do systems change when you have real-time access to 5000 stateless cores in <1 sec?
 Building a “cloud button”: how can we bring the benefits of elastic compute to underserved audiences?
  17. Adaptive Optics • Superresolution • Tomography • Phase Contrast
  20. PREVIOUSLY, AT COMP IMAGING LUNCH
 Why is there no “cloud button”?
  23. The cloud is too damn hard!
 Jimmy McMillan, Founder and Chairman, The Rent Is Too Damn High Party
 Less than half of the graduate students in our group have ever written a Spark or Hadoop job.
  28. #THECLOUDISTOODAMNHARD
 • What type? What instance? What base image?
 • How many to spin up? What price? Spot?
 • wait, Wait, WAIT… oh god
 • now what? DEVOPS
  34. ORIGINAL DESIGN GOALS
 1. Very little setup overhead once someone has an AWS account. In particular, no persistent overhead: you don't have to keep a large (expensive) cluster up, and you don't have to wait 10+ min for a cluster to come up.
 2. As close to zero overhead for users as possible. In particular, anyone who can write Python should be able to invoke it through a reasonable interface.
 3. Target jobs that run in the minutes-or-more regime.
 4. I don't want to run a service. That is, I personally don't want to offer the front end for other people to use; rather, I want to directly pay AWS.
 5. It has to be from a cloud player that's likely to give out an academic grant: AWS, Google, Azure. There are startups in this space that might build cool technology, but they often don't want to be paid in AWS research credits.
  35. “Most wrens are small and rather inconspicuous, except for their loud and often complex songs.”
  36. THE API The most important primitive: map(function, data) and… that’s mostly it
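In code, the primitive is exactly that one call. A minimal sketch: the commented lines show pywren's documented executor interface, which needs an AWS account and credentials to run; the built-in `map` below is a local stand-in with the same semantics, and `my_function` is purely illustrative.

```python
# The whole API: map a Python function over a list of data. With pywren
# installed and AWS configured, the commented lines run one Lambda per item.

def my_function(x):
    return x ** 2

data = list(range(10))

# import pywren
# pwex = pywren.default_executor()
# futures = pwex.map(my_function, data)    # one Lambda invocation per item
# results = [f.result() for f in futures]

results = list(map(my_function, data))     # local stand-in, same semantics
print(results)  # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```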
  37. Powered by Continuum Analytics
  40. AWS LAMBDA
 • 300 seconds max runtime
 • single-core (AVX2)
 • 512 MB in /tmp
 • 1.5 GB RAM
 • Python, Java, Node
  41. LAMBDA SCALABILITY (charts: compute scaling and data throughput)
  42. YOU CAN DO A LOT OF WORK WITH MAP! ETL, parameter tuning
  43. IMAGENET EXAMPLE: Preprocess 1.4M images from ImageNet; compute the GIST image descriptor (some random Python code off the internet)
  57. HOW IT WORKS
 your laptop:
 • future = runner.map(fn, data)
 • serialize func and data
 • put func and data on S3
 • invoke Lambda
 • future.result(): poll S3, then unpickle and return the result
 the cloud (each Lambda):
 • pull job from S3
 • download the Anaconda runtime
 • run the code with its Python
 • pickle the result, stick it in S3
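The laptop/cloud round trip above can be sketched with the standard library alone. A dict stands in for S3, a direct call stands in for the asynchronous Lambda invocation, and the helper names (`submit`, `lambda_handler`, `get_result`) are illustrative, not pywren internals; real pywren uses cloudpickle so it can also ship closures and lambdas.

```python
import pickle

s3 = {}  # stand-in for the S3 bucket

def lambda_handler(key):
    """Cloud side: pull job from S3, run the code, pickle result, stick in S3."""
    fn, arg = pickle.loads(s3[key + "/job"])
    s3[key + "/result"] = pickle.dumps(fn(arg))

def submit(fn, arg, key):
    """Laptop side: serialize func and data, put on S3, invoke Lambda."""
    s3[key + "/job"] = pickle.dumps((fn, arg))
    lambda_handler(key)  # in pywren this is an async Lambda invocation

def get_result(key):
    """Laptop side: poll S3, then unpickle and return the result."""
    return pickle.loads(s3[key + "/result"])

def times_ten(x):  # module-level, so stdlib pickle can serialize it by name
    return x * 10

submit(times_ten, 4, "job0")
print(get_result("job0"))  # → 40
```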
  70. (Leptotyphlops carlae)
 Want our runtime to include the usual packages. Shrinking it:
 • Start: 1205 MB
 • conda clean: 977 MB
 • eliminate pkg: 946 MB
 • delete non-AVX2 MKL: 670 MB
 • strip shared libs: 510 MB
 • delete .pyc: 441 MB
  71. BEHIND THE HOOD
  72. UNDERSTANDING HOST ALLOCATION
  73. HOW WE’RE USING IT (my day job)
  80. COMPUTATIONAL IMAGING
 Hardware design → Take image → Processing → Success
 Complex forward models; large-scale solvers
 Nick Antipa, Sylvia Necula, Ren Ng, Laura Waller. "Single-shot diffuser-encoded light field imaging." Computational Photography (ICCP), 2016 IEEE International Conference on. IEEE, 2016.
  81. 1.5 TB/day
 Jonas, Shankar, Bobra, Recht. "Solar Flare Prediction via AIA and HMI Image Data." American Geophysical Union Annual Meeting, 2016.
  83. NEUROSCIENCE
 Eric Jonas and Konrad Kording. "Automatic discovery of cell types and microcircuitry from neural connectomics." eLife, April 30, 2015.
  85. Could a Neuroscientist understand a microprocessor? Jonas, Kording. PLOS Computational Biology, 2017
  90. CURRENT RESEARCH DIRECTIONS
  98. CURRENT PYWREN RESEARCH
 • Beyond PSPACE
 • λPACK
 • Towards Shuffle
 • Comparison of Cloud Providers (NSDI ’17)
 • Serverless Databases (Johann Schleier-Smith & Joe Hellerstein)
  99. HOW EXPENSIVE IS S3? (Taking dimensionality analysis seriously, or “beyond PSPACE”)
  100.
 • How do algorithms change when you have infinite memory (through a straw)?
 • Never discard intermediate information
  101. LAPACK
  102. λPACK Vaishaal Shankar
  103. λPACK
 • That’s a lot of SIMD cores!
 • Parallel matrix multiplication is easy when the output matrix is small: (D × N)(N × D) = (D × D), computed as a sum of small (D × D) partial products
 • Fits cleanly into a map-reduce framework
  105. λPACK
 • However, when the output matrix is very large, as in (N × D)(D × N) = (N × N), it becomes very difficult or expensive to store in memory
 • For example, for N = 1e6 and D = 1e4: the D × D matrix of doubles is 800 MB, but the N × N matrix of doubles is 8 TB
 • Storing 8 TB in memory on a traditional cluster is expensive!
 “Keeping the kernel dream alive!” (Ben Recht)
  107. λPACK
 • Solution: use S3 to store the matrices and stream blocks to Lambdas, computing the N × N output matrix in parallel

 N            D      Lambdas  Runtime  Output size
 50000        784    225      192 s    20 GB
 50000        18432  225      271 s    20 GB
 1.2 million  4096   3000     1320 s   11 TB
 1.2 million  18432  3000     2520 s   11 TB
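The blocked scheme above can be sketched in plain Python. This is a local stand-in, not λPACK itself: a dict replaces S3, ordinary function calls replace the per-block Lambdas, and the block size and helper names are illustrative.

```python
# Blocked computation of the Gram matrix G = X @ X.T, one output block at a
# time, mirroring a lambda-per-output-block scheme (local sketch, no AWS).

def matmul(A, B):
    """Plain-Python matrix multiply for small blocks."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def gram_block(X, i, j, B):
    """Output block (i, j) of X @ X.T: rows i..i+B times rows j..j+B."""
    Xi, Xj = X[i:i + B], X[j:j + B]
    # transpose Xj so we multiply (B x D) by (D x B)
    return matmul(Xi, [list(col) for col in zip(*Xj)])

# Toy data: N=4 rows, D=3 features; block size B=2 gives a 2x2 grid of blocks.
X = [[1, 0, 2], [0, 1, 1], [2, 2, 0], [1, 1, 1]]
B = 2
store = {}  # stand-in for S3: each "lambda" writes its output block here
for i in range(0, len(X), B):
    for j in range(0, len(X), B):
        store[(i, j)] = gram_block(X, i, j, B)

print(store[(0, 0)])  # → [[5, 2], [2, 2]], the top-left 2x2 block of X @ X.T
```

Each output block depends only on two row slices of X, which is why the blocks can be streamed from S3 and computed by independent Lambdas.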
  109. WHAT ABOUT OTHER PROVIDERS? (Allan Peng)
  110.
                              AWS Lambda     Google Cloud Functions                       Azure Functions
 Status                       Production/GA  Beta                                         Production/GA
 Python support               Yes            None; spawn a child process in node.js       “Experimental”, not well supported
 Max concurrent invocations   3000           400, but pywren is limited by socket quota   Unclear
 Socket limit                 None           1 GB/100 s = 8 runtime downloads             Unknown
 Async invocation             Yes            No async HTTP invocation; possible via pub/sub  Asynchronous, using queue storage
 Auth                         Yes            None                                         Yes
  111.
                          AWS Lambda   Google Cloud Functions  Azure Functions
 Max execution time       300 s        540 s                   600 s
 Memory                   1.5 GB       2 GB                    1.5 GB
 Temp disk space          512 MB       counted against memory  500 MB
 Persistent disk space    N/A          N/A                     5 TB [sic], shared across all functions
 $ per 1M requests        $0.20        $0.40                   $0.20
 $ per second of compute  $0.00002501  $0.000029               $0.000024
 Free tier requests       1M           2M                      1M
  112.
                      AWS                                Google Cloud Functions  Azure Functions
 Deployment options   aws CLI, web interface, Boto3 API  gcloud CLI              az CLI, source control, web interface, Kudu REST API
 Known limits         Publicly available                 Publicly available      Unavailable
 OS                   Amazon Linux AMI                   Debian 8                Windows Server 2012
 Support              Good                               Slow                    Active on GitHub
  113. BUT I CAN’T USE THE CLOUD!
  117. DO THE SHUFFLE (Qifan Pu)
  118. DO THE SHUFFLE (1 TB): range-partition each input, then sort each partition (diagram: S3 → Redis → S3)
  119. DO THE SHUFFLE (1 TB)
  120. DO THE SHUFFLE (100 TB)
 one 1 TB sort → 100 rounds of 1 TB sorts → merge data from all rounds for each partition
 Best thus far: 100 TB in 5393 s for $771, where the record is 2983 s for $144.
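The range-partition-then-sort step can be sketched locally. All names here are illustrative: a real PyWren shuffle moves each partition through S3 or Redis and sorts partitions on separate Lambdas, while this sketch uses in-memory lists and a single process.

```python
# Local sketch of a range-partitioned sort: each "mapper" splits its input
# chunk by key range, each "reducer" sorts one partition, and concatenating
# the sorted partitions in range order yields a globally sorted result.

def partition_id(key, boundaries):
    """Index of the first boundary the key falls below (last partition otherwise)."""
    for i, b in enumerate(boundaries):
        if key < b:
            return i
    return len(boundaries)

def shuffle_sort(chunks, boundaries):
    nparts = len(boundaries) + 1
    partitions = [[] for _ in range(nparts)]  # stand-in for S3/Redis buckets
    for chunk in chunks:                      # "map" side: range-partition
        for k in chunk:
            partitions[partition_id(k, boundaries)].append(k)
    out = []
    for p in partitions:                      # "reduce" side: sort each partition
        out.extend(sorted(p))
    return out

chunks = [[17, 3, 42], [8, 25, 1], [30, 12]]
print(shuffle_sort(chunks, boundaries=[10, 30]))  # → [1, 3, 8, 12, 17, 25, 30, 42]
```

Because each partition is sorted independently, the reduce side parallelizes across Lambdas; the hard part in practice is the all-to-all data movement in the middle, which is why intermediate storage (S3 vs Redis) dominates the design.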
  121. TPC-DS PERFORMANCE (chart: query time in seconds on Q1, Q16, Q94, Q95 for Spark, PyWren+Redis, and PyWren+S3)
  125. TODAY’S EXERCISES
 • All Jupyter notebooks like yesterday, but with real Amazon accounts backing them
 • Please don’t invoke more than 100 Lambdas simultaneously
  130. TODAY’S EXERCISES
 • pywren-intro.ipynb
 • analyze-wikipedia.ipynb
 • hyperparameter-optimization.ipynb
 • matrix-computations-advanced.ipynb
  131. UPCOMING FEATURES
 • Standalone mode support for GPUs
 • Iterator interface: run jobs longer than 5 min on Lambda
 • Better logging and caching
 pywren.io
 Shivaram Venkataraman, Ben Recht, Ion Stoica, Qifan Pu, Allan Peng, Vaishaal Shankar
  132. THANK YOU!
 Shivaram Venkataraman, Qifan Pu, Allan Peng, Vaishaal Shankar, Ben Recht, Ion Stoica
 Remember to start a new session by going to risecamp.tech/login