PYWREN
Real-Time Elastic Execution

Eric Jonas
Postdoctoral Scientist
jonas@eecs.berkeley.edu
@stochastician

with Shivaram Venkataraman, Ben Recht, Ion Stoica, Qifan Pu, Allan Peng, Vaishaal Shankar
Berkeley Center for Computational Imaging
pywren.io
WHAT IS PYWREN

Research tool: how do systems change when you have real-time access to 5,000 stateless cores in <1 sec? Exploiting real-time elastic execution.

How can we bring the benefits of elastic compute to underserved audiences? Building a “cloud button”.
The cloud is too
damn hard!
Jimmy McMillan
Founder and Chairman
The Rent Is Too Damn High Party
Less than half of the graduate students in our group have ever written a Spark or Hadoop job.

#THECLOUDISTOODAMNHARD
• What type? What instance? What base image?
• How many to spin up? What price? Spot?
• wait, Wait, WAIT oh god
• now what? DEVOPS
ORIGINAL DESIGN GOALS
1. Very little overhead for setup once someone has an AWS account. In particular, no persistent overhead -- you don't have to keep a large (expensive) cluster up, and you don't have to wait 10+ min for a cluster to come up.
2. As close to zero overhead for users as possible -- in particular, anyone who can write Python should be able to invoke it through a reasonable interface.
3. Target jobs that run in the minutes-or-more regime.
4. I don't want to run a service. That is, I personally don't want to offer the front-end for other people to use; rather, I want to directly pay AWS.
5. It has to be from a cloud player that's likely to give out an academic grant -- AWS, Google, Azure. There are startups in this space that might build cool technology, but they often don't want to be paid in AWS research credits.
“Most wrens are small and rather inconspicuous, except
for their loud and often complex songs.”
THE API
The most important primitive:
map(function, data)
and… that’s mostly it
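The semantics of that one primitive can be sketched locally, with a thread pool standing in for the Lambda workers (with real pywren you would get an executor from `pywren.default_executor()` and call its `map`):

```python
# Local sketch of the map(function, data) -> futures interface.
# ThreadPoolExecutor stands in for the fleet of Lambda workers.
from concurrent.futures import ThreadPoolExecutor

def my_function(x):
    return x + 7

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(my_function, x) for x in range(10)]
    results = [f.result() for f in futures]  # blocks until all workers finish

print(results)  # [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
```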
HOW IT WORKS

Your laptop:
  future = runner.map(fn, data)
    1. Serialize func and data
    2. Put them on S3
    3. Invoke the Lambdas

The cloud, per Lambda:
    4. Pull the job from S3
    5. Download the Anaconda runtime
    6. Use its Python to run the code
    7. Pickle the result
    8. Stick it in S3

Your laptop:
  future.result()
    9. Poll S3
    10. Unpickle and return the result
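The whole round trip can be sketched with a dict standing in for S3 and a direct call standing in for the Lambda invocation. The names here (`put_job`, `lambda_handler`, `poll_result`) are illustrative, not pywren's; the real system serializes with cloudpickle and invokes Lambda through the AWS API:

```python
# Minimal sketch of the pywren control flow, assuming an in-memory
# dict in place of S3 and a synchronous call in place of Lambda.
import pickle

s3 = {}  # stands in for an S3 bucket

def double(x):
    return x * 2

def put_job(job_id, fn, data):
    # laptop side: serialize func and data, put them on "S3"
    s3[f"jobs/{job_id}/input"] = pickle.dumps((fn, data))

def lambda_handler(job_id):
    # cloud side: pull the job, run it, pickle the result, stick it in "S3"
    fn, data = pickle.loads(s3[f"jobs/{job_id}/input"])
    s3[f"jobs/{job_id}/output"] = pickle.dumps(fn(data))

def poll_result(job_id):
    # laptop side: poll "S3"; unpickle and return once the result appears
    blob = s3.get(f"jobs/{job_id}/output")
    return pickle.loads(blob) if blob is not None else None

put_job("0", double, 21)
lambda_handler("0")
print(poll_result("0"))  # 42
```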
• That’s a lot of SIMD cores!
λPACK

(D x N) · (N x D) = D x D, i.e. a sum of per-block D x D partial products.
• Parallel matrix multiplication is easy when the output matrix is small
• It fits cleanly into the map-reduce framework
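The map-reduce structure above can be sketched with NumPy: split the N dimension into row blocks, have each worker (one Lambda each, in λPACK) compute a D x D partial product, then sum the partials:

```python
# Map-reduce matrix multiply when the output (D x D) is small.
import numpy as np

rng = np.random.default_rng(0)
D, N, n_blocks = 4, 100, 5
X = rng.standard_normal((N, D))

def map_block(block):
    # one worker: D x D partial product from one row-block of X
    return block.T @ block

partials = [map_block(b) for b in np.array_split(X, n_blocks)]  # map
gram = sum(partials)                                            # reduce
assert np.allclose(gram, X.T @ X)
```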
λPACK
• However, when the output matrix is very large, it becomes very difficult or expensive to store in memory
• For example, for N = 1e6 and D = 1e4:
  • the D x D matrix of doubles is 800 MB
  • the N x N matrix of doubles is 8 TB
• Storing 8 TB in memory on a traditional cluster is expensive!
(N x D) · (D x N) = N x N
“Keeping the kernel dream alive!” — Ben Recht
λPACK
• Solution: use S3 to store the matrices, and stream blocks to Lambdas to compute the N x N output matrix in parallel
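A sketch of that blocked scheme, with a dict standing in for S3: each worker pulls two row blocks of X from shared storage and produces one block of the N x N output, so no single machine ever holds the whole result:

```python
# Blocked computation of the large N x N output X @ X.T.
import numpy as np

rng = np.random.default_rng(1)
N, D, B = 12, 3, 4                                          # B rows per block
X = rng.standard_normal((N, D))
storage = {i: X[i * B:(i + 1) * B] for i in range(N // B)}  # row blocks "in S3"

def compute_block(i, j):
    # one Lambda's work: stream in two row blocks, emit one B x B output block
    return storage[i] @ storage[j].T

out = np.block([[compute_block(i, j) for j in range(N // B)]
                for i in range(N // B)])
assert np.allclose(out, X @ X.T)
```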
N            D       Lambdas  Runtime  Output size
50,000       784     225      192 s    20 GB
50,000       18,432  225      271 s    20 GB
1.2 million  4,096   3,000    1,320 s  11 TB
1.2 million  18,432  3,000    2,520 s  11 TB
                   AWS Lambda     Google Cloud Functions       Azure Functions
Status             Production/GA  Beta                         Production/GA
Python support     Yes            None; spawn a child          “Experimental”, not well
                                  process from node.js         supported
Max concurrent     3000           400, but pywren is limited   Unclear
invocations                       by the socket quota
Socket limit       None           1 GB/100 s = 8 runtime       Unknown
                                  downloads
Async invocation   Yes            No asynchronous HTTP         Asynchronous, using
                                  invocation; can achieve it   queue storage
                                  with pub/sub
Auth               Yes            None                         Yes
                       AWS Lambda   Google Cloud Functions  Azure Functions
Max execution time     300 s        540 s                   600 s
Memory                 1.5 GB       2 GB                    1.5 GB
Temp disk space        512 MB       counted against memory  500 MB
Persistent disk space  N/A          N/A                     5 TB [sic], shared
                                                            across all functions
Price per 1M requests  $0.20       $0.40                   $0.20
Price per second of    $0.00002501  $0.000029               $0.000024
compute
Free-tier requests     1M           2M                      1M
                    AWS Lambda          Google Cloud Functions  Azure Functions
Deployment options  • aws CLI           • gcloud CLI            • az CLI
                    • Web interface                             • Source control
                    • Boto3 API                                 • Web interface
                                                                • Kudu REST API
Known limits        Publicly available  Publicly available      Unavailable
OS                  Amazon Linux AMI    Debian 8                Windows Server 2012
Support             Good                Slow                    Active on GitHub
Best thus far: 100 TB sorted in 5,393 s for $771,
where the record is 2,983 s for $144.
DO THE SHUFFLE (100 TB)
[Diagram: instead of one giant sort, the 100 TB job runs as 100 rounds of 1 TB sorts; a final stage merges the data from all rounds for each partition.]
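The final per-partition merge step is a classic k-way merge of sorted runs, one run per round (three small runs shown here instead of 100); `heapq.merge` does exactly this:

```python
# Per-partition merge of sorted runs, as in the last stage of the shuffle.
import heapq

runs = [[1, 5, 9], [3, 7], [2, 4, 8]]  # sorted run per round, one partition
merged = list(heapq.merge(*runs))      # streaming k-way merge
print(merged)  # [1, 2, 3, 4, 5, 7, 8, 9]
```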
TPC-DS PERFORMANCE
[Bar chart: query time in seconds (0 to 800) for TPC-DS queries Q1, Q16, Q94, and Q95, comparing Spark, PyWren+Redis, and PyWren+S3; three of the bars are N/A.]
TODAY’S EXERCISES
• All Jupyter notebooks, like yesterday, but with real Amazon accounts backing them
• Please don’t invoke more than 100 Lambdas simultaneously
UPCOMING FEATURES
• Standalone mode support for GPUs
• Iterator interface — run jobs longer than 5 minutes on Lambda
• Better logging and caching
pywren.io
Shivaram Venkataraman, Ben Recht, Ion Stoica, Qifan Pu, Allan Peng, Vaishaal Shankar