Is Machine Learning Code for 100 Rows or a Billion the Same?: We have built an automatically distributed, implicitly parallel data science platform for running large-scale machine learning applications. By abstracting away the computer science required to scale machine learning models, the Ufora platform lets data scientists focus on building models in simple scripting code, without having to worry about building large-scale distributed systems, their race conditions, fault tolerance, etc. This automatic approach requires solving some interesting challenges, like optimal data layout for different ML models. For example, when a data scientist says “do a linear regression on this 100GB dataset”, Ufora needs to figure out how to automatically distribute and lay out that data across a cluster of machines in order to minimize travel over the wire. Running a GBM against the same dataset might require a completely different layout. This talk will cover how the platform works in terms of data and thread distribution, how it generates parallel processes out of single-threaded programs, and more.
2. Why should I have to write a different program for 1000 rows or 1 billion?
3. Our Vision: Simplified Distributed Computing
• Using lots of machines should be as easy as using one.
• Enable scalable, fast machine learning and data processing
• Parallelism should be natural and come from the language itself
I want to treat the cloud like it’s one big, fast, desktop.
4. What is Ufora?
Auto-parallel, compiled, multi-host Python
Key Components
• JIT Compiled
• Implicit Parallelism at the language level
• Fault tolerant
• Automatic co-location of data and compute
5. We are now open source!
• 5 years of work by ~ 5 engineers
• ~350k lines of code
• Apache 2.0 License
• Hosted on GitHub
6. Sound Familiar?
• Similar approach to JIT Compilation
• Scalable but without frameworks like MapReduce
• Package that works easily with existing python workflow
7. How do I use it?
Install the client:
pip install pyfora
Get some workers:
pyfora_aws start … --num-instances 4
or
docker run … ufora/service
In your Python program:
import pyfora
ufora = pyfora.connect('http://<ip_address>:30000')
with ufora.remotely:
    # your code here
8. How do I use it?
def isPrime(p):
    if p < 2: return 0
    x = 2
    while x*x <= p:
        if p % x == 0: return 0
        x = x + 1
    return 1

result = sum(isPrime(x) for x in xrange(100 * 1000 * 1000))
~1 hour
9. How do I use it?
def isPrime(p):
    if p < 2: return 0
    x = 2
    while x*x <= p:
        if p % x == 0: return 0
        x = x + 1
    return 1

with ufora.remotely:
    result = sum(isPrime(x) for x in xrange(100 * 1000 * 1000))
~10 secs
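To build intuition for the speedup, here is a rough sketch of the kind of decomposition pyfora's runtime performs implicitly: an associative reduction like the sum above can be split into independent sub-ranges, reduced separately, and combined. This is plain illustrative Python (the function and helper names are mine, not pyfora API); pyfora does this automatically, across machines.

```python
def isPrime(p):
    if p < 2:
        return 0
    x = 2
    while x * x <= p:
        if p % x == 0:
            return 0
        x = x + 1
    return 1

def split_ranges(n, num_workers):
    # Partition [0, n) into num_workers contiguous chunks.
    step = (n + num_workers - 1) // num_workers
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]

def count_primes(lo, hi):
    # Each chunk is an independent task a worker could run in parallel.
    return sum(isPrime(x) for x in range(lo, hi))

# Combining the partial results gives the same answer as a serial sum.
total = sum(count_primes(lo, hi) for lo, hi in split_ranges(1000, 4))
```

Because the reduction is associative and `isPrime` is pure, the chunks can run on different machines with no coordination beyond the final combine.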
10. What do you give up?
• No mutability of data-structures
• No side-effects
• No nondeterminism
• Emphasize “functional” programming style
15. Answer: React dynamically as the program runs
• Watch running threads to see what blocks of data they’re accessing.
• Move threads to data, or data to threads, depending on what’s cheaper.
• Detect when two blocks of data absolutely have to be on the same machine.
• Build a statistical model of correlations between block accesses.
• Place data to minimize expected future number of machine boundary crossings.
16. A simple example
v = range(0, 2*10**9)
[Diagram: v is split into 16 blocks, four per machine across Machines 1–4; red boxes are blocks of data]
17. [Diagram: the same 16 blocks across Machines 1–4]
for x in v:
    state = f(state, x)
Computation starts on Machine 1. When the computation exhausts the data on one machine, the runtime moves it to the next.
18. But real access patterns are more complex!
User writes:
res = 0
def f(x, y):
    # some function
for i in xrange(0, len(v)-10):
    res = res + f(v[i], v[i+10])
Now the computation is looking at all pairs v[i] and v[i+10].
19. [Diagram: the same 16 blocks across Machines 1–4]
At first, everything is OK, since v[i] and v[i+10] are close to each other in the data. But when the computation reaches the end of block 4, v[i] and v[i+10] aren’t on the same machine!
20. Every time we have to move the computation, we’re hitting the network. This is really slow!
[Diagram: v[i] sits in block 4 on Machine 1, while v[i+10] sits in block 5 on Machine 2]
21. Solution: Replicate blocks so that they overlap
[Diagram: blocks 5, 9, and 13 are replicated onto the previous machine, so adjacent blocks overlap across machine boundaries]
Data can live on two different machines at the same time because it’s immutable!
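The overlap idea can be sketched in a few lines (my own illustration, with made-up names, not Ufora's actual code): extend each block with the first `offset` elements of the next block, so every pair (v[i], v[i+offset]) falls entirely inside some single machine's copy.

```python
def blocks_with_overlap(v, block_size, offset):
    # Split v into blocks, each extended `offset` elements past its
    # boundary, replicating the start of the next block. Safe only
    # because the data is immutable.
    out = []
    for start in range(0, len(v), block_size):
        out.append(v[start:start + block_size + offset])
    return out

v = list(range(100))
blocks = blocks_with_overlap(v, block_size=25, offset=10)
# Every pair (v[i], v[i+10]) now lives within a single block's copy,
# so the i / i+10 access pattern never crosses a machine boundary.
```

The cost is a little duplicated storage (offset elements per block) in exchange for eliminating the per-pair network hops from the previous slide.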
22. Project Roadmap: Current Version (0.1)
• Coverage of the core Python 2.7 language.
• Run locally (using docker) or in AWS
• Import pyfora and go!
23. Project Roadmap: Upcoming Release (0.2)
• Core numpy and dataframe implementations (in python)
• Coverage for some core scikit data science algorithms (gbm, regressions, etc.)
• Better error handling, lots of bugfixes
24. Project Roadmap: the future
• Python 3 support
• Execution of arbitrary python code out-of-process (for non-pure code we don't want to port)
• More generic model for import/export of data from the cluster.
• Enabling better feedback in the pyfora API for tracking progress of computations.
• Support for running calculations on GPU
25. Ufora is Auto-Parallel, Multi-Host Python
• Star/fork the repo: github.com/ufora/ufora
• Contribute to the codebase
• Find me after this presentation
• Tell us what we should build next. This affects our priorities!!!
• Email me: braxton@ufora.com