
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15


Is Machine Learning Code for 100 Rows or a Billion the Same? We have built an automatically distributed, implicitly parallel data science platform for running large-scale machine learning applications. By abstracting away the computer science required to scale machine learning models, the Ufora platform lets data scientists focus on building models in simple scripting code, without having to worry about building large-scale distributed systems, with their race conditions, fault tolerance, and so on. This automatic approach requires solving some interesting challenges, such as optimal data layout for different ML models. For example, when a data scientist says “do a linear regression on this 100GB dataset”, Ufora needs to figure out how to distribute and lay out that data across a cluster of machines so as to minimize travel over the wire. Running a GBM against the same dataset might require a completely different layout of that data. This talk will cover how the platform works in terms of data and thread distribution, how it generates parallel processes out of single-threaded programs, and more.

Published in: Technology


  1. Ufora @ MLConf. Braxton McKee, CEO & Founder
  2. Why should I have to write a different program for 1000 rows or 1 billion?
  3. Our Vision: Simplified Distributed Computing
     • Using lots of machines should be as easy as using one.
     • Enable scalable, fast machine learning and data processing.
     • Parallelism should be natural and come from the language itself.
     I want to treat the cloud like it’s one big, fast desktop.
  4. What is Ufora? Auto-parallel, compiled, multi-host python
     Key Components
     • JIT Compiled
     • Implicit Parallelism at the language level
     • Fault tolerant
     • Automatic co-location of data and compute
  5. We are now open source!
     • 5 years of work by ~5 engineers
     • ~350k lines of code
     • Apache 2.0 License
     • Hosted on GitHub
  6. Sound Familiar?
     • Similar approach to JIT Compilation
     • Scalable but without frameworks like MapReduce
     • Package that works easily with existing python workflow
  7. How do I use it?
     Install the client:
         pip install pyfora
     Get some workers:
         pyfora_aws start … --num-instances 4
       or
         docker run … ufora/service
     In your python program:
         import pyfora
         ufora = pyfora.connect('http://<ip_address>:30000')
         with ufora.remotely:
             #your code here
  8. How do I use it?
         def isPrime(p):
             if p < 2: return 0
             x = 2
             while x*x <= p:
                 if p%x == 0: return 0
                 x = x + 1
             return 1
         result = sum(isPrime(x) for x in xrange(100 * 1000 * 1000))
     ~1 hour
  9. How do I use it?
         def isPrime(p):
             if p < 2: return 0
             x = 2
             while x*x <= p:
                 if p%x == 0: return 0
                 x = x + 1
             return 1
         with ufora.remotely:
             result = sum(isPrime(x) for x in xrange(100 * 1000 * 1000))
     ~10 secs
  10. What do you give up?
      • No mutability of data-structures
      • No side-effects
      • No nondeterminism
      • Emphasize “functional” programming style
  11. Architecture (diagram labels): PyFora Client, Gateway Node, Worker Nodes, S3/HDFS
  12. Implicit Parallelism
          def filter(v,f):
              if len(v) == 0:
                  return []
              if len(v) == 1:
                  return v if f(v[0]) else []
              mid = len(v)/2
              return filter(v[:mid],f) + filter(v[mid:],f)
          primes = filter(range(100000000),isPrime)
      Naturally parallel (divide and conquer)
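A minimal local sketch of the divide-and-conquer filter from slide 12, ported to Python 3 so it runs without Ufora. The names `is_prime` and `dc_filter` are renamed here to avoid shadowing the built-in `filter`, `//` replaces the Python 2 integer `/`, and the input is shrunk to 100 values. It only illustrates why the two recursive calls are independent of each other; it does not run anything in parallel.

```python
def is_prime(p):
    # Trial division, same logic as isPrime on the slides.
    if p < 2:
        return 0
    x = 2
    while x * x <= p:
        if p % x == 0:
            return 0
        x += 1
    return 1


def dc_filter(v, f):
    # Split the input in half and filter each half separately.
    # Neither half reads or writes anything the other half needs,
    # which is what lets a runtime evaluate them on different cores.
    if len(v) == 0:
        return []
    if len(v) == 1:
        return list(v) if f(v[0]) else []
    mid = len(v) // 2
    return dc_filter(v[:mid], f) + dc_filter(v[mid:], f)


primes = dc_filter(list(range(100)), is_prime)
print(len(primes), primes[:5])
```

Because the function has no side effects, evaluating the left and right halves in either order, or at the same time, gives the same result.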
  13. Adaptive Parallelism: Splitting
      filter(v, isPrime) over 100M Integers
      0 – 50M | 50M – 100M
      CORE #1: 0 – 25M | CORE #2: 25M – 50M | CORE #3: 50M – 75M | CORE #4: 75M – 100M
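The picture on slide 13 can be reproduced with a small sketch that keeps halving index ranges until there is one piece per core. This is a static approximation written for illustration: `split_for_cores` is a hypothetical helper, and the slides describe the real runtime as splitting adaptively while the program runs rather than up front.

```python
def split_for_cores(lo, hi, cores):
    # Start with the whole range and halve every piece until
    # there are at least as many pieces as cores.
    pieces = [(lo, hi)]
    while len(pieces) < cores:
        next_pieces = []
        for a, b in pieces:
            mid = (a + b) // 2
            next_pieces.append((a, mid))
            next_pieces.append((mid, b))
        pieces = next_pieces
    return pieces


M = 1000 * 1000
two_way = split_for_cores(0, 100 * M, 2)
four_way = split_for_cores(0, 100 * M, 4)
print(two_way)
print(four_way)
```

With 4 cores the result matches the ranges on the slide: 0 to 25M, 25M to 50M, 50M to 75M and 75M to 100M.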
  14. How do we know where to put the data?
  15. Answer: React dynamically as the program runs
      • Watch running threads to see what blocks of data they’re accessing.
      • Move threads to data, or data to threads, depending on what’s cheaper.
      • Detect when two blocks of data absolutely have to be on the same machine.
      • Build a statistical model of correlations between block accesses.
      • Place data to minimize expected future number of machine boundary crossings.
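Two of the ideas on slide 15 can be sketched in a few lines. The slides do not describe Ufora's actual statistical model or cost function, so everything below is an assumption made for illustration: `co_access_counts` simply counts how often two different blocks are touched back to back in a hypothetical access trace, and `cheaper_move` compares two byte counts.

```python
from collections import Counter


def co_access_counts(trace):
    # Count how often two different blocks are accessed one after
    # the other. Pairs with high counts are candidates for living
    # on the same machine.
    counts = Counter()
    for a, b in zip(trace, trace[1:]):
        if a != b:
            counts[tuple(sorted((a, b)))] += 1
    return counts


def cheaper_move(thread_state_bytes, block_bytes):
    # Assumed cost model: ship whichever side is smaller.
    if thread_state_bytes <= block_bytes:
        return "move thread to data"
    return "move data to thread"


# Hypothetical trace of block ids touched by one running thread.
trace = [4, 5, 4, 5, 4, 5, 6, 6, 7]
hottest_pair, hits = co_access_counts(trace).most_common(1)[0]
print(hottest_pair, hits)
print(cheaper_move(thread_state_bytes=4096, block_bytes=50 * 1024 * 1024))
```

In this trace blocks 4 and 5 are touched together far more often than any other pair, so a placement that minimizes boundary crossings would keep them on one machine.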
  16. A simple example
          v = range(0, 2*10**9)
      (Diagram: blocks 1 to 16 laid out across Machines 1 to 4. Red boxes are blocks of data.)
  17. Computation starts on Machine 1
          for x in v:
              state = f(state,x)
      When the computation exhausts the data on one machine, the runtime moves it to the next.
      (Diagram: blocks 1 to 16 laid out across Machines 1 to 4.)
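To see why the sequential scan on slide 17 is cheap, the sketch below counts how many times the computation has to change machines. It assumes 16 blocks with four consecutive blocks per machine, as the later slides suggest (block 4 on Machine 1, block 5 on Machine 2), and uses a toy block size of 100 elements; `machine_of` and `count_moves` are hypothetical helpers, not part of pyfora.

```python
BLOCK_SIZE = 100          # elements per block (toy value)
BLOCKS_PER_MACHINE = 4    # assumed layout: 4 consecutive blocks per machine
NUM_BLOCKS = 16


def machine_of(index):
    # Machine (0-based) that holds the block containing v[index].
    block = index // BLOCK_SIZE
    return block // BLOCKS_PER_MACHINE


def count_moves(n):
    # Walk v[0], v[1], ... in order and count machine changes.
    moves = 0
    current = machine_of(0)
    for i in range(n):
        m = machine_of(i)
        if m != current:
            moves += 1
            current = m
    return moves


sequential_moves = count_moves(NUM_BLOCKS * BLOCK_SIZE)
print(sequential_moves)
```

A full pass over the data crosses a machine boundary only three times, once per hand-off between neighbouring machines.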
  18. But real access patterns are more complex!
      User writes:
          res = 0
          def f(x,y): # some function
          for i in xrange(0, len(v)-10):
              res = res + f(v[i], v[i+10])
      Now the computation is looking at all pairs v[i] and v[i+10].
  19. At first, everything is OK, since v[i] and v[i+10] are close to each other in the data. But when the computation reaches the end of block 4, v[i] and v[i+10] aren’t on the same machine!
      (Diagram: blocks 1 to 16 laid out across Machines 1 to 4.)
  20. Every time we have to move the computation, we’re hitting the network. This is really slow!
      (Diagram: v[i] in Block 4 on Machine 1, v[i+10] in Block 5 on Machine 2.)
  21. Solution: Replicate blocks so that they overlap. Data can live on two different machines at the same time because it’s immutable!
      (Diagram: blocks 1 to 16 laid out across Machines 1 to 4, with blocks 5, 9 and 13 shown replicated.)
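The effect of overlapping replicas on the v[i], v[i+10] loop can be estimated with a toy model. The assumptions are mine, not the talk's: four consecutive blocks per machine, 100 elements per block, the computation moves to a block's primary machine whenever its current machine holds no copy, and with replication the first block of each machine is also copied to the preceding machine (so blocks 5, 9 and 13 in the slides' 1-based numbering get a second home).

```python
BLOCK_SIZE = 100          # elements per block (toy value)
BLOCKS_PER_MACHINE = 4    # assumed layout: 4 consecutive blocks per machine


def holders(block, replicate):
    # Return the primary machine for a 0-based block and the set of
    # machines that hold a copy of it.
    primary = block // BLOCKS_PER_MACHINE
    machines = {primary}
    if replicate and primary > 0 and block % BLOCKS_PER_MACHINE == 0:
        machines.add(primary - 1)   # overlap copy on the previous machine
    return primary, machines


def count_moves(n, offset, replicate):
    # Simulate: for each i, read v[i] then v[i + offset], moving the
    # computation only when the current machine lacks the block.
    moves = 0
    current = 0   # computation starts on the first machine
    for i in range(n - offset):
        for idx in (i, i + offset):
            primary, machines = holders(idx // BLOCK_SIZE, replicate)
            if current not in machines:
                current = primary
                moves += 1
    return moves


without_copies = count_moves(1600, 10, replicate=False)
with_copies = count_moves(1600, 10, replicate=True)
print(without_copies, with_copies)
```

Under these toy assumptions the loop needs 57 moves without replicas, because it bounces back and forth near every machine boundary, and 3 moves with them, the same as a plain sequential scan.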
  22. Project Roadmap: Current Version (0.1)
      • Coverage of core python2.7 language.
      • Run locally (using docker) or in AWS
      • Import pyfora and go!
  23. Project Roadmap: Upcoming Release (0.2)
      • Core numpy and dataframe implementations (in python)
      • Coverage for some core scikit data science algorithms (gbm, regressions, etc.)
      • Better error handling, lots of bugfixes
  24. Project Roadmap: the future
      • Python 3 support
      • Execution of arbitrary python code out-of-process (for non-pure code we don't want to port)
      • More generic model for import/export of data from the cluster.
      • Enabling better feedback in the pyfora api for tracking progress of computations.
      • Support for running calculations on GPU
  25. Ufora is Auto-Parallel, Multi-Host Python
      • Star/fork the repo: github.com/ufora/ufora
      • Contribute to the codebase
      • Find me after this presentation
      • Tell us what we should build next. This affects our priorities!
      • Email me: braxton@ufora.com
